AI COMPUTE

The hardware layer that powers training and inference — GPUs, TPUs, and the supply chains that constrain them.

AI compute is the specialized hardware infrastructure — primarily GPUs, TPUs, and custom accelerators — required to train and run large AI models at scale.

Why AI needs specialized hardware

Training a large language model means performing an enormous number of matrix multiplications over billions of parameters. CPUs have a small number of cores built for fast sequential execution, so they are poorly suited to the task. GPUs were originally designed for parallel graphics rendering, and the same massively parallel architecture that renders video game frames turns out to be highly efficient for matrix math. NVIDIA recognized this early and built CUDA, a software layer that made GPUs programmable for general scientific computing, long before AI demand existed at scale.
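To make the parallelism argument concrete, here is a minimal Python sketch (illustrative only, not how any GPU library is implemented). It shows that each cell of a matrix product depends only on one row and one column of the inputs, so cells can be computed in any order or all at once, and it counts the arithmetic involved. The function names matmul_cell, matmul, and matmul_flops are hypothetical names chosen for this example.

```python
def matmul_cell(a, b, i, j):
    """Compute one output cell: the dot product of row i of a and column j of b."""
    return sum(a[i][p] * b[p][j] for p in range(len(b)))


def matmul(a, b):
    """Multiply two matrices given as lists of rows.

    Every cell is computed independently of every other cell, which is
    why the work maps well onto thousands of parallel GPU threads.
    """
    rows, cols = len(a), len(b[0])
    return [[matmul_cell(a, b, i, j) for j in range(cols)] for i in range(rows)]


def matmul_flops(m, k, n):
    """Operation count for an (m x k) times (k x n) product.

    Each of the m * n output cells needs k multiplies and k adds
    (counting one add per term, the usual convention).
    """
    return 2 * m * k * n


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
    print(matmul_flops(4096, 4096, 4096))  # 137438953472
```

A single 4096 by 4096 product already takes roughly 137 billion operations, and a training run repeats products of this kind many times per step.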

CUDA's head start is a large part of why NVIDIA captured an estimated 70–80% of the AI training chip market. AMD is the most credible alternative on the hardware side; Intel has struggled to compete. Google builds its own TPUs (Tensor Processing Units), which it uses internally and also offers through Google Cloud. A handful of AI chip startups (Cerebras, Groq, Graphcore) have built specialized architectures with theoretical advantages in specific workloads but limited market penetration.

Training vs. inference

AI compute divides into two distinct workloads with different requirements. Training is the process of adjusting model weights across a massive dataset: it is computationally intensive, runs for days or weeks, and benefits from as many high-memory GPUs as can be clustered together. Inference is the process of running a trained model to generate outputs. Each request needs far less compute than a training run, but a production service must sustain very large request volumes and is extremely sensitive to latency.
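A back-of-the-envelope estimate gives a rough sense of why training runs take days or weeks even on large clusters. The sketch below assumes the commonly cited rule of thumb that training takes about 6 floating-point operations per parameter per training token. That rule, the function name estimate_training_days, and every number in the usage example are assumptions for illustration, not figures from this article.

```python
SECONDS_PER_DAY = 86_400


def estimate_training_days(params, tokens, gpus, flops_per_gpu, utilization):
    """Estimate wall-clock training time in days.

    Assumes total training compute of roughly 6 * params * tokens FLOPs
    (a widely used approximation, not an exact figure).
    `flops_per_gpu` is peak FLOP/s per GPU; `utilization` is the fraction
    of that peak actually sustained (greater than 0, at most 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    total_flops = 6 * params * tokens
    cluster_flops_per_second = gpus * flops_per_gpu * utilization
    return total_flops / cluster_flops_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical run: 1B parameters, 20B tokens, one GPU at 1e14 peak FLOP/s,
    # sustaining 50% of peak.
    days = estimate_training_days(1e9, 20e9, gpus=1, flops_per_gpu=1e14, utilization=0.5)
    print(f"{days:.1f} days")
```

The estimate treats scaling across GPUs as perfectly linear, which real clusters do not achieve, so it is best read as a lower bound on wall-clock time.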

The economics differ accordingly. Training clusters are optimized for throughput, tolerate high latency, and can run as scheduled batch jobs rather than in real time. Inference infrastructure is optimized for low latency, high availability, and cost per output token. As models mature and usage scales, the industry's main problem is shifting from training to inference, which has significant implications for what hardware gets purchased and where it gets deployed.
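Cost per output token, the metric inference infrastructure is tuned for, reduces to simple arithmetic. The following is a minimal sketch under stated assumptions: the function name cost_per_million_tokens is hypothetical, and the hourly cost and throughput in the usage example are made-up inputs rather than measured figures.

```python
def cost_per_million_tokens(gpu_hourly_cost, tokens_per_second):
    """Serving cost in dollars per one million output tokens for one GPU.

    `gpu_hourly_cost` is the all-in hourly cost of the GPU;
    `tokens_per_second` is its sustained output throughput summed
    across all concurrent requests.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: a GPU costing $2.00 per hour that sustains 1,000 tokens/s.
    baseline = cost_per_million_tokens(2.00, 1000)
    # Doubling throughput on the same hardware halves the cost per token.
    doubled = cost_per_million_tokens(2.00, 2000)
    print(f"${baseline:.4f} vs ${doubled:.4f} per million tokens")
```

Because the hourly cost is fixed, any gain in sustained throughput (batching, better kernels, inference-specific silicon) translates directly into a lower cost per token, which is why inference hardware choices are so sensitive to this number.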

The supply chain constraint

NVIDIA's data center GPUs are manufactured by TSMC in Taiwan on advanced process nodes (4nm-class for the H100 and B200 generations). The supply chain from chip design to finished GPU involves TSMC's foundry capacity, high-bandwidth memory (HBM) bought from suppliers such as SK Hynix and Micron, advanced packaging technology, and global logistics. A disruption at any point, whether a Taiwan Strait conflict, TSMC fab capacity limits, or memory supply constraints, propagates through to AI infrastructure buildout timelines worldwide.

US export controls on advanced AI chips to China have added a geopolitical dimension. Chips above specified performance and performance-density thresholds are restricted from export. This has accelerated Chinese domestic chip development (Huawei's Ascend series being the primary example) and has bifurcated the global AI infrastructure market into two supply chains with limited interoperability.

What comes next

The GPU architecture that dominates today, the NVIDIA H100/H200/B200 lineage, will not be the final form of AI compute. Several architectural shifts are underway: more on-package memory (HBM4 and beyond), tighter integration of memory and compute, optical interconnects for inter-chip communication, and specialized silicon for inference workloads. The companies that get the hardware-software co-design right for the next generation of models will have a significant cost and performance advantage.

The same hardware that trains models also powers the generative AI applications now reaching mainstream users, tying consumer-facing growth directly to the compute supply chain.

For anyone building on AI infrastructure today, the key question is how dependent their strategy is on current hardware economics — and how it changes if those economics shift significantly, as they almost certainly will. At The Best Blog Ever, we track how these hardware shifts reshape the broader AI economy.

Open Questions

Will inference-optimized chips displace general-purpose GPU clusters as the dominant form of AI compute, or will training demand keep GPUs central to the stack?
Can Chinese domestic alternatives (Huawei Ascend and successors) close the performance gap with NVIDIA fast enough to matter for frontier model development, or does the export control regime permanently bifurcate capability trajectories?
At what point does energy cost — not chip cost — become the primary constraint on AI compute scaling, and which geographies or energy sources become structurally advantaged as a result?

Part of the knowledge graph at The Best Blog Ever — reference definitions for ideas that matter.

Related Concepts

Large Language Models
Energy Economics
Capital Allocation
Generative AI

Frequently Asked

Why does AI require specialized hardware like GPUs instead of CPUs?

Training and running large AI models involves billions of parallel matrix multiplications, which standard CPUs execute too slowly. GPUs and TPUs use a massively parallel architecture that is far more efficient for this kind of math.

What is the difference between AI training and inference compute?

Training adjusts a model's weights across a massive dataset and is compute-intensive, tolerating high latency over days or weeks. Inference runs the trained model to generate outputs and is optimized for low latency, high throughput, and cost per token.

Why is NVIDIA so dominant in the AI chip market?

NVIDIA captured an estimated 70 to 80 percent of the AI training chip market by pairing its parallel GPUs with CUDA, a software layer that made them programmable for general computing long before AI demand existed at scale.