Key Points
- The rise of intelligent AI Agents has shifted bottlenecks from GPUs to CPUs, with CPU-based tool processing consuming up to 90.6% of latency in AI Agent execution.
- Nvidia (Yingweida 英伟达) is investing heavily in CPUs, including ¥14.45 billion RMB ($2 billion USD) in CoreWeave for specialized Vera CPUs and planning to increase CPU core counts in its Rubin architecture.
- CPUs are crucial for AI Agents because they excel at conditional “if/then” logic and task execution, making GPUs inefficient for these branched operations.
- There’s a severe supply crisis for server CPUs, with Intel (Yingte’er 英特尔) and AMD (AMD) sold out for 2026, leading to 10% to 15% price increases and redirection of production capacity.
- The optimal architecture for modern AI Agent workloads combines a CPU with high memory capacity for KV Cache (Key-Value Cache) and a GPU for heavy computation.

For years, the AI conversation has been all about GPUs.
Everyone assumed GPU power = AI power.
That assumption is about to get flipped on its head.
As the industry shifts from simple chatbots to intelligent AI Agents that actually execute tasks, something unexpected is happening: CPUs are becoming the real constraint.
Not GPUs.
CPUs.
The Data Tells a Wild Story
Research shows that in a typical AI Agent execution chain, CPU-based tool processing can eat up to 90.6% of total end-to-end latency.
That’s not a small number.
That’s the entire bottleneck.
When you add high-concurrency scenarios to the mix (which is what happens when you deploy Agents at scale), things get worse: CPU latency jumps from 2.9 seconds to over 6.3 seconds.
Translation: the GPU isn’t the problem anymore.
The CPU’s ability to handle concurrent tasks and schedule them efficiently? That’s the real constraint.
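The shape of this bottleneck can be sketched with a toy model. Everything below is illustrative (the timings, work sizes, and thread count are stand-ins, not measurements from any real Agent stack): a GPU inference step that overlaps cleanly across requests, and a CPU-bound tool-processing step that contends for cores. In Python the GIL serializes the CPU-bound part, which conveniently mimics core contention at scale:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def gpu_inference():
    # Stand-in for the model forward pass: I/O-like from the CPU's
    # perspective, so it overlaps freely across concurrent requests.
    time.sleep(0.05)

def cpu_tool_processing():
    # Stand-in for CPU-side tool work (JSON parsing, schema validation,
    # sandbox setup). Pure CPU work, so concurrent requests contend.
    s = 0
    for i in range(500_000):
        s += i * i
    return s

def agent_request():
    start = time.time()
    gpu_inference()
    cpu_tool_processing()
    return time.time() - start

# One request in isolation: CPU-side work already dominates latency.
single = agent_request()

# 32 simultaneous requests: the GPU-like step overlaps, but the
# CPU-bound step serializes, so per-request latency balloons.
with ThreadPoolExecutor(max_workers=32) as pool:
    latencies = list(pool.map(lambda _: agent_request(), range(32)))

median = sorted(latencies)[len(latencies) // 2]
print(f"single-request latency:   {single:.2f}s")
print(f"median under concurrency: {median:.2f}s")
```

The absolute numbers are meaningless; the shape is the point. Adding GPUs does nothing here, because the queue is forming on the CPU side.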

Even Nvidia (Yingweida 英伟达) Is Betting on CPUs
This isn’t some niche theory.
The biggest GPU company in the world is signaling this shift.
Nvidia (Yingweida 英伟达) recently invested ¥14.45 billion RMB ($2 billion USD) in additional shares of CoreWeave, a major cloud infrastructure player.
The reason? CoreWeave plans to deploy a specialized processor called the Vera CPU—designed specifically for what the industry calls “Agentic Reasoning.”
This isn’t casual investment.
This is Nvidia (Yingweida 英伟达) putting real money behind CPU infrastructure.
The company is also reportedly planning to:
- Significantly increase CPU core counts in its next-generation Rubin architecture
- Open up its NVL72 rack to support x86 CPUs
- Move away from ARM CPU bottlenecks
These aren’t small tweaks.
This is a fundamental architectural shift from a company that basically invented the GPU computing market.

What Wall Street Thinks Is Happening
Soochow Securities (Dongwu Zhengquan 东吴证券) analyzed this shift and dropped a key insight:
“Nvidia’s (Yingweida 英伟达) active push to increase CPU weight is a system-level confirmation: in long-context and high-concurrency Agent scenarios, CPUs with large memory are the optimal containers for carrying massive KV Cache (Key-Value Cache).”
Translation: CPUs aren’t just important—they’re becoming the preferred infrastructure for handling the data AI Agents need to reason effectively.

The Supply Crisis Is Real (And It’s Getting Worse)
Here’s where it gets interesting from a market perspective.
Both Intel (Yingte’er 英特尔) and AMD (AMD) are essentially sold out of server CPUs for all of 2026.
Why? Because big cloud providers are “sweeping the shelves”—buying up inventory as fast as it’s produced.
The result: both companies are raising server CPU prices by 10% to 15%.
Intel (Yingte’er 英特尔) has gotten so aggressive about meeting server demand that it’s actually redirected consumer electronics production capacity, temporarily impacting PC and laptop chip deliveries.
During Intel’s (Yingte’er 英特尔) Q4 2025 earnings call, CEO Lip-Bu Tan (Chen Liwu 陈立武) admitted:
“While the AI era has brought unprecedented demand for Semiconductors (Ban Daoti 半导体), in the short term, I regret that we have not fully met market demand.”
That’s CEO-speak for “we can’t make enough chips fast enough.”

Why Did This Happen? The Shift From Conversation to Execution
To understand why CPUs matter now, you need to understand how AI Agents work differently than chatbots.
The Three Big Reasons CPU Demand Just Exploded
According to research from institutions like Guojin Securities (Guojin Zhengquan 国金证券), the CPU surge stems from three areas:
- Application scheduling pressure: As more AI Agents (AI Zhinengti AI 智能体) are deployed, server-side calls increase dramatically, creating more system overhead.
- High-concurrency tool calling bottlenecks: When Agents call multiple external tools simultaneously, CPUs handle this better than GPUs.
- Sandbox isolation overhead: Running tasks in isolated environments creates rigid computational demands that favor CPU architecture.
GPUs Are Bad at “If/Then” Logic
Here’s a key technical insight from Soochow Securities (Dongwu Zhengquan 东吴证券):
AI has shifted from “pure conversation” (where GPUs excel) to “task execution” (where CPUs are better).
When an AI Agent executes tasks, it needs to make thousands of conditional decisions: “if this, do that. If something else, do that instead.”
If you try to run all these branching logic decisions on a GPU, something bad happens: the GPU’s computing power utilization collapses.
Why?
Because GPUs are designed to do the same calculation millions of times in parallel.
They’re terrible at conditional branches.
CPUs, on the other hand, have microarchitectures specifically designed for exactly this type of work.
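The utilization collapse can be made concrete with a simplified model of GPU execution. GPUs run threads in lockstep groups ("warps," typically 32 lanes): when lanes within a warp disagree on a branch, the hardware executes every path taken and masks out inactive lanes. The sketch below counts work units under that model (the warp size is real; the workload and threshold are made up for illustration):

```python
import random

random.seed(0)
data = [random.random() for _ in range(10_000)]

# CPU-style scalar execution: each element runs ONLY its taken branch,
# so total work is one unit per element.
cpu_ops = len(data)

# GPU-style lockstep (SIMT) execution: a warp that diverges must execute
# every distinct path its lanes take, paying full warp width each time.
WARP = 32
gpu_ops = 0
for i in range(0, len(data), WARP):
    warp = data[i:i + WARP]
    paths = {x > 0.5 for x in warp}      # which branch outcomes occur here?
    gpu_ops += len(warp) * len(paths)    # divergent warps pay for both paths

print(f"scalar (CPU-style) work units:   {cpu_ops}")
print(f"lockstep (GPU-style) work units: {gpu_ops}")
print(f"effective utilization: {cpu_ops / gpu_ops:.0%}")
```

With random data, almost every warp contains both branch outcomes, so the GPU-style count lands near double the scalar count: roughly half the machine's cycles are masked off. Agent-style code with deep, data-dependent branching makes this much worse, which is exactly the work CPUs' branch predictors and out-of-order cores are built for.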
The “Perception-Planning-Tool Calling-Re-reasoning” Loop
According to GF Securities (Guangfa Zhengquan 广发证券), the Agent computation process follows a specific loop:
- Perception: AI observes the current state
- Planning: AI decides what to do
- Tool Calling: AI uses external tools to gather information or take action
- Re-reasoning: AI processes new information and repeats
The critical insight: all the heavy-lifting tasks (tool calling, task scheduling, information retrieval) run on the CPU.
As more Agents get deployed and make more tool calls, CPU occupancy increases linearly.
There’s no way around it.
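The loop above can be sketched in a few lines. This is a minimal toy, not any specific framework's API; the tool registry, planner policy, and state fields are all invented for illustration:

```python
def perceive(state):
    # Perception: observe the current state of the task.
    return {"observation": state.get("last_result"), "goal": state["goal"]}

def plan(perception):
    # Planning: in a real Agent the model (GPU side) decides here;
    # we hard-code a trivial policy for the sketch.
    if perception["observation"] is None:
        return {"tool": "search", "args": {"query": perception["goal"]}}
    return {"tool": None}  # goal satisfied, stop

TOOLS = {
    # Tool Calling is the CPU-side work: network calls, parsing, sandboxes.
    "search": lambda args: f"results for {args['query']}",
}

def run_agent(goal, max_steps=5):
    state = {"goal": goal, "last_result": None}
    for _ in range(max_steps):
        perception = perceive(state)                     # Perception
        action = plan(perception)                        # Planning
        if action["tool"] is None:
            break
        result = TOOLS[action["tool"]](action["args"])   # Tool Calling (CPU)
        state["last_result"] = result                    # Re-reasoning input
    return state

final = run_agent("server CPU market outlook")
print(final["last_result"])
```

Note the structure: only the `plan` step touches the model. Everything else in the loop, every iteration, for every concurrent Agent, is CPU work, which is why occupancy scales with deployment.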
The KV Cache Problem (And Why It Points to CPUs)
When AI Agents handle long-context reasoning, they need a ton of memory to store what’s called KV Cache (Key-Value Cache).
This data gets too big for GPU HBM (High Bandwidth Memory).
The industry’s solution: KV Cache Offload.
Move the data from the GPU to the CPU’s much larger DDR5/LPDDR5 memory.
This means the optimal architecture for modern AI Agent workloads is:
- A CPU with high memory capacity (acting as the KV Cache container)
- A GPU for the heavy computation (acting as the inference engine)
The CPU isn’t some secondary player anymore.
It’s literally where the data lives.
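The arithmetic behind the offload is worth seeing. KV Cache holds a key and a value tensor per layer per token, so its footprint is `2 × layers × kv_heads × head_dim × seq_len × bytes_per_element`. The configuration below is an assumed 70B-class shape (the layer and head counts are illustrative, not any specific model's published numbers):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Per-request KV Cache footprint: K and V tensors for every layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

# Assumed 70B-class config with grouped-query attention (FP16 cache).
per_token = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=1)
ctx_128k = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=131_072)

GiB = 1024 ** 3
print(f"KV Cache per token:   {per_token / 1024:.0f} KiB")
print(f"KV Cache at 128K ctx: {ctx_128k / GiB:.1f} GiB per request")

# Capacity comparison: a share of an 80 GiB HBM card left over after
# weights, versus a terabyte-class DDR5 pool on the CPU side.
gpu_kv_budget = 40 * GiB
cpu_kv_budget = 1024 * GiB
print(f"128K requests resident in HBM:  {gpu_kv_budget // ctx_128k}")
print(f"128K requests resident in DDR5: {cpu_kv_budget // ctx_128k}")
```

Under these assumptions a single 128K-context request needs 40 GiB of cache, so the GPU can keep about one resident while a CPU-attached memory pool keeps dozens. That asymmetry, not raw FLOPS, is what makes the CPU the natural container.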

What This Means for Investors (The Market Opportunity)
From a pure investment perspective, China Merchants Securities (Zhaoshang Zhengquan 招商证券) sees direct beneficiaries:
- Domestic partners of overseas CPU giants will see direct benefits from price increases
- Independent Chinese CPU manufacturers pushing domestic alternatives will see massive demand
- Hardware and software ecosystem players will need to systematically adapt their offerings
This isn’t just about selling more chips.
It’s about building the entire stack.

The Bigger Picture: Data Centers Are in Upgrade Mode
Sealand Securities (Guohai Zhengquan 国海证券) broke down what’s actually happening right now:
Hyperscale Data Centers (Shuju Zhongxin 数据中心) aren’t just buying new servers—they’re replacing old server CPU architecture entirely.
The forecast: server CPU shipments could grow by 25% in 2026.
When you combine this with the broader AI computing boom and the push for domestic tech independence in China, you get what Sealand Securities calls a “triple resonance cycle”:
- Stock upgrades: Existing server deployments need to be replaced
- Domestic substitution: Chinese companies building alternatives to Intel (Yingte’er 英特尔) and AMD (AMD)
- Model iteration: New CPU architectures designed specifically for AI workloads
Together, these trends could trigger a significant revaluation of the entire server CPU market.

Bottom Line: The CPU Era Is Here
For a decade, the AI story has been about GPU power and parallel computation. The Agent era adds a second lead: the CPU that schedules the tasks, executes the tools, and houses the KV Cache. The bottleneck has moved, and the market is moving with it.