DeepSeek’s New AI is Playing the Long Game, Betting on China’s Homegrown Chips with UE8M0 FP8

Key Points

  • DeepSeek’s new AI model, DeepSeek-V3.1, uses UE8M0 FP8 precision, explicitly designed for domestic Chinese chips, signaling a strategic co-development with China’s semiconductor industry.
  • DeepSeek-V3.1 introduces an “agent era” with enhanced Programming Agent and Search Agent capabilities through post-training optimization.
  • A new “Hybrid Inference” mode allows the model to switch between “Quick Response” (deepseek-chat) and “Deep Thinking” (deepseek-reasoner), achieving comparable quality to previous models with a 20%-50% reduction in output tokens.
  • API costs are increasing, with input tokens now ¥4 RMB ($0.55 USD) and output tokens ¥12 RMB ($1.64 USD) per million tokens, both supporting a 128K context window.
  • V3.1 was trained on an additional 840 billion tokens, is open-source on Huggingface and ModelScope (Modata 魔搭), and now supports the Anthropic API format.

A new AI model from DeepSeek (Shendu Qiusuo 深度求索) is making waves, but the biggest story isn’t just about smarter agents or faster responses.

The real headline is a quiet technical detail with massive implications: the model utilizes UE8M0 FP8 scale parameter precision, a format explicitly designed for the next generation of domestic Chinese chips.

What’s the Big Deal with UE8M0 FP8 and Domestic Chips?

This is more than just a model update; it’s a strategic move.

On August 21, DeepSeek dropped a comment on its official WeChat (Weixin 微信) account that sent a clear signal to the tech world.

The company confirmed that the new UE8M0 FP8 format in their DeepSeek-V3.1 model is a forward-looking decision, specifically tailored for upcoming homegrown hardware.

This suggests a tight-knit, co-development strategy between China’s leading AI labs and its burgeoning semiconductor industry.

They’re not just building AI software; they’re building an entire, self-reliant ecosystem from the silicon up.
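For intuition on what UE8M0 actually is: the name describes an 8-bit, exponent-only number format (Unsigned, 8 Exponent bits, 0 Mantissa bits), used in microscaling FP8 schemes to store per-block scale factors as pure powers of two. Here’s a minimal sketch of how such a format behaves, assuming the conventional exponent bias of 127:

```python
import math

# UE8M0: no sign bit, 8 exponent bits, 0 mantissa bits.
# Each byte encodes a power of two, 2^(e - 127); in microscaling FP8
# schemes this is the per-block scale factor.
UE8M0_BIAS = 127

def encode_ue8m0(scale: float) -> int:
    """Round a positive scale factor to the nearest power of two
    and return its UE8M0 byte (0..255)."""
    assert scale > 0, "UE8M0 has no sign bit; scales must be positive"
    e = round(math.log2(scale)) + UE8M0_BIAS
    return max(0, min(255, e))  # clamp to the representable range

def decode_ue8m0(byte: int) -> float:
    """Recover the power-of-two scale from its UE8M0 byte."""
    return 2.0 ** (byte - UE8M0_BIAS)

# A scale of 8.0 round-trips exactly; 6.0 snaps to the nearest
# power of two (8.0) -- the cost of having no mantissa bits.
print(decode_ue8m0(encode_ue8m0(8.0)))  # 8.0
print(decode_ue8m0(encode_ue8m0(6.0)))  # 8.0
```

The appeal for hardware is that multiplying by such a scale is just an exponent add, which is cheap to implement in silicon.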

DeepSeek-V3.1 Key Innovations and Implications
| Feature/Innovation | Description | Implication |
| --- | --- | --- |
| UE8M0 FP8 precision | Explicitly designed for domestic Chinese chips. | Signals strategic co-development with China’s semiconductor industry, fostering a self-reliant AI ecosystem. |
| “Agent era” focus | Enhanced Programming and Search Agent capabilities through post-training optimization. | Model is better at performing complex tasks and tool use, moving beyond simple Q&A. |
| Hybrid Inference mode | Switches between “Quick Response” (deepseek-chat) and “Deep Thinking” (deepseek-reasoner). | Achieves quality comparable to previous models with a 20%-50% reduction in output tokens; highly efficient and versatile. |
| Increased API costs | Input: ¥4 RMB ($0.55 USD), output: ¥12 RMB ($1.64 USD) per million tokens. | Reflects the value of improved capabilities; supports a 128K context window. |
| Additional training data | Trained on an additional 840 billion tokens. | Enhances the model’s knowledge and performance without full retraining from scratch. |
| Open-source availability | Available on Huggingface and ModelScope (Modata 魔搭). | Fosters community development and integration of DeepSeek’s advanced models. |
| Anthropic API support | Supports the Anthropic API format for integration. | Increases usability and developer adoption via compatibility with the existing Claude Code framework. |

Meet DeepSeek-V3.1: The “Agent Era” Begins

While the hardware angle is the long-term play, the new DeepSeek-V3.1 model is a significant step forward in its own right.

DeepSeek is calling it “our first step toward the agent era.”

Here’s the breakdown of what’s new:

1. Beefed-Up Agent Capabilities

Through clever post-training optimization, V3.1 shows major gains in tasks that require action and tool use.

  • Programming Agent: Improved performance in coding-related tasks.
  • Search Agent: Better at using search tools to find and synthesize information.

Essentially, the model is getting much better at doing things, not just answering questions.

2. The “Hybrid Inference” Mode is Genius

This is one of the coolest features.

V3.1 has a single model that can operate in two modes:

  • Quick Response (deepseek-chat): For simple, fast answers.
  • Deep Thinking (deepseek-reasoner): For complex problems that require comprehensive analysis.

You can toggle between them with a “Deep Thinking” button in the app and on the web platform.
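On the API side, the two modes are selected by model name. Here’s a minimal sketch that builds (but doesn’t send) the HTTP request for each mode, assuming DeepSeek’s documented OpenAI-compatible endpoint and the `deepseek-chat` / `deepseek-reasoner` identifiers:

```python
import json
import urllib.request

# OpenAI-compatible chat-completions endpoint per DeepSeek's docs.
API_URL = "https://api.deepseek.com/chat/completions"

def build_request(prompt: str, api_key: str, deep_thinking: bool = False):
    """Build the HTTP request for either mode. The only difference
    between Quick Response and Deep Thinking is the model name."""
    model = "deepseek-reasoner" if deep_thinking else "deepseek-chat"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# Toggling modes is just a flag:
quick = build_request("What is 2+2?", "dummy-key", deep_thinking=False)
deep = build_request("Prove the AM-GM inequality.", "dummy-key", deep_thinking=True)
print(json.loads(quick.data)["model"])  # deepseek-chat
print(json.loads(deep.data)["model"])   # deepseek-reasoner
```

Sending the request is one `urllib.request.urlopen(...)` call away with a real API key; the response follows the OpenAI chat-completions schema.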

As one user on X (formerly Twitter) put it, “Hybrid inference is fantastic. Having a model that can switch between deep thinking and quick responses feels like the future of practical AI.”

This dual-mode approach is incredibly efficient, avoiding wasted compute on simple tasks while dedicating power where it’s needed most.

3. Higher Thinking Efficiency

The new “thinking” mode (V3.1-Think) is not only smart but also fast.

DeepSeek confirmed that it delivers response quality comparable to their previous R1-0528 model but with a 20%-50% reduction in the number of output tokens.

This is thanks to Chain-of-Thought (CoT) compression training, making the model more concise without sacrificing performance.

The non-thinking mode is also more efficient, giving you the same quality answers with a shorter output length compared to the V3 model.

DeepSeek-V3.1 Agent Capability Improvements
| Agent Type | Description of Improvement | Impact |
| --- | --- | --- |
| Programming Agent | Enhanced performance in coding-related tasks, including code generation, debugging, and review. | Increases efficiency for developers and automation of programming workflows. |
| Search Agent | Improved ability to use search tools, find relevant information, and synthesize complex data from multiple sources. | Provides more comprehensive and accurate answers, reducing manual research time. |
DeepSeek-V3.1 Hybrid Inference Modes
| Mode Name | Purpose | Key Characteristic |
| --- | --- | --- |
| Quick Response (deepseek-chat) | Simple, direct questions requiring fast answers. | Optimized for speed and conciseness. |
| Deep Thinking (deepseek-reasoner) | Complex problems requiring comprehensive analysis and problem-solving. | Engages in more thorough reasoning; quality comparable to previous models with 20%-50% fewer output tokens. |

The Price of Progress: API Costs are Going Up

With new features comes a new pricing structure.

DeepSeek is adjusting its API call costs, and the night-time discount will be discontinued starting September 6th.

Here’s the new pricing for the DeepSeek-V3.1 API:

  • Input (Uncached): ¥4 RMB ($0.55 USD) per million tokens (up from ¥2 RMB ($0.27 USD) for V3).
  • Input (Cached): ¥0.5 RMB ($0.07 USD) per million tokens.
  • Output: ¥12 RMB ($1.64 USD) per million tokens (up from ¥8 RMB ($1.09 USD) for V3).

Both API modes now support a generous 128K context window.
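To make the pricing concrete, here’s a quick back-of-envelope helper (my own illustration, using the per-million-token rates above):

```python
# V3.1 prices in RMB per million tokens: ¥4 uncached input,
# ¥0.5 cached input, ¥12 output.
PRICE_RMB = {"input": 4.0, "input_cached": 0.5, "output": 12.0}

def api_cost_rmb(input_tokens: int, output_tokens: int,
                 cached_input_tokens: int = 0) -> float:
    """Estimate the cost of one call in RMB under the V3.1 price list."""
    uncached = input_tokens - cached_input_tokens
    return (uncached * PRICE_RMB["input"]
            + cached_input_tokens * PRICE_RMB["input_cached"]
            + output_tokens * PRICE_RMB["output"]) / 1_000_000

# e.g. a 100K-token prompt (half of it cache hits) with a 10K-token answer:
print(round(api_cost_rmb(100_000, 10_000, cached_input_tokens=50_000), 3))  # 0.345
```

Output tokens dominate the bill at ¥12 per million, which is where the 20%-50% token reduction from CoT compression translates directly into savings.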

DeepSeek-V3.1 API Pricing Structure
| API Usage Type | Cost per Million Tokens (RMB) | Cost per Million Tokens (USD, approx.) | Notes |
| --- | --- | --- | --- |
| Input (uncached) | ¥4 | $0.55 | Up from ¥2 for V3 |
| Input (cached) | ¥0.5 | $0.07 | New tier for cached inputs |
| Output | ¥12 | $1.64 | Up from ¥8 for V3 |

More Training, Open Source, and a Nod to Anthropic

DeepSeek isn’t keeping all this progress to itself.

The company shared a few more important updates:

  • More Data: The V3.1 base model was trained on an additional 840 billion tokens on top of the original V3 foundation.
  • Open Source: Both the base model and the post-training model are now available on Huggingface and ModelScope (Modata 魔搭) for the community to use and build upon.
  • Anthropic API Support: In a smart move “to meet user demand,” DeepSeek now supports the Anthropic API format. This means developers can easily integrate V3.1’s power into the Claude Code framework.

DeepSeek-V3.1 Additional Updates Summary
  • Training Data: An additional 840 billion tokens were used to train the V3.1 base model, building upon the original V3 foundation.
  • Open-Source Availability: Both the base model and the post-training model are now open-source on Huggingface and ModelScope (Modata 魔搭), promoting community access and collaboration.
  • Anthropic API Support: DeepSeek-V3.1 now supports the Anthropic API format, allowing seamless integration with the Claude Code framework to meet developer demand.
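As an illustration of what that compatibility means in practice, here’s a sketch that builds (but doesn’t send) an Anthropic Messages-format request aimed at DeepSeek. The endpoint path and header set are assumptions modeled on Anthropic’s API conventions; check DeepSeek’s docs for the authoritative values:

```python
import json
import urllib.request

# Assumed Anthropic-compatible endpoint on DeepSeek's side; verify
# against DeepSeek's documentation before relying on it.
ANTHROPIC_STYLE_URL = "https://api.deepseek.com/anthropic/v1/messages"

def build_messages_request(prompt: str, api_key: str,
                           model: str = "deepseek-chat"):
    """Build an Anthropic Messages-format request (not sent here).

    The payload shape (model, max_tokens, messages) and the
    x-api-key / anthropic-version headers follow Anthropic's format,
    which is what lets Claude Code tooling talk to DeepSeek."""
    payload = {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ANTHROPIC_STYLE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
        },
    )

req = build_messages_request("Refactor this function for clarity.", "dummy-key")
print(json.loads(req.data)["model"])  # deepseek-chat
```

Because the wire format matches, existing Anthropic-oriented clients only need the base URL and API key swapped out.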

The Takeaway

DeepSeek-V3.1 is a powerful iterative update that pushes the boundaries of AI agent capabilities and efficiency.

But the real story is the strategic alignment with future hardware.

By optimizing for China’s next-generation processors with UE8M0 FP8 precision, DeepSeek isn’t just building a model; it’s building a moat and signaling a future where Chinese AI runs best on Chinese silicon.
