Key Points
- Huawei (华为) has unveiled the Pangu Ultra MoE (盘古Ultra MoE) model with 718 billion parameters, trained entirely on the domestic Shengteng (昇腾) AI Compute Platform, showcasing significant progress in domestic computing power and large-scale AI training.
- Huawei’s (华为) Pangu (盘古) team developed innovative architectures like DSSN and TinyInit, along with methods like EP loss optimization and Dropless training, enabling stable training of massive MoE models and achieving a significant jump in MFU (Model FLOPs Utilization) from 30% to 41% on ten-thousand-card clusters.
- The recently released Pangu Pro MoE (盘古Pro MoE) large model, with 72 billion parameters (only 16 billion activated), demonstrated strong performance rivaling models with hundreds of billions of parameters and ranked highly on the SuperCLUE evaluation list.
- DeepSeek (深度求索) continues to advance its DeepSeek-R1 model, which previously outperformed Western competitors on benchmarks at a reportedly far lower cost (a few million USD), and its upgraded DeepSeek-V3-0324 model was rated the highest-scoring non-reasoning model by an overseas evaluation agency.
- Tencent (腾讯) outlined its comprehensive AI strategy, with its Hunyuan TurboS (混元TurboS) model reaching the global top eight on Chatbot Arena and integrating with several popular Tencent (腾讯) applications, demonstrating collaboration within the Chinese AI ecosystem.

Get ready, because China’s AI scene is absolutely buzzing, and Huawei (华为) just unveiled a game-changer that signals a monumental leap in domestic computing power and sophisticated AI model development. This isn’t just another update; it’s a potential booster shot for the entire Chinese AI industry.
Heads up, tech watchers and investors: on May 30, word got out from Huawei itself. They’ve pushed the boundaries big time in Mixture of Experts (MoE) model training.
Meet the Pangu Ultra MoE (盘古Ultra MoE) model.
This beast boasts a staggering 718 billion parameters.
Think about that – it’s a near-trillion-parameter MoE model, and here’s the kicker: it was trained *entirely* on the Shengteng (昇腾) AI Compute Platform.
Huawei didn’t just drop the model; they also released a detailed technical report on the Pangu Ultra MoE architecture and its training methods.
This isn’t just flexing; it’s a transparent showcase of Shengteng’s (昇腾) incredible progress in handling ultra-large-scale MoE training.
Industry insiders are saying this is huge.
The launch of Huawei’s Pangu Ultra MoE and Pangu Pro MoE (盘古Pro MoE) model series isn’t just about new AI toys.
It’s solid proof that Huawei has nailed the entire pipeline: from independent, controllable training on domestic computing power to domestic models that perform at the top tier. Their cluster training systems? Industry-leading.
This is a massive validation for the independent innovation capabilities of China’s domestic AI infrastructure.
It’s like a “reassurance pill,” as some put it, for the future growth of China’s artificial intelligence industry.
Mind-Blowing Breakthroughs: Domestic Compute Meets Domestic AI Models
Let’s be real: training ultra-large scale and highly sparse Mixture of Experts (MoE) models is incredibly tough. Keeping them stable during that intense training process? Even harder.
So, how did the Huawei Pangu (盘古) team crack this?
They got innovative with both model architecture and training methods, successfully training a near-trillion-parameter MoE model on their own Shengteng (昇腾) platform.
Inside the Pangu (盘古) Model Architecture
- The Pangu (盘古) team introduced the Depth-Scaled Sandwich-Norm (DSSN) architecture for training stability.
- They also developed the TinyInit small-initialization method.
- These innovations allowed for long-term, stable training on the Shengteng (昇腾) platform, chugging through over 18TB of data. That’s a lot of data.
- But wait, there’s more: they brought in the EP loss load-balancing optimization method. This clever design keeps the experts’ workloads well balanced and sharpens their domain-specific skills (a rough sketch of this kind of balancing loss follows this list).
- Pangu Ultra MoE (盘古Ultra MoE) also leverages the industry-leading MLA (Multi-head Latent Attention) and MTP (Multi-Token Prediction) architectures.
- Crucially, it uses a Dropless training strategy during both pre-training and post-training, meaning tokens aren’t discarded when experts fill up. This helps achieve that sweet spot between model effectiveness and efficiency for these massive MoE architectures.
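The exact DSSN, TinyInit, and EP loss formulations live in Huawei’s technical report, but the general pattern they build on is easy to illustrate: route every token to a few experts with no capacity drops, and add an auxiliary loss that penalizes uneven expert usage. Here’s a minimal, purely illustrative PyTorch-style sketch; the class name, the Switch-Transformer-style balance loss, and every hyperparameter are assumptions for illustration, not Pangu’s actual implementation.

```python
# Illustrative only: a generic "dropless" top-k MoE layer with an auxiliary
# load-balancing loss. This is NOT Huawei's DSSN or EP loss (those are in the
# Pangu technical report); it just shows the family of techniques involved.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DroplessTopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: [num_tokens, d_model]
        probs = self.router(x).softmax(dim=-1)           # [tokens, n_experts]
        weights, idx = probs.topk(self.top_k, dim=-1)    # every token is routed: no capacity limit, no drops
        weights = weights / weights.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            tok, slot = (idx == e).nonzero(as_tuple=True)  # tokens that picked expert e
            if tok.numel():
                out[tok] += weights[tok, slot, None] * expert(x[tok])

        # Generic auxiliary balance loss (Switch-Transformer style): penalize
        # uneven expert usage so no expert is starved or overloaded.
        load = F.one_hot(idx, probs.shape[-1]).float().sum(dim=(0, 1))
        aux_loss = probs.shape[-1] * (load / load.sum() * probs.mean(dim=0)).sum()
        return out, aux_loss

# Quick smoke test on random tokens.
y, aux = DroplessTopKMoE()(torch.randn(16, 512))
```

In a real system the per-expert loop would be replaced by expert-parallel all-to-all communication across devices, which is exactly where load balancing starts to matter for throughput.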
Revamping Training Methods on Shengteng (昇腾)
The Huawei (华为) team also pulled back the curtain on some key training technologies:
- They revealed, for the first time, how they efficiently integrated high-sparsity Mixture of Experts (MoE) Reinforcement Learning (RL) post-training frameworks on their Shengteng CloudMatrix 384 supernodes. This ushers RL post-training into the supernode cluster era – a big step up.
- Building on tech they released in early May, the team pushed another round of upgrades in less than a month. Talk about rapid iteration!
- These upgrades include:
- An adaptive pipeline masking strategy specifically tailored for Shengteng (昇腾) hardware. This optimizes operator execution, cuts down on Host-Bound issues, and improves the masking (overlapping) of EP (Expert Parallelism) communication.
- Development of an adaptive memory optimization strategy.
- Data reordering to achieve Attention load balancing between DP (Data Parallelism) instances.
- Shengteng-affinity operator optimizations.
- The result of these tech boosts? A significant jump in pre-training MFU (Model FLOPs Utilization) for ten-thousand-card clusters, soaring from 30% to 41%. That’s a massive efficiency gain (a quick sketch of how MFU is computed follows this list).
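MFU is simply the fraction of a cluster’s theoretical peak FLOP/s that actually goes into useful model computation. The 30% and 41% figures are Huawei’s; everything in the sketch below (the activated parameter count, throughput, and per-chip peak) is a placeholder assumption used only to show how the metric is defined, not real Shengteng or Pangu specs.

```python
# Back-of-the-envelope MFU (Model FLOPs Utilization) calculation.
# All numbers below are hypothetical placeholders, not Huawei's published specs.

def mfu(active_params, tokens_per_sec, num_chips, peak_flops_per_chip):
    """MFU = useful model FLOP/s divided by the cluster's theoretical peak FLOP/s.

    Uses the common ~6 * N_active FLOPs-per-token rule of thumb for one
    forward + backward pass (attention FLOPs ignored for simplicity).
    """
    model_flops_per_sec = 6 * active_params * tokens_per_sec
    peak_flops_per_sec = num_chips * peak_flops_per_chip
    return model_flops_per_sec / peak_flops_per_sec

# Hypothetical example: 39B activated parameters, 5M tokens/s cluster
# throughput, 10,000 cards at 300 TFLOP/s peak each.
print(f"MFU ≈ {mfu(39e9, 5e6, 10_000, 300e12):.0%}")   # -> MFU ≈ 39%
```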
And don’t forget the recently released Pangu Pro MoE (盘古Pro MoE) large model.
With just 72 billion total parameters and only 16 billion activated parameters, it’s punching way above its weight class thanks to an innovative design for dynamically activating expert networks.
This “small model achieves big results” strategy is so effective it’s rivaling models with hundreds of billions of parameters.
It even snagged a share of first place domestically among large models with under 100 billion parameters on the May 2025 edition of the authoritative SuperCLUE large model evaluation list. Pretty impressive.
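To put the “72 billion total, 16 billion activated” figures in context, here’s a quick back-of-the-envelope sketch of the MoE activation arithmetic. The shared/expert split is an assumption chosen only so the totals match the headline numbers; the article doesn’t disclose Pangu Pro MoE’s actual expert layout.

```python
# Purely illustrative arithmetic for total vs. activated parameters in an MoE
# model. The shared/expert split below is invented to match the headline
# numbers; it is not Pangu Pro MoE's real configuration.

def moe_param_counts(shared_params, n_experts, params_per_expert, top_k):
    total = shared_params + n_experts * params_per_expert
    activated = shared_params + top_k * params_per_expert  # only top_k experts run per token
    return total, activated

# Hypothetical split: 8B shared (attention, embeddings), 64 experts of 1B
# parameters each, 8 experts activated per token.
total, activated = moe_param_counts(8e9, 64, 1e9, 8)
print(f"total ≈ {total/1e9:.0f}B, activated per token ≈ {activated/1e9:.0f}B")
# -> total ≈ 72B, activated per token ≈ 16B
```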
So, what’s the core takeaway from Huawei’s (华为) big reveal?
Industry analysts believe it proves that ultra-large scale sparse models (specifically, Mixture of Experts – MoE) capable of world-class performance can be efficiently and stably trained and optimized entirely on a domestic AI compute platform (Shengteng 昇腾).
This achieves “full-stack localization” and “full-process independent control.”
We’re talking a closed-loop system: from hardware to software, training to optimization, basic research to engineering implementation – all while hitting industry-leading performance metrics.
This is a significant stride for China’s AI self-sufficiency.

The Domestic Large Model Scene is Popping Off
Huawei (华为) isn’t the only one making waves. The entire domestic large model space in China is buzzing with activity.
DeepSeek (深度求索) Keeps Pushing Boundaries
On May 28, news broke from DeepSeek that their DeepSeek-R1 model just got a minor trial upgrade.
You can test it out on their official webpage, app, and mini-program (just look for “Deep Thinking”). API interfaces and usage methods are staying the same, making it easy for devs.
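Because the interfaces are unchanged, existing integrations keep working as-is. As a minimal sketch of what that looks like, assuming DeepSeek’s documented OpenAI-compatible endpoint and the deepseek-reasoner model name (verify both against the official docs):

```python
# Minimal sketch of calling DeepSeek-R1 through its OpenAI-compatible API.
# The base URL and model name reflect DeepSeek's public documentation at the
# time of writing; check the official docs before relying on them.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",   # issued via DeepSeek's developer platform
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-reasoner",         # the R1 "Deep Thinking" model
    messages=[{"role": "user", "content": "Explain Mixture of Experts in one paragraph."}],
)
print(resp.choices[0].message.content)
```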
This Hangzhou-based startup already wowed the global tech community back in January when it released the DeepSeek-R1 AI model.
The R1 model actually outperformed Western competitors on multiple standardized benchmarks.
And the kicker? Its training cost was reportedly only a few million U.S. dollars.
This news sent ripples through global tech stocks, making investors wonder if leading companies really need to keep pouring billions into building AI services.
This was DeepSeek’s latest move since late March.
On March 25, they officially announced a minor version upgrade for their V3 model, detailing improvements in the new DeepSeek-V3-0324 model.
Key enhancements included better reasoning, frontend development support, Chinese writing, and Chinese search capabilities.
At that time, according to an overseas professional AI model evaluation agency, the new V3 model was the highest-scoring non-reasoning model out there, even surpassing xAI’s Grok 3 and OpenAI’s GPT-4.5 (preview). That’s a serious claim to fame.
Tencent (腾讯) Unveils its Grand AI Strategy
Then there’s Tencent.
On May 21, at the 2025 Tencent Cloud (腾讯云) AI Industry Application Summit, Tencent laid out its full large model strategy.
Their large model matrix products got a comprehensive upgrade – from the self-developed Hunyuan (混元) large model to AI cloud infrastructure, agent development tools, knowledge bases, and scenario-specific applications.
Tencent is clearly focused on refining its tech and product lineup to create genuinely “easy-to-use AI” for both businesses and everyday users in this large model era.
In the fast-paced global race for large model dominance, Tencent Hunyuan (腾讯混元) is making quick, iterative progress, constantly leveling up its technical capabilities.
Dowson Tong (Tang Daosheng 汤道生), Senior Executive Vice President of Tencent Group and CEO of Cloud and Smart Industries Group, dropped some impressive news at the summit.
The Hunyuan TurboS (混元TurboS) model has climbed into the global top eight on Chatbot Arena, a globally recognized and authoritative large language model evaluation platform.
Domestically, that puts it second only to DeepSeek (深度求索).
Plus, Hunyuan TurboS also cracked the global top ten for STEM capabilities like coding and mathematics.
And just recently, on May 29, several AI applications under the Tencent (腾讯) umbrella announced they’re integrating with DeepSeek-R1-0528.
These include popular apps like Tencent Yuanbao (腾讯元宝), ima, Sogou Input Method (搜狗输入法), QQ Browser (QQ浏览器), Tencent Docs (腾讯文档), Tencent Maps (腾讯地图), and Tencent Lexiang (腾讯乐享).
Users can now select the DeepSeek-R1 (深度求索R1) “Deep Thinking” model within these products to tap into its latest deep thinking, programming, and long-text processing prowess.
This shows an interesting dynamic of competition and collaboration within the Chinese tech ecosystem.
This flurry of announcements and advancements underscores a vibrant and rapidly evolving Chinese AI industry, one that’s increasingly demonstrating its capacity for cutting-edge, independent innovation in large language models and the crucial underlying domestic computing power.
