Huawei’s Pangu AI Smashes Records: 718 Billion Parameters on Domestic Silicon Signals New Era for China’s AI Industry

Key Points

  • Huawei’s Pangu Ultra MoE: Rolled out a large Mixture of Experts (MoE) model with a staggering 718 billion parameters, trained end-to-end on their domestic Ascend (Shēngténg 昇腾) AI Computing Platform.
  • Full-Stack Domestic Practice: The Pangu Ultra MoE demonstrates Huawei’s achievement of a full-process, self-controlled training practice combining domestic hardware (Ascend) and domestically developed models with industry-leading performance.
  • Training Innovations: Huawei successfully trained the complex MoE model on Ascend using methods like a Depth-Scaled Sandwich-Norm (DSSN) architecture, lifting MFU (Model FLOPs Utilization) from 30% to 41% in ten-thousand-card cluster training.
  • DeepSeek and Tencent’s Contributions: DeepSeek (Shēndù Qiúsuǒ 深度求索) offers high-performance, cost-effective models validated by high rankings, while Tencent (Téngxùn 腾讯) is building an “easy-to-use AI” ecosystem with their Hunyuan (Hùnyuán 混元) large model, which ranks in the global top eight on Chatbot Arena.
  • Synergy and Ecosystem Growth: The Chinese AI landscape is vibrant, with collaborations like Tencent integrating DeepSeek’s model into several popular applications, highlighting the growing strength and interconnectedness of the domestic AI industry.

Get ready, because China’s AI industry just got a massive shot in the arm, and it’s all thanks to a groundbreaking development in domestic computing power and AI models.

On May 30, 2025, word got out from Huawei (Huáwèi 华为) about some serious strides in the Mixture of Experts (MoE) model training arena.

They’ve officially rolled out a beast of a model: the Pangu (Pángǔ 盘古) Ultra MoE, flexing a staggering 718 billion parameters.

Think about that – this is a near-trillion-parameter MoE model, trained completely end-to-end on their own Ascend (Shēngténg 昇腾) AI Computing Platform.

Talk about a power move!

Huawei’s Pangu AI: Unveiling the Tech Behind the Trillion-Parameter Ambition

Huawei (Huáwèi 华为) didn’t just drop the model and walk away.

They also released a detailed technical report on the Pangu (Pángǔ 盘古) Ultra MoE model’s architecture and training methods.

This transparency showcases the massive leap in Ascend’s (Shēngténg 昇腾) performance for handling ultra-large-scale MoE training.


Why This Pangu AI Development is a Big Deal for China’s Tech Scene

Industry watchers are buzzing, and here’s why:

The launch of Huawei’s (Huáwèi 华为) Pangu (Pángǔ 盘古) Ultra MoE and the Pangu (Pángǔ 盘古) Pro MoE series isn’t just another AI model release.

It’s solid proof that Huawei (Huáwèi 华为) has nailed a full-process, self-controlled training practice.

This means they’re combining domestic computing power (hello, Ascend!) with domestically developed models.

And they’re not just doing it; they’re achieving industry-leading performance in cluster training systems.

This is a huge confidence booster, a “reassurance pill” if you will, for the entire trajectory of China’s artificial intelligence industry.

It firmly validates the independent innovation muscle of China’s domestic AI infrastructure.


Cracking the Code: How Huawei Trained Pangu Ultra MoE on Ascend

Let’s be real: training ultra-large scale and highly sparse MoE models is incredibly tough.

Keeping things stable during the training process? Often a nightmare.

But the Huawei (Huáwèi 华为) Pangu (Pángǔ 盘古) team tackled this head-on with some slick innovative designs in both model architecture and training methods.

The result? They successfully trained this near-trillion-parameter MoE model (with Pangu Ultra MoE at 718B parameters as a prime example) entirely on the Ascend (Shēngténg 昇腾) platform.

Innovations in Pangu AI Model Architecture

The Pangu (Pángǔ 盘古) team brought some cool new ideas to the table:

  • Depth-Scaled Sandwich-Norm (DSSN) stable architecture: This is key to keeping training stable at extreme scale (a rough sketch of the sandwich-norm idea follows this list).
  • TinyInit small initialization method: Helping achieve long-term stable training with over 18TB of data on the Ascend (Shēngténg 昇腾) platform. That’s a LOT of data.
  • EP loss load optimization method: This clever design ensures good load balancing among the ‘experts’ in the MoE model while also beefing up their domain-specific smarts.
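Curious what a “sandwich” normalization layout actually looks like? Here’s a minimal PyTorch-style sketch of the general idea: normalize the sublayer’s input, normalize its output, and add a scaled residual. The exact DSSN layout and depth-scaling rule aren’t detailed in this article, so treat the module (and its depth_scale factor) as an illustrative assumption, not Huawei’s implementation.

```python
import torch
import torch.nn as nn

class SandwichNormBlock(nn.Module):
    """Generic sandwich-norm sublayer: norm -> sublayer -> norm -> scaled residual.

    Illustrative only; the real DSSN placement and depth-scaling rule are not
    spelled out in this article.
    """

    def __init__(self, d_model: int, num_layers: int):
        super().__init__()
        self.pre_norm = nn.LayerNorm(d_model)   # norm before the sublayer
        self.post_norm = nn.LayerNorm(d_model)  # norm after the sublayer (the "sandwich")
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # Hypothetical depth-dependent residual scale: shrinking each layer's
        # contribution as the stack gets deeper is one common way to keep
        # very deep models stable.
        self.depth_scale = (2.0 * num_layers) ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.depth_scale * self.post_norm(self.ffn(self.pre_norm(x)))

# Quick smoke test with toy sizes.
block = SandwichNormBlock(d_model=64, num_layers=32)
print(block(torch.randn(2, 8, 64)).shape)  # torch.Size([2, 8, 64])
```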

Plus, Pangu (Pángǔ 盘古) Ultra MoE leverages the cutting-edge MLA (Multi-head Latent Attention) and MTP (Multi-Token Prediction) architectures.

They also adopted a Dropless training strategy for both pre-training and post-training phases.

This strategy hits the sweet spot between model effectiveness and efficiency for these massive MoE setups.
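To picture what a Dropless strategy means in code, here’s a toy PyTorch-style sketch of dropless top-k routing: every token gets processed by its k chosen experts, with no capacity cap and therefore no dropped tokens. The expert count, k, and gating details below are illustrative assumptions, not Pangu’s actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DroplessTopKMoE(nn.Module):
    """Toy dropless MoE layer: each token is handled by its top-k experts,
    with no capacity limit and hence no dropped tokens."""

    def __init__(self, d_model: int, num_experts: int = 4, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model), already flattened over batch and sequence.
        probs = F.softmax(self.gate(x), dim=-1)        # routing probabilities per expert
        weights, idx = probs.topk(self.k, dim=-1)      # each token keeps its k best experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = idx[:, slot] == e               # tokens whose slot-th pick is expert e
                if mask.any():                         # dropless: nothing is skipped for capacity
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Quick smoke test with toy sizes.
moe = DroplessTopKMoE(d_model=32, num_experts=4, k=2)
print(moe(torch.randn(10, 32)).shape)  # torch.Size([10, 32])
```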

Breakthroughs in Pangu AI Training Methods on Ascend Compute

On the training side, Huawei (Huáwèi 华为) pulled back the curtain on some key tech for the first time:

  • Efficient MoE Reinforcement Learning (RL) on Ascend CloudMatrix: They’ve worked out how to run the highly sparse MoE Reinforcement Learning (RL) post-training framework efficiently on the Ascend (Shēngténg 昇腾) CloudMatrix 384 supernode.
  • This effectively pushes RL post-training into the era of supernode clusters. Big step up!

And there’s more.

Building on their pre-training system acceleration tech released in early May, the team shipped another iterative upgrade in less than a month. Pretty fast, right?

This upgrade includes:

  • An adaptive pipeline masking strategy tailored for Ascend (Shēngténg 昇腾) hardware.
  • Further optimization of operator execution procedures.
  • Reduced host-bound overhead and improved EP (Expert Parallelism) communication masking.
  • Development of an adaptive memory optimization strategy.
  • Data reordering to achieve attention load balancing between DP (Data Parallelism) instances (a rough sketch of this idea follows this list).
  • Ascend (Shēngténg 昇腾)-friendly operator optimization.
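As a rough illustration of the data-reordering idea from the list above, the sketch below greedily assigns sequences to data-parallel ranks so each rank ends up with a similar attention workload (approximated here as quadratic in sequence length). Huawei’s actual strategy isn’t described in this article; this is a generic load-balancing example.

```python
import heapq

def balance_sequences(seq_lengths: list[int], num_dp_ranks: int) -> list[list[int]]:
    """Greedily assign sequences (by index) to DP ranks to balance attention work.

    Attention cost per sequence is approximated as length**2; a generic
    illustration of reordering data across data-parallel ranks.
    """
    # Min-heap of (accumulated_cost, rank_id); always hand the next sequence
    # to the currently lightest rank.
    heap = [(0, r) for r in range(num_dp_ranks)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_dp_ranks)]
    # Placing the longest sequences first makes the greedy packing much tighter.
    for idx in sorted(range(len(seq_lengths)), key=lambda i: -seq_lengths[i]):
        cost, rank = heapq.heappop(heap)
        assignment[rank].append(idx)
        heapq.heappush(heap, (cost + seq_lengths[idx] ** 2, rank))
    return assignment

# Example: 8 sequences of mixed lengths spread over 4 DP ranks.
print(balance_sequences([4096, 512, 2048, 1024, 4096, 256, 2048, 1024], num_dp_ranks=4))
```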

The impact of these tech improvements? A significant jump in the MFU (Model FLOPs Utilization) of their ten-thousand-card cluster pre-training, rocketing from 30% to 41%.

That’s a substantial boost in how well they’re using their hardware.
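Quick refresher for anyone new to the metric: MFU compares the FLOPs a training run actually sustains with the theoretical peak of the hardware. Here’s a minimal sketch of the standard back-of-the-envelope calculation, using the common ~6 × N × T approximation for transformer training FLOPs (with N the activated parameters for an MoE). Every number in the example is a placeholder, not an Ascend spec or a Pangu figure.

```python
def model_flops_utilization(activated_params: float, tokens_per_second: float,
                            num_chips: int, peak_flops_per_chip: float) -> float:
    """MFU = sustained training FLOPs/s divided by the cluster's theoretical peak FLOPs/s.

    Uses the common ~6 * N * T approximation for transformer training FLOPs,
    where N is the number of (activated) parameters and T is tokens per second.
    """
    achieved_flops = 6 * activated_params * tokens_per_second
    peak_flops = num_chips * peak_flops_per_chip
    return achieved_flops / peak_flops

# Placeholder numbers only -- not Ascend specs -- to show the shape of the calculation.
print(f"MFU ≈ {model_flops_utilization(40e9, 5e6, 10_000, 300e12):.1%}")  # ≈ 40.0%
```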

Don’t Forget Pangu Pro MoE: Small Footprint, Big Impact in Chinese AI

Alongside the Ultra, there’s the recently released Pangu (Pángǔ 盘古) Pro MoE large model.

This one is a great example of “winning big with small.”

It has a total parameter count of only 72 billion, but thanks to an innovative design with dynamically activated expert networks, only about 16 billion parameters are active at any one time.

The result? Excellent performance, even comparable to models with hundreds of billions of parameters.
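How can a model carry 72 billion parameters but only use 16 billion at a time? Basic MoE arithmetic: each token runs through the shared (dense) parameters plus only its top-k routed experts. The configuration below is purely hypothetical, picked so the totals land on the reported 72B / 16B split; Pangu Pro MoE’s real expert count and routing aren’t given in this article.

```python
def moe_parameter_budget(shared_params: float, params_per_expert: float,
                         num_experts: int, top_k: int) -> tuple[float, float]:
    """Return (total_params, activated_params_per_token) for a simple MoE layout."""
    total = shared_params + num_experts * params_per_expert
    activated = shared_params + top_k * params_per_expert
    return total, activated

# Hypothetical configuration (not Pangu Pro MoE's actual one) that lands on 72B total / 16B active.
total, active = moe_parameter_budget(shared_params=8e9, params_per_expert=1e9,
                                     num_experts=64, top_k=8)
print(f"total: {total/1e9:.0f}B, activated per token: {active/1e9:.0f}B")
```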

Proof: On the latest authoritative SuperCLUE ranking (May 2025), it ranked first domestically among large models with fewer than 100 billion parameters. Impressive.

The Bottom Line on Huawei’s AI Strides: Full-Stack Domestic Power

So, what’s the core takeaway from Huawei’s (Huáwèi 华为) latest AI bombshell?

It’s proof positive that on a domestic AI computing platform like Ascend (Shēngténg 昇腾), it’s entirely possible to efficiently and stably train and optimize ultra-large-scale sparse models (MoE) to reach international top-tier levels.

This creates a “full-stack domestic” and “full-process self-controlled” closed loop.

We’re talking from hardware to software, from training to optimization, and from basic research all the way to engineering implementation.

And crucially, they’re hitting industry-leading levels in key performance indicators.

This is a massive statement for China’s AI sovereignty.

Pangu AI Model Comparison (May 2025)
  • Pangu Ultra MoE: 718 billion total parameters; ultra-large scale, trained end-to-end on Ascend.
  • Pangu Pro MoE: 72 billion total parameters, 16 billion activated via expert networks; performance comparable to much larger models; ranked #1 domestically among sub-100B-parameter models (SuperCLUE, May 2025).

The Chinese AI Ecosystem is Heating Up: More Than Just Huawei

While Huawei’s (Huáwèi 华为) news is huge, it’s not happening in a vacuum.

The domestic large model scene in China is buzzing with activity.

DeepSeek (Shēndù Qiúsuǒ 深度求索): The Cost-Effective AI Challenger

On May 28, DeepSeek (Shēndù Qiúsuǒ 深度求索) announced that its DeepSeek-R1 model had completed a minor trial version upgrade.

It’s now available for testing on their official webpage, app, and mini-programs (just open “Deep Thinking”).

The API interface and usage methods are unchanged, making it easy for devs.
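For developers, DeepSeek exposes an OpenAI-compatible API, so an unchanged interface means existing client code keeps working. Here’s a minimal sketch; the base URL and the deepseek-reasoner model name reflect DeepSeek’s public docs at the time of writing, so double-check the official documentation before relying on them.

```python
from openai import OpenAI  # DeepSeek exposes an OpenAI-compatible API

# Model name and base URL per DeepSeek's public API docs at the time of writing;
# verify against the official documentation before use.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",  # the R1 "deep thinking" model
    messages=[{"role": "user",
               "content": "Summarize the advantages of MoE models in two sentences."}],
)
print(response.choices[0].message.content)
```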

This Hangzhou-based startup made serious waves globally back in January of this year (2025).

They released their DeepSeek-R1 AI model, which outperformed Western competitors on several standardized benchmarks.

The kicker? Its reported cost was only several million dollars.

This actually caused a dip in global tech stocks, as investors started wondering if leading companies really needed to keep pouring billions into building AI services.

Low cost, high performance – that’s a combo that gets attention.

This latest R1 upgrade follows their actions from late March.

On March 25, DeepSeek (Shēndù Qiúsuǒ 深度求索) officially announced a minor version upgrade of its V3 model.

The new DeepSeek-V3-0324 model brought enhancements in:

  • Reasoning
  • Front-end development
  • Chinese writing
  • Chinese search capabilities

At that time, according to overseas AI model evaluation agencies, the new V3 model was the highest-scoring non-reasoning model, even surpassing xAI’s Grok 3 and OpenAI’s GPT-4.5 (preview).

So, DeepSeek (Shēndù Qiúsuǒ 深度求索) is definitely a player to watch in the Chinese AI model landscape.

Tencent (Téngxùn 腾讯): Building an “Easy-to-Use AI” Powerhouse

Tech giant Tencent (Téngxùn 腾讯) is also making significant moves in the AI space.

On May 21, at the 2025 Tencent (Téngxùn 腾讯) Cloud AI Industry Application Summit, they laid out their large model strategy for the first time.

It’s a comprehensive vision, covering:

  • Their self-developed Hunyuan (Hùnyuán 混元) large model.
  • AI cloud infrastructure.
  • Intelligent agent development tools.
  • Knowledge base solutions.
  • Scenario-oriented applications.

Tencent’s (Téngxùn 腾讯) large model matrix products have all been fully upgraded.

Their goal? To build truly “easy-to-use AI” for enterprises and users in this new era of large models.

In the fierce global race for large model tech, Tencent’s (Téngxùn 腾讯) Hunyuan (Hùnyuán 混元) is making steady progress and iterating fast.

Its technical capabilities are continuously on the upswing.

Tang Daosheng (Tāng Dàoshēng 汤道生), Senior Executive Vice President of Tencent (Téngxùn 腾讯) Group and CEO of the Cloud and Smart Industries Group, shared some impressive stats at the conference:

Hunyuan (Hùnyuán 混元) TurboS has climbed into the top eight globally on Chatbot Arena.

For context, Chatbot Arena is a highly respected global platform for evaluating large language models.

Domestically, it’s second only to DeepSeek (Shēndù Qiúsuǒ 深度求索).

Notably, Hunyuan (Hùnyuán 混元) TurboS also cracked the global top ten for science-related abilities, like code and mathematics.

DeepSeek-V3-0324 Model Enhancements
  • Reasoning: Improved logical processing and problem-solving.
  • Front-end development: Enhanced capabilities for generating and understanding web development code.
  • Chinese writing: Better fluency, style, and quality in generated Chinese text.
  • Chinese search: More accurate and relevant results for Chinese search queries.

Synergy in Action: Tencent Integrates DeepSeek’s AI Prowess

Showing the dynamic nature of the ecosystem, on May 29, Tencent (Téngxùn 腾讯) announced a cool collaboration.

Several of its popular AI applications are integrating DeepSeek (Shēndù Qiúsuǒ 深度求索) R1-0528.

These include:

  • Tencent Yuanbao (Téngxùn Yuánbǎo 腾讯元宝)
  • ima
  • Sogou (Sōugǒu 搜狗) Input Method
  • QQ Browser
  • Tencent Docs (Téngxùn Wéndàng 腾讯文档)
  • Tencent Maps (Téngxùn Dìtú 腾讯地图)
  • Tencent Lexiang (Téngxùn Lèxiǎng 腾讯乐享)

Users of these products who opt for the DeepSeek (Shēndù Qiúsuǒ 深度求索) R1 “Deep Thinking” model can now tap into the latest deep thinking, programming, and long-text processing capabilities of DeepSeek (Shēndù Qiúsuǒ 深度求索) R1-0528.

This kind of integration highlights how different players are leveraging each other’s strengths to push the whole Chinese AI ecosystem forward.

Tencent Hunyuan TurboS Global Rankings (May 2025)
  • Chatbot Arena: Top 8 Global
  • Science Capabilities (Code, Math): Top 10 Global

The Future is Now: Domestic Computing Power and AI Models are Reshaping China’s Tech Landscape

Huawei’s (Huáwèi 华为) Pangu (Pángǔ 盘古) Ultra MoE is more than just a technical achievement; it’s a landmark moment.

It underscores a powerful trend: the rise of independently developed Chinese AI, built on robust domestic computing power.

Combined with the rapid advancements from companies like DeepSeek (Shēndù Qiúsuǒ 深度求索) and Tencent (Téngxùn 腾讯), it’s clear that the innovation engine within China’s AI industry is firing on all cylinders, promising an exciting and transformative future.
