DeepSeek V3.2 keeps Open Source at the Frontier
DeepSeek V3.2 Speciale competes at gold-level on math, while DeepSeek V3.2 Thinking offers frontier-level capabilities for coding and agentic tasks at a fraction of the cost of leading AI models.

DeepSeek Releases V3.2
We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance. – DeepSeek
Eleven months after the original DeepSeek moment, DeepSeek has done it again, releasing two AI reasoning models with top-tier benchmark performance despite limited compute resources: DeepSeek-V3.2 Standard (available on web and API) and DeepSeek-V3.2 Speciale (API only), with the Speciale version designed for deep reasoning tasks.
DeepSeek open-sourced both models under the MIT license and shared the weights on HuggingFace. They also published their training details in the technical report “DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models.” DeepSeek compensated for limited compute resources with a number of innovations in model architecture and the training process that make both training and inference more efficient without compromising performance.
Benchmark Performance
The headline benchmark achievement is that DeepSeek-V3.2 Speciale reached gold-medal level in three major math and coding competitions (the IMO, IOI, and ICPC World Finals), the first open AI model to do so:
Scored 35/42 on the International Mathematical Olympiad (IMO), achieving gold-medal level.
Scored 492/600 on the International Olympiad in Informatics (IOI), a result that would place 10th globally among human contestants.
Solved 10 out of 12 problems on the ICPC World Finals equivalent, ranking it 2nd globally.
Most impressively, DeepSeek-V3.2 Speciale did this at an output-token cost 25 to 30 times lower than GPT-5 and Gemini 3 Pro.
DeepSeek V3.2 Speciale’s focus on reasoning yields impressive results across several math, reasoning, and coding benchmarks. It surpasses GPT-5 High and Gemini 3.0 Pro on AIME-25, and it outperforms GPT-5 High on CodeForces and LiveCodeBench. On Humanity’s Last Exam, DeepSeek-V3.2 Speciale scores 30%, while the standard DeepSeek-V3.2 thinking model scores 25%.
However, Claude 4.5 Opus and Gemini 3.0 Pro still edge out DeepSeek V3.2 Speciale on certain reasoning and specific coding tasks. Further, while the cost per token is low, the Speciale model is token-inefficient, sometimes generating many more thinking tokens to solve hard problems than competitors like Gemini 3 Pro. This makes the model slower and undermines its cost advantage even though each token costs much less.
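To see how verbosity can erode a per-token price advantage, here is a purely illustrative back-of-the-envelope calculation; the prices and token counts below are hypothetical placeholders, not measured figures from any benchmark:

```python
# Purely illustrative: hypothetical prices and token counts, not measured figures,
# showing how verbosity can erode a large per-token price advantage.
cheap_price_per_mtok = 0.40      # hypothetical $/1M output tokens, cheaper model
frontier_price_per_mtok = 10.00  # hypothetical $/1M output tokens, frontier model

cheap_tokens = 120_000           # hypothetical thinking tokens on a hard problem
frontier_tokens = 6_000          # hypothetical tokens for a token-efficient model

cheap_cost = cheap_price_per_mtok * cheap_tokens / 1_000_000
frontier_cost = frontier_price_per_mtok * frontier_tokens / 1_000_000

print(f"cheaper-per-token model: ${cheap_cost:.3f}")    # $0.048
print(f"token-efficient model:   ${frontier_cost:.3f}")  # $0.060
# A 25x per-token price advantage mostly disappears if ~20x more tokens are needed.
```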
While DeepSeek V3.2 Speciale pushes the limits on reasoning, the DeepSeek V3.2 Thinking model is built for general-purpose use. It is also excellent for its size and SOTA among open AI models on some tasks, scoring 73.1% on SWE-Bench Verified and 46.4% on TerminalBench 2.0.
Agentic Capabilities
The DeepSeek V3.2 models have particular strength in agentic capabilities. They support interleaved tool use within thinking traces, a capability first seen earlier this year in Claude models. This lets the model call tools while it reasons, though historical thinking traces are discarded between turns, which may affect some coding agents.
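To make the interaction pattern concrete, here is a minimal sketch of an agent loop against DeepSeek’s OpenAI-compatible chat API. The model identifier, the `run_tests` tool, and the stubbed tool output are illustrative assumptions, not documented behavior; the loop simply follows the convention described above of not resending prior thinking traces between turns.

```python
# Minimal sketch of an agent loop where tool calls are interleaved with reasoning.
# The model name and the "run_tests" tool are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool exposed to the model
        "description": "Run the project's test suite and return any failures.",
        "parameters": {"type": "object", "properties": {}},
    },
}]

messages = [{"role": "user", "content": "Fix the failing unit tests in this repo."}]

while True:
    resp = client.chat.completions.create(
        model="deepseek-reasoner",  # assumed model identifier
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    # Only the visible assistant message is sent back on the next turn; the
    # thinking trace from earlier turns is not resent, matching the
    # "discarded historical thinking traces" behavior described above.
    messages.append(msg)
    if not msg.tool_calls:
        break
    for call in msg.tool_calls:
        tool_result = "2 tests failing: test_parse, test_merge"  # stubbed output
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": tool_result})

print(msg.content)
```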
DeepSeek-V3.2 Technical Details
DeepSeek shared many details on the architecture and training of DeepSeek-V3.2 in their “DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models” paper.
DeepSeek-V3.2 uses the same architecture as its predecessor DeepSeek-V3.2-Exp, which introduced DeepSeek Sparse Attention (DSA).
DeepSeek Sparse Attention (DSA) is an efficient attention mechanism that substantially reduces computational complexity while preserving model performance, specifically optimized for long-context scenarios.
DeepSeek Sparse Attention (DSA) achieves its efficiency by dynamically selecting relevant tokens rather than attending over the full sequence: a “Lightning Indexer” scans the context and picks out the top relevant tokens for each query instead of considering every position. This significantly reduces prefilling and decoding costs for long contexts while maintaining quality, cutting attention complexity from O(L²) in the context length L to O(Lk), where k ≪ L is the number of selected tokens.
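For intuition, here is a simplified, single-query sketch of top-k sparse attention in the spirit of DSA. The actual Lightning Indexer, multi-head layout, and kernels in DeepSeek’s implementation differ; this only illustrates how selecting k tokens turns O(L²) attention into O(Lk).

```python
# Simplified single-query sketch of top-k sparse attention, not DeepSeek's code.
import torch
import torch.nn.functional as F

def sparse_attention(q, K, V, indexer_q, indexer_K, k=64):
    """
    q:          (d,)       query vector for the current position
    K, V:       (L, d)     cached keys / values over the full context
    indexer_q:  (d_idx,)   cheap low-dimensional query for the indexer
    indexer_K:  (L, d_idx) cheap low-dimensional keys for the indexer
    """
    L, d = K.shape
    k = min(k, L)

    # 1) Lightweight indexer scores every position: O(L * d_idx), cheap.
    scores = indexer_K @ indexer_q                    # (L,)

    # 2) Keep only the top-k most relevant positions.
    top_idx = torch.topk(scores, k).indices           # (k,)

    # 3) Full attention restricted to the selected tokens: O(k * d).
    K_sel, V_sel = K[top_idx], V[top_idx]             # (k, d)
    attn = F.softmax((K_sel @ q) / d ** 0.5, dim=-1)  # (k,)
    return attn @ V_sel                               # (d,)

# Toy usage: a 4,096-token context, but attention only touches 64 positions.
L, d, d_idx = 4096, 128, 32
out = sparse_attention(torch.randn(d), torch.randn(L, d), torch.randn(L, d),
                       torch.randn(d_idx), torch.randn(L, d_idx), k=64)
print(out.shape)  # torch.Size([128])
```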

DeepSeek added resources toward scaling post-training RL in DeepSeek-V3.2, a trend we have seen in the training of other AI models such as Grok 4. In DeepSeek-V3.2, they allocated over 10% of compute to post-training RL for the standard model and over 20% for the Speciale version.
In post-training for DeepSeek-V3.2, they used an RL distillation strategy. Instead of training a single general model with RL directly, they first trained separate domain-expert models with RL in six areas: mathematics, programming, general logical reasoning, general agentic tasks, agentic coding, and agentic search.
Once the specialist models are prepared, they are used to produce the domain-specific data for the final checkpoint. Experimental results demonstrate that models trained on the distilled data achieve performance levels only marginally below those of domain-specific specialists, with the performance gap being effectively eliminated through subsequent RL training.
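As a rough, self-contained sketch of this flow (every function below is an illustrative stub, not DeepSeek’s training code): train one RL specialist per domain, sample and verify its outputs to build distillation data, then fine-tune a single generalist on the merged set before a final RL pass.

```python
# Toy sketch of the specialist -> distilled data -> generalist flow.
# All functions are illustrative stubs, not DeepSeek's actual training code.
import random

DOMAINS = ["math", "programming", "logical_reasoning",
           "agentic_tasks", "agentic_coding", "agentic_search"]

def train_rl_specialist(base_model: str, domain: str) -> str:
    # Stand-in for a full RL run that produces a domain-expert checkpoint.
    return f"{base_model}-{domain}-expert"

def sample_solutions(specialist: str, prompt: str, n: int = 4) -> list[str]:
    # Stand-in for sampling n candidate solutions/trajectories from the expert.
    return [f"[{specialist}] candidate {i} for '{prompt}'" for i in range(n)]

def verify(domain: str, prompt: str, candidate: str) -> bool:
    # Stand-in for a domain-specific verifier (unit tests, checkers, graders).
    return random.random() > 0.5

def build_distillation_data(base_model: str, prompts_by_domain: dict) -> list:
    distilled = []
    for domain in DOMAINS:
        specialist = train_rl_specialist(base_model, domain)
        for prompt in prompts_by_domain.get(domain, []):
            kept = [c for c in sample_solutions(specialist, prompt)
                    if verify(domain, prompt, c)]
            distilled += [(domain, prompt, c) for c in kept]
    return distilled

data = build_distillation_data("v3.2-base",
                               {"math": ["prove P"], "programming": ["sort xs"]})
print(len(data), "distilled examples")  # generalist SFT + a final RL pass would follow
```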
A final innovation DeepSeek touts is their scaled-up Agentic Task Synthesis Pipeline:
To integrate reasoning into tool-use scenarios, we developed a novel synthesis pipeline that systematically generates training data at scale. This facilitates scalable agentic post-training, improving compliance and generalization in complex interactive environments.
This helped train the models on agentic tasks and improved their ability to interleave thinking and tool use.
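For intuition, here is a toy generator in the same spirit: each synthetic task pairs a prompt with a tool subset and a programmatic check, so rollouts can be verified at scale. This is our own illustration of what such a pipeline could produce, not DeepSeek’s actual pipeline.

```python
# Illustrative toy generator of synthetic tool-use tasks; our own sketch,
# not DeepSeek's Agentic Task Synthesis Pipeline.
import json
import random

TOOLS = {
    "search": "Search a document collection and return snippets.",
    "calculator": "Evaluate an arithmetic expression.",
    "file_read": "Read a file from a sandboxed workspace.",
}

TEMPLATES = [
    "Find the value of {target} using the available tools.",
    "Check whether {target} is consistent with the workspace files.",
]

def synthesize_task(task_id: int) -> dict:
    # Pair a templated goal with a random tool subset and a checkable answer,
    # so the resulting rollout can be verified programmatically in post-training.
    tools = random.sample(sorted(TOOLS), k=2)
    target = f"quantity_{task_id}"
    return {
        "id": task_id,
        "tools": [{"name": t, "description": TOOLS[t]} for t in tools],
        "prompt": random.choice(TEMPLATES).format(target=target),
        "expected_answer": f"{target}=42",
    }

print(json.dumps(synthesize_task(0), indent=2))
```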
Keeping Open Source Relevant with Innovation
Google, OpenAI, and Anthropic have pushed the frontier of AI with their latest models - Gemini 3.0 Pro, GPT-5.1, and Claude 4.5 Opus - raising questions about whether other AI labs can keep up.
DeepSeek V3.2 answers them. It offers gold-medal-level reasoning and coding performance in an open-source AI model that is drastically cheaper than closed-source competitors. DeepSeek-V3.2 is significant because it proves open-source AI can compete at the level of Gemini 3 Pro and GPT-5, continuing the commoditization of frontier AI.
While it is a disruptive and powerful AI model, DeepSeek V3.2 still trails slightly behind top-tier closed models like Gemini 3.0 Pro in overall token efficiency and specific benchmarks, and it currently lacks the broad ecosystem of competitors like Gemini or OpenAI.
The well-funded leading AI labs like Google and OpenAI have resource advantages that make it hard for competitors to keep up. With limited resources, DeepSeek cannot put the same amount of compute into training a model as those labs can. DeepSeek admits there are gaps in broad world knowledge compared to frontier proprietary models due to fewer total training FLOPs. They plan to address this in future iterations like V4, but a gap with the largest leading AI labs will remain.
DeepSeek has made up for that gap with research: DeepSeek Sparse Attention for a more efficient architecture and RL training improvements that make both training and inference more efficient, helping to make this AI model SOTA for its size and openness.
Ilya Sutskever, in a recent interview with Dwarkesh Patel, argued that we are moving from an age of AI scaling to a new age of AI research. Companies like DeepSeek are showing that research and innovation remain a path to competing and winning in the AI race.



Brilliant breakdown of how DeepSeek is making scalable architecture innovations actually matter. The RL distillation approach where they train domain experts separately then merge that knowledge is really clever for getting around limited compute budgets. What stands out is how token inefficiency might erode their cost advantage on harder problems even with the lower per-token pricing. The sparse attention mechanism is fascinating, but when Speciale needs way more thinking tokens than Gemini, you end up in a situation where 25x cheaper per token can still mean similar total cost.