Reasoning / Math Models
Data as of June 18, 2026

| # | Vendor | Date | Parameters | Context | Price | | | FrMath | AIME | MATH L5 | | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | - | 2025/12 | ? | 128K | $2/$14 | 73 | 130.34 | 40.7 (#2) | 96.1 (#2) | 98.1 (#1) | 1.7 | - |
| 2 | - | 2026/4 | 1T (32B active, MoE) | 256K | - | - | - | - | 96.4 (#1) | - | 2.3 | - |
| 3 | - | 2025/4 | ? | 128K | - | 161 | 21.94 | - | - | 97.8 (#2) | 3 | - |
| 4 | - | 2026/3 | ? | 128K | - | - | - | 50.0 (#1) | - | - | 3.3 | - |
| 5 | - | 2025/4 | ? | 200K | - | 118 | 5.38 | - | - | 97.8 (#3) | 3.3 | - |
| 6 | Z.ai | 2025/7 | 754B MoE | 200K | - | - | - | - | 95.3 (#4) | - | 3.3 | Open-weight MoE; basis for Big Pickle; strong reasoning + tool use |
| 7 | Zyphra | 2026/5 | 74B (4B active, MoE) | ? | - | - | - | - | - | - | 3.7 | Pre-RL base checkpoint; trained on AMD hardware. Marks Zyphra's move beyond small-MoE experimentation. Companion ZAYA1-VL-8B (700M active) released alongside. |
| 8 | NVIDIA | 2026/4 | 120B (12B active, MoE) | 1M | - | - | - | - | - | - | 3.7 | Hybrid Mamba-Attention MoE w/ LatentMoE; trained on 25T tokens; 2.2x throughput vs GPT-OSS-120B, 7.5x vs Qwen3.5-122B; native MTP speculative decoding |
| 9 | Tencent Hunyuan | 2026/4 | 295B (21B active, MoE) | 256K | - | - | - | - | - | - | 3.7 | Open-weight Hunyuan 3 preview; 192 experts, top-8 routing; strong STEM/reasoning release |
| 10 | - | 2025/9 | ? | 200K | $3/$15 | - | - | - | - | 97.7 (#4) | 3.7 | - |
| 11 | - | 2026/2 | ? | 1M | $2/$12 | 109 | 29.71 | 36.9 (#4) | 95.6 (#3) | - | 3.7 | - |
| 12 | OpenCode Zen | 2025/10 | 754B MoE | 200K | free/free | - | - | - | - | - | 3.7 | Stealth model (GLM 4.6); free on OpenCode Zen; reasoning + tool calling; text-only |
| 13 | - | 2026/4 | 754B MoE | 200K | $0.95/$3 | 74 | 1.64 | - | 95.3 (#5) | - | 3.7 | - |
| 14 | Anthropic | 2026/5 | ? | 200K | $15/$75 | - | - | - | - | - | 3.7 | Incremental over 4.7; high CoT faithfulness; no steganographic reasoning found in white-box SAE analysis |
| 15 | Anthropic | 2026/4 | ? | 200K | $15/$75 | - | - | - | - | - | 3.7 | - |
| 16 | OpenAI | 2026/4 | ? | 400K | - | - | - | - | - | - | 3.7 | - |
| 17 | DeepSeek | 2026/4 | 1.6T (49B active, MoE) | 1M | $2/$3 | - | - | - | - | - | 3.7 | NIST/CAISI eval (May 2026): most capable PRC model tested, ~8mo behind US frontier; IRT-Elo 800 vs GPT-5.5 1260, Opus 4.6 999. CAISI scores: GPQA-Diamond 90%, OTIS-AIME-2025 97%, SWE-Bench Verified 74%, ARC-AGI-2 46%. Pricing is DeepSeek V4 Pro (developer-reported). |
| 18 | Moonshot | 2026/4 | 48B (3B active, MoE) | 1M | - | - | - | - | - | - | 3.7 | Novel architecture: 3:1 KDA-to-MLA ratio (Kimi Delta Attention + Multi-head Latent Attention). 75% smaller KV cache, 6x decoding throughput at 1M context. Paired with FlashKDA CUTLASS kernels. Efficiency-focused, not flagship-capability. |
| 19 | Anthropic | 2026/2 | ? | 200K | - | - | - | - | - | - | 3.7 | - |
| 20 | Anthropic | 2025/10 | ? | 200K | - | - | - | - | - | - | 3.7 | - |
| 21 | InclusionAI | 2026/5 | 1T (63B active, MoE) | 262K | - | - | - | - | - | - | 3.7 | Trillion-parameter thinking model w/ adaptive reasoning; tuned for coding agents, tool use, long-horizon tasks; high/xhigh reasoning modes |
| 22 | - | 2026/4 | - | - | - | - | - | - | - | - | 3.7 | xAI reasoning/agent flagship. Scores pending public benchmark publication. |
| 23 | MiniMax | 2026/3 | 230B (10B active, MoE) | 200K | - | - | - | - | - | - | 3.7 | MiniMax reasoning model (text-only). Open weights, non-commercial license. Scores: AA Intelligence Index 50. |
| 24 | - | 2025/12 | - | - | - | - | - | - | - | - | 3.7 | Iterative GPT-5 release. Scores pending. |
| 25 | - | 2026/3 | - | - | - | - | - | - | - | - | 3.7 | Fast tier of Gemini 3 family. Scores pending. |
| 26 | - | 2025/3 | - | - | - | - | - | - | - | - | 3.7 | Previous-gen Gemini flagship. Largely superseded by 3.1 Pro. |
| 27 | - | 2026/1 | - | - | - | - | - | - | - | - | 3.7 | Predecessor to K2.6. Scores pending. |
| 28 | - | 2026/2 | - | - | - | - | - | - | - | - | 3.7 | DeepSeek V3.2-Exp release. Scores pending. |
| 29 | MiniMax | 2026/6 | ? | 1M | $0.60/$2 | - | - | - | - | - | 3.7 | Open-weight frontier coding + 1M context + native multimodality. SWE-Bench Pro 59.0, Terminal-Bench 2.1 66.0, BrowseComp 83.5. MSA: ~9x faster prefill / 15x decode at 1M. |
| 30 | Google | 2026/5 | ? | 1M | $2/$9 | - | - | - | - | - | 3.7 | ~4x faster than prior frontier; beats Gemini 3.1 Pro on coding/agentic (Terminal-Bench 2.1 76.2, MCP Atlas 83.6, CharXiv 84.2). Dynamic thinking on by default. |
| 31 | Alibaba | 2026/6 | ? | 1M | $0.40/$2 | - | - | - | - | - | 3.7 | AA Intelligence Index 53.3 (coding 46.5). Multimodal (text/image/video in). ScreenSpot Pro 79.0 GUI grounding. |
| 32 | NVIDIA | 2026/6 | 550B (55B active, MoE) | 1M | - | 146.3 | - | - | - | - | 3.7 | AA Intelligence Index 48; AA-Omniscience 78.7 (top non-hallucination in set). ~20T train tokens, 11 langs + 43 prog langs. |
| 33 | AI Singapore | 2025/11 | 27B (Gemma 3) | 128K | - | - | - | - | - | - | 3.7 | Southeast Asian multilingual (11 langs incl Malay); Gemma 3 27B base. #4 on SEA-HELM, #1 Tamil/Filipino; runs on a 32GB laptop. |
| 34 | Mesolitica | 2024/1 | 5B | 20K | - | - | - | - | - | - | 3.7 | Malaysia LLM (Mistral-based); Malay slang/colloquialisms + 16 regional MY languages, ~90B Malay tokens. |
| 35 | Alibaba DAMO | 2024/7 | 7B (Qwen2) | 32K | - | - | - | - | - | - | 3.7 | Southeast Asian multilingual (Malay/Indonesian/Thai/Vietnamese/…); Qwen2-based, SOTA for its size on SEA tasks. |
| 36 | Alibaba | 2026/5 | ? | 1M | $3/$8 | - | - | - | - | - | 3.7 | Qwen3.7 flagship; agent-centric (long-horizon ~35h). AA Intelligence Index 56.6 (#5, top Chinese at launch); Terminal-Bench 2.0 69.7. |
| 37 | - | 2026/4 | ? | 128K | - | - | - | - | 95.3 (#6) | - | 4 | - |
| 38 | - | 2025/9 | ? | 128K | - | - | - | - | - | 97.1 (#5) | 4 | - |
| 39 | - | 2025/8 | ? | 128K | - | - | - | - | - | 96.8 (#6) | 4.3 | - |
| 40 | Microsoft AI | 2026/6 | 35B active (SMoE) | 256K | - | - | - | - | 94.5 (#7) | - | 4.3 | Microsoft's first in-house reasoning model (Build 2026); no distillation from OpenAI/Anthropic. AIME 2026 94.5; SWE-Bench Pro 53 (~Opus 4.6). Foundry private preview. |
| 41 | - | 2025/1 | 685B | 128K | $0.55/$2 | 60 | 0.84 | - | - | 96.6 (#7) | 4.7 | - |
| 42 | - | 2026/2 | ? | 200K | $5/$25 | 40 | 1.78 | 40.7 (#3) | 94.4 (#8) | - | 5 | - |
| 43 | - | 2025/1 | ? | 128K | - | 160 | 7.12 | - | - | 96.5 (#8) | 5 | - |
| 44 | - | - | ? | 128K | - | - | - | - | 94.2 (#9) | - | 5 | - |
| 45 | - | - | 31B | 128K | - | - | - | - | 89.2 (#10) | - | 5.3 | - |