Text-to-Speech (TTS)
Data as of May 24, 2026

| # | Model | Vendor | Released | Voices | Languages | Price | Arena | Rank |
|---|---|---|---|---|---|---|---|---|
| 1 | Vocu V3.0 | Vocu | - | 200 | 20 | - | 1582 | 1 |
| 2 | Inworld TTS MAX | Inworld AI | 2025/6 | 20 | 11 | $5 | 1575 | 2 |
| 3 | CastleFlow v1.0 | CastleFlow | - | 50 | 10 | - | 1574 | 3 |
| 4 | Orpheus 3B | Canopy Labs | 2025/3 | 8 | 7 | $22 | 1570 | 4 |
| 5 | Hume Octave | Hume AI | 2025/1 | 100 | 11 | $50 | 1563 | 5 |
| 6 | Papla P1 | Papla | - | 30 | 15 | - | 1562 | 6 |
| 7 | MiniMax Speech-02-HD | MiniMax | 2025/4 | 300 | 32 | $30 | 1544 | 7 |
| 8 | | Fish Audio | 2026/3 | - | 80 | free | - | 8 |
| 9 | | Mistral | 2026/3 | - | 9 | $16 | - | 8 |
| 10 | Dia 1.6B | Nari Labs | 2025/4 | - | 1 | free | - | 8 |
| 11 | Piper | Rhasspy / community | - | - | 30 | free | - | 8 |
| 12 | | OpenAI | 2026/5 | - | - | - | - | 8 |
| 13 | | ElevenLabs | 2025/6 | - | 70 | - | - | 8 |
| 14 | | Sesame AI Labs | 2025/3 | 1 | 1 | free | - | 8 |
| 15 | | | 2024/7 | 5000 | 32 | $150 | 1544 | 8 |
| 16 | Ming-Omni-TTS | InclusionAI | 2026/3 | 100 | 2 | free | - | 8 |
| 17 | Cartesia Sonic 2 | Cartesia | 2025/2 | 200 | 25 | $20 | 1513 | 9 |
| 18 | Chatterbox Multilingual | Resemble AI | 2025/5 | 100 | 23 | free | 1506 | 10 |
| 19 | Kokoro v1.0 | Community | 2025/1 | 54 | 8 | free | 1500 | 11 |
| 20 | NeuTTS Max | NeuTTS | - | 40 | 8 | - | 1479 | 12 |
| 21 | PlayHT 2.0 | PlayHT | 2023/8 | 600 | 28 | $50 | 1405 | 13 |
| 22 | StyleTTS 2 | Community | 2023/6 | 1 | 2 | free | 1369 | 14 |
| 23 | | | 2025/5 | 200 | 9 | free | 1358 | 15 |
| 24 | Spark TTS | iFlytek | - | 80 | 10 | - | 1342 | 16 |
Music Generation
Data as of June 9, 2026

| # | Model | Vendor | Released | Length | Price | Quality | Rank | Notes |
|---|---|---|---|---|---|---|---|---|
| 1 | | Suno | 2026/3 | 8 min | $0.08 | 94.0 | 1 | Voices (voice cloning), Custom Models (fine-tuning), My Taste; Suno Studio DAW. quality=editorial est. |
| 2 | Suno v5 | Suno | 2025/10 | 8 min | $0.08 | 92.0 | 2 | Most popular consumer music gen |
| 3 | Udio 2 | Udio | 2025/6 | 15 min | $0.10 | 90.0 | 3 | Longer form; a16z-backed |
| 4 | | Google DeepMind | 2026/3 | 3 min | - | 90.0 | 4 | Full vocals, image-guided gen, negative prompts, structural control (intro/verse/chorus); licensed training data. quality=editorial est. |
| 5 | | MiniMax | 2026/1 | 4 min | - | 90.0 | 5 | Studio-grade humanized vocals, 100+ instrument tones, 14 composition tags for structural control. quality=editorial est. |
| 6 | Suno v4.5 | Suno | 2025/4 | 4 min | $0.08 | 88.0 | 6 | Widely used; MIT lawsuit pending |
| 7 | | | 2025/5 | 5 min | - | 87.0 | 7 | YouTube Music AI; restricted API |
| 8 | | | 2025/11 | 4 min | free | 80.0 | 8 | Open weights; instrumental focus |
| 9 | Riffusion 3 | Riffusion | 2025/8 | 3 min | $0.05 | 78.0 | 9 | YC-backed; fast generation |
| 10 | | | 2023/1 | 5 min | - | 72.0 | 10 | Research preview; superseded by Lyria |
| 11 | | | 2024/6 | 30s | free | 70.0 | 11 | 3.3B params; open research baseline |
Speech-to-Text (ASR)
Data as of May 11, 2026

| # | Model | Vendor | Released | Params | Languages | Price | WER % | Rank | Notes |
|---|---|---|---|---|---|---|---|---|---|
| 1 | | Cohere | 2026/3 | 2B | 14 | free | 5.4 | 1 | Open-source; tops HF Open ASR leaderboard; 525x real-time on consumer GPUs; free API |
| 2 | NVIDIA Canary 1B | NVIDIA | 2024/2 | 1B | 4 | free | 6.5 | 2 | EN/ES/FR/DE only; top of HF OpenASR |
| 3 | Deepgram Nova-3 | Deepgram | 2025/2 | - | 36 | $4 | 6.8 | 3 | Lowest WER in English; 54% lower than Whisper |
| 4 | | | 2025/2 | - | 99 | $3 | 7.2 | 4 | Multi-speaker; ElevenLabs' first ASR |
| 5 | AssemblyAI Universal-2 | AssemblyAI | 2024/10 | - | 25 | $6 | 7.7 | 5 | Best speaker diarization |
| 6 | | | 2024/10 | 809M | 99 | $6 | 8.4 | 6 | Best open baseline; 8x faster than v3 |
| 7 | | | 2023/11 | 1550M | 99 | $6 | 8.5 | 7 | Gold standard; 680k hours of training data |
| 8 | Fireworks Whisper-v3 | Fireworks AI | 2024/9 | - | 99 | $3 | 8.5 | 8 | Cheapest Whisper API; 300x real-time |
| 9 | Groq Whisper Large v3 | Groq | 2024/7 | - | 99 | $3 | 8.5 | 9 | Fastest inference via LPU |
| 10 | Gladia Whisper-Zero | Gladia | 2024/9 | - | 100 | $5 | 9.1 | 10 | Hallucination-resistant; EU-hosted |
| 11 | Moonshine Tiny | Useful Sensors | 2024/10 | 27M | 1 | free | 9.8 | 11 | On-device; English only |
| 12 | | OpenAI | 2026/5 | - | 99 | - | - | 12 | Live transcription Realtime API model. Pricing: $0.017/minute. Same launch as GPT-Realtime-2. |
| 13 | | | 2026/5 | - | - | - | - | 13 | Live translation Realtime API model. Pricing: $0.034/minute. |
| 14 | | Xiaomi MiMo | 2026/4 | ? | 1 | - | - | 14 | Newest XiaomiMiMo release; Mandarin-focused; flagged by daily audit |