Robo2u

Text-to-Speech (TTS)

Data as of May 24, 2026
1.Vocu V3.0
#1
Vocu20020
Arena 1582
2.Inworld TTS MAX
#2
Inworld AI2011$5
Arena 1575
3.CastleFlow v1.0
#3
CastleFlow5010
Arena 1574
4.Orpheus 3B●
#4
Canopy Labs87$22
Arena 1570
5.Hume Octave
#5
Hume AI10011$50
Arena 1563
6.Papla P1
#6
Papla3015
Arena 1562
7.MiniMax Speech-02-HD
#7
MiniMax30032$30
Arena 1544
Fish Audio80free
9.Voxtral TTS●
#8
Mistral9$16
10.Dia 1.6B●
#8
Nari Labs1free
11.Piper●
#8
Rhasspy / community30free
OpenAI
ElevenLabs70
14.Sesame CSM-1B●
#8
Sesame AI Labs11free
ElevenLabs500032$150
Arena 1544
16.Ming-Omni-TTS●
#8
InclusionAI1002free
17.Cartesia Sonic 2
#9
Cartesia20025$20
Arena 1513
18.Chatterbox Multilingual●
#10
Resemble AI10023free
Arena 1506
19.Kokoro v1.0●
#11
Community548free
Arena 1500
20.NeuTTS Max
#12
NeuTTS408
Arena 1479
21.PlayHT 2.0
#13
PlayHT60028$50
Arena 1405
22.StyleTTS 2●
#14
Community12free
Arena 1369
23.CosyVoice 3●
#15
Alibaba2009free
Arena 1358
24.Spark TTS
#16
iFlytek8010
Arena 1342

Music Generation

Data as of June 9, 2026
Suno, 8 min, $0.08
Quality 94.0
Voices (voice cloning), Custom Models (fine-tuning), My Taste; Suno Studio DAW. Quality score is an editorial estimate.
2.Suno v5
#2
Suno, 8 min, $0.08
Quality 92.0
Most popular consumer music gen
3.Udio 2
#3
Udio, 15 min, $0.10
Quality 90.0
Longer form; a16z-backed
Google DeepMind, 3 min
Quality 90.0
Full vocals, image-guided generation, negative prompts, structural control (intro/verse/chorus); licensed training data. Quality score is an editorial estimate.
MiniMax, 4 min
Quality 90.0
Studio-grade humanized vocals, 100+ instrument tones, 14 composition tags for structural control. Quality score is an editorial estimate.
6.Suno v4.5
#6
Suno, 4 min, $0.08
Quality 88.0
Widely used; MIT lawsuit pending
Google DeepMind, 5 min
Quality 87.0
YouTube Music AI; restricted API
Stability AI, 4 min, free
Quality 80.0
Open weights; instrumental focus
9.Riffusion 3
#9
Riffusion, 3 min, $0.05
Quality 78.0
YC-backed; fast generation
#10
Google, 5 min
Quality 72.0
Research preview; superseded by Lyria
#11
Meta, 30 s, free
Quality 70.0
3.3B params; open research baseline

Speech-to-Text (ASR)

Data as of May 11, 2026
Cohere, 2B, 14, free
WER % 5.4
Open-source; tops HF Open ASR leaderboard; 525x real-time on consumer GPUs; free API
2.NVIDIA Canary 1B●
#2
NVIDIA, 1B, 4, free
WER % 6.5
EN/ES/FR/DE only; top of HF Open ASR
3.Deepgram Nova-3
#3
Deepgram, 36, $4
WER % 6.8
Lowest WER in English; 54% lower than Whisper
ElevenLabs, 99, $3
WER % 7.2
Multi-speaker; ElevenLabs' first ASR model
5.AssemblyAI Universal-2
#5
AssemblyAI, 25, $6
WER % 7.7
Best speaker diarization
OpenAI, 809M, 99, $6
WER % 8.4
Best open baseline; 8x faster than v3
OpenAI, 1550M, 99, $6
WER % 8.5
Gold standard; 680k hours of training data
8.Fireworks Whisper-v3
#8
Fireworks AI, 99, $3
WER % 8.5
Cheapest Whisper API; 300x real-time
9.Groq Whisper Large v3
#9
Groq, 99, $3
WER % 8.5
Fastest inference via LPU
10.Gladia Whisper-Zero
#10
Gladia, 100, $5
WER % 9.1
Hallucination-resistant; EU-hosted
11.Moonshine Tiny●
#11
Useful Sensors, 27M, 1, free
WER % 9.8
On-device; English only
OpenAI, 99
Live transcription model for the Realtime API. Pricing: $0.017/minute. Launched alongside GPT-Realtime-2.
OpenAI
Live translation model for the Realtime API. Pricing: $0.034/minute.
14.MiMo-V2.5-ASR●
#14
Xiaomi MiMo, ?, 1
Newest XiaomiMiMo release; Mandarin-focused; flagged by daily audit
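
For readers comparing the WER % column above: word error rate is conventionally the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not this leaderboard's scoring code. The function name `word_error_rate` and the lowercase-and-split normalization are assumptions; real benchmarks usually apply heavier text normalization, which this page does not describe.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: text is normalized only by lowercasing and splitting on
    whitespace; the normalization used by this leaderboard is not stated.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(f"WER % {100 * word_error_rate(ref, hyp):.1f}")
```

Multiplying the result by 100 gives a percentage on the same scale as the WER % values listed above. Under this definition a lower value is better, and the value can exceed 100 when the hypothesis contains many inserted words.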