Vision (Image Understanding)

Data as of June 12, 2026

#	Model	Vendor	Released	Params	Context	Price	Open weights (● = yes)	Arena [B]	MMMU-Pro [A]	MathVis+Py [B]	V*+Py [B]
1	Claude Opus 4.6	Anthropic	2026/2	?	200K	$5/$25	○	1295 (#1)	-	84.6 (#5)	86.4 (#5)
2	Gemini 3 Pro	Google	2025/11	?	1M	$2/$12	○	1287 (#2)	-	-	-
3	GPT 5.2	OpenAI	2025/12	?	128K	$2/$14	○	1278 (#3)	-	-	-
4	Claude Sonnet 4.6	Anthropic	2026/2	?	200K	$3/$15	○	1271 (#4)	-	-	-
5	Gemini 3 Flash	Google	2025/12	?	1M	$0.50/$3	○	1268 (#5)	-	-	-
6	Dola Seed 2.0	ByteDance	2026/2	?	128K	-	○	1257 (#6)	-	-	-
7	GLM 5V Turbo	Z.ai	2026/4	744B (40B active, MoE)	200K	$1/$4	○	-	-	-	-
8	GPT 5.4	OpenAI	2026/3	?	128K	$2/$14	○	-	-	96.1 (#1)	98.4 (#1)
9	GPT 5.1	OpenAI	2025/11	?	128K	$2/$14	○	1249 (#7)	-	-	-
10	Mistral Medium 3.5	Mistral	2026/4	128B	256K	$2/$8	●	-	-	-	-
11	Claude Opus 4.8	Anthropic	2026/5	?	200K	$15/$75	○	-	-	-	-
12	Claude Opus 4.7	Anthropic	2026/4	?	200K	$15/$75	○	-	-	-	-
13	Claude Haiku 4.5	Anthropic	2025/10	?	200K	$1/$5	○	-	-	-	-
14	GPT-5.5	OpenAI	2026/4	?	400K	-	○	-	-	-	-
15	Gemini 3.1 Pro	Google	2026/2	?	1M	$2/$12	○	-	-	95.7 (#2)	96.9 (#2)
16	Kimi K2.6	Moonshot	2026/4	1T	128K	-	●	-	79.4 (#1)	93.2 (#3)	96.9 (#3)
17	Ming-flash-omni-Preview	InclusionAI	2025/10	?	?	-	●	-	-	-	-
18	GLM 5.1	Z.ai	2026/4	-	-	-	●	-	-	-	-
19	DeepSeek V4	DeepSeek	2026/4	-	-	-	○	-	-	-	-
20	Qwen 3.6 Plus	Alibaba	2026/4	-	-	-	●	-	-	-	-
21	DeepSeek V3.2	DeepSeek	2026/2	-	-	-	●	-	-	-	-
22	GLM 4.6	Z.ai	2025/11	-	-	-	●	-	-	-	-
23	MiniMax M3	MiniMax	2026/6	?	1M	$0.60/$2	●	-	-	-	-
24	Gemini 3.5 Flash	Google	2026/5	?	1M	$2/$9	○	-	-	-	-
25	Qwen3.7 Plus	Alibaba	2026/6	?	1M	$0.40/$2	○	-	-	-	-
26	Qwen3.7 Max	Alibaba	2026/5	?	1M	$3/$8	○	-	-	-	-
27	Gemini 2.5 Pro	Google	2025/3	?	1M	$1/$10	○	1246 (#8)	-	-	-
28	Kimi K2.5 Thinking	Moonshot	2026/1	1T	128K	-	○	1245 (#9)	-	85.0 (#4)	86.9 (#4)
29	Grok 4.20 Reasoning	xAI	-	?	128K	-	○	1243 (#10)	-	-	-
30	Qwen 3.5 397B	Alibaba	2026/2	397B	128K	-	●	1240 (#11)	-	-	-
31	GPT 5	OpenAI	2025/8	?	128K	$2/$14	○	1225 (#12)	-	-	-
32	Qwen 3 VL 235B	Alibaba	2025/9	235B	128K	-	●	1215 (#13)	-	-	-
33	Llama 4 Maverick	Meta	2025/4	400B	1M	-	●	1147 (#14)	-	-	-

Image Generation

Data as of June 9, 2026

#	Model	Vendor	Released	Resolution	Price	Open weights (● = yes)	Arena [B]	Notes
1	GPT Image 2	OpenAI	2026/4	?	-	○	1385 (#1)	Successor to GPT-Image-1.5; supports detailed instruction following and accurate placement; available on Vercel AI Gateway
2	Reve 2.0	Reve	2026/6	4K	-	○	1273 (#2)	#2 on text-to-image arena. Pitched as 'best 4K image model'; introduces layout-based generation/editing: precise control over where every object and text region lands
3	Gemini 3.1 Flash Image	Google	2026/2	2048×2048	$0.04	○	1269 (#3)	-
4	MAI Image 2.5	Microsoft AI	2026/5	2048×2048	-	○	1253 (#4)	#4 on text-to-image arena (#3 per Microsoft), #2 on image-to-image; successor to MAI Image 2
5	GPT Image 1.5 HF	OpenAI	2025/12	2048×2048	$0.08	○	1241 (#5)	-
6	Grok Imagine (Quality)	xAI	2026/5	-	-	○	1234 (#6)	Quality variant added to LMArena May 6 2026
7	Gemini 3 Pro Image	Google	2025/11	4K	$0.24	○	1232 (#7)	-
8	Ideogram 4.0	Ideogram	2026/6	2048×2048	-	●	1204 (#8)	#9 overall on text-to-image arena, #1 open-weight model. Open weights at launch; trained with bounding boxes tied to region descriptions for layout control; excels at text rendering and commercial design
9	Uni 1.1 Max	Luma AI	2026/5	-	-	○	1192 (#9)	-
10	MAI Image 2	Microsoft AI	-	2048×2048	-	○	1183 (#10)	-
11	Uni 1.1	Luma AI	2026/5	-	-	○	1175 (#11)	-
12	Grok Imagine	xAI	2025/7	2048×2048	free	○	1172 (#12)	-
13	Qwen Image 2.0 Pro	Alibaba	2026/4	-	-	●	1168 (#13)	-
14	Reve v1.5	Reve	-	2048×2048	-	○	1164 (#14)	-
15	Flux 2 Max	Black Forest Labs	-	2048×2048	$0.08	○	1164 (#15)	-
16	Grok Imagine Pro	xAI	-	-	-	○	1160 (#16)	-
17	Riverflow 2.0 Pro	Sourceful	2025/11	4K	-	○	-	Agentic text-to-image with autonomous self-correction; #1 Text-to-Image & Image Editing on Artificial Analysis (Feb 2026). $0.15 per 1K/2K image, $0.33 per 4K.
18	Flux 2 Pro	Black Forest Labs	-	-	-	○	1156 (#17)	-
19	Flux 2 Flex	Black Forest Labs	-	-	-	○	1156 (#18)	-
20	Gemini 2.5 Flash Image (Nano Banana)	Google	2025/8	1024×1024	$0.03	○	1152 (#19)	Codename 'Nano Banana' on LMArena leaderboard; conversational image editing, character consistency
21	Hunyuan Image 3.0	Tencent	2025/9	2048×2048	free	●	1151 (#20)	China API
22	Flux 2 Dev	Black Forest Labs	-	-	-	●	1149 (#21)	-
23	Imagen Ultra 4.0	Google	-	2048×2048	$0.06	○	1148 (#22)	-
24	Seedream 4.5	ByteDance	2025/12	2048×2048	$0.03	○	1142 (#23)	China API
25	Seedream 4 2K	ByteDance	-	2048×2048	-	○	1141 (#24)	-
26	Wan 2.6 T2I	Alibaba	2025/12	1536×1536	free	●	1134 (#25)	-
27	Imagen 4.0	Google	-	-	-	○	1130 (#26)	-
28	Qwen Image 2512	Alibaba	-	-	-	●	1130 (#27)	-
29	GPT Image 1	OpenAI	2025/4	2048×2048	$0.04	○	1115 (#28)	-
30	Recraft v4	Recraft	-	2048×2048	$0.04	○	1099 (#29)	-
31	Ideogram v3	Ideogram	2025/3	2048×2048	$0.08	○	1049 (#30)	-
32	DALL-E 3	OpenAI	2023/11	1792×1024	$0.08	○	968.0 (#31)	-
33	SD 3.5 Large	Stability AI	2024/10	1024×1024	free	●	938.0 (#32)	-

Video Generation

Data as of June 9, 2026

#	Model	Vendor	Released	Duration	Resolution	Price	Open weights (● = yes)	Arena [B]	Notes
1	Seedance 2.0	ByteDance	2026/2	15s	1080p	$0.10	○	1450 (#1)	China API; via Dreamina
2	Veo 3.1	Google	2025/10	8s	4K	$0.75	○	1371 (#2)	Vertex AI only
3	Sora 2 Pro	OpenAI	2025/9	25s	1080p	$0.50	○	1364 (#3)	SHUTTING DOWN Apr 26
4	Grok Video	xAI	-	-	720p	-	○	1361 (#4)	-
5	Wan 2.6	Alibaba	2025/12	10s	720p	free	●	1349 (#5)	Open weights
6	Veo 3	Google	2025/5	8s	1080p	$0.50	○	1341 (#6)	-
7	Sora 2	OpenAI	2025/9	25s	1080p	$0.25	○	1340 (#7)	SHUTTING DOWN Apr 26
8	Seedance 1.5 Pro	ByteDance	2025/12	15s	720p	$0.05	○	1259 (#8)	China API
9	Runway Gen 4.5	Runway	2025/3	10s	1080p	$0.17	○	1242 (#9)	-
10	Kling 3.0	Kuaishou	2026/2	10s	4K	$0.02	○	-	Leads text-to-video arena; native 4K, Multi-Shot Storyboard, dual audio inputs. Extends to 3 min on paid tiers.
11	HappyHorse 1.0	Alibaba	2026/4	15s	1080p	$0.22	○	-	Tops Artificial Analysis Video Arena (T2V & I2V). API/demo only as of Jun 2026; weights NOT released (HF/GitHub 'coming soon'), so the 'open-source' framing is marketing. ~15B params, single-stream transformer, 8-step CFG-free (10s 1080p in ~32s), synced audio. Access via Alibaba Cloud Bailian / fal.
12	PixVerse v5.6	PixVerse	-	-	1080p	-	○	1238 (#10)	-
13	Kling 2.6 Pro	Kuaishou	2025/12	15s	1080p	$0.04	○	1218 (#11)	Cheapest quality option
14	Ray 3	Luma AI	-	-	1080p	-	○	1207 (#12)	-
15	Hailuo 2.3	MiniMax	2025/10	6s	1080p	-	○	1197 (#13)	China API
16	LTX-2.3	Lightricks	2026/5	20s	1080p	free	●	1185 (#14)	Open weights; synced audio+video; native 9:16 portrait; fp8 + distilled variants
17	LTX-2	Lightricks	2026/1	20s	4K	free	●	1175 (#15)	Open weights; first open audiovisual model; 4K @ 50fps
18	Hunyuan Video 1.5	Tencent	2025/11	5s	720p	free	●	1170 (#16)	Open weights; China API
19	Veo 2	Google	2024/12	8s	4K	$0.30	○	1164 (#17)	-
20	Pika 2.2	Pika	2025/2	10s	1080p	-	○	1009 (#18)	-

Visual Factual Knowledge (WorldVQA)

Data as of April 26, 2026

#	Model	Vendor	Category	Released	Open weights (● = yes)	Overall % [B]	Notes
1	Gemini 3 Pro	Google	Frontier VLM	2025/11	○	47.5 (#1)	Top overall F-score on WorldVQA
2	Kimi K2.5	Moonshot	Frontier VLM	2026/1	●	46.8 (#2)	Author's own model; trails #1 by 0.7 points
3	Claude Opus 4.5	Anthropic	Frontier VLM	2025/9	○	37.5 (#3)	Strong on Head categories, weaker on Tail
4	Seed-1.5-vision-pro	ByteDance	Frontier VLM	2025/10	○	35.2 (#4)	ByteDance Seed visual frontier

Document Parsing (ParseBench)

Data as of April 25, 2026

#	Model	Vendor	Category	Released	Open weights (● = yes)	Overall [B]	Tables [B]	Charts [B]	Faithful [B]	Format [B]	Ground [B]	Notes
1	LlamaParse Agentic	LlamaIndex	Agentic parser	2026/4	○	84.9 (#1)	90.7 (#3)	78.1 (#1)	89.7 (#3)	85.2 (#1)	80.6 (#1)	Highest overall; agentic loop with retries
2	Gemini 3 Flash (Thinking High)	Google	VLM (high-reasoning)	2025/12	○	75.0 (#2)	91.5 (#1)	64.8 (#7)	90.9 (#1)	68.3 (#3)	59.8 (#6)	Tables SOTA; weak on grounding
3	Reducto (Agentic)	Reducto	Agentic parser	2024/8	○	73.0 (#3)	80.4 (#8)	73.4 (#2)	86.4 (#6)	57.6 (#8)	67.1 (#5)	Specialist parsing service; agentic mode
4	LlamaParse Cost Effective	LlamaIndex	Cost-tier parser	2026/4	○	71.9 (#4)	73.2 (#9)	66.7 (#3)	88.0 (#4)	73.0 (#2)	58.6 (#7)	<$0.004/page; budget tier of LlamaParse
5	Gemini 3 Flash (Thinking Minimal)	Google	VLM (light-reasoning)	2025/12	○	71.0 (#5)	89.8 (#5)	64.8 (#6)	86.2 (#8)	58.4 (#7)	56.0 (#8)	Tradeoff vs Thinking High: cheaper, formatting drops
6	Chandra-ocr-2	Adobe Research	Specialized OCR	2025/9	○	70.1 (#6)	89.2 (#6)	65.1 (#5)	83.7 (#10)	61.4 (#4)	51.2 (#9)	Tables strong; grounding lags
7	Gemini 3.1 Pro	Google	VLM (frontier)	2026/2	○	69.1 (#7)	91.0 (#2)	41.1 (#9)	90.2 (#2)	52.4 (#10)	71.0 (#2)	Second-highest faithfulness; charts surprisingly weak
8	Reducto	Reducto	Specialist parser	2024/8	○	67.8 (#8)	70.3 (#10)	57.0 (#8)	86.4 (#7)	56.8 (#9)	68.7 (#3)	Non-agentic mode; tied with Extend
9	Extend (Beta)	Extend	Document AI	2024/11	○	67.8 (#9)	85.9 (#7)	40.4 (#10)	85.0 (#9)	59.5 (#6)	68.3 (#4)	YC-backed structured-data extractor
10	GPT-5.5 (Reasoning Medium)	OpenAI	VLM (reasoning)	2026/4	○	67.8 (#10)	90.0 (#4)	65.5 (#4)	86.8 (#5)	60.1 (#5)	36.3 (#10)	Tables strong; grounding worst in top 10
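
In the ParseBench rows above, each Overall score sits within about 0.1 of the unweighted mean of the five sub-scores (Tables, Charts, Faithful, Format, Ground). This is an observation from the published numbers, not a documented scoring rule, so the sketch below only checks that reading. The name `mean_gap` is a hypothetical helper introduced for illustration.

```python
# Scores copied from the ParseBench table: (overall, [tables, charts, faithful, format, ground]).
ROWS = {
    "LlamaParse Agentic": (84.9, [90.7, 78.1, 89.7, 85.2, 80.6]),
    "Gemini 3 Flash (Thinking High)": (75.0, [91.5, 64.8, 90.9, 68.3, 59.8]),
    "Reducto (Agentic)": (73.0, [80.4, 73.4, 86.4, 57.6, 67.1]),
    "LlamaParse Cost Effective": (71.9, [73.2, 66.7, 88.0, 73.0, 58.6]),
    "Gemini 3 Flash (Thinking Minimal)": (71.0, [89.8, 64.8, 86.2, 58.4, 56.0]),
    "Chandra-ocr-2": (70.1, [89.2, 65.1, 83.7, 61.4, 51.2]),
    "Gemini 3.1 Pro": (69.1, [91.0, 41.1, 90.2, 52.4, 71.0]),
    "Reducto": (67.8, [70.3, 57.0, 86.4, 56.8, 68.7]),
    "Extend (Beta)": (67.8, [85.9, 40.4, 85.0, 59.5, 68.3]),
    "GPT-5.5 (Reasoning Medium)": (67.8, [90.0, 65.5, 86.8, 60.1, 36.3]),
}


def mean_gap(overall: float, parts: list[float]) -> float:
    """Absolute gap between the published overall score and the plain mean of the parts."""
    return abs(sum(parts) / len(parts) - overall)


# Assumption being tested: Overall is (close to) the unweighted mean of the sub-scores.
gaps = {name: mean_gap(overall, parts) for name, (overall, parts) in ROWS.items()}
max_gap = max(gaps.values())

for name, gap in gaps.items():
    print(f"{name}: gap {gap:.2f}")
```

The small residual gaps are what rounding of the sub-scores to one decimal place would produce, which is why the three rows listed at 67.8 cannot be separated from the displayed values alone.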

AI Image Detection (AdversIm)

Data as of April 25, 2026

#	Model	Vendor	Role	Perturbed − Clean (pts)	Released	License	Open weights (● = yes)	Clean % [C]	Perturbed % [B]	Notes
1	Gemini 3 Pro Image	Google	Detector + Generator	-95	2025/11	Proprietary (API)	○	100.0 (#1)	5.0 (#5)	Top clean-image detection; collapses under blur/noise/JPEG
2	GPT-Image-1.5	OpenAI	Detector + Generator	-94	2025/9	Proprietary (API)	○	100.0 (#2)	6.0 (#2)	Tied for clean-image SOTA; equally fragile under perturbation
3	Grok Imagine	xAI	Generator (most evasive)	-	2025/8	Proprietary (X Premium+)	○	-	7.0 (#1)	Most evasive generator under perturbation in the benchmark
4	Qwen Image	Alibaba	Detector + Generator	-94	2025/8	Apache 2.0	●	100.0 (#3)	6.0 (#3)	Only fully open-weight entry; same fragility as closed peers
5	Seedream v4.5	ByteDance	Detector + Generator	-93	2025/10	Proprietary (API)	○	99.0 (#4)	6.0 (#4)	ByteDance Seed image gen; detection nearly matches frontier
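
The -95, -94 and -93 values in the AdversIm rows above match the perturbed score minus the clean score, in percentage points. A minimal sketch of that arithmetic follows, assuming this reading of the column. `robustness_drop` is a hypothetical helper name, not part of the benchmark, and Grok Imagine is left out because it has no clean-image score.

```python
# (clean %, perturbed %) copied from the AdversIm table.
SCORES = {
    "Gemini 3 Pro Image": (100.0, 5.0),
    "GPT-Image-1.5": (100.0, 6.0),
    "Qwen Image": (100.0, 6.0),
    "Seedream v4.5": (99.0, 6.0),
}


def robustness_drop(clean: float, perturbed: float) -> float:
    """Change in score, in percentage points, once images are perturbed."""
    return perturbed - clean


drops = {name: robustness_drop(clean, perturbed)
         for name, (clean, perturbed) in SCORES.items()}

for name, drop in drops.items():
    print(f"{name}: {drop:+.0f} points")
```

Read this way, every entry with a clean score loses at least 93 points under perturbation, which is what the notes mean by "collapses" and "equally fragile".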

Face-Swap / Avatar / Synthetic Media

Data as of April 24, 2026

#	Model	Vendor	Released	Type	Max duration	Resolution	License	Open (● = yes)	Notes
1	DeepFaceLab	iperov	2024/4	Face-swap (open source)	unlimited	source-dependent	GPLv3	●	Canonical OSS face-swap toolkit
2	LivePortrait	Kuaishou	2024/7	Portrait animation	unlimited	512-1024	MIT	●	Driven by reference video motion
3	FaceFusion 3	Henry Ruhs	2025/3	Face-swap (open source)	unlimited	source-dependent	MIT	●	GPU + CPU pipelines; active dev
4	Hallo3	Fudan / Baidu	2025/3	Audio-driven talking-head	unlimited	1024	MIT	●	Voice-driven portrait animation
5	D-ID Creative Reality Studio	D-ID	2025/6	Avatar / talking-head	5min	1080p	Proprietary	○	Widely used for photo-to-video
6	Pika 2.2	Pika Labs	2025/7	Text/image-to-video	10s	1080p	Proprietary	○	Pikaffects for stylised motion
7	MiniMax Hailuo 02	MiniMax	2025/8	Text/image-to-video	10s	1080p	Proprietary	○	Lowest-cost frontier video gen
8	Kling 2.0	Kuaishou	2025/9	Text/image-to-video	10s	1080p	Proprietary	○	Strong face/body motion realism
9	Luma Ray 2	Luma AI	2025/9	Text/image-to-video	10s	1080p	Proprietary	○	Strong physics fidelity
10	HeyGen Avatar V5	HeyGen	2025/10	Avatar / talking-head	5min	4K	Proprietary	○	Most-used commercial deepfake stack
11	Runway Gen-4	Runway	2025/10	Text/image-to-video	16s	1080p	Proprietary API	○	Character and scene consistency focus
12	Veo 3	Google DeepMind	2025/11	Text-to-video (with audio)	60s	4K	Proprietary (Gemini API, Flow)	○	SynthID watermark on all outputs
13	Sora 2	OpenAI	2025/12	Text-to-video	60s	1080p	Proprietary (ChatGPT Pro/Plus)	○	C2PA provenance watermarking
14	Flux Kontext	Black Forest Labs	2026/3	Image edit / face-swap	n/a	2K	Open weights	●	Reference-driven image editing; face-swap capable
15	Seedance 2.0	ByteDance	2026/4	Text/image-to-video	10s	1080p	Proprietary (Venice, CapCut)	○	#1 in public video arena (2026-04)