Rundale Bench — Models

All models

Every model the bench has seen, with whichever signals are available. A "—" means that column has not yet been measured for that model. Click a column header to sort; click a row to drill into per-prompt samples and per-provider performance.

Model	Type	Dialog	Gaeilge	p50 (ms)	tok/s	$/hr	Providers
x-ai/grok-4.3	cloud	5.00	—	3183	207.5	—	1
mimo-v2.5-pro	cloud	4.58	2.92	7714	354.6	$0.87	1
qwen3.5-plus	cloud	4.38	4.85	81710	2081.4	$0.20	1
kimi-k2.5	cloud	4.36	4.90	6759	—	$0.57	1
minimax-m2.5	cloud	4.32	3.73	5504	43.3	$0.27	1
mimo-v2.5	cloud	4.26	4.52	4196	469.6	$0.38	1
kimi-k2.6	cloud	4.22	4.90	7269	—	$0.87	1
glm-5.1	cloud	4.16	4.55	7571	279.7	$1.23	1
glm-5	cloud	4.06	4.67	7848	167.4	$0.88	1
minimax-m2.7	cloud	4.06	3.15	2781	4975.4	$0.27	1
deepseek-v4-flash	cloud	3.96	4.98	3799	—	$0.12	1
deepseek-v4-pro	cloud	3.92	5.00	6335	—	$1.44	1
qwen3.6-plus	cloud	3.86	5.00	55809	961.3	$0.50	1
mlx-community/Qwen2.5-14B-Instruct-4bit	local	—	2.11	—	—	—	—
amazon/nova-pro-v1	cloud	—	—	855	65.9	—	1
anthropic/claude-haiku-4.5	cloud	—	—	1989	81.6	—	1
anthropic/claude-opus-4.7	cloud	—	—	3861	39.0	—	1
anthropic/claude-sonnet-4.6	cloud	—	—	3221	44.1	—	1
deepseek/deepseek-v3.2	cloud	—	—	2704	22.2	—	1
deepseek/deepseek-v4-pro	cloud	—	—	6170	39.2	—	1
google/gemini-2.5-flash	cloud	—	—	1032	99.5	—	1
google/gemini-2.5-pro	cloud	—	—	3331	143.3	—	1
google/gemma-3-27b-it	cloud	—	—	2328	37.9	—	1
google/gemma-4-31b-it	cloud	—	—	3214	21.9	—	1
meta-llama/llama-3.3-70b-instruct	cloud	—	—	2339	43.2	$0.11	1
meta-llama/llama-4-maverick	cloud	—	—	1601	51.1	—	1
meta-llama/llama-4-scout	cloud	—	—	1419	54.2	—	1
microsoft/phi-4	cloud	—	—	1674	67.8	—	1
mistralai/mistral-large-2512	cloud	—	—	1714	44.1	—	1
mistralai/mistral-medium-3.1	cloud	—	—	2089	50.4	—	1
mistralai/mistral-small-24b-instruct-2501	cloud	—	—	838	77.4	—	1
moonshotai/kimi-k2.5	cloud	—	—	7220	37.6	—	1
nousresearch/hermes-4-405b	cloud	—	—	5623	39.4	—	1
openai/gpt-4o-mini	cloud	—	—	1243	192.7	—	1
openai/gpt-5.4	cloud	—	—	2677	34.7	—	1
openai/gpt-5.4-mini	cloud	—	—	1389	75.5	—	1
openai/gpt-5.5	cloud	—	—	4067	60.0	—	1
openai/gpt-oss-120b	cloud	—	—	3072	62.7	—	1
qwen/qwen-2.5-72b-instruct	cloud	—	—	5242	—	—	1
qwen/qwen3-235b-a22b-2507	cloud	—	—	1696	54.2	—	1
qwen/qwen3-max	cloud	—	—	2904	32.6	—	1
x-ai/grok-3-mini	cloud	—	—	—	—	—	1
z-ai/glm-4.6	cloud	—	—	16883	34.7	—	1

Rows with quality scores are ranked by overall dialogue score by default. A model with neither a Dialog nor a Gaeilge score has only performance data so far; run the funnel against it to fill in the quality columns.
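As a rough illustration of the default ordering, the Python sketch below parses a few rows copied from the table (tab-separated, with "—" for unmeasured values) and ranks them by Dialog score, highest first. The names `parse_rows` and `rank_by_dialog` are hypothetical and not part of the bench. The placement of unscored rows (Gaeilge-only models next, then perf-only models alphabetically) is an assumption inferred from the row order shown above, not a documented rule.

```python
from typing import Optional

MISSING = "—"  # the table's marker for "not yet measured"

# A few rows copied from the table: Model, Type, Dialog, Gaeilge.
SAMPLE = (
    "amazon/nova-pro-v1\tcloud\t—\t—\n"
    "kimi-k2.5\tcloud\t4.36\t4.90\n"
    "mlx-community/Qwen2.5-14B-Instruct-4bit\tlocal\t—\t2.11\n"
    "x-ai/grok-4.3\tcloud\t5.00\t—\n"
    "deepseek-v4-pro\tcloud\t3.92\t5.00\n"
)


def to_score(cell: str) -> Optional[float]:
    """Convert a table cell to a float, or None when it is unmeasured."""
    cell = cell.strip()
    return None if cell == MISSING else float(cell)


def parse_rows(text: str) -> list[dict]:
    """Parse tab-separated rows into dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        model, kind, dialog, gaeilge = line.split("\t")[:4]
        rows.append({
            "model": model,
            "type": kind,
            "dialog": to_score(dialog),
            "gaeilge": to_score(gaeilge),
        })
    return rows


def rank_by_dialog(rows: list[dict]) -> list[dict]:
    """Sort by Dialog descending; unscored rows go last.

    Assumed tie-breaks: among rows without a Dialog score, those with a
    Gaeilge score come first; remaining ties fall back to model name.
    """
    def key(row: dict):
        dialog = row["dialog"]
        no_dialog = dialog is None
        no_gaeilge = no_dialog and row["gaeilge"] is None
        return (no_dialog, no_gaeilge, -(dialog or 0.0), row["model"])

    return sorted(rows, key=key)


for row in rank_by_dialog(parse_rows(SAMPLE)):
    print(row["model"], row["dialog"], row["gaeilge"])
```

Representing "—" as `None` rather than 0 keeps unmeasured models from being mistaken for models that scored poorly.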