All models

Every model the bench has seen, with whichever signals are available. An empty cell means that column has not been measured yet. Click a column header to sort; click a row to drill into per-prompt samples and per-provider perf.

Model | Type | Dialog ▼ | Gaeilge | p50 ms | tok/s | $/hr | Provs
x-ai/grok-4.3cloud5.003183207.51
Xiaomimimo-v2.5-procloud4.582.927714354.6$0.871
QWenqwen3.5-pluscloud4.384.85817102081.4$0.201
Moonshot AIkimi-k2.5cloud4.364.906759$0.571
MiniMaxminimax-m2.5cloud4.323.73550443.3$0.271
Xiaomimimo-v2.5cloud4.264.524196469.6$0.381
Moonshot AIkimi-k2.6cloud4.224.907269$0.871
Gglm-5.1cloud4.164.557571279.7$1.231
Gglm-5cloud4.064.677848167.4$0.881
MiniMaxminimax-m2.7cloud4.063.1527814975.4$0.271
DeepSeekdeepseek-v4-flashcloud3.964.983799$0.121
DeepSeekdeepseek-v4-procloud3.925.006335$1.441
QWenqwen3.6-pluscloud3.865.0055809961.3$0.501
mlx-community/Qwen2.5-14B-Instruct-4bitlocal2.11
amazon/nova-pro-v1cloud85565.91
anthropic/claude-haiku-4.5cloud198981.61
anthropic/claude-opus-4.7cloud386139.01
anthropic/claude-sonnet-4.6cloud322144.11
deepseek/deepseek-v3.2cloud270422.21
deepseek/deepseek-v4-procloud617039.21
google/gemini-2.5-flashcloud103299.51
google/gemini-2.5-procloud3331143.31
google/gemma-3-27b-itcloud232837.91
google/gemma-4-31b-itcloud321421.91
Metameta-llama/llama-3.3-70b-instructcloud233943.2$0.111
meta-llama/llama-4-maverickcloud160151.11
meta-llama/llama-4-scoutcloud141954.21
microsoft/phi-4cloud167467.81
mistralai/mistral-large-2512cloud171444.11
mistralai/mistral-medium-3.1cloud208950.41
mistralai/mistral-small-24b-instruct-2501cloud83877.41
moonshotai/kimi-k2.5cloud722037.61
nousresearch/hermes-4-405bcloud562339.41
openai/gpt-4o-minicloud1243192.71
openai/gpt-5.4cloud267734.71
openai/gpt-5.4-minicloud138975.51
openai/gpt-5.5cloud406760.01
openai/gpt-oss-120bcloud307262.71
qwen/qwen-2.5-72b-instructcloud52421
qwen/qwen3-235b-a22b-2507cloud169654.21
qwen/qwen3-maxcloud290432.61
x-ai/grok-3-minicloud1
z-ai/glm-4.6cloud1688334.71

Rows with quality scores are ranked by overall dialogue score by default. A model with neither a dialogue nor a Gaeilge score has only perf data so far; run the funnel against it to fill the quality columns.
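
The default ordering described above can be sketched as follows. This is a minimal illustration, not the bench's actual implementation: the `Row` type, the function names, and the sample scores are all hypothetical, and it assumes an unmeasured column is represented as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    """One table row (hypothetical shape); None means not yet measured."""
    model: str
    dialog: Optional[float] = None
    gaeilge: Optional[float] = None


def default_order(rows: List[Row]) -> List[Row]:
    # Rows with a dialogue score come first, highest score first.
    # Rows without one sort last; sorted() is stable, so they keep
    # their original relative order.
    return sorted(rows, key=lambda r: (r.dialog is None, -(r.dialog or 0.0)))


def perf_only(rows: List[Row]) -> List[str]:
    # Models missing both quality scores: candidates for a funnel run.
    return [r.model for r in rows if r.dialog is None and r.gaeilge is None]


# Hypothetical sample data, not taken from the table above.
rows = [
    Row("perf-only-x"),
    Row("model-a", dialog=4.1, gaeilge=3.9),
    Row("model-b", dialog=4.7),
    Row("perf-only-y"),
]

for row in default_order(rows):
    print(row.model, row.dialog, row.gaeilge)
print("needs funnel:", perf_only(rows))
```

Sorting on a `(is_missing, -score)` tuple keeps unmeasured rows out of the ranking without assigning them a fake score.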