Dialogue quality leaderboard

Judged by claude-sonnet-4-6 · suite v1 · generated 2026-05-26. Scores are means on a 1–5 scale over the dialogue slice (Brigid O'Brien, 1820 rural Ireland).

Model                                         Overall ▼  char  auth  lang  resp  craf  n
x-ai/grok-4.3 ⚖ judge_v1 (qwen3-235b)         5.00       5.00  5.00  5.00  5.00  5.00  15
Xiaomi mimo-v2.5-pro ⚖ judge_sonnet_v1        4.58       4.50  4.40  4.90  5.00  4.10  10
QWen qwen3.5-plus ⚖ judge_sonnet_v1           4.38       4.00  4.00  5.00  4.90  4.00  10
Moonshot AI kimi-k2.5 ⚖ judge_sonnet_v1       4.36       4.10  4.10  4.90  4.70  4.00  10
MiniMax minimax-m2.5 ⚖ judge_sonnet_v1        4.32       4.30  4.40  4.40  4.90  3.80  10
Xiaomi mimo-v2.5 ⚖ judge_sonnet_v1            4.26       4.10  4.00  4.70  4.60  3.90  10
Moonshot AI kimi-k2.6 ⚖ judge_sonnet_v1       4.22       3.90  4.10  4.60  4.70  3.80  10
G glm-5.1 ⚖ judge_sonnet_v1                   4.16       4.00  4.10  4.10  4.70  3.90  10
G glm-5 ⚖ judge_sonnet_v1                     4.06       4.00  4.00  4.00  4.50  3.80  10
MiniMax minimax-m2.7 ⚖ judge_sonnet_v1        4.06       4.10  4.10  4.30  4.10  3.70  10
DeepSeek deepseek-v4-flash ⚖ judge_sonnet_v1  3.96       3.90  3.90  4.00  4.30  3.70  10
DeepSeek deepseek-v4-pro ⚖ judge_sonnet_v1    3.92       4.00  4.00  4.00  4.10  3.50  10
QWen qwen3.6-plus ⚖ judge_sonnet_v1           3.86       3.80  3.80  4.00  4.20  3.50  10

Quality is judged once per model on its cheapest provider. Speed and price vary by provider — see the perf matrix.
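
The page does not say how Overall is derived from the five rubric columns (char, auth, lang, resp, craf). A minimal sketch, assuming Overall is meant to be their unweighted mean, can check that assumption against the listed values. The three rows are copied from the table; the function names `rubric_mean` and `mismatches` are hypothetical helpers, not part of the leaderboard.

```python
from statistics import mean

# Rows copied from the table: (model, overall, char, auth, lang, resp, craf).
ROWS = [
    ("mimo-v2.5-pro", 4.58, 4.50, 4.40, 4.90, 5.00, 4.10),
    ("qwen3.5-plus", 4.38, 4.00, 4.00, 5.00, 4.90, 4.00),
    ("minimax-m2.5", 4.32, 4.30, 4.40, 4.40, 4.90, 3.80),
]


def rubric_mean(scores):
    """Unweighted mean of the rubric columns, rounded to two decimals.

    Assumption: equal weights. The leaderboard does not state its weighting.
    """
    return round(mean(scores), 2)


def mismatches(rows, tol=0.005):
    """Return (model, listed overall, recomputed mean) where the two disagree."""
    out = []
    for model, overall, *rubric in rows:
        recomputed = rubric_mean(rubric)
        if abs(recomputed - overall) > tol:
            out.append((model, overall, recomputed))
    return out


print(mismatches(ROWS))
```

For these three rows the check flags only minimax-m2.5, whose listed Overall (4.32) differs from the recomputed mean (4.36). That may mean Overall is aggregated some other way, for example per item before averaging, so treat the equal-weight mean as an approximation rather than the leaderboard's definition.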