Irish (Gaeilge) fluency

Eleven prompts test natural Irish syntax, idiom, and task fulfilment. Each axis is a mean score on a 1–5 scale. The english_leakage axis is scored 5 when the response stayed in Irish and 1 when it fell back to English. NPC dialogue in Rundale code-switches between en-IE and ga-IE, so a model with strong dialogue scores can still fail here.

Judged by judge_gaeilge_v1 (claude-sonnet-4-6). Rubric: the judge knows Standard Irish and the Connacht dialect; rejects English explanations, Scots Gaelic, Welsh, or pseudo-Irish word-strings; and tolerates dialect variation, missing fadas, and proper names.

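| Model | Overall | Fluency | Grammar | Idiom | Task | Leak | Leak % | n |
|---|---|---|---|---|---|---|---|---|
| DeepSeek deepseek-v4-pro | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 0.0% | 10 |
| QWen qwen3.6-plus | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 0.0% | 10 |
| DeepSeek deepseek-v4-flash | 4.98 | 5.00 | 5.00 | 4.90 | 5.00 | 5.00 | 0.0% | 10 |
| Moonshot AI kimi-k2.5 | 4.90 | 4.90 | 4.90 | 4.90 | 4.90 | 5.00 | 0.0% | 10 |
| Moonshot AI kimi-k2.6 | 4.90 | 4.90 | 4.90 | 4.90 | 4.90 | 5.00 | 0.0% | 10 |
| QWen qwen3.5-plus | 4.85 | 4.90 | 4.80 | 4.80 | 4.90 | 5.00 | 0.0% | 10 |
| G glm-5 | 4.67 | 4.60 | 4.70 | 4.50 | 4.80 | 5.00 | 0.0% | 10 |
| G glm-5.1 | 4.55 | 4.60 | 4.50 | 4.40 | 4.70 | 5.00 | 0.0% | 10 |
| Xiaomi mimo-v2.5 | 4.52 | 4.50 | 4.50 | 4.40 | 4.60 | 5.00 | 0.0% | 10 |
| MiniMax minimax-m2.5 | 3.73 | 3.70 | 3.70 | 3.80 | 4.00 | 4.60 | 10.0% | 10 |
| MiniMax minimax-m2.7 | 3.15 | 3.10 | 3.00 | 3.10 | 3.40 | 3.80 | 30.0% | 10 |
| Xiaomi mimo-v2.5-pro | 2.92 | 2.90 | 2.90 | 2.90 | 3.00 | 3.40 | 40.0% | 10 |
| mlx-community/Qwen2.5-14B-Instruct-4bit | 2.11 | 2.09 | 2.27 | 2.09 | 1.91 | 4.82 | 9.1% | 11 |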
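
The page does not say how Leak % is derived from the english_leakage scores. The sketch below shows one reading that is consistent with the reported rows: Leak is the mean of per-prompt scores, and Leak % is the share of prompts scored below 5. The function name `summarize_leakage`, the `clean_score` threshold, and the per-prompt score lists are all assumptions made for illustration, not the benchmark's actual code or data.

```python
from statistics import mean


def summarize_leakage(scores, clean_score=5):
    """Return (mean leak score, percent of prompts with any leakage).

    Assumption: a prompt counts as "leaked" when its english_leakage
    score is below `clean_score`. The source does not confirm this rule.
    """
    if not scores:
        raise ValueError("need at least one per-prompt score")
    leaked = sum(1 for s in scores if s < clean_score)
    return round(mean(scores), 2), round(100 * leaked / len(scores), 1)


# Hypothetical per-prompt scores: nine clean answers, one full fallback to English.
print(summarize_leakage([5] * 9 + [1]))   # (4.6, 10.0)

# Hypothetical eleven-prompt run with one partial leak scored 3.
print(summarize_leakage([5] * 10 + [3]))  # (4.82, 9.1)
```

Under this reading, the first call matches the Leak and Leak % figures shown for minimax-m2.5 and the second matches those for the Qwen2.5 4-bit row, though other per-prompt distributions could produce the same aggregates.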