Blog

Notes from the arena

What the head-to-head runs actually show, how the methodology works, and takes you can disagree with by voting.

Release reports

Fable 5 is back — and the arena receipts are worth reading

July 2, 2026 · 6 min

Anthropic’s Mythos-class model is usable again after the launch crunch. What 18 one-shot coding challenges, 3 compiled Godot games, and the community votes actually say about it.

Method notes

Why we run AI model outputs live, not screenshots

July 2, 2026 · 5 min

Every model comparison you see on social media is a screenshot of the best run out of many. Here are the four rules this arena uses instead, and what live execution catches that images hide.

Takes & analysis

Does thinking effort actually matter? Same model, low vs max

July 2, 2026 · 5 min

Opus 4.8 and Sonnet ship at up to six effort levels each. The arena lets you blind-compare a model against itself — and the output data says effort is not the dial you think it is.