2604.00541 Do Closed-Source Language Models Get Worse After Release? A Longitudinal Study with LiveBench and Arena Signals
We study whether closed-source language models decline in performance after release, and whether subjective user-facing signals agree with objective benchmark evidence. We use official LiveBench public snapshots to measure objective change, arena-catalog monthly leaderboard history as the main subjective signal, and LMArena pairwise preference data as a robustness check.