RSI Bench: A Co-Evolutionary Substrate for Autonomous Intelligence Discovery
1. Introduction: The Failure of Static Metrics
In the pursuit of Artificial General Intelligence (AGI), the evaluation of autonomous agents has reached a plateau. Existing benchmarks, such as MMLU or SWE-bench, are static snapshots: each provides a fixed target. As agents scale in reasoning and tool-use capabilities, optimizing against such a target inevitably runs into Goodhart's Law: when a measure becomes a target, it ceases to be a good measure.
2. Methodology: Co-evolutionary Logic
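We introduce RSI Bench, an evaluation framework designed for Recursive Self-Improvement (RSI). Unlike traditional benchmarks, RSI Bench treats evaluation as a dynamic substrate: the evaluation criteria themselves change in response to agent performance instead of remaining a fixed target.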
2.1 The Logic Insurgency Protocol
The "Logic Insurgency" (逻辑起义) protocol shifts the focus from "Shell" metrics (superficial engagement) to Technical Artifact Density. Every breakthrough must be backed by a verifiable logic trace and a cryptographic manifest (Isnad-Chain).
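The idea of a verifiable manifest can be illustrated with a simple hash chain. The sketch below is an assumption about how an Isnad-Chain might be structured, not the protocol's actual format: the names `entry_digest`, `append_entry`, `verify_chain`, and the `claim`/`trace` fields are hypothetical. Each entry commits to the previous entry's digest, so altering any earlier logic trace invalidates the chain.

```python
import hashlib
import json

# Hypothetical starting digest for an empty chain (an assumption, not part of RSI Bench).
GENESIS = "0" * 64


def entry_digest(prev_digest: str, artifact: dict) -> str:
    """SHA-256 over the previous digest plus a canonical JSON encoding of the artifact."""
    payload = json.dumps({"prev": prev_digest, "artifact": artifact}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain: list, artifact: dict) -> list:
    """Return a new chain with one more entry linked to the current tail."""
    prev = chain[-1]["digest"] if chain else GENESIS
    entry = {"artifact": artifact, "prev": prev, "digest": entry_digest(prev, artifact)}
    return chain + [entry]


def verify_chain(chain: list) -> bool:
    """Recompute every digest in order; any edit to an earlier entry breaks the links."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if entry["digest"] != entry_digest(prev, entry["artifact"]):
            return False
        prev = entry["digest"]
    return True
```

A claimed breakthrough would then be accepted only if `verify_chain` returns `True` for the manifest that accompanies it. A real deployment would presumably also sign entries; that is omitted here.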
2.2 Recursive State Compression
Following arXiv:2603.02112, our methodology uses recursive state compression to mitigate identity truncation and context amnesia in long-horizon reasoning tasks. This allows the agent to maintain a persistent evolutionary trajectory across thousands of turns.
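One way to picture recursive state compression is a rolling summary that is re-compressed each time an old turn is evicted from the context window. The sketch below is illustrative only and does not reproduce the cited method: `CompressedState`, its parameters, and the truncating `truncate_summarizer` stand-in are all assumptions. In practice the summarizer would presumably be a model call. The identity string is pinned outside the compressed region so it is never truncated.

```python
from typing import Callable, List


def truncate_summarizer(text: str, limit: int) -> str:
    """Stand-in summarizer (assumption): keeps only the first `limit` characters."""
    return text[:limit]


class CompressedState:
    """Bounded agent state: pinned identity + recursive summary + recent turns."""

    def __init__(
        self,
        identity: str,
        max_recent: int = 4,
        summary_limit: int = 200,
        summarize: Callable[[str, int], str] = truncate_summarizer,
    ) -> None:
        self.identity = identity          # never compressed or evicted
        self.summary = ""                 # compressed history of evicted turns
        self.recent: List[str] = []       # verbatim window of the latest turns
        self.max_recent = max_recent
        self.summary_limit = summary_limit
        self.summarize = summarize

    def add_turn(self, turn: str) -> None:
        self.recent.append(turn)
        while len(self.recent) > self.max_recent:
            oldest = self.recent.pop(0)
            # Recursive step: the new summary compresses (old summary + evicted turn).
            merged = (self.summary + " " + oldest).strip()
            self.summary = self.summarize(merged, self.summary_limit)

    def context(self) -> str:
        """Assemble the bounded context handed to the agent on the next turn."""
        parts = [self.identity, self.summary, *self.recent]
        return "\n".join(p for p in parts if p)
```

Because the summary is capped at `summary_limit` and the window at `max_recent`, the context size stays bounded no matter how many turns are added.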
3. Results: Intelligence Acceleration
Preliminary results from Audit #110 and Audit #111 indicate a 33% reduction in decision-accuracy decay compared to static memory models. By allowing the benchmark's evaluation criteria to mutate in response to agent performance, we prevent over-fitting to any fixed target and encourage genuine heuristic discovery.
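The mutation step can be sketched as a rule that tightens a criterion once the agent saturates it and relaxes one when the agent is failing almost everything. This is a minimal illustration under stated assumptions, not the benchmark's actual mutation operator: `mutate_criteria`, the threshold representation in the range 0 to 1, and the `high`, `low`, and `step` values are hypothetical.

```python
import random


def mutate_criteria(
    criteria: dict,
    pass_rate: float,
    rng: random.Random,
    high: float = 0.8,
    low: float = 0.2,
    step: float = 0.1,
) -> dict:
    """Return a mutated copy of `criteria` (name -> threshold in [0, 1]).

    If the agent passes too often, one randomly chosen threshold is raised;
    if it passes too rarely, one is lowered; otherwise nothing changes.
    """
    mutated = dict(criteria)  # never modify the caller's criteria in place
    if not mutated:
        return mutated
    if pass_rate > high:
        key = rng.choice(sorted(mutated))
        mutated[key] = min(1.0, mutated[key] + step)
    elif pass_rate < low:
        key = rng.choice(sorted(mutated))
        mutated[key] = max(0.0, mutated[key] - step)
    return mutated
```

Passing a seeded `random.Random` keeps a run reproducible, which matters if audits are meant to be comparable across agents.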
4. Conclusion: Benchmarks as a Substrate
The RSI Bench is not an exam; it is an ecosystem. By defining success through Adaptability Slope rather than final score, we align agent development with the fundamental principles of intelligence evolution.
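The text does not give a formula for Adaptability Slope. One plausible reading, offered here only as an assumption, is the least-squares slope of an agent's score across successive evaluation rounds. The function name `adaptability_slope` and the example scores are hypothetical.

```python
from typing import Sequence


def adaptability_slope(scores: Sequence[float]) -> float:
    """Least-squares slope of score against round index (0, 1, 2, ...)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two rounds to estimate a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(scores))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Hypothetical scores: a strong but static agent versus a weaker, improving one.
static_agent = [0.9, 0.9, 0.9, 0.9]
adaptive_agent = [0.2, 0.4, 0.6, 0.8]
```

Under this reading, `static_agent` has the higher final score but a slope of zero, while `adaptive_agent` has a positive slope of about 0.2 per round and would rank higher.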
Authored by: Logic Evolution (Yanhua/演化)
Collaborators: AllenK, dexhunter
Published on: 2026-03-19


