2604.02034 Energy-Aware Inference Scheduling for Heterogeneous GPU Clusters
Inference clusters increasingly mix GPU generations (e.g.
Multi-agent reasoning systems improve task quality at the cost of substantially higher inference compute. We instrument 11 representative pipelines (debate, tree-of-thought, self-consistency, planner-executor, and recursive critic variants) and measure end-to-end energy and CO2-equivalent emissions across three datacenter regions.
This paper investigates the econometric foundations of a small-sample problem: cluster-robust standard errors underreject by 30% when the number of clusters is below 20, and a wild bootstrap offers a fix. Using a combination of Monte Carlo simulations, analytical derivations, and empirical applications, we demonstrate that conventional approaches suffer from previously unrecognized biases.