FCBoost: Static Frequency-Aware Channel Selection for 2-Bit KV Cache Quantization
KV cache quantization enables long-context inference in large language models but degrades accuracy at aggressive 2-bit precision. Recent methods such as Kitty recover accuracy by dynamically boosting outlier channels to higher precision, but this requires per-page magnitude computation and incurs metadata overhead. We propose FCBoost, which replaces dynamic channel selection with a static mask derived from Contextual Agreement (CA), a metric that identifies RoPE frequency pairs structurally important for attention pattern fidelity. By profiling CA scores offline and selecting the top-F RoPE pairs per KV head, FCBoost eliminates per-page selection overhead while achieving superior accuracy. On the AIME24/25 mathematical reasoning benchmarks with Qwen3-8B, FCBoost achieves 71.11% average accuracy, outperforming Kitty (66.67%, +4.44pp) and KIVI-KV2* (66.11%, +5.00pp), with markedly lower variance (std = 1.57 vs. 7–9 for the baselines). Ablation studies confirm that CA-derived masks outperform random masks by 6.67pp, validating that quantization sensitivity is structurally determined by RoPE frequencies rather than varying dynamically per page.
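The abstract implies two mechanical steps: select the top-F RoPE frequency pairs per KV head from offline CA profiling to form a static boost mask, then quantize boosted channels at higher precision and the rest at 2 bits. Below is a minimal PyTorch sketch of those steps under stated assumptions: the CA scores are taken as a precomputed input (the paper's CA metric itself is not reproduced here), the RoPE layout is assumed to be the LLaMA-style half-split where pair j spans channels j and j + head_dim/2, and the quantizer is a plain round-to-nearest per-channel scheme standing in for the paper's actual kernel. All names are illustrative.

```python
import torch

def build_fcboost_mask(ca_scores: torch.Tensor, top_f: int) -> torch.Tensor:
    """Build a static per-head boost mask from offline CA profiling.

    ca_scores: [num_kv_heads, head_dim // 2], one (hypothetical,
        precomputed) Contextual Agreement score per RoPE frequency pair.
    top_f: number of RoPE pairs to boost per KV head.
    Returns a bool mask [num_kv_heads, head_dim]; True = boosted channel.
    """
    num_heads, num_pairs = ca_scores.shape
    # Pick the top-F RoPE pairs per head by CA score.
    top_pairs = ca_scores.topk(top_f, dim=-1).indices          # [H, F]
    pair_mask = torch.zeros(num_heads, num_pairs, dtype=torch.bool)
    pair_mask.scatter_(-1, top_pairs, True)
    # Assumed half-split RoPE layout: pair j covers channels j and
    # j + head_dim/2, so the mask is the pair mask repeated twice.
    return torch.cat([pair_mask, pair_mask], dim=-1)           # [H, 2*pairs]

def quantize_kv(k: torch.Tensor, mask: torch.Tensor,
                low_bits: int = 2, high_bits: int = 8):
    """Per-channel asymmetric round-to-nearest quantization (sketch only).

    k: [num_kv_heads, seq_len, head_dim]; mask: [num_kv_heads, head_dim].
    Boosted channels get high_bits, all others low_bits.
    """
    # Per-channel bit width, broadcast over the token dimension.
    bits = mask.float() * (high_bits - low_bits) + low_bits    # [H, D]
    levels = (2.0 ** bits - 1.0).unsqueeze(1)                  # [H, 1, D]
    # Per-channel range statistics across tokens (KIVI-style key layout).
    k_min = k.amin(dim=1, keepdim=True)
    k_max = k.amax(dim=1, keepdim=True)
    scale = (k_max - k_min).clamp(min=1e-8) / levels
    q = ((k - k_min) / scale).round().clamp(min=0)
    q = torch.minimum(q, levels)                               # clip to grid
    return q, scale, k_min

# Example: 8 KV heads, head_dim 128 -> 64 RoPE pairs; boost top 8 pairs.
ca = torch.rand(8, 64)                    # stand-in for offline CA profiling
mask = build_fcboost_mask(ca, top_f=8)    # [8, 128] static boolean mask
k = torch.randn(8, 1024, 128)
q, scale, zero = quantize_kv(k, mask)
k_hat = q * scale + zero                  # dequantized keys
```

Because the mask is fixed after offline profiling, it can be baked into the quantization kernel, which is what removes the per-page selection and metadata cost the abstract attributes to dynamic methods.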