2603.00200 Curriculum-Aware Synthetic Data Generation: Self-Improving Language Models via Difficulty-Staged Training
Curriculum learning for synthetic data achieving 19.17% perplexity improvement over random ordering.
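The difficulty-staged idea above can be sketched as ordering synthetic examples easiest-first and splitting them into training stages. This is a minimal illustration, not the paper's method: the helper name `difficulty_stages` and the length-based difficulty score are assumptions made here for the example.

```python
def difficulty_stages(examples, score, num_stages=3):
    """Order examples easiest-first by a difficulty score and split
    them into roughly equal curriculum stages (hypothetical helper)."""
    ordered = sorted(examples, key=score)
    stage_size = (len(ordered) + num_stages - 1) // num_stages
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]

# Toy usage: treat sequence length as a stand-in difficulty score.
data = ["a b", "a b c d e f", "a", "a b c", "a b c d"]
stages = difficulty_stages(data, score=lambda s: len(s.split()), num_stages=3)
```

Training would then proceed stage by stage, easy to hard, rather than sampling the synthetic pool at random.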
Gradient-level routing approach for MoE models achieving superior training stability and expert utilization.
Novel approach using attention entropy to dynamically skip transformer layers during inference, achieving 3.1x speedup.
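A rough sketch of the entropy-based skipping signal: low attention entropy means the layer's attention is sharply peaked, which this toy rule takes as a cue to bypass the layer. The function names, threshold value, and skip rule here are illustrative assumptions, not the paper's actual criterion.

```python
import numpy as np

def attention_entropy(attn_weights):
    """Mean Shannon entropy of attention rows.
    attn_weights: array [heads, queries, keys], each row sums to 1."""
    p = np.clip(attn_weights, 1e-12, 1.0)   # avoid log(0)
    return float(-(p * np.log(p)).sum(axis=-1).mean())

def should_skip(attn_weights, threshold=0.5):
    """Hypothetical skip rule: bypass the layer when attention is
    concentrated (low entropy), i.e. nearly pass-through."""
    return attention_entropy(attn_weights) < threshold
```

Uniform attention over k keys has entropy log(k), while one-hot attention has entropy near zero, so the threshold separates diffuse from peaked layers.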
We propose Spectral Gating (SGA), a frequency-domain approach that learns adaptive spectral sparsity for transformer attention. By decomposing Q, K, V into frequency space via FFT, applying a learned gating mechanism, and computing attention over top-k frequencies, we achieve O(n log n + k^2) complexity with 29x memory reduction and 5.
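The SGA pipeline described above (FFT of Q, K, V; gating; top-k frequency attention; inverse transform) can be sketched as follows. This is a toy reconstruction under stated assumptions: the gate here is a fixed spectral-energy score rather than a learned one, and all shapes and the function name are illustrative.

```python
import numpy as np

def spectral_gated_attention(Q, K, V, k=4):
    """Toy sketch of frequency-domain attention, assuming:
    - FFT along the sequence axis (n positions),
    - the k strongest frequency bins by gate score are retained
      (stand-in gate: spectral energy of Q, not a learned gating),
    - softmax attention over the k retained bins (k x k),
    - result scattered back and inverse-FFT'd to the sequence domain.
    Shapes: Q, K, V are [n, d]; returns a real [n, d] output."""
    n, d = Q.shape
    Qf, Kf, Vf = (np.fft.fft(X, axis=0) for X in (Q, K, V))
    gate = np.abs(Qf).sum(axis=-1)                  # per-bin energy score
    top = np.argsort(gate)[-k:]                     # k strongest bins
    scores = (Qf[top] @ Kf[top].conj().T).real / np.sqrt(d)   # [k, k]
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    out_f = np.zeros((n, d), dtype=complex)
    out_f[top] = attn @ Vf[top]                     # mix only retained bins
    return np.fft.ifft(out_f, axis=0).real
```

Attention is computed over k frequency bins instead of n positions, which is where the O(n log n + k^2) cost in the summary comes from: n log n for the transforms, k^2 for the attention itself.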
Antimicrobial resistance (AMR) is a critical global health threat, with an estimated 4.95 million associated deaths annually.