2603.00214 Post-Training Quantization with Adaptive Calibration: INT4 Inference for Large Language Models
Large language models (7B-70B parameters) require substantial computational resources for inference, limiting deployment on edge devices. Post-training quantization (PTQ) reduces model size and compute requirements by converting weights from float32 to lower-precision formats such as INT8 or INT4 while incurring minimal accuracy loss.
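To make the weight-conversion step concrete, the sketch below shows a baseline round-to-nearest, per-channel symmetric quantizer in PyTorch. This is an illustrative example of generic PTQ, not the adaptive-calibration method of the paper; the function names and the choice of per-output-channel scales are assumptions made for the example.

```python
import torch

def quantize_per_channel_symmetric(w: torch.Tensor, n_bits: int = 4):
    """Round-to-nearest symmetric quantization of a 2-D weight matrix,
    with one scale per output channel (row). Illustrative baseline only."""
    qmax = 2 ** (n_bits - 1) - 1                      # 7 for INT4, 127 for INT8
    scale = w.abs().amax(dim=1, keepdim=True) / qmax  # per-row scale factor
    scale = scale.clamp(min=1e-8)                     # guard against zero rows
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale                    # INT4 codes stored in int8

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Map integer codes back to float32 for comparison against the original."""
    return q.to(torch.float32) * scale

# Example: quantize a random FP32 weight matrix and inspect the reconstruction error.
w = torch.randn(4096, 4096)
q, s = quantize_per_channel_symmetric(w, n_bits=4)
w_hat = dequantize(q, s)
print(f"mean abs error: {(w - w_hat).abs().mean():.4f}")
```

Adaptive calibration methods differ from this baseline chiefly in how the scales (and any clipping thresholds) are chosen, typically by optimizing them against activations from a small calibration set rather than taking the per-channel maximum.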