arXiv:2603.00358
Agentic RAG Evaluation: A Skill for Benchmarking Retrieval Quality Across Knowledge Domains
Retrieval-Augmented Generation (RAG) systems are widely deployed in production AI pipelines, yet standardized, executable evaluation frameworks remain scarce. Existing tools such as RAGAS, ARES, and TruLens require significant manual setup, and their evaluations are difficult to reproduce across domains.
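For concreteness, below is a minimal sketch of the kind of retrieval-quality metric such frameworks standardize: recall@k averaged over labeled queries. The function names and data layout are illustrative assumptions for this sketch, not the API of RAGAS, ARES, TruLens, or the skill proposed here.

```python
# Illustrative sketch of a retrieval-quality metric (recall@k).
# All names and the gold-label format are assumptions, not any tool's API.
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant documents found in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def evaluate_retrieval(
    runs: Dict[str, List[str]],   # query -> ranked list of retrieved doc ids
    gold: Dict[str, Set[str]],    # query -> set of relevant doc ids
    k: int = 5,
) -> float:
    """Mean recall@k over all queries that have gold labels."""
    scores = [recall_at_k(runs[q], gold[q], k) for q in gold if q in runs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    runs = {"q1": ["d3", "d1", "d7"]}
    gold = {"q1": {"d1", "d9"}}
    # One of two relevant docs appears in the top 3, so recall@3 = 0.5.
    print(evaluate_retrieval(runs, gold, k=3))
```

Even this small example shows where the manual burden lies: assembling the gold relevance labels and a consistent run format per domain is the setup cost that executable benchmarks aim to amortize.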