RAGEvalX: An Extended Framework for Measuring Core Accuracy, Context Integrity, Robustness, and Practical Statistics in RAG Pipelines

Authors

  • Dr. Rashmiranjan Pradhan, AI, Gen AI, and Agentic AI Innovation Leader at IBM, Bangalore, Karnataka, India

DOI:

https://doi.org/10.15680/avyy2m60

Keywords:

Retrieval-Augmented Generation, RAG, Large Language Models, LLM, Evaluation Metrics, AI Robustness, Natural Language Processing, IEEE Standards

Abstract

Retrieval-Augmented Generation (RAG) has emerged as a cornerstone for building context-aware and factual Large Language Model (LLM) applications. However, evaluating the performance of these complex pipelines remains a significant challenge. Existing evaluation frameworks often focus on a narrow set of metrics, failing to provide a holistic view of a system's accuracy, reliability, and practical usability. This paper introduces RAGEvalX, an extended, multi-faceted evaluation framework designed to address this gap. RAGEvalX systematically measures four crucial dimensions: (1) Core RAG Accuracy, including faithfulness and relevancy; (2) Context Integrity, assessing the quality and utilization of retrieved information; (3) Robustness against common input perturbations; and (4) Practical Statistics for operational monitoring. We provide a detailed methodology for implementing the framework, complete with code snippets and guidance on LLM selection for evaluation tasks. Through case studies in the healthcare, finance, and legal sectors, we demonstrate how RAGEvalX provides actionable insights for optimizing RAG pipelines. Our framework offers a standardized, comprehensive, and implementable approach to ensure RAG systems are not only accurate but also reliable and ready for real-world deployment.
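For concreteness, the sketch below shows one way a per-sample scorecard covering the four dimensions could be assembled in Python. It is a minimal illustration only, not the implementation described in the paper: the names RAGEvalXScore, lexical_overlap, and evaluate_sample are hypothetical, and the token-overlap ratios are crude proxies for the LLM-judged faithfulness, relevancy, and context-utilization metrics that RAGEvalX would compute.

# Minimal per-sample scorecard sketch; names and overlap-based proxies are
# illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass
import statistics


def lexical_overlap(source: str, reference: str) -> float:
    """Fraction of tokens in `source` that also appear in `reference`."""
    src = set(source.lower().split())
    ref = set(reference.lower().split())
    return len(src & ref) / max(len(src), 1)


@dataclass
class RAGEvalXScore:
    faithfulness: float         # (1) is the answer grounded in the retrieved context?
    answer_relevancy: float     # (1) does the answer address the question?
    context_utilization: float  # (2) how much of the retrieved context is reflected in the answer?
    robustness: float           # (3) share of the clean-query score retained under perturbation
    latency_p95_ms: float       # (4) operational statistic for monitoring


def evaluate_sample(question: str, context: str, answer: str,
                    perturbed_answer: str, latencies_ms: list[float]) -> RAGEvalXScore:
    faithfulness = lexical_overlap(answer, context)
    relevancy = lexical_overlap(question, answer)
    utilization = lexical_overlap(context, answer)
    # Robustness: how much of the clean-query faithfulness survives a perturbed query.
    perturbed_faithfulness = lexical_overlap(perturbed_answer, context)
    robustness = min(perturbed_faithfulness / max(faithfulness, 1e-9), 1.0)
    # 95th-percentile latency from the observed per-request latencies.
    p95 = statistics.quantiles(latencies_ms, n=20)[18]
    return RAGEvalXScore(faithfulness, relevancy, utilization, robustness, p95)

In practice, the two overlap-based accuracy proxies would be replaced by judgments from an evaluator LLM, while the robustness and latency fields would be aggregated across a perturbation suite and a production traffic window, respectively.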

Published

2025-09-17

How to Cite

RAGEvalX: An Extended Framework for Measuring Core Accuracy, Context Integrity, Robustness, and Practical Statistics in RAG Pipelines. (2025). International Journal of Computer Technology and Electronics Communication, 8(5), 11305-11311. https://doi.org/10.15680/avyy2m60