Multi-Agent Systems in AIOps: Enhancing Detection, Diagnosis, and Remediation
DOI:
https://doi.org/10.15680/IJCTECE.2025.0805019
Keywords:
AIOps, Multi-Agent Systems (MAS), Agentic AI, Incident Detection, Root Cause Analysis, Automated Remediation, Healthcare, Finance, Python, LangChain, OpenAI API
Abstract
Modern IT infrastructures demand intelligent automation for efficient operations management. Artificial Intelligence for IT Operations (AIOps) leverages AI to automate and enhance IT tasks. This paper proposes a practical architecture for implementing Multi-Agent Systems (MAS) within AIOps, focusing on enhancing incident detection, root cause analysis, and automated remediation. By employing a collaborative ecosystem of specialized Python-based intelligent agents, we aim to provide increased autonomy, dynamic learning capabilities, and seamless cross-system collaboration. We provide a developer-friendly analysis with code snippets, real-world data scenarios inspired by healthcare and finance, and specific guidance on leveraging Python libraries and Large Language Models (LLMs) for implementation. This paper aims to equip developers with the knowledge to build and deploy a functional multi-agent AIOps system.
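As a rough illustration of the detection, diagnosis, and remediation flow described above, the following is a minimal sketch using only the Python standard library. It is not the paper's implementation: the class names, the z-score threshold, the metric name `db_latency_ms`, and the rule and playbook tables are all hypothetical, and simple lookup tables stand in for the LLM-backed reasoning (for example via LangChain or the OpenAI API) that the paper refers to.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Optional, Sequence


@dataclass
class Incident:
    """Shared record passed from agent to agent (hypothetical schema)."""
    metric: str
    value: float
    z_score: float
    root_cause: Optional[str] = None
    action: Optional[str] = None


class DetectionAgent:
    """Flags the latest sample when it deviates strongly from a baseline."""

    def __init__(self, threshold: float = 3.0) -> None:
        self.threshold = threshold  # assumed z-score cutoff

    def detect(self, metric: str, baseline: Sequence[float],
               latest: float) -> Optional[Incident]:
        mu, sigma = mean(baseline), pstdev(baseline)
        if sigma == 0:
            return None  # a flat baseline gives no usable z-score
        z = (latest - mu) / sigma
        if abs(z) < self.threshold:
            return None
        return Incident(metric=metric, value=latest, z_score=z)


class DiagnosisAgent:
    """Maps a metric to a likely root cause via an illustrative rule table."""

    RULES = {"db_latency_ms": "connection pool exhaustion"}  # hypothetical

    def diagnose(self, incident: Incident) -> Incident:
        incident.root_cause = self.RULES.get(incident.metric, "unknown")
        return incident


class RemediationAgent:
    """Chooses an action from an illustrative playbook, else escalates."""

    PLAYBOOK = {"connection pool exhaustion": "restart connection pool"}

    def remediate(self, incident: Incident) -> Incident:
        incident.action = self.PLAYBOOK.get(
            incident.root_cause, "escalate to on-call engineer")
        return incident


def run_pipeline(metric: str, baseline: Sequence[float],
                 latest: float) -> Optional[Incident]:
    """Chains the three agents; returns None when nothing is anomalous."""
    incident = DetectionAgent().detect(metric, baseline, latest)
    if incident is None:
        return None
    return RemediationAgent().remediate(DiagnosisAgent().diagnose(incident))


if __name__ == "__main__":
    print(run_pipeline("db_latency_ms", [100, 102, 98, 101, 99], 250))
```

In this sketch each agent only reads and enriches the shared `Incident` record, so any one of them could be swapped for an LLM-backed agent without changing the others. Unknown root causes fall through to escalation rather than an automated action, which is a conservative default chosen here for illustration.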

