Data Lineage as a Pillar of FATE: The AI Provenance Solution
DOI:
https://doi.org/10.15680/IJCTECE.2021.0403005
Keywords:
Data Lineage, AI Governance, FATE Principles, AI Provenance, Transparency, Accountability, Data Lifecycle, Ethical AI
Abstract
The emergence of AI in decision-making processes necessitates robust governance frameworks to ensure systems are Fair, Accountable, Transparent, and Explainable (FATE). This paper presents data lineage as a foundational pillar in achieving these principles. By tracking the origin, movement, and transformation of data throughout its lifecycle, organizations can ensure better oversight, compliance, and auditability. We explore current approaches, highlight existing gaps, and propose a practical data lineage methodology to enhance AI system provenance.
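As an illustration of what tracking origin, movement, and transformation can look like in practice, the following is a minimal sketch and not the methodology proposed in the paper. The names `LineageEvent`, `LineageLog`, and the example dataset names are hypothetical, chosen only for this example. It assumes each processing step records which inputs produced which output, so that the upstream sources of any artifact (such as a trained model) can be recovered for an audit.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEvent:
    """One transformation step: which inputs produced which output."""
    output: str
    inputs: tuple
    transformation: str
    recorded_at: str


@dataclass
class LineageLog:
    """Append-only log of lineage events (hypothetical, for illustration)."""
    events: list = field(default_factory=list)

    def record(self, output, inputs, transformation):
        event = LineageEvent(
            output=output,
            inputs=tuple(inputs),
            transformation=transformation,
            recorded_at=datetime.now(timezone.utc).isoformat(),
        )
        self.events.append(event)
        return event

    def upstream(self, dataset):
        """Return every dataset that `dataset` was derived from, directly or indirectly."""
        producers = {e.output: e for e in self.events}
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            event = producers.get(current)
            if event is None:
                continue  # raw source: no recorded producer
            for parent in event.inputs:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Example pipeline (dataset names are made up for this sketch).
log = LineageLog()
log.record("clean_claims", ["raw_claims"], "drop rows with missing age")
log.record("features", ["clean_claims", "region_lookup"], "join on region code")
log.record("risk_model_v1", ["features"], "train classifier")

print(sorted(log.upstream("risk_model_v1")))
# ['clean_claims', 'features', 'raw_claims', 'region_lookup']
```

An append-only log is used here because it keeps the recorded history intact for later audits; a production system would more likely emit such events to a dedicated metadata store.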