Simulating Compliance Scenarios using Synthetic Data Generation
DOI: https://doi.org/10.15680/IJCTECE.2022.0501004
Keywords: Synthetic data, compliance scenarios, real-time data, data privacy, fraud detection, regulatory testing
Abstract
Synthetic data generation for simulating compliance scenarios offers organizations an opportunity to revolutionize how they test and validate their compliance systems. Many modern regulations, particularly in finance, healthcare, and data security, have become highly complex and require robust systems that can predict where potential compliance breaches may occur. This paper examines the role of synthetic data in simulating compliance scenarios and how real-time data applications and synthetic data generation technologies can be used together, enabling organizations to create accurate and scalable compliance tests. It also identifies key challenges in this area, including data accuracy, overfitting, and computational intensity, and proposes solutions intended to enhance the credibility of compliance simulations while leaving room for scalability.
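The core idea of testing a compliance system against synthetic data can be illustrated with a minimal Python sketch. It is not the paper's method: the reporting rule, the threshold `REPORTING_THRESHOLD`, the log-normal distribution of amounts, and the functions `generate_transactions`, `check_compliance`, and `evaluate` are all hypothetical assumptions chosen for illustration. The sketch generates synthetic transactions, injects a known fraction of rule violations, and measures how many of them the checker under test detects.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical rule parameter: transactions at or above this amount must be reported.
REPORTING_THRESHOLD = 10_000.0


@dataclass
class Transaction:
    tx_id: int
    amount: float
    reported: bool
    is_violation: bool  # ground-truth label, known only because the data is synthetic


def generate_transactions(n: int, violation_rate: float = 0.05, seed: int = 0) -> List[Transaction]:
    """Generate n synthetic transactions with a known fraction of injected violations."""
    rng = random.Random(seed)  # seeded so every test run is reproducible
    txs = []
    for i in range(n):
        if rng.random() < violation_rate:
            # Injected breach: a large transaction that was never reported.
            amount = rng.uniform(REPORTING_THRESHOLD, 5 * REPORTING_THRESHOLD)
            txs.append(Transaction(i, amount, reported=False, is_violation=True))
        else:
            # Compliant record: amount drawn from an assumed log-normal distribution;
            # any amount at or above the threshold is reported, as the rule requires.
            amount = rng.lognormvariate(7.0, 1.0)
            txs.append(Transaction(i, amount, reported=amount >= REPORTING_THRESHOLD, is_violation=False))
    return txs


def check_compliance(tx: Transaction) -> bool:
    """Hypothetical rule under test: flag unreported transactions at or above the threshold."""
    return tx.amount >= REPORTING_THRESHOLD and not tx.reported


def evaluate(txs: List[Transaction]) -> Dict[str, float]:
    """Compare the checker's flags against the injected ground-truth labels."""
    tp = sum(1 for t in txs if check_compliance(t) and t.is_violation)
    fp = sum(1 for t in txs if check_compliance(t) and not t.is_violation)
    fn = sum(1 for t in txs if not check_compliance(t) and t.is_violation)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "precision": precision,
        "recall": recall,
    }


if __name__ == "__main__":
    print(evaluate(generate_transactions(10_000, violation_rate=0.05, seed=42)))
```

Because every violation is injected deliberately, each synthetic record carries a ground-truth label, which is what makes it possible to measure the checker's recall; historical production data rarely comes with such labels. Raising `n` or adding new violation patterns to the generator is one way such a harness could scale to larger or more varied test suites.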