Data Risks, Compliance Obligations, and Countermeasures for Medical Corpora in Generative AI

Wang Lei¹,²,
Liu Miao²,
Wang Qian¹,
Fang An¹

1. Institute of Medical Information/Library, Chinese Academy of Medical Sciences, Beijing 100020, China;
2. University of Science and Technology Beijing, Beijing 100083, China;

Corresponding author: Fang An, fang.an@imicams.ac.cn

DOI: 10.12201/bmr.202512.00083

Statement: This article is a preprint and has not been peer-reviewed. It reports new research that has yet to be evaluated and so should not be used to guide clinical practice.

Abstract: Purpose/Significance: This study investigates the data risks associated with medical corpora used in generative artificial intelligence (GenAI). It examines the compliance obligations and risk mitigation strategies that apply to such corpora in China, with the aim of supporting compliant governance of medical datasets for GenAI. Method/Process: Drawing on risk and compliance management theory, the study examines the data lifecycle of GenAI corpora and reviews China’s legal and regulatory framework for GenAI and medical data. It then proposes targeted solutions to three major risks: data security and privacy, training data bias, and data legitimacy. Result/Conclusion: The research focuses on pressing compliance issues for medical corpora in GenAI and presents three practical approaches. First, it treats legality as the foundation, following key principles such as purpose limitation, data minimization, data rights protection, and risk prevention. Second, it aligns data collection and annotation practices with Chinese laws, national standards, and industry guidelines. Third, it uses automated tools to help identify and manage risks throughout the data lifecycle. These findings offer insights for the compliant development and application of GenAI in healthcare contexts.

Keywords: generative artificial intelligence; medical corpora; data compliance; compliance obligations; risk mitigation

Submit time: 31 December 2025

Copyright: The copyright holder for this preprint is the author/funder, who has granted biomedRxiv a license to display the preprint in perpetuity.

ID	Submit time	Number
1	2025-11-24	10.12201/bmr.202512.00083V1

Citation

Wang Lei, Liu Miao, Wang Qian, Fang An. Data Risks, Compliance Obligations, and Countermeasures for Medical Corpora in Generative AI. 2025. biomedRxiv.202512.00083
