Gai Yanrong, Zhang Yunqiu, Zhang Hui, Li Chencheng, Lu Junrui. Research on Quality Analysis and Governance Strategies of Real-World Chinese Electronic Medical Records Data for Knowledge Extraction. 2025. biomedRxiv.202511.00077
Research on Quality Analysis and Governance Strategies of Real-World Chinese Electronic Medical Records Data for Knowledge Extraction
Corresponding author: Zhang Yunqiu, yunqiu@jlu.edu.cn
DOI: 10.12201/bmr.202511.00077
Abstract:
Objective/Significance: Knowledge extraction from real-world Chinese electronic medical records (EMRs) is currently constrained by three problems: annotation rules that poorly reconcile clinical significance with technical feasibility, low-quality source data, and lagging data governance. This study aims to alleviate these bottlenecks and to explore a practical path for knowledge extraction from Chinese EMRs in real-world settings.
Methods/Process: Annotation rules covering the major entity and relation types were formulated. Experiments were then conducted on real-world EMRs using a BERT-BiLSTM-CRF model, and the key issues in EMR data governance revealed by these experiments were summarized.
Results/Conclusion: On real-world EMRs, the model achieved F1-scores of approximately 0.62 for entity recognition and 0.36 for relation recognition, markedly lower than those achieved on public datasets. The main data-related causes are non-standard expressions, data sparsity, and terminology that differs across departments. The main governance-related causes are an imbalance between privacy protection and data utilization, the lack of full-process management, and insufficient quality inspection before data are stored.
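The abstract reports entity- and relation-level F1 but does not state how matches were counted. The following is a minimal sketch assuming the common convention of strict, micro-averaged F1: BIO tag sequences such as those a BERT-BiLSTM-CRF tagger emits are decoded into (start, end, type) spans, and a prediction counts as correct only if the whole tuple matches a gold tuple exactly. The function names `bio_to_spans` and `prf1` and the labels `SYMPTOM` and `DISEASE` are hypothetical illustrations, not taken from the study's code or annotation rules.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Assumption: tags follow the BIO scheme ("B-<type>", "I-<type>", "O") and
# evaluation uses strict exact matching with micro-averaging. The entity type
# names used below (SYMPTOM, DISEASE) are hypothetical examples.

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIO tag sequence into a set of (start, end, type) spans.

    An I- tag that does not continue an open entity of the same type is
    treated leniently as the start of a new entity.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        if start is not None and not continues:
            spans.add((start, i, etype))  # close the open entity at position i
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, etype = i, label  # open a new entity
    return spans


def prf1(gold: Iterable[tuple], pred: Iterable[tuple]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 under strict (exact) matching.

    Works for any hashable tuples, e.g. entities as (doc_id, start, end, type)
    or relations as (doc_id, head_span, tail_span, relation_type).
    """
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # predictions that exactly match a gold tuple
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    # tp > 0 implies precision and recall are both positive, so no division by zero.
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold_tags = ["B-SYMPTOM", "I-SYMPTOM", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE"]
    pred_tags = ["B-SYMPTOM", "I-SYMPTOM", "O", "B-DISEASE", "I-DISEASE", "O"]
    # Prefix each span with a document id so spans from different records stay distinct.
    gold = {(0,) + span for span in bio_to_spans(gold_tags)}
    pred = {(0,) + span for span in bio_to_spans(pred_tags)}
    print(prf1(gold, pred))  # → (0.5, 0.5, 0.5): the truncated DISEASE span counts as wrong
```

Because `prf1` compares whole tuples, the same function can score relations. Under this strict criterion a relation is counted only when both of its argument spans are also exactly right, so entity boundary errors carry over into the relation score. The abstract does not say whether the study used strict or relaxed matching.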
Key words: Chinese Electronic Medical Record; Knowledge Extraction; Named Entity Recognition; Relation Extraction; Data Governance
Submit time: 24 November 2025
Copyright: The copyright holder for this preprint is the author/funder, who has granted biomedRxiv a license to display the preprint in perpetuity.