
Research on Quality Analysis and Governance Strategies of Real-World Chinese Electronic Medical Records Data for Knowledge Extraction

Corresponding author: zhangyunqiu, yunqiu@jlu.edu.cn
DOI: 10.12201/bmr.202511.00077
Statement: This article is a preprint and has not been peer-reviewed. It reports new research that has yet to be evaluated and so should not be used to guide clinical practice.

    Abstract: Objective/Significance: Knowledge extraction from real-world Chinese electronic medical records (EMRs) is currently constrained by poor alignment between the clinical significance and the technical feasibility of annotation rules, low-quality source data, and lagging data governance. This study aims to alleviate these bottlenecks and explore a path for knowledge extraction from Chinese EMRs in real-world scenarios. Methods/Process: Annotation rules covering the major entity and relationship types were formulated. Experiments were conducted on real-world EMRs using a BERT+BiLSTM+CRF model, and the key issues in EMR data governance were summarized from the results. Results/Conclusion: The model's F1-scores for entity recognition and relationship recognition on real-world EMRs were approximately 0.62 and 0.36, respectively, both markedly lower than those reported on public datasets. The main data-related causes are non-standard expressions, data sparsity, and terminological differences among departments. The main governance-related causes are an imbalance between privacy protection and data utilization, a lack of full-process management, and insufficient pre-storage quality inspection.
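
The abstract reports F1-scores without stating how they were computed. As a minimal sketch, assuming BIO-tagged sequences and strict entity-level matching (a predicted entity counts only if its type and boundaries both match the gold annotation), the metric could be calculated as below. The tag names `DIS` and `SYM` and the function names are hypothetical illustrations, not the authors' annotation scheme or evaluation code.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive); a hypothetical span layout
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" acts as a sentinel that closes any open span.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if not continues:
            if etype is not None:
                spans.add((etype, start, i))
            start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


def entity_f1(gold_seqs: List[List[str]], pred_seqs: List[List[str]]):
    """Strict entity-level precision, recall, and F1 over a corpus."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_to_spans(gold), bio_to_spans(pred)
        tp += len(g & p)      # exact match on type and boundaries
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Assumed toy example: one gold symptom entity is missed by the prediction.
gold = [["B-DIS", "I-DIS", "O", "B-SYM"]]
pred = [["B-DIS", "I-DIS", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
```

Under strict matching, a boundary error on a long, non-standard expression counts as both a false positive and a false negative, which is one way the non-standard expressions named in the abstract could depress the score.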

    Key words: Chinese Electronic Medical Record; Knowledge Extraction; Named Entity Recognition; Relation Extraction; Data Governance

    Submit time: 24 November 2025

    Copyright: The copyright holder for this preprint is the author/funder, who has granted biomedRxiv a license to display the preprint in perpetuity.

  • chenjieqing, zhangfeng. Named Entity Recognition in Chinese Electronic Medical Records Using Knowledge Graph Construction. 2023. doi: 10.12201/bmr.202312.00011

    wuxuehong. A method of recognizing entities from Chinese Electronic Medical Record based on domain word vector combined with word attributes reasoning. 2021. doi: 10.12201/bmr.202109.00016

    Guo Weijia. Method for Extracting Data Elements from Chinese Electronic Medical Records. 2024. doi: 10.12201/bmr.202404.00038

    Deng Jiale, Hu Zhensheng, Lian Wanmin, Hua Yunpeng, Zhou Yi. Research on entity recognition of liver cancer electronic medical records based on RoBERTa-CRF. 2023. doi: 10.12201/bmr.202303.00027

    chenjianqiu, huangxiaofang. Joint extraction of Chinese EMR entity relationship based on bert. 2022. doi: 10.12201/bmr.202206.00003

    xiaoxiaoxia. Research on named entity recognition of Chinese medical records based on BERT-BiLSTM-CRF with Chinese radicals. 2023. doi: 10.12201/bmr.202303.00004

    renhuiling, lixiaoying, wangweijie, wangxu, zhangying. Research on Chinese electronic medical record entity mapping method by fusing similarity algorithm and pre-trained model. 2023. doi: 10.12201/bmr.202305.00015

    shenrongrong, xiashuaishuai, yanjunfeng. Review on Research of Named Entity Recognition in Chinese Medicine. 2022. doi: 10.12201/bmr.202207.00038

    wuhuan, hekunlun. Construction of general medical knowledge graph based on evidence-based medicine and electronic medical record data. 2024. doi: 10.12201/bmr.202409.00027

    zhang lixin, sun haixia, tang mingkun, qian qing. A Review of Real World Electronic Medical Record Data Evaluation. 2021. doi: 10.12201/bmr.202106.00015

  • ID  Submit time  Number
    2   2025-10-12   10.12201/bmr.202511.00077V2
    1   2025-10-12   10.12201/bmr.202511.00077V1

Get Citation

gaiyanrong, zhangyunqiu, zhanghui, lichencheng, lujunrui. Research on Quality Analysis and Governance Strategies of Real-World Chinese Electronic Medical Records Data for Knowledge Extraction. 2025. biomedRxiv.202511.00077

Article Metrics

  • Read: 142
  • Download: 0
  • Comment: 0
