Research on Chinese electronic medical record entity mapping method by fusing similarity algorithm and pre-trained model

Ren Huiling, Li Xiaoying, Wang Weijie, Wang Xu, Zhang Ying

Institute of Medical Information/Medical Library, CAMS & PUMC

Corresponding author: Ren Huiling, ren.huiling@imicams.ac.cn

DOI: 10.12201/bmr.202305.00015

Statement: Papers posted on the preprint system are intended only for the exchange and sharing of the latest research results. They have not been peer reviewed, so applying them directly to guide clinical practice is not recommended.

Abstract: Purpose/Significance: To make full use of the entity resources in Chinese electronic medical records, this study investigates combinations of algorithms and models that are suitable for large-scale entity mapping and do not require manually constructed rules or features. Method/Process: Using a self-annotated standard dataset of Chinese electronic medical records, similarity algorithms and pre-trained models are combined and applied to the candidate entity generation stage and the entity disambiguation stage of entity mapping, respectively. The performance of the different similarity algorithms and pre-trained models selected in the paper is compared and analyzed along the way. Results/Conclusion: A method that improves the mapping of drug entities is proposed, and the combination of the Jaccard similarity algorithm and the BERT pre-trained model is selected. This combination achieves a precision above 90% and a recall above 99% on the entity mapping task, and can efficiently carry out entity mapping over massive volumes of Chinese electronic medical records.

Key words: Entity mapping; Entity standardization; Similarity algorithm; Electronic medical records; BERT model
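
The two-stage pipeline described in the abstract (similarity-based candidate generation followed by model-based disambiguation) can be sketched roughly as follows. This page does not give the authors' implementation, so everything here is an assumption made for illustration: the character-level Jaccard variant, the `top_k` cutoff, the function names, and the sample vocabulary are all hypothetical. In particular, `toy_rerank` is only a stand-in for the BERT disambiguation stage and is not a pre-trained model.

```python
from typing import Callable, List, Tuple


def char_jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the character sets of two strings."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def generate_candidates(
    mention: str, vocabulary: List[str], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Stage 1: keep the top_k standard terms most similar to the mention."""
    scored = [(term, char_jaccard(mention, term)) for term in vocabulary]
    # list.sort is stable, so ties keep their vocabulary order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def map_entity(
    mention: str,
    vocabulary: List[str],
    rerank: Callable[[str, str], float],
    top_k: int = 3,
) -> str:
    """Stage 2: pick the candidate that the reranker scores highest."""
    candidates = generate_candidates(mention, vocabulary, top_k)
    return max(candidates, key=lambda pair: rerank(mention, pair[0]))[0]


def toy_rerank(mention: str, candidate: str) -> float:
    """Hypothetical placeholder for the BERT disambiguation model.

    It normalizes a few Chinese numerals to digits and reuses Jaccard.
    """
    normalized = mention.translate(str.maketrans("一二三", "123"))
    return char_jaccard(normalized, candidate)


if __name__ == "__main__":
    # Hypothetical standard vocabulary and raw mention.
    vocabulary = ["2型糖尿病", "1型糖尿病", "高血压", "糖尿病肾病"]
    mention = "二型糖尿病"
    print(generate_candidates(mention, vocabulary))
    print(map_entity(mention, vocabulary, toy_rerank))
```

In this sketch the first stage cannot separate "2型糖尿病" from "1型糖尿病" because both share the same four characters with the mention, which is the kind of ambiguity a second-stage model is meant to resolve. Swapping `toy_rerank` for a scorer backed by a fine-tuned model would leave the rest of the code unchanged.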

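Submitted: 2023-05-17

Copyright statement: The authors independently hold the copyright of this paper; the preprint system holds only the right to preserve the paper permanently. No one may reuse it without permission.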

No.	Submission date	ID
1	2023-02-20	bmr.202305.00015V1


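Citation format

Feng Fengxiang, Ren Huiling, Li Xiaoying, Wang Weijie, Wang Xu, Zhang Ying. Research on Chinese electronic medical record entity mapping method by fusing similarity algorithm and pre-trained model. 2023. biomedRxiv.202305.00015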

