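Guo Xuanzhi, Zhou Wujie, Shang Xin, Lian Chunhua, Zhan Kaiming, Lin Longyong. A UNILM-based conditional question generation model for traditional Chinese medicine literature. 2021. biomedRxiv.202110.00036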
A UNILM-Based Conditional Question Generation Model for Traditional Chinese Medicine Literature
Corresponding author: Shang Xin, 2899870779@qq.com
Abstract: This paper addresses a conditional text generation task in the field of traditional Chinese medicine (TCM): given a paragraph or sentence together with its answer, generate the corresponding question or group of questions. Traditional methods model this task mainly with recurrent neural networks, but these methods have several problems: (1) low accuracy; (2) poor parallelism; (3) relatively severe exposure bias and repetitive generation; (4) severe long-term dependency problems. Some recent state-of-the-art models are difficult to reproduce because Chinese pre-training resources and computing resources are lacking. To address these problems, we propose a conditional question generation model based on UNILM and add two additional embedding layers, a copy mechanism, adversarial training, and other modules to the base model. Under the conditions of a single base model, no beam search, and case-insensitive evaluation, our model took second place (63.56%, versus 63.79% for first place) in the TCM Literature Question Generation Challenge on the Tianchi platform, and it still leaves considerable room for improvement.
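A note on how UNILM performs conditional generation: a single Transformer encoder is turned into a sequence-to-sequence model purely through its self-attention mask. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each example is packed as [source tokens][target tokens], where the source is the paragraph plus the answer and the target is the question, marked by segment ids 0 and 1; the function name `unilm_seq2seq_mask` is hypothetical. The mask is built from a running sum of the segment ids, which is one common way to express the UNILM seq2seq mask.

```python
from itertools import accumulate


def unilm_seq2seq_mask(segment_ids: list[int]) -> list[list[int]]:
    """Hypothetical helper: build a UNILM-style seq2seq attention mask.

    Assumes segment_ids is a run of 0s (source: paragraph + answer)
    followed by a run of 1s (target: question).
    mask[i][j] == 1 means position i may attend to position j.
    """
    # Running sum: 0 on every source token, then 1, 2, 3, ... along the target.
    ranks = list(accumulate(segment_ids))
    n = len(ranks)
    # Position i may attend to j iff rank[j] <= rank[i]:
    # - source tokens (rank 0) see only the source, in both directions;
    # - target tokens see the whole source plus the target prefix
    #   up to and including themselves (left-to-right).
    return [[1 if ranks[j] <= ranks[i] else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Three source tokens followed by two target tokens.
    for row in unilm_seq2seq_mask([0, 0, 0, 1, 1]):
        print(row)
    # [1, 1, 1, 0, 0]
    # [1, 1, 1, 0, 0]
    # [1, 1, 1, 0, 0]
    # [1, 1, 1, 1, 0]
    # [1, 1, 1, 1, 1]
```

Because no target position can see a later target position, the model can be trained with teacher forcing and decoded left to right (for example greedily, since the reported result uses no beam search), while the source side keeps full bidirectional context.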
Key words: question generation; Unified Language Model Pre-training for Natural Language Understanding and Generation (UNILM); copy mechanism; adversarial training
Submitted: 2022-04-07
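Copyright notice: The authors alone hold the copyright of this paper; the preprint system holds only the right to preserve it permanently. No one may reuse it without permission.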
