xie jia qi. Leveraging Pre-trained Language Model for Consumer Health Question Classification. 2021. biomedRxiv.202101.00017
Leveraging Pre-trained Language Model for Consumer Health Question Classification
Corresponding author: xie jia qi, 66350354@qq.com
DOI: 10.12201/bmr.202101.00017
Abstract: Data mining has recently been widely applied in practical scenarios, especially in smart healthcare. Data mining algorithms are crucial for making full use of medical data, for example in health question classification. Health question classification aims to detect the different question types present in a given sentence, and accurately distinguishing them is important for smart healthcare. Medical data currently available on the web is unstructured and non-standardized. Because this data is unlabeled, it is hard to extract useful information from it, and without high-quality labeled data it is hard to train a good classifier. In this paper, we apply several pre-trained language models to the health question classification task, including BERT-base, BERT-wwm and RoBERTa. Fine-tuning these pre-trained models on the labeled data yields neural classifiers for the task. However, fine-tuning pre-trained models can produce unstable results, which may hurt performance in practical use. We therefore employ adversarial training to improve the stability of our model. In addition, category_C is rare in the training set, so we design a rule-based method to detect it, combining neural and human knowledge to further improve the model's performance. Experimental results show that our method achieves good performance on the leaderboard.
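The combination of neural and rule-based predictions described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the actual rules, the decision threshold, or the label format, so the keyword patterns, the `predict_labels` helper, the 0.5 threshold, and the multi-label setup are all hypothetical. The fine-tuned model is not shown; its output is represented by a plain dictionary of per-label probabilities.

```python
import re
from typing import Dict, List

# Hypothetical keyword patterns for the rare category. The abstract does not
# list the real rules, so these are placeholders for illustration only.
RARE_CATEGORY_PATTERNS: List[re.Pattern] = [
    re.compile(r"\bside effects?\b", re.IGNORECASE),
    re.compile(r"\badverse reactions?\b", re.IGNORECASE),
]


def rule_matches(question: str) -> bool:
    """Return True if any hand-written pattern fires on the question."""
    return any(p.search(question) for p in RARE_CATEGORY_PATTERNS)


def predict_labels(
    question: str,
    neural_probs: Dict[str, float],
    threshold: float = 0.5,          # assumed cut-off, not from the paper
    rare_label: str = "category_C",
) -> List[str]:
    """Combine neural scores with the rule-based detector.

    `neural_probs` maps each label to a probability produced by a fine-tuned
    model (not shown here). Labels at or above `threshold` are kept, and the
    rare label is added whenever a rule fires, even if its neural score is low.
    """
    labels = {label for label, p in neural_probs.items() if p >= threshold}
    if rule_matches(question):
        labels.add(rare_label)
    return sorted(labels)
```

In this sketch the rules can only add the rare label and never remove a neural prediction, which is one simple way to compensate for a class the model has seen too rarely to learn well; other combination schemes are equally plausible.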
Key words: Consumer Health, Question Classification, Deep Learning, Pre-trained Language Model, Adversarial Training
Submit time: 27 May 2021
Copyright: The copyright holder for this preprint is the author/funder, who has granted biomedRxiv a license to display the preprint in perpetuity.
Xu Xiaowei, Guo Haihong, Li Jiao. Evaluating Data Mining Algorithms for Consumer Health-Related Question Classification. 2021. doi: 10.12201/bmr.202101.00018
Gu Yao-wen, Li Jiao. Progress of Mining Electronic Health Records based on Unsupervised Deep Learning Methods. 2021. doi: 10.12201/bmr.202104.00013
kangyishuai, shaochenjie. An Algorithm for Generating TCM Document Questions Based on Unified Language Model. 2022. doi: 10.12201/bmr.202110.00044
guo xuan zhi, zhou wu jie, shang xin, lian chun hua, zhan kai ming, lin long yong. The Model based on UNILM of question conditional generation in the field of Chinese medicine. 2021. doi: 10.12201/bmr.202110.00036
Guo Mengying, Zhou Yi, He Jingshu, Pan Jiaxin, Sun JingKai, Huang Wei. Research on Function Classification of WeChat Official Account Service Platform of Traditional Chinese Medicine Hospital Based on Card Classification. 2020. doi: 10.12201/bmr.202010.00833
jia lirong. research on question understanding about the automatic question answering system of TCM. 2021. doi: 10.12201/bmr.202101.00002
刘, Yan Zhu, Zongyou Li, Dongfei Lin, Lihong Liu, Dongyun Shi. A study on Diseases Classification and Model of the SNOMED CT. 2021. doi: 10.12201/bmr.202110.00005
liuqingjin, wangrui, miaoyuanqing. Intelligent Detection of Silent Myocardial Ischemia Dynamic Electrocardiogram Based on Deep Learning. 2021. doi: 10.12201/bmr.202111.00009
limengxiang, xuyang, chenlei. Research on construction and application of online intelligent pre-consultation system. 2021. doi: 10.12201/bmr.202110.00026
Guo Yi, Gong Liyue, Hu Dehua. Research on the Influencing Factors of Users Continuance Intention of Online Health Communities--Based on the Integrated Model. 2021. doi: 10.12201/bmr.202110.00041
Version history:
ID 1, submitted 2021-01-14, number bmr.202101.00017V1