ruanxuling, liuqi, guo zhiheng, yanjunfeng. Research on prediction model of breast cancer based on LDA and XGBoost algorithm. 2022. biomedRxiv.202106.00007
Research on prediction model of breast cancer based on LDA and XGBoost algorithm
Corresponding author: yanjunfeng, junfengyan@hnucm.edu.cn
DOI: 10.12201/bmr.202106.00007
Abstract: Breast cancer is the leading cause of cancer death in women, and the number of male breast cancer patients cannot be ignored. Using information technology to predict the disease is therefore an important way to improve the diagnosis rate. This experiment reduces the dimensionality of the multi-indicator features in the breast cancer dataset provided by the Kaggle database, analyzing the 30-dimensional medical test indicators of 498 groups of breast cancer patients. Linear discriminant analysis (LDA) is used to merge the feature attributes and project the data into a low-dimensional space. An eXtreme Gradient Boosting (XGBoost) prediction model is then constructed, with its optimal parameters obtained by grid search with cross-validation, and AdaBoost, random forest and naive Bayes classifiers are used for performance comparison. The experimental results show that the classification accuracy of the prediction model trained after dimensionality reduction is 2.7% higher than before dimensionality reduction, and that the XGBoost prediction model gives the best classification performance, with an accuracy of 98.7%.
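The workflow in the abstract (LDA projection, a gradient-boosted classifier tuned by grid-search cross-validation, then AdaBoost, random forest and naive Bayes as baselines) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' code: it assumes scikit-learn is installed, uses the `xgboost` package's `XGBClassifier` when available and otherwise falls back to scikit-learn's `GradientBoostingClassifier` (a swapped-in technique, not XGBoost), and uses scikit-learn's bundled Wisconsin diagnostic breast cancer data (569 samples, 30 features) as a stand-in for the paper's Kaggle dataset. The split, the hyperparameter grid and the helper names `make_booster`, `lda_pipeline` and `run` are hypothetical. Note that for a two-class problem LDA yields at most `n_classes - 1 = 1` discriminant component.

```python
"""Sketch (assumptions noted in comments): LDA projection + gradient boosting,
tuned by grid-search CV, compared with AdaBoost, random forest and naive Bayes."""
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

try:
    from xgboost import XGBClassifier  # the paper's classifier, if installed
except ImportError:
    XGBClassifier = None


def make_booster(seed=0):
    """Hypothetical helper: XGBoost if available, else a stand-in booster."""
    if XGBClassifier is not None:
        return XGBClassifier(eval_metric="logloss", random_state=seed)
    # Fallback (NOT XGBoost): scikit-learn gradient boosting, which accepts
    # the same max_depth / learning_rate / n_estimators grid used below.
    return GradientBoostingClassifier(random_state=seed)


def lda_pipeline(clf):
    # Project the 30 features onto LDA components; with 2 classes,
    # LDA can produce at most 1 component.
    return Pipeline([
        ("lda", LinearDiscriminantAnalysis(n_components=1)),
        ("clf", clf),
    ])


def run(seed=0):
    # Assumed stand-in dataset, not the paper's 498-record Kaggle data.
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)

    # Hypothetical grid; the abstract does not give the searched ranges.
    grid = {
        "clf__max_depth": [2, 3],
        "clf__learning_rate": [0.05, 0.1],
        "clf__n_estimators": [50, 100],
    }
    search = GridSearchCV(lda_pipeline(make_booster(seed)), grid,
                          cv=5, scoring="accuracy")
    search.fit(X_tr, y_tr)

    # Held-out accuracy of the tuned LDA + booster pipeline.
    results = {"boosting (tuned)": search.score(X_te, y_te)}

    # Comparison classifiers, each trained on the same LDA projection.
    baselines = {
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        "random forest": RandomForestClassifier(random_state=seed),
        "naive Bayes": GaussianNB(),
    }
    for name, clf in baselines.items():
        results[name] = lda_pipeline(clf).fit(X_tr, y_tr).score(X_te, y_te)

    # Same tuned booster on all 30 raw features, for a before/after-LDA check.
    no_lda = Pipeline([("clf", make_booster(seed))])
    no_lda.set_params(**search.best_params_)
    results["boosting without LDA"] = no_lda.fit(X_tr, y_tr).score(X_te, y_te)
    return search.best_params_, results


if __name__ == "__main__":
    best, scores = run()
    print("best params:", best)
    for name, acc in scores.items():
        print(f"{name:22s} {acc:.3f}")
```

Because the LDA step sits inside the `Pipeline`, `GridSearchCV` refits the projection on each training fold, so the validation folds do not leak into the dimensionality reduction. The accuracies this sketch prints depend on the stand-in data and split and are not the paper's reported figures.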
Key words: breast cancer; dimension reduction; LDA; XGBoost; classification
Submit time: 7 March 2022
Copyright: The copyright holder for this preprint is the author/funder, who has granted biomedRxiv a license to display the preprint in perpetuity.