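成路平 (Cheng Luping), 吴思扬 (Wu Siyang), 路波 (Lu Bo). Analysis of Risk Factors and Prediction of Type 2 Diabetes Mellitus Based on Machine Learning. 2025. biomedRxiv.202509.00003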
Analysis of Risk Factors and Prediction of Type 2 Diabetes Mellitus Based on Machine Learning
Corresponding author: Lu Bo, lubodf@163.com
DOI: 10.12201/bmr.202509.00003
Abstract: Purpose/Significance: Traditional evidence-based medicine (EBM) has limited ability to resolve the multi-factor interaction mechanisms of type 2 diabetes mellitus (T2DM). This study constructs a multidimensional data-mining prediction framework to improve the accuracy of risk assessment and the efficiency of clinical decision-making. Method/Process: Using the Pima dataset, univariate, bivariate, and multivariate analyses were conducted to screen core risk factors. Five machine learning models were then trained: logistic regression (LR), random forest (RF), support vector machine (SVM), XGBoost, and LightGBM. Hyperparameters were tuned by grid search with cross-validation. Result/Conclusion: The core risk factors identified, including blood glucose level, body mass index, and age, agree with conclusions from traditional EBM. Among the five models, RF achieved the highest prediction accuracy (0.8701) and the best overall performance. By combining data mining with feature selection, the method is reported to reduce data-collection costs, shorten the risk-factor identification cycle, and reveal nonlinear interactions among variables, providing an efficient tool for community-based screening of high-risk populations.
Key words: type 2 diabetes; risk prediction; multidimensional data mining; machine learning
Submitted: 2025-09-01
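The abstract mentions tuning model parameters by grid search with cross-validation. The following is a minimal sketch of that general technique, not the authors' pipeline: it uses only the Python standard library, a small hand-written L2-regularized logistic regression in place of the five models named in the abstract, and synthetic data in place of the Pima dataset. All names (`make_synthetic`, `fit_logreg`, `cross_val_accuracy`, `grid_search`), the two stand-in features, and the grid values are assumptions made for illustration.

```python
import math
import random


def make_synthetic(n=200, seed=0):
    """Synthetic stand-in data (NOT the Pima dataset): two standardized
    features, hypothetically labeled glucose and BMI, and a binary outcome."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        glucose, bmi = rng.gauss(0, 1), rng.gauss(0, 1)
        p = 1.0 / (1.0 + math.exp(-(1.5 * glucose + 0.8 * bmi)))
        X.append([glucose, bmi])
        y.append(1 if rng.random() < p else 0)
    return X, y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, l2, lr=0.1, epochs=200):
    """Batch gradient descent for L2-regularized logistic regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(model, X, y):
    """Fraction of samples whose predicted class matches the label."""
    w, b = model
    hits = 0
    for xi, yi in zip(X, y):
        predicted = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0
        hits += predicted == yi
    return hits / len(y)


def cross_val_accuracy(X, y, l2, k=5):
    """Mean held-out accuracy over k folds (every k-th sample forms a fold)."""
    n = len(X)
    scores = []
    for fold in range(k):
        held_out = set(range(fold, n, k))
        X_tr = [X[i] for i in range(n) if i not in held_out]
        y_tr = [y[i] for i in range(n) if i not in held_out]
        X_te = [X[i] for i in sorted(held_out)]
        y_te = [y[i] for i in sorted(held_out)]
        scores.append(accuracy(fit_logreg(X_tr, y_tr, l2), X_te, y_te))
    return sum(scores) / k


def grid_search(X, y, grid, k=5):
    """Score every candidate value by cross-validation and keep the best."""
    results = {l2: cross_val_accuracy(X, y, l2, k) for l2 in grid}
    best = max(results, key=results.get)
    return best, results


GRID = [0.0, 0.01, 0.1, 1.0]  # hypothetical candidate L2 penalties
X, y = make_synthetic()
best_l2, cv_scores = grid_search(X, y, GRID)
print("best L2 penalty:", best_l2)
print("mean CV accuracy per candidate:", cv_scores)
```

In practice the same loop is usually delegated to a library routine such as scikit-learn's `GridSearchCV`, which can wrap each of the model families named in the abstract; the hand-written version here only makes the two nested loops (candidates, then folds) explicit.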
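Copyright notice: The authors solely hold the copyright of this paper; the preprint system holds only the right to preserve the paper permanently. No one may reuse it without permission.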
Version history: No. 1, submitted 2025-04-30, ID bmr.202509.00003V1