Rui Chen (芮晨). Comparative and Explanatory Analysis of Influencing Factor Models for Type 2 Diabetes Mellitus Complicated with Hypertension Based on Machine Learning. 2025. biomedRxiv.202511.00058
Comparative and Explanatory Analysis of Influencing Factor Models for Type 2 Diabetes Mellitus Complicated with Hypertension Based on Machine Learning
Corresponding author: Rui Chen (芮晨), ruichen666666@163.com
DOI:10.12201/bmr.202511.00058
Abstract: Objective/Significance To identify the key factors influencing type 2 diabetes mellitus (T2DM) complicated with hypertension using machine learning and SHAP (SHapley Additive exPlanations) analysis, and to provide a basis for risk stratification and personalized intervention in high-risk populations. Methods/Process A total of 3839 inpatients admitted to the Central Hospital of C City from 2020 to 2022 were randomly divided into a training set and a test set at a ratio of 7:3. Four models of comorbidity influencing factors, Random Forest (RF), Support Vector Machine (SVM), XGBoost, and NGBoost, were constructed and compared. A further 1000 inpatients from the same department in 2023 served as a validation set, and model performance was verified with SHAP analysis and age-stratified subgroups. Results/Conclusions The NGBoost model performed best, with an accuracy of 0.9010, sensitivity of 0.8868, specificity of 0.9096, F1-score of 0.8707, and area under the receiver operating characteristic curve (AUC) of 0.9671 in internal evaluation on the 7:3 training/test split. On the validation set, it achieved an accuracy of 0.9184, sensitivity of 0.9145, specificity of 0.9207, F1-score of 0.8939, and AUC of 0.9745. SHAP analysis identified smoking, the number of complications, and thyroid disease among the key factors. The model showed good stability across four age subgroups. This study clarifies the relative importance of the influencing factors, which supports the formulation of effective clinical treatment and management strategies for T2DM complicated with hypertension.
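The abstract reports accuracy, sensitivity, specificity, F1-score, and AUC after a 7:3 random split. As a minimal sketch of how these quantities are conventionally computed (not the author's code), the following uses only the Python standard library. The function names `split_7_3`, `binary_metrics`, and `auc` are hypothetical, and the toy labels and scores are invented for illustration; they are not study data.

```python
import random


def split_7_3(records, seed=0):
    """Shuffle a copy of `records` and split it 70/30 into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and F1 from 0/1 labels.

    Assumes both classes occur and at least one positive is predicted,
    so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values): 1 = comorbid hypertension, 0 = not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

train, test = split_7_3(range(3839))
print(len(train), len(test))           # 2687 1152
print(binary_metrics(y_true, y_pred))  # every metric equals 0.75 here
print(auc(y_true, scores))             # 0.9375
```

The pairwise AUC is quadratic in sample size, which is fine for illustration; a rank-based routine such as scikit-learn's `roc_auc_score` would be the usual choice on a cohort of this size.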
Key words: Diabetes and Hypertension Comorbidity; T2DM; Hypertension; SHAP analysis; Machine Learning
Submitted: 2025-11-20
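Copyright notice: The author independently holds the copyright to this paper; the preprint system holds only the right to preserve the paper permanently. No one may reuse it without permission.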
Version history: No. 1, submitted 2025-08-04, ID 10.12201/bmr.202511.00058V1