Comparative and Explanatory Analysis of Influencing Factor Models for Type 2 Diabetes Mellitus Complicated with Hypertension Based on Machine Learning

Corresponding author: Rui Chen, ruichen666666@163.com
DOI: 10.12201/bmr.202511.00058
Statement: Papers published on this preprint system are intended only for the exchange and sharing of the latest research results. They have not been peer-reviewed and are therefore not recommended for direct use in guiding clinical practice.
Abstract: Objective/Significance To identify the key influencing factors of type 2 diabetes mellitus (T2DM) complicated with hypertension using machine learning and SHAP (SHapley Additive exPlanations) interpretability analysis, and to provide a basis for risk stratification and personalized intervention in high-risk populations. Methods/Process A total of 3839 inpatients admitted to the Central Hospital of C City from 2020 to 2022 were enrolled and randomly divided into a training set and a test set at a ratio of 7:3. Four models of comorbidity influencing factors, Random Forest (RF), Support Vector Machine (SVM), XGBoost, and NGBoost, were constructed and compared. In addition, 1000 inpatients from the same department in 2023 served as an independent validation set. SHAP analysis was used to interpret the model, and model performance was further validated in age-stratified subgroups. Results/Conclusions The NGBoost model performed best, with an accuracy of 0.9010, sensitivity of 0.8868, specificity of 0.9096, F1-score of 0.8707, and area under the receiver operating characteristic curve (AUC) of 0.9671. On the independent validation set, it achieved an accuracy of 0.9184, sensitivity of 0.9145, specificity of 0.9207, F1-score of 0.8939, and AUC of 0.9745. SHAP analysis identified smoking, the number of complications, and thyroid disease, among others, as key influencing factors. The NGBoost model also performed stably across four age subgroups. By clarifying the relative importance of each influencing factor, this study can help clinicians formulate effective treatment and management strategies for T2DM complicated with hypertension.

Key words: Diabetes and hypertension comorbidity; T2DM; Hypertension; SHAP analysis; Machine learning
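The abstract evaluates the models with accuracy, sensitivity, specificity, F1-score and AUC after a random 7:3 split, and re-checks performance in four age subgroups. A minimal, dependency-free sketch of those standard computations follows, assuming a binary label (1 = comorbid hypertension present, 0 = absent) and a model that outputs a predicted probability. The helper names (`random_split`, `binary_metrics`, `roc_auc`, `metrics_by_subgroup`), the 0.5 decision threshold, the seed, the age bands and the toy data are all hypothetical: the abstract does not report the threshold or the age cut-points, and this is not the authors' code.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def random_split(n: int, train_frac: float = 0.7, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Randomly split row indices 0..n-1 into training and test index lists (default 7:3)."""
    rng = random.Random(seed)  # hypothetical fixed seed so the split is reproducible
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(train_frac * n)
    return sorted(idx[:cut]), sorted(idx[cut:])


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return math.nan  # AUC is undefined when only one class is present
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def _ratio(num: int, den: int) -> float:
    # Guard against empty denominators (e.g. a subgroup with no positives)
    return num / den if den else math.nan


def binary_metrics(y_true: Sequence[int], scores: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, F1 and AUC for labels 1 = comorbidity, 0 = none."""
    y_pred = [1 if s >= threshold else 0 for s in scores]  # assumed 0.5 cut-off
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(y_true)),
        "sensitivity": _ratio(tp, tp + fn),       # recall on the positive class
        "specificity": _ratio(tn, tn + fp),       # recall on the negative class
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),   # harmonic mean of precision and sensitivity
        "auc": roc_auc(y_true, scores),
    }


def metrics_by_subgroup(y_true: Sequence[int], scores: Sequence[float],
                        groups: Sequence[str], threshold: float = 0.5) -> Dict[str, Dict[str, float]]:
    """Recompute the same metrics separately within each subgroup label (e.g. an age band)."""
    members: Dict[str, List[int]] = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    return {
        g: binary_metrics([y_true[i] for i in ix], [scores[i] for i in ix], threshold)
        for g, ix in members.items()
    }


if __name__ == "__main__":
    train_idx, test_idx = random_split(3839)
    print(len(train_idx), len(test_idx))  # 2687 1152
    y = [1, 1, 1, 0, 0, 0, 0, 1]                  # toy labels
    p = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]  # toy predicted probabilities
    age = ["<60"] * 4 + [">=60"] * 4              # hypothetical age bands
    print(binary_metrics(y, p))
    # {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75, 'f1': 0.75, 'auc': 0.9375}
    print(metrics_by_subgroup(y, p, age))
```

Computing AUC from pairwise comparisons (the Mann-Whitney formulation) gives the same value as the area under the empirical ROC curve, so a library routine such as scikit-learn's `roc_auc_score` should agree with it; the quadratic loop is acceptable for test sets of about a thousand patients. Reusing one metric function for every age band keeps the subgroup comparison like-for-like.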

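Submitted: 2025-11-20

Copyright statement: The author independently holds the copyright of this paper; the preprint system holds only the right of permanent preservation. No one may reuse it without permission.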
Version history
No.  Submitted   ID
1    2025-08-04  10.12201/bmr.202511.00058V1
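How to cite

Rui Chen. Comparative and Explanatory Analysis of Influencing Factor Models for Type 2 Diabetes Mellitus Complicated with Hypertension Based on Machine Learning. 2025. biomedRxiv.202511.00058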
