1. A Study on the Explainability of Thyroid Cancer Prediction: SHAP Values and Association-Rule Based Feature Integration Framework
Authors: Sujithra Sankar, S. Sathyalakshmi. Journal: Computers, Materials & Continua (SCIE, EI), 2024, Issue 5, pp. 3111-3138 (28 pages)
In the era of advanced machine learning techniques, the development of accurate predictive models for complex medical conditions, such as thyroid cancer, has shown remarkable progress. Accurate predictive models for thyroid cancer enhance early detection, improve resource allocation, and reduce overtreatment. However, the widespread adoption of these models in clinical practice demands predictive performance along with interpretability and transparency. This paper proposes a novel association-rule based feature-integrated machine learning model which shows better classification and prediction accuracy than present state-of-the-art models. Our study also focuses on the application of SHapley Additive exPlanations (SHAP) values as a powerful tool for explaining thyroid cancer prediction models. In the proposed method, the association-rule based feature integration framework identifies frequently occurring attribute combinations in the dataset. The original dataset is used in training machine learning models, and further used in generating SHAP values from these models. In the next phase, the dataset is integrated with the dominant feature sets identified through association-rule based analysis. This new integrated dataset is used in re-training the machine learning models. The new SHAP values generated from these models help in validating the contributions of feature sets in predicting malignancy. The conventional machine learning models lack interpretability, which can hinder their integration into clinical decision-making systems. In this study, the SHAP values are introduced along with association-rule based feature integration as a comprehensive framework for understanding the contributions of feature sets in modelling the predictions. The study discusses the importance of reliable predictive models for early diagnosis of thyroid cancer, and a validation framework of explainability. The proposed model shows an accuracy of 93.48%. Performance metrics such as precision, recall, F1-score, and the area under the receiver operating characteristic (AUROC) are also higher than the baseline models. The results of the proposed model help us identify the dominant feature sets that impact thyroid cancer classification and prediction. The features {calcification} and {shape} consistently emerged as the top-ranked features associated with thyroid malignancy, in both association-rule based interestingness metric values and SHAP methods. The paper highlights the potential of the rule-based integrated models with SHAP in bridging the gap between the machine learning predictions and the interpretability of this prediction which is required for real-world medical applications.
Keywords: Explainable AI; machine learning; clinical decision support systems; thyroid cancer; association-rule based framework; SHAP values; classification and prediction
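The abstract above uses SHAP values to attribute a prediction to individual features. As a minimal sketch of the underlying idea only (this is neither the authors' pipeline nor the `shap` library), exact Shapley values can be computed for a tiny model by averaging each feature's marginal contribution over all coalitions of the other features. The scoring function `toy_model`, its weights, and the third feature `margin` are hypothetical and chosen purely for illustration; only the feature names calcification and shape come from the abstract.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                # Shapley weight for a coalition of size k out of n features.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_model(present):
    """Hypothetical malignancy score; the weights are invented for illustration."""
    score = 0.0
    if "calcification" in present:
        score += 0.3
    if "shape" in present:
        score += 0.2
    if "calcification" in present and "shape" in present:
        score += 0.1  # interaction term, shared between the two features
    return score


features = ["calcification", "shape", "margin"]
phi = shapley_values(toy_model, features)
for name, value in phi.items():
    print(f"{name}: {value:.3f}")
```

In this toy setting the interaction term is split evenly between the two interacting features, and the attributions sum to the difference between the full-model score and the empty-coalition score. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.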
2. An innovative machine learning workflow to research China’s systemic financial crisis with SHAP value and Shapley regression
Authors: Da Wang, YingXue Zhou. Journal: Financial Innovation, 2024, Issue 1, pp. 1875-1914 (40 pages)
This study proposed a cutting-edge, multistep workflow and upgraded it by addressing its flaw of not considering how to determine the index system objectively. It then used the updated workflow to identify the probability of China’s systemic financial crisis and analyzed the impact of macroeconomic indicators on the crisis. The final workflow comprises four steps: selecting rational indicators, modeling using supervised learning, decomposing the model’s internal function, and conducting non-linear, non-parametric statistical inference, with the advantages of objective index selection, accurate prediction, and high model transparency. In addition, since China’s international influence is progressively increasing, and the report of the 19th National Congress of the Communist Party of China has demonstrated that China is facing severe risk control challenges and stressed that the government should ensure that no systemic risks emerge, this study selected China’s systemic financial crisis as an example. Specifically, one global trade factor and 11 country-level macroeconomic indicators were selected to build the machine learning models. The prediction models captured six risk-rising periods in China’s financial system from 1990 to 2020, which is consistent with reality. The interpretation techniques show the non-linearities of risk drivers, expressed as threshold and interval effects. Furthermore, Shapley regression validates the alignment of the indicators. The final workflow is suitable for categorical and regression analyses in several areas. These methods can also be used independently or in combination, depending on the research requirements. Researchers can switch to other suitable shallow machine learning models or deep neural networks for modeling. The results regarding crises could provide specific references for bank regulators and policymakers to develop critical measures to maintain macroeconomic and financial stability.
Keywords: China; Machine learning; SHAP value; Shapley regression; Systemic financial crisis
3. Explainable multi-step heating load forecasting: Using SHAP values and temporal attention mechanisms for enhanced interpretability (cited by: 1)
Authors: Alexander Neubauer, Stefan Brandt, Martin Kriegel. Journal: Energy and AI, 2025, Issue 2, pp. 164-179 (16 pages)
The role of heating load forecasts in the energy transition is significant, given the considerable increase in the number of heat pumps and the growing prevalence of fluctuating electricity generation. While machine learning methods offer promising forecasting capabilities, their black-box nature makes them difficult to interpret and explain. The deployment of explainable artificial intelligence methodologies enables the actions of these machine learning models to be made transparent. In this study, a multi-step forecast was employed using an Encoder-Decoder model to forecast the hourly heating load for a multifamily residential building and a district heating system over a forecast horizon of 24 h. By using 24 instead of 48 lagged hours, the simulation time was reduced from 92.75 s to 45.80 s and the forecast accuracy was increased. The feature selection was conducted for four distinct methods. The Tree and Deep SHAP methods yielded superior results in feature selection. The application of feature selection according to the Deep SHAP values resulted in a reduction of 3.98% in the training time and an 8.11% reduction in the NRMSE. The utilisation of local Deep SHAP values enables the visualisation of the influence of past input hours and individual features. By mapping temporal attention, it was possible to demonstrate the importance of the most recent time steps in an intrinsic way. The combination of explainable methods enables plant operators to gain further insights and trustworthiness from the purely data-driven forecast model, and to identify the importance of individual features and time steps.
Keywords: Multi-step load forecasting; Explainable AI (XAI); SHAP values; Encoder-Decoder model; Attention mechanisms; Feature selection
4. Enhanced hardenability prediction in 20CrMo special steel via XGBoost model (cited by: 1)
Authors: De-xin Zhu, Bin-bin Wang, Hai-tao Zhao, Sen Wu, Fu-yong Li, Sheng-yong Huang, Hong-hui Wu, Shui-ze Wang, Chao-lei Zhang, Jun-heng Gao, Xin-ping Mao. Journal: Journal of Iron and Steel Research International, 2025, Issue 4, pp. 1023-1033 (11 pages)
Machine learning is employed to comprehensively analyze and predict the hardenability of 20CrMo steel. The hardenability dataset includes J9 and J15 hardenability values, chemical composition, and heat treatment parameters. Various machine learning models, including linear regression (LR), k-nearest neighbors (KNN), random forest (RF), and extreme gradient boosting (XGBoost), are employed to develop predictive models for the hardenability of 20CrMo steel. Among these models, the XGBoost model achieves the best performance, with coefficients of determination (R²) of 0.941 and 0.946 for predicting J9 and J15 values, respectively. The predictions fall within a ±2 HRC bandwidth for 98% of J9 cases and 99% of J15 cases. Additionally, SHapley Additive exPlanations (SHAP) analysis is used to identify the key elements that significantly influence the hardenability of the 20CrMo steel. The analysis revealed that alloying elements such as Si, Cr, C, N and Mo play significant roles in hardenability. The strengths and weaknesses of various machine learning models in predicting hardenability are also discussed.
Keywords: Hardenability; Gear steel; Jominy test; Machine learning; SHAP value; Feature engineering
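The abstract above reports two evaluation figures: the coefficient of determination (R²) and the share of predictions that fall within a ±2 HRC band of the measured value. The sketch below shows how such figures can be computed. The function names and the five sample hardness values are hypothetical illustrations, not data or code from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fraction_within_band(y_true, y_pred, band=2.0):
    """Share of predictions whose absolute error is at most `band` (here, HRC)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= band)
    return hits / len(y_true)


# Hypothetical measured and predicted hardness values in HRC.
measured = [30.0, 32.0, 34.0, 36.0, 38.0]
predicted = [31.0, 32.0, 33.0, 39.0, 38.0]

print(f"R^2 = {r_squared(measured, predicted):.3f}")
print(f"within +/-2 HRC = {fraction_within_band(measured, predicted):.0%}")
```

The two figures answer different questions: R² summarises explained variance across the whole test set, while the band fraction counts how often an individual prediction is close enough to be usable under a fixed tolerance.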
5. Spatial heterogeneity of groundwater depths in coastal cities and their responses to multiple factors interactions by interpretable machine learning models
Authors: Yuming Mo, Jing Xu, Senlin Zhu, Beibei Xu, Jinran Wu, Guangqiu Jin, You-Gan Wang, Ling Li. Journal: Geoscience Frontiers, 2025, Issue 3, pp. 223-241 (19 pages)
Understanding spatial heterogeneity in groundwater responses to multiple factors is critical for water resource management in coastal cities. Daily groundwater depth (GWD) data from 43 wells (2018-2022) were collected in three coastal cities in Jiangsu Province, China. Seasonal and Trend decomposition using Loess (STL) together with wavelet analysis and empirical mode decomposition were applied to identify tide-influenced wells, while the remaining wells were grouped by hierarchical clustering analysis (HCA). Machine learning models were developed to predict GWD, then their response to natural conditions and human activities was assessed by the Shapley Additive exPlanations (SHAP) method. Results showed that eXtreme Gradient Boosting (XGB) was superior to other models in terms of prediction performance and computational efficiency (R² > 0.95). GWD in Yancheng and southern Lianyungang was greater than in Nantong and exhibited larger fluctuations. Groundwater within 5 km of the coastline was affected by tides, with more pronounced effects in agricultural areas compared to urban areas. Shallow groundwater (3-7 m depth) responded immediately (0-1 day) to rainfall, primarily influenced by farmland and topography (slope and distance from rivers). Rainfall recharge to groundwater peaked at 50% farmland coverage, but this effect was suppressed by high temperatures (>30 ℃), which intensified as distance from rivers increased, especially in forest and grassland. Deep groundwater (>10 m) showed delayed responses to rainfall (1-4 days) and temperature (10-15 days), with GDP as the primary influence, followed by agricultural irrigation and population density. Farmland helped to maintain stable GWD in low population density regions, while excessive farmland coverage (>90%) led to overexploitation. In the early stages of GDP development, increased industrial and agricultural water demand led to GWD decline, but as GDP levels significantly improved, groundwater consumption pressure gradually eased. This methodological framework is applicable not only to coastal cities in China but could also be extended to coastal regions worldwide.
Keywords: Groundwater depth; Spatial heterogeneity; Multiple influence factors; Coastal cities; Machine learning models; SHAP values
6. Real-Time Fraud Detection Using Machine Learning
Author: Benjamin Borketey. Journal: Journal of Data Analysis and Information Processing, 2024, Issue 2, pp. 189-209 (21 pages)
Credit card fraud remains a significant challenge, with financial losses and consumer protection at stake. This study addresses the need for practical, real-time fraud detection methodologies. Using a Kaggle credit card dataset, I tackle class imbalance using the Synthetic Minority Oversampling Technique (SMOTE) to enhance modeling efficiency. I compare several machine learning algorithms, including Logistic Regression, Linear Discriminant Analysis, K-nearest Neighbors, Classification and Regression Tree, Naive Bayes, Support Vector, Random Forest, XGBoost, and Light Gradient-Boosting Machine to classify transactions as fraud or genuine. Rigorous evaluation metrics, such as AUC, PRAUC, F1, KS, Recall, and Precision, identify the Random Forest as the best performer in detecting fraudulent activities. The Random Forest model successfully identifies approximately 92% of transactions scoring 90 and above as fraudulent, equating to a detection rate of over 70% for all fraudulent transactions in the test dataset. Moreover, the model captures more than half of the fraud in each bin of the test dataset. SHAP values provide model explainability, with the SHAP summary plot highlighting the global importance of individual features, such as “V12” and “V14”. SHAP force plots offer local interpretability, revealing the impact of specific features on individual predictions. This study demonstrates the potential of machine learning, particularly the Random Forest model, for real-time credit card fraud detection, offering a promising approach to mitigate financial losses and protect consumers.
Keywords: Credit Card Fraud Detection; Machine Learning; SHAP values; Random Forest
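The abstract above names SMOTE as its remedy for class imbalance. The core step of SMOTE is to create a synthetic minority sample by interpolating between a real minority sample and one of its nearest minority-class neighbours. The sketch below illustrates that step in plain Python. The function `smote_sketch`, its parameters, and the four sample points are hypothetical; this is a simplified illustration rather than the author's code or a library implementation.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples.

    Simplified illustration of the SMOTE idea; not a drop-in replacement
    for a library implementation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nn
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


# Hypothetical two-feature minority-class (fraud) samples.
fraud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(fraud, n_new=6)
for point in new_points:
    print(point)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling of this kind is normally applied to the training split only, so that the test set still reflects the true class ratio.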
7. A Preliminary Study of a Shandong Tobacco Leaf Quality Prediction Model Based on the XGBoost Algorithm (cited by: 11)
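Authors: Bie Rui (别瑞), Zhou Tingyun (周婷云), Zhou Xiansheng (周显升), Jiang Bin (姜滨), Zhou Yong (周永), Qiu Jun (邱军), Cao Jianmin (曹建敏). Journal: 《中国烟草科学》 (Chinese Tobacco Science; CSCD, PKU Core), 2022, Issue 5, pp. 80-86, 93 (8 pages)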
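To explore the relationship between the chemical composition and the sensory quality of tobacco leaves, and to examine how well machine learning algorithms perform in tobacco leaf quality evaluation, Shandong tobacco leaves were used as the test material. Twenty major chemical components were measured, including routine components, alkaloids, organic acids, polyphenols, and mono- and disaccharides, and sensory quality was evaluated. The samples were divided into three quality grades (good, medium, and poor) according to their sensory quality. A genetic algorithm was used to optimise the hyperparameters of XGBoost, and a model was built to predict the quality grade of Shandong tobacco leaves from chemical composition. The SHAP value model-explanation framework was introduced for global explanation and feature dependence analysis. The model classified the quality grade of Shandong tobacco leaves with an accuracy of 85%, and it recognised the third quality grade best. The SHAP value global explanation showed that the seven characteristic indicators affecting the quality of Shandong flue-cured tobacco ranked by contribution as follows: acid-to-phenol ratio > sucrose > chlorine > nicotine > nornicotine > citric acid > sugar-to-nicotine ratio. The sugar-to-nicotine ratio, sucrose, and the acid-to-phenol ratio were the chemical indicators that contributed most to identifying the good, medium, and poor grades, respectively. The XGBoost-based prediction model is effective, reliable, and highly interpretable for discriminating tobacco leaf quality grades, and it offers guidance for tobacco leaf quality evaluation and production.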
Keywords: Shandong tobacco leaf; XGBoost; machine learning; SHAP value; quality prediction
8. Can Machine Learning Methods Identify the Probability of Systemic Financial Risk in China?
Authors: Wang Da (王达), Zhou Yingxue (周映雪). Journal: 《金融市场研究》 (Financial Market Research), 2023, Issue 7, pp. 48-58 (11 pages)
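This paper uses gradient boosted trees, a machine learning model, on a macroeconomic dataset of 25 feature variables for 17 countries including the United States. It builds a risk identification model to analyse the probability of systemic risk in China comprehensively, and it uses SHAP Value to explain the model, exploring China's risk drivers under a non-linear, non-parametric model. The empirical results show that the gradient boosted tree model captures systemic risk significantly better than the traditional logistic regression model and tracks the trend of China's risk probability well. The SHAP Value decomposition shows that credit factors, monetary factors, financial marketisation factors, and gross domestic savings are the main factors that raise risk, and that all of them exhibit clear threshold effects.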
Keywords: systemic risk; machine learning; gradient boosted trees; SHAP value