Funding: This work is supported by the NSFC [Grant Nos. 61772281, 61703212, 61602254]; the Jiangsu Province Natural Science Foundation [Grant No. BK2160968]; the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD); and the Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET).
Abstract: Privacy protection is a hot research topic in the field of information security. An improved XGBoost algorithm is proposed to protect privacy in classification tasks. By incorporating differential privacy, XGBoost can improve classification accuracy while protecting private information. When a CART regression tree is used to build a single decision tree, noise is added according to the Laplace mechanism. Compared with the random forest algorithm, this algorithm can reduce computation cost and prevent overfitting to a certain extent. The experimental results show that the proposed algorithm is more effective than other traditional algorithms while protecting the private information in the training data.
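The Laplace mechanism named in this abstract can be illustrated with a minimal Python sketch. The abstract does not say which quantity inside the tree receives the noise, so the sketch only perturbs a generic numeric value; the function names `laplace_noise` and `private_value`, and the `sensitivity` and `epsilon` parameters, are illustrative assumptions rather than the authors' implementation.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one zero-mean Laplace sample with the given scale.

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace(0, scale) distribution.
    """
    if scale == 0:
        return 0.0
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_value(true_value: float, sensitivity: float, epsilon: float,
                  rng: random.Random) -> float:
    """Return true_value perturbed by Laplace noise of scale sensitivity / epsilon.

    Assumption: `sensitivity` is the most that one training record can
    change `true_value`; `epsilon` is the privacy budget spent on this query.
    """
    if epsilon <= 0 or sensitivity < 0:
        raise ValueError("epsilon must be positive and sensitivity non-negative")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical example: a count of 120 samples in a leaf, sensitivity 1.
    print(private_value(120.0, sensitivity=1.0, epsilon=0.5, rng=rng))
```

A smaller `epsilon` gives a larger noise scale, which means stronger privacy at the cost of accuracy.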
Funding: The National Natural Science Foundation Project (Grant No. 61863027); the Special Research Project on High Quality Development of Innovation and Entrepreneurship Education of the Chinese Society of Higher Education (Grant No. 21CXD01); and the Key R&D Plan of Jiangxi Province (Grant No. 20202BBGL73057).
Abstract: Objective: To establish a stroke prediction and feature analysis model integrating XGBoost and SHAP to aid the clinical diagnosis and prevention of stroke. Methods: Based on an open data set on Kaggle, and with the help of data preprocessing and grid-search parameter optimization, an interpretable stroke risk prediction model was established by integrating XGBoost and SHAP, and an explanatory analysis of risk factors was performed. Results: The XGBoost model’s accuracy, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC) were 96.71%, 93.83%, 99.59%, and 99.19%, respectively. Our explanatory analysis showed that age, type of residence, and history of hypertension were key factors affecting the incidence of stroke. Conclusion: Based on the data set, our analysis showed that the established model can be used to identify stroke, and that the SHAP-based explanatory analysis increases the transparency of the model and helps medical practitioners assess its reliability.
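For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, sensitivity, and specificity are commonly derived from binary labels. It is a generic illustration, not the authors' evaluation code; the function name `classification_metrics` and the 0/1 label encoding are assumptions.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity and specificity for 0/1 labels.

    Assumption: 1 marks the positive class (for example, stroke) and 0 the negative class.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Undefined when a class is absent; report NaN instead of failing.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # share of positives detected
        "specificity": ratio(tn, tn + fp),  # share of negatives correctly rejected
    }
```

Sensitivity and specificity are reported alongside accuracy because accuracy alone can look high on imbalanced data even when few positive cases are detected.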
Funding: The National Natural Science Foundation of China (No. U22B20141) and the SINOPEC funded project (No. 31900000-21-ZC0607-0009).
Abstract: Based on data from a petrochemical company’s MIP unit over the past three years, 19 input variables and 2 output variables were selected for modeling from among 155 variables using the maximum information coefficient and the Pearson correlation coefficient. The 155 variables included properties of the feedstock oil and spent catalyst, operational variables, and material flows. The distillation range variables were reduced using factor analysis, and the feedstock oils were clustered into three types using the K-means++ algorithm. Each feedstock oil type was then used as an input variable for modeling. An XGBoost model and a back propagation (BP) neural network model with a 20-15-15-2 structure were developed to predict the combined yield of gasoline and propylene, as well as the coke yield. On the test set, the BP neural network model demonstrated better fitting and generalization abilities than the XGBoost model, with a mean absolute percentage error of 1.48% and a determination coefficient of 0.738. It was therefore chosen for further optimization work. A genetic algorithm was used to optimize the operational variables in order to increase the combined yield of gasoline and propylene while controlling the growth of coke yield. Seven commercial tests in the MIP unit showed an average increase of 1.39 percentage points in the combined yield of gasoline and propylene and an average decrease of 0.11 percentage points in coke yield. These results indicate that the model effectively improves the combined yield of gasoline and propylene while controlling the increase in coke yield.
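The two test-set metrics quoted in this abstract follow standard definitions, sketched below in plain Python. The function names are illustrative assumptions, and the sketch does not reproduce the authors' data or models.

```python
def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE in percent: 100/n * sum(|y - y_hat| / |y|)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    total = sum(abs(y - p) / abs(y) for y, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def determination_coefficient(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot
```

A lower MAPE and a higher determination coefficient both indicate a closer fit, which is the basis on which the abstract compares the two models.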
Funding: Supported by the National Natural Science Foundation of China (No. 21421063, No. 21473166, No. 21573211, No. 21633007, No. 21790350, No. 21803067, No. 91950207); the Chinese Academy of Sciences (QYZDB-SSW-SLH018); the Anhui Initiative in Quantum Information Technologies (AHY090200); the USTC-NSRL Joint Funds (UN2018LHJJ); and the Anhui Provincial Natural Science Foundation (2108085QB63). Numerical and theoretical simulations were done in the Supercomputing Center of USTC.
Abstract: In the current era of rapid development in electric vehicles and electrochemical energy storage, solid-state battery technology is attracting considerable research attention. Solid-state electrolytes, the key component of next-generation battery technology, are favored for their high safety, high energy density, and long life. However, finding high-performance solid-state electrolytes is the primary challenge for solid-state battery applications. Focusing on inorganic solid-state electrolytes, this work highlights the need for ideal solid-state electrolytes to have low electronic conductivity, good thermal stability, and structural and phase stability. Traditional experimental and theoretical computational methods are inefficient, so machine learning offers a novel path to predicting material properties by analyzing a large number of inorganic structural properties and characteristics. Using the gradient-boosting-based XGBoost algorithm, we successfully predicted the energy band structure and stability of the materials, and from more than 6000 structures we screened out 194 ideal solid-state electrolyte structures that simultaneously satisfy the requirements of low electronic conductivity and stability, which greatly accelerates the development of solid-state batteries.
Funding: Supported by the National Natural Science Foundation of China (52362050, 52472347); the Science and Technology Project of Shandong Transportation Department (2022KJ-044); the “Hongliu Excellent Young” Talents Support Program of Lanzhou University of Technology; and the Fundamental Research Funds for the Central Universities, CHD (300102223505).
Abstract: Driving fatigue is one of the important causes of accidents in tunnel (group) sections. To effectively identify the driving fatigue of drivers in tunnel (group) sections, an eye tracker and other instruments were used in real-vehicle tests on long tunnel (group) expressways to obtain the eye movement, driving duration, and Karolinska sleepiness scale (KSS) data of 30 drivers. The impacts of tunnel and non-tunnel sections on drivers were compared, and the relationship between driving fatigue and indexes such as blink frequency, blink duration, mean blink duration, and driving duration was studied. A paired t-test and a Spearman correlation test were performed to select the indexes that effectively characterize tunnel driving fatigue. A driving fatigue detection model was then developed based on the XGBoost algorithm. The results show that blink frequency, total blink duration, and mean blink duration gradually increase as driving fatigue deepens, and that mean blink duration is the most sensitive index in the tunnel environment. In addition, a significant correlation exists between the driving duration index and driving fatigue, which can provide a reference for improving tunnel safety. Using mean blink duration and driving duration as the characteristic indexes, the XGBoost-based driving fatigue detection model reaches an accuracy of 98%. The cumulative and continuous tunnel proportion effectively estimates the driving fatigue state in a long tunnel (group) environment.
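The Spearman correlation test used for index selection in this abstract is the Pearson correlation computed on ranks. The sketch below is a generic illustration in plain Python that assigns average ranks to ties; the function names and the sample values in the usage lines are hypothetical and are not taken from the study.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical values: driving duration (min) and a fatigue score per interval.
    duration = [10, 20, 30, 40, 50]
    fatigue = [2, 4, 5, 4.5, 10]
    print(spearman_rho(duration, fatigue))
```

Because it works on ranks, the coefficient captures monotonic relationships and suits ordinal scores such as KSS ratings.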
Funding: Supported by the Chongqing Social Science Planning Project (Grant Number 2020PY48), funded by the Chongqing Federation of Social Science; and the Joint Project of Chongqing Science and Technology Bureau and Health Commission (Grant Number 2020NCPZX03), funded by the Chongqing Science and Technology Bureau and the Chongqing Health Commission of China.
Abstract: Introduction: Clinical manifestations are essential for the early diagnosis of influenza-like illness (ILI). Machine learning models for influenza prediction were developed and a new ILI definition was introduced. Methods: A retrospective cohort study was conducted at three hospitals in southwest China between June 2022 and May 2023. Artificial intelligence was used to extract variables from medical records, and the XGBoost algorithm was used to develop prediction models for the total population and three age subgroups. A new ILI definition was introduced based on the optimal model, and its performance was compared with the WHO, China CDC, and USA CDC definitions. Results: A total of 200,135 patients were included. 4,249 (36.2%) were confirmed influenza. The predictors of the optimal model for the total population included epidemiological characteristics, important symptoms and signs, and age [area under the curve (AUC) 0.734 (0.710–0.750), accuracy 0.689 (0.669–0.772)]. The new ILI definition was fever (≥37.9℃) with cough or rhinorrhea; its AUC, sensitivity, and specificity for diagnosing influenza were 0.618 (0.598–0.639), 0.665, and 0.572, outperforming the WHO, China CDC, and USA CDC definitions (P<0.05). Conclusions: Fever, cough, and rhinorrhea may be the most important indicators for influenza surveillance.
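The ILI definition stated in this abstract is a simple rule, which can be written as a predicate. The sketch below encodes only what the abstract says (fever of at least 37.9℃ together with cough or rhinorrhea); the function and parameter names are illustrative assumptions, and details such as how temperature was measured are not given in the abstract.

```python
FEVER_THRESHOLD_C = 37.9  # threshold quoted in the abstract


def meets_ili_definition(temperature_c: float, cough: bool, rhinorrhea: bool) -> bool:
    """Return True when a patient has fever plus cough or rhinorrhea."""
    has_fever = temperature_c >= FEVER_THRESHOLD_C
    return has_fever and (cough or rhinorrhea)


if __name__ == "__main__":
    # Hypothetical patients, not data from the study.
    print(meets_ili_definition(38.4, cough=True, rhinorrhea=False))
    print(meets_ili_definition(37.2, cough=True, rhinorrhea=True))
```

Applying such a rule to labeled cases yields the sensitivity and specificity figures against which case definitions are compared.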
Funding: Sponsored by the National Natural Science Foundation of China (NSFC No. 52478011, No. 52378046).
Abstract: Urban block form significantly impacts energy and environmental performance. Optimizing urban block design in the early stages therefore helps enhance urban energy efficiency and environmental sustainability. However, widely used multi-objective optimization methods based on performance simulation face the challenges of high computational loads and low efficiency. This study introduces a framework that uses machine learning, specifically the XGBoost model, to accelerate multi-objective optimization of energy-efficient urban block forms. A residential block in Nanjing serves as the case study. The framework starts with a parametric block form model driven by design variables, with the objectives of minimizing building energy consumption (EUI) and maximizing photovoltaic energy generation (PVE) and outdoor sunlight hours (SH). Data generated through Latin Hypercube Sampling and performance simulations are used to train the model. After training and hyperparameter tuning, XGBoost’s predictive accuracy was validated against artificial neural network (ANN), support vector machine (SVM), and random forest (RF) models. XGBoost then replaced traditional performance simulations in multi-objective optimization with the NSGA-II algorithm. The results show that the framework significantly accelerates the optimization process, improving computational efficiency by over 420 times and producing 185 Pareto optimal solutions with improved performance metrics. SHAP analysis highlighted shape factor (SF), building density (BD), and building orientation (BO) as key morphological parameters influencing EUI, PVE, and SH. This study presents an efficient approach to energy-efficient urban block design and contributes valuable insights for sustainable urban development.
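The notion of a Pareto optimal solution in this abstract can be illustrated with a small dominance filter. This is a brute-force sketch, not NSGA-II and not the authors' pipeline; the function names and the candidate values are hypothetical. The objectives to be maximized (PVE and SH) are negated so that every objective is minimized.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def to_minimization(eui, pve, sh):
    """Map (EUI, PVE, SH) to a tuple where smaller is always better."""
    return (eui, -pve, -sh)


if __name__ == "__main__":
    # Hypothetical candidate designs as (EUI, PVE, SH); not values from the study.
    candidates = [(100, 50, 4), (90, 55, 5), (95, 60, 3), (110, 40, 2)]
    front = pareto_front([to_minimization(*c) for c in candidates])
    print(front)
```

This pairwise check costs quadratic time in the number of candidates, which is acceptable for a few hundred designs; NSGA-II uses a more elaborate non-dominated sorting together with crowding distance.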