Partial differential equations (PDEs) are among the most fundamental tools for modeling dynamic systems. Existing PDE modeling methods are typically derived from established knowledge and known phenomena, a process that is time-consuming and labor-intensive. Recently, discovering governing PDEs from collected data via physics-informed neural networks (PINNs) has provided a more efficient way to analyze new dynamic systems and establish PDE models. This study proposes Sequentially Threshold Least Squares-Lasso (STLasso), a module constructed by incorporating Lasso regression into the Sequentially Threshold Least Squares (STLS) algorithm, which performs sparse regression of PDE coefficients under an l0-norm constraint. It further introduces PINN-STLasso, a physics-informed neural network combined with Lasso sparse regression, which can find underlying PDEs from data with reduced data requirements and better interpretability. In addition, experiments on canonical inverse PDE problems compare the results with several recent methods. The results demonstrate that the proposed PINN-STLasso outperforms the other methods, achieving lower error rates even with less data.
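The abstract gives no algorithmic detail for STLasso, so the following is only a minimal sketch of the general idea it names: alternating a Lasso fit with hard thresholding of small coefficients until the set of retained terms stops changing. The function names, the coordinate-descent solver, the values of `lam` and `threshold`, and the synthetic data are all assumptions made for illustration, not the authors' implementation.

```python
import random


def soft_threshold(value, lam):
    """Shrink `value` toward zero by `lam` (the Lasso proximal step)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(X, y, lam, active, n_iter=200):
    """Coordinate-descent Lasso restricted to the columns listed in `active`.

    Minimises (1 / 2n) * ||y - Xw||^2 + lam * ||w||_1; inactive weights stay 0.
    """
    n, p = len(y), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in active:
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that ignores w[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def thresholded_lasso(X, y, lam=0.01, threshold=0.1, max_rounds=10):
    """Alternate Lasso fits with hard thresholding until the support is stable."""
    active = list(range(len(X[0])))
    w = [0.0] * len(X[0])
    for _ in range(max_rounds):
        w = lasso_cd(X, y, lam, active)
        kept = [j for j in active if abs(w[j]) >= threshold]
        if kept == active:
            break
        active = kept
    return [wj if abs(wj) >= threshold else 0.0 for wj in w]


# Hypothetical demo: three candidate terms, only columns 0 and 2 are truly active.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
y = [2.0 * row[0] - 0.5 * row[2] for row in X]
coefficients = thresholded_lasso(X, y)
```

In a PDE-discovery setting the columns of `X` would hold candidate terms (for example a field and its spatial derivatives) and `y` the time derivative; here both are synthetic stand-ins.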
Damage to electrical equipment in an earthquake can lead to power outages. Seismic fragility analysis is a common method for assessing the seismic reliability of electrical equipment. To further guarantee the efficiency of the analysis, multi-source uncertainties, including those of the structure itself and of the seismic excitation, need to be considered. A method for seismic fragility analysis that reflects structural and seismic parameter uncertainty was developed in this study. The proposed method used random sampling based on Latin hypercube sampling (LHS) to account for the structural parameter uncertainty and the group structure characteristics of electrical equipment. Then, logistic Lasso regression (LLR) was used to find the seismic fragility surface based on two ground motion intensity measures (IMs). The seismic fragility of a ±1000 kV main transformer (UHVMT) was analyzed with the proposed method using a finite element model. The results show that the seismic fragility function obtained by this method can be used to construct the relationship between the uncertainty parameters and the failure probability. The seismic fragility surface not only provided the probabilities of seismic damage states under different IMs but also had better stability than the fragility curve. Furthermore, the sensitivity analysis of the structural parameters revealed that the elastic modulus of the bushing and the height of the high-voltage bushing may have a greater influence.
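As a rough illustration of the Latin hypercube sampling step mentioned above, the sketch below draws one point per equal-width stratum in every dimension and then rescales the unit samples to parameter ranges. The function names and the two parameter ranges are hypothetical; the abstract does not state which distributions or bounds were used.

```python
import random


def latin_hypercube(n_samples, n_dims, rng=None):
    """Return n_samples points in [0, 1)^n_dims, one per stratum in each dimension."""
    if rng is None:
        rng = random.Random()
    columns = []
    for _ in range(n_dims):
        # One uniformly placed point inside each of the n_samples strata.
        points = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(points)  # decouple the strata across dimensions
        columns.append(points)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def scale_to_bounds(sample, bounds):
    """Map a unit-cube sample onto (low, high) ranges, one range per dimension."""
    return tuple(lo + u * (hi - lo) for u, (lo, hi) in zip(sample, bounds))


# Hypothetical ranges for two structural parameters (values are made up).
bounds = [(50.0, 90.0), (3.0, 6.0)]
unit_samples = latin_hypercube(10, 2, random.Random(42))
scaled_samples = [scale_to_bounds(s, bounds) for s in unit_samples]
```

Each scaled sample would then define one structural model variant to analyze; uniform ranges are used here only because they are the simplest choice.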
To study the dynamic behavior of a process, time-resolved data are collected at different time instants during each of a series of experiments, which are usually designed with the design of experiments or the design of dynamic experiments methodologies. To use such time-resolved data to model the dynamic behavior, dynamic response surface methodology (DRSM), a data-driven modeling method, has been proposed. Two approaches can be adopted to estimate the model parameters: stepwise regression, used in several previous publications, and Lasso regression, which is newly incorporated in this paper for the estimation of DRSM models. Here, we show that both approaches yield similarly accurate models, while the computational time of Lasso is on average two orders of magnitude smaller. Two case studies are performed to show the advantages of the proposed method. In the first case study, where the concentrations of different species are modeled directly, the DRSM method provides more accurate models than those in the literature. The second case study, where the reaction extents are modeled instead of the species concentrations, illustrates the versatility of the DRSM methodology. Therefore, DRSM with Lasso regression can provide faster and more accurate data-driven models for a variety of organic synthesis datasets.
Background: Liver transplantations (LTs) with extended criteria have produced surgical results comparable to those obtained with traditional standards. However, morphological criteria alone are not sufficient to predict hepatocellular carcinoma (HCC) recurrence after LT. The present study aimed to construct a nomogram for predicting HCC recurrence after LT using extended selection criteria. Methods: Retrospective data on patients with HCC, including pathology, serological markers and follow-up data, were collected from January 2015 to April 2020 at Huashan Hospital, Fudan University, Shanghai, China. Logistic least absolute shrinkage and selection operator (LASSO) regression and multivariate Cox regression analyses were performed to identify predictors and construct the prognostic nomogram. Receiver operating characteristic (ROC) curves, Kaplan-Meier curves, decision curve analyses (DCAs), calibration diagrams, net reclassification indices (NRIs) and integrated discrimination improvement (IDI) values were used to assess the prognostic capacity of the nomogram. Results: A total of 301 patients with HCC who underwent LT were enrolled in the study. The nomogram was constructed, and the ROC curve showed good performance in predicting survival in both the development set (2/3) and the validation set (1/3), with areas under the curve of 0.748 and 0.716, respectively. According to the median value of the risk score, the patients were categorized into high- and low-risk groups, which had significantly different recurrence-free survival (RFS) rates (P<0.01). Compared with the Milan criteria and the University of California San Francisco (UCSF) criteria, DCA revealed that the new nomogram model had the best net benefit in predicting 1-, 3- and 5-year RFS. The nomogram performed well for calibration, NRI and IDI improvement. Conclusions: The nomogram, based on the Milan criteria and serological markers, showed good accuracy in predicting the recurrence of HCC after LT using extended selection criteria.
Fiscal revenue is influenced by many factors, and traditional forecasting methods cannot handle the feature dimensions well, which leads to serious over-fitting of the forecast results and an inability to estimate the true future trend well. The grey neural network model fused with Lasso regression is a comprehensive prediction model that combines the grey prediction model and the BP neural network model after dimensionality reduction using Lasso. It can reduce the dimensionality of the original data, make separate predictions for each explanatory variable, and then use neural networks to make multivariate predictions, thereby compensating for the insufficient prediction accuracy of traditional methods. In this paper, we took the financial revenue data of China's Hunan Province from 2005 to 2019 as the object of analysis. First, we used Lasso regression to reduce the dimensionality of the data. Because the grey prediction model performs well on small data volumes, we then used it to obtain the predicted values of all explanatory variables for 2020 and 2021 from the 2005–2019 data. Finally, considering that fiscal revenue is affected by many factors, we applied the BP neural network, which handles multiple inputs well, to make the final forecast of fiscal revenue. The experimental results show that the combined model performs well in financial revenue forecasting.
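The abstract does not say which grey model was used for the per-variable forecasts. Assuming the standard GM(1,1) form, a minimal sketch of that step could look as follows; the function name and the demo series are hypothetical, and the series is synthetic rather than taken from the Hunan data.

```python
import math


def gm11_forecast(series, steps=1):
    """Forecast `steps` future values of a positive series with a GM(1,1) grey model."""
    n = len(series)
    if n < 4:
        raise ValueError("GM(1,1) is usually fitted on at least four observations")
    # First-order accumulated series x1 and its adjacent-mean sequence z.
    x1, total = [], 0.0
    for value in series:
        total += value
        x1.append(total)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = series[1:]
    # Ordinary least squares for the grey equation y = -a * z + b.
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    a = -slope
    b = (sy - slope * sz) / m
    if a == 0.0:
        raise ValueError("development coefficient is zero; series shows no trend")

    def x1_hat(k):
        """Fitted accumulated value at 0-based index k."""
        return (series[0] - b / a) * math.exp(-a * k) + b / a

    # Difference the accumulated forecasts to recover the original scale.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


# Hypothetical demo: six observations growing 10% per period, forecast two more.
history = [2.0 * 1.1 ** k for k in range(6)]
forecast = gm11_forecast(history, steps=2)
```

In the workflow described above, one such forecast would be produced for each explanatory variable retained by Lasso, and the forecasts would then feed the neural network.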
The theory of tune feedback correction and the principle of a feedback algorithm based on machine learning are introduced, with a focus on the application of lasso regression for tune feedback correction. Simulation verification and online feedback correction results are presented. The results show that, after applying machine learning, the feedback accuracy of the tune feedback system was higher, and the betatron tune stability was further improved.
To protect and promote the originality and authenticity of mountain foodstuffs, the European Union set Regulation No 1151/2012 to create the optional quality term "mountain product". Our research aimed at exploring the attractiveness of the mountain product label for consumers, considering both attitude towards the label itself and purchase intentions. We propose a model to investigate relationships between four latent constructs (mountain attractiveness, mountain food attractiveness, attitude towards the mountain product label, and purchase intention). The relationships have been tested, confirming their statistical relevance. All 47 items selected for describing the latent constructs are suitable for this purpose. Ridge and LASSO results also show that 17 items of the first three constructs are relevant in explaining purchase intentions. Some contextual variables, such as age, income, geographical origin of consumers, and knowledge of mountain products and mountains for tourism purposes, can positively influence consumers' behavior. These findings could support the design of mountain development strategies, in particular marketing actions for both the product and the territory.
Background: Colorectal cancer (CRC) is a leading cause of cancer mortality globally. This study aims to develop a prognostic model based on disulfidptosis-related genes to assess survival outcomes in CRC, highlighting the tumor microenvironment's role. Methods: The concept of traditional Chinese medicine syndrome differentiation and treatment runs through the whole study. We analyzed CRC tissue data from The Cancer Genome Atlas and the Gene Expression Omnibus using single-sample gene set enrichment and weighted gene correlation network analyses to identify prognostic markers and evaluate immune infiltration. We also investigated predictive drug sensitivities. Results: We identified seven disulfidptosis-related markers that significantly influence prognosis: complement C1q A chain (C1QA), solute carrier family 11 member 1 (SLC11A1), cluster of differentiation 36 (CD36), cluster of differentiation 6 (CD6), interleukin 1 receptor associated kinase 3 (IRAK3), S100 calcium binding protein A8 (S100A8), and CD8 subunit alpha (CD8A). Patients classified in the low-risk group demonstrated improved overall survival compared to those in the high-risk group across training (P=0.0026) and validation cohorts (P=0.032). Differential gene expression was significant in the high-risk group (P<0.001), and prevalent mutations included APC regulator of WNT signaling pathway (APC), tumor protein P53 (TP53), Titin (TTN), and Kirsten rat sarcoma viral oncogene (KRAS). The risk score correlated linearly with tumor microenvironment attributes. The results of drug analysis showed that some traditional drugs may have anticancer effects through the vertical action of disulfidptosis. Conclusion: Our prognostic model, integrating seven disulfidptosis-related genes, categorizes CRC patients by survival probability and underscores these genes as potential biomarkers linked to the tumor microenvironment. These findings support their use in refining therapeutic strategies for CRC.
Soybean frogeye leaf spot (FLS) is a global disease affecting soybean yield, especially in the soybean growing area of Heilongjiang Province. To realize genomic selection breeding for FLS resistance in soybean, least absolute shrinkage and selection operator (LASSO) regression and stepwise regression were combined, and a genomic selection model was established for 40 002 SNP markers covering the soybean genome and the relative lesion area of soybean FLS. As a result, 68 molecular markers controlling soybean FLS were detected accurately, and the phenotypic contribution rate of these markers reached 82.45%. The model established in this study could be used directly to evaluate soybean resistance to FLS and to select superior offspring. This research method could also provide ideas and methods for disease-resistance breeding in other plants.
Background: Depression is an emotional disorder caused by a variety of factors. With the accelerating pace of life, people face increasing competitive pressure in life and work, and the incidence of depression is rising year by year, so in-depth study of the pathogenesis of depression and the development of depression risk prediction models are becoming increasingly important. Method: The data were derived from the 2017–2018 follow-up data of the National Health and Nutrition Examination Survey database, a publicly available database that uses a multi-stage, stratified, clustered, probability sampling design to determine a nationally representative sample of non-institutionalized US civilians. Participants completed home interviews, laboratory measurements, and a physical examination. Details of the survey design have been published previously. This study evaluated risk factors for the occurrence of depression across multiple variables such as age, sex, and combined complications. Four machine learning algorithms (logistic regression, Lasso regression, support vector machine, random forest) were used to establish predictive classification models, which were compared by area under the receiver operating characteristic curve and accuracy. The dataset was validated using 10-fold cross-validation. Result: After excluding invalid samples, 815 samples were included, of which 570 cases were assigned to the validation set and 245 cases to the training set. The area under the curve (AUC) of the nomogram for depression risk based on logistic regression was 0.73. Among the three machine learning models, the Lasso regression-based model had an AUC of 0.548, the support vector machine had a mean AUC of 0.695, and the random forest had an AUC of 0.613. The support vector machine-based model performed best among the machine learning models. Conclusion: Random forest-based prediction models are able to assist clinicians by providing decision support when it is difficult to give an exact diagnosis. The model has good clinical utility and helps clinicians identify high-risk patients and provide individualized treatment. The four established models (logistic regression, Lasso regression, support vector machine, and random forest) all have good predictive power.
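For readers unfamiliar with the 10-fold cross-validation mentioned in the Method section, the following minimal sketch shows how the sample indices could be partitioned. The function name, the seed, and the use of a simple shuffled (non-stratified) split are assumptions; the abstract does not describe how the folds were formed.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Demo with the sample count reported in the abstract (815).
splits = list(k_fold_indices(815, k=10))
```

Each model would be fitted on `train` and scored on `test` in turn, and the ten scores averaged; this may be what the "mean AUC" reported for the support vector machine refers to, although the abstract does not say so explicitly.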
This study investigates the factors determining COVID-19 deaths across countries during the pandemic, using a rich dataset sourced from 94 countries and updated to 6 February 2022. For the empirical analysis, the study uses a cross-sectional linear regression technique in the first part and, after the required diagnostic tests, a 2SLS regression technique to correct possible endogeneity bias in the second part. The findings indicate that factors such as total reported cases, population size, population over 70 years of age, extreme poverty, and the human development index play a significant role in determining COVID-19-related deaths. Further, to check the robustness of the findings, the study employed LASSO regression. The findings highlight the possibility of government intervention to devise appropriate policies to control COVID-related incidence and death.
We propose a two-step variable selection procedure for censored quantile regression with high-dimensional predictors. To account for censored data in the high-dimensional case, we employ effective dimension reduction and the idea of informative subsets. Under some regularity conditions, we show that our procedure enjoys model selection consistency. A simulation study and a real data analysis are conducted to evaluate the finite-sample performance of the proposed approach.
This paper provides the first empirical study on bond defaults in the Chinese market. It overcomes the deficiencies of existing methods, which suffer from a lack of actual default data for back testing. With newly available bond default data, we analyze the roles of market variables against accounting variables under various models. While we find that Merton's market-based structural model and KMV's Distance to Default exhibit weak discriminating power compared with hazard models that have carefully constructed predictors, other market variables carry significant information about bond defaults and could help improve on models with only the accounting variables. This implies that the collective intelligence of the market could mitigate the distortion caused by misreported accounting information. Further, model performance can be significantly improved by adding predictor variables that link an individual financial measure to the broader market performance, such as the relative margin, a business environment proxy introduced in this study. We not only shed light on the default behavior of the Chinese bond market, but also provide a promising approach to improve the variable selection process.
The application of four machine learning (ML) algorithms, namely Support Vector Regressor (SVR), Extreme Gradient Boosting (XGBoost), least absolute shrinkage and selection operator (Lasso), and Ridge, to predicting the corrosion inhibition efficiency (IE) of Treculia africana (TA) leaf extract on AA7075-T7351 alloy in a corrosive 1.0 M HCl environment, with a small sample space (42), has been studied. The time and resource constraints of traditional corrosion study methods were avoided through feature engineering to expedite the prediction process. The dominant features affecting the IE were identified through feature importance and selection processes, using a pair plot matrix of features, Kendall correlation, and similar tools, to remove redundant features. The results, in the form of data visualization, feature importance, and the performance of each algorithm on the test set, were explicitly depicted. The evaluation metrics, including the coefficient of determination (R²) and root mean square error (RMSE), validated the efficacy of the models in predicting the IE of TA on AA7075-T7351 in 1.0 M HCl environments. The Ridge model demonstrated superior accuracy, with an R² score of 0.972, particularly in handling the highly correlated dataset used in this study. SVR followed closely in performance (0.969). XGBoost proved reliable, with an R² score of 0.953. Lasso, with an R² of 0.952, ranked lowest of the four models, due to its random feature selection method. The RMSE scores corroborated the prediction accuracies, with values of 4.145, 4.408, 5.138, and 5.462, respectively. This study demonstrated the viability of the four machine learning algorithms and their potential to generalize in IE prediction, while offering an efficient and accurate alternative to traditional methods.
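The two evaluation metrics named above follow standard definitions, sketched here for reference. The function names and the four inhibition-efficiency values in the demo are made up for illustration and are unrelated to the study's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical inhibition efficiencies (%) and model predictions.
observed = [60.0, 70.0, 80.0, 90.0]
predicted = [62.0, 68.0, 81.0, 89.0]
demo_r2 = r2_score(observed, predicted)   # 1 - 10 / 500 = 0.98
demo_rmse = rmse(observed, predicted)     # sqrt(10 / 4), about 1.58
```

A higher R² and a lower RMSE both indicate a closer fit, which is why the two rankings reported in the abstract agree.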
Background: Novel coronavirus disease 2019 (COVID-19) is an ongoing global pandemic with high mortality. Although several studies have reported different risk factors for mortality in patients based on traditional analytics, few studies have used artificial intelligence (AI) algorithms. This study investigated prognostic factors for COVID-19 patients using AI methods. Methods: COVID-19 patients who were admitted to Wuhan Infectious Diseases Hospital from December 29, 2019 to March 2, 2020 were included. The whole cohort was randomly divided into training and testing sets at a 6:4 ratio. Demographic and clinical data were analyzed to identify predictors of mortality using least absolute shrinkage and selection operator (LASSO) regression and LASSO-based artificial neural network (ANN) models. The predictive performance of the models was evaluated using receiver operating characteristic (ROC) curve analysis. Results: A total of 1145 patients (610 male, 53.3%) were included in the study. Of the 1145 patients, 704 were assigned to the training set and 441 were assigned to the testing set. The median age of the patients was 57 years (range: 47-66 years). Severity of illness, age, platelet count, leukocyte count, prealbumin, C-reactive protein (CRP), total bilirubin, Acute Physiology and Chronic Health Evaluation (APACHE) II score, and Sequential Organ Failure Assessment (SOFA) score were identified as independent prognostic factors for mortality. Incorporating these nine factors into the LASSO regression model yielded a correct classification rate of 0.98, with area under the ROC curve (AUC) values of 0.980 and 0.990 in the training and testing cohorts, respectively. Incorporating the same factors into the LASSO-based ANN model yielded a correct classification rate of 0.990, with an AUC of 0.980 in both the training and testing cohorts. Conclusions: Both the LASSO regression and LASSO-based ANN model accurately predicted the clinical outcome of patients with COVID-19. Severity of illness, age, platelet count, leukocyte count, prealbumin, CRP, total bilirubin, APACHE II score, and SOFA score were identified as prognostic factors for mortality in patients with COVID-19.
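Several of the abstracts above report the area under the ROC curve (AUC). As a reference for how that number can be computed, the sketch below uses the rank-based (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name and the four-sample demo are illustrative assumptions, not data from any of the studies.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = event) and predicted risks.
demo_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ordered correctly
```

The pairwise loop is quadratic in the sample size, which is acceptable for cohorts of the sizes reported here; a sort-based version would be preferable for much larger datasets.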
Funding (seismic fragility study): National Key R&D Program of China under Grant Nos. 2018YFC1504504 and 2018YFC0809404.
Funding (DRSM study): Yachao Dong is grateful for the financial support of the Fundamental Research Funds for the Central Universities (Grant No. DUT20RC(3)070).
Funding (liver transplantation study): Supported by grants from the National Key Research and Development Program of China (2023YFC2505900) and the National Natural Science Foundation of China (82241225, 81873874 and 82071797).
Funding (fiscal revenue study): This research was funded by the National Natural Science Foundation of China (No. 61304208); the Scientific Research Fund of Hunan Province Education Department (18C0003); the Research Project on Teaching Reform in Colleges and Universities of Hunan Province Education Department (20190147); the Changsha City Science and Technology Plan Program (K1501013-11); and Hunan Normal University University-Industry Cooperation. This work was carried out at the 2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property, Universities of Hunan Province, Open Project, grant number 20181901CRP04.
Funding: Supported by the National Natural Science Foundation of China (No. 11975227) and the Hefei Science Center, CAS (No. 2019HSC-KPRD003).
Abstract: The theory of tune feedback correction and the principle of a feedback algorithm based on machine learning are introduced, with a focus on the application of lasso regression to tune feedback correction. Simulation verification and online feedback correction results are presented. The results show that, after applying machine learning, the accuracy of the tune feedback system was higher and the betatron tune stability was further improved.
Funding: Financially supported by the Department of Agricultural, Food, Environmental and Animal Sciences, University of Udine, Italy.
Abstract: To protect and promote the originality and authenticity of mountain foodstuffs, the European Union set Regulation No 1151/2012 to create the optional quality term "mountain product". Our research aimed at exploring the attractiveness of the mountain product label for consumers, considering both attitude towards the label itself and purchase intentions. We propose a model to investigate relationships between four latent constructs (mountain attractiveness, mountain food attractiveness, attitude towards the mountain product label, and purchase intention). The relationships were tested and their statistical relevance was confirmed. All 47 items selected for describing the latent constructs are suitable for this purpose. Ridge and LASSO results also show that 17 items of the first three constructs are relevant in explaining purchase intentions. Some contextual variables, such as age, income, geographical origin of consumers, and knowledge of mountain products and mountains for tourism purposes, can positively influence consumers’ behavior. These findings could support the design of mountain development strategies, in particular marketing actions for both the product and the territory.
Abstract: Background: Colorectal cancer (CRC) is a leading cause of cancer mortality globally. This study aims to develop a prognostic model based on disulfidptosis-related genes to assess survival outcomes in CRC, highlighting the role of the tumor microenvironment. Methods: The principles of traditional Chinese medicine syndrome differentiation and treatment run through the whole study. We analyzed CRC tissue data from The Cancer Genome Atlas and the Gene Expression Omnibus using single-sample gene set enrichment and weighted gene correlation network analyses to identify prognostic markers and evaluate immune infiltration. We also investigated predictive drug sensitivities. Results: We identified seven disulfidptosis-related markers that significantly influence prognosis: complement C1q A chain (C1QA), solute carrier family 11 member 1 (SLC11A1), cluster of differentiation 36 (CD36), cluster of differentiation 6 (CD6), interleukin 1 receptor associated kinase 3 (IRAK3), S100 calcium binding protein A8 (S100A8), and CD8 subunit alpha (CD8A). Patients classified in the low-risk group demonstrated improved overall survival compared to those in the high-risk group across training (P = 0.0026) and validation cohorts (P = 0.032). Differential gene expression was significant in the high-risk group (P < 0.001), and prevalent mutations included APC regulator of WNT signaling pathway (APC), tumor protein P53 (TP53), Titin (TTN), and Kirsten rat sarcoma viral oncogene (KRAS). The risk score correlated linearly with tumor microenvironment attributes. The drug analysis showed that some traditional drugs may have anticancer effects through the vertical action of disulfidptosis. Conclusion: Our prognostic model, integrating seven disulfidptosis-related genes, categorizes CRC patients by survival probability and underscores these genes as potential biomarkers linked to the tumor microenvironment. These findings support their use in refining therapeutic strategies for CRC.
Funding: Supported by the National Key Research and Development Program of China (2021YFD1201103-01-05).
Abstract: Soybean frogeye leaf spot (FLS) is a global disease affecting soybean yield, especially in the soybean growing area of Heilongjiang Province. To enable genomic selection breeding for FLS resistance in soybean, least absolute shrinkage and selection operator (LASSO) regression and stepwise regression were combined, and a genomic selection model was established relating 40 002 SNP markers covering the soybean genome to the relative lesion area of soybean FLS. As a result, 68 molecular markers controlling soybean FLS were detected accurately, and the phenotypic contribution rate of these markers reached 82.45%. The model established in this study can be used directly to evaluate soybean resistance to FLS and to select superior offspring. The research method could also provide ideas and methods for disease-resistance breeding in other plants.
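The marker-selection idea behind LASSO can be sketched with cyclic coordinate descent and soft thresholding. This is a generic illustration, not the authors' pipeline: the function names, the penalty value `alpha`, and the tiny design matrix are all hypothetical, and no intercept or stepwise stage is included.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`, returning 0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 without an intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw because w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Hypothetical coded genotypes for three markers in four lines (not the soybean data).
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
# Phenotype driven strongly by marker 1 and only weakly by marker 2.
y = [2.05, -1.95, 1.95, -2.05]
w = lasso_coordinate_descent(X, y, alpha=0.1)
print(w)
```

With this penalty the weak marker falls inside the soft-threshold dead zone and its coefficient is set exactly to zero, which is the property that lets LASSO reduce a large marker panel to a short list.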
Abstract: Background: Depression is an emotional disorder caused by a variety of factors. As the pace of life accelerates and competitive pressure in life and work increases, the incidence of depression is rising year by year, so in-depth study of the pathogenesis of depression and the development of depression risk prediction models are becoming increasingly important. Method: The study data are derived from the 2017–2018 follow-up data of the National Health and Nutrition Examination Survey database, a publicly available database that uses a multi-stage, stratified, clustered, probability sampling design to obtain a nationally representative sample of non-institutionalized US civilians. Participants completed home interviews, laboratory measurements, and a physical examination. Details of the survey design have been published previously. This study evaluated risk factors for the occurrence of depression across multiple variables such as age, sex, and combined complications. Four machine learning algorithms (logistic regression, Lasso regression, support vector machine, random forest) were used to establish predictive classification models, and the area under the receiver operating characteristic curve and the accuracy were compared. The dataset was validated using 10-fold cross-validation. Result: After excluding invalid samples, 815 samples were included, of which 570 cases were assigned to the validation set and 245 cases to the training set. The area under the curve (AUC) of the nomogram for risk of depression based on logistic regression was 0.73. Among the three machine learning models, the AUC of the Lasso regression-based model was 0.548, the mean AUC of the support vector machine was 0.695, and the AUC of the random forest was 0.613. The support vector machine-based model showed the best predictive performance compared to the other machine learning models. Conclusion: Random forest-based prediction models are able to assist clinicians by providing decision support when it is difficult to give an exact diagnosis. The model has good clinical utility and helps clinicians identify high-risk patients and perform individualized treatment. The four established models (logistic regression, Lasso regression, support vector machine, and random forest) all have good predictive power.
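The two evaluation tools named in this abstract, AUC and 10-fold cross-validation, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names and the four-sample scores are hypothetical, the folds are contiguous and unshuffled, and only the sample count of 815 is taken from the abstract.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical predicted risks for four participants (1 = depressed).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)

# Ten folds over the 815 included samples reported in the abstract.
folds = list(k_fold_indices(815, 10))
print(auc, [len(test) for _, test in folds])
```

A model comparison like the one reported would compute `roc_auc` on each held-out fold for each classifier and average the ten values per model.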
Funding: This research is supported by a seed money grant provided by Maulana Azad National Institute of Technology, Bhopal, under Grant No. DeanR&C/1420.
Abstract: This study investigates the factors determining COVID-19 deaths during the pandemic across countries, using a rich dataset from 94 countries updated to 6 February 2022. For the empirical analysis, the study uses a cross-sectional linear regression technique in the first part and, after the required diagnostic tests, a 2SLS regression technique to correct possible endogeneity bias in the second part. The findings indicate that factors such as total reported cases, population size, population over 70 years of age, extreme poverty, and the human development index play a significant role in determining COVID-19-related deaths. Further, to check the robustness of the findings, the study employed LASSO regression. The findings highlight the scope for government intervention to devise appropriate policies to control COVID-related incidence and death.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11401383, 11301391 and 11271080).
Abstract: We propose a two-step variable selection procedure for censored quantile regression with high-dimensional predictors. To account for censored data in the high-dimensional case, we employ effective dimension reduction and the idea of informative subsets. Under some regularity conditions, we show that our procedure enjoys model selection consistency. A simulation study and real data analysis are conducted to evaluate the finite-sample performance of the proposed approach.
Abstract: This paper provides the first empirical study on bond defaults in the Chinese market. It overcomes the deficiencies of existing methods, which suffer from a lack of actual default data for back testing. With newly available bond default data, we analyze the roles of market variables against accounting variables under various models. While we find that Merton's market-based structural model and KMV's Distance to Default exhibit weak discriminating power compared with hazard models that have carefully constructed predictors, other market variables carry significant information about bond defaults and could help improve on models with only the accounting variables. This implies that the collective intelligence of the market could somehow mitigate the distortion caused by misreported accounting information. Further, model performance can be significantly improved by adding predicting variables that link an individual financial measure to the broader market performance, such as the relative margin, a business environment proxy introduced in this study. We not only shed light on the default behavior of the Chinese bond market, but also provide a promising approach to improve the variable selection process.
Abstract: The application of four machine learning (ML) algorithms, namely Support Vector Regressor (SVR), Extreme Gradient Boosting (XGBoost), Least absolute shrinkage and selection operator (Lasso), and Ridge, to predicting the corrosion inhibition efficiency (IE) of Treculia africana (TA) leaf extract on AA7075-T7351 alloy in a corrosive 1.0 M HCl environment, with a small (42) sample space, has been studied. The time and resource constraints of traditional corrosion study methods were avoided by using feature engineering to expedite the prediction process. The dominant features affecting the IE were identified through feature importance and selection processes, using a pair plot matrix of features, Kendall correlation, and similar tools to remove redundant features. The results, in the form of data visualization, feature importance, and the performance of each algorithm on the test set, were explicitly depicted. The evaluation metrics, including the coefficient of determination (R²) and root mean square error (RMSE), validated the efficacy of the models in predicting the IE of TA on AA7075-T7351 in 1.0 M HCl environments. The Ridge model demonstrated superior accuracy, with an R² score of 0.972, particularly in handling the highly correlated dataset used in this study. SVR followed closely in performance (0.969). XGBoost proved reliable with an R² score of 0.953. Lasso, with an R² of 0.952, ranked lowest of the four models, which is attributed to its random feature selection method. The RMSE scores corroborated the prediction accuracies, with values of 4.145, 4.408, 5.138, and 5.462, respectively. This study revealed the viability and potential generalization ability of the four machine learning algorithms for IE prediction, while offering an efficient and accurate alternative to traditional methods.
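The two metrics used to rank the models in this abstract have simple closed forms, sketched below. The function names and the four inhibition-efficiency values are hypothetical and serve only to show the arithmetic; they are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical inhibition efficiencies in percent (observed vs. model prediction).
ie_observed = [80.0, 85.0, 90.0, 95.0]
ie_predicted = [82.0, 84.0, 91.0, 93.0]
r2 = r2_score(ie_observed, ie_predicted)
error = rmse(ie_observed, ie_predicted)
print(r2, error)
```

RMSE is expressed in the units of the target (here percentage points of IE), whereas R² is unitless, which is why the abstract reports both.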
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 81873944 and 81971869) and the Shanghai Science and Technology Commission (Grant No. 20DZ2200500).
Abstract: Background: Novel coronavirus disease 2019 (COVID-19) is an ongoing global pandemic with high mortality. Although several studies have reported different risk factors for mortality in patients based on traditional analytics, few studies have used artificial intelligence (AI) algorithms. This study investigated prognostic factors for COVID-19 patients using AI methods. Methods: COVID-19 patients who were admitted to Wuhan Infectious Diseases Hospital from December 29, 2019 to March 2, 2020 were included. The whole cohort was randomly divided into training and testing sets at a 6:4 ratio. Demographic and clinical data were analyzed to identify predictors of mortality using least absolute shrinkage and selection operator (LASSO) regression and LASSO-based artificial neural network (ANN) models. The predictive performance of the models was evaluated using receiver operating characteristic (ROC) curve analysis. Results: A total of 1145 patients (610 male, 53.3%) were included in the study. Of the 1145 patients, 704 were assigned to the training set and 441 were assigned to the testing set. The median age of the patients was 57 years (range: 47-66 years). Severity of illness, age, platelet count, leukocyte count, prealbumin, C-reactive protein (CRP), total bilirubin, Acute Physiology and Chronic Health Evaluation (APACHE) II score, and Sequential Organ Failure Assessment (SOFA) score were identified as independent prognostic factors for mortality. Incorporating these nine factors into the LASSO regression model yielded a correct classification rate of 0.98, with area under the ROC curve (AUC) values of 0.980 and 0.990 in the training and testing cohorts, respectively. Incorporating the same factors into the LASSO-based ANN model yielded a correct classification rate of 0.990, with an AUC of 0.980 in both the training and testing cohorts. Conclusions: Both the LASSO regression and LASSO-based ANN model accurately predicted the clinical outcome of patients with COVID-19. Severity of illness, age, platelet count, leukocyte count, prealbumin, CRP, total bilirubin, APACHE II score, and SOFA score were identified as prognostic factors for mortality in patients with COVID-19.