The distillation process is an important chemical process, and data-driven modelling has the potential to reduce model complexity compared with mechanistic modelling, thus improving the efficiency of process optimization and monitoring studies. However, the distillation process is highly nonlinear and exhibits multiple uncertain perturbation intervals, which makes accurate data-driven modelling challenging. This paper proposes a systematic data-driven modelling framework to address these problems. First, data-segment variance is introduced into the K-means algorithm to form K-means data interval (KMDI) clustering, which separates the data into perturbed and steady-state intervals for steady-state data extraction. Second, the maximal information coefficient (MIC) is employed to quantify the nonlinear correlation between variables and remove redundant features. Finally, extreme gradient boosting (XGBoost) is integrated as the base learner into adaptive boosting (AdaBoost), with an error threshold (ET) added to improve the weight-update strategy, yielding a new ensemble learning algorithm, XGBoost-AdaBoost-ET. The superiority of the proposed framework is verified by applying it to a real industrial propylene distillation process.
Funding: supported by the National Key Research and Development Program of China (2023YFB3307801); the National Natural Science Foundation of China (62394343, 62373155, 62073142); the Major Science and Technology Project of Xinjiang (No. 2022A01006-4); the Programme of Introducing Talents of Discipline to Universities (the 111 Project) under Grant B17017; the Fundamental Research Funds for the Central Universities, Science Foundation of China University of Petroleum, Beijing (No. 2462024YJRC011); and the Open Research Project of the State Key Laboratory of Industrial Control Technology, China (Grant No. ICT2024B70).
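As a concrete illustration of the boosting step, the following Python sketch wires an XGBoost regressor into an AdaBoost.R2-style loop with an error threshold applied before the weight update. The abstract does not specify the exact ET rule, so treating relative errors below `et` as zero is an assumption, as are all hyperparameter values.

```python
import numpy as np
from xgboost import XGBRegressor

def xgboost_adaboost_et(X, y, n_rounds=10, et=0.05):
    n = len(y)
    w = np.full(n, 1.0 / n)                 # uniform initial sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        model = XGBRegressor(n_estimators=50, max_depth=3)
        model.fit(X, y, sample_weight=w)
        err = np.abs(model.predict(X) - y)
        rel = err / (err.max() + 1e-12)     # relative error in [0, 1]
        rel[rel < et] = 0.0                 # ET: errors below threshold don't raise weights
        eps = max(np.sum(w * rel), 1e-12)   # weighted ensemble error
        if eps >= 0.5:                      # learner too weak; stop boosting
            break
        beta = eps / (1.0 - eps)
        w *= beta ** (1.0 - rel)            # shrink weights of well-fitted samples
        w /= w.sum()
        learners.append(model)
        alphas.append(np.log(1.0 / beta))
    return learners, np.array(alphas)

def ensemble_predict(learners, alphas, X):
    preds = np.array([m.predict(X) for m in learners])
    return np.average(preds, axis=0, weights=alphas)  # weighted mean (AdaBoost.R2 uses a weighted median)
```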
BACKGROUND: Difficulty of colonoscopy insertion (DCI) significantly affects colonoscopy effectiveness and serves as a key quality indicator. Predicting and evaluating DCI risk preoperatively is crucial for optimizing intraoperative strategies.
AIM: To evaluate the predictive performance of machine learning (ML) algorithms for DCI by comparing three modeling approaches, to identify factors influencing DCI, and to develop a preoperative prediction model that enhances colonoscopy quality and efficiency.
METHODS: This cross-sectional study enrolled 712 patients who underwent colonoscopy at a tertiary hospital between June 2020 and May 2021. Demographic data, past medical history, medication use, and psychological status were collected, and the endoscopist assessed DCI using a visual analogue scale. After univariate screening, predictive models were developed using multivariable logistic regression, least absolute shrinkage and selection operator (LASSO) regression, and random forest (RF) algorithms. Model performance was evaluated by discrimination, calibration, and decision curve analysis (DCA), and the results were visualized with nomograms.
RESULTS: A total of 712 patients (53.8% male; mean age 54.5 ± 12.9 years) were included. Logistic regression identified constipation [odds ratio (OR) = 2.254, 95% confidence interval (CI): 1.289-3.931], abdominal circumference (AC) (77.5-91.9 cm: OR = 1.895, 95% CI: 1.065-3.350; AC ≥ 92 cm: OR = 1.271, 95% CI: 0.730-2.188), and anxiety (OR = 1.071, 95% CI: 1.044-1.100) as predictive factors for DCI, validated by the LASSO and RF methods. The three models achieved training/validation sensitivities of 0.826/0.925, 0.924/0.868, and 1.000/0.981; specificities of 0.602/0.511, 0.510/0.562, and 0.977/0.526; and areas under the receiver operating characteristic curve (AUCs) of 0.780 (0.737-0.823)/0.726 (0.654-0.799), 0.754 (0.710-0.798)/0.723 (0.656-0.791), and 1.000 (1.000-1.000)/0.754 (0.688-0.820), respectively. DCA indicated optimal net benefit within probability thresholds of 0-0.9 and 0.05-0.37. The RF model demonstrated superior diagnostic accuracy, reflected by perfect training sensitivity (1.000) and the highest validation AUC (0.754), outperforming the other methods in clinical applicability.
CONCLUSION: The RF-based model exhibited superior predictive accuracy for DCI compared with the multivariable logistic and LASSO regression models. This approach supports individualized preoperative optimization, enhancing colonoscopy quality through targeted risk stratification.
Trial registration: the Chinese Clinical Trial Registry (No. ChiCTR2000040109); approved by the Hospital Ethics Committee (No. 20210130017).
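For readers who want to reproduce the three-way comparison, the sketch below fits the same model families with scikit-learn and scores them by validation AUC. The split ratio, hyperparameters, and the use of an L1-penalized logistic regression as the LASSO step are illustrative assumptions, not the study's actual protocol.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def compare_dci_models(X, y):
    # Stratified hold-out split (assumed; the study's split may differ).
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    models = {
        "logistic": LogisticRegression(max_iter=1000),
        "lasso": LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
        "random_forest": RandomForestClassifier(n_estimators=500, random_state=0),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        auc = roc_auc_score(y_va, model.predict_proba(X_va)[:, 1])
        print(f"{name}: validation AUC = {auc:.3f}")
```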
Modern engineering design optimization often relies on computer simulations to evaluate candidate designs, a setup which results in expensive black-box optimization problems. Such problems introduce unique challenges, which have motivated the application of metamodel-assisted computational intelligence algorithms to solve them. Such algorithms combine a computational intelligence optimizer, which employs a population of candidate solutions, with a metamodel, which is a computationally cheaper approximation of the expensive computer simulation. However, although a variety of metamodels and optimizers have been proposed, the optimal types to employ are problem-dependent, so prescribing the metamodel and optimizer types a priori may degrade the algorithm's effectiveness. Addressing this issue, this study proposes a new computational intelligence algorithm that autonomously adapts the types of the metamodel and optimizer during the search by selecting the most suitable types out of a family of candidates at each stage. Performance analysis on a set of test functions demonstrates the effectiveness of the proposed algorithm and highlights the merit of the proposed adaptation approach.
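A minimal sketch of the adaptation idea: at each stage, rank a family of candidate metamodels by cross-validated error on the points evaluated so far and keep the best. The two-member candidate family (a Gaussian process and a quadratic ridge model) is an illustrative assumption; the paper's candidate set and selection rule may differ.

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def select_metamodel(X, y):
    # Candidate surrogate types; X, y are the expensive evaluations so far.
    candidates = {
        "kriging": GaussianProcessRegressor(),
        "poly2": make_pipeline(PolynomialFeatures(2), Ridge(alpha=1.0)),
    }
    scores = {name: cross_val_score(m, X, y, cv=5,
                                    scoring="neg_mean_squared_error").mean()
              for name, m in candidates.items()}
    best = max(scores, key=scores.get)   # least negative MSE = best fit
    return best, candidates[best].fit(X, y)
```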
In a competitive digital age where data volumes increase over time, the ability to extract meaningful knowledge from high-dimensional data using machine learning (ML) and data mining (DM) techniques, and to make decisions based on that knowledge, is becoming increasingly important in all business domains. Nevertheless, high-dimensional data remains a major challenge for classification algorithms due to high computational cost and storage requirements. The 2016 Demographic and Health Survey of Ethiopia (EDHS 2016), the publicly available data source for this study, contains several features that may not be relevant to the prediction task. In this paper, we developed a hybrid multidimensional metrics framework for predictive modeling, covering both model performance evaluation and feature selection, to overcome the feature selection challenges and select the best model among those available in DM and ML. The proposed hybrid metrics were used to measure the efficiency of the predictive models. Experimental results show that the decision tree algorithm is the most efficient model: its higher score of HMM(m, r) = 0.47 indicates an overall strong model that satisfies almost all of the user's requirements, unlike classical metrics that use a single criterion to select the most appropriate model. The ANNs, by contrast, were the most computationally intensive for our prediction task. Moreover, the type of data and the class balance of the dataset (unbalanced data) have a significant impact on model efficiency, especially on computational cost, and can hamper the interpretability of the model parameters. The efficiency of the predictive model could be further improved with other feature selection algorithms (especially hybrid metrics) that involve experts of the knowledge domain, since understanding of the business domain has a significant impact.
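The abstract does not give the HMM(m, r) formula, so the sketch below only illustrates the general idea of a hybrid metric: a weighted combination of predictive quality and normalized computational cost. The weights, the normalization, and the example timings are all assumptions.

```python
def hybrid_metric(accuracy, f1, train_seconds, all_train_seconds, w=(0.4, 0.4, 0.2)):
    # Normalize cost so the cheapest model in the comparison scores 1.0,
    # then blend quality and cost into one score (weights are assumptions).
    cost_score = min(all_train_seconds) / train_seconds
    return w[0] * accuracy + w[1] * f1 + w[2] * cost_score

# Hypothetical timings for e.g. a decision tree, a random forest, and an ANN.
times = [3.2, 41.0, 290.0]
print(hybrid_metric(0.81, 0.78, 3.2, times))   # cheap tree scores well on the cost term
```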
Test selection and optimization (TSO) can improve the fault diagnosis, prognosis, and health-state evaluation abilities of prognostics and health management (PHM) systems. Traditionally, TSO focuses mainly on fault detection and isolation, which cannot provide effective guidance for design for testability (DFT) aimed at improving PHM performance. To solve this problem, a TSO model for PHM systems is proposed. Firstly, by integrating the characteristics of fault severity and propagation time, and analyzing test timing and sensitivity, a testability model based on a failure evolution mechanism model (FEMM) for PHM systems is built; it describes the fault-evolution/test dependency using a fault-symptom parameter matrix and a symptom parameter-test matrix. Secondly, a novel method of inherent testability analysis for PHM systems is developed based on this information. With the analysis completed, a TSO model whose objective is to maximize fault trackability and minimize test cost is formulated from the inherent testability analysis results, and an adaptive simulated annealing genetic algorithm (ASAGA) is introduced to solve the TSO problem. Finally, a centrifugal pump system case is used to verify the feasibility and effectiveness of the proposed models and methods. The results show that the proposed technology helps PHM systems select and optimize the test set so as to improve their performance level.
Funding: supported by the National Natural Science Foundation of China (51175502).
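The following sketch illustrates the shape of the TSO problem: a binary chromosome over candidate tests, a fitness that trades fault trackability against test cost, and a simulated-annealing acceptance rule of the kind ASAGA hybridizes into the genetic search. The matrix encoding and the penalty weight `lam` are assumptions; the paper's FEMM-derived matrices are richer than this.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(sel, fault_test, cost, lam=0.1):
    # sel: 0/1 vector over tests; fault_test[i, j] = 1 if test j covers fault i.
    tracked = (fault_test[:, sel == 1].sum(axis=1) > 0).mean()  # fault trackability
    return tracked - lam * cost[sel == 1].sum() / cost.sum()    # minus cost penalty

def asaga_step(sel, fault_test, cost, temp):
    child = sel.copy()
    child[rng.integers(len(sel))] ^= 1          # bit-flip mutation
    delta = fitness(child, fault_test, cost) - fitness(sel, fault_test, cost)
    # SA acceptance: always take improvements, sometimes accept worse children.
    if delta > 0 or rng.random() < np.exp(delta / temp):
        return child
    return sel
```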
The Extended Kalman Filter (EKF) algorithm is widely used for parameter estimation in nonlinear systems. Its estimation precision depends sensitively on the EKF's initial state covariance matrix and state noise matrix. Grid optimization is commonly used to find proper initial matrices for off-line estimation, but the grid method has the drawback of being time-consuming; hence a coarse grid followed by a fine grid is usually adopted. To further improve efficiency without loss of estimation accuracy, this paper proposes a genetic algorithm for the coarse-grid optimization. Since the crossover rate and mutation rate are the main factors influencing genetic algorithm performance, sensitivity experiments on these two factors were carried out, and a set of genetic algorithm parameters with good adaptability was selected by testing with several gyros' experimental data. Experimental results show that the proposed algorithm achieves higher efficiency and better estimation accuracy than the traversing grid algorithm.
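A sketch of the proposed coarse-grid replacement: a small real-coded genetic algorithm searching over scalings of the initial covariance and noise matrices. The fitness function `ekf_rmse` (run the EKF with the candidate matrices and return the estimation error) is assumed to exist; the crossover rate `pc` and mutation rate `pm` are the two factors whose sensitivity the paper examines.

```python
import numpy as np

rng = np.random.default_rng(1)

def ga_tune(ekf_rmse, bounds, pop=20, gens=30, pc=0.8, pm=0.1):
    lo, hi = np.array(bounds).T                     # per-gene search bounds
    P = rng.uniform(lo, hi, size=(pop, len(lo)))    # initial random population
    for _ in range(gens):
        f = np.array([ekf_rmse(x) for x in P])
        P = P[np.argsort(f)][: pop // 2]            # keep the better half (lower RMSE)
        children = []
        while len(children) < pop - len(P):
            a, b = P[rng.integers(len(P), size=2)]
            c = np.where(rng.random(len(a)) < pc, (a + b) / 2, a)  # arithmetic crossover
            mut = rng.random(len(c)) < pm
            c[mut] = rng.uniform(lo[mut], hi[mut])                 # random-reset mutation
            children.append(np.clip(c, lo, hi))
        P = np.vstack([P, children])
    f = np.array([ekf_rmse(x) for x in P])
    return P[np.argmin(f)]                          # best (P0, Q) scaling found
```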
Although there are many papers on variable selection methods based on the mean model in finite mixtures of regression models, little work has been done on how to select significant explanatory variables in modeling the variance parameter. In this paper, we propose and study a novel class of models, skew-normal mixtures of joint location and scale models, for analyzing heteroscedastic skew-normal data from a heterogeneous population. The variable selection problem for the proposed models is considered. In particular, a modified Expectation-Maximization (EM) algorithm for estimating the model parameters is developed. The consistency and oracle property of the penalized estimators are established. Simulation studies are conducted to investigate the finite-sample performance of the proposed methodologies, and a real example illustrates their use.
Funding: supported by the National Natural Science Foundation of China (11861041).
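For concreteness, a penalized objective of the kind such variable selection methods maximize might look as follows; the abstract does not state the paper's exact penalty, so the skew-normal density parameterization and the separate penalties on location and scale coefficients are assumptions.

```latex
% Penalized log-likelihood sketch: SN(y; mu, sigma, lambda) is a skew-normal density,
% with component weights \pi_k, location x_i^T \beta_k, and log-scale z_i^T \gamma_k.
\ell_p(\theta) = \sum_{i=1}^{n} \log \Bigl( \sum_{k=1}^{K} \pi_k\,
  \mathrm{SN}\bigl(y_i;\; x_i^{\top}\beta_k,\; e^{z_i^{\top}\gamma_k},\; \lambda_k\bigr) \Bigr)
  - n \sum_{k=1}^{K} \Bigl( \sum_{j} p_{\lambda_1}\bigl(|\beta_{kj}|\bigr)
  + \sum_{l} p_{\lambda_2}\bigl(|\gamma_{kl}|\bigr) \Bigr)
```

Here $p_{\lambda}(\cdot)$ is a sparsity-inducing penalty (e.g. SCAD), applied to both the location coefficients $\beta_k$ and the scale coefficients $\gamma_k$, which is what lets the method select variables in the variance model as well as the mean model.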
Selection of the most cost-effective parameters of hard rock surface mining is determined by considering all alternative variants of mine design and the conflicting effects of their parameters on cost. Such consideration can be realized with a mathematical model of the cumulative influence of rockmass and mine-design variables on the overall cost per tonne of hard rock drilled, blasted, hauled, and primary crushed. Available works on the topic have mostly treated these four processes of hard rock surface mining separately. This paper presents the theoretical part of a research effort proposed to enhance the selection of hard rock surface mining design parameters based on a regression model of overall cost per tonne of rock mined, fitted on the determinant variations of rockmass and mine design. The regression model could be developed from statistical data generated by the many hard rock surface mines operating worldwide under variable rockmass and mine-design conditions. A regression-model-based general algorithm has also been formulated for the development of software and computer-aided selection of the most cost-effective parameters of hard rock surface mining.
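A minimal sketch of the kind of regression fit the paper proposes, assuming ordinary least squares and a handful of hypothetical rockmass/mine-design predictors; the rows below are toy numbers, not real mine data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical predictors: rock strength (MPa), bench height (m),
# haul distance (km), powder factor (kg/m^3).
X = np.array([[120, 10, 2.5, 0.9],
              [ 80, 12, 1.8, 0.7],
              [150,  8, 3.1, 1.1]])   # toy rows for illustration only
y = np.array([4.2, 3.1, 5.0])        # overall cost per tonne (toy values)

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # marginal cost effect of each design variable
```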
Interest has recently emerged in potential applications of (n,2n) reactions of unstable nuclei, but challenges arise from the scarcity of experimental cross-section data. This study aims to predict the (n,2n) reaction cross-sections of long-lived fission products based on a tensor model. The tensor model extends the collaborative filtering algorithm used for nuclear data: it applies tensor decomposition and completion to predict (n,2n) reaction cross-sections, with the corresponding EXFOR data used for training. The reliability of the proposed tensor model was validated by comparing its calculations with data from EXFOR and other databases. Predictions were made for long-lived fission products such as ⁶⁰Co, ⁷⁹Se, ⁹³Zr, ¹⁰⁷Pd, ¹²⁶Sn, and ¹³⁷Cs, providing a predicted energy range for effectively transmuting long-lived fission products into shorter-lived or less radioactive isotopes. This method could be a powerful tool for completing (n,2n) reaction cross-section data and shows the possibility of selective transmutation of nuclear waste.
Funding: supported by the Key Laboratory of Nuclear Data Foundation (No. JCKY2022201C157).
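One way to realize the tensor completion step, sketched here with tensorly's masked PARAFAC: arrange cross-sections as a (nuclide × energy × source) tensor, fit a low-rank CP decomposition on the observed entries only, and read predictions off the reconstruction. The axis layout, rank, and use of tensorly are assumptions; the paper's decomposition may differ.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

def complete_cross_sections(tensor, mask, rank=4):
    # tensor: measured cross-sections, zeros where unknown; mask: 1 = observed.
    weights, factors = parafac(tl.tensor(tensor), rank=rank, mask=tl.tensor(mask))
    full = tl.cp_to_tensor((weights, factors))       # dense low-rank reconstruction
    # Keep measured values where available, fill the gaps with predictions.
    return np.where(mask.astype(bool), tensor, tl.to_numpy(full))
```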
An improved Gaussian mixture model (GMM)-based clustering method is proposed for the difficult case where the true distribution of the data departs from the assumed GMM. First, an improved model selection criterion, the completed-likelihood minimum message length criterion, is derived; it measures both the goodness-of-fit of a candidate GMM to the data and the goodness-of-partition of the data. Second, using the proposed criterion as the clustering objective function, an improved expectation-maximization (EM) algorithm is developed that avoids the poor local optima the standard EM algorithm can reach when estimating the model parameters. Experimental results demonstrate that the proposed method rectifies the over-fitting tendency of representative GMM-based clustering approaches and robustly provides more accurate clustering results.
Funding: supported by the National Natural Science Foundation of China (No. 61105048, 60972165); the Doctoral Fund of the Ministry of Education of China (No. 20110092120034); the Natural Science Foundation of Jiangsu Province (No. BK2010240); the Technology Foundation for Selected Overseas Chinese Scholars, Ministry of Human Resources and Social Security of China (No. 6722000008); and the Open Fund of the Jiangsu Province Key Laboratory for Remote Measuring and Control (No. YCCK201005).
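The exact completed-likelihood minimum message length criterion is derived in the paper; as a hedged stand-in, the sketch below uses an ICL-style surrogate, i.e. BIC plus an entropy term on the soft assignments, which captures the same goodness-of-fit plus goodness-of-partition trade-off.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_gmm(X, k_max=8):
    best, best_score = None, np.inf
    for k in range(1, k_max + 1):
        gmm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
        resp = gmm.predict_proba(X)                        # soft assignments
        entropy = -np.sum(resp * np.log(np.clip(resp, 1e-12, None)))
        score = gmm.bic(X) + 2.0 * entropy                 # fit + partition quality (weight is an assumption)
        if score < best_score:
            best, best_score = gmm, score
    return best
```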