Credit card fraud is a wide-ranging issue for financial institutions, involving theft and fraud committed using a payment card. In this paper, we explore the application of linear and nonlinear statistical modeling and machine learning models on real credit card transaction data. The models built are supervised fraud models that attempt to identify which transactions are most likely fraudulent. We discuss the processes of data exploration, data cleaning, variable creation, feature selection, model algorithms, and results. Five different supervised models are explored and compared including logistic regression, neural networks, random forest, boosted tree and support vector machines. The boosted tree model shows the best fraud detection result (FDR = 49.83%) for this particular data set. The resulting model can be utilized in a credit card fraud detection system. A similar model development process can be performed in related business domains such as insurance and telecommunications, to avoid or detect fraudulent activity.
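Since the abstract leaves the FDR computation implicit, here is a minimal sketch of scoring transactions with a boosted tree and measuring the fraud detection rate at a fixed review cutoff. The synthetic data, feature count, and 3% cutoff are illustrative assumptions, not details from the paper.

```python
# Minimal sketch: score transactions with a boosted tree and measure the
# fraud detection rate (FDR) in the top-scored slice. Synthetic data and a
# 3% review cutoff are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=10, weights=[0.98],
                           random_state=0)  # ~2% "fraud" class
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = GradientBoostingClassifier().fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]

cutoff = int(0.03 * len(y_te))             # review the top 3% of scores
top = np.argsort(scores)[::-1][:cutoff]
fdr = y_te[top].sum() / y_te.sum()         # share of all fraud caught
print(f"FDR at 3% review rate: {fdr:.2%}")
```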
Non-technical losses (NTL) of electric power are a serious problem for electric distribution companies. The solution determines the cost, stability, reliability, and quality of the supplied electricity. The widespread use of advanced metering infrastructure (AMI) and Smart Grid allows all participants in the distribution grid to store and track electricity consumption. During the research, a machine learning model is developed that allows analyzing and predicting the probability of NTL for each consumer of the distribution grid based on daily electricity consumption readings. This model is an ensemble meta-algorithm (stacking) that generalizes the algorithms of random forest, LightGBM, and a homogeneous ensemble of artificial neural networks. The best accuracy of the proposed meta-algorithm in comparison to the basic classifiers is experimentally confirmed on the test sample. Such a model, due to good accuracy indicators (ROC-AUC = 0.88), can be used as a methodological basis for a decision support system whose purpose is to form a sample of suspected NTL sources. The use of such a sample will allow the top management of electric distribution companies to increase the efficiency of raids by performers, making them targeted and accurate, which should contribute to the fight against NTL and the sustainable development of the electric power industry.
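A hedged sketch of the stacking meta-algorithm described above: random forest, LightGBM, and a bagged homogeneous ensemble of small neural networks, generalized by a logistic-regression meta-learner. The base-model settings and the logistic final estimator are assumptions; the abstract does not specify them.

```python
# Sketch of the stacking ensemble: RF + LightGBM + a homogeneous ensemble of
# small MLPs, combined by a logistic-regression meta-learner over out-of-fold
# probabilities. Feature layout (daily consumption readings) is assumed.
from lightgbm import LGBMClassifier
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lgbm", LGBMClassifier(random_state=0)),
        ("ann", BaggingClassifier(              # homogeneous ANN ensemble
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=500),
            n_estimators=5, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    stack_method="predict_proba", cv=5,
)
# stack.fit(X_train, y_train)
# roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
```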
We describe here a comprehensive framework for intelligent information management (IIM) of data collection and decision-making actions for reliable and robust event processing and recognition. This is driven by algorithmic information theory (AIT), in general, and algorithmic randomness and Kolmogorov complexity (KC), in particular. The processing and recognition tasks addressed include data discrimination and multilayer open set data categorization, change detection, data aggregation, clustering and data segmentation, data selection and link analysis, data cleaning and data revision, and prediction and identification of critical states. The unifying theme throughout the paper is that of “compression entails comprehension”, which is realized using the interrelated concepts of randomness vs. regularity and Kolmogorov complexity. The constructive and all-encompassing active learning (AL) methodology, which mediates and supports the above theme, is context-driven and takes advantage of statistical learning, in general, and semi-supervised learning and transduction, in particular. Active learning employs explore and exploit actions characteristic of closed-loop control for evidence accumulation in order to revise its prediction models and to reduce uncertainty. The set-based similarity scores, driven by algorithmic randomness and Kolmogorov complexity, employ strangeness/typicality and p-values. We propose the application of the IIM framework to critical states prediction for complex physical systems; in particular, the prediction of cyclone genesis and intensification.
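To make the strangeness/p-value machinery concrete, the following sketch uses a nearest-neighbor strangeness measure as an illustrative stand-in for the paper's randomness- and KC-driven scores; the transductive p-value itself follows the standard definition (fraction of examples at least as strange as the new one).

```python
# Illustrative strangeness and transductive p-value, with a k-NN strangeness
# measure standing in for the KC-driven scores described in the paper.
import numpy as np

def strangeness(x, same, other, k=3):
    """Ratio of distance to own-class neighbors vs. other-class neighbors
    (larger = stranger / less typical)."""
    d_same = np.sort(np.linalg.norm(same - x, axis=1))[:k].sum()
    d_other = np.sort(np.linalg.norm(other - x, axis=1))[:k].sum()
    return d_same / (d_other + 1e-12)

def p_value(alpha_new, alphas):
    """Fraction of calibration strangeness scores >= the new score."""
    return (np.sum(alphas >= alpha_new) + 1) / (len(alphas) + 1)
```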
CC's (Cloud Computing) networks are distributed and dynamic, as signals appear, disappear, or lose significance. MLTs (Machine Learning Techniques) train on datasets that are sometimes inadequate in terms of samples for inferring information. A dynamic strategy, DevMLOps (Development Machine Learning Operations), used in automatic selection and tuning of MLTs, results in significant performance differences. But the scheme has many disadvantages, including continuous training, more samples and training time in feature selection, and increased classification execution times. RFEs (Recursive Feature Eliminations) are computationally very expensive, as they traverse each feature without considering correlations between features. This problem can be overcome by the use of wrappers, as they select better features by accounting for test and train datasets. The aim of this paper is to use DevQLMLOps for automated tuning and selection based on orchestration and messaging between containers. The proposed AKFA (Adaptive Kernel Firefly Algorithm) selects features for CNM (Cloud Network Monitoring) operations. The AKFA methodology is demonstrated on the CNSD (Cloud Network Security Dataset) with satisfactory results on the performance metrics used: precision, recall, F-measure, and accuracy.
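To illustrate the RFE-versus-wrapper distinction the paragraph draws (without reproducing the firefly-based AKFA itself), the sketch below contrasts scikit-learn's RFE with a cross-validated wrapper-style selector on synthetic data.

```python
# RFE ranks and drops features one at a time; a wrapper scores candidate
# subsets by cross-validated fit. Synthetic data stands in for the CNSD.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
est = LogisticRegression(max_iter=1000)

rfe = RFE(est, n_features_to_select=5).fit(X, y)       # rank-and-drop
wrap = SequentialFeatureSelector(est, n_features_to_select=5,
                                 cv=5).fit(X, y)        # CV-scored wrapper
print(rfe.get_support().nonzero()[0], wrap.get_support().nonzero()[0])
```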
Accurate prediction of flood events is important for flood control and risk management. Machine learning techniques have contributed greatly to advances in flood prediction, and existing studies mainly focused on predicting flood resource variables using single or hybrid machine learning techniques. However, class-based flood predictions have rarely been investigated, even though they can aid in quickly diagnosing comprehensive flood characteristics and proposing targeted management strategies. This study proposed a prediction approach for flood regime metrics and event classes coupling machine learning algorithms with clustering-deduced membership degrees. Five algorithms were adopted for this exploration. Results showed that the class membership degrees accurately determined event classes, with class hit rates up to 100%, compared with the four classes clustered from nine regime metrics. The nonlinear algorithms (Random Forest and least squares-Support Vector Machine among them) outperformed the linear techniques (Multiple Linear Regression and Stepwise Regression) in predicting flood regime metrics. The proposed approach predicted flood event classes well, with average class hit rates of 66.0%-85.4% and 47.2%-76.0% in the calibration and validation periods, respectively, particularly for the slow and late flood events. The predictive capability of the proposed approach for flood regime metrics and classes was considerably stronger than that of the hydrological modeling approach.
Welding deformation adversely affects the quality and precision of structural components, and traditional methods require significant material resources and time. Machine learning has demonstrated exceptional accuracy and efficiency in solving complex problems. Thus, the use of machine learning to predict welding deformations is a novel approach. In this study, laser welding experiments were conducted on a TC4 titanium alloy to establish a welding deformation dataset. Deep neural network (DNN) and convolutional neural network (CNN) models were designed and constructed, with average prediction errors of 0.85 mm and 0.94 mm on the validation set, respectively. To further optimize the network parameters, a differential evolution algorithm was employed through mutation, crossover, and selection. The results indicated that after optimization, the prediction errors of the DNN and CNN models reduced to 0.75 mm and 0.85 mm, respectively. These represent accuracy improvements of 14.8% and 9.6%, respectively. The optimized models exhibited superior predictive performance on the validation set.
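As a rough illustration of the optimization step, the sketch below uses SciPy's differential evolution (which performs the mutation, crossover, and selection mentioned above) to tune network hyperparameters against a validation-error objective. The data, objective, and bounds are placeholders, not the study's welding setup.

```python
# Differential evolution searching network hyperparameters (hidden width and
# learning rate here) to minimize cross-validated mean absolute error.
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)

def objective(params):
    hidden, lr = int(params[0]), params[1]
    net = MLPRegressor(hidden_layer_sizes=(hidden,), learning_rate_init=lr,
                       max_iter=400, random_state=0)
    # negate the CV score so lower is better for the minimizer
    return -cross_val_score(net, X, y, cv=3,
                            scoring="neg_mean_absolute_error").mean()

result = differential_evolution(objective, bounds=[(8, 64), (1e-4, 1e-2)],
                                maxiter=5, popsize=8, seed=0)
print(result.x, result.fun)
```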
Bimetallic nanoparticles (AmBn) usually exhibit rich catalytic chemistry and have drawn tremendous attention in heterogeneous catalysis. However, challenged by the huge configuration space, little is understood at the atomic level about their composition and the distribution of the A/B elements, which hinders rational synthesis. Herein, we develop an on-the-fly training strategy combining a machine learning model (SchNet) with a genetic algorithm (GA) search technique, which achieves fast and accurate energy prediction of complex bimetallic clusters at the DFT level. Taking the 38-atom PtmAu38-m nanoparticle as an example, the element distribution identification problem and the stability trend as a function of Pt/Au composition are quantitatively resolved. Specifically, results show that on Pt-rich clusters Au atoms prefer to occupy the low-coordinated surface corner sites and form patch-like surface segregation patterns, while for Au-rich ones Pt atoms tend to sit in the core region and form the core-shell (Pt@Au) configuration. The thermodynamically most stable PtmAu38-m cluster is Pt6Au32, with all core-region sites occupied by Pt, rationalized by the stronger Pt-Pt bond in comparison with the Pt-Au and Au-Au bonds. This work exemplifies the potent application of rapid global search enabled by machine learning in exploring the high-dimensional configuration space of bimetallic nanocatalysts.
To address the heavy computation and lag of fuzzy neural network controllers, a T-S norm fuzzy neural network controller based on a hybrid learning algorithm was proposed. An immune genetic algorithm (IGA) was used to optimize the parameters of the membership functions (MFs) offline, and the neural network was used to adjust the parameters of the MFs online to improve the response of the controller. Moreover, the latter network was used to adjust the fuzzy rules automatically to reduce the computation of the neural network and improve the robustness and adaptability of the controller, so that the controller can work well even when the underwater vehicle operates in a hostile ocean environment. Finally, experiments were carried out on the "XX" mini autonomous underwater vehicle (mini-AUV) in a tank. The results showed that this controller achieves great improvements in response and overshoot compared with traditional controllers.
BACKGROUND: Difficulty of colonoscopy insertion (DCI) significantly affects colonoscopy effectiveness and serves as a key quality indicator. Predicting and evaluating DCI risk preoperatively is crucial for optimizing intraoperative strategies. AIM: To evaluate the predictive performance of machine learning (ML) algorithms for DCI by comparing three modeling approaches, identify factors influencing DCI, and develop a preoperative prediction model using ML algorithms to enhance colonoscopy quality and efficiency. METHODS: This cross-sectional study enrolled 712 patients who underwent colonoscopy at a tertiary hospital between June 2020 and May 2021. Demographic data, past medical history, medication use, and psychological status were collected. The endoscopist assessed DCI using the visual analogue scale. After univariate screening, predictive models were developed using multivariable logistic regression, least absolute shrinkage and selection operator (LASSO) regression, and random forest (RF) algorithms. Model performance was evaluated based on discrimination, calibration, and decision curve analysis (DCA), and results were visualized using nomograms. RESULTS: A total of 712 patients (53.8% male; mean age 54.5 ± 12.9 years) were included. Logistic regression analysis identified constipation [odds ratio (OR) = 2.254, 95% confidence interval (CI): 1.289-3.931], abdominal circumference (AC) (77.5-91.9 cm, OR = 1.895, 95% CI: 1.065-3.350; AC ≥ 92 cm, OR = 1.271, 95% CI: 0.730-2.188), and anxiety (OR = 1.071, 95% CI: 1.044-1.100) as predictive factors for DCI, validated by the LASSO and RF methods. Model performance revealed training/validation sensitivities of 0.826/0.925, 0.924/0.868, and 1.000/0.981; specificities of 0.602/0.511, 0.510/0.562, and 0.977/0.526; and corresponding areas under the receiver operating characteristic curve (AUCs) of 0.780 (0.737-0.823)/0.726 (0.654-0.799), 0.754 (0.710-0.798)/0.723 (0.656-0.791), and 1.000 (1.000-1.000)/0.754 (0.688-0.820), respectively. DCA indicated optimal net benefit within probability thresholds of 0-0.9 and 0.05-0.37. The RF model demonstrated superior diagnostic accuracy, reflected by perfect training sensitivity (1.000) and the highest validation AUC (0.754), outperforming the other methods in clinical applicability. CONCLUSION: The RF-based model exhibited superior predictive accuracy for DCI compared with the multivariable logistic and LASSO regression models. This approach supports individualized preoperative optimization, enhancing colonoscopy quality through targeted risk stratification.
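A minimal sketch of the three-model comparison, assuming synthetic stand-ins for the clinical predictors: plain logistic regression, L1-penalized ("LASSO") logistic regression, and a random forest compared by validation AUC.

```python
# Fit the three model families named above and compare validation AUC.
# Predictors (constipation, abdominal circumference, anxiety score, ...)
# are stand-in synthetic columns, not the study's data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=712, n_features=12, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "lasso": LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
    "rf": RandomForestClassifier(n_estimators=300, random_state=0),
}
for name, m in models.items():
    auc = roc_auc_score(y_va, m.fit(X_tr, y_tr).predict_proba(X_va)[:, 1])
    print(f"{name}: validation AUC = {auc:.3f}")
```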
In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
This paper presents an improved BP algorithm. The approach reduces the amount of computation by using a logarithmic objective function. The learning rate μ(k) for each iteration is determined by a dynamic optimization method to accelerate the convergence rate. Since the determination of the learning rate in the proposed BP algorithm only uses the first-order derivatives already obtained in the standard BP algorithm (SBP), the computational and storage burden is similar to that of the SBP algorithm, while the convergence rate is remarkably accelerated. Computer simulations demonstrate the effectiveness of the proposed algorithm.
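A hedged sketch of the two ingredients above: a logarithmic (cross-entropy) objective and a learning rate μ(k) recomputed each iteration. Since the paper's exact dynamic-optimization rule is not given here, a backtracking rule that reuses the already-computed first-order derivative stands in for it.

```python
# Gradient training on a logarithmic objective with a per-iteration step
# size. The backtracking rule is an illustrative stand-in for the paper's
# dynamic-optimization determination of mu(k).
import numpy as np

def log_loss(X, y, w):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def train(X, y, iters=100):
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        g = X.T @ (p - y) / len(y)       # first-order derivative only
        loss, mu = log_loss(X, y, w), 1.0
        while True:                      # shrink mu(k) until the
            w_new = w - mu * g           # logarithmic objective decreases
            if log_loss(X, y, w_new) < loss or mu < 1e-8:
                break
            mu *= 0.5
        w = w_new
    return w
```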
Mental-health risk detection seeks early signs of distress from social media posts and clinical transcripts to enable timely intervention before crises. When such risks go undetected, consequences can escalate to self-harm, long-term disability, reduced productivity, and significant societal and economic burden. Despite recent advances, detecting risk from online text remains challenging due to heterogeneous language, evolving semantics, and the sequential emergence of new datasets. Effective solutions must encode clinically meaningful cues, reason about causal relations, and adapt to new domains without forgetting prior knowledge. To address these challenges, this paper presents a Continual Neuro-Symbolic Graph Learning (CNSGL) framework that unifies symbolic reasoning, causal inference, and continual learning within a single architecture. Each post is represented as a symbolic graph linking clinically relevant tags to textual content, enriched with causal edges derived from directional Point-wise Mutual Information (PMI). A two-layer Graph Convolutional Network (GCN) encodes these graphs, and a Transformer-based attention pooler aggregates node embeddings while providing interpretable tag-level importances. Continual adaptation across datasets is achieved through the Multi-Head Freeze (MH-Freeze) strategy, which freezes a shared encoder and incrementally trains lightweight task-specific heads (small classifiers attached to the shared embedding). Experimental evaluations across six diverse mental-health datasets, ranging from Reddit discourse to clinical interviews, demonstrate that MH-Freeze consistently outperforms existing continual-learning baselines in both discriminative accuracy and calibration reliability. MH-Freeze achieves up to 0.925 accuracy and 0.923 F1-score, with AUPRC ≥ 0.934 and AUROC ≥ 0.942, consistently surpassing all continual-learning baselines. The results confirm the framework's ability to preserve prior knowledge, adapt to domain shifts, and maintain causal interpretability, establishing CNSGL as a promising step toward robust, explainable, and lifelong mental-health risk assessment.
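A minimal PyTorch sketch of the MH-Freeze idea, assuming a generic encoder module in place of the paper's GCN-plus-attention-pooling encoder; layer sizes and the freezing trigger are illustrative.

```python
# MH-Freeze sketch: one shared encoder, frozen after the first task, plus a
# lightweight per-task classification head. The encoder here is generic; the
# paper uses a GCN encoder with a Transformer attention pooler.
import torch.nn as nn

class MHFreeze(nn.Module):
    def __init__(self, encoder: nn.Module, embed_dim: int):
        super().__init__()
        self.encoder = encoder
        self.embed_dim = embed_dim
        self.heads = nn.ModuleDict()          # one small head per dataset

    def add_task(self, task: str, n_classes: int):
        if self.heads:                        # encoder trains only on task 1;
            for p in self.encoder.parameters():   # freeze it afterwards
                p.requires_grad = False
        self.heads[task] = nn.Linear(self.embed_dim, n_classes)

    def forward(self, x, task: str):
        return self.heads[task](self.encoder(x))
```

Usage would be, for example, `model.add_task("reddit", 2)`, train on the first dataset, then `model.add_task("clinical", 2)`, which freezes the encoder so only the new head learns.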
This paper deals with deriving the properties of an updated neural network model that is exploited to identify an unknown nonlinear system via the standard gradient learning algorithm. The convergence of this algorithm for online training of three-layer neural networks in a stochastic environment is studied. A special case where an unknown nonlinearity can exactly be approximated by some neural network with a nonlinear activation function for its output layer is considered. To analyze the asymptotic behavior of the learning processes, the so-called Lyapunov-like approach is utilized. As the Lyapunov function, the expected value of the square of the approximation error depending on the network parameters is chosen. Within this approach, sufficient conditions guaranteeing the convergence of the learning algorithm with probability 1 are derived. Simulation results are presented to support the theoretical analysis.
Data mining plays a crucial role in extracting meaningful knowledge from large-scale data repositories, such as data warehouses and databases. Association rule mining, a fundamental process in data mining, involves discovering correlations, patterns, and causal structures within datasets. In the healthcare domain, association rules offer valuable opportunities for building knowledge bases, enabling intelligent diagnoses, and extracting invaluable information rapidly. This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System (MLARMC-HDMS). The MLARMC-HDMS technique integrates classification and association rule mining (ARM) processes. Initially, the chimp optimization algorithm-based feature selection (COAFS) technique is employed within MLARMC-HDMS to select relevant attributes. Inspired by the foraging behavior of chimpanzees, the COA algorithm mimics their search strategy for food. Subsequently, the classification process utilizes stochastic gradient descent with a multilayer perceptron (SGD-MLP) model, while the Apriori algorithm determines attribute relationships. We propose a COA-based feature selection approach for medical data classification using machine learning techniques. This approach involves selecting pertinent features from medical datasets through COA and training machine learning models using the reduced feature set. We evaluate the performance of our approach on various medical datasets employing diverse machine learning classifiers. Experimental results demonstrate that our proposed approach surpasses alternative feature selection methods, achieving higher accuracy and precision rates in medical data classification tasks. The study showcases the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features, thereby enhancing the diagnosis and treatment of various diseases. To provide further validation, we conduct detailed experiments on a benchmark medical dataset, revealing the superiority of the MLARMC-HDMS model over other methods, with a maximum accuracy of 99.75%. Therefore, this research contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis. The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in the field of healthcare data mining and machine learning.
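As a sketch of the ARM stage only (the COA feature selection and SGD-MLP classifier are not reproduced), the snippet below mines association rules with the Apriori algorithm via the mlxtend package, assumed available; the one-hot symptom columns are hypothetical.

```python
# Apriori association-rule mining over a tiny hypothetical one-hot medical
# table: frequent itemsets first, then rules filtered by confidence.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

records = pd.DataFrame({
    "fever":   [1, 1, 0, 1, 0, 1],
    "cough":   [1, 1, 1, 0, 0, 1],
    "fatigue": [0, 1, 1, 1, 0, 1],
    "flu":     [1, 1, 0, 1, 0, 1],
}, dtype=bool)

itemsets = apriori(records, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```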
Measurement-while-drilling (MWD) and guidance technologies have been extensively deployed in the exploitation of oil, natural gas, and other energy resources. Conventional control approaches are plagued by challenges, including limited anti-interference capabilities and the insufficient generalization of decision-making experience. To address the intricate problem of directional well trajectory control, an intelligent algorithm design framework grounded in the high-level interaction mechanism between geology and engineering is put forward. This framework aims to facilitate the rapid batch migration and update of drilling strategies. The proposed directional well trajectory control method comprehensively considers the multi-source heterogeneous attributes of drilling experience data, leverages generative simulation of the geological drilling environment, and promptly constructs a directional well trajectory control model with self-adaptive capabilities to environmental variations. This construction is carried out on three hierarchical levels: offline pre-drilling learning, online during-drilling interaction, and post-drilling model transfer. Simulation results indicate that the guidance model derived from this method demonstrates remarkable generalization performance and accuracy. It can significantly boost the adaptability of the control algorithm to diverse environments and enhance the penetration rate of the target reservoir during drilling operations.
The distillation process is an important chemical process, and the application of a data-driven modelling approach has the potential to reduce model complexity compared to mechanistic modelling, thus improving the efficiency of process optimization or monitoring studies. However, the distillation process is highly nonlinear and has multiple uncertainty perturbation intervals, which brings challenges to accurate data-driven modelling of distillation processes. This paper proposes a systematic data-driven modelling framework to solve these problems. Firstly, data segment variance was introduced into the K-means algorithm to form K-means data interval (KMDI) clustering, in order to cluster the data into perturbed and steady-state intervals for steady-state data extraction. Secondly, the maximal information coefficient (MIC) was employed to calculate the nonlinear correlation between variables for removing redundant features. Finally, extreme gradient boosting (XGBoost) was integrated as the base learner into adaptive boosting (AdaBoost), with an error threshold (ET) set to improve the weight-update strategy, to construct the new integrated learning algorithm, XGBoost-AdaBoost-ET. The superiority of the proposed framework is verified by applying this data-driven modelling framework to a real industrial process of propylene distillation.
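A sketch of the ensemble composition above: XGBoost as the base learner inside AdaBoost. The error-threshold (ET) modification to the weight-update strategy is the paper's contribution and is not reproduced here; this shows only the plain composition it builds on.

```python
# XGBoost regressors boosted by AdaBoost (scikit-learn >= 1.2 parameter
# names). The ET weight-update rule from the paper is not implemented.
from sklearn.ensemble import AdaBoostRegressor
from xgboost import XGBRegressor

model = AdaBoostRegressor(
    estimator=XGBRegressor(n_estimators=50, max_depth=3),
    n_estimators=10, learning_rate=0.5, random_state=0,
)
# model.fit(X_steady, y_purity)  # steady-state intervals from KMDI clustering
```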
An aero-engine maintenance policy plays a crucial role in reasonably reducing maintenance cost. An aero-engine is a type of complex equipment with a long service life. In engineering, a hybrid maintenance strategy is adopted to improve aero-engine operational reliability. Thus, the long service life and the hybrid maintenance strategy should be considered synchronously in aero-engine maintenance policy optimization. This paper proposes an aero-engine life-cycle maintenance policy optimization algorithm that synchronously considers the long service life and the hybrid maintenance strategy. The reinforcement learning approach was adopted to illustrate the optimization framework, in which maintenance policy optimization was formulated as a Markov decision process. In the reinforcement learning framework, the Gauss–Seidel value iteration algorithm was adopted to optimize the maintenance policy. Compared with traditional aero-engine maintenance policy optimization methods, the long service life and the hybrid maintenance strategy can be addressed synchronously by the proposed algorithm. Two numerical experiments and algorithm analyses were performed to illustrate the optimization algorithm in detail.
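For reference, a compact sketch of Gauss–Seidel value iteration: unlike a Jacobi-style sweep, state values are updated in place, so later states in the same sweep already see the new values. The transition tensor P and rewards R stand for a hypothetical maintenance MDP, not the paper's aero-engine model.

```python
# Gauss-Seidel value iteration: in-place updates within each sweep.
# P[a] is an (n_states x n_states) transition matrix, R[a] a reward vector.
import numpy as np

def gauss_seidel_vi(P, R, gamma=0.95, tol=1e-8):
    n_actions, n_states = len(P), P[0].shape[0]
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):            # in-place (Gauss-Seidel) sweep
            q = [R[a][s] + gamma * P[a][s] @ V for a in range(n_actions)]
            v_new = max(q)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V
```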
Population-based algorithms have been used in many real-world problems. The bat algorithm (BA) is one of the state-of-the-art approaches of this kind. Because of the super bat, BA can converge quickly on the one hand; on the other hand, it easily falls into local optima. Therefore, for typical BA algorithms, the ability of exploration and exploitation is not strong enough, and it is hard to find a precise result. In this paper, we propose a novel bat algorithm based on cross boundary learning (CBL) and a uniform explosion strategy (UES), namely BABLUE for short, to avoid the above contradiction and achieve both fast convergence and high quality. Different from previous opposition-based learning, the proposed CBL can expand the search area of the population and thus maintain the ability of global exploration during fast convergence. In order to enhance the local exploitation ability of the proposed algorithm, we propose UES, which can achieve almost the same search precision as the firework explosion algorithm but consumes fewer computational resources. BABLUE is tested in numerous experiments on unimodal, multimodal, one-dimensional, high-dimensional, and discrete problems, and then compared with other typical intelligent optimization algorithms. The results show that the proposed algorithm outperforms the other algorithms.
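For orientation, the sketch below shows one iteration of the standard bat-algorithm update that BABLUE modifies (frequency-tuned velocities pulled toward the best bat); the CBL and UES operators are the paper's additions and are not reproduced.

```python
# One step of the canonical bat algorithm: per-bat frequency f_i sampled in
# [f_min, f_max], velocities pulled toward the current best position.
import numpy as np

def bat_step(x, v, best, f_min=0.0, f_max=2.0, rng=None):
    """One BA iteration over a population x (n, d) with velocities v (n, d)."""
    rng = rng or np.random.default_rng(0)
    beta = rng.random((len(x), 1))
    f = f_min + (f_max - f_min) * beta    # per-bat frequency
    v = v + (x - best) * f                # pull toward the global best
    return x + v, v
```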
BACKGROUND: This study aims to develop and validate a machine learning-based in-hospital mortality predictive model for acute aortic syndrome (AAS) in the emergency department (ED) and to derive a simplified version suitable for rapid clinical application. METHODS: In this multi-center retrospective cohort study, AAS patient data from three hospitals were analyzed. The modeling cohort included data from the First Affiliated Hospital of Zhengzhou University and the People's Hospital of Xinjiang Uygur Autonomous Region, with Peking University Third Hospital data serving as the external test set. Four machine learning algorithms—logistic regression (LR), multilayer perceptron (MLP), Gaussian naive Bayes (GNB), and random forest (RF)—were used to develop predictive models based on 34 early-accessible clinical variables. A simplified model was then derived based on five key variables (Stanford type, pericardial effusion, asymmetric peripheral arterial pulsation, decreased bowel sounds, and dyspnea) via Least Absolute Shrinkage and Selection Operator (LASSO) regression to improve ED applicability. RESULTS: A total of 929 patients were included in the modeling cohort, and 210 were included in the external test set. Four machine learning models based on 34 clinical variables were developed, achieving internal and external validation AUCs of 0.85-0.90 and 0.73-0.85, respectively. The simplified model incorporating five key variables demonstrated internal and external validation AUCs of 0.71-0.86 and 0.75-0.78, respectively. Both models showed robust calibration and predictive stability across datasets. CONCLUSION: Both kinds of models were built with machine learning tools and demonstrated reliable prediction performance and extrapolation ability.
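A sketch of the simplification step, under synthetic placeholders: L1 (LASSO) logistic regression zeroes out most coefficients, and a plain logistic model is refit on the surviving variables (in the real data, the paper's five, e.g. Stanford type and pericardial effusion).

```python
# LASSO-style variable selection followed by a refit on the kept columns.
# 929 rows and 34 columns mirror the modeling cohort; values are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=929, n_features=34, n_informative=5,
                           random_state=0)
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
keep = np.flatnonzero(lasso.coef_[0])        # indices of surviving variables
simple = LogisticRegression(max_iter=1000).fit(X[:, keep], y)
print(f"kept {len(keep)} of {X.shape[1]} variables")
```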