It is not easy to construct a model that describes the geochemical background in geochemical anomaly detection because of the complexity of the geological setting. Isolation forest and its improved variants can detect geochemical anomalies without modeling the complex geochemical background. These methods can effectively extract multivariate anomalies from large volumes of high-dimensional geochemical data with unknown population distribution. To test the performance of these algorithms in detecting mineralization-related geochemical anomalies, isolation forest, extended isolation forest and generalized isolation forest models were established to detect multivariate anomalies in the stream sediment survey data collected in the Wulaga area of Heilongjiang Province. The geochemical anomalies detected by the generalized isolation forest model account for 40% of the study area and contain 100% of the known gold deposits. The geochemical anomalies detected by the isolation forest model account for 20% of the study area and contain 71% of the known gold deposits. The geochemical anomalies detected by the extended isolation forest model account for 34% of the study area and contain 100% of the known gold deposits. Therefore, the isolation forest, extended isolation forest and generalized isolation forest models are comparable in geochemical anomaly detection.
With the changes in educational models, applying computer algorithms and artificial intelligence technologies to data analysis in universities has become a research hotspot in the field of intelligent education. In response to the increasing amount of student data in universities, this study proposes an optimized isolation forest algorithm for feature recognition, used to detect abnormal student behavior concealed in big data for educational management. First, a logistic regression algorithm is used to update the way the isolation forest weights are calculated, and residual statistics are then used to eliminate redundant forests. Finally, discrete particle swarm optimization is used to optimize the isolation forest algorithm. On this basis, improvements are also made to the traditional gated recurrent unit (GRU) network. The two improved models are merged to build an anomaly detection model for college student education data. The experiment shows that the optimized isolation forest algorithm has a recognition accuracy of 0.986 and a training time of 1 s. The recognition accuracy of the improved GRU network is 0.965, and its training time is 0.16 s. In summary, the constructed model can effectively identify abnormal data of college students, thereby helping educators detect students' problems in time and helping students improve their learning status.
Isolation forest and elliptic envelope were used to detect geochemical anomalies, and the bat algorithm was adopted to optimize the parameters of the two models. The two bat-optimized models and their default-parameter counterparts were used to detect multivariate geochemical anomalies in the 1:50,000-scale stream sediment survey data collected from the Helong district, Jilin Province, China. Based on the data modeling results, receiver operating characteristic (ROC) curve analysis was performed to evaluate the performance of the two bat-optimized models and their default-parameter counterparts. The results show that the bat algorithm can improve the performance of the two models in geochemical anomaly detection by optimizing their parameters. The optimal threshold determined by the Youden index was used to identify geochemical anomalies among the geochemical data points. Compared with the anomalies detected by the elliptic envelope models, the anomalies detected by the isolation forest models have a stronger spatial relationship with the mineral occurrences discovered in the study area. According to the results of this study and previous work, it can be inferred that the background population of the study area is complex and therefore not suitable for establishing an elliptic envelope model.
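The Youden-index thresholding mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that each data point already has an anomaly score and a binary label (1 if the point lies near a known mineral occurrence, 0 otherwise); the function name `youden_threshold` and the sample values are hypothetical and are not taken from the study.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    A point is flagged as anomalous when its score >= threshold.
    Assumes labels contain at least one 1 and at least one 0.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_j, best_t = -1.0, None
    # Every distinct score is a candidate threshold.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:  # strict comparison keeps the lowest tied threshold
            best_j, best_t = j, t
    return best_t, best_j


# Hypothetical anomaly scores and labels, for illustration only.
scores = [0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, j_stat = youden_threshold(scores, labels)
print(threshold, j_stat)
```

Scanning every distinct score is quadratic in the number of points, which is acceptable for a sketch; a production version would sort once and sweep.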
The cigarette detection data contains a large amount of true sample data and a small amount of false sample data. The false sample data is regarded as abnormal data, and anomaly detection is performed to distinguish real cigarettes from fake ones. A binary particle swarm optimization (BPSO) algorithm is used to improve the isolation forest construction process: isolation trees with high precision and large differences are selected, which improves the accuracy and efficiency of the algorithm. The distance between the obtained anomaly score and the clustering center of the k-means algorithm is used as the threshold for anomaly judgment. The experimental results show that the accuracy of the BPSO-iForest algorithm is improved compared with the standard iForest algorithm. The experimental results on samples of multiple brands also show that the method in this paper can accurately use the detection data for authenticity identification.
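The abstract above does not spell out how the k-means cluster centers become a decision rule, so the following is only one possible reading, not the authors' procedure: cluster the one-dimensional anomaly scores into two groups and flag a sample when its score is closer to the higher center. The helper names `two_means_1d` and `flag_anomalies` and the sample scores are hypothetical.

```python
def two_means_1d(values, iterations=100):
    """Plain 2-means on 1-D values; returns (low_center, high_center)."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high_group = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high_group:  # all values identical: nothing to separate
            break
        new_lo = sum(low_group) / len(low_group)
        new_hi = sum(high_group) / len(high_group)
        if new_lo == lo and new_hi == hi:  # converged
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def flag_anomalies(scores):
    """Flag a score as anomalous when it is nearer the high center."""
    lo, hi = two_means_1d(scores)
    return [abs(s - hi) < abs(s - lo) for s in scores]


# Hypothetical anomaly scores: five typical samples and two suspicious ones.
scores = [0.41, 0.43, 0.45, 0.44, 0.42, 0.78, 0.81]
flags = flag_anomalies(scores)
print(flags)
```

The appeal of this kind of rule is that the cut-off adapts to each batch of scores instead of relying on a fixed contamination rate.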
Geological analysis, despite being a long-established method for identifying adverse geology in tunnels, has significant limitations due to its reliance on empirical analysis. The quantitative aspects of geochemical anomalies associated with adverse geology provide a novel strategy for addressing these limitations. However, statistical methods for identifying geochemical anomalies are insufficient for tunnel engineering. In contrast, data mining techniques such as machine learning have demonstrated greater efficacy when applied to geological data. Herein, a method for identifying adverse geology using machine learning of geochemical anomalies is proposed. The method identified geochemical anomalies in a tunnel that were not identified by statistical methods. We employed robust factor analysis and self-organizing maps to reduce the dimensionality of the geochemical data and extract the anomaly elements combination (AEC). Using the AEC sample data, we trained an isolation forest model that successfully identified the multi-element anomalies. We then analyzed the adverse geological features based on the multi-element anomalies. This study therefore extends the traditional approach of geological analysis in tunnels and demonstrates that machine learning is an effective tool for intelligent geological analysis. Correspondingly, the research offers new insights regarding adverse geology and the prevention of hazards during the construction of tunnels and underground engineering projects.
Acid production with flue gas is a complex nonlinear process with multiple variables and strong coupling. The operation data is an important basis for state monitoring, optimal control, and fault diagnosis. However, the operating environment of acid production with flue gas is complex and involves a large amount of equipment. The data obtained by the detection equipment is heavily contaminated and prone to abnormal phenomena such as data loss and outliers. Therefore, to solve the problem of abnormal data in the process of acid production with flue gas, a data cleaning method based on an improved random forest is proposed. First, an outlier recognition model based on isolation forest is designed to identify and eliminate the outliers in the dataset. Second, an improved random forest regression model is established, in which a genetic algorithm is used to optimize the hyperparameters of the random forest regression model. The optimal parameter combination is then found in the search space and the trend of the data is predicted. Finally, the improved random forest data cleaning method is used to compensate for the data missing after the abnormal data are eliminated, which completes the data cleaning. Results show that the proposed method can accurately eliminate and compensate for the abnormal data in the process of acid production with flue gas. The method improves the accuracy of compensation for missing data. With the cleaned data, a more accurate model can be established, which is significant for subsequent temperature control. The conversion rate of SO₂ can be further improved, thereby improving the yield of sulfuric acid and the economic benefits.
With the development of the data age, data quality has become a problem that attracts much attention. As a field of data mining, outlier detection is closely related to data quality. The isolation forest algorithm is one of the more prominent outlier detection algorithms for numerical data in recent years. As the isolation forest algorithm keeps generating isolation trees, the differences between the trees gradually decrease or even vanish, which wastes memory and reduces the efficiency of outlier detection. In addition, some of the constructed isolation trees cannot detect outliers. In this paper, an improved iForest-based method, GA-iForest, is proposed. This method optimizes the isolation forest by selecting the better isolation trees according to their detection accuracy and the differences between them, thereby removing duplicate, similar and poorly performing isolation trees and improving the accuracy and stability of outlier detection. In the experiment, the Ubuntu system and the Spark platform are used to build the experimental environment. The outlier datasets provided by ODDS are used for testing. The performance of the proposed method is evaluated with indicators such as accuracy, recall rate, ROC curves, AUC and execution time. Experimental results show that the proposed method not only improves the accuracy and stability of outlier detection, but also reduces the number of isolation trees by 20%-40% compared with the original iForest method.
Anomaly detection is a longstanding and active research area that has many applications in domains such as finance, security and manufacturing. However, the efficiency and performance of anomaly detection algorithms are challenged by the large-scale, high-dimensional and heterogeneous data that are prevalent in the era of big data. Isolation-based unsupervised anomaly detection is a novel and effective approach for identifying anomalies in data. It relies on the idea that anomalies are few and different from normal instances, and thus can be easily isolated by random partitioning. Isolation-based methods have several advantages over existing methods, such as low computational complexity, low memory usage, high scalability, robustness to noise and irrelevant features, and no need for prior knowledge or heavy parameter tuning. In this survey, we review the state-of-the-art isolation-based anomaly detection methods, including their data partitioning strategies, anomaly score functions, and algorithmic details. We also discuss some extensions and applications of isolation-based methods in different scenarios, such as detecting anomalies in streaming data, time series, trajectory and image datasets. Finally, we identify some open challenges and future directions for isolation-based anomaly detection research.
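The idea that anomalies are "few and different" and are therefore isolated by fewer random splits can be sketched in pure Python. This is a minimal illustration of the standard isolation forest scoring scheme, assuming axis-parallel random splits and the usual path-length normalization; it is not the implementation of any method reviewed in the survey, and all names and data below are hypothetical.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random cut until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on this feature
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return {"dim": dim, "cut": cut,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(tree, x, depth=0):
    """Number of splits needed to reach x's leaf, adjusted for leaf size."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["cut"] else tree["right"]
    return path_length(child, x, depth + 1)


def anomaly_scores(data, n_trees=100, sample_size=64, seed=0):
    """Scores in (0, 1); values near 1 mean short paths (likely anomalies)."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-sum(path_length(t, x) for t in trees) / (n_trees * norm))
            for x in data]


# Hypothetical data: a tight 2-D cluster plus one distant point (last entry).
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(60)]
data.append((10.0, 10.0))
scores = anomaly_scores(data)
print(max(range(len(scores)), key=scores.__getitem__))
```

No distance or density is ever computed here, which is the source of the low computational cost and memory usage that the abstract lists as advantages.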
To address the limitation of traditional machine learning models in explaining the rockburst intensity prediction process, this study proposes an interpretable rockburst intensity prediction model. The model was developed using 350 sets of actual rockburst sample data to explore the impact of input metrics on the final rockburst intensity level. The collected data were pre-processed using the isolation forest algorithm and the synthetic minority oversampling technique. The random forest model was optimized through 5-fold cross-validation and the Optuna framework, resulting in an Optuna-random forest (Op-RF) model that generates decision rules through its internal decision trees, utilizing the properties of the random forest model. The model was further interpreted, both locally and globally, using the Shapley additive explanations algorithm. The results demonstrate that the proposed model achieved an area under the curve (AUC) score of 0.984. In comparison with eight other machine learning models, the proposed Op-RF model demonstrated superior accuracy, precision, recall, and F1 score. The model provides a transparent explanation of the prediction process, linking impact characteristics to the final output. Additionally, a cloud deployment method for the rockburst intensity prediction model is provided, and its effectiveness is demonstrated through engineering verification. The proposed model offers a new approach to the application of machine learning in rockburst intensity prediction.
With the rising demand for data access, network service providers face the challenge of growing capital and operating costs while at the same time enhancing network capacity and meeting the increased demand for access. To increase the efficacy of the Software Defined Network (SDN) and Network Function Virtualization (NFV) framework, we need to eradicate network security configuration errors that may create vulnerabilities, affect overall efficiency, reduce network performance, and increase maintenance cost. The existing frameworks are lacking in security, and computer systems face a few abnormalities, which prompts the need for different recognition and mitigation methods to proactively keep the system in an operational state. The fundamental concept behind SDN-NFV is the shift from execution on specific resources to a programming-based structure. This research concerns the combination of SDN and NFV for rational decision making to control and monitor traffic in the virtualized environment. The combination is often seen as an extra burden in terms of resource usage in a heterogeneous network environment, but it also provides a solution to critical problems, especially those regarding massive network traffic. Attacks have been expanding step by step; therefore, they are hard to recognize and protect against with conventional methods. To overcome these issues, there must be an autonomous system that recognizes and characterizes abnormal behavior in the network traffic, if there is any. Only four types of attacks, namely HTTP Flood, UDP Flood, Smurf Flood, and SiDDoS Flood, are considered in the identified dataset, in order to optimize the stability and security management of the SDN-NFV environment through several machine learning based characterization techniques: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression (LR) and Isolation Forest (IF). Python is used for simulation purposes, including several valuable utilities such as the mine package and the open-source Python ML libraries Scikit-learn, NumPy, SciPy, and Matplotlib. A few flood attacks and Structured Query Language (SQL) injection anomalies are validated and effectively identified through the proposed procedure. The classification results are promising and show that the overall accuracy of the SVM, LR, KNN, and IF classifiers lies between 87% and 95% when deciding whether network traffic in the SDN-NFV environment is normal or anomalous.
COVID-19 presents with pneumonia-like symptoms, including fever, cough, and shortness of breath, and may be fatal. Many tests for COVID-19 infection require comprehensive clinical procedures at medical facilities. Clinical studies help to make a correct diagnosis of COVID-19, but in most cases the disease has already spread to the organs by then. Prompt and early diagnosis is indispensable for giving patients the possibility of early clinical treatment and for slowing down the spread of the disease. Clinical investigations in patients with COVID-19 have also revealed distinct patterns of breathing relative to other diseases such as flu and cold, which are worth investigating. Current supervised Machine Learning (ML) based techniques mostly investigate clinical reports such as X-rays and Computerized Tomography (CT) for disease detection. This strategy relies on a larger clinical dataset and does not focus on early symptom identification. Towards this end, an innovative hybrid unsupervised ML technique is introduced to estimate the probability of COVID-19 occurrence based on breathing patterns and the commonly reported symptoms of fever and cough. Specifically, various metrics, including body temperature, breathing and cough patterns, and physical activity, were considered in this study. Finally, a lightweight ML algorithm based on the K-Means and Isolation Forest techniques was implemented on a relatively small dataset of 40 individuals. The proposed technique detects outliers with an accuracy of 89% on average.
With the high-speed development of decentralized applications, account-based blockchain platforms have become a hotbed of various financial scams and hacks due to their anonymity and high financial value. Financial security has become a top priority for the sustainable development of blockchain-based platforms because of the increasing number of cyber attacks, which have resulted in a huge loss of crypto assets in recent years. Therefore, it is imperative to study the real-time detection of cyber attacks to facilitate effective supervision and regulation. To this end, this paper proposes weighted and extended isolation forest algorithms and, by thoroughly studying and summarizing real-world examples, designs a novel framework for the real-time detection of cyber-attack transactions. Furthermore, this study develops a new detection approach for locating the compromised address of a cyber attack, which addresses the scarcity of data on hacked addresses and reduces time consumption. Moreover, three experiments are carried out, both to apply the approach to different types of cyber attacks and to compare it with widely used existing methods. The results demonstrate the high efficiency and generality of the proposed approach. Finally, the lower time consumption and the robustness of the method were validated through additional experiments. In conclusion, the blockchain-oriented approach proposed in this study can handle real-time detection of cyber attacks and has significant scope for applications.
Parking space is usually very limited in major cities, especially Cairo, leading to traffic congestion, air pollution, and driver frustration. Existing car parking systems tend to tackle parking issues in a non-digitized manner. These systems require drivers to search for an empty parking space with no guarantee of finding one, which wastes time and resources and causes unnecessary congestion. To address these issues, this paper proposes a digitized parking system with a proof-of-concept implementation that combines multiple technological concepts into one solution, with the advantage of using IoT for real-time tracking of parking availability. User authentication and automated payments are handled using a quick response (QR) code on entry and exit. Experiments were done on real data collected for six different locations in Cairo via a live popular times library. Several machine learning models were investigated in order to estimate the occupancy rate of certain places. Moreover, a clear analysis of the differences in performance is presented; the final model deployed is XGBoost, which achieved the best results with an R² score of 85.7%.
Aiming at the problem of abnormal data generated by a power transformer on-line monitoring system due to the influences of transformer operation state change, external environmental interference, communication interruption, and other factors, a method of anomaly recognition and differentiation for monitoring data was proposed. Firstly, the empirical wavelet transform (EWT) and the autoregressive integrated moving average (ARIMA) model were used for time series modelling of the monitoring data to obtain the residual sequence reflecting the anomalous monitoring data values; the isolation forest algorithm was then used to identify the abnormal information, and the monitoring sequence was segmented according to the recognition results. Secondly, the segmented sequence was symbolised by the improved multi-dimensional SAX vector representation method, the anomaly pattern was assessed by calculating the similarity score of adjacent symbol vectors, and the correlation between monitoring sequences was further used to verify the assessment. Finally, the case study result shows that the proposed method can reliably recognise abnormal data and accurately distinguish between invalid and valid anomaly patterns.
Funding (bat-optimized isolation forest and elliptic envelope study, Helong district): Supported by the National Natural Science Foundation of China (Nos. 41672322, 41872244).
Funding (adverse geology in tunnels study): Supported by the National Natural Science Foundation of China (Nos. 52279103, 52379103) and the Natural Science Foundation of Shandong Province (No. ZR2023YQ049).
Funding (flue gas acid production data cleaning study): Supported by the National Natural Science Foundation of China (61873006) and the Beijing Natural Science Foundation (4204087, 4212040).
Funding (GA-iForest study): Supported by State Grid Liaoning Electric Power Supply Co., Ltd. through the project "Key Technology and Application Research of the Self-Service Grid Big Data Governance" (No. SGLNXT00YJJS1800110).
Abstract: With the development of the data age, data quality has become one of the problems that attract much attention. As a field of data mining, outlier detection is closely related to data quality. The isolation forest algorithm is one of the more prominent outlier detection algorithms for numerical data in recent years. As the isolation forest algorithm keeps generating isolation trees, the differences between the trees gradually decrease or even vanish, which wastes memory and reduces the efficiency of outlier detection. Moreover, some of the constructed isolation trees cannot detect outliers. In this paper, an improved iForest-based method, GA-iForest, is proposed. This method optimizes the isolation forest by selecting the better isolation trees according to their detection accuracy and the differences between them, thereby removing duplicate, similar, and poorly performing isolation trees and improving the accuracy and stability of outlier detection. In the experiment, an Ubuntu system and the Spark platform are used to build the experimental environment. The outlier datasets provided by ODDS are used as test data. The performance of the proposed method is evaluated according to indicators such as accuracy, recall rate, ROC curves, AUC, and execution time. Experimental results show that the proposed method can not only improve the accuracy and stability of outlier detection, but also reduce the number of isolation trees by 20%-40% compared with the original iForest method.
Funding: Supported by the National Natural Science Foundation of China (No. 62076120) and by the State Key Laboratory for Novel Software Technology at Nanjing University, China (No. KFKT2024A01). Open Access funding enabled and organized by CAUL and its Member Institutions.
Abstract: Anomaly detection is a longstanding and active research area that has many applications in domains such as finance, security, and manufacturing. However, the efficiency and performance of anomaly detection algorithms are challenged by the large-scale, high-dimensional, and heterogeneous data that are prevalent in the era of big data. Isolation-based unsupervised anomaly detection is a novel and effective approach for identifying anomalies in data. It relies on the idea that anomalies are few and different from normal instances, and thus can be easily isolated by random partitioning. Isolation-based methods have several advantages over existing methods, such as low computational complexity, low memory usage, high scalability, robustness to noise and irrelevant features, and no need for prior knowledge or heavy parameter tuning. In this survey, we review the state-of-the-art isolation-based anomaly detection methods, including their data partitioning strategies, anomaly score functions, and algorithmic details. We also discuss some extensions and applications of isolation-based methods in different scenarios, such as detecting anomalies in streaming data, time series, trajectory, and image datasets. Finally, we identify some open challenges and future directions for isolation-based anomaly detection research.
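The core idea stated in this abstract, that anomalies are few and different and are therefore isolated after only a few random partitions, can be illustrated with a small pure-Python sketch of the basic isolation forest. The function names, the synthetic data, and the parameter defaults below are illustrative assumptions, not code from the survey. The score used is the standard isolation forest score, 2^(-E[h(x)]/c(n)), where h(x) is the path length of a point and c(n) is the average path length of an unsuccessful binary search tree lookup over n points.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on a constant feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(point, node, depth=0):
    """Depth at which the point lands, adjusted for unsplit leaf contents."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)

def anomaly_scores(data, n_trees=100, sample_size=128, seed=0):
    """Return one score in (0, 1] per point; higher means more anomalous."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    scores = []
    for p in data:
        mean_depth = sum(path_length(p, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores

# Synthetic example: a Gaussian cluster plus one point far away from it.
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(256)]
data.append((8.0, 8.0))
scores = anomaly_scores(data)
```

Each tree is grown on a random subsample and capped at a depth of about log2 of the subsample size, which is what keeps the computational and memory cost low, as the abstract notes.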
Funding: Financially supported by the National Natural Science Foundation of China (Grant No. 51934003), the Yunnan Major Scientific and Technological Projects (Grant No. 202202AG050014), and the Yunnan Innovation Team (Grant No. 202105AE160023).
Abstract: To address the limitation of traditional machine learning models in explaining the rockburst intensity prediction process, this study proposes an interpretable rockburst intensity prediction model. The model was developed using 350 sets of actual rockburst sample data to explore the impact of input metrics on the final rockburst intensity level. The collected data were pre-processed using the isolation forest algorithm and the synthetic minority oversampling technique. The random forest model was optimized through 5-fold cross-validation and the Optuna framework, resulting in an Optuna-random forest (Op-RF) model that generates decision rules through its internal decision trees, utilizing the properties of the random forest model. The model was further interpreted using the Shapley additive explanations algorithm, both locally and globally. The results demonstrate that the proposed model achieved an area under the curve score of 0.984. In comparison to eight other machine learning models, the proposed Op-RF model demonstrated superior accuracy, precision, recall, and F1 score. The model provides a transparent explanation of the prediction process, linking impact characteristics to the final output. Additionally, a cloud deployment method for the rockburst intensity prediction model is provided and its effectiveness is demonstrated through engineering verification. The proposed model offers a new approach to the application of machine learning in rockburst intensity prediction.
Abstract: With the rising demand for data access, network service providers face the challenge of growing capital and operating costs while at the same time enhancing network capacity and meeting the increased demand for access. To increase the efficacy of the Software Defined Network (SDN) and Network Function Virtualization (NFV) framework, network security configuration errors must be eradicated, since they may create vulnerabilities that affect overall efficiency, reduce network performance, and increase maintenance cost. The existing frameworks are lacking in security, and computer systems face a few abnormalities, which prompts the need for different recognition and mitigation methods to proactively keep the system in an operational state. The fundamental concept behind SDN-NFV is the move from resource-specific execution to a programming-based structure. This research concerns the combination of SDN and NFV for rational decision making to control and monitor traffic in the virtualized environment. The combination is often seen as an extra burden in terms of resource usage in a heterogeneous network environment, but it also provides a solution for critical problems, especially those regarding massive network traffic. Attacks have been expanding step by step; therefore, they are hard to recognize and defend against by conventional methods. To overcome these issues, there must be an autonomous system that recognizes and characterizes abnormal behavior in the network traffic, if there is any. Only four types of attack, namely HTTP Flood, UDP Flood, Smurf Flood, and SiDDoS Flood, are considered in the identified dataset, in order to optimize the stability of the SDN-NFV environment and security management through several machine learning based characterization techniques: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression (LR), and Isolation Forest (IF). Python is used for simulation purposes, including several valuable utilities such as the mine package and the open-source Python ML libraries Scikit-learn, NumPy, SciPy, and Matplotlib. A few flood attacks and Structured Query Language (SQL) injection anomalies are validated and effectively identified through the proposed procedure. The classification results are promising and show that the overall accuracy lies between 87% and 95% for the SVM, LR, KNN, and IF classifiers when determining whether network traffic in the SDN-NFV environment is normal or anomalous.
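A comparison of the four techniques named in this abstract could be set up with Scikit-learn roughly as follows. This is only a sketch under stated assumptions: the two traffic features (packet rate and mean inter-arrival time), the synthetic data, the train/test split, and all parameter values are hypothetical and do not come from the paper or its dataset. SVM, KNN, and LR are trained on labels, while the isolation forest is fitted without labels and its anomaly flags are compared with the labels afterwards.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Hypothetical features: [packets per second, mean inter-arrival time (s)].
normal = rng.normal(loc=[100.0, 0.5], scale=[20.0, 0.1], size=(900, 2))
attack = rng.normal(loc=[900.0, 0.05], scale=[100.0, 0.02], size=(100, 2))
X = np.vstack([normal, attack])
y = np.concatenate([np.zeros(900, dtype=int), np.ones(100, dtype=int)])  # 1 = attack

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Supervised classifiers; features are standardized because SVM and KNN
# are sensitive to feature scale.
classifiers = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "LR": make_pipeline(StandardScaler(), LogisticRegression()),
}

accuracy = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    accuracy[name] = accuracy_score(y_test, clf.predict(X_test))

# Isolation forest: fitted without labels; predict() returns -1 for anomalies.
iso = IsolationForest(contamination=0.1, random_state=0).fit(X_train)
iso_pred = (iso.predict(X_test) == -1).astype(int)
accuracy["IF"] = accuracy_score(y_test, iso_pred)
```

Accuracy alone can be misleading when attacks are a small share of the traffic, so a real evaluation would usually add precision and recall per class.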
Funding: This work is sponsored by a Universiti Sains Malaysia Research Grant (RUI: 1001/PELECT/8014049).
Abstract: The COVID-19 virus exhibits pneumonia-like symptoms, including fever, cough, and shortness of breath, and may be fatal. Many tests for COVID-19 contraction require comprehensive clinical procedures at medical facilities. Clinical studies help to make a correct diagnosis of COVID-19, but in most cases the disease has already spread to the organs by then. Prompt and early diagnosis is indispensable for providing patients with the possibility of early clinical treatment and for slowing down the spread of the disease. Clinical investigations in patients with COVID-19 have revealed distinct patterns of breathing relative to other diseases such as flu and cold, which are worth investigating. Current supervised Machine Learning (ML) based techniques mostly investigate clinical reports such as X-rays and Computerized Tomography (CT) for disease detection. This strategy relies on a larger clinical dataset and does not focus on early symptom identification. Towards this end, an innovative hybrid unsupervised ML technique is introduced to uncover the probability of COVID-19 occurrence based on breathing patterns and the commonly reported symptoms of fever and cough. Specifically, various metrics, including body temperature, breathing and cough patterns, and physical activity, were considered in this study. Finally, a lightweight ML algorithm based on the K-Means and Isolation Forest techniques was implemented on a relatively small dataset covering 40 individuals. The proposed technique achieves outlier detection with an accuracy of 89% on average.
Funding: Supported by the National Natural Science Foundation of China (72171059, 71771041, 72121001), the Fundamental Research Funds for the Central Universities (FRFCU5710000220), and the Natural Science Foundation of Heilongjiang Province, China (No. YQ2020G003).
Abstract: With the high-speed development of decentralized applications, account-based blockchain platforms have become a hotbed of various financial scams and hacks due to their anonymity and high financial value. Financial security has become a top priority for the sustainable development of blockchain-based platforms because of an increasing number of cyber attacks, which have resulted in a huge loss of crypto assets in recent years. Therefore, it is imperative to study the real-time detection of cyber attacks to facilitate effective supervision and regulation. To this end, this paper proposes the weighted and extended isolation forest algorithms and designs a novel framework for the real-time detection of cyber-attack transactions by thoroughly studying and summarizing real-world examples. Furthermore, this study develops a new detection approach for locating the compromised address of a cyber attack, to resolve the data scarcity of hack addresses and reduce time consumption. Three experiments are carried out, both to apply the approach to different types of cyber attacks and to compare it with widely used existing methods. The results demonstrate the high efficiency and generality of the proposed approach. Finally, the lower time consumption and robustness of the method were validated through additional experiments. In conclusion, the blockchain-oriented approach proposed in this study can handle real-time detection of cyber attacks and has significant scope for applications.
Abstract: Parking space is usually very limited in major cities, especially Cairo, leading to traffic congestion, air pollution, and driver frustration. Existing car parking systems tend to tackle parking issues in a non-digitized manner. These systems require drivers to search for an empty parking space with no guarantee of finding one, wasting time and resources and causing unnecessary congestion. To address these issues, this paper proposes a digitized parking system with a proof-of-concept implementation that combines multiple technological concepts into one solution, with the advantage of using IoT for real-time tracking of parking availability. User authentication and automated payments are handled using a quick response (QR) code on entry and exit. Some experiments were done on real data collected for six different locations in Cairo via a live popular times library. Several machine learning models were investigated in order to estimate the occupancy rate of certain places. Moreover, a clear analysis of the differences in performance is presented, with the final deployed model being XGBoost. It achieved the best results, with an R² score of 85.7%.
Funding: Supported by State Grid Hebei Electric Power Co., Ltd. (kj2020-040).
Abstract: Aiming at the problem of abnormal data generated by a power transformer on-line monitoring system due to the influences of transformer operation state change, external environmental interference, communication interruption, and other factors, a method of anomaly recognition and differentiation for monitoring data was proposed. First, the empirical wavelet transform (EWT) and the autoregressive integrated moving average (ARIMA) model were used for time series modelling of the monitoring data to obtain the residual sequence reflecting the anomalous monitoring data values. The isolation forest algorithm was then used to identify the abnormal information, and the monitoring sequence was segmented according to the recognition results. Second, the segmented sequence was symbolised by the improved multi-dimensional SAX vector representation method, the anomaly pattern was assessed by calculating the similarity score of adjacent symbol vectors, and the correlation between monitoring sequences was further used to verify the assessment. Finally, the case study result shows that the proposed method can reliably recognise abnormal data and accurately distinguish between invalid and valid anomaly patterns.