Abstract: As educational models change, applying computer algorithms and artificial intelligence to data analysis in universities has become a research hotspot in intelligent education. In response to the growing volume of student data in universities, this study proposes an optimized isolation forest algorithm for feature recognition to detect abnormal student behavior concealed in educational-management big data. First, a logistic regression algorithm is used to update how the isolation forest weights are calculated, and residual statistics are then used to eliminate redundant forests. Finally, discrete particle swarm optimization is used to optimize the isolation forest algorithm. On this basis, the traditional gated recurrent unit (GRU) network is also improved. The two improved models are merged to build an anomaly detection model for college student education data. Experiments show that the optimized isolation forest algorithm reaches a recognition accuracy of 0.986 with a training time of 1 s, and the improved GRU network reaches a recognition accuracy of 0.965 with a training time of 0.16 s. In summary, the constructed model can effectively identify abnormal data of college students, helping educators detect students' problems in time and helping students improve their learning status.
Funding: Supported by projects of the National Natural Science Foundation of China (Nos. 41272360, 41472299, 41672322).
Abstract: Constructing a statistical model that best fits the background is a key step in geochemical anomaly identification, but such a model is hard to construct when the sample population has an unknown and/or complex distribution. Isolation forest is an outlier detection approach that explicitly isolates anomalous samples rather than modeling the population distribution. It can extract multivariate anomalies from very large, high-dimensional data with unknown population distribution. For this reason, we tentatively applied the method to identify multivariate anomalies in the stream sediment survey data of the Lalingzaohuo district, an area with a complex geological setting in Qinghai Province, China. The performance of the isolation forest algorithm in anomaly identification was compared with that of a continuous restricted Boltzmann machine. The results show that the isolation forest model outperforms the continuous restricted Boltzmann machine in multivariate anomaly identification in terms of the receiver operating characteristic (ROC) curve, the area under the curve (AUC), and data-processing efficiency. The anomalies identified by the isolation forest model occupy 19% of the study area and contain 82% of the known mineral deposits, whereas the anomalies identified by the continuous restricted Boltzmann machine occupy 35% of the study area and contain 88% of the known mineral deposits. Processing the dataset takes 4.07 s and 279.36 s with the two models, respectively. Therefore, isolation forest is a useful anomaly detection method that can quickly extract multivariate anomalies from geochemical exploration data.
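The ROC/AUC comparison described in this abstract can be reproduced from any set of anomaly scores and known labels. The sketch below is a minimal, library-free illustration that computes the AUC as the probability that a randomly chosen positive sample outranks a randomly chosen negative one. The scores and labels are invented for illustration and are not taken from the paper.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical anomaly scores for six samples.
# Label 1 marks a sample located at a known deposit (assumed ground truth).
scores = [0.81, 0.42, 0.77, 0.38, 0.55, 0.40]
labels = [1, 0, 1, 0, 0, 1]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; a rank-based formulation would be preferable for large survey datasets.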
Funding: Supported by the National Natural Science Foundation of China (60574047), the National High Technology Research and Development Program of China (2007AA04Z168, 2009AA04Z154), and the Research Fund for the Doctoral Program of Higher Education in China (20050335018).
Abstract: In industrial processes, some faults have complex effects on process variables. Complex and simple faults are defined according to their effect dimensions. Conventional approaches based on structured residuals cannot isolate complex faults. This paper presents a multi-level strategy for complex fault isolation. An extraction procedure is employed to reduce the complex faults to simple ones and assign them to several levels. On each level, faults are isolated by their different responses in the structured residuals. Each residual is designed to be insensitive to one fault but more sensitive to the others. Faults on different levels are verified to have different residual responses and will not be confused with one another. An overall incidence matrix containing the residual response characteristics of all faults is obtained, based on which faults can be isolated. The proposed method is applied to the Tennessee Eastman process example, and its effectiveness and advantages are demonstrated.
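The final isolation step, matching observed residual responses against an incidence matrix, can be sketched as a simple column lookup. The matrix, the fault indices, and the function name `isolate_fault` below are hypothetical. The sketch shows only the generic structured-residual idea (residual i is insensitive to fault i), not the paper's multi-level extraction procedure.

```python
def isolate_fault(incidence, signature):
    """Return the indices of faults whose incidence column equals the signature.

    incidence[r][f] is 1 if residual r responds to fault f, else 0.
    signature[r] is 1 if residual r exceeded its detection limit, else 0.
    """
    n_faults = len(incidence[0])
    matches = []
    for f in range(n_faults):
        column = [row[f] for row in incidence]
        if column == signature:
            matches.append(f)
    return matches


# Hypothetical 3-residual, 3-fault incidence matrix (rows: residuals, columns: faults).
# Residual i is insensitive (0) to fault i and sensitive (1) to the other faults.
incidence = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
observed = [1, 0, 1]  # residuals 0 and 2 fired, residual 1 stayed quiet
print(isolate_fault(incidence, observed))
```

An empty result means the observed pattern matches no known fault column, and more than one match would mean two faults share a column and cannot be told apart with these residuals.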
Funding: Supported by the National Key Basic Research Development Planning Project (No. 2015CB453005).
Abstract: Methods for geochemical anomaly detection are usually based on statistical models that assume the sample population follows a specific distribution, which may reduce detection performance. In this paper, the isolation forest model is used to detect geochemical anomalies; it does not require the geochemical data to follow a particular distribution. Isolation trees are constructed and the average path length of each sample across the trees is computed; the resulting anomaly scores are used to characterize the anomaly and background fields, and an optimal threshold is selected to identify geochemical anomalies. Taking the 1:200,000 geochemical exploration data of the Fusong area in Jilin Province, NE China, as an example, Fe2O3 and Pb were selected as indicator elements to identify geochemical anomalies, and the results were compared with those of traditional statistical methods. The results show that the isolation forest model can effectively identify univariate geochemical anomalies, and the identified anomalies have a significant spatial correlation with known mine locations. Moreover, it can identify both high-value anomalies and weak anomalies.
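The path-length scoring described in this abstract can be illustrated with a small, self-contained isolation forest. This is a minimal sketch of the standard algorithm (random attribute, random split value, score 2^(-E[h(x)]/c(n))), not the authors' implementation. The two-dimensional toy data, the seeds, and all function names are assumptions made for the example.

```python
import math
import random


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    harmonic = sum(1.0 / k for k in range(1, n))  # H(n-1), computed exactly
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random attribute at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # no spread on this attribute, so stop splitting here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node):
    """Depth at which x reaches a leaf, plus c(size) for unsplit leaf contents."""
    depth = 0
    while "size" not in node:
        node = node["left"] if x[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + c(node["size"])


def anomaly_scores(data, n_trees=100, sample_size=256, seed=0):
    """Score every point; values near 1 are anomalous, values near 0.5 or below are not."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    scores = []
    for x in data:
        mean_path = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_path / c(psi)))
    return scores


# Toy data (hypothetical): 50 background points in the unit square and one far outlier.
data_rng = random.Random(42)
data = [(data_rng.random(), data_rng.random()) for _ in range(50)]
data.append((10.0, 10.0))
scores = anomaly_scores(data)
top = max(range(len(data)), key=lambda i: scores[i])
print(top, round(scores[top], 3))
```

The isolated point is separated after very few random splits, so its average path length is short and its score is the highest. Choosing the cutoff that separates anomaly from background is a separate step and is not covered by this sketch.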
Abstract: Cigarette detection data contain a large number of genuine samples and a small number of fake samples. The fake samples are regarded as abnormal data, and anomaly detection is performed to distinguish real cigarettes from fake ones. A binary particle swarm optimization (BPSO) algorithm is used to improve the isolation forest construction process by selecting isolation trees with high precision and large differences, which improves the accuracy and efficiency of the algorithm. The distance between the obtained anomaly score and the cluster center found by the k-means algorithm is used as the threshold for anomaly judgment. The experimental results show that the BPSO-iForest algorithm is more accurate than the standard iForest algorithm. Experimental results on samples from multiple brands also show that the proposed method can accurately use the detection data for authenticity identification.
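One plausible reading of the k-means thresholding step is sketched below: cluster the one-dimensional anomaly scores into two groups and flag a sample as abnormal when it lies closer to the high-score center. The abstract does not give the exact rule, so the midpoint threshold, the initialization at the minimum and maximum score, and the sample scores are all assumptions.

```python
def two_means_threshold(scores, iterations=100):
    """Run 1-D k-means with k=2 and return (low_center, high_center, threshold)."""
    low, high = min(scores), max(scores)
    if low == high:
        raise ValueError("scores are all identical; nothing to separate")
    for _ in range(iterations):
        # Assign each score to its nearest center (ties go to the low cluster).
        low_group = [s for s in scores if abs(s - low) <= abs(s - high)]
        high_group = [s for s in scores if abs(s - low) > abs(s - high)]
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group)
        if new_low == low and new_high == high:
            break  # centers stopped moving
        low, high = new_low, new_high
    # A score above the midpoint is closer to the high (abnormal) center.
    return low, high, (low + high) / 2.0


# Hypothetical anomaly scores: five genuine-looking samples and two suspicious ones.
scores = [0.41, 0.44, 0.39, 0.46, 0.43, 0.82, 0.79]
low_center, high_center, threshold = two_means_threshold(scores)
flagged = [s for s in scores if s > threshold]
print(round(threshold, 4), flagged)
```

Deriving the threshold from the score distribution avoids fixing a contamination rate in advance, which helps when the share of fake samples varies between brands.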
Funding: Supported by the National Natural Science Foundation of China (Nos. 41672322, 41872244).
Abstract: Isolation forest and elliptic envelope were used to detect geochemical anomalies, and the bat algorithm was adopted to optimize the parameters of the two models. The two bat-optimized models and their default-parameter counterparts were used to detect multivariate geochemical anomalies in the 1:50,000-scale stream sediment survey data collected from the Helong district, Jilin Province, China. Based on the modeling results, receiver operating characteristic (ROC) curve analysis was performed to evaluate the performance of the two bat-optimized models and their default-parameter counterparts. The results show that the bat algorithm can improve the performance of both models in geochemical anomaly detection by optimizing their parameters. The optimal threshold determined by the Youden index was used to identify geochemical anomalies among the geochemical data points. Compared with the anomalies detected by the elliptic envelope models, the anomalies detected by the isolation forest models have a stronger spatial association with the mineral occurrences discovered in the study area. According to the results of this study and previous work, it can be inferred that the background population of the study area is complex and therefore not suitable for establishing an elliptic envelope model.
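The Youden index selects the score cutoff that maximizes J = sensitivity + specificity - 1, which equals the true positive rate minus the false positive rate. The sketch below scans every distinct score as a candidate threshold. The scores and labels are invented for illustration, and treating `score >= threshold` as the anomaly rule is an assumption.

```python
def youden_threshold(scores, labels):
    """Return (best_threshold, best_j); a sample is flagged when score >= threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / n_pos - fp / n_neg  # Youden's J = TPR - FPR
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Hypothetical anomaly scores; label 1 marks a sample at a known mineral occurrence.
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
best_t, best_j = youden_threshold(scores, labels)
print(best_t, round(best_j, 3))
```

In this toy case the cutoff 0.60 captures all three positives at the cost of one false positive, giving J = 1.0 - 0.2 = 0.8. Because the strict comparison keeps the first maximum found, ties resolve to the higher threshold.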
Abstract: Because of the complexity of the geological setting, it is not easy to construct a model that describes the geochemical background in geochemical anomaly detection. Isolation forest and its improved variants can detect geochemical anomalies without modeling the complex geochemical background. These methods can effectively extract multivariate anomalies from large volumes of high-dimensional geochemical data with unknown population distribution. To test the performance of these algorithms in detecting mineralization-related geochemical anomalies, isolation forest, extended isolation forest, and generalized isolation forest models were established to detect multivariate anomalies in the stream sediment survey data collected in the Wulaga area in Heilongjiang Province. The geochemical anomalies detected by the generalized isolation forest model account for 40% of the study area and contain 100% of the known gold deposits. The geochemical anomalies detected by the isolation forest model account for 20% of the study area and contain 71% of the known gold deposits. The geochemical anomalies detected by the extended isolation forest model account for 34% of the study area and contain 100% of the known gold deposits. Therefore, the isolation forest, extended isolation forest, and generalized isolation forest models are comparable in geochemical anomaly detection.
Funding: Supported by State Grid Liaoning Electric Power Supply Co., Ltd., with financial support from the project “Key Technology and Application Research of the Self-Service Grid Big Data Governance” (No. SGLNXT00YJJS1800110).
Abstract: In the data age, data quality has become a problem that attracts much attention. As a field of data mining, outlier detection is closely related to data quality. The isolation forest algorithm is one of the more prominent outlier detection algorithms for numerical data in recent years. As the isolation forest algorithm keeps generating isolation trees, the differences between trees gradually shrink or even vanish, which wastes memory and reduces the efficiency of outlier detection. In addition, some of the constructed isolation trees cannot detect outliers. In this paper, an improved iForest-based method, GA-iForest, is proposed. The method optimizes the isolation forest by selecting better isolation trees according to their detection accuracy and the differences between trees, thereby discarding duplicate, similar, and poorly performing trees and improving the accuracy and stability of outlier detection. The experimental environment was built on Ubuntu and the Spark platform, and the outlier datasets provided by ODDS were used for testing. The performance of the proposed method is evaluated using accuracy, recall, ROC curves, AUC, and execution time. Experimental results show that the proposed method not only improves the accuracy and stability of outlier detection but also reduces the number of isolation trees by 20%-40% compared with the original iForest method.
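Selecting a subset of trees by accuracy and mutual difference can be framed as a search over bit masks. The sketch below assumes that "GA" denotes a genetic algorithm, which the abstract does not spell out, and uses a generic textbook GA (tournament selection, one-point crossover, bit-flip mutation, elitism). The fitness function, its weight, and the per-tree accuracy and pairwise difference values are hypothetical stand-ins, not the paper's definitions.

```python
import random


def fitness(mask, accuracy, difference, weight=0.5):
    """Mean accuracy of the selected trees plus a weighted mean pairwise difference."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    if len(chosen) < 2:
        return 0.0  # a forest needs at least two trees to measure difference
    mean_acc = sum(accuracy[i] for i in chosen) / len(chosen)
    pairs = [(i, j) for a, i in enumerate(chosen) for j in chosen[a + 1:]]
    mean_diff = sum(difference[i][j] for i, j in pairs) / len(pairs)
    return mean_acc + weight * mean_diff


def select_trees(accuracy, difference, pop_size=20, generations=30, seed=0):
    """Return (best_mask, history), where history holds the best fitness per generation."""
    rng = random.Random(seed)
    n = len(accuracy)

    def score(mask):
        return fitness(mask, accuracy, difference)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        children = [population[0][:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            # Tournament selection: the best of three random candidates becomes a parent.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < 0.05 else b for b in child]  # mutation
            children.append(child)
        population = children
    best = max(population, key=score)
    history.append(score(best))
    return best, history


# Hypothetical inputs: per-tree accuracies and a symmetric pairwise difference matrix.
data_rng = random.Random(1)
n_trees = 10
accuracy = [data_rng.uniform(0.6, 0.95) for _ in range(n_trees)]
difference = [[0.0] * n_trees for _ in range(n_trees)]
for i in range(n_trees):
    for j in range(i + 1, n_trees):
        difference[i][j] = difference[j][i] = data_rng.random()

best_mask, history = select_trees(accuracy, difference)
print(best_mask, round(history[-1], 3))
```

Because the best mask is copied unchanged into each new generation, the recorded best fitness never decreases. The trees whose bits are 1 in `best_mask` form the reduced forest.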
Funding: National Natural Science Foundation of China under Grant Nos. 61803152, 31920103016, and 11871475; Doctoral Start-Up Foundation of Hunan Normal University under Grant No. 0531120-3827; Hunan Provincial Education Department under Grant No. HNKCSZ-2020-0813.
Abstract: This study aims to improve control schemes for COVID-19 using a numerical model with parameter estimation. We established a multi-level, multi-objective nonlinear SEIDR model to simulate virus transmission. The early spread in Japan was adopted as a case study. The first 96 days after the first infection were divided into five stages, and the parameters were estimated for each stage. We then analyzed the trends of the parameter values, the age structure ratio, and the defined PCR test index (a standardization of the scale of PCR testing). Stepwise regression showed that the self-healing rate and the confirmed rate were linear in the age structure ratio and the PCR test index. The transmission rates were related to the age structure ratio, the PCR test index, and the isolation efficiency. According to the simulation results, both isolation measures and PCR-based medical screening can effectively reduce the number of infected cases. However, the strategy of increasing PCR-based screening would encounter a bottleneck in virus control when the index reached 0.3. The effectiveness of the policy would then decrease, and the basic reproduction number reached its extreme value at 0.6. Through simulation, this study identified a feasible combination of isolation and PCR testing. The isolation intensity could be adjusted to compensate for insufficient PCR testing in order to control the pandemic.