Abstract: With the changes in educational models, applying computer algorithms and artificial intelligence technologies to data analysis in universities has become a research hotspot in the field of intelligent education. In response to the increasing amount of student data in universities, this study proposes an optimized isolation forest algorithm with feature recognition to detect abnormal student behavior concealed in educational management big data. First, it uses a logistic regression algorithm to update how the isolation forest weights are calculated, and then uses residual statistics to eliminate redundant forests. Finally, it applies discrete particle swarm optimization to optimize the isolation forest algorithm. On this basis, the traditional gated recurrent unit (GRU) network is also improved. The two improved models are merged to build an anomaly detection model for college student education data. Experiments show that the optimized isolation forest algorithm achieves a recognition accuracy of 0.986 with a training time of 1 s, and the improved GRU network achieves a recognition accuracy of 0.965 with a training time of 0.16 s. In summary, the constructed model can effectively identify abnormal data of college students, helping educators detect students' problems in time and helping students improve their learning status.
Funding: Supported by State Grid Liaoning Electric Power Supply Co., Ltd. through the project "Key Technology and Application Research of the Self-Service Grid Big Data Governance" (No. SGLNXT00YJJS1800110).
Abstract: With the development of the data age, data quality has become a problem that receives much attention. As a field of data mining, outlier detection is closely related to data quality. The isolation forest (iForest) algorithm is one of the more prominent outlier detection algorithms for numerical data in recent years. As the algorithm keeps generating isolation trees, the differences among the trees gradually decrease or even vanish, which wastes memory and reduces the efficiency of outlier detection. In addition, some of the constructed isolation trees cannot detect outliers. This paper proposes GA-iForest, an improved iForest-based method. It optimizes the isolation forest by selecting better isolation trees according to their detection accuracy and their difference from one another, thereby discarding duplicate, similar, and poorly performing trees and improving the accuracy and stability of outlier detection. The experimental environment is built on Ubuntu and the Spark platform, and the outlier datasets provided by ODDS are used for testing. The proposed method is evaluated in terms of accuracy, recall, ROC curves, AUC, and execution time. Experimental results show that the proposed method not only improves the accuracy and stability of outlier detection but also reduces the number of isolation trees by 20%-40% compared with the original iForest method.
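For readers unfamiliar with the baseline that this abstract builds on, the following is a minimal sketch of a plain isolation forest, not the GA-iForest method itself. It assumes the standard iForest scoring rule, score = 2^(-E[h(x)] / c(psi)), where h(x) is the path length of a point in a tree and c(psi) normalizes by the subsample size. All function names, the toy data, and the parameter defaults are illustrative assumptions and do not come from the paper.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random cut value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # no split possible on this feature
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return {
        "dim": dim,
        "cut": cut,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node):
    """Depth at which x lands in a leaf, plus a correction for unsplit leaf points."""
    depth = 0
    while "size" not in node:
        node = node["left"] if x[node["dim"]] < node["cut"] else node["right"]
        depth += 1
    return depth + c_factor(node["size"])


def build_forest(data, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees, each on a random subsample of the data."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """Score in (0, 1); values closer to 1 indicate points that are easier to isolate."""
    mean_path = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(psi))


# Toy data (hypothetical): a Gaussian cluster plus one far-away point.
demo_rng = random.Random(42)
demo_data = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(200)]
demo_data.append((8.0, 8.0))
demo_trees, demo_psi = build_forest(demo_data)
print("outlier:", anomaly_score((8.0, 8.0), demo_trees, demo_psi))
print("inlier: ", anomaly_score((0.0, 0.0), demo_trees, demo_psi))
```

A tree-selection scheme such as the one the abstract describes would operate on the list returned by `build_forest`, keeping only a subset of the trees before scoring; the selection criteria themselves are not reproduced here.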
Funding: Supported by the National Natural Science Foundation of China (Nos. 41672322, 41872244).
Abstract: Isolation forest and elliptic envelope models were used to detect geochemical anomalies, and the bat algorithm was adopted to optimize the parameters of the two models. The two bat-optimized models and their default-parameter counterparts were used to detect multivariate geochemical anomalies in 1:50,000-scale stream sediment survey data collected from the Helong district, Jilin Province, China. Based on the modeling results, receiver operating characteristic (ROC) curve analysis was performed to evaluate the performance of the two bat-optimized models and their default-parameter counterparts. The results show that the bat algorithm can improve the performance of both models in geochemical anomaly detection by optimizing their parameters. The optimal threshold determined by the Youden index was used to identify geochemical anomalies among the geochemical data points. Compared with the anomalies detected by the elliptic envelope models, those detected by the isolation forest models show a stronger spatial association with the mineral occurrences discovered in the study area. From the results of this study and previous work, it can be inferred that the background population of the study area is complex, which makes it unsuitable for fitting an elliptic envelope model.
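The Youden index step mentioned in this abstract can be illustrated with a short sketch. It assumes the usual definition J = sensitivity + specificity - 1 (equivalently, true positive rate minus false positive rate) and picks the score threshold that maximizes J. The function name, the scores, and the labels below are hypothetical and are not data from the study; in the study's setting the labels would presumably mark samples near known mineral occurrences.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = TPR - FPR.

    A sample is flagged as anomalous when its score >= threshold.
    labels use 1 for a known positive and 0 for a negative.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate cut
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical anomaly scores and labels, for illustration only.
example_scores = [0.1, 0.2, 0.3, 0.4, 0.7, 0.8, 0.9]
example_labels = [0, 0, 0, 1, 0, 1, 1]
threshold, j_value = youden_threshold(example_scores, example_labels)
print(threshold, j_value)
```

Scanning every observed score is quadratic in the number of samples, which is fine for a sketch; a production version would sort once and accumulate counts.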