Abstract: As cash register systems have become prevalent in shopping malls, detecting abnormal states of these systems has become a topic of growing interest. This paper analyzes the transaction data of a shopping mall. To measure how much records differ, the coefficient of variation is used as the attribute weight, the weighted Euclidean distance is used to compute the degree of difference, and k-means clustering is used to group the time periods. The LOF algorithm is then applied to measure the outlier degree of the transaction data in each time period: an initial threshold is set to detect outliers, the outliers are deleted, and SAX detection is performed on the remaining data set. If the data set does not pass the test, the outlying domain is gradually expanded and the process is repeated, optimizing the outlier threshold so as to improve the sensitivity of the detection algorithm and reduce false positives.
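The weighting step in this abstract can be sketched as follows. This is a minimal illustration and not the paper's implementation: the abstract does not say how the coefficients of variation are normalized or exactly how the weights enter the distance, so normalizing them to sum to 1 and multiplying each squared difference by its weight are assumptions, as are the function names and the sample records.

```python
import math
import statistics


def cv_weights(rows):
    """Per-attribute weights from the coefficient of variation (std / |mean|).

    Assumption: weights are normalized to sum to 1, and every attribute
    has a nonzero mean and at least one attribute varies.
    """
    columns = list(zip(*rows))
    cvs = [statistics.pstdev(col) / abs(statistics.fmean(col)) for col in columns]
    total = sum(cvs)
    return [cv / total for cv in cvs]


def weighted_euclidean(a, b, weights):
    """Euclidean distance with each squared difference scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


# Hypothetical transaction records: (amount, number of items).
transactions = [(120.0, 3), (80.0, 2), (100.0, 4), (900.0, 3)]

weights = cv_weights(transactions)
distance = weighted_euclidean(transactions[0], transactions[3], weights)
print(weights, distance)
```

In this toy data the amount column varies far more relative to its mean than the item count, so it receives the larger weight and dominates the distance between records.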
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11971323, 12031016).
Abstract: Model averaging is a good alternative to model selection: it can account for the uncertainty of the model selection process and make full use of the information from the various candidate models. However, most existing model averaging criteria do not consider the influence of outliers on the estimation procedures. The purpose of this paper is to develop a robust model averaging approach based on the local outlier factor (LOF) algorithm, which can downweight outliers in the covariates. Asymptotic optimality of the proposed robust model averaging estimator is derived under some regularity conditions. Further, we prove that the LOF-based weight estimator is consistent, converging to the theoretically optimal weight vector. Numerical studies, including Monte Carlo simulations and a real data example, are provided to illustrate the proposed methodology.
Funding: Funded by a science and technology project of State Grid Corporation of China, “Comparative Analysis of Long-Term Measurement and Prediction of the Ground Synthetic Electric Field of ±800 kV DC Transmission Line” (GYW11201907738). Paulo R. F. Rocha acknowledges the support and funding from the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Program (Grant Agreement No. 947897).
Abstract: Ultra-high voltage (UHV) transmission lines are an important part of China’s power grid and are often surrounded by a complex electromagnetic environment. The ground total electric field is considered a main electromagnetic environment indicator of UHV transmission lines and is currently employed for reliable long-term operation of the power grid. Yet the accurate prediction of the ground total electric field remains a technical challenge. In this work, we collected total electric field data from the Ningdong-Zhejiang ±800 kV UHVDC transmission project (the Ling Shao line) and performed an outlier analysis of the data. We show that the Local Outlier Factor (LOF) elimination algorithm yields a small average difference and outperforms the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Isolation Forest elimination algorithms. Moreover, the Stacking algorithm is found to have higher prediction accuracy than a variety of similar prediction algorithms, including the traditional finite element method. The low prediction error of the Stacking algorithm highlights its ability to accurately forecast the ground total electric field of UHVDC transmission lines.
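For readers unfamiliar with LOF-based outlier elimination, the idea can be sketched in plain Python. This is a textbook-style illustration under stated assumptions, not the authors' pipeline: the neighbor count `k`, the cutoff `threshold`, the function name `lof_scores`, and the integer readings are all hypothetical, and the sketch assumes the data contain no heavily duplicated values (which would make a local density infinite).

```python
def lof_scores(values, k=2):
    """Local Outlier Factor for 1-D data; scores well above 1 suggest outliers."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]

    k_dist, neighbors = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]  # distance to the k-th nearest neighbor
        k_dist.append(kd)
        # The k-neighborhood keeps every point tied at the k-distance.
        neighbors.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean neighbor density divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical field readings in arbitrary integer units; 250 is a planted spike.
readings = [101, 103, 99, 102, 100, 250]
threshold = 2.0  # assumed cutoff, not taken from the paper

scores = lof_scores(readings, k=2)
flagged = [v for v, s in zip(readings, scores) if s > threshold]
cleaned = [v for v, s in zip(readings, scores) if s <= threshold]
print(flagged, cleaned)
```

Points inside the dense cluster score close to 1, while the isolated reading scores far higher, so a fixed cutoff separates them in this toy case. On real measurements both `k` and the cutoff would need tuning.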
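Abstract: To address the monitoring of mixed-type data containing ordinal and nominal variables, a multivariate mixed-type data control chart based on the LOF algorithm is proposed (mixed-type data local outlier factor control chart, MLOF). When monitoring changes in process variables, the chart takes full account of the rank characteristics of ordinal variables and the information entropy of nominal variables, and measures the degree of abnormality of each observation based on data density. A simulation case based on a credit card application data set and a real example based on the German credit card data set are used to compare the MLOF control chart with existing mixed-type data control charts in outlier detection. The simulation case covers 30 monitoring scenarios. The results show that the MLOF control chart has the best overall performance in 57% of the scenarios. The real example also confirms that the MLOF control chart is better suited to monitoring mixed-type data with large data volumes and complex clustering structure.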