Complex industrial processes often have multiple operating modes and exhibit time-varying behavior. The data in one mode may follow specific Gaussian or non-Gaussian distributions. In this paper, a numerically efficient moving-window local outlier probability algorithm is proposed. Its key feature is the capability to handle complex data distributions and incursive operating condition changes, including slow dynamic variations and instant mode shifts. First, a two-step adaptation approach is introduced, and designed updating rules are applied to keep the monitoring model up to date. Then, a semi-supervised monitoring strategy is developed with an updating switch rule to deal with mode changes. Based on local probability models, the algorithm has a superior ability to detect faulty conditions and to adapt quickly to slow variations and new operating modes. Finally, the utility of the proposed method is demonstrated with a numerical example and a non-isothermal continuous stirred tank reactor.
The heterogeneous nodes in the Internet of Things (IoT) are relatively weak in computing power and storage capacity. Therefore, traditional network security algorithms are not suitable for the IoT. When these nodes alternate between normal and anomalous behavior, it is difficult for the network system to identify and isolate them in a short time, so data transmission accuracy and the integrity of network functions are negatively affected. Based on the characteristics of the IoT, a lightweight local outlier factor detection method is used for node detection. To further determine whether a node is anomalous, this research considers how the behavior of those nodes varies over time, and a time series method is used so that the system responds effectively, within a short period, to the randomness and selectiveness of anomalous nodes. Simulation results show that the proposed method can improve the accuracy of the data transmitted by the network and achieve better performance.
As data services rapidly penetrate daily life, the mobile network becomes more complicated and the amount of data transmitted keeps increasing. In this case, the traditional statistical methods for anomalous cell detection cannot adapt to the evolution of networks, and data mining becomes the mainstream. In this paper, we propose a novel kernel density-based local outlier factor (KLOF) to assign a degree of being an outlier to each object. Firstly, the notion of KLOF is introduced, which captures exactly the relative degree of isolation. Then, by analyzing its properties, including the tightness of its upper and lower bounds and its sensitivity to density perturbation, we find that KLOF is much greater than 1 for outliers. Lastly, KLOF is applied to a real-world dataset to detect anomalous cells with abnormal key performance indicators (KPIs) to verify its reliability. The experiment shows that KLOF can find outliers efficiently. It can be a guideline for operators to perform faster and more efficient troubleshooting.
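The property that outliers score well above 1 can be illustrated with the classic local outlier factor. The sketch below is not the KLOF of the abstract (which replaces the density estimate with a kernel-based one); it is a minimal brute-force LOF, and the names `lof_scores`, `points`, and `k` are illustrative assumptions.

```python
import math

def lof_scores(points, k):
    """Classic LOF score per point, brute force.

    Simplification (assumption): exactly k neighbours are used even when
    distances tie, and the data contain no duplicate points.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in knn[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]

# A 3x3 grid of inliers plus one isolated point (synthetic data).
points = [(float(x), float(y)) for x in range(3) for y in range(3)] + [(10.0, 10.0)]
scores = lof_scores(points, k=3)
```

On this synthetic grid the nine inliers score close to 1, while the isolated point scores several times higher, which is the behaviour a threshold on the factor relies on.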
Node localization is commonly employed in wireless networks. For example, it is used to improve routing and enhance security. Localization algorithms can be classified as range-free or range-based. Range-based algorithms use location metrics such as ToA, TDoA, RSS, and AoA to estimate the distance between two nodes. Proximity sensing between nodes is typically the basis for range-free algorithms. A tradeoff exists since range-based algorithms are more accurate but also more complex. However, in applications such as target tracking, localization accuracy is very important. In this paper, we propose a new range-based algorithm which is based on the density-based outlier detection algorithm (DBOD) from data mining. It requires selection of the K-nearest neighbours (KNN). DBOD assigns density values to each point used in the location estimation. The mean of these densities is calculated and those points having a density larger than the mean are kept as candidate points. Different performance measures are used to compare our approach with the linear least squares (LLS) and weighted linear least squares based on singular value decomposition (WLS-SVD) algorithms. It is shown that the proposed algorithm performs better than these algorithms even when the anchor geometry about an unlocalized node is poor.
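The filtering step described above (assign a density to each point, then keep the points whose density exceeds the mean) can be sketched as follows. The abstract does not define the density, so this sketch assumes the inverse of the mean distance to the k nearest neighbours; `knn_density`, `keep_dense_points`, and the sample coordinates are hypothetical.

```python
import math

def knn_density(points, k):
    """Density of each point, assumed here to be the inverse of the mean
    distance to its k nearest neighbours (not taken from the paper)."""
    densities = []
    for i, p in enumerate(points):
        nearest = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[:k]
        densities.append(k / sum(nearest))
    return densities

def keep_dense_points(points, k):
    """Keep the points whose density is larger than the mean density."""
    dens = knn_density(points, k)
    mean_density = sum(dens) / len(dens)
    return [p for p, d in zip(points, dens) if d > mean_density]

# Synthetic candidate position estimates: four agree, two are far off.
candidates = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (4.0, 4.0), (-3.0, 2.0)]
kept = keep_dense_points(candidates, k=2)
```

With these synthetic values the four clustered estimates survive and the two distant ones are dropped, so a later position estimate would be computed from consistent candidates only.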
Outlier detection is an important task in data mining. In fact, it is difficult to find the clustering centers in some sophisticated multidimensional datasets and to measure the deviation degree of each potential outlier. In this work, an effective outlier detection method based on multi-dimensional clustering and local density (ODBMCLD) is proposed. ODBMCLD firstly identifies the center objects by the local density peak of data objects, and clusters the whole dataset based on the center objects. Then, outlier objects belonging to different clusters will be marked as candidates of abnormal data. Finally, the top N points among these abnormal candidates are chosen as final anomaly objects with high outlier factors. The feasibility and effectiveness of the method are verified by experiments.
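A data stream is a type of data characterized by a high generation rate and a dynamically changing distribution. Anomaly detection on data streams aims to find data that deviate from expected behavior, thereby supporting decision-making in fields such as healthcare, industrial production, and finance. Existing data stream anomaly detection methods commonly suffer from high parameter sensitivity, large time and space overhead, and difficult threshold selection. To address these problems, an adaptive data stream anomaly detection method based on variable density is proposed. First, the Variable Local Outlier Factor (VLOF) is defined. VLOF compares how a data point's local reachability density and local outlier factor change across parallel neighborhood windows with different k values, thereby measuring the density distribution of the data point and reducing the inaccuracy caused by a density measure based on a single k-nearest-neighbor setting. Second, the relative growth rate and absolute mean rate of VLOF with respect to k are computed to reflect the dynamic trend of the data stream; data points that conform to this trend are defined as core points, and core points are used to speed up the judgment of subsequent normal points. Finally, the relative growth rate and absolute mean rate are used as measures of the theoretical distribution of data points, and the difference between the theoretical distribution and the actual distribution of new data points is computed, so that points deviating from the theoretical distribution are adaptively identified as anomalies. To verify the effectiveness of the proposed algorithm, comparative experiments with 8 algorithms were conducted on multiple UCI datasets and real-world datasets. The results show that, compared with the baseline models, the proposed method performs well on precision, recall, and F1, and also improves time and space efficiency.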
Purpose: The main aim of this study is to build a robust novel approach that is able to detect outliers in datasets accurately. To serve this purpose, a novel approach is introduced to determine the likelihood that an object is extremely different from the general behavior of the entire dataset. Design/methodology/approach: This paper proposes a novel two-level approach based on the integration of bagging and voting techniques for anomaly detection problems. The proposed approach, named Bagged and Voted Local Outlier Detection (BV-LOF), uses the Local Outlier Factor (LOF) as the base algorithm and improves its detection rate by using ensemble methods. Findings: Several experiments have been performed on ten benchmark outlier detection datasets to demonstrate the effectiveness of the BV-LOF method. According to the results, the BV-LOF approach significantly outperformed LOF on 9 of the 10 datasets on average. Research limitations: In the BV-LOF approach, the base algorithm is applied to each data subset multiple times with different neighborhood sizes (k) in each case and with different ensemble sizes (T). In our study, we have chosen k and T value ranges of [1-100]; however, these ranges can be changed according to the dataset handled and the problem addressed. Practical implications: The proposed method can be applied to datasets from different domains (e.g., health, finance, manufacturing) without requiring any prior information. Since the BV-LOF method includes two-level ensemble operations, it may require more computational time than single-level ensemble methods; however, this drawback can be overcome by parallelization and by using a proper data structure such as an R*-tree or KD-tree. Originality/value: The proposed approach (BV-LOF) investigates multiple neighborhood sizes (k), which finds instances with different local densities and thereby increases the likelihood of detecting outliers that LOF may miss. It also brings many benefits, such as easy implementation, improved capability, higher applicability, and interpretability.
Focusing on controlling the press-assembly quality of high-precision servo mechanisms, an intelligent early warning method based on outlier data detection and linear regression is proposed. Linear regression is used to model the relationship between assembly quality and the press-assembly process; the mathematical model of displacement-force in the press-assembly process is then established, and a qualified press-assembly force range is defined for assembly quality control. To preprocess the raw displacement-force dataset of the press-assembly process, an improved local outlier factor based on area density and P weight (LAOPW) is designed to eliminate the outliers that would make the mathematical model inaccurate. A weighted distance based on information entropy is used to measure distance, and the reachable distance is replaced with the P weight. Experiments show that the detection efficiency of the algorithm is improved by 5.6 ms compared with the traditional local outlier factor (LOF) algorithm, and the detection accuracy is improved by about 2% compared with the local outlier factor based on area density (LAOF) algorithm. The application of the LAOPW algorithm and the linear regression model shows that the method can effectively carry out intelligent early warning of the press-assembly quality of high-precision servo mechanisms.
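The idea of fitting a displacement-force line and deriving a qualified force range from it can be sketched as below. The abstract does not state how the range is defined, so the band of plus or minus three residual standard deviations is an assumption, as are the names `fit_line` and `force_band` and the synthetic readings; the outlier-removal step (LAOPW) is not reproduced here.

```python
import statistics

def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def force_band(x, y, width=3.0):
    """Return a function mapping a displacement to (low, high) force limits.

    Assumption: the qualified range is the fitted force plus or minus
    `width` standard deviations of the fit residuals.
    """
    slope, intercept = fit_line(x, y)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    sigma = statistics.pstdev(residuals)

    def band(displacement):
        centre = slope * displacement + intercept
        return centre - width * sigma, centre + width * sigma

    return band

# Synthetic, already-cleaned displacement and force readings (arbitrary units).
displacement = [0.0, 1.0, 2.0, 3.0, 4.0]
force = [1.0, 3.1, 4.9, 7.0, 9.0]
slope, intercept = fit_line(displacement, force)
band = force_band(displacement, force)
low, high = band(2.5)
```

A measured force at displacement 2.5 would raise a warning when it falls outside `(low, high)`; in practice the band width would be tuned to the quality requirement rather than fixed at three standard deviations.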
In this paper, a varying-coefficient density-ratio model for case-control studies is developed. We investigate the local empirical likelihood diagnosis of the varying-coefficient density-ratio model for case-control data. The local empirical log-likelihood ratios for the nonparametric coefficient functions are introduced. First, the estimating equations based on the empirical likelihood method are established. Then, a few diagnostic statistics are proposed. Finally, we examine the performance of the proposed method for finite sample sizes through simulation studies.
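Current monitoring of abnormal aircraft lift-off attitude relies on single-parameter exceedance detection and lacks multi-parameter combined anomaly detection. To address this problem, a lift-off attitude anomaly detection method is proposed based on the Isolation-based Data Extracting Local Outlier Factor (IDELOF), a local outlier factor algorithm with nearest-neighbor search space extraction. First, airspeed, pitch angle, and roll angle are selected as the characteristic parameters of lift-off attitude, and an isolation-based nearest-neighbor search space extraction method is used to reduce and extract the data, lowering computational complexity. Second, the local outlier factor algorithm is applied to the extracted data to detect anomalies and identify multi-parameter combined anomalies. Then, Quick Access Recorder (QAR) data from 297 flights of the A319 fleet of a domestic airline are used to verify the effectiveness of the model in detecting both single-parameter anomalies and multi-parameter combined anomalies. Finally, the distribution characteristics of normal and abnormal results and the interpretability of the model are analyzed, and the main causes of eight types of anomalies are explained, providing in-depth data support for flight safety risk prevention and control.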
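Complex industrial production processes are high-dimensional, multi-mode, and nonlinear, and diffusion maps have difficulty projecting new samples. To address these problems, this paper proposes an industrial process fault detection method based on expandable diffusion maps and local outlier factors (EDM-LOF). The diffusion maps method is used to extract the low-dimensional manifold structure of the training samples, a local projection matrix is constructed to project new samples into the manifold space, and the local outlier factor method is used in the manifold space for fault detection. EDM-LOF is applied to fault detection in a penicillin fermentation process and compared with the PCA, FD-kNN, and LOF methods. The results show that EDM-LOF achieves higher fault detection performance, verifying the effectiveness of the method.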
Funding (moving-window local outlier probability paper): Supported by the National Natural Science Foundation of China (61374140), Shanghai Postdoctoral Sustentation Fund (12R21412600), the Fundamental Research Funds for the Central Universities (WH1214039), and Shanghai Pujiang Program (12PJ1402200).
Funding (lightweight IoT node detection paper): This work is partially supported by the Ministry of Education of China (www.moe.gov.cn) under grant Nos. 201802123091 (received by F.W.) and 201802123068 (received by Z.W.); the Scientific Project of CAFUC (www.cafuc.edu.cn) under grant Nos. F2017KF02 and J2018-3 (both received by Z.W.); and the Teaching Reform Project of CAFUC (www.cafuc.edu.cn) under grant No. E2020044 (received by Z.W.).
Funding (KLOF paper): Supported by the National Basic Research Program of China (973 Program: 2013CB329004).
Funding (ODBMCLD paper): Project (61362021) supported by the National Natural Science Foundation of China; Project (2016GXNSFAA380149) supported by the Natural Science Foundation of Guangxi Province, China; Projects (2016YJCXB02, 2017YJCX34) supported by the Innovation Project of GUET Graduate Education, China; Project (2011KF11) supported by the Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education, China.