Dear Editor, Pose graph optimization (PGO) is a popular optimization approach that plays a crucial role in the simultaneous localization and mapping (SLAM) back-end. However, when incorrect loop closure constraints (referred to as outliers) are present in the SLAM front-end, the standard PGO algorithm fails catastrophically and cannot return an accurate map. To address this issue, this letter proposes a novel algorithm that leverages classical optimization methods to handle outliers effectively. The proposed algorithm introduces a new formulation that incorporates a credibility factor model, which improves the robustness of the optimization process. Additionally, an innovative consistency classification algorithm is developed to detect outliers. Extensive experiments are conducted on multiple benchmark datasets to evaluate the consistency and accuracy of the proposed algorithm.
This paper is concerned with the set-membership filtering problem for a class of linear time-varying systems with norm-bounded noises and impulsive measurement outliers. A new representation is proposed to model the measurement outlier by an impulsive signal whose minimum interval length (i.e., the minimum duration between two adjacent impulsive signals) and minimum norm (i.e., the minimum of the norms of all impulsive signals) are larger than certain thresholds that are adjustable according to engineering practice. To guarantee satisfactory filtering performance, a so-called parameter-dependent set-membership filter is put forward that is capable of generating a time-varying ellipsoidal region containing the true system state. First, a novel outlier detection strategy is developed, based on a specially constructed input-output model, to examine whether the received measurement is corrupted by an outlier. Then, depending on the outcome of the outlier detection, the gain matrix of the desired filter and the corresponding ellipsoidal region are calculated by solving two recursive difference equations. Furthermore, the ultimate boundedness of the time-varying ellipsoidal region is thoroughly investigated. Finally, a simulation example is provided to demonstrate the effectiveness of the proposed parameter-dependent set-membership filtering strategy.
Logit regression analysis is widely applied in scientific studies and laboratory experiments, where skewed observations in a data set are often encountered. A number of problems with this method, for example outliers and influential observations, can cause overdispersion when a model is fitted. In this study a systematic statistical approach, including the plotting of several indices, is used to diagnose the lack of fit of a logistic regression model. The outliers and influential observations in data from laboratory experiments are then detected. Specifically, we consider the interaction of an internal solitary wave (ISW) with an obstacle, i.e., an underwater ridge, and also analyze the effects of the ridge height, the lower layer water depth, and the potential energy on the amplitude-based transmission rate of the ISW. We conclude that the goodness of fit of the revised logit regression model is better than that of the model without this approach.
A method was proposed for the detection of outliers and influential observations in the framework of a mixed linear model, prior to quantitative trait locus (QTL) mapping analysis. We investigated the impact of outliers on QTL mapping for complex traits in a mouse BXD population, and observed that dropping outliers could provide evidence of additional QTL and epistatic loci affecting the 1stBrain-OB and the 2ndBrain-OB in a cross of this population. The results also revealed a remarkable increase in the estimated heritabilities of QTL in the absence of outliers. In addition, simulations were conducted to investigate the detection powers and false discovery rates (FDRs) of QTLs in the presence and absence of outliers. The results suggested that the presence of a small proportion of outliers could increase the FDR and hence decrease the detection power of QTLs. In the presence of outliers, the estimated standard errors for the position, additive and additive × environment interaction effects of QTLs could increase drastically.
The structural equation model (SEM) concept is generally influenced by the presence of outliers and controlling variables. To a very large extent, this could have consequential effects on the parameters and the model fit. Though previous research has studied outliers and controlling observations from various perspectives, including the use of box plots and normal probability plots, among others, the use of the uniform horizontal QQ plot is yet to be explored. This study is, therefore, aimed at applying uniform QQ plots to identify outliers and possible controlling observations in SEM. The results showed that all three estimation methods can identify outliers and possible controlling observations in SEM. It was noted that the QQ plot based on the Anderson-Rubin estimator gave a more efficient visual display for spotting outliers and possible controlling observations than the other estimation methods. Therefore, this paper provides an efficient way of identifying outliers as it fragments the data set.
The least trimmed squares estimator (LTS) is a well-known robust estimator in terms of protecting the estimate from outliers. Its high computational complexity is, however, a problem in practice. We show that the LTS estimate can be obtained by a simple algorithm with complexity O(N ln N) for large N, where N is the number of measurements. We also show that though the LTS is robust in terms of outliers, it is sensitive to inliers. The concept of inliers is introduced. Moreover, the Generalized Least Trimmed Squares estimator (GLTS), together with its solution, is presented; it reduces the effect of both outliers and inliers. Keywords: Least squares, Least trimmed squares, Outliers, System identification, Parameter estimation, Robust parameter estimation. This work was supported in part by NSF ECS-9710297 and ECS-0098181.
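The O(N ln N) claim in the abstract above can be illustrated for the simplest case, a univariate location estimate. The sketch below is not the paper's algorithm; it relies on the known fact that, in one dimension, the optimal LTS subset of size h is a contiguous window of the sorted data, so a sort followed by a linear scan with prefix sums suffices. The function name `lts_location` and the parameter `h` (number of observations kept) are illustrative assumptions.

```python
def lts_location(values, h):
    """Univariate least trimmed squares location estimate (illustrative sketch).

    Returns (mean, sum_of_squared_residuals) for the h-subset of `values`
    with the smallest sum of squared residuals about its own mean.
    """
    xs = sorted(values)                      # O(N log N): dominates the cost
    n = len(xs)
    if not 1 <= h <= n:
        raise ValueError("h must satisfy 1 <= h <= len(values)")

    # Prefix sums of x and x^2 so each window is evaluated in O(1).
    sum1, sum2 = [0.0], [0.0]
    for x in xs:
        sum1.append(sum1[-1] + x)
        sum2.append(sum2[-1] + x * x)

    best = None
    for start in range(n - h + 1):           # O(N) scan over contiguous windows
        total = sum1[start + h] - sum1[start]
        squares = sum2[start + h] - sum2[start]
        mean = total / h
        ssr = squares - h * mean * mean      # sum of (x - mean)^2 over the window
        if best is None or ssr < best[1]:
            best = (mean, ssr)
    return best


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0, 200.0]
    print(lts_location(data, h=5))
```

A common default is h = N // 2 + 1, which gives the highest breakdown point; larger h trades robustness for efficiency. The regression case treated in the paper is harder than this location-only sketch.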
The study explored both the Box-Jenkins and iterative outlier detection procedures in determining the efficiency of ARIMA-GARCH-type models in the presence of outliers, using the daily closing share price returns series of four prominent banks in Nigeria (Skye (Polaris) bank, Sterling bank, Unity bank and Zenith bank) from January 3, 2006 to November 24, 2016. The series consists of 2690 observations for each bank. The data were obtained from the Nigerian Stock Exchange. Unconditional variance and the kurtosis coefficient were used as criteria for measuring the efficiency of ARIMA-GARCH-type models, and our findings revealed that kurtosis is a better criterion (as it is a true measure of outliers) than the unconditional variance (as it can be depleted or amplified by outliers). Specifically, the strength of this study is in showing the applicability and relevance of iterative methods in time series modeling.
In its broadest sense, this paper reviews the general outlier problem, the means available for assessing the discordancy (or lack thereof) of an outlier (or outliers), and possible strategies for dealing with them. Two alternate approaches to the multiple outlier problem, consecutive and block testing, and their respective inherent weaknesses, masking and swamping, are discussed. In addition, the relative susceptibility of several tests for outliers in normal samples to the swamping phenomenon is reported.
A variety of factors affect air quality, making it a difficult issue. Air quality refers to the level of clean air in a certain area. It is challenging for conventional approaches to correctly discover aberrant values or outliers because of the significant fluctuation of this sort of data, which is influenced by climate change and the environment. With accelerating industrial expansion and rising population density in Kolkata City, air pollution is continuously rising. This study involves two phases: imputation of missing values, followed by detection of outliers using Statistical Process Control (SPC) and Functional Data Analysis (FDA). The efficacy of the proposed outlier identification methodology is studied on working days and non-working days for the variables NO₂, SO₂, and O₃, recorded over one year in Kolkata, India. The results show how the functional data approach outshines traditional outlier detection methods. The outcomes show that functional data analysis vibrates more than the other two approaches after imputation, and the suggested outlier detector is appropriate for the precise detection of outliers in highly variable data.
In this paper, we present a cluster-based algorithm for time series outlier mining. We use the discrete Fourier transform (DFT) to transform time series from the time domain to the frequency domain. Time series can thus be mapped to points in k-dimensional space. A cluster-based algorithm is then developed to mine the outliers from these points. The algorithm first partitions the input points into disjoint clusters and then prunes the clusters that are judged unable to contain outliers. Our algorithm has been run on the electrical load time series of a steel enterprise and proved to be effective.
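The pipeline described in the abstract above (DFT mapping to k-dimensional points, partition into clusters, prune clusters that cannot hold outliers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the features are the magnitudes of the first k non-constant DFT coefficients, the partition uses a simple "leader" clustering with a fixed radius, and a cluster is pruned when it has at least `min_size` members. The names `dft_features`, `leader_clusters`, `outlier_candidates`, `radius` and `min_size` are all hypothetical.

```python
import cmath
import math


def dft_features(series, k):
    """Map a series to k features: magnitudes of DFT coefficients 1..k.

    Coefficient 0 (the mean level) is skipped; phase is discarded. Both are
    simplifying assumptions of this sketch.
    """
    n = len(series)
    feats = []
    for f in range(1, k + 1):
        coeff = sum(series[t] * cmath.exp(-2j * cmath.pi * f * t / n)
                    for t in range(n))
        feats.append(abs(coeff) / n)
    return feats


def leader_clusters(points, radius):
    """Partition points into disjoint clusters: each point joins the first
    cluster whose leader lies within `radius`, otherwise it starts a new one."""
    leaders, members = [], []
    for idx, p in enumerate(points):
        for c, leader in enumerate(leaders):
            if math.dist(p, leader) <= radius:
                members[c].append(idx)
                break
        else:
            leaders.append(p)
            members.append([idx])
    return members


def outlier_candidates(series_list, k=3, radius=0.2, min_size=2):
    """Prune clusters with at least `min_size` members; return the indices of
    the series left in the remaining (sparse) clusters."""
    points = [dft_features(s, k) for s in series_list]
    clusters = leader_clusters(points, radius)
    return sorted(i for m in clusters if len(m) < min_size for i in m)


if __name__ == "__main__":
    n = 32
    base = [[a * math.sin(2 * math.pi * t / n) for t in range(n)]
            for a in (1.0, 1.05, 0.95, 1.02)]
    odd = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
    print(outlier_candidates(base + [odd]))
```

The naive DFT here costs O(n·k) per series; a real implementation would more likely use an FFT routine and a clustering method suited to the data.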
We introduce a new wavelet-based procedure for detecting outliers in financial discrete time series. The procedure focuses on the analysis of residuals obtained from a model fit and is applied to Generalized Autoregressive Conditional Heteroskedasticity (GARCH)-like models, but is not limited to these models. We apply the Maximal-Overlap Discrete Wavelet Transform (MODWT) to the residuals and compare their wavelet coefficients against quantile thresholds to detect outliers. Our methodology has several advantages over existing methods that make use of the standard Discrete Wavelet Transform (DWT). The series sample size does not need to be a power of 2, and the transform can use any wavelet filter and be run up to the desired level. Simulated wavelet quantiles from Normal and Student t-distributions are used as thresholds for the maximum of the absolute value of the wavelet coefficients. The performance of the procedure is illustrated on two real series: the closing price of the Saudi Stock market and the S&P 500 index. The efficiency of the proposed method is demonstrated, and it can be considered a distinct and important addition to the existing methods.
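The thresholding idea in the abstract above can be sketched for the simplest setting: a level-1 Haar MODWT with circular boundary handling, and a threshold taken as a simulated quantile of the maximum absolute coefficient under i.i.d. standard Normal noise. The paper allows other filters, deeper levels and Student t quantiles; the function names, the choice of the Haar filter, and the defaults `q`, `trials` and `seed` are assumptions made for illustration only.

```python
import random


def haar_modwt_level1(x):
    """Level-1 Haar MODWT wavelet coefficients with circular boundary.

    W[t] = (x[t] - x[t-1]) / 2, where x[-1] wraps around to the last sample.
    Works for any series length (no power-of-2 requirement).
    """
    return [(x[t] - x[t - 1]) / 2.0 for t in range(len(x))]


def simulated_max_quantile(n, q=0.95, trials=500, seed=0):
    """Simulate the q-quantile of max |W[t]| for N(0, 1) noise of length n."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(trials):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
        maxima.append(max(abs(w) for w in haar_modwt_level1(noise)))
    maxima.sort()
    return maxima[min(trials - 1, int(q * trials))]


def flag_outliers(residuals, threshold):
    """Return the indices whose wavelet coefficient exceeds the threshold.

    Assumes the residuals are standardized (roughly unit variance)."""
    coeffs = haar_modwt_level1(residuals)
    return [t for t, w in enumerate(coeffs) if abs(w) > threshold]


if __name__ == "__main__":
    rng = random.Random(1)
    resid = [rng.gauss(0.0, 1.0) for _ in range(100)]
    resid[40] += 15.0                        # injected additive outlier
    thr = simulated_max_quantile(len(resid))
    print(thr, flag_outliers(resid, thr))
```

With the Haar filter, an isolated spike at time t affects the coefficients at t and t + 1, so flagged indices tend to come in adjacent pairs and the outlier is located at the first of the two.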
On the basis of newly developed regression diagnostic analysis, a diagnostic method for assessing outliers in the logistic regression model was set up and used to analyze the prognosis of patients with acute lymphatic leukemia.
We introduce and develop a novel approach to outlier detection based on an adaptation of random subspace learning. Our proposed method handles both high-dimension low-sample-size and traditional low-dimension high-sample-size datasets. Essentially, we avoid the computational bottleneck of techniques like the Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower-dimensional subspaces. Both the theoretical and the computational development of our approach reveal that it is computationally more efficient than the regularized methods in the high-dimension low-sample-size setting, and often competes favorably with existing methods as far as the percentage of correct outlier detection is concerned.
Background: Image matching is crucial in numerous computer vision tasks such as 3D reconstruction and simultaneous visual localization and mapping. The accuracy of the matching significantly impacts subsequent studies. Because of their local similarity, when image pairs contain comparable patterns but feature pairs are positioned differently, incorrect recognition can occur as global motion consistency is disregarded. Methods: This study proposes an image-matching filtering algorithm based on global motion consistency. It can be used as a subsequent matching filter for the initial matching results generated by other matching algorithms based on the principle of motion smoothness. A particular matching algorithm can first be used to perform the initial matching; then, the rotation and movement information of the global feature vectors are combined to effectively identify outlier matches. The principle is that if the matching result is accurate, the feature vectors formed by any matched point should have similar rotation angles and moving distances. Thus, global motion direction and global motion distance consistencies were used to reject outliers caused by similar patterns in different locations. Results: Four datasets were used to test the effectiveness of the proposed method. Three datasets with similar patterns in different locations were used to test the results for similar images that could easily be incorrectly matched by other algorithms, and one commonly used dataset was used to test the results for the general image-matching problem. The experimental results suggest that the proposed method is more accurate than other state-of-the-art algorithms in identifying mismatches in the initial matching set. Conclusions: The proposed outlier rejection matching method can significantly improve the matching accuracy for similar images with locally similar feature pairs in different locations and can provide more accurate matching results for subsequent computer vision tasks.
Outlier detection techniques play a vital role in exploring unusual data from extreme events, which have a critical effect on the modeling and forecasting of functional data. Functional methods offer an effective way of identifying outliers graphically that might not be visible in the original data plot in classical analysis. This study's main objective is to detect extreme rainfall events using functional outlier detection methods based on depth and density functions. In order to identify unusual events of rainfall variation over long time intervals, this work is based on the average monthly rainfall of the Taiz region from 1998 to 2019. Data were extracted from the Tropical Rainfall Measuring Mission, and the analysis was carried out in R. The approaches applied in this study involve rainbow plots, the functional highest density region box-plot, and the functional bag-plot. According to the current results, the functional density box-plot method proved more effective in detecting outliers than the functional depth bag-plot method. In conclusion, the results of the current study showed that rainfall over the Taiz region during the last two decades was influenced by the extreme events of the years 1999, 2004, 2005, and 2009.
Changepoint detection faces challenges when outlier data are present. This paper proposes a multivariate changepoint detection method, the RWPCA-RFPOP method, which is based on the robust WPCA projection direction and the robust RFPOP method. Our method is doubly robust and is suitable for detecting mean changepoints in multivariate normal data that include outliers and have high correlations between variables. Simulation results demonstrate that our method provides strong guarantees on both the number and location of changepoints in the presence of outliers. Finally, our method is successfully applied to an ACGH dataset.
In robust regression we often have to decide how many unusual observations should be removed from the sample in order to obtain a better fit for the rest of the observations. Generally, we use the basic principle of LTS, which is to fit the majority of the data, identifying as outliers those points that cause the biggest damage to the robust fit. However, in the LTS regression method the choice of default values for a high breakdown point seriously affects the efficiency of the estimator. In the proposed approach we introduce a penalty cost for discarding an outlier; consequently, the best fit for the majority of the data is obtained by discarding only catastrophic observations. This penalty cost is based on robust design weights and a high-breakdown-point residual scale taken from the LTS estimator. The robust estimate is obtained by solving a convex quadratic mixed integer programming problem, in whose objective function the sum of the squared residuals and the penalties for discarding observations is minimized. The proposed mathematical programming formulation is suitable for small-sample data. Moreover, we conduct a simulation study to compare other robust estimators with our approach in terms of their efficiency and robustness.
False data injection attacks (FDIAs) can manipulate measurement data from the Supervisory Control and Data Acquisition (SCADA) system and threaten state estimation in smart grids. Blind FDIAs (BFDIAs) enhance traditional FDIAs by eliminating the need to know the measurement Jacobian matrix H in advance, but when there are outliers in the measurement data, attack performance is degraded. In this paper, improved BFDIAs are proposed. In the off-line phase, a low-dimensional measurement matrix without outliers, calculated by the Linear Local Tangent Space Alignment (LLTSA) algorithm, is fed into a Continuous Deep Belief Network (CDBN) as training data to learn its probability distribution. In the on-line phase, the real-time low-dimensional measurement matrix with outliers is fed into the trained model as input, and the outputs are reconstructed from the probability distribution learned in the off-line phase, which indirectly eliminates the influence of outliers. Simulations are carried out on the PJM 5-bus and IEEE 14-bus systems to verify the performance of the proposed strategy compared with PCA-based BFDIAs.
The outlier problem for a random sample from a multivariate elliptically contoured distribution with mean slippage is defined, and the likelihood ratio test of the null hypothesis, in which there are no outliers, versus the alternative hypothesis, in which some outliers are present, is derived. We show that the testing problem is invariant under a group of affine transformations and obtain the maximal invariant, which is equivalent to the likelihood ratio test statistic. Furthermore, the non-null and null density functions of the likelihood ratio test statistic are derived. We find that the null density function of the test statistic is robust and that the density function has a monotone likelihood ratio in the maximal invariant. Therefore, the likelihood ratio test is a uniformly most powerful invariant test under the group of affine transformations. In the last section, we give an example of detecting multivariate outliers in an elliptically contoured distribution.
Funding (pose graph optimization letter): supported in part by the National Natural Science Foundation of China (62273239, 62103283).
Funding (set-membership filtering paper): supported in part by the National Natural Science Foundation of China (61703245, 61873148, 61933007), the China Postdoctoral Science Foundation (2018T110702), the Postdoctoral Special Innovation Foundation of Shandong Province of China (201701015), the European Union's Horizon 2020 Research and Innovation Programme (820776 (INTEGRADDE)), the Royal Society of the UK, and the Alexander von Humboldt Foundation of Germany.
Funding (logit regression study of internal solitary waves): Science Council of Taiwan Province under Grant Nos. NSC 96-2628-E-366-004-MY2 and 96-2628-E-132-001-MY2.
Funding (QTL mapping study): supported by the National Basic Research Program (973) of China (No. 2004CB117306) and the Hi-Tech Research and Development Program (863) of China (No. 2006AA10A102).
文摘A method was proposed for the detection of outliers and influential observations in the framework of a mixed linear model, prior to the quantitative trait locus (QTL) mapping analysis. We investigated the impact of outliers on QTL mapping for complex traits in a mouse BXD population, and observed that the dropping of outliers could provide the evidence of additional QTL and epistatic loci affecting the 1stBrain-OB and the 2ndBrain-OB in a cross of the abovementioned population. The results could also reveal a remarkable increase in estimating heritabilities of QTL in the absence of outliers. In addition, simulations were conducted to investigate the detection powers and false discovery rates (FDRs) of QTLs in the presence and absence of outliers. The results suggested that the presence of a small proportion of outliers could increase the FDR and hence decrease the detection power of QTLs. A drastic increase could be obtained in the estimates of standard errors for position, additive and additive× environment interaction effects of QTLs in the presence of outliers.
文摘The structural equation model (SEM) concept is generally influenced by the presence of outliers and controlling variables. To a very large extent, this could have consequential effects on the parameters and the model fitness. Though previous researches have studied outliers and controlling observations from various perspectives including the use of box plots, normal probability plots, among others, the use of uniform horizontal QQ plot is yet to be explored. This study is, therefore, aimed at applying uniform QQ plots to identifying outliers and possible controlling observations in SEM. The results showed that all the three methods of estimators manifest the ability to identify outliers and possible controlling observations in SEM. It was noted that the Anderson-Rubin estimator of QQ plot showed a more efficient or visual display of spotting outliers and possible controlling observations as compared to the other methods of estimators. Therefore, this paper provides an efficient way identifying outliers as it fragments the data set.
文摘The least trimmed squares estimator (LTS) is a well known robust estimator in terms of protecting the estimate from the outliers. Its high computational complexity is however a problem in practice. We show that the LTS estimate can be obtained by a simple algorithm with the complexity 0( N In N) for large N, where N is the number of measurements. We also show that though the LTS is robust in terms of the outliers, it is sensitive to the inliers. The concept of the inliers is introduced. Moreover, the Generalized Least Trimmed Squares estimator (GLTS) together with its solution are presented that reduces the effect of both the outliers and the inliers. Keywords Least squares - Least trimmed squares - Outliers - System identification - Parameter estimation - Robust parameter estimation This work was supported in part by NSF ECS — 9710297 and ECS — 0098181.
文摘The study explored both Box and Jenkins, and iterative outlier detection procedures in determining the efficiency of ARIMA-GARCH-type models in the presence of outliers using the daily closing share price returns series of four prominent banks in Nigeria (Skye (Polaris) bank, Sterling bank, Unity bank and Zenith bank) from January 3, 2006 to November 24, 2016. The series consists of 2690 observations for each bank. The data were obtained from the Nigerian Stock Exchange. Unconditional variance and kurtosis coefficient were used as criteria for measuring the efficiency of ARIMA-GARCH-type models and our findings revealed that kurtosis is a better criterion (as it is a true measure of outliers) than the unconditional variance (as it can be depleted or amplified by outliers). Specifically, the strength of this study is in showing the applicability and relevance of iterative methods in time series modeling.
文摘In its broadest sense, this paper reviews the general outlier problem, the means available for addressing the discordancy (or lack thereof) of an outlier (or outliers), and possible strategies for dealing with them. Two alternate approaches to the multiple outlier problem, consecutive and block testing, and their respective inherent weaknesses, masking and swamping, are discussed. In addition, the relative susceptibility of several tests for outliers in normal samples to the swamping phenomena is reported.
文摘A variety of factors affect air quality, making it a difficult issue. The level of clean air in a certain area is referred to as air quality. It is challenging for conventional approaches to correctly discover aberrant values or outliers due to the significant fluctuation of this sort of data, which is influenced by Climate change and the environment. With accelerating industrial expansion and rising population density in Kolkata City, air pollution is continuously rising. This study involves two phases, in the first phase imputation of missing values and second detection of outliers using Statistical Process Control (SPC), and Functional Data Analysis (FDA), studies to achieve the efficacy of the outlier identification methodology proposed with working days and Nonworking days of the variables NO<sub>2</sub>, SO<sub>2</sub>, and O<sub>3</sub>, which were used for a year in a row in Kolkata, India. The results show how the functional data approach outshines traditional outlier detection methods. The outcomes show that functional data analysis vibrates more than the other two approaches after imputation, and the suggested outlier detector is absolutely appropriate for the precise detection of outliers in highly variable data.
Abstract: In this paper, we present a cluster-based algorithm for time series outlier mining. We use the discrete Fourier transform (DFT) to transform time series from the time domain to the frequency domain, so that each time series can be mapped to a point in k-dimensional space. A cluster-based algorithm is then developed to mine the outliers from these points. The algorithm first partitions the input points into disjoint clusters and then prunes the clusters that are judged unable to contain outliers. Our algorithm has been run on the electrical load time series of a steel enterprise and proved to be effective.
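The DFT mapping step can be sketched as follows. This is a minimal illustration under stated assumptions: the features are the magnitudes of the first k DFT coefficients (the abstract does not say which coefficients are kept), and a simple distance-to-centroid check stands in for the paper's cluster partitioning and pruning, which the abstract does not specify. The function names and the synthetic series are hypothetical.

```python
import cmath
import math

def dft_features(series, k):
    """Map a time series to a k-dimensional point: magnitudes of the
    first k DFT coefficients, scaled by the series length."""
    n = len(series)
    feats = []
    for f in range(k):
        coeff = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                    for t, x in enumerate(series))
        feats.append(abs(coeff) / n)
    return feats

def most_isolated(points):
    """Return the index of the point farthest from the centroid, plus all
    distances. A stand-in for cluster-based pruning, not the paper's method."""
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    dists = [math.dist(p, centroid) for p in points]
    return max(range(len(points)), key=dists.__getitem__), dists

n = 32
# Four synthetic series sharing one frequency, with slightly different amplitudes.
amps = [1.0, 1.1, 0.9, 1.05]
series_set = [[a * math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
              for a in amps]
# One synthetic series with a different dominant frequency.
series_set.append([math.sin(2 * math.pi * 5 * t / n) for t in range(n)])

points = [dft_features(s, k=8) for s in series_set]
outlier_index, distances = most_isolated(points)
print("most isolated series:", outlier_index)
```

Series with the same dominant frequency land close together in the feature space, while the series with a different frequency sits far from them, which is what makes a clustering step in that space meaningful.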
Abstract: We introduce a new wavelet-based procedure for detecting outliers in financial discrete time series. The procedure focuses on the analysis of residuals obtained from a model fit; it is applied to Generalized Autoregressive Conditional Heteroskedasticity (GARCH)-like models but is not limited to them. We apply the Maximal-Overlap Discrete Wavelet Transform (MODWT) to the residuals and compare their wavelet coefficients against quantile thresholds to detect outliers. Our methodology has several advantages over existing methods that use the standard Discrete Wavelet Transform (DWT): the series sample size does not need to be a power of 2, and the transform can use any wavelet filter and be run up to the desired level. Simulated wavelet quantiles from a Normal and a Student t-distribution are used as thresholds for the maximum of the absolute value of the wavelet coefficients. The performance of the procedure is illustrated on two real series: the closing price of the Saudi Stock market and the S&P 500 index. The efficiency of the proposed method is demonstrated, and it can be considered a distinct and important addition to the existing methods.
Abstract: On the basis of newly developed regression diagnostic analysis, a diagnostic method for assessing outliers in the logistic regression model was established and used to analyze the prognosis of patients with acute lymphatic leukemia.
Abstract: We introduce and develop a novel approach to outlier detection based on an adaptation of random subspace learning. Our proposed method handles both high-dimension low-sample size and traditional low-dimensional high-sample size datasets. Essentially, we avoid the computational bottleneck of techniques like Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower dimensional subspaces. Both the theoretical and computational developments of our approach reveal that it is computationally more efficient than the regularized methods in the high-dimensional low-sample size setting, and that it often competes favorably with existing methods as far as the percentage of correct outlier detection is concerned.
Funding: Supported by the Natural Science Foundation of China (62072388, 62276146); the Industry Guidance Project Foundation of Science technology Bureau of Fujian province (2020H0047); the Natural Science Foundation of Science Technology Bureau of Fujian province (2019J01601); the Creation Fund project of Science Technology Bureau of Fujian province (JAT190596); and the Putian University Research Project (2022034).
Abstract: Background: Image matching is crucial in numerous computer vision tasks such as 3D reconstruction and simultaneous visual localization and mapping. The accuracy of the matching significantly affects subsequent tasks. When image pairs contain similar patterns at different positions, their local similarity can cause incorrect matches because global motion consistency is disregarded. Methods: This study proposes an image-matching filtering algorithm based on global motion consistency. Based on the principle of motion smoothness, it can be used as a subsequent filter for the initial matching results generated by other matching algorithms. A particular matching algorithm is first used to perform the initial matching; then, the rotation and movement information of the global feature vectors are combined to identify outlier matches. The principle is that if the matching result is accurate, the feature vectors formed by any matched points should have similar rotation angles and moving distances. Thus, global motion direction consistency and global motion distance consistency are used to reject outliers caused by similar patterns in different locations. Results: Four datasets were used to test the effectiveness of the proposed method. Three datasets with similar patterns in different locations were used to test images that could easily be matched incorrectly by other algorithms, and one commonly used dataset was used to test the general image-matching problem. The experimental results suggest that the proposed method is more accurate than other state-of-the-art algorithms in identifying mismatches in the initial matching set. Conclusions: The proposed outlier rejection method can significantly improve the matching accuracy for images with locally similar feature pairs in different locations and can provide more accurate matching results for subsequent computer vision tasks.
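The consistency principle described in the Methods can be sketched as a simple filter. This is a minimal illustration, not the paper's algorithm: it assumes the global motion is summarized by the median angle and median length of the match vectors, and the tolerances `angle_tol` and `dist_tol` are hypothetical parameters. The plain median of angles is also an assumption that breaks down when directions cluster near ±180°.

```python
import math
import statistics

def filter_matches(matches, angle_tol=math.radians(15), dist_tol=0.3):
    """Keep matches whose motion vector agrees with the global (median)
    direction and length. Each match is ((x1, y1), (x2, y2))."""
    angles = [math.atan2(q[1] - p[1], q[0] - p[0]) for p, q in matches]
    dists = [math.hypot(q[0] - p[0], q[1] - p[1]) for p, q in matches]
    med_angle = statistics.median(angles)
    med_dist = statistics.median(dists)

    kept = []
    for match, angle, dist in zip(matches, angles, dists):
        # Wrap the angular difference into [-pi, pi] before comparing.
        d_angle = abs((angle - med_angle + math.pi) % (2 * math.pi) - math.pi)
        d_dist = abs(dist - med_dist) / med_dist if med_dist > 0 else 0.0
        if d_angle <= angle_tol and d_dist <= dist_tol:
            kept.append(match)
    return kept

# Hypothetical initial matches: four consistent, one pointing elsewhere.
initial = [
    ((0.0, 0.0), (10.0, 1.0)),
    ((5.0, 5.0), (15.0, 6.0)),
    ((2.0, 8.0), (12.0, 9.2)),
    ((7.0, 3.0), (17.0, 3.9)),
    ((4.0, 4.0), (-6.0, 9.0)),   # similar pattern matched at a wrong location
]
kept = filter_matches(initial)
print(len(kept), "of", len(initial), "matches kept")
```

The last match has a comparable length but a very different direction, so it fails the direction check even though a purely local descriptor comparison might accept it.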
Abstract: Outlier detection techniques play a vital role in exploring unusual data from extreme events, which have a critical effect on the modeling and forecasting of functional data. Functional methods offer an effective way of identifying outliers graphically that might not be visible in a plot of the original data in classical analysis. The main objective of this study is to detect extreme rainfall events using functional outlier detection methods based on depth and density functions. To identify unusual rainfall variation over long time intervals, this work uses the average monthly rainfall of the Taiz region from 1998 to 2019. Data were extracted from the Tropical Rainfall Measuring Mission, and the analysis was carried out in R. The approaches applied include rainbow plots, the functional highest density region box-plot, and the functional bag-plot. According to the results, the functional density box-plot method proved more effective in detecting outliers than the functional depth bag-plot method. In conclusion, the results show that rainfall over the Taiz region during the last two decades was influenced by extreme events in the years 1999, 2004, 2005, and 2009.
Abstract: Changepoint detection faces challenges when outliers are present in the data. This paper proposes a multivariate changepoint detection method, RWPCA-RFPOP, which is based on the robust WPCA projection direction and the robust RFPOP method. The method is doubly robust and is suitable for detecting mean changepoints in multivariate normal data that have high correlations between variables and include outliers. Simulation results demonstrate that the method provides strong guarantees on both the number and location of changepoints in the presence of outliers. Finally, the method is successfully applied to an ACGH dataset.
Abstract: In robust regression we often have to decide how many unusual observations should be removed from the sample in order to obtain a better fit for the rest of the observations. Generally, we use the basic principle of LTS, which is to fit the majority of the data, identifying as outliers those points that cause the biggest damage to the robust fit. However, in the LTS regression method the choice of default values for a high breakdown point seriously affects the efficiency of the estimator. In the proposed approach we introduce a penalty cost for discarding an outlier; consequently, the best fit for the majority of the data is obtained by discarding only catastrophic observations. This penalty cost is based on robust design weights and a high breakdown point residual scale taken from the LTS estimator. The robust estimate is obtained by solving a convex quadratic mixed integer programming problem, in which the objective function minimizes the sum of the squared residuals and the penalties for discarding observations. The proposed mathematical programming formulation is suitable for small-sample data. Moreover, we conduct a simulation study to compare other robust estimators with our approach in terms of efficiency and robustness.
Funding: Supported by the Funds of the National Key Research and Development Program of China (Grant No. 2020YFE0201100), the Funds of National Science of China (Grant Nos. 61973062, 61973068), and the Fundamental Research Funds for the Central Universities (Grant Nos. N2004010, N2104021, N182008004).
Abstract: False data injection attacks (FDIAs) can manipulate measurement data from the Supervisory Control and Data Acquisition (SCADA) system and threaten state estimation in smart grids. Blind FDIAs (BFDIAs) enhance traditional FDIAs by eliminating the need to know the measurement Jacobian matrix H in advance, but their attack performance degrades when the measurement data contain outliers. In this paper, improved BFDIAs are proposed. In the off-line phase, a low-dimensional measurement matrix without outliers, calculated by the Linear Local Tangent Space Alignment (LLTSA) algorithm, is fed into a Continuous Deep Belief Network (CDBN) as training data to learn its probability distribution. In the on-line phase, the real-time low-dimensional measurement matrix with outliers is fed into the trained model, and the outputs are reconstructed from the probability distribution learned in the off-line phase, which indirectly eliminates the influence of outliers. Simulations on the PJM 5-bus and IEEE 14-bus systems verify the performance of the proposed strategy in comparison with PCA-based BFDIAs.
Abstract: The outlier problem for a random sample from a multivariate elliptically contoured distribution with mean slippage is defined, and the likelihood ratio test of the null hypothesis that there are no outliers against the alternative hypothesis that some outliers are present is derived. We show that the testing problem is invariant under a group of affine transformations and obtain the maximal invariant, which is equivalent to the likelihood ratio test statistic. Furthermore, the non-null and null density functions of the likelihood ratio test statistic are derived. We find that the null density function of the test statistic is robust and that the density function has a monotone likelihood ratio in the maximal invariant. Therefore, the likelihood ratio test is a uniformly most powerful invariant test under the group of affine transformations. In the last section, we give an example of detecting multivariate outliers in an elliptically contoured distribution.