Abstract: A changepoint in statistical applications refers to an observational time point at which the structural pattern changes during a long-term experimentation process. In many cases, the changepoint time and cause are documented, and it is reasonably straightforward to statistically adjust (homogenize) the series for the effects of the changepoint. Unfortunately, many changepoint times are undocumented, and the changepoint times themselves are the main object of study. In this article, changepoint analysis in two-phase linear regression models is developed and discussed. Following the idea of Liu and Qian (2010) for segmented linear regression models, a modified empirical likelihood ratio statistic is proposed to test whether a changepoint exists during the long-term experiment and observation. The modified empirical likelihood ratio statistic is computationally friendly, and its p-value can be easily approximated from its large-sample properties. The procedure is applied to the Old Faithful geyser eruption data from October 1980.
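To make the two-phase regression setting concrete, the sketch below scans candidate changepoints with an ordinary Gaussian likelihood-ratio statistic that compares one fitted line against two lines split at each candidate. It only illustrates the model structure; the article's modified empirical likelihood ratio statistic and its large-sample p-value approximation are not reproduced, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def two_phase_lr_scan(x, y, min_seg=5):
    """Scan candidate changepoints in a two-phase linear regression.

    Returns the split maximizing an ordinary (Gaussian) likelihood-ratio
    style statistic, n*(log RSS0 - log RSS1(k)), where RSS0 is the
    residual sum of squares of a single fitted line and RSS1(k) that of
    two lines split at k.  This is a plain LR scan, not the modified
    empirical likelihood statistic of the article.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)

    def rss(xs, ys):
        X = np.column_stack([np.ones_like(xs), xs])
        beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
        r = ys - X @ beta
        return float(r @ r)

    rss0 = rss(x, y)
    best_k, best_stat = None, -np.inf
    for k in range(min_seg, n - min_seg):
        rss1 = rss(x[:k], y[:k]) + rss(x[k:], y[k:])
        stat = n * (np.log(rss0) - np.log(rss1))
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat

# Synthetic example with a slope/level change at index 60
rng = np.random.default_rng(0)
x = np.arange(100.0)
y = np.where(x < 60, 1.0 + 0.05 * x, 4.0 - 0.03 * (x - 60)) + rng.normal(0, 0.3, 100)
print(two_phase_lr_scan(x, y))
```

For the Old Faithful application, x and y would be replaced by the observed eruption series; here a synthetic series with a known break at index 60 is used instead.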
Funding: Supported by the National Natural Science Foundation of China (40605021) and the National Science and Technology Support Program (2007BAC29B01).
Abstract: Annually averaged daily maximum and minimum surface temperatures from southeastern China were evaluated for artificial discontinuities using three different tests for undocumented changepoints. Changepoints in the time series were identified by comparing each target series to a reference calculated from values observed at a number of nearby stations. Under the assumption that no trend was present in the sequence of target-reference temperature differences, a changepoint was assigned to the target series when at least two of the three tests rejected the null hypothesis of no changepoint at approximately the same position in the difference series. Each target series was then adjusted using a procedure that accounts for discontinuities in average temperature values from nearby stations that otherwise could bias estimates of the magnitude of the target series step change. A spatial comparison of linear temperature trends in the adjusted annual temperature series suggests that major relative discontinuities were removed in the homogenization process. A greater number of relative changepoints were detected in annual average minimum than in average maximum temperature series. Some evidence is presented which suggests that minimum surface temperature fields may be more sensitive to changes in measurement practice than maximum temperature fields. In addition, given previous evidence of urban heat island (i.e., local) trends in this region, the assumption of no slope in a target-reference difference series is likely to be violated more frequently in minimum than in maximum temperature series. Consequently, there may be greater potential to confound trend and step changes in minimum temperature series.
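A minimal sketch of the detection logic described above, under stated assumptions: the reference is a plain mean of nearby-station series, two simple scan statistics (a standard-normal-homogeneity-type scan and a maximum two-sample t scan) stand in for the study's three tests, and the critical values, agreement tolerance, and all names are illustrative rather than the paper's.

```python
import numpy as np

def reference_series(neighbors):
    """Reference built from nearby stations (a simple mean here; the
    study's construction may weight or select stations differently)."""
    return np.mean(np.asarray(neighbors, float), axis=0)

def snht_like(diff, crit=20.0):
    """Illustrative detector 1: standard-normal-homogeneity-type scan of
    the standardized difference series.  Returns (most likely changepoint
    index, rejected?) with an illustrative threshold `crit`."""
    z = (diff - diff.mean()) / diff.std(ddof=1)
    n = len(z)
    stats = [k * z[:k].mean() ** 2 + (n - k) * z[k:].mean() ** 2
             for k in range(2, n - 1)]
    k = int(np.argmax(stats)) + 2
    return k, max(stats) > crit

def maxt_like(diff, crit=3.5):
    """Illustrative detector 2: maximum two-sample t statistic over all
    split points of the difference series."""
    d = np.asarray(diff, float)
    n = len(d)
    best_k, best_t = None, 0.0
    for k in range(3, n - 3):
        a, b = d[:k], d[k:]
        s = np.sqrt(a.var(ddof=1) / k + b.var(ddof=1) / (n - k))
        t = abs(a.mean() - b.mean()) / s
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t > crit

def agreed_changepoint(target, neighbors, detectors, tol=2):
    """Assign a changepoint to the target only when at least two detectors
    reject no-change at approximately the same position (within `tol`
    steps) in the target-minus-reference difference series."""
    diff = np.asarray(target, float) - reference_series(neighbors)
    hits = [idx for det in detectors for idx, rej in [det(diff)] if rej]
    for a in hits:
        close = [b for b in hits if abs(b - a) <= tol]
        if len(close) >= 2:
            return int(round(np.median(close)))
    return None

# Example: a target with a +1.0 step at year 30 against three stable neighbors
rng = np.random.default_rng(2)
neighbors = rng.normal(15.0, 0.3, size=(3, 60))
target = neighbors.mean(axis=0) + rng.normal(0, 0.2, 60)
target[30:] += 1.0
print(agreed_changepoint(target, neighbors, detectors=[snht_like, maxt_like]))
```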
Abstract: Count data are almost always over-dispersed, with the variance exceeding the mean. Several count data models have been proposed by researchers, but the problem of over-dispersion remains unresolved, especially in the context of change point analysis. This study develops a likelihood-based algorithm that detects and estimates multiple change points in a set of count data assumed to follow the Negative Binomial distribution. Discrete change point procedures discussed in the literature work well for equi-dispersed data. The new algorithm produces reliable estimates of change points for both equi-dispersed and over-dispersed count data; hence its advantage over other count data change point techniques. The Negative Binomial Multiple Change Point Algorithm was tested on simulated data for different sample sizes and varying positions of change. Changes in the distribution parameters were detected and estimated by conducting a likelihood ratio test on several partitions of the data obtained through step-wise recursive binary segmentation. Critical values for the likelihood ratio test were developed and used to check the significance of the maximum likelihood estimates of the change points. The change point algorithm was found to work best for large datasets, though it also works well for small and medium-sized datasets with little to no error in the location of change points. The algorithm correctly detects changes when they are present and does not report changes when none are present. Power analysis of the likelihood ratio test for change was performed through Monte Carlo simulation in the single change point setting. Sensitivity analysis of the test power showed that the likelihood ratio test is most powerful when the simulated change points are located mid-way through the sample data, as opposed to when changes are located in the periphery. Further, the test is more powerful when the change is located three-quarters of the way through the sample data than when the change point is closer (a quarter of the way) to the first observation.
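The sketch below illustrates the two ingredients described above, a Negative Binomial likelihood ratio statistic and step-wise recursive binary segmentation, under simplifying assumptions: method-of-moments segment fits stand in for full maximum likelihood, and the critical value is a user-supplied placeholder rather than the simulated critical values developed in the study.

```python
import numpy as np
from scipy.stats import nbinom

def nb_loglik(x):
    """Negative Binomial log-likelihood of a segment using method-of-
    moments estimates of (r, p); the actual algorithm uses maximum
    likelihood, so this is a simplification."""
    x = np.asarray(x)
    m, v = x.mean(), x.var(ddof=1)
    v = max(v, m + 1e-8)                  # guard against under-dispersion
    p = m / v                             # scipy's (r, p) parameterization
    r = m * p / (1.0 - p)
    return nbinom.logpmf(x, r, p).sum()

def binary_segmentation(x, crit, min_seg=10, offset=0, found=None):
    """Step-wise recursive binary segmentation: place a single change at
    the split maximizing the likelihood-ratio statistic, keep it if the
    statistic exceeds `crit` (a placeholder for the simulated critical
    values), and recurse on both halves."""
    if found is None:
        found = []
    x = np.asarray(x)
    n = len(x)
    if n < 2 * min_seg + 1:
        return found
    full = nb_loglik(x)
    stats = [2.0 * (nb_loglik(x[:k]) + nb_loglik(x[k:]) - full)
             for k in range(min_seg, n - min_seg)]
    k = int(np.argmax(stats)) + min_seg
    if stats[k - min_seg] > crit:
        found.append(offset + k)
        binary_segmentation(x[:k], crit, min_seg, offset, found)
        binary_segmentation(x[k:], crit, min_seg, offset + k, found)
    return sorted(found)

# Example: over-dispersed counts with a shift in mean at index 100
rng = np.random.default_rng(1)
x = np.concatenate([rng.negative_binomial(5, 0.5, 100),   # mean ~5
                    rng.negative_binomial(5, 0.2, 100)])  # mean ~20
print(binary_segmentation(x, crit=10.0))
```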
Abstract: The Negative Binomial Multiple Change Point Algorithm (NBMCPA) is a hybrid change detection and estimation approach that works well for overdispersed and equidispersed count data. This simulation study assesses the performance of the NBMCPA under varying sample sizes and locations of true change points. Various performance metrics are calculated from the change point estimates and used to assess how well the model identifies change points. Estimation errors are obtained as the absolute deviations of the change points estimated by the algorithm from the known change points. Algorithm robustness is evaluated through error analysis, visualization techniques including kernel density estimation, and computation of metrics such as change point location accuracy, precision, sensitivity, and false positive rate. The results show that the model consistently detects change points that are present and does not erroneously detect changes where there are none. Change point location accuracy and precision of the NBMCPA increase with sample size, with the best results for medium and large samples. Further, model accuracy and precision are highest for changes located in the middle of the dataset compared to changes located in the periphery.
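As an illustration of the kind of performance metrics described above, the sketch below matches estimated change points to known ones within a tolerance and reports mean absolute location error, sensitivity, precision, and a false-positive proxy; the matching rule, tolerance, and exact metric definitions are assumptions for illustration, not the study's formulas.

```python
import numpy as np

def changepoint_metrics(true_cps, est_cps, tol=3):
    """Greedily match each known change point to the closest unmatched
    estimate within `tol` observations, then summarize the match."""
    true_cps, est_cps = list(true_cps), list(est_cps)
    errors, matched_true, matched_est = [], set(), set()
    for i, t in enumerate(true_cps):
        devs = [(abs(e - t), j) for j, e in enumerate(est_cps) if j not in matched_est]
        if devs:
            d, j = min(devs)
            if d <= tol:
                errors.append(d)
                matched_true.add(i)
                matched_est.add(j)
    tp = len(matched_true)                      # true changes recovered
    fp = len(est_cps) - len(matched_est)        # spurious detections
    return {
        "mean_abs_error": float(np.mean(errors)) if errors else float("nan"),
        "sensitivity": tp / len(true_cps) if true_cps else float("nan"),
        "precision": tp / len(est_cps) if est_cps else float("nan"),
        "false_positive_proxy": fp / len(est_cps) if est_cps else 0.0,
    }

# Example: two known change points, three estimates (one spurious)
print(changepoint_metrics(true_cps=[100, 200], est_cps=[99, 150, 201]))
```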