In this paper, we consider the limit distribution of the error density function estimator in first-order autoregressive models with negatively associated and positively associated random errors. Under mild regularity assumptions, some asymptotic normality results for the residual density estimator are obtained when the autoregressive model is a stationary process and when it is an explosive process. To illustrate these results, simulations covering confidence intervals and mean integrated squared errors are provided. They show that the residual density estimator can replace the density "estimator" which contains the errors.
Based on an analysis of the limitations of the commonly used back-propagation neural network (BPNN), a wavelet neural network (WNN) is adopted as the nonlinear river channel flood forecasting method in place of the BPNN. The WNN has the characteristics of fast convergence and improved capability of nonlinear approximation. To adapt to the time-varying characteristics of flood routing, the WNN is coupled with an AR real-time correction model. The AR model is used to calculate the forecast error. The coefficients of the AR real-time correction model are dynamically updated by an adaptive fading factor recursive least squares (RLS) method. The application of the flood forecasting method to the cross section of the Xijiang River at Gaoyao shows its effectiveness.
Wavelets are applied to the detection of the jump points of a regression function in the nonlinear autoregressive model x(t) = T(x(t-1)) + ε_t. By checking the empirical wavelet coefficients of the data that have significantly large absolute values across fine scale levels, the number of jump points and the locations where the jumps occur are estimated. The jump heights are also estimated. All estimators are shown to be consistent. The wavelet method is also applied to the threshold AR(1) model (TAR(1)). Simple estimators of the thresholds are given, which are shown to be consistent.
In this paper, we not only construct the confidence region for parameters in a mixed integer-valued autoregressive process using the empirical likelihood method, but also establish the empirical log-likelihood ratio statistic and obtain its limiting distribution. Then, via simulation studies, we give coverage probabilities for the parameters of interest. The results show that the empirical likelihood method performs very well.
Consider the model $Y_t = \beta Y_{t-1} + g(Y_{t-2}) + \varepsilon_t$ for $3 \le t \le T$. Here $g$ is an unknown function, $\beta$ is an unknown parameter, the $\varepsilon_t$ are i.i.d. random errors with mean 0, variance $\sigma^2$ and fourth moment $\alpha_4$, and $\varepsilon_t$ is independent of $Y_s$ for all $t \ge 3$ and $s = 1, 2$. Pseudo-LS estimators $\hat{\sigma}_T^2$, $\hat{\alpha}_{4T}$ and $\hat{D}_T^2$ of $\sigma^2$, $\alpha_4$ and $\mathrm{Var}(\varepsilon_3^2)$, respectively, are constructed based on a piecewise polynomial approximator of $g$. The weak consistency of $\hat{\alpha}_{4T}$ and $\hat{D}_T^2$ is proved. The asymptotic normality of $\hat{\sigma}_T^2$ is given, i.e., $\sqrt{T}(\hat{\sigma}_T^2 - \sigma^2)/\hat{D}_T$ converges in distribution to $N(0, 1)$. The result can be used to establish large sample interval estimates of $\sigma^2$ or to make large sample tests for $\sigma^2$.
The classical autoregressive (AR) model has been widely applied for over five decades to predict future data using m past observations. As the classical AR model requires m unknown parameters, this paper implements the AR model by reducing the m parameters to two parameters, obtaining a new model with an optimal delay called the m-delay AR model. We derive the m-delay AR formula for approximating the two unknown parameters based on the least squares method and develop an algorithm to determine the optimal delay based on a brute-force technique. The performance of the m-delay AR model was tested by comparing it with the classical AR model. The results, obtained from Monte Carlo simulation using the monthly mean minimum temperature in Perth, Western Australia, from the Bureau of Meteorology, show no significant difference from those obtained with the classical AR model. This confirms that the m-delay AR model is an effective model for time series analysis.
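The abstract above does not give the m-delay AR formula, so the following is only a minimal sketch of the general idea it describes: a two-parameter least-squares fit combined with a brute-force search over the delay. The model form x[t] = c + phi * x[t - d], the function names `fit_two_param` and `best_delay`, and the synthetic series are all assumptions made for illustration, not the authors' method or data.

```python
import math
import random


def fit_two_param(x, d, start):
    """Least-squares fit of x[t] = c + phi * x[t - d] for t = start .. len(x) - 1.

    Hypothetical two-parameter form (assumption). `start` must be >= d so that
    every lagged value exists. Returns (c, phi, mse).
    """
    ys = x[start:]
    zs = x[start - d:len(x) - d]
    n = len(ys)
    my = sum(ys) / n
    mz = sum(zs) / n
    szz = sum((z - mz) ** 2 for z in zs)
    szy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    phi = szy / szz
    c = my - phi * mz
    mse = sum((y - c - phi * z) ** 2 for z, y in zip(zs, ys)) / n
    return c, phi, mse


def best_delay(x, max_delay):
    """Brute-force search over delays 1..max_delay; returns (delay, c, phi, mse).

    All delays are scored on the same targets x[max_delay:], so the mean
    squared errors are directly comparable.
    """
    best = None
    for d in range(1, max_delay + 1):
        c, phi, mse = fit_two_param(x, d, start=max_delay)
        if best is None or mse < best[3]:
            best = (d, c, phi, mse)
    return best


# Synthetic series with a 12-step cycle (illustrative data only).
rng = random.Random(0)
series = [
    10
    + 5 * math.sin(2 * math.pi * t / 12)
    + 2 * math.sin(2 * math.pi * t / 6)
    + rng.gauss(0, 0.3)
    for t in range(240)
]
delay, c, phi, mse = best_delay(series, max_delay=12)
```

On this synthetic series the search selects the full cycle length as the delay, because the value one cycle earlier is the best single linear predictor. A classical AR(m) fit would instead estimate one coefficient per lag.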
Monitoring changes of distribution in time series models is an important issue. This paper proposes a procedure for monitoring changes in the error distribution of autoregressive time series, which is based on a weighted empirical process of residuals with weights equal to the regressors. The asymptotic properties of our monitoring statistic are derived under the null hypothesis of no change in distribution. The finite sample properties are investigated by simulation. As it turns out, the procedure is able to detect not only distributional changes but also changes in the regression coefficient and mean. Finally, we apply the statistic to a group of financial data.
The Extended Exponentially Weighted Moving Average (extended EWMA) control chart can be used to quickly detect a small shift. The performance of control charts can be evaluated with the average run length (ARL). Explicit formulas for the ARL on a two-sided extended EWMA control chart for the trend autoregressive, or trend AR(p), model have not been reported previously. The aim of this study is to derive the explicit formulas for the ARL on a two-sided extended EWMA control chart for the trend AR(p) model, as well as for the trend AR(1) and trend AR(2) models, with exponential white noise. The accuracy of the analytical solution was obtained with the extended EWMA control chart and was compared to the numerical integral equation (NIE) method. The results show that the ARL obtained by the explicit formula and by the NIE method hardly differ, but the explicit formula can help decrease the computational (CPU) time. Furthermore, the study is extended to a performance comparison with the Exponentially Weighted Moving Average (EWMA) control chart. The extended EWMA control chart performs better than the EWMA control chart in all situations, for both the trend AR(1) and trend AR(2) models. Finally, the analytical solution of the ARL is applied to real-world data in the health field, such as COVID-19 data in the United Kingdom and Sweden, to demonstrate the efficacy of the proposed method.
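To make the run-length idea concrete, here is a minimal sketch of an EWMA-type chart that reports the observation at which the statistic first leaves the control limits. The recursion Z_t = lam1 * X_t - lam2 * X_(t-1) + (1 - lam1 + lam2) * Z_(t-1) is the form commonly cited for the extended EWMA and is an assumption here, since the abstract does not state it. With lam2 = 0 it reduces to the classical EWMA. The function name, starting values and control limits are illustrative, and the explicit ARL formulas of the paper are not reproduced.

```python
def ewma_run_length(xs, lam1, lower, upper, lam2=0.0, z0=0.0, x0=0.0):
    """Return the 1-based index of the first out-of-control signal, or None.

    Assumed recursion (not taken from the abstract):
        Z_t = lam1 * X_t - lam2 * X_{t-1} + (1 - lam1 + lam2) * Z_{t-1}
    lam2 = 0 gives the classical EWMA statistic.
    """
    z, x_prev = z0, x0
    for t, x in enumerate(xs, start=1):
        z = lam1 * x - lam2 * x_prev + (1.0 - lam1 + lam2) * z
        x_prev = x
        if z < lower or z > upper:
            return t
    return None


# Ten in-control observations followed by a sustained upward shift.
shifted = [0.0] * 10 + [5.0] * 10
signal_at = ewma_run_length(shifted, lam1=0.2, lower=-2.0, upper=2.0)
```

The ARL is the expected value of this run length. A simulation estimate would average `ewma_run_length` over many generated series, which is the kind of computation that an explicit formula avoids.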
The use of historical data is important in making predictions, for instance of the exchange rate. However, in the construction of a model, extreme or dirty data are inevitable. In this study, an AR model is used with historical USD/MYR exchange rate data (January 2007 until December 2007), divided into 1-, 3- and 6-month horizons. Since the presence of extreme data affects the accuracy of the prediction results, the bootstrap approach was hybridized with the AR model to obtain more accurate predictions, and the resulting model is termed the Bootstrap Autoregressive (BAR) model. The effectiveness of the proposed model is investigated by comparing the existing and the proposed models using the statistical performance measures RMSE, MAE and MAD. The comparison involves 1%, 5% and 10% for each horizon. The results showed that the BAR model performed better than the AR model in terms of sensitivity to extreme data, forecasting accuracy, efficiency and predictability. In conclusion, the bootstrap method can alleviate the sensitivity of the model to extreme data, thereby improving the accuracy of the forecasting model, which also has high prediction efficiency and increased predictability.
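The abstract does not specify how the bootstrap is combined with the AR model, so the following is a sketch of one standard option, a residual bootstrap for an AR(1) model fitted by least squares. The function names, the simulated data, the number of replications and the percentile interval are assumptions for illustration and should not be read as the BAR procedure of the study.

```python
import random


def fit_ar1(x):
    """Least-squares AR(1) fit x[t] = c + phi * x[t-1]; returns (c, phi, residuals)."""
    ys, zs = x[1:], x[:-1]
    n = len(ys)
    my, mz = sum(ys) / n, sum(zs) / n
    szz = sum((z - mz) ** 2 for z in zs)
    szy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    phi = szy / szz
    c = my - phi * mz
    resid = [y - c - phi * z for z, y in zip(zs, ys)]
    return c, phi, resid


def bootstrap_ar1(x, n_boot, rng):
    """Residual bootstrap: rebuild series from resampled residuals and refit.

    Returns (phi_hat, mean of bootstrap estimates, 95% percentile interval).
    """
    c, phi, resid = fit_ar1(x)
    phis = []
    for _rep in range(n_boot):
        sim = [x[0]]
        for _step in range(len(x) - 1):
            sim.append(c + phi * sim[-1] + rng.choice(resid))
        phis.append(fit_ar1(sim)[1])
    phis.sort()
    lo = phis[int(0.025 * n_boot)]
    hi = phis[int(0.975 * n_boot) - 1]
    return phi, sum(phis) / n_boot, (lo, hi)


# Simulated AR(1) data with coefficient 0.6 (illustrative, not exchange rates).
rng = random.Random(1)
data = [0.0]
for _ in range(299):
    data.append(0.6 * data[-1] + rng.gauss(0, 1))
phi_hat, phi_boot_mean, (ci_lo, ci_hi) = bootstrap_ar1(data, n_boot=200, rng=rng)
```

Averaging the refitted coefficients, or forecasting from each replicate and averaging the forecasts, is one way such a scheme can dampen the influence of a few extreme observations.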
This paper proposes a new method for extracting ENF (electric network frequency) fluctuations from digital audio recordings for the purpose of forensic authentication. It is shown that the extraction of ENF components from audio recordings is realizable by applying a parametric approach based on an AR (autoregressive) model. The proposed method is compared to the existing STFT (short-time Fourier transform) based ENF extraction method. Experimental results from recorded electrical grid signals and recorded audio signals show that the proposed approach can improve the time resolution of the extracted ENF fluctuations and improve the detection of tampering with short alterations in longer audio recordings.
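As an illustration of how a parametric AR fit can yield a frequency estimate, the sketch below fits an AR(2) model by least squares and reads the frequency from the angle of the complex pole pair. This is a textbook construction under the assumption of a single dominant sinusoid. The function name `ar2_frequency`, the sampling rate and the test tone are hypothetical, and the paper's actual model order and preprocessing are not given in the abstract.

```python
import math


def ar2_frequency(x, fs):
    """Estimate the dominant frequency (Hz) from a least-squares AR(2) fit.

    Model: x[n] = a1 * x[n-1] + a2 * x[n-2]. For a pure sinusoid the poles of
    z^2 - a1*z - a2 lie at exp(+/- j*omega), so
    omega = acos(a1 / (2 * sqrt(-a2))).
    """
    s11 = s12 = s22 = b1 = b2 = 0.0
    for n in range(2, len(x)):
        p, q, y = x[n - 1], x[n - 2], x[n]
        s11 += p * p
        s12 += p * q
        s22 += q * q
        b1 += p * y
        b2 += q * y
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s11 * s22 - s12 * s12
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (s11 * b2 - s12 * b1) / det
    if a1 * a1 + 4.0 * a2 >= 0.0:
        raise ValueError("AR(2) poles are real; no oscillation found")
    omega = math.acos(a1 / (2.0 * math.sqrt(-a2)))
    return omega * fs / (2.0 * math.pi)


# A noiseless 50 Hz tone sampled at 1 kHz (illustrative signal).
fs = 1000.0
tone = [math.cos(2 * math.pi * 50.0 * n / fs + 0.3) for n in range(200)]
estimate = ar2_frequency(tone, fs)
```

Applying such an estimator to short consecutive frames gives a frequency track over time. Because the estimate does not depend on a Fourier bin width, short frames remain usable, which is consistent with the time-resolution benefit the abstract reports.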
This paper discusses the identification of the inter-electrode gap size in high-frequency group pulse micro-electrochemical machining (HGPECM). Auto-regressive (AR) models of the group pulse current flowing across the cathode and the anode are created under different conditions, with different processing parameters and inter-electrode gap sizes. The AR models based on the current signals indicate that the model order differs markedly with the processing conditions and the inter-electrode gap size. The stability of the dynamic system also differs, i.e., the white noise response of the Green's function of the dynamic system varies. In addition, the power spectrum method is used to analyze the dynamic time series of the current signals for different inter-electrode gap sizes. The results show that, in the range of 0~5 kHz, the current signals exhibit a strongest power spectrum peak, the characteristic power spectrum (CPS), which is related to the inter-electrode gap size. Therefore, the CPS of the current signals can be used to identify the inter-electrode gap.
In this article, we study the empirical likelihood (EL) method for autoregressive models with spatial errors. The EL ratio statistics are constructed for the parameters of the models. It is shown that the limiting distributions of the EL ratio statistics are chi-square distributions, which are used to construct confidence intervals for the parameters of the models. A simulation study is conducted to compare the performance of the EL-based and the normal approximation (NA) based confidence intervals. Simulation results show that the confidence intervals based on EL are superior to the NA-based confidence intervals.
Since OpenAI opened access to ChatGPT, large language models (LLMs) have become an increasingly popular topic, attracting researchers' attention from many domains. However, public researchers face problems when developing LLMs, given that most LLMs are produced by industry and the training details are typically not revealed. Since datasets are an important part of the LLM setup, this paper presents a holistic survey of the training datasets used in both the pre-training and fine-tuning processes. The paper first summarizes 16 pre-training datasets and 16 fine-tuning datasets used in state-of-the-art LLMs. Secondly, based on the properties of the pre-training and fine-tuning processes, it comments on pre-training datasets in terms of quality, quantity, and relation with models, and on fine-tuning datasets in terms of quality, quantity, and concerns. The study then critically identifies the problems and research trends that exist in current LLM datasets. It helps public researchers train and investigate LLMs through visual cases and provides useful comments to the research community regarding data development. To the best of our knowledge, this paper is the first to summarize and discuss datasets used in both autoregressive and chat LLMs. The survey offers insights and suggestions to researchers and LLM developers as they build their models, and contributes to LLM research by pointing out existing problems from the perspective of data.
This paper considers the random coefficient autoregressive model with time-functional variance noises, hereafter the RCA-TFV model. We first establish the consistency and asymptotic normality of the conditional least squares estimator for the constant coefficient. The semiparametric least squares estimator for the variance of the random coefficient and the nonparametric estimator for the variance function are constructed, and their asymptotic results are reported. A simulation study is presented along with an analysis of real data to assess the performance of our method in finite samples.
The rapid expansion of tobacco farming poses a significant threat to biodiversity in Yunnan Province, China, a region known for its rich biodiversity. This study aims to understand the trade-offs between tobacco farming and higher plant species diversity, and to identify priority counties for conservation. We employed an integrated approach combining species distribution modeling, GIS overlay analysis, and spatial regression to empirically assess the impact of tobacco farming intensity on biodiversity risk. Our findings reveal a compelling negative spatial correlation between tobacco farming expansion and higher plant species diversity. Specifically, southern counties in Wenshan and Honghe prefectures are major priority areas for conservation that exhibit significant spatial correlations between biodiversity risks and high tobacco farming intensity. Quantitatively, at the county level, a 1% increase in tobacco farming area corresponds to a 0.094% decrease in endemic higher plant species richness across the entire province. These results underscore the need for targeted and region-specific regulations to mitigate biodiversity loss and promote sustainable development in Yunnan Province. The integrated approach used in this study provides a comprehensive assessment of the tobacco-biodiversity trade-offs, offering actionable insights for policymaking.
The energy sector is the second largest emitter of greenhouse gases (GHG) in Kenya, accounting for about 31.2% of GHG emissions in the country. The aim of this study was to model Kenya's GHG emissions by the energy sector using ARIMA models in order to forecast future values. The data used were Kenya's GHG emissions by the energy sector for the period from 1970 to 2022, obtained from the International Monetary Fund (IMF) database and split into training and testing sets using the 80/20 rule for modelling purposes. The best specification for the ARIMA model was identified using the Akaike Information Criterion (AIC), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and mean absolute scaled error (MASE). ARIMA(1, 1, 1) was identified as the best model for modelling Kenya's GHG emissions and forecasting future values. Using this model, Kenya's GHG emissions by the energy sector were forecast to increase to about 43.13 million metric tons of carbon dioxide equivalents by 2030. The study therefore recommends that Kenya accelerate the adjustment of its industry structure, improve the efficient use of energy, optimize the energy structure, and accelerate the development and promotion of energy-efficient products to reduce GHG emissions from the country's energy sector.
Kernel-based slow feature analysis (SFA) methods have been successfully applied in the field of industrial process fault detection. However, kernel-based SFA methods have high computational complexity when dealing with nonlinearity, leading to delays in detecting time-varying data features. Additionally, the uncertain kernel function and kernel parameters limit the ability of the extracted features to express process characteristics, resulting in poor fault detection performance. To alleviate these problems, a novel randomized auto-regressive dynamic slow feature analysis (RRDSFA) method is proposed to simultaneously monitor operating point deviations and process dynamic faults, enabling real-time monitoring of data features in industrial processes. Firstly, the proposed random Fourier mapping-based method achieves a more effective nonlinear transformation, in contrast to the current kernel-based RDSFA algorithm, which may lead to significant computational complexity. Secondly, a randomized RDSFA model is developed to extract nonlinear dynamic slow features. Furthermore, a Bayesian inference-based overall fault monitoring model including all RRDSFA sub-models is developed to overcome the randomness of random Fourier mapping. Finally, the superiority and effectiveness of the proposed monitoring method are demonstrated through a numerical case and a simulation of a continuous stirred tank reactor.
As one of the main atmospheric pollutants, PM2.5 severely affects human health and has received widespread attention in recent years. How to predict variations in PM2.5 concentrations with high accuracy is an important topic. The PM2.5 monitoring stations in Xinjiang Uygur Autonomous Region, China, are unevenly distributed, which makes it challenging to conduct comprehensive analyses and predictions. Therefore, this study primarily addresses this limitation and the poor generalization ability of PM2.5 concentration prediction models across different monitoring stations. We chose the northern slope of the Tianshan Mountains as the study area and January to December 2019 as the research period. On the basis of data from 21 PM2.5 monitoring stations as well as meteorological data (temperature, instantaneous wind speed, and pressure), we developed an improved model, namely GCN-TCN-AR (where GCN is the graph convolution network, TCN is the temporal convolutional network, and AR is the autoregression), for predicting PM2.5 concentrations on the northern slope of the Tianshan Mountains. The GCN-TCN-AR model is composed of an improved GCN model, a TCN model, and an AR model. The results revealed that the R² values predicted by the GCN-TCN-AR model at the four monitoring stations (Urumqi, Wujiaqu, Shihezi, and Changji) were 0.93, 0.91, 0.93, and 0.92, respectively, and the RMSE (root mean square error) values were 6.85, 7.52, 7.01, and 7.28 μg/m³, respectively. The performance of the GCN-TCN-AR model was also compared with current models, including the GCN-TCN, GCN, TCN, Support Vector Regression (SVR), and AR. The GCN-TCN-AR outperformed these models, with high prediction accuracy and good stability, making it especially suitable for predicting PM2.5 concentrations. This study revealed significant spatiotemporal variations in PM2.5 concentrations. First, the PM2.5 concentrations exhibited clear seasonal fluctuations, with higher levels typically observed in winter and differences between months. Second, the spatial distribution analysis revealed that cities such as Urumqi and Wujiaqu have high PM2.5 concentrations, with a noticeable geographical clustering of pollution. Understanding the variations in PM2.5 concentrations is highly important for the sustainable development of the ecological environment in arid areas.
This paper is devoted to the goodness-of-fit test for general autoregressive models in time series. By averaging the weighted residuals, we construct a score-type test which is asymptotically standard chi-squared under the null and has some desirable power properties under the alternatives. Specifically, the test is sensitive to alternatives and can detect alternatives approaching the null, along a direction, at a rate that is arbitrarily close to n^(-1/2). Furthermore, when the alternatives are not directional, we construct asymptotically distribution-free maximin tests for a large class of alternatives. The performance of the tests is evaluated through simulation studies.
This paper introduces a new model-based soft decoding technique to restore widely used Joint Photographic Experts Group (JPEG) streams. The image is modeled as a two-dimensional (2D) piecewise stationary autoregressive process, and the decoding task is formulated as a constrained optimization problem. All the constraints are given by the quantization intervals, which are freely available at the decoder. The autoregressive model serves as an important regularization term in the objective function of the optimization, and the model parameters are solved locally on the decoded image using a weighted total least squares method. In addition, a novel bilateral dual-side weighting scheme is proposed to minimize the influence of blocking artifacts on the accuracy of parameter estimation. Extensive experimental results suggest that the proposed algorithm systematically improves the quality of JPEG images and also outperforms existing JPEG postprocessing algorithms over a wide bit-rate range, in terms of both peak signal-to-noise ratio (PSNR) and subjective quality.
Funding (error density estimation in first-order autoregressive models): Supported by the National Natural Science Foundation of China (12131015, 12071422).
Funding (wavelet neural network flood forecasting): The National Natural Science Foundation of China (No. 50479017).
Funding (empirical likelihood for the mixed integer-valued autoregressive process): Supported by the National Natural Science Foundation of China (11731015, 11571051, J1310022, 11501241); Natural Science Foundation of Jilin Province (20150520053JH, 20170101057JC, 20180101216JC); Program for Changbaishan Scholars of Jilin Province (2015010); Science and Technology Program of Jilin Educational Department during the "13th Five-Year" Plan Period (2016-399); Science and Technology Research Program of Education Department in Jilin Province for the 13th Five-Year Plan (2016213).
Funding (error variance estimation in the autoregressive model with unknown function g): Supported by the National Natural Science Foundation of China (60375003) and the Chinese Aviation Foundation (03153059).
Funding (monitoring changes in the error distribution of autoregressive time series): Supported by the National Natural Science Foundation of China (Grant No. 11301291) and the Open Fund of State Key Laboratory of Remote Sensing Science of China (Grant No. OFSLRSS201206).
Funding (ARL of the extended EWMA control chart for the trend AR(p) model): Thailand Science Research and Innovation Fund, and King Mongkut's University of Technology North Bangkok, Contract No. KMUTNB-FF-65-45.
文摘The Extended Exponentially Weighted Moving Average(extended EWMA)control chart is one of the control charts and can be used to quickly detect a small shift.The performance of control charts can be evaluated with the average run length(ARL).Due to the deriving explicit formulas for the ARL on a two-sided extended EWMA control chart for trend autoregressive or trend AR(p)model has not been reported previously.The aim of this study is to derive the explicit formulas for the ARL on a two-sided extended EWMA con-trol chart for the trend AR(p)model as well as the trend AR(1)and trend AR(2)models with exponential white noise.The analytical solution accuracy was obtained with the extended EWMA control chart and was compared to the numer-ical integral equation(NIE)method.The results show that the ARL obtained by the explicit formula and the NIE method is hardly different,but the explicit for-mula can help decrease the computational(CPU)time.Furthermore,this is also expanded to comparative performance with the Exponentially Weighted Moving Average(EWMA)control chart.The performance of the extended EWMA control chart is better than the EWMA control chart for all situations,both the trend AR(1)and trend AR(2)models.Finally,the analytical solution of ARL is applied to real-world data in the healthfield,such as COVID-19 data in the United Kingdom and Sweden,to demonstrate the efficacy of the proposed method.
文摘The use of historical data is important in making the predictions, for instance in the exchange rate. However, in the construction of a model, extreme data or dirtiness of data is inevitable. In this study, AR model is used with the exchange rate historical data (January 2007 until December 2007) for USD/MYR and is divided into 1-, 3- and 6-horizontal months respectively. Since the presence of extreme data will affect the accuracy of the results obtained in a prediction. Therefore, to obtain a more accurate prediction results, the bootstrap approach was implemented by hybrid with AR model coins as the Bootstrap Autoregressive model (BAR). The effectiveness of the proposed model is investigated by comparing the existing and the proposed model through the statistical performance methods which are RMSE, MAE and MAD. The comparison involves 1%, 5% and 10% for each horizontal month. The results showed that the BAR model performed better than the AR model in terms of sensitivity to extreme data, the accuracy of forecasting models, efficiency and predictability of the model prediction. In conclusion, bootstrap method can alleviate the sensitivity of the model to the extreme data, thereby improving the accuracy of forecasting model which also have high prediction efficiency and that can increase the predictability of the model.
Abstract: This paper proposes a new method for extracting ENF (electric network frequency) fluctuations from digital audio recordings for the purpose of forensic authentication. It is shown that ENF components can be extracted from audio recordings by applying a parametric approach based on an AR (autoregressive) model. The proposed method is compared with the existing STFT (short-time Fourier transform) based ENF extraction method. Experimental results on recorded electrical grid signals and recorded audio signals show that the proposed approach can improve the time resolution of the extracted ENF fluctuations and improve the detection of tampering involving short alterations in longer audio recordings.
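A parametric frequency estimate of this kind can be sketched as follows. A noise-free sinusoid with angular frequency ω satisfies x[t] = 2cos(ω)·x[t-1] − x[t-2], so the pole angle of a fitted AR(2) model gives the frequency. This is a textbook illustration rather than the estimator used in the paper. The function names `ar2_frequency` and `track_frequency`, the AR order 2, and the frame parameters are assumptions, and the input is assumed to be free of DC offset and already band-pass filtered around the nominal mains frequency.

```python
import numpy as np

def ar2_frequency(x, fs):
    """Estimate the dominant frequency (Hz) from the pole angle of an AR(2) fit.

    Assumes x is a DC-free, narrow-band signal sampled at fs Hz.
    """
    x = np.asarray(x, dtype=float)
    # Regress x[t] on x[t-1] and x[t-2] by least squares.
    X = np.column_stack([x[1:-1], x[:-2]])
    y = x[2:]
    (a1, a2), *_ = np.linalg.lstsq(X, y, rcond=None)
    # Poles are the roots of z^2 - a1*z - a2 = 0.
    poles = np.roots([1.0, -a1, -a2])
    omega = abs(np.angle(poles[0]))  # radians per sample
    return omega * fs / (2.0 * np.pi)

def track_frequency(x, fs, frame_len, hop):
    """Apply ar2_frequency to successive frames to obtain a frequency track."""
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - frame_len + 1, hop)
    return np.array([ar2_frequency(x[s:s + frame_len], fs) for s in starts])
```

Because the estimate depends on only a few coefficients rather than on FFT bin spacing, short frames can be used. With noisy recordings the least-squares AR(2) fit is biased, so a higher model order or prior filtering would likely be needed in practice.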
Funding: This project is supported by the 10th Five-year Plan Pre-research Project Foundation of China Weapon Industry Company, China (No. 42001080701).
Abstract: This paper discusses the identification of the inter-electrode gap size in high frequency group pulse micro-electrochemical machining (HGPECM). Auto-regressive (AR) models of the group pulse current flowing between the cathode and the anode are built under different processing parameters and inter-electrode gap sizes. The AR models based on the current signals indicate that the model order differs markedly with the processing conditions and the inter-electrode gap size. The stability of the dynamic system also differs, i.e., the white-noise response described by the Green's function of the dynamic system varies. In addition, the power spectrum method is used to analyze the dynamic time series of the current signals for different inter-electrode gap sizes. The results show that, in the range of 0 to 5 kHz, the current signals have a strongest power spectrum peak, the characteristic power spectrum (CPS), that is related to the inter-electrode gap size. Therefore, the CPS of the current signals can be used to identify the inter-electrode gap.
Funding: Supported by the Natural Science Foundation of Guangxi (2022GXNSFAA035556); the National Natural Science Foundation of China (12161009, 12061017); the Center for Applied Mathematics of Guangxi (Guangxi Normal University); and the Key Laboratory of Interdisciplinary Research for Data Science, Education Department of Guangxi Zhuang Autonomous Region.
Abstract: In this article, we study the empirical likelihood (EL) method for autoregressive models with spatial errors. The EL ratio statistics are constructed for the parameters of the models. It is shown that the limiting distributions of the EL ratio statistics are chi-square distributions, which are used to construct confidence intervals for the parameters of the models. A simulation study is conducted to compare the performance of the EL-based and the normal approximation (NA) based confidence intervals. Simulation results show that the EL-based confidence intervals are superior to the NA-based confidence intervals.
Abstract: Since OpenAI opened access to ChatGPT, large language models (LLMs) have become an increasingly popular topic, attracting researchers' attention from many domains. However, public researchers face problems when developing LLMs, given that most LLMs are produced by industry and the training details are typically undisclosed. Since datasets are an important component of LLMs, this paper presents a holistic survey of the training datasets used in both the pre-training and fine-tuning processes. The paper first summarizes 16 pre-training datasets and 16 fine-tuning datasets used in state-of-the-art LLMs. Second, based on the properties of the pre-training and fine-tuning processes, it comments on pre-training datasets in terms of quality, quantity, and relation to models, and on fine-tuning datasets in terms of quality, quantity, and concerns. The study then critically identifies the problems and research trends in current LLM datasets. The study helps public researchers train and investigate LLMs through visual cases and provides useful comments to the research community regarding data development. To the best of our knowledge, this paper is the first to summarize and discuss datasets used in both autoregressive and chat LLMs. The survey offers insights and suggestions to researchers and LLM developers as they build their models, and contributes to LLM research by pointing out existing problems from the perspective of data.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52338009); the National Science Fund for Distinguished Young Scholars (Grant No. 52025085); the Graduate Research Innovation Project of Hunan Province (Grant No. CX20220952). Xiaohui Liu's research is supported by the NSF of China (Grant No. 11971208); the National Social Science Foundation of China (Grant No. 21&ZD152); the Outstanding Youth Fund Project of the Science and Technology Department of Jiangxi Province (Grant No. 20224ACB211003); and the NSF of China (Grant No. 92358303).
Abstract: This paper considers the random coefficient autoregressive model with time-functional variance noises, hereafter the RCA-TFV model. We first establish the consistency and asymptotic normality of the conditional least squares estimator for the constant coefficient. The semiparametric least squares estimator for the variance of the random coefficient and the nonparametric estimator for the variance function are constructed, and their asymptotic results are reported. A simulation study is presented along with an analysis of real data to assess the performance of our method in finite samples.
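As a rough illustration of the setting, the sketch below simulates a first-order random coefficient autoregression x[t] = (φ + b[t])·x[t-1] + e[t], where the noise standard deviation depends on rescaled time t/n, and computes the conditional least squares estimate of the constant coefficient φ. The first-order form, the Gaussian distributions, the function names `simulate_rca1` and `cls_phi`, and the example variance function are assumptions made for illustration; the abstract does not specify the paper's exact model.

```python
import numpy as np

def simulate_rca1(n, phi, sigma_b, sigma_fn, seed=0):
    """Simulate x[t] = (phi + b[t]) * x[t-1] + e[t] for t = 1..n.

    b[t] ~ N(0, sigma_b^2) is the random part of the coefficient and
    e[t] ~ N(0, sigma_fn(t/n)^2) is noise with time-varying variance.
    Both distributional choices are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n + 1)
    for t in range(1, n + 1):
        b_t = rng.normal(0.0, sigma_b)
        e_t = rng.normal(0.0, sigma_fn(t / n))
        x[t] = (phi + b_t) * x[t - 1] + e_t
    return x

def cls_phi(x):
    """Conditional least squares estimate of phi: minimises sum (x[t] - phi*x[t-1])^2."""
    x = np.asarray(x, dtype=float)
    return float(np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1]))
```

For example, `cls_phi(simulate_rca1(5000, 0.5, 0.2, lambda u: 1.0 + 0.5 * u))` should land close to 0.5 when φ² + σ_b² is well below 1.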
Abstract: The rapid expansion of tobacco farming poses a significant threat to biodiversity in Yunnan Province, China, a region known for its rich biodiversity. This study aims to understand the trade-offs between tobacco farming and higher plant species diversity, and to identify priority counties for conservation. We employed an integrated approach combining species distribution modeling, GIS overlay analysis, and empirical spatial regression to empirically assess the impact of tobacco farming intensity on biodiversity risk. Our findings reveal a compelling negative spatial correlation between tobacco farming expansion and higher plant species diversity. Specifically, southern counties in Wenshan and Honghe prefectures are major priority areas for conservation that exhibit significant spatial correlations between biodiversity risks and high tobacco farming intensity. Quantitatively, at county level, a 1% increase in tobacco farming area corresponds to a 0.094% decrease in endemic higher plant species richness across the entire province. These results underscore the need for targeted and region-specific regulations to mitigate biodiversity loss and promote sustainable development in Yunnan Province. The integrated approach used in this study provides a comprehensive assessment of the tobacco-biodiversity trade-offs, offering actionable insights for policymaking.
Abstract: The energy sector is the second largest emitter of greenhouse gases (GHG) in Kenya, accounting for about 31.2% of the country's GHG emissions. The aim of this study was to model Kenya's GHG emissions from the energy sector using ARIMA models and to forecast future values. The data, covering Kenya's energy-sector GHG emissions from 1970 to 2022, were obtained from the International Monetary Fund (IMF) database and split into training and testing sets using the 80/20 rule. The best ARIMA specification was identified using the Akaike Information Criterion (AIC), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and mean absolute scaled error (MASE). ARIMA(1, 1, 1) was identified as the best model for modelling Kenya's GHG emissions and forecasting future values. Using this model, Kenya's energy-sector GHG emissions were forecast to increase to about 43.13 million metric tons of carbon dioxide equivalent by 2030. The study therefore recommends that Kenya accelerate the adjustment of its industry structure, improve the efficiency of energy use, optimize its energy structure, and accelerate the development and promotion of energy-efficient products to reduce GHG emissions from the country's energy sector.
Funding: Supported by the Program of National Natural Science Foundation of China (U23A20329, 62163036); the Youth Academic and Technical Leaders Reserve Talent Training project (202105AC160094); and the Industrial Innovation Talent Special Project of Xingdian Talent Support Program (XDYC-CYCX-2022-0010).
Abstract: Kernel-based slow feature analysis (SFA) methods have been successfully applied in the field of industrial process fault detection. However, kernel-based SFA methods have high computational complexity when dealing with nonlinearity, leading to delays in detecting time-varying data features. Additionally, the uncertain kernel function and kernel parameters limit the ability of the extracted features to express process characteristics, resulting in poor fault detection performance. To alleviate these problems, a novel randomized auto-regressive dynamic slow feature analysis (RRDSFA) method is proposed to simultaneously monitor operating point deviations and process dynamic faults, enabling real-time monitoring of data features in industrial processes. Firstly, the proposed random Fourier mapping-based method achieves a more effective nonlinear transformation, in contrast to the current kernel-based RDSFA algorithm, which may incur significant computational complexity. Secondly, a randomized RDSFA model is developed to extract nonlinear dynamic slow features. Furthermore, a Bayesian inference-based overall fault monitoring model that includes all RRDSFA sub-models is developed to overcome the randomness of random Fourier mapping. Finally, the superiority and effectiveness of the proposed monitoring method are demonstrated through a numerical case and a simulation of a continuous stirred tank reactor.
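The computational appeal of a random Fourier mapping can be seen in a small sketch. The code below implements the standard random Fourier feature construction for the Gaussian kernel k(x, y) = exp(−γ‖x − y‖²), in which an explicit finite-dimensional map replaces the full kernel matrix. It is not the RRDSFA algorithm, and the abstract does not say which kernel or mapping the authors use. The function name `random_fourier_features` and the parameters `gamma` and `n_features` are illustrative.

```python
import numpy as np

def random_fourier_features(X, n_features=2000, gamma=1.0, seed=0):
    """Map rows of X to random Fourier features for the Gaussian kernel.

    Inner products of the returned features approximate
    exp(-gamma * ||x - y||^2). Standard construction, shown for illustration.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density N(0, 2*gamma*I).
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, n_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def gaussian_kernel(X, gamma=1.0):
    """Exact Gaussian kernel matrix, used here only for comparison."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))
```

With features `Z = random_fourier_features(X)`, the product `Z @ Z.T` approximates the kernel matrix, and linear methods applied to `Z` scale with the number of features rather than with the square of the sample size. Different random draws give different features, which is the kind of randomness an ensemble of sub-models can average out.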
Funding: Supported by the Program of Support Xinjiang by Technology (2024E02028, B2-2024-0359); the Xinjiang Tianchi Talent Program of 2024; the Foundation of Chinese Academy of Sciences (B2-2023-0239); and the Youth Foundation of Shandong Natural Science (ZR2023QD070).
Abstract: As one of the main atmospheric pollutants, PM2.5 severely affects human health and has received widespread attention in recent years. Predicting variations in PM2.5 concentrations with high accuracy is an important topic. The PM2.5 monitoring stations in Xinjiang Uygur Autonomous Region, China, are unevenly distributed, which makes comprehensive analysis and prediction challenging. This study therefore addresses both this limitation and the poor generalization of PM2.5 concentration prediction models across different monitoring stations. We chose the northern slope of the Tianshan Mountains as the study area and January to December 2019 as the research period. Using data from 21 PM2.5 monitoring stations together with meteorological data (temperature, instantaneous wind speed, and pressure), we developed an improved model, GCN-TCN-AR (where GCN is the graph convolution network, TCN is the temporal convolutional network, and AR is the autoregression), for predicting PM2.5 concentrations on the northern slope of the Tianshan Mountains. The GCN-TCN-AR model is composed of an improved GCN model, a TCN model, and an AR model. The R² values of the GCN-TCN-AR predictions at the four monitoring stations (Urumqi, Wujiaqu, Shihezi, and Changji) were 0.93, 0.91, 0.93, and 0.92, respectively, and the RMSE (root mean square error) values were 6.85, 7.52, 7.01, and 7.28 μg/m³, respectively. The performance of the GCN-TCN-AR model was also compared with that of current models, including GCN-TCN, GCN, TCN, Support Vector Regression (SVR), and AR. The GCN-TCN-AR model outperformed the other models, with high prediction accuracy and good stability, making it especially suitable for predicting PM2.5 concentrations. This study revealed significant spatiotemporal variations in PM2.5 concentrations. First, the PM2.5 concentrations exhibited clear seasonal fluctuations, with higher levels typically observed in winter and differences between months. Second, the spatial distribution analysis revealed that cities such as Urumqi and Wujiaqu have high PM2.5 concentrations, with a noticeable geographical clustering of pollution. Understanding the variations in PM2.5 concentrations is highly important for the sustainable development of the ecological environment in arid areas.
Funding: Supported by a grant from the Research Grants Council of Hong Kong.
Abstract: This paper is devoted to the goodness-of-fit test for general autoregressive models in time series. By averaging the weighted residuals, we construct a score type test which is asymptotically standard chi-squared under the null and has some desirable power properties under the alternatives. Specifically, the test is sensitive to alternatives and can detect alternatives approaching the null, along a direction, at a rate that is arbitrarily close to n^(-1/2). Furthermore, when the alternatives are not directional, we construct asymptotically distribution-free maximin tests for a large class of alternatives. The performance of the tests is evaluated through simulation studies.
Funding: Supported by the National Natural Science Foundation of China (61033004, 61070138, 61072104, 61003148).
Abstract: This paper introduces a new model-based soft decoding technique to restore the widely used joint photographic expert group (JPEG) streams. The image is modeled as a two-dimensional (2D) piecewise stationary autoregressive process, and the decoding task is formulated as a constrained optimization problem. All the constraints are given by the quantization intervals, which are freely available at the decoder. The autoregressive model serves as an important regularization term in the objective function of the optimization, and the model parameters are solved locally on the decoded image using a weighted total least squares method. In addition, a novel bilateral dual-side weighting scheme is proposed to minimize the influence of blocking artifacts on the accuracy of parameter estimation. Extensive experimental results suggest that the proposed algorithm systematically improves the quality of JPEG images and also outperforms existing JPEG postprocessing algorithms over a wide bit-rate range, in terms of both peak signal-to-noise ratio (PSNR) and subjective quality.