To ensure agreement between theoretical calculations and experimental data, parameters of selected nuclear physics models are perturbed and fine-tuned in nuclear data evaluations. This approach assumes that the chosen set of models accurately represents the ‘true’ distribution of the considered observables. Furthermore, the models are chosen globally, which implies that they apply across the entire energy range of interest. However, this approach overlooks uncertainties inherent in the models themselves. In this work, we propose that, instead of globally selecting a winning model set and proceeding as if it were the ‘true’ model set, we take a weighted average over multiple models within a Bayesian model averaging (BMA) framework, each weighted by its posterior probability. The method involves executing a set of TALYS calculations in which multiple nuclear physics models and their parameters are randomly varied to yield a vector of calculated observables. Likelihood function values computed at each incident energy point were then combined with the prior distributions to obtain updated posterior distributions for selected cross sections and the elastic angular distributions. Because the cross sections and elastic angular distributions were updated locally on a per-energy-point basis, the approach typically results in discontinuities or “kinks” in the cross section curves, and these were addressed using spline interpolation. The proposed BMA method was applied to the evaluation of proton-induced reactions on ^(58)Ni between 1 and 100 MeV. The results compared favorably with experimental data as well as with the TENDL-2023 evaluation.
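The posterior-weighted averaging described in the abstract above can be illustrated at a single energy point. This is only a minimal sketch, not the authors' TALYS workflow: it assumes a Gaussian likelihood, a uniform prior over models, and hypothetical cross-section values; the function names are illustrative.

```python
import math

def gaussian_log_likelihood(calc, measured, sigma):
    """Log-likelihood of one calculated value given one measurement (assumed Gaussian)."""
    return -0.5 * ((calc - measured) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def bma_weights(log_likelihoods, priors=None):
    """Posterior model probabilities from per-model log-likelihoods and priors."""
    n = len(log_likelihoods)
    if priors is None:
        priors = [1.0 / n] * n  # assumption: uniform prior over models
    log_post = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    shift = max(log_post)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lp - shift) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def bma_mean(predictions, weights):
    """Weighted average of the model predictions."""
    return sum(w * y for w, y in zip(weights, predictions))

# Hypothetical values at one incident energy: three model calculations (mb)
# and one experimental point with its uncertainty.
calculated = [100.0, 120.0, 150.0]
measured, sigma = 118.0, 10.0

log_liks = [gaussian_log_likelihood(c, measured, sigma) for c in calculated]
weights = bma_weights(log_liks)
averaged = bma_mean(calculated, weights)
```

Repeating this at every energy point gives a locally updated curve, which is why a smoothing step such as the spline interpolation mentioned in the abstract becomes necessary.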
Xinjiang Uygur Autonomous Region is a typical inland arid area in China with a sparse and uneven distribution of meteorological stations, limited access to precipitation data, and significant water scarcity. Evaluating and integrating precipitation datasets from different sources to characterize precipitation patterns accurately is a challenge; meeting it would provide more accurate alternative precipitation information for the region and could improve the performance of hydrological modelling. This study evaluated the applicability of five widely used satellite-based precipitation products (Climate Hazards Group InfraRed Precipitation with Station (CHIRPS), China Meteorological Forcing Dataset (CMFD), Climate Prediction Center morphing method (CMORPH), Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR), and Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis (TMPA)) and a reanalysis precipitation dataset (ECMWF Reanalysis v5-Land Dataset (ERA5-Land)) in Xinjiang using ground-based observational precipitation data from a limited number of meteorological stations. Based on this assessment, we proposed a framework that integrated precipitation datasets with varying spatial resolutions using a dynamic Bayesian model averaging (DBMA) approach, the expectation-maximization method, and the ordinary Kriging interpolation method. The daily precipitation data merged using the DBMA approach exhibited distinct spatiotemporal variability and performed well, as indicated by a low root mean square error (RMSE = 1.40 mm/d) and a high Pearson's correlation coefficient (CC = 0.67). Compared with traditional simple model averaging (SMA) and the individual products, the DBMA-fused precipitation data performed slightly below the best individual product (CMFD), but the overall performance of DBMA was more robust. The error analysis between the DBMA-fused precipitation dataset and the more advanced Integrated Multi-satellite Retrievals for Global Precipitation Measurement Final (IMERG-F) precipitation product, as well as hydrological simulations in the Ebinur Lake Basin, further demonstrated the superior performance of the DBMA-fused precipitation dataset across the entire Xinjiang region. The proposed framework for fusing multi-source precipitation data with different spatial resolutions is feasible for application in inland arid areas, and it aids in obtaining more accurate regional hydrological information and in improving regional water resources management and meteorological research in these regions.
In this paper, a model averaging method is proposed for varying-coefficient models with response missing at random by establishing a weight selection criterion based on cross-validation. Under certain regularity conditions, it is proved that the proposed method is asymptotically optimal in the sense of achieving the minimum squared error.
Gross primary production (GPP) plays a crucial part in the carbon cycle of terrestrial ecosystems. A set of validated monthly GPP data from 1957 to 2010 on 0.5° × 0.5° grids of China was produced by weighting simulations from the Multi-scale Terrestrial Model Intercomparison Project using Bayesian model averaging (BMA). The spatial anomalies of detrended BMA GPP during the growing seasons of typical El Niño years indicated that the GPP response to El Niño varies with the phase of the Pacific Decadal Oscillation (PDO): when the PDO was in the cool phase, GPP was likely to be greater in northern China (32°–38°N, 111°–122°E) and less in the Yangtze River valley (28°–32°N, 111°–122°E); in contrast, when the PDO was in the warm phase, the GPP anomalies were usually reversed in these two regions. The consistent spatiotemporal pattern and high partial correlation revealed that rainfall dominated this phenomenon. Previously published findings on how El Niño affects rainfall in eastern China during different PDO phases make the statistical relationship between GPP and El Niño found in this study theoretically credible. This paper not only introduces an effective way to use BMA in grids that have mixed plant functional types, but also makes it possible to evaluate the carbon cycle in eastern China based on predictions of El Niño and the PDO.
The choice of parameterization for each component in a microwave emission model has significant effects on the quality of brightness temperature (Tb) simulation. How to reduce the uncertainty in the Tb simulation is investigated by adopting a statistical post-processing procedure with the Bayesian model averaging (BMA) ensemble approach. The simulations by the community microwave emission model (CMEM) coupled with the community land model version 4.5 (CLM4.5) over China's mainland are conducted with 24 configurations formed from four vegetation opacity parameterizations (VOPs), three soil dielectric constant parameterizations (SDCPs), and two soil roughness parameterizations (SRPs). Compared with the simple arithmetical averaging (SAA) method, the BMA reconstructions have a higher spatial correlation coefficient (larger than 0.99) with the C-band satellite observations of the advanced microwave scanning radiometer on the Earth observing system (AMSR-E) at the vertical polarization. Moreover, the BMA product performs the best among the ensemble members for all vegetation classes, with a mean root-mean-square difference (RMSD) of 4 K and a temporal correlation coefficient of 0.64.
The ability to estimate terrestrial water storage (TWS) is essential for monitoring hydrological extremes (e.g., droughts and floods) and predicting future changes in the hydrological cycle. However, inadequacies in model physics and parameters, as well as uncertainties in meteorological forcing data, commonly limit the ability of land surface models (LSMs) to simulate TWS accurately. In this study, the authors show how simulations of TWS anomalies (TWSAs) from multiple meteorological forcings and multiple LSMs can be combined in a Bayesian model averaging (BMA) ensemble approach to improve monitoring and predictions. Simulations using three forcing datasets and two LSMs were conducted over China's mainland for the period 1979–2008. All the simulations showed good temporal correlations with satellite observations from the Gravity Recovery and Climate Experiment during 2004–08. The correlation coefficient ranged between 0.5 and 0.8 in the humid regions (e.g., the Yangtze river basin, Huaihe basin, and Zhujiang basin), but was much lower in the arid regions (e.g., the Heihe basin and Tarim river basin). The BMA ensemble approach performed better than all individual member simulations. It captured the spatial distribution and temporal variations of TWSAs over China's mainland and the eight major river basins very well; in addition, it showed the highest R value (> 0.5) over most basins and the lowest root-mean-square error (< 40 mm) in all basins of China. The good performance of the BMA ensemble approach shows that it is a promising way to reproduce long-term, high-resolution spatial and temporal TWSA data.
Climate change in mountainous regions has significant impacts on hydrological and ecological systems. This research studied future temperature, precipitation and snowfall in the 21st century for the Tianshan and northern Kunlun Mountains (TKM), based on the general circulation model (GCM) simulation ensemble from the coupled model intercomparison project phase 5 (CMIP5) under the representative concentration pathway (RCP) lower emission scenario RCP4.5 and higher emission scenario RCP8.5, using the Bayesian model averaging (BMA) technique. Results show that (1) BMA significantly outperformed the simple ensemble analysis, and the BMA mean matches all three observed climate variables; (2) at the end of the 21st century (2070–2099) under RCP8.5, compared to the control period (1976–2005), annual mean temperature and mean annual precipitation will rise considerably by 4.8°C and 5.2%, respectively, while mean annual snowfall will decrease dramatically by 26.5%; (3) precipitation will increase in the northern Tianshan region but decrease in the Amu Darya Basin, snowfall will decrease significantly in the western TKM, and the mean annual snowfall fraction will decrease from 0.56 in 1976–2005 to 0.42 in 2070–2099 under RCP8.5; and (4) snowfall shows a high sensitivity to temperature in autumn and spring but a low sensitivity in winter, with the highest sensitivity values occurring at the edges of the TKM. The projections imply that flood risk will increase and solid water storage will decrease.
Bayesian model averaging (BMA) is a popular and powerful statistical method for taking account of uncertainty about model form or assumptions. Usually the long-run (frequentist) performance of the resulting estimator is hard to derive. This paper proposes a mixture of priors and sampling distributions as a basis for a Bayes estimator. The frequentist properties of the new Bayes estimator are automatically derived from Bayesian decision theory. It is shown that if all competing models have the same parametric form, the new Bayes estimator reduces to the BMA estimator. The method is applied to the daily Euro to US Dollar exchange rate.
In statistical practice, it is not uncommon to use the same data for both model selection and inference without taking account of the variability induced by the model selection step. This is usually referred to as post-model selection inference. The shortcomings of such practice are widely recognized, but finding a general solution is extremely challenging. We propose a model averaging alternative that takes into account the model selection probability and the likelihood in assigning the weights. The approach is applied to Bernoulli trials and outperforms Akaike weights model averaging and post-model selection estimators.
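For orientation, the Akaike-weights baseline that the abstract above compares against can be sketched for Bernoulli trials. This illustrates the baseline only, not the proposed method; the two candidate models (a fixed success probability of 0.5 and a freely estimated one) and the trial counts are hypothetical.

```python
import math

def bernoulli_log_lik(successes, trials, p):
    """Log-likelihood of `successes` out of `trials` at success probability p."""
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)

def akaike_weights(aics):
    """Akaike weights: exp(-delta_i / 2), normalized over the candidate models."""
    best = min(aics)
    rel = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical data: 14 successes in 20 trials.
successes, trials = 14, 20
p_fixed = 0.5                # candidate 1: no free parameter
p_mle = successes / trials   # candidate 2: one free parameter

# AIC = 2k - 2 log L, with k the number of free parameters.
aics = [
    2 * 0 - 2 * bernoulli_log_lik(successes, trials, p_fixed),
    2 * 1 - 2 * bernoulli_log_lik(successes, trials, p_mle),
]
weights = akaike_weights(aics)

# Model-averaged estimate of the success probability.
p_avg = weights[0] * p_fixed + weights[1] * p_mle
```

The averaged estimate is pulled from the maximum likelihood value toward 0.5 in proportion to the support for the simpler model.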
Ridge regression is an effective tool for handling multicollinearity in regression. It is also an essential shrinkage and regularization method and is widely used in big data and distributed data applications. The divide-and-conquer trick, which combines the estimators from each subset with equal weights, is commonly applied to distributed data. To overcome multicollinearity and improve estimation accuracy in the presence of distributed data, we propose a Mallows-type model averaging method for ridge regression that combines estimators from all subsets. Our method is proved to be asymptotically optimal while allowing the number of subsets and the dimension of the variables to diverge. The consistency of the resultant weight estimators, which tend to the theoretically optimal weights, is also derived. Furthermore, the asymptotic normality of the model averaging estimator is demonstrated. Our simulation study and real data analysis show that the proposed model averaging method often performs better than commonly used model selection and model averaging methods in distributed data cases.
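The equal-weight divide-and-conquer baseline mentioned in the abstract above can be sketched for a single covariate. This is a simplified illustration under stated assumptions (one covariate, no intercept, hypothetical data and penalty); it does not implement the Mallows-type weight selection, and the `weights` argument only marks where data-driven weights would enter.

```python
def ridge_1d(xs, ys, lam):
    """Closed-form ridge estimate for one covariate without intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def divide_and_conquer(xs, ys, n_subsets, lam, weights=None):
    """Fit ridge on consecutive subsets and combine the subset estimates."""
    size = len(xs) // n_subsets  # leftover observations are dropped in this sketch
    estimates = [
        ridge_1d(xs[k * size:(k + 1) * size], ys[k * size:(k + 1) * size], lam)
        for k in range(n_subsets)
    ]
    if weights is None:
        weights = [1.0 / n_subsets] * n_subsets  # equal-weight baseline
    combined = sum(w * b for w, b in zip(weights, estimates))
    return combined, estimates

# Hypothetical data with a slope close to 2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
beta_dc, per_subset = divide_and_conquer(xs, ys, n_subsets=2, lam=0.5)
```

A model averaging variant would replace the equal weights with weights chosen by minimizing a criterion over the simplex.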
Multivariate time series forecasting holds substantial practical significance, facilitates precise predictions, and informs decision-making. The complexity of nonlinear relationships and the presence of higher-order features in multivariate time series data have sparked growing interest in deep learning approaches for such forecasting tasks. Existing methods often use pre-scaled neural networks, whose reliability and generalization can pose a challenge. In this study, the authors propose an instance-wise graph-based Mallows model averaging (IGMMA) framework for multivariate time series prediction. The framework incorporates a model averaging module into the network, where extracted features are used as inputs for candidate linear models. These linear models are combined with weights to create a new linear layer, forming a novel graph neural network model. Moreover, the network loss function is modified based on the Mallows criterion, with penalties imposed on the parameters and the weights separately. The authors use the proposed method to predict multicommodity futures prices, and the empirical results show that IGMMA has superior predictive accuracy even when small neural networks are used. This indicates that the model averaging module significantly reduces the number of parameters required for deep learning training, which enables the training of multiple small models as an alternative to training one large model.
In recent years, the Kriging model has gained wide popularity in various fields such as space geology, econometrics, and computer experiments. As a result, research on this model has proliferated. In this paper, the authors propose a model averaging estimation based on the best linear unbiased prediction of the Kriging model and the leave-one-out cross-validation method, with consideration for model uncertainty. The authors present a weight selection criterion for the model averaging estimation and provide two theoretical justifications for the proposed method. First, the estimated weight based on the proposed criterion is asymptotically optimal in achieving the lowest possible prediction risk. Second, the proposed method asymptotically assigns all weight to the correctly specified models when the candidate model set includes such models. The effectiveness of the proposed method is verified through numerical analyses.
Prediction plays an important role in data analysis. Model averaging generally provides better prediction than any of its component models. Even though model averaging has been extensively investigated under independent errors, few authors have considered model averaging for semiparametric models with correlated errors. In this paper, the authors offer an optimal model averaging method to improve prediction in the partially linear model for longitudinal data. The model averaging weights are obtained by minimizing a criterion that is an unbiased estimator of the expected in-sample squared error loss plus a constant. Asymptotic properties, including asymptotic optimality and consistency of the averaging weights, are established under two scenarios: (i) all candidate models are misspecified; (ii) correct models are available in the candidate set. Simulation studies and an empirical example show the promise of the proposed procedure over other competitive methods.
In the past two decades, model averaging, as a way to address model uncertainty, has attracted more and more attention. In this paper, the authors propose a jackknife model averaging (JMA) method for the quantile single-index coefficient model, which is widely used in statistics. Under model misspecification, the model averaging estimator is proved to be asymptotically optimal in terms of minimizing the out-of-sample quantile loss. Simulation experiments are conducted to compare the JMA method with several model selection and model averaging methods, and the results show that the proposed method performs satisfactorily. The method is also applied to a real dataset.
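The jackknife idea behind JMA, choosing weights that minimize a leave-one-out cross-validation loss, can be sketched as follows. As a simplification, this example swaps in squared-error loss and two least-squares candidates (a constant and a straight line) in place of the quantile loss and single-index models of the abstract above; the data and function names are hypothetical.

```python
def fit_mean(xs, ys):
    """Candidate 1: constant prediction equal to the sample mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2: simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def loo_predictions(fit, xs, ys):
    """Leave-one-out (jackknife) predictions: refit without point i, predict point i."""
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds

def jackknife_weight(xs, ys, steps=100):
    """Grid search over the weight on candidate 1 (the rest goes to candidate 2)."""
    p_mean = loo_predictions(fit_mean, xs, ys)
    p_line = loo_predictions(fit_line, xs, ys)
    best_w, best_cv = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        cv = sum((y - (w * a + (1 - w) * b)) ** 2
                 for y, a, b in zip(ys, p_mean, p_line))
        if cv < best_cv:
            best_w, best_cv = w, cv
    return best_w, best_cv

# Hypothetical, nearly linear data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
w_mean, cv_loss = jackknife_weight(xs, ys)
```

With more than two candidates, the grid search would be replaced by a constrained optimization over the weight simplex.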
The dissemination of news is a vital topic in management science, social science and data science. With the development of technology, the sample sizes and dimensions of digital news data have increased remarkably. To alleviate the computational burden in big data, this paper proposes a method for handling massive, moderate-dimensional data in linear regression models by combining model averaging and subsampling methodologies. The author first draws a subsample from the full data according to specially chosen probabilities and splits the covariates into several groups to construct candidate models. Then, the author fits each candidate model and calculates the model averaging weights to combine these estimators based on this subsample. Additionally, asymptotic optimality in subsampling form is proved, and a way to calculate the optimal subsampling probabilities is provided. The author also illustrates the proposed method via simulations, which show that it takes less running time than using the full data and generates more accurate estimates than uniform subsampling. Finally, the author applies the proposed method to analyze and predict the number of shares of news articles, and finds that topic, vocabulary and dissemination time are the determinants.
In this paper, the authors propose a frequentist model averaging method for composite quantile regression with a diverging number of parameters. Unlike traditional model averaging for quantile regression, which considers only a single quantile, the proposed model averaging estimator is based on multiple quantiles. The well-known delete-one cross-validation, or jackknife, approach is applied to estimate the model weights. The resultant jackknife model averaging estimator is shown to be asymptotically optimal in terms of minimizing the out-of-sample composite final prediction error. Simulation studies are conducted to demonstrate the finite sample performance of the new model averaging estimator. The proposed method is also applied to the analysis of stock returns data and wage data.
The bending capacity of precast decks depends greatly on the flexural strength of the joints between them. However, due to the complexity and diversity of this system, precise predictive models are currently unavailable. This study introduces an effective and precise methodology for assessing flexural strength using Monte Carlo Model Averaging (MCMA), a statistical technique that combines the strengths of model averaging (MA) and Monte Carlo simulation. To construct the MCMA model, input variables were derived by analyzing the experimental results, and a database of 433 bending test specimens was compiled. The MCMA model incorporated four different machine learning models, namely decision tree (DT), linear regression (LR), adaptive boosting (AdaBoost), and multilayer perceptron (MLP). Comparative analyses revealed that the MCMA model outperformed the baseline models (DT, AdaBoost, LR, and MLP) across all employed metrics. The impact of three different categories on flexural capacity was explored through boxplot analysis. Furthermore, a comparison between the MCMA model and the strut-and-tie model highlighted the superior performance of the MCMA model. The impact of the input variables on the flexural strength prediction was further examined through feature importance and global interpretation based on Shapley Additive exPlanations, as well as a parametric study.
Firstly, based on air quality data and meteorological data in Baoding City from 2017 to 2021, the correlations of meteorological elements and pollutants with O_(3) concentration were explored to determine the forecast factors of the forecast models. Secondly, the O_(3)-8h concentration in Baoding City in 2021 was predicted with the constructed multiple linear regression (MLR), back-propagation neural network (BPNN), and autoregressive integrated moving average (ARIMA) models, and the predicted values were compared with the observed values to test their prediction effects. The results show that, overall, the MLR, BPNN and ARIMA models were able to forecast the changing trend of O_(3)-8h concentration in Baoding in 2021, but the BPNN model gave better forecast results than the ARIMA and MLR models, especially for high values of O_(3)-8h concentration, and the correlation coefficients between the predicted and observed values were all higher than 0.9 during June-September. The mean error (ME), mean absolute error (MAE), and root mean square error (RMSE) between the predicted and observed values of daily O_(3)-8h concentration based on the BPNN model were 0.45, 19.11 and 24.41 μg/m^(3), respectively, which were significantly better than those of the MLR and ARIMA models. All three models predicted best at the pollution level, followed by the excellent level, and worst at the good level. In comparison, the prediction effect of the BPNN model was better than that of the MLR and ARIMA models as a whole, especially for the pollution and excellent levels. The TS scores of the BPNN model were all above 66%, and the PC values were above 86%. The BPNN model can forecast the changing trend of O_(3) concentration more accurately and has good practical application value, but at the same time, the predicted high values of O_(3) concentration should be appropriately increased according to the error characteristics of the model.
KaKs_Calculator is a software package that calculates nonsynonymous (Ka) and synonymous (Ks) substitution rates through model selection and model averaging. Because existing methods for this estimation adopt their own specific mutation (substitution) models, which consider different evolutionary features and therefore lead to diverse estimates, KaKs_Calculator implements a set of candidate models in a maximum likelihood framework and adopts the Akaike information criterion to measure the fit between models and data, aiming to include as many features as needed to accurately capture evolutionary information in protein-coding sequences. In addition, several existing methods for calculating Ka and Ks are also incorporated into this software. KaKs_Calculator, including source code, compiled executables, and documentation, is freely available for academic use at http://evolution.genomics.org.cn/software.htm.
In applications, the traditional estimation procedure generally begins with model selection. Once a specific model is selected, subsequent estimation is conducted under the selected model without consideration of the uncertainty from the selection process. This often leads to the underreporting of variability and overly optimistic confidence sets. Model averaging estimation is an alternative to this procedure that incorporates model uncertainty into the estimation process. In recent years, there has been rising interest in model averaging from the frequentist perspective, and some important progress has been made. In this paper, the theory and methods of frequentist model averaging estimation are surveyed. Some future research topics are also discussed.
Funding (BMA nuclear data evaluation of proton-induced reactions on ^(58)Ni): funding from the Paul Scherrer Institute, Switzerland, through the NES/GFA-ABE Cross Project.
Funding (DBMA precipitation fusion in Xinjiang): supported by the Technology Innovation Team (Tianshan Innovation Team), Innovative Team for Efficient Utilization of Water Resources in Arid Regions (2022TSYCTD0001), the National Natural Science Foundation of China (42171269), and the Xinjiang Academician Workstation Cooperative Research Project (2020.B-001).
Abstract: Xinjiang Uygur Autonomous Region is a typical inland arid area in China with a sparse and uneven distribution of meteorological stations, limited access to precipitation data, and significant water scarcity. Evaluating and integrating precipitation datasets from different sources to accurately characterize precipitation patterns is therefore a key challenge; meeting it would provide more accurate alternative precipitation information for the region and could improve the performance of hydrological modelling. This study evaluated the applicability of five widely used satellite-based precipitation products (Climate Hazards Group InfraRed Precipitation with Station (CHIRPS), China Meteorological Forcing Dataset (CMFD), Climate Prediction Center morphing method (CMORPH), Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR), and Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis (TMPA)) and a reanalysis precipitation dataset (ECMWF Reanalysis v5-Land Dataset (ERA5-Land)) in Xinjiang using ground-based observational precipitation data from a limited number of meteorological stations. Based on this assessment, we proposed a framework that integrated different precipitation datasets with varying spatial resolutions using a dynamic Bayesian model averaging (DBMA) approach, the expectation-maximization method, and the ordinary Kriging interpolation method. The daily precipitation data merged using the DBMA approach exhibited distinct spatiotemporal variability and performed very well, as indicated by a low root mean square error (RMSE = 1.40 mm/d) and a high Pearson's correlation coefficient (CC = 0.67). Compared with traditional simple model averaging (SMA) and the individual products, the DBMA-fused precipitation data scored slightly lower than the best precipitation product (CMFD), but the overall performance of DBMA was more robust. The error analysis between the DBMA-fused precipitation dataset and the more advanced Integrated Multi-satellite Retrievals for Global Precipitation Measurement Final (IMERG-F) precipitation product, as well as hydrological simulations in the Ebinur Lake Basin, further demonstrated the superior performance of the DBMA-fused precipitation dataset across the entire Xinjiang region. The proposed framework for fusing multi-source precipitation data with different spatial resolutions is feasible for application in inland arid areas, and aids in obtaining more accurate regional hydrological information and in improving regional water resources management and meteorological research in these regions.
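The two skill scores quoted in the abstract above (RMSE and Pearson's correlation coefficient) can be illustrated with a minimal sketch. The function names and the sample arrays are hypothetical and are not taken from the study; only the standard textbook definitions of the two metrics are assumed.

```python
import numpy as np

def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def pearson_cc(pred, obs):
    """Pearson's correlation coefficient between two 1-D series."""
    return float(np.corrcoef(pred, obs)[0, 1])

# Hypothetical daily precipitation values (mm/d), for illustration only.
gauge = np.array([0.0, 1.2, 3.4, 0.0, 5.1])
merged = np.array([0.1, 1.0, 2.9, 0.3, 4.6])
print(rmse(merged, gauge), pearson_cc(merged, gauge))
```

A lower RMSE and a CC closer to 1 indicate closer agreement between the merged product and the gauge observations.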
Abstract: In this paper, a model averaging method is proposed for varying-coefficient models with responses missing at random by establishing a weight selection criterion based on cross-validation. Under certain regularity conditions, it is proved that the proposed method is asymptotically optimal in the sense of achieving the minimum squared error.
Funding: Supported by the National Key Research and Development Program of China (Grant Nos. 2016YFA0602501 and 2018YFA0606004) and the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant Nos. XDA20040301 and XDA20020201).
Abstract: Gross primary production (GPP) plays a crucial part in the carbon cycle of terrestrial ecosystems. A set of validated monthly GPP data from 1957 to 2010 on 0.5° × 0.5° grids over China was obtained by weighting outputs from the Multi-scale Terrestrial Model Intercomparison Project using Bayesian model averaging (BMA). The spatial anomalies of detrended BMA GPP during the growing seasons of typical El Nino years indicated that the GPP response to El Nino varies with the phase of the Pacific Decadal Oscillation (PDO): when the PDO was in the cool phase, GPP was likely to be greater in northern China (32°–38°N, 111°–122°E) and lower in the Yangtze River valley (28°–32°N, 111°–122°E); in contrast, when the PDO was in the warm phase, the GPP anomalies were usually reversed in these two regions. The consistent spatiotemporal pattern and high partial correlation revealed that rainfall dominated this phenomenon. Previously published findings on how El Nino affects rainfall in eastern China during different PDO phases make the statistical relationship between GPP and El Nino found in this study theoretically credible. This paper not only introduces an effective way to use BMA in grids that have mixed plant functional types, but also makes it possible to evaluate the carbon cycle in eastern China based on predictions of El Nino and the PDO.
Funding: Project supported by the China Special Fund for Meteorological Research in the Public Interest (No. GYHY201306045) and the National Natural Science Foundation of China (Nos. 41305066 and 41575096).
Abstract: The choice of parameterization for each component in a microwave emission model has a significant effect on the quality of brightness temperature (Tb) simulation. How to reduce the uncertainty in the Tb simulation is investigated by adopting a statistical post-processing procedure with the Bayesian model averaging (BMA) ensemble approach. Simulations by the community microwave emission model (CMEM) coupled with the community land model version 4.5 (CLM4.5) over China's Mainland are conducted with 24 configurations formed from four vegetation opacity parameterizations (VOPs), three soil dielectric constant parameterizations (SDCPs), and two soil roughness parameterizations (SRPs). Compared with the simple arithmetical averaging (SAA) method, the BMA reconstructions have a higher spatial correlation coefficient (larger than 0.99) with the C-band satellite observations of the advanced microwave scanning radiometer on the Earth observing system (AMSR-E) at the vertical polarization. Moreover, the BMA product performs the best among the ensemble members for all vegetation classes, with a mean root-mean-square difference (RMSD) of 4 K and a temporal correlation coefficient of 0.64.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 41405083 and 91437220); the Natural Science Foundation of Hunan Province, China (Grant No. 2015JJ3098); the Key Research Program of Frontier Sciences, CAS (QYZDY-SSW-DQC012); and the Fund Project for the Education Department of Hunan Province (Grant No. 16A234).
Abstract: The ability to estimate terrestrial water storage (TWS) is essential for monitoring hydrological extremes (e.g., droughts and floods) and predicting future changes in the hydrological cycle. However, inadequacies in model physics and parameters, as well as uncertainties in meteorological forcing data, commonly limit the ability of land surface models (LSMs) to accurately simulate TWS. In this study, the authors show how simulations of TWS anomalies (TWSAs) from multiple meteorological forcings and multiple LSMs can be combined in a Bayesian model averaging (BMA) ensemble approach to improve monitoring and predictions. Simulations using three forcing datasets and two LSMs were conducted over China's Mainland for the period 1979–2008. All the simulations showed good temporal correlations with satellite observations from the Gravity Recovery and Climate Experiment during 2004–08. The correlation coefficient ranged between 0.5 and 0.8 in the humid regions (e.g., the Yangtze river basin, Huaihe basin, and Zhujiang basin), but was much lower in the arid regions (e.g., the Heihe basin and Tarim river basin). The BMA ensemble approach performed better than all individual member simulations. It captured the spatial distribution and temporal variations of TWSAs over China's Mainland and the eight major river basins very well; in addition, it showed the highest R value (>0.5) over most basins and the lowest root-mean-square error value (<40 mm) in all basins of China. The good performance of the BMA ensemble approach shows that it is a promising way to reproduce long-term, high-resolution spatial and temporal TWSA data.
Funding: Supported by the Thousand Youth Talents Plan (Xinjiang Project); the National Natural Science Foundation of China (41630859); and the West Light Foundation of Chinese Academy of Sciences (2016QNXZB12).
Abstract: Climate change in mountainous regions has significant impacts on hydrological and ecological systems. This research studied future temperature, precipitation and snowfall in the 21st century for the Tianshan and northern Kunlun Mountains (TKM) based on the general circulation model (GCM) simulation ensemble from the coupled model intercomparison project phase 5 (CMIP5) under the representative concentration pathway (RCP) lower emission scenario RCP4.5 and higher emission scenario RCP8.5, using the Bayesian model averaging (BMA) technique. Results show that (1) BMA significantly outperformed the simple ensemble analysis, and the BMA mean matches all three observed climate variables; (2) at the end of the 21st century (2070–2099) under RCP8.5, compared to the control period (1976–2005), annual mean temperature and mean annual precipitation will rise considerably, by 4.8°C and 5.2% respectively, while mean annual snowfall will decrease dramatically, by 26.5%; (3) precipitation will increase in the northern Tianshan region while decreasing in the Amu Darya Basin. Snowfall will significantly decrease in the western TKM. Mean annual snowfall fraction will also decrease from 0.56 in 1976–2005 to 0.42 in 2070–2099 under RCP8.5; and (4) snowfall shows a high sensitivity to temperature in autumn and spring but a low sensitivity in winter, with the highest sensitivity values occurring at the edge areas of the TKM. The projections imply that flood risk will increase and solid water storage will decrease.
Abstract: Bayesian model averaging (BMA) is a popular and powerful statistical method for taking account of uncertainty about model form or assumptions. Usually the long-run (frequentist) performance of the resulting estimator is hard to derive. This paper proposes a mixture of priors and sampling distributions as the basis of a Bayes estimator. The frequentist properties of the new Bayes estimator are automatically derived from Bayesian decision theory. It is shown that if all competing models have the same parametric form, the new Bayes estimator reduces to the BMA estimator. The method is applied to the daily Euro to US Dollar exchange rate.
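As a minimal sketch of the standard BMA construction that the abstract above takes as its starting point, each candidate model can be weighted by its posterior probability, which is proportional to its prior probability times its marginal likelihood. The function names and the numerical inputs are hypothetical, and this is not the mixture estimator proposed in the paper.

```python
import numpy as np

def bma_weights(log_marginal_likelihoods, prior_probs=None):
    """Posterior model probabilities from log marginal likelihoods and priors."""
    log_ml = np.asarray(log_marginal_likelihoods, dtype=float)
    if prior_probs is None:
        # Assumption: a uniform prior over the candidate models.
        prior_probs = np.full(log_ml.shape, 1.0 / log_ml.size)
    log_post = log_ml + np.log(np.asarray(prior_probs, dtype=float))
    log_post -= log_post.max()          # stabilise before exponentiating
    w = np.exp(log_post)
    return w / w.sum()

def bma_estimate(model_estimates, weights):
    """BMA point estimate: posterior-probability-weighted average."""
    return float(np.dot(weights, model_estimates))

# Hypothetical log marginal likelihoods and per-model estimates.
w = bma_weights([-120.3, -121.0, -125.7])
print(w, bma_estimate([1.10, 1.25, 0.90], w))
```

Subtracting the maximum log posterior before exponentiating avoids numerical underflow when the log marginal likelihoods are large in magnitude.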
Abstract: In statistical practice, it is not uncommon to use the same data for both model selection and inference, without taking account of the variability induced by the model selection step. This is usually referred to as post-model selection inference. The shortcomings of such practice are widely recognized, but finding a general solution is extremely challenging. We propose a model averaging alternative that takes into account both the model selection probability and the likelihood in assigning the weights. The approach is applied to Bernoulli trials and outperforms Akaike weights model averaging and post-model selection estimators.
Funding: Partially supported by the Research Foundation of Shenzhen Polytechnic University (Grant No. 6023312034K); the Post-doctoral Later-stage of Shenzhen Polytechnic University (Grant No. 6023271021K); the National Natural Science Foundation of China (Grant Nos. 71973116, 11971323 and 12031016); and the Beijing Natural Science Foundation (Grant No. Z210003).
Abstract: Ridge regression is an effective tool for handling multicollinearity in regressions. It is also an essential type of shrinkage and regularization method and is widely used in big data and distributed data applications. The divide-and-conquer trick, which combines the estimators from each subset with equal weight, is commonly applied to distributed data. To overcome multicollinearity and improve estimation accuracy in the presence of distributed data, we propose a Mallows-type model averaging method for ridge regressions, which combines estimators from all subsets. Our method is proved to be asymptotically optimal while allowing the number of subsets and the dimension of variables to diverge. The consistency of the resultant weight estimators, which tend to the theoretically optimal weights, is also derived. Furthermore, the asymptotic normality of the model averaging estimator is demonstrated. Our simulation study and real data analysis show that the proposed model averaging method often performs better than commonly used model selection and model averaging methods in distributed data cases.
Funding: Supported by the Research Foundation of Shenzhen Polytechnic University under Grant No. 6023312034K; the Post-doctoral Later-Stage of Shenzhen Polytechnic University under Grant No. 6023271021K; the National Science Foundation of Guangdong Province of China under Grant No. 2024A1515011542; the Guangdong Provincial Ordinary Universities Youth Innovative Talent Project (Natural Science) under Grant No. 2023KQNCX064; the National Natural Science Foundation of China under Grant No. 12371448; and the National Key R&D Program of China under Grant No. 2023YFE0126800.
Abstract: Multivariate time series forecasting holds substantial practical significance, facilitates precise predictions, and informs decision-making. The complexity of nonlinear relationships and the presence of higher-order features in multivariate time series data have sparked a burgeoning interest in leveraging deep learning approaches for such forecasting tasks. Existing methods often use pre-scaled neural networks, whose reliability and generalization can pose a challenge. In this study, the authors propose an instance-wise graph-based Mallows model averaging (IGMMA) framework for multivariate time series prediction. The framework incorporates a model averaging module into the network, where extracted features are utilized as inputs for candidate linear models. These linear models are combined with weights to create a new linear layer, forming a novel graph neural network model. Moreover, the network loss function is modified based on the Mallows criterion, where penalties are imposed on the parameters and the weights separately. The authors use the proposed method to predict multicommodity futures prices, and the empirical results show that IGMMA has superior predictive accuracy even when small neural networks are used. This indicates that the model averaging module significantly reduces the number of parameters required for deep learning training, which enables the training of multiple small models as an alternative to training one large model.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 71973116 and 12201018; the Postdoctoral Project in China under Grant No. 2022M720336; the National Natural Science Foundation of China under Grant Nos. 12071457 and 11971045; the Beijing Natural Science Foundation under Grant No. 1222002; and the NQI Project under Grant No. 2022YFF0609903.
Abstract: In recent years, the Kriging model has gained wide popularity in various fields such as space geology, econometrics, and computer experiments. As a result, research on this model has proliferated. In this paper, the authors propose a model averaging estimation based on the best linear unbiased prediction of the Kriging model and the leave-one-out cross-validation method, with consideration for model uncertainty. The authors present a weight selection criterion for the model averaging estimation and provide two theoretical justifications for the proposed method. First, the estimated weight based on the proposed criterion is asymptotically optimal in achieving the lowest possible prediction risk. Second, the proposed method asymptotically assigns all weights to the correctly specified models when the candidate model set includes these models. The effectiveness of the proposed method is verified through numerical analyses.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11971421, 71925007, 72091212, and 12288201; the Yunling Scholar Research Fund of Yunnan Province under Grant No. YNWR-YLXZ-2018-020; the CAS Project for Young Scientists in Basic Research under Grant No. YSBR-008; and the Start-Up Grant from Kunming University of Science and Technology under Grant No. KKZ3202207024.
Abstract: Prediction plays an important role in data analysis. A model averaging method generally provides better prediction than any of its components used alone. Even though model averaging has been extensively investigated under independent errors, few authors have considered model averaging for semiparametric models with correlated errors. In this paper, the authors offer an optimal model averaging method to improve prediction in the partially linear model for longitudinal data. The model averaging weights are obtained by minimizing a criterion that is an unbiased estimator of the expected in-sample squared error loss plus a constant. Asymptotic properties, including asymptotic optimality and consistency of the averaging weights, are established under two scenarios: (i) all candidate models are misspecified; (ii) correct models are available in the candidate set. Simulation studies and an empirical example show the promise of the proposed procedure over other competitive methods.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. U23A2064 and 12031005.
Abstract: In the past two decades, model averaging, as a way to address model uncertainty, has attracted more and more attention. In this paper, the authors propose a jackknife model averaging (JMA) method for the quantile single-index coefficient model, which is widely used in statistics. Under model misspecification, the model averaging estimator is proved to be asymptotically optimal in terms of minimizing the out-of-sample quantile loss. Simulation experiments are conducted to compare the JMA method with several model selection and model averaging methods, and the results show that the proposed method has a satisfactory performance. The method is also applied to a real dataset.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 12201431 and the Young Teacher Foundation of Capital University of Economics and Business under Grant Nos. XRZ2022-070 and 00592254413070.
Abstract: The dissemination of news is a vital topic in management science, social science and data science. With the development of technology, the sample sizes and dimensions of digital news data have increased remarkably. To alleviate the computational burden in big data, this paper proposes a method to deal with massive and moderate-dimensional data for linear regression models by combining model averaging and subsampling methodologies. The author first draws a subsample from the full data according to some special probabilities and splits the covariates into several groups to construct candidate models. Then, the author solves each candidate model and calculates the model-averaging weights to combine these estimators based on this subsample. Additionally, the asymptotic optimality in subsampling form is proved and a way to calculate the optimal subsampling probabilities is provided. The author also illustrates the proposed method via simulations, which show that it takes less running time than using the full data and generates more accurate estimates than uniform subsampling. Finally, the author applies the proposed method to analyze and predict the number of news shares, and finds that topic, vocabulary and dissemination time are the determinants.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11971323 and 12031016.
Abstract: In this paper, the authors propose a frequentist model averaging method for composite quantile regression with a diverging number of parameters. Different from traditional model averaging for quantile regression, which considers only a single quantile, the proposed model averaging estimator is based on multiple quantiles. The well-known delete-one cross-validation or jackknife approach is applied to estimate the model weights. The resultant jackknife model averaging estimator is shown to be asymptotically optimal in terms of minimizing the out-of-sample composite final prediction error. Simulation studies are conducted to demonstrate the finite sample performance of the new model averaging estimator. The proposed method is also applied to the analysis of stock returns data and wage data.
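The delete-one (jackknife) weighting idea mentioned above can be sketched in a deliberately simplified setting. The example below uses ordinary least squares with squared-error loss, not the composite quantile loss studied in the paper, and the Frank-Wolfe solver, the simulated data and all names (`loo_residuals`, `jma_weights`, `column_sets`) are illustrative assumptions.

```python
import numpy as np

def loo_residuals(X, y):
    """Leave-one-out OLS residuals via the shortcut e_i / (1 - h_ii)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # diagonal of the hat matrix
    return (y - X @ beta) / (1.0 - h)

def jma_weights(X, y, column_sets, n_iter=500):
    """Minimise the leave-one-out squared error over the weight simplex."""
    E = np.column_stack([loo_residuals(X[:, cols], y) for cols in column_sets])
    G = E.T @ E / len(y)                  # CV criterion is w' G w
    w = np.zeros(G.shape[0])
    w[np.argmin(np.diag(G))] = 1.0        # start from the best single model
    for _ in range(n_iter):
        grad = 2.0 * G @ w
        s = np.zeros_like(w)
        s[np.argmin(grad)] = 1.0          # Frank-Wolfe vertex
        d = s - w
        curv = d @ G @ d
        if curv <= 1e-15:
            break
        step = np.clip(-(d @ G @ w) / curv, 0.0, 1.0)   # exact line search
        w = w + step * d
    return w, G

# Simulated data and nested candidate models (hypothetical).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, 0.25, 0.1]) + rng.normal(size=n)
column_sets = [[0, 1], [0, 1, 2], [0, 1, 2, 3]]
weights, G = jma_weights(X, y, column_sets)
cv_jma = weights @ G @ weights
print(weights, cv_jma)
```

Because the search starts at the best single candidate and each step is a convex combination chosen by exact line search, the averaged leave-one-out criterion cannot exceed that of the best individual model.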
Abstract: The bending capacity of precast decks depends greatly on the flexural strength of the joints between them. However, due to the complexity and diversity of this system, precise predictive models are currently unavailable. This study introduces an effective and precise methodology for assessing flexural strength using Monte Carlo Model Averaging (MCMA), a statistical technique that combines the strengths of model averaging (MA) and Monte Carlo simulation. To construct the MCMA model, input variables were derived by analyzing the experimental results, and a database of 433 bending test specimens was compiled. The MCMA model incorporated four different machine learning models, namely decision tree (DT), linear regression (LR), adaptive boosting (AdaBoost), and multilayer perceptron (MLP). Comparative analyses revealed that the MCMA model outperformed the baseline models (DT, AdaBoost, LR, and MLP) across all employed metrics. The impact of three different categories on flexural capacity was explored through boxplot analysis. Furthermore, a comparison between the MCMA model and the strut-and-tie model highlighted the superior performance of the MCMA model. The impact of input variables on the flexural strength prediction was further examined through Shapley Additive exPlanations-based feature importance and global interpretation, as well as a parametric study.
Funding: The Project of the Key Open Laboratory of Atmospheric Detection, China Meteorological Administration (2023KLAS02M); the Second Batch of Science and Technology Project of China Meteorological Administration ("Jiebangguashuai"): Research and Development of Short-term and Near-term Warning Products for Severe Convective Weather in Beijing-Tianjin-Hebei Region (CMAJBGS202307).
Abstract: Firstly, based on air quality data and meteorological data for Baoding City from 2017 to 2021, the correlations of meteorological elements and pollutants with O3 concentration were explored to determine the forecast factors for the forecast models. Secondly, the O3-8h concentration in Baoding City in 2021 was predicted using multiple linear regression (MLR), back-propagation neural network (BPNN), and autoregressive integrated moving average (ARIMA) models, and the predicted values were compared with the observed values to test their prediction performance. The results show that, overall, the MLR, BPNN and ARIMA models were able to forecast the changing trend of O3-8h concentration in Baoding in 2021, but the BPNN model gave better forecast results than the ARIMA and MLR models, especially for high values of O3-8h concentration, and the correlation coefficients between the predicted and observed values were all higher than 0.9 during June-September. The mean error (ME), mean absolute error (MAE), and root mean square error (RMSE) between the predicted and observed values of daily O3-8h concentration for the BPNN model were 0.45, 19.11 and 24.41 μg/m³, respectively, which were significantly better than those of the MLR and ARIMA models. The prediction performance of the MLR, BPNN and ARIMA models was best at the pollution level, followed by the excellent level, and worst at the good level. In comparison, the BPNN model performed better than the MLR and ARIMA models as a whole, especially at the pollution and excellent levels. The TS scores of the BPNN model were all above 66%, and the PC values were above 86%. The BPNN model can forecast the changing trend of O3 concentration more accurately and has good practical application value, but at the same time the predicted high values of O3 concentration should be appropriately increased according to the error characteristics of the model.
Funding: Grants from the Ministry of Science and Technology of China (No. 2001AA231061) and the National Natural Science Foundation of China (No. 30270748).
Abstract: KaKs_Calculator is a software package that calculates nonsynonymous (Ka) and synonymous (Ks) substitution rates through model selection and model averaging. Because existing methods for this estimation each adopt a specific mutation (substitution) model that considers different evolutionary features, they lead to diverse estimates. KaKs_Calculator therefore implements a set of candidate models in a maximum likelihood framework and adopts the Akaike information criterion to measure the fit between models and data, aiming to include as many features as needed to accurately capture evolutionary information in protein-coding sequences. In addition, several existing methods for calculating Ka and Ks are also incorporated into this software. KaKs_Calculator, including source code, compiled executables, and documentation, is freely available for academic use at http://evolution.genomics.org.cn/software.htm.
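One common way to turn Akaike information criterion values into model averaging weights is the Akaike-weights formula, in which each weight is proportional to exp(-Δ/2) and Δ is the gap to the smallest AIC. The sketch below illustrates that general formula only; it is not taken from the KaKs_Calculator source, and the function name and AIC values are hypothetical.

```python
import numpy as np

def akaike_weights(aic_values):
    """Akaike weights: w_i proportional to exp(-0.5 * (AIC_i - min AIC))."""
    aic = np.asarray(aic_values, dtype=float)
    delta = aic - aic.min()
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# Hypothetical AIC values for three candidate substitution models.
w = akaike_weights([2310.4, 2308.1, 2315.9])
# Model-averaged estimate of a quantity computed under each model.
averaged = float(np.dot(w, [0.21, 0.19, 0.25]))
print(w, averaged)
```

The model with the smallest AIC receives the largest weight, and models far from the minimum contribute almost nothing to the average.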
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 70625004, 10721101, and 70221001.
Abstract: In applications, the traditional estimation procedure generally begins with model selection. Once a specific model is selected, subsequent estimation is conducted under the selected model without consideration of the uncertainty from the selection process. This often leads to the underreporting of variability and overly optimistic confidence sets. Model averaging estimation is an alternative to this procedure, which incorporates model uncertainty into the estimation process. In recent years, there has been a rising interest in model averaging from the frequentist perspective, and some important progress has been made. In this paper, the theory and methods of frequentist model averaging estimation are surveyed. Some future research topics are also discussed.