Abstract: For the nonparametric regression model $Y_{ni} = g(x_{ni}) + \varepsilon_{ni}$, $i = 1, \ldots, n$, with a regularly spaced nonrandom design, the authors study the behavior of the nonlinear wavelet estimator of $g(x)$. When the threshold and truncation parameters are chosen by cross-validation on the average squared error, strong consistency for dyadic sample sizes and moment consistency for arbitrary sample sizes are established under some regularity conditions.
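As a rough illustration of what a thresholded wavelet estimator does, the sketch below applies a full Haar transform to a signal of dyadic length, zeroes the small detail coefficients, and inverts the transform. The Haar basis, hard thresholding, and all function names are assumptions made for this example; the abstract does not say which wavelet or threshold rule the authors use, and neither the truncation parameter nor the cross-validated choice of threshold is implemented here.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Full orthonormal Haar transform; the length must be a power of two.

    Returns (scaling_coefficient, details), details ordered finest level first.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two (dyadic sample size)")
    approx = [float(v) for v in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx[0], details


def haar_inverse(scaling, details):
    """Invert haar_forward, rebuilding from the coarsest level to the finest."""
    approx = [scaling]
    for level in reversed(details):
        rebuilt = []
        for a, d in zip(approx, level):
            rebuilt.append((a + d) / SQRT2)
            rebuilt.append((a - d) / SQRT2)
        approx = rebuilt
    return approx


def hard_threshold_estimate(y, threshold):
    """Keep only detail coefficients whose magnitude exceeds the threshold."""
    scaling, details = haar_forward(y)
    kept = [[d if abs(d) > threshold else 0.0 for d in level] for level in details]
    return haar_inverse(scaling, kept)
```

With a threshold of zero the estimate reproduces the data, and with a very large threshold it collapses to the sample mean; a data-driven choice such as cross-validation picks a value between these extremes.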
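Abstract: Raising yield and disease resistance requires better breeding. Cross-validation experiments are designed to partition soybean genotype and phenotype data into groups, with the partition chosen according to the number of individuals and markers in the data, and phenotypes are predicted with the gBLUP (genomic Best Linear Unbiased Prediction) method. Comparing different groupings across multiple soybean traits gives the range of prediction accuracy, which provides a basis for subsequent breeding analysis. When only soybean genotype data are available and phenotype data are missing, phenotypes must be simulated from a specified heritability and number of simulated loci (NQTN); the different groupings are then applied to obtain the accuracy, which makes prediction on soybean data more flexible.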
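The phenotype-simulation step can be sketched as follows, assuming a purely additive model in which randomly chosen loci receive normally distributed effects and the residual variance is scaled to hit the target heritability. The function name, the 0/1/2 genotype coding, and the normal effect distribution are illustrative assumptions; the abstract does not describe the simulation procedure in this detail.

```python
import random
import statistics


def simulate_phenotype(genotypes, n_qtn, heritability, seed=0):
    """Simulate phenotypes from genotypes coded 0/1/2 (one list per individual).

    n_qtn loci are drawn at random as causal loci (the NQTN of the abstract).
    The residual variance is set so that var_g / (var_g + var_e) = heritability.
    """
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must lie in (0, 1]")
    rng = random.Random(seed)
    n_markers = len(genotypes[0])
    qtn = rng.sample(range(n_markers), n_qtn)       # causal loci
    effects = [rng.gauss(0.0, 1.0) for _ in qtn]    # assumed effect distribution
    genetic = [sum(ind[j] * b for j, b in zip(qtn, effects)) for ind in genotypes]
    var_g = statistics.pvariance(genetic)
    var_e = var_g * (1.0 - heritability) / heritability
    sd_e = var_e ** 0.5
    phenotype = [g + rng.gauss(0.0, sd_e) for g in genetic]
    return phenotype, genetic, var_e
```

The simulated phenotypes can then be split into folds and predicted in the same way as observed ones.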
Funding: Supported by the National Natural Science Foundation of China (51006052) and the NUST Outstanding Scholar Supporting Program.
Abstract: A method for fast l-fold cross validation is proposed for the regularized extreme learning machine (RELM). The computational time of fast l-fold cross validation increases as the fold number decreases, which is the opposite of naive l-fold cross validation. Compared with naive l-fold cross validation, fast l-fold cross validation has the advantage in computational time, especially for large fold numbers such as l > 20. To corroborate the efficacy and feasibility of fast l-fold cross validation, experiments are carried out on five benchmark regression data sets.
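For context, the naive baseline refits the model once per fold, so its cost grows with the fold number l. The sketch below shows that baseline with a one-dimensional ridge fit standing in for the RELM; the stand-in model and all function names are assumptions for illustration, and the fast algorithm proposed in the paper is not reproduced.

```python
import random


def ridge_slope(xs, ys, lam):
    """One-dimensional ridge fit without intercept: w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def make_folds(n, n_folds, seed=0):
    """Shuffle the indices 0..n-1 and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::n_folds] for k in range(n_folds)]


def naive_cv_mse(xs, ys, lam, n_folds, seed=0):
    """Naive l-fold cross validation: one full refit per held-out fold."""
    folds = make_folds(len(xs), n_folds, seed)
    squared_error = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        w = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        squared_error += sum((ys[i] - w * xs[i]) ** 2 for i in held_out)
    return squared_error / len(xs)
```

Calling `naive_cv_mse` over a grid of `lam` values and keeping the minimizer is the usual way such a score is used to tune the regularization strength.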
Abstract: Battery-powered vertical takeoff and landing (VTOL) aircraft attract growing public interest, but limited hover endurance hinders many prospective applications. Based on weight models of the battery, motor and electronic speed controller, a power consumption model of the propeller and a constant-power discharge model of the battery, an efficient method for estimating the hover endurance of battery-powered VTOL aircraft is presented. To understand the mechanism of performance improvement, the impacts of propulsion system parameters on hover endurance were analyzed by simulation, including the motor power density and the battery capacity, specific energy and Peukert coefficient. A ground experiment platform was established and validation experiments were carried out; the results showed good agreement with the simulations. The estimation method and the analysis results could be used for design optimization and hover performance evaluation of battery-powered VTOL aircraft.
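To show how the Peukert coefficient enters an endurance estimate, here is a minimal sketch of the textbook constant-current form of Peukert's law, t = H * (C / (I * H))^k. This is not the constant-power discharge model used in the paper, which the abstract does not spell out; the function name and parameter names are assumptions for this example.

```python
def peukert_discharge_time(rated_capacity_ah, rated_time_h, current_a, peukert_k):
    """Discharge time in hours under a constant current, by Peukert's law.

    rated_capacity_ah: capacity C measured at the rated discharge time
    rated_time_h:      rated discharge time H
    current_a:         actual discharge current I
    peukert_k:         Peukert coefficient k (k = 1 is an ideal battery)
    """
    if min(rated_capacity_ah, rated_time_h, current_a, peukert_k) <= 0:
        raise ValueError("all arguments must be positive")
    ratio = rated_capacity_ah / (current_a * rated_time_h)
    return rated_time_h * ratio ** peukert_k
```

For currents above the rated current C / H, a larger coefficient shortens the predicted discharge time, which is consistent with the abstract listing the Peukert coefficient among the parameters that affect hover endurance.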
Funding: Funding of Jiangsu Innovation Program for Graduate Education (CXZZ11_0193); NUAA Research Funding (NJ2010009).
Abstract: An improved method using kernel density estimation (KDE) and confidence level is presented for model validation with small samples. Decision making is challenging because of input uncertainty and because only small samples are available, owing to the high cost of experimental measurements. Model validation, however, gives decision makers more confidence while also improving prediction accuracy. The confidence level method is introduced and the optimum sample variance is determined using a new method in kernel density estimation to increase the credibility of model validation. As a numerical example, the static frame model validation challenge problem presented by Sandia National Laboratories is chosen. The optimum bandwidth is selected in kernel density estimation in order to build the probability model from the calibration data. The model is then assessed against the validation and accreditation experimental data, respectively, based on the probability model. Finally, the target structure prediction is performed using the validated model, and the results are consistent with those obtained by other researchers. The results demonstrate that the method using the improved confidence level and kernel density estimation is an effective approach to the model validation problem with small samples.
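The role of the bandwidth in building a probability model from a handful of calibration points can be sketched with a plain Gaussian KDE. The normal-reference rule of thumb used below is a common default and only a stand-in: the paper selects the bandwidth with its own method, which the abstract does not describe. Function names are illustrative.

```python
import math
import statistics


def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth: h = 1.06 * s * n^(-1/5)."""
    n = len(samples)
    return 1.06 * statistics.stdev(samples) * n ** (-1.0 / 5.0)


def gaussian_kde(samples, bandwidth):
    """Return the density estimate f(x) = (1/(n*h)) * sum_i phi((x - x_i) / h)."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density
```

A smaller bandwidth follows the individual samples more closely, while a larger one smooths them out, which is why the choice matters most when the sample is small.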
Funding: Supported by the US Department of Agriculture, Agriculture and Food Research Initiative, National Institute of Food and Agriculture, Competitive Grant no. 2015-67015-22947.
Abstract: Background: A random multiple-regression model that simultaneously fits all allele substitution effects for additive markers or haplotypes as uncorrelated random effects was proposed for Best Linear Unbiased Prediction using whole-genome data. Leave-one-out cross validation can be used to quantify the predictive ability of a statistical model. Methods: Naive application of leave-one-out cross validation is computationally intensive because the training and validation analyses need to be repeated n times, once for each observation. Efficient leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis. Results: The efficient leave-one-out cross validation strategies are 786 times faster than the naive application for a simulated dataset with 1,000 observations and 10,000 markers, and 99 times faster with 1,000 observations and 100 markers. These efficiencies relative to the naive approach using the same model will increase with the number of observations. Conclusions: Efficient leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis.
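The idea of getting all n leave-one-out residuals from a single fit can be illustrated with the classical identity for ordinary least squares: the leave-one-out residual equals the ordinary residual divided by one minus the leverage. The sketch below checks this against the naive n-refit loop for a simple straight-line fit. It is a stand-in under that assumption, not the paper's strategies for the BLUP model, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - slope * xbar, slope


def loo_residuals_naive(xs, ys):
    """Refit n times, each time leaving one observation out."""
    residuals = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residuals.append(ys[i] - (a + b * xs[i]))
    return residuals


def loo_residuals_fast(xs, ys):
    """Single fit: e_i / (1 - h_ii), with h_ii = 1/n + (x_i - xbar)^2 / Sxx."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    residuals = []
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - xbar) ** 2 / sxx
        residuals.append((y - (a + b * x)) / (1.0 - leverage))
    return residuals
```

Both functions take plain Python lists; the naive version costs n fits while the fast one costs a single fit plus one pass over the data.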
Abstract: In deriving a regression model, analysts often have to use variable selection, despite the problems introduced by data-dependent model building. Resampling approaches are proposed to handle some of the critical issues. In order to assess and compare several strategies, we will conduct a simulation study with 15 predictors and a complex correlation structure in the linear regression model. Using sample sizes of 100 and 400 and estimates of the residual variance corresponding to R² of 0.50 and 0.71, we consider 4 scenarios with varying amounts of information. We also consider two examples with 24 and 13 predictors, respectively. We will discuss the value of cross-validation, shrinkage and backward elimination (BE) with varying significance levels. We will assess whether 2-step approaches using global or parameterwise shrinkage (PWSF) can improve selected models and will compare results to models derived with the LASSO procedure. Besides MSE, we will use model sparsity and further criteria for model assessment. The amount of information in the data has an influence on the selected models and the comparison of the procedures. None of the approaches was best in all scenarios. The performance of backward elimination with a suitably chosen significance level was not worse than that of the LASSO, and the models selected by BE were much sparser, an important advantage for interpretation and transportability. Compared to global shrinkage, PWSF had better performance. Provided that the amount of information is not too small, we conclude that BE followed by PWSF is a suitable approach when variable selection is a key part of data analysis.
Abstract: A support vector machine (SVM) is successfully applied to classification in this paper. The basic principle of the SVM is discussed first, and then SVM classifiers with a polynomial kernel and a Gaussian radial basis function (RBF) kernel are chosen to identify pupils who have difficulties in writing. The 10-fold cross-validation method is used for training and validation. The aim is to compare the performance of SVMs with RBF and polynomial kernels in classifying pupils with or without handwriting difficulties. Experimental results show that the SVM with the RBF kernel performs better than the one with the polynomial kernel.
Funding: Supported by the National Natural Science Foundation of China (61374164).
Abstract: Cross iteration often exists in the computational process of simulation models, especially control models. Validating models with cross iteration raises a credibility defect tracing problem. To resolve this problem, after the problem is formulated, a validation theorem on cross iteration is proposed and proved under the cross iteration circumstance. Applying the proposed theorem, a credibility calculation algorithm is provided and the solution to the defect tracing problem is explained. Further, based on the validation theorem, a validation method for simulation models with cross iteration is proposed and illustrated step by step with a flowchart. Finally, a validation example of a six-degree-of-freedom (DOF) flight vehicle model is provided, and the validation process is performed using the validation method. The result analysis shows that the method is effective in obtaining the credibility of the model and accomplishing the defect tracing of the validation.
Funding: Supported by the National Key R&D Program of China (Grant No. 2022YFB3303500).
Abstract: A systematic verification and validation (V&V) of our previously proposed momentum source wave generation method is performed. Some settings of the previous numerical wave tanks (NWTs) of regular and irregular waves have been optimized. The H2-5 V&V method, involving five mesh sizes with a mesh refinement ratio of 1.225, is used to verify the NWT of regular waves, in which the wave height and mass conservation are mainly considered, based on a Lv3 ($H_s = 0.75$ m) and a Lv6 ($H_s = 5$ m) regular wave. Additionally, eight different sea states are chosen to validate the wave height, mass conservation and wave frequency of regular waves. Regarding the NWT of irregular waves, five different sea states with significant wave heights ranging from 0.09 m to 12.5 m are selected to validate the statistical characteristics of irregular waves, including the profile of the wave spectrum, peak frequency and significant wave height. Results show that the verification errors for the Lv3 and Lv6 regular waves on the most refined grid are −0.018 and −0.35 for wave height, respectively, and −0.14 and −0.17 for mass conservation, respectively. The uncertainty estimation analysis shows that the numerical error can be partially balanced out by the modelling error, so that a smaller validation error is achieved by carefully adjusting the mesh size. The validation errors of the wave height, mass conservation and dominant frequency of regular waves under different sea states are no more than 7%, 8% and 2%, respectively. For a Lv3 ($H_s = 0.75$ m) and a Lv6 ($H_s = 5$ m) regular wave, simulations are validated on the wave height in the wave development section for safety factors FS≈1 and FS≈0.5-1, respectively. Regarding irregular waves, the validation errors of the significant wave height and peak frequency are both lower than 2%.
Abstract: In regression, although both aim at estimating the Mean Squared Prediction Error (MSPE), Akaike's Final Prediction Error (FPE) and the Generalized Cross Validation (GCV) selection criteria are usually derived from two quite different perspectives. Here, settling on the most commonly accepted definition of the MSPE as the expectation of the squared prediction error loss, we provide theoretical expressions for it, valid for any linear model (LM) fitter, be it under random or nonrandom designs. Specializing these MSPE expressions for each of them, we are able to derive closed formulas of the MSPE for some of the most popular LM fitters: Ordinary Least Squares (OLS), with or without a full column rank design matrix; Ordinary and Generalized Ridge regression, the latter embedding smoothing spline fitting. For each of these LM fitters, we then deduce a computable estimate of the MSPE which turns out to coincide with Akaike's FPE. Using a slight variation, we similarly get a class of MSPE estimates coinciding with the classical GCV formula for those same LM fitters.
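For reference, the two criteria can be written in their usual textbook forms for a linear fitter with hat matrix $H$, residual sum of squares $\mathrm{RSS}$, sample size $n$ and, for OLS with a full column rank design, $p = \operatorname{tr}(H)$ parameters. The notation is an assumption made here; these are the standard definitions, not the general MSPE expressions derived in the paper.

```latex
\mathrm{FPE} = \frac{\mathrm{RSS}}{n}\cdot\frac{n+p}{n-p},
\qquad
\mathrm{GCV} = \frac{\mathrm{RSS}/n}{\bigl(1-\operatorname{tr}(H)/n\bigr)^{2}}
\;\overset{\text{OLS}}{=}\; \frac{n\,\mathrm{RSS}}{(n-p)^{2}},
\qquad
\frac{\mathrm{FPE}}{\mathrm{GCV}} = \frac{(n+p)(n-p)}{n^{2}} = 1-\frac{p^{2}}{n^{2}}.
```

The ratio shows that the two criteria nearly agree when $p$ is small relative to $n$, which fits the abstract's remark that they arise from a slight variation of the same estimate.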
Abstract: A simulation model developed by the authors (Huang et al., 1999) was validated against independent field measurements of methane emission from rice paddy soils in Texas, USA; Tuzu, China; and Vercelli, Italy. A simplified version of the simulation model was further validated against methane emission measurements from various regions of the world, including Italy, China, Indonesia, the Philippines and the United States. Model validation suggested that the seasonal variation of methane emission was mainly regulated by rice growth and development and that methane emission could be predicted from rice net productivity, cultivar character, soil texture and temperature, and organic matter amendments. Model simulations in general agreed with the observations. The comparison between computed and measured methane emission gave correlation coefficients (r²) from 0.450 to 0.952, significant at the 0.01-0.001 probability level. On the basis of available information on rice cultivated area, growth duration, grain yield, soil texture and temperature, methane emission from rice paddy soils of mainland China was estimated for 28 rice-cultivating provinces and municipalities by employing the validated model. The calculated daily methane emission rates, on a provincial scale, ranged from 0.12 to 0.71 g m⁻² with an average of 0.26 g m⁻². A total of 7.92 Tg CH₄ per year, ranging from 5.89 to 11.17 Tg year⁻¹, was estimated to be released from Chinese rice paddy soils. Of the total, 45% was emitted in the single-rice growing season, and 19% and 36% in the early-rice and late-rice growing seasons, respectively. Approximately 70% of the total was emitted in the region located between latitudes 25° and 32°N. The emissions from rice fields in Sichuan and Hunan provinces were calculated to be 2.34 Tg year⁻¹, accounting for approximately 30% of the total.
Funding: Supported by National Science Key Foundation of China.
Abstract: This paper proposes a cross-reference method of nonlinear time series analysis, combining the tasks of dynamical system parameter estimation and noise reduction, which were previously carried out separately. With the positive interaction between the two processing modules, the method is somewhat superior. Some prior works can be viewed as special cases of this general framework, and effective new algorithms may be devised according to it. Two examples of chaotic time series analysis are also given to show the applicability of the proposed method.