Selecting a proper set of covariates is one of the most important factors influencing the accuracy of digital soil mapping (DSM). Statistical and machine learning methods for selecting DSM covariates are not applicable in situations with limited samples. To solve this problem, this paper proposes a case-based method that formalizes the covariate selection knowledge contained in practical DSM applications. The proposed method trains Random Forest (RF) classifiers on DSM cases extracted from practical DSM applications and then uses the trained classifiers to determine whether each potential covariate should be used in a new DSM application. In this study, we took topographic covariates as example covariates and extracted 191 DSM cases from 56 peer-reviewed journal articles to evaluate the performance of the proposed case-based method by leave-one-out cross-validation. Compared with a way of selecting DSM covariates commonly used by novices, the proposed case-based method improved accuracy by more than 30% according to three quantitative evaluation indices (i.e., recall, precision, and F1-score). The proposed method could also be applied to selecting the proper set of covariates in other similar geographical modeling domains, such as landslide susceptibility mapping and species distribution modeling.
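The three evaluation indices named in this abstract can be illustrated with a minimal sketch. It assumes a recommended covariate set is scored against the set actually used in a held-out case; the covariate names and the function `selection_scores` are hypothetical and are not taken from the paper.

```python
def selection_scores(recommended, used):
    """Precision, recall and F1 of a recommended covariate set
    measured against the covariates actually used in a case."""
    recommended, used = set(recommended), set(used)
    hits = len(recommended & used)  # covariates both recommended and used
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(used) if used else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical topographic covariates for one held-out DSM case.
recommended = ["slope", "elevation", "twi", "plan_curvature"]
used = ["slope", "elevation", "profile_curvature"]
precision, recall, f1 = selection_scores(recommended, used)
print(precision, recall, f1)
```

In a leave-one-out evaluation, scores like these would be computed once per held-out case and then averaged.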
The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational, ecologically sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model and (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc.). In this research, the geographical locations of multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from two pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially, the data were aggregated into PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data were then analyzed for statistical correlations using monthly biting rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed, also in ArcGIS, using the georeferenced ground coordinates of high- and low-density clusters stratified by annual biting rates (ABR). The data were overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/Geographic Information Systems (GIS). Univariate and nonlinear regression-based models (i.e., logistic, Poisson, and negative binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin–Watson statistics were used in AUTOREG to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process. Bayesian uncertainty matrices were also constructed in PROC MCMC, employing normal priors for each of the sampled estimators. The residuals revealed both spatially structured and unstructured error effects in the high- and low-ABR-stratified clusters. The analyses also revealed that the estimators levels of turbidity and presence of rocks were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms, and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual-adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established, while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
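The Durbin–Watson statistic mentioned in this abstract has a simple closed form, sketched below in plain Python. This is only an illustration of the textbook statistic, not the SAS AUTOREG procedure used in the study, and the residual vectors are made-up values.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by their sum of squares."""
    e = list(residuals)
    if len(e) < 2:
        raise ValueError("need at least two residuals")
    denominator = sum(x * x for x in e)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return numerator / denominator


# Made-up residuals: a strictly alternating series and a constant series.
dw_alternating = durbin_watson([1, -1, 1, -1])
dw_constant = durbin_watson([1, 1, 1, 1])
print(dw_alternating, dw_constant)
```

The statistic lies between 0 and 4. Values near 2 are consistent with no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.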
In this paper, the estimation for a class of generalized varying coefficient models with error-prone covariates is considered. By combining basis function approximations with some auxiliary variables, an instrumental variable type estimation procedure is proposed. The asymptotic results for the estimator, such as consistency and the weak convergence rate, are obtained. The proposed procedure can attenuate the effect of measurement errors and is shown to work well for finite samples.
Biometric gait recognition is a lesser-known but emerging and effective biometric recognition method that enables subjects' walking patterns to be recognized. Existing research in this area has primarily focused on feature analysis through the extraction of individual features, which captures most of the information but fails to capture subtle variations in gait dynamics. Therefore, a novel feature taxonomy and an approach for deriving a relationship between a function of one set of gait features and another set are introduced. The gait features extracted from body halves divided by anatomical planes on vertical, horizontal, and diagonal axes are grouped to form canonical gait covariates. Canonical Correlation Analysis (CCA) is utilized to measure the strength of association between the canonical covariates of gait. Thus, gait assessment and identification are enhanced when more semantic information is available through CCA-based multi-feature fusion. Carnegie Mellon University's 3D gait database, which contains 32 gait samples taken at different paces, is utilized in analyzing gait characteristics. The performance of Linear Discriminant Analysis, K-Nearest Neighbors, Naive Bayes, Artificial Neural Networks, and Support Vector Machines improved by 4% on average when the CCA-based gait identification approach was used. A maximum accuracy rate of 97.8% was achieved through CCA-based gait identification. Beyond that, the rate of false identifications and unrecognized gaits fell by half, demonstrating state-of-the-art performance for gait identification.
Missing covariate data is one of the active topics of modern statistical analysis. It often appears in surveys or interviews, and becomes more complex in the presence of heavy-tailed, skewed, and heteroscedastic data. In this setting, a robust quantile regression method is of particular interest. This paper presents an inverse weighted quantile regression method to explore the relationship between the response and the covariates. This method has several advantages over the naive estimator. On the one hand, it uses all available data, and the missing covariates are allowed to be heavily correlated with the response; on the other hand, the estimator is uniform and asymptotically normal at all quantile levels. The effectiveness of this method is verified by simulation. Finally, to further illustrate the effectiveness of this method, we extend it to more general cases, namely the multivariate case and the nonparametric case.
The objective of this study was to explore the relationship of sociodemographic, clinical, and health-services use-related variables with transitions between disability-based profiles. In a longitudinal study of 1386 people aged 75 and over living in the community at baseline, disabilities were assessed annually for up to four years with the Functional Autonomy Measurement System (SMAF), which generates 14 Iso-SMAF profiles. These profiles are grouped into 4 disability states: predominant alterations in instrumental activities of daily living (IADLs), in mobility, and in mental functions, as well as severe and mixed disabilities. Continuous-time, multi-state Markov modeling was used to identify the factors associated with transitions made by older people between these states and to institutionalization and death. Greater age and receiving help for ADL were associated with four transitions, while altered cognitive functions and hospitalization were associated with three, all involving more decline or less recovery. From mild IADL profiles, men have a higher risk of transitioning to intermediate predominantly mental profiles, while women are at higher risk of transitioning to intermediate predominantly mobility profiles. Unmet needs are associated with deterioration from mild IADL to intermediate predominantly mobility profiles. These results help in understanding the complex progression of disabilities in older people.
Environmental covariates are the basis of predictive soil mapping. Their selection determines the performance of soil mapping to a great extent, especially in cases where the number of soil samples is limited but soil spatial heterogeneity is high. In this study, we proposed an integrated method to select environmental covariates for predictive soil depth mapping. First, candidate variables that may influence the development of soil depth were selected based on pedogenetic knowledge. Second, three conventional methods (Pearson correlation analysis (PsCA), generalized additive models (GAMs), and Random Forest (RF)) were used to generate optimal combinations of environmental covariates. Finally, the three optimal combinations were integrated to produce a final combination based on the importance and occurrence frequency of each environmental covariate. We tested this method for soil depth mapping in the upper reaches of the Heihe River Basin in Northwest China. A total of 129 soil sampling sites were selected using a representative sampling strategy, and RF and support vector machine (SVM) models were used to map soil depth. The results showed that, compared to the sets of environmental covariates selected by the three conventional selection methods, the set selected by the proposed method achieved higher mapping accuracy. The combination from the proposed method obtained a root mean square error (RMSE) of 11.88 cm, which was 2.25–7.64 cm lower than the other methods, and an R^2 value of 0.76, which was 0.08–0.26 higher than the other methods. The results suggest that our method can be used as an alternative to the conventional methods for soil depth mapping and may also be effective for mapping other soil properties.
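The two accuracy measures reported in this abstract can be computed as in the following minimal sketch. It assumes R^2 is the coefficient of determination, 1 - SS_res/SS_tot, although the paper may instead use the squared correlation. The soil depth values are invented for illustration only.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Invented soil depths in cm at four validation sites.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
depth_rmse = rmse(observed, predicted)
depth_r2 = r_squared(observed, predicted)
print(round(depth_rmse, 3), round(depth_r2, 3))
```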
Given a sample of regression data from (Y, Z), a new diagnostic plotting method is proposed for checking the hypothesis H0: the data are from a given Cox model with the time-dependent covariates Z. It compares two estimates of the marginal distribution FY of Y. One is an estimate of the modified expression of FY under H0, based on a consistent estimate of the parameter under H0 and on the baseline distribution of the data. The other is the Kaplan-Meier estimator of FY, together with its confidence band. The new plot, called the marginal distribution plot, can be viewed as a test of H0. The main advantage of the test over the existing residual tests arises when the data do not satisfy any Cox model or the Cox model is mis-specified. In that case the new test is still valid, whereas the residual tests are not and often make a type II error with very large probability.
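For readers unfamiliar with the Kaplan-Meier estimator referred to in this abstract, the sketch below computes the product-limit survival estimate S(t), from which the marginal distribution follows as FY(t) = 1 - S(t). It is a minimal illustration on invented data and does not reproduce the paper's marginal distribution plot or its confidence band.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : observed times (event or censoring)
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored subjects leave the risk set
    return curve


# Invented sample: five subjects, two of them censored.
km_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(km_curve)
```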
Accounting for time-varying covariates and time-varying coefficient effects in survival models is a plausible and robust technique. This kind of analysis can be carried out with a general class of semiparametric transformation models. The aim of this article is to develop modified estimating equations under semiparametric transformation models of survival time with time-varying coefficient effects and time-varying continuous covariates. For this, it is important to organize the data in a counting process style and to transform the time with the standard transformation classes applied in this article. When the effects of the coefficients and the covariates change over time, the widely used maximum likelihood estimation method becomes more complex and burdensome for obtaining consistent estimates. To overcome this problem, modified estimating equations were instead applied to estimate the unknown parameters and the unspecified monotone transformation functions. The estimating equations were modified to incorporate the time-varying effect in both coefficients and covariates. The performance of the proposed methods is tested through a simulation study. In summary, the effect of possibly time-varying covariates and time-varying coefficients was evaluated in some special cases of semiparametric transformation models. The results show that the role of the time-varying covariate in the semiparametric transformation models was plausible and credible.
Empirical likelihood-based inference for varying coefficient models with missing covariates is investigated. An imputed empirical likelihood ratio function for the coefficient functions is proposed, and it is shown that its limiting distribution is standard chi-squared. The corresponding confidence intervals for the regression coefficients are then constructed. Simulations show that the proposed procedure can attenuate the effect of the missing data and performs well for finite samples.
This study proposes a regression-based estimation method in difference-in-differences settings in the presence of time-varying covariates, a scenario commonly encountered in applications. We impose only a conditional parallel trends assumption with time-varying covariates and plausible assumptions on the conditional expectation functions. We show that a family of causal effect parameters is exactly estimated by the coefficient estimators from our proposed regressions, even in the presence of staggered treatment timing and treatment effect heterogeneity across cohorts, time periods, and covariates. These parameters can be further aggregated to the dynamic treatment effects and the overall effect of being treated. We establish the corresponding asymptotic properties. Simulation studies suggest that our proposed regression-based estimators perform well in estimating the causal parameters. Finally, we apply this method to evaluate the effect of intrastate bank deregulation on income inequality in the United States in the setting of Beck et al. (2010). We find substantially different results based on our proposed method.
In this article, we study a robust estimation method for a general class of integer-valued time series models. The conditional distribution of the process belongs to a broad class of distributions and, unlike in the classical autoregressive framework, the conditional mean of the process also depends on some exogenous covariates. We derive a robust inference procedure based on the minimum density power divergence. Under certain regularity conditions, we establish that the proposed estimator is consistent and asymptotically normal. In the case where the conditional distribution belongs to the exponential family, we provide sufficient conditions for the existence of a stationary and ergodic τ-weakly dependent solution. Simulation experiments are conducted to illustrate the empirical performance of the estimator. An application to the number of transactions per minute for the stock Ericsson B is also provided.
How to achieve the tradeoff between privacy and utility is one of the fundamental problems in private data analysis. In this paper, we give a rigorous differentially private analysis of networks in the presence of covariates via a generalized β-model, which has an n-dimensional degree parameter β and a p-dimensional homophily parameter γ. Under (k_n, ε_n)-edge differential privacy, we use the popular Laplace mechanism to release the network statistics. The method of moments is used to estimate the unknown model parameters. We establish the conditions guaranteeing consistency of the differentially private estimators of β and γ as the number of nodes n goes to infinity, which reveals an interesting tradeoff between the privacy parameter and the model parameters. Consistency is shown by applying a two-stage Newton's method to obtain an upper bound on the error between the estimators and the true values (β, γ) in terms of the ℓ_∞ distance, which has a convergence rate of rough order 1/n^(1/2) for the estimator of β and 1/n for the estimator of γ, up to a logarithmic factor. Furthermore, we derive the asymptotic normality of the estimators of β and γ, whose asymptotic variances are the same as those of the non-private estimators under some conditions. Our paper sheds light on how to explore asymptotic theory under differential privacy in a principled manner; these principled methods should be applicable to a class of network models with covariates beyond the generalized β-model.
This paper deals with the analysis of the accelerated failure time model when the primary covariate is subject to missingness. We assume that the true covariate is measured precisely on a randomly chosen validation set, whereas auxiliary information on the primary covariate is available for all study subjects. The asymptotic properties of the proposed estimator are developed, and simulation studies show that the efficiency gain is remarkable compared to the method using only the validation sample. A real example is also provided as an illustration.
Penalized variable selection methods are often used to select the relevant covariates and estimate the unknown regression coefficients simultaneously, but these existing methods may fail to be consistent in settings with highly correlated covariates. In this paper, the semi-standard partial covariance (SPAC) method with the Lasso penalty is proposed to study the generalized linear model with highly correlated covariates, and the consistency of the estimation and of the variable selection is shown in high-dimensional settings under some regularity conditions. Simulation studies and an analysis of a colon tumor dataset are carried out to show that the proposed method performs better in handling highly correlated covariates than the traditional penalized variable selection methods.
Causation is a distinct concept from association and is more important than association in epidemiologic studies. This paper proposes the concept of uniform non-confounding for causal distribution effects over multiple covariates, gives sufficient conditions for uniform non-confounding over a covariate set C including confounders or non-confounders, and also shows the conditions for conditional non-confounding in subpopulations. All these conditions can be tested with observed data.
In recent years, there has been a large amount of literature on missing data. Most of it focuses on situations where there is missingness only in the response or only in the covariates. In this paper, we consider the adequacy check for the linear regression model with the response and covariates missing simultaneously. We apply model adjustment and inverse probability weighting methods to deal with the missingness of the response and the covariates, respectively. In order to avoid the curse of dimensionality, we propose an empirical process test with a linear indicator weighting function. The asymptotic properties of the proposed test under the null, local, and global alternative hypothetical models are rigorously investigated. A consistent wild bootstrap method is developed to approximate the critical value. Finally, simulation studies and a real data analysis are performed to show that the proposed method performs well.
In this paper, we investigate the model checking problem for a general linear model with nonignorable missing covariates. We show that, without any parametric model assumption for the response probability, the least squares method yields consistent estimators for the linear model even if only the complete data are used. This makes it feasible to propose two testing procedures for the corresponding model checking problem: a score type lack-of-fit test and a test based on the empirical process. The asymptotic properties of the test statistics are investigated. Both tests are shown to have asymptotic power 1 for local alternatives converging to the null at the rate n^(-r), 0 ≤ r < 1/2. Simulation results show that both tests perform satisfactorily.
Feature screening with missing data is a critical problem but has not been well addressed in the literature. In this discussion we propose a new screening index based on “information value” and apply it to feature screening with missing covariates.
Frequent droughts pose a considerable threat to global forest carbon uptake, but little is known about the response of forest carbon fluxes in climatic transition zones to seasonal drought. In this study, the responses of carbon fluxes to seasonal drought in two natural forests (Quercus aliena var. acuteserrata Maxim. and Pinus tabuliformis Carr.) in the Baotianman Nature Reserve were investigated. The Q. aliena forest exhibited high resilience, with stable gross primary productivity (GPP). However, ecosystem respiration (Re) declined significantly, by 18.4% compared with normal years, leading to an increase in net carbon sequestration capacity of 4.1%. This resilience was attributed to its deep root system accessing soil water (SWC_(50cm)) to sustain stomatal openness, coupled with the efficient utilization of photosynthetically active radiation to drive photosynthesis. In contrast, the P. tabuliformis forest, which relied on shallow soil moisture (SWC_(20cm)), experienced simultaneous decreases in both GPP and Re during drought, with a much greater decrease in GPP, resulting in low net carbon sink capacity. Further analysis revealed that the Q. aliena forest prioritized carbon assimilation through a deep water-stomatal synergy strategy (anisohydric behavior), whereas the P. tabuliformis forest adopted an isohydric strategy favoring water conservation at the expense of carbon fixation efficiency. These findings highlight distinct mechanisms underlying drought adaptation between forest types, providing critical insight into optimizing forest carbon cycle models and selecting drought-resistant species under the influence of climate change.
Funding (case-based DSM covariate selection study): Supported by grants from the National Natural Science Foundation of China (41431177 and 41871300); the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), China; the Innovation Project of the State Key Laboratory of Resources and Environmental Information System (LREIS), China (O88RA20CYA); and the Outstanding Innovation Team in Colleges and Universities in Jiangsu Province, China. Support to A-Xing Zhu was provided through the Vilas Associate Award, the Hammel Faculty Fellow Award, and the Manasse Chair Professorship from the University of Wisconsin-Madison.
Funding (S. damnosum s.l. riverine larval habitat study): This work was supported by the US National Institutes of Health/Fogarty International Center under SR01TW008508.
Funding: Supported by the National Natural Science Foundation of China (11101119), the Natural Science Foundation of Guangxi (2010GXNSFB013051), and the Philosophy and Social Sciences Foundation of Guangxi (11FTJ002).
Abstract: In this paper, the estimation for a class of generalized varying coefficient models with error-prone covariates is considered. By combining basis function approximations with some auxiliary variables, an instrumental variable type estimation procedure is proposed. The asymptotic results of the estimator, such as the consistency and the weak convergence rate, are obtained. The proposed procedure can attenuate the effect of measurement errors and is shown to be workable for finite samples.
Funding: Supported by the Istanbul University Scientific Research Project Department under Project Number IRP-51706.
Abstract: Biometric gait recognition is a lesser-known but emerging and effective biometric recognition method which enables subjects’ walking patterns to be recognized. Existing research in this area has primarily focused on feature analysis through the extraction of individual features, which captures most of the information but fails to capture subtle variations in gait dynamics. Therefore, a novel feature taxonomy and an approach for deriving a relationship between a function of one set of gait features with another set are introduced. The gait features extracted from body halves divided by anatomical planes on vertical, horizontal, and diagonal axes are grouped to form canonical gait covariates. Canonical Correlation Analysis (CCA) is utilized to measure the strength of association between the canonical covariates of gait. Thus, gait assessment and identification are enhanced when more semantic information is available through CCA-based multi-feature fusion. Carnegie Mellon University’s 3D gait database, which contains 32 gait samples taken at different paces, is utilized in analyzing gait characteristics. The performance of Linear Discriminant Analysis, K-Nearest Neighbors, Naive Bayes, Artificial Neural Networks, and Support Vector Machines improved by 4% on average when the CCA-based gait identification approach was used. A maximum accuracy rate of 97.8% was achieved through CCA-based gait identification. Beyond that, the rate of false identifications and unrecognized gaits was halved, demonstrating state-of-the-art performance for gait identification.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11861042) and the China Statistical Research Project (Grant No. 2020LZ25).
Abstract: Missing covariate data is one of the active topics in modern statistical analysis. It often arises in surveys or interviews, and becomes more complex in the presence of heavy-tailed, skewed, and heteroscedastic data. In this setting, a robust quantile regression method is of particular interest. This paper presents an inverse weighted quantile regression method to explore the relationship between the response and covariates. This method has several advantages over the naive estimator. On the one hand, it uses all available data, and the missing covariates are allowed to be heavily correlated with the response; on the other hand, the estimator is uniform and asymptotically normal at all quantile levels. The effectiveness of this method is verified by simulation. Finally, to further illustrate its effectiveness, we extend the method to more general cases, namely the multivariate case and the nonparametric case.
Funding: The Canadian Institutes of Health Research; Quebec’s Ministry of Health and Social Services; the Estrie Regional Health and Social Services Agency; the Université de Sherbrooke.
Abstract: The objective of this study was to explore the relationship of sociodemographic, clinical, and health-services use-related variables with transitions between disability-based profiles. In a longitudinal study of 1386 people aged 75 and over living in the community at baseline, disabilities were assessed annually for up to four years with the Functional Autonomy Measurement System (SMAF), which generates 14 Iso-SMAF profiles. These profiles are grouped into 4 disability states: predominant alterations in instrumental activities of daily living (IADLs), in mobility, and in mental functions, as well as severe and mixed disabilities. Continuous-time, multi-state Markov modeling was used to identify the factors associated with transitions made by older people between these states and to institutionalization and death. Greater age and receiving help for ADL were associated with four transitions, while altered cognitive functions and hospitalization were associated with three, all involving more decline or less recovery. From mild IADL profiles, men have a higher risk of transitioning to intermediate predominantly mental profiles, while women are at higher risk of transitioning to intermediate predominantly mobility profiles. Unmet needs are associated with deterioration from mild IADL to intermediate predominantly mobility profiles. These results help in understanding the complex progression of disabilities in older people.
Funding: Supported financially by the National Natural Science Foundation of China (91325301, 41571212 and 41137224), the Project of "One-Three-Five" Strategic Planning & Frontier Sciences of the Institute of Soil Science, Chinese Academy of Sciences (ISSASIP1622), and the National Key Basic Research Special Foundation of China (2012FY112100).
Abstract: Environmental covariates are the basis of predictive soil mapping. Their selection determines the performance of soil mapping to a great extent, especially in cases where the number of soil samples is limited but soil spatial heterogeneity is high. In this study, we proposed an integrated method to select environmental covariates for predictive soil depth mapping. First, candidate variables that may influence the development of soil depth were selected based on pedogenetic knowledge. Second, three conventional methods (Pearson correlation analysis (PsCA), generalized additive models (GAMs), and Random Forest (RF)) were used to generate optimal combinations of environmental covariates. Finally, the three optimal combinations were integrated to produce a final combination based on the importance and occurrence frequency of each environmental covariate. We tested this method for soil depth mapping in the upper reaches of the Heihe River Basin in Northwest China. A total of 129 soil sites were sampled using a representative sampling strategy, and RF and support vector machine (SVM) models were used to map soil depth. The results showed that, compared to the sets of environmental covariates selected by the three conventional selection methods, the set selected by the proposed method achieved higher mapping accuracy. The combination from the proposed method obtained a root mean square error (RMSE) of 11.88 cm, which was 2.25–7.64 cm lower than the other methods, and an R^2 value of 0.76, which was 0.08–0.26 higher than the other methods. The results suggest that our method can be used as an alternative to the conventional methods for soil depth mapping and may also be effective for mapping other soil properties.
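The final integration step, which combines the three optimal covariate sets by importance and occurrence frequency, can be sketched in Python as follows. The abstract does not state the exact integration rule, so the rule used here (keep covariates chosen by at least two methods, rank by vote count and then by mean normalized importance), the function name `integrate_selections`, and the covariate names and scores are all assumptions for illustration only.

```python
from collections import defaultdict


def integrate_selections(selections, min_votes=2):
    """Combine per-method covariate selections (hypothetical rule, see lead-in).

    selections: dict mapping method name -> dict of covariate -> importance (>= 0).
    Returns covariates picked by at least `min_votes` methods, ordered by
    occurrence frequency, then by mean importance normalized within each method.
    """
    votes = defaultdict(int)
    score = defaultdict(float)
    for importances in selections.values():
        if not importances:
            continue  # a method that selected nothing contributes no votes
        top = max(importances.values())
        for covariate, importance in importances.items():
            votes[covariate] += 1
            # Scale by the method's largest importance so methods are comparable.
            score[covariate] += importance / top if top > 0 else 0.0
    kept = [c for c in votes if votes[c] >= min_votes]
    kept.sort(key=lambda c: (-votes[c], -score[c] / votes[c], c))
    return kept


# Made-up importances for three selection methods (not data from the study).
example = {
    "PsCA": {"elevation": 0.6, "slope": 0.3},
    "GAM": {"elevation": 0.8, "ndvi": 0.4, "slope": 0.2},
    "RF": {"elevation": 0.5, "ndvi": 0.5, "twi": 0.1},
}
print(integrate_selections(example))
```

Normalizing importance within each method is one possible way to make correlation coefficients, GAM contributions, and RF importances comparable; other weightings would be equally defensible.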
Abstract: Given a sample of regression data from (Y, Z), a new diagnostic plotting method is proposed for checking the hypothesis H_0: the data are from a given Cox model with the time-dependent covariates Z. It compares two estimates of the marginal distribution F_Y of Y. One is an estimate of the modified expression of F_Y under H_0, based on a consistent estimate of the parameter under H_0 and on the baseline distribution of the data. The other is the Kaplan-Meier estimator of F_Y, together with its confidence band. The new plot, called the marginal distribution plot, can be viewed as a test of H_0. The main advantage of the test over the existing residual tests arises when the data do not satisfy any Cox model or the Cox model is mis-specified. In that case the new test remains valid, whereas the residual tests do not and often commit a type II error with very large probability.
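One of the two curves compared in the marginal distribution plot is the Kaplan-Meier estimator. The following minimal Python sketch shows that piece only, under simplifying assumptions: right-censored data, no covariates, and no confidence band. The model-based estimate under H_0 is not reproduced here, and the function name `kaplan_meier` is illustrative.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate S(t) at each distinct event time.

    times:  observed times (event or censoring).
    events: 1 if the event was observed at that time, 0 if censored.
    Returns a list of (t, S(t)) pairs; the marginal distribution is 1 - S(t).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Subjects censored at t are still counted as at risk at t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


# Toy data: four subjects, one censored at time 2.
for t, s in kaplan_meier([1, 2, 2, 3], [1, 0, 1, 1]):
    print(t, round(s, 3))
```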
Abstract: Considering time-varying covariates and time-varying coefficient effects in survival models is a plausible and robust technique. This kind of analysis can be carried out with a general class of semiparametric transformation models. The aim of this article is to develop modified estimating equations under semiparametric transformation models of survival time with time-varying coefficient effects and time-varying continuous covariates. For this, it is important to organize the data in a counting process style and to transform the time with the standard transformation classes applied in this article. When the effects of coefficients and covariates change over time, the widely used maximum likelihood estimation method becomes more complex and burdensome for obtaining consistent estimates. To overcome this problem, modified estimating equations were applied instead to estimate the unknown parameters and unspecified monotone transformation functions. The estimating equations were modified to incorporate the time-varying effect in both coefficients and covariates. The performance of the proposed methods is tested through a simulation study. In summary, the effect of possibly time-varying covariates and time-varying coefficients was evaluated in some special cases of semiparametric transformation models. The results show that the role of the time-varying covariate in the semiparametric transformation models was plausible and credible.
Abstract: The empirical likelihood-based inference for varying coefficient models with missing covariates is investigated. An imputed empirical likelihood ratio function for the coefficient functions is proposed, and it is shown that its limiting distribution is standard chi-squared. Then the corresponding confidence intervals for the regression coefficients are constructed. Some simulations show that the proposed procedure can attenuate the effect of the missing data and performs well in finite samples.
Funding: Supported by the Special Project for Training Young Science and Technology Talents in Early Career Development of Jiangxi Province, China (Grant No. 20244BCE52087).
Abstract: This study proposes a regression-based estimation method in difference-in-differences settings in the presence of time-varying covariates, a scenario commonly encountered in applications. We impose only a conditional parallel trends assumption with time-varying covariates and plausible assumptions on the conditional expectation functions. We show that a family of causal effect parameters is exactly the coefficient estimators from our proposed regressions, even in the presence of staggered treatment timing and treatment effect heterogeneity across cohorts, time periods, and covariates. These parameters can be further aggregated to the dynamic treatment effects and the overall effect of being treated. We establish the corresponding asymptotic properties. Simulation studies suggest that our proposed regression-based estimators perform well in estimating the causal parameters. Finally, we apply this method to evaluate the effect of intrastate bank deregulation on income inequality in the United States in the setting of Beck et al. (2010). We find substantially different results based on our proposed method.
Funding: Supported by the MME-DII center of excellence (ANR-11-LABEX-0023-01), the ANR BREAKRISK (ANR-17-CE26-0001-01), the CY Initiative of Excellence (grant “Investissements d’Avenir” ANR-16-IDEX-0008), and Project “EcoDep” PSI-AAP2020-0000000013.
Abstract: In this article, we study a robust estimation method for a general class of integer-valued time series models. The conditional distribution of the process belongs to a broad class of distributions and, unlike the classical autoregressive framework, the conditional mean of the process also depends on some exogenous covariates. We derive a robust inference procedure based on the minimum density power divergence. Under certain regularity conditions, we establish that the proposed estimator is consistent and asymptotically normal. In the case where the conditional distribution belongs to the exponential family, we provide sufficient conditions for the existence of a stationary and ergodic τ-weakly dependent solution. Simulation experiments are conducted to illustrate the empirical performance of the estimator. An application to the number of transactions per minute for the stock Ericsson B is also provided.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12322114, 12171188 and 11771171) and the Fundamental Research Funds for the Central Universities.
Abstract: How to achieve the tradeoff between privacy and utility is one of the fundamental problems in private data analysis. In this paper, we give a rigorously differentially private analysis of networks in the presence of covariates via a generalized β-model, which has an n-dimensional degree parameter $\beta$ and a p-dimensional homophily parameter $\gamma$. Under $(k_n, \epsilon_n)$-edge differential privacy, we use the popular Laplace mechanism to release the network statistics. The method of moments is used to estimate the unknown model parameters. We establish the conditions guaranteeing consistency of the differentially private estimators $\hat{\beta}$ and $\hat{\gamma}$ as the number of nodes n goes to infinity, which reveals an interesting tradeoff between a privacy parameter and model parameters. The consistency is shown by applying a two-stage Newton's method to obtain the upper bound of the error between $(\hat{\beta}, \hat{\gamma})$ and its true value $(\beta, \gamma)$ in terms of the $\ell_\infty$ distance, which has a convergence rate of rough order $1/n^{1/2}$ for $\hat{\beta}$ and $1/n$ for $\hat{\gamma}$, up to a logarithm factor. Furthermore, we derive the asymptotic normality of $\hat{\beta}$ and $\hat{\gamma}$, whose asymptotic variances are the same as those of the non-private estimators under some conditions. Our paper sheds light on how to explore asymptotic theory under differential privacy in a principled manner; these principled methods should be applicable to a class of network models with covariates beyond the generalized β-model.
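The Laplace mechanism named in this abstract can be illustrated with a simplified Python sketch. It assumes plain ε-edge differential privacy on an undirected simple graph and releases only the degree sequence, whose L1 sensitivity is 2 because adding or removing one edge changes two degrees by one each. The paper's $(k_n, \epsilon_n)$ setting and its covariate statistics are not reproduced, and the function names are illustrative.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    Uses the fact that the difference of two independent exponential
    variables with mean `scale` follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_degree_sequence(adjacency, epsilon, seed=None):
    """Release a noisy degree sequence (simplified sketch, see lead-in).

    adjacency: symmetric 0/1 matrix (list of lists) with a zero diagonal.
    epsilon:   privacy budget; smaller values add more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    degrees = [sum(row) for row in adjacency]
    scale = 2.0 / epsilon  # L1 sensitivity of the degree sequence is 2
    return [d + laplace_noise(scale, rng) for d in degrees]


# Toy triangle graph: every node has degree 2 before noise is added.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(private_degree_sequence(triangle, epsilon=1.0, seed=0))
```

The noisy degrees would then feed the moment equations in place of the observed degrees; that estimation step is beyond the scope of this sketch.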
Funding: Supported by the National Science Foundation of China (Grant No. 11571263).
Abstract: This paper deals with the analysis of the accelerated failure time model when the primary covariate is subject to missingness. We assume that the true covariate is measured precisely on a randomly chosen validation set, whereas auxiliary information for the primary covariate is available for all study subjects. The asymptotic properties of the proposed estimator are developed, and the simulation studies show that the efficiency gain is remarkable compared to the method using only the validation sample. A real example is also provided as an illustration.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12001277, 12271046 and 12131006).
Abstract: Penalized variable selection methods are often used to select the relevant covariates and estimate the unknown regression coefficients simultaneously, but these existing methods may fail to be consistent in settings with highly correlated covariates. In this paper, the semi-standard partial covariance (SPAC) method with Lasso penalty is proposed to study the generalized linear model with highly correlated covariates, and the consistency of the estimation and of the variable selection is shown in high-dimensional settings under some regularity conditions. Some simulation studies and an analysis of a colon tumor dataset are carried out to show that the proposed method handles highly correlated covariates better than the traditional penalized variable selection methods.
Funding: Supported by the Natural Science Foundation of China under Grant Nos. 10801019 and 10726037.
Abstract: Causation is a distinct concept from association and is more important than association in epidemiologic studies. This paper proposes the concept of uniform non-confounding for causal distribution effects over multiple covariates, gives sufficient conditions for uniform non-confounding over a covariate set C including confounders or non-confounders, and also shows the conditions for conditional non-confounding in the subpopulations. All these conditions can be tested with observed data.
Funding: This research was supported by the Key Projects of Philosophy and Social Science in Beijing (15ZDA47), the National Natural Science Foundation of China (Grant Nos. 11571340, 11971045), the Beijing Natural Science Foundation (1202001), and the Open Project of the Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences.
Abstract: In recent years, there has been a large amount of literature on missing data. Most of it focuses on situations where there is missingness only in the response or only in the covariates. In this paper, we consider the adequacy check for the linear regression model with the response and covariates missing simultaneously. We apply model adjustment and inverse probability weighting methods to deal with the missingness of the response and the covariates, respectively. In order to avoid the curse of dimensionality, we propose an empirical process test with the linear indicator weighting function. The asymptotic properties of the proposed test under the null, local, and global alternative hypothetical models are rigorously investigated. A consistent wild bootstrap method is developed to approximate the critical value. Finally, simulation studies and real data analysis are performed to show that the proposed method performs well.
Funding: Supported by the National Natural Science Foundation of China (Nos. 10901162, 10926073), the China Postdoctoral Science Foundation, the President Fund of GUCAS, the foundation of the Key Laboratory of Random Complex Structures and Data Science, CAS, and a research grant from the Research Committee, The Hong Kong Polytechnic University.
Abstract: In this paper, we investigate the model checking problem for a general linear model with nonignorable missing covariates. We show that, without any parametric model assumption for the response probability, the least squares method yields consistent estimators for the linear model even if only the complete data are used. This makes it feasible to propose two testing procedures for the corresponding model checking problem: a score type lack-of-fit test and a test based on the empirical process. The asymptotic properties of the test statistics are investigated. Both tests are shown to have asymptotic power 1 for local alternatives converging to the null at the rate $n^{-r}$, $0 \le r < 1/2$. Simulation results show that both tests perform satisfactorily.
Abstract: Feature screening with missing data is a critical problem but has not been well addressed in the literature. In this discussion we propose a new screening index based on “information value” and apply it to feature screening with missing covariates.
Funding: Financially supported by the National Key Research and Development Program of China (2021YFD2200405), the National Natural Science Foundation of China (31930078), and special funds for the Baotianman Forest Ecosystem Research Station from the Chinese Academy of Forestry and the Ministry of Science and Technology of China.
Abstract: Frequent droughts pose a considerable threat to global forest carbon uptake, but little is known about the response of forest carbon fluxes in climatic transition zones to seasonal drought. In this study, the responses of carbon fluxes to seasonal drought in two natural forests (Quercus aliena var. acute serrata Maxim and Pinus tabuliformis Carr.) in the Baotianman Nature Reserve were investigated. The Q. aliena forest exhibited high resilience with stable gross primary productivity (GPP). However, ecosystem respiration (Re) declined significantly, by 18.4% compared with normal years, leading to an increase in net carbon sequestration capacity of 4.1%. This resilience was attributed to its deep root system accessing soil water (SWC_(50cm)) to sustain stomatal openness, coupled with the efficient utilization of photosynthetically active radiation to drive photosynthesis. In contrast, the P. tabuliformis forest, which relied on shallow soil moisture (SWC_(20cm)), experienced simultaneous decreases in both GPP and Re during drought, with a much greater decrease in GPP, resulting in low net carbon sink capacity. Further analysis revealed that the Q. aliena forest prioritized carbon assimilation through a deep water-stomatal synergy strategy (anisohydric behavior), whereas the P. tabuliformis forest adopted an isohydric strategy favoring water conservation at the expense of carbon fixation efficiency. These findings highlight distinct mechanisms underlying drought adaptation between forest types, providing critical insight into optimizing forest carbon cycle models and selecting drought-resistant species under the influence of climate change.