The lack of covariate data is one of the focal problems of modern statistical analysis. It often arises in surveys and interviews, and becomes more complex in the presence of heavy-tailed, skewed, and heteroscedastic data. In this setting, a robust quantile regression method is of particular interest. This paper presents an inverse weighted quantile regression method to explore the relationship between the response and covariates. The method has several advantages over the naive estimator. On the one hand, it uses all available data and allows the missing covariates to be heavily correlated with the response; on the other hand, the estimator is asymptotically normal uniformly over all quantile levels. The effectiveness of the method is verified by simulation. Finally, to illustrate its generality, we extend the method to more general settings: the multivariate case and the nonparametric case.
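As a hedged illustration of the inverse-weighting idea (a minimal sketch, not the authors' exact estimator), the code below fits a quantile regression by minimizing an inverse-probability-weighted check loss on the complete cases. It assumes the observation propensities are known; in practice they would be estimated, for example by logistic regression.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def check_loss(u, tau):
    # Quantile (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})
    return u * (tau - (u < 0))

def ipw_quantreg(y, X, w, tau):
    """Inverse-probability-weighted quantile regression via direct
    minimization of the weighted check loss (Nelder-Mead for simplicity)."""
    def objective(beta):
        return np.sum(w * check_loss(y - X @ beta, tau))
    res = minimize(objective, np.zeros(X.shape[1]), method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 20000})
    return res.x

# Simulate: y = 1 + 2*x + noise; the covariate x is missing for some units,
# with missingness probability depending on the response (selection on y).
n = 2000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + 0.5 * y)))   # true observation propensity
observed = rng.random(n) < p_obs

# Complete-case analysis reweighted by 1 / propensity (here the true
# propensity; in practice it would be estimated).
X = np.column_stack([np.ones(n), x])
w = 1.0 / p_obs
beta_hat = ipw_quantreg(y[observed], X[observed], w[observed], tau=0.5)
```

An unweighted complete-case fit would be biased here because observation depends on the response; the weights restore consistency at the median.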
The paper discusses the regression analysis of current status data, which is common in various fields such as tumorigenicity research and demographic studies. Analyzing this type of data poses a significant challenge and has recently gained considerable interest. Furthermore, the authors consider an even more difficult scenario where, apart from censoring, one also faces left-truncation and informative censoring, meaning that there is a potential correlation between the examination time and the failure time of interest. The authors propose a sieve maximum likelihood estimation (MLE) method, in which a copula-based procedure is applied to describe the informative censoring. Additionally, the authors utilise splines to estimate the unknown nonparametric functions in the model, and the asymptotic properties of the proposed estimator are established. Simulation results indicate that the developed approach is effective in practice, and it has been successfully applied to a set of real data.
Unlike the general online social network (OSN), the location-based mobile social network (LMSN), which seamlessly integrates mobile computing and social computing technologies, has unique characteristics of temporal, spatial, and social correlation. Recommending friends instantly based on users' current locations in the real world has become increasingly popular in LMSNs. However, the existing friend recommendation methods based on topological structures of a social network, or on non-topological information such as similar user profiles, cannot adequately support instant friend-making in the real world. In this article, we analyze users' check-in behavior on a real LMSN site named Gowalla. Based on this analysis, we present an approach for recommending friends instantly to LMSN users by simultaneously considering real-time physical location proximity, offline behavior similarity, and friendship network information in the virtual community. This approach effectively bridges the gap between users' offline behavior in the real world and online friendship network information in the virtual community. Finally, we use the real user check-in dataset of Gowalla to verify the effectiveness of our approach.
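A toy sketch of how the three signals could be blended into one recommendation score. The weights `alpha`, `beta`, `gamma`, the inverse-distance proximity term, and the Jaccard similarities are illustrative assumptions, not the paper's model.

```python
import math

def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def friend_score(u, v, alpha=0.4, beta=0.3, gamma=0.3):
    """Blend real-time proximity, offline check-in similarity, and online
    friendship overlap into one score in [0, 1] (illustrative weights)."""
    proximity = 1.0 / (1.0 + haversine_km(u["loc"], v["loc"]))
    a, b = u["checkins"], v["checkins"]
    jaccard = len(a & b) / len(a | b) if a | b else 0.0       # offline behavior
    fa, fb = u["friends"], v["friends"]
    overlap = len(fa & fb) / len(fa | fb) if fa | fb else 0.0  # online network
    return alpha * proximity + beta * jaccard + gamma * overlap

# Hypothetical users about 1.4 km apart with overlapping venues and friends.
alice = {"loc": (39.90, 116.40), "checkins": {"cafe", "gym", "park"},
         "friends": {"bob", "carol"}}
dave = {"loc": (39.91, 116.41), "checkins": {"cafe", "park", "mall"},
        "friends": {"carol", "erin"}}
score = friend_score(alice, dave)
```

Candidates would be ranked by this score and the top few recommended in real time.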
Deep learning has become increasingly popular in omics data analysis. Recent works incorporating variable selection into deep learning have greatly enhanced model interpretability. However, because deep learning requires a large sample size, the existing methods may yield uncertain findings when the dataset has a small sample size, as is common in omics data analysis. With the explosion and availability of omics data from multiple populations/studies, the existing methods naively pool them into one dataset to enhance the sample size, ignoring the fact that variable structures can differ across datasets, which may lead to inaccurate variable selection results. We propose a penalized integrative deep neural network (PIN) to simultaneously select important variables from multiple datasets. PIN directly aggregates multiple datasets as input and accommodates both homogeneity and heterogeneity among the datasets in an integrative analysis framework. Results from extensive simulation studies and from applications of PIN to gene expression datasets from elders with different cognitive statuses and from ovarian cancer patients at different stages demonstrate that PIN outperforms existing methods, with considerably improved performance across multiple datasets. The source code is freely available on GitHub (rucliyang/PINFunc). We expect the proposed PIN method to promote the identification of disease-related important variables based on multiple studies/datasets from diverse origins.
In this paper, we study the optimal timing for an insurance company to convert its risky business in order to improve its solvency. The cash flow of the company evolves according to a jump-diffusion process. A business conversion option offers the company an opportunity to transfer the jump-risk business out. In exchange for this option, the company needs to pay both fixed and proportional transaction costs; the proportional cost can also be viewed as the profit loading of the jump-risk business. We formulate this problem as an optimal stopping problem. By solving the stopping problem, we find that the optimal timing of business conversion mainly depends on the profit loading of the jump-risk business: a sufficiently large profit loading makes the conversion option valueless, whereas the fixed cost only delays the optimal timing of business conversion. Finally, numerical results are provided to illustrate the impacts of the transaction costs and environmental parameters on the optimal strategies.
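A minimal simulation of the surplus process can illustrate the trade-off behind the conversion decision. The parameters below (drift 2.0, unit-rate exponential jumps, profit loading 1.2) are illustrative assumptions, and the Euler scheme is a crude sketch rather than the paper's optimal stopping analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ruin(x0, mu, sigma, lam, jump_mean, T=10.0, n_steps=500, n_paths=5000):
    """Euler scheme for the surplus X_t = x0 + mu*t + sigma*W_t - S_t, where
    S_t is a compound Poisson sum of Exp(jump_mean) claims with rate lam.
    Returns the fraction of paths whose surplus ever falls below zero."""
    dt = T / n_steps
    X = np.full(n_paths, x0)
    ruined = np.zeros(n_paths, dtype=bool)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)
        n_jumps = rng.poisson(lam * dt, n_paths)
        # Sum of k Exp(jump_mean) claims is Gamma(k, jump_mean).
        claims = np.where(n_jumps > 0,
                          rng.gamma(np.maximum(n_jumps, 1), jump_mean, n_paths),
                          0.0)
        X = X + mu * dt + sigma * dW - claims
        ruined |= X < 0
    return ruined.mean()

x0, mu, sigma, loading = 5.0, 2.0, 0.5, 1.2
ruin_jump = simulate_ruin(x0, mu, sigma, lam=1.0, jump_mean=1.0)
# After conversion: the jump risk is gone, but the drift drops by the loading.
ruin_pure = simulate_ruin(x0, mu - loading, sigma, lam=0.0, jump_mean=1.0)
```

Comparing the two ruin frequencies shows why a larger profit loading makes conversion less attractive: it trades jump risk for a permanently lower drift.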
To the Editor: Esophageal cancer, one of the most common cancer types in China, with an estimated 346,633 new cases and 323,600 deaths in 2022, is becoming an increasingly serious clinical and public health problem.[1] The successful promotion of the self-management strategy has indicated that lifestyle modifications can be valuable in the primary prevention of cancer. Adopting a healthy lifestyle has become a novel strategy for primary prevention and risk reduction in high-risk areas. Previous epidemiological studies have identified several lifestyle-related risk factors for esophageal cancer, including smoking and diet.[2] Each factor typically explains only a modest proportion of cancer risk; combined, however, these known risk factors may substantially affect the risk of esophageal cancer. Nevertheless, some risk factors for esophageal cancer are non-modifiable, including age, low socioeconomic status, and family history. Whether and how these non-modifiable risk factors affect primary cancer prevention through interventions on modifiable risk factors remains unclear.
Space-filling designs are popular for computer experiments. Among them, space-filling designs with good two-dimensional projections are preferred, because in practice two-factor interactions are more likely to be important than three- or higher-order interactions. With two-dimensional projections in mind, the authors propose a new class of designs called group strong orthogonal arrays. A group strong orthogonal array enjoys attractive two-dimensional space-filling properties in the sense that it can be partitioned into groups, where any two columns achieve stratifications on s^(u_1) × s^(u_2) grids for any positive integers u_1, u_2 with u_1 + u_2 = 3, and any two columns from different groups achieve stratifications on s^(v_1) × s^(v_2) grids for any positive integers v_1, v_2 with v_1 + v_2 = 4. Few existing designs in the literature enjoy such appealing two-dimensional stratification properties. The number of levels of the obtained designs can be s^3 or s^4. In addition to the attractive stratification properties, the proposed designs perform very well under orthogonality and uniform projection criteria and are flexible in run sizes, rendering them highly suitable for computer experiments.
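The stratification property can be checked numerically. The helper below is a generic grid-stratification checker (not a construction of group strong orthogonal arrays); it is exercised on a toy 64-run design whose two 8-level columns form a full 8 × 8 grid with s = 2, which trivially stratifies every 2^(u_1) × 2^(u_2) grid.

```python
import numpy as np

def stratifies(c1, c2, s, u1, u2):
    """True if a pair of columns with levels 0..s^k - 1 places an equal
    number of design points in every cell of the s^u1 x s^u2 grid."""
    k1 = int(round(np.log(c1.max() + 1) / np.log(s)))  # levels = s^k1
    k2 = int(round(np.log(c2.max() + 1) / np.log(s)))
    cell1 = c1 // s ** (k1 - u1)      # collapse to s^u1 coarse intervals
    cell2 = c2 // s ** (k2 - u2)
    counts = np.zeros((s ** u1, s ** u2), dtype=int)
    np.add.at(counts, (cell1, cell2), 1)
    return counts.min() == counts.max()

# Toy check: the full 8 x 8 grid (s = 2, 64 runs) stratifies every
# 2^u1 x 2^u2 grid; a degenerate pair (a column with itself) does not.
s = 2
c1, c2 = np.meshgrid(np.arange(8), np.arange(8))
c1, c2 = c1.ravel(), c2.ravel()
ok = all(stratifies(c1, c2, s, u1, u2)
         for u1 in range(1, 3) for u2 in range(1, 3))
```

For an actual group strong orthogonal array, one would run the same checker over all within-group and between-group column pairs with u_1 + u_2 = 3 and v_1 + v_2 = 4, respectively.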
This paper discusses regression analysis of interval-censored failure time data arising from the accelerated failure time model in the presence of informative censoring. For the problem, a sieve maximum likelihood estimation approach is proposed, in which a copula model is employed to describe the relationship between the failure time of interest and the censoring or observation process. I-spline functions are used to approximate the unknown functions in the model. A simulation study is carried out to assess the finite-sample performance of the proposed approach and suggests that it works well in practical situations. In addition, an illustrative example is provided.
Based on the Vector Aitken (VA) method, we propose an accelerated Expectation-Maximization (EM) algorithm, the VA-accelerated EM algorithm, whose convergence is faster than that of the EM algorithm. The VA-accelerated EM algorithm does not use the information matrix; it only uses the sequence of estimates obtained from iterations of the EM algorithm, and thus keeps the flexibility and simplicity of the EM algorithm. Considering the Steffensen iterative process, we also give the Steffensen form of the VA-accelerated EM algorithm. It can be proved that the resulting process converges quadratically. Numerical analysis illustrates that the proposed methods are efficient and faster than the EM algorithm.
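A sketch of componentwise vector Aitken Δ² acceleration wrapped around a toy EM iteration (a two-component Gaussian mixture with known weights 0.3/0.7 and unit variances, estimating the two means). The mixture setup is an illustrative assumption, and this componentwise extrapolation is a simple reading of the VA idea, not necessarily the authors' exact scheme.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: mixture of N(-2, 1) (weight 0.3) and N(3, 1) (weight 0.7).
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

def em_step(mu):
    phi1 = np.exp(-0.5 * (x - mu[0]) ** 2)
    phi2 = np.exp(-0.5 * (x - mu[1]) ** 2)
    g = 0.3 * phi1 / (0.3 * phi1 + 0.7 * phi2)      # E-step responsibilities
    return np.array([np.sum(g * x) / np.sum(g),      # M-step: updated means
                     np.sum((1 - g) * x) / np.sum(1 - g)])

def va_em(mu, n_cycles=8):
    """Each cycle takes two EM steps and applies the componentwise Aitken
    delta-squared extrapolation mu* = mu - d1^2 / d2, using only the EM
    iterates themselves (no information matrix)."""
    for _ in range(n_cycles):
        m1 = em_step(mu)
        m2 = em_step(m1)
        d1 = m1 - mu
        d2 = m2 - 2 * m1 + mu
        safe = np.abs(d2) > 1e-12            # skip extrapolation at a fixed point
        mu = np.where(safe, mu - d1 ** 2 / np.where(safe, d2, 1.0), m2)
    return mu

mu_acc = va_em(np.array([0.0, 1.0]))
```

From the deliberately poor start (0, 1), a handful of accelerated cycles lands near the component means, whereas plain EM needs noticeably more iterations.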
Misclassified current status data arise if each study subject can be observed only once and the observation status is determined by a diagnostic test with imperfect sensitivity and specificity. In this situation, another issue that may occur is that the observation time may be correlated with the failure time of interest, which is often referred to as informative censoring or informative observation times. It is well known that an analysis ignoring informative censoring can yield biased or even misleading results. In this paper, the authors consider such data and propose a frailty-based inference procedure. In particular, an EM algorithm based on Poisson latent variables is developed, and the asymptotic properties of the resulting estimators are established. The numerical results show that the proposed method works well in practice, and an application to a set of real data is provided.
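The misclassification component of the likelihood can be sketched directly: with sensitivity `sens` and specificity `spec`, the probability of a positive test at examination time C is sens·F(C) + (1 − spec)·(1 − F(C)). The sketch below fits a one-parameter exponential failure model by MLE under this observation model; it deliberately ignores the frailty/informative-censoring part of the authors' method (the examination times here are independent of the failure times).

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)

# Current status data: each subject is examined once at time C, and the
# test reports "failed" with imperfect sensitivity and specificity.
sens, spec, lam_true = 0.9, 0.95, 0.5
n = 5000
T = rng.exponential(1.0 / lam_true, n)    # latent failure times
C = rng.uniform(0.5, 3.0, n)              # non-informative examination times
status = T <= C                           # true current status
flip = rng.random(n)
obs = np.where(status, flip < sens, flip < 1 - spec)  # misclassified result

def neg_loglik(lam):
    F = 1.0 - np.exp(-lam * C)                   # exponential failure model
    p = sens * F + (1.0 - spec) * (1.0 - F)      # P(observed positive | C)
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.sum(np.where(obs, np.log(p), np.log(1.0 - p)))

lam_hat = minimize_scalar(neg_loglik, bounds=(0.01, 5.0), method="bounded").x
```

Plugging the naive likelihood (sens = spec = 1) into the same data would bias the rate estimate, which is exactly the issue the corrected observation probability repairs.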
In this paper, linear errors-in-response models are considered in the presence of validation data on the responses. A semiparametric dimension reduction technique is employed to define an estimator of β with asymptotic normality, together with the estimated empirical log-likelihoods and the adjusted empirical log-likelihoods for the vector of regression coefficients and for linear combinations of the regression coefficients, respectively. The estimated empirical log-likelihoods are shown to be asymptotically distributed as weighted sums of independent χ²(1) variables, and the adjusted empirical log-likelihoods are proved to be asymptotically distributed as standard chi-squares.
This paper proposes a novel method for testing the equality of high-dimensional means using a multiple hypothesis test. The proposed method is based on a statistic formed as the maximum of standardized partial sums of logarithmic p-values. Numerical studies show that the method performs well for both normal and non-normal data and has good power under both dense and sparse alternative hypotheses. For illustration, a real data analysis is implemented.
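One plausible reading of such a statistic (an assumption on our part, since the abstract does not give the exact form): sort the p-values, take partial sums of −log p, standardize each partial sum, and maximize over the cut point, calibrating the null distribution by Monte Carlo. Under the global null, −log p is Exp(1), which motivates the centering and scaling used below.

```python
import numpy as np

rng = np.random.default_rng(4)

def max_partial_sum_stat(pvals):
    """Max over k of (S_k - k) / sqrt(k), where S_k is the partial sum of
    -log(p) over the k smallest p-values. The centering/scaling is only a
    convenient normalization; the null is calibrated by simulation below."""
    s = np.cumsum(-np.log(np.sort(pvals)))
    k = np.arange(1, len(pvals) + 1)
    return np.max((s - k) / np.sqrt(k))

def mc_pvalue(pvals, n_mc=2000):
    # Under the global null the p-values are i.i.d. Uniform(0, 1).
    t_obs = max_partial_sum_stat(pvals)
    null = np.array([max_partial_sum_stat(rng.random(len(pvals)))
                     for _ in range(n_mc)])
    return np.mean(null >= t_obs)

p_null = rng.random(200)                                         # global null
p_alt = np.concatenate([0.3 * rng.random(150), rng.random(50)])  # dense signal
p_val = mc_pvalue(p_alt)
```

Scanning over all cut points k is what lets one statistic stay powerful for both dense signals (large k) and sparse ones (small k).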
Testing independence between random vectors X and Y is an essential task in statistical inference. One class of testing methods is based on the minimal spanning tree of the variables X and Y. The main idea is to generate the minimal spanning tree for one random vector, say X, and for each edge of the minimal spanning tree to calculate a rank number based on the other random vector Y; the test statistics are then constructed from these rank numbers. However, the existing statistics are not symmetric in X and Y, so the power obtained from the minimal spanning tree of X differs from that obtained from the minimal spanning tree of Y. In addition, the conclusion drawn from the minimal spanning tree of X might conflict with that drawn from the minimal spanning tree of Y. To solve these problems, we propose several symmetric independence tests for X and Y. The exact distributions of the test statistics are investigated for small sample sizes, and we also study their asymptotic properties. A permutation method is introduced for obtaining critical values of the statistics. Numerical analysis demonstrates that the proposed methods are more efficient than the existing ones.
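A hedged sketch of one way to symmetrize an MST-based test: average an edge-rank statistic computed from MST(X) with ranks of Y and from MST(Y) with ranks of X, then calibrate by permutation. The particular edge statistic (mean absolute rank gap, univariate Y for simplicity) is an illustrative choice, not necessarily the authors'.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(5)

def mst_edge_stat(A, B):
    """Mean absolute rank gap of B (first coordinate) over the edges of the
    MST built on A; small values indicate that neighbours in A tend to be
    neighbours in B as well."""
    D = squareform(pdist(A.reshape(len(A), -1)))
    edges = minimum_spanning_tree(D).tocoo()       # the n-1 tree edges
    ranks = np.argsort(np.argsort(B.reshape(len(B), -1)[:, 0]))
    return np.mean(np.abs(ranks[edges.row] - ranks[edges.col]))

def symmetric_mst_test(X, Y, n_perm=300):
    """Symmetrized statistic: average the X-tree and Y-tree versions, then
    calibrate by permuting Y (small observed value suggests dependence)."""
    t_obs = 0.5 * (mst_edge_stat(X, Y) + mst_edge_stat(Y, X))
    null = np.array([
        0.5 * (mst_edge_stat(X, Y[p]) + mst_edge_stat(Y[p], X))
        for p in (rng.permutation(len(Y)) for _ in range(n_perm))])
    return np.mean(null <= t_obs)      # one-sided permutation p-value

n = 100
X = rng.normal(size=n)
Y = X + 0.3 * rng.normal(size=n)      # strongly dependent pair
p_dep = symmetric_mst_test(X, Y)
```

Because the statistic treats X and Y symmetrically, the two tree orientations can no longer yield conflicting conclusions.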
Prior empirical studies find positive and negative momentum effects across global markets, but few focus on explaining the mixed results. To address this issue, we apply the quantile regression approach to analyze the momentum effect in the context of the Chinese stock market. The evidence suggests that the momentum effect in Chinese stocks is not stable across firms with different levels of performance. We find that the negative momentum effect over short and medium horizons (3 months and 9 months) increases with the quantile of stock returns, while a positive momentum effect is observed over the long horizon (12 months), which also intensifies for high-performing stocks. According to our study, the momentum effect needs to be examined on the basis of stock returns; OLS estimation, which gives a single, potentially biased result, provides misleading intuition about momentum effects across global markets. Based on the empirical results of the quantile regression, effective risk control strategies can also be designed by adjusting the proportion of assets according to their past performance.
In many fields, we need to deal with hierarchically structured data. For this kind of data, the hierarchical mixed effects model can capture the correlation of variables at the same level by establishing a model for the regression coefficients. Due to the complexity of the random part of this model, seeking an effective method to estimate the covariance matrix is an appealing issue. The iterative generalized least squares estimation method was proposed by Goldstein in 1986 and was applied to special cases of the hierarchical model. In this paper, we extend the method to the general hierarchical mixed effects model, derive its expressions in detail, and apply it to economic examples.
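A simplified, moment-based variant of the IGLS idea for a balanced random-intercept model, y_ij = x_ij·β + u_j + e_ij: alternate a GLS step for β (using the closed-form inverse of the exchangeable covariance within each group) with updates of the two variance components. This is a sketch of the alternating scheme, not Goldstein's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated two-level data: 50 groups of 20, beta = 1.5, s2u = 2, s2e = 1.
J, m = 50, 20
group = np.repeat(np.arange(J), m)
x = rng.normal(size=J * m)
u = rng.normal(0, np.sqrt(2.0), J)
y = 1.5 * x + u[group] + rng.normal(size=J * m)

def igls(y, x, group, n_iter=20):
    """Alternate (i) GLS for beta given (s2u, s2e) and (ii) moment updates
    of the variance components from the residuals."""
    s2u, s2e = 1.0, 1.0
    J = group.max() + 1
    m_j = np.bincount(group)
    for _ in range(n_iter):
        # With V = s2e*I + s2u*J_m per group, V^{-1} = (I - rho*J_m)/s2e,
        # where rho = s2u / (s2e + m*s2u); the 1/s2e factor cancels in beta.
        rho = s2u / (s2e + m_j * s2u)
        xbar = np.bincount(group, weights=x) / m_j
        xs = x - (rho * m_j)[group] * xbar[group]
        beta = np.sum(xs * y) / np.sum(xs * x)
        r = y - beta * x
        rbar = np.bincount(group, weights=r) / m_j
        within = r - rbar[group]
        s2e = np.sum(within ** 2) / (len(y) - J)            # within-group part
        s2u = max(np.var(rbar) - s2e / m_j.mean(), 1e-8)    # between-group part
    return beta, s2u, s2e

beta_hat, s2u_hat, s2e_hat = igls(y, x, group)
```

The full IGLS instead regresses cross-products of residuals on the design of the variance components, but the alternating structure is the same.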
The architecture of apple trees plays a pivotal role in shaping their growth and fruit-bearing potential, forming the foundation for precision apple management. Traditionally, 2D imaging technologies were employed to delineate the architectural traits of apple trees, but their accuracy was hampered by occlusion and perspective ambiguities. This study aimed to surmount these constraints by devising a 3D geometry-based processing pipeline for apple tree structure segmentation and architectural trait characterization, utilizing point clouds collected by a terrestrial laser scanner (TLS). The pipeline consists of four modules: (a) a data preprocessing module, (b) a tree instance segmentation module, (c) a tree structure segmentation module, and (d) an architectural trait extraction module. The developed pipeline was used to analyze 84 trees of two representative apple cultivars, characterizing architectural traits such as tree height, trunk diameter, branch count, branch diameter, and branch angle. Experimental results indicated that the established pipeline attained an R^2 of 0.92 and 0.83, and a mean absolute error (MAE) of 6.1 cm and 4.71 mm, for tree height and trunk diameter at the tree level, respectively. At the branch level, it achieved an R^2 of 0.77 and 0.69, and an MAE of 6.86 mm and 7.48°, for branch diameter and branch angle, respectively. The accurate measurement of these architectural traits can enable precision management in high-density apple orchards and bolster phenotyping efforts in breeding programs. Moreover, the general bottlenecks of 3D tree characterization are comprehensively analyzed to point out directions for future development.
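One representative sub-step of such a trait-extraction module is estimating trunk diameter from a horizontal slice of the point cloud. The sketch below uses an algebraic (Kåsa) least-squares circle fit on a simulated half-arc slice (TLS typically sees only one side of the trunk); the slice geometry and 2 mm noise level are simulated assumptions, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(7)

def fit_circle(xy):
    """Algebraic (Kasa) least-squares circle fit: rewrite
    (x - a)^2 + (y - b)^2 = r^2 as x^2 + y^2 = 2ax + 2by + c,
    solve the linear system, then recover r = sqrt(c + a^2 + b^2)."""
    x, y = xy[:, 0], xy[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones(len(x))])
    b = x ** 2 + y ** 2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    return (cx, cy), np.sqrt(c + cx ** 2 + cy ** 2)

# Simulated slice: half-arc of a 12 cm radius trunk with 2 mm scanner noise.
theta = rng.uniform(0, np.pi, 400)
r_true, centre = 0.12, (1.0, 2.0)
pts = np.column_stack([centre[0] + r_true * np.cos(theta),
                       centre[1] + r_true * np.sin(theta)])
pts += rng.normal(0, 0.002, pts.shape)

(cx, cy), r_hat = fit_circle(pts)
diameter_cm = 2 * r_hat * 100
```

The linear algebraic fit is robust to the half-arc visibility that defeats naive bounding-box diameter estimates.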
Strong orthogonal arrays (SOAs) were recently introduced and studied as a class of space-filling designs for computer experiments. To realize better space-filling properties, SOAs of strength three or higher are desirable. In addition, orthogonality is an important property for designs of computer experiments, because it guarantees that the estimates of the main effects are uncorrelated. This paper first provides a systematic study of the construction of (nearly) orthogonal strength-three SOAs with better space-filling properties. The newly proposed strength-three SOAs enjoy almost the same space-filling properties as strength-four SOAs and can accommodate many more columns than the latter. Moreover, they are (nearly) orthogonal and flexible in run sizes. The construction methods are straightforward to implement, and their theoretical supports are well established. In addition to the theoretical results, many designs are tabulated for practical needs.
The first passage time has many applications in fields such as finance, econometrics, statistics, and biology. However, explicit formulas for the first passage density have been obtained in only a few cases. This paper derives an explicit formula for the first passage density of Brownian motion with two-sided piecewise continuous boundaries, which may have some points of discontinuity. Approximations are used to obtain a simplified formula for estimating the first passage density, and the results are also generalized to the case of two-sided general nonlinear boundaries. Simulations can easily be carried out with the Monte Carlo method, and it is demonstrated for several typical two-sided boundaries that the proposed approximation method offers a highly accurate approximation of the first passage density.
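A crude Monte Carlo benchmark for such formulas is easy to set up: simulate discretized Brownian paths and record when each first leaves the band between the two boundaries. The constant band (−1, 1) below is an illustrative choice; piecewise or nonlinear boundaries plug in as different `lower`/`upper` functions. Discrete monitoring slightly undercounts crossings, so the estimate sits a little below the continuous-time value.

```python
import numpy as np

rng = np.random.default_rng(8)

def exit_probability(lower, upper, T=1.0, n_steps=1000, n_paths=5000):
    """Monte Carlo estimate of P(tau <= T), where tau is the first time
    standard Brownian motion leaves the band (lower(t), upper(t)). The same
    loop yields the first passage times, whose histogram estimates the
    first passage density."""
    dt = T / n_steps
    t = np.linspace(dt, T, n_steps)
    # n_paths discretized Brownian paths, monitored on the grid t.
    W = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps)), axis=1)
    crossed = (W >= upper(t)) | (W <= lower(t))
    return np.mean(crossed.any(axis=1))

# Constant two-sided band (-1, 1); for this case the classical series gives
# P(tau <= 1) approximately 0.63.
p_exit = exit_probability(lambda t: -1.0 + 0.0 * t, lambda t: 1.0 + 0.0 * t)
```

Storing the first crossing index per path instead of only the indicator turns the same simulation into a density estimate for τ.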
Principal component analysis (PCA) is ubiquitous in statistics and machine learning. It is frequently used as an intermediate step in various regression and classification problems to reduce the dimensionality of datasets. However, as datasets become extremely large, the direct application of PCA may not be feasible, since loading and storing massive datasets may exceed the computational capacity of common machines. To address this problem, subsampling is usually performed, in which a small proportion of the data is used as a surrogate for the entire dataset. This paper proposes an A-optimal subsampling algorithm to decrease the computational cost of PCA for super-large datasets. More specifically, we establish the consistency and asymptotic normality of the eigenvectors of the subsampled covariance matrix, and then derive the optimal subsampling probabilities for PCA based on the A-optimality criterion. We validate the theoretical results through extensive simulation studies. Moreover, the proposed subsampling algorithm for PCA is embedded into a classification procedure for handwriting data to assess its effectiveness in real-world applications.
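A sketch of subsampled PCA with importance reweighting. The row-norm-proportional probabilities below are a common leverage-style stand-in for the paper's A-optimal probabilities (which depend on the eigenstructure and are derived in the paper); the reweighting by 1/(r·p_i) keeps the subsampled covariance unbiased regardless of the sampling scheme.

```python
import numpy as np

rng = np.random.default_rng(9)

# Full dataset with one dominant principal direction v.
n, d = 100000, 10
v = np.ones(d) / np.sqrt(d)
X = rng.normal(size=(n, d)) + 3.0 * rng.normal(size=(n, 1)) * v

def subsampled_pca(X, r, probs):
    """PCA on r rows drawn with probabilities `probs`; each sampled row is
    reweighted by 1/(r * p_i) so the subsampled covariance is unbiased."""
    idx = rng.choice(len(X), size=r, replace=True, p=probs)
    w = 1.0 / (r * probs[idx])
    Xs = X[idx] - np.average(X[idx], axis=0, weights=w)
    cov = (Xs * w[:, None]).T @ Xs / len(X)
    vals, vecs = np.linalg.eigh(cov)
    return vecs[:, -1]                   # leading eigenvector

# Norm-proportional sampling probabilities (illustrative, not A-optimal).
norms2 = np.einsum("ij,ij->i", X, X)
p_norm = norms2 / norms2.sum()
v_hat = subsampled_pca(X, r=2000, probs=p_norm)
```

With only 2% of the rows, the leading subsampled eigenvector is nearly aligned with the full-data direction, which is the practical payoff of informative subsampling.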
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11861042) and the China Statistical Research Project (Grant No. 2020LZ25).
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 12171328, 12001093, 12231011, and 12071176, the National Key Research and Development Program of China under Grant No. 2020YFA0714102, and the Beijing Natural Science Foundation under Grant No. Z210003.
Funding: Supported by the National Key Basic Research Program of China (973 Program) under Grant Nos. 2012CB315802 and 2013CB329102, the National Natural Science Foundation of China under Grant Nos. 61171102 and 61132001, the New Generation Broadband Wireless Mobile Communication Network Key Projects for Science and Technology Development under Grant No. 2011ZX03002-002-01, the Beijing Nova Program under Grant No. 2008B50, and the Beijing Higher Education Young Elite Teacher Project under Grant No. YETP0478.
Funding: Supported by the National Natural Science Foundation of China (Grant/Award Number: 72271237) and Building World-class Universities of Renmin University of China (Grant/Award Number: 21XNF037).
Funding: Supported by the National Natural Science Foundation of China (Nos. 12101300, 12371478, and 12071498).
Funding: Supported by grants from the National Natural Science Foundation of China (No. 72104150), the MOE Project of Key Research Institute of Humanities and Social Sciences (No. 22JJD910001), the CAMS Innovation Fund for Medical Sciences (No. 2021-I2M-1-010), the Natural Science Foundation of Beijing (No. 7204249), and the Platform of Public Health & Disease Control and Prevention, Major Innovation & Planning Interdisciplinary Platform for the "Double-First Class" Initiative, Renmin University of China.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 12301323, 12261011, and 12131001, and the MOE Project of Key Research Institute of Humanities and Social Sciences under Grant No. 22JJD110001.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 11671168 and the Science and Technology Developing Plan of Jilin Province under Grant No. 20200201258JC.
Abstract: This paper discusses regression analysis of interval-censored failure time data arising from the accelerated failure time model in the presence of informative censoring. For this problem, a sieve maximum likelihood estimation approach is proposed in which a copula model is employed to describe the relationship between the failure time of interest and the censoring or observation process, and I-spline functions are used to approximate the unknown functions in the model. A simulation study is carried out to assess the finite-sample performance of the proposed approach and suggests that it works well in practical situations. An illustrative example is also provided.
Funding: Supported by the National Natural Science Foundation of China (Nos. 11071253, 11471335, and 11626130).
Abstract: Based on the Vector Aitken (VA) method, we propose an accelerated Expectation-Maximization (EM) algorithm, the VA-accelerated EM algorithm, which converges faster than the EM algorithm. The VA-accelerated EM algorithm does not use the information matrix; it uses only the sequence of estimates obtained from EM iterations, and thus retains the flexibility and simplicity of the EM algorithm. Considering the Steffensen iterative process, we also give the Steffensen form of the VA-accelerated EM algorithm, and it can be proved that the reformed process is quadratically convergent. Numerical analyses illustrate that the proposed methods are efficient and faster than the EM algorithm.
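Because the acceleration uses only the iterate sequence, it can be wrapped around any fixed-point map. The sketch below applies the standard componentwise vector Aitken delta-squared extrapolation to a generic update function standing in for one EM step; the exact variant used in the paper may differ.

```python
import numpy as np

def aitken_accelerate(fixed_point, x0, n_iter=20, eps=1e-12):
    """Componentwise vector Aitken delta-squared acceleration of a
    fixed-point iteration x_{k+1} = fixed_point(x_k), e.g. an EM update.
    Each accelerated step uses two plain iterates x1, x2 and extrapolates
    x'' = x - (x1 - x)^2 / ((x2 - x1) - (x1 - x)) in every coordinate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x1 = fixed_point(x)
        x2 = fixed_point(x1)
        d1, d2 = x1 - x, x2 - x1
        denom = d2 - d1
        # Fall back to the plain iterate where the denominator is tiny.
        safe = np.abs(denom) > eps
        x_new = np.where(safe, x - d1**2 / np.where(safe, denom, 1.0), x2)
        if np.max(np.abs(x_new - x)) < eps:
            return x_new
        x = x_new
    return x
```

On a linear contraction the extrapolation is exact after a single accelerated step, which is what makes it attractive for the nearly linear convergence of EM close to a fixed point.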
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 12001093 and 12071176, the National Key Research and Development Program of China under Grant No. 2020YFA0714102, and the Science and Technology Developing Plan of Jilin Province under Grant No. 20200201258JC.
Abstract: Misclassified current status data arise when each study subject can be observed only once and the observation status is determined by a diagnostic test with imperfect sensitivity and specificity. In this situation, a further issue that may occur is that the observation time is correlated with the failure time of interest, often referred to as informative censoring or informative observation times. It is well known that an analysis ignoring informative censoring can yield biased or even misleading results. In this paper, the authors consider such data and propose a frailty-based inference procedure. In particular, an EM algorithm based on Poisson latent variables is developed, and the asymptotic properties of the resulting estimators are established. Numerical results show that the proposed method works well in practice, and an application to a set of real data is provided.
Funding: This work was supported by the National Natural Science Foundation of China (Key Grant 10231030, Special Grant 10241001) and by Humboldt-Universität zu Berlin, Sonderforschungsbereich 373.
Abstract: In this paper, linear errors-in-response models are considered in the presence of validation data on the responses. A semiparametric dimension-reduction technique is employed to define an asymptotically normal estimator of β, together with estimated empirical log-likelihoods and adjusted empirical log-likelihoods for the vector of regression coefficients and for linear combinations of the regression coefficients, respectively. The estimated empirical log-likelihoods are shown to be asymptotically distributed as weighted sums of independent χ_1^2 variables, and the adjusted empirical log-likelihoods are proved to be asymptotically distributed as standard chi-squares.
Funding: Supported by a grant from the University Grants Council of Hong Kong, the National Natural Science Foundation of China (Grant No. 11471335), the Ministry of Education Project of Key Research Institute of Humanities and Social Sciences at Universities (Grant No. 16JJD910002), and the Fund for Building World-Class Universities (Disciplines) of Renmin University of China.
Abstract: This paper proposes a novel method for testing the equality of high-dimensional means using a multiple hypothesis test. The proposed method is based on the maximum of the standardized partial sums of logarithmic p-values. Numerical studies show that the method performs well for both normal and non-normal data and has good power under both dense and sparse alternative hypotheses. For illustration, a real data analysis is implemented.
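To make the idea concrete, here is a hedged sketch of such a statistic. Under the null, independent p-values are Uniform(0,1), so -log(p_i) is Exp(1) and the partial sum S_k has mean k and standard deviation sqrt(k); the scan below standardizes each prefix sum with these null moments. The exact standardization and ordering of the p-values used in the paper may differ.

```python
import numpy as np

def max_partial_sum_stat(pvals):
    """Illustrative 'maximum of standardized partial sums of logarithmic
    p-values' statistic: compute S_k = sum_{i<=k} -log(p_i), center by the
    null mean k and scale by the null standard deviation sqrt(k), then take
    the maximum over all prefixes k = 1..n."""
    s = np.cumsum(-np.log(np.asarray(pvals, dtype=float)))
    k = np.arange(1, len(pvals) + 1)
    return np.max((s - k) / np.sqrt(k))
```

A handful of very small p-values inflates the early prefix sums, which is one reason a statistic of this shape can have power against sparse alternatives as well as dense ones.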
Funding: Supported by the Beijing Natural Science Foundation (Grant No. Z200001), the National Natural Science Foundation of China (Grant Nos. 11871001, 11971478, and 11971001), and the Fundamental Research Funds for the Central Universities (Grant No. 2019NTSS18).
Abstract: Testing independence between random vectors X and Y is an essential task in statistical inference. One class of testing methods is based on the minimal spanning tree of the variables X and Y. The main idea is to generate the minimal spanning tree for one random vector, say X, and, for each edge in that tree, calculate the corresponding rank number based on the other random vector Y. The resulting test statistics are constructed from these rank numbers. However, the existing statistics are not symmetric in X and Y, so the power obtained from the minimal spanning tree of X need not match that obtained from the minimal spanning tree of Y; moreover, the conclusions drawn from the two trees may conflict. To solve these problems, we propose several symmetric independence tests for X and Y. The exact distributions of the test statistics are investigated when the sample size is small, and their asymptotic properties are also studied. A permutation method is introduced for obtaining the critical values of the statistics. Numerical analysis demonstrates that the proposed methods are more efficient than the existing ones.
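A symmetrized MST-based permutation test in this spirit can be sketched as follows. The per-edge quantity here is a raw distance rather than the paper's rank number, so this is an illustrative stand-in (function names and the exact statistic are assumptions), but it shows the two ingredients: symmetrization over the two trees and permutation calibration.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def sym_mst_stat(X, Y):
    """Symmetrized MST statistic: sum of Y-distances over the edges of the
    MST built on X, plus sum of X-distances over the edges of the MST built
    on Y. Small values indicate dependence (points close in X are also
    close in Y). Illustrative stand-in for the paper's rank-based version."""
    dX, dY = squareform(pdist(X)), squareform(pdist(Y))
    tX = minimum_spanning_tree(dX).tocoo()
    tY = minimum_spanning_tree(dY).tocoo()
    return dY[tX.row, tX.col].sum() + dX[tY.row, tY.col].sum()

def perm_pvalue(X, Y, n_perm=200, seed=0):
    """One-sided permutation p-value: permuting the rows of Y breaks any
    dependence on X while preserving both marginal distributions."""
    rng = np.random.default_rng(seed)
    obs = sym_mst_stat(X, Y)
    null = [sym_mst_stat(X, Y[rng.permutation(len(Y))]) for _ in range(n_perm)]
    return (1 + sum(t <= obs for t in null)) / (1 + n_perm)
```

Because the statistic treats X and Y identically, the test gives the same answer regardless of which vector the first tree is built on, which is precisely the symmetry the abstract calls for.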
Funding: Supported in part by the National Natural Science Foundation of China (No. 11271368), the Beijing Philosophy and Social Science Foundation (No. 12JGB051), the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20130004110007), the Key Program of the National Philosophy and Social Science Foundation (No. 13AZD064), the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (No. 10XNK025), and the China Statistical Research Project (No. 2011LZ031).
Abstract: Prior empirical studies find positive and negative momentum effects across global markets, but few focus on explaining the mixed results. To address this issue, we apply the quantile regression approach to analyze the momentum effect in the Chinese stock market. The evidence suggests that the momentum effect in Chinese stocks is not stable across firms with different levels of performance. We find that the negative momentum effect at short and medium horizons (3 and 9 months) increases with the quantile of stock returns, while a positive momentum effect is observed at the long horizon (12 months) and intensifies for high-performing stocks. According to our study, the momentum effect needs to be examined on the basis of stock returns: OLS estimation gives a single, potentially biased result and thus provides misleading intuitions about the momentum effect across markets. Based on the empirical results of the quantile regression, effective risk-control strategies can also be designed by adjusting the proportion of assets according to past performance.
Abstract: In many fields we need to deal with hierarchically structured data. For this kind of data, a hierarchical mixed effects model can capture the correlation of variables at the same level by building a model for the regression coefficients. Due to the complexity of the random part of this model, finding an effective method to estimate the covariance matrix is an appealing issue. The iterative generalized least squares (IGLS) estimation method was proposed by Goldstein in 1986 and applied to special cases of hierarchical models. In this paper, we extend the method to the general hierarchical mixed effects model, derive its expressions in detail, and apply it to economic examples.
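For intuition, here is a minimal IGLS sketch for the simplest hierarchical case, a random-intercept model (the names and the unweighted variance update are simplifications of Goldstein's GLS-weighted step, not the paper's general derivation): it alternates GLS for the fixed effects with a least-squares regression of the residual cross-products on the variance-component "design" matrices.

```python
import numpy as np

def igls_random_intercept(y, X, groups, n_iter=20):
    """IGLS for the random-intercept model y = X beta + u_g + e, with
    u_g ~ N(0, s2_u) and e ~ N(0, s2_e). Alternates:
      1) GLS for beta given V = s2_u * Z Z' + s2_e * I, and
      2) least-squares regression of vec(r r') on [vec(Z Z'), vec(I)]
         to update the two variance components.
    A sketch for one grouping level; full IGLS weights step 2 by V^-1."""
    y, X, groups = np.asarray(y), np.asarray(X), np.asarray(groups)
    n = len(y)
    Z = (groups[:, None] == np.unique(groups)[None, :]).astype(float)
    G1, G0 = Z @ Z.T, np.eye(n)          # covariance "design" matrices
    s2_u, s2_e = 1.0, 1.0
    for _ in range(n_iter):
        V = s2_u * G1 + s2_e * G0
        Vinv = np.linalg.inv(V)
        beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
        r = y - X @ beta
        D = np.column_stack([G1.ravel(), G0.ravel()])
        s2_u, s2_e = np.linalg.lstsq(D, np.outer(r, r).ravel(), rcond=None)[0]
        s2_u, s2_e = max(s2_u, 1e-8), max(s2_e, 1e-8)  # keep non-negative
    return beta, s2_u, s2_e
```

With balanced groups the loop typically stabilizes within a few iterations; the key simplification here is omitting the V^-1-based weighting of the second regression, which Goldstein's full procedure uses to gain efficiency.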
Funding: Supported by the USDA NIFA Hatch project (accession no. 1025032), the USDA NIFA Specialty Crop Research Initiative (award no. 2020-51181-32197), the McIntire-Stennis award (accession no. 1027551) from the United States Department of Agriculture Institute of Food and Agriculture, the Cornell Institute of Digital Agriculture Research Innovation Fund, the Beijing Municipal Natural Science Foundation (grant no. 1232019), the National Natural Science Foundation of China (grant no. 12101606), and the Renmin University of China Research Fund Program for Young Scholars.
Abstract: The architecture of apple trees plays a pivotal role in shaping their growth and fruit-bearing potential, forming the foundation for precision apple management. Traditionally, 2D imaging technologies were employed to delineate the architectural traits of apple trees, but their accuracy was hampered by occlusion and perspective ambiguities. This study aimed to surmount these constraints by devising a 3D geometry-based processing pipeline for apple tree structure segmentation and architectural trait characterization, utilizing point clouds collected by a terrestrial laser scanner (TLS). The pipeline consists of four modules: (a) a data preprocessing module, (b) a tree instance segmentation module, (c) a tree structure segmentation module, and (d) an architectural trait extraction module. The developed pipeline was used to analyze 84 trees of two representative apple cultivars, characterizing architectural traits such as tree height, trunk diameter, branch count, branch diameter, and branch angle. Experimental results indicated that the pipeline attained an R^2 of 0.92 and 0.83 and a mean absolute error (MAE) of 6.1 cm and 4.71 mm for tree height and trunk diameter at the tree level, respectively. At the branch level, it achieved an R^2 of 0.77 and 0.69 and an MAE of 6.86 mm and 7.48° for branch diameter and branch angle, respectively. Accurate measurement of these architectural traits can enable precision management in high-density apple orchards and bolster phenotyping efforts in breeding programs. Moreover, general bottlenecks of 3D tree characterization are comprehensively analyzed to point toward future developments.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 12131001 and 12226343, the MOE Project of Key Research Institute of Humanities and Social Sciences under Grant No. 22JJD110001, and the National Ten Thousand Talents Program of China.
Abstract: Strong orthogonal arrays (SOAs) were recently introduced and studied as a class of space-filling designs for computer experiments. To guarantee better space-filling properties, SOAs of strength three or higher are desirable. Orthogonality is also an important property for designs of computer experiments, because it guarantees that the estimates of the main effects are uncorrelated. This paper first provides a systematic study of the construction of (nearly) orthogonal strength-three SOAs with better space-filling properties. The newly proposed strength-three SOAs enjoy almost the same space-filling properties as strength-four SOAs and can accommodate many more columns than the latter. Moreover, they are (nearly) orthogonal and flexible in run sizes. The construction methods are straightforward to implement, and their theoretical supports are well established. In addition to the theoretical results, many designs are tabulated for practical needs.
Funding: Supported by the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (Grant No. 22XNL016).
Abstract: The first passage time has many applications in fields such as finance, econometrics, statistics, and biology. However, explicit formulas for the first passage density have been obtained in only a few cases. This paper derives an explicit formula for the first passage density of Brownian motion with two-sided piecewise continuous boundaries, which may have points of discontinuity. Approximations are used to obtain a simplified formula for estimating the first passage density, and the results are also generalized to the case of two-sided general nonlinear boundaries. Simulations can easily be carried out with the Monte Carlo method, and it is demonstrated for several typical two-sided boundaries that the proposed approximation offers a highly accurate estimate of the first passage density.
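Such Monte Carlo benchmarks are straightforward to set up. The sketch below simulates first passage times of standard Brownian motion through the constant boundaries ±a, the simplest instance of the piecewise-continuous case (parameter names are illustrative); for this case the exact mean first passage time is a², which gives a quick sanity check.

```python
import numpy as np

def first_passage_times(a=1.0, n_paths=2000, dt=1e-3, t_max=20.0, seed=0):
    """Monte Carlo first passage times of standard Brownian motion through
    the constant two-sided boundary +/- a, via Euler time stepping. Each
    path advances by sqrt(dt) * N(0,1) increments until it leaves (-a, a)."""
    rng = np.random.default_rng(seed)
    n_steps = int(t_max / dt)
    x = np.zeros(n_paths)
    tau = np.full(n_paths, np.inf)
    alive = np.ones(n_paths, dtype=bool)
    for k in range(1, n_steps + 1):
        x[alive] += np.sqrt(dt) * rng.standard_normal(alive.sum())
        crossed = alive & (np.abs(x) >= a)
        tau[crossed] = k * dt
        alive &= ~crossed
        if not alive.any():
            break
    return tau
```

Euler stepping misses crossings between grid points and thus slightly overestimates passage times; shrinking dt reduces this bias. For a = 1 and dt = 1e-3 the sample mean lands close to the exact value E[tau] = a² = 1.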
Funding: Supported by the National Key R&D Program of China (Grant No. 2022YFA1003803), the National Social Science Foundation of China (Grant No. 21BTJ048), and the National Natural Science Foundation of China (Grant Nos. 12371276 and 12131006); and by the National Key R&D Program of China (Grant No. 2023YFA1008700), the National Social Science Foundation of China (Grant No. 24BTJ066), and the National Natural Science Foundation of China (Grant No. 12171033).
Abstract: Principal component analysis (PCA) is ubiquitous in statistics and machine learning. It is frequently used as an intermediate procedure in various regression and classification problems to reduce the dimensionality of datasets. However, as datasets become extremely large, direct application of PCA may not be feasible, since loading and storing massive datasets may exceed the computational capacity of common machines. To address this problem, subsampling is usually performed, in which a small proportion of the data is used as a surrogate for the entire dataset. This paper proposes an A-optimal subsampling algorithm to decrease the computational cost of PCA for super-large datasets. More specifically, we establish the consistency and asymptotic normality of the eigenvectors of the subsampled covariance matrix and then derive the optimal subsampling probabilities for PCA based on the A-optimality criterion. We validate the theoretical results through extensive simulation studies. Moreover, the proposed subsampling algorithm for PCA is embedded into a classification procedure for handwriting data to assess its effectiveness in real-world applications.
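The overall workflow can be sketched as follows. Note that the sampling probabilities here are simply proportional to squared row norms, an illustrative importance-sampling choice standing in for the paper's A-optimal probabilities; the rest (inverse-probability weighting, eigendecomposition of the weighted covariance) follows the generic subsampled-PCA recipe.

```python
import numpy as np

def subsampled_pca(X, r, seed=0):
    """PCA on a weighted subsample of r rows. Rows are drawn with
    probabilities proportional to their squared (centered) norms, and the
    covariance estimate is inverse-probability weighted so that it is
    unbiased for the full-data covariance matrix."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                 # center with the full-data mean
    p = (Xc**2).sum(axis=1)
    p = p / p.sum()                         # subsampling probabilities
    idx = rng.choice(n, size=r, replace=True, p=p)
    w = 1.0 / (n * r * p[idx])              # inverse-probability weights
    S = (Xc[idx] * w[:, None]).T @ Xc[idx]  # weighted covariance estimate
    vals, vecs = np.linalg.eigh(S)
    return vecs[:, ::-1], vals[::-1]        # eigenpairs in descending order
```

Only the r sampled rows enter the covariance computation, so for r much smaller than n the eigendecomposition cost no longer scales with the full dataset size.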