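Based on a dynamic spike-and-slab prior combined with the likelihood function for censored time series, a Bayesian dynamic variable selection regression model suitable for censored time series data is constructed. To handle the computational problem, the EM algorithm is used for fitting, so that parameter estimates and variable selection results can be obtained quickly. Simulation studies verify the effectiveness of the method, which is then applied to the analysis of real phosphorus concentration data.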
Funding: Supported by the National Natural Science Foundation of China (70171008).
Abstract: A class of estimators of the mean survival time with interval-censored data is studied using the unbiased transformation method. The estimators are constructed from the observations to ensure unbiasedness, in the sense that every estimator in the class has the same expectation as the mean survival time. The estimators have good properties such as strong consistency, with rate O(n^{-1/2} (log log n)^{1/2}), and asymptotic normality. An application to linear regression is considered and simulation results are reported.
Abstract: Based on left-truncated and right-censored dependent data, estimators of higher derivatives of the density function and the hazard rate function are constructed by kernel smoothing. When the observed data exhibit α-mixing dependence, local properties including strong consistency and the law of the iterated logarithm are presented. Moreover, when the mode estimator is defined as the random variable that maximizes the kernel density estimator, its asymptotic normality is established.
Abstract: This paper develops an empirical likelihood approach for estimating the coefficients in a linear model with interval-censored responses. By constructing an unbiased transformation of the interval-censored data, an empirical log-likelihood function with an asymptotic χ² distribution is derived, and confidence regions for the coefficients are constructed. Simulation results indicate that the method performs better than the normal approximation method in terms of coverage accuracy.
Abstract: This paper considers local linear regression estimators for the partially linear model with censored data. The estimators have some nice large-sample properties and are easy to implement. Through many simulation runs, the author also found that the estimators perform remarkably well even in the small-sample case.
Abstract: Consider the partial linear model Y = Xβ + g(T) + e, where Y is at risk of being censored from the right, g is an unknown smooth function on [0,1], β is a 1-dimensional parameter to be estimated, and e is an unobserved error. In Refs. [1,2], it was proved that the estimator for the asymptotic variance of βn(βn) is consistent. In this paper, we establish the limit distribution and the law of the iterated logarithm for En, and obtain the convergence rates for En and the strong uniform convergence rates for gn(gn).
Funding: Supported by the National Natural Science Foundation of China (11071022) and the Key Project of Hubei Provincial Department of Education (D20092207).
Abstract: Consider a semiparametric regression model Y_i = X_iβ + g(t_i) + e_i, 1 ≤ i ≤ n, where Y_i is censored on the right by another random variable C_i with known or unknown distribution G. Wavelet estimators of the parametric and nonparametric components are obtained by wavelet smoothing and the synthetic data method. Under general conditions, the asymptotic normality of the wavelet estimators and the convergence rates of the wavelet estimators of the nonparametric component are investigated. A numerical example is given.
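The abstract above does not say which synthetic-data transformation is used. As a minimal sketch, assuming the censoring survival function P(C > t) is known and continuous, one classical choice replaces each observation Z = min(Y, C) with δZ / P(C > Z), which has the same expectation as Y. The function names, the exponential distributions, and the rates below are illustrative assumptions, not taken from the paper.

```python
import math
import random

def synthetic_response(z, delta, censor_survival):
    """Return delta * z / P(C > z): zero if censored, inflated if observed."""
    return z / censor_survival(z) if delta else 0.0

def simulate(n=100_000, censor_rate=0.2, seed=0):
    """Compare the naive mean of Z with the mean of the synthetic responses."""
    rng = random.Random(seed)
    # Assumed known censoring survival function P(C > t) for C ~ Exp(censor_rate).
    g_bar = lambda t: math.exp(-censor_rate * t)
    naive_sum = synthetic_sum = 0.0
    for _ in range(n):
        y = rng.expovariate(1.0)          # true response, mean 1
        c = rng.expovariate(censor_rate)  # independent censoring time
        z, delta = min(y, c), int(y <= c)
        naive_sum += z
        synthetic_sum += synthetic_response(z, delta, g_bar)
    return naive_sum / n, synthetic_sum / n

naive_mean, synthetic_mean = simulate()
```

In this toy setting the naive mean of the censored observations is biased downward, while the mean of the synthetic responses stays close to the true mean of 1. When G is unknown, it would have to be replaced by an estimate such as a Kaplan-Meier estimate of the censoring distribution.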
Abstract: The Exponentiated Generalized Weibull distribution is a probability distribution that generalizes the Weibull distribution by introducing two additional shape parameters to better accommodate non-monotonic shapes. The parameters of the new distribution are estimated by maximum likelihood under progressive Type-II censored data via the expectation-maximization algorithm.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11101119, 11126332), the National Social Science Foundation of China (Grant No. 11CTJ004), the Natural Science Foundation of Guangxi Province (Grant No. 2010GXNSFB013051), and the Philosophy and Social Sciences Foundation of Guangxi Province (Grant No. 11FTJ002).
Abstract: In this paper, we consider variable selection for the parametric components of varying-coefficient partially linear models with censored data. By constructing a penalized auxiliary vector, we propose an empirical-likelihood-based variable selection procedure and show that it is consistent and satisfies the sparsity property. Simulation studies show that the proposed variable selection method is workable.
Abstract: The Type-I censoring mechanism arises when the number of units experiencing the event is random but the total duration of the study is fixed. A number of mathematical approaches have been developed to handle this type of data. The purpose of this research was to estimate the three parameters of the Frechet distribution using the frequentist maximum likelihood estimator and Bayesian estimators. The maximum likelihood estimators (MLE) of the three parameters are not available in closed form; therefore, they were computed numerically. Similarly, the Bayesian estimators are implemented using Jeffreys and gamma priors with two loss functions: the squared error loss function and the Linear Exponential (LINEX) loss function. The Bayesian estimators of the Frechet parameters cannot be obtained analytically, so Markov chain Monte Carlo is used, with samples from the full conditional distributions of the three parameters obtained via the Metropolis-Hastings algorithm. The estimators are compared using mean squared error (MSE) to determine the best estimator of the three parameters. The results show that Bayesian estimation under the LINEX loss function based on Type-I censored data gives better estimates of all parameters when the value of the loss parameter is positive.
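The Type-I censored likelihood that such a maximum likelihood fit maximizes can be written in generic form. The notation below is an assumption introduced for illustration (the abstract does not fix a parameterization of the Frechet distribution): f and F are the lifetime density and distribution function, τ is the fixed study duration, and δ_i indicates whether unit i failed before τ.

```latex
% Assumed notation: X_1,...,X_n i.i.d. lifetimes with density f(.;\theta) and CDF F(.;\theta);
% \tau is the fixed study duration; t_i = \min(X_i,\tau); \delta_i = 1\{X_i \le \tau\}.
\[
L(\theta) \;=\; \prod_{i=1}^{n} f(t_i;\theta)^{\delta_i}\,\bigl[1 - F(\tau;\theta)\bigr]^{1-\delta_i},
\qquad
\ell(\theta) \;=\; \sum_{i=1}^{n} \delta_i \log f(t_i;\theta) \;+\; (n-m)\,\log\bigl[1 - F(\tau;\theta)\bigr],
\]
% where m = \sum_i \delta_i is the (random) number of observed failures.
```

Each unit still on test at time τ contributes only the survival probability 1 − F(τ; θ), which is why the score equations generally have no closed-form solution and are solved numerically.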
Abstract: Composite quantile regression can provide an efficiency gain in estimation over a single quantile regression. In this paper, we extend composite quantile regression to a nonparametric model with randomly censored data. The asymptotic normality of the proposed estimator is established. The proposed method is applied to lung cancer data. Extensive simulations are reported, showing that the proposed method works well in practical settings.
Abstract: In this paper, based on randomly left-truncated and right-censored data, the authors derive strong representations of the cumulative hazard function estimator and the product-limit estimator of the survival function, which are valid up to a given order statistic of the observations. A precise bound for the errors is obtained that depends only on the index of the last order statistic to be included.
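For readers unfamiliar with the product-limit estimator under left truncation and right censoring, the following is a minimal sketch under stated assumptions: no tied event times, and a risk set at time t made up of the subjects whose truncation time is at most t and whose observed time is at least t. The function and argument names are hypothetical, and the sketch does not reproduce the strong representations derived in the paper.

```python
def product_limit(trunc, obs, delta):
    """Product-limit survival estimate for left-truncated, right-censored data.

    trunc[i]: truncation time; obs[i]: observed time (>= trunc[i]);
    delta[i]: 1 if obs[i] is an event, 0 if it is censored.
    Returns a list of (event_time, estimated S(event_time)) pairs.
    Assumes no tied event times.
    """
    events = sorted(z for z, d in zip(obs, delta) if d == 1)
    surv, curve = 1.0, []
    for t in events:
        # Risk set: subjects already under observation and not yet failed or censored.
        at_risk = sum(1 for l, z in zip(trunc, obs) if l <= t <= z)
        surv *= 1.0 - 1.0 / at_risk
        curve.append((t, surv))
    return curve

curve = product_limit(
    trunc=[0.0, 0.0, 2.5, 0.0],
    obs=[1.0, 2.0, 3.0, 4.0],
    delta=[1, 0, 1, 1],
)
```

In this toy data set the third subject enters observation only at time 2.5, so it is excluded from the risk set at the first event time; the estimate drops to 2/3, then 1/3, then 0.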
Funding: This work was supported and funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) (Grant Number IMSIU-RG23142).
Abstract: This article introduces a novel variant of the generalized linear exponential (GLE) distribution, known as the sine generalized linear exponential (SGLE) distribution, which is obtained by applying the sine transformation. The resulting distribution is very flexible and can be used efficiently to model survival data and reliability problems. The proposed model has a hazard rate function (HRF) that may be increasing, J-shaped, or bathtub-shaped, depending on its parameters. The model includes many well-known lifetime distributions as sub-models, and a range of its statistical properties is derived. The model parameters are estimated by maximum likelihood and Bayesian methods using progressively censored data. To evaluate the effectiveness of these techniques, a simulation study is conducted. The relevance of the new model is shown via applications to two real-world datasets, highlighting its superiority over other comparable established models.
Abstract: A kernel density estimator is proposed for multivariate data subject to censoring. The asymptotic normality, strong convergence, and asymptotically optimal bandwidth that minimizes the mean squared error of the estimator are studied.
Abstract: In studies of HIV, interval-censored data occur naturally. HIV infection time is not usually known exactly; it is known only that infection occurred before the survey, within some time interval, or had not occurred at the time of the survey. Infections are often clustered within geographical areas such as enumerator areas (EAs) and thus induce unobserved frailty. In this paper we consider an approach for estimating parameters when infection time is unknown and assumed correlated within an EA, where dependency is modeled through frailties, assuming a normal distribution for the frailties and a Weibull distribution for the baseline hazard. The data came from a household-based population survey that used a multi-stage stratified sample design to randomly select 23,275 interviewed individuals from 10,584 households, of whom 15,851 were further tested for HIV (crude prevalence = 9.1%). A further test conducted among those who tested HIV positive found 181 (12.5%) recently infected. Results show a high degree of heterogeneity in HIV distribution between EAs, translating to a modest correlation of 0.198. Intervention strategies should target geographical areas that contribute disproportionately to the HIV epidemic. Further research needs to identify such hot-spot areas and understand what factors make these areas prone to HIV.
Funding: Supported by the National Natural Science Foundation of China under Grant (62272231), the National Key R&D Program of China (2021YFA1001100), the Natural Science Foundation of Jiangsu Province of China under Grant (BK20210340), and the Fundamental Research Funds for the Central Universities (4009002401).
Abstract: Survival analysis is a critical tool for cancer research, yet handling censored data remains challenging due to supervision bias and inaccurate hazard estimates. To address these issues, we propose a simple but effective method termed KD, which employs knowledge distillation using uncensored data to rectify the supervision bias in censored data. This approach leverages the combined power of both rectified censored data and uncensored data to improve survival prediction accuracy. Remarkably, our KD method not only effectively harnesses censored data but also better reflects clinical reality, demonstrating its immense value in survival analysis. We applied our KD method to 19 target cancer sites using The Cancer Genome Atlas (TCGA) dataset. Our results consistently outperform traditional machine learning and deep learning-based methods across both target cancer sites and independent cancer cohorts. More importantly, our data-driven approach enables the model to extract hidden information from censored data, leading to conclusions that align more closely with clinical knowledge and scenarios. This validation of our KD method's effectiveness highlights the substantial value of rational censored data usage, providing valuable insights for cancer research and clinical decisions. All data and codes are freely available at: https://datatellstruth.github.io/.
Funding: Financially supported by the Technological Agency of the Czech Republic (TAČR), Joint Grant No. SS 02030031 ARAMIS, and by the long-term strategic development financing of the Institute of Computer Science of the Czech Academy of Sciences (RVO 67985807).
Abstract: Rime ice is an effective accumulator of winter ambient air pollution. Because its ion content is higher than that of snow, it is a non-negligible contributor to atmospheric deposition fluxes with potential environmental consequences, particularly in mountain regions. Here we explore spatio-temporal patterns of rime formation as a proxy for the propensity of individual sites to form rime ice. We present recent time trends in rime ice occurrence and thickness measured by 23 professional meteorological stations in the Czech Republic in 2002–2023. In an exploratory data analysis, we found high year-to-year variability in rime occurrence and thickness at all sites. In terms of the annual mean number of hours with rime detected, the stations situated at the highest altitudes differ significantly from (are higher than) the rest of the sites. By far the highest rime hour and thickness records were observed at the LYSA station in the Beskydy (Beskid) Mts, which is situated on an exposed mountaintop highly elevated above the surrounding terrain. For advanced statistical modelling of rime thickness, we used two generalised additive models that account for long-term trends (potentially nonlinear) and for seasonal and daily variability. In an expanded model we further considered the effect of the North Atlantic Oscillation (NAO) index. All the parameters included in the models proved to be statistically significant, although the strength of their effects differed. Factors affecting rime formation (meteorology and terrain) are strongly site-specific, and identifying the significance of individual influencing factors remains a challenging task for our future research. Here, we explore a rare long-term rime record with detailed temporal resolution from multiple uniformly measured sites, which significantly enhances our understanding of rime formation. Additionally, the rime record is from a temperate zone, where rime forms only during a small part of the year.
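Abstract: The mode, as the maximizer of the density function, effectively characterizes the central tendency of data and is fairly robust to outliers. In practice, however, observations are often right-censored because of loss to follow-up, withdrawal from the experiment, or termination of the study, and the data are often dependent. For widely orthant dependent (WOD) sequences, a broad class of dependence structures that includes independent, negatively dependent, and some positively dependent sequences, a kernel density estimator is constructed under right censoring using inverse probability weighting (IPW), and a nonparametric kernel estimator of the mode is proposed on that basis. Under suitable conditions such as compactness and Lipschitz continuity, the uniform strong consistency of the density estimator is proved, and the strong consistency of the mode estimator together with its convergence rate is then obtained. Numerical simulations and an empirical analysis show that the method has good estimation performance and robustness in finite samples, supporting its asymptotic theory and practical value.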
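The construction described above can be sketched as follows. This is a minimal illustration, not the authors' estimator: it assumes no tied observation times, a Gaussian kernel, a Kaplan-Meier estimate of the censoring survival function evaluated just before each observation, and a grid search for the maximizer. All function names and the bandwidth are hypothetical.

```python
import math

def ipw_weights(z, delta):
    """delta_i / G_hat(Z_i-), with G_hat the Kaplan-Meier estimate of P(C > t).

    Assumes no ties among the observed times z.
    """
    n = len(z)
    weights, surv = [0.0] * n, 1.0
    for rank, i in enumerate(sorted(range(n), key=lambda j: z[j])):
        if delta[i] == 1:
            weights[i] = 1.0 / surv          # surv is the left limit G_hat(Z_i-)
        else:
            surv *= 1.0 - 1.0 / (n - rank)   # a censored time is an "event" for C
    return weights

def ipw_density(x, z, delta, h):
    """IPW kernel density estimate at x with a Gaussian kernel and bandwidth h."""
    w = ipw_weights(z, delta)
    kern = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return sum(wi * kern((x - zi) / h) for wi, zi in zip(w, z)) / (len(z) * h)

def mode_estimate(z, delta, h, grid):
    """Grid point that maximizes the IPW kernel density estimate."""
    return max(grid, key=lambda x: ipw_density(x, z, delta, h))

# Censored toy sample: the mass of the censored point is passed to the later event.
weights = ipw_weights([1.0, 2.0, 3.0], [1, 0, 1])

# Uncensored toy sample: all weights are 1, so this is an ordinary kernel mode estimate.
grid = [i / 100 for i in range(0, 601)]
mode = mode_estimate([1.0, 2.0, 2.1, 2.2, 5.0], [1, 1, 1, 1, 1], h=0.5, grid=grid)
```

With no censoring every weight equals 1 and the estimator reduces to the usual kernel density estimator. With censoring, uncensored observations that follow censored ones receive weights above 1, which compensates for the observations lost to censoring.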