Funding: Supported by the National Natural Science Foundation of China (No. 11971433), the First Class Discipline of Zhejiang-A (Zhejiang Gongshang University-Statistics), and the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development.
Abstract: In this paper, we consider testing the hypothesis concerning the means of two independent semicontinuous distributions whose observations are zero-inflated, characterized by a sizable number of zeros together with positive observations from a continuous distribution. The continuous parts of the two semicontinuous distributions are assumed to follow a density ratio model. A new two-part test is developed for this kind of data. The proposed test statistic is the sum of one test statistic for equality of the proportions of zero values and one conditional test statistic for the continuous distribution. The test statistic is proved to follow a χ² distribution with two degrees of freedom. Simulation studies show that the proposed test controls the type I error rate at the desired level and is competitive with, and most of the time more powerful than, two popular tests. A real data example from a dietary intervention study is used to illustrate the usefulness of the proposed test.
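To make the "sum of two tests" construction concrete, the following is a minimal sketch of a generic two-part test, not the density-ratio-based test proposed in the paper. It assumes a pooled two-proportion z-test for the zero proportions and, as a swapped-in stand-in for the paper's conditional test, a large-sample z-test on the means of the positive observations. The function name `two_part_test` is hypothetical. The sum of the two squared statistics is referred to a χ² distribution with two degrees of freedom, whose upper tail probability is exp(-x/2).

```python
import math


def two_part_test(x, y):
    """Generic two-part test for two zero-inflated samples (illustrative only).

    Assumes each sample contains at least two positive observations.
    Returns (statistic, p_value) using a chi-squared reference with 2 df.
    """
    n1, n2 = len(x), len(y)
    pos_x = [v for v in x if v > 0]
    pos_y = [v for v in y if v > 0]

    # Part 1: pooled two-proportion z-test on the proportion of zeros.
    p1 = 1 - len(pos_x) / n1
    p2 = 1 - len(pos_y) / n2
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_zero = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z_zero = (p1 - p2) / se_zero if se_zero > 0 else 0.0

    # Part 2 (assumption): large-sample z-test on the positive values,
    # standing in for the paper's conditional test under a density ratio model.
    def mean_and_var(values):
        m = sum(values) / len(values)
        s2 = sum((v - m) ** 2 for v in values) / (len(values) - 1)
        return m, s2

    m1, v1 = mean_and_var(pos_x)
    m2, v2 = mean_and_var(pos_y)
    se_pos = math.sqrt(v1 / len(pos_x) + v2 / len(pos_y))
    z_pos = (m1 - m2) / se_pos if se_pos > 0 else 0.0

    # Sum of squared parts; chi-squared(2) survival function is exp(-x / 2).
    stat = z_zero ** 2 + z_pos ** 2
    return stat, math.exp(-stat / 2)
```

Calling `two_part_test(x, y)` on two lists of non-negative observations returns the combined statistic and its p-value. Identical samples give a statistic of 0 and a p-value of 1.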
Funding: Supported by the National Natural Science Foundation of China (Grant No. 71873128).
Abstract: Achieving a higher true positive rate while decreasing the false positive rate is always a great challenge for the imbalanced learning community. This work combines the penalized empirical likelihood method, the lower bound algorithm, and the Nyström method, and applies these techniques along with the kernel method to the density ratio model. The resulting classifier, the density ratio classifier (DRC), is a combination of kernelization, regularization, efficient implementation, and threshold moving, all of which are critical to making DRC an effective and powerful method for solving difficult imbalance problems. Compared with other methods, DRC is competitive in that it is widely applicable and is simple to use without additional imbalance-handling skills. In addition, the convergence rate of the estimate of the log density ratio is discussed. The results of numerical analysis also show that DRC outperforms other methods in AUC and G-mean score.
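As an illustration of threshold moving and the G-mean score mentioned in the abstract, here is a minimal sketch. It is not the DRC implementation. It assumes that some classifier has already produced a real-valued score per observation (for DRC this might be an estimated log density ratio) and that labels are coded 1 for the minority class and 0 otherwise. The function names `g_mean` and `best_threshold` are hypothetical.

```python
import math


def g_mean(labels, scores, threshold):
    """Geometric mean of true positive rate and true negative rate.

    An observation is predicted positive when its score >= threshold.
    Assumes both classes are present in `labels`.
    """
    pairs = list(zip(labels, scores))
    tp = sum(1 for l, s in pairs if l == 1 and s >= threshold)
    fn = sum(1 for l, s in pairs if l == 1 and s < threshold)
    tn = sum(1 for l, s in pairs if l == 0 and s < threshold)
    fp = sum(1 for l, s in pairs if l == 0 and s >= threshold)
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return math.sqrt(tpr * tnr)


def best_threshold(labels, scores):
    """Threshold moving: pick the observed score that maximizes G-mean."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: g_mean(labels, scores, t))


# Toy imbalanced data: six majority-class and two minority-class observations.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.45, 0.7]
chosen = best_threshold(labels, scores)
```

On this toy data the default cut-off of 0.5 misses one of the two minority observations, while moving the threshold down to the selected value recovers both at the cost of one false positive. In practice the threshold would be tuned on held-out data rather than on the data used for evaluation.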
Funding: Supported by the 11.5 Natural Scientific Plan (Grant No. 2006BAD09A04) and the Nanjing University Start Fund (Grant No. 020822410110).
Abstract: We propose a semiparametric Wald statistic to test the validity of logistic regression models based on case-control data. The test statistic is constructed using a semiparametric ROC curve estimator and a nonparametric ROC curve estimator. The statistic has an asymptotic chi-squared distribution and is an alternative to the Kolmogorov-Smirnov-type statistic proposed by Qin and Zhang in 1997, the chi-squared-type statistic proposed by Zhang in 1999, and the information matrix test statistic proposed by Zhang in 2001. The statistic is easy to compute in the sense that it requires none of the following: using a bootstrap method to find its critical values, partitioning the sample data, or inverting a high-dimensional matrix. We present simulation results and an analysis of two real examples. Moreover, we discuss how to extend our statistic to a family of statistics and how to construct its Kolmogorov-Smirnov counterpart.