Abstract: In this study, we examine sliced inverse regression (SIR), a widely used method for sufficient dimension reduction (SDR). SIR was designed to find reduced-dimensional versions of multivariate predictors by replacing them with a minimally adequate collection of their linear combinations without loss of information. Recently, regularization methods have been proposed for SIR to incorporate a sparse structure of predictors for better interpretability. However, existing methods use convex relaxation to bypass the sparsity constraint, which may not lead to the best subset and, in particular, tends to include irrelevant variables when predictors are correlated. In this study, we approach sparse SIR as a nonconvex optimization problem and tackle the sparsity constraint directly by establishing the optimality conditions and solving them iteratively by means of the splicing technique. Without employing convex relaxation on the sparsity constraint and the orthogonal constraint, our algorithm exhibits superior empirical merits, as evidenced by extensive numerical studies. Computationally, our algorithm is much faster than the relaxed approach for the natural sparse SIR estimator. Statistically, our algorithm surpasses existing methods in terms of accuracy for central subspace estimation and best subset selection and sustains high performance even with correlated predictors.
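For orientation, the classical (non-sparse) SIR estimator that this line of work builds on can be sketched as follows. This is a minimal illustration, not the sparse splicing algorithm described in the abstract: the function name `sir_directions`, its parameters, and the equal-count slicing scheme are assumptions made for the example, and the predictor covariance is assumed to be positive definite.

```python
import numpy as np


def sir_directions(X, y, n_slices=10, n_directions=1):
    """Classical sliced inverse regression (illustrative sketch).

    X: (n, p) predictor matrix, y: (n,) scalar response.
    Returns a (p, n_directions) matrix whose columns are unit-norm
    estimates of directions spanning the central subspace.
    """
    n, p = X.shape
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)

    # Whiten the predictors: Z = (X - mu) Sigma^{-1/2}.
    # Assumes Sigma is positive definite (hypothetical simplification).
    evals, evecs = np.linalg.eigh(sigma)
    sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ sigma_inv_sqrt

    # Slice the sample into groups of roughly equal size by sorted response.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # Weighted covariance of the slice means of Z (the SIR kernel matrix).
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Leading eigenvectors of M, mapped back to the original predictor scale.
    _, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :n_directions]
    B = sigma_inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)
```

Calling `sir_directions(X, y)` on data generated from a single-index model with a monotone link should return a vector roughly parallel to the true index direction. A sparse variant would additionally restrict the number of nonzero rows of the returned matrix, which this sketch does not attempt.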
Funding: Supported by the National Social Science Foundation of China under Grant No. 20BTJ041.
Abstract: For functional data, the most popular dimension reduction methods are functional sliced inverse regression (FSIR) and functional sliced average variance estimation (FSAVE). Both FSIR and FSAVE are based on the slice approach to estimating the conditional expectation E[x(t)|y]. While slice-based methods are effective for scalar responses, they often perform poorly or even fail for multivariate responses and small sample sizes because of the so-called "curse of dimensionality". To avoid this problem, this study proposes a projective resampling method that first projects the multivariate response onto a scalar response and then applies an SDR method for univariate responses to estimate the effective dimension reduction space (e.d.r. space). The proposed projective resampling method is insensitive to the number of slices and to the dimensionality of the response variable. In theory, the proposed resampling method can fully recover the effective dimension reduction space. Furthermore, this study investigates the performance of the proposed method through simulation studies and one real data analysis and compares it with other methods.
Funding: Supported by the National Science Foundation of China under Grant No. 71101030 and the Program for Innovative Research Team in UIBE under Grant No. CXTD4-01.
Abstract: This paper concerns dimension reduction in regression for large data sets. The authors introduce a new method based on the sliced inverse regression approach, called cluster-based regularized sliced inverse regression. The proposed method not only keeps the merit of considering both response and predictor information, but also enhances the capability of handling highly correlated variables. It is justified under certain linearity conditions. An empirical application to a macroeconomic data set shows that the proposed method outperforms the dynamic factor model and other shrinkage methods.
Funding: This work was supported by the special fund (2006) for selecting and training young teachers of universities in Shanghai (Grant No. 79001320), an FRG grant (FRG/06-07/I-06) from Hong Kong Baptist University, China, and a grant (HKU 7058/05P) from the Research Grants Council of Hong Kong, China.
Abstract: Dimension reduction is helpful and often necessary in exploring nonparametric regression structure. In this area, sliced inverse regression (SIR) is a promising tool for estimating the central dimension reduction (CDR) space. To estimate the kernel matrix of SIR, we suggest a spline approximation using least squares regression. Heteroscedasticity can be incorporated well by introducing an appropriate weight function. Root-n asymptotic normality can be achieved for a wide range of knot choices. This is essentially analogous to kernel estimation. Moreover, we propose a modified Bayes information criterion (BIC) based on the eigenvalues of the SIR matrix. This modified BIC can be applied to any form of SIR and to other related methods. The methodology and some practical issues are illustrated through the horse mussel data. Empirical studies demonstrate the performance of the proposed spline approximation in comparison with existing estimators.
Funding: This project is supported by the National Natural Science Foundation of China.
Abstract: In order to explore the nonlinear structure hidden in high-dimensional data, some dimension reduction techniques have been developed, such as the Projection Pursuit technique (PP). However, PP involves an enormous computational load. To overcome this, an inverse regression method has been proposed. In this paper, we discuss and develop this method. To seek the interesting projective direction, minimization of the residual sum of squares is used as the criterion, and spline functions are applied to approximate the general nonlinear transform function. The algorithm is simple and reduces the computational load. Under certain conditions, consistency of the estimators of the interesting direction is shown.
Abstract: Federated learning has become a popular tool in the big data era. It trains a centralized model based on data from different clients while keeping the data decentralized. In this paper, we propose, for the first time, a federated sparse sliced inverse regression algorithm. Our method can simultaneously estimate the central dimension reduction subspace and perform variable selection in a federated setting. We transform this federated high-dimensional sparse sliced inverse regression problem into a convex optimization problem by constructing the covariance matrix safely and losslessly. We then use a linearized alternating direction method of multipliers algorithm to estimate the central subspace. We also give approaches based on the Bayesian information criterion and holdout validation to determine the dimension of the central subspace and the hyperparameter of the algorithm. We establish an upper bound on the statistical error rate of our estimator under the heterogeneous setting. We demonstrate the effectiveness of our method through simulations and real-world applications.
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61127017, 61205216, 61275213, 61178009, 61108030, and 60978018), the National Basic Research Program (973 Program) (Grant No. 2012CB921603), the International Science & Technology Cooperation Program of China (Grant No. 2001DFA12490), the Major Program of the National Natural Science Foundation of China (Grant No. 10934004), the NSFC Project for Excellent Research Team (Grant No. 61121064), and the Environmental Project of Shanxi Province (Grant No. 2011256).
Abstract: Our recent progress in developing laser-induced breakdown spectroscopy (LIBS) based equipment for on-line monitoring of pulverized coal and of the unburned carbon (UC) level of fly ash is reviewed. Fully software-controlled LIBS equipment comprising a self-cleaning device has been developed for on-line coal quality monitoring in power plants. The system features an automated sampling device and is capable of elemental analysis (C, Ca, Mg, Ti, Si, H, Al, Fe, S, and organic oxygen) and proximate analysis (Qad and Aad) through optimal data processing methods. An automated prototype LIBS apparatus has been developed for possible application in power plants to on-line analysis of the UC level in fly ash. New data processing methods are proposed to correct spectral interference and matrix effects, with the accuracy of UC level analysis estimated to be 0.26%.
Funding: Supported by the NNSF of China under Grant 49794030 and the National "973" Program No. 4 (G1998040909).
Abstract: A review of ten years' practice in developing the improved simultaneous physical retrieval method (ISPRM) is given in the hope that some creative ideas can be drawn from it. The improvement upon the SPRM is associated with the under-determinedness of this ill-posed inverse problem. In our experiment, the precondition is observed that prior information must be independent of the satellite measurements. Well-posed retrieval theory tells us that the forward process is fundamental to the retrieval: it is the bridge between the input of satellite radiance and the output of retrievals. In order to obtain a better result from the forward process, full advantage must be taken of all available prior information. It is necessary to turn the ill-posed inverse problem into a well-posed one. Then, by using ridge regression or a Bayes algorithm to find the optimal combination of the first guess, the theoretical analogue information and the satellite observations, the impact of the under-determinedness of this inverse problem on the numerical solution is minimized.
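The role of ridge regression in stabilizing an under-determined retrieval can be illustrated with a generic linear sketch. This is not the ISPRM itself: the linearized forward operator `K`, the first guess `x0`, the regularization weight `lam` and the function name `ridge_retrieval` are all assumptions introduced for the example. The estimate minimizes ||y - Kx||^2 + lam ||x - x0||^2, so it balances the observations against the first guess.

```python
import numpy as np


def ridge_retrieval(K, y, x0, lam):
    """Ridge-regularized solution of y = K x around a first guess x0.

    Closed form: x = x0 + (K^T K + lam I)^{-1} K^T (y - K x0).
    K is an (m, p) hypothetical linearized forward operator; m < p is
    allowed because lam > 0 keeps the normal matrix invertible.
    """
    p = K.shape[1]
    normal_matrix = K.T @ K + lam * np.eye(p)
    residual = y - K @ x0              # part of y not explained by the first guess
    return x0 + np.linalg.solve(normal_matrix, K.T @ residual)
```

A small `lam` lets the observations dominate, while a large `lam` keeps the solution close to the first guess; how the weight is actually chosen in the reviewed method is not stated in the abstract.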
Abstract: To estimate the central dimension-reduction space in multivariate nonparametric regression, Sliced Inverse Regression (SIR), Sliced Average Variance Estimation (SAVE) and Slicing Average Third-moment Estimation (SAT) have been developed. Since slicing estimation has very different asymptotic behavior for SIR and SAVE, the relevant studies have been made case by case, whereas the kernel estimators of SIR and SAVE share similar asymptotic properties. In this paper, we also investigate kernel estimation of SAT. We prove the asymptotic normality and show that, compared with the existing results, kernel smoothing for SIR, SAVE and SAT has very similar asymptotic behavior.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 10771015).
Abstract: In this paper, we propose a new estimate for dimension reduction, called the weighted variance estimate (WVE), which includes the Sliced Average Variance Estimate (SAVE) as a special case. A bootstrap method is used to select the best estimate from the WVE family and to estimate the structural dimension. The selected estimate usually performs better than existing methods such as Sliced Inverse Regression (SIR) and SAVE. Many methods, including SIR and SAVE, put the same weight on each observation when estimating the central subspace (CS). By introducing a weight function, WVE puts different weights on different observations according to their distance from the CS. The weight function gives WVE very good performance in general and complicated situations, for example, when the distribution of the regressor deviates severely from the elliptical distribution on which many methods, such as SIR, are based. Compared with many existing methods, WVE is insensitive to the distribution of the regressor. The consistency of the WVE is established. Simulations comparing the performance of WVE with that of other existing methods confirm the advantage of WVE.