199,908 articles found
1. Subgroup Analysis of a Single-Index Threshold Penalty Quantile Regression Model Based on Variable Selection
Authors: QI Hui, XUE Yaxin. Wuhan University Journal of Natural Sciences, 2025, No. 2, pp. 169-183 (15 pages).
In clinical research, subgroup analysis can help identify patient groups that respond better or worse to specific treatments and improve therapeutic effect and safety, and it is of great significance in precision medicine. This article considers subgroup analysis methods for longitudinal data containing multiple covariates and biomarkers. We divide subgroups based on whether a linear combination of these biomarkers exceeds a predetermined threshold, and assess the heterogeneity of treatment effects across subgroups using the interaction between subgroups and exposure variables. Quantile regression is used to better characterize the global distribution of the response variable, and sparsity penalties are imposed to achieve variable selection of covariates and biomarkers. The effectiveness of our proposed methodology for both variable selection and parameter estimation is verified through random simulations. Finally, we demonstrate the application of this method by analyzing data from the PA.3 trial, further illustrating the practicality of the method proposed in this paper.
Keywords: longitudinal data; subgroup analysis; threshold model; quantile regression; variable selection
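As a brief illustration of the quantile regression idea mentioned in the QI and XUE abstract, the sketch below shows the standard check (pinball) loss and confirms on toy data that minimizing it over a constant recovers a sample quantile. It is not taken from the paper; the function names and the data are illustrative assumptions.

```python
def check_loss(u, tau):
    """Quantile (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(y, q, tau):
    """Sum of check losses of the residuals y_i - q."""
    return sum(check_loss(yi - q, tau) for yi in y)


def best_constant(y, tau):
    """Observed value minimizing the total check loss (a tau-th sample quantile)."""
    return min(sorted(y), key=lambda q: total_loss(y, q, tau))


y = [1.0, 2.0, 3.0, 4.0, 100.0]      # toy data with one outlier (assumed)
median_fit = best_constant(y, 0.5)   # tau = 0.5 targets the median
upper_fit = best_constant(y, 0.9)    # tau = 0.9 targets an upper quantile
```

Because the loss weights positive and negative residuals asymmetrically, changing `tau` moves the fitted value through the distribution, which is why quantile regression can describe more than the conditional mean.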
2. Strong Laws of Large Numbers for Sequences of Blockwise m-Dependent and Sub-Orthogonal Random Variables under Sublinear Expectations
Author: Jialiang FU. Journal of Mathematical Research with Applications, 2026, No. 1, pp. 103-118 (16 pages).
In this paper, we establish some strong laws of large numbers for non-independent random variables under the framework of sublinear expectations. One of our main results is for blockwise m-dependent random variables, and another is for sub-orthogonal random variables. Both extend the strong law of large numbers for independent random variables under sublinear expectations to the non-independent case.
Keywords: sublinear expectations; strong law of large numbers; blockwise m-dependent; sub-orthogonal random variables
3. Empirical Likelihood Based Variable Selection for Varying Coefficient Partially Linear Models with Censored Data (cited by: 1)
Author: Peixin ZHAO. Journal of Mathematical Research with Applications (indexed in CSCD), 2013, No. 4, pp. 493-504 (12 pages).
In this paper, we consider variable selection for the parametric components of varying coefficient partially linear models with censored data. By constructing a penalized auxiliary vector, we propose an empirical likelihood based variable selection procedure, and show that it is consistent and satisfies the sparsity property. The simulation studies show that the proposed variable selection method is workable.
Keywords: varying coefficient partially linear models; empirical likelihood; censored data; variable selection
4. A Principal Component Analysis (PCA)-based framework for automated variable selection in geodemographic classification (cited by: 5)
Authors: Yunzhe Liu, Alex Singleton, Daniel Arribas-Bel. Geo-Spatial Information Science (indexed in SCIE, CSCD), 2019, No. 4, pp. 251-264, I0003 (15 pages).
A geodemographic classification aims to describe the most salient characteristics of a small area zonal geography. However, such representations are influenced by the methodological choices made during their construction. Of particular debate are the choice and specification of input variables, with the objective of identifying inputs that add value but also aim for model parsimony. Within this context, our paper introduces a principal component analysis (PCA)-based automated variable selection methodology that has the objective of identifying candidate inputs to a geodemographic classification from a collection of variables. The proposed methodology is exemplified in the context of variables from the UK 2011 Census, and its output is compared to the Office for National Statistics 2011 Output Area Classification (2011 OAC). Through the implementation of the proposed methodology, the quality of the cluster assignment was improved relative to 2011 OAC, manifested by a lower total within-cluster sum of squares score. Across the UK, more than 70.2% of the Output Areas (OAs) occupied by the newly created classification (i.e. AVS-OAC) outperform the 2011 OAC, with particularly strong performance within Scotland and Wales.
Keywords: geodemographics; variable selection; UK census; spatial data mining; principal component analysis
5. Quantitative analysis of the content of nitrogen and sulfur in coal based on laser-induced breakdown spectroscopy: effects of variable selection (cited by: 6)
Authors: Fan DENG, Yu DING, Yujuan CHEN, Shaonong ZHU, Feifan CHEN. Plasma Science and Technology (indexed in SCIE, EI, CAS, CSCD), 2020, No. 7, pp. 36-43 (8 pages).
Coal is a crucial fossil energy in today's society, and the detection of sulfur (S) and nitrogen (N) in coal is essential for the evaluation of coal quality. Therefore, an efficient method is needed to quantitatively analyze N and S content in coal, to achieve the purpose of clean utilization of coal. This study applied laser-induced breakdown spectroscopy (LIBS) to test coal quality, and combined two variable selection algorithms, competitive adaptive reweighted sampling (CARS) and the successive projections algorithm (SPA), to establish the corresponding partial least squares (PLS) model. The results of the experiment were as follows. The PLS modeled with the full spectrum of 27,620 variables has poor accuracy: the coefficient of determination of the test set (R^2_P) and root mean square error of the test set (RMSEP) of nitrogen were 0.5172 and 0.2263, respectively, and those of sulfur were 0.5784 and 0.5811, respectively. The CARS-PLS screened 37 and 25 variables respectively in the detection of N and S elements, but the prediction ability of the model did not improve significantly. SPA-PLS finally screened 14 and 11 variables respectively through successive projections, and obtained the best prediction effect among the three methods. The R^2_P and RMSEP of nitrogen were 0.9873 and 0.0208, respectively, and those of sulfur were 0.9451 and 0.2082, respectively. In general, the predictive results of the two elements increased by about 90% for RMSEP and 60% for R^2_P compared with PLS. The results show that LIBS combined with SPA-PLS has good potential for detecting N and S content in coal, and is a very promising technology for industrial application.
Keywords: variable selection; LIBS; coal; CARS and SPA
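The improvement figures quoted in the DENG et al. abstract can be checked with simple arithmetic. The sketch below recomputes the relative change of each metric from the values reported there (full-spectrum PLS versus SPA-PLS); the helper name and the dictionary layout are illustrative assumptions, not part of the paper.

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, as a percentage."""
    return 100.0 * (after - before) / before


# (full-spectrum PLS, SPA-PLS) pairs as reported in the abstract.
reported = {
    "N": {"RMSEP": (0.2263, 0.0208), "R2P": (0.5172, 0.9873)},
    "S": {"RMSEP": (0.5811, 0.2082), "R2P": (0.5784, 0.9451)},
}

changes = {
    element: {
        metric: round(relative_change(*pair), 1)
        for metric, pair in metrics.items()
    }
    for element, metrics in reported.items()
}

for element, metrics in changes.items():
    print(element, metrics)
```

On these numbers, nitrogen changes by roughly 91% on both metrics (RMSEP down, R^2_P up) and sulfur by roughly 63 to 64% on both, which suggests the abstract's "about 90%" and "60%" figures may correspond to the two elements rather than to the two metrics.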
6. Incorporating empirical knowledge into data-driven variable selection for quantitative analysis of coal ash content by laser-induced breakdown spectroscopy (cited by: 1)
Authors: 吕一涵, 宋惟然, 侯宗余, 王哲. Plasma Science and Technology (indexed in SCIE, EI, CAS, CSCD), 2024, No. 7, pp. 148-156 (9 pages).
Laser-induced breakdown spectroscopy (LIBS) has become a widely used atomic spectroscopic technique for rapid coal analysis. However, the vast amount of spectral information in LIBS contains signal uncertainty, which can affect its quantification performance. In this work, we propose a hybrid variable selection method to improve the performance of LIBS quantification. Important variables are first identified using Pearson's correlation coefficient, mutual information, least absolute shrinkage and selection operator (LASSO) and random forest, and then filtered and combined with empirical variables related to fingerprint elements of coal ash content. Subsequently, these variables are fed into a partial least squares regression (PLSR). Additionally, in some models, certain variables unrelated to ash content are removed manually to study the impact of variable deselection on model performance. The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method. It is significantly better than the variable selection method based only on empirical knowledge and in most cases outperforms the baseline method. The results showed that on all three datasets the hybrid strategy for variable selection combining empirical knowledge and data-driven algorithms achieved the lowest root mean square error of prediction (RMSEP) values of 1.605, 3.478 and 1.647, respectively, which were significantly lower than those obtained from multiple linear regression using only 12 empirical variables, which are 1.959, 3.718 and 2.181, respectively. The LASSO-PLSR model with empirical support and 20 selected variables exhibited a significantly improved performance after variable deselection, with RMSEP values dropping from 1.635, 3.962 and 1.647 to 1.483, 3.086 and 1.567, respectively. Such results demonstrate that using empirical knowledge as a support for data-driven variable selection can be a viable approach to improve the accuracy and reliability of LIBS quantification.
Keywords: laser-induced breakdown spectroscopy (LIBS); coal ash content; quantitative analysis; variable selection; empirical knowledge; partial least squares regression (PLSR)
7. Fuzzy identification of nonlinear dynamic system based on selection of important input variables (cited by: 1)
Authors: LYU Jinfeng, LIU Fucai, REN Yaxue. Journal of Systems Engineering and Electronics (indexed in SCIE, EI, CSCD), 2022, No. 3, pp. 737-747 (11 pages).
Input variable selection (IVS) has proved to be pivotal in nonlinear dynamic system modeling. In order to optimize the model of the nonlinear dynamic system, a fuzzy modeling method for determining the premise structure by selecting important inputs of the system is studied. Firstly, a simplified two-stage fuzzy curves method is proposed, which is employed to sort all possible inputs by their relevance to the outputs, select the important input variables of the system and identify the structure. Secondly, in order to reduce the complexity of the model, the standard fuzzy c-means clustering algorithm and the recursive least squares algorithm are used to identify the premise parameters and conclusion parameters, respectively. Then, the effectiveness of IVS is verified on two well-known problems. Finally, the proposed identification method is applied to a realistic variable load pneumatic system. The simulation experiments indicate that the IVS method in this paper has a positive influence on the approximation performance of the Takagi-Sugeno (T-S) fuzzy modeling.
Keywords: Takagi-Sugeno (T-S) fuzzy modeling; input variable selection (IVS); fuzzy identification; fuzzy c-means clustering algorithm
8. Variable Selection of Partially Linear Single-index Models (cited by: 1)
Authors: LU Yi-qiang, HU Bin. Chinese Quarterly Journal of Mathematics (indexed in CSCD), 2014, No. 3, pp. 392-399 (8 pages).
In this article, we study variable selection for the partially linear single-index model (PLSIM). Based on the minimized average variance estimation, variable selection for the PLSIM is done by minimizing the average variance with an adaptive L1 penalty. An implementation algorithm is given. Under some regularity conditions, we demonstrate the oracle properties of the adaptive LASSO (aLASSO) procedure for the PLSIM. Simulations are used to investigate the effectiveness of the proposed method for variable selection of the PLSIM.
Keywords: variable selection; adaptive LASSO; minimized average variance estimation (MAVE); partially linear single-index model
9. Variable selection-based SPC procedures for high-dimensional multistage processes (cited by: 2)
Author: KIM Sangahn. Journal of Systems Engineering and Electronics (indexed in SCIE, EI, CSCD), 2019, No. 1, pp. 144-153 (10 pages).
Monitoring high-dimensional multistage processes has become crucial to ensure the quality of the final product in modern industrial environments. Few statistical process monitoring (SPC) approaches for monitoring and controlling quality in high-dimensional multistage processes have been studied. We propose a deviance residual-based multivariate exponentially weighted moving average (MEWMA) control chart with a variable selection procedure. We demonstrate that it outperforms the existing multivariate SPC charts in terms of out-of-control average run length (ARL) for the detection of process mean shift.
Keywords: diagnosis procedure; deviance residual; fault identification; model-based control chart; multistage process monitoring; variable selection
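For readers unfamiliar with the MEWMA chart named in the KIM abstract, here is a minimal sketch of the generic MEWMA recursion and its T^2 statistic. It is not the paper's deviance residual-based chart: it assumes raw observations with in-control mean zero and identity covariance, and the function name and the smoothing weight `lam` are illustrative choices.

```python
def mewma_statistics(observations, lam=0.2):
    """Return the MEWMA T^2 statistic after each observation.

    Assumes in-control mean 0 and identity covariance (illustrative only),
    and uses the asymptotic covariance lam / (2 - lam) * I of the smoothed
    vector z_t = lam * x_t + (1 - lam) * z_{t-1}.
    """
    dim = len(observations[0])
    z = [0.0] * dim                      # smoothed vector, started at the target
    scale = (2.0 - lam) / lam            # inverse of the asymptotic variance factor
    stats = []
    for x in observations:
        z = [lam * xi + (1.0 - lam) * zi for xi, zi in zip(x, z)]
        stats.append(scale * sum(zi * zi for zi in z))
    return stats


# A sustained shift in the first coordinate makes the statistic grow.
shifted = [[1.0, 0.0]] * 5
t2 = mewma_statistics(shifted, lam=0.5)
```

In practice an alarm is raised when the statistic exceeds a control limit chosen to give a target in-control average run length; that limit is not computed here.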
10. Cross-Validation, Shrinkage and Variable Selection in Linear Regression Revisited (cited by: 3)
Authors: Hans C. van Houwelingen, Willi Sauerbrei. Open Journal of Statistics, 2013, No. 2, pp. 79-102 (24 pages).
In deriving a regression model, analysts often have to use variable selection, despite problems introduced by data-dependent model building. Resampling approaches are proposed to handle some of the critical issues. In order to assess and compare several strategies, we will conduct a simulation study with 15 predictors and a complex correlation structure in the linear regression model. Using sample sizes of 100 and 400 and estimates of the residual variance corresponding to R^2 of 0.50 and 0.71, we consider 4 scenarios with varying amounts of information. We also consider two examples with 24 and 13 predictors, respectively. We will discuss the value of cross-validation, shrinkage and backward elimination (BE) with varying significance level. We will assess whether 2-step approaches using global or parameterwise shrinkage (PWSF) can improve selected models and will compare results to models derived with the LASSO procedure. Besides MSE, we will use model sparsity and further criteria for model assessment. The amount of information in the data has an influence on the selected models and the comparison of the procedures. None of the approaches was best in all scenarios. The performance of backward elimination with a suitably chosen significance level was not worse than that of the LASSO, and the BE models selected were much sparser, an important advantage for interpretation and transportability. Compared to global shrinkage, PWSF had better performance. Provided that the amount of information is not too small, we conclude that BE followed by PWSF is a suitable approach when variable selection is a key part of data analysis.
Keywords: cross-validation; LASSO; shrinkage; simulation study; variable selection
11. Spectroscopic Multicomponent Analysis Using Multi-objective Optimization for Variable Selection (cited by: 1)
Authors: Anderson da Silva Soares, Telma Woerle de Lima, Daniel Vitor de Lucena, Rogerio Lopes Salvini, Gustavo Teodoro Laureano, Clarimar Jose Coelho. Computer Technology and Application, 2013, No. 9, pp. 466-475 (10 pages).
The multiple determination of chemical properties is a classical problem in analytical chemistry. The major problem is to find the best subset of variables that better represents the compounds. These variables are obtained by a spectrophotometer device. This device measures hundreds of correlated variables related to physicochemical properties that can be used to estimate the component of interest. The problem is the selection of a subset of informative and uncorrelated variables that help the minimization of prediction error. Classical algorithms select a subset of variables for each compound considered. In this work we propose the use of the SPEA-II (strength Pareto evolutionary algorithm II). We would like to show that the variable selection algorithm can select just one subset used for multiple determinations using multiple linear regressions. The case study uses wheat data obtained by NIR (near-infrared spectroscopy) spectrometry, where the objective is the determination of a variable subgroup with information about E protein content (%), test weight (kg/hL), WKT (wheat kernel texture) (%) and farinograph water absorption (%). The results of traditional techniques of multivariate calibration such as the SPA (successive projections algorithm), PLS (partial least squares) and mono-objective genetic algorithm are presented for comparison. For NIR spectral analysis of protein concentration in wheat, the number of variables selected from 775 spectral variables was reduced to just 10 in the SPEA-II algorithm. The prediction error decreased from 0.2 in the classical methods to 0.09 in the proposed approach, a reduction of 37%. The model using variables selected by SPEA-II had better prediction performance than classical algorithms and full-spectrum partial least squares.
Keywords: multi-objective algorithms; variable selection; linear regression
12. Logistic and SVM Credit Score Models Based on Lasso Variable Selection (cited by: 2)
Author: Qingqing Li. Journal of Applied Mathematics and Physics, 2019, No. 5, pp. 1131-1148 (18 pages).
There are many factors influencing personal credit. We introduce the Lasso technique to personal credit evaluation, and establish Lasso-logistic, Lasso-SVM and Group lasso-logistic models respectively. Variable selection and parameter estimation are conducted simultaneously. Based on the personal credit data set from a certain lending platform, it can be concluded through experiments that, compared with the full-variable Logistic model and the stepwise Logistic model, the variable selection ability of the Group lasso-logistic model was the strongest, followed by Lasso-logistic and Lasso-SVM respectively. All three models based on Lasso variable selection have better filtering capability than stepwise selection. In the meantime, the Group lasso-logistic model can eliminate or retain relevant virtual variables as a group to facilitate model interpretation. In terms of prediction accuracy, Lasso-SVM had the highest prediction accuracy for default users in the training set, while in the test set, Group lasso-logistic had the best classification accuracy for default users. Whether in the training set or in the test set, the Lasso-logistic model has the best classification accuracy for non-default users. The model based on Lasso variable selection can also better screen out the key factors influencing personal credit risk.
Keywords: credit evaluation; logistic algorithm; SVM algorithm; Lasso variable selection
13. Joint Variable Selection of Mean-Covariance Model for Longitudinal Data (cited by: 2)
Authors: Dengke Xu, Zhongzhan Zhang, Liucang Wu. Open Journal of Statistics, 2013, No. 1, pp. 27-35 (9 pages).
In this paper we reparameterize covariance structures in longitudinal data analysis through the modified Cholesky decomposition of the covariance matrix itself. Based on this modified Cholesky decomposition, the within-subject covariance matrix is decomposed into a unit lower triangular matrix involving moving average coefficients and a diagonal matrix involving innovation variances, which are modeled as linear functions of covariates. Then, we propose a penalized maximum likelihood method for variable selection in joint mean and covariance models based on this decomposition. Under certain regularity conditions, we establish the consistency and asymptotic normality of the penalized maximum likelihood estimators of parameters in the models. Simulation studies are undertaken to assess the finite sample performance of the proposed variable selection procedure.
Keywords: joint mean and covariance models; variable selection; Cholesky decomposition; longitudinal data; penalized maximum likelihood method
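The decomposition described in the Xu, Zhang and Wu abstract, a unit lower triangular factor together with a diagonal factor, can be illustrated with a small LDL^T factorization. This is a generic numerical sketch, not the authors' estimation procedure; the function name and the 2x2 example matrix are assumptions made for illustration.

```python
def ldl_decomposition(sigma):
    """Factor a symmetric positive-definite matrix as sigma = L D L^T.

    L is unit lower triangular; D is returned as a list of its diagonal entries.
    """
    n = len(sigma)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal entry: what is left of sigma[j][j] after earlier columns.
        d[j] = sigma[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            partial = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (sigma[i][j] - partial) / d[j]
    return L, d


sigma = [[4.0, 2.0], [2.0, 3.0]]   # toy covariance matrix (assumed)
L, d = ldl_decomposition(sigma)
```

The appeal of this parameterization for modeling is that the below-diagonal entries of `L` are unconstrained and the entries of `d` only need to be positive, so both can be linked to covariates through regression-type models.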
14. Spatial Prediction of Soil Salinity in a Semiarid Oasis: Environmental Sensitive Variable Selection and Model Comparison (cited by: 3)
Authors: LI Zhen, LI Yong, XING An, ZHUO Zhiqing, ZHANG Shiwen, ZHANG Yuanpei, HUANG Yuanfang. Chinese Geographical Science (indexed in SCIE, CSCD), 2019, No. 5, pp. 784-797 (14 pages).
Timely monitoring and early warning of soil salinity are crucial for saline soil management. Environmental variables are commonly used to build soil salinity prediction models. However, few studies have systematically summarized the environmental sensitive variables for soil electrical conductivity (EC) estimation. Additionally, the performance of the Multiple Linear Regression (MLR), Geographically Weighted Regression (GWR), and Random Forest regression (RFR) models, which are representative of the current main methods for soil EC prediction, has not been explored. Taking the north of the Yinchuan plain irrigation oasis as the study area, the feasibility and potential of 64 environmental variables, extracted from the Landsat 8 remote sensed images in dry season and wet season, the digital elevation model, and other data, were assessed through correlation analysis, and the performance of the MLR, GWR, and RFR models on soil salinity estimation was compared. The results showed that: 1) 10 of 15 imagery texture and spectral band reflectivity environmental variables extracted from the Landsat 8 image in dry season were significantly correlated with soil EC, while only 3 of these indices extracted from the Landsat 8 image in wet season have significant correlation with soil EC. Channel network base level, one of the terrain attributes, had the largest absolute correlation coefficient of 0.47, and all spatial location factors had significant correlation with soil EC. 2) Prediction accuracy of the RFR model was slightly higher than that of the GWR model, while the MLR model produced the largest error. 3) In general, the soil salinization level in the study area gradually increased from south to north. In conclusion, the remote sensed imagery scanned in dry season was more suitable for soil EC estimation, and topographic factors and spatial location also play a key role. This study can contribute to the research on model construction and variable selection for soil salinity estimation in arid and semiarid regions.
Keywords: soil salinity; environmental variable; random forest regression; geographic weighted regression; Yinchuan Plain irrigation oasis
15. Variable Selection via Biased Estimators in the Linear Regression Model (cited by: 1)
Authors: Manickavasagar Kayanan, Pushpakanthie Wijekoon. Open Journal of Statistics, 2020, No. 1, pp. 113-126 (14 pages).
Least Absolute Shrinkage and Selection Operator (LASSO) is used for variable selection as well as for handling the multicollinearity problem simultaneously in the linear regression model. LASSO produces estimates having high variance if the number of predictors is higher than the number of observations and if high multicollinearity exists among the predictor variables. To handle this problem, the Elastic Net (ENet) estimator was introduced by combining LASSO and the Ridge estimator (RE). The solutions of LASSO and ENet have been obtained using the Least Angle Regression (LARS) and LARS-EN algorithms, respectively. In this article, we propose an alternative algorithm to overcome the issues in LASSO that combines LASSO with other existing biased estimators, namely the Almost Unbiased Ridge Estimator (AURE), Liu Estimator (LE), Almost Unbiased Liu Estimator (AULE), Principal Component Regression Estimator (PCRE), r-k class estimator and r-d class estimator. Further, we examine the performance of the proposed algorithm using a Monte-Carlo simulation study and real-world examples. The results showed that the LARS-rk and LARS-rd algorithms, which combine LASSO with the r-k class estimator and the r-d class estimator, outperformed other algorithms under moderate and severe multicollinearity.
Keywords: variable selection; Least Absolute Shrinkage and Selection Operator (LASSO); Least Angle Regression (LARS); Elastic Net (ENet); biased estimators
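To make concrete how LASSO performs variable selection, as discussed in the Kayanan and Wijekoon abstract, the sketch below solves a small LASSO problem by cyclic coordinate descent with soft-thresholding. Note that this swaps in coordinate descent for the LARS-type algorithms the abstract refers to, and that the function names, the penalty value and the toy data are all illustrative assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes every column of X has at least one nonzero entry.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that leaves feature j out.
            rho = 0.0
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
            norm_sq = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho / n, lam) / (norm_sq / n)
    return beta


# Toy design with orthogonal columns: a strong first signal and a weak second one.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
beta = lasso_coordinate_descent(X, y, lam=0.1)
```

The weak coefficient is set exactly to zero while the strong one is shrunk, which is the selection-plus-shrinkage behavior that distinguishes the L1 penalty from ridge-type penalties.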
16. Automatic Variable Selection for High-Dimensional Linear Models with Longitudinal Data (cited by: 1)
Authors: Ruiqin Tian, Liugen Xue. Open Journal of Statistics, 2014, No. 1, pp. 38-48 (11 pages).
High-dimensional longitudinal data arise frequently in biomedical and genomic research. It is important to select relevant covariates when the dimension of the parameters diverges as the sample size increases. We consider the problem of variable selection in high-dimensional linear models with longitudinal data. A new variable selection procedure is proposed using the smooth-threshold generalized estimating equation and quadratic inference functions (SGEE-QIF) to incorporate correlation information. The proposed procedure automatically eliminates inactive predictors by setting the corresponding parameters to zero, and simultaneously estimates the nonzero regression coefficients by solving the SGEE-QIF. The proposed procedure avoids the convex optimization problem and is flexible and easy to implement. We establish the asymptotic properties in a high-dimensional framework where the number of covariates increases as the number of clusters increases. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedure.
Keywords: variable selection; diverging number of parameters; longitudinal data; quadratic inference functions; generalized estimating equation
17. Fast Variable Selection by Block Addition and Block Deletion (cited by: 1)
Authors: Takashi Nagatani, Seiichi Ozawa, Shigeo Abe. Journal of Intelligent Learning Systems and Applications, 2010, No. 4, pp. 200-211 (12 pages).
We propose the threshold updating method for terminating variable selection and two variable selection methods. In the threshold updating method, we update the threshold value when an approximation error smaller than the current threshold value is obtained. The first variable selection method is the combination of forward selection by block addition and backward selection by block deletion. In this method, starting from the empty set of input variables, we add several input variables at a time until the approximation error is below the threshold value. Then we search for deletable variables by block deletion. The second method is the combination of the first method and variable selection by Linear Programming Support Vector Regressors (LPSVRs). By training an LPSVR with linear kernels, we evaluate the weights of the decision function and delete the input variables whose associated absolute weights are zero. Then we carry out block addition and block deletion. By computer experiments using benchmark data sets, we show that the proposed methods can perform faster variable selection than the method using only block deletion, and that with the threshold updating method, the approximation error is lower than with the fixed threshold method. We also compare our method with an embedded method, which determines the optimal variables during training, and show that our method gives comparable or better variable selection performance.
Keywords: backward selection; forward selection; least squares support vector machines; linear programming support vector machines; support vector machines; variable selection
18. Variable Selection in Randomized Block Design Experiment (cited by: 1)
Author: Sadiah Mohammed Aljeddani. American Journal of Computational Mathematics, 2022, No. 2, pp. 216-231 (16 pages).
In the experimental field, researchers very often need to select the best subset model and reach the best model estimation simultaneously. Selecting the best subset of variables will improve the prediction accuracy, as noninformative variables will be removed. Having a model with high prediction accuracy allows the researchers to use the model for future forecasting. In this paper, we investigate the differences between various variable selection methods. The aim is to compare the frequentist methodology (backward elimination), the penalised shrinkage method (the Adaptive LASSO) and Least Angle Regression (LARS) for selecting the active variables for data produced by the blocked design experiment. The result of the comparative study supports the use of the LARS method for statistical analysis of data from blocked experiments.
Keywords: variable selection; shrinkage methods; linear mixed model; blocked designs
19. Bayesian Variable Selection for Mixture Process Variable Design Experiment (cited by: 1)
Author: Sadiah M. A. Aljeddani. Open Journal of Modelling and Simulation, 2022, No. 4, pp. 391-416 (26 pages).
This paper discusses Bayesian variable selection methods for models from split-plot mixture designs using samples from the Metropolis-Hastings within Gibbs sampling algorithm. Bayesian variable selection is easy to implement due to the improvement in computing via MCMC sampling. We describe the Bayesian methodology by introducing the Bayesian framework and explaining Markov Chain Monte Carlo (MCMC) sampling. The Metropolis-Hastings within Gibbs sampling is used to draw dependent samples from the full conditional distributions, which are explained. In mixture experiments with process variables, the response depends not only on the proportions of the mixture components but also on the effects of the process variables. In many such mixture-process variable experiments, constraints such as time or cost prohibit the selection of treatments completely at random. In these situations, restrictions on the randomisation force the level combinations of one group of factors to be fixed while the combinations of the other group of factors are run. Then a new level of the first factor group is set and combinations of the other factors are run. We discuss the computational algorithm for the Stochastic Search Variable Selection (SSVS) in linear mixed models. We extend the computational algorithm of SSVS to fit models from split-plot mixture designs by introducing the algorithm of the Stochastic Search Variable Selection for Split-plot Design (SSVS-SPD). The motivation for this extension is that there are two different levels of experimental units, one for the whole plots and the other for subplots, in the split-plot mixture design.
Keywords: variable selection; Bayesian analysis; mixture experiment; split-plot design
20. VSOLassoBag: a variable-selection oriented LASSO bagging algorithm for biomarker discovery in omic-based translational research (cited by: 4)
Authors: Jiaqi Liang, Chaoye Wang, Di Zhang, Yubin Xie, Yanru Zeng, Tianqin Li, Zhixiang Zuo, Jian Ren, Qi Zhao. Journal of Genetics and Genomics (indexed in SCIE, CAS, CSCD), 2023, No. 3, pp. 151-162 (12 pages).
Screening biomolecular markers from high-dimensional biological data is one of the long-standing tasks for biomedical translational research. With its advantages in both feature shrinkage and biological interpretability, the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm is one of the most popular methods for clinical biomarker development. However, in practice, applying LASSO to omics-based data with high dimensions and low sample size may result in an excess number of predictive variables, leading to overfitting of the model. Here, we present VSOLassoBag, a wrapped LASSO approach that integrates an ensemble learning strategy to help select efficient and stable variables with high confidence from omics-based data. Using a bagging strategy in combination with a parametric method or inflection point search method, VSOLassoBag can integrate and vote on variables generated from multiple LASSO models to determine the optimal candidates. The application of VSOLassoBag to both simulation datasets and real-world datasets shows that the algorithm can effectively identify markers for either case-control binary classification or prognosis prediction. In addition, in comparison with multiple existing algorithms, VSOLassoBag shows comparable performance under different scenarios while resulting in fewer features than others. In summary, VSOLassoBag, which is available at https://seqworld.com/VSOLassoBag/ under the GPL v3 license, provides an alternative strategy for selecting reliable biomarkers from high-dimensional omics data. For users' convenience, we implement VSOLassoBag as an R package that provides multithreading computing configurations.
Keywords: feature selection; LASSO; bagging algorithm; biomarker discovery; omics data