Funding: Project supported by the Major State Research Development Program, China (No. 2016YFB0201305).
Abstract: Support vector machines (SVMs) are a powerful tool for linear classification. When combined with a sparsity-inducing nonconvex penalty, SVMs can perform classification and variable selection simultaneously. However, nonconvex penalized SVMs generally cannot be solved globally and efficiently because of their nondifferentiability, nonconvexity, and nonsmoothness. Existing methods for nonconvex penalized SVMs typically solve the problem serially and therefore cannot fully exploit the parallel computing power of modern multi-core machines. Moreover, much real-world data is stored in a distributed manner, which urgently calls for a parallel and distributed solution. To address this challenge, we propose an efficient algorithm based on the alternating direction method of multipliers (ADMM) that solves nonconvex penalized SVMs in a parallel and distributed way. We design several techniques to reduce the computation and synchronization costs of the proposed parallel algorithm. A time complexity analysis demonstrates its low time complexity, and its convergence is guaranteed. Experimental evaluations on four LIBSVM benchmark datasets demonstrate the efficiency of the proposed parallel algorithm.
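The abstract does not spell out the splitting, but the standard way to parallelize such a problem with ADMM is consensus splitting: each worker holds a data shard and a local weight vector, and a global consensus variable absorbs the nonconvex penalty through its proximal operator. The sketch below illustrates that generic template only, not the authors' algorithm: MCP stands in as one representative nonconvex penalty, the local solver is plain subgradient descent, and all function names (`mcp_prox`, `consensus_admm`, ...) are hypothetical.

```python
import numpy as np

def mcp_prox(v, lam, gamma=3.0, rho=1.0):
    # Closed-form proximal operator of the MCP penalty (a standard
    # nonconvex sparsity penalty); valid when gamma * rho > 1.
    soft = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
    z = soft / (1.0 - 1.0 / (gamma * rho))
    return np.where(np.abs(v) <= gamma * lam, z, v)

def local_svm_step(X, y, z, u, rho, n_inner=50, lr=0.01):
    # Approximately minimize the local hinge loss plus the ADMM quadratic,
    #   (1/n_k) sum_i max(0, 1 - y_i x_i.w) + (rho/2) ||w - z + u||^2,
    # by subgradient descent; a real implementation could use a better solver.
    w = z.copy()
    for _ in range(n_inner):
        active = y * (X @ w) < 1
        grad = -(X[active].T @ y[active]) / len(y) + rho * (w - z + u)
        w -= lr * grad
    return w

def consensus_admm(shards, dim, lam, rho=1.0, n_iter=100):
    # shards: list of (X_k, y_k) data partitions, one per (simulated) worker.
    z = np.zeros(dim)
    us = [np.zeros(dim) for _ in shards]
    K = len(shards)
    for _ in range(n_iter):
        # w-updates are independent across shards, hence parallelizable.
        ws = [local_svm_step(X, y, z, u, rho) for (X, y), u in zip(shards, us)]
        # z-update: one nonconvex proximal step on the consensus average.
        v = np.mean([w + u for w, u in zip(ws, us)], axis=0)
        z = mcp_prox(v, lam, rho=rho * K)
        # Dual updates keep the local and consensus variables in agreement.
        us = [u + w - z for u, w in zip(us, ws)]
    return z
```

In a distributed deployment, the list comprehension over shards would become per-worker computation, with the averaging in the z-update as the only synchronization point per iteration.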
Funding: Project supported by the National Natural Science Foundation of China (No. 61303264).
Abstract: High-dimensional data arising from diverse scientific fields and industrial development have led to increased interest in sparse learning, owing to its model parsimony and computational advantages. Under the sparsity assumption, many computational problems can be handled efficiently in practice. Structured sparse learning encodes structural information about the variables and has been quite successful in numerous research fields. As various types of structure have been discovered, a variety of structured regularizations have been proposed; these have greatly improved the efficacy of sparse learning algorithms by exploiting specific structural information. In this article, we present a systematic review of structured sparse learning, covering ideas, formulations, algorithms, and applications. We present these algorithms in the unified framework of minimizing the sum of a loss function and a penalty function, summarize publicly accessible software implementations, and compare the computational complexity of typical optimization methods for structured sparse learning problems. In experiments, we present applications in unsupervised learning, for structured signal recovery and hierarchical image reconstruction, and in supervised learning in the context of a novel graph-guided logistic regression.
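As one concrete instance of the loss-plus-penalty framework, the following sketch solves least squares with a non-overlapping group lasso penalty by proximal gradient descent, whose proximal operator is block soft-thresholding. This is a generic illustration of the framework, not code from any surveyed implementation; the function names and step-size rule are our own.

```python
import numpy as np

def group_soft_threshold(w, groups, thresh):
    # Block soft-thresholding: the prox of the non-overlapping
    # group lasso penalty, applied group by group.
    out = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        out[g] = 0.0 if norm <= thresh else (1 - thresh / norm) * w[g]
    return out

def prox_gradient(X, y, groups, lam, step=None, n_iter=200):
    # Proximal gradient on:  min_w 0.5/n ||Xw - y||^2 + lam * sum_g ||w_g||_2
    n, d = X.shape
    if step is None:
        # 1/L for the smooth part, L = sigma_max(X)^2 / n.
        step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = group_soft_threshold(w - step * grad, groups, step * lam)
    return w
```

Swapping the penalty for another structured regularizer only changes the prox; swapping the loss only changes the gradient, which is what makes the unified framework convenient.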
Funding: Project supported by the National Key R&D Program of China (No. 2018YFB2101100) and the National Natural Science Foundation of China (Nos. 61806216 and 61702533).
Abstract: The era of big data in healthcare has arrived, and it will significantly improve medicine, especially oncology. However, traditional machine learning algorithms must be scaled up to solve such large-scale real-world problems, given the sheer volume of data to be analyzed and the difficulty of solving problems in nonconvex nonlinear settings. We aim to minimize the sum of a smooth nonlinear function and a block-separable nonconvex function over a large number of block variables with inequality constraints. We propose a novel parallel first-order optimization method, called asynchronous block coordinate descent with time perturbation (ATP), which adopts a time perturbation technique to escape from saddle points and suboptimal local points. We present the details of the proposed method together with analyses of its convergence and iteration complexity. Experiments on real-world machine learning problems validate the efficacy of the proposed method. The results demonstrate that time perturbation enables ATP to escape from saddle points and suboptimal points, providing a promising way to handle nonconvex optimization problems with inequality constraints using asynchronous block coordinate descent. An asynchronous parallel implementation on shared-memory multi-core platforms shows that ATP has strong scalability.
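To make the core idea concrete, here is a minimal serial stand-in for block coordinate descent with occasional random perturbation. It is a heavy simplification of ATP: the asynchrony is simulated by a sequential sweep, the inequality constraints are omitted, and the perturbation is triggered by a stall test rather than the paper's time-based rule; every name and threshold here is an assumption for illustration.

```python
import numpy as np

def block_cd_with_perturbation(grad, x0, blocks, lr=0.05, n_epochs=100,
                               stall_tol=1e-6, noise=1e-2, seed=0):
    # grad: function returning the full gradient of the objective at x.
    # blocks: index slices partitioning the coordinates, e.g.
    #         [slice(0, 5), slice(5, 10)].
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(n_epochs):
        for b in blocks:  # in ATP these block updates run asynchronously
            x[b] -= lr * grad(x)[b]
        if np.linalg.norm(grad(x)) < stall_tol:
            # Near a stationary point, which may be a saddle: inject a small
            # random kick so the iterate can escape (the perturbation idea).
            x += noise * rng.standard_normal(x.shape)
    return x
```

The key property the perturbation exploits is that saddle points are unstable: a small random displacement followed by further descent moves the iterate off the saddle with high probability.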
Funding: Supported by the National Natural Science Foundation of China (No. 61303264) and the National Key Research and Development Program of China (No. 2016YFB1000401).
Abstract: In this study, we propose and compare stochastic variants of the extra-gradient alternating direction method, named the stochastic extra-gradient alternating direction method with Lagrangian function (SEGL) and the stochastic extra-gradient alternating direction method with augmented Lagrangian function (SEGAL), for minimizing large-scale graph-guided optimization problems composed of two convex objective functions. A number of important applications in machine learning follow the graph-guided optimization formulation, such as linear regression, logistic regression, Lasso, structured extensions of Lasso, and structured regularized logistic regression. We conduct experiments on fused logistic regression and graph-guided regularized regression. Experimental results on several genres of datasets demonstrate that the proposed algorithms outperform competing algorithms, and that SEGAL performs better than SEGL in practice.
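A graph-guided problem couples coefficients along the edges of a graph, typically via a penalty of the form lam * ||F w||_1, where each row of F encodes one edge. The sketch below builds such an F and runs a plain, deterministic extra-gradient loop on the saddle-point form of graph-guided fused least squares. It shows only the template that SEGL/SEGAL make stochastic; the function names and step sizes are illustrative placeholders, not the paper's method.

```python
import numpy as np

def edge_matrix(edges, d):
    # One row e_i - e_j per edge (i, j), so ||F w||_1 penalizes
    # differences between coefficients joined by an edge.
    F = np.zeros((len(edges), d))
    for r, (i, j) in enumerate(edges):
        F[r, i], F[r, j] = 1.0, -1.0
    return F

def extragradient_fused(X, y, F, lam, tau=0.01, sigma=0.01, n_iter=500):
    # Extra-gradient on the saddle-point form of
    #   min_w 0.5/n ||Xw - y||^2 + lam ||F w||_1
    # = min_w max_{||v||_inf <= lam} 0.5/n ||Xw - y||^2 + v.T @ F @ w
    n, d = X.shape
    w, v = np.zeros(d), np.zeros(F.shape[0])
    grad = lambda w_: X.T @ (X @ w_ - y) / n
    proj = lambda v_: np.clip(v_, -lam, lam)   # projection onto the dual ball
    for _ in range(n_iter):
        w_h = w - tau * (grad(w) + F.T @ v)    # prediction (half) step
        v_h = proj(v + sigma * (F @ w))
        w = w - tau * (grad(w_h) + F.T @ v_h)  # correction using the midpoint
        v = proj(v + sigma * (F @ w_h))
    return w
```

In the stochastic variants, the full gradient would be replaced by a mini-batch estimate, which is what makes the scheme practical at large scale.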
Funding: Project supported by the National Key R&D Program of China (No. 2018YFB0204300) and the National Natural Science Foundation of China (Nos. 61872376 and 61806216).
Abstract: Although attention has recently shifted toward non-convex problems, convex optimization remains important in machine learning, especially when the situation requires an interpretable model. The solution to a convex problem is a global minimum, and the final model can be explained mathematically. Typically, the convex problem is recast as a regularized risk minimization problem to prevent overfitting. The cutting plane method (CPM) is one of the best solvers for convex problems, regardless of whether the objective function is differentiable. However, CPM and its variants fail to adequately address large-scale data-intensive cases because these algorithms access the entire dataset in each iteration, which substantially increases the computational burden and memory cost. To alleviate this problem, we propose a novel algorithm named the mini-batch cutting plane method (MBCPM), which iterates with estimated cutting planes computed on small batches of sampled data and can therefore handle large-scale problems. Furthermore, MBCPM adopts a "sink" operation that detects and adjusts noisy estimates to guarantee convergence. Numerical experiments on extensive real-world datasets demonstrate the effectiveness of MBCPM, which outperforms bundle methods for regularized risk minimization as well as popular stochastic gradient descent methods in terms of convergence speed.
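The following sketch shows the mini-batch cutting plane idea on L2-regularized hinge loss: each cut lower-bounds the empirical risk but is estimated on a random batch rather than the full dataset. It is an illustration of the general template only; the paper's "sink" correction of noisy cuts is omitted, the master problem is solved approximately by subgradient steps for self-containedness (bundle solvers would use a small QP), and all names are hypothetical.

```python
import numpy as np

def hinge_risk_and_subgrad(w, X, y):
    # Empirical hinge risk and one subgradient on the given sample.
    margins = 1 - y * (X @ w)
    active = margins > 0
    risk = np.mean(np.maximum(margins, 0.0))
    sub = -(X[active].T @ y[active]) / len(y)
    return risk, sub

def mini_batch_cpm(X, y, lam=0.01, batch=256, n_cuts=30, seed=0):
    # Builds a piecewise-linear model of the risk from batch-estimated cuts
    # (a, b), each asserting risk(v) >= a.v + b, then (approximately) solves
    #   min_w lam/2 ||w||^2 + max_i (a_i . w + b_i)   (the master problem).
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    cuts = []
    for _ in range(n_cuts):
        idx = rng.choice(len(y), size=min(batch, len(y)), replace=False)
        risk, a = hinge_risk_and_subgrad(w, X[idx], y[idx])
        cuts.append((a, risk - a @ w))        # new (noisy) cutting plane
        for t in range(1, 101):               # approximate master solve
            vals = [a_i @ w + b_i for a_i, b_i in cuts]
            a_max, _ = cuts[int(np.argmax(vals))]
            w -= (1.0 / (lam * t)) * (lam * w + a_max)
    return w
```

Because each iteration touches only a batch, the per-iteration cost is independent of the dataset size; the price is that cuts are noisy, which is precisely what the paper's sink operation is designed to handle.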