Proximal gradient descent and its accelerated version are resultful methods for solving the sum of smooth and non-smooth problems. When the smooth function can be represented as a sum of multiple functions, the stocha...Proximal gradient descent and its accelerated version are resultful methods for solving the sum of smooth and non-smooth problems. When the smooth function can be represented as a sum of multiple functions, the stochastic proximal gradient method performs well. However, research on its accelerated version remains unclear. This paper proposes a proximal stochastic accelerated gradient (PSAG) method to address problems involving a combination of smooth and non-smooth components, where the smooth part corresponds to the average of multiple block sums. Simultaneously, most of convergence analyses hold in expectation. To this end, under some mind conditions, we present an almost sure convergence of unbiased gradient estimation in the non-smooth setting. Moreover, we establish that the minimum of the squared gradient mapping norm arbitrarily converges to zero with probability one.展开更多
In this paper,an accelerated proximal gradient algorithm is proposed for Hankel tensor completion problems.In our method,the iterative completion tensors generated by the new algorithm keep Hankel structure based on p...In this paper,an accelerated proximal gradient algorithm is proposed for Hankel tensor completion problems.In our method,the iterative completion tensors generated by the new algorithm keep Hankel structure based on projection on the Hankel tensor set.Moreover,due to the special properties of Hankel structure,using the fast singular value thresholding operator of the mode-s unfolding of a Hankel tensor can decrease the computational cost.Meanwhile,the convergence of the new algorithm is discussed under some reasonable conditions.Finally,the numerical experiments show the effectiveness of the proposed algorithm.展开更多
Many machine learning problems can be formulated as minimizing the sum of a function and a non-smooth regularization term.Proximal stochastic gradient methods are popular for solving such composite optimization proble...Many machine learning problems can be formulated as minimizing the sum of a function and a non-smooth regularization term.Proximal stochastic gradient methods are popular for solving such composite optimization problems.We propose a minibatch proximal stochastic recursive gradient algorithm SRG-DBB,which incorporates the diagonal Barzilai–Borwein(DBB)stepsize strategy to capture the local geometry of the problem.The linear convergence and complexity of SRG-DBB are analyzed for strongly convex functions.We further establish the linear convergence of SRGDBB under the non-strong convexity condition.Moreover,it is proved that SRG-DBB converges sublinearly in the convex case.Numerical experiments on standard data sets indicate that the performance of SRG-DBB is better than or comparable to the proximal stochastic recursive gradient algorithm with best-tuned scalar stepsizes or BB stepsizes.Furthermore,SRG-DBB is superior to some advanced mini-batch proximal stochastic gradient methods.展开更多
Gradient descent(GD)algorithm is the widely used optimisation method in training machine learning and deep learning models.In this paper,based on GD,Polyak’s momentum(PM),and Nesterov accelerated gradient(NAG),we giv...Gradient descent(GD)algorithm is the widely used optimisation method in training machine learning and deep learning models.In this paper,based on GD,Polyak’s momentum(PM),and Nesterov accelerated gradient(NAG),we give the convergence of the algorithms from an ini-tial value to the optimal value of an objective function in simple quadratic form.Based on the convergence property of the quadratic function,two sister sequences of NAG’s iteration and par-allel tangent methods in neural networks,the three-step accelerated gradient(TAG)algorithm is proposed,which has three sequences other than two sister sequences.To illustrate the perfor-mance of this algorithm,we compare the proposed algorithm with the three other algorithms in quadratic function,high-dimensional quadratic functions,and nonquadratic function.Then we consider to combine the TAG algorithm to the backpropagation algorithm and the stochastic gradient descent algorithm in deep learning.For conveniently facilitate the proposed algorithms,we rewite the R package‘neuralnet’and extend it to‘supneuralnet’.All kinds of deep learning algorithms in this paper are included in‘supneuralnet’package.Finally,we show our algorithms are superior to other algorithms in four case studies.展开更多
文摘Proximal gradient descent and its accelerated version are resultful methods for solving the sum of smooth and non-smooth problems. When the smooth function can be represented as a sum of multiple functions, the stochastic proximal gradient method performs well. However, research on its accelerated version remains unclear. This paper proposes a proximal stochastic accelerated gradient (PSAG) method to address problems involving a combination of smooth and non-smooth components, where the smooth part corresponds to the average of multiple block sums. Simultaneously, most of convergence analyses hold in expectation. To this end, under some mind conditions, we present an almost sure convergence of unbiased gradient estimation in the non-smooth setting. Moreover, we establish that the minimum of the squared gradient mapping norm arbitrarily converges to zero with probability one.
文摘In this paper,an accelerated proximal gradient algorithm is proposed for Hankel tensor completion problems.In our method,the iterative completion tensors generated by the new algorithm keep Hankel structure based on projection on the Hankel tensor set.Moreover,due to the special properties of Hankel structure,using the fast singular value thresholding operator of the mode-s unfolding of a Hankel tensor can decrease the computational cost.Meanwhile,the convergence of the new algorithm is discussed under some reasonable conditions.Finally,the numerical experiments show the effectiveness of the proposed algorithm.
基金the National Natural Science Foundation of China(Nos.11671116,11701137,12071108,11991020,11991021 and 12021001)the Major Research Plan of the NSFC(No.91630202)+1 种基金the Strategic Priority Research Program of Chinese Academy of Sciences(No.XDA27000000)the Natural Science Foundation of Hebei Province(No.A2021202010)。
文摘Many machine learning problems can be formulated as minimizing the sum of a function and a non-smooth regularization term.Proximal stochastic gradient methods are popular for solving such composite optimization problems.We propose a minibatch proximal stochastic recursive gradient algorithm SRG-DBB,which incorporates the diagonal Barzilai–Borwein(DBB)stepsize strategy to capture the local geometry of the problem.The linear convergence and complexity of SRG-DBB are analyzed for strongly convex functions.We further establish the linear convergence of SRGDBB under the non-strong convexity condition.Moreover,it is proved that SRG-DBB converges sublinearly in the convex case.Numerical experiments on standard data sets indicate that the performance of SRG-DBB is better than or comparable to the proximal stochastic recursive gradient algorithm with best-tuned scalar stepsizes or BB stepsizes.Furthermore,SRG-DBB is superior to some advanced mini-batch proximal stochastic gradient methods.
基金This work was supported by National Natural Science Foun-dation of China(11271136,81530086)Program of Shanghai Subject Chief Scientist(14XD1401600)the 111 Project of China(No.B14019).
文摘Gradient descent(GD)algorithm is the widely used optimisation method in training machine learning and deep learning models.In this paper,based on GD,Polyak’s momentum(PM),and Nesterov accelerated gradient(NAG),we give the convergence of the algorithms from an ini-tial value to the optimal value of an objective function in simple quadratic form.Based on the convergence property of the quadratic function,two sister sequences of NAG’s iteration and par-allel tangent methods in neural networks,the three-step accelerated gradient(TAG)algorithm is proposed,which has three sequences other than two sister sequences.To illustrate the perfor-mance of this algorithm,we compare the proposed algorithm with the three other algorithms in quadratic function,high-dimensional quadratic functions,and nonquadratic function.Then we consider to combine the TAG algorithm to the backpropagation algorithm and the stochastic gradient descent algorithm in deep learning.For conveniently facilitate the proposed algorithms,we rewite the R package‘neuralnet’and extend it to‘supneuralnet’.All kinds of deep learning algorithms in this paper are included in‘supneuralnet’package.Finally,we show our algorithms are superior to other algorithms in four case studies.