Date of Award

8-2022

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Mathematical Sciences

Committee Chair/Advisor

Yuyuan Ouyang

Committee Member

Cheng Guo

Committee Member

Boshi Yang

Abstract

In this thesis, we focus on convergence performance of first-order methods to compute an $\epsilon$-approximate solution of minimizing convex smooth function $f$ at the $N$-th iteration.

In our introduction of the above research question, we first introduce the gradient descent method with constant step size $h=1/L$. The gradient descent method has a $\mathcal{O}(L^2\|x_0-x^*\|^2/\epsilon)$ convergence with respect to $\|\nabla f(x_N)\|^2$. Next we introduce Nesterov’s accelerated gradient method, which has an $\mathcal{O}(L\|x_0-x^*\|\sqrt{1/\epsilon})$ complexity in terms of $\|\nabla f(x_N)\|^2$. The convergence performance of Nesterov’s accelerated gradient method is much better than that of the gradient descent method but still not optimal. We also briefly introduce some other first order methods in the literature to compute an $\epsilon$-approximate solution of minimizing convex smooth function $f$, including a monotone convergence accelerated gradient method and a perturbed gradient method in \cite{nesterov2018lectures}. They have $\mathcal{O}(L^{2/3}\|x_0-x^*\|^{2/3}/\epsilon^{1/3})$ and $\mathcal{O}((\sqrt{L\|x_0-x^*\|}/\epsilon^{1/4})\ln(1+2L\|x_0-x^*\|/\sqrt{\epsilon}))$ complexities respectively. Those results are better than that of Nesterov’s accelerated gradient method, but the convergence performance of first order methods can still be better. Our main focus is to design a first order method for reducing the gradient norm of the objective function. Our research is closely related to \cite{kim2021optimizing}, in which a first order method is proposed with complexity of order $\mathcal{O}(\sqrt{L(f(x_0)-f(x^*))}/\sqrt{\epsilon})$. This method is studied through the performance enhancement program (PEP) originated from \cite{drori2014performance}. In \cite{nesterov2021primal} it is pointed out that by combining the accelerated gradient method and the method in \cite{kim2021optimizing} into a two-phase optimal gradient method, one is actually able to obtain an optimal $\mathcal{O}(\sqrt{L\|x_0-x^*\|}/\epsilon^{1/4})$ complexity.

Our new result in this thesis is a different set of parameters from \cite{kim2021optimizing} that also achieves the $\mathcal{O}(\sqrt{L(f(x_0)-f(x^*))}/\sqrt{\epsilon})$ convergence with respect to $\|\nabla f(x_N)\|^2$. Combining with Nesterov's accelerated gradient method, we are able to derive an $\mathcal{O}(\sqrt{L\|x_0-x^*\|}/\epsilon^{1/4})$ complexity, which is optimal among first-order methods, by the two-phase optimal gradient method.

Recommended Citation

Jiang, Yunheng, "Optimal First Order Methods for Reducing Gradient Norm in Unconstrained Convex Smooth Optimization" (2022). All Theses. 3860.
https://tigerprints.clemson.edu/all_theses/3860

Download

Included in

Other Applied Mathematics Commons

COinS

All Theses

Optimal First Order Methods for Reducing Gradient Norm in Unconstrained Convex Smooth Optimization

Date of Award

Document Type

Degree Name

Department

Committee Chair/Advisor

Committee Member

Committee Member

Abstract

Recommended Citation

Included in

Search

Browse by

Useful Links

All Theses

Optimal First Order Methods for Reducing Gradient Norm in Unconstrained Convex Smooth Optimization

Author

Date of Award

Document Type

Degree Name

Department

Committee Chair/Advisor

Committee Member

Committee Member

Abstract

Recommended Citation

Included in

Share

Search

Browse by

Useful Links