#### Title

Optimal First Order Methods for Reducing Gradient Norm in Unconstrained Convex Smooth Optimization

#### Date of Award

8-2022

#### Document Type

Thesis

#### Degree Name

Master of Science (MS)

#### Department

Mathematical Sciences

#### Committee Chair/Advisor

Yuyuan Ouyang

#### Committee Member

Cheng Guo

#### Committee Member

Boshi Yang

#### Abstract

In this thesis, we focus on the convergence performance of first-order methods for computing, at the $N$-th iteration, an $\epsilon$-approximate solution to the problem of minimizing a convex smooth function $f$.

To introduce the above research question, we first present the gradient descent method with constant step size $h=1/L$. The gradient descent method has an $\mathcal{O}(L^2\|x_0-x^*\|^2/\epsilon)$ complexity with respect to $\|\nabla f(x_N)\|^2$. Next we introduce Nesterov's accelerated gradient method, which has an $\mathcal{O}(L\|x_0-x^*\|/\sqrt{\epsilon})$ complexity in terms of $\|\nabla f(x_N)\|^2$. The convergence performance of Nesterov's accelerated gradient method is much better than that of the gradient descent method, but it is still not optimal. We also briefly introduce some other first-order methods in the literature for computing an $\epsilon$-approximate solution when minimizing a convex smooth function $f$, including a monotonically convergent accelerated gradient method and a perturbed gradient method in \cite{nesterov2018lectures}, with $\mathcal{O}(L^{2/3}\|x_0-x^*\|^{2/3}/\epsilon^{1/3})$ and $\mathcal{O}((\sqrt{L\|x_0-x^*\|}/\epsilon^{1/4})\ln(1+2L\|x_0-x^*\|/\sqrt{\epsilon}))$ complexities, respectively. These results improve upon that of Nesterov's accelerated gradient method, but the convergence performance of first-order methods can still be improved.

Our main focus is to design a first-order method for reducing the gradient norm of the objective function. Our research is closely related to \cite{kim2021optimizing}, in which a first-order method is proposed with a complexity of order $\mathcal{O}(\sqrt{L(f(x_0)-f(x^*))}/\sqrt{\epsilon})$. That method is derived through the performance estimation problem (PEP) framework, which originated in \cite{drori2014performance}. It is pointed out in \cite{nesterov2021primal} that by combining the accelerated gradient method and the method in \cite{kim2021optimizing} into a two-phase optimal gradient method, one can in fact obtain an optimal $\mathcal{O}(\sqrt{L\|x_0-x^*\|}/\epsilon^{1/4})$ complexity.
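The gap between the two baseline complexities above can be observed numerically. The following sketch (not from the thesis; the quadratic objective, dimensions, and iteration count are illustrative choices) runs gradient descent with constant step size $h=1/L$ and a standard form of Nesterov's accelerated gradient method on an $L$-smooth convex quadratic, and compares the resulting squared gradient norms $\|\nabla f(x_N)\|^2$.

```python
import numpy as np

# Illustrative L-smooth convex objective: f(x) = 0.5 * x^T A x - b^T x,
# where A is positive semidefinite and L is its largest eigenvalue.
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 10))
A = M.T @ M
b = rng.standard_normal(10)
L = np.linalg.eigvalsh(A).max()

def grad(x):
    return A @ x - b

def gradient_descent(x0, N):
    # Constant step size h = 1/L, as in the abstract.
    x = x0.copy()
    for _ in range(N):
        x = x - grad(x) / L
    return x

def nesterov_agm(x0, N):
    # Nesterov's accelerated gradient method with the standard
    # momentum sequence t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2.
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(N):
        x_next = y - grad(y) / L
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x

x0 = np.zeros(10)
N = 200
init_norm2 = np.linalg.norm(grad(x0)) ** 2
gd_norm2 = np.linalg.norm(grad(gradient_descent(x0, N))) ** 2
agm_norm2 = np.linalg.norm(grad(nesterov_agm(x0, N))) ** 2
print(init_norm2, gd_norm2, agm_norm2)
```

Both methods drive $\|\nabla f(x_N)\|^2$ well below its initial value, with the accelerated method typically far ahead at the same iteration budget, in line with the $\mathcal{O}(L^2\|x_0-x^*\|^2/\epsilon)$ versus $\mathcal{O}(L\|x_0-x^*\|/\sqrt{\epsilon})$ complexities quoted above.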

Our new result in this thesis is a set of parameters, different from those of \cite{kim2021optimizing}, that also achieves the $\mathcal{O}(\sqrt{L(f(x_0)-f(x^*))}/\sqrt{\epsilon})$ convergence with respect to $\|\nabla f(x_N)\|^2$. By combining our method with Nesterov's accelerated gradient method in the two-phase optimal gradient method, we derive an $\mathcal{O}(\sqrt{L\|x_0-x^*\|}/\epsilon^{1/4})$ complexity, which is optimal among first-order methods.

#### Recommended Citation

Jiang, Yunheng, "Optimal First Order Methods for Reducing Gradient Norm in Unconstrained Convex Smooth Optimization" (2022). *All Theses*. 3860.

https://tigerprints.clemson.edu/all_theses/3860