Date of Award

8-2022

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Mathematical Sciences

Committee Chair/Advisor

Yuyuan Ouyang

Committee Member

Cheng Guo

Committee Member

Boshi Yang

Abstract

In this thesis, we focus on the convergence performance of first-order methods for computing an $\epsilon$-approximate solution when minimizing a convex smooth function $f$, measured by the squared gradient norm $\|\nabla f(x_N)\|^2$ at the $N$-th iteration.
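
To fix notation (a minimal restatement of the setting; the precise assumptions are given in the body of the thesis): $f$ is convex with $L$-Lipschitz continuous gradient, and a point $x_N$ is an $\epsilon$-approximate solution when its squared gradient norm is at most $\epsilon$:
$$\|\nabla f(x)-\nabla f(y)\|\le L\|x-y\| \quad \text{for all } x,y, \qquad\qquad \|\nabla f(x_N)\|^2\le\epsilon.$$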

To introduce the above research question, we first present the gradient descent method with constant step size $h=1/L$. The gradient descent method has an $\mathcal{O}(L^2\|x_0-x^*\|^2/\epsilon)$ complexity with respect to $\|\nabla f(x_N)\|^2$. Next we introduce Nesterov's accelerated gradient method, which has an $\mathcal{O}(L\|x_0-x^*\|/\sqrt{\epsilon})$ complexity in terms of $\|\nabla f(x_N)\|^2$. The convergence performance of Nesterov's accelerated gradient method is much better than that of the gradient descent method, but it is still not optimal. We also briefly review other first-order methods in the literature for computing an $\epsilon$-approximate solution of minimizing a convex smooth function $f$, including an accelerated gradient method with monotone convergence and a perturbed gradient method in \cite{nesterov2018lectures}; they have $\mathcal{O}(L^{2/3}\|x_0-x^*\|^{2/3}/\epsilon^{1/3})$ and $\mathcal{O}((\sqrt{L\|x_0-x^*\|}/\epsilon^{1/4})\ln(1+2L\|x_0-x^*\|/\sqrt{\epsilon}))$ complexities, respectively. These results improve on Nesterov's accelerated gradient method, but the convergence performance of first-order methods can still be improved. Our main focus is to design a first-order method for reducing the gradient norm of the objective function. Our research is closely related to \cite{kim2021optimizing}, in which a first-order method with complexity of order $\mathcal{O}(\sqrt{L(f(x_0)-f(x^*))}/\sqrt{\epsilon})$ is proposed. This method is studied through the performance estimation problem (PEP) framework originating from \cite{drori2014performance}. In \cite{nesterov2021primal} it is pointed out that by combining the accelerated gradient method and the method in \cite{kim2021optimizing} into a two-phase optimal gradient method, one can in fact obtain an optimal $\mathcal{O}(\sqrt{L\|x_0-x^*\|}/\epsilon^{1/4})$ complexity.
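
To illustrate where such gradient-norm complexities come from (a back-of-envelope sketch using standard bounds, not the arguments of the thesis itself), the smoothness inequality $\|\nabla f(x)\|^2\le 2L\,(f(x)-f(x^*))$ converts the classical function-value rates into the complexities quoted above. For gradient descent with step size $h=1/L$,
$$x_{k+1}=x_k-\tfrac{1}{L}\nabla f(x_k), \qquad f(x_N)-f(x^*)\le\mathcal{O}\!\left(\tfrac{L\|x_0-x^*\|^2}{N}\right) \;\Longrightarrow\; \|\nabla f(x_N)\|^2\le\mathcal{O}\!\left(\tfrac{L^2\|x_0-x^*\|^2}{N}\right),$$
so $N=\mathcal{O}(L^2\|x_0-x^*\|^2/\epsilon)$ iterations suffice; similarly, Nesterov's $\mathcal{O}(L\|x_0-x^*\|^2/N^2)$ function-value rate gives $\|\nabla f(x_N)\|^2\le\mathcal{O}(L^2\|x_0-x^*\|^2/N^2)$ and hence the $\mathcal{O}(L\|x_0-x^*\|/\sqrt{\epsilon})$ complexity.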

Our new result in this thesis is a set of parameters, different from those of \cite{kim2021optimizing}, that also achieves the $\mathcal{O}(\sqrt{L(f(x_0)-f(x^*))}/\sqrt{\epsilon})$ convergence with respect to $\|\nabla f(x_N)\|^2$. By combining our method with Nesterov's accelerated gradient method in the two-phase optimal gradient method, we derive an $\mathcal{O}(\sqrt{L\|x_0-x^*\|}/\epsilon^{1/4})$ complexity, which is optimal among first-order methods.
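
For intuition, the two-phase complexity follows from a simple balancing argument (a sketch only; the precise statement and constants appear in the thesis): running Nesterov's accelerated gradient method for $N_1$ iterations yields a point $y$ with $f(y)-f(x^*)\le\mathcal{O}(L\|x_0-x^*\|^2/N_1^2)$, after which the gradient-norm-reducing method needs
$$N_2=\mathcal{O}\!\left(\sqrt{\tfrac{L\,(f(y)-f(x^*))}{\epsilon}}\right)=\mathcal{O}\!\left(\tfrac{L\|x_0-x^*\|}{N_1\sqrt{\epsilon}}\right)$$
iterations, and choosing $N_1\approx N_2$ gives a total of $\mathcal{O}(\sqrt{L\|x_0-x^*\|}/\epsilon^{1/4})$ iterations.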
