Abstract
In this paper, we propose an accelerated variant of the proximal point algorithm for computing a zero of an arbitrary maximally monotone operator A. The method incorporates an inertial term (based upon the acceleration techniques introduced by Nesterov), relaxation factors and a correction term. In a general Hilbert space setting, we obtain weak convergence of the iterates \((x_n)\) to equilibria, together with the fast rate \(\Vert x_{n+1}-x_{n}\Vert =o(n^{-1})\) for the discrete velocity. In particular, when using constant proximal indexes, we establish convergence of \((x_n)\) to a zero of A with the worst-case rate \(\Vert A_{\lambda }(x_n)\Vert =o(n^{-1})\) (\(A_{\lambda }\) being the Yosida regularization of A with index \(\lambda \)). Furthermore, given any positive integer p and using an appropriate adjustment of increasing proximal indexes, we obtain the worst-case convergence rate \(\Vert A_{\lambda }(x_n)\Vert =o(n^{-(p+1)})\). This acceleration process applies to various proximal-type algorithms, such as the augmented Lagrangian method, the alternating direction method of multipliers and operator splitting methods. Our methodology relies on a Lyapunov analysis combined with a suitable reformulation of the considered algorithm. Numerical experiments are also reported.
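As a minimal illustration only (not the paper's full scheme, which also involves relaxation factors and a correction term), the following Python sketch shows the basic inertial proximal point iteration, assuming the resolvent \(J_{\lambda A}=(I+\lambda A)^{-1}\) is available as a callable; all parameter values and names here are hypothetical:

```python
import numpy as np

def inertial_prox_point(J, x0, n_iters=200, theta=0.9):
    """Sketch of an inertial proximal point iteration:
    an extrapolation (inertial) step followed by a resolvent (proximal) step.
    J: callable computing the resolvent J_{lam A}(y) = (I + lam*A)^{-1}(y)."""
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(n_iters):
        y = x + theta * (x - x_prev)   # inertial extrapolation
        x_prev, x = x, J(y)            # proximal (resolvent) step
    return x

# Toy example: A = gradient of f(x) = ||x||^2 / 2, so J_{lam A}(y) = y / (1 + lam)
J = lambda y: y / 2.0                  # resolvent with lam = 1
x_star = inertial_prox_point(J, np.array([5.0, -3.0]))
```

For this strongly monotone toy operator the iterates contract to the unique zero at the origin; the general theory of the paper covers arbitrary maximally monotone operators, where only weak convergence holds.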
References
Attouch, H.: Fast inertial proximal ADMM algorithms for convex structured optimization with linear constraint. Minimax Theory Appl. 6(1), 1–24 (2021)
Attouch, H., László, S.C.: Newton-like inertial dynamics and proximal algorithms governed by maximally monotone operators. SIAM J. Optim. 30(4), 3252–3283 (2020)
Attouch, H., László, S.C.: Continuous Newton-like inertial dynamics for monotone inclusions. Set-Valued Var. Anal. (2020). https://doi.org/10.1007/s11228-020-00564-y
Attouch, H., Chbani, Z., Fadili, J., Riahi, H.: First-order optimization algorithms via inertial systems with Hessian driven damping, arXiv preprint, arXiv:1907.10536 (2019)
Attouch, H., Peypouquet, J.: Convergence of inertial dynamics and proximal algorithms governed by maximal monotone operators. Math. Programm. 174, 391–432 (2019)
Attouch, H., Peypouquet, J.: Convergence Rate of Proximal Inertial Algorithms Associated with Moreau Envelopes of Convex Functions. In: Bauschke, H., Burachik, R., Luke, D. (eds.) Splitting Algorithms, Modern Operator Theory, and Applications. Springer, Cham (2019)
Attouch, H., Soueycatt, M.: Augmented Lagrangian and proximal alternating direction methods of multipliers in Hilbert spaces. Applications to games, PDE’s and control. Pac. J. Optim. 5(1), 17–37 (2009)
Attouch, H., Svaiter, B.F.: A continuous dynamical Newton-like approach to solving monotone inclusions. SIAM J. Control Optim. 49(2), 574–598 (2011)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2017)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Boţ, R.I., Csetnek, E.R.: Second order forward-backward dynamical systems for monotone inclusion problems. SIAM J. Control Optim. 54, 1423–1443 (2016)
Boţ, R.I., Csetnek, E.R.: ADMM for monotone operators: convergence analysis and rates. Adv. Comput. Math. 45(1), 327–359 (2019)
Brezis, H.: Opérateurs maximaux monotones, Mathematical Studies, vol. 5. North-Holland, Amsterdam (1973)
Brezis, H.: Functional Analysis, Sobolev Spaces and Partial Differential Equations. Springer, New York (2010). https://doi.org/10.1007/978-0-387-70914-7
Brezis, H., Lions, P.L.: Produits infinis de résolvantes. Isr. J. Math. 29, 329–345 (1978)
Chambolle, A., Dossal, C.: On the convergence of the iterates of Fista. J. Optim. Theory Appl. 166(3), 968–982 (2015)
Combettes, P.L.: Monotone operator theory in convex optimization. Math. Programm. Ser. B 170(1), 177–206 (2018)
Corman, E., Yuan, X.: A generalized proximal point algorithm and its convergence rate. SIAM J. Optim. 24(4), 1614–1638 (2014)
Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)
Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Programm. 145(1–2), 451–482 (2014)
Eckstein, J., Bertsekas, D.P.: On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Programm. 55(1–3), 293–318 (1992)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite-element approximations. Comput. Math. Appl. 2(1), 17–40 (1976)
Gu, G., Yang, J.: Optimal nonergodic sublinear convergence rate of proximal point algorithm for maximal monotone inclusion problems, (2019), arXiv:1904.05495
Güler, O.: On the convergence of the proximal point algorithm for convex minimization. SIAM J. Control Optim. 29, 403–419 (1991)
Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2(4), 649–664 (1992)
Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4, 302–320 (1969)
Kim, D.: Accelerated proximal point method for maximally monotone operators, (2019), arXiv:1905.05149
Kim, D., Fessler, J.A.: Another look at the Fast Iterative Shrinkage/Thresholding Algorithm (FISTA). SIAM J. Optim. 28(1), 223–250 (2018)
Kim, D., Fessler, J.A.: Generalizing the optimized gradient method for smooth convex minimization. SIAM J. Optim. 28(2), 1920–1950 (2018)
Lemaire, B.: The proximal algorithm. In: Penot, J.P. (ed.) New Methods in Optimization and Their Industrial Uses, Internat. Ser. Numer. Math., vol. 87, pp. 73–87. Birkhäuser, Basel (1989)
Lin, H., Mairal, J., Harchaoui, Z.: Catalyst acceleration for first-order convex optimization: from theory to practice. J. Mach. Learning Res. 18(212), 1–54 (2018)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Lorenz, D.A., Pock, T.: An inertial forward-backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51, 311–325 (2015)
Maingé, P.E.: First-order continuous Newton-like systems for monotone inclusions. SIAM J. Control Optim. 51(2), 1615–1638 (2013)
Martinet, B.: Régularisation d'inéquations variationnelles par approximations successives. Rev. Fr. Inform. Rech. Opér. 4, 154–158 (1970)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1/k2). Soviet Math. Doklady 27, 372–376 (1983)
Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2). Dokl. Akad. Nauk. USSR 269(3), 543–547 (1983)
Nesterov, Y.: Gradient methods for minimizing composite objective function. Math. Programm. Ser. B 140, 125–161 (2013)
Peaceman, D.W., Rachford, H.H.: The numerical solution of parabolic and elliptic differential equations. J. Soc. Ind. Appl. Math. 3(1), 28–41 (1955)
Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization, pp. 283–298. Academic Press, New York (1969)
Rockafellar, R.T.: Monotone operators associated with saddle functions and minimax problems. In: F. E. Browder, (ed), Nonlinear Functional Analysis, Part 1. Symposia in Pure Math., vol. 18, American Mathematical Society, Providence, RI., pp. 397–407 (1970)
Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control. Opt. 14(5), 877–898 (1976)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)
Takahashi, W.: Nonlinear Functional Analysis. Yokohama Publishers, Yokohama (2000)
Zhang, X.Q., Burger, M., Bresson, X., Osher, S.: Bregmanized nonlocal regularization for deconvolution and sparse reconstruction. SIAM J. Imaging Sci. 3(3), 253–276 (2010)
APPENDIX.
1.1 Proof of Proposition 1.
In order to get this result, we compute the discrete derivative \({\dot{G}}_{n+1}(s,q):={G}_{n+1}(s,q)-{G}_{n}(s,q)\). For the reader's convenience, we recall that \(G_n(s,q)\) is given from (2.10) by
It is then readily noticed that \({\dot{G}}_{n+1}(s,q)\) can be formulated as
where
Note also that for any bilinear symmetric form \(\langle .,.\rangle _E\) on a real vector space E and for any sequences \(\{ \phi _{n} , \varphi _{n} \} \subset E\) we have the discrete derivative rules:
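The displayed rules (A.3a)–(A.3b) are not reproduced in this extract; the following reconstruction is inferred from the way they are invoked in (A.4)–(A.6) below (with the convention \({\dot{\psi }}_{n+1}:=\psi _{n+1}-\psi _{n}\)) and should be checked against the published version:

```latex
% Reconstruction of the discrete derivative rules (A.3a)-(A.3b),
% inferred from their use in (A.4)-(A.6); not copied from the source.
\begin{aligned}
\big(\langle \phi_{n+1},\varphi_{n+1}\rangle_E\big)^{\displaystyle\cdot}
  &= \langle {\dot{\phi}}_{n+1}, \varphi_{n+1}\rangle_E
   + \langle \phi_{n}, {\dot{\varphi}}_{n+1}\rangle_E,
   &\quad \text{(A.3a)}\\
\big(\langle \phi_{n+1},\varphi_{n+1}\rangle_E\big)^{\displaystyle\cdot}
  &= \langle {\dot{\phi}}_{n+1}, \varphi_{n}\rangle_E
   + \langle \phi_{n+1}, {\dot{\varphi}}_{n+1}\rangle_E.
   &\quad \text{(A.3b)}
\end{aligned}
```

Both identities expand the same difference \(\langle \phi_{n+1},\varphi_{n+1}\rangle_E-\langle \phi_{n},\varphi_{n}\rangle_E\); (A.3a) is used with \(\phi =\varphi =x-q\) to obtain (A.4a), and (A.3b) with \(\phi =x-q\), \(\varphi ={\dot{x}}\) to obtain (A.6).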
The sequel of the proof can be divided into the following parts (1)–(4):
(1) Basic estimates. Setting \(P_n= \langle q-x_{n+1}, {\dot{x}}_{n+1}\rangle \), we establish the elementary but useful facts below:
- Let us prove (A.4a). From \( 2 b_{n+1}= \Vert x_{n+1}-q\Vert ^2\), by the derivative rule (A.3a) we get
$$\begin{aligned} \begin{array}{l} 2 {\dot{b}}_{n+1} =\langle {\dot{x}}_{n+1}, x_{n+1}-q \rangle + \langle x_{n}-q, {\dot{x}}_{n+1}\rangle \\ =\langle {\dot{x}}_{n+1}, x_{n+1}-q \rangle + \langle x_n -x_{n+1}, {\dot{x}}_{n+1}\rangle + \langle x_{n+1}-q, {\dot{x}}_{n+1}\rangle , \end{array} \end{aligned}$$namely \( 2 {\dot{b}}_{n+1} = -2 P_n - \Vert {\dot{x}}_{n+1}\Vert ^2\), which leads to (A.4a).
- Let us prove (A.4b). From \(a_n= \langle q-x_n, - {\dot{x}}_{n}\rangle \), we simply get
$$\begin{aligned} \begin{array}{l} a_n=\langle q-x_{n+1}, -{\dot{x}}_{n}\rangle + \langle {\dot{x}}_{n+1}, -{\dot{x}}_{n}\rangle =- \langle q-x_{n+1}, {\dot{x}}_{n}\rangle - \langle {\dot{x}}_{n+1}, {\dot{x}}_{n}\rangle . \end{array}\nonumber \\ \end{aligned}$$(A.5)Moreover, from \(a_n:=\langle x_n-q, {\dot{x}}_{n}\rangle \), by the rule (A.3b) we readily get
$$\begin{aligned} {\dot{a}}_{n+1} = \langle {\dot{x}}_{n+1}, {\dot{x}}_{n}\rangle + \langle x_{n+1}-q , {\dot{x}}_{n+1}-{\dot{x}}_{n}\rangle . \end{aligned}$$(A.6)In addition, the derivative rule (A.3b) yields
$$\begin{aligned} \nu _{n+1} a_{n+1} -\nu _n a_n= {\dot{\nu }}_{n+1} a_{n} + {\nu }_{n+1}{\dot{a}}_{n+1}. \end{aligned}$$This latter result, in light of (A.5) and (A.6), amounts to
$$\begin{aligned} \begin{array}{l} a_{n+1}\nu _{n+1} - a_n \nu _n \\ \quad = {\dot{\nu }}_{n+1} \bigg (- \langle q-x_{n+1}, {\dot{x}}_{n}\rangle - \langle {\dot{x}}_{n+1}, {\dot{x}}_{n}\rangle \bigg )\\ \qquad + {\nu }_{n+1}\bigg ( \langle {\dot{x}}_{n+1}, {\dot{x}}_{n}\rangle - P_n + \langle q-x_{n+1}, {\dot{x}}_{n}\rangle \bigg ) \\ \quad = \nu _n \langle {\dot{x}}_{n+1}, {\dot{x}}_{n}\rangle - \nu _{n+1} P_n + \nu _{n} \langle q-x_{n+1}, {\dot{x}}_{n}\rangle . \end{array} \end{aligned}$$(A.7)Furthermore, (2.9) gives us \( {\dot{x}}_{n+1}+ d_{n} - \theta _n {\dot{x}}_{n}=0\). Taking the scalar product of each side of this equality by \(q-x_{n+1}\) yields
$$\begin{aligned} \theta _n\langle q-x_{n+1}, {\dot{x}}_{n}\rangle = P_n - \langle d_{n}, x_{n+1}-q \rangle . \end{aligned}$$So, recalling that \( \nu _{n} =(e+\nu _{n+1}) \theta _n\) (from (2.9)), we get
$$\begin{aligned} \nu _{n} \langle q-x_{n+1}, {\dot{x}}_{n}\rangle&=(e+\nu _{n+1}) (\theta _n\langle q-x_{n+1}, {\dot{x}}_{n}\rangle )\\&=(e+\nu _{n+1}) \left( P_n - \langle d_{n}, x_{n+1}-q \rangle \right) , \end{aligned}$$which, in light of (A.7), entails
$$\begin{aligned} \begin{array}{l} a_{n+1}\nu _{n+1} - a_n \nu _n \\ = \nu _n \langle {\dot{x}}_{n+1}, {\dot{x}}_{n}\rangle - \nu _{n+1} P_n+ (e+\nu _{n+1}) \left( P_n - \langle d_{n}, x_{n+1}-q \rangle \right) \\ = \nu _n \langle {\dot{x}}_{n+1}, {\dot{x}}_{n}\rangle + e P_n - (e+\nu _{n+1})\langle d_{n}, x_{n+1}-q \rangle , \end{array} \end{aligned}$$that is (A.4b).
(2) An estimate from the inertial part. Now, given \((s,q) \in [0,\infty ) \times {\mathcal {H}}\), we prove that the discrete derivative \( {\dot{G}}_{n+1}(s,q)\) satisfies
Indeed, in light of (A.2) and (A.4), we obtain
which leads obviously to the desired equality.
(3) An estimate from the proximal part. We prove that, for any \(\xi _n \ne 1\), it holds that
Indeed, we have \({\dot{x}}_{n+1}=\theta _n{\dot{x}}_n -d_n\) (from (2.9)), hence, for any \(\xi _n \ne 1\), and setting \(H_n= {\dot{x}}_{n+1}- (1-\xi _n)^{-1}\theta _n{\dot{x}}_n \), we have
or equivalently
Furthermore, by \(-d_n= {\dot{x}}_{n+1}- \theta _n{\dot{x}}_n\) (again using (2.9)) we simply obtain
Therefore, taking the scalar product of each side of (A.10) with \(d_n\), also adding \((1/2) \Vert d_n\Vert ^2\) to the resulting equality, and next using (A.11) we get
This, while noticing that \( \Vert d_n\Vert ^2= \Vert {\dot{x}}_{n+1}- \theta _n{\dot{x}}_{n}\Vert ^2\) (from (2.9)), yields (A.9).
(4) Combining proximal and inertial effects. We now show, for \(s \in \left( 0, e \right] \), that the iterates verify (for \(n \ge 0\))
Indeed, denoting \(\tau _n=e+\nu _{n+1}\), from (A.8) we know that
Furthermore, assuming that \(s \in \left( 0, e \right] \) and taking \(\xi _n=1- s \tau _n^{-1} \) in (A.9), while noticing that \(\tau _n \theta _n=\nu _n \) (from (2.9b)), we obtain
Then multiplying equality (A.14) by \(\tau _n^2\) and adding the resulting equality to (A.13) yields
where \(T_n\) is defined by
In addition, as \(\tau _n:=e+\nu _{n+1}\), a simple computation yields
As a consequence, by (A.16) we obtain
This ends the proof. \(\square \)
1.2 Proof of Proposition 2.
Let us first prove the result for \(b=1\). Consider the positive sequence \((k_n)_{n \ge 0}\) defined, for \(n \ge 1\) and for some constants \(\{a, c \} \subset [0,\infty )\), by the recursive formula \(\frac{k_{n}}{k_{n-1}}= 1 + \frac{a}{ n + c } \). As a consequence, from the basic inequalities \(a \ge [a]\) and \(c \le [c]+1\), we get
Hence, for \(n \ge 1\), we obtain
Moreover, for \(n \ge 3 + [a]\), we simply get
As a consequence, for \(n \ge 3 + [a]\), by (A.17) in light of (A.18) we get
where \(C_0:= k_0 \bigg ( \frac{ 2 + [c]+[a] }{ \prod _{j=2+ [c]}^{ 2 + [c]+[a] } j } \bigg )\), which implies that
The desired result follows immediately from the previous arguments when replacing a and c by \(\frac{a}{b}\) and \(\frac{c}{b}\), respectively. This completes the proof. \(\square \)
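Proposition 2 concerns sequences generated by the ratio \(k_n/k_{n-1}=1+\frac{a}{bn+c}\), whose polynomial growth of order \(n^{a/b}\) the proof quantifies. A quick numerical sanity check of this growth rate (illustrative only; the parameter values below are arbitrary):

```python
def k_sequence(a, b, c, n_max, k0=1.0):
    """Build (k_n) with k_n / k_{n-1} = 1 + a / (b*n + c) for n >= 1."""
    ks = [k0]
    for n in range(1, n_max + 1):
        ks.append(ks[-1] * (1.0 + a / (b * n + c)))
    return ks

a, b, c = 3.0, 1.0, 2.0            # arbitrary choice; here a/b = 3
ks = k_sequence(a, b, c, 20000)
r1 = ks[10000] / 10000 ** (a / b)  # k_n / n^(a/b) should stabilize
r2 = ks[20000] / 20000 ** (a / b)  # to a positive constant
```

For this particular choice the product telescopes, \(k_n=(n+3)(n+4)(n+5)/60\), so the ratio \(k_n/n^{3}\) tends to \(1/60\), in line with the \(n^{a/b}\) growth established by the proposition.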
1.3 Proof of Theorem 1
The proof will be divided into the following steps (a), (b) and (c):
(a) In order to establish (3.29a) and (3.29b), we first prove the following estimates
Indeed, passing to the limit as \(s \rightarrow 0^+\) in (2.8) amounts to
Concerning the second quantity in the right side of the above inequality, by \(b>0\), \(a \ge 0\) and \(a_2 >0\) (hence \(a+a_2 >0\)) we readily have
Then, in light of \(\sum _n n k_{n-1} |\langle A_{\lambda }(x_{n}), {\dot{x}}_{n+1}\rangle | < \infty \) (from (3.14d)), we infer that
It is then a classical matter to derive from (A.20) that there exists some \( l_2 \ge 0\) such that \( \lim _{n \rightarrow \infty } n ^2 \Vert {\dot{x}}_{n}\Vert ^2=l_2\). Moreover, recalling that \(\sum _n n \Vert {\dot{x}}_{n}\Vert ^2 < \infty \) (from (3.3f)), and noticing that \(\sum _n n^{-1}=\infty \), we infer that \( \liminf _{n \rightarrow \infty } n ^2 \Vert {\dot{x}}_{n}\Vert ^2=0\). It follows that \(l_2=0\), that is (A.19a).
Next combining the latter result with \(\Vert {\dot{x}}_{n}+ \lambda k_{n-1} A_{\lambda }(x_{n}) \Vert =o(n^{-1})\) (from (3.14b)) gives us immediately that \( \lim _{n \rightarrow \infty } n k_{n-1} \Vert A_{\lambda }(x_{n})\Vert =0\), that is (A.19b).
Then (3.29a) is given by (3.3f) and (A.19a), while (3.29b) follows from (3.14c) and (A.19b).
(b) The estimates in (3.29c) are derived from (3.3e) and (3.3g), while the estimate (3.29d) is given by (3.3d).
(c) Finally, we prove the weak convergence of the iterates \(\{x_{n}, z_n\}\) generated by CRIPA by means of Opial's lemma, which guarantees that \((x_n)\) converges weakly to some element of S provided that the following two conditions hold: (h1) for any \(q \in S\), the sequence \((\Vert x_n-q\Vert )\) is convergent; (h2) every weak cluster point of \((x_n)\) belongs to S.
- Let us prove (h1). Take \(q \in S\). As a straightforward consequence of the boundedness of \((x_n)\) (given by (3.3a)) along with (A.19b), we have
Moreover, we know from Lemma 3 that \(({{\mathcal {E}}}_n(a+a_2,q))\) is convergent, and it reads
Then, by \(n \Vert {\dot{x}}_{n}\Vert \rightarrow 0\) (from (A.19a)) and \(n k_{n-1} | \langle A_{\lambda }(x_{n}), x_{n}-q \rangle | \rightarrow 0\) (according to (1.21)) as \(n\rightarrow \infty \), we deduce that
This entails (h1).
- Now, we prove (h2). Let u be a weak cluster point of \((x_n)\): there exists a subsequence \((x_{n_k})\) that converges weakly to u as \(k \rightarrow \infty \). In view of (A.19b), we have \( \lim _{k \rightarrow \infty } \Vert A_{\lambda } (x_ {n_k})\Vert =0\), while a classical result gives \(A_{\lambda } (x_{n_k}) \in A (x_{n_k}-\lambda A_{\lambda } (x_{n_k}))\). Passing to the limit as \(k \rightarrow \infty \) in this latter result and recalling that the graph of a maximally monotone operator is demiclosed (see, for instance, [13]), we deduce that \(0 \in A(u)\), namely \( u \in S\). This proves (h2).
Then we infer by Opial’s lemma that \((x_n)\) converges weakly to some element \({\bar{x}} \in S\).
Finally, by \(\sum _n n \Vert k_{n-1} A_{\lambda } (x_{n})\Vert ^2 < \infty \) (from (3.14c)), we get \(k_{n-1} A_{\lambda } (x_{n}) \rightarrow 0\) (as \(n \rightarrow \infty \)), hence, by \( z_{n-1}-x_{n}=\lambda k_{n-1} A_{\lambda } (x_{n})\) (from (2.4a)), we obtain \( z_{n-1}-x_{n}\rightarrow 0\) (as \(n \rightarrow \infty \)). It follows that \((z_n)\) also converges weakly to \({\bar{x}}\). This completes the proof. \(\square \)
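The identity \(A_{\lambda }=(I-J_{\lambda A})/\lambda \), with \(J_{\lambda A}=(I+\lambda A)^{-1}\), which underlies the quantities \(\Vert A_{\lambda }(x_n)\Vert \) estimated above, can be checked on a toy maximally monotone operator. The sketch below (not part of the paper's experiments) uses the planar rotation, a skew linear operator whose unique zero is the origin:

```python
import numpy as np

lam = 0.5
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])                 # skew (hence monotone) rotation operator
J = np.linalg.inv(np.eye(2) + lam * A)      # resolvent J_{lam A} = (I + lam*A)^{-1}

def yosida(x):
    """Yosida regularization A_lam(x) = (x - J_{lam A} x) / lam."""
    return (x - J @ x) / lam

x = np.array([2.0, 1.0])
v = yosida(x)   # for linear A this equals A applied at J x
```

In particular \(A_{\lambda }({\bar{x}})=0\) at any zero \({\bar{x}}\) of A, which is why \(\Vert A_{\lambda }(x_n)\Vert \rightarrow 0\) serves as a residual measure for the inclusion \(0 \in A(x)\).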
Maingé, PE. Accelerated Proximal Algorithms with a Correction Term for Monotone Inclusions. Appl Math Optim 84 (Suppl 2), 2027–2061 (2021). https://doi.org/10.1007/s00245-021-09819-y