# Circumcentering approximate reflections for solving the convex feasibility problem – Fixed Point Theory and Algorithms for Sciences and Engineering

#### By G. H. M. Araújo, R. Arefidamghani, R. Behling, Y. Bello-Cruz, A. Iusem and L.-R. Santos

Jan 4, 2022

In [9], the following global error bound assumption on the sets K, U, denoted as (EB), was considered:

1. (EB)

There exists (bar{omega}>0) such that (operatorname{dist}(x,K)ge bar{omega} operatorname{dist}(x,Kcap U)) for all (xin U).

Let us comment on the connection between (EB) and other notions of error bounds which have been introduced in the past, all of them related to regularity assumptions imposed on the solutions of certain problems. If the problem at hand consists of solving (H(x)=0) with a smooth (H:mathbb{R}^{n}to mathbb{R}^{m}), a classical regularity condition demands that (m=n) and the Jacobian matrix of H be nonsingular at a solution (x^{*}), in which case Newton’s method, for instance, is known to enjoy superlinear or quadratic convergence. This condition implies local uniqueness of the solution (x^{*}). For problems with nonisolated solutions, a less demanding assumption is the notion of calmness (see [25], Chap. 8, Sect. F), which requires that

$$frac{ Vert H(x) Vert }{operatorname{dist}(x,S^{*})}ge omega$$

(4.1)

for all (xin mathbb{R}^{n}setminus S^{*}) and some (omega >0), where (S^{*}) is the solution set, i.e., the set of zeros of H. Calmness, also called upper-Lipschitz continuity (see [26]), is a classical example of error bound, and it holds in many situations (e.g., when H is affine by virtue of Hoffman’s lemma [27]). It implies that the solution set is locally a Riemannian manifold (see [28]), and it has been used for establishing superlinear convergence of Levenberg–Marquardt methods in [29].
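When H is affine, say (H(x)=Ax-b), Hoffman's lemma guarantees calmness, and the constant ω in (4.1) can be taken as the smallest nonzero singular value of A. A minimal numerical sketch of (4.1) in this affine case (the matrix A, the vector b, and the tolerance are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Affine map H(x) = A x - b with full row rank A, so S* = {x : A x = b} is
# nonempty and the projection onto S* is P(x) = x - pinv(A) (A x - b).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
A_pinv = np.linalg.pinv(A)

def calmness_ratio(x):
    """||H(x)|| / dist(x, S*) for x outside S*, as in (4.1)."""
    r = A @ x - b
    return np.linalg.norm(r) / np.linalg.norm(A_pinv @ r)

# Hoffman-type bound: the ratio is at least the smallest singular value of A,
# since ||pinv(A) r|| <= ||r|| / sigma_min(A).
omega = np.linalg.svd(A, compute_uv=False).min()
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.standard_normal(3)
    if np.linalg.norm(A @ x - b) > 1e-12:  # skip points of S*
        assert calmness_ratio(x) >= omega - 1e-9
```
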

When dealing with convex feasibility problems, as in this paper, it seems reasonable to replace the numerator of (4.1) by the distance from x to some of the convex sets, giving rise to (EB). Similar error bounds for feasibility problems can be found, for instance, in [30–33].

Under (EB), it was proved in [9] that MAP converges linearly with asymptotic constant bounded above by (sqrt{1-bar{omega}^{2}}), and that CRM also converges linearly with a better upper bound for the asymptotic constant, namely (sqrt{(1-bar{omega}^{2})/(1+bar{omega}^{2})}). In this section, we prove similar results for MAAP and CARM, assuming a local error bound related not just to K and U but also to the separating operator S. The local error bound, denoted as (LEB), is defined as follows:

1. (LEB)

There exist a set (Vsubset mathbb{R}^{n}) and a scalar (omega >0) such that

$$operatorname{dist}bigl(x,S(x)bigr)ge omega operatorname{dist}(x,Kcap U)quad text{for all } x in Ucap V.$$

We reckon that (LEB) becomes meaningful, and relevant for establishing convergence rate results, only when the set V contains the tail of the sequence generated by the algorithm; otherwise it might be void (e.g., it holds trivially, with any ω, when (Ucap V=emptyset )). In order to facilitate the presentation, we opted for introducing additional conditions on V in our convergence results rather than in the definition of (LEB).

The use of a local error bound instead of a global one makes sense, because the definition of linear convergence rate deals only with iterates (x^{k}) of the generated sequence with large enough k, and the convergence of the sequences of interest has already been established in Theorem 3.5, so that only points close enough to the limit (x^{*}) of the sequence matter. In fact, the convergence rate analysis for MAP and CRM in [9] holds, without any substantial change, under a local rather than global error bound.

The set V could be expected to be a neighborhood of the limit (x^{*}) of the sequence, but we do not specify it for the time being, because for the prototypical example of separating operator, i.e., the one in Example 2.6 of Sect. 3, it will have, as we will show later, a slightly more complicated structure: a ball centered at (x^{*}) minus a certain “slice”.

We start with the convergence rate analysis for MAAP.

### Proposition 4.1

Assume that K, U and the separating operator S satisfy (LEB). Consider (T^{S}:mathbb{R}^{n}to mathbb{R}^{n}) as in (3.1). Then, for all (xin Ucap V),

$$bigl(1-omega ^{2}bigr) biglVert x-P_{Kcap U}(x) bigrVert ^{2}ge biglVert T^{S}(x)-P_{Kcap U}bigl(T^{S}(x)bigr) bigrVert ^{2}$$

(4.2)

with ω as in Assumption (LEB).

### Proof

By Proposition 3.1, for all (zin Kcap U) and all (xin mathbb{R}^{n}),

begin{aligned} biglVert T^{S}(x)-z bigrVert ^{2}&le Vert z-x Vert ^{2}- biglVert T^{S}(x)-P^{S}(x) bigrVert ^{2}- biglVert P^{S}(x)-x bigrVert ^{2} \ &le Vert x-z Vert ^{2}- biglVert P^{S}(x)-x bigrVert ^{2}. end{aligned}

(4.3)

Note that (Vert P^{S}(x)-x Vert =operatorname{dist}(x,S(x))) and that (Vert T^{S}(x)-P_{Kcap U}(T^{S}(x)) Vert le Vert T^{S}(x)-z Vert ) by the definition of (P_{Kcap U}). Take (z= P_{Kcap U}(x)), and get from (4.3)

begin{aligned} biglVert T^{S}(x)-P_{Kcap U}bigl(T^{S}(x) bigr) bigrVert ^{2}&le biglVert T^{S}(x)-P_{Kcap U}(x) bigrVert ^{2} \ &le biglVert x-P_{Kcap U}(x) bigrVert ^{2}-operatorname{dist}bigl(x,S(x)bigr)^{2}. end{aligned}

(4.4)

Take now (xin Ucap V) and invoke (LEB) to get from (4.4)

begin{aligned} biglVert T^{S}(x)-P_{Kcap U}bigl(T^{S}(x) bigr) bigrVert ^{2}&le biglVert x-P_{Kcap U}(x) bigrVert ^{2}-omega ^{2}operatorname{dist}(x,K cap U)^{2} \ &=bigl(1-omega ^{2}bigr) biglVert x-P_{Kcap U}(x) bigrVert ^{2}, end{aligned}

which immediately implies the result. □
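The chain of estimates in the proof can be checked numerically on a toy instance. Below, K is the unit disk, written as the 0-sublevel set of (g(x)= Vert x Vert ^{2}-1), U is the line (x_{2}=1/2), and (S(x)) is the half-space separator of Example 2.6; all of these choices are illustrative assumptions, not taken from the paper. The sketch verifies inequality (4.4) at one point of U outside K:

```python
import numpy as np

ROOT = np.sqrt(0.75)  # K ∩ U = {(s, 0.5) : |s| <= sqrt(3)/2} for the sets below

def g(x):              # K = unit disk = {x : g(x) <= 0}
    return x @ x - 1.0

def P_U(x):            # U = {x : x[1] = 0.5}, an affine line
    return np.array([x[0], 0.5])

def P_S(x):            # projection onto the separating half-space S(x)
    if g(x) <= 0.0:
        return x
    n = 2.0 * x        # gradient of g at x
    return x - (g(x) / (n @ n)) * n

def P_KU(x):           # projection onto K ∩ U (a segment), valid for x in U
    return np.array([np.clip(x[0], -ROOT, ROOT), 0.5])

def T_S(x):            # MAAP operator T^S = P_U ∘ P^S, as in (3.1)
    return P_U(P_S(x))

x = np.array([2.0, 0.5])   # a point of U lying outside K
lhs = np.linalg.norm(T_S(x) - P_KU(T_S(x))) ** 2
rhs = np.linalg.norm(x - P_KU(x)) ** 2 - np.linalg.norm(x - P_S(x)) ** 2
assert lhs <= rhs + 1e-12  # inequality (4.4) with z = P_{K∩U}(x)
```
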

Proposition 4.1 implies that if ({x^{k}}_{kin mathbb{N}}) is the sequence generated by MAAP and (x^{k}in V) for large enough k, then the sequence ({operatorname{dist}(x^{k},Kcap U)}_{kin mathbb{N}}) converges Q-linearly, with asymptotic constant bounded above by (sqrt{1-omega ^{2}}). In order to move from the distance sequence to the sequence ({x^{k}}_{kin mathbb{N}}) itself, we will invoke the following lemma from [9].

### Lemma 4.2

Consider a nonempty closed convex set (Msubset mathbb{R}^{n}) and a sequence ({y^{k}}_{kin mathbb{N}}subset mathbb{R}^{n}). Assume that ({y^{k}}_{kin mathbb{N}}) is Fejér monotone with respect to M, and that ({operatorname{dist}(y^{k},M)}_{kin mathbb{N}}) converges R-linearly to 0. Then ({y^{k}}_{kin mathbb{N}}) converges R-linearly to some point (y^{*}in M), with asymptotic constant bounded above by the asymptotic constant of ({operatorname{dist}(y^{k},M)}_{kin mathbb{N}}).

### Proof

See Lemma 3.4 in [9]. □

Next we establish the linear convergence of MAAP under (LEB).

### Theorem 4.3

Consider a closed and convex set (Ksubset mathbb{R}^{n}) and an affine manifold (Usubset mathbb{R}^{n}) such that (Kcap Uneq emptyset ). Moreover, assume that S is a separating operator for K satisfying Assumptions A1–A3. Suppose that K, U and the separating operator S satisfy (LEB). Let ({x^{k}}_{kin mathbb{N}}) be the sequence generated by MAAP from any starting point (x^{0}in mathbb{R}^{n}). If there exists (k_{0}) such that (x^{k}in V) for all (kge k_{0}), then ({x^{k}}_{kin mathbb{N}}) converges R-linearly to some point (x^{*}in Kcap U), and the asymptotic constant is bounded above by (sqrt{1-omega ^{2}}), with ω and V as in (LEB).

### Proof

The fact that ({x^{k}}_{kin mathbb{N}}) converges to some (x^{*}in Kcap U) has been established in Theorem 3.5. Take any (kge k_{0}). By assumption, (x^{k}in V), and by definition of (T^{S}), (x^{k}in U). So, we can take (x=x^{k}) in Proposition 4.1, in which case (T^{S}(x)=x^{k+1}), and rewrite (4.2) as ((1-omega ^{2})operatorname{dist}(x^{k},Kcap U)^{2}ge operatorname{dist}(x^{k+1},Kcap U)^{2}) for (kge k_{0}), which implies that ({operatorname{dist}(x^{k},Kcap U)}_{kin mathbb{N}}) converges Q-linearly, and hence R-linearly, with asymptotic constant bounded by (sqrt{1-omega ^{2}}). The corresponding result for the R-linear convergence of ({x^{k}}_{kin mathbb{N}}) with the same bound for the asymptotic constant follows then from Lemma 4.2, since ({x^{k}}_{kin mathbb{N}}) is Fejér monotone with respect to (Kcap U) by Theorem 3.5. □
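The Q-linear decay of the distances predicted by Theorem 4.3 can be observed on a toy instance (unit disk K given by (g(x)= Vert x Vert ^{2}-1), line U: (x_{2}=1/2), half-space separator; all illustrative assumptions, not from the paper). Running MAAP, the consecutive distance ratios settle near a constant strictly smaller than 1:

```python
import numpy as np

ROOT = np.sqrt(0.75)

def g(x): return x @ x - 1.0           # K = unit disk
def P_U(x): return np.array([x[0], 0.5])
def P_S(x):                            # projection onto the half-space S(x)
    if g(x) <= 0.0:
        return x
    n = 2.0 * x
    return x - (g(x) / (n @ n)) * n
def P_KU(x): return np.array([np.clip(x[0], -ROOT, ROOT), 0.5])
def T_S(x): return P_U(P_S(x))         # one MAAP step

x = np.array([2.0, 0.5])
dists = []
for _ in range(25):
    dists.append(np.linalg.norm(x - P_KU(x)))
    x = T_S(x)

# consecutive ratios of dist(x^k, K∩U): Q-linear decay, bounded away from 1
ratios = [dists[k + 1] / dists[k]
          for k in range(len(dists) - 1) if dists[k] > 1e-10]
assert all(r < 0.35 for r in ratios)
assert dists[-1] < 1e-9
```
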

Now we analyze the convergence rate of CARM under (LEB), for which a preliminary result, relating x, (C^{S}(x)) and (T^{S}(x)), is needed. The corresponding result for x, (C(x)), (T(x)) can be found in [15], where it is proved with a geometrical argument. Here we present an analytical one.

### Proposition 4.4

Consider the operators (C^{S},T^{S}:mathbb{R}^{n}to mathbb{R}^{n}) defined in (3.2). Then (T^{S}(x)) belongs to the segment between x and (C^{S}(x)) for all (xin U).

### Proof

Let E denote the affine manifold spanned by x, (R^{S}(x)) and (R_{U}(R^{S}(x))). By definition, the circumcenter of these three points, namely (C^{S}(x)), belongs to E. We claim that (T^{S}(x)) also belongs to E, and we now proceed to prove the claim. Since U is an affine manifold, (P_{U}) is an affine operator, so that (P_{U}(alpha x+(1-alpha )x')=alpha P_{U}(x)+(1-alpha )P_{U}(x')) for all (alpha in mathbb{R}) and all (x,x'in mathbb{R}^{n}). By (3.1), (R_{U}(R^{S}(x))=2P_{U}(R^{S}(x))-R^{S}(x)), so that

$$P_{U}bigl(R^{S}(x)bigr)= frac{1}{2} bigl(R_{U}bigl(R^{S}(x) bigr)+R^{S}(x) bigr).$$

(4.5)

On the other hand, using the affinity of (P_{U}), the definition of (T^{S}), and the assumption that (xin U), we have

$$P_{U}bigl(R^{S}(x) bigr)=P_{U}bigl(2P^{S}(x)-xbigr)=2P_{U} bigl(P^{S}(x)bigr)-P_{U}(x)=2T^{S}(x)-x,$$

(4.6)

so that

$$T^{S}(x)=frac{1}{2} bigl(P_{U}bigl(R^{S}(x)bigr)+x bigr).$$

(4.7)

Combining (4.5) and (4.7),

$$T^{S}(x)=frac{1}{2}x+frac{1}{4}R_{U} bigl(R^{S}(x)bigr)+frac{1}{4}R^{S}(x),$$

i.e., (T^{S}(x)) is a convex combination of x, (R_{U}(R^{S}(x))) and (R^{S}(x)). Since these three points belong to E, the same holds for (T^{S}(x)), and the claim holds. We observe now that (xin U) by assumption, (T^{S}(x)in U) by definition, and (C^{S}(x)in U) by Proposition 3.4(ii). Now we consider three cases: if (operatorname{dim} (E cap U)=0) then x, (T^{S}(x)) and (C^{S}(x)) coincide, and the result holds trivially. If (operatorname{dim} (Ecap U)=2) then (Esubset U), so that (R^{S}(x)in U), whence (R_{U}(R^{S}(x))=R^{S}(x)), in which case (C^{S}(x)) is the midpoint between x and (R^{S}(x)), which is precisely (P^{S}(x)). Hence, (P^{S}(x)in U), so that (T^{S}(x)=P_{U}(P^{S}(x))=P^{S}(x)=C^{S}(x)), implying that (T^{S}(x)) and (C^{S}(x)) coincide, and the result holds trivially. The interesting case is the remaining one, i.e., (operatorname{dim} (Ecap U)=1). In this case x, (T^{S}(x)) and (C^{S}(x)) lie on a line, so that we can write (C^{S}(x)=x+eta (T^{S}(x)-x)) with (eta in mathbb{R}), and it suffices to prove that (eta ge 1).

By the definition of η,

$$biglVert C^{S}(x)-x bigrVert = vert eta vert biglVert T^{S}(x)-x bigrVert .$$

(4.8)

In view of (3.4) with (M=U), (y=C^{S}(x)), and (x=R^{S}(x)),

$$biglVert C^{S}(x)-R^{S}(x) bigrVert ge biglVert C^{S}(x)-P_{U} bigl(R^{S}(x)bigr) bigrVert .$$

(4.9)

Then

begin{aligned} biglVert C^{S}(x)-x bigrVert &= biglVert C^{S}(x)-{R}^{S}(x) bigrVert ge biglVert C^{S}(x)-P_{U}bigl(R^{S}(x)bigr) bigrVert \ &= biglVert bigl(C^{S}(x)-x bigr)- bigl(P_{U} bigl(R^{S}(x)bigr)-x bigr) bigrVert \ &= biglVert eta bigl(T^{S}(x)-x bigr)-2 bigl(T^{S}(x)-x bigr) bigrVert \ &= vert eta -2 vert biglVert T^{S}(x)-x bigrVert , end{aligned}

(4.10)

using the definition of the circumcenter in the first equality, (4.9) in the inequality, and (4.6), as well as the definition of η, in the third equality. Combining (4.8) and (4.10), we get

$$vert eta vert biglVert T^{S}(x)-x bigrVert ge vert eta -2 vert biglVert T^{S}(x)-x bigrVert ,$$

implying that (vert eta vert ge vert 2-eta vert ), which holds only when (eta ge 1), completing the proof. □
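Proposition 4.4 can be sanity-checked numerically. The sketch below uses a toy instance (unit disk K via (g(x)= Vert x Vert ^{2}-1), line U: (x_{2}=1/2), half-space separator of Example 2.6; all illustrative assumptions) together with a generic circumcenter routine, and confirms that (C^{S}(x)=x+eta (T^{S}(x)-x)) with (eta ge 1):

```python
import numpy as np

def circumcenter(p0, p1, p2):
    """Point of aff{p0, p1, p2} equidistant from the three points."""
    v1, v2 = p1 - p0, p2 - p0
    G = np.array([[v1 @ v1, v1 @ v2], [v1 @ v2, v2 @ v2]])
    rhs = 0.5 * np.array([v1 @ v1, v2 @ v2])
    a, b = np.linalg.lstsq(G, rhs, rcond=None)[0]  # Gram system for the coeffs
    return p0 + a * v1 + b * v2

def g(x): return x @ x - 1.0
def P_U(x): return np.array([x[0], 0.5])
def R_U(x): return 2.0 * P_U(x) - x
def P_S(x):
    if g(x) <= 0.0:
        return x
    n = 2.0 * x
    return x - (g(x) / (n @ n)) * n
def R_S(x): return 2.0 * P_S(x) - x
def T_S(x): return P_U(P_S(x))                           # MAAP operator
def C_S(x): return circumcenter(x, R_S(x), R_U(R_S(x)))  # CARM operator

x = np.array([2.0, 0.5])
t, c = T_S(x), C_S(x)
eta = np.linalg.norm(c - x) / np.linalg.norm(t - x)
cross = (t - x)[0] * (c - x)[1] - (t - x)[1] * (c - x)[0]
assert abs(cross) < 1e-9       # x, T^S(x), C^S(x) are collinear
assert eta >= 1.0 - 1e-12      # T^S(x) lies on the segment [x, C^S(x)]
```
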

We continue with another intermediate result.

### Proposition 4.5

Assume that (LEB) holds for K, U, and S, and take (xin U). If (x,C^{S}(x)in V), then

$$bigl(1+omega ^{2}bigr)operatorname{dist}bigl(C^{S}(x),Kcap Ubigr)^{2}le bigl(1-omega ^{2}bigr)operatorname{dist}(x,K cap U)^{2},$$

(4.11)

with V, ω as in (LEB).

### Proof

Take (zin Kcap U), (xin Vcap U). We use Proposition 3.1, rewriting (3.3) as

$$biglVert x-P^{S}(x) bigrVert ^{2}le Vert x-z Vert ^{2}- biglVert P_{U}bigl(P^{S}(x)bigr)-z bigrVert ^{2}- biglVert P_{U}bigl(P^{S}(x) bigr)-P^{S}(x) bigrVert ^{2}$$

(4.12)

for all (xin mathbb{R}^{n}) and all (zin Kcap U). Since (xin U), we get from Lemma 3.3 that (C^{S}(x)=P_{H(x)}(x)). We apply next the well-known characterization of projections [24, Theorem 3.16] to get

$$bigl(x-C^{S}(x)bigr)^{top } bigl(z-C^{S}(x)bigr)le 0.$$

(4.13)

In view of Proposition 4.4, (P_{U}(P^{S}(x))) is a convex combination of x and (C^{S}(x)), meaning that (P_{U}(P^{S}(x))-C^{S}(x)) is a nonnegative multiple of (x-C^{S}(x)), so that (4.13) implies

$$bigl(P_{U}bigl(P^{S}(x) bigr)-C^{S}(x)bigr)^{top }bigl(z-C^{S}(x) bigr)le 0.$$

(4.14)

Expanding the inner product in (4.14), we obtain

$$biglVert P_{U}bigl(P^{S}(x) bigr)-z bigrVert ^{2}ge biglVert C^{S}(x)-z bigrVert ^{2}+ biglVert C^{S}(x)-P_{U} bigl(P^{S}(x)bigr) bigrVert ^{2}.$$

(4.15)

Combining (4.12) and (4.15), we have

begin{aligned} operatorname{dist}bigl(x,S(x)bigr)^{2}le {}& Vert x-z Vert ^{2}- biglVert C^{S}(x)-z bigrVert ^{2}- biglVert C^{S}(x)-P_{U} bigl(P^{S}(x)bigr) bigrVert ^{2} \ &{}- biglVert P_{U}bigl(P^{S}(x) bigr)-P^{S}(x) bigrVert ^{2}. end{aligned}

(4.16)

Now, since U is an affine manifold, ((y-P_{U}(y))^{top }(w-P_{U}(y))=0) for all (yin mathbb{R}^{n}) and all (win U), so that

$$Vert w-y Vert ^{2}= biglVert w-P_{U}(y) bigrVert ^{2}+ biglVert P_{U}(y)-y bigrVert ^{2}.$$

(4.17)

Since (C^{S}(x)in U) by Lemma 3.3, we use (4.17) with (y=P^{S}(x)), (w=C^{S}(x)), getting

$$biglVert C^{S}(x)-P_{U} bigl(P^{S}(x)bigr) bigrVert ^{2}+ biglVert P_{U}bigl(P^{S}(x)bigr)-P^{S}(x) bigrVert ^{2}= biglVert C^{S}(x)-P^{S}(x) bigrVert ^{2}.$$

(4.18)

Replacing (4.18) in (4.16), we obtain

begin{aligned} operatorname{dist}bigl(x,S(x)bigr)^{2}&le Vert x-z Vert ^{2}- biglVert C^{S}(x)-z bigrVert ^{2}- biglVert C^{S}(x)-P^{S}(x) bigrVert ^{2} \ & le Vert x-z Vert ^{2}-operatorname{dist}bigl(C^{S}(x),K cap Ubigr)^{2}- operatorname{dist}bigl(C^{S}(x),S(x) bigr)^{2}, end{aligned}

(4.19)

using the facts that (P^{S}(x)in S(x)) and (zin Kcap U) in the last inequality. Now, we take (z=P_{Kcap U}(x)), recall that (x,C^{S}(x)in V) by hypothesis, and invoke the (LEB) assumption, together with (4.19), in order to get

begin{aligned} omega ^{2}operatorname{dist}(x,Kcap U)^{2}le{} & operatorname{dist}bigl(x,S(x)bigr)^{2} \ le{} & operatorname{dist}(x,Kcap U)^{2}-operatorname{dist}bigl(C^{S}(x),Kcap Ubigr)^{2} \ &{}-omega ^{2}operatorname{dist}bigl(C^{S}(x),Kcap U bigr)^{2} \ ={}&operatorname{dist}(x,Kcap U)^{2}-bigl(1+omega ^{2}bigr)operatorname{dist}bigl(C^{S}(x),Kcap Ubigr)^{2}. end{aligned}

(4.20)

The result follows by rearranging (4.20). □

Next we present our convergence rate result for CARM.

### Theorem 4.6

Consider a closed and convex set (Ksubset mathbb{R}^{n}), an affine manifold (Usubset mathbb{R}^{n}) such that (Kcap Uneq emptyset ), and a separating operator S for K satisfying Assumptions A1–A3. Suppose that K, U and the separating operator S satisfy (LEB). Let ({x^{k}}_{kin mathbb{N}}) be the sequence generated by CARM from any starting point (x^{0}in U). If there exists (k_{0}) such that (x^{k}in V) for all (kge k_{0}), then ({x^{k}}_{kin mathbb{N}}) converges R-linearly to some point (x^{*}in Kcap U), and the asymptotic constant is bounded above by (sqrt{{(1-omega ^{2})}/{(1+omega ^{2})}}), with ω and V as in (LEB).

### Proof

The fact that ({x^{k}}_{kin mathbb{N}}) converges to some (x^{*}in Kcap U) has been established in Theorem 3.5. Take any (kge k_{0}). By assumption, (x^{k}in V) and by the definition of (T^{S}), (x^{k}in U). Also, (k+1ge k_{0}), so that (C^{S}(x^{k})=x^{k+1}in V). So, we can take (x=x^{k}) in Proposition 4.5 and rewrite (4.11) as ((1+omega ^{2})operatorname{dist}(x^{k+1},Kcap U)^{2}le (1-omega ^{2})operatorname{dist}(x^{k},K cap U)^{2}) or equivalently as

$$frac{operatorname{dist}(x^{k+1},Kcap U)}{operatorname{dist}(x^{k}, Kcap U)}le sqrt{frac{1-omega ^{2}}{1+omega ^{2}}}$$

for all (kge k_{0}), which immediately implies that ({operatorname{dist}(x^{k},Kcap U)}_{kin mathbb{N}}) converges Q-linearly, and hence R-linearly, with asymptotic constant bounded by (sqrt{(1-omega ^{2})/(1+omega ^{2})}). The corresponding result for the R-linear convergence of ({x^{k}}_{kin mathbb{N}}) with the same bound for the asymptotic constant follows then from Lemma 4.2, since ({x^{k}}_{kin mathbb{N}}) is Fejér monotone with respect to (Kcap U) by Theorem 3.5. □

From now on, given (zin mathbb{R}^{n}) and (alpha >0), (B(z,alpha )) will denote the closed ball centered at z with radius α.

The results of Theorems 4.3 and 4.6 become relevant only if we are able to find a separating operator S for K such that (LEB) holds. This is only possible if the “trivial” separating operator satisfies an error bound, i.e., if an error bound holds for the sets K, U themselves. Let ({ x^{k}}_{kin mathbb{N}}) be a sequence generated by CARM starting at some (x^{0}in U). By Theorem 3.5, ({x^{k}}_{kin mathbb{N}}) converges to some (x^{*}in Kcap U). Without loss of generality, we assume that (x^{k}notin K) for all k, because otherwise the sequence is finite and it makes no sense to deal with convergence rates. In such a case, (x^{*}in partial K), the boundary of K. We also assume from now on that a local error bound for K, U, say (LEB1), holds at some neighborhood of (x^{*}), i.e.,

1. (LEB1)

There exist (rho ,bar{omega}>0) such that (operatorname{dist}(x,K)ge bar{omega} operatorname{dist}(x,Kcap U)) for all (xin Ucap B(x^{*},rho )).

Note that (LEB1) is simply a local version of (EB). Observe further that (LEB1) does not involve the separating operator S, and that it gives a specific form to the set V in (LEB), namely a ball around (x^{*}).

We will analyze the situation for what we call the “prototypical” separating operator, namely the operator S presented in Example 2.6, for the case in which K is the 0-sublevel set of a convex function (g:mathbb{R}^{n}to mathbb{R}).

We will prove that under some additional mild assumptions on g, for any (omega <bar{omega}), there exists a set V such that U, K, S satisfy a local error bound assumption, say (LEB), with the pair ω, V.

Indeed, it will not be necessary to assume (LEB) in the convergence rate result; we will prove that under (LEB1), (LEB) will be satisfied for any (omega <bar{omega}) with an appropriate set V which does contain the tail of the sequence.

Our proof strategy will be as follows: in order to check that the error bounds for K, U and (S(x)), U are virtually the same for x close to the limit (x^{*}) of the CARM sequence, we will prove that the quotient between (operatorname{dist}(x,S(x))) and (operatorname{dist}(x,K)) approaches 1 when x approaches (x^{*}). Since both distances vanish at (x=x^{*}), we will take the quotient of their first-order approximations, in a L'Hôpital's-rule fashion, and the result will be established as long as the numerator and denominator of the new quotient are bounded away from 0, because otherwise this quotient remains indeterminate. This “bad” situation occurs when x approaches (x^{*}) along a direction almost tangent to (Kcap U), or equivalently almost normal to (nabla g(x^{*})). Fortunately, the CARM sequence, being Fejér monotone with respect to (Kcap U), does not approach (x^{*}) in such a tangential way. We will take an adequate threshold smaller than the angle between (nabla g(x^{*})) and (x^{k}-x^{*}). Then we will exclude directions whose angle with (nabla g(x^{*})) is smaller than this threshold and find a ball around (x^{*}) such that, given any (omega <bar{omega}), (LEB) holds with parameter ω in the set V defined as the ball minus the “slice” containing the “bad” directions. Because of the Fejér monotonicity of the CARM sequence, its iterates will remain in V for large enough k, and the results of Theorem 4.6 will hold with such ω. We proceed to follow this strategy in detail.

The additional assumptions on g are continuous differentiability and a Slater condition, meaning that there exists (hat{x}in mathbb{R}^{n}) such that (g(hat{x})<0). When g is of class (mathcal{C}^{1}), the separating operator of Example 2.6 becomes

$$S(x)= textstylebegin{cases} K, & text{if } xin K, \ {zin mathbb{R}^{n}:nabla g(x)^{top }(z-x)+g(x)leq 0}, & text{otherwise}. end{cases}$$

(4.21)
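For a concrete g, the half-space in (4.21) admits a closed-form projection, and the distance from x to it is (g(x)/ Vert nabla g(x) Vert ), as formula (4.22) below states. A small sketch with the illustrative choice (g(x)= Vert x Vert ^{2}-1) (the unit ball, Slater point (hat{x}=0); an assumption for the demo, not from the paper), checking both the distance formula and the separation property (Ksubset S(x)):

```python
import numpy as np

def g(x): return x @ x - 1.0          # K = unit ball; Slater point: g(0) = -1 < 0
def grad_g(x): return 2.0 * x

def P_S(x):
    """Projection onto the half-space S(x) of (4.21), for x outside K."""
    n = grad_g(x)
    return x - (max(g(x), 0.0) / (n @ n)) * n

x = np.array([1.5, 1.0, 0.0])         # a point outside K
# dist(x, S(x)) = g(x) / ||grad g(x)||, cf. (4.22)
dist_direct = np.linalg.norm(x - P_S(x))
dist_formula = g(x) / np.linalg.norm(grad_g(x))
assert abs(dist_direct - dist_formula) < 1e-12

# S(x) separates: every point of K satisfies the half-space inequality
rng = np.random.default_rng(1)
for _ in range(100):
    z = rng.standard_normal(3)
    z /= max(np.linalg.norm(z), 1.0)  # push z into the unit ball
    assert grad_g(x) @ (z - x) + g(x) <= 1e-12
```
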

### Proposition 4.7

Let (g:mathbb{R}^{n}to mathbb{R}) be convex, of class (mathcal{C}^{1}), and such that there exists (hat{x}in mathbb{R}^{n}) satisfying (g(hat{x})< 0). Take (K={xin mathbb{R}^{n}: g(x)le 0}). Assume that K, U satisfy (LEB1). Take (x^{*}) as in (LEB1), fix (0<nu < Vert nabla g(x^{*}) Vert ) (we will establish that (0ne nabla g(x^{*})) in the proof of this proposition), and define (L_{nu }:={zin mathbb{R}^{n}: vert nabla g(x^{*})^{top }(z-x^{*}) vert le nu Vert z-x^{*} Vert }). Consider the separating operator S defined in (4.21). Then, for any (omega <bar{omega}), with ω̄ as in (LEB1), there exists (beta >0) such that K, U, S satisfy (LEB) with ω and (V:=B(x^{*},beta )setminus L_{nu }).

### Proof

The fact that (0 < nu < Vert nabla g(x^{*}) Vert ) ensures that (Vne emptyset ). We will prove that, for x close to (x^{*}), the quotient (operatorname{dist}(x,S(x))/operatorname{dist}(x,K)) approaches 1, and first we proceed to evaluate (operatorname{dist}(x,S(x))). Note that when (xin Ksubset S(x)), the inequality in (LEB1) holds trivially because of A1. Thus, we assume that (xnotin K), so that (xnotin S(x)) by Proposition 2.5, and hence (g(x)> 0) and (S(x)={zin mathbb{R}^{n}:nabla g(x)^{top }(z-x)+g(x)leq 0}), implying, in view of (2.2), that

$$operatorname{dist}bigl(x,S(x)bigr)= biglVert x-P^{S}(x) bigrVert = frac{g(x)}{ Vert nabla g(x) Vert }.$$

(4.22)

Now we look for a more manageable expression for (operatorname{dist}(x,K)= Vert x-P_{K}(x) Vert ). Let (y=P_{K}(x)). So, y is the unique solution of the problem (min Vert z-x Vert ^{2}) s.t. (g(z)le 0), whose first order optimality conditions, sufficient by convexity of g, are

$$x-z=lambda nabla g(z)$$

(4.23)

with (lambda ge 0), so that

$$operatorname{dist}(x,K)= Vert x-y Vert =lambda biglVert nabla g(y) bigrVert .$$

(4.24)

Now we observe that the Slater condition implies that the right-hand sides of both (4.22) and (4.24) are well defined: since (xnotin K), (g(x)> 0); since (y=P_{K}(x)in partial K), (g(y)=0). By the Slater condition, (g(x)>g(hat{x})) and (g(y)>g(hat{x})), so that neither x nor y are minimizers of g, and hence both (nabla g(y)) and (nabla g(x)) are nonzero. By the same token, (nabla g(x^{*})ne 0), because (x^{*}) is not a minimizer of g: being the limit of a sequence lying outside K, (x^{*}) belongs to the boundary of K, so that (g(x^{*})=0>g(hat{x})).

From (4.22), (4.24),

$$frac{operatorname{dist}(x,S(x))}{operatorname{dist}(x,K)}= frac{1}{ Vert nabla g(y(x)) Vert Vert nabla g(x) Vert } biggl[ frac{g(x)}{lambda (x)} biggr],$$

(4.25)

where the notation (y(x)), (lambda (x)) emphasizes that both (y=P_{K}(x)) and the multiplier λ depend on x.

We look at the right-hand side of (4.25) for x close to (x^{*}in K), in which case y, by the continuity of (P_{K}), approaches (P_{K}(x^{*})=x^{*}), so that (nabla g(y(x))) approaches (nabla g(x^{*})ne 0), and hence, in view of (4.24), (lambda (x)) approaches 0. So, the first factor in the right-hand side of (4.25) approaches (Vert nabla g(x^{*}) Vert ^{-2}), but the quotient between brackets is indeterminate, because both its numerator and denominator approach 0, requiring a more precise first-order analysis.

Expanding (g(x)) around (x^{*}) and taking into account that (g(x^{*})=0), we get

$$g(x)=nabla gbigl(x^{*}bigr)^{top }bigl(x-x^{*}bigr)+obigl( biglVert x-x^{*} bigrVert bigr).$$

Define (t= Vert x-x^{*} Vert ) and (d=t^{-1}(x-x^{*})), so that (Vert d Vert =1) and the expansion above becomes

$$g(x)=tnabla gbigl(x^{*}bigr)^{top }d+o(t).$$

(4.26)

Now we look at (lambda (x)). Let (phi (t)=lambda (x^{*}+td)). Note that, for (xin partial K), we get (y(x)=x), so that (0=lambda (x)nabla g(x)) and hence (lambda (x)=0). Thus, (phi (0)=0) and

$$lambda (x)=phi (t)=tphi '_{+}(0)+o(t),$$

(4.27)

where (phi '_{+}(0)) denotes the right derivative of (phi (t)) at 0. Since we assume that (xnotin K), we have (y(x)in partial K) and hence, using (4.23),

$$0=gbigl(y(x)bigr)=gbigl(x-lambda (x)nabla gbigl(y(x) bigr)bigr)=gbigl(x^{*}+td-phi (t)nabla gbigl(y bigl(x^{*}+tdbigr)bigr)bigr)$$

(4.28)

for all (t>0). Let (sigma (t)=phi (t)nabla g(y(x^{*}+td))), (psi (t)=g(x^{*}+td-sigma (t))), so that (4.28) becomes (0=psi (t)=g(x^{*}+td-sigma (t))) for all (t>0) and hence

$$0=psi '(t)=nabla gbigl(ybigl(x^{*}+td bigr)bigr)^{top }bigl(d-sigma '(t)bigr).$$

(4.29)

Taking limits in (4.29) with (tto 0^{+}) and noting that (y(x^{*})=x^{*}) because (x^{*}in K), we get

$$0= nabla gbigl(x^{*}bigr)^{top } bigl(d-sigma '_{+}(0)bigr),$$

(4.30)

where (sigma '_{+}(0)) denotes the right derivative of (sigma (t)) at 0. We compute (sigma '_{+}(0)) directly from the definition, because we assume that g is of class (mathcal{C}^{1}) but perhaps not of class (mathcal{C}^{2}). Recalling that (phi (0)=0), we have that

begin{aligned}[b] sigma '_{+}(0)&=lim _{tto 0^{+}}frac{phi (t)}{t}nabla gbigl(y bigl(x^{*}+tdbigr)bigr)\ &= lim_{tto 0^{+}} frac{phi (t)}{t}lim_{tto 0^{+}}nabla gbigl(y bigl(x^{*}+tdbigr)bigr)= phi '_{+}(0) nabla gbigl(x^{*}bigr), end{aligned}

(4.31)

using the facts that g is of class (mathcal{C}^{1}) and that (y(x^{*})=x^{*}). Replacing (4.31) in (4.30), we get that (0=nabla g(x^{*})^{top }(d-phi '_{+}(0)nabla g(x^{*}))), and therefore

$$phi ‘_{+}(0)= frac{nabla g(x^{*})^{top }d}{ Vert nabla g(x^{*}) Vert ^{2}}.$$

(4.32)

Using (4.27) and (4.32), we obtain

$$lambda (x)= frac{tnabla g(x^{*})^{top }d}{ Vert nabla g(x^{*}) Vert ^{2}}+o(t)= frac{1}{ Vert nabla g(x^{*}) Vert ^{2}} bigl[tnabla gbigl(x^{*}bigr)^{ top }d +o(t)bigr].$$

(4.33)

Replacing (4.33) and (4.26) in (4.25), we obtain

begin{aligned} frac{operatorname{dist}(x,S(x))}{operatorname{dist}(x,K)}&= biggl[ frac{ Vert nabla g(x^{*}) Vert ^{2}}{ Vert nabla g(y(x)) Vert Vert nabla g(x) Vert } biggr] biggl[ frac{tnabla g(x^{*})^{top }d +o(t)}{tnabla g(x^{*})^{top }d +o(t)} biggr] \ &= biggl[ frac{ Vert nabla g(x^{*}) Vert ^{2}}{ Vert nabla g(y(x^{*}+td)) Vert Vert nabla g(x^{*}+td) Vert } biggr] biggl[ frac{nabla g(x^{*})^{top }d +o(t)/t}{nabla g(x^{*})^{top }d +o(t)/t} biggr]. end{aligned}

(4.34)

Now we recall that we must check the inequality of (LEB) only for points in V, and that (Vcap L_{nu }=emptyset ) with (L_{nu }={zin mathbb{R}^{n}: vert nabla g(x^{*})^{top }(z-x^{*}) vert le nu Vert z-x^{*} Vert }). So, for (xin V), we have (vert nabla g(x^{*})^{top }(x-x^{*}) vert ge nu Vert x-x^{*} Vert ), which implies (vert nabla g(x^{*})^{top }d vert ge nu ), i.e., (nabla g(x^{*})^{top }d) is bounded away from 0, independently of the direction d. In this situation, it is clear that the rightmost expression of (4.34) tends to 1 when (tto 0^{+}), and so there exists some (beta >0) such that, for (tin (0,beta )), such an expression is not smaller than (omega /bar{omega}), with ω as in (LEB) and ω̄ as in (LEB1). Without loss of generality, we assume that (beta le rho ), with ρ as in Assumption (LEB1). Since (t= Vert x-x^{*} Vert ), we have proved that, for (xin Ucap B(x^{*},beta )setminus L_{nu }=Ucap V), it holds that

$$frac{operatorname{dist}(x,S(x))}{operatorname{dist}(x,K)}ge frac{omega }{bar{omega}}.$$

(4.35)

It follows from (4.35) that

$$operatorname{dist}bigl(x,S(x)bigr)ge operatorname{dist}(x,K)frac{omega }{bar{omega}}$$

(4.36)

for all (xin Vcap U). Dividing both sides of (4.36) by (operatorname{dist}(x,Kcap U)), recalling that (beta le rho ), and invoking Assumption (LEB1), we obtain

$$frac{operatorname{dist}(x,S(x))}{operatorname{dist}(x,Kcap U)}ge frac{operatorname{dist}(x,K)}{operatorname{dist}(x,Kcap U)}frac{omega }{bar{omega}}ge bar{omega}frac{omega }{bar{omega}}=omega$$

for all (xin Ucap V), thus proving that (LEB) holds for any (omega <bar{omega}), with (V=B(x^{*},beta )setminus L_{nu }) and with ω̄ as in (LEB1). □
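The key limit in this proof, (operatorname{dist}(x,S(x))/operatorname{dist}(x,K)to 1) along non-tangential directions, can be observed directly on the illustrative choice (g(x)= Vert x Vert ^{2}-1) (the unit disk; an assumption for the demo). For this g one has (operatorname{dist}(x,K)= Vert x Vert -1) outside K and, by (4.22), (operatorname{dist}(x,S(x))=( Vert x Vert ^{2}-1)/(2 Vert x Vert )), so the quotient equals (( Vert x Vert +1)/(2 Vert x Vert )):

```python
import numpy as np

def g(x): return x @ x - 1.0
def grad_g(x): return 2.0 * x

x_star = np.array([1.0, 0.0])            # boundary point of K; grad g(x*) = (2, 0)
d = np.array([1.0, 1.0]) / np.sqrt(2.0)  # non-tangential: grad_g(x_star) @ d > 0

gaps = []
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    x = x_star + t * d
    dist_S = g(x) / np.linalg.norm(grad_g(x))  # formula (4.22)
    dist_K = np.linalg.norm(x) - 1.0           # distance to the unit disk
    gaps.append(abs(dist_S / dist_K - 1.0))

# the quotient approaches 1 as x -> x* along the fixed direction d
assert all(gaps[i + 1] < gaps[i] for i in range(len(gaps) - 1))
assert gaps[-1] < 1e-3
```
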

We have proved that for the prototypical separating operator given by (4.21), the result of Proposition 4.5 holds. In order to obtain the convergence rate result of Theorem 4.6 for this operator, we must prove that in this case the tail of the sequence ({x^{k}}_{kin mathbb{N}}) generated by CARM is contained in (V=B(x^{*},beta )setminus L_{nu }). Note that β depends on ν. Next we will show that if we take ν smaller than a certain constant which depends on (x^{*}), the initial iterate (x^{0}), the Slater point (hat{x}), and the parameter ω̄ of (LEB1), then the tail of the sequence ({x^{k}}_{kin mathbb{N}}) will remain outside (L_{nu }). Clearly, this will suffice, because the sequence eventually remains in any ball around its limit, which is (x^{*}), so that its tail will surely be contained in (B(x^{*},beta )). The fact that (x^{k}notin L_{nu }) for large enough k is a consequence of the Fejér monotonicity of the sequence with respect to (Kcap U), proved in Theorem 3.5. In the next proposition we will prove that indeed (x^{k}notin L_{nu }) for large enough k, and so the result of Theorem 4.6 holds for this separating operator.

### Proposition 4.8

Let (g:mathbb{R}^{n}to mathbb{R}) be convex, of class (mathcal{C}^{1}), and such that there exists (hat{x}in mathbb{R}^{n}) satisfying (g(hat{x})< 0). Take (K={xin mathbb{R}^{n}: g(x)le 0}). Assume that K, U satisfy (LEB1). Consider the separating operator S defined in (4.21). Let ({x^{k}}_{kin mathbb{N}}) be a sequence generated by CARM with starting point (x^{0}in U) and limit point (x^{*}in Kcap U). Take (nu >0) satisfying

$$nu < min biggl{ frac{bar{omega} vert g(hat{x}) vert }{4 ( Vert hat{x}-x^{*} Vert + Vert x^{*}-x^{0} Vert )}, frac{ Vert nabla g(x^{*}) Vert }{2} biggr} ,$$

(4.37)

with ω̄ as in (LEB1), and define

$$L_{nu }:=bigl{ zin mathbb{R}^{n}: biglvert nabla gbigl(x^{*}bigr)^{top }bigl(z-x^{*}bigr) bigrvert le nu biglVert z-x^{*} bigrVert bigr} .$$

Then there exists (k_{0}) such that, for all (kge k_{0}), (x^{k}in B(x^{*},beta )setminus L_{nu }), with β as in Proposition 4.7.

### Proof

Assume that (x^{k}in L_{nu }), i.e.,

$$biglvert nabla gbigl(x^{*} bigr)^{top }bigl(x^{k}-x^{*}bigr) bigrvert le nu biglVert x^{k}-x^{*} bigrVert .$$

(4.38)

Using the gradient inequality, the fact that (g(x^{*})=0), and (4.38), we obtain

begin{aligned} gbigl(x^{k}bigr) & le gbigl(x^{*}bigr)-nabla g bigl(x^{k}bigr)^{top }bigl(x^{*}-x^{k} bigr) \ & =bigl[nabla gbigl(x^{*}bigr)-nabla gbigl(x^{k} bigr) -nabla gbigl(x^{*}bigr)bigr]^{top } bigl(x^{*}-x^{k}bigr) \ & leq biglVert nabla gbigl(x^{*}bigr)-nabla g bigl(x^{k}bigr) bigrVert biglVert x^{*}-x^{k} bigrVert + biglvert nabla gbigl(x^{*}bigr)^{ top } bigl(x^{k}-x^{*}bigr) bigrvert \ & le bigl( biglVert nabla gbigl(x^{*}bigr)-nabla g bigl(x^{k}bigr) bigrVert + nu bigr) biglVert x^{k}-x^{*} bigrVert . end{aligned}

(4.39)

By Theorem 3.5, ({x^{k}}_{kin mathbb{N}}) is Fejér monotone with respect to (Kcap U). Thus, we use Proposition 2.3(iii) and (LEB1) in (4.39), obtaining

begin{aligned} gbigl(x^{k}bigr)& le 2 bigl( biglVert nabla g bigl(x^{*}bigr)-nabla gbigl(x^{k}bigr) bigrVert + nu bigr)operatorname{dist}bigl(x^{k},Kcap Ubigr) \ & le frac{2 ( Vert nabla g(x^{*})-nabla g(x^{k}) Vert +nu )operatorname{dist}(x^{k},K)}{bar{omega}}. end{aligned}

(4.40)

Denote (y^{k}=P_{K}(x^{k})). Using again the gradient inequality, together with the facts that (g(y^{k})=0) and that (x^{k}-y^{k}) and (nabla g(y^{k})) are collinear, which is a consequence of (4.23), and the nonnegativity of λ, we get

begin{aligned} gbigl(x^{k}bigr)& ge gbigl(y^{k}bigr)+nabla g bigl(y^{k}bigr)^{top }bigl(x^{k}-y^{k} bigr) \ &= biglVert nabla gbigl(y^{k}bigr) bigrVert biglVert x^{k}-y^{k} bigrVert = biglVert nabla g bigl(y^{k}bigr) bigrVert operatorname{dist}bigl(x^{k},Kbigr). end{aligned}

(4.41)
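For clarity, the collinearity step behind the equalities in (4.41) can be spelled out as follows, writing the multiplier from (4.23) as \(\lambda \ge 0\), so that \(x^{k}-y^{k}=\lambda \nabla g(y^{k})\):

```latex
\nabla g\bigl(y^{k}\bigr)^{\top}\bigl(x^{k}-y^{k}\bigr)
  = \lambda \bigl\Vert \nabla g\bigl(y^{k}\bigr) \bigr\Vert^{2}
  = \bigl\Vert \nabla g\bigl(y^{k}\bigr) \bigr\Vert
    \bigl\Vert \lambda \nabla g\bigl(y^{k}\bigr) \bigr\Vert
  = \bigl\Vert \nabla g\bigl(y^{k}\bigr) \bigr\Vert
    \bigl\Vert x^{k}-y^{k} \bigr\Vert ,
```

and \(\Vert x^{k}-y^{k} \Vert = \operatorname{dist}(x^{k},K)\) because \(y^{k}=P_{K}(x^{k})\).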

Now we use the Slater assumption on g for finding a lower bound for \(\Vert \nabla g(y^{k}) \Vert \). Take \(\hat{x}\) such that \(g(\hat{x})<0\), and apply once again the gradient inequality:

$$g(\hat{x})\ge g\bigl(y^{k}\bigr)+\nabla g\bigl(y^{k}\bigr)^{\top }\bigl(\hat{x}-y^{k}\bigr)=\nabla g\bigl(y^{k}\bigr)^{\top }\bigl(\hat{x}-y^{k}\bigr)\ge -\bigl\Vert \nabla g\bigl(y^{k}\bigr) \bigr\Vert \bigl\Vert \hat{x}-y^{k} \bigr\Vert .$$

(4.42)

Multiplying (4.42) by −1, we get

$$\begin{aligned} \bigl\lvert g(\hat{x}) \bigr\rvert & \le \bigl\Vert \nabla g\bigl(y^{k}\bigr) \bigr\Vert \bigl\Vert \hat{x}-y^{k} \bigr\Vert \le \bigl\Vert \nabla g\bigl(y^{k}\bigr) \bigr\Vert \bigl( \bigl\Vert \hat{x}-x^{*} \bigr\Vert + \bigl\Vert x^{*}-y^{k} \bigr\Vert \bigr) \\ & \le \bigl\Vert \nabla g\bigl(y^{k}\bigr) \bigr\Vert \bigl( \bigl\Vert \hat{x}-x^{*} \bigr\Vert + \bigl\Vert x^{*}-x^{k} \bigr\Vert \bigr) \\ & \le \bigl\Vert \nabla g\bigl(y^{k}\bigr) \bigr\Vert \bigl( \bigl\Vert \hat{x}-x^{*} \bigr\Vert + \bigl\Vert x^{*}-x^{0} \bigr\Vert \bigr), \end{aligned}$$

(4.43)

using the facts that \(y^{k}=P_{K}(x^{k})\) and that \(x^{*}\in K\) in the third inequality, and the Fejér monotonicity of \(\{x^{k}\}_{k\in \mathbb{N}}\) with respect to \(K\cap U\) in the fourth one. Now, since \(\lim_{k\to \infty } x^{k}=x^{*}\), there exists \(k_{1}\) such that \(\Vert x^{k}-x^{*} \Vert \le \rho \) for \(k\ge k_{1}\), with ρ as in (LEB1). So, in view of (4.43), for \(k\ge k_{1}\) we have \(\lvert g(\hat{x}) \rvert \le \Vert \nabla g(y^{k}) \Vert ( \Vert \hat{x}-x^{*} \Vert + \Vert x^{*}-x^{0} \Vert )\), implying that

$$\bigl\Vert \nabla g\bigl(y^{k}\bigr) \bigr\Vert \ge \frac{ \lvert g(\hat{x}) \rvert }{ \Vert \hat{x}-x^{*} \Vert + \Vert x^{*}-x^{0} \Vert }.$$

(4.44)

Combining (4.40), (4.41), (4.44), and (4.37), we obtain

$$2\nu < \frac{\bar{\omega} \lvert g(\hat{x}) \rvert }{2 ( \Vert \hat{x}-x^{*} \Vert + \Vert x^{*}-x^{0} \Vert )}\le \bigl\Vert \nabla g\bigl(x^{k}\bigr)-\nabla g\bigl(x^{*}\bigr) \bigr\Vert +\nu ,$$

implying

$$\nu < \bigl\Vert \nabla g\bigl(x^{k}\bigr)-\nabla g\bigl(x^{*}\bigr) \bigr\Vert .$$

(4.45)

The inequality in (4.45) has been obtained by assuming that \(x^{k}\in L_{\nu }\). Now, since \(\lim_{k\to \infty }x^{k}=x^{*}\) and g is of class \(\mathcal{C}^{1}\), there exists \(k_{0}\ge k_{1}\) such that \(\Vert \nabla g(x^{*})-\nabla g(x^{k}) \Vert \le \nu \) for \(k\ge k_{0}\), and hence (4.45) implies that, for \(k\ge k_{0}\), \(x^{k}\notin L_{\nu }\). Since \(k_{0}\ge k_{1}\), \(x^{k}\in B(x^{*},\beta )\) for \(k\ge k_{0}\), meaning that when \(k\ge k_{0}\), \(x^{k}\in B(x^{*},\beta )\setminus L_{\nu }\), establishing the result. □

Now we conclude the analysis of CARM with the prototypical separating operator, proving that, under smoothness of g and a Slater condition, the CARM method achieves linear convergence with precisely the same bound on the asymptotic constant as CRM, thus showing that the approximation of \(P_{K}\) by \(P^{S}\) produces no deterioration in the convergence rate. We emphasize again that, for this operator S, \(P^{S}\) has an elementary closed formula, namely

$$P^{S}(x)= x- \biggl(\frac{\max \{0,g(x)\}}{ \Vert \nabla g(x) \Vert ^{2}} \biggr)\nabla g(x).$$
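The closed formula above is immediate to implement: when \(g(x)\le 0\) the point is left unchanged, and otherwise x is projected onto the hyperplane \(\{z : g(x)+\nabla g(x)^{\top }(z-x)=0\}\) bounding the separating half-space. The following sketch illustrates this with \(g(x)=\Vert x\Vert ^{2}-1\) (so K is the unit ball); this choice of g is ours, purely for illustration, and is not taken from the paper.

```python
import numpy as np

def g(x):
    # Illustrative convex function: g(x) <= 0 iff x lies in the unit ball K.
    return float(x @ x - 1.0)

def grad_g(x):
    return 2.0 * x

def proj_S(x):
    """Approximate projection P^S(x) = x - (max{0, g(x)} / ||grad g(x)||^2) grad g(x).

    For g(x) > 0 this lands exactly on the hyperplane
    {z : g(x) + grad g(x)^T (z - x) = 0}; for g(x) <= 0 it returns x itself.
    """
    gx = max(0.0, g(x))
    if gx == 0.0:
        return x.copy()
    grad = grad_g(x)
    return x - (gx / (grad @ grad)) * grad
```

Note that \(P^{S}(x)\) need not belong to K itself: it reaches only the linearization of the constraint at x, which is exactly why only a separating half-space, and not K, is required.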

### Theorem 4.9

Let \(g:\mathbb{R}^{n}\to \mathbb{R}\) be convex, of class \(\mathcal{C}^{1}\), and such that there exists \(\hat{x}\in \mathbb{R}^{n}\) satisfying \(g(\hat{x})< 0\). Take \(K=\{x\in \mathbb{R}^{n}: g(x)\le 0\}\). Assume that K, U satisfy (LEB1). Consider the separating operator S defined in (4.21). Let \(\{x^{k}\}_{k\in \mathbb{N}}\) be a sequence generated by CARM with the starting point \(x^{0}\in U\). Then \(\{x^{k}\}_{k\in \mathbb{N}}\) converges to some \(x^{*}\in K\cap U\) with linear convergence rate and asymptotic constant bounded above by \(\sqrt{(1-\bar{\omega}^{2})/(1+\bar{\omega}^{2})}\), with ω̄ as in (LEB1).

### Proof

The fact that \(\{x^{k}\}_{k\in \mathbb{N}}\) converges to some \(x^{*}\in K\cap U\) follows from Theorem 3.5. Let ω̄ be the parameter in (LEB1). By Proposition 4.7, K, U, and S satisfy (LEB) with any parameter \(\omega \le \bar{\omega}\) and a suitable V. By Proposition 4.8, \(x^{k}\in V\) for large enough k, so that the assumptions of Theorem 4.6 hold, and hence

$$\limsup_{k\to \infty } \frac{ \Vert x^{k+1}-x^{*} \Vert }{ \Vert x^{k}-x^{*} \Vert } \le \sqrt{\frac{1-\omega ^{2}}{1+\omega ^{2}}}$$

(4.46)

for any \(\omega \le \bar{\omega}\). Taking the infimum in the right-hand side of (4.46) over \(\omega < \bar{\omega}\), we conclude that the inequality holds also for ω̄, i.e.,

$$\limsup_{k\to \infty }\frac{ \Vert x^{k+1}-x^{*} \Vert }{ \Vert x^{k}-x^{*} \Vert }\le \sqrt{ \frac{1-\bar{\omega}^{2}}{1+\bar{\omega}^{2}}},$$

completing the proof. □
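To make the construction concrete, the sketch below runs a toy instance of the method on \(g(x)=\Vert x\Vert ^{2}-1\) (K the unit ball) and an affine set U. Several ingredients here are our assumptions for illustration, not details taken from the paper: we take the CARM step to be the circumcenter of \(x\), \(R_{S}x\), and \(R_{U}R_{S}x\) (with \(R=2P-\operatorname{Id}\)), following the circumcentered-reflection construction, and we fall back to a midpoint step in the degenerate case where the three points are affinely dependent.

```python
import numpy as np

def g(x):                      # illustrative convex function: K is the unit ball
    return float(x @ x - 1.0)

def grad_g(x):
    return 2.0 * x

def proj_S(x):
    """P^S from the closed formula above (identity when g(x) <= 0)."""
    gx = max(0.0, g(x))
    if gx == 0.0:
        return x.copy()
    grad = grad_g(x)
    return x - (gx / (grad @ grad)) * grad

# U = {x : a^T x = b}, an affine set with an exact projection.
a, b = np.array([1.0, 1.0]), 1.0

def proj_U(x):
    return x - ((a @ x - b) / (a @ a)) * a

def reflect(proj, x):          # reflection R = 2P - Id
    return 2.0 * proj(x) - x

def circumcenter(x, y, z):
    """Point of aff{x, y, z} equidistant from x, y, z (2x2 Gram system)."""
    u, v = y - x, z - x
    G = np.array([[u @ u, u @ v], [u @ v, v @ v]])
    if abs(np.linalg.det(G)) < 1e-14:   # degenerate (affinely dependent) case:
        return 0.5 * (x + z)            # fall back to a midpoint step
    alpha, beta = np.linalg.solve(G, 0.5 * np.array([u @ u, v @ v]))
    return x + alpha * u + beta * v

def carm(x, iters=40):
    """Assumed CARM iteration: circumcenter of x, R_S x, and R_U R_S x."""
    for _ in range(iters):
        y = reflect(proj_S, x)
        z = reflect(proj_U, y)
        x = circumcenter(x, y, z)
    return x
```

Starting from \(x^{0}=(2,-1)\in U\), the iterates in this toy run approach the point \((1,0)\in K\cap U\) rapidly, consistent with the linear rate asserted by Theorem 4.9; the exact projection \(P_{K}\) is never evaluated, only the closed-form \(P^{S}\).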

We mention that the results of Propositions 4.7 and 4.8 and Theorem 4.9 extend without any complications to the separating operator S in Example 2.7, so that they can be applied to accelerate SiPM for CFPs with \(m\) convex sets given as 0-sublevel sets of smooth convex functions. We omit the details.

Let us continue with a comment on the additional assumptions on g used for proving Theorem 4.9, namely continuous differentiability and the Slater condition. We believe that the Slater condition is indeed needed for the validity of the result; regarding smoothness of g, we conjecture that the CARM sequence still converges linearly under (LEB) when g is nonsmooth, but with an asymptotic constant possibly larger than that of CRM. It seems clear that the proof of such a result requires techniques quite different from those used here.

Finally, we address the extension of the results in this paper to the framework of infinite-dimensional Hilbert spaces. We have refrained from developing our analysis in that setting because our main focus lies in extending the convergence-rate results for the exact algorithms of [9] to the approximate methods introduced here; in order to establish the appropriate comparisons between the exact and approximate methods, one would first have to rewrite the results of [9] in the Hilbert-space context, which would unduly extend the length of this paper. We only note that this aim can be attained following the approach presented in [11, 12].