The weak and strong laws of large numbers

Jordan Bell
May 29, 2015

1 Introduction

Using Egorov’s theorem, one proves that if a sequence of random variables $X_n$ converges almost surely to a random variable $X$, then $X_n$ converges in probability to $X$ (V. I. Bogachev, Measure Theory, volume I, p. 111, Theorem 2.2.3; http://individual.utoronto.ca/jordanbell/notes/L0.pdf, p. 3, Theorem 3).

Let $X_n$, $n \geq 1$, be a sequence of $L^1$ random variables. A weak law of large numbers is a statement that

\[
\frac{1}{n}\sum_{k=1}^{n} (X_k - E(X_k)) \tag{1}
\]

converges in probability to $0$. A strong law of large numbers is a statement that (1) converges almost surely to $0$. Thus, under the same hypotheses on the sequence of random variables, a strong law implies a weak law.

We shall prove the weak law of large numbers for a sequence of independent identically distributed $L^1$ random variables, and the strong law of large numbers under the same hypotheses. Although it thus suffices to prove the strong law in order to establish the weak law, we give separate proofs as an occasion to inspect different machinery. One reason to distinguish these laws is for cases when we impose different hypotheses (cf. Jordan M. Stoyanov, Counterexamples in Probability, third ed., p. 163, §15.3; Dieter Landers and Lothar Rogge, Identically distributed uncorrelated random variables not fulfilling the WLLN, Bull. Korean Math. Soc. 38 (2001), no. 3, 605–610).
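
As a numerical illustration of (1), the following is a minimal simulation sketch, assuming NumPy; the exponential distribution, the seed, and the sample sizes are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)

    # i.i.d. exponential random variables with E(X_k) = 1.
    n = 100_000
    X = rng.exponential(scale=1.0, size=n)

    # Running values of (1/n) sum_{k=1}^n (X_k - E(X_k)).
    running = np.cumsum(X - 1.0) / np.arange(1, n + 1)

    for m in (10, 100, 1_000, 10_000, 100_000):
        print(f"n = {m:>6}: (1/n) sum (X_k - E(X_k)) = {running[m - 1]:+.5f}")

A single run shows one sample path drifting toward $0$, which is what a strong law asserts for almost every path; a weak law concerns the probability of a large deviation at a fixed $n$.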

We also prove Markov’s weak law of large numbers, which states that if $X_n$ is a sequence of $L^2$ random variables that are pairwise uncorrelated and

\[
\frac{1}{n^2}\sum_{k=1}^{n} \operatorname{Var}(X_k) \to 0,
\]

then $\frac{1}{n}\sum_{k=1}^{n}(X_k - E(X_k))$ converges to $0$ in $L^2$, from which it follows using Chebyshev’s inequality that it converges in probability to $0$. (We remark that a sequence of $L^2$ random variables converging in $L^2$ to $0$ need not converge almost surely to $0$, although there is indeed a subsequence that converges almost surely to $0$; see http://individual.utoronto.ca/jordanbell/notes/L0.pdf, p. 4, Theorem 5.)

If $(\Omega, \mathcal{F}, P)$ is a probability space, $(Y, \mathcal{A})$ is a measurable space, and $T : (\Omega, \mathcal{F}) \to (Y, \mathcal{A})$ is measurable, the pushforward measure of $P$ by $T$ is

\[
(T_*P)(A) = P(T^{-1}(A)), \qquad A \in \mathcal{A}.
\]

Then $(Y, \mathcal{A}, T_*P)$ is a probability space. We remind ourselves of the change of variables theorem, which we shall use (Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker’s Guide, third ed., p. 485, Theorem 13.46). Let $f : Y \to \mathbb{R}$ be a function. On the one hand, if $f \in L^1(T_*P)$ then $f \circ T \in L^1(P)$ and

\[
\int_Y f \, d(T_*P) = \int_\Omega f \circ T \, dP. \tag{2}
\]

On the other hand, if $f$ is $T_*P$-measurable and $f \circ T \in L^1(P)$, then $f \in L^1(T_*P)$ and

\[
\int_Y f \, d(T_*P) = \int_\Omega f \circ T \, dP.
\]
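
The change of variables formula (2) lends itself to a Monte Carlo check; the following is a sketch, assuming NumPy, in which $P$ standard normal, $T(\omega) = \omega^2$, and $f(y) = e^{-y}$ are arbitrary choices, so that $T_*P$ is the chi-squared distribution with one degree of freedom.

    import numpy as np

    rng = np.random.default_rng(0)

    # Sample from P (standard normal on R).
    omega = rng.standard_normal(1_000_000)

    def T(w):
        return w ** 2   # T_* P is then chi-squared with 1 degree of freedom

    def f(y):
        return np.exp(-y)

    # Right-hand side of (2): integral over Omega of (f o T) dP.
    rhs = f(T(omega)).mean()

    # Left-hand side of (2): integral of f against T_* P, sampled directly.
    lhs = f(rng.chisquare(df=1, size=1_000_000)).mean()

    print(lhs, rhs)     # both approximate E(e^{-Z^2}) = 1/sqrt(3) = 0.5774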

2 The weak law of large numbers

Suppose that $X_n : (\Omega, \mathcal{F}, P) \to \mathbb{R}$, $n \geq 1$, are independent identically distributed $L^1$ random variables, and write

\[
S_n = \sum_{k=1}^{n} X_k,
\]

for which $E(S_n) = \sum_{k=1}^{n} E(X_k) = nE(X_1)$.

Lemma 1.

If the $X_n$ are $L^2$, then for any $\lambda > 0$ and any $n \geq 1$,

\[
P\left(\left|\frac{S_n}{n} - E(X_1)\right| \geq \lambda\right) \leq \frac{E(|X_1 - E(X_1)|^2)}{n\lambda^2}.
\]
Proof.

Using

\[
S_n^2 = \sum_{k=1}^{n} X_k^2 + \sum_{j \neq k} X_j X_k,
\]

we have

\begin{align*}
E\left(\left|\frac{S_n}{n} - E(X_1)\right|^2\right)
&= E\left(\sum_{k=1}^{n} \frac{X_k^2}{n^2} + \sum_{j \neq k} \frac{X_j X_k}{n^2} - 2E(X_1)\frac{S_n}{n} + E(X_1)^2\right) \\
&= \sum_{k=1}^{n} \frac{E(X_k^2)}{n^2} + \sum_{j \neq k} \frac{E(X_j X_k)}{n^2} - 2E(X_1)^2 + E(X_1)^2 \\
&= \sum_{k=1}^{n} \frac{E(X_1^2)}{n^2} + \sum_{j \neq k} \frac{E(X_j)E(X_k)}{n^2} - E(X_1)^2 \\
&= \frac{E(X_1^2)}{n} + \frac{n(n-1)}{n^2} E(X_1)^2 - E(X_1)^2 \\
&= \frac{E(X_1^2)}{n} - \frac{E(X_1)^2}{n}.
\end{align*}

On the other hand,

\[
E(|X_1 - E(X_1)|^2) = E(X_1^2 - 2E(X_1)X_1 + E(X_1)^2) = E(X_1^2) - E(X_1)^2.
\]

So

\[
E\left(\left|\frac{S_n}{n} - E(X_1)\right|^2\right) = \frac{1}{n} E(|X_1 - E(X_1)|^2).
\]

Using this and Chebyshev’s inequality,

\begin{align*}
P\left(\left|\frac{S_n}{n} - E(X_1)\right| \geq \lambda\right)
&\leq \frac{1}{\lambda^2} E\left(\left|\frac{S_n}{n} - E(X_1)\right|^2\right) \\
&= \frac{1}{n\lambda^2} E(|X_1 - E(X_1)|^2),
\end{align*}

proving the claim. ∎
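
The bound of Lemma 1 can be compared with simulated frequencies; a sketch, assuming NumPy, where the uniform distribution and the values of $n$ and $\lambda$ are arbitrary choices (for the uniform distribution on $[0,1]$, $E(X_1) = 1/2$ and $E(|X_1 - E(X_1)|^2) = 1/12$).

    import numpy as np

    rng = np.random.default_rng(0)

    n, lam, trials = 100, 0.05, 50_000
    X = rng.uniform(size=(trials, n))
    deviations = np.abs(X.mean(axis=1) - 0.5)

    empirical = (deviations >= lam).mean()
    bound = (1 / 12) / (n * lam ** 2)   # E(|X_1 - E(X_1)|^2) / (n lambda^2)
    print(f"empirical {empirical:.4f} <= bound {bound:.4f}")

Chebyshev’s inequality is far from sharp here; the empirical frequency comes out roughly a quarter of the bound.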

We now prove the weak law of large numbers, which states that if the $X_n$ are independent identically distributed $L^1$ random variables, then $\frac{S_n}{n}$ converges in probability to $E(X_1)$. The random variables $Z_n = X_n - E(X_n) = X_n - E(X_1)$ are independent identically distributed $L^1$ random variables with mean $0$, and if $\frac{1}{n}\sum_{k=1}^{n} Z_k$ converges in probability to $0$ then $\frac{S_n}{n} = \frac{1}{n}\sum_{k=1}^{n} Z_k + E(X_1)$ converges in probability to $E(X_1)$, showing that it suffices to prove the theorem when $E(X_1) = 0$ (Allan Gut, Probability: A Graduate Course, second ed., p. 270, Theorem 6.3.1, and p. 121, Theorem 3.1.5).

Theorem 2 (Weak law of large numbers).

Suppose that $E(X_1) = 0$. For each $\epsilon > 0$,

\[
P\left(\left|\frac{S_n}{n}\right| \geq \epsilon\right) \to 0, \qquad n \to \infty.
\]
Proof.

For $n \geq 1$ and $1 \leq k \leq n$, let

\[
Y_{n,k} = \mathbf{1}_{\{|X_k| \leq n\epsilon^3\}} X_k = f \circ X_k,
\]

where $f(x) = \mathbf{1}_{[-n\epsilon^3, n\epsilon^3]}(x)\, x$. Because $X_1, \ldots, X_n$ are independent and identically distributed, so are $Y_{n,1}, \ldots, Y_{n,n}$ (Gerald B. Folland, Real Analysis: Modern Techniques and Their Applications, second ed., p. 316, Proposition 10.2). Moreover, $E(Y_{n,k}^2) \leq (n\epsilon^3)^2$, so each $Y_{n,k}$ belongs to $L^2$. Let

\[
T_n = \sum_{k=1}^{n} Y_{n,k},
\]

for which $E(T_n) = nE(Y_{n,1})$.

If $\omega \in \bigcap_{k=1}^{n} \{|X_k| \leq n\epsilon^3\}$, then

\[
T_n(\omega) = \sum_{k=1}^{n} Y_{n,k}(\omega) = \sum_{k=1}^{n} X_k(\omega) = S_n(\omega),
\]

and using this and Lemma 1, for $t > 0$ we have

\begin{align*}
P(|S_n - E(T_n)| \geq t)
&= P\Big(\{|S_n - E(T_n)| \geq t\} \cap \bigcap_{k=1}^{n} \{|X_k| \leq n\epsilon^3\}\Big) \\
&\quad + P\Big(\{|S_n - E(T_n)| \geq t\} \cap \bigcup_{k=1}^{n} \{|X_k| > n\epsilon^3\}\Big) \\
&\leq P(|T_n - E(T_n)| \geq t) + P\Big(\bigcup_{k=1}^{n} \{|X_k| > n\epsilon^3\}\Big) \\
&= P\left(\left|\frac{T_n}{n} - E(Y_{n,1})\right| \geq \frac{t}{n}\right) + P\Big(\bigcup_{k=1}^{n} \{|X_k| > n\epsilon^3\}\Big) \\
&\leq \frac{E(|Y_{n,1} - E(Y_{n,1})|^2)}{n(t/n)^2} + \sum_{k=1}^{n} P(|X_k| > n\epsilon^3) \\
&= \frac{n}{t^2}\big(E(Y_{n,1}^2) - E(Y_{n,1})^2\big) + nP(|X_1| > n\epsilon^3) \\
&\leq \frac{n}{t^2} E(Y_{n,1}^2) + nP(|X_1| > n\epsilon^3).
\end{align*}

For $t = n\epsilon$ this is

\begin{align*}
P(|S_n - E(T_n)| \geq n\epsilon)
&\leq \frac{1}{n\epsilon^2} E(Y_{n,1}^2) + nP(|X_1| > n\epsilon^3) \\
&= \frac{1}{n\epsilon^2} E\big(\mathbf{1}_{\{|X_1| \leq n\epsilon^3\}} X_1^2\big) + nP(|X_1| > n\epsilon^3) \\
&\leq \frac{1}{n\epsilon^2} E\big(\mathbf{1}_{\{|X_1| \leq n\epsilon^3\}}\, n\epsilon^3 |X_1|\big) + nP(|X_1| > n\epsilon^3) \\
&\leq \epsilon E(|X_1|) + nP(|X_1| > n\epsilon^3).
\end{align*}

Now,

\[
n\epsilon^3 P(|X_1| > n\epsilon^3) = n\epsilon^3 \int_{\{|y| > n\epsilon^3\}} d((X_1)_*P)(y) \leq \int_{\{|y| > n\epsilon^3\}} |y| \, d((X_1)_*P)(y),
\]

which tends to $0$ as $n \to \infty$ because

\[
\int_{\mathbb{R}} |y| \, d((X_1)_*P)(y) = E(|X_1|) < \infty.
\]

In particular, $nP(|X_1| > n\epsilon^3) = \epsilon^{-3} \cdot n\epsilon^3 P(|X_1| > n\epsilon^3) \to 0$.

Therefore,

\[
\limsup_{n \to \infty} P(|S_n - E(T_n)| \geq n\epsilon) \leq \epsilon E(|X_1|),
\]

that is,

\[
\limsup_{n \to \infty} P\left(\left|\frac{S_n - E(T_n)}{n}\right| \geq \epsilon\right) \leq \epsilon E(|X_1|),
\]

which implies that $\frac{S_n - E(T_n)}{n}$ converges in probability to $0$.

Because $E(X_1) = 0$,

\[
E\big(\mathbf{1}_{\{|X_1| \leq n\epsilon^3\}} X_1\big) + E\big(\mathbf{1}_{\{|X_1| > n\epsilon^3\}} X_1\big) = 0,
\]

and hence

\[
|E(T_n)| = |nE(Y_{n,1})| = n\big|E\big(\mathbf{1}_{\{|X_1| \leq n\epsilon^3\}} X_1\big)\big| = n\big|E\big(\mathbf{1}_{\{|X_1| > n\epsilon^3\}} X_1\big)\big|,
\]

thus

\[
\frac{|E(T_n)|}{n} = \big|E\big(\mathbf{1}_{\{|X_1| > n\epsilon^3\}} X_1\big)\big| \leq E\big(\mathbf{1}_{\{|X_1| > n\epsilon^3\}} |X_1|\big),
\]

which tends to $0$ as $n \to \infty$ by the dominated convergence theorem, because $E(|X_1|) < \infty$. Thus $\frac{E(T_n)}{n}$ converges to $0$, and therefore $\frac{S_n}{n}$ converges in probability to $0$, completing the proof. ∎
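
The truncation device in the proof can be watched numerically; a sketch, assuming NumPy, where the Pareto distribution with tail index $1.5$ (which is $L^1$ but not $L^2$) and $\epsilon = 0.1$ are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)

    # Classical Pareto with tail index 1.5 on [1, inf): E(X_1) = 3 but E(X_1^2)
    # is infinite, so Lemma 1 does not apply directly; the proof truncates.
    alpha, eps, n = 1.5, 0.1, 200_000
    X = 1 + rng.pareto(alpha, size=n)
    mean = alpha / (alpha - 1)

    cutoff = n * eps ** 3                       # truncation level n * eps^3
    Y = np.where(np.abs(X) <= cutoff, X, 0.0)   # Y_{n,k} = 1_{|X_k| <= n eps^3} X_k

    print("fraction of samples changed by truncation:", np.mean(X != Y))
    print("S_n/n =", X.mean(), " T_n/n =", Y.mean(), " E(X_1) =", mean)

Only a vanishing fraction of samples is affected, yet the truncated variables are bounded, so Lemma 1 applies to them.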

Lemma 3 (Bienaymé’s formula).

If $X_n$, $n \geq 1$, are $L^2$ random variables that are pairwise uncorrelated, then

\[
\operatorname{Var}\left(\sum_{k=1}^{n} X_k\right) = \sum_{k=1}^{n} \operatorname{Var}(X_k).
\]
Proof.

Let $Y_k = X_k - E(X_k)$. Using that the $X_k$ are pairwise uncorrelated, for $j \neq k$ we have

\begin{align*}
E(Y_j Y_k) &= E\big(X_j X_k - X_j E(X_k) - X_k E(X_j) + E(X_j)E(X_k)\big) \\
&= E(X_j)E(X_k) - E(X_j)E(X_k) - E(X_k)E(X_j) + E(X_j)E(X_k) \\
&= 0,
\end{align*}

showing that the $Y_k$ are pairwise uncorrelated. Then, writing $S_n = \sum_{k=1}^{n} X_k$ and using $E(S_n) = \sum_{k=1}^{n} E(X_k)$,

\begin{align*}
\operatorname{Var}(S_n) &= E\left(\Big(S_n - \sum_{k=1}^{n} E(X_k)\Big)^2\right) \\
&= E\left(\Big(\sum_{k=1}^{n} Y_k\Big)^2\right) \\
&= E\left(\sum_{k=1}^{n} Y_k^2 + \sum_{j \neq k} Y_j Y_k\right) \\
&= \sum_{k=1}^{n} E(Y_k^2) + \sum_{j \neq k} E(Y_j Y_k) \\
&= \sum_{k=1}^{n} E(Y_k^2),
\end{align*}

and as $E(Y_k^2) = \operatorname{Var}(X_k)$, we have $\operatorname{Var}(S_n) = \sum_{k=1}^{n} \operatorname{Var}(X_k)$, proving the claim. ∎
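
Bienaymé’s formula does not require independence; a sketch, assuming NumPy, using the standard example of pairwise uncorrelated variables $Z_1$, $Z_2$, $Z_1 Z_2$ built from independent random signs.

    import numpy as np

    rng = np.random.default_rng(0)

    # Z1, Z2 independent +-1 signs; Z1*Z2 is uncorrelated with each of Z1, Z2,
    # but the three variables are not independent.
    trials = 1_000_000
    Z1 = rng.choice([-1.0, 1.0], size=trials)
    Z2 = rng.choice([-1.0, 1.0], size=trials)
    X = np.stack([Z1, Z2, Z1 * Z2])

    print("Var(sum):   ", X.sum(axis=0).var())
    print("sum of Vars:", X.var(axis=1).sum())   # both approximately 3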

We now prove a weak law of large numbers, sometimes attributed to Markov, that supposes neither that the random variables are independent nor that they are identically distributed. We remind ourselves that if a sequence $Y_n$ of $L^2$ random variables converges to $0$ in $L^2$ then it converges in probability to $0$; this is proved using Chebyshev’s inequality.

Theorem 4 (Markov’s weak law of large numbers).

If $X_n$, $n \geq 1$, are $L^2$ random variables that are pairwise uncorrelated and which satisfy

\[
\frac{1}{n^2}\sum_{k=1}^{n} \operatorname{Var}(X_k) \to 0,
\]

then $\frac{1}{n}\sum_{k=1}^{n}(X_k - E(X_k))$ converges to $0$ in $L^2$ and thus in probability.

Proof.

Because $E\big(\sum_{k=1}^{n} X_k\big) = \sum_{k=1}^{n} E(X_k)$ and $\operatorname{Var}(X) = E((X - E(X))^2)$, using Bienaymé’s formula we get

\begin{align*}
E\left(\Big(\frac{1}{n}\sum_{k=1}^{n}(X_k - E(X_k))\Big)^2\right)
&= \frac{1}{n^2} E\left(\Big(\sum_{k=1}^{n} X_k - \sum_{k=1}^{n} E(X_k)\Big)^2\right) \\
&= \frac{1}{n^2} \operatorname{Var}\left(\sum_{k=1}^{n} X_k\right) \\
&= \frac{1}{n^2} \sum_{k=1}^{n} \operatorname{Var}(X_k).
\end{align*}

By hypothesis, $\frac{1}{n^2}\sum_{k=1}^{n} \operatorname{Var}(X_k) \to 0$ as $n \to \infty$, namely, $\frac{1}{n}\sum_{k=1}^{n}(X_k - E(X_k))$ converges to $0$ in $L^2$ as $n \to \infty$, proving the claim. ∎
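
A sketch of Theorem 4 in action, assuming NumPy; the variables below are independent (hence pairwise uncorrelated) but not identically distributed, with $\operatorname{Var}(X_k) = \sqrt{k}$, so $\frac{1}{n^2}\sum_{k=1}^{n}\operatorname{Var}(X_k)$ is of order $n^{-1/2}$ and the hypothesis holds.

    import numpy as np

    rng = np.random.default_rng(0)

    n = 100_000
    k = np.arange(1, n + 1)
    means = k % 3                                 # E(X_k) varies with k
    X = rng.normal(loc=means, scale=k ** 0.25)    # Var(X_k) = sqrt(k)

    centered_avg = np.cumsum(X - means) / k       # (1/n) sum (X_k - E(X_k))
    for m in (100, 1_000, 10_000, 100_000):
        print(f"n = {m:>6}: {centered_avg[m - 1]:+.5f}")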

3 Ergodic theory

We here assemble machinery and results that we will use to prove the strong law of large numbers. For a probability space $(\Omega, \mathcal{F}, P)$, a function $T : \Omega \to \Omega$ is said to be a measure-preserving transformation if (i) it is measurable and (ii) $T_*P = P$. To say that $T_*P = P$ means that for any $A \in \mathcal{F}$, $(T_*P)(A) = P(A)$, i.e. $P(T^{-1}(A)) = P(A)$.

A collection $\mathcal{A}$ of subsets of $\Omega$ is called an algebra of sets if (i) $\emptyset \in \mathcal{A}$, (ii) $\Omega \in \mathcal{A}$, and (iii) if $A, B \in \mathcal{A}$ then $A \cup B \in \mathcal{A}$ and $A \setminus B \in \mathcal{A}$. If $\mathcal{G}$ is a nonempty collection of subsets of $\Omega$, it is a fact that there is a unique algebra of sets $A(\mathcal{G})$ that (i) contains $\mathcal{G}$ and (ii) is contained in any algebra of sets that contains $\mathcal{G}$. We call $A(\mathcal{G})$ the algebra of sets generated by $\mathcal{G}$.

A collection $\mathcal{S}$ of subsets of $\Omega$ is called a semialgebra of sets if (i) $\emptyset \in \mathcal{S}$, (ii) $\Omega \in \mathcal{S}$, (iii) if $A, B \in \mathcal{S}$ then $A \cap B \in \mathcal{S}$, and (iv) if $A, B \in \mathcal{S}$ then there are pairwise disjoint $E_1, \ldots, E_n \in \mathcal{S}$ such that

\[
A \setminus B = \bigcup_{k=1}^{n} E_k.
\]

It is a fact that $A(\mathcal{S})$ is equal to the collection of all unions of finitely many pairwise disjoint elements of $\mathcal{S}$ (V. I. Bogachev, Measure Theory, volume I, p. 8, Lemma 1.2.14).

A nonempty collection $\mathcal{M}$ of subsets of $\Omega$ is called a monotone class if (i) whenever $A_n \in \mathcal{M}$ and $A_n \subset A_{n+1}$ (an increasing sequence of sets), we have $\bigcup_n A_n \in \mathcal{M}$, and (ii) whenever $A_n \in \mathcal{M}$ and $A_{n+1} \subset A_n$ (a decreasing sequence of sets), we have $\bigcap_n A_n \in \mathcal{M}$. In other words, a monotone class is a nonempty collection $\mathcal{M}$ of subsets of $\Omega$ such that if $A_n$ is a monotone sequence in $\mathcal{M}$ then $\lim_n A_n \in \mathcal{M}$. If $\mathcal{G}$ is a nonempty collection of subsets of $\Omega$, it is a fact that there is a unique monotone class $M(\mathcal{G})$ that (i) contains $\mathcal{G}$ and (ii) is contained in any monotone class that contains $\mathcal{G}$. We call $M(\mathcal{G})$ the monotone class generated by $\mathcal{G}$. The monotone class theorem states that if $\mathcal{A}$ is an algebra of sets then $\sigma(\mathcal{A}) = M(\mathcal{A})$ (V. I. Bogachev, Measure Theory, volume I, p. 33, Theorem 1.9.3).

The following lemma gives conditions under which we can establish that a function is measure-preserving (Peter Walters, An Introduction to Ergodic Theory, p. 20, Theorem 1.1).

Lemma 5.

Let $T : \Omega \to \Omega$ be a function and suppose that $\mathcal{S}$ is a semialgebra of sets for which $\mathcal{F} = \sigma(\mathcal{S})$, the $\sigma$-algebra generated by $\mathcal{S}$. If (i) $T^{-1}(A) \in \mathcal{F}$ for each $A \in \mathcal{S}$ and (ii) $P(T^{-1}(A)) = P(A)$ for each $A \in \mathcal{S}$, then $T$ is measure-preserving.

Proof.

Let

\[
\mathcal{C} = \{A \in \mathcal{F} : T^{-1}(A) \in \mathcal{F}, \ P(T^{-1}(A)) = P(A)\};
\]

we wish to prove that $\mathcal{C} = \mathcal{F}$.

If $A_n \in \mathcal{C}$ is an increasing sequence, let $A = \bigcup_{n=1}^{\infty} A_n$. Then, as $T^{-1}(A_n) \in \mathcal{F}$ for each $n$,

\[
T^{-1}(A) = \bigcup_{n=1}^{\infty} T^{-1}(A_n) \in \mathcal{F},
\]

and as (i) $T^{-1}(A_n)$ is an increasing sequence, (ii) $P$ is continuous from below (V. I. Bogachev, Measure Theory, volume I, p. 9, Proposition 1.3.3), and (iii) $P(T^{-1}(A_n)) = P(A_n)$,

\begin{align*}
P(T^{-1}(A)) &= P\left(\bigcup_{n=1}^{\infty} T^{-1}(A_n)\right) \\
&= \lim_{n \to \infty} P(T^{-1}(A_n)) \\
&= \lim_{n \to \infty} P(A_n) \\
&= P\left(\bigcup_{n=1}^{\infty} A_n\right) \\
&= P(A),
\end{align*}

and hence $A \in \mathcal{C}$. If $A_n \in \mathcal{C}$ is a decreasing sequence, let $A = \bigcap_{n=1}^{\infty} A_n$. Because $T^{-1}(A_n) \in \mathcal{F}$,

\[
T^{-1}(A) = \bigcap_{n=1}^{\infty} T^{-1}(A_n) \in \mathcal{F},
\]

and as (i) $T^{-1}(A_n)$ is a decreasing sequence, (ii) $P$ is continuous from above, and (iii) $P(T^{-1}(A_n)) = P(A_n)$,

\begin{align*}
P(T^{-1}(A)) &= P\left(\bigcap_{n=1}^{\infty} T^{-1}(A_n)\right) \\
&= \lim_{n \to \infty} P(T^{-1}(A_n)) \\
&= \lim_{n \to \infty} P(A_n) \\
&= P\left(\bigcap_{n=1}^{\infty} A_n\right) \\
&= P(A),
\end{align*}

and hence $A \in \mathcal{C}$. Therefore, $\mathcal{C}$ is a monotone class.

By hypothesis, $\mathcal{S} \subset \mathcal{C}$. If $A \in A(\mathcal{S})$, then there are pairwise disjoint $A_1, \ldots, A_n \in \mathcal{S}$ with $A = \bigcup_{k=1}^{n} A_k$. As $T^{-1}(A_k) \in \mathcal{F}$,

\[
T^{-1}(A) = \bigcup_{k=1}^{n} T^{-1}(A_k) \in \mathcal{F}.
\]

As the $A_k$ are pairwise disjoint, so are the sets $T^{-1}(A_k)$, so, because $P(T^{-1}(A_k)) = P(A_k)$,

\begin{align*}
P(T^{-1}(A)) &= P\left(\bigcup_{k=1}^{n} T^{-1}(A_k)\right) \\
&= \sum_{k=1}^{n} P(T^{-1}(A_k)) \\
&= \sum_{k=1}^{n} P(A_k) \\
&= P\left(\bigcup_{k=1}^{n} A_k\right) \\
&= P(A).
\end{align*}

Therefore $A \in \mathcal{C}$. This shows that $A(\mathcal{S}) \subset \mathcal{C}$.

The monotone class theorem tells us $\sigma(A(\mathcal{S})) = M(A(\mathcal{S}))$. On the one hand, $\mathcal{F} = \sigma(\mathcal{S})$ and hence $\mathcal{F} = \sigma(A(\mathcal{S}))$. On the other hand, $A(\mathcal{S}) \subset \mathcal{C}$ and the fact that $\mathcal{C}$ is a monotone class yield

\[
M(A(\mathcal{S})) \subset M(\mathcal{C}) = \mathcal{C}.
\]

Therefore

\[
\mathcal{F} = \sigma(A(\mathcal{S})) = M(A(\mathcal{S})) \subset \mathcal{C}.
\]

Of course $\mathcal{C} \subset \mathcal{F}$, so $\mathcal{C} = \mathcal{F}$, proving the claim. ∎

Let $(\Omega, \mathcal{F}, P)$ be a probability space. A measure-preserving transformation $T : \Omega \to \Omega$ is called ergodic if $A \in \mathcal{F}$ and $T^{-1}(A) = A$ imply that $P(A) = 0$ or $P(A) = 1$. It is proved using the Birkhoff ergodic theorem (Peter Walters, An Introduction to Ergodic Theory, p. 37, Corollary 1.14.2) that a measure-preserving transformation $T : \Omega \to \Omega$ is ergodic if and only if for all $A, B \in \mathcal{F}$,

\[
\frac{1}{n}\sum_{k=0}^{n-1} P(T^{-k}(A) \cap B) \to P(A)P(B).
\]

A measure-preserving transformation $T : \Omega \to \Omega$ is called weak-mixing if for all $A, B \in \mathcal{F}$,

\[
\frac{1}{n}\sum_{k=0}^{n-1} \left|P(T^{-k}(A) \cap B) - P(A)P(B)\right| \to 0.
\]

It is immediate that a weak-mixing transformation is ergodic.

A measure-preserving transformation $T : \Omega \to \Omega$ is called strong-mixing if for all $A, B \in \mathcal{F}$,

\[
P(T^{-n}(A) \cap B) \to P(A)P(B).
\]

If a sequence of real numbers $a_n$ tends to $0$, then

\[
\frac{1}{n}\sum_{k=0}^{n-1} |a_k| \to 0,
\]

and using this we check that a strong-mixing transformation is weak-mixing.
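
This Cesàro fact is elementary to verify numerically; a sketch, assuming NumPy, with $a_k = \cos(k)/\sqrt{k}$ as an arbitrary choice of sequence tending to $0$.

    import numpy as np

    k = np.arange(1, 1_000_001)
    a = np.cos(k) / np.sqrt(k)            # a_k -> 0
    cesaro = np.cumsum(np.abs(a)) / k     # (1/n) sum_{k<=n} |a_k|
    print(a[-1], cesaro[-1])              # the averages decay on the order of 1/sqrt(n)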

The following statement gives conditions under which a measure-preserving transformation is ergodic, weak-mixing, or strong-mixing (Peter Walters, An Introduction to Ergodic Theory, p. 41, Theorem 1.17).

Theorem 6.

Let $(\Omega, \mathcal{F}, P)$ be a probability space, let $\mathcal{S}$ be a semialgebra that generates $\mathcal{F}$, and let $T : \Omega \to \Omega$ be a measure-preserving transformation.

  1. $T$ is ergodic if and only if for all $A, B \in \mathcal{S}$,

     \[
     \frac{1}{n}\sum_{k=0}^{n-1} P(T^{-k}(A) \cap B) \to P(A)P(B).
     \]

  2. $T$ is weak-mixing if and only if for all $A, B \in \mathcal{S}$,

     \[
     \frac{1}{n}\sum_{k=0}^{n-1} \left|P(T^{-k}(A) \cap B) - P(A)P(B)\right| \to 0.
     \]

  3. $T$ is strong-mixing if and only if for all $A, B \in \mathcal{S}$,

     \[
     P(T^{-n}(A) \cap B) \to P(A)P(B).
     \]

4 The strong law of large numbers

Let $\mu$ be a Borel probability measure on $\mathbb{R}$ with finite first moment:

\[
\int_{\mathbb{R}} |x| \, d\mu(x) < \infty.
\]

We shall specify when we use the hypothesis that $\mu$ has finite first moment; until we say so, we suppose merely that $\mu$ is a Borel probability measure on $\mathbb{R}$.

For $n \geq 0$, let $\Omega_n = \mathbb{R}$, let $\mathcal{B}_n = \mathcal{B}_{\mathbb{R}}$, the Borel $\sigma$-algebra of $\mathbb{R}$, and let $\mu_n = \mu$, for which $(\Omega_n, \mathcal{B}_n, \mu_n)$ is a probability space. Let $\Omega = \prod_{n=0}^{\infty} \Omega_n$. A cylinder set is a subset of $\Omega$ of the form

\[
\prod_{n=0}^{\infty} A_n,
\]

where $A_n \in \mathcal{B}_n$ for each $n$ and where $\{n \geq 0 : A_n \neq \Omega_n\}$ is finite. We denote by $\mathcal{C}$ the collection of all cylinder sets. It is a fact that $\mathcal{C}$ is a semialgebra of sets (S. J. Taylor, Introduction to Measure Theory and Integration, p. 136, §6.1; http://individual.utoronto.ca/jordanbell/notes/productmeasure.pdf).

The product $\sigma$-algebra is $\mathcal{F} = \sigma(\mathcal{C})$, the $\sigma$-algebra generated by the collection of all cylinder sets. The product measure (http://individual.utoronto.ca/jordanbell/notes/productmeasure.pdf), which we denote by $P$, is the unique probability measure on $\mathcal{F}$ such that for any cylinder set $\prod_{n=0}^{\infty} A_n$,

\[
P\left(\prod_{n=0}^{\infty} A_n\right) = \prod_{n=0}^{\infty} \mu_n(A_n).
\]

Let $\pi_n : \Omega \to \Omega_n$, $n \geq 0$, be the projection maps. Define $\tau : \Omega \to \Omega$ by

\[
\tau(\omega_0, \omega_1, \ldots) = (\omega_1, \omega_2, \ldots),
\]

the left shift map. In other words, for $n \geq 0$, $\tau$ satisfies $\pi_n \circ \tau = \pi_{n+1}$, and so $\pi_n = \pi_0 \circ \tau^n$.

Lemma 7.

$\tau$ is measure-preserving.

Proof.

For $A = \prod_{n=0}^{\infty} A_n \in \mathcal{C}$,

\[
\tau^{-1}(A) = \prod_{n=0}^{\infty} B_n = B,
\]

where $B_0 = \Omega_0$ and, for $n \geq 1$, $B_n = A_{n-1}$. $B$ is a cylinder set, so a fortiori belongs to $\mathcal{F}$, and

\[
P(B) = \prod_{n=0}^{\infty} \mu_n(B_n) = \mu_0(\Omega_0) \prod_{n=1}^{\infty} \mu_n(B_n) = \prod_{n=1}^{\infty} \mu_n(A_{n-1}) = \prod_{n=1}^{\infty} \mu_{n-1}(A_{n-1}),
\]

so

\[
P(\tau^{-1}(A)) = \prod_{n=0}^{\infty} \mu_n(A_n) = P(A).
\]

Therefore, because $\mathcal{C}$ is a semialgebra that generates $\mathcal{F}$, it follows from Lemma 5 that $\tau$ is measure-preserving. ∎

Lemma 8.

$\tau$ is strong-mixing.

Proof.

Let $A = \prod_{n=0}^{\infty} A_n$ and $B = \prod_{n=0}^{\infty} B_n$ be cylinder sets. For $n \geq 0$,

\[
\tau^{-n}(A) = \prod_{m=0}^{\infty} C_m,
\]

where $C_m = \Omega_m$ for $0 \leq m \leq n-1$ and $C_m = A_{m-n}$ for $m \geq n$. Because $A$ and $B$ are cylinder sets, there is some $N$ such that $A_m = \Omega_m$ and $B_m = \Omega_m$ when $m \geq N$. Thus for $n \geq N$,

\begin{align*}
\tau^{-n}(A) \cap B &= \prod_{m=0}^{\infty} C_m \cap \prod_{m=0}^{\infty} B_m \\
&= \prod_{m=0}^{\infty} (C_m \cap B_m) \\
&= \prod_{m=0}^{n-1} (C_m \cap B_m) \times \prod_{m=n}^{\infty} (C_m \cap B_m) \\
&= \prod_{m=0}^{n-1} (\Omega_m \cap B_m) \times \prod_{m=n}^{\infty} (A_{m-n} \cap \Omega_m) \\
&= \prod_{m=0}^{n-1} B_m \times \prod_{m=n}^{\infty} A_{m-n}.
\end{align*}

Hence, using $\mu_{m+n} = \mu_m = \mu$ and $\mu_m(B_m) = 1$ for $m \geq N$,

\begin{align*}
P(\tau^{-n}(A) \cap B) &= \prod_{m=0}^{n-1} \mu_m(B_m) \prod_{m=n}^{\infty} \mu_m(A_{m-n}) \\
&= \prod_{m=0}^{n-1} \mu_m(B_m) \prod_{m=0}^{\infty} \mu_{m+n}(A_m) \\
&= \prod_{m=0}^{n-1} \mu_m(B_m) \prod_{m=0}^{\infty} \mu_m(A_m) \\
&= P(B)P(A).
\end{align*}

That is, there is some $N$ such that when $n \geq N$,

\[
P(\tau^{-n}(A) \cap B) = P(A)P(B),
\]

and so a fortiori,

\[
\lim_{n \to \infty} P(\tau^{-n}(A) \cap B) = P(A)P(B).
\]

Therefore, because the cylinder sets form a semialgebra that generates the $\sigma$-algebra $\mathcal{F}$, Theorem 6 tells us that $\tau$ is strong-mixing. ∎
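
To make Lemma 8 concrete, here is a Monte Carlo sketch, assuming NumPy; taking $\mu$ to be the standard normal distribution and the cylinder sets $A = \{\omega : \omega_0 > 0\}$, $B = \{\omega : \omega_0 > 1\}$ are arbitrary choices. Since $\tau^{-n}(A) = \{\omega : \omega_n > 0\}$ and the coordinates are independent under the product measure, $P(\tau^{-n}(A) \cap B)$ equals $P(A)P(B)$ already for $n \geq 1$.

    import numpy as np

    rng = np.random.default_rng(0)

    # Coordinates omega_0, ..., omega_n are i.i.d. mu (standard normal here).
    trials, n = 1_000_000, 5
    omega = rng.standard_normal((trials, n + 1))

    in_shifted_A = omega[:, n] > 0   # tau^{-n}(A) = {omega_n > 0}
    in_B = omega[:, 0] > 1           # B = {omega_0 > 1}

    lhs = np.mean(in_shifted_A & in_B)                         # P(tau^{-n}(A) cap B)
    rhs = np.mean(omega[:, 0] > 0) * np.mean(omega[:, 0] > 1)  # P(A) P(B)
    print(lhs, rhs)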

We now use the hypothesis that $\mu$ has finite first moment (Elias M. Stein and Rami Shakarchi, Functional Analysis, Princeton Lectures in Analysis, volume IV, p. 208, chapter 5, §2.1).

Lemma 9.

$\pi_0 \in L^1(P)$.

Proof.

$T = \pi_0 : \Omega \to \Omega_0$ is measurable, and $T_*P = \mu_0$. The statement that $\mu = \mu_0$ has finite first moment means that $f : \Omega_0 \to \mathbb{R}$ defined by $f(x) = |x|$ belongs to $L^1(\mu_0)$. Therefore by the change of variables theorem (2), $f \circ T = |\pi_0| \in L^1(P)$, i.e. $\pi_0 \in L^1(P)$. ∎

Lemma 10.

For almost all $\omega \in \Omega$,

\[
\frac{1}{n}\sum_{k=0}^{n-1} \pi_k(\omega) \to \int_{\mathbb{R}} t \, d\mu(t).
\]

Proof.

By Lemma 8, $\tau$ is strong-mixing, hence weak-mixing, hence ergodic. Because $\tau : \Omega \to \Omega$ is ergodic and $\pi_0 \in L^1(P)$, the Birkhoff ergodic theorem (Peter Walters, An Introduction to Ergodic Theory, p. 34, Theorem 1.14) tells us that for almost all $\omega \in \Omega$,

\[
\frac{1}{n}\sum_{k=0}^{n-1} \pi_0(\tau^k(\omega)) \to \int_\Omega \pi_0(\omega) \, dP(\omega),
\]

i.e., as $\pi_0 \circ \tau^k = \pi_k$ and $(\pi_0)_*P = \mu_0$,

\[
\frac{1}{n}\sum_{k=0}^{n-1} \pi_k(\omega) \to \int_{\Omega_0} \omega_0 \, d\mu_0(\omega_0),
\]

proving the claim. ∎

We will use Lemma 10 to prove the strong law of large numbers. First we prove two lemmas about joint distributions (Elias M. Stein and Rami Shakarchi, Functional Analysis, Princeton Lectures in Analysis, volume IV, p. 208, Lemma 2.2, Lemma 2.3).

Lemma 11.

If $X_1, \ldots, X_n : (\Omega, \mathcal{F}, P) \to \mathbb{R}$ and $Y_1, \ldots, Y_n : (\Omega', \mathcal{F}', P') \to \mathbb{R}$ are random variables with the same joint distribution, and for each $1 \leq k \leq n$, $\Phi_k : \mathbb{R}^n \to \mathbb{R}$ is a continuous function, then for $U_k = \Phi_k(X_1, \ldots, X_n)$ and $V_k = \Phi_k(Y_1, \ldots, Y_n)$, the random variables $U_1, \ldots, U_n$ and $V_1, \ldots, V_n$ have the same joint distribution.

Proof.

Write $X = (X_1, \ldots, X_n)$, $Y = (Y_1, \ldots, Y_n)$, $U = (U_1, \ldots, U_n)$, and $V = (V_1, \ldots, V_n)$, which are Borel measurable. Let $\Phi = (\Phi_1, \ldots, \Phi_n)$, which is continuous $\mathbb{R}^n \to \mathbb{R}^n$, and for which

\[
U = \Phi(X), \qquad V = \Phi(Y).
\]

To show that $U_1, \ldots, U_n$ and $V_1, \ldots, V_n$ have the same joint distribution means to show that $U_*P = V_*P'$. Let $A \in \mathcal{B}_{\mathbb{R}^n}$, for which $\Phi^{-1}(A) \in \mathcal{B}_{\mathbb{R}^n}$ because $\Phi$ is continuous, and

\begin{align*}
(U_*P)(A) &= P(U^{-1}(A)) \\
&= P(X^{-1}(\Phi^{-1}(A))) \\
&= P'(Y^{-1}(\Phi^{-1}(A))) \\
&= P'(V^{-1}(A)) \\
&= (V_*P')(A),
\end{align*}

where, because $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ have the same joint distribution,

\[
P(X^{-1}(\Phi^{-1}(A))) = (X_*P)(\Phi^{-1}(A)) = (Y_*P')(\Phi^{-1}(A)) = P'(Y^{-1}(\Phi^{-1}(A))).
\]

∎

Lemma 12.

If sequences of random variables $X_n : (\Omega, \mathcal{F}, P) \to \mathbb{R}$ and $Y_n : (\Omega', \mathcal{F}', P') \to \mathbb{R}$, $n \geq 1$, have the same finite-dimensional distributions and there is some $a \in \mathbb{R}$ such that $X_n$ converges to $a$ almost surely, then $Y_n$ converges to $a$ almost surely.

Proof.

That the sequences $X_n$ and $Y_n$ have the same finite-dimensional distributions means that for each $n \geq 1$,

\[
(X_1, \ldots, X_n)_*P = (Y_1, \ldots, Y_n)_*P',
\]

i.e., for each $n \geq 1$ and for each $A \in \mathcal{B}_{\mathbb{R}^n}$,

\[
P((X_1, \ldots, X_n) \in A) = P'((Y_1, \ldots, Y_n) \in A).
\]

Define

\begin{align*}
E_{k,N,n} &= \left\{\omega \in \Omega : |X_n(\omega) - a| \leq \tfrac{1}{k}\right\} \\
F_{k,N,n} &= \left\{\omega \in \Omega' : |Y_n(\omega) - a| \leq \tfrac{1}{k}\right\}.
\end{align*}

Then define

\begin{align*}
E &= \bigcap_{k=1}^{\infty} \bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty} E_{k,N,n} = \bigcap_{k=1}^{\infty} \bigcup_{N=1}^{\infty} E_{k,N} = \bigcap_{k=1}^{\infty} E_k \\
F &= \bigcap_{k=1}^{\infty} \bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty} F_{k,N,n} = \bigcap_{k=1}^{\infty} \bigcup_{N=1}^{\infty} F_{k,N} = \bigcap_{k=1}^{\infty} F_k,
\end{align*}

where $E_{k,N} = \bigcap_{n=N}^{\infty} E_{k,N,n}$ and $E_k = \bigcup_{N=1}^{\infty} E_{k,N}$, and similarly for $F$, and

\begin{align*}
G_{k,N,n} &= \bigcap_{m=N}^{n} E_{k,N,m} \\
H_{k,N,n} &= \bigcap_{m=N}^{n} F_{k,N,m}.
\end{align*}

Because $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$ have the same distribution,

\[
P(G_{k,N,n}) = P'(H_{k,N,n}).
\]

But

\[
E_{k,N} = \bigcap_{n=N}^{\infty} E_{k,N,n} = \bigcap_{n=N}^{\infty} G_{k,N,n} = \lim_{n \to \infty} G_{k,N,n}
\]

and

\[
F_{k,N} = \lim_{n \to \infty} H_{k,N,n},
\]

so, by continuity from above,

\[
P(E_{k,N}) = P'(F_{k,N}).
\]

Then, as the $E_{k,N}$ and $F_{k,N}$ increase with $N$, continuity from below gives

\[
P(E_k) = \lim_{N \to \infty} P(E_{k,N}) = \lim_{N \to \infty} P'(F_{k,N}) = P'(F_k).
\]

Then, as the $E_k$ and $F_k$ decrease as $k$ increases, continuity from above gives

\[
P(E) = \lim_{k \to \infty} P(E_k) = \lim_{k \to \infty} P'(F_k) = P'(F).
\]

That the sequence $X_n$ converges to $a$ almost surely means that $P(E) = 1$, and therefore $P'(F) = 1$, i.e. the sequence $Y_n$ converges to $a$ almost surely. ∎

We now use what we have established to prove the strong law of large numbers (Elias M. Stein and Rami Shakarchi, Functional Analysis, Princeton Lectures in Analysis, volume IV, p. 206, Theorem 2.1). (We write $(\Omega', \mathcal{F}', P')$ for the probability space of the theorem because $(\Omega, \mathcal{F}, P)$ denotes the product probability space constructed already.)

Theorem 13.

If $X_n : (\Omega', \mathcal{F}', P') \to \mathbb{R}$, $n \geq 1$, are independent identically distributed $L^1$ random variables, then for almost all $\omega \in \Omega'$,

\[
\frac{1}{n}\sum_{k=1}^{n} X_k(\omega) \to E(X_1).
\]
Proof.

For $n \geq 0$, let $Y_n = X_{n+1}$; we do this to make the index set the same as that of the spaces $\Omega_n$. Let $\mu_n = (Y_n)_*P'$ and set $\mu = \mu_0$. Because the $Y_n$ are identically distributed, $\mu_n = \mu_0$ for each $n$.

Because $Y_0$ is $L^1$, $\mu$ has finite first moment. As $\mu_0 = (Y_0)_*P'$, applying the change of variables theorem (2),

\[
\int_{\mathbb{R}} t \, d\mu(t) = \int_{\mathbb{R}} t \, d((Y_0)_*P')(t) = \int_{\Omega'} Y_0(\omega) \, dP'(\omega) = E(Y_0).
\]

With this, Lemma 10 says that for almost all $\omega \in \Omega$,

\[
\frac{1}{n}\sum_{k=0}^{n-1} \pi_k(\omega) \to E(Y_0). \tag{3}
\]

For each $n$,

\[
(\pi_0, \ldots, \pi_n)_*P = \mu_0 \times \cdots \times \mu_n,
\]

and because the $Y_n$ are independent,

\[
(Y_0, \ldots, Y_n)_*P' = (Y_0)_*P' \times \cdots \times (Y_n)_*P' = \mu_0 \times \cdots \times \mu_n,
\]

so the sequences $\pi_n$ and $Y_n$ have the same finite-dimensional distributions. For $n \geq 1$, define $\Phi_n : \mathbb{R}^n \to \mathbb{R}$ by

\[
\Phi_n(t_1, \ldots, t_n) = \frac{1}{n}\sum_{k=1}^{n} t_k.
\]

Lemma 11 then tells us that the sequences

\[
U_n = \Phi_n(Y_0, \ldots, Y_{n-1}) = \frac{1}{n}\sum_{k=0}^{n-1} Y_k
\]

and

\[
V_n = \Phi_n(\pi_0, \ldots, \pi_{n-1}) = \frac{1}{n}\sum_{k=0}^{n-1} \pi_k,
\]

$n \geq 1$, have the same finite-dimensional distributions.

Now, (3) says that $V_n$ converges to $E(Y_0)$ almost surely, and thus applying Lemma 12 we get that $U_n$ converges to $E(Y_0)$ almost surely. That is,

\[
\frac{1}{n}\sum_{k=0}^{n-1} Y_k \to E(Y_0)
\]

almost surely, and because $Y_k = X_{k+1}$ and $E(Y_0) = E(X_1)$,

\[
\frac{1}{n}\sum_{k=1}^{n} X_k \to E(X_1)
\]

almost surely, completing the proof. ∎
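
To close, a simulation sketch of Theorem 13, assuming NumPy; exponential variables with mean $2$, the seed, and the number of paths are arbitrary choices. Each row is an independent sample path of the running mean, and every one of them settles at $E(X_1) = 2$.

    import numpy as np

    rng = np.random.default_rng(0)

    paths, n = 5, 1_000_000
    X = rng.exponential(scale=2.0, size=(paths, n))
    running_means = np.cumsum(X, axis=1) / np.arange(1, n + 1)

    for m in (10, 1_000, 1_000_000):
        print(f"n = {m:>7}:", np.round(running_means[:, m - 1], 4))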