Discrete harmonic functions: a probabilistic take on the weak discrete harmonic principle for SSRW on integer lattices

In a previous blogpost I discussed a way of using results about random walks to solve the functional equation

\displaystyle f: \mathbf{Z}^2 \to [0,1] \text{    satisfying    }  f (x,y) = \frac{f(x-1,y) + f(x, y-1)}{2} ~~~~~~~ (1)

and we saw that the only solutions to (1) are the constant functions. My friend Drago posted an alternate solution to this problem on his weblog over here, where he used the famous Krein-Milman theorem to solve not just the functional equation in (1) for bounded functions f : \mathbf{Z}^2 \to [0,1] but a wide range of similar functional equations for positive-valued functions as well; even for this simple case, finding a probabilistic solution is not an easy task, though one exists.

Note: Part of the content discussed in this blogpost was presented by me as a class presentation in a course on martingale theory which I am taking this semester.

We begin with some definitions.

Notation and terminology 1:
Throughout this blogpost, unless mentioned otherwise, by a Markov chain we will mean a discrete-time, time-homogeneous Markov chain on a countable state space \mathcal{X}.
Despite the time points being discretely spaced, we will use t as the generic notation for them; the more discrete-sounding notation n will be used as a placeholder for parameters defining the state space or initial distribution, etc., of the chain.
Also, we will use the generic notation P for the transition kernel of such a Markov chain, i.e., P : \mathcal{X} \times \mathcal{X} \to [0,1] is a function which satisfies:
\displaystyle P(x,y) = \mathbf{P} \{ X_1 = y | X_0 = x \} = \mathbf{P} \{ X_{t+1} = y | X_t = x \} ~~~~~~~ (2)
for all t \ge 0 and for all states x \in \mathcal{X} , y \in \mathcal{X}.
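For example, the simple symmetric random walk (SSRW) on \mathbf{Z}^d fits into this framework with \mathcal{X} = \mathbf{Z}^d and transition kernel

\displaystyle P(x,y) = \frac{1}{2d} \text{  if  } \| x - y \|_1 = 1, \text{  and  } P(x,y) = 0 \text{  otherwise}.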
Definition (Discrete harmonic functions):
A function h : \mathcal{X} \to \mathbf{R} is said to be harmonic for P at a state x \in \mathcal{X} if one has
\displaystyle h(x) = \sum_{y \in \mathcal{X}} h(y) P(x,y); ~~~~~~~ (3)
the function h is said to be harmonic for P if it is harmonic for P at all states x \in \mathcal{X}.
A function h : \mathcal{X} \to \mathbf{R} which is harmonic for some transition kernel P (at a point x \in \mathcal{X}) is sometimes called a discrete harmonic function for P (at the point x),
where the word "discrete" reminds us that we are in the discrete setting, as the underlying state space \mathcal{X} is countable.

This is closely related to the notion of harmonic functions u: \mathbf{R}^d \to \mathbf{R}, which are twice continuously differentiable functions satisfying

\displaystyle \Delta u = 0 ~~~~~~~ (4)

where \Delta = \nabla ^2 = \frac{\partial ^2}{\partial x_1 ^2}+....+ \frac{\partial^2}{\partial x_d ^2} is the Laplace operator.

These continuous harmonic functions appear throughout complex analysis, often in disguise, thanks to the following result.

Proposition 2 (Harmonic functions \equiv real part of holomorphic functions)
A function u : \mathbf{R}^2 \to \mathbf{R} is a harmonic function if and only if there is a holomorphic function f : \mathbf{C} \to \mathbf{C} such that u (x,y) = \text{Re} ( f (x+iy)) for all (x,y) \in \mathbf{R}^2.
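For instance, the non-constant function u(x,y) = x^2 - y^2 is harmonic on \mathbf{R}^2 (one checks directly that \Delta u = 2 - 2 = 0), and indeed

\displaystyle u(x,y) = x^2 - y^2 = \text{Re} \left( (x+iy)^2 \right).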

A celebrated result in complex analysis, namely Liouville's theorem, can then be stated as follows:

Theorem 3 (Liouville’s theorem)
\bullet (i) Suppose f : \mathbf{C} \to \mathbf{C} is an entire function. Then f is bounded (i.e., \sup_{z \in \mathbf{C}} | f(z) | < \infty) if and only if f is a constant function.

\bullet (ii) The only bounded harmonic functions u : \mathbf{R}^2 \to \mathbf{R} are the constant functions.

Discrete harmonic functions are not totally alien to the other notion of harmonicity defined by (4). In fact, if one replaces \Delta by its discrete or finitary version (often known as the discrete Laplacian, or the averaging operator, or the finite Laplacian) then (4) becomes equivalent to (3).
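To make this precise, one can define the discrete Laplacian associated to a transition kernel P by

\displaystyle (\Delta_P h)(x) := \sum_{y \in \mathcal{X}} h(y) P(x,y) - h(x),

so that h is harmonic for P at x exactly when (\Delta_P h)(x) = 0; for the SSRW on \mathbf{Z}^d this reads

\displaystyle (\Delta_P h)(x) = \frac{1}{2d} \sum_{e \in \mathbf{Z}^d : \|e\|_1 = 1} \left( h(x+e) - h(x) \right) = 0,

which is (up to the normalizing factor \frac{1}{2d}) the standard finite-difference analogue of (4).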

Motivated by Liouville's theorem, one may feel tempted to investigate whether bounded discrete harmonic functions for a transition kernel are always constant. The answer is no, in general.

Exercise 4 (thanks to Yuval Peres): Suppose \mathcal{T} is a d-regular tree (with d \ge 3) on a countably infinite set of vertices and let \rho be a vertex distinguished as the root of the tree. Suppose x is a neighbour of the root \rho; let F_x be the family of vertices spawned by x, that is to say, F_x = \{ y \in \text{    vertex set of    } \mathcal{T} : \text{  there is a path between  } x , y \text{    which doesn't pass through the root  } \rho \text{   of this tree    } \}. Suppose (X_t) is the simple symmetric random walk on \mathcal{T}. Define for all vertices z of this tree

f(z) = \mathbf{P} \{ X_t \in F_x \text{    for all but finitely many natural numbers    } t | X_0 = z \}.

Show that f is a non-constant, bounded (in fact [0,1] valued) harmonic function.
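Here is a minimal Monte Carlo sketch (my own illustration, not part of the exercise) estimating f on a 3-regular tree; using a fixed horizon T as a proxy for "X_t \in F_x for all but finitely many t" is a simulation assumption, justified heuristically by the transience of the walk for d \ge 3.

```python
import random

def neighbors(v, d=3):
    # a vertex is encoded as its path from the root: () is the root rho,
    # (i,) the i-th neighbour of the root, (i, j) the j-th child of (i,), etc.
    if v == ():
        return [(i,) for i in range(d)]                  # the root has d neighbours
    return [v[:-1]] + [v + (j,) for j in range(d - 1)]   # parent plus d-1 children

def estimate_f(start, d=3, horizon=500, trials=4000):
    # F_x is the subtree hanging off x = (0,); since the walk is transient for
    # d >= 3, "X_t in F_x for all but finitely many t" is approximated here by
    # checking whether X_T lies in that subtree at a large fixed time T.
    hits = 0
    for _ in range(trials):
        v = start
        for _ in range(horizon):
            v = random.choice(neighbors(v, d))
        if len(v) > 0 and v[0] == 0:
            hits += 1
    return hits / trials

print(estimate_f(()))       # f(rho): by symmetry this should be about 1/3
print(estimate_f((0, 0)))   # started inside F_x: noticeably larger than 1/3
```

By symmetry the estimate at the root should be near 1/3, while starting inside F_x gives a strictly larger value; this is exactly the non-constancy the exercise asks for.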

However, in the special case of the SSRW on the integer lattice \mathbf{Z}^d it is true that bounded harmonic functions must be constant. In fact, even harmonic functions which are merely bounded on one side must be constant.

More precisely, we have the following:

Theorem 5 (Harmonic principle for SSRW on \mathbf{Z}^d)
Suppose (X_t) is a simple symmetric random walk on \mathbf{Z}^d and suppose h : \mathbf{Z}^d \to \mathbf{R} is a harmonic function for this Markov chain. Then the following are true:

\bullet (i) (Weak discrete harmonic principle, Blackwell) If h is bounded then h is constant.

\bullet (ii) (Strong discrete harmonic principle) If h is bounded on one side (i.e., if there is some real number q such that q \le h(z) for all z \in \mathbf{Z}^d, or h(z) \le q for all z \in \mathbf{Z}^d) then h is constant.
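Some boundedness hypothesis is genuinely needed here: for instance, the coordinate function h(x) = x_1 is a non-constant harmonic function for the SSRW on \mathbf{Z}^d, since

\displaystyle \frac{1}{2d} \sum_{e \in \mathbf{Z}^d : \|e\|_1 = 1} (x+e)_1 = x_1 + \frac{1 - 1}{2d} = x_1;

of course, h is unbounded on both sides, so it does not contradict either part of Theorem 5.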

Here I discuss two proofs of the weak harmonic principle (one using a coupling of Markov chains and the other using the Hewitt-Savage 0-1 law). There is a third proof of the claim using the Krein-Milman theorem, but that will not be elaborated in this blogpost.

Note: when I initially planned to write this blogpost, I thought I would discuss the Krein-Milman based proof here itself; however, owing to the significant difference in flavour between these proofs, I eventually decided to put it in a different blogpost.

References: I learnt the coupling-based proof of the weak discrete harmonic principle from professor Yuval Peres; see here for more details.

Proof of Weak version of the discrete harmonic principle for SSRW on \mathbf{Z}^d (Using coupling):

Consider any two arbitrary points x , y \in \mathbf{Z}^d. Let two persons A, B start performing random walks on \mathbf{Z}^d from the points x, y respectively, as per the following rules: at each step, draw a coordinate i \in \{ 1,... , d \} uniformly at random; if at this time point the i th coordinate of the position of A is the same as the i th coordinate of the position of B, then with probability 1/2 we leave both random walkers' positions unchanged, and with the remaining probability 1/2 we move both walkers together one step in the direction i (choosing between the two possible moves \pm e_i uniformly at random); otherwise, if the i th coordinates of the positions of the two random walkers differ, we pick one of the two walkers at random and let the chosen walker take one step in the direction i, again choosing between the two possible moves \pm e_i uniformly at random.

Notice that, if (X_t) and (Y_t) are the random walks traced by A, B when they move as per the above rules, then (X_t) and (Y_t) are each lazy simple symmetric random walks on \mathbf{Z}^d, and if for some t \ge 0 and j \in \{1,...,d\} the j th coordinates of X_t and Y_t are equal, then these two sequences agree on their j th coordinates at all time points t' \ge t.

Now notice that for any j \in \{1,....,d \}, the sequence of differences between the j th coordinate of (X_t) and the j th coordinate of (Y_t) is a lazy simple symmetric random walk on \mathbf{Z} with 0 as an absorbing state; since the lazy simple symmetric random walk on \mathbf{Z} is recurrent, each of these difference sequences gets absorbed at 0 almost surely, whence it follows that \mathbf{P} \{ X_t \ne Y_t \} \to 0 as t \to \infty.

To prove the claim it suffices to notice that:

\displaystyle |h(x) - h(y)| = | \mathbf{E} [ h (X_t) - h(Y_t) | X_0 = x , Y_0 = y] | \le 2 \sup \{ |h(z)| : z \in \mathbf{Z}^d \} \cdot \mathbf{P} \{ X_t \ne Y_t \} \to 0 \text{  as  } t \to \infty,

where the first equality holds because (h(X_t)) and (h(Y_t)) are martingales (h is harmonic for the lazy walk as well, since the lazy kernel is \frac{1}{2}(P + I)), so \mathbf{E} [ h(X_t) ] = h(x) and \mathbf{E} [ h(Y_t) ] = h(y) for all t \ge 0.

Thus |h(x) - h(y)| < \varepsilon for every \varepsilon > 0, whence h(x) = h(y); since x , y were arbitrarily chosen elements of \mathbf{Z}^d, it follows that h(x) = h(y) for all x , y \in \mathbf{Z}^d. \square
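Here is a minimal simulation sketch of this coupling (an illustration of mine; the dimension, starting points and time horizons are arbitrary choices); it empirically exhibits the decay of \mathbf{P} \{ X_t \ne Y_t \}.

```python
import random

def coupled_step(a, b, d):
    """One step of the coupling; a, b are the current positions (lists of length d)."""
    i = random.randrange(d)                 # pick a coordinate uniformly at random
    if a[i] == b[i]:
        if random.random() < 0.5:           # w.p. 1/2 move both walkers together...
            s = random.choice((-1, 1))      # ...by a uniformly chosen step +-1
            a[i] += s
            b[i] += s
        # w.p. 1/2 both walkers stay put
    else:
        s = random.choice((-1, 1))
        if random.random() < 0.5:           # pick one walker at random and move it
            a[i] += s
        else:
            b[i] += s

def fraction_uncoupled(x, y, t, trials=2000):
    """Estimate P{X_t != Y_t} for the coupling started at (x, y)."""
    bad = 0
    for _ in range(trials):
        a, b = list(x), list(y)
        for _ in range(t):
            coupled_step(a, b, len(x))
        bad += (a != b)
    return bad / trials

# starting points (0,0) and (3,5) in Z^2; the estimates should decrease in t
for t in (10, 100, 1000):
    print(t, fraction_uncoupled((0, 0), (3, 5), t))
```

Note that each walker's marginal move in the chosen coordinate is the same in both branches (stay with probability 1/2, move \pm 1 with probability 1/4 each), which is what makes each marginal a lazy SSRW.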

Now we discuss a proof of this result using the Hewitt-Savage 0-1 law (for finite-permutation invariant combinations of iid random variables).

Proof of Weak version of the discrete harmonic principle for SSRW on \mathbf{Z}^d (Using Hewitt-Savage 0-1 law)

Named after Edwin Hewitt and Leonard Savage, the Hewitt-Savage 0-1 law can be stated as follows:

Lemma 6 (Hewitt-Savage 0-1 law, iid version)
If X_1 , X_2 ,.... are iid copies of a random variable X taking values in \mathbf{R}^n then any event in the exchangeable (or finite-permutation invariant) \sigma algebra \mathcal{E} (X_1 , X_2 , ......) has probability either one or zero.

Exercise 7: Suppose D_1 , D_2 , ..... are iid copies of an \mathbf{R}^n valued random variable D with zero mean. Suppose f : \mathbf{R}^n \to \mathbf{R} is a bounded function satisfying \mathbf{E} [ f( x + D_1 ) ] = f(x) for all x \in \mathbf{R}^n, so that in particular \mathbf{E} [ f( D_1 + .... + D_t) ] = f(0) =: \mu for all t \ge 0. Show that \mathbf{P} \{ \lim_{t \to \infty} f (D_1 + ..... + D_t ) = \mu \} = 1. (Hint: the hypothesis makes (f(D_1 + .... + D_t)) a bounded martingale, so the limit exists almost surely; the Hewitt-Savage 0-1 law then forces the limit to be almost surely constant.)

Let D_1 , D_2 , .... be a sequence of iid random variables having law uniform over \{ e_{1;d} , -e_{1;d} , .... , e_{d;d} , -e_{d;d} \}, where as usual e_{j;d} is the d-vector with the j th coordinate equal to one and all other coordinates zero.

Thanks to Exercise 7 (applied to f = h(z + \cdot), which satisfies the hypothesis precisely because h is bounded and harmonic), it follows that \mathbf{P} \{ \lim_{t \to \infty} h ( z + D_1 + D_2 + .... + D_t ) = h(z) \} = 1 for all z \in \mathbf{Z}^d.

So it suffices to show that the limit \lim_{t \to \infty} h (z + D_1 + D_2 + .... + D_t ) does not depend on z.

But this is true because, for any z_1 , z_2 \in \mathbf{Z}^d, with strictly positive probability the first few increments trace out a lattice path from z_1 to z_2: there exist unit steps d_1 , .... , d_m with d_1 + .... + d_m = z_2 - z_1, and the event \{ D_1 = d_1 , .... , D_m = d_m \} has strictly positive probability. On this event we have z_1 + D_1 + .... + D_{m+t} = z_2 + D_{m+1} + .... + D_{m+t} for all t, and applying Exercise 7 to the shifted (still iid) sequence D_{m+1} , D_{m+2} , .... shows that \lim_{t \to \infty} h(z_2 + D_{m+1} + .... + D_{m+t}) = h(z_2) almost surely.

Hence, with strictly positive probability, h(z_1) = \lim_{t \to \infty} h (z_1 + D_1 + ..... + D_t) = h(z_2); but since both sides of this equality are deterministic, it follows that h(z_1) = h(z_2), and we are done. \square

Interestingly enough, proving the strong discrete harmonic principle for the SSRW on \mathbf{Z}^d is significantly simpler when d \in \{1,2\}; thanks to Polya's theorem, this is precisely the case when the SSRW on \mathbf{Z}^d is recurrent, and recurrence of the SSRW on the one- and two-dimensional integer lattices makes it significantly easier to establish the claim in this special situation.

Proof of Strong version of the discrete harmonic principle for SSRW on \mathbf{Z}^d when d \in \{1,2\} :

Suppose that d \in \{1,2\}. It is known that the SSRW on \mathbf{Z}^d is recurrent in this case, and hence it visits every state in \mathbf{Z}^d infinitely often with probability one.

Let (X_t) be the SSRW on \mathbf{Z}^d (where d is a member of the set \{1,2\}) and suppose h : \mathbf{Z}^d \to \mathbf{R} is a harmonic function for this Markov chain which is bounded on one side; replacing h by -h if necessary we may assume h is lower bounded, and then, shifting by a constant, that h \ge 0.

Now the sequence (h(X_t))_{t \in \mathbf{N} \cup \{0\} } is a non-negative martingale with respect to its own natural filtration (harmonicity of h is precisely this martingale property); since non-negative martingales converge almost surely, it follows that (h(X_t)) converges almost surely as t \to \infty. But (X_t) is recurrent (thanks to Polya's theorem), so for any x \in \mathbf{Z}^d the sequence (h(X_t)) has infinitely many terms equal to h(x) with probability one; the almost sure limit must therefore equal h(x) for every x \in \mathbf{Z}^d, whence h(y) = h(z) for all y , z \in \mathbf{Z}^d, and this proves the claim. \square

#harmonic-functions, #hewitt-savage-0-1-law, #liouvilles-theorem, #markov-chains, #martingales, #random-walks, #yuval-peres


Equivalence of norms on finite dimensional vector spaces

As the title suggests, this blogpost is concerned with showing that any two norms on a finite dimensional vector space over \mathbf{R} or \mathbf{C} are equivalent.

We begin by recalling the notion of equivalence of norms.

Definition:
Let X be a linear space over the field \mathbf{C} of complex numbers (or the field \mathbf{R} of real numbers). Two norms || \cdot ||_1 and || \cdot || _2 on X are said to be equivalent if there exist positive real numbers c > 0 , C >0 such that
\displaystyle c || x ||_1 \le ||x||_2 \le C ||x||_1 ~~~~~~~ (1)
for all x \in X.
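For example, on \mathbf{R}^n the supremum norm || \cdot ||_{\infty} and the Euclidean norm || \cdot ||_{\ell^2} are equivalent, since

\displaystyle || x ||_{\infty} \le || x ||_{\ell^2} \le \sqrt{n} \, || x ||_{\infty}

for all x \in \mathbf{R}^n.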

Exercise 1: Show that the relation of norms being equivalent is an equivalence relation.

Exercise 2: Let X be a \mathbf{C} linear space and let || \cdot ||_1 and || \cdot ||_2 be two norms on X. Show that the following are equivalent:

\bullet ( i ) || \cdot ||_1 and || \cdot ||_2 are equivalent;

\bullet (ii) for any sequence (x_n) of elements of X, it is true that ||x_n||_1 converges to 0 as n \to \infty if and only if ||x_n||_2 converges to 0 as n \to \infty;

\bullet (iii) for any sequence (x_n) of elements of X, it is true that (x_n) is convergent with respect to || \cdot ||_1 if and only if (x_n) is convergent with respect to || \cdot ||_2.

It is a well known result in mathematics that any two norms on a finite dimensional \mathbf{R} or \mathbf{C} vector space are equivalent.

Thanks to the fact that any finite dimensional linear space over \mathbf{R} (or \mathbf{C}) is isomorphic to \mathbf{R}^n (respectively, \mathbf{C}^n), it suffices to prove the claim assuming the underlying linear space is \mathbf{R}^n (resp. \mathbf{C}^n). The standard proof of this result then proceeds by showing that any norm on \mathbf{R}^n is equivalent to the supremum norm (or uniform norm) || \cdot ||_{\infty}; one direction of the bound in (1) is easy to establish, while to establish the other direction one restricts attention to the unit sphere with respect to the supremum norm and uses a compactness argument (essentially the same argument as in proving that a finite direct product of compact sets is compact).

Here we discuss a new proof of this fact, using induction on the dimension of the space. For concreteness, we take the real numbers \mathbf{R} as the underlying field of scalars; the proof of the statement for the case when the underlying field of scalars is the complex numbers is analogous, and filling in the details will be left as an exercise for the readers.

More precisely we prove the following:

Proposition 3:
Let X be a finite dimensional linear space over the reals \mathbf{R}. Then any two norms || \cdot ||_1 and || \cdot ||_2 on X are equivalent.

Spoof: We first discuss an “almost proof” of the result, and then repair it to obtain an actual proof.

Take a non-zero vector u \in \mathbf{R}^{m+1}; we know that \mathbf{R}^{m+1} can be written as a direct sum:

\displaystyle \mathbf{R}^{m+1} = \mathbf{R} u \oplus W ~~~~~~~ (2)

where W is a m dimensional subspace of \mathbf{R}^{m+1},

thus every vector z \in \mathbf{R}^{m+1} can be written as

\displaystyle z = \lambda_z u + w_z ~~~~~~~ (3)

where \lambda_z \in \mathbf{R} and w_z \in W;

furthermore, \lambda_z and w_z are uniquely determined by z.

Thanks to Exercise 2, it suffices to show that if a sequence (z_n) of elements of \mathbf{R}^{m+1} converges to 0 with respect to || \cdot ||_1 then it also converges to 0 with respect to || \cdot ||_2 and vice versa.

So let (z_n) be a sequence of elements of \mathbf{R}^{m+1} with || z_n ||_1 \to 0 as n \to \infty.

By appealing to (3) if \lambda_n = \lambda_{z_n} and w_n = w_{z_n} for all natural numbers n then one has

|| \lambda_n u + w_n ||_1 \to 0.

Suppose (\lambda_n) does not converge to 0 in the usual Euclidean norm on \mathbf{R}. If it were absolutely bounded, then by picking a monotone subsequence of (\lambda_n) staying bounded away from 0, then picking a further subsequence of (w_n) for which we have coordinate-by-coordinate convergence in the usual one-dimensional Euclidean norm, and then passing to a common subsequence, we can assume without loss of generality that \lambda_n \to \lambda_{\infty} \in \mathbf{R} \setminus \{ 0 \} and w_n \to w_{\infty} \in W.

But then by the claim in lower dimensions one has || \lambda_n u - \lambda_{\infty} u ||_1 \to 0 and ||w_n - w_{\infty}||_1 \to 0, which in turn implies, by the triangle inequality, that

|| (\lambda_n - \lambda_{\infty} )u + (w_n - w_{\infty}) ||_1 \to 0 as n \to \infty; whence by uniqueness of limits one must have 0 = \lambda_{\infty} u + w_{\infty}. But this is not possible: 0 admits the unique representation 0 = 0 u + 0 in the spirit of (3), whereas \lambda_{\infty} \ne 0. A contradiction.

Now one may become ambitious and try to start with convergence to zero in the Euclidean norm, and prove convergence with respect to a given norm. However, if we retrace the argument above, we see that it hinges crucially on the availability of a monotone subsequence, which is in principle a very special property enjoyed by the real numbers. For an arbitrary norm this might fail, since we have no assurance that convergence as a vector holds if and only if one has coordinate-wise convergence; this claim is true, and follows from Proposition 3, but proving it for an arbitrary norm without using Proposition 3 is no easier than proving Proposition 3 itself.

So we are stuck!

If we analyze our proof carefully, we see that there is another thing which we implicitly assumed, namely that the sequence (\lambda_n) is absolutely bounded. This needs a proof as well!

Another thing we notice is that we have not made the two components talk to each other in the above argument. This is the key which, when properly used, leads us to an actual proof of Proposition 3, as discussed below.

Proof: Let X be a finite dimensional linear space over the reals; without losing any generality we may assume X = \mathbf{R}^n for some natural number n. Thanks to Exercise 1, it suffices to show that any norm || \cdot || on X is equivalent to the Euclidean norm || \cdot ||_{\ell^2} on \mathbf{R}^n.

We split our proof into two parts:

\circ Step 1: we establish the lower bound; namely that there exists a positive real number c >0 such that

\displaystyle c ||z||_{\ell^2} \le ||z|| ~~~~~~~ (4)

for all z \in \mathbf{R}^n; and

\circ Step 2: we establish the upper bound; namely that there exists a positive real number C > 0 such that

\displaystyle ||z|| \le C ||z||_{\ell^2} ~~~~~~~ (5)

for all z \in \mathbf{R}^n.

Step 1: (establishing the lower bound)

We induct on the dimension n of the (finite-dimensional) linear space.

The base case n=1 is trivial (see Exercise 4).

Exercise 4: Prove Proposition 3 when the space X has dimension 1 as a vector space over the reals.

Suppose m is a natural number such that the claim holds for all natural numbers n \le m; we will prove that the claim holds for n = m+1 as well.

We prove an auxiliary result first:

Claim 5: Let M (x) = \inf_{y \in \mathbf{R}} || (x , y) || for all x \in \mathbf{R}^m, where (x,y) is the (m+1)-vector obtained by appending the real number y as the (m+1) th coordinate to the m-vector x. Then M is a norm on \mathbf{R}^m.

Proof of Claim 5: The only non-elementary step is that M(x) = 0 if and only if x = 0; it is easy to verify the other properties of M (see Exercise 6).

Suppose M(x) = 0 for some x \in \mathbf{R}^m.

Then there exists a sequence of reals (y_n) such that || (x , y_n) || \to 0 as n \to \infty.

Now we have for all r \in \mathbf{R},

||(x , r )|| \ge |r| || (0 , 1)|| - ||(x,0)||;

thus, since ||(x , y_n)|| \to 0 while x (and hence ||(x,0)||) stays fixed, the above inequality forces the sequence (y_n) to be bounded; thus, passing to a monotone (hence convergent) subsequence, we can assume without loss of generality that y_n \to a \in \mathbf{R} as n \to \infty;

but then

0 \le || (x , a) || \le || (x , y_n) || + |a-y_n| \cdot || (0 , 1 )||

and since both the terms on the right hand side of the above line of display converge to 0 as n \to \infty we conclude that || (x , a )|| = 0, but since || \cdot || is a norm, it follows that x = 0.

Exercise 6: Verify that M satisfies the other properties of a norm.

This completes the proof of Claim 5. \Delta

Thanks to all this and our induction hypothesis (applied to the norm M on \mathbf{R}^m), we thus have that

for all x \in \mathbf{R}^m and y \in \mathbf{R}

||( x , y ) || \ge M(x) \ge c_{m+1} || x ||_{\ell^2} for some constant c_{m+1} > 0;

by working with all the possible (m+1)-many m-vectors obtained by eliminating one coordinate from the vector (x,y) \in \mathbf{R}^{m+1}, we conclude that:

there exist constants c_1 > 0 , .... , c_{m+1} > 0 such that for all z \in \mathbf{R}^{m+1} and for all j \in \{1 , .... , m+1 \} one has

\displaystyle || z || \ge c_j || z^{(j)} ||_{\ell^2} where z^{(j)} is obtained from z by eliminating its j th coordinate.

Finally notice that ||z||_{\ell^2} \le \sum_{j=1}^{m+1} ||z^{(j)}||_{\ell ^2}; indeed, already ||z||_{\ell^2}^2 \le ||z^{(1)}||_{\ell^2}^2 + ||z^{(2)}||_{\ell^2}^2, since the coordinate z_1 appears in z^{(2)} and the coordinates z_2 , .... , z_{m+1} all appear in z^{(1)}.

Putting all this together it follows that for c = \frac{1}{m+1} \min \{ c_1 , ..... , c_{m+1} \} one has

c || z ||_{\ell^2} \le ||z|| for all z \in \mathbf{R}^{m+1}, which completes Step 1 of the proof.

Step 2: (establishing an upper bound)

This is fairly straightforward, hence will be left as an exercise; see Exercise 7.

Exercise 7: Prove that there exists some C >0 such that one has for all z \in \mathbf{R}^{m+1} the bound

||z|| \le C ||z||_{\ell^2}.

(Hint: write a vector (z_1 , .... , z_{m+1}) \in \mathbf{R}^{m+1} as (z_1 , .... , z_{m+1}) = (z_1 , 0,....,0) + .... + ( 0 , 0 ,..... , z_{m+1}), observe that each summand on the right hand side lies in the span of a single standard basis vector, which is a one-dimensional subspace, and then use the result for one-dimensional subspaces together with the triangle inequality to conclude the upper bound.)
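For the record, here is one way the hinted computation might go (a sketch; e_1 , .... , e_{m+1} denote the standard basis vectors, and the last step is Cauchy-Schwarz):

\displaystyle ||z|| \le \sum_{j=1}^{m+1} |z_j| \, || e_j || \le \left( \sum_{j=1}^{m+1} || e_j ||^2 \right)^{1/2} ||z||_{\ell^2},

so that C = \left( \sum_{j=1}^{m+1} || e_j ||^2 \right)^{1/2} works.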

Putting all this together completes the proof. \square
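As an entirely optional numerical illustration (not part of the proof), here is a small sketch: for a hypothetical example norm of the form ||z|| = || A z ||_{\ell^1} with A an invertible matrix, we sample the ratio ||z|| / ||z||_{\ell^2} over random directions; Proposition 3 predicts that the observed ratios stay inside a fixed interval [c, C] with 0 < c \le C < \infty.

```python
import random, math

n = 4
random.seed(0)
# a hypothetical example norm ||z|| := sum_i |(Az)_i| for a fixed random A
# (a Gaussian matrix is invertible with probability one, making this a norm)
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]

def norm_A(z):
    return sum(abs(sum(A[i][j] * z[j] for j in range(n))) for i in range(n))

def norm_l2(z):
    return math.sqrt(sum(t * t for t in z))

# sample the scale-invariant ratio ||z|| / ||z||_{l2} over random directions
ratios = []
for _ in range(10000):
    z = [random.gauss(0, 1) for _ in range(n)]
    ratios.append(norm_A(z) / norm_l2(z))
print(min(ratios), max(ratios))   # empirical estimates of c and C
```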

#easy-beautiful-proofs, #normed-spaces

The weak and strong laws of large numbers: notes 0

In a previous blogpost we discussed the notions of convergence of random variables in probability, almost surely, and in distribution.

We begin by recalling the Kolmogorov zero-one law.

Proposition 1 (Kolmogorov zero-one law)
Let X_1 , X_2 , .... be a sequence of (jointly) independent random variables. Then every event E in the tail \sigma algebra \mathcal{T} has probability equal to either 0 or 1.

An immediate corollary of the zero-one law is that any real scalar tail random variable Y must be almost surely constant, because for every t \in \mathbf{Q} the event \{ Y \ge t \} has probability either zero or one.

Example 2 Let (X_n) be a sequence of jointly independent random variables. If \limsup_{n \to \infty} X_n exists then it must be deterministic; similarly, if \liminf_{n \to \infty} X_n exists or \lim_{n \to \infty} X_n exists, then these quantities must be almost surely constant.

Example 3 Suppose (X_n) is a sequence of jointly independent random variables with mean zero and equal finite non-zero variance. Then notice that for any sequence (j_n) of natural numbers going to infinity in n, the sequence ( \frac{X_1 + .... + X_{j_n}}{\sqrt{j_n}}) cannot converge in the almost sure sense or in the sense of convergence in probability. Indeed, if it converged in probability, then passing to a further subsequence we could assume almost sure convergence. The limiting random variable would be a tail random variable (changing finitely many of the X_k does not change the limit), so Kolmogorov's zero-one law would imply that it is almost surely a constant. However, since all the X_k have equal finite non-zero variance \sigma^2 (say), and as they are jointly independent, \frac{X_1 + .... + X_{j_n}}{\sqrt{j_n}} has variance \sigma^2 \ne 0 for every natural number n, which implies that the limiting random variable would also have variance \sigma^2; this gives us a contradiction, thus proving the claim.

Example 4 The argument given in Example 3 can be uplifted to see that if (X_n) is a sequence of jointly independent random variables having zero mean and equal finite non-zero variance, then for \delta \in [0 , 1/2] the sequence ( \frac{X_1 + ..... + X_n}{n^{\delta}} ) cannot have a subsequence converging almost surely or even in probability.

Thus the only hope that remains is to study convergence in probability or almost sure convergence of ( \frac{X_1 + .... + X_n}{n^{\delta}} ) when \delta > 1/2.

For a sequence (X_n) of jointly independent random variables, under certain pretty mild conditions, the almost sure convergence of ( \frac{X_1 + .... + X_n}{n^{\delta}} ) would follow from the strong law of large numbers in the case \delta \ge 1, whereas to deal with the case \delta \in (1/2 , 1) we would require the central limit theorem.

The weak and strong laws of large numbers are stated below.

Theorem 5 (Weak and strong laws of large numbers, model case)
Let X_1 , X_2 , ..... be iid copies of an absolutely integrable random variable X, write \mu := \mathbf{E} [ X ]. Then one has

\bullet (i) (Weak law of large numbers). The random variables \frac{X_1 + .... + X_n}{n} converge in probability to \mu as n \to \infty.

\bullet (ii) (Strong law of large numbers). The random variables \frac{X_1 + .... + X_n}{n} converge almost surely to \mu as n \to \infty.

Clearly the strong law of large numbers implies the weak law of large numbers.
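To see the model case in action, here is a minimal simulation sketch (the Exponential(1) distribution, seed, and sample sizes are arbitrary choices of mine); the empirical means should settle near \mu = 1 as n grows.

```python
import random

random.seed(1)

def sample_mean(n):
    # empirical mean of n iid Exponential(1) random variables (here mu = 1)
    return sum(random.expovariate(1.0) for _ in range(n)) / n

# the law of large numbers predicts these means concentrate around 1
for n in (10, 100, 10_000, 1_000_000):
    print(n, sample_mean(n))
```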

A practice I find helpful is to think of the weak law of large numbers as a statement about the individual random variables in the average, and of the strong law of large numbers as a statement about the whole tail of the sequence of averages. This point of view makes it somewhat easier to grasp the essence of these two laws of large numbers and helps one anticipate what type of arguments might be required to prove them. This way of thinking also gives some hope for relaxing joint independence to pairwise independence in the case of the weak law, while also suggesting why the strong law might fail after such a relaxation. Indeed, as we will see here, the conclusion of the weak law of large numbers holds in the setting of pairwise independence. The same practice extends to understanding more general statements about convergence in probability and almost sure convergence. It might be helpful to contrast this way of thinking with the following exercise.

Exercise 6 (Optional)

\bullet (i) Suppose (X_n) is a sequence of independent random variables converging almost surely to a random variable Z. What can you say about the almost sure convergence of the sequence ( \frac{X_1 + .... + X_n}{n}) as n \to \infty ?

\bullet (ii) Suppose (X_n) is a sequence of independent random variables converging in probability to a random variable Z. Give a counterexample to show that \frac{X_1 + .... + X_n}{n} need not converge in probability to any random variable.

Theorem 7 (Khinchine law of large numbers)
Let (X_n) be a sequence of pairwise independent copies of an absolutely integrable random variable X.
Then \frac{X_1 + .... + X_n}{n} converges in L^1 to \mathbf{E} [ X ] and hence also converges in probability to \mathbf{E} [ X ].

A proof of this would be discussed in a later blogpost.
