Part 3 - Markov chains I

Introduction

In this part, we will introduce and define Markov Chains along with some examples. We will conduct some basic computations using them and investigate their long term evolution using powers of matrices. Finally, we will discuss induction and limiting and stationary distributions.

Markov Chains

Example 1 (Number Game)

Consider a game, at each time step ii, you are at position 1,2,1, 2, or 33. We write Xi=1X_i = 1, Xi=2X_i = 2, or Xi=3X_i = 3 for i=0,1,2,i = 0, 1, 2, \ldots. At each time step, you move to a higher number (or from 33 to 11) with probability pp, or stay put with probability 1p1 - p.

The transitions can be specified with,

P(Xi+1=1Xi=1)=1p,P(Xi+1=2Xi=1)=p,P(Xi+1=3Xi=1)=0,P(Xi+1=1Xi=2)=0,P(Xi+1=2Xi=2)=1p,P(Xi+1=3Xi=2)=p,P(Xi+1=1Xi=3)=p,P(Xi+1=2Xi=3)=0,P(Xi+1=3Xi=3)=1p.\begin{align*} P(X_{i+1} = 1 \mid X_i = 1) & = 1 - p, & P(X_{i+1} = 2 \mid X_i = 1) & = p, & P(X_{i+1} = 3 \mid X_i = 1) & = 0, \newline P(X_{i+1} = 1 \mid X_i = 2) & = 0, & P(X_{i+1} = 2 \mid X_i = 2) & = 1 - p, & P(X_{i+1} = 3 \mid X_i = 2) & = p, \newline P(X_{i+1} = 1 \mid X_i = 3) & = p, & P(X_{i+1} = 2 \mid X_i = 3) & = 0, & P(X_{i+1} = 3 \mid X_i = 3) & = 1 - p. \end{align*}

A more succinct specification is with the transition matrix,

P=[1pp001ppp01p]P = \begin{bmatrix} 1 - p & p & 0 \newline 0 & 1 - p & p \newline p & 0 & 1 - p \end{bmatrix}

The sequence X0,X1,X2,X_0, X_1, X_2, \ldots is an example of a Markov chain.

Definition 1 (Markov Chain)

Let SS be a discrete set (not necessarily finite), called the state space. A Markov Chain is a sequence of random variables X0,X1,X2,X_0, X_1, X_2, \ldots taking values in SS, with the property,

π(Xn+1X0,X1,,Xn)=π(Xn+1Xn)\pi(X_{n+1} \mid X_0, X_1, \ldots, X_n) = \pi(X_{n+1} \mid X_n)

for all n1n \geq 1.

The chain is said to be time-homogeneous if, for all n0n \geq 0 1We generally assume this unless otherwise stated.,

π(Xn+1Xn)=π(X1X0)\pi(X_{n+1} \mid X_n) = \pi(X_1 \mid X_0)

The transition matrix is defined with,

Pij=π(Xn+1=jXn=i)P_{ij} = \pi(X_{n+1} = j \mid X_n = i)

Further, a stochastic matrix is a square matrix PP with non-negative entries, satisfying P1=1P \mathbf{1} = \mathbf{1}, where 1\mathbf{1} is a row vector of ones (i.e., all rows sum to one).

Lastly, all transition matrices are stochastic matrices, and all stochastic matrices can be used as transition matrices.

Basic Computations and Long Term Evolution

Intuition (Basic Computations with Markov Chains)

If vv is a vector describing the probability distribution of states at stage kk, then vPvP is the vector describing the probability distribution of states at stage k+1k + 1. Further, if vv is a vector describing the probability distribution of states at stage kk, then vPnvP^n is the vector describing the probability distribution of states at stage k+nk + n.

Thus, the probability to go from state ii to state jj in nn steps is given by (Pn)ij(P^n)_{ij}, but we will write PijnP^n_{ij}.

The Champman-Kolmogorov [^1] relationship Pijn+m=kPikmPkjnP^{n + m}_{ij} = \sum_{k} P^m_{ik} P^n_{kj} is derived from Pm+n=PmPnP^{m + n} = P^m P^n.

Lastly, the probability of being at i1i_1 at stage n1n_1, and then at i2i_2 in stage n2n_2 and so on up to iki_k at stage nkn_k, with n1<n2<<nkn_1 < n_2 < \ldots < n_k, is by the product of corresponding entries of powers of the transition matrix,

(p0Pn1)i1(Pn2n1)i1i2(Pn3n2)i2i3(Pnknk1)ik1ik(p_0 P^{n_1})_{i_1} (P^{n_2 - n_1})_{i_1 i_2} (P^{n_3 - n_2})_{i_2 i_3} \cdots (P^{n_k - n_{k-1}})_{i_{k-1} i_k}
Intuition (Long Term Evolution of Markov Chains)

When the number of states in SS is finite and not too big, we can investigate the long term behavior by computing PnP^n for large nn. In some cases, the powers stabilize into a matrix where all rows are identical. It may also stabilize without identical rows, or it may not stabilize at all.

Note, that if PP is block-diagonal [^2], it may combine several of these behaviors,

If P=[P1000P20000Pk], then Pn=[P1n000P2n0000Pkn]\text{If } P = \begin{bmatrix} P_1 & 0 & \ldots & 0 \newline 0 & P_2 & \ldots & 0 \newline \vdots & \vdots & \ddots & 0 \newline 0 & 0 & \ldots & P_k \end{bmatrix} \text{, then } P^n = \begin{bmatrix} P_1^n & 0 & \ldots & 0 \newline 0 & P_2^n & \ldots & 0 \newline \vdots & \vdots & \ddots & 0 \newline 0 & 0 & \ldots & P_k^n \end{bmatrix}
Note (Long Term Evolution Using Simulation)

If SS is large of infinite, we may instead investigate the long term behavior using simulation.

Algorithm (Simulating a Markov Chain)

Repeat many times:

  • Draw x0x_0 according to π(x0)\pi(x_0)

  • For ii in 11 through nn:

  • Draw xix_i according to π(xixi1)\pi(x_i \mid x_{i-1}) Use the distribution of the xnx_n to approximate the distribution of XnX_n.

Induction, Limiting and Stationary Distributions

Intuition (Proving stuff using Induction)

Many results about Markov chains can be proved using induction.

  1. Formulate a statement S(n)S(n) depending on a non-negative integer nn.

  2. Prove S(0)S(0) (the base case).

  3. Prove that if S(n)S(n) is true, then S(n+1)S(n + 1) is true (the inductive step).

With this, one may conclude that S(n)S(n) is true for all non-negative integers nn.

Example 2 (2-state Markov Chain)

Any 2-state Markov Chain has the transition matrix [1ppq1q]\begin{bmatrix} 1 - p & p \newline q & 1 - q \end{bmatrix} for some 0<p<10 < p < 1 and 0<q<10 < q < 1.

We can prove by induction that, for any n0n \geq 0,

[1ppq1q]n=1p+q[[qpqp]+[ppqq](1pq)n]\begin{bmatrix} 1 - p & p \newline q & 1 - q \end{bmatrix}^n = \frac{1}{p + q} \left[ \begin{bmatrix} q & p \newline q & p \end{bmatrix} + \begin{bmatrix} p & -p \newline -q & q \end{bmatrix} (1 - p - q)^n \right]

We can use this to study what happens when nn goes to infinity.

Solution

We proceed by induction.

  • Base case (n=0n = 0):
[1001]=1p+q[[qpqp]+[ppqq](1pq)0]\begin{bmatrix} 1 & 0 \newline 0 & 1 \end{bmatrix} = \frac{1}{p + q} \left[ \begin{bmatrix} q & p \newline q & p \end{bmatrix} + \begin{bmatrix} p & -p \newline -q & q \end{bmatrix} (1 - p - q)^0 \right]

This is true, since the right-hand side simplifies to,

1p+q[[q+pppqqp+q]]=[1001]\frac{1}{p + q} \left[ \begin{bmatrix} q + p & p - p \newline q - q & p + q \end{bmatrix} \right] = \begin{bmatrix} 1 & 0 \newline 0 & 1 \end{bmatrix}
  • Inductive step: Assume the statement is true for some n0n \geq 0. We need to show that it is true for n+1n + 1. We have,
[1ppq1q]n+1=[1ppq1q]n[1ppq1q]=1p+q[[qpqp]+[ppqq](1pq)n][1ppq1q]=1p+q[[q(1p)+pqqp+p(1q)q(1p)+pqqp+p(1q)]+[p(1p)pqp2p(1q)q(1p)+qqqp+q(1q)](1pq)n]=1p+q[[qpqp]+[ppqq](1pq)n+1]\begin{align*} & \begin{bmatrix} 1 - p & p \newline q & 1 - q \end{bmatrix}^{n + 1} \newline & = \begin{bmatrix} 1 - p & p \newline q & 1 - q \end{bmatrix}^n \begin{bmatrix} 1 - p & p \newline q & 1 - q \end{bmatrix} \newline & = \frac{1}{p + q} \left[ \begin{bmatrix} q & p \newline q & p \end{bmatrix} + \begin{bmatrix} p & -p \newline -q & q \end{bmatrix} (1 - p - q)^n \right] \begin{bmatrix} 1 - p & p \newline q & 1 - q \end{bmatrix} \newline & = \frac{1}{p + q} \left[ \begin{bmatrix} q (1 - p) + p q & q p + p (1 - q) \newline q (1 - p) + p q & q p + p (1 - q) \end{bmatrix} + \begin{bmatrix} p (1 - p) - p q & p^2 - p (1 - q) \newline -q (1 - p) + q q & - q p + q (1 - q) \end{bmatrix} (1 - p - q)^n \right] \newline & = \frac{1}{p + q} \left[ \begin{bmatrix} q & p \newline q & p \end{bmatrix} + \begin{bmatrix} p & -p \newline -q & q \end{bmatrix} (1 - p - q)^{n + 1} \right] \end{align*}

Thus, by induction, the statement is true for all non-negative integers nn. From this, we can see that as nn goes to infinity, the powers of the transition matrix converge to,

limn[1ppq1q]n=1p+q[qpqp]\lim_{n \to \infty} \begin{bmatrix} 1 - p & p \newline q & 1 - q \end{bmatrix}^n = \frac{1}{p + q} \begin{bmatrix} q & p \newline q & p \end{bmatrix}
Definition 2 (Limiting Distribution)

A limiting distribution for a Markov chain with transition matrix PP is a probability vector vv such that,

limn(Pn)ij=vj\lim_{n \to \infty} (P^n)_{ij} = v_j

for all states ii and jj.

An equivalent formulation is, the limit limn(Pn)ij\lim_{n \to \infty} (P^n)_{ij} exists and does not depend on ii.

Another equivalent formulation is, limnPn\lim_{n \to \infty} P^n is a stochastic matrix with all rows identical.

Further, a Markov chain has either no or one unique limiting distribution. If a limiting distribution exists, its probabilities correspond to the proportion of time steps the chain spends at each state in the long run.

Definition 3 (Stationary Distribution)

A stationary distribution for a Markov chain is a distribution that is unchanged when applying one step of the Markov chain. If PP is the transition matrix, then a probability vector vv represents a stationary distribution if and only if,

vP=vvP = v

A Markov chain can have zero, one, or many stationary distributions. Limiting distributions are always stationary distributions (but not necessarily vice versa).

Definition 4 (Regular Transition Matrices)

A stochastic matrix PP is positive if all entries are strictly positive, i.e., Pij>0P_{ij} > 0 for all ii and jj. A stochastic matrix PP is regular if PnP^n is positive for some n>0n > 0.

Theorem 1 (Limit Theorem for Regular Markov Chains)

If the transition matrix PP is regular, the limiting distribution exists. There are no other stationary distributions and the limiting distribution is positive, i.e., all states have positive probability in the limiting distribution.

Intuition (Finding Stationary Distributions)

Find the vv satisfying vP=vvP = v by, for example,

  1. Solving the linear system vP=vvP = v

  2. Guessing at a vv, and showing that vP=vvP = v, or,

  3. Computing an eigenvector for the transpose PTP^T belonging to the eigenvalue 1.

Having found a vv satisfying vP=vvP = v, if the transition matrix PP is regular, we know vv represents the unique limiting distribution and the unique stationary distribution.

Example 3 (Random Walks on Undirected Graphs)

An undirected graph consists of nodes and undirected edges connecting them. An undirected graph defines a random walk Markov chain by, at every time step, following one of the edges out of a node, with equal probability.

When the graph is finite, we show that the vector uu is a stationary distribution, where ui=deg(i)Su_i = \frac{\mathrm{deg}(i)}{S}, where deg(i)\mathrm{deg}(i) is the number of edges going into node ii and SS is the sum over all nodes of their degrees.

If we further generalize, a weighted undirected graph is a graph with a positive weight at any edge between ii and jj for all ii and jj. If we define the Markov chain by choosing the next node with probabilities according to the weights. Then, when the graph is finite, the vector uu is a stationary distribution, where ui=w(i)eu_i = \frac{w(i)}{e}, where w(i)w(i) is the sum of the weights of the edges going into ii, and ee is the total sum over all nodes of their w(i)w(i) values.

Intuition (Moving Between States)

State jj is accessible from state ii if (Pn)ij>0(P^n)_{ij} > 0 for some n0n \geq 0. States ii and jj communicate if ii is accessible from jj and jj is accessible from ii.

“Communication” is transitive, i.e., if ii communicates with jj and jj communicates with kk, then ii communicates with kk.

Communication is an equivalence relation, subdividing all states into communication classes. Communication classes can be found for example by drawing transition graphs.

Lastly, a Markov chain is irreducible if it has exactly one communication class.

Definition 5 (Recurrence and Transience)

Let TjT_j be the first passage time to state jj,

Tj=min{n>0:Xn=j}T_j = \min \{n > 0 : X_n = j \}

Let fjf_j be the probability that a chain starting at jj will return to jj,

fj=P(Tj<X0=j)f_j = P(T_j < \infty \mid X_0 = j)

A state jj is recurrent if a chain starting at jj will eventually revisit jj, i.e., if fj=1f_j = 1.

A state jj is transient if a chain starting at jj has a positive probability of never revisiting jj, i.e., if fj<1f_j < 1.

Note

The expected number of visits at jj when the chain starts at ii is given by n=0(Pn)ij\sum_{n = 0}^{\infty} (P^n)_{ij}.

jj is recurrent if and only if n=0(Pn)jj=\sum_{n = 0}^{\infty} (P^n)_{jj} = \infty.

jj is transient if and only if n=0(Pn)jj<\sum_{n = 0}^{\infty} (P^n)_{jj} < \infty.

Note (Communication Classes)

The states of a communication class are either all recurrent or all transient. The states of a finite irreducible Markov chain are all recurrent.

If a state is recurrent, only states inside its communication class are accessible from it. If no states outside a finite communication class are accessible from it, then the class consists of recurrent states.