Part 3 - Markov chains I | rezarezvan.com

Introduction

In this part, we will introduce and define Markov Chains along with some examples. We will conduct some basic computations using them and investigate their long term evolution using powers of matrices. Finally, we will discuss induction and limiting and stationary distributions.

Markov Chains

Example 1 (Number Game)

Consider a game, at each time step $i$ , you are at position $1, 2,$ or $3$ . We write $X_i = 1$ , $X_i = 2$ , or $X_i = 3$ for $i = 0, 1, 2, \ldots$ . At each time step, you move to a higher number (or from $3$ to $1$ ) with probability $p$ , or stay put with probability $1 - p$ .

The transitions can be specified with,

\begin{align*} P(X_{i+1} = 1 \mid X_i = 1) & = 1 - p, & P(X_{i+1} = 2 \mid X_i = 1) & = p, & P(X_{i+1} = 3 \mid X_i = 1) & = 0, \newline P(X_{i+1} = 1 \mid X_i = 2) & = 0, & P(X_{i+1} = 2 \mid X_i = 2) & = 1 - p, & P(X_{i+1} = 3 \mid X_i = 2) & = p, \newline P(X_{i+1} = 1 \mid X_i = 3) & = p, & P(X_{i+1} = 2 \mid X_i = 3) & = 0, & P(X_{i+1} = 3 \mid X_i = 3) & = 1 - p. \end{align*}

A more succinct specification is with the transition matrix,

P = \begin{bmatrix} 1 - p & p & 0 \newline 0 & 1 - p & p \newline p & 0 & 1 - p \end{bmatrix}

The sequence $X_0, X_1, X_2, \ldots$ is an example of a Markov chain.

Definition 1 (Markov Chain)

Let $S$ be a discrete set (not necessarily finite), called the state space. A Markov Chain is a sequence of random variables $X_0, X_1, X_2, \ldots$ taking values in $S$ , with the property,

\pi(X_{n+1} \mid X_0, X_1, \ldots, X_n) = \pi(X_{n+1} \mid X_n)

for all $n \geq 1$ .

The chain is said to be time-homogeneous if, for all $n \geq 0$ 11We generally assume this unless otherwise stated.,

\pi(X_{n+1} \mid X_n) = \pi(X_1 \mid X_0)

The transition matrix is defined with,

P_{ij} = \pi(X_{n+1} = j \mid X_n = i)

Further, a stochastic matrix is a square matrix $P$ with non-negative entries, satisfying $P \mathbf{1} = \mathbf{1}$ , where $\mathbf{1}$ is a row vector of ones (i.e., all rows sum to one).

Lastly, all transition matrices are stochastic matrices, and all stochastic matrices can be used as transition matrices.

Basic Computations and Long Term Evolution

Intuition (Basic Computations with Markov Chains)

If $v$ is a vector describing the probability distribution of states at stage $k$ , then $vP$ is the vector describing the probability distribution of states at stage $k + 1$ . Further, if $v$ is a vector describing the probability distribution of states at stage $k$ , then $vP^n$ is the vector describing the probability distribution of states at stage $k + n$ .

Thus, the probability to go from state $i$ to state $j$ in $n$ steps is given by $(P^n)_{ij}$ , but we will write $P^n_{ij}$ .

The Champman-Kolmogorov [^1] relationship $P^{n + m}_{ij} = \sum_{k} P^m_{ik} P^n_{kj}$ is derived from $P^{m + n} = P^m P^n$ .

Lastly, the probability of being at $i_1$ at stage $n_1$ , and then at $i_2$ in stage $n_2$ and so on up to $i_k$ at stage $n_k$ , with $n_1 < n_2 < \ldots < n_k$ , is by the product of corresponding entries of powers of the transition matrix,

(p_0 P^{n_1})_{i_1} (P^{n_2 - n_1})_{i_1 i_2} (P^{n_3 - n_2})_{i_2 i_3} \cdots (P^{n_k - n_{k-1}})_{i_{k-1} i_k}

Intuition (Long Term Evolution of Markov Chains)

When the number of states in $S$ is finite and not too big, we can investigate the long term behavior by computing $P^n$ for large $n$ . In some cases, the powers stabilize into a matrix where all rows are identical. It may also stabilize without identical rows, or it may not stabilize at all.

Note, that if $P$ is block-diagonal [^2], it may combine several of these behaviors,

\text{If } P = \begin{bmatrix} P_1 & 0 & \ldots & 0 \newline 0 & P_2 & \ldots & 0 \newline \vdots & \vdots & \ddots & 0 \newline 0 & 0 & \ldots & P_k \end{bmatrix} \text{, then } P^n = \begin{bmatrix} P_1^n & 0 & \ldots & 0 \newline 0 & P_2^n & \ldots & 0 \newline \vdots & \vdots & \ddots & 0 \newline 0 & 0 & \ldots & P_k^n \end{bmatrix}

Note (Long Term Evolution Using Simulation)

If $S$ is large of infinite, we may instead investigate the long term behavior using simulation.

Algorithm (Simulating a Markov Chain)

Repeat many times:

Draw $x_0$ according to $\pi(x_0)$
For $i$ in $1$ through $n$ :
Draw $x_i$ according to $\pi(x_i \mid x_{i-1})$ Use the distribution of the $x_n$ to approximate the distribution of $X_n$ .

Induction, Limiting and Stationary Distributions

Intuition (Proving stuff using Induction)

Many results about Markov chains can be proved using induction.

Formulate a statement $S(n)$ depending on a non-negative integer $n$ .
Prove $S(0)$ (the base case).
Prove that if $S(n)$ is true, then $S(n + 1)$ is true (the inductive step).

With this, one may conclude that $S(n)$ is true for all non-negative integers $n$ .

Example 2 (2-state Markov Chain)

Any 2-state Markov Chain has the transition matrix $\begin{bmatrix} 1 - p & p \newline q & 1 - q \end{bmatrix}$ for some $0 < p < 1$ and $0 < q < 1$ .

We can prove by induction that, for any $n \geq 0$ ,

\begin{bmatrix} 1 - p & p \newline q & 1 - q \end{bmatrix}^n = \frac{1}{p + q} \left[ \begin{bmatrix} q & p \newline q & p \end{bmatrix} + \begin{bmatrix} p & -p \newline -q & q \end{bmatrix} (1 - p - q)^n \right]

We can use this to study what happens when $n$ goes to infinity.

Solution

We proceed by induction.

Base case ( $n = 0$ ):

\begin{bmatrix} 1 & 0 \newline 0 & 1 \end{bmatrix} = \frac{1}{p + q} \left[ \begin{bmatrix} q & p \newline q & p \end{bmatrix} + \begin{bmatrix} p & -p \newline -q & q \end{bmatrix} (1 - p - q)^0 \right]

This is true, since the right-hand side simplifies to,

\frac{1}{p + q} \left[ \begin{bmatrix} q + p & p - p \newline q - q & p + q \end{bmatrix} \right] = \begin{bmatrix} 1 & 0 \newline 0 & 1 \end{bmatrix}

Inductive step: Assume the statement is true for some $n \geq 0$ . We need to show that it is true for $n + 1$ . We have,

\begin{align*} & \begin{bmatrix} 1 - p & p \newline q & 1 - q \end{bmatrix}^{n + 1} \newline & = \begin{bmatrix} 1 - p & p \newline q & 1 - q \end{bmatrix}^n \begin{bmatrix} 1 - p & p \newline q & 1 - q \end{bmatrix} \newline & = \frac{1}{p + q} \left[ \begin{bmatrix} q & p \newline q & p \end{bmatrix} + \begin{bmatrix} p & -p \newline -q & q \end{bmatrix} (1 - p - q)^n \right] \begin{bmatrix} 1 - p & p \newline q & 1 - q \end{bmatrix} \newline & = \frac{1}{p + q} \left[ \begin{bmatrix} q (1 - p) + p q & q p + p (1 - q) \newline q (1 - p) + p q & q p + p (1 - q) \end{bmatrix} + \begin{bmatrix} p (1 - p) - p q & p^2 - p (1 - q) \newline -q (1 - p) + q q & - q p + q (1 - q) \end{bmatrix} (1 - p - q)^n \right] \newline & = \frac{1}{p + q} \left[ \begin{bmatrix} q & p \newline q & p \end{bmatrix} + \begin{bmatrix} p & -p \newline -q & q \end{bmatrix} (1 - p - q)^{n + 1} \right] \end{align*}

Thus, by induction, the statement is true for all non-negative integers $n$ . From this, we can see that as $n$ goes to infinity, the powers of the transition matrix converge to,

\lim_{n \to \infty} \begin{bmatrix} 1 - p & p \newline q & 1 - q \end{bmatrix}^n = \frac{1}{p + q} \begin{bmatrix} q & p \newline q & p \end{bmatrix}

Definition 2 (Limiting Distribution)

A limiting distribution for a Markov chain with transition matrix $P$ is a probability vector $v$ such that,

\lim_{n \to \infty} (P^n)_{ij} = v_j

for all states $i$ and $j$ .

An equivalent formulation is, the limit $\lim_{n \to \infty} (P^n)_{ij}$ exists and does not depend on $i$ .

Another equivalent formulation is, $\lim_{n \to \infty} P^n$ is a stochastic matrix with all rows identical.

Further, a Markov chain has either no or one unique limiting distribution. If a limiting distribution exists, its probabilities correspond to the proportion of time steps the chain spends at each state in the long run.

Definition 3 (Stationary Distribution)

A stationary distribution for a Markov chain is a distribution that is unchanged when applying one step of the Markov chain. If $P$ is the transition matrix, then a probability vector $v$ represents a stationary distribution if and only if,

vP = v

A Markov chain can have zero, one, or many stationary distributions. Limiting distributions are always stationary distributions (but not necessarily vice versa).

Definition 4 (Regular Transition Matrices)

A stochastic matrix $P$ is positive if all entries are strictly positive, i.e., $P_{ij} > 0$ for all $i$ and $j$ . A stochastic matrix $P$ is regular if $P^n$ is positive for some $n > 0$ .

Theorem 1 (Limit Theorem for Regular Markov Chains)

If the transition matrix $P$ is regular, the limiting distribution exists. There are no other stationary distributions and the limiting distribution is positive, i.e., all states have positive probability in the limiting distribution.

Intuition (Finding Stationary Distributions)

Find the $v$ satisfying $vP = v$ by, for example,

Solving the linear system $vP = v$
Guessing at a $v$ , and showing that $vP = v$ , or,
Computing an eigenvector for the transpose $P^T$ belonging to the eigenvalue 1.

Having found a $v$ satisfying $vP = v$ , if the transition matrix $P$ is regular, we know $v$ represents the unique limiting distribution and the unique stationary distribution.

Example 3 (Random Walks on Undirected Graphs)

An undirected graph consists of nodes and undirected edges connecting them. An undirected graph defines a random walk Markov chain by, at every time step, following one of the edges out of a node, with equal probability.

When the graph is finite, we show that the vector $u$ is a stationary distribution, where $u_i = \frac{\mathrm{deg}(i)}{S}$ , where $\mathrm{deg}(i)$ is the number of edges going into node $i$ and $S$ is the sum over all nodes of their degrees.

If we further generalize, a weighted undirected graph is a graph with a positive weight at any edge between $i$ and $j$ for all $i$ and $j$ . If we define the Markov chain by choosing the next node with probabilities according to the weights. Then, when the graph is finite, the vector $u$ is a stationary distribution, where $u_i = \frac{w(i)}{e}$ , where $w(i)$ is the sum of the weights of the edges going into $i$ , and $e$ is the total sum over all nodes of their $w(i)$ values.

Intuition (Moving Between States)

State $j$ is accessible from state $i$ if $(P^n)_{ij} > 0$ for some $n \geq 0$ . States $i$ and $j$ communicate if $i$ is accessible from $j$ and $j$ is accessible from $i$ .

“Communication” is transitive, i.e., if $i$ communicates with $j$ and $j$ communicates with $k$ , then $i$ communicates with $k$ .

Communication is an equivalence relation, subdividing all states into communication classes. Communication classes can be found for example by drawing transition graphs.

Lastly, a Markov chain is irreducible if it has exactly one communication class.

Definition 5 (Recurrence and Transience)

Let $T_j$ be the first passage time to state $j$ ,

T_j = \min \{n > 0 : X_n = j \}

Let $f_j$ be the probability that a chain starting at $j$ will return to $j$ ,

f_j = P(T_j < \infty \mid X_0 = j)

A state $j$ is recurrent if a chain starting at $j$ will eventually revisit $j$ , i.e., if $f_j = 1$ .

A state $j$ is transient if a chain starting at $j$ has a positive probability of never revisiting $j$ , i.e., if $f_j < 1$ .

Note

The expected number of visits at $j$ when the chain starts at $i$ is given by $\sum_{n = 0}^{\infty} (P^n)_{ij}$ .

$j$ is recurrent if and only if $\sum_{n = 0}^{\infty} (P^n)_{jj} = \infty$ .

$j$ is transient if and only if $\sum_{n = 0}^{\infty} (P^n)_{jj} < \infty$ .

Note (Communication Classes)

The states of a communication class are either all recurrent or all transient. The states of a finite irreducible Markov chain are all recurrent.

If a state is recurrent, only states inside its communication class are accessible from it. If no states outside a finite communication class are accessible from it, then the class consists of recurrent states.