Overview

Part 4 - Markov chains II

MVE550
Date: November 14, 2025
Last modified: November 16, 2025

Introduction

In this part, we will define more important properties and concepts related to Markov chains. First, we will state the limit theorem for finite irreducible Markov chains and see how it is weaker than the limit theorem for regular Markov chains. Then, we will define periodicity and ergodic Markov chains. Finally, we will define and discuss time reversibility, canonical distributions, and absorbing chains.

Limit Theorem for Finite Irreducible Markov Chains

Theorem: Limit Theorem for Finite Irreducible Markov Chains
Recall

In a finite irreducible Markov chain, all states are recurrent.

Let $\mu_j = \mathbb{E}[T_j \mid X_0 = j]$ be the expected return time to state $j$. Then $\mu_j < \infty$ for all states $j$ and the vector $v$ with $v_j = \frac{1}{\mu_j}$ is the unique stationary distribution.

Further, $$ v_j = \lim_{n \to \infty} \frac{1}{n} \sum_{m = 0}^{n - 1} (P^m)_{ij} $$

Note

All finite regular Markov chains are finite irreducible Markov chains, but not vice versa. The overall conclusion here is weaker than that for finite regular Markov chains. Not all finite irreducible Markov chains have limiting distributions.
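As a numerical illustration (the chain below is a toy example, not from the notes), the deterministic two-state flip chain is irreducible but not regular: $P^n$ has no limit, yet the Cesàro averages from the theorem still converge to the stationary distribution $v = (1/2, 1/2)$, consistent with $\mu_j = 1/v_j = 2$.

```python
import numpy as np

# Deterministic two-state flip: irreducible, but with no limiting distribution.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# P^n oscillates between the identity and P, so lim_{n -> inf} P^n does not exist.
assert np.array_equal(np.linalg.matrix_power(P, 2), np.eye(2))
assert np.array_equal(np.linalg.matrix_power(P, 3), P)

# The Cesaro averages from the theorem still converge; here exactly, since
# averaging 500 copies of I and 500 copies of P gives all entries 1/2.
n = 1000
avg = sum(np.linalg.matrix_power(P, m) for m in range(n)) / n
print(avg)  # every entry 0.5, i.e. v = (1/2, 1/2) and mu_j = 1/v_j = 2
```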

Intuition: Extension to Infinite Irreducible Markov Chains

In a finite irreducible Markov chain, all states are recurrent, and all expected return times $\mu_j$ are finite.

In an infinite Markov chain, however, states may be recurrent but have infinite expected return times. Such states are called null recurrent, while recurrent states with finite expected return times are called positive recurrent.

The previous theorem can be extended to infinite irreducible Markov chains where all states are positive recurrent.

Periodicity and Ergodic Markov Chains

Definition: Periodicity

The period of a state $i$ is the greatest common divisor of all $n > 0$ such that $(P^n)_{ii} > 0$. All states of a communication class have the same period.

Proof: Periodicity is the Same for all States in a Communication Class

Let $i$ and $j$ be two states in the same communication class. Then, there exist $m, k > 0$ such that $(P^m)_{ji} > 0$ and $(P^k)_{ij} > 0$. Let $d_i$ and $d_j$ be the periods of states $i$ and $j$, respectively. For any $n$ such that $(P^n)_{ii} > 0$, we have, $$ (P^{m + n + k})_{jj} \geq (P^m)_{ji} (P^n)_{ii} (P^k)_{ij} > 0 $$ Thus, $d_j$ divides $m + n + k$. Since $(P^{m + k})_{jj} \geq (P^m)_{ji} (P^k)_{ij} > 0$, $d_j$ also divides $m + k$, so it must divide $n$. Hence $d_j$ divides every $n$ with $(P^n)_{ii} > 0$, and therefore $d_j$ divides $d_i$. By symmetry, $d_i$ divides $d_j$. Therefore, $d_i = d_j$.

A Markov chain is periodic if it is irreducible and all states have period greater than 1.

A Markov chain is aperiodic if it is irreducible and all states have period 1.
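The period can be computed directly from the definition. A minimal sketch (the two chains and the truncation bound `max_n` are illustrative choices): take the gcd of all return times $n$ with $(P^n)_{ii} > 0$ up to some cutoff.

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, i, max_n=50):
    """Period of state i, approximated as the gcd of all n <= max_n
    with (P^n)_{ii} > 0. Truncating at max_n is a practical shortcut;
    for small chains the gcd stabilises quickly."""
    returns = [n for n in range(1, max_n + 1)
               if np.linalg.matrix_power(P, n)[i, i] > 0]
    return reduce(gcd, returns) if returns else 0

flip = np.array([[0.0, 1.0], [1.0, 0.0]])   # returns only at even n: period 2
lazy = np.array([[0.5, 0.5], [0.5, 0.5]])   # returns at every n: period 1
print(period(flip, 0), period(lazy, 0))     # 2 1
```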

Definition: Ergodic Markov Chains

A Markov chain is ergodic if,

  1. It is irreducible.

  2. It is aperiodic.

  3. All states are positive recurrent, i.e., have finite expected return times (this always holds when the state space is finite).

Theorem: Fundamental Limit Theorem for Ergodic Markov Chains

There exists a unique stationary distribution $v$, which is also the limiting distribution of the chain.

Note

We can also show that a finite Markov chain is ergodic if and only if its transition matrix is regular.
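For a finite chain, the equivalence with regularity gives a concrete test: check whether some power of $P$ is strictly positive. A sketch under illustrative choices (the two example chains are made up; the cutoff uses the standard bound $(s-1)^2 + 1$ for primitive matrices):

```python
import numpy as np

def is_regular(P, max_power=None):
    """Return True if some power of P is strictly positive.

    For an s-state chain it suffices to check powers up to (s - 1)^2 + 1
    (a standard bound for primitive matrices), used as the default cutoff."""
    s = P.shape[0]
    if max_power is None:
        max_power = (s - 1) ** 2 + 1
    Pn = np.eye(s)
    for _ in range(max_power):
        Pn = Pn @ P
        if np.all(Pn > 0):
            return True
    return False

flip = np.array([[0.0, 1.0], [1.0, 0.0]])   # period 2: not regular, not ergodic
walk = np.array([[0.5, 0.5], [1.0, 0.0]])   # P^2 > 0: regular, hence ergodic
print(is_regular(flip), is_regular(walk))   # False True
```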

Time Reversibility, Canonical Distributions, and Absorbing Chains

Definition: Time Reversibility

Let $P$ be the transition matrix of an irreducible Markov chain with stationary distribution $v$. The chain is time reversible if, when run from its stationary distribution, it looks the same moving forwards as backwards, i.e., $$ \pi(X_k = i, X_{k + 1} = j) = \pi(X_k = j, X_{k + 1} = i) $$ This may also be written as $v_i P_{ij} = v_j P_{ji}$ for all states $i$ and $j$; these are called the detailed balance equations.

Note

If $x$ is a probability vector satisfying $x_i P_{ij} = x_j P_{ji}$ for all states $i$ and $j$, then $x$ is the stationary distribution of the chain, and the chain is time reversible.

Proof

We have, $$ \begin{align*} (xP)_j & = \sum_{i} x_i P_{ij} \newline & = \sum_i x_j P_{ji} \newline & = x_j \sum_i P_{ji} \newline & = x_j \end{align*} $$ Thus, $xP = x$, so $x$ is the stationary distribution.

If a Markov chain is defined as a random walk on a weighted undirected graph, then it is time reversible.

Proof

Let $w(i, j)$ be the weight of the edge between nodes $i$ and $j$ (0 if no edge exists). Let $w(i) = \sum_{k} w(i, k)$ be the sum of weights of edges incident to node $i$. The transition matrix is given by $P_{ij} = \frac{w(i, j)}{w(i)}$. Let $v_i = \frac{w(i)}{e}$, where $e = \sum_i w(i)$ is the total weight over all nodes. Then, $$ \begin{align*} v_i P_{ij} & = \frac{w(i)}{e} \cdot \frac{w(i, j)}{w(i)} \newline & = \frac{w(i, j)}{e} \newline & = \frac{w(j, i)}{e} \newline & = \frac{w(j)}{e} \cdot \frac{w(j, i)}{w(j)} \newline & = v_j P_{ji} \end{align*} $$ Thus, the detailed balance equations are satisfied, so the chain is time reversible.
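The construction above is easy to verify numerically. A minimal sketch with a hypothetical symmetric weight matrix on three nodes (the weights are arbitrary choices): build $P_{ij} = w(i,j)/w(i)$ and $v_i = w(i)/e$, then check detailed balance and stationarity.

```python
import numpy as np

# Hypothetical symmetric weight matrix on 3 nodes (0 means no edge).
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])

w = W.sum(axis=1)          # w(i): total weight of edges incident to node i
e = w.sum()                # e: total weight over all nodes
P = W / w[:, None]         # P_ij = w(i, j) / w(i)
v = w / e                  # candidate stationary distribution

# Detailed balance: v_i P_ij = w(i, j) / e, which is symmetric in i and j.
assert np.allclose(v[:, None] * P, (v[:, None] * P).T)
# Consequently v is stationary: v P = v.
assert np.allclose(v @ P, v)
print(v)   # equals [3, 5, 4] / 12
```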

If a finite Markov chain is time reversible, it can be represented as a random walk on a weighted undirected graph.

Proof

Let $P$ be the transition matrix of a finite time reversible Markov chain with stationary distribution $v$. Let $w(i, j) = v_i P_{ij}$ be the weight of the edge between nodes $i$ and $j$. By the detailed balance equations, $w(i, j) = v_i P_{ij} = v_j P_{ji} = w(j, i)$, so the weights are symmetric and define an undirected graph. Then, $$ \sum_{j} w(i, j) = \sum_{j} v_i P_{ij} = v_i \sum_{j} P_{ij} = v_i $$ Thus, the sum of weights of edges incident to node $i$ is $v_i$. The random walk on this graph has transition matrix $$ \frac{w(i, j)}{\sum_{k} w(i, k)} = \frac{v_i P_{ij}}{v_i} = P_{ij}, $$ which is exactly $P$. Thus, the Markov chain can be represented as a random walk on a weighted undirected graph.

Definition: Canonical Distributions

The states of a Markov chain can be subdivided into communication classes, each consisting either only of transient states or only of recurrent states. Let $T$ denote the union of all communication classes with transient states. Let the remaining (recurrent) communication classes be $R_1, R_2, \ldots, R_m$.

Each $R_i$ must necessarily be closed in the sense that no states outside $R_i$ are accessible from $R_i$. Ordering states according to $T, R_1, R_2, \ldots, R_m$, the transition matrix $P$ has the block structure, $$ P = \begin{bmatrix} \star & \star & \star & \ldots & \star \newline 0 & P_1 & 0 & \ldots & 0 \newline \vdots & \vdots & \vdots & \ddots & \vdots \newline 0 & 0 & 0 & \ldots & P_m \end{bmatrix}. $$ We will discuss later on what $\star$ stands for here.

Note that, $$ P^n = \begin{bmatrix} \star & \star & \star & \ldots & \star \newline 0 & P_1^n & 0 & \ldots & 0 \newline \vdots & \vdots & \vdots & \ddots & \vdots \newline 0 & 0 & 0 & \ldots & P_m^n \end{bmatrix} $$ and we can take the limits of each $P^n_i$, if they exist.
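The preserved zero blocks can be checked numerically. A sketch with a hypothetical four-state chain already in canonical order (state 0 transient, $R_1 = \{1, 2\}$, $R_2 = \{3\}$; the probabilities are made up):

```python
import numpy as np

# Hypothetical chain in canonical order: state 0 transient,
# {1, 2} a recurrent class, {3} a second (absorbing) recurrent class.
P = np.array([[0.2, 0.3, 0.2, 0.3],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.4, 0.6, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

Pn = np.linalg.matrix_power(P, 20)
# The zero blocks below the first block row persist in every power:
# recurrent classes are closed, so no mass flows out of them.
assert np.allclose(Pn[1:, 0], 0)    # nothing returns to the transient state
assert np.allclose(Pn[1:3, 3], 0)   # R_1 cannot reach R_2
assert np.allclose(Pn[3, :3], 0)    # R_2 cannot reach R_1 or T
print(np.round(Pn, 3))
```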

Definition: Absorbing Chains

State $i$ is absorbing if $P_{ii} = 1$. A Markov chain is an absorbing chain if it has at least one absorbing state.

By reordering the states, the transition matrix for an absorbing chain can be written in the block form, $$ P = \begin{bmatrix} Q & R \newline 0 & I \end{bmatrix} $$ where $I$ is the identity matrix, $0$ is a matrix of zeroes, and $Q$ corresponds to transient states. We can prove by induction that, $$ P^n = \begin{bmatrix} Q^n & (I + Q + Q^2 + \ldots + Q^{n - 1}) R \newline 0 & I \end{bmatrix} $$

Proof: Induction for Powers of Transition Matrix of Absorbing Chains
  • Base case ($n = 1$): $$ \begin{bmatrix} Q & R \newline 0 & I \end{bmatrix} = \begin{bmatrix} Q^1 & (I) R \newline 0 & I \end{bmatrix} $$
  • Inductive step: Assume the statement is true for some $n \geq 1$. We need to show that it is true for $n + 1$. $$ \begin{align*} & \begin{bmatrix} Q & R \newline 0 & I \end{bmatrix}^{n + 1} \newline & = \begin{bmatrix} Q & R \newline 0 & I \end{bmatrix}^n \begin{bmatrix} Q & R \newline 0 & I \end{bmatrix} \newline & = \begin{bmatrix} Q^n & (I + Q + Q^2 + \ldots + Q^{n - 1}) R \newline 0 & I \end{bmatrix} \begin{bmatrix} Q & R \newline 0 & I \end{bmatrix} \newline & = \begin{bmatrix} Q^{n + 1} & Q^n R + (I + Q + Q^2 + \ldots + Q^{n - 1}) R \newline 0 & I \end{bmatrix} \newline & = \begin{bmatrix} Q^{n + 1} & (I + Q + Q^2 + \ldots + Q^n) R \newline 0 & I \end{bmatrix} \end{align*} $$ Thus, by induction, the statement is true for all positive integers $n$.

Taking the limit and using $\lim_{n \to \infty} Q^n = 0$ (since all states corresponding to $Q$ are transient), we have, $$ \lim_{n \to \infty} P^n = \begin{bmatrix} 0 & (I - Q)^{-1} R \newline 0 & I \end{bmatrix} = \begin{bmatrix} 0 & FR \newline 0 & I \end{bmatrix} $$ where $F = (I - Q)^{-1} = \lim_{n \to \infty} (I + Q + Q^2 + \ldots + Q^{n - 1})$ is called the fundamental matrix.

The probability to be absorbed in a particular absorbing state given a start in a transient state is given by the entries of $FR$.

Further, the expected number of visits to state $j$ for a chain that starts in the transient state $i$ is given by $F_{ij}$ (as this is equal to $\sum_{n = 0}^{\infty} (P^n)_{ij}$).

Thus, the expected number of steps until absorption, starting from each transient state, is given by the vector $F \mathbf{1}$, where $\mathbf{1}$ is a column vector of ones.
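These formulas can be sketched on a small gambler's ruin chain (a hypothetical example: states $\{0, 1, 2, 3\}$, fair coin flips, absorption at 0 and 3; states ordered as transient $1, 2$, then absorbing $0, 3$):

```python
import numpy as np

# Gambler's ruin with a fair coin; transient states 1 and 2,
# absorbing states 0 and 3, ordered to match the block form [[Q, R], [0, I]].
Q = np.array([[0.0, 0.5],
              [0.5, 0.0]])
R = np.array([[0.5, 0.0],
              [0.0, 0.5]])

F = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix F = (I - Q)^{-1}
absorb = F @ R                     # absorption probabilities (rows: start 1, 2)
steps = F @ np.ones(2)             # expected steps until absorption, F 1

print(absorb)   # [[2/3, 1/3], [1/3, 2/3]]: from state 1, ruin w.p. 2/3
print(steps)    # [2., 2.]
```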

Note

Given an irreducible Markov chain, to compute the expected number of steps needed to go from state $i$ to the first visit to state $j$, one can change the chain into one where state $j$ is absorbing, and compute the expected number of steps until absorption when starting at state $i$.
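The trick in the note can be sketched as follows, using a hypothetical irreducible three-state chain (the transition probabilities are made up): to get the expected number of steps from state 0 to the first visit of state 2, drop state 2's row and column, and apply $F \mathbf{1}$ to the remaining transient block.

```python
import numpy as np

# Hypothetical irreducible 3-state chain (each row sums to 1).
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])

# Make state 2 absorbing: the other states become transient, with
# transition block Q given by deleting state 2's row and column.
transient = [0, 1]
Q = P[np.ix_(transient, transient)]
F = np.linalg.inv(np.eye(len(transient)) - Q)   # fundamental matrix
hit = F @ np.ones(len(transient))               # expected steps to reach state 2

print(hit)   # [4., 2.]: 4 expected steps from state 0, 2 from state 1
```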