Part 6 - MCMC for Bayesian Inference

Introduction

In this part, we introduce how MCMC can be used for inference from a Bayesian perspective.

Discrete-Time Markov Chains with Continuous State Space

Definition 1 (Discrete-Time Markov Chains with Continuous State Space)

The basic definitions are the same as before. $\{X_i\}_{i = 0, 1, \ldots}$ is a sequence of continuous random variables satisfying the Markov property,

\pi(X_{n + 1} \mid X_0, X_1, \ldots, X_n) = \pi(X_{n + 1} \mid X_n)

for all $n$.

The limiting distribution has the same definition. A stationary distribution is a density $f(x)$ satisfying,

f(x_{n + 1}) = \int \pi(x_{n + 1} \mid x_n) f(x_n) \ dx_n.
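As a small illustration of a stationary density on a continuous state space, the sketch below simulates a Gaussian AR(1) chain. This chain is an assumed example that is not part of the original notes: its transition density is $\pi(x_{n + 1} \mid x_n) = \mathrm{Normal}(\phi x_n, 1)$, and for $|\phi| < 1$ its stationary density is $\mathrm{Normal}(0, 1 / (1 - \phi^2))$. The function name `simulate_ar1` and the value $\phi = 0.5$ are chosen only for illustration.

```python
import random


def simulate_ar1(phi, n_steps, x0=0.0, seed=0):
    """Simulate X_{n+1} = phi * X_n + eps with eps ~ Normal(0, 1)."""
    rng = random.Random(seed)
    x = x0
    path = []
    for _ in range(n_steps):
        # One draw from the transition density pi(x_{n+1} | x_n).
        x = phi * x + rng.gauss(0.0, 1.0)
        path.append(x)
    return path


phi = 0.5  # assumed value, any |phi| < 1 gives a stationary chain
path = simulate_ar1(phi, n_steps=200_000)

# Compare the empirical variance along the path with the stationary variance.
mean = sum(path) / len(path)
sample_var = sum((x - mean) ** 2 for x in path) / len(path)
stationary_var = 1.0 / (1.0 - phi ** 2)
print(sample_var, stationary_var)
```

The long-run variance of the simulated path should be close to the variance of the stationary density, which is what the stationarity equation above predicts.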

Ergodicity still means (with adjusted definitions) that the Markov chain is irreducible, aperiodic, and positive recurrent.

Ergodic Markov chains have a unique limiting distribution; the theorem is the same as before.

The Strong Law of Large Numbers for Markov Chains also holds in this setting. Lastly, the theory for Metropolis-Hastings also carries over (although some technical complications arise in the proof).

Bayesian Inference

Intuition (MCMC for Bayesian Inference)

Suppose we have some data $y_1, \ldots, y_n$, and we want to make a probabilistic prediction for $y_{\text{new}}$.

We (often) define a parameter $\theta$ and a probabilistic model so that $y_1, \ldots, y_n, y_{\text{new}}$ are all conditionally independent given $\theta$,

\pi(y_1, \ldots, y_n, y_{\text{new}}, \theta) = \left[\prod_{i=1}^{n} \pi(y_i \mid \theta) \right] \pi(y_{\text{new}} \mid \theta) \pi(\theta).

Then,

\begin{align*} \pi(y_{\text{new}} \mid y_1, \ldots, y_n) & = \int_{\theta} \pi(y_{\text{new}} \mid \theta) \pi(\theta \mid y_1, \ldots, y_n) \ d\theta \newline & = \mathbb{E}_{\theta \mid y_1, \ldots, y_n}[\pi(y_{\text{new}} \mid \theta)] \newline \end{align*}

Using the strong law of large numbers, we can make predictions by,

Simulating $\theta_1, \ldots, \theta_k$ from the posterior $\pi(\theta \mid y_1, \ldots, y_n)$.
Averaging,

\mathbb{E}_{\theta \mid y_1, \ldots, y_n}[\pi(y_{\text{new}} \mid \theta)] \approx \frac{1}{k} \sum_{i = 1}^{k} \pi(y_{\text{new}} \mid \theta_i)
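The two steps above can be sketched in a few lines when the posterior happens to be easy to sample from. The example below is a hypothetical one, not taken from the notes: it assumes the posterior is $\mathrm{Beta}(3, 2)$ and that $y_{\text{new}} \mid \theta \sim \mathrm{Bernoulli}(\theta)$, so that $\pi(y_{\text{new}} = 1 \mid \theta) = \theta$ and the exact answer is the posterior mean $3 / 5$. The helper name `predictive_estimate` is also an assumption.

```python
import random


def predictive_estimate(posterior_draws, likelihood_at_new):
    """Average pi(y_new | theta_i) over the draws theta_1, ..., theta_k."""
    values = [likelihood_at_new(theta) for theta in posterior_draws]
    return sum(values) / len(values)


rng = random.Random(1)

# Step 1: simulate from the (assumed) posterior theta | data ~ Beta(3, 2).
draws = [rng.betavariate(3.0, 2.0) for _ in range(100_000)]

# Step 2: average pi(y_new = 1 | theta) = theta over the draws.
estimate = predictive_estimate(draws, lambda theta: theta)
print(estimate)  # should be close to the exact value 3 / (3 + 2) = 0.6
```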

However, simulating from the posterior is often difficult. MCMC can help us here by simulating from a Markov chain having the posterior as its limiting distribution.

By the strong law of large numbers for Markov chains, we still have,

\mathbb{E}_{\theta \mid y_1, \ldots, y_n}[\pi(y_{\text{new}} \mid \theta)] = \lim_{k \to \infty} \frac{1}{k} \sum_{i = 1}^{k} \pi(y_{\text{new}} \mid \theta_i)

when $\theta_1, \ldots, \theta_k$ is a realization from a Markov chain with the posterior $\pi(\theta \mid y_1, \ldots, y_n)$ as its limiting distribution.

As $\pi(\theta \mid y_1, \ldots, y_n) \propto \prod_{i = 1}^{n} \pi(y_i \mid \theta) \pi(\theta)$ , we can use Metropolis-Hastings with some proposal distribution $q(\theta^{\star} \mid \theta)$ and acceptance probability,

a = \min\left(1, \frac{\pi(\theta^{\star} \mid y_1, \ldots, y_n) q(\theta \mid \theta^{\star})}{\pi(\theta \mid y_1, \ldots, y_n) q(\theta^{\star} \mid \theta)}\right)

To avoid underflow, we compute the ratio on the log scale: we generate $U \sim \mathrm{Uniform}(0, 1)$ and accept if,

U < \exp\left(\sum_{i = 1}^{n} \left(\log \pi(y_i \mid \theta^{\star}) - \log \pi(y_i \mid \theta) \right) + \log \pi(\theta^{\star}) - \log \pi(\theta) + \log q(\theta \mid \theta^{\star}) - \log q(\theta^{\star} \mid \theta)\right)
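A minimal sketch of this acceptance step, working entirely on the log scale, might look as follows. A proposal is accepted with probability $\min(1, \text{ratio})$, which is the same as accepting when $\log U$ is smaller than the log ratio. The function names and the callback signatures (`log_lik(y, theta)`, `log_prior(theta)`, and `log_q(a, b)` standing for $\log q(a \mid b)$) are assumptions made for illustration.

```python
import math
import random


def log_acceptance_ratio(theta, theta_star, data, log_lik, log_prior, log_q):
    """Log of the Metropolis-Hastings ratio for moving theta -> theta_star.

    log_q(a, b) is assumed to return log q(a | b).
    """
    log_ratio = sum(log_lik(y, theta_star) - log_lik(y, theta) for y in data)
    log_ratio += log_prior(theta_star) - log_prior(theta)
    log_ratio += log_q(theta, theta_star) - log_q(theta_star, theta)
    return log_ratio


def accept(log_ratio, rng=random):
    """Return True with probability min(1, exp(log_ratio))."""
    if log_ratio >= 0.0:
        return True  # ratio >= 1, always accept without calling exp
    u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is finite
    return math.log(u) < log_ratio


# Tiny usage example with an assumed Normal(theta, 1) likelihood,
# a flat prior, and a symmetric proposal (so the q terms cancel).
log_lik = lambda y, theta: -0.5 * (y - theta) ** 2
flat = lambda theta: 0.0
symmetric = lambda a, b: 0.0
lr = log_acceptance_ratio(0.0, 1.0, [1.0], log_lik, flat, symmetric)
print(lr, accept(lr))
```

Comparing $\log U$ with the log ratio avoids exponentiating large negative sums, which is where underflow would otherwise occur.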

Example 1 (Toy Example)

Consider the following,

\begin{align*} y \mid p & \sim \mathrm{Binomial}(n = 17, p) \newline p & \sim \mathrm{Beta}(\alpha = 2.3, \beta = 4.1) \newline y_{\text{new}} \mid p & \sim \mathrm{Binomial}(n = 3, p) \newline \end{align*}

We would like to compute $P(y_{\text{new}} = 1 \mid y = 4)$ .

We first note that,

p \mid y \sim \mathrm{Beta}(\alpha = 2.3 + y, \beta = 4.1 + n - y)

Thus, the marginal (predictive) distribution of $y$ is,

\begin{align*} \pi(y) & = \frac{\pi(y \mid p) \pi(p)}{\pi(p \mid y)} \newline & = \frac{\mathrm{Binomial}(y \mid n = 17, p) \mathrm{Beta}(p \mid \alpha = 2.3, \beta = 4.1)}{\mathrm{Beta}(p \mid \alpha = 2.3 + y, \beta = 4.1 + n - y)} \newline & = \frac{\binom{n}{y} p^y (1 - p)^{n - y} \cdot \frac{p^{\alpha - 1} (1 - p)^{\beta - 1}}{B(\alpha, \beta)}}{\frac{p^{\alpha + y - 1} (1 - p)^{\beta + n - y - 1}}{B(\alpha + y, \beta + n - y)}} \newline & = \binom{n}{y} \frac{B(\alpha + y, \beta + n - y)}{B(\alpha, \beta)} \newline \end{align*}

Since $y = 4$ and $n = 17$, the posterior is $\mathrm{Beta}(6.3, 17.1)$. Using this posterior in place of the prior in the formula above, the probability of 1 success among 3 trials is,

\binom{3}{1} \frac{B(6.3 + 1, 17.1 + 2)}{B(6.3, 17.1)} \approx 0.4034
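The closed-form expression above can be evaluated numerically with log-gamma functions, which avoids overflow in the Beta function. This is a minimal sketch using only the Python standard library; the helper names `log_beta` and `beta_binomial_pmf` are assumptions introduced here.

```python
import math


def log_beta(a, b):
    """log B(a, b) computed from log-gamma functions."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(y, n, a, b):
    """binom(n, y) * B(a + y, b + n - y) / B(a, b)."""
    log_ratio = log_beta(a + y, b + n - y) - log_beta(a, b)
    return math.comb(n, y) * math.exp(log_ratio)


# Posterior parameters after observing y = 4 successes in n = 17 trials.
a_post = 2.3 + 4
b_post = 4.1 + 17 - 4

# Probability of 1 success among 3 new trials.
predictive = beta_binomial_pmf(1, 3, a_post, b_post)
print(round(predictive, 4))
```

A quick sanity check is that `beta_binomial_pmf(y, 3, a_post, b_post)` summed over `y = 0, 1, 2, 3` gives 1.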

Alternatively, we can use MCMC to estimate this value.
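One possible way to do this is a random-walk Metropolis sampler on $p$, sketched below. Several choices here are assumptions rather than part of the original example: the Gaussian proposal with standard deviation 0.2, the starting value 0.5, the chain length, and the burn-in of 10,000 iterations. Because the proposal is symmetric, the $q$ terms cancel in the acceptance ratio. The estimate can be compared with the closed-form value above.

```python
import math
import random


def log_posterior(p, y=4, n=17, alpha=2.3, beta=4.1):
    """Unnormalized log posterior: Binomial likelihood times Beta prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # proposals outside (0, 1) are always rejected
    return (y + alpha - 1) * math.log(p) + (n - y + beta - 1) * math.log(1 - p)


def metropolis(n_iter, step=0.2, p0=0.5, seed=0):
    """Random-walk Metropolis with a symmetric Normal(p, step^2) proposal."""
    rng = random.Random(seed)
    p, lp = p0, log_posterior(p0)
    chain = []
    for _ in range(n_iter):
        p_star = p + rng.gauss(0.0, step)
        lp_star = log_posterior(p_star)
        # Accept if log U < log ratio; 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < lp_star - lp:
            p, lp = p_star, lp_star
        chain.append(p)
    return chain


chain = metropolis(60_000)
kept = chain[10_000:]  # discard an assumed burn-in period

# Average pi(y_new = 1 | p) = 3 p (1 - p)^2 over the retained draws.
estimate = sum(3 * p * (1 - p) ** 2 for p in kept) / len(kept)
print(estimate)
```

The draws are correlated, so the estimate is noisier than one based on the same number of independent posterior draws; a longer chain or a tuned step size would reduce the error.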