Part 6 - MCMC for Bayesian Inference

MVE550
Date: November 24, 2025
Last modified: November 26, 2025

Introduction

In this part we introduce how MCMC can be used for inference from a Bayesian perspective.

Discrete-Time Markov Chains with Continuous State Space

Definition: Discrete-Time Markov Chains with Continuous State Space

The basic definitions are the same as before. $\{X_i\}_{i = 0, 1, \ldots}$ is a sequence of continuous random variables satisfying the Markov property, $$ \pi(X_{n + 1} \mid X_0, X_1, \ldots, X_n) = \pi(X_{n + 1} \mid X_n) $$ for all $n$.

The limiting distribution has the same definition. A stationary distribution is a density $f(x)$ satisfying, $$ f(x_{n + 1}) = \int \pi(x_{n + 1} \mid x_n) f(x_n) \ dx_n. $$ Ergodicity still means (with adjusted definitions) that the Markov chain is irreducible, aperiodic, and positive recurrent.
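As an illustration (not from the notes), consider the Gaussian AR(1) chain $X_{n+1} = aX_n + \varepsilon_n$ with $\varepsilon_n \sim N(0, 1)$, a simple continuous-state Markov chain whose stationary density is $N(0, 1/(1 - a^2))$. A quick simulation sketch checking that one transition step preserves the stationary moments:

```python
import random

random.seed(0)

# Illustrative AR(1) chain: X_{n+1} = a*X_n + eps, eps ~ N(0, 1).
# Its stationary density is N(0, 1/(1 - a^2)).
a = 0.5
var_stat = 1.0 / (1.0 - a * a)  # stationary variance

n_chains = 200_000
# Draw X_0 from the stationary distribution ...
xs = [random.gauss(0.0, var_stat ** 0.5) for _ in range(n_chains)]
# ... take one transition step in each chain ...
xs = [a * x + random.gauss(0.0, 1.0) for x in xs]
# ... and check that mean and variance are (approximately) unchanged.
mean = sum(xs) / n_chains
var = sum((x - mean) ** 2 for x in xs) / n_chains
print(f"mean={mean:.3f}, var={var:.3f} (stationary var={var_stat:.3f})")
```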

Ergodic Markov chains have a unique limiting distribution, equal to the stationary distribution; the theorem is the same as before.

The Strong Law of Large Numbers for Markov Chains also holds in this setting. Lastly, the theory for Metropolis-Hastings carries over as well (although some technical complications arise in the proof).

Bayesian Inference

Intuition: MCMC for Bayesian Inference

Consider that we have some data $y_1, \ldots, y_n$, and we want to make a probability prediction for $y_{\text{new}}$.

We (often) define a parameter $\theta$, and a probabilistic model so that $y_1, \ldots, y_n, y_{\text{new}}$ are all conditionally independent given $\theta$, $$ \pi(y_1, \ldots, y_n, y_{\text{new}}, \theta) = \left[\prod_{i=1}^{n} \pi(y_i \mid \theta) \right] \pi(y_{\text{new}} \mid \theta) \pi(\theta). $$ Then, $$ \begin{align*} \pi(y_{\text{new}} \mid y_1, \ldots, y_n) & = \int_{\theta} \pi(y_{\text{new}} \mid \theta) \pi(\theta \mid y_1, \ldots, y_n) \ d\theta \newline & = \mathbb{E}_{\theta \mid y_1, \ldots, y_n}[\pi(y_{\text{new}} \mid \theta)]. \newline \end{align*} $$ Using the strong law of large numbers, we can make predictions by,

  • Simulating $\theta_1, \ldots, \theta_k$ from the posterior $\pi(\theta \mid y_1, \ldots, y_n)$.
  • Averaging, $$ \mathbb{E}_{\theta \mid y_1, \ldots, y_n}[\pi(y_{\text{new}} \mid \theta)] \approx \frac{1}{k} \sum_{i = 1}^{k} \pi(y_{\text{new}} \mid \theta_i) $$ However, simulating from the posterior is often difficult. MCMC can help us here by simulating from a Markov chain having the posterior as its limiting distribution.

By the strong law of large numbers for Markov chains, we still have (as $k \to \infty$), $$ \mathbb{E}_{\theta \mid y_1, \ldots, y_n}[\pi(y_{\text{new}} \mid \theta)] = \lim_{k \to \infty} \frac{1}{k} \sum_{i = 1}^{k} \pi(y_{\text{new}} \mid \theta_i) $$ when $\theta_1, \ldots, \theta_k$ is a realization from a Markov chain with the posterior $\pi(\theta \mid y_1, \ldots, y_n)$ as its limiting distribution.
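A minimal sketch of the averaging step with exact (i.i.d.) posterior draws, using a hypothetical conjugate normal model (the data and prior below are illustrative assumptions, not from the notes): with $y_i \mid \theta \sim N(\theta, 1)$ and $\theta \sim N(0, 1)$, the posterior is $N(\sum_i y_i / (n+1),\ 1/(n+1))$ and the closed-form predictive is $N(\mu_{\text{post}},\ 1 + \sigma^2_{\text{post}})$, which we can compare against.

```python
import math
import random

random.seed(1)

# Hypothetical conjugate setup: y_i | theta ~ N(theta, 1), theta ~ N(0, 1).
data = [1.2, 0.4, 2.1, 1.7]
n = len(data)
mu_post = sum(data) / (n + 1)   # posterior mean
var_post = 1.0 / (n + 1)        # posterior variance

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Monte Carlo: draw theta from the posterior, average pi(y_new | theta).
y_new = 1.0
k = 100_000
thetas = [random.gauss(mu_post, math.sqrt(var_post)) for _ in range(k)]
mc = sum(normal_pdf(y_new, t, 1.0) for t in thetas) / k

# Closed-form predictive for comparison: y_new ~ N(mu_post, 1 + var_post).
exact = normal_pdf(y_new, mu_post, 1.0 + var_post)
print(f"MC estimate={mc:.4f}, exact={exact:.4f}")
```

Here exact posterior sampling is possible; MCMC replaces the `random.gauss` draws when it is not.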

As $\pi(\theta \mid y_1, \ldots, y_n) \propto \prod_{i = 1}^{n} \pi(y_i \mid \theta) \pi(\theta)$, we can use Metropolis-Hastings with some proposal distribution $q(\theta^{\star} \mid \theta)$ and acceptance probability, $$ a = \min\left(1, \frac{\pi(\theta^{\star} \mid y_1, \ldots, y_n) q(\theta \mid \theta^{\star})}{\pi(\theta \mid y_1, \ldots, y_n) q(\theta^{\star} \mid \theta)}\right) $$ To avoid numerical under- and overflow we work on the log scale: generate $U \sim \mathrm{Uniform}(0, 1)$ and accept if, $$ \log U < \sum_{i = 1}^{n} \left(\log \pi(y_i \mid \theta^{\star}) - \log \pi(y_i \mid \theta) \right) + \log \pi(\theta^{\star}) - \log \pi(\theta) + \log q(\theta \mid \theta^{\star}) - \log q(\theta^{\star} \mid \theta) $$
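The log-scale acceptance test above can be sketched as follows; the standard-normal target and the step size are illustrative assumptions, and the random-walk proposal is symmetric, so the $q$-terms cancel:

```python
import math
import random

random.seed(2)

def log_target(theta):
    # Example log-posterior, up to a constant: standard normal.
    return -0.5 * theta * theta

def metropolis_hastings(log_target, theta0, k, step=1.0):
    """Random-walk Metropolis; symmetric proposal, so q-terms cancel."""
    theta, samples = theta0, []
    for _ in range(k):
        prop = theta + random.gauss(0.0, step)
        # Accept when log U < log-ratio: avoids exp() under-/overflow.
        if math.log(random.random()) < log_target(prop) - log_target(theta):
            theta = prop
        samples.append(theta)
    return samples

samples = metropolis_hastings(log_target, 0.0, 50_000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.3f}, var={var:.3f}")
```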

Example: Toy Example

Consider the following, $$ \begin{align*} y \mid p & \sim \mathrm{Binomial}(n = 17, p) \newline p & \sim \mathrm{Beta}(\alpha = 2.3, \beta = 4.1) \newline y_{\text{new}} \mid p & \sim \mathrm{Binomial}(n = 3, p) \newline \end{align*} $$ We would like to compute $P(y_{\text{new}} = 1 \mid y = 4)$.

We first note that, $$ p \mid y \sim \mathrm{Beta}(\alpha = 2.3 + y, \beta = 4.1 + n - y). $$ Thus, rearranging Bayes' rule, the marginal (predictive) distribution is, $$ \begin{align*} \pi(y) & = \frac{\pi(y \mid p) \pi(p)}{\pi(p \mid y)} \newline & = \frac{\mathrm{Binomial}(y \mid n = 17, p) \mathrm{Beta}(p \mid \alpha = 2.3, \beta = 4.1)}{\mathrm{Beta}(p \mid \alpha = 2.3 + y, \beta = 4.1 + n - y)} \newline & = \frac{\binom{n}{y} p^y (1 - p)^{n - y} \cdot \frac{p^{\alpha - 1} (1 - p)^{\beta - 1}}{B(\alpha, \beta)}}{\frac{p^{\alpha + y - 1} (1 - p)^{\beta + n - y - 1}}{B(\alpha + y, \beta + n - y)}} \newline & = \binom{n}{y} \frac{B(\alpha + y, \beta + n - y)}{B(\alpha, \beta)} \newline \end{align*} $$ The same formula applies to $y_{\text{new}}$ with the posterior in place of the prior. Since $y = 4$ and $n = 17$, the posterior is $\mathrm{Beta}(6.3, 17.1)$, and predicting 1 success among 3 new trials gives, $$ \binom{3}{1} \frac{B(6.3 + 1, 17.1 + 2)}{B(6.3, 17.1)} \approx 0.4034 $$ Alternatively, we can use MCMC to estimate this value.
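The MCMC estimate for the toy example can be sketched as follows; the random-walk proposal, step size, and chain length are arbitrary choices, and the estimate should land near the exact value above:

```python
import math
import random

random.seed(3)

# Toy example: y | p ~ Binomial(17, p), p ~ Beta(2.3, 4.1), observed y = 4.
y, n = 4, 17
alpha, beta = 2.3, 4.1

def log_post(p):
    # log pi(p | y) up to a constant: binomial likelihood x beta prior.
    if not 0.0 < p < 1.0:
        return -math.inf  # proposals outside (0, 1) are always rejected
    return (y + alpha - 1) * math.log(p) + (n - y + beta - 1) * math.log(1 - p)

# Random-walk Metropolis on p (symmetric proposal, q-terms cancel).
p, samples = 0.5, []
for _ in range(200_000):
    prop = p + random.gauss(0.0, 0.1)
    if math.log(random.random()) < log_post(prop) - log_post(p):
        p = prop
    samples.append(p)

# P(y_new = 1 | y = 4): average Binomial(1 | 3, p) over the posterior draws.
est = sum(3 * s * (1 - s) ** 2 for s in samples) / len(samples)
print(f"MCMC estimate: {est:.4f}")
```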