In this part we introduce how MCMC can be used for inference from a Bayesian perspective.
Discrete-Time Markov Chains with Continuous State Space
Definition 1 (Discrete-Time Markov Chains with Continuous State Space)
The basic definitions so far remain the same. $\{X_i\}_{i=0,1,\dots}$ is a set of continuous random variables satisfying the Markov property,
$$\pi(X_{n+1} \mid X_0, X_1, \dots, X_n) = \pi(X_{n+1} \mid X_n)$$
for all $n$.
The limiting distribution has the same definition. A stationary distribution is a density $f(x)$ satisfying
$$f(x_{n+1}) = \int \pi(x_{n+1} \mid x_n)\, f(x_n)\, dx_n.$$
Ergodicity still means (with adjusted definitions) that the Markov chain is irreducible, aperiodic, and positive recurrent.
Ergodic Markov chains have a unique limiting distribution; the theorem is the same as before.
The Strong Law of Large Numbers for Markov Chains also holds in this setting.
Lastly, the theory for Metropolis-Hastings also carries over (although some technical complications arise in the proof).
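To make the stationary density and the strong law of large numbers concrete, here is a minimal sketch using a Gaussian AR(1) chain. The chain, the parameter values `RHO` and `SIGMA`, and the helper `step` are illustrative assumptions and not part of the definitions above. For this chain the transition density is N(RHO * x, SIGMA^2) and the stationary density is N(0, SIGMA^2 / (1 - RHO^2)).

```python
import random
import statistics

# Hypothetical AR(1) parameters, chosen only for illustration.
RHO, SIGMA = 0.8, 1.0
# Variance of the stationary density N(0, SIGMA^2 / (1 - RHO^2)).
STATIONARY_VAR = SIGMA**2 / (1 - RHO**2)


def step(x, rng):
    """Draw x_next from the transition density N(RHO * x, SIGMA^2)."""
    return RHO * x + rng.gauss(0.0, SIGMA)


rng = random.Random(0)

# Stationarity check: if x_n is drawn from f, then x_{n+1} should
# again be distributed according to f (same variance here).
start = [rng.gauss(0.0, STATIONARY_VAR**0.5) for _ in range(100_000)]
after_one_step = [step(x, rng) for x in start]
var_after_one_step = statistics.pvariance(after_one_step)

# Strong law of large numbers: a single chain, started far from the
# bulk of f, whose running average of g(x) = x^2 approaches E_f[X^2].
x = 10.0
n_steps = 200_000
total = 0.0
for _ in range(n_steps):
    x = step(x, rng)
    total += x * x
ergodic_average = total / n_steps

print("stationary variance:   ", STATIONARY_VAR)
print("variance after a step: ", var_after_one_step)
print("ergodic average of x^2:", ergodic_average)
```

Both printed estimates should land close to the stationary variance, the first because one transition leaves f unchanged, the second because time averages along an ergodic chain converge to expectations under the limiting distribution.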
Bayesian Inference
Intuition (MCMC for Bayesian Inference)
Suppose we have some data $y_1,\dots,y_n$, and we want to make a probabilistic prediction for $y_{\text{new}}$.
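We (often) define a parameter $\theta$ and a probabilistic model such that $y_1,\dots,y_n,y_{\text{new}}$ are all conditionally independent given $\theta$, so that the predictive density is a posterior expectation,
$$\pi(y_{\text{new}} \mid y_1,\dots,y_n) = \int \pi(y_{\text{new}} \mid \theta)\, \pi(\theta \mid y_1,\dots,y_n)\, d\theta = \mathbb{E}_{\theta \mid y_1,\dots,y_n}\left[\pi(y_{\text{new}} \mid \theta)\right].$$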
Using the strong law of large numbers, we can make predictions by
Simulating $\theta_1,\dots,\theta_k$ from the posterior $\pi(\theta \mid y_1,\dots,y_n)$.
Averaging,
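$$\mathbb{E}_{\theta \mid y_1,\dots,y_n}\left[\pi(y_{\text{new}} \mid \theta)\right] \approx \frac{1}{k} \sum_{i=1}^{k} \pi(y_{\text{new}} \mid \theta_i)$$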
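The two steps above can be sketched in a case where the posterior can be sampled directly. The model is an assumption made for illustration: Bernoulli observations with success probability theta and a uniform Beta(1, 1) prior, so that the posterior is Beta(1 + s, 1 + n - s) with s the number of ones. The data `y` are hypothetical.

```python
import random

# Hypothetical binary observations y_1, ..., y_n.
y = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
n, s = len(y), sum(y)

# Assumed model: Bernoulli(theta) likelihood with a uniform Beta(1, 1)
# prior, which gives a Beta(1 + s, 1 + n - s) posterior.
a_post, b_post = 1 + s, 1 + n - s

rng = random.Random(0)
k = 100_000

# Step 1: simulate theta_1, ..., theta_k from the posterior.
thetas = [rng.betavariate(a_post, b_post) for _ in range(k)]

# Step 2: average pi(y_new = 1 | theta_i), which equals theta_i here.
mc_estimate = sum(thetas) / k

# In this conjugate model the predictive probability is known exactly.
exact = a_post / (a_post + b_post)

print("Monte Carlo estimate of P(y_new = 1 | data):", mc_estimate)
print("Exact value:                                ", exact)
```

The conjugate model is chosen only because it offers an exact answer to compare against; the averaging step itself does not depend on conjugacy.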
However, simulating from the posterior is often difficult. MCMC can help us here by simulating from a Markov chain having the posterior as its limiting distribution.
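As a rough sketch of this idea, the code below uses random-walk Metropolis (a special case of Metropolis-Hastings with a symmetric proposal) to target a posterior known only up to its normalising constant. The Bernoulli model with a uniform prior, the data `y`, the step size, and the burn-in length are all assumptions chosen for illustration; this model has a known exact answer, which makes the approximation easy to check.

```python
import math
import random

# Hypothetical binary observations y_1, ..., y_n.
y = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
n, s = len(y), sum(y)


def log_unnormalised_posterior(theta):
    """Log of likelihood times a uniform prior on (0, 1), up to a constant."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return s * math.log(theta) + (n - s) * math.log(1.0 - theta)


def random_walk_metropolis(log_density, theta0, n_steps, step_sd, rng):
    """Run a Metropolis chain with a symmetric Gaussian proposal."""
    theta = theta0
    log_p = log_density(theta)
    chain = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step_sd)
        log_p_prop = log_density(proposal)
        # Accept with probability min(1, density ratio).
        accept_prob = math.exp(min(0.0, log_p_prop - log_p))
        if rng.random() < accept_prob:
            theta, log_p = proposal, log_p_prop
        chain.append(theta)
    return chain


rng = random.Random(0)
chain = random_walk_metropolis(
    log_unnormalised_posterior, theta0=0.5, n_steps=60_000, step_sd=0.3, rng=rng
)
samples = chain[1_000:]  # discard an (arbitrarily chosen) burn-in period

# Average pi(y_new = 1 | theta_i) = theta_i along the chain.
mcmc_estimate = sum(samples) / len(samples)
exact = (1 + s) / (2 + n)  # known predictive probability for this model

print("MCMC estimate of P(y_new = 1 | data):", mcmc_estimate)
print("Exact value:                         ", exact)
```

Only ratios of the posterior density enter the acceptance step, so the normalising constant is never needed.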
By the strong law of large numbers for Markov chains, we still have (as $k \to \infty$),