Introduction to Stochastic Processes
In this part, we will define what a stochastic process is. We will also recap some basic probability theory concepts that are essential for understanding stochastic processes.
In Figure 1, we can see that some models make exact predictions (deterministic processes), while others provide predictions with uncertainty (stochastic processes). In most cases, it is more realistic to model systems as stochastic processes, as they often involve inherent randomness and uncertainty.
Note
We will use the terminology stochastic models and probabilistic models interchangeably.
Definition: Stochastic Process
A stochastic process is a collection of random variables, $\{X_t, t \in \mathcal{I} \}$.
The set $\mathcal{I}$ is the index set of the process. $\mathcal{I}$ most often represents a set of specific times.
The random variables $X_t$ are defined on a common state space $\mathcal{S}$. This set represents the possible values the random variables $X_t$ can take.
Finally, both sets can be discrete or continuous, and simple (e.g., $\mathcal{I} = \{1, 2, 3\}$) or more complex (e.g., vectors of real numbers).
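As a concrete illustration of this definition, the sketch below simulates a simple symmetric random walk: a stochastic process with index set $\mathcal{I} = \{0, 1, \ldots, n\}$ and state space $\mathcal{S} = \mathbb{Z}$. The walk itself is an illustrative choice, not an example from the text.

```python
import random

def random_walk(n_steps, seed=0):
    """Simulate a simple symmetric random walk: a stochastic process with
    index set I = {0, 1, ..., n_steps} and state space S = the integers."""
    rng = random.Random(seed)
    path = [0]  # X_0 = 0
    for _ in range(n_steps):
        step = rng.choice([-1, 1])  # each increment is +1 or -1 with prob. 1/2
        path.append(path[-1] + step)
    return path

path = random_walk(10)
```

Each $X_t$ is a random variable; a single run of the simulation produces one realization of the process.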
Motivation for Probabilistic Models
One thing we need to consider is: what do we want to do with these models? What is the point? While this is an open question, an easy answer is that we want to set up a model based on some general knowledge and make new predictions from it.
- You have data.
- You want to find a model so that the data could reasonably be produced by the model.
- You want to use this model for predictions of future observations.
Using data in this way is called (statistical) inference.
Definition: Frequentist vs. Bayesian Inference
There are two main paradigms for statistical inference: frequentist inference and Bayesian inference.
In classical (frequentist) inference,
- We use contextual knowledge to find a model with some unknown parameters. The model relates observed data to future predictions.
- Find an estimate for the parameters, using estimators that have desirable properties.
- Plug the estimates into the model and make predictions.
In Bayesian inference,
- We use contextual knowledge to find a model with some uncertain parameters. The model relates observed data to future predictions.
- Find the conditional probability for future predictions given the value of the observed data.
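The two paradigms can be contrasted on a toy coin-flipping problem. The data, the uniform Beta(1, 1) prior, and the use of the posterior mean are all illustrative assumptions, not choices made in the text.

```python
from fractions import Fraction

# Hypothetical data: 7 heads in 10 flips of a coin with unknown head probability.
data = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
heads, n = sum(data), len(data)

# Frequentist: estimate the parameter (maximum likelihood), then plug it in.
p_mle = Fraction(heads, n)

# Bayesian: with a uniform Beta(1, 1) prior, the posterior is
# Beta(1 + heads, 1 + n - heads); predictions use the posterior mean.
p_posterior_mean = Fraction(1 + heads, 2 + n)
```

The frequentist route produces a single point estimate; the Bayesian route produces a full conditional distribution for the parameter, of which the posterior mean is one summary.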
The Markov Property
Definition: Markov Property
A process fulfills the Markov property if, for any $t_0 \in \mathcal{I}$, whenever $X_{t_0}$ is known, $X_t$ (with $t > t_0$) is independent of $X_s$ for all $s < t_0$.
Note
Most of the stochastic processes that we will deal with will have the Markov property.
Probability Theory Recap
Intuition: Random Variable
A random variable is a variable which has possible values in some state space $\mathcal{S}$, together with probabilities assigned to values and sets of values in the state space.
We separate between discrete and continuous random variables.
For discrete random variables, we assign a probability to each single value in the state space.
For continuous random variables, we assign probabilities to intervals of values in the state space.
Definition: Measurable Subsets
It turns out to be impossible to define a concept with reasonable properties that assigns a “size” to all subsets of, for example, $\mathbb{R}^N$. Instead, we need to restrict the concept of “size” to “measurable” sets.
Let $S$ be any set. A sigma-algebra $\Omega$ on $S$ is a set of subsets of $S$ such that,
- $\Omega$ includes $S$.
- If $A \in \Omega$, then the complement $A^c \coloneqq S \setminus A$ is also in $\Omega$.
- If $A_1, A_2, \ldots \in \Omega$, then the union $\bigcup_{i=1}^{\infty} A_i$ is also in $\Omega$.
Measurable sets are those that are in an appropriately defined sigma-algebra.
Note
In $\mathbb{R}^N$, one generally uses the Borel subsets.
Most importantly, when $S$ is finite or countable, all subsets will be measurable. When $S$ is some interval of real numbers, there will exist subsets that are not measurable, but we will not be concerned with those.
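On a finite set the three axioms can be checked mechanically (countable unions reduce to finite ones there). The sketch below verifies them for the power set and the trivial sigma-algebra on a small set; the example families are my own.

```python
from itertools import combinations

def powerset(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def is_sigma_algebra(S, F):
    """Check the sigma-algebra axioms on a finite set S."""
    S, F = frozenset(S), {frozenset(A) for A in F}
    if S not in F:                              # axiom: contains S
        return False
    if any(S - A not in F for A in F):          # closed under complement
        return False
    # On a finite family, closure under pairwise unions suffices.
    return all(A | B in F for A in F for B in F)

S = {1, 2, 3}
full = powerset(S)                              # the power set: always works
trivial = [frozenset(), frozenset(S)]           # the smallest sigma-algebra
not_one = [frozenset(), frozenset({1}), frozenset(S)]  # missing {2, 3}
```

The third family fails because the complement of $\{1\}$ is absent, illustrating why closure under complement is part of the definition.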
Definition: Probability Measure and Random Variables
A probability (measure) is a real function $P(\cdot)$ defined on the measurable subsets $A \subseteq S$, satisfying,
- $0 \leq P(A) \leq 1$ for all measurable subsets $A \subseteq S$.
- $P(S) = 1$.
- For any countable collection of disjoint measurable subsets $A_1, A_2, \ldots$, we have $P\left( \bigcup_{i=1}^{\infty} A_i \right) = \sum_{i=1}^{\infty} P(A_i)$.
These are called the Kolmogorov axioms for probability.
Measurable subsets $A \subseteq S$ are called events, and $P(A)$ is the probability of the event $A$ occurring.
Thus, a random variable $X$ with state space $\mathcal{S}$ is a real-valued (measurable) function together with a probability measure $P(\cdot)$ defined on the measurable subsets of $\mathcal{S}$.
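For a finite state space, a probability measure can be built from point probabilities, and the Kolmogorov axioms can be verified directly. The fair-die example below is an illustrative choice.

```python
from fractions import Fraction
from itertools import combinations

# A probability measure on a finite state space, built from a pmf.
# Illustrative example: a fair six-sided die, S = {1, ..., 6}.
pmf = {s: Fraction(1, 6) for s in range(1, 7)}
S = set(pmf)

def P(A):
    """P(A) = sum of point probabilities; additivity is then automatic."""
    return sum((pmf[s] for s in A), Fraction(0))

# The Kolmogorov axioms on this finite space:
axiom1 = all(0 <= P(A) <= 1
             for r in range(len(S) + 1) for A in combinations(S, r))
axiom2 = (P(S) == 1)
A, B = {1, 2}, {5, 6}                  # disjoint events
axiom3 = (P(A | B) == P(A) + P(B))
```

Exact `Fraction` arithmetic avoids the floating-point rounding that would otherwise make equality checks like `P(S) == 1` fragile.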
Recall: Conditional Probability and Independence
Given the events $A$ and $B$, the conditional probability of $A$ given $B$ is defined as, $$ P(A \mid B) = \frac{P(A \cap B)}{P(B)} $$ The events $A$ and $B$ are independent if $P(A \cap B) = P(A) P(B)$, or equivalently, if $P(A \mid B) = P(A)$.
The law of total probability states that if $B_1, B_2, \ldots, B_n$ is a sequence of events that partitions $S$, then, $$ P(A) = \sum_{i=1}^{n} P(A \cap B_i) = \sum_{i=1}^{n} P(A \mid B_i) P(B_i) $$ Thus, Bayes’ law follows directly from the definition of conditional probability and the law of total probability, $$ P(B \mid A) = \frac{P(A \mid B) P(B)}{P(A)} $$
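These two formulas can be exercised on a small worked example: a diagnostic test for a rare condition. All of the numbers below are illustrative assumptions, not values from the text.

```python
from fractions import Fraction

# Hypothetical diagnostic-test numbers (illustrative assumptions):
P_B = Fraction(1, 100)            # P(B): prior probability of the condition
P_A_given_B = Fraction(95, 100)   # P(A | B): test positive given condition
P_A_given_Bc = Fraction(5, 100)   # P(A | B^c): false-positive rate

# Law of total probability over the partition {B, B^c}:
P_A = P_A_given_B * P_B + P_A_given_Bc * (1 - P_B)

# Bayes' law:
P_B_given_A = P_A_given_B * P_B / P_A
```

Despite the accurate test, $P(B \mid A) \approx 0.16$, because the partition term $P(A \mid B^c)P(B^c)$ dominates $P(A)$ when $B$ is rare.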
Notation: The generic $\pi$-notation
We will use the generic $\pi$-notation as a shorthand. We write $\pi(x)$ for $P(X = x)$, $\pi(x, y)$ for $P(X = x, Y = y)$, and $\pi(x \mid y)$ for $P(X = x \mid Y = y)$.
Definition: Conditional Densities for Continuous Distributions
For a continuous random variable $X$, we will write its density function as $\pi(x)$, extending the generic $\pi$-notation. If we have a joint distribution for continuous random variables $X$ and $Y$, we write the joint density as $\pi(x, y)$.
Thus, we get formulas like, $$ \int \pi(x) \ dx = 1 \quad \text{and} \quad \int \pi(x, y) \ dy = \pi(x) $$ We can define the conditional density as, $$ \pi(x \mid y) = \frac{\pi(x, y)}{\pi(y)} $$
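These formulas can be checked numerically for a concrete joint density. The density $\pi(x, y) = x + y$ on the unit square and the midpoint-rule integrator are both illustrative choices of mine.

```python
def joint(x, y):
    """Illustrative joint density on the unit square: pi(x, y) = x + y."""
    return x + y

def integrate(f, a=0.0, b=1.0, n=1000):
    """Midpoint rule; accurate enough for this smooth integrand."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def marginal_y(y):
    """pi(y) = integral of pi(x, y) dx = y + 1/2."""
    return integrate(lambda x: joint(x, y))

def conditional_given(y):
    """Return x -> pi(x | y) = pi(x, y) / pi(y)."""
    py = marginal_y(y)            # compute the marginal once
    return lambda x: joint(x, y) / py

total = integrate(conditional_given(0.3))   # should integrate to 1
```

The check confirms that the conditional density, like any density, integrates to one over its support.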
Recall: Expectation and Conditional Expectation
Recall, the expectation of a discrete random variable is, $$ \mathbb{E}[Y] = \sum_{y} y \ \pi(y) $$ and of a continuous random variable is, $$ \mathbb{E}[Y] = \int y \ \pi(y) \ dy $$ The conditional expectation in the discrete case is, $$ \mathbb{E}[Y \mid X = x] = \sum_{y} y \ \pi(y \mid x) $$ and in the continuous case is, $$ \mathbb{E}[Y \mid X = x] = \int y \ \pi(y \mid x) \ dy $$
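The discrete formulas above can be made concrete with a fair die; conditioning on the roll being even illustrates the conditional expectation. The die is an illustrative choice.

```python
from fractions import Fraction

# Expectation of a fair die roll: E[Y] = sum_y y * pi(y).
pi = {y: Fraction(1, 6) for y in range(1, 7)}
E_Y = sum(y * p for y, p in pi.items())                 # 7/2

# Conditional expectation given the event "Y is even":
# E[Y | even] = sum_y y * pi(y | even).
even = {y: p for y, p in pi.items() if y % 2 == 0}
P_even = sum(even.values())
E_Y_given_even = sum(y * p / P_even for y, p in even.items())   # 4
```

Conditioning reweights the pmf to the even outcomes $\{2, 4, 6\}$, shifting the expectation from $7/2$ up to $4$.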
Definition: Law of Total Expectation
If $X$ is a discrete random variable, we get that, $$ \mathbb{E}[Y] = \sum_{x} \mathbb{E}[Y \mid X = x] \ \pi(x) $$ If $X$ is a continuous random variable, we get that, $$ \mathbb{E}[Y] = \int \mathbb{E}[Y \mid X = x] \ \pi(x) \ dx $$ In both cases, this can be written as, $$ \mathbb{E}[Y] = \mathbb{E}[\mathbb{E}[Y \mid X]] $$
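The discrete version of this law can be verified exactly on a small joint pmf. The probabilities below are illustrative assumptions.

```python
from fractions import Fraction

# Illustrative joint pmf for (X, Y) on {0, 1} x {0, 1}:
pi = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
      (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8)}

def pi_x(x):
    """Marginal pmf of X."""
    return sum(p for (x_, _), p in pi.items() if x_ == x)

def E_Y_given(x):
    """E[Y | X = x] = sum_y y * pi(y | x)."""
    return sum(y * pi[(x, y)] / pi_x(x) for y in (0, 1))

lhs = sum(y * p for (_, y), p in pi.items())          # E[Y] directly
rhs = sum(E_Y_given(x) * pi_x(x) for x in (0, 1))     # E[E[Y | X]]
```

Both routes give $\mathbb{E}[Y] = 5/8$, matching $\mathbb{E}[Y] = \mathbb{E}[\mathbb{E}[Y \mid X]]$.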
Definition: Law of Total Variance
Recall that, by definition, $$ \mathrm{Var}(Y) \coloneqq \mathbb{E}\left[(Y - \mathbb{E}[Y])^2\right] = \mathbb{E}[Y^2] - (\mathbb{E}[Y])^2 $$ Similarly, we have for the conditional variance, $$ \mathrm{Var}(Y \mid X) \coloneqq \mathbb{E}_{Y \mid X = x}\left[(Y - \mathbb{E}[Y \mid X])^2 \mid X\right] $$ With these definitions, we can state the law of total variance as, $$ \mathrm{Var}(Y) = \mathbb{E}[\mathrm{Var}(Y \mid X)] + \mathrm{Var}(\mathbb{E}[Y \mid X]) $$
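The law of total variance can likewise be checked exactly on a small discrete example. The joint pmf below is an illustrative assumption.

```python
from fractions import Fraction

# Illustrative joint pmf for (X, Y) on {0, 1} x {0, 1}:
pi = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
      (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8)}
xs, ys = (0, 1), (0, 1)

def pi_x(x):
    return sum(pi[(x, y)] for y in ys)

def E_Y_given(x):
    return sum(y * pi[(x, y)] / pi_x(x) for y in ys)

def Var_Y_given(x):
    m = E_Y_given(x)
    return sum((y - m) ** 2 * pi[(x, y)] / pi_x(x) for y in ys)

# Left-hand side: Var(Y) = E[Y^2] - (E[Y])^2.
EY = sum(y * p for (_, y), p in pi.items())
EY2 = sum(y * y * p for (_, y), p in pi.items())
var_Y = EY2 - EY ** 2

# Right-hand side: E[Var(Y | X)] + Var(E[Y | X]).
e_var = sum(Var_Y_given(x) * pi_x(x) for x in xs)
m = sum(E_Y_given(x) * pi_x(x) for x in xs)
var_e = sum((E_Y_given(x) - m) ** 2 * pi_x(x) for x in xs)
```

Both sides evaluate to $15/64$, with the "within-group" term $\mathbb{E}[\mathrm{Var}(Y \mid X)] = 7/32$ and the "between-group" term $\mathrm{Var}(\mathbb{E}[Y \mid X]) = 1/64$.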