Update monte carlo method descriptions

This commit is contained in:
Andrew Cumming 2023-09-28 10:36:18 -04:00
parent 596b124130
commit 845965a2b9
2 changed files with 19 additions and 10 deletions

@@ -1,20 +1,22 @@
# Metropolis-Hastings algorithm
Another way to sample a probability distribution is to use a **Markov Chain Monte Carlo** method. In these methods, we generate a sequence (or chain) of $x$ values that sample a given probability distribution $f(x)$. Each sample $x_i$ is generated from the previous one $x_{i-1}$ in some probabilistic way, so $x_i$ depends only on the state before it $x_{i-1}$ (this is the Markov chain part). By choosing the dependence of $x_i$ on $x_{i-1}$ in an appropriate way, we can generate a sequence $\{x_i\}$ of random samples from any given probability distribution $f(x)$.
The **Metropolis-Hastings algorithm** is a simple method to do this. To generate the next value in the chain $x_{i+1}$ given the current value $x_i$, the procedure is as follows:
1. Propose a jump from the current value $x_i=x$ to a new candidate value $x^\prime$ according to a proposal distribution $p(x^\prime|x)$. The proposal distribution should be symmetric, i.e. the probability of choosing a new location $x^\prime$ given that we are now at $x$ should be the same as the probability of choosing $x$ if we were located at $x^\prime$. For example, in 1D we could choose $x^\prime = x_i + \Delta x$ with $\Delta x$ drawn from a Gaussian distribution.
2. Evaluate the ratio $\alpha = f(x^\prime)/f(x)$. If $\alpha>1$, i.e. the new point is located in a region with larger $f(x)$, then we accept the move and set $x_{i+1} = x^\prime$. If $\alpha<1$, we accept the move with probability $\alpha$. If the move is rejected, then we set $x_{i+1} = x_i$. (Note that this entire step can be achieved by choosing a uniform deviate $u$ between 0 and 1, and accepting the move if $u<\alpha$.)
The size of the jump in step 1 is set by the width of the Gaussian used to sample the jump size $\Delta x$. The width should be chosen so that a reasonable number of trial jumps are accepted. Typically an acceptance fraction of $\approx 0.25$ is a good choice.
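The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the notes' reference implementation; the function name `metropolis` and the choice of step width are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

def metropolis(f, x0, n_samples, step=1.0):
    """Sample an (unnormalized) density f with the Metropolis-Hastings
    algorithm, using a symmetric Gaussian proposal of width `step`.
    Returns the chain and the acceptance fraction."""
    x = x0
    samples = np.empty(n_samples)
    n_accept = 0
    for i in range(n_samples):
        x_new = x + step * rng.normal()   # symmetric proposal
        alpha = f(x_new) / f(x)           # ratio f(x')/f(x)
        if rng.uniform() < alpha:         # accepts with probability min(1, alpha)
            x = x_new
            n_accept += 1
        samples[i] = x                    # rejected moves repeat the current value
    return samples, n_accept / n_samples

# sample f(x) = exp(-|x^3|), then discard an initial burn-in phase
samples, acc = metropolis(lambda x: np.exp(-np.abs(x)**3), x0=0.0, n_samples=10_000)
samples = samples[1000:]
```

Monitoring `acc` lets you tune `step` toward the acceptance fraction mentioned above: a wider Gaussian lowers the acceptance rate, a narrower one raises it but explores the distribution more slowly.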
After an initial *burn-in* phase, the chain will reach equilibrium and the samples $\{x_i\}$ will be distributed according to the probability distribution $f(x)$. The reason this works is that in equilibrium, the rate at which we move from point $x$ to $x^\prime$ should be equal to the rate at which we move in the reverse direction, from $x^\prime$ to $x$. This is the principle of *detailed balance*. Therefore
$$n(x) \times (\mathrm{rate\ of}\ x\rightarrow x^\prime)
=
n(x^\prime) \times (\mathrm{rate\ of}\ x^\prime\rightarrow x),
$$
where $n(x)$ is the density of points at $x$.
If $x^\prime$ has a larger value of the probability $f(x^\prime)>f(x)$, then the transition from $x$ to $x^\prime$ will always happen (we always accept), i.e. the rate is 1. The rate of the reverse transition from $x^\prime$ to $x$ is then $f(x)/f(x^\prime)<1$. Therefore
$$n(x) = n(x^\prime) f(x)/f(x^\prime)$$
@@ -23,6 +25,11 @@ $${n(x)\over n(x^\prime)}= {f(x)\over f(x^\prime)}$$
which shows that $n(x)$ will be distributed as $f(x)$.
```{admonition} Exercise
Code up this algorithm and generate $10^4$ samples from the distribution $f(x) = \exp(-\left|x^3\right|)$. Confirm that you are getting the correct distribution by comparing the histogram of $x$-values with the analytic expression.

@@ -1,15 +1,17 @@
# Monte Carlo integration
In our discussion of the rejection method, you might have wondered whether we could use the samples of $f(x)$ to calculate the integral under the curve. Indeed, for a uniform sampling of $x$ values $\left\{x_i\right\}$ between $x=a$ and $x=b$ and the corresponding function values $f(x_i)$,
we can estimate the value of the integral using
$$\int_a^b dx f(x) \approx {(b-a)\over N} \sum_i f(x_i).$$
You can think of this as $\sum_i f(x_i) \Delta x$ with the mean spacing of points $\Delta x = (b-a)/N$.
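As a minimal sketch of this uniform-sampling estimate (the integrand $e^{-x}$ on $[0,1]$ is a choice for illustration, not one from the notes):

```python
import numpy as np

rng = np.random.default_rng(42)

# Estimate I = int_0^1 e^{-x} dx = 1 - 1/e ~ 0.6321 with uniform samples
a, b, N = 0.0, 1.0, 100_000
x = rng.uniform(a, b, N)                      # uniform x-values on [a, b]
estimate = (b - a) / N * np.sum(np.exp(-x))   # (b-a)/N * sum f(x_i)
```

The statistical error of the estimate falls off as $1/\sqrt{N}$, so with $N = 10^5$ samples the result should agree with the analytic value to a few parts in $10^3$.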
Depending on the shape of the integral, this estimate can be improved by using a different distribution for the $x$-values which concentrates the samples of $x$ in the regions where the integrand is largest (the ideal case would be to sample from $f(x)$ itself). In this case, we have to rescale the weight of a given $x$-value according to the ratio $f(x)/p(x)$. The procedure for estimating the integral is: (1) select $N$ samples of $x$ from the probability distribution $p(x)$, and (2) evaluate the integral using
$$ {1\over N} \sum_i {f(x_i)\over p(x_i)}.$$
The factor $1/p(x_i)$ measures the effective spacing of the samples at $x=x_i$. If $p(x)$ is large, many points will be chosen near that value of $x$, so the effective spacing between points is small in that region. In the case of a uniform distribution, $p(x) = 1/(b-a)$, we recover our original expression for uniform sampling.
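The two-step procedure above can be sketched as follows. The choice of integrand $x e^{-x}$ and sampling density $p(x) = e^{-x}$ is an assumption made for illustration, picked because $p$ concentrates samples where the integrand is large and is easy to draw from:

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate I = int_0^inf x e^{-x} dx = 1 by importance sampling.
N = 100_000
x = rng.exponential(1.0, N)      # step 1: N samples from p(x) = e^{-x}
f = x * np.exp(-x)               # integrand evaluated at the samples
p = np.exp(-x)                   # sampling density at the samples
estimate = np.mean(f / p)        # step 2: (1/N) sum f(x_i)/p(x_i)
```

Here the weights $f(x_i)/p(x_i)$ reduce to $x_i$, so the estimate is just the sample mean of an exponential deviate, which converges to the exact answer of 1.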
```{admonition} Exercise