Update monte carlo method descriptions

This commit is contained in:
Andrew Cumming 2023-09-28 10:36:18 -04:00
parent 596b124130
commit 845965a2b9
2 changed files with 19 additions and 10 deletions

@@ -1,20 +1,22 @@
# Metropolis-Hastings algorithm
Another way to sample a probability distribution is to use a **Markov Chain Monte Carlo** method. In these methods, we generate a sequence (or chain) of $x$ values that sample a given probability distribution $f(x)$. Each sample $x_i$ is generated from the previous one $x_{i-1}$ in some probabilistic way, so $x_i$ depends only on the state before it $x_{i-1}$ (this is the Markov chain part). By choosing the dependence of $x_i$ on $x_{i-1}$ in an appropriate way, we can generate a sequence $\{x_i\}$ of random samples from any given probability distribution $f(x)$.
The **Metropolis-Hastings algorithm** is a simple method to do this. To generate the next value in the chain $x_{i+1}$ given the current value $x_i$, the procedure is as follows:
1. Propose a jump from the current value $x_i=x$ to a new candidate value $x^\prime$ according to a proposal distribution $p(x^\prime|x)$. The proposal distribution should be symmetric, i.e. the probability of choosing a new location $x^\prime$ given that we are now at $x$ should be the same as the probability of choosing $x$ if we were located at $x^\prime$. For example, in 1D we could choose $x^\prime = x_i + \Delta x$ with $\Delta x$ drawn from a Gaussian distribution.
2. Evaluate the ratio $\alpha = f(x^\prime)/f(x)$. If $\alpha>1$, i.e. the new point is located in a region with larger $f(x)$, then we accept the move and set $x_{i+1} = x^\prime$. If $\alpha<1$, we accept the move with probability $\alpha$. If the move is rejected, then we set $x_{i+1} = x_i$. (Note that this entire step can be achieved by choosing a uniform deviate $u$ between 0 and 1, and accepting the move if $u<\alpha$.)
The size of the jump in step 1 is set by the width of the Gaussian used to sample the jump size $\Delta x$. The width should be chosen so that a reasonable number of trial jumps are accepted. Typically an acceptance fraction of $\approx 0.25$ is a good choice.
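The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the notes' reference implementation; the function name `metropolis` and the choice of step width are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

def metropolis(f, x0, n_samples, step=1.0):
    """Sample an (unnormalized) density f with the Metropolis-Hastings
    algorithm, using a symmetric Gaussian proposal of width `step`.
    Returns the chain and the acceptance fraction."""
    x = x0
    samples = np.empty(n_samples)
    n_accept = 0
    for i in range(n_samples):
        x_new = x + step * rng.normal()   # symmetric proposal
        alpha = f(x_new) / f(x)           # ratio f(x')/f(x)
        if rng.uniform() < alpha:         # accepts with probability min(1, alpha)
            x = x_new
            n_accept += 1
        samples[i] = x                    # rejected moves repeat the current value
    return samples, n_accept / n_samples

# sample f(x) = exp(-|x^3|), then discard an initial burn-in phase
samples, acc = metropolis(lambda x: np.exp(-np.abs(x)**3), x0=0.0, n_samples=10_000)
samples = samples[1000:]
```

Monitoring `acc` lets you tune `step` toward the acceptance fraction mentioned above: a wider Gaussian lowers the acceptance rate, a narrower one raises it but explores the distribution more slowly.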
After an initial *burn-in* phase, the chain will reach equilibrium and the samples $\{x_i\}$ will be distributed according to the probability distribution $f(x)$. The reason this works is that in equilibrium, the rate at which we move from point $x$ to $x^\prime$ should be equal to the rate at which we move in the reverse direction, from $x^\prime$ to $x$. This is the principle of *detailed balance*. Therefore
$$n(x) \times (\mathrm{rate\ of}\ x\rightarrow x^\prime)
=
n(x^\prime) \times (\mathrm{rate\ of}\ x^\prime\rightarrow x),
$$
where $n(x)$ is the density of points at $x$.
If $x^\prime$ has a larger value of the probability $f(x^\prime)>f(x)$, then the transition from $x$ to $x^\prime$ will always happen (we always accept), i.e. the rate is 1. The rate of the reverse transition from $x^\prime$ to $x$ is then $f(x)/f(x^\prime)<1$. Therefore
$$n(x) = n(x^\prime) f(x)/f(x^\prime)$$
@@ -23,6 +25,11 @@ $${n(x)\over n(x^\prime)}= {f(x)\over f(x^\prime)}$$
which shows that $n(x)$ will be distributed as $f(x)$.
```{admonition} Exercise
Code up this algorithm and generate $10^4$ samples from the distribution $f(x) = \exp(-\left|x^3\right|)$. Confirm that you are getting the correct distribution by comparing the histogram of $x$-values with the analytic expression.

@@ -1,15 +1,17 @@
# Monte Carlo integration
In our discussion of the rejection method, you might have wondered whether we could use the samples of $f(x)$ to calculate the integral under the curve. Indeed, for a uniform sampling of $x$ values $\left\{x_i\right\}$ between $x=a$ and $x=b$ and the corresponding function values $f(x_i)$,
we can estimate the value of the integral using
$$\int_a^b dx f(x) \approx {(b-a)\over N} \sum_i f(x_i).$$
You can think of this as $\sum_i f(x_i) \Delta x$ with the mean spacing of points $\Delta x = (b-a)/N$.
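As a minimal sketch of this uniform-sampling estimate (the integrand $e^{-x}$ on $[0,1]$ is a choice for illustration, not one from the notes):

```python
import numpy as np

rng = np.random.default_rng(42)

# Estimate I = int_0^1 e^{-x} dx = 1 - 1/e ~ 0.6321 with uniform samples
a, b, N = 0.0, 1.0, 100_000
x = rng.uniform(a, b, N)                      # uniform x-values on [a, b]
estimate = (b - a) / N * np.sum(np.exp(-x))   # (b-a)/N * sum f(x_i)
```

The statistical error of the estimate falls off as $1/\sqrt{N}$, so with $N = 10^5$ samples the result should agree with the analytic value to a few parts in $10^3$.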
Depending on the shape of the integral, this estimate can be improved by using a different distribution for the $x$-values which concentrates the samples of $x$ in the regions where the integrand is largest (the ideal case would be to sample from $f(x)$ itself). In this case, we have to rescale the weight of a given $x$-value according to the ratio $f(x)/p(x)$. The procedure for estimating the integral is: (1) select $N$ samples of $x$ from the probability distribution $p(x)$, and (2) evaluate the integral using
$$ {1\over N} \sum_i {f(x_i)\over p(x_i)}.$$
The factor $1/p(x_i)$ measures the effective spacing of the samples at $x=x_i$. If $p(x)$ is large, many points will be chosen near that value of $x$, so the effective spacing between points is small in that region. In the case of a uniform distribution, $p(x) = 1/(b-a)$, we recover our original expression for uniform sampling.
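The two-step procedure above can be sketched as follows. The choice of integrand $x e^{-x}$ and sampling density $p(x) = e^{-x}$ is an assumption made for illustration, picked because $p$ concentrates samples where the integrand is large and is easy to draw from:

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate I = int_0^inf x e^{-x} dx = 1 by importance sampling.
N = 100_000
x = rng.exponential(1.0, N)      # step 1: N samples from p(x) = e^{-x}
f = x * np.exp(-x)               # integrand evaluated at the samples
p = np.exp(-x)                   # sampling density at the samples
estimate = np.mean(f / p)        # step 2: (1/N) sum f(x_i)/p(x_i)
```

Here the weights $f(x_i)/p(x_i)$ reduce to $x_i$, so the estimate is just the sample mean of an exponential deviate, which converges to the exact answer of 1.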
```{admonition} Exercise