reformulate shit about stratified sampling

hiro98 2020-05-12 19:20:16 +02:00
parent 9e35ef1a54
commit e6b0f69502
2 changed files with 54 additions and 49 deletions


@@ -223,4 +223,4 @@ sample density and lower weights, flattening out the integrand.
Generally the result gets better with more increments, but at the cost
of more \vegas\ iterations. The intermediate values of those
iterations can be accumulated to improve the accuracy of the end
result~\cite[197]{Lepage:19781an}.
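One common way to accumulate the iterations (a sketch of the
variance-weighted average customarily used with \vegas; the symbols
\(I_j\) and \(\sigma_j\) for the estimate and standard deviation of
iteration \(j\) are introduced here only for illustration) is
\begin{equation}
  \bar{I} = \bar{\sigma}^2\sum_{j}\frac{I_j}{\sigma_j^2},
  \qquad
  \frac{1}{\bar{\sigma}^2} = \sum_{j}\frac{1}{\sigma_j^2}.
\end{equation}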


@@ -7,20 +7,21 @@
\label{sec:mcsamp}
Drawing representative samples from a probability distribution (for
example a differential cross section) results in a set of
\emph{events}, the same kind of data that is gathered in experiments,
from which one can then calculate samples of the distributions of
other observables without explicit transformation of the
distribution. Here the one-dimensional case is discussed. The general
case follows by sampling the dimensions sequentially.
Consider a function \(f\colon x\in\Omega\mapsto\mathbb{R}_{\geq 0}\)
where \(\Omega = [0, 1]\) without loss of generality. Such a function
is proportional to a probability density \(\tilde{f}\). When \(X\) is
a uniformly distributed random variable on~\([0, 1]\) (which can be
generated easily), then a sample \(\{x_i\}\) of this variable can be
transformed into a sample of \(Y\sim\tilde{f}\). Let \(x\) be a single
sample of \(X\); then a sample \(y\) of \(Y\) can be obtained by
solving~\eqref{eq:takesample} for \(y\).
\begin{equation}
\label{eq:takesample}
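In code, this transformation method might look as follows (a minimal
sketch, not the thesis implementation; the use of SciPy's quad and
brentq on \(\Omega = [0, 1]\) is an assumption):

import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def transform_samples(f, num):
    """Turn uniform samples on [0, 1] into samples of the density
    proportional to f by solving F(y) = x * A for y."""
    A = quad(f, 0, 1)[0]                # A: total integral of f
    F = lambda y: quad(f, 0, y)[0]      # F: antiderivative of f
    xs = np.random.uniform(size=num)    # uniform samples of X
    # f >= 0 makes F monotone, so each root is unique in [0, 1]
    return np.array([brentq(lambda y: F(y) - x * A, 0, 1) for x in xs])

ys = transform_samples(lambda x: 1 + x**2, 1000)  # density ~ 1 + x^2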
@@ -50,11 +51,11 @@ obtained numerically or one can change variables to simplify.
\subsection{Hit or Miss}%
\label{sec:hitmiss}
If integrating \(f\) and/or inverting \(F\) is too expensive or a
fully \(f\)-agnostic method is desired, the problem can be
reformulated by introducing a positive function
\(g\colon x\in\Omega\mapsto\mathbb{R}_{\geq 0}\) with
\(\forall x\in\Omega\colon g(x)\geq f(x)\).
Observing~\eqref{eq:takesample2d} suggests that one generates samples
which are distributed according to \(g/B\), where
@@ -69,20 +70,21 @@ probability~\(f/g\), so that \(g\) cancels out. This method is called
= \int_{0}^{y}g(x')\int_{0}^{\frac{f(x')}{g(x')}}\dd{z}\dd{x'}
\end{equation}
The samples obtained in this way are distributed according to
\(f/B\), and the total probability of accepting a sample (the
efficiency \(\mathfrak{e}\)) is given by~\eqref{eq:impsampeff}.
\begin{equation}
\label{eq:impsampeff}
\int_0^1\frac{g(x)}{B}\cdot\frac{f(x)}{g(x)}\dd{x} = \int_0^1\frac{f(x)}{B}\dd{x} = \frac{A}{B} = \mathfrak{e}\leq 1
\end{equation}
The closer the volumes enclosed by \(g\) and \(f\) are to each other,
the higher \(\mathfrak{e}\) becomes.
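As a concrete illustration, a minimal sketch of this hit-or-miss
scheme in Python (the argument sample_g, a routine that draws values
distributed according to \(g/B\), for instance via the transformation
method above, is an assumed ingredient):

import numpy as np

def hit_or_miss(f, g, sample_g, num):
    """Draw proposals distributed according to g/B and accept each one
    with probability f/g; the accepted values then follow f/A."""
    accepted, tried = [], 0
    while len(accepted) < num:
        x = sample_g()                          # proposal distributed as g/B
        tried += 1
        if np.random.uniform() <= f(x) / g(x):  # accept with probability f/g
            accepted.append(x)
    return np.array(accepted), num / tried      # samples, efficiency estimate

The acceptance rate returned alongside the samples is a direct
estimate of the efficiency \(\mathfrak{e} = A/B\).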
Choosing \(g\) like~\eqref{eq:primitiveg} and looking back
at~\eqref{eq:solutionsamp} yields \(y = x\cdot A\), so that the
sampling procedure simplifies to choosing random numbers
\(x\in [0,1]\) and accepting them with the probability
\(f(x)/g(x)\). The efficiency of this approach is related to how much
\(f\) differs from \(f_{\text{max}}\), which in turn is related to the
@@ -109,7 +111,7 @@ This very low efficiency stems from the fact, that \(f_{\cos\theta}\)
is a lot smaller than its upper bound for most of the sampling
interval.
\begin{wrapfigure}[15]{l}{.5\textwidth}
\plot{xs_sampling/upper_bound}
\caption{\label{fig:distcos} The distribution~\eqref{eq:distcos} and an upper bound of
the form \(a + b\cdot x^2\).}
@@ -126,25 +128,28 @@ to~\result{xs/python/eta_eff}, again due to the decrease in variance.
\subsection{Stratified Sampling}%
\label{sec:stratsamp}
Finding a suitable upper bound or variable transform requires effort
and detailed knowledge about the distribution and is hard to
automate\footnote{Sherpa does in fact do this by looking at the
propagators in the matrix elements.}. Revisiting the idea
behind~\eqref{eq:takesample2d} but looking at a probability density
\(\rho\) on \(\Omega\) leads to a slight reformulation of the method
discussed in~\ref{sec:hitmiss}. Note that without loss of generality
one can again choose \(\Omega = [0, 1]\).
Define \(h=\max_{x\in\Omega}f(x)/\rho(x)\), take a sample
\(\{\tilde{x}_i\}\sim\rho\) distributed according to \(\rho\) and
accept each sample point with the probability
\(f(\tilde{x}_i)/(\rho(\tilde{x}_i)\cdot h)\). The resulting
probability of proposing and accepting a point in \([x, x+\dd{x}]\)
is \(\rho(x)\cdot f(x)/(\rho(x)\cdot h)\dd{x}=f(x)\dd{x}/h\). This is
very similar to the procedure described in~\ref{sec:hitmiss} with
\(g=\rho\cdot h\), but here \(\rho\) is sampled directly.
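A minimal sketch of this variant (the names rho and sample_rho for
the density and its direct sampler are illustrative; f and rho are
assumed to accept NumPy arrays):

import numpy as np

def hit_or_miss_rho(f, rho, sample_rho, num):
    """Draw proposals directly from rho and accept each one with
    probability f/(rho*h); the accepted values then follow f/A."""
    grid = np.linspace(0, 1, 1001)
    h = np.max(f(grid) / rho(grid))  # grid estimate of h = max f/rho
    out = []
    while len(out) < num:
        x = sample_rho()             # no transformation method needed here
        if np.random.uniform() <= f(x) / (rho(x) * h):
            out.append(x)
    return np.array(out)

Note that estimating \(h\) on a grid is only an approximation;
strictly, \(h\) must bound \(f/\rho\) on all of \(\Omega\).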
The important benefit of this method is that the step of generating
samples according to some other function \(g\) by the means
of~\ref{sec:mcsamp} is no longer necessary. This is useful when
samples of \(\rho\) can be obtained with little effort (see
below). The efficiency of this method is given by~\eqref{eq:strateff}.
\begin{equation}
\label{eq:strateff}
@@ -156,7 +161,7 @@ It may seem startling that \(h\) determines the efficiency, because
but~\eqref{eq:hlessa} states that \(\mathfrak{e}\) is well-formed
(\(\mathfrak{e}\leq 1\)). Although \(h\) is determined through a
single point, being the maximum is a global property, and there is
also the constraint \(\int_0^1\rho(x)\dd{x}=1\) to be considered.
\begin{equation}
\label{eq:hlessa}
@ -164,17 +169,17 @@ constrain \(\int_0^1\rho(x)\dd{x}=1\) to be considered.
\int_0^1\rho(x)\cdot h\dd{x} = h
\end{equation}
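For example, choosing the uniform density \(\rho = 1\) on \([0, 1]\)
gives \(h = f_{\text{max}}\), so the efficiency reduces to that of the
constant upper bound used in~\ref{sec:hitmiss}.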
The closer \(h\) approaches \(A\), the better the efficiency gets. In
the optimal case \(\rho=f/A\) and thus \(h=A\), or
\(\mathfrak{e} = 1\). Now this distribution can be approximated in the
way discussed in~\ref{sec:mcintvegas} by using the hypercubes found
by~\vegas\ and simply generating the same number of uniformly
distributed samples in each hypercube (stratified sampling). The
distribution \(\rho\) takes on the form~\eqref{eq:vegasrho}. The
effect of this approach is visualized in~\ref{fig:vegasdist} and the
resulting sampling efficiency \result{xs/python/strat_th_samp} (using
\result{xs/python/vegas_samp_num_increments} increments) is a great
improvement over the hit or miss method in~\ref{sec:hitmiss}. By using
more increments better efficiencies can be achieved, although the
run-time of \vegas\ increases. The advantage of \vegas\ in this
situation is that the computation of the increments has to be done