reformulate text about stratified sampling

hiro98 2020-05-12 19:20:16 +02:00
parent 9e35ef1a54
commit e6b0f69502
2 changed files with 54 additions and 49 deletions


@ -223,4 +223,4 @@ sample density and lower weights, flattening out the integrand.
Generally the result gets better with more increments, but at the cost
of more \vegas\ iterations. The intermediate values of those
iterations can be accumulated to improve the accuracy of the end
result.~\cite[197]{Lepage:19781an}
result~\cite[197]{Lepage:19781an}.
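One common way to realise this accumulation is an inverse-variance
weighted average of the per-iteration estimates. A minimal Python
sketch (the \(\chi^2\) consistency check of~\cite{Lepage:19781an} is
omitted, and the input numbers are made up for illustration):
\begin{verbatim}
import numpy as np

def accumulate(estimates, variances):
    """Combine per-iteration estimates by inverse-variance weighting."""
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    mean = np.sum(weights * estimates) / np.sum(weights)
    return mean, 1.0 / np.sum(weights)  # combined mean and variance

# e.g. three iterations with shrinking variance
mean, var = accumulate([1.02, 0.98, 1.00], [1e-2, 5e-3, 2e-3])
\end{verbatim}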


@ -7,20 +7,21 @@
\label{sec:mcsamp}
Drawing representative samples from a probability distribution (for
example a differential cross section) from which one can the calculate
samples from the distribution of other observables without explicit
transformation of the distribution is another important problem. Here
the one-dimensional case is discussed. The general case follows by
sampling the dimensions sequentially.
example a differential cross section) results in a set of
\emph{events}, the same kind of data that is gathered in experiments,
from which one can then calculate samples from the distributions of
other observables without explicit transformation of the
distribution. Here the one-dimensional case is discussed. The general
case follows by sampling the dimensions sequentially.
Consider a function \(f\colon x\in\Omega\mapsto\mathbb{R}_{\geq 0}\)
where \(\Omega = [0, 1]\) without loss of generality. Such a function
is proportional to a probability density \(\tilde{f}\). When \(X\) is
a uniformly distributed random variable on~\([0, 1]\) then a sample
\({x_i}\) of this variable can be transformed into a sample of
\(Y\sim\tilde{f}\). Let \(x\) be a single sample of \(X\), then a
sample \(y\) of \(Y\) can be obtained by solving~\eqref{eq:takesample}
for \(y\).
a uniformly distributed random variable on~\([0, 1]\) (which can
easily be generated), then a sample \(\{x_i\}\) of this variable can
be transformed into a sample of \(Y\sim\tilde{f}\). Let \(x\) be a
single sample of \(X\), then a sample \(y\) of \(Y\) can be obtained
by solving~\eqref{eq:takesample} for \(y\).
\begin{equation}
\label{eq:takesample}
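Numerically, solving~\eqref{eq:takesample} for \(y\) amounts to
inverting the cumulative integral of \(f\). A minimal Python sketch,
with a made-up integrand standing in for \(f\) and generic SciPy
routines doing the quadrature and root finding:
\begin{verbatim}
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def f(x):
    """Stand-in for the unnormalised density on [0, 1]."""
    return 1.0 + x**2

A, _ = quad(f, 0.0, 1.0)  # normalisation: area under f

def take_sample(x):
    """Solve int_0^y f(x') dx' / A = x for y (inverse CDF)."""
    return brentq(lambda y: quad(f, 0.0, y)[0] / A - x, 0.0, 1.0)

rng = np.random.default_rng()
samples = [take_sample(x) for x in rng.uniform(size=1000)]
\end{verbatim}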
@ -50,11 +51,11 @@ obtained numerically or one can change variables to simplify.
\subsection{Hit or Miss}%
\label{sec:hitmiss}
The problem can be reformulated by introducing a
positive function \(g\colon x\in\Omega\mapsto\mathbb{R}_{\geq 0}\)
with \(\forall x\in\Omega\colon g(x)\geq
f(x)\).
If integrating \(f\) and/or inverting \(F\) is too expensive or a
fully \(f\)-agnostic method is desired, the problem can be
reformulated by introducing a positive function
\(g\colon x\in\Omega\mapsto\mathbb{R}_{\geq 0}\) with
\(\forall x\in\Omega\colon g(x)\geq f(x)\).
Observing~\eqref{eq:takesample2d} suggests that one generates samples
which are distributed according to \(g/B\), where
@ -69,20 +70,21 @@ probability~\(f/g\), so that \(g\) cancels out. This method is called
= \int_{0}^{y}g(x')\int_{0}^{\frac{f(x')}{g(x')}}\dd{z}\dd{x'}
\end{equation}
The thus obtained samples are then distributed according to \(f/B\) so
that~\eqref{eq:impsampeff} holds.
The thus obtained samples are then distributed according to \(f/B\)
and the total probability of accepting a sample (the efficiency
\(\mathfrak{e}\)) is given by~\eqref{eq:impsampeff}.
\begin{equation}
\label{eq:impsampeff}
\int_0^1\frac{f(x)}{B}\dd{x} = \frac{A}{B} = \mathfrak{e}\leq 1
\int_0^1\frac{g(x)}{B}\cdot\frac{f(x)}{g(x)}\dd{x} = \int_0^1\frac{f(x)}{B}\dd{x} = \frac{A}{B} = \mathfrak{e}\leq 1
\end{equation}
This means that not all samples are being accepted and gives a measure
on the efficiency \(\mathfrak{e}\) of the sampling method. The closer
\(g\) is to \(f\) the higher is \(\mathfrak{e}\).
The closer the volumes enclosed by \(g\) and \(f\) are to each other,
the higher \(\mathfrak{e}\) becomes.
Choosing \(g\) like~\eqref{eq:primitiveg} yields \(y = x\cdot A\), so
that the procedure simplifies to choosing random numbers
Choosing \(g\) like~\eqref{eq:primitiveg} and looking back
at~\eqref{eq:solutionsamp} yields \(y = x\cdot A\), so that the
sampling procedure simplifies to choosing random numbers
\(x\in [0,1]\) and accepting them with the probability
\(f(x)/g(x)\). The efficiency of this approach is related to how much
\(f\) differs from \(f_{\text{max}}\), which is in turn related to the
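A minimal sketch of this constant-bound variant in Python, reusing
the stand-in \(f\) from above with \(g = f_{\text{max}}\); the ratio
of accepted to proposed points estimates \(\mathfrak{e}\):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng()

def f(x):
    """Stand-in for the unnormalised density on [0, 1]."""
    return 1.0 + x**2

f_max = 2.0  # maximum of f on [0, 1], here known analytically

def sample_hit_or_miss(n):
    """Accept uniform proposals x with probability f(x)/f_max."""
    accepted, trials = [], 0
    while len(accepted) < n:
        x = rng.uniform()
        trials += 1
        if rng.uniform() <= f(x) / f_max:
            accepted.append(x)
    return np.array(accepted), len(accepted) / trials

samples, efficiency = sample_hit_or_miss(1000)
# efficiency fluctuates around A/B = (4/3)/2 = 2/3 for this f
\end{verbatim}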
@ -109,7 +111,7 @@ This very low efficiency stems from the fact that \(f_{\cos\theta}\)
is a lot smaller than its upper bound for most of the sampling
interval.
\begin{wrapfigure}{l}{.5\textwidth}
\begin{wrapfigure}[15]{l}{.5\textwidth}
\plot{xs_sampling/upper_bound}
\caption{\label{fig:distcos} The distribution~\eqref{eq:distcos} and an upper bound of
the form \(a + b\cdot x^2\).}
@ -126,25 +128,28 @@ to~\result{xs/python/eta_eff}, again due to the decrease in variance.
\subsection{Stratified Sampling}%
\label{sec:stratsamp}
Finding a suitable upper bound or variable transform requires effort
and detailed knowledge about the distribution and is hard to
automate\footnote{Sherpa does in fact do this by looking at the
propagators in the matrix elements.}. Revisiting the idea
behind~\eqref{eq:takesample2d} but looking at probability density
\(\rho\) on \(\Omega\) leads to a slight reformulation of the method
discussed in~\ref{sec:hitmiss}. Note that without loss of generality
one can again choose \(\Omega = [0, 1]\).
Revisiting the idea behind~\eqref{eq:takesample2d} but choosing
\(g=\rho\), where \(\rho\) is a probability density on \(\Omega\),
leads to another two-stage process. Note that without loss of
generality one can choose \(\Omega = [0, 1]\) as is done here.
Define \(h=\max_{x\in\Omega}f(x)/\rho(x)\), take a sample
\(\{\tilde{x}_i\}\sim\rho\) distributed according to \(\rho\) and
accept each sample point with the probability
\(f(x_i)/(\rho(x_i)\cdot h)\). This is very similar to the procedure
described in~\ref{sec:hitmiss} with \(g=\rho\cdot h\), but here the
step of generating samples distributed according to \(\rho\) is left
out.
Assume that a sample \(\{x_i\}\) of \(f/\rho\) has been obtained by
the methods of~\ref{sec:mcsamp} and~\ref{sec:hitmiss}. Accepting each
sample point \(x_i\) with the probability \(\rho(x_i)\) will cancel
out the \(\rho^{-1}\) factor, and the resulting sample will be
distributed according to \(f\). Now, instead of discarding samples,
one can combine this idea with the hit or miss method with a constant
upper bound. Define \(h=\max_{x\in\Omega}f(x)/\rho(x)\), take a sample
\(\{\tilde{x}_i\}\sim\rho\) distributed according to \(\rho\) and
accept each sample point with the probability
\(f(\tilde{x}_i)/(\rho(\tilde{x}_i)\cdot h)\). The resulting
probability that \(\tilde{x}_i\in[x, x+\dd{x}]\) is
\(\rho(x)\cdot f(x)/(\rho(x)\cdot h)\dd{x}=f(x)\dd{x}/h\). The
efficiency of this method is given by~\eqref{eq:strateff}.
The important benefit of this method is, that step of generating
samples according to some other function \(g\) is no longer
necessary. This is useful when samples of \(\rho\) can be obtained
with little effort (see below). The efficiency of this method is given
by~\eqref{eq:strateff}.
\begin{equation}
\label{eq:strateff}
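A minimal Python sketch of this two-stage process, with a made-up
proposal density \(\rho\) that can be sampled by inverse transform
and the same stand-in \(f\) as above:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng()

def f(x):
    """Stand-in for the unnormalised density on [0, 1]."""
    return 1.0 + x**2

def rho(x):
    """Made-up proposal density on [0, 1], normalised to one."""
    return (1.0 + x) / 1.5

def sample_rho(n):
    """Inverse transform for rho: solve (y + y**2/2)/1.5 = u."""
    return np.sqrt(1.0 + 3.0 * rng.uniform(size=n)) - 1.0

h = 1.5  # max of f/rho on [0, 1], attained at the endpoints

def sample_with_proposal(n):
    """Accept rho-distributed proposals with probability f/(rho*h)."""
    accepted = []
    while len(accepted) < n:
        x = sample_rho(1)[0]
        if rng.uniform() <= f(x) / (rho(x) * h):
            accepted.append(x)
    return np.array(accepted)

samples = sample_with_proposal(1000)
# efficiency A/h = (4/3)/1.5 ~ 0.89, up from 2/3 with a constant bound
\end{verbatim}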
@ -156,7 +161,7 @@ It may seem startling that \(h\) determines the efficiency, because
but~\eqref{eq:hlessa} states that \(\mathfrak{e}\) is well-formed
(\(\mathfrak{e}\leq 1\)). Although \(h\) is determined by a single
point, being the maximum is a global property, and there is also the
constrain \(\int_0^1\rho(x)\dd{x}=1\) to be considered.
constraint \(\int_0^1\rho(x)\dd{x}=1\) to be considered.
\begin{equation}
\label{eq:hlessa}
@ -164,17 +169,17 @@ constrain \(\int_0^1\rho(x)\dd{x}=1\) to be considered.
\int_0^1\rho(x)\cdot h\dd{x} = h
\end{equation}
The closer \(h\) approaches \(A\) the better the efficiency gets. In
the optimal case \(\rho=f/A\) and thus \(h=A\) or
\(\mathfrak{e} = 1\). Now this distribution can be approximated in the
way discussed in~\ref{sec:mcintvegas} by using the hypercubes found
by~\vegas. The distribution \(\rho\) takes on the
form~\eqref{eq:vegasrho}. The effect of this approach is visualized
in~\ref{fig:vegasdist} and the resulting sampling efficiency
\result{xs/python/strat_th_samp} (using
by~\vegas and simply generating the same number of uniformly
distributed samples in each hypercube (stratified sampling). The
distribution \(\rho\) takes on the form~\eqref{eq:vegasrho}. The
effect of this approach is visualized in~\ref{fig:vegasdist} and the
resulting sampling efficiency \result{xs/python/strat_th_samp} (using
\result{xs/python/vegas_samp_num_increments} increments) is a great
improvement over the hit or miss method in \ref{sec:hitmiss}. By using
improvement over the hit or miss method in~\ref{sec:hitmiss}. By using
more increments, better efficiencies can be achieved, although the
run-time of \vegas\ increases. The advantage of \vegas\ in this
situation is that the computation of the increments has to be done
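Assuming~\eqref{eq:vegasrho} has the usual piecewise-constant \vegas\
form \(\rho(x) = 1/(N\Delta x_i)\) on the \(i\)-th of \(N\)
increments, the sampling step reduces to drawing the same number of
uniform points in each increment. A minimal Python sketch; the
increment edges below are made up, in practice they come from~\vegas:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng()

# Made-up increment edges; in practice the output of vegas.
edges = np.array([0.0, 0.35, 0.6, 0.8, 1.0])
widths = np.diff(edges)
N = len(widths)

def rho(x):
    """Piecewise-constant density, weight 1/N per increment."""
    i = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, N - 1)
    return 1.0 / (N * widths[i])

def sample_rho(n_per_increment):
    """Stratified sampling: equally many uniform points per increment."""
    u = rng.uniform(size=(N, n_per_increment))
    return (edges[:-1, None] + u * widths[:, None]).ravel()

samples = sample_rho(250)  # N * 250 points distributed according to rho
\end{verbatim}
These points would then enter the acceptance step sketched above with
\(h=\max_{x\in\Omega}f(x)/\rho(x)\).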