reformulate shit about stratified sampling

hiro98 2020-05-12 19:20:16 +02:00
parent 9e35ef1a54
commit e6b0f69502
2 changed files with 54 additions and 49 deletions


@@ -223,4 +223,4 @@ sample density and lower weights, flattening out the integrand.
Generally the result gets better with more increments, but at the cost
of more \vegas\ iterations. The intermediate values of those
iterations can be accumulated to improve the accuracy of the end
result~\cite[197]{Lepage:19781an}.
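One common way to accumulate the iterations (a sketch of the
variance-weighted average customarily used with \vegas; the symbols
\(I_j\) and \(\sigma_j\) for the estimate and standard deviation of
iteration \(j\) are introduced here only for illustration) is
\begin{equation}
  \bar{I} = \bar{\sigma}^2\sum_{j}\frac{I_j}{\sigma_j^2},
  \qquad
  \frac{1}{\bar{\sigma}^2} = \sum_{j}\frac{1}{\sigma_j^2}.
\end{equation}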


@@ -7,20 +7,21 @@
\label{sec:mcsamp}
Drawing representative samples from a probability distribution (for
example a differential cross section) results in a set of
\emph{events}, the same kind of data that is gathered in experiments,
from which one can then calculate samples of the distributions of
other observables without explicit transformation of the
distribution. Here the one-dimensional case is discussed. The general
case follows by sampling the dimensions sequentially.
Consider a function \(f\colon x\in\Omega\mapsto\mathbb{R}_{\geq 0}\)
where \(\Omega = [0, 1]\) without loss of generality. Such a function
is proportional to a probability density \(\tilde{f}\). When \(X\) is
a uniformly distributed random variable on~\([0, 1]\) (which can be
generated easily), then a sample \(\{x_i\}\) of this variable can be
transformed into a sample of \(Y\sim\tilde{f}\). Let \(x\) be a single
sample of \(X\); then a sample \(y\) of \(Y\) can be obtained by
solving~\eqref{eq:takesample} for \(y\).
\begin{equation}
\label{eq:takesample}
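In code, this transformation method might look as follows (a minimal
sketch, not the thesis implementation; the use of SciPy's quad and
brentq on \(\Omega = [0, 1]\) is an assumption):

import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def transform_samples(f, num):
    """Turn uniform samples on [0, 1] into samples of the density
    proportional to f by solving F(y) = x * A for y."""
    A = quad(f, 0, 1)[0]                # A: total integral of f
    F = lambda y: quad(f, 0, y)[0]      # F: antiderivative of f
    xs = np.random.uniform(size=num)    # uniform samples of X
    # f >= 0 makes F monotone, so each root is unique in [0, 1]
    return np.array([brentq(lambda y: F(y) - x * A, 0, 1) for x in xs])

ys = transform_samples(lambda x: 1 + x**2, 1000)  # density ~ 1 + x^2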
@@ -50,11 +51,11 @@ obtained numerically or one can change variables to simplify.
\subsection{Hit or Miss}%
\label{sec:hitmiss}
If integrating \(f\) and/or inverting \(F\) is too expensive or a
fully \(f\)-agnostic method is desired, the problem can be
reformulated by introducing a positive function
\(g\colon x\in\Omega\mapsto\mathbb{R}_{\geq 0}\) with
\(\forall x\in\Omega\colon g(x)\geq f(x)\).
Observing~\eqref{eq:takesample2d} suggests that one generates samples
which are distributed according to \(g/B\), where
@@ -69,20 +70,21 @@ probability~\(f/g\), so that \(g\) cancels out. This method is called
= \int_{0}^{y}g(x')\int_{0}^{\frac{f(x')}{g(x')}}\dd{z}\dd{x'}
\end{equation}
The samples obtained in this way are distributed according to
\(f/B\), and the total probability of accepting a sample (the
efficiency \(\mathfrak{e}\)) is given by~\eqref{eq:impsampeff}.
\begin{equation}
\label{eq:impsampeff}
\int_0^1\frac{g(x)}{B}\cdot\frac{f(x)}{g(x)}\dd{x} = \int_0^1\frac{f(x)}{B}\dd{x} = \frac{A}{B} = \mathfrak{e}\leq 1
\end{equation}
The closer the volumes enclosed by \(g\) and \(f\) are to each other,
the higher \(\mathfrak{e}\) becomes.
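As a concrete illustration, a minimal sketch of this hit-or-miss
scheme in Python (the argument sample_g, a routine that draws values
distributed according to \(g/B\), for instance via the transformation
method above, is an assumed ingredient):

import numpy as np

def hit_or_miss(f, g, sample_g, num):
    """Draw proposals distributed according to g/B and accept each one
    with probability f/g; the accepted values then follow f/A."""
    accepted, tried = [], 0
    while len(accepted) < num:
        x = sample_g()                          # proposal distributed as g/B
        tried += 1
        if np.random.uniform() <= f(x) / g(x):  # accept with probability f/g
            accepted.append(x)
    return np.array(accepted), num / tried      # samples, efficiency estimate

The acceptance rate returned alongside the samples is a direct
estimate of the efficiency \(\mathfrak{e} = A/B\).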
Choosing \(g\) like~\eqref{eq:primitiveg} and looking back
at~\eqref{eq:solutionsamp} yields \(y = x\cdot A\), so that the
sampling procedure simplifies to choosing random numbers
\(x\in [0,1]\) and accepting them with the probability
\(f(x)/g(x)\). The efficiency of this approach is related to how much
\(f\) differs from \(f_{\text{max}}\), which in turn is related to the
@@ -109,7 +111,7 @@ This very low efficiency stems from the fact, that \(f_{\cos\theta}\)
is a lot smaller than its upper bound for most of the sampling
interval.
\begin{wrapfigure}[15]{l}{.5\textwidth}
\plot{xs_sampling/upper_bound}
\caption{\label{fig:distcos} The distribution~\eqref{eq:distcos} and an upper bound of
the form \(a + b\cdot x^2\).}
@@ -126,25 +128,28 @@ to~\result{xs/python/eta_eff}, again due to the decrease in variance.
\subsection{Stratified Sampling}%
\label{sec:stratsamp}
Finding a suitable upper bound or variable transform requires effort
and detailed knowledge about the distribution and is hard to
automate\footnote{Sherpa does in fact do this by looking at the
propagators in the matrix elements.}. Revisiting the idea
behind~\eqref{eq:takesample2d} but looking at a probability density
\(\rho\) on \(\Omega\) leads to a slight reformulation of the method
discussed in~\ref{sec:hitmiss}. Note that without loss of generality
one can again choose \(\Omega = [0, 1]\).
Define \(h=\max_{x\in\Omega}f(x)/\rho(x)\), take a sample
\(\{\tilde{x}_i\}\sim\rho\) distributed according to \(\rho\) and
accept each sample point with the probability
\(f(\tilde{x}_i)/(\rho(\tilde{x}_i)\cdot h)\). The resulting
probability of proposing and accepting a point in \([x, x+\dd{x}]\)
is \(\rho(x)\cdot f(x)/(\rho(x)\cdot h)\dd{x}=f(x)\dd{x}/h\). This is
very similar to the procedure described in~\ref{sec:hitmiss} with
\(g=\rho\cdot h\), but here \(\rho\) is sampled directly.
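A minimal sketch of this variant (the names rho and sample_rho for
the density and its direct sampler are illustrative; f and rho are
assumed to accept NumPy arrays):

import numpy as np

def hit_or_miss_rho(f, rho, sample_rho, num):
    """Draw proposals directly from rho and accept each one with
    probability f/(rho*h); the accepted values then follow f/A."""
    grid = np.linspace(0, 1, 1001)
    h = np.max(f(grid) / rho(grid))  # grid estimate of h = max f/rho
    out = []
    while len(out) < num:
        x = sample_rho()             # no transformation method needed here
        if np.random.uniform() <= f(x) / (rho(x) * h):
            out.append(x)
    return np.array(out)

Note that estimating \(h\) on a grid is only an approximation;
strictly, \(h\) must bound \(f/\rho\) on all of \(\Omega\).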
The important benefit of this method is that the step of generating
samples according to some other function \(g\) by the means
of~\ref{sec:mcsamp} is no longer necessary. This is useful when
samples of \(\rho\) can be obtained with little effort (see
below). The efficiency of this method is given by~\eqref{eq:strateff}.
\begin{equation}
\label{eq:strateff}
@@ -156,7 +161,7 @@ It may seem startling that \(h\) determines the efficiency, because
but~\eqref{eq:hlessa} states that \(\mathfrak{e}\) is well-formed
(\(\mathfrak{e}\leq 1\)). Although \(h\) is determined through a
single point, being the maximum is a global property, and there is
also the constraint \(\int_0^1\rho(x)\dd{x}=1\) to be considered.
\begin{equation}
\label{eq:hlessa}
@ -164,17 +169,17 @@ constrain \(\int_0^1\rho(x)\dd{x}=1\) to be considered.
\int_0^1\rho(x)\cdot h\dd{x} = h
\end{equation}
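For example, choosing the uniform density \(\rho = 1\) on \([0, 1]\)
gives \(h = f_{\text{max}}\), so the efficiency reduces to that of the
constant upper bound used in~\ref{sec:hitmiss}.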
The closer \(h\) approaches \(A\), the better the efficiency gets. In
the optimal case \(\rho=f/A\) and thus \(h=A\), or
\(\mathfrak{e} = 1\). Now this distribution can be approximated in the
way discussed in~\ref{sec:mcintvegas} by using the hypercubes found
by~\vegas\ and simply generating the same number of uniformly
distributed samples in each hypercube (stratified sampling). The
distribution \(\rho\) takes on the form~\eqref{eq:vegasrho}. The
effect of this approach is visualized in~\ref{fig:vegasdist} and the
resulting sampling efficiency \result{xs/python/strat_th_samp} (using
\result{xs/python/vegas_samp_num_increments} increments) is a great
improvement over the hit or miss method in~\ref{sec:hitmiss}. By using
more increments better efficiencies can be achieved, although the
run-time of \vegas\ increases. The advantage of \vegas\ in this
situation is that the computation of the increments has to be done