reformulate text about stratified sampling

hiro98 2020-05-12 19:20:16 +02:00
parent 9e35ef1a54
commit e6b0f69502
2 changed files with 54 additions and 49 deletions


@ -223,4 +223,4 @@ sample density and lower weights, flattening out the integrand.
Generally the result gets better with more increments, but at the cost
of more \vegas\ iterations. The intermediate values of those
iterations can be accumulated to improve the accuracy of the end
result.~\cite[197]{Lepage:19781an}
result~\cite[197]{Lepage:19781an}.
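One common way to realise this accumulation is an inverse-variance
weighted average of the per-iteration estimates. A minimal Python
sketch (the \(\chi^2\) consistency check of~\cite{Lepage:19781an} is
omitted, and the input numbers are made up for illustration):
\begin{verbatim}
import numpy as np

def accumulate(estimates, variances):
    """Combine per-iteration estimates by inverse-variance weighting."""
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    mean = np.sum(weights * estimates) / np.sum(weights)
    return mean, 1.0 / np.sum(weights)  # combined mean and variance

# e.g. three iterations with shrinking variance
mean, var = accumulate([1.02, 0.98, 1.00], [1e-2, 5e-3, 2e-3])
\end{verbatim}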


@ -7,20 +7,21 @@
\label{sec:mcsamp}
Drawing representative samples from a probability distribution (for
example a differential cross section) from which one can the calculate
samples from the distribution of other observables without explicit
transformation of the distribution is another important problem. Here
the one-dimensional case is discussed. The general case follows by
sampling the dimensions sequentially.
example a differential cross section) results in a set of
\emph{events}, the same kind of data that is gathered in experiments,
from which one can then calculate samples from the distributions of
other observables without explicit transformation of the
distribution. Here the one-dimensional case is discussed. The general
case follows by sampling the dimensions sequentially.
Consider a function \(f\colon x\in\Omega\mapsto\mathbb{R}_{\geq 0}\)
where \(\Omega = [0, 1]\) without loss of generality. Such a function
is proportional to a probability density \(\tilde{f}\). When \(X\) is
a uniformly distributed random variable on~\([0, 1]\) then a sample
\({x_i}\) of this variable can be transformed into a sample of
\(Y\sim\tilde{f}\). Let \(x\) be a single sample of \(X\), then a
sample \(y\) of \(Y\) can be obtained by solving~\eqref{eq:takesample}
for \(y\).
a uniformly distributed random variable on~\([0, 1]\) (which can
easily be generated), then a sample \(\{x_i\}\) of this variable can
be transformed into a sample of \(Y\sim\tilde{f}\). Let \(x\) be a
single sample of \(X\), then a sample \(y\) of \(Y\) can be obtained
by solving~\eqref{eq:takesample} for \(y\).
\begin{equation}
\label{eq:takesample}
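Numerically, solving~\eqref{eq:takesample} for \(y\) amounts to
inverting the cumulative integral of \(f\). A minimal Python sketch,
with a made-up integrand standing in for \(f\) and generic SciPy
routines doing the quadrature and root finding:
\begin{verbatim}
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def f(x):
    """Stand-in for the unnormalised density on [0, 1]."""
    return 1.0 + x**2

A, _ = quad(f, 0.0, 1.0)  # normalisation: area under f

def take_sample(x):
    """Solve int_0^y f(x') dx' / A = x for y (inverse CDF)."""
    return brentq(lambda y: quad(f, 0.0, y)[0] / A - x, 0.0, 1.0)

rng = np.random.default_rng()
samples = [take_sample(x) for x in rng.uniform(size=1000)]
\end{verbatim}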
@ -50,11 +51,11 @@ obtained numerically or one can change variables to simplify.
\subsection{Hit or Miss}%
\label{sec:hitmiss}
The problem can be reformulated by introducing a
positive function \(g\colon x\in\Omega\mapsto\mathbb{R}_{\geq 0}\)
with \(\forall x\in\Omega\colon g(x)\geq
f(x)\).
If integrating \(f\) and/or inverting \(F\) is too expensive or a
fully \(f\)-agnostic method is desired, the problem can be
reformulated by introducing a positive function
\(g\colon x\in\Omega\mapsto\mathbb{R}_{\geq 0}\) with
\(\forall x\in\Omega\colon g(x)\geq f(x)\).
Observing~\eqref{eq:takesample2d} suggests that one generates samples
which are distributed according to \(g/B\), where
@ -69,20 +70,21 @@ probability~\(f/g\), so that \(g\) cancels out. This method is called
= \int_{0}^{y}g(x')\int_{0}^{\frac{f(x')}{g(x')}}\dd{z}\dd{x'}
\end{equation}
The thus obtained samples are then distributed according to \(f/B\) so
that~\eqref{eq:impsampeff} holds.
The thus obtained samples are then distributed according to \(f/B\)
and the total probability of accepting a sample (the efficiency
\(\mathfrak{e}\)) is given by~\eqref{eq:impsampeff}.
\begin{equation}
\label{eq:impsampeff}
\int_0^1\frac{f(x)}{B}\dd{x} = \frac{A}{B} = \mathfrak{e}\leq 1
\int_0^1\frac{g(x)}{B}\cdot\frac{f(x)}{g(x)}\dd{x} = \int_0^1\frac{f(x)}{B}\dd{x} = \frac{A}{B} = \mathfrak{e}\leq 1
\end{equation}
This means that not all samples are being accepted and gives a measure
on the efficiency \(\mathfrak{e}\) of the sampling method. The closer
\(g\) is to \(f\) the higher is \(\mathfrak{e}\).
The closer the volumes enclosed by \(g\) and \(f\) are to each other,
the higher \(\mathfrak{e}\) becomes.
Choosing \(g\) like~\eqref{eq:primitiveg} yields \(y = x\cdot A\), so
that the procedure simplifies to choosing random numbers
Choosing \(g\) like~\eqref{eq:primitiveg} and looking back
at~\eqref{eq:solutionsamp} yields \(y = x\cdot A\), so that the
sampling procedure simplifies to choosing random numbers
\(x\in [0,1]\) and accepting them with the probability
\(f(x)/g(x)\). The efficiency of this approach is related to how much
\(f\) differs from \(f_{\text{max}}\), which is in turn related to the
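A minimal sketch of this constant-bound variant in Python, reusing
the stand-in \(f\) from above with \(g = f_{\text{max}}\); the ratio
of accepted to proposed points estimates \(\mathfrak{e}\):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng()

def f(x):
    """Stand-in for the unnormalised density on [0, 1]."""
    return 1.0 + x**2

f_max = 2.0  # maximum of f on [0, 1], here known analytically

def sample_hit_or_miss(n):
    """Accept uniform proposals x with probability f(x)/f_max."""
    accepted, trials = [], 0
    while len(accepted) < n:
        x = rng.uniform()
        trials += 1
        if rng.uniform() <= f(x) / f_max:
            accepted.append(x)
    return np.array(accepted), len(accepted) / trials

samples, efficiency = sample_hit_or_miss(1000)
# efficiency fluctuates around A/B = (4/3)/2 = 2/3 for this f
\end{verbatim}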
@ -109,7 +111,7 @@ This very low efficiency stems from the fact that \(f_{\cos\theta}\)
is a lot smaller than its upper bound for most of the sampling
interval.
\begin{wrapfigure}{l}{.5\textwidth}
\begin{wrapfigure}[15]{l}{.5\textwidth}
\plot{xs_sampling/upper_bound}
\caption{\label{fig:distcos} The distribution~\eqref{eq:distcos} and an upper bound of
the form \(a + b\cdot x^2\).}
@ -126,25 +128,28 @@ to~\result{xs/python/eta_eff}, again due to the decrease in variance.
\subsection{Stratified Sampling}%
\label{sec:stratsamp}
Finding a suitable upper bound or variable transform requires effort
and detailed knowledge about the distribution and is hard to
automate\footnote{Sherpa does in fact do this by looking at the
propagators in the matrix elements.}. Revisiting the idea
behind~\eqref{eq:takesample2d} but looking at probability density
\(\rho\) on \(\Omega\) leads to a slight reformulation of the method
discussed in~\ref{sec:hitmiss}. Note that without loss of generality
one can again choose \(\Omega = [0, 1]\).
Revisiting the idea behind~\eqref{eq:takesample2d} but choosing
\(g=\rho\), where \(\rho\) is a probability density on \(\Omega\),
leads to another two-stage process. Note that without loss of
generality one can choose \(\Omega = [0, 1]\) as is done here.
Define \(h=\max_{x\in\Omega}f(x)/\rho(x)\), take a sample
\(\{\tilde{x}_i\}\sim\rho\) distributed according to \(\rho\) and
accept each sample point with the probability
\(f(x_i)/(\rho(x_i)\cdot h)\). This is very similar to the procedure
described in~\ref{sec:hitmiss} with \(g=\rho\cdot h\), but here the
step of generating samples distributed according to \(\rho\) is left
out.
Assume that a sample \(\{x_i\}\) of \(f/\rho\) has been obtained by
the methods of~\ref{sec:mcsamp} and~\ref{sec:hitmiss}. Accepting each
sample point \(x_i\) with the probability \(\rho(x_i)\) will cancel
out the \(\rho^{-1}\) factor, and the resulting sample will be
distributed according to \(f\). Now, instead of discarding samples,
one can combine this idea with the hit or miss method with a constant
upper bound. Define \(h=\max_{x\in\Omega}f(x)/\rho(x)\), take a sample
\(\{\tilde{x}_i\}\sim\rho\) distributed according to \(\rho\) and
accept each sample point with the probability
\(f(\tilde{x}_i)/(\rho(\tilde{x}_i)\cdot h)\). The resulting
probability that \(\tilde{x}_i\in[x, x+\dd{x}]\) is
\(\rho(x)\cdot f(x)/(\rho(x)\cdot h)\dd{x}=f(x)\dd{x}/h\). The
efficiency of this method is given by~\eqref{eq:strateff}.
The important benefit of this method is, that step of generating
samples according to some other function \(g\) is no longer
necessary. This is useful when samples of \(\rho\) can be obtained
with little effort (see below). The efficiency of this method is given
by~\eqref{eq:strateff}.
\begin{equation}
\label{eq:strateff}
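A minimal Python sketch of this two-stage process, with a made-up
proposal density \(\rho\) that can be sampled by inverse transform
and the same stand-in \(f\) as above:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng()

def f(x):
    """Stand-in for the unnormalised density on [0, 1]."""
    return 1.0 + x**2

def rho(x):
    """Made-up proposal density on [0, 1], normalised to one."""
    return (1.0 + x) / 1.5

def sample_rho(n):
    """Inverse transform for rho: solve (y + y**2/2)/1.5 = u."""
    return np.sqrt(1.0 + 3.0 * rng.uniform(size=n)) - 1.0

h = 1.5  # max of f/rho on [0, 1], attained at the endpoints

def sample_with_proposal(n):
    """Accept rho-distributed proposals with probability f/(rho*h)."""
    accepted = []
    while len(accepted) < n:
        x = sample_rho(1)[0]
        if rng.uniform() <= f(x) / (rho(x) * h):
            accepted.append(x)
    return np.array(accepted)

samples = sample_with_proposal(1000)
# efficiency A/h = (4/3)/1.5 ~ 0.89, up from 2/3 with a constant bound
\end{verbatim}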
@ -156,7 +161,7 @@ It may seem startling that \(h\) determines the efficiency, because
but~\eqref{eq:hlessa} states that \(\mathfrak{e}\) is well-formed
(\(\mathfrak{e}\leq 1\)). Although \(h\) is determined by a single
point, being the maximum is a global property, and there is also the
constrain \(\int_0^1\rho(x)\dd{x}=1\) to be considered.
constraint \(\int_0^1\rho(x)\dd{x}=1\) to be considered.
\begin{equation}
\label{eq:hlessa}
@ -164,17 +169,17 @@ constrain \(\int_0^1\rho(x)\dd{x}=1\) to be considered.
\int_0^1\rho(x)\cdot h\dd{x} = h
\end{equation}
The closer \(h\) approaches \(A\) the better the efficiency gets. In
the optimal case \(\rho=f/A\) and thus \(h=A\) or
\(\mathfrak{e} = 1\). Now this distribution can be approximated in the
way discussed in~\ref{sec:mcintvegas} by using the hypercubes found
by~\vegas. The distribution \(\rho\) takes on the
form~\eqref{eq:vegasrho}. The effect of this approach is visualized
in~\ref{fig:vegasdist} and the resulting sampling efficiency
\result{xs/python/strat_th_samp} (using
by~\vegas and simply generating the same number of uniformly
distributed samples in each hypercube (stratified sampling). The
distribution \(\rho\) takes on the form~\eqref{eq:vegasrho}. The
effect of this approach is visualized in~\ref{fig:vegasdist} and the
resulting sampling efficiency \result{xs/python/strat_th_samp} (using
\result{xs/python/vegas_samp_num_increments} increments) is a great
improvement over the hit or miss method in \ref{sec:hitmiss}. By using
improvement over the hit or miss method in~\ref{sec:hitmiss}. By using
more increments, better efficiencies can be achieved, although the
run-time of \vegas\ increases. The advantage of \vegas\ in this
situation is that the computation of the increments has to be done
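Assuming~\eqref{eq:vegasrho} has the usual piecewise-constant \vegas\
form \(\rho(x) = 1/(N\Delta x_i)\) on the \(i\)-th of \(N\)
increments, the sampling step reduces to drawing the same number of
uniform points in each increment. A minimal Python sketch; the
increment edges below are made up, in practice they come from~\vegas:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng()

# Made-up increment edges; in practice the output of vegas.
edges = np.array([0.0, 0.35, 0.6, 0.8, 1.0])
widths = np.diff(edges)
N = len(widths)

def rho(x):
    """Piecewise-constant density, weight 1/N per increment."""
    i = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, N - 1)
    return 1.0 / (N * widths[i])

def sample_rho(n_per_increment):
    """Stratified sampling: equally many uniform points per increment."""
    u = rng.uniform(size=(N, n_per_increment))
    return (edges[:-1, None] + u * widths[:, None]).ravel()

samples = sample_rho(250)  # N * 250 points distributed according to rho
\end{verbatim}
These points would then enter the acceptance step sketched above with
\(h=\max_{x\in\Omega}f(x)/\rho(x)\).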