Mirror of https://github.com/vale981/bachelor_thesis, synced 2025-03-06 01:51:38 -05:00
reformulate shit about stratified sampling
This commit is contained in: parent 9e35ef1a54, commit e6b0f69502
2 changed files with 54 additions and 49 deletions
@@ -223,4 +223,4 @@ sample density and lower weights, flattening out the integrand.
 Generally the result gets better with more increments, but at the cost
 of more \vegas\ iterations. The intermediate values of those
 iterations can be accumulated to improve the accuracy of the end
-result.~\cite[197]{Lepage:19781an}
+result~\cite[197]{Lepage:19781an}.
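How the intermediate values are accumulated is not spelled out in this hunk. One standard way to combine independent estimates (a sketch only; the exact weighting used in the thesis or in \cite[197]{Lepage:19781an} may differ) is inverse-variance weighting of the per-iteration results \(I_i\) with standard errors \(\sigma_i\):

\begin{equation*}
  \bar{I} = \sigma_{\bar{I}}^{2}\sum_i \frac{I_i}{\sigma_i^{2}},
  \qquad
  \sigma_{\bar{I}}^{2} = \left(\sum_i \frac{1}{\sigma_i^{2}}\right)^{-1}
\end{equation*}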
@@ -7,20 +7,21 @@
 \label{sec:mcsamp}

 Drawing representative samples from a probability distribution (for
-example a differential cross section) from which one can the calculate
-samples from the distribution of other observables without explicit
-transformation of the distribution is another important problem. Here
-the one-dimensional case is discussed. The general case follows by
-sampling the dimensions sequentially.
+example a differential cross section) results in a set of
+\emph{events}, the same kind of data that is gathered in experiments
+and from which one can then calculate samples from the distribution of
+other observables without explicit transformation of the
+distribution. Here the one-dimensional case is discussed. The general
+case follows by sampling the dimensions sequentially.

 Consider a function \(f\colon x\in\Omega\mapsto\mathbb{R}_{\geq 0}\)
 where \(\Omega = [0, 1]\) without loss of generality. Such a function
 is proportional to a probability density \(\tilde{f}\). When \(X\) is
-a uniformly distributed random variable on~\([0, 1]\) then a sample
-\({x_i}\) of this variable can be transformed into a sample of
-\(Y\sim\tilde{f}\). Let \(x\) be a single sample of \(X\), then a
-sample \(y\) of \(Y\) can be obtained by solving~\eqref{eq:takesample}
-for \(y\).
+a uniformly distributed random variable on~\([0, 1]\) (which can be
+easily generated), then a sample \(\{x_i\}\) of this variable can be
+transformed into a sample of \(Y\sim\tilde{f}\). Let \(x\) be a single
+sample of \(X\); then a sample \(y\) of \(Y\) can be obtained by
+solving~\eqref{eq:takesample} for \(y\).

 \begin{equation}
 \label{eq:takesample}
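The solving step described in this hunk can be made concrete with a small sketch. The following Python snippet is an illustration only: the name take_sample and the use of SciPy are assumptions, and \eqref{eq:takesample} (not shown in full here) is taken to equate the cumulative integral of \(f\) up to \(y\) with \(x\) times the total integral \(A\); the exact normalisation in the thesis may differ.

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import brentq

    def take_sample(f, num_samples=1000, rng=None):
        """Inverse-transform sampling on [0, 1]: solve F(y) = x*A for y,
        where F(y) is the cumulative integral of f and A = F(1)."""
        rng = rng or np.random.default_rng()
        A = quad(f, 0, 1)[0]                       # total integral of f
        F = lambda y: quad(f, 0, y)[0]             # cumulative integral of f
        xs = rng.uniform(size=num_samples)         # uniform samples of X
        # the root of F(y) - x*A in [0, 1] is the transformed sample y
        return np.array([brentq(lambda y, x=x: F(y) - x * A, 0.0, 1.0) for x in xs])

    # example with the (unnormalised) density f(x) = 1 + x**2
    samples = take_sample(lambda x: 1.0 + x**2, num_samples=100)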
@@ -50,11 +51,11 @@ obtained numerically or one can change variables to simplify.

 \subsection{Hit or Miss}%
 \label{sec:hitmiss}
-
-The problem can be reformulated by introducing a
-positive function \(g\colon x\in\Omega\mapsto\mathbb{R}_{\geq 0}\)
-with \(\forall x\in\Omega\colon g(x)\geq
-f(x)\).
+If integrating \(f\) and/or inverting \(F\) is too expensive or a
+fully \(f\)-agnostic method is desired, the problem can be
+reformulated by introducing a positive function
+\(g\colon x\in\Omega\mapsto\mathbb{R}_{\geq 0}\) with
+\(\forall x\in\Omega\colon g(x)\geq f(x)\).

 Observing~\eqref{eq:takesample2d} suggests that one generates samples
 which are distributed according to \(g/B\), where
@@ -69,20 +70,21 @@ probability~\(f/g\), so that \(g\) cancels out. This method is called
 = \int_{0}^{y}g(x')\int_{0}^{\frac{f(x')}{g(x')}}\dd{z}\dd{x'}
 \end{equation}

-The thus obtained samples are then distributed according to \(f/B\) so
-that~\eqref{eq:impsampeff} holds.
+The samples obtained in this way are then distributed according to \(f/B\)
+and the total probability of accepting a sample (the efficiency
+\(\mathfrak{e}\)) is given by~\eqref{eq:impsampeff}.

 \begin{equation}
 \label{eq:impsampeff}
-\int_0^1\frac{f(x)}{B}\dd{x} = \frac{A}{B} = \mathfrak{e}\leq 1
+\int_0^1\frac{g(x)}{B}\cdot\frac{f(x)}{g(x)}\dd{x} = \int_0^1\frac{f(x)}{B}\dd{x} = \frac{A}{B} = \mathfrak{e}\leq 1
 \end{equation}

-This means that not all samples are being accepted and gives a measure
-on the efficiency \(\mathfrak{e}\) of the sampling method. The closer
-\(g\) is to \(f\) the higher is \(\mathfrak{e}\).
+The closer the volumes enclosed by \(g\) and \(f\) are to each other,
+the higher \(\mathfrak{e}\) becomes.

-Choosing \(g\) like~\eqref{eq:primitiveg} yields \(y = x\cdot A\), so
-that the procedure simplifies to choosing random numbers
+Choosing \(g\) as in~\eqref{eq:primitiveg} and looking back
+at~\eqref{eq:solutionsamp} yields \(y = x\cdot A\), so that the
+sampling procedure simplifies to choosing random numbers
 \(x\in [0,1]\) and accepting them with the probability
 \(f(x)/g(x)\). The efficiency of this approach is related to how much
 \(f\) differs from \(f_{\text{max}}\), which is in turn related to the
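For concreteness, here is a minimal hit-or-miss sketch with a constant upper bound \(g(x) = f_{\text{max}}\). It is an illustration, not the thesis code; the names hit_or_miss and f_max are assumptions.

    import numpy as np

    def hit_or_miss(f, f_max, num_samples, rng=None):
        """Draw x uniformly on [0, 1] and accept it with probability
        f(x)/f_max, i.e. hit-or-miss with the constant bound g = f_max."""
        rng = rng or np.random.default_rng()
        accepted, tries = [], 0
        while len(accepted) < num_samples:
            x = rng.uniform()
            tries += 1
            if rng.uniform() <= f(x) / f_max:   # accept with probability f/g
                accepted.append(x)
        efficiency = num_samples / tries        # estimates A/B from eq:impsampeff
        return np.array(accepted), efficiency

The measured acceptance rate converges to the ratio \(A/B\) of~\eqref{eq:impsampeff}, which is why a tight bound matters.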
@@ -109,7 +111,7 @@ This very low efficiency stems from the fact, that \(f_{\cos\theta}\)
 is a lot smaller than its upper bound for most of the sampling
 interval.

-\begin{wrapfigure}{l}{.5\textwidth}
+\begin{wrapfigure}[15]{l}{.5\textwidth}
 \plot{xs_sampling/upper_bound}
 \caption{\label{fig:distcos} The distribution~\eqref{eq:distcos} and an upper bound of
 the form \(a + b\cdot x^2\).}
@@ -126,25 +128,28 @@ to~\result{xs/python/eta_eff}, again due to the decrease in variance.

 \subsection{Stratified Sampling}%
 \label{sec:stratsamp}
+Finding a suitable upper bound or variable transform requires effort
+and detailed knowledge about the distribution and is hard to
+automate\footnote{Sherpa does in fact do this by looking at the
+propagators in the matrix elements.}. Revisiting the idea
+behind~\eqref{eq:takesample2d}, but looking at a probability density
+\(\rho\) on \(\Omega\), leads to a slight reformulation of the method
+discussed in~\ref{sec:hitmiss}. Note that without loss of generality
+one can again choose \(\Omega = [0, 1]\).

-Revisiting the idea behind~\eqref{eq:takesample2d} but choosing
-\(g=\rho\) where \(\rho\) is a probability density on \(\Omega\)
-leads to another two-stage process. Note that without loss of
-generality one can choose \(\Omega = [0, 1]\) as is done here.
+Define \(h=\max_{x\in\Omega}f(x)/\rho(x)\), take a sample
+\(\{\tilde{x}_i\}\sim\rho\) distributed according to \(\rho\), and
+accept each sample point \(\tilde{x}_i\) with the probability
+\(f(\tilde{x}_i)/(\rho(\tilde{x}_i)\cdot h)\). This is very similar to
+the procedure described in~\ref{sec:hitmiss} with \(g=\rho\cdot h\),
+but here the explicit step of generating samples distributed according
+to \(\rho\) by means of~\ref{sec:mcsamp} is left out.

-Assume that a sample \(\{x_i\}\) of \(f/\rho\) has been obtained
-through by the means of~\ref{sec:mcsamp}
-and~\ref{sec:hitmiss}. Accepting each sample item \(x_i\) with the
-probability \(\rho(x_i)\) will cancel out the \(\rho^{-1}\) factor and
-the resulting sample will be distributed according to \(f\). Now,
-instead of discarding samples, one can combine this idea with the hit
-and miss method with a constant upper bound. Define
-\(h=\max_{x\in\Omega}f(x)/\rho(x)\), take a sample
-\(\{\tilde{x}_i\}\sim\rho\) distributed according to \(\rho\) and accept
-each sample point with the probability \(f(x_i)/(\rho(x_i)\cdot
-h)\). The resulting probability that \(x_i\in[x, x+\dd{x}]\) is
-\(\rho(x)\cdot f(x)/(\rho(x)\cdot h)\dd{x}=f(x)\dd{x}/h\). The efficiency
-of this method is given by~\eqref{eq:strateff}.
+The important benefit of this method is that the step of generating
+samples according to some other function \(g\) is no longer
+necessary. This is useful when samples of \(\rho\) can be obtained
+with little effort (see below). The efficiency of this method is given
+by~\eqref{eq:strateff}.

 \begin{equation}
 \label{eq:strateff}
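The procedure in this hunk (sample from \(\rho\), accept with probability \(f(x)/(\rho(x)\cdot h)\)) can be sketched as follows. This is an illustration only: sample_rho stands for whatever cheap generator of \(\rho\)-distributed numbers is available, and all names are assumptions.

    import numpy as np

    def sample_with_density(f, rho, sample_rho, h, num_samples, rng=None):
        """Accept rho-distributed points with probability f(x)/(rho(x)*h),
        where h >= max_x f(x)/rho(x); accepted points then follow f."""
        rng = rng or np.random.default_rng()
        accepted = []
        while len(accepted) < num_samples:
            x = sample_rho(rng)                  # a sample distributed according to rho
            if rng.uniform() <= f(x) / (rho(x) * h):
                accepted.append(x)
        return np.array(accepted)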
@@ -156,7 +161,7 @@ It may seem startling that \(h\) determines the efficiency, because
 but~\eqref{eq:hlessa} states that \(\mathfrak{e}\) is well-formed
 (\(\mathfrak{e}\leq 1\)). Although \(h\) is determined through a single
 point, being the maximum is a global property and there is also the
-constrain \(\int_0^1\rho(x)\dd{x}=1\) to be considered.
+constraint \(\int_0^1\rho(x)\dd{x}=1\) to be considered.

 \begin{equation}
 \label{eq:hlessa}
@@ -164,17 +169,17 @@ constrain \(\int_0^1\rho(x)\dd{x}=1\) to be considered.
 \int_0^1\rho(x)\cdot h\dd{x} = h
 \end{equation}

-
 The closer \(h\) approaches \(A\), the better the efficiency gets. In
 the optimal case \(\rho=f/A\) and thus \(h=A\) or
 \(\mathfrak{e} = 1\). Now this distribution can be approximated in the
 way discussed in~\ref{sec:mcintvegas} by using the hypercubes found
-by~\vegas. The distribution \(\rho\) takes on the
-form~\eqref{eq:vegasrho}. The effect of this approach is visualized
-in~\ref{fig:vegasdist} and the resulting sampling efficiency
-\result{xs/python/strat_th_samp} (using
+by~\vegas\ and simply generating the same number of uniformly
+distributed samples in each hypercube (stratified sampling). The
+distribution \(\rho\) takes on the form~\eqref{eq:vegasrho}. The
+effect of this approach is visualized in~\ref{fig:vegasdist} and the
+resulting sampling efficiency \result{xs/python/strat_th_samp} (using
 \result{xs/python/vegas_samp_num_increments} increments) is a great
-improvement over the hit or miss method in \ref{sec:hitmiss}. By using
+improvement over the hit or miss method in~\ref{sec:hitmiss}. By using
 more increments better efficiencies can be achieved, although the
 run-time of \vegas\ increases. The advantage of \vegas\ in this
 situation is that the computation of the increments has to be done
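A one-dimensional sketch of the hypercube-based variant described above: \(\rho\) is assumed to have the usual piecewise-constant VEGAS form \(\rho(x) = 1/(K\,\Delta x_k)\) inside increment \(k\) (the precise form of \eqref{eq:vegasrho} is not shown in this diff), the same number of uniform points is drawn in each increment, and each point is accepted with probability \(f(x)/(\rho(x)\cdot h)\). All names here are illustrative.

    import numpy as np

    def sample_stratified(f, edges, per_increment, rng=None):
        """Stratified sampling on [0, 1] with VEGAS-style increments.

        `edges` holds the K+1 increment boundaries found by VEGAS; the
        assumed density is rho(x) = 1/(K * dx_k) inside increment k.
        f must accept numpy arrays."""
        rng = rng or np.random.default_rng()
        K = len(edges) - 1
        widths = np.diff(edges)
        rho = 1.0 / (K * widths)                 # piecewise-constant density

        # crude estimate of h >= max f/rho on a fine grid (an approximation)
        grid = np.linspace(0.0, 1.0, 2001)
        idx = np.clip(np.searchsorted(edges, grid, side="right") - 1, 0, K - 1)
        h = np.max(f(grid) / rho[idx])

        accepted = []
        for k in range(K):                       # same number of samples per increment
            xs = rng.uniform(edges[k], edges[k + 1], size=per_increment)
            keep = rng.uniform(size=per_increment) <= f(xs) / (rho[k] * h)
            accepted.append(xs[keep])
        return np.concatenate(accepted)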