What the heck should be in there. Let's draft up an outline.
20 minutes: bloody short, so just results.

* Intro :1_30m:
** Importance of MC Methods :SHORT:
- important tool in particle physics
- not just numerical: also applications in stat. phys. and lattice QCD
- somewhat romantic: distilling information with entropy
- interface with experiment
- precision predictions within and beyond the SM
- validation of new theories
- some predictions are more subtle than just the existence of new particles
- backgrounds have to be subtracted
** Diphoton Process
- Feynman diagrams and reaction formula
- Higgs decay channel
- di-Higgs decay
- pure QED
* Calculation of the XS :TOO_LONG: :5m:
** Approach
- formalism well separated from the underlying theory
- but it can fool intuition (spin arguments)
- over the course of the semester: learned more about the theory :)
- translating Feynman diagrams to abstract matrix elements is straightforward
- first try: Casimir's trick
  - error in the calculation + one identity unknown
- second try: evaluating the matrices directly
  - discovered a lot of tricks
  - error prone
- back to studying the formalism: completeness relation for real photons
- a matter of algebraic gymnastics: boils down to some trace and Dirac-matrix gymnastics
- mixing terms cancel out, although they are not zero by themselves
- resulting expression for the ME: essentially the t/u-channel propagators (1/(t*u)) and the spin correlation 1 + cos^2(θ)
- only angular dependencies, no kinematics, "nice" factors
- symmetric in θ
** Result + Sherpa
- apply the golden rule for 2->2 processes
- show plots and total XS
- shape verified later -> we need sampling techniques first
* Monte Carlo Methods :8m:
- one simple idea that can be exploited and refined
- how to extract information from a totally unknown function?
- look at it -> random points are the most "symmetric" choice
- statistics to the rescue
- what does this have to do with Minecraft
- theory deals with truly random (uncorrelated) numbers so that the statistics apply; PRNGs cater to that: deterministic, efficient (we don't do crypto)
** Integration
- integration as a mean value
- convergence due to the law of large numbers
- independent of dimension
- trivially parallel
- result normally distributed with width σ due to the central limit theorem
- goal: speeding up convergence
  1. modify the distribution
  2. change the integration variable
  3. subdivide the integration volume
- all those methods can be somewhat intertwined
- focus on some simple methods
*** Naive Integration
- why mix in a distribution at all: here we simply choose it uniform
- the integral is the mean (see the sketch below)
- the variance is the variance of the function: the stddev is linear in the volume!
- include result
- ridiculous sample size
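A minimal sketch of the naive estimator, assuming nothing beyond the bullets above (uniform sample, mean as the integral estimate, standard error from the sample variance); the integrand and sample size are placeholders, not the actual cross-section integrand:

#+begin_src python
import numpy as np

rng = np.random.default_rng()

def mc_integrate(f, a, b, n=100_000):
    """Naive MC estimate of the integral of f over [a, b] from a uniform sample."""
    x = rng.uniform(a, b, n)                    # uniform sample points
    fx = f(x)
    vol = b - a
    integral = vol * fx.mean()                  # integral as (volume times) mean value
    error = vol * fx.std(ddof=1) / np.sqrt(n)   # std. error: linear in the volume, ~1/sqrt(n)
    return integral, error

# toy integrand with the 1 + cos^2(θ) shape (placeholder, not the full dσ/dΩ)
print(mc_integrate(lambda t: 1.0 + np.cos(t) ** 2, 0.0, np.pi))
#+end_src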
**** TODO compare to other numerical methods
*** Change of Variables
- drastic improvement by transforming to η
- only works by chance (more or less)
- pseudorapidity eats up the angular divergence
- can be shown: same effect as a probability density
- implementation is different
*** VEGAS
- a simple ρ: a step function on hypercubes, which can be generated trivially
- effectively subdivides the integration volume
- optimal: same variance in every cube
- easier to optimize: approximate the optimal ρ by a step function
- clarify: use a rectangular grid and blank out unwanted edges with a θ function
- nice feature: the integrand does not have to be smooth :)
- similar efficiency as the variable-transformation case
- but a lot of room for parameter adjustment and tuning
**** TODO research the drawbacks that led to VEGAS
**** TODO nice visualization of VEGAS working
**** TODO look at the original VEGAS paper - in the 70s/80s memory was a constraint
** Sampling
- why: generate events, just like experimental measurements (includes statistical effects)
- events can be "dressed" with more effects
- usual case: we have access to uniformly distributed random values
- task: convert this sample into a sample of another distribution
- in short: solve an equation
*** Hit or Miss
- we don't always know f; it may have a complicated (not explicitly known) form
- solve "by proxy": generate a sample of g and accept with probability f/g
- the closer g is to f, the better the efficiency
- simplest choice: a flat upper bound (see the sketch below)
- show results etc.
- one can optimize the upper bound with VEGAS
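A minimal sketch of hit or miss with the flat upper bound mentioned above (accept a uniform proposal x with probability f(x)/f_max); the target function and bound are placeholders:

#+begin_src python
import numpy as np

rng = np.random.default_rng()

def hit_or_miss(f, a, b, f_max, n):
    """Rejection sampling with a flat majorant g = f_max on [a, b]."""
    accepted = []
    while len(accepted) < n:
        x = rng.uniform(a, b, n)          # proposals drawn from the flat g
        u = rng.uniform(0.0, f_max, n)    # vertical coordinate for the hit/miss test
        accepted.extend(x[u < f(x)])      # keep the hits, discard the misses
    return np.array(accepted[:n])

# toy target with the 1 + cos^2(θ) shape on [0, π]; its maximum is 2, so f_max = 2 is a valid bound
theta = hit_or_miss(lambda t: 1.0 + np.cos(t) ** 2, 0.0, np.pi, f_max=2.0, n=10_000)
#+end_src

The acceptance rate of the flat bound is the mean of f divided by f_max, which is why a tighter g (e.g. the VEGAS step function) improves the efficiency.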
*** Change of Variables
- reduction of variance, similar to integration
- simplify or reduce the variance
- removes the step of generating g-samples
- show results etc.
- hard to automate, but intuition and 'general rules' may serve well
- see the later case with PDFs -> choose η right away
*** Hit or Miss VEGAS
- use the scaled VEGAS distribution as g and do hit or miss
- samples for g are trivial to generate
- VEGAS again approximates the optimal distribution
- results etc.
- advantage: no function-specific input
- problem: isolated parts of the distribution can drag down the efficiency
  - where the hypercube approximation does not work well
  - especially at discontinuities
**** TODO add pic that I've sent Frank
*** Stratified Sampling
- avoid global effects: subdivide the integration interval and sample independently
- first generate coarse samples and distribute them over the respective grid cells
- optimization: make cubes with low efficiency small! -> VEGAS
- this approach was used for the self-made event generator and improved the efficiency greatly (< 1% to 30%)
- disadvantage: the accuracy of the upper bounds and grid weights has to be good
- will come back to this
*** Observables
- particle identities and kinematics determine the final state
- other observables can be calculated on a per-event basis
- as can be shown, this yields the correct distributions without knowledge of the Jacobian
** Outlook
- of course there are more methods
- Sherpa exploits the form of the propagators etc.
- multichannel sampling uses multiple distributions for importance sampling and can be optimized "live"
- https://www.sciencedirect.com/science/article/pii/0010465594900434
*** TODO Other modern Stuff
* Toy Event Generator :3m:
** Basics :SHORT:
- just sampling the hard XS is not realistic
  1. free quarks do not occur in nature
  2. hadron interactions are more complicated in general
- we address the first problem here
- quarks in protons: no analytical bound-state solution known so far
*** Parton Density Functions
- in leading order and the high-momentum limit: the probability to encounter a parton at some energy scale with some momentum fraction
- cannot be calculated from first principles, has to be fitted to experimental data
- can be evolved to other Q^2 with DGLAP
- *calculated* with lattice QCD: very recently https://arxiv.org/abs/2005.02102
- the scale has to be chosen appropriately: in deep inelastic scattering -> the momentum transfer
- p_T is a good choice
- here: s/2 (the mean of t and u in this case)
- XS formula (factorized form sketched below)
- here: LO fit and evolution of the PDFs
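A sketch of the factorized LO form that the XS-formula bullet refers to, written with assumed notation (f_q: proton PDFs, μ_F: factorization scale, hatted quantities: partonic); the concrete scale choice is the s/2 discussed above:

\[
\frac{\mathrm{d}\sigma_{pp\to\gamma\gamma+X}}{\mathrm{d}x_1\,\mathrm{d}x_2\,\mathrm{d}\cos\hat\theta}
= \sum_q \left[ f_q(x_1,\mu_F^2)\, f_{\bar q}(x_2,\mu_F^2) + f_{\bar q}(x_1,\mu_F^2)\, f_q(x_2,\mu_F^2) \right]
\frac{\mathrm{d}\hat\sigma_{q\bar q\to\gamma\gamma}}{\mathrm{d}\cos\hat\theta}\left(\hat s = x_1 x_2 s\right)
\]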
**** TODO check s/2
** Implementation
- find the XS in the lab frame
- impose more cuts
  - guarantee applicability of the massless limit
  - satisfy experimental requirements
- used VEGAS to integrate
- cuts now more complicated because the photons are not back to back
- apply the stratified-sampling variant along with VEGAS
- 3 dimensions: x1, x2 (symmetric), η
- use VEGAS to find the grid, grid weights and maxima
- improve the maxima by gradient ascent (usually very fast)
- improve performance by cythonizing the XS and cut computation
- sampling routines JIT-compiled with numba: especially performant for loops and /very/ easy
- trivial parallelism through Python multiprocessing
- overestimating the maxima corrects for the numerical maximization error
- assumptions: the MC found the maximum, and the VEGAS weights are precise enough
- most time-consuming part: the multidimensional implementation + debugging
- along the way: validation of kinematics and PDF values against Sherpa
** Results
*** Integration with VEGAS
- Python tax: very slow; parallelism implemented but omitted due to complications with the PDF library
- also very inefficient memory management :P
- result compatible with Sherpa
- that was the easy part
*** Sampling and Observables
- observables:
  - usual: η and cosθ
  - p_T of one photon and the invariant mass are more interesting
- influence of the PDFs:
  - more weight at central angles (see η)
  - p_T cutoff due to the cuts, very steep falloff due to the PDFs
  - same picture in the invariant mass
- compatibility problematic: just within acceptable limits
  - for p_T and the invariant mass: low statistics and a very steep falloff
  - very sensitive to uncertainties of the weights (can be improved by improving the accuracy of VEGAS)
- prompts a more rigorous study of the uncertainties in the VEGAS step!
* Pheno Stuff :2m:
- non-LO effects completely neglected so far
- the Sherpa generator allows modelling some of them
- always approximations
** Short review of HO Effects
- always introduce the stage and its effects along with the nice event picture
*** LO
- same as the toy generator
*** LO+PS
- parton shower ~CSS~ (dipole) activated
- radiation of gluons and splitting into quarks -> shower-like QCD cascades
- as there are no QCD particles in the FS: initial-state radiation
- due to 4-momentum conservation: recoil momenta (and energies)
*** LO+PS+pT
- beam remnants and primordial transverse momenta simulated
- additional radiation and parton showers
- primordial p_T due to the localization of the quarks, modeled as a Gaussian distribution
- mean, sigma: 0.8 GeV (the standard values in Sherpa)
- consistent with the notion of "Fermi motion"
*** LO+PS+pT+Hadronization
- AHADIC activated (cluster hadronization)
- jets of partons cluster into hadrons: non-perturbative
- models inspired by QCD, but still just models
- mainly affects the isolation of the photons (come back to that)
- in Sherpa, unstable particles are decayed (using lookup tables) with correct kinematics
*** LO+PS+pT+Hadronization+MI
- Multiple Interactions (AMISIC) turned on
- no reason for just one single scattering per event
- based on the overlap of the hadrons and the most important QCD scattering processes
- in Sherpa: shower corrections
- generally more particles in the FS, affects isolation
** Presentation and Discussion of selected Histograms
*** pT of the γγ system
- parton showers enhance the spectrum at higher pT
- intrinsic pT shows up at lower pT (around 1 GeV)
- some impact from isolation, but the largest from the phase-space cuts
- the increase is almost one percent
- pT recoils against the diphoton system usually subtract pT from one photon -> harder to pass the cuts -> amplified by the large probability of low-pT events!
*** pT of the leading and sub-leading photon
- shape similar to LO
- the leading photon gets a slight pT boost, the sub-leading one is almost untouched
- cut bias towards events that have little effect on the sub-leading photon
*** Invariant Mass
- events with lower m are now allowed through the cuts
- events with very high recoil are suppressed: collinear limit...
- (per-event computation of these observables: see the sketch below)
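As a reminder of how the histogrammed quantities above are obtained per event (cf. the Observables subsection), a minimal sketch for two massless photons given (p_T, η, φ); the function name and inputs are placeholders, not the generator's actual interface:

#+begin_src python
import numpy as np

def diphoton_observables(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass and pT of the γγ system from two massless photons (pT, η, φ)."""
    def four_momentum(pt, eta, phi):
        # massless photon: E = pT*cosh(η), p = (pT*cos(φ), pT*sin(φ), pT*sinh(η))
        return np.array([pt * np.cosh(eta),
                         pt * np.cos(phi),
                         pt * np.sin(phi),
                         pt * np.sinh(eta)])

    p = four_momentum(pt1, eta1, phi1) + four_momentum(pt2, eta2, phi2)
    m_gg = np.sqrt(max(p[0]**2 - p[1]**2 - p[2]**2 - p[3]**2, 0.0))  # invariant mass
    pt_gg = np.hypot(p[1], p[2])                                     # pT of the pair
    return m_gg, pt_gg

# example: two back-to-back 50 GeV photons at η = 0 -> m_γγ ≈ 100 GeV, pT_γγ ≈ 0
print(diphoton_observables(50.0, 0.0, 0.0, 50.0, 0.0, np.pi))
#+end_src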
*** Angular Observables
- mostly untouched
- biggest difference: the total XS and details
- but LO gives a good qualitative picture
- reasonable, because LO should be dominating
*** Effects of Hadronization and MI
- fiducial XS differs because of isolation and the phase-space cuts
- we've seen: parton showers affect the kinematics and thus the shape of observables and the phase-space cuts
- isolation criteria:
  - a photon has to be isolated in the detector
  - allow only a certain amount of energy in a cone around the photon
  - force a minimum separation of the photons to prevent cone overlap
- hadronization spreads out the FS particles (decay kinematics) and produces particles like muons and neutrinos that aren't detectable or are easily filtered out -> decrease in the isolation toll
- MI increases the hadronic activity in the FS -> more events filtered out
*** Summary
- LO gives the qualitative picture
- NLO effects change the shape of observables and create new interesting observables
- some NLO effects mainly affect the isolation
- caveat: non-exhaustive, no QED radiation enabled
* Wrap-Up
** Summary
- calculated the XS
- studied and applied simple MC methods
- built a basic working event generator
- looked at what lies beyond that simple generator
** Lessons Learned (if any)
- calculations have to be done verbosely and explicitly
- spending time on tooling is OK
- have to put more time into detailed diagnosis
- event generators are marvelously complex
- should have introduced the term importance sampling properly
** Outlook
- more effects
- multi-channel MC
- better validation of VEGAS