arb/doc/source/gamma.rst

405 lines
17 KiB
ReStructuredText

**gamma.h** -- support for the gamma function
===============================================================================
This module implements various algorithms for evaluating the
gamma function and related functions. The functions provided here are mainly
intended for internal use, though they may be useful to call directly in some
applications where the default algorithm choices are suboptimal.
Most applications should use the standard, user-friendly top-level functions
in the *fmprb* and *fmpcb* modules.
Evaluation using Taylor series
--------------------------------------------------------------------------------
.. function :: long gamma_taylor_bound_mag(long n)
Letting `1/\Gamma(x) = \sum_{n=1}^{\infty} c_n x^n`, returns an integer
`p` such that `|c_n| \le 2^{-p}`.
Uses the bound `\log |c_n| \le b(n) = ((\pi-1)n + (3-5n) \log(n/6))/6`
(to be published).
.. function :: void gamma_taylor_bound_ratio(fmpr_t r, long n)
Sets `r` to a bound for `B(n+1) / B(n)` where `B(n) = \exp b(n)`.
We use `r(n) = 3n^{-5/6}`.
.. function :: void gamma_taylor_bound_remainder(fmpr_t err, const fmpr_t z, long n)
Given a nonnegative `z` and `n \ge 1`, computes a bound for
`\sum_{k=n}^{\infty} |c_{k+1}| z^k`. This is given by
`B(n+1) z^n / (1 - r(n+1) z)`, provided that the denominator is
positive (otherwise the bound is infinity).
.. function :: void gamma_taylor_fmprb(fmprb_t y, const fmprb_t x, long prec)
Sets `y = \Gamma(x)`, computed by translating `x` to the interval
`[-1/2,1/2]` and then evaluating the Taylor series for the
reciprocal gamma function.
Assumes that the absolute value of the midpoint of `x` is not too large
in order for the translation to be reasonably efficient (too large
values will result in extreme slowness and eventually an exception).
Evaluation using the Stirling series
--------------------------------------------------------------------------------
.. function:: void gamma_stirling_choose_param_fmprb(int * reflect, long * r, long * n, const fmprb_t x, int allow_reflection, int digamma, long prec)
.. function:: void gamma_stirling_choose_param_fmpcb(int * reflect, long * r, long * n, const fmprb_t x, int allow_reflection, int digamma, long prec)
Compute parameters `r`, `n` such that the remainder in the Stirling
series with `z = x+yi` approximately is bounded by `2^{-\mathrm{prec}}`.
If *digamma* is nonzero, the calculation is done for the digamma
function.
The parameter `n` is the truncation point in the asymptotic
Stirling series. If `|z|` is too small for the Stirling series
to give sufficient accuracy directly, we first translate to `z + r`
using the formula `\Gamma(z) = \Gamma(z+r) /
(z (z+1) (z+2) \cdots (z+r-1))`.
If *allow_reflection* is nonzero, the *reflect* flag is set if `z`
should be replaced with `1-z` using the reflection formula.
This function uses double precision arithmetic internally,
and does not guarantee the error bound rigorously;
a rigorous error bound, which also accounts for the radius of `z`,
is computed a posteriori when evaluating the Stirling series.
However, in practice, this function does estimate the bound
very accurately.
To obtain a remainder smaller than `2^{-b}`, we must choose an `r` such
that, in the real case, `x + r > \beta b`, where
`\beta > \log(2) / (2 \pi) \approx 0.11`.
In practice, a slightly larger factor `\beta \approx 0.2` more closely
balances `n` and `r`. A much larger `\beta` (e.g. `\beta = 1`) could be
used to reduce the number of Bernoulli numbers that have to be
precomputed, at the expense of slower repeated evaluation.
.. function :: void gamma_stirling_coeff(fmprb_t b, ulong k, int digamma, long prec)
Sets `b = B_{2k} / (2k (2k-1))`, rounded to *prec* bits, or if *digamma*
is nonzero, sets `b = B_{2k} / (2k)`.
.. function :: void gamma_stirling_eval_series_fmprb(fmprb_t s, const fmprb_t z, long n, int digamma, long prec)
.. function :: void gamma_stirling_eval_series_fmpcb(fmpcb_t s, const fmpcb_t z, long n, int digamma, long prec)
Evaluates the Stirling series
.. math ::
\log \Gamma(z) - R(n,z) = \left(z-\frac{1}{2}\right)\log z - z +
\frac{\ln {2 \pi}}{2} + \sum_{k=1}^{n-1} t_k
where
.. math ::
t_k = \frac{B_{2k}}{2k(2k-1)z^{2k-1}}.
If *digamma* is nonzero, the derivative of this series (i.e. the
expansion for the digamma function) is evaluated.
The error bound
.. math ::
|R(n,z)| \le \frac{|t_n|}{\cos(0.5 \arg(z))^{2n}}
is included in the output (when evaluating the digamma function, the
expression is the same except that `t_n` changes to the
differentiated term and the exponent `2n` changes to `2n+1`
(section 5.11 in [NIST2012]_).
.. function :: void gamma_stirling_bound_phase(fmpr_t bound, const fmpcb_t z, long prec)
Sets *bound* to an upper bound for the phase factor
`b = 1/\cos(\operatorname{arg}(z)/2)` which appears in the error bound
for the Stirling series. By trigonometric identities, assuming
that `z = x+yi`, we have `b = \sqrt{1 + t^2}` where
.. math ::
t = \frac{y}{\sqrt{x^2 + y^2} + x} = \frac{\sqrt{x^2 + y^2} - x}{y}
We bound `x` and `y` such that `|\operatorname{arg}(x+yi)|`
is maximized, and then evaluate `t` with the choice of square root
expression that avoids cancellation, using directional rounding throughout.
.. function :: void gamma_stirling_bound_fmprb(fmpr_struct * err, const fmprb_t z, long k0, long knum, long n)
.. function :: void gamma_stirling_bound_fmpcb(fmpr_struct * err, const fmpcb_t z, long k0, long knum, long n)
Computes bounds for the truncation error in the Stirling series
when summed up to term `n - 1` inclusive. An exact expression for the
truncation error is given (see [Olv1997]_ pp. 293-295) by
.. math ::
R_n(z) = \int_0^{\infty} \frac{B_{2n} - {\tilde B}_{2n}(x)}{2n(x+z)^{2n}} dt.
We optionally evaluate the bound for several terms in the
Taylor series: considering `R_n(z+t) \in \mathbb{C}[[t]]`, we
compute bounds for the coefficient of `t^k` for *knum* consecutive
values of *k* starting with *k0*.
Using the fact that the numerator of the integrand is bounded in
absolute value by `2 |B_{2n}|`, and using the bound for `|x+z|`
given by [Olv1997]_, we obtain
.. math ::
|[t^k] R_n(z+t)| \le 2 |B_{2n}|
\frac{\Gamma(2n+k-1)}{\Gamma(k+1) \Gamma(2n+1)}
\; |z| \; (b / |z|)^{2n+k}
where `b` is the phase factor implemented by
:func:`gamma_stirling_bound_phase`.
Rising factorials
--------------------------------------------------------------------------------
.. function :: void gamma_rising_fmprb_ui_bsplit_simple(fmprb_t y, const fmprb_t x, ulong n, long prec)
.. function :: void gamma_rising_fmprb_ui_bsplit_eight(fmprb_t y, const fmprb_t x, ulong n, long prec)
.. function :: void gamma_rising_fmprb_ui_bsplit_rectangular(fmprb_t y, const fmprb_t x, ulong n, ulong step, long prec)
.. function :: void gamma_rising_fmprb_ui_bsplit(fmprb_t y, const fmprb_t x, ulong n, long prec)
.. function :: void gamma_rising_fmpcb_ui_bsplit_simple(fmpcb_t y, const fmpcb_t x, ulong n, long prec)
.. function :: void gamma_rising_fmpcb_ui_bsplit_eight(fmpcb_t y, const fmpcb_t x, ulong n, long prec)
.. function :: void gamma_rising_fmpcb_ui_bsplit_rectangular(fmpcb_t y, const fmpcb_t x, ulong n, ulong step, long prec)
.. function :: void gamma_rising_fmpcb_ui_bsplit(fmpcb_t y, const fmpcb_t x, ulong n, long prec)
Sets `y` to the rising factorial `x (x+1) (x+2) \cdots (x+n-1)`,
computed using binary splitting.
The different versions of this function process the basecase differently.
The *simple* version simply multiplies together several factors
one after another.
The *eight* version processes eight factors at a time using the formula
.. math ::
x(x+1)\cdots(x+7) = (28 + 98x + 63x^2 + 14x^3 + x^4)^2 - 16 (7+2x)^2,
replacing 7 full-precision multiplications with 3 squarings,
1 multiplication, and several linear operations ([CP2005]_, page 316).
Empirically, if `x` is a full-precision number, this is about twice as
fast as the *simple* version at high precision. Numerical stability is
slightly worse.
The *rectangular* version processes *step* factors at a time by
expanding the polynomial `f(t) = t (t+1) (t+2) \cdots (t+\mathrm{step}-1)`
and evaluating each factor `f(x + \mathrm{step} \, k)`
using rectangular splitting. At very high precision, if `x` is a
full-precision number, this asymptotically reduces the number of
full-precision multiplications required. If the *step* parameter
is set to zero, a default value is used.
The functions *gamma_rising_fmprb_ui_bsplit* and
*gamma_rising_fmpcb_ui_bsplit* automatically choose
an algorithm depending on the inputs.
.. function :: void gamma_rising_fmprb_ui_delta(fmprb_t y, const fmprb_t x, ulong n, ulong m, long prec)
.. function :: void gamma_rising_fmpcb_ui_delta(fmpcb_t y, const fmpcb_t x, ulong n, ulong m, long prec)
Sets `y` to the rising factorial `x (x+1) (x+2) \cdots (x+n-1)`,
computed as a product of partial products
`(x+k)(x+k+1)\cdots(x+k+m-1)`. Each partial product is obtained
from the previous by using a precomputed table of powers of `x` to
evaluate the difference
.. math ::
\Delta_m(x,k) = (x+k+m)_{(m)} - (x+k)_{(m)}.
The instance `m = 4` of this algorithm was used by Smith ([Smi2001]_),
but we generalize it to a variable `m` which can be chosen nearly
optimally depending on the precision and `n`.
The polynomials `\Delta_m(x,k) \in \mathbb{Z}[k][x]` are generated dynamically.
Expanding the rising factorials, applying the binomial theorem
a couple of times, and doing several rearrangements of the sums, we
find the closed form
.. math ::
\Delta_m(x,k) = \sum_{v=0}^{m-1} x^v \sum_{i=0}^{m-v-1} k^i C_m(v,i),
where
.. math ::
C_m(v,i) = \sum_{j=i+1}^{m-v} m^{j-i} \left[{m \atop v+j}\right] {{v+j} \choose v} {j \choose i}
in which the square bracket denotes an unsigned Stirling number
of the first kind.
.. function :: void gamma_rising_fmprb_ui_multipoint(fmprb_t f, const fmprb_t c, ulong n, long prec)
Sets `y` to the rising factorial `x (x+1) (x+2) \cdots (x+n-1)`,
computed using fast multipoint evaluation. This only requires
`O(n^{1/2+\varepsilon})` multiplications, but has high overhead
and poor numerical stability (adding `O(n)` guard bits to the input
might be necessary to achieve full accuracy). It can be expected to
be faster than the binary splitting algorithm if the input is a
full-precision number, the precision is at least 100000 bits,
and *n* is of the same order of magnitude as (perhaps slightly
smaller than) the number of bits.
.. function :: void gamma_harmonic_sum_fmprb_ui_bsplit_simple(fmprb_t y, const fmprb_t x, ulong n, long prec)
.. function :: void gamma_harmonic_sum_fmprb_ui_bsplit_rectangular(fmprb_t y, const fmprb_t x, ulong n, ulong step, long prec)
.. function :: void gamma_harmonic_sum_fmprb_ui_bsplit(fmprb_t y, const fmprb_t x, ulong n, long prec)
.. function :: void gamma_harmonic_sum_fmpcb_ui_bsplit_simple(fmpcb_t y, const fmpcb_t x, ulong n, long prec)
.. function :: void gamma_harmonic_sum_fmpcb_ui_bsplit_rectangular(fmpcb_t y, const fmpcb_t x, ulong n, ulong step, long prec)
.. function :: void gamma_harmonic_sum_fmpcb_ui_bsplit(fmpcb_t y, const fmpcb_t x, ulong n, long prec)
Sets `y` to the harmonic sum `1/x + 1/(x+1) + \ldots + 1/(x+n-1)`,
computed using division-avoiding binary splitting.
The *rectangular* version processes *step* terms at a time in
analogy with the rising factorial algorithm.
Letting `f(t) = t (t+1) (t+2) \cdots (t+n-1)`, we have
`1/x + \ldots + 1/(x+n-1) = f'(x) / f(x)`.
If the *step* parameter is set to zero, a default value is used.
The functions *gamma_harmonic_sum_fmprb_ui_bsplit* and
*gamma_harmonic_sum_fmpcb_ui_bsplit* automatically choose
an algorithm depending on the inputs.
Rational arguments
--------------------------------------------------------------------------------
.. function:: void gamma_small_frac(fmprb_t y, unsigned int p, unsigned int q, long prec)
Efficiently evaluates `y = \Gamma(p/q)` where `p/q` (assumed to be reduced)
is one of `1, 1/2, 1/3, 2/3, 1/4, 3/4, 1/6, 5/6`.
The cases `\Gamma(1) = 1` and `\Gamma(1/2) = \sqrt \pi` are trivial.
We reduce all remaining cases to `\Gamma(1/3)` or `\Gamma(1/4)`
using the following relations:
.. math ::
\Gamma(2/3) = \frac{2 \pi}{3^{1/2} \Gamma(1/3)}, \quad \quad
\Gamma(3/4) = \frac{2^{1/2} \pi}{\Gamma(1/4)},
\Gamma(1/6) = \frac{\Gamma(1/3)^2}{(\pi/3)^{1/2} 2^{1/3}}, \quad \quad
\Gamma(5/6) = \frac{2 \pi (\pi/3)^{1/2} 2^{1/3}}{\Gamma(1/3)^2}.
The values of `\Gamma(1/3)` and `\Gamma(1/4)` are cached for fast
repeated evaluation. We compute them rapidly to high precision using
.. math ::
\Gamma(1/3) = \left( \frac{12 \pi^4}{\sqrt{10}}
\sum_{k=0}^{\infty}
\frac{(6k)!(-1)^k}{(k!)^3 (3k)! 3^k 160^{3k}} \right)^{1/6}, \quad \quad
\Gamma(1/4) = \sqrt{\frac{(2\pi)^{3/2}}{\operatorname{agm}(1, \sqrt 2)}}.
An alternative formula which could be used for `\Gamma(1/3)` is
.. math ::
\Gamma(1/3) = \frac{2^{4/9} \pi^{2/3}}{3^{1/12} \left( \operatorname{agm}\left(1,\frac{1}{2} \sqrt{2+\sqrt{3}}\right)\right)^{1/3}},
but this appears to be slightly slower in practice.
.. function:: void gamma_series_fmpq_hypgeom(fmprb_ptr res, const fmpq_t a, long len, long prec)
Given a rational number `0 < a \le 1`, uses binary splitting to compute
*len* coefficients in the Taylor series of `\Gamma(a+x)`, i.e. computes
`\Gamma(a), \Gamma'(a) ... \Gamma^{(\mathrm{len}-1)}(a) / (\mathrm{len}-1)!`.
In particular, with *len* = 1, this function computes `\Gamma(a)`
efficiently for small rational *a*.
The *len* = 1 case of this algorithm dates back to Brent [Bre1978]_,
and the extension to higher derivatives was done by Karatsuba [Kar1998]_.
Karatsuba's original algorithm is suboptimal for large *len*;
we use the faster algorithm given without error bounds
by Borwein, Bradley and Crandall [BBC2000]_.
The algorithm consists of evaluating the finite part of
.. math ::
\Gamma(s) = \int_0^{\infty} e^{-t} t^{s-1} dt =
N^s \left( \sum_{k=0}^R \frac{(-1)^k N^k}{(k + s) k!} + S \right) + I
where
.. math ::
S = \sum_{k=R+1}^{\infty} \frac{(-1)^k N^k}{(k + s) k!}
and
.. math ::
I = \int_N^{\infty} e^{-t} t^{s-1} dt.
This formula is valid for complex `s` with `\Re{s} > 0`.
It is therefore also valid if `s` is a power series argument `s = a + x`
where `\Re{a} > 0`, so doing the arithmetic with truncated power
series gives us the derivatives.
We now discuss choosing the parameters `R` and `N`, and bounding
the error terms `S` and `I`. We assume that `0 < \Re{a} \le 1`, `N \ge 1`
and `R \ge 2 N`. The coefficients of `I` are given by
.. math ::
I = \int_N^{\infty} e^{-t} t^{a+x-1} dt =
\sum_{j=0}^{\infty} \frac{x^j}{j!}
\int_N^{\infty} e^{-t} t^{a-1} \log^j t \; dt.
As shown by Karatsuba, the integrals are bounded in absolute value by
`2 e^{-N} \log^j N`. Thus, for a precision of `p` bits, `N` should be
about `p \log 2`.
Expanding the terms in `S` as geometric series gives
.. math ::
S = \sum_{j=0}^{\infty} \, x^j \,
\sum_{k=R+1}^{\infty} \frac{(-1)^{k+j} N^k}{(k+a)^{j+1} k!}.
By the assumption that `R \ge 2 N`, the sums are bounded by
.. math ::
\frac{N^R}{R^{j+1} R!} \left(\frac{1}{2} + \frac{1}{4} + \ldots\right) =
\frac{N^R}{R^{j+1} R!} \le \frac{1}{R^{j+1}} N^R \left(\frac{e}{R}\right)^R.
Let `R = cN` where `c` is to be determined. Expanding the
logarithm of `N^R \left(\frac{e}{R}\right)^R` around `N = \infty`
gives the approximate magnitude `(c - c \log c) N`. Setting this equal
to `-p \log 2`, we find that we should take
`c = 1/W(1/e) \approx 3.59112147666862` where `W(x)` is the
Lambert W-function. (Karatsuba gives the incorrect value `c = 3`).
We also estimate the working precision needed in the binary splitting
stage (the binary splitting could be done with exact arithmetic, but
this is unnecessarily costly). Assume that the sum is around unity in
magnitude. The binary logarithm of term `k` is roughly
`b(k) = k \log_2 N + k \log_2 e - k \log_2 k`. Since
`b'(k) = \log_2 N - \log_2 k`, the largest term magnitude occurs
roughly at `k = N`, so we need to increase the working precision
by about `b(N) = N / \log 2` bits.