mirror of
https://github.com/vale981/arb
synced 2025-03-06 09:51:39 -05:00
|
|
A variable of type <tt>fmpr_t</tt> holds an arbitrary-precision binary
floating-point number, i.e. a rational number of the form
$x \times 2^y$ where $x, y \in \mathbb{Z}$ and $x$ is odd;
or one of the special values zero, plus infinity, minus infinity,
or NaN (not-a-number).

The component $x$ is called the <i>mantissa</i>, and $y$ is called the
<i>exponent</i>. Note that this is just one among many possible
conventions: the mantissa (alternatively <i>significand</i>) is
sometimes viewed as a fraction in the interval $[1/2, 1)$, with the
exponent pointing to the position above the top bit rather than the
position of the bottom bit, and with a separate sign.

The conventions for special values largely follow those of the
IEEE floating-point standard. At the moment, there is no support
for negative zero, unsigned infinity, or a NaN with a payload, though
some of these might be added in the future.

An <tt>fmpr</tt> number is exact and has no inherent "accuracy". We
use the term <i>precision</i> to denote either the target precision of
an operation, or the bit size of a mantissa (which in general is
unrelated to the "accuracy" of the number: for example, the
floating-point value 1 has a precision of 1 bit in this sense and is
simultaneously an infinitely accurate approximation of the
integer 1 and a 2-bit accurate approximation of
$\sqrt 2 = 1.011010100\ldots_2$).

Except where otherwise noted, the output of an operation is the
floating-point number obtained by taking the inputs as exact numbers,
in principle carrying out the operation exactly, and rounding the
resulting real number to the nearest representable floating-point
number whose mantissa has at most the specified number of bits, in
the specified direction of rounding. Some operations are always
or optionally done exactly.
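The odd-mantissa representation and the "bit size of a mantissa" notion of precision described above can be illustrated with a small toy model. This is only a sketch using 64-bit integers in place of arbitrary-precision ones; the names <tt>toy_fp</tt>, <tt>toy_normalise</tt> and <tt>toy_bits</tt> are hypothetical and are not part of the fmpr module.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: a nonzero value man * 2^exp held in 64-bit integers. */
typedef struct {
    int64_t man;  /* mantissa, odd after normalisation */
    int64_t exp;  /* exponent, i.e. position of the lowest mantissa bit */
} toy_fp;

/* Move trailing zero bits of the mantissa into the exponent so that the
   mantissa becomes odd. The represented value does not change. */
toy_fp toy_normalise(int64_t man, int64_t exp)
{
    toy_fp r;
    assert(man != 0);  /* zero is a special value and is not handled here */
    while (man % 2 == 0) {
        man /= 2;
        exp += 1;
    }
    r.man = man;
    r.exp = exp;
    return r;
}

/* Bit size of the mantissa: the "precision" of a stored number. */
int toy_bits(int64_t man)
{
    uint64_t a = (man < 0) ? (uint64_t) 0 - (uint64_t) man : (uint64_t) man;
    int bits = 0;
    while (a != 0) {
        a >>= 1;
        bits++;
    }
    return bits;
}
```

For instance, 12 is stored in this model as mantissa 3 and exponent 2, and the value 1 has a mantissa of bit size 1, matching the example in the text.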
|
|
|
|
|
|
*******************************************************************************

Types, macros and constants

*******************************************************************************

fmpr_struct

An <tt>fmpr_struct</tt> holds a mantissa and an exponent.
If the mantissa and exponent are sufficiently small, their values are
stored as immediate values in the <tt>fmpr_struct</tt>; large values are
represented by pointers to heap-allocated arbitrary-precision integers.
Currently, both the mantissa and exponent are implemented using
the FLINT <tt>fmpz</tt> type. Special values are currently encoded
by the mantissa being set to zero.

fmpr_t

An <tt>fmpr_t</tt> is defined as an array of length one of type
<tt>fmpr_struct</tt>, permitting an <tt>fmpr_t</tt> to be passed by
reference.
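The array-of-length-one idiom can be sketched in plain C as follows. The types <tt>toy_struct</tt> and <tt>toy_t</tt> and both functions are hypothetical stand-ins that use plain <tt>long</tt> fields instead of <tt>fmpz</tt>; they only illustrate why such a typedef gives pass-by-reference semantics.

```c
#include <assert.h>

/* Hypothetical stand-in for fmpr_struct, with plain longs instead of fmpz. */
typedef struct {
    long man;
    long exp;
} toy_struct;

/* An array of length one: declaring "toy_t x" reserves storage for one
   struct, and passing x to a function passes a pointer to that struct. */
typedef toy_struct toy_t[1];

/* The parameter type toy_t decays to toy_struct *, so the writes below
   are visible to the caller. */
void toy_set_si_2exp(toy_t x, long man, long exp)
{
    x->man = man;
    x->exp = exp;
}

/* Read-only access through a const-qualified parameter, mirroring the
   "const fmpr_t x" style of the signatures in this document. */
long toy_exponent(const toy_t x)
{
    return x->exp;
}
```

A caller declares <tt>toy_t x;</tt> and writes <tt>toy_set_si_2exp(x, 3, -1);</tt> without taking an address explicitly.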
|
|
|
|
fmpr_rnd_t

Specifies the rounding mode for the result of an approximate operation.

FMPR_RND_NEAREST

Specifies that the result of an operation should be rounded to the
nearest representable number, rounding to an odd mantissa if there is a tie
between two values. Note: this rounding mode is currently not implemented.

FMPR_RND_DOWN

Specifies that the result of an operation should be rounded to the
nearest representable number in the direction towards zero.

FMPR_RND_UP

Specifies that the result of an operation should be rounded to the
nearest representable number in the direction away from zero.

FMPR_RND_FLOOR

Specifies that the result of an operation should be rounded to the
nearest representable number in the direction towards minus infinity.

FMPR_RND_CEIL

Specifies that the result of an operation should be rounded to the
nearest representable number in the direction towards plus infinity.
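The four directed rounding modes can be illustrated on ordinary integers. The sketch below is a toy model, not the library's rounding code: <tt>toy_rnd_t</tt> and <tt>toy_round</tt> are hypothetical names, and inputs are assumed to satisfy $|n| < 2^{62}$.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TOY_RND_DOWN, TOY_RND_UP, TOY_RND_FLOOR, TOY_RND_CEIL } toy_rnd_t;

/* Round the integer n so that at most prec significant bits remain,
   in the direction given by rnd. Toy model: requires |n| < 2^62. */
int64_t toy_round(int64_t n, int prec, toy_rnd_t rnd)
{
    int64_t a, unit, q, r;
    int bits = 0;

    assert(prec >= 1);
    assert(n > -((int64_t) 1 << 62) && n < ((int64_t) 1 << 62));

    a = (n < 0) ? -n : n;
    while ((a >> bits) != 0)
        bits++;
    if (bits <= prec)
        return n;                      /* already representable */

    unit = (int64_t) 1 << (bits - prec);
    q = n / unit;                      /* C division truncates toward zero */
    r = n % unit;                      /* zero, or same sign as n */

    if (r != 0) {
        if (rnd == TOY_RND_UP)
            q += (n > 0) ? 1 : -1;     /* away from zero */
        else if (rnd == TOY_RND_FLOOR && n < 0)
            q -= 1;                    /* towards minus infinity */
        else if (rnd == TOY_RND_CEIL && n > 0)
            q += 1;                    /* towards plus infinity */
    }
    return q * unit;
}
```

With a precision of 3 bits, the value 23 rounds to 20 (down, floor) or 24 (up, ceil), while -23 rounds to -20 (down, ceil) or -24 (up, floor).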
|
|
|
|
FMPR_PREC_EXACT

If passed as the precision parameter to a function, indicates that no
rounding is to be performed. This must only be used when it is known
that the result of the operation can be represented exactly and fits
in memory (the typical use case is working with small integer values).
Note that, for example, adding two numbers whose exponents are far
apart can easily produce an exact result that is far too large to
store in memory.
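To make the warning about exponents that are far apart concrete, the following sketch counts the mantissa bits needed to store $2^a + 2^b$ exactly. The function name <tt>toy_exact_sum_bits</tt> is hypothetical, and the calculation assumes that the difference of the exponents fits in a 64-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Mantissa bits needed to store 2^a + 2^b exactly (toy calculation).
   For a != b the sum equals (2^|a-b| + 1) * 2^min(a,b), whose odd
   mantissa has |a-b| + 1 bits. For a == b the sum is 2^(a+1), whose
   mantissa is 1. */
int64_t toy_exact_sum_bits(int64_t a, int64_t b)
{
    int64_t gap = (a > b) ? a - b : b - a;
    return (gap == 0) ? 1 : gap + 1;
}
```

With exponents 0 and $2^{40}$, the exact sum would need about $2^{40}$ mantissa bits, i.e. roughly 128 GiB, even though both inputs have a 1-bit mantissa.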
|
|
|
|
|
|
*******************************************************************************

Memory management

*******************************************************************************

void fmpr_init(fmpr_t x)

Initializes the variable x for use. Its value is set to zero.

void fmpr_clear(fmpr_t x)

Clears the variable x, freeing or recycling its allocated memory.


*******************************************************************************

Special values

*******************************************************************************

void fmpr_zero(fmpr_t x)

void fmpr_one(fmpr_t x)

void fmpr_pos_inf(fmpr_t x)

void fmpr_neg_inf(fmpr_t x)

void fmpr_nan(fmpr_t x)

Sets x respectively to 0, 1, $+\infty$, $-\infty$, NaN.

int fmpr_is_zero(const fmpr_t x)

int fmpr_is_one(const fmpr_t x)

int fmpr_is_pos_inf(const fmpr_t x)

int fmpr_is_neg_inf(const fmpr_t x)

int fmpr_is_nan(const fmpr_t x)

Returns nonzero iff x respectively equals
0, 1, $+\infty$, $-\infty$, NaN.

int fmpr_is_inf(const fmpr_t x)

Returns nonzero iff x equals either $+\infty$ or $-\infty$.

int fmpr_is_normal(const fmpr_t x)

Returns nonzero iff x is a finite, nonzero floating-point value, i.e.
not one of the special values 0, $+\infty$, $-\infty$, NaN.

int fmpr_is_special(const fmpr_t x)

Returns nonzero iff x is one of the special values
0, $+\infty$, $-\infty$, NaN, i.e. not a finite, nonzero
floating-point value.
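The relationship between these predicates can be sketched with a toy encoding in which a zero mantissa marks a special value, as mentioned for <tt>fmpr_struct</tt>. The tag values stored in the exponent field below are invented for this illustration and are an assumption, not the library's actual encoding; all names are hypothetical.

```c
#include <assert.h>

/* Toy encoding: mantissa zero marks a special value. The tags kept in
   the exponent field are made up for this sketch. */
enum { TOY_ZERO = 0, TOY_POS_INF = 1, TOY_NEG_INF = 2, TOY_NAN = 3 };

typedef struct {
    long man;
    long exp;
} toy_fp;

/* Special means one of 0, +inf, -inf, NaN. */
int toy_is_special(const toy_fp *x) { return x->man == 0; }

/* Normal is the exact complement of special: finite and nonzero. */
int toy_is_normal(const toy_fp *x) { return !toy_is_special(x); }

int toy_is_zero(const toy_fp *x) { return x->man == 0 && x->exp == TOY_ZERO; }

int toy_is_inf(const toy_fp *x)
{
    return x->man == 0 && (x->exp == TOY_POS_INF || x->exp == TOY_NEG_INF);
}

int toy_is_nan(const toy_fp *x) { return x->man == 0 && x->exp == TOY_NAN; }
```

Because the mantissa of a finite nonzero number is odd, hence nonzero, a single comparison with zero separates normal from special values in this model.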
|
|
|
|
|
|
*******************************************************************************

Assignment and rounding

*******************************************************************************

long _fmpr_normalise(fmpz_t man, fmpz_t exp, long prec, fmpr_rnd_t rnd)

Rounds the mantissa and exponent in-place.

void fmpr_set(fmpr_t y, const fmpr_t x)

Sets y to a copy of x.

long fmpr_set_round(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

Sets y to a copy of x rounded in the direction specified by rnd to the
number of bits specified by prec.

void fmpr_set_error_result(fmpr_t err, const fmpr_t result, long rret)

Given the return value rret and output variable result from a
function performing a rounding (e.g. fmpr_set_round or fmpr_add), sets
err to a bound for the absolute error.

void fmpr_add_error_result(fmpr_t err, const fmpr_t err_in,
    const fmpr_t result, long rret, long prec, fmpr_rnd_t rnd)

Like fmpr_set_error_result, but adds err_in to the error.

*******************************************************************************

Comparisons

*******************************************************************************

int fmpr_equal(const fmpr_t x, const fmpr_t y)

Returns nonzero iff x and y are exactly equal. This function does
not treat NaN specially, i.e. NaN compares as equal to itself.

int fmpr_cmp(const fmpr_t x, const fmpr_t y)

Returns a negative value, zero, or a positive value, depending on whether
x is respectively smaller than, equal to, or greater than y.
Comparison with NaN is undefined.

int fmpr_cmpabs(const fmpr_t x, const fmpr_t y)

Compares the absolute values of x and y.

int fmpr_sgn(const fmpr_t x)

Returns $-1$, $0$ or $+1$ according to the sign of x. The sign
of NaN is undefined.
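An exact comparison of two values of the form mantissa times a power of two can be sketched by aligning both to the smaller exponent. This is a toy model with hypothetical names, assuming mantissas below $2^{31}$ in absolute value and exponents that differ by at most 31, so that everything fits in 64-bit integers; the library functions have no such limits.

```c
#include <assert.h>
#include <stdint.h>

/* Compare a = am * 2^ae with b = bm * 2^be exactly (toy model).
   Returns -1, 0 or +1. Assumes |am|, |bm| < 2^31 and |ae - be| <= 31. */
int toy_cmp(int64_t am, int64_t ae, int64_t bm, int64_t be)
{
    int64_t lo = (ae < be) ? ae : be;
    int64_t x, y;

    assert(am > -((int64_t) 1 << 31) && am < ((int64_t) 1 << 31));
    assert(bm > -((int64_t) 1 << 31) && bm < ((int64_t) 1 << 31));
    assert(ae - lo <= 31 && be - lo <= 31);

    /* Scale both values by 2^(-lo) so that they become plain integers. */
    x = am * ((int64_t) 1 << (ae - lo));
    y = bm * ((int64_t) 1 << (be - lo));
    return (x > y) - (x < y);
}

/* Compare absolute values by dropping the signs of the mantissas. */
int toy_cmpabs(int64_t am, int64_t ae, int64_t bm, int64_t be)
{
    return toy_cmp(am < 0 ? -am : am, ae, bm < 0 ? -bm : bm, be);
}

/* The sign of a finite value is the sign of its mantissa. */
int toy_sgn(int64_t man)
{
    return (man > 0) - (man < 0);
}
```

For example, $3 \times 2^2 = 12$ compares greater than $5 \times 2^1 = 10$, although its mantissa is smaller.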
|
|
|
|
*******************************************************************************

Random number generation

*******************************************************************************

void fmpr_randtest(fmpr_t x, flint_rand_t state, long bits, long mag_bits)

Generates a finite random number whose mantissa has precision at most
<tt>bits</tt> and whose exponent has at most <tt>mag_bits</tt> bits. The
values are distributed non-uniformly: special bit patterns are generated
with high probability in order to allow the test code to exercise corner
cases.

void fmpr_randtest_not_zero(fmpr_t x, flint_rand_t state, long bits,
    long mag_bits)

Identical to <tt>fmpr_randtest</tt>, except that zero is never produced
as an output.

void fmpr_randtest_special(fmpr_t x, flint_rand_t state, long bits,
    long mag_bits)

Identical to <tt>fmpr_randtest</tt>, except that the output is occasionally
set to an infinity or NaN.


*******************************************************************************

Conversions

*******************************************************************************

int fmpr_get_mpfr(mpfr_t x, const fmpr_t y, mpfr_rnd_t rnd)

Sets the MPFR variable <tt>x</tt> to the value of <tt>y</tt>. If the
precision of <tt>x</tt> is too small to allow <tt>y</tt> to be represented
exactly, it is rounded in the specified MPFR rounding mode.
The return value indicates the direction of rounding,
following the standard convention of the MPFR library.

void fmpr_set_mpfr(fmpr_t x, const mpfr_t y)

Sets <tt>x</tt> to the exact value of the MPFR variable <tt>y</tt>.

void fmpr_set_ui(fmpr_t x, ulong c)

void fmpr_set_si(fmpr_t x, long c)

void fmpr_set_fmpz(fmpr_t x, const fmpz_t c)

Sets <tt>x</tt> exactly to the integer <tt>c</tt>.

void fmpr_get_fmpq(fmpq_t y, const fmpr_t x)

Sets y to the exact value of x. The result is undefined
if x is not a finite fraction.

long fmpr_set_fmpq(fmpr_t x, const fmpq_t y, long prec, fmpr_rnd_t rnd)

Sets x to the value of y, rounded according to prec and rnd.

void fmpr_set_fmpz_2exp(fmpr_t x, const fmpz_t man, const fmpz_t exp)

void fmpr_set_si_2exp_si(fmpr_t x, long man, long exp)

void fmpr_set_ui_2exp_si(fmpr_t x, ulong man, long exp)

Sets x to $\mathrm{man} \times 2^{\mathrm{exp}}$.

void fmpr_get_fmpz_2exp(fmpz_t man, fmpz_t exp, const fmpr_t x)

Sets man and exp to the unique integers such that
$x = \mathrm{man} \times 2^{\mathrm{exp}}$ and man is odd,
provided that x is a nonzero finite fraction.
If x is zero, both man and exp are set to zero. If x is infinite or NaN,
the result is undefined.

int fmpr_get_fmpz_fixed_fmpz(fmpz_t y, const fmpr_t x, const fmpz_t e)

int fmpr_get_fmpz_fixed_si(fmpz_t y, const fmpr_t x, long e)

Converts x to a mantissa with predetermined exponent, i.e. computes
an integer y such that $y \times 2^e \approx x$, truncating if necessary.
Returns 0 if exact and 1 if truncation occurred.
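The fixed-exponent conversion can be sketched as a shift of the mantissa by the difference of the two exponents. The toy function below uses 64-bit integers and a hypothetical name; that truncation is performed towards zero is an assumption of this sketch, since the text only says "truncating".

```c
#include <assert.h>
#include <stdint.h>

/* Toy fixed-point conversion: find y with y * 2^e ~ man * 2^exp.
   Returns 0 if the result is exact and 1 if low bits were discarded.
   Assumes |man| < 2^31 and -63 < exp - e < 31. Truncation towards
   zero is an assumption made for this sketch. */
int toy_get_fixed(int64_t *y, int64_t man, int64_t exp, int64_t e)
{
    int64_t shift = exp - e;

    assert(man > -((int64_t) 1 << 31) && man < ((int64_t) 1 << 31));
    assert(shift > -63 && shift < 31);

    if (shift >= 0) {
        /* The target exponent is not larger: scaling up is always exact. */
        *y = man * ((int64_t) 1 << shift);
        return 0;
    } else {
        int64_t unit = (int64_t) 1 << (-shift);
        *y = man / unit;            /* C division truncates towards zero */
        return (man % unit) != 0;   /* nonzero remainder: bits were lost */
    }
}
```

For a normalised input the mantissa is odd, so in this model any target exponent larger than the exponent of x loses at least one bit.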
|
|
|
|
*******************************************************************************

Input and output

*******************************************************************************

void fmpr_print(const fmpr_t x)

Prints the mantissa and exponent of x as integers, precisely showing
the internal representation.

void fmpr_printd(const fmpr_t x, long digits)

Prints x as a decimal floating-point number, rounding to the specified
number of digits. This function is currently implemented using MPFR,
and does not support large exponents.


*******************************************************************************

Arithmetic

*******************************************************************************

void fmpr_neg(fmpr_t y, const fmpr_t x)

Sets y to the negation of x.

long fmpr_neg_round(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

Sets y to the negation of x, rounding the result.

void fmpr_abs(fmpr_t y, const fmpr_t x)

Sets y to the absolute value of x.

long fmpr_add(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

long fmpr_add_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)

long fmpr_add_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)

long fmpr_add_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

Sets $z = x + y$, rounded according to prec and rnd. The precision
can be FMPR_PREC_EXACT to perform an exact addition, provided that the
result fits in memory.

long _fmpr_add_eps(fmpr_t z, const fmpr_t x, int sign, long prec, fmpr_rnd_t rnd)

Sets <tt>z</tt> to the value that results by adding an infinitesimal quantity
of the given sign to <tt>x</tt>, and rounding. The result is undefined
if <tt>x</tt> is zero.

long fmpr_sub(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

long fmpr_sub_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)

long fmpr_sub_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)

long fmpr_sub_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

Sets $z = x - y$, rounded according to prec and rnd. The precision
can be FMPR_PREC_EXACT to perform an exact subtraction, provided that the
result fits in memory.

long fmpr_mul(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

long fmpr_mul_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)

long fmpr_mul_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)

long fmpr_mul_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

Sets $z = x \times y$, rounded according to prec and rnd. The precision
can be FMPR_PREC_EXACT to perform an exact multiplication, provided that the
result fits in memory.

void fmpr_mul_2exp_si(fmpr_t y, const fmpr_t x, long e)

void fmpr_mul_2exp_fmpz(fmpr_t y, const fmpr_t x, const fmpz_t e)

Sets y to x multiplied by $2^e$ without rounding.

long fmpr_div(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

long fmpr_div_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)

long fmpr_ui_div(fmpr_t z, ulong x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

long fmpr_div_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)

long fmpr_si_div(fmpr_t z, long x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

long fmpr_div_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

long fmpr_fmpz_div(fmpr_t z, const fmpz_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

long fmpr_fmpz_div_fmpz(fmpr_t z, const fmpz_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

Sets $z = x / y$, rounded according to prec and rnd. If $y$ is zero,
$z$ is set to NaN.

long fmpr_addmul(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

long fmpr_addmul_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)

long fmpr_addmul_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)

long fmpr_addmul_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

Sets $z = z + x \times y$, rounded according to prec and rnd. The
intermediate multiplication is always performed without roundoff. The
precision can be FMPR_PREC_EXACT to perform an exact addition, provided that the
result fits in memory.
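Why it matters that the intermediate multiplication is performed without roundoff can be seen in a toy integer model. The sketch below compares one final rounding against rounding the product first; it uses truncation towards zero only, hypothetical function names, and assumes all intermediate values stay below $2^{62}$ in absolute value.

```c
#include <assert.h>
#include <stdint.h>

/* Truncate the integer n towards zero so that at most prec significant
   bits remain (toy rounding, requires |n| < 2^62). */
int64_t toy_round_down(int64_t n, int prec)
{
    int64_t a, unit;
    int bits = 0;

    assert(prec >= 1);
    assert(n > -((int64_t) 1 << 62) && n < ((int64_t) 1 << 62));

    a = (n < 0) ? -n : n;
    while ((a >> bits) != 0)
        bits++;
    if (bits <= prec)
        return n;
    unit = (int64_t) 1 << (bits - prec);
    return (n / unit) * unit;
}

/* z + x*y with the product kept exact and a single final rounding. */
int64_t toy_addmul(int64_t z, int64_t x, int64_t y, int prec)
{
    return toy_round_down(z + x * y, prec);
}

/* The same operation, but with the product rounded before the addition. */
int64_t toy_addmul_two_roundings(int64_t z, int64_t x, int64_t y, int prec)
{
    return toy_round_down(z + toy_round_down(x * y, prec), prec);
}
```

With a precision of 3 bits, $z = -48$ and $x = y = 7$, the exact product is 49, so the single-rounding version returns 1. Rounding the product to 48 first cancels it completely and returns 0.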
|
|
|
|
long fmpr_submul(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

long fmpr_submul_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)

long fmpr_submul_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)

long fmpr_submul_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

Sets $z = z - x \times y$, rounded according to prec and rnd. The
intermediate multiplication is always performed without roundoff. The
precision can be FMPR_PREC_EXACT to perform an exact subtraction, provided that the
result fits in memory.

long fmpr_sqrt(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

long fmpr_sqrt_ui(fmpr_t z, ulong x, long prec, fmpr_rnd_t rnd)

long fmpr_sqrt_fmpz(fmpr_t z, const fmpz_t x, long prec, fmpr_rnd_t rnd)

Sets the output variable to the square root of $x$, rounded according
to prec and rnd. The result is NaN if $x$ is negative.

void fmpr_pow_sloppy_fmpz(fmpr_t y, const fmpr_t b, const fmpz_t e,
    long prec, fmpr_rnd_t rnd)

void fmpr_pow_sloppy_ui(fmpr_t y, const fmpr_t b, ulong e,
    long prec, fmpr_rnd_t rnd)

void fmpr_pow_sloppy_si(fmpr_t y, const fmpr_t b, long e,
    long prec, fmpr_rnd_t rnd)

Sets $y = b^e$, computed without guaranteeing correct (optimal)
rounding, but guaranteeing that the result is a correct upper or lower
bound if the rounding is directional. Currently requires $b \ge 0$.
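The difference between a correct bound and an optimally rounded result can be illustrated with binary exponentiation on integers, rounding up after every multiplication. This is only a toy sketch with hypothetical names: whether the library uses this exact scheme is an assumption. Inputs must be non-negative and all intermediate products must stay below $2^{62}$.

```c
#include <assert.h>
#include <stdint.h>

/* Round the non-negative integer n up so that at most prec significant
   bits remain (toy rounding, requires 0 <= n < 2^62). */
int64_t toy_round_up(int64_t n, int prec)
{
    int64_t unit, q;
    int bits = 0;

    assert(prec >= 1 && n >= 0 && n < ((int64_t) 1 << 62));
    while ((n >> bits) != 0)
        bits++;
    if (bits <= prec)
        return n;
    unit = (int64_t) 1 << (bits - prec);
    q = n / unit;
    if (n % unit != 0)
        q += 1;
    return q * unit;
}

/* b^e by repeated squaring, rounding up after each multiplication.
   Every intermediate value is at least the exact one, so the result is
   an upper bound for b^e, but not necessarily the smallest one. */
int64_t toy_pow_up(int64_t b, unsigned e, int prec)
{
    int64_t result = 1;
    int64_t base = toy_round_up(b, prec);

    while (e != 0) {
        if (e & 1)
            result = toy_round_up(result * base, prec);
        e >>= 1;
        if (e != 0)
            base = toy_round_up(base * base, prec);
    }
    return result;
}
```

With a precision of 3 bits, this computes 384 as an upper bound for $3^5 = 243$, whereas rounding the exact value 243 up to 3 bits gives 256. Both are valid upper bounds, but only the latter is optimal.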
|
|
|
|
|
|
*******************************************************************************

Special functions

*******************************************************************************

long fmpr_log(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

Sets $y$ to $\log(x)$, rounded according to prec and rnd.
The result is NaN if $x$ is negative.
This function is currently implemented using MPFR and does not
support large exponents.

long fmpr_log1p(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

Sets $y$ to $\log(1+x)$, rounded according to prec and rnd. This function
computes an accurate value when $x$ is small.
The result is NaN if $1+x$ is negative.
This function is currently implemented using MPFR and does not
support large exponents.

long fmpr_exp(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

Sets $y$ to $\exp(x)$, rounded according to prec and rnd.
This function is currently implemented using MPFR and does not
support large exponents.

long fmpr_expm1(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

Sets $y$ to $\exp(x)-1$, rounded according to prec and rnd. This function
computes an accurate value when $x$ is small.
This function is currently implemented using MPFR and does not
support large exponents.
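The reason a dedicated $\exp(x)-1$ function is useful for small $x$ can be shown in ordinary double precision. The sketch below uses a truncated Taylor series that is adequate only for small $|x|$; it is a toy illustration with hypothetical names and is unrelated to how the library or MPFR evaluates these functions.

```c
#include <assert.h>

/* Truncated Taylor series for exp(x); toy code for small |x| only. */
double toy_exp_series(double x)
{
    double sum = 1.0, term = 1.0;
    int k;
    for (k = 1; k <= 20; k++) {
        term *= x / k;
        sum += term;
    }
    return sum;
}

/* Naive exp(x) - 1: for tiny x the leading 1 absorbs x when the sum is
   rounded, so the subtraction cancels everything. */
double toy_expm1_naive(double x)
{
    return toy_exp_series(x) - 1.0;
}

/* The same series with the constant term left out, so that no
   cancellation takes place. */
double toy_expm1_series(double x)
{
    double sum = 0.0, term = 1.0;
    int k;
    for (k = 1; k <= 20; k++) {
        term *= x / k;
        sum += term;
    }
    return sum;
}
```

For $x = 10^{-20}$ the naive version returns exactly zero in double precision, while the version without the constant term returns a value close to $10^{-20}$.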