mirror of
https://github.com/vale981/arb
synced 2025-03-06 09:51:39 -05:00
470 lines
18 KiB
ReStructuredText
**fmpr.h** -- binary floating-point numbers
===============================================================================

A variable of type *fmpr_t* holds an arbitrary-precision binary
floating-point number, i.e. a rational number of the form
`x \times 2^y` where `x, y \in \mathbb{Z}` and `x` is odd;
or one of the special values zero, plus infinity, minus infinity,
or NaN (not-a-number).

The component `x` is called the *mantissa*, and `y` is called the
*exponent*. Note that this is just one among many possible
conventions: the mantissa (alternatively *significand*) is
sometimes viewed as a fraction in the interval `[1/2, 1)`, with the
exponent pointing to the position above the top bit rather than the
position of the bottom bit, and with a separate sign.

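The odd-mantissa convention can be illustrated with a small model. The
sketch below is not part of the library: it assumes machine-sized ``long``
fields in place of the arbitrary-precision integers used by *fmpr*, and the
names ``toy_fp`` and ``toy_normalise`` are hypothetical.

```c
#include <assert.h>

/* Toy stand-in for a value man * 2^exp with machine-sized fields.
   (Hypothetical type; the real fmpr type uses big integers.) */
typedef struct { long man; long exp; } toy_fp;

/* Bring (man, exp) to the canonical form with an odd mantissa by moving
   trailing zero bits from the mantissa into the exponent.
   Zero is mapped to man = 0, exp = 0. */
toy_fp toy_normalise(long man, long exp)
{
    toy_fp r;
    if (man == 0) { r.man = 0; r.exp = 0; return r; }
    while (man % 2 == 0) { man /= 2; exp++; }
    assert(man % 2 != 0);   /* the mantissa is now odd */
    r.man = man;
    r.exp = exp;
    return r;
}
```

For instance, ``toy_normalise(12, 0)`` yields the pair (3, 2), since
`12 = 3 \times 2^2`.
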
The conventions for special values largely follow those of the
IEEE floating-point standard. At the moment, there is no support
for negative zero, unsigned infinity, or a NaN with a payload, though
some of these might be added in the future.

An *fmpr* number is exact and has no inherent "accuracy". We
use the term *precision* to denote either the target precision of
an operation, or the bit size of a mantissa (which in general is
unrelated to the "accuracy" of the number: for example, the
floating-point value 1 has a precision of 1 bit in this sense and is
simultaneously an infinitely accurate approximation of the
integer 1 and a 2-bit accurate approximation of
`\sqrt 2 = 1.011010100\ldots_2`).

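To make the "bit size of a mantissa" sense of precision concrete, the
following sketch counts the bits of a machine-integer mantissa. It is an
illustration only; ``toy_mantissa_bits`` is a hypothetical helper, not a
library function.

```c
#include <assert.h>
#include <limits.h>

/* Number of bits in |man|, i.e. the precision of a mantissa in the
   bit-size sense. Returns 0 for man == 0. */
int toy_mantissa_bits(long man)
{
    unsigned long u;
    int bits = 0;
    assert(man != LONG_MIN);   /* keeps the negation below well defined */
    u = (unsigned long) (man < 0 ? -man : man);
    while (u != 0) { u >>= 1; bits++; }
    return bits;
}
```

The value 1 has a 1-bit mantissa, whereas `181 \times 2^{-7} = 1.4140625`,
a closer approximation of `\sqrt 2`, has an 8-bit mantissa.
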
Except where otherwise noted, the output of an operation is the
floating-point number obtained by taking the inputs as exact numbers,
in principle carrying out the operation exactly, and rounding the
resulting real number to the nearest representable floating-point
number whose mantissa has at most the specified number of bits, in
the specified direction of rounding. Some operations are always
or optionally done exactly.


Types, macros and constants
-------------------------------------------------------------------------------

.. type:: fmpr_struct

    An *fmpr_struct* holds a mantissa and an exponent.
    If the mantissa and exponent are sufficiently small, their values are
    stored as immediate values in the *fmpr_struct*; large values are
    represented by pointers to heap-allocated arbitrary-precision integers.
    Currently, both the mantissa and exponent are implemented using
    the FLINT *fmpz* type. Special values are currently encoded
    by the mantissa being set to zero.

.. type:: fmpr_t

    An *fmpr_t* is defined as an array of length one of type
    *fmpr_struct*, permitting an *fmpr_t* to be passed by
    reference.

.. type:: fmpr_rnd_t

    Specifies the rounding mode for the result of an approximate operation.

.. macro:: FMPR_RND_NEAREST

    Specifies that the result of an operation should be rounded to the
    nearest representable number, rounding to an odd mantissa if there is a tie
    between two values. Note: this rounding mode is currently
    not implemented.

.. macro:: FMPR_RND_DOWN

    Specifies that the result of an operation should be rounded to the
    nearest representable number in the direction towards zero.

.. macro:: FMPR_RND_UP

    Specifies that the result of an operation should be rounded to the
    nearest representable number in the direction away from zero.

.. macro:: FMPR_RND_FLOOR

    Specifies that the result of an operation should be rounded to the
    nearest representable number in the direction towards minus infinity.

.. macro:: FMPR_RND_CEIL

    Specifies that the result of an operation should be rounded to the
    nearest representable number in the direction towards plus infinity.

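The four directed rounding modes differ only in how they treat the sign of
the value. The sketch below models them on machine integers by rounding
``n`` to a multiple of `2^{\mathrm{shift}}`. The names ``toy_rnd_t`` and
``toy_round`` are hypothetical and the code is an independent model, not
the library's implementation.

```c
#include <assert.h>

typedef enum { TOY_RND_DOWN, TOY_RND_UP, TOY_RND_FLOOR, TOY_RND_CEIL } toy_rnd_t;

/* Round the integer n to a multiple of 2^shift in the given direction.
   This models discarding the lowest `shift` bits of a mantissa. */
long toy_round(long n, int shift, toy_rnd_t rnd)
{
    long unit, q, r;
    int away;   /* nonzero if the magnitude must grow */
    assert(shift >= 0 && shift < 30);
    assert(n > -(1L << 30) && n < (1L << 30));
    unit = 1L << shift;
    q = n / unit;   /* C division truncates towards zero */
    r = n % unit;   /* remainder has the sign of n, or is zero */
    if (r == 0)
        return n;   /* already representable, no rounding needed */
    switch (rnd) {
        case TOY_RND_DOWN:  away = 0; break;
        case TOY_RND_UP:    away = 1; break;
        case TOY_RND_FLOOR: away = (n < 0); break;
        default:            away = (n > 0); break;   /* TOY_RND_CEIL */
    }
    if (away)
        q += (n > 0) ? 1 : -1;
    return q * unit;
}
```

With ``shift = 2``, the value 11 rounds to 8 (down, floor) or 12 (up, ceil),
while -11 rounds to -8 (down, ceil) or -12 (up, floor).
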
.. macro:: FMPR_PREC_EXACT

    If passed as the precision parameter to a function, indicates that no
    rounding is to be performed. This must only be used when it is known
    that the result of the operation can be represented exactly and fits
    in memory (the typical use case is working with small integer values).
    Note that, for example, adding two numbers whose exponents are far
    apart can easily produce an exact result that is far too large to
    store in memory.

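The size of an exact sum can be estimated from the operands' exponents
and mantissa bit lengths alone. The following sketch gives a simple upper
bound; ``toy_exact_sum_bits`` is a hypothetical helper written for this
illustration, and a caller would have to decide for itself what counts as
"too large".

```c
#include <assert.h>

/* Upper bound for the number of mantissa bits needed to store x + y
   exactly, where x has a bx-bit mantissa and exponent ex, and likewise
   for y. The sum is a multiple of 2^min(ex, ey) and is smaller in
   magnitude than 2^(max(ex + bx, ey + by) + 1). */
long toy_exact_sum_bits(long bx, long ex, long by, long ey)
{
    long top_x = ex + bx, top_y = ey + by;
    long top = (top_x > top_y) ? top_x : top_y;
    long bot = (ex < ey) ? ex : ey;
    assert(bx > 0 && by > 0);
    return top - bot + 1;
}
```

Adding `1` and `2^{1000000}`, both of which have 1-bit mantissas, gives a
bound of 1000002 bits: the exact sum needs a mantissa of about a million
bits even though each input is tiny.
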
Memory management
-------------------------------------------------------------------------------

.. function:: void fmpr_init(fmpr_t x)

    Initializes the variable *x* for use. Its value is set to zero.

.. function:: void fmpr_clear(fmpr_t x)

    Clears the variable *x*, freeing or recycling its allocated memory.


Special values
-------------------------------------------------------------------------------

.. function:: void fmpr_zero(fmpr_t x)

.. function:: void fmpr_one(fmpr_t x)

.. function:: void fmpr_pos_inf(fmpr_t x)

.. function:: void fmpr_neg_inf(fmpr_t x)

.. function:: void fmpr_nan(fmpr_t x)

    Sets *x* respectively to 0, 1, `+\infty`, `-\infty`, NaN.

.. function:: int fmpr_is_zero(const fmpr_t x)

.. function:: int fmpr_is_one(const fmpr_t x)

.. function:: int fmpr_is_pos_inf(const fmpr_t x)

.. function:: int fmpr_is_neg_inf(const fmpr_t x)

.. function:: int fmpr_is_nan(const fmpr_t x)

    Returns nonzero iff *x* respectively equals
    0, 1, `+\infty`, `-\infty`, NaN.

.. function:: int fmpr_is_inf(const fmpr_t x)

    Returns nonzero iff *x* equals either `+\infty` or `-\infty`.

.. function:: int fmpr_is_normal(const fmpr_t x)

    Returns nonzero iff *x* is a finite, nonzero floating-point value, i.e.
    not one of the special values 0, `+\infty`, `-\infty`, NaN.

.. function:: int fmpr_is_special(const fmpr_t x)

    Returns nonzero iff *x* is one of the special values
    0, `+\infty`, `-\infty`, NaN, i.e. not a finite, nonzero
    floating-point value.

Assignment, rounding and conversions
-------------------------------------------------------------------------------

.. function:: long _fmpr_normalise(fmpz_t man, fmpz_t exp, long prec, fmpr_rnd_t rnd)

    Rounds the mantissa and exponent in-place.

.. function:: void fmpr_set(fmpr_t y, const fmpr_t x)

    Sets *y* to a copy of *x*.

.. function:: long fmpr_set_round(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_set_round_fmpz(fmpr_t y, const fmpz_t x, long prec, fmpr_rnd_t rnd)

    Sets *y* to a copy of *x* rounded in the direction specified by *rnd* to the
    number of bits specified by *prec*.

.. function:: void fmpr_set_error_result(fmpr_t err, const fmpr_t result, long rret)

    Given the return value *rret* and output variable *result* from a
    function performing a rounding (e.g. *fmpr_set_round* or *fmpr_add*), sets
    *err* to a bound for the absolute error.

.. function:: void fmpr_add_error_result(fmpr_t err, const fmpr_t err_in, const fmpr_t result, long rret, long prec, fmpr_rnd_t rnd)

    Like *fmpr_set_error_result*, but adds *err_in* to the error.

.. function:: int fmpr_get_mpfr(mpfr_t x, const fmpr_t y, mpfr_rnd_t rnd)

    Sets the MPFR variable *x* to the value of *y*. If the
    precision of *x* is too small to allow *y* to be represented
    exactly, it is rounded in the specified MPFR rounding mode.
    The return value indicates the direction of rounding,
    following the standard convention of the MPFR library.

.. function:: void fmpr_set_mpfr(fmpr_t x, const mpfr_t y)

    Sets *x* to the exact value of the MPFR variable *y*.

.. function:: void fmpr_set_ui(fmpr_t x, ulong c)

.. function:: void fmpr_set_si(fmpr_t x, long c)

.. function:: void fmpr_set_fmpz(fmpr_t x, const fmpz_t c)

    Sets *x* exactly to the integer *c*.

.. function:: void fmpr_get_fmpq(fmpq_t y, const fmpr_t x)

    Sets *y* to the exact value of *x*. The result is undefined
    if *x* is not a finite fraction.

.. function:: long fmpr_set_fmpq(fmpr_t x, const fmpq_t y, long prec, fmpr_rnd_t rnd)

    Sets *x* to the value of *y*, rounded according to *prec* and *rnd*.

.. function:: void fmpr_set_fmpz_2exp(fmpr_t x, const fmpz_t man, const fmpz_t exp)

.. function:: void fmpr_set_si_2exp_si(fmpr_t x, long man, long exp)

.. function:: void fmpr_set_ui_2exp_si(fmpr_t x, ulong man, long exp)

    Sets *x* to `\mathrm{man} \times 2^{\mathrm{exp}}`.

.. function:: long fmpr_set_round_fmpz_2exp(fmpr_t y, const fmpz_t x, const fmpz_t exp, long prec, fmpr_rnd_t rnd)

    Sets *y* to `x \times 2^{\mathrm{exp}}`, rounded according
    to *prec* and *rnd*.

.. function:: void fmpr_get_fmpz_2exp(fmpz_t man, fmpz_t exp, const fmpr_t x)

    Sets *man* and *exp* to the unique integers such that
    `x = \mathrm{man} \times 2^{\mathrm{exp}}` and *man* is odd,
    provided that *x* is a nonzero finite fraction.
    If *x* is zero, both *man* and *exp* are set to zero. If *x* is
    infinite or NaN, the result is undefined.

.. function:: int fmpr_get_fmpz_fixed_fmpz(fmpz_t y, const fmpr_t x, const fmpz_t e)

.. function:: int fmpr_get_fmpz_fixed_si(fmpz_t y, const fmpr_t x, long e)

    Converts *x* to a mantissa with predetermined exponent, i.e. computes
    an integer *y* such that `y \times 2^e \approx x`, truncating if necessary.
    Returns 0 if exact and 1 if truncation occurred.


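The fixed-exponent conversion can be modelled on machine integers as
follows. This is a sketch under stated assumptions: ``toy_get_fixed`` is a
hypothetical function, the inputs are small enough to avoid overflow, and
truncation is taken to mean rounding towards zero, which the text above
does not spell out.

```c
#include <assert.h>

/* Model of conversion to a fixed exponent: given x = man * 2^exp, store
   an integer y with y * 2^e ~ x. Returns 0 if the conversion is exact
   and 1 if bits were discarded. Truncation towards zero is an assumption. */
int toy_get_fixed(long long *y, long long man, long exp, long e)
{
    long shift = exp - e;
    assert(man > -(1LL << 31) && man < (1LL << 31));
    assert(shift > -31 && shift < 31);
    if (shift >= 0) {
        *y = man * (1LL << shift);   /* exact: only appends zero bits */
        return 0;
    }
    *y = man / (1LL << -shift);      /* C division truncates towards zero */
    return man % (1LL << -shift) != 0;
}
```

For `x = 5 \times 2^{-1} = 2.5` and `e = 0` the model stores `y = 2` and
reports truncation, whereas `x = 3 \times 2^2` with `e = 1` gives `y = 6`
exactly.
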
Comparisons
-------------------------------------------------------------------------------

.. function:: int fmpr_equal(const fmpr_t x, const fmpr_t y)

    Returns nonzero iff *x* and *y* are exactly equal. This function does
    not treat NaN specially, i.e. NaN compares as equal to itself.

.. function:: int fmpr_cmp(const fmpr_t x, const fmpr_t y)

    Returns negative, zero, or positive, depending on whether *x* is
    respectively smaller than, equal to, or greater than *y*.
    Comparison with NaN is undefined.

.. function:: int fmpr_cmpabs(const fmpr_t x, const fmpr_t y)

    Compares the absolute values of *x* and *y*.

.. function:: int fmpr_cmp_2exp_si(const fmpr_t x, long e)

.. function:: int fmpr_cmpabs_2exp_si(const fmpr_t x, long e)

    Compares *x* (respectively its absolute value) with `2^e`.

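With an odd mantissa, a comparison against a power of two needs only the
exponent and the bit length of the mantissa. The sketch below shows the
idea on machine integers; ``toy_cmp_2exp`` is a hypothetical name and the
code is not taken from the library.

```c
#include <assert.h>

/* Compare x = man * 2^exp (man odd, or zero) with 2^e using only the
   exponent and the bit length of the mantissa. Returns -1, 0 or 1. */
int toy_cmp_2exp(long man, long exp, long e)
{
    long top = exp - 1;   /* becomes the index of the top bit of x */
    unsigned long u;
    if (man <= 0)
        return -1;        /* zero and negative values are below 2^e */
    assert(man % 2 == 1);
    for (u = (unsigned long) man; u != 0; u >>= 1)
        top++;
    if (man == 1)         /* x is itself a power of two */
        return (exp > e) - (exp < e);
    /* otherwise 2^top < x < 2^(top+1), so x never equals 2^e */
    return (top >= e) ? 1 : -1;
}
```

For example, `3 \times 2^{-1} = 1.5` compares as greater than `2^0`, and
`5` compares as smaller than `2^3`.
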
.. function:: int fmpr_sgn(const fmpr_t x)

    Returns `-1`, `0` or `+1` according to the sign of *x*. The sign
    of NaN is undefined.

Random number generation
-------------------------------------------------------------------------------

.. function:: void fmpr_randtest(fmpr_t x, flint_rand_t state, long bits, long mag_bits)

    Generates a finite random number whose mantissa has precision at most
    *bits* and whose exponent has at most *mag_bits* bits. The
    values are distributed non-uniformly: special bit patterns are generated
    with high probability in order to allow the test code to exercise corner
    cases.

.. function:: void fmpr_randtest_not_zero(fmpr_t x, flint_rand_t state, long bits, long mag_bits)

    Identical to *fmpr_randtest*, except that zero is never produced
    as an output.

.. function:: void fmpr_randtest_special(fmpr_t x, flint_rand_t state, long bits, long mag_bits)

    Identical to *fmpr_randtest*, except that the output occasionally
    is set to an infinity or NaN.


Input and output
-------------------------------------------------------------------------------

.. function:: void fmpr_print(const fmpr_t x)

    Prints the mantissa and exponent of *x* as integers, precisely showing
    the internal representation.

.. function:: void fmpr_printd(const fmpr_t x, long digits)

    Prints *x* as a decimal floating-point number, rounding to the specified
    number of digits. This function is currently implemented using MPFR,
    and does not support large exponents.


Arithmetic
-------------------------------------------------------------------------------

.. function:: void fmpr_neg(fmpr_t y, const fmpr_t x)

    Sets *y* to the negation of *x*.

.. function:: long fmpr_neg_round(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

    Sets *y* to the negation of *x*, rounding the result.

.. function:: void fmpr_abs(fmpr_t y, const fmpr_t x)

    Sets *y* to the absolute value of *x*.

.. function:: long fmpr_add(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_add_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_add_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_add_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

    Sets `z = x + y`, rounded according to *prec* and *rnd*. The precision
    can be *FMPR_PREC_EXACT* to perform an exact addition, provided that the
    result fits in memory.

.. function:: long _fmpr_add_eps(fmpr_t z, const fmpr_t x, int sign, long prec, fmpr_rnd_t rnd)

    Sets *z* to the value that results by adding an infinitesimal quantity
    of the given sign to *x*, and rounding. The result is undefined
    if *x* is zero.

.. function:: long fmpr_sub(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_sub_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_sub_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_sub_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

    Sets `z = x - y`, rounded according to *prec* and *rnd*. The precision
    can be *FMPR_PREC_EXACT* to perform an exact subtraction, provided that the
    result fits in memory.

.. function:: long fmpr_mul(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_mul_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_mul_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_mul_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

    Sets `z = x \times y`, rounded according to *prec* and *rnd*. The precision
    can be *FMPR_PREC_EXACT* to perform an exact multiplication, provided that the
    result fits in memory.

.. function:: void fmpr_mul_2exp_si(fmpr_t y, const fmpr_t x, long e)

.. function:: void fmpr_mul_2exp_fmpz(fmpr_t y, const fmpr_t x, const fmpz_t e)

    Sets *y* to *x* multiplied by `2^e` without rounding.

.. function:: long fmpr_div(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_div_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_ui_div(fmpr_t z, ulong x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_div_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_si_div(fmpr_t z, long x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_div_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_fmpz_div(fmpr_t z, const fmpz_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_fmpz_div_fmpz(fmpr_t z, const fmpz_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

    Sets `z = x / y`, rounded according to *prec* and *rnd*. If *y* is zero,
    *z* is set to NaN.

.. function:: long fmpr_addmul(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_addmul_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_addmul_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_addmul_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

    Sets `z = z + x \times y`, rounded according to *prec* and *rnd*. The
    intermediate multiplication is always performed without roundoff. The
    precision can be *FMPR_PREC_EXACT* to perform an exact addition, provided
    that the result fits in memory.

.. function:: long fmpr_submul(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_submul_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_submul_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_submul_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd)

    Sets `z = z - x \times y`, rounded according to *prec* and *rnd*. The
    intermediate multiplication is always performed without roundoff. The
    precision can be *FMPR_PREC_EXACT* to perform an exact subtraction, provided
    that the result fits in memory.

.. function:: long fmpr_sqrt(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_sqrt_ui(fmpr_t z, ulong x, long prec, fmpr_rnd_t rnd)

.. function:: long fmpr_sqrt_fmpz(fmpr_t z, const fmpz_t x, long prec, fmpr_rnd_t rnd)

    Sets the output variable (*y* or *z*) to the square root of *x*,
    rounded according to *prec* and *rnd*.
    The result is NaN if *x* is negative.

.. function:: void fmpr_pow_sloppy_fmpz(fmpr_t y, const fmpr_t b, const fmpz_t e, long prec, fmpr_rnd_t rnd)

.. function:: void fmpr_pow_sloppy_ui(fmpr_t y, const fmpr_t b, ulong e, long prec, fmpr_rnd_t rnd)

.. function:: void fmpr_pow_sloppy_si(fmpr_t y, const fmpr_t b, long e, long prec, fmpr_rnd_t rnd)

    Sets `y = b^e`, computed without guaranteeing correct (optimal)
    rounding, but guaranteeing that the result is a correct upper or lower
    bound if the rounding is directional. Currently requires `b \ge 0`.


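One way such a bound can arise is sketched below: for `b \ge 0`, binary
exponentiation in which every intermediate product is rounded upwards
yields an upper bound for `b^e`, although not necessarily the tightest
one. This is an illustrative model on 64-bit integers with hypothetical
names (``toy_fp``, ``toy_round_up``, ``toy_pow_upper``); it is not claimed
to be the algorithm used by the library.

```c
#include <assert.h>

/* Toy value man * 2^exp with a nonnegative machine-sized mantissa. */
typedef struct { unsigned long long man; long exp; } toy_fp;

/* Round man * 2^exp upwards until man fits in prec bits. Halving with
   a ceiling one bit at a time equals a single ceiling division. */
toy_fp toy_round_up(toy_fp x, int prec)
{
    while ((x.man >> prec) != 0) {
        x.man = (x.man >> 1) + (x.man & 1ULL);
        x.exp++;
    }
    return x;
}

/* Upper bound for b^e (b >= 0) by binary exponentiation, rounding every
   intermediate product upwards to prec bits. */
toy_fp toy_pow_upper(toy_fp b, unsigned long e, int prec)
{
    toy_fp y = { 1, 0 };
    assert(prec >= 1 && prec <= 31);   /* products then fit in 64 bits */
    b = toy_round_up(b, prec);
    while (e != 0) {
        if (e & 1UL) {
            y.man *= b.man;
            y.exp += b.exp;
            y = toy_round_up(y, prec);
        }
        e >>= 1;
        if (e != 0) {                  /* square the base for the next bit */
            b.man *= b.man;
            b.exp += b.exp;
            b = toy_round_up(b, prec);
        }
    }
    return y;
}
```

With `b = 3`, `e = 5` and a 4-bit mantissa the model returns
`9 \times 2^5 = 288`, which bounds `3^5 = 243` from above but is larger
than the correctly rounded 4-bit value `2^8 = 256`.
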
Special functions
-------------------------------------------------------------------------------

.. function:: long fmpr_log(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

    Sets *y* to `\log(x)`, rounded according to *prec* and *rnd*.
    The result is NaN if *x* is negative.
    This function is currently implemented using MPFR and does not
    support large exponents.

.. function:: long fmpr_log1p(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

    Sets *y* to `\log(1+x)`, rounded according to *prec* and *rnd*.
    This function computes an accurate value when *x* is small.
    The result is NaN if `1+x` is negative.
    This function is currently implemented using MPFR and does not
    support large exponents.

.. function:: long fmpr_exp(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

    Sets *y* to `\exp(x)`, rounded according to *prec* and *rnd*.
    This function is currently implemented using MPFR and does not
    support large exponents.

.. function:: long fmpr_expm1(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd)

    Sets *y* to `\exp(x)-1`, rounded according to *prec* and *rnd*.
    This function computes an accurate value when *x* is small.
    This function is currently implemented using MPFR and does not
    support large exponents.