fmpr.h -- binary floating-point numbers =============================================================================== A variable of type *fmpr_t* holds an arbitrary-precision binary floating-point number, i.e. a rational number of the form `x \times 2^y` where `x, y \in \mathbb{Z}` and `x` is odd; or one of the special values zero, plus infinity, minus infinity, or NaN (not-a-number). The component `x` is called the *mantissa*, and `y` is called the *exponent*. Note that this is just one among many possible conventions: the mantissa (alternatively *significand*) is sometimes viewed as a fraction in the interval `[1/2, 1)`, with the exponent pointing to the position above the top bit rather than the position of the bottom bit, and with a separate sign. The conventions for special values largely follow those of the IEEE floating-point standard. At the moment, there is no support for negative zero, unsigned infinity, or a NaN with a payload, though some these might be added in the future. An *fmpr* number is exact and has no inherent "accuracy". We use the term *precision* to denote either the target precision of an operation, or the bit size of a mantissa (which in general is unrelated to the "accuracy" of the number: for example, the floating-point value 1 has a precision of 1 bit in this sense and is simultaneously an infinitely accurate approximation of the integer 1 and a 2-bit accurate approximation of `\sqrt 2 = 1.011010100\ldots_2`). Except where otherwise noted, the output of an operation is the floating-point number obtained by taking the inputs as exact numbers, in principle carrying out the operation exactly, and rounding the resulting real number to the nearest representable floating-point number whose mantissa has at most the specified number of bits, in the specified direction of rounding. Some operations are always or optionally done exactly. Types, macros and constants ------------------------------------------------------------------------------- .. type:: fmpr_struct An *fmpr_struct* holds a mantissa and an exponent. If the mantissa and exponent are sufficiently small, their values are stored as immediate values in the *fmpr_struct*; large values are represented by pointers to heap-allocated arbitrary-precision integers. Currently, both the mantissa and exponent are implemented using the FLINT *fmpz* type. Special values are currently encoded by the mantissa being set to zero. .. type:: fmpr_t An *fmpr_t* is defined as an array of length one of type *fmpr_struct*, permitting an *fmpr_t* to be passed by reference. .. type:: fmpr_rnd_t Specifies the rounding mode for the result of an approximate operation. .. macro:: FMPR_RND_NEAREST Specifies that the result of an operation should be rounded to the nearest representable number, rounding to an odd mantissa if there is a tie between two values. Note: the code for this rounding mode is currently not implemented. .. macro:: FMPR_RND_DOWN Specifies that the result of an operation should be rounded to the nearest representable number in the direction towards zero. .. macro:: FMPR_RND_UP Specifies that the result of an operation should be rounded to the nearest representable number in the direction away from zero. .. macro:: FMPR_RND_FLOOR Specifies that the result of an operation should be rounded to the nearest representable number in the direction towards minus infinity. .. macro:: FMPR_RND_CEIL Specifies that the result of an operation should be rounded to the nearest representable number in the direction towards plus infinity. .. macro:: FMPR_PREC_EXACT If passed as the precision parameter to a function, indicates that no rounding is to be performed. This must only be used when it is known that the result of the operation can be represented exactly and fits in memory (the typical use case is working with values small integers). Note that, for example, adding two numbers whose exponents are far apart can easily produce an exact result that is far too large to store in memory. Memory management ------------------------------------------------------------------------------- .. function:: void fmpr_init(fmpr_t x) Initializes the variable *x* for use. Its value is set to zero. .. function:: void fmpr_clear(fmpr_t x) Clears the variable *x*, freeing or recycling its allocated memory. Special values ------------------------------------------------------------------------------- .. function:: void fmpr_zero(fmpr_t x) .. function:: void fmpr_one(fmpr_t x) .. function:: void fmpr_pos_inf(fmpr_t x) .. function:: void fmpr_neg_inf(fmpr_t x) .. function:: void fmpr_nan(fmpr_t x) Sets *x* respectively to 0, 1, `+\infty`, `-\infty`, NaN. .. function:: int fmpr_is_zero(const fmpr_t x) .. function:: int fmpr_is_one(const fmpr_t x) .. function:: int fmpr_is_pos_inf(const fmpr_t x) .. function:: int fmpr_is_neg_inf(const fmpr_t x) .. function:: int fmpr_is_nan(const fmpr_t x) Returns nonzero iff *x* respectively equals 0, 1, `+\infty`, `-\infty`, NaN. .. function:: int fmpr_is_inf(const fmpr_t x) Returns nonzero iff *x* equals either `+\infty` or `-\infty`. .. function:: int fmpr_is_normal(const fmpr_t x) Returns nonzero iff *x* is a finite, nonzero floating-point value, i.e. not one of the special values 0, `+\infty`, `-\infty`, NaN. .. function:: int fmpr_is_special(const fmpr_t x) Returns nonzero iff *x* is one of the special values 0, `+\infty`, `-\infty`, NaN, i.e. not a finite, nonzero floating-point value. Assignment and rounding ------------------------------------------------------------------------------- .. function:: long _fmpr_normalise(fmpz_t man, fmpz_t exp, long prec, fmpr_rnd_t rnd) Rounds the mantissa and exponent in-place. .. function:: void fmpr_set(fmpr_t y, const fmpr_t x) Sets *y* to a copy of *x*. .. function:: long fmpr_set_round(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd) Sets *y* to a copy of *x* rounded in the direction specified by rnd to the number of bits specified by prec. .. function:: void fmpr_set_error_result(fmpr_t err, const fmpr_t result, long rret) Given the return value *rret* and output variable *result* from a function performing a rounding (e.g. *fmpr_set_round* or *fmpr_add*), sets *err* to a bound for the absolute error. .. function:: void fmpr_add_error_result(fmpr_t err, const fmpr_t err_in, const fmpr_t result, long rret, long prec, fmpr_rnd_t rnd) Like *fmpr_set_error_result*, but adds *err_in* to the error. Comparisons ------------------------------------------------------------------------------- .. function:: int fmpr_equal(const fmpr_t x, const fmpr_t y) Returns nonzero iff *x* and *y* are exactly equal. This function does not treat NaN specially, i.e. NaN compares as equal to itself. .. function:: int fmpr_cmp(const fmpr_t x, const fmpr_t y) Returns negative, zero, or positive, depending on whether *x* is respectively smaller, equal, or greater compared to *y*. Comparison with NaN is undefined. .. function:: int fmpr_cmpabs(const fmpr_t x, const fmpr_t y) Compares the absolute values of *x* and *y*. .. function:: int fmpr_sgn(const fmpr_t x) Returns `-1`, `0` or `+1` according to the sign of *x*. The sign of NaN is undefined. Random number generation ------------------------------------------------------------------------------- .. function:: void fmpr_randtest(fmpr_t x, flint_rand_t state, long bits, long mag_bits) Generates a finite random number whose mantissa has precision at most *bits* and whose exponent has at most *mag_bits* bits. The values are distributed non-uniformly: special bit patterns are generated with high probability in order to allow the test code to exercise corner cases. .. function:: void fmpr_randtest_not_zero(fmpr_t x, flint_rand_t state, long bits, long mag_bits) Identical to *fmpr_randtest*, except that zero is never produced as an output. .. function:: void fmpr_randtest_special(fmpr_t x, flint_rand_t state, long bits, long mag_bits) Indentical to *fmpr_randtest*, except that the output occasionally is set to an infinity or NaN. Conversions ------------------------------------------------------------------------------- .. function:: int fmpr_get_mpfr(mpfr_t x, const fmpr_t y, mpfr_rnd_t rnd) Sets the MPFR variable *x* to the value of *y*. If the precision of *x* is too small to allow *y* to be represented exactly, it is rounded in the specified MPFR rounding mode. The return value indicates the direction of rounding, following the standard convention of the MPFR library. .. function:: void fmpr_set_mpfr(fmpr_t x, const mpfr_t y) Sets *x* to the exact value of the MPFR variable *y*. .. function:: void fmpr_set_ui(fmpr_t x, ulong c) .. function:: void fmpr_set_si(fmpr_t x, long c) .. function:: void fmpr_set_fmpz(fmpr_t x, const fmpz_t c) Sets *x* exactly to the integer *c*. .. function:: void fmpr_get_fmpq(fmpq_t y, const fmpr_t x); Sets *y* to the exact value of *x*. The result is undefined if *x* is not a finite fraction. .. function:: long fmpr_set_fmpq(fmpr_t x, const fmpq_t y, long prec, fmpr_rnd_t rnd); Sets *x* to the value of *y*, rounded according to *prec* and *rnd*. .. function:: void fmpr_set_fmpz_2exp(fmpr_t x, const fmpz_t man, const fmpz_t exp) .. function:: void fmpr_set_si_2exp_si(fmpr_t x, long man, long exp) .. function:: void fmpr_set_ui_2exp_si(fmpr_t x, ulong man, long exp) Sets *x* to `\mathrm{man} \times 2^{\mathrm{exp}}`. .. function:: void fmpr_get_fmpz_2exp(fmpz_t man, fmpz_t exp, const fmpr_t x); Sets *man* and *exp* to the unique integers such that `x = \mathrm{man} \times 2^{\mathrm{exp}}` and *man* is odd, provided that *x* is a nonzero finite fraction. If *x* is zero, both *man* and *exp* are set to zero. If *x* is infinite or NaN, the result is undefined. .. function:: int fmpr_get_fmpz_fixed_fmpz(fmpz_t y, const fmpr_t x, const fmpz_t e) .. function:: int fmpr_get_fmpz_fixed_si(fmpz_t y, const fmpr_t x, long e) Converts *x* to a mantissa with predetermined exponent, i.e. computes an integer *y* such that `y \times 2^e \approx x`, truncating if necessary. Returns 0 if exact and 1 if truncation occurred. Input and output ------------------------------------------------------------------------------- .. function:: void fmpr_print(const fmpr_t x) Prints the mantissa and exponent of *x* as integers, precisely showing the internal representation. .. function:: void fmpr_printd(const fmpr_t x, long digits) Prints *x* as a decimal floating-point number, rounding to the specified number of digits. This function is currently implemented using MPFR, and does not support large exponents. Arithmetic ------------------------------------------------------------------------------- .. function:: void fmpr_neg(fmpr_t y, const fmpr_t x) Sets *y* to the negation of *x*. .. function:: long fmpr_neg_round(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd) Sets *y* to the negation of *x*, rounding the result. .. function:: void fmpr_abs(fmpr_t y, const fmpr_t x) Sets *y* to the absolute value of *x*. .. function:: long fmpr_add(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_add_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_add_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_add_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd) Sets `z = x + y`, rounded according to *prec* and *rnd*. The precision can be *FMPR_PREC_EXACT* to perform an exact addition, provided that the result fits in memory. .. function:: long _fmpr_add_eps(fmpr_t z, const fmpr_t x, int sign, long prec, fmpr_rnd_t rnd) Sets *z* to the value that results by adding an infinitesimal quantity of the given sign to *x*, and rounding. The result is undefined if *x* is zero. .. function:: long fmpr_sub(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_sub_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_sub_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_sub_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd) Sets `z = x - y`, rounded according to *prec* and *rnd*. The precision can be *FMPR_PREC_EXACT* to perform an exact addition, provided that the result fits in memory. .. function:: long fmpr_mul(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_mul_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_mul_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_mul_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd) Sets `z = x \times y`, rounded according to prec and rnd. The precision can be *FMPR_PREC_EXACT* to perform an exact multiplication, provided that the result fits in memory. .. function:: void fmpr_mul_2exp_si(fmpr_t y, const fmpr_t x, long e) .. function:: void fmpr_mul_2exp_fmpz(fmpr_t y, const fmpr_t x, const fmpz_t e) Sets *y* to *x* multiplied by `2^e` without rounding. .. function:: long fmpr_div(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_div_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_ui_div(fmpr_t z, ulong x, const fmpr_t y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_div_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_si_div(fmpr_t z, long x, const fmpr_t y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_div_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_fmpz_div(fmpr_t z, const fmpz_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_fmpz_div_fmpz(fmpr_t z, const fmpz_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd) Sets `z = x / y`, rounded according to *prec* and *rnd*. If *y* is zero, *z* is set to NaN. .. function:: long fmpr_addmul(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_addmul_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_addmul_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_addmul_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd) Sets `z = z + x \times y`, rounded according to *prec* and *rnd*. The intermediate multiplication is always performed without roundoff. The precision can be *FMPR_PREC_EXACT* to perform an exact addition, provided that the result fits in memory. .. function:: long fmpr_submul(fmpr_t z, const fmpr_t x, const fmpr_t y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_submul_ui(fmpr_t z, const fmpr_t x, ulong y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_submul_si(fmpr_t z, const fmpr_t x, long y, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_submul_fmpz(fmpr_t z, const fmpr_t x, const fmpz_t y, long prec, fmpr_rnd_t rnd) Sets `z = z - x \times y`, rounded according to *prec* and *rnd*. The intermediate multiplication is always performed without roundoff. The precision can be *FMPR_PREC_EXACT* to perform an exact subtraction, provided that the result fits in memory. .. function:: long fmpr_sqrt(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_sqrt_ui(fmpr_t z, ulong x, long prec, fmpr_rnd_t rnd) .. function:: long fmpr_sqrt_fmpz(fmpr_t z, const fmpz_t x, long prec, fmpr_rnd_t rnd) Sets *z* to the square root of *x*, rounded according to *prec* and *rnd*. The result is NaN if *x* is negative. .. function:: void fmpr_pow_sloppy_fmpz(fmpr_t y, const fmpr_t b, const fmpz_t e, long prec, fmpr_rnd_t rnd) .. function:: void fmpr_pow_sloppy_ui(fmpr_t y, const fmpr_t b, ulong e, long prec, fmpr_rnd_t rnd) .. function:: void fmpr_pow_sloppy_si(fmpr_t y, const fmpr_t b, long e, long prec, fmpr_rnd_t rnd) Sets `y = b^e`, computed using without guaranteeing correct (optimal) rounding, but guaranteeing that the result is a correct upper or lower bound if the rounding is directional. Currently requires `b \ge 0`. Special functions ------------------------------------------------------------------------------- .. function:: long fmpr_log(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd) Sets *z* to `\log(x)`, rounded according to *prec* and *rnd*. The result is NaN if *x* is negative. This function is currently implemented using MPFR and does not support large exponents. .. function:: long fmpr_log1p(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd) Sets *z* to `\log(1+x)`, rounded according to *prec* and *rnd*. This function computes an accurate value when *x* is small. The result is NaN if `1+x` is negative. This function is currently implemented using MPFR and does not support large exponents. .. function:: long fmpr_exp(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd) Sets *z* to `\exp(x)`, rounded according to *prec* and *rnd*. This function is currently implemented using MPFR and does not support large exponents. .. function:: long fmpr_expm1(fmpr_t y, const fmpr_t x, long prec, fmpr_rnd_t rnd) Sets *z* to `\exp(x)-1`, rounded according to *prec* and *rnd*. This function computes an accurate value when *x* is small. This function is currently implemented using MPFR and does not support large exponents.