mirror of
https://github.com/vale981/arb
synced 2025-03-05 09:21:38 -05:00
add cmpabs, fix bugs, and simplify division error bounding
This commit is contained in:
parent
9c986ebe9e
commit
c2a9b8b3e8
8 changed files with 186 additions and 37 deletions
18
doc/doc.html
18
doc/doc.html
|
@ -40,7 +40,7 @@ MathJax.Hub.Config({
|
|||
|
||||
<h1>Arb documentation</h1>
|
||||
|
||||
<p><i>Last updated: 2012-09-13 18:07:04 CET</i></p>
|
||||
<p><i>Last updated: 2012-09-14 12:37:05 CET</i></p>
|
||||
|
||||
<h2>Contents</h2>
|
||||
|
||||
|
@ -165,7 +165,7 @@ The output should be something like the following:
|
|||
<hr />
|
||||
<h2><a name="fmpr-h--floating-point-arithmetic--">fmpr.h (floating-point arithmetic)</a></h2>
|
||||
<p> A variable of type <tt>fmpr_t</tt> holds an arbitrary-precision binary floating-point number, i.e. a rational number of the form $x \times 2^y$ where $x, y \in \mathbb{Z}$ and $x$ is odd; or one of the special values zero, plus infinity, minus infinity, or NaN (not-a-number).</p>
|
||||
<p> The component $x$ is called the <i>mantissa</i>, and $y$ is called the <i>exponent</i>. Note that this is just one among many possible conventions: the mantissa (alternatively <i>significand</i>) is sometimes viewed as a fraction in the interval $[1/2, 1)$, with the exponent pointing to the position above the top bit rather than the position of the bottom bit, and it is sometimes customary to separate the <i>sign</i> from the mantissa.</p>
|
||||
<p> The component $x$ is called the <i>mantissa</i>, and $y$ is called the <i>exponent</i>. Note that this is just one among many possible conventions: the mantissa (alternatively <i>significand</i>) is sometimes viewed as a fraction in the interval $[1/2, 1)$, with the exponent pointing to the position above the top bit rather than the position of the bottom bit, and with a separate sign.</p>
|
||||
<p> The conventions for special values largely follow those of the IEEE floating-point standard. At the moment, there is no support for negative zero, unsigned infinity, or a NaN with a payload, though some these might be added in the future.</p>
|
||||
<p> An <tt>fmpr</tt> number is exact and has no inherent "accuracy". We use the term <i>precision</i> to denote either the target precision of an operation, or the bit size of a mantissa (which in general is unrelated to the "accuracy" of the number: for example, the floating-point value 1 has a precision of 1 bit in this sense and is simultaneously an infinitely accurate approximation of the integer 1 and a 2-bit accurate approximation of $\sqrt 2 = 1.011010100\ldots_2$).</p>
|
||||
<p> Except where otherwise noted, the output of an operation is the floating-point number obtained by taking the inputs as exact numbers, in principle carrying out the operation exactly, and rounding the resulting real number to the nearest representable floating-point number whose mantissa has at most the specified number of bits, in the specified direction of rounding. Some operations are always or optionally done exactly.</p>
|
||||
|
@ -185,7 +185,7 @@ The output should be something like the following:
|
|||
</dd>
|
||||
<dt>FMPR_RND_NEAREST</dt>
|
||||
<dd>
|
||||
<p> Specifies that the result of an operation should be rounded to the nearest representable number, rounding to an odd bit if there is a tie between two values. Note: the code for this rounding mode is currently not implemented.</p>
|
||||
<p> Specifies that the result of an operation should be rounded to the nearest representable number, rounding to an odd mantissa if there is a tie between two values. Note: the code for this rounding mode is currently not implemented.</p>
|
||||
</dd>
|
||||
<dt>FMPR_RND_DOWN</dt>
|
||||
<dd>
|
||||
|
@ -283,6 +283,10 @@ The output should be something like the following:
|
|||
<dd>
|
||||
<p> Returns negative, zero, or positive, depending on whether x is respectively smaller, equal, or greater compared to y. Comparison with NaN is undefined.</p>
|
||||
</dd>
|
||||
<dt>int fmpr_cmpabs(const fmpr_t x, const fmpr_t y)</dt>
|
||||
<dd>
|
||||
<p> Compares the absolute values of x and y.</p>
|
||||
</dd>
|
||||
<dt>int fmpr_sgn(const fmpr_t x)</dt>
|
||||
<dd>
|
||||
<p> Returns $-1$, $0$ or $+1$ according to the sign of x. The sign of NaN is undefined.</p>
|
||||
|
@ -627,7 +631,7 @@ The output should be something like the following:
|
|||
<dt>void fmprb_fmpz_div_fmpz(fmprb_t y, const fmpz_t num, const fmpz_t den, long prec)</dt>
|
||||
<dt>void fmprb_ui_div(fmprb_t z, ulong x, const fmprb_t y, long prec);</dt>
|
||||
<dd>
|
||||
<p> Sets $z = x / y$, rounded to prec bits. If $y$ contains zero, $z$ is set to $0 \pm +\infty$.</p>
|
||||
<p> Sets $z = x / y$, rounded to prec bits. If $y$ contains zero, $z$ is set to $0 \pm \infty$. Otherwise, error propagation uses the rule $$\left| \frac{x}{y} - \frac{x+\xi_1 a}{y+\xi_2 b} \right| = \left|\frac{x \xi_2 b - y \xi_1 a}{y (y+\xi_2 b)}\right| \le \frac{|xb|+|ya|}{|y| (|y|-b)}$$ where $-1 \le \xi_1, \xi_2 \le 1$, and where the triangle inequality has been applied to the numerator and the reverse triangle inequality has been applied to the denominator.</p>
|
||||
</dd>
|
||||
<dt>void fmprb_div_2expm1_ui(fmprb_t y, const fmprb_t x, ulong n, long prec);</dt>
|
||||
<dd>
|
||||
|
@ -658,7 +662,7 @@ The output should be something like the following:
|
|||
<dt>void fmprb_ui_pow_ui(fmprb_t y, ulong b, ulong e, long prec)</dt>
|
||||
<dt>void fmprb_si_pow_ui(fmprb_t y, long b, ulong e, long prec)</dt>
|
||||
<dd>
|
||||
<p> Sets $y = b^e$.</p>
|
||||
<p> Sets $y = b^e$ using binary exponentiation. Provided that $b$ and $e$ are small enough and the exponent is positive, the exact power can be computed using FMPR_PREC_EXACT.</p>
|
||||
</dd>
|
||||
</dl>
|
||||
<a name="fmprb-h--real-ball-arithmetic--Special-functions"><h3>Special functions</h3></a>
|
||||
|
@ -667,11 +671,11 @@ The output should be something like the following:
|
|||
<dt>void fmprb_log_ui(fmprb_t z, ulong x, long prec)</dt>
|
||||
<dt>void fmprb_log_fmpz(fmprb_t z, const fmpz_t x, long prec)</dt>
|
||||
<dd>
|
||||
<p> Sets $z = \log(x)$.</p>
|
||||
<p> Sets $z = \log(x)$. Error propagation is done using the following rule: assuming $m > r \ge 0$, the error is largest at $m - r$, and we have $\log(m) - \log(m-r) = \log(1 + r/(m-r))$. The last expression is calculated accurately for small radii via <tt>fmpr_log1p</tt>. An input containing zero currently raises an exception.</p>
|
||||
</dd>
|
||||
<dt>void fmprb_exp(fmprb_t z, const fmprb_t x, long prec)</dt>
|
||||
<dd>
|
||||
<p> Sets $z = \exp(x)$.</p>
|
||||
<p> Sets $z = \exp(x)$. Error propagation is done using the following rule: the error is largest at $m + r$, and we have $\exp(m+r) - \exp(m) = \exp(m) (\exp(r)-1)$. The last expression is calculated accurately for small radii via <tt>fmpr_expm1</tt>.</p>
|
||||
</dd>
|
||||
<dt>void fmprb_fac_ui(fmprb_t x, ulong n, long prec)</dt>
|
||||
<dd>
|
||||
|
|
|
@ -10,8 +10,7 @@
|
|||
conventions: the mantissa (alternatively <i>significand</i>) is
|
||||
sometimes viewed as a fraction in the interval $[1/2, 1)$, with the
|
||||
exponent pointing to the position above the top bit rather than the
|
||||
position of the bottom bit, and it is sometimes customary to separate
|
||||
the <i>sign</i> from the mantissa.
|
||||
position of the bottom bit, and with a separate sign.
|
||||
|
||||
The conventions for special values largely follow those of the
|
||||
IEEE floating-point standard. At the moment, there is no support
|
||||
|
@ -65,7 +64,7 @@ fmpr_rnd_t
|
|||
FMPR_RND_NEAREST
|
||||
|
||||
Specifies that the result of an operation should be rounded to the
|
||||
nearest representable number, rounding to an odd bit if there is a tie
|
||||
nearest representable number, rounding to an odd mantissa if there is a tie
|
||||
between two values. Note: the code for this rounding mode is currently
|
||||
not implemented.
|
||||
|
||||
|
@ -208,6 +207,10 @@ int fmpr_cmp(const fmpr_t x, const fmpr_t y)
|
|||
Returns negative, zero, or positive, depending on whether x is respectively
|
||||
smaller, equal, or greater compared to y. Comparison with NaN is undefined.
|
||||
|
||||
int fmpr_cmpabs(const fmpr_t x, const fmpr_t y)
|
||||
|
||||
Compares the absolute values of x and y.
|
||||
|
||||
int fmpr_sgn(const fmpr_t x)
|
||||
|
||||
Returns $-1$, $0$ or $+1$ according to the sign of x. The sign
|
||||
|
|
|
@ -300,7 +300,13 @@ void fmprb_fmpz_div_fmpz(fmprb_t y, const fmpz_t num, const fmpz_t den, long pre
|
|||
void fmprb_ui_div(fmprb_t z, ulong x, const fmprb_t y, long prec);
|
||||
|
||||
Sets $z = x / y$, rounded to prec bits. If $y$ contains zero, $z$ is
|
||||
set to $0 \pm +\infty$.
|
||||
set to $0 \pm \infty$. Otherwise, error propagation uses the rule
|
||||
$$\left| \frac{x}{y} - \frac{x+\xi_1 a}{y+\xi_2 b} \right| =
|
||||
\left|\frac{x \xi_2 b - y \xi_1 a}{y (y+\xi_2 b)}\right| \le
|
||||
\frac{|xb|+|ya|}{|y| (|y|-b)}$$
|
||||
where $-1 \le \xi_1, \xi_2 \le 1$, and
|
||||
where the triangle inequality has been applied to the numerator and
|
||||
the reverse triangle inequality has been applied to the denominator.
|
||||
|
||||
void fmprb_div_2expm1_ui(fmprb_t y, const fmprb_t x, ulong n, long prec);
|
||||
|
||||
|
@ -345,7 +351,9 @@ void fmprb_ui_pow_ui(fmprb_t y, ulong b, ulong e, long prec)
|
|||
|
||||
void fmprb_si_pow_ui(fmprb_t y, long b, ulong e, long prec)
|
||||
|
||||
Sets $y = b^e$.
|
||||
Sets $y = b^e$ using binary exponentiation. Provided that $b$ and $e$
|
||||
are small enough and the exponent is positive, the exact power can be
|
||||
computed using FMPR_PREC_EXACT.
|
||||
|
||||
|
||||
*******************************************************************************
|
||||
|
@ -360,11 +368,19 @@ void fmprb_log_ui(fmprb_t z, ulong x, long prec)
|
|||
|
||||
void fmprb_log_fmpz(fmprb_t z, const fmpz_t x, long prec)
|
||||
|
||||
Sets $z = \log(x)$.
|
||||
Sets $z = \log(x)$. Error propagation is done using the following rule:
|
||||
assuming $m > r \ge 0$, the error is largest at $m - r$, and we have
|
||||
$\log(m) - \log(m-r) = \log(1 + r/(m-r))$. The last expression is
|
||||
calculated accurately for small radii via <tt>fmpr_log1p</tt>.
|
||||
An input containing zero currently raises an exception.
|
||||
|
||||
void fmprb_exp(fmprb_t z, const fmprb_t x, long prec)
|
||||
|
||||
Sets $z = \exp(x)$.
|
||||
Sets $z = \exp(x)$. Error propagation is done using the following rule:
|
||||
the error is largest at $m + r$, and we have
|
||||
$\exp(m+r) - \exp(m) = \exp(m) (\exp(r)-1)$.
|
||||
The last expression is calculated accurately for small radii
|
||||
via <tt>fmpr_expm1</tt>.
|
||||
|
||||
void fmprb_fac_ui(fmprb_t x, ulong n, long prec)
|
||||
|
||||
|
|
2
fmpr.h
2
fmpr.h
|
@ -205,7 +205,7 @@ fmpr_sgn(const fmpr_t x)
|
|||
|
||||
int fmpr_cmp(const fmpr_t x, const fmpr_t y);
|
||||
|
||||
|
||||
int fmpr_cmpabs(const fmpr_t x, const fmpr_t y);
|
||||
|
||||
|
||||
void fmpr_randtest(fmpr_t x, flint_rand_t state, long bits, long exp_bits);
|
||||
|
|
47
fmpr/cmp.c
47
fmpr/cmp.c
|
@ -51,7 +51,7 @@ fmpr_cmp(const fmpr_t x, const fmpr_t y)
|
|||
if (xsign != ysign)
|
||||
return (xsign < 0) ? -1 : 1;
|
||||
|
||||
/* Reduces to integer comparison of bottom exponents are the same */
|
||||
/* Reduces to integer comparison if bottom exponents are the same */
|
||||
if (fmpz_equal(fmpr_expref(x), fmpr_expref(y)))
|
||||
return fmpz_cmp(fmpr_manref(x), fmpr_manref(y)) < 0 ? -1 : 1;
|
||||
|
||||
|
@ -64,3 +64,48 @@ fmpr_cmp(const fmpr_t x, const fmpr_t y)
|
|||
|
||||
return res;
|
||||
}
|
||||
|
||||
int
|
||||
fmpr_cmpabs(const fmpr_t x, const fmpr_t y)
|
||||
{
|
||||
int res, xsign, ysign;
|
||||
fmpr_t t;
|
||||
|
||||
if (fmpr_equal(x, y))
|
||||
return 0;
|
||||
|
||||
if (fmpr_is_special(x) || fmpr_is_special(y))
|
||||
{
|
||||
if (fmpr_is_nan(x) || fmpr_is_nan(y))
|
||||
return 0;
|
||||
if (fmpr_is_zero(x)) return -1;
|
||||
if (fmpr_is_zero(y)) return 1;
|
||||
if (fmpr_is_inf(x)) return fmpr_is_inf(y) ? 0 : 1;
|
||||
if (fmpr_is_inf(y)) return -1;
|
||||
return -1;
|
||||
}
|
||||
|
||||
/* Reduces to integer comparison if bottom exponents are the same */
|
||||
if (fmpz_equal(fmpr_expref(x), fmpr_expref(y)))
|
||||
{
|
||||
res = fmpz_cmpabs(fmpr_manref(x), fmpr_manref(y));
|
||||
if (res != 0)
|
||||
res = (res < 0) ? -1 : 1;
|
||||
}
|
||||
else
|
||||
{
|
||||
/* TODO: compare position of top exponents to avoid subtraction */
|
||||
xsign = fmpr_sgn(x);
|
||||
ysign = fmpr_sgn(y);
|
||||
|
||||
fmpr_init(t);
|
||||
if (xsign == ysign)
|
||||
fmpr_sub(t, x, y, 2, FMPR_RND_DOWN);
|
||||
else
|
||||
fmpr_add(t, x, y, 2, FMPR_RND_DOWN);
|
||||
res = fmpr_sgn(t) * xsign;
|
||||
fmpr_clear(t);
|
||||
}
|
||||
|
||||
return res;
|
||||
}
|
||||
|
|
86
fmpr/test/t-cmpabs.c
Normal file
86
fmpr/test/t-cmpabs.c
Normal file
|
@ -0,0 +1,86 @@
|
|||
/*=============================================================================
|
||||
|
||||
This file is part of ARB.
|
||||
|
||||
ARB is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation; either version 2 of the License, or
|
||||
(at your option) any later version.
|
||||
|
||||
ARB is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with ARB; if not, write to the Free Software
|
||||
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
|
||||
|
||||
=============================================================================*/
|
||||
/******************************************************************************
|
||||
|
||||
Copyright (C) 2012 Fredrik Johansson
|
||||
|
||||
******************************************************************************/
|
||||
|
||||
#include "fmpr.h"
|
||||
|
||||
|
||||
int main()
|
||||
{
|
||||
long iter;
|
||||
flint_rand_t state;
|
||||
|
||||
printf("cmpabs....");
|
||||
fflush(stdout);
|
||||
|
||||
flint_randinit(state);
|
||||
|
||||
for (iter = 0; iter < 100000; iter++)
|
||||
{
|
||||
long bits;
|
||||
fmpr_t x, y;
|
||||
mpfr_t X, Y;
|
||||
int cmp1, cmp2;
|
||||
|
||||
bits = 2 + n_randint(state, 200);
|
||||
|
||||
fmpr_init(x);
|
||||
fmpr_init(y);
|
||||
|
||||
mpfr_init2(X, bits);
|
||||
mpfr_init2(Y, bits);
|
||||
|
||||
fmpr_randtest_special(x, state, bits, 10);
|
||||
fmpr_randtest_special(y, state, bits, 10);
|
||||
|
||||
fmpr_get_mpfr(X, x, MPFR_RNDN);
|
||||
fmpr_get_mpfr(Y, y, MPFR_RNDN);
|
||||
|
||||
mpfr_abs(X, X, MPFR_RNDN);
|
||||
mpfr_abs(Y, Y, MPFR_RNDN);
|
||||
|
||||
cmp1 = fmpr_cmpabs(x, y);
|
||||
cmp2 = mpfr_cmp(X, Y);
|
||||
|
||||
if (cmp1 != cmp2)
|
||||
{
|
||||
printf("FAIL\n\n");
|
||||
printf("x = "); fmpr_print(x); printf("\n\n");
|
||||
printf("y = "); fmpr_print(y); printf("\n\n");
|
||||
printf("cmp1 = %d, cmp2 = %d\n\n", cmp1, cmp2);
|
||||
abort();
|
||||
}
|
||||
|
||||
fmpr_clear(x);
|
||||
fmpr_clear(y);
|
||||
|
||||
mpfr_clear(X);
|
||||
mpfr_clear(Y);
|
||||
}
|
||||
|
||||
flint_randclear(state);
|
||||
_fmpz_cleanup();
|
||||
printf("PASS\n");
|
||||
return EXIT_SUCCESS;
|
||||
}
|
|
@ -28,5 +28,5 @@
|
|||
int
|
||||
fmprb_contains_zero(const fmprb_t x)
|
||||
{
|
||||
return fmpr_cmp(fmprb_midref(x), fmprb_radref(x)) <= 0;
|
||||
return fmpr_cmpabs(fmprb_midref(x), fmprb_radref(x)) <= 0;
|
||||
}
|
||||
|
|
35
fmprb/div.c
35
fmprb/div.c
|
@ -58,31 +58,27 @@ fmprb_div(fmprb_t z, const fmprb_t x, const fmprb_t y, long prec)
|
|||
}
|
||||
else
|
||||
{
|
||||
/* x/y - (x+a)/(y-b) = a/(b-y) + x*b/(y*(b-y)) */
|
||||
/* where we assume y > b */
|
||||
|
||||
fmpr_t t, u, by;
|
||||
|
||||
fmpr_t t, u;
|
||||
fmpr_init(t);
|
||||
fmpr_init(u);
|
||||
fmpr_init(by);
|
||||
|
||||
/* b - y */
|
||||
fmpr_sub(by, fmprb_radref(y), fmprb_midref(y), FMPRB_RAD_PREC, FMPR_RND_DOWN);
|
||||
/* numerator of error bound: |xb| + |ya|, rounded up */
|
||||
fmpr_mul(t, fmprb_midref(x), fmprb_radref(y), FMPRB_RAD_PREC, FMPR_RND_UP);
|
||||
fmpr_abs(t, t);
|
||||
fmpr_mul(u, fmprb_midref(y), fmprb_radref(x), FMPRB_RAD_PREC, FMPR_RND_UP);
|
||||
fmpr_abs(u, u);
|
||||
fmpr_add(t, t, u, FMPRB_RAD_PREC, FMPR_RND_UP);
|
||||
|
||||
/* y * (b - y) */
|
||||
fmpr_mul(t, fmprb_midref(y), by, FMPRB_RAD_PREC, FMPR_RND_DOWN);
|
||||
/* x * b */
|
||||
fmpr_mul(u, fmprb_midref(x), fmprb_radref(y), FMPRB_RAD_PREC, FMPR_RND_UP);
|
||||
/* x*b / (y*(b-y)) */
|
||||
fmpr_div(u, u, t, FMPRB_RAD_PREC, FMPR_RND_UP);
|
||||
/* denominator of error bound: |y|(|y|-b), rounded down */
|
||||
if (fmpr_sgn(fmprb_midref(y)) > 0)
|
||||
fmpr_sub(u, fmprb_radref(y), fmprb_midref(y), FMPRB_RAD_PREC, FMPR_RND_DOWN);
|
||||
else
|
||||
fmpr_add(u, fmprb_radref(y), fmprb_midref(y), FMPRB_RAD_PREC, FMPR_RND_DOWN);
|
||||
fmpr_mul(u, u, fmprb_midref(y), FMPRB_RAD_PREC, FMPR_RND_DOWN);
|
||||
fmpr_abs(u, u);
|
||||
|
||||
/* a / (b-y) */
|
||||
fmpr_div(t, fmprb_radref(x), by, FMPRB_RAD_PREC, FMPR_RND_UP);
|
||||
fmpr_abs(t, t);
|
||||
|
||||
fmpr_add(t, t, u, FMPRB_RAD_PREC, FMPR_RND_UP);
|
||||
/* error bound */
|
||||
fmpr_div(t, t, u, FMPRB_RAD_PREC, FMPR_RND_UP);
|
||||
|
||||
r = fmpr_div(fmprb_midref(z), fmprb_midref(x), fmprb_midref(y), prec, FMPR_RND_DOWN);
|
||||
fmpr_add_error_result(fmprb_radref(z), t,
|
||||
|
@ -90,7 +86,6 @@ fmprb_div(fmprb_t z, const fmprb_t x, const fmprb_t y, long prec)
|
|||
|
||||
fmpr_clear(t);
|
||||
fmpr_clear(u);
|
||||
fmpr_clear(by);
|
||||
}
|
||||
|
||||
fmprb_adjust(z);
|
||||
|
|
Loading…
Add table
Reference in a new issue