add cmpabs, fix bugs, and simplify division error bounding

This commit is contained in:
Fredrik Johansson 2012-09-14 12:38:43 +02:00
parent 9c986ebe9e
commit c2a9b8b3e8
8 changed files with 186 additions and 37 deletions

View file

@ -40,7 +40,7 @@ MathJax.Hub.Config({
<h1>Arb documentation</h1>
<p><i>Last updated: 2012-09-13 18:07:04 CET</i></p>
<p><i>Last updated: 2012-09-14 12:37:05 CET</i></p>
<h2>Contents</h2>
@ -165,7 +165,7 @@ The output should be something like the following:
<hr />
<h2><a name="fmpr-h--floating-point-arithmetic--">fmpr.h (floating-point arithmetic)</a></h2>
<p> A variable of type <tt>fmpr_t</tt> holds an arbitrary-precision binary floating-point number, i.e. a rational number of the form $x \times 2^y$ where $x, y \in \mathbb{Z}$ and $x$ is odd; or one of the special values zero, plus infinity, minus infinity, or NaN (not-a-number).</p>
<p> The component $x$ is called the <i>mantissa</i>, and $y$ is called the <i>exponent</i>. Note that this is just one among many possible conventions: the mantissa (alternatively <i>significand</i>) is sometimes viewed as a fraction in the interval $[1/2, 1)$, with the exponent pointing to the position above the top bit rather than the position of the bottom bit, and it is sometimes customary to separate the <i>sign</i> from the mantissa.</p>
<p> The component $x$ is called the <i>mantissa</i>, and $y$ is called the <i>exponent</i>. Note that this is just one among many possible conventions: the mantissa (alternatively <i>significand</i>) is sometimes viewed as a fraction in the interval $[1/2, 1)$, with the exponent pointing to the position above the top bit rather than the position of the bottom bit, and with a separate sign.</p>
<p> The conventions for special values largely follow those of the IEEE floating-point standard. At the moment, there is no support for negative zero, unsigned infinity, or a NaN with a payload, though some these might be added in the future.</p>
<p> An <tt>fmpr</tt> number is exact and has no inherent "accuracy". We use the term <i>precision</i> to denote either the target precision of an operation, or the bit size of a mantissa (which in general is unrelated to the "accuracy" of the number: for example, the floating-point value 1 has a precision of 1 bit in this sense and is simultaneously an infinitely accurate approximation of the integer 1 and a 2-bit accurate approximation of $\sqrt 2 = 1.011010100\ldots_2$).</p>
<p> Except where otherwise noted, the output of an operation is the floating-point number obtained by taking the inputs as exact numbers, in principle carrying out the operation exactly, and rounding the resulting real number to the nearest representable floating-point number whose mantissa has at most the specified number of bits, in the specified direction of rounding. Some operations are always or optionally done exactly.</p>
@ -185,7 +185,7 @@ The output should be something like the following:
</dd>
<dt>FMPR_RND_NEAREST</dt>
<dd>
<p> Specifies that the result of an operation should be rounded to the nearest representable number, rounding to an odd bit if there is a tie between two values. Note: the code for this rounding mode is currently not implemented.</p>
<p> Specifies that the result of an operation should be rounded to the nearest representable number, rounding to an odd mantissa if there is a tie between two values. Note: the code for this rounding mode is currently not implemented.</p>
</dd>
<dt>FMPR_RND_DOWN</dt>
<dd>
@ -283,6 +283,10 @@ The output should be something like the following:
<dd>
<p> Returns negative, zero, or positive, depending on whether x is respectively smaller, equal, or greater compared to y. Comparison with NaN is undefined.</p>
</dd>
<dt>int fmpr_cmpabs(const fmpr_t x, const fmpr_t y)</dt>
<dd>
<p> Compares the absolute values of x and y.</p>
</dd>
<dt>int fmpr_sgn(const fmpr_t x)</dt>
<dd>
<p> Returns $-1$, $0$ or $+1$ according to the sign of x. The sign of NaN is undefined.</p>
@ -627,7 +631,7 @@ The output should be something like the following:
<dt>void fmprb_fmpz_div_fmpz(fmprb_t y, const fmpz_t num, const fmpz_t den, long prec)</dt>
<dt>void fmprb_ui_div(fmprb_t z, ulong x, const fmprb_t y, long prec);</dt>
<dd>
<p> Sets $z = x / y$, rounded to prec bits. If $y$ contains zero, $z$ is set to $0 \pm +\infty$.</p>
<p> Sets $z = x / y$, rounded to prec bits. If $y$ contains zero, $z$ is set to $0 \pm \infty$. Otherwise, error propagation uses the rule $$\left| \frac{x}{y} - \frac{x+\xi_1 a}{y+\xi_2 b} \right| = \left|\frac{x \xi_2 b - y \xi_1 a}{y (y+\xi_2 b)}\right| \le \frac{|xb|+|ya|}{|y| (|y|-b)}$$ where $-1 \le \xi_1, \xi_2 \le 1$, and where the triangle inequality has been applied to the numerator and the reverse triangle inequality has been applied to the denominator.</p>
</dd>
<dt>void fmprb_div_2expm1_ui(fmprb_t y, const fmprb_t x, ulong n, long prec);</dt>
<dd>
@ -658,7 +662,7 @@ The output should be something like the following:
<dt>void fmprb_ui_pow_ui(fmprb_t y, ulong b, ulong e, long prec)</dt>
<dt>void fmprb_si_pow_ui(fmprb_t y, long b, ulong e, long prec)</dt>
<dd>
<p> Sets $y = b^e$.</p>
<p> Sets $y = b^e$ using binary exponentiation. Provided that $b$ and $e$ are small enough and the exponent is positive, the exact power can be computed using FMPR_PREC_EXACT.</p>
</dd>
</dl>
<a name="fmprb-h--real-ball-arithmetic--Special-functions"><h3>Special functions</h3></a>
@ -667,11 +671,11 @@ The output should be something like the following:
<dt>void fmprb_log_ui(fmprb_t z, ulong x, long prec)</dt>
<dt>void fmprb_log_fmpz(fmprb_t z, const fmpz_t x, long prec)</dt>
<dd>
<p> Sets $z = \log(x)$.</p>
<p> Sets $z = \log(x)$. Error propagation is done using the following rule: assuming $m > r \ge 0$, the error is largest at $m - r$, and we have $\log(m) - \log(m-r) = \log(1 + r/(m-r))$. The last expression is calculated accurately for small radii via <tt>fmpr_log1p</tt>. An input containing zero currently raises an exception.</p>
</dd>
<dt>void fmprb_exp(fmprb_t z, const fmprb_t x, long prec)</dt>
<dd>
<p> Sets $z = \exp(x)$.</p>
<p> Sets $z = \exp(x)$. Error propagation is done using the following rule: the error is largest at $m + r$, and we have $\exp(m+r) - \exp(m) = \exp(m) (\exp(r)-1)$. The last expression is calculated accurately for small radii via <tt>fmpr_expm1</tt>.</p>
</dd>
<dt>void fmprb_fac_ui(fmprb_t x, ulong n, long prec)</dt>
<dd>

View file

@ -10,8 +10,7 @@
conventions: the mantissa (alternatively <i>significand</i>) is
sometimes viewed as a fraction in the interval $[1/2, 1)$, with the
exponent pointing to the position above the top bit rather than the
position of the bottom bit, and it is sometimes customary to separate
the <i>sign</i> from the mantissa.
position of the bottom bit, and with a separate sign.
The conventions for special values largely follow those of the
IEEE floating-point standard. At the moment, there is no support
@ -65,7 +64,7 @@ fmpr_rnd_t
FMPR_RND_NEAREST
Specifies that the result of an operation should be rounded to the
nearest representable number, rounding to an odd bit if there is a tie
nearest representable number, rounding to an odd mantissa if there is a tie
between two values. Note: the code for this rounding mode is currently
not implemented.
@ -208,6 +207,10 @@ int fmpr_cmp(const fmpr_t x, const fmpr_t y)
Returns negative, zero, or positive, depending on whether x is respectively
smaller, equal, or greater compared to y. Comparison with NaN is undefined.
int fmpr_cmpabs(const fmpr_t x, const fmpr_t y)
Compares the absolute values of x and y.
int fmpr_sgn(const fmpr_t x)
Returns $-1$, $0$ or $+1$ according to the sign of x. The sign

View file

@ -300,7 +300,13 @@ void fmprb_fmpz_div_fmpz(fmprb_t y, const fmpz_t num, const fmpz_t den, long pre
void fmprb_ui_div(fmprb_t z, ulong x, const fmprb_t y, long prec);
Sets $z = x / y$, rounded to prec bits. If $y$ contains zero, $z$ is
set to $0 \pm +\infty$.
set to $0 \pm \infty$. Otherwise, error propagation uses the rule
$$\left| \frac{x}{y} - \frac{x+\xi_1 a}{y+\xi_2 b} \right| =
\left|\frac{x \xi_2 b - y \xi_1 a}{y (y+\xi_2 b)}\right| \le
\frac{|xb|+|ya|}{|y| (|y|-b)}$$
where $-1 \le \xi_1, \xi_2 \le 1$, and
where the triangle inequality has been applied to the numerator and
the reverse triangle inequality has been applied to the denominator.
void fmprb_div_2expm1_ui(fmprb_t y, const fmprb_t x, ulong n, long prec);
@ -345,7 +351,9 @@ void fmprb_ui_pow_ui(fmprb_t y, ulong b, ulong e, long prec)
void fmprb_si_pow_ui(fmprb_t y, long b, ulong e, long prec)
Sets $y = b^e$.
Sets $y = b^e$ using binary exponentiation. Provided that $b$ and $e$
are small enough and the exponent is positive, the exact power can be
computed using FMPR_PREC_EXACT.
*******************************************************************************
@ -360,11 +368,19 @@ void fmprb_log_ui(fmprb_t z, ulong x, long prec)
void fmprb_log_fmpz(fmprb_t z, const fmpz_t x, long prec)
Sets $z = \log(x)$.
Sets $z = \log(x)$. Error propagation is done using the following rule:
assuming $m > r \ge 0$, the error is largest at $m - r$, and we have
$\log(m) - \log(m-r) = \log(1 + r/(m-r))$. The last expression is
calculated accurately for small radii via <tt>fmpr_log1p</tt>.
An input containing zero currently raises an exception.
void fmprb_exp(fmprb_t z, const fmprb_t x, long prec)
Sets $z = \exp(x)$.
Sets $z = \exp(x)$. Error propagation is done using the following rule:
the error is largest at $m + r$, and we have
$\exp(m+r) - \exp(m) = \exp(m) (\exp(r)-1)$.
The last expression is calculated accurately for small radii
via <tt>fmpr_expm1</tt>.
void fmprb_fac_ui(fmprb_t x, ulong n, long prec)

2
fmpr.h
View file

@ -205,7 +205,7 @@ fmpr_sgn(const fmpr_t x)
int fmpr_cmp(const fmpr_t x, const fmpr_t y);
int fmpr_cmpabs(const fmpr_t x, const fmpr_t y);
void fmpr_randtest(fmpr_t x, flint_rand_t state, long bits, long exp_bits);

View file

@ -51,7 +51,7 @@ fmpr_cmp(const fmpr_t x, const fmpr_t y)
if (xsign != ysign)
return (xsign < 0) ? -1 : 1;
/* Reduces to integer comparison of bottom exponents are the same */
/* Reduces to integer comparison if bottom exponents are the same */
if (fmpz_equal(fmpr_expref(x), fmpr_expref(y)))
return fmpz_cmp(fmpr_manref(x), fmpr_manref(y)) < 0 ? -1 : 1;
@ -64,3 +64,48 @@ fmpr_cmp(const fmpr_t x, const fmpr_t y)
return res;
}
int
fmpr_cmpabs(const fmpr_t x, const fmpr_t y)
{
int res, xsign, ysign;
fmpr_t t;
if (fmpr_equal(x, y))
return 0;
if (fmpr_is_special(x) || fmpr_is_special(y))
{
if (fmpr_is_nan(x) || fmpr_is_nan(y))
return 0;
if (fmpr_is_zero(x)) return -1;
if (fmpr_is_zero(y)) return 1;
if (fmpr_is_inf(x)) return fmpr_is_inf(y) ? 0 : 1;
if (fmpr_is_inf(y)) return -1;
return -1;
}
/* Reduces to integer comparison if bottom exponents are the same */
if (fmpz_equal(fmpr_expref(x), fmpr_expref(y)))
{
res = fmpz_cmpabs(fmpr_manref(x), fmpr_manref(y));
if (res != 0)
res = (res < 0) ? -1 : 1;
}
else
{
/* TODO: compare position of top exponents to avoid subtraction */
xsign = fmpr_sgn(x);
ysign = fmpr_sgn(y);
fmpr_init(t);
if (xsign == ysign)
fmpr_sub(t, x, y, 2, FMPR_RND_DOWN);
else
fmpr_add(t, x, y, 2, FMPR_RND_DOWN);
res = fmpr_sgn(t) * xsign;
fmpr_clear(t);
}
return res;
}

86
fmpr/test/t-cmpabs.c Normal file
View file

@ -0,0 +1,86 @@
/*=============================================================================
This file is part of ARB.
ARB is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
ARB is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with ARB; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
=============================================================================*/
/******************************************************************************
Copyright (C) 2012 Fredrik Johansson
******************************************************************************/
#include "fmpr.h"
int main()
{
long iter;
flint_rand_t state;
printf("cmpabs....");
fflush(stdout);
flint_randinit(state);
for (iter = 0; iter < 100000; iter++)
{
long bits;
fmpr_t x, y;
mpfr_t X, Y;
int cmp1, cmp2;
bits = 2 + n_randint(state, 200);
fmpr_init(x);
fmpr_init(y);
mpfr_init2(X, bits);
mpfr_init2(Y, bits);
fmpr_randtest_special(x, state, bits, 10);
fmpr_randtest_special(y, state, bits, 10);
fmpr_get_mpfr(X, x, MPFR_RNDN);
fmpr_get_mpfr(Y, y, MPFR_RNDN);
mpfr_abs(X, X, MPFR_RNDN);
mpfr_abs(Y, Y, MPFR_RNDN);
cmp1 = fmpr_cmpabs(x, y);
cmp2 = mpfr_cmp(X, Y);
if (cmp1 != cmp2)
{
printf("FAIL\n\n");
printf("x = "); fmpr_print(x); printf("\n\n");
printf("y = "); fmpr_print(y); printf("\n\n");
printf("cmp1 = %d, cmp2 = %d\n\n", cmp1, cmp2);
abort();
}
fmpr_clear(x);
fmpr_clear(y);
mpfr_clear(X);
mpfr_clear(Y);
}
flint_randclear(state);
_fmpz_cleanup();
printf("PASS\n");
return EXIT_SUCCESS;
}

View file

@ -28,5 +28,5 @@
int
fmprb_contains_zero(const fmprb_t x)
{
return fmpr_cmp(fmprb_midref(x), fmprb_radref(x)) <= 0;
return fmpr_cmpabs(fmprb_midref(x), fmprb_radref(x)) <= 0;
}

View file

@ -58,31 +58,27 @@ fmprb_div(fmprb_t z, const fmprb_t x, const fmprb_t y, long prec)
}
else
{
/* x/y - (x+a)/(y-b) = a/(b-y) + x*b/(y*(b-y)) */
/* where we assume y > b */
fmpr_t t, u, by;
fmpr_t t, u;
fmpr_init(t);
fmpr_init(u);
fmpr_init(by);
/* b - y */
fmpr_sub(by, fmprb_radref(y), fmprb_midref(y), FMPRB_RAD_PREC, FMPR_RND_DOWN);
/* numerator of error bound: |xb| + |ya|, rounded up */
fmpr_mul(t, fmprb_midref(x), fmprb_radref(y), FMPRB_RAD_PREC, FMPR_RND_UP);
fmpr_abs(t, t);
fmpr_mul(u, fmprb_midref(y), fmprb_radref(x), FMPRB_RAD_PREC, FMPR_RND_UP);
fmpr_abs(u, u);
fmpr_add(t, t, u, FMPRB_RAD_PREC, FMPR_RND_UP);
/* y * (b - y) */
fmpr_mul(t, fmprb_midref(y), by, FMPRB_RAD_PREC, FMPR_RND_DOWN);
/* x * b */
fmpr_mul(u, fmprb_midref(x), fmprb_radref(y), FMPRB_RAD_PREC, FMPR_RND_UP);
/* x*b / (y*(b-y)) */
fmpr_div(u, u, t, FMPRB_RAD_PREC, FMPR_RND_UP);
/* denominator of error bound: |y|(|y|-b), rounded down */
if (fmpr_sgn(fmprb_midref(y)) > 0)
fmpr_sub(u, fmprb_radref(y), fmprb_midref(y), FMPRB_RAD_PREC, FMPR_RND_DOWN);
else
fmpr_add(u, fmprb_radref(y), fmprb_midref(y), FMPRB_RAD_PREC, FMPR_RND_DOWN);
fmpr_mul(u, u, fmprb_midref(y), FMPRB_RAD_PREC, FMPR_RND_DOWN);
fmpr_abs(u, u);
/* a / (b-y) */
fmpr_div(t, fmprb_radref(x), by, FMPRB_RAD_PREC, FMPR_RND_UP);
fmpr_abs(t, t);
fmpr_add(t, t, u, FMPRB_RAD_PREC, FMPR_RND_UP);
/* error bound */
fmpr_div(t, t, u, FMPRB_RAD_PREC, FMPR_RND_UP);
r = fmpr_div(fmprb_midref(z), fmprb_midref(x), fmprb_midref(y), prec, FMPR_RND_DOWN);
fmpr_add_error_result(fmprb_radref(z), t,
@ -90,7 +86,6 @@ fmprb_div(fmprb_t z, const fmprb_t x, const fmprb_t y, long prec)
fmpr_clear(t);
fmpr_clear(u);
fmpr_clear(by);
}
fmprb_adjust(z);