add cmpabs, fix bugs, and simplify division error bounding

2025-03-04 17:01:40 -05:00 · 2012-09-14 12:38:43 +02:00 · 2012-09-14 12:38:43 +02:00 · c2a9b8b3e8
commit c2a9b8b3e8
parent 9c986ebe9e
8 changed files with 186 additions and 37 deletions
--- a/doc/doc.html
+++ b/doc/doc.html
@ -40,7 +40,7 @@ MathJax.Hub.Config({

 <h1>Arb documentation</h1>

-<p><i>Last updated: 2012-09-13 18:07:04 CET</i></p>
+<p><i>Last updated: 2012-09-14 12:37:05 CET</i></p>

 <h2>Contents</h2>

@ -165,7 +165,7 @@ The output should be something like the following:
 <hr />
 <h2><a name="fmpr-h--floating-point-arithmetic--">fmpr.h (floating-point arithmetic)</a></h2>
 <p>    A variable of type <tt>fmpr_t</tt> holds an arbitrary-precision binary     floating-point number, i.e. a rational number of the form     $x \times 2^y$ where $x, y \in \mathbb{Z}$ and $x$ is odd;     or one of the special values zero, plus infinity, minus infinity,     or NaN (not-a-number).</p>
-<p>    The component $x$ is called the <i>mantissa</i>, and $y$ is called the     <i>exponent</i>. Note that this is just one among many possible     conventions: the mantissa (alternatively <i>significand</i>) is     sometimes viewed as a fraction in the interval $[1/2, 1)$, with the     exponent pointing to the position above the top bit rather than the     position of the bottom bit, and it is sometimes customary to separate     the <i>sign</i> from the mantissa.</p>
+<p>    The component $x$ is called the <i>mantissa</i>, and $y$ is called the     <i>exponent</i>. Note that this is just one among many possible     conventions: the mantissa (alternatively <i>significand</i>) is     sometimes viewed as a fraction in the interval $[1/2, 1)$, with the     exponent pointing to the position above the top bit rather than the     position of the bottom bit, and with a separate sign.</p>
 <p>    The conventions for special values largely follow those of the     IEEE floating-point standard. At the moment, there is no support     for negative zero, unsigned infinity, or a NaN with a payload, though     some these might be added in the future.</p>
 <p>    An <tt>fmpr</tt> number is exact and has no inherent "accuracy". We     use the term <i>precision</i> to denote either the target precision of     an operation, or the bit size of a mantissa (which in general is     unrelated to the "accuracy" of the number: for example, the     floating-point value 1 has a precision of 1 bit in this sense and is     simultaneously an infinitely accurate approximation of the     integer 1 and a 2-bit accurate approximation of     $\sqrt 2 = 1.011010100\ldots_2$).</p>
 <p>    Except where otherwise noted, the output of an operation is the     floating-point number obtained by taking the inputs as exact numbers,     in principle carrying out the operation exactly, and rounding the     resulting real number to the nearest representable floating-point     number whose mantissa has at most the specified number of bits, in     the specified direction of rounding. Some operations are always     or optionally done exactly.</p>
@ -185,7 +185,7 @@ The output should be something like the following:
 </dd>
 <dt>FMPR_RND_NEAREST</dt>
 <dd>
-<p>    Specifies that the result of an operation should be rounded to the     nearest representable number, rounding to an odd bit if there is a tie     between two values. Note: the code for this rounding mode is currently     not implemented.</p>
+<p>    Specifies that the result of an operation should be rounded to the     nearest representable number, rounding to an odd mantissa if there is a tie     between two values. Note: the code for this rounding mode is currently     not implemented.</p>
 </dd>
 <dt>FMPR_RND_DOWN</dt>
 <dd>
@ -283,6 +283,10 @@ The output should be something like the following:
 <dd>
 <p>    Returns negative, zero, or positive, depending on whether x is respectively     smaller, equal, or greater compared to y. Comparison with NaN is undefined.</p>
 </dd>
+<dt>int fmpr_cmpabs(const fmpr_t x, const fmpr_t y)</dt>
+<dd>
+<p>    Compares the absolute values of x and y.</p>
+</dd>
 <dt>int fmpr_sgn(const fmpr_t x)</dt>
 <dd>
 <p>    Returns $-1$, $0$ or $+1$ according to the sign of x. The sign     of NaN is undefined.</p>
@ -627,7 +631,7 @@ The output should be something like the following:
 <dt>void fmprb_fmpz_div_fmpz(fmprb_t y, const fmpz_t num, const fmpz_t den, long prec)</dt>
 <dt>void fmprb_ui_div(fmprb_t z, ulong x, const fmprb_t y, long prec);</dt>
 <dd>
-<p>    Sets $z = x / y$, rounded to prec bits. If $y$ contains zero, $z$ is     set to $0 \pm +\infty$.</p>
+<p>    Sets $z = x / y$, rounded to prec bits. If $y$ contains zero, $z$ is     set to $0 \pm \infty$. Otherwise, error propagation uses the rule     $$\left| \frac{x}{y} - \frac{x+\xi_1 a}{y+\xi_2 b} \right| =         \left|\frac{x \xi_2 b - y \xi_1 a}{y (y+\xi_2 b)}\right| \le         \frac{|xb|+|ya|}{|y| (|y|-b)}$$     where $-1 \le \xi_1, \xi_2 \le 1$, and     where the triangle inequality has been applied to the numerator and     the reverse triangle inequality has been applied to the denominator.</p>
 </dd>
 <dt>void fmprb_div_2expm1_ui(fmprb_t y, const fmprb_t x, ulong n, long prec);</dt>
 <dd>
@ -658,7 +662,7 @@ The output should be something like the following:
 <dt>void fmprb_ui_pow_ui(fmprb_t y, ulong b, ulong e, long prec)</dt>
 <dt>void fmprb_si_pow_ui(fmprb_t y, long b, ulong e, long prec)</dt>
 <dd>
-<p>    Sets $y = b^e$.</p>
+<p>    Sets $y = b^e$ using binary exponentiation. Provided that $b$ and $e$     are small enough and the exponent is positive, the exact power can be     computed using FMPR_PREC_EXACT.</p>
 </dd>
 </dl>
 <a name="fmprb-h--real-ball-arithmetic--Special-functions"><h3>Special functions</h3></a>
@ -667,11 +671,11 @@ The output should be something like the following:
 <dt>void fmprb_log_ui(fmprb_t z, ulong x, long prec)</dt>
 <dt>void fmprb_log_fmpz(fmprb_t z, const fmpz_t x, long prec)</dt>
 <dd>
-<p>    Sets $z = \log(x)$.</p>
+<p>    Sets $z = \log(x)$. Error propagation is done using the following rule:     assuming $m > r \ge 0$, the error is largest at $m - r$, and we have     $\log(m) - \log(m-r) = \log(1 + r/(m-r))$. The last expression is     calculated accurately for small radii via <tt>fmpr_log1p</tt>.     An input containing zero currently raises an exception.</p>
 </dd>
 <dt>void fmprb_exp(fmprb_t z, const fmprb_t x, long prec)</dt>
 <dd>
-<p>    Sets $z = \exp(x)$.</p>
+<p>    Sets $z = \exp(x)$. Error propagation is done using the following rule:     the error is largest at $m + r$, and we have     $\exp(m+r) - \exp(m) = \exp(m) (\exp(r)-1)$.     The last expression is calculated accurately for small radii     via <tt>fmpr_expm1</tt>.</p>
 </dd>
 <dt>void fmprb_fac_ui(fmprb_t x, ulong n, long prec)</dt>
 <dd>
--- a/doc/fmpr.txt
+++ b/doc/fmpr.txt
@ -10,8 +10,7 @@
    conventions: the mantissa (alternatively <i>significand</i>) is
    sometimes viewed as a fraction in the interval $[1/2, 1)$, with the
    exponent pointing to the position above the top bit rather than the
-    position of the bottom bit, and it is sometimes customary to separate
-    the <i>sign</i> from the mantissa.
+    position of the bottom bit, and with a separate sign.

    The conventions for special values largely follow those of the
    IEEE floating-point standard. At the moment, there is no support
@ -65,7 +64,7 @@ fmpr_rnd_t
 FMPR_RND_NEAREST

    Specifies that the result of an operation should be rounded to the
-    nearest representable number, rounding to an odd bit if there is a tie
+    nearest representable number, rounding to an odd mantissa if there is a tie
    between two values. Note: the code for this rounding mode is currently
    not implemented.

@ -208,6 +207,10 @@ int fmpr_cmp(const fmpr_t x, const fmpr_t y)
    Returns negative, zero, or positive, depending on whether x is respectively
    smaller, equal, or greater compared to y. Comparison with NaN is undefined.

+int fmpr_cmpabs(const fmpr_t x, const fmpr_t y)
+
+    Compares the absolute values of x and y.
+
 int fmpr_sgn(const fmpr_t x)

    Returns $-1$, $0$ or $+1$ according to the sign of x. The sign
--- a/doc/fmprb.txt
+++ b/doc/fmprb.txt
@ -300,7 +300,13 @@ void fmprb_fmpz_div_fmpz(fmprb_t y, const fmpz_t num, const fmpz_t den, long pre
 void fmprb_ui_div(fmprb_t z, ulong x, const fmprb_t y, long prec);

    Sets $z = x / y$, rounded to prec bits. If $y$ contains zero, $z$ is
-    set to $0 \pm +\infty$.
+    set to $0 \pm \infty$. Otherwise, error propagation uses the rule
+    $$\left| \frac{x}{y} - \frac{x+\xi_1 a}{y+\xi_2 b} \right| =
+        \left|\frac{x \xi_2 b - y \xi_1 a}{y (y+\xi_2 b)}\right| \le
+        \frac{|xb|+|ya|}{|y| (|y|-b)}$$
+    where $-1 \le \xi_1, \xi_2 \le 1$, and
+    where the triangle inequality has been applied to the numerator and
+    the reverse triangle inequality has been applied to the denominator.

 void fmprb_div_2expm1_ui(fmprb_t y, const fmprb_t x, ulong n, long prec);

@ -345,7 +351,9 @@ void fmprb_ui_pow_ui(fmprb_t y, ulong b, ulong e, long prec)

 void fmprb_si_pow_ui(fmprb_t y, long b, ulong e, long prec)

-    Sets $y = b^e$.
+    Sets $y = b^e$ using binary exponentiation. Provided that $b$ and $e$
+    are small enough and the exponent is positive, the exact power can be
+    computed using FMPR_PREC_EXACT.


 *******************************************************************************
@ -360,11 +368,19 @@ void fmprb_log_ui(fmprb_t z, ulong x, long prec)

 void fmprb_log_fmpz(fmprb_t z, const fmpz_t x, long prec)

-    Sets $z = \log(x)$.
+    Sets $z = \log(x)$. Error propagation is done using the following rule:
+    assuming $m > r \ge 0$, the error is largest at $m - r$, and we have
+    $\log(m) - \log(m-r) = \log(1 + r/(m-r))$. The last expression is
+    calculated accurately for small radii via <tt>fmpr_log1p</tt>.
+    An input containing zero currently raises an exception.

 void fmprb_exp(fmprb_t z, const fmprb_t x, long prec)

-    Sets $z = \exp(x)$.
+    Sets $z = \exp(x)$. Error propagation is done using the following rule:
+    the error is largest at $m + r$, and we have
+    $\exp(m+r) - \exp(m) = \exp(m) (\exp(r)-1)$.
+    The last expression is calculated accurately for small radii
+    via <tt>fmpr_expm1</tt>.

 void fmprb_fac_ui(fmprb_t x, ulong n, long prec)

--- a/fmpr.h
+++ b/fmpr.h
@ -205,7 +205,7 @@ fmpr_sgn(const fmpr_t x)

 int fmpr_cmp(const fmpr_t x, const fmpr_t y);

-
+int fmpr_cmpabs(const fmpr_t x, const fmpr_t y);


 void fmpr_randtest(fmpr_t x, flint_rand_t state, long bits, long exp_bits);
--- a/fmpr/cmp.c
+++ b/fmpr/cmp.c
@ -51,7 +51,7 @@ fmpr_cmp(const fmpr_t x, const fmpr_t y)
    if (xsign != ysign)
        return (xsign < 0) ? -1 : 1;

-    /* Reduces to integer comparison of bottom exponents are the same */
+    /* Reduces to integer comparison if bottom exponents are the same */
    if (fmpz_equal(fmpr_expref(x), fmpr_expref(y)))
        return fmpz_cmp(fmpr_manref(x), fmpr_manref(y)) < 0 ? -1 : 1;

@ -64,3 +64,48 @@ fmpr_cmp(const fmpr_t x, const fmpr_t y)

    return res;
 }
+
+int
+fmpr_cmpabs(const fmpr_t x, const fmpr_t y)
+{
+    int res, xsign, ysign;
+    fmpr_t t;
+
+    if (fmpr_equal(x, y))
+        return 0;
+
+    if (fmpr_is_special(x) || fmpr_is_special(y))
+    {
+        if (fmpr_is_nan(x) || fmpr_is_nan(y))
+            return 0;
+        if (fmpr_is_zero(x)) return -1;
+        if (fmpr_is_zero(y)) return 1;
+        if (fmpr_is_inf(x)) return fmpr_is_inf(y) ? 0 : 1;
+        if (fmpr_is_inf(y)) return -1;
+        return -1;
+    }
+
+    /* Reduces to integer comparison if bottom exponents are the same */
+    if (fmpz_equal(fmpr_expref(x), fmpr_expref(y)))
+    {
+        res = fmpz_cmpabs(fmpr_manref(x), fmpr_manref(y));
+        if (res != 0)
+            res = (res < 0) ? -1 : 1;
+    }
+    else
+    {
+        /* TODO: compare position of top exponents to avoid subtraction */
+        xsign = fmpr_sgn(x);
+        ysign = fmpr_sgn(y);
+
+        fmpr_init(t);
+        if (xsign == ysign)
+            fmpr_sub(t, x, y, 2, FMPR_RND_DOWN);
+        else
+            fmpr_add(t, x, y, 2, FMPR_RND_DOWN);
+        res = fmpr_sgn(t) * xsign;
+        fmpr_clear(t);
+    }
+
+    return res;
+}
--- a/fmpr/test/t-cmpabs.c
+++ b/fmpr/test/t-cmpabs.c
@ -0,0 +1,86 @@
+/*=============================================================================
+
+    This file is part of ARB.
+
+    ARB is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    ARB is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with ARB; if not, write to the Free Software
+    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+
+=============================================================================*/
+/******************************************************************************
+
+    Copyright (C) 2012 Fredrik Johansson
+
+******************************************************************************/
+
+#include "fmpr.h"
+
+
+int main()
+{
+    long iter;
+    flint_rand_t state;
+
+    printf("cmpabs....");
+    fflush(stdout);
+
+    flint_randinit(state);
+
+    for (iter = 0; iter < 100000; iter++)
+    {
+        long bits;
+        fmpr_t x, y;
+        mpfr_t X, Y;
+        int cmp1, cmp2;
+
+        bits = 2 + n_randint(state, 200);
+
+        fmpr_init(x);
+        fmpr_init(y);
+
+        mpfr_init2(X, bits);
+        mpfr_init2(Y, bits);
+
+        fmpr_randtest_special(x, state, bits, 10);
+        fmpr_randtest_special(y, state, bits, 10);
+
+        fmpr_get_mpfr(X, x, MPFR_RNDN);
+        fmpr_get_mpfr(Y, y, MPFR_RNDN);
+
+        mpfr_abs(X, X, MPFR_RNDN);
+        mpfr_abs(Y, Y, MPFR_RNDN);
+
+        cmp1 = fmpr_cmpabs(x, y);
+        cmp2 = mpfr_cmp(X, Y);
+
+        if (cmp1 != cmp2)
+        {
+            printf("FAIL\n\n");
+            printf("x = "); fmpr_print(x); printf("\n\n");
+            printf("y = "); fmpr_print(y); printf("\n\n");
+            printf("cmp1 = %d, cmp2 = %d\n\n", cmp1, cmp2);
+            abort();
+        }
+
+        fmpr_clear(x);
+        fmpr_clear(y);
+
+        mpfr_clear(X);
+        mpfr_clear(Y);
+    }
+
+    flint_randclear(state);
+    _fmpz_cleanup();
+    printf("PASS\n");
+    return EXIT_SUCCESS;
+}
--- a/fmprb/contains_zero.c
+++ b/fmprb/contains_zero.c
@ -28,5 +28,5 @@
 int
 fmprb_contains_zero(const fmprb_t x)
 {
-    return fmpr_cmp(fmprb_midref(x), fmprb_radref(x)) <= 0;
+    return fmpr_cmpabs(fmprb_midref(x), fmprb_radref(x)) <= 0;
 }
--- a/fmprb/div.c
+++ b/fmprb/div.c
@ -58,31 +58,27 @@ fmprb_div(fmprb_t z, const fmprb_t x, const fmprb_t y, long prec)
    }
    else
    {
-        /* x/y - (x+a)/(y-b)  =  a/(b-y) + x*b/(y*(b-y)) */
-        /* where we assume y > b */
-
-        fmpr_t t, u, by;
-
+        fmpr_t t, u;
        fmpr_init(t);
        fmpr_init(u);
-        fmpr_init(by);

-        /* b - y */
-        fmpr_sub(by, fmprb_radref(y), fmprb_midref(y), FMPRB_RAD_PREC, FMPR_RND_DOWN);
+        /* numerator of error bound: |xb| + |ya|, rounded up */
+        fmpr_mul(t, fmprb_midref(x), fmprb_radref(y), FMPRB_RAD_PREC, FMPR_RND_UP);
+        fmpr_abs(t, t);
+        fmpr_mul(u, fmprb_midref(y), fmprb_radref(x), FMPRB_RAD_PREC, FMPR_RND_UP);
+        fmpr_abs(u, u);
+        fmpr_add(t, t, u, FMPRB_RAD_PREC, FMPR_RND_UP);

-        /* y * (b - y) */
-        fmpr_mul(t, fmprb_midref(y), by, FMPRB_RAD_PREC, FMPR_RND_DOWN);
-        /* x * b */
-        fmpr_mul(u, fmprb_midref(x), fmprb_radref(y), FMPRB_RAD_PREC, FMPR_RND_UP);
-        /* x*b / (y*(b-y)) */
-        fmpr_div(u, u, t, FMPRB_RAD_PREC, FMPR_RND_UP);
+        /* denominator of error bound: |y|(|y|-b), rounded down */
+        if (fmpr_sgn(fmprb_midref(y)) > 0)
+            fmpr_sub(u, fmprb_radref(y), fmprb_midref(y), FMPRB_RAD_PREC, FMPR_RND_DOWN);
+        else
+            fmpr_add(u, fmprb_radref(y), fmprb_midref(y), FMPRB_RAD_PREC, FMPR_RND_DOWN);
+        fmpr_mul(u, u, fmprb_midref(y), FMPRB_RAD_PREC, FMPR_RND_DOWN);
        fmpr_abs(u, u);

-        /* a / (b-y) */
-        fmpr_div(t, fmprb_radref(x), by, FMPRB_RAD_PREC, FMPR_RND_UP);
-        fmpr_abs(t, t);
-
-        fmpr_add(t, t, u, FMPRB_RAD_PREC, FMPR_RND_UP);
+        /* error bound */
+        fmpr_div(t, t, u, FMPRB_RAD_PREC, FMPR_RND_UP);

        r = fmpr_div(fmprb_midref(z), fmprb_midref(x), fmprb_midref(y), prec, FMPR_RND_DOWN);
        fmpr_add_error_result(fmprb_radref(z), t,
@ -90,7 +86,6 @@ fmprb_div(fmprb_t z, const fmprb_t x, const fmprb_t y, long prec)

        fmpr_clear(t);
        fmpr_clear(u);
-        fmpr_clear(by);
    }

    fmprb_adjust(z);