Core arithmetic
-------------------------------------------------------------------------------

* Consider changing the interface of functions such as X_set_Y, X_neg_Y to
  always take a precision parameter (and get rid of X_set_round_Y,
  X_neg_round_Y, etc.). Perhaps have X_setexact_Y methods for convenience, or
  make an exception for _set_ in particular.

* Make sure that excessive shifts in add/sub are detected with exact
  precision. Write tests for correctness of overlaps/contains in
  huge-exponent cases.

* Double-check correctness of the add/sub code with large shifts
  (rounding x + eps).

* Work out semantics for comparisons/overlap/containment checks when NaNs
  are involved, and write test code.

* Add adjustment code for balls (when the mantissa is much more precise than
  the error bound, it can be truncated). Also, try to work out more
  consistent semantics for ball arithmetic (with regard to extra working
  precision, etc.).

* Do a low-level rewrite of the fmpr type (a layout sketch is included at
  the end of this file). The mantissa should probably be changed to an
  unsigned, top-aligned fraction (i.e. the exponent will point to the top
  rather than the bottom, and the top bit of the fraction will always be
  set). This requires a separate sign field, increasing the struct size from
  2 to 3 words, but ought to lead to simpler code and slightly less
  overhead. The unsigned fraction can be stored directly in a ulong when it
  has at most 64 bits. A zero top bit can be used to tag the field as a
  pointer. The pointer could either be to an mpz struct or directly to a
  limb array where the first two limbs encode the allocation and used size.
  There should probably be a recycling mechanism as for fmpz.

  Required work:
    memory allocation code
    conversions to/from various integer types
    rounding/normalization
    addition
    subtraction
    comparison
    multiplication
    fixing any code that accesses the exponent and mantissa directly as
      integers

  Lower priority:
    low-level division and square root (these are not as critical for
      performance -- it is ok to do them by converting to integers and back)
    direct low-level code for addmul, mul_ui, etc.

* Native string conversion code instead of relying on mpfr (so we can have
  big exponents, etc.).

* Add functions for sloppy arithmetic (non-exact rounding). This could be
  used to speed up some ball operations with inexact output, where we don't
  need the best possible result, just a correct error bound.

* Write functions that ignore the possibility that exponents might be large,
  and use them where appropriate (e.g. in polynomial and matrix
  multiplication, where one bounds the magnitudes in an initial pass).

* Rewrite fmprb_div (similarly to fmprb_mul).

Polynomial and power series arithmetic
-------------------------------------------------------------------------------

* Verify that mullow and the power series methods always truncate the inputs
  to length n (see the sketch after this section).

* Handle all input of the special form a*x^n + b quickly in composition and
  powering.

* Implement the addition and convolution methods for Taylor shifts.

* Add polynomial mulmid, and use it in Newton iteration.

* Tune basecase/Newton selection for the exp/sin/cos series (the basecase
  algorithms are more stable, and faster for quite large n).

* Look at using the exponential to compute the complex sine/cosine series.

* Improve block multiplication, e.g. by discarding blocks that don't
  contribute to the result, and by scaling individual blocks.
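Below is a minimal standalone sketch of the truncation rule referred to in
the first item above. It uses plain C over doubles with made-up names (not
the library's fmprb_poly interface): coefficients at index n or higher in
either input can only contribute to output terms of degree >= n, so both
inputs can be cut to length n before multiplying.

    #include <stdio.h>

    /* Hypothetical illustration, not the library's mullow: multiply
       polynomials a (length alen) and b (length blen), keeping only the
       first n coefficients of the product. */
    static void mullow_sketch(double *res, const double *a, long alen,
                              const double *b, long blen, long n)
    {
        long i, j;

        /* Truncate the inputs: a[i] with i >= n can only contribute to
           output terms of degree >= n, which are discarded anyway. */
        if (alen > n) alen = n;
        if (blen > n) blen = n;

        for (i = 0; i < n; i++)
            res[i] = 0.0;

        for (i = 0; i < alen; i++)
            for (j = 0; j < blen && i + j < n; j++)
                res[i + j] += a[i] * b[j];
    }

    int main(void)
    {
        double a[4] = {1, 2, 3, 4};  /* 1 + 2x + 3x^2 + 4x^3 */
        double b[4] = {5, 6, 7, 8};  /* 5 + 6x + 7x^2 + 8x^3 */
        double res[2];

        mullow_sketch(res, a, 4, b, 4, 2);  /* product mod x^2 */
        printf("%g + %g*x + O(x^2)\n", res[0], res[1]);  /* 5 + 16*x */
        return 0;
    }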
Elementary functions
-------------------------------------------------------------------------------

* Add more transcendental functions.

* Double-check the error bounds used in the fixed-point exponential code.

* Faster elementary functions at low precision (especially log/arctan). Use
  Brent's algorithm (http://maths-people.anu.edu.au/~brent/pd/RNC7t4.pdf):

      atan(x) = atan(p/q) + atan((q*x - p)/(q + p*x))

  (a numerical sketch of this reduction is included at the end of this file).

* Use the complex Newton iteration for cos(pi p/q) when appropriate.
  Double-check the proof of correctness of the complex Newton iteration, and
  make it work when the polynomial is not exact.

* For small cos(pi p/q) and sin(pi p/q), use a lookup table of the 1/q
  values and then do complex binary exponentiation.

* Investigate using Chebyshev polynomials for elefun_cos_minpoly. This is
  certainly faster when n is prime, but might be faster for all n, at least
  if implemented cleverly.

Special functions
-------------------------------------------------------------------------------

* Write a faster logarithmic rising factorial (with correct branch cuts) for
  reducing the complex log gamma function. Also implement the logarithmic
  reflection formula.

* Tune zeta algorithm selection.

* Extend the Stirling series code to compute polygamma functions (i.e.
  starting the series from some derivative), and optimize for a small number
  of derivatives by using a direct recurrence instead of binary splitting.

* Fall back to the real code when evaluating gamma functions (or their power
  series) at points that happen to be real.

* Implement more functions: error functions, Bessel functions, theta
  functions, etc.

Other
-------------------------------------------------------------------------------

* Document fmpz_extras.
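Code sketches
-------------------------------------------------------------------------------

The following is a rough layout sketch for the proposed fmpr rewrite under
"Core arithmetic" above. All names and the exact field types are
illustrative guesses, not the library's actual definitions; it only shows
the idea of a separate sign field plus a mantissa word holding either a
top-aligned fraction or a tagged pointer.

    #include <limits.h>
    #include <stdio.h>

    typedef struct
    {
        int sign;           /* separate sign field (3 words total) */
        unsigned long man;  /* top-aligned fraction, or tagged pointer */
        long exp;           /* exponent points to the top of the fraction */
    } fmpr_sketch_struct;

    #define MAN_TOP_BIT (1UL << (sizeof(unsigned long) * CHAR_BIT - 1))

    /* A normalized inline fraction always has its top bit set, so a clear
       top bit tags the field as a pointer (to an mpz struct, or to a limb
       array whose first two limbs encode the allocated and used sizes). */
    static int man_is_inline(const fmpr_sketch_struct *x)
    {
        return (x->man & MAN_TOP_BIT) != 0;
    }

    int main(void)
    {
        fmpr_sketch_struct x = {1, MAN_TOP_BIT | 1UL, -3};
        printf("stored inline: %d\n", man_is_inline(&x));
        return 0;
    }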
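And here is a quick numerical check (plain C doubles, using libm's atan; the
specific x, p, q values are arbitrary) of the argument reduction from the
Brent reference under "Elementary functions" above: choosing p/q close to x
makes the residual argument tiny, so its series converges fast, while
atan(p/q) itself can come from a lookup table.

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* atan(x) = atan(p/q) + atan((q*x - p)/(q + p*x)),
           valid here since x > 0 and p/q > 0 (no branch wraparound). */
        double x = 0.7389056;
        double p = 47.0, q = 64.0;             /* p/q approximates x */
        double r = (q * x - p) / (q + p * x);  /* small residual argument */

        printf("atan(x)  = %.17g\n", atan(x));
        printf("reduced  = %.17g\n", atan(p / q) + atan(r));
        printf("residual = %.3g\n", r);
        return 0;
    }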