Core arithmetic
-------------------------------------------------------------------------------

* Consider changing the interface of functions such as X_set_Y, X_neg_Y to
  always take a precision parameter (and get rid of X_set_round_Y,
  X_neg_round_Y, etc.). Perhaps have X_setexact_Y methods for convenience, or
  make an exception for _set_ in particular.

* Make sure that excessive shifts in add/sub are detected with exact
  precision. Write tests for correctness of overlaps/contains in
  huge-exponent cases.

* Double-check correctness of the add/sub code with large shifts
  (rounding x + eps).

* Work out semantics for comparisons/overlap/containment checks when NaNs
  are involved, and write test code.

* Add adjustment code for balls (when the mantissa is much more precise than
  the error bound, it can be truncated). Also, try to work out more
  consistent semantics for ball arithmetic (with regard to extra working
  precision, etc.).

* Do a low-level rewrite of the fmpr type (a layout sketch is included at
  the end of this file). The mantissa should probably be changed to an
  unsigned, top-aligned fraction (i.e. the exponent will point to the top
  rather than the bottom, and the top bit of the fraction will always be
  set). This requires a separate sign field, increasing the struct size from
  2 to 3 words, but ought to lead to simpler code and slightly less
  overhead. The unsigned fraction can be stored directly in a ulong when it
  has at most 64 bits. A zero top bit can be used to tag the field as a
  pointer. The pointer could either be to an mpz struct or directly to a
  limb array where the first two limbs encode the allocation and used size.
  There should probably be a recycling mechanism as for fmpz.

  Required work:
    memory allocation code
    conversions to/from various integer types
    rounding/normalization
    addition
    subtraction
    comparison
    multiplication
    fixing any code that accesses the exponent and mantissa directly as
      integers

  Lower priority:
    low-level division and square root (these are not as critical for
      performance -- it is ok to do them by converting to integers and back)
    direct low-level code for addmul, mul_ui, etc.

* Native string conversion code instead of relying on mpfr (so we can have
  big exponents, etc.).

* Add functions for sloppy arithmetic (non-exact rounding). This could be
  used to speed up some ball operations with inexact output, where we don't
  need the best possible result, just a correct error bound.

* Write functions that ignore the possibility that exponents might be large,
  and use them where appropriate (e.g. in polynomial and matrix
  multiplication, where one bounds the magnitudes in an initial pass).

* Rewrite fmprb_div (similarly to fmprb_mul).

Polynomial and power series arithmetic
-------------------------------------------------------------------------------

* Verify that mullow and the power series methods always truncate the inputs
  to length n (see the sketch after this section).

* Handle all input of the special form a*x^n + b quickly in composition and
  powering.

* Implement the addition and convolution methods for Taylor shifts.

* Add polynomial mulmid, and use it in Newton iteration.

* Tune basecase/Newton selection for the exp/sin/cos series (the basecase
  algorithms are more stable, and faster for quite large n).

* Look at using the exponential to compute the complex sine/cosine series.

* Improve block multiplication, e.g. by discarding blocks that don't
  contribute to the result, and by scaling individual blocks.
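Below is a minimal standalone sketch of the truncation rule referred to in
the first item above. It uses plain C over doubles with made-up names (not
the library's fmprb_poly interface): coefficients at index n or higher in
either input can only contribute to output terms of degree >= n, so both
inputs can be cut to length n before multiplying.

    #include <stdio.h>

    /* Hypothetical illustration, not the library's mullow: multiply
       polynomials a (length alen) and b (length blen), keeping only the
       first n coefficients of the product. */
    static void mullow_sketch(double *res, const double *a, long alen,
                              const double *b, long blen, long n)
    {
        long i, j;

        /* Truncate the inputs: a[i] with i >= n can only contribute to
           output terms of degree >= n, which are discarded anyway. */
        if (alen > n) alen = n;
        if (blen > n) blen = n;

        for (i = 0; i < n; i++)
            res[i] = 0.0;

        for (i = 0; i < alen; i++)
            for (j = 0; j < blen && i + j < n; j++)
                res[i + j] += a[i] * b[j];
    }

    int main(void)
    {
        double a[4] = {1, 2, 3, 4};  /* 1 + 2x + 3x^2 + 4x^3 */
        double b[4] = {5, 6, 7, 8};  /* 5 + 6x + 7x^2 + 8x^3 */
        double res[2];

        mullow_sketch(res, a, 4, b, 4, 2);  /* product mod x^2 */
        printf("%g + %g*x + O(x^2)\n", res[0], res[1]);  /* 5 + 16*x */
        return 0;
    }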
Elementary functions
-------------------------------------------------------------------------------

* Add more transcendental functions.

* Double-check the error bounds used in the fixed-point exponential code.

* Faster elementary functions at low precision (especially log/arctan). Use
  Brent's algorithm (http://maths-people.anu.edu.au/~brent/pd/RNC7t4.pdf):

      atan(x) = atan(p/q) + atan((q*x - p)/(q + p*x))

  (a numerical sketch of this reduction is included at the end of this file).

* Use the complex Newton iteration for cos(pi p/q) when appropriate.
  Double-check the proof of correctness of the complex Newton iteration, and
  make it work when the polynomial is not exact.

* For small cos(pi p/q) and sin(pi p/q), use a lookup table of the 1/q
  values and then do complex binary exponentiation.

* Investigate using Chebyshev polynomials for elefun_cos_minpoly. This is
  certainly faster when n is prime, but might be faster for all n, at least
  if implemented cleverly.

Special functions
-------------------------------------------------------------------------------

* Write a faster logarithmic rising factorial (with correct branch cuts) for
  reducing the complex log gamma function. Also implement the logarithmic
  reflection formula.

* Tune zeta algorithm selection.

* Extend the Stirling series code to compute polygamma functions (i.e.
  starting the series from some derivative), and optimize for a small number
  of derivatives by using a direct recurrence instead of binary splitting.

* Fall back to the real code when evaluating gamma functions (or their power
  series) at points that happen to be real.

* Implement more functions: error functions, Bessel functions, theta
  functions, etc.

Other
-------------------------------------------------------------------------------

* Document fmpz_extras.
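Code sketches
-------------------------------------------------------------------------------

The following is a rough layout sketch for the proposed fmpr rewrite under
"Core arithmetic" above. All names and the exact field types are
illustrative guesses, not the library's actual definitions; it only shows
the idea of a separate sign field plus a mantissa word holding either a
top-aligned fraction or a tagged pointer.

    #include <limits.h>
    #include <stdio.h>

    typedef struct
    {
        int sign;           /* separate sign field (3 words total) */
        unsigned long man;  /* top-aligned fraction, or tagged pointer */
        long exp;           /* exponent points to the top of the fraction */
    } fmpr_sketch_struct;

    #define MAN_TOP_BIT (1UL << (sizeof(unsigned long) * CHAR_BIT - 1))

    /* A normalized inline fraction always has its top bit set, so a clear
       top bit tags the field as a pointer (to an mpz struct, or to a limb
       array whose first two limbs encode the allocated and used sizes). */
    static int man_is_inline(const fmpr_sketch_struct *x)
    {
        return (x->man & MAN_TOP_BIT) != 0;
    }

    int main(void)
    {
        fmpr_sketch_struct x = {1, MAN_TOP_BIT | 1UL, -3};
        printf("stored inline: %d\n", man_is_inline(&x));
        return 0;
    }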
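And here is a quick numerical check (plain C doubles, using libm's atan; the
specific x, p, q values are arbitrary) of the argument reduction from the
Brent reference under "Elementary functions" above: choosing p/q close to x
makes the residual argument tiny, so its series converges fast, while
atan(p/q) itself can come from a lookup table.

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* atan(x) = atan(p/q) + atan((q*x - p)/(q + p*x)),
           valid here since x > 0 and p/q > 0 (no branch wraparound). */
        double x = 0.7389056;
        double p = 47.0, q = 64.0;             /* p/q approximates x */
        double r = (q * x - p) / (q + p * x);  /* small residual argument */

        printf("atan(x)  = %.17g\n", atan(x));
        printf("reduced  = %.17g\n", atan(p / q) + atan(r));
        printf("residual = %.3g\n", r);
        return 0;
    }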