exp2() Base-2 Exponential Function
exp2n() Function
exp10() Base-10 Exponential Function
exp() Base-e Exponential Function
expm1() Base-e Exponential Function
log() Base-e alias Natural Logarithm Function
log1p() Base-e alias Natural Logarithm Function
log10() Base-10 alias Common Logarithm Function
log2() Base-2 alias Binary Logarithm Function
logb() Function
ilogb() Function
cos() (Circular) Cosine Function
cot() (Circular) Cotangent Function
sin() (Circular) Sine Function
tan() (Circular) Tangent Function
acos() Arc Cosine Function
acot() Arc Cotangent Function
acot2() Arc Cotangent Function
asin() Arc Sine Function
atan() Arc Tangent Function
atan2() Arc Tangent Function
cosh() Hyperbolic Cosine Function
coth() Hyperbolic Cotangent Function
sinh() Hyperbolic Sine Function
tanh() Hyperbolic Tangent Function
acosh() Area Hyperbolic Cosine Function
acoth() Area Hyperbolic Cotangent Function
asinh() Area Hyperbolic Sine Function
atanh() Area Hyperbolic Tangent Function
fmax() Function
fmin() Function
hypot() Function
pow() Function
cbrt() Function
ceil() Function
fabs() Function
fdim() Function
floor() Function
fma() Function
fmod() Function
fpclassify() Function
frexp() Function
isfinite() Function
isinf() Function
isnan() Function
isnormal() Function
issubnormal() Function
ldexp() Function
ldexp10() Function
remainder() Function
remquo() Function
rint() Function
round() Function
roundeven() Function
signbit() Function
sqrt() Function
trunc() Function
copysign() Function
modf() Function
nextafter() Function
Elementary mathematical plus other functions defined by the ANSI C, ISO C and POSIX standards, using IEEE 754 floating-point arithmetic.
Positive zero (+0) is represented with sign = 0, exponent = 0 and fraction = 0; negative zero (−0) is represented with sign = 1, exponent = 0 and fraction = 0.
A quiet NaN is represented with either sign, exponent = 2047 and fraction > 2^51 − 1, i.e. the most significant bit of the fraction set;
a signaling NaN is represented with either sign, exponent = 2047 and 0 < fraction < 2^51, i.e. the most significant bit of the fraction clear and at least one other bit set (fraction = 0 with exponent = 2047 denotes ±∞).
The fraction of a non-zero finite floating-point number is either 0 or a rational number from the set {½, ¼, ¾, …, 1/2^52, …, (2^52 − 1)/2^52}.
The significand = integer.fraction of a non-zero finite floating-point number is in the interval [2^−52, 2 − 2^−52], decimal [0.0000000000000002220446049250313080847263336181640625, 1.9999999999999997779553950749686919152736663818359375]; a normalized significand is in the interval [1, 2 − 2^−52].
        
Representable (non-zero finite) floating-point numbers, also called machine numbers, are in the non-contiguous intervals [2^−1074, (2 − 2^−52) × 2^1023] and [−(2 − 2^−52) × 2^1023, −2^−1074]; the full (normalized) 53-bit significand is available on the intervals [2^−1022, (2 − 2^−52) × 2^1023] and [−(2 − 2^−52) × 2^1023, −2^−1022].
        
The largest representable floating-point number, (2 − 2^−52) × 2^1023 = 0.179769… × 10^309, has 309 (integral) decimal digits:
            179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368
        
The smallest representable positive floating-point number, 2^−1074 = 0.494065… × 10^−323, has 1074 fractional decimal digits, 323 zeroes followed by 751 more digits:
            0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004940656458412465441765687928682213723650598026143247644255856825006755072702087518652998363616359923797965646954457177309266567103559397963987747960107818781263007131903114045278458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566867681870395603106249319452715914924553293054565444011274801297099995419319894090804165633245247571478690147267801593552386115501348035264934720193790268107107491703332226844753335720832431936092382893458368060106011506169809753078342277318329247904982524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993664809941164205702637090279242767544565229087538682506419718265533447265625
        
The largest representable subnormal floating-point number, (1 − 2^−52) × 2^−1022 = 2^−1022 − 2^−1074 = 0.222507… × 10^−307, has 1074 fractional decimal digits, 307 zeroes followed by 767 more digits:
            0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000022250738585072008890245868760858598876504231122409594654935248025624400092282356951787758888037591552642309780950434312085877387158357291821993020294379224223559819827501242041788969571311791082261043971979604000454897391938079198936081525613113376149842043271751033627391549782731594143828136275113838604094249464942286316695429105080201815926642134996606517803095075913058719846423906068637102005108723282784678843631944515866135041223479014792369585208321597621066375401613736583044193603714778355306682834535634005074073040135602968046375918583163124224521599262546494300836851861719422417646455137135420132217031370496583210154654068035397417906022589503023501937519773030945763173210852507299305089761582519159720757232455434770912461317493580281734466552734375
        
Note: the decimal representation of 2^−n has n fractional digits!
        
To maintain the working precision and gain correctly rounded results, calculations are performed with 3 extra bits beyond the least significant bit of the fraction: a guard bit, a round bit and a sticky bit (alias inexact flag).
        
Basic arithmetic operations, i.e. addition, subtraction, multiplication, division, fused multiply-accumulate, plus square root on (representable) floating-point numbers, including the special values +∞, −∞ and NaN, are performed as if their mathematically exact (infinitely precise) result were calculated, then mapped or rounded to a representable floating-point number.
        
Arithmetic underflow yields +0 or −0, arithmetic overflow yields +∞ or −∞, operations on NaNs as well as mathematically undefined operations yield NaN, with the notable exception that division of a non-zero finite floating-point number by ±0 yields ±∞, and non-zero finite results are rounded according to the selected rounding mode:
        
tie-break) towards zero for even ⌊precise × 2^52⌋ and away from zero for odd ⌊precise × 2^52⌋ (i.e. even ⌈precise × 2^52⌉);
round to nearest, ties to even!
For arithmetic operations on special values the following identities are defined:
The maximum (relative) error of a faithfully rounded result is less than 1 ULP; the maximum (relative) error of a correctly rounded result is less than ½ ULP.
Note: in all rounding modes, a faithfully rounded result is either equal to the correctly rounded result or 1 ULP off of the correctly rounded result.
The following program uses nextafter() to test whether the following mathematical identities hold for various elementary functions and the (correctly rounded) values of some of the constants M_* defined by the ANSI C, ISO C and POSIX standards:
// Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
// * The software is provided "as is" without any warranty, neither
//   express nor implied.
// * In no event will the author be held liable for any damage(s) arising
//   from the use of the software.
// * Redistribution of the software is allowed only in unmodified form.
// * Permission is granted to use the software solely for personal private
//   and non-commercial purposes.
// * An individuals use of the software in his or her capacity or function
//   as an agent, (independent) contractor, employee, member or officer of
//   a business, corporation or organization (commercial or non-commercial)
//   does not qualify as personal private and non-commercial purpose.
// * Without written approval from the author the software must not be used
//   for a business, for commercial, corporate, governmental, military or
//   organizational purposes of any kind, or in a commercial, corporate,
//   governmental, military or organizational environment of any kind.
#include <math.h>
#include <stdio.h>
void evaluate(double (*function)(double), double input, double reference)
{
    double output = (function)(input);
    if (output == reference)
        printf("%.17g is correctly rounded\n", output);
    else if (nextafter(output, reference) == reference)
        printf("%.17g is faithfully rounded\n", output);
    else
        printf("%.17g is …\n", output);
}
int main(void)
{
    double last, next = 0.0;
    printf("sqrt(-0): ");
    evaluate(sqrt, -0.0, -0.0);
#ifdef INFINITY
    printf("log(0): ");
    evaluate(log, 0.0, -INFINITY);
#endif
#ifdef M_PI
    printf("acos(0): ");
    evaluate(acos, 0.0, M_PI / 2.0);
    printf("acos(0.5): ");
    evaluate(acos, 0.5, M_PI / 3.0);
    printf("asin(0.5): ");
    evaluate(asin, 0.5, M_PI / 6.0);
#endif
#ifdef M_SQRT1_2
    printf("sqrt(0.5): ");
    evaluate(sqrt, 0.5, M_SQRT1_2);
#endif
#ifdef M_PI_2
    printf("asin(1): ");
    evaluate(asin, 1.0, M_PI_2);
#endif
#ifdef M_PI_4
    printf("atan(1): ");
    evaluate(atan, 1.0, M_PI_4);
#endif
#ifdef M_E
    printf("exp(1): ");
    evaluate(exp, 1.0, M_E);
#endif
    printf("log(1): ");
    evaluate(log, 1.0, 0.0);
    printf("log2(1): ");
    evaluate(log2, 1.0, 0.0);
    printf("log10(1): ");
    evaluate(log10, 1.0, 0.0);
#ifdef M_SQRT2
    printf("sqrt(2): ");
    evaluate(sqrt, 2.0, M_SQRT2);
#endif
    printf("log2(2): ");
    evaluate(log2, 2.0, 1.0);
#ifdef M_LN2
    printf("log(2): ");
    evaluate(log, 2.0, M_LN2);
    printf("exp(%.17g): ", M_LN2);
    evaluate(exp, M_LN2, 2.0);
#endif
    printf("log10(10): ");
    evaluate(log10, 10.0, 1.0);
#ifdef M_LN10
    printf("log(10): ");
    evaluate(log, 10.0, M_LN10);
    printf("exp(%.17g): ", M_LN10);
    evaluate(exp, M_LN10, 10.0);
#endif
#ifdef M_LOG2E
    printf("exp2(%.17g): ", M_LOG2E);
    evaluate(exp2, M_LOG2E, M_E);
#endif
#ifdef M_LOG10E
    printf("exp10(%.17g): ", M_LOG10E);
    evaluate(exp10, M_LOG10E, M_E);
#endif
#ifdef M_E
    printf("log(%.17g): ", M_E);
// log(2.7182818284590452) = 1.0
    evaluate(log, M_E, 1.0);
#ifdef M_LOG2E
    printf("log2(%.17g): ", M_E);
// log2(2.7182818284590452) = 1.0 / log(2.0)
    evaluate(log2, M_E, M_LOG2E);
#endif
#ifdef M_LOG10E
    printf("log10(%.17g): ", M_E);
// log10(2.7182818284590452) = 1.0 / log(10.0)
    evaluate(log10, M_E, M_LOG10E);
#endif
#endif
#ifdef M_PI_4
#ifdef M_SQRT1_2
    printf("cos(%.17g): ", M_PI_4);
    evaluate(cos, M_PI_4, M_SQRT1_2);
    printf("sin(%.17g): ", M_PI_4);
    evaluate(sin, M_PI_4, M_SQRT1_2);
#endif
    printf("tan(%.17g): ", M_PI_4);
    evaluate(tan, M_PI_4, 1.0);
#endif
#ifdef M_PI_2
    printf("cos(%.17g): ", M_PI_2);
// cos(1.5707963267948966) = 6.123233995736766e-17
    evaluate(cos, M_PI_2, 0.0);
    printf("sin(%.17g): ", M_PI_2);
    evaluate(sin, M_PI_2, 1.0);
#ifdef INFINITY
    printf("tan(%.17g): ", M_PI_2);
// tan(1.5707963267948966) = 1.633123935319537e16
    evaluate(tan, M_PI_2, INFINITY);
#endif
#endif
#ifdef M_PI
    printf("cos(%.17g): ", M_PI);
    evaluate(cos, M_PI, -1.0);
    printf("sin(%.17g): ", M_PI);
// sin(3.1415926535897932) = 1.2246467991473532e-16
    evaluate(sin, M_PI, 0.0);
    printf("tan(%.17g): ", M_PI);
    evaluate(tan, M_PI, 0.0);
#endif
    do next = cos(last = next);
    while (next != last);
    printf("cos(%.17g): ", last);
// cos(0.73908513321516064) = 0.73908513321516064
    evaluate(cos, last, 0.73908513321516064);
    printf("acos(%.17g): ", last);
// acos(0.73908513321516064) = 0.73908513321516064
    evaluate(acos, last, 0.73908513321516064);
}
            A003957 - OEIS
cc evaluate.c -lm
./a.out
sqrt(-0): -0 is correctly rounded
log(0): -inf is correctly rounded
acos(0): 1.5707963267948966 is correctly rounded
acos(0.5): 1.0471975511965979 is faithfully rounded
asin(0.5): 0.52359877559829893 is faithfully rounded
sqrt(0.5): 0.70710678118654757 is correctly rounded
asin(1): 1.5707963267948966 is correctly rounded
atan(1): 0.78539816339744828 is correctly rounded
exp(1): 2.7182818284590451 is correctly rounded
log(1): 0 is correctly rounded
log2(1): 0 is correctly rounded
log10(1): 0 is correctly rounded
sqrt(2): 1.4142135623730951 is correctly rounded
log2(2): 1 is correctly rounded
log(2): 0.69314718055994529 is correctly rounded
exp(0.69314718055994529): 2 is correctly rounded
log10(10): 1 is correctly rounded
log(10): 2.3025850929940459 is correctly rounded
exp(2.3025850929940459): 10.000000000000002 is faithfully rounded
exp2(1.4426950408889634): 2.7182818284590451 is correctly rounded
exp10(0.43429448190325182): 2.7182818284590451 is correctly rounded
log(2.7182818284590451): 1 is correctly rounded
log2(2.7182818284590451): 1.4426950408889634 is correctly rounded
log10(2.7182818284590451): 0.43429448190325182 is correctly rounded
cos(0.78539816339744828): 0.70710678118654757 is correctly rounded
sin(0.78539816339744828): 0.70710678118654746 is faithfully rounded
tan(0.78539816339744828): 0.99999999999999989 is faithfully rounded
cos(1.5707963267948966): 6.123233995736766e-17 is …
sin(1.5707963267948966): 1 is correctly rounded
tan(1.5707963267948966): 16331239353195370 is …
cos(3.1415926535897931): -1 is correctly rounded
sin(3.1415926535897931): 1.2246467991473532e-16 is …
tan(3.1415926535897931): -1.2246467991473532e-16 is …
cos(0.73908513321516067): 0.73908513321516067 is correctly rounded
acos(0.73908513321516067): 0.73908513321516056 is faithfully rounded
Note: the (correctly rounded) value of the constant M_PI = 0x1.921FB54442D18p+1 = 3.1415926535897932 alias machine π is about 0x1.1A62633145C07p−53 = 1.2246467991473532e−16 less than the exact value of π, and the (correctly rounded) value of the constant M_PI_2 = 0x1.921FB54442D18p+0 = 1.5707963267948966 is about 0x1.1A62633145C07p−54 = 6.123233995736766e−17 less than the exact value of π/2; this is why sin(M_PI) and cos(M_PI_2) yield these small positive values instead of 0.
As shown by William Kahan (nearpi.c), the double-precision floating-point number that is closest to an integral multiple of π/2 is the (integral) number 6381956970095103 × 2^797 = 0x1.6AC5B262CA1FFp+849 = 5.319372648326541416707296656673541083813475…e+255, which is about 4.68716592425462761112…e−19 less than the exact integral multiple of π/2; the maximum value of the double-precision tangent is therefore about 2.13348538575370384368…e+18.
        
// Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
#define FLT_RADIX        2
#define FLT_ROUNDS       1 // round to nearest, ties to even
#define FP_ILOGB0        (-2147483647 - 1) // INT_MIN; a plain -2147483648 would not have type int
#define FP_ILOGBNAN      1024
#define FP_ZERO          0
#define FP_SUBNORMAL     1
#define FP_NORMAL        2
#define FP_INFINITE      3
#define FP_NAN           4
#ifndef INFINITY
#define INFINITY         (1.0 / 0.5e-323)
#endif
#define INDEFINITE       (0.0 * INFINITY)
#define MATH_ERREXCEPT   1
#define MATH_ERRNO       0
#define math_errhandling (MATH_ERREXCEPT | MATH_ERRNO)
double acos(double argument);
double acosh(double argument);
double acot(double argument);
double acot2(double y, double x);
double acoth(double argument);
double asin(double argument);
double asinh(double argument);
double atan(double argument);
double atan2(double y, double x);
double atanh(double argument);
double ceil(double argument);
double copysign(double to, double from);
double cos(double radians);
double exp(double argument);
double expm1(double argument);
double exp10(double argument);
double exp2(double argument);
double exp2n(int exponent);
double fabs(double argument);
double fdim(double left, double right);
double floor(double argument);
double fma(double multiplicand, double multiplier, double addend);
double fmax(double left, double right);
double fmin(double left, double right);
double fmod(double dividend, double divisor);
int fpclassify(double argument);
double frexp(double argument, int *exponent);
double hypot(double left, double right);
int ilogb(double argument);
int isfinite(double argument);
int isinf(double argument);
int isnan(double argument);
int isnormal(double argument);
int issubnormal(double argument);
double ldexp(double argument, int exponent);
double ldexp10(double argument, int exponent);
double log(double argument);
double log1p(double argument);
double log10(double argument);
double log2(double argument);
double logb(double argument);
double modf(double argument, double *integer);
double nextafter(double from, double to);
double remainder(double dividend, double divisor);
double remquo(double dividend, double divisor, int *quotient);
double rint(double argument);
double round(double argument);
int signbit(double argument);
double sin(double radians);
double sqrt(double radicand);
double tan(double radians);
double trunc(double argument);
Note: indicated by the value 1 of the preprocessor macro FLT_ROUNDS, the functions presented here require the default rounding mode round to nearest, ties to even!
Note: indicated by the value 0 of the preprocessor macro MATH_ERRNO, the functions presented here don’t set the (global) errno variable!
        
The exponential function with base r, alias antilog, exhibits the identities r^(a+b) = r^a × r^b, r^(log_r c) = c and r^(d × log_r c) = c^d.
The exponential function can be approximated by a (minimax) polynomial on any sufficiently small interval with high accuracy, for example faithfully rounded, as shown hereafter.
exp2() Base-2 Exponential Function
exp2() returns the base-2 exponential of its argument.
For −1075 < x = y + z < 1024, with z = ⌊x⌋, i.e. x rounded down towards −∞, hence 0 ≤ y ≤ 1, calculation of 2^x = 2^(y+z) = 2^y × 2^z is reduced to the (polynomial) approximation of 2^y on the interval [0, 1], followed by the (trivial) multiplication with 2^z.
// Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double floor(double x);
double ldexp(double x, int z);
// Faithfully rounded base-2 exponential
double exp2(double x)
{
    double z;
#ifdef OPTIONAL
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)
#define M_SQRT2    1.41421356237309505
#define M_1_SQRT2  0.70710678118654752
    if (x != x)
        return INDEFINITE;
    if (x <= -1075.0)
        return 0.0;
    if (x == -1.0)
        return 0.5;
    if (x == -0.5)
        return M_1_SQRT2;
    if (x == 0.0)
        return 1.0;
    if (x == 0.5)
        return M_SQRT2;
    if (x == 1.0)
        return 2.0;
    if (x >= 1024.0)
        return INFINITY;
#endif
    // for z = floor(x) and x' = x - z, 2**x = 2**(x' + z)
    //                                       = 2**x' * 2**z
    z = floor(x);
    x -= z;
    // for 0 <= x' <= 1.0,
    // a minimax polynomial of degree 11 approximates 2**x'
    // with relative error 3.0545878321297965e-18 < 2**-58
    return ldexp(((((((((((+6.2724342467963420e-10 * x
                           +6.5544572890888113e-9) * x
                           +1.0254457347176946e-7) * x
                           +1.3208193500307799e-6) * x
                           +1.5253190248422251e-5) * x
                           +1.5403511446514356e-4) * x
                           +1.3333558661574856e-3) * x
                           +9.6181290987926433e-3) * x
                           +5.5504108665711137e-2) * x
                           +2.4022650695905471e-1) * x
                           +6.9314718055994623e-1) * x
                           +1.0, (int) z);
}
Note: overflow and underflow are handled by the ldexp() alias scalbn() function!
For −1075 < x = y + z < 1024, with z = ⌊x+½⌋ for x > 0 and z = ⌈x−½⌉ for x < 0, i.e. x rounded to the nearest (even) integral number, hence −½ ≤ y ≤ ½, calculation of 2^x = 2^(y+z) = 2^y × 2^z is reduced to the (polynomial) approximation of 2^y on the interval [−½, ½], followed by the (trivial) multiplication with 2^z.
# Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# Faithfully rounded base-2 exponential
# CAVEAT: requires default (round to nearest, ties to even) rounding mode!
# exp2(-INFINITY) = 0
# exp2(0)         = 1
# exp2(1)         = 2
# exp2(INFINITY)  = INFINITY
# exp2(x)         = 2**x
#                 = 2**(x - z) * 2**z, -1075 < z = rint(x) < 1024
# exp2(-x)        = 1 / exp2(x)
#                 = 1 / 2**x
#                 = (1 / 2)**x
# IEEE 754 double-precision binary floating-point format:
# - 1-bit sign,
# - 11-bit characteristic is 1023 + exponent,
# - 53-bit significand is 0.fraction if 0 = characteristic,
#                         1.fraction if 0 < characteristic < 2047,
#                         1.anything if     characteristic = 2047,
# - integer bit of significand is implied and not stored
#
# binary64 = (-1)**sign * significand * 2**(characteristic - 1023)
.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
exp2:
	xorpd	xmm1, xmm1		# xmm1 = 0.0
	comisd	xmm1, xmm0
	jz	.Lspecial		# argument = ±0.0?
					# argument = INDEFINITE?
.ifdef SSE4_1
	roundsd	xmm1, xmm0, 0		# xmm1 = argument rounded to nearest (even) integer
	cvtsd2si eax, xmm1		# eax = lrint(argument)
.else
	cvtsd2si eax, xmm0		# eax = lrint(argument)
.endif
#	neg	eax
#	jo	.Lrange			# argument > maximum 32-bit integer?
#					# argument < minimum 32-bit integer?
#	neg	eax
	cmp	eax, 1 - 52 - BIAS
	jl	.Lunderflow		# argument < -1074.0?
					# argument < minimum 32-bit integer?
					# argument > maximum 32-bit integer?
	cmp	eax, BIAS
	jg	.Loverflow		# argument > 1023.0?
	cvtsi2sd xmm1, eax		# xmm1 = rint(argument)
					#      = log2(scale factor)
	subsd	xmm0, xmm1		# xmm0 = argument - rint(argument)
					#      = argument' in [-0.5, 0.5]
.Lhorner:
	mov	rcx, 0x3DFE7AA0E43A8B3C
	movq	xmm1, rcx		# xmm1 = 0x1.E7AA0E43A8B3Cp-32
					#      = 4.435280790456428e-10
	mulsd	xmm1, xmm0
	mov	rdx, 0x3E3E620FB7BAEC69
	movq	xmm2, rdx		# xmm2 = 0x1.E620FB7BAEC69p-28
					#      = 7.074105630863329e-9
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x3E7B526788BF2851
	movq	xmm1, rcx		# xmm1 = 0x1.B526788BF2851p-24
					#      = 1.0178198034320939e-7
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3EB62BFC3C1C57DD
	movq	xmm2, rdx		# xmm2 = 0x1.62BFC3C1C57DDp-20
					#      = 1.3215433089567188e-6
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x3EEFFCBFBA7B8470
	movq	xmm1, rcx		# xmm1 = 0x1.FFCBFBA7B847p-17
					#      = 1.5252733489958518e-5
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3F243091310BF6C4
	movq	xmm2, rdx		# xmm2 = 0x1.43091310BF6C4p-13
					#      = 1.5403530462514668e-4
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x3F55D87FE78CF26E
	movq	xmm1, rcx		# xmm1 = 0x1.5D87FE78CF26Ep-10
					#      = 1.3333558146789953e-3
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3F83B2AB6FB9F413
	movq	xmm2, rdx		# xmm2 = 0x1.3B2AB6FB9F413p-7
					#      = 9.618129107588335e-3
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x3FAC6B08D7049FD0
	movq	xmm1, rcx		# xmm1 = 0x1.C6B08D7049FDp-5
					#      = 5.5504108664819921e-2
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3FCEBFBDFF82C5AD
	movq	xmm2, rdx		# xmm2 = 0x1.EBFBDFF82C5ADp-3
					#      = 2.4022650695910156e-1
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x3FE62E42FEFA39EF
	movq	xmm1, rcx		# xmm1 = 0x1.62E42FEFA39EFp-1
					#      = 6.9314718055994533e-1
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3FF0000000000000
	movq	xmm0, rdx		# xmm0 = 0x1.0p+0
					#      = 1.0
	addsd	xmm0, xmm1		# xmm0 = polynomial(argument')
.Lscale:
	add	eax, BIAS		# eax = biased exponent of scale factor
	jle	.Ldenormal
.Lnormal:
	shl	rax, 52
	movq	xmm1, rax		# xmm1 = 2.0**unbiased exponent
					#      = scale factor
	mulsd	xmm0, xmm1		# xmm0 = polynomial(argument')
					#      * scale factor
					#      = exp2(argument)
	ret
.Ldenormal:
	add	eax, 51			# eax = 51 + biased exponent of denormal scale factor
					#     = index of '1' bit in mantissa
	xor	edx, edx
	bts	rdx, rax		# rdx = denormal scale factor
	movq	xmm1, rdx		# xmm1 = denormal scale factor
	mulsd	xmm0, xmm1		# xmm0 = polynomial(argument')
					#      * denormal scale factor
					#      = exp2(argument)
	ret
.Lunderflow:
#	comisd	xmm1, xmm0
#	jb	.Loverflow		# argument > 0.0?
#
#	xorpd	xmm0, xmm0		# xmm0 = 0.0
#					#      = exp2(<=-1074.0)
#	ret
.Loverflow:
#	mov	rax, 0x7FF0000000000000
#	movq	xmm0, rax		# xmm0 = 0x1.0p+1024
#					#      = INFINITY
#					#      = exp2(>=1024.0)
#	ret
.Lrange:
	comisd	xmm1, xmm0
	sbb	eax, eax		# eax = (argument < 0.0) ? 0 : -1
	shr	eax, 21			# rax = (argument < 0.0) ? 0 : 0x7FF
	shl	rax, 52			# rax = (argument < 0.0) ? 0 : 0x7FF0000000000000
	movq	xmm0, rax		# xmm0 = (argument < 0.0) ? 0.0 : 0x1.0p+1024
					#      = (argument < 0.0) ? 0.0 : INFINITY
	ret
.Lspecial:
	jp	.Lexit			# argument = INDEFINITE?
.Lzero:
	mov	rax, 0x3FF0000000000000
	movq	xmm0, rax		# xmm0 = 0x1.0p+0
					#      = 1.0
					#      = exp2(±0.0)
.Lexit:
	ret
.size	exp2, .-exp2
.type	exp2, @function
.global	exp2
.end
Note: the trivial transformation of the assembler sources with directives for Unix’ or GNU’s as into assembler sources for Microsoft’s ML.EXE or ML64.EXE and vice versa is left as an exercise to the reader.
Microsoft Macro Assembler Reference
; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/mt720713.aspx
; exp2(x) = 2**x
	.686
	.model	flat, C
	.code
exp2	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = exponent
if 0
	fld1				; st(0) = 1.0,
					; st(1) = exponent
	fld	st(1)			; st(0) = exponent,
					; st(1) = 1.0,
					; st(2) = exponent
	fprem				; st(0) = exponent modulo 1.0,
					; st(1) = 1.0,
					; st(2) = exponent
	f2xm1				; st(0) = 2.0**(exponent modulo 1.0) - 1.0,
					; st(1) = 1.0,
					; st(2) = exponent
	faddp	st(1), st(0)		; st(0) = 2.0**(exponent modulo 1.0),
					; st(1) = exponent
	fscale				; st(0) = 2.0**exponent,
					; st(1) = exponent
else
	fld	st(0)			; st(0) = st(1) = exponent
	frndint				; st(0) = integer(exponent),
					; st(1) = exponent
	fsub	st(1), st(0)		; st(0) = integer(exponent),
					; st(1) = fraction(exponent)
	fxch	st(1)			; st(0) = fraction(exponent),
					; st(1) = integer(exponent)
	f2xm1				; st(0) = 2.0**fraction(exponent) - 1.0,
					; st(1) = integer(exponent)
	fld1				; st(0) = 1.0,
					; st(1) = 2.0**fraction(exponent) - 1.0,
					; st(2) = integer(exponent)
	faddp	st(1), st(0)		; st(0) = 2.0**fraction(exponent),
					; st(1) = integer(exponent)
	fscale				; st(0) = 2.0**exponent,
					; st(1) = integer(exponent)
endif
	fstp	st(1)			; st(0) = 2.0**exponent
	ret
exp2	endp
	end
exp2n() Function
exp2n(‹integer›) is equivalent to ldexp(1.0, ‹integer›).
// Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
#define INFINITY (1.0 / 0.5e-323)
double exp2n(int exponent)
{
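    // type punning through a union avoids the strict aliasing
    // violation of a pointer cast
    union {
        unsigned long long ull;
        double             dbl;
    } u;
    if (exponent > 1023)
        return INFINITY;
    if (exponent < -1074)
        return 0.0;
    if (exponent < -1022) {
        // subnormal: single bit in the fraction
        u.ull = 1;
        u.ull <<= 1074 + exponent;
    } else {
        // normal: biased exponent, fraction = 0
        u.ull = 1023 + exponent;
        u.ull <<= 52;
    }
    return u.dbl;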
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# Unix System V calling convention for AMD64 platform:
# - first 6 floating-point arguments (from left to right) are passed in
#   registers XMM0 to XMM5;
# - first 6 integer or pointer arguments (from left to right) are passed
#   in registers RDI/R7, RSI/R6, RDX/R2, RCX/R1, R8 and R9
#   (R10 is used as static chain pointer in case of nested functions);
# - surplus arguments are pushed on stack in reverse order (from right to
#   left), 8-byte aligned;
# - 128-bit integer arguments are passed as pair of 64-bit integer arguments,
#   low part before/below high part;
# - 128-bit integer result is returned in registers RAX/R0 (low part) and
#   RDX/R2 (high part);
# - 64-bit integer or pointer result is returned in register RAX/R0;
# - 32-bit integer result is returned in register EAX;
# - floating-point result is returned in register XMM0;
# - registers RBX/R3, RSP/R4, RBP/R5 and R12 to R15 must be preserved;
# - registers RAX/R0, RCX/R1, RDX/R2, RSI/R6, RDI/R7, R8, R9, R10 (in
#   case of normal functions), R11 and XMM0 to XMM15 are volatile and can
#   be clobbered;
# - stack is 16-byte aligned: callee must decrement RSP by 8+n*16 bytes
#   before calling other functions (CALL instruction pushes 8 bytes);
# - a "red zone" of 128 bytes below the stack pointer can be clobbered.
# exp2n(<-1074) = 0
# exp2n(0)      = 1
# exp2n(>1023)  = INFINITY
# exp2n(n)      = 2**n
# exp2n(-n)     = 1 / exp2n(n)
#               = 1 / 2**n
#               = (1 / 2)**n
.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# edi = exponent
exp2n:
	mov	eax, edi		# eax = exponent
	cmp	eax, BIAS
	jg	.Loverflow		# exponent > 1023?
	cmp	eax, 1 - 52 - BIAS
	jl	.Lunderflow		# exponent < -1074?
	add	eax, BIAS		# eax = biased exponent
	jg	.Lnormal		# biased exponent > 0?
.Ldenormal:
	add	eax, 51			# eax = index of '1' bit in mantissa
	xor	edi, edi
	bts	rdi, rax		# rdi = denormal 2.0**exponent
	movq	xmm0, rdi		# xmm0 = denormal 2.0**exponent
	ret
.Loverflow:
	mov	eax, 1 + 2 * BIAS
					# rax = biased exponent
					#     = 2047
.Lnormal:
	shl	rax, 52
	movq	xmm0, rax		# xmm0 = 2.0**exponent
	ret
.Lunderflow:
	xorpd	xmm0, xmm0		# xmm0 = 0.0
					#      = exp2n(<-1074)
	ret
.size	exp2n, .-exp2n
.type	exp2n, @function
.global	exp2n
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; Microsoft calling convention for AMD64 platform:
; - first 4 arguments (from left to right) are passed in registers
;   RCX/R1 or XMM0, RDX/R2 or XMM1, R8 or XMM2, and R9 or XMM3,
;   depending on their type (for floating-point arguments of
;   unprototyped or variadic functions, where argument type
;   expected by callee is unknown, both registers are used);
; - arguments larger than 8 bytes are passed by reference;
; - surplus arguments are pushed on stack in reverse order (from
;   right to left), 8-byte aligned;
; - caller allocates memory for return value larger than 8 bytes and
;   passes pointer to it as (hidden) first argument, thus shifting
;   all other arguments;
; - caller always allocates "home space" for 4 arguments on stack,
;   even when less than 4 arguments are passed, but does not need to push
;   first 4 arguments;
; - callee can spill first 4 arguments from registers to "home space";
; - callee can clobber "home space";
; - stack is 16-byte aligned: callee must decrement RSP by 8+n*16
;   bytes when it calls other functions (CALL instruction pushes 8 bytes);
; - integer or pointer result is returned in register RAX/R0;
; - floating-point result is returned in register XMM0;
; - registers RAX/R0, RCX/R1, RDX/R2, R8, R9, R10, R11 and XMM0 to
;   XMM5 are volatile and can be clobbered;
; - registers RBX/R3, RSP/R4, RBP/R5, RSI/R6, RDI/R7, R12, R13, R14,
;   R15 and XMM6 to XMM15 must be preserved.
; exp2n(<-1074) = 0
; exp2n(0)      = 1
; exp2n(>1023)  = INFINITY
; exp2n(x)      = 2**x
; exp2n(-x)     = 1 / exp2n(x)
;               = 1 / 2**x
;               = (1 / 2)**x
	.code
double	record	sign:1, exponent:11, mantissa:52
bias	equ	1 shl (width exponent - 1) - 1
exp2n	proc	public			; ecx = exponent
	mov	eax, ecx		; eax = exponent
	cmp	eax, bias
	jg	Loverflow		; exponent > 1023?
	cmp	eax, 1 - width mantissa - bias
	jl	Lunderflow		; exponent < -1074?
	add	eax, bias		; eax = biased exponent
	jg	Lnormal			; biased exponent > 0?
Ldenormal:
	add	eax, width mantissa - 1 ; eax = index of '1' bit in mantissa
	xor	ecx, ecx
	bts	rcx, rax		; rcx = denormal 2.0**exponent
	movd	xmm0, rcx		; xmm0 = denormal 2.0**exponent
	ret
Loverflow:
	mov	eax, bias * 2 + 1	; rax = biased exponent
					;     = 2047
Lnormal:
	shl	rax, width mantissa
	movd	xmm0, rax		; xmm0 = 2.0**exponent
	ret
Lunderflow:
	xorpd	xmm0, xmm0		; xmm0 = 0.0
					;      = exp2n(<-1074)
	ret
exp2n	endp
	end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; Common "cdecl" calling and naming convention for i386 platform:
; - arguments are pushed on stack in reverse order (from right to left),
;   4-byte aligned;
; - 64-bit integer arguments are passed as pair of 32-bit integer arguments,
;   low part below high part;
; - 80-bit, 64-bit or 32-bit floating-point result is returned in FPU
;   register ST0;
; - 64-bit integer result is returned in registers EAX (low part) and
;   EDX (high part);
; - 32-bit integer or pointer result is returned in register EAX;
; - registers EAX, ECX and EDX are volatile and can be clobbered;
; - registers EBX, ESP, EBP, ESI and EDI must be preserved.
; exp2n(<-1022) = 0
; exp2n(0)      = 1
; exp2n(>1023)  = INFINITY
; exp2n(n)      = 2**n
; exp2n(-n)     = 1 / exp2n(n)
;               = 1 / 2**n
;               = (1 / 2)**n
	.686
	.model	flat, C
	.code
exp2n	proc	public			; [esp+4] = argument
	fild	dword ptr [esp+4]	; st(0) = exponent
	fld1				; st(0) = 1.0,
					; st(1) = exponent
	fscale				; st(0) = 1.0 * 2.0**exponent,
					; st(1) = exponent
	fstp	st(1)			; st(0) = 2.0**exponent
	ret
exp2n	endp
	end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; exp2n(<-1022) = 0
; exp2n(0)      = 1
; exp2n(>1023)  = INFINITY
; exp2n(n)      = 2**n
; exp2n(-n)     = 1 / exp2n(n)
;               = 1 / 2**n
;               = (1 / 2)**n
	.686
	.model	flat, C
	.code
exp2n	proc	public			; [esp+4] = argument
	mov	eax, [esp+4]		; eax = exponent
	mov	edx, 1024		; edx = 1024
					;     = maximum exponent
	cmp	edx, eax
	cmovl	eax, edx		; eax = min(exponent, 1024)
if 0
	dec	edx			; edx = 1023
	neg	edx			; edx = -1023
					;     = minimum exponent
else
	mov	edx, -1023		; edx = -1023
					;     = minimum exponent
endif
	cmp	edx, eax
	cmovg	eax, edx		; eax = max(min(exponent, 1024), -1023)
					;     = clamped unbiased exponent
	sub	eax, edx		; eax = clamped unbiased exponent + 1023
					;     = biased exponent
	shl	eax, 20
	push	eax
	push	0			; [esp] = 2.0**exponent
	fld	real8 ptr [esp]		; st(0) = 2.0**exponent
	add	esp, 8
	ret
exp2n	endp
	end
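The assembly listings above build 2.0**n directly from its bit pattern. A minimal sketch of the same idea in portable C follows, assuming IEEE 754 double precision and 64-bit integers; the names from_bits() and exp2n_sketch() are hypothetical and not part of the original routines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 64-bit pattern as an IEEE 754 double. */
static double from_bits(uint64_t bits)
{
    double value;

    assert(sizeof value == sizeof bits);
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Sketch of exp2n(): returns 2.0**n, built from sign, exponent and fraction. */
double exp2n_sketch(int n)
{
    if (n > 1023)                       /* overflow: biased exponent 2047 */
        return from_bits(UINT64_C(2047) << 52);
    if (n < -1074)                      /* underflow: +0.0 */
        return 0.0;
    if (n > -1023)                      /* normal: biased exponent > 0 */
        return from_bits((uint64_t) (n + 1023) << 52);
    /* denormal: biased exponent 0, one fraction bit at index n + 1074 */
    return from_bits(UINT64_C(1) << (n + 1074));
}
```

With this sketch, exp2n_sketch(10) yields 1024.0 and exp2n_sketch(-1074) yields the smallest positive denormal number.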
exp10() Base-10 Exponential Function
To avoid subtractive cancellation in the difference y = x − z × log₁₀2 with z = ⌊x × log₂10⌋, the product z × log₁₀2 must be calculated in higher precision and the subtraction performed in 2 steps, known as Cody-Waite argument reduction: log₁₀2 is split apart into a (double-double) head + tail pair, with tail = log₁₀2 − head and the 11 least significant bits (matching the size of the exponent) of head’s fraction clear.
The product z′ = z × head is then exact, and according to Sterbenz’ lemma the difference y′ = x − z′ is exact too.
        
Subtraction of the product z″ = z × tail from y′ gives a correctly rounded y″ = y for the (polynomial) approximation of 10^y on the interval [0, log₁₀2 = 1/log₂10 = 0.3010299956639812], followed by the (trivial) multiplication with 2^z.
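The head + tail split can be checked at the bit level. The following is a minimal sketch, assuming the two bit patterns that the assembly listing of exp10() on this page loads for the head and the tail of log10(2); the names LOG10_2_HEAD, LOG10_2_TAIL, from_bits() and trailing_zero_bits() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit patterns as loaded by the assembly listing of exp10(). */
#define LOG10_2_HEAD UINT64_C(0x3FD34413509F7800)   /* 0x1.34413509F7800p-2  */
#define LOG10_2_TAIL UINT64_C(0x3D1FEF311F12B358)   /* 0x1.FEF311F12B358p-46 */

/* Hypothetical helper: reinterpret a 64-bit pattern as an IEEE 754 double. */
static double from_bits(uint64_t bits)
{
    double value;

    assert(sizeof value == sizeof bits);
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Counts the clear low-order bits of the 52-bit fraction field. */
int trailing_zero_bits(uint64_t bits)
{
    uint64_t fraction = bits & ((UINT64_C(1) << 52) - 1);
    int count = 0;

    if (fraction == 0)
        return 52;
    while ((fraction & 1) == 0) {
        fraction >>= 1;
        count++;
    }
    return count;
}
```

For the head, trailing_zero_bits() reports 11 clear bits, so its significand occupies at most 42 bits; multiplied by an integral z with |z| < 2048 (11 bits), the product needs at most 53 bits and is therefore representable without rounding.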
        
// Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double floor(double x);
double ldexp(double x, int z);
// Faithfully rounded base-10 exponential
double exp10(double x)
{
    double z;
#ifdef OPTIONAL
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)
#define M_SQRT10   3.1622776601683793
#define M_1_SQRT10 0.31622776601683793
    if (x != x)
        return INDEFINITE;
    if (x < -323.60724533877978)
        return 0.0;
    if (x == -1.0)
        return 0.1;
    if (x == -0.5)
        return M_1_SQRT10;
    if (x == 0.0)
        return 1.0;
    if (x == 0.5)
        return M_SQRT10;
    if (x == 1.0)
        return 10.0;
    if (x > 308.25471555991674)
        return INFINITY;
#endif
    // for (integral) z = x * log2(10.0) = x * 3.3219280948873623
    // and x" = x - z * log10(2.0), 10**x = 10**x" * 2**z
    //
    // for integral |z| < 2048 the double-precision product
    // z * 0x0.4D104D427DE00 = z * 0x1.34413509F7800p-2
    //                       = z * 0.30102999566395283
    // is exact and lies within a binade from x, therefore the
    // first subtraction yields an exact intermediate result x'
    //
    // subtraction of the double-precision tail product
    // z * 0x0.7FBCC47C4ACD6p-44 = z * 0x1.FEF311F12B358p-46
    //                           = z * 0.28363394551044964e-13
    // yields x" within 2**(-48-52) from x - z * log10(2.0)
    //
    // the correctly rounded x" lies within 0.5 ULP + 2**-100
    // from the exact x - z * log10(2.0)
    //
    // for 0 <= x" <= log10(2.0) = 0.3010299956639812,
    // a minimax polynomial of degree 11 approximates 10**x"
    // with relative error 3.0545878321297965e-18 < 2**-58
    z = floor(x * 3.3219280948873623);
    x -= z * 0.30102999566395283;
    x -= z * 0.28363394551044964e-13;
    return ldexp(((((((((((+3.4097977633132781e-4 * x
                           +1.0726030173640114e-3) * x
                           +5.0515508830497290e-3) * x
                           +1.9586879159041869e-2) * x
                           +6.8091402676825436e-2) * x
                           +2.0699559408492088e-1) * x
                           +5.3938295003481862e-1) * x
                           +1.1712551478362764) * x
                           +2.0346785923260857) * x
                           +2.6509490552386914) * x
                           +2.3025850929940488) * x
                           +1.0, (int) z);
}
Note: overflow and underflow are handled by the ldexp() alias scalbn() function!
For −1075 < x × log₂10 < 1024, with z = ⌊x × log₂10 + ½⌋ for x > 0 and z = ⌈x × log₂10 − ½⌉ for x < 0, i.e. x × log₂10 rounded to the nearest (even) integral number, hence −½ × log₁₀2 ≤ y = x − z × log₁₀2 ≤ ½ × log₁₀2, calculation of 10^x = 10^(y + z × log₁₀2) = 10^y × 10^(z × log₁₀2) = 10^y × 2^z is reduced to the (polynomial) approximation of 10^y on the interval [−½ × log₁₀2, ½ × log₁₀2], followed by the (trivial) multiplication with 2^z.
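The reduction to the symmetric interval can be sketched in C as follows. This is only an illustration under stated assumptions: the constants are the decimal values used in the C listing of exp10() on this page, the name reduce10_sketch() is hypothetical, and the cast rounds ties away from zero instead of to even, which differs from the rint() behaviour of the assembly routine only for exact ties.

```c
#include <assert.h>

#define LOG2_10    3.3219280948873623       /* log2(10) */
#define LOG10_2_HI 0.30102999566395283      /* head of log10(2) */
#define LOG10_2_LO 0.28363394551044964e-13  /* tail of log10(2) */

/* Sketch: splits x into an integral z and a reduced y
   such that 10**x = 10**y * 2**z with |y| <= log10(2) / 2. */
double reduce10_sketch(double x, int *z)
{
    double t = x * LOG2_10;

    assert(t > -1075.0 && t < 1024.0);
    /* the cast truncates towards zero: floor for t + 0.5 > 0,
       ceiling for t - 0.5 < 0 */
    *z = (int) (t < 0.0 ? t - 0.5 : t + 0.5);
    x -= *z * LOG10_2_HI;       /* first step with the head */
    x -= *z * LOG10_2_LO;       /* second step with the tail */
    return x;
}
```

For x = 1.0 the sketch yields z = 3 and y close to 0.0969, since 10 = 10^0.0969… × 2^3.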
# Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# Faithfully rounded base-10 exponential
# CAVEAT: requires default (round to nearest, ties to even) rounding mode!
# exp10(-INFINITY) = 0
# exp10(0)         = 1
# exp10(1)         = 10
# exp10(INFINITY)  = INFINITY
# exp10(x)         = 10**x
#                  = 10**(x - z * log10(2)) * 2**z, -1075 < z = rint(x / log10(2)) < 1024
# exp10(-x)        = 1 / exp10(x)
#                  = 1 / 10**x
#                  = (1 / 10)**x
.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
exp10:
	xorpd	xmm1, xmm1		# xmm1 = 0.0
	comisd	xmm1, xmm0
	jz	.Lspecial		# argument = ±0.0?
					# argument = INDEFINITE?
	mov	rax, 0x400A934F0979A371
	movq	xmm2, rax		# xmm2 = 0x1.A934F0979A371p+1
					#      = 3.3219280948873623
					#      = 1.0 / log10(2.0)
					#      = log2(10.0)
	mulsd	xmm2, xmm0		# xmm2 = log2(10.0) * argument
					#      = argument / log10(2.0)
.ifdef SSE4_1
	roundsd	xmm2, xmm2, 0		# xmm2 = argument / log10(2.0) rounded to nearest (even) integer
.endif
	cvtsd2si eax, xmm2		# eax = lrint(argument / log10(2.0))
#	neg	eax
#	jo	.Lrange			# argument / log10(2.0) > maximum 32-bit integer?
#					# argument / log10(2.0) < minimum 32-bit integer?
#	neg	eax
	cmp	eax, 1 - 52 - BIAS
	jl	.Lunderflow		# argument / log10(2.0) < -1074.0?
					# argument / log10(2.0) < minimum 32-bit integer?
					# argument / log10(2.0) > maximum 32-bit integer?
	cmp	eax, BIAS
	jg	.Loverflow		# argument / log10(2.0) > 1023.0?
	cvtsi2sd xmm1, eax		# xmm1 = rint(argument / log10(2.0))
					#      = log2(scale factor)
	mov	rdx, 0x3FD34413509F7800
	movq	xmm2, rdx		# xmm2 = 0x1.34413509F7800p-2
					#      = 0.30102999566395283
					#      = log10(2.0)'
	mulsd	xmm2, xmm1
	subsd	xmm0, xmm2		# xmm0 = argument
					#      - log10(2.0)' * rint(argument / log10(2.0))
					#      = argument'
	mov	rdx, 0x3D1FEF311F12B358
	movq	xmm2, rdx		# xmm2 = 0x1.FEF311F12B358p-46
					#      = 2.8363394551044964e-14
					#      = log10(2.0) - log10(2.0)'
					#      = log10(2.0)"
	mulsd	xmm2, xmm1
	subsd	xmm0, xmm2		# xmm0 = argument'
					#      - log10(2.0)" * rint(argument / log10(2.0))
					#      = argument" in [-log10(2.0) / 2.0, log10(2.0) / 2.0]
.Lhorner:
	mov	rcx, 0x3F2F9A47809D481E
	movq	xmm1, rcx		# xmm1 = 0x1.F9A47809D481Ep-13
					#      = 2.4110911209135413e-4
	mulsd	xmm1, xmm0
	mov	rdx, 0x3F52F77F5270A2E0
	movq	xmm2, rdx		# xmm2 = 0x1.2F77F5270A2E0p-10
					#      = 1.1576407794199815e-3
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x3F74898B16300A8C
	movq	xmm1, rcx		# xmm1 = 0x1.4898B16300A8Cp-8
					#      = 5.0139840195721038e-3
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3F941165ADE1D201
	movq	xmm2, rdx		# xmm2 = 0x1.41165ADE1D201p-6
					#      = 1.9597614992067139e-2
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x3FB16E4DF62D8622
	movq	xmm1, rcx		# xmm1 = 0x1.16E4DF62D8622p-4
					#      = 6.8089363672264841e-2
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3FCA7ED70A468547
	movq	xmm2, rdx		# xmm2 = 0x1.A7ED70A468547p-3
					#      = 2.0699584962589253e-1
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x3FE1429FFD1F5001
	movq	xmm1, rcx		# xmm1 = 0x1.1429FFD1F5001p-1
					#      = 5.3938292921020555e-1
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3FF2BD7609FD42C5
	movq	xmm2, rdx		# xmm2 = 0x1.2BD7609FD42C5p+0
					#      = 1.1712551489073786
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x4000470591DE2C1B
	movq	xmm1, rcx		# xmm1 = 0x1.0470591DE2C1Bp+1
					#      = 2.0346785922934154
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x40053524C73CEA7E
	movq	xmm2, rdx		# xmm2 = 0x1.53524C73CEA7Ep+1
					#      = 2.6509490552392084
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x40026BB1BBB55516
	movq	xmm1, rcx		# xmm1 = 0x1.26BB1BBB55516p+1
					#      = 2.3025850929940458
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3FF0000000000000
	movq	xmm0, rdx		# xmm0 = 0x1.0p+0
					#      = 1.0
	addsd	xmm0, xmm1		# xmm0 = polynomial(argument")
.Lscale:
	add	eax, BIAS		# eax = biased exponent of scale factor
	jle	.Ldenormal
.Lnormal:
	shl	rax, 52
	movq	xmm1, rax		# xmm1 = 2.0**unbiased exponent
					#      = scale factor
	mulsd	xmm0, xmm1		# xmm0 = polynomial(argument")
					#      * scale factor
					#      = exp10(argument)
	ret
.Ldenormal:
	add	eax, 51			# eax = 51 + biased exponent of denormal scale factor
					#     = index of '1' bit in mantissa
	xor	edx, edx
	bts	rdx, rax		# rdx = denormal scale factor
	movq	xmm1, rdx		# xmm1 = denormal scale factor
	mulsd	xmm0, xmm1		# xmm0 = polynomial(argument")
					#      * denormal scale factor
					#      = exp10(argument)
	ret
.Lunderflow:
#	comisd	xmm1, xmm0
#	jb	.Loverflow		# argument > 0.0?
#
#	xorpd	xmm0, xmm0		# xmm0 = 0.0
#					#      = exp10(<-0x1.439B746E36B52p+8)
#					#      = exp10(<-323.60724533877978)
#	ret
.Loverflow:
#	mov	rax, 0x7FF0000000000000
#	movq	xmm0, rax		# xmm0 = 0x1.0p+1024
#					#      = INFINITY
#					#      = exp10(>0x1.34413509F79FFp+8)
#					#      = exp10(>308.25471555991674)
#	ret
.Lrange:
	comisd	xmm1, xmm0
	sbb	eax, eax		# eax = (argument < 0.0) ? 0 : -1
	shr	eax, 21			# rax = (argument < 0.0) ? 0 : 0x7FF
	shl	rax, 52			# rax = (argument < 0.0) ? 0 : 0x7FF0000000000000
	movq	xmm0, rax		# xmm0 = (argument < 0.0) ? 0.0 : 0x1.0p+1024
					#      = (argument < 0.0) ? 0.0 : INFINITY
	ret
.Lspecial:
	jp	.Lexit			# argument = INDEFINITE?
.Lzero:
	mov	rax, 0x3FF0000000000000
	movq	xmm0, rax		# xmm0 = 0x1.0p+0
					#      = 1.0
					#      = exp10(±0.0)
.Lexit:
	ret
.size	exp10, .-exp10
.type	exp10, @function
.global	exp10
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; exp10(x)  = 10**x
;           = 2**(x * log2(10))
; exp10(-x) = 1 / exp10(x)
	.686
	.model	flat, C
	.code
exp10	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = exponent
	fldl2t				; st(0) = log2(10.0),
					; st(1) = exponent
	fmulp	st(1), st(0)		; st(0) = exponent * log2(10.0)
if 0
	fld1				; st(0) = 1.0,
					; st(1) = exponent * log2(10.0)
	fld	st(1)			; st(0) = exponent * log2(10.0),
					; st(1) = 1.0,
					; st(2) = exponent * log2(10.0)
	fprem				; st(0) = (exponent * log2(10.0)) modulo 1.0,
					; st(1) = 1.0,
					; st(2) = exponent * log2(10.0)
	f2xm1				; st(0) = 2.0**((exponent * log2(10.0)) modulo 1.0) - 1.0,
					; st(1) = 1.0,
					; st(2) = exponent * log2(10.0)
	faddp	st(1), st(0)		; st(0) = 2.0**((exponent * log2(10.0)) modulo 1.0),
					; st(1) = exponent * log2(10.0)
	fscale				; st(0) = 10.0**exponent,
					; st(1) = exponent * log2(10.0)
else
	fld	st(0)			; st(0) = st(1) = exponent * log2(10.0)
	frndint				; st(0) = integer(exponent * log2(10.0)),
					; st(1) = exponent * log2(10.0)
	fsub	st(1), st(0)		; st(0) = integer(exponent * log2(10.0)),
					; st(1) = fraction(exponent * log2(10.0))
	fxch	st(1)			; st(0) = fraction(exponent * log2(10.0)),
					; st(1) = integer(exponent * log2(10.0))
	f2xm1				; st(0) = 2.0**fraction(exponent * log2(10.0)) - 1.0,
					; st(1) = integer(exponent * log2(10.0))
	fld1				; st(0) = 1.0,
					; st(1) = 2.0**fraction(exponent * log2(10.0)) - 1.0,
					; st(2) = integer(exponent * log2(10.0))
	faddp	st(1), st(0)		; st(0) = 2.0**fraction(exponent * log2(10.0)),
					; st(1) = integer(exponent * log2(10.0))
	fscale				; st(0) = 10.0**exponent,
					; st(1) = integer(exponent * log2(10.0))
endif
	fstp	st(1)			; st(0) = 10.0**exponent
	ret
exp10	endp
	end
exp() Base-e Exponential Function
exp() returns the base-e exponential of its argument.
Calculation of the exponential function to the transcendental base e = 2.71828182845904523536028747135266249775724709369995…, known as Euler’s number and also Napier’s constant, e^x = e^(y + z × logₑ2) = e^y × e^(z × logₑ2) = e^y × 2^z for −1075 < x × log₂e < 1024, with z = ⌊x × log₂e⌋, i.e. x × log₂e rounded down towards −∞, hence 0 ≤ y = x − z × logₑ2 ≤ logₑ2 = 1/log₂e = 0.69314718055994531, is more difficult than calculation of 2^x: for z × logₑ2 close to x, calculation of the difference y = x − z × logₑ2 suffers from subtractive cancellation, i.e. complete loss of precision!
To avoid this, the product z × logₑ2 must be calculated in higher precision and the subtraction performed in 2 steps, known as Cody-Waite argument reduction: logₑ2 is split apart into a (double-double) head + tail pair, with tail = logₑ2 − head and the 11 least significant bits (matching the size of the exponent) of head’s fraction clear.
The product z′ = z × head is then exact, and according to Sterbenz’ lemma the difference y′ = x − z′ is exact too.
        
Subtraction of the product z″ = z × tail from y′ gives a correctly rounded y″ = y for the (polynomial) approximation of e^y on the interval [0, logₑ2 = 1/log₂e = 0.69314718055994531], followed by the (trivial) multiplication with 2^z.
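To compare both approaches, the reduction can be written once with a single double-precision log(2) and once with the head + tail pair. This is a rough sketch, not the routine of this page: the names floor_sketch(), reduce_one_step() and reduce_two_steps() are hypothetical, and floor_sketch() is only a stand-in for floor() that is valid for arguments within the range of int.

```c
#include <assert.h>

#define LOG2_E 1.4426950408889634           /* log2(e) */
#define LN2    0.69314718055994531          /* log(2) as single double */
#define LN2_HI 0.69314718055989033          /* head of log(2) */
#define LN2_LO 0.54979230187083712e-13      /* tail of log(2) */

/* Hypothetical stand-in for floor(), valid only within the range of int. */
static double floor_sketch(double t)
{
    double z = (double) (int) t;            /* truncates towards zero */

    return z > t ? z - 1.0 : z;
}

/* One step: the product z * LN2 is rounded before the subtraction. */
double reduce_one_step(double x)
{
    double z = floor_sketch(x * LOG2_E);

    return x - z * LN2;
}

/* Two steps (Cody-Waite): exact head product, then the small tail product. */
double reduce_two_steps(double x)
{
    double z = floor_sketch(x * LOG2_E);

    assert(z > -2048.0 && z < 2048.0);
    x -= z * LN2_HI;
    x -= z * LN2_LO;
    return x;
}
```

Both functions return a value in the interval [0, log(2)]; for large |z| the one-step result carries the rounding error of the product z × log(2), which the two-step variant avoids.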
        
// Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double floor(double x);
double ldexp(double x, int z);
// Faithfully rounded base-e exponential
double exp(double x)
{
    double z;
#ifdef OPTIONAL
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)
#define M_E        2.7182818284590452
#define M_1_E      0.36787944117144232
#define M_SQRTE    1.6487212707001281
#define M_1_SQRTE  0.60653065971263342
    if (x != x)
        return INDEFINITE;
    if (x < -745.13321910194121)
        return 0.0;
    if (x == -1.0)
        return M_1_E;
    if (x == -0.5)
        return M_1_SQRTE;
    if (x == 0.0)
        return 1.0;
    if (x == 0.5)
        return M_SQRTE;
    if (x == 1.0)
        return M_E;
    if (x > 709.78271289338400)
        return INFINITY;
#endif
    // for (integral) z = x * log2(e) = x * 1.4426950408889634
    // and x" = x - z * log(2.0), e**x = e**x" * 2**z
    //
    // for integral |z| < 2048 the double-precision product
    // z * 0x0.B17217F7D1C00 = z * 0x1.62E42FEFA3800p-1
    //                       = z * 0.69314718055989033
    // is exact and lies within a binade from x, therefore the
    // first subtraction yields an exact intermediate result x'
    //
    // subtraction of the double-precision tail product
    // z * 0x0.F79ABC9E3B398p-44 = z * 0x1.EF35793C76730p-45
    //                           = z * 0.54979230187083712e-13
    // yields x" within 2**(-50-52) from x - z * log(2.0)
    //
    // the correctly rounded x" lies within 0.5 ULP + 2**-102
    // from the exact x - z * log(2.0)
    //
    // for 0 <= x" <= log(2.0) = 0.69314718055994531,
    // a minimax polynomial of degree 11 approximates e**x"
    // with relative error 3.0545878321297965e-18 < 2**-58
    z = floor(x * 1.4426950408889634);
    x -= z * 0.69314718055989033;
    x -= z * 0.54979230187083712e-13;
    return ldexp(((((((((((+3.5347283721656128e-8 * x
                           +2.5602485412126367e-7) * x
                           +2.7764095757136529e-6) * x
                           +2.4787899938611698e-5) * x
                           +1.9841863599469418e-4) * x
                           +1.3888871805082296e-3) * x
                           +8.3333336552944127e-3) * x
                           +4.1666666628388979e-2) * x
                           +1.6666666666933781e-1) * x
                           +4.9999999999990426e-1) * x
                           +1.0000000000000013) * x
                           +1.0, (int) z);
}
Note: overflow and underflow are handled by the ldexp() alias scalbn() function!
For −1075 < x × log₂e < 1024, with z = ⌊x × log₂e + ½⌋ for x > 0 and z = ⌈x × log₂e − ½⌉ for x < 0, i.e. x × log₂e rounded to the nearest (even) integral number, hence −½ × logₑ2 ≤ y = x − z × logₑ2 ≤ ½ × logₑ2, calculation of e^x = e^(y + z × logₑ2) = e^y × e^(z × logₑ2) = e^y × 2^z is reduced to the (polynomial) approximation of e^y on the interval [−½ × logₑ2, ½ × logₑ2], followed by the (trivial) multiplication with 2^z.
# Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# Faithfully rounded natural exponential
# CAVEAT: requires default (round to nearest, ties to even) rounding mode!
# exp(-INFINITY) = 0
# exp(0)         = 1
# exp(1)         = e
# exp(INFINITY)  = INFINITY
# exp(x)         = e**x
#                = e**(x - z * log(2)) * 2**z, -1075 < z = rint(x / log(2)) < 1024
# exp(-x)        = 1 / exp(x)
#                = 1 / e**x
#                = (1 / e)**x
.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
exp:
	xorpd	xmm1, xmm1		# xmm1 = 0.0
	comisd	xmm1, xmm0
	jz	.Lspecial		# argument = ±0.0?
					# argument = INDEFINITE?
	mov	rax, 0x3FF71547652B82FE
	movq	xmm2, rax		# xmm2 = 0x1.71547652B82FEp+0
					#      = 1.4426950408889634
					#      = 1.0 / log(2.0)
					#      = log2(e)
	mulsd	xmm2, xmm0		# xmm2 = log2(e) * argument
					#      = argument / log(2.0)
.ifdef SSE4_1
	roundsd	xmm2, xmm2, 0		# xmm2 = argument / log(2.0) rounded to nearest (even) integer
.endif
	cvtsd2si eax, xmm2		# eax = lrint(argument / log(2.0))
#	neg	eax
#	jo	.Lrange			# argument / log(2.0) > maximum 32-bit integer?
#					# argument / log(2.0) < minimum 32-bit integer?
#	neg	eax
	cmp	eax, 1 - 52 - BIAS
	jl	.Lunderflow		# argument / log(2.0) < -1074.0?
					# argument / log(2.0) < minimum 32-bit integer?
					# argument / log(2.0) > maximum 32-bit integer?
	cmp	eax, BIAS
	jg	.Loverflow		# argument / log(2.0) > 1023.0?
	cvtsi2sd xmm1, eax		# xmm1 = rint(argument / log(2.0))
					#      = log2(scale factor)
	mov	rdx, 0x3FE62E42FEFA3800
	movq	xmm2, rdx		# xmm2 = 0x1.62E42FEFA3800p-1
					#      = 0.69314718055989033
					#      = log(2.0)'
	mulsd	xmm2, xmm1
	subsd	xmm0, xmm2		# xmm0 = argument
					#      - log(2.0)' * rint(argument / log(2.0))
					#      = argument'
	mov	rdx, 0x3D2EF35793C76730
	movq	xmm2, rdx		# xmm2 = 0x1.EF35793C76730p-45
					#      = 5.4979230187083712e-14
					#      = log(2.0) - log(2.0)'
					#      = log(2.0)"
	mulsd	xmm2, xmm1
	subsd	xmm0, xmm2		# xmm0 = argument'
					#      - log(2.0)" * rint(argument / log(2.0))
					#      = argument" in [-log(2.0) / 2.0, log(2.0) / 2.0]
.Lhorner:
	mov	rdx, 0x3E5AD661C903688B
	movq	xmm1, rdx		# xmm1 = 0x1.AD661C903688Bp-26
					#      = 2.4994304016107913e-8
	mulsd	xmm1, xmm0
	mov	rdx, 0x3E928B311C7EB84F
	movq	xmm2, rdx		# xmm2 = 0x1.28B311C7EB84Fp-22
					#      = 2.7632293297497039e-7
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rdx, 0x3EC71DF4520AAEEB
	movq	xmm1, rdx		# xmm1 = 0x1.71DF4520AAEEBp-19
					#      = 2.7557622533559223e-6
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3EFA01992D0FE736
	movq	xmm2, rdx		# xmm2 = 0x1.A01992D0FE736p-16
					#      = 2.4801486521375964e-5
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rdx, 0x3F2A01A0110572B2
	movq	xmm1, rdx		# xmm1 = 0x1.A01A0110572B2p-13
					#      = 1.9841269432676262e-4
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3F56C16C1878111C
	movq	xmm2, rdx		# xmm2 = 0x1.6C16C1878111Cp-10
					#      = 1.3888888951224038e-3
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rdx, 0x3F81111111130DD6
	movq	xmm1, rdx		# xmm1 = 0x1.1111111130DD6p-7
					#      = 8.3333333335592727e-3
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3FA555555554F370
	movq	xmm2, rdx		# xmm2 = 0x1.555555554F370p-5
					#      = 4.1666666666492767e-2
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rdx, 0x3FC55555555554A2
	movq	xmm1, rdx		# xmm1 = 0x1.55555555554A2p-3
					#      = 1.6666666666666169e-1
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3FE0000000000010
	movq	xmm2, rdx		# xmm2 = 0x1.0000000000010p-1
					#      = 5.0000000000000177e-1
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rdx, 0x3FF0000000000000
	movq	xmm1, rdx		# xmm1 = 0x1.0p+0
					#      = 1.0
	addsd	xmm2, xmm1
	mulsd	xmm0, xmm2
	addsd	xmm0, xmm1		# xmm0 = polynomial(argument")
.Lscale:
	add	eax, BIAS		# eax = biased exponent of scale factor
	jle	.Ldenormal
.Lnormal:
	shl	rax, 52
	movq	xmm1, rax		# xmm1 = 2.0**unbiased exponent
					#      = scale factor
	mulsd	xmm0, xmm1		# xmm0 = polynomial(argument")
					#      * scale factor
					#      = exp(argument)
	ret
.Ldenormal:
	add	eax, 51			# eax = 51 + biased exponent of denormal scale factor
					#     = index of '1' bit in mantissa
	xor	edx, edx
	bts	rdx, rax		# rdx = denormal scale factor
	movq	xmm1, rdx		# xmm1 = denormal scale factor
	mulsd	xmm0, xmm1		# xmm0 = polynomial(argument")
					#      * denormal scale factor
					#      = exp(argument)
	ret
.Lunderflow:
#	comisd	xmm1, xmm0
#	jb	.Loverflow		# argument > 0.0?
#
#	xorpd	xmm0, xmm0		# xmm0 = 0.0
#					#      = exp(<-0x1.74385446D71C3p+9)
#					#      = exp(<-744.44007192138126)
#	ret
.Loverflow:
#	mov	rax, 0x7FF0000000000000
#	movq	xmm0, rax		# xmm0 = 0x1.0p+1024
#					#      = INFINITY
#					#      = exp(>0x1.62E42FEFA39EFp+9)
#					#      = exp(>709.78271289338400)
#	ret
.Lrange:
	comisd	xmm1, xmm0
	sbb	eax, eax		# eax = (argument < 0.0) ? 0 : -1
	shr	eax, 21			# rax = (argument < 0.0) ? 0 : 0x7FF
	shl	rax, 52			# rax = (argument < 0.0) ? 0 : 0x7FF0000000000000
	movq	xmm0, rax		# xmm0 = (argument < 0.0) ? 0.0 : 0x1.0p+1024
					#      = (argument < 0.0) ? 0.0 : INFINITY
	ret
.Lspecial:
	jp	.Lexit			# argument = INDEFINITE?
.Lzero:
	mov	rax, 0x3FF0000000000000
	movq	xmm0, rax		# xmm0 = 0x1.0p+0
					#      = 1.0
					#      = exp(±0.0)
.Lexit:
	ret
.size	exp, .-exp
.type	exp, @function
.global	exp
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/c850xxez.aspx
; exp(x)  = e**x
;         = 2**(x * log2(e))
; exp(-x) = 1 / exp(x)
	.686
	.model	flat, C
	.code
exp	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = exponent
	fldl2e				; st(0) = log2(e),
					; st(1) = exponent
	fmulp	st(1), st(0)		; st(0) = exponent * log2(e)
if 0
	fld1				; st(0) = 1.0,
					; st(1) = exponent * log2(e)
	fld	st(1)			; st(0) = exponent * log2(e),
					; st(1) = 1.0,
					; st(2) = exponent * log2(e)
	fprem				; st(0) = (exponent * log2(e)) modulo 1.0,
					; st(1) = 1.0,
					; st(2) = exponent * log2(e)
	f2xm1				; st(0) = 2.0**((exponent * log2(e)) modulo 1.0) - 1.0,
					; st(1) = 1.0,
					; st(2) = exponent * log2(e)
	faddp	st(1), st(0)		; st(0) = 2.0**((exponent * log2(e)) modulo 1.0),
					; st(1) = exponent * log2(e)
	fscale				; st(0) = e**exponent,
					; st(1) = exponent * log2(e)
else
	fld	st(0)			; st(0) = st(1) = exponent * log2(e)
	frndint				; st(0) = integer(exponent * log2(e)),
					; st(1) = exponent * log2(e)
	fsub	st(1), st(0)		; st(0) = integer(exponent * log2(e)),
					; st(1) = fraction(exponent * log2(e))
	fxch	st(1)			; st(0) = fraction(exponent * log2(e)),
					; st(1) = integer(exponent * log2(e))
	f2xm1				; st(0) = 2.0**fraction(exponent * log2(e)) - 1.0,
					; st(1) = integer(exponent * log2(e))
	fld1				; st(0) = 1.0,
					; st(1) = 2.0**fraction(exponent * log2(e)) - 1.0,
					; st(2) = integer(exponent * log2(e))
	faddp	st(1), st(0)		; st(0) = 2.0**fraction(exponent * log2(e)),
					; st(1) = integer(exponent * log2(e))
	fscale				; st(0) = e**exponent,
					; st(1) = integer(exponent * log2(e))
endif
	fstp	st(1)			; st(0) = e**exponent
	ret
exp	endp
	end
expm1() Base-e Exponential Function
expm1() returns the base-e exponential of its argument decremented by one, i.e. e^x − 1.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double fma(double x, double y, double z);
// Faithfully rounded base-e exponential minus 1
// for |x| < log(1.5) = 0.405465108108164382
double expm1(double x)
{
    double z = 2.0884268547791305e-9;
    z = fma(z, x, 2.5136640903355195e-8);
    z = fma(z, x, 2.7557461207244723e-7);
    z = fma(z, x, 2.7557153928447346e-6);
    z = fma(z, x, 2.4801586944307795e-5);
    z = fma(z, x, 1.9841269987879947e-4);
    z = fma(z, x, 1.3888888889202989e-3);
    z = fma(z, x, 8.3333333332766286e-3);
    z = fma(z, x, 4.1666666666665637e-2);
    z = fma(z, x, 1.6666666666666738e-1);
    z = fma(z, x, 0.5) * x;
    z = fma(z, x, x);
    return z;
}
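The last two steps of the listing above add x and x**2 times the rest of the polynomial, instead of evaluating e**x and subtracting 1 afterwards. A small sketch may show why; it uses only the first three terms of the Taylor series instead of the minimax coefficients, and the names expm1_naive() and expm1_sketch() are hypothetical.

```c
#include <assert.h>

/* Naive variant: sums 1 + x + x**2/2 first, then subtracts 1;
   the low-order bits of x are lost when x is added to 1.0. */
double expm1_naive(double x)
{
    volatile double sum = 1.0 + x + 0.5 * x * x;    /* rounded to double */

    return sum - 1.0;
}

/* Sketch: sums x + x**2/2 + x**3/6 without ever adding 1.0;
   three terms suffice only for tiny |x|. */
double expm1_sketch(double x)
{
    assert(x > -0.001 && x < 0.001);
    return x + x * x * (0.5 + x / 6.0);
}
```

For x = 1.0e-10 the naive variant returns a value near 1.00000008e-10, wrong from the 9th significant digit on, while the sketch stays close to 1.00000000005e-10.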
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/dn353645.aspx
; expm1(x)  = e**x - 1
;           = 2**(x * log2(e)) - 1
; expm1(-x) = 1 / exp(x) - 1
	.686
	.model	flat, C
	.code
expm1	proc	public			; [esp+4] = exponent
	fld	real8 ptr [esp+4]	; st(0) = exponent
	fldl2e				; st(0) = log2(e),
					; st(1) = exponent
	fmulp	st(1), st(0)		; st(0) = exponent * log2(e)
	fld1				; st(0) = 1.0,
					; st(1) = exponent * log2(e)
	fld	st(1)			; st(0) = exponent * log2(e),
					; st(1) = 1.0,
					; st(2) = exponent * log2(e)
	fabs				; st(0) = |exponent * log2(e)|,
					; st(1) = 1.0,
					; st(2) = exponent * log2(e)
	fcompp				; st(0) = exponent * log2(e)
	fstsw	ax			; ax = FPU status word
					; B  C3  TOP  C2  C1  C0  low byte
					; .  0   ...  0   .   0   ........  st(0) > st(1)
					; .  0   ...  0   .   1   ........  st(0) < st(1)
					; .  1   ...  0   .   0   ........  st(0) = st(1)
					; .  1   ...  1   .   1   ........  st(0) # st(1)
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
					; CF (carry flag)  = C0
					;                    C1
					; PF (parity flag) = C2
					; ZF (zero flag)   = C3
					; AF (adjust flag) = .
					; SF (sign flag)   = B(usy)
	ja	Lrange			; |exponent * log2(e)| > 1.0?
;;	jp	Lexit			; exponent = INDEFINITE?
	f2xm1				; st(0) = 2.0**(exponent * log2(e)) - 1.0
					;       = e**exponent - 1.0
	ret
Lrange:
	fld	st(0)			; st(0) = st(1) = exponent * log2(e)
	frndint				; st(0) = integer(exponent * log2(e)),
					; st(1) = exponent * log2(e)
	fsub	st(1), st(0)		; st(0) = integer(exponent * log2(e)),
					; st(1) = fraction(exponent * log2(e))
	fxch	st(1)			; st(0) = fraction(exponent * log2(e)),
					; st(1) = integer(exponent * log2(e))
	f2xm1				; st(0) = 2.0**fraction(exponent * log2(e)) - 1.0,
					; st(1) = integer(exponent * log2(e))
	fld1				; st(0) = 1.0,
					; st(1) = 2.0**fraction(exponent * log2(e)) - 1.0,
					; st(2) = integer(exponent * log2(e))
	faddp	st(1), st(0)		; st(0) = 2.0**fraction(exponent * log2(e)),
					; st(1) = integer(exponent * log2(e))
	fscale				; st(0) = e**exponent,
					; st(1) = integer(exponent * log2(e))
	fstp	st(1)			; st(0) = e**exponent
	fld1				; st(0) = 1.0,
					; st(1) = e**exponent
	fsubp	st(1), st(0)		; st(0) = e**exponent - 1.0
Lexit:
	ret
expm1	endp
	end
        The logarithm function can be approximated by a (minimax) polynomial on any sufficiently small interval with high accuracy, for example faithfully rounded, as shown hereafter.
log() Base-e alias Natural Logarithm Function
log() returns the base-e alias natural logarithm of its argument.
logₑx = artanh((x^2 − 1) / (x^2 + 1)) = 2 × artanh((x − 1) / (x + 1)), logₑ(1 + x) = x^1 / 1 − x^2 / 2 + x^3 / 3 − x^4 / 4 + … = 2 × artanh(x / (2 + x)), …
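The identity log(m) = 2 × artanh((m − 1) / (m + 1)) can be tried out with the truncated series artanh(x) = x + x**3/3 + x**5/5 + …. The following sketch sums 20 terms of this series; it is not the minimax polynomial of the routine below, and the name log_series_sketch() as well as the restriction to 1/2 ≤ m ≤ 2 are assumptions made for the illustration.

```c
#include <assert.h>

/* Sketch: log(m) = 2 * artanh(x) with x = (m - 1) / (m + 1),
   artanh(x) summed as x + x**3/3 + x**5/5 + ... + x**39/39 */
double log_series_sketch(double m)
{
    double x, y, power, sum;
    int k;

    assert(m >= 0.5 && m <= 2.0);       /* keeps |x| <= 1/3 */
    x = (m - 1.0) / (m + 1.0);
    y = x * x;
    power = x;                          /* x**k for odd k */
    sum = 0.0;
    for (k = 1; k < 40; k += 2) {
        sum += power / k;
        power *= y;
    }
    return 2.0 * sum;
}
```

Since |x| ≤ 1/3 on this interval, each term is at most 1/9 of its predecessor, so the truncation error after 20 terms is far below the double-precision rounding error.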
// Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)
#define M_E        2.7182818284590452
#define M_LN2      0.69314718055994531
#define M_1_SQRT2  0.70710678118654752
double frexp(double x, int *z);
// Faithfully rounded natural logarithm
double log(double argument)
{
    double mantissa, x, y, z;
    int exponent;
    if (argument != argument)
        return INDEFINITE;
    if (argument < 0.0)
        return INDEFINITE;
    if (argument == 0.0)
        return -INFINITY;
#ifdef OPTIONAL
    if (argument == 1.0)
        return 0.0;
    if (argument == M_E)
        return 1.0;
#endif
    if (argument == INFINITY)
        return INFINITY;
    // for argument > 0,
    // log(argument) = log(2) * log2(argument)
    //
    // for argument = mantissa * 2**exponent,
    // log(argument) = log(mantissa * 2**exponent)
    //               = log(mantissa) + log(2**exponent)
    //               = log(mantissa) + log(2) * log2(2**exponent)
    //               = log(mantissa) + log(2) * log2(2) * exponent
    //               = log(mantissa) + log(2) * exponent
    //
    // for mantissa = 1,
    // log(argument) = log(2) * exponent
    //
    // for mantissa = 1 + fraction
    //              = (1 + x) / (1 - x)
    // and x = (mantissa - 1) / (mantissa + 1)
    //       = fraction / (2 + fraction)
    //       = 1 - 2 / (2 + fraction),
    // log(mantissa) = log(1 + fraction)
    //               = log((1 + x) / (1 - x))
    //               = log(1 + x) - log(1 - x)
    //
    // for x = 0,
    // log(1 + x) - log(1 - x) = log(1) - log(1)
    //                         = 0
    //
    // for -1 < x <= 1,
    // log(1 + x) = x**1/1 - x**2/2 + x**3/3 - x**4/4 + x**5/5 - ...
    //            = x - x**2/2 + x**3/3 - x**4/4 + x**5/5 - ...
    //
    // for -1 <= x < 1,
    // log(1 - x) = 0 - x**1/1 - x**2/2 - x**3/3 - x**4/4 - x**5/5 - ...
    //            = 0 - x - x**2/2 - x**3/3 - x**4/4 - x**5/5 - ...
    //            = 0 - (x + x**2/2 + x**3/3 + x**4/4 + x**5/5 + ...)
    //
    // for -1 < x < 1,
    // log(1 + x) - log(1 - x) = x - x**2/2 + x**3/3 - x**4/4 + x**5/5 - ...
    //                         + x + x**2/2 + x**3/3 + x**4/4 + x**5/5 + ...
    //                         = x * 2      + x**3/3 * 2      + x**5/5 * 2 + ...
    //                         = (x + x**3/3 + x**5/5 + ...) * 2
    //                         = x * 2 + (x**2/3 + x**4/5 + ...) * 2 * x
    //                         = x * 2 + polynomial(x**2) * x
    mantissa = frexp(argument, &exponent);
#ifdef OPTIONAL
    if (mantissa == 0.5)
#if 0
        return (exponent - 1) * M_LN2;
#elif 0
        return (exponent - 1) * 0x1.EF35793C76730p-45
             + (exponent - 1) * 0x1.62E42FEFA3800p-1;
#else
        return (exponent - 1) * 0.54979230187083712e-13
             + (exponent - 1) * 0.69314718055989033;
#endif
#endif
#if 0
    // for 1/2 <= mantissa = 1 + fraction < 1,
    // -1/2 <= fraction < 0 and x = (mantissa - 1) / (mantissa + 1),
    // -1/3 <= x < 0
    x = (mantissa - 1.0) / (mantissa + 1.0);
    // for 0 < x < 1/3,
    // a minimax polynomial of degree 10 in x**2 approximates
    // (log(1 + x) - log(1 - x)) / (2 * x) with relative error
    // 1.2300066608152056e-18 ~ 2**-59.5
    y = x * x;
    y = (((((((((+0.17060062608429468 * y
                 +0.083156843071811262) * y
                 +0.12112248959493536) * y
                 +0.13300102515887726) * y
                 +0.15386635453768495) * y
                 +0.18181739787751806) * y
                 +0.22222224111772142) * y
                 +0.28571428544925157) * y
                 +0.40000000000190325) * y
                 +0.66666666666666134) * y;
#else
    // for sqrt(2)/2 <= mantissa = 1 + fraction < sqrt(2)
    // and x = (mantissa - 1) / (mantissa + 1),
    // -0.29289321881345248 <= fraction < 0.41421356237309505,
    // -0.1715728752538099 <= x < 0.1715728752538099
    if (mantissa < M_1_SQRT2) {
        mantissa += mantissa;
        exponent -= 1;
    }
    x = (mantissa - 1.0) / (mantissa + 1.0);
    // for -0.1715728752538099 <= x < 0.1715728752538099,
    // a minimax polynomial of degree 7 in x**2 approximates
    // (log(1 + x) - log(1 - x)) / (2 * x) with relative error
    // 1.1354910268086278e-18 ~ 2**-59.6
    y = x * x;
    y = ((((((+0.14810529843106951 * y
              +0.15312443753011222) * y
              +0.18183635094502661) * y
              +0.22222196988240322) * y
              +0.28571428761346767) * y
              +0.39999999999298882) * y
              +0.66666666666667652) * y;
#endif
    // K. C. Ng's formula yields an error below 1 ULP:
    // for z = fraction * fraction / 2
    // and x * 2 = fraction - fraction * x
    //           = fraction - z + z * x
    //           = fraction - (z - z * x),
    // log(mantissa) = log(1 + fraction)
    //               = fraction - (fraction - polynomial(x * x)) * x
    //               = fraction - (z - (z + polynomial(x * x)) * x)
    mantissa -= 1.0;
    z = mantissa * mantissa * 0.5;
    z = mantissa - (z - (z + y) * x);
    // for integral |exponent| < 2048,
    // the double-precision product exponent * 0x1.62E42FEFA3800p-1
    // is exact; addition of the double-precision tail product
    // exponent * 0x1.EF35793C76730p-45 yields log(2.0) * exponent
    // within 2**(-50-52) from the exact product
    //
    // log(argument) = log(mantissa) + log(2.0) * exponent
    //               = log(mantissa) + exponent * 0x1.EF35793C76730p-45
    //                               + exponent * 0x1.62E42FEFA3800p-1
#if 0
    z += exponent * 0x1.EF35793C76730p-45;
    z += exponent * 0x1.62E42FEFA3800p-1;
#else
    z += exponent * 0.54979230187083712e-13;
    z += exponent * 0.69314718055989033;
#endif
    return z;
}
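The comments in the listing above rely on a head/tail split of log(2.0): the head keeps only 42 significant bits, so its product with an 11-bit exponent still fits into the 53-bit significand of a double. The following minimal sketch isolates that step. The constants are taken from the listing; the names LN2_HEAD, LN2_TAIL, ln2_head_times() and ln2_tail_times() are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

/* Head and tail of log(2.0), copied from the listing above. */
static const double LN2_HEAD = 0x1.62E42FEFA3800p-1;   /* low 11 bits are zero */
static const double LN2_TAIL = 0x1.EF35793C76730p-45;

/* Hypothetical helper: head part of log(2.0) * exponent.
   42 significant bits times at most 11 bits fit into 53 bits,
   so this product is not rounded. */
static double ln2_head_times(int exponent)
{
    assert(exponent > -2048 && exponent < 2048);
    return exponent * LN2_HEAD;
}

/* Hypothetical helper: tail part of log(2.0) * exponent.
   This product is rounded, but it is about 2**-44 times smaller
   than the head part, so its rounding error is negligible. */
static double ln2_tail_times(int exponent)
{
    assert(exponent > -2048 && exponent < 2048);
    return exponent * LN2_TAIL;
}
```

Adding the tail product first and the head product last, as the listing does, keeps the larger (exact) term out of the intermediate sums for as long as possible.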
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# Faithfully rounded natural logarithm
# log(<0)       = INDEFINITE
# log(±0)       = -INFINITY
# log(1)        = 0
# log(e)        = 1
# log(INFINITY) = INFINITY
# log(1/x)      = -log(x)
# log(x)        = log(significand * 2**exponent)
#               = log(significand) + log(2) * exponent
#               = natural logarithm (to base e)
.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
log:
	movq	rax, xmm0		# rax = argument
	add	rax, rax		# rax = argument << 1
					#     = |argument| << 1
#	jz	.Lzero			# argument = ±0.0?
#	jc	.Lnegative		# argument < ±0.0?
	jbe	.Lrange			# argument <= ±0.0?
.Lpositive:
	mov	rcx, rax
	shr	rcx, 53			# rcx = biased exponent
	jz	.Ldenormal		# biased exponent = 0?
					# (argument denormal?)
	sub	ecx, BIAS		# ecx = unbiased exponent
	cmp	ecx, BIAS + 1
	je	.Lspecial		# biased exponent = 2047?
					# (argument = INDEFINITE?)
					# (argument = INFINITY?)
.Lnormal:
	shl	rax, 11			# rax = fractional part of argument << 12
.Lcontinue:
	mov	rdx, 0x6A09E667F3BCC909	# rdx = fractional part of sqrt(2.0) << 12
	cmp	rdx, rax		# CF = (sqrt(2.0) < significand of argument)
	sbb	edx, edx		# edx = (sqrt(2.0) < significand of argument) ? -1 : 0
	sub	ecx, edx		# ecx = exponent of argument
					#     + (sqrt(2.0) < significand of argument)
					#     = exponent'
	add	edx, BIAS		# rdx = (sqrt(2.0) < significand of argument) ? BIAS - 1 : BIAS
	or	rax, rdx
	ror	rax, 12			# rax = significand of argument'
	movq	xmm0, rax		# xmm0 = significand of argument' in [sqrt(0.5), sqrt(2.0)]
.Ltransform:
	mov	rax, 0x3FF0000000000000
	movq	xmm1, rax		# xmm1 = 0x1.0p+0
					#      = 1.0
	movsd	xmm2, xmm0
	subsd	xmm2, xmm1		# xmm2 = significand of argument' - 1.0
					#      = fraction of argument'
	addsd	xmm1, xmm0		# xmm1 = significand of argument' + 1.0
	movsd	xmm0, xmm2		# xmm0 = fraction of argument'
	divsd	xmm2, xmm1		# xmm2 = (significand of argument' - 1.0)
					#      / (significand of argument' + 1.0)
					#      = argument"
	movsd	xmm1, xmm2		# xmm1 = argument" in [-0.1715728752538099, 0.1715728752538099]
	mulsd	xmm2, xmm2		# xmm2 = argument"**2
.Lhorner:
	mov	rax, 0x3FC2F51D4A901906
	movq	xmm3, rax		# xmm3 = 0x1.2F51D4A901906p-3
					#      = 0.14810529843106951
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FC39994E1B48251
	movq	xmm4, rdx		# xmm4 = 0x1.39994E1B48251p-3
					#      = 0.15312443753011222
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rax, 0x3FC74669DE443505
	movq	xmm3, rax		# xmm3 = 0x1.74669DE443505p-3
					#      = 0.18183635094502661
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FCC71C4FE8C7EC6
	movq	xmm4, rdx		# xmm4 = 0x1.C71C4FE8C7EC6p-3
					#      = 0.22222196988240322
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rax, 0x3FD2492494532F9F
	movq	xmm3, rax		# xmm3 = 0x1.2492494532F9Fp-2
					#      = 0.28571428761346767
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FD999999997AC3B
	movq	xmm4, rdx		# xmm4 = 0x1.999999997AC3Bp-2
					#      = 0.39999999999298882
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rax, 0x3FE55555555555AE
	movq	xmm3, rax		# xmm3 = 0x1.55555555555AEp-1
					#      = 0.66666666666667652
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2		# xmm3 = polynomial(argument"**2)
.Llogarithm:
	mov	rdx, 0x3FE0000000000000
	movq	xmm2, rdx		# xmm2 = 0x1.0p-1
					#      = 0.5
	mulsd	xmm2, xmm0
	mulsd	xmm2, xmm0		# xmm2 = 0.5 * fraction of argument'**2
	addsd	xmm3, xmm2		# xmm3 = polynomial(argument"**2)
					#      + 0.5 * fraction of argument'**2
	mulsd	xmm3, xmm1		# xmm3 = (polynomial(argument"**2)
					#      + 0.5 * fraction of argument'**2)
					#      * argument"
	subsd	xmm2, xmm3
	subsd	xmm0, xmm2		# xmm0 = log(significand of argument')
.Lexponent:
	cvtsi2sd xmm1, ecx		# xmm1 = exponent'
	mov	rax, 0x3D2EF35793C76730
	movq	xmm3, rax		# xmm3 = 0x1.EF35793C76730p-45
					#      = 0.54979230187083712e-13
					#      = tail of log(2.0)
	mulsd	xmm3, xmm1
	addsd	xmm0, xmm3
	mov	rdx, 0x3FE62E42FEFA3800
	movq	xmm2, rdx		# xmm2 = 0x1.62E42FEFA3800p-1
					#      = 0.69314718055989033
					#      = head of log(2.0)
	mulsd	xmm2, xmm1
	addsd	xmm0, xmm2		# xmm0 = natural logarithm of argument
	ret
.Ldenormal:
	bsr	rcx, rax		# rcx = index of most significant '1' bit in argument << 1
	add	rax, rax
	xor	ecx, 63			# ecx = number of leading '0' bits in argument << 1
					#     = 11 - biased exponent
	shl	rax, cl			# rax = (fractional part of) normalized argument << 12
	neg	ecx			# ecx = biased exponent - 11
	sub	ecx, BIAS - 11		# ecx = unbiased exponent of normalized argument
	jmp	.Lcontinue
.Lrange:
	jnz	.Lnegative		# argument <> ±0.0?
.Lzero:
	mov	rax, 0xFFF0000000000000
	movq	xmm0, rax		# xmm0 = -0x1.0p+1024
					#      = -INFINITY
	ret
.Lspecial:
	shl	rax, 11
	jz	.Lexit			# argument = +INFINITY?
.Lindefinite:
.Lnegative:
	mov	rax, 0x7FF8000000000000
	movq	xmm0, rax		# xmm0 = 0x1.8p+1024
					#      = INDEFINITE
.Lexit:
	ret
.size	log, .-log
.type	log, @function
.global	log
.end
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
#define INFINITY   0x1.0p+1024
#define INDEFINITE 0x1.8p+1024
#define M_E        0x1.5BF0A8B145769p+1
#define M_LN2      0x1.62E42FEFA39EFp-1
#define M_1_SQRT2  0x1.6A09E667F3BCDp-1
double frexp(double x, int *z);
// Faithfully rounded natural logarithm
double log(double argument)
{
    double mantissa;
    int exponent;
    if (argument != argument)
        return INDEFINITE;
    if (argument < 0.0)
        return INDEFINITE;
    if (argument == 0.0)
        return -INFINITY;
#ifdef OPTIONAL
    if (argument == 1.0)
        return 0.0;
    if (argument == M_E)
        return 1.0;
#endif
    if (argument == INFINITY)
        return INFINITY;
    mantissa = frexp(argument, &exponent);
#ifdef OPTIONAL
    if (mantissa == 0.5)
#if 0
        return (exponent - 1) * M_LN2;
#else
        return (exponent - 1) * 0x1.EF35793C76730p-45
             + (exponent - 1) * 0x1.62E42FEFA3800p-1;
#endif
#endif
    if (mantissa < M_1_SQRT2) {
        mantissa += mantissa;
        exponent -= 1;
    }
    mantissa -= 1.0;
    // for -0.29289321881345248 <= mantissa < 0.41421356237309505,
    // a minimax polynomial of degree 19 approximates log1p(mantissa) = log(1 + mantissa)
    mantissa += (((((((((((((((((((-0x1.CC4EC078138E3p-6 * mantissa
                                   +0x1.0266CD08DB2F2p-4) * mantissa
                                   -0x1.1654764F478ECp-4) * mantissa
                                   +0x1.EA17E14773369p-5) * mantissa
                                   -0x1.EED2E2BB64B2Ep-5) * mantissa
                                   +0x1.0F23916A44515p-4) * mantissa
                                   -0x1.25480A82633AFp-4) * mantissa
                                   +0x1.3B4ED39194B87p-4) * mantissa
                                   -0x1.554D5ACD502ABp-4) * mantissa
                                   +0x1.745980F3FB889p-4) * mantissa
                                   -0x1.9999C5BE751E3p-4) * mantissa
                                   +0x1.C71C90DB06248p-4) * mantissa
                                   -0x1.FFFFFFBD8606Dp-4) * mantissa
                                   +0x1.249248DAE4B2Ap-3) * mantissa
                                   -0x1.55555554A6A2Bp-3) * mantissa
                                   +0x1.9999999A43E4Fp-3) * mantissa
                                   -0x1.00000000013C7p-2) * mantissa
                                   +0x1.5555555555103p-2) * mantissa
                                   -0x1.FFFFFFFFFFFF2p-2) * mantissa) * mantissa;
    // for integral |exponent| < 2048,
    // the double-precision product exponent * 0x1.62E42FEFA3800p-1
    // is exact; addition of the double-precision tail product
    // exponent * 0x1.EF35793C76730p-45 yields log(2.0) * exponent
    // within 2**(-50-52) from the exact product
    mantissa += exponent * 0x1.EF35793C76730p-45;
    mantissa += exponent * 0x1.62E42FEFA3800p-1;
    return mantissa;
}
$\log_e(\text{significand} \times 2^{\text{exponent} - 1023}) = (\log_2 \text{significand} + \text{exponent} - 1023) \times \log_e 2$
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/t63833dz.aspx
; log(x) = log(2) * log2(x)
;        = natural logarithm (to base e)
	.686
	.model	flat, C
	.code
log	proc	public			; [esp+4] = argument
	fldln2				; st(0) = ln(2.0)
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = ln(2.0)
	fyl2x				; st(0) = natural logarithm of argument
	ret
log	endp
	end
log1p() Base-e alias Natural Logarithm Function
log1p() returns the base-e alias natural logarithm of its argument incremented by one, i.e. log(1 + argument).
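A separate log1p() is useful because forming 1 + argument in double precision discards the low bits of a tiny argument before the logarithm is taken. The sketch below avoids the addition by using the identity log(1 + x) = 2 × artanh(x / (2 + x)) and summing the artanh series. It is only an illustration for small |x|, not the method of the listings that follow; the name log1p_artanh_series() is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: log(1 + x) = 2 * artanh(x / (2 + x)),
   with artanh(t) = t + t**3/3 + t**5/5 + ... summed term by term.
   Meant for small |x|, where the series converges in a few terms. */
static double log1p_artanh_series(double x)
{
    double t, square, power, sum = 0.0;
    int k;

    assert(x > -1.0);
    t = x / (2.0 + x);
    square = t * t;
    power = t;                      /* t**(2k+1) */
    for (k = 0; k < 64; k++) {
        double term = power / (2 * k + 1);
        if (sum + term == sum)      /* term no longer contributes */
            break;
        sum += term;
        power *= square;
    }
    return sum + sum;
}
```

For x = 1e-17 the sum 1.0 + x rounds to 1.0, so a logarithm taken of that sum would yield 0, while the sketch returns x itself.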
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/mt720722.aspx
; log1p(x) = log(2) * log2(1 + x)
;          = natural logarithm (to base e) of (1 + x)
	.686
	.model	flat, C
	.code
log1p	proc	public			; [esp+4] = argument
	fldln2				; st(0) = ln(2.0)
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = ln(2.0)
	fld	st(0)			; st(0) = argument,
					; st(1) = argument,
					; st(2) = ln(2.0)
	fabs				; st(0) = |argument|,
					; st(1) = argument,
					; st(2) = ln(2.0)
ifdef DOUBLE
	push	3FD2BEC3r
	push	33018867r		; [esp] = 1.0 - sqrt(0.5)
					;       = 0.292893218813452482773840301888412795960903167724609375
	fcomp	real8 ptr [esp]		; st(0) = argument,
					; st(1) = ln(2.0)
	pop	eax
else
	push	3E95F61Ar		; [esp] = 1.0F - sqrtf(0.5F)
					;       = 0.292893230915069580078125
	fcomp	real4 ptr [esp]		; st(0) = argument,
					; st(1) = ln(2.0)
endif
	pop	eax
	fstsw	ax			; ax = FPU status word
					; B  C3  TOP  C2  C1  C0  low byte
					; .  0   ...  0   .   0   ........  st(0) > [esp]
					; .  0   ...  0   .   1   ........  st(0) < [esp]
					; .  1   ...  0   .   0   ........  st(0) = [esp]
					; .  1   ...  1   .   1   ........  st(0) # [esp]
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
					; CF (carry flag)  = C0
					;                    C1
					; PF (parity flag) = C2
					; ZF (zero flag)   = C3
					; AF (adjust flag) = .
					; SF (sign flag)   = B(usy)
	ja	Lrange			; |argument| > 1.0 - sqrt(0.5)?
;;	jp	Lexit			; |argument| = INDEFINITE?
	fyl2xp1				; st(0) = natural logarithm of (argument + 1.0)
Lexit:
	ret
Lrange:
	fld1				; st(0) = 1.0,
					; st(1) = argument,
					; st(2) = ln(2.0)
	faddp	st(1), st(0)		; st(0) = argument + 1.0,
					; st(1) = ln(2.0)
	fyl2x				; st(0) = natural logarithm of (argument + 1.0)
	ret
log1p	endp
	end
log10() Base-10 alias Common Logarithm Function
log10() returns the base-10 alias common logarithm of its argument.
$\log_{10}(\text{significand} \times 2^{\text{exponent} - 1023}) = (\log_2 \text{significand} + \text{exponent} - 1023) \times \log_{10} 2$
// Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double log(double x);
double log10(double argument)
{
    return 0.43429448190325183 * log(argument);
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# log10(<0)       = INDEFINITE
# log10(±0)       = -INFINITY
# log10(1)        = 0
# log10(10)       = 1
# log10(INFINITY) = INFINITY
# log10(1/x)      = -log10(x)
# log10(x)        = log10(e) * log(x)
#                 = common logarithm (to base 10)
.arch	generic64
.code64
.intel_syntax noprefix
.extern	log
.text
					# xmm0 = argument
log10:
	call	log			# xmm0 = log(argument)
	mov	rax, 0x3FDBCB7B1526E50E
	movq	xmm1, rax		# xmm1 = 0x1.BCB7B1526E50Ep-2
					#      = 0.434294481903251828
					#      = log10(2.71828182845904524)
	mulsd	xmm0, xmm1		# xmm0 = log(argument) * log10(2.71828182845904524)
					#      = log10(argument)
	ret
.size	log10, .-log10
.type	log10, @function
.weak	log10
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/t63833dz.aspx
; log10(x) = log10(2) * log2(x)
;          = common logarithm (to base 10)
	.686
	.model	flat, C
	.code
log10	proc	public			; [esp+4] = argument
	fldlg2				; st(0) = log10(2.0)
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = log10(2.0)
	fyl2x				; st(0) = common logarithm of argument
	ret
log10	endp
	end
log2() Base-2 alias Binary Logarithm Function
log2() returns the base-2 alias binary logarithm of its argument.
$\log_2(\text{significand} \times 2^{\text{exponent} - 1023}) = \log_2 \text{significand} + \text{exponent} - 1023$
// Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)
double frexp(double x, int *z);
double log2(double argument)
{
    int exponent;
    unsigned count = 5;
    long long logarithm;
    if (argument != argument)
        return INDEFINITE;
    if (argument < 0.0)
        return INDEFINITE;
    if (argument == 0.0)
        return -INFINITY;
    if (argument == INFINITY)
        return INFINITY;
    argument = frexp(argument, &exponent);
    if (argument == 0.5)
        return (double) (exponent - 1);
    logarithm = exponent - 1;
    do {
        argument += argument;
        argument *= argument;
        argument *= argument;
        argument *= argument;
        argument *= argument;
        argument *= argument;
        argument *= argument;
        argument *= argument;
        argument *= argument;
        argument *= argument;
        argument *= argument;
        argument = frexp(argument, &exponent);
        logarithm *= 1024;  // instead of <<= 10: logarithm may be negative
        logarithm += exponent - 1;
    } while (--count != 0);
    return logarithm * 0x1.0p-50;
}
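The listing above extracts 10 bits of the logarithm per loop iteration: ten squarings raise the significand to the power 1024, and frexp() then reads the integer part of the scaled logarithm off the exponent. The same idea can be sketched one bit at a time, without frexp(). This is a simplified variant for illustration only; log2_by_squaring() is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical helper: binary logarithm of a significand in [1, 2),
   producing one fraction bit per squaring.
   Squaring doubles the logarithm; whenever the square reaches 2,
   the doubled logarithm has integer part 1, i.e. the next bit is set. */
static double log2_by_squaring(double significand, int bits)
{
    double result = 0.0;
    double weight = 0.5;              /* value of the next fraction bit */
    int i;

    assert(significand >= 1.0 && significand < 2.0);
    assert(bits >= 0 && bits <= 52);
    for (i = 0; i < bits; i++) {
        significand *= significand;
        if (significand >= 2.0) {     /* emit a '1' bit, renormalize */
            significand *= 0.5;
            result += weight;
        }
        weight *= 0.5;
    }
    return result;
}
```

The result is truncated to the requested number of bits, so it can fall short of the true logarithm by up to 2**-bits in addition to accumulated rounding error.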
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)
double frexp(double x, int *z);
double log(double x);
double log2(double argument)
{
    int exponent;
    if (argument != argument)
        return INDEFINITE;
    if (argument < 0.0)
        return INDEFINITE;
    if (argument == 0.0)
        return -INFINITY;
    if (argument == INFINITY)
        return INFINITY;
    argument = frexp(argument, &exponent);
#ifdef OPTIONAL
    if (argument == 0.5)
        return (double) (exponent - 1);
#endif
    return 1.4426950408889634 * log(argument) + exponent;
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# log2(<0)       = INDEFINITE
# log2(±0)       = -INFINITY
# log2(1)        = 0
# log2(2)        = 1
# log2(INFINITY) = INFINITY
# log2(1/x)      = -log2(x)
# log2(x)        = log2(e) * log(x)
#                = binary logarithm (to base 2)
.arch	generic64
.code64
.intel_syntax noprefix
.extern	log
.text
					# xmm0 = argument
log2:
	call	log			# xmm0 = log(argument)
	mov	rax, 0x3FF71547652B82FE
	movq	xmm1, rax		# xmm1 = 0x1.71547652B82FEp+0
					#      = 1.44269504088896341
					#      = log2(2.71828182845904524)
	mulsd	xmm0, xmm1		# xmm0 = log(argument) * log2(2.71828182845904524)
					#      = log2(argument)
	ret
.size	log2, .-log2
.type	log2, @function
.weak	log2
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/mt720721.aspx
; log2(x) = binary logarithm (to base 2)
	.686
	.model	flat, C
	.code
log2	proc	public			; [esp+4] = argument
	fld1				; st(0) = 1.0
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = 1.0
	fyl2x				; st(0) = binary logarithm of argument
	ret
log2	endp
	end
logb() Function
logb() returns the integral part of the base-2 logarithm of the absolute value of its argument.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)
double logb(double argument)
{
    int exponent;
    if (argument != argument)
        return INDEFINITE;
    if (argument == 0.0)
        return -INFINITY;
    if (argument < 0.0)
        argument = -argument;
    if (argument == INFINITY)
        return INFINITY;
    exponent = *(unsigned long long *) &argument >> 52;
    return (exponent & 2047) - 1023;
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# logb(±0)         = -INFINITY
# logb(±0.5)       = -1
# logb(±1)         = 0
# logb(±2)         = 1
# logb(±INFINITY)  = INFINITY
# logb(INDEFINITE) = INDEFINITE
# logb(x)          = floor(log2(fabs(x)))
.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
logb:
	movq	rcx, xmm0		# rcx = argument
	add	rcx, rcx		# rcx = argument << 1
					#     = |argument| << 1
	jz	.Lzero			# argument = ±0.0?
	mov	rax, rcx
	shr	rax, 53			# rax = biased exponent
	jz	.Ldenormal		# biased exponent = 0?
					# (argument denormal?)
	cmp	eax, BIAS * 2 + 1
	jne	.Lnormal		# biased exponent <> 2047?
					# (argument normal?)
	shl	rcx, 11			# rcx = fractional part of argument << 12
	jnz	.Lindefinite		# argument = INDEFINITE?
.Linfinity:				# argument = ±INFINITY
	mov	rax, 0x7FF0000000000000
	movq	xmm0, rax		# xmm0 = 0x1.0p+1024
					#      = INFINITY
	ret
.Lnormal:
	sub	eax, BIAS		# eax = biased exponent - 1023
					#     = unbiased exponent of argument
	cvtsi2sd xmm0, eax		# xmm0 = unbiased exponent of argument
	ret
.Ldenormal:
	bsr	rax, rcx		# rax = index of most significant '1' bit
					#     = biased exponent + 52
	sub	eax, BIAS + 52		# eax = unbiased exponent of argument
	cvtsi2sd xmm0, eax		# xmm0 = unbiased exponent of argument
	ret
.Lzero:
	mov	rax, 0xFFF0000000000000
	movq	xmm0, rax		# xmm0 = -0x1.0p+1024
					#      = -INFINITY
	ret
.Lindefinite:
	mov	rax, 0x7FF8000000000000
	movq	xmm0, rax		# xmm0 = 0x1.8p+1024
					#      = INDEFINITE
	ret
.size	logb, .-logb
.type	logb, @function
.global	logb
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/e4x82d9s.aspx
	.686
	.model	flat, C
	.code
logb	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = argument
	fxtract				; st(0) = mantissa
					;       = argument / 2.0**exponent,
					; st(1) = exponent
	fstp	st(0)			; st(0) = exponent
	ret
logb	endp
	end
ilogb() Function
ilogb() returns the integral part of the base-2 logarithm of the absolute value of its argument as an integer.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
int ilogb(double argument)
{
    int exponent = *(unsigned long long *) &argument >> 52;
    return argument == 0.0 ? -2147483648 : (exponent & 2047) - 1023;
}
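The C listing above reads the exponent field directly, which yields −1023 for every denormal argument, whereas the assembly listing that follows normalizes denormals first. A minimal sketch of the exponent extraction with denormal handling is given below. It assumes that double is IEEE 754 binary64 and unsigned long long has 64 bits; ilogb_sketch() is a hypothetical name, and memcpy() replaces the pointer cast to stay within the aliasing rules.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical variant of ilogb() that also handles denormal arguments. */
static int ilogb_sketch(double argument)
{
    unsigned long long bits, fraction;
    int exponent;

    memcpy(&bits, &argument, sizeof bits);
    bits <<= 1;                        /* drop the sign bit */
    if (bits == 0)
        return INT_MIN;                /* ±0.0: -2**31, as in the listing */
    exponent = (int) (bits >> 53);     /* biased exponent */
    if (exponent != 0)
        return exponent - 1023;        /* normal, INFINITY or NaN */
    /* denormal: value = fraction * 2**-1074, so the result is
       the index of the leading '1' bit of the fraction minus 1074 */
    fraction = bits >> 1;
    exponent = -1074;
    while (fraction >>= 1)
        exponent++;
    return exponent;
}
```

For INFINITY and NaN the sketch returns +1024, matching the values documented in the assembly listing's header comments.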
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# ilogb(±0)         = -2**31
# ilogb(±0.5)       = -1
# ilogb(±1)         = 0
# ilogb(±2)         = +1
# ilogb(±INFINITY)  = +1024
# ilogb(INDEFINITE) = +1024
# ilogb(x)          = floor(log2(fabs(x)))
.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
ilogb:
	movq	rcx, xmm0		# rcx = argument
	add	rcx, rcx		# rcx = argument << 1
					#     = |argument| << 1
	jz	.Lzero			# argument = ±0.0?
	mov	rax, rcx
	shr	rax, 53			# rax = biased exponent
	jz	.Ldenormal		# biased exponent = 0?
					# (argument denormal?)
.Lnormal:
	sub	eax, BIAS		# eax = biased exponent - 1023
					#     = unbiased exponent of argument
	ret
.Ldenormal:
	bsr	rax, rcx		# rax = index of most significant '1' bit
					#     = biased exponent + 52
	sub	eax, BIAS + 52		# eax = unbiased exponent of argument
	ret
.Lzero:
	mov	eax, -2147483648	# eax = -2**31
	ret
.size	ilogb, .-ilogb
.type	ilogb, @function
.global	ilogb
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/mt720719.aspx
	.686
	.model	flat, C
	.code
ilogb	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = argument
	fxtract				; st(0) = mantissa
					;       = argument / 2.0**exponent,
					; st(1) = exponent
	fstp	st(0)			; st(0) = exponent
					; (mantissa discarded, so the FPU stack
					;  is empty on return)
	push	eax
	fistp	dword ptr [esp]		; [esp] = exponent
	pop	eax			; eax = exponent
	ret
ilogb	endp
	end
machine e = 0x1.5BF0A8B145769p+1 = 2.7182818284590452 is 0x1.5355FB8AC404Ep−54 = 0.7228234458646251e−16 less than the exact value of e.
machine pi = 0x1.921FB54442D18p+1 = 3.1415926535897932 is 0x1.1A62633145C07p−53 = 1.2246467991473532e−16 less than the exact value of π.
 π = 3.14159265358979324
                 = 0x1.921FB54442D18p+1
        
remainder = 0x1'FFFFFFFFFFFFF'A61D414728C8B'C4F533
                      = quadrant 1, 0x0.0000000000000'59E2BEB8D7374'3B0ACD
                      = 0.00000000000000007796343665038750893128850032303923134435791966849159349864407171054711716273732946547170286066830158233642578125
                      = 7.79634366503875089e−17
                      = 0x1.678AFAE35CDD1p−54
        
reduced = 1.2246467991473532e−16
        
6381956970095103 × $2^{797}$ = 0x16AC5B262CA1FF × $2^{797}$ = 0x1.6AC5B262CA1FFp+849 = 5.319372648326541416707296656673541083813475…e+255
is the binary64 that is closest to a multiple of π/2 ???
        
remainder = quadrant 1, +2.983942503748065…e−19=0x1.604820E0811AA'802p−62
        
reduced = 4.68716592425462761112…e−19
        
cos(0x1.0p+120) = −0.92587902285483786730386176410741494673083320992866…
cos(2) = −0.41614683654714238699756822950076218976600077107554…
sin(22) = −0.00885130929040387592169025681577233246328920395133256644233083529808955201463… 22 = π × 7.002817496…
sin(1.0e+22) = −0.8522008497671888017727…
// Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
#if 0
double π = 3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651320158099543173816435024903010475699058520066023404996466884655031174235631524875783834033912401662638819210406589597660857119182803827759368407141697525266952615084492003288375582792204376643164609551001015890514555563284293526749024749733507633209228515625;
#else
double π = 0x3.243F6A8885A308D313198A2E03707344A4093822299F31D0082EFA98EC4E6C89452821E638D01377BE5466CF34E9p+0;
#endif
#ifdef _MSC_VER
#pragma intrinsic(_umul128)
#else
static inline
unsigned long long _umul128(unsigned long long x, unsigned long long y, unsigned long long *z)
{
#ifdef __amd64__
    unsigned long long l;
    __asm__ ("mulq\t%3"
            :"=a" (l), "=d" (*z)
            :"%0" (y), "rm" (x)
            :"cc");
    return l;
#else
    __uint128_t p = (__uint128_t) x * y;
    *z = p >> 64;
    return p;
#endif
}
#endif
int clzll(unsigned long long x);
long long llabs(long long x);
double fabs(double x);
double fma(double x, double y, double z);
double frexp(double x, int *z);
double ldexp(double x, int z);
double rint(double x);
int signbit(double x);
// reduce argument to interval [-π/4, π/4] and its quadrant
double __quarter_pi(double argument, int *quadrant)
{
    double z = fabs(argument);
    if (z < 262144.0) {
        if (z < 0.78539816339744831) {             // π/4
            *quadrant = 0;          // argument already in [-π/4, π/4]
            return argument;
        }
        // Cody-Waite argument reduction: 127 bits of π/2
#if _MSC_VER < 1914
        z = rint(argument * 0.63661977236758134);  // 2/π
        argument -= z * 1.5707963267923332;        // high part of π/2
        argument -= z * 5.1266883031679116e-12;    // middle part of π/2
        argument -= z * 2.1125998133974855e-23;    // low part of π/2
#else
        z = rint(argument * 0x1.45F306DC9C883p-1); // 2/π
        argument -= z * 0x1.921FB54440000p-0;      // high part of π/2
        argument -= z * 0x1.68C234C4C0000p-38;     // middle part of π/2
        argument -= z * 0x1.98A2E03707345p-76;     // low part of π/2
#endif
        *quadrant = (int) z;
        return argument;
    }
    // Payne-Hanek argument reduction:
    // 1216 bits of 2/π in 2.1214 fixed-point format
    static const unsigned long long pi2inverse[19] = {0x28BE60DB9391054A,
                                                      0x7F09D5F47D4D3770,
                                                      0x36D8A5664F10E410,
                                                      0x7F9458EAF7AEF158,
                                                      0x6DC91B8E909374B8,
                                                      0x01924BBA82746487,
                                                      0x3F877AC72C4A69CF,
                                                      0xBA208D7D4BAED121,
                                                      0x3A671C09AD17DF90,
                                                      0x4E64758E60D4CE7D,
                                                      0x272117E2EF7E4A0E,
                                                      0xC7FE25FFF7816603,
                                                      0xFBCBC462D6829B47,
                                                      0xDB4D9FB3C9F2C26D,
                                                      0xD3D18FD9A797FA8B,
                                                      0x5D49EEB1FAF97C5E,
                                                      0xCF41CE7DE294A4BA,
                                                      0x9AFED7EC47E35742,
                                                      0x1580CC11BF1EDAEA};
    unsigned long long high, mid, low, tmp, ull;
    unsigned index, shift;
    int exponent;
    double head, tail;
    // get fraction of |argument| in 0.64 fixed-point format and its
    // exponent
#if 0
    ull = (unsigned long long) (ldexp(frexp(z, &exponent), 64));
#else
    ull = (unsigned long long) (frexp(z, &exponent) * 0x1.0p+64);
#endif
    // get 192 bits of 2/π, determined by exponent of argument, in
    // 2.190 fixed-point format
    index = exponent >> 6;
    shift = exponent & 63; // (1 << 6) - 1
    high = pi2inverse[index];
    mid = pi2inverse[index + 1];
    low = pi2inverse[index + 2];
    tmp = pi2inverse[index + 3];
    if (shift != 0) {
        high = (high << shift) | (mid >> (64 - shift));
        mid = (mid << shift) | (low >> (64 - shift));
        low = (low << shift) | (tmp >> (64 - shift));
    }
    // compute fraction of |argument| * 2/π in 2.190 fixed-point format
    low = _umul128(ull, low, &tmp);
    low = tmp;
    mid = _umul128(ull, mid, &tmp) + low;
    tmp += mid < low;
    low = tmp;                      // carry into high word; keep mid intact
    high = _umul128(ull, high, &tmp) + low;
    tmp += high < low;
    // convert fraction of |argument| * 2/π in 2.190 fixed-point format
    // into fraction of ±argument * 2/π in 0.192 fixed-point format,
    // shifting fraction of |argument| * 2/π from interval [0.0, 1.0]
    // to fraction of ±argument * 2/π in interval [-0.5, 0.5],
    // set quadrant to integer part of |argument| * 2/π modulo 4,
    // and increment it when fraction of argument * 2/π is negative
    // (sign change is equivalent to subtraction of 1.0)
    *quadrant = (int) (tmp >> 62);
    tmp <<= 2;
    tmp |= high >> 62;
    high <<= 2;
    high |= mid >> 62;
    mid <<= 2;
    *quadrant += (long long) tmp < 0;
    // if argument is negative, complement fraction and mirror quadrant
    if (signbit(argument)) {
        mid = 0 - mid;
        high = 0 - high - (0 < mid);
        tmp = 0 - tmp - (0 < high);
        *quadrant = -*quadrant;
    }
    // convert fraction of argument * 2/π from 0.192 fixed-point format
    // into (intermediate) double-double format; complement tail part when
    // head part is negative, adjusting head part on overflow, i.e. when
    // tail part is 0x8000000000000000
    shift = clzll(llabs(tmp));
    if (shift > 11) {
        tmp <<= shift - 11;
        tmp |= high >> (64 - shift + 11);
        high <<= shift - 11;
        high |= mid >> (64 - shift + 11);
        mid <<= shift - 11;
    } else if (shift < 11) {
        high >>= 11 - shift;
        high |= tmp << (64 - 11 + shift);
        // arithmetic shift; a cast expression is not an lvalue
        tmp = (unsigned long long) ((long long) tmp >> (11 - shift));
    }
    if ((long long) tmp < 0) {
        high = 0 - high;
        tmp += high == 0x8000000000000000;
    }
    head = ldexp((double) (long long) tmp, shift - 11 - 64);
    tail = ldexp((double) (long long) high, shift - 11 - 128);
    // return remainder of argument / (π/2)
#ifdef FP_FAST_FMA
    double x = tail * 1.5707963267948966;
    double y = head * 6.123233995736766e-17;
    z = fma(head, 6.123233995736766e-17, -y)
      + fma(tail, 1.5707963267948966, -x)
      + tail * 6.123233995736766e-17;
    return fma(head, 1.5707963267948966, x + y + z);
#else
    return (tail * 6.123233995736766e-17
         + (tail * 1.5707963267948966 + head * 6.123233995736766e-17))
         + head * 1.5707963267948966;
#endif
}
static inline
double __sin_cos_core(double reduced, int quadrant)
{
    double square = reduced * reduced;
    double result = 1 & quadrant
                    // polynomial approximation of cosine on [-π/4, π/4]
                  ? ((((((-0x1.908B4EF9A7E2Ep-37 * square
                          +0x1.1EEB7C6903BA2p-29) * square
                          -0x1.27E4FA28F90C6p-22) * square
                          +0x1.A01A019F556D1p-16) * square
                          -0x1.6C16C16C16910p-10) * square
                          +0x1.5555555555555p-5) * square
                          -0.5) * square + 1.0
                    // polynomial approximation of sine on [-π/4, π/4]
                  : (((((+0x1.5E3C6B7EEB28Dp-33 * square
                         -0x1.AE60A561EEAB5p-26) * square
                         +0x1.71DE384036E7Dp-19) * square
                         -0x1.A01A019F1C947p-13) * square
                         +0x1.1111111110EB8p-7) * square
                         -0x1.5555555555555p-3) * square * reduced + reduced;
    return 2 & quadrant ? 0.0 - result : result;
}
// NOTE: 0.0 * x + x yields NaN for x = ±INFINITY
double cos(double x)
{
    int quadrant;
    double reduced = __quarter_pi(0.0 * x + x, &quadrant);
    return __sin_cos_core(reduced, 1 + quadrant);
}
double sin(double x)
{
    int quadrant;
    double reduced = __quarter_pi(0.0 * x + x, &quadrant);
    return __sin_cos_core(reduced, quadrant);
}
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double fma(double x, double y, double z);
static inline
double cos_poly(double x)
{
    double s = x * x;
#ifdef FP_FAST_FMA
    double t = -0x1.908B4EF9A7E2Ep-37;
    t = fma(t, s,  0x1.1EEB7C6903BA2p-29);
    t = fma(t, s, -0x1.27E4FA28F90C6p-22);
    t = fma(t, s,  0x1.A01A019F556D1p-16);
    t = fma(t, s, -0x1.6C16C16C16910p-10);
    t = fma(t, s,  0x1.5555555555555p-5);
    t = fma(t, s, -0.5);
    t = fma(t, s,  1.0);
    return t;
#else
#error
#endif
}
static inline
double cot_poly(double x)
{
    double s = x * x;
#ifdef FP_FAST_FMA
    double t = 0x1.2113ADD876256p-35;
    t = fma(t, s, 0x1.D3F62BCB56407p-33);
    t = fma(t, s, 0x1.3722FC10FB082p-29);
    t = fma(t, s, 0x1.7D86D9F6CBA62p-26);
    t = fma(t, s, 0x1.D6DC6DA8B4B97p-23);
    t = fma(t, s, 0x1.228059183E28Cp-19);
    t = fma(t, s, 0x1.66A8F2D1BC68Fp-16);
    t = fma(t, s, 0x1.BBD7793321936p-13);
    t = fma(t, s, 0x1.1566ABC011734p-9);
    t = fma(t, s, 0x1.6C16C16C16C16p-6);
    t = fma(t, s, 0x1.5555555555555p-2);
    return t * s;
#else
#error
#endif
}
static inline
double sin_poly(double x)
{
    double s = x * x;
#ifdef FP_FAST_FMA
#if 1
    double t = 0x1.5E3C6B7EEB28Dp-33;
    t = fma(t, s, -0x1.AE60A561EEAB5p-26);
    t = fma(t, s,  0x1.71DE384036E7Dp-19);
    t = fma(t, s, -0x1.A01A019F1C947p-13);
    t = fma(t, s,  0x1.1111111110EB8p-7);
    t = fma(t, s, -0x1.5555555555555p-3);
    t = fma(t * s, x, x);
    return t;
#else
    double t = 0x1.5D8E4FD051E03p-33;
    t = fma(t, s, -0x1.AE5E54BFD59F5p-26);
    t = fma(t, s,  0x1.71DE355F53FB7p-19);
    t = fma(t, s, -0x1.A01A019BF2621p-13);
    t = fma(t, s,  0x1.1111111110F75p-7);
    t = fma(t, s, -0x1.5555555555548p-3);
    t = fma(t * s, x, x);
    return t;
#endif
#else
    return ((((((((-0x1.2622B22D526BEp-57 * s
                   +0x1.94FA618796592p-49) * s
                   -0x1.AE7EA531357BFp-41) * s
                   +0x1.6124601C23966p-33) * s
                   -0x1.AE64567CB5786p-26) * s
                   +0x1.71DE3A5568A50p-19) * s
                   -0x1.A01A01A019FC7p-13) * s
                   +0x1.111111111110Fp-7) * s
                   -0x1.5555555555555p-3) * s * x + x;
#endif
}
static inline
double tan_poly(double x)
{
    double s = x * x;
#ifdef FP_FAST_FMA
    double t = 0x1.5D99C5B37B8FBp-16;
    t = fma(t, s, -0x1.778CB8106DD3Dp-15);
    t = fma(t, s,  0x1.7656FC1431EF6p-14);
    t = fma(t, s, -0x1.B6EA2534187CBp-16);
    t = fma(t, s,  0x1.1DF93999DC111p-13);
    t = fma(t, s,  0x1.D28899E55DABCp-13);
    t = fma(t, s,  0x1.37F87931093E9p-11);
    t = fma(t, s,  0x1.7D5B9D094E180p-10);
    t = fma(t, s,  0x1.D6D92E4DC1EC9p-9);
    t = fma(t, s,  0x1.226E1281741EDp-7);
    t = fma(t, s,  0x1.664F49A7087A7p-6);
    t = fma(t, s,  0x1.BA1BA1B472C71p-5);
    t = fma(t, s,  0x1.111111111823Fp-3);
    t = fma(t, s,  0x1.55555555554F2p-2);
    t = fma(t, s,  1.0);
    return t * x;
#else
    return ((((((((+0x1.5445F555134EDp-12 * s
                   +0x1.269BE400DE3AFp-11) * s
                   +0x1.7EEF631E20B93p-10) * s
                   +0x1.D6C27C371C959p-9) * s
                   +0x1.226E7BFA35090p-7) * s
                   +0x1.664F4729F98E5p-6) * s
                   +0x1.BA1BA1BDCEC06p-5) * s
                   +0x1.111111110E933p-3) * s
                   +0x1.5555555555568p-2) * s * x + x;
#endif
}
        cos() (Circular) Cosine Function
            cos() returns the (circular) cosine of its argument.
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/ff770589.aspx
	.686
	.model	flat, C
	.code
cos	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = argument
	fcos				; st(0) = cosine of argument
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jnp	Lexit			; |argument| < 2**63?
ifndef REDUCE
	fsub	st(0), st(0)		; st(0) = argument - argument
					;       = 0.0 (or INDEFINITE)
	fdiv	st(0), st(0)		; st(0) = INDEFINITE
else
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fldpi				; st(0) = pi,
					; st(1) = 1.0,
					; st(2) = argument
	fscale				; st(0) = pi * 2**1,
					; st(1) = 1.0,
					; st(2) = argument
	fstp	st(1)			; st(0) = pi * 2**1,
					; st(1) = argument
	fxch	st(1)			; st(0) = argument,
					; st(1) = pi * 2**1
Lreduce:
	fprem1				; st(0) = argument modulo (pi * 2**1)
					;       = argument',
					; st(1) = pi * 2**1
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lreduce			; |argument'| > pi?
	fstp	st(1)			; st(0) = argument'
	fcos				; st(0) = cosine of argument'
endif
Lexit:
	ret
cos	endp
	end
            Caveat: although the FSCALE instruction yields 2×π in
            double-extended (80-bit) precision, and the FPREM1 instruction
            operates in double-extended (80-bit) precision too, reduction of
            arguments greater than 2⁶³ in magnitude to the interval (-π, π)
            loses almost all precision: for example, 0x1.6AC5B262CA1FFp+849,
            the double-precision number closest to an integral multiple of π/2,
            is reduced to -0x1.E5C3B0F08A43A7B0p0 =
            -1.897517260289773073471397690781259370851330459117889404296875
            instead of 4.68716592425462761112…e−19!
        cot() (Circular) Cotangent Function
            cot() returns the (circular) cotangent of its argument.
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; cot(x) = 1 / tan(x)
;        = cos(x) / sin(x)
	.686
	.model	flat, C
	.code
cot	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = argument
	fptan				; st(0) = 1.0,
					; st(1) = tangent of argument
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jnp	Ldone			; |argument| < 2**63?
ifndef REDUCE
	fsub	st(0), st(0)		; st(0) = argument - argument
					;       = 0.0 (or INDEFINITE)
	fdiv	st(0), st(0)		; st(0) = INDEFINITE
	ret
else
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fldpi				; st(0) = pi,
					; st(1) = 1.0,
					; st(2) = argument
	fscale				; st(0) = pi * 2**1,
					; st(1) = 1.0,
					; st(2) = argument
	fstp	st(1)			; st(0) = pi * 2**1,
					; st(1) = argument
	fxch	st(1)			; st(0) = argument,
					; st(1) = pi * 2**1
Lreduce:
	fprem1				; st(0) = argument modulo (pi * 2**1)
					;       = argument',
					; st(1) = pi * 2**1
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lreduce			; |argument'| > pi?
	fstp	st(1)			; st(0) = argument'
	fptan				; st(0) = 1.0,
					; st(1) = tangent of argument'
endif
Ldone:
	fdivrp	st(1), st(0)		; st(0) = 1.0 / tangent of argument
					;       = cotangent of argument
	ret
cot	endp
	end
            Caveat: although the FSCALE instruction yields 2×π in
            double-extended (80-bit) precision, and the FPREM1 instruction
            operates in double-extended (80-bit) precision too, reduction of
            arguments greater than 2⁶³ in magnitude to the interval (-π, π)
            loses almost all precision: for example, 0x1.6AC5B262CA1FFp+849,
            the double-precision number closest to an integral multiple of π/2,
            is reduced to -0x1.E5C3B0F08A43A7B0p0 =
            -1.897517260289773073471397690781259370851330459117889404296875
            instead of 4.68716592425462761112…e−19!
        sin() (Circular) Sine Function
            sin() returns the (circular) sine of its argument.
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/ff770597.aspx
	.686
	.model	flat, C
	.code
sin	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = argument
	fsin				; st(0) = sine of argument
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jnp	Lexit			; |argument| < 2**63?
ifndef REDUCE
	fsub	st(0), st(0)		; st(0) = argument - argument
					;       = 0.0 (or INDEFINITE)
	fdiv	st(0), st(0)		; st(0) = INDEFINITE
else
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fldpi				; st(0) = pi,
					; st(1) = 1.0,
					; st(2) = argument
	fscale				; st(0) = pi * 2**1,
					; st(1) = 1.0,
					; st(2) = argument
	fstp	st(1)			; st(0) = pi * 2**1,
					; st(1) = argument
	fxch	st(1)			; st(0) = argument,
					; st(1) = pi * 2**1
Lreduce:
	fprem1				; st(0) = argument modulo (pi * 2**1)
					;       = argument',
					; st(1) = pi * 2**1
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lreduce			; |argument'| > pi?
	fstp	st(1)			; st(0) = argument'
	fsin				; st(0) = sine of argument'
endif
Lexit:
	ret
sin	endp
	end
            Caveat: although the FSCALE instruction yields 2×π in
            double-extended (80-bit) precision, and the FPREM1 instruction
            operates in double-extended (80-bit) precision too, reduction of
            arguments greater than 2⁶³ in magnitude to the interval (-π, π)
            loses almost all precision: for example, 0x1.6AC5B262CA1FFp+849,
            the double-precision number closest to an integral multiple of π/2,
            is reduced to -0x1.E5C3B0F08A43A7B0p0 =
            -1.897517260289773073471397690781259370851330459117889404296875
            instead of 4.68716592425462761112…e−19!
        tan() (Circular) Tangent Function
            tan() returns the (circular) tangent of its argument.
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/ff770595.aspx
	.686
	.model	flat, C
	.code
tan	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = argument
	fptan				; st(0) = 1.0,
					; st(1) = tangent of argument
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jnp	Ldone			; |argument| < 2**63?
ifndef REDUCE
	fsub	st(0), st(0)		; st(0) = argument - argument
					;       = 0.0 (or INDEFINITE)
	fdiv	st(0), st(0)		; st(0) = INDEFINITE
	ret
else
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fldpi				; st(0) = pi,
					; st(1) = 1.0,
					; st(2) = argument
	fscale				; st(0) = pi * 2**1,
					; st(1) = 1.0,
					; st(2) = argument
	fstp	st(1)			; st(0) = pi * 2**1,
					; st(1) = argument
	fxch	st(1)			; st(0) = argument,
					; st(1) = pi * 2**1
Lreduce:
	fprem1				; st(0) = argument modulo (pi * 2**1)
					;       = argument',
					; st(1) = pi * 2**1
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lreduce			; |argument'| > pi?
	fstp	st(1)			; st(0) = argument'
	fptan				; st(0) = 1.0,
					; st(1) = tangent of argument'
endif
Ldone:
	fstp	st(0)			; st(0) = tangent of argument
	ret
tan	endp
	end
            Caveat: although the FSCALE instruction yields 2×π in
            double-extended (80-bit) precision, and the FPREM1 instruction
            operates in double-extended (80-bit) precision too, reduction of
            arguments greater than 2⁶³ in magnitude to the interval (-π, π)
            loses almost all precision: for example, 0x1.6AC5B262CA1FFp+849,
            the double-precision number closest to an integral multiple of π/2,
            is reduced to -0x1.E5C3B0F08A43A7B0p0 =
            -1.897517260289773073471397690781259370851330459117889404296875
            instead of 4.68716592425462761112…e−19!
        acos() Arc Cosine Function
            acos() returns the (principal) arc cosine of its argument.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double copysign(double x, double y);
double fabs(double x);
double fma(double x, double y, double z);
int signbit(double x);
double sqrt(double x);
static inline
double asin_poly(double t)
{
    // for -0.5 <= t <= 0.5,
    // a minimax polynomial of degree 12 in t**2 approximates asin(t)
    double s = t * t;
    return (((((((((((+0x1.02FF4C7428A47p-5 * s
                      -0x1.032E75CCD4AE8p-6) * s
                      +0x1.3C0E0817E9742p-6) * s
                      +0x1.B0EF96B727E7Ep-8) * s
                      +0x1.8E3FD48D0FB6Fp-7) * s
                      +0x1.C70DDF81249FCp-7) * s
                      +0x1.1C6B5042EC6B2p-6) * s
                      +0x1.6E89F8578B64Ep-6) * s
                      +0x1.F1C72C5FD95BAp-6) * s
                      +0x1.6DB6DB407C2B3p-5) * s
                      +0x1.3333333375CD0p-4) * s
                      +0x1.55555555552F4p-3) * s * t + t;
}
double acos(double x)
{
    // for -1.0 <= x < -0.5, arccos(x) = (π/2 - asin_poly(sqrt((1 + x) / 2))) * 2
    // for -0.5 <= x <= 0.5, arccos(x) = π/2 - asin_poly(x)
    // for  0.5 <  x <= 1.0, arccos(x) = asin_poly(sqrt((1 - x) / 2)) * 2
    double z = fabs(x);
    int i = z > 0.5;
#ifdef OPTIONAL
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)
    if (x != x)
        return INDEFINITE;
    if (x == 1.0)
        return 0.0;
    if (x == 0.0)
        return 1.57079632679489662;      // π/2
    if (x == -1.0)
        return 3.14159265358979324;      // π
    if (z > 1.0)
        return INDEFINITE;
#endif
    if (i)
        z = sqrt(0.5 - 0.5 * z);
    z = asin_poly(z);
    z = copysign(z, x);
    if (i) {                             // |x| > 0.5?
        if (signbit(x)) {                // x < -0.5?
#ifdef FP_FAST_FMA
            z = fma(1.8656436928143307, 0.8419594442630920, z);
#else
            z += -0x1.5777A5CF72CECp-21; // tail of π/2
            z += 0x1.921FC00000000p-0;   // head of π/2
#endif
        }
        z += z;
    } else {
#ifdef FP_FAST_FMA
        z = fma(1.8656436928143307, 0.8419594442630920, -z);
#else
        z = -0x1.5777A5CF72CECp-21 - z;
        z += 0x1.921FC00000000p-0;
#endif
    }
    return z;
}
            Note: used within the fma() function, the product
            1.8656436928143307 × 1.6839188885261840 =
            0x1.DD9AD336A05p+0 × 0x1.AF154EEB562D6p+0 =
            0x1.921FB54442D18469898CC517p+1
            (courtesy of Norbert Juffa and Tor Myklebust) provides 104 bits of
            π, equivalent to 31 decimal places; the code above passes half of
            the second factor, 0.8419594442630920, to obtain π/2 instead.
         For floating-point numbers in the IEEE 754 32-bit binary
            single-precision format, the product
            1.8663789 × 1.6832556 = 0x1.DDCB02p+0F × 0x1.AEE9D6p+0F =
            0x1.921FB54442D6p+1
            provides 45 bits of π, equivalent to 13 decimal places, when
            used within the fmaf() function.
        
# Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
acos:
	mov	rax, 0x3FE0000000000000
	movq	xmm1, rax		# xmm1 = 0x1.0p-1
					#      = 0.5
	xorpd	xmm2, xmm2		# xmm2 = 0.0
	subsd	xmm2, xmm0		# xmm2 = -argument
	andpd	xmm2, xmm0		# xmm2 = |argument|
	xorpd	xmm0, xmm2		# xmm0 = (argument & -0.0) ? -0.0 : 0.0
	ucomisd	xmm1, xmm2		# CF = (0.5 < |argument|)
#	jp	.Lindefinite		# argument = INDEFINITE?
	sbb	eax, eax		# eax = (0.5 < |argument|) ? -1 : 0
	jnb	.Lhorner		# |argument| <= 0.5?
.Lbig:
	mulsd	xmm2, xmm1		# xmm2 = 0.5 * |argument|
	subsd	xmm1, xmm2		# xmm1 = 0.5 - 0.5 * |argument|
	sqrtsd	xmm2, xmm1		# xmm2 = sqrt(0.5 - 0.5 * |argument|)
					#      = argument'
.Lhorner:
	movsd	xmm1, xmm2		# xmm1 = argument'
	mulsd	xmm2, xmm2		# xmm2 = argument'**2
	mov	rcx, 0x3FA02FF4C7428A47
	movq	xmm3, rcx		# xmm3 = 0x1.02FF4C7428A47p-5
					#      = 0.031615876506539346
	mulsd	xmm3, xmm2
	mov	rdx, 0xBF9032E75CCD4AE8
	movq	xmm4, rdx		# xmm4 = -0x1.032E75CCD4AE8p-6
					#      = -0.015819182433299966
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3F93C0E0817E9742
	movq	xmm3, rcx		# xmm3 = 0x1.3C0E0817E9742p-6
					#      = 0.019290454772679107
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F7B0EF96B727E7E
	movq	xmm4, rdx		# xmm4 = 0x1.B0EF96B727E7Ep-8
					#      = 0.006606077476277171
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3F88E3FD48D0FB6F
	movq	xmm3, rcx		# xmm3 = 0x1.8E3FD48D0FB6Fp-7
					#      = 0.012153605255773773
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F8C70DDF81249FC
	movq	xmm4, rdx		# xmm4 = 0x1.C70DDF81249FCp-7
					#      = 0.013887151845016092
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3F91C6B5042EC6B2
	movq	xmm3, rcx		# xmm3 = 0x1.1C6B5042EC6B2p-6
					#      = 0.017359569912236146
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F96E89F8578B64E
	movq	xmm4, rdx		# xmm4 = 0x1.6E89F8578B64Ep-6
					#      = 0.022371761819320483
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3F9F1C72C5FD95BA
	movq	xmm3, rcx		# xmm3 = 0x1.F1C72C5FD95BAp-6
					#      = 0.030381959280381322
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FA6DB6DB407C2B3
	movq	xmm4, rdx		# xmm4 = 0x1.6DB6DB407C2B3p-5
					#      = 0.044642856813771024
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3FB3333333375CD0
	movq	xmm3, rcx		# xmm3 = 0x1.3333333375CD0p-4
					#      = 0.075000000003785816
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FC55555555552F4
	movq	xmm4, rdx		# xmm4 = 0x1.55555555552F4p-3
					#      = 0.166666666666649754
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
.if 0
	mov	rcx, 0x3FF0000000000000
	movq	xmm3, rcx		# xmm3 = 0x1.0p+0
					#      = 1.0
	addsd	xmm3, xmm4
	mulsd	xmm1, xmm3		# xmm1 = polynomial(argument')
.else
	mulsd	xmm4, xmm1
	addsd	xmm1, xmm4		# xmm1 = polynomial(argument')
.endif
	orpd	xmm0, xmm1		# xmm0 = polynomial(argument)
	test	eax, eax
	jz	.Lsmall			# |argument| <= 0.5?
	movmskpd eax, xmm0		# eax = (argument & -0.0) ? 0b?1 : 0b?0
	shr	eax, 1
	jnc	.Lpositive		# argument > 0.5?
.Lnegative:
	mov	rdx, 0x3FF921FC00000000
	movq	xmm1, rdx		# xmm1 = 0x1.921FC00000000p-0
					#      = 1.5707969665527344
					#      = head of pi/2
	addsd	xmm1, xmm0
	mov	rcx, 0xBEA5777A5CF72CEC
	movq	xmm0, rcx		# xmm0 = -0x1.5777A5CF72CECp-21
					#      = -6.3975783775576863e-7
					#      = tail of pi/2
	addsd	xmm0, xmm1		# xmm0 = pi/2 - polynomial(argument)
.Lpositive:
	addsd	xmm0, xmm0		# xmm0 = acos(argument)
	ret
.Lsmall:
	mov	rcx, 0xBEA5777A5CF72CEC
	movq	xmm1, rcx		# xmm1 = -0x1.5777A5CF72CECp-21
					#      = -6.3975783775576863e-7
					#      = tail of pi/2
	subsd	xmm1, xmm0
	mov	rdx, 0x3FF921FC00000000
	movq	xmm0, rdx		# xmm0 = 0x1.921FC00000000p-0
					#      = 1.5707969665527344
					#      = head of pi/2
	addsd	xmm0, xmm1		# xmm0 = pi/2 - polynomial(argument)
					#      = acos(argument)
	ret
.size	acos, .-acos
.type	acos, @function
.global	acos
.end
            The following implementation for the i387 FPU uses its FPATAN
            instruction and the formula
            arccos(argument) = arctan2(argument, sqrt(1 − argument²)),
            where arctan2(x, y) denotes arctan(y / x) as computed by FPATAN
            with x in st(0) and y in st(1), based upon the identities
            cos(result) = argument,
            sin²(result) + cos²(result) = 1
            and
            tan(result) = sin(result) / cos(result):
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/bztkwykh.aspx
; arccos(x) = arctan(sqrt((1 + x) * (1 - x)) / x)
;           = arctan(sqrt(1 - x**2) / x)
;           = arctan2(x, sqrt(1 - x**2))
;           = arctan2(x, sqrt((1 + x) * (1 - x)))
	.686
	.model	flat, C
	.code
acos	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = argument
if 0
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fadd	st(0), st(1)		; st(0) = 1.0 + argument,
					; st(1) = argument
	fld1				; st(0) = 1.0,
					; st(1) = 1.0 + argument,
					; st(2) = argument
	fsub	st(0), st(2)		; st(0) = 1.0 - argument,
					; st(1) = 1.0 + argument,
					; st(2) = argument
	fmulp	st(1), st(0)		; st(0) = (1.0 - argument) * (1.0 + argument)
					;       = 1.0 - argument**2,
					; st(1) = argument
else
	fld	st(0)			; st(0) = st(1) = argument
	fmul	st(0), st(0)		; st(0) = argument**2,
					; st(1) = argument
	fld1				; st(0) = 1.0,
					; st(1) = argument**2,
					; st(2) = argument
	fsubrp	st(1), st(0)		; st(0) = 1.0 - argument**2,
					; st(1) = argument
endif
	fsqrt				; st(0) = square root of (1.0 - argument**2),
					; st(1) = argument
	fxch	st(1)			; st(0) = argument,
					; st(1) = square root of (1.0 - argument**2)
	fpatan				; st(0) = inverse circular cosine of argument
	ret
acos	endp
	end
        acot() Arc Cotangent Function
            acot() returns the (principal) arc cotangent of its argument.
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; arccot(x) = arctan(1 / x)
;           = arctan2(x, 1)
	.686
	.model	flat, C
	.code
acot	proc	public			; [esp+4] = argument
	fld1				; st(0) = 1.0
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = 1.0
	fpatan				; st(0) = inverse circular tangent of (1.0 / argument)
					;       = inverse circular cotangent of argument
	ret
acot	endp
	end
        acot2() Arc Cotangent Function
            acot2() returns the (principal) arc cotangent of the quotient of
            its arguments.
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; arccot2(y, x) = arctan2(x, y)
	.686
	.model	flat, C
	.code
acot2	proc	public			; [esp+12] = denominator
					; [esp+4] = numerator
	fld	real8 ptr [esp+12]	; st(0) = denominator
	fld	real8 ptr [esp+4]	; st(0) = numerator,
					; st(1) = denominator
	fpatan				; st(0) = inverse circular tangent of (denominator / numerator)
					;       = inverse circular cotangent of (numerator / denominator)
	ret
acot2	endp
	end
        asin() Arc Sine Function
            asin() returns the (principal) arc sine of its argument.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double copysign(double x, double y);
double fabs(double x);
double fma(double x, double y, double z);
double sqrt(double x);
static inline
double asin_poly(double t)
{
    // for -0.5 <= t <= 0.5,
    // a minimax polynomial of degree 12 in t**2 approximates asin(t)
    double s = t * t;
    return (((((((((((+0x1.02FF4C7428A47p-5 * s
                      -0x1.032E75CCD4AE8p-6) * s
                      +0x1.3C0E0817E9742p-6) * s
                      +0x1.B0EF96B727E7Ep-8) * s
                      +0x1.8E3FD48D0FB6Fp-7) * s
                      +0x1.C70DDF81249FCp-7) * s
                      +0x1.1C6B5042EC6B2p-6) * s
                      +0x1.6E89F8578B64Ep-6) * s
                      +0x1.F1C72C5FD95BAp-6) * s
                      +0x1.6DB6DB407C2B3p-5) * s
                      +0x1.3333333375CD0p-4) * s
                      +0x1.55555555552F4p-3) * s * t + t;
}
double asin(double x)
{
    // for -1.0 <= x < -0.5, arcsin(x) = -π/2 + asin_poly(sqrt((1 + x) / 2)) * 2
    // for -0.5 <= x <= 0.5, arcsin(x) = asin_poly(x)
    // for  0.5 <  x <= 1.0, arcsin(x) =  π/2 - asin_poly(sqrt((1 - x) / 2)) * 2
    double z = fabs(x);
    int i = z > 0.5;
#ifdef OPTIONAL
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)
    if (x != x)
        return INDEFINITE;
    if (x == 0.0)
        return x;
    if (z == 1.0)
        return copysign(1.57079632679489662, x); // ±π/2
    if (z > 1.0)
        return INDEFINITE;
#endif
    if (i)
        z = sqrt(0.5 - 0.5 * z);
    z = asin_poly(z);
    if (i) {
#ifdef FP_FAST_FMA
        z = fma(1.8656436928143307, 0.8419594442630920, -2.0 * z);
#else
        z = 0x1.921FC00000000p-0 - (z + z);      // head of π/2
        z += -0x1.5777A5CF72CECp-21;             // tail of π/2
#endif
    }
    return copysign(z, x);
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
asin:
	mov	rax, 0x3FE0000000000000
	movq	xmm1, rax		# xmm1 = 0x1.0p-1
					#      = 0.5
	xorpd	xmm2, xmm2		# xmm2 = 0.0
	subsd	xmm2, xmm0		# xmm2 = -argument
	andpd	xmm2, xmm0		# xmm2 = |argument|
	xorpd	xmm0, xmm2		# xmm0 = (argument & -0.0) ? -0.0 : 0.0
	ucomisd	xmm1, xmm2		# CF = (0.5 < |argument|)
#	jp	.Lindefinite		# argument = INDEFINITE?
	sbb	eax, eax		# eax = (0.5 < |argument|) ? -1 : 0
	jnb	.Lhorner		# |argument| <= 0.5?
.Lbig:
	mulsd	xmm2, xmm1		# xmm2 = 0.5 * |argument|
	subsd	xmm1, xmm2		# xmm1 = 0.5 - 0.5 * |argument|
	sqrtsd	xmm2, xmm1		# xmm2 = sqrt(0.5 - 0.5 * |argument|)
					#      = argument'
.Lhorner:
	movsd	xmm1, xmm2		# xmm1 = argument'
	mulsd	xmm2, xmm2		# xmm2 = argument'**2
	mov	rcx, 0x3FA02FF4C7428A47
	movq	xmm3, rcx		# xmm3 = 0x1.02FF4C7428A47p-5
					#      = 0.031615876506539346
	mulsd	xmm3, xmm2
	mov	rdx, 0xBF9032E75CCD4AE8
	movq	xmm4, rdx		# xmm4 = -0x1.032E75CCD4AE8p-6
					#      = -0.015819182433299966
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3F93C0E0817E9742
	movq	xmm3, rcx		# xmm3 = 0x1.3C0E0817E9742p-6
					#      = 0.019290454772679107
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F7B0EF96B727E7E
	movq	xmm4, rdx		# xmm4 = 0x1.B0EF96B727E7Ep-8
					#      = 0.006606077476277171
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3F88E3FD48D0FB6F
	movq	xmm3, rcx		# xmm3 = 0x1.8E3FD48D0FB6Fp-7
					#      = 0.012153605255773773
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F8C70DDF81249FC
	movq	xmm4, rdx		# xmm4 = 0x1.C70DDF81249FCp-7
					#      = 0.013887151845016092
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3F91C6B5042EC6B2
	movq	xmm3, rcx		# xmm3 = 0x1.1C6B5042EC6B2p-6
					#      = 0.017359569912236146
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F96E89F8578B64E
	movq	xmm4, rdx		# xmm4 = 0x1.6E89F8578B64Ep-6
					#      = 0.022371761819320483
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3F9F1C72C5FD95BA
	movq	xmm3, rcx		# xmm3 = 0x1.F1C72C5FD95BAp-6
					#      = 0.030381959280381322
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FA6DB6DB407C2B3
	movq	xmm4, rdx		# xmm4 = 0x1.6DB6DB407C2B3p-5
					#      = 0.044642856813771024
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3FB3333333375CD0
	movq	xmm3, rcx		# xmm3 = 0x1.3333333375CD0p-4
					#      = 0.075000000003785816
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FC55555555552F4
	movq	xmm4, rdx		# xmm4 = 0x1.55555555552F4p-3
					#      = 0.166666666666649754
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
.if 0
	mov	rcx, 0x3FF0000000000000
	movq	xmm3, rcx		# xmm3 = 0x1.0p+0
					#      = 1.0
	addsd	xmm3, xmm4
	mulsd	xmm1, xmm3		# xmm1 = polynomial(argument')
.else
	mulsd	xmm4, xmm1
	addsd	xmm1, xmm4		# xmm1 = polynomial(argument')
.endif
	test	eax, eax
	jz	.Lsmall			# |argument| <= 0.5?
	addsd	xmm1, xmm1		# xmm1 = 2.0 * polynomial(argument')
	mov	rcx, 0x3FF921FC00000000
	movq	xmm2, rcx		# xmm2 = 0x1.921FC00000000p-0
					#      = 1.5707969665527344
					#      = head of pi/2
	subsd	xmm2, xmm1
	mov	rdx, 0xBEA5777A5CF72CEC
	movq	xmm1, rdx		# xmm1 = -0x1.5777A5CF72CECp-21
					#      = -6.3975783775576863e-7
					#      = tail of pi/2
	addsd	xmm1, xmm2		# xmm1 = pi/2 - 2.0 * polynomial(argument')
.Lsmall:
	orpd	xmm0, xmm1		# xmm0 = polynomial(argument)
					#      = asin(argument)
	ret
.size	asin, .-asin
.type	asin, @function
.global	asin
.end
            The following implementation for the i387 FPU uses its FPATAN
            instruction and the formula
            arcsin(argument) = arctan2(sqrt(1 − argument²), argument),
            where arctan2(x, y) denotes arctan(y / x) as computed by FPATAN
            with x in st(0) and y in st(1), based upon the identities
            sin(result) = argument,
            sin²(result) + cos²(result) = 1
            and
            tan(result) = sin(result) / cos(result):
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/txk32e70.aspx
; arcsin(x) = arctan(x / sqrt((1 + x) * (1 - x)))
;           = arctan(x / sqrt(1 - x**2))
;           = arctan2(sqrt(1 - x**2), x)
;           = arctan2(sqrt((1 + x) * (1 - x)), x)
	.686
	.model	flat, C
	.code
asin	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = argument
if 0
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fadd	st(0), st(1)		; st(0) = 1.0 + argument,
					; st(1) = argument
	fld1				; st(0) = 1.0,
					; st(1) = 1.0 + argument,
					; st(2) = argument
	fsub	st(0), st(2)		; st(0) = 1.0 - argument,
					; st(1) = 1.0 + argument,
					; st(2) = argument
	fmulp	st(1), st(0)		; st(0) = (1.0 - argument) * (1.0 + argument)
					;       = 1.0 - argument**2,
					; st(1) = argument
else
	fld	st(0)			; st(0) = st(1) = argument
	fmul	st(0), st(0)		; st(0) = argument**2,
					; st(1) = argument
	fld1				; st(0) = 1.0,
					; st(1) = argument**2,
					; st(2) = argument
	fsubrp	st(1), st(0)		; st(0) = 1.0 - argument**2,
					; st(1) = argument
endif
	fsqrt				; st(0) = square root of (1.0 - argument**2),
					; st(1) = argument
	fpatan				; st(0) = inverse circular sine of argument
	ret
asin	endp
	end
        atan() Arc Tangent Function
            atan() returns the (principal) arc tangent of its argument.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double copysign(double x, double y);
double fabs(double x);
double fma(double x, double y, double z);
int signbit(double x);
static inline
double atan_poly(double t)
{
    // for -1.0 <= t <= 1.0,
    // a minimax polynomial of degree 19 in t**2 approximates atan(t)
    double s = t * t;
#ifdef FP_FAST_FMA
    double r = -0x1.53E1D2A25FF34p-16;
    r = fma(r, s,  0x1.D3B63DBB65AF4p-13);
    r = fma(r, s, -0x1.312788DDE0801p-10);
    r = fma(r, s,  0x1.F9690C82492DBp-9);
    r = fma(r, s, -0x1.2CF5AABC7CEF3p-7);
    r = fma(r, s,  0x1.162B0B2A3BFCEp-6);
    r = fma(r, s, -0x1.A7256FEB6FC5Cp-6);
    r = fma(r, s,  0x1.171560CE4A483p-5);
    r = fma(r, s, -0x1.4F44D841450E1p-5);
    r = fma(r, s,  0x1.7EE3D3F36BB94p-5);
    r = fma(r, s, -0x1.AD32AE04A9FD1p-5);
    r = fma(r, s,  0x1.E17813D66954Fp-5);
    r = fma(r, s, -0x1.11089CA9A5BCDp-4);
    r = fma(r, s,  0x1.3B12B2DB51738p-4);
    r = fma(r, s, -0x1.745D022F8DC5Cp-4);
    r = fma(r, s,  0x1.C71C709DFE927p-4);
    r = fma(r, s, -0x1.2492491FA1744p-3);
    r = fma(r, s,  0x1.99999999840D2p-3);
    r = fma(r, s, -0x1.555555555544Cp-2);
    r = fma(r, s,  1.0);
    return r * t;
#else
    return ((((((((((((((((((-0x1.3CBF44A88555Fp-16 * s
                             +0x1.B81666EB938AFp-13) * s
                             -0x1.21F657F3915DAp-10) * s
                             +0x1.E5005F4C78C20p-9) * s
                             -0x1.2399E74A75E56p-7) * s
                             +0x1.0FF6A2A0D2286p-6) * s
                             -0x1.A1006DE22CDACp-6) * s
                             +0x1.14C4D24651F2Ep-5) * s
                             -0x1.4DEE09915F638p-5) * s
                             +0x1.7E4B31D8A55AEp-5) * s
                             -0x1.ACFE938E04FCAp-5) * s
                             +0x1.E16A933B73622p-5) * s
                             -0x1.11074E45F93E0p-4) * s
                             +0x1.3B1283C0CA0B1p-4) * s
                             -0x1.745CFD878FEE8p-4) * s
                             +0x1.C71C704FB4F9Fp-4) * s
                             -0x1.2492491E100BBp-3) * s
                             +0x1.999999997B9DDp-3) * s
                             -0x1.55555555553C5p-2) * s * t + t;
#endif
}
double atan(double x)
{
    // with arctan(-x)    = -arctan(x),
    //      arctan(1 / x) =  π/2 - arctan(x) for x > 0
    // and  arctan(1 / x) = -π/2 - arctan(x) for x < 0,
    // for       x < -1, arctan(x) = -π/2 - atan_poly(1 / x),
    // for -1 <= x <= 1, arctan(x) = atan_poly(x),
    // for  1 <  x,      arctan(x) =  π/2 - atan_poly(1 / x)
    double z = fabs(x);
    int i = z > 1.0;
#ifdef OPTIONAL
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)
    if (x != x)
        return INDEFINITE;
    if (z == INFINITY)
        return copysign(1.57079632679489662, x); // π/2
    if (x == 0.0)
        return x;
#endif
    if (i)
        z = 1.0 / z;
    z = atan_poly(z);
    if (i) {
#ifdef FP_FAST_FMA
        z = fma(1.8656436928143307, 0.8419594442630920, -z);
#else
        z = -0x1.5777A5CF72CECp-21 - z;          // tail of π/2
        z += 0x1.921FC00000000p-0;               // head of π/2
#endif
    }
    return copysign(z, x);
}
double atan2(double y, double x)
{
    double z;
#if 0
    if (fabs(x) > fabs(y))
        z = atan(y / x);
    else {
        z = atan(x / y);
        z = copysign(1.57079632679489662, z) - z;
    }
    if (signbit(x))
        z += copysign(3.14159265358979324, y);
#else
    int i;
    if (x == 0.0) {
        if (y > 0.0)
            return 1.57079632679489662;  // π/2
        if (y < 0.0)
            return -1.57079632679489662; // -π/2
        return signbit(x) ? copysign(3.14159265358979324, y) : y;
    }
    z = fabs(x);
    i = z < fabs(y);
    if (i)
        z /= fabs(y);
    else
        z = fabs(y) / z;
    z = atan_poly(z);
    if (i) {
#ifdef FP_FAST_FMA
        z = fma(1.8656436928143307, 0.8419594442630920, -z);
#else
        z = -0x1.5777A5CF72CECp-21 - z;  // tail of π/2
        z += 0x1.921FC00000000p-0;       // head of π/2
#endif
    }
    if (signbit(x)) {
#ifdef FP_FAST_FMA
        z = fma(1.8656436928143307, 1.6839188885261840, -z);
#else
        z = 3.14159265358979324 - z;     // π - z
#endif
    }
    z = copysign(z, y);                  // the result takes the sign of y
#endif
    return z;
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
atan:
	mov	rax, 0x3FF0000000000000
	movq	xmm1, rax		# xmm1 = 0x1.0p+0
					#      = 1.0
	xorpd	xmm2, xmm2		# xmm2 = 0.0
	subsd	xmm2, xmm0		# xmm2 = -argument
	andpd	xmm2, xmm0		# xmm2 = |argument|
	xorpd	xmm0, xmm2		# xmm0 = (argument & -0.0) ? -0.0 : 0.0
	ucomisd	xmm1, xmm2		# CF = (1.0 < |argument|)
#	jp	.Lindefinite		# argument = INDEFINITE?
	sbb	eax, eax		# eax = (1.0 < |argument|) ? -1 : 0
	jnb	.Lhorner		# |argument| <= 1.0?
.Lbig:
	divsd	xmm1, xmm2		# xmm1 = 1.0 / |argument|
	movsd	xmm2, xmm1		# xmm2 = argument'
.Lhorner:
	movsd	xmm1, xmm2		# xmm1 = argument'
	mulsd	xmm2, xmm2		# xmm2 = argument'**2
.ifdef ALTERNATE
	mov	rcx, 0xBEF53E1D2A25FF34
	movq	xmm3, rcx		# xmm3 = -0x1.53E1D2A25FF34p-16
					#      = -2.0258553044438107e-5
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F2D3B63DBB65AF4
	movq	xmm4, rdx		# xmm4 = 0x1.D3B63DBB65AF4p-13
					#      = 2.2302240345758279e-4
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBF5312788DDE0801
	movq	xmm3, rcx		# xmm3 = -0x1.312788DDE0801p-10
					#      = -1.1640717779930478e-3
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F6F9690C82492DB
	movq	xmm4, rdx		# xmm4 = 0x1.F9690C82492DBp-9
					#      = 3.8559749383629666e-3
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBF82CF5AABC7CEF3
	movq	xmm3, rcx		# xmm3 = -0x1.2CF5AABC7CEF3p-7
					#      = -9.1845592187165034e-3
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F9162B0B2A3BFCE
	movq	xmm4, rdx		# xmm4 = 0x1.162B0B2A3BFCEp-6
					#      = 1.6978035834597276e-2
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBF9A7256FEB6FC5C
	movq	xmm3, rcx		# xmm3 = -0x1.A7256FEB6FC5Cp-6
					#      = -2.5826796814495942e-2
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FA171560CE4A483
	movq	xmm4, rdx		# xmm4 = 0x1.171560CE4A483p-5
					#      = 3.4067811082715081e-2
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFA4F44D841450E1
	movq	xmm3, rcx		# xmm3 = -0x1.4F44D841450E1p-5
					#      = -4.0926382420509951e-2
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FA7EE3D3F36BB94
	movq	xmm4, rdx		# xmm4 = 0x1.7EE3D3F36BB94p-5
					#      = 4.6739496199157987e-2
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFAAD32AE04A9FD1
	movq	xmm3, rcx		# xmm3 = -0x1.AD32AE04A9FD1p-5
					#      = -5.2392330054601317e-2
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FAE17813D66954F
	movq	xmm4, rdx		# xmm4 = 0x1.E17813D66954Fp-5
					#      = 5.8773077721790849e-2
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFB11089CA9A5BCD
	movq	xmm3, rcx		# xmm3 = -0x1.11089CA9A5BCDp-4
					#      = -6.6658603633512573e-2
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FB3B12B2DB51738
	movq	xmm4, rdx		# xmm4 = 0x1.3B12B2DB51738p-4
					#      = 7.6922129305867837e-2
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFB745D022F8DC5C
	movq	xmm3, rcx		# xmm3 = -0x1.745D022F8DC5Cp-4
					#      = -9.0909012354005225e-2
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FBC71C709DFE927
	movq	xmm4, rdx		# xmm4 = 0x1.C71C709DFE927p-4
					#      = 0.11111110678749424
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFC2492491FA1744
	movq	xmm3, rcx		# xmm3 = -0x1.2492491FA1744p-3
					#      = -0.14285714271334815
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FC99999999840D2
	movq	xmm4, rdx		# xmm4 = 0x1.99999999840D2p-3
					#      = 0.19999999999755019
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFD555555555544C
	movq	xmm3, rcx		# xmm3 = -0x1.555555555544Cp-2
					#      = -0.3333333333333186
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
.else
	mov	rcx, 0xBEF3CBF44A88555F
	movq	xmm3, rcx		# xmm3 = -0x1.3CBF44A88555Fp-16
					#      = -1.8879600846307350e-5
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F2B81666EB938AF
	movq	xmm4, rdx		# xmm4 = 0x1.B81666EB938AFp-13
					#      = 2.0985007664581698e-4
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBF521F657F3915DA
	movq	xmm3, rcx		# xmm3 = -0x1.21F657F3915DAp-10
					#      = -0.0011061183148667248
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F6E5005F4C78C20
	movq	xmm4, rdx		# xmm4 = 0x1.E5005F4C78C20p-9
					#      = 0.003700267441887131
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBF82399E74A75E56
	movq	xmm3, rcx		# xmm3 = -0x1.2399E74A75E56p-7
					#      = -0.008898961958876555
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F90FF6A2A0D2286
	movq	xmm4, rdx		# xmm4 = 0x1.0FF6A2A0D2286p-6
					#      = 0.016599329773529202
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBF9A1006DE22CDAC
	movq	xmm3, rcx		# xmm3 = -0x1.A1006DE22CDACp-6
					#      = -0.025451762493231264
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FA14C4D24651F2E
	movq	xmm4, rdx		# xmm4 = 0x1.14C4D24651F2Ep-5
					#      = 0.033785258000135307
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFA4DEE09915F638
	movq	xmm3, rcx		# xmm3 = -0x1.4DEE09915F638p-5
					#      = -0.040762919127683650
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FA7E4B31D8A55AE
	movq	xmm4, rdx		# xmm4 = 0x1.7E4B31D8A55AEp-5
					#      = 0.046666715007784063
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFAACFE938E04FCA
	movq	xmm3, rcx		# xmm3 = -0x1.ACFE938E04FCAp-5
					#      = -0.052367485230348246
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FAE16A933B73622
	movq	xmm4, rdx		# xmm4 = 0x1.E16A933B73622p-5
					#      = 0.058766639292667358
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFB11074E45F93E0
	movq	xmm3, rcx		# xmm3 = -0x1.11074E45F93E0p-4
					#      = -0.066657357936108053
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FB3B1283C0CA0B1
	movq	xmm4, rdx		# xmm4 = 0x1.3B1283C0CA0B1p-4
					#      = 0.076921953831176962
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFB745CFD878FEE8
	movq	xmm3, rcx		# xmm3 = -0x1.745CFD878FEE8p-4
					#      = -0.090908995008245008
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FBC71C704FB4F9F
	movq	xmm4, rdx		# xmm4 = 0x1.C71C704FB4F9Fp-4
					#      = 0.111111105648261418
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFC2492491E100BB
	movq	xmm3, rcx		# xmm3 = -0x1.2492491E100BBp-3
					#      = -0.142857142667713294
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FC999999997B9DD
	movq	xmm4, rdx		# xmm4 = 0x1.999999997B9DDp-3
					#      = 0.199999999996591266
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFD55555555553C5
	movq	xmm3, rcx		# xmm3 = -0x1.55555555553C5p-2
					#      = -0.333333333333311110
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
.endif # ALTERNATE
.if 0
	mov	rdx, 0x3FF0000000000000
	movq	xmm4, rdx		# xmm4 = 0x1.0p+0
					#      = 1.0
	addsd	xmm4, xmm3
	mulsd	xmm1, xmm4		# xmm1 = polynomial(argument')
.else
	mulsd	xmm3, xmm1
	addsd	xmm1, xmm3		# xmm1 = polynomial(argument')
.endif
	test	eax, eax
	jz	.Lsmall			# |argument| <= 1.0?
	mov	rdx, 0x3FF921FC00000000
	movq	xmm2, rdx		# xmm2 = 0x1.921FC00000000p-0
					#      = 1.5707969665527344
					#      = head of pi/2
	subsd	xmm2, xmm1
	mov	rcx, 0xBEA5777A5CF72CEC
	movq	xmm1, rcx		# xmm1 = -0x1.5777A5CF72CECp-21
					#      = -6.3975783775576863e-7
					#      = tail of pi/2
	addsd	xmm1, xmm2		# xmm1 = pi/2 - polynomial(argument')
					#      = atan(|argument|)
.Lsmall:
	orpd	xmm0, xmm1		# xmm0 = atan(argument)
	ret
.size	atan, .-atan
.type	atan, @function
.global	atan
.end
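The listing above subtracts the polynomial result from π/2 in two steps, using a short "head" and a small "tail" constant. A minimal sketch of that step in C follows; the function name half_pi_minus() is hypothetical, the two constants are copied from the listing, and the interpretation that the pair carries more bits of π/2 than one double can hold is an assumption about the intent.

```c
#include <assert.h>
#include <math.h>

// Hypothetical helper: computes π/2 - z with π/2 split into two doubles.
double half_pi_minus(double z)
{
    const double head = 0x1.921FC00000000p+0;   // 1.5707969665527344
    const double tail = -0x1.5777A5CF72CECp-21; // -6.3975783775576863e-7

    assert(z >= 0.0 && z <= 1.0); // range of a reduced polynomial result
    return (head - z) + tail;     // same operation order as the assembly
}
```

With z = 0.0 the two constants add up to the double nearest to π/2, 0x1.921FB54442D18p+0.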
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/88c36t42.aspx
; arctan(x) = arctan2(x, 1)
	.686
	.model	flat, C
	.code
atan	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = argument
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fpatan				; st(0) = inverse circular tangent of (argument / 1.0)
	ret
atan	endp
	end
        atan2() Arc Tangent Function
            atan2() returns the (principal) arc tangent of the quotient of its arguments.
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/88c36t42.aspx
	.686
	.model	flat, C
	.code
atan2	proc	public			; [esp+12] = denominator
					; [esp+4] = numerator
	fld	real8 ptr [esp+4]	; st(0) = numerator
	fld	real8 ptr [esp+12]	; st(0) = denominator,
					; st(1) = numerator
	fpatan				; st(0) = inverse circular tangent of (numerator / denominator)
	ret
atan2	endp
	end
        cosh() Hyperbolic Cosine Function
            cosh() returns the hyperbolic cosine of its argument.
        coth() Hyperbolic Cotangent Function
            coth() returns the hyperbolic cotangent of its argument.
        sinh() Hyperbolic Sine Function
            sinh() returns the hyperbolic sine of its argument.
        tanh() Hyperbolic Tangent Function
            tanh() returns the hyperbolic tangent of its argument.
        acosh() Area Hyperbolic Cosine Function
            acosh() returns the inverse hyperbolic cosine of its argument.
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# arcosh(x) = log(x + sqrt((x + 1) * (x - 1)))
#           = log(x + sqrt(x**2 - 1))
.arch	generic64
.code64
.intel_syntax noprefix
.extern	log
.text
					# xmm0 = argument
acosh:
	mov	rax, 0x3FF0000000000000
	movq	xmm1, rax		# xmm1 = 0x1.0p+0
					#      = 1.0
	movsd	xmm2, xmm0		# xmm2 = argument
	mulsd	xmm0, xmm0		# xmm0 = argument**2
	subsd	xmm0, xmm1		# xmm0 = argument**2 - 1.0
	sqrtsd	xmm0, xmm0		# xmm0 = sqrt(argument**2 - 1.0)
	addsd	xmm0, xmm2		# xmm0 = sqrt(argument**2 - 1.0) + argument
	jmp	log
.size	acosh, .-acosh
.type	acosh, @function
.weak	acosh
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/dn465171.aspx
; arcosh(x) = log(x + sqrt((x + 1) * (x - 1)))
;           = log(x + sqrt(x**2 - 1))
	.686
	.model	flat, C
	.code
acosh	proc	public			; [esp+4] = argument
	fldln2				; st(0) = ln(2.0)
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = ln(2.0)
	fld	st(0)			; st(0) = argument,
					; st(1) = argument,
					; st(2) = ln(2.0)
	fmul	st(0), st(0)		; st(0) = argument**2,
					; st(1) = argument,
					; st(2) = ln(2.0)
	fld1				; st(0) = 1.0,
					; st(1) = argument**2,
					; st(2) = argument,
					; st(3) = ln(2.0)
	fsubp	st(1), st(0)		; st(0) = argument**2 - 1.0,
					; st(1) = argument,
					; st(2) = ln(2.0)
	fsqrt				; st(0) = sqrt(argument**2 - 1.0),
					; st(1) = argument,
					; st(2) = ln(2.0)
	faddp	st(1), st(0)		; st(0) = argument + sqrt(argument**2 - 1.0),
					; st(1) = ln(2.0)
	fyl2x				; st(0) = natural logarithm of (argument + sqrt(argument**2 - 1.0))
					;       = inverse hyperbolic cosine of argument
	ret
acosh	endp
	end
        acoth() Area Hyperbolic Cotangent Function
            acoth() returns the inverse hyperbolic cotangent of its argument.
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# arcoth(x) = log((x + 1) / (x - 1)) / 2
#           = log(1 + 2 / (x - 1)) / 2
.arch	generic64
.code64
.intel_syntax noprefix
.extern	log
.text
					# xmm0 = argument
acoth:
	mov	rax, 0x3FF0000000000000
	movq	xmm1, rax		# xmm1 = 0x1.0p+0
					#      = 1.0
	movsd	xmm2, xmm0		# xmm2 = argument
	addsd	xmm0, xmm1		# xmm0 = argument + 1.0
	subsd	xmm2, xmm1		# xmm2 = argument - 1.0
	divsd	xmm0, xmm2		# xmm0 = (argument + 1.0) / (argument - 1.0)
	call	log			# xmm0 = log((argument + 1.0) / (argument - 1.0))
	mov	rax, 0x3FE0000000000000
	movq	xmm1, rax		# xmm1 = 0x1.0p-1
					#      = 0.5
	mulsd	xmm0, xmm1		# xmm0 = acoth(argument)
	ret
.size	acoth, .-acoth
.type	acoth, @function
.weak	acoth
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; arcoth(x) = log((x + 1) / (x - 1)) / 2
;           = log(1 + 2 / (x - 1)) / 2
;           = log1p(2 / (x - 1)) / 2
	.686
	.model	flat, C
	.code
single	record	sign:1, exponent:8, mantissa:23
bias	equ	1 shl (width exponent - 1) - 1
acoth	proc	public			; [esp+4] = argument
	fldln2				; st(0) = ln(2.0)
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = ln(2.0)
	fld	st(0)			; st(0) = argument,
					; st(1) = argument,
					; st(2) = ln(2.0)
	fld1				; st(0) = 1.0,
					; st(1) = argument,
					; st(2) = argument,
					; st(3) = ln(2.0)
	fadd	st(2), st(0)		; st(0) = 1.0,
					; st(1) = argument,
					; st(2) = argument + 1.0,
					; st(3) = ln(2.0)
	fsubp	st(1), st(0)		; st(0) = argument - 1.0,
					; st(1) = argument + 1.0,
					; st(2) = ln(2.0)
	fdivp	st(1), st(0)		; st(0) = (argument + 1.0) / (argument - 1.0),
					; st(1) = ln(2.0)
	fyl2x				; st(0) = natural logarithm of ((argument + 1.0) / (argument - 1.0))
	push	(bias - 1) shl width mantissa
					; [esp] = 0x3F000000
					;       = 0.5F
	fmul	real4 ptr [esp]		; st(0) = inverse hyperbolic cotangent of argument
	pop	eax
	ret
acoth	endp
	end
        asinh() Area Hyperbolic Sine Function
            asinh() returns the inverse hyperbolic sine of its argument.
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# arsinh(x) = log(x + sqrt(x**2 + 1))
.arch	generic64
.code64
.intel_syntax noprefix
.extern	log
.text
					# xmm0 = argument
asinh:
	mov	rax, 0x3FF0000000000000
	movq	xmm1, rax		# xmm1 = 0x1.0p+0
					#      = 1.0
	movsd	xmm2, xmm0		# xmm2 = argument
	mulsd	xmm0, xmm0		# xmm0 = argument**2
	addsd	xmm0, xmm1		# xmm0 = argument**2 + 1.0
	sqrtsd	xmm0, xmm0		# xmm0 = sqrt(argument**2 + 1.0)
	addsd	xmm0, xmm2		# xmm0 = sqrt(argument**2 + 1.0) + argument
	jmp	log
.size	asinh, .-asinh
.type	asinh, @function
.weak	asinh
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/dn465168.aspx
; arsinh(x) = log(x + sqrt(x**2 + 1))
;           = log1p(x + sqrt(x**2 + 1) - 1)
;           = log1p(x + x**2 / (sqrt(x**2 + 1) + 1))
	.686
	.model	flat, C
	.code
asinh	proc	public			; [esp+4] = argument
	fldln2				; st(0) = ln(2.0)
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = ln(2.0)
	fld	st(0)			; st(0) = argument,
					; st(1) = argument,
					; st(2) = ln(2.0)
	fmul	st(0), st(0)		; st(0) = argument**2,
					; st(1) = argument,
					; st(2) = ln(2.0)
	fld1				; st(0) = 1.0,
					; st(1) = argument**2,
					; st(2) = argument,
					; st(3) = ln(2.0)
	faddp	st(1), st(0)		; st(0) = argument**2 + 1.0,
					; st(1) = argument,
					; st(2) = ln(2.0)
	fsqrt				; st(0) = sqrt(argument**2 + 1.0),
					; st(1) = argument,
					; st(2) = ln(2.0)
	faddp	st(1), st(0)		; st(0) = argument + sqrt(argument**2 + 1.0),
					; st(1) = ln(2.0)
	fyl2x				; st(0) = natural logarithm of (argument + sqrt(argument**2 + 1.0))
					;       = inverse hyperbolic sine of argument
	ret
asinh	endp
	end
        atanh() Area Hyperbolic Tangent Function
            atanh() returns the inverse hyperbolic tangent of its argument.
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# artanh(x) = log((1 + x) / (1 - x)) / 2
#           = log(1 + 2 * x / (1 - x)) / 2
.arch	generic64
.code64
.intel_syntax noprefix
.extern	log
.text
					# xmm0 = argument
atanh:
	mov	rax, 0x3FF0000000000000
	movq	xmm1, rax		# xmm1 = 0x1.0p+0
					#      = 1.0
	movsd	xmm2, xmm0		# xmm2 = argument
	addsd	xmm0, xmm1		# xmm0 = 1.0 + argument
	subsd	xmm1, xmm2		# xmm1 = 1.0 - argument
	divsd	xmm0, xmm1		# xmm0 = (1.0 + argument) / (1.0 - argument)
	call	log			# xmm0 = log((1.0 + argument) / (1.0 - argument))
	mov	rax, 0x3FE0000000000000
	movq	xmm1, rax		# xmm1 = 0x1.0p-1
					#      = 0.5
	mulsd	xmm0, xmm1		# xmm0 = atanh(argument)
	ret
.size	atanh, .-atanh
.type	atanh, @function
.weak	atanh
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/dn324930.aspx
; artanh(x) = log((1 + x) / (1 - x)) / 2
;           = log(1 + 2 * x / (1 - x)) / 2
;           = log1p(2 * x / (1 - x)) / 2
; artanh(x) = log((1 + x) / (1 - x)) / 2
;           = (log(1 + x) - log(1 - x)) / 2
;           = (log1p(x) - log1p(-x)) / 2
	.686
	.model	flat, C
	.code
single	record	sign:1, exponent:8, mantissa:23
bias	equ	1 shl (width exponent - 1) - 1
atanh	proc	public			; [esp+4] = argument
	fldln2				; st(0) = ln(2.0)
	fld1				; st(0) = 1.0,
					; st(1) = ln(2.0)
	fld1				; st(0) = 1.0,
					; st(1) = 1.0,
					; st(2) = ln(2.0)
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = 1.0,
					; st(3) = ln(2.0)
	fadd	st(2), st(0)		; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = 1.0 + argument,
					; st(3) = ln(2.0)
	fsubp	st(1), st(0)		; st(0) = 1.0 - argument,
					; st(1) = 1.0 + argument,
					; st(2) = ln(2.0)
	fdivp	st(1), st(0)		; st(0) = (1.0 + argument) / (1.0 - argument),
					; st(1) = ln(2.0)
	fyl2x				; st(0) = natural logarithm of ((1.0 + argument) / (1.0 - argument))
	push	(bias - 1) shl width mantissa
					; [esp] = 0x3F000000
					;       = 0.5F
	fmul	real4 ptr [esp]		; st(0) = inverse hyperbolic tangent of argument
	pop	eax
	ret
atanh	endp
	end
        fmax() Function
            fmax() returns its other argument if one argument is a
            NaN, else the larger of its
            arguments.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double fmax(double left, double right)
{
#ifdef QUIET
    return (left > right) || ((left == left) && (right != right)) ? left : right == right ? right : right + right;
#else
    return (left > right) || (right != right) ? left : right;
#endif
}
            Note: with the preprocessor macro
            QUIET defined, a signaling NaN is returned as a
            quiet NaN.
# Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = left
					# xmm1 = right
fmax:
	movsd	xmm2, xmm0		# xmm2 = left
	maxsd	xmm2, xmm1		# xmm2 = (left > right) ? left : right
					#      = (left # left) ? right : max(left, right)
	cmpsd	xmm1, xmm1, 3		# xmm1 = (right # right) ? ~0L : 0L
	andpd	xmm0, xmm1		# xmm0 = (right # right) ? left : 0L
	andnpd	xmm1, xmm2		# xmm1 = (right # right) ? 0L : max(left, right)
	orpd	xmm0, xmm1		# xmm0 = (right # right) ? left : max(left, right)
					#      = fmax(left, right)
	ret
.size	fmax, .-fmax
.type	fmax, @function
.global	fmax
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/mt720717.aspx
	.686
	.model	flat, C
	.code
fmax	proc	public			; [esp+12] = right
					; [esp+4] = left
	fld	real8 ptr [esp+4]	; st(0) = left
	fld	real8 ptr [esp+12]	; st(0) = right,
					; st(1) = left
	fucomi	st(0), st(0)		; eflags = right ><=# right
	fcmovu	st(0), st(1)		; st(0) = (right # right) ? left : right,
					; st(1) = left
if 0
	fld	st(1)			; st(0) = left,
					; st(1) = (right # right) ? left : right,
					; st(2) = left
	fucomip	st(0), st(1)		; eflags = left ><=# ((right # right) ? left : right),
					; st(0) = (right # right) ? left : right,
					; st(1) = left
	fcmovnb	st(0), st(1)		; st(0) = (left < right) ? right : left,
					; st(1) = left
else
	fxch	st(1)			; st(0) = left,
					; st(1) = (right # right) ? left : right
	fucomi	st(0), st(1)		; eflags = left ><=# ((right # right) ? left : right)
	fcmovb	st(0), st(1)		; st(0) = (left < right) ? right : left,
					; st(1) = (right # right) ? left : right
endif
	fstp	st(1)			; st(0) = fmax(left, right)
	ret
fmax	endp
	end
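The NaN rule stated in the description of fmax() can be written out with explicit tests. The following is a minimal sketch with the hypothetical name fmax_sketch(); it relies only on the fact that a NaN compares unequal to itself and makes no attempt to order +0 and -0.

```c
#include <assert.h>
#include <math.h>

// Hypothetical helper: maximum that ignores a single NaN argument.
double fmax_sketch(double left, double right)
{
    if (left != left)   // left is a NaN: return the other argument
        return right;
    if (right != right) // right is a NaN: return the other argument
        return left;
    return left > right ? left : right;
}
```

A NaN is returned only when both arguments are NaNs.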
        fmin() Function
            fmin() returns its other argument if one argument is a
            NaN, else the smaller of its
            arguments.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double fmin(double left, double right)
{
#ifdef QUIET
    return (left < right) || ((left == left) && (right != right)) ? left : right == right ? right : right + right;
#else
    return (left < right) || (right != right) ? left : right;
#endif
}
            Note: with the preprocessor macro
            QUIET defined, a signaling NaN is returned as a
            quiet NaN.
# Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = left
					# xmm1 = right
fmin:
	movsd	xmm2, xmm0		# xmm2 = left
	minsd	xmm2, xmm1		# xmm2 = (left < right) ? left : right
					#      = (left # left) ? right : min(left, right)
	cmpsd	xmm1, xmm1, 3		# xmm1 = (right # right) ? ~0L : 0L
	andpd	xmm0, xmm1		# xmm0 = (right # right) ? left : 0L
	andnpd	xmm1, xmm2		# xmm1 = (right # right) ? 0L : min(left, right)
	orpd	xmm0, xmm1		# xmm0 = (right # right) ? left : min(left, right)
					#      = fmin(left, right)
	ret
.size	fmin, .-fmin
.type	fmin, @function
.global	fmin
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/mt720716.aspx
	.686
	.model	flat, C
	.code
fmin	proc	public			; [esp+12] = right
					; [esp+4] = left
	fld	real8 ptr [esp+4]	; st(0) = left
	fld	real8 ptr [esp+12]	; st(0) = right,
					; st(1) = left
	fucomi	st(0), st(0)		; eflags = right ><=# right
	fcmovu	st(0), st(1)		; st(0) = (right # right) ? left : right,
					; st(1) = left
	fucomi	st(0), st(1)		; eflags = ((right # right) ? left : right) ><=# left
	fcmovnb	st(0), st(1)		; st(0) = (left < right) ? left : right,
					; st(1) = left
	fstp	st(1)			; st(0) = fmin(left, right)
	ret
fmin	endp
	end
        hypot() Function
            hypot() returns +∞ if one of its arguments is ±∞, even if the
            other argument is a NaN, else the square root of the sum of
            the squares of its arguments,
            √(a² + b²),
            which is occasionally called the Pythagorean sum.
// Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)
double hypot(double p, double q)
{
    double r, s;
    if (p < 0.0)
        p = -p;
    if (q < 0.0)
        q = -q;
    if (p < q)
        r = q, q = p, p = r;
    if ((p == INFINITY) || (q == INFINITY))
        return INFINITY;
    if ((p != p) || (q != q))   // a NaN would never terminate the loop below
        return INDEFINITE;
    if (p == 0.0)
        return p;
    if (q == 0.0)
        return p;
    for (;;) {
        r = q / p;
        r *= r;
        s = r + 4.0;
        if (s == 4.0)
            return p;
        r /= s;
        p += p * (r + r);
        q *= r;
    }
}
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
#define INFINITY (1.0 / 0.5e-323)
double fabs(double x);
double fma(double x, double y, double z);
double sqrt(double x);
double hypot(double left, double right)
{
    double tmp;
    right = fabs(right);
    if ((right == INFINITY) || (left == 0.0))
        return right;
    left = fabs(left);
    if ((left == INFINITY) || (right == 0.0))
        return left;
    if (left < right)
        tmp = right, right = left, left = tmp;
    right /= left;
#ifdef FP_FAST_FMA
    right = fma(right, right, 1.0);
    tmp = sqrt(right);
    right = fma(-tmp, tmp, right) / (tmp + tmp);
    return fma(left, tmp, left * right);
#else
    return left * sqrt(1.0 + right * right);
#endif
}
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
#define INFINITY 0x1.0p+1024
double fabs(double x);
double sqrt(double x);
double hypot(double left, double right)
{
    double tmp;
    right = fabs(right);
    if (right == INFINITY)
        return right;
    left = fabs(left);
    if (left == INFINITY)
        return left;
    if (left > right)
        tmp = right, right = left, left = tmp;
    if (left < right * 0x1.6A09E667F3BCDp-27)    // sqrt(0x1.0p-53)
        return right;
    if (left < 0x1.0p-511) {                     // sqrt(0x1.0p-1022)
        tmp = 0x1.0p-511;                        // scale up to prevent underflow
        left *= 0x1.0p+511;
        right *= 0x1.0p+511;
    } else if (right > 0x1.6A09E667F3BCCp+511) { // sqrt(0x1.0p+1023)
        tmp = 0x1.0p+511;                        // scale down to prevent overflow
        left *= 0x1.0p-511;
        right *= 0x1.0p-511;
    } else
        tmp = 1.0;
#if 1
    double delta, hypot = sqrt(left * left + right * right);
    if (hypot > 2.0 * left) {
        delta = hypot - right;
        hypot -= (2.0 * delta * (right - 2.0 * left) + (4.0 * delta - left) * left + delta * delta) / (2.0 * hypot);
    } else {
        delta = hypot - left;
        hypot -= ((2.0 * delta - right) * right + (delta - 2.0 * (right - left)) * delta) / (2.0 * hypot);
    }
    return tmp * hypot;
#else
    return tmp * sqrt(left * left + right * right);
#endif
}
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
#define INFINITY 0x1.0p+1024
double fabs(double x);
double fma(double x, double y, double z);
double frexp(double x, int *z);
double ldexp(double x, int z);
double sqrt(double x);
double hypot(double left, double right)
{
    double tmp;
    int exponent;
    right = fabs(right);
    if (right == INFINITY)
        return right;
    left = fabs(left);
    if (left == INFINITY)
        return left;
    if (left < right)
        tmp = right, right = left, left = tmp;
    left = frexp(left, &exponent);
    right = ldexp(right, -exponent);
#ifdef FP_FAST_FMA
    return ldexp(sqrt(fma(left, left, right * right)), exponent);
#else
    return ldexp(sqrt(left * left + right * right), exponent);
#endif
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# hypot(a, ±INFINITY)  = +INFINITY
# hypot(a, INDEFINITE) = INDEFINITE
# hypot(a, ±0)         = |a|
# hypot(a, b)          = hypot(a, -b)
#                      = hypot(b, a)
# hypot(a, b)          = sqrt(a**2 + b**2)
#                      = sqrt(1 + (b / a)**2) * |a|
#                      = sqrt(1 + (min(|a|, |b|) / max(|a|, |b|))**2) * max(|a|, |b|)
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = left
					# xmm1 = right
hypot:
	xorpd	xmm2, xmm2		# xmm2 = 0.0
	ucomisd	xmm2, xmm1
	subsd	xmm2, xmm0		# xmm2 = -left
	andpd	xmm0, xmm2		# xmm0 = |left|
	jz	.Lleft			# right = ±0.0?
					# right = INDEFINITE?
	xorpd	xmm2, xmm2		# xmm2 = 0.0
	ucomisd	xmm2, xmm0
	subsd	xmm2, xmm1		# xmm2 = -right
	andpd	xmm1, xmm2		# xmm1 = |right|
	jz	.Lright			# left = ±0.0?
					# left = INDEFINITE?
	movsd	xmm2, xmm0
	minsd	xmm0, xmm1		# xmm0 = min(|left|, |right|)
	maxsd	xmm1, xmm2		# xmm1 = max(|left|, |right|)
	divsd	xmm0, xmm1		# xmm0 = min(|left|, |right|)
					#      / max(|left|, |right|)
	mov	rax, 0x3FF0000000000000
	movq	xmm2, rax		# xmm2 = 1.0
	mulsd	xmm0, xmm0		# xmm0 = (min(|left|, |right|)
					#       / max(|left|, |right|))**2
	addsd	xmm0, xmm2		# xmm0 = (min(|left|, |right|)
					#       / max(|left|, |right|))**2 + 1.0
	sqrtsd	xmm0, xmm0		# xmm0 = sqrt((min(|left|, |right|)
					#            / max(|left|, |right|))**2 + 1.0)
	mulsd	xmm0, xmm1		# xmm0 = hypot(left, right)
	ret
.Lleft:
	jnp	.Lexit			# right <> INDEFINITE?
					# (right = ±0.0?)
	mov	rax, 0x7FF0000000000000
	movq	xmm2, rax		# xmm2 = 0x1.0p+1024
					#      = INFINITY
	ucomisd	xmm2, xmm0
	je	.Lexit			# left = ±INFINITY?
.Linfinity:
.Lcommon:
	movsd	xmm0, xmm1		# xmm0 = |right|
	ret
.Lright:
	jnp	.Lcommon		# left <> INDEFINITE?
					# (left = ±0.0?)
	mov	rax, 0x7FF0000000000000
	movq	xmm2, rax		# xmm2 = 0x1.0p+1024
					#      = INFINITY
	ucomisd	xmm2, xmm1
	je	.Linfinity		# right = ±INFINITY?
.Lindefinite:
.Lexit:
	ret
.size	hypot, .-hypot
.type	hypot, @function
.global	hypot
.end
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# hypot(a, ±INFINITY)  = +INFINITY
# hypot(a, INDEFINITE) = INDEFINITE
# hypot(a, ±0)         = |a|
# hypot(a, b)          = hypot(a, -b)
#                      = hypot(b, a)
# hypot(a, b)          = sqrt(a**2 + b**2)
#                      = sqrt((max(|a|, |b|) * 2**c)**2 + (min(|a|, |b|) * 2**c)**2) / 2**c
.arch	generic64
.code64
.intel_syntax noprefix
.text
.equ	BIAS, 1023			# exponent bias of IEEE 754 double precision
					# xmm0 = left
					# xmm1 = right
hypot:
	xorpd	xmm2, xmm2		# xmm2 = 0.0
	ucomisd	xmm2, xmm1
	subsd	xmm2, xmm0		# xmm2 = -left
	andpd	xmm0, xmm2		# xmm0 = |left|
	jz	.Lleft			# right = ±0.0?
					# right = INDEFINITE?
	xorpd	xmm2, xmm2		# xmm2 = 0.0
	ucomisd	xmm2, xmm0
	subsd	xmm2, xmm1		# xmm2 = -right
	andpd	xmm1, xmm2		# xmm1 = |right|
	jz	.Lright			# left = ±0.0?
					# left = INDEFINITE?
	movsd	xmm2, xmm0
	maxsd	xmm0, xmm1		# xmm0 = max(|left|, |right|)
					#      = left'
	minsd	xmm1, xmm2		# xmm1 = min(|left|, |right|)
					#      = right'
	movq	rax, xmm0
	shr	rax, 54
	shl	eax, 2			# eax = biased exponent of left'
	mov	ecx, BIAS * 2 - 1
	sub	ecx, eax		# ecx = 2045
					#     - biased exponent of left'
					#     = biased exponent of (normalized) scale factor
					#     = {1, 5, 9, ..., 2045}
	inc	eax			# eax = biased exponent of reciprocal scale factor
	shl	rcx, 52
	shl	rax, 52
	movq	xmm2, rcx		# xmm2 = (normalized) scale factor
.ifdef SSE4_1
	unpcklpd xmm2, xmm2
	unpcklpd xmm0, xmm1		# xmm0[63:0] = left',
					# xmm0[127:64] = right'
	mulpd	xmm0, xmm2		# xmm0[63:0] = left' * scale factor,
					# xmm0[127:64] = right' * scale factor
	dppd	xmm0, xmm0, 0x31	# xmm0 = (left' * scale factor)**2
					#      + (right' * scale factor)**2
					#      = (left'**2 + right'**2) * scale factor**2
.else
	mulsd	xmm0, xmm2		# xmm0 = left' * scale factor
	mulsd	xmm1, xmm2		# xmm1 = right' * scale factor
	mulsd	xmm0, xmm0		# xmm0 = (left' * scale factor)**2
	mulsd	xmm1, xmm1		# xmm1 = (right' * scale factor)**2
	addsd	xmm0, xmm1		# xmm0 = (left' * scale factor)**2
					#      + (right' * scale factor)**2
					#      = (left'**2 + right'**2) * scale factor**2
.endif
	sqrtsd	xmm0, xmm0		# xmm0 = sqrt(left'**2 + right'**2) * scale factor
	movq	xmm1, rax		# xmm1 = reciprocal scale factor
	mulsd	xmm0, xmm1		# xmm0 = sqrt(left'**2 + right'**2)
					#      = hypot(left, right)
	ret
.Lleft:
	jnp	.Lexit			# right <> INDEFINITE?
					# (right = ±0.0?)
	mov	rax, 0x7FF0000000000000
	movq	xmm2, rax		# xmm2 = 0x1.0p+1024
					#      = INFINITY
	ucomisd	xmm2, xmm0
	je	.Lexit			# left = ±INFINITY?
.Linfinity:
.Lcommon:
	movsd	xmm0, xmm1		# xmm0 = |right|
	ret
.Lright:
	jnp	.Lcommon		# left <> INDEFINITE?
					# (left = ±0.0?)
	mov	rax, 0x7FF0000000000000
	movq	xmm2, rax		# xmm2 = 0x1.0p+1024
					#      = INFINITY
	ucomisd	xmm2, xmm1
	je	.Linfinity		# right = ±INFINITY?
.Lindefinite:
.Lexit:
	ret
.size	hypot, .-hypot
.type	hypot, @function
.global	hypot
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/a9yb3dbt.aspx
; hypot(x, ±INFINITY)  = +INFINITY
; hypot(x, INDEFINITE) = INDEFINITE
; hypot(x, ±0)         = |x|
; hypot(x, y)          = hypot(x, -y)
;                      = hypot(y, x)
; hypot(x, y)          = sqrt(x**2 + y**2)
;                      = sqrt((max(|x|, |y|) / 2**z)**2 + (min(|x|, |y|) / 2**z)**2) * 2**z
	.686
	.model	flat, C
	.code
hypot	proc	public			; [esp+12] = right
					; [esp+4] = left
	fld	real8 ptr [esp+4]	; st(0) = left
	ftst
	fstsw	ax			; ax = FPU status word
					; B  C3  TOP  C2  C1  C0  low byte
					; .  0   ...  0   .   0   ........  st(0) > 0.0
					; .  0   ...  0   .   1   ........  st(0) < 0.0
					; .  1   ...  0   .   0   ........  st(0) = 0.0
					; .  1   ...  1   .   1   ........  st(0) # 0.0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
					; CF (carry flag)  = C0
					;                    C1
					; PF (parity flag) = C2
					; ZF (zero flag)   = C3
					; AF (adjust flag) = .
					; SF (sign flag)   = B(usy)
	fld	real8 ptr [esp+12]	; st(0) = right,
					; st(1) = left
	fabs				; st(0) = |right|,
					; st(1) = left
	jz	Lspecial		; left = ±0.0?
					; left = INDEFINITE?
	fxch	st(1)			; st(0) = left,
					; st(1) = |right|
	fabs				; st(0) = |left|,
					; st(1) = |right|
	fucom	st(1)
	fstsw	ax			; ax = FPU status word
					; B  C3  TOP  C2  C1  C0  low byte
					; .  0   ...  0   .   0   ........  st(0) > st(1)
					; .  0   ...  0   .   1   ........  st(0) < st(1)
					; .  1   ...  0   .   0   ........  st(0) = st(1)
					; .  1   ...  1   .   1   ........  st(0) # st(1)
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lunordered		; |right| = INDEFINITE?
	jnb	Lscale			; |left| >= |right|?
Lbelow:
	fxch	st(1)			; st(0) = max(|left|, |right|)
					;       = left',
					; st(1) = min(|left|, |right|)
					;       = right'
Lscale:
	fxtract				; st(0) = left' / 2**exponent,
					; st(1) = exponent,
					; st(2) = right'
	fmul	st(0), st(0)		; st(0) = (left' / 2**exponent)**2,
					; st(1) = exponent,
					; st(2) = right'
	fxch	st(2)			; st(0) = right',
					; st(1) = exponent,
					; st(2) = (left' / 2**exponent)**2
	fld	st(1)			; st(0) = exponent,
					; st(1) = right',
					; st(2) = exponent,
					; st(3) = (left' / 2**exponent)**2
	fchs				; st(0) = -exponent,
					; st(1) = right',
					; st(2) = exponent,
					; st(3) = (left' / 2**exponent)**2
	fxch	st(1)			; st(0) = right',
					; st(1) = -exponent,
					; st(2) = exponent,
					; st(3) = (left' / 2**exponent)**2
	fscale				; st(0) = right' * 2**-exponent
					;       = right' / 2**exponent,
					; st(1) = -exponent,
					; st(2) = exponent,
					; st(3) = (left' / 2**exponent)**2
	fstp	st(1)			; st(0) = right' / 2**exponent,
					; st(1) = exponent,
					; st(2) = (left' / 2**exponent)**2
	fmul	st(0), st(0)		; st(0) = (right' / 2**exponent)**2,
					; st(1) = exponent,
					; st(2) = (left' / 2**exponent)**2
	faddp	st(2), st(0)		; st(0) = exponent,
					; st(1) = (left' / 2**exponent)**2
					;       + (right' / 2**exponent)**2
					;       = (left'**2 + right'**2) / (2**exponent)**2
	fxch	st(1)			; st(0) = (left' / 2**exponent)**2
					;       + (right' / 2**exponent)**2
					;       = (left'**2 + right'**2) / (2**exponent)**2,
					; st(1) = exponent
	fsqrt				; st(0) = sqrt(left'**2 + right'**2) / 2**exponent,
					; st(1) = exponent
	fscale				; st(0) = sqrt(left'**2 + right'**2),
					; st(1) = exponent
	fstp	st(1)			; st(0) = hypot(left, right)
	ret
;;Lunordered:
;;	fxam
;;	fstsw	ax			; ax = FPU status word,
;;					; ah = B:C3:T:O:P:C2:C1:C0
;;	and	ah, 0x45
;;	cmp	ah, 0x05
;;	jne	Lindefinite		; |left| <> INFINITY?
;;Linfinity:
;;	fstp	st(1)			; st(0) = |left|
;;					;       = INFINITY
;;					;       = hypot(±INFINITY, right)
;;	ret
Lspecial:
	jnp	Lzero			; left <> INDEFINITE?
					; left = ±0.0?
Lunordered:
	fxam
	fstsw	ax			; ax = FPU status word
					; B  C3  TOP  C2  C1  C0  low byte
					; .  0   ...  0   0   0   ........  st(0) = +unsupported
					; .  0   ...  0   1   0   ........  st(0) = -unsupported
					; .  0   ...  0   0   1   ........  st(0) = +indefinite
					; .  0   ...  0   1   1   ........  st(0) = -indefinite
					; .  0   ...  1   0   0   ........  st(0) = +finite
					; .  0   ...  1   1   0   ........  st(0) = -finite
					; .  0   ...  1   0   1   ........  st(0) = +infinity
					; .  0   ...  1   1   1   ........  st(0) = -infinity
					; .  1   ...  0   0   0   ........  st(0) = +0.0
					; .  1   ...  0   1   0   ........  st(0) = -0.0
					; .  1   ...  0   0   1   ........  st(0) = +empty
					; .  1   ...  0   1   1   ........  st(0) = -empty
					; .  1   ...  1   0   0   ........  st(0) = +denormal
					; .  1   ...  1   1   0   ........  st(0) = -denormal
	and	ah, 45h
	cmp	ah, 05h
	jne	Lindefinite		; |right| <> INFINITY?
Linfinity:
Lzero:
	fstp	st(1)			; st(0) = |right|
					;       = hypot(left, ±INFINITY)
					;       = hypot(±0.0, right)
	ret
Lindefinite:
	faddp	st(1), st(0)		; st(0) = INDEFINITE
	ret
hypot	endp
	end
        pow() Function
            pow() returns +1 if its first argument is +1 or if its second
            argument is ±0, even if the other argument is a NaN; otherwise
            it returns its first argument raised to the power of its second
            argument.
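The special cases named above can be checked against the pow() of the host C library. A minimal sketch, assuming that library follows Annex F of ISO C (IEC 60559 arithmetic); the helper name pow_special_cases_hold is hypothetical:

```c
#include <math.h>

// Returns 1 if the host library's pow() shows the special cases
// described above, 0 otherwise. NAN is the quiet NaN macro of <math.h>.
int pow_special_cases_hold(void)
{
    return pow(1.0, NAN) == 1.0      // first argument +1, second a NaN
        && pow(NAN, 0.0) == 1.0      // second argument +0, first a NaN
        && pow(NAN, -0.0) == 1.0     // second argument -0, first a NaN
        && pow(2.0, 10.0) == 1024.0; // ordinary case: 2 raised to 10
}
```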
        cbrt() Function
            cbrt() returns the cube root of its argument.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)
double fabs(double x);
double frexp(double x, int *z);
double ldexp(double x, int z);
double cbrt(double argument)
{
    static const double scale[5] = {0x1.428A2F98D728Bp-1,  // 2**(-2/3)
                                    0x1.965FEA53D6E3Dp-1,  // 2**(-1/3)
                                    1.0,                   // 2**0
                                    0x1.428A2F98D728Bp-0,  // 2**(1/3)
                                    0x1.965FEA53D6E3Dp-0}; // 2**(2/3)
    double a, b, c;
    int exponent;
    if (argument != argument)
        return INDEFINITE;
    if (argument == 0.0)
        return argument;
    a = fabs(argument);
    if (a == INFINITY)
        return argument;
    a = frexp(a, &exponent);
    // for 0.5 <= a < 1.0,
    // a minimax polynomial of degree 6 yields an approximation
    // of the cube root, followed by a single Halley iteration
    b = (((((-0x1.29801E893366Dp-3 * a
             +0x1.91E2A6FE7E984p-1) * a
             -0x1.D5AE6CFA20F0Cp-0) * a
             +0x1.39350ADAD51ECp+1) * a
             -0x1.0EB8277CD8D5Dp+1) * a
             +0x1.8218DDE9028B4p-0) * a
             +0x1.6B69CBA168FF2p-2;
    c = b * b * b;
    c = b * (2.0 * a + c) / (a + 2.0 * c);
    c = argument < 0.0 ? -c : c;
    return ldexp(c * scale[2 + exponent % 3], exponent / 3);
}
        ceil() Function
            ceil() returns the smallest integral value not less than its argument.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double ceil(double argument)
{
#ifdef TRUNC
    double trunc(double x);
    double tmp = trunc(argument);
    return (argument > tmp) ? tmp + 1.0 : tmp;
#else
    double tmp;
    if ((argument > 0.0) && (argument < 0x1.0p+52)) {
        tmp = argument;
        argument += 0x1.0p+52;
        argument -= 0x1.0p+52;
        if (argument < tmp)
            argument += 1.0;
    } else if ((argument < 0.0) && (argument > -0x1.0p+52)) {
        tmp = argument;
        argument -= 0x1.0p+52;
        argument += 0x1.0p+52;
        if (argument < tmp)
            argument += 1.0;
        if (argument == 0.0)    // ceil() of an argument in (-1.0, -0.0) is -0.0
            argument = -0.0;
    } else if (argument != 0.0)
        argument += 0.0;
    return argument;
#endif
}
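The non-TRUNC branch above relies on the fact that doubles of magnitude 2**52 and above have no fraction bits: adding 2**52 to a smaller non-negative number rounds the fraction away, and subtracting 2**52 again recovers the rounded integer exactly. A minimal sketch of this step for non-negative arguments, assuming the default rounding mode (round to nearest, ties to even) and evaluation in plain double precision; the names rint_magic and ceil_positive are hypothetical:

```c
// volatile keeps a compiler from simplifying (x + 2**52) - 2**52 back to x,
// for example under relaxed floating-point options.
static double rint_magic(double x)       // valid for 0.0 <= x < 2**52
{
    volatile double t = x + 0x1.0p+52;   // the fraction bits are rounded away
    return t - 0x1.0p+52;                // exact: recovers the rounded integer
}

double ceil_positive(double x)           // valid for 0.0 <= x < 2**52
{
    double r = rint_magic(x);
    return r < x ? r + 1.0 : r;          // rounded down? then step up
}
```

Because ties round to even, rint_magic(2.5) is 2.0 while rint_magic(3.5) is 4.0; the comparison against the original argument corrects both cases.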
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: requires SSE 4.1 instruction set!
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
ceil:
	roundsd	xmm0, xmm0, 2		# xmm0 = argument rounded up (towards +INFINITY)
	ret
.size	ceil, .-ceil
.type	ceil, @function
.global	ceil
.end
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: ceil() returns -0.0 for argument in (-1.0, -0.0]
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
ceil:
	mov	rax, 0x4330000000000000
	movq	xmm2, rax		# xmm2 = 0x1.0p+52
					#      = 4503599627370496.0
					#      = minimum non-fractional number
	mov	rax, 0x3FF0000000000000
	xorpd	xmm1, xmm1		# xmm1 = 0.0
	subsd	xmm1, xmm0		# xmm1 = -argument
	andnpd	xmm1, xmm0		# xmm1 = (argument & -0.0) ? -0.0 : +0.0
	orpd	xmm2, xmm1		# xmm2 = (argument & -0.0) ? -0x1.0p+52 : +0x1.0p+52
	movsd	xmm3, xmm0		# xmm3 = argument
	addsd	xmm0, xmm2		# xmm0 = argument
					#      + (argument & -0.0) ? -0x1.0p+52 : +0x1.0p+52
	subsd	xmm0, xmm2		# xmm0 = argument
					#      - (argument & -0.0) ? -0x1.0p+52 : +0x1.0p+52
					#      = rint(argument)
	movq	xmm2, rax		# xmm2 = 0x1.0p+0
					#      = 1.0
	cmpsd	xmm3, xmm0, 6		# xmm3 = (argument > rint(argument)) ? ~0L : 0L
	andpd	xmm3, xmm2		# xmm3 = (argument > rint(argument)) ? 1.0 : 0.0
	addsd	xmm0, xmm3		# xmm0 = (argument > rint(argument)) ? 1.0 : 0.0
					#      + rint(argument)
					#      = ceil(argument)
	orpd	xmm0, xmm1		# xmm0 = ceil(argument)
	ret
.size	ceil, .-ceil
.type	ceil, @function
.global	ceil
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/atdhw2dx.aspx
	.686
	.model	flat, C
	.code
ceil	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = argument
if 0
; ceil(x) = x > trunc(x) ? trunc(x) + 1.0 : trunc(x)
	ftst
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jz	Lexit			; argument = ±0.0?
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fld	st(1)			; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = argument
Lmodulo:
	fprem				; st(0) = argument modulo 1.0
					;       = argument',
					; st(1) = 1.0,
					; st(2) = argument
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lmodulo			; |argument'| >= 1.0?
	fxch	st(2)			; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = argument'
	fsubr	st(2), st(0)		; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = argument - argument'
					;       = trunc(argument)
	fcomp	st(2)			; st(0) = 1.0,
					; st(1) = trunc(argument)
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	ja	Labove			; argument > trunc(argument)?
	fstp	st(1)			; st(0) = trunc(argument)
					;       = ceil(argument)
	ret
Labove:
	faddp	st(1), st(0)		; st(0) = trunc(argument) + 1.0
					;       = ceil(argument)
Lexit:
else
; ceil(x) = x > rint(x) ? rint(x) + 1.0 : rint(x)
	fld	st(0)			; st(0) = argument,
					; st(1) = argument
	frndint				; st(0) = rint(argument),
					; st(1) = argument
	fxch	st(1)			; st(0) = argument,
					; st(1) = rint(argument)
	fucomip	st(0), st(1)		; eflags = argument ><=# rint(argument),
					; st(0) = rint(argument)
	fld1				; st(0) = 1.0,
					; st(1) = rint(argument)
	fldz				; st(0) = 0.0,
					; st(1) = 1.0,
					; st(2) = rint(argument)
	fcmovnbe st(0), st(1)		; st(0) = (rint(argument) < argument) ? 1.0 : 0.0,
					; st(1) = 1.0,
					; st(2) = rint(argument)
	faddp	st(2), st(0)		; st(0) = 1.0,
					; st(1) = ceil(argument)
	fstp	st(0)			; st(0) = ceil(argument)
endif
	ret
ceil	endp
	end
        fabs() Function
            fabs() returns the absolute value (magnitude) of its argument.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double fabs(double argument)
{
    *(unsigned long long *) &argument <<= 1;
    *(unsigned long long *) &argument >>= 1;
    return argument;
}
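The listing above clears the sign bit through a pointer cast. Where strict aliasing rules are a concern, the same bit manipulation can be written with memcpy(). A sketch under the assumption that double is the 64-bit IEEE 754 format; the name fabs_bits is hypothetical:

```c
#include <stdint.h>
#include <string.h>

// Hypothetical variant: same effect as the two shifts of the listing,
// but the representation is copied instead of accessed through a cast.
double fabs_bits(double argument)
{
    uint64_t bits;

    memcpy(&bits, &argument, sizeof bits);   // read the representation
    bits &= ~(UINT64_C(1) << 63);            // clear the sign bit
    memcpy(&argument, &bits, sizeof bits);   // write it back
    return argument;
}
```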
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
fabs:
.if 0
	xorpd	xmm1, xmm1		# xmm1 = 0.0
	subsd	xmm1, xmm0		# xmm1 = -argument
	maxsd	xmm0, xmm1		# xmm0 = |argument|
	ret
.else
	movq	rax, xmm0		# rax = argument
	btr	rax, 63			# rax = |argument|
	movq	xmm0, rax		# xmm0 = |argument|
	ret
.endif
.size	fabs, .-fabs
.type	fabs, @function
.global	fabs
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/18z15bk0.aspx
	.686
	.model	flat; C
	.code
_fabs	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = argument
	fabs				; st(0) = |argument|
	ret
_fabs	endp
	end
        fdim() Function
            fdim() returns the positive difference of its arguments.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double fdim(double left, double right)
{
    return left < right ? 0.0 : left - right;
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = left
					# xmm1 = right
fdim:
	movsd	xmm2, xmm0		# xmm2 = left
	cmpsd	xmm0, xmm1, 1		# xmm0 = (left < right) ? ~0L : 0L
	subsd	xmm2, xmm1		# xmm2 = left - right
	andnpd	xmm0, xmm2		# xmm0 = (left < right) ? 0.0 : left - right
	ret
.size	fdim, .-fdim
.type	fdim, @function
.global	fdim
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/mt720714.aspx
	.686
	.model	flat, C
	.code
fdim	proc	public			; [esp+12] = right
					; [esp+4] = left
	fld	real8 ptr [esp+4]	; st(0) = left
	fld	real8 ptr [esp+12]	; st(0) = right,
					; st(1) = left
	fsubp	st(1), st(0)		; st(0) = left - right
	fldz				; st(0) = 0.0,
					; st(1) = left - right
	fucomi	st(0), st(1)		; eflags = 0.0 ><=# left - right
	fcmovb	st(0), st(1)		; st(0) = (left > right) ? left - right : 0.0,
					; st(1) = left - right
	fcmovu	st(0), st(1)		; st(0) = (left # right) ? left - right
					;       : (left > right) ? left - right : 0.0,
					; st(1) = left - right
	fstp	st(1)			; st(0) = fdim(left, right)
	ret
fdim	endp
	end
        floor() Function
            floor() returns the largest integral value not greater than its argument.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double floor(double argument)
{
#ifdef TRUNC
    double trunc(double x);
    double tmp = trunc(argument);
    return (argument < tmp) ? tmp - 1.0 : tmp;
#else
    double tmp;
    if ((argument > 0.0) && (argument < 0x1.0p+52)) {
        tmp = argument;
        argument += 0x1.0p+52;
        argument -= 0x1.0p+52;
        if (argument > tmp)
            argument -= 1.0;
    } else if ((argument < 0.0) && (argument > -0x1.0p+52)) {
        tmp = argument;
        argument -= 0x1.0p+52;
        argument += 0x1.0p+52;
        if (argument > tmp)
            argument -= 1.0;
    } else if (argument != 0.0)
        argument += 0.0;
    return argument;
#endif
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: requires SSE 4.1 instruction set!
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
floor:
	roundsd	xmm0, xmm0, 1		# xmm0 = argument rounded down (towards -INFINITY)
	ret
.size	floor, .-floor
.type	floor, @function
.global	floor
.end
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: floor() preserves -0.0
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
floor:
	mov	rax, 0x4330000000000000
	movq	xmm2, rax		# xmm2 = 0x1.0p+52
					#      = 4503599627370496.0
					#      = minimum non-fractional number
	mov	rax, 0x3FF0000000000000
	xorpd	xmm1, xmm1		# xmm1 = 0.0
	subsd	xmm1, xmm0		# xmm1 = -argument
	andnpd	xmm1, xmm0		# xmm1 = (argument & -0.0) ? -0.0 : +0.0
	orpd	xmm2, xmm1		# xmm2 = (argument & -0.0) ? -0x1.0p+52 : +0x1.0p+52
	movsd	xmm3, xmm0		# xmm3 = argument
	addsd	xmm0, xmm2		# xmm0 = argument
					#      + (argument & -0.0) ? -0x1.0p+52 : +0x1.0p+52
	subsd	xmm0, xmm2		# xmm0 = argument
					#      - (argument & -0.0) ? -0x1.0p+52 : +0x1.0p+52
					#      = rint(argument)
	movq	xmm2, rax		# xmm2 = 0x1.0p+0
					#      = 1.0
	cmpsd	xmm3, xmm0, 1		# xmm3 = (argument < rint(argument)) ? ~0L : 0L
	andpd	xmm3, xmm2		# xmm3 = (argument < rint(argument)) ? 1.0 : 0.0
	subsd	xmm0, xmm3		# xmm0 = (argument < rint(argument)) ? -1.0 : 0.0
					#      + rint(argument)
					#      = floor(argument)
	orpd	xmm0, xmm1		# xmm0 = floor(argument)
	ret
.size	floor, .-floor
.type	floor, @function
.global	floor
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/x39715t6.aspx
	.686
	.model	flat, C
	.code
floor	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = argument
if 0
; floor(x) = x < trunc(x) ? trunc(x) - 1.0 : trunc(x)
	ftst
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jz	Lexit			; argument = ±0.0?
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fld	st(1)			; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = argument
Lmodulo:
	fprem				; st(0) = argument modulo 1.0
					;       = argument',
					; st(1) = 1.0,
					; st(2) = argument
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lmodulo			; |argument'| >= 1.0?
	fxch	st(2)			; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = argument'
	fsubr	st(2), st(0)		; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = argument - argument'
					;       = trunc(argument)
	fcomp	st(2)			; st(0) = 1.0,
					; st(1) = trunc(argument)
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jb	Lbelow			; argument < trunc(argument)?
	fstp	st(1)			; st(0) = trunc(argument)
					;       = floor(argument)
	ret
Lbelow:
	fsubp	st(1), st(0)		; st(0) = trunc(argument) - 1.0
					;       = floor(argument)
Lexit:
else
; floor(x) = x < rint(x) ? rint(x) - 1.0 : rint(x)
	fld	st(0)			; st(0) = argument,
					; st(1) = argument
	frndint				; st(0) = rint(argument),
					; st(1) = argument
	fxch	st(1)			; st(0) = argument,
					; st(1) = rint(argument)
	fucomip	st(0), st(1)		; eflags = argument ><=# rint(argument),
					; st(0) = rint(argument)
	fld1				; st(0) = 1.0,
					; st(1) = rint(argument)
	fldz				; st(0) = 0.0,
					; st(1) = 1.0,
					; st(2) = rint(argument)
	fcmovb	st(0), st(1)		; st(0) = (rint(argument) > argument) ? 1.0 : 0.0,
					; st(1) = 1.0,
					; st(2) = rint(argument)
	fsubp	st(2), st(0)		; st(0) = 1.0,
					; st(1) = floor(argument)
	fstp	st(0)			; st(0) = floor(argument)
endif
	ret
floor	endp
	end
        fma() Function
            fma() returns the product of its first and second arguments plus
            its third argument, computed with full precision and rounded only
            once, i.e. without intermediate rounding of the product.
         Note: for example,
            fma(2.0, nextafter(INFINITY, 0.0), -nextafter(INFINITY, 0.0))
            returns nextafter(INFINITY, 0.0), and
            fma(0.5, nextafter(0.0, INFINITY), nextafter(0.0, INFINITY))
            returns 2.0 * nextafter(0.0, INFINITY),
            although the (intermediate) product overflows in the first case
            and underflows in the second.
        
// Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double fabs(double x);
double frexp(double x, int *z);
double ldexp(double x, int z);
static inline // Veltkamp
void _2split(double *h, double *l, double x)
{
#if 0
    int e;
    double f = frexp(x, &e);
    double g = f * 0x1.0000002000000p+27;
#if 1
    g -= g - f;
#else
    g += f - g;
#endif
    *l = ldexp(f - g, e);
    *h = ldexp(g, e);
#else
    unsigned long long ull = *(unsigned long long *) &x & (~0ULL << 26);
    *h = *(double *) &ull;
    *l = x - *h;
#endif
}
static inline // Dekker
void _2product(double *h, double *l, double x, double y)
{
    double xl, xh, yl, yh, zl, zh = x * y;
    _2split(&xh, &xl, x);
    _2split(&yh, &yl, y);
    zl = xl * yl + (xl * yh + (xh * yl + (xh * yh - zh)));
#ifdef DEKKER
    *h = zl + zh;
#if 0
    *l = zl - (*h - zh);
#else
    *l = zl + (zh - *h);
#endif
#else
    *l = zl;
    *h = zh;
#endif
}
#if 0
static inline // Møller, Knuth
void _2sum(double *h, double *l, double x, double y)
{
    double s = x + y;
    double t = s - x;
#if 0
    *l = (x - (s - t)) + (y - t);
#elif 0
    *l = (x - (s - t)) - (t - y);
#elif 0
    *l = (x + (t - s)) - (t - y);
#else
    *l = (x + (t - s)) + (y - t);
#endif
    *h = s;
}
#else
static inline // Boldo, Melquiond: |u| >= |v| >= |w|
double _3sum(double u, double v, double w)
{
    double h = w + v;
    double l = w + (v - h);
    // round high part of intermediate sum to odd when
    // its fraction is even and also inexact, i.e. low
    // part of intermediate sum is not equal to zero
    if ((l != 0.0)
     && ((*(unsigned long long *) &h & 1ull) == 0ull))
        *(unsigned long long *) &h |= 1ull;
    return u + h;
}
#endif
double fma(double multiplicand, double multiplier, double addend)
{
    int o;
    double ph, pl, qh, ql, rh, rl, sh, sl;
    double product = multiplicand * multiplier;
    if ((multiplicand - multiplicand != 0.0)
     || (multiplier - multiplier != 0.0)
     || (addend - addend != 0.0)) // at least one argument INFINITE?
        return product + addend;
    if (addend == 0.0) // when product underflows to ±0.0,
                       // its sign determines the sign of the result
        return (product == 0.0)
            && (multiplier != 0.0)
            && (multiplicand != 0.0) ? product : product + addend;
    if ((multiplicand == 0.0) || (multiplier == 0.0))
        return addend;
    o = product - product != 0.0;
    if (o) { // product overflows?
        if ((product < 0.0) == (addend < 0.0))
            return product;
        multiplier *= 0.5;
        addend *= 0.5;
#if 0
        product = 2.0 * (multiplicand * multiplier + addend);
        if (product - product != 0.0)
            return product;
#endif
    }
    _2product(&ph, &pl, multiplicand, multiplier);
#if 0
    _2sum(&qh, &ql, ph, addend);
    _2sum(&rh, &rl, pl, qh);
#if 0
    _2sum(&sh, &sl, ql, rl);
#else
    sh = rl + ql;
#endif
    sh += rh;
#else
    if (fabs(addend) < fabs(pl))
        sh = _3sum(ph, pl, addend);
    else if (fabs(addend) < fabs(ph))
        sh = _3sum(ph, addend, pl);
    else
        sh = _3sum(addend, ph, pl);
#endif
    return o ? sh + sh : sh;
}
            Note: the function _2product()
            implements Dekker’s product, an error-free (exact)
            transformation that, in the absence of overflow, guarantees the
            properties h + l = x × y and
            |h| ≥ |l| × 2**53.
         Note: the function _2sum()
            implements Møller’s and Knuth’s sum, an
            error-free (exact) transformation that, in the absence of
            overflow, guarantees the properties h + l = x + y,
            |l| ≤ |x| and
            |h| ≥ |l| × 2**53.
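The property h + l = x + y, respectively h + l = x × y, can be observed with small inputs whose exact results need more than 53 bits. The following is a sketch only: it assumes evaluation in plain double precision with the default rounding mode and a compiler that does not contract multiplications and additions into fused operations, it splits by multiplication with 2**27 + 1 instead of masking bits, and the names two_sum, split and two_product are hypothetical.

```c
// Møller's and Knuth's sum: h = x + y rounded, l = the rounding error
static void two_sum(double x, double y, double *h, double *l)
{
    double s = x + y;
    double t = s - x;

    *l = (x - (s - t)) + (y - t);
    *h = s;
}

// Veltkamp's splitting: h holds the upper, l the lower half of x
static void split(double x, double *h, double *l)
{
    double g = x * 0x1.0000002p+27;   // x * (2**27 + 1)

    *h = g - (g - x);
    *l = x - *h;
}

// Dekker's product: h = x * y rounded, l = the rounding error
static void two_product(double x, double y, double *h, double *l)
{
    double xh, xl, yh, yl, z = x * y;

    split(x, &xh, &xl);
    split(y, &yh, &yl);
    *l = xl * yl + (xl * yh + (xh * yl + (xh * yh - z)));
    *h = z;
}
```

For x = y = 1 + 2**-30 the exact product is 1 + 2**-29 + 2**-60; two_product() returns h = 1 + 2**-29 and l = 2**-60, so no bit is lost.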
        
# Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# CAVEAT: requires default (round to nearest, ties to even) rounding mode!
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = multiplicand
					# xmm1 = multiplier
					# xmm2 = addend
fma:
	movsd	xmm3, xmm0		# xmm3 = multiplicand
	movsd	xmm4, xmm1		# xmm4 = multiplier
	movsd	xmm5, xmm2		# xmm5 = addend
	subsd	xmm3, xmm0		# xmm3 = multiplicand - multiplicand
	subsd	xmm4, xmm1		# xmm4 = multiplier - multiplier
	subsd	xmm5, xmm2		# xmm5 = addend - addend
	ucomisd	xmm3, xmm0
	movsd	xmm3, xmm0		# xmm3 = multiplicand
	mulsd	xmm0, xmm1		# xmm0 = multiplicand * multiplier
					#      = product
	je	.Lmultiplicand		# multiplicand = ±0.0?
					# multiplicand = ±INFINITY?
					# multiplicand = INDEFINITE?
	ucomisd	xmm4, xmm1
	je	.Lmultiplier		# multiplier = ±0.0?
					# multiplier = ±INFINITY?
					# multiplier = INDEFINITE?
	ucomisd	xmm5, xmm2
	je	.Laddend		# addend = ±0.0?
					# addend = ±INFINITY?
					# addend = INDEFINITE?
	movsd	xmm4, xmm0
	subsd	xmm4, xmm0		# xmm4 = product - product
	ucomisd	xmm4, xmm0
	jp	.Loverflow		# product = ±INFINITY?
.Lveltkamp:
	mov	eax, 0x03FFFFFF		# rax = 2**26 - 1
	movq	xmm4, rax
	movq	xmm5, rax
	andnpd	xmm4, xmm3		# xmm4 = upper half of multiplicand
	andnpd	xmm5, xmm1		# xmm5 = upper half of multiplier
	subsd	xmm3, xmm4		# xmm3 = lower half of multiplicand
	subsd	xmm1, xmm5		# xmm1 = lower half of multiplier
.Ldekker:
	unpcklpd xmm4, xmm3		# xmm4[63:0] = upper half of multiplicand,
					# xmm4[127:64] = lower half of multiplicand
	unpcklpd xmm5, xmm1		# xmm5[63:0] = upper half of multiplier,
					# xmm5[127:64] = lower half of multiplier
	unpcklpd xmm3, xmm4		# xmm3[63:0] = lower half of multiplicand,
					# xmm3[127:64] = upper half of multiplicand
	mulpd	xmm4, xmm5		# xmm4[63:0] = upper half of multiplicand
					#            * upper half of multiplier,
					# xmm4[127:64] = lower half of multiplicand
					#              * lower half of multiplier
	mulpd	xmm3, xmm5		# xmm3[63:0] = lower half of multiplicand
					#            * upper half of multiplier,
					# xmm3[127:64] = upper half of multiplicand
					#              * lower half of multiplier
.Ltail:
	movsd	xmm1, xmm4
	subsd	xmm1, xmm0
	addsd	xmm1, xmm3
	unpckhpd xmm3, xmm3
	addsd	xmm1, xmm3
	unpckhpd xmm4, xmm4
	addsd	xmm1, xmm4		# xmm1 = upper half of multiplicand
					#      * upper half of multiplier
					#      - multiplicand * multiplier
					#      + lower half of multiplicand
					#      * upper half of multiplier
					#      + upper half of multiplicand
					#      * lower half of multiplier
					#      + lower half of multiplicand
					#      * lower half of multiplier
					#      = tail part of (intermediate) product
					# xmm0 = head part of (intermediate) product
.Lmøller:
	movsd	xmm3, xmm0
	addsd	xmm0, xmm2
	movsd	xmm4, xmm0		# xmm4 = head part of first intermediate sum
	subsd	xmm0, xmm3
	subsd	xmm2, xmm0
	subsd	xmm0, xmm4
	addsd	xmm0, xmm3
	addsd	xmm0, xmm2		# xmm0 = tail part of first intermediate sum
.Lknuth:
	movsd	xmm3, xmm4
	addsd	xmm4, xmm1
	movsd	xmm2, xmm4		# xmm2 = head part of second intermediate sum
	subsd	xmm4, xmm3
	subsd	xmm1, xmm4
	subsd	xmm4, xmm2
	addsd	xmm4, xmm3
	addsd	xmm4, xmm1		# xmm4 = tail part of second intermediate sum
	addsd	xmm0, xmm4		# xmm0 = tail part of first intermediate sum
					#      + tail part of second intermediate sum
					#      = head part of third intermediate sum
.Lfinal:
	addsd	xmm0, xmm2		# xmm0 = product + addend
					#      = fma(multiplicand, multiplier, addend)
	ret
.Lmultiplicand:
	jp	.Lfinal			# multiplicand = INDEFINITE?
					# multiplicand = ±INFINITY?
					# multiplicand = ±0.0!
	ucomisd	xmm4, xmm1
.Lmultiplier:
	jp	.Lfinal			# multiplier = INDEFINITE?
					# multiplier = ±INFINITY?
					# multiplier = ±0.0,
					# multiplicand <> ±INFINITY,
					# multiplicand <> INDEFINITE!
.Lindefinite:
	movsd	xmm0, xmm2		# xmm0 = addend
	ret
.Laddend:
	jp	.Lindefinite		# addend = INDEFINITE?
					# addend = ±INFINITY?
					# addend = ±0.0,
					# multiplier <> ±0.0,
					# multiplier <> ±INFINITY,
					# multiplier <> INDEFINITE,
					# multiplicand <> ±0.0,
					# multiplicand <> ±INFINITY,
					# multiplicand <> INDEFINITE!
	ucomisd	xmm0, xmm2
	je	.Lunderflow		# product = ±0.0?
.Lproduct:
	movsd	xmm4, xmm0
	subsd	xmm4, xmm0		# xmm4 = product - product
	ucomisd	xmm4, xmm0
	jnp	.Lfinal			# product <> ±INFINITY?
.Loverflow:
	movq	rcx, xmm0		# rcx = product
	movq	rdx, xmm2		# rdx = addend
	xor	rdx, rcx		# rdx = (addend < 0.0) = (product < 0.0) ? positive : negative
	jns	.Linfinity		# (addend < 0.0) = (product < 0.0)?
					# (sign of addend = sign of product?)
	mov	rax, 0x3FE0000000000000
	movq	xmm5, rax		# xmm5 = 0x1.0p-1
					#      = 0.5
	mulsd	xmm1, xmm5		# xmm1 = multiplier * 0.5
					#      = multiplier'
	mulsd	xmm2, xmm5		# xmm2 = addend * 0.5
					#      = addend'
.if 1
	movsd	xmm0, xmm3		# xmm0 = multiplicand
	mulsd	xmm0, xmm1		# xmm0 = multiplicand * multiplier'
					#      = product'
.else
	movsd	xmm4, xmm1
	movsd	xmm5, xmm2
	mulsd	xmm4, xmm3		# xmm4 = multiplier' * multiplicand
					#      = product'
	addsd	xmm5, xmm4		# xmm5 = product' + addend'
	addsd	xmm5, xmm5		# xmm5 = (product' + addend') * 2.0
	subsd	xmm5, xmm5		# xmm5 = (product' + addend') * 2.0
					#      - (product' + addend') * 2.0
	ucomisd	xmm5, xmm5
	jp	.Linfinity		# (product' + addend') * 2.0 = ±INFINITY?
	movsd	xmm0, xmm4		# xmm0 = product'
.endif
	call	.Lveltkamp
	addsd	xmm0, xmm0		# xmm0 = (product' + addend') * 2.0
					#      = fma(multiplicand, multiplier, addend)
.Linfinity:
.Lunderflow:
	ret
.size	fma, .-fma
.type	fma, @function
.global	fma
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/.aspx
	.686
	.model	flat, C
	.code
single	record	sign:1, exponent:8, mantissa:23
bias	equ	1 shl (width exponent - 1) - 1
fma	proc	public			; [esp+20] = addend
					; [esp+12] = multiplier
					; [esp+4] = multiplicand
	fld	real8 ptr [esp+20]	; st(0) = addend
	fld	st(0)			; st(0) = addend,
					; st(1) = addend
	fsub	st(0), st(0)		; st(0) = addend - addend,
					; st(1) = addend
	fcomip	st(0), st(1)		; st(0) = addend
	je	Laddend			; addend = ±0.0?
					; addend = ±INFINITY?
					; addend = INDEFINITE?
	fld	real8 ptr [esp+12]	; st(0) = multiplier,
					; st(1) = addend
	fld	st(0)			; st(0) = multiplier,
					; st(1) = multiplier,
					; st(2) = addend
	fsub	st(0), st(0)		; st(0) = multiplier - multiplier,
					; st(1) = multiplier,
					; st(2) = addend
	fcomip	st(0), st(1)		; st(0) = multiplier,
					; st(1) = addend
	je	Lmultiplier		; multiplier = ±0.0?
					; multiplier = ±INFINITY?
					; multiplier = INDEFINITE?
	fld	real8 ptr [esp+4]	; st(0) = multiplicand,
					; st(1) = multiplier,
					; st(2) = addend
	fld	st(0)			; st(0) = multiplicand,
					; st(1) = multiplicand,
					; st(2) = multiplier,
					; st(3) = addend
	fsub	st(0), st(0)		; st(0) = multiplicand - multiplicand,
					; st(1) = multiplicand,
					; st(2) = multiplier,
					; st(3) = addend
	fcomip	st(0), st(1)		; st(0) = multiplicand,
					; st(1) = multiplier,
					; st(2) = addend
	je	Lmultiplicand		; multiplicand = ±0.0?
					; multiplicand = ±INFINITY?
					; multiplicand = INDEFINITE?
	fld	st(0)			; st(0) = multiplicand,
					; st(1) = multiplicand,
					; st(2) = multiplier,
					; st(3) = addend
	fmul	st(0), st(2)		; st(0) = multiplicand * multiplier
					;       = product,
					; st(1) = multiplicand,
					; st(2) = multiplier,
					; st(3) = addend
	fld	st(0)			; st(0) = multiplicand * multiplier
					;       = product,
					; st(1) = multiplicand * multiplier
					;       = product,
					; st(2) = multiplicand,
					; st(3) = multiplier,
					; st(4) = addend
	fsub	st(0), st(0)		; st(0) = product - product,
					; st(1) = multiplicand * multiplier
					;       = product,
					; st(2) = multiplicand,
					; st(3) = multiplier,
					; st(4) = addend
	fcomip	st(0), st(1)		; st(0) = multiplicand * multiplier
					;       = product,
					; st(1) = multiplicand,
					; st(2) = multiplier,
					; st(3) = addend
	jp	Loverflow		; product = ±INFINITY?
	fxch	st(2)			; st(0) = multiplier,
					; st(1) = multiplicand,
					; st(2) = multiplicand * multiplier
					;       = product,
					; st(3) = addend
Lveltkamp:
	mov	eax, not 0 shl 26
	and	[esp+4], eax
	and	[esp+12], eax
	fld	real8 ptr [esp+4]	; st(0) = upper half of multiplicand,
					; st(1) = multiplier,
					; st(2) = multiplicand,
					; st(3) = multiplicand * multiplier,
					; st(4) = addend
	fsub	st(2), st(0)		; st(0) = upper half of multiplicand,
					; st(1) = multiplier,
					; st(2) = lower half of multiplicand,
					; st(3) = multiplicand * multiplier,
					; st(4) = addend
	fld	real8 ptr [esp+12]	; st(0) = upper half of multiplier,
					; st(1) = upper half of multiplicand,
					; st(2) = multiplier,
					; st(3) = lower half of multiplicand,
					; st(4) = multiplicand * multiplier,
					; st(5) = addend
	fsub	st(2), st(0)		; st(0) = upper half of multiplier,
					; st(1) = upper half of multiplicand,
					; st(2) = lower half of multiplier,
					; st(3) = lower half of multiplicand,
					; st(4) = multiplicand * multiplier,
					; st(5) = addend
	fld	st(0)			; st(0) = upper half of multiplier,
					; st(1) = upper half of multiplier,
					; st(2) = upper half of multiplicand,
					; st(3) = lower half of multiplier,
					; st(4) = lower half of multiplicand,
					; st(5) = multiplicand * multiplier,
					; st(6) = addend
Ldekker:
	fmul	st(0), st(2)		; st(0) = upper half of multiplier
					;       * upper half of multiplicand,
					; st(1) = upper half of multiplier,
					; st(2) = upper half of multiplicand
					; st(3) = lower half of multiplier,
					; st(4) = lower half of multiplicand,
					; st(5) = multiplicand * multiplier,
					; st(6) = addend
	fsub	st(0), st(5)		; st(0) = upper half of multiplier
					;       * upper half of multiplicand
					;       - multiplicand * multiplier,
					; st(1) = upper half of multiplier,
					; st(2) = upper half of multiplicand,
					; st(3) = lower half of multiplier,
					; st(4) = lower half of multiplicand,
					; st(5) = multiplicand * multiplier,
					; st(6) = addend
	fxch	st(2)			; st(0) = upper half of multiplicand,
					; st(1) = upper half of multiplier,
					; st(2) = upper half of multiplier
					;       * upper half of multiplicand
					;       - multiplicand * multiplier,
					; st(3) = lower half of multiplier,
					; st(4) = lower half of multiplicand,
					; st(5) = multiplicand * multiplier,
					; st(6) = addend
	fmul	st(0), st(3)		; st(0) = upper half of multiplicand
					;       * lower half of multiplier,
					; st(1) = upper half of multiplier,
					; st(2) = upper half of multiplier
					;       * upper half of multiplicand
					;       - multiplicand * multiplier,
					; st(3) = lower half of multiplier,
					; st(4) = lower half of multiplicand,
					; st(5) = multiplicand * multiplier,
					; st(6) = addend
	faddp	st(2), st(0)		; st(0) = upper half of multiplier,
					; st(1) = upper half of multiplier
					;       * upper half of multiplicand
					;       - multiplicand * multiplier
					;       + upper half of multiplicand
					;       * lower half of multiplier,
					; st(2) = lower half of multiplier,
					; st(3) = lower half of multiplicand,
					; st(4) = multiplicand * multiplier,
					; st(5) = addend
	fmul	st(0), st(3)		; st(0) = upper half of multiplier
					;       * lower half of multiplicand,
					; st(1) = upper half of multiplier
					;       * upper half of multiplicand
					;       - multiplicand * multiplier
					;       + upper half of multiplicand
					;       * lower half of multiplier,
					; st(2) = lower half of multiplier,
					; st(3) = lower half of multiplicand,
					; st(4) = multiplicand * multiplier,
					; st(5) = addend
	faddp	st(1), st(0)		; st(0) = upper half of multiplier
					;       * upper half of multiplicand
					;       - multiplicand * multiplier
					;       + upper half of multiplicand
					;       * lower half of multiplier
					;       + upper half of multiplier
					;       * lower half of multiplicand,
					; st(1) = lower half of multiplier,
					; st(2) = lower half of multiplicand,
					; st(3) = multiplicand * multiplier,
					; st(4) = addend
	fxch	st(2)			; st(0) = lower half of multiplicand,
					; st(1) = lower half of multiplier,
					; st(2) = upper half of multiplier
					;       * upper half of multiplicand
					;       - multiplicand * multiplier
					;       + upper half of multiplicand
					;       * lower half of multiplier
					;       + upper half of multiplier
					;       * lower half of multiplicand,
					; st(3) = multiplicand * multiplier,
					; st(4) = addend
	fmulp	st(1), st(0)		; st(0) = lower half of multiplier
					;       * lower half of multiplicand,
					; st(1) = upper half of multiplier
					;       * upper half of multiplicand
					;       - multiplicand * multiplier
					;       + upper half of multiplicand
					;       * lower half of multiplier
					;       + upper half of multiplier
					;       * lower half of multiplicand,
					; st(2) = multiplicand * multiplier,
					; st(3) = addend
	faddp	st(1), st(0)		; st(0) = upper half of multiplier
					;       * upper half of multiplicand
					;       - multiplicand * multiplier
					;       + upper half of multiplicand
					;       * lower half of multiplier
					;       + upper half of multiplier
					;       * lower half of multiplicand
					;       + lower half of multiplier
					;       * lower half of multiplicand
					;       = tail part of (intermediate) product,
					; st(1) = multiplicand * multiplier
					;       = head part of (intermediate) product,
					; st(2) = addend
					; error-free addition, in C notation:
					;     double s = x + y;
					;     double t = s - x;
					;     *l = (x + (t - s)) + (y - t);
Lmoller:
	fxch	st(2)			; st(0) = addend,
					; st(1) = head part of (intermediate) product,
					; st(2) = tail part of (intermediate) product
	???
Lknuth:
	???
Lfinal:
	???
Laddend:
	jp	Lexit			; addend = INDEFINITE?
					; addend = ±INFINITY?
	fld	real8 ptr [esp+12]	; st(0) = multiplier,
					; st(1) = addend
					;       = ±0.0
Lmultiplier:
	fld	real8 ptr [esp+4]	; st(0) = multiplicand,
					; st(1) = multiplier,
					; st(2) = addend
Lmultiplicand:
	fmulp	st(1), st(0)		; st(0) = multiplier * multiplicand
	faddp	st(1), st(0)		; st(0) = multiplier * multiplicand + addend
Lexit:
	ret
Loverflow:
	mov	eax, [esp+24]		; eax = high dword of addend
	xor	eax, [esp+16]		; eax = high dword of addend
					;     ^ high dword of multiplier
	xor	eax, [esp+8]		; eax = high dword of addend
					;     ^ high dword of multiplier
					;     ^ high dword of multiplicand
	jns	Linfinity		; (addend < 0.0) = (product < 0.0)?
					; (sign of addend = sign of product?)
	push	(bias - 1) shl width mantissa
					; [esp] = 0x3F000000
					;       = 0.5F
	fld	real4 ptr [esp]		; st(0) = 0.5,
					; st(1) = ±INFINITY,
					; st(2) = multiplicand,
					; st(3) = multiplier,
					; st(4) = addend
	pop	eax
	fstp	st(1)			; st(0) = 0.5,
					; st(1) = multiplicand,
					; st(2) = multiplier,
					; st(3) = addend
	fmul	st(3), st(0)		; st(0) = 0.5,
					; st(1) = multiplicand,
					; st(2) = multiplier,
					; st(3) = addend * 0.5
					;       = addend'
	fmulp	st(2), st(0)		; st(0) = multiplicand,
					; st(1) = multiplier * 0.5
					;       = multiplier',
					; st(2) = addend * 0.5
	fld	st(0)			; st(0) = multiplicand,
					; st(1) = multiplicand,
					; st(2) = multiplier * 0.5
					;       = multiplier',
					; st(3) = addend * 0.5
					;       = addend'
	fmul	st(0), st(2)		; st(0) = multiplicand * multiplier'
					;       = product',
					; st(1) = multiplicand,
					; st(2) = multiplier * 0.5
					;       = multiplier',
					; st(3) = addend * 0.5
					;       = addend'
	???
Linfinity:
	fstp	st(1)			; st(0) = product
					;       = ±INFINITY,
					; st(1) = multiplier,
					; st(2) = addend
	fstp	st(1)			; st(0) = product
					;       = ±INFINITY,
					; st(1) = addend
	fstp	st(1)			; st(0) = product
					;       = ±INFINITY
	ret
fma	endp
	end
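The Lveltkamp and Ldekker steps above split both factors into halves and recover the rounding error of the product from the four partial products. A minimal C sketch of that error-free product, together with the error-free addition quoted in the listing, might look as follows. The names split(), two_product() and two_sum() are hypothetical; the sketch splits with the classic multiplication by 2**27 + 1 instead of masking the low 26 bits, and it assumes strict IEEE 754 double evaluation (no extended precision, no FMA contraction) without overflow or underflow. It is not a completion of the elided parts of the listing.

```c
/* Hypothetical helpers; assume strict IEEE 754 double arithmetic,
   round-to-nearest, and neither overflow nor underflow. */

/* Veltkamp splitting: x = *high + *low, each half short enough
   that products of halves are exact in double precision. */
void split(double x, double *high, double *low)
{
    double c = 134217729.0 * x;         /* 2**27 + 1 */
    *high = c - (c - x);
    *low  = x - *high;
}

/* Dekker product: x * y = *head + *tail exactly. */
void two_product(double x, double y, double *head, double *tail)
{
    double xh, xl, yh, yl;
    split(x, &xh, &xl);
    split(y, &yh, &yl);
    *head = x * y;
    /* same order of operations as the Ldekker block */
    *tail = (((xh * yh - *head) + xh * yl) + xl * yh) + xl * yl;
}

/* Error-free addition: x + y = *sum + *error exactly. */
void two_sum(double x, double y, double *sum, double *error)
{
    double s = x + y;
    double t = s - x;
    *sum   = s;
    *error = (x - (s - t)) + (y - t);
}
```

For x = y = 1 + 2**-30 the exact product is 1 + 2**-29 + 2**-60; two_product() delivers the rounded head 1 + 2**-29 and the tail 2**-60 that a plain multiplication loses.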
        fmod() Function
            fmod() returns the remainder from the division of its arguments.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double fma(double x, double y, double z);
double trunc(double x);
double fmod(double dividend, double divisor)
{
#ifdef TRUNC
    double quotient = trunc(dividend / divisor);
#else
    double tmp, quotient = dividend / divisor;
    if ((quotient > 0.0) && (quotient < 0x1.0p+52)) {
        tmp = quotient;
        quotient += 0x1.0p+52;
        quotient -= 0x1.0p+52;
        if (quotient > tmp)
            quotient -= 1.0;
    } else if ((quotient < 0.0) && (quotient > -0x1.0p+52)) {
        tmp = quotient;
        quotient -= 0x1.0p+52;
        quotient += 0x1.0p+52;
        if (quotient < tmp)
            quotient += 1.0;
    }
#endif
#if 0 // avoid subtractive cancellation
    return quotient == 0.0 ? dividend : dividend - divisor * quotient;
#else
    return fma(-quotient, divisor, dividend);
#endif
}
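The branch without TRUNC truncates the quotient by adding and subtracting 2**52, the magnitude from which on every double is an integer. A minimal sketch of this trick in isolation, assuming the default round-to-nearest mode and strict double evaluation; the name trunc_sketch() is hypothetical:

```c
/* Truncate toward zero without calling trunc(); assumes round-to-nearest. */
double trunc_sketch(double x)
{
    const double magic = 0x1.0p+52;     /* 2**52 */
    double r = x;                       /* |x| >= 2**52, zero, NaN: unchanged */

    if (x > 0.0 && x < magic) {
        r = (x + magic) - magic;        /* x rounded to the nearest integer */
        if (r > x)
            r -= 1.0;                   /* rounded up: step back toward zero */
    } else if (x < 0.0 && x > -magic) {
        r = (x - magic) + magic;        /* x rounded to the nearest integer */
        if (r < x)
            r += 1.0;                   /* rounded down: step back toward zero */
    }
    return r;
}
```

The correction step is what distinguishes truncation from rounding: 2.7 first becomes 3.0 and is then stepped back to 2.0.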
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: requires SSE 4.1 instruction set!
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = dividend
					# xmm1 = divisor
fmod:
	movsd	xmm2, xmm0		# xmm2 = dividend
	divsd	xmm2, xmm1		# xmm2 = dividend / divisor
					#      = quotient
	roundsd	xmm2, xmm2, 3		# xmm2 = trunc(quotient)
	mulsd	xmm1, xmm2		# xmm1 = divisor * trunc(quotient)
	subsd	xmm0, xmm1		# xmm0 = dividend - divisor * trunc(quotient)
					#      = remainder
	ret
.size	fmod, .-fmod
.type	fmod, @function
.global	fmod
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/20dckbeh.aspx
; fmod(dividend, divisor) = dividend % divisor
;                         = dividend - divisor * trunc(dividend / divisor)
	.686
	.model	flat, C
	.code
fmod	proc	public			; [esp+12] = divisor
					; [esp+4] = dividend
	fld	real8 ptr [esp+12]	; st(0) = divisor
	fld	real8 ptr [esp+4]	; st(0) = dividend,
					; st(1) = divisor
Lreduce:
	fprem				; st(0) = remainder,
					; st(1) = divisor
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lreduce
	fstp	st(1)			; st(0) = remainder
	ret
fmod	endp
	end
        fpclassify() Function
            fpclassify() returns the implementation-defined category of its argument.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
#define FP_ZERO      0
#define FP_SUBNORMAL 1
#define FP_NORMAL    2
#define FP_INFINITE  3
#define FP_NAN       4
#define INFINITY 0x1.0p+1024
#define MINIMUM  0x1.0p-1022
double fabs(double x);
int fpclassify(double argument)
{
#if 1
    unsigned long long ull = *(unsigned long long *) &argument << 1;
    if (ull == 0)
        return FP_ZERO;
    if (ull < (1ULL << 53))
        return FP_SUBNORMAL;
    if (ull < (2047ULL << 53))
        return FP_NORMAL;
    if (ull == (2047ULL << 53))
        return FP_INFINITE;
#else
    if (argument == 0.0)
        return FP_ZERO;
    argument = fabs(argument);
    if (argument < MINIMUM)
        return FP_SUBNORMAL;
    if (argument < INFINITY)
        return FP_NORMAL;
    if (argument == INFINITY)
        return FP_INFINITE;
#endif
    return FP_NAN;
}
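The integer branch above works because shifting the encoding left by one bit discards the sign, after which the five categories are consecutive ranges of an unsigned integer. A minimal sketch of the same classification that reads the encoding with memcpy() instead of a pointer cast; the names classify(), from_bits() and the CLASS_* constants are hypothetical:

```c
#include <stdint.h>
#include <string.h>

enum { CLASS_ZERO, CLASS_SUBNORMAL, CLASS_NORMAL, CLASS_INFINITE, CLASS_NAN };

/* Classify a double by its IEEE 754 binary64 encoding. */
int classify(double argument)
{
    uint64_t bits;
    memcpy(&bits, &argument, sizeof bits);  /* read the encoding */
    bits <<= 1;                             /* discard the sign bit */
    if (bits == 0)
        return CLASS_ZERO;
    if (bits < (UINT64_C(1) << 53))         /* exponent field = 0 */
        return CLASS_SUBNORMAL;
    if (bits < (UINT64_C(2047) << 53))      /* exponent field < 2047 */
        return CLASS_NORMAL;
    if (bits == (UINT64_C(2047) << 53))     /* exponent 2047, fraction 0 */
        return CLASS_INFINITE;
    return CLASS_NAN;                       /* exponent 2047, fraction > 0 */
}

/* Build a double from a raw encoding, for experiments. */
double from_bits(uint64_t bits)
{
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

from_bits(1) is the smallest subnormal number, and from_bits(0x7FF0000000000000) is positive infinity.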
        frexp() Function
            frexp() returns the normalized fraction and the (integral) exponent of
            its first argument.
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/w1xfschh.aspx
	.686
	.model	flat, C
	.code
single	record	sign:1, exponent:8, mantissa:23
bias	equ	1 shl (width exponent - 1) - 1
frexp	proc	public			; [esp+12] = address of exponent
					; [esp+4] = argument
if 0
	fld1				; st(0) = 1.0
	fchs				; st(0) = -1.0
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = -1.0
	fxtract				; st(0) = argument / 2.0**exponent
					;       = mantissa,
					; st(1) = exponent,
					; st(2) = -1.0
	fxch	st(1)			; st(0) = exponent,
					; st(1) = mantissa,
					; st(2) = -1.0
	fsub	st(0), st(2)		; st(0) = exponent + 1.0,
					; st(1) = mantissa,
					; st(2) = -1.0
	mov	eax, [esp+12]		; eax = address of exponent
	fistp	dword ptr [eax]		; [eax] = exponent + 1.0,
					; st(0) = mantissa,
					; st(1) = -1.0
	fscale				; st(0) = mantissa / 2.0,
					; st(1) = -1.0
	fstp	st(1)			; st(0) = mantissa / 2.0
else
	fld	real8 ptr [esp+4]	; st(0) = argument
	fxtract				; st(0) = argument / 2.0**exponent
					;       = mantissa,
					; st(1) = exponent
	fxch	st(1)			; st(0) = exponent,
					; st(1) = mantissa
	mov	eax, [esp+12]		; eax = address of exponent
	fistp	dword ptr [eax]		; [eax] = exponent,
					; st(0) = mantissa
	inc	dword ptr [eax]		; [eax] = exponent + 1
	push	(bias - 1) shl width mantissa
					; [esp] = 0x3F000000
					;       = 0.5F
	fmul	real4 ptr [esp]		; st(0) = mantissa / 2.0
	pop	eax
endif
	ret
frexp	endp
	end
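For comparison, the decomposition performed by fxtract followed by the halving can be sketched in C directly on the encoding. This sketch handles only normal, finite, non-zero arguments; zero, subnormal numbers, infinities and NaNs would need separate treatment. The name frexp_sketch() is hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Return the fraction in [0.5, 1.0) with the sign of the argument and
   store the exponent such that argument = fraction * 2**exponent.
   Valid for normal finite non-zero arguments only. */
double frexp_sketch(double argument, int *exponent)
{
    uint64_t bits;
    memcpy(&bits, &argument, sizeof bits);
    /* biased exponent field minus (bias - 1) */
    *exponent = (int) ((bits >> 52) & 2047) - 1022;
    /* replace the exponent field with that of 0.5 */
    bits = (bits & ~(UINT64_C(2047) << 52)) | (UINT64_C(1022) << 52);
    memcpy(&argument, &bits, sizeof bits);
    return argument;
}
```

With this, 8.0 decomposes into the fraction 0.5 and the exponent 4, and -3.0 into -0.75 and 2.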
        isfinite() Function
            isfinite() returns non-zero if its argument is a finite floating-point number.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
int isfinite(double argument)
{
    return (*(unsigned long long *) &argument << 1) < (2047ULL << 53);
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
isfinite:
	movq	rdx, xmm0		# rdx = argument
	add	rdx, rdx		# rdx = argument << 1
					#     = |argument| << 1
	mov	rax, 0xFFE0000000000000 # rax = 0x1.0p+1024 << 1
	cmp	rdx, rax
	setb	al			# eax = (|argument| < 0x1.0p+1024) ? 1 : 0
	mov	eax, eax		# rax = (|argument| < 0x1.0p+1024) ? 1 : 0
	ret
.size	isfinite, .-isfinite
.type	isfinite, @function
.global	isfinite
.end
        isinf() Function
            isinf() returns non-zero if its argument is +∞ or −∞.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
int isinf(double argument)
{
    return (*(unsigned long long *) &argument << 1) == (2047ULL << 53);
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
isinf:
	movq	rdx, xmm0		# rdx = argument
	add	rdx, rdx		# rdx = argument << 1
					#     = |argument| << 1
	mov	rax, 0xFFE0000000000000 # rax = 0x1.0p+1024 << 1
	cmp	rdx, rax
	sete	al			# eax = (|argument| = 0x1.0p+1024) ? 1 : 0
	mov	eax, eax		# rax = (|argument| = 0x1.0p+1024) ? 1 : 0
	ret
.size	isinf, .-isinf
.type	isinf, @function
.global	isinf
.end
        isnan() Function
            isnan() returns non-zero if its argument is a NaN.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
int isnan(double argument)
{
    return (*(unsigned long long *) &argument << 1) > (2047ULL << 53);
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
isnan:
	movq	rdx, xmm0		# rdx = argument
	add	rdx, rdx		# rdx = argument << 1
					#     = |argument| << 1
	mov	rax, 0xFFE0000000000000 # rax = 0x1.0p+1024 << 1
	cmp	rdx, rax
	seta	al			# eax = (|argument| > 0x1.0p+1024) ? 1 : 0
	mov	eax, eax		# rax = (|argument| > 0x1.0p+1024) ? 1 : 0
	ret
.size	isnan, .-isnan
.type	isnan, @function
.global	isnan
.end
        isnormal() Function
            isnormal() returns non-zero if its argument is a non-zero finite
            floating-point number.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
int isnormal(double argument)
{
    return ((*(unsigned long long *) &argument << 1) < (2047ULL << 53))
        && ((*(unsigned long long *) &argument << 1) != 0);
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
isnormal:
	movq	rdx, xmm0		# rdx = argument
	add	rdx, rdx		# rdx = argument << 1
					#     = |argument| << 1
	mov	rax, 0xFFE0000000000000 # rax = 0x1.0p+1024 << 1
	setnz	cl			# cl = (|argument| <> 0.0) ? 1 : 0
	cmp	rdx, rax
	setb	al			# eax = (|argument| < 0x1.0p+1024) ? 1 : 0
	and	eax, ecx		# rax = (0.0 < |argument| < 0x1.0p+1024) ? 1 : 0
	ret
.size	isnormal, .-isnormal
.type	isnormal, @function
.global	isnormal
.end
        issubnormal() Function
            issubnormal() returns non-zero if its argument is a (non-zero)
            subnormal floating-point number.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
int issubnormal(double argument)
{
    return ((*(unsigned long long *) &argument << 1) < (1ULL << 53))
        && ((*(unsigned long long *) &argument << 1) != 0);
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
issubnormal:
	movq	rdx, xmm0		# rdx = argument
	add	rdx, rdx		# rdx = argument << 1
					#     = |argument| << 1
	mov	rax, 0x0020000000000000 # rax = 0x1.0p-1022 << 1
	setnz	cl			# cl = (|argument| <> 0.0) ? 1 : 0
	cmp	rdx, rax
	setb	al			# eax = (|argument| < 0x1.0p-1022) ? 1 : 0
	and	eax, ecx		# rax = (0.0 < |argument| < 0x1.0p-1022) ? 1 : 0
	ret
.size	issubnormal, .-issubnormal
.type	issubnormal, @function
.global	issubnormal
.end
        ldexp() Function
            ldexp() returns its first argument multiplied by 2 raised to the power
            of its (integral) second argument.
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/zx52ds7f.aspx
; https://msdn.microsoft.com/en-us/library/dn465179.aspx
; ldexp(x, n) = x * 2**n
; scalbn(x, n) = x * 2**n
	.686
	.model	flat, C
	.code
ldexp	proc	public			; [esp+12] = exponent
scalbn	proc	public			; [esp+4] = argument
	fild	dword ptr [esp+12]	; st(0) = exponent
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = exponent
	fscale				; st(0) = argument * 2.0**exponent,
					; st(1) = exponent
	fstp	st(1)			; st(0) = argument * 2.0**exponent
	ret
scalbn	endp
ldexp	endp
	end
        ldexp10() Function
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
#define INFINITY (1.0 / 0.5e-323)
#define INDEFINITE (INFINITY - INFINITY) // assumption: the default quiet NaN
// powers of 5 from 5**0 up to 5**22 (less than 2**53, hence exact)
static const double powers5[] = {1.0,
#if 0
                                 1.0e+1 * 0x1.0p-1,
                                 1.0e+2 * 0x1.0p-2,
                                 1.0e+3 * 0x1.0p-3,
                                 1.0e+4 * 0x1.0p-4,
                                 1.0e+5 * 0x1.0p-5,
                                 1.0e+6 * 0x1.0p-6,
                                 1.0e+7 * 0x1.0p-7,
                                 1.0e+8 * 0x1.0p-8,
                                 1.0e+9 * 0x1.0p-9,
                                 1.0e+10 * 0x1.0p-10,
                                 1.0e+11 * 0x1.0p-11,
                                 1.0e+12 * 0x1.0p-12,
                                 1.0e+13 * 0x1.0p-13,
                                 1.0e+14 * 0x1.0p-14,
                                 1.0e+15 * 0x1.0p-15,
                                 1.0e+16 * 0x1.0p-16,
                                 1.0e+17 * 0x1.0p-17,
                                 1.0e+18 * 0x1.0p-18,
                                 1.0e+19 * 0x1.0p-19,
                                 1.0e+20 * 0x1.0p-20,
                                 1.0e+21 * 0x1.0p-21,
                                 1.0e+22 * 0x1.0p-22};
#else
                                 5.0,
                                 25.0,
                                 125.0,
                                 625.0,
                                 3125.0,
                                 15625.0,
                                 78125.0,
                                 390625.0,
                                 1953125.0,
                                 9765625.0,
                                 48828125.0,
                                 244140625.0,
                                 1220703125.0,
                                 6103515625.0,
                                 30517578125.0,
                                 152587890625.0,
                                 762939453125.0,
                                 3814697265625.0,
                                 19073486328125.0,
                                 95367431640625.0,
                                 476837158203125.0,
                                 2384185791015625.0};
#endif
// powers of 5 from 5**0 up to 5**(23*19) in steps of 5**23
static const double powers5positive[] = {1.0,
#if 0
                                         1.0e+23 * 0x1.0p-23,
                                         1.0e+46 * 0x1.0p-46,
                                         1.0e+69 * 0x1.0p-69,
                                         1.0e+92 * 0x1.0p-92,
                                         1.0e+115 * 0x1.0p-115,
                                         1.0e+138 * 0x1.0p-138,
                                         1.0e+161 * 0x1.0p-161,
                                         1.0e+184 * 0x1.0p-184,
                                         1.0e+207 * 0x1.0p-207,
                                         1.0e+230 * 0x1.0p-230,
                                         1.0e+253 * 0x1.0p-253,
                                         1.0e+276 * 0x1.0p-276,
                                         1.0e+299 * 0x1.0p-299,
                                         1.0e+322 * 0x1.0p-322,
                                         1.0e+345 * 0x1.0p-345,
                                         1.0e+368 * 0x1.0p-368,
                                         1.0e+391 * 0x1.0p-391,
                                         1.0e+414 * 0x1.0p-414,
                                         1.0e+437 * 0x1.0p-437};
#elif 0
                                         0x1.52D02C7E14AF6p+53,
                                         0x1.C06A5EC5433C6p+106,
                                         0x1.28BC8ABE49F64p+160,
                                         0x1.88BA3BF284E24p+213,
                                         0x1.03E29F5C2B18Cp+267,
                                         0x1.57F48BB41DB7Cp+320,
                                         0x1.C73892ECBFBF4p+373,
                                         0x1.2D3D6F88F0B3Dp+427,
                                         0x1.8EB0138858D0Ap+480,
                                         0x1.07D457124123Dp+534,
                                         0x1.5D2CE55747A18p+587,
                                         0x1.CE2137F743382p+640,
                                         0x1.31CFD3999F7B0p+694,
                                         0x1.94BD136316C04p+747,
                                         0x1.0BD561C834D28p+801,
                                         0x1.627987065DE19p+854,
                                         0x1.D524B49F94CA1p+907,
                                         0x1.3673FAEB68902p+961,
                                         0x1.9AE1957B849F0p+1014};
#else
                                         1.1920928955078125e+16,
                                         1.4210854715202004e+32,
                                         1.6940658945086007e+48,
                                         2.0194839173657902e+64,
                                         2.4074124304840448e+80,
                                         2.8698592549372254e+96,
                                         3.4211388289180104e+112,
                                         4.0783152924990778e+128,
                                         4.8617306858290170e+144,
                                         5.7956346104490959e+160,
                                         6.9089348440755557e+176,
                                         8.2360921431488463e+192,
                                         9.8181869305954531e+208,
                                         1.1704190886730495e+225,
                                         1.3952482803738708e+241,
                                         1.6632655625031839e+257,
                                         1.9827670604028510e+273,
                                         2.3636425261531484e+289,
                                         2.8176814629473071e+305};
#endif
// powers of 5 from 5**-0 down to 5**(-23*19) in steps of 5**-23
static const double powers5negative[] = {1.0,
#if 0
                                         1.0e-23 * 0x1.0p+23,
                                         1.0e-46 * 0x1.0p+46,
                                         1.0e-69 * 0x1.0p+69,
                                         1.0e-92 * 0x1.0p+92,
                                         1.0e-115 * 0x1.0p+115,
                                         1.0e-138 * 0x1.0p+138,
                                         1.0e-161 * 0x1.0p+161,
                                         1.0e-184 * 0x1.0p+184,
                                         1.0e-207 * 0x1.0p+207,
                                         1.0e-230 * 0x1.0p+230,
                                         1.0e-253 * 0x1.0p+253,
                                         1.0e-276 * 0x1.0p+276,
                                         1.0e-299 * 0x1.0p+299,
                                         1.0e-322 * 0x1.0p+322,
                                         1.0e-345 * 0x1.0p+345,
                                         1.0e-368 * 0x1.0p+368,
                                         1.0e-391 * 0x1.0p+391,
                                         1.0e-414 * 0x1.0p+414,
                                         1.0e-437 * 0x1.0p+437};
#elif 0
                                         0x1.82DB34012B251p-54,
                                         0x1.244CE242C5561p-107,
                                         0x1.B9B6364F30304p-161,
                                         0x1.4DBF7B3F71CB7p-214,
                                         0x1.F8587E7083E30p-268,
                                         0x1.7D12A4670C123p-321,
                                         0x1.1FEE341FC585Dp-374,
                                         0x1.B31BB5DC320D2p-428,
                                         0x1.48C22CA71A1BDp-481,
                                         0x1.F0CE4839198DBp-535,
                                         0x1.77603725064A8p-588,
                                         0x1.1BA03F5B21000p-641,
                                         0x1.AC9A7B3B7302Fp-695,
                                         0x1.43D7F68432923p-748,
                                         0x1.E960ED3C8FD6Bp-802,
                                         0x1.71C3978517DE1p-855,
                                         0x1.1762C3F35BDA3p-908,
                                         0x1.A63225B3E7F4Cp-962,
                                         0x1.3F008FC1D0D46p-1015};
#else
                                         8.388608e-17,
                                         7.0368744177664e-33,
                                         5.9029581035870565e-49,
                                         4.9517601571415211e-65,
                                         4.1538374868278621e-81,
                                         3.4844914372704099e-97,
                                         2.9230032746618058e-113,
                                         2.4519928653854222e-129,
                                         2.0568806966515076e-145,
                                         1.7254365866976409e-161,
                                         1.4474011154664524e-177,
                                         1.2141680576410807e-193,
                                         1.0185179881672430e-209,
                                         8.5439481436836403e-226,
                                         7.1671831749689735e-242,
                                         6.0122690119010131e-258,
                                         5.0434567931384933e-274,
                                         4.2307582002575910e-290,
                                         3.5490172084746430e-306};
#endif
enum {
    count5         = sizeof(powers5) / sizeof(*powers5),
    count5positive = sizeof(powers5positive) / sizeof(*powers5positive),
    count5negative = sizeof(powers5negative) / sizeof(*powers5negative)
};
double fabs(double x);
double ldexp(double x, int z);
double ldexp10(double argument, int exponent)
{
    if (argument != argument)
        return INDEFINITE;
    if (argument == 0.0)
        return argument;
    if (fabs(argument) == INFINITY)
        return argument;
    if (exponent > 0) {
        if (exponent > 324 + 308 - 1)
            return argument < 0.0 ? -INFINITY : INFINITY;
        if (exponent > count5 * count5positive - 1) {
            argument *= 1.0e+303;
            exponent -= 303;
        }
        return ldexp(argument, exponent)
             * powers5positive[exponent / count5]
             * powers5[exponent % count5];
    }
    if (exponent < 0) {
        if (exponent < 1 - 324 - 308)
            return argument < 0.0 ? -0.0 : 0.0;
        if (exponent < 1 - count5 * count5negative) {
            argument /= 1.0e+303;
            exponent += 303;
        }
        return ldexp(argument, exponent)
             * powers5negative[-exponent / count5]
             / powers5[-exponent % count5];
    }
    return argument;
}
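The tables rest on the identity 10**n = 2**n * 5**n: ldexp() supplies the power of 2 exactly, and the power of 5 is split as 5**n = powers5positive[n / 23] * powers5[n % 23], for example 5**50 = 5**46 * 5**4. For 0 <= n <= 22 the whole power of 10 is exact, which the following minimal sketch illustrates; the name power10_sketch() is hypothetical and the loops stand in for the table lookups:

```c
/* Compute 10**n as 5**n * 2**n for 0 <= n <= 22.
   Every intermediate result is exact: 5**22 < 2**53, and a
   multiplication by 2.0 only increments the exponent. */
double power10_sketch(int n)
{
    double result = 1.0;
    int i;

    for (i = 0; i < n; i++)
        result *= 5.0;
    for (i = 0; i < n; i++)
        result *= 2.0;
    return result;
}
```

Beyond n = 22 the powers of 5 no longer fit into 53 bits, so the entries of powers5positive and powers5negative are rounded values.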
        remainder() Function
            remainder() returns the remainder from the division of its arguments,
            with the quotient rounded according to the current rounding mode.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double fma(double x, double y, double z);
double rint(double x);
double remainder(double dividend, double divisor)
{
#ifdef RINT
    double quotient = rint(dividend / divisor);
#else
    double quotient = dividend / divisor;
    if ((quotient > 0.0) && (quotient < 0x1.0p+52)) {
        quotient += 0x1.0p+52;
        quotient -= 0x1.0p+52;
    } else if ((quotient < 0.0) && (quotient > -0x1.0p+52)) {
        quotient -= 0x1.0p+52;
        quotient += 0x1.0p+52;
    }
#endif
#if 0 // avoid subtractive cancellation
    return quotient == 0.0 ? dividend : dividend - divisor * quotient;
#else
    return fma(-quotient, divisor, dividend);
#endif
}
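Unlike fmod(), the quotient is rounded here instead of truncated, so the result can have the opposite sign of the dividend. A minimal self-contained sketch, assuming the default round-to-nearest (ties to even) mode and strict double evaluation; the name remainder_sketch() is hypothetical, and it uses a plain subtraction where the listing above uses fma():

```c
/* Remainder with the quotient rounded to the nearest integer;
   assumes the default round-to-nearest mode. */
double remainder_sketch(double dividend, double divisor)
{
    const double magic = 0x1.0p+52;     /* 2**52 */
    double quotient = dividend / divisor;

    if (quotient > 0.0 && quotient < magic)
        quotient = (quotient + magic) - magic;
    else if (quotient < 0.0 && quotient > -magic)
        quotient = (quotient - magic) + magic;
    return dividend - divisor * quotient;
}
```

For 5.0 and 3.0 the quotient 1.66… rounds to 2.0, giving the remainder -1.0 where truncation would give 2.0. For 5.0 and 2.0 the tie 2.5 rounds to the even integer 2.0, giving 1.0.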
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: requires SSE 4.1 instruction set!
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = dividend
					# xmm1 = divisor
remainder:
	movsd	xmm2, xmm0		# xmm2 = dividend
	divsd	xmm2, xmm1		# xmm2 = dividend / divisor
					#      = quotient
	roundsd	xmm2, xmm2, 4		# xmm2 = rint(quotient)
	mulsd	xmm1, xmm2		# xmm1 = divisor * rint(quotient)
	subsd	xmm0, xmm1		# xmm0 = dividend - divisor * rint(quotient)
					#      = remainder
	ret
.size	remainder, .-remainder
.type	remainder, @function
.global	remainder
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/dn465170.aspx
	.686
	.model	flat, C
	.code
remainder proc	public			; [esp+12] = divisor
					; [esp+4] = dividend
	fld	real8 ptr [esp+12]	; st(0) = divisor
	fld	real8 ptr [esp+4]	; st(0) = dividend,
					; st(1) = divisor
Lreduce:
	fprem1				; st(0) = remainder,
					; st(1) = divisor
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lreduce
	fstp	st(1)			; st(0) = remainder
	ret
remainder endp
	end
        remquo() Function
            remquo() returns the remainder and the (partial) integral quotient from
            the division of its arguments.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double fma(double x, double y, double z);
double trunc(double x);
double remquo(double dividend, double divisor, int *quotient)
{
#ifdef TRUNC
    double ratio = trunc(dividend / divisor);
#else
    double tmp, ratio = dividend / divisor;
    if ((ratio > 0.0) && (ratio < 0x1.0p+52)) {
        tmp = ratio;
        ratio += 0x1.0p+52;
        ratio -= 0x1.0p+52;
        if (ratio > tmp)
            ratio -= 1.0;
    } else if ((ratio < 0.0) && (ratio > -0x1.0p+52)) {
        tmp = ratio;
        ratio -= 0x1.0p+52;
        ratio += 0x1.0p+52;
        if (ratio < tmp)
            ratio += 1.0;
    }
#endif
    *quotient = (int) ratio;
#if 0 // avoid subtractive cancellation
    return ratio == 0.0 ? dividend : dividend - divisor * ratio;
#else
    return fma(-ratio, divisor, dividend);
#endif
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: requires SSE 4.1 instruction set!
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = dividend
					# xmm1 = divisor
					# rdi = address of quotient
remquo:
	movsd	xmm2, xmm0		# xmm2 = dividend
	divsd	xmm2, xmm1		# xmm2 = dividend / divisor
					#      = quotient
	roundsd	xmm2, xmm2, 3		# xmm2 = trunc(quotient)
	mulsd	xmm1, xmm2		# xmm1 = divisor * trunc(quotient)
	subsd	xmm0, xmm1		# xmm0 = dividend - divisor * trunc(quotient)
					#      = remainder
	cvtsd2si eax, xmm2		# eax = trunc(quotient)
	mov	[rdi], eax		# *quotient = trunc(quotient)
	ret
.size	remquo, .-remquo
.type	remquo, @function
.global	remquo
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/dn465175.aspx
	.686
	.model	flat, C
	.code
remquo	proc	public			; [esp+20] = address of (partial) quotient
					; [esp+12] = divisor
					; [esp+4] = dividend
	fld	real8 ptr [esp+12]	; st(0) = divisor
	fld	real8 ptr [esp+4]	; st(0) = dividend,
					; st(1) = divisor
	mov	ecx, [esp+20]		; ecx = address of quotient
	mov	eax, [esp+16]		; eax = high dword of divisor
	xor	eax, [esp+8]		; eax = high dword of divisor
					;     ^ high dword of dividend
	cdq				; edx = (sign of dividend <> sign of divisor) ? -1 : 0
Lreduce:
	fprem1				; st(0) = dividend modulo divisor,
					; st(1) = divisor,
					; C0:C3:C1 = least significant bits of quotient
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lreduce
	fstp	st(1)			; st(0) = dividend modulo divisor
					;       = remainder
Lquotient:
	and	eax, 4300h		; eax = 0b0:C3:0000:C1:C0:00000000
	imul	eax, 910000h
	shr	eax, 29			; eax = C0:C3:C1
					;     = (partial) quotient
Lsign:
	xor	eax, edx
	sub	eax, edx		; eax = (sign of dividend <> sign of divisor)
					;     ? -quotient : quotient
	mov	[ecx], eax
	ret
remquo	endp
	end
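The bit shuffle at Lquotient in the i386 listing above can be tried in plain C. The following is a minimal sketch and not part of the original sources: the name quotient_bits is hypothetical, and the status word is passed in as an ordinary integer instead of being read with fstsw. It gathers C3 (bit 14), C1 (bit 9) and C0 (bit 8) into the 3-bit value C0:C3:C1 with a single multiplication.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: extracts the three least
   significant quotient bits that fprem1 leaves in the x87 status word.
   C0 is bit 8, C1 is bit 9 and C3 is bit 14 of the status word. */
uint32_t quotient_bits(uint32_t status_word)
{
    uint32_t flags = status_word & 0x4300u;  /* keep C3, C1 and C0 only */

    /* 0x910000 = 2^23 + 2^20 + 2^16 moves C0 to bit 31, C3 to bit 30
       and C1 to bit 29; all other partial products land on distinct
       lower bits or overflow, so no carry disturbs bits 29..31. */
    return (flags * 0x910000u) >> 29;        /* = C0:C3:C1 */
}
```

For a status word with only C0 and C1 set (0x0300), the function returns 5, that is, the quotient is congruent to 5 modulo 8.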
        rint() Function
            rint() returns its argument rounded to an integral value according
            to the current rounding mode.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double rint(double argument)
{
    if ((argument > 0.0) && (argument < 0x1.0p+52)) {
        argument += 0x1.0p+52;
        argument -= 0x1.0p+52;
    } else if ((argument < 0.0) && (argument > -0x1.0p+52)) {
        argument -= 0x1.0p+52;
        argument += 0x1.0p+52;
        if (argument == 0.0)
            argument = -0.0;
    } else if (argument != 0.0)
        argument += 0.0;
    return argument;
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: requires SSE 4.1 instruction set!
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
rint:
	roundsd	xmm0, xmm0, 4		# xmm0 = argument rounded according to current mode
	ret
.size	rint, .-rint
.type	rint, @function
.global	rint
.end
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: depending on current rounding mode, rint() is equivalent to
#       floor(), ceil(), roundeven() or trunc(), but differs from
#       round(); while roundeven() breaks ties to the nearest even
#       integer, round() breaks ties away from 0, which neither the
#       CPU nor the FPU supports in its instruction set!
# NOTE: rint() preserves -0.0, and returns -0.0 for argument in
#       [-0.5, -0.0] or (-1.0, -0.0], depending on the rounding mode
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
rint:
	mov	rax, 0x4330000000000000
	movq	xmm2, rax		# xmm2 = 0x1.0p+52
					#      = 4503599627370496.0
					#      = minimum non-fractional number
	xorpd	xmm1, xmm1		# xmm1 = 0.0
	subsd	xmm1, xmm0		# xmm1 = -argument
	andpd	xmm1, xmm0		# xmm1 = |argument|
	xorpd	xmm0, xmm1		# xmm0 = (argument & -0.0) ? -0.0 : +0.0
	addsd	xmm1, xmm2		# xmm1 = |argument| + 0x1.0p+52
	subsd	xmm1, xmm2		# xmm1 = |argument| + 0x1.0p+52 - 0x1.0p+52
					#      = rint(|argument|)
	orpd	xmm0, xmm1		# xmm0 = rint(argument)
	ret
.size	rint, .-rint
.type	rint, @function
.global	rint
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/dn465165.aspx
; NOTE: depending on current rounding mode, rint() is equivalent to
;       floor(), ceil(), roundeven() or trunc(), but differs from
;       round(); while roundeven() breaks ties to the nearest even
;       integer, round() rounds ties away from 0, which neither the
;       FPU nor the CPU supports in its instruction set!
	.686
	.model	flat, C
	.code
rint	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = argument
	frndint				; st(0) = rint(argument)
	ret
rint	endp
	end
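The notes in the listings above state that rint() differs from round() in how it treats ties. The following minimal sketch makes that visible; it assumes the default rounding mode (round to nearest, ties to even), the name rint_by_magic is hypothetical, and the volatile qualifier is an addition that keeps a compiler from simplifying the add/subtract pair away.

```c
/* Hypothetical helper, for illustration only: rounds to an integral value
   by adding and subtracting 2^52. Doubles of magnitude >= 2^52 have no
   fraction bits, so the addition itself performs the rounding.
   Assumes the default rounding mode (nearest, ties to even). */
double rint_by_magic(double x)
{
    const double magic = 0x1.0p+52;
    volatile double t;                 /* blocks algebraic simplification */

    if (x > 0.0 && x < magic) {
        t = x + magic;
        return t - magic;
    }
    if (x < 0.0 && x > -magic) {
        t = x - magic;
        t = t + magic;
        return t == 0.0 ? -0.0 : t;    /* keep the sign of tiny negatives */
    }
    return x;                          /* +-0.0, large values, infinity, NaN */
}
```

With this helper, 0.5 and 2.5 round to 0.0 and 2.0, whereas round() as described in the next section yields 1.0 and 3.0 for the same arguments.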
        round() Function
            round() returns the nearest integral value to its argument,
            rounding ties away from 0.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double round(double argument)
{
#ifdef TRUNC
    double trunc(double x);
    double tmp = trunc(argument);
    if (argument > 0.0)
        return argument - tmp < 0.5 ? tmp : tmp + 1.0;
    if (argument < 0.0)
        return argument - tmp > -0.5 ? tmp : tmp - 1.0;
    return tmp;
#else
    double tmp;
    if ((argument > 0.0) && (argument < 0x1.0p+52)) {
        tmp = argument;
        argument += 0x1.0p+52;
        argument -= 0x1.0p+52;
        if (argument - tmp <= -0.5)
            argument += 1.0;
    } else if ((argument < 0.0) && (argument > -0x1.0p+52)) {
        tmp = argument;
        argument -= 0x1.0p+52;
        argument += 0x1.0p+52;
        if (argument - tmp >= 0.5)
            argument -= 1.0;
        else if (argument == 0.0)
            argument = -0.0;
    } else if (argument != 0.0)
        argument += 0.0;
    return argument;
#endif
}
        roundeven() Function
            roundeven() returns the nearest integral value to its argument,
            rounding ties to even.
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: requires SSE 4.1 instruction set!
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
roundeven:
	roundsd	xmm0, xmm0, 0		# xmm0 = argument rounded to nearest (even) integer
	ret
.size	roundeven, .-roundeven
.type	roundeven, @function
.global	roundeven
.end
        signbit() Function
            signbit() returns 1 if the sign of its argument is negative, else 0.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
int signbit(double argument)
{
    return *(long long *) &argument < 0;
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
signbit:
	movmskpd eax, xmm0		# rax = (argument & -0.0) ? 0b?1 : 0b?0
	and	eax, 1			# rax = (argument & -0.0) ? 1 : 0
					#     = signbit(argument)
	ret
.size	signbit, .-signbit
.type	signbit, @function
.global	signbit
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
	.686
	.model	flat, C
	.code
signbit	proc	public			; [esp+4] = argument
	mov	eax, [esp+8]		; eax = high dword of argument
	shr	eax, 31			; eax = (argument & -0.0) ? 1 : 0
	ret
signbit	endp
	end
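The C listing above reads the sign through a pointer cast. As a hedged alternative, the sketch below copies the bits with memcpy(), which avoids the cast; the name sign_of is hypothetical and the code assumes the 64-bit IEEE 754 double format described at the top of this page.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only: returns the sign bit
   (bit 63) of an IEEE 754 double, so -0.0 and +0.0 are told apart
   although they compare equal. */
int sign_of(double argument)
{
    uint64_t bits;

    memcpy(&bits, &argument, sizeof bits);  /* copy the bit pattern */
    return (int) (bits >> 63);
}
```

sign_of(-0.0) returns 1 while sign_of(0.0) returns 0, which a comparison like argument < 0.0 cannot detect.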
        sqrt() Function
            sqrt() returns the positive square root of its argument.
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
sqrt:
	sqrtsd	xmm0, xmm0		# xmm0 = square root of argument
	ret
.size	sqrt, .-sqrt
.type	sqrt, @function
.global	sqrt
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/f1xa99e6.aspx
	.686
	.model	flat, C
	.code
sqrt	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = argument
	fsqrt				; st(0) = square root of argument
	ret
sqrt	endp
	end
        trunc() Function
            trunc() returns the integral value nearest to, but not greater in
            magnitude than, its argument.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double trunc(double argument)
{
    double tmp;
    if ((argument > 0.0) && (argument < 0x1.0p+52)) {
        tmp = argument;
        argument += 0x1.0p+52;
        argument -= 0x1.0p+52;
        if (argument > tmp)
            argument -= 1.0;
    } else if ((argument < 0.0) && (argument > -0x1.0p+52)) {
        tmp = argument;
        argument -= 0x1.0p+52;
        argument += 0x1.0p+52;
        if (argument < tmp)
            argument += 1.0;
        else if (argument == 0.0)
            argument = -0.0;
    } else if (argument != 0.0)
        argument += 0.0;
    return argument;
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: requires SSE 4.1 instruction set!
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
trunc:
	roundsd	xmm0, xmm0, 3		# xmm0 = argument rounded towards zero
	ret
.size	trunc, .-trunc
.type	trunc, @function
.global	trunc
.end
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: trunc() returns -0.0 for argument in (-1.0, -0.0]
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
trunc:
	mov	rax, 0x4330000000000000
	movq	xmm2, rax		# xmm2 = 0x1.0p+52
					#      = 4503599627370496.0
					#      = minimum non-fractional number
	mov	rax, 0x3FF0000000000000
	xorpd	xmm1, xmm1		# xmm1 = 0.0
	subsd	xmm1, xmm0		# xmm1 = -argument
	andpd	xmm1, xmm0		# xmm1 = |argument|
	xorpd	xmm0, xmm1		# xmm0 = (argument & -0.0) ? -0.0 : +0.0
	movsd	xmm3, xmm1		# xmm3 = |argument|
	addsd	xmm1, xmm2		# xmm1 = |argument| + 0x1.0p+52
	subsd	xmm1, xmm2		# xmm1 = |argument| + 0x1.0p+52 - 0x1.0p+52
					#      = rint(|argument|)
	movq	xmm2, rax		# xmm2 = 0x1.0p+0
					#      = 1.0
	cmpsd	xmm3, xmm1, 1		# xmm3 = (|argument| < rint(|argument|)) ? ~0L : 0L
	andpd	xmm3, xmm2		# xmm3 = (|argument| < rint(|argument|)) ? 1.0 : 0.0
	subsd	xmm1, xmm3		# xmm1 = (|argument| < rint(|argument|)) ? -1.0 : 0.0
					#      + rint(|argument|)
					#      = trunc(|argument|)
	orpd	xmm0, xmm1		# xmm0 = trunc(argument)
	ret
.size	trunc, .-trunc
.type	trunc, @function
.global	trunc
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/mt720727.aspx
	.686
	.model	flat, C
	.code
trunc	proc	public			; [esp+4] = argument
	fld	real8 ptr [esp+4]	; st(0) = argument
if 0
	ftst
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jz	Lexit			; argument = ±0.0?
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fld	st(1)			; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = argument
Lmodulo:
	fprem				; st(0) = argument modulo 1.0
					;       = argument',
					; st(1) = 1.0,
					; st(2) = argument
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lmodulo			; |argument'| >= 1.0?
	fstp	st(1)			; st(0) = argument',
					; st(1) = argument
	fsubp	st(1), st(0)		; st(0) = argument - argument'
					;       = trunc(argument)
Lexit:
else
	fld	st(0)			; st(0) = argument,
					; st(1) = argument
	fabs				; st(0) = |argument|,
					; st(1) = argument
	fld	st(0)			; st(0) = |argument|,
					; st(1) = |argument|,
					; st(2) = argument
	frndint				; st(0) = rint(|argument|),
					; st(1) = |argument|,
					; st(2) = argument
	fxch	st(1)			; st(0) = |argument|,
					; st(1) = rint(|argument|),
					; st(2) = argument
	fucomip	st(0), st(1)		; eflags = |argument| ><=# rint(|argument|),
					; st(0) = rint(|argument|),
					; st(1) = argument
	fldz				; st(0) = 0.0,
					; st(1) = rint(|argument|),
					; st(2) = argument
	fld1				; st(0) = 1.0,
					; st(1) = 0.0,
					; st(2) = rint(|argument|),
					; st(3) = argument
	fcmovnb	st(0), st(1)		; st(0) = (rint(|argument|) <= |argument|) ? 0.0 : 1.0,
					; st(1) = 0.0,
					; st(2) = rint(|argument|),
					; st(3) = argument
	fsubp	st(2), st(0)		; st(0) = 0.0,
					; st(1) = trunc(|argument|),
					; st(2) = argument
	fucomip	st(0), st(2)		; eflags = 0.0 ><=# argument,
					; st(0) = trunc(|argument|),
					; st(1) = argument
	fst	st(1)			; st(0) = trunc(|argument|),
					; st(1) = trunc(|argument|)
	fchs				; st(0) = -trunc(|argument|),
					; st(1) = trunc(|argument|)
	fcmovbe	st(0), st(1)		; st(0) = (argument >= 0.0) ? trunc(|argument|) : -trunc(|argument|)
					;       = trunc(argument),
					; st(1) = trunc(|argument|)
	fstp	st(1)			; st(0) = trunc(argument)
endif
	ret
trunc	endp
	end
        ceil() Function
            ceil() returns the smallest integral value not less than its argument.
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: ceil() returns -0.0 for argument in (-1.0, -0.0]
.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
ceil:
	movq	rax, xmm0		# rax = argument
	mov	rcx, rax		# rcx = argument
	add	rcx, rcx
	jz	.Lexit			# argument = ±0.0?
	shr	rcx, 53			# rcx = biased exponent of |argument|
	sub	ecx, BIAS		# rcx = unbiased exponent of |argument|
	jl	.Lsmall			# |argument| < 1.0?
	cmp	ecx, BIAS
	jg	.Lmxcsr			# argument = ±INFINITY?
					# argument = INDEFINITE?
	sub	ecx, 52
	jge	.Lexit			# |argument| >= 0x1.0p+52?
	neg	ecx			# ecx = number of bits in fractional part of mantissa
	mov	rdx, rax
	shr	rax, cl
	shl	rax, cl			# rax = trunc(argument)
	xor	rdx, rax		# rdx = fractional part of mantissa
	movq	xmm0, rax		# xmm0 = trunc(argument)
	neg	rdx			# CF = (fractional part of mantissa <> 0)
	sbb	ecx, ecx		# ecx = (fractional part of mantissa <> 0) ? -1 : 0
	shr	ecx, 22			# ecx = (fractional part of mantissa <> 0) ? 0x3FF : 0
	cqo				# rdx = (trunc(argument) < 0.0) ? -1 : 0
	not	edx			# edx = (trunc(argument) < 0.0) ? 0 : -1
	and	edx, ecx
	shl	rdx, 52			# rdx = (trunc(argument) < 0.0)
					#     | (fractional part of mantissa = 0)
					#     ? 0 : 0x3FF0000000000000
	movq	xmm1, rdx		# xmm1 = (trunc(argument) < 0.0)
					#      | (fractional part of mantissa = 0)
					#      ? 0.0 : 1.0
	addsd	xmm0, xmm1		# xmm0 = ceil(argument)
	ret
.Lsmall:
	test	rax, rax
	jns	.Lpositive
.Lnegative:
	cqo				# rdx = (argument & -0.0) ? -1 : 0
	shl	rdx, 63			# rdx = (argument & -0.0) ? 0x8000000000000000 : 0
	movq	xmm0, rdx		# xmm0 = (argument & -0.0) ? -0.0 : 0.0
	ret
.Lpositive:
	mov	rax, 0x3FF0000000000000
	movq	xmm0, rax		# xmm0 = 0x1.0p+0
					#      = 1.0
	ret
.Lmxcsr:
	addsd	xmm0, xmm0
.Lexit:
	ret
.size	ceil, .-ceil
.type	ceil, @function
.global	ceil
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/atdhw2dx.aspx
; NOTE: ceil() returns -0.0 for argument in (-1.0, -0.0]
	.code
double	record	sign:1, exponent:11, mantissa:52
bias	equ	1 shl (width exponent - 1) - 1
ceil	proc	public			; xmm0 = argument
	movd	rax, xmm0		; rax = argument
	add	rax, rax
	jz	Lexit			; argument = ±0.0?
	shr	rax, 1 + width mantissa	; rax = biased exponent of |argument|
	cmp	eax, bias + width mantissa
	jae	Lexit			; |argument| >= 0x1.0p+52?
					; (argument = integer?)
					; argument = INDEFINITE?
	cvtsd2si rax, xmm0		; rax = llrint(argument)
	cvtsi2sd xmm1, rax		; xmm1 = rint(argument)
	comisd	xmm1, xmm0		; CF = (rint(argument) < argument)
	adc	rax, 0			; rax = llrint(argument)
					;     + (rint(argument) < argument)
					;     = ceil(argument)
	cvtsi2sd xmm2, rax		; xmm2 = ceil(argument)
	xorpd	xmm1, xmm1		; xmm1 = 0.0
	subsd	xmm1, xmm0		; xmm1 = -argument
	andpd	xmm1, xmm0		; xmm1 = |argument|
	xorpd	xmm0, xmm1		; xmm0 = (argument & -0.0) ? -0.0 : +0.0
	orpd	xmm0, xmm2		; xmm0 = ceil(argument)
Lexit:
	ret
ceil	endp
	end
            Note: returns a signalingNaN unchanged!
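The GNU assembler listing for ceil() above removes the fraction bits of the mantissa with a shift pair and then adds 1.0 when needed. The following is a minimal C sketch of the same idea using a mask instead of shifts; the name ceil_by_mask is hypothetical, and unlike the listing the sketch does not touch any floating-point exception flags.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only: ceil() by clearing the
   fraction bits of an IEEE 754 double and correcting positive values. */
double ceil_by_mask(double x)
{
    uint64_t bits, mask;
    int exponent;
    double truncated;

    memcpy(&bits, &x, sizeof bits);
    exponent = (int) ((bits >> 52) & 2047) - 1023;  /* unbiased exponent */

    if (exponent >= 52)                 /* integral, infinite or NaN */
        return x;
    if (exponent < 0) {                 /* |x| < 1.0 */
        if ((bits << 1) == 0)
            return x;                   /* +-0.0 is preserved */
        return (bits >> 63) ? -0.0 : 1.0;
    }
    mask = 0x000FFFFFFFFFFFFFULL >> exponent;  /* fraction bits of mantissa */
    if ((bits & mask) == 0)
        return x;                       /* already integral */
    bits &= ~mask;
    memcpy(&truncated, &bits, sizeof truncated);    /* = trunc(x) */
    return x > 0.0 ? truncated + 1.0 : truncated;
}
```

For 1.25 the mask clears the fraction to give 1.0 and the correction yields 2.0; for -1.25 the truncated value -1.0 is already the result.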
        copysign() Function
            copysign() returns its first operand with the sign of its second
            operand.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double fabs(double x);
int signbit(double x);
double copysign(double to, double from)
{
#if 0
    return signbit(from) ? -fabs(to) : fabs(to);
#else
    return signbit(from) == signbit(to) ? to : -to;
#endif
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = to
					# xmm1 = from
copysign:
	movq	rcx, xmm0		# rcx = to
	movq	rdx, xmm1		# rdx = from
	add	rdx, rdx		# CF = (from & -0.0)
	adc	rcx, rcx
	ror	rcx, 1
	movq	xmm0, rcx		# xmm0 = copysign(to, from)
	ret
.size	copysign, .-copysign
.type	copysign, @function
.global	copysign
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/0yafk1hc.aspx
	.686
	.model	flat, C
	.code
copysign proc	public			; [esp+12] = from
					; [esp+4] = to
	mov	eax, [esp+16]		; eax = high dword of from
if 0
	mov	edx, [esp+8]		; edx = high dword of to
	add	eax, eax		; CF = (from & -0.0)
	adc	edx, edx
	ror	edx, 1
	mov	[esp+8], edx
else
	shld	[esp+8], eax, 1
	ror	dword ptr [esp+8], 1
endif
	fld	real8 ptr [esp+4]	; st(0) = (from & -0.0) ? -|to| : |to|
	ret
copysign endp
	end
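Both assembler listings above splice the sign bit of from into to without any floating-point arithmetic. A minimal C sketch of that bit-level approach follows; the name copy_sign is hypothetical, and memcpy() is used here as an assumption-free way to reach the bit pattern of a 64-bit IEEE 754 double.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only: combines the magnitude
   bits of `to` with the sign bit of `from`. */
double copy_sign(double to, double from)
{
    const uint64_t sign_mask = 0x8000000000000000ULL;
    uint64_t to_bits, from_bits;

    memcpy(&to_bits, &to, sizeof to_bits);
    memcpy(&from_bits, &from, sizeof from_bits);
    to_bits = (to_bits & ~sign_mask) | (from_bits & sign_mask);
    memcpy(&to, &to_bits, sizeof to);
    return to;
}
```

Because only bits are moved, copy_sign(3.0, -0.0) returns -3.0: the sign of a negative zero is honoured even though -0.0 < 0.0 is false.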
        floor() Function
            floor() returns the largest integral value not greater than its argument.
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: floor() preserves -0.0
.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
floor:
	movq	rax, xmm0		# rax = argument
	mov	rcx, rax		# rcx = argument
	add	rcx, rcx
	jz	.Lexit			# argument = ±0.0?
	shr	rcx, 53			# rcx = biased exponent of |argument|
	sub	ecx, BIAS		# rcx = unbiased exponent of |argument|
	jl	.Lsmall			# |argument| < 1.0?
	cmp	ecx, BIAS
	jg	.Lmxcsr			# argument = ±INFINITY?
					# argument = INDEFINITE?
	sub	ecx, 52
	jge	.Lexit			# |argument| >= 0x1.0p+52?
	neg	ecx			# ecx = number of bits in fractional part of mantissa
	mov	rdx, rax
	shr	rax, cl
	shl	rax, cl			# rax = trunc(argument)
	xor	rdx, rax		# rdx = fractional part of mantissa
	movq	xmm0, rax		# xmm0 = trunc(argument)
	neg	rdx			# CF = (fractional part of mantissa <> 0)
	sbb	ecx, ecx		# ecx = (fractional part of mantissa <> 0) ? -1 : 0
	shr	ecx, 22			# ecx = (fractional part of mantissa <> 0) ? 0x3FF : 0
	cqo				# rdx = (trunc(argument) < 0.0) ? -1 : 0
	and	edx, ecx
	shl	rdx, 52			# rdx = (trunc(argument) < 0.0)
					#     & (fractional part of mantissa <> 0)
					#     ? 0x3FF0000000000000 : 0
	movq	xmm1, rdx		# xmm1 = (trunc(argument) < 0.0)
					#      & (fractional part of mantissa <> 0)
					#      ? 1.0 : 0.0
	subsd	xmm0, xmm1		# xmm0 = floor(argument)
	ret
.Lsmall:
	test	rax, rax
	js	.Lnegative
.Lpositive:
	xorpd	xmm0, xmm0		# xmm0 = 0.0
	ret
.Lnegative:
	mov	rax, 0xBFF0000000000000
	movq	xmm0, rax		# xmm0 = -0x1.0p+0
					#      = -1.0
	ret
.Lmxcsr:
	addsd	xmm0, xmm0
.Lexit:
	ret
.size	floor, .-floor
.type	floor, @function
.global	floor
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/x39715t6.aspx
; NOTE: floor() preserves -0.0
	.code
double	record	sign:1, exponent:11, mantissa:52
bias	equ	1 shl (width exponent - 1) - 1
floor	proc	public			; xmm0 = argument
	movd	rax, xmm0		; rax = argument
	add	rax, rax
	jz	Lexit			; argument = ±0.0?
	shr	rax, 1 + width mantissa	; rax = biased exponent of |argument|
	cmp	eax, bias + width mantissa
	jae	Lexit			; |argument| >= 0x1.0p+52?
					; (argument = integer?)
					; argument = INDEFINITE?
	cvtsd2si rax, xmm0		; rax = llrint(argument)
	cvtsi2sd xmm1, rax		; xmm1 = rint(argument)
	comisd	xmm0, xmm1		; CF = (rint(argument) > argument)
	sbb	rax, 0			; rax = llrint(argument)
					;     - (rint(argument) > argument)
					;     = floor(argument)
	cvtsi2sd xmm2, rax		; xmm2 = floor(argument)
	xorpd	xmm1, xmm1		; xmm1 = 0.0
	subsd	xmm1, xmm0		; xmm1 = -argument
	andpd	xmm1, xmm0		; xmm1 = |argument|
	xorpd	xmm0, xmm1		; xmm0 = (argument & -0.0) ? -0.0 : +0.0
	orpd	xmm0, xmm2		; xmm0 = floor(argument)
Lexit:
	ret
floor	endp
	end
            Note: returns a signalingNaN unchanged!
        frexp() Function
            frexp() returns the (normalized) fraction and the (integral)
            exponent of its first argument.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double frexp(double argument, int *exponent)
{
    unsigned long long sign, ull;
    if (argument == 0.0)
        *exponent = 0;
    else {
        ull = *(unsigned long long *) &argument;
        *exponent = ull >> 52;
        *exponent &= 2047;
        if (*exponent > 0) {
            ull &= ~(2047ULL << 52);
            ull |= 1022ULL << 52;
        } else {
            sign = ull & (1ULL << 63);
            *exponent = 1; // denormals have an effective biased exponent of 1
            do {
                *exponent -= 1;
                ull += ull;
            } while (ull < (1ULL << 52));
            ull ^= 1023ULL << 52;
            ull |= sign;
        }
        *exponent -= 1022;
        argument = *(double *) &ull;
    }
    return argument;
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
					# rdi = address of exponent
frexp:
	movq	rax, xmm0		# rax = argument
	lea	rcx, [rax+rax]		# rcx = argument << 1
					#     = |argument| << 1
	shr	rcx, 1			# rcx = |argument|
	mov	[rdi], ecx
	jz	.Lexit			# argument = ±0.0?
	shr	rcx, 52			# rcx = biased exponent
	jz	.Ldenormal		# biased exponent = 0?
					# (argument denormal?)
	sub	ecx, BIAS - 1		# ecx = unbiased exponent + 1
	cmp	ecx, BIAS + 2
	mov	[rdi], ecx
	je	.Lexit			# biased exponent = 2047?
					# (argument = INDEFINITE?)
					# (argument = ±INFINITY?)
.Lnormal:
	rol	rax, 1
	shl	rax, 11
	or	rax, BIAS - 1
	ror	rax, 12			# rax = fractional part of argument
	movq	xmm0, rax		# xmm0 = fractional part of argument
.Lexit:
	ret
.Ldenormal:
	xor	edx, edx
	add	rax, rax		# rax = argument << 1
					#     = |argument| << 1
	adc	edx, edx		# rdx = (argument & -0.0) ? 1 : 0
	shl	edx, 11
	or	edx, BIAS - 1
	bsr	rcx, rax		# rcx = index of most significant '1' bit in |argument| << 1
	xor	ecx, 63			# ecx = number of leading '0' bits in |argument| << 1
					#     = 11 - biased exponent
	shl	rax, cl			# rax = normalized significand of argument << 11
	add	rax, rax		# rax = fractional part of argument << 12
	or	rax, rdx
	ror	rax, 12			# rax = fractional part of argument
	movq	xmm0, rax		# xmm0 = fractional part of argument
	neg	ecx			# ecx = biased exponent - 11
	sub	ecx, BIAS - 12		# ecx = unbiased exponent + 1
	mov	[rdi], ecx
	ret
.size	frexp, .-frexp
.type	frexp, @function
.global	frexp
.end
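The contract of frexp() can be checked with a short round trip. The sketch below is an illustration under assumptions: it calls frexp(), ldexp() and fabs() from the C library's <math.h>, not the listings above, and the name frexp_round_trip is hypothetical.

```c
#include <math.h>

/* Hypothetical helper, for illustration only: returns 1 if frexp()
   splits x into a fraction with magnitude in [0.5, 1.0) and an exponent
   that ldexp() recombines to exactly x, else 0. Zero is expected to
   yield a zero fraction and a zero exponent. */
int frexp_round_trip(double x)
{
    int exponent;
    double fraction = frexp(x, &exponent);

    if (x == 0.0)
        return fraction == 0.0 && exponent == 0;
    return fabs(fraction) >= 0.5 && fabs(fraction) < 1.0
        && ldexp(fraction, exponent) == x;
}
```

For 8.0 the split is 0.5 and 4; for the smallest denormal 0x1.0p-1074 it is 0.5 and -1073, which is why denormal arguments need a normalization step of their own.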
        ldexp() Function
            ldexp() returns its first argument multiplied by 2 raised to the
            power of its (integral) second argument.
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: for denormal argument and negative exponent or denormal
#       result, ldexp() rounds to nearest with ties to even!
.arch	generic64
.code64
.equiv	BIAS, 1023
.equiv	JCCLESS, 1
.intel_syntax noprefix
.text
					# xmm0 = argument
					# edi = exponent
ldexp:
	test	edi, edi
	jz	.Lexit			# exponent = 0?
	movq	rsi, xmm0		# rsi = argument
	lea	rax, [rsi+rsi]		# rax = argument << 1
					#     = |argument| << 1
	shr	rax, 1			# rax = |argument|
	jz	.Lexit			# argument = ±0.0?
	mov	rdx, rax		# rdx = |argument|
	shr	rax, 52			# rax = biased exponent
	jz	.Ldenormal		# biased exponent = 0?
					# (argument denormal?)
	cmp	eax, BIAS * 2 + 1
	je	.Lexit			# biased exponent = 2047?
					# (argument = INDEFINITE?)
					# (argument = ±INFINITY?)
.Lnormal:
	add	eax, edi		# eax = new biased exponent
	jle	.Lotherflow		# new biased exponent < 1?
					# (possible exponent underflow?)
	cmp	eax, BIAS * 2
	jg	.Loverflow		# new biased exponent > 2046?
					# (exponent overflow?)
	shl	rdi, 52
	add	rdx, rdi		# rdx = |argument| * 2.0**exponent
.Lcopysign:
.if 0
	shld	rdx, rsi, 1
	ror	rdx, 1			# rdx = argument * 2.0**exponent
.elseif 0
	add	rdx, rdx
	add	rsi, rsi		# CF = (argument & -0.0)
	rcr	rdx, 1			# rdx = argument * 2.0**exponent
.else
	add	rsi, rsi		# CF = (argument & -0.0)
	adc	rdx, rdx
	ror	rdx, 1			# rdx = argument * 2.0**exponent
.endif
	movq	xmm0, rdx		# xmm0 = argument * 2.0**exponent
.Lexit:
	ret
.Lunderflow:
	xor	rdx, rdx		# rdx = 0.0
	jmp	.Lcopysign
.Loverflow:
	mov	rdx, 0x7FF0000000000000	# rdx = 0x1.0p+1024
					#     = INFINITY
	jmp	.Lcopysign
.Lotherflow:
	cmp	eax, -52
	jl	.Lunderflow		# new (biased) exponent + 1 < -52?
					# (exponent underflow, even with mantissa rounded up?)
	dec	eax			# eax = new biased exponent
	neg	eax			# eax = 0 - new biased exponent
	mov	ecx, eax		# ecx = 0 - new biased exponent
					#     = shift count
	mov	rax, 0x000FFFFFFFFFFFFF
	and	rdx, rax		# rdx = mantissa
	inc	rax			# rax = 0x0010000000000000
					#     = explicit integer bit
	or	rdx, rax		# rdx = 1.mantissa
					#     = significand
	xor	eax, eax
.Lcontinue:
	shrd	rax, rdx, cl		# rax = excess part of significand
	shr	rdx, cl			# rdx = significand >> -(new biased exponent)
					#     = |argument| * 2.0**exponent
.ifnotdef JCCLESS
	add	rax, rax		# rax = excess part of significand << 1,
					# CF = (excess part of significand >= 0x8000000000000000),
					# ZF = (excess part of significand = 0x8000000000000000)
	jnc	.Lcopysign		# excess part of significand < 0x8000000000000000?
	jnz	.Lround			# excess part of significand > 0x8000000000000000?
.Ltie:
	bt	edx, 0			# CF = (significand odd) ? 1 : 0
.Lround:
	adc	rdx, 0			# rdx = significand rounded to nearest even
.else
	xor	ecx, ecx
	add	rax, rax		# rax = excess part of significand << 1,
					# CF = (excess part of significand >= 0x8000000000000000),
					# ZF = (excess part of significand = 0x8000000000000000)
	adc	ecx, ecx		# ecx = (excess part of significand < 0x8000000000000000) ? 0 : 1
	neg	rax			# CF = (excess part of significand <> 0x8000000000000000)
	sbb	eax, eax		# eax = (excess part of significand = 0x8000000000000000) ? 0 : -1
	or	eax, edx
	and	eax, ecx		# rax = (excess part of significand > 0x8000000000000000)
					#     | (excess part of significand = 0x8000000000000000)
					#     & (significand odd) ? 1 : 0
	add	rdx, rax		# rdx = significand rounded to nearest even
.endif # JCCLESS
	jmp	.Lcopysign
.Ldenormal:
	bsr	rcx, rdx		# rcx = index of most significant '1' bit in |argument|
	test	edi, edi
	js	.Lnegative		# exponent < 0?
	xor	ecx, 63			# ecx = number of leading '0' bits in |argument|
	sub	ecx, 12			# ecx = number of leading '0' bits in mantissa
	cmp	ecx, edi
	jb	.Lnormalize		# exponent > number of leading '0' bits in mantissa?
	mov	ecx, edi
	shl	rdx, cl			# rdx = mantissa << exponent
					#     = |argument| << exponent
					#     = |argument| * 2.0**exponent
	jmp	.Lcopysign
.Lnegative:
.if 0
	add	ecx, edi		# ecx = index of most significant '1' bit in mantissa
					#     + exponent
					#     = new index of most significant '1' bit in mantissa
	inc	ecx			# ecx = new index of most significant '1' bit in significand
.else
	stc
	adc	ecx, edi		# ecx = new index of most significant '1' bit in significand
.endif
	js	.Lunderflow		# mantissa underflow, even with mantissa rounded up?
	neg	edi
	mov	ecx, edi		# ecx = -exponent
#	shrd	rax, rdx, cl
#	shr	rdx, cl			# rdx = mantissa >> exponent
					#     = |argument| >> exponent
					#     = |argument| * 2.0**exponent
	jmp	.Lcontinue
.Lnormalize:
	inc	ecx			# ecx = number of leading '0' bits in mantissa + 1
					#     = number of leading '0' bits in significand
	sub	edi, ecx		# edi = exponent
					#     - number of leading '0' bits in significand
					#     = new (biased) exponent
	cmp	edi, BIAS * 2
	jge	.Loverflow		# new biased exponent > 2046?
					# (exponent overflow?)
	shl	rdi, 52
	shl	rdx, cl			# rdx = significand << number of leading '0' bits in significand
					#     = |argument| << number of leading '0' bits in significand
	add	rdx, rdi		# rdx = |argument| * 2.0**exponent
	jmp	.Lcopysign
	ret
.size	ldexp, .-ldexp
.type	ldexp, @function
.global	ldexp
.end
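The listing above rounds a denormal result to nearest with ties to even after shifting the significand right. The core of that step can be sketched on plain integers; this is an illustration only, the name shift_right_ties_even is hypothetical, and the sketch assumes a shift count from 1 to 63.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: shifts `significand` right
   by `count` bits (1 <= count <= 63) and rounds the result to nearest,
   breaking ties towards the even value. */
uint64_t shift_right_ties_even(uint64_t significand, unsigned count)
{
    const uint64_t half = 0x8000000000000000ULL;
    uint64_t kept = significand >> count;
    uint64_t excess = significand << (64 - count);  /* discarded bits, left-aligned */

    if (excess > half                       /* more than half: round up */
     || (excess == half && (kept & 1)))     /* exact tie: round up only if odd */
        kept += 1;
    return kept;
}
```

Shifting 3 right by one bit is the tie 1.5 and gives 2, shifting 5 is the tie 2.5 and also gives 2, and shifting 1 is the tie 0.5 and gives 0.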
        modf() Function
            modf() returns the fractional and integral parts of its argument.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
#define INFINITY (1.0 / 0.5e-323)
double fabs(double x);
double trunc(double x);
double modf(double argument, double *integer)
{
    *integer = trunc(argument);
    return fabs(argument) == INFINITY ? 1.0 / argument : argument - *integer;
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
					# rdi = address of integer part
modf:
	movq	rax, xmm0		# rax = argument
	shr	rax, 52			# rax = sign and biased exponent
	mov	ecx, BIAS * 2 + 1
	and	ecx, eax		# rcx = biased exponent
	sub	eax, ecx
	shl	rax, 52			# rax = sign of argument
	mov	[rdi], rax		# *integer = ±0.0
	sub	ecx, BIAS		# rcx = biased exponent - 1023
					#     = unbiased exponent
	js	.Lexit			# unbiased exponent < 0?
					# (no integer part?)
	cmp	ecx, 52
	jge	.Linteger		# no fractional part?
	mov	rdx, 0x000FFFFFFFFFFFFF
	shr	rdx, cl			# rdx = mask for fractional part of mantissa
	movq	rcx, xmm0		# rcx = argument
	test	rcx, rdx
	jz	.Linteger		# fractional part of mantissa = 0?
					# (argument is integer?)
.Lfraction:
	not	rdx			# rdx = mask for sign, biased exponent and integer part of mantissa
	and	rdx, rcx		# rdx = sign, biased exponent and integer part of mantissa
					#     = integer part of argument
	mov	[rdi], rdx
	movq	xmm1, rdx		# xmm1 = integer part of argument
	subsd	xmm0, xmm1		# xmm0 = argument - integer part of argument
					#      = fractional part of argument
	ret
.Linteger:
	movq	[rdi], xmm0		# *integer = argument
	movq	xmm0, rax		# xmm0 = ±0.0
					#      = fractional part of argument
.Lexit:
	ret
.size	modf, .-modf
.type	modf, @function
.global	modf
.end
        nextafter() Function
            nextafter() returns the next representable double-precision
            floating-point number after its first argument in the direction
            of its second argument.
        // Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
double nextafter(double from, double to)
{
    if (from == to)
        return to;
    if (to != to)
        return to;
    if (from != from)
        return from;
    if (from == 0.0)
        return to < 0.0 ? -0x1.0p-1074 : 0x1.0p-1074;
#if 0
    if ((from < to) && (from < 0.0)
     || (from > to) && (from > 0.0))
#elif 0
    if ((from > to) == (from > 0.0))
#else
    if ((from < to) == (from < 0.0))
#endif
        --*(unsigned long long *) &from; // from -= 1 ULP
    else
        ++*(unsigned long long *) &from; // from += 1 ULP
    return from;
}
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = from
					# xmm1 = to
nextafter:
	ucomisd	xmm1, xmm0		# CF = (from > to)
	je	.Lspecial		# from = to?
					# from = INDEFINITE?
					# to = INDEFINITE?
.Lnotequal:
	sbb	rdx, rdx		# rdx = (from > to) ? -1 : 0
	movq	rcx, xmm0		# rcx = from
	mov	rax, rcx
	add	rax, rax		# CF = (from & -0.0)
	jz	.Lzero			# from = ±0.0?
.Lnext:
	sbb	rax, rax		# rax = (from < 0.0) ? -1 : 0
	xor	rax, rdx		# rax = (from < 0.0) ^ (from > to) ? -1 : 0
					#     = (from < 0.0) & (from < to)
					#     | (from > 0.0) & (from > to) ? -1 : 0
	or	rax, 1			# rax = (from < 0.0) ^ (from > to) ? -1 : 1
					#     = (from < 0.0) = (from < to) ? -1 : 1
					#     = (from > 0.0) = (from > to) ? -1 : 1
	add	rax, rcx
	movq	xmm0, rax		# xmm0 = from ± 1 ULP
.ifdef COMPLIANT
	xorpd	xmm1, xmm1
	addsd	xmm1, xmm0
.endif
	ret
.Lzero:
	movmskpd eax, xmm1		# rax = (to & -0.0) ? 0b?1 : 0b?0
	or	eax, 2			# rax = (to & -0.0) ? 0b11 : 0b10
	ror	rax, 1			# rax = (to & -0.0) ? 0x8000000000000001 : 1
	movq	xmm0, rax		# xmm0 = (to & -0.0) ? -0x1.0p-1074 : 0x1.0p-1074
.ifdef COMPLIANT
	xorpd	xmm1, xmm1
	addsd	xmm1, xmm0
.endif
	ret
.Lspecial:
	jp	.Lindefinite		# to = INDEFINITE?
					# from = INDEFINITE?
.Lequal:
	movsd	xmm0, xmm1		# xmm0 = to
	ret
.Lindefinite:
	addsd	xmm0, xmm1		# xmm0 = INDEFINITE
	ret
.size	nextafter, .-nextafter
.type	nextafter, @function
.global	nextafter
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/h0dff77w.aspx
	.code
double	record	sign:1, exponent:11, mantissa:52
bias	equ	1 shl (width exponent - 1) - 1
nextafter proc	public			; xmm0 = from
					; xmm1 = to
	xorpd	xmm2, xmm2		; xmm2 = 0.0
	ucomisd	xmm1, xmm2		; CF = (to < 0.0)
	jp	Lto			; to = INDEFINITE?
	sbb	rax, rax		; rax = (to < 0.0) ? -1 : 0
	ucomisd	xmm0, xmm1		; CF = (from < to)
;;	jp	Lfrom			; from = INDEFINITE?
;;	je	Lto			; from = to?
	je	Lspecial		; from = to?
					; from = INDEFINITE?
Lnotequal:
	sbb	rcx, rcx		; rcx = (from < to) ? -1 : 0
	ucomisd	xmm0, xmm2		; CF = (from < 0.0)
	jz	Lzero			; from = ±0.0?
Lnext:
	movd	rdx, xmm0		; rdx = from
	sbb	rax, rax		; rax = (from < 0.0) ? -1 : 0
	xor	rax, rcx		; rax = (from < 0.0) = (from < to) ? 0 : -1
	or	rax, 1			; rax = (from < 0.0) = (from < to) ? 1 : -1
	sub	rdx, rax
	movd	xmm0, rdx		; xmm0 = from ± 1 ULP
ifdef MXCSR
	addsd	xmm2, xmm0
endif
	ret
Lzero:
	shl	rax, 63			; rax = (to < 0.0) ? 0x8000000000000000 : 0
	or	rax, 1			; rax = (to < 0.0) ? 0x8000000000000001 : 1
	movd	xmm0, rax		; xmm0 = (to < 0.0) ? -0x1.0p-1074 : 0x1.0p-1074
ifdef MXCSR
	addsd	xmm2, xmm0
endif
	ret
Lspecial:
	jp	Lfrom			; from = INDEFINITE?
Lto:
	movsd	xmm0, xmm1		; xmm0 = to
Lfrom:
	ret
nextafter endp
	end
            Note: returns signalingNaNs unchanged!
# Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
.arch	i686
.code32
.intel_syntax noprefix
.text
					# [esp+12] = to
					# [esp+4] = from
nextafter:
	fld	real8 ptr [esp+4]	# st(0) = from
	fld	real8 ptr [esp+12]	# st(0) = to,
					# st(1) = from
	fucomi	st(0), st(1)
	je	.Lspecial		# from = to?
					# from = INDEFINITE?
					# to = INDEFINITE?
	sbb	edx, edx		# edx = (to < from) ? -1 : 0
	fstp	st(0)			# st(0) = from
	fldz				# st(0) = 0.0,
					# st(1) = from
	fucomip	st(0), st(1)		# st(0) = from
	jz	.Lzero			# from = ±0.0?
	sbb	eax, eax		# eax = (from > 0.0) ? -1 : 0
	xor	eax, edx		# eax = (from > 0.0) ^ (from > to) ? -1 : 0
					#     = (from > 0.0) & (from < to)
					#     | (from < 0.0) & (from > to) ? -1 : 0
	or	eax, 1			# eax = (from > 0.0) ^ (from > to) ? -1 : 1
					#     = (from > 0.0) = (from < to) ? -1 : 1
					#     = (from < 0.0) = (from > to) ? -1 : 1
	cdq				# edx:eax = (from < 0.0) = (from > to) ? -1 : 1
	sub	[esp+4], eax
	sbb	[esp+8], edx		# from = from
					#      - (from < 0.0) = (from > to) ? -1 : 1
					#      = from'
	fld	real8 ptr [esp+4]	# st(0) = from',
					# st(1) = from
	fstp	st(1)			# st(0) = nextafter(from, to)
	ret
.Lzero:
	and	dword ptr [esp+16], 0x80000000
	mov	dword ptr [esp+12], 0x1	# to = (to & -0.0) ? 0x8000000000000001 : 1
	fld	real8 ptr [esp+12]	# st(0) = (to & -0.0) ? -0x1.0p-1074 : 0x1.0p-1074
					# st(1) = from
.Lequal:
	fstp	st(1)			# st(0) = nextafter(from, to)
	ret
.Lspecial:
	jnp	.Lequal			# from = to?
.Lindefinite:
	faddp	st(1), st(0)		# st(0) = from + to
					#       = INDEFINITE
	ret
.size	nextafter, .-nextafter
.type	nextafter, @function
.global	nextafter
.end
        rint() Function
            rint() returns the integral value nearest to its argument,
            according to the current rounding mode.
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/dn465165.aspx
; NOTE: rint() preserves -0.0, and returns -0.0 for argument in
;       [-0.5, -0.0] when rounding to nearest, or in (-1.0, -0.0] when
;       rounding toward zero or upward
	.code
double	record	sign:1, exponent:11, mantissa:52
bias	equ	1 shl (width exponent - 1) - 1
rint	proc	public			; xmm0 = argument
	movd	rax, xmm0		; rax = argument
	add	rax, rax
	jz	Lexit			; argument = ±0.0?
	shr	rax, 1 + width mantissa	; rax = biased exponent of |argument|
	cmp	eax, bias + width mantissa
	jae	Lexit			; |argument| >= 0x1.0p+52?
					; (argument = integer?)
					; argument = INDEFINITE?
	cvtsd2si rax, xmm0		; rax = llrint(argument)
	cvtsi2sd xmm1, rax		; xmm1 = rint(argument)
	xorpd	xmm2, xmm2		; xmm2 = 0.0
	subsd	xmm2, xmm0		; xmm2 = -argument
	xorpd	xmm2, xmm0		; xmm2 = -0.0
	andpd	xmm0, xmm2		; xmm0 = (argument & -0.0) ? -0.0 : +0.0
	orpd	xmm0, xmm1		; xmm0 = rint(argument)
Lexit:
	ret
rint	endp
	end
            Note: returns a signalingNaN unchanged!
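For comparison, rounding according to the current rounding mode can also be obtained without an integer conversion, by adding and subtracting 2^52 so that the floating-point unit itself rounds the fraction away. The following is a minimal sketch of that well-known technique, not the method of the listing above; it assumes IEEE 754 binary64 arithmetic evaluated in double precision (SSE2, `FLT_EVAL_METHOD == 0`), and the name `rint_sketch` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: rint() via the 2^52 trick, honouring the current
   rounding mode; assumes binary64 and evaluation in double precision. */
double rint_sketch(double x)
{
    const double two52 = 0x1.0p+52;
    volatile double t = x;      /* volatile: keep the add/sub pair intact */
    uint64_t sign, bits;

    memcpy(&sign, &x, sizeof sign);
    sign &= UINT64_C(0x8000000000000000);

    if (x != x || x >= two52 || x <= -two52)
        return x;               /* NaN, infinite or already integral */

    if (sign) {
        t -= two52;             /* the fraction is rounded away here */
        t += two52;
    } else {
        t += two52;             /* the fraction is rounded away here */
        t -= two52;
    }

    x = t;                      /* restore the sign, so that -0.3 gives -0.0 */
    memcpy(&bits, &x, sizeof bits);
    bits = (bits & ~UINT64_C(0x8000000000000000)) | sign;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

The sign is reapplied at the end because the add/subtract pair yields +0.0 (or -0.0 when rounding downward) whenever the result is zero, regardless of the sign of the argument.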
round() Function
            round() returns the integral value nearest to its argument, rounding ties
            away from 0.
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: round() returns -0.0 for argument in (-0.5, -0.0]
.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
round:
	movq	rax, xmm0		# rax = argument
	mov	rcx, rax		# rcx = argument
	add	rcx, rcx
	jz	.Lexit			# argument = ±0.0?
	shr	rcx, 53			# rcx = biased exponent of |argument|
	sub	ecx, BIAS - 1		# rcx = 1 + unbiased exponent of |argument|
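	jl	.Lzero			# |argument| < 0.5?
	jg	.Lnext			# |argument| >= 1.0?
	movmskpd eax, xmm0		# rax = (argument & -0.0) ? 0b?1 : 0b?0
	shl	eax, 11
	or	eax, 0x3FF
	shl	rax, 52			# rax = (argument & -0.0) ? 0xBFF0000000000000 : 0x3FF0000000000000
	movq	xmm0, rax		# xmm0 = (argument & -0.0) ? -1.0 : 1.0
	ret
.Lnext:
	cmp	ecx, BIAS + 1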
	jg	.Lmxcsr			# argument = ±INFINITY?
					# argument = INDEFINITE?
	sub	ecx, 53
	jge	.Lexit			# |argument| >= 0x1.0p+52?
	neg	ecx			# ecx = number of bits in fractional part of mantissa
	shr	rax, cl			# CF = (fraction >= 0.5)
	sbb	edx, edx		# edx = (fraction >= 0.5) ? -1 : 0
	shl	rax, cl			# rax = trunc(argument)
	movq	xmm1, rax		# xmm1 = trunc(argument)
	movmskpd eax, xmm0		# rax = (argument & -0.0) ? 0b?1 : 0b?0
	shr	edx, 22			# edx = (fraction >= 0.5) ? 0x3FF : 0
	shl	eax, 11
	or	eax, edx
	shl	rax, 52			# rax = (fraction >= 0.5) ? 0x3FF0000000000000 : 0
					#     | (argument & -0.0) ? 0x8000000000000000 : 0
	movq	xmm0, rax		# xmm0 = {-1.0, 0.0, 1.0}
	addsd	xmm0, xmm1		# xmm0 = round(argument)
	ret
.Lzero:
	cqo				# rdx = (argument & -0.0) ? -1 : 0
	shl	rdx, 63			# rdx = (argument & -0.0) ? 0x8000000000000000 : 0
	movq	xmm0, rdx		# xmm0 = (argument & -0.0) ? -0.0 : 0.0
	ret
.Lmxcsr:
	addsd	xmm0, xmm0
.Lexit:
	ret
.size	round, .-round
.type	round, @function
.global	round
.end
        trunc() Function
            trunc() returns the integral value nearest to, but not greater in
            magnitude than, its argument.
        # Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
# NOTE: trunc() returns -0.0 for argument in (-1.0, -0.0]
.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
trunc:
	movq	rax, xmm0		# rax = argument
	mov	rcx, rax		# rcx = argument
	add	rcx, rcx
	jz	.Lexit			# argument = ±0.0?
	shr	rcx, 53			# rcx = biased exponent of |argument|
	sub	ecx, BIAS		# rcx = unbiased exponent of |argument|
	jl	.Lzero			# |argument| < 1.0?
	cmp	ecx, BIAS
	jg	.Lmxcsr			# argument = ±INFINITY?
					# argument = INDEFINITE?
	sub	ecx, 52
	jge	.Lexit			# |argument| >= 0x1.0p+52?
	neg	ecx			# ecx = number of bits in fractional part of mantissa
	shr	rax, cl
	shl	rax, cl			# rax = trunc(argument)
	movq	xmm0, rax		# xmm0 = trunc(argument)
	ret
.Lzero:
	cqo				# rdx = (argument & -0.0) ? -1 : 0
	shl	rdx, 63			# rdx = (argument & -0.0) ? 0x8000000000000000 : 0
	movq	xmm0, rdx		# xmm0 = (argument & -0.0) ? -0.0 : 0.0
	ret
.Lmxcsr:
	addsd	xmm0, xmm0
.Lexit:
	ret
.size	trunc, .-trunc
.type	trunc, @function
.global	trunc
.end
        ; Copyright © 2004-2025, Stefan Kanthak <stefan.kanthak@nexgo.de>
; https://msdn.microsoft.com/en-us/library/mt720727.aspx
; NOTE: trunc() returns -0.0 for argument in (-1.0, -0.0]
	.code
double	record	sign:1, exponent:11, mantissa:52
bias	equ	1 shl (width exponent - 1) - 1
trunc	proc	public			; xmm0 = argument
	movd	rax, xmm0		; rax = argument
	add	rax, rax
	jz	Lexit			; argument = ±0.0?
	shr	rax, 1 + width mantissa	; rax = biased exponent of |argument|
	cmp	eax, bias + width mantissa
	jae	Lexit			; |argument| >= 0x1.0p+52?
					; (argument = integer?)
					; argument = INDEFINITE?
	cvttsd2si rax, xmm0		; rax = trunc(argument)
	cvtsi2sd xmm1, rax		; xmm1 = trunc(argument)
	xorpd	xmm2, xmm2		; xmm2 = 0.0
	subsd	xmm2, xmm0		; xmm2 = -argument
	xorpd	xmm2, xmm0		; xmm2 = -0.0
	andpd	xmm0, xmm2		; xmm0 = (argument & -0.0) ? -0.0 : +0.0
	orpd	xmm0, xmm1		; xmm0 = trunc(argument)
Lexit:
	ret
trunc	endp
	end
            Note: returns a signalingNaN unchanged!
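The shift pair of the first listing (shift the fractional bits out, then back in) amounts to masking the fractional mantissa bits. A minimal C sketch of this idea, assuming IEEE 754 binary64; the name `trunc_sketch` is hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: truncate by clearing the fractional mantissa bits. */
double trunc_sketch(double x)
{
    uint64_t bits;
    int exponent;

    memcpy(&bits, &x, sizeof bits);
    exponent = (int)((bits >> 52) & 0x7FF) - 1023;  /* unbiased exponent */

    if (exponent >= 52)
        return x;                       /* integral, infinite or NaN */

    if (exponent < 0)
        bits &= UINT64_C(0x8000000000000000);   /* |x| < 1.0: keep only the sign */
    else
        bits &= ~(UINT64_C(0x000FFFFFFFFFFFFF) >> exponent);

    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Since only mantissa bits are cleared, the sign survives and arguments in (-1.0, -0.0] give -0.0.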
Use the X.509 certificate to send S/MIME encrypted mail.
Note: email in weird format and without a proper sender name is likely to be discarded!
I dislike HTML (and even weirder formats) in email; I prefer to receive plain text.
I also expect to see your full (real) name as sender, not your nickname.
I abhor top posts and expect inline quotes in replies.
as is without any warranty, neither express nor implied.
cookies in the web browser.
The web service is operated and provided by Telekom Deutschland GmbH.
The web service provider stores a session cookie in the web browser and records every visit of this web site with the following data in an access log on their server(s):