Me, myself & IT

Accurate and Fast Double-Precision Floating-Point Elementary Functions

Purpose
Introduction
Demonstration
Declaration
Exponential Functions
exp2() Base-2 Exponential Function
exp2n() Function
exp10() Base-10 Exponential Function
exp() Base-e Exponential Function
expm1() Base-e Exponential Function
Logarithm Functions
log() Base-e alias Natural Logarithm Function
log1p() Base-e alias Natural Logarithm Function
log10() Base-10 alias Common Logarithm Function
log2() Base-2 alias Binary Logarithm Function
logb() Function
ilogb() Function
Trigonometric Functions
cos() (Circular) Cosine Function
cot() (Circular) Cotangent Function
sin() (Circular) Sine Function
tan() (Circular) Tangent Function
Inverse Trigonometric Functions
acos() Arc Cosine Function
acot() Arc Cotangent Function
acot2() Arc Cotangent Function
asin() Arc Sine Function
atan() Arc Tangent Function
atan2() Arc Tangent Function
Hyperbolic Functions
cosh() Hyperbolic Cosine Function
coth() Hyperbolic Cotangent Function
sinh() Hyperbolic Sine Function
tanh() Hyperbolic Tangent Function
Inverse alias Area Hyperbolic Functions
acosh() Area Hyperbolic Cosine Function
acoth() Area Hyperbolic Cotangent Function
asinh() Area Hyperbolic Sine Function
atanh() Area Hyperbolic Tangent Function
Irregular Functions
fmax() Function
fmin() Function
hypot() Function
pow() Function
Regular Functions
cbrt() Function
ceil() Function
fabs() Function
fdim() Function
floor() Function
fma() Function
fmod() Function
fpclassify() Function
frexp() Function
isfinite() Function
isinf() Function
isnan() Function
isnormal() Function
issubnormal() Function
ldexp() Function
remainder() Function
remquo() Function
rint() Function
round() Function
roundeven() Function
signbit() Function
sqrt() Function
trunc() Function
Special (bit-twiddling) Functions
ceil() Function
copysign() Function
floor() Function
frexp() Function
ldexp() Function
modf() Function
nextafter() Function
rint() Function
round() Function
trunc() Function
Bibliography

Purpose

Present clean, lean and mean implementations of elementary mathematical plus other functions defined by the ANSI C, ISO C and POSIX standards, using IEEE 754 floating-point arithmetic.

Introduction

The IEEE 754 64-bit binary double-precision floating-point format is defined with 1-bit sign, (biased) 11-bit exponent, implied (i.e. hidden) integer bit and 52-bit fraction, providing a 53-bit significand = integer.fraction: binary64 = (−1)^{sign} × significand × 2^{exponent−1023}.

Positive zero (+0) is represented with sign = 0, exponent = 0 and fraction = 0; negative zero (−0) is represented with sign = 1, exponent = 0 and fraction = 0.

The maximum value 2^{11}−1 = 2047 of the exponent is reserved for special values:
positive infinity (+∞) is represented with sign = 0, exponent = 2047 and fraction = 0;
negative infinity (−∞) is represented with sign = 1, exponent = 2047 and fraction = 0;
NaN (not-a-number) is represented with either sign, exponent = 2047 and fraction > 0;
a quiet NaN is represented with either sign, exponent = 2047 and fraction > 2^{51}−1, i.e. the most significant bit of fraction set;
a signaling NaN is represented with either sign, exponent = 2047 and 0 < fraction < 2^{51}, i.e. the most significant bit of fraction clear and at least one other bit set.
If the exponent is 0, the implied integer bit is 0, else 1; a floating-point number is normalized if its integer bit is 1, i.e. if its exponent is greater than 0, else subnormal alias denormal.

The fraction of a non-zero finite floating-point number is a rational number from the set {0, ½, ¼, ¾, …, 1/2^{52}, …, (2^{52}−1)/2^{52}}; a fraction of 0 occurs only in normalized numbers, i.e. the powers of 2.

The significand = integer.fraction of a non-zero finite floating-point number is in the interval [2^{−52}, 2−2^{−52}], decimal [0.0000000000000002220446049250313080847263336181640625, 1.9999999999999997779553950749686919152736663818359375]; a normalized significand is in the interval [1, 2−2^{−52}].
Representable (non-zero finite) floating-point numbers, also called machine numbers, are in the non-contiguous intervals [2^{−1074}, (2−2^{−52}) × 2^{1023}] and [−2^{−1074}, −(2−2^{−52}) × 2^{1023}]; the full (normalized) 53-bit significand is available on the intervals [2^{−1022}, (2−2^{−52}) × 2^{1023}] and [−2^{−1022}, −(2−2^{−52}) × 2^{1023}].

The largest representable floating-point number, (2−2^{−52}) × 2^{1023} = 0.179769… × 10^{309}, has 309 (integral) decimal digits: 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368
The smallest representable floating-point number, 2−1074 = 0.494065… × 10−323, has 1074 fractional decimal digits, 323 zeroes followed by 751 more digits: 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004940656458412465441765687928682213723650598026143247644255856825006755072702087518652998363616359923797965646954457177309266567103559397963987747960107818781263007131903114045278458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566867681870395603106249319452715914924553293054565444011274801297099995419319894090804165633245247571478690147267801593552386115501348035264934720193790268107107491703332226844753335720832431936092382893458368060106011506169809753078342277318329247904982524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993664809941164205702637090279242767544565229087538682506419718265533447265625
The largest representable subnormal floating-point number, (2−2−52) × 2−1023 = 0.222507… × 10−307, has 1074 fractional decimal digits, 307 zeroes followed by 767 more digits: 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000022250738585072008890245868760858598876504231122409594654935248025624400092282356951787758888037591552642309780950434312085877387158357291821993020294379224223559819827501242041788969571311791082261043971979604000454897391938079198936081525613113376149842043271751033627391549782731594143828136275113838604094249464942286316695429105080201815926642134996606517803095075913058719846423906068637102005108723282784678843631944515866135041223479014792369585208321597621066375401613736583044193603714778355306682834535634005074073040135602968046375918583163124224521599262546494300836851861719422417646455137135420132217031370496583210154654068035397417906022589503023501937519773030945763173210852507299305089761582519159720757232455434770912461317493580281734466552734375
Note: for n > 0, the decimal representation of 2^{−n} has n fractional digits!

Basic arithmetic operations, i.e. addition, subtraction, multiplication, division, fused multiply-add, plus square root on (representable) floating-point numbers, including the special values +∞, −∞ and NaN, are performed as if their mathematically exact (infinitely precise) result were calculated first, then mapped or rounded to a representable floating-point number.
Arithmetic underflow yields a subnormal number, +0 or −0, arithmetic overflow yields +∞ or −∞, operations on NaNs as well as mathematically undefined operations yield NaN, with the notable exception that division of a non-zero finite floating-point number by ±0 yields ±∞, and non-zero finite results are rounded according to the selected rounding mode:

RN (round to nearest, ties to even)
fraction = ⌊precise × 2^{52}⌋ × 2^{−52} (towards zero) when precise × 2^{52} − ⌊precise × 2^{52}⌋ < 0.5,
fraction = ⌈precise × 2^{52}⌉ × 2^{−52} (away from zero) when precise × 2^{52} − ⌊precise × 2^{52}⌋ > 0.5,
else (the tie-break) towards zero for even ⌊precise × 2^{52}⌋ and away from zero for odd ⌊precise × 2^{52}⌋ (i.e. even ⌈precise × 2^{52}⌉);
RZ (round towards zero)
fraction = ⌊precise × 2^{52}⌋ × 2^{−52};
RU (round up, towards +∞)
fraction = ⌈precise × 2^{52}⌉ × 2^{−52} for positive numbers,
fraction = ⌊precise × 2^{52}⌋ × 2^{−52} for negative numbers;
RD (round down, towards −∞)
fraction = ⌊precise × 2^{52}⌋ × 2^{−52} for positive numbers,
fraction = ⌈precise × 2^{52}⌉ × 2^{−52} for negative numbers;
RA (round to nearest, ties away from zero)
like RN, except that the tie-break always rounds away from zero.
Note: the default rounding mode is round to nearest, ties to even!
The result of a floating-point operation is
exact if the infinitely precise mathematical result is a representable floating-point number, else inexact;
correctly rounded if it is the (according to the selected rounding mode) nearest representable floating-point number to the exact result;
faithfully rounded if it is one of the two (consecutive) representable floating-point numbers next to (i.e. surrounding) the exact result.
Note: due to the limitation of fraction to rational numbers, (operands and) results are almost always inexact!

The maximum (relative) error of a faithfully rounded result is less than 1 ULP; the maximum (relative) error of a correctly rounded result is at most ½ ULP in the round-to-nearest modes and less than 1 ULP in the directed rounding modes.

Note: in all rounding modes, a faithfully rounded result is either equal to the correctly rounded result or 1 ULP off of the correctly rounded result.

Demonstration

The program below uses the function nextafter() to test whether the following mathematical identities hold for various elementary functions and the (correctly rounded) values of some of the constants M_* defined by the ANSI C, ISO C and POSIX standards:
√−0 = −0
log_e 0 = −∞
arc cos 0 = π/2
arc cos ½ = π/3
arc sin ½ = π/6
√½ = …
arc sin 1 = π/2
arc tan 1 = π/4
e^1 = e
log_e 1 = 0
log_2 1 = 0
log_{10} 1 = 0
√2 = …
log_2 2 = 1
log_e 2 = …
e^{log_e 2} = 2
log_{10} 10 = 1
log_e 10 = …
e^{log_e 10} = 10
2^{log_2 e} = e
10^{log_{10} e} = e
log_e e = 1
log_2 e = 1/log_e 2
log_{10} e = 1/log_e 10
tan π/4 = 1
sin π/4 = √½
cos π/4 = √½
cos π/2 = 0
sin π/2 = 1
tan π/2 = ∞
cos π = −1
sin π = 0
tan π = 0
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

// * The software is provided "as is" without any warranty, neither
//   express nor implied.
// * In no event will the author be held liable for any damage(s) arising
//   from the use of the software.
// * Redistribution of the software is allowed only in unmodified form.
// * Permission is granted to use the software solely for personal private
//   and non-commercial purposes.
// * An individuals use of the software in his or her capacity or function
//   as an agent, (independent) contractor, employee, member or officer of
//   a business, corporation or organization (commercial or non-commercial)
//   does not qualify as personal private and non-commercial purpose.
// * Without written approval from the author the software must not be used
//   for a business, for commercial, corporate, governmental, military or
//   organizational purposes of any kind, or in a commercial, corporate,
//   governmental, military or organizational environment of any kind.

#define _GNU_SOURCE // assumption: glibc, where exp10() is a GNU extension
#include <math.h>
#include <stdio.h>

void evaluate(double (*function)(double), double input, double reference)
{
    double output = (function)(input);

    if (output == reference)
        printf("%.17g is correctly rounded\n", output);
    else if (nextafter(output, reference) == reference)
        printf("%.17g is faithfully rounded\n", output);
    else
        printf("%.17g is …\n", output);
}

int main(void)
{
    double last, next = 0.0;

    printf("sqrt(-0): ");
    evaluate(sqrt, -0.0, -0.0);
#ifdef INFINITY
    printf("log(0): ");
    evaluate(log, 0.0, -INFINITY);
#endif
#ifdef M_PI
    printf("acos(0): ");
    evaluate(acos, 0.0, M_PI / 2.0);
    printf("acos(0.5): ");
    evaluate(acos, 0.5, M_PI / 3.0);
    printf("asin(0.5): ");
    evaluate(asin, 0.5, M_PI / 6.0);
#endif
#ifdef M_SQRT1_2
    printf("sqrt(0.5): ");
    evaluate(sqrt, 0.5, M_SQRT1_2);
#endif
#ifdef M_PI_2
    printf("asin(1): ");
    evaluate(asin, 1.0, M_PI_2);
#endif
#ifdef M_PI_4
    printf("atan(1): ");
    evaluate(atan, 1.0, M_PI_4);
#endif
#ifdef M_E
    printf("exp(1): ");
    evaluate(exp, 1.0, M_E);
#endif
    printf("log(1): ");
    evaluate(log, 1.0, 0.0);
    printf("log2(1): ");
    evaluate(log2, 1.0, 0.0);
    printf("log10(1): ");
    evaluate(log10, 1.0, 0.0);
#ifdef M_SQRT2
    printf("sqrt(2): ");
    evaluate(sqrt, 2.0, M_SQRT2);
#endif
    printf("log2(2): ");
    evaluate(log2, 2.0, 1.0);
#ifdef M_LN2
    printf("log(2): ");
    evaluate(log, 2.0, M_LN2);
    printf("exp(%.17g): ", M_LN2);
    evaluate(exp, M_LN2, 2.0);
#endif
    printf("log10(10): ");
    evaluate(log10, 10.0, 1.0);
#ifdef M_LN10
    printf("log(10): ");
    evaluate(log, 10.0, M_LN10);
    printf("exp(%.17g): ", M_LN10);
    evaluate(exp, M_LN10, 10.0);
#endif
#ifdef M_LOG2E
    printf("exp2(%.17g): ", M_LOG2E);
    evaluate(exp2, M_LOG2E, M_E);
#endif
#ifdef M_LOG10E
    printf("exp10(%.17g): ", M_LOG10E);
    evaluate(exp10, M_LOG10E, M_E);
#endif
#ifdef M_E
    printf("log(%.17g): ", M_E);
// log(2.7182818284590452) = 1.0
    evaluate(log, M_E, 1.0);
#ifdef M_LOG2E
    printf("log2(%.17g): ", M_E);
// log2(2.7182818284590452) = 1.0 / log(2.0)
    evaluate(log2, M_E, M_LOG2E);
#endif
#ifdef M_LOG10E
    printf("log10(%.17g): ", M_E);
// log10(2.7182818284590452) = 1.0 / log(10.0)
    evaluate(log10, M_E, M_LOG10E);
#endif
#endif
#ifdef M_PI_4
#ifdef M_SQRT1_2
    printf("cos(%.17g): ", M_PI_4);
    evaluate(cos, M_PI_4, M_SQRT1_2);
    printf("sin(%.17g): ", M_PI_4);
    evaluate(sin, M_PI_4, M_SQRT1_2);
#endif
    printf("tan(%.17g): ", M_PI_4);
    evaluate(tan, M_PI_4, 1.0);
#endif
#ifdef M_PI_2
    printf("cos(%.17g): ", M_PI_2);
// cos(1.5707963267948966) = 6.123233995736766e-17
    evaluate(cos, M_PI_2, 0.0);
    printf("sin(%.17g): ", M_PI_2);
    evaluate(sin, M_PI_2, 1.0);
#ifdef INFINITY
    printf("tan(%.17g): ", M_PI_2);
// tan(1.5707963267948966) = 1.633123935319537e16
    evaluate(tan, M_PI_2, INFINITY);
#endif
#endif
#ifdef M_PI
    printf("cos(%.17g): ", M_PI);
    evaluate(cos, M_PI, -1.0);
    printf("sin(%.17g): ", M_PI);
// sin(3.1415926535897932) = 1.2246467991473532e-16
    evaluate(sin, M_PI, 0.0);
    printf("tan(%.17g): ", M_PI);
    evaluate(tan, M_PI, 0.0);
#endif
    do next = cos(last = next);
    while (next != last);
    printf("cos(%.17g): ", last);
// cos(0.73908513321516064) = 0.73908513321516064
    evaluate(cos, last, 0.73908513321516064);
    printf("acos(%.17g): ", last);
// acos(0.73908513321516064) = 0.73908513321516064
    evaluate(acos, last, 0.73908513321516064);
}
cc evaluate.c -lm
./a.out
sqrt(-0): -0 is correctly rounded
log(0): -inf is correctly rounded
acos(0): 1.5707963267948966 is correctly rounded
acos(0.5): 1.0471975511965979 is faithfully rounded
asin(0.5): 0.52359877559829893 is faithfully rounded
sqrt(0.5): 0.70710678118654757 is correctly rounded
asin(1): 1.5707963267948966 is correctly rounded
atan(1): 0.78539816339744828 is correctly rounded
exp(1): 2.7182818284590451 is correctly rounded
log(1): 0 is correctly rounded
log2(1): 0 is correctly rounded
log10(1): 0 is correctly rounded
sqrt(2): 1.4142135623730951 is correctly rounded
log2(2): 1 is correctly rounded
log(2): 0.69314718055994529 is correctly rounded
exp(0.69314718055994529): 2 is correctly rounded
log10(10): 1 is correctly rounded
log(10): 2.3025850929940459 is correctly rounded
exp(2.3025850929940459): 10.000000000000002 is faithfully rounded
exp2(1.4426950408889634): 2.7182818284590451 is correctly rounded
exp10(0.43429448190325182): 2.7182818284590451 is correctly rounded
log(2.7182818284590451): 1 is correctly rounded
log2(2.7182818284590451): 1.4426950408889634 is correctly rounded
log10(2.7182818284590451): 0.43429448190325182 is correctly rounded
cos(0.78539816339744828): 0.70710678118654757 is correctly rounded
sin(0.78539816339744828): 0.70710678118654746 is faithfully rounded
tan(0.78539816339744828): 0.99999999999999989 is faithfully rounded
cos(1.5707963267948966): 6.123233995736766e-17 is …
sin(1.5707963267948966): 1 is correctly rounded
tan(1.5707963267948966): 16331239353195370 is …
cos(3.1415926535897931): -1 is correctly rounded
sin(3.1415926535897931): 1.2246467991473532e-16 is …
tan(3.1415926535897931): -1.2246467991473532e-16 is …
cos(0.73908513321516067): 0.73908513321516067 is correctly rounded
acos(0.73908513321516067): 0.73908513321516056 is faithfully rounded
Note: the (correctly rounded) value of the constant M_PI = 0x1.921FB54442D18p+1 = 3.1415926535897931 alias machine π is about 0x1.1A62633145C07p−53 = 1.2246467991473532e−16 less than the exact value of π, and the (correctly rounded) value of the constant M_PI_2 = 0x1.921FB54442D18p+0 = 1.5707963267948966 is about 0x1.1A62633145C07p−54 = 6.123233995736766e−17 less than the exact value of π/2.
From the identities
cos (r + π) = −cos r = −sin (r + π/2)
and
sin (r + π) = −sin r = cos (r + π/2)
plus the expansion of the Taylor-Maclaurin Madhava-Newton series
cos r = ∑_{n≥0} (−1)^n r^{2n}/(2n)! = 1 − r^2/2! + r^4/4! − r^6/6! + …
and
sin r = ∑_{n≥0} (−1)^n r^{2n+1}/(2n+1)! = r − r^3/3! + r^5/5! − r^7/7! + …
it follows that sine and cosine of such small double-precision floating-point values are equal to these values or ±1; the value of the tangent follows from the identity tan r = sin r / cos r.
Note: there exists no non-zero finite double-precision floating-point number for which the (correctly rounded) cosine is ±0 and the (correctly rounded) tangent is ±∞!

As shown by William Kahan (nearpi.c), the double-precision floating-point number that is closest to an integral multiple of π/2 is the (integral) number 6381956970095103 × 2^{797} = 0x1.6AC5B262CA1FFp+849 = 5.319372648326541416707296656673541083813475…e+255, which is about 4.68716592425462761112…e−19 less than the exact integral multiple of π/2; the maximum value of the double-precision tangent is therefore about 2.13348538575370384368…e+18.

Declaration

// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

#define FLT_RADIX        2
#define FLT_ROUNDS       1 // round to nearest, ties to even

#define FP_ILOGB0        (-2147483647 - 1) // INT_MIN as expression of type int
#define FP_ILOGBNAN      1024

#define FP_ZERO          0
#define FP_SUBNORMAL     1
#define FP_NORMAL        2
#define FP_INFINITE      3
#define FP_NAN           4

#ifndef INFINITY
#define INFINITY         (1.0 / 0.5e-323)
#endif
#define INDEFINITE       (0.0 * INFINITY)

#define MATH_ERREXCEPT   1
#define MATH_ERRNO       0
#define math_errhandling (MATH_ERREXCEPT | MATH_ERRNO)

double acos(double argument);
double acosh(double argument);
double acot(double argument);
double acot2(double y, double x);
double acoth(double argument);
double asin(double argument);
double asinh(double argument);
double atan(double argument);
double atan2(double y, double x);
double atanh(double argument);
double ceil(double argument);
double copysign(double to, double from);
double cos(double radians);
double exp(double argument);
double expm1(double argument);
double exp10(double argument);
double exp2(double argument);
double exp2n(int exponent);
double fabs(double argument);
double fdim(double left, double right);
double floor(double argument);
double fma(double multiplicand, double multiplier, double addend);
double fmax(double left, double right);
double fmin(double left, double right);
double fmod(double dividend, double divisor);
int fpclassify(double argument);
double frexp(double argument, int *exponent);
double hypot(double left, double right);
int ilogb(double argument);
int isfinite(double argument);
int isinf(double argument);
int isnan(double argument);
int isnormal(double argument);
int issubnormal(double argument);
double ldexp(double argument, int exponent);
double log(double argument);
double log1p(double argument);
double log10(double argument);
double log2(double argument);
double logb(double argument);
double modf(double argument, double *integer);
double nextafter(double from, double to);
double remainder(double dividend, double divisor);
double remquo(double dividend, double divisor, int *quotient);
double rint(double argument);
double round(double argument);
int signbit(double argument);
double sin(double radians);
double sqrt(double radicand);
double tan(double radians);
double trunc(double argument);
Note: indicated by the value 1 of the preprocessor macro FLT_ROUNDS, the functions presented here require the default rounding mode round to nearest, ties to even!

Note: indicated by the value 0 of the preprocessor macro MATH_ERRNO, the functions presented here don’t set the (global) errno variable!

Exponential Functions

Independent of its base or radix r, exponentiation, which is occasionally called antilog, exhibits the identities r^{a+b} = r^a × r^b, r^{log_r c} = c and r^{d × log_r c} = c^d.

The exponential function can be approximated by a (minimax) polynomial on any sufficiently small interval with high accuracy, for example faithfully rounded, as shown hereafter.

exp2() Base-2 Exponential Function

The function exp2() returns the base-2 exponential of its argument.

For −1075 < x = y + z < 1024, with z = ⌊x⌋, i.e. x rounded down towards −∞, hence 0 ≤ y ≤ 1, calculation of 2^x = 2^{y+z} = 2^y × 2^z is reduced to the (polynomial) approximation of 2^y on the interval [0, 1], followed by the (trivial) multiplication with 2^z.

// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double floor(double x);
double ldexp(double x, int z);

// Faithfully rounded base-2 exponential

double exp2(double x)
{
    double z;
#ifdef OPTIONAL
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)
#define M_SQRT2    1.41421356237309505
#define M_1_SQRT2  0.70710678118654752

    if (x != x)
        return INDEFINITE;

    if (x <= -1075.0)
        return 0.0;

    if (x == -1.0)
        return 0.5;

    if (x == -0.5)
        return M_1_SQRT2;

    if (x == 0.0)
        return 1.0;

    if (x == 0.5)
        return M_SQRT2;

    if (x == 1.0)
        return 2.0;

    if (x >= 1024.0)
        return INFINITY;
#endif
    // for z = floor(x) and x' = x - z, 2**x = 2**(x' + z)
    //                                       = 2**x' * 2**z

    z = floor(x);
    x -= z;

    // for 0 <= x' <= 1.0,
    // a minimax polynomial of degree 11 approximates 2**x'
    // with relative error 3.0545878321297965e-18 < 2**-58

    return ldexp(((((((((((+6.2724342467963420e-10 * x
                           +6.5544572890888113e-9) * x
                           +1.0254457347176946e-7) * x
                           +1.3208193500307799e-6) * x
                           +1.5253190248422251e-5) * x
                           +1.5403511446514356e-4) * x
                           +1.3333558661574856e-3) * x
                           +9.6181290987926433e-3) * x
                           +5.5504108665711137e-2) * x
                           +2.4022650695905471e-1) * x
                           +6.9314718055994623e-1) * x
                           +1.0, (int) z);
}
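Note: overflow and underflow are handled by the ldexp() alias scalbn() function! The conversion (int) z is however only defined while z fits an int, so without the OPTIONAL checks the caller must not pass NaN or arguments of magnitude 2^{31} or greater.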

For −1075 < x = y + z < 1024, with z = ⌊x+½⌋ for x > 0 and z = ⌈x−½⌉ for x < 0, i.e. x rounded to the nearest (even) integral number, hence −½ ≤ y ≤ ½, calculation of 2^x = 2^{y+z} = 2^y × 2^z is reduced to the (polynomial) approximation of 2^y on the interval [−½, ½], followed by the (trivial) multiplication with 2^z.

# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# Faithfully rounded base-2 exponential

# CAVEAT: requires default (round to nearest, ties to even) rounding mode!

# exp2(-INFINITY) = 0
# exp2(0)         = 1
# exp2(1)         = 2
# exp2(INFINITY)  = INFINITY
# exp2(x)         = 2**x
#                 = 2**(x - z) * 2**z, -1075 < z = rint(x) < 1024
# exp2(-x)        = 1 / exp2(x)
#                 = 1 / 2**x
#                 = (1 / 2)**x

# IEEE 754 double-precision binary floating-point format:
# - 1-bit sign,
# - 11-bit characteristic is 1023 + exponent,
# - 53-bit significand is 0.fraction if 0 = characteristic,
#                         1.fraction if 0 < characteristic < 2047,
#                         1.anything if     characteristic = 2047,
# - integer bit of significand is implied and not stored
#
# binary64 = (-1)**sign * significand * 2**(characteristic - 1023)

.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
exp2:
	xorpd	xmm1, xmm1		# xmm1 = 0.0
	comisd	xmm1, xmm0
	jz	.Lspecial		# argument = ±0.0?
					# argument = INDEFINITE?
.ifdef SSE4_1
	roundsd	xmm1, xmm0, 0		# xmm1 = argument rounded to nearest (even) integer
	cvtsd2si eax, xmm1		# eax = lrint(argument)
.else
	cvtsd2si eax, xmm0		# eax = lrint(argument)
.endif
#	neg	eax
#	jo	.Lrange			# argument > maximum 32-bit integer?
#					# argument < minimum 32-bit integer?
#	neg	eax
	cmp	eax, 1 - 52 - BIAS
	jl	.Lunderflow		# argument < -1074.0?
					# argument < minimum 32-bit integer?
					# argument > maximum 32-bit integer?
	cmp	eax, BIAS
	jg	.Loverflow		# argument > 1023.0?

	cvtsi2sd xmm1, eax		# xmm1 = rint(argument)
					#      = log2(scale factor)
	subsd	xmm0, xmm1		# xmm0 = argument - rint(argument)
					#      = argument' in [-0.5, 0.5]
.Lhorner:
	mov	rcx, 0x3DFE7AA0E43A8B3C
	movq	xmm1, rcx		# xmm1 = 0x1.E7AA0E43A8B3Cp-32
					#      = 4.435280790456428e-10
	mulsd	xmm1, xmm0
	mov	rdx, 0x3E3E620FB7BAEC69
	movq	xmm2, rdx		# xmm2 = 0x1.E620FB7BAEC69p-28
					#      = 7.074105630863329e-9
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x3E7B526788BF2851
	movq	xmm1, rcx		# xmm1 = 0x1.B526788BF2851p-24
					#      = 1.0178198034320939e-7
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3EB62BFC3C1C57DD
	movq	xmm2, rdx		# xmm2 = 0x1.62BFC3C1C57DDp-20
					#      = 1.3215433089567188e-6
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x3EEFFCBFBA7B8470
	movq	xmm1, rcx		# xmm1 = 0x1.FFCBFBA7B847p-17
					#      = 1.5252733489958518e-5
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3F243091310BF6C4
	movq	xmm2, rdx		# xmm2 = 0x1.43091310BF6C4p-13
					#      = 1.5403530462514668e-4
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x3F55D87FE78CF26E
	movq	xmm1, rcx		# xmm1 = 0x1.5D87FE78CF26Ep-10
					#      = 1.3333558146789953e-3
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3F83B2AB6FB9F413
	movq	xmm2, rdx		# xmm2 = 0x1.3B2AB6FB9F413p-7
					#      = 9.618129107588335e-3
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x3FAC6B08D7049FD0
	movq	xmm1, rcx		# xmm1 = 0x1.C6B08D7049FDp-5
					#      = 5.5504108664819921e-2
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3FCEBFBDFF82C5AD
	movq	xmm2, rdx		# xmm2 = 0x1.EBFBDFF82C5ADp-3
					#      = 2.4022650695910156e-1
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x3FE62E42FEFA39EF
	movq	xmm1, rcx		# xmm1 = 0x1.62E42FEFA39EFp-1
					#      = 6.9314718055994533e-1
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3FF0000000000000
	movq	xmm0, rdx		# xmm0 = 0x1.0p+0
					#      = 1.0
	addsd	xmm0, xmm1		# xmm0 = polynomial(argument')
.Lscale:
	add	eax, BIAS		# eax = biased exponent of scale factor
	jle	.Ldenormal
.Lnormal:
	shl	rax, 52
	movq	xmm1, rax		# xmm1 = 2.0**unbiased exponent
					#      = scale factor
	mulsd	xmm0, xmm1		# xmm0 = polynomial(argument')
					#      * scale factor
					#      = exp2(argument)
	ret
.Ldenormal:
	add	eax, 51			# eax = 51 + biased exponent of denormal scale factor
					#     = index of '1' bit in mantissa
	xor	edx, edx
	bts	rdx, rax		# rdx = denormal scale factor
	movq	xmm1, rdx		# xmm1 = denormal scale factor
	mulsd	xmm0, xmm1		# xmm0 = polynomial(argument')
					#      * denormal scale factor
					#      = exp2(argument)
	ret
.Lunderflow:
#	comisd	xmm1, xmm0
#	jb	.Loverflow		# argument > 0.0?
#
#	xorpd	xmm0, xmm0		# xmm0 = 0.0
#					#      = exp2(<=-1074.0)
#	ret
.Loverflow:
#	mov	rax, 0x7FF0000000000000
#	movq	xmm0, rax		# xmm0 = 0x1.0p+1024
#					#      = INFINITY
#					#      = exp2(>=1024.0)
#	ret
.Lrange:
	comisd	xmm1, xmm0
	sbb	eax, eax		# eax = (argument < 0.0) ? 0 : -1
	shr	eax, 21			# rax = (argument < 0.0) ? 0 : 0x7FF
	shl	rax, 52			# rax = (argument < 0.0) ? 0 : 0x7FF0000000000000
	movq	xmm0, rax		# xmm0 = (argument < 0.0) ? 0.0 : 0x1.0p+1024
					#      = (argument < 0.0) ? 0.0 : INFINITY
	ret
.Lspecial:
	jp	.Lexit			# argument = INDEFINITE?
.Lzero:
	mov	rax, 0x3FF0000000000000
	movq	xmm0, rax		# xmm0 = 0x1.0p+0
					#      = 1.0
					#      = exp2(±0.0)
.Lexit:
	ret

.size	exp2, .-exp2
.type	exp2, @function
.global	exp2
.end
Note: the trivial transformation of the assembler sources with directives for Unix’ or GNU’s as into assembler sources for Microsoft’s ML.EXE or ML64.EXE and vice versa is left as an exercise to the reader.
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/mt720713.aspx

; exp2(x) = 2**x

	.686
	.model	flat, C
	.code

exp2	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = exponent
if 0
	fld1				; st(0) = 1.0,
					; st(1) = exponent
	fld	st(1)			; st(0) = exponent,
					; st(1) = 1.0,
					; st(2) = exponent
	fprem				; st(0) = exponent modulo 1.0,
					; st(1) = 1.0,
					; st(2) = exponent
	f2xm1				; st(0) = 2.0**(exponent modulo 1.0) - 1.0,
					; st(1) = 1.0,
					; st(2) = exponent
	faddp	st(1), st(0)		; st(0) = 2.0**(exponent modulo 1.0),
					; st(1) = exponent
	fscale				; st(0) = 2.0**exponent,
					; st(1) = exponent
else
	fld	st(0)			; st(0) = st(1) = exponent
	frndint				; st(0) = integer(exponent),
					; st(1) = exponent
	fsub	st(1), st(0)		; st(0) = integer(exponent),
					; st(1) = fraction(exponent)
	fxch	st(1)			; st(0) = fraction(exponent),
					; st(1) = integer(exponent)
	f2xm1				; st(0) = 2.0**fraction(exponent) - 1.0,
					; st(1) = integer(exponent)
	fld1				; st(0) = 1.0,
					; st(1) = 2.0**fraction(exponent) - 1.0,
					; st(2) = integer(exponent)
	faddp	st(1), st(0)		; st(0) = 2.0**fraction(exponent),
					; st(1) = integer(exponent)
	fscale				; st(0) = 2.0**exponent,
					; st(1) = integer(exponent)
endif
	fstp	st(1)			; st(0) = 2.0**exponent
	ret

exp2	endp
	end

exp2n() Function

Note: exp2n(‹integer›) is equivalent to ldexp(1.0, ‹integer›).
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

#define INFINITY (1.0 / 0.5e-323)

double exp2n(int exponent)
{
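    union { unsigned long long ull; double d; } bits;

    if (exponent > 1023)
        return INFINITY;

    if (exponent < -1074)
        return 0.0;

    if (exponent < -1022) {
        // subnormal: a single bit in the fraction
        bits.ull = 1;
        bits.ull <<= 1074 + exponent;
    } else {
        // normal: biased exponent, fraction 0
        bits.ull = 1023 + exponent;
        bits.ull <<= 52;
    }

    // read the bit pattern back through the union instead of a casted
    // pointer, which would violate the strict aliasing rule
    return bits.d;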
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# Unix System V calling convention for AMD64 platform:
# - first 6 floating-point arguments (from left to right) are passed in
#   registers XMM0 to XMM5;
# - first 6 integer or pointer arguments (from left to right) are passed
#   in registers RDI/R7, RSI/R6, RDX/R2, RCX/R1, R8 and R9
#   (R10 is used as static chain pointer in case of nested functions);
# - surplus arguments are pushed on stack in reverse order (from right to
#   left), 8-byte aligned;
# - 128-bit integer arguments are passed as pair of 64-bit integer arguments,
#   low part before/below high part;
# - 128-bit integer result is returned in registers RAX/R0 (low part) and
#   RDX/R2 (high part);
# - 64-bit integer or pointer result is returned in register RAX/R0;
# - 32-bit integer result is returned in register EAX;
# - floating-point result is returned in register XMM0;
# - registers RBX/R3, RSP/R4, RBP/R5 and R12 to R15 must be preserved;
# - registers RAX/R0, RCX/R1, RDX/R2, RSI/R6, RDI/R7, R8, R9, R10 (in
#   case of normal functions), R11 and XMM0 to XMM15 are volatile and can
#   be clobbered;
# - stack is 16-byte aligned: callee must decrement RSP by 8+n*16 bytes
#   before calling other functions (CALL instruction pushes 8 bytes);
# - a "red zone" of 128 bytes below the stack pointer can be clobbered.

# exp2n(<-1074) = 0
# exp2n(0)      = 1
# exp2n(>1023)  = INFINITY
# exp2n(n)      = 2**n
# exp2n(-n)     = 1 / exp2n(n)
#               = 1 / 2**n
#               = (1 / 2)**n

.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# edi = exponent
exp2n:
	mov	eax, edi		# eax = exponent
	cmp	eax, BIAS
	jg	.Loverflow		# exponent > 1023?

	cmp	eax, 1 - 52 - BIAS
	jl	.Lunderflow		# exponent < -1074?

	add	eax, BIAS		# eax = biased exponent
	jg	.Lnormal		# biased exponent > 0?
.Ldenormal:
	add	eax, 51			# eax = index of '1' bit in mantissa
	xor	edi, edi
	bts	rdi, rax		# rdi = denormal 2.0**exponent
	movq	xmm0, rdi		# xmm0 = denormal 2.0**exponent
	ret
.Loverflow:
	mov	eax, 1 + 2 * BIAS
					# rax = biased exponent
					#     = 2047
.Lnormal:
	shl	rax, 52
	movq	xmm0, rax		# xmm0 = 2.0**exponent
	ret
.Lunderflow:
	xorpd	xmm0, xmm0		# xmm0 = 0.0
					#      = exp2n(<-1074)
	ret

.size	exp2n, .-exp2n
.type	exp2n, @function
.global	exp2n
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; Microsoft calling convention for AMD64 platform:
; - first 4 arguments (from left to right) are passed in registers
;   RCX/R1 or XMM0, RDX/R2 or XMM1, R8 or XMM2, and R9 or XMM3,
;   depending on their type (for floating-point arguments of
;   unprototyped or variadic functions, where argument type
;   expected by callee is unknown, both registers are used);
; - arguments larger than 8 bytes are passed by reference;
; - surplus arguments are pushed on stack in reverse order (from
;   right to left), 8-byte aligned;
; - caller allocates memory for return value larger than 8 bytes and
;   passes pointer to it as (hidden) first argument, thus shifting
;   all other arguments;
; - caller always allocates "home space" for 4 arguments on stack,
;   even when less than 4 arguments are passed, but does not need to push
;   first 4 arguments;
; - callee can spill first 4 arguments from registers to "home space";
; - callee can clobber "home space";
; - stack is 16-byte aligned: callee must decrement RSP by 8+n*16
;   bytes when it calls other functions (CALL instruction pushes 8 bytes);
; - integer or pointer result is returned in register RAX/R0;
; - floating-point result is returned in register XMM0;
; - registers RAX/R0, RCX/R1, RDX/R2, R8, R9, R10, R11 and XMM0 to
;   XMM5 are volatile and can be clobbered;
; - registers RBX/R3, RSP/R4, RBP/R5, RSI/R6, RDI/R7, R12, R13, R14,
;   R15 and XMM6 to XMM15 must be preserved.

; exp2n(<-1074) = 0
; exp2n(0)      = 1
; exp2n(>1023)  = INFINITY
; exp2n(n)      = 2**n
; exp2n(-n)     = 1 / exp2n(n)
;               = 1 / 2**n
;               = (1 / 2)**n

	.code

double	record	sign:1, exponent:11, mantissa:52

bias	equ	1 shl (width exponent - 1) - 1

exp2n	proc	public			; ecx = exponent

	mov	eax, ecx		; eax = exponent
	cmp	eax, bias
	jg	Loverflow		; exponent > 1023?

	cmp	eax, 1 - width mantissa - bias
	jl	Lunderflow		; exponent < -1074?

	add	eax, bias		; eax = biased exponent
	jg	Lnormal			; biased exponent > 0?
Ldenormal:
	add	eax, width mantissa - 1 ; eax = index of '1' bit in mantissa
	xor	ecx, ecx
	bts	rcx, rax		; rcx = denormal 2.0**exponent
	movd	xmm0, rcx		; xmm0 = denormal 2.0**exponent
	ret
Loverflow:
	mov	eax, bias * 2 + 1	; rax = biased exponent
					;     = 2047
Lnormal:
	shl	rax, width mantissa
	movd	xmm0, rax		; xmm0 = 2.0**exponent
	ret
Lunderflow:
	xorpd	xmm0, xmm0		; xmm0 = 0.0
					;      = exp2n(<-1074)
	ret

exp2n	endp
	end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; Common "cdecl" calling and naming convention for i386 platform:
; - arguments are pushed on stack in reverse order (from right to left),
;   4-byte aligned;
; - 64-bit integer arguments are passed as pair of 32-bit integer arguments,
;   low part below high part;
; - 80-bit, 64-bit or 32-bit floating-point result is returned in FPU
;   register ST0;
; - 64-bit integer result is returned in registers EAX (low part) and
;   EDX (high part);
; - 32-bit integer or pointer result is returned in register EAX;
; - registers EAX, ECX and EDX are volatile and can be clobbered;
; - registers EBX, ESP, EBP, ESI and EDI must be preserved.

; exp2n(<-1074) = 0
; exp2n(0)      = 1
; exp2n(>1023)  = INFINITY
; exp2n(n)      = 2**n
; exp2n(-n)     = 1 / exp2n(n)
;               = 1 / 2**n
;               = (1 / 2)**n

	.686
	.model	flat, C
	.code

exp2n	proc	public			; [esp+4] = argument

	fild	dword ptr [esp+4]	; st(0) = exponent
	fld1				; st(0) = 1.0,
					; st(1) = exponent
	fscale				; st(0) = 1.0 * 2.0**exponent,
					; st(1) = exponent
	fstp	st(1)			; st(0) = 2.0**exponent
	ret

exp2n	endp
	end

exp10() Base-10 Exponential Function

For $-1075 < x \times \log_2 10 < 1024$, with $z = \lfloor x \times \log_2 10 \rfloor$, i.e. $x \times \log_2 10$ rounded down towards $-\infty$, hence $0 \le y = x - z \times \log_{10} 2 \le \log_{10} 2 = 1 / \log_2 10 = 0.3010299956639812$, calculation of $10^x = 10^{y + z \times \log_{10} 2} = 10^y \times 10^{z \times \log_{10} 2} = 10^y \times 2^z$ is not as easy as calculation of $2^x$: for $z \times \log_{10} 2$ close to $x$, calculation of the difference $y = x - z \times \log_{10} 2$ suffers from subtractive cancellation, i.e. complete loss of precision!

To avoid this, the product $z \times \log_{10} 2$ must be calculated in higher precision and the subtraction performed in 2 steps, known as Cody-Waite argument reduction: $\log_{10} 2$ is split apart into a (double-double) head + tail pair, with $\mathit{tail} = \log_{10} 2 - \mathit{head}$ and the 11 least significant bits (matching the size of the exponent) of head’s fraction clear.

The product $z' = z \times \mathit{head}$ is then exact, and according to Sterbenz’ lemma the difference $y' = x - z'$ is exact too.
Subtraction of the product $z'' = z \times \mathit{tail}$ from $y'$ gives a correctly rounded $y'' = y$ for the (polynomial) approximation of $10^y$ on the interval $[0, \log_{10} 2 = 1 / \log_2 10 = 0.3010299956639812]$, followed by the (trivial) multiplication with $2^z$.

// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double floor(double x);
double ldexp(double x, int z);

// Faithfully rounded base-10 exponential

double exp10(double x)
{
    double z;
#ifdef OPTIONAL
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)
#define M_SQRT10   3.1622776601683793
#define M_1_SQRT10 0.31622776601683793

    if (x != x)
        return INDEFINITE;

    if (x < -323.60724533877978)
        return 0.0;

    if (x == -1.0)
        return 0.1;

    if (x == -0.5)
        return M_1_SQRT10;

    if (x == 0.0)
        return 1.0;

    if (x == 0.5)
        return M_SQRT10;

    if (x == 1.0)
        return 10.0;

    if (x > 308.25471555991674)
        return INFINITY;
#endif
    // for (integral) z = x * log2(10.0) = x * 3.3219280948873623
    // and x" = x - z * log10(2.0), 10**x = 10**x" * 2**z
    //
    // for integral |z| < 2048 the double-precision product
    // z * 0x0.4D104D427DE00 = z * 0x1.34413509F7800p-2
    //                       = z * 0.30102999566395283
    // is exact and lies within a binade from x, therefore the
    // first subtraction yields an exact intermediate result x'
    //
    // subtraction of the double-precision tail product
    // z * 0x0.7FBCC47C4ACD6p-44 = z * 0x1.FEF311F12B358p-46
    //                           = z * 0.28363394551044964e-13
    // yields x" within 2**(-48-52) from x - z * log10(2.0)
    //
    // the correctly rounded x" lies within 0.5 ULP + 2**-100
    // from the exact x - z * log10(2.0)
    //
    // for 0 <= x" <= log10(2.0) = 0.3010299956639812,
    // a minimax polynomial of degree 11 approximates 10**x"
    // with relative error 3.0545878321297965e-18 < 2**-58

    z = floor(x * 3.3219280948873623);

    x -= z * 0.30102999566395283;
    x -= z * 0.28363394551044964e-13;

    return ldexp(((((((((((+3.4097977633132781e-4 * x
                           +1.0726030173640114e-3) * x
                           +5.0515508830497290e-3) * x
                           +1.9586879159041869e-2) * x
                           +6.8091402676825436e-2) * x
                           +2.0699559408492088e-1) * x
                           +5.3938295003481862e-1) * x
                           +1.1712551478362764) * x
                           +2.0346785923260857) * x
                           +2.6509490552386914) * x
                           +2.3025850929940488) * x
                           +1.0, (int) z);
}
Note: overflow and underflow are handled by the ldexp() alias scalbn() function!

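For $-1075 < x \times \log_2 10 < 1024$, with $z = \lfloor x \times \log_2 10 + \tfrac{1}{2} \rfloor$ for $x > 0$ and $z = \lceil x \times \log_2 10 - \tfrac{1}{2} \rceil$ for $x < 0$, i.e. $x \times \log_2 10$ rounded to the nearest (even) integral number, hence $-\tfrac{1}{2} \times \log_{10} 2 \le y = x - z \times \log_{10} 2 \le \tfrac{1}{2} \times \log_{10} 2$, calculation of $10^x = 10^{y + z \times \log_{10} 2} = 10^y \times 10^{z \times \log_{10} 2} = 10^y \times 2^z$ is reduced to the (polynomial) approximation of $10^y$ on the interval $[-\tfrac{1}{2} \times \log_{10} 2, \tfrac{1}{2} \times \log_{10} 2]$, followed by the (trivial) multiplication with $2^z$.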

# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# Faithfully rounded base-10 exponential

# CAVEAT: requires default (round to nearest, ties to even) rounding mode!

# exp10(-INFINITY) = 0
# exp10(0)         = 1
# exp10(1)         = 10
# exp10(INFINITY)  = INFINITY
# exp10(x)         = 10**x
#                  = 10**(x - z * log10(2)) * 2**z, -1075 < z = rint(x / log10(2)) < 1024
# exp10(-x)        = 1 / exp10(x)
#                  = 1 / 10**x
#                  = (1 / 10)**x

.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
exp10:
	xorpd	xmm1, xmm1		# xmm1 = 0.0
	comisd	xmm1, xmm0
	jz	.Lspecial		# argument = ±0.0?
					# argument = INDEFINITE?
	mov	rax, 0x400A934F0979A371
	movq	xmm2, rax		# xmm2 = 0x1.A934F0979A371p+1
					#      = 3.3219280948873623
					#      = 1.0 / log10(2.0)
					#      = log2(10.0)
	mulsd	xmm2, xmm0		# xmm2 = log2(10.0) * argument
					#      = argument / log10(2.0)
.ifdef SSE4_1
	roundsd	xmm2, xmm2, 0		# xmm2 = argument / log10(2.0) rounded to nearest (even) integer
.endif
	cvtsd2si eax, xmm2		# eax = lrint(argument / log10(2.0))
#	neg	eax
#	jo	.Lrange			# argument / log10(2.0) > maximum 32-bit integer?
#					# argument / log10(2.0) < minimum 32-bit integer?
#	neg	eax
	cmp	eax, 1 - 52 - BIAS
	jl	.Lunderflow		# argument / log10(2.0) < -1074.0?
					# argument / log10(2.0) < minimum 32-bit integer?
					# argument / log10(2.0) > maximum 32-bit integer?
	cmp	eax, BIAS
	jg	.Loverflow		# argument / log10(2.0) > 1023.0?

	cvtsi2sd xmm1, eax		# xmm1 = rint(argument / log10(2.0))
					#      = log2(scale factor)
	mov	rdx, 0x3FD34413509F7800
	movq	xmm2, rdx		# xmm2 = 0x1.34413509F7800p-2
					#      = 0.30102999566395283
					#      = log10(2.0)'
	mulsd	xmm2, xmm1
	subsd	xmm0, xmm2		# xmm0 = argument
					#      - log10(2.0)' * rint(argument / log10(2.0))
					#      = argument'
	mov	rdx, 0x3D1FEF311F12B358
	movq	xmm2, rdx		# xmm2 = 0x1.FEF311F12B358p-46
					#      = 2.8363394551044964e-14
					#      = log10(2.0) - log10(2.0)'
					#      = log10(2.0)"
	mulsd	xmm2, xmm1
	subsd	xmm0, xmm2		# xmm0 = argument'
					#      - log10(2.0)" * rint(argument / log10(2.0))
					#      = argument" in [-log10(2.0) / 2.0, log10(2.0) / 2.0]
.Lhorner:
	mov	rcx, 0x3F2F9A47809D481E
	movq	xmm1, rcx		# xmm1 = 0x1.F9A47809D481Ep-13
					#      = 2.4110911209135413e-4
	mulsd	xmm1, xmm0
	mov	rdx, 0x3F52F77F5270A2E0
	movq	xmm2, rdx		# xmm2 = 0x1.2F77F5270A2E0p-10
					#      = 1.1576407794199815e-3
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x3F74898B16300A8C
	movq	xmm1, rcx		# xmm1 = 0x1.4898B16300A8Cp-8
					#      = 5.0139840195721038e-3
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3F941165ADE1D201
	movq	xmm2, rdx		# xmm2 = 0x1.41165ADE1D201p-6
					#      = 1.9597614992067139e-2
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x3FB16E4DF62D8622
	movq	xmm1, rcx		# xmm1 = 0x1.16E4DF62D8622p-4
					#      = 6.8089363672264841e-2
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3FCA7ED70A468547
	movq	xmm2, rdx		# xmm2 = 0x1.A7ED70A468547p-3
					#      = 2.0699584962589253e-1
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x3FE1429FFD1F5001
	movq	xmm1, rcx		# xmm1 = 0x1.1429FFD1F5001p-1
					#      = 5.3938292921020555e-1
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3FF2BD7609FD42C5
	movq	xmm2, rdx		# xmm2 = 0x1.2BD7609FD42C5p+0
					#      = 1.1712551489073786
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x4000470591DE2C1B
	movq	xmm1, rcx		# xmm1 = 0x1.0470591DE2C1Bp+1
					#      = 2.0346785922934154
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x40053524C73CEA7E
	movq	xmm2, rdx		# xmm2 = 0x1.53524C73CEA7Ep+1
					#      = 2.6509490552392084
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rcx, 0x40026BB1BBB55516
	movq	xmm1, rcx		# xmm1 = 0x1.26BB1BBB55516p+1
					#      = 2.3025850929940458
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3FF0000000000000
	movq	xmm0, rdx		# xmm0 = 0x1.0p+0
					#      = 1.0
	addsd	xmm0, xmm1		# xmm0 = polynomial(argument")
.Lscale:
	add	eax, BIAS		# eax = biased exponent of scale factor
	jle	.Ldenormal
.Lnormal:
	shl	rax, 52
	movq	xmm1, rax		# xmm1 = 2.0**unbiased exponent
					#      = scale factor
	mulsd	xmm0, xmm1		# xmm0 = polynomial(argument")
					#      * scale factor
					#      = exp10(argument)
	ret
.Ldenormal:
	add	eax, 51			# eax = 51 + biased exponent of denormal scale factor
					#     = index of '1' bit in mantissa
	xor	edx, edx
	bts	rdx, rax		# rdx = denormal scale factor
	movq	xmm1, rdx		# xmm1 = denormal scale factor
	mulsd	xmm0, xmm1		# xmm0 = polynomial(argument")
					#      * denormal scale factor
					#      = exp10(argument)
	ret
.Lunderflow:
#	comisd	xmm1, xmm0
#	jb	.Loverflow		# argument > 0.0?
#
#	xorpd	xmm0, xmm0		# xmm0 = 0.0
#					#      = exp10(<-0x1.439B746E36B52p+8)
#					#      = exp10(<-323.60724533877978)
#	ret
.Loverflow:
#	mov	rax, 0x7FF0000000000000
#	movq	xmm0, rax		# xmm0 = 0x1.0p+1024
#					#      = INFINITY
#					#      = exp10(>0x1.34413509F79FFp+8)
#					#      = exp10(>308.25471555991674)
#	ret
.Lrange:
	comisd	xmm1, xmm0
	sbb	eax, eax		# eax = (argument < 0.0) ? 0 : -1
	shr	eax, 21			# rax = (argument < 0.0) ? 0 : 0x7FF
	shl	rax, 52			# rax = (argument < 0.0) ? 0 : 0x7FF0000000000000
	movq	xmm0, rax		# xmm0 = (argument < 0.0) ? 0.0 : 0x1.0p+1024
					#      = (argument < 0.0) ? 0.0 : INFINITY
	ret
.Lspecial:
	jp	.Lexit			# argument = INDEFINITE?
.Lzero:
	mov	rax, 0x3FF0000000000000
	movq	xmm0, rax		# xmm0 = 0x1.0p+0
					#      = 1.0
					#      = exp10(±0.0)
.Lexit:
	ret

.size	exp10, .-exp10
.type	exp10, @function
.global	exp10
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; exp10(x)  = 10**x
;           = 2**(x * log2(10))
; exp10(-x) = 1 / exp10(x)

	.686
	.model	flat, C
	.code

exp10	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = exponent
	fldl2t				; st(0) = log2(10.0),
					; st(1) = exponent
	fmulp	st(1), st(0)		; st(0) = exponent * log2(10.0)
if 0
	fld1				; st(0) = 1.0,
					; st(1) = exponent * log2(10.0)
	fld	st(1)			; st(0) = exponent * log2(10.0),
					; st(1) = 1.0,
					; st(2) = exponent * log2(10.0)
	fprem				; st(0) = (exponent * log2(10.0)) modulo 1.0,
					; st(1) = 1.0,
					; st(2) = exponent * log2(10.0)
	f2xm1				; st(0) = 2.0**((exponent * log2(10.0)) modulo 1.0) - 1.0,
					; st(1) = 1.0,
					; st(2) = exponent * log2(10.0)
	faddp	st(1), st(0)		; st(0) = 2.0**((exponent * log2(10.0)) modulo 1.0),
					; st(1) = exponent * log2(10.0)
	fscale				; st(0) = 10.0**exponent,
					; st(1) = exponent * log2(10.0)
else
	fld	st(0)			; st(0) = st(1) = exponent * log2(10.0)
	frndint				; st(0) = integer(exponent * log2(10.0)),
					; st(1) = exponent * log2(10.0)
	fsub	st(1), st(0)		; st(0) = integer(exponent * log2(10.0)),
					; st(1) = fraction(exponent * log2(10.0))
	fxch	st(1)			; st(0) = fraction(exponent * log2(10.0)),
					; st(1) = integer(exponent * log2(10.0))
	f2xm1				; st(0) = 2.0**fraction(exponent * log2(10.0)) - 1.0,
					; st(1) = integer(exponent * log2(10.0))
	fld1				; st(0) = 1.0,
					; st(1) = 2.0**fraction(exponent * log2(10.0)) - 1.0,
					; st(2) = integer(exponent * log2(10.0))
	faddp	st(1), st(0)		; st(0) = 2.0**fraction(exponent * log2(10.0)),
					; st(1) = integer(exponent * log2(10.0))
	fscale				; st(0) = 10.0**exponent,
					; st(1) = integer(exponent * log2(10.0))
endif
	fstp	st(1)			; st(0) = 10.0**exponent
	ret

exp10	endp
	end

exp() Base-e Exponential Function

The function exp() returns the base-e exponential of its argument.

Calculation of the exponential function to the transcendental base $e = 2.71828182845904523536028747135266249775724709369995\ldots$, known as Euler’s number and also Napier’s constant, $e^x = e^{y + z \times \log_e 2} = e^y \times e^{z \times \log_e 2} = e^y \times 2^z$ for $-1075 < x \times \log_2 e < 1024$, with $z = \lfloor x \times \log_2 e \rfloor$, i.e. $x \times \log_2 e$ rounded down towards $-\infty$, hence $0 \le y = x - z \times \log_e 2 \le \log_e 2 = 1 / \log_2 e = 0.69314718055994531$, is more difficult than calculation of $2^x$: for $z \times \log_e 2$ close to $x$, calculation of the difference $y = x - z \times \log_e 2$ suffers from subtractive cancellation, i.e. complete loss of precision!

To avoid this, the product $z \times \log_e 2$ must be calculated in higher precision and the subtraction performed in 2 steps, known as Cody-Waite argument reduction: $\log_e 2$ is split apart into a (double-double) head + tail pair, with $\mathit{tail} = \log_e 2 - \mathit{head}$ and the 11 least significant bits (matching the size of the exponent) of head’s fraction clear.

The product $z' = z \times \mathit{head}$ is then exact, and according to Sterbenz’ lemma the difference $y' = x - z'$ is exact too.
Subtraction of the product $z'' = z \times \mathit{tail}$ from $y'$ gives a correctly rounded $y'' = y$ for the (polynomial) approximation of $e^y$ on the interval $[0, \log_e 2 = 1 / \log_2 e = 0.69314718055994531]$, followed by the (trivial) multiplication with $2^z$.
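The properties required of the head + tail pair can be inspected directly. The following is a minimal sketch, assuming an IEEE 754 binary64 `double` evaluated in double precision; `low_bits_clear()` is a hypothetical helper name, and the constants are copied from the listing of this section.

```c
#include <stdint.h>
#include <string.h>

#define LN2      0.69314718055994531        /* log(2) rounded to double precision */
#define LN2_HEAD 0x1.62E42FEFA3800p-1       /* = 0.69314718055989033 */
#define LN2_TAIL 0x1.EF35793C76730p-45      /* = 0.54979230187083712e-13 */

/* Hypothetical helper: non-zero if the count least significant bits
   of the binary64 representation of value are all clear. */
int low_bits_clear(double value, unsigned count)
{
    uint64_t bits;

    memcpy(&bits, &value, sizeof bits);
    return (bits & ((UINT64_C(1) << count) - 1)) == 0;
}
```

With these definitions, `low_bits_clear(LN2_HEAD, 11)` is expected to be non-zero while `low_bits_clear(LN2, 11)` is expected to be zero, and the rounded sum `LN2_HEAD + LN2_TAIL` is expected to compare equal to `LN2`.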

// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double floor(double x);
double ldexp(double x, int z);

// Faithfully rounded base-e exponential

double exp(double x)
{
    double z;
#ifdef OPTIONAL
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)
#define M_E        2.7182818284590452
#define M_1_E      0.36787944117144232
#define M_SQRTE    1.6487212707001281
#define M_1_SQRTE  0.60653065971263342

    if (x != x)
        return INDEFINITE;

    if (x < -745.13321910194121)
        return 0.0;

    if (x == -1.0)
        return M_1_E;

    if (x == -0.5)
        return M_1_SQRTE;

    if (x == 0.0)
        return 1.0;

    if (x == 0.5)
        return M_SQRTE;

    if (x == 1.0)
        return M_E;

    if (x > 709.78271289338400)
        return INFINITY;
#endif
    // for (integral) z = x * log2(e) = x * 1.4426950408889634
    // and x" = x - z * log(2.0), e**x = e**x" * 2**z
    //
    // for integral |z| < 2048 the double-precision product
    // z * 0x0.B17217F7D1C00 = z * 0x1.62E42FEFA3800p-1
    //                       = z * 0.69314718055989033
    // is exact and lies within a binade from x, therefore the
    // first subtraction yields an exact intermediate result x'
    //
    // subtraction of the double-precision tail product
    // z * 0x0.F79ABC9E3B398p-44 = z * 0x1.EF35793C76730p-45
    //                           = z * 0.54979230187083712e-13
    // yields x" within 2**(-50-52) from x - z * log(2.0)
    //
    // the correctly rounded x" lies within 0.5 ULP + 2**-102
    // from the exact x - z * log(2.0)
    //
    // for 0 <= x" <= log(2.0) = 0.69314718055994531,
    // a minimax polynomial of degree 11 approximates e**x"
    // with relative error 3.0545878321297965e-18 < 2**-58

    z = floor(x * 1.4426950408889634);

    x -= z * 0.69314718055989033;
    x -= z * 0.54979230187083712e-13;

    return ldexp(((((((((((+3.5347283721656128e-8 * x
                           +2.5602485412126367e-7) * x
                           +2.7764095757136529e-6) * x
                           +2.4787899938611698e-5) * x
                           +1.9841863599469418e-4) * x
                           +1.3888871805082296e-3) * x
                           +8.3333336552944127e-3) * x
                           +4.1666666628388979e-2) * x
                           +1.6666666666933781e-1) * x
                           +4.9999999999990426e-1) * x
                           +1.0000000000000013) * x
                           +1.0, (int) z);
}
Note: overflow and underflow are handled by the ldexp() alias scalbn() function!

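For $-1075 < x \times \log_2 e < 1024$, with $z = \lfloor x \times \log_2 e + \tfrac{1}{2} \rfloor$ for $x > 0$ and $z = \lceil x \times \log_2 e - \tfrac{1}{2} \rceil$ for $x < 0$, i.e. $x \times \log_2 e$ rounded to the nearest (even) integral number, hence $-\tfrac{1}{2} \times \log_e 2 \le y = x - z \times \log_e 2 \le \tfrac{1}{2} \times \log_e 2$, calculation of $e^x = e^{y + z \times \log_e 2} = e^y \times e^{z \times \log_e 2} = e^y \times 2^z$ is reduced to the (polynomial) approximation of $e^y$ on the interval $[-\tfrac{1}{2} \times \log_e 2, \tfrac{1}{2} \times \log_e 2]$, followed by the (trivial) multiplication with $2^z$.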

# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# Faithfully rounded natural exponential

# CAVEAT: requires default (round to nearest, ties to even) rounding mode!

# exp(-INFINITY) = 0
# exp(0)         = 1
# exp(1)         = e
# exp(INFINITY)  = INFINITY
# exp(x)         = e**x
#                = e**(x - z * log(2)) * 2**z, -1075 < z = rint(x / log(2)) < 1024
# exp(-x)        = 1 / exp(x)
#                = 1 / e**x
#                = (1 / e)**x

.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
exp:
	xorpd	xmm1, xmm1		# xmm1 = 0.0
	comisd	xmm1, xmm0
	jz	.Lspecial		# argument = ±0.0?
					# argument = INDEFINITE?
	mov	rax, 0x3FF71547652B82FE
	movq	xmm2, rax		# xmm2 = 0x1.71547652B82FEp+0
					#      = 1.4426950408889634
					#      = 1.0 / log(2.0)
					#      = log2(e)
	mulsd	xmm2, xmm0		# xmm2 = log2(e) * argument
					#      = argument / log(2.0)
.ifdef SSE4_1
	roundsd	xmm2, xmm2, 0		# xmm2 = argument / log(2.0) rounded to nearest (even) integer
.endif
	cvtsd2si eax, xmm2		# eax = lrint(argument / log(2.0))
#	neg	eax
#	jo	.Lrange			# argument / log(2.0) > maximum 32-bit integer?
#					# argument / log(2.0) < minimum 32-bit integer?
#	neg	eax
	cmp	eax, 1 - 52 - BIAS
	jl	.Lunderflow		# argument / log(2.0) < -1074.0?
					# argument / log(2.0) < minimum 32-bit integer?
					# argument / log(2.0) > maximum 32-bit integer?
	cmp	eax, BIAS
	jg	.Loverflow		# argument / log(2.0) > 1023.0?

	cvtsi2sd xmm1, eax		# xmm1 = rint(argument / log(2.0))
					#      = log2(scale factor)
	mov	rdx, 0x3FE62E42FEFA3800
	movq	xmm2, rdx		# xmm2 = 0x1.62E42FEFA3800p-1
					#      = 0.69314718055989033
					#      = log(2.0)'
	mulsd	xmm2, xmm1
	subsd	xmm0, xmm2		# xmm0 = argument
					#      - log(2.0)' * rint(argument / log(2.0))
					#      = argument'
	mov	rdx, 0x3D2EF35793C76730
	movq	xmm2, rdx		# xmm2 = 0x1.EF35793C76730p-45
					#      = 5.4979230187083712e-14
					#      = log(2.0) - log(2.0)'
					#      = log(2.0)"
	mulsd	xmm2, xmm1
	subsd	xmm0, xmm2		# xmm0 = argument'
					#      - log(2.0)" * rint(argument / log(2.0))
					#      = argument" in [-log(2.0) / 2.0, log(2.0) / 2.0]
.Lhorner:
	mov	rdx, 0x3E5AD661C903688B
	movq	xmm1, rdx		# xmm1 = 0x1.AD661C903688Bp-26
					#      = 2.4994304016107913e-8
	mulsd	xmm1, xmm0
	mov	rdx, 0x3E928B311C7EB84F
	movq	xmm2, rdx		# xmm2 = 0x1.28B311C7EB84Fp-22
					#      = 2.7632293297497039e-7
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rdx, 0x3EC71DF4520AAEEB
	movq	xmm1, rdx		# xmm1 = 0x1.71DF4520AAEEBp-19
					#      = 2.7557622533559223e-6
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3EFA01992D0FE736
	movq	xmm2, rdx		# xmm2 = 0x1.A01992D0FE736p-16
					#      = 2.4801486521375964e-5
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rdx, 0x3F2A01A0110572B2
	movq	xmm1, rdx		# xmm1 = 0x1.A01A0110572B2p-13
					#      = 1.9841269432676262e-4
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3F56C16C1878111C
	movq	xmm2, rdx		# xmm2 = 0x1.6C16C1878111Cp-10
					#      = 1.3888888951224038e-3
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rdx, 0x3F81111111130DD6
	movq	xmm1, rdx		# xmm1 = 0x1.1111111130DD6p-7
					#      = 8.3333333335592727e-3
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3FA555555554F370
	movq	xmm2, rdx		# xmm2 = 0x1.555555554F370p-5
					#      = 4.1666666666492767e-2
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rdx, 0x3FC55555555554A2
	movq	xmm1, rdx		# xmm1 = 0x1.55555555554A2p-3
					#      = 1.6666666666666169e-1
	addsd	xmm1, xmm2
	mulsd	xmm1, xmm0
	mov	rdx, 0x3FE0000000000010
	movq	xmm2, rdx		# xmm2 = 0x1.0000000000010p-1
					#      = 5.0000000000000177e-1
	addsd	xmm2, xmm1
	mulsd	xmm2, xmm0
	mov	rdx, 0x3FF0000000000000
	movq	xmm1, rdx		# xmm1 = 0x1.0p+0
					#      = 1.0
	addsd	xmm2, xmm1
	mulsd	xmm0, xmm2
	addsd	xmm0, xmm1		# xmm0 = polynomial(argument")
.Lscale:
	add	eax, BIAS		# eax = biased exponent of scale factor
	jle	.Ldenormal
.Lnormal:
	shl	rax, 52
	movq	xmm1, rax		# xmm1 = 2.0**unbiased exponent
					#      = scale factor
	mulsd	xmm0, xmm1		# xmm0 = polynomial(argument")
					#      * scale factor
					#      = exp(argument)
	ret
.Ldenormal:
	add	eax, 51			# eax = 51 + biased exponent of denormal scale factor
					#     = index of '1' bit in mantissa
	xor	edx, edx
	bts	rdx, rax		# rdx = denormal scale factor
	movq	xmm1, rdx		# xmm1 = denormal scale factor
	mulsd	xmm0, xmm1		# xmm0 = polynomial(argument")
					#      * denormal scale factor
					#      = exp(argument)
	ret
.Lunderflow:
#	comisd	xmm1, xmm0
#	jb	.Loverflow		# argument > 0.0?
#
#	xorpd	xmm0, xmm0		# xmm0 = 0.0
#					#      = exp(<-0x1.74385446D71C3p+9)
#					#      = exp(<-744.44007192138126)
#	ret
.Loverflow:
#	mov	rax, 0x7FF0000000000000
#	movq	xmm0, rax		# xmm0 = 0x1.0p+1024
#					#      = INFINITY
#					#      = exp(>0x1.62E42FEFA39EFp+9)
#					#      = exp(>709.78271289338400)
#	ret
.Lrange:
	comisd	xmm1, xmm0
	sbb	eax, eax		# eax = (argument < 0.0) ? 0 : -1
	shr	eax, 21			# rax = (argument < 0.0) ? 0 : 0x7FF
	shl	rax, 52			# rax = (argument < 0.0) ? 0 : 0x7FF0000000000000
	movq	xmm0, rax		# xmm0 = (argument < 0.0) ? 0.0 : 0x1.0p+1024
					#      = (argument < 0.0) ? 0.0 : INFINITY
	ret
.Lspecial:
	jp	.Lexit			# argument = INDEFINITE?
.Lzero:
	mov	rax, 0x3FF0000000000000
	movq	xmm0, rax		# xmm0 = 0x1.0p+0
					#      = 1.0
					#      = exp(±0.0)
.Lexit:
	ret

.size	exp, .-exp
.type	exp, @function
.global	exp
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/c850xxez.aspx

; exp(x)  = e**x
;         = 2**(x * log2(e))
; exp(-x) = 1 / exp(x)

	.686
	.model	flat, C
	.code

exp	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = exponent
	fldl2e				; st(0) = log2(e),
					; st(1) = exponent
	fmulp	st(1), st(0)		; st(0) = exponent * log2(e)
if 0
	fld1				; st(0) = 1.0,
					; st(1) = exponent * log2(e)
	fld	st(1)			; st(0) = exponent * log2(e),
					; st(1) = 1.0,
					; st(2) = exponent * log2(e)
	fprem				; st(0) = (exponent * log2(e)) modulo 1.0,
					; st(1) = 1.0,
					; st(2) = exponent * log2(e)
	f2xm1				; st(0) = 2.0**((exponent * log2(e)) modulo 1.0) - 1.0,
					; st(1) = 1.0,
					; st(2) = exponent * log2(e)
	faddp	st(1), st(0)		; st(0) = 2.0**((exponent * log2(e)) modulo 1.0),
					; st(1) = exponent * log2(e)
	fscale				; st(0) = e**exponent,
					; st(1) = exponent * log2(e)
else
	fld	st(0)			; st(0) = st(1) = exponent * log2(e)
	frndint				; st(0) = integer(exponent * log2(e)),
					; st(1) = exponent * log2(e)
	fsub	st(1), st(0)		; st(0) = integer(exponent * log2(e)),
					; st(1) = fraction(exponent * log2(e))
	fxch	st(1)			; st(0) = fraction(exponent * log2(e)),
					; st(1) = integer(exponent * log2(e))
	f2xm1				; st(0) = 2.0**fraction(exponent * log2(e)) - 1.0,
					; st(1) = integer(exponent * log2(e))
	fld1				; st(0) = 1.0,
					; st(1) = 2.0**fraction(exponent * log2(e)) - 1.0,
					; st(2) = integer(exponent * log2(e))
	faddp	st(1), st(0)		; st(0) = 2.0**fraction(exponent * log2(e)),
					; st(1) = integer(exponent * log2(e))
	fscale				; st(0) = e**exponent,
					; st(1) = integer(exponent * log2(e))
endif
	fstp	st(1)			; st(0) = e**exponent
	ret

exp	endp
	end

expm1() Base-e Exponential Function

The function expm1() returns the base-e exponential of its argument, decremented by one.
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/dn353645.aspx

; expm1(x)  = e**x - 1
;           = 2**(x * log2(e)) - 1
; expm1(-x) = 1 / exp(x) - 1

	.686
	.model	flat, C
	.code

expm1	proc	public			; [esp+4] = exponent

	fld	real8 ptr [esp+4]	; st(0) = exponent
	fldl2e				; st(0) = log2(e),
					; st(1) = exponent
	fmulp	st(1), st(0)		; st(0) = exponent * log2(e)
	fld1				; st(0) = 1.0,
					; st(1) = exponent * log2(e)
	fld	st(1)			; st(0) = exponent * log2(e),
					; st(1) = 1.0,
					; st(2) = exponent * log2(e)
	fabs				; st(0) = |exponent * log2(e)|,
					; st(1) = 1.0,
					; st(2) = exponent * log2(e)
	fcompp				; st(0) = exponent * log2(e)
	fstsw	ax			; ax = FPU status word

					; B  C3  TOP  C2  C1  C0  low byte
					; .  0   ...  0   .   0   ........  st(0) > st(1)
					; .  0   ...  0   .   1   ........  st(0) < st(1)
					; .  1   ...  0   .   0   ........  st(0) = st(1)
					; .  1   ...  1   .   1   ........  st(0) # st(1)

	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah

					; CF (carry flag)  = C0
					;                    C1
					; PF (parity flag) = C2
					; ZF (zero flag)   = C3
					; AF (adjust flag) = .
					; SF (sign flag)   = B(usy)

	ja	Lrange			; |exponent * log2(e)| > 1.0?
;;	jp	Lexit			; exponent = INDEFINITE?

	f2xm1				; st(0) = 2.0**(exponent * log2(e)) - 1.0
					;       = e**exponent - 1.0
	ret
Lrange:
	fld	st(0)			; st(0) = st(1) = exponent * log2(e)
	frndint				; st(0) = integer(exponent * log2(e)),
					; st(1) = exponent * log2(e)
	fsub	st(1), st(0)		; st(0) = integer(exponent * log2(e)),
					; st(1) = fraction(exponent * log2(e))
	fxch	st(1)			; st(0) = fraction(exponent * log2(e)),
					; st(1) = integer(exponent * log2(e))
	f2xm1				; st(0) = 2.0**fraction(exponent * log2(e)) - 1.0,
					; st(1) = integer(exponent * log2(e))
	fld1				; st(0) = 1.0,
					; st(1) = 2.0**fraction(exponent * log2(e)) - 1.0,
					; st(2) = integer(exponent * log2(e))
	faddp	st(1), st(0)		; st(0) = 2.0**fraction(exponent * log2(e)),
					; st(1) = integer(exponent * log2(e))
	fscale				; st(0) = e**exponent,
					; st(1) = integer(exponent * log2(e))
	fstp	st(1)			; st(0) = e**exponent
	fld1				; st(0) = 1.0,
					; st(1) = e**exponent
	fsubp	st(1), st(0)		; st(0) = e**exponent - 1.0
Lexit:
	ret

expm1	endp
	end

Logarithm Functions

Independent of its base b or radix r, the logarithm exhibits the following identities, similar to the identities of exponentiation: $\log_r(a \times b) = \log_r a + \log_r b$, $\log_r c^d = d \times \log_r c$, $\log_r r^e = e$ and $\log_r a = \log_b a / \log_b r$.

The logarithm function can be approximated by a (minimax) polynomial on any sufficiently small interval with high accuracy, for example faithfully rounded, as shown hereafter.

log() Base-e alias Natural Logarithm Function

The function log() returns the base-e alias natural logarithm of its argument.

$\log_e x = \operatorname{artanh}((x^2 - 1) / (x^2 + 1)) = 2 \times \operatorname{artanh}((x - 1) / (x + 1))$, $\log_e(1 + x) = x^1 / 1 - x^2 / 2 + x^3 / 3 - x^4 / 4 + \ldots = 2 \times \operatorname{artanh}(x / (2 + x))$, …

// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)
#define M_E        2.7182818284590452
#define M_LN2      0.69314718055994531
#define M_1_SQRT2  0.70710678118654752

double frexp(double x, int *z);

// Faithfully rounded natural logarithm

double log(double argument)
{
    double mantissa, x, y, z;
    int exponent;

    if (argument != argument)
        return INDEFINITE;

    if (argument < 0.0)
        return INDEFINITE;

    if (argument == 0.0)
        return -INFINITY;
#ifdef OPTIONAL
    if (argument == 1.0)
        return 0.0;

    if (argument == M_E)
        return 1.0;
#endif
    if (argument == INFINITY)
        return INFINITY;

    // for argument > 0,
    // log(argument) = log(2) * log2(argument)
    //
    // for argument = mantissa * 2**exponent,
    // log(argument) = log(mantissa * 2**exponent)
    //               = log(mantissa) + log(2**exponent)
    //               = log(mantissa) + log(2) * log2(2**exponent)
    //               = log(mantissa) + log(2) * log2(2) * exponent
    //               = log(mantissa) + log(2) * exponent
    //
    // for mantissa = 1,
    // log(argument) = log(2) * exponent
    //
    // for mantissa = 1 + fraction
    //              = (1 + x) / (1 - x)
    // and x = (mantissa - 1) / (mantissa + 1)
    //       = fraction / (2 + fraction)
    //       = 1 - 2 / (2 + fraction),
    // log(mantissa) = log(1 + fraction)
    //               = log((1 + x) / (1 - x))
    //               = log(1 + x) - log(1 - x)
    //
    // for x = 0,
    // log(1 + x) - log(1 - x) = log(1) - log(1)
    //                         = 0
    //
    // for -1 < x <= 1,
    // log(1 + x) = x**1/1 - x**2/2 + x**3/3 - x**4/4 + x**5/5 - ...
    //            = x - x**2/2 + x**3/3 - x**4/4 + x**5/5 - ...
    //
    // for -1 <= x < 1,
    // log(1 - x) = 0 - x**1/1 - x**2/2 - x**3/3 - x**4/4 - x**5/5 - ...
    //            = 0 - x - x**2/2 - x**3/3 - x**4/4 - x**5/5 - ...
    //            = 0 - (x + x**2/2 + x**3/3 + x**4/4 + x**5/5 + ...)
    //
    // for -1 < x < 1,
    // log(1 + x) - log(1 - x) = x - x**2/2 + x**3/3 - x**4/4 + x**5/5 - ...
    //                         + x + x**2/2 + x**3/3 + x**4/4 + x**5/5 + ...
    //                         = x * 2      + x**3/3 * 2      + x**5/5 * 2 + ...
    //                         = (x + x**3/3 + x**5/5 + ...) * 2
    //                         = x * 2 + x * (x**2/3 + x**4/5 + ...) * 2
    //                         = x * 2 + x * polynomial(x**2)

    mantissa = frexp(argument, &exponent);
#ifdef OPTIONAL
    if (mantissa == 0.5)
#if 0
        return (exponent - 1) * M_LN2;
#elif 0
        return (exponent - 1) * 0x1.EF35793C76730p-45
             + (exponent - 1) * 0x1.62E42FEFA3800p-1;
#else
        return (exponent - 1) * 0.54979230187083712e-13
             + (exponent - 1) * 0.69314718055989033;
#endif
#endif
#if 0
    // for 1/2 <= mantissa = 1 + fraction < 1,
    // -1/2 <= fraction < 0 and x = (mantissa - 1) / (mantissa + 1),
    // -1/3 <= x < 0

    x = (mantissa - 1.0) / (mantissa + 1.0);

    // for 0 < x < 1/3,
    // a minimax polynomial of degree 10 in x**2 approximates
    // (log(1 + x) - log(1 - x)) / (2 * x) with relative error
    // 1.2300066608152056e-18 ~ 2**-59.5

    y = x * x;
    y = (((((((((+0.17060062608429468 * y
                 +0.083156843071811262) * y
                 +0.12112248959493536) * y
                 +0.13300102515887726) * y
                 +0.15386635453768495) * y
                 +0.18181739787751806) * y
                 +0.22222224111772142) * y
                 +0.28571428544925157) * y
                 +0.40000000000190325) * y
                 +0.66666666666666134) * y;
#else
    //      _                                 _
    // for /2/2 <= mantissa = 1 + fraction < /2
    // and x = (mantissa - 1) / (mantissa + 1),
    // -0.29289321881345248 <= fraction < 0.41421356237309505,
    // -0.1715728752538099 <= x < 0.1715728752538099

    if (mantissa < M_1_SQRT2) {
        mantissa += mantissa;
        exponent -= 1;
    }

    x = (mantissa - 1.0) / (mantissa + 1.0);

    // for -0.1715728752538099 <= x < 0.1715728752538099,
    // a minimax polynomial of degree 7 in x**2 approximates
    // (log(1 + x) - log(1 - x)) / (2 * x) with relative error
    // 1.1354910268086278e-18 ~ 2**-59.6

    y = x * x;
    y = ((((((+0.14810529843106951 * y
              +0.15312443753011222) * y
              +0.18183635094502661) * y
              +0.22222196988240322) * y
              +0.28571428761346767) * y
              +0.39999999999298882) * y
              +0.66666666666667652) * y;
#endif
    // K. C. Ng's formula yields an error below 1 ULP:
    // for z = fraction * fraction / 2
    // and x * 2 = fraction - fraction * x
    //           = fraction - z + z * x
    //           = fraction - (z - z * x),
    // log(mantissa) = log(1 + fraction)
    //               = fraction - (fraction - polynomial(x * x)) * x
    //               = fraction - (z - (z + polynomial(x * x)) * x)

    mantissa -= 1.0;
    z = mantissa * mantissa * 0.5;
    z = mantissa - (z - (z + y) * x);

    // for integral |exponent| < 2048,
    // the double-precision product exponent * 0x1.62E42FEFA3800p-1
    // is exact; addition of the double-precision tail product
    // exponent * 0x1.EF35793C76730p-45 yields log(2.0) * exponent
    // within 2**(-50-52) from the exact product
    //
    // log(argument) = log(mantissa) + log(2.0) * exponent
    //               = log(mantissa) + exponent * 0x1.EF35793C76730p-45
    //                               + exponent * 0x1.62E42FEFA3800p-1
#if 0
    z += exponent * 0x1.EF35793C76730p-45;
    z += exponent * 0x1.62E42FEFA3800p-1;
#else
    z += exponent * 0.54979230187083712e-13;
    z += exponent * 0.69314718055989033;
#endif
    return z;
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# Faithfully rounded natural logarithm

# log(<0)       = INDEFINITE
# log(±0)       = -INFINITY
# log(1)        = 0
# log(e)        = 1
# log(INFINITY) = INFINITY
# log(1/x)      = -log(x)
# log(x)        = log(significand * 2**exponent)
#               = log(significand) + log(2) * exponent
#               = natural logarithm (to base e)

.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
log:
	movq	rax, xmm0		# rax = argument
	add	rax, rax		# rax = argument << 1
					#     = |argument| << 1
#	jz	.Lzero			# argument = ±0.0?
#	jc	.Lnegative		# argument < ±0.0?
	jbe	.Lrange			# argument <= ±0.0?
.Lpositive:
	mov	rcx, rax
	shr	rcx, 53			# rcx = biased exponent
	jz	.Ldenormal		# biased exponent = 0?
					# (argument denormal?)
	sub	ecx, BIAS		# ecx = unbiased exponent
	cmp	ecx, BIAS + 1
	je	.Lspecial		# biased exponent = 2047?
					# (argument = INDEFINITE?)
					# (argument = INFINITY?)
.Lnormal:
	shl	rax, 11			# rax = fractional part of argument << 12
.Lcontinue:
	mov	rdx, 0x6A09E667F3BCC909	# rdx = fractional part of sqrt(2.0) << 12
	cmp	rdx, rax		# CF = (sqrt(2.0) < significand of argument)
	sbb	edx, edx		# edx = (sqrt(2.0) < significand of argument) ? -1 : 0
	sub	ecx, edx		# ecx = exponent of argument
					#     + (sqrt(2.0) < significand of argument)
					#     = exponent'
	add	edx, BIAS		# rdx = (sqrt(2.0) < significand of argument) ? BIAS - 1 : BIAS
	or	rax, rdx
	ror	rax, 12			# rax = significand of argument'
	movq	xmm0, rax		# xmm0 = significand of argument' in [sqrt(0.5), sqrt(2.0)]
.Ltransform:
	mov	rax, 0x3FF0000000000000
	movq	xmm1, rax		# xmm1 = 0x1.0p+0
					#      = 1.0
	movsd	xmm2, xmm0
	subsd	xmm2, xmm1		# xmm2 = significand of argument' - 1.0
					#      = fraction of argument'
	addsd	xmm1, xmm0		# xmm1 = significand of argument' + 1.0
	movsd	xmm0, xmm2		# xmm0 = fraction of argument'
	divsd	xmm2, xmm1		# xmm2 = (significand of argument' - 1.0)
					#      / (significand of argument' + 1.0)
					#      = argument"
	movsd	xmm1, xmm2		# xmm1 = argument" in [-0.1715728752538099, 0.1715728752538099]
	mulsd	xmm2, xmm2		# xmm2 = argument"**2
.Lhorner:
	mov	rax, 0x3FC2F51D4A901906
	movq	xmm3, rax		# xmm3 = 0x1.2F51D4A901906p-3
					#      = 0.14810529843106951
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FC39994E1B48251
	movq	xmm4, rdx		# xmm4 = 0x1.39994E1B48251p-3
					#      = 0.15312443753011222
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rax, 0x3FC74669DE443505
	movq	xmm3, rax		# xmm3 = 0x1.74669DE443505p-3
					#      = 0.18183635094502661
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FCC71C4FE8C7EC6
	movq	xmm4, rdx		# xmm4 = 0x1.C71C4FE8C7EC6p-3
					#      = 0.22222196988240322
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rax, 0x3FD2492494532F9F
	movq	xmm3, rax		# xmm3 = 0x1.2492494532F9Fp-2
					#      = 0.28571428761346767
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FD999999997AC3B
	movq	xmm4, rdx		# xmm4 = 0x1.999999997AC3Bp-2
					#      = 0.39999999999298882
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rax, 0x3FE55555555555AE
	movq	xmm3, rax		# xmm3 = 0x1.55555555555AEp-1
					#      = 0.66666666666667652
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2		# xmm3 = polynomial(argument"**2)
.Llogarithm:
	mov	rdx, 0x3FE0000000000000
	movq	xmm2, rdx		# xmm2 = 0x1.0p-1
					#      = 0.5
	mulsd	xmm2, xmm0
	mulsd	xmm2, xmm0		# xmm2 = 0.5 * fraction of argument'**2
	addsd	xmm3, xmm2		# xmm3 = polynomial(argument"**2)
					#      + 0.5 * fraction of argument'**2
	mulsd	xmm3, xmm1		# xmm3 = (polynomial(argument"**2)
					#      + 0.5 * fraction of argument'**2)
					#      * argument"
	subsd	xmm2, xmm3
	subsd	xmm0, xmm2		# xmm0 = log(significand of argument')
.Lexponent:
	cvtsi2sd xmm1, ecx		# xmm1 = exponent'
	mov	rax, 0x3D2EF35793C76730
	movq	xmm3, rax		# xmm3 = 0x1.EF35793C76730p-45
					#      = 0.54979230187083712e-13
					#      = tail of log(2.0)
	mulsd	xmm3, xmm1
	addsd	xmm0, xmm3
	mov	rdx, 0x3FE62E42FEFA3800
	movq	xmm2, rdx		# xmm2 = 0x1.62E42FEFA3800p-1
					#      = 0.69314718055989033
					#      = head of log(2.0)
	mulsd	xmm2, xmm1
	addsd	xmm0, xmm2		# xmm0 = natural logarithm of argument
	ret
.Ldenormal:
	bsr	rcx, rax		# rcx = index of most significant '1' bit in argument << 1
	add	rax, rax
	xor	ecx, 63			# ecx = number of leading '0' bits in argument << 1
					#     = 11 - biased exponent
	shl	rax, cl			# rax = (fractional part of) normalized argument << 12
	neg	ecx			# ecx = biased exponent - 11
	sub	ecx, BIAS - 11		# ecx = unbiased exponent of normalized argument
	jmp	.Lcontinue
.Lrange:
	jnz	.Lnegative		# argument <> ±0.0?
.Lzero:
	mov	rax, 0xFFF0000000000000
	movq	xmm0, rax		# xmm0 = -0x1.0p+1024
					#      = -INFINITY
	ret
.Lspecial:
	shl	rax, 11
	jz	.Lexit			# argument = +INFINITY?
.Lindefinite:
.Lnegative:
	mov	rax, 0x7FF8000000000000
	movq	xmm0, rax		# xmm0 = 0x1.8p+1024
					#      = INDEFINITE
.Lexit:
	ret

.size	log, .-log
.type	log, @function
.global	log
.end
log_e(significand × 2**(exponent − 1023)) = (log_2(significand) + (exponent − 1023)) × log_e(2)
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/t63833dz.aspx

; log(x) = log(2) * log2(x)
;        = natural logarithm (to base e)

	.686
	.model	flat, C
	.code

log	proc	public			; [esp+4] = argument

	fldln2				; st(0) = ln(2.0)
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = ln(2.0)
	fyl2x				; st(0) = natural logarithm of argument
	ret

log	endp
	end

log1p() Base-e alias Natural Logarithm Function

The function log1p() returns the base-e alias natural logarithm of its argument incremented by one, i.e. log(1 + x).
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/mt720722.aspx

; log1p(x) = log(2) * log2(1 + x)
;          = natural logarithm (to base e) of (1 + x)

	.686
	.model	flat, C
	.code

log1p	proc	public			; [esp+4] = argument

	fldln2				; st(0) = ln(2.0)
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = ln(2.0)
	fld	st(0)			; st(0) = argument,
					; st(1) = argument,
					; st(2) = ln(2.0)
	fabs				; st(0) = |argument|,
					; st(1) = argument,
					; st(2) = ln(2.0)
ifdef DOUBLE
	push	3FD2BEC3r
	push	33018867r		; [esp] = 1.0 - sqrt(0.5)
					;       = 0.292893218813452482773840301888412795960903167724609375
	fcomp	real8 ptr [esp]		; st(0) = argument,
					; st(1) = ln(2.0)
	pop	eax
else
	push	3E95F61Ar		; [esp] = 1.0F - sqrtf(0.5F)
					;       = 0.292893230915069580078125
	fcomp	real4 ptr [esp]		; st(0) = argument,
					; st(1) = ln(2.0)
endif
	pop	eax
	fstsw	ax			; ax = FPU status word

					; B  C3  TOP  C2  C1  C0  low byte
					; .  0   ...  0   .   0   ........  st(0) > [esp]
					; .  0   ...  0   .   1   ........  st(0) < [esp]
					; .  1   ...  0   .   0   ........  st(0) = [esp]
					; .  1   ...  1   .   1   ........  st(0) # [esp]

	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah

					; CF (carry flag)  = C0
					;                    C1
					; PF (parity flag) = C2
					; ZF (zero flag)   = C3
					; AF (adjust flag) = .
					; SF (sign flag)   = B(usy)

	ja	Lrange			; |argument| > 1.0 - sqrt(0.5)?
;;	jp	Lexit			; |argument| = INDEFINITE?

	fyl2xp1				; st(0) = natural logarithm of (argument + 1.0)
Lexit:
	ret
Lrange:
	fld1				; st(0) = 1.0,
					; st(1) = argument,
					; st(2) = ln(2.0)
	faddp	st(1), st(0)		; st(0) = argument + 1.0,
					; st(1) = ln(2.0)
	fyl2x				; st(0) = natural logarithm of (argument + 1.0)
	ret

log1p	endp
	end

log10() Base-10 alias Common Logarithm Function

The function log10() returns the base-10 alias common logarithm of its argument.

log_10(significand × 2**(exponent − 1023)) = (log_2(significand) + (exponent − 1023)) × log_10(2)

; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/t63833dz.aspx

; log10(x) = log10(2) * log2(x)
;          = common logarithm (to base 10)

	.686
	.model	flat, C
	.code

log10	proc	public			; [esp+4] = argument

	fldlg2				; st(0) = log10(2.0)
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = log10(2.0)
	fyl2x				; st(0) = common logarithm of argument
	ret

log10	endp
	end

log2() Base-2 alias Binary Logarithm Function

The function log2() returns the base-2 alias binary logarithm of its argument.

log_2(significand × 2**(exponent − 1023)) = log_2(significand) + (exponent − 1023)

// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)

double frexp(double x, int *z);
double log(double x);

double log2(double argument)
{
    int exponent;

    if (argument != argument)
        return INDEFINITE;

    if (argument < 0.0)
        return INDEFINITE;

    if (argument == 0.0)
        return -INFINITY;

    if (argument == INFINITY)
        return INFINITY;

    argument = frexp(argument, &exponent);
#ifdef OPTIONAL
    if (argument == 0.5)
        return (double) (exponent - 1);
#endif
    return 1.4426950408889634 * log(argument) + exponent;
}
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/mt720721.aspx

; log2(x) = binary logarithm (to base 2)

	.686
	.model	flat, C
	.code

log2	proc	public			; [esp+4] = argument

	fld1				; st(0) = 1.0
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = 1.0
	fyl2x				; st(0) = binary logarithm of argument
	ret

log2	endp
	end

logb() Function

The function logb() returns the unbiased binary exponent of its argument, i.e. the base-2 logarithm of its absolute value rounded down to an integer, as a floating-point number.
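The relation to the significand and exponent pair can be sketched with the C library's `frexp()`, which also normalises denormal numbers. The name `logb_via_frexp()` is hypothetical, and the handling of the special values follows the C library's conventions, which is an assumption of this illustration.

```c
#include <math.h>

// Hypothetical helper: exponent of x as a double, derived from frexp()
double logb_via_frexp(double x)
{
    int exponent;

    if (x != x)
        return x;                       // NaN in, NaN out

    if (x == 0.0)
        return -HUGE_VAL;

    if (isinf(x))
        return HUGE_VAL;

    frexp(x, &exponent);                // |x| = m * 2**exponent, 0.5 <= m < 1

    return exponent - 1;                // |x| = 2m * 2**(exponent - 1), 1 <= 2m < 2
}
```

For example 0.3 = 0.6 × 2**-1 = 1.2 × 2**-2, so the result is -2, which equals floor(log2(0.3)).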
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)

double logb(double argument)
{
    int exponent;

    if (argument != argument)
        return INDEFINITE;

    if (argument == 0.0)
        return -INFINITY;

    if (argument < 0.0)
        argument = -argument;

    if (argument == INFINITY)
        return INFINITY;

    if (argument < 0x1.0p-1022)          // denormal: normalize first
        return logb(argument * 0x1.0p+52) - 52.0;

    exponent = *(unsigned long long *) &argument >> 52;

    return (exponent & 2047) - 1023;
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# logb(±0)         = -INFINITY
# logb(±0.5)       = -1
# logb(±1)         = 0
# logb(±2)         = 1
# logb(±INFINITY)  = INFINITY
# logb(INDEFINITE) = INDEFINITE
# logb(x)          = floor(log2(fabs(x)))

.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
logb:
	movq	rcx, xmm0		# rcx = argument
	add	rcx, rcx		# rcx = argument << 1
					#     = |argument| << 1
	jz	.Lzero			# argument = ±0.0?

	mov	rax, rcx
	shr	rax, 53			# rax = biased exponent
	jz	.Ldenormal		# biased exponent = 0?
					# (argument denormal?)
	cmp	eax, BIAS * 2 + 1
	jne	.Lnormal		# biased exponent <> 2047?
					# (argument normal?)
	shl	rcx, 11			# rcx = fractional part of argument << 12
	jnz	.Lindefinite		# argument = INDEFINITE?

.Linfinity:				# argument = ±INFINITY
	mov	rax, 0x7FF0000000000000
	movq	xmm0, rax		# xmm0 = 0x1.0p+1024
					#      = INFINITY
	ret
.Lnormal:
	sub	eax, BIAS		# eax = biased exponent - 1023
					#     = unbiased exponent of argument
	cvtsi2sd xmm0, eax		# xmm0 = unbiased exponent of argument
	ret
.Ldenormal:
	bsr	rax, rcx		# rax = index of most significant '1' bit
					#     = biased exponent + 52
	sub	eax, BIAS + 52		# eax = unbiased exponent of argument
	cvtsi2sd xmm0, eax		# xmm0 = unbiased exponent of argument
	ret
.Lzero:
	mov	rax, 0xFFF0000000000000
	movq	xmm0, rax		# xmm0 = -0x1.0p+1024
					#      = -INFINITY
	ret
.Lindefinite:
	mov	rax, 0x7FF8000000000000
	movq	xmm0, rax		# xmm0 = 0x1.8p+1024
					#      = INDEFINITE
	ret

.size	logb, .-logb
.type	logb, @function
.global	logb
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/e4x82d9s.aspx

	.686
	.model	flat, C
	.code

logb	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = argument
	fxtract				; st(0) = mantissa
					;       = argument / 2.0**exponent,
					; st(1) = exponent
	fstp	st(0)			; st(0) = exponent
	ret

logb	endp
	end

ilogb() Function

The function ilogb() returns the unbiased binary exponent of its argument, i.e. the base-2 logarithm of its absolute value rounded down to an integer, as an integer.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

int ilogb(double argument)
{
    int exponent = *(unsigned long long *) &argument >> 52 & 2047;

    if (argument == 0.0)
        return -2147483648;

    if (exponent == 0)                   // denormal: normalize first
        return ilogb(argument * 0x1.0p+52) - 52;

    return exponent - 1023;
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# ilogb(±0)         = -2**31
# ilogb(±0.5)       = -1
# ilogb(±1)         = 0
# ilogb(±2)         = +1
# ilogb(±INFINITY)  = +1024
# ilogb(INDEFINITE) = +1024
# ilogb(x)          = floor(log2(fabs(x)))

.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
ilogb:
	movq	rcx, xmm0		# rcx = argument
	add	rcx, rcx		# rcx = argument << 1
					#     = |argument| << 1
	jz	.Lzero			# argument = ±0.0?

	mov	rax, rcx
	shr	rax, 53			# rax = biased exponent
	jz	.Ldenormal		# biased exponent = 0?
					# (argument denormal?)
.Lnormal:
	sub	eax, BIAS		# eax = biased exponent - 1023
					#     = unbiased exponent of argument
	ret
.Ldenormal:
	bsr	rax, rcx		# rax = index of most significant '1' bit
					#     = biased exponent + 52
	sub	eax, BIAS + 52		# eax = unbiased exponent of argument
	ret
.Lzero:
	mov	eax, -2147483648	# eax = -2**31
	ret

.size	ilogb, .-ilogb
.type	ilogb, @function
.global	ilogb
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/mt720719.aspx

	.686
	.model	flat, C
	.code

ilogb	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = argument
	fxtract				; st(0) = mantissa
					;       = argument / 2.0**exponent,
					; st(1) = exponent
	fstp	st(0)			; st(0) = exponent
	push	eax
	fistp	dword ptr [esp]		; [esp] = exponent
	pop	eax			; eax = exponent
	ret

ilogb	endp
	end

Trigonometric Functions

machine e = 0x1.5BF0A8B145769p+1 = 2.7182818284590452 is 0x1.5355FB8AC404Ep−54 = 0.7228234458646251e−16 less than the exact value of e.
machine pi = 0x1.921FB54442D18p+1 = 3.1415926535897932 is 0x1.1A62633145C07p−53 = 1.2246467991473532e−16 less than the exact value of π:
sin(π) = 0, but sin(3.14159265358979324) = 1.2246467991473532e−16;
cos(π/2) = 0, but cos(1.5707963267948966) = 6.123233995736766e−17;
tan(π/2) = ∞, but tan(1.5707963267948966) = 16331239353195370 = 1/6.123233995736766e−17.

1/(2×π) = 0x0.28BE60DB9391054A7F09D5F47D4D377036D8A5664F10E4107F9458EAF7AEF1586DC91B8E909374B801924BBA827464873F877AC72C4A69CFBA208D7D4BAED1213A671C09AD17DF904E64758E60D4CE7D272117E2EF7E4A0EC7FE25FFF7816603FBCBC462D6829B47DB4D9FB3C9F2C26DD3D18FD9A797FA8B5D49EEB1FAF97C5ECF41CE7DE294A4BA9AFED7EC47E357421580CC11BF1EDAEA

2/π = 0x0.A2F9836E4E441529FC2757D1F534DDC0DB6295993C439041FE5163ABDEBBC561B7246E3A424DD2E006492EEA09D1921CFE1DEB1CB129A73EE88235F52EBB4484E99C7026B45F7E413991D639835339F49C845F8BBDF9283B1FF897FFDE05980FEF2F118B5A0A6D1F6D367ECF27CB09B74F463F669E5FEA2D7527BAC7EBE5F17B3D0739F78A5292EA6BFB5FB11F8D5D0856033046FC7B6BABF0CFBC209AF4361DA9E391615EE61B086599855F14A068408DFFD8804D73273106061556CA73A8C960E27BC08C6B

π = 3.14159265358979324 = 0x1.921FB54442D18p+1
remainder = 0x1'FFFFFFFFFFFFF'A61D414728C8B'C4F533 = quadrant 1, 0x0.0000000000000'59E2BEB8D7374'3B0ACD = 0.00000000000000007796343665038750893128850032303923134435791966849159349864407171054711716273732946547170286066830158233642578125 = 7.79634366503875089e−17 = 0x1.678AFAE35CDD1p−54
reduced = 1.2246467991473532e−16

6381956970095103 × 2**797 =
0x16AC5B262CA1FF × 2**797 =
0x1.6AC5B262CA1FFp+849 =
5.319372648326541416707296656673541083813475…e+255
is the binary64 number that is closest to an integral multiple of π/2.
remainder = quadrant 1, +2.983942503748065…e−19=0x1.604820E0811AA'802p−62
reduced = 4.68716592425462761112…e−19

cos(0x1.0p+120) = −0.92587902285483786730386176410741494673083320992866…

cos(2) = −0.41614683654714238699756822950076218976600077107554…

sin(22) = −0.00885130929040387592169025681577233246328920395133256644233083529808955201463…, where 22 = π × 7.002817496…

sin(1.0e+22) = −0.8522008497671888017727…

cos() (Circular) Cosine Function

The function cos() returns the (circular) cosine of its argument.
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/ff770589.aspx

	.686
	.model	flat, C
	.code

cos	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = argument
	fcos				; st(0) = cosine of argument
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jnp	Lexit			; |argument| < 2**63?
ifndef REDUCE
	fsub	st(0), st(0)		; st(0) = argument - argument
					;       = 0.0 (or INDEFINITE)
	fdiv	st(0), st(0)		; st(0) = INDEFINITE
else
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fldpi				; st(0) = pi,
					; st(1) = 1.0,
					; st(2) = argument
	fscale				; st(0) = pi * 2**1,
					; st(1) = 1.0,
					; st(2) = argument
	fstp	st(1)			; st(0) = pi * 2**1,
					; st(1) = argument
	fxch	st(1)			; st(0) = argument,
					; st(1) = pi * 2**1
Lreduce:
	fprem1				; st(0) = argument modulo (pi * 2**1)
					;       = argument',
					; st(1) = pi * 2**1
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lreduce			; |argument'| > pi?

	fstp	st(1)			; st(0) = argument'
	fcos				; st(0) = cosine of argument'
endif
Lexit:
	ret

cos	endp
	end
Caveat: although the FSCALE instruction yields 2×π in double-extended (80-bit) precision, and the FPREM1 instruction operates in double-extended (80-bit) precision too, reduction of arguments that are greater than 2**63 in magnitude to the interval (-π, π) loses almost all precision: for example 0x1.6AC5B262CA1FFp+849, the double-precision number closest to an integral multiple of π/2, is reduced to -0x1.E5C3B0F08A43A7B0p0 = -1.897517260289773073471397690781259370851330459117889404296875 instead of 4.68716592425462761112…e−19!

cot() (Circular) Cotangent Function

The function cot() returns the (circular) cotangent of its argument.
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; cot(x) = 1 / tan(x)
;        = cos(x) / sin(x)

	.686
	.model	flat, C
	.code

cot	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = argument
	fptan				; st(0) = 1.0,
					; st(1) = tangent of argument
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jnp	Ldone			; |argument| < 2**63?
ifndef REDUCE
	fsub	st(0), st(0)		; st(0) = argument - argument
					;       = 0.0 (or INDEFINITE)
	fdiv	st(0), st(0)		; st(0) = INDEFINITE
	ret
else
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fldpi				; st(0) = pi,
					; st(1) = 1.0,
					; st(2) = argument
	fscale				; st(0) = pi * 2**1,
					; st(1) = 1.0,
					; st(2) = argument
	fstp	st(1)			; st(0) = pi * 2**1,
					; st(1) = argument
	fxch	st(1)			; st(0) = argument,
					; st(1) = pi * 2**1
Lreduce:
	fprem1				; st(0) = argument modulo (pi * 2**1)
					;       = argument',
					; st(1) = pi * 2**1
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lreduce			; |argument'| > pi?

	fstp	st(1)			; st(0) = argument'
	fptan				; st(0) = 1.0,
					; st(1) = tangent of argument'
endif
Ldone:
	fdivrp	st(1), st(0)		; st(0) = 1.0 / tangent of argument
					;       = cotangent of argument
	ret

cot	endp
	end
Caveat: although the FSCALE instruction yields 2×π in double-extended (80-bit) precision, and the FPREM1 instruction operates in double-extended (80-bit) precision too, reduction of arguments that are greater than 2**63 in magnitude to the interval (-π, π) loses almost all precision: for example 0x1.6AC5B262CA1FFp+849, the double-precision number closest to an integral multiple of π/2, is reduced to -0x1.E5C3B0F08A43A7B0p0 = -1.897517260289773073471397690781259370851330459117889404296875 instead of 4.68716592425462761112…e−19!

sin() (Circular) Sine Function

The function sin() returns the (circular) sine of its argument.
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/ff770597.aspx

	.686
	.model	flat, C
	.code

sin	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = argument
	fsin				; st(0) = sine of argument
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jnp	Lexit			; |argument| < 2**63?
ifndef REDUCE
	fsub	st(0), st(0)		; st(0) = argument - argument
					;       = 0.0 (or INDEFINITE)
	fdiv	st(0), st(0)		; st(0) = INDEFINITE
else
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fldpi				; st(0) = pi,
					; st(1) = 1.0,
					; st(2) = argument
	fscale				; st(0) = pi * 2**1,
					; st(1) = 1.0,
					; st(2) = argument
	fstp	st(1)			; st(0) = pi * 2**1,
					; st(1) = argument
	fxch	st(1)			; st(0) = argument,
					; st(1) = pi * 2**1
Lreduce:
	fprem1				; st(0) = argument modulo (pi * 2**1)
					;       = argument',
					; st(1) = pi * 2**1
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lreduce			; |argument'| > pi?

	fstp	st(1)			; st(0) = argument'
	fsin				; st(0) = sine of argument'
endif
Lexit:
	ret

sin	endp
	end
Caveat: although the FSCALE instruction yields 2×π in double-extended (80-bit) precision, and the FPREM1 instruction operates in double-extended (80-bit) precision too, reduction of arguments that are greater than 2**63 in magnitude to the interval (-π, π) loses almost all precision: for example 0x1.6AC5B262CA1FFp+849, the double-precision number closest to an integral multiple of π/2, is reduced to -0x1.E5C3B0F08A43A7B0p0 = -1.897517260289773073471397690781259370851330459117889404296875 instead of 4.68716592425462761112…e−19!

tan() (Circular) Tangent Function

The function tan() returns the (circular) tangent of its argument.
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/ff770595.aspx

	.686
	.model	flat, C
	.code

tan	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = argument
	fptan				; st(0) = 1.0,
					; st(1) = tangent of argument
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jnp	Ldone			; |argument| < 2**63?
ifndef REDUCE
	fsub	st(0), st(0)		; st(0) = argument - argument
					;       = 0.0 (or INDEFINITE)
	fdiv	st(0), st(0)		; st(0) = INDEFINITE
	ret
else
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fldpi				; st(0) = pi,
					; st(1) = 1.0,
					; st(2) = argument
	fscale				; st(0) = pi * 2**1,
					; st(1) = 1.0,
					; st(2) = argument
	fstp	st(1)			; st(0) = pi * 2**1,
					; st(1) = argument
	fxch	st(1)			; st(0) = argument,
					; st(1) = pi * 2**1
Lreduce:
	fprem1				; st(0) = argument modulo (pi * 2**1)
					;       = argument',
					; st(1) = pi * 2**1
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lreduce			; |argument'| > pi?

	fstp	st(1)			; st(0) = argument'
	fptan				; st(0) = 1.0,
					; st(1) = tangent of argument'
endif
Ldone:
	fstp	st(0)			; st(0) = tangent of argument
	ret

tan	endp
	end
Caveat: although the FSCALE instruction yields 2×π in double-extended (80-bit) precision, and the FPREM1 instruction operates in double-extended (80-bit) precision too, reduction of arguments that are greater than 2**63 in magnitude to the interval (-π, π) loses almost all precision: for example 0x1.6AC5B262CA1FFp+849, the double-precision number closest to an integral multiple of π/2, is reduced to -0x1.E5C3B0F08A43A7B0p0 = -1.897517260289773073471397690781259370851330459117889404296875 instead of 4.68716592425462761112…e−19!

Inverse Trigonometric Functions

acos() Arc Cosine Function

The function acos() returns the (principal) arc cosine of its argument.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double copysign(double x, double y);
double fabs(double x);
double fma(double x, double y, double z);
int signbit(double x);
double sqrt(double x);

static inline
double asin_poly(double t)
{
    // for -0.5 <= t <= 0.5,
    // a minimax polynomial of degree 12 in t**2 approximates asin(t)

    double s = t * t;

    return (((((((((((+0x1.02FF4C7428A47p-5 * s
                      -0x1.032E75CCD4AE8p-6) * s
                      +0x1.3C0E0817E9742p-6) * s
                      +0x1.B0EF96B727E7Ep-8) * s
                      +0x1.8E3FD48D0FB6Fp-7) * s
                      +0x1.C70DDF81249FCp-7) * s
                      +0x1.1C6B5042EC6B2p-6) * s
                      +0x1.6E89F8578B64Ep-6) * s
                      +0x1.F1C72C5FD95BAp-6) * s
                      +0x1.6DB6DB407C2B3p-5) * s
                      +0x1.3333333375CD0p-4) * s
                      +0x1.55555555552F4p-3) * s * t + t;
}

double acos(double x)
{
    // for -1.0 <= x < -0.5, arccos(x) = (π/2 - asin_poly(sqrt((1 + x) / 2))) * 2
    // for -0.5 <= x <= 0.5, arccos(x) = π/2 - asin_poly(x)
    // for  0.5 <  x <= 1.0, arccos(x) = asin_poly(sqrt((1 - x) / 2)) * 2

    double z = fabs(x);
    int i = z > 0.5;
#ifdef OPTIONAL
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)

    if (x != x)
        return INDEFINITE;

    if (x == 1.0)
        return 0.0;

    if (x == 0.0)
        return 1.57079632679489662;      // π/2

    if (x == -1.0)
        return 3.14159265358979324;      // π

    if (z > 1.0)
        return INDEFINITE;
#endif
    if (i)
        z = sqrt(0.5 - 0.5 * z);

    z = asin_poly(z);
    z = copysign(z, x);

    if (i) {                             // |x| > 0.5?
        if (signbit(x)) {                // x < -0.5?
#ifdef FP_FAST_FMA
            z = fma(1.8656436928143307, 0.8419594442630920, z);
#else
            z += -0x1.5777A5CF72CECp-21; // tail of π/2
            z += 0x1.921FC00000000p-0;   // head of π/2
#endif
        }
        z += z;
    } else {
#ifdef FP_FAST_FMA
        z = fma(1.8656436928143307, 0.8419594442630920, -z);
#else
        z = -0x1.5777A5CF72CECp-21 - z;
        z += 0x1.921FC00000000p-0;
#endif
    }

    return z;
}
Note: used within the fma() function, the product 1.8656436928143307 × 1.6839188885261840 = 0x1.DD9AD336A05p+0 × 0x1.AF154EEB562D6p+0 = 0x1.921FB54442D18469898CC517p+1 (courtesy of Norbert Juffa and Tor Myklebust) provides 104 bits of π, equivalent to 31 decimal places; the code above uses 0.8419594442630920, exactly half of the second factor, and thus obtains π/2 with the same precision.
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
acos:
	mov	rax, 0x3FE0000000000000
	movq	xmm1, rax		# xmm1 = 0x1.0p-1
					#      = 0.5
	xorpd	xmm2, xmm2		# xmm2 = 0.0
	subsd	xmm2, xmm0		# xmm2 = -argument
	andpd	xmm2, xmm0		# xmm2 = |argument|
	xorpd	xmm0, xmm2		# xmm0 = (argument & -0.0) ? -0.0 : 0.0
	ucomisd	xmm1, xmm2		# CF = (0.5 < |argument|)
#	jp	.Lindefinite		# argument = INDEFINITE?

	sbb	eax, eax		# eax = (0.5 < |argument|) ? -1 : 0
	jnb	.Lhorner		# |argument| <= 0.5?
.Lbig:
	mulsd	xmm2, xmm1		# xmm2 = 0.5 * |argument|
	subsd	xmm1, xmm2		# xmm1 = 0.5 - 0.5 * |argument|
	sqrtsd	xmm2, xmm1		# xmm2 = sqrt(0.5 - 0.5 * |argument|)
					#      = argument'
.Lhorner:
	movsd	xmm1, xmm2		# xmm1 = argument'
	mulsd	xmm2, xmm2		# xmm2 = argument'**2
	mov	rcx, 0x3FA02FF4C7428A47
	movq	xmm3, rcx		# xmm3 = 0x1.02FF4C7428A47p-5
					#      = 0.031615876506539346
	mulsd	xmm3, xmm2
	mov	rdx, 0xBF9032E75CCD4AE8
	movq	xmm4, rdx		# xmm4 = -0x1.032E75CCD4AE8p-6
					#      = -0.015819182433299966
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3F93C0E0817E9742
	movq	xmm3, rcx		# xmm3 = 0x1.3C0E0817E9742p-6
					#      = 0.019290454772679107
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F7B0EF96B727E7E
	movq	xmm4, rdx		# xmm4 = 0x1.B0EF96B727E7Ep-8
					#      = 0.006606077476277171
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3F88E3FD48D0FB6F
	movq	xmm3, rcx		# xmm3 = 0x1.8E3FD48D0FB6Fp-7
					#      = 0.012153605255773773
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F8C70DDF81249FC
	movq	xmm4, rdx		# xmm4 = 0x1.C70DDF81249FCp-7
					#      = 0.013887151845016092
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3F91C6B5042EC6B2
	movq	xmm3, rcx		# xmm3 = 0x1.1C6B5042EC6B2p-6
					#      = 0.017359569912236146
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F96E89F8578B64E
	movq	xmm4, rdx		# xmm4 = 0x1.6E89F8578B64Ep-6
					#      = 0.022371761819320483
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3F9F1C72C5FD95BA
	movq	xmm3, rcx		# xmm3 = 0x1.F1C72C5FD95BAp-6
					#      = 0.030381959280381322
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FA6DB6DB407C2B3
	movq	xmm4, rdx		# xmm4 = 0x1.6DB6DB407C2B3p-5
					#      = 0.044642856813771024
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3FB3333333375CD0
	movq	xmm3, rcx		# xmm3 = 0x1.3333333375CD0p-4
					#      = 0.075000000003785816
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FC55555555552F4
	movq	xmm4, rdx		# xmm4 = 0x1.55555555552F4p-3
					#      = 0.166666666666649754
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
.if 0
	mov	rcx, 0x3FF0000000000000
	movq	xmm3, rcx		# xmm3 = 0x1.0p+0
					#      = 1.0
	addsd	xmm3, xmm4
	mulsd	xmm1, xmm3		# xmm1 = polynomial(argument')
.else
	mulsd	xmm4, xmm1
	addsd	xmm1, xmm4		# xmm1 = polynomial(argument')
.endif
	orpd	xmm0, xmm1		# xmm0 = polynomial(argument)

	test	eax, eax
	jz	.Lsmall			# |argument| <= 0.5?

	movmskpd eax, xmm0		# eax = (argument & -0.0) ? 0b?1 : 0b?0
	shr	eax, 1
	jnc	.Lpositive		# argument > 0.5?
.Lnegative:
	mov	rdx, 0x3FF921FC00000000
	movq	xmm1, rdx		# xmm1 = 0x1.921FC00000000p-0
					#      = 1.5707969665527344
					#      = head of pi/2
	addsd	xmm1, xmm0
	mov	rcx, 0xBEA5777A5CF72CEC
	movq	xmm0, rcx		# xmm0 = -0x1.5777A5CF72CECp-21
					#      = -6.3975783775576863e-7
					#      = tail of pi/2
	addsd	xmm0, xmm1		# xmm0 = pi/2 - polynomial(argument)
.Lpositive:
	addsd	xmm0, xmm0		# xmm0 = acos(argument)
	ret
.Lsmall:
	mov	rcx, 0xBEA5777A5CF72CEC
	movq	xmm1, rcx		# xmm1 = -0x1.5777A5CF72CECp-21
					#      = -6.3975783775576863e-7
					#      = tail of pi/2
	subsd	xmm1, xmm0
	mov	rdx, 0x3FF921FC00000000
	movq	xmm0, rdx		# xmm0 = 0x1.921FC00000000p-0
					#      = 1.5707969665527344
					#      = head of pi/2
	addsd	xmm0, xmm1		# xmm0 = pi/2 - polynomial(argument)
					#      = acos(argument)
	ret

.size	acos, .-acos
.type	acos, @function
.global	acos
.end
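
The head and tail constants loaded by the listing above split π/2 into a short leading part and a correction. The following sketch, with the hypothetical names PIO2_HEAD, PIO2_TAIL and half_pi_minus_split, restates the two additions of the .Lsmall path in C; it assumes plain double arithmetic without excess precision.

```c
// Head and tail of π/2 as loaded by the assembler listing
static const double PIO2_HEAD = 0x1.921FC00000000p-0;   // 1.5707969665527344
static const double PIO2_TAIL = -0x1.5777A5CF72CECp-21; // -6.3975783775576863e-7

// π/2 - z: the tail is combined with z first, then the head is added
double half_pi_minus_split(double z)
{
    double t = PIO2_TAIL - z;

    return t + PIO2_HEAD;
}
```

For z = 0.0 the two parts recombine to the double nearest to π/2, 0x1.921FB54442D18p+0.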
The following implementation for the i387 FPU uses its FPATAN instruction and the formula arccos(argument) = arctan2(argument, sqrt(1 − argument²)), where arctan2(x, y) denotes the angle of the point with abscissa x and ordinate y (the reverse of the argument order of atan2() in C), based upon the identities cos(result) = argument, sin²(result) + cos²(result) = 1 and tan(result) = sin(result) / cos(result):
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/bztkwykh.aspx

; arccos(x) = arctan(sqrt((1 + x) * (1 - x)) / x)
;           = arctan(sqrt(1 - x**2) / x)
;           = arctan2(x, sqrt(1 - x**2))
;           = arctan2(x, sqrt((1 + x) * (1 - x)))

	.686
	.model	flat, C
	.code

acos	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = argument
if 0
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fadd	st(0), st(1)		; st(0) = 1.0 + argument,
					; st(1) = argument
	fld1				; st(0) = 1.0,
					; st(1) = 1.0 + argument,
					; st(2) = argument
	fsub	st(0), st(2)		; st(0) = 1.0 - argument,
					; st(1) = 1.0 + argument,
					; st(2) = argument
	fmulp	st(1), st(0)		; st(0) = (1.0 - argument) * (1.0 + argument)
					;       = 1.0 - argument**2,
					; st(1) = argument
else
	fld	st(0)			; st(0) = st(1) = argument
	fmul	st(0), st(0)		; st(0) = argument**2,
					; st(1) = argument
	fld1				; st(0) = 1.0,
					; st(1) = argument**2,
					; st(2) = argument
	fsubrp	st(1), st(0)		; st(0) = 1.0 - argument**2,
					; st(1) = argument
endif
	fsqrt				; st(0) = square root of (1.0 - argument**2),
					; st(1) = argument
	fxch	st(1)			; st(0) = argument,
					; st(1) = square root of (1.0 - argument**2)
	fpatan				; st(0) = inverse circular cosine of argument
	ret

acos	endp
	end

acot() Arc Cotangent Function

The function acot() returns the (principal) arc cotangent of its argument.
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; arccot(x) = arctan(1 / x)
;           = arctan2(x, 1)

	.686
	.model	flat, C
	.code

acot	proc	public			; [esp+4] = argument

	fld1				; st(0) = 1.0
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = 1.0
	fpatan				; st(0) = inverse circular tangent of (1.0 / argument)
					;       = inverse circular cotangent of argument
	ret

acot	endp
	end

acot2() Arc Cotangent Function

The function acot2() returns the (principal) arc cotangent of the quotient of its arguments.
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; arccot2(y, x) = arctan2(x, y)

	.686
	.model	flat, C
	.code

acot2	proc	public			; [esp+12] = denominator
					; [esp+4] = numerator

	fld	real8 ptr [esp+12]	; st(0) = denominator
	fld	real8 ptr [esp+4]	; st(0) = numerator,
					; st(1) = denominator
	fpatan				; st(0) = inverse circular tangent of (denominator / numerator)
					;       = inverse circular cotangent of (numerator / denominator)
	ret

acot2	endp
	end

asin() Arc Sine Function

The function asin() returns the (principal) arc sine of its argument.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double copysign(double x, double y);
double fabs(double x);
double fma(double x, double y, double z);
double sqrt(double x);

static inline
double asin_poly(double t)
{
    // for -0.5 <= t <= 0.5,
    // a minimax polynomial of degree 12 in t**2 approximates asin(t)

    double s = t * t;

    return (((((((((((+0x1.02FF4C7428A47p-5 * s
                      -0x1.032E75CCD4AE8p-6) * s
                      +0x1.3C0E0817E9742p-6) * s
                      +0x1.B0EF96B727E7Ep-8) * s
                      +0x1.8E3FD48D0FB6Fp-7) * s
                      +0x1.C70DDF81249FCp-7) * s
                      +0x1.1C6B5042EC6B2p-6) * s
                      +0x1.6E89F8578B64Ep-6) * s
                      +0x1.F1C72C5FD95BAp-6) * s
                      +0x1.6DB6DB407C2B3p-5) * s
                      +0x1.3333333375CD0p-4) * s
                      +0x1.55555555552F4p-3) * s * t + t;
}

double asin(double x)
{
    // for -1.0 <= x < -0.5, arcsin(x) = -π/2 + asin_poly(sqrt((1 + x) / 2)) * 2
    // for -0.5 <= x <= 0.5, arcsin(x) = asin_poly(x)
    // for  0.5 <  x <= 1.0, arcsin(x) =  π/2 - asin_poly(sqrt((1 - x) / 2)) * 2

    double z = fabs(x);
    int i = z > 0.5;
#ifdef OPTIONAL
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)

    if (x != x)
        return INDEFINITE;

    if (x == 0.0)
        return x;

    if (z == 1.0)
        return copysign(1.57079632679489662, x); // ±π/2

    if (z > 1.0)
        return INDEFINITE;
#endif
    if (i)
        z = sqrt(0.5 - 0.5 * z);

    z = asin_poly(z);

    if (i) {
#ifdef FP_FAST_FMA
        z = fma(1.8656436928143307, 0.8419594442630920, -2.0 * z);
#else
        z = 0x1.921FC00000000p-0 - (z + z);      // head of π/2
        z += -0x1.5777A5CF72CECp-21;             // tail of π/2
#endif
    }

    return copysign(z, x);
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
asin:
	mov	rax, 0x3FE0000000000000
	movq	xmm1, rax		# xmm1 = 0x1.0p-1
					#      = 0.5
	xorpd	xmm2, xmm2		# xmm2 = 0.0
	subsd	xmm2, xmm0		# xmm2 = -argument
	andpd	xmm2, xmm0		# xmm2 = |argument|
	xorpd	xmm0, xmm2		# xmm0 = (argument & -0.0) ? -0.0 : 0.0
	ucomisd	xmm1, xmm2		# CF = (0.5 < |argument|)
#	jp	.Lindefinite		# argument = INDEFINITE?

	sbb	eax, eax		# eax = (0.5 < |argument|) ? -1 : 0
	jnb	.Lhorner		# |argument| <= 0.5?
.Lbig:
	mulsd	xmm2, xmm1		# xmm2 = 0.5 * |argument|
	subsd	xmm1, xmm2		# xmm1 = 0.5 - 0.5 * |argument|
	sqrtsd	xmm2, xmm1		# xmm2 = sqrt(0.5 - 0.5 * |argument|)
					#      = argument'
.Lhorner:
	movsd	xmm1, xmm2		# xmm1 = argument'
	mulsd	xmm2, xmm2		# xmm2 = argument'**2
	mov	rcx, 0x3FA02FF4C7428A47
	movq	xmm3, rcx		# xmm3 = 0x1.02FF4C7428A47p-5
					#      = 0.031615876506539346
	mulsd	xmm3, xmm2
	mov	rdx, 0xBF9032E75CCD4AE8
	movq	xmm4, rdx		# xmm4 = -0x1.032E75CCD4AE8p-6
					#      = -0.015819182433299966
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3F93C0E0817E9742
	movq	xmm3, rcx		# xmm3 = 0x1.3C0E0817E9742p-6
					#      = 0.019290454772679107
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F7B0EF96B727E7E
	movq	xmm4, rdx		# xmm4 = 0x1.B0EF96B727E7Ep-8
					#      = 0.006606077476277171
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3F88E3FD48D0FB6F
	movq	xmm3, rcx		# xmm3 = 0x1.8E3FD48D0FB6Fp-7
					#      = 0.012153605255773773
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F8C70DDF81249FC
	movq	xmm4, rdx		# xmm4 = 0x1.C70DDF81249FCp-7
					#      = 0.013887151845016092
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3F91C6B5042EC6B2
	movq	xmm3, rcx		# xmm3 = 0x1.1C6B5042EC6B2p-6
					#      = 0.017359569912236146
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F96E89F8578B64E
	movq	xmm4, rdx		# xmm4 = 0x1.6E89F8578B64Ep-6
					#      = 0.022371761819320483
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3F9F1C72C5FD95BA
	movq	xmm3, rcx		# xmm3 = 0x1.F1C72C5FD95BAp-6
					#      = 0.030381959280381322
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FA6DB6DB407C2B3
	movq	xmm4, rdx		# xmm4 = 0x1.6DB6DB407C2B3p-5
					#      = 0.044642856813771024
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0x3FB3333333375CD0
	movq	xmm3, rcx		# xmm3 = 0x1.3333333375CD0p-4
					#      = 0.075000000003785816
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FC55555555552F4
	movq	xmm4, rdx		# xmm4 = 0x1.55555555552F4p-3
					#      = 0.166666666666649754
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
.if 0
	mov	rcx, 0x3FF0000000000000
	movq	xmm3, rcx		# xmm3 = 0x1.0p+0
					#      = 1.0
	addsd	xmm3, xmm4
	mulsd	xmm1, xmm3		# xmm1 = polynomial(argument')
.else
	mulsd	xmm4, xmm1
	addsd	xmm1, xmm4		# xmm1 = polynomial(argument')
.endif
	test	eax, eax
	jz	.Lsmall			# |argument| <= 0.5?

	addsd	xmm1, xmm1		# xmm1 = 2.0 * polynomial(argument')
	mov	rcx, 0x3FF921FC00000000
	movq	xmm2, rcx		# xmm2 = 0x1.921FC00000000p-0
					#      = 1.5707969665527344
					#      = head of pi/2
	subsd	xmm2, xmm1
	mov	rdx, 0xBEA5777A5CF72CEC
	movq	xmm1, rdx		# xmm1 = -0x1.5777A5CF72CECp-21
					#      = -6.3975783775576863e-7
					#      = tail of pi/2
	addsd	xmm1, xmm2		# xmm1 = pi/2 - 2.0 * polynomial(argument')
.Lsmall:
	orpd	xmm0, xmm1		# xmm0 = polynomial(argument)
					#      = asin(argument)
	ret

.size	asin, .-asin
.type	asin, @function
.global	asin
.end
The following implementation for the i387 FPU uses its FPATAN instruction and the formula arcsin(argument) = arctan2(sqrt(1 − argument²), argument), where arctan2(x, y) denotes the angle of the point with abscissa x and ordinate y (the reverse of the argument order of atan2() in C), based upon the identities sin(result) = argument, sin²(result) + cos²(result) = 1 and tan(result) = sin(result) / cos(result):
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/txk32e70.aspx

; arcsin(x) = arctan(x / sqrt((1 + x) * (1 - x)))
;           = arctan(x / sqrt(1 - x**2))
;           = arctan2(sqrt(1 - x**2), x)
;           = arctan2(sqrt((1 + x) * (1 - x)), x)

	.686
	.model	flat, C
	.code

asin	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = argument
if 0
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fadd	st(0), st(1)		; st(0) = 1.0 + argument,
					; st(1) = argument
	fld1				; st(0) = 1.0,
					; st(1) = 1.0 + argument,
					; st(2) = argument
	fsub	st(0), st(2)		; st(0) = 1.0 - argument,
					; st(1) = 1.0 + argument,
					; st(2) = argument
	fmulp	st(1), st(0)		; st(0) = (1.0 - argument) * (1.0 + argument)
					;       = 1.0 - argument**2,
					; st(1) = argument
else
	fld	st(0)			; st(0) = st(1) = argument
	fmul	st(0), st(0)		; st(0) = argument**2,
					; st(1) = argument
	fld1				; st(0) = 1.0,
					; st(1) = argument**2,
					; st(2) = argument
	fsubrp	st(1), st(0)		; st(0) = 1.0 - argument**2,
					; st(1) = argument
endif
	fsqrt				; st(0) = square root of (1.0 - argument**2),
					; st(1) = argument
	fpatan				; st(0) = inverse circular sine of argument
	ret

asin	endp
	end

atan() Arc Tangent Function

The function atan() returns the (principal) arc tangent of its argument.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double copysign(double x, double y);
double fabs(double x);
double fma(double x, double y, double z);
int signbit(double x);

static inline
double atan_poly(double t)
{
    // for -1.0 <= t <= 1.0,
    // a minimax polynomial of degree 19 in t**2 approximates atan(t)

    double s = t * t;
#ifdef FP_FAST_FMA
    double r = -0x1.53E1D2A25FF34p-16;
    r = fma(r, s,  0x1.D3B63DBB65AF4p-13);
    r = fma(r, s, -0x1.312788DDE0801p-10);
    r = fma(r, s,  0x1.F9690C82492DBp-9);
    r = fma(r, s, -0x1.2CF5AABC7CEF3p-7);
    r = fma(r, s,  0x1.162B0B2A3BFCEp-6);
    r = fma(r, s, -0x1.A7256FEB6FC5Cp-6);
    r = fma(r, s,  0x1.171560CE4A483p-5);
    r = fma(r, s, -0x1.4F44D841450E1p-5);
    r = fma(r, s,  0x1.7EE3D3F36BB94p-5);
    r = fma(r, s, -0x1.AD32AE04A9FD1p-5);
    r = fma(r, s,  0x1.E17813D66954Fp-5);
    r = fma(r, s, -0x1.11089CA9A5BCDp-4);
    r = fma(r, s,  0x1.3B12B2DB51738p-4);
    r = fma(r, s, -0x1.745D022F8DC5Cp-4);
    r = fma(r, s,  0x1.C71C709DFE927p-4);
    r = fma(r, s, -0x1.2492491FA1744p-3);
    r = fma(r, s,  0x1.99999999840D2p-3);
    r = fma(r, s, -0x1.555555555544Cp-2);
    r = fma(r, s,  1.0);
    return r * t;
#else
    return ((((((((((((((((((-0x1.3CBF44A88555Fp-16 * s
                             +0x1.B81666EB938AFp-13) * s
                             -0x1.21F657F3915DAp-10) * s
                             +0x1.E5005F4C78C20p-9) * s
                             -0x1.2399E74A75E56p-7) * s
                             +0x1.0FF6A2A0D2286p-6) * s
                             -0x1.A1006DE22CDACp-6) * s
                             +0x1.14C4D24651F2Ep-5) * s
                             -0x1.4DEE09915F638p-5) * s
                             +0x1.7E4B31D8A55AEp-5) * s
                             -0x1.ACFE938E04FCAp-5) * s
                             +0x1.E16A933B73622p-5) * s
                             -0x1.11074E45F93E0p-4) * s
                             +0x1.3B1283C0CA0B1p-4) * s
                             -0x1.745CFD878FEE8p-4) * s
                             +0x1.C71C704FB4F9Fp-4) * s
                             -0x1.2492491E100BBp-3) * s
                             +0x1.999999997B9DDp-3) * s
                             -0x1.55555555553C5p-2) * s * t + t;
#endif
}

double atan(double x)
{
    // with arctan(-x)     = -arctan(x),
    //      arctan(1 / x)  =  π/2 - arctan(x) for x > 0
    // and  arctan(1 / x)  = -π/2 - arctan(x) for x < 0,
    // for       x < -1, arctan(x) = -π/2 - atan_poly(1 / x),
    // for -1 <= x <= 1, arctan(x) = atan_poly(x),
    // for  1 <  x,      arctan(x) =  π/2 - atan_poly(1 / x)

    double z = fabs(x);
    int i = z > 1.0;
#ifdef OPTIONAL
#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)

    if (x != x)
        return INDEFINITE;

    if (z == INFINITY)
        return copysign(1.57079632679489662, x); // ±π/2

    if (x == 0.0)
        return x;
#endif
    if (i)
        z = 1.0 / z;

    z = atan_poly(z);

    if (i) {
#ifdef FP_FAST_FMA
        z = fma(1.8656436928143307, 0.8419594442630920, -z);
#else
        z = -0x1.5777A5CF72CECp-21 - z;          // tail of π/2
        z += 0x1.921FC00000000p-0;               // head of π/2
#endif
    }

    return copysign(z, x);
}

double atan2(double y, double x)
{
    double z;
#if 0
    if (fabs(x) > fabs(y))
        z = atan(y / x);
    else {
        z = atan(x / y);
        z = copysign(1.57079632679489662, z) - z;
    }

    if (signbit(x))
        z += copysign(3.14159265358979324, y);
#else
    double w;
    int i;

    if (x == 0.0) {
        if (y > 0.0)
            return 1.57079632679489662;  // π/2

        if (y < 0.0)
            return -1.57079632679489662; // -π/2

        return signbit(x) ? copysign(3.14159265358979324, y) : y;
    }

    w = fabs(y);                         // keep y itself for its sign
    z = fabs(x);
    i = z < w;
    if (i)
        z /= w;
    else
        z = w / z;

    z = atan_poly(z);

    if (i) {
#ifdef FP_FAST_FMA
        z = fma(1.8656436928143307, 0.8419594442630920, -z);
#else
        z = -0x1.5777A5CF72CECp-21 - z;  // tail of π/2
        z += 0x1.921FC00000000p-0;       // head of π/2
#endif
    }

    if (signbit(x)) {                    // x < 0: reflect about the y axis
#ifdef FP_FAST_FMA
        z = fma(1.8656436928143307, 1.6839188885261840, -z);
#else
        z = 3.14159265358979324 - z;     // π
#endif
    }

    z = copysign(z, y);                  // the result takes the sign of y
#endif
    return z;
}
# Copyright © 2004-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
atan:
	mov	rax, 0x3FF0000000000000
	movq	xmm1, rax		# xmm1 = 0x1.0p+0
					#      = 1.0
	xorpd	xmm2, xmm2		# xmm2 = 0.0
	subsd	xmm2, xmm0		# xmm2 = -argument
	andpd	xmm2, xmm0		# xmm2 = |argument|
	xorpd	xmm0, xmm2		# xmm0 = (argument & -0.0) ? -0.0 : 0.0
	ucomisd	xmm1, xmm2		# CF = (1.0 < |argument|)
#	jp	.Lindefinite		# argument = INDEFINITE?

	sbb	eax, eax		# eax = (1.0 < |argument|) ? -1 : 0
	jnb	.Lhorner		# |argument| <= 1.0?
.Lbig:
	divsd	xmm1, xmm2		# xmm1 = 1.0 / |argument|
	movsd	xmm2, xmm1		# xmm2 = argument'
.Lhorner:
	movsd	xmm1, xmm2		# xmm1 = argument'
	mulsd	xmm2, xmm2		# xmm2 = argument'**2
.ifdef ALTERNATE
	mov	rcx, 0xBEF53E1D2A25FF34
	movq	xmm3, rcx		# xmm3 = -0x1.53E1D2A25FF34p-16
					#      = -2.0258553044438107e-5
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F2D3B63DBB65AF4
	movq	xmm4, rdx		# xmm4 = 0x1.D3B63DBB65AF4p-13
					#      = 2.2302240345758279e-4
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBF5312788DDE0801
	movq	xmm3, rcx		# xmm3 = -0x1.312788DDE0801p-10
					#      = -1.1640717779930478e-3
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F6F9690C82492DB
	movq	xmm4, rdx		# xmm4 = 0x1.F9690C82492DBp-9
					#      = 3.8559749383629666e-3
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBF82CF5AABC7CEF3
	movq	xmm3, rcx		# xmm3 = -0x1.2CF5AABC7CEF3p-7
					#      = -9.1845592187165034e-3
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F9162B0B2A3BFCE
	movq	xmm4, rdx		# xmm4 = 0x1.162B0B2A3BFCEp-6
					#      = 1.6978035834597276e-2
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBF9A7256FEB6FC5C
	movq	xmm3, rcx		# xmm3 = -0x1.A7256FEB6FC5Cp-6
					#      = -2.5826796814495942e-2
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FA171560CE4A483
	movq	xmm4, rdx		# xmm4 = 0x1.171560CE4A483p-5
					#      = 3.4067811082715081e-2
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFA4F44D841450E1
	movq	xmm3, rcx		# xmm3 = -0x1.4F44D841450E1p-5
					#      = -4.0926382420509951e-2
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FA7EE3D3F36BB94
	movq	xmm4, rdx		# xmm4 = 0x1.7EE3D3F36BB94p-5
					#      = 4.6739496199157987e-2
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFAAD32AE04A9FD1
	movq	xmm3, rcx		# xmm3 = -0x1.AD32AE04A9FD1p-5
					#      = -5.2392330054601317e-2
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FAE17813D66954F
	movq	xmm4, rdx		# xmm4 = 0x1.E17813D66954Fp-5
					#      = 5.8773077721790849e-2
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFB11089CA9A5BCD
	movq	xmm3, rcx		# xmm3 = -0x1.11089CA9A5BCDp-4
					#      = -6.6658603633512573e-2
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FB3B12B2DB51738
	movq	xmm4, rdx		# xmm4 = 0x1.3B12B2DB51738p-4
					#      = 7.6922129305867837e-2
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFB745D022F8DC5C
	movq	xmm3, rcx		# xmm3 = -0x1.745D022F8DC5Cp-4
					#      = -9.0909012354005225e-2
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FBC71C709DFE927
	movq	xmm4, rdx		# xmm4 = 0x1.C71C709DFE927p-4
					#      = 0.11111110678749424
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFC2492491FA1744
	movq	xmm3, rcx		# xmm3 = -0x1.2492491FA1744p-3
					#      = -0.14285714271334815
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FC99999999840D2
	movq	xmm4, rdx		# xmm4 = 0x1.99999999840D2p-3
					#      = 0.19999999999755019
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFD555555555544C
	movq	xmm3, rcx		# xmm3 = -0x1.555555555544Cp-2
					#      = -0.3333333333333186
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
.else
	mov	rcx, 0xBEF3CBF44A88555F
	movq	xmm3, rcx		# xmm3 = -0x1.3CBF44A88555Fp-16
					#      = -1.8879600846307350e-5
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F2B81666EB938AF
	movq	xmm4, rdx		# xmm4 = 0x1.B81666EB938AFp-13
					#      = 2.0985007664581698e-4
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBF521F657F3915DA
	movq	xmm3, rcx		# xmm3 = -0x1.21F657F3915DAp-10
					#      = -0.0011061183148667248
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F6E5005F4C78C20
	movq	xmm4, rdx		# xmm4 = 0x1.E5005F4C78C20p-9
					#      = 0.003700267441887131
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBF82399E74A75E56
	movq	xmm3, rcx		# xmm3 = -0x1.2399E74A75E56p-7
					#      = -0.008898961958876555
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3F90FF6A2A0D2286
	movq	xmm4, rdx		# xmm4 = 0x1.0FF6A2A0D2286p-6
					#      = 0.016599329773529202
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBF9A1006DE22CDAC
	movq	xmm3, rcx		# xmm3 = -0x1.A1006DE22CDACp-6
					#      = -0.025451762493231264
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FA14C4D24651F2E
	movq	xmm4, rdx		# xmm4 = 0x1.14C4D24651F2Ep-5
					#      = 0.033785258000135307
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFA4DEE09915F638
	movq	xmm3, rcx		# xmm3 = -0x1.4DEE09915F638p-5
					#      = -0.040762919127683650
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FA7E4B31D8A55AE
	movq	xmm4, rdx		# xmm4 = 0x1.7E4B31D8A55AEp-5
					#      = 0.046666715007784063
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFAACFE938E04FCA
	movq	xmm3, rcx		# xmm3 = -0x1.ACFE938E04FCAp-5
					#      = -0.052367485230348246
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FAE16A933B73622
	movq	xmm4, rdx		# xmm4 = 0x1.E16A933B73622p-5
					#      = 0.058766639292667358
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFB11074E45F93E0
	movq	xmm3, rcx		# xmm3 = -0x1.11074E45F93E0p-4
					#      = -0.066657357936108053
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FB3B1283C0CA0B1
	movq	xmm4, rdx		# xmm4 = 0x1.3B1283C0CA0B1p-4
					#      = 0.076921953831176962
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFB745CFD878FEE8
	movq	xmm3, rcx		# xmm3 = -0x1.745CFD878FEE8p-4
					#      = -0.090908995008245008
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FBC71C704FB4F9F
	movq	xmm4, rdx		# xmm4 = 0x1.C71C704FB4F9Fp-4
					#      = 0.111111105648261418
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFC2492491E100BB
	movq	xmm3, rcx		# xmm3 = -0x1.2492491E100BBp-3
					#      = -0.142857142667713294
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
	mov	rdx, 0x3FC999999997B9DD
	movq	xmm4, rdx		# xmm4 = 0x1.999999997B9DDp-3
					#      = 0.199999999996591266
	addsd	xmm4, xmm3
	mulsd	xmm4, xmm2
	mov	rcx, 0xBFD55555555553C5
	movq	xmm3, rcx		# xmm3 = -0x1.55555555553C5p-2
					#      = -0.333333333333311110
	addsd	xmm3, xmm4
	mulsd	xmm3, xmm2
.endif # ALTERNATE
.if 0
	mov	rdx, 0x3FF0000000000000
	movq	xmm4, rdx		# xmm4 = 0x1.0p+0
					#      = 1.0
	addsd	xmm4, xmm3
	mulsd	xmm1, xmm4		# xmm1 = polynomial(argument')
.else
	mulsd	xmm3, xmm1
	addsd	xmm1, xmm3		# xmm1 = polynomial(argument')
.endif
	test	eax, eax
	jz	.Lsmall			# |argument| <= 1.0?

	mov	rdx, 0x3FF921FC00000000
	movq	xmm2, rdx		# xmm2 = 0x1.921FC00000000p-0
					#      = 1.5707969665527344
					#      = head of pi/2
	subsd	xmm2, xmm1
	mov	rcx, 0xBEA5777A5CF72CEC
	movq	xmm1, rcx		# xmm1 = -0x1.5777A5CF72CECp-21
					#      = -6.3975783775576863e-7
					#      = tail of pi/2
	addsd	xmm1, xmm2		# xmm1 = pi/2 - polynomial(argument')
					#      = atan(|argument|)
.Lsmall:
	orpd	xmm0, xmm1		# xmm0 = atan(argument)
	ret

.size	atan, .-atan
.type	atan, @function
.global	atan
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/88c36t42.aspx

; arctan(x) = arctan2(1, x)

	.686
	.model	flat, C
	.code

atan	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = argument
	fld1				; st(0) = 1.0,
					; st(1) = argument
	fpatan				; st(0) = inverse circular tangent of (argument / 1.0)
	ret

atan	endp
	end

atan2() Arc Tangent Function

The function atan2() returns the (principal) arc tangent of the quotient of its arguments.
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/88c36t42.aspx

	.686
	.model	flat, C
	.code

atan2	proc	public			; [esp+12] = denominator
					; [esp+4] = numerator

	fld	real8 ptr [esp+4]	; st(0) = numerator
	fld	real8 ptr [esp+12]	; st(0) = denominator,
					; st(1) = numerator
	fpatan				; st(0) = inverse circular tangent of (numerator / denominator)
	ret

atan2	endp
	end

Hyperbolic Functions

cosh() Hyperbolic Cosine Function

The function cosh() returns the hyperbolic cosine of its argument.

coth() Hyperbolic Cotangent Function

The function coth() returns the hyperbolic cotangent of its argument.

sinh() Hyperbolic Sine Function

The function sinh() returns the hyperbolic sine of its argument.

tanh() Hyperbolic Tangent Function

The function tanh() returns the hyperbolic tangent of its argument.

Inverse alias Area Hyperbolic Functions

acosh() Area Hyperbolic Cosine Function

The function acosh() returns the inverse hyperbolic cosine of its argument.
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/dn465171.aspx

; arcosh(x) = log(x + sqrt((x + 1) * (x - 1)))
;           = log(x + sqrt(x**2 - 1))

	.686
	.model	flat, C
	.code

acosh	proc	public			; [esp+4] = argument

	fldln2				; st(0) = ln(2.0)
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = ln(2.0)
	fld	st(0)			; st(0) = argument,
					; st(1) = argument,
					; st(2) = ln(2.0)
	fmul	st(0), st(0)		; st(0) = argument**2,
					; st(1) = argument,
					; st(2) = ln(2.0)
	fld1				; st(0) = 1.0,
					; st(1) = argument**2,
					; st(2) = argument,
					; st(3) = ln(2.0)
	fsubp	st(1), st(0)		; st(0) = argument**2 - 1.0,
					; st(1) = argument,
					; st(2) = ln(2.0)
	fsqrt				; st(0) = sqrt(argument**2 - 1.0),
					; st(1) = argument,
					; st(2) = ln(2.0)
	faddp	st(1), st(0)		; st(0) = argument + sqrt(argument**2 - 1.0),
					; st(1) = ln(2.0)
	fyl2x				; st(0) = natural logarithm of (argument + sqrt(argument**2 - 1.0))
					;       = inverse hyperbolic cosine of argument
	ret

acosh	endp
	end

acoth() Area Hyperbolic Cotangent Function

The function acoth() returns the inverse hyperbolic cotangent of its argument.
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; arcoth(x) = log((x + 1) / (x - 1)) / 2
;           = log(1 + 2 / (x - 1)) / 2
;           = log1p(2 / (x - 1)) / 2

	.686
	.model	flat, C
	.code

single	record	sign:1, exponent:8, mantissa:23

bias	equ	1 shl (width exponent - 1) - 1

acoth	proc	public			; [esp+4] = argument

	fldln2				; st(0) = ln(2.0)
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = ln(2.0)
	fld	st(0)			; st(0) = argument,
					; st(1) = argument,
					; st(2) = ln(2.0)
	fld1				; st(0) = 1.0,
					; st(1) = argument,
					; st(2) = argument,
					; st(3) = ln(2.0)
	fadd	st(2), st(0)		; st(0) = 1.0,
					; st(1) = argument,
					; st(2) = argument + 1.0,
					; st(3) = ln(2.0)
	fsubp	st(1), st(0)		; st(0) = argument - 1.0,
					; st(1) = argument + 1.0,
					; st(2) = ln(2.0)
	fdivp	st(1), st(0)		; st(0) = (argument + 1.0) / (argument - 1.0),
					; st(1) = ln(2.0)
	fyl2x				; st(0) = natural logarithm of ((argument + 1.0) / (argument - 1.0))
	push	(bias - 1) shl width mantissa
					; [esp] = 0x3F000000
					;       = 0.5F
	fmul	real4 ptr [esp]		; st(0) = inverse hyperbolic cotangent of argument
	pop	eax
	ret

acoth	endp
	end

asinh() Area Hyperbolic Sine Function

The function asinh() returns the inverse hyperbolic sine of its argument.
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/dn465168.aspx

; arsinh(x) = log(x + sqrt(x**2 + 1))
;           = log1p(x + sqrt(x**2 + 1) - 1)
;           = log1p(x + x**2 / (sqrt(x**2 + 1) + 1))

	.686
	.model	flat, C
	.code

asinh	proc	public			; [esp+4] = argument

	fldln2				; st(0) = ln(2.0)
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = ln(2.0)
	fld	st(0)			; st(0) = argument,
					; st(1) = argument,
					; st(2) = ln(2.0)
	fmul	st(0), st(0)		; st(0) = argument**2,
					; st(1) = argument,
					; st(2) = ln(2.0)
	fld1				; st(0) = 1.0,
					; st(1) = argument**2,
					; st(2) = argument,
					; st(3) = ln(2.0)
	faddp	st(1), st(0)		; st(0) = argument**2 + 1.0,
					; st(1) = argument,
					; st(2) = ln(2.0)
	fsqrt				; st(0) = sqrt(argument**2 + 1.0),
					; st(1) = argument,
					; st(2) = ln(2.0)
	faddp	st(1), st(0)		; st(0) = argument + sqrt(argument**2 + 1.0),
					; st(1) = ln(2.0)
	fyl2x				; st(0) = natural logarithm of (argument + sqrt(argument**2 + 1.0))
					;       = inverse hyperbolic sine of argument
	ret

asinh	endp
	end

atanh() Area Hyperbolic Tangent Function

The function atanh() returns the inverse hyperbolic tangent of its argument.
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/dn324930.aspx

; artanh(x) = log((1 + x) / (1 - x)) / 2
;           = log(1 + 2 * x / (1 - x)) / 2
;           = log1p(2 * x / (1 - x)) / 2
; artanh(x) = log((1 + x) / (1 - x)) / 2
;           = (log(1 + x) - log(1 - x)) / 2
;           = (log1p(x) - log1p(-x)) / 2

	.686
	.model	flat, C
	.code

single	record	sign:1, exponent:8, mantissa:23

bias	equ	1 shl (width exponent - 1) - 1

atanh	proc	public			; [esp+4] = argument

	fldln2				; st(0) = ln(2.0)
	fld1				; st(0) = 1.0,
					; st(1) = ln(2.0)
	fld1				; st(0) = 1.0,
					; st(1) = 1.0,
					; st(2) = ln(2.0)
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = 1.0,
					; st(3) = ln(2.0)
	fadd	st(2), st(0)		; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = 1.0 + argument,
					; st(3) = ln(2.0)
	fsubp	st(1), st(0)		; st(0) = 1.0 - argument,
					; st(1) = 1.0 + argument,
					; st(2) = ln(2.0)
	fdivp	st(1), st(0)		; st(0) = (1.0 + argument) / (1.0 - argument),
					; st(1) = ln(2.0)
	fyl2x				; st(0) = natural logarithm of ((1.0 + argument) / (1.0 - argument))
	push	(bias - 1) shl width mantissa
					; [esp] = 0x3F000000
					;       = 0.5F
	fmul	real4 ptr [esp]		; st(0) = inverse hyperbolic tangent of argument
	pop	eax
	ret

atanh	endp
	end

Irregular Functions

The following functions exhibit special or unusual behaviour: they don’t always propagate a NaN!

fmax() Function

The function fmax() returns its other argument if one argument is a NaN, else the larger of its arguments.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double fmax(double left, double right)
{
#ifdef QUIET
    return (left > right) ? left : (right == right) ? right : (left == left) ? left : left + right;
#else
    return (left > right) || (right != right) ? left : right;
#endif
}
Note: with the preprocessor macro QUIET defined, a signaling NaN is returned as a quiet NaN.
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = left
					# xmm1 = right
fmax:
	movsd	xmm2, xmm0		# xmm2 = left
	maxsd	xmm2, xmm1		# xmm2 = (left > right) ? left : right
					#      = (left # right) ? right : max(left, right)
	cmpsd	xmm1, xmm1, 3		# xmm1 = (right # right) ? ~0L : 0L
	andpd	xmm0, xmm1		# xmm0 = (right # right) ? left : 0L
	andnpd	xmm1, xmm2		# xmm1 = (right # right) ? 0L : xmm2
	orpd	xmm0, xmm1		# xmm0 = (right # right) ? left : xmm2
					#      = fmax(left, right)
	ret

.size	fmax, .-fmax
.type	fmax, @function
.global	fmax
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/mt720717.aspx

	.686
	.model	flat, C
	.code

fmax	proc	public			; [esp+12] = right
					; [esp+4] = left

	fld	real8 ptr [esp+4]	; st(0) = left
	fld	real8 ptr [esp+12]	; st(0) = right,
					; st(1) = left
	fucomi	st(0), st(0)		; eflags = right ><=# right
	fcmovu	st(0), st(1)		; st(0) = (right # right) ? left : right,
					; st(1) = left
if 0
	fld	st(1)			; st(0) = left,
					; st(1) = (right # right) ? left : right,
					; st(2) = left
	fucomip	st(0), st(1)		; eflags = left ><=# ((right # right) ? left : right),
					; st(0) = (right # right) ? left : right,
					; st(1) = left
	fcmovnb	st(0), st(1)		; st(0) = (left < right) ? right : left,
					; st(1) = left
else
	fxch	st(1)			; st(0) = left,
					; st(1) = (right # right) ? left : right
	fucomi	st(0), st(1)		; eflags = left ><=# ((right # right) ? left : right)
	fcmovb	st(0), st(1)		; st(0) = (left < right) ? right : left,
					; st(1) = (right # right) ? left : right
endif
	fstp	st(1)			; st(0) = fmax(left, right)
	ret

fmax	endp
	end
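Note: the rule "return the other argument if one argument is a NaN" rests on two properties of IEEE 754 comparisons: every ordered comparison with a NaN is false, and a NaN is not equal to itself. The sketch below restates that selection rule in isolation; fmax_sketch() and nan_sketch() are hypothetical names used only here.

```c
/* Hypothetical helper: produces a quiet NaN at run time without <math.h> */
double nan_sketch(void)
{
    volatile double zero = 0.0;

    return zero / zero;
}

/* Hypothetical sketch of the NaN-aware maximum */
double fmax_sketch(double left, double right)
{
    /* (left > right) is false whenever either argument is a NaN;
       (right != right) is true only when right is a NaN */
    return (left > right) || (right != right) ? left : right;
}
```

With these definitions fmax_sketch(nan_sketch(), 2.0) and fmax_sketch(2.0, nan_sketch()) both yield 2.0, and only two NaN arguments yield a NaN.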

fmin() Function

The function fmin() returns its other argument if one argument is a NaN, else the smaller of its arguments.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double fmin(double left, double right)
{
#ifdef QUIET
    return (left < right) ? left : (right == right) ? right : (left == left) ? left : left + right;
#else
    return (left < right) || (right != right) ? left : right;
#endif
}
Note: with the preprocessor macro QUIET defined, a signaling NaN is returned as a quiet NaN.
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = left
					# xmm1 = right
fmin:
	movsd	xmm2, xmm0		# xmm2 = left
	minsd	xmm2, xmm1		# xmm2 = (left < right) ? left : right
					#      = (left # right) ? right : min(left, right)
	cmpsd	xmm1, xmm1, 3		# xmm1 = (right # right) ? ~0L : 0L
	andpd	xmm0, xmm1		# xmm0 = (right # right) ? left : 0L
	andnpd	xmm1, xmm2		# xmm1 = (right # right) ? 0L : xmm2
	orpd	xmm0, xmm1		# xmm0 = (right # right) ? left : xmm2
					#      = fmin(left, right)
	ret

.size	fmin, .-fmin
.type	fmin, @function
.global	fmin
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/mt720716.aspx

	.686
	.model	flat, C
	.code

fmin	proc	public			; [esp+12] = right
					; [esp+4] = left

	fld	real8 ptr [esp+4]	; st(0) = left
	fld	real8 ptr [esp+12]	; st(0) = right,
					; st(1) = left
	fucomi	st(0), st(0)		; eflags = right ><=# right
	fcmovu	st(0), st(1)		; st(0) = (right # right) ? left : right,
					; st(1) = left
	fucomi	st(0), st(1)		; eflags = ((right # right) ? left : right) ><=# left
	fcmovnb	st(0), st(1)		; st(0) = (left < right) ? left : right,
					; st(1) = left
	fstp	st(1)			; st(0) = fmin(left, right)
	ret

fmin	endp
	end

hypot() Function

The function hypot() returns +∞ if one of its arguments is ±∞, even if the other argument is a NaN, else the square root of the sum of the squares of its arguments, √(a² + b²), which is occasionally called the Pythagorean sum.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

#define INFINITY 0x1.0p+1024

double fabs(double x);
double fma(double x, double y, double z);
double frexp(double x, int *z);
double ldexp(double x, int z);
double sqrt(double x);

double hypot(double left, double right)
{
    double tmp;
    int exponent;

    right = fabs(right);

    if (right == INFINITY)
        return right;

    left = fabs(left);

    if (left == INFINITY)
        return left;

    if (left < right)
        tmp = right, right = left, left = tmp;

    left = frexp(left, &exponent);
    right = ldexp(right, -exponent);
#ifdef FP_FAST_FMA
    return ldexp(sqrt(fma(left, left, right * right)), exponent);
#else
    return ldexp(sqrt(left * left + right * right), exponent);
#endif
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# hypot(a, ±INFINITY)  = +INFINITY
# hypot(a, INDEFINITE) = INDEFINITE
# hypot(a, ±0)         = |a|
# hypot(a, b)          = hypot(a, -b)
#                      = hypot(b, a)
# hypot(a, b)          = sqrt(a**2 + b**2)
#                      = sqrt((max(|a|, |b|) * 2**c)**2 + (min(|a|, |b|) * 2**c)**2) / 2**c

.arch	generic64
.code64
.intel_syntax noprefix
.text

.equ	BIAS, 1023			# exponent bias of double-precision numbers
					# xmm0 = left
					# xmm1 = right
hypot:
	xorpd	xmm2, xmm2		# xmm2 = 0.0
	ucomisd	xmm2, xmm1
	subsd	xmm2, xmm0		# xmm2 = -left
	andpd	xmm0, xmm2		# xmm0 = |left|
	jz	.Lleft			# right = ±0.0?
					# right = INDEFINITE?
	xorpd	xmm2, xmm2		# xmm2 = 0.0
	ucomisd	xmm2, xmm0
	subsd	xmm2, xmm1		# xmm2 = -right
	andpd	xmm1, xmm2		# xmm1 = |right|
	jz	.Lright			# left = ±0.0?
					# left = INDEFINITE?
	movsd	xmm2, xmm0
	maxsd	xmm0, xmm1		# xmm0 = max(|left|, |right|)
					#      = left'
	minsd	xmm1, xmm2		# xmm1 = min(|left|, |right|)
					#      = right'
	movq	rax, xmm0
	shr	rax, 54
	shl	eax, 2			# eax = biased exponent of left'
	mov	ecx, BIAS * 2 - 1
	sub	ecx, eax		# ecx = 2045
					#     - biased exponent of left'
					#     = biased exponent of (normalized) scale factor
					#     = {1, 5, 9, ..., 2045}
	inc	eax			# eax = biased exponent of reciprocal scale factor
	shl	rcx, 52
	shl	rax, 52
	movq	xmm2, rcx		# xmm2 = (normalized) scale factor
.ifdef SSE4_1
	unpcklpd xmm2, xmm2
	unpcklpd xmm0, xmm1		# xmm0[63:0] = left',
					# xmm0[127:64] = right'
	mulpd	xmm0, xmm2		# xmm0[63:0] = left' * scale factor,
					# xmm0[127:64] = right' * scale factor
	dppd	xmm0, xmm0, 0x31	# xmm0 = (left' * scale factor)**2
					#      + (right' * scale factor)**2
					#      = (left'**2 + right'**2) * scale factor**2
.else
	mulsd	xmm0, xmm2		# xmm0 = left' * scale factor
	mulsd	xmm1, xmm2		# xmm1 = right' * scale factor
	mulsd	xmm0, xmm0		# xmm0 = (left' * scale factor)**2
	mulsd	xmm1, xmm1		# xmm1 = (right' * scale factor)**2
	addsd	xmm0, xmm1		# xmm0 = (left' * scale factor)**2
					#      + (right' * scale factor)**2
					#      = (left'**2 + right'**2) * scale factor**2
.endif
	sqrtsd	xmm0, xmm0		# xmm0 = sqrt(left'**2 + right'**2) * scale factor
	movq	xmm1, rax		# xmm1 = reciprocal scale factor
	mulsd	xmm0, xmm1		# xmm0 = sqrt(left'**2 + right'**2)
					#      = hypot(left, right)
	ret
.Lleft:
	jnp	.Lexit			# right <> INDEFINITE?
					# (right = ±0.0?)
	mov	rax, 0x7FF0000000000000
	movq	xmm2, rax		# xmm2 = 0x1.0p+1024
					#      = INFINITY
	ucomisd	xmm2, xmm0
	je	.Lexit			# left = ±INFINITY?
.Linfinity:
.Lcommon:
	movsd	xmm0, xmm1		# xmm0 = |right|
	ret
.Lright:
	jnp	.Lcommon		# left <> INDEFINITE?
					# (left = ±0.0?)
	mov	rax, 0x7FF0000000000000
	movq	xmm2, rax		# xmm2 = 0x1.0p+1024
					#      = INFINITY
	ucomisd	xmm2, xmm1
	je	.Linfinity		# right = ±INFINITY?
.Lindefinite:
.Lexit:
	ret

.size	hypot, .-hypot
.type	hypot, @function
.global	hypot
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/a9yb3dbt.aspx

; hypot(x, ±INFINITY)  = +INFINITY
; hypot(x, INDEFINITE) = INDEFINITE
; hypot(x, ±0)         = |x|
; hypot(x, y)          = hypot(x, -y)
;                      = hypot(y, x)
; hypot(x, y)          = sqrt(x**2 + y**2)
;                      = sqrt((max(|x|, |y|) / 2**z)**2 + (min(|x|, |y|) / 2**z)**2) * 2**z

	.686
	.model	flat, C
	.code

hypot	proc	public			; [esp+12] = right
					; [esp+4] = left

	fld	real8 ptr [esp+4]	; st(0) = left
	ftst
	fstsw	ax			; ax = FPU status word

					; B  C3  TOP  C2  C1  C0  low byte
					; .  0   ...  0   .   0   ........  st(0) > 0.0
					; .  0   ...  0   .   1   ........  st(0) < 0.0
					; .  1   ...  0   .   0   ........  st(0) = 0.0
					; .  1   ...  1   .   1   ........  st(0) # 0.0

	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah

					; CF (carry flag)  = C0
					;                    C1
					; PF (parity flag) = C2
					; ZF (zero flag)   = C3
					; AF (adjust flag) = .
					; SF (sign flag)   = B(usy)

	fld	real8 ptr [esp+12]	; st(0) = right,
					; st(1) = left
	fabs				; st(0) = |right|,
					; st(1) = left
	jz	Lspecial		; left = ±0.0?
					; left = INDEFINITE?
	fxch	st(1)			; st(0) = left,
					; st(1) = |right|
	fabs				; st(0) = |left|,
					; st(1) = |right|
	fucom	st(1)
	fstsw	ax			; ax = FPU status word

					; B  C3  TOP  C2  C1  C0  low byte
					; .  0   ...  0   .   0   ........  st(0) > st(1)
					; .  0   ...  0   .   1   ........  st(0) < st(1)
					; .  1   ...  0   .   0   ........  st(0) = st(1)
					; .  1   ...  1   .   1   ........  st(0) # st(1)

	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah

	jp	Lunordered		; |right| = INDEFINITE?
	jnb	Lscale			; |left| >= |right|?
Lbelow:
	fxch	st(1)			; st(0) = max(|left|, |right|)
					;       = left',
					; st(1) = min(|left|, |right|)
					;       = right'
Lscale:
	fxtract				; st(0) = left' / 2**exponent,
					; st(1) = exponent,
					; st(2) = right'
	fmul	st(0), st(0)		; st(0) = (left' / 2**exponent)**2,
					; st(1) = exponent,
					; st(2) = right'
	fxch	st(2)			; st(0) = right',
					; st(1) = exponent,
					; st(2) = (left' / 2**exponent)**2
	fld	st(1)			; st(0) = exponent,
					; st(1) = right',
					; st(2) = exponent,
					; st(3) = (left' / 2**exponent)**2
	fchs				; st(0) = -exponent,
					; st(1) = right',
					; st(2) = exponent,
					; st(3) = (left' / 2**exponent)**2
	fxch	st(1)			; st(0) = right',
					; st(1) = -exponent,
					; st(2) = exponent,
					; st(3) = (left' / 2**exponent)**2
	fscale				; st(0) = right' * 2**-exponent
					;       = right' / 2**exponent,
					; st(1) = -exponent,
					; st(2) = exponent,
					; st(3) = (left' / 2**exponent)**2
	fstp	st(1)			; st(0) = right' / 2**exponent,
					; st(1) = exponent,
					; st(2) = (left' / 2**exponent)**2
	fmul	st(0), st(0)		; st(0) = (right' / 2**exponent)**2,
					; st(1) = exponent,
					; st(2) = (left' / 2**exponent)**2
	faddp	st(2), st(0)		; st(0) = exponent,
					; st(1) = (left' / 2**exponent)**2
					;       + (right' / 2**exponent)**2
					;       = (left'**2 + right'**2) / (2**exponent)**2
	fxch	st(1)			; st(0) = (left' / 2**exponent)**2
					;       + (right' / 2**exponent)**2
					;       = (left'**2 + right'**2) / (2**exponent)**2,
					; st(1) = exponent
	fsqrt				; st(0) = sqrt(left'**2 + right'**2) / 2**exponent,
					; st(1) = exponent
	fscale				; st(0) = sqrt(left'**2 + right'**2),
					; st(1) = exponent
	fstp	st(1)			; st(0) = hypot(left, right)
	ret
;;Lunordered:
;;	fxam
;;	fstsw	ax			; ax = FPU status word,
;;					; ah = B:C3:T:O:P:C2:C1:C0
;;	and	ah, 0x45
;;	cmp	ah, 0x05
;;	jne	Lindefinite		; |left| <> INFINITY?
;;Linfinity:
;;	fstp	st(1)			; st(0) = |left|
;;					;       = INFINITY
;;					;       = hypot(±INFINITY, right)
;;	ret
Lspecial:
	jnp	Lzero			; left <> INDEFINITE?
					; left = ±0.0?
Lunordered:
	fxam
	fstsw	ax			; ax = FPU status word

					; B  C3  TOP  C2  C1  C0  low byte
					; .  0   ...  0   0   0   ........  st(0) = +unsupported
					; .  0   ...  0   1   0   ........  st(0) = -unsupported
					; .  0   ...  0   0   1   ........  st(0) = +indefinite
					; .  0   ...  0   1   1   ........  st(0) = -indefinite
					; .  0   ...  1   0   0   ........  st(0) = +finite
					; .  0   ...  1   1   0   ........  st(0) = -finite
					; .  0   ...  1   0   1   ........  st(0) = +infinity
					; .  0   ...  1   1   1   ........  st(0) = -infinity
					; .  1   ...  0   0   0   ........  st(0) = +0.0
					; .  1   ...  0   1   0   ........  st(0) = -0.0
					; .  1   ...  0   0   1   ........  st(0) = +empty
					; .  1   ...  0   1   1   ........  st(0) = -empty
					; .  1   ...  1   0   0   ........  st(0) = +denormal
					; .  1   ...  1   1   0   ........  st(0) = -denormal

	and	ah, 45h			; ah = 0:C3:0:0:0:C2:0:C0
	cmp	ah, 05h
	jne	Lindefinite		; st(0) <> INFINITY?
Linfinity:
Lzero:
	fstp	st(1)			; st(0) = |right|
					;       = hypot(left, ±INFINITY)
					;       = hypot(±0.0, right)
	ret
Lindefinite:
	faddp	st(1), st(0)		; st(0) = INDEFINITE
	ret

hypot	endp
	end

pow() Function

The function pow() returns +1 if its first argument is +1 or if its second argument is ±0, even if the other argument is a NaN, else its first argument raised to the power of its second argument.

Regular Functions

cbrt() Function

The function cbrt() returns the cube root of its argument.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

#define INFINITY   (1.0 / 0.5e-323)
#define INDEFINITE (0.0 * INFINITY)

double fabs(double x);
double frexp(double x, int *z);
double ldexp(double x, int z);

double cbrt(double argument)
{
    static const double scale[5] = {0x1.428A2F98D728Bp-1,  // 2**(-2/3)
                                    0x1.965FEA53D6E3Dp-1,  // 2**(-1/3)
                                    1.0,                   // 2**0
                                    0x1.428A2F98D728Bp-0,  // 2**(1/3)
                                    0x1.965FEA53D6E3Dp-0}; // 2**(2/3)
    double a, b, c;
    int exponent;

    if (argument != argument)
        return INDEFINITE;

    if (argument == 0.0)
        return argument;

    a = fabs(argument);

    if (a == INFINITY)
        return argument;

    a = frexp(a, &exponent);

    // for 0.5 <= a < 1.0,
    // a minimax polynomial of degree 6 yields an approximation
    // of the cube root, followed by a single Halley iteration

    b = (((((-0x1.29801E893366Dp-3 * a
             +0x1.91E2A6FE7E984p-1) * a
             -0x1.D5AE6CFA20F0Cp-0) * a
             +0x1.39350ADAD51ECp+1) * a
             -0x1.0EB8277CD8D5Dp+1) * a
             +0x1.8218DDE9028B4p-0) * a
             +0x1.6B69CBA168FF2p-2;
    c = b * b * b;
    c = b * (2.0 * a + c) / (a + 2.0 * c);
    c = argument < 0.0 ? -c : c;

    return ldexp(c * scale[2 + exponent % 3], exponent / 3);
}
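Note: the refinement step in the listing above is Halley's iteration for f(b) = b**3 - a, which roughly cubes the relative error with each step. The sketch below isolates that step; halley_cbrt_step() is a hypothetical name, and the starting value in the usage note is an arbitrary choice rather than the polynomial approximation.

```c
/* Hypothetical sketch: one Halley step towards the cube root of a,
   b' = b * (b**3 + 2 * a) / (2 * b**3 + a) */
double halley_cbrt_step(double a, double b)
{
    double c = b * b * b;

    return b * (2.0 * a + c) / (a + 2.0 * c);
}
```

Starting from b = 0.8 for a = 0.5, whose cube root is about 0.7937, the first step reduces the error from about 6e-3 to below 1e-6, and a second step brings it down to the level of rounding errors. This is why a starting value of moderate accuracy followed by a single step can suffice.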

ceil() Function

The function ceil() returns the smallest integral value not less than its argument.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double ceil(double argument)
{
#ifdef TRUNC
    double trunc(double x);

    double tmp = trunc(argument);

    return (argument > tmp) ? tmp + 1.0 : tmp;
#else
    double tmp;

    if ((argument > 0.0) && (argument < 0x1.0p+52)) {
        tmp = argument;
        argument += 0x1.0p+52;
        argument -= 0x1.0p+52;
        if (argument < tmp)
            argument += 1.0;
    } else if ((argument < 0.0) && (argument > -0x1.0p+52)) {
        tmp = argument;
        argument -= 0x1.0p+52;
        argument += 0x1.0p+52;
        if (argument < tmp)
            argument += 1.0;
        if (argument == 0.0) // keep the sign: ceil(-0.5) = -0.0
            argument = -0.0;
    } else if (argument != 0.0)
        argument += 0.0;

    return argument;
#endif
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# NOTE: requires SSE 4.1 instruction set!

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
ceil:
	roundsd	xmm0, xmm0, 2		# xmm0 = argument rounded up (towards +INFINITY)
	ret

.size	ceil, .-ceil
.type	ceil, @function
.global	ceil
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/atdhw2dx.aspx

	.686
	.model	flat, C
	.code

ceil	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = argument
if 0
; ceil(x) = x > trunc(x) ? trunc(x) + 1.0 : trunc(x)

	ftst
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jz	Lexit			; argument = ±0.0?

	fld1				; st(0) = 1.0,
					; st(1) = argument
	fld	st(1)			; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = argument
Lmodulo:
	fprem				; st(0) = argument modulo 1.0
					;       = argument',
					; st(1) = 1.0,
					; st(2) = argument
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lmodulo			; |argument'| >= 1.0?

	fxch	st(2)			; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = argument'
	fsubr	st(2), st(0)		; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = argument - argument'
					;       = trunc(argument)
	fcomp	st(2)			; st(0) = 1.0,
					; st(1) = trunc(argument)
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	ja	Labove			; argument > trunc(argument)?

	fstp	st(1)			; st(0) = trunc(argument)
					;       = ceil(argument)
	ret
Labove:
	faddp	st(1), st(0)		; st(0) = trunc(argument) + 1.0
					;       = ceil(argument)
Lexit:
else
; ceil(x) = x > rint(x) ? rint(x) + 1.0 : rint(x)

	fld	st(0)			; st(0) = argument,
					; st(1) = argument
	frndint				; st(0) = rint(argument),
					; st(1) = argument
	fxch	st(1)			; st(0) = argument,
					; st(1) = rint(argument)
	fucomip	st(0), st(1)		; eflags = argument ><=# rint(argument),
					; st(0) = rint(argument)
	fld1				; st(0) = 1.0,
					; st(1) = rint(argument)
	fldz				; st(0) = 0.0,
					; st(1) = 1.0,
					; st(2) = rint(argument)
	fcmovnbe st(0), st(1)		; st(0) = (rint(argument) < argument) ? 1.0 : 0.0,
					; st(1) = 1.0,
					; st(2) = rint(argument)
	faddp	st(2), st(0)		; st(0) = 1.0,
					; st(1) = ceil(argument)
	fstp	st(0)			; st(0) = ceil(argument)
endif
	ret

ceil	endp
	end
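Note: the C variant without TRUNC relies on the fact that doubles of magnitude 2**52 and above have no fraction bits, so adding and then subtracting 2**52 rounds a smaller positive value to an integer. The sketch below isolates this trick; rint_sketch() and ceil_sketch() are hypothetical names, and the code assumes 0.0 < x < 2**52, the default rounding mode (to nearest, ties to even) and no excess intermediate precision.

```c
/* Hypothetical sketch: round to the nearest integer, ties to even */
double rint_sketch(double x)
{
    /* volatile keeps the compiler from simplifying (x + c) - c to x */
    volatile double t = x + 0x1.0p+52;

    return t - 0x1.0p+52;
}

/* Hypothetical sketch: correct the rounded value when it fell below x */
double ceil_sketch(double x)
{
    double r = rint_sketch(x);

    return (r < x) ? r + 1.0 : r;
}
```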

fabs() Function

The function fabs() returns the absolute value alias magnitude of its argument.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double fabs(double argument)
{
    *(unsigned long long *) &argument <<= 1;
    *(unsigned long long *) &argument >>= 1;

    return argument;
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
fabs:
.if 0
	xorpd	xmm1, xmm1		# xmm1 = 0.0
	subsd	xmm1, xmm0		# xmm1 = -argument
	maxsd	xmm0, xmm1		# xmm0 = |argument|
	ret
.else
	movq	rax, xmm0		# rax = argument
	btr	rax, 63			# rax = |argument|
	movq	xmm0, rax		# xmm0 = |argument|
	ret
.endif
.size	fabs, .-fabs
.type	fabs, @function
.global	fabs
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/18z15bk0.aspx

	.686
	.model	flat; C
	.code

_fabs	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = argument
	fabs				; st(0) = |argument|
	ret

_fabs	endp
	end

fdim() Function

The function fdim() returns the positive difference of its arguments.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double fdim(double left, double right)
{
    return left <= right ? 0.0 : left - right;
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = left
					# xmm1 = right
fdim:
	movsd	xmm2, xmm0		# xmm2 = left
	cmpsd	xmm0, xmm1, 2		# xmm0 = (left <= right) ? ~0L : 0L
	subsd	xmm2, xmm1		# xmm2 = left - right
	andnpd	xmm0, xmm2		# xmm0 = (left <= right) ? 0.0 : left - right
	ret

.size	fdim, .-fdim
.type	fdim, @function
.global	fdim
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/mt720714.aspx

	.686
	.model	flat, C
	.code

fdim	proc	public			; [esp+12] = right
					; [esp+4] = left

	fld	real8 ptr [esp+4]	; st(0) = left
	fld	real8 ptr [esp+12]	; st(0) = right,
					; st(1) = left
	fsubp	st(1), st(0)		; st(0) = left - right
	fldz				; st(0) = 0.0,
					; st(1) = left - right
	fucomi	st(0), st(1)		; eflags = 0.0 ><=# left - right
	fcmovb	st(0), st(1)		; st(0) = (left > right) ? left - right : 0.0,
					; st(1) = left - right
	fcmovu	st(0), st(1)		; st(0) = (left # right) ? left - right
					;       : (left > right) ? left - right : 0.0,
					; st(1) = left - right
	fstp	st(1)			; st(0) = fdim(left, right)
	ret

fdim	endp
	end

floor() Function

The function floor() returns the largest integral value not greater than its argument.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double floor(double argument)
{
#ifdef TRUNC
    double trunc(double x);

    double tmp = trunc(argument);

    return (argument < tmp) ? tmp - 1.0 : tmp;
#else
    double tmp;

    if ((argument > 0.0) && (argument < 0x1.0p+52)) {
        tmp = argument;
        argument += 0x1.0p+52;
        argument -= 0x1.0p+52;
        if (argument > tmp)
            argument -= 1.0;
    } else if ((argument < 0.0) && (argument > -0x1.0p+52)) {
        tmp = argument;
        argument -= 0x1.0p+52;
        argument += 0x1.0p+52;
        if (argument > tmp)
            argument -= 1.0;
    } else if (argument != 0.0)
        argument += 0.0;

    return argument;
#endif
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# NOTE: requires SSE 4.1 instruction set!

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
floor:
	roundsd	xmm0, xmm0, 1		# xmm0 = argument rounded down (towards -INFINITY)
	ret

.size	floor, .-floor
.type	floor, @function
.global	floor
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/x39715t6.aspx

	.686
	.model	flat, C
	.code

floor	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = argument
if 0
; floor(x) = x < trunc(x) ? trunc(x) - 1.0 : trunc(x)

	ftst
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jz	Lexit			; argument = ±0.0?

	fld1				; st(0) = 1.0,
					; st(1) = argument
	fld	st(1)			; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = argument
Lmodulo:
	fprem				; st(0) = argument modulo 1.0
					;       = argument',
					; st(1) = 1.0,
					; st(2) = argument
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lmodulo			; |argument'| >= 1.0?

	fxch	st(2)			; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = argument'
	fsubr	st(2), st(0)		; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = argument - argument'
					;       = trunc(argument)
	fcomp	st(2)			; st(0) = 1.0,
					; st(1) = trunc(argument)
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jb	Lbelow			; argument < trunc(argument)?

	fstp	st(1)			; st(0) = trunc(argument)
					;       = floor(argument)
	ret
Lbelow:
	fsubp	st(1), st(0)		; st(0) = trunc(argument) - 1.0
					;       = floor(argument)
Lexit:
else
; floor(x) = x < rint(x) ? rint(x) - 1.0 : rint(x)

	fld	st(0)			; st(0) = argument,
					; st(1) = argument
	frndint				; st(0) = rint(argument),
					; st(1) = argument
	fxch	st(1)			; st(0) = argument,
					; st(1) = rint(argument)
	fucomip	st(0), st(1)		; eflags = argument ><=# rint(argument),
					; st(0) = rint(argument)
	fld1				; st(0) = 1.0,
					; st(1) = rint(argument)
	fldz				; st(0) = 0.0,
					; st(1) = 1.0,
					; st(2) = rint(argument)
	fcmovb	st(0), st(1)		; st(0) = (rint(argument) > argument) ? 1.0 : 0.0,
					; st(1) = 1.0,
					; st(2) = rint(argument)
	fsubp	st(2), st(0)		; st(0) = 1.0,
					; st(1) = floor(argument)
	fstp	st(0)			; st(0) = floor(argument)
endif
	ret

floor	endp
	end

fma() Function

The function fma() returns the product of its first and second arguments plus its third argument, calculated in full precision and rounded only once, i.e. without intermediate rounding of the product.

Note: this means, for example, that fma(2.0, nextafter(INFINITY, 0.0), -nextafter(INFINITY, 0.0)) returns nextafter(INFINITY, 0.0), and fma(0.5, nextafter(0.0, INFINITY), nextafter(0.0, INFINITY)) returns 2.0 * nextafter(0.0, INFINITY), although the intermediate product overflows in the first case and underflows in the second!

// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double fabs(double x);
double frexp(double x, int *z);
double ldexp(double x, int z);

static inline // Veltkamp
void _2split(double *h, double *l, double x)
{
#if 0
    int e;
    double f = frexp(x, &e);
    double g = f * 0x1.0000002000000p+27;
#if 1
    g -= g - f;
#else
    g += f - g;
#endif
    *l = ldexp(f - g, e);
    *h = ldexp(g, e);
#else
    unsigned long long ull = *(unsigned long long *) &x & (~0ULL << 26);

    *h = *(double *) &ull;
    *l = x - *h;
#endif
}

static inline // Dekker
void _2product(double *h, double *l, double x, double y)
{
    double xl, xh, yl, yh, zl, zh = x * y;

    _2split(&xh, &xl, x);
    _2split(&yh, &yl, y);

    zl = xl * yl + (xl * yh + (xh * yl + (xh * yh - zh)));
#ifdef DEKKER
    *h = zl + zh;
#if 0
    *l = zl - (*h - zh);
#else
    *l = zl + (zh - *h);
#endif
#else
    *l = zl;
    *h = zh;
#endif
}

#if 0
static inline // Møller, Knuth
void _2sum(double *h, double *l, double x, double y)
{
    double s = x + y;
    double t = s - x;
#if 0
    *l = (x - (s - t)) + (y - t);
#elif 0
    *l = (x - (s - t)) - (t - y);
#elif 0
    *l = (x + (t - s)) - (t - y);
#else
    *l = (x + (t - s)) + (y - t);
#endif
    *h = s;
}
#else
static inline // Boldo, Melquiond: |u| >= |v| >= |w|
double _3sum(double u, double v, double w)
{
    double h = w + v;
    double l = w + (v - h);

    // round high part of intermediate sum to odd when
    // its fraction is even and also inexact, i.e. low
    // part of intermediate sum is not equal to zero

    if ((l != 0.0)
     && ((*(unsigned long long *) &h & 1ull) == 0ull))
        *(unsigned long long *) &h |= 1ull;

    return u + h;
}
#endif

double fma(double multiplicand, double multiplier, double addend)
{
    int o;
    double ph, pl, qh, ql, rh, rl, sh, sl;
    double product = multiplicand * multiplier;

    if ((multiplicand - multiplicand != 0.0)
     || (multiplier - multiplier != 0.0)
     || (addend - addend != 0.0)) // at least one argument INFINITE?
        return product + addend;

    if (addend == 0.0) // when product underflows to ±0.0,
                       // its sign determines the sign of the result
        return (product == 0.0)
            && (multiplier != 0.0)
            && (multiplicand != 0.0) ? product : product + addend;

    if ((multiplicand == 0.0) || (multiplier == 0.0))
        return addend;

    o = product - product != 0.0;
    if (o) { // product overflows?
        if ((product < 0.0) == (addend < 0.0))
            return product;

        multiplier *= 0.5;
        addend *= 0.5;
#if 0
        product = 2.0 * (multiplicand * multiplier + addend);
        if (product - product != 0.0)
            return product;
#endif
    }

    _2product(&ph, &pl, multiplicand, multiplier);
#if 0
    _2sum(&qh, &ql, ph, addend);
    _2sum(&rh, &rl, pl, qh);
#if 0
    _2sum(&sh, &sl, ql, rl);
#else
    sh = rl + ql;
#endif
    sh += rh;
#else
    if (fabs(addend) < fabs(pl))
        sh = _3sum(ph, pl, addend);
    else if (fabs(addend) < fabs(ph))
        sh = _3sum(ph, addend, pl);
    else
        sh = _3sum(addend, ph, pl);
#endif
    return o ? sh + sh : sh;
}
Note: the function _2product() implements Dekker’s product, an error-free (exact) transformation that exposes, in the absence of overflows, the properties h + l = x × y and |h| ≥ |l| × 2⁵³.
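Note: Dekker's product can be tried out on its own with the sketch below. It uses the classic Veltkamp split with the multiplier 2**27 + 1 rather than the bit mask of the listing above, and the names split_sketch() and two_product_sketch() are hypothetical; the default rounding mode and the absence of overflow and underflow are assumed.

```c
/* Hypothetical sketch: Veltkamp split of x into h + l, each half short
   enough that products of two halves are exact */
void split_sketch(double x, double *h, double *l)
{
    double c = 134217729.0 * x;         /* (2**27 + 1) * x */

    *h = c - (c - x);
    *l = x - *h;
}

/* Hypothetical sketch: Dekker's product, h + l = x * y exactly */
void two_product_sketch(double x, double y, double *h, double *l)
{
    double xh, xl, yh, yl;
    double p = x * y;                   /* rounded product */

    split_sketch(x, &xh, &xl);
    split_sketch(y, &yh, &yl);

    /* all four partial products are exact; adding them to -p in this
       order recovers the rounding error of p */
    *l = xl * yl + (xl * yh + (xh * yl + (xh * yh - p)));
    *h = p;
}
```

For x = y = 1 + 2**-30 this yields h = 1 + 2**-29 and l = 2**-60, the part of the exact product that the rounded product loses.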

Note: the function _2sum() implements Møller’s and Knuth’s sum, an error-free (exact) transformation that exposes, in the absence of overflows, the properties h + l = x + y, |l| ≤ |x| and |h| ≥ |l| × 2⁵³.
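Note: the sum can likewise be tried out in isolation. The sketch below mirrors the first of the equivalent variants in the listing above under the hypothetical name two_sum_sketch(); it assumes the default rounding mode and a compiler that does not reassociate floating-point expressions.

```c
/* Hypothetical sketch: branch-free sum, h + l = x + y exactly,
   regardless of which argument has the larger magnitude */
void two_sum_sketch(double x, double y, double *h, double *l)
{
    double s = x + y;                   /* rounded sum */
    double t = s - x;                   /* the part of y that reached s */

    *l = (x - (s - t)) + (y - t);       /* what was lost from x and from y */
    *h = s;
}
```

For x = 1 and y = 2**-60 the rounded sum is 1, and the low part returns the 2**-60 that the addition dropped; swapping the arguments gives the same result.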

# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# CAVEAT: requires default (round to nearest, ties to even) rounding mode!

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = multiplicand
					# xmm1 = multiplier
					# xmm2 = addend
fma:
	movsd	xmm3, xmm0		# xmm3 = multiplicand
	movsd	xmm4, xmm1		# xmm4 = multiplier
	movsd	xmm5, xmm2		# xmm5 = addend
	subsd	xmm3, xmm0		# xmm3 = multiplicand - multiplicand
	subsd	xmm4, xmm1		# xmm4 = multiplier - multiplier
	subsd	xmm5, xmm2		# xmm5 = addend - addend
	ucomisd	xmm3, xmm0
	movsd	xmm3, xmm0		# xmm3 = multiplicand
	mulsd	xmm0, xmm1		# xmm0 = multiplicand * multiplier
					#      = product
	je	.Lmultiplicand		# multiplicand = ±0.0?
					# multiplicand = ±INFINITY?
					# multiplicand = INDEFINITE?
	ucomisd	xmm4, xmm1
	je	.Lmultiplier		# multiplier = ±0.0?
					# multiplier = ±INFINITY?
					# multiplier = INDEFINITE?
	ucomisd	xmm5, xmm2
	je	.Laddend		# addend = ±0.0?
					# addend = ±INFINITY?
					# addend = INDEFINITE?
	movsd	xmm4, xmm0
	subsd	xmm4, xmm0		# xmm4 = product - product
	ucomisd	xmm4, xmm0
	jp	.Loverflow		# product = ±INFINITY?
.Lveltkamp:
	mov	eax, 0x03FFFFFF		# rax = 2**26 - 1
	movq	xmm4, rax
	movq	xmm5, rax
	andnpd	xmm4, xmm3		# xmm4 = upper half of multiplicand
	andnpd	xmm5, xmm1		# xmm5 = upper half of multiplier
	subsd	xmm3, xmm4		# xmm3 = lower half of multiplicand
	subsd	xmm1, xmm5		# xmm1 = lower half of multiplier
.Ldekker:
	unpcklpd xmm4, xmm3		# xmm4[63:0] = upper half of multiplicand,
					# xmm4[127:64] = lower half of multiplicand
	unpcklpd xmm5, xmm1		# xmm5[63:0] = upper half of multiplier,
					# xmm5[127:64] = lower half of multiplier
	unpcklpd xmm3, xmm4		# xmm3[63:0] = lower half of multiplicand,
					# xmm3[127:64] = upper half of multiplicand
	mulpd	xmm4, xmm5		# xmm4[63:0] = upper half of multiplicand
					#            * upper half of multiplier,
					# xmm4[127:64] = lower half of multiplicand
					#              * lower half of multiplier
	mulpd	xmm3, xmm5		# xmm3[63:0] = lower half of multiplicand
					#            * upper half of multiplier,
					# xmm3[127:64] = upper half of multiplicand
					#              * lower half of multiplier
.Ltail:
	movsd	xmm1, xmm4
	subsd	xmm1, xmm0
	addsd	xmm1, xmm3
	unpckhpd xmm3, xmm3
	addsd	xmm1, xmm3
	unpckhpd xmm4, xmm4
	addsd	xmm1, xmm4		# xmm1 = upper half of multiplicand
					#      * upper half of multiplier
					#      - multiplicand * multiplier
					#      + lower half of multiplicand
					#      * upper half of multiplier
					#      + upper half of multiplicand
					#      * lower half of multiplier
					#      + lower half of multiplicand
					#      * lower half of multiplier
					#      = tail part of (intermediate) product
					# xmm0 = head part of (intermediate) product
.Lmøller:
	movsd	xmm3, xmm0
	addsd	xmm0, xmm2
	movsd	xmm4, xmm0		# xmm4 = head part of first intermediate sum
	subsd	xmm0, xmm3
	subsd	xmm2, xmm0
	subsd	xmm0, xmm4
	addsd	xmm0, xmm3
	addsd	xmm0, xmm2		# xmm0 = tail part of first intermediate sum
.Lknuth:
	movsd	xmm3, xmm4
	addsd	xmm4, xmm1
	movsd	xmm2, xmm4		# xmm2 = head part of second intermediate sum
	subsd	xmm4, xmm3
	subsd	xmm1, xmm4
	subsd	xmm4, xmm2
	addsd	xmm4, xmm3
	addsd	xmm4, xmm1		# xmm4 = tail part of second intermediate sum

	addsd	xmm0, xmm4		# xmm0 = tail part of first intermediate sum
					#      + tail part of second intermediate sum
					#      = head part of third intermediate sum
.Lfinal:
	addsd	xmm0, xmm2		# xmm0 = product + addend
					#      = fma(multiplicand, multiplier, addend)
	ret
.Lmultiplicand:
	jp	.Lfinal			# multiplicand = INDEFINITE?
					# multiplicand = ±INFINITY?

					# multiplicand = ±0.0!
	ucomisd	xmm4, xmm1
.Lmultiplier:
	jp	.Lfinal			# multiplier = INDEFINITE?
					# multiplier = ±INFINITY?

					# multiplier = ±0.0,
					# multiplicand <> ±INFINITY,
					# multiplicand <> INDEFINITE!
.Lindefinite:
	movsd	xmm0, xmm2		# xmm0 = addend
	ret
.Laddend:
	jp	.Lindefinite		# addend = INDEFINITE?
					# addend = ±INFINITY?

					# addend = ±0.0,
					# multiplier <> ±0.0,
					# multiplier <> ±INFINITY,
					# multiplier <> INDEFINITE,
					# multiplicand <> ±0.0,
					# multiplicand <> ±INFINITY,
					# multiplicand <> INDEFINITE!
	ucomisd	xmm0, xmm2
	je	.Lunderflow		# product = ±0.0?
.Lproduct:
	movsd	xmm4, xmm0
	subsd	xmm4, xmm0		# xmm4 = product - product
	ucomisd	xmm4, xmm0
	jnp	.Lfinal			# product <> ±INFINITY?
.Loverflow:
	movq	rcx, xmm0		# rcx = product
	movq	rdx, xmm2		# rdx = addend
	xor	rdx, rcx		# rdx = (addend < 0.0) = (product < 0.0) ? positive : negative
	jns	.Linfinity		# (addend < 0.0) = (product < 0.0)?
					# (sign of addend = sign of product?)
	mov	rax, 0x3FE0000000000000
	movq	xmm5, rax		# xmm5 = 0x1.0p-1
					#      = 0.5
	mulsd	xmm1, xmm5		# xmm1 = multiplier * 0.5
					#      = multiplier'
	mulsd	xmm2, xmm5		# xmm2 = addend * 0.5
					#      = addend'
.if 1
	movsd	xmm0, xmm3		# xmm0 = multiplicand
	mulsd	xmm0, xmm1		# xmm0 = multiplicand * multiplier'
					#      = product'
.else
	movsd	xmm4, xmm1
	movsd	xmm5, xmm2
	mulsd	xmm4, xmm3		# xmm4 = multiplier' * multiplicand
					#      = product'
	addsd	xmm5, xmm4		# xmm5 = product' + addend'
	addsd	xmm5, xmm5		# xmm5 = (product' + addend') * 2.0
	subsd	xmm5, xmm5		# xmm5 = (product' + addend') * 2.0
					#      - (product' + addend') * 2.0
	ucomisd	xmm5, xmm5
	jp	.Linfinity		# (product' + addend') * 2.0 = ±INFINITY?

	movsd	xmm0, xmm4		# xmm0 = product'
.endif
	call	.Lveltkamp
	addsd	xmm0, xmm0		# xmm0 = (product' + addend') * 2.0
					#      = fma(multiplicand, multiplier, addend)
.Linfinity:
.Lunderflow:
	ret

.size	fma, .-fma
.type	fma, @function
.global	fma
.end
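
The labels .Lveltkamp, .Ldekker, .Lmøller and .Lknuth in the listing above refer to the classic error-free transformations. The following C sketch shows the textbook forms of these building blocks for illustration only. It is not a transcription of the assembly (which splits its operands by masking the low 26 bits instead of multiplying by 2**27 + 1), the names pair, split(), two_product() and two_sum() are made up for this example, and it assumes round-to-nearest arithmetic without excess precision.

```c
/* head + tail represents a value with about twice the precision of double */
typedef struct { double head, tail; } pair;

/* Veltkamp splitting: a = head + tail, each half has at most 26 significant bits */
pair split(double a)
{
    pair r;
    double c = 134217729.0 * a;         /* 134217729 = 2**27 + 1 */

    r.head = c - (c - a);
    r.tail = a - r.head;
    return r;
}

/* Dekker product: head = fl(a * b), tail = a * b - head (barring overflow/underflow) */
pair two_product(double a, double b)
{
    pair x = split(a), y = split(b), r;

    r.head = a * b;
    r.tail = ((x.head * y.head - r.head)
           + x.head * y.tail + x.tail * y.head)
           + x.tail * y.tail;
    return r;
}

/* Møller-Knuth sum: head = fl(a + b), tail = a + b - head, no ordering of a and b required */
pair two_sum(double a, double b)
{
    pair r;
    double bb;

    r.head = a + b;
    bb = r.head - a;
    r.tail = (a - (r.head - bb)) + (b - bb);
    return r;
}
```

For a = b = 1 + 2**-30 the exact square is 1 + 2**-29 + 2**-60; two_product() returns 1 + 2**-29 as head and the otherwise lost 2**-60 as tail.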

fmod() Function

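The function fmod() returns the remainder of the division of its first argument by its second argument, with the quotient truncated towards zero; the result has the sign of the dividend.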
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double fma(double x, double y, double z);
double trunc(double x);

double fmod(double dividend, double divisor)
{
#ifdef TRUNC
    double quotient = trunc(dividend / divisor);
#else
    double tmp, quotient = dividend / divisor;

    if ((quotient > 0.0) && (quotient < 0x1.0p+52)) {
        tmp = quotient;
        quotient += 0x1.0p+52;
        quotient -= 0x1.0p+52;
        if (quotient > tmp)
            quotient -= 1.0;
    } else if ((quotient < 0.0) && (quotient > -0x1.0p+52)) {
        tmp = quotient;
        quotient -= 0x1.0p+52;
        quotient += 0x1.0p+52;
        if (quotient < tmp)
            quotient += 1.0;
    }
#endif
#if 0 // avoid subtractive cancellation
    return quotient == 0.0 ? dividend : dividend - divisor * quotient;
#else
    return fma(-quotient, divisor, dividend);
#endif
}
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/20dckbeh.aspx

; fmod(dividend, divisor) = dividend % divisor
;                         = dividend - divisor * trunc(dividend / divisor)

	.686
	.model	flat, C
	.code

fmod	proc	public			; [esp+12] = divisor
					; [esp+4] = dividend

	fld	real8 ptr [esp+12]	; st(0) = divisor
	fld	real8 ptr [esp+4]	; st(0) = dividend,
					; st(1) = divisor
Lreduce:
	fprem				; st(0) = remainder,
					; st(1) = divisor
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lreduce

	fstp	st(1)			; st(0) = remainder
	ret

fmod	endp
	end

fpclassify() Function

The function fpclassify() returns the implementation-defined category of its argument.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

#define FP_ZERO      0
#define FP_SUBNORMAL 1
#define FP_NORMAL    2
#define FP_INFINITE  3
#define FP_NAN       4

#define INFINITY 0x1.0p+1024
#define MINIMUM  0x1.0p-1022

double fabs(double x);

int fpclassify(double argument)
{
#if 1
    unsigned long long ull = *(unsigned long long *) &argument << 1;

    if (ull == 0)
        return FP_ZERO;

    if (ull < (1ULL << 53))
        return FP_SUBNORMAL;

    if (ull < (2047ULL << 53))
        return FP_NORMAL;

    if (ull == (2047ULL << 53))
        return FP_INFINITE;
#else
    if (argument == 0.0)
        return FP_ZERO;

    argument = fabs(argument);

    if (argument < MINIMUM)
        return FP_SUBNORMAL;

    if (argument < INFINITY)
        return FP_NORMAL;

    if (argument == INFINITY)
        return FP_INFINITE;
#endif
    return FP_NAN;
}
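
The first branch of the listing above classifies the argument purely by its bit pattern: once the sign bit is shifted out, the five categories form contiguous integer ranges. A minimal sketch of that idea follows. It uses memcpy() instead of a pointer cast to sidestep strict-aliasing concerns, the names classify_bits(), from_bits() and the CLASS_* constants are invented for this illustration, and it assumes that double is the IEEE 754 binary64 format.

```c
#include <string.h>

enum { CLASS_ZERO, CLASS_SUBNORMAL, CLASS_NORMAL, CLASS_INFINITE, CLASS_NAN };

/* build a double from its raw 64-bit pattern (helper for experiments) */
double from_bits(unsigned long long bits)
{
    double value;

    memcpy(&value, &bits, sizeof value);
    return value;
}

int classify_bits(double argument)
{
    unsigned long long bits;

    memcpy(&bits, &argument, sizeof bits);
    bits <<= 1;                         /* drop the sign bit */

    if (bits == 0)
        return CLASS_ZERO;
    if (bits < (1ULL << 53))            /* biased exponent = 0 */
        return CLASS_SUBNORMAL;
    if (bits < (2047ULL << 53))         /* biased exponent < 2047 */
        return CLASS_NORMAL;
    if (bits == (2047ULL << 53))        /* biased exponent = 2047, mantissa = 0 */
        return CLASS_INFINITE;
    return CLASS_NAN;                   /* biased exponent = 2047, mantissa <> 0 */
}
```

For example, classify_bits(from_bits(1ULL)) yields CLASS_SUBNORMAL, since the pattern 1 is the smallest positive subnormal number.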

frexp() Function

The function frexp() returns the normalized fraction and the (integral) exponent of its first argument.
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/w1xfschh.aspx

	.686
	.model	flat, C
	.code

single	record	sign:1, exponent:8, mantissa:23

bias	equ	1 shl (width exponent - 1) - 1

frexp	proc	public			; [esp+12] = address of exponent
					; [esp+4] = argument
if 0
	fld1				; st(0) = 1.0
	fchs				; st(0) = -1.0
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = -1.0
	fxtract				; st(0) = argument / 2.0**exponent
					;       = mantissa,
					; st(1) = exponent,
					; st(2) = -1.0
	fxch	st(1)			; st(0) = exponent,
					; st(1) = mantissa,
					; st(2) = -1.0
	fsub	st(0), st(2)		; st(0) = exponent + 1.0,
					; st(1) = mantissa,
					; st(2) = -1.0
	mov	eax, [esp+12]		; eax = address of exponent
	fistp	dword ptr [eax]		; [eax] = exponent + 1.0,
					; st(0) = mantissa,
					; st(1) = -1.0
	fscale				; st(0) = mantissa / 2.0,
					; st(1) = -1.0
	fstp	st(1)			; st(0) = mantissa / 2.0
else
	fld	real8 ptr [esp+4]	; st(0) = argument
	fxtract				; st(0) = argument / 2.0**exponent
					;       = mantissa,
					; st(1) = exponent
	fxch	st(1)			; st(0) = exponent,
					; st(1) = mantissa
	mov	eax, [esp+12]		; eax = address of exponent
	fistp	dword ptr [eax]		; [eax] = exponent,
					; st(0) = mantissa
	inc	dword ptr [eax]		; [eax] = exponent + 1
	push	(bias - 1) shl width mantissa
					; [esp] = 0x3F000000
					;       = 0.5F
	fmul	real4 ptr [esp]		; st(0) = mantissa / 2.0
	pop	eax
endif
	ret

frexp	endp
	end

isfinite() Function

The function isfinite() returns non-zero if its argument is a finite floating-point number.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

int isfinite(double argument)
{
    return (*(unsigned long long *) &argument << 1) < (2047ULL << 53);
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
isfinite:
	movq	rdx, xmm0		# rdx = argument
	add	rdx, rdx		# rdx = argument << 1
					#     = |argument| << 1
	mov	rax, 0xFFE0000000000000 # rax = 0x1.0p+1024 << 1
	cmp	rdx, rax
	setb	al			# eax = (|argument| < 0x1.0p+1024) ? 1 : 0
	mov	eax, eax		# rax = (|argument| < 0x1.0p+1024) ? 1 : 0
	ret

.size	isfinite, .-isfinite
.type	isfinite, @function
.global	isfinite
.end

isinf() Function

The function isinf() returns non-zero if its argument is +∞ or −∞.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

int isinf(double argument)
{
    return (*(unsigned long long *) &argument << 1) == (2047ULL << 53);
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
isinf:
	movq	rdx, xmm0		# rdx = argument
	add	rdx, rdx		# rdx = argument << 1
					#     = |argument| << 1
	mov	rax, 0xFFE0000000000000 # rax = 0x1.0p+1024 << 1
	cmp	rdx, rax
	sete	al			# eax = (|argument| = 0x1.0p+1024) ? 1 : 0
	mov	eax, eax		# rax = (|argument| = 0x1.0p+1024) ? 1 : 0
	ret

.size	isinf, .-isinf
.type	isinf, @function
.global	isinf
.end

isnan() Function

The function isnan() returns non-zero if its argument is a NaN.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

int isnan(double argument)
{
    return (*(unsigned long long *) &argument << 1) > (2047ULL << 53);
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
isnan:
	movq	rdx, xmm0		# rdx = argument
	add	rdx, rdx		# rdx = argument << 1
					#     = |argument| << 1
	mov	rax, 0xFFE0000000000000 # rax = 0x1.0p+1024 << 1
	cmp	rdx, rax
	seta	al			# eax = (|argument| > 0x1.0p+1024) ? 1 : 0
	mov	eax, eax		# rax = (|argument| > 0x1.0p+1024) ? 1 : 0
	ret

.size	isnan, .-isnan
.type	isnan, @function
.global	isnan
.end

isnormal() Function

The function isnormal() returns non-zero if its argument is a normal floating-point number, i.e. neither zero, subnormal, infinite nor NaN.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

int isnormal(double argument)
{
    return ((*(unsigned long long *) &argument << 1) < (2047ULL << 53))
        && ((*(unsigned long long *) &argument << 1) >= (1ULL << 53));
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
isnormal:
	movq	rdx, xmm0		# rdx = argument
	add	rdx, rdx		# rdx = argument << 1
					#     = |argument| << 1
	mov	rax, 0x0020000000000000 # rax = 0x1.0p-1022 << 1
	cmp	rdx, rax
	setae	cl			# cl = (|argument| >= 0x1.0p-1022) ? 1 : 0
	mov	rax, 0xFFE0000000000000 # rax = 0x1.0p+1024 << 1
	cmp	rdx, rax
	setb	al			# eax = (|argument| < 0x1.0p+1024) ? 1 : 0
	and	eax, ecx		# rax = (0x1.0p-1022 <= |argument| < 0x1.0p+1024) ? 1 : 0
	ret

.size	isnormal, .-isnormal
.type	isnormal, @function
.global	isnormal
.end

issubnormal() Function

The function issubnormal() returns non-zero if its argument is a (non-zero) subnormal floating-point number.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

int issubnormal(double argument)
{
    return ((*(unsigned long long *) &argument << 1) < (1ULL << 53))
        && ((*(unsigned long long *) &argument << 1) != 0);
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
issubnormal:
	movq	rdx, xmm0		# rdx = argument
	add	rdx, rdx		# rdx = argument << 1
					#     = |argument| << 1
	mov	rax, 0x0020000000000000 # rax = 0x1.0p-1022 << 1
	setne	cl			# cl = (|argument| <> 0.0) ? 1 : 0
	cmp	rdx, rax
	setb	al			# eax = (|argument| < 0x1.0p-1022) ? 1 : 0
	and	eax, ecx		# rax = (0.0 < |argument| < 0x1.0p-1022) ? 1 : 0
	ret

.size	issubnormal, .-issubnormal
.type	issubnormal, @function
.global	issubnormal
.end

ldexp() Function

The function ldexp() returns its first argument multiplied by 2 raised to the power of its (integral) second argument.
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/zx52ds7f.aspx
; https://msdn.microsoft.com/en-us/library/dn465179.aspx

; ldexp(x, n) = x * 2**n
; scalbn(x, n) = x * 2**n

	.686
	.model	flat, C
	.code

ldexp	proc	public			; [esp+12] = exponent
scalbn	proc	public			; [esp+4] = argument

	fild	dword ptr [esp+12]	; st(0) = exponent
	fld	real8 ptr [esp+4]	; st(0) = argument,
					; st(1) = exponent
	fscale				; st(0) = argument * 2.0**exponent,
					; st(1) = exponent
	fstp	st(1)			; st(0) = argument * 2.0**exponent
	ret

scalbn	endp
ldexp	endp
	end
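
For exponents that keep 2**n within the normal range, the same scaling can be sketched without the FPU by building the power of two directly from its bit pattern. This is only an illustration under that restriction: it assumes IEEE 754 binary64 and -1022 <= n <= 1023, and scale_sketch() is a hypothetical name, not part of the listings above.

```c
#include <string.h>

/* x * 2**n for -1022 <= n <= 1023, i.e. while 2**n is a normal number */
double scale_sketch(double x, int n)
{
    /* biased exponent n + 1023 in bits 62..52, sign and mantissa zero */
    unsigned long long bits = (unsigned long long) (n + 1023) << 52;
    double power;

    memcpy(&power, &bits, sizeof power);    /* power = 2**n */
    return x * power;                       /* a single, correctly rounded multiplication */
}
```

Larger or smaller exponents would need several multiplications or explicit handling of subnormal results, which fscale performs in hardware.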

remainder() Function

The function remainder() returns the remainder from the division of its arguments, with the quotient rounded according to the current mode.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double fma(double x, double y, double z);
double rint(double x);

double remainder(double dividend, double divisor)
{
#ifdef RINT
    double quotient = rint(dividend / divisor);
#else
    double quotient = dividend / divisor;

    if ((quotient > 0.0) && (quotient < 0x1.0p+52)) {
        quotient += 0x1.0p+52;
        quotient -= 0x1.0p+52;
    } else if ((quotient < 0.0) && (quotient > -0x1.0p+52)) {
        quotient -= 0x1.0p+52;
        quotient += 0x1.0p+52;
    }
#endif
#if 0 // avoid subtractive cancellation
    return quotient == 0.0 ? dividend : dividend - divisor * quotient;
#else
    return fma(-quotient, divisor, dividend);
#endif
}
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/dn465170.aspx

	.686
	.model	flat, C
	.code

remainder proc	public			; [esp+12] = divisor
					; [esp+4] = dividend

	fld	real8 ptr [esp+12]	; st(0) = divisor
	fld	real8 ptr [esp+4]	; st(0) = dividend,
					; st(1) = divisor
Lreduce:
	fprem1				; st(0) = remainder,
					; st(1) = divisor
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lreduce

	fstp	st(1)			; st(0) = remainder
	ret

remainder endp
	end
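
fmod() and remainder() differ only in how the quotient is rounded before it is multiplied back: towards zero for fmod(), to the nearest integer for remainder(). The following sketch makes the difference visible for small operands. The names fmod_sketch(), remainder_sketch() and nearest_even() are illustrative, the cast to long long limits the truncating variant to quotients that fit that type, the plain x - q * y is only exact for small, well-behaved values, and round-to-nearest mode is assumed.

```c
/* round q with |q| < 2**52 to the nearest integer, ties to even */
double nearest_even(double q)
{
    volatile double t = q;      /* volatile: force rounding to double precision */

    if (t >= 0.0) {
        t += 0x1.0p+52;
        t -= 0x1.0p+52;
    } else {
        t -= 0x1.0p+52;
        t += 0x1.0p+52;
    }
    return t;
}

double fmod_sketch(double x, double y)
{
    double q = (double) (long long) (x / y);    /* quotient truncated towards zero */

    return x - q * y;
}

double remainder_sketch(double x, double y)
{
    double q = nearest_even(x / y);             /* quotient rounded to nearest */

    return x - q * y;
}
```

With x = 5.5 and y = 2.0 the truncated quotient is 2 and the nearest quotient is 3, so the two sketches return 1.5 and -0.5 respectively.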

remquo() Function

The function remquo() returns the remainder and the (partial) integral quotient from the division of its arguments.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double fma(double x, double y, double z);
double trunc(double x);

double remquo(double dividend, double divisor, int *quotient)
{
#ifdef TRUNC
    double ratio = trunc(dividend / divisor);
#else
    double tmp, ratio = dividend / divisor;

    if ((ratio > 0.0) && (ratio < 0x1.0p+52)) {
        tmp = ratio;
        ratio += 0x1.0p+52;
        ratio -= 0x1.0p+52;
        if (ratio > tmp)
            ratio -= 1.0;
    } else if ((ratio < 0.0) && (ratio > -0x1.0p+52)) {
        tmp = ratio;
        ratio -= 0x1.0p+52;
        ratio += 0x1.0p+52;
        if (ratio < tmp)
            ratio += 1.0;
    }
#endif
    *quotient = (int) ratio;
#if 0 // avoid subtractive cancellation
    return ratio == 0.0 ? dividend : dividend - divisor * ratio;
#else
    return fma(-ratio, divisor, dividend);
#endif
}
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/dn465175.aspx

	.686
	.model	flat, C
	.code

remquo	proc	public			; [esp+20] = address of (partial) quotient
					; [esp+12] = divisor
					; [esp+4] = dividend

	fld	real8 ptr [esp+12]	; st(0) = divisor
	fld	real8 ptr [esp+4]	; st(0) = dividend,
					; st(1) = divisor
	mov	ecx, [esp+20]		; ecx = address of quotient
	mov	eax, [esp+16]		; eax = high dword of divisor
	xor	eax, [esp+8]		; eax = high dword of divisor
					;     ^ high dword of dividend
	cdq				; edx = (sign of dividend <> sign of divisor) ? -1 : 0
Lreduce:
	fprem1				; st(0) = dividend modulo divisor,
					; st(1) = divisor,
					; C0:C3:C1 = least significant bits of quotient
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lreduce

	fstp	st(1)			; st(0) = dividend modulo divisor
					;       = remainder
Lquotient:
	and	eax, 4300h		; eax = 0b0:C3:0000:C1:C0:00000000
	imul	eax, 910000h
	shr	eax, 29			; eax = C0:C3:C1
					;     = (partial) quotient
Lsign:
	xor	eax, edx
	sub	eax, edx		; eax = (sign of dividend <> sign of divisor)
					;     ? -quotient : quotient
	mov	[ecx], eax
	ret

remquo	endp
	end

rint() Function

The function rint() returns the integral value nearest to its argument according to the current rounding mode.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double rint(double argument)
{
    if ((argument > 0.0) && (argument < 0x1.0p+52)) {
        argument += 0x1.0p+52;
        argument -= 0x1.0p+52;
    } else if ((argument < 0.0) && (argument > -0x1.0p+52)) {
        argument -= 0x1.0p+52;
        argument += 0x1.0p+52;
        if (argument == 0.0)
            argument = -0.0;
    } else if (argument != 0.0)
        argument += 0.0;

    return argument;
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# NOTE: requires SSE 4.1 instruction set!

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
rint:
	roundsd	xmm0, xmm0, 4		# xmm0 = argument rounded according to current mode
	ret

.size	rint, .-rint
.type	rint, @function
.global	rint
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/dn465165.aspx

; NOTE: depending on the current rounding mode, rint() is equivalent to
;       floor(), ceil(), roundeven() or trunc(), but differs from
;       round(); while roundeven() breaks ties to the nearest even
;       integer, round() rounds ties away from 0, which neither the
;       FPU nor the CPU supports in its instruction set!

	.686
	.model	flat, C
	.code

rint	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = argument
	frndint				; st(0) = rint(argument)
	ret

rint	endp
	end

round() Function

The function round() returns the nearest integral value to its argument, rounding ties away from 0.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double round(double argument)
{
#ifdef TRUNC
    double trunc(double x);

    double tmp = trunc(argument);

    if (argument > 0.0)
        return argument - tmp < 0.5 ? tmp : tmp + 1.0;

    if (argument < 0.0)
        return argument - tmp > -0.5 ? tmp : tmp - 1.0;

    return tmp;
#else
    double tmp;

    if ((argument > 0.0) && (argument < 0x1.0p+52)) {
        tmp = argument;
        argument += 0x1.0p+52;
        argument -= 0x1.0p+52;
        if (argument - tmp <= -0.5)
            argument += 1.0;
    } else if ((argument < 0.0) && (argument > -0x1.0p+52)) {
        tmp = argument;
        argument -= 0x1.0p+52;
        argument += 0x1.0p+52;
        if (argument - tmp >= 0.5)
            argument -= 1.0;
        else if (argument == 0.0)
            argument = -0.0;
    } else if (argument != 0.0)
        argument += 0.0;

    return argument;
#endif
}

roundeven() Function

The function roundeven() returns the nearest integral value to its argument, rounding ties to even.
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# NOTE: requires SSE 4.1 instruction set!

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
roundeven:
	roundsd	xmm0, xmm0, 0		# xmm0 = argument rounded to nearest (even) integer
	ret

.size	roundeven, .-roundeven
.type	roundeven, @function
.global	roundeven
.end
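
Where SSE 4.1 is not available, rounding ties to even can be sketched in portable C with the 2**52 addition used by the C version of rint(), provided the current rounding mode is round-to-nearest. The name roundeven_sketch() is hypothetical, and the volatile qualifier is there only to keep a compiler from using excess precision or folding the two operations.

```c
double roundeven_sketch(double argument)
{
    volatile double t = argument;   /* volatile: every step is rounded to double */

    if ((t > 0.0) && (t < 0x1.0p+52)) {
        t += 0x1.0p+52;             /* the sum has no fraction bits left, */
        t -= 0x1.0p+52;             /* so the addition rounds to an integer */
    } else if ((t < 0.0) && (t > -0x1.0p+52)) {
        t -= 0x1.0p+52;
        t += 0x1.0p+52;
        if (t == 0.0)
            t = -0.0;               /* keep the sign of the argument */
    }
    return t;                       /* ±0.0, |argument| >= 2**52, ±INFINITY and NaN pass through */
}
```

Ties such as 0.5 and 2.5 round to the even neighbours 0.0 and 2.0, whereas round() would return 1.0 and 3.0.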

signbit() Function

The function signbit() returns 1 if the sign of its argument is negative, else 0.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

int signbit(double argument)
{
    return *(long long *) &argument < 0;
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
signbit:
	movmskpd eax, xmm0		# rax = (argument & -0.0) ? 0b?1 : 0b?0
	and	eax, 1			# rax = (argument & -0.0) ? 1 : 0
					#     = signbit(argument)
	ret

.size	signbit, .-signbit
.type	signbit, @function
.global	signbit
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

	.686
	.model	flat, C
	.code

signbit	proc	public			; [esp+4] = argument

	mov	eax, [esp+8]		; eax = high dword of argument
	shr	eax, 31			; eax = (argument & -0.0) ? 1 : 0
	ret

signbit	endp
	end

sqrt() Function

The function sqrt() returns the non-negative square root of its argument.
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
sqrt:
	sqrtsd	xmm0, xmm0		# xmm0 = square root of argument
	ret

.size	sqrt, .-sqrt
.type	sqrt, @function
.global	sqrt
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/f1xa99e6.aspx

	.686
	.model	flat, C
	.code

sqrt	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = argument
	fsqrt				; st(0) = square root of argument
	ret

sqrt	endp
	end

trunc() Function

The function trunc() returns the integral value nearest to, but not greater in magnitude than, its argument.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double trunc(double argument)
{
    double tmp;

    if ((argument > 0.0) && (argument < 0x1.0p+52)) {
        tmp = argument;
        argument += 0x1.0p+52;
        argument -= 0x1.0p+52;
        if (argument > tmp)
            argument -= 1.0;
    } else if ((argument < 0.0) && (argument > -0x1.0p+52)) {
        tmp = argument;
        argument -= 0x1.0p+52;
        argument += 0x1.0p+52;
        if (argument < tmp)
            argument += 1.0;
        if (argument == 0.0)
            argument = -0.0;
    } else if (argument != 0.0)
        argument += 0.0;

    return argument;
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# NOTE: requires SSE 4.1 instruction set!

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = argument
trunc:
	roundsd	xmm0, xmm0, 3		# xmm0 = argument rounded towards zero
	ret

.size	trunc, .-trunc
.type	trunc, @function
.global	trunc
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/mt720727.aspx

	.686
	.model	flat, C
	.code

trunc	proc	public			; [esp+4] = argument

	fld	real8 ptr [esp+4]	; st(0) = argument
if 0
	ftst
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jz	Lexit			; argument = ±0.0?

	fld1				; st(0) = 1.0,
					; st(1) = argument
	fld	st(1)			; st(0) = argument,
					; st(1) = 1.0,
					; st(2) = argument
Lmodulo:
	fprem				; st(0) = argument modulo 1.0
					;       = argument',
					; st(1) = 1.0,
					; st(2) = argument
	fstsw	ax			; ax = FPU status word,
					; ah = B:C3:T:O:P:C2:C1:C0
	sahf				; SF:ZF:0:AF:0:PF:1:CF = ah
	jp	Lmodulo			; |argument'| >= 1.0?

	fstp	st(1)			; st(0) = argument',
					; st(1) = argument
	fsubp	st(1), st(0)		; st(0) = argument - argument'
					;       = trunc(argument)
Lexit:
else
	fld	st(0)			; st(0) = argument,
					; st(1) = argument
	fabs				; st(0) = |argument|,
					; st(1) = argument
	fld	st(0)			; st(0) = |argument|,
					; st(1) = |argument|,
					; st(2) = argument
	frndint				; st(0) = rint(|argument|),
					; st(1) = |argument|,
					; st(2) = argument
	fxch	st(1)			; st(0) = |argument|,
					; st(1) = rint(|argument|),
					; st(2) = argument
	fucomip	st(0), st(1)		; eflags = |argument| ><=# rint(|argument|),
					; st(0) = rint(|argument|),
					; st(1) = argument
	fldz				; st(0) = 0.0,
					; st(1) = rint(|argument|),
					; st(2) = argument
	fld1				; st(0) = 1.0,
					; st(1) = 0.0,
					; st(2) = rint(|argument|),
					; st(3) = argument
	fcmovnb	st(0), st(1)		; st(0) = (rint(|argument|) <= |argument|) ? 0.0 : 1.0,
					; st(1) = 0.0,
					; st(2) = rint(|argument|),
					; st(3) = argument
	fsubp	st(2), st(0)		; st(0) = 0.0,
					; st(1) = trunc(|argument|),
					; st(2) = argument
	fucomip	st(0), st(2)		; eflags = 0.0 ><=# argument,
					; st(0) = trunc(|argument|),
					; st(1) = argument
	fst	st(1)			; st(0) = trunc(|argument|),
					; st(1) = trunc(|argument|)
	fchs				; st(0) = -trunc(|argument|),
					; st(1) = trunc(|argument|)
	fcmovbe	st(0), st(1)		; st(0) = (argument >= 0.0) ? trunc(|argument|) : -trunc(|argument|)
					;       = trunc(argument),
					; st(1) = trunc(|argument|)
	fstp	st(1)			; st(0) = trunc(argument)
endif
	ret

trunc	endp
	end

Special (bit-twiddling) Functions

ceil() Function

The function ceil() returns the smallest integral value not less than its argument.
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# NOTE: ceil() returns -0.0 for argument in (-1.0, -0.0]

.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
ceil:
	movq	rax, xmm0		# rax = argument
	mov	rcx, rax		# rcx = argument
	add	rcx, rcx
	jz	.Lexit			# argument = ±0.0?

	shr	rcx, 53			# rcx = biased exponent of |argument|
	sub	ecx, BIAS		# rcx = unbiased exponent of |argument|
	jl	.Lsmall			# |argument| < 1.0?

	cmp	ecx, BIAS
	jg	.Lmxcsr			# argument = ±INFINITY?
					# argument = INDEFINITE?
	sub	ecx, 52
	jge	.Lexit			# |argument| >= 0x1.0p+52?

	neg	ecx			# ecx = number of bits in fractional part of mantissa
	mov	rdx, rax
	shr	rax, cl
	shl	rax, cl			# rax = trunc(argument)
	xor	rdx, rax		# rdx = fractional part of mantissa
	movq	xmm0, rax		# xmm0 = trunc(argument)
	neg	rdx			# CF = (fractional part of mantissa <> 0)
	sbb	ecx, ecx		# ecx = (fractional part of mantissa <> 0) ? -1 : 0
	shr	ecx, 22			# ecx = (fractional part of mantissa <> 0) ? 0x3FF : 0
	cqo				# rdx = (trunc(argument) < 0.0) ? -1 : 0
	not	edx			# edx = (trunc(argument) < 0.0) ? 0 : -1
	and	edx, ecx
	shl	rdx, 52			# rdx = (trunc(argument) < 0.0)
					#     | (fractional part of mantissa = 0)
					#     ? 0 : 0x3FF0000000000000
	movq	xmm1, rdx		# xmm1 = (trunc(argument) < 0.0)
					#      | (fractional part of mantissa = 0)
					#      ? 0.0 : 1.0
	addsd	xmm0, xmm1		# xmm0 = ceil(argument)
	ret
.Lsmall:
	test	rax, rax
	jns	.Lpositive
.Lnegative:
	cqo				# rdx = (argument & -0.0) ? -1 : 0
	shl	rdx, 63			# rdx = (argument & -0.0) ? 0x8000000000000000 : 0
	movq	xmm0, rdx		# xmm0 = (argument & -0.0) ? -0.0 : 0.0
	ret
.Lpositive:
	mov	rax, 0x3FF0000000000000
	movq	xmm0, rax		# xmm0 = 0x1.0p+0
					#     = 1.0
	ret
.Lmxcsr:
	addsd	xmm0, xmm0
.Lexit:
	ret

.size	ceil, .-ceil
.type	ceil, @function
.global	ceil
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/atdhw2dx.aspx

; NOTE: ceil() returns -0.0 for argument in (-1.0, -0.0]

	.code

double	record	sign:1, exponent:11, mantissa:52

bias	equ	1 shl (width exponent - 1) - 1

ceil	proc	public			; xmm0 = argument

	movd	rax, xmm0		; rax = argument
	add	rax, rax
	jz	Lexit			; argument = ±0.0?

	shr	rax, 1 + width mantissa	; rax = biased exponent of |argument|
	cmp	eax, bias + width mantissa
	jae	Lexit			; |argument| >= 0x1.0p+52?
					; (argument = integer?)
					; argument = INDEFINITE?
	cvtsd2si rax, xmm0		; rax = llrint(argument)
	cvtsi2sd xmm1, rax		; xmm1 = rint(argument)
	comisd	xmm1, xmm0		; CF = (rint(argument) < argument)
	adc	rax, 0			; rax = llrint(argument)
					;     + (rint(argument) < argument)
					;     = ceil(argument)
	cvtsi2sd xmm2, rax		; xmm2 = ceil(argument)
	xorpd	xmm1, xmm1		; xmm1 = 0.0
	subsd	xmm1, xmm0		; xmm1 = -argument
	andnpd	xmm1, xmm0		; xmm1 = ~(-argument) & argument
					;      = (argument & -0.0) ? -0.0 : +0.0
	orpd	xmm1, xmm2		; xmm1 = ceil(argument) with sign of argument
	movapd	xmm0, xmm1		; xmm0 = ceil(argument)
Lexit:
	ret

ceil	endp
	end
Note: returns a signaling NaN unchanged!
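
The GNU assembler variant above works on the bit pattern: it clears the fractional mantissa bits to obtain trunc(argument) and then adds 1.0 when the argument is positive and had a non-zero fraction. The following C sketch expresses the same idea. It assumes IEEE 754 binary64, uses the made-up name ceil_sketch(), and returns ±INFINITY and every NaN unchanged.

```c
#include <string.h>

double ceil_sketch(double x)
{
    unsigned long long bits, mask;
    int exponent;
    double t;

    memcpy(&bits, &x, sizeof bits);
    exponent = (int) ((bits >> 52) & 2047) - 1023;  /* unbiased exponent */

    if (exponent >= 52)                 /* already integral, ±INFINITY or NaN */
        return x;

    if (exponent < 0) {                 /* |x| < 1.0 */
        if ((bits << 1) == 0)           /* ±0.0 is returned unchanged */
            return x;
        return (bits >> 63) ? -0.0 : 1.0;
    }

    mask = (1ULL << (52 - exponent)) - 1;   /* fractional mantissa bits */
    if ((bits & mask) == 0)             /* no fraction: x is integral */
        return x;

    bits &= ~mask;                      /* bits = trunc(x) */
    memcpy(&t, &bits, sizeof t);
    return (bits >> 63) ? t : t + 1.0;  /* negative: trunc(x) is the ceiling */
}
```

The corresponding floor() sketch would subtract 1.0 for negative arguments with a non-zero fraction instead.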

copysign() Function

The function copysign() returns its first operand with the sign of its second operand.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double fabs(double x);
int signbit(double x);

double copysign(double to, double from)
{
#if 0
    return signbit(from) ? -fabs(to) : fabs(to);
#else
    return signbit(from) == signbit(to) ? to : -to;
#endif
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = to
					# xmm1 = from
copysign:
	movq	rcx, xmm0		# rcx = to
	movq	rdx, xmm1		# rdx = from
	add	rdx, rdx		# CF = (from & -0.0)
	adc	rcx, rcx
	ror	rcx, 1
	movq	xmm0, rcx		# xmm0 = copysign(to, from)
	ret

.size	copysign, .-copysign
.type	copysign, @function
.global	copysign
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/0yafk1hc.aspx

	.686
	.model	flat, C
	.code

copysign proc	public			; [esp+12] = from
					; [esp+4] = to

	mov	eax, [esp+16]		; eax = high dword of from
if 0
	mov	edx, [esp+8]		; edx = high dword of to
	add	eax, eax		; CF = (from & -0.0)
	adc	edx, edx
	ror	edx, 1
	mov	[esp+8], edx
else
	shld	[esp+8], eax, 1
	ror	dword ptr [esp+8], 1
endif
	fld	real8 ptr [esp+4]	; st(0) = (from & -0.0) ? -|to| : |to|
	ret

copysign endp
	end
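
Both assembly variants above merely transplant bit 63 from one operand to the other. The same bit-level view can be sketched in C with explicit masks. The names signbit_sketch(), copysign_sketch() and SIGN_MASK are illustrative, memcpy() stands in for the pointer casts used elsewhere on this page, and IEEE 754 binary64 is assumed.

```c
#include <string.h>

#define SIGN_MASK (1ULL << 63)

int signbit_sketch(double argument)
{
    unsigned long long bits;

    memcpy(&bits, &argument, sizeof bits);
    return (int) (bits >> 63);          /* 1 for negative values including -0.0 */
}

double copysign_sketch(double to, double from)
{
    unsigned long long t, f;

    memcpy(&t, &to, sizeof t);
    memcpy(&f, &from, sizeof f);
    t = (t & ~SIGN_MASK) | (f & SIGN_MASK);     /* magnitude of to, sign of from */
    memcpy(&to, &t, sizeof to);
    return to;
}
```

Because only the sign bit is touched, the sketch also works for zeros, infinities and NaNs.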

floor() Function

The function floor() returns the largest integral value not greater than its argument.
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# NOTE: floor() preserves -0.0

.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
floor:
	movq	rax, xmm0		# rax = argument
	mov	rcx, rax		# rcx = argument
	add	rcx, rcx
	jz	.Lexit			# argument = ±0.0?

	shr	rcx, 53			# rcx = biased exponent of |argument|
	sub	ecx, BIAS		# rcx = unbiased exponent of |argument|
	jl	.Lsmall			# |argument| < 1.0?

	cmp	ecx, BIAS
	jg	.Lmxcsr			# argument = ±INFINITY?
					# argument = INDEFINITE?
	sub	ecx, 52
	jge	.Lexit			# |argument| >= 0x1.0p+52?

	neg	ecx			# ecx = number of bits in fractional part of mantissa
	mov	rdx, rax
	shr	rax, cl
	shl	rax, cl			# rax = trunc(argument)
	xor	rdx, rax		# rdx = fractional part of mantissa
	movq	xmm0, rax		# xmm0 = trunc(argument)
	neg	rdx			# CF = (fractional part of mantissa <> 0)
	sbb	ecx, ecx		# ecx = (fractional part of mantissa <> 0) ? -1 : 0
	shr	ecx, 22			# ecx = (fractional part of mantissa <> 0) ? 0x3FF : 0
	cqo				# rdx = (trunc(argument) < 0.0) ? -1 : 0
	and	edx, ecx
	shl	rdx, 52			# rdx = (trunc(argument) < 0.0)
					#     & (fractional part of mantissa <> 0)
					#     ? 0x3FF0000000000000 : 0
	movq	xmm1, rdx		# xmm1 = (trunc(argument) < 0.0)
					#      & (fractional part of mantissa <> 0)
					#      ? 1.0 : 0.0
	subsd	xmm0, xmm1		# xmm0 = floor(argument)
	ret
.Lsmall:
	test	rax, rax
	js	.Lnegative
.Lpositive:
	xorpd	xmm0, xmm0		# xmm0 = 0.0
	ret
.Lnegative:
	mov	rax, 0xBFF0000000000000
	movq	xmm0, rax		# xmm0 = -0x1.0p+0
					#     = -1.0
	ret
.Lmxcsr:
	addsd	xmm0, xmm0
.Lexit:
	ret

.size	floor, .-floor
.type	floor, @function
.global	floor
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/x39715t6.aspx

; NOTE: floor() preserves -0.0

	.code

double	record	sign:1, exponent:11, mantissa:52

bias	equ	1 shl (width exponent - 1) - 1

floor	proc	public			; xmm0 = argument

	movd	rax, xmm0		; rax = argument
	add	rax, rax
	jz	Lexit			; argument = ±0.0?

	shr	rax, 1 + width mantissa	; rax = biased exponent of |argument|
	cmp	eax, bias + width mantissa
	jae	Lexit			; |argument| >= 0x1.0p+52?
					; (argument = integer?)
					; argument = INDEFINITE?
	cvtsd2si rax, xmm0		; rax = llrint(argument)
	cvtsi2sd xmm1, rax		; xmm1 = rint(argument)
	comisd	xmm0, xmm1		; CF = (rint(argument) > argument)
	sbb	rax, 0			; rax = llrint(argument)
					;     - (rint(argument) > argument)
					;     = floor(argument)
	cvtsi2sd xmm2, rax		; xmm2 = floor(argument)
	xorpd	xmm1, xmm1		; xmm1 = 0.0
	subsd	xmm1, xmm0		; xmm1 = -argument
	andnpd	xmm1, xmm0		; xmm1 = ~(-argument) & argument
					;      = (argument & -0.0) ? -0.0 : +0.0
	orpd	xmm1, xmm2		; xmm1 = floor(argument) with sign of argument
	movapd	xmm0, xmm1		; xmm0 = floor(argument)
Lexit:
	ret

floor	endp
	end
Note: returns a signaling NaN unchanged!

frexp() Function

The function frexp() returns the (normalized) fraction and the (integral) exponent of its first argument.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double frexp(double argument, int *exponent)
{
    unsigned long long sign, ull;

    if (argument == 0.0)
        *exponent = 0;
    else {
        ull = *(unsigned long long *) &argument;
        *exponent = ull >> 52;
        *exponent &= 2047;
        if (*exponent > 0) {
            ull &= ~(2047ULL << 52);
            ull |= 1022ULL << 52;
        } else {
            sign = ull & (1ULL << 63);
            *exponent = 1;  // denormal numbers use the exponent of the smallest normal number
            do {
                *exponent -= 1;
                ull += ull;
            } while (ull < (1ULL << 52));
            ull ^= 1023ULL << 52;
            ull |= sign;
        }
        *exponent -= 1022;
        argument = *(double *) &ull;
    }

    return argument;
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
					# rdi = address of exponent
frexp:
	movq	rax, xmm0		# rax = argument
	lea	rcx, [rax+rax]		# rcx = argument << 1
					#     = |argument| << 1
	shr	rcx, 1			# rcx = |argument|
	mov	[rdi], ecx
	jz	.Lexit			# argument = ±0.0?

	shr	rcx, 52			# rcx = biased exponent
	jz	.Ldenormal		# biased exponent = 0?
					# (argument denormal?)
	sub	ecx, BIAS - 1		# ecx = unbiased exponent + 1
	cmp	ecx, BIAS + 2
	mov	[rdi], ecx
	je	.Lexit			# biased exponent = 2047?
					# (argument = INDEFINITE?)
					# (argument = ±INFINITY?)
.Lnormal:
	rol	rax, 1
	shl	rax, 11
	or	rax, BIAS - 1
	ror	rax, 12			# rax = fractional part of argument
	movq	xmm0, rax		# xmm0 = fractional part of argument
.Lexit:
	ret
.Ldenormal:
	xor	edx, edx
	add	rax, rax		# rax = argument << 1
					#     = |argument| << 1
	adc	edx, edx		# rdx = (argument & -0.0) ? 1 : 0
	shl	edx, 11
	or	edx, BIAS - 1
	bsr	rcx, rax		# rcx = index of most significant '1' bit in |argument| << 1
	xor	ecx, 63			# ecx = number of leading '0' bits in |argument| << 1
					#     = 11 - biased exponent
	shl	rax, cl			# rax = normalized significand of argument << 11
	add	rax, rax		# rax = fractional part of argument << 12
	or	rax, rdx
	ror	rax, 12			# rax = fractional part of argument
	movq	xmm0, rax		# xmm0 = fractional part of argument
	neg	ecx			# ecx = biased exponent - 11
	sub	ecx, BIAS - 12		# ecx = unbiased exponent + 1
	mov	[rdi], ecx
	ret

.size	frexp, .-frexp
.type	frexp, @function
.global	frexp
.end
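
A slow but simple reference model can help to check the bit-twiddling versions above, in particular for denormal arguments. The sketch below assumes nothing beyond exact scaling by 2.0 and 0.5; the names `frexp_model` and `frexp_model_demo` are hypothetical.

```c
#include <assert.h>

/* Hypothetical reference model, not the author's code: splits x into a
   fraction f with 0.5 <= |f| < 1.0 and an exponent e so that x = f * 2**e.
   Scaling by 2.0 and 0.5 is exact here, so no rounding occurs. */
double frexp_model(double x, int *e)
{
    *e = 0;
    if (x != x || x + x == x)       /* NaN, ±0.0 or ±infinity: return as is */
        return x;

    while (x >= 1.0 || x <= -1.0) { /* too large: halve */
        x *= 0.5;
        ++*e;
    }
    while (x < 0.5 && x > -0.5) {   /* too small: double */
        x *= 2.0;
        --*e;
    }
    return x;
}

void frexp_model_demo(void)
{
    int e;

    assert(frexp_model(8.0, &e) == 0.5 && e == 4);
    assert(frexp_model(-3.0, &e) == -0.75 && e == 2);
    assert(frexp_model(0x1.0p-1023, &e) == 0.5 && e == -1022);  /* denormal */
    assert(frexp_model(0x1.0p-1074, &e) == 0.5 && e == -1073);  /* smallest denormal */
}
```

The two denormal cases are the ones most likely to expose an off-by-one error in the exponent of a faster implementation.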

ldexp() Function

The function ldexp() returns its first argument multiplied by 2 raised to the power of its (integral) second argument.
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# NOTE: for denormal argument and negative exponent or denormal
#       result, ldexp() rounds to nearest with ties to even!

.arch	generic64
.code64
.equiv	BIAS, 1023
.equiv	JCCLESS, 1
.intel_syntax noprefix
.text
					# xmm0 = argument
					# edi = exponent
ldexp:
	test	edi, edi
	jz	.Lexit			# exponent = 0?

	movq	rsi, xmm0		# rsi = argument
	lea	rax, [rsi+rsi]		# rax = argument << 1
					#     = |argument| << 1
	shr	rax, 1			# rax = |argument|
	jz	.Lexit			# argument = ±0.0?

	mov	rdx, rax		# rdx = |argument|
	shr	rax, 52			# rax = biased exponent
	jz	.Ldenormal		# biased exponent = 0?
					# (argument denormal?)
	cmp	eax, BIAS * 2 + 1
	je	.Lexit			# biased exponent = 2047?
					# (argument = INDEFINITE?)
					# (argument = ±INFINITY?)
.Lnormal:
	add	eax, edi		# eax = new biased exponent
	jle	.Lotherflow		# new biased exponent < 1?
					# (possible exponent underflow?)
	cmp	eax, BIAS * 2
	jg	.Loverflow		# new biased exponent > 2046?
					# (exponent overflow?)
	shl	rdi, 52
	add	rdx, rdi		# rdx = |argument| * 2.0**exponent
.Lcopysign:
.if 0
	shld	rdx, rsi, 1
	ror	rdx, 1			# rdx = argument * 2.0**exponent
.elseif 0
	add	rdx, rdx
	add	rsi, rsi		# CF = (argument & -0.0)
	rcr	rdx, 1			# rdx = argument * 2.0**exponent
.else
	add	rsi, rsi		# CF = (argument & -0.0)
	adc	rdx, rdx
	ror	rdx, 1			# rdx = argument * 2.0**exponent
.endif
	movq	xmm0, rdx		# xmm0 = argument * 2.0**exponent
.Lexit:
	ret
.Lunderflow:
	xor	rdx, rdx		# rdx = 0.0
	jmp	.Lcopysign
.Loverflow:
	mov	rdx, 0x7FF0000000000000	# rdx = 0x1.0p+1024
					#     = INFINITY
	jmp	.Lcopysign
.Lotherflow:
	cmp	eax, -52
	jl	.Lunderflow		# new (biased) exponent + 1 < -52?
					# (exponent underflow, even with mantissa rounded up?)

	dec	eax			# eax = new biased exponent
	neg	eax			# eax = 0 - new biased exponent
	mov	ecx, eax		# ecx = 0 - new biased exponent
					#     = shift count
	mov	rax, 0x000FFFFFFFFFFFFF
	and	rdx, rax		# rdx = mantissa
	inc	rax			# rax = 0x0010000000000000
					#     = explicit integer bit
	or	rdx, rax		# rdx = 1.mantissa
					#     = significand
	xor	eax, eax
.Lcontinue:
	shrd	rax, rdx, cl		# rax = excess part of significand
	shr	rdx, cl			# rdx = significand >> -(new biased exponent)
					#     = |argument| * 2.0**exponent
.ifnotdef JCCLESS
	add	rax, rax		# rax = excess part of significand << 1,
					# CF = (excess part of significand >= 0x8000000000000000),
					# ZF = (excess part of significand = 0x8000000000000000)
	jnc	.Lcopysign		# excess part of significand < 0x8000000000000000?
	jnz	.Lround			# excess part of significand > 0x8000000000000000?
.Ltie:
	bt	edx, 0			# CF = (significand odd) ? 1 : 0
.Lround:
	adc	rdx, 0			# rdx = significand rounded to nearest even
.else
	xor	ecx, ecx
	add	rax, rax		# rax = excess part of significand << 1,
					# CF = (excess part of significand >= 0x8000000000000000),
					# ZF = (excess part of significand = 0x8000000000000000)
	adc	ecx, ecx		# ecx = (excess part of significand < 0x8000000000000000) ? 0 : 1
	neg	rax			# CF = (excess part of significand <> 0x8000000000000000)
	sbb	eax, eax		# eax = (excess part of significand = 0x8000000000000000) ? 0 : -1
	or	eax, edx
	and	eax, ecx		# rax = (excess part of significand > 0x8000000000000000)
					#     | (excess part of significand = 0x8000000000000000)
					#     & (significand odd) ? 1 : 0
	add	rdx, rax		# rdx = significand rounded to nearest even
.endif # JCCLESS
	jmp	.Lcopysign
.Ldenormal:
	bsr	rcx, rdx		# rcx = index of most significant '1' bit in |argument|

	test	edi, edi
	js	.Lnegative		# exponent < 0?

	xor	ecx, 63			# ecx = number of leading '0' bits in |argument|
	sub	ecx, 12			# ecx = number of leading '0' bits in mantissa
	cmp	ecx, edi
	jb	.Lnormalize		# exponent > number of leading '0' bits in mantissa?

	mov	ecx, edi
	shl	rdx, cl			# rdx = mantissa << exponent
					#     = |argument| << exponent
					#     = |argument| * 2.0**exponent
	jmp	.Lcopysign
.Lnegative:
.if 0
	add	ecx, edi		# ecx = index of most significant '1' bit in mantissa
					#     + exponent
					#     = new index of most significant '1' bit in mantissa
	inc	ecx			# ecx = new index of most significant '1' bit in significand
.else
	stc
	adc	ecx, edi		# ecx = new index of most significant '1' bit in significand
.endif
	js	.Lunderflow		# mantissa underflow, even with mantissa rounded up?

	neg	edi
	mov	ecx, edi		# ecx = -exponent
#	shrd	rax, rdx, cl
#	shr	rdx, cl			# rdx = mantissa >> exponent
					#     = |argument| >> exponent
					#     = |argument| * 2.0**exponent
	jmp	.Lcontinue
.Lnormalize:
	inc	ecx			# ecx = number of leading '0' bits in mantissa + 1
					#     = number of leading '0' bits in significand
	sub	edi, ecx		# edi = exponent
					#     - number of leading '0' bits in significand
					#     = new (biased) exponent
	cmp	edi, BIAS * 2
	jge	.Loverflow		# new biased exponent > 2046?
					# (exponent overflow?)
	shl	rdi, 52
	shl	rdx, cl			# rdx = significand << number of leading '0' bits in significand
					#     = |argument| << number of leading '0' bits in significand
	add	rdx, rdi		# rdx = |argument| * 2.0**exponent
	jmp	.Lcopysign

.size	ldexp, .-ldexp
.type	ldexp, @function
.global	ldexp
.end
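
The rounding step at `.Lcontinue` shifts the significand right and rounds the shifted-out bits to nearest with ties to even. A minimal C sketch of that step follows, assuming a shift count between 1 and 63; the names `shift_right_rne` and `shift_right_rne_demo` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shift a significand right by n bits (1 <= n <= 63),
   rounding to nearest with ties to even, as ldexp() has to do when its
   result is denormal. */
uint64_t shift_right_rne(uint64_t significand, unsigned n)
{
    uint64_t quotient = significand >> n;
    uint64_t excess = significand << (64 - n);  /* shifted-out bits, left-aligned */
    const uint64_t half = UINT64_C(1) << 63;

    if (excess > half || (excess == half && (quotient & 1)))
        ++quotient;                             /* round up; ties go to even */
    return quotient;
}

void shift_right_rne_demo(void)
{
    assert(shift_right_rne(5, 1) == 2);     /* 2.5  -> 2 (tie, even) */
    assert(shift_right_rne(7, 1) == 4);     /* 3.5  -> 4 (tie, even) */
    assert(shift_right_rne(5, 2) == 1);     /* 1.25 -> 1 */
    assert(shift_right_rne(6, 2) == 2);     /* 1.5  -> 2 (tie, even) */
    assert(shift_right_rne(3, 2) == 1);     /* 0.75 -> 1 */
}
```

Left-aligning the shifted-out bits reduces the comparison against one half to a comparison against 0x8000000000000000, which is what the comments in the listing refer to.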

modf() Function

The function modf() returns the fractional and integral parts of its argument.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

#define INFINITY (1.0 / 0.5e-323)

double fabs(double x);
double trunc(double x);

double modf(double argument, double *integer)
{
    *integer = trunc(argument);

    return fabs(argument) == INFINITY ? 1.0 / argument : argument - *integer;
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
					# rdi = address of integer part
modf:
	movq	rax, xmm0		# rax = argument
	shr	rax, 52			# rax = sign and biased exponent
	mov	ecx, BIAS * 2 + 1
	and	ecx, eax		# rcx = biased exponent
	sub	eax, ecx
	shl	rax, 52			# rax = sign of argument
	mov	[rdi], rax		# *integer = ±0.0
	sub	ecx, BIAS		# rcx = biased exponent - 1023
					#     = unbiased exponent
	js	.Lexit			# unbiased exponent < 0?
					# (no integer part?)
	cmp	ecx, 52
	jge	.Linteger		# no fractional part?

	mov	rdx, 0x000FFFFFFFFFFFFF
	shr	rdx, cl			# rdx = mask for fractional part of mantissa
	movq	rcx, xmm0		# rcx = argument
	test	rcx, rdx
	jz	.Linteger		# fractional part of mantissa = 0?
					# (argument is integer?)
.Lfraction:
	not	rdx			# rdx = mask for sign, biased exponent and integer part of mantissa
	and	rdx, rcx		# rdx = sign, biased exponent and integer part of mantissa
					#     = integer part of argument
	mov	[rdi], rdx
	movq	xmm1, rdx		# xmm1 = integer part of argument
	subsd	xmm0, xmm1		# xmm0 = argument - integer part of argument
					#      = fractional part of argument
	ret
.Linteger:
	movq	[rdi], xmm0		# *integer = argument
	movq	xmm0, rax		# xmm0 = ±0.0
					#      = fractional part of argument
.Lexit:
	ret

.size	modf, .-modf
.type	modf, @function
.global	modf
.end
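
The assembly listing above separates the integer part by masking the fractional bits of the mantissa. The following C sketch renders that approach under the assumption of IEEE 754 binary64 doubles; `modf_sketch` and `modf_sketch_demo` are hypothetical names, and `memcpy()` is used for type punning.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C rendering of the masking approach. */
double modf_sketch(double x, double *integer)
{
    uint64_t bits, sign, mask = 0;
    int exponent;

    memcpy(&bits, &x, sizeof bits);
    sign = bits & (UINT64_C(1) << 63);
    exponent = (int) ((bits >> 52) & 2047) - 1023;  /* unbiased exponent */

    if (exponent < 0) {                     /* |x| < 1.0: no integer part */
        memcpy(integer, &sign, sizeof sign);        /* *integer = ±0.0 */
        return x;
    }
    if (exponent < 52)                      /* mask for the fractional bits */
        mask = UINT64_C(0x000FFFFFFFFFFFFF) >> exponent;
    if (exponent >= 52 || (bits & mask) == 0) {     /* no fractional part */
        *integer = x;
        memcpy(&x, &sign, sizeof sign);             /* fraction = ±0.0 */
        return x;
    }
    bits &= ~mask;                          /* clear the fractional bits */
    memcpy(integer, &bits, sizeof bits);
    return x - *integer;                    /* exact */
}

void modf_sketch_demo(void)
{
    double i;

    assert(modf_sketch(3.75, &i) == 0.75 && i == 3.0);
    assert(modf_sketch(-3.75, &i) == -0.75 && i == -3.0);
    assert(modf_sketch(0.5, &i) == 0.5 && i == 0.0);
    assert(modf_sketch(4.0, &i) == 0.0 && i == 4.0);
}
```

Like the listing it follows, this sketch returns ±0.0 as the fractional part for every argument with an unbiased exponent of 52 or more.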

nextafter() Function

The function nextafter() returns the next representable double-precision floating-point number from its first argument in direction of its second argument.
// Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

double nextafter(double from, double to)
{
    if (from == to)
        return to;

    if (to != to)
        return to;

    if (from != from)
        return from;

    if (from == 0.0)
        return to < 0.0 ? -0x1.0p-1074 : 0x1.0p-1074;
#if 0
    if ((from < to) && (from < 0.0)
     || (from > to) && (from > 0.0))
#elif 0
    if ((from > to) == (from > 0.0))
#else
    if ((from < to) == (from < 0.0))
#endif
        --*(unsigned long long *) &from; // from -= 1 ULP
    else
        ++*(unsigned long long *) &from; // from += 1 ULP

    return from;
}
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	generic64
.code64
.intel_syntax noprefix
.text
					# xmm0 = from
					# xmm1 = to
nextafter:
	ucomisd	xmm1, xmm0		# CF = (from > to)
	je	.Lspecial		# from = to?
					# from = INDEFINITE?
					# to = INDEFINITE?
.Lnotequal:
	sbb	rdx, rdx		# rdx = (from > to) ? -1 : 0
	movq	rcx, xmm0		# rcx = from
	mov	rax, rcx
	add	rax, rax		# CF = (from & -0.0)
	jz	.Lzero			# from = ±0.0?
.Lnext:
	sbb	rax, rax		# rax = (from < 0.0) ? -1 : 0
	xor	rax, rdx		# rax = (from < 0.0) ^ (from > to) ? -1 : 0
					#     = (from < 0.0) & (from < to)
					#     | (from > 0.0) & (from > to) ? -1 : 0
	or	rax, 1			# rax = (from < 0.0) ^ (from > to) ? -1 : 1
					#     = (from < 0.0) = (from < to) ? -1 : 1
					#     = (from > 0.0) = (from > to) ? -1 : 1
	add	rax, rcx
	movq	xmm0, rax		# xmm0 = from ± 1 ULP
.ifdef COMPLIANT
	xorpd	xmm1, xmm1
	addsd	xmm1, xmm0
.endif
	ret
.Lzero:
	movmskpd eax, xmm1		# rax = (to & -0.0) ? 0b?1 : 0b?0
	or	eax, 2			# rax = (to & -0.0) ? 0b11 : 0b10
	ror	rax, 1			# rax = (to & -0.0) ? 0x8000000000000001 : 1
	movq	xmm0, rax		# xmm0 = (to & -0.0) ? -0x1.0p-1074 : 0x1.0p-1074
.ifdef COMPLIANT
	xorpd	xmm1, xmm1
	addsd	xmm1, xmm0
.endif
	ret
.Lspecial:
	jp	.Lindefinite		# to = INDEFINITE?
					# from = INDEFINITE?
.Lequal:
	movsd	xmm0, xmm1		# xmm0 = to
	ret
.Lindefinite:
	addsd	xmm0, xmm1		# xmm0 = INDEFINITE
	ret

.size	nextafter, .-nextafter
.type	nextafter, @function
.global	nextafter
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/h0dff77w.aspx

	.code

double	record	sign:1, exponent:11, mantissa:52

bias	equ	1 shl (width exponent - 1) - 1

nextafter proc	public			; xmm0 = from
					; xmm1 = to
	xorpd	xmm2, xmm2		; xmm2 = 0.0
	ucomisd	xmm1, xmm2		; CF = (to < 0.0)
	jp	Lto			; to = INDEFINITE?

	sbb	rax, rax		; rax = (to < 0.0) ? -1 : 0
	ucomisd	xmm0, xmm1		; CF = (from < to)
;;	jp	Lfrom			; from = INDEFINITE?
;;	je	Lto			; from = to?
	je	Lspecial		; from = to?
					; from = INDEFINITE?
Lnotequal:
	sbb	rcx, rcx		; rcx = (from < to) ? -1 : 0
	ucomisd	xmm0, xmm2		; CF = (from < 0.0)
	jz	Lzero			; from = ±0.0?
Lnext:
	movd	rdx, xmm0		; rdx = from
	sbb	rax, rax		; rax = (from < 0.0) ? -1 : 0
	xor	rax, rcx		; rax = (from < 0.0) = (from < to) ? 0 : -1
	or	rax, 1			; rax = (from < 0.0) = (from < to) ? 1 : -1
	sub	rdx, rax
	movd	xmm0, rdx		; xmm0 = from ± 1 ULP
ifdef MXCSR
	addsd	xmm2, xmm0
endif
	ret
Lzero:
	shl	rax, 63			; rax = (to < 0.0) ? 0x8000000000000000 : 0
	or	rax, 1			; rax = (to < 0.0) ? 0x8000000000000001 : 1
	movd	xmm0, rax		; xmm0 = (to < 0.0) ? -0x1.0p-1074 : 0x1.0p-1074
ifdef MXCSR
	addsd	xmm2, xmm0
endif
	ret
Lspecial:
	jp	Lfrom			; from = INDEFINITE?
Lto:
	movsd	xmm0, xmm1		; xmm0 = to
Lfrom:
	ret

nextafter endp
	end
Note: returns signaling NaNs unchanged!
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

.arch	i686
.code32
.intel_syntax noprefix
.text
					# [esp+12] = to
					# [esp+4] = from
nextafter:
	fld	real8 ptr [esp+4]	# st(0) = from
	fld	real8 ptr [esp+12]	# st(0) = to,
					# st(1) = from
	fucomi	st(0), st(1)
	je	.Lspecial		# from = to?
					# from = INDEFINITE?
					# to = INDEFINITE?
	sbb	edx, edx		# edx = (to < from) ? -1 : 0
	fstp	st(0)			# st(0) = from
	fldz				# st(0) = 0.0,
					# st(1) = from
	fucomip	st(0), st(1)		# st(0) = from
	jz	.Lzero			# from = ±0.0?

	sbb	eax, eax		# eax = (from > 0.0) ? -1 : 0
	xor	eax, edx		# eax = (from > 0.0) ^ (from > to) ? -1 : 0
					#     = (from > 0.0) & (from < to)
					#     | (from < 0.0) & (from > to) ? -1 : 0
	or	eax, 1			# eax = (from > 0.0) ^ (from > to) ? -1 : 1
					#     = (from > 0.0) = (from < to) ? -1 : 1
					#     = (from < 0.0) = (from > to) ? -1 : 1
	cdq				# edx:eax = (from < 0.0) = (from > to) ? -1 : 1
	sub	[esp+4], eax
	sbb	[esp+8], edx		# from = from
					#      - (from < 0.0) = (from > to) ? -1 : 1
					#      = from'
	fld	real8 ptr [esp+4]	# st(0) = from',
					# st(1) = from
	fstp	st(1)			# st(0) = nextafter(from, to)
	ret
.Lzero:
	and	dword ptr [esp+16], 0x80000000
	mov	dword ptr [esp+12], 0x1	# to = (to & -0.0) ? 0x8000000000000001 : 1
	fld	real8 ptr [esp+12]	# st(0) = (to & -0.0) ? -0x1.0p-1074 : 0x1.0p-1074
					# st(1) = from
.Lequal:
	fstp	st(1)			# st(0) = nextafter(from, to)
	ret
.Lspecial:
	jnp	.Lequal			# from = to?
.Lindefinite:
	faddp	st(1), st(0)		# st(0) = from + to
					#       = INDEFINITE
	ret

.size	nextafter, .-nextafter
.type	nextafter, @function
.global	nextafter
.end
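
All listings above step the bit pattern of the first argument by one unit in the last place: the magnitude shrinks when moving toward 0 and grows otherwise. The following sketch shows the same rule in C with `memcpy()` instead of pointer casts, assuming IEEE 754 binary64 doubles; `nextafter_sketch` and `nextafter_sketch_demo` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant that steps the bit pattern by one unit in the
   last place without violating the aliasing rules. */
double nextafter_sketch(double from, double to)
{
    uint64_t bits;

    if (from != from || to != to)       /* NaN in, NaN out */
        return from + to;
    if (from == to)
        return to;
    if (from == 0.0)                    /* smallest denormal, sign of to */
        return to < 0.0 ? -0x1.0p-1074 : 0x1.0p-1074;

    memcpy(&bits, &from, sizeof bits);
    if ((from < to) == (from < 0.0))
        --bits;                         /* magnitude decreases */
    else
        ++bits;                         /* magnitude increases */
    memcpy(&from, &bits, sizeof bits);
    return from;
}

void nextafter_sketch_demo(void)
{
    assert(nextafter_sketch(1.0, 2.0) == 1.0 + 0x1.0p-52);
    assert(nextafter_sketch(1.0, 0.0) == 1.0 - 0x1.0p-53);
    assert(nextafter_sketch(-1.0, 0.0) == -1.0 + 0x1.0p-53);
    assert(nextafter_sketch(0.0, -1.0) == -0x1.0p-1074);
}
```

The distance to the next number below 1.0 is half the distance to the next number above it, because 1.0 is the smallest number of its binade.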

rint() Function

The function rint() returns its argument rounded to an integral value according to the current rounding mode.
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/dn465165.aspx

; NOTE: rint() preserves -0.0, and returns -0.0 for argument in
;       [-0.5, -0.0] or (-1.0, -0.0]

	.code

double	record	sign:1, exponent:11, mantissa:52

bias	equ	1 shl (width exponent - 1) - 1

rint	proc	public			; xmm0 = argument

	movd	rax, xmm0		; rax = argument
	add	rax, rax
	jz	Lexit			; argument = ±0.0?

	shr	rax, 1 + width mantissa	; rax = biased exponent of |argument|
	cmp	eax, bias + width mantissa
	jae	Lexit			; |argument| >= 0x1.0p+52?
					; (argument = integer?)
					; argument = INDEFINITE?
	cvtsd2si rax, xmm0		; rax = llrint(argument)
	cvtsi2sd xmm1, rax		; xmm1 = rint(argument)
	xorpd	xmm2, xmm2		; xmm2 = 0.0
	subsd	xmm2, xmm0		; xmm2 = -argument
	xorpd	xmm2, xmm0		; xmm2 = argument ^ -argument
					;      = -0.0
	andpd	xmm0, xmm2		; xmm0 = (argument & -0.0) ? -0.0 : +0.0
	orpd	xmm0, xmm1		; xmm0 = rint(argument)
Lexit:
	ret

rint	endp
	end
Note: returns a signaling NaN unchanged!
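
A well-known alternative to the integer conversion used above is to add and subtract 2**52, which lets the floating-point unit round in the current rounding mode. The sketch below is not the author's method; it assumes IEEE 754 binary64 arithmetic without excess precision, the names `rint_sketch` and `rint_sketch_demo` are hypothetical, and the demo assumes the default rounding mode (to nearest, ties to even).

```c
#include <assert.h>

/* Hypothetical alternative: for |x| < 2**52, adding and subtracting 2**52
   with the sign of x rounds x to an integral value.  volatile keeps the
   compiler from folding the two operations. */
double rint_sketch(double x)
{
    const double two52 = 4503599627370496.0;    /* 0x1.0p+52 */
    volatile double t;

    if (x != x || x >= two52 || x <= -two52 || x == 0.0)
        return x;                   /* NaN, ±infinity, integer or ±0.0 */

    if (x > 0.0) {
        t = x + two52;
        t = t - two52;
        return t == 0.0 ? 0.0 : t;  /* +0.0 in every rounding mode */
    }
    t = x - two52;
    t = t + two52;
    return t == 0.0 ? -0.0 : t;     /* keep the sign of the argument */
}

void rint_sketch_demo(void)
{
    assert(rint_sketch(2.5) == 2.0);            /* tie goes to even */
    assert(rint_sketch(3.5) == 4.0);            /* tie goes to even */
    assert(rint_sketch(-2.5) == -2.0);
    assert(rint_sketch(0.75) == 1.0);
    assert(1.0 / rint_sketch(-0.25) < 0.0);     /* result is -0.0 */
}
```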

round() Function

The function round() returns the nearest integral value to its argument, rounding ties away from 0.
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# NOTE: round() returns -0.0 for argument in (-0.5, -0.0]

.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
round:
	movq	rax, xmm0		# rax = argument
	mov	rcx, rax		# rcx = argument
	add	rcx, rcx
	jz	.Lexit			# argument = ±0.0?

	shr	rcx, 53			# rcx = biased exponent of |argument|
	sub	ecx, BIAS - 1		# rcx = 1 + unbiased exponent of |argument|
	jl	.Lzero			# |argument| < 0.5?
	je	.Lone			# 0.5 <= |argument| < 1.0?

	cmp	ecx, BIAS + 1
	jg	.Lmxcsr			# argument = ±INFINITY?
					# argument = INDEFINITE?
	sub	ecx, 53
	jge	.Lexit			# |argument| >= 0x1.0p+52?

	neg	ecx			# ecx = number of bits in fractional part of mantissa
	shr	rax, cl			# CF = (fraction >= 0.5)
	sbb	edx, edx		# edx = (fraction >= 0.5) ? -1 : 0
	shl	rax, cl			# rax = trunc(argument)
	movq	xmm1, rax		# xmm1 = trunc(argument)
	movmskpd eax, xmm0		# rax = (argument & -0.0) ? 0b?1 : 0b?0
	shr	edx, 22			# edx = (fraction >= 0.5) ? 0x3FF : 0
	shl	eax, 11
	or	eax, edx
	shl	rax, 52			# rax = (fraction >= 0.5) ? 0x3FF0000000000000 : 0
					#     | (argument & -0.0) ? 0x8000000000000000 : 0
	movq	xmm0, rax		# xmm0 = {-1.0, 0.0, 1.0}
	addsd	xmm0, xmm1		# xmm0 = round(argument)
	ret
.Lone:
	shr	rax, 63
	shl	rax, 63			# rax = (argument & -0.0) ? 0x8000000000000000 : 0
	mov	rdx, 0x3FF0000000000000
	or	rax, rdx		# rax = (argument & -0.0) ? -0x1.0p+0 : 0x1.0p+0
	movq	xmm0, rax		# xmm0 = round(argument)
	ret
.Lzero:
	cqo				# rdx = (argument & -0.0) ? -1 : 0
	shl	rdx, 63			# rdx = (argument & -0.0) ? 0x8000000000000000 : 0
	movq	xmm0, rdx		# xmm0 = (argument & -0.0) ? -0.0 : 0.0
	ret
.Lmxcsr:
	addsd	xmm0, xmm0
.Lexit:
	ret

.size	round, .-round
.type	round, @function
.global	round
.end
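
Rounding ties away from 0 can also be expressed as truncation followed by a step away from 0 when the discarded fraction is at least one half. The following C sketch illustrates this under the assumption of IEEE 754 binary64 doubles; `round_sketch` and `round_sketch_demo` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical C sketch: truncate, then step away from zero when the
   discarded fraction is at least one half. */
double round_sketch(double x)
{
    const double two52 = 4503599627370496.0;    /* 0x1.0p+52 */
    double t, fraction;

    if (x != x || x >= two52 || x <= -two52 || x == 0.0)
        return x;                   /* NaN, ±infinity, integer or ±0.0 */

    t = (double) (long long) x;     /* trunc(x), exact since |x| < 2**52 */
    fraction = x - t;               /* exact, same sign as x */
    if (fraction >= 0.5)
        t += 1.0;
    else if (fraction <= -0.5)
        t -= 1.0;
    if (t == 0.0 && x < 0.0)
        t = -0.0;                   /* -0.0 for x in (-0.5, -0.0) */
    return t;
}

void round_sketch_demo(void)
{
    assert(round_sketch(0.5) == 1.0);
    assert(round_sketch(0.75) == 1.0);
    assert(round_sketch(2.5) == 3.0);
    assert(round_sketch(-2.5) == -3.0);
    assert(round_sketch(0x1.fffffffffffffp-2) == 0.0);  /* largest number below 0.5 */
    assert(1.0 / round_sketch(-0.25) < 0.0);            /* result is -0.0 */
}
```

The largest number below 0.5 is a useful test case, since the naive `floor(x + 0.5)` rounds it up to 1.0.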

trunc() Function

The function trunc() returns its argument rounded toward 0, i.e. the integral value nearest to, but not greater in magnitude than, its argument.
# Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

# NOTE: trunc() returns -0.0 for argument in (-1.0, -0.0]

.arch	generic64
.code64
.equiv	BIAS, 1023
.intel_syntax noprefix
.text
					# xmm0 = argument
trunc:
	movq	rax, xmm0		# rax = argument
	mov	rcx, rax		# rcx = argument
	add	rcx, rcx
	jz	.Lexit			# argument = ±0.0?

	shr	rcx, 53			# rcx = biased exponent of |argument|
	sub	ecx, BIAS		# rcx = unbiased exponent of |argument|
	jl	.Lzero			# |argument| < 1.0?

	cmp	ecx, BIAS
	jg	.Lmxcsr			# argument = ±INFINITY?
					# argument = INDEFINITE?
	sub	ecx, 52
	jge	.Lexit			# |argument| >= 0x1.0p+52?

	neg	ecx			# ecx = number of bits in fractional part of mantissa
	shr	rax, cl
	shl	rax, cl			# rax = trunc(argument)
	movq	xmm0, rax		# xmm0 = trunc(argument)
	ret
.Lzero:
	cqo				# rdx = (argument & -0.0) ? -1 : 0
	shl	rdx, 63			# rdx = (argument & -0.0) ? 0x8000000000000000 : 0
	movq	xmm0, rdx		# xmm0 = (argument & -0.0) ? -0.0 : 0.0
	ret
.Lmxcsr:
	addsd	xmm0, xmm0
.Lexit:
	ret

.size	trunc, .-trunc
.type	trunc, @function
.global	trunc
.end
; Copyright © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>

; https://msdn.microsoft.com/en-us/library/mt720727.aspx

; NOTE: trunc() returns -0.0 for argument in (-1.0, -0.0]

	.code

double	record	sign:1, exponent:11, mantissa:52

bias	equ	1 shl (width exponent - 1) - 1

trunc	proc	public			; xmm0 = argument

	movd	rax, xmm0		; rax = argument
	add	rax, rax
	jz	Lexit			; argument = ±0.0?

	shr	rax, 1 + width mantissa	; rax = biased exponent of |argument|
	cmp	eax, bias + width mantissa
	jae	Lexit			; |argument| >= 0x1.0p+52?
					; (argument = integer?)
					; argument = INDEFINITE?
	cvttsd2si rax, xmm0		; rax = trunc(argument)
	cvtsi2sd xmm1, rax		; xmm1 = trunc(argument)
	xorpd	xmm2, xmm2		; xmm2 = 0.0
	subsd	xmm2, xmm0		; xmm2 = -argument
	xorpd	xmm2, xmm0		; xmm2 = argument ^ -argument
					;      = -0.0
	andpd	xmm0, xmm2		; xmm0 = (argument & -0.0) ? -0.0 : +0.0
	orpd	xmm0, xmm1		; xmm0 = trunc(argument)
Lexit:
	ret

trunc	endp
	end
Note: returns a signaling NaN unchanged!
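
The shift-based listing for trunc() clears the fractional bits of the mantissa and keeps only the sign when the magnitude is below 1.0. A minimal C sketch of the same idea follows, assuming IEEE 754 binary64 doubles; `trunc_sketch` and `trunc_sketch_demo` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C rendering: clearing the fractional bits of the mantissa
   truncates toward zero. */
double trunc_sketch(double x)
{
    uint64_t bits;
    int exponent;

    memcpy(&bits, &x, sizeof bits);
    exponent = (int) ((bits >> 52) & 2047) - 1023;  /* unbiased exponent */

    if (exponent >= 52)                 /* integer, ±infinity or NaN */
        return x;
    if (exponent < 0)                   /* |x| < 1.0: keep only the sign */
        bits &= UINT64_C(1) << 63;
    else                                /* clear 52 - exponent fraction bits */
        bits &= ~(UINT64_C(0x000FFFFFFFFFFFFF) >> exponent);
    memcpy(&x, &bits, sizeof bits);
    return x;
}

void trunc_sketch_demo(void)
{
    assert(trunc_sketch(2.75) == 2.0);
    assert(trunc_sketch(-2.75) == -2.0);
    assert(trunc_sketch(0.9) == 0.0);
    assert(1.0 / trunc_sketch(-0.9) < 0.0);     /* result is -0.0 */
    assert(trunc_sketch(4503599627370497.0) == 4503599627370497.0);
}
```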

Bibliography

Ole Møller, Quasi Double-Precision in Floating Point Addition, BIT Numerical Mathematics, Volume 5(1):37-50, March 1965, ISSN 0006-3835, 1572-9125.

Theodorus J. Dekker, A Floating-Point Technique for Extending the Available Precision, Numerische Mathematik, Volume 18(3):224-242, June 1971, ISSN 0029-599X, 0945-3245.

Pat H. Sterbenz, Floating-Point Computation, Prentice-Hall, 1974, ISBN 0-13-322495-3.

William J. Cody and William Waite, Software Manual for the Elementary Functions, Prentice-Hall, 1980, ISBN 0-13-822064-6.

Seppo I. Linnainmaa, Software for Doubled-Precision Floating-Point Computations, ACM Transactions on Mathematical Software, Volume 7(3):272-283, September 1981, ISSN 0098-3500, 1557-7295.

Mary H. Payne and Robert N. Hanek, Radian Reduction for Trigonometric Functions, ACM SIGNUM Newsletter, Volume 18(1):19-24, January 1983, ISSN 0163-5778.

Mary H. Payne and Robert N. Hanek, Degree Reduction for Trigonometric Functions, ACM SIGNUM Newsletter, Volume 18(2):18-19, April 1983, ISSN 0163-5778.

Sylvie Boldo and Guillaume Melquiond, Emulation of a FMA and Correctly Rounded Sums: Proved Algorithms Using Rounding to Odd, IEEE Transactions on Computers, Volume 57(4):462-471, April 2008, ISSN 0018-9340, 1557-9956.

Contact and Feedback

If you miss anything here, have additions, comments, corrections, criticism or questions, want to give feedback, hints or tips, report broken links, bugs, deficiencies, errors, inaccuracies, misrepresentations, omissions, shortcomings, vulnerabilities or weaknesses, …: don’t hesitate to contact me and feel free to ask, comment, criticise, flame, notify or report!

Use the X.509 certificate to send S/MIME encrypted mail.

Note: email in weird format and without a proper sender name is likely to be discarded!

I dislike HTML (and even weirder formats too) in email, I prefer to receive plain text.
I also expect to see your full (real) name as sender, not your nickname.
I abhor top posts and expect inline quotes in replies.

Terms and Conditions

By using this site, you signify your agreement to these terms and conditions. If you do not agree to these terms and conditions, do not use this site!

Data Protection Declaration

This web page records no (personal) data and stores no cookies in the web browser.

The web service is operated and provided by

Telekom Deutschland GmbH
Business Center
D-64306 Darmstadt
Germany
<‍hosting‍@‍telekom‍.‍de‍>
+49 800 5252033

The web service provider stores a session cookie in the web browser and records every visit of this web site with the following data in an access log on their server(s):


Copyright © 2005–2024 • Stefan Kanthak • <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>