
Me, myself & IT

Not Quite so Optimising Microsoft® Visual C Compiler

Purpose
Side Note
Execution Times (Sustained Reciprocal Throughput)
Case 0
Case 1
Case 2
Case 3
Case 4
Case 5
Case 6
Case 7
Case 8
Case 9
Case 10
Case 11
Case 12
Case 13
Case 14
Case 15
Case 16
Case 17
Case 18
Case 19
Case 20
Case 21
Case 22
Case 23
Case 24
Case 25
Case 26
Case 27
Case 28
Case 29
Case 30
Case 31
Case 32
Case 33
Case 34
Case 35
Case 36

Purpose

Demonstrate the poor, unoptimised or outright wrong code generated by Microsoft’s (not quite so) optimising Visual C compiler, with (currently) 37 cases, plus a side note on the implementation of the arithmetic division and multiplication routines for 64-bit integers on the x86 alias I386 processor architecture.

The most hilarious cases are 0, 1 and 5, the bugs are shown in cases 13, 34 and 35, while the worst cases are presented in cases 8 to 11, 15, 20 to 29, 32, 33 and 36.

Side Note

On the x86 alias I386 processor architecture, the arithmetic 64÷64-bit division and 64×64-bit multiplication operations are implemented via calls of several (almost) undocumented compiler helper routines: _alldiv(), _allrem() and _alldvrm() for division of two signed 64-bit integers, returning a signed 64-bit quotient, remainder or both; _aulldiv(), _aullrem() and _aulldvrm() for division of two unsigned 64-bit integers, returning an unsigned 64-bit quotient, remainder or both; plus _allmul() for multiplication of two signed as well as unsigned 64-bit integers, returning the (un)signed product modulo 2^64.
Additionally, 64-bit shift operations are implemented via calls of some more undocumented helper routines: _allshl() for both a signed or an unsigned 64-bit integer, _allshr() for a signed 64-bit integer, plus _aullshr() for an unsigned 64-bit integer.
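
For illustration, a minimal sketch in portable C of what a routine like _allmul() has to compute, using only 32×32-bit multiplications. The names mul64_from_halves() and mul64_demo() are hypothetical; this is not the code of the helper routine itself, just the arithmetic behind a product modulo 2^64:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: product of two 64-bit integers modulo 2^64,
   built from 32x32-bit multiplications only. */
uint64_t mul64_from_halves(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* full 64-bit product of the low halves (a single MUL on x86) */
    uint64_t low = (uint64_t)a_lo * b_lo;

    /* the cross products contribute only their low 32 bits to bits 32..63;
       a_hi * b_hi lies entirely above bit 63 and is dropped */
    uint32_t cross = a_hi * b_lo + a_lo * b_hi;

    return low + ((uint64_t)cross << 32);
}

/* usage example: (2^32 + 1)^2 = 2^64 + 2^33 + 1, i.e. 2^33 + 1 modulo 2^64 */
void mul64_demo(void)
{
    assert(mul64_from_halves(0x100000001ull, 0x100000001ull) == 0x200000001ull);
}
```

Since the result is reduced modulo 2^64, the same bit pattern is correct for signed and unsigned operands, which is why a single routine serves both.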

Note: all helper routines use a non-standard calling or naming convention; none of them can be called from C or C++ code!

Especially the implementation of the 64÷64-bit division routines (albeit written in assembler) is very poor: on current Intel® processors (i.e. those introduced in the last 15 years) they are about 5 to 9 times slower than properly optimised code, and about 7 to 11 times slower than native 128÷64-bit division operations.

Note: according to comments in the source code blcrtasm.asm, their initial version, handling 32-bit integers on 16-bit Intel 8086/8088 processors, was written November 29, 1983; they were modified November 19, 1993, to handle 64-bit integers on 32-bit Intel 80386 processors (introduced October 1985, i.e. 8 years earlier), but without taking advantage of the 32-bit processor’s new capabilities: the slow loop which shifts the operands by just one bit per pass with SHR and RCR instructions was not replaced with a BSR instruction followed by two pairs of SHLD plus SHL and SHRD plus SHR instructions to shift the operands in one go.
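
The idea of shifting the operands in one go can be sketched in portable C as follows. This is only an illustration under stated assumptions, not the code of NOMSVCRT.LIB or of any compiler helper routine: the names bit_length32() and udiv64_normalised() are hypothetical, and each C division by a 32-bit divisor stands for a single DIV instruction.

```c
#include <assert.h>
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

/* Number of significant bits of a non-zero 32-bit value (1..32);
   on x86 this is one BSR instruction plus an increment. */
static unsigned bit_length32(uint32_t x)
{
#if defined(_MSC_VER)
    unsigned long index;
    _BitScanReverse(&index, x);
    return (unsigned)index + 1;
#elif defined(__GNUC__)
    return 32u - (unsigned)__builtin_clz(x);
#else
    unsigned n = 0;
    while (x != 0) { x >>= 1; ++n; }
    return n;
#endif
}

/* Hypothetical name: unsigned 64/64-bit division without a per-bit loop. */
uint64_t udiv64_normalised(uint64_t n, uint64_t d)
{
    assert(d != 0);

    uint32_t d_hi = (uint32_t)(d >> 32);
    if (d_hi == 0) {
        /* divisor fits in 32 bits: "long" division in two 64/32-bit steps */
        uint32_t d_lo = (uint32_t)d;
        uint64_t q_hi = (n >> 32) / d_lo;
        uint64_t rest = (n >> 32) % d_lo;
        uint64_t q_lo = ((rest << 32) | (uint32_t)n) / d_lo;
        return (q_hi << 32) | q_lo;
    }

    /* shift both operands right in one go, so that the divisor fits in
       32 bits with its most significant bit set */
    unsigned s = bit_length32(d_hi);              /* 1..32 */
    uint64_t q = (n >> s) / (uint32_t)(d >> s);   /* exact or 1 too big */

    /* make the estimate exact or 1 too small, so that q * d cannot
       overflow, then correct it with a single comparison */
    if (q != 0)
        --q;
    if (n - q * d >= d)
        ++q;
    return q;
}
```

The estimate is at most 1 too big because the divisor keeps its 32 most significant bits; decrementing it before the check avoids an overflowing multiplication.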

Measured on an Intel processor of the Core2 family running under Windows PE, dividing 16 billion pairs of 64-bit pseudo-random numbers produced by 6 different independent (deterministic random bit) generators, _aulldiv() and _aullrem() consume from 114 to 125 processor clock cycles per call; the assembler routines provided with my own NOMSVCRT.LIB consume from 16 to 32 processor clock cycles per call, while the native 64-bit machine instructions consume from 8 to 19 processor clock cycles per operation.

Note: Intel introduced these processors, codenamed Penryn, Wolfdale and Yorkfield, from late 2007 to early 2009.

For comparison: the (corresponding) __divdi3(), __moddi3(), __udivdi3() and __umoddi3() assembler routines from the builtins library of LLVM’s compiler-rt runtime libraries, originally written in December 2008 by Apple’s Stephen Canon, adapted for the Visual C compiler and improved by me, consume from 27 to 37 processor clock cycles per call.

Caveat: these heavily-optimized assembly§ routines are not shipped with current packages of LLVM for Windows; shipped instead are (nearly 5 times bigger and more than 2 times slower) optimized implementations§ written in C, which are not properly optimised either: instead of taking advantage of the 64-bit shift operations supported by their own Clang compiler, they use a bunch of conditionally executed complementary 32-bit left and right shift operations to handle shift counts below and above the word length individually, then combine their results with logical OR operations.

Note: even this not so optimised C implementation is about 2 to 3 times faster than Microsoft’s assembler implementation!

§ These questionable attributions are made on the compiler-rt runtime libraries web page.

Warning: the _lldiv(), _llrem(), _ulldiv() and _ullrem() assembler routines published by AMD® in the following guides have bugs and return wrong results: Software Optimization Guide for AMD Family 15h Processors, Publication No. 47414, Revision 3.06, January 2012; Software Optimization Guide for AMD Family 10h and 12h Processors, Publication No. 40546, Revision 3.13, February 2011; Software Optimization Guide for AMD Family 10h and 12h Processors, Publication No. 40546, Revision 3.10, February 2009; Software Optimization Guide for AMD64 Processors, Publication No. 25112, Revision 3.06, September 2005; Software Optimization Guide for AMD Athlon 64 and AMD Opteron Processors, Publication No. 25112, Revision 3.04, March 2004; Software Optimization Guide for AMD Athlon 64 and AMD Opteron Processors, Publication No. 25112, Revision 3.03, September 2003; AMD Athlon Processor x86 Code Optimization Guide, Publication No. 22007, Revision K, February 2002.
For example, the unsigned division 18446744073709551615÷4294967299 yields the quotient 4294967294 instead of 4294967293 (in other notation, (2^64−1)÷(2^32+3) = 2^32−2 instead of 2^32−3, or 0xFFFFFFFFFFFFFFFF÷0x100000003 = 0xFFFFFFFE instead of 0xFFFFFFFD), and the remainder 18446744069414584325 instead of 8.
This bug shows for multiple (other) dividends too, and also with the divisors 7516192769=0x1C0000001, 15032385539=0x380000003, …!
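
The figures of this example can be checked with a few lines of portable C. The function name is_valid_division() is hypothetical; the sketch only tests the defining property of quotient and remainder and is not taken from any of the guides:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: returns 1 if q and r are the quotient and remainder
   of n divided by d, i.e. if n = q * d + r with 0 <= r < d, else 0. */
int is_valid_division(uint64_t n, uint64_t d, uint64_t q, uint64_t r)
{
    assert(d != 0);

    /* r <= n guards the subtraction against wrap-around */
    return r < d && r <= n && (n - r) % d == 0 && (n - r) / d == q;
}
```

With these definitions, is_valid_division(18446744073709551615, 4294967299, 4294967293, 8) holds, while the pair 4294967294 and 18446744069414584325 fails: that remainder is what the dividend minus the wrong quotient times the divisor gives modulo 2^64.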

Execution Times (Sustained Reciprocal Throughput)

Measurements are performed using 8 runs of 64-bit pseudo-random numbers, produced by 6 different independent (deterministic random bit) generators, with 1 billion divisions per run returning the quotient and 1 billion divisions per run returning the remainder, totalling 16 billion divisions.

The table shows the execution times of 64÷64-bit division routines from different libraries on several processors, in average, minimum and maximum processor clock cycles per call, as well as their code sizes in bytes and number of instructions; the upper half for the routines written in assembler, the lower half for the native 64-bit hardware and the routines written in C.

Each cell lists the minimum/average/maximum number of processor clock cycles per function call.

                        NOMSVCRT.LIB                LLVM Compiler-RT            Microsoft
Processor               _aulldiv()    _aullrem()    _aulldiv()    _aullrem()    _aulldiv()    _aullrem()
AMD Ryzen5 3600         5/8/12        7/10/12       12/16/19      13/15/18      44/50/59      47/49/53
AMD Ryzen7 2700X        9/11/16       11/14/17      16/18/23      16/18/23      53/58/63      56/64/72
Intel Core i5-8400      13/16/19      16/20/23      16/24/27      16/24/27      140/146/154   139/148/154
Intel Core i5-7400      10/18/37      13/20/38      13/24/45      13/24/45      129/136/141   132/140/148
Intel Core i5-6600      13/16/19      16/21/23      16/24/27      16/24/27      133/144/155   135/146/154
Intel Core i5-4670      9/16/19       12/19/22      17/22/26      16/24/28      118/128/141   119/129/137
Intel Core i5-3550      12/16/20      16/20/23      17/24/29      18/24/29      118/130/149   122/135/143
Intel Core i3-2328M     21/22         24/26         23/35         22/35         129/158       142/164
Intel Core2 Duo P8700   16/19/23      18/24/32      28/32/37      27/31/36      114/117/121   117/122/125
Intel Core2 Duo E8500   17/19/24      20/24/30      29/32/37      28/31/36      114/119/125   119/124/128
Intel Core2 Quad Q8400  17/19/24      20/24/30      29/33/38      27/32/37      116/120/127   119/125/132
AMD A4-9125 Radeon3     25/30/37      22/29/38      32/37/48      31/38/48      91/94/96      93/97/102
AMD AthlonII X4 635     68/72/75      69/71/73      64/72/78      64/72/78      129/134/139   131/136/138
Intel Pentium®4         87/112/150    98/124/163    97/129/181    117/143/182   301/332/376   320/351/371
instructions            36            39            50 [66]       51 [71]       42            43
bytes                   92            109           125 [157]     132 [172]     102           115

Note: the values in square brackets in the last two rows denote the number of instructions and bytes of the original __udivdi3() and __umoddi3() assembler routines.

Each cell lists the minimum/average/maximum number of processor clock cycles per operation or function call.

                        Native                      LLVM Compiler-RT            Microsoft
Processor               DIV           REM           __udivdi3()   __umoddi3()   __udivdi3()   __umoddi3()
AMD Ryzen5 3600         5/7/10        5/7/10        31/46/57      37/53/63      34/56/75      35/57/73
AMD Ryzen7 2700X        5/7/10        5/7/10        36/51/61      43/56/67      42/61/76      44/61/74
Intel Core i5-8400      17/20/23      17/20/23      34/49/56      44/58/65      43/67/81      47/68/82
Intel Core i5-7400      16/18/19      16/18/20      30/45/52      36/51/59      33/58/72      33/58/72
Intel Core i5-6600      17/20/23      17/20/23      30/49/56      43/57/65      36/65/81      39/66/82
Intel Core i5-4670      18/21/24      18/21/24      36/47/54      43/56/63      32/58/72      35/59/73
Intel Core i5-3550      21/25/27      21/25/27      28/44/53      36/52/58      33/60/75      35/61/77
Intel Core i3-2328M     21/26         23/25         45/65         56/74         -             -
Intel Core2 Duo P8700   8/14/17       9/14/19       49/59/67      62/72/82      60/75/82      61/75/83
Intel Core2 Duo E8500   -             -             51/60/67      63/72/82      61/76/83      63/76/84
Intel Core2 Quad Q8400  8/14/19       8/14/19       50/60/67      63/73/82      62/77/83      62/77/84
AMD A4-9125 Radeon3     6/8/10        6/8/11        52/66/77      63/73/88      61/83/99      62/84/102
AMD AthlonII X4 635     74/76/80      74/76/80      51/65/72      60/74/82      56/71/82      57/72/80
Intel Pentium®4         -             -             99/125/162    100/143/190   76/112/145    78/114/146
instructions            3             4             8 + 254       33 + 254      8 + 263       12 + 263
bytes                   7             10            27 + 732      79 + 732      27 + 638      40 + 638

Note: when optimising the __udivmoddi4() routine for speed, Microsoft’s current Visual C 2017 compiler emits 9 instructions more than LLVM’s Clang compiler, which however occupy 94 bytes less; the __udivdi3() and __umoddi3() routines call __udivmoddi4() to perform the division, just like the __divdi3() and __moddi3() routines shown in case 23.

Case 0

According to their documentation on MSDN, the preprocessor macros Int32x32To64 and UInt32x32To64 defined in the header file WINNT.H of the Windows SDK (are supposed to) generate just a single multiply instruction:
Multiplies two signed 32-bit integers, returning a signed 64-bit integer result. The function performs optimally on 32-bit Windows.
[…]
This function is implemented on all platforms by optimal inline code: a single multiply instruction that returns a 64-bit result.
Multiplies two unsigned 32-bit integers, returning an unsigned 64-bit integer result. The function performs optimally on 32-bit Windows.
[…]
This function is implemented on all platforms by optimal inline code: a single multiply instruction that returns a 64-bit result.
Contrary to these statements, or rather advertisements, the 32-bit Visual C compiler however generates a call of the (almost) undocumented compiler helper routine _allmul() instead of the single multiply instruction!

Demonstration

Note: if necessary, see the MSDN article Use the Microsoft C++ toolset from the command line for an introduction.
  1. Create the text file case0.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2004-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #define Int32x32To64(a, b)  ((long long)(((long long)((long)(a))) * ((long)(b))))
    #define UInt32x32To64(a, b) ((unsigned long long)(((unsigned long long)((unsigned int)(a))) * ((unsigned int)(b))))
    
    int main(int argc)
    {
        long long x = argc * -argc;
        long long y = Int32x32To64(argc, -argc);
        long long z = UInt32x32To64(argc, -argc);
    }
  2. Generate the assembly listing file case0.asm from the source file case0.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Tccase0.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case0.c
  3. Display the assembly listing file case0.asm created in step 2.:

    TYPE case0.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case0.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_main
    EXTRN	__allmul:PROC
    
    ; Function compile flags: /Odtp
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _z$ = -24						; size = 8
    _y$ = -16						; size = 8
    _x$ = -8						; size = 8
    _argc$ = 8						; size = 4
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case0.c
    ; Line 7
    	push	ebp
    	mov	ebp, esp
    	sub	esp, 24					; 00000018H
    	push	esi
    ; Line 8
    	mov	eax, DWORD PTR _argc$[ebp]
    	neg	eax
    	imul	DWORD PTR _argc$[ebp]	; proposed instead of the next 2 instructions, not emitted by the compiler
    	imul	eax, DWORD PTR _argc$[ebp]
    	cdq
    	mov	DWORD PTR _x$[ebp], eax
    	mov	DWORD PTR _x$[ebp+4], edx
    ; Line 9
    	mov	eax, DWORD PTR _argc$[ebp]
    	cdq
    	mov	ecx, eax
    	mov	esi, edx
    	mov	eax, DWORD PTR _argc$[ebp]
    	neg	eax
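    	imul	DWORD PTR _argc$[ebp]	; proposed instead of the call of __allmul and the CDQ, register MOV and PUSH instructions preparing its arguments, not emitted by the compiler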
    	cdq
    	push	edx
    	push	eax
    	push	esi
    	push	ecx
    	call	__allmul
    	mov	DWORD PTR _y$[ebp], eax
    	mov	DWORD PTR _y$[ebp+4], edx
    ; Line 10
    	mov	edx, DWORD PTR _argc$[ebp]
    	neg	edx
    	mov	eax, DWORD PTR _argc$[ebp]
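    	neg	eax	; these 2 instructions are proposed instead of MOV EDX, NEG EDX and MUL EDX,
    	mul	DWORD PTR _argc$[ebp]	; they are not emitted by the compiler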
    	mul	edx
    	mov	DWORD PTR _z$[ebp], eax
    	mov	DWORD PTR _z$[ebp+4], edx
    ; Line 11
    	xor	eax, eax
    	pop	esi
    	mov	esp, ebp
    	pop	ebp
    	leave	; proposed instead of the preceding 2 instructions, not emitted by the compiler
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    Notice the difference between unsigned and signed multiplication: while a single multiply instruction is generated for the former, a call of the (almost) undocumented compiler helper routine _allmul() is generated for the latter!
Note: for a real life example where such unoptimised code is generated, see the MSDN article Converting a time_t Value to a File Time and the MSKB article 167296!

Fix

Both preprocessor macros should have been replaced a long time ago by the intrinsic functions __emul() and __emulu() introduced with the Visual C 2005 compiler!
#if _MSC_VER < 1400
#define Int32x32To64(a, b)  ((long long)(((long long)((long)(a))) * ((long)(b))))
#define UInt32x32To64(a, b) ((unsigned long long)(((unsigned long long)((unsigned int)(a))) * ((unsigned int)(b))))
#else
         long long __emul(int, int);
unsigned long long __emulu(unsigned int, unsigned int);
#pragma intrinsic(__emul, __emulu)
#define Int32x32To64  __emul
#define UInt32x32To64 __emulu
#endif
Note: Visual C 2017 and earlier fail to provide the inverse intrinsic functions for 64÷32-bit integer division; Visual C 2019 finally introduced them as _div64() and _udiv64().
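
As a portable illustration of what the two intrinsic functions compute, here is a minimal sketch in standard C. The function names are hypothetical; the point is merely that the operands must be widened before the multiplication, otherwise the upper half of the product is lost:

```c
#include <assert.h>
#include <stdint.h>

/* Widening multiplication: one operand is converted first, so the full
   64-bit product is kept (the operation __emulu() stands for). */
uint64_t widening_mul_u32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Signed counterpart (the operation __emul() stands for). */
int64_t widening_mul_i32(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}

/* Narrow multiplication: the product is truncated to 32 bits before the
   conversion, so its upper half is lost. */
uint64_t narrow_mul_u32(uint32_t a, uint32_t b)
{
    return (uint64_t)(a * b);
}

/* usage example: 2^31 * 2 = 2^32 needs 33 bits */
void widening_mul_demo(void)
{
    assert(widening_mul_u32(0x80000000u, 2u) == 0x100000000ull);
    assert(narrow_mul_u32(0x80000000u, 2u) == 0);
}
```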

Of course this also applies to the preprocessor macros (really: inline assembler functions) Int64ShllMod32(), Int64ShraMod32() and Int64ShrlMod32() defined in the header file WINNT.H of the Windows SDK; these too should have been replaced a long time ago by the intrinsic functions __ll_lshift(), __ll_rshift() and __ull_rshift() introduced with the Visual C 2005 compiler!

#if _MSC_VER < 1400
…
#else
unsigned long long __ll_lshift(unsigned long long, int);
         long long __ll_rshift(long long, int);
unsigned long long __ull_rshift(unsigned long long, int);
#pragma intrinsic(__ll_lshift, __ll_rshift, __ull_rshift)
#define Int64ShllMod32 __ll_lshift
#define Int64ShraMod32 __ll_rshift
#define Int64ShrlMod32 __ull_rshift
#endif
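
For illustration, a minimal sketch in portable C of a 64-bit left shift composed from two 32-bit halves, for the shift counts 0 to 31 which Int64ShllMod32() handles. The name shl64_mod32() is hypothetical, and the comments name the x86 instructions each step corresponds to; this is not the code from WINNT.H:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: logical left shift of a 64-bit value held as two
   32-bit halves, for shift counts 0..31. */
uint64_t shl64_mod32(uint64_t value, unsigned count)
{
    uint32_t lo = (uint32_t)value;
    uint32_t hi = (uint32_t)(value >> 32);

    assert(count < 32);

    if (count != 0) {   /* a 32-bit shift by 32 - 0 would be undefined in C */
        /* SHLD: shift hi left, filling its low bits from the top of lo */
        hi = (hi << count) | (lo >> (32 - count));
        /* SHL: shift lo left, filling with zeroes */
        lo <<= count;
    }
    return ((uint64_t)hi << 32) | lo;
}
```

The right shifts follow the same pattern with the roles of the two halves exchanged (SHRD plus SHR, or SAR for the arithmetic variant).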
The sample code for converting from seconds since January 1, 1970, to 100-nanosecond intervals since January 1, 1601, should be (re)written without preprocessor macros and use the intrinsic function __emulu() instead:
#include <windows.h>

VOID EpochToFileTime(ULONG seconds, LPFILETIME pft)
{
    ULONGLONG ull = __emulu(seconds, 10000000UL) + 116444736000000000ULL;
    pft->dwLowDateTime = ull;
    pft->dwHighDateTime = ull >> 32;
}
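
The same conversion can be sketched without any Windows header, which makes it easy to check the arithmetic on any platform. The type filetime_t and the function names are hypothetical stand-ins for FILETIME and the routine above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two 32-bit members of FILETIME */
typedef struct {
    uint32_t low;
    uint32_t high;
} filetime_t;

/* 100-nanosecond intervals per second */
#define TICKS_PER_SECOND 10000000u
/* 100-nanosecond intervals from January 1, 1601, to January 1, 1970 */
#define EPOCH_DIFFERENCE 116444736000000000ull

/* widening 32x32-bit multiplication, then a 64-bit addition */
uint64_t epoch_to_ticks(uint32_t seconds)
{
    return (uint64_t)seconds * TICKS_PER_SECOND + EPOCH_DIFFERENCE;
}

void epoch_to_filetime(uint32_t seconds, filetime_t *ft)
{
    uint64_t ticks = epoch_to_ticks(seconds);

    assert(ft != 0);

    ft->low = (uint32_t)ticks;           /* explicit truncation */
    ft->high = (uint32_t)(ticks >> 32);
}
```

The explicit casts document the intended truncation to 32 bits; the sum cannot overflow, since even 4294967295 seconds yield less than 2^64 intervals.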

Case 1

Demonstration

  1. Create the text file case1.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    __inline
    long long __fastcall Int32x32To64(long v, long w)
    {
        return (long long) v * w;
    }
    
    long __fastcall Int32x32To64Div32(long x, long y, long z)
    {
        return Int32x32To64(x, y) / z;
    }
    
    long __fastcall Int32x32To64Rem32(long x, long y, long z)
    {
        return Int32x32To64(x, y) % z;
    }
    
    __inline
    unsigned long long __fastcall UInt32x32To64(unsigned long v, unsigned long w)
    {
        return (unsigned long long) v * w;
    }
    
    unsigned long __fastcall UInt32x32To64Div32(unsigned long x, unsigned long y, unsigned long z)
    {
        return UInt32x32To64(x, y) / z;
    }
    
    unsigned long __fastcall UInt32x32To64Rem32(unsigned long x, unsigned long y, unsigned long z)
    {
        return UInt32x32To64(x, y) % z;
    }
  2. Generate the assembly listing file case1.asm from the source file case1.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase1.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case1.c
    case1.c(11): warning C4244: 'return' : conversion from '__int64' to 'long', possible loss of data
    case1.c(27): warning C4244: 'return' : conversion from 'unsigned __int64' to 'unsigned long', possible loss of data
  3. Display the assembly listing file case1.asm created in step 2.:

    TYPE case1.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case1.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	@Int32x32To64@8
    PUBLIC	@Int32x32To64Div32@12
    PUBLIC	@Int32x32To64Rem32@12
    PUBLIC	@UInt32x32To64@8
    PUBLIC	@UInt32x32To64Div32@12
    PUBLIC	@UInt32x32To64Rem32@12
    EXTRN	__alldiv:PROC
    EXTRN	__allrem:PROC
    EXTRN	__aulldiv:PROC
    EXTRN	__aullrem:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@Int32x32To64@8
    _TEXT	SEGMENT
    @Int32x32To64@8 PROC					; COMDAT
    ; _v$ = ecx
    ; _w$ = edx
    ; File c:\users\stefan\desktop\case1.c
    ; Line 6
    	mov	eax, ecx
    	imul	edx
    ; Line 7
    	ret	0
    @Int32x32To64@8 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@Int32x32To64Div32@12
    _TEXT	SEGMENT
    _z$ = 8							; size = 4
    @Int32x32To64Div32@12 PROC				; COMDAT
    _x$ = ecx
    _y$ = edx
    ; File c:\users\stefan\desktop\case1.c
    ; Line 6
    	mov	eax, ecx
    	imul	edx
    ; Line 10
    	push	esi
    ; Line 6
    	mov	esi, eax
    	mov	ecx, edx
    ; Line 11
    	idiv	DWORD PTR _z$[esp-4]	; proposed instead of the call of __alldiv and its setup, not emitted by the compiler
    	mov	eax, DWORD PTR _z$[esp]
    	cdq
    	push	edx
    	push	eax
    	push	ecx
    	push	esi
    	call	__alldiv
    	pop	esi
    ; Line 12
    	ret	4
    @Int32x32To64Div32@12 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@Int32x32To64Rem32@12
    _TEXT	SEGMENT
    _z$ = 8							; size = 4
    @Int32x32To64Rem32@12 PROC				; COMDAT
    _x$ = ecx
    _y$ = edx
    ; File c:\users\stefan\desktop\case1.c
    ; Line 6
    	mov	eax, ecx
    	imul	edx
    ; Line 15
    	push	esi
    ; Line 6
    	mov	esi, eax
    	mov	ecx, edx
    ; Line 16
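    	idiv	DWORD PTR _z$[esp-4]	; these 2 instructions are proposed instead of the call of __allrem and its setup,
    	mov	eax, edx	; they are not emitted by the compiler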
    	mov	eax, DWORD PTR _z$[esp]
    	cdq
    	push	edx
    	push	eax
    	push	ecx
    	push	esi
    	call	__allrem
    	pop	esi
    ; Line 17
    	ret	0
    @Int32x32To64Rem32@12 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@UInt32x32To64@8
    _TEXT	SEGMENT
    @UInt32x32To64@8 PROC					; COMDAT
    ; _v$ = ecx
    ; _w$ = edx
    ; File c:\users\stefan\desktop\case1.c
    ; Line 22
    	mov	eax, ecx
    	mul	edx
    ; Line 23
    	ret	0
    @UInt32x32To64@8 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@UInt32x32To64Div32@12
    _TEXT	SEGMENT
    _z$ = 8							; size = 4
    @UInt32x32To64Div32@12 PROC				; COMDAT
    _x$ = ecx
    _y$ = edx
    ; File c:\users\stefan\desktop\case1.c
    ; Line 22
    	mov	eax, ecx
    	mul	edx
    ; Line 27
    	div	DWORD PTR _z$[esp-4]	; proposed instead of the call of __aulldiv and its setup, not emitted by the compiler
    	push	0
    	push	DWORD PTR _z$[esp]
    	push	edx
    	push	eax
    	call	__aulldiv
    ; Line 28
    	ret	0
    @UInt32x32To64Div32@12 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@UInt32x32To64Rem32@12
    _TEXT	SEGMENT
    _z$ = 8							; size = 4
    @UInt32x32To64Rem32@12 PROC				; COMDAT
    _x$ = ecx
    _y$ = edx
    ; File c:\users\stefan\desktop\case1.c
    ; Line 22
    	mov	eax, ecx
    	mul	edx
    ; Line 32
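    	div	DWORD PTR _z$[esp-4]	; these 2 instructions are proposed instead of the call of __aullrem and its setup,
    	mov	eax, edx	; they are not emitted by the compiler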
    	push	0
    	push	DWORD PTR _z$[esp]
    	push	edx
    	push	eax
    	call	__aullrem
    ; Line 33
    	ret	0
    @UInt32x32To64Rem32@12 ENDP
    _TEXT	ENDS
    END
    While the compiler here (contrary to case 0) generates the proper code for the multiplications, it nevertheless fails to generate the corresponding proper code for the immediately following divisions.

    Also notice the difference between the signed and unsigned variants of the combined multiplication and division routines: in the signed variants, instead of pushing the (properly sign-extended) divisor first and the product afterwards, the product is computed first, then moved into two (intermediate) registers which are finally pushed for the calls of the _alldiv() and _allrem() compiler helper routines, whereas the unsigned variants push the product directly for the calls of _aulldiv() and _aullrem().

Case 2

Demonstration

  1. Create the text file case2.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    extern long long foo(void);
    extern long long bar(void);
    
    long long product(void)
    {
        return foo() * bar();
    }
  2. Generate the assembly listing file case2.asm from the source file case2.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase2.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case2.c
  3. Display the assembly listing file case2.asm created in step 2.:

    TYPE case2.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case2.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_product
    EXTRN	_foo:PROC
    EXTRN	_bar:PROC
    EXTRN	__allmul:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_product
    _TEXT	SEGMENT
    _product PROC						; COMDAT
    ; File c:\users\stefan\desktop\case2.c
    ; Line 7
    	push	esi
    	push	edi
    ; Line 8
    	call	_foo
    	mov	edi, eax
    	mov	esi, edx
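    	push	edx	; these 2 instructions are proposed instead of saving the result of foo() in EDI and ESI,
    	push	eax	; they are not emitted by the compiler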
    	call	_bar
    	push	edx
    	push	eax
    	push	esi
    	push	edi
    	call	__allmul
    	pop	edi
    	pop	esi
    ; Line 9
    	ret	0
    _product ENDP
    _TEXT	ENDS
    END
    Multiplication is commutative, so the arguments for the external routine _allmul() can be swapped, saving 6 of the 13 instructions generated, and without clobbering the registers EDI and ESI for intermediate storage.

Case 3

Demonstration

  1. Create the text file case3.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    long long __stdcall div(long long foo, long long bar)
    {
        return foo / bar;
    }
    
    long long __stdcall mod(long long foo, long long bar)
    {
        return foo % bar;
    }
    
    long long __stdcall mul(long long foo, long long bar)
    {
        return foo * bar;
    }
  2. Generate the assembly listing file case3.asm from the source file case3.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase3.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case3.c
  3. Display the assembly listing file case3.asm created in step 2.:

    TYPE case3.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case3.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_div@16
    PUBLIC	_mod@16
    PUBLIC	_mul@16
    EXTRN	__alldiv:PROC
    EXTRN	__allmul:PROC
    EXTRN	__allrem:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_div@16
    _TEXT	SEGMENT
    _foo$ = 8						; size = 8
    _bar$ = 16						; size = 8
    _div@16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case3.c
    ; Line 5
    	push	DWORD PTR _bar$[esp]
    	push	DWORD PTR _bar$[esp]
    	push	DWORD PTR _foo$[esp+8]
    	push	DWORD PTR _foo$[esp+8]
    	call	__alldiv
    	jmp	__alldiv	; proposed instead of all 6 generated instructions of this routine, not emitted by the compiler
    ; Line 6
    	ret	16					; 00000010H
    _div@16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_mod@16
    _TEXT	SEGMENT
    _foo$ = 8						; size = 8
    _bar$ = 16						; size = 8
    _mod@16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case3.c
    ; Line 10
    	push	DWORD PTR _bar$[esp]
    	push	DWORD PTR _bar$[esp]
    	push	DWORD PTR _foo$[esp+8]
    	push	DWORD PTR _foo$[esp+8]
    	call	__allrem
    	jmp	__allrem	; proposed instead of all 6 generated instructions of this routine, not emitted by the compiler
    ; Line 11
    	ret	16					; 00000010H
    _mod@16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_mul@16
    _TEXT	SEGMENT
    _foo$ = 8						; size = 8
    _bar$ = 16						; size = 8
    _mul@16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case3.c
    ; Line 15
    	push	DWORD PTR _bar$[esp]
    	push	DWORD PTR _bar$[esp]
    	push	DWORD PTR _foo$[esp+8]
    	push	DWORD PTR _foo$[esp+8]
    	call	__allmul
    	jmp	__allmul	; proposed instead of all 6 generated instructions of this routine, not emitted by the compiler
    ; Line 16
    	ret	16					; 00000010H
    _mul@16	ENDP
    _TEXT	ENDS
    END

Case 4

The compiler generates separate calls of the compiler helper routines _aulldiv() and _aullrem() instead of a single call of _aulldvrm().
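
Why one division suffices can be sketched in portable C: the remainder follows from the quotient with one multiplication and one subtraction. The name udivmod64() is hypothetical, and the sketch says nothing about which helper routines a given compiler calls for it:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: returns the quotient and optionally stores the
   remainder, performing just one division. */
uint64_t udivmod64(uint64_t dividend, uint64_t divisor, uint64_t *remainder)
{
    uint64_t quotient;

    assert(divisor != 0);

    quotient = dividend / divisor;

    /* quotient * divisor <= dividend, so neither operation can wrap */
    if (remainder != 0)
        *remainder = dividend - quotient * divisor;

    return quotient;
}
```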

Demonstration

  1. Create the text file case4.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long long __udivmoddi4(unsigned long long Dividend,
                                    unsigned long long Divisor,
                                    unsigned long long *Remainder)
    {
    #ifdef ALTERNATE
        const unsigned long long Quotient = Dividend / Divisor;
        const unsigned long long Modulus = Dividend % Divisor;
    
        if (Remainder != 0)
            *Remainder = Modulus;
    
        return Quotient;
    #else
        if (Remainder != 0)
            *Remainder = Dividend % Divisor;
    
        return Dividend / Divisor;
    #endif
    }
  2. Generate the assembly listing file case4.asm from the source file case4.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase4.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case4.c
  3. Display the assembly listing file case4.asm created in step 2.:

    TYPE case4.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case4.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	___udivmoddi4
    EXTRN	__aulldiv:PROC
    EXTRN	__aullrem:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	___udivmoddi4
    _TEXT	SEGMENT
    _Dividend$ = 8						; size = 8
    _Divisor$ = 16						; size = 8
    _Remainder$ = 24					; size = 4
    ___udivmoddi4 PROC					; COMDAT
    ; File c:\users\stefan\desktop\case4.c
    ; Line 16
    	mov	eax, DWORD PTR _Dividend$[esp-4]
    	push	esi
    	mov	esi, DWORD PTR _Remainder$[esp]
    	test	esi, esi
    	je	SHORT $LN2@ull
    ; Line 17
    	push	DWORD PTR _Divisor$[esp+4]
    	push	DWORD PTR _Divisor$[esp+4]
    	push	DWORD PTR _Dividend$[esp+12]
    	push	eax
    	call	__aullrem
    	mov	DWORD PTR [esi], eax
    	mov	eax, DWORD PTR _Dividend$[esp]
    	mov	DWORD PTR [esi+4], edx
    $LN2@ull:
    ; Line 19
    	push	DWORD PTR _Divisor$[esp+4]
    	push	DWORD PTR _Divisor$[esp+4]
    	push	DWORD PTR _Dividend$[esp+12]
    	push	eax
    	call	__aulldiv
    	pop	esi
    ; Line 21
    	ret	0
    ___udivmoddi4 ENDP
    _TEXT	ENDS
    END
  4. Generate another assembly listing file case4.asm from the source file case4.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tccase4.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case4.c
  5. Display the assembly listing file case4.asm created in step 4.:

    TYPE case4.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case4.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	___udivmoddi4
    EXTRN	__aulldvrm:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	___udivmoddi4
    _TEXT	SEGMENT
    _Dividend$ = 8						; size = 8
    _Divisor$ = 16						; size = 8
    _Remainder$ = 24					; size = 4
    ___udivmoddi4 PROC					; COMDAT
    ; File c:\users\stefan\desktop\case4.c
    ; Line 6
    	push	ebx
    	push	esi
    ; Line 8
    	push	DWORD PTR _Divisor$[esp+8]
    	push	DWORD PTR _Divisor$[esp+8]
    	push	DWORD PTR _Dividend$[esp+16]
    	push	DWORD PTR _Dividend$[esp+16]
    	call	__aulldvrm
    ; Line 11
    	mov	esi, DWORD PTR _Remainder$[esp+4]
    	test	esi, esi
    	je	SHORT $LN2@ull
    ; Line 12
    	mov	DWORD PTR [esi], ecx
    	mov	DWORD PTR [esi+4], ebx
    $LN2@ull:
    ; Line 21
    	pop	esi
    	pop	ebx
    	ret	0
    ___udivmoddi4 ENDP
    _TEXT	ENDS
    END
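
The two listings above differ mainly in the number of helper calls: the first calls both __aullrem and __aulldiv, the second calls only __aulldvrm. Independent of any helper routine, a single division suffices, because the remainder follows from the quotient with one multiplication and one subtraction. The following is a minimal sketch of that idea, not the code of any compiler helper; the function name udivmod_single is hypothetical and not part of case4.c.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical illustration, not part of case4.c: one division,
// the remainder is derived as Dividend - Quotient * Divisor.
// As with the original function, the divisor must not be 0.
static unsigned long long udivmod_single(unsigned long long dividend,
                                         unsigned long long divisor,
                                         unsigned long long *remainder)
{
    const unsigned long long quotient = dividend / divisor;

    // unsigned arithmetic wraps modulo 2**64, but the true result
    // always lies in the range 0 ... divisor - 1, so it is exact
    if (remainder != NULL)
        *remainder = dividend - quotient * divisor;

    return quotient;
}
```

Whether this form is faster than a combined helper call depends on the target processor; it is shown here only to make the dependency between quotient and remainder explicit.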

Case 5

Division by powers of 2 is only optimised for constant divisors.

Demonstration

  1. Create the text file case5.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #ifndef SIGNED
    unsigned dividebypowerof2(unsigned number, unsigned exponent)
    {
        return number / (1U << exponent);
    }
    
    unsigned modulopowerof2(unsigned number, unsigned exponent)
    {
        return number % (1U << exponent);
    }
    
    unsigned quotient(unsigned argument)
    {
        return dividebypowerof2(argument, 9);
    }
    
    unsigned remainder(unsigned argument)
    {
        return modulopowerof2(argument, 9);
    }
    #else
    signed dividebypowerof2(signed number, unsigned exponent)
    {
        return number / (1 << exponent);
    }
    
    signed modulopowerof2(signed number, unsigned exponent)
    {
        return number % (1 << exponent);
    }
    
    signed quotient(signed argument)
    {
        return dividebypowerof2(argument, 9);
    }
    
    signed remainder(signed argument)
    {
        return modulopowerof2(argument, 9);
    }
    #endif
  2. Generate the assembly listing file case5.asm from the source file case5.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase5.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case5.c
  3. Display the assembly listing file case5.asm created in step 2.:

    TYPE case5.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case5.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_dividebypowerof2
    PUBLIC	_modulopowerof2
    PUBLIC	_quotient
    PUBLIC	_remainder
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_dividebypowerof2
    _TEXT	SEGMENT
    _number$ = 8						; size = 4
    _exponent$ = 12						; size = 4
    _dividebypowerof2 PROC					; COMDAT
    ; File c:\users\stefan\desktop\case5.c
    ; Line 6
    	mov	ecx, DWORD PTR _exponent$[esp-4]
    	xor	edx, edx
    	mov	eax, DWORD PTR _number$[esp-4]
    	push	esi
    	mov	esi, 1
    	shl	esi, cl
    	div	esi
    	pop	esi
    	shr	eax, cl
    ; Line 7
    	ret	0
    _dividebypowerof2 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_quotient
    _TEXT	SEGMENT
    _argument$ = 8						; size = 4
    _quotient PROC						; COMDAT
    ; File c:\users\stefan\desktop\case5.c
    ; Line 6
    	mov	eax, DWORD PTR _argument$[esp-4]
    	shr	eax, 9
    ; Line 17
    	ret	0
    _quotient ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_modulopowerof2
    _TEXT	SEGMENT
    _number$ = 8						; size = 4
    _exponent$ = 12						; size = 4
    _modulopowerof2 PROC					; COMDAT
    ; File c:\users\stefan\desktop\case5.c
    ; Line 11
    	mov	ecx, DWORD PTR _exponent$[esp-4]
    	xor	edx, edx
    	or	eax, -1
    	shl	eax, cl
    	mov	eax, DWORD PTR _number$[esp-4]
    	and	eax, DWORD PTR _number$[esp-4]
    	push	esi
    	mov	esi, 1
    	shl	esi, cl
    	div	esi
    	pop	esi
    	mov	eax, edx
    ; Line 12
    	ret	0
    _modulopowerof2 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_remainder
    _TEXT	SEGMENT
    _argument$ = 8						; size = 4
    _remainder PROC						; COMDAT
    ; File c:\users\stefan\desktop\case5.c
    ; Line 11
    	mov	eax, DWORD PTR _argument$[esp-4]
    	and	eax, 511				; 000001ffH
    ; Line 22
    	ret	0
    _remainder ENDP
    _TEXT	ENDS
    END
  4. Generate another assembly listing file case5.asm from the source file case5.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro SIGNED defined on the command line:

    CL.EXE /Bv /c /DSIGNED /Fa /FoNUL: /Gy /Ox /Tccase5.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case5.c
  5. Display the assembly listing file case5.asm created in step 4.:

    TYPE case5.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case5.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_dividebypowerof2
    PUBLIC	_modulopowerof2
    PUBLIC	_quotient
    PUBLIC	_remainder
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_dividebypowerof2
    _TEXT	SEGMENT
    _number$ = 8						; size = 4
    _exponent$ = 12						; size = 4
    _dividebypowerof2 PROC					; COMDAT
    ; File c:\users\stefan\desktop\case5.c
    ; Line 26
    	mov	eax, DWORD PTR _number$[esp-4]
    	mov	ecx, DWORD PTR _exponent$[esp-4]
    	push	esi
    	mov	esi, 1
    	cdq
    	shl	esi, cl
    	idiv	esi
    	pop	esi
    	not	ecx
    	shr	edx, 1
    	shr	edx, cl
    	not	ecx
    	add	eax, edx
    	sar	eax, cl
    ; Line 27
    	ret	0
    _dividebypowerof2 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_quotient
    _TEXT	SEGMENT
    _argument$ = 8						; size = 4
    _quotient PROC						; COMDAT
    ; File c:\users\stefan\desktop\case5.c
    ; Line 26
    	mov	eax, DWORD PTR _argument$[esp-4]
    	cdq
    	and	edx, 511				; 000001ffH
    	shr	edx, 23					; 00000017H
    	add	eax, edx
    	sar	eax, 9
    ; Line 37
    	ret	0
    _quotient ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_modulopowerof2
    _TEXT	SEGMENT
    _number$ = 8						; size = 4
    _exponent$ = 12						; size = 4
    _modulopowerof2 PROC					; COMDAT
    ; File c:\users\stefan\desktop\case5.c
    ; Line 31
    	mov	eax, DWORD PTR _number$[esp-4]
    	mov	ecx, DWORD PTR _exponent$[esp-4]
    	push	ebx
    	push	esi
    	mov	esi, 1
    	cdq
    	shl	esi, cl
    	idiv	esi
    	pop	esi
    	mov	eax, edx
    	xor	ebx, ebx
    	shld	ebx, edx, cl
    	or	edx, -1
    	add	ebx, eax
    	shl	edx, cl
    	and	edx, ebx
    	sub	eax, edx
    	pop	ebx
    ; Line 32
    	ret	0
    _modulopowerof2 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_remainder
    _TEXT	SEGMENT
    _argument$ = 8						; size = 4
    _remainder PROC						; COMDAT
    ; File c:\users\stefan\desktop\case5.c
    ; Line 31
    	mov	eax, DWORD PTR _argument$[esp-4]
    	and	eax, -2147483137			; 800001ffH
    	jns	SHORT $LN5@remainder
    	dec	eax
    	or	eax, -512				; fffffe00H
    	inc	eax
    $LN5@remainder:
    	cdq
    	shr	edx, 23					; 00000017H
    	add	edx, eax
    	and	edx, -512				; fffffe00H
    	sub	eax, edx
    ; Line 42
    	ret	0
    _remainder ENDP
    _TEXT	ENDS
    END
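
For reference, the shift and mask forms that are equivalent to division and modulo by a power of 2 can be sketched in C as follows. This is a minimal sketch under stated assumptions: all four function names are hypothetical and not part of case5.c, the exponent must be less than 32 for the unsigned and less than 31 for the signed variants, and the signed variants assume that a right shift of a negative value is an arithmetic shift (implementation-defined in C, true for Visual C).

```c
#include <assert.h>
#include <limits.h>

// Hypothetical helpers, not part of case5.c.

// number / (1U << exponent), requires exponent < 32
static unsigned divide_shift(unsigned number, unsigned exponent)
{
    return number >> exponent;
}

// number % (1U << exponent), requires exponent < 32
static unsigned modulo_mask(unsigned number, unsigned exponent)
{
    return number & ((1U << exponent) - 1U);
}

// number / (1 << exponent), requires exponent < 31;
// C division truncates towards 0, so negative numbers need
// a bias of 2**exponent - 1 before the arithmetic shift
static int divide_shift_signed(int number, unsigned exponent)
{
    const int mask = (1 << exponent) - 1;
    const int bias = number < 0 ? mask : 0;

    return (number + bias) >> exponent; // assumes arithmetic shift
}

// number % (1 << exponent), requires exponent < 31;
// the remainder takes the sign of the dividend
static int modulo_mask_signed(int number, unsigned exponent)
{
    const int mask = (1 << exponent) - 1;
    const int bias = number < 0 ? mask : 0;

    return ((number + bias) & mask) - bias;
}
```

The signed variants use the same bias-then-shift idea that is visible in the listing for quotient() with the constant exponent 9.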

Case 6

Demonstration

  1. Create the text file case6.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    long long foo(long long foo)
    {
        foo <<= 1;
        foo += 1;
        foo |= 1;
    
        return foo;
    }
    
    long long bar(long long bar)
    {
        bar += bar;
        bar += 1;
        bar |= 1;
    
        return bar;
    }
  2. Generate the assembly listing file case6.asm from the source file case6.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase6.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case6.c
  3. Display the assembly listing file case6.asm created in step 2.:

    TYPE case6.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case6.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_foo
    PUBLIC	_bar
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_foo
    _TEXT	SEGMENT
    _foo$ = 8						; size = 8
    _foo	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case6.c
    ; Line 5
    	mov	eax, DWORD PTR _foo$[esp-4]
    	mov	edx, DWORD PTR _foo$[esp]
    	shld	edx, eax, 1
    	add	eax, eax
    	adc	edx, edx
    ; Line 6
    	add	eax, 1
    	adc	edx, 0
    	inc	eax
    ; Line 10
    	ret	0
    _foo	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_bar
    _TEXT	SEGMENT
    _bar$ = 8						; size = 8
    _bar	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case6.c
    ; Line 14
    	mov	eax, DWORD PTR _bar$[esp-4]
    	mov	edx, DWORD PTR _bar$[esp]
    	shld	edx, eax, 1
    	add	eax, eax
    	adc	edx, edx
    ; Line 15
    	add	eax, 1
    	adc	edx, 0
    	inc	eax
    ; Line 19
    	ret	0
    _bar	ENDP
    _TEXT	ENDS
    END
    While the optimiser recognises that the addition of 1 yields an odd number and therefore generates no code for the logical OR, it fails to recognise that both the shift of foo and the addition of bar to itself yield an even number, so the following addition of 1 can’t produce a carry, and an add with carry (ADC) instruction is nonsense!
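
The argument can be checked with a minimal C sketch that splits the 64-bit value into two 32-bit halves and sets the lowest bit without any carry propagation. The function names reference and carry_free are hypothetical; unsigned types are used here (unlike in case6.c) to avoid undefined behaviour on signed overflow.

```c
#include <assert.h>
#include <stdint.h>

// The three operations of foo() from case6.c, on an unsigned type.
static uint64_t reference(uint64_t value)
{
    value <<= 1;
    value += 1;
    value |= 1;

    return value;
}

// Hypothetical carry-free equivalent, working on 32-bit halves.
static uint64_t carry_free(uint64_t value)
{
    uint32_t low = (uint32_t) value;
    uint32_t high = (uint32_t) (value >> 32);

    // shift the most significant bit of the low half into the high
    // half: this is what SHLD edx, eax, 1 computes
    high = (high << 1) | (low >> 31);

    // the doubled low half is even, so adding 1 only sets bit 0
    // and can never carry into the high half
    low = (low << 1) | 1U;

    return ((uint64_t) high << 32) | low;
}
```

Both functions return the same result for every input, which is why no ADC instruction is required.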

Case 7

Demonstration

  1. Create the text file case7.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long long add(unsigned long low, unsigned long high)
    {
        return low + ((unsigned long long) high << 32);
    }
    
    unsigned long long or(unsigned long low, unsigned long high)
    {
        return low | ((unsigned long long) high << 32);
    }
    
    unsigned long long alias(unsigned long low, unsigned long high)
    {
        union
        {
            unsigned long      ul[2];
            unsigned long long ull;
        } dummy = {low, high};
    
        return dummy.ull;
    }
  2. Generate the assembly listing file case7.asm from the source file case7.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase7.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case7.c
  3. Display the assembly listing file case7.asm created in step 2.:

    TYPE case7.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case7.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_add
    PUBLIC	_or
    PUBLIC	_alias
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_add
    _TEXT	SEGMENT
    _low$ = 8						; size = 4
    _high$ = 12						; size = 4
    _add	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case7.c
    ; Line 5
    	mov	edx, DWORD PTR _high$[esp-4]
    	xor	eax, eax
    	add	eax, DWORD PTR _low$[esp-4]
    	adc	edx, 0
    	add	eax, DWORD PTR _low$[esp-4]
    ; Line 6
    	ret	0
    _add	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_or
    _TEXT	SEGMENT
    _low$ = 8						; size = 4
    _high$ = 12						; size = 4
    _or	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case7.c
    ; Line 10
    	mov	edx, DWORD PTR _high$[esp-4]
    	xor	eax, eax
    	or	eax, DWORD PTR _low$[esp-4]
    	add	eax, DWORD PTR _low$[esp-4]
    ; Line 11
    	ret	0
    _or	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_alias
    _TEXT	SEGMENT
    _low$ = 8						; size = 4
    _high$ = 12						; size = 4
    _alias	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case7.c
    ; Line 21
    	mov	eax, DWORD PTR _low$[esp-4]
    	mov	edx, DWORD PTR _high$[esp-4]
    ; Line 22
    	ret	0
    _alias	ENDP
    _TEXT	ENDS
    END
    The optimiser does not recognise the expressions commonly used for combining two doublewords into a single quadword.
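
All three expressions compute the same quadword, since the shifted high doubleword has no bits in common with the low doubleword. The following minimal sketch restates them with fixed-width types so that they can be compared on any compiler; the function names are hypothetical, and the union variant additionally assumes a little-endian target such as x86.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical restatement of add(), or() and alias() from case7.c
// with fixed-width types.
static uint64_t combine_add(uint32_t low, uint32_t high)
{
    return low + ((uint64_t) high << 32);
}

static uint64_t combine_or(uint32_t low, uint32_t high)
{
    return low | ((uint64_t) high << 32);
}

// Assumes little-endian byte order: halves[0] is the low doubleword.
static uint64_t combine_alias(uint32_t low, uint32_t high)
{
    union
    {
        uint32_t halves[2];
        uint64_t whole;
    } dummy = {{low, high}};

    return dummy.whole;
}
```

Ideally each variant compiles to just two loads, one into EAX and one into EDX.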

Case 8

Demonstration

  1. Create the text file case8.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long rotate32(unsigned long value, unsigned int count)
    {
        return (value << count) | (value >> (32 - count));
    }
    
    unsigned long rotate32x(unsigned long value, unsigned int count)
    {
        return (value << count) ^ (value >> (32 - count));
    }
    
    unsigned long long rotate64x(unsigned long long value, unsigned int count)
    {
        return (value << count) ^ (value >> (64 - count));
    }
    
    unsigned long long rotate64(unsigned long long value, unsigned int count)
    {
        return (value << count) | (value >> (64 - count));
    }
    
    unsigned long long intrinsic(unsigned long long value, unsigned int count)
    {
        return _rotl64(value, count);
    }
  2. Generate the assembly listing file case8.asm from the source file case8.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase8.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case8.c
  3. Display the assembly listing file case8.asm created in step 2.:

    TYPE case8.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case8.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_rotate32
    PUBLIC	_rotate32x
    PUBLIC	_rotate64x
    PUBLIC	_rotate64
    PUBLIC	_intrinsic
    EXTRN	__allshl:PROC
    EXTRN	__aullshr:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_rotate32
    _TEXT	SEGMENT
    _value$ = 8						; size = 4
    _count$ = 12						; size = 4
    _rotate32 PROC						; COMDAT
    ; File c:\users\stefan\desktop\case8.c
    ; Line 5
    	mov	eax, DWORD PTR _value$[esp-4]
    	mov	ecx, DWORD PTR _count$[esp-4]
    	rol	eax, cl
    ; Line 6
    	ret	0
    _rotate32 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_rotate32x
    _TEXT	SEGMENT
    _value$ = 8						; size = 4
    _count$ = 12						; size = 4
    _rotate32x PROC						; COMDAT
    ; File c:\users\stefan\desktop\case8.c
    ; Line 9
    	push	esi
    ; Line 10
    	mov	esi, DWORD PTR _value$[esp]
    	mov	ecx, 32					; 00000020H
    	sub	ecx, DWORD PTR _count$[esp]
    	mov	eax, esi
    	shr	eax, cl
    	mov	ecx, DWORD PTR _count$[esp]
    	shl	esi, cl
    	xor	eax, esi
    	pop	esi
    	mov	eax, DWORD PTR _value$[esp-4]
    	mov	ecx, DWORD PTR _count$[esp-4]
    	rol	eax, cl
    ; Line 11
    	ret	0
    _rotate32x ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_rotate64x
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _count$ = 16						; size = 4
    _rotate64x PROC						; COMDAT
    ; File c:\users\stefan\desktop\case8.c
    ; Line 15
    	mov	eax, DWORD PTR _value$[esp-4]
    	mov	ecx, 64					; 00000040H
    	sub	ecx, DWORD PTR _count$[esp-4]
    	mov	edx, DWORD PTR _value$[esp]
    	push	ebx
    	push	ebp
    	call	__aullshr
    	mov	ecx, DWORD PTR _count$[esp+4]
    	mov	ebx, eax
    	mov	eax, DWORD PTR _value$[esp+4]
    	mov	ebp, edx
    	mov	edx, DWORD PTR _value$[esp+8]
    	call	__allshl
    	xor	edx, ebp
    	xor	eax, ebx
    	pop	ebp
    	mov	ecx, DWORD PTR _count$[esp-4]
    	mov	edx, DWORD PTR _value$[esp]
    	mov	eax, DWORD PTR _value$[esp-4]
    	test	cl, 32					; 00000020H
    	jz	SHORT @F
    	xchg	eax, edx
    @@:
    	test	cl, 31					; 0000001fH
    	jz	SHORT @F
    	push	ebx
    	mov	ebx, edx
    	shld	edx, eax, cl
    	shld	eax, ebx, cl
    	pop	ebx
    @@:
    ; Line 16
    	ret	0
    _rotate64x ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_rotate64
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _count$ = 16						; size = 4
    _rotate64 PROC						; COMDAT
    ; File c:\users\stefan\desktop\case8.c
    ; Line 20
    	mov	eax, DWORD PTR _value$[esp-4]
    	mov	ecx, 64					; 00000040H
    	sub	ecx, DWORD PTR _count$[esp-4]
    	mov	edx, DWORD PTR _value$[esp]
    	push	ebx
    	push	ebp
    	call	__aullshr
    	mov	ecx, DWORD PTR _count$[esp+4]
    	mov	ebx, eax
    	mov	eax, DWORD PTR _value$[esp+4]
    	mov	ebp, edx
    	mov	edx, DWORD PTR _value$[esp+8]
    	call	__allshl
    	or	edx, ebp
    	or	eax, ebx
    	pop	ebp
    	mov	ecx, DWORD PTR _count$[esp-4]
    	mov	edx, DWORD PTR _value$[esp]
    	mov	eax, DWORD PTR _value$[esp-4]
    	test	cl, 32					; 00000020H
    	jz	SHORT @F
    	xchg	eax, edx
    @@:
    	test	cl, 31					; 0000001fH
    	jz	SHORT @F
    	push	ebx
    	mov	ebx, edx
    	shld	edx, eax, cl
    	shld	eax, ebx, cl
    	pop	ebx
    @@:
    ; Line 21
    	ret	0
    _rotate64 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_intrinsic
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _count$ = 16						; size = 4
    _intrinsic PROC						; COMDAT
    ; File c:\users\stefan\desktop\case8.c
    ; Line 25
    	mov	cl, BYTE PTR _count$[esp-4]
    	mov	edx, DWORD PTR _value$[esp]
    	push	esi
    	mov	esi, DWORD PTR _value$[esp]
    	mov	eax, DWORD PTR _value$[esp]
    	mov	esi, edx
    	test	cl, 32					; 00000020H
    	cmovnz	edx, eax
    	cmovnz	eax, esi
    	cmovnz	esi, edx
    	je	SHORT $LN3@intrinsic
    	mov	eax, esi
    	mov	esi, edx
    	mov	edx, eax
    $LN3@intrinsic:
    	mov	eax, esi
    	and	cl, 31					; 0000001fH
    	je	SHORT $LN4@intrinsic
    	shld	eax, edx, cl
    	shld	edx, esi, cl
    $LN4@intrinsic:
    ; Line 26
    	pop	esi
    	ret	0
    _intrinsic ENDP
    _TEXT	ENDS
    END
    Except for the first function, the optimiser fails to recognise the commonly used expressions for rotate operations!
    Also notice the unoptimised code generated for the intrinsic function _rotl64(), and not only for swapping the registers EDX and ESI.
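
Note that the rotate expressions in case8.c shift by 32 - count or 64 - count, which is undefined behaviour in C when count is 0, since the shift count then equals the width of the operand. A minimal sketch of a variant that masks both shift counts and is defined for every count follows; the function names rotl32 and rotl64 are hypothetical, and whether a given compiler turns them into ROL or SHLD instructions is not claimed here.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical rotate-left helpers, defined for every count:
// both shift counts are reduced modulo the operand width.
static uint32_t rotl32(uint32_t value, unsigned count)
{
    count &= 31U;

    // (0U - count) & 31U equals (32 - count) % 32
    return (value << count) | (value >> ((0U - count) & 31U));
}

static uint64_t rotl64(uint64_t value, unsigned count)
{
    count &= 63U;

    // (0U - count) & 63U equals (64 - count) % 64
    return (value << count) | (value >> ((0U - count) & 63U));
}
```

For counts other than 0 the two shifted parts have no bits in common, so combining them with | or with ^ gives the same result; with a count of 0 both parts equal the value itself, and only | returns it unchanged.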

Case 9

Horrible load of code generated for swapping the bytes of a 64-bit operand instead of a single BSWAP instruction or two MOVBE instructions.

Demonstration

  1. Create the text file case9.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #ifdef ALTERNATE
    unsigned short swap16(unsigned short us)
    {
        return ((us & 0xFF00U) >> 8)
             | ((us & 0x00FFU) << 8);
    }
    
    unsigned long swap32(unsigned long ul)
    {
        return ((ul & 0xFF000000UL) >> 3 * 8)
             | ((ul & 0x00FF0000UL) >>     8)
             | ((ul & 0x0000FF00UL) <<     8)
             | ((ul & 0x000000FFUL) << 3 * 8);
    }
    
    unsigned long long swap64(unsigned long long ull)
    {
        return ((ull & 0xFF00000000000000ULL) >> 7 * 8)
             | ((ull & 0x00FF000000000000ULL) >> 5 * 8)
             | ((ull & 0x0000FF0000000000ULL) >> 3 * 8)
             | ((ull & 0x000000FF00000000ULL) >>     8)
             | ((ull & 0x00000000FF000000ULL) <<     8)
             | ((ull & 0x0000000000FF0000ULL) << 3 * 8)
             | ((ull & 0x000000000000FF00ULL) << 5 * 8)
             | ((ull & 0x00000000000000FFULL) << 7 * 8);
    }
    #else
    unsigned short swap16(unsigned short us)
    {
        return (us << 8) & 0xFF00U
             | (us >> 8) & 0x00FFU;
    }
    
    unsigned long swap32(unsigned long ul)
    {
        return (ul << 3 * 8) & 0xFF000000UL
             | (ul <<     8) & 0x00FF0000UL
             | (ul >>     8) & 0x0000FF00UL
             | (ul >> 3 * 8) & 0x000000FFUL;
    }
    
    unsigned long long swap64(unsigned long long ull)
    {
        return (ull << 7 * 8) & 0xFF00000000000000ULL
             | (ull << 5 * 8) & 0x00FF000000000000ULL
             | (ull << 3 * 8) & 0x0000FF0000000000ULL
             | (ull <<     8) & 0x000000FF00000000ULL
             | (ull >>     8) & 0x00000000FF000000ULL
             | (ull >> 3 * 8) & 0x0000000000FF0000ULL
             | (ull >> 5 * 8) & 0x000000000000FF00ULL
             | (ull >> 7 * 8) & 0x00000000000000FFULL;
    }
    #endif
    Note: it is better to use the appropriate intrinsic function _byteswap_ushort(), _byteswap_ulong() or _byteswap_uint64() instead of such expressions!
  2. Generate the assembly listing file case9.asm from the source file case9.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase9.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    case9.c
  3. Display the assembly listing file case9.asm created in step 2.:

    TYPE case9.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	swap16
    PUBLIC	swap32
    PUBLIC	swap64
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap16
    _TEXT	SEGMENT
    us$ = 8
    swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case9.c
    ; Line 32
    	movzx	edx, cx
    	mov	eax, edx
    	shr	edx, 8
    	shl	eax, 8
    	or	ax, dx
    	movzx	eax, cx
    	xchg	ah, al
    ; Line 34
    	ret	0
    swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap32
    _TEXT	SEGMENT
    ul$ = 8
    swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case9.c
    ; Line 38
    	bswap	ecx
    	mov	eax, ecx
    ; Line 42
    	ret	0
    swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap64
    _TEXT	SEGMENT
    ull$ = 8
    swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case9.c
    ; Line 45
    	mov	r8, rcx
    ; Line 46
    	bswap	rcx
    	mov	rax, rcx
    	mov	r9, rcx
    	mov	rax, 71776119061217280			; 00ff000000000000H
    	mov	rdx, r8
    	and	r9, rax
    	and	edx, 65280				; 0000ff00H
    	mov	rax, rcx
    	shr	rax, 16
    	or	r9, rax
    	mov	rax, rcx
    	shr	r9, 16
    	mov	rcx, 280375465082880			; 0000ff0000000000H
    	and	rax, rcx
    	mov	rcx, 1095216660480			; 000000ff00000000H
    	or	r9, rax
    	mov	rax, r8
    	and	rax, rcx
    	shr	r9, 16
    	or	r9, rax
    	mov	rcx, r8
    	mov	rax, r8
    	shr	r9, 8
    	shl	rax, 16
    	and	ecx, 16711680				; 00ff0000H
    	or	rdx, rax
    	mov	eax, -16777216				; ff000000H
    	and	rax, r8
    	shl	rdx, 16
    	or	rdx, rcx
    	shl	rdx, 16
    	or	rax, rdx
    	shl	rax, 8
    	or	rax, r9
    ; Line 54
    	ret	0
    swap64	ENDP
    _TEXT	ENDS
    END
    Note: the assembly listing shows 32 (in words: thirty-two) instructions for the function swap64() instead of only a single (in words: one) BSWAP instruction!

    The optimiser recognises the commonly used expressions for converting between little-endian and big-endian byte order only with a 32-bit operand, for which it generates a single BSWAP instruction; it fails to recognise the same expressions with a 16-bit or a 64-bit operand.

  4. Generate another assembly listing file case9.asm from the source file case9.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase9.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case9.c
  5. Display the assembly listing file case9.asm created in step 4.:

    TYPE case9.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case9.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_swap16
    PUBLIC	_swap32
    PUBLIC	_swap64
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap16
    _TEXT	SEGMENT
    _us$ = 8						; size = 2
    _swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case9.c
    ; Line 32
    	movbe	ax, WORD PTR _us$[esp-4]
    	movzx	ecx, WORD PTR _us$[esp-4]
    	mov	eax, ecx
    	shl	ecx, 8
    	shr	eax, 8
    	or	eax, ecx
    ; Line 34
    	ret	0
    _swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap32
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case9.c
    ; Line 38
    	movbe	eax, DWORD PTR _ul$[esp-4]
    	mov	eax, DWORD PTR _ul$[esp-4]
    	bswap	eax
    ; Line 42
    	ret	0
    _swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap64
    _TEXT	SEGMENT
    _ull$ = 8						; size = 8
    _swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case9.c
    ; Line 46
    	movbe	edx, DWORD PTR _ull$[esp-4]
    	movbe	eax, DWORD PTR _ull$[esp]
    	mov	edx, DWORD PTR _ull$[esp]
    	mov	ecx, edx
    	push	ebx
    	push	ebp
    	push	esi
    	push	edi
    	mov	edi, DWORD PTR _ull$[esp+12]
    	mov	ebx, edx
    	and	ebx, 16711680				; 00ff0000H
    	mov	eax, edi
    	shrd	eax, ecx, 16
    	xor	ebp, ebp
    	mov	esi, edi
    	or	ebp, eax
    	shr	ecx, 16					; 00000010H
    	or	ebx, ecx
    	mov	eax, edx
    	shrd	ebp, ebx, 16
    	and	eax, 65280				; 0000ff00H
    	and	esi, 65280				; 0000ff00H
    	shr	ebx, 16					; 00000010H
    	xor	ecx, ecx
    	or	ebx, eax
    	movzx	eax, dl
    	shrd	ebp, ebx, 16
    	shr	ebx, 16					; 00000010H
    	or	ebx, eax
    	mov	eax, edi
    	shld	edx, eax, 16
    	shrd	ebp, ebx, 8
    	shl	eax, 16					; 00000010H
    	or	edx, ecx
    	or	esi, eax
    	shr	ebx, 8
    	shld	edx, esi, 16
    	mov	eax, edi
    	and	edi, -16777216				; ff000000H
    	shl	esi, 16					; 00000010H
    	and	eax, 16711680				; 00ff0000H
    	or	esi, eax
    	shld	edx, esi, 16
    	shl	esi, 16					; 00000010H
    	or	esi, edi
    	shld	edx, esi, 8
    	pop	edi
    	shl	esi, 8
    	or	edx, ebx
    	or	ebp, esi
    	pop	esi
    	mov	eax, ebp
    	pop	ebp
    	pop	ebx
    ; Line 54
    	ret	0
    _swap64	ENDP
    _TEXT	ENDS
    END
    Note: the assembly listing shows 52 (in words: fifty-two) instructions for the function swap64() instead of only 2 (in words: two) MOVBE instructions!

    The optimiser recognises the commonly used expressions for converting between little-endian and big-endian byte order only with a 32-bit operand, for which it generates a single BSWAP instruction; it fails to recognise the same expressions with a 16-bit or a 64-bit operand.

  6. Repeat the previous steps with the alternate implementation; generate another assembly listing file case9.asm from the source file case9.c created in step 1., now using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture, with the preprocessor macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tccase9.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    case9.c
  7. Display the assembly listing file case9.asm created in step 6.:

    TYPE case9.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	swap16
    PUBLIC	swap32
    PUBLIC	swap64
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap16
    _TEXT	SEGMENT
    us$ = 8
    swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case9.c
    ; Line 6
    	movzx	edx, cx
    	mov	eax, edx
    	shl	dx, 8
    	shr	eax, 8
    	or	ax, dx
    	movzx	eax, cx
    	xchg	ah, al
    ; Line 8
    	ret	0
    swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap32
    _TEXT	SEGMENT
    ul$ = 8
    swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case9.c
    ; Line 12
    	bswap	ecx
    	mov	eax, ecx
    ; Line 16
    	ret	0
    swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap64
    _TEXT	SEGMENT
    ull$ = 8
    swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case9.c
    ; Line 19
    	mov	r8, rcx
    ; Line 20
    	bswap	rcx
    	mov	rax, rcx
    	mov	r9, rcx
    	mov	rax, 71776119061217280			; 00ff000000000000H
    	mov	rdx, r8
    	and	r9, rax
    	and	edx, 65280				; 0000ff00H
    	mov	rax, rcx
    	shr	rax, 16
    	or	r9, rax
    	mov	rax, rcx
    	shr	r9, 16
    	mov	rcx, 280375465082880			; 0000ff0000000000H
    	and	rax, rcx
    	mov	rcx, 1095216660480			; 000000ff00000000H
    	or	r9, rax
    	mov	rax, r8
    	and	rax, rcx
    	shr	r9, 16
    	or	r9, rax
    	mov	rcx, r8
    	mov	rax, r8
    	shr	r9, 8
    	shl	rax, 16
    	and	ecx, 16711680				; 00ff0000H
    	or	rdx, rax
    	mov	eax, -16777216				; ff000000H
    	and	rax, r8
    	shl	rdx, 16
    	or	rdx, rcx
    	shl	rdx, 16
    	or	rax, rdx
    	shl	rax, 8
    	or	rax, r9
    ; Line 28
    	ret	0
    swap64	ENDP
    _TEXT	ENDS
    END
    Note: the assembly listing shows 32 (in words: thirty-two) instructions for the function swap64() instead of only a single (in words: one) BSWAP instruction!

    The optimiser recognises the alternative form of the commonly used expressions for converting between little-endian and big-endian byte order too, but again only with a 32-bit operand, for which it generates a single BSWAP instruction; it fails to recognise these expressions with a 16-bit or a 64-bit operand.

  8. Generate another assembly listing file case9.asm from the source file case9.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tccase9.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case9.c
  9. Display the assembly listing file case9.asm created in step 8.:

    TYPE case9.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case9.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_swap16
    PUBLIC	_swap32
    PUBLIC	_swap64
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap16
    _TEXT	SEGMENT
    _us$ = 8						; size = 2
    _swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case9.c
    ; Line 6
    	movbe	ax, WORD PTR _us$[esp-4]
    	movzx	ecx, WORD PTR _us$[esp-4]
    	mov	eax, ecx
    	shl	ecx, 8
    	shr	eax, 8
    	or	eax, ecx
    ; Line 8
    	ret	0
    _swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap32
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case9.c
    ; Line 12
    	movbe	eax, DWORD PTR _ul$[esp-4]
    	mov	eax, DWORD PTR _ul$[esp-4]
    	bswap	eax
    ; Line 16
    	ret	0
    _swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap64
    _TEXT	SEGMENT
    _ull$ = 8						; size = 8
    _swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case9.c
    ; Line 20
    	movbe	edx, DWORD PTR _ull$[esp-4]
    	movbe	eax, DWORD PTR _ull$[esp]
    	mov	edx, DWORD PTR _ull$[esp]
    	mov	ecx, edx
    	push	ebx
    	push	ebp
    	push	esi
    	push	edi
    	mov	edi, DWORD PTR _ull$[esp+12]
    	mov	ebx, edx
    	and	ebx, 16711680				; 00ff0000H
    	mov	eax, edi
    	shrd	eax, ecx, 16
    	xor	ebp, ebp
    	mov	esi, edi
    	or	ebp, eax
    	shr	ecx, 16					; 00000010H
    	or	ebx, ecx
    	mov	eax, edx
    	shrd	ebp, ebx, 16
    	and	eax, 65280				; 0000ff00H
    	and	esi, 65280				; 0000ff00H
    	shr	ebx, 16					; 00000010H
    	xor	ecx, ecx
    	or	ebx, eax
    	movzx	eax, dl
    	shrd	ebp, ebx, 16
    	shr	ebx, 16					; 00000010H
    	or	ebx, eax
    	mov	eax, edi
    	shld	edx, eax, 16
    	shrd	ebp, ebx, 8
    	shl	eax, 16					; 00000010H
    	or	edx, ecx
    	or	esi, eax
    	shr	ebx, 8
    	shld	edx, esi, 16
    	mov	eax, edi
    	and	edi, -16777216				; ff000000H
    	shl	esi, 16					; 00000010H
    	and	eax, 16711680				; 00ff0000H
    	or	esi, eax
    	shld	edx, esi, 16
    	shl	esi, 16					; 00000010H
    	or	esi, edi
    	shld	edx, esi, 8
    	pop	edi
    	shl	esi, 8
    	or	edx, ebx
    	or	ebp, esi
    	pop	esi
    	mov	eax, ebp
    	pop	ebp
    	pop	ebx
    ; Line 28
    	ret	0
    _swap64	ENDP
    _TEXT	ENDS
    END
    Note: the assembly listing shows 52 (in words: fifty-two) instructions for the function swap64() instead of only 2 (in words: two) MOVBE instructions!

    The optimiser recognises the alternative form of the commonly used expressions for converting between little-endian and big-endian byte order too, but again only with a 32-bit operand, for which it generates a single BSWAP instruction; it fails to recognise these expressions with a 16-bit or a 64-bit operand.
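    Two MOVBE instructions suffice for swap64() on x86 because reversing the eight bytes of a 64-bit value is the same as reversing the four bytes of each 32-bit half and exchanging the two halves. The following is only a minimal sketch of that identity, not the content of case9.c: the names bswap32_ref() and bswap64_halves() are hypothetical, and the fixed-width types from <stdint.h> are assumed in place of unsigned long and unsigned long long.

```c
#include <stdint.h>

// Hypothetical reference: reverse the 4 bytes of a 32-bit value,
// i.e. the operation a single BSWAP or a 32-bit MOVBE performs.
uint32_t bswap32_ref(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00U)
         | ((x << 8) & 0x00FF0000U)
         | (x << 24);
}

// Hypothetical sketch: byte-swap a 64-bit value via its 32-bit halves.
// The byte-swapped low half becomes the high half of the result (EDX),
// the byte-swapped high half becomes the low half of the result (EAX).
uint64_t bswap64_halves(uint64_t x)
{
    uint32_t low  = (uint32_t) x;
    uint32_t high = (uint32_t) (x >> 32);

    return ((uint64_t) bswap32_ref(low) << 32) | bswap32_ref(high);
}
```

    For example, bswap64_halves(0x0011223344556677) yields 0x7766554433221100: the low half 0x44556677 is reversed to 0x77665544 and moved to the upper 32 bits, the high half 0x00112233 is reversed to 0x33221100 and moved to the lower 32 bits.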

Case 10

Awful load of code generated for swapping the bytes of a 32-bit or 64-bit operand instead of BSWAP or MOVBE instructions.

Demonstration

  1. Create the text file case10.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    __inline
    unsigned short swap16(unsigned short us)
    {
        return (us << 8) | (us >> 8);
    }
    
    unsigned long swap32(unsigned long ul)
    {
        return (unsigned long) swap16((unsigned short) ul) << 16
             | (unsigned long) swap16((unsigned short) (ul >> 16));
    }
    
    unsigned long long swap64(unsigned long long ull)
    {
        return (unsigned long long) swap32((unsigned long) ull) << 32
             | (unsigned long long) swap32((unsigned long) (ull >> 32));
    }
    
    unsigned short swap16alt(unsigned short us)
    {
        return ((us >> 8) & 0x00FFU)
             | ((us       & 0x00FFU) << 8);
    }
    
    unsigned long swap32alt(unsigned long ul)
    {
        ul = ((ul >> 8) & 0x00FF00FFUL)
           | ((ul       & 0x00FF00FFUL) << 8);
    
        return (ul << 16) | (ul >> 16);
    }
    
    unsigned long long swap64alt(unsigned long long ull)
    {
        ull = ((ull >> 8) & 0x00FF00FF00FF00FFULL)
            | ((ull       & 0x00FF00FF00FF00FFULL) << 8);
        ull = ((ull >> 16) & 0x0000FFFF0000FFFFULL)
            | ((ull        & 0x0000FFFF0000FFFFULL) << 16);
    
        return (ull << 32) | (ull >> 32);
    }
    
    unsigned long swap32rot(unsigned long ul)
    {
        return _lrotr(ul, 8) & 0xFF00FF00UL
             | _lrotl(ul, 8) & 0x00FF00FFUL;
    }
    
    unsigned long long swap64rot(unsigned long long ull)
    {
        ull = _rotr64(ull, 16) & 0xFFFF0000FFFF0000ULL
            | _rotl64(ull, 16) & 0x0000FFFF0000FFFFULL;
        ull = _rotr64(ull, 8) & 0x00FF00FF00FF00FFULL
            | _rotl64(ull, 8) & 0xFF00FF00FF00FF00ULL;
    
        return ull;
    }
    Note: it is better to use the appropriate intrinsic function, _byteswap_ushort(), _byteswap_ulong() or _byteswap_uint64(), instead of such expressions!
  2. Generate the assembly listing file case10.asm from the source file case10.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase10.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    case10.c
  3. Display the assembly listing file case10.asm created in step 2.:

    TYPE case10.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	swap16
    PUBLIC	swap32
    PUBLIC	swap64
    PUBLIC	swap16alt
    PUBLIC	swap32alt
    PUBLIC	swap64alt
    PUBLIC	swap32rot
    PUBLIC	swap64rot
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap16
    _TEXT	SEGMENT
    us$ = 8
    swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case10.c
    ; Line 6
    	movzx	edx, cx
    	mov	eax, edx
    	shl	dx, 8
    	shr	eax, 8
    	or	ax, dx
    	movzx	eax, cx
    	xchg	ah, al
    ; Line 7
    	ret	0
    swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap32
    _TEXT	SEGMENT
    ul$ = 8
    swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case10.c
    ; Line 11
    	mov	eax, ecx
    ; Line 6
    	rol	cx, 8
    ; Line 11
    	shr	eax, 16
    ; Line 6
    	rol	ax, 8
    ; Line 11
    	movzx	ecx, cx
    	movzx	eax, ax
    	shl	ecx, 16
    	or	eax, ecx
    	bswap	eax
    ; Line 13
    	ret	0
    swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap64
    _TEXT	SEGMENT
    ull$ = 8
    swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case10.c
    ; Line 11
    	mov	eax, ecx
    ; Line 17
    	mov	r9, rcx
    ; Line 11
    	shr	eax, 16
    ; Line 6
    	rol	ax, 8
    	movzx	r8d, ax
    	rol	cx, 8
    	movzx	eax, cx
    	shl	rax, 16
    ; Line 17
    	or	rax, r8
    	shr	r9, 32					; 00000020H
    ; Line 6
    	movzx	ecx, r9w
    ; Line 17
    	shl	rax, 16
    ; Line 6
    	rol	cx, 8
    	movzx	edx, cx
    ; Line 17
    	or	rax, rdx
    ; Line 11
    	shr	r9d, 16
    ; Line 6
    	rol	r9w, 8
    ; Line 17
    	shl	rax, 16
    ; Line 6
    	movzx	ecx, r9w
    ; Line 17
    	or	rax, rcx
    	mov	rax, rcx
    	bswap	rax
    ; Line 19
    	ret	0
    swap64	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap16alt
    _TEXT	SEGMENT
    us$ = 8
    swap16alt PROC						; COMDAT
    ; File c:\users\stefan\desktop\case10.c
    ; Line 23
    	movzx	edx, cx
    	movzx	eax, dl
    	shr	edx, 8
    	shl	eax, 8
    	or	eax, edx
    	movzx	eax, cx
    	xchg	ah, al
    ; Line 25
    	ret	0
    swap16alt ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap32alt
    _TEXT	SEGMENT
    ul$ = 8
    swap32alt PROC						; COMDAT
    ; File c:\users\stefan\desktop\case10.c
    ; Line 32
    	mov	eax, ecx
    	mov	edx, ecx
    	shl	ecx, 8
    	shr	eax, 8
    	shl	edx, 8
    	xor	eax, edx
    	and	eax, 16711935				; 00ff00ffH
    	xor	eax, ecx
    	rol	eax, 16
    	bswap	eax
    ; Line 33
    	ret	0
    swap32alt ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap64alt
    _TEXT	SEGMENT
    ull$ = 8
    swap64alt PROC						; COMDAT
    ; File c:\users\stefan\desktop\case10.c
    ; Line 37
    	mov	rdx, rcx
    	mov	rax, rcx
    	bswap	rax
    	shl	rcx, 8
    	shl	rax, 8
    	shr	rdx, 8
    	xor	rdx, rax
    	mov	rax, 71777214294589695			; 00ff00ff00ff00ffH
    	and	rdx, rax
    	xor	rdx, rcx
    ; Line 42
    	mov	rax, rdx
    	mov	rcx, rdx
    	shl	rdx, 16
    	shr	rax, 16
    	shl	rcx, 16
    	xor	rax, rcx
    	mov	rcx, 281470681808895			; 0000ffff0000ffffH
    	and	rax, rcx
    	xor	rax, rdx
    	rol	rax, 32					; 00000020H
    ; Line 43
    	ret	0
    swap64alt ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap32rot
    _TEXT	SEGMENT
    ul$ = 8
    swap32rot PROC						; COMDAT
    ; File c:\users\stefan\desktop\case10.c
    ; Line 47
    	mov	eax, ecx
    	rol	ecx, 8
    	ror	eax, 8
    	and	ecx, 16711935				; 00ff00ffH
    	and	eax, -16711936				; ff00ff00H
    	or	eax, ecx
    	bswap	eax
    ; Line 49
    	ret	0
    swap32rot ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	swap64rot
    _TEXT	SEGMENT
    ull$ = 8
    swap64rot PROC						; COMDAT
    ; File c:\users\stefan\desktop\case10.c
    ; Line 53
    	mov	rax, rcx
    	mov	rdx, rcx
    	mov	rax, -281470681808896			; ffff0000ffff0000H
    	ror	rdx, 16
    	and	rdx, rax
    	rol	rcx, 16
    	mov	rax, 281470681808895			; 0000ffff0000ffffH
    	and	rcx, rax
    	or	rdx, rcx
    ; Line 56
    	mov	rcx, -71777214294589696			; ff00ff00ff00ff00H
    	mov	rax, rdx
    	rol	rax, 8
    	and	rax, rcx
    	ror	rdx, 8
    	mov	rcx, 71777214294589695			; 00ff00ff00ff00ffH
    	and	rdx, rcx
    	or	rax, rdx
    	bswap	rax
    ; Line 59
    	ret	0
    swap64rot ENDP
    _TEXT	ENDS
    END
    Note: instead of just 2 instructions for each of the 4 functions, the assembly listing shows 20 (in words: twenty) instructions for the function swap64(), 8 instructions for the function swap32(), 5 instructions for the function swap16(), and 6 instructions for the function swap32rot().

    The optimiser recognises none of these commonly used expressions to convert from little-endian byte order to big-endian byte order (and vice versa), for any operand size!
    Additionally, the commonly used expression for a rotate operation is not recognised for a 16-bit operand.

  4. Generate another assembly listing file case10.asm from the source file case10.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase10.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case10.c
  5. Display the assembly listing file case10.asm created in step 4.:

    TYPE case10.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case10.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_swap16
    PUBLIC	_swap32
    PUBLIC	_swap64
    PUBLIC	_swap16alt
    PUBLIC	_swap32alt
    PUBLIC	_swap64alt
    PUBLIC	_swap32rot
    PUBLIC	_swap64rot
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap16
    _TEXT	SEGMENT
    _us$ = 8						; size = 2
    _swap16	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case10.c
    ; Line 6
    	movbe	ax, WORD PTR _us$[esp-4]
    	movzx	ecx, WORD PTR _us$[esp-4]
    	mov	eax, ecx
    	shl	ecx, 8
    	shr	eax, 8
    	or	eax, ecx
    ; Line 7
    	ret	0
    _swap16	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap32
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _swap32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case10.c
    ; Line 11
    	movbe	eax, DWORD PTR _ul$[esp-4]
    	mov	ecx, DWORD PTR _ul$[esp-4]
    	mov	eax, ecx
    	shr	eax, 16					; 00000010H
    ; Line 6
    	rol	cx, 8
    	rol	ax, 8
    ; Line 11
    	movzx	ecx, cx
    	movzx	eax, ax
    	shl	ecx, 16					; 00000010H
    	or	eax, ecx
    ; Line 13
    	ret	0
    _swap32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap64
    _TEXT	SEGMENT
    _ull$ = 8						; size = 8
    _swap64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case10.c
    ; Line 17
    	movbe	eax, DWORD PTR _ull$[esp]
    	movbe	edx, DWORD PTR _ull$[esp-4]
    	mov	ecx, DWORD PTR _ull$[esp-4]
    ; Line 11
    	mov	eax, ecx
    	shr	eax, 16					; 00000010H
    ; Line 6
    	rol	ax, 8
    ; Line 17
    	movzx	eax, ax
    	cdq
    	push	ebx
    	mov	ebx, DWORD PTR _ull$[esp+4]
    	push	esi
    	mov	esi, eax
    ; Line 6
    	rol	cx, 8
    ; Line 16
    	push	edi
    ; Line 17
    	mov	edi, edx
    	movzx	eax, cx
    	cdq
    	shld	edx, eax, 16
    	shl	eax, 16					; 00000010H
    	or	edi, edx
    	or	esi, eax
    ; Line 6
    	mov	ax, bx
    ; Line 17
    	shld	edi, esi, 16
    ; Line 6
    	rol	ax, 8
    ; Line 17
    	movzx	eax, ax
    	cdq
    	shl	esi, 16					; 00000010H
    	or	edi, edx
    	or	esi, eax
    ; Line 11
    	shr	ebx, 16					; 00000010H
    ; Line 6
    	rol	bx, 8
    ; Line 17
    	shld	edi, esi, 16
    	movzx	eax, bx
    	cdq
    	shl	esi, 16					; 00000010H
    	or	edx, edi
    	pop	edi
    	or	eax, esi
    	pop	esi
    	pop	ebx
    ; Line 19
    	ret	0
    _swap64	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap16alt
    _TEXT	SEGMENT
    _us$ = 8						; size = 2
    _swap16alt PROC						; COMDAT
    ; File c:\users\stefan\desktop\case10.c
    ; Line 23
    	movbe	ax, WORD PTR _us$[esp-4]
    	movzx	ecx, WORD PTR _us$[esp-4]
    	movzx	eax, cl
    	shl	eax, 8
    	shr	ecx, 8
    	or	eax, ecx
    ; Line 25
    	ret	0
    _swap16alt ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap32alt
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _swap32alt PROC						; COMDAT
    ; File c:\users\stefan\desktop\case10.c
    ; Line 32
    	movbe	eax, DWORD PTR _ul$[esp-4]
    	mov	ecx, DWORD PTR _ul$[esp-4]
    	mov	eax, ecx
    	shr	eax, 8
    	mov	edx, ecx
    	shl	edx, 8
    	xor	eax, edx
    	and	eax, 16711935				; 00ff00ffH
    	shl	ecx, 8
    	xor	eax, ecx
    	rol	eax, 16					; 00000010H
    ; Line 33
    	ret	0
    _swap32alt ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap64alt
    _TEXT	SEGMENT
    _ull$ = 8						; size = 8
    _swap64alt PROC						; COMDAT
    ; File c:\users\stefan\desktop\case10.c
    ; Line 37
    	movbe	eax, DWORD PTR _ull$[esp]
    	movbe	edx, DWORD PTR _ull$[esp-4]
    	mov	eax, DWORD PTR _ull$[esp-4]
    	push	ebx
    	push	esi
    	mov	esi, DWORD PTR _ull$[esp+8]
    	push	edi
    	mov	edx, esi
    	mov	ebx, esi
    	mov	ecx, eax
    	shrd	ecx, edx, 8
    	mov	edi, eax
    	shld	ebx, edi, 8
    	shld	esi, eax, 8
    	shl	edi, 8
    	xor	ecx, edi
    	shr	edx, 8
    	xor	edx, ebx
    	shl	eax, 8
    	and	edx, 16711935				; 00ff00ffH
    	xor	edx, esi
    	and	ecx, 16711935				; 00ff00ffH
    	xor	ecx, eax
    ; Line 40
    	mov	edi, edx
    	mov	esi, ecx
    	shrd	esi, edi, 16
    	shr	edi, 16					; 00000010H
    	mov	eax, edi
    	mov	ebx, edx
    	mov	edi, ecx
    	shld	ebx, edi, 16
    	shld	edx, ecx, 16
    	xor	eax, ebx
    	shl	edi, 16					; 00000010H
    	xor	esi, edi
    	and	eax, 65535				; 0000ffffH
    	and	esi, 65535				; 0000ffffH
    	shl	ecx, 16					; 00000010H
    	xor	eax, edx
    	xor	esi, ecx
    ; Line 42
    	xor	edx, edx
    	pop	edi
    	or	edx, esi
    	xor	ecx, ecx
    	pop	esi
    	or	eax, ecx
    	pop	ebx
    ; Line 43
    	ret	0
    _swap64alt ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap32rot
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _swap32rot PROC						; COMDAT
    ; File c:\users\stefan\desktop\case10.c
    ; Line 47
    	movbe	eax, DWORD PTR _ul$[esp-4]
    	mov	ecx, DWORD PTR _ul$[esp-4]
    	mov	eax, ecx
    	ror	eax, 8
    	rol	ecx, 8
    	and	eax, -16711936				; ff00ff00H
    	and	ecx, 16711935				; 00ff00ffH
    	or	eax, ecx
    ; Line 49
    	ret	0
    _swap32rot ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_swap64rot
    _TEXT	SEGMENT
    _ull$ = 8						; size = 8
    _swap64rot PROC						; COMDAT
    ; File c:\users\stefan\desktop\case10.c
    ; Line 53
    	movbe	eax, DWORD PTR _ull$[esp]
    	movbe	edx, DWORD PTR _ull$[esp-4]
    	mov	edx, DWORD PTR _ull$[esp-4]
    	mov	ecx, DWORD PTR _ull$[esp]
    	push	ebx
    	push	esi
    	mov	eax, ecx
    	mov	esi, edx
    	shrd	esi, eax, 16
    	shrd	eax, edx, 16
    	push	edi
    	mov	edi, eax
    	mov	eax, edx
    	shld	eax, ecx, 16
    	shld	ecx, edx, 16
    	xor	eax, esi
    	xor	ecx, edi
    	and	eax, 65535				; 0000ffffH
    	xor	eax, esi
    	and	ecx, 65535				; 0000ffffH
    	xor	ecx, edi
    ; Line 56
    	mov	esi, eax
    	mov	edx, ecx
    	shrd	esi, edx, 8
    	shrd	edx, eax, 8
    	mov	ebx, edx
    	mov	edx, eax
    	mov	eax, ecx
    	mov	ecx, edx
    	shld	ecx, eax, 8
    	shld	eax, edx, 8
    	mov	edi, eax
    ; Line 58
    	mov	edx, eax
    	xor	edx, ebx
    	and	edx, 16711935				; 00ff00ffH
    	mov	eax, ecx
    	xor	eax, esi
    	xor	edx, edi
    	pop	edi
    	and	eax, 16711935				; 00ff00ffH
    	pop	esi
    	xor	eax, ecx
    	pop	ebx
    ; Line 59
    	ret	0
    _swap64rot ENDP
    _TEXT	ENDS
    END
    Note: instead of just 1 or 2 MOVBE instructions for each of the 4 functions, the assembly listing shows 38 (in words: thirty-eight) instructions for the function swap64(), 9 instructions for the function swap32(), 5 instructions for the function swap16(), and 7 instructions for the function swap32rot().

    The optimiser fails to recognise all these commonly used expressions to convert from little endian byte-order to big endian byte-order (and vice versa) for all operand sizes!
    Additionally the commonly used expression for a rotate operation is not recognised for a 16-bit operand.
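
The rotate-and-mask sequence visible in the listing of swap32rot() above (ROR and ROL by 8, AND with 0xFF00FF00 and 0x00FF00FF, then OR) is one of several equivalent ways to spell a 32-bit byte swap in C. The following minimal sketch checks that form against the plain shift-and-mask form. All helper names are hypothetical and not part of case10.c. Either form could in principle be reduced to a single BSWAP or MOVBE instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable rotates; the count must stay in the range 1..31 to avoid undefined shifts */
uint32_t rotl32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); }
uint32_t rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* Shift-and-mask form of a 32-bit byte swap */
uint32_t bswap32_shift(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Rotate-and-mask form, mirroring the ROR/ROL/AND/AND/OR sequence of the listing */
uint32_t bswap32_rot(uint32_t x)
{
    return (rotr32(x, 8) & 0xFF00FF00u)
         | (rotl32(x, 8) & 0x00FF00FFu);
}

/* Both forms must agree for every input; a few samples are checked here */
void check_bswap32(void)
{
    const uint32_t samples[] = {0x00000000u, 0x000000FFu, 0x12345678u,
                                0xDEADBEEFu, 0xFFFFFFFFu};
    size_t i;

    for (i = 0; i < sizeof(samples) / sizeof(*samples); i++)
        assert(bswap32_shift(samples[i]) == bswap32_rot(samples[i]));
}
```

Calling check_bswap32() from a test program is enough to exercise both forms; for example, both map 0x12345678 to 0x78563412.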

Case 11

Demonstration

  1. Create the text file case11.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long lreverse(unsigned long ul)
    {
    #ifndef ALTERNATE
        ul = ((ul & 0xAAAAAAAAUL) >> 1)
           | ((ul & 0x55555555UL) << 1);
        ul = ((ul & 0xCCCCCCCCUL) >> 2)
           | ((ul & 0x33333333UL) << 2);
        ul = ((ul & 0xF0F0F0F0UL) >> 4)
           | ((ul & 0x0F0F0F0FUL) << 4);
        ul = ((ul & 0xFF00FF00UL) >> 8)
           | ((ul & 0x00FF00FFUL) << 8);
        ul = ((ul & 0xFFFF0000UL) >> 16)
           | ((ul & 0x0000FFFFUL) << 16);
    #else
        ul = ((ul >> 1) & 0x55555555UL)
           | ((ul << 1) & 0xAAAAAAAAUL);
        ul = ((ul >> 2) & 0x33333333UL)
           | ((ul << 2) & 0xCCCCCCCCUL);
        ul = ((ul >> 4) & 0x0F0F0F0FUL)
           | ((ul << 4) & 0xF0F0F0F0UL);
        ul = ((ul >> 8) & 0x00FF00FFUL)
           | ((ul << 8) & 0xFF00FF00UL);
        ul = ((ul >> 16) & 0x0000FFFFUL)
           | ((ul << 16) & 0xFFFF0000UL);
    #endif
        return ul;
    }
    
    unsigned long long llreverse(unsigned long long ull)
    {
    #ifndef ALTERNATE
        ull = ((ull & 0xAAAAAAAAAAAAAAAAULL) >> 1)
            | ((ull & 0x5555555555555555ULL) << 1);
        ull = ((ull & 0xCCCCCCCCCCCCCCCCULL) >> 2)
            | ((ull & 0x3333333333333333ULL) << 2);
        ull = ((ull & 0xF0F0F0F0F0F0F0F0ULL) >> 4)
            | ((ull & 0x0F0F0F0F0F0F0F0FULL) << 4);
        ull = ((ull & 0xFF00FF00FF00FF00ULL) >> 8)
            | ((ull & 0x00FF00FF00FF00FFULL) << 8);
        ull = ((ull & 0xFFFF0000FFFF0000ULL) >> 16)
            | ((ull & 0x0000FFFF0000FFFFULL) << 16);
        ull = ((ull & 0xFFFFFFFF00000000ULL) >> 32)
            | ((ull & 0x00000000FFFFFFFFULL) << 32);
    #else
        ull = ((ull >> 1) & 0x5555555555555555ULL)
            | ((ull << 1) & 0xAAAAAAAAAAAAAAAAULL);
        ull = ((ull >> 2) & 0x3333333333333333ULL)
            | ((ull << 2) & 0xCCCCCCCCCCCCCCCCULL);
        ull = ((ull >> 4) & 0x0F0F0F0F0F0F0F0FULL)
            | ((ull << 4) & 0xF0F0F0F0F0F0F0F0ULL);
        ull = ((ull >> 8) & 0x00FF00FF00FF00FFULL)
            | ((ull << 8) & 0xFF00FF00FF00FF00ULL);
        ull = ((ull >> 16) & 0x0000FFFF0000FFFFULL)
            | ((ull << 16) & 0xFFFF0000FFFF0000ULL);
        ull = ((ull >> 32) & 0x00000000FFFFFFFFULL)
            | ((ull << 32) & 0xFFFFFFFF00000000ULL);
    #endif
        return ull;
    }
  2. Generate the assembly listing file case11.asm from the source file case11.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /GS- /Gy /Ox /Tccase11.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case11.c
  3. Display the assembly listing file case11.asm created in step 2.:

    TYPE case11.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case11.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_lreverse
    PUBLIC	_llreverse
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_lreverse
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _lreverse PROC						; COMDAT
    ; File c:\users\stefan\desktop\case11.c
    ; Line 6
    	mov	ecx, DWORD PTR _ul$[esp-4]
    	mov	edx, ecx
    	shr	edx, 1
    	lea	eax, DWORD PTR [ecx+ecx]
    	xor	edx, eax
    	lea	eax, DWORD PTR [ecx+ecx]
    	and	edx, 1431655765				; 55555555H
    	xor	edx, eax
    	lea	eax, [ecx+ecx]
    	shr	ecx, 1
    	and	eax, -1431655766			; aaaaaaaaH
    	and	ecx, 1431655765				; 55555555H
    	or	ecx, eax
    ; Line 8
    	mov	ecx, edx
    	shr	ecx, 2
    	lea	eax, DWORD PTR [edx*4]
    	xor	ecx, eax
    	lea	eax, DWORD PTR [edx*4]
    	and	ecx, 858993459				; 33333333H
    	xor	ecx, eax
    	lea	eax, [ecx*4]
    	shr	ecx, 2
    	and	eax, -858993460				; ccccccccH
    	and	ecx, 858993459				; 33333333H
    	or	ecx, eax
    ; Line 10
    	mov	edx, ecx
    	mov	eax, ecx
    	shl	eax, 4
    	shr	edx, 4
    	xor	edx, eax
    	shl	ecx, 4
    	and	edx, 252645135				; 0f0f0f0fH
    	xor	edx, ecx
    	mov	eax, ecx
    	shr	ecx, 4
    	shl	eax, 4
    	and	ecx, 252645135				; 0f0f0f0fH
    	and	eax, -252645136				; f0f0f0f0H
    	or	eax, ecx
    ; Line 12
    	mov	eax, edx
    	mov	ecx, edx
    	shr	eax, 8
    	shl	ecx, 8
    	xor	eax, ecx
    	shl	edx, 8
    	and	eax, 16711935				; 00ff00ffH
    	xor	eax, edx
    ; Line 14
    	rol	eax, 16					; 00000010H
    	bswap	eax
    ; Line 29
    	ret	0
    _lreverse ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llreverse
    _TEXT	SEGMENT
    _ull$11$ = 8						; size = 4
    _ull$ = 8						; size = 8
    _llreverse PROC						; COMDAT
    ; File c:\users\stefan\desktop\case11.c
    ; Line 34
    	mov	edx, DWORD PTR _ull$[esp]
    	mov	ecx, edx
    	push	ebx
    	push	ebp
    	push	esi
    	mov	esi, DWORD PTR _ull$[esp+8]
    	mov	ebx, esi
    	mov	eax, esi
    	shld	ecx, eax, 1
    	push	edi
    	mov	edi, edx
    	add	eax, eax
    	shrd	ebx, edi, 1
    	shld	edx, esi, 1
    	xor	ebx, eax
    	shr	edi, 1
    	xor	edi, ecx
    	add	esi, esi
    	and	ebx, 1431655765				; 55555555H
    	and	edi, 1431655765				; 55555555H
    	xor	ebx, esi
    	xor	edi, edx
    	mov	eax, DWORD PTR _ull$[esp+8]
    	mov	edx, DWORD PTR _ull$[esp+4]
    	mov	ecx, -1431655766			; aaaaaaaaH
    	lea	esi, [eax+eax]
    	lea	edi, [edx+edx]
    	and	esi, ecx
    	and	edi, ecx
    	and	eax, ecx
    	and	edx, ecx
    	shr	eax, 1
    	shr	edx, 1
    	or	eax, esi
    	or	edx, edi
    ; Line 36
    	mov	edx, ebx
    	mov	esi, edi
    	shrd	edx, esi, 2
    	mov	eax, ebx
    	mov	ecx, edi
    	shld	ecx, eax, 2
    	shld	edi, ebx, 2
    	shr	esi, 2
    	xor	esi, ecx
    	shl	eax, 2
    	xor	edx, eax
    	shl	ebx, 2
    	and	esi, 858993459				; 33333333H
    	and	edx, 858993459				; 33333333H
    	xor	esi, edi
    	xor	edx, ebx
    	mov	ecx, -858993460				; ccccccccH
    	lea	esi, [4*eax]
    	lea	edi, [4*edx]
    	and	esi, ecx
    	and	edi, ecx
    	and	eax, ecx
    	and	edx, ecx
    	shr	eax, 2
    	shr	edx, 2
    	or	eax, esi
    	or	edx, edi
    ; Line 38
    	mov	ebx, esi
    	mov	ecx, esi
    	mov	edi, edx
    	mov	eax, edx
    	shrd	edi, ebx, 4
    	shld	ecx, eax, 4
    	shld	esi, edx, 4
    	shl	eax, 4
    	xor	edi, eax
    	shr	ebx, 4
    	xor	ebx, ecx
    	shl	edx, 4
    	and	edi, 252645135				; 0f0f0f0fH
    	and	ebx, 252645135				; 0f0f0f0fH
    	xor	ebx, esi
    	xor	edi, edx
    	mov	ecx, 252645135				; 0f0f0f0fH
    	mov	esi, ecx
    	mov	edi, ecx
    	and	esi, eax
    	and	edi, edx
    	shl	esi, 4
    	shl	edi, 4
    	shr	eax, 4
    	shr	edx, 4
    	and	eax, ecx
    	and	edx, ecx
    	or	eax, esi
    	or	edx, edi
    ; Line 40
    	mov	ebp, edi
    	mov	esi, ebx
    	shrd	ebp, esi, 8
    	mov	eax, edi
    	mov	ecx, ebx
    	shld	ecx, eax, 8
    	shr	esi, 8
    	xor	esi, ecx
    	shl	eax, 8
    	xor	ebp, eax
    	and	esi, 16711935				; 00ff00ffH
    	shld	ebx, edi, 8
    	and	ebp, 16711935				; 00ff00ffH
    	xor	esi, ebx
    	shl	edi, 8
    	mov	DWORD PTR _ull$11$[esp+12], esi
    	xor	ebp, edi
    ; Line 42
    	mov	edi, DWORD PTR _ull$11$[esp+12]
    	mov	eax, ebp
    	mov	ecx, edi
    	mov	edx, ebp
    	shrd	edx, esi, 16
    	shld	ecx, eax, 16
    	shr	esi, 16					; 00000010H
    	shl	eax, 16					; 00000010H
    	xor	edx, eax
    	xor	esi, ecx
    	shld	edi, ebp, 16
    	and	esi, 65535				; 0000ffffH
    	movzx	ecx, dx
    	xor	esi, edi
    	shl	ebp, 16					; 00000010H
    ; Line 60
    	pop	edi
    	mov	eax, esi
    	xor	ecx, ebp
    	pop	esi
    	xor	edx, edx
    	pop	ebp
    	or	edx, ecx
    	pop	ebx
    	bswap	eax
    	bswap	edx
    ; Line 61
    	ret	0
    _llreverse ENDP
    _TEXT	ENDS
    END
    In both functions, the optimiser fails to recognise that the final two or three shift & mask assignments operating on 8 bits and more can be translated into one or two BSWAP instructions instead of 9 or even 36 (in words: thirty-six) instructions!

    In the function llreverse() it fails to recognise that no shift operation crosses a register boundary and thus the generation of SHLD and SHRD instructions is not necessary!

  4. Repeat the previous steps with the alternate implementation; generate another assembly listing file case11.asm from the source file case11.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /GS- /Gy /Ox /Tccase11.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case11.c
  5. Display the assembly listing file case11.asm created in step 4.:

    TYPE case11.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case11.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_lreverse
    PUBLIC	_llreverse
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_lreverse
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _lreverse PROC						; COMDAT
    ; File c:\users\stefan\desktop\case11.c
    ; Line 17
    	mov	ecx, DWORD PTR _ul$[esp-4]
    	mov	edx, ecx
    	shr	edx, 1
    	lea	eax, DWORD PTR [ecx+ecx]
    	xor	edx, eax
    	lea	eax, DWORD PTR [ecx+ecx]
    	and	edx, 1431655765				; 55555555H
    	xor	edx, eax
    	lea	eax, [ecx+ecx]
    	shr	ecx, 1
    	and	eax, -1431655766			; aaaaaaaaH
    	and	ecx, 1431655765				; 55555555H
    	or	ecx, eax
    ; Line 19
    	mov	ecx, edx
    	shr	ecx, 2
    	lea	eax, DWORD PTR [edx*4]
    	xor	ecx, eax
    	lea	eax, DWORD PTR [edx*4]
    	and	ecx, 858993459				; 33333333H
    	xor	ecx, eax
    	lea	eax, [ecx*4]
    	shr	ecx, 2
    	and	eax, -858993460				; ccccccccH
    	and	ecx, 858993459				; 33333333H
    	or	ecx, eax
    ; Line 21
    	mov	edx, ecx
    	mov	eax, ecx
    	shl	eax, 4
    	shr	edx, 4
    	xor	edx, eax
    	shl	ecx, 4
    	and	edx, 252645135				; 0f0f0f0fH
    	xor	edx, ecx
    	mov	eax, ecx
    	shr	ecx, 4
    	shl	eax, 4
    	and	ecx, 252645135				; 0f0f0f0fH
    	and	eax, -252645136				; f0f0f0f0H
    	or	eax, ecx
    ; Line 23
    	mov	eax, edx
    	mov	ecx, edx
    	shr	eax, 8
    	shl	ecx, 8
    	xor	eax, ecx
    	shl	edx, 8
    	and	eax, 16711935				; 00ff00ffH
    	xor	eax, edx
    ; Line 25
    	rol	eax, 16					; 00000010H
    	bswap	eax
    ; Line 29
    	ret	0
    _lreverse ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llreverse
    _TEXT	SEGMENT
    _ull$11$ = 8						; size = 4
    _ull$ = 8						; size = 8
    _llreverse PROC						; COMDAT
    ; File c:\users\stefan\desktop\case11.c
    ; Line 47
    	mov	edx, DWORD PTR _ull$[esp]
    	mov	ecx, edx
    	push	ebx
    	push	ebp
    	push	esi
    	mov	esi, DWORD PTR _ull$[esp+8]
    	mov	ebx, esi
    	mov	eax, esi
    	shld	ecx, eax, 1
    	push	edi
    	mov	edi, edx
    	add	eax, eax
    	shrd	ebx, edi, 1
    	shld	edx, esi, 1
    	xor	ebx, eax
    	shr	edi, 1
    	xor	edi, ecx
    	add	esi, esi
    	and	ebx, 1431655765				; 55555555H
    	and	edi, 1431655765				; 55555555H
    	xor	ebx, esi
    	xor	edi, edx
    	mov	eax, DWORD PTR _ull$[esp]
    	mov	edx, DWORD PTR _ull$[esp-4]
    	mov	ecx, -1431655766			; aaaaaaaaH
    	lea	esi, [eax+eax]
    	lea	edi, [edx+edx]
    	and	esi, ecx
    	and	edi, ecx
    	and	eax, ecx
    	and	edx, ecx
    	shr	eax, 1
    	shr	edx, 1
    	or	eax, esi
    	or	edx, edi
    ; Line 49
    	mov	edx, ebx
    	mov	esi, edi
    	shrd	edx, esi, 2
    	mov	eax, ebx
    	mov	ecx, edi
    	shld	ecx, eax, 2
    	shld	edi, ebx, 2
    	shr	esi, 2
    	xor	esi, ecx
    	shl	eax, 2
    	xor	edx, eax
    	shl	ebx, 2
    	and	esi, 858993459				; 33333333H
    	and	edx, 858993459				; 33333333H
    	xor	esi, edi
    	xor	edx, ebx
    	mov	ecx, -858993460				; ccccccccH
    	lea	esi, [4*eax]
    	lea	edi, [4*edx]
    	and	esi, ecx
    	and	edi, ecx
    	and	eax, ecx
    	and	edx, ecx
    	shr	eax, 2
    	shr	edx, 2
    	or	eax, esi
    	or	edx, edi
    ; Line 51
    	mov	ebx, esi
    	mov	ecx, esi
    	mov	edi, edx
    	mov	eax, edx
    	shrd	edi, ebx, 4
    	shld	ecx, eax, 4
    	shld	esi, edx, 4
    	shl	eax, 4
    	xor	edi, eax
    	shr	ebx, 4
    	xor	ebx, ecx
    	shl	edx, 4
    	and	edi, 252645135				; 0f0f0f0fH
    	and	ebx, 252645135				; 0f0f0f0fH
    	xor	ebx, esi
    	xor	edi, edx
    	mov	ecx, 252645135				; 0f0f0f0fH
    	mov	esi, ecx
    	mov	edi, ecx
    	and	esi, eax
    	and	edi, edx
    	shl	esi, 4
    	shl	edi, 4
    	shr	eax, 4
    	shr	edx, 4
    	and	eax, ecx
    	and	edx, ecx
    	or	eax, esi
    	or	edx, edi
    ; Line 53
    	mov	ebp, edi
    	mov	esi, ebx
    	shrd	ebp, esi, 8
    	mov	eax, edi
    	mov	ecx, ebx
    	shld	ecx, eax, 8
    	shr	esi, 8
    	xor	esi, ecx
    	shl	eax, 8
    	xor	ebp, eax
    	and	esi, 16711935				; 00ff00ffH
    	shld	ebx, edi, 8
    	and	ebp, 16711935				; 00ff00ffH
    	xor	esi, ebx
    	shl	edi, 8
    	mov	DWORD PTR _ull$11$[esp+12], esi
    	xor	ebp, edi
    ; Line 55
    	mov	edi, DWORD PTR _ull$11$[esp+12]
    	mov	eax, ebp
    	mov	ecx, edi
    	mov	edx, ebp
    	shrd	edx, esi, 16
    	shld	ecx, eax, 16
    	shr	esi, 16					; 00000010H
    	shl	eax, 16					; 00000010H
    	xor	edx, eax
    	xor	esi, ecx
    	shld	edi, ebp, 16
    	and	esi, 65535				; 0000ffffH
    	movzx	ecx, dx
    	xor	esi, edi
    	shl	ebp, 16					; 00000010H
    ; Line 60
    	pop	edi
    	mov	eax, esi
    	xor	ecx, ebp
    	pop	esi
    	xor	edx, edx
    	pop	ebp
    	or	edx, ecx
    	pop	ebx
    	bswap	eax
    	bswap	edx
    ; Line 61
    	ret	0
    _llreverse ENDP
    _TEXT	ENDS
    END
    As bad as above!
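
The two observations made for the first listing can be illustrated in C. The sketch below is a minimal illustration under stated assumptions, not the author's code, and all function names are hypothetical. It splits the 32-bit reversal into the three steps that work inside each byte plus a byte swap (the part a BSWAP instruction could perform), and builds the 64-bit reversal from two independent 32-bit reversals with the halves exchanged, so that no shift has to cross a 32-bit register boundary.

```c
#include <stdint.h>

/* The last two shift & mask steps of lreverse(): together a 32-bit byte swap */
uint32_t bswap32(uint32_t x)
{
    x = ((x & 0xFF00FF00u) >> 8) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

/* The first three steps of lreverse(): reverse the bits inside each byte */
uint32_t reverse_bits_in_bytes(uint32_t x)
{
    x = ((x & 0xAAAAAAAAu) >> 1) | ((x & 0x55555555u) << 1);
    x = ((x & 0xCCCCCCCCu) >> 2) | ((x & 0x33333333u) << 2);
    x = ((x & 0xF0F0F0F0u) >> 4) | ((x & 0x0F0F0F0Fu) << 4);
    return x;
}

/* Full 32-bit bit reversal: per-byte reversal followed by a byte swap */
uint32_t reverse32(uint32_t x)
{
    return bswap32(reverse_bits_in_bytes(x));
}

/* 64-bit bit reversal: reverse each 32-bit half on its own, then exchange the halves */
uint64_t reverse64(uint64_t x)
{
    uint32_t low  = (uint32_t) x;
    uint32_t high = (uint32_t) (x >> 32);

    return ((uint64_t) reverse32(low) << 32) | reverse32(high);
}
```

With these definitions reverse32(0x12345678) yields 0x1E6A2C48, and reverse64(1) yields 0x8000000000000000.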

Case 12

Superfluous load and store instructions, using a superfluous temporary variable, generated by the Visual C 2017 and Visual C 2010 compilers.

Demonstration

  1. Create the text file case12.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    __inline
    unsigned long htonl(unsigned long ul)
    {
    #if _MSC_VER >= 1900
    	__asm	movbe	eax, ul
    #else
    	__asm	mov	eax, ul
    	__asm	bswap	eax
    #endif
    }
    
    int main(int argc)
    {
        unsigned long array[] = {'MSFT', 'MSVC', 'POOR', 'CODE'};
    
        argc = htonl(argc);
    
        for (argc = 0; argc < sizeof(array) / sizeof(*array); argc++)
            array[argc] = htonl(array[argc]);
    }
  2. Generate the assembly listing file case12.asm from the source file case12.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /GS- /Gy /Ox /Tccase12.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case12.c
  3. Display the assembly listing file case12.asm created in step 2.:

    TYPE case12.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case12.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_htonl
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_htonl
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _htonl	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case12.c
    ; Line 7
    	movbe	eax, DWORD PTR _ul$[esp-4]
    ; Line 12
    	ret	0
    _htonl	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _array$ = -8						; size = 16
    _ul$ = 8						; size = 4
    _argc$ = 8						; size = 4
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case12.c
    ; Line 15
    	sub	esp, 16					; 00000010H
    ; Line 16
    	mov	DWORD PTR _array$[esp+8], 1297303124	; 4d534654H
    	mov	DWORD PTR _array$[esp+12], 1297307203	; 4d535643H
    	mov	DWORD PTR _array$[esp+16], 1347374930	; 504f4f52H
    	mov	DWORD PTR _array$[esp+20], 1129268293	; 434f4445H
    ; Line 18
    	movbe	eax, DWORD PTR _argc$[esp+12]
    ; Line 20
    	xor	ecx, ecx
    	npad	6
    $LL4@main:
    ; Line 21
    	mov	eax, DWORD PTR _array$[esp+ecx*4+16]
    	mov	DWORD PTR _ul$[esp+12], eax
    	movbe	eax, DWORD PTR _ul$[esp+12]
    	movbe	eax, DWORD PTR _array$[esp+ecx*4+16]
    	mov	DWORD PTR _array$[esp+ecx*4+16], eax
    	inc	ecx
    	cmp	ecx, 4
    	jb	SHORT $LL4@main
    ; Line 22
    	add	esp, 16					; 00000010H
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    Notice the superfluous in(s)ane transfer of the EAX register to and from the (intermediate) variable _ul$ generated for line 21!
    Also notice that the superfluous instruction generated for line 18 doesn’t use an intermediate variable!
  4. Generate another assembly listing file case12.asm from the source file case12.c created in step 1., now using the Visual C 2010 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /GS- /Gy /Ox /Tccase12.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\cl.exe:        Version 16.00.40219.1
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1.dll:        Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1xx.dll:      Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c2.dll:        Version 16.00.40219.449
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\link.exe:      Version 10.00.40219.386
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\mspdb100.dll:  Version 10.00.40219.478
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\1033\clui.dll: Version 16.00.40219.1
    
    case12.c
  5. Display the assembly listing file case12.asm created in step 4.:

    TYPE case12.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.449 
    
    	TITLE	C:\Users\Stefan\Desktop\case12.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_htonl
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_htonl
    _TEXT	SEGMENT
    _ul$ = 8						; size = 4
    _htonl	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case12.c
    ; Line 9
    	mov	eax, DWORD PTR _ul$[esp-4]
    ; Line 10
    	bswap	eax
    ; Line 11
    	ret	0
    _htonl	ENDP
    _TEXT	ENDS
    
    PUBLIC	_main
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _array$ = -16						; size = 16
    $T1040 = 8						; size = 4
    _argc$ = 8						; size = 4
    _main	PROC						; COMDAT
    ; Line 15
    	push	ebp
    	mov	ebp, esp
    	sub	esp, 16					; 00000010H
    ; Line 16
    	mov	DWORD PTR _array$[ebp], 1297303124	; 4d534654H
    	mov	DWORD PTR _array$[ebp+4], 1297307203	; 4d535643H
    	mov	DWORD PTR _array$[ebp+8], 1347374930	; 504f4f52H
    	mov	DWORD PTR _array$[ebp+12], 1129268293	; 434f4445H
    ; Line 18
    	mov	eax, DWORD PTR _argc$[ebp]
    	bswap	eax
    ; Line 20
    	xor	edx, edx
    $LL3@main:
    	lea	ecx, DWORD PTR _array$[ebp+edx*4]
    ; Line 21
    	mov	eax, DWORD PTR [ecx]
    	mov	DWORD PTR $T1040[ebp], eax
    	mov	eax, DWORD PTR $T1040[ebp]
    	bswap	eax
    	inc	edx
    	mov	DWORD PTR [ecx], eax
    	cmp	edx, 4
    	jb	SHORT $LL3@main
    ; Line 22
    	leave
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    Notice the superfluous in(s)ane transfer of the EAX register to and from the intermediate (temporary) variable $T1040 generated for line 21!
    Again notice that the superfluous instructions generated for line 18 don’t use an intermediate (temporary) variable!
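
For comparison, the byte swap can also be written without inline assembler. The following sketch is only an assumption about a possible alternative, not a verified remedy: it uses the _byteswap_ulong() intrinsic that Visual C declares in <stdlib.h>, and falls back to __builtin_bswap32() or a shift-and-mask expression on other compilers. The names swap_bytes32() and swap_array() are hypothetical, and whether Visual C then avoids the temporary variable shown above has not been checked here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>     /* declares _byteswap_ulong() for Visual C */

/* Byte swap of a 32-bit value without inline assembler */
uint32_t swap_bytes32(uint32_t value)
{
#if defined(_MSC_VER)
    return _byteswap_ulong(value);          /* Visual C intrinsic */
#elif defined(__GNUC__)
    return __builtin_bswap32(value);        /* GCC and Clang builtin */
#else
    return (value >> 24)                    /* portable fallback */
         | ((value >> 8) & 0x0000FF00u)
         | ((value << 8) & 0x00FF0000u)
         | (value << 24);
#endif
}

/* Same loop as in main() of case12.c, applied to a caller-supplied array */
void swap_array(uint32_t *array, size_t count)
{
    size_t index;

    for (index = 0; index < count; index++)
        array[index] = swap_bytes32(array[index]);
}
```

Applied to the first array element from the listing, 0x4D534654, swap_bytes32() returns 0x5446534D.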

Case 13

Completely wrong code generated with __forceinline versus __inline by the Visual C 2017 compiler (and all previous versions too) when specified for a __fastcall function with a body written in inline assembler.

Note: the advice against this combination given in the MSDN article Using and Preserving Registers in Inline Assembly does not apply here: there is no code which might clobber any register!

Demonstration

  1. Create the text file case13.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    __forceinline // here be dragons!
    unsigned __fastcall lfsr32(unsigned argument, unsigned polynomial)
    {
    #ifdef MITIGATE
        return argument & 0x80000000 ? polynomial ^ (argument << 1) : argument << 1;
    #else
        __asm // 32-bit linear feedback shift register
        {
            add ecx, ecx ; ecx = argument << 1
            sbb eax, eax ; eax = CF ? -1 : 0
            and eax, edx ; eax = CF ? polynomial : 0
            xor eax, ecx ; eax = (argument << 1) ^ (CF ? polynomial : 0)
        }
    #endif
    }
    
    int main()
    {
        unsigned lfsr = 123456789;
        unsigned period = 0;
    
        do
        {
            period++;
            lfsr = lfsr32(lfsr, 0xC5);
        } while (lfsr != 123456789);
    
        return period;
    }
    Note: the constant 0xC5 represents the primitive polynomial x³²+x⁷+x⁶+x²+x⁰; it gives the 32-bit LFSR its maximum period length of 2³²−1.
  2. Generate the assembly listing file case13.asm from the source file case13.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase13.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case13.c
    case13.c(4): warning C4100: 'polynomial': unreferenced formal parameter
    case13.c(4): warning C4100: 'argument': unreferenced formal parameter
  3. Display the assembly listing file case13.asm created in step 2.:

    TYPE case13.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case13.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	@lfsr32@8
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@lfsr32@8
    _TEXT	SEGMENT
    @lfsr32@8 PROC						; COMDAT
    ; _argument$ = ecx
    ; _polynomial$ = edx
    ; File c:\users\stefan\desktop\case13.c
    ; Line 11
    	add	ecx, ecx
    ; Line 12
    	sbb	eax, eax
    ; Line 13
    	and	eax, edx
    ; Line 14
    	xor	eax, ecx
    ; Line 15
    	ret	0
    @lfsr32@8 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case13.c
    ; Line 21
    	push	ebx
    	mov	eax, 123456789				; 075bcd15H
    ; Line 22
    	mov	edx, 197				; 000000c5H
    	xor	ebx, ebx
    	xor	edx, edx
    $LL4@main:
    ; Line 26
    	inc	edx
    	inc	ebx
    ; Line 27
    	mov	ecx, eax
    	add	ecx, ecx
    	sbb	eax, eax
    	and	eax, edx
    	xor	eax, ecx
    ; Line 28
    	cmp	eax, 123456789				; 075bcd15H
    	jne	SHORT $LL4@main
    ; Line 30
    	mov	eax, edx
    	mov	eax, ebx
    	pop	ebx
    ; Line 31
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    The variable lfsr alias argument, held in register ECX (the first argument of functions with __fastcall calling convention), is not initialized with the constant 123456789, register EDX (the second argument of functions with __fastcall calling convention) is never loaded with the constant 0xC5, and the return value from the (inlined) function held in register EAX is not loaded back into register ECX!

Mitigation

After replacing the keyword __forceinline with __inline the Visual C compiler generates correct code, but does not inline the function any more.

Note: replacing the inline assembler with the equivalent C expression, as selected by the preprocessor macro MITIGATE, of course avoids this compiler bug too, but does not yield optimised code!

  1. Generate another assembly listing file case13.asm from the source file case13.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro MITIGATE defined on the command line:

    CL.EXE /Bv /c /DMITIGATE /Fa /FoNUL: /Gy /Ox /Tccase13.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case13.c
  2. Display the assembly listing file case13.asm created in step 1.:

    TYPE case13.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case13.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	@lfsr32@8
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@lfsr32@8
    _TEXT	SEGMENT
    @lfsr32@8 PROC						; COMDAT
    ; _argument$ = ecx
    ; _polynomial$ = edx
    ; File c:\users\stefan\desktop\case13.c
    ; Line 5
    	push	esi
    ; Line 7
    	lea	esi, DWORD PTR [ecx+ecx]
    	mov	eax, esi
    	xor	eax, edx
    	test	ecx, ecx
    	cmovns	eax, esi
    	pop	esi
    	add	ecx, ecx
    	sbb	eax, eax
    	and	eax, edx
    	xor	eax, ecx
    ; Line 17
    	ret	0
    @lfsr32@8 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case13.c
    ; Line 21
    	mov	ecx, 123456789				; 075bcd15H
    ; Line 22
    	xor	eax, eax
    	push	esi
    $LL4@main:
    ; Line 7
    	lea	esi, DWORD PTR [ecx+ecx]
    ; Line 26
    	inc	eax
    ; Line 28
    	mov	edx, esi
    	xor	edx, 197				; 000000c5H
    	test	ecx, ecx
    	cmovns	edx, esi
    	mov	ecx, edx
    IFDEF VARIANT
    	lea	edx, DWORD PTR [ecx+ecx]
    	sar	ecx, 31
    	and	ecx, 197				; 000000c5H
    	xor	ecx, edx
    ELSE
    	add	ecx, ecx
    	sbb	edx, edx
    	and	edx, 197				; 000000c5H
    	xor	ecx, edx
    ENDIF
    	cmp	ecx, 123456789				; 075bcd15H
    	jne	SHORT $LL4@main
    ; Line 30
    	pop	esi
    ; Line 31
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    While the generated code is correct, the compiler fails to perform an obvious optimisation: instead of explicitly setting the sign flag SF with a separate TEST instruction, it could use the carry flag CF, which a SHL (as well as an ADD) instruction sets from the most significant alias sign bit; this variant also doesn’t need the extraneous register ESI to preserve the value of the shifted (or doubled) variable!

    Note: the assembly listing also shows an alternative variant.
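
The equivalence that the optimisation above relies on can be sketched in C as follows. This is a minimal sketch, not the content of case13.c: the function names are hypothetical, and unsigned arithmetic is used throughout to stay clear of implementation-defined shifts.

```c
#include <stdint.h>

/* Ternary form: test the most significant bit, then shift and XOR conditionally. */
uint32_t lfsr32_left_ternary(uint32_t argument, uint32_t polynomial)
{
    return (argument & 0x80000000u) ? polynomial ^ (argument << 1)
                                    : argument << 1;
}

/* Mask form: 0 - (argument >> 31) is all ones if the most significant bit
   is set, else 0; SBB derives the same mask from the carry flag. */
uint32_t lfsr32_left_mask(uint32_t argument, uint32_t polynomial)
{
    uint32_t mask = 0u - (argument >> 31);

    return (mask & polynomial) ^ (argument << 1);
}
```

Both functions return the same value for every input; the mask form maps directly onto the ADD, SBB, AND, XOR sequence shown in the listing.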

Case 14

This is the reversed case of the second variant from (the previous) case 13.

Demonstration

  1. Create the text file case14.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    __forceinline // here be dragons!
    unsigned __fastcall lfsr32(unsigned argument, unsigned polynomial)
    {
    #ifdef MITIGATE
        return argument & 1 ? polynomial ^ (argument >> 1) : argument >> 1;
    #else
        __asm // 32-bit linear feedback shift register
        {
            shr ecx, 1   ; ecx = argument >> 1
            sbb eax, eax ; eax = CF ? -1 : 0
            and eax, edx ; eax = CF ? polynomial : 0
            xor eax, ecx ; eax = (argument >> 1) ^ (CF ? polynomial : 0)
        }
    #endif
    }
    
    int main()
    {
        unsigned lfsr = 123456789;
        unsigned period = 0;
    
        do
        {
            period++;
            lfsr = lfsr32(lfsr, 0xA3000000);
        } while (lfsr != 123456789);
    
        return period;
    }
    Note: the constant 0xA3000000 represents the same primitive polynomial x^32+x^30+x^26+x^25+x^0 alias x^32+x^7+x^6+x^2+x^0 as 0xC5; it’s just the bit-reversed value and gives the 32-bit LFSR its maximum period length of 2^32−1.
  2. Generate the assembly listing file case14.asm from the source file case14.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro MITIGATE defined on the command line:

    CL.EXE /Bv /c /DMITIGATE /Fa /FoNUL: /Gy /Ox /Tccase14.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case14.c
  3. Display the assembly listing file case14.asm created in step 2.:

    TYPE case14.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case14.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	@lfsr32@8
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	@lfsr32@8
    _TEXT	SEGMENT
    @lfsr32@8 PROC						; COMDAT
    ; _argument$ = ecx
    ; _polynomial$ = edx
    ; File c:\users\stefan\desktop\case14.c
    ; Line 5
    	push	esi
    ; Line 7
    	mov	esi, ecx
    	shr	esi, 1
    	mov	eax, esi
    	xor	eax, edx
    	and	cl, 1
    	cmove	eax, esi
    	pop	esi
    	shr	ecx, 1
    	sbb	eax, eax
    	and	eax, edx
    	xor	eax, ecx
    ; Line 17
    	ret	0
    @lfsr32@8 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case14.c
    ; Line 20
    	push	esi
    ; Line 21
    	mov	eax, 123456789				; 075bcd15H
    ; Line 22
    	xor	esi, esi
    	xor	ecx, ecx
    $LL4@main:
    ; Line 7
    	mov	edx, eax
    	mov	ecx, eax
    	shr	edx, 1
    ; Line 26
    	inc	esi
    	inc	ecx
    ; Line 28
    	mov	eax, edx
    	xor	eax, -1560281088			; a3000000H
    ; Line 7
    	and	cl, 1
    ; Line 28
    	cmove	eax, edx
    	and	eax, 1
    	neg	eax
    	and	eax, -1560281088			; a3000000H
    	xor	eax, edx
    	cmp	eax, 123456789				; 075bcd15H
    	jne	SHORT $LL4@main
    ; Line 30
    	mov	eax, esi
    	pop	esi
    	mov	eax, ecx
    ; Line 31
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    While the generated code is correct, the compiler fails to perform an obvious optimisation: instead of evaluating both terms of the ternary operator first and then conditionally overwriting one result with the other, depending on the least significant bit of the original value as determined by a separate AND instruction, it can use the carry flag CF, which the SHR instruction already sets from the least significant bit; this variant also doesn’t need the extraneous register ECX to preserve the original value of the shifted variable!

    Note: the assembly listing shows an alternative, equally optimised variant.
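
As a hedged illustration, the right-shifting step and the bit reversal that relates the constants 0xC5 and 0xA3000000 can be written in portable C as below. The function names lfsr32_right_ternary(), lfsr32_right_mask() and reverse32() are hypothetical and not part of case14.c.

```c
#include <stdint.h>

/* Ternary form, as in the MITIGATE branch of case14.c. */
uint32_t lfsr32_right_ternary(uint32_t argument, uint32_t polynomial)
{
    return (argument & 1u) ? polynomial ^ (argument >> 1) : argument >> 1;
}

/* Mask form: 0 - (argument & 1) is all ones if bit 0 is set, else 0;
   SHR followed by SBB derives the same mask from the carry flag. */
uint32_t lfsr32_right_mask(uint32_t argument, uint32_t polynomial)
{
    uint32_t mask = 0u - (argument & 1u);

    return (mask & polynomial) ^ (argument >> 1);
}

/* Reverse the order of all 32 bits, one bit per iteration. */
uint32_t reverse32(uint32_t value)
{
    uint32_t result = 0;
    int bit;

    for (bit = 0; bit < 32; bit++)
    {
        result = (result << 1) | (value & 1u);
        value >>= 1;
    }

    return result;
}
```

Calling reverse32(0xC5) yields 0xA3000000, the constant used for the right-shifting register.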

Case 15

These are the 64-bit variants of (the previous) case 14 and of the second variant from case 13.

Demonstration

  1. Create the text file case15.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long long right()
    {
        unsigned long long lfsr = 0x0123456789ABCDEF;
        unsigned long long period = 0;
    
        do
        {
            period++;
    #ifdef ALTERNATE
            lfsr = (-((long long) lfsr & 1) & 0xD800000000000000) ^ (lfsr >> 1);
    #else
            lfsr = lfsr & 1 ? 0xD800000000000000 ^ (lfsr >> 1) : lfsr >> 1;
    #endif
        } while (lfsr != 0x0123456789ABCDEF);
    
        return period;
    }
    
    unsigned long long left()
    {
        unsigned long long lfsr = 0x0123456789ABCDEF;
        unsigned long long period = 0;
    
        do
        {
            period++;
    #ifdef ALTERNATE
            lfsr = (-((long long) lfsr < 0) & 0x1B) ^ (lfsr << 1);
    #else
            lfsr = (long long) lfsr < 0 ? 0x1B ^ (lfsr << 1) : lfsr << 1;
    #endif
        } while (lfsr != 0x0123456789ABCDEF);
    
        return period;
    }
    Note: both constants 0xD800000000000000 and 0x1B represent the primitive polynomial x^64+x^63+x^61+x^60+x^0 alias x^64+x^4+x^3+x^1+x^0; it gives the 64-bit LFSR its maximum period length of 2^64−1.
  2. Generate the assembly listing file case15.asm from the source file case15.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase15.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    case15.c
  3. Display the assembly listing file case15.asm created in step 2.:

    TYPE case15.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	right
    PUBLIC	left
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	right
    _TEXT	SEGMENT
    right	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case15.c
    ; Line 5
    	mov	r9, 81985529216486895			; 0123456789abcdefH
    ; Line 6
    	xor	r8d, r8d
    	mov	rax, r9
    	mov	r10, -2882303761517117440		; d800000000000000H
    	npad	6
    $LL4@right:
    ; Line 14
    	mov	rdx, rax
    	movzx	ecx, al
    	shr	rdx, 1
    	inc	r8
    ; Line 16
    	mov	rax, rdx
    	xor	rax, r10
    	and	cl, 1
    	cmove	rax, rdx
    	and	eax, 1
    	neg	rax
    	and	rax, r10
    	xor	rax, rdx
    	cmp	rax, r9
    	jne	SHORT $LL4@right
    ; Line 18
    	mov	rax, r8
    ; Line 19
    	ret	0
    right	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	left
    _TEXT	SEGMENT
    left	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case15.c
    ; Line 23
    	mov	r9, 81985529216486895			; 0123456789abcdefH
    ; Line 24
    	xor	eax, eax
    	mov	rcx, r9
    	npad	1
    $LL4@left:
    ; Line 34
    	mov	rdx, rcx
    	lea	r8, QWORD PTR [rcx+rcx]
    	inc	rax
    	mov	rcx, r8
    	xor	rcx, 27
    	test	rdx, rdx
    	cmovns	rcx, r8
    	add	rcx, rcx
    	sbb	rdx, rdx
    	and	rdx, 27
    	xor	rcx, rdx
    	cmp	rcx, r9
    	jne	SHORT $LL4@left
    ; Line 37
    	ret	0
    left	ENDP
    _TEXT	ENDS
    END
    While the code generated for the function right() is correct, the compiler fails to perform an obvious optimisation: instead of evaluating both terms of the ternary operator first and then conditionally overwriting one result with the other, depending on the least significant bit of the original value as determined by a separate AND instruction, it can use the carry flag CF, which the SHR instruction already sets from the least significant bit; this variant also doesn’t need the extraneous register RCX to preserve the original value of the shifted variable!

    Additionally the registers RAX and R8 can be swapped, rendering the MOV instruction generated for line 18 superfluous.

    While the code generated for the function left() is correct too, the compiler likewise fails to perform an even more obvious optimisation: instead of explicitly setting the sign flag SF with a separate TEST instruction, it can use the carry flag CF, which a SHL (as well as an ADD) instruction sets from the most significant alias sign bit; this variant also doesn’t need the extraneous register R8 to preserve the value of the shifted (or doubled) variable!

  4. Generate another assembly listing file case15.asm from the source file case15.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase15.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case15.c
  5. Display the assembly listing file case15.asm created in step 4.:

    TYPE case15.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case15.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_right
    PUBLIC	_left
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_right
    _TEXT	SEGMENT
    _period$ = -8						; size = 8
    _right	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case15.c
    ; Line 4
    	sub	esp, 8
    	push	esi
    	xorps	xmm0, xmm0
    ; Line 5
    	mov	ecx, -1985229329			; 89abcdefH
    ; Line 6
    	movlpd	QWORD PTR _period$[esp+12], xmm0
    	mov	eax, 19088743				; 01234567H
    	mov	esi, DWORD PTR _period$[esp+16]
    	xor	esi, esi
    	push	edi
    	mov	edi, DWORD PTR _period$[esp+16]
    	xor	edi, edi
    $LL4@right:
    ; Line 10
    	add	edi, 1
    ; Line 14
    	mov	edx, ecx
    	adc	esi, 0
    	and	ecx, 1
    	shrd	edx, eax, 1
    	shrd	ecx, eax, 1
    	shr	eax, 1
    	or	ecx, 0
    	mov	ecx, edx
    	je	SHORT $LN7@right
    	xor	ecx, 0
    	xor	eax, -671088640				; d8000000H
    $LN7@right:
    	and	edx, 1
    	neg	edx
    	and	edx, -671088640				; d8000000H
    	xor	eax, edx
    ; Line 16
    	cmp	ecx, -1985229329			; 89abcdefH
    	jne	SHORT $LL4@right
    	cmp	eax, 19088743				; 01234567H
    	jne	SHORT $LL4@right
    ; Line 18
    	mov	eax, edi
    	mov	edx, esi
    	pop	edi
    	pop	esi
    ; Line 19
    	add	esp, 8
    	ret	0
    _right	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_left
    _TEXT	SEGMENT
    _period$ = -8						; size = 8
    _left	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case15.c
    ; Line 22
    	sub	esp, 8
    	push	ebx
    	xorps	xmm0, xmm0
    ; Line 23
    	mov	eax, -1985229329			; 89abcdefH
    	push	esi
    ; Line 24
    	movlpd	QWORD PTR _period$[esp+16], xmm0
    	mov	ecx, 19088743				; 01234567H
    	mov	ebx, DWORD PTR _period$[esp+16]
    	xor	ebx, ebx
    	push	edi
    	mov	edi, DWORD PTR _period$[esp+24]
    	xor	edi, edi
    $LL4@left:
    ; Line 28
    	add	ebx, 1
    	mov	edx, eax
    	mov	esi, ecx
    	adc	edi, 0
    	shld	esi, edx, 1
    	add	edx, edx
    	add	eax, eax
    	adc	ecx, ecx
    ; Line 32
    	test	ecx, ecx
    	jg	SHORT $LN6@left
    	jl	SHORT $LN11@left
    	test	eax, eax
    	jae	SHORT $LN6@left
    $LN11@left:
    	mov	eax, edx
    	mov	ecx, esi
    	xor	eax, 27					; 0000001bH
    	xor	ecx, 0
    	jmp	SHORT $LN7@left
    $LN6@left:
    	mov	eax, edx
    	mov	ecx, esi
    $LN7@left:
    	sbb	edx, edx
    	and	edx, 27					; 0000001bH
    	xor	eax, edx
    ; Line 34
    	cmp	eax, -1985229329			; 89abcdefH
    	jne	SHORT $LL4@left
    	cmp	ecx, 19088743				; 01234567H
    	jne	SHORT $LL4@left
    ; Line 36
    	mov	edx, edi
    	mov	eax, ebx
    	pop	edi
    	pop	esi
    	pop	ebx
    ; Line 37
    	add	esp, 8
    	ret	0
    _left	ENDP
    _TEXT	ENDS
    END
    The code generated for the function right() is totally screwed up: the variable period is allocated on the stack, zeroed using the SSE register XMM0, then loaded into the registers ESI and EDI, but never used again; instead of holding the variable period in the register pair EDX:EAX used for the return value, the compiler holds it in the registers EDI and ESI, which have to be transferred into EDX:EAX upon exit; register ECX, which holds the lower half of the variable lfsr, is clobbered inside the loop without necessity and has to be reloaded; the result of the AND instruction present in the EFLAGS register is ignored, and evaluated again with an extraneous OR instruction; the XOR instruction with immediate operand 0 has no effect and is superfluous too!

    The code generated for the function left() is even worse: again the variable period is allocated on the stack, zeroed using the SSE register XMM0, then loaded into the registers EBX and EDI, but never used again; instead of holding the variable period in the register pair EDX:EAX used for the return value, the compiler holds it in the registers EBX and EDI, which have to be transferred into EDX:EAX upon exit; instead of using the carry flag CF already set by the SHLD instruction, or the sign flag SF set by the first TEST instruction, it performs a full comparison against 0, using three conditional branch instructions; the registers EAX and ECX, which hold the variable lfsr, are copied without necessity into the registers EDX and ESI, which are then used for the shift and exclusive-or operation; the XOR instruction with immediate operand 0 has no effect and is superfluous!

  6. Generate another assembly listing file case15.asm from the source file case15.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tccase15.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case15.c
  7. Display the assembly listing file case15.asm created in step 6.:

    TYPE case15.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case15.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_right
    PUBLIC	_left
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_right
    _TEXT	SEGMENT
    _period$ = -8						; size = 8
    _right	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case15.c
    ; Line 4
    	sub	esp, 8
    	push	ebx
    	xorps	xmm0, xmm0
    ; Line 5
    	mov	edx, -1985229329			; 89abcdefH
    	push	esi
    ; Line 6
    	movlpd	QWORD PTR _period$[esp+16], xmm0
    	mov	eax, 19088743				; 01234567H
    	mov	ebx, DWORD PTR _period$[esp+16]
    	push	edi
    	mov	edi, DWORD PTR _period$[esp+24]
    $LL4@right:
    ; Line 10
    	add	ebx, 1
    ; Line 12
    	mov	ecx, edx
    	adc	edi, 0
    	xor	esi, esi
    	and	ecx, 1
    	neg	ecx
    	adc	esi, esi
    	xor	ecx, ecx
    	shrd	edx, eax, 1
    	neg	esi
    	and	esi, -671088640				; d8000000H
    	shr	eax, 1
    	xor	edx, ecx
    	xor	eax, esi
    ; Line 16
    	cmp	edx, -1985229329			; 89abcdefH
    	jne	SHORT $LL4@right
    	cmp	eax, 19088743				; 01234567H
    	jne	SHORT $LL4@right
    ; Line 18
    	mov	edx, edi
    	mov	eax, ebx
    	pop	edi
    	pop	esi
    	pop	ebx
    ; Line 19
    	add	esp, 8
    	ret	0
    _right	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_left
    _TEXT	SEGMENT
    $T1 = -8						; size = 8
    _period$ = -8						; size = 8
    _left	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case15.c
    ; Line 22
    	sub	esp, 8
    	push	ebx
    	xorps	xmm0, xmm0
    ; Line 23
    	mov	ecx, -1985229329			; 89abcdefH
    	push	esi
    ; Line 24
    	movlpd	QWORD PTR _period$[esp+16], xmm0
    	mov	eax, 19088743				; 01234567H
    	mov	edx, DWORD PTR _period$[esp+20]
    	mov	esi, DWORD PTR _period$[esp+16]
    	push	edi
    $LL4@left:
    ; Line 28
    	add	esi, 1
    	adc	edx, 0
    ; Line 30
    	test	eax, eax
    	jg	SHORT $LN6@left
    	jl	SHORT $LN11@left
    	test	ecx, ecx
    	jae	SHORT $LN6@left
    $LN11@left:
    	mov	edi, 27					; 0000001bH
    	xor	ebx, ebx
    	jmp	SHORT $LN7@left
    $LN6@left:
    	xorps	xmm0, xmm0
    	movlpd	QWORD PTR $T1[esp+20], xmm0
    	mov	ebx, DWORD PTR $T1[esp+24]
    	mov	edi, DWORD PTR $T1[esp+20]
    $LN7@left:
    	shld	eax, ecx, 1
    	add	ecx, ecx
    	xor	eax, ebx
    	xor	ecx, edi
    ; Line 34
    	cmp	ecx, -1985229329			; 89abcdefH
    	jne	SHORT $LL4@left
    	cmp	eax, 19088743				; 01234567H
    	jne	SHORT $LL4@left
    ; Line 36
    	pop	edi
    	mov	eax, esi
    	pop	esi
    	pop	ebx
    ; Line 37
    	add	esp, 8
    	ret	0
    _left	ENDP
    _TEXT	ENDS
    END
    The generated code is just as bad as the code produced in step 4.!
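
On a 32-bit target the 64-bit variable lfsr lives in two 32-bit halves. The sketch below shows, under that assumption, what one step of right() has to compute on the two halves; the function names are hypothetical and do not appear in case15.c. Since the constant 0xD800000000000000 only has bits set in its upper half, the lower half never needs an exclusive-or at all.

```c
#include <stdint.h>

/* Reference: one step of the right-shifting 64-bit LFSR from case15.c. */
uint64_t lfsr64_right(uint64_t lfsr)
{
    return (lfsr & 1u) ? 0xD800000000000000ULL ^ (lfsr >> 1) : lfsr >> 1;
}

/* The same step performed on two 32-bit halves. */
uint64_t lfsr64_right_halves(uint32_t high, uint32_t low)
{
    uint32_t mask = 0u - (low & 1u);          /* all ones if bit 0 is shifted out */

    low  = (low >> 1) | (high << 31);         /* what SHRD low, high, 1 computes */
    high = (high >> 1) ^ (mask & 0xD8000000u);

    return ((uint64_t) high << 32) | low;
}
```

For the start value, lfsr64_right_halves(0x01234567, 0x89ABCDEF) agrees with lfsr64_right(0x0123456789ABCDEF).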

Case 16

This is a variation of the previous cases 13 and 14, supposed to prod and tickle the optimiser.

Demonstration

  1. Create the text file case16.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    __forceinline
    long lfsr32(long argument, long polynomial)
    {
        return (((long long) argument >> 32) & polynomial) ^ (argument << 1);
    }
    
    int main()
    {
        unsigned lfsr = 123456789;
        unsigned period = 0;
    
        do
        {
            period++;
            lfsr = lfsr32(lfsr, 0xC5);
        } while (lfsr != 123456789);
    
        return period;
    }
    Note: the constant 0xC5 represents the primitive polynomial x^32+x^7+x^6+x^2+x^0; it gives the 32-bit LFSR its maximum period length of 2^32−1.
  2. Generate the assembly listing file case16.asm from the source file case16.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase16.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case16.c
  3. Display the assembly listing file case16.asm created in step 2.:

    TYPE case16.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case16.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_lfsr32
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_lfsr32
    _TEXT	SEGMENT
    _argument$ = 8						; size = 4
    _polynomial$ = 12					; size = 4
    _lfsr32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case16.c
    ; Line 5
    	push	esi
    ; Line 6
    	mov	esi, DWORD PTR _argument$[esp]
    	mov	eax, esi
    	mov	eax, DWORD PTR _argument$[esp-4]
    	cdq
    	mov	ecx, edx
    	and	edx, DWORD PTR _polynomial$[esp-4]
    	sar	ecx, 31					; 0000001fH
    	lea	eax, DWORD PTR [esi+esi]
    	add	eax, eax
    	xor	eax, edx
    	pop	esi
    ; Line 7
    	ret	0
    _lfsr32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case16.c
    ; Line 10
    	push	esi
    	push	edi
    ; Line 11
    	mov	ecx, 123456789				; 075bcd15H
    	mov	eax, 123456789				; 075bcd15H
    ; Line 12
    	xor	edi, edi
    	xor	ecx, ecx
    	npad	7
    $LL4@main:
    ; Line 6
    	mov	eax, ecx
    ; Line 16
    	inc	edi
    	inc	ecx
    ; Line 6
    	cdq
    	add	eax, eax
    	add	ecx, ecx
    	mov	esi, edx
    	and	edx, 197				; 000000c5H
    	xor	eax, edx
    	xor	ecx, edx
    	sar	esi, 31					; 0000001fH
    ; Line 18
    	cmp	ecx, 123456789				; 075bcd15H
    	cmp	eax, 123456789				; 075bcd15H
    	jne	SHORT $LL4@main
    ; Line 20
    	mov	eax, ecx
    	mov	eax, edi
    	pop	edi
    	pop	esi
    ; Line 21
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    The registers EDI and ESI are used and clobbered without necessity and reason.
    Especially notice the superfluous MOV and SAR instructions generated for line 6: their result is never used!
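
The expression ((long long) argument >> 32) used in case16.c is nothing but the sign mask of argument: 0 for non-negative values, −1 (all bits set) for negative ones, which is exactly what the CDQ instruction leaves in EDX. A minimal sketch, with hypothetical function names, assuming the arithmetic right shift that the compilers discussed here perform on negative values:

```c
#include <stdint.h>

/* Sign mask as written in case16.c: widen to 64 bits, keep the upper half.
   Right-shifting a negative value is implementation-defined in C. */
int32_t sign_mask_wide(int32_t argument)
{
    return (int32_t) ((int64_t) argument >> 32);
}

/* Equivalent without a signed shift: move the sign bit to bit 0, then negate. */
int32_t sign_mask_narrow(int32_t argument)
{
    return -(int32_t) ((uint32_t) argument >> 31);
}
```

Since the mask is already complete after CDQ, a further SAR by 31 cannot change it, which is why the MOV and SAR instructions in the listing are dead code.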

Case 17

Demonstration

  1. Create the text file case17.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long long lcg64() // linear congruential generator
    {
        static unsigned long long z = 1066149217761810ULL;
    
        z = z * 6906969069ULL + 1234567ULL;
    
        return z;
    }
    Note: both constants are from George Marsaglia’s KISS64 pseudo-random number generator; they give the 64-bit LCG its maximum period length of 2^64.
  2. Generate the assembly listing file case17.asm from the source file case17.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase17.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case17.c
  3. Display the assembly listing file case17.asm created in step 2.:

    TYPE case17.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case17.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_lcg64
    EXTRN	__allmul:PROC
    
    _DATA	SEGMENT
    ?z@?1??lcg64@@9@9 DQ 0003c9a83566fa12H			; `lcg64'::`2'::z
    _DATA	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_lcg64
    _TEXT	SEGMENT
    _lcg64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case17.c
    ; Line 7
    	push	1
    	push	-1682965523				; 9baffbedH
    	push	DWORD PTR ?z@?1??lcg64@@9@9+4
    	push	DWORD PTR ?z@?1??lcg64@@9@9
    	call	__allmul
    	mov	ecx, -1682965523			; 9baffbedH
    	mov	eax, DWORD PTR ?z@?1??lcg64@@9@9
    	mul	ecx
    	add	edx, DWORD PTR ?z@?1??lcg64@@9@9
    	imul	ecx, DWORD PTR ?z@?1??lcg64@@9@9+4
    	add	eax, 1234567				; 0012d687H
    	mov	DWORD PTR ?z@?1??lcg64@@9@9, eax
    	adc	edx, 0
    	adc	edx, ecx
    	mov	DWORD PTR ?z@?1??lcg64@@9@9+4, edx
    ; Line 10
    	ret	0
    _lcg64	ENDP
    _TEXT	ENDS
    END
    While the generated code is correct, the compiler fails to perform an obvious optimisation: the constant 6906969069 is equal to 2^32+2612001773 (the hexadecimal notation 0x19BAFFBED shows this immediately); the multiplication with 2^32 can be replaced by a simple addition of the lower half of the multiplicand to the upper half of the product.

    Note: an optimising compiler should clearly not emit 5 instructions for the call of an external routine to multiply 64-bit values, but emit the 6 instructions which can perform this operation inline!
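
The decomposition can be checked in C. The following is a sketch only: the function names are hypothetical, and lcg64_step_halves() merely mirrors the arithmetic of the MUL, ADD and IMUL instructions, not any actual compiler output.

```c
#include <stdint.h>

/* Reference: one step of the 64-bit LCG from case17.c. */
uint64_t lcg64_step(uint64_t z)
{
    return z * 6906969069ULL + 1234567ULL;
}

/* Multiplier split as 2^32 + 2612001773: z * 2^32 is just z << 32. */
uint64_t lcg64_step_split(uint64_t z)
{
    return z * 2612001773ULL + (z << 32) + 1234567ULL;
}

/* The same step on two 32-bit halves, using one 32x32->64-bit multiplication. */
uint64_t lcg64_step_halves(uint32_t high, uint32_t low)
{
    uint64_t product = (uint64_t) low * 2612001773u;      /* MUL */
    uint32_t upper = (uint32_t) (product >> 32)
                   + low                                  /* ADD: low * 2^32 */
                   + high * 2612001773u;                  /* IMUL */

    return (((uint64_t) upper << 32) | (uint32_t) product) + 1234567ULL;
}
```

All three functions agree modulo 2^64, because the term high × 2^64 of the full product falls outside the 64-bit result.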

Case 18

Demonstration

  1. Create the text file case18.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long msws32(void) // enhanced middle-square generator
    {
        static unsigned long v = 0UL;
        static unsigned long w = 0UL;
    
        w += 0x9E3779B9UL;
        v = (unsigned long) __ull_rshift(__emulu(v, v), 16);
        v += w;
        v = _byteswap_ulong(v);
    
        return v;
    }
    
    unsigned long mswsbw(void) // enhanced middle-square generator
    {
        static unsigned long long v = 0ULL;
        static unsigned long long w = 0ULL;
    
        w += 0x9E3779B97F4A7C15ULL;
        v *= v;
        v += w;
        v = (v << 32) | (v >> 32);
    
        return (unsigned long) v;
    }
    
    unsigned long long msws64(void) // enhanced middle-square generator
    {
        static unsigned long long v = 0ULL;
        static unsigned long long w = 0ULL;
    #ifdef _WIN64
        const unsigned long long x;
        const unsigned long long y = _umul128(v, v, &x);
    
        v = __shiftright128(y, x, 32);
    #else
        v = (__emulu((unsigned long) v, (unsigned long) v) >> 32)
          + (__emulu((unsigned long) v, (unsigned long) (v >> 32)) << 1)
          + (__emulu((unsigned long) (v >> 32), (unsigned long) (v >> 32)) << 32);
    #endif
        w += 0x9E3779B97F4A7C15ULL;
        v += w;
        v = _byteswap_uint64(v);
    
        return v;
    }
    
    int main(void)
    {
    #ifdef _WIN64
        volatile unsigned long long ull = msws64();
    #else
        volatile unsigned long ul = msws32();
    #endif
    }
    Note: the constants 0x9E3779B9 and 0x9E3779B97F4A7C15 are the fractional part of the golden ratio Φ = (√5+1)/2, which is equal to the inverse or reciprocal value φ = 1/Φ = Φ−1 = (√5−1)/2 = 0.6180339887498948482…, multiplied by 2^32 and 2^64 respectively – or just 2^32/Φ and 2^64/Φ.
  2. Generate the assembly listing file case18.asm from the source file case18.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase18.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case18.c
    case18.c(55): warning C4189: 'ul': local variable is initialized but not referenced
  3. Display the assembly listing file case18.asm created in step 2.:

    TYPE case18.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case18.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_msws32
    PUBLIC	_mswsbw
    PUBLIC	_msws64
    PUBLIC	_main
    EXTRN	__allmul:PROC
    
    _BSS	SEGMENT
    ?v@?1??msws32@@9@9 DQ 01H DUP (?)			; `msws32'::`2'::v
    ?w@?1??msws32@@9@9 DQ 01H DUP (?)			; `msws32'::`2'::w
    ?v@?1??mswsbw@@9@9 DQ 01H DUP (?)			; `mswsbw'::`2'::v
    ?w@?1??mswsbw@@9@9 DQ 01H DUP (?)			; `mswsbw'::`2'::w
    ?v@?1??msws64@@9@9 DQ 01H DUP (?)			; `msws64'::`2'::v
    ?w@?1??msws64@@9@9 DQ 01H DUP (?)			; `msws64'::`2'::w
    _BSS	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_msws32
    _TEXT	SEGMENT
    _msws32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case18.c
    ; Line 9
    	mov	eax, DWORD PTR ?v@?1??msws32@@9@9
    	mul	eax
    	push	esi
    	mov	esi, DWORD PTR ?w@?1??msws32@@9@9
    	shrd	eax, edx, 16
    	mov	edx, DWORD PTR ?w@?1??msws32@@9@9
    	sub	esi, 1640531527				; 61c88647H
    	add	edx, -1640531527			; 9e3779b9H
    	add	eax, esi
    	add	eax, edx
    	mov	DWORD PTR ?w@?1??msws32@@9@9, esi
    	mov	DWORD PTR ?w@?1??msws32@@9@9, edx
    ; Line 11
    	shr	edx, 16					; 00000010H
    	mov	DWORD PTR ?v@?1??msws32@@9@9, eax
    ; Line 13
    	pop	esi
    ; Line 14
    	ret	0
    _msws32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_mswsbw
    _TEXT	SEGMENT
    _mswsbw	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case18.c
    ; Line 21
    	mov	ecx, DWORD PTR ?w@?1??mswsbw@@9@9
    	add	ecx, 2135587861				; 7f4a7c15H
    	mov	DWORD PTR ?w@?1??mswsbw@@9@9, ecx
    	mov	ecx, DWORD PTR ?w@?1??mswsbw@@9@9+4
    	adc	ecx, -1640531527			; 9e3779b9H
    	mov	DWORD PTR ?w@?1??mswsbw@@9@9+4, ecx
    ; Line 22
    	mov	ecx, DWORD PTR ?v@?1??mswsbw@@9@9
    	mov	eax, ecx
    	mul	eax
    	imul	ecx, DWORD PTR ?v@?1??mswsbw@@9@9+4
    	add	ecx, ecx
    	add	edx, ecx
    ; Line 23
    	add	eax, DWORD PTR ?w@?1??mswsbw@@9@9
    	adc	edx, DWORD PTR ?w@?1??mswsbw@@9@9+4
    ; Line 22
    	mov	ecx, DWORD PTR ?v@?1??mswsbw@@9@9+4
    	mov	eax, DWORD PTR ?v@?1??mswsbw@@9@9
    	push	esi
    	mov	esi, DWORD PTR ?w@?1??mswsbw@@9@9+4
    	push	edi
    	mov	edi, DWORD PTR ?w@?1??mswsbw@@9@9
    	push	ecx
    	push	eax
    	add	edi, 2135587861				; 7f4a7c15H
    	push	ecx
    	adc	esi, -1640531527			; 9e3779b9H
    	mov	DWORD PTR ?w@?1??mswsbw@@9@9, edi
    	push	eax
    	mov	DWORD PTR ?w@?1??mswsbw@@9@9+4, esi
    	call	__allmul
    	add	eax, edi
    ; Line 26
    	pop	edi
    	adc	edx, esi
    	xor	ecx, ecx
    	or	ecx, eax
    	mov	DWORD PTR ?v@?1??mswsbw@@9@9, edx
    	mov	DWORD PTR ?v@?1??mswsbw@@9@9+4, ecx
    	mov	dword PTR ?v@?1??mswsbw@@9@9+4, eax
    	mov	eax, edx
    	pop	esi
    ; Line 27
    	ret	0
    _mswsbw	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_msws64
    _TEXT	SEGMENT
    _msws64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case18.c
    ; Line 43
    	add	DWORD PTR ?w@?1??msws64@@9@9, 2135587861 ; 7f4a7c15H
    	mov	ecx, DWORD PTR ?v@?1??msws64@@9@9+4
    	mov	eax, ecx
    	push	ebx
    	mov	ebx, DWORD PTR ?w@?1??msws64@@9@9+4
    	adc	ebx, -1640531527			; 9e3779b9H
    	push	ebp
    	mul	ecx
    	push	esi
    	mov	esi, DWORD PTR ?v@?1??msws64@@9@9
    	mov	ebp, eax
    	push	edi
    	mov	edi, edx
    	mov	DWORD PTR ?w@?1??msws64@@9@9+4, ebx
    	shld	edi, ebp, 31
    	mov	eax, esi
    	mul	ecx
    	shl	ebp, 31					; 0000001fH
    ; Line 44
    	add	ebp, eax
    	mov	eax, esi
    	adc	edi, ebx
    	shld	edi, ebp, 1
    	mul	esi
    	add	ebp, ebp
    	add	ebp, edx
    	adc	edi, 0
    	add	ebp, DWORD PTR ?w@?1??msws64@@9@9
    ; Line 45
    	bswap	ebp
    	adc	edi, ebx
    	mov	DWORD PTR ?v@?1??msws64@@9@9+4, ebp
    	bswap	edi
    	mov	DWORD PTR ?v@?1??msws64@@9@9, edi
    ; Line 47
    	mov	eax, edi
    	pop	edi
    	pop	esi
    	mov	edx, ebp
    	pop	ebp
    	pop	ebx
    ; Line 39
    	push	ebx
    	mov	eax, DWORD PTR ?v@?1??msws64@@9@9
    	mov	ebx, eax
    	mul	eax
    	mov	ecx, edx
    	mov	eax, ebx
    	mov	ebx, DWORD PTR ?v@?1??msws64@@9@9+4
    	mul	ebx
    	imul	ebx, ebx
    	add	eax, eax
    	adc	edx, edx
    	add	eax, ecx
    	adc	edx, ebx
    ; Line 42
    	mov	ecx, DWORD PTR ?w@?1??msws64@@9@9
    	mov	ebx, DWORD PTR ?w@?1??msws64@@9@9+4
    	add	ecx, 2135587861				; 7f4a7c15H
    	adc	ebx, -1640531527			; 9e3779b9H
    	mov	DWORD PTR ?w@?1??msws64@@9@9, ecx
    	mov	DWORD PTR ?w@?1??msws64@@9@9+4, ebx
    ; Line 44
    	add	eax, ecx
    	adc	edx, ebx
    ; Line 45
    	bswap	eax
    	bswap	edx
    	xchg	eax, edx
    	mov	DWORD PTR ?v@?1??msws64@@9@9, eax
    	mov	DWORD PTR ?v@?1??msws64@@9@9+4, edx
    	pop	ebx
    ; Line 47
    	ret	0
    _msws64	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _ul$ = -4						; size = 4
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case18.c
    ; Line 51
    	push	ecx
    	push	0
    ; Line 9
    	mov	eax, DWORD PTR ?v@?1??msws32@@9@9
    	mul	eax
    ; Line 51
    	push	esi
    ; Line 8
    	mov	esi, DWORD PTR ?w@?1??msws32@@9@9
    	mov	ecx, DWORD PTR ?w@?1??msws32@@9@9
    ; Line 9
    	shrd	eax, edx, 16
    	sub	esi, 1640531527				; 61c88647H
    	add	ecx, -1640531527			; 9e3779b9H
    	add	eax, esi
    	add	eax, ecx
    	mov	DWORD PTR ?w@?1??msws32@@9@9, esi
    	mov	DWORD PTR ?w@?1??msws32@@9@9, ecx
    ; Line 11
    	bswap	eax
    	mov	DWORD PTR ?v@?1??msws32@@9@9, eax
    	shr	edx, 16					; 00000010H
    ; Line 55
    	mov	DWORD PTR _ul$[esp+8], eax
    	mov	DWORD PTR _ul$[esp+4], eax
    ; Line 57
    	xor	eax, eax
    	pop	esi
    	pop	ecx
    	pop	eax
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    While the code generated for the function msws32() is correct, there is no reason to use and clobber register ESI instead of the volatile register ECX!
    Also notice the superfluous SHR instruction: its result is never used.

    While the code generated for the function mswsbw() is correct, an optimising compiler should not emit 7 instructions to call an external routine for squaring a 64-bit value, but emit the 6 instructions which can perform this operation inline!
    Also notice the superfluous XOR and OR instructions generated for line 26.

    While the code generated for the function msws64() is correct too, it has 39 instructions and clobbers all registers, but still performs multiple avoidable transfers between them; the optimal code has only 28 instructions and clobbers just 1 register!
    Especially notice the weird way to move the contents of register EBP into register EDI in lines 43 and 44, using two SHLD plus a SHL instruction.

    While the code generated for the function main() is correct, there is no reason to use and clobber register ESI instead of the volatile register ECX!
    Again notice the superfluous SHR instruction: its result is never used.

  4. Generate another assembly listing file case18.asm from the source file case18.c created in step 1., now using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase18.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    case18.c
    case18.c(34): warning C4132: 'x': const object should be initialized
    case18.c(53): warning C4189: 'ull': local variable is initialized but not referenced
  5. Display the assembly listing file case18.asm created in step 4.:

    TYPE case18.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	msws32
    PUBLIC	mswsbw
    PUBLIC	msws64
    PUBLIC	main
    
    _BSS	SEGMENT
    ?v@?1??msws32@@9@9 DD 01H DUP (?)			; `msws32'::`2'::v
    ?w@?1??msws32@@9@9 DD 01H DUP (?)			; `msws32'::`2'::w
    ?v@?1??mswsbw@@9@9 DQ 01H DUP (?)			; `mswsbw'::`2'::v
    ?w@?1??mswsbw@@9@9 DQ 01H DUP (?)			; `mswsbw'::`2'::w
    ?v@?1??msws64@@9@9 DQ 01H DUP (?)			; `msws64'::`2'::v
    ?w@?1??msws64@@9@9 DQ 01H DUP (?)			; `msws64'::`2'::w
    _BSS	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	msws32
    _TEXT	SEGMENT
    msws32	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case18.c
    ; Line 9
    	mov	eax, DWORD PTR ?v@?1??msws32@@9@9
    	mov	r8d, DWORD PTR ?w@?1??msws32@@9@9
    	mul	rax
    	add	r8d, -1640531527			; 9e3779b9H
    	shr	rax, 16
    	add	eax, r8d
    	mov	DWORD PTR ?w@?1??msws32@@9@9, r8d
    ; Line 11
    	bswap	eax
    	mov	DWORD PTR ?v@?1??msws32@@9@9, eax
    ; Line 14
    	ret	0
    msws32	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	mswsbw
    _TEXT	SEGMENT
    mswsbw	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case18.c
    ; Line 21
    	mov	rcx, QWORD PTR ?w@?1??mswsbw@@9@9
    	mov	rax, 7046029254386353131		; 61c8864680b583ebH
    	sub	rcx, rax
    	mov	rcx, -7046029254386353131		; 9e3779b97f4a7c15H
    	add	rcx, QWORD PTR ?w@?1??mswsbw@@9@9
    ; Line 22
    	mov	rax, QWORD PTR ?v@?1??mswsbw@@9@9
    	imul	rax, rax
    	mov	QWORD PTR ?w@?1??mswsbw@@9@9, rcx
    	add	rax, rcx
    ; Line 24
    	rol	rax, 32					; 00000020H
    	mov	QWORD PTR ?v@?1??mswsbw@@9@9, rax
    ; Line 27
    	ret	0
    mswsbw	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	main
    _TEXT	SEGMENT
    x$1 = 8
    ull$ = 8
    main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case18.c
    ; Line 35
    	mov	rax, QWORD PTR ?v@?1??msws64@@9@9
    ; Line 43
    	mov	r8, 7046029254386353131			; 61c8864680b583ebH
    	mov	rcx, QWORD PTR ?w@?1??msws64@@9@9
    	mov	rcx, -7046029254386353131		; 9e3779b97f4a7c15H
    	mul	rax
    	sub	rcx, r8
    	add	rcx, QWORD PTR ?w@?1??msws64@@9@9
    	shrd	rax, rdx, 32				; 00000020H
    	mov	QWORD PTR x$1[rsp], rdx
    ; Line 44
    	add	rax, rcx
    	mov	QWORD PTR ?w@?1??msws64@@9@9, rcx
    ; Line 45
    	bswap	rax
    	mov	QWORD PTR ?v@?1??msws64@@9@9, rax
    ; Line 53
    	mov	QWORD PTR ull$[rsp], rax
    ; Line 57
    	xor	eax, eax
    	ret	0
    main	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	msws64
    _TEXT	SEGMENT
    msws64	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case18.c
    ; Line 35
    	mov	rax, QWORD PTR ?v@?1??msws64@@9@9
    ; Line 43
    	mov	r8, 7046029254386353131			; 61c8864680b583ebH
    	mov	rcx, QWORD PTR ?w@?1??msws64@@9@9
    	mov	rcx, -7046029254386353131		; 9e3779b97f4a7c15H
    	mul	rax
    	sub	rcx, r8
    	add	rcx, QWORD PTR ?w@?1??msws64@@9@9
    	shrd	rax, rdx, 32				; 00000020H
    	mov	QWORD PTR ?w@?1??msws64@@9@9, rcx
    ; Line 44
    	add	rax, rcx
    ; Line 45
    	bswap	rax
    	mov	QWORD PTR ?v@?1??msws64@@9@9, rax
    ; Line 48
    	ret	0
    msws64	ENDP
    _TEXT	ENDS
    END
    While the code generated for the function msws64() is correct, there is no reason to clobber register R8.
    Also notice the superfluous MOV instruction to the superfluous temporary variable x$1 in the function main().
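
The remark on mswsbw() in step 3. rests on the fact that the low 64 bits of a 64-bit square need only one widening 32×32-bit multiplication plus one truncating 32-bit multiplication. The following is a minimal sketch in portable C, assuming the <stdint.h> types instead of the Microsoft intrinsics; the name square64_inline() is hypothetical and not part of case18.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: v * v modulo 2^64, computed from 32-bit halves.
   With v = hi * 2^32 + lo:
       v^2 = lo^2 + 2 * lo * hi * 2^32 + hi^2 * 2^64
   and the last term vanishes modulo 2^64. */
static uint64_t square64_inline(uint64_t v)
{
    uint32_t lo = (uint32_t) v;
    uint32_t hi = (uint32_t) (v >> 32);

    /* full 64-bit product of the low halves (a widening multiplication) */
    uint64_t low_square = (uint64_t) lo * lo;

    /* only the low 32 bits of lo * hi matter, since the term is shifted by 32 */
    uint32_t cross = lo * hi;

    /* doubling, again modulo 2^32 */
    cross += cross;

    /* add the cross term to the upper half of the result */
    return low_square + ((uint64_t) cross << 32);
}
```

On x86 these steps could map to one MUL, one IMUL and two ADD instructions, so no call of an external multiplication routine is needed for this case.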

Case 19

Demonstration

  1. Create the text file case19.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long long sequence(void)
    {
        static unsigned long long weyl = 0ULL;
    
        weyl += 0x9E3779B97F4A7C15ULL;
    
        return weyl ^ (weyl >> 31);
    }
  2. Generate the assembly listing file case19.asm from the source file case19.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase19.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case19.c
  3. Display the assembly listing file case19.asm created in step 2.:

    TYPE case19.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case19.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_sequence
    
    _BSS	SEGMENT
    ?weyl@?1??sequence@@9@9 DQ 01H DUP (?)			; `sequence'::`2'::weyl
    _BSS	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_sequence
    _TEXT	SEGMENT
    _sequence PROC
    ; File c:\users\stefan\desktop\case19.c
    ; Line 7
    	mov	ecx, DWORD PTR ?weyl@?1??sequence@@9@9+4
    	push	esi
    	mov	esi, DWORD PTR ?weyl@?1??sequence@@9@9
    	add	esi, 2135587861				; 7f4a7c15H
    	mov	eax, esi
    	mov	DWORD PTR ?weyl@?1??sequence@@9@9, esi
    	adc	ecx, -1640531527			; 9e3779b9H
    	mov	edx, ecx
    	mov	DWORD PTR ?weyl@?1??sequence@@9@9+4, ecx
    	mov	eax, DWORD PTR ?weyl@?1??sequence@@9@9
    	mov	edx, DWORD PTR ?weyl@?1??sequence@@9@9+4
    	add	eax, 2135587861				; 7f4a7c15H
    	adc	edx, -1640531527			; 9e3779b9H
    	mov	DWORD PTR ?weyl@?1??sequence@@9@9, eax
    	mov	DWORD PTR ?weyl@?1??sequence@@9@9+4, edx
    	mov	ecx, eax
    	shrd	eax, edx, 31
    	xor	eax, ecx
    	mov	ecx, edx
    	shr	edx, 31					; 0000001fH
    ; Line 9
    	xor	eax, esi
    	xor	edx, ecx
    	pop	esi
    ; Line 10
    	ret	0
    _sequence ENDP
    _TEXT	ENDS
    END
    While the generated code is correct, it clobbers register ESI without necessity.
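
To see how few values are live at any time in sequence() on a 32-bit target, it helps to write the computation out on explicit 32-bit halves. The following is a sketch under assumptions: the static state is replaced by a parameter so that both functions are pure, the names sequence_reference() and sequence_halves() are hypothetical, and the <stdint.h> types stand in for unsigned long and unsigned long long:

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the arithmetic of sequence() on a 64-bit value
   (assumption: the state is passed in instead of being static) */
static uint64_t sequence_reference(uint64_t weyl)
{
    weyl += 0x9E3779B97F4A7C15ULL;

    return weyl ^ (weyl >> 31);
}

/* The same arithmetic on two 32-bit halves */
static uint64_t sequence_halves(uint64_t weyl)
{
    uint32_t lo = (uint32_t) weyl;
    uint32_t hi = (uint32_t) (weyl >> 32);

    /* 64-bit addition: add the low halves, then add the high halves plus carry */
    lo += 0x7F4A7C15UL;
    hi += 0x9E3779B9UL + (lo < 0x7F4A7C15UL);

    /* 64-bit right shift by 31: the low half receives bit 31 of lo
       and the lower 31 bits of hi, the high half receives bit 31 of hi */
    uint32_t shifted_lo = (lo >> 31) | (hi << 1);
    uint32_t shifted_hi = hi >> 31;

    return ((uint64_t) (hi ^ shifted_hi) << 32) | (lo ^ shifted_lo);
}
```

In this formulation only the two halves and one scratch copy are needed at any point, which suggests that the three volatile registers EAX, ECX and EDX are sufficient.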

Case 20

Demonstration

  1. Create the text file case20.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #ifdef _WIN64
    unsigned long long nearlydivisionless(unsigned long long range,
                                          unsigned long long (*random64)(void))
    {
        unsigned long long value = random64();
        unsigned long long limit;
        unsigned long long high;
        unsigned long long low = _umul128(value, range, &high);
    
        if (low < range)
            for (limit = (0 - range) % range;
                 low < limit;
                 low = _umul128(value, range, &high))
                value = random64();
    
        return high;
    }
    #else
    unsigned long nearlydivisionless(unsigned long range,
                                     unsigned long (*random32)(void))
    {
        unsigned long      value = random32();
        unsigned long      limit;
        unsigned long long multi = __emulu(value, range);
    
        if (range > (unsigned long) multi)
            for (limit = (0 - range) % range;
                 limit > (unsigned long) multi;
                 multi = __emulu(value, range))
                value = random32();
    
        return multi >> 32;
    }
    #endif
    Note: the function nearlydivisionless() returns a uniformly distributed (pseudo-random) value in the interval [0, range); for a discussion of the algorithm see Daniel Lemire’s blog post Fast Bounded Random Numbers on GPUs.
  2. Generate the assembly listing file case20.asm from the source file case20.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase20.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    case20.c
  3. Display the assembly listing file case20.asm created in step 2.:

    TYPE case20.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	nearlydivisionless
    …
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	nearlydivisionless
    _TEXT	SEGMENT
    range$ = 48
    random64$ = 56
    nearlydivisionless PROC					; COMDAT
    ; File c:\users\stefan\desktop\case20.c
    ; Line 6
    $LN15:
    	mov	QWORD PTR [rsp+16], rbx
    	push	rsi
    	sub	rsp, 32					; 00000020H
    	sub	rsp, 40					; 00000028H
    	mov	rsi, rdx
    	mov	r11, rdx
    	mov	rbx, rcx
    	mov	r10, rcx
    ; Line 7
    	call	rsi
    	call	r11
    	mov	r8, rax
    ; Line 10
    	mov	rax, rbx
    	mov	rax, r10
    	mul	r8
    	mov	rcx, rdx
    	mov	r8, rax
    ; Line 12
    	cmp	rax, rbx
    	cmp	rax, r10
    	jae	SHORT $LN12@nearlydivi
    ; Line 13
    	xor	edx, edx
    	mov	QWORD PTR [rsp+48], rdi
    	mov	rax, rbx
    	mov	rax, r10
    	neg	rax
    	div	rbx
    	div	r10
    	mov	rdi, rdx
    	mov	r9, rdx
    ; Line 14
    	cmp	r8, rdx
    	jae	SHORT $LN11@nearlydivi
    	npad	2
    $LL4@nearlydivi:
    ; Line 16
    	call	rsi
    	call	r11
    	mov	rcx, rax
    	mov	rax, rbx
    	mov	rax, r10
    	mul	rcx
    	cmp	rax, rdi
    	cmp	rax, r9
    	jb	SHORT $LL4@nearlydivi
    ; Line 18
    	mov	rdi, QWORD PTR [rsp+48]
    	mov	rax, rdx
    ; Line 19
    	mov	rbx, QWORD PTR [rsp+56]
    	add	rsp, 40					; 00000028H
    	add	rsp, 32					; 00000020H
    	pop	rsi
    	ret	0
    $LN11@nearlydivi:
    	mov	rdi, QWORD PTR [rsp+48]
    ; Line 18
    	mov	rax, rcx
    ; Line 19
    	mov	rbx, QWORD PTR [rsp+56]
    	add	rsp, 32					; 00000020H
    	pop	rsi
    	ret	0
    $LN12@nearlydivi:
    	mov	rbx, QWORD PTR [rsp+56]
    	mov	rax, rcx
    	add	rsp, 40					; 00000028H
    	add	rsp, 32					; 00000020H
    	pop	rsi
    	ret	0
    nearlydivisionless ENDP
    _TEXT	ENDS
    END
    Instead of using the volatile registers R9, R10 and R11, the generated code clobbers the registers RBX, RDI and RSI without necessity, and uses 11 (in words: eleven) superfluous instructions to save and restore them.
    Additionally notice the 5 instructions emitted for lines 18 and 19 before the label $LN12@nearlydivi:, and the same 5 instructions emitted again immediately after that label: 14 (in words: fourteen) of a total of 45 instructions are superfluous!
  4. Generate another assembly listing file case20.asm from the source file case20.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase20.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case20.c
  5. Display the assembly listing file case20.asm created in step 4.:

    TYPE case20.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case20.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_nearlydivisionless
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_nearlydivisionless
    _TEXT	SEGMENT
    _range$ = 8						; size = 4
    _random32$ = 12						; size = 4
    _nearlydivisionless PROC				; COMDAT
    ; File c:\users\stefan\desktop\case20.c
    ; Line 23
    	push	ebx
    ; Line 24
    	mov	ebx, DWORD PTR _random32$[esp]
    	push	ebp
    	push	esi
    	call	ebx
    ; Line 26
    	mov	esi, DWORD PTR _range$[esp+8]
    	mul	esi
    	mov	ebp, eax
    	mov	ecx, edx
    ; Line 28
    	cmp	esi, ebp
    	jbe	SHORT $LN12@nearlydivi
    ; Line 29
    	mov	eax, esi
    	xor	edx, edx
    	neg	eax
    	div	esi
    	push	edi
    	mov	edi, edx
    ; Line 30
    	cmp	edi, ebp
    	jbe	SHORT $LN11@nearlydivi
    $LL4@nearlydivi:
    ; Line 32
    	call	ebx
    	mul	esi
    	cmp	edi, eax
    	ja	SHORT $LL4@nearlydivi
    ; Line 35
    	pop	edi
    	pop	esi
    	pop	ebp
    	mov	eax, edx
    	pop	ebx
    	ret	0
    $LN11@nearlydivi:
    	pop	edi
    	pop	esi
    	pop	ebp
    	mov	eax, ecx
    	pop	ebx
    	ret	0
    $LN12@nearlydivi:
    	pop	esi
    	pop	ebp
    	mov	eax, ecx
    	pop	ebx
    	ret	0
    _nearlydivisionless ENDP
    _TEXT	ENDS
    END
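
For readers without the Microsoft intrinsics at hand, the 32-bit variant of nearlydivisionless() from step 1. can be sketched in portable C. This is a sketch under assumptions, not the source of case20.c: the names bounded32(), demo_random32() and demo_state are hypothetical, the multiplication of two 32-bit values into a 64-bit product replaces __emulu(), and the Weyl sequence used as generator serves only to make the example deterministic:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical demonstration generator: a plain Weyl sequence,
   not suitable as a real source of random numbers */
static uint32_t demo_state = 0;

static uint32_t demo_random32(void)
{
    demo_state += 0x9E3779B9UL;

    return demo_state;
}

/* Value in the interval [0, range), taken from the upper half of
   the 64-bit product value * range; the division is executed only
   when the lower half of the product is less than range */
static uint32_t bounded32(uint32_t range, uint32_t (*random32)(void))
{
    uint64_t multi = (uint64_t) random32() * range;

    if ((uint32_t) multi < range) {
        /* (0 - range) % range equals 2^32 % range in 32-bit arithmetic */
        uint32_t limit = (0UL - range) % range;

        /* reject products whose lower half falls below the limit */
        while ((uint32_t) multi < limit)
            multi = (uint64_t) random32() * range;
    }

    return (uint32_t) (multi >> 32);
}
```

A call like bounded32(6, demo_random32) yields a value from 0 to 5; with a range of 1 the result is always 0, since the upper half of value * 1 is 0.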

Case 21

Demonstration

  1. Create the text file case21.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    int islarge(unsigned long long ull)
    {
    #ifndef ALTERNATE
        return ull > 0xFFFFFFFFULL;
    #else
        return (unsigned long) (ull >> 32) != 0UL;
    #endif
    }
    
    int overflow(unsigned long multiplicand, unsigned long multiplier)
    {
    #ifndef ALTERNATE
        return __emulu(multiplicand, multiplier) > 0xFFFFFFFFULL;
    #else
        return (unsigned long) (__emulu(multiplicand, multiplier) >> 32) != 0UL;
    #endif
    }
  2. Generate the assembly listing file case21.asm from the source file case21.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase21.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case21.c
  3. Display the assembly listing file case21.asm created in step 2.:

    TYPE case21.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case21.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_islarge
    PUBLIC	_overflow
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_islarge
    _TEXT	SEGMENT
    _ull$ = 8						; size = 8
    _islarge PROC						; COMDAT
    ; File c:\users\stefan\desktop\case21.c
    ; Line 6
    	cmp	DWORD PTR _ull$[esp], 0
    	ja	SHORT $LN5@islarge
    	cmp	DWORD PTR _ull$[esp-4], -1
    	ja	SHORT $LN5@islarge
    	xor	eax, eax
    	cmp	eax, DWORD PTR _ull$[esp]
    	setne	al
    ; Line 10
    	ret	0
    $LN5@islarge:
    ; Line 6
    	mov	eax, 1
    ; Line 10
    	ret	0
    _islarge ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_overflow
    _TEXT	SEGMENT
    _multiplicand$ = 8					; size = 4
    _multiplier$ = 12					; size = 4
    _overflow PROC						; COMDAT
    ; File c:\users\stefan\desktop\case21.c
    ; Line 15
    	mov	eax, DWORD PTR _multiplicand$[esp-4]
    	mul	DWORD PTR _multiplier$[esp-4]
    IFDEF VARIANT
    	seto	al
    	movzx	eax, al
    ELSE
    	test	edx, edx
    	setnz	al
    	movzx	eax, al
    	jne	SHORT $LN5@overflow
    	cmp	eax, -1
    	ja	SHORT $LN5@overflow
    	xor	eax, eax
    ; Line 19
    	ret	0
    $LN5@overflow:
    ; Line 15
    	mov	eax, 1
    ENDIF
    ; Line 19
    	ret	0
    _overflow ENDP
    _TEXT	ENDS
    END
    The compiler generates a superfluous CMP instruction and two superfluous, performance-degrading conditional branch instructions!

    Note: the assembly listing also shows an alternative variant.

  4. Repeat the previous steps with the alternate implementation; generate another assembly listing file case21.asm from the source file case21.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /GS- /Gy /Ox /Tccase21.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case21.c
  5. Display the assembly listing file case21.asm created in step 4.:

    TYPE case21.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case21.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_islarge
    PUBLIC	_overflow
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_islarge
    _TEXT	SEGMENT
    _ull$ = 8						; size = 8
    _islarge PROC						; COMDAT
    ; File c:\users\stefan\desktop\case21.c
    ; Line 8
    	mov	eax, DWORD PTR _ull$[esp]
    	neg	eax
    	sbb	eax, eax
    	neg	eax
    ; Line 10
    	ret	0
    _islarge ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_overflow
    _TEXT	SEGMENT
    _multiplicand$ = 8					; size = 4
    _multiplier$ = 12					; size = 4
    _overflow PROC						; COMDAT
    ; File c:\users\stefan\desktop\case21.c
    ; Line 17
    	mov	eax, DWORD PTR _multiplicand$[esp-4]
    	mul	DWORD PTR _multiplier$[esp-4]
    	neg	edx
    	sbb	edx, edx
    	neg	edx
    	mov	eax, edx
    	sbb	eax, eax
    	neg	eax
    ; Line 19
    	ret	0
    _overflow ENDP
    _TEXT	ENDS
    END
    The code generated for the alternative implementation of the function islarge() is good.
    The code generated for the alternative implementation of the function overflow() is better, but still quite bad: the MUL instruction already sets the carry flag CF (and the overflow flag OF too), so there is no need for the first NEG instruction, and the SBB instruction should of course set register EAX instead of EDX!
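
Both implementations of overflow() test the same property: the upper 32 bits of the 64-bit product are not 0, which is the condition under which the MUL instruction sets CF and OF. The following is a minimal sketch in portable C, assuming the <stdint.h> types instead of __emulu(); the names overflow_compare() and overflow_highpart() are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Counterpart of the default implementation:
   compare the full product against 2^32 - 1 */
static int overflow_compare(uint32_t multiplicand, uint32_t multiplier)
{
    return (uint64_t) multiplicand * multiplier > 0xFFFFFFFFULL;
}

/* Counterpart of the ALTERNATE implementation:
   test only the upper half of the product */
static int overflow_highpart(uint32_t multiplicand, uint32_t multiplier)
{
    return (uint32_t) (((uint64_t) multiplicand * multiplier) >> 32) != 0;
}
```

For example, 0xFFFF × 0x10001 = 0xFFFFFFFF still fits in 32 bits, while 0x10000 × 0x10000 = 2^32 does not; both functions agree on these and on all other inputs.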

Case 22

Demonstration

  1. Create the text file case22.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long long ullmul(unsigned long long p, unsigned long long q)
    {
    #ifdef OPTIMIZE
        if (((unsigned long) p | (unsigned long) q) == 0)
            return 0;
    
        if (((unsigned long) (p >> 32) | (unsigned long) (q >> 32)) == 0)
            return __emulu((unsigned long) p, (unsigned long) q);
    #endif
        return __emulu((unsigned long) p, (unsigned long) q)
             + ((unsigned long long) ((unsigned long) p * (unsigned long) (q >> 32)) << 32)
             + ((unsigned long long) ((unsigned long) q * (unsigned long) (p >> 32)) << 32);
    }
    
    long long llmul(long long p, long long q)
    {
    #ifdef OPTIMIZE
        if (((unsigned long) (p >> 32) | (unsigned long) (q >> 32)) == 0)
            return __emulu((unsigned long) p, (unsigned long) q);
    
        if (((unsigned long) p | (unsigned long) q) == 0)
            return 0;
    #endif
        return __emulu((unsigned long) p, (unsigned long) q)
             + ((unsigned long long) ((unsigned long) p * (unsigned long) (q >> 32)) << 32)
             + ((unsigned long long) ((unsigned long) q * (unsigned long) (p >> 32)) << 32);
    }
  2. Generate the assembly listing file case22.asm from the source file case22.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase22.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case22.c
  3. Display the assembly listing file case22.asm created in step 2.:

    TYPE case22.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case22.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_ullmul
    PUBLIC	_llmul
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_ullmul
    _TEXT	SEGMENT
    _p$ = 8							; size = 8
    _q$ = 16						; size = 8
    _ullmul	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case22.c
    ; Line 12
    	mov	eax, DWORD PTR _p$[esp-4]
    	mul	DWORD PTR _q$[esp-4]
    	mov	ecx, DWORD PTR _p$[esp]
    	imul	ecx, DWORD PTR _q$[esp-4]
    	push	esi
    	mov	esi, DWORD PTR _q$[esp+4]
    	imul	esi, DWORD PTR _p$[esp]
    	add	esi, ecx
    	add	eax, 0
    	adc	edx, esi
    	pop	esi
    	add	edx, ecx
    	mov	ecx, DWORD PTR _q$[esp]
    	imul	ecx, DWORD PTR _p$[esp-4]
    	add	edx, ecx
    ; Line 15
    	ret	0
    _ullmul	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llmul
    _TEXT	SEGMENT
    _p$ = 8							; size = 8
    _q$ = 16						; size = 8
    _llmul	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case22.c
    ; Line 26
    	mov	eax, DWORD PTR _p$[esp-4]
    	mul	DWORD PTR _q$[esp-4]
    	push	ebp
    	push	edi
    	mov	edi, DWORD PTR _q$[esp+8]
    	mov	ebp, eax
    	mov	ecx, edi
    	imul	edi, DWORD PTR _p$[esp+4]
    	sar	ecx, 31					; 0000001fH
    	mov	ecx, DWORD PTR _p$[esp+8]
    	mov	eax, ecx
    	imul	ecx, DWORD PTR _q$[esp+4]
    	sar	eax, 31					; 0000001fH
    	add	edi, ecx
    	add	ebp, 0
    	mov	eax, ebp
    	adc	edx, edi
    	pop	edi
    	pop	ebp
    	mov	ecx, DWORD PTR _p$[esp]
    	imul	ecx, DWORD PTR _q$[esp-4]
    	add	edx, ecx
    	mov	ecx, DWORD PTR _q$[esp]
    	imul	ecx, DWORD PTR _p$[esp-4]
    	add	edx, ecx
    ; Line 29
    	ret	0
    _llmul	ENDP
    _TEXT	ENDS
    END
    Especially notice the superfluous arithmetic right shifts by 31 generated for the llmul() routine, and the preceding loads of the registers ECX and EAX: their results are never used!
    The other highlight is the addition of 0, which can never set the carry flag CF, followed by an ADC (add with carry) instruction, which adds this always clear flag.
  4. Generate another assembly listing file case22.asm from the source file case22.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro OPTIMIZE defined on the command line:

    CL.EXE /Bv /c /DOPTIMIZE /Fa /FoNUL: /Gy /Ox /Tccase22.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case22.c
  5. Display the assembly listing file case22.asm created in step 4.:

    TYPE case22.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case22.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_ullmul
    PUBLIC	_llmul
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_ullmul
    _TEXT	SEGMENT
    tv261 = 8						; size = 8
    tv252 = 8						; size = 8
    _p$ = 8							; size = 8
    _q$ = 16						; size = 8
    _ullmul	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case22.c
    ; Line 4
    	push	esi
    	mov	esi, DWORD PTR _q$[esp]
    	push	edi
    	mov	edi, DWORD PTR _p$[esp+4]
    ; Line 6
    	mov	eax, edi
    	or	eax, esi
    	jne	SHORT $LN2@ullmul
    ; Line 7
    	pop	edi
    	xor	edx, edx
    ; Line 15
    	pop	esi
    	ret	0
    $LN2@ullmul:
    	push	ebx
    ; Line 9
    	mov	ebx, DWORD PTR _q$[esp+12]
    	mov	eax, edi
    	push	ebp
    	mov	ebp, DWORD PTR _p$[esp+16]
    	mov	ecx, ebp
    	or	ecx, ebx
    	mov	DWORD PTR tv252[esp+16], 0
    	mov	DWORD PTR tv261[esp+16], 0
    	jne	SHORT $LN3@ullmul
    ; Line 10
    	pop	ebp
    	pop	ebx
    	pop	edi
    	mul	esi
    ; Line 15
    	pop	esi
    	ret	0
    $LN3@ullmul:
    ; Line 12
    	imul	ebx, edi
    	imul	ebp, esi
    	mul	esi
    	add	ebx, ebp
    	add	eax, 0
    	pop	ebp
    	adc	edx, ebx
    	pop	ebx
    	pop	edi
    ; Line 15
    	pop	esi
    ; Line 6
    	mov	eax, DWORD PTR _p$[esp-4]
    	mov	edx, DWORD PTR _q$[esp-4]
    	or	edx, eax
    	je	SHORT $LN2@ullmul
    ; Line 9
    	mov	ecx, DWORD PTR _q$[esp]
    	mov	edx, DWORD PTR _p$[esp]
    	or	edx, ecx
    	jne	SHORT $LN3@ullmul
    ; Line 10
    	mul	DWORD PTR _q$[esp-4]
    $LN2@ullmul:
    ; Line 15
    	ret	0
    $LN3@ullmul:
    ; Line 12
    	imul	ecx, eax
    	mul	DWORD PTR _q$[esp-4]
    	add	edx, ecx
    	mov	ecx, DWORD PTR _p$[esp]
    	imul	ecx, DWORD PTR _q$[esp-4]
    	add	edx, ecx
    ; Line 15
    	ret	0
    _ullmul	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llmul
    _TEXT	SEGMENT
    tv249 = 8						; size = 8
    tv240 = 8						; size = 8
    _p$ = 8							; size = 8
    _q$ = 16						; size = 8
    _llmul	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case22.c
    ; Line 18
    	sub	esp, 8
    	push	ebx
    ; Line 20
    	mov	ebx, DWORD PTR _p$[esp+12]
    	mov	eax, ebx
    	sar	eax, 31					; 0000001fH
    	mov	ecx, ebx
    	push	ebp
    	mov	ebp, DWORD PTR _q$[esp+16]
    	mov	DWORD PTR tv240[esp+20], eax
    	mov	eax, ebp
    	sar	eax, 31					; 0000001fH
    	or	ecx, ebp
    	push	esi
    	mov	esi, DWORD PTR _p$[esp+16]
    	push	edi
    	mov	edi, DWORD PTR _q$[esp+20]
    	mov	DWORD PTR tv249[esp+28], eax
    	mov	eax, esi
    	jne	SHORT $LN2@llmul
    ; Line 21
    	mul	edi
    	pop	edi
    ; Line 29
    	pop	esi
    	pop	ebp
    	pop	ebx
    	add	esp, 8
    	ret	0
    $LN2@llmul:
    ; Line 23
    	or	eax, edi
    	jne	SHORT $LN3@llmul
    ; Line 29
    	pop	edi
    	pop	esi
    	pop	ebp
    	xor	edx, edx
    	pop	ebx
    	add	esp, 8
    	ret	0
    $LN3@llmul:
    ; Line 26
    	mov	eax, esi
    	imul	esi, ebp
    	mul	edi
    	imul	edi, ebx
    	add	esi, edi
    	add	eax, 0
    	pop	edi
    	adc	edx, esi
    ; Line 29
    	pop	esi
    	pop	ebp
    	pop	ebx
    	add	esp, 8
    ; Line 20
    	mov	ecx, DWORD PTR _q$[esp]
    	mov	edx, DWORD PTR _p$[esp]
    	or	edx, ecx
    	jne	SHORT $LN2@ullmul
    ; Line 21
    	mul	DWORD PTR _q$[esp-4]
    ; Line 29
    	ret	0
    $LN2@llmul:
    ; Line 23
    	mov	eax, DWORD PTR _p$[esp-4]
    	mov	edx, DWORD PTR _q$[esp-4]
    	or	edx, eax
    	je	SHORT $LN3@llmul
    ; Line 26
    	imul	ecx, eax
    	mul	DWORD PTR _q$[esp-4]
    	add	edx, ecx
    	mov	ecx, DWORD PTR _p$[esp]
    	imul	ecx, DWORD PTR _q$[esp-4]
    	add	edx, ecx
    $LN3@llmul:
    ; Line 29
    	ret	0
    _llmul	ENDP
    _TEXT	ENDS
    END
    Instead of loading the low parts of both arguments into the registers EAX and EDX (which return the result) and testing their logical OR for 0, the registers ESI and EDI are clobbered, both of which must be saved and restored.
    In both routines, superfluous temporary variables (tv252 and tv261 in ullmul(), tv240 and tv249 in llmul()) are allocated and assigned values which are never read anywhere: an advanced technique known as WORN (write once, read never)!
    Again notice the superfluous arithmetic right shifts by 31 generated for the llmul() routine: their results are assigned to the (otherwise unused) temporary variables.
    The other highlight is again the addition of 0, which can never set the carry flag CF, followed by an ADC (add with carry) instruction, which adds this always clear flag.
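
The decomposition used by ullmul() and llmul() can be sketched in portable C99. This is an illustration under assumptions: the name mul64_sketch() is hypothetical, and a widening multiplication of uint32_t operands stands in for the __emulu() intrinsic. It also shows why the ADD of 0 is pointless: both cross products are shifted left by 32 bits, so they contribute nothing to the low half of the result, and no carry can arise there.

```c
#include <stdint.h>

// Hypothetical sketch (assumption, not the author's code):
// 64×64-bit multiplication modulo 2^64, built from 32-bit halves.
uint64_t mul64_sketch(uint64_t p, uint64_t q)
{
    uint32_t p_low  = (uint32_t) p;
    uint32_t p_high = (uint32_t) (p >> 32);
    uint32_t q_low  = (uint32_t) q;
    uint32_t q_high = (uint32_t) (q >> 32);

    // Full 64-bit product of the low halves (one MUL instruction on x86).
    uint64_t product = (uint64_t) p_low * q_low;

    // Only the low 32 bits of the cross products survive the shift by 32
    // (two IMUL instructions and one ADD on x86).
    uint32_t cross = p_low * q_high + q_low * p_high;

    // The shifted term has a zero low half: it changes only the high half.
    return product + ((uint64_t) cross << 32);
}
```

Because the result is taken modulo 2^64, the same routine serves signed and unsigned operands in two's complement representation, which is why llmul() needs no sign handling at all.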

Case 23

Demonstration

  1. Create the text file case23.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long long __udivmoddi4(unsigned long long numerator,
                                    unsigned long long denominator,
                                    unsigned long long *remainder);
    
    #ifndef ALTERNATE
    long long __absdi2(long long argument)
    {
        long long s = argument >> 63;   // s = argument < 0 ? -1 : 0
        return (argument ^ s) - s;      // negate if argument < 0
    }
    
    long long __divdi3(long long dividend, long long divisor)
    {
        long long r = divisor >> 63;    // r = divisor < 0 ? -1 : 0
        long long s = dividend >> 63;   // s = dividend < 0 ? -1 : 0
        divisor = (divisor ^ r) - r;    // negate if divisor < 0
        dividend = (dividend ^ s) - s;  // negate if dividend < 0
        s ^= r;                         // sign of quotient
                                        // negate if quotient < 0
        return (__udivmoddi4(dividend, divisor, 0) ^ s) - s;
    }
    
    long long __moddi3(long long dividend, long long divisor)
    {
        long long r = divisor >> 63;    // r = divisor < 0 ? -1 : 0
        long long s = dividend >> 63;   // s = dividend < 0 ? -1 : 0
        divisor = (divisor ^ r) - r;    // negate if divisor < 0
        dividend = (dividend ^ s) - s;  // negate if dividend < 0
        __udivmoddi4(dividend, divisor, &r);
        return (r ^ s) - s;             // negate if dividend < 0
    }
    #else
    typedef union _large
    {
        long long ll;
        unsigned long long ull;
        struct
        {
            unsigned long low;
            long high;
        };
    } LARGE;
    
    long long __absdi2(long long argument)
    {
        LARGE value = {argument};
        long long s = (long long) value.high >> 32;
        return (value.ll ^ s) - s;
    }
    
    long long __divdi3(long long numerator, long long denominator)
    {
        LARGE divisor = {denominator};
        LARGE dividend = {numerator};
        long long r = (long long) divisor.high >> 32;
        long long s = (long long) dividend.high >> 32;
        divisor.ll = (divisor.ll ^ r) - r;
        dividend.ll = (dividend.ll ^ s) - s;
        s ^= r;
        return (__udivmoddi4(dividend.ull, divisor.ull, 0) ^ s) - s;
    }
    
    long long __moddi3(long long numerator, long long denominator)
    {
        LARGE divisor = {denominator};
        LARGE dividend = {numerator};
        LARGE remainder;
        long long r = (long long) divisor.high >> 32;
        long long s = (long long) dividend.high >> 32;
        divisor.ll = (divisor.ll ^ r) - r;
        dividend.ll = (dividend.ll ^ s) - s;
        __udivmoddi4(dividend.ull, divisor.ull, &remainder.ull);
        return (remainder.ll ^ s) - s;
    }
    
    long long __muldi3(long long multiplicand, long long multiplier)
    {
        LARGE p = {multiplicand};
        LARGE q = {multiplier};
        LARGE product = {__emulu(p.low, q.low)};
        product.high += p.low * q.high;
        product.high += q.low * p.high;
        return product.ll;
    }
    #endif
  2. Generate the assembly listing file case23.asm from the source file case23.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase23.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case23.c
  3. Display the assembly listing file case23.asm created in step 2.:

    TYPE case23.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case23.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	___absdi2
    PUBLIC	___divdi3
    PUBLIC	___moddi3
    EXTRN	___udivmoddi4:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	___absdi2
    _TEXT	SEGMENT
    _argument$ = 8						; size = 8
    ___absdi2 PROC						; COMDAT
    ; File c:\users\stefan\desktop\case23.c
    ; Line 9
    	push	esi
    	push	edi
    ; Line 10
    	mov	edi, DWORD PTR _argument$[esp+8]
    	mov	esi, edi
    	sar	esi, 31					; 0000001fH
    	mov	ecx, edi
    	mov	edx, DWORD PTR _argument$[esp]
    	mov	eax, DWORD PTR _argument$[esp-4]
    	mov	ecx, edx
    	sar	ecx, 31					; 0000001fH
    ; Line 11
    	xor	eax, ecx
    	xor	edx, ecx
    	sub	eax, ecx
    	sbb	edx, ecx
    	mov	eax, esi
    	xor	eax, DWORD PTR _argument$[esp+4]
    	mov	edx, ecx
    	xor	edx, edi
    	sub	eax, esi
    	pop	edi
    	sbb	edx, ecx
    	pop	esi
    ; Line 12
    	ret	0
    ___absdi2 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	___divdi3
    _TEXT	SEGMENT
    _s$1$ = -4						; size = 4
    _s$2$ = 8						; size = 4
    _dividend$ = 8						; size = 8
    _divisor$ = 16						; size = 8
    ___divdi3 PROC						; COMDAT
    ; File c:\users\stefan\desktop\case23.c
    ; Line 15
    	push	ecx
    ; Line 17
    	mov	eax, DWORD PTR _dividend$[esp+4]
    	mov	edx, eax
    	push	ebx
    	push	ebp
    	mov	ebp, DWORD PTR _divisor$[esp+12]
    	mov	ecx, eax
    	push	esi
    	sar	edx, 31					; 0000001fH
    	mov	ebx, ebp
    	sar	ecx, 31					; 0000001fH
    ; Line 19
    	mov	esi, edx
    	xor	esi, DWORD PTR _dividend$[esp+12]
    	push	edi
    	mov	DWORD PTR _s$1$[esp+20], edx
    	mov	edi, ebp
    	mov	edx, ecx
    	sar	edi, 31					; 0000001fH
    	xor	edx, eax
    	sar	ebx, 31					; 0000001fH
    	mov	eax, DWORD PTR _s$1$[esp+20]
    	sub	esi, eax
    	mov	eax, DWORD PTR _divisor$[esp+4]
    	mov	ecx, DWORD PTR _divisor$[esp]
    	cdq
    	xor	ecx, edx
    	xor	eax, edx
    	sub	ecx, edx
    	sbb	eax, edx
    	mov	ebx, edx
    ; Line 22
    	push	0
    	sbb	edx, ecx
    	xor	eax, ebx
    	xor	ecx, edi
    	mov	DWORD PTR _s$1$[esp+24], eax
    	mov	DWORD PTR _s$2$[esp+20], ecx
    	mov	eax, edi
    	mov	ecx, ebx
    	xor	eax, ebp
    	xor	ecx, DWORD PTR _divisor$[esp+20]
    	sub	ecx, ebx
    	sbb	eax, edi
    	push	eax
    	push	ecx
    	push	edx
    	push	esi
    	mov	eax, DWORD PTR _dividend$[esp+16]
    	mov	ecx, DWORD PTR _dividend$[esp+12]
    	cdq
    	xor	ecx, edx
    	xor	eax, edx
    	sub	ecx, edx
    	sbb	eax, edx
    	push	eax
    	push	ecx
    	xor	ebx, edx
    	call	___udivmoddi4
    	xor	eax, ebx
    	xor	edx, ebx
    	sub	eax, ebx
    	sbb	edx, ebx
    	xor	eax, DWORD PTR _s$1$[esp+40]
    	add	esp, 20					; 00000014H
    	xor	edx, DWORD PTR _s$2$[esp+16]
    	sub	eax, DWORD PTR _s$1$[esp+20]
    	sbb	edx, DWORD PTR _s$2$[esp+16]
    	pop	edi
    	pop	esi
    	pop	ebp
    	pop	ebx
    ; Line 23
    	pop	ecx
    	ret	0
    ___divdi3 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	___moddi3
    _TEXT	SEGMENT
    _s$1$ = -12						; size = 4
    _r$ = -8						; size = 8
    _dividend$ = 8						; size = 8
    _divisor$ = 16						; size = 8
    ___moddi3 PROC						; COMDAT
    ; File c:\users\stefan\desktop\case23.c
    ; Line 26
    	sub	esp, 12					; 0000000cH
    	sub	esp, 8
    ; Line 28
    	mov	edx, DWORD PTR _dividend$[esp+12]
    	mov	eax, edx
    	push	ebx
    	mov	ebx, DWORD PTR _divisor$[esp+16]
    	push	ebp
    	push	esi
    	sar	eax, 31					; 0000001fH
    	mov	esi, ebx
    	push	edi
    	mov	DWORD PTR _s$1$[esp+28], eax
    	mov	edi, ebx
    	sar	esi, 31					; 0000001fH
    ; Line 31
    	lea	eax, DWORD PTR _r$[esp+28]
    	lea	eax, DWORD PTR _r$[esp+12]
    	push	eax
    	sar	edi, 31					; 0000001fH
    	mov	ebp, edx
    	sar	ebp, 31					; 0000001fH
    	mov	ecx, edi
    	xor	ecx, DWORD PTR _divisor$[esp+28]
    	mov	eax, esi
    	xor	eax, ebx
    	mov	DWORD PTR _r$[esp+36], esi
    	sub	ecx, edi
    	mov	DWORD PTR _r$[esp+32], edi
    	sbb	eax, esi
    	mov	esi, DWORD PTR _s$1$[esp+32]
    	mov	eax, DWORD PTR _divisor$[esp+12]
    	mov	ecx, DWORD PTR _divisor$[esp+8]
    	cdq
    	xor	ecx, edx
    	xor	eax, edx
    	sub	ecx, edx
    	sbb	eax, edx
    	push	eax
    	push	ecx
    	mov	ecx, esi
    	mov	eax, ebp
    	xor	ecx, DWORD PTR _dividend$[esp+36]
    	xor	eax, edx
    	sub	ecx, esi
    	sbb	eax, ebp
    	mov	eax, DWORD PTR _dividend$[esp+12]
    	mov	ecx, DWORD PTR _dividend$[esp+8]
    	cdq
    	xor	ecx, edx
    	xor	eax, edx
    	sub	ecx, edx
    	sbb	eax, edx
    	mov	ebx, edx
    	push	eax
    	push	ecx
    	call	___udivmoddi4
    	add	esp, 20					; 00000014H
    ; Line 32
    	mov	eax, esi
    	mov	eax, ebx
    	xor	eax, DWORD PTR _r$[esp+28]
    	xor	eax, DWORD PTR _r$[esp+12]
    	mov	edx, ebp
    	mov	edx, ebx
    	xor	edx, DWORD PTR _r$[esp+32]
    	xor	edx, DWORD PTR _r$[esp+16]
    	sub	eax, esi
    	sub	eax, ebx
    	pop	edi
    	pop	esi
    	sbb	edx, ebp
    	sbb	edx, ebx
    	pop	ebp
    	pop	ebx
    ; Line 33
    	add	esp, 12					; 0000000cH
    	add	esp, 8
    	ret	0
    ___moddi3 ENDP
    _TEXT	ENDS
    END
    While the code generated for the function __absdi2() is correct, it has 16 instructions and uses the registers EDI and ESI without necessity; the properly optimised code has only 9 instructions and clobbers no registers.
    Especially notice the 2 arithmetic right shifts by 31: they are performed on the same value, but in different registers!

    While the code generated for the function __divdi3() is correct, it has 50 instructions, clobbers all registers, but still performs multiple avoidable transfers between them, and additionally uses two superfluous temporary variables _s$1$ and _s$2$, which even hold the same value; the optimal code has only 30 instructions and clobbers just 1 register!
    Especially notice the repeated arithmetic right shifts by 31: half of them are superfluous, as are the instructions which load the registers they operate on.

    While the code generated for the function __moddi3() is correct too, it has 51 instructions, clobbers all registers, but still performs multiple avoidable transfers between them, and additionally uses a superfluous temporary variable _s$1$; the properly optimised code has only 34 instructions and clobbers just 1 register!
    Again notice the repeated arithmetic right shifts by 31: half of them are superfluous, as are the instructions which load the registers they operate on.

  4. Repeat the previous steps with the alternate implementation; generate another assembly listing file case23.asm from the source file case23.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tccase23.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case23.c
    case23.c(43): warning C4201: nonstandard extension used: nameless struct/union
    case23.c(48): warning C4204: nonstandard extension used: non-constant aggregate initializer
    case23.c(55): warning C4204: nonstandard extension used: non-constant aggregate initializer
    case23.c(56): warning C4204: nonstandard extension used: non-constant aggregate initializer
    case23.c(67): warning C4204: nonstandard extension used: non-constant aggregate initializer
    case23.c(68): warning C4204: nonstandard extension used: non-constant aggregate initializer
    case23.c(80): warning C4204: nonstandard extension used: non-constant aggregate initializer
    case23.c(81): warning C4204: nonstandard extension used: non-constant aggregate initializer
    case23.c(82): warning C4204: nonstandard extension used: non-constant aggregate initializer
  5. Display the assembly listing file case23.asm created in step 4.:

    TYPE case23.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case23.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	___absdi2
    PUBLIC	___divdi3
    PUBLIC	___moddi3
    PUBLIC	___muldi3
    EXTRN	___udivmoddi4:PROC
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	___absdi2
    _TEXT	SEGMENT
    _argument$ = 8						; size = 8
    ___absdi2 PROC						; COMDAT
    ; File c:\users\stefan\desktop\case23.c
    ; Line 48
    	mov	eax, DWORD PTR _argument$[esp]
    	cdq
    ; Line 49
    	mov	ecx, edx
    	push	esi
    	mov	esi, ecx
    	mov	eax, ecx
    	xor	eax, DWORD PTR _argument$[esp]
    	sar	esi, 31					; 0000001fH
    	mov	edx, esi
    	xor	edx, DWORD PTR _argument$[esp+4]
    ; Line 50
    	sub	eax, ecx
    	sbb	edx, esi
    	pop	esi
    	mov	ecx, DWORD PTR _argument$[esp-4]
    	xor	ecx, edx
    	xor	eax, edx
    	sub	ecx, edx
    	sbb	eax, edx
    	mov	edx, eax
    	mov	eax, ecx
    ; Line 51
    	ret	0
    ___absdi2 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	___divdi3
    _TEXT	SEGMENT
    _s$2$ = -4						; size = 4
    _s$1$ = 8						; size = 4
    _numerator$ = 8						; size = 8
    _denominator$ = 16					; size = 8
    ___divdi3 PROC						; COMDAT
    ; File c:\users\stefan\desktop\case23.c
    ; Line 54
    	push	ecx
    ; Line 56
    	mov	eax, DWORD PTR _denominator$[esp+4]
    	cdq
    ; Line 57
    	mov	eax, DWORD PTR _numerator$[esp+4]
    	push	ebx
    	push	ebp
    	mov	ebx, edx
    	cdq
    	push	esi
    	push	edi
    ; Line 58
    	mov	eax, edx
    	mov	ebp, ebx
    	sar	edx, 31					; 0000001fH
    	mov	edi, eax
    	xor	edi, DWORD PTR _numerator$[esp+16]
    	mov	esi, edx
    	xor	esi, DWORD PTR _numerator$[esp+20]
    ; Line 62
    	mov	ecx, ebx
    	sar	ebp, 31					; 0000001fH
    	sub	edi, eax
    	push	0
    	sbb	esi, edx
    	xor	ecx, DWORD PTR _denominator$[esp+20]
    	xor	eax, ebx
    	xor	edx, ebp
    	mov	DWORD PTR _s$1$[esp+20], eax
    	mov	eax, ebp
    	xor	eax, DWORD PTR _denominator$[esp+24]
    	sub	ecx, ebx
    	mov	DWORD PTR _s$2$[esp+24], edx
    	sbb	eax, ebp
    	push	eax
    	push	ecx
    	push	esi
    	push	edi
    	call	___udivmoddi4
    	xor	eax, DWORD PTR _s$1$[esp+36]
    	add	esp, 20					; 00000014H
    	xor	edx, DWORD PTR _s$2$[esp+20]
    	sub	eax, DWORD PTR _s$1$[esp+16]
    	sbb	edx, DWORD PTR _s$2$[esp+20]
    	pop	edi
    	pop	esi
    	pop	ebp
    	pop	ebx
    ; Line 63
    	pop	ecx
    	ret	0
    ___divdi3 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	___moddi3
    _TEXT	SEGMENT
    _s$2$ = -16						; size = 4
    _s$1$ = -12						; size = 4
    _remainder$ = -8					; size = 8
    _numerator$ = 8						; size = 8
    _denominator$ = 16					; size = 8
    ___moddi3 PROC						; COMDAT
    ; File c:\users\stefan\desktop\case23.c
    ; Line 66
    	sub	esp, 16					; 00000010H
    ; Line 68
    	mov	eax, DWORD PTR _denominator$[esp+16]
    	cdq
    ; Line 70
    	mov	eax, DWORD PTR _numerator$[esp+16]
    	push	esi
    	mov	esi, edx
    	cdq
    	push	edi
    ; Line 71
    	mov	eax, edx
    	mov	DWORD PTR _s$1$[esp+24], edx
    	sar	eax, 31					; 0000001fH
    	mov	edi, esi
    	mov	DWORD PTR _s$2$[esp+24], eax
    ; Line 74
    	mov	ecx, esi
    	xor	ecx, DWORD PTR _denominator$[esp+20]
    	lea	eax, DWORD PTR _remainder$[esp+24]
    	push	eax
    	sar	edi, 31					; 0000001fH
    	mov	eax, edi
    	xor	eax, DWORD PTR _denominator$[esp+28]
    	sub	ecx, esi
    	mov	esi, DWORD PTR _s$2$[esp+28]
    	mov	esi, DWORD PTR _s$1$[esp+28]
    	sbb	eax, edi
    	push	eax
    	push	ecx
    	mov	ecx, edx
    	mov	eax, esi
    	xor	ecx, DWORD PTR _numerator$[esp+32]
    	xor	eax, DWORD PTR _numerator$[esp+36]
    	sub	ecx, edx
    	sbb	eax, esi
    	push	eax
    	push	ecx
    	call	___udivmoddi4
    ; Line 75
    	mov	eax, DWORD PTR _remainder$[esp+44]
    	add	esp, 20					; 00000014H
    	xor	eax, DWORD PTR _s$1$[esp+24]
    	mov	edx, DWORD PTR _remainder$[esp+28]
    	xor	edx, esi
    	sub	eax, DWORD PTR _s$1$[esp+24]
    	pop	edi
    	sbb	edx, esi
    	pop	esi
    ; Line 76
    	add	esp, 16					; 00000010H
    	ret	0
    ___moddi3 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	___muldi3
    _TEXT	SEGMENT
    _multiplicand$ = 8					; size = 8
    _multiplier$ = 16					; size = 8
    ___muldi3 PROC						; COMDAT
    ; File c:\users\stefan\desktop\case23.c
    ; Line 82
    	mov	eax, DWORD PTR _multiplicand$[esp-4]
    	mul	DWORD PTR _multiplier$[esp-4]
    	mov	ecx, DWORD PTR _multiplier$[esp]
    ; Line 84
    	imul	ecx, DWORD PTR _multiplicand$[esp-4]
    	push	esi
    	mov	esi, DWORD PTR _multiplicand$[esp+4]
    	imul	esi, DWORD PTR _multiplier$[esp]
    	add	edx, ecx
    	add	edx, esi
    ; Line 85
    	pop	esi
    ; Line 86
    	ret	0
    ___muldi3 ENDP
    _TEXT	ENDS
    END
    The code generated for the alternate implementation, which is supposed to prod and tickle the optimiser, is only marginally better: again all registers are clobbered, superfluous temporary variables which hold the same value are used, and superfluous arithmetic right shifts by 31 are emitted.
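
The sign mask idiom shared by __absdi2(), __divdi3() and __moddi3() can be checked in isolation with the following C99 sketch. Several things here are assumptions made for the example: the names sign_mask(), abs_sketch(), div_sketch() and mod_sketch() are hypothetical, the native unsigned / and % operators stand in for __udivmoddi4(), and the arithmetic is carried out on unsigned values to avoid signed overflow for the most negative argument. The conversion back to int64_t assumes the usual two's complement wrap-around.

```c
#include <stdint.h>

// All bits set if value < 0, else 0 (same result as value >> 63
// with an arithmetic right shift).
static uint64_t sign_mask(int64_t value)
{
    return 0 - (uint64_t) (value < 0);
}

// (x ^ s) - s negates x if s is all ones and leaves x unchanged if s is 0.
int64_t abs_sketch(int64_t argument)
{
    uint64_t s = sign_mask(argument);
    return (int64_t) (((uint64_t) argument ^ s) - s);
}

int64_t div_sketch(int64_t dividend, int64_t divisor)
{
    uint64_t r = sign_mask(divisor);
    uint64_t s = sign_mask(dividend);
    uint64_t d = ((uint64_t) divisor ^ r) - r;    // |divisor|
    uint64_t n = ((uint64_t) dividend ^ s) - s;   // |dividend|
    s ^= r;                                       // sign of quotient
    return (int64_t) (((n / d) ^ s) - s);         // negate if quotient < 0
}

int64_t mod_sketch(int64_t dividend, int64_t divisor)
{
    uint64_t r = sign_mask(divisor);
    uint64_t s = sign_mask(dividend);
    uint64_t d = ((uint64_t) divisor ^ r) - r;    // |divisor|
    uint64_t n = ((uint64_t) dividend ^ s) - s;   // |dividend|
    return (int64_t) (((n % d) ^ s) - s);         // remainder has sign of dividend
}
```

Each sign mask is needed only once per operand, which is why the repeated arithmetic right shifts by 31 in the listings above are redundant.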

Case 24

Demonstration

  1. Create the text file case24.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    int variant0(long long x, long long y)
    {
        return (x ^ y) < 0;
    }
    
    int variant1(long long x, long long y)
    {
        return (x < 0) != (y < 0);
    }
    
    int variant2(long long x, long long y)
    {
        return (x >> 63) != (y >> 63);
    }
    
    int variant3(long long x, long long y)
    {
        return ((long) (x >> 32) < 0) != ((long) (y >> 32) < 0);
    }
    
    int variant4(long long x, long long y)
    {
        return ((long) (x >> 32) ^ (long) (y >> 32)) < 0;
    }
    
    int variant5(long long x, long long y)
    {
        return (long) ((x ^ y) >> 32) < 0;
    }
  2. Generate the assembly listing file case24.asm from the source file case24.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase24.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case24.c
  3. Display the assembly listing file case24.asm created in step 2.:

    TYPE case24.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case24.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_variant0
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_variant0
    _TEXT	SEGMENT
    tv128 = 8						; size = 8
    _x$ = 8							; size = 8
    _y$ = 16						; size = 8
    _variant0 PROC						; COMDAT
    ; File c:\users\stefan\desktop\case24.c
    ; Line 5
    	xor	eax, eax
    	mov	eax, DWORD PTR _x$[esp-4]
    	mov	ecx, DWORD PTR _x$[esp]
    	xor	eax, DWORD PTR _y$[esp-4]
    	xor	ecx, DWORD PTR _y$[esp]
    	mov	DWORD PTR tv128[esp], ecx
    	jg	SHORT $LN3@variant0
    	jl	SHORT $LN5@variant0
    	test	eax, eax
    	jae	SHORT $LN3@variant0
    $LN5@variant0:
    	mov	eax, 1
    	sets	al
    ; Line 6
    	ret	0
    $LN3@variant0:
    ; Line 5
    	xor	eax, eax
    ; Line 6
    	ret	0
    _variant0 ENDP
    _TEXT	ENDS
    PUBLIC	_variant1
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_variant1
    _TEXT	SEGMENT
    _x$ = 8							; size = 8
    _y$ = 16						; size = 8
    _variant1 PROC						; COMDAT
    ; Line 10
    	cmp	DWORD PTR _x$[esp], 0
    	jg	SHORT $LN5@variant1
    	jl	SHORT $LN7@variant1
    	cmp	DWORD PTR _x$[esp-4], 0
    	jae	SHORT $LN5@variant1
    $LN7@variant1:
    	mov	ecx, 1
    	jmp	SHORT $LN6@variant1
    $LN5@variant1:
    	xor	ecx, ecx
    $LN6@variant1:
    	sets	ah
    	cmp	DWORD PTR _y$[esp], 0
    	jg	SHORT $LN3@variant1
    	jl	SHORT $LN8@variant1
    	cmp	DWORD PTR _y$[esp-4], 0
    	jae	SHORT $LN3@variant1
    $LN8@variant1:
    	mov	eax, 1
    	xor	edx, edx
    	cmp	ecx, eax
    	setne	dl
    	mov	eax, edx
    	setl	al
    	cmp	al, ah
    	setne	al
    	movzx	eax, al
    ; Line 11
    	ret	0
    $LN3@variant1:
    ; Line 10
    	xor	eax, eax
    	xor	edx, edx
    	cmp	ecx, eax
    	setne	dl
    	mov	eax, edx
    ; Line 11
    	ret	0
    _variant1 ENDP
    _TEXT	ENDS
    PUBLIC	_variant2
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_variant2
    _TEXT	SEGMENT
    _x$ = 8							; size = 8
    _y$ = 16						; size = 8
    _variant2 PROC						; COMDAT
    ; Line 15
    	mov	eax, DWORD PTR _x$[esp]
    	xor	eax, DWORD PTR _y$[esp]
    	xor	ecx, ecx
    	and	eax, -2147483648			; 80000000H
    	or	ecx, eax
    	je	SHORT $LN3@variant2
    	mov	eax, 1
    ; Line 16
    	ret	0
    $LN3@variant2:
    ; Line 15
    	xor	eax, eax
    	sets	al
    	movzx	eax, al
    ; Line 16
    	ret	0
    _variant2 ENDP
    _TEXT	ENDS
    PUBLIC	_variant3
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_variant3
    _TEXT	SEGMENT
    _x$ = 8							; size = 8
    _y$ = 16						; size = 8
    _variant3 PROC						; COMDAT
    ; Line 20
    	mov	eax, DWORD PTR _x$[esp]
    	mov	ecx, eax
    	sar	ecx, 31					; 0000001fH
    	xor	edx, edx
    	test	eax, eax
    	mov	eax, DWORD PTR _y$[esp]
    	sets	dl
    	mov	ecx, eax
    	sar	ecx, 31					; 0000001fH
    	xor	ecx, ecx
    	test	eax, eax
    	sets	cl
    	xor	eax, eax
    	cmp	edx, ecx
    	cmp	dl, cl
    	setne	al
    ; Line 21
    	ret	0
    _variant3 ENDP
    _TEXT	ENDS
    PUBLIC	_variant4
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_variant4
    _TEXT	SEGMENT
    _x$ = 8							; size = 8
    _y$ = 16						; size = 8
    _variant4 PROC						; COMDAT
    ; Line 25
    	mov	eax, DWORD PTR _x$[esp]
    	mov	ecx, eax
    	sar	ecx, 31					; 0000001fH
    	xor	eax, DWORD PTR _y$[esp]
    	mov	ecx, DWORD PTR _y$[esp]
    	mov	edx, ecx
    	sar	edx, 31					; 0000001fH
    	xor	eax, ecx
    	mov	eax, 0
    	setl	al
    ; Line 26
    	ret	0
    _variant4 ENDP
    _TEXT	ENDS
    PUBLIC	_variant5
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_variant5
    _TEXT	SEGMENT
    _x$ = 8							; size = 8
    _y$ = 16						; size = 8
    _variant5 PROC						; COMDAT
    ; Line 30
    	mov	eax, DWORD PTR _x$[esp]
    	mov	ecx, eax
    	sar	ecx, 31					; 0000001fH
    	xor	ecx, DWORD PTR _y$[esp]
    	mov	ecx, DWORD PTR _y$[esp]
    	mov	edx, ecx
    	sar	edx, 31					; 0000001fH
    	xor	eax, ecx
    	mov	eax, 0
    	setl	al
    ; Line 31
    	ret	0
    _variant5 ENDP
    _TEXT	ENDS
    END
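
    All six variants in step 1. are meant to answer the same question: do x and y differ in sign? On a 32-bit target this depends only on bit 31 of the two upper halves. The following is a minimal sketch for checking that equivalence; it is not part of the original test case. The names signs_differ(), signs_differ_high() and count_sign_mismatches() are hypothetical, and the sketch assumes that >> on a negative value is an arithmetic shift, which is implementation-defined in C.

```c
#include <stdint.h>

/* Reference predicate: 1 if exactly one of x and y is negative. */
int signs_differ(int64_t x, int64_t y)
{
    return (x < 0) != (y < 0);
}

/* Same predicate computed from the upper 32-bit halves only.
   Assumption: >> on a negative value is an arithmetic shift. */
int signs_differ_high(int64_t x, int64_t y)
{
    int32_t xh = (int32_t) (x >> 32);
    int32_t yh = (int32_t) (y >> 32);

    return (xh ^ yh) < 0;
}

/* Hypothetical self-check: counts the sample pairs on which the
   formulations above, or the expression (x ^ y) < 0, disagree. */
int count_sign_mismatches(void)
{
    static const int64_t samples[] = {
        INT64_MIN, -4294967297LL, -4294967296LL, -1, 0,
        1, 4294967295LL, 4294967296LL, INT64_MAX
    };
    const int n = (int) (sizeof(samples) / sizeof(samples[0]));
    int mismatches = 0;

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            int64_t x = samples[i], y = samples[j];
            int expected = signs_differ(x, y);

            mismatches += (signs_differ_high(x, y) != expected);
            mismatches += (((x ^ y) < 0) != expected);
        }
    return mismatches;
}
```

    If the assumption holds, count_sign_mismatches() should return 0. Since every formulation describes the same predicate, an optimiser could in principle emit the same few instructions for each of them.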

Case 25

Demonstration

  1. Create the text file case25.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    long long llsgn0(long long value)
    {
        return value < 0 ? -1 : 0;
    }
    
    int llsgn1(long long value)
    {
        return value < 0;
    }
    
    int llsgn2(long long value)
    {
        return value >> 63;
    }
    
    int llsgn3(long long value)
    {
        return (value >> 63) != 0;
    }
    
    int llsgn4(long long value)
    {
        return (value & (1LL << 63)) != 0;
    }
  2. Generate the assembly listing file case25.asm from the source file case25.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase25.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case25.c
  3. Display the assembly listing file case25.asm created in step 2.:

    TYPE case25.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case25.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_llsgn0
    PUBLIC	_llsgn1
    PUBLIC	_llsgn2
    PUBLIC	_llsgn3
    PUBLIC	_llsgn4
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsgn0
    _TEXT	SEGMENT
    $T1 = 8							; size = 8
    _value$ = 8						; size = 8
    _llsgn0	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case25.c
    ; Line 5
    	cmp	DWORD PTR _value$[esp], 0
    	jg	SHORT $LN3@llsgn0
    	jl	SHORT $LN5@llsgn0
    	cmp	DWORD PTR _value$[esp-4], 0
    	jae	SHORT $LN3@llsgn0
    $LN5@llsgn0:
    	or	eax, -1
    	or	edx, eax
    ; Line 6
    	ret	0
    $LN3@llsgn0:
    	xorps	xmm0, xmm0
    ; Line 5
    	movlpd	QWORD PTR $T1[esp-4], xmm0
    	mov	eax, DWORD PTR $T1[esp-4]
    	mov	edx, DWORD PTR $T1[esp]
    	mov	eax, DWORD PTR _value$[esp]
    	cdq
    	mov	eax, edx
    ; Line 6
    	ret	0
    _llsgn0	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsgn1
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _llsgn1	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case25.c
    ; Line 10
    	cmp	DWORD PTR _value$[esp], 0
    	jg	SHORT $LN3@llsgn1
    	jl	SHORT $LN5@llsgn1
    	cmp	DWORD PTR _value$[esp-4], 0
    	jae	SHORT $LN3@llsgn1
    $LN5@llsgn1:
    	mov	eax, 1
    ; Line 11
    	ret	0
    $LN3@llsgn1:
    ; Line 10
    	xor	eax, eax
    	mov	eax, DWORD PTR _value$[esp]
    	shr	eax, 31					; 0000001fH
    ; Line 11
    	ret	0
    _llsgn1	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsgn2
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _llsgn2	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case25.c
    ; Line 15
    	mov	ecx, DWORD PTR _value$[esp]
    	mov	eax, ecx
    	sar	eax, 31					; 0000001fH
    	sar	ecx, 31					; 0000001fH
    	mov	eax, DWORD PTR _value$[esp]
    	sar	eax, 31					; 0000001fH
    ; Line 16
    	ret	0
    _llsgn2	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsgn3
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _llsgn3	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case25.c
    ; Line 20
    	mov	ecx, DWORD PTR _value$[esp]
    	xor	eax, eax
    	and	ecx, -2147483648			; 80000000H
    	or	eax, ecx
    	je	SHORT $LN3@llsgn3
    	mov	eax, 1
    ; Line 21
    	ret	0
    $LN3@llsgn3:
    ; Line 20
    	xor	eax, eax
    	mov	eax, DWORD PTR _value$[esp]
    	shr	eax, 31					; 0000001fH
    ; Line 21
    	ret	0
    _llsgn3	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsgn4
    _TEXT	SEGMENT
    _value$ = 8						; size = 8
    _llsgn4	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case25.c
    ; Line 25
    	cmp	DWORD PTR _value$[esp], 0
    	jg	SHORT $LN3@llsgn4
    	jl	SHORT $LN5@llsgn4
    	cmp	DWORD PTR _value$[esp-4], 0
    	jae	SHORT $LN3@llsgn4
    $LN5@llsgn4:
    	mov	eax, 1
    ; Line 26
    	ret	0
    $LN3@llsgn4:
    ; Line 25
    	xor	eax, eax
    	mov	eax, DWORD PTR _value$[esp]
    	shr	eax, 31					; 0000001fH
    ; Line 26
    	ret	0
    _llsgn4	ENDP
    _TEXT	ENDS
    END
    The optimiser fails to recognise any of these commonly used expressions for determining the sign of an integer value!

    Especially notice the completely in(s)ane use of the SSE register XMM0 and the temporary variable $T1 instead of just two XOR instructions to zero the registers EAX and EDX in the function llsgn0(), the two SAR instructions in the function llsgn2(), and the completely insane code generated for the function llsgn3()!
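
    The sign of a 64-bit integer is just bit 31 of its upper 32-bit half, so none of the expressions above needs a comparison of both halves. The following is a minimal sketch of that observation in portable C, not the author's code; the names sign_bit(), sign_mask() and count_llsgn_mismatches() are hypothetical. It works on unsigned values throughout, which also sidesteps the formally undefined 1LL << 63.

```c
#include <stdint.h>

/* 1 if value is negative, else 0: bit 31 of the upper 32-bit half. */
int sign_bit(int64_t value)
{
    uint32_t high = (uint32_t) ((uint64_t) value >> 32);

    return (int) (high >> 31);
}

/* -1 if value is negative, else 0: the sign bit replicated into all bits. */
int64_t sign_mask(int64_t value)
{
    return -(int64_t) sign_bit(value);
}

/* Hypothetical self-check against the expressions used in case25.c. */
int count_llsgn_mismatches(void)
{
    static const int64_t samples[] = {
        INT64_MIN, -4294967296LL, -1, 0, 1, 4294967296LL, INT64_MAX
    };
    const int n = (int) (sizeof(samples) / sizeof(samples[0]));
    int mismatches = 0;

    for (int i = 0; i < n; i++) {
        int64_t v = samples[i];

        mismatches += (sign_bit(v) != (v < 0));
        mismatches += (sign_bit(v) != (((uint64_t) v & (UINT64_C(1) << 63)) != 0));
        mismatches += (sign_mask(v) != (v < 0 ? -1 : 0));
    }
    return mismatches;
}
```

    sign_bit() corresponds to the result of llsgn1(), llsgn3() and llsgn4(), while sign_mask() corresponds to llsgn0(); count_llsgn_mismatches() should return 0.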

Case 26

Demonstration

  1. Create the text file case26.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    long lsign(long x)
    {
        return (x > 0) - (x < 0);
    }
    
    long long llsign(long long x)
    {
        return (x > 0) - (x < 0);
    }
  2. Generate the assembly listing file case26.asm from the source file case26.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase26.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case26.c
  3. Display the assembly listing file case26.asm created in step 2.:

    TYPE case26.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case26.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_lsign
    PUBLIC	_llsign
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_lsign
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _lsign	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case26.c
    ; Line 5
    	mov	ecx, DWORD PTR _x$[esp-4]
    	xor	eax, eax
    	test	ecx, ecx
    	setg	al
    	shr	ecx, 31					; 0000001fH
    	sub	eax, ecx
    	mov	eax, DWORD PTR _x$[esp-4]
    	cdq
    	neg	eax
    	adc	edx, edx
    	mov	eax, edx
    ; Line 6
    	ret	0
    _lsign	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsign
    _TEXT	SEGMENT
    $T1 = 8							; size = 8
    $T2 = 8							; size = 8
    _x$ = 8							; size = 8
    _llsign	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case26.c
    ; Line 10
    	xor	edx, edx
    	mov	eax, DWORD PTR _x$[esp]
    	cmp	edx, DWORD PTR _x$[esp-4]
    	sbb	edx, eax
    	cdq
    	setl	al
    	movzx	eax, al
    	add	eax, edx
    	mov	ecx, DWORD PTR _x$[esp]
    	mov	eax, DWORD PTR _x$[esp-4]
    	push	esi
    	test	ecx, ecx
    	jl	SHORT $LN5@llsign
    	jg	SHORT $LN7@llsign
    	test	eax, eax
    	je	SHORT $LN5@llsign
    $LN7@llsign:
    	mov	edx, 1
    	xor	esi, esi
    	jmp	SHORT $LN6@llsign
    $LN5@llsign:
    	xorps	xmm0, xmm0
    	movlpd	QWORD PTR $T2[esp], xmm0
    	mov	esi, DWORD PTR $T2[esp+4]
    	mov	edx, DWORD PTR $T2[esp]
    $LN6@llsign:
    	test	ecx, ecx
    	jg	SHORT $LN3@llsign
    	jl	SHORT $LN8@llsign
    	test	eax, eax
    	jae	SHORT $LN3@llsign
    $LN8@llsign:
    	mov	eax, 1
    	xor	ecx, ecx
    	sub	edx, eax
    	mov	eax, edx
    	sbb	esi, ecx
    	mov	edx, esi
    	pop	esi
    ; Line 11
    	ret	0
    $LN3@llsign:
    	xorps	xmm0, xmm0
    ; Line 10
    	movlpd	QWORD PTR $T1[esp], xmm0
    	mov	eax, DWORD PTR $T1[esp]
    	sub	edx, eax
    	mov	ecx, DWORD PTR $T1[esp+4]
    	mov	eax, edx
    	sbb	esi, ecx
    	mov	edx, esi
    	pop	esi
    ; Line 11
    	ret	0
    _llsign	ENDP
    _TEXT	ENDS
    END
    The code generated for the function llsign() is as bad as it gets: 38 (in words: thirty-eight) instructions, including performance-degrading conditional branches, instead of just 9 (in words: nine) instructions!
    Especially notice the braindead use of SSE instructions to zero the superfluous temporary variables $T1 and $T2, which are then loaded into the registers EAX, ECX, EDX and ESI, instead of XOR instructions that zero these registers directly!
  4. Generate another assembly listing file case26.asm from the source file case26.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, now with the switch /arch:SSE defined on the command line to disable the generation of SSE2 instructions:

    CL.EXE /arch:SSE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase26.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case26.c
  5. Display the assembly listing file case26.asm created in step 4.:

    TYPE case26.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case26.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_lsign
    PUBLIC	_llsign
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_lsign
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _lsign	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case26.c
    ; Line 5
    	mov	ecx, DWORD PTR _x$[esp-4]
    	xor	eax, eax
    	test	ecx, ecx
    	setg	al
    	shr	ecx, 31					; 0000001fH
    	sub	eax, ecx
    	mov	eax, DWORD PTR _x$[esp-4]
    	cdq
    	neg	eax
    	adc	edx, edx
    	mov	eax, edx
    ; Line 6
    	ret	0
    _lsign	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_llsign
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _llsign	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case26.c
    ; Line 10
    	xor	edx, edx
    	mov	eax, DWORD PTR _x$[esp]
    	cmp	edx, DWORD PTR _x$[esp-4]
    	sbb	edx, eax
    	cdq
    	setl	al
    	movzx	eax, al
    	add	eax, edx
    	mov	ecx, DWORD PTR _x$[esp]
    	mov	eax, DWORD PTR _x$[esp-4]
    	push	esi
    	test	ecx, ecx
    	jl	SHORT $LN5@llsign
    	jg	SHORT $LN7@llsign
    	test	eax, eax
    	je	SHORT $LN5@llsign
    $LN7@llsign:
    	mov	edx, 1
    	jmp	SHORT $LN9@llsign
    $LN5@llsign:
    	xor	edx, edx
    $LN9@llsign:
    	xor	esi, esi
    	test	ecx, ecx
    	jg	SHORT $LN3@llsign
    	jl	SHORT $LN8@llsign
    	test	eax, eax
    	jae	SHORT $LN3@llsign
    $LN8@llsign:
    	mov	eax, 1
    	sub	edx, eax
    	mov	eax, edx
    	sbb	esi, 0
    	mov	edx, esi
    	pop	esi
    ; Line 11
    	ret	0
    $LN3@llsign:
    ; Line 10
    	xor	eax, eax
    	sub	edx, eax
    	sbb	esi, eax
    	mov	eax, edx
    	mov	edx, esi
    	pop	esi
    ; Line 11
    	ret	0
    _llsign	ENDP
    _TEXT	ENDS
    END
    The code generated for the function llsign() is now slightly less bad: 31 (in words: thirty-one) instructions, including performance-degrading conditional branches, instead of just 9 (in words: nine) instructions!
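
    That the expression (x > 0) - (x < 0) needs no conditional branch at all can be seen from a formulation which uses only shifts, a negation and a bitwise OR. The following is a minimal sketch under that assumption; it is neither the author's replacement code nor a claim about the exact instruction count, and the names llsign_branchfree() and count_llsign_mismatches() are hypothetical.

```c
#include <stdint.h>

/* Branch-free signum: returns -1, 0 or 1. */
int64_t llsign_branchfree(int64_t x)
{
    uint64_t ux = (uint64_t) x;
    /* -1 if x < 0, else 0 */
    int64_t negative = -(int64_t) (ux >> 63);
    /* 1 if x > 0, 0 if x == 0; for negative x the value does not
       matter, because it is swallowed by the OR with -1 below */
    int64_t positive = (int64_t) (((uint64_t) 0 - ux) >> 63);

    return negative | positive;
}

/* Hypothetical self-check against the expression used in case26.c. */
int count_llsign_mismatches(void)
{
    static const int64_t samples[] = {
        INT64_MIN, -4294967296LL, -1, 0, 1, 4294967296LL, INT64_MAX
    };
    const int n = (int) (sizeof(samples) / sizeof(samples[0]));
    int mismatches = 0;

    for (int i = 0; i < n; i++) {
        int64_t v = samples[i];

        mismatches += (llsign_branchfree(v) != ((v > 0) - (v < 0)));
    }
    return mismatches;
}
```

    The negation is performed on the unsigned value, so INT64_MIN does not overflow; count_llsign_mismatches() should return 0.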

Case 27

Demonstration

  1. Create the text file case27.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    int ulcmp(unsigned left, unsigned right)
    {
        return (left > right) - (left < right);
    }
    
    int ullcmp(unsigned long long left, unsigned long long right)
    {
        return (left > right) - (left < right);
    }
  2. Generate the assembly listing file case27.asm from the source file case27.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase27.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case27.c
  3. Display the assembly listing file case27.asm created in step 2.:

    TYPE case27.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case27.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_ulcmp
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_ulcmp
    _TEXT	SEGMENT
    _left$ = 8						; size = 4
    _right$ = 12						; size = 4
    _ulcmp	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case27.c
    ; Line 4
    	mov	ecx, DWORD PTR _right$[esp-4]
    	mov	edx, DWORD PTR _left$[esp-4]
    	cmp	ecx, edx
    	sbb	eax, eax
    	neg	eax
    	cmp	edx, ecx
    	sbb	eax, eax
    	cmp	ecx, edx
    	adc	eax, 0
    	sbb	ecx, ecx
    	neg	ecx
    	sub	eax, ecx
    ; Line 5
    	ret	0
    _ulcmp	ENDP
    _TEXT	ENDS
    PUBLIC	_ullcmp
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_ullcmp
    _TEXT	SEGMENT
    _left$ = 8						; size = 8
    _right$ = 16						; size = 8
    _ullcmp	PROC						; COMDAT
    ; Line 9
    	mov	ecx, DWORD PTR _left$[esp]
    	mov	edx, DWORD PTR _right$[esp]
    	push	esi
    	mov	esi, DWORD PTR _right$[esp]
    	push	edi
    	mov	edi, DWORD PTR _left$[esp+4]
    	cmp	ecx, edx
    	mov	eax, DWORD PTR _left$[esp+4]
    	sbb	eax, DWORD PTR _right$[esp+4]
    	sbb	eax, eax
    	cmp	edx, ecx
    	mov	edx, DWORD PTR _right$[esp+4]
    	sbb	edx, DWORD PTR _left$[esp+4]
    	adc	eax, 0
    	jb	SHORT $LN5@ullcmp
    	ja	SHORT $LN7@ullcmp
    	cmp	edi, esi
    	jbe	SHORT $LN5@ullcmp
    $LN7@ullcmp:
    	mov	eax, 1
    	jmp	SHORT $LN6@ullcmp
    $LN5@ullcmp:
    	xor	eax, eax
    $LN6@ullcmp:
    	cmp	ecx, edx
    	ja	SHORT $LN3@ullcmp
    	jb	SHORT $LN8@ullcmp
    	cmp	edi, esi
    	jae	SHORT $LN3@ullcmp
    $LN8@ullcmp:
    	mov	ecx, 1
    	pop	edi
    	sub	eax, ecx
    	pop	esi
    ; Line 10
    	ret	0
    $LN3@ullcmp:
    ; Line 9
    	xor	ecx, ecx
    	pop	edi
    	sub	eax, ecx
    	pop	esi
    ; Line 10
    	ret	0
    _ullcmp	ENDP
    _TEXT	ENDS
    END
    While the code generated for the function ulcmp() shows only 3 superfluous instructions, the code generated for the function ullcmp() is as bad as it gets: 29 (in words: twenty-nine) instructions, including performance-degrading conditional branches and needless clobbering of the registers EDI and ESI, instead of just 11 (in words: eleven) instructions!
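
    On a 32-bit target, a 64-bit unsigned comparison decomposes into comparisons of the two 32-bit halves: the upper halves decide, and the lower halves matter only when the upper halves are equal. The following is a minimal sketch of this decomposition without conditional branches at the source level; it is not the author's replacement code, and the names ullcmp_halves() and count_ullcmp_mismatches() are hypothetical.

```c
#include <stdint.h>

/* Three-way comparison of two 64-bit unsigned integers by 32-bit halves:
   returns -1 if left < right, 0 if equal, 1 if left > right. */
int ullcmp_halves(uint64_t left, uint64_t right)
{
    uint32_t llo = (uint32_t) left,  lhi = (uint32_t) (left >> 32);
    uint32_t rlo = (uint32_t) right, rhi = (uint32_t) (right >> 32);
    /* left < right: the borrow out of the 64-bit subtraction left - right */
    int below = (lhi < rhi) | ((lhi == rhi) & (llo < rlo));
    /* left > right: the borrow out of the 64-bit subtraction right - left */
    int above = (lhi > rhi) | ((lhi == rhi) & (llo > rlo));

    return above - below;
}

/* Hypothetical self-check against the expression used in case27.c. */
int count_ullcmp_mismatches(void)
{
    static const uint64_t samples[] = {
        0, 1, UINT64_C(0xFFFFFFFF), UINT64_C(0x100000000),
        UINT64_C(0x100000001), UINT64_C(0xFFFFFFFF00000000), UINT64_MAX
    };
    const int n = (int) (sizeof(samples) / sizeof(samples[0]));
    int mismatches = 0;

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            uint64_t l = samples[i], r = samples[j];

            mismatches += (ullcmp_halves(l, r) != ((l > r) - (l < r)));
        }
    return mismatches;
}
```

    The two borrows are exactly what a CMP followed by an SBB on the halves leaves in the carry flag, which is why no conditional branch is required; count_ullcmp_mismatches() should return 0.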

Case 28

Demonstration

  1. Create the text file case28.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    int absolute(int x)
    {
    #ifdef ALTERNATE
        long long z = x;
        return x - ((x + x) & (z >> 32));
    #else
        return x - ((x + x) & (x >> 31));
    #endif
    }
    
    int maximum(int x, int y)
    {
    #ifdef ALTERNATE
        long long z = (y = x - y);
        x -= y & (z >> 32);
    #else
        y = -y;
        y += x;
        x -= y & (y >> 31);
    #endif
        return x;
    }
    
    int minimum(int x, int y)
    {
    #ifdef ALTERNATE
        long long z = (y -= x);
        x += y & (z >> 32);
    #else
        y -= x;
        x += y & (y >> 31);
    #endif
        return x;
    }
    
    int sign(int x)
    {
    #ifdef ALTERNATE
        long long z = x;
        return z >> 32;
    #else
        return x < 0 ? -1 : 0;
    #endif
    }
  2. Generate the assembly listing file case28.asm from the source file case28.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase28.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case28.c
  3. Display the assembly listing file case28.asm created in step 2.:

    TYPE case28.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case28.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_absolute
    PUBLIC	_maximum
    PUBLIC	_minimum
    PUBLIC	_sign
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_absolute
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _absolute PROC						; COMDAT
    ; File c:\users\stefan\desktop\case28.c
    ; Line 9
    	mov	eax, DWORD PTR _x$[esp-4]
    	mov	edx, eax
    	sar	edx, 31					; 0000001fH
    	lea	ecx, DWORD PTR [eax+eax]
    	and	edx, ecx
    	sub	eax, edx
    ; Line 11
    	ret	0
    _absolute ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_maximum
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _y$ = 12						; size = 4
    _maximum PROC						; COMDAT
    ; File c:\users\stefan\desktop\case28.c
    ; Line 20
    	mov	eax, DWORD PTR _x$[esp-4]
    	mov	edx, eax
    	sub	edx, DWORD PTR _y$[esp-4]
    ; Line 21
    	mov	ecx, edx
    	sar	ecx, 31					; 0000001fH
    	and	ecx, edx
    	sub	eax, ecx
    ; Line 24
    	ret	0
    _maximum ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_minimum
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _y$ = 12						; size = 4
    _minimum PROC						; COMDAT
    ; File c:\users\stefan\desktop\case28.c
    ; Line 32
    	mov	ecx, DWORD PTR _y$[esp-4]
    	sub	ecx, DWORD PTR _x$[esp-4]
    ; Line 33
    	mov	eax, ecx
    	sar	eax, 31					; 0000001fH
    	and	eax, ecx
    	add	eax, DWORD PTR _x$[esp-4]
    ; Line 36
    	ret	0
    _minimum ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_sign
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _sign	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case28.c
    ; Line 44
    	mov	eax, DWORD PTR _x$[esp-4]
    	sar	eax, 31					; 0000001fH
    ; Line 46
    	ret	0
    _sign	ENDP
    _TEXT	ENDS
    END
    While the generated code is correct, the optimiser fails to recognise these well-known but superfluous old-school optimisation idioms and to emit code better suited for current processors, i.e. those less than 23 years old, as shown in step 6. and following below.
  4. Repeat the previous steps with the alternate implementation; generate another assembly listing file case28.asm from the source file case28.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tccase28.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case28.c
  5. Display the assembly listing file case28.asm created in step 4.:

    TYPE case28.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case28.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_absolute
    PUBLIC	_maximum
    PUBLIC	_minimum
    PUBLIC	_sign
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_absolute
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _absolute PROC						; COMDAT
    ; File c:\users\stefan\desktop\case28.c
    ; Line 4
    	push	esi
    ; Line 6
    	mov	eax, DWORD PTR _x$[esp]
    	mov	esi, DWORD PTR _x$[esp]
    	mov	eax, esi
    	cdq
    ; Line 7
    	mov	ecx, edx
    	sar	ecx, 31					; 0000001fH
    	lea	ecx, DWORD PTR [esi+esi]
    	lea	ecx, DWORD PTR [eax+eax]
    	and	edx, ecx
    	sub	eax, edx
    	sub	esi, edx
    	mov	eax, esi
    	pop	esi
    ; Line 11
    	ret	0
    _absolute ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_maximum
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _y$ = 12						; size = 4
    _maximum PROC						; COMDAT
    ; File c:\users\stefan\desktop\case28.c
    ; Line 14
    	push	esi
    	push	edi
    ; Line 16
    	mov	edi, DWORD PTR _x$[esp+4]
    	mov	esi, edi
    	sub	esi, DWORD PTR _y$[esp+4]
    	mov	eax, esi
    	mov	ecx, DWORD PTR _x$[esp+4]
    	mov	eax, ecx
    	sub	eax, DWORD PTR _y$[esp+4]
    	cdq
    ; Line 17
    	mov	ecx, edx
    	and	edx, esi
    	and	edx, eax
    	sub	ecx, edx
    	sub	edi, edx
    	sar	ecx, 31					; 0000001fH
    ; Line 23
    	mov	eax, ecx
    	mov	eax, edi
    	pop	edi
    	pop	esi
    ; Line 24
    	ret	0
    _maximum ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_minimum
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _y$ = 12						; size = 4
    _minimum PROC						; COMDAT
    ; File c:\users\stefan\desktop\case28.c
    ; Line 27
    	push	esi
    ; Line 29
    	mov	esi, DWORD PTR _y$[esp]
    	sub	esi, DWORD PTR _x$[esp]
    	mov	eax, esi
    	mov	eax, DWORD PTR _y$[esp]
    	sub	eax, DWORD PTR _x$[esp]
    	cdq
    ; Line 30
    	mov	ecx, edx
    	and	edx, esi
    	and	edx, eax
    	add	edx, DWORD PTR _x$[esp]
    	sar	ecx, 31					; 0000001fH
    ; Line 35
    	mov	eax, edx
    	pop	esi
    ; Line 36
    	ret	0
    _minimum ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_sign
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _sign	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case28.c
    ; Line 41
    	mov	eax, DWORD PTR _x$[esp-4]
    	cdq
    ; Line 42
    	mov	eax, edx
    	sar	eax, 31					; 0000001fH
    	mov	eax, edx
    ; Line 46
    	ret	0
    _sign	ENDP
    _TEXT	ENDS
    END
    While the generated code is correct, it uses the registers EDI and ESI without necessity, and the majority of instructions are superfluous.
    Especially notice the arithmetic right shifts: their results are never used!
  6. Overwrite the text file case28.c created in step 1. with the following content:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    int absolute(int x)
    {
        return x < 0 ? -x : x;
    }
    
    int maximum(int x, int y)
    {
        return x > y ? x : y;
    }
    
    int minimum(int x, int y)
    {
        return x < y ? x : y;
    }
    
    int sign(int x)
    {
        return x < 0 ? -1 : 0;
    }
  7. Generate the assembly listing file case28.asm from the source file case28.c created in step 6., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase28.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case28.c
  8. Display the assembly listing file case28.asm created in step 7.:

    TYPE case28.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case28.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_absolute
    PUBLIC	_maximum
    PUBLIC	_minimum
    PUBLIC	_sign
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_absolute
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _absolute PROC						; COMDAT
    ; File c:\users\stefan\desktop\case28.c
    ; Line 5
    	mov	eax, DWORD PTR _x$[esp-4]
    	cdq
    	xor	eax, edx
    	sub	eax, edx
    ; Line 6
    	ret	0
    _absolute ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_maximum
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _y$ = 12						; size = 4
    _maximum PROC						; COMDAT
    ; File c:\users\stefan\desktop\case28.c
    ; Line 10
    	mov	eax, DWORD PTR _y$[esp-4]
    	cmp	DWORD PTR _x$[esp-4], eax
    	cmovg	eax, DWORD PTR _x$[esp-4]
    ; Line 11
    	ret	0
    _maximum ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_minimum
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _y$ = 12						; size = 4
    _minimum PROC						; COMDAT
    ; File c:\users\stefan\desktop\case28.c
    ; Line 15
    	mov	eax, DWORD PTR _y$[esp-4]
    	cmp	DWORD PTR _x$[esp-4], eax
    	cmovl	eax, DWORD PTR _x$[esp-4]
    ; Line 16
    	ret	0
    _minimum ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_sign
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _sign	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case28.c
    ; Line 20
    	mov	eax, DWORD PTR _x$[esp-4]
    	sar	eax, 31					; 0000001fH
    ; Line 21
    	ret	0
    _sign	ENDP
    _TEXT	ENDS
    END
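
The instruction sequences emitted for absolute() and sign() in the last listing (CDQ, XOR, SUB and SAR by 31) can be written back in C roughly as follows. This is only a sketch: the names sign_mask() and absolute_branchfree() are made up for illustration, and the code assumes two's complement integers and an arithmetic right shift of negative values, which is implementation-defined in C but is what SAR and CDQ perform on x86.

```c
#include <limits.h>

/* Sign mask: 0 for x >= 0, -1 (all bits set) for x < 0.
   ASSUMPTION: >> on a negative int is an arithmetic shift. */
int sign_mask(int x)
{
    return x >> (int) (sizeof(int) * CHAR_BIT - 1);
}

/* Branch-free absolute value: (x ^ m) - m with m = sign mask.
   The arithmetic is done on unsigned values, so INT_MIN wraps
   around instead of triggering signed overflow. */
int absolute_branchfree(int x)
{
    unsigned m = (unsigned) sign_mask(x);

    return (int) (((unsigned) x ^ m) - m);
}
```

For x >= 0 the mask is 0 and the value passes through unchanged; for x < 0 the mask is -1, so the XOR inverts all bits and the subtraction adds 1, which is the two's complement negation.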

Case 29

Demonstration

  1. Create the text file case29.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    unsigned long ul(unsigned long x)
    {
    #ifdef ALTERNATE
        while (~x & 1)
    #else
        while (!(x & 1))
    #endif
            x >>= 1;
    
        return x;
    }
    
    unsigned long long ull(unsigned long long x)
    {
    #ifdef ALTERNATE
        while (~x & 1)
    #else
        while (!(x & 1))
    #endif
            x >>= 1;
    
        return x;
    }
  2. Generate the assembly listing file case29.asm from the source file case29.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase29.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case29.c
  3. Display the assembly listing file case29.asm created in step 2.:

    TYPE case29.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case29.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_ul
    PUBLIC	_ull
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_ul
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _ul	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case29.c
    ; Line 8
    	mov	eax, DWORD PTR _x$[esp-4]
    	test	al, 1
    	jne	SHORT $LN3@ul
    $LL2@ul:
    ; Line 10
    	shr	eax, 1
    	test	al, 1
    	je	SHORT $LL2@ul
    $LN3@ul:
    ; Line 13
    	ret	0
    _ul	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_ull
    _TEXT	SEGMENT
    _x$ = 8							; size = 8
    _ull	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case29.c
    ; Line 20
    	mov	eax, DWORD PTR _x$[esp-4]
    	mov	ecx, eax
    	mov	edx, DWORD PTR _x$[esp]
    	and	ecx, 1
    	or	ecx, 0
    	test	al, 1
    	jne	SHORT $LN3@ull
    $LL2@ull:
    ; Line 22
    	shrd	eax, edx, 1
    	mov	ecx, eax
    	shr	edx, 1
    	and	ecx, 1
    	or	ecx, 0
    	test	al, 1
    	je	SHORT $LL2@ull
    $LN3@ull:
    ; Line 25
    	ret	0
    _ull	ENDP
    _TEXT	ENDS
    END
  4. Generate another assembly listing file case29.asm from the source file case29.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro ALTERNATE defined on the command line:

    CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tccase29.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case29.c
  5. Display the assembly listing file case29.asm created in step 4.:

    TYPE case29.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case29.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_ul
    PUBLIC	_ull
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_ul
    _TEXT	SEGMENT
    _x$ = 8							; size = 4
    _ul	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case29.c
    ; Line 6
    	mov	eax, DWORD PTR _x$[esp-4]
    	mov	ecx, eax
    	not	ecx
    	test	cl, 1
    	je	SHORT $LN3@ul
    	npad	3
    	test	al, 1
    	jne	SHORT $LN3@ul
    $LL2@ul:
    ; Line 10
    	shr	eax, 1
    	mov	ecx, eax
    	not	ecx
    	test	cl, 1
    	jne	SHORT $LL2@ul
    	test	al, 1
    	je	SHORT $LL2@ul
    $LN3@ul:
    ; Line 13
    	ret	0
    _ul	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_ull
    _TEXT	SEGMENT
    _x$ = 8							; size = 8
    _ull	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case29.c
    ; Line 18
    	mov	eax, DWORD PTR _x$[esp-4]
    	mov	ecx, eax
    	mov	edx, DWORD PTR _x$[esp]
    	not	ecx
    	and	ecx, 1
    	or	ecx, 0
    	je	SHORT $LN3@ull
    	test	al, 1
    	jne	SHORT $LN3@ull
    $LL2@ull:
    ; Line 22
    	shrd	eax, edx, 1
    	mov	ecx, eax
    	shr	edx, 1
    	not	ecx
    	and	ecx, 1
    	or	ecx, 0
    	jne	SHORT $LL2@ull
    	test	al, 1
    	je	SHORT $LL2@ull
    $LN3@ull:
    ; Line 25
    	ret	0
    _ull	ENDP
    _TEXT	ENDS
    END
    The optimiser fails to recognise the commonly used (alternate) way to test whether an integer is even … or it is just bad at elementary boolean logic.
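
Both loop conditions are equivalent because bit 0 of ~x is the complement of bit 0 of x, so ~x & 1 is non-zero exactly when x & 1 is zero. The following sketch puts both spellings side by side; the function names are hypothetical, and like the originals both loops never terminate for x == 0.

```c
/* Loop condition spelled as in the default variant. */
unsigned long strip_zeros_not(unsigned long x)
{
    while (!(x & 1))
        x >>= 1;

    return x;
}

/* Loop condition spelled as in the ALTERNATE variant. */
unsigned long strip_zeros_complement(unsigned long x)
{
    while (~x & 1)
        x >>= 1;

    return x;
}

/* Returns 1 when both conditions agree for the given value. */
int conditions_agree(unsigned long x)
{
    return (!(x & 1)) == ((~x & 1) != 0);
}
```

Since the two conditions always agree, a single test of the lowest bit per iteration is sufficient in either variant.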

Case 30

Superfluous instructions generated for the intrinsic function __getcallerseflags() by the Visual C 2017 compiler (and previous versions too):

Demonstration

  1. Create the text file case30.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    int main()
    {
        return __getcallerseflags();
    }
  2. Generate the assembly listing file case30.asm from the source file case30.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase30.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case30.c
  3. Display the assembly listing file case30.asm created in step 2.:

    TYPE case30.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case30.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_main
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    __$Eflags$ = 4						; size = 4
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case30.c
    ; Line 4
    	pushfd
    	push	ebp
    	mov	ebp, esp
    ; Line 5
    	mov	eax, DWORD PTR __$Eflags$[ebp]
    ; Line 6
    	pop	ebp
    	pop	ecx
    	pop	eax
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
  4. Generate the assembly listing file case30.asm from the source file case30.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase30.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    case30.c
  5. Display the assembly listing file case30.asm created in step 4.:

    TYPE case30.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	main
    …
    
    ; Function compile flags: /Ogtpy
    ;	COMDAT	main
    _TEXT	SEGMENT
    __$Eflags$ = 0
    main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case30.c
    ; Line 4
    $LN4:
    	pushfq
    ; Line 5
    	mov	eax, DWORD PTR __$Eflags$[rsp]
    ; Line 6
    	pop	rcx
    	pop	rax
    	ret	0
    main	ENDP
    _TEXT	ENDS
    END

Case 31

Demonstration

  1. Create the text file case31.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #define STRICT
    #define UNICODE
    #define WIN32_LEAN_AND_MEAN
    
    #include <windows.h>
    #include <unknwn.h>
    
    #define IF2CO(class, member, interface)	(&((class *) 0)->member == interface, \
    					 ((class *) (((char *) interface) - (size_t) &(((class *) 0)->member))))
    
    extern	const	GUID	CLSID_NULL;
    
    extern	DWORD	dwCount;
    
    typedef	struct	_CUnknown
    {
    	DWORD		dwCount;
    
    	IUnknown	Unknown;
    } CUnknown;
    
    HRESULT	WINAPI	Unknown_QueryInterface(IUnknown *this, REFIID rIID, VOID **ppv)
    {
    	CUnknown	*that = IF2CO(CUnknown, Unknown, this);
    
    	if (ppv == NULL)
    		return E_POINTER;
    
    	*ppv = NULL;
    
    	if (rIID == NULL)
    		return E_INVALIDARG;
    
    	if (!IsEqualIID(rIID, &IID_IUnknown))
    		return E_NOINTERFACE;
    
    	*ppv = &that->Unknown;
    
    	_InterlockedIncrement(&that->dwCount);
    
    	return S_OK;
    }
    
    DWORD	WINAPI	Unknown_AddRef(IUnknown *this)
    {
    	CUnknown	*that = IF2CO(CUnknown, Unknown, this);
    
    	return _InterlockedIncrement(&that->dwCount);
    }
    
    DWORD	WINAPI	Unknown_Release(IUnknown *this)
    {
    	CUnknown	*that = IF2CO(CUnknown, Unknown, this);
    	DWORD		dw = _InterlockedDecrement(&that->dwCount);
    
    	if (dw != 0L)
    		return dw;
    
    	_InterlockedDecrement(&dwCount);
    
    	CoTaskMemFree(that);
    
    	return 0L;
    }
    
    const	IUnknownVtbl	Unknown_Vtbl = {Unknown_QueryInterface, Unknown_AddRef, Unknown_Release};
    Note: this ANSI C source is a minimal implementation of the IUnknown interface.
  2. Generate the assembly listing file case31.asm from the source file case31.c created in step 1., using the Visual C 2010 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /O1is /Tccase31.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\cl.exe:        Version 16.00.40219.1
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1.dll:        Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1xx.dll:      Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c2.dll:        Version 16.00.40219.449
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\link.exe:      Version 10.00.40219.386
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\mspdb100.dll:  Version 10.00.40219.478
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\1033\clui.dll: Version 16.00.40219.1
    
    case31.c
  3. Display the assembly listing file case31.asm created in step 2.:

    TYPE case31.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.449 
    
    	TITLE	C:\Users\Stefan\Desktop\case31.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_Unknown_Release@4
    PUBLIC	_Unknown_AddRef@4
    PUBLIC	_Unknown_QueryInterface@12
    PUBLIC	_Unknown_Vtbl
    
    ;	COMDAT	CONST
    CONST	SEGMENT
    _Unknown_Vtbl DD FLAT:_Unknown_QueryInterface@12
    	DD	FLAT:_Unknown_AddRef@4
    	DD	FLAT:_Unknown_Release@4
    CONST	ENDS
    
    EXTRN	_IID_IUnknown:BYTE
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	_Unknown_QueryInterface@12
    _TEXT	SEGMENT
    _this$ = 8						; size = 4
    _rIID$ = 12						; size = 4
    _ppv$ = 16						; size = 4
    _Unknown_QueryInterface@12 PROC				; COMDAT
    ; File c:\users\stefan\desktop\case31.c
    ; Line 26
    	mov	edx, DWORD PTR _this$[esp-4]
    ; Line 28
    	mov	eax, DWORD PTR _ppv$[esp-4]
    	add	edx, -4					; fffffffcH
    	test	eax, eax
    	jne	SHORT $LN3@Unknown_Qu
    ; Line 29
    	mov	eax, -2147467261			; 80004003H
    	jmp	SHORT $LN4@Unknown_Qu
    $LN3@Unknown_Qu:
    ; Line 31
    	and	DWORD PTR [eax], 0
    	push	esi
    ; Line 33
    	mov	esi, DWORD PTR _rIID$[esp]
    	test	esi, esi
    	jne	SHORT $LN2@Unknown_Qu
    ; Line 34
    	mov	eax, -2147024809			; 80070057H
    	jmp	SHORT $LN7@Unknown_Qu
    $LN2@Unknown_Qu:
    	push	ebx
    	push	edi
    ; Line 36
    	push	4
    	pop	ecx
    	xor	ebx, ebx
    	mov	edi, OFFSET _IID_IUnknown
    	repe	cmpsd
    	pop	edi
    	pop	ebx
    	je	SHORT $LN1@Unknown_Qu
    ; Line 37
    	mov	eax, -2147467262			; 80004002H
    	jmp	SHORT $LN7@Unknown_Qu
    $LN1@Unknown_Qu:
    ; Line 39
    	lea	ecx, DWORD PTR [edx+4]
    	mov	DWORD PTR [eax], ecx
    ; Line 41
    	xor	eax, eax
    	inc	eax
    	lock	xadd DWORD PTR [edx], eax
    ; Line 43
    	xor	eax, eax
    $LN7@Unknown_Qu:
    	pop	esi
    $LN4@Unknown_Qu:
    ; Line 44
    	ret	12					; 0000000cH
    _Unknown_QueryInterface@12 ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	_Unknown_AddRef@4
    _TEXT	SEGMENT
    _this$ = 8						; size = 4
    _Unknown_AddRef@4 PROC					; COMDAT
    ; Line 48
    	mov	ecx, DWORD PTR _this$[esp-4]
    ; Line 50
    	xor	eax, eax
    	add	ecx, -4					; fffffffcH
    	inc	eax
    	lock	xadd DWORD PTR [ecx], eax
    	inc	eax
    ; Line 51
    	ret	4
    _Unknown_AddRef@4 ENDP
    _TEXT	ENDS
    
    EXTRN	__imp__CoTaskMemFree@4:PROC
    EXTRN	_dwCount:DWORD
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	_Unknown_Release@4
    _TEXT	SEGMENT
    _this$ = 8						; size = 4
    _Unknown_Release@4 PROC					; COMDAT
    ; Line 55
    	mov	ecx, DWORD PTR _this$[esp-4]
    	add	ecx, -4					; fffffffcH
    ; Line 56
    	mov	edx, ecx
    	or	eax, -1
    	lock	xadd DWORD PTR [edx], eax
    	dec	eax
    ; Line 59
    	jne	SHORT $LN2@Unknown_Re
    ; Line 61
    	mov	eax, OFFSET _dwCount
    	or	edx, -1
    	lock	xadd DWORD PTR [eax], edx
    ; Line 63
    	push	ecx
    	call	DWORD PTR __imp__CoTaskMemFree@4
    ; Line 65
    	xor	eax, eax
    $LN2@Unknown_Re:
    ; Line 66
    	ret	4
    _Unknown_Release@4 ENDP
    _TEXT	ENDS
    END
    Notice the in(s)ane use of the EBX register around the inlined memcmp() function: it is saved, cleared and restored, but never read.
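
The IF2CO() macro in the source recovers a pointer to the containing object from a pointer to its embedded interface member; in the listing this shows up as the addition of -4 to the this pointer. Its first comma operand presumably serves only as a compile-time type check. A portable sketch of the same technique with the standard offsetof() macro might look like this; the types Interface and Object and the names CONTAINER_OF() and object_from_interface() are hypothetical stand-ins, not part of the original source.

```c
#include <stddef.h>

/* Hypothetical stand-in for an interface such as IUnknown. */
typedef struct Interface
{
    const void *vtbl;
} Interface;

/* Hypothetical stand-in for CUnknown: a counter followed by the interface. */
typedef struct Object
{
    unsigned long count;
    Interface     iface;
} Object;

/* Recover the containing object from a pointer to one of its members
   by subtracting the member's byte offset. */
#define CONTAINER_OF(type, member, ptr) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))

Object *object_from_interface(Interface *p)
{
    return CONTAINER_OF(Object, iface, p);
}
```

Given an Object o, the call object_from_interface(&o.iface) yields &o again, whatever offset the compiler chooses for the member.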

Case 32

Superfluous unreachable call of external routine __report_rangecheckfailure() generated by the Visual C 2017 compiler.

Demonstration

  1. Create the text file case32.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #define MAX_PATH 260
    
    typedef unsigned short wchar_t;
    
    unsigned __stdcall GetModuleFileNameA(void *, char *, unsigned);
    
    int main()
    {
        char sz[MAX_PATH];
        unsigned dw = GetModuleFileNameA(0, sz, MAX_PATH);
    
        if (dw < MAX_PATH)
            sz[dw] = '\0';
    }
    
    unsigned __stdcall GetModuleFileNameW(void *, wchar_t *, unsigned);
    
    int wmain()
    {
        wchar_t sz[MAX_PATH];
        unsigned dw = GetModuleFileNameW(0, sz, MAX_PATH);
    
        if (dw < MAX_PATH)
            sz[dw] = L'\0';
    }
  2. Generate the assembly listing file case32.asm from the source file case32.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /O1s /Tccase32.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case32.c
  3. Display the assembly listing file case32.asm created in step 2.:

    TYPE case32.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case32.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_main
    PUBLIC	_wmain
    EXTRN	___report_rangecheckfailure:PROC
    EXTRN	_GetModuleFileNameA@12:PROC
    EXTRN	_GetModuleFileNameW@12:PROC
    EXTRN	@__security_check_cookie@4:PROC
    EXTRN	___security_cookie:DWORD
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    _sz$ = -264						; size = 260
    __$ArrayPad$ = -4					; size = 4
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case32.c
    ; Line 10
    	push	ebp
    	mov	ebp, esp
    	sub	esp, 264				; 00000108H
    	mov	eax, DWORD PTR ___security_cookie
    	xor	eax, ebp
    	mov	DWORD PTR __$ArrayPad$[ebp], eax
    	push	esi
    ; Line 12
    	mov	esi, 260				; 00000104H
    	lea	eax, DWORD PTR _sz$[ebp]
    	push	esi
    	push	eax
    	push	0
    	call	_GetModuleFileNameA@12
    ; Line 14
    	cmp	eax, esi
    	pop	esi
    	jae	SHORT $LN2@main
    ; Line 15
    	mov	BYTE PTR _sz$[ebp+eax], 0
    $LN2@main:
    ; Line 16
    	mov	ecx, DWORD PTR __$ArrayPad$[ebp]
    	xor	eax, eax
    	xor	ecx, ebp
    	call	@__security_check_cookie@4
    	mov	esp, ebp
    	pop	ebp
    	leave
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	_wmain
    _TEXT	SEGMENT
    _sz$ = -524						; size = 520
    __$ArrayPad$ = -4					; size = 4
    _wmain	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case32.c
    ; Line 21
    	push	ebp
    	mov	ebp, esp
    	sub	esp, 524				; 0000020cH
    	mov	eax, DWORD PTR ___security_cookie
    	xor	eax, ebp
    	mov	DWORD PTR __$ArrayPad$[ebp], eax
    	push	esi
    ; Line 23
    	mov	esi, 260				; 00000104H
    	lea	eax, DWORD PTR _sz$[ebp]
    	push	esi
    	push	eax
    	push	0
    	call	_GetModuleFileNameW@12
    ; Line 25
    	cmp	eax, esi
    	pop	esi
    	jae	SHORT $LN2@wmain
    ; Line 26
    	add	eax, eax
    	cmp	eax, 520				; 00000208H
    	jae	SHORT $LN9@wmain
    $LN2@wmain:
    ; Line 27
    	mov	ecx, DWORD PTR __$ArrayPad$[ebp]
    	xor	eax, eax
    	xor	ecx, ebp
    	call	@__security_check_cookie@4
    	mov	esp, ebp
    	pop	ebp
    	leave
    	ret	0
    $LN9@wmain:
    ; Line 26
    	call	___report_rangecheckfailure
    $LN11@wmain:
    $LN8@wmain:
    	int	3
    _wmain	ENDP
    _TEXT	ENDS
    END
    Notice the difference between the single-byte character routine main() and the double-byte character routine wmain(): in the former, the conditional assignment of the terminating NUL character is not removed, although the array sz is never read afterwards; in the latter, the assignment is removed, but a superfluous range check with a conditional branch that can never be taken (dw < 260 implies 2×dw < 520) is inserted instead, plus an unreachable call of the external routine __report_rangecheckfailure()!
  4. Generate the assembly listing file case32.asm from the source file case32.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /O1s /Tccase32.c /W4 /Zl
    Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
    
    case32.c
  5. Display the assembly listing file case32.asm created in step 4.:

    TYPE case32.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	main
    PUBLIC	wmain
    EXTRN	__report_rangecheckfailure:PROC
    EXTRN	GetModuleFileNameA:PROC
    EXTRN	GetModuleFileNameW:PROC
    EXTRN	__GSHandlerCheck:PROC
    EXTRN	__security_check_cookie:PROC
    EXTRN	__security_cookie:QWORD
    …
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	main
    _TEXT	SEGMENT
    sz$ = 32
    __$ArrayPad$ = 304
    main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case32.c
    ; Line 10
    $LN10:
    	sub	rsp, 328				; 00000148H
    	mov	rax, QWORD PTR __security_cookie
    	xor	rax, rsp
    	mov	QWORD PTR __$ArrayPad$[rsp], rax
    ; Line 12
    	mov	r8d, 260				; 00000104H
    	lea	rdx, QWORD PTR sz$[rsp]
    	xor	ecx, ecx
    	call	GetModuleFileNameA
    ; Line 14
    	cmp	eax, 260				; 00000104H
    	jae	SHORT $LN2@main
    ; Line 15
    	mov	eax, eax
    	cmp	rax, 260				; 00000104H
    	jae	SHORT $LN9@main
    	mov	BYTE PTR sz$[rsp+rax], 0
    $LN2@main:
    ; Line 16
    	xor	eax, eax
    	mov	rcx, QWORD PTR __$ArrayPad$[rsp]
    	xor	rcx, rsp
    	call	__security_check_cookie
    	add	rsp, 328				; 00000148H
    	ret	0
    $LN9@main:
    ; Line 15
    	call	__report_rangecheckfailure
    	int	3
    $LN8@main:
    main	ENDP
    _TEXT	ENDS
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	wmain
    _TEXT	SEGMENT
    sz$ = 32
    __$ArrayPad$ = 560
    wmain	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case32.c
    ; Line 21
    $LN11:
    	sub	rsp, 584				; 00000248H
    	mov	rax, QWORD PTR __security_cookie
    	xor	rax, rsp
    	mov	QWORD PTR __$ArrayPad$[rsp], rax
    ; Line 23
    	mov	r8d, 260				; 00000104H
    	lea	rdx, QWORD PTR sz$[rsp]
    	xor	ecx, ecx
    	call	GetModuleFileNameW
    ; Line 25
    	cmp	eax, 260				; 00000104H
    	jae	SHORT $LN2@wmain
    ; Line 26
    	mov	eax, eax
    	add	rax, rax
    	cmp	rax, 520				; 00000208H
    	jae	SHORT $LN9@wmain
    $LN2@wmain:
    ; Line 27
    	xor	eax, eax
    	mov	rcx, QWORD PTR __$ArrayPad$[rsp]
    	xor	rcx, rsp
    	call	__security_check_cookie
    	add	rsp, 584				; 00000248H
    	ret	0
    $LN9@wmain:
    ; Line 26
    	call	__report_rangecheckfailure
    	int	3
    $LN8@wmain:
    wmain	ENDP
    _TEXT	ENDS
    END
    Notice the superfluous range checks with conditional branches that can never be taken, plus the unreachable calls of the external routine __report_rangecheckfailure()!
    Also notice that the conditional assignment of the terminating NUL character is not removed in the single-byte character routine main().
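
Why the inserted range checks can never fail: every index that passes the source-level test dw < MAX_PATH also yields a byte offset below the size of the array, for char as well as for wchar_t elements. The following sketch checks this exhaustively; the function name range_check_is_redundant() is made up for illustration.

```c
#include <stddef.h>

#define MAX_PATH 260

/* Returns 1 when every index accepted by the test dw < MAX_PATH
   also passes the byte-offset check against the array size,
   i.e. when the second check is redundant. */
int range_check_is_redundant(size_t element_size)
{
    unsigned dw;

    for (dw = 0; dw < MAX_PATH; dw++)
        if ((size_t) dw * element_size >= MAX_PATH * element_size)
            return 0;

    return 1;
}
```

Called with element sizes 1 and 2, matching char and the 16-bit wchar_t of the source, the function returns 1 in both cases.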

Case 33

Demonstration

  1. Create the text file case33.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #include <mmintrin.h>
    
    int main(int argc)
    {
        const __m64 t = _mm_cvtsi32_si64(argc);
        const __m64 u = _mm_set1_pi8(0);
        const __m64 v = _mm_set_pi8(1, 2, 3, 4, 5, 6, 7, 8);
        const __m64 w = _mm_setr_pi8(1, 2, 3, 4, 5, 6, 7, 8);
        const __m64 x = _mm_or_si64(t, u);
        const __m64 y = _mm_and_si64(v, w);
        const __m64 z = _mm_xor_si64(x, y);
    
        return _mm_cvtsi64_si32(z);
    }
  2. Generate the assembly listing file case33.asm from the source file case33.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /O1s /Tccase33.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll:      Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll:        Version 19.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe:      Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll:  Version 14.13.26129.0
     C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
    
    case33.c
  3. Display the assembly listing file case33.asm created in step 2.:

    TYPE case33.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 
    
    	TITLE	C:\Users\Stefan\Desktop\case33.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	_main
    
    ;	COMDAT	_CONST
    _CONST	SEGMENT
    bar	DQ	0807060504030201h
    foo	DQ	0102030405060708h
    _CONST	ENDS
    
    ; Function compile flags: /Ogspy
    ;	COMDAT	_main
    _TEXT	SEGMENT
    tv138 = -8						; size = 8
    tv129 = -8						; size = 8
    _argc$ = 8						; size = 4
    _main	PROC						; COMDAT
    ; File c:\users\stefan\desktop\case33.c
    ; Line 6
    	push	ebp
    	mov	ebp, esp
    	and	esp, -8					; fffffff8H
    	sub	esp, 8
    ; Line 7
    	movd	mm2, DWORD PTR _argc$[ebp]
    ; Line 8
    	xor	al, al
    ; Line 9
    	mov	DWORD PTR tv129[esp+12], 16909060	; 01020304H
    	mov	DWORD PTR tv129[esp+8], 84281096	; 05060708H
    	movd	mm0, al
    	punpcklbw mm0, mm0
    	punpcklbw mm0, mm0
    	punpcklbw mm0, mm0
    	pxor	mm0, mm0
    ; Line 11
    	por	mm2, mm0
    	movq	mm1, MMWORD PTR tv129[esp+8]
    	movq	mm1, MMWORD PTR bar
    	mov	DWORD PTR tv138[esp+8], 67305985	; 04030201H
    	mov	DWORD PTR tv138[esp+12], 134678021	; 08070605H
    	movq	mm0, MMWORD PTR tv138[esp+8]
    	movq	mm0, MMWORD PTR foo
    ; Line 12
    	pand	mm0, mm1
    ; Line 13
    	pxor	mm0, mm2
    ; Line 15
    	movd	eax, mm0
    ; Line 16
    	mov	esp, ebp
    	pop	ebp
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    The compiler abuses an XOR, a MOVD plus three PUNPCKLBW instructions to broadcast the constant 0 into every byte of the MMX register MM0, instead of a single PXOR instruction; the following POR instruction is superfluous too: a logical OR with 0 has no effect!

    The constants 0x0102030405060708 and 0x0807060504030201, loaded into the MMX registers MM1 and MM0 respectively, are built at runtime, using the two superfluous temporary variables tv129 and tv138, instead of at compile time.

    Note: a proper optimising compiler would of course replace both constants and the PAND instruction with the resulting constant 0x0002020404020200!
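
    The constant folding named in the note can be checked with plain 64-bit integer arithmetic. The following is only a sketch: the macro names V and W and the function name case33_folded() are made up for illustration, and it assumes that _mm_cvtsi32_si64() zero-extends its argument, as the MOVD instruction does.

```c
#include <assert.h>
#include <stdint.h>

/* The two byte vectors from case33.c, written as 64-bit integers:
   _mm_set_pi8(1, ..., 8) puts 1 into the most significant byte,
   _mm_setr_pi8(1, ..., 8) puts 1 into the least significant byte. */
#define V UINT64_C(0x0102030405060708)
#define W UINT64_C(0x0807060504030201)

/* Scalar model of main() from case33.c (hypothetical helper name). */
static int case33_folded(int argc)
{
    const uint64_t t = (uint32_t)argc;  /* assumed zero-extension, like MOVD */
    const uint64_t y = V & W;           /* known at compile time */

    assert(y == UINT64_C(0x0002020404020200));

    /* x = t | 0 is just t, so z = t ^ y; the result is the low 32 bits of z */
    return (int)(uint32_t)(t ^ y);
}
```

    Under these assumptions the whole function reduces to argc ^ 0x04020200, i.e. a single XOR instruction with an immediate operand.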

Case 34

Multiple identical and superfluous SSE2 instructions generated for the intrinsic functions ceil() and floor() by the Visual C 2010 compiler for the x64 alias AMD64 processor architecture.

Demonstration

  1. Create the text file case34.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #pragma intrinsic(ceil, floor)
    
    double trunc(double value)
    {
        return value < 0.0 ? ceil(value) : floor(value);
    }
  2. Generate the assembly listing file case34.asm from the source file case34.c created in step 1., using the Visual C 2010 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /fp:fast /Gy /Ox /Tccase34.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\cl.exe:        Version 16.00.40219.1
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\c1.dll:        Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\c1xx.dll:      Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\c2.dll:        Version 16.00.40219.449
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\link.exe:      Version 10.00.40219.386
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\mspdb100.dll:  Version 10.00.40219.478
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\1033\clui.dll: Version 16.00.40219.1
    
    case34.c
  3. Display the assembly listing file case34.asm created in step 2.:

    TYPE case34.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.449 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC	__real@0000000000000000
    PUBLIC	trunc
    EXTRN	_fltused:DWORD
    ; File c:\users\stefan\desktop\case34.c
    ;	COMDAT	__real@0000000000000000
    CONST	SEGMENT
    __real@0000000000000000 DQ 00000000000000000r		; 0
    CONST	ENDS
    ; Function compile flags: /Ogtpy
    _TEXT	SEGMENT
    value$ = 8
    trunc	PROC
    ; Line 6
    	comisd	xmm0, QWORD PTR __real@0000000000000000
    	movmskpd rax, xmm0
    	and	eax, 1
    	cvttsd2si rcx, xmm0
    	xorpd	xmm1, xmm1
    	mov	rax, -9223372036854775808		; 8000000000000000H
    	jae	SHORT $LN3@trunc
    	jz	SHORT $LN3@trunc
    	cmp	rcx, rax
    	je	SHORT $LN5@trunc
    	neg	rcx
    	jo	SHORT $LN5@trunc
    	neg	rcx
    	pxor	xmm1, xmm1
    	cvtsi2sd xmm1, rcx
    	ucomisd	xmm1, xmm0
    	je	SHORT $LN5@trunc
    	unpcklpd xmm0, xmm0
    	movmskpd rax, xmm0
    	pxor	xmm0, xmm0
    	and	eax, 1
    	xor	eax, 1
    	add	rcx, rax
    	cvtsi2sd xmm0, rcx
    ; Line 7
    	ret	0
    $LN3@trunc:
    ; Line 6
    	cmp	rcx, rax
    	je	SHORT $LN5@trunc
    	neg	rcx
    	jo	SHORT $LN5@trunc
    	neg	rcx
    	pxor	xmm1, xmm1
    	cvtsi2sd xmm1, rcx
    	ucomisd	xmm1, xmm0
    	je	SHORT $LN5@trunc
    	unpcklpd xmm0, xmm0
    	movmskpd rax, xmm0
    	pxor	xmm0, xmm0
    	and	eax, 1
    	sub	rcx, rax
    	cvtsi2sd xmm0, rcx
    $LN5@trunc:
    ; Line 7
    	fatret	0
    trunc	ENDP
    _TEXT	ENDS
    END
    The compiler generates PXOR, XORPD and UNPCKLPD instructions to sanitise the upper lanes of the XMM registers, although the content of these upper lanes is not used, except by the MOVMSKPD instructions, whose output is however masked by the following AND instructions anyway.

    The comparison with the constant 0x8000000000000000, the lowest signed 64-bit integer, followed by a jump-on-equal, is quite bad: it can be replaced with a negation followed by a jump-on-overflow, rendering the MOV instruction with its 8-byte immediate value superfluous.

    The initial comparison with the constant 0.0 is clumsy: it can be replaced with a MOVMSKPD plus an AND instruction, saving a memory access and the storage for the constant __real@0000000000000000; an XORPD instruction to zero a register, followed by a comparison with this register instead of the constant, would also be better than the current code.

  4. Generate another assembly listing file case34.asm from the source file case34.c created in step 1., now using the Visual C 2010 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /fp:fast /Gy /Ox /Tccase34.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\cl.exe:        Version 16.00.40219.1
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1.dll:        Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1xx.dll:      Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c2.dll:        Version 16.00.40219.449
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\link.exe:      Version 10.00.40219.386
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\mspdb100.dll:  Version 10.00.40219.478
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\1033\clui.dll: Version 16.00.40219.1
    
    case34.c
    case34.c(3) : warning C4163: 'ceil' : not available as an intrinsic function
    case34.c(7) : warning C4013: 'ceil' undefined; assuming extern returning int
    OUCH: contrary to the documentation Intrinsics available on all architectures on MSDN, the ceil() function is not available as an intrinsic for the x86 alias I386 processor architecture!
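
    The sign test proposed in step 3. as replacement for the comparison with 0.0 can be sketched in C as follows. The function name sign_bit() is hypothetical, and the IEEE 754 binary64 format is assumed for the type double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when the sign bit of value is set, else 0: the scalar
   counterpart of MOVMSKPD followed by AND with 1 (hypothetical helper). */
static int sign_bit(double value)
{
    uint64_t bits;

    assert(sizeof bits == sizeof value);    /* IEEE 754 binary64 assumed */
    memcpy(&bits, &value, sizeof bits);     /* well-defined type punning */

    return (int)(bits >> 63);
}
```

    sign_bit(value) differs from value < 0.0 only for −0.0 and for an indefinite value with its sign bit set; since ceil() and floor() return such arguments unchanged, the choice between them does not matter there.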

Case 35

Wrong code generated for comparison of floating-point numbers by the Visual C 2010 compiler for the x64 alias AMD64 processor architecture.

Demonstration

  1. Create the text file case35.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    #pragma comment(linker, "/NODEFAULTLIB")
    #pragma comment(linker, "/ENTRY:MainCRTStartup")
    #pragma comment(linker, "/SUBSYSTEM:CONSOLE")
    
    #if 0 // NOTE: MSC yields error C2124 for 1.0 / 0.0 and 0.0 / 0.0!
    #define INFINITY   (1.0 / 0.0)
    #define INDEFINITE (0.0 / 0.0)
    #else
    #define INFINITY   (1.0 / 5.0e-324)
    #define INDEFINITE (0.0 * INFINITY)
    #endif
    
    int _fltused;
    
    int MainCRTStartup(void)
    {
        volatile double indefinite = INDEFINITE, infinity = INFINITY, zero = 0.0;
    
        int bitmask = 0;
    
        if (indefinite == indefinite)
            bitmask += 1 << 0;
    
        if (indefinite < INDEFINITE)
            bitmask += 1 << 1;
    
        if (indefinite == INDEFINITE)
            bitmask += 1 << 2;
    
        if (indefinite > INDEFINITE)
            bitmask += 1 << 3;
    
        if (indefinite < INFINITY)
            bitmask += 1 << 4;
    
        if (indefinite == INFINITY)
            bitmask += 1 << 5;
    
        if (indefinite > INFINITY)
            bitmask += 1 << 6;
    
        if (indefinite < -INFINITY)
            bitmask += 1 << 7;
    
        if (indefinite == -INFINITY)
            bitmask += 1 << 8;
    
        if (indefinite > -INFINITY)
            bitmask += 1 << 9;
    
        if (indefinite < 0.0)
            bitmask += 1 << 10;
    
        if (indefinite == 0.0)
            bitmask += 1 << 11;
    
        if (indefinite > 0.0)
            bitmask += 1 << 12;
    
        if (indefinite < -0.0)
            bitmask += 1 << 13;
    
        if (indefinite == -0.0)
            bitmask += 1 << 14;
    
        if (indefinite > -0.0)
            bitmask += 1 << 15;
    
        if (indefinite < 1.0)
            bitmask += 1 << 16;
    
        if (indefinite == 1.0)
            bitmask += 1 << 17;
    
        if (indefinite > 1.0)
            bitmask += 1 << 18;
    
        if (infinity == INDEFINITE)
            bitmask += 1 << 19;
    
        if (infinity == INFINITY)
            bitmask += 1 << 20;
    
        if (infinity == -INFINITY)
            bitmask += 1 << 21;
    
        if (-infinity == -INFINITY)
            bitmask += 1 << 22;
    
        if (infinity > -INFINITY)
            bitmask += 1 << 23;
    
        if (infinity > 0.0)
            bitmask += 1 << 24;
    
        if (-infinity < -0.0)
            bitmask += 1 << 25;
    
        if (zero == INDEFINITE)
            bitmask += 1 << 26;
    
        if (zero == INFINITY)
            bitmask += 1 << 27;
    
        if (zero == -INFINITY)
            bitmask += 1 << 28;
    
        if (zero == 0.0)
            bitmask += 1 << 29;
    
        if (zero == -0.0)
            bitmask += 1 << 30;
    
        if (INDEFINITE == INDEFINITE)
            bitmask += 1 << 31;
    
        return bitmask;
    }
  2. Generate the assembly listing file case35.asm and the console program case35.exe from the source file case35.c created in step 1., using the Visual C 2010 compiler for the x64 alias AMD64 processor architecture:

    CL.EXE /Bv /Fa /fp:fast /Gy /Tccase35.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\cl.exe:        Version 16.00.40219.1
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\c1.dll:        Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\c1xx.dll:      Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\c2.dll:        Version 16.00.40219.449
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\link.exe:      Version 10.00.40219.386
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\mspdb100.dll:  Version 10.00.40219.478
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\1033\clui.dll: Version 16.00.40219.1
    
    case35.c
    case35.c(116) : warning C4127: conditional expression is constant
    
    Microsoft (R) Incremental Linker Version 10.00.40219.386
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    /out:case35.exe
    case35.obj
  3. Display the assembly listing file case35.asm created in step 2.:

    TYPE case35.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.449 
    
    include	listing.inc
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    _DATA	SEGMENT
    COMM	_fltused:DWORD
    _DATA	ENDS
    PUBLIC	__mask@@NegDouble@
    PUBLIC	__real@3ff0000000000000
    PUBLIC	__real@8000000000000000
    PUBLIC	__real@0000000000000000
    PUBLIC	__real@fff0000000000000
    PUBLIC	__real@7ff0000000000000
    PUBLIC	__real@fff8000000000000
    PUBLIC	MainCRTStartup
    EXTRN	_fltused:DWORD
    ;	COMDAT	pdata
    ; File c:\users\stefan\desktop\case35.c
    pdata	SEGMENT
    $pdata$MainCRTStartup DD imagerel $LN35
    	DD	imagerel $LN35+972
    	DD	imagerel $unwind$MainCRTStartup
    pdata	ENDS
    ;	COMDAT	xdata
    xdata	SEGMENT
    $unwind$MainCRTStartup DD 010401H
    	DD	04204H
    xdata	ENDS
    ;	COMDAT	__mask@@NegDouble@
    CONST	SEGMENT
    __mask@@NegDouble@ DB 00H, 00H, 00H, 00H, 00H, 00H, 00H, 080H, 00H, 00H, 00H
    	DB	00H, 00H, 00H, 00H, 080H
    CONST	ENDS
    ;	COMDAT	__real@3ff0000000000000
    CONST	SEGMENT
    __real@3ff0000000000000 DQ 03ff0000000000000r	; 1
    CONST	ENDS
    ;	COMDAT	__real@8000000000000000
    CONST	SEGMENT
    __real@8000000000000000 DQ 08000000000000000r	; -0
    CONST	ENDS
    ;	COMDAT	__real@0000000000000000
    CONST	SEGMENT
    __real@0000000000000000 DQ 00000000000000000r	; 0
    CONST	ENDS
    ;	COMDAT	__real@fff0000000000000
    CONST	SEGMENT
    __real@fff0000000000000 DQ 0fff0000000000000r	; -1.#INF
    CONST	ENDS
    ;	COMDAT	__real@7ff0000000000000
    CONST	SEGMENT
    __real@7ff0000000000000 DQ 07ff0000000000000r	; 1.#INF
    CONST	ENDS
    ;	COMDAT	__real@fff8000000000000
    CONST	SEGMENT
    __real@fff8000000000000 DQ 0fff8000000000000r	; -1.#IND
    ; Function compile flags: /Odtp
    CONST	ENDS
    ;	COMDAT	MainCRTStartup
    _TEXT	SEGMENT
    infinity$ = 0
    bitmask$ = 8
    indefinite$ = 16
    zero$ = 24
    MainCRTStartup PROC					; COMDAT
    ; Line 18
    $LN35:
    	sub	rsp, 40					; 00000028H
    ; Line 19
    	movsdx	xmm0, QWORD PTR __real@fff8000000000000
    	movsdx	QWORD PTR indefinite$[rsp], xmm0
    	movsdx	xmm0, QWORD PTR __real@7ff0000000000000
    	movsdx	QWORD PTR infinity$[rsp], xmm0
    	xorpd	xmm0, xmm0
    	movsdx	QWORD PTR zero$[rsp], xmm0
    ; Line 21
    	mov	DWORD PTR bitmask$[rsp], 0
    ; Line 23
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	movsdx	xmm1, QWORD PTR indefinite$[rsp]
    	ucomisd	xmm0, xmm1
    	jne	SHORT $LN32@MainCRTSta
    	jpe	SHORT $LN32@MainCRTSta
    ; Line 24
    	mov	eax, DWORD PTR bitmask$[rsp]
    	inc	eax
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN32@MainCRTSta:
    ; Line 26
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	comisd	xmm0, QWORD PTR __real@fff8000000000000
    	jae	SHORT $LN31@MainCRTSta
    	jpe	SHORT $LN31@MainCRTSta
    ; Line 27
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 2
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN31@MainCRTSta:
    ; Line 29
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	ucomisd	xmm0, QWORD PTR __real@fff8000000000000
    	jne	SHORT $LN30@MainCRTSta
    	jpe	SHORT $LN30@MainCRTSta
    ; Line 30
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 4
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN30@MainCRTSta:
    ; Line 32
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	comisd	xmm0, QWORD PTR __real@fff8000000000000
    	jbe	SHORT $LN29@MainCRTSta
    ; Line 33
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 8
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN29@MainCRTSta:
    ; Line 35
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	comisd	xmm0, QWORD PTR __real@7ff0000000000000
    	jae	SHORT $LN28@MainCRTSta
    	jpe	SHORT $LN28@MainCRTSta
    ; Line 36
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 16
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN28@MainCRTSta:
    ; Line 38
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	ucomisd	xmm0, QWORD PTR __real@7ff0000000000000
    	jne	SHORT $LN27@MainCRTSta
    	jpe	SHORT $LN27@MainCRTSta
    ; Line 39
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 32					; 00000020H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN27@MainCRTSta:
    ; Line 41
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	comisd	xmm0, QWORD PTR __real@7ff0000000000000
    	jbe	SHORT $LN26@MainCRTSta
    ; Line 42
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 64					; 00000040H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN26@MainCRTSta:
    ; Line 44
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	comisd	xmm0, QWORD PTR __real@fff0000000000000
    	jae	SHORT $LN25@MainCRTSta
    	jpe	SHORT $LN25@MainCRTSta
    ; Line 45
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 128				; 00000080H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN25@MainCRTSta:
    ; Line 47
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	ucomisd	xmm0, QWORD PTR __real@fff0000000000000
    	jne	SHORT $LN24@MainCRTSta
    	jpe	SHORT $LN24@MainCRTSta
    ; Line 48
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 256				; 00000100H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN24@MainCRTSta:
    ; Line 50
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	comisd	xmm0, QWORD PTR __real@fff0000000000000
    	jbe	SHORT $LN23@MainCRTSta
    ; Line 51
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 512				; 00000200H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN23@MainCRTSta:
    ; Line 53
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	comisd	xmm0, QWORD PTR __real@0000000000000000
    	jae	SHORT $LN22@MainCRTSta
    	jpe	SHORT $LN22@MainCRTSta
    ; Line 54
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 1024				; 00000400H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN22@MainCRTSta:
    ; Line 56
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	ucomisd	xmm0, QWORD PTR __real@0000000000000000
    	jne	SHORT $LN21@MainCRTSta
    	jpe	SHORT $LN21@MainCRTSta
    ; Line 57
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 2048				; 00000800H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN21@MainCRTSta:
    ; Line 59
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	comisd	xmm0, QWORD PTR __real@0000000000000000
    	jbe	SHORT $LN20@MainCRTSta
    ; Line 60
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 4096				; 00001000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN20@MainCRTSta:
    ; Line 62
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	comisd	xmm0, QWORD PTR __real@8000000000000000
    	jae	SHORT $LN19@MainCRTSta
    	jpe	SHORT $LN19@MainCRTSta
    ; Line 63
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 8192				; 00002000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN19@MainCRTSta:
    ; Line 65
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	ucomisd	xmm0, QWORD PTR __real@8000000000000000
    	jne	SHORT $LN18@MainCRTSta
    	jpe	SHORT $LN18@MainCRTSta
    ; Line 66
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 16384				; 00004000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN18@MainCRTSta:
    ; Line 68
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	comisd	xmm0, QWORD PTR __real@8000000000000000
    	jbe	SHORT $LN17@MainCRTSta
    ; Line 69
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 32768				; 00008000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN17@MainCRTSta:
    ; Line 71
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	comisd	xmm0, QWORD PTR __real@3ff0000000000000
    	jae	SHORT $LN16@MainCRTSta
    	jpe	SHORT $LN16@MainCRTSta
    ; Line 72
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 65536				; 00010000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN16@MainCRTSta:
    ; Line 74
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	ucomisd	xmm0, QWORD PTR __real@3ff0000000000000
    	jne	SHORT $LN15@MainCRTSta
    	jpe	SHORT $LN15@MainCRTSta
    ; Line 75
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 131072				; 00020000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN15@MainCRTSta:
    ; Line 77
    	movsdx	xmm0, QWORD PTR indefinite$[rsp]
    	comisd	xmm0, QWORD PTR __real@3ff0000000000000
    	jbe	SHORT $LN14@MainCRTSta
    ; Line 78
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 262144				; 00040000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN14@MainCRTSta:
    ; Line 80
    	movsdx	xmm0, QWORD PTR infinity$[rsp]
    	ucomisd	xmm0, QWORD PTR __real@fff8000000000000
    	jne	SHORT $LN13@MainCRTSta
    	jpe	SHORT $LN13@MainCRTSta
    ; Line 81
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 524288				; 00080000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN13@MainCRTSta:
    ; Line 83
    	movsdx	xmm0, QWORD PTR infinity$[rsp]
    	ucomisd	xmm0, QWORD PTR __real@7ff0000000000000
    	jne	SHORT $LN12@MainCRTSta
    	jpe	SHORT $LN12@MainCRTSta
    ; Line 84
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 1048576				; 00100000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN12@MainCRTSta:
    ; Line 86
    	movsdx	xmm0, QWORD PTR infinity$[rsp]
    	ucomisd	xmm0, QWORD PTR __real@fff0000000000000
    	jne	SHORT $LN11@MainCRTSta
    	jpe	SHORT $LN11@MainCRTSta
    ; Line 87
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 2097152				; 00200000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN11@MainCRTSta:
    ; Line 89
    	movsdx	xmm0, QWORD PTR infinity$[rsp]
    	xorpd	xmm0, QWORD PTR __mask@@NegDouble@
    	ucomisd	xmm0, QWORD PTR __real@fff0000000000000
    	jne	SHORT $LN10@MainCRTSta
    	jpe	SHORT $LN10@MainCRTSta
    ; Line 90
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 4194304				; 00400000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN10@MainCRTSta:
    ; Line 92
    	movsdx	xmm0, QWORD PTR infinity$[rsp]
    	comisd	xmm0, QWORD PTR __real@fff0000000000000
    	jbe	SHORT $LN9@MainCRTSta
    ; Line 93
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 8388608				; 00800000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN9@MainCRTSta:
    ; Line 95
    	movsdx	xmm0, QWORD PTR infinity$[rsp]
    	comisd	xmm0, QWORD PTR __real@0000000000000000
    	jbe	SHORT $LN8@MainCRTSta
    ; Line 96
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 16777216				; 01000000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN8@MainCRTSta:
    ; Line 98
    	movsdx	xmm0, QWORD PTR infinity$[rsp]
    	xorpd	xmm0, QWORD PTR __mask@@NegDouble@
    	comisd	xmm0, QWORD PTR __real@8000000000000000
    	jae	SHORT $LN7@MainCRTSta
    	jpe	SHORT $LN7@MainCRTSta
    ; Line 99
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 33554432				; 02000000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN7@MainCRTSta:
    ; Line 101
    	movsdx	xmm0, QWORD PTR zero$[rsp]
    	ucomisd	xmm0, QWORD PTR __real@fff8000000000000
    	jne	SHORT $LN6@MainCRTSta
    	jpe	SHORT $LN6@MainCRTSta
    ; Line 102
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 67108864				; 04000000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN6@MainCRTSta:
    ; Line 104
    	movsdx	xmm0, QWORD PTR zero$[rsp]
    	ucomisd	xmm0, QWORD PTR __real@7ff0000000000000
    	jne	SHORT $LN5@MainCRTSta
    	jpe	SHORT $LN5@MainCRTSta
    ; Line 105
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 134217728				; 08000000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN5@MainCRTSta:
    ; Line 107
    	movsdx	xmm0, QWORD PTR zero$[rsp]
    	ucomisd	xmm0, QWORD PTR __real@fff0000000000000
    	jne	SHORT $LN4@MainCRTSta
    	jpe	SHORT $LN4@MainCRTSta
    ; Line 108
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 268435456				; 10000000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN4@MainCRTSta:
    ; Line 110
    	movsdx	xmm0, QWORD PTR zero$[rsp]
    	ucomisd	xmm0, QWORD PTR __real@0000000000000000
    	jne	SHORT $LN3@MainCRTSta
    	jpe	SHORT $LN3@MainCRTSta
    ; Line 111
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 536870912				; 20000000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN3@MainCRTSta:
    ; Line 113
    	movsdx	xmm0, QWORD PTR zero$[rsp]
    	ucomisd	xmm0, QWORD PTR __real@8000000000000000
    	jne	SHORT $LN2@MainCRTSta
    	jpe	SHORT $LN2@MainCRTSta
    ; Line 114
    	mov	eax, DWORD PTR bitmask$[rsp]
    	add	eax, 1073741824				; 40000000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN2@MainCRTSta:
    ; Line 116
    	xor	eax, eax
    	test	eax, eax
    	je	SHORT $LN1@MainCRTSta
    ; Line 117
    	mov	eax, DWORD PTR bitmask$[rsp]
    	sub	eax, -2147483648			; ffffffff80000000H
    	mov	DWORD PTR bitmask$[rsp], eax
    $LN1@MainCRTSta:
    ; Line 119
    	mov	eax, DWORD PTR bitmask$[rsp]
    ; Line 120
    	add	rsp, 40					; 00000028H
    	ret	0
    MainCRTStartup ENDP
    _TEXT	ENDS
    END
    The COMISD and UCOMISD instructions generated by the compiler set all three flags CF alias carry, PF alias parity and ZF alias zero when at least one operand is indefinite. The conditional jump instructions JA alias JNBE, JAE alias JNB alias JNC, JB alias JC alias JNAE, JBE alias JNA, JE alias JZ, and JNE alias JNZ inspect only the CF and ZF flags, but not the PF flag, and are therefore not sufficient here: depending on whether they test the inverted or the straight condition, they must be accompanied by an additional JP alias JPE or JNP alias JPO instruction!
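
    The effect of the missing parity check can be modelled in portable C. This is a sketch only: the type struct flags and all function names are invented for illustration, and the flag values follow the behaviour described above (all three flags set for unordered operands, CF set for less, ZF set for equal, none set for greater).

```c
#include <assert.h>
#include <math.h>       /* only for the NAN macro */
#include <stdbool.h>

/* Flags left behind by COMISD/UCOMISD a, b (hypothetical model type). */
struct flags { bool zf, pf, cf; };

static struct flags ucomisd(double a, double b)
{
    struct flags f = { false, false, false };   /* a > b: no flag set */

    if (a != a || b != b)                       /* unordered: all three set */
        f.zf = f.pf = f.cf = true;
    else if (a < b)
        f.cf = true;
    else if (a == b)
        f.zf = true;
    return f;
}

/* Conditions as tested by a lone JE or JB: wrong for unordered operands. */
static bool equal_naive(struct flags f) { return f.zf; }
static bool below_naive(struct flags f) { return f.cf; }

/* The same conditions with the additional parity check. */
static bool equal_ordered(struct flags f) { return f.zf && !f.pf; }
static bool below_ordered(struct flags f) { return f.cf && !f.pf; }

/* JA gets along without parity check: CF and ZF are set when unordered. */
static bool above(struct flags f) { return !f.cf && !f.zf; }

/* Usage example: an indefinite operand fools only the naive tests. */
static void check_unordered(void)
{
    const struct flags f = ucomisd(NAN, 1.0);

    assert(equal_naive(f) && below_naive(f));       /* wrong results */
    assert(!equal_ordered(f) && !below_ordered(f)); /* correct results */
    assert(!above(f));                              /* correct as is */
}
```

    In this model only the tests for less and for equal depend on the PF flag; the test for greater yields the correct result without it.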
  4. Run the 64-bit console program case35.exe created in step 2. and display its exit code:

    .\case35.exe
    ECHO %ERRORLEVEL%
    1742433719
    1742433719 is equal to 0x67DB6DB7; the expected and correct exit code is however 0x63D00000 (1674575872): the IEEE 754 Standard for Floating-Point Arithmetic defines an indefinite value, called NaN (not a number), which results for example from the division ±0.0÷±0.0, from the multiplication ±infinity×0.0, or from the subtraction infinity−infinity, and which compares unequal to any value, including itself!

    Note: the final comparison in line 116, where the compiler does not generate a UCOMISD instruction, but evaluates the constant expression, produces the correct result.
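
    Which comparisons went wrong can be read off the two exit codes. The following sketch (macro and function names are made up for illustration) computes the bits that are set in the observed, but not in the expected exit code, and counts them.

```c
#include <assert.h>
#include <stdint.h>

#define OBSERVED UINT32_C(0x67DB6DB7)   /* exit code displayed in step 4. */
#define EXPECTED UINT32_C(0x63D00000)   /* exit code required by IEEE 754 */

/* Returns the bits set in the observed, but not in the expected exit code. */
static uint32_t wrong_bits(void)
{
    assert((OBSERVED & EXPECTED) == EXPECTED);  /* no correct bit is missing */
    return OBSERVED ^ EXPECTED;
}

/* Counts the set bits of mask (portable population count). */
static unsigned count_bits(uint32_t mask)
{
    unsigned count = 0;

    for (; mask != 0; mask &= mask - 1)     /* clears the lowest set bit */
        ++count;
    return count;
}
```

    The difference is 0x040B6DB7, i.e. 15 wrong bits: these are exactly the < and == comparisons of case35.c with at least one indefinite operand (bits 0 to 19 and bit 26, without bits 3, 6, 9, 12, 15 and 18 for the > comparisons).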

  5. Generate another assembly listing file case35.asm plus console program case35.exe from the source file case35.c created in step 1., now using the Visual C 2010 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /Fa /fp:fast /Gy /Tccase35.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\cl.exe:        Version 16.00.40219.1
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1.dll:        Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1xx.dll:      Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c2.dll:        Version 16.00.40219.449
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\link.exe:      Version 10.00.40219.386
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\mspdb100.dll:  Version 10.00.40219.478
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\1033\clui.dll: Version 16.00.40219.1
    
    case35.c
    case35.c(116) : warning C4127: conditional expression is constant
    
    Microsoft (R) Incremental Linker Version 10.00.40219.386
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    /out:case35.exe
    case35.obj
  6. Display the assembly listing file case35.asm created in step 5.:

    TYPE case35.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.449 
    
    	TITLE	C:\Users\Stefan\Desktop\case35.c
    	.686P
    	.XMM
    	include	listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    _DATA	SEGMENT
    COMM	__fltused:DWORD
    _DATA	ENDS
    PUBLIC	__real@3ff0000000000000
    PUBLIC	__real@8000000000000000
    PUBLIC	__real@fff0000000000000
    PUBLIC	__real@0000000000000000
    PUBLIC	__real@7ff0000000000000
    PUBLIC	__real@fff8000000000000
    PUBLIC	_MainCRTStartup
    EXTRN	__fltused:DWORD
    ;	COMDAT	__real@3ff0000000000000
    ; File c:\users\stefan\desktop\case35.c
    CONST	SEGMENT
    __real@3ff0000000000000 DQ 03ff0000000000000r	; 1
    CONST	ENDS
    ;	COMDAT	__real@8000000000000000
    CONST	SEGMENT
    __real@8000000000000000 DQ 08000000000000000r	; -0
    CONST	ENDS
    ;	COMDAT	__real@fff0000000000000
    CONST	SEGMENT
    __real@fff0000000000000 DQ 0fff0000000000000r	; -1.#INF
    CONST	ENDS
    ;	COMDAT	__real@0000000000000000
    CONST	SEGMENT
    __real@0000000000000000 DQ 00000000000000000r	; 0
    CONST	ENDS
    ;	COMDAT	__real@7ff0000000000000
    CONST	SEGMENT
    __real@7ff0000000000000 DQ 07ff0000000000000r	; 1.#INF
    CONST	ENDS
    ;	COMDAT	__real@fff8000000000000
    CONST	SEGMENT
    __real@fff8000000000000 DQ 0fff8000000000000r	; -1.#IND
    ; Function compile flags: /Odtp
    CONST	ENDS
    ;	COMDAT	_MainCRTStartup
    _TEXT	SEGMENT
    _zero$ = -32						; size = 8
    _indefinite$ = -24					; size = 8
    _bitmask$ = -12						; size = 4
    _infinity$ = -8						; size = 8
    _MainCRTStartup PROC					; COMDAT
    ; Line 18
    	push	ebp
    	mov	ebp, esp
    	sub	esp, 32					; 00000020H
    ; Line 19
    	fld	QWORD PTR __real@fff8000000000000
    	fstp	QWORD PTR _indefinite$[ebp]
    	fld	QWORD PTR __real@7ff0000000000000
    	fstp	QWORD PTR _infinity$[ebp]
    	fldz
    	fstp	QWORD PTR _zero$[ebp]
    ; Line 21
    	mov	DWORD PTR _bitmask$[ebp], 0
    ; Line 23
    	fld	QWORD PTR _indefinite$[ebp]
    	fld	QWORD PTR _indefinite$[ebp]
    	fucompp
    	fnstsw	ax
    	test	ah, 68					; 00000044H
    	jp	SHORT $LN32@MainCRTSta
    ; Line 24
    	mov	eax, DWORD PTR _bitmask$[ebp]
    	add	eax, 1
    	mov	DWORD PTR _bitmask$[ebp], eax
    $LN32@MainCRTSta:
    ; Line 26
    	fld	QWORD PTR _indefinite$[ebp]
    	fcomp	QWORD PTR __real@fff8000000000000
    	fnstsw	ax
    	test	ah, 5
    	jp	SHORT $LN31@MainCRTSta
    ; Line 27
    	mov	ecx, DWORD PTR _bitmask$[ebp]
    	add	ecx, 2
    	mov	DWORD PTR _bitmask$[ebp], ecx
    $LN31@MainCRTSta:
    ; Line 29
    	fld	QWORD PTR _indefinite$[ebp]
    	fld	QWORD PTR __real@fff8000000000000
    	fucompp
    	fnstsw	ax
    	test	ah, 68					; 00000044H
    	jp	SHORT $LN30@MainCRTSta
    ; Line 30
    	mov	edx, DWORD PTR _bitmask$[ebp]
    	add	edx, 4
    	mov	DWORD PTR _bitmask$[ebp], edx
    $LN30@MainCRTSta:
    ; Line 32
    	fld	QWORD PTR _indefinite$[ebp]
    	fcomp	QWORD PTR __real@fff8000000000000
    	fnstsw	ax
    	test	ah, 65					; 00000041H
    	jne	SHORT $LN29@MainCRTSta
    ; Line 33
    	mov	eax, DWORD PTR _bitmask$[ebp]
    	add	eax, 8
    	mov	DWORD PTR _bitmask$[ebp], eax
    $LN29@MainCRTSta:
    ; Line 35
    	fld	QWORD PTR _indefinite$[ebp]
    	fcomp	QWORD PTR __real@7ff0000000000000
    	fnstsw	ax
    	test	ah, 5
    	jp	SHORT $LN28@MainCRTSta
    ; Line 36
    	mov	ecx, DWORD PTR _bitmask$[ebp]
    	add	ecx, 16					; 00000010H
    	mov	DWORD PTR _bitmask$[ebp], ecx
    $LN28@MainCRTSta:
    ; Line 38
    	fld	QWORD PTR _indefinite$[ebp]
    	fld	QWORD PTR __real@7ff0000000000000
    	fucompp
    	fnstsw	ax
    	test	ah, 68					; 00000044H
    	jp	SHORT $LN27@MainCRTSta
    ; Line 39
    	mov	edx, DWORD PTR _bitmask$[ebp]
    	add	edx, 32					; 00000020H
    	mov	DWORD PTR _bitmask$[ebp], edx
    $LN27@MainCRTSta:
    ; Line 41
    	fld	QWORD PTR _indefinite$[ebp]
    	fcomp	QWORD PTR __real@7ff0000000000000
    	fnstsw	ax
    	test	ah, 65					; 00000041H
    	jne	SHORT $LN26@MainCRTSta
    ; Line 42
    	mov	eax, DWORD PTR _bitmask$[ebp]
    	add	eax, 64					; 00000040H
    	mov	DWORD PTR _bitmask$[ebp], eax
    $LN26@MainCRTSta:
    ; Line 44
    	fld	QWORD PTR _indefinite$[ebp]
    	fcomp	QWORD PTR __real@fff0000000000000
    	fnstsw	ax
    	test	ah, 5
    	jp	SHORT $LN25@MainCRTSta
    ; Line 45
    	mov	ecx, DWORD PTR _bitmask$[ebp]
    	add	ecx, 128				; 00000080H
    	mov	DWORD PTR _bitmask$[ebp], ecx
    $LN25@MainCRTSta:
    ; Line 47
    	fld	QWORD PTR _indefinite$[ebp]
    	fld	QWORD PTR __real@fff0000000000000
    	fucompp
    	fnstsw	ax
    	test	ah, 68					; 00000044H
    	jp	SHORT $LN24@MainCRTSta
    ; Line 48
    	mov	edx, DWORD PTR _bitmask$[ebp]
    	add	edx, 256				; 00000100H
    	mov	DWORD PTR _bitmask$[ebp], edx
    $LN24@MainCRTSta:
    ; Line 50
    	fld	QWORD PTR _indefinite$[ebp]
    	fcomp	QWORD PTR __real@fff0000000000000
    	fnstsw	ax
    	test	ah, 65					; 00000041H
    	jne	SHORT $LN23@MainCRTSta
    ; Line 51
    	mov	eax, DWORD PTR _bitmask$[ebp]
    	add	eax, 512				; 00000200H
    	mov	DWORD PTR _bitmask$[ebp], eax
    $LN23@MainCRTSta:
    ; Line 53
    	fld	QWORD PTR _indefinite$[ebp]
    	fcomp	QWORD PTR __real@0000000000000000
    	fnstsw	ax
    	test	ah, 5
    	jp	SHORT $LN22@MainCRTSta
    ; Line 54
    	mov	ecx, DWORD PTR _bitmask$[ebp]
    	add	ecx, 1024				; 00000400H
    	mov	DWORD PTR _bitmask$[ebp], ecx
    $LN22@MainCRTSta:
    ; Line 56
    	fld	QWORD PTR _indefinite$[ebp]
    	fldz
    	fucompp
    	fnstsw	ax
    	test	ah, 68					; 00000044H
    	jp	SHORT $LN21@MainCRTSta
    ; Line 57
    	mov	edx, DWORD PTR _bitmask$[ebp]
    	add	edx, 2048				; 00000800H
    	mov	DWORD PTR _bitmask$[ebp], edx
    $LN21@MainCRTSta:
    ; Line 59
    	fld	QWORD PTR _indefinite$[ebp]
    	fcomp	QWORD PTR __real@0000000000000000
    	fnstsw	ax
    	test	ah, 65					; 00000041H
    	jne	SHORT $LN20@MainCRTSta
    ; Line 60
    	mov	eax, DWORD PTR _bitmask$[ebp]
    	add	eax, 4096				; 00001000H
    	mov	DWORD PTR _bitmask$[ebp], eax
    $LN20@MainCRTSta:
    ; Line 62
    	fld	QWORD PTR _indefinite$[ebp]
    	fcomp	QWORD PTR __real@8000000000000000
    	fnstsw	ax
    	test	ah, 5
    	jp	SHORT $LN19@MainCRTSta
    ; Line 63
    	mov	ecx, DWORD PTR _bitmask$[ebp]
    	add	ecx, 8192				; 00002000H
    	mov	DWORD PTR _bitmask$[ebp], ecx
    $LN19@MainCRTSta:
    ; Line 65
    	fld	QWORD PTR _indefinite$[ebp]
    	fldz
    	fucompp
    	fnstsw	ax
    	test	ah, 68					; 00000044H
    	jp	SHORT $LN18@MainCRTSta
    ; Line 66
    	mov	edx, DWORD PTR _bitmask$[ebp]
    	add	edx, 16384				; 00004000H
    	mov	DWORD PTR _bitmask$[ebp], edx
    $LN18@MainCRTSta:
    ; Line 68
    	fld	QWORD PTR _indefinite$[ebp]
    	fcomp	QWORD PTR __real@8000000000000000
    	fnstsw	ax
    	test	ah, 65					; 00000041H
    	jne	SHORT $LN17@MainCRTSta
    ; Line 69
    	mov	eax, DWORD PTR _bitmask$[ebp]
    	add	eax, 32768				; 00008000H
    	mov	DWORD PTR _bitmask$[ebp], eax
    $LN17@MainCRTSta:
    ; Line 71
    	fld	QWORD PTR _indefinite$[ebp]
    	fcomp	QWORD PTR __real@3ff0000000000000
    	fnstsw	ax
    	test	ah, 5
    	jp	SHORT $LN16@MainCRTSta
    ; Line 72
    	mov	ecx, DWORD PTR _bitmask$[ebp]
    	add	ecx, 65536				; 00010000H
    	mov	DWORD PTR _bitmask$[ebp], ecx
    $LN16@MainCRTSta:
    ; Line 74
    	fld	QWORD PTR _indefinite$[ebp]
    	fld1
    	fucompp
    	fnstsw	ax
    	test	ah, 68					; 00000044H
    	jp	SHORT $LN15@MainCRTSta
    ; Line 75
    	mov	edx, DWORD PTR _bitmask$[ebp]
    	add	edx, 131072				; 00020000H
    	mov	DWORD PTR _bitmask$[ebp], edx
    $LN15@MainCRTSta:
    ; Line 77
    	fld	QWORD PTR _indefinite$[ebp]
    	fcomp	QWORD PTR __real@3ff0000000000000
    	fnstsw	ax
    	test	ah, 65					; 00000041H
    	jne	SHORT $LN14@MainCRTSta
    ; Line 78
    	mov	eax, DWORD PTR _bitmask$[ebp]
    	add	eax, 262144				; 00040000H
    	mov	DWORD PTR _bitmask$[ebp], eax
    $LN14@MainCRTSta:
    ; Line 80
    	fld	QWORD PTR _infinity$[ebp]
    	fld	QWORD PTR __real@fff8000000000000
    	fucompp
    	fnstsw	ax
    	test	ah, 68					; 00000044H
    	jp	SHORT $LN13@MainCRTSta
    ; Line 81
    	mov	ecx, DWORD PTR _bitmask$[ebp]
    	add	ecx, 524288				; 00080000H
    	mov	DWORD PTR _bitmask$[ebp], ecx
    $LN13@MainCRTSta:
    ; Line 83
    	fld	QWORD PTR _infinity$[ebp]
    	fld	QWORD PTR __real@7ff0000000000000
    	fucompp
    	fnstsw	ax
    	test	ah, 68					; 00000044H
    	jp	SHORT $LN12@MainCRTSta
    ; Line 84
    	mov	edx, DWORD PTR _bitmask$[ebp]
    	add	edx, 1048576				; 00100000H
    	mov	DWORD PTR _bitmask$[ebp], edx
    $LN12@MainCRTSta:
    ; Line 86
    	fld	QWORD PTR _infinity$[ebp]
    	fld	QWORD PTR __real@fff0000000000000
    	fucompp
    	fnstsw	ax
    	test	ah, 68					; 00000044H
    	jp	SHORT $LN11@MainCRTSta
    ; Line 87
    	mov	eax, DWORD PTR _bitmask$[ebp]
    	add	eax, 2097152				; 00200000H
    	mov	DWORD PTR _bitmask$[ebp], eax
    $LN11@MainCRTSta:
    ; Line 89
    	fld	QWORD PTR _infinity$[ebp]
    	fchs
    	fld	QWORD PTR __real@fff0000000000000
    	fucompp
    	fnstsw	ax
    	test	ah, 68					; 00000044H
    	jp	SHORT $LN10@MainCRTSta
    ; Line 90
    	mov	ecx, DWORD PTR _bitmask$[ebp]
    	add	ecx, 4194304				; 00400000H
    	mov	DWORD PTR _bitmask$[ebp], ecx
    $LN10@MainCRTSta:
    ; Line 92
    	fld	QWORD PTR _infinity$[ebp]
    	fcomp	QWORD PTR __real@fff0000000000000
    	fnstsw	ax
    	test	ah, 65					; 00000041H
    	jne	SHORT $LN9@MainCRTSta
    ; Line 93
    	mov	edx, DWORD PTR _bitmask$[ebp]
    	add	edx, 8388608				; 00800000H
    	mov	DWORD PTR _bitmask$[ebp], edx
    $LN9@MainCRTSta:
    ; Line 95
    	fld	QWORD PTR _infinity$[ebp]
    	fcomp	QWORD PTR __real@0000000000000000
    	fnstsw	ax
    	test	ah, 65					; 00000041H
    	jne	SHORT $LN8@MainCRTSta
    ; Line 96
    	mov	eax, DWORD PTR _bitmask$[ebp]
    	add	eax, 16777216				; 01000000H
    	mov	DWORD PTR _bitmask$[ebp], eax
    $LN8@MainCRTSta:
    ; Line 98
    	fld	QWORD PTR _infinity$[ebp]
    	fchs
    	fcomp	QWORD PTR __real@8000000000000000
    	fnstsw	ax
    	test	ah, 5
    	jp	SHORT $LN7@MainCRTSta
    ; Line 99
    	mov	ecx, DWORD PTR _bitmask$[ebp]
    	add	ecx, 33554432				; 02000000H
    	mov	DWORD PTR _bitmask$[ebp], ecx
    $LN7@MainCRTSta:
    ; Line 101
    	fld	QWORD PTR _zero$[ebp]
    	fld	QWORD PTR __real@fff8000000000000
    	fucompp
    	fnstsw	ax
    	test	ah, 68					; 00000044H
    	jp	SHORT $LN6@MainCRTSta
    ; Line 102
    	mov	edx, DWORD PTR _bitmask$[ebp]
    	add	edx, 67108864				; 04000000H
    	mov	DWORD PTR _bitmask$[ebp], edx
    $LN6@MainCRTSta:
    ; Line 104
    	fld	QWORD PTR _zero$[ebp]
    	fld	QWORD PTR __real@7ff0000000000000
    	fucompp
    	fnstsw	ax
    	test	ah, 68					; 00000044H
    	jp	SHORT $LN5@MainCRTSta
    ; Line 105
    	mov	eax, DWORD PTR _bitmask$[ebp]
    	add	eax, 134217728				; 08000000H
    	mov	DWORD PTR _bitmask$[ebp], eax
    $LN5@MainCRTSta:
    ; Line 107
    	fld	QWORD PTR _zero$[ebp]
    	fld	QWORD PTR __real@fff0000000000000
    	fucompp
    	fnstsw	ax
    	test	ah, 68					; 00000044H
    	jp	SHORT $LN4@MainCRTSta
    ; Line 108
    	mov	ecx, DWORD PTR _bitmask$[ebp]
    	add	ecx, 268435456				; 10000000H
    	mov	DWORD PTR _bitmask$[ebp], ecx
    $LN4@MainCRTSta:
    ; Line 110
    	fld	QWORD PTR _zero$[ebp]
    	fldz
    	fucompp
    	fnstsw	ax
    	test	ah, 68					; 00000044H
    	jp	SHORT $LN3@MainCRTSta
    ; Line 111
    	mov	edx, DWORD PTR _bitmask$[ebp]
    	add	edx, 536870912				; 20000000H
    	mov	DWORD PTR _bitmask$[ebp], edx
    $LN3@MainCRTSta:
    ; Line 113
    	fld	QWORD PTR _zero$[ebp]
    	fldz
    	fucompp
    	fnstsw	ax
    	test	ah, 68					; 00000044H
    	jp	SHORT $LN2@MainCRTSta
    ; Line 114
    	mov	eax, DWORD PTR _bitmask$[ebp]
    	add	eax, 1073741824				; 40000000H
    	mov	DWORD PTR _bitmask$[ebp], eax
    $LN2@MainCRTSta:
    ; Line 116
    	xor	ecx, ecx
    	je	SHORT $LN1@MainCRTSta
    ; Line 117
    	mov	edx, DWORD PTR _bitmask$[ebp]
    	sub	edx, -2147483648			; 80000000H
    	mov	DWORD PTR _bitmask$[ebp], edx
    $LN1@MainCRTSta:
    ; Line 119
    	mov	eax, DWORD PTR _bitmask$[ebp]
    ; Line 120
    	mov	esp, ebp
    	pop	ebp
    	ret	0
    _MainCRTStartup ENDP
    _TEXT	ENDS
    END
  7. Run the 32-bit console program case35.exe created in step 4. and display its exit code:

    .\case35.exe
    ECHO %ERRORLEVEL%
    1674575872
    The 32-bit program built with the same compiler options yields the correct result!
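
The exit code is the bitmask accumulated by the comparisons in the listing above: each comparison that holds adds one distinct power of two. The following is a minimal sketch for reading such a value. The helper `set_bits()` is hypothetical and not part of case35.c. It is written in C89 style so that the Visual C 2010 compiler used here should accept it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of case35.c): stores the indices of all
   set bits of the low 32 bits of `mask` in ascending order and returns
   how many were found. */
static int set_bits(unsigned long mask, int positions[32])
{
    int bit;
    int count = 0;

    assert(positions != NULL);

    for (bit = 0; bit < 32; ++bit)
        if ((mask >> bit) & 1UL)
            positions[count++] = bit;

    return count;
}
```

For 1674575872 (63D00000H) it reports the 7 bits 20, 22, 23, 24, 25, 29 and 30, which correspond to the constants added for source lines 84, 90, 93, 96, 99, 111 and 114 in the listing. Bit 31 can never be set there, since the `xor ecx, ecx` before the `je` for source line 116 makes that branch unconditional.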

Case 36

Braindead use of superfluous temporary variables, a superfluous stack frame and an avoidable conditional branch instruction in the code generated for the inlined memcmp() intrinsic by the Visual C 2010 compiler.

Demonstration

  1. Create the text file case36.c with the following content in an arbitrary, preferably empty directory:

    // Copyleft © 2018-2024, Stefan Kanthak <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>
    
    int main(int argc, char *argv[])
    {
        return memcmp(argv[0], "case36.exe", sizeof("case36.exe"));
    }
  2. Generate the assembly listing file case36.asm from the source file case36.c created in step 1., using the Visual C 2010 compiler for the x86 alias I386 processor architecture:

    CL.EXE /Bv /c /Fa /FoNUL: /Gy /O1isy /Tccase36.c /W4 /Zl
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    Compiler Passes:
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\cl.exe:        Version 16.00.40219.1
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1.dll:        Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1xx.dll:      Version 16.00.40219.400
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c2.dll:        Version 16.00.40219.449
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\link.exe:      Version 10.00.40219.386
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\mspdb100.dll:  Version 10.00.40219.478
     C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\1033\clui.dll: Version 16.00.40219.1
    
    case36.c
  3. Display the assembly listing file case36.asm created in step 2.:

    TYPE case36.asm
    ; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.449 
    
    	TITLE	C:\Users\Stefan\Desktop\case36.c
    	.686P
    	.XMM
    	include listing.inc
    	.model	flat
    
    INCLUDELIB LIBCMT
    INCLUDELIB OLDNAMES
    
    PUBLIC  ??_C@_0L@EJIHOLIK@case36?4exe?$AA@		; `string'
    PUBLIC	_main
    ;	COMDAT	??_C@_0L@EJIHOLIK@case36?4exe?$AA@
    ; File c:\users\stefan\desktop\case36.c
    CONST	SEGMENT
    ??_C@_0L@EJIHOLIK@case36?4exe?$AA@ DB 'case36.exe', 00H	; `string'
    ; Function compile flags: /Odspy
    CONST	ENDS
    ;	COMDAT	_main
    _TEXT	SEGMENT
    tv68 = -8						; size = 4
    tv72 = -4						; size = 4
    _argc$ = 8						; size = 4
    _argv$ = 12						; size = 4
    _main	PROC						; COMDAT
    ; Line 4
    	push	ebp
    	mov	ebp, esp
    	push	ecx
    	push	ecx
    	push	esi
    	push	edi
    ; Line 5
    	push	11					; 0000000bH
    	pop	ecx
    	mov	edi, OFFSET ??_C@_0L@EJIHOLIK@case36?4exe?$AA@
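    	mov	esi, OFFSET ??_C@_0L@EJIHOLIK@case36?4exe?$AA@	; suggested replacement for the preceding MOV, not compiler output
    	mov	eax, [esp+8]				; suggested replacement for the following MOV, not compiler output
    	mov	eax, DWORD PTR _argv$[ebp]
    	mov	esi, DWORD PTR [eax]
    	mov	edi, DWORD PTR [eax]			; suggested replacement for the preceding MOV, not compiler output
    	xor	eax, eax
    	mov	DWORD PTR tv72[ebp], eax
    	repe cmpsb
    	seta	al					; suggested replacement for the following JE and 2 SBB, not compiler output
    	sbb	eax, 0					; suggested replacement, not compiler output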
    	je	SHORT $LN3@main
    	sbb	eax, eax
    	sbb	eax, -1
    	mov	DWORD PTR tv72[ebp], eax
    $LN3@main:
    	mov	eax, DWORD PTR tv72[ebp]
    	mov	DWORD PTR tv68[ebp], eax
    	mov	eax, DWORD PTR tv68[ebp]
    ; Line 6
    	pop	edi
    	pop	esi
    	leave
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    END
    OUCH: 5 (in words: five) superfluous MOV instructions, performing as many superfluous memory accesses on 2 superfluous temporary variables allocated via as many superfluous PUSH instructions, wasting 17 bytes in total, despite the /Os compiler option given!

    OOPS: despite the /Oy compiler option given, a frame pointer is set up in register EBP, using 3 superfluous instructions in 4 bytes!

    OOPS: with destination and source for the CMPSB instruction swapped, the following conditional branch JE plus the 2 SBB instructions can be replaced by a SETA and a SBB instruction, saving 1 instruction and 1 byte.

    Note: when properly optimising for size, the compiler should generate only 14 instructions in just 29 bytes instead of 25 instructions in 50 bytes!
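
The value that the SETA and SBB pair computes from the flags left by REPE CMPSB can be sketched in C as a branch-free three-way comparison. This is an illustration only: `compare_bytes()` is a hypothetical name, it is neither the memcmp() of the C runtime nor the compiler's intrinsic, and normalising the result to -1, 0 or 1 is an assumption taken from the listing (the C standard only requires the sign).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, named for illustration only. */
static int compare_bytes(const void *lhs, const void *rhs, size_t count)
{
    const unsigned char *p = (const unsigned char *) lhs;
    const unsigned char *q = (const unsigned char *) rhs;

    assert(count == 0 || (lhs != NULL && rhs != NULL));

    /* Skip the common prefix, like REPE CMPSB does. */
    while (count > 0 && *p == *q) {
        ++p;
        ++q;
        --count;
    }

    if (count == 0)
        return 0;               /* all compared bytes are equal */

    /* Three-way result without a branch: 1 if the first differing byte
       of lhs is above that of rhs, -1 if it is below. */
    return (*p > *q) - (*p < *q);
}
```

The final expression is the part that matters here: "above" contributes 1 like SETA, "below" subtracts 1 like the borrow consumed by SBB, so no conditional jump is needed to pick between the three results.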

Contact

If you miss anything here, have additions, comments, corrections, criticism or questions, want to give feedback, hints or tips, report broken links, bugs, deficiencies, errors, inaccuracies, misrepresentations, omissions, shortcomings, vulnerabilities or weaknesses, …: don’t hesitate to contact me and feel free to ask, comment, criticise, flame, notify or report!

Use the X.509 certificate to send S/MIME encrypted mail.

Note: email in weird format and without a proper sender name is likely to be discarded!

I dislike HTML (and even weirder formats too) in email, I prefer to receive plain text.
I also expect to see your full (real) name as sender, not your nickname.
I abhor top posts and expect inline quotes in replies.

Terms and Conditions

By using this site, you signify your agreement to these terms and conditions. If you do not agree to these terms and conditions, do not use this site!

Data Protection Declaration

This web page records no (personal) data and stores no cookies in the web browser.

The web service is operated and provided by

Telekom Deutschland GmbH
Business Center
D-64306 Darmstadt
Germany
<‍hosting‍@‍telekom‍.‍de‍>
+49 800 5252033

The web service provider stores a session cookie in the web browser and records every visit of this web site with the following data in an access log on their server(s):


Copyright © 1995–2024 • Stefan Kanthak • <‍stefan‍.‍kanthak‍@‍nexgo‍.‍de‍>