Optimising Microsoft® Visual C Compiler
optimising Visual C compiler, with (currently) 37 cases, plus a side note on the implementation of the
The most hilarious cases are 0, 1 and 5; the bugs are shown in cases 13, 34 and 35, while the worst cases are presented in cases 8 to 11, 15, 20 to 29, 32, 33 and 36.
_alldiv(), _allrem() and _alldvrm() for division of two signed 64-bit integers, returning a signed 64-bit quotient, remainder or both, _aulldiv(), _aullrem() and _aulldvrm() for division of two unsigned 64-bit integers, returning an unsigned 64-bit quotient, remainder or both, plus _allmul() for multiplication of two signed as well as unsigned 64-bit integers, returning the (un)signed product modulo 2^64.
_allshl() for either a signed or an unsigned 64-bit integer, _allshr() for a signed 64-bit integer, plus _aullshr() for an unsigned 64-bit integer.
Note: all helper routines use a non-standard calling or naming convention; none of them can be called from C or C++ code!
Especially the implementation of the 64÷64-bit division routines (albeit written in assembler) is very poor: on current Intel® processors (i.e. those introduced in the last 15 years) they are about 5 to 9 times slower than properly optimised code, and about 7 to 11 times slower than native 128÷64-bit division operations.
Note: according to comments in the source code blcrtasm.asm, their initial version, handling 32-bit integers on 16-bit Intel 8086/8088 processors, was written November 29, 1983; they were modified November 19, 1993, to handle 64-bit integers on 32-bit Intel 80386 processors (introduced October 1985, i.e. 8 years earlier), but without taking advantage of the 32-bit processor’s new capabilities: the slow loop which shifts the operands by just one bit per pass with SHR and RCR instructions was not replaced with a BSR instruction followed by two pairs of SHLD plus SHL and SHRD plus SHR instructions to shift the operands in one go.
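The difference between the two shifting strategies can be sketched in portable C. This is an illustration only, not the code of the helper routines: all function names are hypothetical, and highest_bit() merely stands in for the result a single BSR instruction delivers.

```c
#include <stdint.h>

/* Index of the highest set bit of a non-zero 32-bit value: the result a
   single BSR instruction delivers (portable stand-in, hypothetical name). */
static unsigned highest_bit(uint32_t value)
{
    unsigned index = 0;
    if (value >> 16) { value >>= 16; index += 16; }
    if (value >> 8)  { value >>= 8;  index += 8; }
    if (value >> 4)  { value >>= 4;  index += 4; }
    if (value >> 2)  { value >>= 2;  index += 2; }
    if (value >> 1)  { index += 1; }
    return index;
}

/* Slow variant: shift both operands right by one bit per pass until the
   divisor fits into 32 bits; returns the number of passes. */
static unsigned normalise_bitwise(uint64_t *dividend, uint64_t *divisor)
{
    unsigned passes = 0;
    while (*divisor >> 32) {
        *divisor >>= 1;
        *dividend >>= 1;
        passes++;
    }
    return passes;
}

/* Fast variant: determine the shift count once, then shift in one go. */
static unsigned normalise_at_once(uint64_t *dividend, uint64_t *divisor)
{
    uint32_t high = (uint32_t) (*divisor >> 32);
    unsigned count = 0;
    if (high != 0) {
        count = highest_bit(high) + 1;  /* 1 to 32 */
        *divisor >>= count;
        *dividend >>= count;
    }
    return count;
}
```

Both variants leave the operands in the same state; the first needs up to 32 passes through its loop, the second none.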
Measured on an Intel processor of the Core™2 family running under Windows™ PE, dividing 16 billion pairs of 64-bit pseudo-random numbers produced by 6 different independent (deterministic random bit) generators, _aulldiv() and _aullrem() consume from 114 to 125 processor clock cycles per call; the assembler routines provided with my own NOMSVCRT.LIB consume from 16 to 32 processor clock cycles per call, while the native 64-bit machine instructions consume from 8 to 19 processor clock cycles per operation.
Note: codenamed Penryn, Wolfdale and Yorkfield, Intel introduced these processors from late 2007 to early 2009.
For comparison: the (corresponding) __divdi3(), __moddi3(), __udivdi3() and __umoddi3() assembler routines from the builtins library of LLVM’s compiler-rt runtime libraries, originally written in December 2008 by Apple’s Stephen Canon, adapted for the Visual C compiler and improved by me, consume from 27 to 37 processor clock cycles per call.
Caveat: these “heavily-optimized assembly” § routines are not shipped with current packages of LLVM for Windows; shipped instead are (nearly 5 times bigger and more than 2 times slower) “optimized implementations” § written in C, which are not properly optimised either: instead of taking advantage of the 64-bit shift operations supported by their own Clang compiler, they use a bunch of conditionally executed complementary 32-bit left and right shift operations to handle shift counts below and above the word length individually, then combine their results with logical or operations.
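The shift technique described above can be sketched as follows. This is an illustrative reconstruction from the description, not the compiler-rt source, and both function names are hypothetical.

```c
#include <stdint.h>

/* Logical right shift of a 64-bit value assembled from 32-bit halves:
   shift counts below and above the word length are handled individually
   and the partial results are combined with logical or.
   Valid for shift counts from 0 to 63. */
static uint64_t shift_right_halves(uint64_t value, unsigned count)
{
    uint32_t low  = (uint32_t) value;
    uint32_t high = (uint32_t) (value >> 32);

    if (count == 0)
        return value;
    if (count >= 32) {          /* count at or above the word length */
        low  = high >> (count - 32);
        high = 0;
    } else {                    /* count below the word length */
        low  = (low >> count) | (high << (32 - count));
        high = high >> count;
    }
    return ((uint64_t) high << 32) | low;
}

/* The same operation using the 64-bit shift the compiler supports anyway. */
static uint64_t shift_right_native(uint64_t value, unsigned count)
{
    return value >> count;
}
```

Both functions return the same value for every shift count from 0 to 63; the first just takes several branches and 32-bit operations to get there.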
Note: even this not so optimised C implementation is about 2 to 3 times faster than Microsoft’s assembler implementation!
§ These questionable attributions are made on the compiler-rt runtime libraries web page.
Warning: the _lldiv(), _llrem(), _ulldiv() and _ullrem() assembler routines published by AMD® in their
Software Optimization Guide for AMD Family 15h Processors, Publication No. 47414, Revision 3.06, January 2012,
Software Optimization Guide for AMD Family 10h and 12h Processors, Publication No. 40546, Revision 3.13, February 2011,
Software Optimization Guide for AMD Family 10h and 12h Processors, Publication No. 40546, Revision 3.10, February 2009,
Software Optimization Guide for AMD64 Processors, Publication No. 25112, Revision 3.06, September 2005,
Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™ Processors, Publication No. 25112, Revision 3.04, March 2004,
Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™ Processors, Publication No. 25112, Revision 3.03, September 2003,
AMD Athlon™ Processor x86 Code Optimization Guide, Publication No. 22007, Revision K, February 2002,
have bugs and return wrong results; for example, unsigned division of 18446744073709551615÷4294967299 yields the quotient 4294967294 instead of 4294967293 (in other notation (2^64−1)÷(2^32+3)=2^32−2 instead of 2^32−3, or 0xFFFFFFFFFFFFFFFF÷0x100000003=0xFFFFFFFE instead of 0xFFFFFFFD), and the remainder 18446744069414584325 instead of 8.
This bug shows for multiple (other) dividends too, and also with the divisors 7516192769=0x1C0000001, 15032385539=0x380000003, …!
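The example can be checked with a few lines of C. The sketch below assumes only a compiler with a correct 64-bit division; the function names are hypothetical and the AMD routines themselves are not reproduced here.

```c
#include <stdint.h>

/* Returns 1 if quotient and remainder match the compiler's own 64-bit
   division of dividend by divisor (divisor must not be 0). */
static int division_is_correct(uint64_t dividend, uint64_t divisor,
                               uint64_t quotient, uint64_t remainder)
{
    return quotient == dividend / divisor && remainder == dividend % divisor;
}

/* Returns 1 if dividend == quotient * divisor + remainder holds modulo 2^64;
   this alone does not prove correctness, since the remainder must also be
   smaller than the divisor. */
static int identity_holds(uint64_t dividend, uint64_t divisor,
                          uint64_t quotient, uint64_t remainder)
{
    return quotient * divisor + remainder == dividend;
}
```

For the dividend 0xFFFFFFFFFFFFFFFF and the divisor 0x100000003, the pair 4294967293 and 8 passes both checks. The wrong pair 4294967294 and 18446744069414584325 passes only identity_holds(): the quotient is one too large and the remainder has wrapped around modulo 2^64.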
The table shows the execution times of 64÷64-bit division routines from different libraries on several processors, in average, minimum and maximum processor clock cycles per call, as well as their code sizes in bytes and number of instructions; the upper half for the routines written in assembler, the lower half for the native 64-bit hardware and the routines written in C.
Each cell gives the minimum / average / maximum number of processor clock cycles per function call.
Processor | NOMSVCRT.LIB _aulldiv() | NOMSVCRT.LIB _aullrem() | LLVM Compiler-RT _aulldiv() | LLVM Compiler-RT _aullrem() | Microsoft _aulldiv() | Microsoft _aullrem()
---|---|---|---|---|---|---
AMD Ryzen™5 3600 | 5 / 8 / 12 | 7 / 10 / 12 | 12 / 16 / 19 | 13 / 15 / 18 | 44 / 50 / 59 | 47 / 49 / 53
AMD Ryzen™7 2700X | 9 / 11 / 16 | 11 / 14 / 17 | 16 / 18 / 23 | 16 / 18 / 23 | 53 / 58 / 63 | 56 / 64 / 72
Intel Core i5-8400 | 13 / 16 / 19 | 16 / 20 / 23 | 16 / 24 / 27 | 16 / 24 / 27 | 140 / 146 / 154 | 139 / 148 / 154
Intel Core i5-7400 | 10 / 18 / 37 | 13 / 20 / 38 | 13 / 24 / 45 | 13 / 24 / 45 | 129 / 136 / 141 | 132 / 140 / 148
Intel Core i5-6600 | 13 / 16 / 19 | 16 / 21 / 23 | 16 / 24 / 27 | 16 / 24 / 27 | 133 / 144 / 155 | 135 / 146 / 154
Intel Core i5-4670 | 9 / 16 / 19 | 12 / 19 / 22 | 17 / 22 / 26 | 16 / 24 / 28 | 118 / 128 / 141 | 119 / 129 / 137
Intel Core i5-3550 | 12 / 16 / 20 | 16 / 20 / 23 | 17 / 24 / 29 | 18 / 24 / 29 | 118 / 130 / 149 | 122 / 135 / 143
Intel Core i3-2328M | 21 / 22 | 24 / 26 | 23 / 35 | 22 / 35 | 129 / 158 | 142 / 164
Intel Core™2 Duo P8700 | 16 / 19 / 23 | 18 / 24 / 32 | 28 / 32 / 37 | 27 / 31 / 36 | 114 / 117 / 121 | 117 / 122 / 125
Intel Core™2 Duo E8500 | 17 / 19 / 24 | 20 / 24 / 30 | 29 / 32 / 37 | 28 / 31 / 36 | 114 / 119 / 125 | 119 / 124 / 128
Intel Core™2 Quad Q8400 | 17 / 19 / 24 | 20 / 24 / 30 | 29 / 33 / 38 | 27 / 32 / 37 | 116 / 120 / 127 | 119 / 125 / 132
AMD A4-9125 Radeon™3 | 25 / 30 / 37 | 22 / 29 / 38 | 32 / 37 / 48 | 31 / 38 / 48 | 91 / 94 / 96 | 93 / 97 / 102
AMD Athlon™II X4 635 | 68 / 72 / 75 | 69 / 71 / 73 | 64 / 72 / 78 | 64 / 72 / 78 | 129 / 134 / 139 | 131 / 136 / 138
Intel Pentium®4 | 87 / 112 / 150 | 98 / 124 / 163 | 97 / 129 / 181 | 117 / 143 / 182 | 301 / 332 / 376 | 320 / 351 / 371
Instructions | 36 | 39 | 50 [66] | 51 [71] | 42 | 43
Bytes | 92 | 109 | 125 [157] | 132 [172] | 102 | 115
Note: the values in square brackets in the last two rows denote the number of instructions and bytes of the original __udivdi3() and __umoddi3() assembler routines.
Each cell gives the minimum / average / maximum number of processor clock cycles per function call.
Processor | Native DIV, quotient | Native DIV, remainder | LLVM Compiler-RT __udivdi3() | LLVM Compiler-RT __umoddi3() | Microsoft __udivdi3() | Microsoft __umoddi3()
---|---|---|---|---|---|---
AMD Ryzen™5 3600 | 5 / 7 / 10 | 5 / 7 / 10 | 31 / 46 / 57 | 37 / 53 / 63 | 34 / 56 / 75 | 35 / 57 / 73
AMD Ryzen™7 2700X | 5 / 7 / 10 | 5 / 7 / 10 | 36 / 51 / 61 | 43 / 56 / 67 | 42 / 61 / 76 | 44 / 61 / 74
Intel Core i5-8400 | 17 / 20 / 23 | 17 / 20 / 23 | 34 / 49 / 56 | 44 / 58 / 65 | 43 / 67 / 81 | 47 / 68 / 82
Intel Core i5-7400 | 16 / 18 / 19 | 16 / 18 / 20 | 30 / 45 / 52 | 36 / 51 / 59 | 33 / 58 / 72 | 33 / 58 / 72
Intel Core i5-6600 | 17 / 20 / 23 | 17 / 20 / 23 | 30 / 49 / 56 | 43 / 57 / 65 | 36 / 65 / 81 | 39 / 66 / 82
Intel Core i5-4670 | 18 / 21 / 24 | 18 / 21 / 24 | 36 / 47 / 54 | 43 / 56 / 63 | 32 / 58 / 72 | 35 / 59 / 73
Intel Core i5-3550 | 21 / 25 / 27 | 21 / 25 / 27 | 28 / 44 / 53 | 36 / 52 / 58 | 33 / 60 / 75 | 35 / 61 / 77
Intel Core i3-2328M | 21 / 26 | 23 / 25 | 45 / 65 | 56 / 74 | |
Intel Core™2 Duo P8700 | 8 / 14 / 17 | 9 / 14 / 19 | 49 / 59 / 67 | 62 / 72 / 82 | 60 / 75 / 82 | 61 / 75 / 83
Intel Core™2 Duo E8500 | | | 51 / 60 / 67 | 63 / 72 / 82 | 61 / 76 / 83 | 63 / 76 / 84
Intel Core™2 Quad Q8400 | 8 / 14 / 19 | 8 / 14 / 19 | 50 / 60 / 67 | 63 / 73 / 82 | 62 / 77 / 83 | 62 / 77 / 84
AMD A4-9125 Radeon™3 | 6 / 8 / 10 | 6 / 8 / 11 | 52 / 66 / 77 | 63 / 73 / 88 | 61 / 83 / 99 | 62 / 84 / 102
AMD Athlon™II X4 635 | 74 / 76 / 80 | 74 / 76 / 80 | 51 / 65 / 72 | 60 / 74 / 82 | 56 / 71 / 82 | 57 / 72 / 80
Intel Pentium®4 | | | 99 / 125 / 162 | 100 / 143 / 190 | 76 / 112 / 145 | 78 / 114 / 146
Instructions | 3 | 4 | 8 + 254 | 33 + 254 | 8 + 263 | 12 + 263
Bytes | 7 | 10 | 27 + 732 | 79 + 732 | 27 + 638 | 40 + 638
Note: when optimising the __udivmoddi4() routine for speed, Microsoft’s current Visual C 2017 compiler emits 9 instructions more than LLVM’s Clang compiler, which nevertheless occupy 94 bytes less; the __udivdi3() and __umoddi3() routines call __udivmoddi4() to perform the division, just like the __divdi3() and __moddi3() routines shown in case 23.
The preprocessor macros Int32x32To64() and UInt32x32To64() defined in the header file WINNT.H of the Windows SDK (are supposed to) generate just a single multiply instruction:
Multiplies two signed 32-bit integers, returning a signed 64-bit integer result. The function performs optimally on 32-bit Windows.
[…]
This function is implemented on all platforms by optimal inline code: a single multiply instruction that returns a 64-bit result.
Multiplies two unsigned 32-bit integers, returning an unsigned 64-bit integer result. The function performs optimally on 32-bit Windows.
[…]
This function is implemented on all platforms by optimal inline code: a single multiply instruction that returns a 64-bit result.
Contrary to these statements, the compiler generates a call of the helper routine _allmul() instead of the single multiply instruction!
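What makes such a multiplication a widening one at source level is the placement of the cast. The following minimal sketch uses hypothetical function names and unsigned operands, so that the truncating variant is well defined; it assumes the usual 32-bit int.

```c
#include <stdint.h>

/* Widening multiply: one operand is converted to 64 bits before the
   multiplication, so the full 64-bit product is computed. */
static uint64_t multiply_widening(uint32_t a, uint32_t b)
{
    return (uint64_t) a * b;
}

/* Truncating multiply: the product is computed modulo 2^32 first and only
   then converted to 64 bits; its upper half is lost. */
static uint64_t multiply_truncating(uint32_t a, uint32_t b)
{
    return (uint64_t) (a * b);
}
```

With both arguments equal to 0x10000, the first function returns 0x100000000 while the second returns 0.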
1. Create the text file case0.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2004-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

#define Int32x32To64(a, b) ((long long)(((long long)((long)(a))) * ((long)(b))))
#define UInt32x32To64(a, b) ((unsigned long long)(((unsigned long long)((unsigned int)(a))) * ((unsigned int)(b))))

int main(int argc)
{
    long long x = argc * -argc;
    long long y = Int32x32To64(argc, -argc);
    long long z = UInt32x32To64(argc, -argc);
}
2. Generate the assembly listing file case0.asm from the source file case0.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Tccase0.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
Compiler Passes:
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
case0.c
3. Display the assembly listing file case0.asm created in step 2.:
TYPE case0.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
    TITLE C:\Users\Stefan\Desktop\case0.c
    .686P
    .XMM
    include listing.inc
    .model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC _main
EXTRN __allmul:PROC
; Function compile flags: /Odtp
; COMDAT _main
_TEXT SEGMENT
_z$ = -24 ; size = 8
_y$ = -16 ; size = 8
_x$ = -8 ; size = 8
_argc$ = 8 ; size = 4
_main PROC ; COMDAT
; File c:\users\stefan\desktop\case0.c
; Line 7
    push ebp
    mov ebp, esp
    sub esp, 24 ; 00000018H
    push esi
; Line 8
    mov eax, DWORD PTR _argc$[ebp]
    neg eax
    imul DWORD PTR _argc$[ebp]
    imul eax, DWORD PTR _argc$[ebp]
    cdq
    mov DWORD PTR _x$[ebp], eax
    mov DWORD PTR _x$[ebp+4], edx
; Line 9
    mov eax, DWORD PTR _argc$[ebp]
    cdq
    mov ecx, eax
    mov esi, edx
    mov eax, DWORD PTR _argc$[ebp]
    neg eax
    imul DWORD PTR _argc$[ebp]
    cdq
    push edx
    push eax
    push esi
    push ecx
    call __allmul
    mov DWORD PTR _y$[ebp], eax
    mov DWORD PTR _y$[ebp+4], edx
; Line 10
    mov edx, DWORD PTR _argc$[ebp]
    neg edx
    mov eax, DWORD PTR _argc$[ebp]
    neg eax
    mul DWORD PTR _argc$[ebp]
    mul edx
    mov DWORD PTR _z$[ebp], eax
    mov DWORD PTR _z$[ebp+4], edx
; Line 11
    xor eax, eax
    pop esi
    mov esp, ebp
    pop ebp
    leave
    ret 0
_main ENDP
_TEXT ENDS
END
Notice the difference between unsigned and signed multiplication: while a single multiply instruction is generated for the former, a call of the (almost) undocumented compiler helper routine _allmul() is generated for the latter!
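For a real-life example where such unoptimised code is generated, see the MSDN article Converting a time_t Value to a File Time and the MSKB article 167296!
The preprocessor macros Int32x32To64() and UInt32x32To64() should have been replaced a long time ago by the intrinsic functions __emul() and __emulu() introduced with the Visual C 2005 compiler!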
#if _MSC_VER < 1400
#define Int32x32To64(a, b) ((long long)(((long long)((long)(a))) * ((long)(b))))
#define UInt32x32To64(a, b) ((unsigned long long)(((unsigned long long)((unsigned int)(a))) * ((unsigned int)(b))))
#else
long long __emul(int, int);
unsigned long long __emulu(unsigned int, unsigned int);
#pragma intrinsic(__emul, __emulu)
#define Int32x32To64 __emul
#define UInt32x32To64 __emulu
#endif
Note: Visual C 2017 and earlier fail to provide the inverse intrinsic functions for 64÷32-bit integer division; Visual C 2019 finally introduced them as _div64() and _udiv64().
Of course this also applies to the preprocessor macros (really: inline assembler functions) Int64ShllMod32(), Int64ShraMod32() and Int64ShrlMod32() defined in the header file WINNT.H of the Windows SDK; these too should have been replaced a long time ago by the intrinsic functions __ll_lshift(), __ll_rshift() and __ull_rshift() introduced with the Visual C 2005 compiler!
#if _MSC_VER < 1400
…
#else
unsigned long long __ll_lshift(unsigned long long, int);
long long __ll_rshift(long long, int);
unsigned long long __ull_rshift(unsigned long long, int);
#pragma intrinsic(__ll_lshift, __ll_rshift, __ull_rshift)
#define Int64ShllMod32 __ll_lshift
#define Int64ShraMod32 __ll_rshift
#define Int64ShrlMod32 __ull_rshift
#endif
The sample code for converting from seconds since January 1, 1970, to 100-nanosecond intervals since January 1, 1601, should be (re)written without preprocessor macros and use the intrinsic function __emulu() instead:
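#include <windows.h>
#include <intrin.h>

VOID EpochToFileTime(ULONG seconds, LPFILETIME pft)
{
    // 32×32-bit multiplication yielding a 64-bit product, plus the offset
    // between January 1, 1601, and January 1, 1970, in 100 ns units
    ULONGLONG ull = __emulu(seconds, 10000000UL) + 116444736000000000ULL;

    pft->dwLowDateTime = (DWORD) ull;
    pft->dwHighDateTime = (DWORD) (ull >> 32);
}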
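For readers without the Windows SDK at hand, the same conversion can be sketched in portable C. The structure file_time and the function epoch_to_file_time() are hypothetical stand-ins for FILETIME and the routine above; the widening multiplication is expressed with a cast instead of the intrinsic function.

```c
#include <stdint.h>

/* Hypothetical stand-in for the Windows FILETIME structure. */
typedef struct {
    uint32_t low;
    uint32_t high;
} file_time;

static file_time epoch_to_file_time(uint32_t seconds)
{
    /* 32x32-bit multiply yielding a 64-bit product, then add the offset
       between January 1, 1601, and January 1, 1970, in 100 ns units. */
    uint64_t ticks = (uint64_t) seconds * 10000000U + 116444736000000000ULL;
    file_time result;

    result.low  = (uint32_t) ticks;
    result.high = (uint32_t) (ticks >> 32);
    return result;
}
```

For 0 seconds this yields the halves 0xD53E8000 (low) and 0x019DB1DE (high), i.e. the offset constant itself.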
1. Create the text file case1.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

__inline
long long __fastcall Int32x32To64(long v, long w)
{
    return (long long) v * w;
}

long __fastcall Int32x32To64Div32(long x, long y, long z)
{
    return Int32x32To64(x, y) / z;
}

long __fastcall Int32x32To64Rem32(long x, long y, long z)
{
    return Int32x32To64(x, y) % z;
}

__inline
unsigned long long __fastcall UInt32x32To64(unsigned long v, unsigned long w)
{
    return (unsigned long long) v * w;
}

unsigned long __fastcall UInt32x32To64Div32(unsigned long x, unsigned long y, unsigned long z)
{
    return UInt32x32To64(x, y) / z;
}

unsigned long __fastcall UInt32x32To64Rem32(unsigned long x, unsigned long y, unsigned long z)
{
    return UInt32x32To64(x, y) % z;
}
2. Generate the assembly listing file case1.asm from the source file case1.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase1.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
Compiler Passes:
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
case1.c
case1.c(11): warning C4244: 'return' : conversion from '__int64' to 'long', possible loss of data
case1.c(27): warning C4244: 'return' : conversion from 'unsigned __int64' to 'unsigned long', possible loss of data
3. Display the assembly listing file case1.asm created in step 2.:
TYPE case1.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
    TITLE C:\Users\Stefan\Desktop\case1.c
    .686P
    .XMM
    include listing.inc
    .model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC @Int32x32To64@8
PUBLIC @Int32x32To64Div32@12
PUBLIC @Int32x32To64Rem32@12
PUBLIC @UInt32x32To64@8
PUBLIC @UInt32x32To64Div32@12
PUBLIC @UInt32x32To64Rem32@12
EXTRN __alldiv:PROC
EXTRN __allrem:PROC
EXTRN __aulldiv:PROC
EXTRN __aullrem:PROC
; Function compile flags: /Ogtpy
; COMDAT @Int32x32To64@8
_TEXT SEGMENT
@Int32x32To64@8 PROC ; COMDAT
; _v$ = ecx
; _w$ = edx
; File c:\users\stefan\desktop\case1.c
; Line 6
    mov eax, ecx
    imul edx
; Line 7
    ret 0
@Int32x32To64@8 ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT @Int32x32To64Div32@12
_TEXT SEGMENT
_z$ = 8 ; size = 4
@Int32x32To64Div32@12 PROC ; COMDAT
_x$ = ecx
_y$ = edx
; File c:\users\stefan\desktop\case1.c
; Line 6
    mov eax, ecx
    imul edx
; Line 10
    push esi
; Line 6
    mov esi, eax
    mov ecx, edx
; Line 11
    idiv DWORD PTR _z$[esp-4]
    mov eax, DWORD PTR _z$[esp]
    cdq
    push edx
    push eax
    push ecx
    push esi
    call __alldiv
    pop esi
; Line 12
    ret 4
@Int32x32To64Div32@12 ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT @Int32x32To64Rem32@12
_TEXT SEGMENT
_z$ = 8 ; size = 4
@Int32x32To64Rem32@12 PROC ; COMDAT
_x$ = ecx
_y$ = edx
; File c:\users\stefan\desktop\case1.c
; Line 6
    mov eax, ecx
    imul edx
; Line 15
    push esi
; Line 6
    mov esi, eax
    mov ecx, edx
; Line 16
    idiv DWORD PTR _z$[esp-4]
    mov eax, edx
    mov eax, DWORD PTR _z$[esp]
    cdq
    push edx
    push eax
    push ecx
    push esi
    call __allrem
    pop esi
; Line 17
    ret 0
@Int32x32To64Rem32@12 ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT @UInt32x32To64@8
_TEXT SEGMENT
@UInt32x32To64@8 PROC ; COMDAT
; _v$ = ecx
; _w$ = edx
; File c:\users\stefan\desktop\case1.c
; Line 22
    mov eax, ecx
    mul edx
; Line 23
    ret 0
@UInt32x32To64@8 ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT @UInt32x32To64Div32@12
_TEXT SEGMENT
_z$ = 8 ; size = 4
@UInt32x32To64Div32@12 PROC ; COMDAT
_x$ = ecx
_y$ = edx
; File c:\users\stefan\desktop\case1.c
; Line 22
    mov eax, ecx
    mul edx
; Line 27
    div DWORD PTR _z$[esp-4]
    push 0
    push DWORD PTR _z$[esp]
    push edx
    push eax
    call __aulldiv
; Line 28
    ret 0
@UInt32x32To64Div32@12 ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT @UInt32x32To64Rem32@12
_TEXT SEGMENT
_z$ = 8 ; size = 4
@UInt32x32To64Rem32@12 PROC ; COMDAT
_x$ = ecx
_y$ = edx
; File c:\users\stefan\desktop\case1.c
; Line 22
    mov eax, ecx
    mul edx
; Line 32
    div DWORD PTR _z$[esp-4]
    mov eax, edx
    push 0
    push DWORD PTR _z$[esp]
    push edx
    push eax
    call __aullrem
; Line 33
    ret 0
@UInt32x32To64Rem32@12 ENDP
_TEXT ENDS
END
While the compiler here (contrary to case 0) generates the proper code for the multiplications, it fails to generate the corresponding proper code for the immediately following divisions.
Also notice the difference between the signed and unsigned variants of the combined multiplication and division routines: instead of pushing the (properly sign-extended) divisor first and the product afterwards, the product is computed first, then moved into two (intermediate) registers which are finally pushed for the calls of the _alldiv(), _allrem(), _aulldiv() and _aullrem() compiler helper routines.
1. Create the text file case2.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

extern long long foo(void);
extern long long bar(void);

long long product(void)
{
    return foo() * bar();
}
2. Generate the assembly listing file case2.asm from the source file case2.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase2.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
Compiler Passes:
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
case2.c
3. Display the assembly listing file case2.asm created in step 2.:
TYPE case2.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
    TITLE C:\Users\Stefan\Desktop\case2.c
    .686P
    .XMM
    include listing.inc
    .model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC _product
EXTRN _foo:PROC
EXTRN _bar:PROC
EXTRN __allmul:PROC
; Function compile flags: /Ogtpy
; COMDAT _product
_TEXT SEGMENT
_product PROC ; COMDAT
; File c:\users\stefan\desktop\case2.c
; Line 7
    push esi
    push edi
; Line 8
    call _foo
    mov edi, eax
    mov esi, edx
    push edx
    push eax
    call _bar
    push edx
    push eax
    push esi
    push edi
    call __allmul
    pop edi
    pop esi
; Line 9
    ret 0
_product ENDP
_TEXT ENDS
END
Multiplication is commutative, so the arguments for the external routine _allmul() can be swapped, saving 6 of the 13 instructions generated, and without clobbering the registers EDI and ESI for intermediate storage.
1. Create the text file case3.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

long long __stdcall div(long long foo, long long bar)
{
    return foo / bar;
}

long long __stdcall mod(long long foo, long long bar)
{
    return foo % bar;
}

long long __stdcall mul(long long foo, long long bar)
{
    return foo * bar;
}
2. Generate the assembly listing file case3.asm from the source file case3.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase3.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
Compiler Passes:
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
case3.c
3. Display the assembly listing file case3.asm created in step 2.:
TYPE case3.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
    TITLE C:\Users\Stefan\Desktop\case3.c
    .686P
    .XMM
    include listing.inc
    .model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC _div@16
PUBLIC _mod@16
PUBLIC _mul@16
EXTRN __alldiv:PROC
EXTRN __allmul:PROC
EXTRN __allrem:PROC
; Function compile flags: /Ogtpy
; COMDAT _div@16
_TEXT SEGMENT
_foo$ = 8 ; size = 8
_bar$ = 16 ; size = 8
_div@16 PROC ; COMDAT
; File c:\users\stefan\desktop\case3.c
; Line 5
    push DWORD PTR _bar$[esp]
    push DWORD PTR _bar$[esp]
    push DWORD PTR _foo$[esp+8]
    push DWORD PTR _foo$[esp+8]
    call __alldiv
    jmp __alldiv
; Line 6
    ret 16 ; 00000010H
_div@16 ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT _mod@16
_TEXT SEGMENT
_foo$ = 8 ; size = 8
_bar$ = 16 ; size = 8
_mod@16 PROC ; COMDAT
; File c:\users\stefan\desktop\case3.c
; Line 10
    push DWORD PTR _bar$[esp]
    push DWORD PTR _bar$[esp]
    push DWORD PTR _foo$[esp+8]
    push DWORD PTR _foo$[esp+8]
    call __allrem
    jmp __allrem
; Line 11
    ret 16 ; 00000010H
_mod@16 ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT _mul@16
_TEXT SEGMENT
_foo$ = 8 ; size = 8
_bar$ = 16 ; size = 8
_mul@16 PROC ; COMDAT
; File c:\users\stefan\desktop\case3.c
; Line 15
    push DWORD PTR _bar$[esp]
    push DWORD PTR _bar$[esp]
    push DWORD PTR _foo$[esp+8]
    push DWORD PTR _foo$[esp+8]
    call __allmul
    jmp __allmul
; Line 16
    ret 16 ; 00000010H
_mul@16 ENDP
_TEXT ENDS
END
The compiler generates separate calls of _aulldiv() and _aullrem() instead of a single call of _aulldvrm().
1. Create the text file case4.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

unsigned long long __udivmoddi4(unsigned long long Dividend,
                                unsigned long long Divisor,
                                unsigned long long *Remainder)
{
#ifdef ALTERNATE
    const unsigned long long Quotient = Dividend / Divisor;
    const unsigned long long Modulus = Dividend % Divisor;

    if (Remainder != 0)
        *Remainder = Modulus;

    return Quotient;
#else
    if (Remainder != 0)
        *Remainder = Dividend % Divisor;

    return Dividend / Divisor;
#endif
}
2. Generate the assembly listing file case4.asm from the source file case4.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase4.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
Compiler Passes:
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
case4.c
3. Display the assembly listing file case4.asm created in step 2.:
TYPE case4.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
    TITLE C:\Users\Stefan\Desktop\case4.c
    .686P
    .XMM
    include listing.inc
    .model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC ___udivmoddi4
EXTRN __aulldiv:PROC
EXTRN __aullrem:PROC
; Function compile flags: /Ogtpy
; COMDAT ___udivmoddi4
_TEXT SEGMENT
_Dividend$ = 8 ; size = 8
_Divisor$ = 16 ; size = 8
_Remainder$ = 24 ; size = 4
___udivmoddi4 PROC ; COMDAT
; File c:\users\stefan\desktop\case4.c
; Line 16
    mov eax, DWORD PTR _Dividend$[esp-4]
    push esi
    mov esi, DWORD PTR _Remainder$[esp]
    test esi, esi
    je SHORT $LN2@ull
; Line 17
    push DWORD PTR _Divisor$[esp+4]
    push DWORD PTR _Divisor$[esp+4]
    push DWORD PTR _Dividend$[esp+12]
    push eax
    call __aullrem
    mov DWORD PTR [esi], eax
    mov eax, DWORD PTR _Dividend$[esp]
    mov DWORD PTR [esi+4], edx
$LN2@ull:
; Line 19
    push DWORD PTR _Divisor$[esp+4]
    push DWORD PTR _Divisor$[esp+4]
    push DWORD PTR _Dividend$[esp+12]
    push eax
    call __aulldiv
    pop esi
; Line 21
    ret 0
___udivmoddi4 ENDP
_TEXT ENDS
END
4. Generate another assembly listing file case4.asm from the source file case4.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro ALTERNATE defined on the command line:
CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tccase4.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
Compiler Passes:
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0
 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
case4.c
5. Display the assembly listing file case4.asm created in step 4.:
TYPE case4.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
    TITLE C:\Users\Stefan\Desktop\case4.c
    .686P
    .XMM
    include listing.inc
    .model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC ___udivmoddi4
EXTRN __aulldvrm:PROC
; Function compile flags: /Ogtpy
; COMDAT ___udivmoddi4
_TEXT SEGMENT
_Dividend$ = 8 ; size = 8
_Divisor$ = 16 ; size = 8
_Remainder$ = 24 ; size = 4
___udivmoddi4 PROC ; COMDAT
; File c:\users\stefan\desktop\case4.c
; Line 6
    push ebx
    push esi
; Line 8
    push DWORD PTR _Divisor$[esp+8]
    push DWORD PTR _Divisor$[esp+8]
    push DWORD PTR _Dividend$[esp+16]
    push DWORD PTR _Dividend$[esp+16]
    call __aulldvrm
; Line 11
    mov esi, DWORD PTR _Remainder$[esp+4]
    test esi, esi
    je SHORT $LN2@ull
; Line 12
    mov DWORD PTR [esi], ecx
    mov DWORD PTR [esi+4], ebx
$LN2@ull:
; Line 21
    pop esi
    pop ebx
    ret 0
___udivmoddi4 ENDP
_TEXT ENDS
END
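Independent of any helper routine, quotient and remainder can always be obtained from a single division at source level. The following sketch uses a hypothetical function name and makes no claim about the code a particular compiler generates for it.

```c
#include <stdint.h>

/* Quotient and remainder from a single division: the remainder is derived
   from the quotient with one multiplication and one subtraction.
   The divisor must not be 0; remainder may be a null pointer. */
static uint64_t divide_with_remainder(uint64_t dividend, uint64_t divisor,
                                      uint64_t *remainder)
{
    const uint64_t quotient = dividend / divisor;

    if (remainder != 0)
        *remainder = dividend - quotient * divisor;

    return quotient;
}
```

Since the quotient never exceeds dividend ÷ divisor, the product quotient × divisor cannot overflow, so the subtraction always yields the true remainder.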
Create the text file case5.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
#ifndef SIGNED
unsigned dividebypowerof2(unsigned number, unsigned exponent)
{
return number / (1U << exponent);
}
unsigned modulopowerof2(unsigned number, unsigned exponent)
{
return number % (1U << exponent);
}
unsigned quotient(unsigned argument)
{
return dividebypowerof2(argument, 9);
}
unsigned remainder(unsigned argument)
{
return modulopowerof2(argument, 9);
}
#else
signed dividebypowerof2(signed number, unsigned exponent)
{
return number / (1 << exponent);
}
signed modulopowerof2(signed number, unsigned exponent)
{
return number % (1 << exponent);
}
signed quotient(signed argument)
{
return dividebypowerof2(argument, 9);
}
signed remainder(signed argument)
{
return modulopowerof2(argument, 9);
}
#endif
Generate the assembly listing file case5.asm from the source file case5.c created in step 1, using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase5.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case5.c
Display the assembly listing file case5.asm created in step 2:
TYPE case5.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case5.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _dividebypowerof2 PUBLIC _modulopowerof2 PUBLIC _quotient PUBLIC _remainder ; Function compile flags: /Ogtpy ; COMDAT _dividebypowerof2 _TEXT SEGMENT _number$ = 8 ; size = 4 _exponent$ = 12 ; size = 4 _dividebypowerof2 PROC ; COMDAT ; File c:\users\stefan\desktop\case5.c ; Line 6 mov ecx, DWORD PTR _exponent$[esp-4]xor edx, edxmov eax, DWORD PTR _number$[esp-4]push esi mov esi, 1 shl esi, cl div esi pop esishr eax, cl ; Line 7 ret 0 _dividebypowerof2 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _quotient _TEXT SEGMENT _argument$ = 8 ; size = 4 _quotient PROC ; COMDAT ; File c:\users\stefan\desktop\case5.c ; Line 6 mov eax, DWORD PTR _argument$[esp-4] shr eax, 9 ; Line 17 ret 0 _quotient ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _modulopowerof2 _TEXT SEGMENT _number$ = 8 ; size = 4 _exponent$ = 12 ; size = 4 _modulopowerof2 PROC ; COMDAT ; File c:\users\stefan\desktop\case5.c ; Line 11 mov ecx, DWORD PTR _exponent$[esp-4]xor edx, edxor eax, -1 shl eax, clmov eax, DWORD PTR _number$[esp-4]and eax, DWORD PTR _number$[esp-4]push esi mov esi, 1 shl esi, cl div esi pop esi mov eax, edx; Line 12 ret 0 _modulopowerof2 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _remainder _TEXT SEGMENT _argument$ = 8 ; size = 4 _remainder PROC ; COMDAT ; File c:\users\stefan\desktop\case5.c ; Line 11 mov eax, DWORD PTR _argument$[esp-4] and eax, 511 ; 000001ffH ; Line 22 ret 0 _remainder ENDP _TEXT ENDS END
Generate another assembly listing file case5.asm from the source file case5.c created in step 1, using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro SIGNED defined on the command line:
CL.EXE /Bv /c /DSIGNED /Fa /FoNUL: /Gy /Ox /Tccase5.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case5.c
Display the assembly listing file case5.asm created in step 4:
TYPE case5.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case5.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _dividebypowerof2 PUBLIC _modulopowerof2 PUBLIC _quotient PUBLIC _remainder ; Function compile flags: /Ogtpy ; COMDAT _dividebypowerof2 _TEXT SEGMENT _number$ = 8 ; size = 4 _exponent$ = 12 ; size = 4 _dividebypowerof2 PROC ; COMDAT ; File c:\users\stefan\desktop\case5.c ; Line 26 mov eax, DWORD PTR _number$[esp-4] mov ecx, DWORD PTR _exponent$[esp-4]push esi mov esi, 1cdqshl esi, cl idiv esi pop esinot ecx shr edx, 1 shr edx, cl not ecx add eax, edx sar eax, cl ; Line 27 ret 0 _dividebypowerof2 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _quotient _TEXT SEGMENT _argument$ = 8 ; size = 4 _quotient PROC ; COMDAT ; File c:\users\stefan\desktop\case5.c ; Line 26 mov eax, DWORD PTR _argument$[esp-4] cdqand edx, 511 ; 000001ffHshr edx, 23 ; 00000017H add eax, edx sar eax, 9 ; Line 37 ret 0 _quotient ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _modulopowerof2 _TEXT SEGMENT _number$ = 8 ; size = 4 _exponent$ = 12 ; size = 4 _modulopowerof2 PROC ; COMDAT ; File c:\users\stefan\desktop\case5.c ; Line 31 mov eax, DWORD PTR _number$[esp-4] mov ecx, DWORD PTR _exponent$[esp-4] push ebxpush esi mov esi, 1cdqshl esi, cl idiv esi pop esi mov eax, edxxor ebx, ebx shld ebx, edx, cl or edx, -1 add ebx, eax shl edx, cl and edx, ebx sub eax, edx pop ebx ; Line 32 ret 0 _modulopowerof2 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _remainder _TEXT SEGMENT _argument$ = 8 ; size = 4 _remainder PROC ; COMDAT ; File c:\users\stefan\desktop\case5.c ; Line 31 mov eax, DWORD PTR _argument$[esp-4]and eax, -2147483137 ; 800001ffH jns SHORT $LN5@remainder dec eax or eax, -512 ; fffffe00H inc eax $LN5@remainder:cdq shr edx, 23 ; 00000017H add edx, eax and edx, -512 ; fffffe00H sub eax, edx ; Line 42 ret 0 _remainder ENDP _TEXT ENDS END
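The identities behind the shorter instruction sequences can be checked in C. The following is a minimal sketch, not the author's code: the function names are hypothetical, fixed-width types from <stdint.h> stand in for unsigned and signed, the exponent is assumed to be less than 32 (less than 31 for the signed variants), and the right shift of a negative value is assumed to be an arithmetic shift like SAR, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

// Unsigned: dividing by 2**exponent is a logical right shift,
// the remainder is the low 'exponent' bits of the number.
uint32_t udiv_pow2(uint32_t number, unsigned exponent)
{
    assert(exponent < 32);
    return number >> exponent;
}

uint32_t umod_pow2(uint32_t number, unsigned exponent)
{
    assert(exponent < 32);
    return number & ((UINT32_C(1) << exponent) - 1);
}

// Signed: C division truncates towards zero, while an arithmetic right
// shift rounds towards minus infinity, so negative numbers get a bias
// of 2**exponent - 1 added before the shift.
int32_t sdiv_pow2(int32_t number, unsigned exponent)
{
    assert(exponent < 31);
    int32_t bias = (int32_t) ((UINT32_C(1) << exponent) - 1);
    if (number < 0)
        number += bias;         // cannot overflow: number < 0 <= bias
    return number >> exponent;  // assumption: arithmetic shift
}

// Signed remainder: number minus the truncated quotient times 2**exponent.
int32_t smod_pow2(int32_t number, unsigned exponent)
{
    assert(exponent < 31);
    return number - sdiv_pow2(number, exponent) * (int32_t) (UINT32_C(1) << exponent);
}
```

For example, sdiv_pow2(-1000, 9) yields -1 and smod_pow2(-1000, 9) yields -488, matching -1000 / 512 and -1000 % 512.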
Create the text file case6.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
long long foo(long long foo)
{
foo <<= 1;
foo += 1;
foo |= 1;
return foo;
}
long long bar(long long bar)
{
bar += bar;
bar += 1;
bar |= 1;
return bar;
}
Generate the assembly listing file case6.asm from the source file case6.c created in step 1, using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase6.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case6.c
Display the assembly listing file case6.asm created in step 2:
TYPE case6.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case6.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _foo PUBLIC _bar ; Function compile flags: /Ogtpy ; COMDAT _foo _TEXT SEGMENT _foo$ = 8 ; size = 8 _foo PROC ; COMDAT ; File c:\users\stefan\desktop\case6.c ; Line 5 mov eax, DWORD PTR _foo$[esp-4] mov edx, DWORD PTR _foo$[esp] shld edx, eax, 1 add eax, eax adc edx, edx ; Line 6 add eax, 1 adc edx, 0 inc eax ; Line 10 ret 0 _foo ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _bar _TEXT SEGMENT _bar$ = 8 ; size = 8 _bar PROC ; COMDAT ; File c:\users\stefan\desktop\case6.c ; Line 14 mov eax, DWORD PTR _bar$[esp-4] mov edx, DWORD PTR _bar$[esp] shld edx, eax, 1 add eax, eax adc edx, edx ; Line 15 add eax, 1 adc edx, 0 inc eax ; Line 19 ret 0 _bar ENDP _TEXT ENDS END
While the optimiser recognises that the addition of 1 yields an odd number and therefore generates no code for the logical or, it fails to recognise that both the shift of foo and the addition of bar to itself yield an even number, so the following addition of 1 can’t produce a carry, and an addition with carry ADC instruction is nonsense!
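The carry argument can be made concrete with a small C sketch. It is an illustration under assumptions, not the code the compiler should emit: the function name is hypothetical, the 64-bit value is passed as two unsigned 32-bit halves (like the EDX:EAX register pair), and unsigned types are used so that the shift is well defined.

```c
#include <assert.h>
#include <stdint.h>

// Doubles a 64-bit value held as two 32-bit halves and then adds 1,
// like foo() and bar(), but without any carry into the high half.
uint64_t double_plus_one(uint32_t low, uint32_t high)
{
    uint32_t carry    = low >> 31;             // bit moved from low to high half
    uint32_t new_high = (high << 1) | carry;
    uint32_t new_low  = low << 1;

    assert((new_low & 1) == 0);                // doubling always clears bit 0 ...
    new_low |= 1;                              // ... so adding 1 only sets bit 0

    return ((uint64_t) new_high << 32) | new_low;
}
```

Since bit 0 of the doubled low half is always clear, adding 1 and setting bit 0 are the same operation, and the high half is never touched by it.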
Create the text file case7.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
unsigned long long add(unsigned long low, unsigned long high)
{
return low + ((unsigned long long) high << 32);
}
unsigned long long or(unsigned long low, unsigned long high)
{
return low | ((unsigned long long) high << 32);
}
unsigned long long alias(unsigned long low, unsigned long high)
{
union
{
unsigned long ul[2];
unsigned long long ull;
} dummy = {low, high};
return dummy.ull;
}
Generate the assembly listing file case7.asm from the source file case7.c created in step 1, using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase7.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case7.c
Display the assembly listing file case7.asm created in step 2:
TYPE case7.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case7.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _add PUBLIC _or PUBLIC _alias ; Function compile flags: /Ogtpy ; COMDAT _add _TEXT SEGMENT _low$ = 8 ; size = 4 _high$ = 12 ; size = 4 _add PROC ; COMDAT ; File c:\users\stefan\desktop\case7.c ; Line 5 mov edx, DWORD PTR _high$[esp-4] xor eax, eax add eax, DWORD PTR _low$[esp-4] adc edx, 0 add eax, DWORD PTR _low$[esp-4] ; Line 6 ret 0 _add ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _or _TEXT SEGMENT _low$ = 8 ; size = 4 _high$ = 12 ; size = 4 _or PROC ; COMDAT ; File c:\users\stefan\desktop\case7.c ; Line 10 mov edx, DWORD PTR _high$[esp-4] xor eax, eax or eax, DWORD PTR _low$[esp-4] add eax, DWORD PTR _low$[esp-4] ; Line 11 ret 0 _or ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _alias _TEXT SEGMENT _low$ = 8 ; size = 4 _high$ = 12 ; size = 4 _alias PROC ; COMDAT ; File c:\users\stefan\desktop\case7.c ; Line 21 mov eax, DWORD PTR _low$[esp-4] mov edx, DWORD PTR _high$[esp-4] ; Line 22 ret 0 _alias ENDP _TEXT ENDS END
The optimiser does not recognise the expressions commonly used for combining two doublewords into a single quadword.
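That the three forms compute the same quadword can be checked with the following C sketch. The names combine_add, combine_or, combine_alias and check_combine are hypothetical; uint32_t and uint64_t replace unsigned long and unsigned long long so that the widths hold on any platform, and the union variant assumes little-endian storage, as on x86 and x64.

```c
#include <assert.h>
#include <stdint.h>

uint64_t combine_add(uint32_t low, uint32_t high)
{
    return low + ((uint64_t) high << 32);
}

uint64_t combine_or(uint32_t low, uint32_t high)
{
    return low | ((uint64_t) high << 32);
}

// Assumption: little-endian storage, so ul[0] is the low doubleword.
uint64_t combine_alias(uint32_t low, uint32_t high)
{
    union
    {
        uint32_t ul[2];
        uint64_t ull;
    } dummy = {{low, high}};

    return dummy.ull;
}

// Aborts if the three forms disagree for the given halves.
void check_combine(uint32_t low, uint32_t high)
{
    uint64_t expected = combine_alias(low, high);

    assert(combine_add(low, high) == expected);
    assert(combine_or(low, high) == expected);
}
```

Because the shifted high doubleword has its low 32 bits clear, the addition can never carry and is equivalent to the logical or; both merely place the two halves side by side.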
Create the text file case8.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
unsigned long rotate32(unsigned long value, unsigned int count)
{
return (value << count) | (value >> (32 - count));
}
unsigned long rotate32x(unsigned long value, unsigned int count)
{
return (value << count) ^ (value >> (32 - count));
}
unsigned long long rotate64x(unsigned long long value, unsigned int count)
{
return (value << count) ^ (value >> (64 - count));
}
unsigned long long rotate64(unsigned long long value, unsigned int count)
{
return (value << count) | (value >> (64 - count));
}
unsigned long long intrinsic(unsigned long long value, unsigned int count)
{
return _rotl64(value, count);
}
Generate the assembly listing file case8.asm from the source file case8.c created in step 1, using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase8.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case8.c
Display the assembly listing file case8.asm created in step 2:
TYPE case8.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case8.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _rotate32 PUBLIC _rotate32x PUBLIC _rotate64x PUBLIC _rotate64 PUBLIC _intrinsic EXTRN __allshl:PROC EXTRN __aullshr:PROC ; Function compile flags: /Ogtpy ; COMDAT _rotate32 _TEXT SEGMENT _value$ = 8 ; size = 4 _count$ = 12 ; size = 4 _rotate32 PROC ; COMDAT ; File c:\users\stefan\desktop\case8.c ; Line 5 mov eax, DWORD PTR _value$[esp-4] mov ecx, DWORD PTR _count$[esp-4] rol eax, cl ; Line 6 ret 0 _rotate32 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _rotate32x _TEXT SEGMENT _value$ = 8 ; size = 4 _count$ = 12 ; size = 4 _rotate32x PROC ; COMDAT ; File c:\users\stefan\desktop\case8.c ; Line 9 push esi ; Line 10 mov esi, DWORD PTR _value$[esp] mov ecx, 32 ; 00000020H sub ecx, DWORD PTR _count$[esp] mov eax, esi shr eax, cl mov ecx, DWORD PTR _count$[esp] shl esi, cl xor eax, esi pop esi mov eax, DWORD PTR _value$[esp-4] mov ecx, DWORD PTR _count$[esp-4] rol eax, cl ; Line 11 ret 0 _rotate32x ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _rotate64x _TEXT SEGMENT _value$ = 8 ; size = 8 _count$ = 16 ; size = 4 _rotate64x PROC ; COMDAT ; File c:\users\stefan\desktop\case8.c ; Line 15 mov eax, DWORD PTR _value$[esp-4] mov ecx, 64 ; 00000040H sub ecx, DWORD PTR _count$[esp-4] mov edx, DWORD PTR _value$[esp] push ebx push ebp call __aullshr mov ecx, DWORD PTR _count$[esp+4] mov ebx, eax mov eax, DWORD PTR _value$[esp+4] mov ebp, edx mov edx, DWORD PTR _value$[esp+8] call __allshl xor edx, ebp xor eax, ebx pop ebp mov ecx, DWORD PTR _count$[esp-4] mov edx, DWORD PTR _value$[esp] mov eax, DWORD PTR _value$[esp-4] test cl, 32 ; 00000020H jz SHORT @F xchg eax, edx @@: test cl, 31 ; 0000001fH jz SHORT @F push ebx mov ebx, edx shld edx, eax, cl shld eax, ebx, cl pop ebx @@: ; Line 16 ret 0 _rotate64x ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _rotate64 _TEXT SEGMENT _value$ = 8 ; size = 8 _count$ = 16 ; size = 4 _rotate64 PROC ; COMDAT ; File c:\users\stefan\desktop\case8.c ; Line 20 mov eax, DWORD PTR _value$[esp-4] mov ecx, 64 ; 00000040H sub ecx, DWORD PTR _count$[esp-4] mov edx, DWORD PTR _value$[esp] push ebx push ebp call __aullshr mov ecx, DWORD PTR _count$[esp+4] mov ebx, eax mov eax, DWORD PTR _value$[esp+4] mov ebp, edx mov edx, DWORD PTR _value$[esp+8] call __allshl or edx, ebp or eax, ebx pop ebp mov ecx, DWORD PTR _count$[esp-4] mov edx, DWORD PTR _value$[esp] mov eax, DWORD PTR _value$[esp-4] test cl, 32 ; 00000020H jz SHORT @F xchg eax, edx @@: test cl, 31 ; 0000001fH jz SHORT @F push ebx mov ebx, edx shld edx, eax, cl shld eax, ebx, cl pop ebx @@: ; Line 21 ret 0 _rotate64 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _intrinsic _TEXT SEGMENT _value$ = 8 ; size = 8 _count$ = 16 ; size = 4 _intrinsic PROC ; COMDAT ; File c:\users\stefan\desktop\case8.c ; Line 25 mov cl, BYTE PTR _count$[esp-4] mov edx, DWORD PTR _value$[esp] push esi mov esi, DWORD PTR _value$[esp] mov eax, DWORD PTR _value$[esp] mov esi, edx test cl, 32 ; 00000020H cmovnz edx, eax cmovnz eax, esi cmovnz esi, edx je SHORT $LN3@intrinsic mov eax, esi mov esi, edx mov edx, eax $LN3@intrinsic: mov eax, esi and cl, 31 ; 0000001fH je SHORT $LN4@intrinsic shld eax, edx, cl shld edx, esi, cl $LN4@intrinsic: ; Line 26 pop esi ret 0 _intrinsic ENDP _TEXT ENDS END
Except for the first function, the optimiser fails to recognise the commonly used expressions for rotate operations!
EDX with ESI in) the intrinsic function _rotl64().
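A side note on the rotate expressions themselves: with a count of 0, value >> (32 - count) shifts a 32-bit operand by 32 bits, which is undefined behaviour in C (likewise 64 - count for a 64-bit operand). The following minimal sketch masks the right shift count to avoid this; the names rotl32 and rotl64 are hypothetical, fixed-width types replace unsigned long and unsigned long long, and no claim is made about which instructions any particular compiler generates for it.

```c
#include <assert.h>
#include <stdint.h>

// Rotate left by 0..31 bits; for count == 0 the right shift count
// (32 - count) & 31 is 0 instead of the out-of-range 32.
uint32_t rotl32(uint32_t value, unsigned count)
{
    assert(count < 32);
    return (value << count) | (value >> ((32 - count) & 31));
}

// Rotate left by 0..63 bits, same scheme for a 64-bit operand.
uint64_t rotl64(uint64_t value, unsigned count)
{
    assert(count < 64);
    return (value << count) | (value >> ((64 - count) & 63));
}
```

For counts other than 0 the two shifted parts have no bits in common, which is why the variants with | and ^ in case8.c yield the same result there.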
BSWAP instruction or two MOVBE instructions.
Create the text file case9.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
#ifdef ALTERNATE
unsigned short swap16(unsigned short us)
{
return ((us & 0xFF00U) >> 8)
| ((us & 0x00FFU) << 8);
}
unsigned long swap32(unsigned long ul)
{
return ((ul & 0xFF000000UL) >> 3 * 8)
| ((ul & 0x00FF0000UL) >> 8)
| ((ul & 0x0000FF00UL) << 8)
| ((ul & 0x000000FFUL) << 3 * 8);
}
unsigned long long swap64(unsigned long long ull)
{
return ((ull & 0xFF00000000000000ULL) >> 7 * 8)
| ((ull & 0x00FF000000000000ULL) >> 5 * 8)
| ((ull & 0x0000FF0000000000ULL) >> 3 * 8)
| ((ull & 0x000000FF00000000ULL) >> 8)
| ((ull & 0x00000000FF000000ULL) << 8)
| ((ull & 0x0000000000FF0000ULL) << 3 * 8)
| ((ull & 0x000000000000FF00ULL) << 5 * 8)
| ((ull & 0x00000000000000FFULL) << 7 * 8);
}
#else
unsigned short swap16(unsigned short us)
{
return (us << 8) & 0xFF00U
| (us >> 8) & 0x00FFU;
}
unsigned long swap32(unsigned long ul)
{
return (ul << 3 * 8) & 0xFF000000UL
| (ul << 8) & 0x00FF0000UL
| (ul >> 8) & 0x0000FF00UL
| (ul >> 3 * 8) & 0x000000FFUL;
}
unsigned long long swap64(unsigned long long ull)
{
return (ull << 7 * 8) & 0xFF00000000000000ULL
| (ull << 5 * 8) & 0x00FF000000000000ULL
| (ull << 3 * 8) & 0x0000FF0000000000ULL
| (ull << 8) & 0x000000FF00000000ULL
| (ull >> 8) & 0x00000000FF000000ULL
| (ull >> 3 * 8) & 0x0000000000FF0000ULL
| (ull >> 5 * 8) & 0x000000000000FF00ULL
| (ull >> 7 * 8) & 0x00000000000000FFULL;
}
#endif
Note: better use the appropriate intrinsic function _byteswap_ushort(), _byteswap_ulong() or _byteswap_uint64() instead of such expressions!
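To see that the mask-and-shift expression really is a plain byte reversal, it can be compared with a byte-wise reference in C. This is a sketch under assumptions: the names swap64_expr, swap64_loop and check_swap64 are hypothetical, uint64_t replaces unsigned long long, and the Visual C intrinsics are deliberately not used so that the code stays portable.

```c
#include <assert.h>
#include <stdint.h>

// Expression form, as in swap64() of the ALTERNATE variant.
uint64_t swap64_expr(uint64_t ull)
{
    return ((ull & UINT64_C(0xFF00000000000000)) >> 7 * 8)
         | ((ull & UINT64_C(0x00FF000000000000)) >> 5 * 8)
         | ((ull & UINT64_C(0x0000FF0000000000)) >> 3 * 8)
         | ((ull & UINT64_C(0x000000FF00000000)) >> 8)
         | ((ull & UINT64_C(0x00000000FF000000)) << 8)
         | ((ull & UINT64_C(0x0000000000FF0000)) << 3 * 8)
         | ((ull & UINT64_C(0x000000000000FF00)) << 5 * 8)
         | ((ull & UINT64_C(0x00000000000000FF)) << 7 * 8);
}

// Reference: take the bytes from the low end one at a time and
// append them to the result, which reverses their order.
uint64_t swap64_loop(uint64_t ull)
{
    uint64_t result = 0;

    for (int i = 0; i < 8; i++)
    {
        result = (result << 8) | (ull & 0xFF);
        ull >>= 8;
    }
    return result;
}

// Aborts if the two forms disagree for the given value.
void check_swap64(uint64_t ull)
{
    assert(swap64_expr(ull) == swap64_loop(ull));
}
```

Both functions map 0x0123456789ABCDEF to 0xEFCDAB8967452301.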
Generate the assembly listing file case9.asm from the source file case9.c created in step 1, using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase9.c /W4 /Zl
Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0 case9.c
Display the assembly listing file case9.asm created in step 2:
TYPE case9.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 include listing.inc INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC swap16 PUBLIC swap32 PUBLIC swap64 ; Function compile flags: /Ogtpy ; COMDAT swap16 _TEXT SEGMENT us$ = 8 swap16 PROC ; COMDAT ; File c:\users\stefan\desktop\case9.c ; Line 32 movzx edx, cx mov eax, edx shr edx, 8 shl eax, 8 or ax, dx movzx eax, cx xchg ah, al ; Line 34 ret 0 swap16 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT swap32 _TEXT SEGMENT ul$ = 8 swap32 PROC ; COMDAT ; File c:\users\stefan\desktop\case9.c ; Line 38 bswap ecx mov eax, ecx ; Line 42 ret 0 swap32 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT swap64 _TEXT SEGMENT ull$ = 8 swap64 PROC ; COMDAT ; File c:\users\stefan\desktop\case9.c ; Line 45 mov r8, rcx ; Line 46 bswap rcx mov rax, rcx mov r9, rcx mov rax, 71776119061217280 ; 00ff000000000000H mov rdx, r8 and r9, rax and edx, 65280 ; 0000ff00H mov rax, rcx shr rax, 16 or r9, rax mov rax, rcx shr r9, 16 mov rcx, 280375465082880 ; 0000ff0000000000H and rax, rcx mov rcx, 1095216660480 ; 000000ff00000000H or r9, rax mov rax, r8 and rax, rcx shr r9, 16 or r9, rax mov rcx, r8 mov rax, r8 shr r9, 8 shl rax, 16 and ecx, 16711680 ; 00ff0000H or rdx, rax mov eax, -16777216 ; ff000000H and rax, r8 shl rdx, 16 or rdx, rcx shl rdx, 16 or rax, rdx shl rax, 8 or rax, r9 ; Line 54 ret 0 swap64 ENDP _TEXT ENDS END
Note: the assembly listing shows 32 (in words: thirty-two) instructions for the function swap64() instead of only a single (in words: one) BSWAP instruction!
While the optimiser recognises the commonly used expressions to convert from little-endian to big-endian byte order (and vice versa) with a 32-bit operand and then generates a single BSWAP instruction, it fails to recognise these expressions with a 16-bit or a 64-bit operand.
Generate another assembly listing file case9.asm from the source file case9.c created in step 1, now using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase9.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case9.c
Display the assembly listing file case9.asm created in step 4:
TYPE case9.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case9.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _swap16 PUBLIC _swap32 PUBLIC _swap64 ; Function compile flags: /Ogtpy ; COMDAT _swap16 _TEXT SEGMENT _us$ = 8 ; size = 2 _swap16 PROC ; COMDAT ; File c:\users\stefan\desktop\case9.c ; Line 32 movbe ax, WORD PTR _us$[esp-4] movzx ecx, WORD PTR _us$[esp-4] mov eax, ecx shl ecx, 8 shr eax, 8 or eax, ecx ; Line 34 ret 0 _swap16 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _swap32 _TEXT SEGMENT _ul$ = 8 ; size = 4 _swap32 PROC ; COMDAT ; File c:\users\stefan\desktop\case9.c ; Line 38 movbe eax, DWORD PTR _ul$[esp-4] mov eax, DWORD PTR _ul$[esp-4] bswap eax ; Line 42 ret 0 _swap32 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _swap64 _TEXT SEGMENT _ull$ = 8 ; size = 8 _swap64 PROC ; COMDAT ; File c:\users\stefan\desktop\case9.c ; Line 46 movbe edx, DWORD PTR _ull$[esp-4] movbe eax, DWORD PTR _ull$[esp] mov edx, DWORD PTR _ull$[esp] mov ecx, edx push ebx push ebp push esi push edi mov edi, DWORD PTR _ull$[esp+12] mov ebx, edx and ebx, 16711680 ; 00ff0000H mov eax, edi shrd eax, ecx, 16 xor ebp, ebp mov esi, edi or ebp, eax shr ecx, 16 ; 00000010H or ebx, ecx mov eax, edx shrd ebp, ebx, 16 and eax, 65280 ; 0000ff00H and esi, 65280 ; 0000ff00H shr ebx, 16 ; 00000010H xor ecx, ecx or ebx, eax movzx eax, dl shrd ebp, ebx, 16 shr ebx, 16 ; 00000010H or ebx, eax mov eax, edi shld edx, eax, 16 shrd ebp, ebx, 8 shl eax, 16 ; 00000010H or edx, ecx or esi, eax shr ebx, 8 shld edx, esi, 16 mov eax, edi and edi, -16777216 ; ff000000H shl esi, 16 ; 00000010H and eax, 16711680 ; 00ff0000H or esi, eax shld edx, esi, 16 shl esi, 16 ; 00000010H or esi, edi shld edx, esi, 8 pop edi shl esi, 8 or edx, ebx or ebp, esi pop esi mov eax, ebp pop ebp pop ebx ; Line 54 ret 0 _swap64 ENDP _TEXT ENDS END
Note: the assembly listing shows 52 (in words: fifty-two) instructions for the function swap64() instead of only 2 (in words: two) MOVBE instructions!
While the optimiser recognises the commonly used expressions to convert from little-endian to big-endian byte order (and vice versa) with a 32-bit operand and then generates a single BSWAP instruction, it fails to recognise these expressions with a 16-bit or a 64-bit operand.
Repeat the previous steps with the alternate implementation; generate another assembly listing file case9.asm from the source file case9.c created in step 1, now using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture, with the preprocessor macro ALTERNATE defined on the command line:
CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tccase9.c /W4 /Zl
Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0 case9.c
Display the assembly listing file case9.asm created in step 6:
TYPE case9.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 include listing.inc INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC swap16 PUBLIC swap32 PUBLIC swap64 ; Function compile flags: /Ogtpy ; COMDAT swap16 _TEXT SEGMENT us$ = 8 swap16 PROC ; COMDAT ; File c:\users\stefan\desktop\case9.c ; Line 6 movzx edx, cx mov eax, edx shl dx, 8 shr eax, 8 or ax, dx movzx eax, cx xchg ah, al ; Line 8 ret 0 swap16 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT swap32 _TEXT SEGMENT ul$ = 8 swap32 PROC ; COMDAT ; File c:\users\stefan\desktop\case9.c ; Line 12 bswap ecx mov eax, ecx ; Line 16 ret 0 swap32 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT swap64 _TEXT SEGMENT ull$ = 8 swap64 PROC ; COMDAT ; File c:\users\stefan\desktop\case9.c ; Line 19 mov r8, rcx ; Line 20 bswap rcx mov rax, rcx mov r9, rcx mov rax, 71776119061217280 ; 00ff000000000000H mov rdx, r8 and r9, rax and edx, 65280 ; 0000ff00H mov rax, rcx shr rax, 16 or r9, rax mov rax, rcx shr r9, 16 mov rcx, 280375465082880 ; 0000ff0000000000H and rax, rcx mov rcx, 1095216660480 ; 000000ff00000000H or r9, rax mov rax, r8 and rax, rcx shr r9, 16 or r9, rax mov rcx, r8 mov rax, r8 shr r9, 8 shl rax, 16 and ecx, 16711680 ; 00ff0000H or rdx, rax mov eax, -16777216 ; ff000000H and rax, r8 shl rdx, 16 or rdx, rcx shl rdx, 16 or rax, rdx shl rax, 8 or rax, r9 ; Line 28 ret 0 swap64 ENDP _TEXT ENDS END
Note: the assembly listing shows 32 (in words: thirty-two) instructions for the function swap64() instead of only a single (in words: one) BSWAP instruction!
While the optimiser recognises the commonly used expressions to convert from little-endian to big-endian byte order (and vice versa) with a 32-bit operand in the alternative form too and then generates a single BSWAP instruction, it fails to recognise these expressions with a 16-bit or a 64-bit operand.
Generate another assembly listing file case9.asm from the source file case9.c created in step 1, now using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro ALTERNATE defined on the command line:
CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tccase9.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case9.c
Display the assembly listing file case9.asm created in step 8.:
TYPE case9.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case9.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _swap16 PUBLIC _swap32 PUBLIC _swap64 ; Function compile flags: /Ogtpy ; COMDAT _swap16 _TEXT SEGMENT _us$ = 8 ; size = 2 _swap16 PROC ; COMDAT ; File c:\users\stefan\desktop\case9.c ; Line 6 movbe ax, DWORD PTR _ul$[esp-4]Note: the assembly listing shows 52 (in words: fifty-two) instructions for the functionmovzx ecx, WORD PTR _us$[esp-4] mov eax, ecx shl ecx, 8 shr eax, 8 or eax, ecx; Line 8 ret 0 _swap16 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _swap32 _TEXT SEGMENT _ul$ = 8 ; size = 4 _swap32 PROC ; COMDAT ; File c:\users\stefan\desktop\case9.c ; Line 12 movbe eax, DWORD PTR _ul$[esp-4]mov eax, DWORD PTR _ul$[esp-4] bswap eax; Line 16 ret 0 _swap32 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _swap64 _TEXT SEGMENT _ull$ = 8 ; size = 8 _swap64 PROC ; COMDAT ; File c:\users\stefan\desktop\case9.c ; Line 20 movbe edx, DWORD PTR _ull$[esp-4] movbe eax, DWORD PTR _ull$[esp]mov edx, DWORD PTR _ull$[esp] mov ecx, edx push ebx push ebp push esi push edi mov edi, DWORD PTR _ull$[esp+12] mov ebx, edx and ebx, 16711680 ; 00ff0000H mov eax, edi shrd eax, ecx, 16 xor ebp, ebp mov esi, edi or ebp, eax shr ecx, 16 ; 00000010H or ebx, ecx mov eax, edx shrd ebp, ebx, 16 and eax, 65280 ; 0000ff00H and esi, 65280 ; 0000ff00H shr ebx, 16 ; 00000010H xor ecx, ecx or ebx, eax movzx eax, dl shrd ebp, ebx, 16 shr ebx, 16 ; 00000010H or ebx, eax mov eax, edi shld edx, eax, 16 shrd ebp, ebx, 8 shl eax, 16 ; 00000010H or edx, ecx or esi, eax shr ebx, 8 shld edx, esi, 16 mov eax, edi and edi, -16777216 ; ff000000H shl esi, 16 ; 00000010H and eax, 16711680 ; 00ff0000H or esi, eax shld edx, esi, 16 shl esi, 16 ; 00000010H or esi, edi shld edx, esi, 8 pop edi shl esi, 8 or edx, ebx or ebp, esi pop esi mov eax, ebp pop ebp pop ebx; Line 28 ret 0 _swap64 ENDP 
_TEXT ENDS END
swap64() instead of only 2 (in words: two) MOVBE instructions!
Although the optimiser also recognises the alternative form of the commonly used expressions that convert between little-endian and big-endian byte order for a 32-bit operand, and then generates a single BSWAP instruction, it fails to recognise these expressions for a 16-bit or a 64-bit operand.
BSWAP or MOVBE instructions.
Create the text file case10.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

__inline
unsigned short swap16(unsigned short us)
{
    return (us << 8) | (us >> 8);
}

unsigned long swap32(unsigned long ul)
{
    return (unsigned long) swap16((unsigned short) ul) << 16
         | (unsigned long) swap16((unsigned short) (ul >> 16));
}

unsigned long long swap64(unsigned long long ull)
{
    return (unsigned long long) swap32((unsigned long) ull) << 32
         | (unsigned long long) swap32((unsigned long) (ull >> 32));
}

unsigned short swap16alt(unsigned short us)
{
    return ((us >> 8) & 0x00FFU)
         | ((us & 0x00FFU) << 8);
}

unsigned long swap32alt(unsigned long ul)
{
    ul = ((ul >> 8) & 0x00FF00FFUL)
       | ((ul & 0x00FF00FFUL) << 8);
    return (ul << 16) | (ul >> 16);
}

unsigned long long swap64alt(unsigned long long ull)
{
    ull = ((ull >> 8) & 0x00FF00FF00FF00FFULL)
        | ((ull & 0x00FF00FF00FF00FFULL) << 8);
    ull = ((ull >> 16) & 0x0000FFFF0000FFFFULL)
        | ((ull & 0x0000FFFF0000FFFFULL) << 16);
    return (ull << 32) | (ull >> 32);
}

unsigned long swap32rot(unsigned long ul)
{
    return _lrotr(ul, 8) & 0xFF00FF00UL
         | _lrotl(ul, 8) & 0x00FF00FFUL;
}

unsigned long long swap64rot(unsigned long long ull)
{
    ull = _rotr64(ull, 16) & 0xFFFF0000FFFF0000ULL
        | _rotl64(ull, 16) & 0x0000FFFF0000FFFFULL;
    ull = _rotr64(ull, 8) & 0x00FF00FF00FF00FFULL
        | _rotl64(ull, 8) & 0xFF00FF00FF00FF00ULL;
    return ull;
}
Note: better use the appropriate intrinsic function _byteswap_ushort(), _byteswap_ulong() or _byteswap_uint64() instead of such expressions!
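The advice in this note can be sketched as follows. The wrapper names bswap16(), bswap32() and bswap64() are hypothetical and not part of case10.c; the __builtin_bswap*() branch for GCC and Clang and the plain shift fallback are assumptions added so that the sketch also builds outside Visual C, where only the _byteswap_*() intrinsics named above (declared in <stdlib.h>) apply.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <stdlib.h>     /* declares _byteswap_ushort(), _byteswap_ulong(), _byteswap_uint64() */
#endif

/* Hypothetical wrappers: each one picks an intrinsic where one is known,
   otherwise falls back to the shift expressions discussed in the text. */
static uint16_t bswap16(uint16_t x)
{
#if defined(_MSC_VER)
    return _byteswap_ushort(x);
#elif defined(__GNUC__)
    return __builtin_bswap16(x);
#else
    return (uint16_t) ((x << 8) | (x >> 8));
#endif
}

static uint32_t bswap32(uint32_t x)
{
#if defined(_MSC_VER)
    return _byteswap_ulong(x);
#elif defined(__GNUC__)
    return __builtin_bswap32(x);
#else
    return (x >> 24) | ((x >> 8) & 0x0000FF00U)
         | ((x << 8) & 0x00FF0000U) | (x << 24);
#endif
}

static uint64_t bswap64(uint64_t x)
{
#if defined(_MSC_VER)
    return _byteswap_uint64(x);
#elif defined(__GNUC__)
    return __builtin_bswap64(x);
#else
    /* swap each half, then exchange the halves */
    return ((uint64_t) bswap32((uint32_t) x) << 32)
         | bswap32((uint32_t) (x >> 32));
#endif
}
```

Whether a given compiler version turns each wrapper into a single BSWAP or MOVBE instruction should be checked in its assembly listing, as done for the expressions above.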
Generate the assembly listing file case10.asm from the source file case10.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase10.c /W4 /Zl
Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
Compiler Passes:
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe: Version 14.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll: Version 14.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0
case10.c
Display the assembly listing file case10.asm created in step 2.:
TYPE case10.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 include listing.inc INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC swap16 PUBLIC swap32 PUBLIC swap64 PUBLIC swap16alt PUBLIC swap32alt PUBLIC swap64alt PUBLIC swap32rot PUBLIC swap64rot ; Function compile flags: /Ogtpy ; COMDAT swap16 _TEXT SEGMENT us$ = 8 swap16 PROC ; COMDAT ; File c:\users\stefan\desktop\case10.c ; Line 6Note: instead of just 2 instructions for each of the 4 functions, the assembly listing shows 20 (in words: twenty) instructions for the functionmovzx edx, cx mov eax, edx shl dx, 8 shr eax, 8 or ax, dxmovzx eax, cx xchg ah, al ; Line 7 ret 0 swap16 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT swap32 _TEXT SEGMENT ul$ = 8 swap32 PROC ; COMDAT ; File c:\users\stefan\desktop\case10.c ; Line 11 mov eax, ecx ; Line 6rol cx, 8; Line 11shr eax, 16; Line 6rol ax, 8; Line 11movzx ecx, cx movzx eax, ax shl ecx, 16 or eax, ecxbswap eax ; Line 13 ret 0 swap32 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT swap64 _TEXT SEGMENT ull$ = 8 swap64 PROC ; COMDAT ; File c:\users\stefan\desktop\case10.c ; Line 11mov eax, ecx; Line 17mov r9, rcx; Line 11shr eax, 16; Line 6rol ax, 8 movzx r8d, ax rol cx, 8 movzx eax, cx shl rax, 16; Line 17or rax, r8 shr r9, 32 ; 00000020H; Line 6movzx ecx, r9w; Line 17shl rax, 16; Line 6rol cx, 8 movzx edx, cx; Line 17or rax, rdx; Line 11shr r9d, 16; Line 6rol r9w, 8; Line 17shl rax, 16; Line 6movzx ecx, r9w; Line 17or rax, rcxmov rax, rcx bswap rax ; Line 19 ret 0 swap64 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT swap16alt _TEXT SEGMENT us$ = 8 swap16alt PROC ; COMDAT ; File c:\users\stefan\desktop\case10.c ; Line 23movzx edx, cx movzx eax, dl shr edx, 8 shl eax, 8 or eax, edxmovzx eax, cx xchg ah, al ; Line 25 ret 0 swap16alt ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT swap32alt _TEXT SEGMENT ul$ = 8 swap32alt PROC ; COMDAT ; File c:\users\stefan\desktop\case10.c ; Line 32 mov eax, ecxmov edx, ecx shl 
ecx, 8 shr eax, 8 shl edx, 8 xor eax, edx and eax, 16711935 ; 00ff00ffH xor eax, ecx rol eax, 16bswap eax ; Line 33 ret 0 swap32alt ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT swap64alt _TEXT SEGMENT ull$ = 8 swap64alt PROC ; COMDAT ; File c:\users\stefan\desktop\case10.c ; Line 37mov rdx, rcxmov rax, rcx bswap raxshl rcx, 8 shl rax, 8 shr rdx, 8 xor rdx, rax mov rax, 71777214294589695 ; 00ff00ff00ff00ffH and rdx, rax xor rdx, rcx; Line 42mov rax, rdx mov rcx, rdx shl rdx, 16 shr rax, 16 shl rcx, 16 xor rax, rcx mov rcx, 281470681808895 ; 0000ffff0000ffffH and rax, rcx xor rax, rdx rol rax, 32 ; 00000020H; Line 43 ret 0 swap64alt ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT swap32rot _TEXT SEGMENT ul$ = 8 swap32rot PROC ; COMDAT ; File c:\users\stefan\desktop\case10.c ; Line 47 mov eax, ecxrol ecx, 8 ror eax, 8 and ecx, 16711935 ; 00ff00ffH and eax, -16711936 ; ff00ff00H or eax, ecxbswap eax ; Line 49 ret 0 swap32rot ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT swap64rot _TEXT SEGMENT ull$ = 8 swap64rot PROC ; COMDAT ; File c:\users\stefan\desktop\case10.c ; Line 53 mov rax, rcxmov rdx, rcx mov rax, -281470681808896 ; ffff0000ffff0000H ror rdx, 16 and rdx, rax rol rcx, 16 mov rax, 281470681808895 ; 0000ffff0000ffffH and rcx, rax or rdx, rcx; Line 56mov rcx, -71777214294589696 ; ff00ff00ff00ff00H mov rax, rdx rol rax, 8 and rax, rcx ror rdx, 8 mov rcx, 71777214294589695 ; 00ff00ff00ff00ffH and rdx, rcx or rax, rdxbswap rax ; Line 59 ret 0 swap64rot ENDP _TEXT ENDS END
swap64(), 8 instructions for the function swap32(), 5 instructions for the function swap16(), and 6 instructions for the function swap32rot().
The optimiser fails to recognise all these commonly used expressions to convert from little-endian byte order to big-endian byte order (and vice versa), for all operand sizes!
Additionally, the commonly used expression for a rotate operation is not recognised for a 16-bit operand.
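The 16-bit case can be made explicit: swapping the two bytes of a 16-bit operand is the same as rotating it by 8 bits. The following is a minimal sketch under stated assumptions; the names rotl16() and swap16_rot() are hypothetical, the _rotl16() intrinsic from <intrin.h> is used only under Visual C, and the masked shift expression is a portable fallback that was not checked against the optimiser discussed here.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>     /* declares _rotl16() */
#endif

/* Hypothetical helper: rotate a 16-bit value left by n bits. */
static uint16_t rotl16(uint16_t x, unsigned n)
{
#if defined(_MSC_VER)
    return _rotl16(x, (unsigned char) n);
#else
    n &= 15;    /* keep both shift counts in the range 0..15 */
    return (uint16_t) ((x << n) | (x >> ((16 - n) & 15)));
#endif
}

/* Byte swap of a 16-bit operand expressed as a rotate by 8 bits. */
static uint16_t swap16_rot(uint16_t x)
{
    return rotl16(x, 8);
}
```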
Generate another assembly listing file case10.asm from the source file case10.c created in step 1., now using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase10.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
Compiler Passes:
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
case10.c
Display the assembly listing file case10.asm created in step 4.:
TYPE case10.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case10.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _swap16 PUBLIC _swap32 PUBLIC _swap64 PUBLIC _swap16alt PUBLIC _swap32alt PUBLIC _swap64alt PUBLIC _swap32rot PUBLIC _swap64rot ; Function compile flags: /Ogtpy ; COMDAT _swap16 _TEXT SEGMENT _us$ = 8 ; size = 2 _swap16 PROC ; COMDAT ; File c:\users\stefan\desktop\case10.c ; Line 6 movbe ax, WORD PTR _us$[esp-4]Note: instead of just 1 or 2movzx ecx, WORD PTR _us$[esp-4] mov eax, ecx shl ecx, 8 shr eax, 8 or eax, ecx; Line 7 ret 0 _swap16 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _swap32 _TEXT SEGMENT _ul$ = 8 ; size = 4 _swap32 PROC ; COMDAT ; File c:\users\stefan\desktop\case10.c ; Line 11 movbe eax, DWORD PTR _ul$[esp-4]mov ecx, DWORD PTR _ul$[esp-4] mov eax, ecx shr eax, 16 ; 00000010H; Line 6rol cx, 8 rol ax, 8; Line 11movzx ecx, cx movzx eax, ax shl ecx, 16 ; 00000010H or eax, ecx; Line 13 ret 0 _swap32 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _swap64 _TEXT SEGMENT _ull$ = 8 ; size = 8 _swap64 PROC ; COMDAT ; File c:\users\stefan\desktop\case10.c ; Line 17 movbe eax, DWORD PTR _ull$[esp] movbe edx, DWORD PTR _ull$[esp-4]mov ecx, DWORD PTR _ull$[esp-4]; Line 11mov eax, ecx shr eax, 16 ; 00000010H; Line 6rol ax, 8; Line 17movzx eax, ax cdq push ebx mov ebx, DWORD PTR _ull$[esp+4] push esi mov esi, eax; Line 6rol cx, 8; Line 16push edi; Line 17mov edi, edx movzx eax, cx cdq shld edx, eax, 16 shl eax, 16 ; 00000010H or edi, edx or esi, eax; Line 6mov ax, bx; Line 17shld edi, esi, 16; Line 6rol ax, 8; Line 17movzx eax, ax cdq shl esi, 16 ; 00000010H or edi, edx or esi, eax; Line 11shr ebx, 16 ; 00000010H; Line 6rol bx, 8; Line 17shld edi, esi, 16 movzx eax, bx cdq shl esi, 16 ; 00000010H or edx, edi pop edi or eax, esi pop esi pop ebx; Line 19 ret 0 _swap64 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _swap16alt _TEXT 
SEGMENT _us$ = 8 ; size = 2 _swap16alt PROC ; COMDAT ; File c:\users\stefan\desktop\case10.c ; Line 23 movbe ax, WORD PTR _us$[esp-4]movzx ecx, WORD PTR _us$[esp-4] movzx eax, cl shl eax, 8 shr ecx, 8 or eax, ecx; Line 25 ret 0 _swap16alt ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _swap32alt _TEXT SEGMENT _ul$ = 8 ; size = 4 _swap32alt PROC ; COMDAT ; File c:\users\stefan\desktop\case10.c ; Line 32 movbe eax, DWORD PTR _ul$[esp-4]mov ecx, DWORD PTR _ul$[esp-4] mov eax, ecx shr eax, 8 mov edx, ecx shl edx, 8 xor eax, edx and eax, 16711935 ; 00ff00ffH shl ecx, 8 xor eax, ecx rol eax, 16 ; 00000010H; Line 33 ret 0 _swap32alt ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _swap64alt _TEXT SEGMENT _ull$ = 8 ; size = 8 _swap64alt PROC ; COMDAT ; File c:\users\stefan\desktop\case10.c ; Line 37 movbe eax, DWORD PTR _ull$[esp] movbe edx, DWORD PTR _ull$[esp-4]mov eax, DWORD PTR _ull$[esp-4] push ebx push esi mov esi, DWORD PTR _ull$[esp+8] push edi mov edx, esi mov ebx, esi mov ecx, eax shrd ecx, edx, 8 mov edi, eax shld ebx, edi, 8 shld esi, eax, 8 shl edi, 8 xor ecx, edi shr edx, 8 xor edx, ebx shl eax, 8 and edx, 16711935 ; 00ff00ffH xor edx, esi and ecx, 16711935 ; 00ff00ffH xor ecx, eax; Line 40mov edi, edx mov esi, ecx shrd esi, edi, 16 shr edi, 16 ; 00000010H mov eax, edi mov ebx, edx mov edi, ecx shld ebx, edi, 16 shld edx, ecx, 16 xor eax, ebx shl edi, 16 ; 00000010H xor esi, edi and eax, 65535 ; 0000ffffH and esi, 65535 ; 0000ffffH shl ecx, 16 ; 00000010H xor eax, edx xor esi, ecx; Line 42xor edx, edx pop edi or edx, esi xor ecx, ecx pop esi or eax, ecx pop ebx; Line 43 ret 0 _swap64alt ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _swap32rot _TEXT SEGMENT _ul$ = 8 ; size = 4 _swap32rot PROC ; COMDAT ; File c:\users\stefan\desktop\case10.c ; Line 47 movbe eax, DWORD PTR _ul$[esp-4]mov ecx, DWORD PTR _ul$[esp-4] mov eax, ecx ror eax, 8 rol ecx, 8 and eax, -16711936 ; ff00ff00H and ecx, 16711935 ; 00ff00ffH or eax, ecx; Line 49 
ret 0 _swap32rot ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _swap64rot _TEXT SEGMENT _ull$ = 8 ; size = 8 _swap64rot PROC ; COMDAT ; File c:\users\stefan\desktop\case10.c ; Line 53 movbe eax, DWORD PTR _ull$[esp] movbe edx, DWORD PTR _ull$[esp-4]mov edx, DWORD PTR _ull$[esp-4] mov ecx, DWORD PTR _ull$[esp] push ebx push esi mov eax, ecx mov esi, edx shrd esi, eax, 16 shrd eax, edx, 16 push edi mov edi, eax mov eax, edx shld eax, ecx, 16 shld ecx, edx, 16 xor eax, esi xor ecx, edi and eax, 65535 ; 0000ffffH xor eax, esi and ecx, 65535 ; 0000ffffH xor ecx, edi; Line 56mov esi, eax mov edx, ecx shrd esi, edx, 8 shrd edx, eax, 8 mov ebx, edx mov edx, eax mov eax, ecx mov ecx, edx shld ecx, eax, 8 shld eax, edx, 8 mov edi, eax; Line 58mov edx, eax xor edx, ebx and edx, 16711935 ; 00ff00ffH mov eax, ecx xor eax, esi xor edx, edi pop edi and eax, 16711935 ; 00ff00ffH pop esi xor eax, ecx pop ebx; Line 59 ret 0 _swap64rot ENDP _TEXT ENDS END
MOVBE instructions for each of the 4 functions, the assembly listing shows 38 (in words: thirty-eight) instructions for the function swap64(), 9 instructions for the function swap32(), 5 instructions for the function swap16(), and 7 instructions for the function swap32rot().
The optimiser fails to recognise all these commonly used expressions to convert from little-endian byte order to big-endian byte order (and vice versa), for all operand sizes!
Additionally, the commonly used expression for a rotate operation is not recognised for a 16-bit operand.
Create the text file case11.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

unsigned long lreverse(unsigned long ul)
{
#ifndef ALTERNATE
    ul = ((ul & 0xAAAAAAAAUL) >> 1)
       | ((ul & 0x55555555UL) << 1);
    ul = ((ul & 0xCCCCCCCCUL) >> 2)
       | ((ul & 0x33333333UL) << 2);
    ul = ((ul & 0xF0F0F0F0UL) >> 4)
       | ((ul & 0x0F0F0F0FUL) << 4);
    ul = ((ul & 0xFF00FF00UL) >> 8)
       | ((ul & 0x00FF00FFUL) << 8);
    ul = ((ul & 0xFFFF0000UL) >> 16)
       | ((ul & 0x0000FFFFUL) << 16);
#else
    ul = ((ul >> 1) & 0x55555555UL)
       | ((ul << 1) & 0xAAAAAAAAUL);
    ul = ((ul >> 2) & 0x33333333UL)
       | ((ul << 2) & 0xCCCCCCCCUL);
    ul = ((ul >> 4) & 0x0F0F0F0FUL)
       | ((ul << 4) & 0xF0F0F0F0UL);
    ul = ((ul >> 8) & 0x00FF00FFUL)
       | ((ul << 8) & 0xFF00FF00UL);
    ul = ((ul >> 16) & 0x0000FFFFUL)
       | ((ul << 16) & 0xFFFF0000UL);
#endif
    return ul;
}

unsigned long long llreverse(unsigned long long ull)
{
#ifndef ALTERNATE
    ull = ((ull & 0xAAAAAAAAAAAAAAAAULL) >> 1)
        | ((ull & 0x5555555555555555ULL) << 1);
    ull = ((ull & 0xCCCCCCCCCCCCCCCCULL) >> 2)
        | ((ull & 0x3333333333333333ULL) << 2);
    ull = ((ull & 0xF0F0F0F0F0F0F0F0ULL) >> 4)
        | ((ull & 0x0F0F0F0F0F0F0F0FULL) << 4);
    ull = ((ull & 0xFF00FF00FF00FF00ULL) >> 8)
        | ((ull & 0x00FF00FF00FF00FFULL) << 8);
    ull = ((ull & 0xFFFF0000FFFF0000ULL) >> 16)
        | ((ull & 0x0000FFFF0000FFFFULL) << 16);
    ull = ((ull & 0xFFFFFFFF00000000ULL) >> 32)
        | ((ull & 0x00000000FFFFFFFFULL) << 32);
#else
    ull = ((ull >> 1) & 0x5555555555555555ULL)
        | ((ull << 1) & 0xAAAAAAAAAAAAAAAAULL);
    ull = ((ull >> 2) & 0x3333333333333333ULL)
        | ((ull << 2) & 0xCCCCCCCCCCCCCCCCULL);
    ull = ((ull >> 4) & 0x0F0F0F0F0F0F0F0FULL)
        | ((ull << 4) & 0xF0F0F0F0F0F0F0F0ULL);
    ull = ((ull >> 8) & 0x00FF00FF00FF00FFULL)
        | ((ull << 8) & 0xFF00FF00FF00FF00ULL);
    ull = ((ull >> 16) & 0x0000FFFF0000FFFFULL)
        | ((ull << 16) & 0xFFFF0000FFFF0000ULL);
    ull = ((ull >> 32) & 0x00000000FFFFFFFFULL)
        | ((ull << 32) & 0xFFFFFFFF00000000ULL);
#endif
    return ull;
}
Generate the assembly listing file case11.asm from the source file case11.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /GS- /Gy /Ox /Tccase11.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
Compiler Passes:
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
case11.c
Display the assembly listing file case11.asm created in step 2.:
TYPE case11.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case11.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _lreverse PUBLIC _llreverse ; Function compile flags: /Ogtpy ; COMDAT _lreverse _TEXT SEGMENT _ul$ = 8 ; size = 4 _lreverse PROC ; COMDAT ; File c:\users\stefan\desktop\case11.c ; Line 6 mov ecx, DWORD PTR _ul$[esp-4]In both functions, the optimiser fails to recognise that the final two or threemov edx, ecx shr edx, 1 lea eax, DWORD PTR [ecx+ecx] xor edx, eax lea eax, DWORD PTR [ecx+ecx] and edx, 1431655765 ; 55555555H xor edx, eaxlea eax, [ecx+ecx] shr ecx, 1 and eax, -1431655766 ; aaaaaaaaH and ecx, 1431655765 ; 55555555H or ecx, eax ; Line 8mov ecx, edx shr ecx, 2 lea eax, DWORD PTR [edx*4] xor ecx, eax lea eax, DWORD PTR [edx*4] and ecx, 858993459 ; 33333333H xor ecx, eaxlea eax, [ecx*4] shr ecx, 2 and eax, -858993460 ; ccccccccH and ecx, 858993459 ; 33333333H or ecx, eax ; Line 10mov edx, ecx mov eax, ecx shl eax, 4 shr edx, 4 xor edx, eax shl ecx, 4 and edx, 252645135 ; 0f0f0f0fH xor edx, ecxmov eax, ecx shr ecx, 4 shl eax, 4 and ecx, 252645135 ; 0f0f0f0fH and eax, -252645136 ; f0f0f0f0H or eax, ecx ; Line 12mov eax, edx mov ecx, edx shr eax, 8 shl ecx, 8 xor eax, ecx shl edx, 8 and eax, 16711935 ; 00ff00ffH xor eax, edx; Line 14rol eax, 16 ; 00000010Hbswap eax ; Line 29 ret 0 _lreverse ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _llreverse _TEXT SEGMENT_ull$11$ = 8 ; size = 4_ull$ = 8 ; size = 8 _llreverse PROC ; COMDAT ; File c:\users\stefan\desktop\case11.c ; Line 34mov edx, DWORD PTR _ull$[esp] mov ecx, edx push ebx push ebppush esimov esi, DWORD PTR _ull$[esp+8] mov ebx, esi mov eax, esi shld ecx, eax, 1push edimov edi, edx add eax, eax shrd ebx, edi, 1 shld edx, esi, 1 xor ebx, eax shr edi, 1 xor edi, ecx add esi, esi and ebx, 1431655765 ; 55555555H and edi, 1431655765 ; 55555555H xor ebx, esi xor edi, edxmov eax, DWORD PTR _ull$[esp+8] 
mov edx, DWORD PTR _ull$[esp+4] mov ecx, -1431655766 ; aaaaaaaaH lea esi, [eax+eax] lea edi, [edx+edx] and esi, ecx and edi, ecx and eax, ecx and edx, ecx shr eax, 1 shr edx, 1 or eax, esi or edx, edi ; Line 36mov edx, ebx mov esi, edi shrd edx, esi, 2 mov eax, ebx mov ecx, edi shld ecx, eax, 2 shld edi, ebx, 2 shr esi, 2 xor esi, ecx shl eax, 2 xor edx, eax shl ebx, 2 and esi, 858993459 ; 33333333H and edx, 858993459 ; 33333333H xor esi, edi xor edx, ebxmov ecx, -858993460 ; ccccccccH lea esi, [4*eax] lea edi, [4*edx] and esi, ecx and edi, ecx and eax, ecx and edx, ecx shr eax, 2 shr edx, 2 or eax, esi or edx, edi ; Line 38mov ebx, esi mov ecx, esi mov edi, edx mov eax, edx shrd edi, ebx, 4 shld ecx, eax, 4 shld esi, edx, 4 shl eax, 4 xor edi, eax shr ebx, 4 xor ebx, ecx shl edx, 4 and edi, 252645135 ; 0f0f0f0fH and ebx, 252645135 ; 0f0f0f0fH xor ebx, esi xor edi, edxmov ecx, 252645135 ; 0f0f0f0fH mov esi, ecx mov edi, ecx and esi, eax and edi, edx shl esi, 4 shl edi, 4 shr eax, 4 shr edx, 4 and eax, ecx and edx, ecx or eax, esi or edx, edi ; Line 40mov ebp, edi mov esi, ebx shrd ebp, esi, 8 mov eax, edi mov ecx, ebx shld ecx, eax, 8 shr esi, 8 xor esi, ecx shl eax, 8 xor ebp, eax and esi, 16711935 ; 00ff00ffH shld ebx, edi, 8 and ebp, 16711935 ; 00ff00ffH xor esi, ebx shl edi, 8 mov DWORD PTR _ull$11$[esp+12], esi xor ebp, edi; Line 42mov edi, DWORD PTR _ull$11$[esp+12] mov eax, ebp mov ecx, edi mov edx, ebp shrd edx, esi, 16 shld ecx, eax, 16 shr esi, 16 ; 00000010H shl eax, 16 ; 00000010H xor edx, eax xor esi, ecx shld edi, ebp, 16 and esi, 65535 ; 0000ffffH movzx ecx, dx xor esi, edi shl ebp, 16 ; 00000010H; Line 60 pop edimov eax, esi xor ecx, ebppop esixor edx, edx pop ebp or edx, ecx pop ebxbswap eax bswap edx ; Line 61 ret 0 _llreverse ENDP _TEXT ENDS END
shift & mask assignments operating on 8 bits and more can be translated into one or two BSWAP instructions instead of 9 or even 36 (in words: thirty-six) instructions!
In the function llreverse() it also fails to recognise that, because every operand is masked, no bit is ever shifted across the boundary between the two 32-bit registers, and thus the generation of SHLD and SHRD instructions is not necessary!
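Both observations can be sketched in C. The following is only an illustration under stated assumptions, not code from case11.c: the names byteswap32(), reverse32() and reverse64() are hypothetical, and the GCC/Clang builtin and the shift fallback are additions so that the sketch builds outside Visual C. The three steps for 1, 2 and 4 bits stay as shift & mask assignments, the steps for 8 and 16 bits are folded into one byte swap, and the 64-bit variant works on the two 32-bit halves separately, so that no shift ever spans both halves.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <stdlib.h>     /* declares _byteswap_ulong() */
#endif

/* Hypothetical helper: reverse the order of the 4 bytes of x. */
static uint32_t byteswap32(uint32_t x)
{
#if defined(_MSC_VER)
    return _byteswap_ulong(x);
#elif defined(__GNUC__)
    return __builtin_bswap32(x);
#else
    return (x >> 24) | ((x >> 8) & 0x0000FF00U)
         | ((x << 8) & 0x00FF0000U) | (x << 24);
#endif
}

/* Reverse the 32 bits of x: swap bits within each byte, then swap the bytes. */
static uint32_t reverse32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555U) | ((x << 1) & 0xAAAAAAAAU);
    x = ((x >> 2) & 0x33333333U) | ((x << 2) & 0xCCCCCCCCU);
    x = ((x >> 4) & 0x0F0F0F0FU) | ((x << 4) & 0xF0F0F0F0U);
    return byteswap32(x);   /* replaces the 8-bit and 16-bit steps */
}

/* Reverse the 64 bits of x: reverse each half on its own, then exchange the halves. */
static uint64_t reverse64(uint64_t x)
{
    uint32_t lo = (uint32_t) x;
    uint32_t hi = (uint32_t) (x >> 32);

    return ((uint64_t) reverse32(lo) << 32) | reverse32(hi);
}
```

On a 32-bit target the final exchange of the halves is just a matter of which register holds which result, which is why two BSWAP instructions suffice there.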
Repeat the previous steps with the alternate implementation; generate another assembly listing file case11.asm from the source file case11.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro ALTERNATE defined on the command line:
CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /GS- /Gy /Ox /Tccase11.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
Compiler Passes:
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
case11.c
Display the assembly listing file case11.asm created in step 4.:
TYPE case11.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case11.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _lreverse PUBLIC _llreverse ; Function compile flags: /Ogtpy ; COMDAT _lreverse _TEXT SEGMENT _ul$ = 8 ; size = 4 _lreverse PROC ; COMDAT ; File c:\users\stefan\desktop\case11.c ; Line 17 mov ecx, DWORD PTR _ul$[esp-4]As bad as above!mov edx, ecx shr edx, 1 lea eax, DWORD PTR [ecx+ecx] xor edx, eax lea eax, DWORD PTR [ecx+ecx] and edx, 1431655765 ; 55555555H xor edx, eaxlea eax, [ecx+ecx] shr ecx, 1 and eax, -1431655766 ; aaaaaaaaH and ecx, 1431655765 ; 55555555H or ecx, eax ; Line 19mov ecx, edx shr ecx, 2 lea eax, DWORD PTR [edx*4] xor ecx, eax lea eax, DWORD PTR [edx*4] and ecx, 858993459 ; 33333333H xor ecx, eaxlea eax, [ecx*4] shr ecx, 2 and eax, -858993460 ; ccccccccH and ecx, 858993459 ; 33333333H or ecx, eax ; Line 21mov edx, ecx mov eax, ecx shl eax, 4 shr edx, 4 xor edx, eax shl ecx, 4 and edx, 252645135 ; 0f0f0f0fH xor edx, ecxmov eax, ecx shr ecx, 4 shl eax, 4 and ecx, 252645135 ; 0f0f0f0fH and eax, -252645136 ; f0f0f0f0H or eax, ecx ; Line 23mov eax, edx mov ecx, edx shr eax, 8 shl ecx, 8 xor eax, ecx shl edx, 8 and eax, 16711935 ; 00ff00ffH xor eax, edx; Line 25rol eax, 16 ; 00000010Hbswap eax ; Line 29 ret 0 _lreverse ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _llreverse _TEXT SEGMENT_ull$11$ = 8 ; size = 4_ull$ = 8 ; size = 8 _llreverse PROC ; COMDAT ; File c:\users\stefan\desktop\case11.c ; Line 47mov edx, DWORD PTR _ull$[esp] mov ecx, edx push ebx push ebppush esimov esi, DWORD PTR _ull$[esp+8] mov ebx, esi mov eax, esi shld ecx, eax, 1push edimov edi, edx add eax, eax shrd ebx, edi, 1 shld edx, esi, 1 xor ebx, eax shr edi, 1 xor edi, ecx add esi, esi and ebx, 1431655765 ; 55555555H and edi, 1431655765 ; 55555555H xor ebx, esi xor edi, edxmov eax, DWORD PTR _ull$[esp] mov edx, DWORD PTR _ull$[esp-4] mov ecx, -1431655766 ; 
aaaaaaaaH lea esi, [eax+eax] lea edi, [edx+edx] and esi, ecx and edi, ecx and eax, ecx and edx, ecx shr eax, 1 shr edx, 1 or eax, esi or edx, edi ; Line 49mov edx, ebx mov esi, edi shrd edx, esi, 2 mov eax, ebx mov ecx, edi shld ecx, eax, 2 shld edi, ebx, 2 shr esi, 2 xor esi, ecx shl eax, 2 xor edx, eax shl ebx, 2 and esi, 858993459 ; 33333333H and edx, 858993459 ; 33333333H xor esi, edi xor edx, ebxmov ecx, -858993460 ; ccccccccH lea esi, [4*eax] lea edi, [4*edx] and esi, ecx and edi, ecx and eax, ecx and edx, ecx shr eax, 2 shr edx, 2 or eax, esi or edx, edi ; Line 51mov ebx, esi mov ecx, esi mov edi, edx mov eax, edx shrd edi, ebx, 4 shld ecx, eax, 4 shld esi, edx, 4 shl eax, 4 xor edi, eax shr ebx, 4 xor ebx, ecx shl edx, 4 and edi, 252645135 ; 0f0f0f0fH and ebx, 252645135 ; 0f0f0f0fH xor ebx, esi xor edi, edxmov ecx, 252645135 ; 0f0f0f0fH mov esi, ecx mov edi, ecx and esi, eax and edi, edx shl esi, 4 shl edi, 4 shr eax, 4 shr edx, 4 and eax, ecx and edx, ecx or eax, esi or edx, edi ; Line 53mov ebp, edi mov esi, ebx shrd ebp, esi, 8 mov eax, edi mov ecx, ebx shld ecx, eax, 8 shr esi, 8 xor esi, ecx shl eax, 8 xor ebp, eax and esi, 16711935 ; 00ff00ffH shld ebx, edi, 8 and ebp, 16711935 ; 00ff00ffH xor esi, ebx shl edi, 8 mov DWORD PTR _ull$11$[esp+12], esi xor ebp, edi; Line 55mov edi, DWORD PTR _ull$11$[esp+12] mov eax, ebp mov ecx, edi mov edx, ebp shrd edx, esi, 16 shld ecx, eax, 16 shr esi, 16 ; 00000010H shl eax, 16 ; 00000010H xor edx, eax xor esi, ecx shld edi, ebp, 16 and esi, 65535 ; 0000ffffH movzx ecx, dx xor esi, edi shl ebp, 16 ; 00000010H; Line 60 pop edimov eax, esi xor ecx, ebppop esixor edx, edx pop ebp or edx, ecx pop ebxbswap eax bswap edx ; Line 61 ret 0 _llreverse ENDP _TEXT ENDS END
Create the text file case12.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

__inline
unsigned long htonl(unsigned long ul)
{
#if _MSC_VER >= 1900
    __asm movbe eax, ul
#else
    __asm mov eax, ul
    __asm bswap eax
#endif
}

int main(int argc)
{
    unsigned long array[] = {'MSFT', 'MSVC', 'POOR', 'CODE'};

    argc = htonl(argc);

    for (argc = 0; argc < sizeof(array) / sizeof(*array); argc++)
        array[argc] = htonl(array[argc]);
}
Generate the assembly listing file case12.asm from the source file case12.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /GS- /Gy /Ox /Tccase12.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
Compiler Passes:
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0
C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0
case12.c
Display the assembly listing file case12.asm created in step 2.:
TYPE case12.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
    TITLE C:\Users\Stefan\Desktop\case12.c
    .686P
    .XMM
    include listing.inc
    .model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC _htonl
PUBLIC _main
; Function compile flags: /Ogtpy
; COMDAT _htonl
_TEXT SEGMENT
_ul$ = 8 ; size = 4
_htonl PROC ; COMDAT
; File c:\users\stefan\desktop\case12.c
; Line 7
    movbe eax, DWORD PTR _ul$[esp-4]
; Line 12
    ret 0
_htonl ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT _main
_TEXT SEGMENT
_array$ = -8 ; size = 16
_ul$ = 8 ; size = 4
_argc$ = 8 ; size = 4
_main PROC ; COMDAT
; File c:\users\stefan\desktop\case12.c
; Line 15
    sub esp, 16 ; 00000010H
; Line 16
    mov DWORD PTR _array$[esp+8], 1297303124 ; 4d534654H
    mov DWORD PTR _array$[esp+12], 1297307203 ; 4d535643H
    mov DWORD PTR _array$[esp+16], 1347374930 ; 504f4f52H
    mov DWORD PTR _array$[esp+20], 1129268293 ; 434f4445H
; Line 18
    movbe eax, DWORD PTR _argc$[esp+12]
; Line 20
    xor ecx, ecx
    npad 6
$LL4@main:
; Line 21
    mov eax, DWORD PTR _array$[esp+ecx*4+16]
    mov DWORD PTR _ul$[esp+12], eax
    movbe eax, DWORD PTR _ul$[esp+12]
    movbe eax, DWORD PTR _array$[esp+ecx*4+16]
    mov DWORD PTR _array$[esp+ecx*4+16], eax
    inc ecx
    cmp ecx, 4
    jb SHORT $LL4@main
; Line 22
    add esp, 16 ; 00000010H
    ret 0
_main ENDP
_TEXT ENDS
END
Notice the superfluous in(s)ane transfer of the EAX register to and from the (intermediate) variable _ul$ generated for line 21!
Generate another assembly listing file case12.asm
from
the source file case12.c
created in step 1., now
using the Visual C 2010 compiler for the
x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /GS- /Gy /Ox /Tccase12.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\cl.exe: Version 16.00.40219.1 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1.dll: Version 16.00.40219.400 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1xx.dll: Version 16.00.40219.400 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c2.dll: Version 16.00.40219.449 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\link.exe: Version 10.00.40219.386 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\mspdb100.dll: Version 10.00.40219.478 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\1033\clui.dll: Version 16.00.40219.1 case12.c
Display the assembly listing file case12.asm
created in
step 4.:
TYPE case12.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.449
    TITLE C:\Users\Stefan\Desktop\case12.c
    .686P
    .XMM
    include listing.inc
    .model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC _htonl
; Function compile flags: /Ogtpy
; COMDAT _htonl
_TEXT SEGMENT
_ul$ = 8 ; size = 4
_htonl PROC ; COMDAT
; File c:\users\stefan\desktop\case12.c
; Line 9
    mov eax, DWORD PTR _ul$[esp-4]
; Line 10
    bswap eax
; Line 11
    ret 0
_htonl ENDP
_TEXT ENDS
PUBLIC _main
; Function compile flags: /Ogtpy
; COMDAT _main
_TEXT SEGMENT
_array$ = -16 ; size = 16
$T1040 = 8 ; size = 4
_argc$ = 8 ; size = 4
_main PROC ; COMDAT
; Line 15
    push ebp
    mov ebp, esp
    sub esp, 16 ; 00000010H
; Line 16
    mov DWORD PTR _array$[ebp], 1297303124 ; 4d534654H
    mov DWORD PTR _array$[ebp+4], 1297307203 ; 4d535643H
    mov DWORD PTR _array$[ebp+8], 1347374930 ; 504f4f52H
    mov DWORD PTR _array$[ebp+12], 1129268293 ; 434f4445H
; Line 18
    mov eax, DWORD PTR _argc$[ebp]
    bswap eax
; Line 20
    xor edx, edx
$LL3@main:
    lea ecx, DWORD PTR _array$[ebp+edx*4]
; Line 21
    mov eax, DWORD PTR [ecx]
    mov DWORD PTR $T1040[ebp], eax
    mov eax, DWORD PTR $T1040[ebp]
    bswap eax
    inc edx
    mov DWORD PTR [ecx], eax
    cmp edx, 4
    jb SHORT $LL3@main
; Line 22
    leave
    ret 0
_main ENDP
_TEXT ENDS
END
Notice the superfluous in(s)ane transfer of the EAX register to and from the intermediate (temporary) variable $T1040 generated for line 21!
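For reference, the byte swap that both listings implement with a single BSWAP or MOVBE instruction can be written portably in C. The following is only a minimal sketch and not the content of case12.c; the names swap32 and swap32_selftest are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of a 32-bit value,
   which is what one BSWAP (or one MOVBE from memory) does. */
static uint32_t swap32(uint32_t value)
{
    return (value >> 24)
         | ((value >> 8) & 0x0000FF00u)
         | ((value << 8) & 0x00FF0000u)
         | (value << 24);
}

/* Self-check with two of the array constants shown in the listings. */
static void swap32_selftest(void)
{
    assert(swap32(0x4D534654u) == 0x5446534Du);
    assert(swap32(swap32(0x434F4445u)) == 0x434F4445u);
}
```

Since the swap needs only one register, a loop over an array can load, swap and store each element without any intermediate variable in memory.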
Case 13: a bug in the handling of __forceinline versus __inline by the Visual C 2017 compiler (and all previous versions too) when specified for a __fastcall function with a body written in inline assembler.
Note: the advice against this combination given in the MSDN article Using and Preserving Registers in Inline Assembly does not apply here: there is no code which might clobber any register!
Create the text file case13.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

__forceinline // here be dragons!
unsigned __fastcall lfsr32(unsigned argument, unsigned polynomial)
{
#ifdef MITIGATE
    return argument & 0x80000000 ? polynomial ^ (argument << 1) : argument << 1;
#else
    __asm // 32-bit linear feedback shift register
    {
        add ecx, ecx ; ecx = argument << 1
        sbb eax, eax ; eax = CF ? -1 : 0
        and eax, edx ; eax = CF ? polynomial : 0
        xor eax, ecx ; eax = (argument << 1) ^ (CF ? polynomial : 0)
    }
#endif
}

int main()
{
    unsigned lfsr = 123456789;
    unsigned period = 0;

    do
    {
        period++;
        lfsr = lfsr32(lfsr, 0xC5);
    } while (lfsr != 123456789);

    return period;
}
Note: the constant 0xC5 represents the primitive polynomial x^32 + x^7 + x^6 + x^2 + x^0; it gives the 32-bit LFSR its maximum period length of 2^32 − 1.
Generate the assembly listing file case13.asm
from the
source file case13.c
created in step 1., using the
Visual C 2017 compiler for the
x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase13.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case13.c case13.c(4): warning C4100: 'polynomial': unreferenced formal parameter case13.c(4): warning C4100: 'argument': unreferenced formal parameter
Display the assembly listing file case13.asm
created in
step 2.:
TYPE case13.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
    TITLE C:\Users\Stefan\Desktop\case13.c
    .686P
    .XMM
    include listing.inc
    .model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC @lfsr32@8
PUBLIC _main
; Function compile flags: /Ogtpy
; COMDAT @lfsr32@8
_TEXT SEGMENT
@lfsr32@8 PROC ; COMDAT
; _argument$ = ecx
; _polynomial$ = edx
; File c:\users\stefan\desktop\case13.c
; Line 11
    add ecx, ecx
; Line 12
    sbb eax, eax
; Line 13
    and eax, edx
; Line 14
    xor eax, ecx
; Line 15
    ret 0
@lfsr32@8 ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT _main
_TEXT SEGMENT
_main PROC ; COMDAT
; File c:\users\stefan\desktop\case13.c
; Line 21
    push ebx
    mov eax, 123456789 ; 075bcd15H
; Line 22
    mov edx, 197 ; 000000c5H
    xor ebx, ebx
    xor edx, edx
$LL4@main:
; Line 26
    inc edx
    inc ebx
; Line 27
    mov ecx, eax
    add ecx, ecx
    sbb eax, eax
    and eax, edx
    xor eax, ecx
; Line 28
    cmp eax, 123456789 ; 075bcd15H
    jne SHORT $LL4@main
; Line 30
    mov eax, edx
    mov eax, ebx
    pop ebx
; Line 31
    ret 0
_main ENDP
_TEXT ENDS
END
The variable lfsr alias argument, held in register ECX (the first argument of functions with __fastcall calling convention), is not initialized with the constant 123456789; register EDX (the second argument of functions with __fastcall calling convention) is never loaded with the constant 0xC5; and the return value from the (inlined) function, held in register EAX, is not loaded back into register ECX!
After replacing __forceinline with __inline, the Visual C compiler generates correct code, but does not inline the function any more.
Note: replacing the function call of course avoids this compiler bug too, but does not yield optimised code!
Generate another assembly listing file case13.asm
from
the source file case13.c
created in step 1., using
the Visual C 2017 compiler for the
x86 alias I386 processor architecture,
with the preprocessor macro MITIGATE
defined on the
command line:
CL.EXE /Bv /c /DMITIGATE /Fa /FoNUL: /Gy /Ox /Tccase13.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case13.c
Display the assembly listing file case13.asm
created in
step 4.:
TYPE case13.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
    TITLE C:\Users\Stefan\Desktop\case13.c
    .686P
    .XMM
    include listing.inc
    .model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC @lfsr32@8
PUBLIC _main
; Function compile flags: /Ogtpy
; COMDAT @lfsr32@8
_TEXT SEGMENT
@lfsr32@8 PROC ; COMDAT
; _argument$ = ecx
; _polynomial$ = edx
; File c:\users\stefan\desktop\case13.c
; Line 5
    push esi
; Line 7
    lea esi, DWORD PTR [ecx+ecx]
    mov eax, esi
    xor eax, edx
    test ecx, ecx
    cmovns eax, esi
    pop esi
    add ecx, ecx
    sbb eax, eax
    and eax, edx
    xor eax, ecx
; Line 17
    ret 0
@lfsr32@8 ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT _main
_TEXT SEGMENT
_main PROC ; COMDAT
; File c:\users\stefan\desktop\case13.c
; Line 21
    mov ecx, 123456789 ; 075bcd15H
; Line 22
    xor eax, eax
    push esi
$LL4@main:
; Line 7
    lea esi, DWORD PTR [ecx+ecx]
; Line 26
    inc eax
; Line 28
    mov edx, esi
    xor edx, 197 ; 000000c5H
    test ecx, ecx
    cmovns edx, esi
    mov ecx, edx
IFDEF VARIANT
    lea edx, DWORD PTR [ecx+ecx]
    sar ecx, 31
    and ecx, 197 ; 000000c5H
    xor ecx, edx
ELSE
    add ecx, ecx
    sbb edx, edx
    and edx, 197 ; 000000c5H
    xor ecx, edx
ENDIF
    cmp ecx, 123456789 ; 075bcd15H
    jne SHORT $LL4@main
; Line 30
    pop esi
; Line 31
    ret 0
_main ENDP
_TEXT ENDS
END
While the generated code is correct, the compiler fails to perform an obvious optimisation: instead of explicitly setting the sign flag SF with a separate TEST instruction, the carry flag CF, set from the most significant alias sign bit by a SHL (as well as an ADD) instruction, can be used here; this variant also doesn’t need the extraneous register ESI to preserve the value of the shifted (or doubled) variable!
Note: the assembly listing also shows an alternative variant.
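The ADD, SBB, AND, XOR sequence can also be expressed in C without a conditional. The following minimal sketch compares the ternary form from the MITIGATE branch with a mask form; the function names are hypothetical, and fixed-width types from <stdint.h> stand in for unsigned so that the sketch does not depend on the width of int.

```c
#include <assert.h>
#include <stdint.h>

/* Ternary form, as in the MITIGATE branch of case13.c. */
static uint32_t lfsr32_ternary(uint32_t argument, uint32_t polynomial)
{
    return argument & 0x80000000u ? polynomial ^ (argument << 1) : argument << 1;
}

/* Mask form mirroring ADD, SBB, AND, XOR. */
static uint32_t lfsr32_mask(uint32_t argument, uint32_t polynomial)
{
    uint32_t carry = argument >> 31;  /* the bit that ADD moves into CF */
    uint32_t mask = 0u - carry;       /* SBB: all ones if CF is set, else 0 */

    return (mask & polynomial) ^ (argument << 1);
}

/* Both forms must agree along the sequence started in main(). */
static void lfsr32_left_selftest(void)
{
    uint32_t state = 123456789u;

    for (int i = 0; i < 100000; i++) {
        assert(lfsr32_ternary(state, 0xC5u) == lfsr32_mask(state, 0xC5u));
        state = lfsr32_mask(state, 0xC5u);
    }
}
```

Whether a particular compiler turns the mask form into the shorter instruction sequence is not guaranteed; the sketch only shows that both forms compute the same value.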
Case 14: the reversed case of the second variant from (the previous) case 13.
Create the text file case14.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

__forceinline // here be dragons!
unsigned __fastcall lfsr32(unsigned argument, unsigned polynomial)
{
#ifdef MITIGATE
    return argument & 1 ? polynomial ^ (argument >> 1) : argument >> 1;
#else
    __asm // 32-bit linear feedback shift register
    {
        shr ecx, 1 ; ecx = argument >> 1
        sbb eax, eax ; eax = CF ? -1 : 0
        and eax, edx ; eax = CF ? polynomial : 0
        xor eax, ecx ; eax = (argument >> 1) ^ (CF ? polynomial : 0)
    }
#endif
}

int main()
{
    unsigned lfsr = 123456789;
    unsigned period = 0;

    do
    {
        period++;
        lfsr = lfsr32(lfsr, 0xA3000000);
    } while (lfsr != 123456789);

    return period;
}
Note: the constant 0xA3000000 represents the same primitive polynomial x^32 + x^30 + x^26 + x^25 + x^0 alias x^32 + x^7 + x^6 + x^2 + x^0 as 0xC5; it’s just the bit-reversed value and gives the 32-bit LFSR its maximum period length of 2^32 − 1.
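The relation between the two constants can be checked with a few lines of C. This is a minimal sketch; reverse32 and reverse32_selftest are hypothetical helper names that do not appear in case14.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reverse the order of the 32 bits of a value. */
static uint32_t reverse32(uint32_t value)
{
    uint32_t result = 0;

    for (int bit = 0; bit < 32; bit++) {
        result = (result << 1) | (value & 1u);  /* append the lowest bit */
        value >>= 1;
    }
    return result;
}

/* 0xC5 has bits 7, 6, 2 and 0 set; reversed, these are bits 24, 25, 29 and 31. */
static void reverse32_selftest(void)
{
    assert(reverse32(0xC5u) == 0xA3000000u);
    assert(reverse32(0xA3000000u) == 0xC5u);
}
```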
Generate the assembly listing file case14.asm
from the
source file case14.c
created in step 1., using the
Visual C 2017 compiler for the
x86 alias I386 processor architecture,
with the preprocessor macro MITIGATE
defined on the
command line:
CL.EXE /Bv /c /DMITIGATE /Fa /FoNUL: /Gy /Ox /Tccase14.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case14.c
Display the assembly listing file case14.asm
created in
step 2.:
TYPE case14.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
    TITLE C:\Users\Stefan\Desktop\case14.c
    .686P
    .XMM
    include listing.inc
    .model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC @lfsr32@8
PUBLIC _main
; Function compile flags: /Ogtpy
; COMDAT @lfsr32@8
_TEXT SEGMENT
@lfsr32@8 PROC ; COMDAT
; _argument$ = ecx
; _polynomial$ = edx
; File c:\users\stefan\desktop\case14.c
; Line 5
    push esi
; Line 7
    mov esi, ecx
    shr esi, 1
    mov eax, esi
    xor eax, edx
    and cl, 1
    cmove eax, esi
    pop esi
    shr ecx, 1
    sbb eax, eax
    and eax, edx
    xor eax, ecx
; Line 17
    ret 0
@lfsr32@8 ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT _main
_TEXT SEGMENT
_main PROC ; COMDAT
; File c:\users\stefan\desktop\case14.c
; Line 20
    push esi
; Line 21
    mov eax, 123456789 ; 075bcd15H
; Line 22
    xor esi, esi
    xor ecx, ecx
$LL4@main:
; Line 7
    mov edx, eax
    mov ecx, eax
    shr edx, 1
; Line 26
    inc esi
    inc ecx
; Line 28
    mov eax, edx
    xor eax, -1560281088 ; a3000000H
; Line 7
    and cl, 1
; Line 28
    cmove eax, edx
    and eax, 1
    neg eax
    and eax, -1560281088 ; a3000000H
    xor eax, edx
    cmp eax, 123456789 ; 075bcd15H
    jne SHORT $LL4@main
; Line 30
    mov eax, esi
    pop esi
    mov eax, ecx
; Line 31
    ret 0
_main ENDP
_TEXT ENDS
END
While the generated code is correct, the compiler fails to perform an obvious optimisation: instead of evaluating both terms of the ternary operator first and then conditionally overwriting one result with the other, depending on the least significant bit of the original value as determined with a separate AND instruction, the carry flag CF, already set from the least significant bit by the SHR instruction, can be used here; this variant also doesn’t need the extraneous register ECX to preserve the original value of the shifted variable!
Note: the assembly listing shows an alternative, equally optimised variant.
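The AND, NEG, AND, XOR sequence that appears in the listing corresponds to a mask expression in C. The following minimal sketch checks it against the ternary form from the MITIGATE branch; the function names are hypothetical and uint32_t stands in for unsigned.

```c
#include <assert.h>
#include <stdint.h>

/* Ternary form, as in the MITIGATE branch of case14.c. */
static uint32_t lfsr32r_ternary(uint32_t argument, uint32_t polynomial)
{
    return argument & 1u ? polynomial ^ (argument >> 1) : argument >> 1;
}

/* Mask form: isolate bit 0 (AND), negate it into a mask (NEG),
   select the polynomial (AND) and combine (XOR). */
static uint32_t lfsr32r_mask(uint32_t argument, uint32_t polynomial)
{
    uint32_t mask = 0u - (argument & 1u);  /* all ones if bit 0 is set, else 0 */

    return (mask & polynomial) ^ (argument >> 1);
}

/* Both forms must agree along the sequence started in main(). */
static void lfsr32_right_selftest(void)
{
    uint32_t state = 123456789u;

    for (int i = 0; i < 100000; i++) {
        assert(lfsr32r_ternary(state, 0xA3000000u) == lfsr32r_mask(state, 0xA3000000u));
        state = lfsr32r_mask(state, 0xA3000000u);
    }
}
```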
Create the text file case15.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

unsigned long long right()
{
    unsigned long long lfsr = 0x0123456789ABCDEF;
    unsigned long long period = 0;

    do
    {
        period++;
#ifdef ALTERNATE
        lfsr = (-((long long) lfsr & 1) & 0xD800000000000000) ^ (lfsr >> 1);
#else
        lfsr = lfsr & 1 ? 0xD800000000000000 ^ (lfsr >> 1) : lfsr >> 1;
#endif
    } while (lfsr != 0x0123456789ABCDEF);

    return period;
}

unsigned long long left()
{
    unsigned long long lfsr = 0x0123456789ABCDEF;
    unsigned long long period = 0;

    do
    {
        period++;
#ifdef ALTERNATE
        lfsr = (-((long long) lfsr < 0) & 0x1B) ^ (lfsr << 1);
#else
        lfsr = (long long) lfsr < 0 ? 0x1B ^ (lfsr << 1) : lfsr << 1;
#endif
    } while (lfsr != 0x0123456789ABCDEF);

    return period;
}
Note: both constants 0xD800000000000000 and 0x1B represent the primitive polynomial x^64 + x^63 + x^61 + x^60 + x^0 alias x^64 + x^4 + x^3 + x^1 + x^0; it gives the 64-bit LFSR its maximum period length of 2^64 − 1.
Generate the assembly listing file case15.asm
from the
source file case15.c
created in step 1., using the
Visual C 2017 compiler for the
x64 alias AMD64 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase15.c /W4 /Zl
Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0 case15.c
Display the assembly listing file case15.asm
created in
step 2.:
TYPE case15.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
include listing.inc
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC right
PUBLIC left
; Function compile flags: /Ogtpy
; COMDAT right
_TEXT SEGMENT
right PROC ; COMDAT
; File c:\users\stefan\desktop\case15.c
; Line 5
    mov r9, 81985529216486895 ; 0123456789abcdefH
; Line 6
    xor r8d, r8d
    mov rax, r9
    mov r10, -2882303761517117440 ; d800000000000000H
    npad 6
$LL4@right:
; Line 14
    mov rdx, rax
    movzx ecx, al
    shr rdx, 1
    inc r8
; Line 16
    mov rax, rdx
    xor rax, r10
    and cl, 1
    cmove rax, rdx
    and eax, 1
    neg rax
    and rax, r10
    xor rax, rdx
    cmp rax, r9
    jne SHORT $LL4@right
; Line 18
    mov rax, r8
; Line 19
    ret 0
right ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT left
_TEXT SEGMENT
left PROC ; COMDAT
; File c:\users\stefan\desktop\case15.c
; Line 23
    mov r9, 81985529216486895 ; 0123456789abcdefH
; Line 24
    xor eax, eax
    mov rcx, r9
    npad 1
$LL4@left:
; Line 34
    mov rdx, rcx
    lea r8, QWORD PTR [rcx+rcx]
    inc rax
    mov rcx, r8
    xor rcx, 27
    test rdx, rdx
    cmovns rcx, r9
    add rcx, rcx
    sbb rdx, rdx
    and rdx, 27
    xor rcx, rdx
    cmp rcx, r9
    jne SHORT $LL4@left
; Line 37
    ret 0
left ENDP
_TEXT ENDS
END
While the code generated for the function right() is correct, the compiler fails to perform an obvious optimisation: instead of evaluating both terms of the ternary operator first and then conditionally overwriting one result with the other, depending on the least significant bit of the original value as determined with a separate AND instruction, the carry flag CF, already set from the least significant bit by the SHR instruction, can be used here; this variant also doesn’t need the extraneous register RCX to preserve the original value of the shifted variable!
Additionally the registers RAX and R8 can be swapped, rendering the MOV instruction generated for line 14 superfluous.
While the code generated for the function left() is correct too, the compiler likewise fails to perform an even more obvious optimisation: instead of explicitly setting the sign flag SF with a separate TEST instruction, the carry flag CF, set from the most significant alias sign bit by a SHL (as well as an ADD) instruction, can be used here; this variant also doesn’t need the extraneous register R8 to preserve the value of the shifted (or doubled) variable!
Generate another assembly listing file case15.asm
from
the source file case15.c
created in step 1., now
using the Visual C 2017 compiler for the
x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase15.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case15.c
Display the assembly listing file case15.asm
created in
step 4.:
TYPE case15.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
    TITLE C:\Users\Stefan\Desktop\case15.c
    .686P
    .XMM
    include listing.inc
    .model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC _right
PUBLIC _left
; Function compile flags: /Ogtpy
; COMDAT _right
_TEXT SEGMENT
_period$ = -8 ; size = 8
_right PROC ; COMDAT
; File c:\users\stefan\desktop\case15.c
; Line 4
    sub esp, 8
    push esi
    xorps xmm0, xmm0
; Line 5
    mov ecx, -1985229329 ; 89abcdefH
; Line 6
    movlpd QWORD PTR _period$[esp+12], xmm0
    mov eax, 19088743 ; 01234567H
    mov esi, DWORD PTR _period$[esp+16]
    xor esi, esi
    push edi
    mov edi, DWORD PTR _period$[esp+16]
    xor edi, edi
$LL4@right:
; Line 10
    add edi, 1
; Line 14
    mov edx, ecx
    adc esi, 0
    and ecx, 1
    shrd edx, eax, 1
    shrd ecx, eax, 1
    shr eax, 1
    or ecx, 0
    mov ecx, edx
    je SHORT $LN7@right
    xor ecx, 0
    xor eax, -671088640 ; d8000000H
$LN7@right:
    and edx, 1
    neg edx
    and edx, -671088640 ; d8000000H
    xor eax, edx
; Line 16
    cmp ecx, -1985229329 ; 89abcdefH
    jne SHORT $LL4@right
    cmp eax, 19088743 ; 01234567H
    jne SHORT $LL4@right
; Line 18
    mov eax, edi
    mov edx, esi
    pop edi
    pop esi
; Line 19
    add esp, 8
    ret 0
_right ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT _left
_TEXT SEGMENT
_period$ = -8 ; size = 8
_left PROC ; COMDAT
; File c:\users\stefan\desktop\case15.c
; Line 22
    sub esp, 8
    push ebx
    xorps xmm0, xmm0
; Line 23
    mov eax, -1985229329 ; 89abcdefH
    push esi
; Line 24
    movlpd QWORD PTR _period$[esp+16], xmm0
    mov ecx, 19088743 ; 01234567H
    mov ebx, DWORD PTR _period$[esp+16]
    xor ebx, ebx
    push edi
    mov edi, DWORD PTR _period$[esp+24]
    xor edi, edi
$LL4@left:
; Line 28
    add ebx, 1
    mov edx, eax
    mov esi, ecx
    adc edi, 0
    shld esi, edx, 1
    add edx, edx
    add eax, eax
    adc ecx, ecx
; Line 32
    test ecx, ecx
    jg SHORT $LN6@left
    jl SHORT $LN11@left
    test eax, eax
    jae SHORT $LN6@left
$LN11@left:
    mov eax, edx
    mov ecx, esi
    xor eax, 27 ; 0000001bH
    xor ecx, 0
    jmp SHORT $LN7@left
$LN6@left:
    mov eax, edx
    mov ecx, esi
$LN7@left:
    sbb edx, edx
    and edx, 27 ; 0000001bH
    xor eax, edx
; Line 34
    cmp eax, -1985229329 ; 89abcdefH
    jne SHORT $LL4@left
    cmp ecx, 19088743 ; 01234567H
    jne SHORT $LL4@left
; Line 36
    mov edx, edi
    mov eax, ebx
    pop edi
    pop esi
    pop ebx
; Line 37
    add esp, 8
    ret 0
_left ENDP
_TEXT ENDS
END
The code generated for the function right() is totally screwed up: the variable period is allocated on the stack, zeroed using the SSE register XMM0, then loaded into the registers ESI and EDI, but never used again; instead of holding the variable period in the register pair EDX:EAX used for the return value, it is held in the registers EDI and ESI, which have to be transferred into EDX:EAX upon exit; register ECX, which holds the lower half of the variable lfsr, is clobbered inside the loop without necessity and has to be reloaded; the result of the AND instruction present in the EFLAGS register is ignored, and evaluated again with an extraneous OR instruction; the XOR instruction with immediate operand 0 has no effect and is superfluous too!
The code generated for the function left() is even worse: again the variable period is allocated on the stack, zeroed using the SSE register XMM0, then loaded into the registers EBX and EDI, but never used again; instead of holding the variable period in the register pair EDX:EAX used for the return value, it is held in the registers EBX and EDI, which have to be transferred into EDX:EAX upon exit; instead of using the carry flag CF already set by the SHLD instruction, or the sign flag SF set by the first TEST instruction, a full comparison against 0 is performed, using three conditional branch instructions; the registers EAX and ECX, which hold the variable lfsr, are copied without necessity into the registers EDX and ESI, which are then used for the shift and exclusive-or operation; the XOR instruction with immediate operand 0 has no effect and is superfluous!
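On a 32-bit target each 64-bit step has to be carried out on two 32-bit halves. The following minimal sketch shows one way to do this with a mask instead of branches and checks it against the 64-bit ternary forms; all function names are hypothetical, and the sketch makes no claim about what any compiler generates for it.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: one step of right() and left() on a 64-bit value. */
static uint64_t step_right64(uint64_t lfsr)
{
    return lfsr & 1u ? 0xD800000000000000ULL ^ (lfsr >> 1) : lfsr >> 1;
}

static uint64_t step_left64(uint64_t lfsr)
{
    return lfsr >> 63 ? 0x1BULL ^ (lfsr << 1) : lfsr << 1;
}

/* The same steps on two 32-bit halves. */
static void step_right32x2(uint32_t *low, uint32_t *high)
{
    uint32_t mask = 0u - (*low & 1u);             /* all ones if bit 0 drops out */

    *low = (*low >> 1) | (*high << 31);           /* like SHRD low, high, 1 */
    *high = (*high >> 1) ^ (mask & 0xD8000000u);  /* SHR, then masked XOR */
}

static void step_left32x2(uint32_t *low, uint32_t *high)
{
    uint32_t mask = 0u - (*high >> 31);           /* all ones if bit 63 drops out */

    *high = (*high << 1) | (*low >> 31);          /* like SHLD high, low, 1 */
    *low = (*low << 1) ^ (mask & 0x1Bu);          /* shift, then masked XOR */
}

/* Run both representations side by side from the start value of case15.c. */
static void halves_selftest(void)
{
    uint64_t right = 0x0123456789ABCDEFULL, left = right;
    uint32_t rlo = (uint32_t) right, rhi = (uint32_t) (right >> 32);
    uint32_t llo = rlo, lhi = rhi;

    for (int i = 0; i < 100000; i++) {
        right = step_right64(right);
        left = step_left64(left);
        step_right32x2(&rlo, &rhi);
        step_left32x2(&llo, &lhi);
        assert(right == (((uint64_t) rhi << 32) | rlo));
        assert(left == (((uint64_t) lhi << 32) | llo));
    }
}
```

Because the polynomial constants touch only one half each (the upper half for right(), the lower half for left()), the other half needs no exclusive-or at all.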
Generate another assembly listing file case15.asm
from
the source file case15.c
created in step 1., using
the Visual C 2017 compiler for the
x86 alias I386 processor architecture,
with the preprocessor macro ALTERNATE
defined on the
command line:
CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tccase15.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case15.c
Display the assembly listing file case15.asm
created in
step 6.:
TYPE case15.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
    TITLE C:\Users\Stefan\Desktop\case15.c
    .686P
    .XMM
    include listing.inc
    .model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC _right
PUBLIC _left
; Function compile flags: /Ogtpy
; COMDAT _right
_TEXT SEGMENT
_period$ = -8 ; size = 8
_right PROC ; COMDAT
; File c:\users\stefan\desktop\case15.c
; Line 4
    sub esp, 8
    push ebx
    xorps xmm0, xmm0
; Line 5
    mov edx, -1985229329 ; 89abcdefH
    push esi
; Line 6
    movlpd QWORD PTR _period$[esp+16], xmm0
    mov eax, 19088743 ; 01234567H
    mov ebx, DWORD PTR _period$[esp+16]
    push edi
    mov edi, DWORD PTR _period$[esp+24]
$LL4@right:
; Line 10
    add ebx, 1
; Line 12
    mov ecx, edx
    adc edi, 0
    xor esi, esi
    and ecx, 1
    neg ecx
    adc esi, esi
    xor ecx, ecx
    shrd edx, eax, 1
    neg esi
    and esi, -671088640 ; d8000000H
    shr eax, 1
    xor edx, ecx
    xor eax, esi
; Line 16
    cmp edx, -1985229329 ; 89abcdefH
    jne SHORT $LL4@right
    cmp eax, 19088743 ; 01234567H
    jne SHORT $LL4@right
; Line 18
    mov edx, edi
    mov eax, ebx
    pop edi
    pop esi
    pop ebx
; Line 19
    add esp, 8
    ret 0
_right ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT _left
_TEXT SEGMENT
$T1 = -8 ; size = 8
_period$ = -8 ; size = 8
_left PROC ; COMDAT
; File c:\users\stefan\desktop\case15.c
; Line 22
    sub esp, 8
    push ebx
    xorps xmm0, xmm0
; Line 23
    mov ecx, -1985229329 ; 89abcdefH
    push esi
; Line 24
    movlpd QWORD PTR _period$[esp+16], xmm0
    mov eax, 19088743 ; 01234567H
    mov edx, DWORD PTR _period$[esp+20]
    mov esi, DWORD PTR _period$[esp+16]
    push edi
$LL4@left:
; Line 28
    add esi, 1
    adc edx, 0
; Line 30
    test eax, eax
    jg SHORT $LN6@left
    jl SHORT $LN11@left
    test ecx, ecx
    jae SHORT $LN6@left
$LN11@left:
    mov edi, 27 ; 0000001bH
    xor ebx, ebx
    jmp SHORT $LN7@left
$LN6@left:
    xorps xmm0, xmm0
    movlpd QWORD PTR $T1[esp+20], xmm0
    mov ebx, DWORD PTR $T1[esp+24]
    mov edi, DWORD PTR $T1[esp+20]
$LN7@left:
    shld eax, ecx, 1
    add ecx, ecx
    xor eax, ebx
    xor ecx, edi
; Line 34
    cmp ecx, -1985229329 ; 89abcdefH
    jne SHORT $LL4@left
    cmp eax, 19088743 ; 01234567H
    jne SHORT $LL4@left
; Line 36
    pop edi
    mov eax, esi
    pop esi
    pop ebx
; Line 37
    add esp, 8
    ret 0
_left ENDP
_TEXT ENDS
END
The generated code is as bad as that from step 4!
Create the text file case16.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

__forceinline
long lfsr32(long argument, long polynomial)
{
    return (((long long) argument >> 32) & polynomial) ^ (argument << 1);
}

int main()
{
    unsigned lfsr = 123456789;
    unsigned period = 0;

    do
    {
        period++;
        lfsr = lfsr32(lfsr, 0xC5);
    } while (lfsr != 123456789);

    return period;
}
Note: the constant 0xC5 represents the primitive polynomial x^32 + x^7 + x^6 + x^2 + x^0; it gives the 32-bit LFSR its maximum period length of 2^32 − 1.
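The expression ((long long) argument >> 32) in lfsr32() produces a mask from the sign bit: the cast sign-extends the 32-bit value to 64 bits, and the shift keeps only the copies of the sign bit, which is what a single CDQ instruction leaves in register EDX. A minimal sketch follows; the name sign_mask is hypothetical, int32_t and int64_t stand in for the 32-bit long and the 64-bit long long of Visual C, and the right shift of a negative value is assumed to be arithmetic, which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: -1 if the argument is negative, else 0.
   Assumes an arithmetic right shift for negative values. */
static int32_t sign_mask(int32_t argument)
{
    return (int32_t) ((int64_t) argument >> 32);
}

/* Self-check at the boundaries of the 32-bit range. */
static void sign_mask_selftest(void)
{
    assert(sign_mask(0) == 0);
    assert(sign_mask(INT32_MAX) == 0);
    assert(sign_mask(-1) == -1);
    assert(sign_mask(INT32_MIN) == -1);
}
```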
Generate the assembly listing file case16.asm
from the
source file case16.c
created in step 1., using the
Visual C 2017 compiler for the
x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase16.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case16.c
Display the assembly listing file case16.asm
created in
step 2.:
TYPE case16.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
    TITLE C:\Users\Stefan\Desktop\case16.c
    .686P
    .XMM
    include listing.inc
    .model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC _lfsr32
PUBLIC _main
; Function compile flags: /Ogtpy
; COMDAT _lfsr32
_TEXT SEGMENT
_argument$ = 8 ; size = 4
_polynomial$ = 12 ; size = 4
_lfsr32 PROC ; COMDAT
; File c:\users\stefan\desktop\case16.c
; Line 5
    push esi
; Line 6
    mov esi, DWORD PTR _argument$[esp]
    mov eax, esi
    mov eax, DWORD PTR _argument$[esp-4]
    cdq
    mov ecx, edx
    and edx, DWORD PTR _polynomial$[esp-4]
    sar ecx, 31 ; 0000001fH
    lea eax, DWORD PTR [esi+esi]
    add eax, eax
    xor eax, edx
    pop esi
; Line 7
    ret 0
_lfsr32 ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT _main
_TEXT SEGMENT
_main PROC ; COMDAT
; File c:\users\stefan\desktop\case16.c
; Line 10
    push esi
    push edi
; Line 11
    mov ecx, 123456789 ; 075bcd15H
    mov eax, 123456789 ; 075bcd15H
; Line 12
    xor edi, edi
    xor ecx, ecx
    npad 7
$LL4@main:
; Line 6
    mov eax, ecx
; Line 16
    inc edi
    inc ecx
; Line 6
    cdq
    add eax, eax
    add ecx, ecx
    mov esi, edx
    and edx, 197 ; 000000c5H
    xor eax, edx
    xor ecx, edx
    sar esi, 31 ; 0000001fH
; Line 18
    cmp ecx, 123456789 ; 075bcd15H
    cmp eax, 123456789 ; 075bcd15H
    jne SHORT $LL4@main
; Line 20
    mov eax, ecx
    mov eax, edi
    pop edi
    pop esi
; Line 21
    ret 0
_main ENDP
_TEXT ENDS
END
The registers EDI and ESI are used and clobbered without necessity and reason.
The MOV and SAR instructions generated for line 6 are superfluous: their result is never used!
Create the text file case17.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

unsigned long long lcg64() // linear congruential generator
{
    static unsigned long long z = 1066149217761810ULL;

    z = z * 6906969069ULL + 1234567ULL;

    return z;
}
Note: both constants are from George Marsaglia’s KISS64 pseudo-random number generator; they give the 64-bit LCG its maximum period length of 2^64.
Generate the assembly listing file case17.asm
from the
source file case17.c
created in step 1., using the
Visual C 2017 compiler for the
x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase17.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case17.c
Display the assembly listing file case17.asm
created in
step 2.:
TYPE case17.asm
  ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
      TITLE C:\Users\Stefan\Desktop\case17.c
      .686P
      .XMM
      include listing.inc
      .model flat
  INCLUDELIB LIBCMT
  INCLUDELIB OLDNAMES
  PUBLIC _lcg64
- EXTRN __allmul:PROC
  _DATA SEGMENT
  ?z@?1??lcg64@@9@9 DQ 0003c9a83566fa12H ; `lcg64'::`2'::z
  _DATA ENDS
  ; Function compile flags: /Ogtpy
  ; COMDAT _lcg64
  _TEXT SEGMENT
  _lcg64 PROC ; COMDAT
  ; File c:\users\stefan\desktop\case17.c
  ; Line 7
-     push 1
-     push -1682965523 ; 9baffbedH
-     push DWORD PTR ?z@?1??lcg64@@9@9+4
-     push DWORD PTR ?z@?1??lcg64@@9@9
-     call __allmul
+     mov ecx, -1682965523 ; 9baffbedH
+     mov eax, DWORD PTR ?z@?1??lcg64@@9@9
+     mul ecx
+     add edx, DWORD PTR ?z@?1??lcg64@@9@9
+     imul ecx, DWORD PTR ?z@?1??lcg64@@9@9+4
      add eax, 1234567 ; 0012d687H
      mov DWORD PTR ?z@?1??lcg64@@9@9, eax
-     adc edx, 0
+     adc edx, ecx
      mov DWORD PTR ?z@?1??lcg64@@9@9+4, edx
  ; Line 10
      ret 0
  _lcg64 ENDP
  _TEXT ENDS
  END
While the code generated by the compiler (lines marked - above) is correct, the compiler fails to perform an obvious optimisation (lines marked +): the constant 6906969069 is equal to 2³²+2612001773 (the hexadecimal notation 0x19BAFFBED shows this immediately); multiplication by 2³² can be replaced by a simple addition.
Note: an optimising compiler clearly should not emit 5 instructions to call an external routine for multiplying 64-bit values, but the 6 instructions which perform this operation inline!
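The decomposition behind this note can be sketched in portable C. The function names and the use of the <stdint.h> types are assumptions made for illustration; they are not part of case17.c. Because 6906969069 = 2³² + 2612001773, the product modulo 2⁶⁴ needs only one widening 32×32-bit multiplication, one truncating multiplication and two additions:

```c
#include <assert.h>
#include <stdint.h>

/* One LCG step computed from 32-bit halves (hypothetical helper).
   z * (2^32 + c) = z * c + (z << 32), everything modulo 2^64. */
uint64_t lcg64_step_split(uint64_t z)
{
    const uint32_t c  = 2612001773u;          /* low half of 6906969069 */
    const uint32_t lo = (uint32_t) z;
    const uint32_t hi = (uint32_t) (z >> 32);

    uint64_t product = (uint64_t) lo * c;     /* widening multiply (MUL) */
    uint32_t upper   = (uint32_t) (product >> 32);

    upper += lo;                              /* the 2^32 * z term is a plain addition */
    upper += hi * c;                          /* truncating multiply (IMUL) */

    product = ((uint64_t) upper << 32) | (uint32_t) product;
    return product + 1234567u;                /* carry propagates into the upper half */
}

/* Reference: the same step with an ordinary 64-bit multiplication. */
uint64_t lcg64_step_plain(uint64_t z)
{
    return z * 6906969069ULL + 1234567ULL;
}

/* Both variants must agree, here for the seed used in case17.c. */
void lcg64_selfcheck(void)
{
    assert(lcg64_step_split(1066149217761810ULL)
        == lcg64_step_plain(1066149217761810ULL));
}
```

The term hi × 2³² × 2³² vanishes modulo 2⁶⁴, which is why the upper half of z enters only through the truncating multiplication.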
Create the text file case18.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
unsigned long msws32(void) // enhanced middle-square generator
{
static unsigned long v = 0UL;
static unsigned long w = 0UL;
w += 0x9E3779B9UL;
v = (unsigned long) __ull_rshift(__emulu(v, v), 16);
v += w;
v = _byteswap_ulong(v);
return v;
}
unsigned long mswsbw(void) // enhanced middle-square generator
{
static unsigned long long v = 0ULL;
static unsigned long long w = 0ULL;
w += 0x9E3779B97F4A7C15ULL;
v *= v;
v += w;
v = (v << 32) | (v >> 32);
return (unsigned long) v;
}
unsigned long long msws64(void) // enhanced middle-square generator
{
static unsigned long long v = 0ULL;
static unsigned long long w = 0ULL;
#ifdef _WIN64
const unsigned long long x;
const unsigned long long y = _umul128(v, v, &x);
v = __shiftright128(y, x, 32);
#else
v = (__emulu((unsigned long) v, (unsigned long) v) >> 32)
+ (__emulu((unsigned long) v, (unsigned long) (v >> 32)) << 1)
+ (__emulu((unsigned long) (v >> 32), (unsigned long) (v >> 32)) << 32);
#endif
w += 0x9E3779B97F4A7C15ULL;
v += w;
v = _byteswap_uint64(v);
return v;
}
int main(void)
{
#ifdef _WIN64
volatile unsigned long long ull = msws64();
#else
volatile unsigned long ul = msws32();
#endif
}
Note: the constants 0x9E3779B9 and 0x9E3779B97F4A7C15 are the fractional part of the golden ratio Φ = (√5+1)/2, which is equal to the inverse or reciprocal value φ = 1/Φ = Φ−1 = (√5−1)/2 = 0.6180339887498948482…, multiplied by 2³² and 2⁶⁴ respectively – or just 2³²/Φ and 2⁶⁴/Φ.
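The 32-bit constant can be reproduced with integer arithmetic alone. The following is a sketch under stated assumptions: the function names are hypothetical, and the approach (a bitwise search) is chosen for illustration only. It relies on φ being the positive root of t² + t = 1, so an integer x lies below 2³²·φ exactly when x² + 2³²·x is smaller than 2⁶⁴:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if x*x + 2^32*x < 2^64, i.e. if x / 2^32 is below 1/Phi. */
int below_inverse_phi(uint32_t x)
{
    const uint64_t square = (uint64_t) x * x;
    const uint64_t sum    = square + ((uint64_t) x << 32);
    return sum >= square;            /* no wrap-around: the true sum is below 2^64 */
}

/* Largest 32-bit integer below 2^32 / Phi, found bit by bit from the top. */
uint32_t golden32(void)
{
    uint32_t x = 0;
    for (uint32_t bit = 0x80000000u; bit != 0; bit >>= 1)
        if (below_inverse_phi(x | bit))
            x |= bit;
    return x;
}

/* The result is the constant used in msws32(). */
void golden32_selfcheck(void)
{
    assert(golden32() == 0x9E3779B9u);
}
```

The same value explains an operand in the x86 listings: subtracting 1640531527 (0x61C88647) modulo 2³² is identical to adding 0x9E3779B9, since the two constants sum to 2³².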
Generate the assembly listing file case18.asm
from the
source file case18.c
created in step 1., using the
Visual C 2017 compiler for the
x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase18.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case18.c case18.c(55): warning C4189: 'ul': local variable is initialized but not referenced
Display the assembly listing file case18.asm
created in
step 2.:
TYPE case18.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case18.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _msws32 PUBLIC _mswsbw PUBLIC _msws64 PUBLIC _mainWhile the code generated for the functionEXTRN __allmul:PROC_BSS SEGMENT ?v@?1??msws32@@9@9 DQ 01H DUP (?) ; `msws32'::`2'::v ?w@?1??msws32@@9@9 DQ 01H DUP (?) ; `msws32'::`2'::w ?v@?1??mswsbw@@9@9 DQ 01H DUP (?) ; `mswsbw'::`2'::v ?w@?1??mswsbw@@9@9 DQ 01H DUP (?) ; `mswsbw'::`2'::w ?v@?1??msws64@@9@9 DQ 01H DUP (?) ; `msws64'::`2'::v ?w@?1??msws64@@9@9 DQ 01H DUP (?) ; `msws64'::`2'::w _BSS ENDS ; Function compile flags: /Ogtpy ; COMDAT _msws32 _TEXT SEGMENT _msws32 PROC ; COMDAT ; File c:\users\stefan\desktop\case18.c ; Line 9 mov eax, DWORD PTR ?v@?1??msws32@@9@9 mul eaxpush esi mov esi, DWORD PTR ?w@?1??msws32@@9@9shrd eax, edx, 16 mov edx, DWORD PTR ?w@?1??msws32@@9@9sub esi, 1640531527 ; 61c88647Hadd edx, -1640531527 ; 9e3779b9Hadd eax, esiadd eax, edxmov DWORD PTR ?w@?1??msws32@@9@9, esimov DWORD PTR ?w@?1??msws32@@9@9, edx ; Line 11shr edx, 16 ; 00000010Hmov DWORD PTR ?v@?1??msws32@@9@9, eax ; Line 13pop esi; Line 14 ret 0 _msws32 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _mswsbw _TEXT SEGMENT _mswsbw PROC ; COMDAT ; File c:\users\stefan\desktop\case18.c ; Line 21 mov ecx, DWORD PTR ?w@?1??mswsbw@@9@9 add ecx, 2135587861 ; 7f4a7c15H mov DWORD PTR ?w@?1??mswsbw@@9@9, ecx mov ecx, DWORD PTR ?w@?1??mswsbw@@9@9+4 adc ecx, -1640531527 ; 9e3779b9H mov DWORD PTR ?w@?1??mswsbw@@9@9+4, ecx ; Line 22 mov ecx, DWORD PTR ?v@?1??mswsbw@@9@9 mov eax, ecx mul eax imul ecx, DWORD PTR ?v@?1??mswsbw@@9@9+4 add ecx, ecx add edx, ecx ; Line 23 add eax, DWORD PTR ?w@?1??mswsbw@@9@9 adc edx, DWORD PTR ?w@?1??mswsbw@@9@9+4 ; Line 22mov ecx, DWORD PTR ?v@?1??mswsbw@@9@9+4 mov eax, DWORD PTR ?v@?1??mswsbw@@9@9 push esi mov esi, DWORD PTR ?w@?1??mswsbw@@9@9+4 push edi mov edi, DWORD PTR ?w@?1??mswsbw@@9@9 push ecx 
push eax add edi, 2135587861 ; 7f4a7c15H push ecx adc esi, -1640531527 ; 9e3779b9H mov DWORD PTR ?w@?1??mswsbw@@9@9, edi push eax mov DWORD PTR ?w@?1??mswsbw@@9@9+4, esi call __allmul add eax, edi; Line 26pop edi adc edx, esi xor ecx, ecx or ecx, eaxmov DWORD PTR ?v@?1??mswsbw@@9@9, edxmov DWORD PTR ?v@?1??mswsbw@@9@9+4, ecxmov dword PTR ?v@?1??mswsbw@@9@9+4, eax mov eax, edxpop esi; Line 27 ret 0 _mswsbw ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _msws64 _TEXT SEGMENT _msws64 PROC ; COMDAT ; File c:\users\stefan\desktop\case18.c ; Line 43add DWORD PTR ?w@?1??msws64@@9@9, 2135587861 ; 7f4a7c15H mov ecx, DWORD PTR ?v@?1??msws64@@9@9+4 mov eax, ecx push ebx mov ebx, DWORD PTR ?w@?1??msws64@@9@9+4 adc ebx, -1640531527 ; 9e3779b9H push ebp mul ecx push esi mov esi, DWORD PTR ?v@?1??msws64@@9@9 mov ebp, eax push edi mov edi, edx mov DWORD PTR ?w@?1??msws64@@9@9+4, ebx shld edi, ebp, 31 mov eax, esi mul ecx shl ebp, 31 ; 0000001fH; Line 44add ebp, eax mov eax, esi adc edi, ebx shld edi, ebp, 1 mul esi add ebp, ebp add ebp, edx adc edi, 0 add ebp, DWORD PTR ?w@?1??msws64@@9@9; Line 45bswap ebp adc edi, ebx mov DWORD PTR ?v@?1??msws64@@9@9+4, ebp bswap edi mov DWORD PTR ?v@?1??msws64@@9@9, edi; Line 47mov eax, edi pop edi pop esi mov edx, ebp pop ebp pop ebx; Line 39 push ebx mov eax, DWORD PTR ?v@?1??msws64@@9@9 mov ebx, eax mul eax mov ecx, edx mov eax, ebx mov ebx, DWORD PTR ?v@?1??msws64@@9@9+4 mul ebx imul ebx, ebx add eax, eax adc edx, edx add eax, ecx adc edx, ebx ; Line 42 mov ecx, DWORD PTR ?w@?1??msws64@@9@9 mov ebx, DWORD PTR ?w@?1??msws64@@9@9+4 add ecx, 2135587861 ; 7f4a7c15H adc ebx, -1640531527 ; 9e3779b9H mov DWORD PTR ?w@?1??msws64@@9@9, ecx mov DWORD PTR ?w@?1??msws64@@9@9+4, ebx ; Line 44 add eax, ecx adc edx, ebx ; Line 45 bswap eax bswap edx xchg eax, edx mov DWORD PTR ?v@?1??msws64@@9@9, eax mov DWORD PTR ?v@?1??msws64@@9@9+4, edx pop ebx ; Line 47 ret 0 _msws64 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _main _TEXT 
SEGMENT _ul$ = -4 ; size = 4 _main PROC ; COMDAT ; File c:\users\stefan\desktop\case18.c ; Line 51push ecxpush 0 ; Line 9 mov eax, DWORD PTR ?v@?1??msws32@@9@9 mul eax ; Line 51push esi; Line 8mov esi, DWORD PTR ?w@?1??msws32@@9@9mov ecx, DWORD PTR ?w@?1??msws32@@9@9 ; Line 9 shrd eax, edx, 16sub esi, 1640531527 ; 61c88647Hadd ecx, -1640531527 ; 9e3779b9Hadd eax, esiadd eax, ecxmov DWORD PTR ?w@?1??msws32@@9@9, esimov DWORD PTR ?w@?1??msws32@@9@9, ecx ; Line 11 bswap eax mov DWORD PTR ?v@?1??msws32@@9@9, eaxshr edx, 16 ; 00000010H; Line 55mov DWORD PTR _ul$[esp+8], eaxmov DWORD PTR _ul$[esp+4], eax ; Line 57xor eax, eax pop esi pop ecxpop eax ret 0 _main ENDP _TEXT ENDS END
While the code generated for the function msws32() is correct, there is no reason to use and clobber register ESI instead of the volatile register ECX! The SHR instruction is superfluous too: its result is never used.
While the code generated for the function mswsbw() is correct, an optimising compiler should not emit 7 instructions to call an external routine for squaring a 64-bit value, but emit the 6 instructions which can perform this operation inline!
Also notice the superfluous XOR and OR instructions generated for line 26.
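Both observations can be illustrated in portable C. This is a minimal sketch; the function names are hypothetical and not taken from case18.c. Squaring modulo 2⁶⁴ needs one widening and one truncating 32-bit multiplication, and rotating a 64-bit value by 32 bits merely swaps its halves, so no XOR or OR is required to obtain the result:

```c
#include <assert.h>
#include <stdint.h>

/* v * v modulo 2^64 from 32-bit halves: lo*lo + ((2*lo*hi) << 32);
   the hi*hi term is a multiple of 2^64 and vanishes. */
uint64_t square64_split(uint64_t v)
{
    const uint32_t lo = (uint32_t) v;
    const uint32_t hi = (uint32_t) (v >> 32);
    const uint64_t low_square = (uint64_t) lo * lo;   /* widening multiply (MUL) */
    const uint32_t cross      = 2u * (lo * hi);       /* truncating multiply, then doubling */
    return low_square + ((uint64_t) cross << 32);
}

/* Low half of (v << 32) | (v >> 32): simply the high half of v. */
uint32_t low_half_after_swap(uint64_t v)
{
    return (uint32_t) (v >> 32);
}

/* Compare against the straightforward 64-bit expressions. */
void mswsbw_selfcheck(uint64_t v)
{
    assert(square64_split(v) == v * v);
    assert(low_half_after_swap(v) == (uint32_t) ((v << 32) | (v >> 32)));
}
```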
While the code generated for the function msws64() is correct too, it has 39 instructions and clobbers all registers, but still performs multiple avoidable transfers between them; the optimal code has only 28 instructions and clobbers just 1 register!
Especially notice the weird way to move the contents of register EBP into register EDI in lines 43 and 44, using two SHLD plus a SHL instruction.
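For reference, the quantity which the 32-bit branch of msws64() computes, bits 32 to 95 of the 128-bit square, can be written in portable C as follows. This is a sketch mirroring the expression in case18.c with <stdint.h> types instead of the __emulu() intrinsic; the function name is an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Middle 64 bits of v*v:
   (lo^2 + 2*lo*hi*2^32 + hi^2*2^64) >> 32, reduced modulo 2^64. */
uint64_t middle_square64(uint64_t v)
{
    const uint32_t lo = (uint32_t) v;
    const uint32_t hi = (uint32_t) (v >> 32);
    return (((uint64_t) lo * lo) >> 32)     /* upper half of lo*lo        */
         + (((uint64_t) lo * hi) << 1)      /* doubled cross product      */
         + (((uint64_t) hi * hi) << 32);    /* lower half of hi*hi, moved up */
}

/* Values whose 128-bit square is easy to write down by hand. */
void middle_square64_selfcheck(void)
{
    assert(middle_square64(0x100000000ULL) == 0x100000000ULL);  /* (2^32)^2 = 2^64 */
    assert(middle_square64(0xFFFFFFFFULL)  == 0xFFFFFFFEULL);
}
```

Bits shifted out at the top by the two left shifts belong to bit 96 and above of the square, so dropping them is intended.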
While the code generated for the function main() is correct, there is no reason to use and clobber register ESI instead of the volatile register ECX!
Again notice the superfluous SHR instruction: its result is never used.
Generate another assembly listing file case18.asm
from
the source file case18.c
created in step 1., now
using the Visual C 2017 compiler for the
x64 alias AMD64 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase18.c /W4 /Zl
Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0 case18.c case18.c(34): warning C4132: 'x': const object should be initialized case18.c(53): warning C4189: 'ull': local variable is initialized but not referenced
Display the assembly listing file case18.asm
created in
step 4.:
TYPE case18.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 include listing.inc INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC msws32 PUBLIC mswsbw PUBLIC msws64 PUBLIC main _BSS SEGMENT ?v@?1??msws32@@9@9 DD 01H DUP (?) ; `msws32'::`2'::v ?w@?1??msws32@@9@9 DD 01H DUP (?) ; `msws32'::`2'::w ?v@?1??mswsbw@@9@9 DQ 01H DUP (?) ; `mswsbw'::`2'::v ?w@?1??mswsbw@@9@9 DQ 01H DUP (?) ; `mswsbw'::`2'::w ?v@?1??msws64@@9@9 DQ 01H DUP (?) ; `msws64'::`2'::v ?w@?1??msws64@@9@9 DQ 01H DUP (?) ; `msws64'::`2'::w _BSS ENDS ; Function compile flags: /Ogtpy ; COMDAT msws32 _TEXT SEGMENT msws32 PROC ; COMDAT ; File c:\users\stefan\desktop\case18.c ; Line 9 mov eax, DWORD PTR ?v@?1??msws32@@9@9 mov r8d, DWORD PTR ?w@?1??msws32@@9@9 mul rax add r8d, -1640531527 ; 9e3779b9H shr rax, 16 add eax, r8d mov DWORD PTR ?w@?1??msws32@@9@9, r8d ; Line 11 bswap eax mov DWORD PTR ?v@?1??msws32@@9@9, eax ; Line 14 ret 0 msws32 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT mswsbw _TEXT SEGMENT mswsbw PROC ; COMDAT ; File c:\users\stefan\desktop\case18.c ; Line 21While the code generated for the functionmov rcx, QWORD PTR ?w@?1??mswsbw@@9@9 mov rax, 7046029254386353131 ; 61c8864680b583ebH sub rcx, raxmov rcx, -7046029254386353131 ; 9e3779b97f4a7c15H add rcx, QWORD PTR ?w@?1??mswsbw@@9@9 ; Line 22 mov rax, QWORD PTR ?v@?1??mswsbw@@9@9 imul rax, rax mov QWORD PTR ?w@?1??mswsbw@@9@9, rcx add rax, rcx ; Line 24 rol rax, 32 ; 00000020H mov QWORD PTR ?v@?1??mswsbw@@9@9, rax ; Line 27 ret 0 mswsbw ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT main _TEXT SEGMENTx$1 = 8ull$ = 8 main PROC ; COMDAT ; File c:\users\stefan\desktop\case18.c ; Line 35 mov rax, QWORD PTR ?v@?1??msws64@@9@9 ; Line 43mov r8, 7046029254386353131 ; 61c8864680b583ebH mov rcx, QWORD PTR ?w@?1??msws64@@9@9mov rcx, -7046029254386353131 ; 9e3779b97f4a7c15H mul raxsub rcx, r8add rcx, QWORD PTR ?w@?1??mswsbw@@9@9 shrd rax, rdx, 32 ; 00000020Hmov QWORD PTR x$1[rsp], rdx; Line 44 add rax, rcx mov QWORD 
PTR ?w@?1??msws64@@9@9, rcx ; Line 45 bswap rax mov QWORD PTR ?v@?1??msws64@@9@9, rax ; Line 53 mov QWORD PTR ull$[rsp], rax ; Line 57 xor eax, eax ret 0 main ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT msws64 _TEXT SEGMENT msws64 PROC ; COMDAT ; File c:\users\stefan\desktop\case18.c ; Line 35 mov rax, QWORD PTR ?v@?1??msws64@@9@9 ; Line 43mov r8, 7046029254386353131 ; 61c8864680b583ebH mov rcx, QWORD PTR ?w@?1??msws64@@9@9mov rcx, -7046029254386353131 ; 9e3779b97f4a7c15H mul raxsub rcx, r8add rcx, QWORD PTR ?w@?1??mswsbw@@9@9 shrd rax, rdx, 32 ; 00000020H mov QWORD PTR ?w@?1??msws64@@9@9, rcx ; Line 44 add rax, rcx ; Line 45 bswap rax mov QWORD PTR ?v@?1??msws64@@9@9, rax ; Line 48 ret 0 msws64 ENDP _TEXT ENDS END
While the code generated for the function msws64() is correct, there is no reason to clobber register R8. The MOV instruction which writes the superfluous temporary variable x$1 in the function main() is superfluous too.
Create the text file case19.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
unsigned long long sequence(void)
{
static unsigned long long weyl = 0ULL;
weyl += 0x9E3779B97F4A7C15ULL;
return weyl ^ (weyl >> 31);
}
Generate the assembly listing file case19.asm
from the
source file case19.c
created in step 1., using the
Visual C 2017 compiler for the
x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase19.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case19.c
Display the assembly listing file case19.asm created in step 2.:
TYPE case19.asm
  ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
      TITLE C:\Users\Stefan\Desktop\case19.c
      .686P
      .XMM
      include listing.inc
      .model flat
  INCLUDELIB LIBCMT
  INCLUDELIB OLDNAMES
  PUBLIC _sequence
  _BSS SEGMENT
  ?weyl@?1??sequence@@9@9 DQ 01H DUP (?) ; `sequence'::`2'::weyl
  _BSS ENDS
  ; Function compile flags: /Ogtpy
  ; COMDAT _sequence
  _TEXT SEGMENT
  _sequence PROC
  ; File c:\users\stefan\desktop\case19.c
  ; Line 7
-     mov ecx, DWORD PTR ?weyl@?1??sequence@@9@9+4
-     push esi
-     mov esi, DWORD PTR ?weyl@?1??sequence@@9@9
-     add esi, 2135587861 ; 7f4a7c15H
-     mov eax, esi
-     mov DWORD PTR ?weyl@?1??sequence@@9@9, esi
-     adc ecx, -1640531527 ; 9e3779b9H
-     mov edx, ecx
-     mov DWORD PTR ?weyl@?1??sequence@@9@9+4, ecx
+     mov eax, DWORD PTR ?weyl@?1??sequence@@9@9
+     mov edx, DWORD PTR ?weyl@?1??sequence@@9@9+4
+     add eax, 2135587861 ; 7f4a7c15H
+     adc edx, -1640531527 ; 9e3779b9H
+     mov DWORD PTR ?weyl@?1??sequence@@9@9, eax
+     mov DWORD PTR ?weyl@?1??sequence@@9@9+4, edx
+     mov ecx, eax
      shrd eax, edx, 31
+     xor eax, ecx
+     mov ecx, edx
      shr edx, 31 ; 0000001fH
  ; Line 9
-     xor eax, esi
      xor edx, ecx
-     pop esi
  ; Line 10
      ret 0
  _sequence ENDP
  _TEXT ENDS
  END
While the generated code (lines marked -) is correct, it clobbers register ESI without necessity; the lines marked + get by without it.
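What the SHRD and SHR instructions compute here can be spelled out in portable C. This is a minimal sketch; the function name and the <stdint.h> types are assumptions, not part of case19.c. It shows that weyl ^ (weyl >> 31) needs only the two halves of the value and one scratch copy at a time:

```c
#include <assert.h>
#include <stdint.h>

/* weyl ^ (weyl >> 31), computed from 32-bit halves. */
uint64_t mix31_split(uint64_t weyl)
{
    const uint32_t lo = (uint32_t) weyl;
    const uint32_t hi = (uint32_t) (weyl >> 32);
    const uint32_t shifted_lo = (lo >> 31) | (hi << 1);   /* SHRD lo, hi, 31 */
    const uint32_t shifted_hi = hi >> 31;                 /* SHR hi, 31      */
    return ((uint64_t) (hi ^ shifted_hi) << 32) | (lo ^ shifted_lo);
}

/* Compare against the plain 64-bit expression from the source file. */
void mix31_selfcheck(uint64_t weyl)
{
    assert(mix31_split(weyl) == (weyl ^ (weyl >> 31)));
}
```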
Create the text file case20.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
#ifdef _WIN64
unsigned long long nearlydivisionless(unsigned long long range,
unsigned long long (*random64)(void))
{
unsigned long long value = random64();
unsigned long long limit;
unsigned long long high;
unsigned long long low = _umul128(value, range, &high);
if (low < range)
for (limit = (0 - range) % range;
low < limit;
low = _umul128(value, range, &high))
value = random64();
return high;
}
#else
unsigned long nearlydivisionless(unsigned long range,
unsigned long (*random32)(void))
{
unsigned long value = random32();
unsigned long limit;
unsigned long long multi = __emulu(value, range);
if (range > (unsigned long) multi)
for (limit = (0 - range) % range;
limit > (unsigned long) multi;
multi = __emulu(value, range))
value = random32();
return multi >> 32;
}
#endif
Note: the function nearlydivisionless() returns a uniformly distributed (pseudo-random) value in the interval [0, range); for the discussion of the algorithm see Daniel Lemire’s blog post Fast Bounded Random Numbers on GPUs.
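For readers without the Microsoft intrinsics, the 32-bit variant can be sketched in portable C. The function name bounded32() and the stand-in generator weyl32() are hypothetical and serve only to make the example self-contained; any generator producing uniformly distributed 32-bit values could be passed instead:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in generator: a 32-bit Weyl sequence, for illustration only. */
uint32_t weyl32(void)
{
    static uint32_t state = 0;
    state += 0x9E3779B9u;
    return state;
}

/* Uniformly distributed value in [0, range); range must not be 0. */
uint32_t bounded32(uint32_t range, uint32_t (*random32)(void))
{
    uint64_t multi = (uint64_t) random32() * range;
    if ((uint32_t) multi < range) {
        /* Only this rare path needs a division: 2^32 mod range. */
        const uint32_t limit = (0u - range) % range;
        while ((uint32_t) multi < limit)        /* reject the biased low products */
            multi = (uint64_t) random32() * range;
    }
    return (uint32_t) (multi >> 32);            /* upper half is the result */
}

/* Simulate a die: every result must lie in [0, 6). */
void bounded32_selfcheck(void)
{
    for (int i = 0; i < 1000; ++i)
        assert(bounded32(6u, weyl32) < 6u);
}
```

The division is executed only when the lower half of the product is smaller than range, which for small ranges happens rarely; this is where the name of the function comes from.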
Generate the assembly listing file case20.asm
from the
source file case20.c
created in step 1., using the
Visual C 2017 compiler for the
x64 alias AMD64 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase20.c /W4 /Zl
Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0 case20.c
Display the assembly listing file case20.asm
created in
step 2.:
TYPE case20.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 include listing.inc INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC nearlydivisionless … ; Function compile flags: /Ogtpy ; COMDAT nearlydivisionless _TEXT SEGMENT range$ = 48 random64$ = 56 nearlydivisionless PROC ; COMDAT ; File c:\users\stefan\desktop\case20.c ; Line 6 $LN15:Instead to use themov QWORD PTR [rsp+16], rbx push rsi sub rsp, 32 ; 00000020Hsub rsp, 40 ; 00000028Hmov rsi, rdxmov r11, rdxmov rbx, rcxmov r10, rcx ; Line 7call rsicall r11 mov r8, rax ; Line 10mov rax, rbxmov rax, r10 mul r8 mov rcx, rdx mov r8, rax ; Line 12cmp rax, rbxcmp rax, r10 jae SHORT $LN12@nearlydivi ; Line 13 xor edx, edxmov QWORD PTR [rsp+48], rdi mov rax, rbxmov rax, r10 neg raxdiv rbxdiv r10mov rdi, rdxmov r9, rdx ; Line 14 cmp r8, rdx jae SHORT $LN11@nearlydivi npad 2 $LL4@nearlydivi: ; Line 16call rsicall r11 mov rcx, raxmov rax, rbxmov rax, r10 mul rcxcmp rax, rdicmp rax, r9 jb SHORT $LL4@nearlydivi ; Line 18mov rdi, QWORD PTR [rsp+48]mov rax, rdx ; Line 19mov rbx, QWORD PTR [rsp+56]add rsp, 40 ; 00000028Hadd rsp, 32 ; 00000020H pop rsiret 0 $LN11@nearlydivi:mov rdi, QWORD PTR [rsp+48]; Line 18mov rax, rcx; Line 19mov rbx, QWORD PTR [rsp+56] add rsp, 32 ; 00000020H pop rsi ret 0$LN12@nearlydivi:mov rbx, QWORD PTR [rsp+56]mov rax, rcx add rsp, 40 ; 00000028Hadd rsp, 32 ; 00000020H pop rsiret 0 nearlydivisionless ENDP _TEXT ENDS END
Instead of using the volatile registers R9, R10 and R11, the generated code clobbers the registers RBX, RDI and RSI without necessity, and uses 11 (in words: eleven) superfluous instructions to save and restore them.
Also notice the instructions emitted immediately before the label $LN12@nearlydivi:, and the same 5 instructions emitted again immediately after that label: 14 (in words: fourteen) of a total of 45 instructions are superfluous!
Generate another assembly listing file case20.asm
from
the source file case20.c
created in step 1., now
using the Visual C 2017 compiler for the
x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase20.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case20.c
Display the assembly listing file case20.asm
created in
step 4.:
TYPE case20.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case20.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _nearlydivisionless ; Function compile flags: /Ogtpy ; COMDAT _nearlydivisionless _TEXT SEGMENT _range$ = 8 ; size = 4 _random32$ = 12 ; size = 4 _nearlydivisionless PROC ; COMDAT ; File c:\users\stefan\desktop\case20.c ; Line 23 push ebx ; Line 24 mov ebx, DWORD PTR _random32$[esp] push ebp push esi call ebx ; Line 26 mov esi, DWORD PTR _range$[esp+8] mul esi mov ebp, eax mov ecx, edx ; Line 28 cmp esi, ebp jbe SHORT $LN12@nearlydivi ; Line 29 mov eax, esi xor edx, edx neg eax div esi push edi mov edi, edx ; Line 30 cmp edi, ebp jbe SHORT $LN11@nearlydivi $LL4@nearlydivi: ; Line 32 call ebx mul esi cmp edi, eax ja SHORT $LL4@nearlydivi ; Line 35 pop edi pop esi pop ebp mov eax, edx pop ebx ret 0 $LN11@nearlydivi: pop edipop esi pop ebp mov eax, ecx pop ebx ret 0$LN12@nearlydivi: pop esi pop ebp mov eax, ecx pop ebx ret 0 _nearlydivisionless ENDP _TEXT ENDS END
Create the text file case21.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
int islarge(unsigned long long ull)
{
#ifndef ALTERNATE
return ull > 0xFFFFFFFFULL;
#else
return (unsigned long) (ull >> 32) != 0UL;
#endif
}
int overflow(unsigned long multiplicand, unsigned long multiplier)
{
#ifndef ALTERNATE
return __emulu(multiplicand, multiplier) > 0xFFFFFFFFULL;
#else
return (unsigned long) (__emulu(multiplicand, multiplier) >> 32) != 0UL;
#endif
}
Generate the assembly listing file case21.asm
from the
source file case21.c
created in step 1., using the
Visual C 2017 compiler for the
x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase21.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case21.c
Display the assembly listing file case21.asm
created in
step 2.:
TYPE case21.asm
  ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
      TITLE C:\Users\Stefan\Desktop\case21.c
      .686P
      .XMM
      include listing.inc
      .model flat
  INCLUDELIB LIBCMT
  INCLUDELIB OLDNAMES
  PUBLIC _islarge
  PUBLIC _overflow
  ; Function compile flags: /Ogtpy
  ; COMDAT _islarge
  _TEXT SEGMENT
  _ull$ = 8 ; size = 8
  _islarge PROC ; COMDAT
  ; File c:\users\stefan\desktop\case21.c
  ; Line 6
-     cmp DWORD PTR _ull$[esp], 0
-     ja SHORT $LN5@islarge
-     cmp DWORD PTR _ull$[esp-4], -1
-     ja SHORT $LN5@islarge
      xor eax, eax
+     cmp eax, DWORD PTR _ull$[esp]
+     setne al
  ; Line 10
-     ret 0
- $LN5@islarge:
- ; Line 6
-     mov eax, 1
- ; Line 10
      ret 0
  _islarge ENDP
  _TEXT ENDS
  ; Function compile flags: /Ogtpy
  ; COMDAT _overflow
  _TEXT SEGMENT
  _multiplicand$ = 8 ; size = 4
  _multiplier$ = 12 ; size = 4
  _overflow PROC ; COMDAT
  ; File c:\users\stefan\desktop\case21.c
  ; Line 15
      mov eax, DWORD PTR _multiplicand$[esp-4]
      mul DWORD PTR _multiplier$[esp-4]
+ IFDEF VARIANT
+     seto al
+     movzx eax, al
+ ELSE
      test edx, edx
+     setnz al
+     movzx eax, al
-     jne SHORT $LN5@overflow
-     cmp eax, -1
-     ja SHORT $LN5@overflow
-     xor eax, eax
- ; Line 19
-     ret 0
- $LN5@overflow:
- ; Line 15
-     mov eax, 1
+ ENDIF
  ; Line 19
      ret 0
  _overflow ENDP
  _TEXT ENDS
  END
The compiler generates (lines marked -) a superfluous CMP instruction and two superfluous, performance-degrading conditional branch instructions; the lines marked + show branch-free replacements!
Note: the assembly listing also shows an alternative variant.
Repeat the previous steps with the alternate
implementation;
generate another assembly listing file case21.asm
from
the source file case21.c
created in step 1., using
the Visual C 2017 compiler for the
x86 alias I386 processor architecture,
with the preprocessor macro ALTERNATE
defined on the
command line:
CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /GS- /Gy /Ox /Tccase21.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case21.c
Display the assembly listing file case21.asm
created in
step 4.:
TYPE case21.asm
  ; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
      TITLE C:\Users\Stefan\Desktop\case21.c
      .686P
      .XMM
      include listing.inc
      .model flat
  INCLUDELIB LIBCMT
  INCLUDELIB OLDNAMES
  PUBLIC _islarge
  PUBLIC _overflow
  ; Function compile flags: /Ogtpy
  ; COMDAT _islarge
  _TEXT SEGMENT
  _ull$ = 8 ; size = 8
  _islarge PROC ; COMDAT
  ; File c:\users\stefan\desktop\case21.c
  ; Line 8
      mov eax, DWORD PTR _ull$[esp]
      neg eax
      sbb eax, eax
      neg eax
  ; Line 10
      ret 0
  _islarge ENDP
  _TEXT ENDS
  ; Function compile flags: /Ogtpy
  ; COMDAT _overflow
  _TEXT SEGMENT
  _multiplicand$ = 8 ; size = 4
  _multiplier$ = 12 ; size = 4
  _overflow PROC ; COMDAT
  ; File c:\users\stefan\desktop\case21.c
  ; Line 17
      mov eax, DWORD PTR _multiplicand$[esp-4]
      mul DWORD PTR _multiplier$[esp-4]
-     neg edx
-     sbb edx, edx
-     neg edx
-     mov eax, edx
+     sbb eax, eax
+     neg eax
  ; Line 19
      ret 0
  _overflow ENDP
  _TEXT ENDS
  END
The code generated for the alternative implementation of the function islarge() is good.
The code generated for the alternative implementation of the function overflow() (lines marked -) is better, but still quite bad: the MUL instruction already sets the carry flag CF (and the overflow flag OF too), so there is no need for the first NEG instruction, and the SBB instruction should of course set register EAX instead of EDX, as the lines marked + show!
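That the two implementations in case21.c are equivalent can be checked in portable C. This is a sketch; the function names and the <stdint.h> types are assumptions, and the plain widening multiplication stands in for the __emulu() intrinsic. A 64-bit value exceeds 0xFFFFFFFF exactly when its upper 32 bits are non-zero, so an optimiser is free to translate the comparison into the branch-free test of the upper half:

```c
#include <assert.h>
#include <stdint.h>

/* Default variant: compare against the largest 32-bit value. */
int islarge_compare(uint64_t value)
{
    return value > 0xFFFFFFFFULL;
}

/* ALTERNATE variant: test the upper half only. */
int islarge_high(uint64_t value)
{
    return (uint32_t) (value >> 32) != 0;
}

/* The 32x32-bit product overflows 32 bits iff its upper half is non-zero. */
int overflow_high(uint32_t multiplicand, uint32_t multiplier)
{
    return (uint32_t) (((uint64_t) multiplicand * multiplier) >> 32) != 0;
}

/* Both islarge() variants must agree for every argument. */
void islarge_selfcheck(uint64_t value)
{
    assert(islarge_compare(value) == islarge_high(value));
}
```

For example, 65535 × 65537 = 2³² − 1 still fits in 32 bits, while 65536 × 65536 = 2³² does not.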
Create the text file case22.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
unsigned long long ullmul(unsigned long long p, unsigned long long q)
{
#ifdef OPTIMIZE
if (((unsigned long) p | (unsigned long) q) == 0)
return 0;
if (((unsigned long) (p >> 32) | (unsigned long) (q >> 32)) == 0)
return __emulu((unsigned long) p, (unsigned long) q);
#endif
return __emulu((unsigned long) p, (unsigned long) q)
+ ((unsigned long long) ((unsigned long) p * (unsigned long) (q >> 32)) << 32)
+ ((unsigned long long) ((unsigned long) q * (unsigned long) (p >> 32)) << 32);
}
long long llmul(long long p, long long q)
{
#ifdef OPTIMIZE
if (((unsigned long) (p >> 32) | (unsigned long) (q >> 32)) == 0)
return __emulu((unsigned long) p, (unsigned long) q);
if (((unsigned long) p | (unsigned long) q) == 0)
return 0;
#endif
return __emulu((unsigned long) p, (unsigned long) q)
+ ((unsigned long long) ((unsigned long) p * (unsigned long) (q >> 32)) << 32)
+ ((unsigned long long) ((unsigned long) q * (unsigned long) (p >> 32)) << 32);
}
Generate the assembly listing file case22.asm from the source file case22.c created in step 1, using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase22.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case22.c
Display the assembly listing file case22.asm created in step 2:
TYPE case22.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0
    TITLE   C:\Users\Stefan\Desktop\case22.c
    .686P
    .XMM
    include listing.inc
    .model  flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC  _ullmul
PUBLIC  _llmul
; Function compile flags: /Ogtpy
;   COMDAT _ullmul
_TEXT   SEGMENT
_p$ = 8                         ; size = 8
_q$ = 16                        ; size = 8
_ullmul PROC                    ; COMDAT
; File c:\users\stefan\desktop\case22.c
; Line 12
    mov     eax, DWORD PTR _p$[esp-4]
    mul     DWORD PTR _q$[esp-4]
    mov     ecx, DWORD PTR _p$[esp]
    imul    ecx, DWORD PTR _q$[esp-4]
; --- emitted by the compiler:
    push    esi
    mov     esi, DWORD PTR _q$[esp+4]
    imul    esi, DWORD PTR _p$[esp]
    add     esi, ecx
    add     eax, 0
    adc     edx, esi
    pop     esi
; --- properly optimised instead:
    add     edx, ecx
    mov     ecx, DWORD PTR _q$[esp]
    imul    ecx, DWORD PTR _p$[esp-4]
    add     edx, ecx
; --- end of difference
; Line 15
    ret     0
_ullmul ENDP
_TEXT   ENDS
; Function compile flags: /Ogtpy
;   COMDAT _llmul
_TEXT   SEGMENT
_p$ = 8                         ; size = 8
_q$ = 16                        ; size = 8
_llmul  PROC                    ; COMDAT
; File c:\users\stefan\desktop\case22.c
; Line 26
    mov     eax, DWORD PTR _p$[esp-4]
    mul     DWORD PTR _q$[esp-4]
; --- emitted by the compiler:
    push    ebp
    push    edi
    mov     edi, DWORD PTR _q$[esp+8]
    mov     ebp, eax
    mov     ecx, edi
    imul    edi, DWORD PTR _p$[esp+4]
    sar     ecx, 31             ; 0000001fH
    mov     ecx, DWORD PTR _p$[esp+8]
    mov     eax, ecx
    imul    ecx, DWORD PTR _q$[esp+4]
    sar     eax, 31             ; 0000001fH
    add     edi, ecx
    add     ebp, 0
    mov     eax, ebp
    adc     edx, edi
    pop     edi
    pop     ebp
; --- properly optimised instead:
    mov     ecx, DWORD PTR _p$[esp]
    imul    ecx, DWORD PTR _q$[esp-4]
    add     edx, ecx
    mov     ecx, DWORD PTR _q$[esp]
    imul    ecx, DWORD PTR _p$[esp-4]
    add     edx, ecx
; --- end of difference
; Line 29
    ret     0
_llmul  ENDP
_TEXT   ENDS
END
Especially notice the superfluous arithmetic right shifts by 31 generated for the llmul() routine, and the preceding loads of the registers ECX and EAX: their results are never used!
The "highlight" is the addition of 0, which can't set the carry flag CF, followed by an ADC (add with carry) instruction, which "adds" this flag.
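The arithmetic behind both routines can be sketched in portable C as follows. This is a minimal illustration, not the author's code: mul64_sketch() is a hypothetical name, and fixed-width types replace unsigned long and the __emulu() intrinsic of Visual C.

```c
#include <assert.h>
#include <stdint.h>

/* 64x64-bit multiplication modulo 2^64, built from 32-bit pieces:
   p * q = p_lo * q_lo + ((p_lo * q_hi + p_hi * q_lo) << 32)  (mod 2^64);
   the term p_hi * q_hi << 64 vanishes completely. */
static uint64_t mul64_sketch(uint64_t p, uint64_t q)
{
    uint32_t p_lo = (uint32_t)p, p_hi = (uint32_t)(p >> 32);
    uint32_t q_lo = (uint32_t)q, q_hi = (uint32_t)(q >> 32);

    uint64_t low   = (uint64_t)p_lo * q_lo;      /* one widening MUL         */
    uint32_t cross = p_lo * q_hi + p_hi * q_lo;  /* two IMUL, taken mod 2^32 */

    /* The shifted cross terms have 32 zero bits at the bottom, so they only
       touch the upper half of the result: no carry can come from below. */
    return low + ((uint64_t)cross << 32);
}

/* Small self-check; call it from main() to try the sketch. */
static void demo_mul64(void)
{
    uint64_t p = 0x123456789ABCDEF0ULL, q = 0x0FEDCBA987654321ULL;

    assert(mul64_sketch(p, q) == p * q);
    assert(mul64_sketch(UINT64_MAX, UINT64_MAX) == 1);
}
```

Since the cross terms never affect the lower half, they can be added straight to the upper half of the product; no ADD of 0 and no ADC is needed.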
Generate another assembly listing file case22.asm from the source file case22.c created in step 1, using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro OPTIMIZE defined on the command line:
CL.EXE /Bv /c /DOPTIMIZE /Fa /FoNUL: /Gy /Ox /Tccase22.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case22.c
Display the assembly listing file case22.asm created in step 4:
TYPE case22.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case22.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _ullmul PUBLIC _llmul ; Function compile flags: /Ogtpy ; COMDAT _ullmul _TEXT SEGMENTInstead to load the low parts of both arguments into the registerstv261 = 8 ; size = 8 tv252 = 8 ; size = 8_p$ = 8 ; size = 8 _q$ = 16 ; size = 8 _ullmul PROC ; COMDAT ; File c:\users\stefan\desktop\case22.c ; Line 4push esi mov esi, DWORD PTR _q$[esp] push edi mov edi, DWORD PTR _p$[esp+4]; Line 6mov eax, edi or eax, esi jne SHORT $LN2@ullmul; Line 7pop edi xor edx, edx; Line 15pop esi ret 0 $LN2@ullmul: push ebx; Line 9mov ebx, DWORD PTR _q$[esp+12] mov eax, edi push ebp mov ebp, DWORD PTR _p$[esp+16] mov ecx, ebp or ecx, ebx mov DWORD PTR tv252[esp+16], 0 mov DWORD PTR tv261[esp+16], 0 jne SHORT $LN3@ullmul; Line 10pop ebp pop ebx pop edi mul esi; Line 15pop esi ret 0 $LN3@ullmul:; Line 12imul ebx, edi imul ebp, esi mul esi add ebx, ebp add eax, 0 pop ebp adc edx, ebx pop ebx pop edi; Line 15pop esi; Line 6 mov eax, DWORD PTR _p$[esp-4] mov edx, DWORD PTR _q$[esp-4] or edx, eax je SHORT $LN2@ullmul ; Line 9 mov ecx, DWORD PTR _q$[esp] mov edx, DWORD PTR _p$[esp] or edx, ecx jne SHORT $LN3@ullmul ; Line 10 mul DWORD PTR _q$[esp-4] $LN2@ullmul: ; Line 15 ret 0 $LN3@ullmul: ; Line 12 imul ecx, eax mul DWORD PTR _q$[esp-4] add edx, ecx mov ecx, DWORD PTR _p$[esp] imul ecx, DWORD PTR _q$[esp-4] add edx, ecx ; Line 15 ret 0 _ullmul ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _llmul _TEXT SEGMENTtv249 = 8 ; size = 8 tv240 = 8 ; size = 8_p$ = 8 ; size = 8 _q$ = 16 ; size = 8 _llmul PROC ; COMDAT ; File c:\users\stefan\desktop\case22.c ; Line 18sub esp, 8 push ebx; Line 20mov ebx, DWORD PTR _p$[esp+12] mov eax, ebx sar eax, 31 ; 0000001fH mov ecx, ebx push ebp mov ebp, DWORD PTR _q$[esp+16] mov DWORD PTR tv240[esp+20], eax mov eax, ebp sar eax, 31 ; 0000001fH or ecx, ebp 
push esi mov esi, DWORD PTR _p$[esp+16] push edi mov edi, DWORD PTR _q$[esp+20] mov DWORD PTR tv249[esp+28], eax mov eax, esi jne SHORT $LN2@llmul; Line 21mul edi pop edi; Line 29pop esi pop ebp pop ebx add esp, 8 ret 0 $LN2@llmul:; Line 23or eax, edi jne SHORT $LN3@llmul; Line 29pop edi pop esi pop ebp xor edx, edx pop ebx add esp, 8 ret 0 $LN3@llmul:; Line 26mov eax, esi imul esi, ebp mul edi imul edi, ebx add esi, edi add eax, 0 pop edi adc edx, esi; Line 29pop esi pop ebp pop ebx add esp, 8; Line 20 mov ecx, DWORD PTR _q$[esp] mov edx, DWORD PTR _p$[esp] or edx, ecx jne SHORT $LN2@ullmul ; Line 21 mul DWORD PTR _q$[esp-4] ; Line 29 ret 0 $LN2@llmul: ; Line 23 mov eax, DWORD PTR _p$[esp-4] mov edx, DWORD PTR _q$[esp-4] or edx, eax je SHORT $LN3@llmul ; Line 26 imul ecx, eax mul DWORD PTR _q$[esp-4] add edx, ecx mov ecx, DWORD PTR _p$[esp] imul ecx, DWORD PTR _q$[esp-4] add edx, ecx $LN3@llmul: ; Line 29 ret 0 _llmul ENDP _TEXT ENDS END
Instead of loading the low parts of both arguments into the registers EAX and EDX (which return the result) and testing their logical OR for 0, the compiler clobbers the registers ESI and EDI, which must both be saved and restored.
The temporary variables tv252 and tv261 (in ullmul()) respectively tv240 and tv249 (in llmul()) are allocated and assigned values which are never used elsewhere, an advanced technique known as WORN (write once, read never)!
Again notice the superfluous arithmetic right shifts by 31 generated for the llmul() routine: their results are assigned to the (otherwise unused) temporary variables.
The "highlight" is again the addition of 0, which can't set the carry flag CF, followed by an ADC (add with carry) instruction, which "adds" this flag.
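Why the two early exits enabled by OPTIMIZE are valid can be checked with a small portable sketch. It assumes fixed-width types in place of unsigned long and __emulu(), and mul64_shortcut() is a hypothetical name, not part of case22.c.

```c
#include <assert.h>
#include <stdint.h>

/* 64x64-bit multiplication modulo 2^64 with the two shortcuts:
   - both low halves zero:  every partial product that survives the
     reduction modulo 2^64 contains a low half as factor, so the result is 0;
   - both high halves zero: the cross terms vanish, one widening
     32x32-bit multiplication is sufficient. */
static uint64_t mul64_shortcut(uint64_t p, uint64_t q)
{
    uint32_t p_lo = (uint32_t)p, p_hi = (uint32_t)(p >> 32);
    uint32_t q_lo = (uint32_t)q, q_hi = (uint32_t)(q >> 32);

    if ((p_lo | q_lo) == 0)
        return 0;
    if ((p_hi | q_hi) == 0)
        return (uint64_t)p_lo * q_lo;

    return (uint64_t)p_lo * q_lo
         + ((uint64_t)(uint32_t)(p_lo * q_hi + p_hi * q_lo) << 32);
}

/* Small self-check; call it from main() to try the sketch. */
static void demo_shortcut(void)
{
    uint64_t p = 0xDEADBEEF00C0FFEEULL, q = 0x0123456789ABCDEFULL;

    assert(mul64_shortcut(0x100000000ULL, 0x200000000ULL) == 0);   /* 2^65 */
    assert(mul64_shortcut(0xFFFFFFFFULL, 0xFFFFFFFFULL)
           == 0xFFFFFFFE00000001ULL);
    assert(mul64_shortcut(p, q) == p * q);
}
```

Each test needs only one OR of two 32-bit halves, which is why loading them into the result registers and branching on the flags is all the prologue such a routine requires.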
Create the text file case23.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
unsigned long long __udivmoddi4(unsigned long long numerator,
unsigned long long denominator,
unsigned long long *remainder);
#ifndef ALTERNATE
long long __absdi2(long long argument)
{
long long s = argument >> 63; // s = argument < 0 ? -1 : 0
return (argument ^ s) - s; // negate if argument < 0
}
long long __divdi3(long long dividend, long long divisor)
{
long long r = divisor >> 63; // r = divisor < 0 ? -1 : 0
long long s = dividend >> 63; // s = dividend < 0 ? -1 : 0
divisor = (divisor ^ r) - r; // negate if divisor < 0
dividend = (dividend ^ s) - s; // negate if dividend < 0
s ^= r; // sign of quotient
// negate if quotient < 0
return (__udivmoddi4(dividend, divisor, 0) ^ s) - s;
}
long long __moddi3(long long dividend, long long divisor)
{
long long r = divisor >> 63; // r = divisor < 0 ? -1 : 0
long long s = dividend >> 63; // s = dividend < 0 ? -1 : 0
divisor = (divisor ^ r) - r; // negate if divisor < 0
dividend = (dividend ^ s) - s; // negate if dividend < 0
__udivmoddi4(dividend, divisor, &r);
return (r ^ s) - s; // negate if dividend < 0
}
#else
typedef union _large
{
long long ll;
unsigned long long ull;
struct
{
unsigned long low;
long high;
};
} LARGE;
long long __absdi2(long long argument)
{
LARGE value = {argument};
long long s = (long long) value.high >> 32;
return (value.ll ^ s) - s;
}
long long __divdi3(long long numerator, long long denominator)
{
LARGE divisor = {denominator};
LARGE dividend = {numerator};
long long r = (long long) divisor.high >> 32;
long long s = (long long) dividend.high >> 32;
divisor.ll = (divisor.ll ^ r) - r;
dividend.ll = (dividend.ll ^ s) - s;
s ^= r;
return (__udivmoddi4(dividend.ull, divisor.ull, 0) ^ s) - s;
}
long long __moddi3(long long numerator, long long denominator)
{
LARGE divisor = {denominator};
LARGE dividend = {numerator};
LARGE remainder;
long long r = (long long) divisor.high >> 32;
long long s = (long long) dividend.high >> 32;
divisor.ll = (divisor.ll ^ r) - r;
dividend.ll = (dividend.ll ^ s) - s;
__udivmoddi4(dividend.ull, divisor.ull, &remainder.ull);
return (remainder.ll ^ s) - s;
}
long long __muldi3(long long multiplicand, long long multiplier)
{
LARGE p = {multiplicand};
LARGE q = {multiplier};
LARGE product = {__emulu(p.low, q.low)};
product.high += p.low * q.high;
product.high += q.low * p.high;
return product.ll;
}
#endif
Generate the assembly listing file case23.asm from the source file case23.c created in step 1, using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase23.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case23.c
Display the assembly listing file case23.asm created in step 2:
TYPE case23.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case23.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC ___absdi2 PUBLIC ___divdi3 PUBLIC ___moddi3 EXTRN ___udivmoddi4:PROC ; Function compile flags: /Ogtpy ; COMDAT ___absdi2 _TEXT SEGMENT _argument$ = 8 ; size = 8 ___absdi2 PROC ; COMDAT ; File c:\users\stefan\desktop\case23.c ; Line 9While the code generated for the functionpush esi push edi; Line 10mov edi, DWORD PTR _argument$[esp+8] mov esi, edi sar esi, 31 ; 0000001fH mov ecx, edimov edx, DWORD PTR _argument$[esp] mov eax, DWORD PTR _argument$[esp-4] mov ecx, edx sar ecx, 31 ; 0000001fH ; Line 11 xor eax, ecx xor edx, ecx sub eax, ecx sbb edx, ecxmov eax, esi xor eax, DWORD PTR _argument$[esp+4] mov edx, ecx xor edx, edi sub eax, esi pop edi sbb edx, ecx pop esi; Line 12 ret 0 ___absdi2 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT ___divdi3 _TEXT SEGMENT_s$1$ = -4 ; size = 4 _s$2$ = 8 ; size = 4_dividend$ = 8 ; size = 8 _divisor$ = 16 ; size = 8 ___divdi3 PROC ; COMDAT ; File c:\users\stefan\desktop\case23.c ; Line 15push ecx; Line 17mov eax, DWORD PTR _dividend$[esp+4] mov edx, eaxpush ebxpush ebp mov ebp, DWORD PTR _divisor$[esp+12] mov ecx, eax push esi sar edx, 31 ; 0000001fH mov ebx, ebp sar ecx, 31 ; 0000001fH; Line 19mov esi, edx xor esi, DWORD PTR _dividend$[esp+12] push edi mov DWORD PTR _s$1$[esp+20], edx mov edi, ebp mov edx, ecx sar edi, 31 ; 0000001fH xor edx, eax sar ebx, 31 ; 0000001fH mov eax, DWORD PTR _s$1$[esp+20] sub esi, eaxmov eax, DWORD PTR _divisor$[esp+4] mov ecx, DWORD PTR _divisor$[esp] cdq xor ecx, edx xor eax, edx sub ecx, edx sbb eax, edx mov ebx, edx ; Line 22 push 0sbb edx, ecx xor eax, ebx xor ecx, edi mov DWORD PTR _s$1$[esp+24], eax mov DWORD PTR _s$2$[esp+20], ecx mov eax, edi mov ecx, ebx xor eax, ebp xor ecx, DWORD PTR _divisor$[esp+20] sub ecx, ebx sbb eax, edipush eax push ecxpush edx push esimov eax, DWORD 
PTR _dividend$[esp+16] mov ecx, DWORD PTR _dividend$[esp+12] cdq xor ecx, edx xor eax, edx sub ecx, edx sbb eax, edx push eax push ecx xor ebx, edx call ___udivmoddi4 xor eax, ebx xor edx, ebx sub eax, ebx sbb edx, ebxxor eax, DWORD PTR _s$1$[esp+40]add esp, 20 ; 00000014Hxor edx, DWORD PTR _s$2$[esp+16] sub eax, DWORD PTR _s$1$[esp+20] sbb edx, DWORD PTR _s$2$[esp+16] pop edi pop esi pop ebppop ebx ; Line 23pop ecxret 0 ___divdi3 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT ___moddi3 _TEXT SEGMENT_s$1$ = -12 ; size = 4_r$ = -8 ; size = 8 _dividend$ = 8 ; size = 8 _divisor$ = 16 ; size = 8 ___moddi3 PROC ; COMDAT ; File c:\users\stefan\desktop\case23.c ; Line 26sub esp, 12 ; 0000000cHsub esp, 8 ; Line 28mov edx, DWORD PTR _dividend$[esp+12] mov eax, edxpush ebxmov ebx, DWORD PTR _divisor$[esp+16] push ebp push esi sar eax, 31 ; 0000001fH mov esi, ebx push edi mov DWORD PTR _s$1$[esp+28], eax mov edi, ebx sar esi, 31 ; 0000001fH; Line 31lea eax, DWORD PTR _r$[esp+28]lea eax, DWORD PTR _r$[esp+12] push eaxsar edi, 31 ; 0000001fH mov ebp, edx sar ebp, 31 ; 0000001fH mov ecx, edi xor ecx, DWORD PTR _divisor$[esp+28] mov eax, esi xor eax, ebx mov DWORD PTR _r$[esp+36], esi sub ecx, edi mov DWORD PTR _r$[esp+32], edi sbb eax, esi mov esi, DWORD PTR _s$1$[esp+32]mov eax, DWORD PTR _divisor$[esp+12] mov ecx, DWORD PTR _divisor$[esp+8] cdq xor ecx, edx xor eax, edx sub ecx, edx sbb eax, edx push eax push ecxmov ecx, esi mov eax, ebp xor ecx, DWORD PTR _dividend$[esp+36] xor eax, edx sub ecx, esi sbb eax, ebpmov eax, DWORD PTR _dividend$[esp+12] mov ecx, DWORD PTR _dividend$[esp+8] cdq xor ecx, edx xor eax, edx sub ecx, edx sbb eax, edx mov ebx, edx push eax push ecx call ___udivmoddi4 add esp, 20 ; 00000014H ; Line 32mov eax, esimov eax, ebxxor eax, DWORD PTR _r$[esp+28]xor eax, DWORD PTR _r$[esp+12]mov edx, ebpmov edx, ebxxor edx, DWORD PTR _r$[esp+32]xor edx, DWORD PTR _r$[esp+16]sub eax, esisub eax, ebxpop edi pop esi sbb edx, ebpsbb edx, ebxpop ebppop ebx ; 
Line 33add esp, 12 ; 0000000cHadd esp, 8 ret 0 ___moddi3 ENDP _TEXT ENDS END
While the code generated for the function __absdi2() is correct, it has 16 instructions and needlessly uses the registers EDI and ESI; the properly optimised code has only 9 instructions and clobbers no registers.
While the code generated for the function __divdi3() is correct, it has 50 instructions, clobbers all registers, but still performs multiple avoidable transfers between them, and additionally uses two superfluous temporary variables _s$1$ and _s$2$, which even hold the same value; the optimal code has only 30 instructions and clobbers just 1 register!
Especially notice the repeated arithmetic right shifts by 31: half of them are superfluous, as are the instructions which load the registers they operate on.
While the code generated for the function __moddi3() is correct too, it has 51 instructions, clobbers all registers, but still performs multiple avoidable transfers between them, and additionally uses a superfluous temporary variable _s$1$; the properly optimised code has only 34 instructions and clobbers just 1 register!
Again notice the repeated arithmetic right shifts by 31: half of them are superfluous, as are the instructions which load the registers they operate on.
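The sign-mask technique used by all three functions can be sketched in portable C. The sketch makes several assumptions: the native unsigned division stands in for the external routine __udivmoddi4(), all helper names are hypothetical, the masks are derived from a comparison instead of an arithmetic right shift (which is implementation-defined for negative values in standard C), and the final conversion back to int64_t relies on the usual two's complement wrap-around.

```c
#include <assert.h>
#include <stdint.h>

/* All ones if value is negative, else 0 (the role of "value >> 63"). */
static uint64_t sign_mask(int64_t value)
{
    return (uint64_t)0 - (uint64_t)(value < 0);
}

/* Branch-free conditional negation: with mask = ~0 this is ~value + 1,
   the two's complement negation; with mask = 0 it is the identity. */
static uint64_t negate_if(uint64_t value, uint64_t mask)
{
    return (value ^ mask) - mask;
}

static int64_t abs64_sketch(int64_t x)
{
    return (int64_t)negate_if((uint64_t)x, sign_mask(x));
}

static int64_t div64_sketch(int64_t dividend, int64_t divisor)
{
    uint64_t r = sign_mask(divisor), s = sign_mask(dividend);
    uint64_t quotient = negate_if((uint64_t)dividend, s)
                      / negate_if((uint64_t)divisor, r);

    return (int64_t)negate_if(quotient, s ^ r);   /* signs differ: negate */
}

static int64_t mod64_sketch(int64_t dividend, int64_t divisor)
{
    uint64_t r = sign_mask(divisor), s = sign_mask(dividend);
    uint64_t remainder = negate_if((uint64_t)dividend, s)
                       % negate_if((uint64_t)divisor, r);

    return (int64_t)negate_if(remainder, s);      /* sign of the dividend */
}

/* Small self-check; call it from main() to try the sketch. */
static void demo_divmod(void)
{
    assert(abs64_sketch(-5) == 5 && abs64_sketch(5) == 5);
    assert(div64_sketch(-7, 2) == -3 && mod64_sketch(-7, 2) == -1);
    assert(div64_sketch(7, -2) == -3 && mod64_sketch(7, -2) == 1);
}
```

Each mask is computed once per operand and reused, which is the behaviour the commentary above expects from the compiler: one shift (or CDQ) per sign, not two.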
Repeat the previous steps with the alternate implementation; generate another assembly listing file case23.asm from the source file case23.c created in step 1, using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro ALTERNATE defined on the command line:
CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tccase23.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case23.c case23.c(43): warning C4201: nonstandard extension used: nameless struct/union case23.c(48): warning C4204: nonstandard extension used: non-constant aggregate initializer case23.c(55): warning C4204: nonstandard extension used: non-constant aggregate initializer case23.c(56): warning C4204: nonstandard extension used: non-constant aggregate initializer case23.c(67): warning C4204: nonstandard extension used: non-constant aggregate initializer case23.c(68): warning C4204: nonstandard extension used: non-constant aggregate initializer case23.c(80): warning C4204: nonstandard extension used: non-constant aggregate initializer case23.c(81): warning C4204: nonstandard extension used: non-constant aggregate initializer case23.c(82): warning C4204: nonstandard extension used: non-constant aggregate initializer
Display the assembly listing file case23.asm created in step 4:
TYPE case23.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case23.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC ___absdi2 PUBLIC ___divdi3 PUBLIC ___moddi3 PUBLIC ___muldi3 EXTRN ___udivmoddi4:PROC ; Function compile flags: /Ogtpy ; COMDAT ___absdi2 _TEXT SEGMENT _argument$ = 8 ; size = 8 ___absdi2 PROC ; COMDAT ; File c:\users\stefan\desktop\case23.c ; Line 48 mov eax, DWORD PTR _argument$[esp] cdq ; Line 49The code generated for themov ecx, edx push esi mov esi, ecx mov eax, ecx xor eax, DWORD PTR _argument$[esp] sar esi, 31 ; 0000001fH mov edx, esi xor edx, DWORD PTR _argument$[esp+4]; Line 50sub eax, ecx sbb edx, esi pop esimov ecx, DWORD PTR _argument$[esp-4] xor ecx, edx xor eax, edx sub ecx, edx sbb eax, edx mov edx, eax mov eax, ecx ; Line 51 ret 0 ___absdi2 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT ___divdi3 _TEXT SEGMENT _s$2$ = -4 ; size = 4 _s$1$ = 8 ; size = 4 _numerator$ = 8 ; size = 8 _denominator$ = 16 ; size = 8 ___divdi3 PROC ; COMDAT ; File c:\users\stefan\desktop\case23.c ; Line 54 push ecx ; Line 56 mov eax, DWORD PTR _denominator$[esp+4] cdq ; Line 57 mov eax, DWORD PTR _numerator$[esp+4] push ebx push ebp mov ebx, edx cdq push esi push edi ; Line 58 mov eax, edx mov ebp, ebx sar edx, 31 ; 0000001fH mov edi, eax xor edi, DWORD PTR _numerator$[esp+16] mov esi, edx xor esi, DWORD PTR _numerator$[esp+20] ; Line 62 mov ecx, ebx sar ebp, 31 ; 0000001fH sub edi, eax push 0 sbb esi, edx xor ecx, DWORD PTR _denominator$[esp+20] xor eax, ebx xor edx, ebp mov DWORD PTR _s$1$[esp+20], eax mov eax, ebp xor eax, DWORD PTR _denominator$[esp+24] sub ecx, ebx mov DWORD PTR _s$2$[esp+24], edx sbb eax, ebp push eax push ecx push esi push edi call ___udivmoddi4 xor eax, DWORD PTR _s$1$[esp+36] add esp, 20 ; 00000014H xor edx, DWORD PTR _s$2$[esp+20] sub eax, DWORD PTR _s$1$[esp+16] sbb edx, DWORD PTR _s$2$[esp+20] pop edi pop esi pop ebp pop ebx ; Line 
63 pop ecx ret 0 ___divdi3 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT ___moddi3 _TEXT SEGMENT _s$2$ = -16 ; size = 4 _s$1$ = -12 ; size = 4 _remainder$ = -8 ; size = 8 _numerator$ = 8 ; size = 8 _denominator$ = 16 ; size = 8 ___moddi3 PROC ; COMDAT ; File c:\users\stefan\desktop\case23.c ; Line 66 sub esp, 16 ; 00000010H ; Line 68 mov eax, DWORD PTR _denominator$[esp+16] cdq ; Line 70 mov eax, DWORD PTR _numerator$[esp+16] push esi mov esi, edx cdq push edi ; Line 71mov eax, edxmov DWORD PTR _s$1$[esp+24], edxsar eax, 31 ; 0000001fHmov edi, esimov DWORD PTR _s$2$[esp+24], eax; Line 74 mov ecx, esi xor ecx, DWORD PTR _denominator$[esp+20] lea eax, DWORD PTR _remainder$[esp+24] push eaxsar edi, 31 ; 0000001fHmov eax, edi xor eax, DWORD PTR _denominator$[esp+28] sub ecx, esimov esi, DWORD PTR _s$2$[esp+28]mov esi, DWORD PTR _s$1$[esp+28] sbb eax, edi push eax push ecx mov ecx, edx mov eax, esi xor ecx, DWORD PTR _numerator$[esp+32] xor eax, DWORD PTR _numerator$[esp+36] sub ecx, edx sbb eax, esi push eax push ecx call ___udivmoddi4 ; Line 75 mov eax, DWORD PTR _remainder$[esp+44] add esp, 20 ; 00000014H xor eax, DWORD PTR _s$1$[esp+24] mov edx, DWORD PTR _remainder$[esp+28] xor edx, esi sub eax, DWORD PTR _s$1$[esp+24] pop edi sbb edx, esi pop esi ; Line 76 add esp, 16 ; 00000010H ret 0 ___moddi3 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT ___muldi3 _TEXT SEGMENT _multiplicand$ = 8 ; size = 8 _multiplier$ = 16 ; size = 8 ___muldi3 PROC ; COMDAT ; File c:\users\stefan\desktop\case23.c ; Line 82 mov eax, DWORD PTR _multiplicand$[esp-4] mul DWORD PTR _multiplier$[esp-4] mov ecx, DWORD PTR _multiplier$[esp] ; Line 84 imul ecx, DWORD PTR _multiplicand$[esp-4] push esi mov esi, DWORD PTR _multiplicand$[esp+4] imul esi, DWORD PTR _multiplier$[esp] add edx, ecx add edx, esi ; Line 85 pop esi ; Line 86 ret 0 ___muldi3 ENDP _TEXT ENDS END
The code generated for the alternate implementation, which is supposed to prod and tickle the optimiser, is only marginally better: again all registers are clobbered, superfluous temporary variables which hold the same value are used, and superfluous arithmetic right shifts by 31 are emitted.
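The idea behind the alternate implementation, reading the sign directly from the upper half of the 64-bit value through a union, can be sketched as follows. This is an illustration under assumptions: a little-endian target (as x86 is), fixed-width types in place of long, a named inner struct instead of the nameless one that triggers warning C4201, and hypothetical names throughout.

```c
#include <assert.h>
#include <stdint.h>

/* Overlay of a 64-bit integer with its two 32-bit halves.
   ASSUMPTION: little-endian byte order, so the low half comes first. */
typedef union {
    int64_t  ll;
    uint64_t ull;
    struct {
        uint32_t low;    /* bits 0..31                  */
        int32_t  high;   /* bits 32..63, holds the sign */
    } half;
} large_sketch;

/* -1 if value is negative, else 0, derived from the upper half only. */
static int64_t sign_via_union(int64_t value)
{
    large_sketch v;

    v.ll = value;
    return -(int64_t)(v.half.high < 0);
}

/* Small self-check; call it from main() to try the sketch. */
static void demo_union(void)
{
    large_sketch v;

    v.ll = 0x0000000100000002LL;
    assert(v.half.low == 2 && v.half.high == 1);
    assert(sign_via_union(-1) == -1);
    assert(sign_via_union(1) == 0);
    assert(sign_via_union(INT64_MIN) == -1);
}
```

Only the upper 32 bits take part in the computation, so a single load followed by CDQ or one SAR is all that the sign needs.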
Create the text file case24.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
int variant0(long long x, long long y)
{
return (x ^ y) < 0;
}
int variant1(long long x, long long y)
{
return (x < 0) != (y < 0);
}
int variant2(long long x, long long y)
{
return (x >> 63) != (y >> 63);
}
int variant3(long long x, long long y)
{
return ((long) (x >> 32) < 0) != ((long) (y >> 32) < 0);
}
int variant4(long long x, long long y)
{
return ((long) (x >> 32) ^ (long) (y >> 32)) < 0;
}
int variant5(long long x, long long y)
{
return (long) ((x ^ y) >> 32) < 0;
}
Generate the assembly listing file case24.asm from the source file case24.c created in step 1, using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase24.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case24.c
Display the assembly listing file case24.asm created in step 2:
TYPE case24.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case24.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _variant0 ; Function compile flags: /Ogtpy ; COMDAT _variant0 _TEXT SEGMENTtv128 = 8 ; size = 8_x$ = 8 ; size = 8 _y$ = 16 ; size = 8 _variant0 PROC ; COMDAT ; File c:\users\stefan\desktop\case24.c ; Line 5 xor eax, eaxmov eax, DWORD PTR _x$[esp-4]mov ecx, DWORD PTR _x$[esp]xor eax, DWORD PTR _y$[esp-4]xor ecx, DWORD PTR _y$[esp]mov DWORD PTR tv128[esp], ecx jg SHORT $LN3@variant0 jl SHORT $LN5@variant0 test eax, eax jae SHORT $LN3@variant0 $LN5@variant0: mov eax, 1sets al ; Line 6 ret 0$LN3@variant0: ; Line 5 xor eax, eax ; Line 6 ret 0_variant0 ENDP _TEXT ENDS PUBLIC _variant1 ; Function compile flags: /Ogtpy ; COMDAT _variant1 _TEXT SEGMENT _x$ = 8 ; size = 8 _y$ = 16 ; size = 8 _variant1 PROC ; COMDAT ; Line 10 cmp DWORD PTR _x$[esp], 0jg SHORT $LN5@variant1 jl SHORT $LN7@variant1 cmp DWORD PTR _x$[esp-4], 0 jae SHORT $LN5@variant1 $LN7@variant1: mov ecx, 1 jmp SHORT $LN6@variant1 $LN5@variant1: xor ecx, ecx $LN6@variant1:sets ah cmp DWORD PTR _y$[esp], 0jg SHORT $LN3@variant1 jl SHORT $LN8@variant1 cmp DWORD PTR _y$[esp-4], 0 jae SHORT $LN3@variant1 $LN8@variant1: mov eax, 1 xor edx, edx cmp ecx, eax setne dl mov eax, edxsetl al cmp al, ah setne al movzx eax, al ; Line 11 ret 0$LN3@variant1: ; Line 10 xor eax, eax xor edx, edx cmp ecx, eax setne dl mov eax, edx ; Line 11 ret 0_variant1 ENDP _TEXT ENDS PUBLIC _variant2 ; Function compile flags: /Ogtpy ; COMDAT _variant2 _TEXT SEGMENT _x$ = 8 ; size = 8 _y$ = 16 ; size = 8 _variant2 PROC ; COMDAT ; Line 15 mov eax, DWORD PTR _x$[esp] xor eax, DWORD PTR _y$[esp]xor ecx, ecx and eax, -2147483648 ; 80000000H or ecx, eax je SHORT $LN3@variant2 mov eax, 1 ; Line 16 ret 0 $LN3@variant2: ; Line 15 xor eax, eaxsets al movzx eax, al ; Line 16 ret 0 _variant2 ENDP _TEXT ENDS PUBLIC _variant3 ; Function compile flags: 
/Ogtpy ; COMDAT _variant3 _TEXT SEGMENT _x$ = 8 ; size = 8 _y$ = 16 ; size = 8 _variant3 PROC ; COMDAT ; Line 20 mov eax, DWORD PTR _x$[esp]mov ecx, eax sar ecx, 31 ; 0000001fH xor edx, edxtest eax, eax mov eax, DWORD PTR _y$[esp] sets dlmov ecx, eax sar ecx, 31 ; 0000001fH xor ecx, ecxtest eax, eax sets cl xor eax, eaxcmp edx, ecxcmp dl, cl setne al ; Line 21 ret 0 _variant3 ENDP _TEXT ENDS PUBLIC _variant4 ; Function compile flags: /Ogtpy ; COMDAT _variant4 _TEXT SEGMENT _x$ = 8 ; size = 8 _y$ = 16 ; size = 8 _variant4 PROC ; COMDAT ; Line 25 mov eax, DWORD PTR _x$[esp]mov ecx, eax sar ecx, 31 ; 0000001fHxor eax, DWORD PTR _y$[esp]mov ecx, DWORD PTR _y$[esp] mov edx, ecx sar edx, 31 ; 0000001fH xor eax, ecxmov eax, 0 setl al ; Line 26 ret 0 _variant4 ENDP _TEXT ENDS PUBLIC _variant5 ; Function compile flags: /Ogtpy ; COMDAT _variant5 _TEXT SEGMENT _x$ = 8 ; size = 8 _y$ = 16 ; size = 8 _variant5 PROC ; COMDAT ; Line 30 mov eax, DWORD PTR _x$[esp]mov ecx, eax sar ecx, 31 ; 0000001fHxor ecx, DWORD PTR _y$[esp]mov ecx, DWORD PTR _y$[esp] mov edx, ecx sar edx, 31 ; 0000001fH xor eax, ecxmov eax, 0 setl al ; Line 31 ret 0 _variant5 ENDP _TEXT ENDS END
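All six variants of case24.c ask the same question: do x and y have different signs? A portable sketch of the shortest formulation is given below; signs_differ() is a hypothetical name, and fixed-width types replace long and long long.

```c
#include <assert.h>
#include <stdint.h>

/* 1 if x and y have different signs, else 0.
   Only the upper halves matter: XOR them and extract bit 31,
   i.e. one load, one XOR and one logical shift. */
static int signs_differ(int64_t x, int64_t y)
{
    uint32_t x_hi = (uint32_t)((uint64_t)x >> 32);
    uint32_t y_hi = (uint32_t)((uint64_t)y >> 32);

    return (int)((x_hi ^ y_hi) >> 31);
}

/* Small self-check against two of the variants from case24.c;
   call it from main() to try the sketch. */
static void demo_signs(void)
{
    static const int64_t samples[] = { INT64_MIN, -2, -1, 0, 1, 2, INT64_MAX };
    unsigned i, j;

    for (i = 0; i < 7; i++)
        for (j = 0; j < 7; j++) {
            int64_t x = samples[i], y = samples[j];

            assert(signs_differ(x, y) == ((x ^ y) < 0));         /* variant0 */
            assert(signs_differ(x, y) == ((x < 0) != (y < 0)));  /* variant1 */
        }
}
```

The lower halves of both arguments never influence the result, so no correct translation of any variant has to read them.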
Create the text file case25.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
long long llsgn0(long long value)
{
return value < 0 ? -1 : 0;
}
int llsgn1(long long value)
{
return value < 0;
}
int llsgn2(long long value)
{
return value >> 63;
}
int llsgn3(long long value)
{
return (value >> 63) != 0;
}
int llsgn4(long long value)
{
return (value & (1LL << 63)) != 0;
}
Generate the assembly listing file case25.asm from the source file case25.c created in step 1, using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase25.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case25.c
Display the assembly listing file case25.asm created in step 2:
TYPE case25.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case25.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _llsgn0 PUBLIC _llsgn1 PUBLIC _llsgn2 PUBLIC _llsgn3 PUBLIC _llsgn4 ; Function compile flags: /Ogtpy ; COMDAT _llsgn0 _TEXT SEGMENT $T1 = 8 ; size = 8_value$ = 8 ; size = 8 _llsgn0 PROC ; COMDAT ; File c:\users\stefan\desktop\case25.c ; Line 5cmp DWORD PTR _value$[esp], 0 jg SHORT $LN3@llsgn0 jl SHORT $LN5@llsgn0 cmp DWORD PTR _value$[esp-4], 0 jae SHORT $LN3@llsgn0 $LN5@llsgn0: or eax, -1 or edx, eax; Line 6ret 0 $LN3@llsgn0: xorps xmm0, xmm0; Line 5movlpd QWORD PTR $T1[esp-4], xmm0 mov eax, DWORD PTR $T1[esp-4] mov edx, DWORD PTR $T1[esp]mov eax, DWORD PTR _value$[esp] cdq mov eax, edx ; Line 6 ret 0 _llsgn0 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _llsgn1 _TEXT SEGMENT _value$ = 8 ; size = 8 _llsgn1 PROC ; COMDAT ; File c:\users\stefan\desktop\case25.c ; Line 10cmp DWORD PTR _value$[esp], 0 jg SHORT $LN3@llsgn1 jl SHORT $LN5@llsgn1 cmp DWORD PTR _value$[esp-4], 0 jae SHORT $LN3@llsgn1 $LN5@llsgn1: mov eax, 1; Line 11ret 0 $LN3@llsgn1:; Line 10xor eax, eaxmov eax, DWORD PTR _value$[esp] shr eax, 31 ; 0000001fH ; Line 11 ret 0 _llsgn1 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _llsgn2 _TEXT SEGMENT _value$ = 8 ; size = 8 _llsgn2 PROC ; COMDAT ; File c:\users\stefan\desktop\case25.c ; Line 15mov ecx, DWORD PTR _value$[esp] mov eax, ecx sar eax, 31 ; 0000001fH sar ecx, 31 ; 0000001fHmov eax, DWORD PTR _value$[esp] sar eax, 31 ; 0000001fH ; Line 16 ret 0 _llsgn2 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _llsgn3 _TEXT SEGMENT _value$ = 8 ; size = 8 _llsgn3 PROC ; COMDAT ; File c:\users\stefan\desktop\case25.c ; Line 20mov ecx, DWORD PTR _value$[esp] xor eax, eax and ecx, -2147483648 ; 80000000H or eax, ecx je SHORT $LN3@llsgn3 mov eax, 1; Line 21ret 0 $LN3@llsgn3:; Line 20xor eax, eaxmov eax, DWORD PTR _value$[esp] shr eax, 31 ; 0000001fH ; Line 21 ret 0 _llsgn3 ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _llsgn4 _TEXT SEGMENT _value$ = 8 ; size = 8 _llsgn4 PROC ; COMDAT ; File c:\users\stefan\desktop\case25.c ; Line 25cmp DWORD PTR _value$[esp], 0 jg SHORT $LN3@llsgn4 jl SHORT $LN5@llsgn4 cmp DWORD PTR _value$[esp-4], 0 jae SHORT $LN3@llsgn4 $LN5@llsgn4: mov eax, 1; Line 26ret 0 $LN3@llsgn4:; Line 25xor eax, eaxmov eax, DWORD PTR _value$[esp] shr eax, 31 ; 0000001fH ; Line 26 ret 0 _llsgn4 ENDP _TEXT ENDS END
The optimiser fails to recognise all these commonly used expressions to determine the sign of an integer value!
Especially notice the completely in(s)ane use of the SSE register XMM0 and the temporary variable $T1 instead of just two XOR instructions to zero the registers EAX and EDX in the function llsgn0(), the two SAR instructions in the function llsgn2(), and the completely insane code generated for the function llsgn3()!
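As a rough illustration of why a single load of the upper half is enough: the sign of a 64-bit two's complement integer lives entirely in bit 63, so both a 0/1 flag and a 0/-1 mask can be derived from the upper 32 bits alone. The following C sketch makes no claim about the actual contents of case25.c; the helper names upper_half(), is_negative() and sign_mask() are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: the upper 32 bits of a 64-bit integer. */
static uint32_t upper_half(int64_t value)
{
    return (uint32_t) ((uint64_t) value >> 32);
}

/* 1 if value is negative, else 0: a logical right shift of the upper half by 31. */
static int is_negative(int64_t value)
{
    return (int) (upper_half(value) >> 31);
}

/* -1 if value is negative, else 0: the sign bit replicated into all 64 bits. */
static int64_t sign_mask(int64_t value)
{
    return -(int64_t) (upper_half(value) >> 31);
}
```

Neither helper ever looks at the lower 32 bits, which is the property the compare-and-branch sequences in the listing above do not exploit.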
Create the text file case26.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

long lsign(long x)
{
    return (x > 0) - (x < 0);
}

long long llsign(long long x)
{
    return (x > 0) - (x < 0);
}
Generate the assembly listing file case26.asm from the source file case26.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase26.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case26.c
Display the assembly listing file case26.asm created in step 2.:
TYPE case26.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case26.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _lsign PUBLIC _llsign ; Function compile flags: /Ogtpy ; COMDAT _lsign _TEXT SEGMENT _x$ = 8 ; size = 4 _lsign PROC ; COMDAT ; File c:\users\stefan\desktop\case26.c ; Line 5 mov ecx, DWORD PTR _x$[esp-4] xor eax, eax test ecx, ecx setg al shr ecx, 31 ; 0000001fH sub eax, ecxmov eax, DWORD PTR _x$[esp-4] cdq neg eax adc edx, edx mov eax, edx ; Line 6 ret 0 _lsign ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _llsign _TEXT SEGMENT$T1 = 8 ; size = 8 $T2 = 8 ; size = 8_x$ = 8 ; size = 8 _llsign PROC ; COMDAT ; File c:\users\stefan\desktop\case26.c ; Line 10 xor edx, edx mov eax, DWORD PTR _x$[esp] cmp edx, DWORD PTR _x$[esp-4] sbb edx, eax cdq setl al movzx eax, al add eax, edxmov ecx, DWORD PTR _x$[esp] mov eax, DWORD PTR _x$[esp-4] push esi test ecx, ecx jl SHORT $LN5@llsign jg SHORT $LN7@llsign test eax, eax je SHORT $LN5@llsign $LN7@llsign: mov edx, 1 xor esi, esi jmp SHORT $LN6@llsign $LN5@llsign: xorps xmm0, xmm0 movlpd QWORD PTR $T2[esp], xmm0 mov esi, DWORD PTR $T2[esp+4] mov edx, DWORD PTR $T2[esp] $LN6@llsign: test ecx, ecx jg SHORT $LN3@llsign jl SHORT $LN8@llsign test eax, eax jae SHORT $LN3@llsign $LN8@llsign: mov eax, 1 xor ecx, ecx sub edx, eax mov eax, edx sbb esi, ecx mov edx, esi pop esi; Line 11ret 0 $LN3@llsign: xorps xmm0, xmm0; Line 10movlpd QWORD PTR $T1[esp], xmm0 mov eax, DWORD PTR $T1[esp] sub edx, eax mov ecx, DWORD PTR $T1[esp+4] mov eax, edx sbb esi, ecx mov edx, esi pop esi; Line 11 ret 0 _llsign ENDP _TEXT ENDS END
The code generated for the function llsign() is as bad as it gets: 38 (in words: thirty-eight) instructions, including performance-degrading conditional branches, instead of just 9 (in words: nine) instructions!
Also notice the temporary variables $T1 and $T2, loaded into the registers EAX, ECX, EDX and ESI, instead of XOR instructions to zero these registers!
Generate another assembly listing file case26.asm from the source file case26.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, now with the switch /arch:SSE specified on the command line to disable the generation of SSE2 instructions:
CL.EXE /arch:SSE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase26.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case26.c
Display the assembly listing file case26.asm created in step 4.:
TYPE case26.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case26.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _lsign PUBLIC _llsign ; Function compile flags: /Ogtpy ; COMDAT _lsign _TEXT SEGMENT _x$ = 8 ; size = 4 _lsign PROC ; COMDAT ; File c:\users\stefan\desktop\case26.c ; Line 5 mov ecx, DWORD PTR _x$[esp-4] xor eax, eax test ecx, ecx setg al shr ecx, 31 ; 0000001fH sub eax, ecxmov eax, DWORD PTR _x$[esp-4] cdq neg eax adc edx, edx mov eax, edx ; Line 6 ret 0 _lsign ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _llsign _TEXT SEGMENT _x$ = 8 ; size = 4 _llsign PROC ; COMDAT ; File c:\users\stefan\desktop\case26.c ; Line 10 xor edx, edx mov eax, DWORD PTR _x$[esp] cmp edx, DWORD PTR _x$[esp-4] sbb edx, eax cdq setl al movzx eax, al add eax, edxmov ecx, DWORD PTR _x$[esp] mov eax, DWORD PTR _x$[esp-4] push esi test ecx, ecx jl SHORT $LN5@llsign jg SHORT $LN7@llsign test eax, eax je SHORT $LN5@llsign $LN7@llsign: mov edx, 1 jmp SHORT $LN9@llsign $LN5@llsign: xor edx, edx $LN9@llsign: xor esi, esi test ecx, ecx jg SHORT $LN3@llsign jl SHORT $LN8@llsign test eax, eax jae SHORT $LN3@llsign $LN8@llsign: mov eax, 1 sub edx, eax mov eax, edx sbb esi, 0 mov edx, esi pop esi; Line 11ret 0 $LN3@llsign:; Line 10xor eax, eax sub edx, eax sbb esi, eax mov eax, edx mov edx, esi pop esi; Line 11 ret 0 _llsign ENDP _TEXT ENDS END
The code generated for the function llsign() is now slightly less bad: 31 (in words: thirty-one) instructions, including performance-degrading conditional branches, instead of just 9 (in words: nine) instructions!
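The expression used in llsign() needs no branches at all. The following C sketch shows one way to see this: the comparison x > 0 yields 0 or 1, and the sign bit of the upper half, negated, yields 0 or -1, so their sum is the sign. The function names llsign_reference() and llsign_branchfree() are hypothetical, and the sketch is not claimed to be what any compiler emits.

```c
#include <stdint.h>

/* Reference: the expression from case26.c. */
static int64_t llsign_reference(int64_t x)
{
    return (x > 0) - (x < 0);
}

/* Branch-free sketch: (x > 0) is 0 or 1, the negated sign bit is 0 or -1. */
static int64_t llsign_branchfree(int64_t x)
{
    uint32_t high = (uint32_t) ((uint64_t) x >> 32);   /* upper 32 bits */
    int64_t negative = -(int64_t) (high >> 31);        /* 0 or -1       */

    return (x > 0) + negative;
}
```

For negative x the comparison contributes 0 and the mask contributes -1; for positive x the comparison contributes 1 and the mask 0; for zero both are 0.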
Create the text file case27.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

int ulcmp(unsigned left, unsigned right)
{
    return (left > right) - (left < right);
}

int ullcmp(unsigned long long left, unsigned long long right)
{
    return (left > right) - (left < right);
}
Generate the assembly listing file case27.asm from the source file case27.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase27.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case27.c
Display the assembly listing file case27.asm created in step 2.:
TYPE case27.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case27.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _ulcmp ; Function compile flags: /Ogtpy ; COMDAT _ulcmp _TEXT SEGMENT _left$ = 8 ; size = 4 _right$ = 12 ; size = 4 _ulcmp PROC ; COMDAT ; File c:\users\stefan\desktop\case27.c ; Line 4 mov ecx, DWORD PTR _right$[esp-4] mov edx, DWORD PTR _left$[esp-4] cmp ecx, edx sbb eax, eax neg eaxcmp edx, ecx sbb eax, eax cmp ecx, edx adc eax, 0sbb ecx, ecx neg ecx sub eax, ecx; Line 5 ret 0 _ulcmp ENDP _TEXT ENDS PUBLIC _ullcmp ; Function compile flags: /Ogtpy ; COMDAT _ullcmp _TEXT SEGMENT _left$ = 8 ; size = 8 _right$ = 16 ; size = 8 _ullcmp PROC ; COMDAT ; Line 9 mov ecx, DWORD PTR _left$[esp] mov edx, DWORD PTR _right$[esp]push esi mov esi, DWORD PTR _right$[esp] push edi mov edi, DWORD PTR _left$[esp+4]cmp ecx, edx mov eax, DWORD PTR _left$[esp+4] sbb eax, DWORD PTR _right$[esp+4] sbb eax, eax cmp edx, ecx mov edx, DWORD PTR _right$[esp+4] sbb edx, DWORD PTR _left$[esp+4] adc eax, 0jb SHORT $LN5@ullcmp ja SHORT $LN7@ullcmp cmp edi, esi jbe SHORT $LN5@ullcmp $LN7@ullcmp: mov eax, 1 jmp SHORT $LN6@ullcmp $LN5@ullcmp: xor eax, eax $LN6@ullcmp: cmp ecx, edx ja SHORT $LN3@ullcmp jb SHORT $LN8@ullcmp cmp edi, esi jae SHORT $LN3@ullcmp $LN8@ullcmp: mov ecx, 1 pop edi sub eax, ecx pop esi ; Line 10 ret 0 $LN3@ullcmp: ; Line 9 xor ecx, ecx pop edi sub eax, ecx pop esi; Line 10 ret 0 _ullcmp ENDP _TEXT ENDS END
While the code generated for the function ulcmp() shows only 3 superfluous instructions, the code generated for the function ullcmp() is as bad as it gets: 29 (in words: twenty-nine) instructions, including performance-degrading conditional branches, also clobbering the registers EDI and ESI without necessity, instead of just 11 (in words: eleven) instructions!
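To show where such a three-way comparison typically ends up, here is a minimal C sketch that plugs the expression from ullcmp() into the standard qsort() routine. The adapter ullcmp_qsort() and the helper sort_ull() are hypothetical names introduced for illustration only. The expression (left > right) - (left < right) is used instead of a subtraction because a 64-bit difference converted to int can report the wrong order once it no longer fits.

```c
#include <stdlib.h>

/* Same expression as ullcmp() in case27.c: yields -1, 0 or 1. */
static int ullcmp(unsigned long long left, unsigned long long right)
{
    return (left > right) - (left < right);
}

/* Hypothetical adapter with the signature that qsort() expects. */
static int ullcmp_qsort(const void *left, const void *right)
{
    return ullcmp(*(const unsigned long long *) left,
                  *(const unsigned long long *) right);
}

/* Hypothetical helper: sort an array of 64-bit integers in ascending order. */
static void sort_ull(unsigned long long *array, size_t count)
{
    qsort(array, count, sizeof *array, ullcmp_qsort);
}
```

Since the comparator runs once per comparison inside the sort, every superfluous instruction and branch in it is paid for many times per call of sort_ull().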
Create the text file case28.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

int absolute(int x)
{
#ifdef ALTERNATE
    long long z = x;
    return x - ((x + x) & (z >> 32));
#else
    return x - ((x + x) & (x >> 31));
#endif
}

int maximum(int x, int y)
{
#ifdef ALTERNATE
    long long z = (y = x - y);
    x -= y & (z >> 32);
#else
    y = -y;
    y += x;
    x -= y & (y >> 31);
#endif
    return x;
}

int minimum(int x, int y)
{
#ifdef ALTERNATE
    long long z = (y -= x);
    x += y & (z >> 32);
#else
    y -= x;
    x += y & (y >> 31);
#endif
    return x;
}

int sign(int x)
{
#ifdef ALTERNATE
    long long z = x;
    return z >> 32;
#else
    return x < 0 ? -1 : 0;
#endif
}
Generate the assembly listing file case28.asm from the source file case28.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase28.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case28.c
Display the assembly listing file case28.asm created in step 2.:
TYPE case28.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case28.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _absolute PUBLIC _maximum PUBLIC _minimum PUBLIC _sign ; Function compile flags: /Ogtpy ; COMDAT _absolute _TEXT SEGMENT _x$ = 8 ; size = 4 _absolute PROC ; COMDAT ; File c:\users\stefan\desktop\case28.c ; Line 9 mov eax, DWORD PTR _x$[esp-4] mov edx, eax sar edx, 31 ; 0000001fH lea ecx, DWORD PTR [eax+eax] and edx, ecx sub eax, edx ; Line 11 ret 0 _absolute ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _maximum _TEXT SEGMENT _x$ = 8 ; size = 4 _y$ = 12 ; size = 4 _maximum PROC ; COMDAT ; File c:\users\stefan\desktop\case28.c ; Line 20 mov eax, DWORD PTR _x$[esp-4] mov edx, eax sub edx, DWORD PTR _y$[esp-4] ; Line 21 mov ecx, edx sar ecx, 31 ; 0000001fH and ecx, edx sub eax, ecx ; Line 24 ret 0 _maximum ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _minimum _TEXT SEGMENT _x$ = 8 ; size = 4 _y$ = 12 ; size = 4 _minimum PROC ; COMDAT ; File c:\users\stefan\desktop\case28.c ; Line 32 mov ecx, DWORD PTR _y$[esp-4] sub ecx, DWORD PTR _x$[esp-4] ; Line 33 mov eax, ecx sar eax, 31 ; 0000001fH and eax, ecx add eax, DWORD PTR _x$[esp-4] ; Line 36 ret 0 _minimum ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _sign _TEXT SEGMENT _x$ = 8 ; size = 4 _sign PROC ; COMDAT ; File c:\users\stefan\desktop\case28.c ; Line 44 mov eax, DWORD PTR _x$[esp-4] sar eax, 31 ; 0000001fH ; Line 46 ret 0 _sign ENDP _TEXT ENDS ENDWhile the generated code is correct, the optimiser fails to recognise these well-known but superfluous
old school
Repeat the previous steps with the alternate implementation; generate another assembly listing file case28.asm from the source file case28.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro ALTERNATE defined on the command line:
CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tccase28.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case28.c
Display the assembly listing file case28.asm created in step 4.:
TYPE case28.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case28.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _absolute PUBLIC _maximum PUBLIC _minimum PUBLIC _sign ; Function compile flags: /Ogtpy ; COMDAT _absolute _TEXT SEGMENT _x$ = 8 ; size = 4 _absolute PROC ; COMDAT ; File c:\users\stefan\desktop\case28.c ; Line 4 push esi; Line 6 mov eax, DWORD PTR _x$[esp]mov esi, DWORD PTR _x$[esp] mov eax, esicdq ; Line 7mov ecx, edx sar ecx, 31 ; 0000001fH lea ecx, DWORD PTR [esi+esi]lea ecx, DWORD PTR [eax+eax] and edx, ecx sub eax, edxsub esi, edx mov eax, esi pop esi; Line 11 ret 0 _absolute ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _maximum _TEXT SEGMENT _x$ = 8 ; size = 4 _y$ = 12 ; size = 4 _maximum PROC ; COMDAT ; File c:\users\stefan\desktop\case28.c ; Line 14push esi push edi; Line 16mov edi, DWORD PTR _x$[esp+4] mov esi, edi sub esi, DWORD PTR _y$[esp+4] mov eax, esimov ecx, DWORD PTR _x$[esp+4] mov eax, ecx sub eax, DWORD PTR _y$[esp+4] cdq ; Line 17mov ecx, edx and edx, esiand edx, eax sub ecx, edxsub edi, edx sar ecx, 31 ; 0000001fH; Line 23 mov eax, ecxmov eax, edi pop edi pop esi; Line 24 ret 0 _maximum ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _minimum _TEXT SEGMENT _x$ = 8 ; size = 4 _y$ = 12 ; size = 4 _minimum PROC ; COMDAT ; File c:\users\stefan\desktop\case28.c ; Line 27push esi; Line 29mov esi, DWORD PTR _y$[esp] sub esi, DWORD PTR _x$[esp] mov eax, esimov eax, DWORD PTR _y$[esp] sub eax, DWORD PTR _x$[esp] cdq ; Line 30mov ecx, edx and edx, esiand edx, eax add edx, DWORD PTR _x$[esp]sar ecx, 31 ; 0000001fH; Line 35 mov eax, edxpop esi; Line 36 ret 0 _minimum ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _sign _TEXT SEGMENT _x$ = 8 ; size = 4 _sign PROC ; COMDAT ; File c:\users\stefan\desktop\case28.c ; Line 41 mov eax, DWORD PTR _x$[esp-4] cdq ; Line 42 mov eax, edxsar eax, 31 ; 0000001fH mov eax, edx; Line 46 ret 0 _sign ENDP _TEXT ENDS END
While the generated code is correct, it uses the registers EDI and ESI without necessity, and the majority of instructions are superfluous.
Overwrite the text file case28.c created in step 1. with the following content:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

int absolute(int x)
{
    return x < 0 ? -x : x;
}

int maximum(int x, int y)
{
    return x > y ? x : y;
}

int minimum(int x, int y)
{
    return x < y ? x : y;
}

int sign(int x)
{
    return x < 0 ? -1 : 0;
}
Generate the assembly listing file case28.asm from the source file case28.c created in step 6., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase28.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case28.c
Display the assembly listing file case28.asm created in step 7.:
TYPE case28.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case28.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _absolute PUBLIC _maximum PUBLIC _minimum PUBLIC _sign ; Function compile flags: /Ogtpy ; COMDAT _absolute _TEXT SEGMENT _x$ = 8 ; size = 4 _absolute PROC ; COMDAT ; File c:\users\stefan\desktop\case28.c ; Line 5 mov eax, DWORD PTR _x$[esp-4] cdq xor eax, edx sub eax, edx ; Line 6 ret 0 _absolute ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _maximum _TEXT SEGMENT _x$ = 8 ; size = 4 _y$ = 12 ; size = 4 _maximum PROC ; COMDAT ; File c:\users\stefan\desktop\case28.c ; Line 10 mov eax, DWORD PTR _y$[esp-4] cmp DWORD PTR _x$[esp-4], eax cmovg eax, DWORD PTR _x$[esp-4] ; Line 11 ret 0 _maximum ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _minimum _TEXT SEGMENT _x$ = 8 ; size = 4 _y$ = 12 ; size = 4 _minimum PROC ; COMDAT ; File c:\users\stefan\desktop\case28.c ; Line 15 mov eax, DWORD PTR _y$[esp-4] cmp DWORD PTR _x$[esp-4], eax cmovl eax, DWORD PTR _x$[esp-4] ; Line 16 ret 0 _minimum ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _sign _TEXT SEGMENT _x$ = 8 ; size = 4 _sign PROC ; COMDAT ; File c:\users\stefan\desktop\case28.c ; Line 20 mov eax, DWORD PTR _x$[esp-4] sar eax, 31 ; 0000001fH ; Line 21 ret 0 _sign ENDP _TEXT ENDS END
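The bit-twiddling variants from step 1. and the straightforward variants from step 6. are meant to compute the same results. A minimal C sketch of a cross-check, assuming a 32-bit int, an arithmetic right shift of negative values, and operands small enough that x + x and x - y do not overflow; all function names with a _twiddled or _plain suffix as well as variants_agree() are hypothetical:

```c
/* Variants from step 1.; x >> 31 is assumed to yield 0 or -1 (arithmetic shift). */
static int absolute_twiddled(int x) { return x - ((x + x) & (x >> 31)); }
static int maximum_twiddled(int x, int y) { y = -y; y += x; x -= y & (y >> 31); return x; }
static int minimum_twiddled(int x, int y) { y -= x; x += y & (y >> 31); return x; }

/* Variants from step 6. */
static int absolute_plain(int x) { return x < 0 ? -x : x; }
static int maximum_plain(int x, int y) { return x > y ? x : y; }
static int minimum_plain(int x, int y) { return x < y ? x : y; }

/* Hypothetical checker: 1 if both variants agree for all x, y in [-limit, limit]. */
static int variants_agree(int limit)
{
    int x, y;

    for (x = -limit; x <= limit; ++x) {
        if (absolute_twiddled(x) != absolute_plain(x))
            return 0;
        for (y = -limit; y <= limit; ++y) {
            if (maximum_twiddled(x, y) != maximum_plain(x, y))
                return 0;
            if (minimum_twiddled(x, y) != minimum_plain(x, y))
                return 0;
        }
    }
    return 1;
}
```

The restriction to small operands matters: unlike the plain variants, the twiddled ones overflow for large magnitudes, so the two sets are interchangeable only within that range.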
Create the text file case29.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

unsigned long ul(unsigned long x)
{
#ifdef ALTERNATE
    while (~x & 1)
#else
    while (!(x & 1))
#endif
        x >>= 1;

    return x;
}

unsigned long long ull(unsigned long long x)
{
#ifdef ALTERNATE
    while (~x & 1)
#else
    while (!(x & 1))
#endif
        x >>= 1;

    return x;
}
Generate the assembly listing file case29.asm from the source file case29.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase29.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case29.c
Display the assembly listing file case29.asm created in step 2.:
TYPE case29.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case29.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _ul PUBLIC _ull ; Function compile flags: /Ogtpy ; COMDAT _ul _TEXT SEGMENT _x$ = 8 ; size = 4 _ul PROC ; COMDAT ; File c:\users\stefan\desktop\case29.c ; Line 8 mov eax, DWORD PTR _x$[esp-4] test al, 1 jne SHORT $LN3@ul $LL2@ul: ; Line 10 shr eax, 1 test al, 1 je SHORT $LL2@ul $LN3@ul: ; Line 13 ret 0 _ul ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _ull _TEXT SEGMENT _x$ = 8 ; size = 8 _ull PROC ; COMDAT ; File c:\users\stefan\desktop\case29.c ; Line 20 mov eax, DWORD PTR _x$[esp-4]mov ecx, eaxmov edx, DWORD PTR _x$[esp]and ecx, 1 or ecx, 0test al, 1 jne SHORT $LN3@ull $LL2@ull: ; Line 22 shrd eax, edx, 1mov ecx, eaxshr edx, 1and ecx, 1 or ecx, 0test al, 1 je SHORT $LL2@ull $LN3@ull: ; Line 25 ret 0 _ull ENDP _TEXT ENDS END
Generate another assembly listing file case29.asm from the source file case29.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture, with the preprocessor macro ALTERNATE defined on the command line:
CL.EXE /Bv /c /DALTERNATE /Fa /FoNUL: /Gy /Ox /Tccase29.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case29.c
Display the assembly listing file case29.asm created in step 4.:
TYPE case29.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case29.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _ul PUBLIC _ull ; Function compile flags: /Ogtpy ; COMDAT _ul _TEXT SEGMENT _x$ = 8 ; size = 4 _ul PROC ; COMDAT ; File c:\users\stefan\desktop\case29.c ; Line 6 mov eax, DWORD PTR _x$[esp-4] mov ecx, eax not ecx test cl, 1 je SHORT $LN3@ul npad 3test al, 1 jne SHORT $LN3@ul $LL2@ul: ; Line 10 shr eax, 1mov ecx, eax not ecx test cl, 1 jne SHORT $LL2@ultest al, 1 je SHORT $LL2@ul $LN3@ul: ; Line 13 ret 0 _ul ENDP _TEXT ENDS ; Function compile flags: /Ogtpy ; COMDAT _ull _TEXT SEGMENT _x$ = 8 ; size = 8 _ull PROC ; COMDAT ; File c:\users\stefan\desktop\case29.c ; Line 18 mov eax, DWORD PTR _x$[esp-4]mov ecx, eaxmov edx, DWORD PTR _x$[esp]not ecx and ecx, 1 or ecx, 0 je SHORT $LN3@ulltest al, 1 jne SHORT $LN3@ull $LL2@ull: ; Line 22 shrd eax, edx, 1mov ecx, eaxshr edx, 1not ecx and ecx, 1 or ecx, 0 jne SHORT $LL2@ulltest al, 1 je SHORT $LL2@ull $LN3@ull: ; Line 25 ret 0 _ull ENDP _TEXT ENDS END
The optimiser fails to recognise the commonly used (alternate) way to test whether an integer is even … or it is just bad at elementary boolean logic.
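Both loop conditions in case29.c test the same thing: bit 0 of x is clear. A minimal C sketch that checks this, with the hypothetical names strip_plain(), strip_alternate() and conditions_agree() introduced only for illustration:

```c
/* Loop condition of the default variant: !(x & 1). */
static unsigned long strip_plain(unsigned long x)
{
    while (!(x & 1))
        x >>= 1;

    return x;
}

/* Loop condition of the ALTERNATE variant: ~x & 1. */
static unsigned long strip_alternate(unsigned long x)
{
    while (~x & 1)
        x >>= 1;

    return x;
}

/* Hypothetical checker: 1 if both conditions agree for every x in [0, limit]. */
static int conditions_agree(unsigned long limit)
{
    unsigned long x;

    for (x = 0; x <= limit; ++x)
        if ((!(x & 1)) != ((~x & 1) != 0))
            return 0;

    return 1;
}
```

Neither variant terminates for x == 0, so the strip functions must only be called with non-zero arguments; conditions_agree() evaluates the conditions without looping on them and may therefore include 0.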
__getcallerseflags() by the Visual C 2017 compiler (and previous versions too):
Create the text file case30.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

int main()
{
    return __getcallerseflags();
}
Generate the assembly listing file case30.asm from the source file case30.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase30.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case30.c
Display the assembly listing file case30.asm created in step 2.:
TYPE case30.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case30.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _main ; Function compile flags: /Ogtpy ; COMDAT _main _TEXT SEGMENT__$Eflags$ = 4 ; size = 4_main PROC ; COMDAT ; File c:\users\stefan\desktop\case30.c ; Line 4 pushfdpush ebp mov ebp, esp; Line 5mov eax, DWORD PTR __$Eflags$[ebp]; Line 6pop ebp pop ecxpop eax ret 0 _main ENDP _TEXT ENDS END
Generate the assembly listing file case30.asm from the source file case30.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /Ox /Tccase30.c /W4 /Zl
Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0 case30.c
Display the assembly listing file case30.asm created in step 4.:
TYPE case30.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 include listing.inc INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC main … ; Function compile flags: /Ogtpy ; COMDAT main _TEXT SEGMENT__$Eflags$ = 0main PROC ; COMDAT ; File c:\users\stefan\desktop\case30.c ; Line 4 $LN4: pushfq ; Line 5mov eax, DWORD PTR __$Eflags$[rsp]; Line 6pop rcxpop rax ret 0 main ENDP _TEXT ENDS END
Create the text file case31.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

#define STRICT
#define UNICODE
#define WIN32_LEAN_AND_MEAN

#include <windows.h>
#include <unknwn.h>

#define IF2CO(class, member, interface) (&((class *) 0)->member == interface, \
                                         ((class *) (((char *) interface) - (size_t) &(((class *) 0)->member))))

extern const GUID CLSID_NULL;
extern DWORD dwCount;

typedef struct _CUnknown
{
    DWORD dwCount;
    IUnknown Unknown;
} CUnknown;

HRESULT WINAPI Unknown_QueryInterface(IUnknown *this, REFIID rIID, VOID **ppv)
{
    CUnknown *that = IF2CO(CUnknown, Unknown, this);

    if (ppv == NULL)
        return E_POINTER;

    *ppv = NULL;

    if (rIID == NULL)
        return E_INVALIDARG;

    if (!IsEqualIID(rIID, &IID_IUnknown))
        return E_NOINTERFACE;

    *ppv = &that->Unknown;

    _InterlockedIncrement(&that->dwCount);

    return S_OK;
}

DWORD WINAPI Unknown_AddRef(IUnknown *this)
{
    CUnknown *that = IF2CO(CUnknown, Unknown, this);

    return _InterlockedIncrement(&that->dwCount);
}

DWORD WINAPI Unknown_Release(IUnknown *this)
{
    CUnknown *that = IF2CO(CUnknown, Unknown, this);

    DWORD dw = _InterlockedDecrement(&that->dwCount);

    if (dw != 0L)
        return dw;

    _InterlockedDecrement(&dwCount);

    CoTaskMemFree(that);

    return 0L;
}

const IUnknownVtbl Unknown_Vtbl = {Unknown_QueryInterface, Unknown_AddRef, Unknown_Release};
Note: this ANSI C source is a minimal implementation of the IUnknown interface.
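The IF2CO macro in this source recovers a pointer to the containing object from a pointer to its embedded interface by subtracting the member's offset. The following is only a minimal, portable sketch of that technique; the names Inner, Outer and outer_from_inner are hypothetical and not part of the original source, and the standard offsetof macro is used here instead of the null pointer expression of IF2CO:

```c
#include <stddef.h>   /* offsetof */

/* Hypothetical stand-in for an interface embedded in an object. */
typedef struct Inner {
    const void *vtbl;
} Inner;

/* Hypothetical stand-in for CUnknown: a counter followed by the interface. */
typedef struct Outer {
    unsigned long count;
    Inner         inner;
} Outer;

/* Step back from the embedded member to the start of the containing object. */
static Outer *outer_from_inner(Inner *inner)
{
    return (Outer *) ((char *) inner - offsetof(Outer, inner));
}
```

For an object `Outer o`, `outer_from_inner(&o.inner)` yields `&o` again. In the x86 listing of this case the same adjustment shows up as `add edx, -4`, since the interface follows a single 4-byte DWORD.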
Generate the assembly listing file case31.asm from the source file case31.c created in step 1., using the Visual C 2010 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /O1is /Tccase31.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\cl.exe: Version 16.00.40219.1 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1.dll: Version 16.00.40219.400 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1xx.dll: Version 16.00.40219.400 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c2.dll: Version 16.00.40219.449 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\link.exe: Version 10.00.40219.386 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\mspdb100.dll: Version 10.00.40219.478 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\1033\clui.dll: Version 16.00.40219.1 case31.c
Display the assembly listing file case31.asm created in step 2.:
TYPE case31.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.449
TITLE C:\Users\Stefan\Desktop\case31.c
.686P
.XMM
include listing.inc
.model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC _Unknown_Release@4
PUBLIC _Unknown_AddRef@4
PUBLIC _Unknown_QueryInterface@12
PUBLIC _Unknown_Vtbl
; COMDAT CONST
CONST SEGMENT
_Unknown_Vtbl DD FLAT:_Unknown_QueryInterface@12
DD FLAT:_Unknown_AddRef@4
DD FLAT:_Unknown_Release@4
CONST ENDS
EXTRN _IID_IUnknown:BYTE
; Function compile flags: /Ogspy
; COMDAT _Unknown_QueryInterface@12
_TEXT SEGMENT
_this$ = 8 ; size = 4
_rIID$ = 12 ; size = 4
_ppv$ = 16 ; size = 4
_Unknown_QueryInterface@12 PROC ; COMDAT
; File c:\users\stefan\desktop\case31.c
; Line 26
mov edx, DWORD PTR _this$[esp-4]
; Line 28
mov eax, DWORD PTR _ppv$[esp-4]
add edx, -4 ; fffffffcH
test eax, eax
jne SHORT $LN3@Unknown_Qu
; Line 29
mov eax, -2147467261 ; 80004003H
jmp SHORT $LN4@Unknown_Qu
$LN3@Unknown_Qu:
; Line 31
and DWORD PTR [eax], 0
push esi
; Line 33
mov esi, DWORD PTR _rIID$[esp]
test esi, esi
jne SHORT $LN2@Unknown_Qu
; Line 34
mov eax, -2147024809 ; 80070057H
jmp SHORT $LN7@Unknown_Qu
$LN2@Unknown_Qu:
push ebx
push edi
; Line 36
push 4
pop ecx
xor ebx, ebx
mov edi, OFFSET _IID_IUnknown
repe cmpsd
pop edi
pop ebx
je SHORT $LN1@Unknown_Qu
; Line 37
mov eax, -2147467262 ; 80004002H
jmp SHORT $LN7@Unknown_Qu
$LN1@Unknown_Qu:
; Line 39
lea ecx, DWORD PTR [edx+4]
mov DWORD PTR [eax], ecx
; Line 41
xor eax, eax
inc eax
lock xadd DWORD PTR [edx], eax
; Line 43
xor eax, eax
$LN7@Unknown_Qu:
pop esi
$LN4@Unknown_Qu:
; Line 44
ret 12 ; 0000000cH
_Unknown_QueryInterface@12 ENDP
_TEXT ENDS
; Function compile flags: /Ogspy
; COMDAT _Unknown_AddRef@4
_TEXT SEGMENT
_this$ = 8 ; size = 4
_Unknown_AddRef@4 PROC ; COMDAT
; Line 48
mov ecx, DWORD PTR _this$[esp-4]
; Line 50
xor eax, eax
add ecx, -4 ; fffffffcH
inc eax
lock xadd DWORD PTR [ecx], eax
inc eax
; Line 51
ret 4
_Unknown_AddRef@4 ENDP
_TEXT ENDS
EXTRN __imp__CoTaskMemFree@4:PROC
EXTRN _dwCount:DWORD
; Function compile flags: /Ogspy
; COMDAT _Unknown_Release@4
_TEXT SEGMENT
_this$ = 8 ; size = 4
_Unknown_Release@4 PROC ; COMDAT
; Line 55
mov ecx, DWORD PTR _this$[esp-4]
add ecx, -4 ; fffffffcH
; Line 56
mov edx, ecx
or eax, -1
lock xadd DWORD PTR [edx], eax
dec eax
; Line 59
jne SHORT $LN2@Unknown_Re
; Line 61
mov eax, OFFSET _dwCount
or edx, -1
lock xadd DWORD PTR [eax], edx
; Line 63
push ecx
call DWORD PTR __imp__CoTaskMemFree@4
; Line 65
xor eax, eax
$LN2@Unknown_Re:
; Line 66
ret 4
_Unknown_Release@4 ENDP
_TEXT ENDS
END
Notice the in(s)ane use of the EBX register around the inlined memcmp() function.
__report_rangecheckfailure() generated by the Visual C 2017 compiler.
Create the text file case32.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
#define MAX_PATH 260
typedef unsigned short wchar_t;
unsigned __stdcall GetModuleFileNameA(void *, char *, unsigned);
int main()
{
char sz[MAX_PATH];
unsigned dw = GetModuleFileNameA(0, sz, MAX_PATH);
if (dw < MAX_PATH)
sz[dw] = '\0';
}
unsigned __stdcall GetModuleFileNameW(void *, wchar_t *, unsigned);
int wmain()
{
wchar_t sz[MAX_PATH];
unsigned dw = GetModuleFileNameW(0, sz, MAX_PATH);
if (dw < MAX_PATH)
sz[dw] = L'\0';
}
Generate the assembly listing file case32.asm from the source file case32.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /O1s /Tccase32.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case32.c
Display the assembly listing file case32.asm created in step 2.:
TYPE case32.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case32.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _main PUBLIC _wmainNotice the difference between the single-byte character routineEXTRN ___report_rangecheckfailure:PROCEXTRN _GetModuleFileNameA@12:PROC EXTRN _GetModuleFileNameW@12:PROC EXTRN @__security_check_cookie@4:PROC EXTRN ___security_cookie:DWORD ; Function compile flags: /Ogspy ; COMDAT _main _TEXT SEGMENT _sz$ = -264 ; size = 260 __$ArrayPad$ = -4 ; size = 4 _main PROC ; COMDAT ; File c:\users\stefan\desktop\case32.c ; Line 10 push ebp mov ebp, esp sub esp, 264 ; 00000108H mov eax, DWORD PTR ___security_cookie xor eax, ebp mov DWORD PTR __$ArrayPad$[ebp], eax push esi ; Line 12 mov esi, 260 ; 00000104H lea eax, DWORD PTR _sz$[ebp] push esi push eax push 0 call _GetModuleFileNameA@12 ; Line 14 cmp eax, esi pop esi jae SHORT $LN2@main ; Line 15mov BYTE PTR _sz$[ebp+eax], 0$LN2@main: ; Line 16 mov ecx, DWORD PTR __$ArrayPad$[ebp] xor eax, eax xor ecx, ebp call @__security_check_cookie@4mov esp, ebp pop ebpleave ret 0 _main ENDP _TEXT ENDS ; Function compile flags: /Ogspy ; COMDAT _wmain _TEXT SEGMENT _sz$ = -524 ; size = 520 __$ArrayPad$ = -4 ; size = 4 _wmain PROC ; COMDAT ; File c:\users\stefan\desktop\case32.c ; Line 21 push ebp mov ebp, esp sub esp, 524 ; 0000020cH mov eax, DWORD PTR ___security_cookie xor eax, ebp mov DWORD PTR __$ArrayPad$[ebp], eax push esi ; Line 23 mov esi, 260 ; 00000104H lea eax, DWORD PTR _sz$[ebp] push esi push eax push 0 call _GetModuleFileNameW@12 ; Line 25 cmp eax, esi pop esi jae SHORT $LN2@wmain ; Line 26add eax, eax cmp eax, 520 ; 00000208H jae SHORT $LN9@wmain$LN2@wmain: ; Line 27 mov ecx, DWORD PTR __$ArrayPad$[ebp] xor eax, eax xor ecx, ebp call @__security_check_cookie@4mov esp, ebp pop ebpleave ret 0 $LN9@wmain: ; Line 26call ___report_rangecheckfailure$LN11@wmain: $LN8@wmain:int 3_wmain ENDP _TEXT ENDS END
Notice the difference between the single-byte character routine main() and the double-byte character routine wmain(): in the former, the conditional assignment of the terminating NUL character is not removed, although the array is never read afterwards; in the latter, the assignment is removed, but a superfluous range check with a conditional branch that can never be taken is inserted instead, plus an unreachable call of the external routine __report_rangecheckfailure()!
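Why the range check in wmain() can never fire: the guard `dw < MAX_PATH` in the source already bounds the index, so the scaled byte offset always stays below the size of the array. The sketch below merely enumerates every index that passes the guard and counts those that would trip the inserted check; the names wide_char and count_range_check_hits are hypothetical helpers, not part of the original source:

```c
#include <stddef.h>   /* size_t */

#define MAX_PATH 260

typedef unsigned short wide_char;   /* hypothetical stand-in for wchar_t */

/* Count indices that pass the source-level guard (dw < MAX_PATH) but would
   still trip a check of the scaled byte offset against the array size. */
static unsigned count_range_check_hits(void)
{
    const size_t array_bytes = MAX_PATH * sizeof(wide_char);   /* 520 with 2-byte characters */
    unsigned dw, hits = 0;

    for (dw = 0; dw < MAX_PATH; dw++)
        if (dw * sizeof(wide_char) >= array_bytes)   /* models: add eax, eax / cmp eax, 520 / jae */
            hits++;

    return hits;
}
```

The function returns 0: no index admitted by the guard reaches the branch to __report_rangecheckfailure().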
Generate the assembly listing file case32.asm from the source file case32.c created in step 1., using the Visual C 2017 compiler for the x64 alias AMD64 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /O1s /Tccase32.c /W4 /Zl
Microsoft (R) C/C++ Optimizing Compiler Version 19.13.26129.01 for x64 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx64\x64\1033\clui.dll: Version 19.13.26129.0 case32.c
Display the assembly listing file case32.asm created in step 4.:
TYPE case32.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 include listing.inc INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC main PUBLIC wmainNotice the superfluousEXTRN __report_rangecheckfailure:PROCEXTRN GetModuleFileNameA:PROC EXTRN GetModuleFileNameW:PROC EXTRN __GSHandlerCheck:PROC EXTRN __security_check_cookie:PROC EXTRN __security_cookie:QWORD … ; Function compile flags: /Ogspy ; COMDAT main _TEXT SEGMENT sz$ = 32 __$ArrayPad$ = 304 main PROC ; COMDAT ; File c:\users\stefan\desktop\case32.c ; Line 10 $LN10: sub rsp, 328 ; 00000148H mov rax, QWORD PTR __security_cookie xor rax, rsp mov QWORD PTR __$ArrayPad$[rsp], rax ; Line 12 mov r8d, 260 ; 00000104H lea rdx, QWORD PTR sz$[rsp] xor ecx, ecx call GetModuleFileNameA ; Line 14 cmp eax, 260 ; 00000104H jae SHORT $LN2@main ; Line 15mov eax, eax cmp rax, 260 ; 00000104H jae SHORT $LN9@main mov BYTE PTR sz$[rsp+rax], 0$LN2@main: ; Line 16 xor eax, eax mov rcx, QWORD PTR __$ArrayPad$[rsp] xor rcx, rsp call __security_check_cookie add rsp, 328 ; 00000148H ret 0 $LN9@main: ; Line 15call __report_rangecheckfailure int 3$LN8@main: main ENDP _TEXT ENDS ; Function compile flags: /Ogspy ; COMDAT wmain _TEXT SEGMENT sz$ = 32 __$ArrayPad$ = 560 wmain PROC ; COMDAT ; File c:\users\stefan\desktop\case32.c ; Line 21 $LN11: sub rsp, 584 ; 00000248H mov rax, QWORD PTR __security_cookie xor rax, rsp mov QWORD PTR __$ArrayPad$[rsp], rax ; Line 23 mov r8d, 260 ; 00000104H lea rdx, QWORD PTR sz$[rsp] xor ecx, ecx call GetModuleFileNameW ; Line 25 cmp eax, 260 ; 00000104H jae SHORT $LN2@wmain ; Line 26mov eax, eax add rax, rax cmp rax, 520 ; 00000208H jae SHORT $LN9@wmain$LN2@wmain: ; Line 27 xor eax, eax mov rcx, QWORD PTR __$ArrayPad$[rsp] xor rcx, rsp call __security_check_cookie add rsp, 584 ; 00000248H ret 0 $LN9@wmain: ; Line 26call __report_rangecheckfailure int 3$LN8@wmain: wmain ENDP _TEXT ENDS END
Notice the superfluous range checks with conditional branches that can never be taken, plus the unreachable calls of the external routine __report_rangecheckfailure()!
Also notice that the conditional assignment of the terminating NUL character is again not removed in the single-byte character routine main().
Create the text file case33.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
#include <mmintrin.h>
int main(int argc)
{
const __m64 t = _mm_cvtsi32_si64(argc);
const __m64 u = _mm_set1_pi8(0);
const __m64 v = _mm_set_pi8(1, 2, 3, 4, 5, 6, 7, 8);
const __m64 w = _mm_setr_pi8(1, 2, 3, 4, 5, 6, 7, 8);
const __m64 x = _mm_or_si64(t, u);
const __m64 y = _mm_and_si64(v, w);
const __m64 z = _mm_xor_si64(x, y);
return _mm_cvtsi64_si32(z);
}
Generate the assembly listing file case33.asm from the source file case33.c created in step 1., using the Visual C 2017 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /O1s /Tccase33.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 19.13.26129.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\cl.exe: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c1xx.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\c2.dll: Version 19.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\link.exe: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\mspdb140.dll: Version 14.13.26129.0 C:\Program Files\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\bin\Hostx86\x86\1033\clui.dll: Version 19.13.26129.0 case33.c
Display the assembly listing file case33.asm created in step 2.:
TYPE case33.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.13.26129.0 TITLE C:\Users\Stefan\Desktop\case33.c .686P .XMM include listing.inc .model flat INCLUDELIB LIBCMT INCLUDELIB OLDNAMES PUBLIC _main ; COMDAT _CONST _CONST SEGMENT bar DQ 0807060504030201h foo DQ 0102030405060708h _CONST ENDS ; Function compile flags: /Ogspy ; COMDAT _main _TEXT SEGMENTThe compiler abuses antv138 = -8 ; size = 8 tv129 = -8 ; size = 8_argc$ = 8 ; size = 4 _main PROC ; COMDAT ; File c:\users\stefan\desktop\case33.c ; Line 6 push ebp mov ebp, espand esp, -8 ; fffffff8H sub esp, 8; Line 7 movd mm2, DWORD PTR _argc$[ebp] ; Line 8xor al, al; Line 9mov DWORD PTR tv129[esp+12], 16909060 ; 01020304H mov DWORD PTR tv129[esp+8], 84281096 ; 05060708H movd mm0, al punpcklbw mm0, mm0 punpcklbw mm0, mm0 punpcklbw mm0, mm0pxor mm0, mm0 ; Line 11 por mm2, mm0movq mm1, MMWORD PTR tv129[esp+8]movq mm1, MMWORD PTR barmov DWORD PTR tv138[esp+8], 67305985 ; 04030201H mov DWORD PTR tv138[esp+12], 134678021 ; 08070605H movq mm0, MMWORD PTR tv138[esp+8]movq mm0, MMWORD PTR foo ; Line 12 pand mm0, mm1 ; Line 13 pxor mm0, mm2 ; Line 15 movd eax, mm0 ; Line 16 mov esp, ebp pop ebp ret 0 _main ENDP _TEXT ENDS END
The compiler abuses an XOR, a MOVD plus three PUNPCKLBW instructions to broadcast the constant 0 into every byte of the MMX™ register MM0 instead of a single PXOR instruction; the following POR instruction is superfluous: a logical or with 0 has no effect!
The constants 0x0102030405060708 and 0x0807060504030201 loaded into the MMX registers MM1 and MM0, respectively, are built at runtime, using two superfluous temporary variables tv129 and tv138, instead of at compile time.
Note: a proper optimising compiler would of course replace both constants and the PAND instruction with the resulting constant 0x0002020404020200!
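The folded constant can be checked by hand with plain 64-bit integer arithmetic. The following sketch is only a scalar model of case33.c under the assumption that _mm_cvtsi32_si64() zero-extends and _mm_cvtsi64_si32() keeps the low 32 bits; the names V_CONST, W_CONST, folded_and and model_main are hypothetical:

```c
#include <stdint.h>

/* Values built by _mm_set_pi8(1, ..., 8) and _mm_setr_pi8(1, ..., 8). */
#define V_CONST UINT64_C(0x0102030405060708)
#define W_CONST UINT64_C(0x0807060504030201)

/* y = v & w, which a compiler could evaluate at compile time. */
static uint64_t folded_and(void)
{
    return V_CONST & W_CONST;
}

/* Scalar model of main(): (t | 0) ^ (v & w), truncated to 32 bits. */
static int32_t model_main(int32_t argc)
{
    uint64_t t = (uint32_t) argc;     /* zero-extended like MOVD */
    uint64_t x = t | 0;               /* OR with 0 changes nothing */
    uint64_t z = x ^ folded_and();

    return (int32_t) (uint32_t) z;    /* low half, like MOVD to EAX */
}
```

Here `folded_and()` yields 0x0002020404020200, so the whole function reduces to a single exclusive or of argc with 0x04020200.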
ceil() and floor() by the Visual C 2010 compiler for the x64 alias AMD64 processor architecture.
Create the text file case34.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
#pragma intrinsic(ceil, floor)
double trunc(double value)
{
return value < 0.0 ? ceil(value) : floor(value);
}
Generate the assembly listing file case34.asm from the source file case34.c created in step 1., using the Visual C 2010 compiler for the x64 alias AMD64 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /fp:fast /Gy /Ox /Tccase34.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for x64 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\cl.exe: Version 16.00.40219.1 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\c1.dll: Version 16.00.40219.400 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\c1xx.dll: Version 16.00.40219.400 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\c2.dll: Version 16.00.40219.449 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\link.exe: Version 10.00.40219.386 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\mspdb100.dll: Version 10.00.40219.478 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\1033\clui.dll: Version 16.00.40219.1 case34.c
Display the assembly listing file case34.asm created in step 2.:
TYPE case34.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.449 include listing.inc INCLUDELIB LIBCMT INCLUDELIB OLDNAMESThe compiler generatesPUBLIC __real@0000000000000000PUBLIC trunc EXTRN _fltused:DWORD ; File c:\users\stefan\desktop\case34.c; COMDAT __real@0000000000000000 CONST SEGMENT __real@0000000000000000 DQ 00000000000000000r ; 0 CONST ENDS; Function compile flags: /Ogtpy _TEXT SEGMENT value$ = 8 trunc PROC ; Line 6comisd xmm0, QWORD PTR __real@0000000000000000movmskpd rax, xmm0 and eax, 1 cvttsd2si rcx, xmm0xorpd xmm1, xmm1 mov rax, -9223372036854775808 ; 8000000000000000H jae SHORT $LN3@truncjz SHORT $LN3@trunccmp rcx, rax je SHORT $LN5@truncneg rcx jo SHORT $LN5@trunc neg rcxpxor xmm1, xmm1cvtsi2sd xmm1, rcx ucomisd xmm1, xmm0 je SHORT $LN5@truncunpcklpd xmm0, xmm0movmskpd rax, xmm0pxor xmm0, xmm0and eax, 1 xor eax, 1 add rcx, rax cvtsi2sd xmm0, rcx ; Line 7 ret 0 $LN3@trunc: ; Line 6cmp rcx, rax je SHORT $LN5@truncneg rcx jo SHORT $LN5@trunc neg rcxpxor xmm1, xmm1cvtsi2sd xmm1, rcx ucomisd xmm1, xmm0 je SHORT $LN5@truncunpcklpd xmm0, xmm0movmskpd rax, xmm0pxor xmm0, xmm0and eax, 1 sub rcx, rax cvtsi2sd xmm0, rcx $LN5@trunc: ; Line 7 fatret 0 trunc ENDP _TEXT ENDS END
The compiler generates PXOR, XORPD and UNPCKLPD instructions to sanitise the upper lane of the XMM registers, although the content of their upper lanes is not used, except by the MOVMSKPD instructions, whose output is however sanitised by the AND instructions.
The comparison with the constant 0x8000000000000000, the lowest signed 64-bit integer, followed by a jump-on-equal, is quite bad: it can be replaced with a negation followed by a jump-on-overflow, rendering the MOV instruction with its 8-byte immediate value superfluous.
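The equivalence behind this replacement can be modelled in portable C: a two's complement negation overflows for exactly one operand, the lowest signed 64-bit integer (which is also the value CVTTSD2SI delivers when the conversion fails). The function name neg_overflows is hypothetical, and unsigned arithmetic is used because negating INT64_MIN as a signed value is undefined in C:

```c
#include <stdint.h>

/* Models the overflow flag of a 64-bit NEG: it is set exactly when the
   operand and its wrapped negation both have the sign bit set. */
static int neg_overflows(int64_t value)
{
    uint64_t operand = (uint64_t) value;
    uint64_t negated = (uint64_t) 0 - operand;   /* wraps modulo 2^64 */

    return (int) ((operand & negated) >> 63);
}
```

The function returns 1 only for INT64_MIN, so a NEG followed by a jump-on-overflow selects the same path as the comparison with 0x8000000000000000; a second NEG restores the operand on the other path.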
The initial comparison with the constant 0.0 is clumsy: it can be replaced with a MOVMSKPD plus an AND instruction, saving a memory access and the storage for the constant __real@0000000000000000; an XORPD instruction to zero a register followed by a comparison with it instead of the constant would also be better than the current code.
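What the suggested MOVMSKPD plus AND pair computes can be sketched in C as a plain extraction of the sign bit. The sketch assumes that double is a 64-bit IEEE 754 value; the function name sign_bit is hypothetical:

```c
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Extracts the sign bit of a double, as MOVMSKPD followed by AND with 1
   does for the low lane of an XMM register. */
static int sign_bit(double value)
{
    uint64_t bits;

    memcpy(&bits, &value, sizeof bits);   /* copy the bit pattern */
    return (int) (bits >> 63);
}
```

Unlike `value < 0.0`, the sign bit is also set for -0.0 and for negative NaNs; since ceil() and floor() return the same result for these inputs, the choice of branch should not change what trunc() returns.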
Generate another assembly listing file case34.asm from the source file case34.c created in step 1., now using the Visual C 2010 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /fp:fast /Gy /Ox /Tccase34.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\cl.exe: Version 16.00.40219.1 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1.dll: Version 16.00.40219.400 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1xx.dll: Version 16.00.40219.400 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c2.dll: Version 16.00.40219.449 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\link.exe: Version 10.00.40219.386 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\mspdb100.dll: Version 10.00.40219.478 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\1033\clui.dll: Version 16.00.40219.1 case34.c case34.c(3) : warning C4163: 'ceil' : not available as an intrinsic function case34.c(7) : warning C4013: 'ceil' undefined; assuming extern returning int
OUCH: contrary to the documentation Intrinsics available on all architectures on MSDN, the ceil() function is not available as an intrinsic for the x86 alias I386 processor architecture!
Create the text file case35.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>
#pragma comment(linker, "/NODEFAULTLIB")
#pragma comment(linker, "/ENTRY:MainCRTStartup")
#pragma comment(linker, "/SUBSYSTEM:CONSOLE")
#if 0 // NOTE: MSC yields error C2124 for 1.0 / 0.0 and 0.0 / 0.0!
#define INFINITY (1.0 / 0.0)
#define INDEFINITE (0.0 / 0.0)
#else
#define INFINITY (1.0 / 5.0e-324)
#define INDEFINITE (0.0 * INFINITY)
#endif
int _fltused;
int MainCRTStartup(void)
{
volatile double indefinite = INDEFINITE, infinity = INFINITY, zero = 0.0;
int bitmask = 0;
if (indefinite == indefinite)
bitmask += 1 << 0;
if (indefinite < INDEFINITE)
bitmask += 1 << 1;
if (indefinite == INDEFINITE)
bitmask += 1 << 2;
if (indefinite > INDEFINITE)
bitmask += 1 << 3;
if (indefinite < INFINITY)
bitmask += 1 << 4;
if (indefinite == INFINITY)
bitmask += 1 << 5;
if (indefinite > INFINITY)
bitmask += 1 << 6;
if (indefinite < -INFINITY)
bitmask += 1 << 7;
if (indefinite == -INFINITY)
bitmask += 1 << 8;
if (indefinite > -INFINITY)
bitmask += 1 << 9;
if (indefinite < 0.0)
bitmask += 1 << 10;
if (indefinite == 0.0)
bitmask += 1 << 11;
if (indefinite > 0.0)
bitmask += 1 << 12;
if (indefinite < -0.0)
bitmask += 1 << 13;
if (indefinite == -0.0)
bitmask += 1 << 14;
if (indefinite > -0.0)
bitmask += 1 << 15;
if (indefinite < 1.0)
bitmask += 1 << 16;
if (indefinite == 1.0)
bitmask += 1 << 17;
if (indefinite > 1.0)
bitmask += 1 << 18;
if (infinity == INDEFINITE)
bitmask += 1 << 19;
if (infinity == INFINITY)
bitmask += 1 << 20;
if (infinity == -INFINITY)
bitmask += 1 << 21;
if (-infinity == -INFINITY)
bitmask += 1 << 22;
if (infinity > -INFINITY)
bitmask += 1 << 23;
if (infinity > 0.0)
bitmask += 1 << 24;
if (-infinity < -0.0)
bitmask += 1 << 25;
if (zero == INDEFINITE)
bitmask += 1 << 26;
if (zero == INFINITY)
bitmask += 1 << 27;
if (zero == -INFINITY)
bitmask += 1 << 28;
if (zero == 0.0)
bitmask += 1 << 29;
if (zero == -0.0)
bitmask += 1 << 30;
if (INDEFINITE == INDEFINITE)
bitmask += 1 << 31;
return bitmask;
}
Generate the assembly listing file case35.asm and the console program case35.exe from the source file case35.c created in step 1., using the Visual C 2010 compiler for the x64 alias AMD64 processor architecture:
CL.EXE /Bv /Fa /fp:fast /Gy /Tccase35.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for x64 Copyright (C) Microsoft Corporation. All rights reserved. Compiler Passes: C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\cl.exe: Version 16.00.40219.1 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\c1.dll: Version 16.00.40219.400 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\c1xx.dll: Version 16.00.40219.400 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\c2.dll: Version 16.00.40219.449 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\link.exe: Version 10.00.40219.386 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\mspdb100.dll: Version 10.00.40219.478 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\1033\clui.dll: Version 16.00.40219.1 case35.c case35.c(116) : warning C4127: conditional expression is constant Microsoft (R) Incremental Linker Version 10.00.40219.386 Copyright (C) Microsoft Corporation. All rights reserved. /out:case35.exe case35.obj
Display the assembly listing file case35.asm created in step 2.:
TYPE case35.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.449 include listing.inc INCLUDELIB LIBCMT INCLUDELIB OLDNAMES _DATA SEGMENT COMM _fltused:DWORD _DATA ENDS PUBLIC __mask@@NegDouble@ PUBLIC __real@3ff0000000000000 PUBLIC __real@8000000000000000 PUBLIC __real@0000000000000000 PUBLIC __real@fff0000000000000 PUBLIC __real@7ff0000000000000 PUBLIC __real@fff8000000000000 PUBLIC MainCRTStartup EXTRN _fltused:DWORD ; COMDAT pdata ; File c:\users\stefan\desktop\case35.c pdata SEGMENT $pdata$MainCRTStartup DD imagerel $LN35 DD imagerel $LN35+972 DD imagerel $unwind$MainCRTStartup pdata ENDS ; COMDAT xdata xdata SEGMENT $unwind$MainCRTStartup DD 010401H DD 04204H xdata ENDS ; COMDAT __mask@@NegDouble@ CONST SEGMENT __mask@@NegDouble@ DB 00H, 00H, 00H, 00H, 00H, 00H, 00H, 080H, 00H, 00H, 00H DB 00H, 00H, 00H, 00H, 080H CONST ENDS ; COMDAT __real@3ff0000000000000 CONST SEGMENT __real@3ff0000000000000 DQ 03ff0000000000000r ; 1 CONST ENDS ; COMDAT __real@8000000000000000 CONST SEGMENT __real@8000000000000000 DQ 08000000000000000r ; -0 CONST ENDS ; COMDAT __real@0000000000000000 CONST SEGMENT __real@0000000000000000 DQ 00000000000000000r ; 0 CONST ENDS ; COMDAT __real@fff0000000000000 CONST SEGMENT __real@fff0000000000000 DQ 0fff0000000000000r ; -1.#INF CONST ENDS ; COMDAT __real@7ff0000000000000 CONST SEGMENT __real@7ff0000000000000 DQ 07ff0000000000000r ; 1.#INF CONST ENDS ; COMDAT __real@fff8000000000000 CONST SEGMENT __real@fff8000000000000 DQ 0fff8000000000000r ; -1.#IND ; Function compile flags: /Odtp CONST ENDS ; COMDAT MainCRTStartup _TEXT SEGMENT infinity$ = 0 bitmask$ = 8 indefinite$ = 16 zero$ = 24 MainCRTStartup PROC ; COMDAT ; Line 18 $LN35: sub rsp, 40 ; 00000028H ; Line 19 movsdx xmm0, QWORD PTR __real@fff8000000000000 movsdx QWORD PTR indefinite$[rsp], xmm0 movsdx xmm0, QWORD PTR __real@7ff0000000000000 movsdx QWORD PTR infinity$[rsp], xmm0 xorpd xmm0, xmm0 movsdx QWORD PTR zero$[rsp], xmm0 ; Line 21 mov DWORD PTR bitmask$[rsp], 0 ; Line 
23 movsdx xmm0, QWORD PTR indefinite$[rsp] movsdx xmm1, QWORD PTR indefinite$[rsp] ucomisd xmm0, xmm1 jne SHORT $LN32@MainCRTSta jpe SHORT $LN32@MainCRTSta ; Line 24 mov eax, DWORD PTR bitmask$[rsp] inc eax mov DWORD PTR bitmask$[rsp], eax $LN32@MainCRTSta: ; Line 26 movsdx xmm0, QWORD PTR indefinite$[rsp] comisd xmm0, QWORD PTR __real@fff8000000000000 jae SHORT $LN31@MainCRTSta jpe SHORT $LN31@MainCRTSta ; Line 27 mov eax, DWORD PTR bitmask$[rsp] add eax, 2 mov DWORD PTR bitmask$[rsp], eax $LN31@MainCRTSta: ; Line 29 movsdx xmm0, QWORD PTR indefinite$[rsp] ucomisd xmm0, QWORD PTR __real@fff8000000000000 jne SHORT $LN30@MainCRTSta jpe SHORT $LN30@MainCRTSta ; Line 30 mov eax, DWORD PTR bitmask$[rsp] add eax, 4 mov DWORD PTR bitmask$[rsp], eax $LN30@MainCRTSta: ; Line 32 movsdx xmm0, QWORD PTR indefinite$[rsp] comisd xmm0, QWORD PTR __real@fff8000000000000 jbe SHORT $LN29@MainCRTSta ; Line 33 mov eax, DWORD PTR bitmask$[rsp] add eax, 8 mov DWORD PTR bitmask$[rsp], eax $LN29@MainCRTSta: ; Line 35 movsdx xmm0, QWORD PTR indefinite$[rsp] comisd xmm0, QWORD PTR __real@7ff0000000000000 jae SHORT $LN28@MainCRTSta jpe SHORT $LN28@MainCRTSta ; Line 36 mov eax, DWORD PTR bitmask$[rsp] add eax, 16 mov DWORD PTR bitmask$[rsp], eax $LN28@MainCRTSta: ; Line 38 movsdx xmm0, QWORD PTR indefinite$[rsp] ucomisd xmm0, QWORD PTR __real@7ff0000000000000 jne SHORT $LN27@MainCRTSta jpe SHORT $LN27@MainCRTSta ; Line 39 mov eax, DWORD PTR bitmask$[rsp] add eax, 32 ; 00000020H mov DWORD PTR bitmask$[rsp], eax $LN27@MainCRTSta: ; Line 41 movsdx xmm0, QWORD PTR indefinite$[rsp] comisd xmm0, QWORD PTR __real@7ff0000000000000 jbe SHORT $LN26@MainCRTSta ; Line 42 mov eax, DWORD PTR bitmask$[rsp] add eax, 64 ; 00000040H mov DWORD PTR bitmask$[rsp], eax $LN26@MainCRTSta: ; Line 44 movsdx xmm0, QWORD PTR indefinite$[rsp] comisd xmm0, QWORD PTR __real@fff0000000000000 jae SHORT $LN25@MainCRTSta jpe SHORT $LN25@MainCRTSta ; Line 45 mov eax, DWORD PTR bitmask$[rsp] add eax, 128 ; 00000080H mov DWORD 
PTR bitmask$[rsp], eax $LN25@MainCRTSta: ; Line 47 movsdx xmm0, QWORD PTR indefinite$[rsp] ucomisd xmm0, QWORD PTR __real@fff0000000000000 jne SHORT $LN24@MainCRTSta jpe SHORT $LN24@MainCRTSta ; Line 48 mov eax, DWORD PTR bitmask$[rsp] add eax, 256 ; 00000100H mov DWORD PTR bitmask$[rsp], eax $LN24@MainCRTSta: ; Line 50 movsdx xmm0, QWORD PTR indefinite$[rsp] comisd xmm0, QWORD PTR __real@fff0000000000000 jbe SHORT $LN23@MainCRTSta ; Line 51 mov eax, DWORD PTR bitmask$[rsp] add eax, 512 ; 00000200H mov DWORD PTR bitmask$[rsp], eax $LN23@MainCRTSta: ; Line 53 movsdx xmm0, QWORD PTR indefinite$[rsp] comisd xmm0, QWORD PTR __real@0000000000000000 jae SHORT $LN22@MainCRTSta jpe SHORT $LN22@MainCRTSta ; Line 54 mov eax, DWORD PTR bitmask$[rsp] add eax, 1024 ; 00000400H mov DWORD PTR bitmask$[rsp], eax $LN22@MainCRTSta: ; Line 56 movsdx xmm0, QWORD PTR indefinite$[rsp] ucomisd xmm0, QWORD PTR __real@0000000000000000 jne SHORT $LN21@MainCRTSta jpe SHORT $LN21@MainCRTSta ; Line 57 mov eax, DWORD PTR bitmask$[rsp] add eax, 2048 ; 00000800H mov DWORD PTR bitmask$[rsp], eax $LN21@MainCRTSta: ; Line 59 movsdx xmm0, QWORD PTR indefinite$[rsp] comisd xmm0, QWORD PTR __real@0000000000000000 jbe SHORT $LN20@MainCRTSta ; Line 60 mov eax, DWORD PTR bitmask$[rsp] add eax, 4096 ; 00001000H mov DWORD PTR bitmask$[rsp], eax $LN20@MainCRTSta: ; Line 62 movsdx xmm0, QWORD PTR indefinite$[rsp] comisd xmm0, QWORD PTR __real@8000000000000000 jae SHORT $LN19@MainCRTSta jpe SHORT $LN19@MainCRTSta ; Line 63 mov eax, DWORD PTR bitmask$[rsp] add eax, 8192 ; 00002000H mov DWORD PTR bitmask$[rsp], eax $LN19@MainCRTSta: ; Line 65 movsdx xmm0, QWORD PTR indefinite$[rsp] ucomisd xmm0, QWORD PTR __real@8000000000000000 jne SHORT $LN18@MainCRTSta jpe SHORT $LN18@MainCRTSta ; Line 66 mov eax, DWORD PTR bitmask$[rsp] add eax, 16384 ; 00004000H mov DWORD PTR bitmask$[rsp], eax $LN18@MainCRTSta: ; Line 68 movsdx xmm0, QWORD PTR indefinite$[rsp] comisd xmm0, QWORD PTR __real@8000000000000000 jbe SHORT 
$LN17@MainCRTSta ; Line 69 mov eax, DWORD PTR bitmask$[rsp] add eax, 32768 ; 00008000H mov DWORD PTR bitmask$[rsp], eax $LN17@MainCRTSta: ; Line 71 movsdx xmm0, QWORD PTR indefinite$[rsp] comisd xmm0, QWORD PTR __real@3ff0000000000000 jae SHORT $LN16@MainCRTSta jpe SHORT $LN16@MainCRTSta ; Line 72 mov eax, DWORD PTR bitmask$[rsp] add eax, 65536 ; 00010000H mov DWORD PTR bitmask$[rsp], eax $LN16@MainCRTSta: ; Line 74 movsdx xmm0, QWORD PTR indefinite$[rsp] ucomisd xmm0, QWORD PTR __real@3ff0000000000000 jne SHORT $LN15@MainCRTSta jpe SHORT $LN15@MainCRTSta ; Line 75 mov eax, DWORD PTR bitmask$[rsp] add eax, 131072 ; 00020000H mov DWORD PTR bitmask$[rsp], eax $LN15@MainCRTSta: ; Line 77 movsdx xmm0, QWORD PTR indefinite$[rsp] comisd xmm0, QWORD PTR __real@3ff0000000000000 jbe SHORT $LN14@MainCRTSta ; Line 78 mov eax, DWORD PTR bitmask$[rsp] add eax, 262144 ; 00040000H mov DWORD PTR bitmask$[rsp], eax $LN14@MainCRTSta: ; Line 80 movsdx xmm0, QWORD PTR infinity$[rsp] ucomisd xmm0, QWORD PTR __real@fff8000000000000 jne SHORT $LN13@MainCRTSta jpe SHORT $LN13@MainCRTSta ; Line 81 mov eax, DWORD PTR bitmask$[rsp] add eax, 524288 ; 00080000H mov DWORD PTR bitmask$[rsp], eax $LN13@MainCRTSta: ; Line 83 movsdx xmm0, QWORD PTR infinity$[rsp] ucomisd xmm0, QWORD PTR __real@7ff0000000000000 jne SHORT $LN12@MainCRTSta jpe SHORT $LN12@MainCRTSta ; Line 84 mov eax, DWORD PTR bitmask$[rsp] add eax, 1048576 ; 00100000H mov DWORD PTR bitmask$[rsp], eax $LN12@MainCRTSta: ; Line 86 movsdx xmm0, QWORD PTR infinity$[rsp] ucomisd xmm0, QWORD PTR __real@fff0000000000000 jne SHORT $LN11@MainCRTSta jpe SHORT $LN11@MainCRTSta ; Line 87 mov eax, DWORD PTR bitmask$[rsp] add eax, 2097152 ; 00200000H mov DWORD PTR bitmask$[rsp], eax $LN11@MainCRTSta: ; Line 89 movsdx xmm0, QWORD PTR infinity$[rsp] xorpd xmm0, QWORD PTR __mask@@NegDouble@ ucomisd xmm0, QWORD PTR __real@fff0000000000000 jne SHORT $LN10@MainCRTSta jpe SHORT $LN10@MainCRTSta ; Line 90 mov eax, DWORD PTR bitmask$[rsp] add eax, 4194304 
; 00400000H mov DWORD PTR bitmask$[rsp], eax $LN10@MainCRTSta: ; Line 92 movsdx xmm0, QWORD PTR infinity$[rsp] comisd xmm0, QWORD PTR __real@fff0000000000000 jbe SHORT $LN9@MainCRTSta ; Line 93 mov eax, DWORD PTR bitmask$[rsp] add eax, 8388608 ; 00800000H mov DWORD PTR bitmask$[rsp], eax $LN9@MainCRTSta: ; Line 95 movsdx xmm0, QWORD PTR infinity$[rsp] comisd xmm0, QWORD PTR __real@0000000000000000 jbe SHORT $LN8@MainCRTSta ; Line 96 mov eax, DWORD PTR bitmask$[rsp] add eax, 16777216 ; 01000000H mov DWORD PTR bitmask$[rsp], eax $LN8@MainCRTSta: ; Line 98 movsdx xmm0, QWORD PTR infinity$[rsp] xorpd xmm0, QWORD PTR __mask@@NegDouble@ comisd xmm0, QWORD PTR __real@8000000000000000 jae SHORT $LN7@MainCRTSta jpe SHORT $LN7@MainCRTSta ; Line 99 mov eax, DWORD PTR bitmask$[rsp] add eax, 33554432 ; 02000000H mov DWORD PTR bitmask$[rsp], eax $LN7@MainCRTSta: ; Line 101 movsdx xmm0, QWORD PTR zero$[rsp] ucomisd xmm0, QWORD PTR __real@fff8000000000000 jne SHORT $LN6@MainCRTSta jpe SHORT $LN6@MainCRTSta ; Line 102 mov eax, DWORD PTR bitmask$[rsp] add eax, 67108864 ; 04000000H mov DWORD PTR bitmask$[rsp], eax $LN6@MainCRTSta: ; Line 104 movsdx xmm0, QWORD PTR zero$[rsp] ucomisd xmm0, QWORD PTR __real@7ff0000000000000 jne SHORT $LN5@MainCRTSta jpe SHORT $LN5@MainCRTSta ; Line 105 mov eax, DWORD PTR bitmask$[rsp] add eax, 134217728 ; 08000000H mov DWORD PTR bitmask$[rsp], eax $LN5@MainCRTSta: ; Line 107 movsdx xmm0, QWORD PTR zero$[rsp] ucomisd xmm0, QWORD PTR __real@fff0000000000000 jne SHORT $LN4@MainCRTSta jpe SHORT $LN4@MainCRTSta ; Line 108 mov eax, DWORD PTR bitmask$[rsp] add eax, 268435456 ; 10000000H mov DWORD PTR bitmask$[rsp], eax $LN4@MainCRTSta: ; Line 110 movsdx xmm0, QWORD PTR zero$[rsp] ucomisd xmm0, QWORD PTR __real@0000000000000000 jne SHORT $LN3@MainCRTSta jpe SHORT $LN3@MainCRTSta ; Line 111 mov eax, DWORD PTR bitmask$[rsp] add eax, 536870912 ; 20000000H mov DWORD PTR bitmask$[rsp], eax $LN3@MainCRTSta: ; Line 113 movsdx xmm0, QWORD PTR zero$[rsp] ucomisd xmm0, 
QWORD PTR __real@8000000000000000 jne SHORT $LN2@MainCRTSta jpe SHORT $LN2@MainCRTSta ; Line 114 mov eax, DWORD PTR bitmask$[rsp] add eax, 1073741824 ; 40000000H mov DWORD PTR bitmask$[rsp], eax $LN2@MainCRTSta: ; Line 116 xor eax, eax test eax, eax je SHORT $LN1@MainCRTSta ; Line 117 mov eax, DWORD PTR bitmask$[rsp] sub eax, -2147483648 ; ffffffff80000000H mov DWORD PTR bitmask$[rsp], eax $LN1@MainCRTSta: ; Line 119 mov eax, DWORD PTR bitmask$[rsp] ; Line 120 add rsp, 40 ; 00000028H ret 0 MainCRTStartup ENDP _TEXT ENDS END
The
COMISD and UCOMISD instructions generated by the compiler set all three flags CF alias carry, PF alias parity and ZF alias zero when at least one operand is indefinite; the conditional jump instructions JA alias JNBE, JAE alias JNB alias JNC, JB alias JC alias JNAE, JBE alias JNA, JE alias JZ, and JNE alias JNZ, which inspect the CF and ZF flags, but don’t inspect the PF flag, are therefore not sufficient there: depending on whether they test the inverted or straight condition they must be accompanied by an additional JP alias JPE or JNP alias JPO instruction!
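The effect of the missing parity check can be modelled in a few lines of C. The following is only a minimal sketch, not compiler output: the struct and all function names are made up for illustration, and the flag values follow the documented behaviour of COMISD and UCOMISD (unordered: CF = PF = ZF = 1; less: CF = 1; equal: ZF = 1; greater: all clear).

```c
#include <math.h>

/* Hypothetical model of the three flags written by COMISD/UCOMISD. */
struct flags { int cf, pf, zf; };

static struct flags ucomisd_flags(double a, double b)
{
    struct flags f;
    if (isnan(a) || isnan(b)) {
        f.cf = f.pf = f.zf = 1;     /* unordered: all three flags set */
    } else {
        f.cf = a < b;               /* below */
        f.pf = 0;                   /* ordered */
        f.zf = a == b;              /* equal */
    }
    return f;
}

/* a == b decided from ZF alone, i.e. JNE without an accompanying JP */
static int equal_without_parity(struct flags f) { return f.zf; }

/* a == b decided from ZF and PF, i.e. JNE plus JP */
static int equal_with_parity(struct flags f) { return f.zf && !f.pf; }

/* a < b decided from CF alone, i.e. JAE without an accompanying JP */
static int below_without_parity(struct flags f) { return f.cf; }

/* a < b decided from CF and PF, i.e. JAE plus JP */
static int below_with_parity(struct flags f) { return f.cf && !f.pf; }

/* a > b decided with JBE: CF and ZF must both be clear, which already
   excludes the unordered case, so no parity check is needed here */
static int above(struct flags f) { return !f.cf && !f.zf; }
```

With an indefinite operand, `equal_without_parity()` and `below_without_parity()` report true, while the variants that also look at PF report false, as IEEE 754 requires; `above()` is correct without the extra check.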
Run the 64-bit console program case35.exe created in step 2. and display its exit code:
.\case35.exe
ECHO %ERRORLEVEL%
1742433719
1742433719 is equal to 0x67DB6DB7; the expected and correct exit code is however 0x63D00000 (1674575872): the IEEE 754 Standard for Floating-Point Arithmetic defines an indefinite value, which results for example from the division ±0.0÷±0.0, from the multiplication ±infinity×0.0, or from the subtraction infinity−infinity, and which is unequal to any value, including itself!
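The two exit codes can be compared bit by bit; since every comparison in the test program adds its own power of two to the bitmask, each differing bit stands for one comparison that was evaluated differently. The following sketch uses hypothetical helper names and only the two values quoted above, plus a one-line check of the IEEE 754 rule that an indefinite value is unequal to itself.

```c
#include <stdint.h>

/* Bits that differ between the observed and the expected exit code. */
static uint32_t differing_bits(uint32_t observed, uint32_t expected)
{
    return observed ^ expected;
}

/* Number of set bits, clearing the lowest set bit per pass. */
static unsigned count_bits(uint32_t value)
{
    unsigned count = 0;
    while (value != 0) {
        value &= value - 1;
        count++;
    }
    return count;
}

/* IEEE 754: only an indefinite value (NaN) compares unequal to itself. */
static int is_unequal_to_itself(double value)
{
    return value != value;
}
```

For the values above, `differing_bits(0x67DB6DB7, 0x63D00000)` yields 0x040B6DB7 with 15 bits set; all of them are set in the observed exit code only, so no expected bit is missing.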
Note: the final comparison in line 116, where the compiler does not generate a UCOMISD instruction, but evaluates the constant expression, produces the correct result.
Generate another assembly listing file case35.asm plus console program case35.exe from the source file case35.c created in step 1., now using the Visual C 2010 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /Fa /fp:fast /Gy /Tccase35.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.

Compiler Passes:
 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\cl.exe: Version 16.00.40219.1
 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1.dll: Version 16.00.40219.400
 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1xx.dll: Version 16.00.40219.400
 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c2.dll: Version 16.00.40219.449
 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\link.exe: Version 10.00.40219.386
 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\mspdb100.dll: Version 10.00.40219.478
 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\1033\clui.dll: Version 16.00.40219.1

case35.c
case35.c(116) : warning C4127: conditional expression is constant
Microsoft (R) Incremental Linker Version 10.00.40219.386
Copyright (C) Microsoft Corporation. All rights reserved.

/out:case35.exe
case35.obj
Display the assembly listing file case35.asm created in step 4.:
TYPE case35.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.449

    TITLE C:\Users\Stefan\Desktop\case35.c
    .686P
    .XMM
    include listing.inc
    .model flat

INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES

_DATA SEGMENT
COMM __fltused:DWORD
_DATA ENDS
PUBLIC __real@3ff0000000000000
PUBLIC __real@8000000000000000
PUBLIC __real@fff0000000000000
PUBLIC __real@0000000000000000
PUBLIC __real@7ff0000000000000
PUBLIC __real@fff8000000000000
PUBLIC _MainCRTStartup
EXTRN __fltused:DWORD
; COMDAT __real@3ff0000000000000
; File c:\users\stefan\desktop\case35.c
CONST SEGMENT
__real@3ff0000000000000 DQ 03ff0000000000000r ; 1
CONST ENDS
; COMDAT __real@8000000000000000
CONST SEGMENT
__real@8000000000000000 DQ 08000000000000000r ; -0
CONST ENDS
; COMDAT __real@fff0000000000000
CONST SEGMENT
__real@fff0000000000000 DQ 0fff0000000000000r ; -1.#INF
CONST ENDS
; COMDAT __real@0000000000000000
CONST SEGMENT
__real@0000000000000000 DQ 00000000000000000r ; 0
CONST ENDS
; COMDAT __real@7ff0000000000000
CONST SEGMENT
__real@7ff0000000000000 DQ 07ff0000000000000r ; 1.#INF
CONST ENDS
; COMDAT __real@fff8000000000000
CONST SEGMENT
__real@fff8000000000000 DQ 0fff8000000000000r ; -1.#IND
; Function compile flags: /Odtp
CONST ENDS
; COMDAT _MainCRTStartup
_TEXT SEGMENT
_zero$ = -32 ; size = 8
_indefinite$ = -24 ; size = 8
_bitmask$ = -12 ; size = 4
_infinity$ = -8 ; size = 8
_MainCRTStartup PROC ; COMDAT
; Line 18
    push ebp
    mov ebp, esp
    sub esp, 32 ; 00000020H
; Line 19
    fld QWORD PTR __real@fff8000000000000
    fstp QWORD PTR _indefinite$[ebp]
    fld QWORD PTR __real@7ff0000000000000
    fstp QWORD PTR _infinity$[ebp]
    fldz
    fstp QWORD PTR _zero$[ebp]
; Line 21
    mov DWORD PTR _bitmask$[ebp], 0
; Line 23
    fld QWORD PTR _indefinite$[ebp]
    fld QWORD PTR _indefinite$[ebp]
    fucompp
    fnstsw ax
    test ah, 68 ; 00000044H
    jp SHORT $LN32@MainCRTSta
; Line 24
    mov eax, DWORD PTR _bitmask$[ebp]
    add eax, 1
    mov DWORD PTR _bitmask$[ebp], eax
$LN32@MainCRTSta:
; Line 26
    fld QWORD PTR _indefinite$[ebp]
    fcomp QWORD PTR __real@fff8000000000000
    fnstsw ax
    test ah, 5
    jp SHORT $LN31@MainCRTSta
; Line 27
    mov ecx, DWORD PTR _bitmask$[ebp]
    add ecx, 2
    mov DWORD PTR _bitmask$[ebp], ecx
$LN31@MainCRTSta:
; Line 29
    fld QWORD PTR _indefinite$[ebp]
    fld QWORD PTR __real@fff8000000000000
    fucompp
    fnstsw ax
    test ah, 68 ; 00000044H
    jp SHORT $LN30@MainCRTSta
; Line 30
    mov edx, DWORD PTR _bitmask$[ebp]
    add edx, 4
    mov DWORD PTR _bitmask$[ebp], edx
$LN30@MainCRTSta:
; Line 32
    fld QWORD PTR _indefinite$[ebp]
    fcomp QWORD PTR __real@fff8000000000000
    fnstsw ax
    test ah, 65 ; 00000041H
    jne SHORT $LN29@MainCRTSta
; Line 33
    mov eax, DWORD PTR _bitmask$[ebp]
    add eax, 8
    mov DWORD PTR _bitmask$[ebp], eax
$LN29@MainCRTSta:
; Line 35
    fld QWORD PTR _indefinite$[ebp]
    fcomp QWORD PTR __real@7ff0000000000000
    fnstsw ax
    test ah, 5
    jp SHORT $LN28@MainCRTSta
; Line 36
    mov ecx, DWORD PTR _bitmask$[ebp]
    add ecx, 16 ; 00000010H
    mov DWORD PTR _bitmask$[ebp], ecx
$LN28@MainCRTSta:
; Line 38
    fld QWORD PTR _indefinite$[ebp]
    fld QWORD PTR __real@7ff0000000000000
    fucompp
    fnstsw ax
    test ah, 68 ; 00000044H
    jp SHORT $LN27@MainCRTSta
; Line 39
    mov edx, DWORD PTR _bitmask$[ebp]
    add edx, 32 ; 00000020H
    mov DWORD PTR _bitmask$[ebp], edx
$LN27@MainCRTSta:
; Line 41
    fld QWORD PTR _indefinite$[ebp]
    fcomp QWORD PTR __real@7ff0000000000000
    fnstsw ax
    test ah, 65 ; 00000041H
    jne SHORT $LN26@MainCRTSta
; Line 42
    mov eax, DWORD PTR _bitmask$[ebp]
    add eax, 64 ; 00000040H
    mov DWORD PTR _bitmask$[ebp], eax
$LN26@MainCRTSta:
; Line 44
    fld QWORD PTR _indefinite$[ebp]
    fcomp QWORD PTR __real@fff0000000000000
    fnstsw ax
    test ah, 5
    jp SHORT $LN25@MainCRTSta
; Line 45
    mov ecx, DWORD PTR _bitmask$[ebp]
    add ecx, 128 ; 00000080H
    mov DWORD PTR _bitmask$[ebp], ecx
$LN25@MainCRTSta:
; Line 47
    fld QWORD PTR _indefinite$[ebp]
    fld QWORD PTR __real@fff0000000000000
    fucompp
    fnstsw ax
    test ah, 68 ; 00000044H
    jp SHORT $LN24@MainCRTSta
; Line 48
    mov edx, DWORD PTR _bitmask$[ebp]
    add edx, 256 ; 00000100H
    mov DWORD PTR _bitmask$[ebp], edx
$LN24@MainCRTSta:
; Line 50
    fld QWORD PTR _indefinite$[ebp]
    fcomp QWORD PTR __real@fff0000000000000
    fnstsw ax
    test ah, 65 ; 00000041H
    jne SHORT $LN23@MainCRTSta
; Line 51
    mov eax, DWORD PTR _bitmask$[ebp]
    add eax, 512 ; 00000200H
    mov DWORD PTR _bitmask$[ebp], eax
$LN23@MainCRTSta:
; Line 53
    fld QWORD PTR _indefinite$[ebp]
    fcomp QWORD PTR __real@0000000000000000
    fnstsw ax
    test ah, 5
    jp SHORT $LN22@MainCRTSta
; Line 54
    mov ecx, DWORD PTR _bitmask$[ebp]
    add ecx, 1024 ; 00000400H
    mov DWORD PTR _bitmask$[ebp], ecx
$LN22@MainCRTSta:
; Line 56
    fld QWORD PTR _indefinite$[ebp]
    fldz
    fucompp
    fnstsw ax
    test ah, 68 ; 00000044H
    jp SHORT $LN21@MainCRTSta
; Line 57
    mov edx, DWORD PTR _bitmask$[ebp]
    add edx, 2048 ; 00000800H
    mov DWORD PTR _bitmask$[ebp], edx
$LN21@MainCRTSta:
; Line 59
    fld QWORD PTR _indefinite$[ebp]
    fcomp QWORD PTR __real@0000000000000000
    fnstsw ax
    test ah, 65 ; 00000041H
    jne SHORT $LN20@MainCRTSta
; Line 60
    mov eax, DWORD PTR _bitmask$[ebp]
    add eax, 4096 ; 00001000H
    mov DWORD PTR _bitmask$[ebp], eax
$LN20@MainCRTSta:
; Line 62
    fld QWORD PTR _indefinite$[ebp]
    fcomp QWORD PTR __real@8000000000000000
    fnstsw ax
    test ah, 5
    jp SHORT $LN19@MainCRTSta
; Line 63
    mov ecx, DWORD PTR _bitmask$[ebp]
    add ecx, 8192 ; 00002000H
    mov DWORD PTR _bitmask$[ebp], ecx
$LN19@MainCRTSta:
; Line 65
    fld QWORD PTR _indefinite$[ebp]
    fldz
    fucompp
    fnstsw ax
    test ah, 68 ; 00000044H
    jp SHORT $LN18@MainCRTSta
; Line 66
    mov edx, DWORD PTR _bitmask$[ebp]
    add edx, 16384 ; 00004000H
    mov DWORD PTR _bitmask$[ebp], edx
$LN18@MainCRTSta:
; Line 68
    fld QWORD PTR _indefinite$[ebp]
    fcomp QWORD PTR __real@8000000000000000
    fnstsw ax
    test ah, 65 ; 00000041H
    jne SHORT $LN17@MainCRTSta
; Line 69
    mov eax, DWORD PTR _bitmask$[ebp]
    add eax, 32768 ; 00008000H
    mov DWORD PTR _bitmask$[ebp], eax
$LN17@MainCRTSta:
; Line 71
    fld QWORD PTR _indefinite$[ebp]
    fcomp QWORD PTR __real@3ff0000000000000
    fnstsw ax
    test ah, 5
    jp SHORT $LN16@MainCRTSta
; Line 72
    mov ecx, DWORD PTR _bitmask$[ebp]
    add ecx, 65536 ; 00010000H
    mov DWORD PTR _bitmask$[ebp], ecx
$LN16@MainCRTSta:
; Line 74
    fld QWORD PTR _indefinite$[ebp]
    fld1
    fucompp
    fnstsw ax
    test ah, 68 ; 00000044H
    jp SHORT $LN15@MainCRTSta
; Line 75
    mov edx, DWORD PTR _bitmask$[ebp]
    add edx, 131072 ; 00020000H
    mov DWORD PTR _bitmask$[ebp], edx
$LN15@MainCRTSta:
; Line 77
    fld QWORD PTR _indefinite$[ebp]
    fcomp QWORD PTR __real@3ff0000000000000
    fnstsw ax
    test ah, 65 ; 00000041H
    jne SHORT $LN14@MainCRTSta
; Line 78
    mov eax, DWORD PTR _bitmask$[ebp]
    add eax, 262144 ; 00040000H
    mov DWORD PTR _bitmask$[ebp], eax
$LN14@MainCRTSta:
; Line 80
    fld QWORD PTR _infinity$[ebp]
    fld QWORD PTR __real@fff8000000000000
    fucompp
    fnstsw ax
    test ah, 68 ; 00000044H
    jp SHORT $LN13@MainCRTSta
; Line 81
    mov ecx, DWORD PTR _bitmask$[ebp]
    add ecx, 524288 ; 00080000H
    mov DWORD PTR _bitmask$[ebp], ecx
$LN13@MainCRTSta:
; Line 83
    fld QWORD PTR _infinity$[ebp]
    fld QWORD PTR __real@7ff0000000000000
    fucompp
    fnstsw ax
    test ah, 68 ; 00000044H
    jp SHORT $LN12@MainCRTSta
; Line 84
    mov edx, DWORD PTR _bitmask$[ebp]
    add edx, 1048576 ; 00100000H
    mov DWORD PTR _bitmask$[ebp], edx
$LN12@MainCRTSta:
; Line 86
    fld QWORD PTR _infinity$[ebp]
    fld QWORD PTR __real@fff0000000000000
    fucompp
    fnstsw ax
    test ah, 68 ; 00000044H
    jp SHORT $LN11@MainCRTSta
; Line 87
    mov eax, DWORD PTR _bitmask$[ebp]
    add eax, 2097152 ; 00200000H
    mov DWORD PTR _bitmask$[ebp], eax
$LN11@MainCRTSta:
; Line 89
    fld QWORD PTR _infinity$[ebp]
    fchs
    fld QWORD PTR __real@fff0000000000000
    fucompp
    fnstsw ax
    test ah, 68 ; 00000044H
    jp SHORT $LN10@MainCRTSta
; Line 90
    mov ecx, DWORD PTR _bitmask$[ebp]
    add ecx, 4194304 ; 00400000H
    mov DWORD PTR _bitmask$[ebp], ecx
$LN10@MainCRTSta:
; Line 92
    fld QWORD PTR _infinity$[ebp]
    fcomp QWORD PTR __real@fff0000000000000
    fnstsw ax
    test ah, 65 ; 00000041H
    jne SHORT $LN9@MainCRTSta
; Line 93
    mov edx, DWORD PTR _bitmask$[ebp]
    add edx, 8388608 ; 00800000H
    mov DWORD PTR _bitmask$[ebp], edx
$LN9@MainCRTSta:
; Line 95
    fld QWORD PTR _infinity$[ebp]
    fcomp QWORD PTR __real@0000000000000000
    fnstsw ax
    test ah, 65 ; 00000041H
    jne SHORT $LN8@MainCRTSta
; Line 96
    mov eax, DWORD PTR _bitmask$[ebp]
    add eax, 16777216 ; 01000000H
    mov DWORD PTR _bitmask$[ebp], eax
$LN8@MainCRTSta:
; Line 98
    fld QWORD PTR _infinity$[ebp]
    fchs
    fcomp QWORD PTR __real@8000000000000000
    fnstsw ax
    test ah, 5
    jp SHORT $LN7@MainCRTSta
; Line 99
    mov ecx, DWORD PTR _bitmask$[ebp]
    add ecx, 33554432 ; 02000000H
    mov DWORD PTR _bitmask$[ebp], ecx
$LN7@MainCRTSta:
; Line 101
    fld QWORD PTR _zero$[ebp]
    fld QWORD PTR __real@fff8000000000000
    fucompp
    fnstsw ax
    test ah, 68 ; 00000044H
    jp SHORT $LN6@MainCRTSta
; Line 102
    mov edx, DWORD PTR _bitmask$[ebp]
    add edx, 67108864 ; 04000000H
    mov DWORD PTR _bitmask$[ebp], edx
$LN6@MainCRTSta:
; Line 104
    fld QWORD PTR _zero$[ebp]
    fld QWORD PTR __real@7ff0000000000000
    fucompp
    fnstsw ax
    test ah, 68 ; 00000044H
    jp SHORT $LN5@MainCRTSta
; Line 105
    mov eax, DWORD PTR _bitmask$[ebp]
    add eax, 134217728 ; 08000000H
    mov DWORD PTR _bitmask$[ebp], eax
$LN5@MainCRTSta:
; Line 107
    fld QWORD PTR _zero$[ebp]
    fld QWORD PTR __real@fff0000000000000
    fucompp
    fnstsw ax
    test ah, 68 ; 00000044H
    jp SHORT $LN4@MainCRTSta
; Line 108
    mov ecx, DWORD PTR _bitmask$[ebp]
    add ecx, 268435456 ; 10000000H
    mov DWORD PTR _bitmask$[ebp], ecx
$LN4@MainCRTSta:
; Line 110
    fld QWORD PTR _zero$[ebp]
    fldz
    fucompp
    fnstsw ax
    test ah, 68 ; 00000044H
    jp SHORT $LN3@MainCRTSta
; Line 111
    mov edx, DWORD PTR _bitmask$[ebp]
    add edx, 536870912 ; 20000000H
    mov DWORD PTR _bitmask$[ebp], edx
$LN3@MainCRTSta:
; Line 113
    fld QWORD PTR _zero$[ebp]
    fldz
    fucompp
    fnstsw ax
    test ah, 68 ; 00000044H
    jp SHORT $LN2@MainCRTSta
; Line 114
    mov eax, DWORD PTR _bitmask$[ebp]
    add eax, 1073741824 ; 40000000H
    mov DWORD PTR _bitmask$[ebp], eax
$LN2@MainCRTSta:
; Line 116
    xor ecx, ecx
    je SHORT $LN1@MainCRTSta
; Line 117
    mov edx, DWORD PTR _bitmask$[ebp]
    sub edx, -2147483648 ; 80000000H
    mov DWORD PTR _bitmask$[ebp], edx
$LN1@MainCRTSta:
; Line 119
    mov eax, DWORD PTR _bitmask$[ebp]
; Line 120
    mov esp, ebp
    pop ebp
    ret 0
_MainCRTStartup ENDP
_TEXT ENDS
END
Run the 32-bit console program case35.exe created in step 4. and display its exit code:
.\case35.exe
ECHO %ERRORLEVEL%
1674575872
The 32-bit program built with the same compiler options yields the correct result!
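Why the x87 sequences in the 32-bit listing handle the indefinite value correctly can be traced with a small model. This is only a sketch with made-up names: it assumes the documented condition codes of FCOM and FUCOM as they appear in register AH after FNSTSW AX (C0 = 0x01, C2 = 0x04, C3 = 0x40; unordered sets all three), and that JP is taken on an even number of set bits in the result of TEST.

```c
#include <math.h>

#define C0 0x01u    /* set when ST(0) < operand, or unordered */
#define C2 0x04u    /* set when unordered */
#define C3 0x40u    /* set when ST(0) == operand, or unordered */

/* Hypothetical model of AH after FCOM/FUCOM followed by FNSTSW AX. */
static unsigned x87_compare_ah(double st0, double operand)
{
    if (isnan(st0) || isnan(operand))
        return C3 | C2 | C0;
    if (st0 < operand)
        return C0;
    if (st0 == operand)
        return C3;
    return 0u;
}

/* PF as set by TEST: 1 when the low byte has an even number of set bits. */
static int even_parity(unsigned byte)
{
    int bits = 0;
    for (byte &= 0xFFu; byte != 0; byte >>= 1)
        bits += (int) (byte & 1u);
    return bits % 2 == 0;
}

/* "test ah, 68" plus JP: the body runs only when C3 alone is set */
static int body_runs_equal(double a, double b)
{
    return !even_parity(x87_compare_ah(a, b) & (C3 | C2));
}

/* "test ah, 5" plus JP: the body runs only when C0 alone is set */
static int body_runs_less(double a, double b)
{
    return !even_parity(x87_compare_ah(a, b) & (C2 | C0));
}

/* "test ah, 65" plus JNE: the body runs only when C3 and C0 are clear */
static int body_runs_greater(double a, double b)
{
    return (x87_compare_ah(a, b) & (C3 | C0)) == 0;
}
```

In this model the unordered case always leaves two bits in the masked value for the two JP variants and a non-zero value for the JNE variant, so none of the three bodies is executed for an indefinite operand.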
memcmp() intrinsic by the Visual C 2010 compiler.
Create the text file case36.c with the following content in an arbitrary, preferably empty directory:
// Copyleft © 2018-2024, Stefan Kanthak <stefan.kanthak@nexgo.de>

#include <string.h>

int main(int argc, char *argv[])
{
    return memcmp(argv[0], "case36.exe", sizeof("case36.exe"));
}
Generate the assembly listing file case36.asm from the source file case36.c created in step 1., using the Visual C 2010 compiler for the x86 alias I386 processor architecture:
CL.EXE /Bv /c /Fa /FoNUL: /Gy /O1isy /Tccase36.c /W4 /Zl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.

Compiler Passes:
 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\cl.exe: Version 16.00.40219.1
 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1.dll: Version 16.00.40219.400
 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c1xx.dll: Version 16.00.40219.400
 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\c2.dll: Version 16.00.40219.449
 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\link.exe: Version 10.00.40219.386
 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\mspdb100.dll: Version 10.00.40219.478
 C:\Program Files\Microsoft Visual Studio 10.0\VC\Bin\1033\clui.dll: Version 16.00.40219.1

case36.c
Display the assembly listing file case36.asm created in step 2.:
TYPE case36.asm
; Listing generated by Microsoft (R) Optimizing Compiler Version 16.00.40219.449

    TITLE C:\Users\Stefan\Desktop\case36.c
    .686P
    .XMM
    include listing.inc
    .model flat

INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES

PUBLIC ??_C@_0L@EJIHOLIK@case36?4exe?$AA@ ; `string'
PUBLIC _main
; COMDAT ??_C@_0L@EJIHOLIK@case36?4exe?$AA@
; File c:\users\stefan\desktop\case36.c
CONST SEGMENT
??_C@_0L@EJIHOLIK@case36?4exe?$AA@ DB 'case36.exe', 00H ; `string'
; Function compile flags: /Odspy
CONST ENDS
; COMDAT _main
_TEXT SEGMENT
tv68 = -8 ; size = 4
tv72 = -4 ; size = 4
_argc$ = 8 ; size = 4
_argv$ = 12 ; size = 4
_main PROC ; COMDAT
; Line 4
    push ebp
    mov ebp, esp
    push ecx
    push ecx
    push esi
    push edi
; Line 5
    push 11 ; 0000000bH
    pop ecx
    mov edi, OFFSET ??_C@_0L@EJIHOLIK@case36?4exe?$AA@
    mov esi, OFFSET ??_C@_0L@EJIHOLIK@case36?4exe?$AA@
    mov eax, [esp+8]
    mov eax, DWORD PTR _argv$[ebp]
    mov esi, DWORD PTR [eax]
    mov edi, DWORD PTR [eax]
    xor eax, eax
    mov DWORD PTR tv72[ebp], eax
    repe cmpsb
    seta al
    sbb eax, 0
    je SHORT $LN3@main
    sbb eax, eax
    sbb eax, -1
    mov DWORD PTR tv72[ebp], eax
$LN3@main:
    mov eax, DWORD PTR tv72[ebp]
    mov DWORD PTR tv68[ebp], eax
    mov eax, DWORD PTR tv68[ebp]
; Line 6
    pop edi
    pop esi
    leave
    ret 0
_main ENDP
_TEXT ENDS
END
OUCH: 5 (in words: five) superfluous MOV instructions, performing as many superfluous memory accesses on 2 superfluous temporary variables allocated via as many superfluous PUSH instructions, wasting 17 bytes in total, despite the /Os compiler option given!
OOPS: despite the /Oy compiler option given, a frame pointer is set up in register EBP, using 3 superfluous instructions in 4 bytes!
OOPS: with destination and source for the CMPSB instruction swapped, the following conditional branch JE plus the 2 SBB instructions can be replaced by a SETA and a SBB instruction, saving 1 instruction and 1 byte.
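The branch-free idiom behind SETA followed by SBB, which yields 1 for "above", subtracts 1 for "below" (carry) and leaves 0 for "equal", corresponds to the C expression `(a > b) - (a < b)`. The following is a minimal sketch of a byte-wise comparison built on it; the function name `compare_bytes` is hypothetical, and the code is neither the compiler's intrinsic nor the runtime's memcmp().

```c
#include <stddef.h>

/* Three-way byte-wise comparison returning -1, 0 or 1. */
static int compare_bytes(const void *left, const void *right, size_t count)
{
    const unsigned char *p = left;
    const unsigned char *q = right;
    unsigned char a = 0;
    unsigned char b = 0;

    /* like REPE CMPSB: stop at the first differing pair of bytes
       or when the count is exhausted */
    while (count-- != 0) {
        a = *p++;
        b = *q++;
        if (a != b)
            break;
    }

    /* like SETA plus SBB: 1 if above, minus 1 if below, no branch;
       a == b holds here when all bytes matched or count was 0 */
    return (a > b) - (a < b);
}
```

For example, `compare_bytes("case36.exe", "case36.exe", sizeof("case36.exe"))` returns 0, and a count of 0 also returns 0 because `a` and `b` keep their equal initial values.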
Note: when properly optimising for size, the compiler should generate only 14 instructions in just 29 bytes instead of 25 instructions in 50 bytes!