EuroAssembler Index Manual Download Source Macros


Sitemap Links Forum Tests Projects

fastcall.htm
Enumeration
%FastCall
Macros
ClearLocalVar
EndProcedure
Invoke
LocalVar
Procedure
SaveToShadow
Uses

This file can be included to 64bit Windows programs written in EuroAssembler. The library contains macroinstructions Procedure, EndProcedure, Invoke which extend generic (pseudo)instructions PROC, ENDPROC, CALL.

Macroinstructions Procedure and EndProcedure implement the prologue and epilogue of Microsoft x64 calling convention [MSx64conv] , where the arguments are pushed backwards and they are removed by the caller.
The first 4 arguments are provided in registers RCX, RDX, R8, R9 (or in XMM0, XMM1, XMM2, XMM3 when they are floating-point numbers).

The block of code defined between macros Procedure .. EndProcedure can be called by macro Invoke or as a callback procedure from Windows library function, for instance see WndProc in sample projects.

Macro Invoke can be also used to call functions from third-party static or dynamically linked libraries. Nevertheless, for invocation of Windows functions it's better to use specialized macros WinABI (64bit) or WinAPI (32bit).

Implementation of FastCall convention in €ASM allows to use formal %names for accessing Procedure parameters and local stack-memory variables.

Number of arguments provided in macro Invoke must exactly match the number of arguments declared in macro Procedure or in the documentation of invoked function.

Macro Invoke takes care of stack alignment to OWORD just before execution of instruction CALL MyProc (step 5. in the following example). RSP at the Invoke entry might already have been OWORD aligned, or it may be only QWORD aligned (=unaligned). That is why RSP will be pushed once or twice in the prologue. Instructions of step 2. (PUSH RSP and ADD [RSP],8) are emitted always but they will be skipped at run-time

  1. when the number of arguments is odd (5, 7, 9, 11,,) and RSP is OWORD aligned,
  2. or when the number of arguments is even (<=4, 6, 8, 10,,) and RSP is unaligned.

Thanks to this, RSP is always OWORD aligned before step 5. (CALL MyProc) is executed, and the stack is restored to equilibrium (OrigRSP) after Invoke, no matter if it was OWORD aligned or not.

If macro SaveToShadow is used in Procedure block, it copies first four parameters from registers into shadow space. Thanks to this, those parameters are available not only in RCX,RDX,R8,R9, but as stack variables %Arg1,%Arg2,%Arg3,%Arg4 alias their formal operand %names, too.

This implementation is compatible with [MSx64conv] convention and it allows to Invoke 64bit external or imported functions available in 3rd party libraries or in Windows ABI. It also allows to create FastCall functions invokable from other libraries or as system callback procedures.

If procedures of your 64bit program are private, i.e. they're called from this program only (and not as callback from OS functions or statically linked to other programs), you may find more effective to use homonymous macros which implement StdCall convention or to invent your own register-calling convention and use native PROC, ENDPROC pseudoinstructions instead of this more complicated FastCall library.
Example
Definition of procedure MyProc with five parameters and two local stack variables with sizes 8 and 16:
MyProc Procedure Par1,Par2,Par3,Par4,Par5 ; Parameters Par1,Par2,Par3,Par4 are loaded in RCX,RDX,R8,R9. Par5 is pushed on stack. SaveToShadow ; Now are the parameter values available by formal names [%Par1],[%Par2],[%Par],[%Par4],[%Par5] ; and also by generic names [%Arg1],[%Arg2],[%Arg],[%Arg4],[%Arg5] (see %FastCall assignment). Uses RDI ; Exploit this macro only when some callee-save registers will be actually used in MyProc body. LocV1 LocalVar ; Reserve local stack variable with default size 8. LocV2 LocalVar Size=16 ; Reserve another variable. ClearLocalVar ; This fills all local stack variables with 0. ; Procedure parameters and local variables are available by formal name prefixed with %, e.g. MOV RAX,[%Par1] MOV [%LocV1],RAX ; Programmer should emit instructions of the MyProc body here ; and load the result of MyProc into return-register (RAX or XMM0). EndProcedure MyProc ; This macro discards local variables, pops used callee-save registers, ; pops RBP and returns below Invoke statement at run-time.

The following diagram shows the stack frame created by invoking MyProc defined in the example above.

Macros of fastcall convention will create and update the following "global" %variables at asm-time:
%ArgC_MyProc %SETA 5 (number of arguments),
%Uses_MyProc %SET RDI (list of used callee-saved registers),
%LvSize_MyProc %SETA 8+16 (total size of local stack-variables),
%LocV1 %SET RBP-16 (1st local stack variable),
%LocV2 %SET RBP-32 (2nd local stack variable),
%Par1 %SET RBP+16 (1st parameter pushed on stack), valid only if SaveToShadow was used,
%Par2 %SET RBP+24 (2nd parameter pushed on stack), valid only if SaveToShadow was used,
%Par3 %SET RBP+32 (3rd parameter pushed on stack), valid only if SaveToShadow was used,
%Par4 %SET RBP+40 (4th parameter pushed on stack), valid only if SaveToShadow was used,
%Par5 %SET RBP+48 (5th parameter pushed on stack).
RSP after the step #. │ Stack ┌ 0.>┌───────┐<16. ┐ │1.PUSH RSP │origRSP│ 16.POP RSP │ │ 1.>├───────┤ │ │2.PUSH RSP;ADD [RSP],8 │origRSP│ │ │ 2.>├───────┤<15. │ │3.PUSH Arg5 │ Arg5 │ 15.LEA RSP,[RSP+5*8] │ │ ┌ 3.>├═══════┤ ├ Invoke Invoke ┤ │ │ undef │ │ epilogue prologue │ │ ├───────┤ │ │ │ │ undef │ │ │ shadow space ─┤ ├═══════┤ │ │ │ │ undef │ │ │ │ ├───────┤ │ │4.SUB RSP,4*8 │ │ undef │ ┘ │ oword-aligned>└ 4.>├═══════┤<14. ┐ │5.CALL MyProc │return │ 14.RET │ └ 5.>├───────┤<13. │ Procedure┤6.PUSH RBP;MOV RBP,RSP │origRBP│ 13.POP RBP │ 6.>├───────┤<12. │ Uses ┤7.PUSH RDI │origRDI│ 12.POP RDI │ ┌ 7.>├───────┤<11. ├EndProcedure │8.SUB RSP,8 │ LocV1 │ 11.ADD RSP,8+16 │ │ 8.>├───────┤ │ LocalVar ┤ │ │ │ │ │ LocV2 │ │ │9.SUB RSP,16 │ │ │ └ 9.>└───────┘<10. ┘ 10. MyProc body

fastcall HEAD
↑ %FastCall
Argument values pushed on stack during Invoke and during SaveToShadow are available inside the Procedure body as formal names or as generic names defined here:
%Arg8    %SET RBP+72
%Arg7    %SET RBP+64
%Arg6    %SET RBP+56
%Arg5    %SET RBP+48
%Arg4    %SET RBP+40
%Arg3    %SET RBP+32
%Arg2    %SET RBP+24
%Arg1    %SET RBP+16
↑ Procedure Operand1, Operand2,,,

This macro Procedure declares 64bit FastCall procedure prolog which, unlike ordinary pseudoinstruction PROC, accepts ordinal operands and expect them to be pushed on stack or loaded to registers.
Using of the macro Procedure requires the corresponding EndProcedure be used in the same program.

Macro operands will be assigned to preprocessing %variables using the operand formal name prefixed with a percent sign %. Arguments are available in Procedure block as formal names declared here in Procedure prototype ( %Operand*) or as generic names (%Arg*).
Pay attention that this is true for 5th and higher argument only. The first four arguments are passed in registers, so if we want to access them by formal or generic name, they have to be saved to shadow space in the beginning of our procedure, either by SaveToShadow, or manually with

MOVQ [%Arg1],RCX ; or MOVQ [%Arg1],XMM0 (when the argument is floating-point number). MOVQ [%Arg2],RDX ; or MOVQ [%Arg2],XMM1 (when the argument is floating-point number). MOVQ [%Arg3],R8 ; or MOVQ [%Arg3],XMM2 (when the argument is floating-point number). MOVQ [%Arg4],R9 ; or MOVQ [%Arg4],XMM3 (when the argument is floating-point number).
Input
Label of Procedure statement is mandatory, it gives the procedure a callable name.
Operand* is the list of formal names for arguments passed to the procedure; their number is not limited.
Output
Macroinstruction will define public symbol with procedure name and assign formal operand names to the corresponding %variables.
It does not push registers (except for RBP). Programmer is responsible for saving and restoring callee-save registers RBX, RSI, RDI, R12..R15, XMM6..XMM15 if they are used in procedure body. This can be done either manually, or with macro Uses.
RBP must not be changed while the procedure body uses preprocessing %variables referring the formal operands names, or if it declares and uses temporary stack local variables defined with LocalVar.
Example
Move Procedure Source,Destination,Size ; Declaration of procedure Move and its formal arguments. PUSH RSI,RDI ; Those registers must be unchanged by fastcall procedure. MOV [%Source],RCX ; Saving register-passed arguments to shadow-space by formal name. MOV [%Arg2],RDX ; Ditto, saving by generic name. MOV [%Arg3],R8 ; Ditto, saving by generic name. MOV RSI,[%Source] MOV RDI,[%Destination] MOV RCX,[%Size] REP MOVSB POP RDI,RSI ; Restore calee-save registers. EndProcedure Move ; Return from the procedure Move. ;; With the knowledge of FastCall convention it may be also written more efficiently: Move Procedure Source,Destination,Size ; Declaration of procedure Move and its formal arguments. Uses RSI,RDI ; Those registers must be unchanged by fastcall Procedure. MOV RSI,RCX ; Source. MOV RDI,RDX ; Destination. MOV RCX,R8 ; Size. REP MOVSB ; Perform the actual copy. EndProcedure Move ; Restore used registers RDI,RSI, return from the procedure Move. Macro Move Procedure in the previous example will assign %ArgC_Move %SET 3 ; This %variable propagates to the corresponding macro EndProcedure Move. %Uses_Move %SET ; This %variable propagates to the macro Uses and EndProcedure. %LvSize_Move %SETA 0 ; This %variable propagates to the macro LocalVar and EndProcedure. %Source %SET RBP+16 ; This %variable represents formal name of Arg1. %Destination %SET RBP+24 ; This %variable represents formal name of Arg2. %Size %SET RBP+32 ; This %variable represents formal name of Arg3. Macro Uses in the previous example will assign %Uses_Move %SET RDI,RSI ; This %variable propagates to the corresponding macro EndProcedure Move. Invoke Move, RSI, OutBuffer, SIZE# OutBuffer ; Example of procedure invocation.
Procedure %MACRO FormalName1, FormalName2,,,,
LblCheck    %IF "%:" === ""
              %ERROR ID=5921, 'Macro "Procedure" requires a label.'
              %EXITMACRO Procedure
            %ENDIF LblCheck
%%ArgC_%:   %SETX %#                   ; Initialize with number of arguments.
%%Uses_%:   %SETX                      ; Initialize as empty list.
%%LvSize_%: %SETX 0                    ; Initialize as zero.
ArgNr       %FOR 1..%#, STEP= +1
%%%*{%ArgNr}  %SETX RBP+(8+%ArgNr*8)   ; Assign the formal name to the corresponding %variable.
            %ENDFOR ArgNr
%:::        PROC %=*, NESTINGCHECK=OFF ; Open the namespace and define entry symbol from macro label %: as GLOBAL.
            PUSH RBP
            MOV RBP,RSP                ; Initialize the frame pointer.
          %ENDMACRO Procedure
↑ SaveToShadow

Macro SaveToShadow stores the first four arguments of fast-called Procedure to the shadow space reserved by Invoke. This enables the first four arguments be referred by their %formal names or by generic names %Arg1, %Arg2, %Arg3, %Arg4.

Macro SaveToShadow should be used inside Procedure..EndProcedure block, near its beginning.
When it is omitted, the first four arguments are available only in registers RCX, RDX, R8, R9 (or XMM0..XMM3) and the shadow space contains undefined garbage.

When some of arguments contains floating-point number, it is passed to Procedure in SIMD register instead of GPR, and it must be therefore copied to the corresponding GPR prior to SaveToShadow. Example (Radius is FP):

MyCircle Procedure Xcoord, Ycoord, Radius MOVQ R8,XMM2 ; Copy the FP value from SIMD to the coresponding GPR. SaveToShadow ; Procedure body now has the 1st argument available at [RBP+16] alias [%Xcoord] alias [%Arg1] alias RCX. Et cetera. Uses RSI,RDI MyLv1 LocalVar Size=16 MyLv2 LocalVar Size=24 ClearLocalVar ; Procedure body now has the cleared 1st local variable available at [RBP-32] alias [%MyLv1]. ; Instructions of Procedure body are here. EndProcedure MyCircle
Input
RCX, RDX, R8, R9 contain the first four arguments, respectively.
Output
Shadow space (4*8 bytes at RBP+16) is populated with Procedure arguments.
SaveToShadow %MACRO
InProcCheck  %IF "%^PROC" === ""
               %ERROR ID=5926,'Macro "%0" is unexpected here.'
             %ENDIF InProcCheck
             MOV [%Arg1],RCX
             MOV [%Arg2],RDX
             MOV [%Arg3],R8
             MOV [%Arg4],R9
           %ENDMACRO SaveToShadow
↑ Uses Register1, Register2,,,,

Macro Uses specifies which callee-save registers does the Procedure use, so they are pushed on stack here (and they will be restored in EndProcedure epilogue).

Calling convention macros in 16bit and 32bit mode could save/restore all eight GPR with a single PUSHA/POPA. This instruction is not available in 64bit mode, so we will use this macro instead.

Macro Uses can be used in 64bit mode only, right after the statement Procedure and before local stack variables are defined with LocalVar.

Callee-save registers RBX,RSI,RDI,R12..R15,XMM6..XMM15, should be enumerated here if they are actually used in Procedure..EndProcedure block.
Callee-save registers RBP,RSP should not be mentioned here, they are always saved automatically in Procedure prologue.
It is useless to enumerate scratch registers RCX,RDX,R8..R11,XMM0..XMM5 here, because the caller of our Procedure cannot expect them to be preserved.
Registers RAX,XMM0 may not be enumerated here, because they wouldn't return the expected value after their restoration.

Input
Register* is GPR or XMM register. Only lower QWORD of XMM register is saved.
Output
Macro will push all enumerated registers on stack and assign their list to a unique %variable named %Uses_ProcedureName in reversed order. This %variable will be used by EndProcedure for restoration of callee-save registers.
Uses %MACRO Register1,Register2,...
InProcCheck   %IF "%^PROC" === ""
                %ERROR ID=5926,'Macro "%0" is unexpected here.'
              %ENDIF InProcCheck
%Uses         %SET2 %%Uses_%^PROC
reg           %FOR %*
                %IF REGTYPE#(%reg) = 'Q' ; General-purpose 64bit register.
                  PUSHQ %reg
%Uses             %SET %reg,%Uses        ; Accumulate register names in reversed order.
                %ENDIF
                %IF REGTYPE#(%reg) = 'X' ; SIMD XMM register.
                  SUB RSP,8
                  MOVQ [RSP],%reg
%Uses             %SET %reg,%Uses        ; Accumulate register names in reversed order.
                %ENDIF
RegTypeCheck    %IF REGTYPE#(%reg) != 'Q' && REGTYPE#(%reg) != 'X'
                   %ERROR ID=5927,'Macro "Uses" does not support operand "%reg".'
                %ENDIF RegTypeCheck
              %ENDFOR reg
%%Uses_%^PROC %SETX %Uses
            %ENDMACRO Uses
↑ LocalVar Size=8
Macro LocalVar reserves and declares local memory variable with the given Size allocated on machine stack and assigns its name to a preprocessing %variable which has its name derived from the label of macro LocalVar.
It can only be used inside Procedure..EndProcedure block.
Input
Plain identifier must be defined as a label of LocalVar statement. It does not need to be unique in the program because it does not declare assembly symbol. The name will be prefixed with % and used as preprocessing %variable for addressing the memory variable withing the procedure body.
Size=8 specifies how many bytes should be reserved for the local stack variable. The Size is automatically rounded up to the nearest multiple of 8.
Output
Macro will define a preprocessing %variable with the name which was defined as the label but it is now prefixed with percent sign. Then it will emit machine instruction SUB RSP,%Size to reserve room on the machine stack.
Macro LocalVar also maintains the "global" preprocessing %variable %LvSize_ProcedureName which was initialized in macro Procedure and which will be used for zeroing local variables in ClearLocalVar and for discarding local variables in EndProcedure.
Example
ProcName: Procedure Param1 ; This example uses two local stack variables with sizes 8 and 1K. Uses RDI ; Saved callee-save register is now available at RBP-8. BlockSize LocalVar ; %BlockSize is now assigned with RBP-16 (8+8). Block LocalVar Size=1024 ; %Block is now assigned with RBP-1040 (8+8+1024). ClearLocalVar ; Fill %Block and %BlockSize with 0. MOV [%BlockSize],1K, DATA=QWORD LEA RDI,[%Block] ; more instructions... EndProcedure ProcName
LocalVar      %MACRO Size=8
InProcCheck     %IF "%^PROC" === ""
                  %ERROR ID=5926,'Macro "%0" is unexpected here.'
                  %EXITMACRO LocalVar
                %ENDIF InProcCheck
LblCheck        %IF "%:" === ""
                  %ERROR ID=5922, 'Macro "%0" requires a label.'
                  %EXITMACRO LocalVar
                %ENDIF
OrdCheck        %IF %#
                  %ERROR ID=5923, 'Macro "%0" does not expect ordinal parameters.'
                %ENDIF
%:              %COMMENT                          ; This empty comment block makes the label of macro void,
                %ENDCOMMENT %:                    ;   so it does not declare a symbol.
%ThisLvSize     %SETA (%Size + 7) & ~7            ; Round up to the nearest multiple of 8.
%GlbLvSize      %SET2 %%LvSize_%^PROC+%ThisLvSize ; Increase the total size of previously defined local variables.
%%LvSize_%^PROC %SETX %GlbLvSize                  ; Update the "global" %variable.
                SUB RSP, %ThisLvSize              ; Stack memory allocation.
%ThisUses       %SET2 %%Uses_%^PROC               ; Retrieve the list of used registers.
%ThisLen        %SETL %ThisUses                   ; Number of registers pushed by macro Uses.
%%%:            %SETX RBP-8*%ThisLen-(%GlbLvSize) ; Assign formal %name to the id %: specified as LocalVar label.
   %ENDMACRO LocalVar
↑ ClearLocalVar
This macro zeroes all variables on stack previously declared with LocalVar. ClearLocalVar should be expanded right after the last LocalVar declarations, before any stack operations are made.
We could as well decide to initialize each local variable individually, e.g. MOVQ [%MyLocalVar],0, and in this case the macro ClearLocalVar will not be used in the Procedure body at all.
Input
Macro does not use explicit parameters. The cleared memory is pointed to with RSP, its size is specified with "global" variable %LvSize_ProcedureName.
Output
RAX=0
ClearLocalVar %MACRO
InProcCheck %IF "%^PROC" === ""
              %ERROR ID=5926,'Macro "%0" is unexpected here.'
              %EXITMACRO ClearLocalVar
            %ENDIF
%GlbLvSize  %SET2 %%LvSize_%^PROC
            %IF %GlbLvSize  ; Do nothing if no LocalVar was used in this Procedure (%GlbLvSize=0).
              PUSH RCX,RDI
               LEA RDI,[RSP+2*8] ; Skip pushed RCX,RDI.
               MOV ECX,%GlbLvSize / 8
               XOR EAX,EAX
               REP STOSQ
              POP RDI,RCX
            %ENDIF
          %ENDMACRO ClearLocalVar
↑ EndProcedure ProcName

Macro EndProcedure terminates context of the previously opened Procedure . This epilogue of FastCall convention will

Operands are not removed from stack, that's the job of Invoke.

Programmer should never use explicit machine instruction RET to return from the block defined with Procedure .. EndProcedure.
If premature return is required, jump to the label of EndProcedure statement instead.
Input
ProcName This macroinstruction requires exactly one operand which is identical with the label of previous corresponding Procedure statement.
Output
Stack frame is released, current process returns below Invoke statement.
EndProcedure %MACRO ProcName
OpCheck    %IF %# <> 1
             %ERROR ID=5924, 'Macro "EndProcedure" requires one operand.'
             %EXITMACRO EndProcedure
           %ENDIF
%NameStrip %SET %ProcName
Decolonize %WHILE "%NameStrip[%&]" === ":" ; Get rid of trailing colon(s), if used.
%NameStrip   %SET %NameStrip[1..%&-1]
           %ENDWHILE Decolonize
NestCheck  %IF "%NameStrip" !=== "%^PROC"
             %ERROR ID=5925, 'Nesting mismatch, "%ProcName Procedure" missing.'
             %EXITMACRO EndProcedure
           %ENDIF NestCheck
%GlbLvSize %SET2 %%LvSize_%^PROC
           %IF %GlbLvSize
             ADD RSP,%GlbLvSize            ; Discard local variables.
           %ENDIF
%ThisUses  %SET2 %%Uses_%^PROC
reg        %FOR %ThisUses                  ; Callee-save registers saved by macro Uses.
             %IF REGTYPE#(%reg) = 'Q'      ; General-purpose 64bit register.
               POPQ %reg
             %ENDIF
             %IF REGTYPE#(%reg) = 'X'      ; SIMD XMM register.
                MOVQ %reg,[RSP]
                ADD RSP,8
             %ENDIF
           %ENDFOR reg
           POP RBP                         ; Restore caller's frame pointer.
           RET                             ; Return below Invoke which called %ProcName.
           ENDP %ProcName, NESTINGCHECK=OFF
         %ENDMACRO EndProcedure
↑ Invoke Function, Arg1, Arg2,,, Fastmode=%Fastmode

Macro Invoke is a replacement of standard CALL instruction which can pass parameters to the Procedure in FastCall convention.

This FastCall macro Invoke is similar to macroinstruction WinABI, which is specialized on invokation of functions exported from MS Windows 64bit by [WindowsAPI].
WinABI in addition takes care of ANSI | WIDE variants of invoked function name, and declares the function as imported from DLL specified by keyword Lib=.

According to FastCall convention, arguments are pushed backwards on stack as QWORDs, starting from the last to the fifth. The first four parameters are loaded to RCX, RDX, R8, R9 and not pushed on stack. Nevertheless, room for the first four arguments is always reserved on stack (so called shadow space) even when the invoked function has less than four arguments.
When the Procedure expects argument in floating-point format instead of integer, pointer or immediate, the first four arguments are loaded to XMM0, XMM1, XMM2, XMM3 rather than to GPR.
Stack pointer will be OWORD aligned before the CALL instruction is performed.

Invoked procedure does not remove arguments from stack, it terminates with a simple near RET. Epilogue of macro Invoke restores RSP to its original value.

Input
Function is the name of invoked procedure. This can be a function from statically linked library, MS Windows function imported from system DLL, or it can be a block defined with Procedure, EndProcedure from this FastCall macrolibrary.
Function may also be provided in the form of GPR containing the entry address of the function.
Arg* can be a SIMD register or anything pushable:It is programmer's responsibility to provide exactly that many arguments as is specified in Function documentation.
Any argument may be suffixed with type specificator #SS or #SD (case insensitive), which signalizes that it represents floating-point value in Scalar Single or Scalar Double precision format, and that it should be therefore passed to the Function in XMM register instead of GPR.
Suffix is not necessary with XMM register (argument passed in XMM is always assumed to contain floating-point number).
Suffix is not necessary when Fastmode=No, because the robust version transfers the first four arguments simultaneously in SIMD and GP registers anyway, no matter if they are float or integer.
Suffix is not necessary in 5th and higher arguments in both modes (they are passed via machine stack regardless of their type).
Fastmode=No is an extended Boolean switch. When it's false or omitted (default), the Invoke macro preserves all registers except for RAX and XMM0 which return the result of Function, similary to other robust macros from EuroAssembler libraries.
Although Microsoft ABI specifies fastcall convention with floating-point parameters %1..%4 delivered in XMM0, XMM1, XMM2, XMM3, and with parameters %1..%4 of all other types delivered in RCX, RDX, R8, R9, arguments of Invoke in robust mode (Fastmode=No) may be supplied in those registers too in arbitrary order, for instance Invoke MessageBox,RDX,R8,R8,MB_OK.
Fastmode=Yes is the macro expansion mode where Invoke does not preserve caller-save registers and it loads the first four arguments directly to XMM0, XMM1, XMM2, XMM3 when they are floating-point values, and to RCX, RDX, R8, R9 in all other cases, so the invokation is faster. Registers RCX, RDX, R8, R9, XMM0..XMM3 cannot be used as macro arguments, because they are being overwritten in prologue. Or they should be only used in the exact ABI-specified order, e.g. Invoke Function, RCX, XMM1, R8, XMM3. Similary, registers RCX, RDX, R8, R9 cannot be used instead of Function name in fast mode. Use other GPR, e.g.
IMPORT WriteConsole,Lib=kernel32.dll
LEA RDI, [WriteConsole]
Invoke Fastmode=Yes, RDI, RBX, Message, SIZE# Message, WrittenChars, 0
Output
RAX, XMM0, RFlags are set as returned from the invoked Function.
RBX,RBP,RSP,RSI,RDI,R12..R15,XMM6..XMM15 are preserved (callee-save registers).
Other XMM and GPR are preserved only if Fastmode=No.
Documentation
Microsoft FastCall convention is described in [WinABI64] and [MSx64Conv].
Example
Expansion of Invoke invokation of the function CreateFile with seven arguments in both modes: Invoke CreateFileA, FileName, GENERIC_READ, FILE_SHARE_READ, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0 | | ; Fast Version. |00000000: | Invoke CreateFileA, FileName, GENERIC_READ, FILE_SHARE_READ, 0, \ |00000000: | OPEN_EXISTING,FILE_ATTRIBUTE_NORMAL,0, Fastmode=Yes |00000000:54 + PUSH RSP ; Store original stack pointer value (equilibrum). |00000001:40F6C408 + TEST SPL,1000b ; Test RSP OWORD alignment at run-time. |00000005:7506 + JNZ .Invoke1: |00000007:54 + PUSH RSP ; Store and update 2nd copy of original RSP (equilibrum). |00000008:4883042408 + ADDQ [RSP],8 ; Those two instructions aren't executed if RSP was properly aligned. |0000000D: + .Invoke1: |0000000D:6A00 + PUSHQ 0 ; Push 7th argument. |0000000F:6880000000 + PUSHQ FILE_ATTRIBUTE_NORMAL ; Push 6th argument. |00000014:6A03 + PUSHQ OPEN_EXISTING ; Push 5th argument. |00000016:41B900000000 + MOV R9,0 ; Load 4th argument. |0000001C:41B801000000 + MOV R8,FILE_SHARE_READ ; Load 3rd argument. |00000022:BA00000080 + MOV RDX,GENERIC_READ ; Load 2nd argument. |00000027:488D0D(00000000) + LEA RCX,[FileName] ; Load 1st argument. |0000002E:4883EC20 + SUB RSP,4*8 ; Make room for shadow space in fast mode. RSP is OWORD-aligned. |00000032:E8(00000000) + CALL CreateFileA ; Call the imported function. |00000037:488D642438 + LEA RSP,[RSP+7*8] ; Discard transferred arguments, keep RFlags. |0000003C:5C + POP RSP ; Restore RSP to equilibrum from 1st or 2nd copy. |0000003D: | ; Invoke in fast mode occupies 61 bytes of code. |00000000: | ; Robust version. |00000000: | Invoke CreateFileA, FileName, GENERIC_READ, FILE_SHARE_READ, 0, \ |00000000: | OPEN_EXISTING,FILE_ATTRIBUTE_NORMAL,0, Fastmode=No |00000000:6A00 + PUSHQ 0 ; Push 7th argument. |00000002:6880000000 + PUSHQ FILE_ATTRIBUTE_NORMAL ; Push 6th argument |00000007:6A03 + PUSHQ OPEN_EXISTING ; Push 5th argument |00000009:6A00 + PUSHQ 0 ; Push 4th argument |0000000B:6A01 + PUSHQ FILE_SHARE_READ ; Push 3rd argument |0000000D:6800000080 + PUSHQ GENERIC_READ ; Push 2nd argument |00000012:50 + PUSH RAX ; Alloc 1st argument on stack. |00000013:488D05(00000000) + LEA RAX,[FileName] ; Load 1st argument. |0000001A:48870424 + XCHG RAX,[RSP] ; Transfer the pointer without clobbering RAX. |0000001E:6A07 + PUSHQ 7 ; Push the number of arguments. |00000020:488D05(00000000) + LEA RAX,[CreateFileA] ; Load the function address (pointer to its thunk in [.idata]). |00000027:E805000000 + CALL Invoke@RT ; Call the runtime with function in RAX. |0000002C:488D642440 + LEA RSP,[RSP+8*8] ; Restore stack to equilibrum, preserving RFlags. |00000031: | ; Invoke in robust mode occupies 49 bytes of code (plus 190 bytes of runtime code, once per program).
Remark
Macros Invoke can be expanded in one of two modes:

Selection of the mode depends on optimisation criterion:

Robust variant of Invoke is often shorter, but it emits 190 additional bytes of runtime procedure (only once in a program), so it is profitable when we have more than cca twenty invocations in the program. Robust variant may also spare some push/pops because it doesn't change any scratch registers.

When you want to switch on the Fastmode for all Invoke invokations, you don't have to append ,Fastmode=Yes to every Invoke if you set preprocessing variable %Fastmode in the beginning of your program: %Fastmode %SETB On.
Stack alignment
See the identical example of stack layout of macro WinABI.
Invoke %MACRO Function, Arg1, Arg2,,, Fastmode=%Fastmode, Lib=
%Fast   %SETB %Fastmode
%Robust %SETB ! %Fast
Fa      %IF %Fast                       ; Align stack in fast mode only.
          PUSH RSP                      ; Store original stack pointer value (equilibrum).
          TEST SPL,1000b                ; Test stack OWORD alignment at run-time.
FaEv      %IF %# & 1b || %# <= 5        ;>If the number of Function arguments is 0,1,2,3,4,6,8,10,,(even),
            JZ .Invoke%.:               ;   store 2nd copy of equilibrum when RSP is OWORD-unaligned.
          %ELSE FaEv                    ; If the number of arguments is 5,7,9,11,,, (odd),
            JNZ .Invoke%.:              ;   store 2nd copy of equilibrum when RSP is OWORD-aligned.
          %ENDIF FaEv
          PUSH RSP                      ; Store and update 2nd copy of original RSP (equilibrum).
          ADDQ [RSP],8                  ; Those two instructions aren't executed if RSP was properly aligned.
.Invoke%.:
        %ENDIF Fa
%GPR    %SET  RCX,RDX,R8,R9             ; Enumerate registers for transfer of integer|pointer values.
%SIMD   %SET  XMM0,XMM1,XMM2,XMM3       ; Enumerate registers for transfer of floating-point values.
%ArgNr  %SETA %#                        ; Number of macro ordinals, i.e. number of Function arguments + 1.
Arg     %WHILE %ArgNr > 1
%Arg     %SET %*{%ArgNr}                ; Transfer all Function arguments, start with the last one.
%ArgNr   %SETA %ArgNr-1                 ; %ArgNr is now the ordinal Nr of Function argument (,,3,2,1).
%suffix  %SET Q                         ; %suffix of MOV will be Q, SS or SD (MOVQ, MOVSS or MOVSD).
         %IF '%Arg[%&-2..%&-1]'=='#S'   ; If suffix #SS or #SD is present in argument notation,
%suffix    %SET %Arg[%&-1..%&]          ;  let %suffix be SS or SD
%Arg       %SET %Arg[1..%&-3]           ;  and remove it from the argument.
         %ENDIF                         ; %Arg may be GPR,SIMD,imm@abs,ptr@rel,[mem@abs],[mem@rel].
Rb       %IF %ArgNr>4 || %Robust        ; Transfer %Arg via stack.
RbSc       %IF TYPE#(SEGMENT#(%Arg))='N'; %Arg is not relocatable (scalar).
RbScRg       %IF TYPE#(%Arg) = 'R'      ; It can be GPR,SIMD,imm@abs,[mem@abs].
RbScRgXm       %IF REGTYPE#(%Arg)='X'   ; %Arg is a GP or SIMD register.
                 SUB RSP,8
                 MOV%suffix [RSP],%Arg  ; %Arg is a SIMD register.
               %ELSE RbScRgXm
                 PUSHQ %Arg             ; %Arg is a GP register.
               %ENDIF RbScRgXm
             %ELSE RbScRg               ; %Arg it not a register.
               PUSHQ %Arg               ; %Arg is scalar immediate or [mem], e.g. 1 or [RBP+16].
             %ENDIF RbScRg
           %ELSE RbSc                   ; %Arg is relocatable (vector), e.g. Symbol or [Symbol+RSI].
RbVeM        %IF '%Arg[1]' === '['      ; Argument is passed by value, via a temporary GPR.
               PUSH RAX                 ; Original contents of the borrowed RAX must be kept.
               LEA RAX,%Arg             ; Use relative addressing frame for relocatable %Arg.
               MOV RAX,[RAX]            ; Dereference the argument value.
               XCHG RAX,[RSP]           ; Transfer the value.
             %ELSE RbVeM                ; Relocatable argument is passed by reference, e.g. Symbol.
               PUSH RAX                 ; Original contents of the borrowed RAX must be kept.
               LEA RAX,[%Arg]           ; Use relative addressing frame for relocatable %Arg.
               XCHG RAX,[RSP]           ; Transfer the pointer.
             %ENDIF RbVeM
           %ENDIF RbSc
         %ELSE Rb                       ; Fastmode=Yes and %ArgNr=4,3,2,1. Transfer via registers.
FaSc       %IF TYPE#(SEGMENT#(%Arg))='N'; %Arg is not relocatable (scalar).
FaScRg       %IF TYPE#(%Arg) = 'R'      ; It can be GPR,SIMD,imm@abs,[mem@abs].
FaScRgXm       %IF REGTYPE#(%Arg)='X'   ; %Arg is a GP or SIMD register.
                 %IF "%SIMD{%ArgNr}" !== "%Arg"  ; %Arg is XMM. Skip when it's already there.
                   MOV%suffix %SIMD{%ArgNr},%Arg ; Copy SIMD %Arg to other SIMD (XMM0..XMM3).
                 %ENDIF
               %ELSE FaScRgXm           ; %Arg is GPR.
                 %IF "%GPR{%ArgNr}" !== "%Arg"   ; Skip when it's already there.
                    MOV %GPR{%ArgNr},%Arg        ; Copy GPR to other GPR (R9,R8,RDX,RCX).
                 %ENDIF
               %ENDIF FaScRgXm
             %ELSE FaScRg               ; %Arg is scalar immediate or [mem], e.g. 1 or [RBP+16].
FaScIm         %IF '%suffix' === 'Q'
                 MOV %GPR{%ArgNr},%Arg  ; %Arg is integer value.
               %ELSE FaScIm
                 MOV%suffix %SIMD{%ArgNr},%Arg ; %Arg is FP value, e g. [RSI].
               %ENDIF FaScIm
             %ENDIF FaScRg
           %ELSE FaSc                   ; %Arg is relocatable (vector), e.g. Symbol or [Symbol+RSI].
FaVeM        %IF '%Arg[1]' === '['      ; Argument is passed by value, via this GPR.
               LEA %GPR{%ArgNr},%Arg
               MOV %GPR{%ArgNr},[%GPR{%ArgNr}] ; Dereference the argument value, transfer the value.
             %ELSE FaVeM                ; Argument is passed by reference, e.g. Symbol.
               LEA %GPR{%ArgNr},[%Arg]  ; Transfer the pointer.
             %ENDIF FaVeM
           %ENDIF FaSc
         %ENDIF Rb
        %ENDWHILE Arg
Fa       %IF %Fast                      ;
           SUB RSP,4*8                  ; Make room for shadow space in fast mode. RSP is OWORD-aligned.
FaRg       %IF REGTYPE# %Function = 'Q' ; Function was specified as 64bit GPR.
             %IF "%Function"=="RCX"||"%Function"=="RDX"||"%Function"=="R8"||"%Function"=="R9"
               %ERROR ID=5956,'%0 function cannot be supplied in scratch register %Function when Fastmode=Yes.'
               %EXITMACRO Invoke
             %ENDIF
           %ENDIF FaRg                  ; Function was specified by name.
           CALL %Function               ; Call the function in fast mode.
           %IF %# > 5
             LEA RSP,[RSP+8*(%#-1)]     ; Discard transferred arguments, keep RFlags.
           %ELSE
             LEA RSP,[RSP+8*4]          ; Discard transferred arguments, keep RFlags.
           %ENDIF
           POP RSP                      ; Restore RSP to equilibrum from 1st or 2nd copy.
         %ELSE Fa                       ; Use helper runtime procedure in robust mode.
           PUSHQ %#-1                   ; Push the number of Function arguments.
RbRg       %IF REGTYPE# %Function = 'Q' ; Function was specified as 64bit GPR.
             %IF '%Function' !== 'RAX'  ; Skip if it's already in RAX.
               MOV RAX,%Function
             %ENDIF
           %ELSE RbRg                   ; Function was specified by name.
             LEA RAX,[%Function]        ; RVA of the Function (pointer to its thunk in [.idata]).
           %ENDIF RbRg
           CALL Invoke@RT               ; Arguments+their number are on stack. Function is in RAX.
           LEA RSP,[RSP + 8 * %#]       ; Restore stack to equilibrum, preserving RFlags.
Invoke@RT::PROC1                        ; Macro emits the runtime subroutine, which is expanded only once in program.
            PUSH RSI,RDI,R12            ; Calee-save registers used by Invoke@RT procedure.
            PUSH RCX,RDX,R8,R9,R10,R11  ; Robust version preserves scratch GP registers.
            MOVQ RDX,XMM1
            MOVQ  R8,XMM2
            MOVQ  R9,XMM3
            MOVQ R10,XMM4
            MOVQ R11,XMM5
            PUSH RDX,R8,R9,R10,R11      ; Robust version preserves scratch SIMD registers.
            MOV ECX,[RSP+15*8]          ; Number of Function arguments.
            LEA RSI,[RSP+16*8]          ; Pointer to %Arg1 in Invoke@RT stack frame.
            CMP ECX,4                   ; Number of arguments is 0|1|2|3|4|5|6...
            JAE .AtLeast4:
            MOV CL,4                    ; Number is saturated to 4|4|4|4|4|5|6...
  .AtLeast4:MOV R12,RSP                 ; Save (perhaps unaligned) RSP to callee-preserved register.
            ; Align stack pointer as dictated by ABI specification.
            MOV EDX,ECX                 ; Saturated number of operands (4,5,6,,,).
            SHL EDX,3                   ; RDX bit 3 is set if saturated number of arguments is odd (5,7,9,,).
            XOR EDX,ESP                 ; RDX will be 8 when number of arguments is even and RSP is OWORD unaligned,
            AND EDX,0000_1000b          ;   or when it's odd and RSP is already OWORD aligned. Otherwise RDX=0.
            SUB RSP,RDX                 ; Align RSP to OWORD.
            ; Copy ECX arguments from RSI to the callee's shadow space.
            MOV EDX,ECX
            SHL EDX,3
            SUB RSP,RDX                 ; Alloc Function frame (shadow space + possible other arguments).
            MOV RDI,RSP                 ; RSP is OWORD aligned at this moment.
            REP MOVSQ                   ; Copy all pushed arguments.
            ; Load first four arguments to GP registers.
            MOVQ RCX,[RSP+00]
            MOVQ RDX,[RSP+08]
            MOVQ R8, [RSP+16]
            MOVQ R9, [RSP+24]
            ; Copy first four arguments to SIMD registers (for the case they were floating-point).
            MOVQ XMM0,RCX
            MOVQ XMM1,RDX
            MOVQ XMM2,R8
            MOVQ XMM3,R9
            CALL RAX                    ; Call the Function in robust mode.
            MOV RSP,R12                 ; Discard Function frame plus possible stack-alignment stuff.
            POP R11,R10,R9,R8,RDX
            MOVQ XMM5,R11
            MOVQ XMM4,R10
            MOVQ XMM3,R9
            MOVQ XMM2,R8
            MOVQ XMM1,RDX               ; Restore SIMD scratch registers.
            POP R11,R10,R9,R8,RDX,RCX   ; Restore GP scratch registers.
            POP R12,RDI,RSI             ; Restore used callee-save registers.
            RET
           ENDP1 Invoke@RT::
         %ENDIF  Fa
       %ENDMACRO Invoke
  ENDHEAD fastcall

▲Back to the top▲