EuroAssembler

Posted: **05 Sep 2018 17:41**

In the new winabi.htm, the following code snippet appears:

%IF "%SIMD" !=== "0" ; If arguments should be passed in XMM registers (required for floating-point values).
    MOVQ XMM0,RCX
    MOVQ XMM1,RDX
    MOVQ XMM2,R8
    MOVQ XMM3,R9
%ENDIF

That will not work. Most functions only pass one or two floats -- the rest will be GP. Example ...

invoke fakeFunction, dword1, float2, qword3, double4 -->
    movq xmm3,double4
    mov  r8,qword3
    movq xmm1,float2
    mov  ecx,dword1

or

invoke fakeFunction, float1, double2, qword3, dword4 -->
    mov  r9d,dword4
    mov  r8,qword3
    movq xmm1,double2
    movq xmm0,float1

etc ...
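The positional rule behind these two examples can be sketched in a few lines of Python, used here only as neutral, runnable pseudocode. The function name `win64_arg_registers` and the `"int"`/`"float"` labels are hypothetical and not part of €ASM or any invoke macro. The sketch assumes the usual Win64 rule: each of the first four arguments owns one slot, and the argument's type decides whether that slot's GP or XMM register is used.

```python
# Slot-indexed register names for the first four Win64 arguments.
GP_REGS = ["RCX", "RDX", "R8", "R9"]
XMM_REGS = ["XMM0", "XMM1", "XMM2", "XMM3"]

def win64_arg_registers(kinds):
    """Map argument kinds ("int" or "float") to registers by position.

    Hypothetical helper for illustration only. Arguments beyond the
    fourth are passed on the stack.
    """
    placement = []
    for slot, kind in enumerate(kinds):
        if slot >= 4:
            placement.append("stack")
        elif kind == "float":
            placement.append(XMM_REGS[slot])   # float or double in slot N -> XMMN
        else:
            placement.append(GP_REGS[slot])    # integer or pointer in slot N -> GP register N
    return placement

# fakeFunction(dword1, float2, qword3, double4)
print(win64_arg_registers(["int", "float", "int", "float"]))
# fakeFunction(float1, double2, qword3, dword4)
print(win64_arg_registers(["float", "float", "int", "int"]))
```

Note that a float in the second position uses XMM1, not XMM0: the register index follows the argument position, not the count of floats seen so far.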

Posted: **05 Sep 2018 18:52**

ar18 wrote: ↑05 Sep 2018 17:41 That will not work. Most functions only pass one or two floats -- the rest will be GP. Example ...

invoke fakeFunction, dword1, float2, qword3, double4 -->
    movq xmm3,double4
    mov  r8,qword3
    movq xmm1,float2
    mov  ecx,dword1

If you expand my current implementation of WinAPI, it will emit the following instructions:
WinAPI fakeFunction, dword1, float2, qword3, double4
PUSH RSP
TEST SPL,8
JZ .WinAPI1:
PUSH RSP
ADDQ [RSP],8
.WinAPI1:
PUSHQ double4
PUSHQ qword3
PUSHQ float2
PUSHQ dword1 ; Shadow space is now reserved for fakeFunction.
MOV RCX,[RSP+0] ; dword1 is now in RCX.
MOV RDX,[RSP+8] ; float2 is in RDX.
MOV R8,[RSP+16] ; qword3 is in R8.
MOV R9,[RSP+24] ; double4 is in R9.
MOVQ XMM0,RCX ; dword1 is in XMM0.
MOVQ XMM1,RDX ; float2 is in XMM1.
MOVQ XMM2,R8 ; qword3 is in XMM2.
MOVQ XMM3,R9 ; double4 is in XMM3.
CALL fakeFunction
LEA RSP,[RSP+32]
POP RSP

Your fakeFunction will get what it expects: dword1 in RCX, float2 in XMM1, qword3 in R8, double4 in XMM3.
Yes, it is not much effective, but why do you think that won't work?
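Why the copy through a GP register still delivers a usable floating-point argument can be checked with a small Python model. This is only a sketch: the helper names are hypothetical, and it assumes that MOVQ XMMn,r64 copies the 64 bits unchanged (no integer-to-float conversion) and that a 32-bit float occupies the low half of the pushed qword.

```python
import struct

def movq_gpr_to_xmm(gpr_value):
    """Model MOVQ XMMn,r64: the raw 64 bits land in the low half of XMMn."""
    return struct.pack("<Q", gpr_value & 0xFFFFFFFFFFFFFFFF)

def as_double(xmm_low):
    """Read the low 64 bits of the modeled XMM register as a double."""
    return struct.unpack("<d", xmm_low)[0]

def as_float(xmm_low):
    """Read the low 32 bits of the modeled XMM register as a single float."""
    return struct.unpack("<f", xmm_low[:4])[0]

# Bit patterns as they would sit in R9 (double4) and RDX (float2).
double4_bits = struct.unpack("<Q", struct.pack("<d", 3.5))[0]
float2_bits = struct.unpack("<I", struct.pack("<f", 1.25))[0]

print(as_double(movq_gpr_to_xmm(double4_bits)))
print(as_float(movq_gpr_to_xmm(float2_bits)))
```

The callee reads the same bit pattern whether the value arrived in the XMM register directly or via the GP register, which is why the expansion is functionally correct even though it is longer.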

Posted: **05 Sep 2018 20:01**

vitsoft wrote: ↑05 Sep 2018 18:52 If you expand my current implementation of WinAPI, it will emit the following instructions:

...
PUSHQ double4
PUSHQ qword3
PUSHQ float2
PUSHQ dword1 ; Shadow space is now reserved for fakeFunction.
MOV RCX,[RSP+0] ; dword1 is now in RCX.
MOV RDX,[RSP+8] ; float2 is in RDX.
MOV R8,[RSP+16] ; qword3 is in R8.
MOV R9,[RSP+24] ; double4 is in R9.
MOVQ XMM0,RCX ; dword1 is in XMM0.
MOVQ XMM1,RDX ; float2 is in XMM1.
MOVQ XMM2,R8 ; qword3 is in XMM2.
MOVQ XMM3,R9 ; double4 is in XMM3
...

Your fakeFunction will get what it expects: dword1 in RCX, float2 in XMM1, qword3 in R8, double4 in XMM3.

This is the Win64 ABI conforming version of that code snippet:

...
sub rsp,32 ; Shadow space is now reserved by invoke for fakeFunction to use
MOV RCX,dword1 ; dword1 is now in RCX.
MOV R8,qword3 ; qword3 is in R8.
MOVQ XMM1,float2 ; float2 is in XMM1.
MOVQ XMM3,double4 ; double4 is in XMM3
...

Do you want to tell me that the previous code is a better implementation?

The portion of €ASM's implementation of invoke quoted above uses 96 bytes per invoke, whereas the UASM, MASM, NASM, or GCC version of the very same call uses only 36 bytes. So multiply the number of invokes in your program by (96 - 36 =) 60 bytes, and that is how much do-nothing bloat will be in your exe. Instead of my paint program being 50k in size, it would be 70k! That would be an incredible 20k of bloat, which is entirely unacceptable.
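The arithmetic behind that estimate is easy to reproduce. The sketch below uses only the two byte counts quoted in this post; the function names are hypothetical, and reading "k" as 1024 bytes is an assumption.

```python
BYTES_PER_INVOKE_MACRO = 96    # size quoted for the expanded WinAPI macro
BYTES_PER_INVOKE_DIRECT = 36   # size quoted for the direct register loads
EXTRA_PER_INVOKE = BYTES_PER_INVOKE_MACRO - BYTES_PER_INVOKE_DIRECT

def bloat_bytes(invoke_count):
    """Extra code bytes for a program containing invoke_count invokes."""
    return invoke_count * EXTRA_PER_INVOKE

def invokes_for_bloat(total_extra_bytes):
    """How many invokes a given amount of extra code corresponds to."""
    return total_extra_bytes // EXTRA_PER_INVOKE

print(EXTRA_PER_INVOKE)
print(invokes_for_bloat(20 * 1024))  # assumes "20k" means 20 * 1024 bytes
```

Under these assumptions, a 20k difference corresponds to a program with a few hundred invokes.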

vitsoft wrote: ↑05 Sep 2018 18:52 Yes, it is not much effective, but why do you think that won't work?

Oh that would work, but who would want to use not much effective spaghetti code in their projects?

PS -- It would fail to work if the fifth argument was an xmm register because you can't push xmm registers.

Posted: **06 Sep 2018 16:47**

Terabytes of spaghetti code are used every day worldwide, see https://github.com/dominictarr/your-web-app-is-bloated.
Your implementation is undoubtedly shorter.

BTW notice what Microsoft says regarding float passing:
For vararg or unprototyped functions, any floating point values must be duplicated in the corresponding general-purpose register.

My implementation trades code size for programmer's convenience:

I can use any GPR for argument passing, including RCX,RDX etc. This is compatible with almost all other macros in €ASM libraries.
Float values are already copied to both XMM and GPR when WinFunction is called, as MS wants.
If I decided to use the same convention for my internal functions (only called by me), all arguments are already consistently available on the stack and I don't have to store them to shadow space in my function.
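The trade-off in the three points above can be modeled as sets of registers loaded per argument slot. This is a hedged sketch: the mode names `"prototyped"`, `"vararg"` and `"macro"` and the function `registers_loaded` are hypothetical labels, where `"macro"` stands for the expansion quoted earlier in this thread (every slot copied to both register files) and `"vararg"` for the Microsoft rule cited above.

```python
GP_REGS = ("RCX", "RDX", "R8", "R9")
XMM_REGS = ("XMM0", "XMM1", "XMM2", "XMM3")

def registers_loaded(kinds, mode):
    """Return, for each of the first four slots, the set of registers loaded.

    kinds: sequence of "int" or "float"; mode: "prototyped", "vararg" or "macro".
    """
    result = []
    for slot, kind in enumerate(kinds[:4]):
        regs = set()
        # GP register: always for integers, for floats only when duplicated.
        if kind == "int" or mode in ("vararg", "macro"):
            regs.add(GP_REGS[slot])
        # XMM register: always for floats, for integers only in the macro.
        if kind == "float" or mode == "macro":
            regs.add(XMM_REGS[slot])
        result.append(regs)
    return result

kinds = ["int", "float", "int", "float"]
for mode in ("prototyped", "vararg", "macro"):
    print(mode, registers_loaded(kinds, mode))
```

In this model the macro's register set is a superset of what either the prototyped or the vararg case needs, which is the sense in which it is robust, and the extra loads are the cost in code size.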

BTW it is not €ASM's implementation. It's not embedded in euroasm.exe and could be written in other assemblers as well.
And vice versa, you can take MASM or NASM version of that same exact thing and use it in €ASM macro.
This will require a flock of %IFs and %ENDIFs, but I'm sure you'll manage to write an implementation optimized by criteria of your own choice.
Take fastcall.htm and winabi.htm only as a starter-pack and inspiration.

If only SW companies hired more people like you, ar18, we all could have twice as much free space left on our disks...

Posted: **07 Sep 2018 15:24**

vitsoft wrote: ↑06 Sep 2018 16:47 Terabytes of spaghetti code are used every day worldwide, see https://github.com/dominictarr/your-web-app-is-bloated.
Your implementation is undoubtedly shorter.

BTW notice what Microsoft says regarding float passing:
For vararg or unprototyped functions, any floating point values must be duplicated in the corresponding general-purpose register.

...by the called function, and not the caller. I told you to get a debugger and see how the rest of the world does it, so why can't you do that? You have no idea what you are missing.

You know that the first four arguments of INVOKE are supposed to go somewhere, but you just can't seem to figure out where, so you blast everything -- the GP registers, the stack, the XMM registers -- with the same values! I've watched you struggle with the WinABI, and you still can't get it right, and I'm thinking, "This isn't the same guy who wrote €ASM, is it?" The coding style is much different. The degree of proficiency is not the same. I have seen the kind of programming approach you are taking with the WinABI before, but only from beginner programmers, and it is called "shotgunning code". It's what beginners do when they can't figure out how to do something or can't understand the spec they are trying to conform to. I skimmed over your source code for €ASM and it looked decent as far as I could tell, but when I look at the approach you are taking with the WinABI here, it seems there are two very different people writing for €ASM. I think you either bought the code for €ASM from someone else, were given the code by a friend or co-worker, or (God forbid) stole the code from someone else -- you tell us.

You have the link to the Win64 ABI on Microsoft's website, and you have a debugger (I would recommend x64dbg) in which you can look at the compiled output of any ABI-compliant C compiler (like GCC) to see how 64-bit programming is done professionally, yet you refuse to do any of that. For that reason (and the shotgunning of code), I don't believe you will ever get the Win64 ABI correct (or at least keep it from becoming the huge bloated monster you have made of it so far).

Posted: **08 Sep 2018 13:48**

...by the called function, and not the caller.
It is not explicitly specified there, but both the preceding and the following sentences concern the caller and not the callee.
And here MS explicitly says
both the integer register and the floating-point register will contain the float value in case the callee expects the value in the integer registers.
Of course, I may have misunderstood this, I'm not a native English speaker. Nevertheless, copying float values to GPR cannot do any harm, except for bloating the code.

Regarding coding style: yes, I purchased the 64-bit version of Windows only two months ago (and I've spent most of that time rearranging my old apps and services to work on a 64-bit system). The only 64-bit programs I have written since are those three Sample projects (I used x64dbg.exe in their development).

I have not yet used any ABI function which accepts float arguments in a real app, so I would appreciate it if you, ar18, would try to prune that bloated WinAPI macro of mine (which makes you so furious) and test your optimized version in a real program, such as your Paint.
Both versions of the €ASM fastcall macrolibrary could coexist in the maclib subdirectory: your slim+neat and my fat+robust. Sometime I'd like to add a System V ABI version there, too.
