Autoincrement/decrement Addressing Modes


Post by AndreyDmitriev » 20 Jan 2026 16:37

Hi Pavel,

Thank you very much for the new €Assembler version — it works like a charm!

I have one suggestion that I would like to discuss.
A long time ago, I worked with the MACRO‑11 assembler on a DEC PDP‑11 computer.
Manual:
https://bitsavers.org/pdf/dec/pdp11/rst ... _Oct87.pdf

In section 5‑2 (page 46), the addressing modes are listed, and one thing I am missing in Intel‑style assembler syntax is the **Autoincrement Mode**.
You can see how it is used on page 3‑11 (page 37), Figure 3‑1:
1$: CLR (R0)+
    CMP R0, #IMPURT; Test if the end reached
    BNE 1$
Here, `CLR (R0)+` clears the word at the address held in `R0`, then increments `R0` (by 2, because the operand is a word), so `R0` points to the next word on the next iteration. No dedicated increment instruction is needed: the increment is combined with `CLR`. As far as I know, this is supported directly by the PDP‑11 CPU.

Table 5‑3 shows that this works with any operation, such as `OPR R, (R)+`.
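
The post-increment semantics can be modelled in a few lines of Python. This is only an illustrative sketch: the byte-array memory and the names `clr_autoincrement` and `clear_region` are assumptions made for the example and do not come from the MACRO‑11 manual. It assumes the region length is a positive multiple of the operand width, just as the assembly loop does.

```python
def clr_autoincrement(memory: bytearray, r0: int, width: int = 2) -> int:
    """Model `CLR (R0)+`: clear `width` bytes at address r0, then return r0 + width."""
    memory[r0:r0 + width] = bytes(width)  # CLR: store zeros at the pointed-to operand
    return r0 + width                     # autoincrement by the operand size


def clear_region(memory: bytearray, start: int, end: int) -> int:
    """Model the three-instruction loop: clear words from start up to (not including) end."""
    r0 = start
    while True:
        r0 = clr_autoincrement(memory, r0)  # 1$: CLR (R0)+
        if r0 == end:                       #     CMP R0, #end
            break                           #     BNE 1$ falls through when equal
    return r0
```

Calling `clear_region(mem, 2, 6)` on an 8-byte buffer clears bytes 2 to 5 and leaves the pointer at 6, which is what the single `CLR (R0)+` achieves without a separate add instruction.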

Perhaps you could add similar “syntax sugar” to your assembler as well?
For example, consider this code, which adds two byte arrays `a` and `b` of length `n` and writes the result to `c`:
EUROASM CPU=X64, SIMD=AVX2, AMD=ENABLED
AsmDLL64 PROGRAM FORMAT=DLL, MODEL=FLAT, WIDTH=64

EXPORT add_bytes_avx2
; void add_bytes_avx2(const uint8_t* a,
;                     const uint8_t* b,
;                     uint8_t* c,
;                     size_t n);
add_bytes_avx2 PROC
    test    r9, r9
    jz      done

    ; number of full 32-byte blocks
    mov     r10, r9
    shr     r10, 5            ; r10 = n / 32
    jz      tail

avx_loop:
    vmovdqu ymm0, [rcx]
    vmovdqu ymm1, [rdx]
    vpaddb  ymm0, ymm0, ymm1
    vmovdqu [r8], ymm0

    add     rcx, 32 ; < next 32 bytes
    add     rdx, 32
    add     r8,  32
    dec     r10
    jnz     avx_loop

tail:
    ; remaining bytes
    and     r9, 31
    jz      done

tail_loop:
    mov     al, [rcx]
    add     al, [rdx]
    mov     [r8], al

    inc     rcx, rdx, r8
    dec     r9
    jnz     tail_loop

done:
    vzeroupper ; important for ABI
    ret
ENDP add_bytes_avx2

ENDPROGRAM AsmDLL64

My suggestion is to add post‑increment support, for example:

    vmovdqu ymm0, [rcx]
    add     rcx, 32
    ; will be turned to
    vmovdqu ymm0, [rcx]+ ; < auto increment
or
    mov     al, [rcx]
    inc     rcx
    ; will be written as
    mov     al, [rcx]+ ; < auto increment

With this feature, the full code above could be shorter and more elegant:

EUROASM CPU=X64, SIMD=AVX2, AMD=ENABLED
AsmDLL64 PROGRAM FORMAT=DLL, MODEL=FLAT, WIDTH=64

EXPORT add_bytes_avx2
; void add_bytes_avx2(const uint8_t* a,
;                     const uint8_t* b,
;                     uint8_t* c,
;                     size_t n);
add_bytes_avx2 PROC
    test    r9, r9
    jz      done

    ; number of full 32-byte blocks
    mov     r10, r9
    shr     r10, 5            ; r10 = n / 32
    jz      tail

avx_loop:
    vmovdqu ymm0, [rcx]+
    vmovdqu ymm1, [rdx]+
    vpaddb  ymm0, ymm0, ymm1
    vmovdqu [r8]+, ymm0

    dec     r10
    jnz     avx_loop

tail:
    ; remaining bytes
    and     r9, 31
    jz      done

tail_loop:
    mov     al, [rcx]+
    add     al, [rdx]+
    mov     [r8]+, al

    dec     r9
    jnz     tail_loop

done:
    vzeroupper ; important for ABI
    ret
ENDP add_bytes_avx2

ENDPROGRAM AsmDLL64

This could also be extended to post‑ and pre‑increments and decrements:

mov  al, [rcx]+ == mov  al, [rcx] & inc rcx
mov  al, [rcx]- == mov  al, [rcx] & dec rcx
mov  al, +[rcx] == inc rcx & mov  al, [rcx]
mov  al, -[rcx] == dec rcx & mov  al, [rcx]

The amount of increment or decrement would be inferred from the data size and context: 1 for byte, 2 for word, and so on, up to 64 for AVX‑512.
This could reduce code size and improve readability.
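
One way to experiment with the idea without touching the assembler is a small source preprocessor. The Python sketch below is an assumption-laden illustration, not part of €Assembler: the function names are hypothetical, only a plain `[reg]` operand is recognised, the step is inferred from the first register found among the other operands, labels on the same line are not handled, and trailing comments are dropped from expanded lines. It always emits `add`/`sub` with an explicit step instead of `inc`/`dec`.

```python
import re
from typing import List, Optional

# Register-name patterns and the operand width in bytes that each implies.
_REGISTER_SIZES = [
    (re.compile(r"zmm\d+"), 64),
    (re.compile(r"ymm\d+"), 32),
    (re.compile(r"xmm\d+"), 16),
    (re.compile(r"r\d+b|[a-d][lh]|sil|dil|bpl|spl"), 1),
    (re.compile(r"r\d+w|[a-d]x|si|di|bp|sp"), 2),
    (re.compile(r"r\d+d|e(?:[a-d]x|si|di|bp|sp)"), 4),
    (re.compile(r"r\d+|r(?:[a-d]x|si|di|bp|sp)"), 8),
]

# Proposed (hypothetical) markers: [reg]+  [reg]-  +[reg]  -[reg]
_AUTO = re.compile(r"(?P<pre>[+-])?\[(?P<reg>\w+)\](?P<post>[+-])?")


def register_size(name: str) -> Optional[int]:
    """Return the width in bytes implied by a register name, or None if unknown."""
    for pattern, size in _REGISTER_SIZES:
        if pattern.fullmatch(name.lower()):
            return size
    return None


def expand(line: str) -> List[str]:
    """Rewrite one source line using the proposed markers into plain instructions."""
    code = line.partition(";")[0]            # ignore any trailing comment
    m = _AUTO.search(code)
    if m is None or (m["pre"] is None and m["post"] is None):
        return [line]                        # nothing to expand, keep the line as-is
    if m["pre"] and m["post"]:
        raise ValueError("both prefix and postfix markers on one operand")

    indent = code[: len(code) - len(code.lstrip())]
    plain = (code[: m.start()] + "[" + m["reg"] + "]" + code[m.end():]).rstrip()

    # Infer the step from the first register among the remaining operands.
    parts = (code[: m.start()] + code[m.end():]).split(maxsplit=1)
    others = parts[1].split(",") if len(parts) > 1 else []
    sizes = [s for s in (register_size(op.strip()) for op in others) if s]
    if not sizes:
        raise ValueError("cannot infer operand size: " + line.strip())

    sign = m["pre"] or m["post"]
    op = "add" if sign == "+" else "sub"
    adjust = f"{indent}{op} {m['reg']}, {sizes[0]}"
    return [adjust, plain] if m["pre"] else [plain, adjust]
```

Feeding the loop body through `expand` line by line yields the original `vmovdqu`/`add` pairs. Note that the emitted `add`/`sub` modify the flags, so an instruction such as `add al, [rdx]+` followed by a conditional jump would test the pointer update, not the byte addition; emitting `lea` for the update would avoid that.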

Just an idea — what do you think?

Re: Autoincrement/decrement Addressing Modes

Post by vitsoft » 20 Jan 2026 19:38

AndreyDmitriev wrote:
Table 5‑3 shows that this works with any operation, such as `OPR R, (R)+`.
Perhaps you could add similar “syntax sugar” to your assembler as well?

I'm afraid postfix/prefix autoincrement conflicts with the syntax of assemblers. The plus and minus signs are reserved for addition and subtraction.
The only case where it magically works is the string instructions such as LODS, which have autoincrementation hardwired in the x86-64 CPU.

Autoincrementation can, however, be easily programmed at the macro level.
vmovdqu ymm0, [rcx]
add rcx, 32
; will be turned to
vmovdqu ymm0, [rcx]+ ; < auto increment
will change to
Load32Bytes %MACRO DestYMM, SourceAddr
    VMOVDQU %DestYMM, [%SourceAddr]
    ADD %SourceAddr, 32
%ENDMACRO Load32Bytes

Store32Bytes %MACRO DestAddr, SourceYMM
    VMOVDQU [%DestAddr], %SourceYMM
    ADD %DestAddr, 32
%ENDMACRO Store32Bytes
and then instead of
avx_loop:
vmovdqu ymm0, [rcx]+
vmovdqu ymm1, [rdx]+
vpaddb ymm0, ymm0, ymm1
vmovdqu [r8]+, ymm0
you will write
    Load32Bytes ymm0, rcx
    Load32Bytes ymm1, rdx
    vpaddb ymm0, ymm0, ymm1
    Store32Bytes r8, ymm0
which is almost as elegant and does not conflict with assembler syntax.

Re: Autoincrement/decrement Addressing Modes

Post by AndreyDmitriev » 22 Jan 2026 10:48

vitsoft wrote: 20 Jan 2026 19:38 instead of
avx_loop:
vmovdqu ymm0, [rcx]+
vmovdqu ymm1, [rdx]+
vpaddb ymm0, ymm0, ymm1
vmovdqu [r8]+, ymm0

you will write
    Load32Bytes ymm0, rcx
    Load32Bytes ymm1, rdx
    vpaddb ymm0, ymm0, ymm1
    Store32Bytes r8, ymm0
which is almost as elegant and does not conflict with assembler syntax.
No, macros are obvious, but unfortunately they are not very elegant, because there are many possible combinations and they are not generic enough. For example, if I change the unaligned move VMOVDQU to the aligned VMOVDQA, I will need another pair of macros. Or, if I pass the instruction as a parameter as well, I will end up with an additional separator and more “information noise.”
In general, you are already conflicting with canonical assembler syntax when you introduce instructions like inc rcx, rdx, r8 (the same applies to dec, push, and pop, where multiple registers are allowed — and I actually love this). This will cause an A2008 syntax error in MASM, at least.
So I’m fine with broken syntax; this is just an enhancement taken from the MACRO-11 assembler, where the plus and minus signs are also reserved for addition and subtraction, but when used as (Ri)+ or -(Ri) they mean autoincrement or autodecrement.
See:
https://www.teach.cs.toronto.edu/~ajr/258/pdp11.pdf
Maybe I’ll try to add this myself with the help of AI.
