The name of the software is EuroAssembler.
Please notice that there is no space between Euro and Assembler.
The name is often abbreviated as €ASM.
In a 7-bit ASCII environment it may also be referred to as EUROASM,
and in some internal identifiers it's just ea.
The Euro character € is available on a Windows keyboard as Alt+0128 or as the HTML entity &euro;.
Some features that are rarely seen in other assemblers:
euroasm.exe source1.asm, source2.asm, more*.asm
Labels, EQUated symbols, structures may be referred (used) before they are defined (though this is not recommended).
|0000:               | ; Referring structured memory variable Today which will be defined later.
|0000:C706[1000]E007 | MOV [Today.Year],2016 ; Put immediate value to WORD memory variable.
|0006:C606[1200]0C   | MOV [Today.Month],12 ; Put immediate value to BYTE memory variable.
|000B:C606[1300]1F   | MOV [Today.Day],31 ; Put immediate value to BYTE memory variable.
|0010:               |
|0010:00000000       |Today DS Datum ; Definition of a structured symbol whose structure will be declared later.
|0014:               |
|[Datum]             |Datum STRUC ; Declaration of structure Datum.
|0000:....           |.Year DW WORD
|0002:..             |.Month DB BYTE
|0003:..             |.Day DB BYTE
|0004:               | ENDSTRUC Datum
<HTML tags>
are treated as comments.
This allows keeping the assembly source close to its rich-text documentation.
HEAD and ENDHEAD.
The includable interface division HEAD..ENDHEAD of a program module does not need to be kept in a separate header file (such as *.h files in C-language).
PROC..ENDPROC
€ASM supports semiinline procedures PROC1..ENDPROC1, which are expanded from a macro only once, during its first invocation.
IMPORT RegCloseKey, LIB="user32.dll".
The import libraries are not required by the €ASM linker (though they are supported).
PROGRAM..ENDPROGRAM produces its own object or executable file.
A multi-module project source could be kept in one big file, if this is preferred by the author.
euroasm source.asm.
HelloL32.x and HelloL64.x. Both executable files will be created from this source file hello.asm with a single command euroasm hello.asm.
We may run them in Linux or in its Windows emulator WSL:
          EUROASM CPU=x64
HelloL32  PROGRAM Format=ELFX, Entry=Main:, Width=32 ; HelloL32.x works in 32-bit Linux.
Main:     MOV EAX,4              ; Kernel operation sys_write=4.
          MOV EBX,1              ; File descriptor of the standard output (console).
          MOV ECX,Message        ; Address of the message.
          MOV EDX,SIZE# Message  ; Size of the message.
          INT 0x80               ; Invoke the kernel.
          MOV EAX,1              ; Kernel operation sys_exit=1.
          XOR EBX,EBX            ; Returned errorlevel=0.
          INT 0x80               ; Invoke the kernel.
Message:  DB "Hello, world of %^Width bits in Linux!",10
          ENDPROGRAM HelloL32

HelloL64  PROGRAM Format=ELFX, Entry=Main:, Width=64 ; HelloL64.x works in 64-bit Linux.
Main:     MOV RAX,1              ; Kernel operation sys_write=1.
          MOV RDI,1              ; File descriptor of the standard output (console).
          LEA RSI,[Message]      ; Address of the message.
          MOV RDX,SIZE# Message  ; Size of the message.
          SYSCALL                ; Invoke the kernel.
          MOV RAX,60             ; Kernel operation sys_exit=60.
          XOR EDI,EDI            ; Returned errorlevel=0.
          SYSCALL                ; Invoke the kernel.
Message:  DB "Hello, world of %^Width bits in Linux!",10
          ENDPROGRAM HelloL64
We could hide most of the assembly instructions in macroinstructions from the libraries linapi.htm (32-bit) and linabi.htm (64-bit), and use literals (=B "Hello...") to define the printed strings:
          EUROASM CPU=x64
HelloL32  PROGRAM Format=ELFX, Entry=Main:, Width=32 ; HelloL32.x works in 32-bit Linux.
          INCLUDE linapi.htm ; Define 32-bit macros StdOutput and TerminateProgram.
Main:     StdOutput =B "Hello, world of %^Width bits in Linux!", Eol=Yes
          TerminateProgram Errorlevel=0
          ENDPROGRAM HelloL32

          %DROPMACRO * ; Forget macros defined in "linapi.htm".

HelloL64  PROGRAM Format=ELFX, Entry=Main:, Width=64 ; HelloL64.x works in 64-bit Linux.
          INCLUDE linabi.htm ; Define 64-bit macros StdOutput and TerminateProgram.
Main:     StdOutput =B "Hello, world of %^Width bits in Linux!", Eol=Yes
          TerminateProgram Errorlevel=0
          ENDPROGRAM HelloL64
A similar example for MS-Windows:
          EUROASM CPU=x64, SIMD=Yes
HelloW32  PROGRAM Format=PE, Entry=Main:, Width=32 ; HelloW32.exe works in 32-bit and 64-bit Windows.
          INCLUDE winapi.htm ; Define 32-bit macros WinAPI and TerminateProgram.
Main:     WinAPI MessageBox,0,="Hello, world of %^Width bits in Windows!",="Title",0, Lib=user32.dll
          TerminateProgram Errorlevel=0
          ENDPROGRAM HelloW32

          %DROPMACRO * ; Forget macros defined in "winapi.htm".

HelloW64  PROGRAM Format=PE, Entry=Main:, Width=64 ; HelloW64.exe works in 64-bit Windows.
          INCLUDE winabi.htm ; Define 64-bit macros WinABI and TerminateProgram.
Main:     WinABI MessageBox,0,="Hello, world of %^Width bits in Windows!",="Title",0, Lib=user32.dll
          TerminateProgram Errorlevel=0
          ENDPROGRAM HelloW64
This manual covers the programmer's guide, examples, language references and implementation remarks. Different styles are used to identify those elements.
The background color of the element in the web page helps to distinguish between
Dashed hyperlinks refer to another paragraph within the same page.
Underlined hyperlinks navigate to a different HTML page of this site.
Underlined hyperlinks with the Link icon navigate to the signpost page Links with external references.
Underlined hyperlinks with the Exit icon navigate outside the EuroAssembler website; you may want to open them in a new tab or window.
The contents of this manual are organized in chapters with a tree structure.
Definitions of new terms are written in blue bold italics.
Implementation details, discussions and less important personal remarks are printed with smaller font.
File names are emphasized in quotes.
Characters used in text have a white background.
Short pieces of source code are displayed in a monospace font, black on yellow.
; Longer examples of source code in this manual are presented in a box.
; They may have more lines.
; Erroneous, negative or wrong examples are struck through.
The assembly programming language (ASM) gives programmers the
maximal possible control of emitted machine code.
Of course, having to write every instruction for the Central Processing Unit (CPU)
by hand is very tedious. That is why subprograms were invented:
procedures, functions and macroinstructions.
A subprogram is like a black box with a documented purpose, input and output.
The main difference between our own ASM subprogram and a HLL function is that when it
doesn't work as expected, we can easily trace down the mistake, stepping on each
machine instruction in a debugger, and that there is no-one else to blame but us.
ASM subprograms can do the same job as statements of high-level languages (HLL) or invocations of an operating system (OS) application programming interface (API). The EuroAssembler macrolanguage allows macros tailored to the problem to be prepared in advance and then used to solve a task. Such macros are similar to functions from OS or HLL libraries, and they allow programs to be developed in ASM almost as rapidly as in HLL.
The advantage of mastering the assembly language manifests when we are challenged with a third-party program whose source code is not available, or when some badly written program throws an exception and exits. DrWatson, debuggers or disassemblers can only show the alien code converted to assembly instructions. People who have never met ASM will hardly know how to interpret the disassembled code, while an ASM programmer will feel like a fish in its natural environment.
The main disadvantage of assemblers is the lack of standardized libraries, which unify programming in HLL such as C or Java. On the one hand, many ASM programmers build their own, which makes their sources not portable unless the necessary libraries are shipped together with the source. On the other hand, making a library with our own functions is the best way to remember all the function and parameter names, and to learn a lot about computers and operating systems.
The EuroAssembler package euroasm.zip contains several macrolibraries for a quick start and for inspiration.
Phase | Used tool |
---|---|
design-time | imagination |
write-time | text editor |
assembly-time | assembler |
combine-time | linker |
link-time | linker |
load-time | operating system loader |
bind-time | operating system loader |
run-time | processor |
Dissatisfaction with available tools is one of the reasons why some programmers want to invent their own language.
And last but not least, creating an assembler is a very interesting challenge. An incomplete list of assemblers and other tools that I had the pleasure to come into contact with is presented at the links [Assemblers] and [UsefulTools].
The first assembler I met when I started to flirt with the assembly language in the early 80s was IBM's FDOS for S360 mainframe computers [HLASM]. That was a very sophisticated product with advanced features such as sections, keyword operands and literals, and with a macro language which was able to manipulate not only the generated machine statements, but also its own macro variables and their names.
I missed many of those features in assemblers for the Intel architecture. Some of them brought new ideas but none seemed ideal for me. [NASM] ver.0.99 was quite good, in fact the first bootstrap version of €ASM was written in it, but I was irritated when it wasn't able to automatically select SHORT or NEAR distance jumps and had other design flaws, such as not expanding preprocessing variables in quoted strings.
I always wondered why constant EQU symbols had to be declared before the first use. Why I can't declare a macro in a macro. How to solve situations when file A includes files B and C, and file C also includes file B, duplicating its definitions.
I don't like a language which is cluttered up with free space. In HLASM a space in the operand list signalised that everything up to the end of the punched card should be ignored. €ASM isn't that strict in this horror vacui, in fact white spaces may be put anywhere between language elements to improve readability. However, spaces are almost never required by syntax.
€ASM does not use English word modifiers such as SHORT, NEAR, DWORD PTR, NOSPLIT, which are identified by their value only. Instead, it prefers the Name=Value paradigm with keyword instruction modifiers such as DATA=QWORD,IMM=BYTE,MASK=K5,ZEROING=ON, which remove ambiguity and replace ugly decorators proposed in the Intel documentation.
Permission to use EuroAssembler is granted to everybody who obeys this Licence.
There are no restrictions on purpose and scope of applications created with this tool.
It may be used in private, educational or commercial environments freely.
EuroAssembler is provided free of charge as-is, without any warranty guaranteed by its author.
This software may be redistributed in unmodified zipped form, as downloaded from EuroAssembler.eu. No fee may be requested for the right to use this software.
You may disseminate euroasm.zip on other websites, repositories, FTP archives, compact disks and similar media. Please be sure to always distribute the latest available €ASM version.
Source code of EuroAssembler was written by Pavel Šrubař, AKA vitsoft, and it is copyrighted as such.
Macrolibraries and sample projects are released as public domain and they may be modified freely.
I cannot recommend modifying the libraries, though, because they may be changed in future releases of €ASM and your enhancements would be overwritten. Create your own files with vacant names instead.
You may modify €ASM source code for the sole purpose of fixing a bug or enhancing it with a new function, but you may not distribute such modified software. It may only be used by you on the same computer where it was edited, reassembled and linked.
EuroAssembler is not open source. I don't want to fork €ASM development into a bazaar of incompatible versions, where each branch provides different enhancements. Please propose your modifications to the author or to the €ASM forum instead, so they might be incorporated in future releases of EuroAssembler.
The distribution file euroasm.zip contains folders and files as listed on the Sitemap page.
The modification time of all files is equally set to the nominal release time.
All file names are in lower case (Linux convention) and in 8.3 size (DOS convention),
so any old DOS utility can be used for unpacking.
You may need to run the console as an administrator
for an installation on a secure version of MS-Windows.
Choose and create the EuroAssembler home directory, for instance C:\euroasm, change to it and unzip the downloaded euroasm.zip.
Move or copy the main executable euroasm.exe to some folder from the system %PATH%, so it might be launched as euroasm from anywhere. When you run it without parameters for the first time, it will create the global configuration euroasm.ini, which you should tailor now with a plain-text editor.
You may want to replace the relative IncludePath= and LinkPath= in the [EUROASM] section with an absolute path identifying the €ASM home directory.
In the [PROGRAM] section you can specify your preferred target format, for instance Format=PE, Subsystem=CON and Width=32. You could also replace IconFile="euroasm.ico" and copy your preferred personal icon to the objlib subfolder.
For the (not-recommended) bare-bone minimal installation
you are now done and you could erase the whole home directory now.
The executable euroasm.exe
itself does not need any other supporting files, environment or registry modification.
If you prefer to read this documentation in another language, rename the default English version of this manual eadoc\index.htm to eadoc\man_eng.htm and then rename the chosen available translation, e.g. eadoc\man_cze.htm, to eadoc\index.htm.
For a development installation go to the home directory and unzip the developer scripts from the subarchive generate.zip.
You will also need a web server and PHP (version 5.3 or higher) installed on your localhost.
Most EuroAssembler files are in HTML format, so you may want to incorporate €ASM into your local web server, if you run one on your localhost computer.
In my Apache installation I added the following paragraph to httpd.conf or apache2.conf:
<VirtualHost *:80>
  DocumentRoot C:/euroasm/
  ServerName euroasm.localhost
</VirtualHost>
I appended the statement 127.0.0.1 euroasm.localhost to the file %SystemRoot%/SYSTEM32/drivers/etc/hosts. Now I can write euroasm.localhost into the address line of my internet browser and explore the €ASM documentation and other files locally.
Computer programs exchange information with users through various channels: standard streams, command-line parameters, environment variables, errorlevel value, disk files and devices.
The basic form of communication between programs and a human user is character streams, which are by default directed to the console terminal from which the program was launched. They may also be redirected to a disk file or device driver with the command-line operators >, >>, <, |.
Standard input is not used in €ASM.
Standard output prints warnings, errors and informative messages produced by €ASM.
Standard error output is not used in €ASM.
Command-line parameters are not used.
€ASM assumes that everything on the command line is the name(s) of the main source file(s) to be assembled.
All options controlling the assembly & link process are defined in
the configuration files euroasm.ini
or directly in the source file itself.
In fact there are semi-undocumented EUROASM options which are recognized on the command line; however, the preferred place for EUROASM options is the configuration file or the source file. Command-line options are employed in test examples to suppress some variable informative messages, and their use should be kept to a minimum.
Environment variables are not used in €ASM.
Environment variables may be incorporated into the source at assembly-time using the pseudoinstruction %SETE. Of course, it is also possible to read environment variables at run-time with the corresponding API call, such as GetEnvironmentVariable().
€ASM does not use any other devices (I/O ports, printers, sound cards, graphic adapters, etc.) at assembly-time.
Important information detected by EuroAssembler during its activity is published in the form of short text messages. They are written on standard output (console window) and to the listing file.
Each message is identified by a combination of a capital letter followed by four decimal digits. The complete text of messages is defined in source file msg.htm.
The letter prefix and the first digit (0..9) declare message severity.
The final errorlevel value, which euroasm.exe terminates with, is equal to the highest message severity encountered during the assembly session.
Type of message | Prefix | Identifier range | Severity | Search marker |
---|---|---|---|---|
Informative | I | I0000..I0999 | 0 | |# |
Debugging | D | D1000..D1999 | 1 | |# |
Warning | W | W2000..W3999 | 2..3 | |## |
Nonsuppressible warning | W | W4000..W4999 | 4 | |## |
User-defined error | U | U5000..U5999 | 5 | |### |
Error | E | E6000..E8999 | 6..8 | |### |
Fatal | F | F9000..F9999 | 9 | |### |
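The relation between message identifiers, severity and the final errorlevel described above can be modelled in a few lines. This is a minimal Python sketch of the documented rule, not €ASM code; the function names are hypothetical, and the identifiers other than E6601 in the usage note are made up for illustration.

```python
def severity(message_id):
    """Return the severity (0..9) of an identifier such as 'E6601'.

    The severity is the first of the four decimal digits which follow
    the letter prefix, as listed in the table of message types.
    """
    prefix, digits = message_id[:1], message_id[1:]
    valid = (prefix in tuple("IDWUEF") and len(digits) == 4
             and digits.isascii() and digits.isdigit())
    if not valid:
        raise ValueError("not a message identifier: %r" % (message_id,))
    return int(digits[0])


def final_errorlevel(message_ids):
    """Highest severity among the issued messages (0 when there are none)."""
    return max((severity(m) for m in message_ids), default=0)
```

For example, a session which issued I0010, W2345 and E6601 would, under this model, end with errorlevel 6, and a session without any message with errorlevel 0.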
EuroAssembler is verbose by default, but it may be totally silenced when launched with the parameter NOWARN=0000..0999, provided that no error occurred in the source.
Warnings usually do not prevent the compiled target from execution; they are meant as a friendly reminder that the programmer might have forgotten something or made a typo.
Messages with a severity level ranging from 5 to 8 indicate that some statements were not compiled due to an error. Although the target file may be valid, it will probably not work as intended.
Fatal errors indicate an interaction failure with the operating system, resource exhaustion, file errors or internal €ASM errors. The target and listing file might not have been written at all.
Informative, debugging and warning messages in the range I0000..W3999 can be suppressed with EUROASM option NOWARN=, but this ostrich-like policy is not a good idea. It's always better to fix the root cause of the message. If you intend to publish your code, it should always assemble with an errorlevel 0.
A typical message consists of its identifier followed by the actual tailored message text. When it is printed on standard output, the text is accompanied by a position indicator in the form of a quoted file name followed by a physical line number in curly brackets, for instance
E6601 Symbol "UnknownSym" mentioned at "t1646.htm"{71} was not found. "t1646.htm"{71}
▲▲▲▲▲                                                                 ▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲
Identifier                                                            position indicator
Usually there is just one position indicator per message, but when the error was discovered in the macro expansion phase, another indicator is added which determines the line in the macro library. In case of a macro expanded in another macro, position indicators will be further chained.
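A tool which post-processes the console output might split such a message into its parts. The following Python sketch is only an illustration of the format described above: the function name is hypothetical, and it assumes that chained position indicators are separated by whitespace at the end of the line, which this manual does not specify.

```python
import re

# Run of position indicators at the very end of the line: "file"{line} ...
_TRAILING = re.compile(r'((?:\s*"[^"]+"\{\d+\})+)\s*$')
_INDICATOR = re.compile(r'"([^"]+)"\{(\d+)\}')


def parse_console_message(line):
    """Split a console message into (identifier, text, positions)."""
    identifier, _, rest = line.strip().partition(" ")
    positions = []
    tail = _TRAILING.search(rest)
    if tail:
        positions = [(name, int(number))
                     for name, number in _INDICATOR.findall(tail.group(1))]
        rest = rest[:tail.start()]      # drop the indicators from the text
    return identifier, rest.strip(), positions
```

Applied to the E6601 example above, the sketch returns the identifier E6601, the text up to "was not found." and a single position ("t1646.htm", 71). A quoted file name inside the message text is left alone, because only the indicators at the end of the line are collected.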
The messages printed to the listing file have a slightly different format. The position indicator is omitted, because they are inserted just below the source line which triggered the error:
|002B:        | MOV SI,UnknownSym: ; E6601 expected.
|### E6601 Symbol "UnknownSym" mentioned at "t1646.htm"{71} was not found.
▲▲▲▲ marker
The message text is prefixed with a search marker which helps to find messages in the listing.
So you can use the internal function Find/FindNext (Ctrl-F) of the editor or viewer used to examine the listing file.
As a matter of fact, €ASM syntax never uses multiple pound characters ##, so the search marker is unique in the listing and it helps to skip from one error or warning to the next.
You could also try the specialized €ASM listing viewer distributed as one of the sample projects.
Debugging messages D1??? produced by the pseudoinstruction %DISPLAY are published even when they are placed in false %IF branches or in blocks commented-out by %COMMENT..%ENDCOMMENT.
The listing file is created only during the final assembly pass, and informative messages are not printed to the listing at all, except for informative linker messages in the I056? range.
There are two kinds of input files which €ASM reads: configuration and source.
There are two kinds of output files which €ASM writes: object and listing.
The configuration file, which has the immovable (predetermined) name euroasm.ini, specifies default options for the assembler. €ASM queries two configuration files with identical name and structure:
A global configuration file is located in the same directory as the main executable (euroasm.exe) and it is processed once after €ASM has started. If the file does not exist, €ASM tries to create it with the factory-default contents.
The local configuration file is searched for in the same directory as the
actual source file. If more than one source is specified at the command-line,
local configuration files are read each time when the actual source gets processed.
The local euroasm.ini is not automatically created by €ASM; you may need to copy or clone the global file manually, and optionally erase unchanged or unused options from the local configuration file for better performance.
Example of a command line which assembles two sources: C:\PgmFiles\euroasm.exe Source1.asm D:\Temp\Source2.asm
EuroAssembler will try to read its configuration from three files: C:\PgmFiles\euroasm.ini, .\euroasm.ini, D:\Temp\euroasm.ini.
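The lookup rule from this example (one global file beside the executable, then one local file beside each source) can be written down as a short Python sketch. It only models the order of files described here; the function name is hypothetical and the code says nothing about how €ASM itself is implemented. The standard module ntpath is used so that Windows paths are handled the same way on any host system.

```python
import ntpath  # Windows path rules, usable on any host OS


def config_files(exe_path, sources):
    """List the euroasm.ini files consulted for one invocation, in order."""
    # Global configuration: same directory as the main executable.
    files = [ntpath.join(ntpath.dirname(exe_path), "euroasm.ini")]
    for source in sources:
        # Local configuration: same directory as the actual source file.
        source_dir = ntpath.dirname(source) or "."  # no path: current directory
        files.append(ntpath.join(source_dir, "euroasm.ini"))
    return files
```

Called with C:\PgmFiles\euroasm.exe and the sources Source1.asm and D:\Temp\Source2.asm, the sketch yields the same three files as listed in the example.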
The initial contents of the configuration file, which are built into euroasm.exe as factory defaults, are defined in objlib/euroasm.ini.
There are two sections in the file: [EUROASM] and [PROGRAM].
The former specifies parameters for €ASM itself, such as CPU generation, what information should go to the listing file, which warnings should be suppressed etc. The parameters from [EUROASM] section of the configuration file can be redefined later in the source with the EUROASM pseudoinstruction, where you will find detailed explanation for each one of the parameters.
[PROGRAM] section of configuration file specifies the default working parameters of program which is to be created by €ASM, for instance the memory model, format and name of the object file etc. These parameters can be modified further with the PROGRAM pseudoinstruction.
The configuration parameters order is not important.
Names of the parameters are case insensitive.
Parameters with a boolean value accept any of the predefined enumerated tokens such as ON, YES, TRUE, ENABLE, ENABLED as true and OFF, NO, FALSE, DISABLE, DISABLED as false.
They may also accept numeric expressions which are evaluated as boolean.
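How a reader of euroasm.ini might interpret such values can be sketched in Python. This is an illustration under stated assumptions, not the €ASM parser: the token lists are only those named above (the manual says "such as", so there may be more), the tokens are assumed to be case insensitive like the parameter names, and a numeric expression is reduced here to a plain decimal number which counts as true when it is nonzero.

```python
TRUE_TOKENS = {"ON", "YES", "TRUE", "ENABLE", "ENABLED"}
FALSE_TOKENS = {"OFF", "NO", "FALSE", "DISABLE", "DISABLED"}


def parse_boolean(value):
    """Interpret the value of a boolean parameter."""
    token = value.strip().upper()
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    # Assumption: only a plain decimal number is handled, nonzero means true.
    # Anything else raises ValueError.
    return int(token) != 0
```

With this model, Yes and ENABLED are true, off and 0 are false, and an unknown word is rejected instead of being silently treated as false.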
When you give away the source code of your programs written in EuroAssembler, you don't have to specify which command-line parameters were used to compile and link, because they can be declared in the source itself. A typical €ASM source program begins with a configuration pseudoinstruction, such as EUROASM AUTOALIGN=YES,CPU=PENTIUM, so it is easy to tell in which assembler the program was written.
As a developer of a program written in EuroAssembler, you shouldn't rely on users of your distributed source having the same contents of euroasm.ini as you have. Specify all important settings at the beginning of the published source. A local configuration file is convenient during the development phase, when sources in the same directory do not have to explicitly specify all EUROASM and PROGRAM parameters.
The EuroAssembler options and directives can be defined in the configuration files and in the source files (by the pseudoinstruction EUROASM). They have the following order of precedence in their processing:
The source file contains the instructions to be assembled; usually it is a plain-text file or an HTML file arranged for €ASM. The file name is provided as a command-line parameter of euroasm.exe. The source file may be identified with an absolute path in the filesystem, e.g. euroasm /user/home/euroasm/MyProject/MySource.asm, or with a relative or omitted path, which will be related to the current shell or command line path.
The structure and syntax of source text, which €ASM is able to assemble and link, is described further down in this document.
The main purpose of programming is to obtain the target file from the source code. The target file may be an object module or a library linkable to other files, or a binary file for special purposes, or an executable file.
The format of the output file is specified by the PROGRAM parameter FORMAT=. Their layouts were standardized by their creators many, many years ago. For more details about supported output formats see the chapter Program formats.
The final name of the target file is determined by the label used in the previously described pseudoinstruction PROGRAM, with the default extension of the program format appended. The target name is not necessarily derived from the source filename, as in many other assemblers.
For instance, if the source code file has the statement Hello PROGRAM FORMAT=COM, its output file will be created in the current directory with the name Hello.com, no matter what the source file is named.
The default target name can be changed by the PROGRAM parameter OUTFILE=.
If the OUTFILE= name is specified with a relative or omitted path, the current shell directory is assumed.
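The naming rule can be summarized by a small Python model. It is a sketch with a hypothetical function name; the extension table contains only the three formats whose default extension can be read off the examples in this manual (Hello.com, HelloW32.exe, HelloL32.x), other formats are left out rather than guessed.

```python
# Default extensions taken from the examples in this manual only.
DEFAULT_EXTENSION = {"COM": ".com", "PE": ".exe", "ELFX": ".x"}


def target_name(program_label, program_format, outfile=None):
    """Name of the target file produced by a PROGRAM..ENDPROGRAM block."""
    if outfile:
        return outfile                       # OUTFILE= overrides the default
    return program_label + DEFAULT_EXTENSION[program_format.upper()]
```

So the label Hello with FORMAT=COM gives Hello.com whatever the source file is called, while an explicit OUTFILE= is taken as it stands.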
A listing file is a plain text file with two columns where EuroAssembler logs its activity:
The name of the listing is the name of the source file with the extension .lst appended, and it is created in the source file directory.
The default listing filename and location may be changed with the EUROASM parameter LISTFILE=.
Let's create the source file Hello.asm with the following contents:
        EUROASM DUMP=ON,DUMPWIDTH=18,DUMPALL=YES
Hello   PROGRAM FORMAT=COM,LISTLITERALS=ON, \
        LISTMAP=OFF,LISTGLOBALS=OFF
        MOV DX,=B"Hello, world!$"
        MOV AH,9
        INT 21h
        RET
        ENDPROGRAM Hello
Submitting the file to EuroAssembler with the command euroasm Hello.asm will create the listing file Hello.asm.lst.
The width of the dump column expressed in characters can be specified with the EUROASM option DUMPWIDTH=. Other EUROASM options which control the dump column are the boolean DUMPALL= and DUMP=OFF, which can suppress the dump column completely.
|<-Dump column-->|<--Source column--------
<--DumpWidth=18-->
|                | EUROASM DUMP=ON,DUMPWIDTH=18,DUMPALL=YES
|                |Hello PROGRAM FORMAT=COM,LISTLITERALS=ON, \
|                | LISTMAP=OFF,LISTGLOBALS=OFF
|[COM]           ::::Section changed.
|0100:BA[0801]   | MOV DX,=B"Hello, world!$"
|0103:B409       | MOV AH,9
|0105:CD21       | INT 21h
|0107:C3         | RET
|[@LT1]          ====ListLiterals in section [@LT1].
|0108:48656C6C6F =B"Hello, world!$"
|010D:2C20776F72 ----Dumping all. (because of DUMPALL=YES)
|0112:6C64212400 ----Dumping all.
|                | ENDPROGRAM Hello
                 ▲ column separator
The dump column on the left side always starts with the machine comment indicator (pipe character |) and it is terminated with a listing column separator, which determines the origin of this line.
Character | Function |
---|---|
| (pipe) | Termination of a machine comment. Used in ordinary statements, which can be reused as €ASM source. |
! (exclamation) | Copy of the source line with expanded preprocessing %variables (when LISTVAR=ENABLED is used). |
+ (plus) | Source line generated in %FOR,%WHILE,%REPEAT expansion (when LISTREPEAT=ENABLED is used). |
+ (plus) | Source line generated in %MACRO expansion (when LISTMACRO=ENABLED is used). |
: (colon) | Inserted listing line to display a changed [section]. |
. (full stop) | Inserted listing line to display an autoalignment stuff (when AUTOALIGN=ENABLED is used). |
- (minus) | Inserted listing line to display the whole dump (when DUMPALL=ENABLED is used). |
= (equal) | Inserted listing line to display data literals (when LISTLITERALS=ENABLED is used). |
(space) | Inserted envelope PROGRAM / ENDPROGRAM line. |
* (asterisk) | Inserted listing line in INCLUDE* statement when filename wildcards are resolved. |
As a side effect when the column separator is not |, the whole listing line has the form of a machine remark and it is ignored if the listing is submitted again as a program source.
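A listing viewer could use the separator character to tell the origin of each line. The Python sketch below is hypothetical and rests on one assumption read from the Hello listing above: that DUMPWIDTH counts the leading pipe and the separator, so the separator sits at index DUMPWIDTH-1. Message lines such as |### E6601 are not covered by it.

```python
SEPARATORS = {
    "|": "ordinary statement",
    "!": "line with expanded %variables",
    "+": "line generated by a %FOR/%WHILE/%REPEAT or %MACRO expansion",
    ":": "changed section",
    ".": "autoalignment",
    "-": "continuation of the dump",
    "=": "data literals",
    " ": "PROGRAM/ENDPROGRAM envelope",
    "*": "INCLUDE with resolved wildcards",
}


def classify(listing_line, dump_width):
    """Return (origin, assembled_again) for one line of the listing.

    assembled_again is True only for the pipe separator; all other lines
    are machine remarks, ignored when the listing is submitted as source.
    """
    separator = listing_line[dump_width - 1]  # assumed separator position
    return SEPARATORS[separator], separator == "|"
```

With DUMPWIDTH=18, the line which dumps MOV AH,9 is classified as an ordinary statement that would be assembled again, while the inserted line ::::Section changed. is recognized as a remark.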
The dump of emitting statements starts with their hexadecimal address (offset in the current working section), terminated with a colon :.
In a 16-bit section the offset is 16 bits wide (four hexadecimal digits), in 32-bit and 64-bit sections it is 32 bits wide.
Then the emitted bytes follow. The data contents in the dump column are always in hexadecimal notation without an explicit number modifier. If the chosen DUMPWIDTH= is too small for all emitted bytes to fit, they are either right-trimmed and replaced with a tilde ~ (if DUMPALL=OFF), or additional lines with the separator - are inserted into the listing (DUMPALL=ON).
Some other decorators are used in the dumped bytes:
Decorator | Description |
---|---|
~ | Trimmed data indicator, used only when DUMPALL=OFF |
.. | Byte of reserved data (instead of hexadecimal byte value when it's initialized) |
[] | Absolute relocation |
() | Relative relocation |
{} | Paragraph address relocation |
<N | disp8*N compression used |
The brackets [] and {}, which may enclose the dumped word or dword, indicate that the address requires relocation at link-time. Value printed in the listing will differ from the offset viewed in a linked code or in a debugger at run-time.
The character < followed by one decimal digit (N) signals that the previously dumped byte is an 8-bit displacement which will be left-shifted by N bits at run-time to obtain the effective displacement (the so-called disp8*N compression). The digit from 1..6 specifying the scaling factor N is not emitted to the assembled code.
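The arithmetic behind the decorator is a plain shift, which the following Python sketch demonstrates in both directions. The function names are hypothetical and the code illustrates the calculation only, not the encoder used by €ASM; the stored byte is treated as a signed 8-bit value.

```python
def effective_displacement(disp8, n):
    """Displacement used at run-time for a stored byte decorated with <n."""
    if not -128 <= disp8 <= 127:
        raise ValueError("disp8 must fit in a signed byte")
    return disp8 * (1 << n)          # same as shifting left by n bits


def compress(displacement, n):
    """Return the stored byte, or None when disp8*N cannot encode it."""
    scale = 1 << n
    if displacement % scale:
        return None                  # not a multiple of the scaling factor
    disp8 = displacement // scale
    return disp8 if -128 <= disp8 <= 127 else None
```

A stored byte 02h decorated with <5 gives 02h shifted left five times, that is 40h. In the opposite direction the displacement 40h compresses to 02h, whereas 41h is not a multiple of 32 and could not be compressed with N=5.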
Brackets [ ] and { } indicate relocatable values.
|                            | EUROASM DUMPWIDTH=30,CPU=X64,SIMD=AVX512,EVEX=ENABLED
|[CODE] ▼    ▼▼    ▼         |[CODE] SEGMENT WIDTH=16
|0000:EA[0500]{0000}         | JMPF Label ; Absolute far jump encodes immediate seg:offset.
|0005:CB                     |Label: RETF
|[CODE64]                    |[CODE64] SEGMENT WIDTH=64
|00000000:62F36D28234D02<504 | VSHUFF32X4 YMM1,YMM2,[RBP+40h],4
|00000008:C3            ▲▲   | RET
                        <5 is a nonemitted disp8*N decorator.
                      ▲▲ Byte displacement +02h will be bit-shifted 5 times to the left, so the effective displacement is in fact +40h.
The dump of non-emitting statements is either empty or contains auxiliary information.
|[DATA]        |[DATA] ; Segment|section switch quotes its [name] in dump column.
|0000:         |; Empty or comment-only line just displays the offset in current section.
|0000:         |Label: ; Ditto.
|              |;; Line comment starting with double semicolon will suppress the offset in dump.
|[DATA]:0000   |Target EQU Label: ; Address symbol definition is displayed as [segment]:offset.
|4358          |%Counter %SET CX ; Assignment of preprocessing %variable dumps its contents in hexadecimal.
|TRUE          | %IF "%Counter" == "CX" ; Preprocessing construct displays the evaluated boolean condition.
|[]:0010       | Bits EQU 16 ; Scalar symbol definition is displayed with empty segment.
|FALSE         | %ELSE ; Boolean condition concerns %IF, %ELSE, %WHILE, %UNTIL.
|              | Bits EQU 32 ; Dump of statements in false conditional branches is empty.
|              | %ENDIF
A listing produced with the default (factory) configuration is more or less an exact copy of the source (except for the inserted dump column).
Sometimes it is useful to check if the high-level constructs worked as expected, and this is controlled
by the following boolean EUROASM options:
LISTINCLUDE=ON unrolls the contents of the included file, which is
normally hidden from the main source.
LISTVAR=ON creates a copy of the statements which contain a preprocessing %variable, and replaces the %variable name with its expanded value in the copied line.
LISTMACRO=ON inserts statements expanded by the macroinstruction.
LISTREPEAT=ON inserts all iterations of the repeating constructs %FOR..%ENDFOR, %WHILE..%ENDWHILE, %REPEAT..%ENDREPEAT.
Repeated expansions are listed as lines commented out by the dump column separator +.
In the default state (defined by LISTREPEAT=DISABLED) only the first expansion is listed.
A very useful trait by design of an EuroAssembler listing is to keep the generated listing re-usable as source code again, in the following assembly session. The messages generated in the listing are ignored by the €ASM parser, so they need not be removed when we want to submit the listing file to a reassembly (nevertheless, those messages will be generated again if the cause of error was not fixed).
I wanted to sustain this philosophy regardless of the LIST* parameters. In the default state with
LISTINCLUDE=OFF
the INCLUDE statement is listed normally and the contents of the included file are hidden. With the option LISTINCLUDE=ON
it is reversed: the original INCLUDE statement is commented out by the dump column separator * but the included lines are inserted into the listing and become valid source statements. See also t2220.
When the options
LISTVAR, LISTMACRO, LISTREPEAT
are enabled, the original line is kept as is and the expanded lines are inserted below it, commented out by the dump column separator ! or +. See also t2230.
The EUROASM option LIST=DISABLE switches off the generation of listing lines until it is enabled again or until the end of the source, whichever comes first. Of course, such a listing is no longer reusable as source code.
Disk files can be specified by their absolute path, i.e. with a path
which begins at the filesystem root, e.g. C:\ProgFiles\euroasm.exe D:\Project\source.asm.
Such files are unequivocally defined.
Files may also be specified with a relative path, e.g.
euroasm ..\prowin32\skeleton.asm.
These relative paths are always related to the current working directory.
Files can also be specified without a path, i. e. when their name contains no colon and no slash :, \, /. The location of such files is reviewed in the table below:
Direction | File | Directory | See also |
---|---|---|---|
Executable | euroasm.exe | Exe-directory | OS PATH |
Input | Global euroasm.ini | Exe-directory | OS PATH |
Output | Global euroasm.ini | Exe-directory | OS PATH |
Input | Local euroasm.ini | Source directory | |
Input | Source file | Current directory | |
Input | Included source file | Include directory | EUROASM INCLUDEPATH= |
Output | Target object file | Current directory | PROGRAM OUTFILE= |
Output | Listing file | Source directory | EUROASM LISTFILE= |
Input | Linked module file | Link directory | EUROASM LINKPATH= |
Input | Linked stub file | Link directory | PROGRAM STUBFILE= |
Input | Linked icon file | Link directory | PROGRAM ICONFILE= |
Import | Dynamically imported function | OS-dependent | IMPORT LIB= |
The current directory is the actual folder
assigned to the shell process at the moment when euroasm.exe
was launched.
It's never changed by €ASM.
The exe-directory is the folder where euroasm.exe
was found and executed;
usually it is one of the directories specified by the environment variable PATH.
The source directory is the folder where the currently assembled source file lies.
The include directory is one of the directories specified by the option
EUROASM INCLUDEPATH=.
The link directory is one of the directories specified by the option
EUROASM LINKPATH=.
This chapter describes the format of a typical source file which €ASM understands and which it is able to compile.
It is particularly important that a source file written in an editor that uses WIDE (16-bit) character encoding (UTF-16) is saved as plain text in UTF-8 or in an 8-bit ANSI or OEM code page before it is submitted for assembly.
A program written in €ASM may need to display messages and texts in other languages than English.
Therefore, a string which defines the output text will contain characters
with their codepoint value above 127
(codepoint is an ordinal number of the character in the [Unicode] chart).
Many European languages are satisfied with a limited set of 256 characters.
Historically, the relation between their codes and the corresponding glyphs is called a code page.
Be aware that MS-Windows uses different code pages in console applications (OEM) and in GUI applications (ANSI) and it makes automatic conversion between them in some circumstances. €ASM itself never changes the code page of the source.
A programmer who needs to mix several human languages in an MS-Windows application may need to use 16-bit WIDE characters
instead of 8-bit ANSI in text strings at run-time. See cpmix32
as a demo example.
The wide (UTF-16) strings are declared with pseudoinstruction DU
(Define data in Unichars)
instead of DB
(Define data in Bytes) pseudoinstruction.
The wide variant of WinAPI call must be used for a visual representation of Unichar strings at run-time,
e. g. TextOutW()
instead of TextOutA()
. However, the in-source definition
of characters in DU
statement is still 8-bit. You should tell €ASM
which code page was used for writing the DU
statement in the source file.
This information is provided by the EUROASM CODEPAGE=
option.
The codepage may change dynamically in the source, thus allowing mixing of different
human-languages in one program.
The texts in your program which aim to run inside the console
(using the WinAPI function WriteConsoleA()
or macroinstruction
StdOutput) should be written in the OEM code page.
You may want to use a DOS plain-text editor, such as EDIT.COM
for writing console programs. As text mode editors use console fonts
which are in OEM code page, the text is displayed correctly both in editor
at write-time and in the console of your program at run-time.
Conversely, text which will be presented in GUI windows (using the WinAPI function TextOutA())
should be written in the ANSI code page, using a windowed editor such as Notepad.exe.
The default is EUROASM CODEPAGE=UTF-8, where characters are encoded
with a variable length of one to four bytes. Thanks to the clever [UTF8] design, all non-ASCII UTF-8 characters
are encoded as consecutive bytes with values in the 128..255 range, which are treated as
letters in €ASM,
so any UTF-8 defined character can be used in identifiers as is.
Unlike the 8-bit ANSI or OEM encodings, which limit the repertoire to 256 glyphs, CODEPAGE=UTF8 allows the mixing of arbitrary character codepoints defined in [Unicode], including non-European alphabets. The MS-Windows API does not, by design, directly support UTF-8 strings, so they need run-time reencoding to UTF-16, which is used by the WIDE variant of the WinAPI functions, such as TextOutW(). This reencoding can be performed by WinAPI MultiByteToWideChar() or by the macro DecodeUTF8. Of course, exotic characters will be displayed correctly only if the font in use supports their glyphs.
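The byte-level property of UTF-8 described above can be checked outside the assembler. The following Python sketch is only an illustration and is not part of €ASM; the helper name non_ascii_bytes_are_high and the sample string are hypothetical.

```python
def non_ascii_bytes_are_high(text: str) -> bool:
    """Return True if every non-ASCII character of text encodes
    to UTF-8 bytes that all lie in the range 128..255."""
    for ch in text:
        if ord(ch) > 127:
            if not all(128 <= b <= 255 for b in ch.encode("utf-8")):
                return False
    return True

# Hypothetical identifier mixing ASCII and non-ASCII letters.
sample = "Today€Ünïcode"

euro_utf8 = "€".encode("utf-8")        # three bytes in the source file
euro_utf16 = "€".encode("utf-16-le")   # one 16-bit unit, as WIDE functions expect
```

Because no byte of a multi-byte UTF-8 sequence falls into the ASCII range, such bytes can never be mistaken for an ASCII separator or operator.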
An example of a freeware text editor that supports UTF-8 encoding is [PSPad].
Some UTF-8 text editors insert the Byte Order Mark characters 0xEF, 0xBB, 0xBF
at the start of the source file. EuroAssembler treats those three characters as a 3-byte long unused label at the start of the source, which usually does no harm.
All identifiers created by you, the programmer, are case sensitive: labels, constants, user-defined %variables, structures, macro names. On the other hand, all built-in names are case insensitive. Case insensitivity concerns all enumerations: register names, machine instructions and prefixes, built-in data types, number modifiers, pseudoinstruction names and parameters, symbol attributes, system %^variables.
The case insensitive names are presented in UPPER CASE in this manual but they may be used in lower or mixed case as well.
Each byte (8 bits) in €ASM source is treated as a character. Many characters have special purpose in assembler syntax unless they are quoted inside double or single quotes. A character is unquoted if zero or an even number of quotes appears between the start of the line and the character itself.
ASCII | glyph | name | function in €ASM |
---|---|---|---|
0..9 | | controls | white space |
10 | | line feed | end of line |
11..31 | | controls | white space |
32 | space | white space | |
33 | ! | exclamation mark | logical operator |
34 | " | double quote | string delimiter |
35 | # | number sign | modifier |
36 | $ | dollar sign | letter |
37 | % | percent sign | preprocessor apparatus prefix |
38 | & | ampersand | logical operator |
39 | ' | apostrophe (single quote) | string delimiter |
40 | ( | left parenthesis | priority parenthesis |
41 | ) | right parenthesis | priority parenthesis |
42 | * | asterisk | arithmetic and special operator |
43 | + | plus sign | arithmetic operator |
44 | , | comma | operand separator |
45 | - | minus sign | arithmetic operator |
46 | . | fullstop | member separator |
47 | / | slash (solidus) | arithmetic operator |
48..57 | 0..9 | digits | digit |
58 | : | colon | field separator |
59 | ; | semicolon | comment separator |
60 | < | less-than sign | logical operator, comment separator |
61 | = | equals sign | logical operator, key separator, literal indicator |
62 | > | greater-than sign | logical operator |
63 | ? | question mark | letter |
64 | @ | commercial at | letter |
65..90 | A..Z | uppercase letters | letter |
91 | [ | left square bracket | content braces, substring operator |
92 | \ | backslash (reverse solidus) | arithmetic operator, line continuation indicator |
93 | ] | right square bracket | content braces, substring operator |
94 | ^ | caret (circumflex) | logical operator |
95 | _ | underscore (low line) | letter, digit separator |
96 | ` | grave accent | letter |
97..122 | a..z | lowercase letters | letter |
123 | { | left curly bracket | sublist operator |
124 | | | vertical bar (pipe) | logical operator, comment separator |
125 | } | right curly bracket | sublist operator |
126 | ~ | tilde | logical operator, shortcut indicator |
127 | | delete | white space |
128..255 | | NonASCII characters | letter |
An assembler source is treated as a text consisting of lines which are processed from left to right, from top to bottom.
A source file consists of physical lines. A physical line is a sequence of characters terminated with a line feed (ASCII 10). The line feed (EOL) character is part of the physical line, too.
The EOL may be omitted in the last physical line of source file.
A statement is an order for €ASM to perform some action at assembly-time, that is usually to emit some code to the object file or to change its internal state. A typical statement is equivalent to a physical line but long statements might span several lines when line continuation is used.
A statement consists of several fields which are recognized by their position in the line, by the separator or by their contents. All fields are facultative (optional), any of them may be omitted. However, no operand can be used when the operation field is omitted.
Order | Field name | Termination |
---|---|---|
1. | Machine remark | | or EOL |
2. | Label | : or white space |
3. | Prefix | : or white space |
4. | Operation | white space |
5. | Operand | , |
6. | Line comment | EOL |
Example of a statement:
| machine remark |Label |Prefix|Operation| Operands | Line comment
|00001234:F08705[78560000] |Mutex: LOCK: XCHG EAX,[TheLock] ; Guard the thread.
A machine remark begins with a vertical bar | when it is the first
non-white character on the physical line. It is terminated with the second occurrence
of the same vertical bar or with the end of the physical line.
The contents of a machine remark are usually a hexadecimal address followed by the machine code emitted by the statement in question. As the field name indicates, this information is generated by the computer into the €ASM listing file, so the programmer should never need to write a machine remark manually. Machine remarks are ignored in assembler source, thus any valid €ASM listing file may be reused as a source file without change.
A label field can accommodate any of these elements:
a symbolic name, e.g. My1stStructure, My1stLabel:, Outer
a segment or section name in square brackets, e.g. [.data]
a preprocessing %variable, e.g. %Count
In the first case the symbolic name may begin with a period (point) ., making the label local. The symbol in the label field may optionally be terminated
with one or more colons : immediately following the identifier.
The white space between the label field and the next field may be omitted when the
colon is used.
The machine prefix is an order for CPU to change its internal state at run-time. It is similar to a machine instruction code but it only modifies the following instruction at run-time. Each prefix assembles to a single byte machine opcode.
Name | Group | Opcode |
---|---|---|
LOCK | 1 | 0xF0 |
REP | 1 | 0xF3 |
REPE | 1 | 0xF3 |
REPZ | 1 | 0xF3 |
REPNE | 1 | 0xF2 |
REPNZ | 1 | 0xF2 |
XACQUIRE | 1 | 0xF2 |
XRELEASE | 1 | 0xF3 |
SEGCS | 2 | 0x2E |
SEGSS | 2 | 0x36 |
SEGDS | 2 | 0x3E |
SEGES | 2 | 0x26 |
SEGFS | 2 | 0x64 |
SEGGS | 2 | 0x65 |
SELDOM | 2 | 0x2E |
OFTEN | 2 | 0x3E |
OTOGGLE | 3 | 0x66 |
ATOGGLE | 4 | 0x67 |
The last four mnemonic names are not known in other assemblers.
The SELDOM
and OFTEN
may be used in front of conditional jump instructions
as hints for newer CPUs to help with predictions of the jump target.
The OTOGGLE
and ATOGGLE
switch between 16-bit and 32-bit width of operand and address
portion of machine code. They are normally generated by the assembler
internally whenever needed, without an explicit request.
Up to four prefixes can be defined in one statement but not more than one prefix from the same group.
The names of the prefixes are case insensitive and reserved, they cannot be used as labels. A prefix name may be terminated with colon(s) : (same as symbols).
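The grouping rule can be expressed as a small check. The Python sketch below is an illustration only, not €ASM's implementation; the function name check_prefixes is hypothetical, and the group numbers are copied from the table above.

```python
# Group numbers copied from the prefix table above.
PREFIX_GROUP = {
    "LOCK": 1, "REP": 1, "REPE": 1, "REPZ": 1, "REPNE": 1, "REPNZ": 1,
    "XACQUIRE": 1, "XRELEASE": 1,
    "SEGCS": 2, "SEGSS": 2, "SEGDS": 2, "SEGES": 2, "SEGFS": 2, "SEGGS": 2,
    "SELDOM": 2, "OFTEN": 2,
    "OTOGGLE": 3, "ATOGGLE": 4,
}

def check_prefixes(prefixes):
    """Return True when the given prefixes may share one statement:
    every name is known and no two names belong to the same group."""
    seen = set()
    for name in prefixes:
        # Names are case insensitive and may be terminated with colon(s).
        key = name.rstrip(":").upper()
        group = PREFIX_GROUP.get(key)
        if group is None or group in seen:
            return False
        seen.add(group)
    return True
```

Since only four groups exist, the limit of four prefixes per statement follows from the one-per-group rule and needs no separate test in this sketch.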
AMD and Intel 64-bit architecture introduced the special prefixes REX, XOP, VEX, MVEX, EVEX.
€ASM treats them as part of the operation encoding and does not provide
a mnemonic for their direct declaration.
[AMDSSE5] introduced another instruction prefix DREX, but DREX-encoded instructions are not
supported by €ASM as they never made it to production, as far as I know.
The segment-override prefixes SEG*S can be alternatively requested as a component of memory-variable register expression. In this case they are emitted only when they are not redundant (when they specify a non-default segment). Explicitly specified prefixes are emitted always, in the order as they appeared in the statement.
EuroAssembler warns when a prefix is used in contradiction with the CPU specification. This can be overridden by placing the prefix in a separate statement.
|0000:F091 |LOCK: XCHG AX,CX ; Prefix Lock should not be used with register operands.
|## W2356 Prefix LOCK: is not expected in this instruction.
|0002:F0 |LOCK: ; This can be outperformed when the prefix is separated in extra statement,
|0003:91 | XCHG AX,CX ; for instance to investigate CPU behaviour in such situation.
|0004: |
|0004:6691 | XCHG EAX,ECX ; Operand-size prefix 0x66 is emitted internally (in 16-bit segment).
|0006:6691 |OTOGGLE: XCHG EAX,ECX ; Its explicit specification has no effect,
|0008:6691 |OTOGGLE: XCHG AX,CX ; but here it overrides the registers sizes from 16 to 32 bits.
The operation field is the most important field of an assembler statement; it tells €ASM what to do: declare something, change its internal state or emit something to the object file. It often gives its name to the whole statement: we may say an EXTERN operation instead of a statement with the EXTERN pseudoinstruction in the operation field.
€ASM recognizes three types (genders) of operation: machine instructions, pseudoinstructions and macroinstructions.
Statement may have no operation at all:
[CODE] ; Redirect further emitting to section [CODE].
 ; Empty statement may be used for optical separation or for comments.
Label: ; Define a label but do not emit any data or code.
LOCK: ; Define a machine prefix for the following instruction.
Some statements tell €ASM to emit assembled code or data to the object file; they are called emitting instructions.
The operands specify the data which the operation works with. The number of operands in a statement is not limited and depends on the operation. An operand can be a register name, number, expression, identifier, string, or almost any combination of them.
The operation field is separated from the first operand with at least one white-space. Operands are separated with an unquoted comma , from one another. There are two kinds of operands recognised in €ASM: ordinal and keyword.
The ordinal operands (or shortly ordinals) are referred by the order in the statement.
The first operand has ordinal number one (a one-based index); in macros it is identified as %1.
For instance, in the MOV AL,BL
statement the AL register
is operand number 1 and BL is number 2. The machine instruction MOV is known
to copy contents of the second operand to the first.
The comma between operands will increase the ordinal number even when the operand is empty (nothing but white-spaces).
An operand of a machine instruction may represent a register, immediate integer number, address, or
memory variable enclosed in square braces, for instance MOV AL,[ES:SI+16].
Some other assemblers allow a different syntax of address expression, which is not supported by EuroAssembler, for instance MOV AL,ES:[SI+16]
or MOV AL,[ES:16]+SI.
€ASM requires that the entire memory operand is placed inside square braces [].
Besides the ordinal operands, €ASM introduces one more type of operand: the keyword operand (or shortly keyword). Keywords are referred to by name (key word) rather than by their position in the operand list. A keyword operand has the canonical form name=value where name is an identifier immediately followed by an equal sign.
Keyword operands have many advantages: they are self-describing (if their name is chosen wisely), they don't depend on position in the operand list (no more tedious comma counting), they may be assigned a default value and they may be completely omitted when they have the default value.
Keyword operands are best used with macroinstructions but €ASM also employs them in some pseudoinstructions and even in machine instructions. For instance, in
INC [EDI],DATA=DWORD
the keyword parameter DATA=
tells which form of the possible INC machine instruction (increment byte, word or dword variable) should be used.
There must be no space between the keyword and the equal sign, otherwise the operand is not recognized as a valid instruction modifier:
|0000: |; Let's define two memory variables (with not recommended names).
|0000:3412 |DATA: DW 1234h
|0002:7856 |WORD: DW 5678h
|0004: |
|0004:50 | PUSH AX, DATA=WORD
|0005: |; Assembled as PUSH AX.
|0005: |; Operand DATA=WORD is recognized as a redundant but valid instruction modifier.
|0005: |
|0005:506A00 | PUSH AX, DATA = WORD
|0008: |; Operand DATA = WORD is not recognized as keyword modifier
|0008: |; due to the space which follows identifier DATA.
|0008: |; €ASM sees the 2nd operand as a numerical comparison between symbols DATA and WORD,
|0008: |; which happen to exist in this program (otherwise E6601 would have been issued).
|0008: |; Their offsets (0000h and 0002h) are different, the result is boolean FALSE
|0008: |; represented with value 0. The statement is recognized as PUSH AX, 0
|0008: |; which is legal, because €ASM accepts integration of multiple ordinal operands
|0008: |; to one statement in machine instructions PUSH, POP, INC, DEC.
|0008: |; The statement is assembled as two instructions: PUSH AX and PUSH 0.
The order of keyword operands is not important. It is a good practice to list ordinal operands first and then all keyword operands, but keywords may be mixed freely with ordinals, too.
Label1: Operation1 Ordinal1,Ordinal2,,Ordinal4,,
Label2: Operation2 Ordinal1,Keyword1=Value1,Ordinal2,,Ordinal4
Operation1 in the previous example has three operands with ordinal numbers 1,2 and 4. The third operand is empty and the last two commas at the end of line are ignored, as no other nonempty operand follows.
Mixed operands are used in Operation2 and notice that Ordinal2 has an ordinal number 2 although it is the third operand on the list. Keyword operands do not count into ordinal numbers but empty operands do.
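The numbering rule for mixed operands can be modelled in a few lines. The following Python sketch is a simplified illustration, not €ASM's parser; the names split_operands and number_operands are hypothetical, identifiers are assumed to consist of ASCII letters, digits and underscores only, and a double equal sign is assumed not to start a keyword value.

```python
import re

# Assumption: a keyword is an identifier immediately followed by a single "=".
KEYWORD = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(?!=)(.*)$")

def split_operands(text):
    """Split an operand list on commas that are outside quotes."""
    parts, current, quote = [], [], None
    for ch in text:
        if quote:
            if ch == quote:
                quote = None          # closing quote
        elif ch in "\"'":
            quote = ch                # opening quote
        if ch == "," and quote is None:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts

def number_operands(text):
    """Return ({ordinal number: operand}, {keyword: value})."""
    ordinals, keywords = {}, {}
    number = 0
    for part in split_operands(text):
        match = KEYWORD.match(part)
        if match:
            keywords[match.group(1)] = match.group(2)
            continue                  # keywords do not consume an ordinal number
        number += 1                   # empty operands do consume one
        if part:
            ordinals[number] = part
    return ordinals, keywords
```

Applied to the two statements above, the sketch assigns ordinal numbers 1, 2 and 4 in both cases, and the operand DATA = WORD (with spaces) is counted as an ordinal rather than as a keyword.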
A line comment begins with unquoted semicolon ; and it extends to the end of this physical line. Line comments are ignored by assembler, they are geared towards human reader of the source code.
A statement continues on the next physical line when line continuation character, which is an unquoted backslash \, is used at the position where the next field would normally begin.
aLabel: \ ; This semicolon is redundant.
  MOV EAX, \ The first operand of MOV is destination
  EBX ; and the second one is source.
Everything that follows the line continuation character is treated like a comment field, so the semicolon may be omitted in this case. In a multiline statement you may add comments to any physical line.
The whole field of any statement must fit on one physical line.
The backslash \ is also used as modulo binary operator, which cannot appear at the beginning of operation, so the confusion is avoided.
; modulo modulo line-continuation
; | | |
|0000:01000200 | DW 5 \ 4, 6 \ 4, \
|0004:03000000 | 7 \ 4, 8 \ 4
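The dump in the listing above can be cross-checked by hand. This Python sketch is illustrative only; it assumes that DW emits 16-bit little-endian words, as the dump suggests, and uses Python's % operator in place of the €ASM modulo operator \ for these positive operands.

```python
import struct

# The four remainders written in the source as 5 \ 4, 6 \ 4, 7 \ 4, 8 \ 4.
values = [5 % 4, 6 % 4, 7 % 4, 8 % 4]

# Pack them as four little-endian 16-bit words and show them as hex digits.
dump = struct.pack("<4H", *values).hex().upper()
```

The resulting string is the concatenation of the two dump columns 01000200 and 03000000.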
Statements in assembler source code are processed one by one, from top to bottom. Some of them influence the statements that follow, but most instructions are standalone. From this point of view there are three kinds of statements: block statements, switching statements and standalone statements.
A block statement must appear in pair with its corresponding ending statement. The internal state of €ASM is changed only within the range between them, which is called a block.
A block actually begins at the operation field of a begin-block statement and it ends at the operation field of the end-block statement.
Some block statements may be prematurely cancelled (broken) with an exit operation, for instance when an error is detected during a macro expansion.
Label field: obligation | Label field: represents | Label field: declares | Operation field: begin block | Operation field: break | Operation field: end block |
---|---|---|---|---|---|
mandatory | program name | program | PROGRAM | not used | ENDPROGRAM |
mandatory | procedure name | symbol | PROC | not used | ENDPROC |
mandatory | procedure name | symbol | PROC1 | not used | ENDPROC1 |
mandatory | structure name | structure | STRUC | not used | ENDSTRUC |
optional | block identifier | nothing | HEAD | not used | ENDHEAD |
optional | block identifier | nothing | %COMMENT | not used | %ENDCOMMENT |
optional | block identifier | nothing | %IF | %ELSE | %ENDIF |
optional | block identifier | nothing | %WHILE | %EXITWHILE | %ENDWHILE |
optional | ids of Begin/End swapped | nothing | %REPEAT | %EXITREPEAT | %ENDREPEAT |
mandatory | formal control variable | %variable | %FOR | %EXITFOR | %ENDFOR |
mandatory | macro name | macro | %MACRO | %EXITMACRO | %ENDMACRO |
Some end-block operations can be aliased:
ENDPROC alias ENDP,
ENDPROC1 alias ENDP1,
%ENDREPEAT alias %UNTIL.
The label field of a block statement specifies the name of the program, procedure, structure or macro. In the preprocessing of a %FOR loop the label field declares a formal variable which changes its value in each loop cycle. In other preprocessing loops (%REPEAT, %WHILE) the label field is optional and it may contain identifier which optically connects the beginning and the ending of block statements together (for nesting check) but has no further significance - it does not declare a symbol.
The same block identifier may be used as the first and only operand of the corresponding end-block statement.
Assemblers are not united in the canonical format of block pseudoinstructions. On the one hand, MASM uses the same block identifier in the label fields of both the begin-block and end-block statements:
MyProcedure PROC ; MASM syntax
  ; some code
MyProcedure ENDP
This is good when you eyeball the source code for a procedure definition, as its name is on the left so it will hit your eyes when you scan the leftmost column. On the other hand, the same label appears in the source twice, making an ugly exception from the rule that a non-local symbol declaration may occur only once in the program.
Perhaps for that reason Borland chose a different syntax in TASM IDEAL mode:
PROC MyProcedure ; TASM syntax
  ; some code
ENDP MyProcedure
It solves the double label problem but the name of MyProcedure never appears in the label field, although it is a regular label.
€ASM presents a compromise solution: the name of the block is defined in the label field of the begin-block statement and it may appear in the operand field of the end-block statement:
MyProcedure PROC ; €ASM syntax
  ; some code
  ENDP MyProcedure
The operand in the end-block statement may be omitted but, if used, it must be identical to the label of the corresponding begin-block statement. This helps to maintain correct block nesting because €ASM will emit an error when block identifiers don't match.
Blocks of code can be nested, but only correctly, that is, without any overlap between them.
A %MACRO block in the example presented below contains a correctly nested %IF block.
WriteCMOS %MACRO Address,Value
  %IF %1 <= 30h
    %ERROR "Checksum protected area!"
    %EXITMACRO WriteCMOS
  %ENDIF
  MOV AL,%1
  OUT 70h,AL
  MOV AL,%2
  OUT 71h,AL
%ENDMACRO WriteCMOS
Incorrect block nesting is only tolerated in procedures declared with the NESTINGCHECK=OFF option.
A block identifier in an operand field of end-block and exit-block statements
usually only guards the correct binding. When blocks of the same type are nested
one in another, exit-block operand can be used to identify the exiting block.
As an example see t2642
where one Inner %FOR
block is nested in Outer %FOR
block, and the operand
of %EXITFOR statement specifies which block is exited.
A switching statement changes the internal state of €ASM for all following statements until another switching statement changes the state again, or until the end of source code is found.
There are two switching pseudoinstructions in €ASM: EUROASM,
and SEGMENT.
The latter has two forms: [name] SEGMENT
(define a new segment) and
[name]
(define new section in current segment if it wasn't defined yet,
and switch emitting to this section).
Examples of switching statements:
EUROASM AUTOSEGMENT=OFF, CPU=486 ; Change €ASM options for all following statements.
[Subprocedures] SEGMENT PURPOSE=CODE, ALIGN=BYTE ; Declare a new segment.
[.data] ; Switch emitting of following statements to previously defined segment [.data]
[StringData] ; Define a new section in the current segment (in [.data]).
All the remaining pseudoinstructions and machine instructions are not logically bound with others in a vertical structure of a program, so they are standalone, by definition.
The size of EuroAssembler elements is not limited by design.
This applies to the length of strings, physical text lines, identifiers, number notations,
expressions, nesting depth and number of operands. They are kept internally
as a signed 32-bit integer number so the theoretical size limit of each such element is
2 GB = 2_147_483_647 bytes (characters)
.
In reality it is the amount of available virtual memory and stack space which restrict elements of this size, and EuroAssembler may terminate well before with a fatal error message F9110 Cannot allocate virtual memory. or F9210 Memory reserved for machine stack is too small for this source file.
Comments are parts of the source code which are not processed by the assembler; their only purpose is to explain the code to a human reader. There are four types of comments recognised in €ASM: line comments, machine remarks, markup comments and block comments.
Line comments start with an unquoted semicolon; everything up to the end of line is ignored by €ASM. Line comments are copied to the listing file verbatim.
Label: CALL SomeProc ; This is a line comment.
Machine remarks are written by €ASM into the listing file and they contain the generated machine code in hexadecimal notation.
A machine remark starts with a vertical bar | which is the first non-white character on the physical line. A machine remark ends with the second occurrence of the same vertical bar |. If the second | is omitted, the whole physical line is treated as a remark. This is used for inserting error messages into the listing, just below the erroneous statement.
|0030:E81234 |Label1: CALL SomeProc ; This is a line comment.
|0033: |Label2: COLL OtherProc ; A typing error in the operation name.
|### E6860 Unrecognized operation "COLL", ignored.
Machine remarks are ignored by €ASM and they are not copied to the listing. Instead, €ASM recreates them when the listing produced by previous assembly session is submitted as a source to the assembler.
Machine remarks are not intended to be manually inserted by a programmer into the source text; use an ordinary line comment instead.
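The way a listing line turns back into a source line can be sketched as follows. This Python function is a hypothetical illustration, not €ASM code; it works on one physical line without its EOL and treats only the characters that Python's lstrip() removes as white space, which is narrower than the €ASM definition.

```python
def strip_machine_remark(line: str) -> str:
    """Remove a leading machine remark from one physical line."""
    stripped = line.lstrip()
    if not stripped.startswith("|"):
        return line                  # no machine remark on this line
    end = stripped.find("|", 1)      # position of the second vertical bar
    if end == -1:
        return ""                    # the whole line is a remark, e.g. a message
    return stripped[end + 1:]

listing = [
    '|0030:E81234 |Label1: CALL SomeProc ; This is a line comment.',
    '|### E6860 Unrecognized operation "COLL", ignored.',
    'Label3: RET',
]
source = [strip_machine_remark(line) for line in listing]
```

The first line keeps only its statement, the message line becomes empty, and a line without a machine remark is returned unchanged.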
When a physical line begins with the less-than character <, it is treated as a markup comment and ignored up to the end of the line. This makes it possible to mix source code with hypertext markup language tags. Markup comments are not copied to the listing.
Thanks to markup comments, €ASM source code can be stored not only as plain text but also as HTML or XML hypertext.
<h2>Description of SomeProcedure</h2>
<img src="SomeImage.png"/>
SomeProcedure PROC ; See the image above for description.
All source code shipped with €ASM is stored in HTML format, which allows the source to be documented with hypertext links, tables, images and a better visual presentation than simple line comments could yield.
If you want to keep your source codes in HTML, make sure that ordinary assembler statements do not start with < and rearrange the source so that every markup comment line starts with some HTML tag. You may also use void HTML tags <span/> or <!----> to start the comment line.
A block comment can be used to temporarily disable a portion of source code or to include documentation inside the source code.
A block comment begins with the %COMMENT statement
and ends with the corresponding %ENDCOMMENT. It can span many lines of the program,
none of which needs to start with a semicolon.
Block comments are copied into the listing file.
€ASM does not assemble the text inside the commented-out block, but it needs to parse it anyway in order to find the corresponding %ENDCOMMENT statement, so the commented-out text should be valid source as well.
The text in a %COMMENT block must be correctly nested, although it is ignored.
The pseudoinstruction %COMMENT
could easily be replaced with %IF 0,
but the former is more intuitive.
CALL SomeProc ; This is a line comment.
%COMMENT ; This is a block comment.
COLL OtherProc ; Intentional typing error in operation name.
%COMMENT ; This is a nested block comment.
%ENDCOMMENT ; End of inner block comment.
; This statement is ignored, too.
%ENDCOMMENT
; Emitting assembly continues here.
An identifier is a human readable text which gives the name to an element of assembler program: a symbol, register, instruction, structure etc.
The length of identifiers is not limited in €ASM and all characters are significant.
A number notation is the way to write numeric value and those numeric values are kept and computed internally by €ASM as 64-bit signed integers.
A number modifier is one of the characters B D E G H K M P Q T appended to the end of a digit sequence, or 0N 0O 0X 0Y (a zero followed by a letter) prefixed in front of other digits. All number modifiers are case insensitive. Except for the decimal format, which is the default, a modifier must always be used.
Floating point numbers shall use a period (fullstop) . to separate the integer and decimal part of the number notation.
Another number modifier is the underscore character _ which is ignored by the number parser and it can be used as a digit separator instead of space or comma for a better readability of long numbers. No white spaces are allowed in number notation.
A decimal number is a combination of decimal digits 0..9 optionally suffixed with the
decimal modifier D. There are five other decimal suffixes:
K (Kilo), which tells €ASM to multiply the number by 2^10 = 1024,
M (Mega), which tells €ASM to multiply the number by 2^20 = 1_048_576,
G (Giga), which tells €ASM to multiply the number by 2^30 = 1_073_741_824,
T (Tera), which tells €ASM to multiply the number by 2^40 = 1_099_511_627_776,
P (Peta), which tells €ASM to multiply the number by 2^50 = 1_125_899_906_842_624.
Decimal numbers may be prefixed with 0N modifier.
All six numbers in the following example have the same value:
1048576, 1048576d, 0n1048576, 1_048_576, 1024K, 1M
.
Note that these decimal suffixes multiply by powers of 2, not by the usual powers of ten.
The largest unsigned number which fits into 32 bits is 0xFFFF_FFFF=4_294_967_295.
The largest positive number which fits into 63 bits is 0x7FFF_FFFF_FFFF_FFFF=9_223_372_036_854_775_807.
A binary number is made of digits 0 1 appended with
a binary number modifier B or prefixed by a modifier 0Y. Examples:
0y101, 101b, 00110010b, 1_1111_0100B
are equivalent to decimal numbers
5, 5, 50, 500
respectively.
Maximal 32-bit binary number is 1111_1111__1111_1111__1111_1111__1111_1111b.
Each octal digit 0..7 represents three bits of the equivalent binary notation. The number is terminated with octal suffix Q or prefixed with 0O alias 0o (digit zero followed by the capital or small letter O).
Example: 177_377q = 0o177_377 = 0xFEFF
The biggest 32-bit octal number is 37_777_777_777q.
The biggest 64-bit octal number is 1_777_777_777_777_777_777_777q.
Each hexadecimal digit encodes four bits in one character, which requires 2^4=16 possible values. Therefore the ten decimal digits are extended with letters A, B, C, D, E, F with values 10, 11, 12, 13, 14, 15. Hexadecimal digits (letters) A..F are case insensitive. When the first digit of a hexadecimal number is represented with a letter A..F, an additional leading zero must be prefixed to the number notation to avoid confusion. A hexadecimal number is terminated with suffix H or it begins with prefix 0X.
Example: 5h, 0x32, 1F4H, 0x1388, 0C350H
represent decimal numbers
5, 50, 500, 5000, 50000
respectively.
Keep in mind that all numbers in €ASM are internally kept as 64-bit signed integers. Although the instructions MOV EAX,0xFFFF_FFFF and MOV EAX,-1 assemble to identical codes, their operands are internally represented as 0x0000_0000_FFFF_FFFF and 0xFFFF_FFFF_FFFF_FFFF. The boolean expression 0xFFFF_FFFF = -1 is false.
|00000000:B8FFFFFFFF | MOV EAX, 0xFFFF_FFFF
|00000005:B8FFFFFFFF | MOV EAX, -1
|FALSE | %IF 0xFFFF_FFFF = -1
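The difference between the two operands can be modelled outside the assembler. The following Python sketch is not €ASM code, and the helper names to_signed64 and low32 are invented for this illustration. It assumes that a value is reduced to a 64-bit signed integer for evaluation and that only the low 32 bits are emitted as the immediate operand of MOV EAX.

```python
def to_signed64(value):
    """Wrap an arbitrary Python integer into the 64-bit signed range."""
    value &= 0xFFFF_FFFF_FFFF_FFFF
    return value - (1 << 64) if value & (1 << 63) else value


def low32(value):
    """Low 32 bits of a value, as they would be emitted for a 32-bit immediate."""
    return value & 0xFFFF_FFFF


a = to_signed64(0xFFFF_FFFF)    # stays 4294967295
b = to_signed64(-1)             # stays -1

print(a == b)                   # the 64-bit values differ
print(low32(a) == low32(b))     # the emitted 32-bit immediates are identical
```

The comparison of the full 64-bit values corresponds to the %IF test in the listing, while the comparison of the low 32 bits corresponds to the identical machine codes.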
Integers may be written in binary, decimal, octal or hexadecimal notation.
Some number modifiers overlap with hexadecimal digits B, D, E. €ASM parses
as much of the element as possible to solve such ambiguity:
1BH
is recognized as a hexadecimal number 0x1B=27 and not binary 1 followed with letter H.
2DH
is recognized as a hexadecimal number 0x2D=45 and not decimal 2 followed with letter H.
3E2H
is recognized as a hexadecimal number 0x3E2=994 and not 3 * 10^2 followed with letter H.
Notation | Prefix | Base | Suffix | Multiplier |
---|---|---|---|---|
Binary | 0Y | 2 | B | 1 |
Octal | 0O | 8 | Q | 1 |
Decimal | 0N | 10 | D | 1 |
Decimal | 0N | 10 | K | 2^10 |
Decimal | 0N | 10 | M | 2^20 |
Decimal | 0N | 10 | G | 2^30 |
Decimal | 0N | 10 | T | 2^40 |
Decimal | 0N | 10 | P | 2^50 |
Hexadecimal | 0X | 16 | H | 1 |
Binary, octal and hexadecimal numbers must always be written with prefix or suffix (or both, however this is not recommended, and it feels awkward). There is no RADIX directive in €ASM.
For more examples of acceptable syntax see €ASM numbers tests.
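The integer notation rules above can be summarized by a small parser. This is only a Python sketch of the documented rules, not the €ASM implementation; the function name parse_integer and its tables are assumptions made for illustration. It does not handle a prefix combined with a suffix, floating point notation, or overflow beyond 64 bits. Because the suffix is always the last character, 1BH, 2DH and 3E2H come out as hexadecimal numbers.

```python
PREFIX_BASE = {"0N": 10, "0O": 8, "0X": 16, "0Y": 2}
SUFFIX_BASE = {"B": 2, "Q": 8, "D": 10, "H": 16}
SUFFIX_SHIFT = {"K": 10, "M": 20, "G": 30, "T": 40, "P": 50}


def parse_integer(notation):
    """Convert an integer notation such as 1_1111_0100B or 0x1388 to its value."""
    text = notation.replace("_", "").upper()   # digit separators are ignored
    if len(text) > 2 and text[:2] in PREFIX_BASE:
        return int(text[2:], PREFIX_BASE[text[:2]])
    body, last = text[:-1], text[-1:]
    if body and last in SUFFIX_BASE:
        return int(body, SUFFIX_BASE[last])
    if body and last in SUFFIX_SHIFT:
        return int(body, 10) << SUFFIX_SHIFT[last]   # multiply by a power of two
    return int(text, 10)                              # plain decimal is the default


for sample in ("1048576", "1048576d", "0n1048576", "1_048_576", "1024K", "1M"):
    print(sample, parse_integer(sample))
```

All six samples print the same value, as stated in the text on decimal numbers.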
Floating point alias real numbers are parsed from the scientific notation with decimal point and exponent of 10, using this syntax:
Order | Field name | Contents |
---|---|---|
1 | number sign | +, - or nothing |
2 | significand | digits 0..9, digit separators _ |
3 | decimal point | . |
4 | fraction | digits 0..9, digit separators _ |
5 | FP number modifier | E or e |
6 | exponent sign | +, - or nothing |
7 | exponent part | digits 0..9, digit separators _ |
For instance, the floating point number 1234.56E3 has the value 1234.56 * 10^3=1234560.
An omitted sign is treated as +.
The decimal part can be omitted when it is zero, and the decimal point may then be omitted as well. The E modifier still specifies the floating point format:
123.00E2 = 123.E2 = 123E2 = 12300.
Exponent can be omitted when it is zero. The modifier E may be omitted in this case, too,
and without the E modifier it is the presence of the decimal point which decides if the number is integer or real.
In our example: 12345.67E0 = 12345.67E = 12345.67
No white space is allowed within FP number notation.
The number is considered as floating point when its notation contains either decimal point ., or modifier E (capital or small letter E), or both. Otherwise it is treated as an integer.
All internal assembly-time calculations in €ASM are performed with 64-bit integers only. When an FP number is used in a mathematical expression, it is converted to an integer first. Error E6130 (number overflow) is reported if the number does not fit into 64 bits. Warning W2210 (precision lost) is reported if the FP number had a decimal part which was rounded in the conversion.
An actual FP number format [IEEE754] is maintained only when the scientific notation is used to define the static FP variable with pseudoinstruction DD, DQ, DT.
Half-precision FP numbers (float16) are not supported by €ASM, nor are they supported by processors, with the exception of two packed SIMD instructions VCVTPS2PH and VCVTPH2PS and a few MVEX-encoded up/down conversion operations.
Unlike integer numbers, the sign of FP notation is inseparable from the digits which follow. If you put a space between the sign and the number by mistake, the notation is no longer an FP definition. It is treated as an operation (unary minus applied to a number), and therefore the FP number is converted to an integer first, before the operation is evaluated.
|00000000:001DF1C7 | DD -123.45E3 ; Single-precision FP number -123.45*10^3.
|00000004:C61DFEFF | DD - 123.45E3 ; Dword signed integer number -123450.
|00000008:00000000A023FEC0 | DQ -123.45E3 ; Double-precision FP number -123.45*10^3.
|00000010:C61DFEFFFFFFFFFF | DQ - 123.45E3 ; Qword signed integer number -123450.
|00000018:0000000000001DF10FC0 | DT -123.45E3 ; Extended-precision FP number -123.45*10^3.
|00000022: | DT - 123.45E3 ; Tbyte integer number is not supported.
|### E6725 Datatype TBYTE expects plain floating-point number.
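The emitted bytes of the DD and DQ lines in the listing above can be cross-checked independently of €ASM. The following Python sketch uses the standard struct module to encode the same value as little-endian IEEE 754 numbers and as two's complement integers. The extended-precision DT format is left out because struct has no 80-bit type.

```python
import struct

value = -123.45E3   # Python reads this literal as the double -123450.0

single = struct.pack("<f", value).hex().upper()          # like DD -123.45E3
double = struct.pack("<d", value).hex().upper()          # like DQ -123.45E3
dword = struct.pack("<i", int(value)).hex().upper()      # like DD - 123.45E3
qword = struct.pack("<q", int(value)).hex().upper()      # like DQ - 123.45E3

print(single)   # 001DF1C7
print(dword)    # C61DFEFF
print(double)   # 00000000A023FEC0
print(qword)    # C61DFEFFFFFFFFFF
```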
Beside the standard scientific notation of floating-point numbers they may have a special FP constant value:
Constant | Interpretation | single precision (DD) | double precision (DQ) | extended precision (DT) |
---|---|---|---|---|
#ZERO | zero | 00000000 | 00000000_00000000 | 0000_00000000_00000000 |
+#ZERO | positive zero | 00000000 | 00000000_00000000 | 0000_00000000_00000000 |
-#ZERO | negative zero | 80000000 | 80000000_00000000 | 8000_00000000_00000000 |
#INF | infinity | 7F800000 | 7FF00000_00000000 | 7FFF_80000000_00000000 |
+#INF | positive infinity | 7F800000 | 7FF00000_00000000 | 7FFF_80000000_00000000 |
-#INF | negative infinity | FF800000 | FFF00000_00000000 | FFFF_80000000_00000000 |
#PINF | pseudo infinity | 7F800000 | 7FF00000_00000000 | 7FFF_00000000_00000000 |
+#PINF | positive pseudo infinity | 7F800000 | 7FF00000_00000000 | 7FFF_00000000_00000000 |
-#PINF | negative pseudo infinity | FF800000 | FFF00000_00000000 | FFFF_00000000_00000000 |
#NAN | not a number | 7FC00000 | 7FF80000_00000000 | 7FFF_C0000000_00000000 |
+#NAN | positive not a number | 7FC00000 | 7FF80000_00000000 | 7FFF_C0000000_00000000 |
-#NAN | negative not a number | FFC00000 | FFF80000_00000000 | FFFF_C0000000_00000000 |
#PNAN | pseudo not a number | 7F800001 | 7FF00000_00000001 | 7FFF_00000000_00000001 |
+#PNAN | positive pseudo not a number | 7F800001 | 7FF00000_00000001 | 7FFF_00000000_00000001 |
-#PNAN | negative pseudo not a number | FF800001 | FFF00000_00000001 | FFFF_00000000_00000001 |
#QNAN | quiet not a number | 7FC00000 | 7FF80000_00000000 | 7FFF_C0000000_00000000 |
+#QNAN | positive quiet not a number | 7FC00000 | 7FF80000_00000000 | 7FFF_C0000000_00000000 |
-#QNAN | negative quiet not a number | FFC00000 | FFF80000_00000000 | FFFF_C0000000_00000000 |
#SNAN | signaling not a number | 7F800001 | 7FF00000_00000001 | 7FFF_80000000_00000001 |
+#SNAN | positive signaling not a number | 7F800001 | 7FF00000_00000001 | 7FFF_80000000_00000001 |
-#SNAN | negative signaling not a number | FF800001 | FFF00000_00000001 | FFFF_80000000_00000001 |
Names of special constants are case insensitive. If the sign + or - is used,
it is inseparable from the name. Examples:
FourNans DY 4 * QWORD #NaN ; Define vector of four double-precision not-a-number FP values.
MOV ESI,=8*Q#ZERO ; Define 8*8 zero bytes in literal section and set ESI to point at them.
A number can also be written as a character constant, which is a string containing not more than eight characters. Its numeric value is taken from ordinal number of each character in the ASCII table. Example of character constants and their values:
'0' = 30h = 48
'abc' = 636261h = 6513249
"4%%" = 2534h = 9524
Assemblers are not united in character constants treatment. MASM and TASM use scriptual convention where the order of characters in the written source code corresponds with the way we write numbers: least significant digit is on the right side.
€ASM as well as other newer assemblers use the memory convention where the order of characters in the written source code corresponds with the order how they are stored in memory on little endian architecture processors.
| | ; MASM and TASM:
|00000000:616263 | DB 'abc' ; String.
|00000003:63626100 | DD 'abc' ; Character constant.
|00000007:B863626100 | MOV EAX,'abc' ; AL='c'.
| | ; €ASM, FASM, GoASM, NASM, SpASM:
|00000000:616263 | DB 'abc' ; String.
|00000003:61626300 | DD 'abc' ; Character constant.
|00000007:B861626300 | MOV EAX,'abc' ; AL='a'.
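The two conventions differ only in the byte order used to turn the characters into a number. The following Python sketch illustrates this. The function name char_constant and its convention parameter are hypothetical, and the sketch assumes plain ASCII characters whose doubled percent signs and quotes have already been reduced.

```python
def char_constant(text, convention="memory"):
    """Numeric value of a character constant of up to eight ASCII characters."""
    data = text.encode("ascii")
    if len(data) > 8:
        raise ValueError("a character constant holds at most eight characters")
    # Memory convention: the first character is the least significant byte.
    # Scriptual convention: the last character is the least significant byte.
    byteorder = "little" if convention == "memory" else "big"
    return int.from_bytes(data, byteorder)


print(char_constant("0"))      # 48
print(char_constant("abc"))    # 6513249
print(char_constant("4%"))     # 9524

# Bytes stored by DD 'abc' on a little-endian processor in each convention.
print(char_constant("abc").to_bytes(4, "little"))
print(char_constant("abc", "scriptual").to_bytes(4, "little"))
```

In the memory convention the stored dword begins with the bytes of 'abc' in source order, which is why AL receives 'a' in the €ASM part of the listing.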
Some operands may acquire only one of a few predefined values, e.g. the EUROASM option CPU= may be 086, 186, 286, 386, 486, 586, 686, PENTIUM, P6, X64.
Although some enumerated values may look like a number, they are not countable, they merely represent a position in a predefined collection.
Any number can be interpreted as a boolean (logical) value, too. Boolean values can acquire one of the two states: false or true. Number 0 is treated as boolean false in logical expression, any nonzero number is treated as true.
All built-in €ASM boolean options have an extended repertoire of possible values. Those boolean values accept
This applies to the:
Extended boolean enumeration is used only with operands built into €ASM.
They are not symbols that could be used elsewhere, such as in MOV EAX,TRUE.
To achieve similar functionality in macros, the programmer would have to define such symbols first, e.g.
FALSE EQU 0
false EQU 0
TRUE EQU -1
true EQU !false
MOV EAX,TRUE
When an extended Boolean value is used as a macro keyword operand, it can also be tested in the macro body with %IF, %WHILE, %UNTIL, for instance
MacroWithBool %MACRO Bool=On
  %IF %Bool
    ; Do something when Bool is set to TRUE.
  %ELSE
    ; Do something when Bool is set to FALSE.
  %ENDIF
%ENDMACRO MacroWithBool
Now we may invoke the macro as MacroWithBool Bool=Enable, MacroWithBool Bool=No etc.
MacroWithBool %MACRO Bool=0
  %IF ! %Bool
    ; Do something when Bool is set to FALSE.
  %ENDIF
%ENDMACRO MacroWithBool
The previous example would not work with extended Boolean values, for instance MacroWithBool Bool=False will complain with E6601 Symbol "False" was not found. However, reversing the logic should work well:
MacroWithBool %MACRO Bool=0
  %IF %Bool
  %ELSE
    ; Do something when Bool is set to FALSE.
  %ENDIF
%ENDMACRO MacroWithBool
A string is a set of arbitrary characters enclosed in quotes. Either double " or single quotes ' (also called apostrophes) may be used to mark the borders of a string. The surrounding quotes do not count into the string contents. All characters within the string lose their semantic significance, with three exceptions:
No escape character is employed in €ASM, in fact the percent sign and quote escape themselves. If you need to use any of the above mentioned characters within a string, they must be doubled. This duplication (self-escaping) concerns only the notation in the source text and it does not increase the final string size in emitted computer memory.
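The self-escaping rule can be sketched as a small decoder. This is a Python illustration only, and the function name decode_string_literal is an assumption. It takes the notation including the surrounding quotes and returns the string contents. Only the enclosing quote type and the percent sign need doubling here. An undoubled percent sign is simply rejected, because %variable expansion is outside the scope of this sketch.

```python
def decode_string_literal(source):
    """Return the contents of a quoted string, undoing the doubled characters."""
    if len(source) < 2 or source[0] not in "\"'" or source[-1] != source[0]:
        raise ValueError("string must be enclosed in matching quotes")
    quote, body = source[0], source[1:-1]
    contents, index = [], 0
    while index < len(body):
        char = body[index]
        if char == quote or char == "%":
            # Self-escaping characters must appear in pairs.
            if body[index:index + 2] != char * 2:
                raise ValueError("unpaired " + char + " in string")
            index += 2
        else:
            index += 1
        contents.append(char)
    return "".join(contents)


print(decode_string_literal('"80 %% "'))          # 80 %
print(decode_string_literal('"voted ""No"""'))    # voted "No"
print(len(decode_string_literal("''''")))         # 1
```

The doubling exists only in the source notation, so the decoded contents are shorter than the text between the outer quotes.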
Strings enclosed in 'single quotes' and "double quotes" are equivalent with a single exception: if the contents of a string is a filename, only double quotes may be used, because the apostrophe is a valid character when used in filenames on most filesystems. More examples of string definitions:
|0000:3830202520 |DB "80 %% "
|0005:766F74656420224E6F22 |DB "voted ""No"""
|000F: |DB '' ; Empty string.
|000F:27 |DB "'" ; Single apostrophe.
|0010:27 |DB '''' ; Single apostrophe.
|0011: |; Examples of invalid syntax (odd number of quotes):
|0011: |DB """
|### E6721 Invalid data expression """"".
|0011: |DB "It ain't necessarilly so'
|### E6721 Invalid data expression ""It ain't necessarilly so'".
|0011: |
The processor, otherwise known as the Central Processing Unit (CPU), operates with data and communicates with its environment (registers, memory and devices). A typical operation reads a piece of information from a register, memory or port (I/O device), manipulates the data and writes it back to the environment. The smallest addressable unit is a single byte (1 B), and the number of addressable bytes is limited by the addressing space. A register is identified by its name, a device is identified by its port number, a byte in memory is identified by its address.
CPU mode | GPR | I/O port | Memory addressing |
---|---|---|---|
16-bit | 8* 2 B | 64 KB (2^16) | 1 MB (2^(16+4)) |
32-bit | 8* 4 B | 64 KB (2^16) | 4 GB (2^32) |
64-bit | 16* 8 B | 64 KB (2^16) | 16384 PB (2^64) |
Addressing space is limited by the CPU architecture and by the number of wires connecting addressing pins between the CPU and the memory chips. A combination of logical zeros and ones, which can be measured on those wires, is called physical address (PhA).
From an application programmer's point of view, the processor writes or reads from virtual address (VA). If the memory segmentation is not taken into account, virtual address is sometimes called linear address (LA). As a matter of historical fact both virtual and physical address were identical only in first generations of processors operating in real mode without memory cache and memory paging.
The objects in the linked image of a protected-mode program are often addressed with an offset from the beginning of an image loaded in memory (from the ImageBase). Such offset is called relative virtual address (RVA).
Similarly, the position of a data item in a file format is sometimes identified with a file address (FA), which is defined as the distance between the start of the file and the actual position of the data item in this file.
PhA, VA, LA, RVA, FA are plain non-negative integer numbers, but addressing objects or data at assembly time is rather more complicated. For historical reasons, the addressing space is divided into segments of memory and each segment is identified by the contents of a segment register. An address at assembly time is expressed as the segment identification together with the number of bytes between the position and the start of its segment (the position lies that many bytes off the start, hence the name offset). See also the chapters Address symbols and Address expressions.
Data and code are retrieved from memory faster when their address is aligned, which means that it is rounded up to a multiple of a power of two. Even though most IA-32 CPU instructions can cope with unaligned data, it takes more time when the data read from memory are not in the same cache page and the CPU may need to shift the information internally during fetch time.
For the best performance, memory variables should be aligned to their natural alignment which corresponds with their size, see the Autoalign column in the Data types table. Doublewords, for instance, have autoalign value 4, which says that the last two bits of a properly aligned address should be zero. QWORDs are aligned to 8, therefore the last three bits (8=2^3) should be zero.
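The rounding described above is a simple bit operation. The following Python sketch shows the usual way to compute an aligned offset and the number of filler bytes needed. The function names align_up and padding are chosen for this illustration and are not part of €ASM.

```python
def align_up(offset, alignment):
    """Round offset up to the nearest multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (offset + alignment - 1) & -alignment


def padding(offset, alignment):
    """Number of filler bytes needed in front of the aligned item."""
    return align_up(offset, alignment) - offset


print(hex(align_up(0x13, 4)))   # 0x14, the last two bits are zero
print(hex(align_up(0x11, 8)))   # 0x18, the last three bits are zero
print(padding(0x13, 4))         # 1
```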
This alignment can be achieved explicitly with ALIGN pseudoinstruction, or with the ALIGN= keyword given in machine instruction or in PROC and PROC1 pseudoinstructions.
Memory variables are aligned by €ASM implicitly when the EUROASM option AUTOALIGN=ON is set. For instance the statement SomeDword: DD 1234 is autoaligned by 4 (the offset of SomeDword can be divided by 4 without a remainder). An important concept is the alignment stuff, which fills the space in front of the aligned instruction. It is zero 0x00 in data segments and NOP 0x90 or multibyte NOP in code segments.
The align value may be a numeric expression which evaluates to 1, 2, 4, 8 or a higher power of two. €ASM accepts without warning a zero or an empty value, too, which is identical to ALIGN=1 (it has no effect). Beside the numeric values ALIGN also accepts the enumerated values BYTE, WORD, DWORD, QWORD, OWORD, YWORD, ZWORD or their short versions B, W, D, Q, O, Y, Z.
Alignment is always limited by the alignment of the segment in which the statement lies. If the current segment is DWORD aligned, we cannot ask for a QWORD or an OWORD alignment in this segment. The default segment alignment is OWORD (10h) in €ASM and it is increased to SectionAlign (usually 1000h) when the assembled program is in ELF or PE/DLL format.
Beside the instruction modifier ALIGN= the alignment may also be established with the explicit ALIGN pseudoinstruction, which allows for intentional disalignment, too.
Though a register remembers information written to it, it is not a part of the addressable memory. Registers can be referenced by their names only, they have no address.
Family | REGTYPE# | Members | Size |
---|---|---|---|
GPR 8-bit | 'B' | AL, AH, BL, BH, CL, CH, DL, DH, DIB, SIB, BPB, SPB, R8B, R9B, R10B, R11B, R12B, R13B, R14B, R15B, DIL, SIL, BPL, SPL, R8L, R9L, R10L, R11L, R12L, R13L, R14L, R15L | 1 |
GPR 16-bit | 'W' | AX, BX, CX, DX, BP, SP, SI, DI, R8W, R9W, R10W, R11W, R12W, R13W, R14W, R15W | 2 |
GPR 32-bit | 'D' | EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI, R8D, R9D, R10D, R11D, R12D, R13D, R14D, R15D | 4 |
GPR 64-bit | 'Q' | RAX, RBX, RCX, RDX, RBP, RSP, RSI, RDI, R8, R9, R10, R11, R12, R13, R14, R15 | 8 |
Segment | 'S' | CS, SS, DS, ES, FS, GS | 2 |
FPU | 'F' | ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7 | 10 |
MMX | 'M' | MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7 | 8 |
XMM | 'X' | XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM16, XMM17, XMM18, XMM19, XMM20, XMM21, XMM22, XMM23, XMM24, XMM25, XMM26, XMM27, XMM28, XMM29, XMM30, XMM31 | 16 |
AVX | 'Y' | YMM0, YMM1, YMM2, YMM3, YMM4, YMM5, YMM6, YMM7, YMM8, YMM9, YMM10, YMM11, YMM12, YMM13, YMM14, YMM15, YMM16, YMM17, YMM18, YMM19, YMM20, YMM21, YMM22, YMM23, YMM24, YMM25, YMM26, YMM27, YMM28, YMM29, YMM30, YMM31 | 32 |
AVX-512 | 'Z' | ZMM0, ZMM1, ZMM2, ZMM3, ZMM4, ZMM5, ZMM6, ZMM7, ZMM8, ZMM9, ZMM10, ZMM11, ZMM12, ZMM13, ZMM14, ZMM15, ZMM16, ZMM17, ZMM18, ZMM19, ZMM20, ZMM21, ZMM22, ZMM23, ZMM24, ZMM25, ZMM26, ZMM27, ZMM28, ZMM29, ZMM30, ZMM31 | 64 |
Mask | 'K' | K0, K1, K2, K3, K4, K5, K6, K7 | 8 |
Bound | 'N' | BND0, BND1, BND2, BND3 | 16 |
Control | 'C' | CR0, CR2, CR3, CR4, CR8 | 4 |
Debug | 'E' | DR0, DR1, DR2, DR3, DR6, DR7 | 4 |
Test | 'T' | TR3, TR4, TR5 | 4 |
Register names are case insensitive. General Purpose Registers (GPR) are aliased, for instance AL is another name for the lower half of AX, which is the lower half of EAX, which is the lower half of RAX.
Similarly, SIMD (AVX) registers are aliased as well: XMM0 is another name for the lower half of YMM0, which is the lower half of ZMM0.
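Register aliasing can be pictured as bit masking of one wide value. The Python sketch below models RAX as a 64-bit integer and derives its aliases from it. The sample value is arbitrary. AH is added as the upper half of AX, which is a general x86 fact rather than something stated in the paragraph above.

```python
rax = 0x1122_3344_5566_7788

eax = rax & 0xFFFF_FFFF     # lower half of RAX
ax = eax & 0xFFFF           # lower half of EAX
al = ax & 0xFF              # lower half of AX
ah = (ax >> 8) & 0xFF       # upper half of AX

print(hex(eax), hex(ax), hex(al), hex(ah))

# The same idea applies to SIMD registers: XMM0 is the low 128 bits of YMM0.
ymm0 = (0xAAAA << 128) | 0xBBBB
xmm0 = ymm0 & ((1 << 128) - 1)
print(hex(xmm0))
```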
Names of 8-bit registers DIB, SIB, BPB, SPB, R8B..R15B are aliases for the least significant byte of RDI, RSI, RBP, RSP, R8..R15. They may also be referred to as DIL, SIL, BPL, SPL, R8L..R15L, as used in the Intel manual. €ASM supports both suffixes ~L and ~B. Those registers are available in 64-bit mode only.
Some other assemblers and Intel manuals use the notation ST(0), ST(1)..ST(7) for Floating-Point Unit register names, but this syntax is not accepted in €ASM. Nor can the ST0 register be aliased as ST (top of the FPU stack).
The x86 processor contains some other registers which hold flags, descriptor tables, FPU control and status registers, but they are not listed in the table above because they are not directly accessible by their name.
The result of some CPU operations is treated as a predicate with mnemonic shortcut that can be used as a part of instruction name.
Some combinations of CPU flags ZF, CF, OF, SF, PF are given special names, so called condition codes. They are used in mnemonic of conditional branching using the jump instructions or in bit-manipulation general-purpose instructions.
Inverted code can be used in macroinstructions to bypass region of code when the condition is not met. See the automatic %variable inverted condition code.
Num. value | Mnemonic code | Alias | Description | Condition | Inverted mnem.code |
---|---|---|---|---|---|
0x4 | E | Z | Equal | ZF=1 | NE |
0x5 | NE | NZ | Not Equal | ZF=0 | E |
0x4 | Z | E | Zero | ZF=1 | NZ |
0x5 | NZ | NE | Not Zero | ZF=0 | Z |
0x2 | C | B | Carry | CF=1 | NC |
0x3 | NC | NB | Not Carry | CF=0 | C |
0x2 | B | C | Borrow | CF=1 | NB |
0x3 | NB | NC | Not Borrow | CF=0 | B |
0x0 | O | | Overflow | OF=1 | NO |
0x1 | NO | | Not Overflow | OF=0 | O |
0x8 | S | | Sign | SF=1 | NS |
0x9 | NS | | Not Sign | SF=0 | S |
0xA | P | PE | Parity | PF=1 | NP |
0xB | NP | PO | Not Parity | PF=0 | P |
0xA | PE | P | Parity Even | PF=1 | PO |
0xB | PO | NP | Parity Odd | PF=0 | PE |
0x7 | A | NBE | Above | CF=0 && ZF=0 | NA |
0x6 | NA | BE | Not Above | CF=1 || ZF=1 | A |
0x3 | AE | NB | Above or Equal | CF=0 | NAE |
0x2 | NAE | B | Not Above nor Equal | CF=1 | AE |
0x2 | B | NAE | Below | CF=1 | NB |
0x3 | NB | AE | Not Below | CF=0 | B |
0x6 | BE | NA | Below or Equal | CF=1 || ZF=1 | NBE |
0x7 | NBE | A | Not Below nor Equal | CF=0 && ZF=0 | BE |
0xF | G | NLE | Greater | SF=OF && ZF=0 | NG |
0xE | NG | LE | Not Greater | SF<>OF || ZF=1 | G |
0xD | GE | NL | Greater or Equal | SF=OF | NGE |
0xC | NGE | L | Not Greater nor Equal | SF<>OF | GE |
0xC | L | NGE | Less | SF<>OF | NL |
0xD | NL | GE | Not Less | SF=OF | L |
0xE | LE | NG | Less or Equal | SF<>OF || ZF=1 | NLE |
0xF | NLE | G | Not Less nor Equal | SF=OF && ZF=0 | LE |
| | CXZ | | CX register is Zero | CX=0 | |
| | ECXZ | | ECX register is Zero | ECX=0 | |
| | RCXZ | | RCX register is Zero | RCX=0 | |
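In the table above, every inverted mnemonic code has the numeric value of the original code with its lowest bit flipped. The following Python sketch demonstrates this relation. The dictionary is copied from the numeric column of the table, and the helper names are invented for this example. The codes CXZ, ECXZ and RCXZ are left out because the table gives them no numeric value.

```python
CONDITION_CODE = {
    "O": 0x0, "NO": 0x1,
    "C": 0x2, "B": 0x2, "NAE": 0x2, "NC": 0x3, "NB": 0x3, "AE": 0x3,
    "E": 0x4, "Z": 0x4, "NE": 0x5, "NZ": 0x5,
    "BE": 0x6, "NA": 0x6, "A": 0x7, "NBE": 0x7,
    "S": 0x8, "NS": 0x9,
    "P": 0xA, "PE": 0xA, "NP": 0xB, "PO": 0xB,
    "L": 0xC, "NGE": 0xC, "GE": 0xD, "NL": 0xD,
    "LE": 0xE, "NG": 0xE, "G": 0xF, "NLE": 0xF,
}


def inverted_value(mnemonic):
    """Numeric value of the opposite condition: flip the lowest bit."""
    return CONDITION_CODE[mnemonic.upper()] ^ 1


def inverted_names(mnemonic):
    """All mnemonic codes (including aliases) which denote the opposite condition."""
    value = inverted_value(mnemonic)
    return sorted(name for name, code in CONDITION_CODE.items() if code == value)


print(inverted_names("E"))    # ['NE', 'NZ']
print(inverted_names("A"))    # ['BE', 'NA']
print(inverted_names("nge"))  # ['GE', 'NL']
```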
Streaming Single Instruction Multiple Data Extension instructions (V)CMPccSS,(V)CMPccSD,(V)CMPccPS,(V)CMPccPD use different set of condition codes cc.
Only aliased mnemonic code is documented for legacy instructions CMPccSS,CMPccSD,CMPccPS,CMPccPD.
Num. value | Mnemonic code | Alias | Description |
---|---|---|---|
0x00 | EQ_OQ | EQ | Equal, Ordered, Quiet |
0x01 | LT_OS | LT | Less Than, Ordered, Signaling |
0x02 | LE_OS | LE | Less than or Equal, Ordered, Signaling |
0x03 | UNORD_Q | UNORD | Unordered, Quiet |
0x04 | NEQ_UQ | NEQ | Not Equal, Unordered, Quiet |
0x05 | NLT_US | NLT | Not Less Than, Unordered, Signaling |
0x06 | NLE_US | NLE | Not Less than or Equal, Unordered, Signaling |
0x07 | ORD_Q | ORD | Ordered, Quiet |
0x08 | EQ_UQ | | Equal, Unordered, Quiet |
0x09 | NGE_US | NGE | Not Greater than or Equal, Unordered, Signaling |
0x0A | NGT_US | NGT | Not Greater Than, Unordered, Signaling |
0x0B | FALSE_OQ | FALSE | False, Ordered, Quiet |
0x0C | NEQ_OQ | | Not Equal, Ordered, Quiet |
0x0D | GE_OS | GE | Greater than or Equal, Ordered, Signaling |
0x0E | GT_OS | GT | Greater Than, Ordered, Signaling |
0x0F | TRUE_UQ | TRUE | True, Unordered, Quiet |
0x10 | EQ_OS | | Equal, Ordered, Signaling |
0x11 | LT_OQ | | Less Than, Ordered, Quiet |
0x12 | LE_OQ | | Less than or Equal, Ordered, Quiet |
0x13 | UNORD_S | | Unordered, Signaling |
0x14 | NEQ_US | | Not Equal, Unordered, Signaling |
0x15 | NLT_UQ | | Not Less Than, Unordered, Quiet |
0x16 | NLE_UQ | | Not Less than or Equal, Unordered, Quiet |
0x17 | ORD_S | | Ordered, Signaling |
0x18 | EQ_US | | Equal, Unordered, Signaling |
0x19 | NGE_UQ | | Not Greater than or Equal, Unordered, Quiet |
0x1A | NGT_UQ | | Not Greater Than, Unordered, Quiet |
0x1B | FALSE_OS | | False, Ordered, Signaling |
0x1C | NEQ_OS | | Not Equal, Ordered, Signaling |
0x1D | GE_OQ | | Greater than or Equal, Ordered, Quiet |
0x1E | GT_OQ | | Greater Than, Ordered, Quiet |
0x1F | TRUE_US | | True, Unordered, Signaling |
Combination of punctuation characters is used in €ASM to prescribe various operations with numbers, addresses, strings and registers in the assembly process. Placing a binary operator between the two numbers tells €ASM to replace these three elements with the result of operation. Some operators are unary, they modify the value of operand which they stand in front of.
All operations implemented in €ASM are presented in the following table.
Operation | Priority | Properties | Left operand | Operator | Right operand | Result | II (6) |
---|---|---|---|---|---|---|---|
Membership | 16 | binary noncomm. (1) | identifier | . | identifier | identifier | |
Attribute | 15 | unary noncomm. (3) | | attr# | element | number or address | |
Case-insens. Equal | 14 | binary commutative (2) | string | == | string | boolean | CMPS |
Case-sens. Equal | 14 | binary commutative | string | === | string | boolean | CMPS |
Case-insens. Nonequal | 14 | binary commutative (2) | string | !== | string | boolean | CMPS |
Case-sens. Nonequal | 14 | binary commutative | string | !=== | string | boolean | CMPS |
Plus | 13 | unary (3) | | + | number | numeric | NOP |
Minus | 13 | unary (3) | | - | number | numeric | NEG |
Shift Logical Left | 12 | binary noncommutative | number | << | number | numeric | SHL |
Shift Arithmetic Left | 12 | binary noncommutative | number | #<< | number | numeric | SAL |
Shift Logical Right | 12 | binary noncommutative | number | >> | number | numeric | SHR |
Shift Arithmetic Right | 12 | binary noncommutative | number | #>> | number | numeric | SAR |
Signed Division | 11 | binary noncommutative | number | #/ | number | numeric | IDIV |
Division | 11 | binary noncommutative | number | / | number | numeric | DIV |
Signed Modulo | 11 | binary noncommutative | number | #\ | number | numeric | IDIV |
Modulo | 11 | binary noncommutative | number | \ | number | numeric | DIV |
Signed Multiplication | 11 | binary commutative | number | #* | number | numeric | IMUL |
Multiplication | 11 | binary commutative | number | * | number | numeric | MUL |
Scaling | 10 | binary commutative (5) | number | * | register | address expression | |
Addition | 9 | binary commutative | number | + | number | numeric | ADD |
Subtraction | 9 | binary noncommutative | number | - | number | numeric | SUB |
Indexing | 9 | binary commutative (5) | number | + | register | address expression | |
Bitwise NOT | 8 | unary (3) | | ~ | number | numeric | NOT |
Bitwise AND | 7 | binary commutative | number | & | number | numeric | AND |
Bitwise OR | 6 | binary commutative | number | \| | number | numeric | OR |
Bitwise XOR | 6 | binary commutative | number | ^ | number | numeric | XOR |
Above | 5 | binary noncommutative | number | > | number | boolean | JA |
Greater | 5 | binary noncommutative | number | #> | number | boolean | JG |
Below | 5 | binary noncommutative | number | < | number | boolean | JB |
Lower | 5 | binary noncommutative | number | #< | number | boolean | JL |
Above or Equal | 5 | binary noncommutative | number | >= | number | boolean | JAE |
Greater or Equal | 5 | binary noncommutative | number | #>= | number | boolean | JGE |
Below or Equal | 5 | binary noncommutative | number | <= | number | boolean | JBE |
Lower or Equal | 5 | binary noncommutative | number | #<= | number | boolean | JLE |
Numeric Equal | 5 | binary commutative | number | = | number | boolean | JE |
Numeric Nonequal | 5 | binary commutative (4) | number | != or <> | number | boolean | JNE |
Logical NOT | 4 | unary (3) | | ! | number | boolean | NOT |
Logical AND | 3 | binary commutative | number | && | number | boolean | AND |
Logical OR | 2 | binary commutative | number | \|\| | number | boolean | OR |
Logical XOR | 2 | binary commutative | number | ^^ | number | boolean | XOR |
Segment separation | 1 | binary noncommutative | number | : | number | address expression | |
Data duplication | 0 | binary noncomm. (1) (5) | number | * | datatype | data expression | |
Range | 0 | binary noncomm. (1) | number | .. | number | range | |
Substring | 0 | binary noncomm. (1) | text | [ ] | range | text | |
Sublist | 0 | binary noncomm. (1) | text | { } | range | text |
(1) Special operations Membership, Duplication, Range, Substring, Sublist are solved at parser level rather than by the €ASM expression evaluator. They are listed here only for completeness.
(2) Case insensitive string-compare operations ignore the character case of letters A..Z but not the case of accented national letters above ASCII 127.
(3) A unary operator applies to the following operand. Binary operators work with two operands. The Attribute operator applies to the following element or expression in parentheses/brackets.
(4) Numeric Nonequal operation has two aliased operators != and <>. You can choose whichever you like.
(5) Operations Multiplication, Scaling and Duplication share the same operator *. Similarly, Addition and Indexing share the operator +. The actual operation is determined by the operand types.
(6) Column II illustrates which equivalent machine instruction is used internally to compute the operation at assembly-time.
The commutative property specifies whether both operands of a binary operation can be exchanged without affecting the result.
The Priority column specifies the order of processing operators. Higher priority operations are computed sooner, but this can be changed with priority parentheses ( ). Operations with equal priority are computed in their notation order (from left to right).
Operations which calculate with signed integers have the operator prefixed with #. Operations Addition and Subtraction do not need a special "#signed" version because they compute with signed and unsigned integer numbers in the same way.
Both numeric and boolean operations return a 64-bit number. In case of boolean operations the result has one of two possible values: 0 (FALSE) or -1 = 0xFFFF_FFFF_FFFF_FFFF (TRUE).
For example the expression
'+' & %1 #>= 0 | '-' & %1 #< 0
is evaluated as
('+' & (%1 #>= 0)) | ('-' & (%1 #< 0))
and its result is the minus sign (45) if %1 is negative and the plus sign (43) otherwise.
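The trick in this example relies on boolean TRUE having all bits set, so that a bitwise AND with TRUE keeps the other operand and a bitwise AND with FALSE clears it. The Python sketch below models the fully parenthesized form of the expression. The names boolean and sign_character are illustrative only, and Python integers stand in for the 64-bit values.

```python
TRUE, FALSE = -1, 0   # boolean results: all bits set, or no bits set


def boolean(condition):
    """Convert a Python truth value to a TRUE/FALSE bit mask."""
    return TRUE if condition else FALSE


def sign_character(number):
    """Model of ('+' & (number #>= 0)) | ('-' & (number #< 0))."""
    return (ord("+") & boolean(number >= 0)) | (ord("-") & boolean(number < 0))


print(sign_character(-5))   # 45, the minus sign
print(sign_character(7))    # 43, the plus sign
```

Exactly one of the two masks is TRUE for any number, so exactly one of the two characters survives the bitwise OR.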
Spaces which separate operands and operators in expression examples serve only for better readability and they are not required by €ASM syntax.
Rich set of operators allows €ASM to get rid of cloned pseudoinstructions such as
IFE, IFB, IFIDN, IFIDNI, IFDIF, ERRIDNI, ERRNB...
The Shift operators family is given higher priority than in other languages because I treat shifts as a special kind of multiplication/division. NASM evaluates the expression 4+3<<2 as (4+3)<<2 = 28 but in €ASM it is evaluated as 4+(3<<2) = 16.
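The effect of operator priority can be tried out with a small precedence-climbing evaluator. This Python sketch is not the €ASM expression evaluator. It covers only a few operators and non-negative decimal numbers, takes the €ASM priorities from the operator table, and uses a hypothetical C-like table, invented here, in which shifts bind more loosely than addition.

```python
import re

# Priorities copied from the operator table (a higher number binds tighter).
EUROASM_PRIORITY = {"<<": 12, ">>": 12, "*": 11, "+": 9, "-": 9}
# Hypothetical C-like table: shifts bind more loosely than addition.
C_LIKE_PRIORITY = {"*": 11, "+": 9, "-": 9, "<<": 8, ">>": 8}

OPERATION = {
    "<<": lambda a, b: a << b,
    ">>": lambda a, b: a >> b,
    "*": lambda a, b: a * b,
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
}
TOKEN = re.compile(r"\s*(\d+|<<|>>|[-+*()])\s*")


def tokenize(text):
    """Split the expression into numbers, operators and parentheses."""
    tokens, position = [], 0
    while position < len(text):
        match = TOKEN.match(text, position)
        if match is None:
            raise ValueError("unexpected character at position %d" % position)
        tokens.append(match.group(1))
        position = match.end()
    return tokens


def evaluate(text, priority):
    """Evaluate the expression with the given operator priority table."""
    tokens = tokenize(text)
    position = 0

    def operand():
        nonlocal position
        if position >= len(tokens):
            raise ValueError("operand expected")
        token = tokens[position]
        position += 1
        if token == "(":
            value = expression(0)
            if position >= len(tokens) or tokens[position] != ")":
                raise ValueError("missing right parenthesis")
            position += 1
            return value
        return int(token)

    def expression(minimum):
        nonlocal position
        left = operand()
        while (position < len(tokens) and tokens[position] in priority
               and priority[tokens[position]] >= minimum):
            operator = tokens[position]
            position += 1
            # Equal priorities are computed from left to right.
            right = expression(priority[operator] + 1)
            left = OPERATION[operator](left, right)
        return left

    result = expression(0)
    if position != len(tokens):
        raise ValueError("unexpected token " + tokens[position])
    return result


print(evaluate("4+3<<2", EUROASM_PRIORITY))   # 16
print(evaluate("4+3<<2", C_LIKE_PRIORITY))    # 28
```

When in doubt, explicit parentheses give the same result with either table.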
An expression is a combination of operands, operators and priority parentheses ( ) which follows the rules in the table below.
What may follow | left parenthesis | unary operator | operand | binary operator | right parenthesis | end of expression |
---|---|---|---|---|---|---|
beginning of expression | yes | yes | yes | no | no | yes (2) |
left parenthesis | yes | yes | yes | no | yes (2) | no |
unary operator | yes | no | yes | no | no | no |
operand | no | no | no | yes | yes | yes |
binary operator | yes | yes (1) | yes | no | no | no |
right parenthesis | no | no | no | yes | yes | yes |
(1) A unary operator is permitted after a binary operator, e.g. 5*-3 evaluates as 5*(-3).
(2) Empty expression, empty parenthesis contents and superabundant parentheses are valid.
The table shows which combinations are permitted. It should be read by rows, for instance the first line stipulates that expression may begin with the left parenthesis, unary operator or an operand.
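The table can be read as a transition table and checked mechanically. The following Python sketch encodes each row as the set of element kinds which may follow. The kind names (lparen, unary, operand, binary, rparen) and the function name are assumptions made for this illustration. Like the table itself, the sketch does not verify that parentheses are balanced.

```python
MAY_FOLLOW = {
    "begin":   {"lparen", "unary", "operand", "end"},
    "lparen":  {"lparen", "unary", "operand", "rparen"},
    "unary":   {"lparen", "operand"},
    "operand": {"binary", "rparen", "end"},
    "binary":  {"lparen", "unary", "operand"},
    "rparen":  {"binary", "rparen", "end"},
}


def is_valid_sequence(kinds):
    """Check a sequence of element kinds against the table, row by row."""
    previous = "begin"
    for kind in list(kinds) + ["end"]:
        if kind not in MAY_FOLLOW.get(previous, set()):
            return False
        previous = kind
    return True


# 5*-3 is operand, binary operator, unary operator, operand.
print(is_valid_sequence(["operand", "binary", "unary", "operand"]))   # True
# Two unary operators in a row are not permitted.
print(is_valid_sequence(["unary", "unary", "operand"]))               # False
```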
An expression is parsed into elementary unary and binary operations, which are calculated according to their priority. Operations with the same priority are computed from left to right. Priority can be increased using parentheses ( ).
The result of a numeric or logical expression is a scalar 64-bit numeric value (signed integer). It may be treated as a number or as a logical value. A zero result is treated as boolean false and any nonzero result is boolean true. Pure logical expressions, such as logical NOT, AND, OR, XOR and all compare operations, return 0 when false and 0xFFFF_FFFF_FFFF_FFFF = -1 when true. This allows the result of a logical expression to be used in subsequent bitwise operations with all bits.
String compare expressions return a boolean value. Case insensitive versions convert both strings to the same case before actual comparing; however this concerns ASCII letters A..Z only. National letters with accents in any codepage are always compared case sensitively.
String compare is given the highest priority since no other assembly-time operation can be performed with strings beside the test of equality. At assembly time €ASM cannot tell which string is "bigger".
|00000000:FFFFFFFFFFFFFFFF | DQ "EAX" == "eax" ; TRUE, the strings are equal.
|00000008:0000000000000000 | DQ "EAX" === "eax" ; FALSE, the strings differ in character case.
|00000010:FFFFFFFFFFFFFFFF | DQ "I'm OK." === 'I''m OK.' ; TRUE, their netto value is equal.
|00000018:0000000000000000 | DQ "Müller" == "MÜLLER" ; FALSE because of the different case of umlauted U's.
|00000020:0000000000000000 | DQ "012" == "12" ; FALSE, the strings are not equal.
|00000028:0000000000000000 | DQ "123" = 123 ; FALSE; the character constant "123"=3355185 which is not 123.
|00000030: | DQ "123" == 123 ; Syntax error; right operand is not a string.
|### E6321 String compare InsensEqual with non-string operand in expression ""123" == 123".
|00000030:
Case insensitive string compare should be used with built-in €ASM elements, such as register or datatype names, e.g.
 %IF '%1' !== 'ECX'
   %ERROR Only register ECX is expected as the first macro operand.
 %ENDIF
When we are investigating the presence of punctuation, it's better to use case-sensitive compare, because it assembles faster (€ASM doesn't have to convert both sides to a common character case):
DoSomethingWithMemoryVar %MACRO
   %IF '%1[1]' !=== '[' ; Test if the 1st operand begins with a square bracket.
     %ERROR The first operand should be a memory variable in [brackets].
   %ENDIF
 %ENDMACRO DoSomethingWithMemoryVar
The test on square bracket in previous example fails if the macro operand is a string or character-constant in quotes, e.g. DoSomethingWithMemoryVar 'xyz'. Because of the syntax error, the string compare operation will raise
E6101 Expression "''' !=== '" is followed by unexpected character "[".
A trick how to avoid E6101 is to compare doubled values. In this case both single or double quotes escape themselves:
DoSomethingWithMemoryVar %MACRO
   %IF '%1[1]%1[1]' !=== '[[' ; Test if the 1st operand begins with a square bracket.
     %ERROR The first operand should be a memory variable in [brackets].
   %ENDIF
 %ENDMACRO DoSomethingWithMemoryVar
The numeric compare operations use a single equal sign =, optionally combined with < or > and they can compare values of two plain numbers or offsets of two addresses within the same segment.
Numeric compare can be used to test which side of operation is bigger. Terms above/below are used when comparing unsigned numbers or addresses. Terms greater/lower are used for comparing signed numbers. Operators which treat numbers as signed are prefixed with # modifier. Virtual addresses are always unsigned, therefore we cannot ask whether they are greater or lower.
|00000000:FFFFFFFFFFFFFFFF | DQ 5 < 7 ; TRUE, 5 is below 7.
|00000008:FFFFFFFFFFFFFFFF | DQ 5 #< 7 ; TRUE, 5 is lower than 7.
|00000010:0000000000000000 | DQ 5 #< -7 ; FALSE, 5 is not lower than -7.
|00000018:FFFFFFFFFFFFFFFF | DQ 5 < -7 ; TRUE, 5=0x0000_0000_0000_0005 is below -7=0xFFFF_FFFF_FFFF_FFF9.
|00000020:FFFFFFFFFFFFFFFF | DQ 123 = 0123 ; TRUE, both numbers are equal.
|00000028:0000000000000000 | DQ "123" == "0123" ; FALSE, both strings are different.
|00000030:0000000000000000 | DQ "123" = "0123" ; FALSE, both sides are treated as character constants with different values.
|00000038: | DQ "123" = "000000123" ; "000000123" is not a number, it's too big for a character constant.
|### E6131 Character constant "123" = "000000123" is too big for 64 bits.
|00000038: |
Common arithmetic operations are Addition, Subtraction, Multiplication, Division and Modulo (remainder after division).
Unary minus may be applied to scalar numeric operand only. Unary plus does not change the value of operand; it is included in the operator set only for completeness.
Adjacent binary and unary numeric operators are accepted by €ASM, however weird this may seem. This is useful in evaluation of expressions with substituted value, such as 5 + %1 where the symbolic argument %1 happens to be negative, e.g. -2. This expression is calculated as 5 + %1 -> 5 + -2 -> 5 + (-2) -> 3.
The greatest permitted value of integer number in €ASM source is 0xFFFF_FFFF_FFFF_FFFF -> 18_446_744_073_709_551_615 as unsigned, or 0x7FFF_FFFF_FFFF_FFFF -> 9_223_372_036_854_775_807 as signed.
Overflow at assembly time is ignored in Addition, Subtraction and Shift Logical operations. Assembly error is reported when overflow occurs during Multiplication and Shift Arithmetic Left operation, or when division-by-zero happens during Division or Modulo operation.
This maximum must not be exceeded even in intermediate results during the evaluation, such as 0x7FFF_FFFF_FFFF_FFFF * 2 / 2 (€ASM reports error). However, the rearranged expression 0x7FFF_FFFF_FFFF_FFFF * (2 / 2) assembles well.
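The effect of an overflowing intermediate result can be modelled outside the assembler. The following Python sketch is only an illustration: the function name `checked_mul` is hypothetical, and it assumes that multiplication is checked against the signed 64-bit range, which is consistent with the example above but is not taken from the €ASM sources.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1

def checked_mul(a, b):
    """Multiply and fail when the product leaves the signed 64-bit range (assumed rule)."""
    product = a * b
    if not INT64_MIN <= product <= INT64_MAX:
        raise OverflowError(f"64-bit overflow in {a} * {b}")
    return product

big = 0x7FFF_FFFF_FFFF_FFFF

# big * (2 / 2): the parenthesized part is computed first, nothing overflows.
print(checked_mul(big, 2 // 2) == big)

# big * 2 / 2: the intermediate product overflows before the division happens.
try:
    checked_mul(big, 2) // 2
except OverflowError as error:
    print("error:", error)
```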
No overflow is reported in following examples of numeric expressions evaluation:
|00000000:0E00000000000000 | DQ 2 + 3 * 4 ; Result is 14.
|00000008:0200000000000000 | DQ 0xFFFF_FFFF_FFFF_FFF9 + 0x0000_0000_0000_0009 ; Result is 2.
|00000010:0200000000000000 | DQ -7 + 9 ; Result is 2 (0xFFFF_FFFF_FFFF_FFF9 + 0x0000_0000_0000_0009).
|00000018:0200010000000000 | DQ 0xFFF9 + 0x0009 ; Result is 65538 (0x0000_0000_0000_FFF9 + 0x0000_0000_0000_0009).
|00000020: |
€ASM calculates with the integer truncated division and with Modulo at assembly-time in the same way as machine instruction IDIV.
Before the signed division applies, both dividend and divisor are internally converted to positive numbers. Then, having been divided as unsigned, the quotient is converted to negative if one of the operands (but not both) was negative.
Remainder in signed modulo operation is converted to negative only when the dividend was negative.
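The sign rules above differ from the floored division used by some high-level languages. As an illustration only, the steps can be sketched in Python; the function name `truncated_divmod` is hypothetical and the code is a model of the described rules, not an excerpt from €ASM.

```python
def truncated_divmod(dividend, divisor):
    """Divide the magnitudes as unsigned, then restore the signs as described above."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = abs(dividend) // abs(divisor)
    remainder = abs(dividend) % abs(divisor)
    if (dividend < 0) != (divisor < 0):   # exactly one operand was negative
        quotient = -quotient
    if dividend < 0:                      # remainder follows the dividend
        remainder = -remainder
    return quotient, remainder

print(truncated_divmod(-7, 2))   # truncated division, as IDIV does
print(divmod(-7, 2))             # Python's floored division, for comparison
```

In both conventions the identity quotient * divisor + remainder = dividend holds; only the rounding direction of the quotient differs.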
The shift operations are not commutative. Operand on the left side is treated as a 64-bit integer and shifted to the left or right by the number of bits specified by the operand on the right side.
Shift operations at assembly time are given higher priority than other numeric operations because they correspond with computing power of 2 rather than with multiplication or division. For instance 1 << 7 is equivalent to 1 * 2⁷.
NASM evaluates the expression 4 + 3 << 2 as (4 + 3) << 2 -> 28, but in €ASM it is evaluated as 4 + (3 << 2) -> 16.
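The difference between the two conventions can be reproduced with a tiny precedence-climbing evaluator. This Python sketch is a model only: the two priority tables are assumptions made for the illustration (the text above states only that shifts rank higher in €ASM), and the names `evaluate`, `SHIFT_LOW` and `SHIFT_HIGH` are hypothetical.

```python
import re

TOKEN = re.compile(r"\s*(\d+|<<|>>|[-+*/()])")

# Higher number = binds tighter. Both tables are illustrative assumptions.
SHIFT_LOW = {"<<": 1, ">>": 1, "+": 2, "-": 2, "*": 3, "/": 3}
SHIFT_HIGH = {"+": 1, "-": 1, "*": 2, "/": 2, "<<": 3, ">>": 3}

OPERATIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a // b,
    "<<": lambda a, b: a << b,
    ">>": lambda a, b: a >> b,
}

def evaluate(text, priority):
    """Evaluate non-negative integer expressions with the given operator priorities."""
    tokens = TOKEN.findall(text)
    position = 0

    def primary():
        nonlocal position
        token = tokens[position]
        position += 1
        if token == "(":
            value = binary(1)
            position += 1          # skip the closing parenthesis
            return value
        return int(token)

    def binary(minimum):
        nonlocal position
        left = primary()
        while position < len(tokens) and priority.get(tokens[position], 0) >= minimum:
            operator = tokens[position]
            position += 1
            right = binary(priority[operator] + 1)   # left-to-right for equal priority
            left = OPERATIONS[operator](left, right)
        return left

    return binary(1)

print(evaluate("4 + 3 << 2", SHIFT_LOW))    # shift after addition
print(evaluate("4 + 3 << 2", SHIFT_HIGH))   # shift before addition
```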
Bits which enter the least significant bit (LSb) during Shift Left operation are always 0. Bits which enter the most significant bit (MSb) during Shift Right operation are either 0 (Shift Logical Right), or they copy their previous value (Shift Arithmetic Right), thus preserving the sign of operand.
Bits which leave LSb during Shift Right are discarded. Bits which leave MSb during Shift Left are discarded, too, but overflow error E6311 is reported by €ASM when the sign of result (kept in MSb) has changed during Shift Arithmetic Left. Overflow sensitivity is the only difference between Shift Arithmetic Left and Shift Logical Left.
The right operand may be arbitrary number; however when it is greater than 64, the result is 0 with one exception: negative number shifted arithmetic right by more than 64 bits results in 0xFFFF_FFFF_FFFF_FFFF -> -1.
Shift by 0 bits does nothing. Shift by a negative number just reverses the direction of actual shift from left to right and vice versa.
Assembly-time rotate operations are not supported.
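The shift rules described above can be modelled on 64-bit values as follows. This Python sketch is an illustration, not €ASM code; the function names are hypothetical, the arithmetic variants handle non-negative shift counts only, and checking the sign bit after every single-bit step in `sal` is an assumption about how the overflow is detected.

```python
MASK64 = (1 << 64) - 1
SIGN = 1 << 63

def to_signed(value):
    """Interpret a 64-bit pattern as a signed integer."""
    value &= MASK64
    return value - (1 << 64) if value & SIGN else value

def shl(value, count):
    """Shift Logical Left; a negative count reverses the direction."""
    if count < 0:
        return shr(value, -count)
    return (value << count) & MASK64

def shr(value, count):
    """Shift Logical Right; a negative count reverses the direction."""
    if count < 0:
        return shl(value, -count)
    return (value & MASK64) >> count

def sar(value, count):
    """Shift Arithmetic Right: Python's >> on a negative int copies the sign."""
    return (to_signed(value) >> count) & MASK64

def sal(value, count):
    """Shift Arithmetic Left: fail as soon as the sign bit would change."""
    value &= MASK64
    for _ in range(count):
        shifted = (value << 1) & MASK64
        if (shifted ^ value) & SIGN:
            raise OverflowError("ShiftArithmeticLeft 64-bit overflow")
        value = shifted
    return value

print(hex(shr(0xFFEE_DDCC_BBAA_9988, 4)))
print(hex(sar(0xFFEE_DDCC_BBAA_9988, 4)))
print(to_signed(sal(-3, 2)))
```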
|00000000:0000010000000000 | DQ 1 << 16 ; The result is 65536.
|00000008:F4FFFFFFFFFFFFFF | DQ -3 #<< 2 ; The result is -12.
|00000010:8078675645342312 | DQ 0x1122_3344_5566_7788 << 4 ; The result is 0x1223_3445_5667_7880.
|00000018:98A9BACBDCEDFE0F | DQ 0xFFEE_DDCC_BBAA_9988 >> 4 ; The result is 0x0FFE_EDDC_CBBA_A998.
|00000020:98A9BACBDCEDFEFF | DQ 0xFFEE_DDCC_BBAA_9988 #>> 4 ; The result is 0xFFFE_EDDC_CBBA_A998.
|00000028:0000000000000000 | DQ 0x8000_0000_0000_0000 << 1 ; The result is 0x0000_0000_0000_0000.
|00000030: | DQ 0x8000_0000_0000_0000 #<< 1 ; Overflow, MSb would have been changed.
|### E6311 ShiftArithmeticLeft 64-bit overflow in "0x8000_0000_0000_0000 #<< 1".
|00000030: |
Bitwise NOT, AND, OR, XOR perform logical operation with the whole operands bit per bit.
|0000:FA | DB ~ 5 ; ~ 0000_0101b is 1111_1010b which is -6.
|0001:04 | DB 5 & 12 ; 0000_0101b & 0000_1100b is 0000_0100b which is 4.
|0002:0D | DB 5 | 12 ; 0000_0101b | 0000_1100b is 0000_1101b which is 13.
|0003:09 | DB 5 ^ 12 ; 0000_0101b ^ 0000_1100b is 0000_1001b which is 9.
Logical NOT, AND, OR, XOR operate with the numbers as well as with the boolean values. Each operand, which is internally stored as a nonzero 64-bit number, is converted to boolean true (0xFFFF_FFFF_FFFF_FFFF) before the actual logical operation. Operand with the value 0 is treated as false.
Numeric expressions operate with immediate numeric values, such as 1, 0x23, '4567' or with symbols representing such scalar numeric value, such as NumericSymbolTen EQU 10.
On the other hand, most symbols in a real assembler program represent address value which points to some data in memory or to some position in the program code.
While a plain number (scalar) is internally stored by €ASM in eight bytes, an address needs additional room to keep information of the segment it belongs to.
Imagine yourself driving a car. You're passing the milestone 123 on a highway when some friends of yours ring you up that they're passing the milestone 97. How far are you from one another? The answer is as easy as subtracting only when you are both driving on the same highway.
The set of operations defined with address symbols is very limited in comparison with numeric expressions. They cannot be multiplied, divided, shifted, logically operated. Only two kinds of operations are allowed with addresses:
Memory variables are addressed as the offset from the first byte of used memory segment (displacement) which may be updated at run-time with the contents of one or two registers. Notation of such address is called register expression or memory address expression.
Unlike instructions with immediate number embedded in the instruction code, such as ADD EAX,1234, machine instructions which load|store data somewhere from|to memory must have the entire operand enclosed in brackets [ ]. For instance ADD EAX,[1234], where 1234 is offset of dword variable in data segment where the addend is loaded from.
MASM allows to omit square brackets even when the operand is a variable defined in memory, for instance ADD EAX,Something. A poor reader of MASM program has to search for the definition of the variable to learn whether it was defined in memory (Something DD 1) or if it was defined as a constant (Something EQU 1). Newer assemblers abandoned this design flaw, luckily.
When the address expression is used in machine instruction, it may be completed with register names; it becomes register address expression.
Complete address expression follows the schema
segment: base + scale * index + displacement
where
- segment is segment register CS, DS, ES, SS, FS, GS,
- base is BX, BP in 16-bit addressing mode, or EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI, R8D..R15D in 32-bit addressing mode, or RAX, RBX, RCX, RDX, RBP, RSP, RSI, RDI, R8..R15 in 64-bit addressing mode,
- scale is a numeric expression which evaluates to a scalar number 0, 1, 2, 4 or 8,
- index is SI, DI in 16-bit addressing mode, or EAX, EBX, ECX, EDX, EBP, ESI, EDI, R8D..R15D in 32-bit addressing mode, or RAX, RBX, RCX, RDX, RBP, RSI, RDI, R8..R15 in 64-bit addressing mode,
- displacement is an address or numeric expression with magnitude (width) not exceeding the addressing mode.
Some assemblers allow different syntax of memory addressing, for instance,
MOV EAX,Displ[ESI],
MOV EAX,dword ptr [Displ+ESI],
MOV EAX,Displ+[4*ESI].
MOV EAX,Displ+4*[ESI]+[EBX]
EuroAssembler requires that the whole operand is surrounded in square brackets: MOV EAX,[Disp+4*ESI+EBX].
The order of components in addressing expression is arbitrary.
Any portion of register address expression may be omitted.
Scale is not permitted in 16-bit addressing mode and scale cannot be used if index register is not specified.
ESP and RSP cannot be used as index register (they cannot be scaled).
Addressing modes of different sizes cannot be mixed in the same instruction, e.g. [EBX+SI].
16-bit addressing mode is not available in 64-bit CPU mode.
16-bit addressing mode in 16-bit and 32-bit segment | |
---|---|
base register | BX SS:BP |
index register | SI DI |
displacement | 16-bit signed integer, sign-extended to segment's width at run-time |
32-bit addressing mode in 16-bit and 32-bit segment | |
base register | EAX EBX ECX EDX ESI EDI SS:EBP SS:ESP |
index register | EAX EBX ECX EDX ESI EDI EBP |
displacement | 32-bit signed integer, sign-extended|truncated to segment's width at run-time |
32-bit addressing mode in 64-bit segment | |
base register | EAX EBX ECX EDX ESI EDI SS:EBP SS:ESP R8D..R15D |
index register | EAX EBX ECX EDX ESI EDI EBP R8D..R15D |
displacement | 32-bit signed integer, sign-extended to segment's width at run-time |
64-bit addressing mode in 64-bit segment | |
base register | RAX RBX RCX RDX RSI RDI SS:RBP SS:RSP R8..R15 |
index register | RAX RBX RCX RDX RSI RDI RBP R8..R15 |
displacement | 32-bit signed integer, sign-extended to segment's width at run-time |
MOFFS addressing mode in 16-bit, 32-bit and 64-bit segment | |
base register | none |
index register | none |
displacement | unsigned integer of segment's width (16|32|64 bits) |
When the segment register is not explicitly specified, a default segment is used for addressing the operand. If BP, EBP, RBP, ESP or RSP is used as a base register, the default segment is SS, otherwise it is DS.
Nondefault segment register used for data retrieving may be specified either as an explicit instruction prefix SEGCS SEGDS SEGES SEGSS SEGFS SEGGS, or as a segment register which becomes part of the register expression (implicit segment override). The segment register may be included in expression either with colon : (segment separator) or with plus + (indexing operator).
There is a subtle difference between implicit and explicit segment override: if it requests the same segment register which is already used as a default, €ASM emits the prefix only when it is specified explicitly (in the prefix field of the statement):
|0000:8B04 | MOV AX,[SI]
|0002:8B04 | MOV AX,[DS:SI]
|0004:3E8B04 | SEGDS: MOV AX,[SI]
|0007:3E8B04 | SEGDS: MOV AX,[DS:SI]
See t3021, t3022, t3023 for more examples.
In expressions where scaling is not used and therefore it's not obvious which of the two registers is meant as an index, €ASM treats the leftmost register as a base. So in [ESI+EBP] the base is ESI and implicit segment is DS, while in [EBP+ESI] the implicit segment is register SS.
We don't have to bother with implicit segment selection in 32-bit and 64-bit FLAT model programs, because both SS and DS are loaded with the same segment descriptor at load-time.
Although the operators * or + in register address expression look like an ordinary multiplication or addition, they specify a very different kind of operation called Scaling or Indexing when applied to a register. The actual multiplication or addition is performed at run-time rather than at assembly-time, because the assembler cannot know the contents of registers.
Indexing operation has lower priority than the corresponding Multiplication. Hence, the register expression [EBX + 5 + ESI * 2 * 2] is evaluated as [EBX + 5 + ESI * (2 * 2)] -> [EBX + 5 + ESI * 4].
Data expression specifies static data declared with pseudoinstruction D or with literals.
Format of data expression is duplicator * type value, where duplicator is a non-negative integer number, type is primitive data type in full BYTE UNICHAR WORD DWORD QWORD TBYTE OWORD YWORD ZWORD INSTR or short B U W D Q T S O Y Z I notation, or a structure name. Optional value defines the contents of data which is repeated duplicator times.
Duplication is not a commutative operation; duplicator must be on the left side of duplication operator *. Default duplicator value is 1 (the data is not duplicated).
Nested duplication is not supported in €ASM.
Priority of duplication is very low, so the data expression 2 + 3 * B 4 is evaluated as five bytes where each contains the value 4.
Example:
 D 3 * BYTE ; Declare three bytes with uninitialized contents.
 D W 0x5 ; Declare one word with value 5.
 D 2 * U "some text" ; Declare Unicode (UTF-16) string containing "some textsome text".
 D 3 * MyStruc ; Declare three instances of structured memory variable MyStruc.
See also pseudoinstruction D and tests t2480, t2481, t2482 for more examples.
The remaining expressions are not calculated with mathematical expression evaluator; they are evaluated by the parser.
The fullstop (alias the point) . which joins two identifiers makes them a fully qualified name (FQN), which looks like a namespace identifier followed by the local name. FQN is nonlocal, it never starts with fullstop. For instance, when a local symbol .bar is declared in a procedure or structure Foo, it is treated by €ASM as symbol with FQN Foo.bar.
Namespace can be local, too, so the membership operation can nest.
Range is defined as two numeric expressions separated with range operator, which is .. (two adjacent fullstops) and it represents the set of integer numbers between those values, including the first and the last value.
A range has the property slope, which can be negative, zero or positive. Slope is defined as the sign of the difference between the right and the left value. Examples:
 0 .. 15 ; Range represents sixteen numbers from 0 to 15; slope is positive.
 -5 .. -4 ; Range represents values -5 and -4; slope is positive.
 3 .. 4 - 1 ; Range represents one value 3; slope is zero.
 2..-2 ; Range represents five values; slope is negative.
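The slope and the number of represented values follow directly from the two boundary values. A minimal Python sketch of that arithmetic is given below; the function name `describe_range` is hypothetical and the code is not part of €ASM.

```python
def describe_range(left, right):
    """Return (slope, count) of the range left..right, both bounds included."""
    difference = right - left
    slope = (difference > 0) - (difference < 0)   # sign of the difference: -1, 0 or 1
    count = abs(difference) + 1
    return slope, count

for left, right in [(0, 15), (-5, -4), (3, 4 - 1), (2, -2)]:
    print(left, right, describe_range(left, right))
```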
Substring is an operation which returns only part of the input text. Substring operator is a range enclosed in a pair of square brackets []. The text is treated as a sequence of 8-bit characters (bytes) and the range specifies which of them are used.
%Sample1 %SET ABCDEFGH ; Preprocessing variable %Sample1 now contains 8 characters.
DB "%Sample1[3..5]" ; This actually assembles as DB "CDE"
Sublist operation is similar to Substring with the difference that curly brackets {} are used instead of square brackets and that it treats the input text as an array of comma-separated items (in case of %variable expansion), or as a sequence of physical lines (in case of file inclusion).
INCLUDE "MySource.asm"{1..10} ; Include the first ten lines of file "MySource.asm"
Common properties of suboperations Substring and Sublist:
Suboperator is appended to the suboperated resource (text) without spaces.
Suboperations can be applied on four kinds of elements:
- %MyVar[2..8]
- INCLUDEBIN "tada.wav"[44..%&]
- euroasm "MySource1.asm"{0..24}
- BootSec PROGRAM OutFile="boot.sec"[0x7C01..0x7E00]
When applied to files, the file name must always be specified in double quotes.
%&
Ordinal number of the last character|item|line of input text is assigned by €ASM to an automatic preprocessing variable with the name %&. This %variable is valid only in the suboperation, it cannot be used outside the brackets.
You can use pseudoinstruction %SETS to get the number of characters assigned to a %variable, or pseudoinstruction %SETL to get the number of items in it (array length).
You can use attribute operator FILESIZE# to get the number of bytes in a file at assembly-time.
In Substring operation the value of automatic %variable %& specifies the number of characters assigned in the %variable or it specifies the size of the included file or the object file in bytes. In Sublist operation it represents the ordinal number of the last non-empty item in the %variable, or the number of physical lines in the included file.
A suboperated included file must be enclosed in double quotes even when its name doesn't contain spaces. The opening square bracket must immediately follow the input value (%variable name or the quote which terminates the filename). No white spaces are allowed between the %variable and the suboperation left bracket.
Suboperations are very tolerant about the range values. No warning is reported when they refer to a nonexisting character or item, for instance when the range member is zero or negative. Ranges with negative slope simply return nothing. Ranges with zero slope return one character|item|line when the index is between 1 and %&, otherwise they return nothing.
|4142434445464748 |%Sample %SET ABCDEFGH ; Variable %Sample now contains 8 characters.
|0000:4142434445 | DB "%Sample[-3..5]" ; DB "ABCDE"
|0005:434445464748 | DB "%Sample[ 3..99]" ; DB "CDEFGH"
|000B:43 | DB "%Sample[ 3..3]" ; DB "C"
|000C: | DB "%Sample[5..3]" ; DB ""
|000C:4142434445464748205B352E2E335D | DB "%Sample [5..3]" ; DB "ABCDEFGH [5..3]" ; Not a suboperation.
Suboperation range consists of three components: the minimum index, the range operator .. and the maximum index.
Some of those components may be omitted, they will be given the default value.
Default minimum index is 1. Default maximum index is %&.
|4142434445464748 |%Sample %SET ABCDEFGH ; Preprocessing variable %Sample now contains 8 characters.
|0000:4142434445 | DB "%Sample[..5]" ; -> DB "%Sample[1..5]" -> DB "ABCDE"
|0005:434445464748 | DB "%Sample[3..]" ; -> DB "%Sample[3..8]" -> DB "CDEFGH"
|000B:4142434445464748 | DB "%Sample[..]" ; -> DB "%Sample[1..8]" -> DB "ABCDEFGH"
|0013:4142434445464748 | DB "%Sample[]" ; -> DB "%Sample[1..8]" -> DB "ABCDEFGH"
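The tolerance rules and the default indices can be summarized in a short model. The Python sketch below is an illustration, not €ASM code: the function name `substring` is hypothetical, `None` stands for an omitted index, and an ordinary Python string is used in place of a sequence of bytes.

```python
def substring(text, low=None, high=None):
    """Model of text[low..high]: 1-based, inclusive, tolerant of out-of-range indices."""
    last = len(text)                        # plays the role of %&
    low = 1 if low is None else low         # default minimum index
    high = last if high is None else high   # default maximum index
    low = max(low, 1)                       # indices before the text are ignored
    high = min(high, last)                  # indices behind the text are ignored
    if high < low:                          # negative slope or nothing left
        return ""
    return text[low - 1:high]

sample = "ABCDEFGH"
print(substring(sample, -3, 5))   # DB "%Sample[-3..5]"
print(substring(sample, 3, 99))   # DB "%Sample[ 3..99]"
print(substring(sample, 3))       # DB "%Sample[3..]"
```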
All the following notations are identical in %variable expansion:
 %variable
 %variable[1..%&]
 %variable[..%&]
 %variable[1..]
 %variable[..]
 %variable[]
 %variable{1..%&}
 %variable{..%&}
 %variable{1..}
 %variable{..}
 %variable{}
The empty suboperation is useful for concatenation, when we need to append some literal text, for instance 123, to the %variable contents. We cannot write %variable123 because the appended digits change the name of original %variable. The solution is to use empty suboperation, which doesn't change the %variable contents but it separates its name from the successive text: %variable[]123 or %variable{}123.
When the range inside brackets contains only one index without range operator, it is treated as both minimum and maximum value and only one character|item|line is expanded: %Sample1[3] -> %Sample1[3..3] -> C.
Suboperations may be chained. The chain is processed from left to right. Example:
|4142432C4445462C2C4748492C4A4B4C |%Sample %SET ABC,DEF,,GHI,JKL ; %& is now 16 in %Sample[%&] and 5 in %Sample{%&}.
|0000:4A4B | DB "%Sample{4..5}[2..6]{2}" ; DB "JK"
The first sublist in previous example takes items nr. 4 and 5, giving the list of two items GHI,JKL. The next substring extracts characters from second to sixth from that sublist, giving HI,JK. The last sublist operation expands the second item, which is JK.
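The chained example can be traced step by step with a small model. This Python sketch is an illustration only: the helper names are hypothetical, and splitting the text on every comma is a simplifying assumption (it does not consider quotes or brackets, which the real €ASM parser may treat differently).

```python
def clip(low, high, last):
    """Clamp a 1-based inclusive range to 1..last; return None when it is empty."""
    low, high = max(low, 1), min(high, last)
    return (low, high) if low <= high else None

def substring(text, low, high):
    """Model of text[low..high] working on characters."""
    span = clip(low, high, len(text))
    return text[span[0] - 1:span[1]] if span else ""

def sublist(text, low, high):
    """Model of text{low..high} working on comma-separated items."""
    items = text.split(",")
    span = clip(low, high, len(items))
    return ",".join(items[span[0] - 1:span[1]]) if span else ""

sample = "ABC,DEF,,GHI,JKL"
step1 = sublist(sample, 4, 5)     # %Sample{4..5}
step2 = substring(step1, 2, 6)    # ...[2..6]
step3 = sublist(step2, 2, 2)      # ...{2}
print(step1, step2, step3)
```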
Suboperations may be nested. Inner ranges are calculated before the outer ones:
|31323334353637383930 |%Sample %SET 1234567890
|0000:3233343536 | DB "%Sample[2..%Sample[6]]" ; -> DB "%Sample[2..6]" -> DB "23456"
For each emitting statement the assembler generates some data or machine code which will be dumped to the output file in the end. Fortunately we don't have to write the whole program in the exact sequence which is required by the output file format. Assembled data and code is tossed on demand to one of several output sections. The statement, which will switch assembly to a different section, is quite simple: just the name of the section in square brackets [ ] in the label field of the statement.
Imagine that you (the programmer) act like a manager dictating some code and data to your secretary (EuroAssembler). You have dictated a few instructions, which were written in shorthand by your secretary on a sheet of paper labeled [TEXT]. Then you decided to dictate other kind of data. The secretary will grab another sheet, label it [DATA] and start to write there. Later, when you want to dictate some other instructions, your secretary takes the sheet labeled [TEXT] again, and continues from the point (origin) where it was interrupted.
You are free to open new sheets and to switch between them ad libitum. When the dictation ends, all used sheets will be stapled together (linked).
In EuroAssembler the term section is used for a named division of a segment. Each segment has one or more sections. By default any segment has just one section with identical name (base section) which was created at segment definition.
Intel Architecture divides memory to segments controlled by segment registers. Segment is defined in €ASM by the pseudoinstruction SEGMENT.
In the dawn of computer age, programmers demanded more memory than mere 256 bytes or 64 kilobytes which was addressable by 8-bit and 16-bit registers. Designers at Intel in pre-32-bit times might have chosen to use joinder of two 16-bit general registers, such as DX:AX or SI:BX and to address inconceivable 4 GB of memory with them, but they didn't. Instead, they invented new 16-bit segment registers specialized by the purpose of addressed memory: register CS for machine code, DS for data, SS for machine stack, ES for extra temporary usage.
Segment registers are used for addressing of 16 bytes long chunks of memory called paragraphs (alias octonary word, OWORD). Linear address in real CPU mode is calculated as a sum of
- 16-bit or 32-bit offset, and
- paragraph address, obtained from the segment register which is currently in charge, and multiplied by 16.
Using segment registers for addressing of 16-byte paragraphs yields 1 MB of memory addressable by each segment register, which seemed enough for everybody in those times.
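The real-mode address calculation is a single shift and addition. As an illustration, it can be written in Python as follows; the function name `linear_address` and the sample values are hypothetical.

```python
def linear_address(segment, offset):
    """Real-mode linear address: paragraph address times 16 plus offset."""
    return (segment << 4) + offset

# Two different segment:offset pairs may denote the same byte of memory.
print(hex(linear_address(0x07C0, 0x0000)))
print(hex(linear_address(0x0000, 0x7C00)))
```

The sketch does not model address wrapping above 1 MB.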
Contents of the segment register in real processor mode represents paragraph address of the segment.
Contents of the segment register in protected processor mode represents index to a descriptor table, which holds some auxiliary information about the addressed segment (beside its address and size limit): access privileges and width.
Those auxiliary properties are fixed in real mode:
- segment bottom address is specified with segment register contents multiplied by 16
- segment size limit is 64 KB in 16-bit addressing mode
- access privilege is allow everything
- segment width is 16 bits but using 32-bit offsets is also allowed on CPU 386 or newer.
Segment at run-time is a continuous range of operational memory addressable with the contents of one segment register.
Segment at link-time is a named part of object file, which can be concatenated with segments of the same name from other linkable files.
In [MS_PECOFF] terminology the linkable segment is called section. I think the term segment would be more appropriate here, because COFF "sections" are differentiated by access privileges as they are addressed by different segment registers, ergo by different segment descriptors.
In our segment-highway parable, segments in flat protected mode are highway lanes running in parallel, so they share common milestones (offsets), but each lane is dedicated to a different kind of vehicles.
Segment at write-time is a part of assembler source which begins with section switching statement, and which ends with another switching statement or with the end of program.
There is no ENDS (end-of-segment) directive in €ASM. It is not possible to say that this part of source code doesn't belong to any segment. When you write the very first statement of your source text, it already belongs to the default (envelope) program, and every program implicitly defines its default segments. Nevertheless, when a structure or numeric constant is being defined, it is irrelevant which segment is currently in charge, because structures and scalar symbols do not belong to any segment, no matter where the structure or symbol was defined in the source.
Segments and section divisions of assembler source do not have to be continuous. In fact, discontinuity is their main raison d'être. It allows keeping data in the source text near the code which manipulates it, and this is good for readability and understanding of program function.
When segments of assembler program are not too large, they may be coalesced into a segment group. The whole group of segments is addressable with one segment register. Group can be defined with pseudoinstruction GROUP.
When a group is defined, e.g. [DGRP] GROUP [DATA],[STRINGS], beside the group [DGRP] it automatically creates a segment with the same name [DGRP] (and consequently a section with the same name [DGRP]). It also declares that segments [DATA] and [STRINGS] belong to group [DGRP] together with its base segment [DGRP].
Nevertheless, when nothing is emitted to the implicitly defined segment [DGRP], it will be discarded in the end.
The relation between segment and its sections in EuroAssembler is similar to the relation between group and its segments.
Whenever a segment is defined (with the pseudoinstruction SEGMENT), a section with the same name is automatically created in it (it is called base section). Other sections of the same segment may be created on demand later. This is done by the statement which has only the section name in its label field (there is no explicit SECTION directive in €ASM).
Section properties (class=, purpose=, combine=, align=) are inherited from the segment which they belong to.
The alignment is not inherited when special literal sections [@LT64] .. [@LT1], [@RT0], [@RT1]..
are created; literal sections are aligned according to the type of data which they keep.
Whenever a group is defined (with the pseudoinstruction GROUP), a segment with the same name is created in it (it is called base segment), together with other segments which we want to incorporate to the group.
Each segment has one or more sections. Each section belongs to exactly one segment. During assembly time all segments are assumed to be loaded at virtual address 0. At the end of each assembly pass sections are virtually linked to their segment, so they begin at higher VA, where the preceding section ended. However, in pass 1 it is not known yet what size those sections will have, so all sections are assumed to start at VA=0 in pass 1. When the last assembly pass ends, all sections are linked physically (their emitted contents and relocations are concatenated to the segment=base section) and sections are then discarded. Linker is not aware of €ASM sections at all.
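The virtual linking of sections at the end of a pass amounts to a running sum of section sizes. The Python sketch below only illustrates that idea: the function name, the section names and the sizes are hypothetical, and any alignment padding between sections is ignored.

```python
def place_sections(sections):
    """Return the start VA of each section when they are laid back to back from VA 0."""
    starts = {}
    va = 0
    for name, size in sections:
        starts[name] = va      # a section begins where the preceding one ended
        va += size
    return starts

# Sizes as they might be known at the end of some assembly pass.
sizes = [("[DATA]", 0x40), ("[Strings]", 0x123), ("[Tables]", 0x10)]
print(place_sections(sizes))
```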
Why should we actually split a segment to sections? Well, it is not necessary, mostly we can get by with just one default section per segment. In big programs, on the other hand, it may be useful to group similar kind of data together; we may want to create separate section for double word sized variables, for floating-point numbers, for text strings. This may save a few bytes of alignment stuff, which would be necessary when variables of different sizes are mixed together. Also literal sections are organized in that way.
Another occasion where sections are handy is fast retrieving from read-only "databases" defined statically somewhere in data segment.
Database can be mentally visualized as a table with many rows and with columns containing data items of constant size. For fast selection of a particular row by an item of an "indexed" key value it is profitable to emit all items from one column sequentially to a section, one after another. The data from every column will have their own section. The width of "indexed" column should be padded to 1, 2, 4 or 8 bytes, so its items can be scanned with a single machine instruction REPNE SCAS. When an item is found, the difference between register rDI and the start of section identifies the selected row index. Remaining items of this row then can be addressed with the knowledge of row index.
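The column-wise layout can be pictured with a small model. This Python sketch is an illustration only: the three columns and their contents are hypothetical, and the linear search of `array.index` merely stands in for the scan which REPNE SCAS would perform on the indexed column.

```python
from array import array

# Hypothetical three-row "database" stored column by column, one column per section.
codes = array("H", [0x0101, 0x0202, 0x0303])   # indexed column, 2-byte items
names = ["alpha", "beta", "gamma"]             # second column
sizes = array("B", [4, 8, 16])                 # third column, 1-byte items

def find_row(code):
    """Scan the indexed column and return the row index, or None when not found."""
    try:
        return codes.index(code)
    except ValueError:
        return None

row = find_row(0x0202)
print(row, names[row], sizes[row])   # the other columns are addressed by the row index
```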
This access method was used in a sample project EuroConvertor and in EuroAssembler itself, where it assigns address of instruction handler to each of two thousand mnemonics, see DistLookupIi.
Each group has one or more segments. Each segment belongs to exactly one group (even when it wasn't explicitly grouped, a group with the segment's name will be implicitly created at link time for the addressing purposes). When a program with executable format is linked, all groups are physically concatenated into image. Loader of realmode executable image is not aware of groups and segments.
€ASM creates implicit segments when it starts to assemble a program. Implicit segment names depend on the chosen program format:
FORMAT= | Implicit segment names |
---|---|
BIN | [BIN] |
BOOT | [BOOT] |
COM | [COM] |
OMF, MZ | [CODE],[RODATA],[DATA],[BSS],[STACK] |
COFF, PE, DLL, ELF, ELFX, ELFSO | [.text],[.rodata],[.data],[.bss] |
If you are not satisfied with the implicit segments created by €ASM, you may redefine them at the start of program or create a new set of segments with different names. Segments and sections which were not used (nothing was emitted to them) will not be linked to output file and they can be ignored.
When the assembly ends and all segments from linked modules have been incorporated (combined) to the base program, €ASM looks at segments which are not part of any group, and creates implicit group for them (name of the group is the same as the segment). Here the memory model is taken into account:
Models with single code segment (TINY, SMALL, COMPACT) link all code into a single group, no matter how many code segments are actually defined in the program.
Multicode models (MEDIUM, LARGE) keep each code segment in its own implicit group (if they weren't grouped explicitly), hence intersegment jumps, calls and returns should have DIST=FAR.
Similarly, single data models (TINY, SMALL, MEDIUM) assume that all initialized and uninitialized data fit into one group not exceeding 64 KB, so the €ASM linker will assign all data segments to the implicit group, and register DS does not have to be changed when accessing data from various segments, which may have been defined in the base program or in the linked modules.
Name of the group, segment and section is always surrounded by square brackets in €ASM source.
Unlike with symbols, the namespace is not prepended to a segment name which starts with . (fullstop). Group, segment and section names are always nonlocal.
Number of characters in group|segment|section name is not limited by €ASM but it may be limited by the output format. In OMF object module the name of a group or segment must not exceed 255 characters. In PE COFF executables the name in section header is truncated to 8 characters.
€ASM treats all names as case sensitive. If you want to link your segment with object module produced by an external compiler which converts segment name to uppercase or which mangles the names by prepending underscores __, you should adapt your naming convention to it.
Segment name should be unique; you cannot define two segments with the identical name in a program, except for the implicitly created segments, if they were not used yet. However, it is possible to define segments with the same names in different programs and link them together; their contents will be concatenated according to their COMBINE= property. A similar rule applies to groups.
Section names cannot be duplicated on principle. When a section name appears in the source for the second time, it will only switch to that section rather than creating a new one.
Implicit literal section names begin with @LT or @RT, so you'd better avoid names which begin with this combination of letters.
Segments which have the dollar sign $ in their name are treated in a special way. If the characters on the left side of this $ match, all such segments will be linked adjacently in alphabetic order.
There are conventions how "sections" are named in COFF modules; you may need to adapt to them to successfully link an €ASM program with modules created by different compilers.
When €ASM creates a protected executable ELFX or PE 32-bit or 64-bit program format, we don't have to bother with segments, groups or stack at all. All segment registers are preloaded by Linux or Windows and the stack is established automatically.
When the DOS launches a tiny COM program, it loads CS=DS=SS=ES with the paragraph address of its PSP, sets IP=100h and SP to the end of the stack segment, usually 0FFFEh. Again, we don't have to bother with segment registers at all.
When a MZ executable program is prepared to start, its segment registers have been set by the DOS loader. CS:IP is set to the program entry point, SS:SP is set to the top of machine stack, but both DS and ES point to PSP, which is not our data segment.
There is no instruction in Intel architecture to load segment register with immediate value directly, so this is usually done via register or stack:
; Loading paragraph address of [DATA] to segment register
; using a general purpose register (which is faster):
MOV AX, PARA# [DATA]
MOV DS,AX
; or using the machine stack (which is shorter):
PUSH PARA# [DATA]
POP DS
It is the responsibility of programmer to load segment register with the address of another segment, whenever it is used. €ASM makes no assumption about the contents of segment registers; there is no ASSUME, USING, WRT directive in €ASM.
Order is generally based on four sorting keys:
Order of sections
At the end of each assembly pass all sections are linked to their segments in this order:
([@LT64], [@LT32],..[@LT1]).
([@RT0], [@RT1], [@RT2]..).
Order of segments
Segments are combined and linked at link time in this order:
Segments in each group are in the order as they were defined in the source (not as they were declared in the GROUP statement). The base segment is always the first in a group.
When an executable format is linked, every segment is assigned to some group, at least to the implicit one (with identical name).
Implicit groups of segments are used internally for relocation purposes only. Protected mode programs (MODEL=FLAT) do not care much about segment registers, so we don't have to bother with groups in programs for Windows or Linux.
Name | Segment purpose | Access | Size 32-bit | 64-bit | File alignment 32-bit | 64-bit | Remark |
---|---|---|---|---|---|
MZ DOS header | RW | 128 | 128 | 0) 2) | ||
MZ stub program | RW | 16 | 16 | 0) 2) | ||
PE signature | RW | 4 | 4 | 32 | 32 | 0) 2) | |
File header | R | 20 | 20 | 16 | 16 | 0) | |
Optional header | R | 224 | 240 | 16 | 16 | 0) 2) | |
Section headers | R | NrOfSe*40 | 16 | 16 | 0) | |
.text | CODE | RX | FiAl|SeAl | ||
.rodata | RODATA | R | FiAl|SeAl | ||
.data | DATA | RW | FiAl|SeAl | ||
.bss | BSS | RW | FiAl|SeAl | ||
.idata | IMPORT+IAT | RWX | 16|16 | 0) 2) | |
.edata | EXPORT | RW | 16|16 | 0) 2) 5) | |
.reloc | BASERELOC | RW | 16|16 | 0) 2) | |
.rsrc | RESOURCE | RW | 16|16 | 0) | |
Symbol table | (SYMBOLS) | R | NrOfSym*18 | 16 | 16 | 0) 1) 3) |
String table | (STRINGS) | 4 | 4 | 0) 1) 3) |
Name | Segment purpose | Access | Size 32-bit | 64-bit | File alignment 32-bit | 64-bit | Remark |
---|---|---|---|---|---|
File header | R | 52 | 64 | 0) | ||
Program headers | R | NrOfPh*(32|56) | 16 | 8 | 0) 2) | |
Section headers | R | NrOfSe*(40|64) | 8 | 16 | 0) | |
.symtab | SYMBOLS | NrOfSym*(16|24) | 16 | 8 | ||
.hash | HASH | R | 4 | 4 | 4) | |
.strtab | STRINGS | 1 | 1 | |||
.shstrtab | STRINGS | 1 | 1 | |||
.interp | RODATA | R | 1 | 1 | 4) | |
.plt | PLT | RX | NrOfJmp*16 | 16 | 4) |
.text | CODE | RX | FiAl|SeAl | ||
.rodata | RODATA | R | FiAl|SeAl | ||
.data | DATA | RW | FiAl|SeAl | ||
.bss | BSS | RW | FiAl|SeAl | ||
.dynamic | DYNAMIC | RW | NrOfRec*(8|16) | 8 | 16 | 4) |
Remarks:
0) Special structure without its own section header.
1) Used in relocatable module only.
2) Used in executable image only.
3) Used in executable image only when EUROASM DEBUG=ENABLED.
4) Used in executable image only when linked with shared object library.
Access rights:
R Allocate memory in process address space and allow read.
W Allow write.
X Allow execute.
FiAl|SeAl maximum of File Alignment | Segment Alignment.
Pseudoinstruction %DISPLAY Sections prints to the listing file a complete map of groups, segments and sections defined so far at assembly time, one object per line, represented by a debugging message D1260 (group), D1270 (segment), D1280 (section). Segment is indented with two spaces, section is indented with four spaces.
Instead of %DISPLAY Sections we could use %DISPLAY Segment or %DISPLAY Groups, the result is identical. The entire group/segment/section map is always displayed with those statements.
At link time €ASM prints a similar map of groups and segments to the listing, with the finally used virtual addresses, unless it was disabled with option PROGRAM LISTMAP=OFF.
The distance is a property of the difference between two addresses. It is not just the numeric difference of two offsets; in €ASM this term represents one of three enumerated values: FAR, NEAR, SHORT.
The distance of two addresses is FAR when they belong to different groups/segments, otherwise it is NEAR or SHORT. The difference of offsets is SHORT if it fits into an 8-bit signed integer, i.e. -128..+127.
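The rule for the three distance values can be written down as a small Python function. It is a sketch under an assumption: each address is modelled as a pair (group or segment name, offset), and the name `distance` as well as the sample names are made up for this illustration.

```python
def distance(group_a, offset_a, group_b, offset_b):
    """Classify the distance between two addresses as FAR, NEAR or SHORT."""
    if group_a != group_b:
        return "FAR"                    # different groups/segments
    diff = offset_b - offset_a
    if -128 <= diff <= 127:             # fits into a signed 8-bit integer
        return "SHORT"
    return "NEAR"

same_short = distance("[CODE]", 0x100, "[CODE]", 0x150)
same_near = distance("[CODE]", 0, "[CODE]", 200)
other = distance("[CODE]", 0, "[LIB]", 0)
```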
€ASM is a 64-bit assembler, but it can also compile programs for older CPUs which worked with 32-bit and 16-bit words only. The number of bits which the CPU works with simultaneously is called width and it is either 16, 32 or 64.
The width is a property of a segment. Some 32-bit object file formats allow mixing segments of different widths in one file. The width of addressing and operating mode can be changed ad hoc with the instruction prefixes ATOGGLE, OTOGGLE.
Pseudoinstruction PROGRAM has the WIDTH= property, too. It establishes the default for all segments declared in the program. Program width is also used to select the format of the output file, for instance whether a PE executable should be created as 32-bit or 64-bit.
Size is a plain non-negative number which specifies the number of bytes in object (register, memory variable, structure, segment, file etc). Size of a string is specified in bytes, no matter if the string is composed of ANSI or WIDE characters.
Size of an object can be computed at assembly time, using the attribute operator SIZE# or FILESIZE#.
Size of a preprocessing %variable contents can be retrieved with pseudoinstruction %SETS.
Size and length of €ASM elements (identifiers, numbers, structures, expressions, file contents, nesting depth, number of operands, etc.) are not limited by design, but such sizes are internally stored as signed 32-bit integers, so the actual limitation is 2_147_483_647 characters. In practice we will be restricted by the amount of available memory, of course.
The term length is used to count the number of comma-separated items in an array, for instance the length of the operand list in the statement VPERMI2B XMM1,XMM2,XMM3,MASK=K4,ZEROING=ON is 5.
Length of a preprocessing %variable contents can be retrieved with pseudoinstruction %SETL.
The names of symbols and structures created in a program must be unique. In large projects it might be difficult to maintain unique names, especially when more people work on separate parts of the program. That is why the programmer can use local identifiers which must be unique only in a division of the source file called namespace. The namespace is a range of the source specified by a namespace block. There are four block pseudoinstructions in €ASM which create a namespace: PROGRAM, PROC, PROC1, STRUC. The block name is also the name of the namespace.
An identifier is local when its name begins with a fullstop (.). Unlike with standard symbols, the characters following the leading fullstop may start with a decimal digit and it is not an error when they form a reserved name. Examples of valid local identifiers: .L1, .20, .AX.
Names of local identifiers are kept in €ASM internally concatenated with namespace name, so they form fully qualified name (FQN). Local symbols may be referred with .local name only within their native namespace block; they may also be referred with fully qualified name anywhere in the program.
The namespace actually starts at the operation field of the block statement and it ends at the operation field of the corresponding endblock statement. Thanks to this, the namespace itself (label of the block) may be local, too, and the namespaces may be nested.
MyProg PROGRAM ; PROGRAM starts the namespace MyProg. ;
;
Main PROC ; PROC starts inner namespace Main. ;
.10: RET ; Local label; its FQN is Main.10. ;
ENDP Main ; After ENDP we are in MyProg namespace again. ;
;
.Local PROC ; Its FQN is MyProg.Local. ;
.10: RET ; FQN of this label is MyProg.Local.10. ;
ENDP .Local ; MyProg.Local namespace ends right after ENDP.;
;
ENDPROGRAM MyProg
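How the fully qualified names in the listing above come about can be traced with a few lines of Python. This is a simplified model, not the €ASM implementation: the namespace nesting is kept as a plain list and the helper `fqn` is hypothetical.

```python
def fqn(name, namespaces):
    """Prepend the current namespace when the name starts with a fullstop."""
    if name.startswith(".") and namespaces:
        return namespaces[-1] + name
    return name

stack = ["MyProg"]                     # PROGRAM starts the namespace MyProg
stack.append(fqn("Main", stack))       # Main PROC: nonlocal name, namespace Main
first = fqn(".10", stack)              # local label inside Main
stack.pop()                            # ENDP Main
stack.append(fqn(".Local", stack))     # .Local PROC: local name, qualified by MyProg
second = fqn(".10", stack)             # local label inside MyProg.Local
stack.pop()                            # ENDP .Local
```

After the run `first` holds Main.10 and `second` holds MyProg.Local.10, the same names as in the comments of the listing.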
Beside the namespace blocks there is one more occasion where a namespace is unfolded: operand fields of the structured data definition statement, which temporarily take over the namespace of the structure which is being instantiated.
DateProg PROGRAM ; PROGRAM starts the namespace DateProg. ;
;
Datum STRUC ; Declaration of structure Datum creates namespace Datum. ;
.day DB 0 ;
.month DB 0 ;
.year DW 0 ;
ENDSTRUC Datum ; Namespace Datum ends right behind ENDSTRUC field. ;
;
[.data] ; Segment name is not local label, namespace is ignored here. ;
Birthday DS Datum, .day=1, .month=1, .year=1970 ;
;
; The previous statement defines 4 bytes long structured memory variable ;
; called Birthday in section [.data] and statically sets its members. ;
; On creating the variable "Birthday" €ASM uses properties ;
; declared as Datum.day, Datum.month, Datum.year (B,B,W). ;
; Members can be referred as Birthday.day, Birthday.month, Birthday.year.;
A symbol defined in the assembler program, such as label or memory variable, may be referred anywhere within the program at assembly time. Our program may be linked with other programs, object modules or libraries, which might have misused the same name for their own symbols, but it's OK and no conflict occurs because programs are compiled separately. This is the standard behaviour, such symbols have standard private scope and their visibility is limited to the inside of PROGRAM..ENDPROGRAM block.
When a symbol name begins with fullstop ., visibility of such private local name is even narrower; it is limited to the smallest namespace block in which the symbol was defined (PROC..ENDPROC, STRUC..ENDSTRUC).
On the other hand, executables which are linked from several programs (modules, libraries) need to access symbols outside their standard private scope, for instance to call an entry point of a library function. Names of such global symbols should be unique among all linked programs.
private | | Global | | | |
Standard | local | static link | | dynamic link | |
 | | Public | Extern | eXport | Import |
Scope of a symbol can be examined at assembly time with attribute operator SCOPE#, which returns ASCII value of uppercase scope shortcut, for instance
MySymbol EXTERN
MOV AL,SCOPE# MySymbol ; This is equivalent to MOV AL,'E'
Available shortcuts (S, P, E, X, I) are underlined in the table above. The same shortcuts are also used when symbol properties are listed by %DISPLAY Symbols and after the link phase if LISTGLOBALS=ENABLED.
GLOBAL, PUBLIC, EXTERN, EXPORT and IMPORT scope of a symbol can be explicitly declared by pseudoinstruction with the corresponding name. GLOBAL scope can be also declared implicitly, using two (or more) terminating colons :: after the symbol name. A symbol declared as GLOBAL is either available as PUBLIC (if it is defined in the same program), or it is marked as EXTERN (if it is not defined in the program).
Only the scopes for static linking (PUBLIC, EXTERN) can be declared by simplified global scope declaration (using two colons). When the symbol will be exported (if a DLL file is created), or when it should be dynamically imported from other DLL, using two colons is not enough and either explicit declaration EXPORT/IMPORT symbol or LINK import_library is required.
Word1: DW 1 ; Standard private scope.
Word2:: DW 2 ; Public scope declared implicitly (with double colon).
Word3 PUBLIC ; Public scope declared explicitly.
Word4 GLOBAL ; Public or extern scope (which depends on Word4 definition in this program).
Word5 GLOBAL ; Public or extern scope (which depends on Word5 definition in this program).
Word6 EXTERN ; Extern scope. Symbol Word6 must not be defined anywhere else in this program.
Word4: ; Definition of symbol Word4.
MOV EAX,Word5 ; Reference of external symbol Word5.
; Scope of Word1 is PRIVATE.
; Scope of Word2, Word3, Word4 is PUBLIC.
; Scope of Word5, Word6 is EXTERN.
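The way a GLOBAL declaration settles to PUBLIC or EXTERN can be replayed in Python on the Word1..Word6 example. This is a rough model only: the function `resolve_scopes` and its data structures are assumptions made for the illustration, and dynamic-link scopes are left out.

```python
def resolve_scopes(declared, defined):
    """declared: {name: 'PUBLIC' | 'EXTERN' | 'GLOBAL'}, defined: names defined in the program."""
    result = {name: "PRIVATE" for name in defined}   # standard scope by default
    for name, scope in declared.items():
        if scope == "GLOBAL":
            # GLOBAL becomes PUBLIC when the symbol is defined here, otherwise EXTERN.
            scope = "PUBLIC" if name in defined else "EXTERN"
        result[name] = scope
    return result

declared = {
    "Word2": "GLOBAL",   # implicit declaration with double colon
    "Word3": "PUBLIC",
    "Word4": "GLOBAL",
    "Word5": "GLOBAL",
    "Word6": "EXTERN",
}
defined = {"Word1", "Word2", "Word4"}
scopes = resolve_scopes(declared, defined)
```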
Information in computer memory or register represents the code or data. An important property of stored texts and numbers is the data type, which is a rule specifying how to interpret the information. €ASM recognizes the following types of data:
Typename | Short | Size | Autoalign | Width | Typical storage | Character string | Integer number | Floating-point number | Packed vector |
---|---|---|---|---|---|---|---|---|---|
BYTE | B | 1 | 1 | 8 | R8 | ANSI | 8-bit | | |
UNICHAR | U | 2 | 2 | 16 | R16 | WIDE | | | |
WORD | W | 2 | 2 | 16 | R16 | | 16-bit | | |
DWORD | D | 4 | 4 | 32 | R32,ST | | 32-bit | Single precision | |
QWORD | Q | 8 | 8 | 64 | R64,ST | | 64-bit | Double precision | |
TBYTE | T | 10 | 8 | 80 | ST | | | Extended precision | |
OWORD | O | 16 | 16 | 128 | XMM | | | | 4×D or 2×Q |
YWORD | Y | 32 | 32 | 256 | YMM | | | | 8×D or 4×Q |
ZWORD | Z | 64 | 64 | 512 | ZMM | | | | 16×D or 8×Q |
Typename | Short | Size | Autoalign | Usage |
---|---|---|---|---|
Structure name | S | variable | STRUC explicit alignment, otherwise program width | structured variables |
INSTR | I | variable | 1 | machine instructions |
Fundamental typenames are often abbreviated to their first letter. Data types in short or long notation are used for explicit static data definition with pseudoinstruction D, for implicit data definition in literals, as an alignment specification, or in instruction modifiers.
€ASM has some type awareness, though not as strong as in higher programming languages. For instance when processing the instruction INC [MemoryVariable] it looks at how MemoryVariable was defined and then it selects the appropriate encoding version (byte|word|dword).
Symbol in assembly language is an alias to a number or address.
Numeric symbol answers the question how many and address symbol answers the question where (at which position in the program).
Numeric symbol is defined with pseudoinstruction EQU or with its alias =, for instance Dozen EQU 12 or Gross = 144.
Address symbol is defined when its name appears in a label field of a statement.
Value of the numeric symbol is internally kept in 8 bytes (signed QWORD) but address symbols need additional information about the section to which they belong.
It is not possible in €ASM to define numeric symbol as a label of other statement than EQU, or as a solo label without operation field. Each program statement compulsorily belongs to some section (either explicitly defined or implicitly created when assembly of program block starts).
Symbol name is an identifier (a letter or fullstop optionally followed by other letters, fullstops and digits) which is not a reserved symbol name in either character case. Symbol name may always be terminated with one or more colons :, which helps to recognize the identifier as a symbol name. The colon itself is not a part of the symbol name. Symbols should have a self-explaining mnemonic name.
Termination of each symbol name with : is a good habit both when the symbol is defined and when it is referred, though many other assemblers do not support this. It's easier to copy&paste the symbol name without having to delete the colon at its end. The colon tells both the assembler and the human reader that the name represents a symbol, and it protects from mistakes when you choose a symbol name which accidentally happens to collide with one of the thousands of instruction mnemonics.
Structure names, register names (except for segment registers), or machine instruction mnemonics names are never colon-terminated.
Symbols and structures may be referred (used in statement) before they are actually defined. However, it's a good practice to define numeric symbols and structures at the beginning of the program, because forward references require additional program passes, which extends the duration of assembly.
Category | Reserved names |
---|---|
Assembly-time current pointer | $ |
Segment register names | CS, DS, ES, FS, GS, SS |
Prefix names | ATOGGLE, LOCK, OFTEN, OTOGGLE, REP, REPE, REPNE, REPNZ, REPZ, SEGCS, SEGDS, SEGES, SEGFS, SEGGS, SEGSS, SELDOM, XACQUIRE, XRELEASE |
Name of symbol may contain fullstop ., which usually connects namespace with symbol's local name. Leading . makes the symbol local, as it is in fact connected with the current namespace internally.
Creating symbol names which collide with names of registers or instructions is discouraged. If you really want to use one of those not recommended names for a symbol, it must always be followed with a colon, e.g.
Byte: DB 1 ; Define a symbol named "Byte".
MOV AX,Byte: ; Load AX with offset of the symbol.
In other cases, terminating symbol name with : is voluntary, but recommended.
Category | Not recommended names |
---|---|
Fundamental data types | B, BYTE, D, DWORD, I, INSTR, O, OWORD, Q, QWORD, S, T, TBYTE, U, UNICHAR, W, WORD, Y, YWORD, Z, ZWORD |
Register names | AH, AL, AX, BH, BL, BND0, BND1, BND2, BND3, BP, BPB, BPL, BX, CH, CL, CR0, CR2, CR3, CR4, CR8, CX, DH, DI, DIB, DIL, DL, DR0, DR1, DR2, DR3, DR6, DR7, DX, EAX, EBP, EBX, ECX, EDI, EDX, ESI, ESP, K0, K1, K2, K3, K4, K5, K6, K7, MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10L, R10W, R11, R11B, R11D, R11L, R11W, R12, R12B, R12D, R12L, R12W, R13, R13B, R13D, R13L, R13W, R14, R14B, R14D, R14L, R14W, R15, R15B, R15D, R15L, R15W, R8, R8B, R8D, R8L, R8W, R9, R9B, R9D, R9L, R9W, RAX, RBP, RBX, RCX, RDI, RDX, RSI, RSP, SEGR6, SEGR7, SI, SIB, SIL, SP, SPB, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7, TR3, TR4, TR5, XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM16, XMM17, XMM18, XMM19, XMM20, XMM21, XMM22, XMM23, XMM24, XMM25, XMM26, XMM27, XMM28, XMM29, XMM30, XMM31, YMM0, YMM1, YMM2, YMM3, YMM4, YMM5, YMM6, YMM7, YMM8, YMM9, YMM10, YMM11, YMM12, YMM13, YMM14, YMM15, YMM16, YMM17, YMM18, YMM19, YMM20, YMM21, YMM22, YMM23, YMM24, YMM25, YMM26, YMM27, YMM28, YMM29, YMM30, YMM31, ZMM0, ZMM1, ZMM2, ZMM3, ZMM4, ZMM5, ZMM6, ZMM7, ZMM8, ZMM9, ZMM10, ZMM11, ZMM12, ZMM13, ZMM14, ZMM15, ZMM16, ZMM17, ZMM18, ZMM19, ZMM20, ZMM21, ZMM22, ZMM23, ZMM24, ZMM25, ZMM26, ZMM27, ZMM28, ZMM29, ZMM30, ZMM31 |
Pseudoinstruction names | ALIGN, D, DB, DD, DI, DO, DQ, DS, DU, DW, DY, DZ, ENDHEAD, ENDP, ENDP1, ENDPROC, ENDPROC1, ENDPROGRAM, ENDSTRUC, EQU, EUROASM, EXTERN, GLOBAL, GROUP, HEAD, INCLUDE, INCLUDE1, INCLUDEBIN, INCLUDEHEAD, INCLUDEHEAD1, PROC, PROC1, PROGRAM, PUBLIC, SEGMENT, STRUC |
Machine instruction mnemonics | AAA, AAD, ... XTEST, see IiHandlers in €ASM source for the complete list. |
Numeric symbol is defined with pseudoinstruction EQU (or with its alias =) which specifies a number, numeric expression or other numeric symbol. Examples:
BufferSize: EQU 16K
WM_KEYDOWN = 0x0100
Total EQU 2*BufferSize
MOV ECX,BufferSize
Using a numeric symbol instead of the direct number notation has its advantages:
- When the symbol name is carefully chosen, e.g. BufferSize, it is self-explaining and we do not need to comment why we loaded ECX with this particular value 16K.
- If we decide to increase BufferSize during the program development, it is easier to change its value only at the one place where it is defined.
An address symbol is defined when it appears as a label of machine instruction or prefix, as a label of empty instruction or as a label of pseudoinstruction D*, PROC, PROC1.
Examples:
[DATA]
SomeValue: DD 4
[CODE]
MOV EAX,[SomeValue:]
StartOfLoop: CALL SomeProcedure:
DEC EAX
JNZ StartOfLoop:
While numeric symbol BufferSize was completely defined with its value, in case of address symbol SomeValue it is not sufficient. Instruction MOV EAX,SomeValue loads EAX with the symbol offset, i.e. with the distance between its position and the start of its segment. Address symbol is defined with two properties: its segment and offset. That is why an address symbol is sometimes called vector or relative symbol and a numeric symbol is called scalar or absolute symbol or constant.
There are five ways to create a symbol in EuroAssembler:
- Symbol is defined when its name occurs in the label field of a statement. Such symbol represents address within the section it was defined in, and the data or code emitted by the statement, too. The statement may be empty (solo label) or it may declare data, prefix or machine instruction. Pseudoinstructions PROC and PROC1 also define the symbol with their name, but pseudoinstructions PROGRAM, STRUC, SEGMENT do not.
- External and imported symbols are created with pseudoinstructions EXTERN, IMPORT or GLOBAL, or when they are referred with two colons appended to their name. Extern symbol is not defined in the current program, it must not appear in label field (with an exception of EXTERN pseudoinstruction itself, which declares it as external).
- Literal symbol is created when it is referred for the first time. It does not have an explicit name, in fact its name is represented by its value, for instance the instruction LEA ESI,[=D 123] creates a literal symbol, which is stored in €ASM symbol table under the pseudo-name =D 123.
- €ASM maintains a special dynamic symbol $ for each section, which represents the current assembly position in the section.
- Symbol can be defined with pseudoinstruction EQU or with its alias =. This is the only way to define a plain numeric symbol.
A special dynamic symbol $ represents the address of the next free position in emitted code at the beginning of assembly of the statement in which it is referred. Value of this symbol is not constant but it is changed by €ASM after an emitting statement has been assembled.
Programmer may change the offset of current origin $ with EQU pseudoinstruction, this is equivalent to pseudoinstruction ORG known from other assemblers.
See also the test t2551 or sample project boot16.
Some important symbol properties are available for further processing in a program at assembly time; they are called attributes. When a symbol is defined, it automatically gets its attributes. They can be referred by prefixing the symbol name with an attribute operator. An attribute operator is an identifier which defines the kind of attribute, immediately followed with #. The object which the attribute operator is applied to may be separated by zero or more white spaces and it may be in parentheses. For instance SIZE#SymbolName or SIZE# SymbolName or SIZE#(SymbolName).
Remember that the symbol name is case sensitive but the attribute name is not.
Attributes GROUP#, SEGMENT# and SECTION# return an address when applied to an address symbol; they return scalar zero when applied to a numeric symbol. Other attributes always return scalar (plain number).
Attribute OFFSET# returns the offset of symbol in the current segment as a plain number, i. e. the number of bytes between the start of the segment and the symbol itself. If the symbol is numeric, its value is returned.
Symbol and OFFSET#Symbol are identical only when Symbol is a scalar value, otherwise the former represents its address and the latter represents a plain number.
The expression Symbol - SEGMENT#Symbol is identical with OFFSET#Symbol for both numeric and address kind of symbols.
Attribute PARA# represents the paragraph address of beginning of the group that the symbol belongs to. It is the value which has to be loaded to the segment register which will be used for addressing. When PARA# is applied to a numeric symbol, it returns scalar zero.
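The relation between a paragraph address and a linear address in real mode can be checked with simple arithmetic; one paragraph is 16 bytes. The following Python lines are only an illustration with made-up addresses, and the helpers `para` and `linear` are hypothetical names, not €ASM operators.

```python
PARAGRAPH = 16  # bytes per real-mode paragraph

def para(group_linear_address):
    """Paragraph address of a group that starts at the given linear address."""
    return group_linear_address // PARAGRAPH

def linear(segment_register, offset):
    """Linear address formed by the CPU in real mode from a segment register and an offset."""
    return segment_register * PARAGRAPH + offset

ds = para(0x12340)            # value to be loaded into the segment register
address = linear(ds, 0x0010)  # address of a variable at offset 10h in the group
```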
Attribute GROUP# represents the address of beginning of the group that the symbol belongs to, i.e. address of the first byte of the first (lowest) segment of the group. When applied to a numeric symbol, it returns scalar zero.
Attribute SEGMENT# represents the address of beginning of the segment that the symbol belongs to. When applied to a numeric symbol, it returns scalar zero.
Attribute SECTION# represents the address of beginning of the section that the symbol belongs to. When applied to a numeric symbol, it returns scalar zero. If the symbol lies in default section (with the same name as its segment), both SECTION# and SEGMENT# attributes return identical address.
Attribute SCOPE# returns a number representing the ASCII value of the capital letter corresponding with the symbol scope, which can be 'E' for external symbols, 'P' for public symbols, 'X' for exported symbols, 'I' for imported symbols, 'S' for standard (private) symbols, or '?' when the symbol is undeclared.
SIZE# represents the amount of bytes emitted by the statement which defines the symbol. Typically it is the size of data defined with D pseudoinstruction or the size of machine instruction. Symbols defined with EQU pseudoinstruction or defined in non-emitting instruction have attribute SIZE# equal to zero.
Attribute TYPE# returns a number representing the ASCII value of a capital letter corresponding with the symbol type. It may be one of the fundamental data types 'B', 'U', 'W', 'D', 'Q', 'T', 'O', 'Y', 'Z', structured data type 'S' or machine instruction type 'I' when the symbol is defined with data definition pseudoinstruction D.
Numeric symbol returns type attribute 'N'.
Label of a machine instruction or machine prefix has type attribute 'I'.
Address symbols defined with just a label, or as a label of PROC | PROC1, and external symbols return 'A'.
Undefined symbol returns '?'.
Forward reference to a symbol will create its record in the symbol table. However, in the first pass its type attribute is '?' (undefined) until its definition is encountered. On the other hand, applying an attribute to an undefined symbol does not make it referred. That is why we may test with the pseudoinstruction %IF TYPE#Symbol = '?' whether the symbol is undefined in the program.
Beside symbols, some attribute operators may be applied to other elements: to a register, structure name, string, expression in parentheses () or square brackets [].
TYPE# of a register is 'R' and its SIZE# is equal to the register width in bytes (1,2,4,8,10,16,32,64).
TYPE# of a structure or segment is 'S' and SIZE# computes its size in bytes.
Why should we use SIZE# or TYPE# attributes when the queried symbol is defined by ourselves and therefore we already know its size and type? If we decide to change the text of Message later, we won't have to bother with recalculating its length.
Attribute operators are often used in macros to determine what type of operand the macro was provided with: whether it is a register, data symbol, immediate value etc. When we need to check in a macro if the provided operand %1 is a plain number, we could test this with the query %IF TYPE# %1 = 'N'.
See tests t16* for more attribute examples.
Detailed differentiation of data symbols, which attribute TYPE# yields, is sometimes not necessary. For instance we may need to distinguish whether the macro operand %1 needs relocation at link time. This happens when it is an address symbol or a memory variable which contains some address symbol. TYPE# DataSymbol or TYPE# [DataSymbol+RSI] may return 'A', 'B', 'W', 'D', 'Q', 'T' or whichever kind of data the DataSymbol was defined with. Otherwise it will return 'N' when the operand was a number which doesn't use relocation, such as TYPE# MAX_PATH_SIZE or TYPE# [RBP-16].
Here we may need to unify all kinds of address and external symbols with attribute operator SEGMENT#, which returns the relocatable address of the segment bottom, regardless of the datatype. Attribute TYPE# applied to such SEGMENT# attribute will always return 'A'. On the other hand, SEGMENT# ScalarSymbol and TYPE#(SEGMENT#ScalarSymbol) return 'N'.
%IF TYPE# (SEGMENT# %1) = 'A'
; %1 is address expression which requires relocation.
%ELSE
; %1 is nonrelocatable expression.
%ENDIF
Notice that chained attributes require parentheses. This is because all attribute operators have equal priority, so they are evaluated from left to right, and without parentheses the first operator would attempt to apply itself to another unary operator.
See also test t1695 for more examples.
Attribute TYPE# applied to a register returns the value 'R', regardless of the register family.
Sometimes it is useful to know the exact kind of register.
Attribute REGTYPE# returns a number representing the ASCII value of the capital letter corresponding with the register family. General-purpose registers return 'B', 'W', 'D', 'Q', SIMD registers return 'X', 'Y', 'Z', segment registers return 'S' etc. See the Registers table for the complete list.
When this attribute is applied to an element which is not a register, it returns '?'.
See also test t1648.
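A macro could use REGTYPE# to accept only one register family, as in this minimal sketch. The macro name ClearReg is hypothetical, and the sketch assumes that 'D' denotes the 32-bit general-purpose registers; the Registers table is the authoritative source for the letters.

```asm
ClearReg %MACRO Reg               ; Hypothetical macro name and operand name.
  %IF REGTYPE# %Reg = 'D'         ; Assumption: 'D' denotes a 32-bit general-purpose register.
    XOR %Reg,%Reg                 ; Clear the register.
  %ELSE
    %ERROR Macro "%0" expects a 32-bit general-purpose register.
  %ENDIF
 %ENDMACRO ClearReg

 ClearReg ECX                     ; Expected to assemble as XOR ECX,ECX
```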
Unlike previous attributes, FILESIZE# and FILETIME# can be applied only to files specified by their name, which must be surrounded with double quotes ". The filename may have an absolute path, a relative path, or no path; it is related to the current directory at assembly time.
Both file attribute operators investigate the file properties at assembly time.
FILESIZE# "filename" returns the number of bytes in the file, or 0 if the file was not available.
FILETIME# "filename" returns the timestamp of the file, i.e. the number of seconds between midnight, January 1st 1970 UTC and the last file modification. It returns 0 when the file was not found.
See also test t1690.
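Because both operators return 0 for an unavailable file, they can guard the assembly against a missing resource. A minimal sketch follows; the file name "logo.bmp" is hypothetical.

```asm
%IF FILESIZE# "logo.bmp" = 0      ; "logo.bmp" is a hypothetical file name.
  %ERROR File "logo.bmp" is missing or empty.
%ENDIF
```

Note that the test cannot tell a missing file from an existing file of zero length, since both yield 0.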
Literal symbols, alias literals, are similar to the standard assembler symbols.
The main difference is that they don't have an explicit definition and name.
A literal is defined whenever it is referred to, and its name is represented with an equal sign = followed by a data expression, for instance =D(5) or =B"Some text".
They may be duplicated, but unlike in the D pseudoinstruction (which may have many operands), only one data expression can be specified. Examples of instructions with literals:
 DIV [=W(10)]                                ; Divide DX:AX by an anonymous word memory variable with value 10.
 MOV DX,=B"This is a literal message.$"      ; Load DX with offset of a string defined ad hoc somewhere in data segment.
 LEA ESI,[=D 0]                              ; Load ESI with address of a DWORD memory variable which contains the value 0.
 CALL =I"RET"                                ; Push EIP and then load EIP with offset of machine instruction RET defined somewhere in code segment.
 LEA EBX,[=D 0,1,2,3]                        ; Error: multiple data expressions.
 MOV DX,=B"This is a literal message.",13,10 ; Error: multiple data expressions.
The first example declares a word variable =W(10). Without literals we would have to explicitly define a data variable Ten DW 10 somewhere in a data section and give it an explicit unique name.
The advantage of a literal is that we don't need to invent a unique symbol name and explicitly declare the symbol in a data section with the D pseudoinstruction. The data contents are visible directly in the instruction which uses the literal.
All literals are autoaligned according to their type, for instance =D 5 is DWORD aligned regardless of the current EUROASM AUTOALIGN= option.
String literals, such as =B"Some text" or =U"Some text", are always implicitly terminated with a byte or unichar zero when they are declared as literals.
€ASM allows simplified declaration of nonduplicated literal strings, where the type identifier (B or U) is omitted, e.g. ="Some text".
The actual type of string (B or U) is then determined by the system preprocessing variable %^UNICODE.
Implicit data definition with literals does not allow the programmer to control the exact location where the literals will be emitted. €ASM creates a subservient section for each type of data depending on their natural alignment. The literal section is created
- in the last segment with explicit purpose LITERAL and purpose RODATA or DATA;
- if no LITERAL segment exists, in the last segment with purpose RODATA;
- if no RODATA segment exists, in the last segment with purpose DATA;
- if no DATA|RODATA segment exists, in an implicit segment @LT which will be created with the purpose RODATA+LITERAL.
Names of literal sections are [@LT64], [@LT32], [@LT16], [@LT8], [@LT4], [@LT2], [@LT1].
Literals with INSTRUC data type, such as =8*I"MOVSD", are emitted to the subservient section [@RT0], which is similarly created in the segment with PURPOSE=CODE+LITERAL, or in the last code segment, or in an automatically created implicit code segment [@RT].
Repeated literals with the same declaration are reused; they represent the same memory variable. Literals with non-verbatim match, such as =W+4, =W 4 and =W(2+2), are stored separately as different symbols; nevertheless, their value is reused when it's identical, so it occupies common space in the literal section. Similarly =B"Some text", =B'Some text' and =B 'Some text' are different symbols, but those three symbols together will occupy only 9+1 bytes in literal section memory at run-time.
Although the programmer cannot be stopped from overwriting the literal value at run-time, this could corrupt behaviour of other parts of the program, which might be reusing the same literal data.
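The difference between an explicitly defined variable and a reused literal can be sketched as follows. The symbol name Ten is taken from the explanation above; the placement comments only illustrate the rules described in this chapter.

```asm
; Variant with a standard symbol:
Ten:  DW 10            ; Explicit definition, placed by the programmer in a data section.
      DIV [Ten]        ; Divide DX:AX by the word variable Ten.

; Variant with a literal:
      DIV [=W(10)]     ; The first use declares an anonymous word variable with value 10.
      DIV [=W(10)]     ; Verbatim the same declaration refers to the same memory variable.
```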
Property | Standard symbol | Literal symbol |
---|---|---|
Declaration | It is defined explicitly, with pseudoinstruction D or its clones, e.g. Dozen: DD 12 | It is declared when it is first used in any instruction, e.g. MOV ECX,=D 12 |
Name | Programmer must invent a unique symbol name. | Name of the literal symbol is created from its value. |
Position in object code | Placement of the symbol is fully in the programmer's hands. | The placement is not directly controlled by the programmer. |
Alignment | If required, it must be specified explicitly with pseudoinstruction ALIGN, or with modifier ALIGN= or with EUROASM option AUTOALIGN=. | Literals are always naturally aligned, as if EUROASM AUTOALIGN=ENABLED were set at their declaration. |
Alignment stuff | In order to minimize the necessary alignment stuff, the programmer should pay attention when mixing aligned data with different sizes. | Literal data of all sizes are packed together in descending order, which minimizes the alignment stuff between them. |
Multioperands | Data definition pseudoinstruction D and its clones support multiple operands, e.g. Hello DB "Hello, world",13,10,'$' | Multiple literal operands are not supported. |
String NUL termination | Only when explicitly declared, for instance Hello: DU "Hello, world",0 | Automatically, e.g. MOV ESI,=U "Hello, world" |
Duplication | Duplication is supported, e.g. FourDoublePrecOnes: DY 4 * Q 1.0 | Duplication is supported, e.g. VMOVUPD YMM7, [= 4 * Q 1.0] |
Value overwriting | Ad libitum. | This should be avoided. |
The structure is declared by a piece of assembly code represented with STRUC..ENDSTRUC block. The block declares names, datatypes, sizes and offsets of structure members. In OOP terminology the structure is a class and structured memory variable is an object. Example:
DATUM STRUC          ; Declaration of the structure (class) DATUM.
 .Year  D W
 .Month D B
 .Day   D B
      ENDSTRUC DATUM
Today DS DATUM       ; Definition of memory variable (object) Today.
Structure declaration creates symbols DATUM.Year, DATUM.Month, DATUM.Day with values 0, 2, 3 respectively. Those symbols are absolute (scalars) and they give names to relative offsets inside the structure.
Data definition creates a structured memory variable, the symbol Today.
At the same time it also creates symbols Today.Year, Today.Month, Today.Day.
Their addresses are defined somewhere in data or bss section; they are not scalars but have relocatable addresses.
Value of structure members is undefined (when the structured variable was defined in BSS segment) or it contains all zeroes (if defined in DATA segment).
Members of a structured memory variable can be defined statically at definition-time with keyword operands, for instance Today DS DATUM, .Day=31, see also pseudoinstruction DS.
Memory-variable member can be accessed directly, for instance
MOV [Today.Month],12
We could also use a register to address the whole memory-variable, and employ this register to address individual members with relative offsets specified in structure declaration:
MOV EDI,Today
MOV [EDI+DATUM.Month],12
More about structures see here.
€ASM program uses preprocessing variables (alias %variables) for easy manipulation of the source text at assembly-time. Hand in hand with macroinstructions they make a powerful tool to save the programmer's repetitive labour. The preprocessing apparatus does not affect the object code directly, as the plain assembler does. Instead, it manipulates the source text, which can be modified with %variables and repeated with preprocessing %pseudoinstructions.
Preprocessing variables always treat their contents as a sequence of characters, without inspecting its syntactic significance, no matter if they were assigned with literal text, string, numeric or logical expression or whatever.
Once assigned, the contents of %variable will be used (expanded) whenever the %variable appears in the source text (except for comments). Expansion takes place before the physical line of source file is parsed into the statement fields. By default the whole contents of %variable is expanded, but this can be limited with Substring or Sublist operation.
See also €ASM function Preprocessing.
%Variable family | User-defined | Formal | Automatic | System (EUROASM) | System (PROGRAM) | System (€ASM) |
---|---|---|---|---|---|---|
Name format | %identifier | %identifier | %spec.character(s) | %^option | %^option | %^fixed |
Case-sensitive | Yes | Yes | Yes | No | No | No |
(Re)assignable | explicitly with %SET* | indirectly by %FOR loop or %MACRO expansion | indirectly by macro expansion | indirectly by EUROASM option | indirectly by PROGRAM option | No |
Name of a user-defined %variable is represented with a percent sign % immediately followed by an identifier which is not a reserved %variable name, regardless of letter case. The identifier must begin with a letter and may not contain a fullstop or other punctuation.
Category | Reserved names |
---|---|
Pseudoinstructions | %COMMENT, %DEBUG, %DISPLAY, %DROPMACRO, %ELSE, %ENDCOMMENT, %ENDFOR, %ENDIF, %ENDMACRO, %ENDREPEAT, %ENDWHILE, %ERROR, %EXITFOR, %EXITMACRO, %EXITREPEAT, %EXITWHILE, %FOR, %IF, %MACRO, %PROFILE, %REPEAT, %SET, %SET2, %SETA, %SETB, %SETC, %SETE, %SETL, %SETS, %SETX, %SHIFT, %UNTIL, %WHILE |
User %variables are assigned (created) by the programmer with one of the %SET* family of pseudoinstructions.
%Variables may be reassigned later with a different value, they don't have to be unique in the source.
%Variables need not be assigned before the first use. An unassigned %variable expands to nothing (empty text). Once defined, a %variable cannot be unassigned; there is no %UNSET, UNDEFINE or UNASSIGN directive in €ASM. Nevertheless, setting a %variable to emptiness (e.g. %SomeVar %SET) is equivalent to unsetting it. €ASM reports no warning if it encounters a user-defined %variable which is empty, which has not been defined earlier or which is not defined in the source file at all.
See also test t7321.
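The life cycle of a user-defined %variable described above can be sketched as follows. The name %Greeting is hypothetical and the expansions in the comments follow from the rules stated in this chapter.

```asm
%Greeting %SET Hello                ; Hypothetical %variable name.
          DB "Say %Greeting now"    ; Expands to DB "Say Hello now"
%Greeting %SET Goodbye              ; Reassignment is allowed.
          DB "Say %Greeting now"    ; Expands to DB "Say Goodbye now"
%Greeting %SET                      ; Setting to emptiness is equivalent to unsetting.
          DB "Say %Greeting now"    ; %Greeting expands to nothing and no warning is reported.
```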
Symbols | User-defined %variables |
---|---|
are properties of PROGRAM | are properties of EuroAssembler |
their name never begins with % | their name always begins with % |
may have membership fullstop in their name | never have fullstop in their name |
are declared in label field of a statement | are assigned with %SET* pseudoinstruction |
have assembly attributes such as TYPE# and SIZE#. | are simply a piece of text without attributes |
may be forward referenced | cannot be forward referenced |
must be declared just once in a program | may be redeclared many times |
cannot be referenced if not declared somewhere in the main or linked program | may be referenced without declaration |
cannot be subject of sublist or substring operation | can be sublisted or substringed |
Formal %variable expands to a parameter value used in a %FOR loop or in %MACRO invocation. It is represented by an identifier which stands in the label field of the %FOR statement, or as an operand in the %MACRO prototype.
The scope of formal variables is limited to the block which is being expanded.
Count %FOR 1..8
        DB %Count
      %ENDFOR Count
The previous example generates eight DB statements which define byte values from 1 to 8. Identifier Count used in %FOR and %ENDFOR statements is the %FOR-control variable, which is accessible inside the %FOR block as a formal %variable %Count.
Formal variables are also used to access a macro operand by name during the macro expansion.
In the next example we have two %MACRO-formal variables provided in the %MACRO definition as identifiers Where and Stuff. In the macro body their values are available as formal %variables %Where and %Stuff.
Fill %MACRO Where, Stuff=0      ; Definition of macro Fill.
       MOV %Where,%Stuff
     %ENDMACRO Fill
; invocations of macro Fill:
     Fill [Counter], Stuff=255  ; Will be assembled as MOV [Counter],255
     Fill EBX                   ; Will be assembled as MOV EBX,0
Notice that formal %variables are always written without the percent sign when they are declared, but % must be prefixed to their name when they are referred in the %FOR or %MACRO body. This is important for inheriting of arguments in nested and recursively expanded macroinstructions, see t7233 as an example.
Scope of the formal %variables has higher priority than user-defined %variables with identical name, no matter if they were assigned outside or inside the scope. Reassignment of a %variable with formal name inside the macro body will assign the new value to the user-defined %variable, but inside the macro the value of formal %variable prevails, see t7347, t7362. %Variable with reassigned value will be visible outside the macro, though.
Automatic preprocessing variables are created and maintained by EuroAssembler at assembly time; their names contain punctuation characters and, unlike user-defined %variables, they cannot be explicitly reassigned with %SET pseudoinstruction.
The scope of automatic %variables is limited, using them outside their scope leads to an error.
Suboperation size, alias suboperation length, %& (percent sign followed by an ampersand) represents the number of characters, list items or physical lines in the suboperated object.
Its scope is constrained to the suboperation brackets [ ] or braces { }.
Automatic suboperation variable %& is created when the expansion of an included file or of another %variable uses suboperations.
When the substring operator [ ] is appended to the %variable name or to the included file name, the automatic variable %& can be used inside the brackets, e.g. [1..%&], and it represents the number of bytes in the expanded %variable or in the included "file".
For instance, when the user has assigned %aVariable with five letters %aVariable %SET ABCDE, then its size is 5 and the statement DB "%aVariable[4..%&]" expands to DB "DE".
When the sublist operator { } is appended to the %variable name, the contents of this %variable is treated as an array of comma-separated items and %& represents their count (ordinal number of the last nonempty item).
E.g. when the user has assigned %aReglist %SET ax,cx,dx,bx,bp then its length is 5 operands (items) and the statement MOV %aReglist{3},%aReglist{%&} expands to MOV dx,bp.
When the same sublist operator { } is appended to the included file name, the contents of the file is treated as a set of physical lines and %& represents the number of lines in the file. For instance INCLUDE "file.inc"{%&-9 .. %&} will include the last ten lines from "file.inc", because both ends of the range are inclusive.
Using the %& variable outside brackets will throw an error.
1 to %&.
The expansion counter %. (percent sign followed by a fullstop) maintains a decimal number which is incremented by €ASM in each expansion of a preprocessing block and can be used to create unique labels in repeating blocks.
Its scope is limited to the body of preprocessing blocks %MACRO, %FOR, %WHILE, %REPEAT. If used outside those blocks, it will expand to the single digit 0, see t7362.
If there is some private or local label declared within a macro or repeating block, and if the macro or block is expanded more than once, the same symbols will be defined more than once, and assembler treats that as an error. The identifier used as a label within macro or other expanding pseudooperations (%FOR, %REPEAT, %WHILE) should be unique. This can be achieved with the expansion counter embedded into symbol name.
See the example of macro AbortIf below.
The label Skip is postfixed with %., giving the label Skip%. which expands to Skip1 and which will expand to Skip2 on the next AbortIf invocation.
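The same technique can be sketched with a separate, hypothetical macro ZeroIfNegative, which is not part of €ASM. The actual numbers appended to the label depend on how many preprocessing blocks were expanded before, so the names in the comments are only examples.

```asm
ZeroIfNegative %MACRO Reg       ; Hypothetical macro name and operand name.
    TEST %Reg,%Reg              ; Set the sign flag according to the register contents.
    JNS Skip%.                  ; Non-negative: jump over the clearing instruction.
    XOR %Reg,%Reg               ; Negative: clear the register.
Skip%.:                         ; Label made unique by the expansion counter.
  %ENDMACRO ZeroIfNegative

  ZeroIfNegative EAX            ; The label expands to e.g. Skip1.
  ZeroIfNegative EBX            ; The label expands to a different name, e.g. Skip2.
```

Without %. both invocations would define the same symbol Skip, which the assembler treats as an error.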
%. helps to create unique symbol names.
All the following automatic macro %variables have their scope limited to the %MACRO block body. They refer to operands used when the macro is invoked (expanded).
If a label is used in a macro invocation, the label is by default placed in the first of expanded statements. This behaviour can be overridden when the automatic macro label %variable %: (percent sign followed by a colon) is explicitly declared somewhere in the macro definition. Only one such label may be defined in the macro. Resettlement of macro label may spare a few clocks when jumping to the macro expansion which begins with code which would have to be skipped, see the following example:
SaveCursor %MACRO Videopage=BH
%IF TYPE#CursorSave != 'W' ; If the memory variable CursorSave was not defined yet.
JMP %: ; Skip to $+4 (below the DW) when the macro is entered in normal statements flow.
CursorSave DW 0 ; Space for storing the cursor is reserved here in the code section.
%ENDIF
%: MOV AH,3 ; Entry point of the macro is here when the macro invocation is jumped to.
MOV BH,%Videopage
INT 10h ; Get cursor shape via BIOS API.
MOV [CursorSave],CX
%ENDMACRO SaveCursor
...
Save: SaveCursor Videopage=0 ; Use the macro in program.
...
 JMP Save:                   ; Jumps to the instruction MOV AH,3.
%: represents the "entry" of macro body.
See also test t7215.
Ordinal operands of the macro can be referred to by their ordinal number prefixed with a percent sign, for instance %1 or %2.
Unlike in batch scripts for DOS and Windows, their number is not limited to 9; any positive decimal number is possible, for instance %11.
Of course, when the eleventh operand is not specified in the macro invocation, %11 expands to nothing.
See also pseudoinstruction %SHIFT.
Automatic %variable %0 expands to the macro name.
Another way to refer to a macro operand (both ordinal and keyword) is to prefix the formal name of the operand with a percent sign.
When the ordinal number or formal operand name is prefixed with the logical NOT operator (exclamation mark !), it expands to the inverted condition code from that operand.
This requires that the referred operand contains a general condition code (case insensitive) such as E, NE, C etc. The operand contents will be replaced with the corresponding inverted code. €ASM reports an error if the operand did not contain a valid condition code.
NASM uses unary-minus operator - to achieve similar functionality. I believe that the logical-not operator ! is more appropriate for the inversion of logical values.
See the macro AbortIf above as an example.
Ordinal operand list %* (percent sign followed by an asterisk) is assigned with all ordinal operands from macro invocation, comma-separated. Keyword operands are omitted from the list.
Macro operands can be referred by various methods. The following example demonstrates three possible ways how to refer the macro ordinal operands:
CopyStr %MACRO FirstOp, SecondOp, ThirdOp ; Macro prototype.
          MOV ESI,%FirstOp                ; Using formal %variable name of the operand.
          MOV EDI,%2                      ; Using ordinal number of the operand.
          MOV ECX,%*{3}                   ; Using the third item of operand list.
          REP MOVSB
        %ENDMACRO CopyStr
        ...
        CopyStr Source, Dest, SIZE# Dest  ; invocation of the macro.
Length of the ordinal operand list (ordinal number of the last non-empty operand) is set to the ordinals count variable %# (percent sign followed by a pound character) and it represents the number of ordinal operands used in macro invocation (not the number declared in macro prototype).
The same length could be also obtained with %NrOfOrdinals %SETL %*.
List of keyword operands %=* is similar to the automatic variable %* but it contains only comma-separated keyword=value operands actually used in macro invocation.
Both %* and %=* can be used to make cloned macros with different names. For example
copystr %MACRO
          CopyStr %*, %=*
        %ENDMACRO copystr
This creates a clone of previously defined macro CopyStr but with a different name copystr. All operands used in invocation of copystr will be passed verbatim to CopyStr.
Keyword count variable %=# represents the number of keyword operands actually used in macro invocation (not the number declared in macro prototype).
See also t7364.
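The ordinals count %# can be used to validate a macro invocation, as in this minimal sketch. The macro name Exchange is hypothetical.

```asm
Exchange %MACRO                  ; Hypothetical macro without formal operand names.
   %IF %# != 2                   ; Number of ordinal operands actually used in the invocation.
     %ERROR Macro "%0" requires exactly two ordinal operands.
   %ENDIF
   XCHG %1,%2
 %ENDMACRO Exchange

 Exchange EAX,EBX                ; %# is 2, expected to assemble as XCHG EAX,EBX
 Exchange EAX                    ; %# is 1, the %ERROR statement is expected to report an error.
```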
EuroAssembler maintains a collection of preprocessing variables with the values specified by configuration parameters. Their current value can be tested at asm-time, so the assembly process can branch accordingly.
The name of a system variable consists of %^ followed by one of the enumerated identifiers.
Value of system %^variable cannot be assigned with %SET* pseudoinstruction; it is dynamically maintained by €ASM and it reflects the current value in charge.
%^DumpWidth %SETA 32 ; Use EUROASM DumpWidth=32 instead.
Programmer can influence the value of a system %^variable only indirectly, with options specified in the euroasm.ini configuration file or with EUROASM and PROGRAM pseudoinstructions.
Category | %variable names (case insensitive) |
---|---|
EUROASM | %^AES, %^AMD, %^AutoAlign, %^AutoSegment, %^CET, %^CodePage, %^CPU, %^CYRIX, %^D3NOW, %^Debug, %^DisplayEnc, %^DisplayStm, %^Dump, %^DumpAll, %^DumpWidth, %^EVEX, %^FPU, %^ImportPath, %^IncludePath, %^Interpreter, %^Linkpath, %^List, %^ListFile, %^ListInclude, %^ListMacro, %^ListRepeat, %^ListVar, %^LWP, %^MaxInclusions, %^MaxLinks, %^MMX, %^MPX, %^MVEX, %^NoWarn, %^Profile, %^Prot, %^Prov, %^RunPath, %^RTF, %^RTM, %^SHA, %^SIMD, %^Spec, %^SVM, %^TBM, %^TimeStamp, %^TSX, %^Undoc, %^Unicode, %^VIA, %^VMX, %^Warn, %^XOP |
PROGRAM | %^DllCharacteristics, %^Entry, %^FileAlign, %^Format, %^IconFile, %^ImageBase, %^ListGlobals, %^ListLiterals, %^ListMap, %^MajorImageVersion, %^MajorLinkerVersion, %^MajorOSVersion, %^MajorSubsystemVersion, %^MaxExpansions, %^MaxPasses, %^MinorImageVersion, %^MinorLinkerVersion, %^MinorOSVersion, %^MinorSubsystemVersion, %^Model, %^OutFile, %^SectionAlign, %^SizeOfHeapCommit, %^SizeOfHeapReserve, %^SizeOfStackCommit, %^SizeOfStackReserve, %^StubFile, %^Subsystem, %^TimeStamp, %^Width, %^Win32VersionValue |
€ASM | %^Date, %^EuroasmOs, %^Pass, %^Proc, %^Program, %^Section, %^Segment, %^SourceExt, %^SourceFile, %^SourceLine, %^SourceName, %^Time, %^Version |
EUROASM system %^variables are assigned with values specified in the [EUROASM] division of the euroasm.ini or with the pseudoinstruction EUROASM.
For description of system %variables of this category see the corresponding keyword of pseudoinstruction EUROASM.
PROGRAM system %^variables are assigned with values specified in the [PROGRAM] division of the euroasm.ini or with the PROGRAM pseudoinstruction.
For description of system %variables of this category see the corresponding keyword of the pseudoinstruction PROGRAM.
Value of €ASM system %variables is maintained by €ASM itself and the programmer cannot change them directly. They are described here:
All source files assembled with one command, such as euroasm source*.asm, will share the same %^Date and %^Time, which were set from the current local time at the moment when euroasm.exe launched.
A combination of €ASM system %^variables is used internally to identify the position of a statement in error messages: "%^SourceName%^SourceExt"{%^SourceLine}, e.g. "HelloWorld.asm"{3}
€ASM %^variable %^Section can be used to save and restore the current section|segment in macros. Together with the statement EUROASM PUSH it guarantees that the €ASM environment will not be modified by expanding a macro, even if the macro needs to change it temporarily.
aMacro %MACRO                  ; Declaration of a macro which needs to emit to its own private section.
         EUROASM PUSH          ; Save all EUROASM options on their own stack.
%BackupSec %SET %^Section      ; Save the current section name to a user-defined %variable.
[.MacroPrivateSection]         ; Switch to the desired section.
         ...                   ; Declare the macro body.
[%BackupSec]                   ; Switch back to the original section, whatever it was.
         EUROASM POP           ; Restore EUROASM options.
       %ENDMACRO aMacro
Another example using system €ASM %^variables:
%MonthList %SET Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
%Day %SETA %^DATE[7..8] ; Using %SETA instead of %SET will assign %Day with decimal numeric value to get rid of leading zero.
InfoMsg DB "This program was assembled with €ASM %^EuroasmOs ver.%^Version",13,10
        DB "on %MonthList{%^Date[5..6]} %Day-th, %^Date[1..4] at %^Time[1..2]:%^Time[3..4].",13,10,0
; InfoMsg now contains something like
; This program was assembled with €ASM Win ver.20081231
; on Feb 8-th, 2009 at 22:05.
Enumerated options, such as %^CPU, %^FORMAT, %^MODEL etc., are assigned as upper-case text. They can be tested at assembly time with string-compare operations.
Numeric options are always assigned as numbers in decimal notation where the positive sign + is omitted. They can be tested at assembly time with numeric-compare operations.
Boolean options, such as AutoSegment=, Priv= etc., are assigned to the corresponding system %^variables %^Autosegment, %^Priv as 0 (false) or -1 (true), no matter whether they were specified using enumerated tokens ON/OFF, YES/NO, TRUE/FALSE or with a logical expression.
They can be tested at assembly time with a boolean expression or directly as an operand of %IF, e.g. %IF %^UNDOC.
Range EUROASM options WARN= and NOWARN= are assigned to system %variables %^Warn, %^NoWarn as a series of 3999 digits 0 (false) and 1 (true).
The first digit reflects the current status of message I0001, the second I0002, the last W3999.
Example: %IF %^WARN[2820] will assemble the following statements only if message W2820 is currently enabled.
System %^variables can be used in macros to warn users of the macro that the €ASM environment is not set as desired. Examples:
%IF "%^MODEL" !== "FLAT"
  %ERROR Macro "%0" is intended for flat memory model only.
%ENDIF
%IF %^SizeOfStackCommit < 16K
  %ERROR This recursive macro requires stack size at least 16 KB.
%ENDIF
%IF %^Width = 64 && ! %^AMD
  %ERROR This 64-bit program for MS-Windows should have AMD=Enabled.
%ENDIF
%IF %^NoWarn[2101]
  %ERROR You shouldn't suppress W2101. Move unused symbols to an included file instead.
%ENDIF
There are three kinds of instructions in assembly language:
- machine instructions invented by the CPU manufacturer,
- pseudoinstructions invented by the assembler manufacturer,
- macroinstructions invented by the programmer.
A machine instruction is the smallest order which tells the CPU to perform some calculation or data manipulation at run-time.
EuroAssembler uses the Intel syntax where the first instruction operand specifies destination (which is often one of source operands, too), and one or more sources may follow.
This is the syntax used in CPU-vendor documentation and also in most other assemblers, with the exception of the Unix-based gas, which prefers an alternative paradigm represented by AT&T syntax with reversed operand order. For more differences between AT&T and Intel syntax see [ATTsyntax].
EuroAssembler implements machine instructions mnemonics as defined in specifications by CPU vendors. It also implements some undocumented instructions and instruction-format enhancements which are described below.
Some machine instructions allow alternative encoding of the same mnemonic; €ASM prefers the shortest one, if not instructed differently.
€ASM respects the mnemonic chosen by the programmer, therefore it never encodes e.g. LEA ESI,[MemoryVariable] as MOV ESI,MemoryVariable, although the latter encoding is one byte shorter. There are only two notable exceptions when the mnemonic is not obeyed:
MOVZX r64,r/m32, which is implemented as MOV r32,r/m32, as it uses the x86-64 side effect of zeroing the higher half of a 64-bit register when the lower half is being written to.
See the test t3043.
A machine instruction can manipulate registers and memory variables of different width, usually byte, word or doubleword operands.
However, CPU-manufacturer manuals define the same mnemonic regardless of data size.
For instance, SUB [MemoryVariable],4 tells the CPU to subtract the immediate number 4 from the contents of MemoryVariable, which might have been defined as DB, DW, DD or DQ. €ASM looks at the type of MemoryVariable and selects the appropriate encoding according to its size.
However, the offset might also be external or expressed as a register contents or plain number, such as in SUB [ESI],4, and the type of memory variable is unknown in this case.
One method to tell EuroAssembler which data width is desired is using an instruction suffix, which is one of the letters B W D Q S N F appended to the mnemonic name.
€ASM allows extending many general-purpose instructions with mnemonic suffix B, W, D, Q to specify operand size.
Transfer control instructions CALL, JMP, RET may be modified with suffix N or F which tells whether the distance of the target is near or far, i.e. if the target belongs to the same segment or if the segment descriptor value needs to change, too. The unconditional JMP instruction may be also completed with suffix S when the distance to the target can be encoded into 8 bits (-128..+127).
Suffix aware instructions in €ASM | Suffix |
---|---|
ADC, ADD, AND, CMP, CMPS, CRC32, DEC, DIV, IDIV, IMUL, INC, LODS, MOV, MOVS, MUL, NEG, NOT, OR, RCL, RCR, ROL, ROR, SAL, SAL2, SAR, SBB, SCAS, SHL, SHR, STOS, SUB, TEST, TEST2, XOR | B, W, D, Q |
BT, BTC, BTS, BTR, ENTER, HINT_NOP, IRET, LEAVE, POP, POPF, PUSH, PUSHF | W, D, Q |
PUSHA, POPA | W, D |
INS, MOVSX, MOVZX, OUTS | B, W, D |
XLAT | B |
CALL, RET | N, F |
JMP | S, N, F |
Using a suffix is not necessary in most cases because the width of the memory variable can be deduced from its type attribute or the width is determined by the register used as one of the operands.
An error is reported if the register is in conflict with the suffix, for instance in MOVW AL,[ESI].
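The following minimal sketch shows suffixed mnemonics in situations where the operand width cannot be deduced from the operands. It uses only suffixes listed in the table above; the label Done is hypothetical.

```asm
      SUBB [ESI],4       ; Subtract 4 from the BYTE addressed by ESI.
      SUBW [ESI],4       ; Subtract 4 from the WORD addressed by ESI.
      SUBD [ESI],4       ; Subtract 4 from the DWORD addressed by ESI.
      INCD [EBX]         ; Increment the DWORD addressed by EBX.
      JMPS Done          ; Short jump; Done must be within -128..+127 bytes.
      ...
Done: RETN               ; Near return.
```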
Mnemonic suffix notation is sporadically used in other assemblers or in CPU documentation, see STOSB/W/D, OUTSB/W/D, RETN/F etc. €ASM just extends this enhancement.
Mnemonics of many SIMD instructions terminate with letters ~SS, ~SD, ~PS, ~PD which specify the type of operands, too (Scalar/Packed Single/Double-precision). €ASM does not treat them as mnemonic suffixes.
There are a few overloads (conflicts) of suffixed mnemonics with IA-32 instructions; they are resolvable by the type and by the number of operands:
|00000000:               | ; Standard Move versus MMX Move Doubleword:
|00000000:C7450800000000 | MOVD [EBP+8],0      ; Store immediate number to DWORD memory location (suffix ~D).
|00000007:0F7E4508       | MOVD [EBP+8],MM0    ; Store DWORD from MMX register to the memory location.
|0000000B:               |
|0000000B:               | ; Shift versus Double Precision Shift:
|0000000B:C1650804       | SHLD [EBP+8],4      ; Shift left logical the DWORD in memory location by 4 bits (suffix ~D).
|0000000F:0FA4450804     | SHLD [EBP+8],EAX,4  ; Shift left 4 bits from register EAX to the memory location.
|00000014:               |
|00000014:               | ; Compare String versus Compare Scalar Double-precision FP number:
|00000014:A7             | CMPSD               ; Compare DWORDs at [DS:ESI] and [ES:EDI] (suffix ~D).
|00000015:A7             | CMPSD [ESI],[EDI]   ; Ditto, documented with explicit operands.
|00000016:F20FC2CA00     | CMPSD XMM1,XMM2,0   ; Compare scalar float64 numbers for EQUAL.
Machine instructions with the same mnemonic name and functionality may sometimes be encoded into different machine codes. For instance, an immediate value can be optionally encoded in one byte when it does not exceed the range -128..+127, or it can be encoded as a full word or doubleword.
A similar rule applies to the encoding of the displacement value in address expressions.
A scaled address expression such as [1*ESI+EBX] may be encoded without SIB as [ESI+EBX] or using the SIB byte with explicit scaling factor 1.
€ASM prefers the shortest variant but this may be changed with additional keyword operands called instruction modifiers.
Many other assemblers decorate operands with special directives
byte, word, dword, qword, short, strict, near, far, ptr
to achieve specific encoding, for instance add word ptr [StringOfBytes + 4], 0x20
or jmp short SomeLabel.
Instead of those directives, €ASM uses either a mnemonic suffix or instruction modifiers.
Consequently, AVX instruction modifiers
MASK=, ZEROING=, SAE=, ROUND=, BCST=
are used in €ASM instead of inconsistent and poorly documented decorators, such as {k} {z} {ru-sae} {4to16} {uint16} {cdab}
proposed by [IntelAVX512] and [IntelMVEX].
A typical modifier value is an enumerated token such as BYTE, WORD, DWORD
etc.
The majority of enumerated modifier values may be abbreviated to their first letter.
Both names and values of the instruction modifiers are case insensitive.
Some modifiers are boolean type, their value may be TRUE, YES, ON, ENABLE, ENABLED
if true, and FALSE, NO, OFF, DISABLE, DISABLED
otherwise.
Boolean modifier may also be an expression which evaluates to zero (false) or nonzero (true),
see boolean extended values.
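A minimal sketch of how such a Boolean modifier value could be interpreted is given below. It assumes that a numeric expression has already been evaluated to an integer; the function name `parse_boolean_modifier` is hypothetical and the code does not reflect how €ASM parses its sources.

```python
TRUE_TOKENS = {"TRUE", "YES", "ON", "ENABLE", "ENABLED"}
FALSE_TOKENS = {"FALSE", "NO", "OFF", "DISABLE", "DISABLED"}


def parse_boolean_modifier(value) -> bool:
    """Interpret an enumerated token (case insensitive) or an evaluated number."""
    if isinstance(value, int):
        return value != 0          # zero is false, nonzero is true
    token = value.strip().upper()
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    raise ValueError(f"not a Boolean value: {value!r}")
```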
When the requested modifier cannot be satisfied, €ASM raises a warning and ignores it.
Modifiers actually used for encoding can be displayed by switching ON the EUROASM option DISPLAYENC=. In this case €ASM accompanies each machine instruction with a D1080 diagnostic message that explicitly documents which modifiers were used for encoding:
| | EUROASM DISPLAYENC=ON
|00000000:694D10C8000000 | IMUL ECX,[EBP+16],200
|# D1080 Emitted size=7,DATA=DWORD,DISP=BYTE,SCALE=SMART,IMM=DWORD.
|00000007: |
|00000007:62F1ED2CF44D02<5 | VPMULUDQ YMM1,YMM2,[EBP+40h],MASK=K4
|# D1080 Emitted size=7,PREFIX=EVEX,MASK=K4,ZEROING=OFF,DATA=YWORD,BCST=OFF,OPER=2,DISP=BYTE,SCALE=SMART.
As a heritage from the evolution of older processors, some machine instructions
have more than one encoding. For instance the instruction POP rAX
may be encoded either as 0x58 or as 0x8FC0, keeping the same
functionality. Modifier CODE= selects which encoding €ASM should use.
Operation-code modifier may be SHORT or LONG
alias S or L. Default behaviour is the one which selects shorter encoding, usually CODE=SHORT
.
When an instruction has two possible encodings with the same size, CODE=SHORT selects the variant with numerically lower opcode.
|00000000:43 | INC EBX
|00000001:43 | INC EBX,CODE=SHORT ; Intel 8080 legacy encoding, not available in 64-bit mode.
|00000002:FFC3 | INC EBX,CODE=LONG
|00000004: |
|00000004:50 | PUSH EAX
|00000005:50 | PUSH EAX,CODE=SHORT ; Intel 8080 legacy encoding, not available in 64-bit mode.
|00000006:FFF0 | PUSH EAX,CODE=LONG
|00000008: |
|00000008:87CA | XCHG ECX,EDX
|0000000A:87D1 | XCHG ECX,EDX,CODE=LONG ; Modifier swaps operands in commutative operations XCHG, TEST.
|0000000C:87D1 | XCHG EDX,ECX
|0000000E:87CA | XCHG EDX,ECX,CODE=LONG
|00000010: |
|00000010:C3 | RET
|00000011:C3 | RET CODE=LONG
|00000012:C20000 | RET CODE=SHORT ; Numerically lower opcode 0xC2 requested, which requires imm16.
|00000015: |
|00000015:83C07F | ADD EAX,127
|00000018:83C07F | ADD EAX,127,CODE=LONG
|0000001B:057F000000 | ADD EAX,127,CODE=SHORT ; Shorter opcode 0x05 requested, which cannot sign-extend imm8.
In some cases an explicit request for the numerically lower opcode with CODE=SHORT
may lead to a longer encoding, see the example ADD r32,imm8
above.
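The size difference between the two ADD EAX,127 encodings from the listing can be reproduced with a short Python sketch. It merely rebuilds the bytes shown in the dump from the standard x86 opcodes 0x83 and 0x05; the function names are hypothetical and the code is not taken from €ASM.

```python
import struct


def add_eax_opcode_83(imm: int) -> bytes:
    """Opcode 0x83 /0 with ModRM 0xC0 (EAX) and a sign-extended imm8."""
    if not -128 <= imm <= 127:
        raise ValueError("immediate does not fit in a signed byte")
    return bytes([0x83, 0xC0]) + struct.pack("<b", imm)


def add_eax_opcode_05(imm: int) -> bytes:
    """Opcode 0x05 (ADD EAX,imm32), which always carries a full doubleword."""
    return bytes([0x05]) + struct.pack("<i", imm)
```

With the immediate 127 the first variant takes 3 bytes (83C07F) and the second 5 bytes (057F000000), which matches the CODE=LONG and CODE=SHORT lines of the listing.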
This modifier controls operation-size, i. e. the width of data that the instruction operates on. It may be one of BYTE, WORD, DWORD, QWORD, TBYTE, OWORD, YWORD, ZWORD alias B, W, D, Q, T, O, Y, Z. The default is not specified.
Modifier DATA= has the same function as the instruction suffix; there are only two differences:
There are two other ways in which the operand width is controlled. If one of the operands is a register, its width prevails and this cannot be overridden with a suffix or modifier. When the operand width is not determined by a register, suffix or modifier, €ASM looks at the TYPE# attribute of the target operand.
Priority of operand-size specifications:
See the following examples:
|00000000:00000000 |MemoryVariable DB 0,0,0,0
|00000004:0107 | ADD [EDI],EAX ; Operand width is set by the register (32 bits).
|00000006:830701 | ADDD [EDI],1 ; Operand width is set by the suffix (32 bits).
|00000009:66830701 | ADD [EDI],1,DATA=W ; Operand width is set by the modifier (16 bits).
|0000000D:800701 | ADDB [EDI],1,DATA=W ; Operand width is set by the suffix (8 bits). Warning:modifier ignored.
|## W2401 Modifier "DATA=WORD" could not be obeyed in this instruction.
|00000010:660107 | ADDB [EDI],AX ; Operand width is set by the register (16 bits). Error:suffix ignored.
|### E6740 Impracticable operand-size requested with mnemonic suffix.
|00000013:8387[00000000]01 | ADDD [EDI+MemoryVariable],1 ; Operand width is set by the suffix (32 bits).
|0000001A:668387[00000000]01 | ADD [EDI+MemoryVariable],1,DATA=W ; Operand width is set by the modifier (16 bits).
|00000022:8087[00000000]01 | ADD [EDI+MemoryVariable],1 ; Operand width is set by TYPE# MemoryVariable = 'B' (8 bits).
|00000029:800701 | ADD [EDI],1 ; Error:Operand width is not specified.
|### E6730 Operand size could not be determined, please use DATA= modifier.
Some instructions allow a small immediate value to be encoded as one byte, although they operate on full words. The byte value is sign-extended by the CPU at run-time.
Modifier IMM=
may have value BYTE, WORD, DWORD, QWORD alias B, W, D, Q
and it specifies how the immediate operand should be encoded in the instruction.
Displacement address portion in some instructions may be encoded into
one byte when its value is in the range -128..+127. The byte value is sign-extended by the CPU at run-time.
Values outside this range are encoded in full size, i. e. as WORD, or DWORD, according to the
segment width (possibly inverted with ATOGGLE prefix).
This is the default behaviour of €ASM.
Modifier DISP=
can have the same enumerated values as IMM= modifier
(BYTE, WORD, DWORD, QWORD alias B, W, D, Q)
and it controls whether the displacement is encoded with full size or as a byte.
Scaling means multiplication of the contents of the index register
with 0, 1, 2, 4 or 8 at run-time.
The SCALE=
modifier can be either SMART or VERBATIM
(or shortly S, V). Default is SCALE=SMART
.
In verbatim mode
no optimisation is performed with index and base registers and
the scaling is encoded in SIB byte even when the scale factor is 1 or 0.
Encoding of instruction with SCALE=VERBATIM
uses SIB byte, if possible.
In smart mode (default) €ASM tries
to rearrange registers and not emit SIB byte unless absolutely necessary.
Here are the "smart" optimisation rules
(IR
is indexregister, BR
is baseregister, disp
is displacement):
[0*IR+BR+disp] > [BR+disp]
[1*IR+BR+disp] > [IR+BR+disp]
[2*IR+disp] > [BR+IR+disp]
Notice that an optimisation with SCALE=SMART may change the register role (base|index) and consequently the default segment register (SS|DS) used for addressing. This is usually not an issue in flat memory model, otherwise use SCALE=VERBATIM.
When the instruction encoding is displayed with EUROASM DisplayEnc=Yes
,
modifier SCALE=VERBATIM indicates that a SIB byte was actually emitted in this encoding,
otherwise SCALE=SMART indicates that no SIB byte was emitted.
This modifier specifies the distance of a target in control-transfer instructions. It can be one of FAR, NEAR, SHORT alias F, N, S.
DIST=FAR
is used when the target is in a different segment
and both rIP and CS registers need to be changed.
By default in intrasegment transfers €ASM automatically selects between SHORT and NEAR distance depending on the magnitude of the offsets difference.
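The automatic choice between SHORT and NEAR can be sketched as a range check on the offset difference. This simplified Python model assumes that the offset is measured from the end of the short form of the instruction; the function name `choose_distance` is hypothetical.

```python
def choose_distance(target: int, next_ip: int) -> str:
    """Pick SHORT when the signed offset from the end of the instruction fits in one byte."""
    offset = target - next_ip
    return "SHORT" if -128 <= offset <= 127 else "NEAR"
```

For instance, a 2-byte jump assembled at offset 0 to a label at 0x68 has the offset 0x66 and stays SHORT, while a label at 0x168 is out of byte range and needs NEAR.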
Modifier DIST= has the same function as the instruction suffix; there are only two differences:
JMP, CALL, RET
while DIST= modifier can also be applied to control-transfer instructions
LOOPcc, Jcc, JrCXZ
.
Modifier DIST=NEAR
or DIST=FAR
can be also applied
to PROC, PROC1
pseudoinstructions.
A consequence of making a procedure FAR is that CALLs and JMPs to that procedure will be
by default FAR, and that any RET inside this procedure will default to DIST=FAR
, too.
This modifier will choose the reference frame of memory addressing in 64-bit mode.
Allowed values are ABS, REL alias A, R.
A number encoded in the instruction code with absolute addressing is related
to the start of segment, which is always 0 at assembly time.
In a relative addressing frame it is related to the position of the next instruction,
i. e. to the contents of register RIP.
In legacy modes (16-bit, 32-bit) the reference frame is hardwired as ADDR=REL
in control-transfer
instructions (direct JMP, CALL, LOOP, Jcc), and as ADDR=ABS
in all other instructions.
RIP-relative addressing is shorter by one byte and it does not require relocation, which saves space in an object file
and avoids patching of the code at load-time. That is why ADDR=REL
is preferred by default
in 64-bit mode.
Explicit selection between absolute and RIP-relative addressing is relevant only
in 64-bit mode when the absolute address would require relocation at link-time.
This happens when the memory variable is specified as a displacement of an address symbol
(not a plain number), and no index or base register is involved in addressing.
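The number stored in a RIP-relative instruction is the distance from the next instruction to the target. A minimal Python sketch of that arithmetic follows; the function name and the sample addresses are hypothetical.

```python
import struct


def rip_relative_disp32(target: int, next_instruction: int) -> bytes:
    """Return the little-endian disp32 stored with ADDR=REL addressing."""
    displacement = target - next_instruction   # relative to the contents of RIP
    return struct.pack("<i", displacement)
```

Because only the difference of two offsets is stored, the value stays the same wherever the image is loaded, which is why no load-time patching is needed.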
All following modifiers apply only to instructions which use Advanced Vector eXtensions (AVX) encoding. Possible value of prefix is XOP, VEX, VEX2, VEX3, MVEX, EVEX (shortcuts are not available).
Most AVX-encodable instructions have their mnemonics prefixed with V~. Some instructions are defined with only one kind of AVX prefix, they don't need explicit modifier. When an instruction can be alternatively encoded with different AVX prefixes, €ASM will by default choose the shortest one.
Prefix VEX exists in two variants: VEX2 and VEX3. The longer encoding (VEX3) is automatically selected when the instruction uses indexregister or baseregister R8..R15 or when it uses opcode from map 0F38 or 0F3A.
Prefix EVEX or MVEX will be selected instead of VEX when the instruction uses register XMM16..XMM31, YMM16..YMM31, ZMM0..ZMM31, K0..K7, or modifier EH=, SAE=, ROUND=, MASK=, ZEROING=, OPER=.
Instructions encodable with both EVEX and MVEX default to PREFIX=EVEX
.
Software written for Intel® Xeon Phi™ CPU needs to explicitly request PREFIX=MVEX
in each such amphibious instruction.
In this case it is useful to disable EVEX with EUROASM EVEX=DISABLED
and thus be warned if some MVEX instruction encodes as EVEX by omission.
Explicit specification of modifier EH=
(which is available with MVEX only)
will select MVEX too, and explicit PREFIX=MVEX
is not necessary in this case.
Prefix | Required EUROASM options |
---|---|
XOP | SIMD=AVX, AMD=ENABLED, XOP=ENABLED |
VEX | SIMD=AVX |
MVEX | SIMD=AVX512, MVEX=ENABLED |
EVEX | SIMD=AVX512, EVEX=ENABLED |
Modifier MASK=
(as well as ZEROING=, EH=, SAE=, ROUND=, BCST=, OPER=
)
is applicable only with Enhanced Advanced Vector eXtensions (EVEX or MVEX).
MASK specifies which opcode mask register is used to control which elements (floating-point or integer numbers)
should be written to the destination SIMD register. Only those elements which have the corresponding bits in mask-register set,
are written. Other elements are either zeroed (if modifier ZEROING=ON
) or left unchanged (ZEROING=OFF
).
Possible value of MASK= is K0, K1, K2, K3, K4, K5, K6, K7 or an expression which evaluates to a number 0..7.
Default is MASK=0
. Opmask register K0 is special, it is treated as if it had all bits set, thus no masking is applied in this case.
Modifier ZEROING=
is boolean, it controls whether elements masked-off by the contents of opmask register
should be set to zero or left unchanged, which is called merging. It has no meaning when MASK=K0
or
when mask is not specified at all. Default is ZEROING=OFF
(merging). Modifier is applicable only
with EVEX encoding.
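The effect of MASK= together with ZEROING= can be modelled on plain Python lists. The sketch below only illustrates the merging and zeroing semantics described above; the function `apply_mask` is hypothetical, and the special case of K0 (no masking) would correspond to passing a mask with all bits set.

```python
def apply_mask(dest, result, mask_bits, zeroing=False):
    """Write result elements to dest only where the corresponding mask bit is set."""
    out = []
    for i, (old, new) in enumerate(zip(dest, result)):
        if (mask_bits >> i) & 1:
            out.append(new)                      # element selected by the opmask
        else:
            out.append(0 if zeroing else old)    # zeroing versus merging
    return out
```

With the destination [1, 2, 3, 4], the result [10, 20, 30, 40] and the mask 0b0101, merging yields [10, 2, 30, 4] and zeroing yields [10, 0, 30, 0].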
Boolean modifier EH=
(Eviction Hint) is applicable with the MVEX-encoded instructions only.
EH=1
informs the CPU that the data is non-temporal and unlikely to be reused soon,
so there is no benefit in storing it in the CPU cache. This concerns register-to-memory instructions only.
Value of EH is also consulted in register-to-register instructions where it will select between swizzle operations and static rounding.
If boolean modifier SAE=
(Suppress All Exceptions) is switched on,
the instruction will not raise any kind of floating-point exception flags,
for instance when it operates on a not-a-number value.
Instruction with SAE=ON
behaves as if all the MXCSR mask bits were set.
In EVEX-encoding SAE is by default enabled whenever static rounding is used, this behaviour cannot be switched off.
Modifier ROUND=
specifies static rounding mode, it is applicable on EVEX and MVEX instructions
with rounding semantic, for instance for conversion from double to single-precision FP numbers.
It has four possible enumerated values: NEAR, UP, DOWN, ZERO alias N, U, D, Z.
Static rounding is available only in ZMM register-to-register operations (not if one of the operands is in memory or when XMM and YMM registers are used). Default is no rounding, in this case general rounding mode controlled by RM bits in MXCSR applies.
Boolean modifier BCST=
can be used to enable data broadcasting in operations which
load data from memory. When BCST=ENABLED
, the memory source operand specifies only one element
and its contents will be broadcast (copied) to all positions of the destination register.
Default is BCST=OFF
. Broadcasting cannot be used with register-to-register operations.
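Broadcasting can be pictured as replicating the first memory element across all positions of the source vector. The following Python sketch is a model of that idea only; the function name `load_source` and its parameters are hypothetical.

```python
def load_source(memory, element_count, bcst=False):
    """Return the source vector as the instruction would see it."""
    if bcst:
        return [memory[0]] * element_count     # one element copied to every position
    return list(memory[:element_count])        # full-width load
```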
Instruction modifier OPER=
encodes kind of operation performed with the source operand at run-time.
Affected operations are broadcasting, rounding, conversion, swizzling.
Possible value is a numeric expression which evaluates to 0..7.
Value of the operation will be encoded in bits 6, 5, 4 of 32-bit prefix EVEX or MVEX.
These bits are named S2, S1, S0 in MVEX specification
[IntelMVEX], and L', L, b in EVEX specification
[IntelAVX512].
The same bits are also affected by the modifiers BCST=, ROUND=, SAE=
and by SIMD register width, but direct
OPER= specification has higher priority when a conflict occurs.
Modifier OPER= is the only way how to request special conversion or swizzle (shuffle) operation for MVEX-encoded
instruction available on Intel® Xeon Phi™ CPU. Not all operation values from the table below are available with all MVEX instructions,
documentation in [IntelMVEX] should always be consulted prior to using OPER=
.
OPER= | register-to-register, EH=0 | register-to-register, EH=1 | memory-to-register | register-to-memory |
---|---|---|---|---|
0 | no swizzle {dcba} | ROUND=NEAR,SAE=NO | no operation | no conversion |
1 | swap (inner) pairs {cdab} | ROUND=DOWN,SAE=NO | bcst 1 element {1to16} or {1to8} | not available |
2 | swap with two-away {badc} | ROUND=UP,SAE=NO | bcst 4 elements {4to16} or {4to8} | not available |
3 | cross-product swizzle {dacb} | ROUND=ZERO,SAE=NO | convert from {float16} | convert to {float16} |
4 | bcst a element across 4 {aaaa} | ROUND=NEAR,SAE=YES | convert from {uint8} | convert to {uint8} |
5 | bcst b element across 4 {bbbb} | ROUND=DOWN,SAE=YES | convert from {sint8} | convert to {sint8} |
6 | bcst c element across 4 {cccc} | ROUND=UP,SAE=YES | convert from {uint16} | convert to {uint16} |
7 | bcst d element across 4 {dddd} | ROUND=ZERO,SAE=YES | convert from {sint16} | convert to {sint16} |
OPER= | register-to-register | memory-to-register |
---|---|---|
0 | DATA=OWORD,SAE=NO | DATA=OWORD,BCST=OFF |
1 | DATA=ZWORD,SAE=YES,ROUND=NEAR | DATA=OWORD,BCST=ON |
2 | DATA=YWORD,SAE=NO | DATA=YWORD,BCST=OFF |
3 | DATA=ZWORD,SAE=YES,ROUND=DOWN | DATA=YWORD,BCST=ON |
4 | DATA=ZWORD,SAE=NO | DATA=ZWORD,BCST=OFF |
5 | DATA=ZWORD,SAE=YES,ROUND=UP | DATA=ZWORD,BCST=ON |
6 | reserved | reserved |
7 | DATA=ZWORD,SAE=YES,ROUND=ZERO | reserved |
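The position of the OPER= bits can be checked against the VPMULUDQ example from the DISPLAYENC listing, whose EVEX prefix is 62 F1 ED 2C. The Python sketch below assumes that bits 6, 5, 4 are counted within the last prefix byte and that bit 7 and bits 2..0 of the same byte hold the zeroing flag and the opmask number, as in Intel's EVEX layout; the function name is hypothetical.

```python
def decode_last_evex_byte(byte: int) -> dict:
    """Split the fourth EVEX prefix byte into the fields reported by D1080."""
    return {
        "zeroing": bool((byte >> 7) & 1),   # ZEROING=
        "oper": (byte >> 4) & 0b111,        # OPER= (bits 6, 5, 4)
        "mask": byte & 0b111,               # MASK= (opmask register number)
    }
```

Decoding 0x2C gives OPER=2, MASK=4 and ZEROING off, which agrees with the D1080 message for that instruction.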
Alignment request may be applied to any machine instruction, and to pseudoinstructions D, PROC, PROC1, STRUC. See the alignment paragraph for accepted values. This instruction modifier has the same effect as if explicit pseudoinstruction ALIGN was placed above the statement.
This is a pseudoinstruction modifier, it can be applied only to pseudoinstructions
PROC, ENDPROC, PROC1, ENDPROC1.
Its value is boolean, default is NESTINGCHECK=ON
. Switching the nesting control off
will suppress error message on block mismatch.
This enables to establish bounds between macros which enhance some block pseudoinstructions.
See the definitions of macros Procedure and
EndProcedure as an example.
Some x86 instructions work with registers fixed by design. €ASM accepts voluntary explicit specification of such registers, which serves as documentation for the human reader and may sometimes be exploited as an address-size definition and|or segment override.
Unary FPU instructions with implicit destination ST0 may explicitly name this register as the first operand, or it may be omitted. In many other FPU instructions the default destination is ST0 and the default source is ST1, in which case one or both operands may be omitted. See also handlers of instructions FNOP, FCMOVB, FADD, FIADD, FADDP, FXCH, FCOM.
|00000000:000000000000F03F |Mem DQ 1.0
|00000008: |
|00000008:DAC1 | FCMOVB ; ST0 = ST1 if Below.
|0000000A:DAC1 | FCMOVB ST0,ST1 ; ST0 = ST1 if Below.
|0000000C: |
|0000000C:DAC7 | FCMOVB ST0,ST7 ; ST0 = ST7 if Below.
|0000000E:DAC7 | FCMOVB ST7 ; ST0 = ST7 if Below.
|00000010: |
|00000010:D8C1 | FADD ; ST0 += ST1.
|00000012:D8C1 | FADD ST0,ST1 ; ST0 += ST1.
|00000014: |
|00000014:DC05[00000000] | FADD ST0,[Mem] ; ST0 += [Mem].
|0000001A:DC05[00000000] | FADD [Mem] ; ST0 += [Mem].
|00000020: |
|00000020:DCC7 | FADD ST7,ST0 ; ST7 += ST0.
|00000022:DCC7 | FADD ST7 ; ST7 += ST0.
|00000024: |
|00000024:D9E9 | FLDL2T ; ST0 = log2(10).
|00000026:D9E9 | FLDL2T ST0 ; ST0 = log2(10).
String instructions implicitly address the source as memory [DS:rSI]
or port DX
, and the destination as memory [ES:rDI]
or port DX
.
Beside the non-operand version €ASM accepts operand(s) explicitly representing source and destination,
with possible segment-override and address-size change.
Default translation table is implicitly addressed with [DS:rBX]
.
€ASM accepts optional memory operand which can specify nondefault segment override
and nondefault rBX width.
LOOP count register can be specified as the optional second operand.
|00000000:D7 | XLAT
|00000001:D7 | XLATB ; XLAT and XLATB are identical.
|00000002:D7 | XLATB [DS:EBX] ; Segment DS is the default, no override is necessary.
|00000003:26D7 | XLATB [ES:EBX] ; Segment override.
|00000005:67D7 | XLATB [BX] ; Address-size changed from 32 to 16 bits.
|00000007: |
|00000007:E2F6 | LOOP $-8
|00000009:E2F6 | LOOP $-8,ECX ; Default counter in 32-bit mode is ECX.
|0000000B:67E2F5 | LOOP $-8,CX ; Counter register (its address-size) changed to 16 bits.
Looping is not limited to a short-range distance in €ASM. When the destination of LOOP, LOOPcc, JCXZ, JECXZ, JRCXZ is far or near (out of byte range), €ASM will assemble three instructions instead:
LOOP $+2+2 ; Loop to the proxy-jump instead of the original destination.
JMPS $+JMPSsize+JMPsize ; Skip the proxy-jump when the loop has finished (rCX is zero).
JMP target ; Near or far unconditional proxy-jump to the original destination.
|[CODE1] |[CODE1] SEGMENT
|00000000:E366 | JECXZ CloseLabel:
|00000002:E364 | JECXZ CloseLabel:,DIST=SHORT
|00000004:E302EB05E95B000000 | JECXZ CloseLabel:,DIST=NEAR
|0000000D:E302EB07EA[68000000]{0000}| JECXZ CloseLabel:,DIST=FAR
|00000018: |
|00000018:E302EB05E947010000 | JECXZ DistantLabel:
|00000021:E302EB05E93E010000 | JECXZ DistantLabel:,DIST=SHORT
|## W2401 Modifier "DIST=SHORT" could not be obeyed in this instruction.
|0000002A:E302EB05E935010000 | JECXZ DistantLabel:,DIST=NEAR
|00000033:E302EB07EA[68010000]{0000}| JECXZ DistantLabel:,DIST=FAR
|0000003E: |
|0000003E:E302EB07EA[00000000]{0000}| JECXZ FarLabel:
|00000049:E302EB07EA(00000000){0000}| JECXZ FarLabel:,DIST=SHORT
|## W2401 Modifier "DIST=SHORT" could not be obeyed in this instruction.
|00000054:E302EB05E9(00000000) | JECXZ FarLabel:,DIST=NEAR
|0000005D:E302EB07EA[00000000]{0000}| JECXZ FarLabel:,DIST=FAR
|00000068: |CloseLabel:
|00000068:909090909090909090909090~~| DB 256 * B 0x90 ; Some stuff to stall off the DistantLabel.
|00000168: |DistantLabel:
|[CODE2] |[CODE2] SEGMENT
|00000000: |FarLabel:
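The near variant of the three-instruction expansion in the listing above can be rebuilt byte by byte. The Python sketch below assumes 32-bit mode and the standard opcodes E3 (JECXZ), EB (short JMP) and E9 (near JMP); the function name is hypothetical and the code only reproduces the dump, it is not €ASM's own logic.

```python
import struct


def jecxz_near_proxy(address: int, target: int) -> bytes:
    """Assemble JECXZ $+2+2 / JMPS over the proxy-jump / JMP near target."""
    jecxz = bytes([0xE3, 0x02])        # jump over the 2-byte JMPS to the proxy-jump
    jmps = bytes([0xEB, 0x05])         # skip the 5-byte near JMP when ECX is nonzero
    end = address + len(jecxz) + len(jmps) + 5
    jmp = bytes([0xE9]) + struct.pack("<i", target - end)
    return jecxz + jmps + jmp
```

Called with the address 0x04 and the target 0x68 (CloseLabel) it yields E302EB05E95B000000, the same bytes as the DIST=NEAR line of the listing.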
A conditional jump to the distance exceeding the byte limit -128..127 was introduced with 386 CPU.
When the program is intended to run on older processors as well, near and far conditional jump
Jcc target
will be assembled by €ASM as two instructions:
J!cc $+J!ccsize+JMPsize ; Skip the proxy-jump if inverted condition is true.
JMP target ; Near or far unconditional proxy-jump to the original destination.
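The two-instruction replacement can be sketched for a 16-bit near target. The Python code below relies on the x86 rule that short Jcc opcodes 0x70..0x7F invert their condition when the lowest bit is flipped; the function name is hypothetical and the exact bytes emitted by €ASM are an assumption.

```python
import struct


def jcc_via_proxy_16bit(opcode: int, address: int, target: int) -> bytes:
    """Encode J!cc over a near JMP, followed by JMP rel16 to the target."""
    if not 0x70 <= opcode <= 0x7F:
        raise ValueError("expected a short Jcc opcode 0x70..0x7F")
    inverted = opcode ^ 0x01                    # e.g. JZ (0x74) becomes JNZ (0x75)
    skip = bytes([inverted, 0x03])              # skip the 3-byte near JMP
    end = address + len(skip) + 3
    rel16 = (target - end) & 0xFFFF
    return skip + bytes([0xE9]) + struct.pack("<H", rel16)
```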
Near proxy-jump instead of standard 386 near conditional jump is assembled when these three conditions are met:
In many assemblers instructions PUSH, POP, INC, DEC may have just one operand. €ASM does not limit the number of operands; they are performed one by one in the specified order. If an instruction modifier or suffix is used, it applies to all operands.
|00000000:57FF370FA06A04 | PUSH EDI,[EDI],FS,4
|00000007:590FA18F0658 | POP ECX,FS,[ESI],EAX
|0000000D:40FF07 | INC EAX,[EDI],DATA=DWORD
|00000010:48664AFEC9 | DEC EAX,DX,CL
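The multi-operand form is equivalent to a sequence of single-operand statements. A rough Python sketch of that expansion follows; it splits naively on commas (so it would not cope with commas inside brackets or strings) and the function name is hypothetical.

```python
def expand_multi_operand(statement: str) -> list[str]:
    """Split e.g. 'PUSH EDI,[EDI],FS,4' into one statement per operand."""
    mnemonic, _, rest = statement.strip().partition(" ")
    parts = [part.strip() for part in rest.split(",")]
    modifiers = [part for part in parts if "=" in part]      # KEY=VALUE operands
    operands = [part for part in parts if "=" not in part]
    tail = "".join("," + modifier for modifier in modifiers)  # applies to every operand
    return [f"{mnemonic} {operand}{tail}" for operand in operands]
```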
Instructions AAD
and AAM use radix 10 by default
for adjusting AL before division or after multiplication of unpacked binary-coded decimal (BCD) numbers.
In €ASM they accept optional 8-bit immediate operand, for instance AAD 16
.
|00000000:D40A | AAM
|00000002:D40A | AAM 10
|00000004:D410 | AAM 16
|00000006:D50A | AAD
|00000008:D50A | AAD 10
|0000000A:D510 | AAD 16
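What the immediate operand changes can be seen from a small Python model of both instructions. It follows the architectural definition of AAM and AAD (flags are ignored here); the function names are hypothetical.

```python
def aam(al: int, base: int = 10):
    """Return (AH, AL) after AAM: AH = AL div base, AL = AL mod base."""
    return al // base, al % base


def aad(ah: int, al: int, base: int = 10):
    """Return (AH, AL) after AAD: AL = (AL + AH * base) mod 256, AH = 0."""
    return 0, (al + ah * base) & 0xFF
```

With the default radix, AL=45 is split by AAM into AH=4 and AL=5, whereas AAM 16 splits 0x2D into the nibbles 2 and 13.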
When both operands in TEST instruction specify the same register, the second operand may be omitted.
When the number of bits to rotate or shift in instructions RCL, ROL, SAL, SHL, RCR, ROR, SAR, SHR is equal to 1, the second operand may be omitted.
|00000000:85D2 | TEST EDX,EDX
|00000002:85D2 | TEST EDX ; Operand2 of TEST is by default identical with Operand1.
|00000004: |
|00000004:D1D0 | RCL EAX,1
|00000006:D1D0 | RCL EAX ; Omitted rotate or shift count defaults to 1.
|00000008:D165F8 | SHL [EBP-8],1,DATA=DWORD
|0000000B:D165F8 | SHL [EBP-8],DATA=DWORD
Instruction which does nothing (no-operation) except for taking some time and incrementing
instruction-pointer register, is implemented in all x86 processors as one-byte NOP,
actually XCHG rAX,rAX
(opcode 0x90). With Pentium II (CPU=686
) Intel proposed
dedicated multibyte no-operation instructions with opcodes 0x18..0x1F prefixed with 0x0F.
Multibyte NOP is more suitable for alignment purposes than series of one-byte NOPs,
as it's fetched and executed at once. On older CPU this real NOP must be emulated
with legacy instructions, e.g. XCHG reg,reg
or LEA reg,[reg]
.
[Sandpile] and [NasmInsns]
define the real-NOP mnemonics as undocumented instructions HINT_NOP0, HINT_NOP1, HINT_NOP2..63
with one memory operand of the desired length.
Instead of cluttering the instruction list with 64 new mnemonics, €ASM implements
just one mnemonic HINT_NOP
(suffixable as HINT_NOPW, HINT_NOPD, HINT_NOPQ
)
with ordinal number defined in the first immediate operand, and memory specification
moved aside to the 2nd operand.
Beside that, €ASM implements operandless instructions NOP1, NOP2, NOP3, NOP4, NOP5, NOP6, NOP7, NOP8, NOP9
which occupy the specified number of bytes, respecting the current CPU mode and level:
Mnemonic | Operation code (hexa) | Equivalent instruction in €ASM syntax |
---|---|---|
16-bit mode, CPU=086 | ||
NOP1 | 90 | XCHG AX,AX |
NOP2 | 87C9 | XCHG CX,CX |
NOP3 | 9087C9 | XCHG AX,AX ; XCHG CX,CX |
NOP4 | 87C987D2 | XCHG CX,CX ; XCHG DX,DX |
NOP5 | 9087C987D2 | XCHG AX,AX ; XCHG CX,CX ; XCHG DX,DX |
NOP6 | 87C987D287DB | XCHG CX,CX ; XCHG DX,DX ; XCHG BX,BX |
NOP7 | 9087C987D287DB | XCHG AX,AX ; XCHG CX,CX ; XCHG DX,DX ; XCHG BX,BX |
NOP8 | 87C987D287DB87E4 | XCHG CX,CX ; XCHG DX,DX ; XCHG BX,BX ; XCHG SP,SP |
NOP9 | 9087C987D287DB87E4 | XCHG AX,AX ; XCHG CX,CX ; XCHG DX,DX ; XCHG BX,BX ; XCHG SP,SP |
16-bit mode, CPU=686 | ||
NOP1 | 90 | NOP DATA=WORD |
NOP2 | 6690 | OTOGGLE NOP |
NOP3 | 666790 | OTOGGLE ATOGGLE NOP |
NOP4 | 670F1F00 | NOP [EAX],DATA=WORD |
NOP5 | 670F1F4000 | NOP [EAX],DATA=WORD,DISP=BYTE |
NOP6 | 670F1F442000 | NOP [EAX+0*EAX],DATA=WORD,SCALE=VERBATIM,DISP=BYTE |
NOP7 | 66670F1F442000 | NOP [EAX+0*EAX],DATA=DWORD,SCALE=VERBATIM,DISP=BYTE |
NOP8 | 670F1F8000000000 | NOP [EAX],DATA=WORD,DISP=DWORD |
NOP9 | 670F1F842000000000 | NOP [EAX+0*EAX],DATA=WORD,SCALE=VERBATIM,DISP=DWORD |
32-bit mode, CPU=386 | ||
NOP1 | 90 | XCHG EAX,EAX,DATA=DWORD |
NOP2 | 6690 | XCHG AX,AX,DATA=WORD |
NOP3 | 8D4000 | LEA EAX,[EAX],DATA=DWORD |
NOP4 | 8D442000 | LEA EAX,[EAX+0*EAX],DATA=DWORD,SCALE=VERBATIM,DISP=BYTE |
NOP5 | 3E8D442000 | LEA EAX,[DS:EAX+0*EAX],DATA=DWORD,SCALE=VERBATIM,DISP=BYTE |
NOP6 | 8D8000000000 | LEA EAX,[EAX],DATA=DWORD,DISP=DWORD |
NOP7 | 8D842000000000 | LEA EAX,[EAX+0*EAX],DATA=DWORD,SCALE=VERBATIM,DISP=DWORD |
NOP8 | 3E8D842000000000 | LEA EAX,[DS:EAX+0*EAX],DATA=DWORD,SCALE=VERBATIM,DISP=DWORD |
NOP9 | 663E8D842000000000 | LEA AX,[DS:EAX+0*EAX],DATA=WORD,SCALE=VERBATIM,DISP=DWORD |
32-bit mode, CPU=686 | ||
NOP1 | 90 | NOP DATA=DWORD |
NOP2 | 6690 | NOP DATA=WORD |
NOP3 | 0F1F00 | NOP [EAX],DATA=DWORD |
NOP4 | 0F1F4000 | NOP [EAX],DATA=DWORD,DISP=BYTE |
NOP5 | 0F1F442000 | NOP [EAX+0*EAX],DATA=DWORD,SCALE=VERBATIM,DISP=BYTE |
NOP6 | 660F1F442000 | NOP [EAX+0*EAX],DATA=WORD,SCALE=VERBATIM,DISP=BYTE |
NOP7 | 0F1F8000000000 | NOP [EAX],DATA=DWORD,DISP=DWORD |
NOP8 | 0F1F842000000000 | NOP [EAX+0*EAX],DATA=DWORD,SCALE=VERBATIM,DISP=DWORD |
NOP9 | 660F1F842000000000 | NOP [EAX+0*EAX],DATA=WORD,SCALE=VERBATIM,DISP=DWORD |
64-bit mode, CPU=X64 | ||
NOP1 | 90 | NOP DATA=DWORD |
NOP2 | 6690 | NOP DATA=WORD |
NOP3 | 0F1F00 | NOP [RAX],DATA=DWORD |
NOP4 | 0F1F4000 | NOP [RAX],DATA=DWORD,DISP=BYTE |
NOP5 | 0F1F442000 | NOP [RAX+0*RAX],DATA=DWORD,SCALE=VERBATIM,DISP=BYTE |
NOP6 | 660F1F442000 | NOP [RAX+0*RAX],DATA=WORD,SCALE=VERBATIM,DISP=BYTE |
NOP7 | 0F1F8000000000 | NOP [RAX],DATA=DWORD,DISP=DWORD |
NOP8 | 0F1F842000000000 | NOP [RAX+0*RAX],DATA=DWORD,SCALE=VERBATIM,DISP=DWORD |
NOP9 | 660F1F842000000000 | NOP [RAX+0*RAX],DATA=WORD,SCALE=VERBATIM,DISP=DWORD |
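The table can also be read as a lookup from the requested length to a byte sequence. The Python sketch below copies the rows for 32-bit mode, CPU=686, and fills a gap of arbitrary length greedily with the longest available NOP; the greedy strategy and the names `NOP_32BIT_686` and `nop_padding` are assumptions for illustration, the manual does not state how €ASM pads.

```python
# Operation codes copied from the "32-bit mode, CPU=686" rows of the table.
NOP_32BIT_686 = {
    1: "90",
    2: "6690",
    3: "0F1F00",
    4: "0F1F4000",
    5: "0F1F442000",
    6: "660F1F442000",
    7: "0F1F8000000000",
    8: "0F1F842000000000",
    9: "660F1F842000000000",
}


def nop_padding(count: int) -> bytes:
    """Fill count bytes using as few NOPn instructions as possible."""
    out = b""
    while count > 0:
        size = min(count, 9)
        out += bytes.fromhex(NOP_32BIT_686[size])
        count -= size
    return out
```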
Instructions PINSRB, PINSRW, PINSRD (insert Byte/Word/Dword into the destination register XMM) accept as source register (operand 2) not only GPR with the corresponding width, but any wider register. Only lowest byte|word|dword from this register is used.
|00000000:660F3A20C902 | PINSRB XMM1,CL,2
|00000006:660F3A20C902 | PINSRB XMM1,CX,2
|0000000C:660F3A20C902 | PINSRB XMM1,ECX,2
|00000012: |
|00000012:660FC4C902 | PINSRW XMM1,CX,2
|00000017:660FC4C902 | PINSRW XMM1,ECX,2
Instruction for variable blending uses fixed implied register XMM0 as a mask register. €ASM allows explicit specification of XMM0 as the third operand.
|00000000:660F3815CA | BLENDVPD XMM1,XMM2
|00000005:660F3815CA | BLENDVPD XMM1,XMM2,XMM0
|0000000A: |
|0000000A:660F3814CA | BLENDVPS XMM1,XMM2
|0000000F:660F3814CA | BLENDVPS XMM1,XMM2,XMM0
|00000014: |
|00000014:660F3810CA | PBLENDVB XMM1,XMM2
|00000019:660F3810CA | PBLENDVB XMM1,XMM2,XMM0
Maskable copy to memory uses [DS:rDI] as the fixed destination. €ASM allows explicit specification of the destination memory as the optional first operand.
|00000000:0FF7CA | MASKMOVQ MM1,MM2
|00000003:0FF7CA | MASKMOVQ [DS:EDI],MM1,MM2 ; Default destination is [DS:EDI].
|00000006:260FF7CA | MASKMOVQ [ES:EDI],MM1,MM2 ; Segment override.
|0000000A: |
|0000000A:660FF7CA | MASKMOVDQU XMM1,XMM2
|0000000E:660FF7CA | MASKMOVDQU [DS:EDI],XMM1,XMM2 ; Default destination is [DS:EDI].
|00000012:26660FF7CA | MASKMOVDQU [ES:EDI],XMM1,XMM2 ; Segment override.
Segment descriptor in system instructions VERR, VERW (operand 1) and LAR, LSL (operand 2) may be specified as a 16-bit memory variable or a 16, 32 or 64-bit GPR (only the lower 16 bits are used).
|00000000:0F00E6 | VERR SI
|00000003:0F00E6 | VERR ESI
|00000006: |
|00000006:0F00EE | VERW SI
|00000009:0F00EE | VERW ESI
|0000000C: |
|0000000C:660F02C6 | LAR AX,SI
|00000010:660F02C6 | LAR AX,ESI
|00000014:0F02C6 | LAR EAX,SI
|00000017:0F02C6 | LAR EAX,ESI
|0000001A: |
|0000001A:660F03C6 | LSL AX,SI
|0000001E:660F03C6 | LSL AX,ESI
|00000022:0F03C6 | LSL EAX,SI
|00000025:0F03C6 | LSL EAX,ESI
€ASM supports a few instructions which are not documented in the official specification published by the CPU manufacturer.
They may not work with all processor generations and they require explicit feature EUROASM UNDOC=ENABLED
.
For more information see instruction handlers BB0_RESET, CMPXCHG486, F4X4, FCOM2, FCOMP5, FFREEP, FMUL4X4, FNSETPM, FRSTPM, FSBP1, FSBP2, FSBP3, FSTDW, FSTP1, FSTP8, FSTP9, FSTSG, FXCH4, FXCH7, HCF, HINT_NOP, IBTS, ICEBP, INT1, JMPE, LOADALL, LOADALL286, PREFETCHWT1, PSRAQ, SAL2, SALC, SETALC, SMINTOLD, TEST2, UD0, UD1, UD2A, UMOV, XBTS, VLDQQU.
Pseudoinstructions (sometimes also called directives) are orders for the assembler which are formally similar to ordinary machine instructions — many of them may have label field and operands. Some pseudoinstructions (ALIGN and D) can even emit data or code.
With the EUROASM pseudoinstruction the programmer controls various settings of EuroAssembler
- EUROASM options. Particular options are set with the keyword operands.
The same keywords are used in [EUROASM]
section of
euroasm.ini configuration file.
Options specified with this pseudoinstruction rewrite default options set in the configuration file. Names of those options are case-insensitive.
Current value can be retrieved in the form of
EUROASM system %^variables, for instance
InfoMsg DB "This program uses code page %^CODEPAGE.",13,10,0
For options which expect a Boolean value it may be provided with enumerated tokens TRUE, YES, ON, ENABLE, ENABLED
or FALSE, NO, OFF, DISABLE, DISABLED
(case insensitive) or they may contain
a logical expression.
Beside the keyword options, the EUROASM pseudoinstruction also recognizes ordinal operand(s) which may have one of two enumerated values PUSH or POP. €ASM maintains a special option stack and these two directives allow the whole set of EUROASM options to be saved to and restored from this stack. This feature is handy in macros which temporarily require some unusual option value. Blindly setting the option in a macro would have a side effect on the statements following the macro invocation, because EUROASM is a switching statement. So it is better to save the current options on the stack at the beginning of the macro and restore them at the end; other statements will not be influenced. Example:
SomeMacro %MACRO ; Macro definition.
  EUROASM PUSH, NOWARN=2102 ; Store all options to the option-stack and then suppress the warning W2102.
  ; Here go instructions which may emit warning message W2102 ...
  EUROASM POP ; Restore the option-stack, W2102 is no longer suppressed.
%ENDMACRO SomeMacro
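The save-and-restore behaviour of EUROASM PUSH and EUROASM POP resembles an ordinary stack of option snapshots. The Python class below is a conceptual model only; the class `OptionStack` and its methods are hypothetical and say nothing about €ASM internals.

```python
class OptionStack:
    """Keeps the current option set plus a stack of saved snapshots."""

    def __init__(self, **options):
        self.current = dict(options)
        self._saved = []

    def push(self):
        """Save a copy of the whole option set (like EUROASM PUSH)."""
        self._saved.append(dict(self.current))

    def set(self, **options):
        """Change some options (like EUROASM NOWARN=2102)."""
        self.current.update(options)

    def pop(self):
        """Restore the most recently saved option set (like EUROASM POP)."""
        self.current = self._saved.pop()
```

Statements after the pop see exactly the options which were valid before the push, no matter how many options were changed in between.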
This is a Boolean option; default is AUTOALIGN=ON
. Memory variables
created or reserved with D pseudoinstruction will be implicitly aligned
according to their TYPE#.
Aligned memory-variables can be accessed faster, on the other hand this option may blow up the size of your program if data definition of various types are mixed frequently. It's better to manually group data of the same size, so the alignment stuff is used only once per group.
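The cost of mixing data types can be estimated with a small Python sketch. It assumes that each variable is aligned to its own size, which is a simplification of the TYPE#-based rule; the function names are hypothetical.

```python
def alignment_padding(offset: int, alignment: int) -> int:
    """Number of stuff bytes needed to reach the next multiple of alignment."""
    return -offset % alignment


def layout_size(sizes) -> int:
    """Total size of consecutive variables, each aligned to its own size."""
    offset = 0
    for size in sizes:
        offset += alignment_padding(offset, size) + size
    return offset
```

Four variables declared in the order BYTE, DWORD, BYTE, DWORD occupy 16 bytes in this model, while the grouped order DWORD, DWORD, BYTE, BYTE takes only 10.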
EUROASM AUTOALIGN=
status.
Structured data variables (defined with DS structure_name)
do not autoalign by their largest member.
They are aligned by the segment width (WORD, DWORD or QWORD) if AUTOALIGN=ENABLED.
Programmers should design their structures with respect to the natural alignment of structure members. This is especially important in 64-bit mode, where the API requires all data to be aligned. On conversion from badly designed 32-bit structures they need manually inserted stuff-members which complete DWORD member sizes to QWORD alignment of the following members, and which round up the structure size to a multiple of 8. See the WinAPI structure MSG as an example.
Autoalignment does not apply to machine instructions. If we want to have a procedure aligned to the start of a cache boundary (for better performance), it should be aligned explicitly, for instance Rapid PROC ALIGN=OWORD.
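The grouping advice above can be sketched as follows. This is only an illustration, assuming that DD is the DWORD clone of pseudoinstruction D (in the same way as DB is its BYTE clone); all symbol names are hypothetical.

```asm
        EUROASM AUTOALIGN=ON
; Mixed sizes: each DD which follows a DB may need padding bytes in front of it.
FlagA   DB 0
CountA  DD 0
FlagB   DB 0
CountB  DD 0
; Grouped by size: DWORDs first, bytes last, so padding is needed at most once per group.
CountC  DD 0
CountD  DD 0
FlagC   DB 0
FlagD   DB 0
```

Both variants define the same amount of data; only the second one avoids repeated alignment gaps.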
This is a Boolean option; default is AUTOSEGMENT=ON. The section to which the current statement emits is implicitly changed by €ASM according to the purpose of the statement. When more than one section with this purpose is defined in a program, autosegment will switch to the last defined one.
If the statement is a machine instruction, a prefix or PROC, €ASM will switch to the last defined CODE section. Similarly, when the statement defines or reserves data (pseudoinstruction D and its clones, including DI), the current section is switched to the last DATA or BSS section.
Pseudoinstruction ALIGN, macros and all nonemitting operations, such as EQU or a solo label, do not change the current section.
If you rely on autosegmentation, avoid the pitfall when a new section begins with a macro invocation, with an explicit ALIGN or with just a label. These statements will not autoswitch the current section. You may need to insert NOP or PROC to autoswitch to CODE, a DB 0 statement to autoswitch to DATA, or DB to autoswitch to BSS.
Example of such a pitfall:
EUROASM AUTOSEGMENT=ON
Hello PROGRAM FORMAT=PE, ENTRY=Main:
 INCLUDE winapi.htm ; Include some basic code macros.
Title DB "World!",0 ; Correctly autoswitched to [.data].
Main: StdOutput Title ; Macro didn't switch to [.text] as desired.
 TerminateProgram
ENDPROGRAM Hello ; Hello.exe does not work because its entry is in [.data] section.
The label Main: incorrectly remained in the previous [.data] section. The remedy is simple:
- insert a machine instruction: Main: NOP
- or make it a procedure: Main: PROC
- or switch the section manually: put [.text] above Main: StdOutput Title
EUROASM AUTOSEGMENT=ON
Hello PROGRAM FORMAT=PE, ENTRY=Main:
 INCLUDE winapi.htm ; Include some basic code macros.
Title DB "World!",0 ; Correctly autoswitched to [.data].
Main: PROC ; Correctly autoswitched to [.text].
 StdOutput Title
 TerminateProgram
 ENDPROC Main:
ENDPROGRAM Hello ; Hello.exe works as expected.
AUTOSEGMENT= is a weak option: it is automatically switched off when the programmer changes the current section explicitly with [section_name] in the label field of a statement.
If you want to keep AUTOSEGMENT enabled after a manual change of section, you need to explicitly switch it back on with EUROASM AUTOSEGMENT=ON, or save its state using EUROASM PUSH and restore it with EUROASM POP afterwards.
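The PUSH and POP variant might look like the sketch below. It assumes a program format which provides the default sections [.data] and [.text]; the names Greeting and Start are hypothetical.

```asm
        EUROASM AUTOSEGMENT=ON
        EUROASM PUSH          ; Save all options, AUTOSEGMENT=ON among them.
[.data]                       ; Explicit change of section switches AUTOSEGMENT off.
Greeting DB "Hello",0
        EUROASM POP           ; Restore the saved options; AUTOSEGMENT is ON again.
Start:  PROC                  ; Autoswitched to the last defined CODE section.
        RET
        ENDPROC Start:
```

Compared with a plain EUROASM AUTOSEGMENT=ON, the PUSH and POP pair also restores any other option changed in between.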
€ASM can use Unicode strings at run time but the data definitions in the source code are defined in bytes. Option CODEPAGE= tells €ASM which code page it should internally use for string conversion in the source text to Unicode at assembly-time.
Codepage may be specified with a direct 16-bit integer value, as specified by [CodePageMS], for instance CODEPAGE=1253 for the Greek alphabet.
Codepage values can also be specified as an enumerated token, such as CODEPAGE=CP852, CODEPAGE=WINDOWS-1252, CODEPAGE=ISO-8859-2 etc.; see DictCodePages for the complete list. These names are case insensitive.
Even though some of those enumerated codepage constants may look like an arithmetic subtraction, they are recognized as verbatim tokens and not evaluated.
The factory default and recommended value is CODEPAGE=UTF-8.
See also Character encoding above.
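The two notations described above can be compared side by side. The following lines use only values mentioned in this section; in real source one of them would normally be enough.

```asm
        EUROASM CODEPAGE=1253           ; Greek, given as a direct numeric value.
        EUROASM CODEPAGE=WINDOWS-1252   ; Enumerated token, not evaluated as a subtraction.
        EUROASM CODEPAGE=UTF-8          ; Back to the factory default.
```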
When an included file is specified without a path, €ASM will search for this file in the directories which are defined in the INCLUDEPATH= option. Paths can be separated with a semicolon ; or comma , and the whole list should be in double quotes. Both backward \ and forward slashes / may be used as folder separator. The last slash can be omitted. Default is INCLUDEPATH="./,./maclib,../maclib,".
This syntax doesn't support directory names which begin or end with a space as a significant part of the name. Such names should be avoided anyway.
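A short sketch of extending the search path follows. The directory ../shared/inc is hypothetical and only illustrates the list syntax; the included file name is taken from the examples elsewhere in this manual.

```asm
        EUROASM INCLUDEPATH="./,./maclib,../maclib,../shared/inc"
        INCLUDE winapi.htm    ; No path given, so the INCLUDEPATH directories are searched.
```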
When a linked file is specified without a path, €ASM will search for this file in the directories which are defined in the LINKPATH= option. Paths can be separated with a semicolon ; or comma , and the whole list should be in double quotes. Both backward \ and forward slashes / may be used as folder separator. The last slash can be omitted. Default is LINKPATH="./,./objlib,../objlib,".
When a dynamic shared object (ELFSO module) is specified without a path, the Linux dynamic linker will search for this file in the directories which are defined in the RUNPATH= option. Paths can be separated with a semicolon ; or comma , and the whole list should be in double quotes. Both backward \ and forward slashes / may be used as folder separator. The last slash can be omitted. Default is RUNPATH="./,./objlib,../objlib,".
Parameter MAXINCLUSIONS limits the maximal number of successful executions of INCLUDE* statements in an €ASM source. This prevents the assembler from exhausting resources in the case of a recursive inclusion loop.
Default value is EUROASM MAXINCLUSIONS=64.
Parameter MAXLINKS limits the maximal number of files specified by LINK statements in an €ASM source. This prevents the assembler from exhausting resources in the case of a recursive link loop.
Default value is EUROASM MAXLINKS=64.
Not all IA-32 machine instructions are available on all types of Central Processing Unit (CPU).
This EUROASM option specifies the minimal type of CPU which the program is intended for.
Possible CPU= values are
086 alias 8086,
186,
286,
386,
486,
586 alias PENTIUM,
686 alias P6,
X64.
The default is EUROASM CPU=586. A 64-bit program should have EUROASM CPU=X64 enabled.
EuroAssembler assumes that each later CPU also supports all instructions of the previous CPU generations.
This group of EUROASM boolean options tells €ASM which CPU features are required on the target computer. By default all options are switched OFF; you should explicitly enable each capability which you intend to program for.
ABM=
assembly of Advanced Bit Manipulation instructions.
AES=
assembly of Intel's Advanced Encryption Standard (AESNI) instructions.
AMD=
instructions specific for AMD CPU manufacturer.
CET=
Control-flow Enforcement Technology instructions.
CYRIX=
instructions specific for CYRIX CPU manufacturer.
D3NOW=
assembly of AMD 3DNow! instructions.
EVEX=
assembly of Intel's EVEX-encoded AVX-512 instructions.
FMA=
assembly of Fused Multiply-Add instructions.
FPU=
assembly of Floating-Point Unit instructions (math coprocessor).
LWP=
assembly of AMD's LightWeight Profiling instructions.
MMX=
assembly of MultiMedia Extensions.
MPX=
assembly of Memory Protection Extensions.
MVEX=
assembly of Intel's MVEX-encoded AVX-512 instructions.
PRIV=
assembly of privileged mode instructions.
PROT=
assembly of protected mode instructions.
SGX=
assembly of Software Guard Extensions.
SHA=
assembly of Intel's Secure Hash Algorithm instructions.
SPEC=
assembly of other special instructions.
SVM=
assembly of AMD's Secure Virtual Machine instructions.
TSX=
assembly of Intel's Transactional Synchronization Extensions.
UNDOC=
assembly of undocumented instructions.
VIA=
instructions specific for VIA Geode CPU manufacturer.
VMX=
assembly of Virtual Machine Extensions.
XOP=
assembly of AMD's XOP-encoded AVX instructions.
This option defines which Single Instruction Multiple Data (SIMD) generation is required to assemble the following instructions.
Possible enumerated values are
SSE1 alias SSE alias boolean true,
SSE2,
SSE3,
SSSE3,
SSE4,
SSE4.1,
SSE4.2,
AVX,
AVX2,
AVX512.
Default value is SIMD=DISABLED (no SIMD instructions are expected).
The options for CPU generation, CPU features and SIMD generation do not prevent €ASM from assembling instructions for a higher CPU, but a warning is issued when an instruction requires some capability which is currently not enabled with EUROASM. This should warn you that your program may not run on every PC, or that you may have made a typo in an instruction mnemonic.
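The three kinds of options can be combined in one EUROASM statement, as in the following sketch. The choice of instructions is only illustrative, and the feature assigned to each instruction in the comments is an assumption based on the usual x86 classification, not a quotation from the €ASM instruction tables.

```asm
        EUROASM CPU=X64, AES=ON, SIMD=AVX2
        AESENC XMM1,XMM2       ; Assumed to require the AES= feature.
        VPADDD YMM0,YMM1,YMM2  ; Assumed to require SIMD=AVX2 or higher.
```

If one of the options were left at its default, the instruction would still be assembled, but with a warning, as described above.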
These boolean options are designed for debugging of the assembly process, see also pseudoinstruction %DISPLAY. When enabled, €ASM inserts a diagnostic message below each assembled statement, which displays how the statement is parsed into fields and which modifiers were used for the instruction encoding. Example:
 EUROASM DISPLAYSTM=ON
.L: MOV EAX,[ESI+16],ALIGN=DWORD
 EUROASM DISPLAYSTM=OFF, DISPLAYENC=ON
 LEA EDX,[ESI+16]
 ADD EAX,EDX
Listing of the previous example is here:
|                 | EUROASM DISPLAYSTM=ON
|00000000:8B4610  |.L: MOV EAX,[ESI+16],ALIGN=DWORD
|# D1010 **** DISPLAYSTM ".L: MOV EAX,[~~ALIGN=DWORD "
|# D1020 label=".L"
|# D1040 machine operation="MOV"
|# D1050 ordinal operand number=1,value="EAX"
|# D1050 ordinal operand number=2,value="[ESI+16]"
|# D1060 keyword operand,name="ALIGN",value="DWORD"
|                 | EUROASM DISPLAYSTM=OFF, DISPLAYENC=ON
|# D1010 **** DISPLAYSTM "EUROASM DISPL~~SPLAYENC=ON "
|# D1040 pseudo operation="EUROASM"
|# D1060 keyword operand,name="DISPLAYSTM",value="OFF"
|# D1060 keyword operand,name="DISPLAYENC",value="ON"
|00000003:8D5610  | LEA EDX,[ESI+16]
|# D1080 Emitted size=3,DATA=DWORD,DISP=BYTE,SCALE=SMART,ADDR=ABS.
|00000006:01D0    | ADD EAX,EDX
|# D1080 Emitted size=2,CODE=SHORT,DATA=DWORD.
Options DUMP=, DUMPWIDTH= and DUMPALL= control how the dump column with emitted code is presented in the listing.
The boolean option DUMP= can switch off the dump completely; the listing copies the input source almost verbatim in this case. Default is DUMP=ON.
DUMPWIDTH= sets the width of the dump column in the €ASM listing. This option specifies how many characters of dumped data will fit between the starting | and ending |, including those two border characters. Default value is DUMPWIDTH=27, which is enough for an 8-byte-long instruction. The accepted dump width value is between 16 and 128 characters.
Dump data consists of an offset (4 or 8 hexadecimal characters, depending on section width), separator : and 2 hexadecimal digits per each byte of generated code.
When the generated code is too long to fit into the dump column, the Boolean option DUMPALL= decides whether the rest will be omitted (the omission is indicated by a tilde ~ in place of the last character), or whether additional lines will be inserted into the listing until all generated code is dumped. Factory default is DUMPALL=OFF.
See also the description of listing file.
Be careful when setting DUMPALL=ON with a long duplicated data definition, such as DB 2048 * B 0, because this may clutter the listing with many lines of useless dump.
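The dump width needed for a given number of code bytes can be worked out from the layout described above. With an 8-digit offset, the default of 27 characters is made of 2 border characters, 8 offset digits, 1 colon and 2*8 hexadecimal digits of code. The sketch below applies the same arithmetic to 16 bytes of code per line; the 8-digit offset is an assumption which holds for the 32-bit listing shown earlier.

```asm
; 2 borders + 8 offset digits + 1 colon + 2*16 code digits = 43 characters.
        EUROASM DUMPWIDTH=43, DUMPALL=OFF
```

In a section with 4-digit offsets the same DUMPWIDTH leaves room for two more bytes of code.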
This option defines the name of the listing file. By default it is LISTFILE="%^SourceName%^SourceExt.lst", i.e. it copies the name and extension of the source file and appends .lst to it. If not specified otherwise, the listing is always created in the same directory as the corresponding source file.
The LIST* family of options controls what should be copied to the listing file.
The boolean option LIST=OFF will suppress the generation of the listing until it is switched on again. Default is LIST=ON.
Notice that switching off even a minor part of the listing makes the listing file no longer usable as a source file, because some parts are not copied by €ASM from the original source to the listing.
Contents of the included files are omitted from the listing by default (LISTINCLUDE=OFF). When this option is ON, the INCLUDE statement will be replaced by the contents of the file.
LISTMACRO= controls whether the instructions from macro expansion go to the listing. Default state is LISTMACRO=OFF and only the invocation of the macroinstruction is presented.
EUROASM option LISTREPEAT= is similar to LISTMACRO= with the difference that it controls listing of statements expanded in %FOR, %WHILE and %REPEAT blocks.
When a preprocessing %variable is used in the statement and the option LISTVAR=ON, the statement is duplicated in the form of a machine comment just below the original statement and the expanded text is shown instead of %variables. Factory default is LISTVAR=OFF.
See also the description of listing file above.
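When only one macro invocation needs to be inspected, the LIST* options can be enabled temporarily with the option stack. The following is a sketch; the macro StdOutput and the symbol Title are borrowed from the AUTOSEGMENT example in this manual and stand for any macro invocation.

```asm
        EUROASM PUSH, LISTMACRO=ON, LISTVAR=ON ; Save the options, then enable detailed listing.
        StdOutput Title       ; The expanded macro body now goes to the listing.
        EUROASM POP           ; Back to the previous LIST* settings.
```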
UNICODE= determines the character width. This boolean option specifies whether the data definition of an unspecified string, such as D "an explicit string" or ="a literal string", should be treated as a sequence of bytes (8-bit characters) or unichars (16-bit characters).
The system variable %^UNICODE is checked in macros or structure definitions which have different versions for ANSI (8-bit) or WIDE (16-bit) string encoding. It is also consulted in macros WinAPI (32-bit) and WinABI (64-bit) to determine which version of the Windows API function (ANSI or WIDE) should be invoked.
Some string-handling macros and WinAPI functions expect the string size to be specified in characters rather than in bytes. Attribute operation SIZE# always returns the size of its operand in bytes. This can be solved by testing the system variable %^UNICODE:
aString D "String" ; Symbol aString defines 6 bytes if UNICODE=OFF or 12 bytes if UNICODE=ON.
%IF %^UNICODE ; WIDE version of aString.
  MOV ECX, SIZE# aString / 2
%ELSE ; ANSI version of aString.
  MOV ECX,SIZE# aString
%ENDIF ; ECX is now loaded with the number of characters in aString.
A trickier but more elegant solution exploits the fact that %^UNICODE (and all other boolean system %^variables) expands to either 0 or -1, and that shift left by a negative value is calculated as shift right by the negated value. When %^UNICODE is -1, the size in bytes is shifted to the right by 1 bit, which is equivalent to division by two.
aString D "String" ; Symbol aString defines 6 bytes if UNICODE=OFF or 12 bytes if UNICODE=ON.
  MOV ECX, SIZE# aString << %^UNICODE ; ECX is now loaded with the number of characters in aString.
This boolean option specifies whether a debug version should be assembled. When EUROASM DEBUG=ENABLED, the linker includes the symbol table and|or other debugging information in the output program. Macros can change their behaviour depending on the condition %IF %^DEBUG.
The final release of your programs should be assembled with this option turned off.
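A macro which depends on %^DEBUG might be sketched like this. The macro name Checkpoint is hypothetical, and the breakpoint instruction is only one possible debug-only action.

```asm
Checkpoint %MACRO             ; Hypothetical helper macro.
  %IF %^DEBUG
        INT 3                 ; Software breakpoint, assembled only when DEBUG= is enabled.
  %ENDIF
%ENDMACRO Checkpoint
```

With DEBUG= turned off, an invocation of Checkpoint emits no code at all, so the macro invocations can stay in the release source.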
This boolean option specifies whether a profilable version should be assembled. Profiling is not implemented yet in this version of EuroAssembler.
The final release should be assembled with this option turned off.
Options WARN= and NOWARN= control which informative and warning messages will be issued in the assembly process. With NOWARN= it is possible to suppress anticipated messages with identification number below 4000. Suppressed warnings have no effect on the final errorlevel. User generated warnings (U5000..U5999) and errors with higher severity cannot be suppressed.
The value of the option is either a number or a range of numbers, which shouldn't exceed 3999. WARN= and NOWARN= operands may repeat in a statement; they are processed from left to right. For instance EUROASM NOWARN=0600..0999, WARN=705 will suppress informative messages I0600 to I0999 except for the message I0705, which remains enabled.
The default value is WARN=0..3999 (all messages are enabled).
Pseudoinstructions PROGRAM and ENDPROGRAM specify a block of source code which creates a standalone output file. In most other assemblers it is the whole source file which creates the output file; it is sometimes called a module or unit of compilation. For instance, the command nasm -f win32 HelloWorld.asm -o HelloWorld.obj tells Netwide Assembler to create a COFF output file HelloWorld.obj.
In €ASM, more than one output file can be created with the command euroasm HelloWorld.asm, provided that there are more PROGRAM / ENDPROGRAM blocks in HelloWorld.asm.
The label of PROGRAM statement represents the name of output program. Although it does not define a symbol, its name must follow the rules for symbol names, that is at least one letter followed with letters and digits. The same identifier may be used as the first and only operand in the corresponding ENDPROGRAM statement.
One source may contain more program blocks and the blocks may nest. Each program block assembles to a different output file.
Symbols defined in the program are not visible outside the block. When a program needs to call a label from another program, the labels must be marked as extern and public, even when both programs lie in the same source file or when one program is nested in another.
Preprocessing %variables, macro definitions and EUROASM options, on the other hand, are visible throughout the source and they can transfer information between programs at assembly time. See the sample program LockTest as an example.
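A source file with two independent program blocks can be sketched as follows. The program names are hypothetical and the bodies are reduced to a single instruction; the output names follow from the default OUTFILE= and from the default extension of FORMAT=COM.

```asm
First   PROGRAM FORMAT=COM    ; Assembles to First.com.
        RET                   ; Minimal body.
        ENDPROGRAM First

Second  PROGRAM FORMAT=COM    ; Assembles to Second.com.
        RET                   ; Minimal body.
        ENDPROGRAM Second
```

Symbols defined in First are not visible in Second and vice versa, while EUROASM options set before the first block apply to both.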
The PROGRAM pseudoinstruction has many important keyword operands which specify properties of the output file. The same keywords are used in the [PROGRAM] division of the euroasm.ini configuration file.
The values of all PROGRAM options can be inspected as system %^variables at assembly-time. For instance, in the message InfoMsg DB "This is a %^WIDTH-bit program.",13,10,0 the system variable %^WIDTH will be replaced with the actual width of the program (16, 32 or 64); it could be tested with %IF %^Width <> 64 etc.
Unlike EUROASM options, which involve only a part of the source, PROGRAM options involve the whole program en bloc. We cannot have a half of the program in a graphic subsystem, and another half in a console subsystem, for instance. That is why options LISTMAP=, LISTGLOBALS=, LISTLITERALS= are properties of pseudoinstruction PROGRAM, but LISTINCLUDE=, LISTMACRO=, LISTREPEAT=, LISTVAR= are properties of pseudoinstruction EUROASM.
Format and file-extension of the output file is determined with this PROGRAM's parameter.
FORMAT= | Default output file extension | Default program width | Default memory model | Description |
---|---|---|---|---|
BIN | .bin | 16-bits | TINY | Binary file |
BOOT | .sec | 16-bits | TINY | Bootable file |
COM | .com | 16-bits | TINY | DOS/CPM executable file |
ELF | .o | 32-bits | FLAT | Linux relocatable object file |
ELFX | .x | 32-bits | FLAT | Linux executable file |
ELFSO | .so | 32-bits | FLAT | Linux dynamic shared object file |
OMF | .obj | 16-bits | SMALL | Object Module Format |
LIBOMF | .lib | 16-bits | SMALL | Object library in OMFormat |
MZ | .exe | 16-bits | SMALL | DOS executable file |
COFF | .obj | 32-bits | FLAT | Common Object File Format |
LIBCOF | .lib | 32-bits | FLAT | Object library in COFFormat |
PE | .exe | 32-bits | FLAT | Windows Portable Executable file |
DLL | .dll | 32-bits | FLAT | Windows Dynamic Linked Library |
See also Program formats for more details.
This parameter specifies the operating mode of the program: 16, 32 or 64 bits (for a 64-bit program, EUROASM CPU=X64 should be set, too).
Program width also defines the default width for all its segments. Its value is a numeric expression which evaluates to 16, 32, 64, or to 0. An empty or zero value (factory default) specifies that program width should be set internally by €ASM according to its FORMAT=. Nevertheless, when a segment is defined, it may specify a different width, regardless of the default width of its program. €ASM doesn't protest against mixing 16-bit and 32-bit segments in one module.
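A 64-bit program header might be sketched as below. The keyword name WIDTH= is inferred from the system variable %^WIDTH mentioned earlier; the program name, the label Main and the placeholder body are hypothetical.

```asm
        EUROASM CPU=X64       ; 64-bit instructions are expected.
Hello64 PROGRAM FORMAT=PE, WIDTH=64, ENTRY=Main:
Main:   PROC
        RET                   ; Placeholder body.
        ENDPROC Main:
        ENDPROGRAM Hello64
```

Without WIDTH=64 the PE format would default to 32 bits, as listed in the FORMAT= table.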
Memory model describes sizes and distances of code and data, and the number of code and noncode segments. The main function of the memory model specification is to set the default distance for segments and procedures defined in the program.
Program property MODEL= is taken into account in procedure pseudoinstructions (PROC, PROC1) and in control-transfer instructions (JMP, CALL, RET) without explicitly specified distance.
In monocode models (TINY,SMALL,COMPACT,FLAT) the default transfer distance is NEAR.
In multicode models (MEDIUM,LARGE,HUGE) the default transfer distance is FAR.
In monodata models (TINY,SMALL,MEDIUM,FLAT) all data are addressed relative to the start of the data segment.
In multidata models (COMPACT,LARGE,HUGE) it is the programmer's responsibility to load the used segment register with the paragraph address of the data before they are accessed.
In the table below, columns 2 to 4 are default segment properties, columns 5 to 7 are link properties and columns 8 and 9 describe the usual usage.
MODEL= | CODE distance | DATA distance | Segm. width | Multi-code | Multi-data | Segm. overlap | CPU mode | Used in formats |
---|---|---|---|---|---|---|---|---|
TINY | NEAR | NEAR | 16 | no | no | yes | real | COM |
SMALL | NEAR | NEAR | 16 | no | no | no | real | MZ, OMF |
MEDIUM | FAR | NEAR | 16 | yes | no | no | real | MZ, OMF |
COMPACT | NEAR | FAR | 16 | no | yes | no | real | MZ, OMF |
LARGE | FAR | FAR | 16 | yes | yes | no | real | MZ, OMF |
HUGE | FAR | FAR | 32 | yes | yes | no | real | MZ, OMF |
FLAT | NEAR | NEAR | 32,64 | no | no | yes | protected | ELF, PE, DLL, COFF |
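The effect of a multicode model on default distances can be sketched as follows. The program and label names are hypothetical and the body is deliberately incomplete; the comments only restate the rules given above.

```asm
Big     PROGRAM FORMAT=MZ, MODEL=MEDIUM, ENTRY=Start:
Helper: PROC                  ; No distance specified, so FAR is assumed in MODEL=MEDIUM.
        RET                   ; Assembled as a far return for the same reason.
        ENDPROC Helper:
Start:  CALL Helper:          ; Far call by default.
        ; Program termination code goes here.
        ENDPROGRAM Big
```

With MODEL=SMALL the same source would use NEAR calls and returns.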
Subsystem is a numeric identifier in the header of a Portable Executable file. This parameter specifies whether MS-Windows should create a new console when the PE program starts. The default is SUBSYSTEM=CON. Set it to SUBSYSTEM=GUI when your PE program creates graphical windows rather than using the standard text input and output.
The value of subsystem is one of the enumerated tokens from the table below, or a numeric expression which evaluates to the corresponding number.
Value | SUBSYSTEM= | Remark |
---|---|---|
0 | 0 | Unknown subsystem. |
1 | NATIVE | Subsystem is not used, i.e. device driver. |
2 | GUI | Windows GUI graphical windows. |
3 | CON | Windows console (character subsystem). |
5 | OS2 | OS/2 character subsystem. |
7 | POSIX | Posix character subsystem. |
8 | WXD | Windows 95/98 native driver. |
9 | WCE | Windows CE graphical windows. |
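A header of a graphical PE program might look like the sketch below. The program name, the label Main and the placeholder body are hypothetical.

```asm
WinApp  PROGRAM FORMAT=PE, SUBSYSTEM=GUI, ENTRY=Main:
Main:   PROC
        ; Create windows and run the message loop here.
        RET
        ENDPROC Main:
        ENDPROGRAM WinApp
```

According to the table, SUBSYSTEM=2 selects the same subsystem as the token GUI.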
This parameter specifies an address where execution of the program begins. Usually this parameter contains a label whose address is set to CS:rIP when loader transfers execution to the program at run-time.
By default the ENTRY= parameter is empty; in this case €ASM will set it to 0 if PROGRAM FORMAT=BIN or to 256 if PROGRAM FORMAT=COM. This parameter should be left empty in linkable program formats but it must be specified in executable formats, otherwise €ASM reports an error.
This parameter limits the number of assembly passes through the source code. It is €ASM who decides how many passes will be necessary, nonetheless this parameter specifies the upper limit.
EuroAssembler repeats assembly passes until offsets of symbols do not change between passes (all symbols are fixed). Then it performs the last, emitting final pass.
In very rare circumstances this may lead to an oscillation of emitted code size due to optimisation of short|near jump encodings. In this very rare case €ASM would request more and more passes forever, that is why their number is limited. When the pass number approaches %^MAXPASSES-1, this (last but one pass) is marked as fixing pass. Symbol offsets may only grow up in the fixing pass and the vacant code space is stuffed with NOP bytes. See the test t7181 as an example of oscillating code with fixing pass.
Factory default value is MAXPASSES=32. You may need to increase this option only in extremely large sources with lots of macros and conditional-assembly constructs. The maximum ever reached within my programs is 44 passes consumed in the module iiz.htm.
Parameter MAXEXPANSIONS= limits the number of %FOR, %WHILE, %REPEAT or %MACRO block expansions. €ASM declares a numeric program property named %. and increments its value whenever a preprocessing block is expanded. When this number exceeds the MAXEXPANSIONS value, €ASM emits an error message and prevents further expansions.
Factory default is MAXEXPANSIONS=65536.
This mechanism protects €ASM from exhausting memory resources when some incorrectly written preprocessing loop fails to exit. If your program is really big, you may need to increase MAXEXPANSIONS value.
The same expansion counter is used to maintain the value of the special automatic %variable %..
OUTFILE= specifies the filename of the output of assembly: an executable or linkable object file. This filename is relative to the current shell directory, if not specified otherwise. Default value is OUTFILE="%^PROGRAM" followed by the extension specified by FORMAT=. E.g. Hello PROGRAM FORMAT=MZ will create the output file "Hello.exe", if not directed otherwise.
Suboperation can be applied to the name specified by this option, for instance OutFile="MyData.bin"[1..256] will assemble the whole module in memory but only its first 256 bytes will be written to the output file MyData.bin. See also the sample program boot16.htm as an example.
STUBFILE= is only used in COFF-based executables, i.e. PE and DLL formats. The stub is a 16-bit MZ program which gets control when the output file is launched in a 16-bit disk operating system (DOS). Usually its only job is to tell the user that this program requires MS-Windows.
When the STUBFILE parameter is empty (default), €ASM will use its own built-in stub code. Otherwise it looks for the previously compiled MZ executable. If STUBFILE= is specified without a path, €ASM looks for the file in the paths specified by the EUROASM option LINKPATH=.
The user-selected 16-bit stub program may have the same functionality as the main 32-bit Windows application. Such executable file then works in the same way both in DOS and in MS-Windows. See the sample project LockTest as an example of this technique.
ICONFILE= should specify an existing file with an icon which will be built into the resource segment of the PE or DLL output file. This icon is used to graphically represent the output file in the MS-Windows environment (Desktop, Explorer etc). The icon file is searched for in the path specified by the EUROASM option LINKPATH=.
Factory-default value is EUROASM ICONFILE="euroasm.ico" which represents an icon shipped with EuroAssembler in directory objlib.
Option ICONFILE= applies only when no resource file is linked to the output program, otherwise it is ignored and the first icon from resources (if any) is used by Windows Explorer to represent the executable.
When the parameter ICONFILE= is empty, no icon is used and €ASM does not create a resource section at all.
The three options LISTMAP=, LISTGLOBALS= and LISTLITERALS= control which auxiliary information will be dumped at the end of the listing file. See t8302 as an example of ListMap and ListGlobals format.
When LISTLITERALS=ON, contents of the data and code literal sections @LT16, @LT8, @LT4, @LT2, @LT1, @RT0 will be dumped too. See t1711 for an example of ListLiterals format.
Specifies the nominal time which is provided by €ASM system variables %^DATE, %^TIME and which is embedded in some COFF-based file formats: PFCOFF_FILE_HEADER.TimeDateStamp, PFLIBCOF_IMPORT_OBJECT_HEADER.TimeDateStamp, PFRSRC_RESOURCE_DIRECTORY.TimeDateStamp.
The value of this parameter represents the number of seconds elapsed since midnight, January 1st 1970, UTC. When it is set to -1 or left empty (factory default), it will be assigned from the system timer at the start of the assembly session.
TIMESTAMP= can be used to fake the time when the target file was created.
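A fixed timestamp is also a simple way to get identical output files from repeated builds of the same source. In the sketch below the program name and entry label are hypothetical; the value 1483142400 corresponds to 2016-12-31 00:00:00 UTC.

```asm
Repro   PROGRAM FORMAT=PE, ENTRY=Main:, TIMESTAMP=1483142400 ; 2016-12-31 00:00:00 UTC.
```

The same nominal time is then returned by %^DATE and %^TIME and written to the TimeDateStamp fields listed above.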
Other PROGRAM parameters are mostly important only in the COFF family of output formats (PE, DLL, COFF), where they form a PE header. See the [MS PECOFF] specification for a detailed description. Do not change them if you don't know what you are doing.
Pseudoinstruction SEGMENT declares a memory segment and specifies its properties. Each segment definition also simultaneously defines a section with the same name. Other sections of the segment may be declared (or switched to) later, with an operation-less statement which has the section name in its label field, for example
[Strings] ; Declare section [Strings] in the current segment.
The name of segment is specified in the label field and it looks like an identifier in square brackets. Segment properties are assigned with keyword parameters.
€ASM automatically declares a few default segments when it starts to assemble a program. In most cases there is no need to explicitly declare any other segments. The number and purpose of default segments depend on the program format. If these segments are not used in the program (no code was emitted into them), they will be discarded at assembly time and will not appear in the object file. This happens when programmers are not satisfied with the default segment names and properties and they declare new segments of their own choice, usually near the program beginning.
Parameter SEGMENT PURPOSE= specifies what kind of information is the segment intended for. It is important in protected mode (formats ELFX, PE, DLL), where descriptor's access bits control the rights granted to read, write or execute the contents of segment.
PURPOSE= | Alias | Access | Default name | Contents |
---|---|---|---|---|
CODE | TEXT | read, execute | [.text] or [CODE] | Program code (instructions) (1) |
RODATA | RDATA | read | [.rodata] or [RODATA] | Initialized read-only data (1) |
DATA | IDATA | read, write | [.data] or [DATA] | Initialized data (1) |
BSS | UDATA | read, write | [.bss] or [BSS] | Uninitialized data (1) |
STACK | | read, write | [STACK] | Machine stack (1) |
LITERALS | LITERAL | read | parasites on other data/code segment | Literal sections (2) |
DRECTVE | | discarded | [.drectve] | Linker directives (3) |
PHDR | | | | Program headers (4) |
INTERP | | | | Dynamic interpreter (4) |
SYMBOLS | | | [.symtab], [.dynsym] | Program symbols (4) |
HASH | | | [.hash] | Hash of symbol names |
STRINGS | | | [.strtab], [.dynstr], [.shstrtab] | Names of symbols or sections (4) |
DYNAMIC | | | [.dynamic] | Dynamic records |
RELOC | | | [.rel(a)*] | Relocations (4) |
GOT | | | [.got] | Global Offset Table |
PLT | | | [.plt] | Procedure Linkage Table |
EXPORT | | | [.edata] | Dynamic link export (4) |
IMPORT | | | [.idata] | Dynamic link import (4) |
RESOURCE | | | [.rsrc] | Programming resources (4) |
EXCEPTION | | | [.pdata] | Runtime exceptions (5) |
SECURITY | | | | Attribute certificate (5) |
BASERELOC | | discarded | [.reloc] | Load-time relocations (4) |
DEBUG | | | [.debug] | Data for debugger (5) |
COPYRIGHT | ARCHITECTURE | | | Architecture info (5) |
GLOBALPTR | | | | RVA of global pointer (5) |
TLS | | | [.tls] | Thread local storage (5) |
LOAD_CONFIG | | | | Load configuration (5) |
BOUND_IMPORT | | | | Bound import (5) |
IAT | | | [.idata] | Import address table (4) |
DELAY_IMPORT | | | | Delayed import descriptor (5) |
CLR | | | [.cormeta] | CLR metadata (5) |
RESERVED | | | | Reserved (5) |
Segments with special purpose names (4),(5) will be marked in the corresponding position of DataDirectory table in the optional header of PE or DLL file format.
Although the operand PURPOSE= accepts only enumerated values, they may be combined using the operator Addition + or Bitwise OR |, for instance [TINY] SEGMENT PURPOSE=CODE|DATA|BSS|STACK or [.rodata] SEGMENT PURPOSE=DATA+LITERALS.
When this parameter is empty or not specified, €ASM will guess the segment's purpose by its class or [name], following these rules:
- STACK (case insensitive): PURPOSE=STACK is assumed.
- BSS or UDATA (case insensitive): PURPOSE=BSS is assumed.
- DATA (case insensitive): PURPOSE=DATA is assumed.
- otherwise PURPOSE=CODE is assumed.
PURPOSE=LITERALS is used together with CODE and|or DATA and it only suggests that this segment should preferably be used to host the literal sections. If no segment is explicitly marked as PURPOSE=LITERAL, €ASM will choose the last data or code segment defined when some literal symbol was encountered.
Purpose guessing first looks at the SEGMENT CLASS= property, and only if it's empty, the segment name is looked at. This mechanism can be used with segments defined in OMF object files to propagate their purpose to the linked executable.
Segment width value can be a numeric expression which evaluates to 16, 32 or 64. By default (if omitted) the width of segment is determined by the program width.
This parameter requests alignment of the segment in memory at run-time. Default alignment is ALIGN=OWORD (16 bytes). Special ELF and PE segments, such as [.symtab], [.strtab], [.reloc] etc. may have different alignment.
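An explicit segment declaration which uses the properties described so far might be sketched as follows. The segment name [Tables] is hypothetical; it is chosen so that it gives no hint about the purpose, which is why PURPOSE= is stated explicitly.

```asm
[Tables]  SEGMENT PURPOSE=DATA, ALIGN=OWORD ; Initialized data, aligned to 16 bytes.
[Strings]                                   ; Declare section [Strings] in the segment [Tables].
```

The width is omitted here, so the segment takes the width of its program.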
This parameter specifies how segments from other program modules will be combined at link time. This is important only in the MZ program format (16-bit DOS executables) linked from several object files. Possible values:
The value of CLASS= is an arbitrary identifier. It may be used by the linker to guess the segment purpose (CODE|DATA|BSS) in object formats which do not carry purpose information (OMF).
This pseudoinstruction enumerates segments addressed with the same addressing frame. Data in all grouped segments are addressed with the same value of segment register.
Segment groups are applicable in big realmode 16-bit programs. Only a 16-bit segment can be a member of the group.
Name of the group must be defined in the label field of the pseudoinstruction GROUP.
The names of grouped segments are enumerated in operand fields.
All names are enclosed in square brackets [ ].
Group name may be the same as the name of one of its segments. Example:
[DGROUP] GROUP [DATA],[STRINGS]
Grouped segments may be defined before or after the GROUP statement.
This pseudoinstruction has no keyword operands.
In short, the relation between a group and its segments at link time is similar to the relation between a segment and its sections at assembly time.
The PROC and ENDPROC pseudoinstructions declare a namespace procedure block. Most often it ends with the machine instruction RET, so the block can be called to perform some function. After execution it returns to the instruction just behind the CALL.
The mandatory label of PROC
declares an assembler symbol
which is the procedure name. The same identifier may be used as the first
and only operand of the corresponding ENDPROC
pseudoinstruction.
Alias ENDP may be used instead of ENDPROC.
Equally the ENDPROC
may define its own label, too.
This label doesn't represent a return from the subprogram, it points
to the code which follows PROC..ENDP block. The label of ENDPROC is useful
only when the PROC..ENDP block is used to define the namespace block
rather than a callable subprogram block. Examples:
SubPgm:PROC ; Define PROC as a call-able subprogram block.
; PROC body instructions.
TEST SomeCondition
JC .Abort: ; Go to return below CALL SubPgm: statement.
TEST OtherCondition
JC .End: ; Go to continue below .End: ENDP. Probably not what the programmer wanted.
; More body instructions.
.Abort: RET ; Return below CALL SubPgm: statement.
.End: ENDP SubPgm:
NameSp:PROC ; Define PROC as a pass-through-able namespace block.
; PROC body instructions.
TEST SomeCondition
JC .End: ; Go to continue below .End: ENDP NameSp: statement.
; More body instructions. No RET instruction here.
.End: ENDP NameSp: ; Continue below this statement.
Jumping to the ENDPROC
label differs from jumping to
macroinstruction EndProcedure
defined in
calling convention macrolibraries.
Pseudoinstructions PROC, ENDPROC, PROC1, ENDPROC1
do not emit any machine code.
What are procedures good for? We could manage to write an assembly program without PROC..ENDP pseudoinstructions easily but wrapping the block of code in PROC..ENDPROC block has some advantages:
- The code is better structured and easier to understand.
- €ASM checks the proper pairing of labels, which is important when procedures are nested.
- It makes obvious where the procedure ends. You don't have to inspect the code to find out which RET instruction is the last one in a procedure when you want to copy and paste it to another program (of course it is a bad programming practice to have more than one return point in a subroutine but sometimes it is used).
- Each PROC block defines its own namespace, preventing naming conflicts between local labels used in procedures.
Pseudoinstructions PROC and PROC1 accept keyword operands DIST= and ALIGN=.
DIST= sets the distance of the procedure (NEAR or FAR).
When DIST=FAR, all CALLs to this procedure default to FAR and all RETs within this procedure default to FAR
(of course this can be overridden with the instruction suffix CALLN/CALLF, RETN/RETF).
The default parameter value depends on the program's memory model.
Procedure alignment is ALIGN=BYTE
by default.
For the best usage of instruction cache it may sometimes be useful to declare
frequently called procedures with PROC ALIGN=OWORD, if code size is not an issue.
This boolean option allows you to switch off the internal check of PROC..ENDPROC label matching. This has only exceptional use in macros simulating built-in pseudoinstruction, which need to hack their block context, such as Procedure and EndProcedure.
See also the instruction modifier NESTINGCHECK=.
Pseudoinstruction PROC does not accept ordinal parameters. Parameters can be passed in registers or on the machine stack and managed individually. Calling convention macrolibraries shipped with EuroAssembler define macros Procedure and EndProcedure with a function similar to PROC and ENDPROC; they allow an arbitrary number of arguments to be passed as macro parameters when the Procedure is invoked.
Pseudoinstructions PROC1 and ENDPROC1 are equivalent to PROC and ENDPROC with two major differences:
A procedure declared with PROC1..ENDPROC1 block may occur in the program more than one time. Repeated declarations of PROC1..ENDPROC1 block with the same label are ignored, it is only emitted once.
This predetermines PROC1 for semiinline macros, which contain both 1) the call of a procedure, and 2) the procedure itself. When the procedure is defined with PROC1..ENDPROC1, such macro can be invoked many times but the called procedure will be assembled and emitted only once (during the first macro expansion).
The procedure body is emitted to a separate runtime section named [@RT1], which is automatically created in the segment with PURPOSE=CODE+LITERAL
or in the most recently defined code segment. In some circumstances €ASM may also use runtime sections
[@RT2], [@RT3] etc. This happens when the code inside
the PROC1..ENDPROC1 block contains other semiinline macros, so the current
runtime section already is [@RT1] and €ASM must choose another one.
Emitting procedures to a section different from the one the main program currently uses has the advantage that the procedure body need not be bypassed with a jump instruction. It also leads to shorter code, because jumps over the semiinline macros need not jump over the whole procedure body, which could easily push them beyond the range of a short jump (about 128 bytes) and require the longer form of jump instructions.
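The once-only emission of a PROC1..ENDPROC1 block can be pictured with a small Python model. It is a sketch under simplifying assumptions, not a description of €ASM internals: the function `expand_semiinline` and the text it produces are hypothetical, and a single runtime section stands in for [@RT1].

```python
def expand_semiinline(invocations):
    """Return (main_section, runtime_section) line lists for macro invocations."""
    main, runtime, emitted = [], [], set()
    for proc_name in invocations:
        main.append(f"CALL {proc_name}:")   # every invocation emits the call
        if proc_name not in emitted:        # the PROC1 body is emitted only once
            emitted.add(proc_name)
            runtime.append(f"{proc_name}: PROC1 ... ENDPROC1 {proc_name}:")
    return main, runtime
```

Invoking the same hypothetical macro three times adds three CALL lines to the main section but only one procedure body to the runtime section, so no jump is needed to bypass the body.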
Pseudoinstructions HEAD and ENDHEAD just mark a division of source code. This division may be included into other source files with INCLUDEHEAD or INCLUDEHEAD1. The block usually contains the interface of programming objects (definitions of structures, macros, constants) which need to be included in other separately assembled programs.
Label field of pseudoinstruction HEAD may be used as a block identifier but it does not create a symbol. More than one HEAD..ENDHEAD block can be specified in a source file. When these blocks are nested, the whole outer (larger) block will be included.
Languages which have not implemented this mechanism require the interface part to be put in separate header files. With HEAD..ENDHEAD it can be kept together with the implementation body in one compact file.
This pseudoinstruction incorporates file(s) with the name(s) specified as its operand to the source text. The INCLUDE statement is virtually replaced with the contents of included file.
Inclusion may be nested, i. e. included files may contain other INCLUDE statements.
Double quotes may be omitted when the filename contains only alphanumeric characters (no spaces or punctuation).
The pseudoinstruction INCLUDE can have unlimited number of operands, for example
INCLUDE "Win*.htm", ./MyConstants.asm, C:\MyLib\*.inc
.
When the file is specified without a path, it will be searched for in folders specified
with EUROASM option INCLUDEPATH=.
If the included filename contains at least one slash, backslash or colon
/ \ : , this means that it has specified its own path and the
INCLUDEPATH=
is ignored in this case.
The filename may contain wildcards * ?; in this case €ASM will include all files matching this mask. The order of inclusion depends on the operating system.
Behaviour of INCLUDE statement is described in the following table:
Path | Wildcard | Example | When the first file is found | When no file is found |
---|---|---|---|---|
No | No | file.inc | Done, stops further searching in INCLUDEPATH. | Error E6914. |
Yes | No | ./file.inc | Done. | Error E6914. |
No | Yes | file*.inc | Continue searching for more files in INCLUDEPATH. | Nothing is included, no error. |
Yes | Yes | ./file*.inc | Continue searching for more files in the given path. | Nothing is included, no error. |
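The four rows of the table can be expressed as one lookup routine. The following Python sketch only models the documented behaviour; the function name `resolve_include` is hypothetical, the sorted order of wildcard matches is an arbitrary choice (the real order depends on the operating system), and error E6914 is represented by a Python exception.

```python
import glob
import os

def resolve_include(name, include_paths):
    """Return the list of files one INCLUDE operand would pull in."""
    has_path = any(ch in name for ch in "/\\:")
    has_wildcard = any(ch in name for ch in "*?")
    # A name with its own path ignores INCLUDEPATH entirely.
    folders = [None] if has_path else list(include_paths)
    found = []
    for folder in folders:
        pattern = name if folder is None else os.path.join(folder, name)
        if has_wildcard:
            found.extend(sorted(glob.glob(pattern)))  # keep searching further
        elif os.path.isfile(pattern):
            return [pattern]                          # first hit ends the search
    if not has_wildcard:
        raise FileNotFoundError(f"E6914: {name}")     # plain name, nothing found
    return found                                      # may be empty, no error
```

A plain name stops at the first folder where the file exists, whereas a wildcarded name collects matches from every folder and silently returns an empty list when nothing matches.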
Only a part of source file can be included when substring
or sublist operator immediately follows the file name.
Example: INCLUDE "file.inc"{%&-20..%&}
will include
the last twenty lines of file.inc
(automatic %variable %&
represents the number of lines in the file).
Filename must be in double quotes when the suboperation is used.
If the suboperation is used on wildcarded filename, it will be applied to all files.
The pseudoinstruction INCLUDE1 (include once) behaves exactly like INCLUDE
but first it
checks whether the same file (with the same size and contents, regardless of its name)
was already included in the program, and skips the file in that case.
Using INCLUDE1 instead of INCLUDE allows to resolve mutual dependencies
of source libraries. When some included library uses macros, structures and constant definitions
from another library, we can do INCLUDE1 another.library
in each such library.
The INCLUDEHEAD variant includes only the contents of HEAD..ENDHEAD block(s) of the included file, see the test t2420. An error is reported if no such block is found in the file or if the block is incomplete (missing ENDHEAD). When a suboperation is used with INCLUDEHEAD, it is applied first to the entire included file and HEAD..ENDHEAD block is searched for in the subrange only.
The INCLUDEHEAD1 and INCLUDE1 will ignore the source if the file or any part of it has already been included in the program using INCLUDE, INCLUDE1, INCLUDEHEAD or INCLUDEHEAD1.
Library is treated as already-included when it was included as an entire file with INCLUDE or INCLUDE1, when its interface division was included with INCLUDEHEAD or INCLUDEHEAD1, or when only a suboperated part of it was included.
Unlike INCLUDE and INCLUDEHEAD, this pseudoinstruction does not treat the file contents as a source to assemble, but the contents is emitted as is at the position specified by the offset pointer $ of current section.
Including binary data should not be confused with linking; it does not update relocatable addresses or external symbols. For instance the statement
INCLUDEBIN "C:\WINNT\Media\chimes.wav"[0x2C..]
will skip the first 0x2C bytes of WAV header in sound file and load the rest (raw samples) to the assembled target, as if they were defined with DB statements. See also t2470.
Pseudoinstruction LINK specifies file(s) which should be linked into the current program.
Each ordinal operand represents a file name, which may have wildcards and may be specified with or without path. Relative path refers to the current directory.
If the linked file name does not contain path, it will be searched for
in all directories specified with EUROASM LINKPATH=
option, respectively.
Unlike included files, suboperations with linked files are not supported.
Linkable files have specific internal structure, which probably would have been damaged if only suboperated part of the file were subjected to the link process. Therefore only whole object file or library can be linked.
Position of the LINK statement within the program is not important, the actual
linking will be performed when the final program pass is about to end.
Order in which the files are linked respects the order in which pseudoinstruction LINK
appeared in source. However, if linked files are specified with wildcards, e.g. LINK "modul*.lib",
their order depends on current filesystem and cannot be reliably predicted. Example:
LINK Subproc.obj, "..\My modules\W*.obj"
See static linking for more info.
Pseudoinstructions GLOBAL, PUBLIC, EXTERN, EXPORT, IMPORT
set the
scope property of symbol(s), which is used in linking.
The symbol, whose scope is being declared, may be in the label field or in the operand field of the statement, or in both. More than one symbol may be declared with one statement. Symbols in question may be forward or backward referred.
Example: Explicit scope declaration of four symbols: Sym1 PUBLIC Sym2, Sym3, Sym4
Specifying the symbol as PUBLIC just tells €ASM that the symbol, which was or will be defined somewhere else in the program, should be referrable from other statically linked programs. Public declaration does not create the symbol yet, in fact symbol with that name must be defined somewhere else in the same program.
Pseudoinstruction EXTERN symbol
tells €ASM that the symbol is not defined
in the program, so references to its address must be patched in the code at link time.
It is an error to define symbol which is declared as EXTERN in the same program.
Instead, it is searched for in other modules at link time,
and only the linker may report an error when the external symbol is not found.
Pseudoinstruction GLOBAL can be used to automate dealing with PUBLIC and EXTERN scopes. If the symbol is marked with a GLOBAL statement, it behaves either as public or as external, depending on whether or not it is defined in the same program.
As the programmer surely knows whether the declared symbol belongs to the current program or not, why is the declaration of PUBLIC and EXTERN scope duplicated by GLOBAL? Let's have a program PgmA which defines the public symbol SymA and refers to the external symbol SymB. Similarly, PgmB defines SymB and refers to SymA:
PgmA PROGRAM
PUBLIC SymA
EXTERN SymB
CALL SymB: ; Reference to external symbol.
SymA: RET ; Definition of public symbol.
ENDPROGRAM PgmA
PgmB PROGRAM
PUBLIC SymB
EXTERN SymA
CALL SymA: ; Reference to external symbol.
SymB: RET ; Definition of public symbol.
ENDPROGRAM PgmB
If we replace PUBLIC and EXTERN declarations with GLOBAL, the same declaration statement can be used in all statically linked programs, either copied and pasted or included from an external file, which is easier to maintain:
PgmA PROGRAM
GLOBAL SymA, SymB
CALL SymB: ; Reference to external symbol.
SymA: RET ; Definition of public symbol.
ENDPROGRAM PgmA
PgmB PROGRAM
GLOBAL SymA, SymB
CALL SymA: ; Reference to external symbol.
SymB: RET ; Definition of public symbol.
ENDPROGRAM PgmB
Another raison d'être of GLOBAL is backwards compatibility with NASM, which doesn't know the directive PUBLIC at all. NASM uses the directive GLOBAL instead whenever €ASM would require PUBLIC.
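The way GLOBAL resolves to one of the two scopes can be summarised in a short Python model. It is merely a sketch of the rule described here; the function `resolve_global` is a hypothetical name and the real decision is made by €ASM when the program has been assembled.

```python
def resolve_global(global_symbols, defined_symbols):
    """Map each GLOBAL symbol to the scope it effectively gets in one program."""
    defined = set(defined_symbols)
    # Defined in this program: behaves as PUBLIC. Otherwise: behaves as EXTERN.
    return {sym: ("PUBLIC" if sym in defined else "EXTERN")
            for sym in global_symbols}
```

Applied to the PgmA and PgmB example, the identical list `["SymA", "SymB"]` yields PUBLIC for SymA in PgmA and EXTERN for SymA in PgmB, which is why the same GLOBAL statement can be shared by both programs.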
Scopes IMPORT and EXPORT are used in dynamic linking,
when our program calls an imported function from DLL.
This pseudoinstruction accepts keyword parameter LIB=
which specifies the library file. The LIB= parameter may be omitted
when the symbols are imported from the default MS-Windows library kernel32.dll.
Library file name doesn't have to be in quotes when it follows DOS convention 8.3.
The library is always specified without a path. Operating system uses its own rules
([WinDllSearchOrder])
concerning the directories where the libraries are searched for at bind-time.
Scope EXPORT is used when we make a dynamic library and it declares symbols which are expected to be imported by other programs. Similar to the PUBLIC scope, symbol marked for EXPORT must be defined in the program, sooner or later.
Pseudoinstruction EXPORT accepts two keyword parameters
FWD=
and LIB=
, which specify that the exported symbol (function name)
is in fact provided by another dynamic library (defined with LIB=) under a different
symbol name (defined with FWD=). Example:
kernel32 PROGRAM FORMAT=DLL
EXPORT EnterCriticalSection, LIB="NTDLL.dll", FWD=RtlEnterCriticalSection
; Other kernel functions.
ENDPROGRAM kernel32
Library "kernel32.dll" exports the API function EnterCriticalSection, which is in fact provided by the library "NTDLL.dll" under the name RtlEnterCriticalSection. In another Windows version it may be provided by a different library "XPDLL.dll" but programs importing the function from the proxy library "kernel32.dll" need no update or recompilation.
This pseudoinstruction is used for explicit
alignment of current section pointer $. For instance
ALIGN OWORD
in code section will emit several (0..15)
bytes of NOP operation, so that the next statement will be emitted
at octword-aligned address.
ALIGN in data sections uses NUL byte (0x00)
instead of NOP (0x90) as padding.
The operand can be a type specifier in short or long notation:
B, U, W, D, Q, T, O, Y, Z, BYTE, UNICHAR, WORD, DWORD, QWORD, TBYTE, OWORD, YWORD, ZWORD
or arithmetic expression which evaluates to the power of two:
1, 2, 4, 8, 16, 32, 64, 128, 256, 512
.
ALIGN TBYTE
aligns to 8.
ALIGN statement may have no label but it can have two operands.
The second operand is used for intentional unalignment; it need not be a power of 2
and it must be lower than the first one.
For instance ALIGN OWORD, QWORD aligns $
to an odd multiple of 8.
ALIGN 8,2
requests the current offset be set at
the second byte in qword (counted from zero).
Example of offsets which meet such requirement are 2, 10, 18, 26...
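The arithmetic behind both operands of ALIGN can be checked with a small Python function. This is an illustrative model only; the name `align_offset` is hypothetical, and it returns the new value of the offset pointer rather than emitting padding bytes.

```python
def align_offset(offset, alignment, unalign=0):
    """Return the lowest offset >= `offset` with offset % alignment == unalign."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    if not 0 <= unalign < alignment:
        raise ValueError("second operand must be lower than the first one")
    # Number of padding bytes is (unalign - offset) modulo alignment.
    return offset + (unalign - offset) % alignment
```

For `ALIGN 8,2` the function maps the offsets 0, 3, 11 and 19 to 2, 10, 18 and 26, and for `ALIGN OWORD, QWORD` an offset of 9 moves to 24, an odd multiple of 8. The difference between the result and the input is the number of padding bytes, always between 0 and alignment-1.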
A structure represents a virtual section of data declarations which can be used as a mask or a grid-template laid over a piece of memory. Structure is declared with the STRUC..ENDSTRUC block. The only statements which may be used within the block are
Declaration of a structure does not emit any data to the target file.
Data are emitted or reserved only when the declared structure is actually
used in a data definition (in pseudoinstruction D or DS).
We say that the structure is instantiated.
When initialized data is defined in the structure declaration,
it will be used to initialize corresponding members
at the time of structured data definition (with pseudoinstruction D or DS),
unless explicitly redefined.
Named data definitions in the structure must have local names
(starting with .).
This allows to:
Each member is given its offset relative to the start of the structure. The program section, which was current at the time of structure declaration, is irrelevant. Each structure declaration temporarily creates its own pseudosection with a zero based virtual address 0.
Structure must be given a unique structure name, which is defined in the label field of a STRUC statement and, optionally, in the operand field of the ENDSTRUC statement.
The size of the structure can be obtained with the attribute SIZE#Structure_name
.
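How member offsets and SIZE# follow from a structure declaration can be illustrated with a short Python sketch. It assumes that members are packed without padding (that is, no automatic alignment is in effect) and it uses a hypothetical helper `layout` and a reduced table of type sizes; it models the bookkeeping only.

```python
SIZES = {"B": 1, "W": 2, "D": 4, "Q": 8}  # bytes per short type specifier

def layout(members):
    """Return ({member: offset}, total_size) for (name, type) pairs, no padding."""
    offsets, offset = {}, 0
    for name, type_letter in members:
        offsets[name] = offset           # offset relative to the structure start
        offset += SIZES[type_letter]
    return offsets, offset               # the total corresponds to SIZE#
```

For a structure with a WORD member .Year followed by BYTE members .Month and .Day, the offsets come out as 0, 2 and 3 and the size as 4, independently of the section that was current when the structure was declared.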
Pseudoinstruction STRUC accepts the keyword operand ALIGN=, which specifies alignment
of instances of the structure when EUROASM AUTOALIGN=ON
.
If the alignment is not explicitly specified with STRUC declaration, alignment corresponding to
PROGRAM WIDTH=
is used as the default (WORD, DWORD or QWORD).
See tests t2500, t25010, t2504 for more examples of structure declaration.
Both initialized and uninitialized data are defined and reserved with
pseudoinstruction D. When a static value is specified, the data are defined.
When the value is omitted, data are reserved.
If EUROASM option AUTOSEGMENT=ON
, INSTR data definition will switch to code section,
all other data definition will switch to data section
and data reservation will switch to bss (uninitialized data) section.
See t2482 for more examples.
Each operand of D
is a data expression.
Pseudoinstruction mnemonic D
may be appended with
suffix B, U, W, D, Q, T, O, Y, Z, I, S
. Suffix defines the default datatype,
which is used if it's not explicitly specified in operand.
For instance DD 2,3,4
defines three dwords with static values 2, 3 and 4.
Suffix also determines datatype of symbol, which defines the data.
For instance in definition Sym1 DQ B 1, W 2, D 4
the suffix specifies that
the datatype of Sym1 is QWORD, although it defines only byte, word and dword data.
The default datatype specified with mnemonic suffix can be overridden in operand fields
by an explicit datatype in short or long notation.
Operands without explicit redefinition take the default data type from D-suffix,
for instance DB 27, "$", W 120
defines two bytes followed with one word.
Datatypes in the operand may be specified with long names as well,
e.g. DB 27, "$", WORD 120
.
See t2481 for more examples.
For instance TranslateTable: D 256 * BYTE
reserves 256 uninitialized bytes.
If duplication is not used, it defaults to 1. A negative duplicator is not permitted.
Duplicator 0 does not define or reserve any data, but still it provides default datatype
of the symbol and, if AUTOALIGN=ON, it aligns the current offset $.
If no suffix is used, the default datatype is taken from the first nonempty operand,
e.g. D D 2,3,4
defines three dwords with static values 2,3 and 4.
When no default is defined, as in D 2
, €ASM reports an error.
The only exception, when the datatype needs not to be explicitly specified, is the definition of a text string,
for instance D "Some text."
. In this case the default datatype is B or U,
which depends on the current value of EUROASM option UNICODE=
.
L1: D B 5 ; Define one byte with value 5. TYPE#L1='B', SIZE#L1=1.
L2: D 2*WORD 3 ; Define two words with value 3. TYPE#L2='W', SIZE#L2=4.
L3: DW W ; Reserve one word. TYPE#L3='W', SIZE#L3=2.
L4: DW 0*D ; Reserve nothing, align to DWORD. TYPE#L4='W', SIZE#L4=0.
L5: DQ ; Reserve nothing, align to QWORD. TYPE#L5='Q', SIZE#L5=0.
L6: D ; Do nothing. TYPE#L6='A', SIZE#L6=0.
Unlike in other assemblers, an omitted operand doesn't emit any data; €ASM requires that the operand type and|or value be specified, no matter whether the D operation is suffixed or not. For instance DB reserves one byte in MASM but it does nothing in €ASM. Use D B or DB B instead.
EuroAssembler can define operation code of machine instruction as data, with pseudoinstruction DI. It is similar to DB or DU but the string contents is not emitted verbatim, it is assembled first. The quoted text in DI operand(s) should be a valid machine instruction, it may have prefix and operands but not a label.
For instance DI "SEGES:MOVSB"
defines bytes 0x26,0xA4.
D 8*I"MOVSD"
defines eight bytes 0xA5.
See t2515 for more DI examples.
A structured memory variable is defined with pseudoinstruction
DS struc_name
or just D struc_name
.
€ASM does not allow multiple ordinal operands when a structured object
is defined, such as DS MyStruc1, Mystruc2.
Nevertheless, duplication is supported, e. g. DS 4*MyStruc.
Members of the structured object can be overridden statically, using keyword operands. Keyword name is the local name of the defined member, immediately followed by the equal sign = and by the new value of the statically defined member. Namespace of operand fields in DS statement is temporarily changed to the namespace of structure definition.
The instance of MyStruc declared above in a STRUC example
could be for example defined as MyObject DS MyStruc, .Member2=2, .Member4=4.
This initializes the contents of MyObject.Member2 to dword integer 2, and
the contents of MyObject.Member4 to byte integer 4.
Contents of MyObject.Member3 is already statically defined as byte integer 255,
other members of MyObject remain uninitialized.
If at least one member is initialized, the object is by default emitted to
data section, uninitialized members are filled with zeroes.
See also test t2510.
Pseudoinstruction EQU (or its alias =) defines a symbol, which is presented in the label field. The statement must have just one operand, which specifies the address or the numeric value of the symbol.
Instruction Label:EQU $
or Label:= $
are equivalent to Label:
, i.e. specifying
the statement with label only, which assigns an address to the symbol Label.
Using EQU is the only way to define
a plain numeric symbol, such as FILE_ATTRIBUTE_ARCHIVE = 00000020h.
See any macrolibrary within PROGRAM realm as an example of EQU symbol definitions, for example winsfile.htm.
Those pseudoinstructions define block comments, i.e. range of source code which is ignored by €ASM. In the label field of %COMMENT there may be an identifier, which gives the block a name (but it does not create a symbol). The same identifier can be used as the first operand of %ENDCOMMENT statement. This helps €ASM to check correct matching of %COMMENT & %ENDCOMMENT, especially when the comment blocks are nested.
%DROPMACRO tells €ASM to forget previously defined macroinstruction.
One %DROPMACRO statement may drop one or more macros specified as operands, e.g.
%DROPMACRO Macro1, Macro2, Macro3
.
Alternatively we may drop all macros declared so far with
%DROPMACRO *
.
See also %DROPMACRO example below.
Instructions between %IF and %ENDIF are assembled only if the condition in the first and only %IF operand is evaluated as true. %IF accepts extended boolean expression and it also accepts an empty operand, which is always evaluated as false.
Pseudoinstruction %ELSE may occur in the %IF..%ENDIF block. It reverses the logic of assembly: instructions between %IF and %ELSE are assembled when the %IF condition is true and instructions between %ELSE and %ENDIF are assembled when the %IF condition is false.
%IF may have an identifier in the label field which does not create a symbol but it identifies the block. The same identifier can be used in the operand field of %ELSE and %ENDIF statements.
Pseudoinstructions %FOR and %ENDFOR create block which is assembled repeatedly for each operand of the %FOR statement. The label field of %FOR statement must be an identifier. It does not create a symbol, instead it defines a formal preprocessing %variable which is accessible in the %FOR..%ENDFOR block only. The name of this %variable consists of percent sign followed with the identifier.
Operands can be arbitrary elements which we need to operate with: register, number, expression, string. The formal %variable will be assigned with each %FOR operand respectively, and the block will be emitted with its value in the formal %variable. The following example defines %FOR loop with three operands and it emits three memory variables:
data %FOR "a", 3*B(5), "Long text"
D %data
%ENDFOR data
and it will be expanded to
|00000000:61 + D "a"
|00000001:050505 + D 3*B(5)
|00000004:4C6F6E672074657874 + D "Long text"
|0000000D: |
Repeating the identifier in the operand field of %ENDFOR and %EXITFOR statement is optional and it can be used to check proper pairing of block instructions.
The operand of %FOR can also be a numeric range, the block is repeated with each integer value of the range in this case. Slope of the range can be negative; default step of control %variable is -1 in this case instead of +1.
i %FOR 0..5 ; Slope is positive, therefore implicit step = +1.
DB "A"+%i ; Define bytes "A","B","C","D","E","F".
%ENDFOR i
j %FOR 'z'..'x' ; Slope is negative, therefore implicit step = -1.
DB %j ; Define bytes 'z','y','x'.
%ENDFOR j
See also t2640.
%FOR accepts the keyword integer operand STEP=,
which explicitly defines how the control %variable is incremented
when a range is used.
The default value is zero (STEP=0), which is a special case:
the actual effective step is then either +1 or -1,
depending on the range slope.
Both kinds of operands (enumerated and range) can be combined. When the step is explicitly defined and its sign differs from the range slope, the %FOR..%ENDFOR body is not assembled for that range. On the other hand, if STEP= is omitted or set to 0, ranges with both slopes can be combined in one %FOR statement and each range operand will receive its own appropriate step +1 or -1. Example:
a %FOR 1..3, 6..4, 7 ; Block is assembled with %a = 1,2,3,6,5,4,7.
%ENDFOR
b %FOR 0..64, 256, 400..300, 512, STEP=16 ; Block is assembled with %b = 0,16,32,48,64,256,512.
%ENDFOR
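The interplay of enumerated operands, ranges and STEP= can be reproduced with a short Python model. It is a sketch of the documented rules, not of €ASM's code: the function `expand_for` is hypothetical, a range is represented as a `(first, last)` tuple, and a range whose ends are equal is treated as having a positive slope, which is an assumption.

```python
def expand_for(operands, step=0):
    """Return the control values; a range is a (first, last) tuple."""
    values = []
    for op in operands:
        if not isinstance(op, tuple):
            values.append(op)              # enumerated operand, used as is
            continue
        first, last = op
        slope = 1 if last >= first else -1
        eff = step if step else slope      # STEP=0 means: follow the slope
        if (eff > 0) != (slope > 0):
            continue                       # explicit step against the slope
        v = first
        while (v <= last) if eff > 0 else (v >= last):
            values.append(v)
            v += eff
    return values
```

`expand_for([(1, 3), (6, 4), 7])` gives 1,2,3,6,5,4,7, and `expand_for([(0, 64), 256, (400, 300), 512], step=16)` gives 0,16,32,48,64,256,512, because the descending range is skipped when the explicit step is positive.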
When the formal %FOR variable has identical name with another previously user-defined %variable, it prevails and the user-defined %variable is not visible inside the %FOR..%ENDFOR block. See t2641.
When €ASM encounters the %EXITFOR pseudoinstruction,
it breaks the assembly of remaining instructions in %FOR..%ENDFOR block
and continues below the %ENDFOR statement, no matter how many unprocessed %FOR operands
are left.
i %FOR 0..9
DB %i
%IF %i>=3
%EXITFOR i
%ENDIF
DB "a" + %i
%ENDFOR i ; This will define bytes 0,"a",1,"b",2,"c",3
In nested %FOR..%ENDFOR blocks the formal variable (%EXITFOR's first and only operand) can be used for specification which of the nested block should be exited, see t2642 as an example.
The block of statements between %WHILE and %ENDWHILE is being assembled repeatedly while the condition in the first and only %WHILE operand is true. If the condition is false at the block entry, it is skipped entirely.
An identifier may be used in the label of %WHILE and in the operand of %ENDWHILE and %EXITWHILE just for visual binding; it does not define a symbol.
Unlike %FOR, which temporarily declares and maintains its own control %variable, the %WHILE does not. It is the programmer's duty to declare some control %variable outside the block, and to change it within %WHILE..%ENDWHILE. Example:
%i %SETA 3 ; Define %variable %i which will control the block expansion.
id1 %WHILE %i
C%i: DB %i
%i %SETA %i - 1 ; Alter the user-defined control %variable.
%ENDWHILE id1
; Statements assembled with %WHILE..%ENDWHILE block: C3: DB 3, C2: DB 2, C1: DB 1.
%EXITWHILE in the block will cause skipping the rest of statements; €ASM will continue below %ENDWHILE.
The conditional assembly block %REPEAT..%ENDREPEAT is similar to
%WHILE..%ENDWHILE but the condition is evaluated at the end of the block,
and the logic is inverted. %REPEAT takes no label and no condition operand.
The statements in the block are always assembled at least once.
The control condition is in the operand field of %ENDREPEAT;
if it evaluates to false, €ASM will assemble the block again.
Alias %UNTIL may be used instead of mnemonic %ENDREPEAT.
Block %REPEAT..%ENDREPEAT can use identifier for nesting check. Unlike other block statements, position of the block identifier is different: Block identifier can be specified as the first operand of %REPEAT, and as the label of %ENDREPEAT (alias %UNTIL).
%i %SETA 3 ; Define %variable %i which will control the block expansion.
%REPEAT Id1
C%i: DB %i
%i %SETA %i - 1 ; Alter the user-defined control %variable.
Id1 %UNTIL %i = 0
; Statements assembled with %REPEAT..%UNTIL block: C3: DB 3, C2: DB 2, C1: DB 1.
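The control flow of the %REPEAT..%UNTIL example corresponds to a loop that tests its condition at the bottom. The Python sketch below mirrors that example only; `expand_repeat` is a hypothetical name and the returned strings merely stand for the statements that would be assembled.

```python
def expand_repeat(i):
    """Model the %REPEAT..%UNTIL example: the body runs before the test."""
    emitted = []
    while True:
        emitted.append(f"C{i}: DB {i}")   # block body, assembled at least once
        i -= 1                            # corresponds to %i %SETA %i - 1
        if i == 0:                        # %UNTIL %i = 0 ends the loop when true
            break
    return emitted
```

Starting with 3 produces the three statements C3, C2 and C1. Starting with 1 still assembles the body once, which is the practical difference from %WHILE, where a false condition at entry skips the block entirely.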
%EXITREPEAT in the block will cause skipping the rest of statements; €ASM will continue below %ENDREPEAT.
Pseudoinstruction %SET and other members of its family are designed to assign a value to preprocessing %variable. This %variable is in the label field of the statement.
%SET assigns the whole list of operands as a verbatim text, including the commas which separate operands from one another. White spaces between the operation mnemonic (%SET) and the first operand are omitted. White spaces after the last operand are trimmed off, too. White spaces are similarly trimmed when line continuation is used.
%CardList %SET Hearts, Diamonds, Clubs, Spades ; Comment
%CardList now will contain the string Hearts, Diamonds, Clubs, Spades (31 characters including spaces and commas).
See also t2810.
%SETA accepts arithmetic expressions. They will be evaluated and assigned to the %variable as a signed decimal number. An error is reported if the %SETA operand is not a valid expression.
When more than one operand is used, each value is set to the corresponding comma-separated item of the %variable, which is being assigned. Example:
%Value %SETA PoolEnd - PoolBegin
%Sizes %SETA 2+3, 4, ,-5*2
The difference between offsets PoolEnd and PoolBegin
in the previous example was calculated and assigned to %Value as a decimal number.
%Sizes now contains the text 5,4,,-10 (8 characters).
Individual items of %Sizes can be retrieved with sublist
operation, such as %Sizes{2}
.
See also t2821.
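The item-by-item evaluation performed by %SETA with several operands can be imitated in Python. This is a deliberately small model: the function `seta` is hypothetical, it understands only integer literals with the operators + - * and integer division, and it knows nothing about symbols such as PoolEnd.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.FloorDiv: operator.floordiv,
        ast.USub: operator.neg, ast.UAdd: operator.pos}

def _eval(node):
    """Evaluate a tiny subset of arithmetic expressions on integers."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError("unsupported expression")

def seta(operand_field):
    """Evaluate each comma-separated operand; empty operands stay empty."""
    items = []
    for text in operand_field.split(","):
        text = text.strip()
        items.append(str(_eval(ast.parse(text, mode="eval").body)) if text else "")
    return ",".join(items)
```

Under these assumptions `seta("2+3, 4, ,-5*2")` returns the 8-character text `5,4,,-10`, matching the %Sizes example: every operand is replaced by its decimal value and the empty operand keeps its place between the commas.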
%SETA is better suited for modification of a control %variable in a preprocessing loop, such as %i %SETA %i+1.
Though the text assignment %i %SET %i+1
would work here as well, with %SET the expression is not evaluated immediately and we might wind up with something like +1+1+1+1+1+1+1+1+1+1+1+1+1+1+1
after the 15th expansion.
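The difference between the two assignments can be made tangible with two tiny Python functions. They are a loose analogy rather than a model of the preprocessor: both names are hypothetical, and the %variable is represented as a Python string.

```python
def grow_with_set(start, rounds):
    """%i %SET %i+1: plain text substitution, nothing is evaluated."""
    value = start
    for _ in range(rounds):
        value = value + "+1"          # the text just keeps growing
    return value

def grow_with_seta(start, rounds):
    """%i %SETA %i+1: the expression is evaluated on every assignment."""
    value = int(start)
    for _ in range(rounds):
        value = value + 1             # the stored text stays a short number
    return str(value)
```

After three rounds starting from `"0"`, the %SET-like variant holds `0+1+1+1` while the %SETA-like variant holds `3`. Both would evaluate to the same number when finally used in an expression, but the former grows with every expansion.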
%SETB is similar to %SETA, it accepts extended boolean expressions and assigns them in the form of binary digits 1 or 0.
See also t2831.
Unlike with %SETA, the binary digits are not separated with commas when more than one operand is used in %SETB statement. Items of assigned variable can be retrieved with substring operation. Example:
%TooBig %SETB 5 > 4 ; %TooBig is assigned with one character 1 (true).
%Flags %SETB %TooBig, 2,,3>2,off,4,, ; %Flags are assigned with 110101.
%IF %Flags[1] ; True, equals to 1st member of %Flags, i. e. %TooBig, i. e. 1.
Flags: DB %Flags[]b ; Memory variable contains 00110101b.
%SETC accepts expression in its operand, which must evaluate to a plain number not above 255 and not lower than -128. The result will be assigned as one character with evaluated ASCII byte value. Example:
%Quote %SETC """" ; One character "quote" (ASCII 34) is assigned.
%Tab %SETC 9 ; One character "tabulator" (ASCII 9) is assigned.
%NBSP %SETC -1 ; One character (ASCII 255) is assigned.
As with %SETB, multiple operands may be defined in %SETC and the resulting characters are not separated with commas.
%Hexadigits %SETC 'A','B','C','D','E','F' ; %Hexadigits now contains six characters ABCDEF
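The mapping from operand values to characters can be sketched in Python. This is a model only: the helper name setc is hypothetical, and wrapping negative values as two's complement bytes is an assumption inferred from the example where -1 yields ASCII 255.

```python
def setc(*values: int) -> bytes:
    """Model of %SETC: each operand becomes one byte, with no separators."""
    result = bytearray()
    for value in values:
        if not -128 <= value <= 255:
            raise ValueError(f"%SETC operand out of range: {value}")
        # Assumption: negative values wrap to their two's complement byte.
        result.append(value & 0xFF)
    return bytes(result)


print(setc(9))                                   # one tabulator character
print(setc(-1))                                  # one byte with value 255
print(setc(*(ord(c) for c in "ABCDEF")))         # six characters, no commas
```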
See also t2841.
%SETC makes it possible to assign special characters to a preprocessing %variable which could not be assigned as plain text with %SET due to €ASM parser syntax rules. %Space %SETC 32 assigns one space. This could also be achieved with %QuotedSpace %SET " " and by taking only the second of the three assigned characters with a substring operation: %Space %SET %QuotedSpace[2].
Pseudoinstruction %SETE reads an environment variable from the operating system at assembly time and assigns its value to the preprocessing %variable. The name of the environment variable specified in the operand field(s) is written without quotes, percent signs or dollar sign, e.g.
%OS %SETE OS
Msg: DB "This program was assembled at %OS system."
€ASM reports warning W2520 when the requested variable is empty or not defined.
%SETE can retrieve more than one environment variable; their values will be assigned unquoted and comma-separated. Example:
%CpuInfo %SETE PROCESSOR_ARCHITECTURE, PROCESSOR_IDENTIFIER, \
PROCESSOR_LEVEL, PROCESSOR_REVISION
On my old computer this assigns the following text to %CpuInfo: x86,x86 Family 15 Model 1 Stepping 2, GenuineIntel,15,0102. Because Windows inserts a comma into the value of %PROCESSOR_IDENTIFIER%, it would not be easy to retrieve individual components from such a concatenation with a sublist such as %CpuInfo{4}. So it is usually better to use %SETE for only one environment variable.
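The problem with the embedded comma can be reproduced with a plain comma split. The Python sketch below only models the sublist idea; it is not how €ASM parses sublists internally, and the variable names are illustrative.

```python
# Text assigned to %CpuInfo in the example above.
cpu_info = "x86,x86 Family 15 Model 1 Stepping 2, GenuineIntel,15,0102"

items = cpu_info.split(",")

# Four environment variables were requested, but the split yields five items,
# because the processor identifier itself contains a comma.
print(len(items))

# The 4th item (1-based, like {4}) is the processor level,
# not the revision which was requested as the 4th variable.
print(items[3])
```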
%SETS looks at the %variable in its operand field and assigns its size, i.e. the number of bytes which its value occupies.
%SomeVar %SET ABC, DEF
%SomeSize %SETS %SomeVar ; %SomeSize is now 8 (3 letters + comma + space + 3 letters).
%SizeOfSomeSize %SETS %SomeSize ; %SizeOfSomeSize is now 1 (one digit).
%SETS must have just one operand, which looks like a preprocessing %variable (percent sign followed with an identifier).
See also t2861.
%SETL is similar to %SETS except that it assigns the length of the %variable contents, i.e. the number of comma-separated items in the %variable contents.
%SomeVar %SET ABC, DEF
%SomeLength %SETL %SomeVar ; %SomeLength is now 2 (2 comma separated items).
%LengthOfSomeLength %SETL %SomeLength ; %LengthOfSomeLength is now 1 (one item).
%SETL must have just one operand, which looks like a preprocessing %variable (percent sign followed with an identifier).
See also t2866.
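The distinction between size (%SETS) and length (%SETL) can be summarized with a small Python model. The function names are hypothetical, ASCII contents are assumed so that one character is one byte, and special cases such as empty contents or commas inside quotes are deliberately not modelled.

```python
def size_of(contents: str) -> int:
    """Model of %SETS: number of bytes occupied by the contents (ASCII assumed)."""
    return len(contents)


def length_of(contents: str) -> int:
    """Model of %SETL: number of comma-separated items in the contents."""
    return len(contents.split(","))


some_var = "ABC, DEF"
print(size_of(some_var), length_of(some_var))
```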
Consider assembly of the statement %Var1 %SET %Var2. €ASM first expands %Var2 and the result of the expansion is then assigned to %Var1. The first two tokens of the statement are not expanded, because %Var1 is the target which is just being assigned, and %SET is a reserved name which is never expanded.
%SET2 is similar to %SET except that the operand field is expanded 2 times before being assigned. Each expansion "swallows" one percent sign.
%V1 %SET "A"
%V2 %SET "B"
%V3 %SET "C"
i %FOR 1..3
%DataExp %SET2 %%V%i
DB %DataExp
%ENDFOR i ; Emit DB "A", DB "B", DB "C".
See also t2871.
Only special macros make use of %SET2, for instance EndProcedure where it is used to expand %variable with not-known-yet dynamically changing name.
When a pseudoinstruction of the SET* family is being assembled, €ASM does not expand the label field and operation field of statements such as %Label %SET* anything. This applies to %SET, %SETA, %SETB, %SETC, %SETU, %SETE, %SETS, %SETL, %SET2 but not to %SETX. In this statement the label field is expanded, too. After the expansion of the label field %SETX works like an ordinary %SET, which means that it requires a valid %variable name in the label field. For instance %%Var1 %SETX ABC is equivalent to %Var1 %SET ABC.
Using %SETX we can assign %variables whose names are not explicitly set at the assembly time and they dynamically change. Example:
i %FOR 1..4
%%M%i %SETX %i ; Identical with %M1 %SET 1, %M2 %SET 2 etc.
%ENDFOR ; This will assign values 1,2,3,4 to preprocessing %variables %M1,%M2,%M3,%M4.
See also t2881.
Only special macros make use of %SETX, for instance Procedure where it is used to assign stack-frame addresses to %variables, whose names are not-known-yet at macro-write time.
A block of statements enclosed between the pseudoinstructions %MACRO and %ENDMACRO is called a macro declaration. The identifier in the label field of the %MACRO statement is the name of the macro. The %MACRO statement itself is called the macro prototype, as it declares the macro name and gives names to macro arguments. Once declared, a macro can be expanded many times in the program.
When €ASM reads the macro declaration in source text, it does not emit any code. Instructions from the macro body will be emitted only when the macro is actually expanded with its macroinstruction.
%EXITMACRO allows to break the emitting process if it is encountered, usually when some error condition was detected.
Both %EXITMACRO and %ENDMACRO pseudoinstructions may have the macro name in the operand field in order to emphasize the block matching.
Example of a macro declaration and a macro expansion:
AlignEAX %MACRO ; Round-up the contents of EAX to a multiple of 4.
ADD EAX,3
AND EAX,-4
%ENDMACRO AlignEAX
MOV EAX,13
AlignEAX ; After macro expansion EAX contains 16.
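The arithmetic performed by the two instructions in the macro body can be checked with a short Python model. The function name align_up_4 is hypothetical and the 32-bit mask only imitates the width of EAX.

```python
def align_up_4(eax: int) -> int:
    """Model of ADD EAX,3 / AND EAX,-4 on a 32-bit register."""
    eax = (eax + 3) & 0xFFFFFFFF   # ADD EAX,3
    eax = eax & (-4 & 0xFFFFFFFF)  # AND EAX,-4 clears the two lowest bits
    return eax


print(align_up_4(13))
```

Values which already are a multiple of 4 pass through unchanged, because adding 3 never reaches the next multiple.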
For more information see also the chapter MacroInstructions.
Pseudoinstruction %SHIFT is usable in macro block only. It will decrement the ordinal number of all macro operands by one or by the integer, which it has in the operand field. %SHIFT may have no label and only one operand which evaluates to a plain integer number. Default 1 is assumed when the operand is omitted.
%SHIFT 0 does nothing. Shifting by a negative number reverses the direction.
The effect of the operation is limited to macro operands accessed by their ordinal number, such as %1, %2 etc. Access to operands by their formal names remains unaffected by the %SHIFT operation.
Operands, which are left-shifted from ordinal position %1 to position zero or negative, are not accessible by ordinal number any longer, but they are not lost forever, as they may be shifted back by a negative number.
| |Sample %MACRO Oper1, Oper2, Oper3
| |L1: DB %1, %Oper1
| | %SHIFT 1
| |L2: DB %1, %Oper1
| | %SHIFT 2
| |L3: DB %1, %Oper1
| | %ENDMACRO Sample
|0000: |
|0000: |Sample 0x44, 0x55, 0x66, 0x77
| +Sample %MACRO Oper1, Oper2, Oper3
|0000:4444 +L1: DB %1, %Oper1
| + %SHIFT 1
|0002:5544 +L2: DB %1, %Oper1
| + %SHIFT 2
|0004:7744 +L3: DB %1, %Oper1
| + %ENDMACRO Sample
|0006: |
See also t7221.
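The behaviour shown in the listing can be modelled as an offset applied to ordinal access only. The Python class below is a sketch under assumptions: the class and method names are hypothetical, and returning an empty string for a position outside the operand list is an assumption (the text only says such operands are not accessible by ordinal number).

```python
class MacroOperands:
    """Model of ordinal and formal access to macro operands under %SHIFT."""

    def __init__(self, formal_names, values):
        self.values = list(values)
        self.formal = dict(zip(formal_names, values))
        self.offset = 0  # accumulated %SHIFT count

    def shift(self, count: int = 1) -> None:
        """Model of %SHIFT count; a negative count shifts back."""
        self.offset += count

    def ordinal(self, number: int) -> str:
        """Model of %1, %2 ...; affected by previous shifts."""
        index = number - 1 + self.offset
        if 0 <= index < len(self.values):
            return self.values[index]
        return ""  # assumption: inaccessible positions expand to nothing

    def named(self, name: str) -> str:
        """Model of %Oper1 ...; not affected by shifts."""
        return self.formal[name]


ops = MacroOperands(["Oper1", "Oper2", "Oper3"], ["0x44", "0x55", "0x66", "0x77"])
print(ops.ordinal(1), ops.named("Oper1"))   # corresponds to line L1
ops.shift(1)
print(ops.ordinal(1), ops.named("Oper1"))   # corresponds to line L2
ops.shift(2)
print(ops.ordinal(1), ops.named("Oper1"))   # corresponds to line L3
```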
Pseudoinstruction %ERROR inserts a user-defined error message into the listing file and into the message output. The message is similar to those emitted by €ASM itself when it finds a mistake in the source text. %ERROR is often used in macroinstructions, where it usually warns the programmer that the macro was not used in the intended way.
User-defined errors have severity code U and severity level 5, which is somewhere between warnings and assembler errors. The programmer may specify the actual message identifier with the optional keyword operand ID=, which can be a plain decimal number between 5000 and 5999. %ERROR will also accept an identifier with value 0..999, in which case it internally adds 5000. The default value is 0, so the user-defined message has identifier U5000 if no keyword operand ID= is used.
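The rule for the message identifier can be written down as a small Python function. This is a sketch only: the function name is hypothetical, and raising an exception for values outside both ranges is an assumption, since the text does not say how €ASM reacts to them.

```python
def message_identifier(id_value: int = 0) -> str:
    """Model of the ID= rule: 0..999 is shifted into the range 5000..5999."""
    if 0 <= id_value <= 999:
        id_value += 5000
    if not 5000 <= id_value <= 5999:
        # Assumption: the behaviour for other values is not documented here.
        raise ValueError(f"unsupported %ERROR identifier: {id_value}")
    return f"U{id_value}"


print(message_identifier())      # no ID= operand
print(message_identifier(123))   # short form
print(message_identifier(5123))  # full form of the same identifier
```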
The message text does not have to be in quotes. If the message text consists of more than one ordinal operand, they will be concatenated verbatim, including quotes, if used. Example:
%ERROR Id=5123, Something went wrong. Try again.
See also t2581 for more examples.
Pseudoinstruction %DISPLAY is used for retrieving information
about internal objects created by €ASM during the assembly process.
Each such object is displayed in the form of debug message with severity level
1. The message is printed both to output console (in each pass)
and to the listing file (in the final pass).
%DISPLAY is active even in non-emitting source passages, such as false %IF branch or block disabled with %COMMENT.
It is intended to investigate €ASM internals when something is not working as expected.
Pseudoinstruction %DISPLAY accepts an arbitrary number of operands: object categories, which specify the kind of objects that we want to review. Categories may be provided as ordinal operands or as keyword operands with a value which specifies the filter. The filter can restrict the number of displayed lines. Category names are case insensitive but the filtering value, if used, is case sensitive. The filter value defines the first few characters of those object names which we want to display. The filter value may be terminated with an asterisk *, but this is not mandatory. For instance the statement %DISPLAY Macros=Alig will display all macros whose names begin with "Alig".
Operands of pseudoinstruction %DISPLAY have rather relaxed syntax. Object categories (ordinal operand name or keyword name) may be shortened, too. Only as many characters are required as are needed to identify the desired category.
For instance %DISPLAY se will display the map of all segments and their sections.
%DISPLAY File displays the list of input files (main source and included libraries).
%DISPLAY sym=Num*, sym=En will list only those symbols whose name begins with Num or En.
%DISPLAY UserVar, %DISPLAY UserVar=* and %DISPLAY user= work equally (an empty filter value will match any %variable name).
Nonfilterable categories, such as segments, context stack, automatic macro %variables, will always display their complete list; any filtering value is ignored.
When specifying user-defined and system %variable names as the filtering value, the leading percent sign % or %^ may be omitted, or the percent sign must be doubled (otherwise it would have been expanded to its current contents). %DISPLAY UserVar=Loc, %DISPLAY us=Loc* and %DISPLAY user=%%Loc are equal in their function: they display the current contents of user-defined preprocessing %variables whose name begins with %Loc.
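The filtering rules described above can be modelled as a case-sensitive prefix test. The Python sketch below is an illustration only; the function names are hypothetical and it assumes that a doubled percent sign has already been expanded to a single one before the filter is compared.

```python
def strip_sigil(name: str) -> str:
    """Remove a leading %^ or % so that names compare without their prefix."""
    for sigil in ("%^", "%"):
        if name.startswith(sigil):
            return name[len(sigil):]
    return name


def filter_matches(filter_value: str, object_name: str) -> bool:
    """Case-sensitive prefix match with an optional trailing asterisk."""
    if filter_value.endswith("*"):
        filter_value = filter_value[:-1]
    return strip_sigil(object_name).startswith(strip_sigil(filter_value))


names = ["%LocalSize", "%LocalPtr", "%Counter"]
print([n for n in names if filter_matches("Loc*", n)])
```

An empty filter value matches every name, which corresponds to the equivalence of %DISPLAY UserVar, %DISPLAY UserVar=* and %DISPLAY user= mentioned above.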
%DISPLAY operand | Messages | Filter | Order | Displayed objects |
---|---|---|---|---|
All | D1100..D1900 | yes | alphabetical | All objects specified below (shortcut for Fil,Ch,Se,St,Co,Sym,L,Rel,M,V). |
Files | D1150..D1190 | ignored | natural | Source files included in the program. |
Chunks | D1200..D1240 | ignored | natural | Chunks of source code. |
Sections | D1250..D1290 | ignored | natural | Map of groups, segments and sections. |
Segments | D1250..D1290 | ignored | natural | Map of groups, segments and sections. |
Groups | D1250..D1290 | ignored | natural | Map of groups, segments and sections. |
Structures | D1300..D1340 | yes | alphabetical | Structures declared in the program. |
Context | D1350..D1390 | ignored | stacked | Context stack of block statements |
Symbols | D1400..D1450 | yes | alphabetical | All explicitly defined symbols (shortcut for Fix,Unf,Unr,Ref). |
UnfixedSymbols | D1410..D1450 | yes | alphabetical | Symbols whose properties are not stable yet. |
FixedSymbols | D1420..D1450 | yes | alphabetical | Symbols whose properties are already fixed. |
UnreferencedSymbols | D1430..D1450 | yes | alphabetical | Symbols which were not used yet. |
ReferencedSymbols | D1440..D1450 | yes | alphabetical | Symbols which were mentioned at least once, or used in a structure. |
LiteralSymbols | D1500..D1540 | ignored | alphabetical | All literal symbols declared in the program. |
Relocations | D1550..D1590 | ignored | natural | Relocation records. |
Macros | D1600..D1690 | yes | alphabetical | Macroinstructions declared at this moment. |
Variables | D1700..D1790 | yes | alphabetical | All preprocessing %variables currently set (shortcut for Au,Fo,Us,Sys). |
AutomaticVariables | D1710..D1730 | ignored | fixed | Automatic macro %variables. |
FormalVariables | D1740..D1750 | yes | alphabetical | Formal macro/for %variables. |
UserVariables | D1760..D1770 | yes | alphabetical | User-defined preprocessing %variables. |
SystemVariables | D1780..D1790 | yes | alphabetical | System preprocessing %^variables. |
A displayed message usually contains the object name, its attributes and other properties.
%DISPLAY operands Groups, Segments, Sections are identical, each of them always displays the complete tree.
A line with a group lists all segment names of the group.
A line with a segment is indented by 2 spaces and displays purpose, width, align, combine, class, src.
A line with a section is indented by 4 spaces and displays address, size, align, ref.
Property src= specifies whether the file or chunk is
euroasm.ini
Chunk property type= shows what kind of information is in this chunk of source text:
A boolean property ref= tells whether the symbol, structure or section was used
(referenced at least once in the program). Members of the structure are automatically marked as used
when the structure is defined.
Similar property fix= specifies if the offset
of symbol or section is already fixed, i.e. it is stable between assembly passes.
Context property emit= informs whether the block is in normal (emitting) status,
or if it is just bypassed without emitting any code or data.
Context property %.= shows current value of expansion counter in this block.
Property src= identifies position in source text where the displayed object was defined, in standard form "FileName"{LineNumber}.
Automatic and formal %variables are defined only in %macro or %for expansion,
i. e. when the statement %DISPLAY Auto,Formal
is inserted in %MACRO..%ENDMACRO or %FOR..%ENDFOR body
and the macro is then expanded.
See tests t2901..t2917 for examples of %DISPLAY output.
Unlike other instructions, the statement %DISPLAY is alive and kicking even in non-emitting status. Be careful when putting an unfiltered %DISPLAY in repeating preprocessing loops (%FOR, %WHILE, %REPEAT), as this may substantially flood the output.
The main purpose of %DISPLAY is to find errors at assembly time, when €ASM doesn't work as expected, together with EUROASM options DISPLAYSTM=, DISPLAYENC= and with PROGRAM options LISTGLOBALS=, LISTLITERALS=, LISTMAP=.
For investigation of your program at run-time use a debugger or the macro Debug.
Those pseudoinstruction names are reserved for future extension of EuroAssembler, they are not implemented yet. See also EUROASM boolean options DEBUG= and PROFILE=.
A macro is defined by a block of statements (macro body) enclosed between the pseudoinstructions %MACRO and %ENDMACRO. The %MACRO statement itself (macro prototype) must have a label, which can be used later for macro invocation (alias macro expansion).
A statement which has the name of a previously declared %MACRO in its operation field is called a macroinstruction or simply a macro. It will be replaced with statements from the block %MACRO..%ENDMACRO.
Macro can be a fixed static set of instructions, such as
CarriageReturn %MACRO
MOV AH,2 ; 3 statements between %MACRO and %ENDMACRO are macro body.
MOV DL,13
INT 21h
%ENDMACRO CarriageReturn
More useful are macros which can modify the expanded instructions depending on the operands they are invoked with.
When a macro is invoked, it is usually provided with operand values, which are available in the macro body as formal %variables or as automatic ordinal %variables %1, %2, %3,...
Operands in the macro definition may be given a temporary formal symbolic name; they are accessible in the macro block by this name prefixed with the percent sign %. Or they may be referred to by their ordinal number prefixed with %. Keyword operands are only accessible with the formal key name prefixed with %.
Example:
Copy %MACRO Source, Destination, Size=ECX ; Statement %MACRO is called macro prototype.
MOV ESI, %Source ; or MOV ESI, %1
MOV EDI, %Destination ; or MOV EDI, %2
MOV ECX, %Size
REP MOVSB
%ENDMACRO Copy
The previous macro needlessly moves the number of copied bytes (Size) to register ECX even when it is already there at the time of its invocation. The expanded instruction MOV ECX,ECX could be spared in this case:
Copy %MACRO Source, Destination, Size=ECX
MOV ESI, %Source ; Instead of formal %Source we could use MOV ESI, %1
MOV EDI, %Destination ; Or MOV EDI, %2
%IF "%Size" !== "ECX"
MOV ECX, %Size
%ENDIF
REP MOVSB
%ENDMACRO Copy
Now when the macro is invoked as Copy From, To, Size=ecx or as Copy From, To, no superfluous MOV ECX,ECX is expanded.
If the name of the formal macro %variable happens to collide with some previously user-defined preprocessing %variable, visibility of the user-defined %variable is temporarily overridden by the formal %variable, see the test t7347.
Automatic variables, such as %*, %#, %:, %1, %2,... are not visible outside the macro body.
The number of operands specified at macro invocation doesn't need to correspond with the number of operands specified at macro definition. If the macro is invoked with fewer ordinal operands than its prototype declares, €ASM does not treat this as an error and silently expands the omitted operands to nothing. When the macro is invoked with more operands than its prototype specifies, those superfluous operands are not accessible in macro expansion by formal names, but they may still be referred to by their automatic ordinal number.
See also pseudoinstruction %SHIFT.
When a keyword operand is omitted in macro invocation, it retains the value which was specified at macro definition. Adding optional keyword operand(s) makes it possible to extend the functionality of a macroinstruction without breaking backward compatibility. Consider this simple macro:
Write %MACRO TextPtr,TextSize ; Write the text to the standard output.
MOV DX,%TextPtr
MOV CX,%TextSize
MOV BX,1 ; File handle of the standard output.
MOV AH,40h ; Write string DS:DX to a device or file.
INT 21h ; Invoke the DOS service.
%ENDMACRO Write
Later we may want to use the same macro for writing to other devices, too. Let's extend it with the keyword operand Handle= with a predefined default value of the standard output:
Write %MACRO TextPtr,TextSize,Handle=1 ; Write the text to the standard output or other device.
MOV DX,%TextPtr
MOV CX,%TextSize
MOV BX,%Handle ; Handle of output device or file.
MOV AH,40h ; Write string DS:DX to a device or file.
INT 21h ; Invoke the DOS service.
%ENDMACRO Write
Now it's possible to write to other devices, too, for instance to the standard line printer: Write Message,80,Handle=4.
The enhanced macro Write is backward compatible.
Even if our old programs include updated macrolibrary with enhanced macro Write,
they don't have to be recompiled.
Similarly to preprocessing %variables, macros may be redefined many times. However, this is not usual and €ASM will emit a warning W2512 in this case. A macro, once defined, can be undefined with pseudoinstruction %DROPMACRO.
One situation where dropping the macro definition may be useful is the emulation of a machine instruction by a macro with the same name. Machine instruction BSWAP, which reverses the byte order in a 32-bit register, was not available on Intel 80386. This could be solved by emulation using three ROR or ROL instructions. If we detect that our program runs on Pentium, we can drop the macro definition and €ASM will assemble BSWAP as a native machine instruction.
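The emulation with three rotations can be verified with a small Python model. The particular sequence used here (ROR AX,8 then ROR EAX,16 then ROR AX,8) is an assumption, one common way to do it; the text does not say which rotations the emulating macro actually uses. The function names are hypothetical.

```python
def ror(value: int, count: int, width: int) -> int:
    """Rotate a `width`-bit value right by `count` bits."""
    mask = (1 << width) - 1
    value &= mask
    count %= width
    return ((value >> count) | (value << (width - count))) & mask


def bswap_emulated(eax: int) -> int:
    """Model of byte swapping EAX with three rotate instructions."""
    eax &= 0xFFFFFFFF
    eax = (eax & 0xFFFF0000) | ror(eax & 0xFFFF, 8, 16)  # ROR AX,8
    eax = ror(eax, 16, 32)                               # ROR EAX,16
    eax = (eax & 0xFFFF0000) | ror(eax & 0xFFFF, 8, 16)  # ROR AX,8
    return eax


print(hex(bswap_emulated(0x12345678)))
```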
The advanced EuroAssembler macrolanguage makes it possible to change our programming style. We can create macroinstructions which mimic the functions of high-level languages and customize the new "language" for the particular task. See the macros Ii* in €ASM source file ii.htm as an example of a pseudolanguage developed for intelligible description of conversion from assembly instruction to machine code.
When something doesn't work as expected, it's always possible to look at the expanded macroinstruction body in the listing and fall back to plain assembly code.
The target of EuroAssembler's endeavour is an output file in one of the formats selected by the PROGRAM FORMAT= option. There are three main categories of €ASM output files:
.o or .obj.
Default filename extension of object or import library is .lib, in case of dynamic library it is .so or .dll.
.x, .exe or .com. It can also create dynamically loaded libraries DLL, very similar to PE format, but they can be executed only indirectly, through invocation of their exported function from another program, or through a special Windows loader, such as RUNDLL32.exe.
Option PROGRAM FORMAT=BIN is chosen as the default when FORMAT= is not explicitly specified.
Default options for BIN format are
Name: PROGRAM FORMAT=BIN, OUTFILE=%^PROGRAM.bin, MODEL=TINY, WIDTH=16, \
ENTRY=0, IMAGEBASE=0, SECTIONALIGN=0, FILEALIGN=0.
€ASM creates the default segment [BIN] with universal purpose:
[BIN] SEGMENT WIDTH=16,ALIGN=16, \
PURPOSE=CODE+DATA+BSS+STACK+LITERALS
These PROGRAM options are irrelevant: DLLCHARACTERISTICS, ENTRY, ICONFILE, MAJOROSVERSION, MAJORSUBSYSTEMVERSION, MAJORIMAGEVERSION, MAJORLINKERVERSION, MINOROSVERSION, MINORSUBSYSTEMVERSION, MINORIMAGEVERSION, MINORLINKERVERSION, WIN32VERSIONVALUE, SIZEOFHEAPCOMMIT, SIZEOFHEAPRESERVE, SIZEOFSTACKCOMMIT, SIZEOFSTACKRESERVE, STUBFILE, SUBSYSTEM.
Structure of BIN file is straightforward: the binary image is a concatenation of emitted contents of its segments. Noninitialized (BSS) segments are omitted.
Segment alignment in the image is by default determined by the highest of the values PROGRAM FILEALIGN=0, PROGRAM SECTIONALIGN=0 and SEGMENT ALIGN=16. Gaps between segments are filled with alignment stuff, which is 0x90 (NOP) if both neighbouring segments have SEGMENT PURPOSE=CODE, otherwise it is 0x00.
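The layout rules of the BIN image can be sketched in Python. This is a simplified model, not €ASM's linker: the Segment class and the function build_bin_image are hypothetical, the purpose is modelled as a set of words, and a segment is treated as noninitialized when its only purpose is BSS.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    purpose: frozenset   # e.g. frozenset({"CODE"})
    data: bytes          # emitted contents
    align: int = 16      # SEGMENT ALIGN=


def build_bin_image(segments, file_align=0, section_align=0) -> bytes:
    """Concatenate emitted segments, padding each one to its alignment."""
    image = bytearray()
    previous = None
    for segment in segments:
        if segment.purpose == frozenset({"BSS"}):
            continue  # noninitialized segments are omitted from the image
        alignment = max(file_align, section_align, segment.align, 1)
        gap = -len(image) % alignment
        both_code = (previous is not None
                     and "CODE" in previous.purpose
                     and "CODE" in segment.purpose)
        image += bytes([0x90 if both_code else 0x00]) * gap
        image += segment.data
        previous = segment
    return bytes(image)


demo = build_bin_image([
    Segment("[Code1]", frozenset({"CODE"}), b"\xC3"),
    Segment("[Code2]", frozenset({"CODE"}), b"\xC3"),
    Segment("[Data]", frozenset({"DATA"}), b"\x01\x02"),
    Segment("[Bss]", frozenset({"BSS"}), b""),
])
print(len(demo), demo.hex())
```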
Typical applications of binary format are pure data files, conversion tables, DOS drivers, boot sectors etc., see the sample BIN projects.
Option PROGRAM FORMAT=BOOT creates a binary format file adapted for booting.
The difference from the BIN format:
.sec.
Default options for BOOT format are
Name: PROGRAM FORMAT=BOOT, OUTFILE=%^PROGRAM.sec, MODEL=TINY, WIDTH=16, \
ENTRY=, IMAGEBASE=0, SECTIONALIGN=0, FILEALIGN=0.
These PROGRAM options are irrelevant: DLLCHARACTERISTICS, ENTRY, ICONFILE, MAJOROSVERSION, MAJORSUBSYSTEMVERSION, MAJORIMAGEVERSION, MAJORLINKERVERSION, MINOROSVERSION, MINORSUBSYSTEMVERSION, MINORIMAGEVERSION, MINORLINKERVERSION, WIN32VERSIONVALUE, SIZEOFHEAPCOMMIT, SIZEOFHEAPRESERVE, SIZEOFSTACKCOMMIT, SIZEOFSTACKRESERVE, STUBFILE, SUBSYSTEM.
See the sample projects boottest.htm and boot16.htm.
Files in COM format are a legacy of the CP/M operating system. They are directly executable in DOS and in 32-bit Windows; in other systems they can run only with a DOS emulator.
Default options for PROGRAM FORMAT=COM are
Name: PROGRAM FORMAT=COM,OUTFILE=%^PROGRAM.com,MODEL=TINY,WIDTH=16,IMAGEBASE=0, \
ENTRY=256,SECTIONALIGN=0,FILEALIGN=0.
Options ENTRY=0x100 and IMAGEBASE=0 are fixed for this format and cannot be changed (they can be omitted from the PROGRAM statement).
€ASM creates default implicit segment [COM] with universal purpose:
[COM] SEGMENT WIDTH=16,ALIGN=16,PURPOSE=CODE+DATA+BSS+STACK+LITERALS
These PROGRAM options are irrelevant: DLLCHARACTERISTICS, ICONFILE, MAJOROSVERSION, MAJORSUBSYSTEMVERSION, MAJORIMAGEVERSION, MAJORLINKERVERSION, MINOROSVERSION, MINORSUBSYSTEMVERSION, MINORIMAGEVERSION, MINORLINKERVERSION, WIN32VERSIONVALUE, SIZEOFHEAPCOMMIT, SIZEOFHEAPRESERVE, SIZEOFSTACKCOMMIT, SIZEOFSTACKRESERVE, STUBFILE, SUBSYSTEM.
The structure of a COM file is similar to the BIN format; there is no metainformation stored in the file except for its extension .com, which tells the OS to treat it as an executable. The OS loader will allocate 64 KB of memory, load segment registers CS,DS,ES,SS with the paragraph address of that block, initialize the 256 bytes long [PSP] structure located at offset 0, load the entire file contents at offset 256 (0x0100), set the stack pointer to the top of the allocated block (usually SP=0xFFFE) and finally set IP=0x0100.
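The loader steps listed above can be modelled in a few lines of Python. This is an illustrative sketch only: the function load_com and the default paragraph address 0x1000 are hypothetical, the PSP area is left zeroed instead of being initialized, and the size check is an assumption derived from the 64 KB block.

```python
PSP_SIZE = 0x0100      # the PSP occupies offsets 0..0xFF
BLOCK_SIZE = 0x10000   # 64 KB allocated for the program


def load_com(file_contents: bytes, paragraph: int = 0x1000):
    """Model of loading a COM file; returns the memory block and registers."""
    if len(file_contents) > BLOCK_SIZE - PSP_SIZE:
        raise ValueError("file does not fit into the 64 KB block")
    memory = bytearray(BLOCK_SIZE)          # PSP contents are not modelled
    memory[PSP_SIZE:PSP_SIZE + len(file_contents)] = file_contents
    registers = {
        "CS": paragraph, "DS": paragraph, "ES": paragraph, "SS": paragraph,
        "SP": 0xFFFE,                       # usual top of the allocated block
        "IP": 0x0100,                       # execution starts after the PSP
    }
    return memory, registers


memory, registers = load_com(b"\xC3")       # a file containing a single RET
print(hex(memory[registers["IP"]]), hex(registers["SP"]))
```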
The size of code+data+stack altogether should not exceed 64 KB in the TINY memory model. A program in COM format can use 32-bit registers if the CPU is 386 or higher. Additional memory blocks may also be requested from the OS at run-time. Typical applications of this obsolete format are short, fast utilities and Terminate-and-Stay-Resident (TSR) programs which provide services in DOS, see the sample project for DOS.
The following COM example is only 1 byte long, yet it is a formally valid computer program, though it does nothing:
EUROASM
Shortest PROGRAM FORMAT=COM
RET
ENDPROGRAM Shortest
Program in COM format can link other object files or libraries, see the test table linker combinations.
Specifying program format MZ creates a 16-bit or 32-bit realmode executable file, which can be directly run in DOS and in 32-bit Windows. Its structure is described in [MZ] and [MZEXE]. A DOS executable file begins with the MZ signature 'M','Z'.
Default options for PROGRAM FORMAT=MZ are:
PROGRAM FORMAT=MZ, ENTRY=, OUTFILE=%^PROGRAM.exe, MODEL=SMALL, WIDTH=16, IMAGEBASE=0, \
SECTIONALIGN=0, FILEALIGN=0, SIZEOFSTACKCOMMIT=8K, SIZEOFHEAPCOMMIT=1M
€ASM creates default implicit segments [CODE], [RODATA], [DATA], [BSS], [STACK] in program formats MZ, OMF, LIBOMF.
Parameter PROGRAM SizeOfStackCommit= specifies the default size of the segment [STACK], so we don't have to explicitly define a stack segment when EUROASM option AUTOSEGMENT= is enabled at the ENDPROGRAM statement.
Parameter PROGRAM SizeOfHeapCommit= can be used to limit the requested amount of heap memory preallocated by the loader (member .e_maxalloc of the DOS file header).
If the memory model is HUGE or FLAT and the program width is not explicitly specified, it defaults to PROGRAM WIDTH=32, otherwise it is 16.
ImageBase=0 is fixed for this format and cannot be changed.
Explicit specification of PROGRAM Entry= is mandatory in MZ format.
These PROGRAM options are irrelevant: DLLCHARACTERISTICS, ICONFILE, MAJOROSVERSION, MAJORSUBSYSTEMVERSION, MAJORIMAGEVERSION, MAJORLINKERVERSION, MINOROSVERSION, MINORSUBSYSTEMVERSION, MINORIMAGEVERSION, MINORLINKERVERSION, WIN32VERSIONVALUE, SIZEOFHEAPRESERVE, SIZEOFSTACKRESERVE, STUBFILE, SUBSYSTEM.
As an example of an MZ executable program for DOS see the test t8300.
Object Module Format as specified in [OMF] is designed to be linked to 16-bit and 32-bit real-mode programs. Imports in this format are linkable to the protected-mode executables.
Default segments are the same as in MZ format.
File format OMF is recognized for LINK when it is composed of valid OMF records and the first record is THEADR or LHEADR.
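The recognition rule can be sketched in Python. The sketch relies on the general OMF record layout (one type byte, a 16-bit little-endian length, then the contents ending with a checksum byte) and on the record type values 0x80 for THEADR and 0x82 for LHEADR. The function names are hypothetical, checksums are not verified, and this is not the actual test performed by the €ASM linker.

```python
import struct

THEADR = 0x80
LHEADR = 0x82


def iter_omf_records(data: bytes):
    """Yield (record_type, contents) pairs; raise ValueError on a bad layout."""
    position = 0
    while position < len(data):
        if position + 3 > len(data):
            raise ValueError("truncated record header")
        record_type, record_length = struct.unpack_from("<BH", data, position)
        end = position + 3 + record_length
        if record_length == 0 or end > len(data):
            raise ValueError("record length does not fit the file")
        yield record_type, data[position + 3:end]
        position = end


def looks_like_omf(data: bytes) -> bool:
    """True when the records tile the file and the first one is THEADR or LHEADR."""
    try:
        records = list(iter_omf_records(data))
    except ValueError:
        return False
    return bool(records) and records[0][0] in (THEADR, LHEADR)


# A minimal THEADR record with the module name "x" (checksum byte left as 0).
theadr = bytes([0x80, 0x03, 0x00, 0x01, ord("x"), 0x00])
print(looks_like_omf(theadr))
```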
Default options for this format are:
Name: PROGRAM FORMAT=OMF,OUTFILE=%^PROGRAM.obj,MODEL=SMALL,WIDTH=16
These PROGRAM options are irrelevant: DLLCHARACTERISTICS, FILEALIGN, ICONFILE, IMAGEBASE, MAJOROSVERSION, MAJORSUBSYSTEMVERSION, MAJORIMAGEVERSION, MAJORLINKERVERSION, MINOROSVERSION, MINORSUBSYSTEMVERSION, MINORIMAGEVERSION, MINORLINKERVERSION, WIN32VERSIONVALUE, SECTIONALIGN, SIZEOFHEAPCOMMIT, SIZEOFHEAPRESERVE, SIZEOFSTACKCOMMIT, SIZEOFSTACKRESERVE, STUBFILE, SUBSYSTEM.
As an example of an OMF module see the test t8400.
OMF library format is described in Appendix 2 of the same document as [OMF]. The hashed dictionary, required by the format specification at the end of the library, is created on output, but the €ASM linker ignores it. When the library is linked to another program, its public symbols are searched sequentially. The page size of LIBOMF libraries created by €ASM is fixed at 16.
Default segments are the same as in MZ format.
File format LIBOMF is recognized by LINK when it starts with a LIBHDR record with page size 16, 32, 64,..32K, and this record is followed by valid OMF modules, each of which starts with a THEADR or LHEADR record and ends with a MODEND or MODEND32 record. The library dictionary at the end of the file is not checked.
Default options for PROGRAM FORMAT=LIBOMF are:
Name: PROGRAM FORMAT=LIBOMF,OUTFILE=%^PROGRAM.lib
Other properties are inherited from its library modules.
These PROGRAM options are irrelevant: DLLCHARACTERISTICS, ENTRY, FILEALIGN, ICONFILE, IMAGEBASE, MAJOROSVERSION, MAJORSUBSYSTEMVERSION, MAJORIMAGEVERSION, MAJORLINKERVERSION, MODEL, MINOROSVERSION, MINORSUBSYSTEMVERSION, MINORIMAGEVERSION, MINORLINKERVERSION, WIN32VERSIONVALUE, SECTIONALIGN, SIZEOFHEAPCOMMIT, SIZEOFHEAPRESERVE, SIZEOFSTACKCOMMIT, SIZEOFSTACKRESERVE, STUBFILE, SUBSYSTEM, WIDTH.
Modules, which will be stored to the library, should be assembled beforehand to the files in OMF format. If the program, which creates library, contains some code, it will be assembled and stored as the first library module. Modules from other linked libraries, which do not declare any global symbol, will not be included in the target library at all. Example of a static OMF library linked from 3 standalone modules:
MyLib: PROGRAM FORMAT=LIBOMF
LINK "Module1.obj", "Module2.obj", "Module3.obj"
ENDPROGRAM MyLib
Although the OMF format was developed for real-mode programs, it can be enhanced with import declarations represented by OMF records COMENT/IMPDEF, and such an import library can be used in Windows programs.
Some librarians (for instance [ALIB]) create a longer variant of the import library, which adds LEDATA+FIXUPP records with relocatable machine code of proxy jumps to the imported functions. €ASM does not create the longer version of import libraries but both short and long versions are accepted by the linker.
Example of a program creating a pure import library in short OMF format:
ImpLib PROGRAM FORMAT=LIBOMF
IMPORT LIB="kernel32.dll",TerminateProcess,TerminateThread
IMPORT LIB="user32.dll",CreateCursor,CreateIcon,CreateMenu
ENDPROGRAM ImpLib
As an example of a LIBOMF library see the test t8600.
EuroAssembler implements the object format COFF in Microsoft modification described in [MS_PECOFF]. This description is also valid for €ASM formats LIBCOF, PE, DLL (COFF-based formats).
€ASM creates four default segments (sections) in COFF-based formats: [.text], [.rodata], [.data], [.bss]. The machine stack for executables will be established by the loader at run-time.
Default options for PROGRAM FORMAT=COFF are:
PROGRAM FORMAT=COFF,OUTFILE=%^PROGRAM.obj,MODEL=FLAT,WIDTH=32
These PROGRAM options are irrelevant: DLLCHARACTERISTICS, ENTRY, FILEALIGN, ICONFILE, IMAGEBASE, MAJOROSVERSION, MAJORSUBSYSTEMVERSION, MAJORIMAGEVERSION, MAJORLINKERVERSION, MINOROSVERSION, MINORSUBSYSTEMVERSION, MINORIMAGEVERSION, MINORLINKERVERSION, WIN32VERSIONVALUE, SECTIONALIGN, SIZEOFHEAPCOMMIT, SIZEOFHEAPRESERVE, SIZEOFSTACKCOMMIT, SIZEOFSTACKRESERVE, STUBFILE, SUBSYSTEM.
The generated value of PFCOFF_FILE_HEADER.Machine for legacy mode COFF is always 0x014C (Intel 386) regardless of the EUROASM CPU= value. In 64-bit mode it is always 0x8664 (architecture AMD64). Architecture Itanium (0x0200) is currently not supported.
PFCOFF_FILE_HEADER.TimeDateStamp corresponds with the current system time, unless it is forged by the option EUROASM TIMESTAMP=.
A linked COFF module is recognized by the contents of PFCOFF_FILE_HEADER.Machine, which should be one of the words with value 0x0000, 0x014C, 0x014D, 0x014E, 0x0200, 0x8664.
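The recognition of a COFF module by its Machine word can be sketched in Python. The sketch assumes the standard COFF layout, where Machine is the first 16-bit little-endian word of the file header. The function names are hypothetical and the real linker may apply further checks.

```python
import struct

# Machine values accepted for a linked COFF module, as listed above.
ACCEPTED_MACHINES = {0x0000, 0x014C, 0x014D, 0x014E, 0x0200, 0x8664}


def coff_machine(data: bytes):
    """Return the Machine word of a COFF file header, or None if too short."""
    if len(data) < 2:
        return None
    (machine,) = struct.unpack_from("<H", data, 0)
    return machine


def looks_like_coff(data: bytes) -> bool:
    """True when the file starts with one of the accepted Machine values."""
    return coff_machine(data) in ACCEPTED_MACHINES


header_386 = bytes.fromhex("4c01") + bytes(18)   # Machine=0x014C, rest zeroed
print(looks_like_coff(header_386))
```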
As an example of a COFF program see the test t8850 (for Windows) or t9000 (for Linux).
COFF library format is described in [COFFlib].
Default options for PROGRAM FORMAT=LIBCOF are:
PROGRAM FORMAT=LIBCOF,OUTFILE=%^PROGRAM.lib,MODEL=FLAT,WIDTH=32
Default segments are the same as in COFF format.
These PROGRAM options are irrelevant: DLLCHARACTERISTICS, ENTRY, FILEALIGN, ICONFILE, IMAGEBASE, MAJOROSVERSION, MAJORSUBSYSTEMVERSION, MAJORIMAGEVERSION, MAJORLINKERVERSION, MINOROSVERSION, MINORSUBSYSTEMVERSION, MINORIMAGEVERSION, MINORLINKERVERSION, WIN32VERSIONVALUE, SECTIONALIGN, SIZEOFHEAPCOMMIT, SIZEOFHEAPRESERVE, SIZEOFSTACKCOMMIT, SIZEOFSTACKRESERVE, STUBFILE, SUBSYSTEM.
A COFF library is identified by the signature !<arch> followed by the byte 0x0A.
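A minimal sketch of the signature test described above, written in Python for illustration. The name is_coff_library is an assumption and not part of €ASM.

```python
# The seven characters "!<arch>" followed by the byte 0x0A (line feed).
ARCHIVE_SIGNATURE = b"!<arch>\n"

def is_coff_library(data: bytes) -> bool:
    """Return True if the file contents start with the library signature."""
    return data.startswith(ARCHIVE_SIGNATURE)
```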
You can create the library by linking from other object files. Modules, which will be stored to the library, should be assembled beforehand to files in COFF format (or OMF or ELF). If the program, which creates the library, contains some code (beside the LINK statements), it will be assembled and stored as the first library module. Modules which do not declare any global symbol, will not be included in the library at all. Example of COFF library linked from 3 modules:
MyLib: PROGRAM FORMAT=LIBCOF
        LINK "Module1.obj", "Module2.obj", "Module3.obj"
       ENDPROGRAM MyLib
€ASM does not create the longer version of import libraries but both short and long versions are accepted by the linker. Example of a program creating import library in short COFF format:
ImpLib: PROGRAM FORMAT=LIBCOF
         IMPORT LIB="kernel32.dll",TerminateProcess,TerminateThread
         IMPORT LIB="user32.dll",CreateCursor,CreateIcon,CreateMenu
        ENDPROGRAM ImpLib:
As an example of a LIBCOF library see the test t9150.
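ELF, alias Executable and Linkable Format, is the file format used in Linux. There are three kinds of ELF files, selected with PROGRAM FORMAT=ELF (linkable object), FORMAT=ELFX (executable) and FORMAT=ELFSO (dynamic shared object).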
Default options for PROGRAM FORMAT=ELF
are
Name: PROGRAM FORMAT=ELF, OUTFILE=%^PROGRAM.o, MODEL=FLAT, WIDTH=32, \
      FILEALIGN=16
ELF is an object (linkable) file with extension .o
.
It has the default segments [.text], [.rodata], [.data], [.bss]
.
The segments are called sections in [ELF] documentation.
Beside those regular sections €ASM also creates service sections [.symtab], [.strtab], [.shstrtab], [.rela.text], [.rela.data]
.
See t9750 as an example of ELF object.
These PROGRAM options are irrelevant: DLLCHARACTERISTICS, ENTRY, ICONFILE, IMAGEBASE, MAJOROSVERSION, MAJORSUBSYSTEMVERSION, MAJORIMAGEVERSION, MAJORLINKERVERSION, MINOROSVERSION, MINORSUBSYSTEMVERSION, MINORIMAGEVERSION, MINORLINKERVERSION, WIN32VERSIONVALUE, SECTIONALIGN, SIZEOFHEAPCOMMIT, SIZEOFHEAPRESERVE, SIZEOFSTACKCOMMIT, SIZEOFSTACKRESERVE, STUBFILE, SUBSYSTEM.
ELFX is an executable program for Linux. It has the default file extension .x, if not prescribed otherwise by PROGRAM OUTFILE=.
The format ELFX creates segment groups [LOAD.HDR], [LOAD.CODE], [LOAD.RODATA], [LOAD.DATA]
,
see for instance the test t9850.
The groups are called program headers in [ELF] documentation
or in Linux tools such as readelf
.
Name: PROGRAM FORMAT=ELFX, OUTFILE=%^PROGRAM.x, MODEL=FLAT, WIDTH=32, \
      ENTRY=, IMAGEBASE=4M, FILEALIGN=16, SECTIONALIGN=4K
The default extension is .x. Parameter ENTRY= is mandatory; it specifies the entry point of the program.
These PROGRAM options are irrelevant: DLLCHARACTERISTICS, ICONFILE, MAJOROSVERSION, MAJORSUBSYSTEMVERSION, MAJORIMAGEVERSION, MAJORLINKERVERSION, MINOROSVERSION, MINORSUBSYSTEMVERSION, MINORIMAGEVERSION, MINORLINKERVERSION, WIN32VERSIONVALUE, SIZEOFHEAPCOMMIT, SIZEOFHEAPRESERVE, SIZEOFSTACKCOMMIT, SIZEOFSTACKRESERVE, STUBFILE, SUBSYSTEM.
ELFSO is a DSO (Dynamic Shared Object) for Linux.
EuroAssembler creates DSO files with the extension .so but it does not link them dynamically.
€ASM does not encompass the capabilities of the specialized Linux dynamic linker GNU ld.
When a DSO is linked to an ELFX program, it is linked only statically.
Name: PROGRAM FORMAT=ELFSO, OUTFILE=%^PROGRAM.so, MODEL=FLAT, WIDTH=32, \
      IMAGEBASE=4M, FILEALIGN=4K, SECTIONALIGN=4K
These PROGRAM options are irrelevant: DLLCHARACTERISTICS, ENTRY, ICONFILE, MAJOROSVERSION, MAJORSUBSYSTEMVERSION, MAJORIMAGEVERSION, MAJORLINKERVERSION, MINOROSVERSION, MINORSUBSYSTEMVERSION, MINORIMAGEVERSION, MINORLINKERVERSION, WIN32VERSIONVALUE, SECTIONALIGN, SIZEOFHEAPCOMMIT, SIZEOFHEAPRESERVE, SIZEOFSTACKCOMMIT, SIZEOFSTACKRESERVE, STUBFILE, SUBSYSTEM.
Portable executable file format PE for Windows is described in the document [MS_PECOFF].
Default options for PROGRAM FORMAT=PE
are
Name: PROGRAM FORMAT=PE,OUTFILE=%^PROGRAM.exe,MODEL=FLAT,WIDTH=32,IMAGEBASE=4M,FILEALIGN=512,SECTIONALIGN=4K, \
      SUBSYSTEM=CON,ICONFILE="euroasm.ico",MAJORLINKERVERSION=1,MINORLINKERVERSION=0,ENTRY=, \
      MAJOROSVERSION=4,MINOROSVERSION=0,MAJORIMAGEVERSION=1,MINORIMAGEVERSION=0, \
      MAJORSUBSYSTEMVERSION=4,MINORSUBSYSTEMVERSION=0,WIN32VERSIONVALUE=0,DLLCHARACTERISTICS=0x000F, \
      SIZEOFSTACKRESERVE=1M,SIZEOFSTACKCOMMIT=8K,SIZEOFHEAPRESERVE=4M,SIZEOFHEAPCOMMIT=1M
Default segments are the same as in COFF format.
A PE file begins with a DOS program (stub) in MZ format, which is executed when the program is not launched in MS-Windows.
At the file address stored in PFMZ_DOS_HEADER.e_lfanew the PE format signature with bytes 'P','E',0,0 is expected.
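The lookup of the PE signature can be illustrated with a short Python sketch. The function name has_pe_signature is hypothetical. The offset 0x3C of the e_lfanew field within the MZ header is taken from the Microsoft PE documentation, not from this manual.

```python
import struct

E_LFANEW_OFFSET = 0x3C  # position of the DWORD e_lfanew in the MZ header

def has_pe_signature(data: bytes) -> bool:
    """Return True if an MZ file carries 'P','E',0,0 at the address in e_lfanew."""
    if len(data) < E_LFANEW_OFFSET + 4 or data[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"
```

A plain DOS executable fails this test and is therefore not mistaken for a PE image.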
Older file format with NE (New Executable) signature, used in 16-bit Windows and OS/2, is not supported by €ASM.
COFF file header is followed by PFPE_OPTIONAL_HEADER. Almost all its fields
are configurable with PROGRAM options.
PROGRAM ENTRY=
must be explicitly specified in PE format.
Option PROGRAM STUBFILE=
specifies the file name of 16-bit MZ program used when the program runs in DOS.
If it is left empty, €ASM will use its own built-in stub, which reports the error message
"This program was launched in DOS but it requires Windows." and terminates.
The factory default option ICONFILE="euroasm.ico" specifies the file name of the icon, which will be built into the resource section of the linked PE file.
It visually represents the compiled file on the Desktop or in Windows Explorer.
This parameter is ignored if any resource file is explicitly linked into PE (Explorer will then use the first icon found in the PE resources). If the ICONFILE= option is explicitly defined as empty, and if no resources are linked, the resource section [.rsrc] will be omitted from PE file completely.
Optional header is followed with 16 special directory entries which identify sections with special purposes (other than ordinary segment purposes CODE, DATA, BSS). See the last 16 lines in Segment purpose table, starting with EXPORT.
EuroAssembler natively supports only a few of the special PE directories:
Other special directories are not supported by this EuroAssembler version. Nevertheless, their segment may be created explicitly, their contents created manually or by some third-party tool and emitted to the segment with INCLUDEBIN or directly with Data definition statements. If segment parameter PURPOSE= complies with the name in purpose table (case insensitive), the corresponding directory entry in PE optional header will be created, covering the whole segment contents. Example:
[.cormeta] SEGMENT PURPOSE=CLR
 D '<compatibility xmlns="urn:schemas-microsoft-com:compatibility.v1">'
 D ' <application>'
 D ' <!-- A list of all Windows versions that this application is designed to work with.>'
 D ' </application>'
 D ' </compatibility>'
When the EUROASM option DEBUG=ENABLED is in effect at the ENDPROGRAM pseudoinstruction, a symbol table is appended to the PECOFF image.
Debuggers should be able to retrieve symbol names from the debugged executable and associate them with disassembled source lines. Unfortunately, none of the tools which I tried was able to exploit the symbol table from PE.
File format DLL is almost identical with the format PE, with a few minor differences:
The file header field PFCOFF_FILE_HEADER.Characteristics is flagged with pfcoffFILE_DLL = 0x2000.
The default file extension and image base are:
Name: PROGRAM FORMAT=DLL,OUTFILE=%^PROGRAM.dll,IMAGEBASE=256M
The option ENTRY= is optional in DLL.
Default segments are the same as in COFF format.
Dynamically linkable symbols should be explicitly declared with exported scope.
The pseudoinstruction EXPORT supports dynamic DLL forwarding of an exported function to a different function in another DLL, using the EXPORT keyword operands FWD= and LIB=. See the test t9475 as an example.
Format DLL is sometimes used as a resource library which contains only the [.rsrc] section, typically a collection of icons. This is achieved by linking a compiled resource file created by a third-party resource compiler. An example of a resource-only DLL which contains 3 icons can be found in tests t9485 and t9536.
Microsoft resources is the common name for multimedia data, such as bitmap pictures, icons, cursor shapes, fonts etc.
The resources used in a GUI program are described in a resource script as a tree referring to individual graphic files.
A typical script is a plain text file with the extension .rc and it should be converted by a resource compiler into a binary resource file with the extension .res, which is linkable by €ASM or other linkers.
Its format is described in [RSRC].
EuroAssembler cannot compile resource scripts; a statement such as MyCompiledResource PROGRAM FORMAT=RSRC does not work. Use a third-party tool instead, such as [MS_RC], [GoRC] or [ResourceHacker].
When a resource file is linked to the PE or DLL image created by €ASM, program option ICONFILE=
is ignored. The file is converted by €ASM to an internal PECOFF binary-tree structure in the special section [.rsrc]
and referred with an optional-header directory entry RESOURCE.
The width of output files linked by EuroAssembler is determined by the program option WIDTH= and it defaults to 32 in ELF and COFF-based formats. To create a 64-bit program ELF, ELFX, ELFSO, PE, DLL, COFF or LIBCOF, the program width must be explicitly specified. 64-bit CPU should be enabled, too (EUROASM CPU=X64).
Member | PROGRAM WIDTH=16 | PROGRAM WIDTH=32 | PROGRAM WIDTH=64 |
---|---|---|---|
PFCOFF_FILE_HEADER.Machine | 0x014C (Intel 386) | 0x014C (Intel 386) | 0x8664 (AMD64) |
PFCOFF_FILE_HEADER.Characteristics:32BIT_MACHINE | 0 (false) | 0x0100 (true) | 0 (false) |
PFCOFF_FILE_HEADER.Characteristics:LARGE_ADDRESS_AWARE | 0 (false) | 0 (false) | 0x0020 (true) |
PFPE_OPTIONAL_HEADER32.Magic | 0x010B (PE32) | 0x010B (PE32) | 0x020B (PE32+) |
SIZE# PFPE_OPTIONAL_HEADER32 | 224 | 224 | 240 |
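The table can be read in the opposite direction, too. The following Python sketch guesses the PROGRAM WIDTH= from header members of an existing PE image. It is an illustration only: the function name program_width_from_pe is hypothetical, and the header layout (a 20-byte COFF file header following the 4-byte signature, with Magic as the first word of the optional header) is taken from the Microsoft PE documentation.

```python
import struct

def program_width_from_pe(data: bytes, pe_offset: int) -> int:
    """Guess PROGRAM WIDTH= from the headers; pe_offset addresses 'P','E',0,0."""
    # COFF file header: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics.
    machine, _, _, _, _, _, characteristics = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    (magic,) = struct.unpack_from("<H", data, pe_offset + 24)
    if machine == 0x8664 and magic == 0x020B:      # AMD64, PE32+
        return 64
    if machine == 0x014C and magic == 0x010B:      # Intel 386, PE32
        # The flag 32BIT_MACHINE (0x0100) tells width 32 from width 16.
        return 32 if characteristics & 0x0100 else 16
    raise ValueError("combination of Machine and Magic is not in the table")
```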
This chapter describes EuroAssembler capabilities.
Many assemblers provide tools which help the programmer with tedious and repetitive work; they are called macroassemblers. The preprocessing (macro) apparatus in EuroAssembler is recognizable by the percent sign % prefixed to pseudoinstructions which control the generation of repeated blocks of source code (%REPEAT, %WHILE, %FOR, %MACRO), conditional assembly (%IF, %COMMENT), assembly-time debugging (%DISPLAY), and the assignment and expansion of preprocessing %variables (%SET* family).
This set of tools manipulates the source text before it is submitted to the final assembly processing (to the plain assembler, which is not aware of the preprocessing apparatus at all).
Some compilers perform preprocessing in a special 0-th pass, which takes the input source file and emits plain assembly source. The preprocessed intermediate file can then be inspected manually.
EuroAssembler utilizes a different approach: instead of preprocessing the source file as a whole at once, it preprocesses statement by statement in each assembly pass. This makes it possible to manipulate data which change dynamically and which are not fixed before €ASM was given the opportunity to pass through the source program at least once, for instance the distance between labels, the size of not-yet-defined structures and segments etc.
When €ASM reads a line of source text, it first searches for the percent character %. If found, it inspects the immediately following character(s) and prepares a copy of the source line for the plain assembler, expanded according to the following rules. For more details about the scope of %variable expansion see the source text VarExpand.
What follows after % | Example | What will replace it |
---|---|---|
% | %% | self-escaped single percent sign % |
& | %& | suboperation size/length |
. | %. | expansion counter |
: | %: | macro label |
! | %!formal | inverted condition, e.g. NC |
* | %* | macro ordinal operand list |
# | %# | number of macro ordinal operands |
=* | %=* | macro keyword operand list |
=# | %=# | number of macro keywords |
^identifier | %^Width | system variable value (digits 16, 32 or 64) |
decimal digit(s) | %12 | 12th ordinal macro operand value |
letter(s) | %If | pseudoinstruction name is left unexpanded: %If |
letter(s) | %Size | if it is a formal operand of %FOR or %MACRO, it will be expanded to its value |
letter(s) | %OtherId | otherwise it is expanded as a user-defined preprocessing variable |
The relation between preprocessing and the plain assembly is similar to the relation between Javascript and the plain HTML text in internet browsers.
Proper function of €ASM preprocessing can be checked in the listing, by enabling the options EUROASM LISTVAR=ENABLE, LISTREPEAT=ENABLE, LISTMACRO=ENABLE.
This chapter demonstrates various methods of breaking up the program functionality into small subprograms in EuroAssembler.
Let's suppose that we need a function which calculates the third power of input positive integer number. The result should fit to 32 bits, otherwise the program will report an overflow and abort.
Assuming 32-bit mode and the input number loaded in register EAX, the solution uses instruction MUL (unsigned multiplication) two times.
Straightforward solution inserts the code directly to the main program flow.
; EAX contains the input number N.
    MOV ECX,EAX ; Copy the input value N to the register ECX.
    MUL ECX     ; Let EDX:EAX = N*N
    JC Abort:   ; CF=OF=1 when EDX is nonzero (32-bit overflow).
    MUL ECX     ; Let EDX:EAX = N*N*N
    JC Abort:   ; Abort on overflow.
; EAX now contains N^3, continue the main program flow.
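To see where the overflow occurs, the two multiplications can be emulated in Python. This is only a sketch for checking the arithmetic: the helper names mul32 and cube are hypothetical, and returning None stands for the jump to Abort.

```python
MASK32 = 0xFFFFFFFF

def mul32(eax, operand):
    """Emulate unsigned MUL r32: return (EDX, EAX, CF); CF is set when EDX != 0."""
    product = (eax & MASK32) * (operand & MASK32)
    edx, eax = product >> 32, product & MASK32
    return edx, eax, edx != 0

def cube(n):
    """Return N*N*N if it fits in 32 bits, otherwise None (the Abort path)."""
    ecx = n                          # MOV ECX,EAX
    edx, eax, cf = mul32(n, ecx)     # MUL ECX ; EDX:EAX = N*N
    if cf:
        return None                  # JC Abort:
    edx, eax, cf = mul32(eax, ecx)   # MUL ECX ; EDX:EAX = N*N*N
    if cf:
        return None                  # JC Abort:
    return eax
```

With this emulation cube(1625) still fits in 32 bits, while cube(1626) already reports an overflow.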
When such calculation is needed more than once, we should consider
refactoring the direct code to a subprocedure which could be called
repeatedly. We will insert the procedure named Cube
to the program flow
when its function is needed for the first time. Insertion of callable procedure
requires a bypass skip. The procedure should be also accompanied with remarks which document its function.
; EAX contains the input number N.
    CALL Cube:  ; Invoke the function which calculates N^3.
    JC Abort:   ; Abort on overflow.
    JMP Bypass: ; Skip the function code.
Cube PROC       ; Define a function which calculates 3rd power of N.
; Input: EAX=integer number N.
; Output: CF=OF=0, EAX=N^3, ECX=N, EDX=0.
; Overflow:CF=OF=1, EAX,ECX,EDX undefined.
    MOV ECX,EAX ; Copy the input value N to the register ECX.
    MUL ECX     ; Let EDX:EAX = N*N
    JC .Abort   ; CF=OF=1 when EDX is nonzero (32-bit overflow).
    MUL ECX     ; Let EDX:EAX = N*N*N
.Abort:RET      ; CF=OF=1 when EDX is nonzero (32-bit overflow).
    ENDPROC Cube
Bypass: ; EAX now contains N^3, continue the main program flow.
The instruction JMP Bypass: could be spared if the procedure code were defined somewhere else, below the main program flow.
This can be achieved by emitting the procedure to a different code section (for instance [Subproc]).
; EAX contains the input number N.
    CALL Cube:  ; Invoke the function which calculates N^3.
    JC Abort:   ; Abort on overflow.
%CurrentSect %SET %^Section ; Backup the current section name to a variable.
[Subproc]       ; Switch emitting to a different code section.
Cube PROC       ; Define a function which calculates 3rd power of N.
; Input: EAX=integer number N.
; Output: CF=OF=0, EAX=N^3, ECX=N, EDX=0.
; Overflow:CF=OF=1, EAX,ECX,EDX undefined.
    MOV ECX,EAX ; Copy the input value N to the register ECX.
    MUL ECX     ; Let EDX:EAX = N*N
    JC .Abort   ; CF=OF=1 when EDX is nonzero (32-bit overflow).
    MUL ECX     ; Let EDX:EAX = N*N*N
.Abort:RET      ; CF=OF=1 when EDX is nonzero (32-bit overflow).
    ENDPROC Cube
[%CurrentSect]  ; Return to the original code section.
; EAX now contains N^3, continue the main program flow.
Rather than manual section switch we could also utilize €ASM
block PROC1..ENDPROC1
which will switch to a different
section [@RT1] and return to the original section automatically.
; EAX contains the input number N.
    CALL Cube:  ; Invoke the function which calculates N^3.
    JC Abort:   ; Abort on overflow.
Cube PROC1      ; Define a function which calculates 3rd power of N in section [@RT1].
; Input: EAX=integer number N.
; Output: CF=OF=0, EAX=N^3, ECX=N, EDX=0.
; Overflow:CF=OF=1, EAX,ECX,EDX undefined.
    MOV ECX,EAX ; Copy the input value N to the register ECX.
    MUL ECX     ; Let EDX:EAX = N*N
    JC .Abort   ; CF=OF=1 when EDX is nonzero (32-bit overflow).
    MUL ECX     ; Let EDX:EAX = N*N*N
.Abort:RET      ; CF=OF=1 when EDX is nonzero (32-bit overflow).
    ENDPROC1 Cube ; End of subprocedure in section [@RT1]. Return to [.text].
; EAX now contains N^3, continue the main program flow.
Definition of function Cube at the place where it is used is good
for understandability. On the other hand, when there are more such definitions,
they clutter the main program thread. It could be more clearly organized
if those helper functions were put away to a different file,
for instance functions.inc
. This file will be
included to the main source file at assembly-time.
    INCLUDE "functions.inc" ; File with Cube: PROC source definition.
; EAX contains the input number N.
    CALL Cube:  ; Invoke the function which calculates N^3.
    JC Abort:   ; Abort on overflow.
; EAX now contains N^3, continue the main program flow.
Functions defined in the included file functions.inc can be wrapped into one or more blocks functions PROGRAM..ENDPROGRAM and assembled separately to an OMF, ELF or COFF object file functions.obj, or alternatively to a library. The function name (Cube) must be declared as GLOBAL or PUBLIC in the object file, and as GLOBAL or EXTERN in the main file.
Instead of an explicit GLOBAL declaration, the global scope may also be specified with a double colon (Cube::). The assembled object will then be statically linked to the main program at link-time.
    LINK "functions.obj" ; Object file with assembled code of function Cube.
; EAX contains the input number N.
    CALL Cube:: ; Invoke the external function which calculates N^3.
    JC Abort:   ; Abort on overflow.
; EAX now contains N^3, continue the main program flow.
Functions defined in the included file functions.inc can be wrapped into one or more blocks functions PROGRAM..ENDPROGRAM and assembled separately to a dynamically linked library file functions.dll. The function name (Cube) must be declared as EXPORT in the library file, and as IMPORT in the main executable file.
The assembled function in the DLL program will then be dynamically bound to the main program at run-time.
    IMPORT Cube, LIB="functions.dll"
; EAX contains the input number N.
    CALL Cube:: ; Invoke the DLL function which calculates N^3.
    JC Abort:   ; Abort on overflow.
; EAX now contains N^3, continue the main program flow.
An alternative approach to the repeated inline code is utilizing a macro which will expand itself whenever the functionality is requested.
Statements which define the macro need not be bypassed, because they don't emit any code, but the macrodefinition must appear before the macro is used. The definition could be put aside to an included file as well, similarly to PROC in the INCLUDE method.
Cube %MACRO
    MOV ECX,EAX ; Copy the input value N to the register ECX.
    MUL ECX     ; Let EDX:EAX = N*N
    JC Abort%.: ; CF=OF=1 when EDX is nonzero (32-bit overflow).
    MUL ECX     ; Let EDX:EAX = N*N*N
Abort%.:        ; Label name is modified by %. variable, which increments in each macro expansion.
    %ENDMACRO Cube
; EAX contains the input number N.
    Cube        ; Expansion of the macro.
    JC Abort:   ; Abort on overflow.
; EAX now contains N^3, continue the main program flow.
Inline macros are fast but each invocation repeats the whole function code.
The size of the program can be reduced if the macro calls a procedure with the function code, which can also be put aside to functions.inc. The role of the macro is then limited to processing possible parameters and hiding the calling convention (no parameters are actually used in our simple example, though).
    INCLUDE "functions.inc" ; File with Cube: PROC source definition.
Cube %MACRO     ; Definition of the macro Cube.
    CALL Cube:  ; Calling the procedure Cube:
    %ENDMACRO Cube
; EAX contains the input number N.
    Cube        ; Invoke macro which calls the included PROC.
    JC Abort    ; Abort on overflow.
; EAX now contains N^3, continue the main program flow.
The disadvantage of the previous method is that we have to maintain two blocks of code: the macro definition and the procedure definition. €ASM provides the procedure block PROC1, which is assembled only once, even if the macro which contains it is invoked repeatedly. Thanks to this, the procedure code is emitted only once, when the macro is invoked for the first time, and if the macro is never invoked, the code is not emitted at all. A macrolibrary with such semiinline macros can be included in any program and does not increase the final code if the macro is not used (expanded) in the program.
This method is preferred in most macrolibraries shipped with EuroAssembler.
Cube %MACRO     ; Definition of the semiinline macro Cube.
    CALL Cube:  ; Calling the procedure Cube:
Cube: PROC1     ; The PROC1 block is assembled only once on first macro invocation.
    MOV ECX,EAX ; Copy the input value N to the register ECX.
    MUL ECX     ; Let EDX:EAX = N*N
    JC .Abort:  ; CF=OF=1 when EDX is nonzero (32-bit overflow).
    MUL ECX     ; Let EDX:EAX = N*N*N
.Abort:RET      ; CF=OF=1 when EDX is nonzero (32-bit overflow).
    ENDPROC1 Cube:
    %ENDMACRO Cube
; EAX contains the input number N.
    Cube        ; Invocation of the macro which calls the embedded PROC1.
    JC Abort    ; Abort on overflow.
; EAX now contains N^3, continue the main program flow.
This chapter takes a closer look at how a program block of statements is processed by EuroAssembler.
Consider a plain text file src.asm
submitted to assembler:
 DB 'This source "src.asm" has'
 DB ' no PROGRAM statement.',13,10
 DB 'EuroAssembler will use '
 DB 'a fictive envelope instead.'
As no PROGRAM..ENDPROGRAM
block is defined in this source,
the output format of €ASM object file is configured only by [PROGRAM] section in the configuration file
euroasm.ini
, or by built-in default, which is PROGRAM FORMAT=BIN,MODEL=TINY,WIDTH=16
.
EuroAssembler formally wraps each source file into the two
fictive envelope statements PROGRAM and ENDPROGRAM.
Prefixed envelope PROGRAM statement derives its
label (module name) from the source file name, cutting off its extension.
Thus it will assemble the source src.asm
to a data file src.bin
.
This behaviour is compatible with most other assemblers.
If the source file name starts with a digit, for instance 123.asm, such a label is not acceptable by €ASM, so the module name will be prefixed with a grave ` and the source 123.asm is assembled to `123.bin.
Similarly, when the label of the PROGRAM statement contains ? or other characters unacceptable by the filesystem, each such character in the module file name will be replaced with an underscore _. The statement
IsNumlockOn? PROGRAM FORMAT=COM
will produce a program named IsNumlockOn_.com.
€ASM uses the ANSI version of the Windows API for dealing with file names, so I recommend abstaining from national characters outside the current codepage in source file names.
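The two naming rules can be summarized in a small Python sketch. It is not taken from €ASM sources: the function names are hypothetical, and the set UNACCEPTABLE is an assumption, because the text names only the question mark explicitly.

```python
import os

# Assumption: characters regarded as unacceptable in a file name.
# The manual mentions only '?'; the others are added for illustration.
UNACCEPTABLE = set('?*<>|"/\\:')

def envelope_module_name(source_path):
    """Derive the envelope PROGRAM label from the source file name."""
    name = os.path.splitext(os.path.basename(source_path))[0]  # cut off extension
    if name[:1].isdigit():
        name = "`" + name          # a label must not start with a digit
    return name

def output_file_name(program_label, extension):
    """Derive the target file name from the label of a PROGRAM statement."""
    safe = "".join("_" if ch in UNACCEPTABLE else ch for ch in program_label)
    return safe + extension
```

For instance, envelope_module_name("123.asm") yields `123 and output_file_name("IsNumlockOn?", ".com") yields IsNumlockOn_.com, in agreement with the examples above.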
When the source file is loaded in the memory, €ASM begins to read the source,
starting with the envelope statement PROGRAM
. When the corresponding
ENDPROGRAM
is found, an assembly pass is over.
€ASM checks all symbols, which might have been defined in the program,
and looks whether their offset is marked fixed, i. e. it did not change
between passes. If at least one symbol has its offset not fixed yet,
another assembly pass is needed and €ASM goes back to the PROGRAM
statement. When all symbols are fixed, €ASM starts the final assembly pass,
in which code+data is generated to the target file and listing is produced.
Each source requires at least two passes to assemble.
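The pass logic described above can be imitated with a toy model in Python. It is a sketch under simplifying assumptions and not the algorithm used by €ASM: the statement kinds ("label", "data", "jmp"), the jump sizes of 2 and 5 bytes, and the rule that a jump once promoted to the long form never shrinks again are all invented for this illustration.

```python
def assemble(statements, max_passes=16):
    """Repeat passes until no symbol offset changes, then count one final pass.

    statements: list of ("label", name), ("data", size) or ("jmp", target).
    Returns (symbol offsets, total number of passes including the final one).
    """
    known = {}           # symbol offsets learned in the previous pass
    long_jumps = set()   # indexes of jumps which need the 5-byte form
    for pass_number in range(1, max_passes + 1):
        offsets, position = {}, 0
        for index, (kind, arg) in enumerate(statements):
            if kind == "label":
                offsets[arg] = position
            elif kind == "data":
                position += arg
            elif kind == "jmp":
                target = known.get(arg)   # unknown during the first pass
                if target is not None and not -128 <= target - (position + 2) <= 127:
                    long_jumps.add(index)
                position += 5 if index in long_jumps else 2
        if offsets == known:              # all symbols are fixed
            return offsets, pass_number + 1   # the final pass emits code and listing
        known = offsets
    raise RuntimeError("symbol offsets did not settle")
```

A source without any symbol is fixed after the first pass and finishes in two passes; a single label needs three, because its offset must be seen unchanged in two consecutive passes.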
┌─────────┬──────────────────────────────────┐
│envelope │src: PROGRAM                      │
├─────────┼──────────────────────────────────┤
│      {1}│ DB 'This source "src.asm" has'   │
│"src.asm"│ DB ' no PROGRAM statement.',13,10│
│      {3}│ DB 'EuroAssembler will use '     │
│      {4}│ DB 'a fictive envelope instead.' │
├─────────┼──────────────────────────────────┤
│envelope │ ENDPROGRAM src:                  │
└─────────┴──────────────────────────────────┘
I0010 EuroAssembler started.
I0180 Assembling source file "src.asm".
I0270 Assembling source "src".
I0310 Assembling source pass 1.
I0330 Assembling source pass 2 - final.
I0760 16-bit TINY BIN file "src.bin" created from source, size=99.
I0750 Source "src" (4 lines) assembled in 2 passes with errorlevel 0.
I0860 Listing file "src.asm.lst" created, size=717.
I0990 EuroAssembler terminated with errorlevel 0.
Envelope statements are used regardless if an explicit PROGRAM block was defined in the source text, or not. Source lines between the start of file and the explicit PROGRAM statement, as well as lines between the explicit ENDPROGRAM and the end of source, should not emit any data or code. In this case the envelope source is empty and does not create target file from the source.
Consider the following source file src.asm
.
There is an explicit block Src:PROGRAM..ENDPROGRAM Src:
(lines 5..8)
inside the invisible envelope statements src: PROGRAM
and ENDPROGRAM src:
.
When the internal Src:PROGRAM..ENDPROGRAM Src:
block is found in assembly process,
this entire block is skipped until a final pass of outer block is performed. Then €ASM puts
the currently assembled final pass aside, and starts to assemble the inner block
in as many passes as necessary, creating the inner program target file.
After then €ASM returns to finish the final pass of the outer (envelope) program.
 EUROASM ; Common options.
 ; Source file "src.asm"
 ; with PROGRAM defined
explicitly.
Src:PROGRAM FORMAT=BIN
 DB 'Data emitted '
 DB 'by program Src.'
 ENDPROGRAM Src:
Notice the bug: the wrap of comment line {3} yields a non-comment line {4}.
The expression explicitly. is treated as a valid label (definition of an address symbol).
This causes the envelope to be treated as non-empty, and the target file src.bin is created from it, albeit with zero file size, as it contains only a zero-sized address symbol.
The inner program from lines {5..8} creates the target file Src.bin with a size of 28 bytes, but it is soon overwritten by the zero-sized envelope target src.bin, which happens to have an almost identical name (the filesystem in DOS and Windows is case-insensitive).
┌─────────┬──────────────────────────────────┐
│envelope │src: PROGRAM                      │
├─────────┼──────────────────────────────────┤
│      {1}│ EUROASM ; Common options.        │
│      {2}│ ; Source file "src.asm"          │
│      {3}│ ; with PROGRAM defined           │
│      {4}│explicitly.                       │
│"src.asm"│Src:PROGRAM FORMAT=BIN            │
│      {6}│ DB 'Data emitted '               │
│      {7}│ DB 'by program Src.'             │
│      {8}│ ENDPROGRAM Src:                  │
├─────────┼──────────────────────────────────┤
│envelope │ ENDPROGRAM src:                  │
└─────────┴──────────────────────────────────┘
I0010 EuroAssembler started.
I0180 Assembling source file "src.asm".
I0270 Assembling source "src".
I0310 Assembling source pass 1.
I0310 Assembling source pass 2.
I0330 Assembling source pass 3 - final.
W2101 Symbol "explicitly." was defined but never used. "src.asm"{4}
I0470 Assembling program "Src". "src.asm"{5}
I0510 Assembling program pass 1. "src.asm"{5}
I0530 Assembling program pass 2 - final. "src.asm"{5}
I0660 16-bit TINY BIN file "Src.bin" created, size=28. "src.asm"{8}
I0650 Program "Src" assembled in 2 passes with errorlevel 0. "src.asm"{8}
W3990 Overwriting previously generated output file "Src.bin".
I0760 16-bit TINY BIN file "src.bin" created from source, size=0.
I0750 Source "src" (8 lines) assembled in 3 passes with errorlevel 3.
I0860 Listing file "src.asm.lst" created, size=1372.
I0990 EuroAssembler terminated with errorlevel 3.
EuroAssembler allows more than one program block to be defined in a single source file, and all of them to be assembled with one command. Remember that symbols used in different PROGRAM..ENDPROGRAM blocks have private scope, so they don't see each other, although they are defined in the same source file. If we want to call a procedure defined in Pgm1 from Pgm2, the called symbol must be declared global and both assembled modules must be linked together.
┌─────────┬──────────────────────────────────┐
│envelope │src: PROGRAM                      │
├─────────┼──────────────────────────────────┤
│      {1}│ EUROASM ; Common options.        │
│      {2}│Pgm1:PROGRAM FORMAT=PE,ENTRY=Run1:│
│      {3}│ ; Pgm1 data.                     │
│      {4}│Run1: ; Pgm1 code.                │
│"src.asm"│ ENDPROGRAM Pgm1:                 │
│      {6}│ ; Pgm2 description.              │
│      {7}│Pgm2:PROGRAM FORMAT=PE,ENTRY=Run2:│
│      {8}│ ; Pgm2 data.                     │
│      {9}│Run2: ; Pgm2 code.                │
│     {10}│ ENDPROGRAM Pgm2:                 │
├─────────┼──────────────────────────────────┤
│envelope │ ENDPROGRAM src:                  │
└─────────┴──────────────────────────────────┘
I0010 EuroAssembler started.
I0180 Assembling source file "src.asm".
I0270 Assembling source "src".
I0310 Assembling source pass 1.
I0330 Assembling source pass 2 - final.
I0470 Assembling program "Pgm1". "src.asm"{2}
I0510 Assembling program pass 1. "src.asm"{2}
I0510 Assembling program pass 2. "src.asm"{2}
I0530 Assembling program pass 3 - final. "src.asm"{2}
I0660 32bit FLAT PE file "Pgm1.exe" created, size=14320. "src.asm"{5}
I0650 Program "Pgm1" assembled in 3 passes with errorlevel 0. "src.asm"{5}
I0470 Assembling program "Pgm2". "src.asm"{7}
I0510 Assembling program pass 1. "src.asm"{7}
I0510 Assembling program pass 2. "src.asm"{7}
I0530 Assembling program pass 3 - final. "src.asm"{7}
I0660 32bit FLAT PE file "Pgm2.exe" created, size=14320. "src.asm"{10}
I0650 Program "Pgm2" assembled in 3 passes with errorlevel 0. "src.asm"{10}
I0750 Source "src" (10 lines) assembled in 2 passes with errorlevel 0.
I0860 Listing file "src.asm.lst" created, size=1736.
I0990 EuroAssembler terminated with errorlevel 0.
Why should we pack multiple modules together with their documentation to a single file rather than scatter them to a bunch of small files? It's a matter of individual preferences.
One reason could be the transfer of information between modules with preprocessing %variables. Unlike ordinary symbols, the scope of %variables is not limited by PROGRAM..ENDPROGRAM block boundaries. Suppose that in Pgm2 we need to know the size of the data segment from Pgm1. Let's read the size to a %variable with the statement
%Pgm1DataSize %SETA SIZE# [DATA]
which is placed in Pgm1 just above ENDPROGRAM Pgm1. In the final pass of Pgm1 the segment size is reliably known, and the variable %Pgm1DataSize will be visible in the whole source below its definition, so Pgm2 can calculate with it.
Another example where grouping programs is profitable is when the programs are similar or they share common data, declared with preprocessing %variables. The following example creates three similar short programs RstLPT1.com, RstLPT2.com, RstLPT3.com in a loop:
Nr %FOR 1,2,3                ; Repeat the %FOR..%ENDFOR block three times.
RstLPT%Nr PROGRAM FORMAT=COM ; Program to reset LinePrinter port.
    MOV DX,%Nr-1             ; BIOS numbers printer ports from zero (0,1,2 for LPT1,LPT2,LPT3).
    MOV AH,1                 ; BIOS function INITIALIZE LPT PORT.
    INT 17h                  ; Use BIOS function to reset printer.
    MOV DX,Message           ; Put the address of $-terminated string to DS:DX.
    MOV AH,9                 ; DOS function WRITE STRING TO STDOUT.
    INT 21h                  ; Use DOS function to report success.
    RET                      ; Terminate program.
Message:DB "LPT%Nr was reset.$"
    ENDPROGRAM RstLPT%Nr
    %ENDFOR Nr               ; Generate 3 clones of the program.
Program modules can be nested in one another. For instance, when building an amphibious program executable both in DOS and in Windows, we may want to reflect the fact that the DOS-executable MZ file is embedded as a stub in the Windows-executable PE file, both providing the same functionality. See the sample projects LockTest or EuroConvertor as examples of a dual DOS & Windows program.
Again, when the outer program sees an inner program block in a non-final pass, the block is skipped. In the final pass, the assembly of the outer program is temporarily suspended, the inner program is completely assembled, and then the final pass of the outer program continues.
[Diagram: assembly progress over time for the envelope program "src" and the nested programs Pgm1 and Pgm2.]
Source file "src.asm":
envelope  src: PROGRAM
     {1}    EUROASM                          ; Common options.
     {2}  Pgm1: PROGRAM FORMAT=PE,ENTRY=Run:
     {3}  Run:                               ; Pgm1 data + code.
     {4}    Pgm2: PROGRAM FORMAT=COFF
     {5}                                     ; Pgm2 data + code.
     {6}    ENDPROGRAM Pgm2:
     {7}                                     ; Pgm1 more code.
     {8}    LINK "Pgm2.obj"
     {9}  ENDPROGRAM Pgm1:
envelope  ENDPROGRAM src:
Messages in the order they are issued:
I0010 EuroAssembler started.
I0180 Assembling source file "src.asm".
I0270 Assembling source "src".
I0310 Assembling source pass 1.
I0330 Assembling source pass 2 - final.
I0470 Assembling program "Pgm1". "src.asm"{2}
I0510 Assembling program pass 1. "src.asm"{2}
I0510 Assembling program pass 2. "src.asm"{2}
I0530 Assembling program pass 3 - final. "src.asm"{2}
I0470 Assembling program "Pgm2". "src.asm"{4}
I0510 Assembling program pass 1. "src.asm"{4}
I0530 Assembling program pass 2 - final. "src.asm"{4}
I0660 32bit FLAT COFF file "Pgm2.obj" created, size=78. "src.asm"{6}
I0650 Program "Pgm2" assembled in 2 passes with errorlevel 0. "src.asm"{6}
I0560 Linking COFF module ".\Pgm2.obj". "src.asm"{9}
I0660 32bit FLAT PE file "Pgm1.exe" created, size=14320. "src.asm"{9}
I0650 Program "Pgm1" assembled in 3 passes with errorlevel 0. "src.asm"{9}
I0750 Source "src" (9 lines) assembled in 2 passes with errorlevel 0.
I0860 Listing file "src.asm.lst" created, size=1237.
I0990 EuroAssembler terminated with errorlevel 0.
Some useful features of EuroAssembler can help the programmer to assure that the source is assembled as intended.
Keep in mind that this is asm-time debugging, which helps to discover misunderstandings and errors in EuroAssembler itself rather than bugs in the assembled program.
The dump column of the listing displays the assembled code. Repeated stretches, which are considered bug-free, are suppressed by default, but they can be displayed on demand with the directives EUROASM LISTINCLUDE=ON, LISTVAR=ON, LISTMACRO=ON, LISTREPEAT=ON.
Recognition of fields in statements can be investigated with option EUROASM DISPLAYSTM=ON, which inserts comment lines identifying each field. As this option blows up the listing size significantly, it's better to limit DISPLAYSTM only to the suspected lines, and then switch the option OFF or restore the previous set of options:
EUROASM PUSH, DISPLAYSTM=ON ; Store all current EUROASM options with PUSH first.
MyMacro Operand1, Operand2  ; "MyMacro" was not defined yet as a %MACRO, so it's treated like a label.
D1010 **** DISPLAYSTM "MyMacro Operand1, Operand2"
D1020 label="MyMacro"
D1040 unknown operation="Operand1"
D1050 ordinal operand number=1,value="Operand2"
EUROASM POP                 ; Restore EUROASM options.
D1010 **** DISPLAYSTM "EUROASM POP"
D1040 pseudo operation="EUROASM"
D1050 ordinal operand number=1,value="POP"
; Statement fields are no longer displayed.
Detailed encoding of machine instructions can be displayed with option EUROASM DISPLAYENC=ON, which inserts a comment line below each machine instruction with the list of actually used modifiers.
EUROASM PUSH, DISPLAYENC=ON ; Store all current EUROASM options with PUSH first.
SHRD [RDI+64],RDX,2
D1080 Emitted size=6,DATA=QWORD,DISP=BYTE,SCALE=SMART,ADDR=ABS,IMM=BYTE.
VMOVNTDQA XMM17,[RBP+40h]
D1080 Emitted size=7,PREFIX=EVEX,DATA=OWORD,OPER=0,DISP=BYTE,SCALE=SMART,ADDR=ABS.
EUROASM POP                 ; Restore EUROASM options. Encodings are no longer displayed.
All configuration options, which can be specified with EUROASM and PROGRAM keyword operands, are retrievable in the form of system %^variables, thus their current value can be checked or otherwise exploited:
%IF %^NOWARN[2101]
  %ERROR You shouldn't suppress the warning W2101. Move unused symbols to included file instead.
%ENDIF
The most powerful assembly-time debugging tool is the pseudoinstruction %DISPLAY, which displays internal €ASM objects at assembly time and helps to find out why €ASM doesn't work as expected.
See tests t2901..t2917 as examples.
Linking in IT terminology is the process in which separately assembled or compiled modules are joined, interactions between the globally accessible symbols are resolved, and their code and data are combined and reformatted to the target file format. See [Linkers] for more details.
Unlike many other linkers, EuroAssembler can create not only executable files, but also linkable formats ELF, COFF and OMF, and their libraries LIBCOF and LIBOMF (see Object convertor and the table of supported linker combinations).
Linking is performed by the pseudoinstruction LINK, which is followed by the filenames of input modules. Input formats acceptable to the EuroAssembler linker are of two kinds, linkable and importable:
CPU mode | Program width | Output executable | Output linkable | Input linkable | Input importable |
---|---|---|---|---|---|
Real | 16 | BIN, BOOT, COM, MZ | OMF, LIBOMF, COFF, LIBCOF | OMF, LIBOMF, COFF, LIBCOF | - |
Real | 32 | BIN, BOOT, COM, MZ | OMF, LIBOMF, COFF, LIBCOF | OMF, LIBOMF, COFF, LIBCOF | - |
Prot | 32 | ELFX, PE, DLL | ELF, COFF, LIBCOF, OMF, LIBOMF | ELF, COFF, LIBCOF, RSRC, OMF, LIBOMF | ELF, COFF, LIBCOF, DLL, OMF, LIBOMF |
Prot | 64 | ELFX, PE, DLL | ELF, COFF, LIBCOF | ELF, COFF, LIBCOF, RSRC | ELF, COFF, LIBCOF, DLL, OMF, LIBOMF |
See also the table of tests on linker combinations.
Notice that the object format OMF cannot be linked in 64-bit programs.
The actual format of linked file is recognized by the file contents, not by the file name extension. Each linked module is loaded and converted to an €ASM internal format (PGM) in memory prior to the actual linking.
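Recognition by content rather than by extension can be illustrated with a short Python sketch. It is not €ASM's actual detection code: the function name `guess_format` is hypothetical, and it checks only widely documented file signatures (the ELF magic, the `!<arch>` archive header used by COFF libraries, the MZ header, and the OMF record types 80h and F0h). A COFF object has no magic number, so the sketch assumes it can be recognized by the machine field in its first two bytes.

```python
import struct

def guess_format(data: bytes) -> str:
    """Guess a linkable file format from its leading bytes (illustrative only)."""
    if data.startswith(b"\x7fELF"):
        return "ELF"
    if data.startswith(b"!<arch>\n"):
        return "LIBCOF"            # ar-style archive used for COFF libraries
    if data.startswith(b"MZ"):
        return "MZ/PE/DLL"         # DOS header; PE and DLL files start with it too
    if data[:1] == b"\xf0":
        return "LIBOMF"            # OMF library header record
    if data[:1] == b"\x80":
        return "OMF"               # OMF THEADR record opens an object module
    if len(data) >= 2:
        (machine,) = struct.unpack_from("<H", data, 0)
        if machine in (0x014C, 0x8664):   # i386 and x86-64 COFF machine types
            return "COFF"
    return "unknown"

# The file name plays no role, only the content does.
print(guess_format(b"\x7fELF\x02\x01\x01"))
print(guess_format(b"\x4c\x01\x03\x00"))
```

The order of the checks matters only for the COFF fallback, which has to come last because it is the weakest test.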
The position of the pseudoinstruction LINK within the block PROGRAM..ENDPROGRAM is not important; names of the linked modules are just collected and the linking is postponed till the end of the program.
Code and data from the linked object files in formats ELF, COFF or OMF will be combined and concatenated with code and data from the base program (i. e. the one to which it's linked). Base program may be empty, however. Linker also resolves mutual references between the public and external symbols from all linked modules.
Unlike other linkers, EuroAssembler does not accept names of linked modules as its command line arguments. A linker script (€ASM source program) must be prepared beforehand when we want to employ EuroAssembler as a pure linker, for instance to convert object files created by a 3rd-party assembler or compiler to an executable file. The desired output file name and format will be specified as the PROGRAM arguments:
MyExeFile PROGRAM FORMAT=PE, WIDTH=32, ListMap=Yes, ListGlobals=Yes
  LINK MyCoff.obj, PascalOmf.obj, Win32.lib
ENDPROGRAM MyExeFile
Save the linker script as MyScript.asm, execute euroasm MyScript.asm and it will produce the Windows program MyExeFile.exe and the listing MyScript.asm.lst with the map of linked sections and global symbols.
Besides standalone object modules, code and data can also be linked from object libraries in formats LIBCOF and LIBOMF.
When the target base program is executable, €ASM only links those modules from library, which are at least once referenced by other modules (smart linking). This helps to keep size of the linked file small, eliminating the dead (never-to-be-executed) code.
If we nevertheless need to combine unreferenced library procedures into our executable program, we have to explicitly declare their names GLOBAL in the base program.
Smart linking does not apply when the target file is linkable, for instance when a LIBCOF library is created from other libraries and standalone object modules. In this case all modules (referenced or unreferenced) will be linked to the target file.
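The selection rule of smart linking can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not the €ASM linker: the function `smart_link`, the module names and the symbol names are all hypothetical, and the sketch assumes that references made by an already selected module can pull in further modules.

```python
def smart_link(base_refs, library):
    """Return names of library modules reachable from the symbols in base_refs.

    library maps a module name to a pair (defined symbols, referenced symbols).
    """
    # Index: which module defines each public symbol.
    definer = {sym: mod for mod, (defs, _) in library.items() for sym in defs}
    linked = []
    pending = list(base_refs)
    while pending:
        sym = pending.pop()
        mod = definer.get(sym)
        if mod is None or mod in linked:
            continue                       # not in this library, or already linked
        linked.append(mod)
        pending.extend(library[mod][1])    # the module's own references count too
    return sorted(linked)

# Hypothetical library with three modules.
library = {
    "print.obj":  ({"Print"}, {"StrLen"}),
    "strlen.obj": ({"StrLen"}, set()),
    "unused.obj": ({"Beep"}, set()),
}
print(smart_link({"Print"}, library))          # unused.obj stays out
print(smart_link({"Print", "Beep"}, library))  # Beep declared GLOBAL in the base program
```

When the target is a linkable file, the model would simply return every module of the library, referenced or not.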
A good reason to split a big project into smaller, separately assembled modules is a faster build.
When a project grows and its source is doubled in size, the number of symbols in it is likely to double, too. Each symbol needs to be compared with an array of other already declared symbols to avoid duplication. Number of checks, and also the consumed time, grows almost quadratically with source size.
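The quadratic growth can be checked with simple arithmetic. The Python sketch below assumes the worst case described above, where each new symbol is compared with all previously declared ones; the function name `duplicate_checks` is hypothetical and the numbers are a model, not measurements of €ASM.

```python
def duplicate_checks(symbols: int) -> int:
    """Comparisons needed when each new symbol is checked against all earlier ones."""
    return symbols * (symbols - 1) // 2

whole = duplicate_checks(2000)        # one source with 2000 symbols
split = 2 * duplicate_checks(1000)    # two separately assembled modules of 1000 symbols
print(whole, split)
```

In this model, doubling the number of symbols roughly quadruples the number of comparisons, while splitting the source into two halves roughly halves it.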
During the development process we are usually focused on only one part (module) of the project, so the remaining unchanged modules do not need to be recompiled in each development cycle (see also Makefile manager).
Recapitulation: If you want to statically link your own function (procedure), declare it PUBLIC function (or terminate its definition label with two colons, function:: PROC) and assemble the function to an object or library module. Then assemble the main program, declare the linked function EXTERN function (or terminate the called name with two colons) and insert the pseudoinstruction LINK module.obj into the main program. The main program then can CALL function:: as if it were assembled in its own body.
The same applies for functions from 3rd party library.
Again, you must observe its published name, calling convention, number, order and
type of arguments.
This version of EuroAssembler does not support dynamic linking of Linux dynamic libraries (DSO).
The command LINK DSO.so
tries to link the file only statically.
This chapter concerns dynamic linking for MS-Windows only.
The code and data of dynamically linked functions are not copied to the target executable image,
they remain in dynamic library (DLL), which has to be available on the system where our executable runs.
When our program calls a function from a DLL, it actually executes thunk code represented by a call of a single proxy jump instruction (stub). €ASM generates stubs in a special import section [.idata] in the form of an indirect absolute near jump (JMPN). Each such proxy jump is 7 bytes long (0xFF2425[00000000]) and it uses a pointer into the Import Address Table (IAT) as its indirect DWORD target. The virtual address in the pointer [00000000] is resolved by the linker, but the actual 32-bit or 64-bit virtual address of the library function (pointed to by the resolved dword) will be fixed up later, by the loader at bind time when the application starts.
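The 7-byte layout of such a stub can be reproduced with a short Python sketch. The byte values FF 24 25 come from the text above; the function name `proxy_jump` and the virtual address used in the example are hypothetical.

```python
import struct

def proxy_jump(iat_slot_va: int) -> bytes:
    """Encode the stub: bytes FF 24 25 followed by the 32-bit address of an IAT slot."""
    return b"\xFF\x24\x25" + struct.pack("<I", iat_slot_va)

stub = proxy_jump(0x00402010)     # hypothetical virtual address of an IAT entry
print(len(stub), stub.hex())
```

The four address bytes are stored in little-endian order; this is the field that the linker resolves, while the content of the IAT slot itself is filled in by the loader.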
Loader, implemented in Windows kernel, needs two pieces of information to dynamically link library functions and to fix up their addresses in IAT:
1) The name of the linked symbol (function name) or its ordinal number in the table of exported symbols.
Calling by ordinals is not supported in €ASM.
2) The name of the library file which exports the symbol (without path).
Path to the library file will be established by the loader. The order of directories where MS-Windows searches for the library is explained in [WinDllSearchOrder].
A program which needs to call a symbol (imported function) from the dynamic library should declare the symbol as imported. It may be declared GLOBAL as well, either explicitly or implicitly (CALL ImportedSymbol::), but €ASM will treat such a global symbol as EXTERN (statically linked) and complain that the corresponding public symbol was not found.
There are several ways to tell €ASM that the symbol should be dynamically linked:
Use an explicit import declaration, for instance IMPORT ImportedSymbol1, ImportedSymbol2, LIB="dynamic.dll". The dynamic library should be specified without a path, and it may be omitted when it is the library of core Windows functions kernel32.dll.
The presence of dynamic.dll in the filesystem is not checked at link time; the program will assemble without errors even when no such file exists on our computer. However, Windows will complain when it tries to launch the program and cannot find the library, or when ImportedSymbol is not exported by dynamic.dll.
Link an import library to the program, for instance LINK "winapi.lib". Such a file may be in format LIBOMF or LIBCOF and it usually contains only declarations of exported symbol names and their library files (pure import library). €ASM will match undefined global symbols with those names and redeclare them as IMPORT.
Some older librarians (e.g. [ALIB]) produce import libraries in long format, which contains additional code of thunk jumps. €ASM accepts the long format, but this superfluous content is ignored.
Some compilers mangle the exported names, decorating them with leading underscores and other info concerning calling convention and number of operands. €ASM does not support name mangling, imported functions must be called by the identical name which is exported.
The file dll2lib.htm tells how to create an import library from MS-Windows DLLs.
Link the dynamic library itself, for instance
%SystemRoot %SETE SystemRoot
LINK "%SystemRoot\system32\USER32.DLL"
Linking a dynamic library does not copy its code and data to our program, it only detects the names of exported functions. This statement is equivalent to the declaration IMPORT *, LIB="USER32.DLL" or to linking the corresponding import library LINK "USER32.lib".
Resolve the symbol address at run time by calling the Windows API functions LoadLibrary("library.dll"); GetProcAddress(SymbolName); before the SymbolName is used.
Recapitulation: If you want to dynamically link your own function (procedure) in other programs, declare it EXPORT function and assemble the function to a DLL format (mylib PROGRAM FORMAT=DLL). Be sure to distribute mylib.dll together with your programs. Then assemble the main executable program, declaring the linked function IMPORT function, LIB=mylib.dll. The main program then can invoke it using CALL function.
More often you will need to call functions from a 3rd party dynamic library, which is the case of the MS-Windows API. You might explicitly enumerate each used WinAPI function with a pseudoinstruction such as IMPORT function1,function2,LIB=user32.dll, but a more comfortable solution is to use an import library, which declares all function names exported by the DLL. Then you won't have to add new import declarations every time a new function is used in your program during its development. Simply call the new function with a double colon and, when its name appears in some import library, it will be treated as imported. You may also want to use the macro WinAPI (32-bit) or WinABI (64-bit), which takes care of the IMPORT declaration and automatic selection between the ANSI and WIDE variant.
EuroAssembler can create libraries from previously assembled object modules (files in ELF, OMF or COFF format). When the library program itself contains some code and data, it will be implicitly linked to the library as the first module.
Library PROGRAM FORMAT=LIBOMF ; or FORMAT=LIBCOF
  ObjModule1:: PROC           ; One of the object modules can also be defined here.
    ; Source code of ObjModule1.
  ENDP ObjModule1::
  LINK "ObjModule2.obj", "ObjModule3.obj" ; Other ELF, OMF or COFF object modules.
ENDPROGRAM Library
If the linked modules contain import information, it is copied to the output library, too. Pure import library contains import declarations only. They may be explicitly declared as IMPORT, or loaded from dynamic library, or linked from other import libraries. Following example exploits all three methods:
ImpLibrary PROGRAM FORMAT=LIBOMF ; or FORMAT=LIBCOF
  IMPORT Symbol1, Symbol2, LIB="DynamicLibrary1.dll" ; Explicit declaration.
  LINK "C:\MyDLLs\DynamicLibrary2.dll"               ; Automatic export detection from DLL.
  LINK "OtherImportLibrary.lib"                      ; Reimport from another library.
ENDPROGRAM ImpLibrary
Examples of libraries created from three separately assembled modules can be found in €ASM tests:
t8552 (object library LIBOMF for 16-bit Dos),
t9113 (object library LIBCOF for 32-bit Windows),
t9164 (object library LIBCOF for 64-bit Windows),
t8675 (import library LIBOMF for Windows),
t9225 (import library LIBCOF for Windows).
EuroAssembler can directly link all main object formats OMF, ELF and COFF, so the demand for explicit object conversion between them should be rare. Example:
OMFobject PROGRAM FORMAT=OMF      ; Convert COFF object file to the format OMF.
  LINK "COFFobject.obj"
ENDPROGRAM OMFobject

COFFobject PROGRAM FORMAT=COFF    ; Convert OMF object file to the format COFF.
  LINK "OMFobject.obj"
ENDPROGRAM COFFobject

ELFobject PROGRAM FORMAT=ELF      ; Convert COFF object file to the format ELF.
  LINK "COFFobject.obj"
ENDPROGRAM ELFobject

COFFobject PROGRAM FORMAT=COFF    ; Convert ELF object file to the format COFF.
  LINK "ELFobject.o"
ENDPROGRAM COFFobject

OMFlibrary PROGRAM FORMAT=LIBOMF  ; Convert COFF object library to the format LIBOMF.
  LINK "COFFlibrary.lib"
ENDPROGRAM OMFlibrary

COFFlibrary PROGRAM FORMAT=LIBCOF ; Convert OMF object library to the format LIBCOF.
  LINK "OMFlibrary.lib"
ENDPROGRAM COFFlibrary
Operator FILETIME# retrieves the last modification time of a file at assembly-time, which can be used for detection if the target file needs reassembly or not. Just compare the filetime of target with filetime of each source, which the target depends on. If the target file does not exist, its attribute-operator FILETIME# returns 0, which is the same as if it was very old, so its reassembly will be required anyway.
; Recompile "source.asm" only if "target.exe" doesn't exist or if it is older than its sources.
%IF FILETIME# "target.exe" > FILETIME# "source.asm" && FILETIME# "target.exe" > FILETIME# "included2source.inc"
  %ERROR "target.exe" is fresh, no need to assemble again.
%ELSE
  target PROGRAM FORMAT=PE
    INCLUDE "source.asm"
  ENDPROGRAM target
%ENDIF
As an example of a more sophisticated makefile script, see the main EuroAssembler source file euroasm.htm.
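The freshness test based on FILETIME# can be mirrored in Python for readers who want to experiment with the logic outside the assembler. This is a sketch of the same comparison, not a part of €ASM: the helper names `filetime` and `is_fresh` are hypothetical, and a missing file is given the time 0, as FILETIME# does.

```python
import os
import tempfile

def filetime(path: str) -> float:
    """Modification time of path, or 0 when the file does not exist."""
    try:
        return os.path.getmtime(path)
    except OSError:
        return 0.0

def is_fresh(target: str, sources: list[str]) -> bool:
    """True when target is newer than every source it depends on."""
    return all(filetime(target) > filetime(src) for src in sources)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source.asm")
    exe = os.path.join(tmp, "target.exe")
    open(src, "w").close()
    os.utime(src, (1000, 1000))               # pretend the source is old
    missing_target = is_fresh(exe, [src])     # target does not exist yet: filetime 0
    open(exe, "w").close()
    os.utime(exe, (2000, 2000))               # target built after the source
    built_target = is_fresh(exe, [src])
    os.utime(src, (3000, 3000))               # source edited afterwards
    stale_target = is_fresh(exe, [src])

print(missing_target, built_target, stale_target)
```

Reassembly is needed whenever `is_fresh` returns False, which covers both a missing target and a target older than any of its sources.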
Computer programs are often written in assembler because we want them to be fast and small. However, those are not the only criteria how a program can be optimised:
See also optimisation tutorials.
Let's look how EuroAssembler can help with optimisation.
€ASM selects by default the shortest possible encoding of machine instruction. On the other hand, it respects instruction mnemonic chosen by the programmer, which doesn't always have to be the shortest variant. A couple of rules worth remembering:
|0000:B80000        | MOV AX,0
|0003:29C0          | SUB AX,AX       ; Using SUB or XOR for zeroing is shorter. Side effect: flags are changed.
|0005:              |
|0005:89D8          | MOV AX,BX
|0007:93            | XCHG AX,BX      ; XCHG is shorter than MOV. Collateral damage: 2nd register is changed, too.
|0008:              |
|0008:              |Label:
|0008:8D06[0800]    | LEA AX,[Label]
|000C:B8[0800]      | MOV AX,Label    ; Moving offset to a register is shorter than loading its address by LEA.
|000F:              |
|000F:5053          | PUSH AX,BX
|0011:60            | PUSHAW          ; Pushing/popping all registers at once is shorter than individual push/pop.
|0012:              |
|0012:050100        | ADD AX,1
|0015:40            | INC AX          ; Increment/decrement is shorter than add/subtract.
|0016:              |
|0016:              |LoopStart:
|0016:49            | DEC CX
|0017:75FD          | JNZ LoopStart:
|0019:E2FB          | LOOP LoopStart: ; LOOP, JCXZ are shorter than separate test+jump.
Programs which aspire to the short-size category should have PROGRAM FORMAT=COM and EUROASM AUTOALIGN=OFF. They may be terminated by a simple near RET instead of invoking the DOS function TERMINATE PROCESS, because the return address on the stack of a COM program is initialized to 0 and the final RET transfers execution to the DOS terminating interrupt at the beginning of the PSP block (CS:0), which was established by the loader.
Hello PROGRAM FORMAT=COM
  MOV DX,=B "Hello world!$"
  MOV AH,9
  INT 21h
  RET
ENDPROGRAM Hello
For some more inspiration check
[Golfing_tips],
Hugi Size Coding Competition Series,
Assembly nibbles competition,
Graphical Tetris in 1986 bytes by Sebastian Mihai,
BootChess play in 487 bytes by Oliver Poudade.
A Windows executable program created by €ASM will be shorter when the option PROGRAM ICONFILE= is explicitly specified as empty and no resource file is linked. In this case the resource section will not be included in the PE file at all. You may also experiment with PE file properties using program options, such as PROGRAM FILEALIGN= value.
Writing fast programs is fully in the hands of the programmer. EuroAssembler cannot help much here; it does no optimisations behind your back as high-level compilers do. You may want to set EUROASM AUTOALIGN=ON to be sure that all data will be aligned for the best performance. Total control of instruction encoding in €ASM allows you to select a variant with exact code size, which is faster than size-optimised encoding stuffed with NOPs. €ASM supports optimised no-operations encoding for fast and easy manual alignment.
There are many tricks to squeeze out every CPU clock: loop unrolling, parallelization, avoiding memory access, and last but not least, choosing the fastest algorithm. Performance also heavily depends on the CPU model and generation. A good guide is [SoftwareOptimisation] by Agner Fog.
Performance is usually traded off with the program size, for instance many tricks mentioned above lead to slower execution. You may want to optimize only the critical parts of the code which are executed many times in your program.
EuroAssembler is not optimised for speed; nevertheless, the duration of assembly is usually not an issue. It mostly depends on the number of passes, which is governed by €ASM itself and cannot be directly influenced by the programmer. At least two passes are always required. The number of passes increases when the program contains forward references, assembly-time loops or macroinstructions.
When €ASM is assembling forward-referenced jumps, at first it anticipates a short distance to the not-yet-defined target and reserves room for only a 2-byte (short) opcode. If we know at write time that the forward target will be further than 127 bytes, it is recommended to explicitly specify DIST=NEAR, which can save one pass at assembly time. However, the pass will be spared only when the distances of all such jumps are specified, which is usually not worth the effort.
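Why a single unspecified jump is enough to cost a pass can be seen in a toy model written in Python. It is a simplification and not €ASM's algorithm: the function `assemble`, the item format and the sizes (2 bytes for a short and 3 bytes for a near 16-bit jump) are assumptions of this sketch, and the model assumes that one extra final pass follows the first pass in which no jump changed its size.

```python
SHORT, NEAR = 2, 3   # assumed sizes of a 16-bit JMP with rel8 and rel16 displacement

def assemble(items, preset_near=()):
    """Return (number of passes, jump sizes) for a toy program.

    items: list of ("code", size), ("label", name) or ("jmp", name).
    preset_near: indexes of jumps declared near in advance by the programmer.
    """
    size = {i: (NEAR if i in preset_near else SHORT)
            for i, item in enumerate(items) if item[0] == "jmp"}
    passes = 0
    while True:
        passes += 1
        # Lay out the program with the jump sizes assumed so far.
        offset, labels, ends = 0, {}, {}
        for i, (kind, arg) in enumerate(items):
            if kind == "label":
                labels[arg] = offset
            elif kind == "code":
                offset += arg
            else:
                offset += size[i]
                ends[i] = offset      # displacement is relative to the next instruction
        # Widen every short jump whose target is out of the rel8 range.
        changed = False
        for i, end in ends.items():
            distance = labels[items[i][1]] - end
            if size[i] == SHORT and not -128 <= distance <= 127:
                size[i] = NEAR
                changed = True
        if not changed:
            return passes + 1, size   # one more pass emits the final code

program = [
    ("jmp", "Far"),     # item 0: target more than 300 bytes ahead
    ("code", 100),
    ("jmp", "Far"),     # item 2: target 200 bytes ahead
    ("code", 200),
    ("label", "Far"),
]
print(assemble(program)[0])            # no distance specified
print(assemble(program, {0})[0])       # only one of the two jumps specified
print(assemble(program, {0, 2})[0])    # both jumps specified
```

In this model the pass is saved only in the last call, where every far jump was declared near in advance.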
If you are interested in why €ASM performs so many passes, put the statement %DISPLAY UnfixedSymbols in front of ENDPROGRAM to find out which symbols oscillate between assembly passes.
The build time of big projects can be reduced significantly by splitting the code to smaller, separately assembled modules, which will be finally linked together. See also the euroasm.htm source itself.
EuroAssembler introduced some new comfortable features which are not usual among other assemblers:
INCLUDE "Win*.inc".
INCLUDE1 file instead of INCLUDE file and €ASM will take care of that.
EQU or STRUC.
FLD ST0,[=Q 1.234].
euroasm.exe and they can be easily tailored and optimized by modification of API macros.
A well commented and structured program is easy to read and maintain. EuroAssembler allows HTML formatting in comments, so the source code can be directly published on web sites and each part of the source can be immediately documented with richly formatted remarks, tables, images and hypertext links.
The size and language of identifiers are not limited, so they can be self-describing. If English is not your mother tongue, it is a good idea to prefer labels with non-English names, such as Drucken rather than Print, файл rather than file, etc. This helps the reader of your program to distinguish built-in reserved words from identifiers created by the author.
Elements of EuroAssembler language use decorators which help the human reader to distinguish the category of decorated identifier:
If you have read this manual this far and want to try EuroAssembler, download the latest version, print a hard copy of the paper crib and look at the sample projects. Good luck!