EuroAssembler is written in EuroAssembler in a static pseudo object-oriented paradigm (OOP).
A programming object is represented by a collection of related memory variables described by an assembler structure (object class) STRUC..ENDSTRUC.
The data are manipulated with procedures – object methods. Objects are bound with their methods only by naming convention. Inheritance and cascading of methods are not used.
Each €ASM object and its methods are encapsulated in a separate source file which is assembled to a COFF module file. All modules are then linked into the final executable file euroasm.exe (32bit console application for MS Windows).
Class | Representation | Module file |
---|---|---|
CHUNK | Chunk of assembled source text. | chunk.htm |
- | DOS stub program for PE executables. | coffstub.htm |
CTX | Assembly block context. | ctx.htm |
DICT | Language dictionary. | dict.htm |
EA | EuroAssembler main object. | ea.htm |
EAOPT | EUROASM options. | eaopt.htm |
- | EuroAssembler linker script. | euroasm.htm |
EXP | Expression evaluator. | exp.htm |
LST | Assembly listing. | lst.htm |
MAC | Macroinstruction handler. | mac.htm |
MEMBER | Member of structure or library. | member.htm |
MSG | EuroAssembler message. | msg.htm |
PASS | Assembly pass through the source. | pass.htm |
PGM | The assembled module. | pgm.htm |
PGMOPT | PROGRAM options. | pgmopt.htm |
PSEUDO | Pseudoinstruction handlers. | pseudo.htm |
RELOC | Relocation. | reloc.htm |
SRC | Input source file. | src.htm |
SSS | Structure/section/segment/group. | sss.htm |
STM | Statement. | stm.htm |
SYM | Symbol. | sym.htm |
SYS | System calls of MS Windows functions. | syswin.htm |
VAR | Preprocessing %variable. | var.htm |
Group | Instruction category | Uses registers | Module file |
---|---|---|---|
all | Machine instruction handlers support | - | ii.htm |
A | Vendor specific (AMD) | RAX..YMM15 | iia.htm |
B | Intel Fused Multiply-Add (FMA) | XMM0..ZMM31 | iib.htm |
C | Vendor-specific (CYRIX) | MM0..MM7 | iic.htm |
D | 3DNow! specific (AMD, D3NOW) | XMM0..XMM15 | iid.htm |
F | Floating-point (FPU) | ST0..ST7 | iif.htm |
G | General instructions | RAX..R15 | iig.htm |
K | Mask-registers manipulation (AVX512) | K0..K7 | iik.htm |
M | Multimedia (MMX) | MM0..MM7 | iim.htm |
P | Packed (SSE) | XMM0..XMM15 | iip.htm |
S | System special (SPEC, UNDOC, PROT, PRIV, MPX, SGX) | special | iis.htm |
T | Transactional & other extensions (TSX, RTM, VMX, SVM) | - | iit.htm |
V | Advanced Vector extension (AVX) | XMM0..YMM15 | iiv.htm |
X | XOP-encodable AMD | XMM0..YMM15 | iix.htm |
Y | Advanced Vector extension (AVX2) | XMM0..YMM15 | iiy.htm |
Z | Advanced Vector extension (AVX512) | XMM0..ZMM31 | iiz.htm |
Format | Platform | Module file |
---|---|---|
all | Linker for all €ASM format output files | pf.htm |
BIN | Binary output file | pfbin.htm |
BOOT | Boot sector file | pfboot.htm |
COFF | 16\|32\|64bit Common Object Format module | pfcoff.htm |
COM | 16bit DOS executable | pfcom.htm |
DLL | 32\|64bit Dynamically Linked Library | pfdll.htm |
ELF | 32\|64bit linkable module | pfelf.htm |
ELFSO | 32\|64bit Linux shared object | pfelfso.htm |
ELFX | 32\|64bit Linux executable | pfelfx.htm |
LIBCOF | Library of COFF modules | pflibcof.htm |
LIBOMF | Library of OMF modules | pflibomf.htm |
MZ | 16bit DOS executable | pfmz.htm |
OMF | 16\|32bit Object Module Format | pfomf.htm |
PE | 32\|64bit Windows Portable Executable | pfpe.htm |
RSRC | Compiled Windows resource (input only) | pfrsrc.htm |
Realm | OS | Support | Maclib file |
---|---|---|---|
EUROASM | any | Extensions of CPU machine instructions | cpuext.htm |
EUROASM | any | Extensions of CPU machine instructions | cpuext32.htm |
EUROASM | any | Memory management macros. | memory.htm |
EUROASM | any | Data sorting. | sort32.htm |
EUROASM | any | Boolean flag manipulation. | status32.htm |
EUROASM | any | StdCall 32bit calling-convention macros. | stdcal32.htm |
EUROASM | any | Operations with zero-terminated strings. | string32.htm |
EUROASM | Win | List of MS Windows API functions with ANSI+WIDE variants. | winansi.htm |
EUROASM | Win | Macros for core 32bit MS Windows functions. | winapi.htm |
PROGRAM | Win | Wrappers of 32bit MS Windows file functions. | winf32.htm |
Ea
as long as the process euroasm.exe is running.
euroasm.exe as a command-line parameter. It exists in one and only static instance named Src.
€ASM will open the file and its included file(s), create the object Src, assemble, link, close the source, write the output and listing, and then destroy the object Src.
euroasm.exe, the object Src is reinitialized and its assembly repeats.
PROGRAM..ENDPROGRAM block in the source file.
It is created at the start of each assembly pass when the pseudoinstruction PROGRAM is assembled, and it is destroyed in the ENDPROGRAM handler.
Some information about the mutual relations between objects (€ASM source procedures and macros) is scattered throughout the source files:
Main €ASM execution and termination
Source statement processing
Program processing
Linkage processing procedures
Machine instruction assembly
Macro expansion
Boolean data are implemented as 1-bit flags in the object's DWORD member, usually named .Status.
Integer numbers with 64 bits (QWORD type) are usually accessible as two DWORD variables postfixed Low and High, e.g. STM.OffsetLow and STM.OffsetHigh. When loaded into two 32bit registers, such a pair is referred to in comments as colon-separated, e.g. EDX:EAX.
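The Low/High convention can be illustrated with a short Python sketch. It is not €ASM code, and the function names are hypothetical; it only shows how a 64-bit value maps to a pair such as EDX:EAX, with the high DWORD named first.

```python
MASK32 = 0xFFFFFFFF  # one DWORD


def split_qword(value):
    """Return the (low, high) DWORD halves of an unsigned 64-bit integer."""
    if not 0 <= value <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("value does not fit in 64 bits")
    return value & MASK32, (value >> 32) & MASK32


def join_qword(low, high):
    """Combine two DWORDs; in EDX:EAX notation high is EDX and low is EAX."""
    return ((high & MASK32) << 32) | (low & MASK32)


low, high = split_qword(0x1122334455667788)
print(f"{high:08X}:{low:08X}")
```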
Pointers are referred to in comments as Ptr or with a caret sign ^, for instance ^Name represents the 32bit offset of the Name.
References to strings are implemented with several methods:
- Pointer and size, where the first register or variable keeps a pointer to the first byte of the string, and the second register or variable keeps the string size in bytes. They are referred to in comments as comma-separated pairs, e.g. ESI,ECX.
- Begin and end, where the first register points to the first string byte, and the second register points right behind the last string byte. They are referred to in comments as ellipsis-separated pairs, e.g. ESI..EDX. The size of such a string can be computed by subtracting the two registers.
- ASCIIZ termination, where the string is referred to with a single pointer. The string ends with the NULL control character 0x00 (C string). This convention is mostly used when the string specifies a file name.
- Size-prefixed strings have their size encoded in their first byte. The size of such a string cannot exceed 255 bytes. This Pascal convention is employed in some older file formats (OMF).
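The four string conventions above can be compared side by side in a small Python sketch. A `bytes` object stands in for memory and integer indexes stand in for pointers; the function names are hypothetical and do not appear in the €ASM sources.

```python
def from_ptr_size(memory, ptr, size):
    """Pointer and size, as in ESI,ECX."""
    return memory[ptr:ptr + size]


def from_begin_end(memory, begin, end):
    """Begin and end, as in ESI..EDX; the size is end - begin."""
    return memory[begin:end]


def from_asciiz(memory, ptr):
    """ASCIIZ: scan from ptr up to the terminating 0x00 byte."""
    end = memory.index(b"\x00", ptr)
    return memory[ptr:end]


def from_size_prefixed(memory, ptr):
    """Pascal style: the first byte holds the size (at most 255)."""
    size = memory[ptr]
    return memory[ptr + 1:ptr + 1 + size]


# One size-prefixed, zero-terminated "Hello" followed by "World".
memory = b"\x05Hello\x00World"
print(from_ptr_size(memory, 1, 5), from_begin_end(memory, 7, 12))
print(from_asciiz(memory, 1), from_size_prefixed(memory, 0))
```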
A project of such magnitude requires strict discipline in choosing symbol names. They always begin with an abbreviated object identification (object shortcut, for instance Stm is the shortcut of the statement object), so it is easy to tell which class a method or symbol belongs to, and therefore in which source file it is defined.
The character case indicates what kind of data the identifier represents:
- Class/structure names are all in uppercase (C convention), for instance STM.
- Boolean flag names begin with a lowercase shortcut (camel convention), for instance stmPrefixPresent.
- Procedures and methods have the first letter of the object shortcut capitalized (lochness convention), for instance StmParse.
- Local labels in procedures usually do not have mnemonic names. A monotonic numeric sequence is used instead, e.g. .10:, .20:, .30: (Basic convention).
The verb which follows the object shortcut in a method name indicates the function of the method. Object constructors & destructors are named Create & Destroy, e.g. StmCreate, StmDestroy.
Boolean flags (max. 32 per class) are kept in an object DWORD variable named .Status and manipulated with macros SetSt, RstSt, JSt, JNSt from the library status32.htm.
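The idea of packing up to 32 Boolean flags into one DWORD can be sketched in Python. This is only an illustration: the bit values below are invented, stmLabelPresent is a made-up flag name, and the mapping of the helper functions to SetSt, RstSt, JSt and JNSt is an assumption based on the macro names, not on their actual definitions in status32.htm.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical bit assignments; the real values live in the €ASM sources.
stmPrefixPresent = 0x00000001
stmLabelPresent = 0x00000002  # invented flag, for illustration only


class Stm:
    def __init__(self):
        self.Status = 0  # one DWORD holding up to 32 one-bit flags


def set_st(obj, flag):
    """Turn a flag on (assumed to correspond to SetSt)."""
    obj.Status = (obj.Status | flag) & MASK32


def rst_st(obj, flag):
    """Turn a flag off (assumed to correspond to RstSt)."""
    obj.Status = obj.Status & ~flag & MASK32


def is_st(obj, flag):
    """Test a flag; JSt would jump when this is True, JNSt when it is False."""
    return (obj.Status & flag) != 0


stm = Stm()
set_st(stm, stmPrefixPresent)
print(is_st(stm, stmPrefixPresent), is_st(stm, stmLabelPresent))
```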
Zero-terminated (ASCIIZ) strings and macros which operate with such strings have their name terminated with the dollar character $, see macrolibrary string32.htm.
Objects which have the property name, such as symbol, %variable, structure, program etc., usually keep their name in their first two DWORD members: pointer to the object name .NamePtr and size of the name .NameSize.
All case-insensitive names (registers, prefixes, machine instructions, pseudoinstructions, keywords etc.) are written in upper case here in EuroAssembler sources. Names of variables, procedures, macroinstructions are in mixed case.
If a special character is part of the €ASM term and should be embedded in an identifier or in HTML anchor, it is replaced with two lowercase letters:
Char | Replacement | Char | Replacement |
---|---|---|---|
% | pc | & | am |
$ | do | : | co |
# | ha | * | as |
= | eq | . | pt |
For instance the handler of pseudoinstruction %SHIFT has the label PseudopcSHIFT and URL pseudo.htm#PseudopcSHIFT.
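The replacement table can be expressed as a small lookup. The following Python sketch uses hypothetical function names and reproduces the %SHIFT example from the text.

```python
# Special characters and their two-letter replacements, from the table above.
REPLACEMENTS = {
    "%": "pc", "&": "am",
    "$": "do", ":": "co",
    "#": "ha", "*": "as",
    "=": "eq", ".": "pt",
}


def to_identifier(term):
    """Replace each special character with its two lowercase letters."""
    return "".join(REPLACEMENTS.get(ch, ch) for ch in term)


def pseudo_handler_label(pseudoinstruction):
    """Label of a pseudoinstruction handler, e.g. %SHIFT -> PseudopcSHIFT."""
    return "Pseudo" + to_identifier(pseudoinstruction)


label = pseudo_handler_label("%SHIFT")
print(label, "pseudo.htm#" + label)
```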
EuroAssembler uses three kinds of subprogrammes:
1. macroinstructions,
2. PROC..ENDPROC blocks,
3. procedures defined with the macros Procedure..EndProcedure.
Ad 1.: Besides ad-hoc macroinstructions defined in the same source which uses them (for instance macros in ii.htm), €ASM uses some generally usable macros from libraries shipped with EuroAssembler.
Ad 2.: PROC .. ENDPROC blocks are used only sporadically as local subroutines in large procedures. They are called with register calling convention, using input/output registers described in their header.
Ad 3.: €ASM extensively employs subprograms defined with macroinstructions Procedure, LocalVar, EndProcedure, Invoke. Those four macros encapsulate their StdCall calling convention, register preservation, local stack variables reservation, maintenance of stack frame and the final return.
Another advantage is that they can be invoked with an arbitrary number of arguments, as many as the procedure declares; the number of arguments provided by Invoke must exactly match the Procedure declaration.
Where a variable number of arguments was required, the subprogram was implemented as a macro (see Msg as an example).
Procedures used in €ASM preserve all registers except those which return the result. Usually it is EAX but the result is sometimes returned in other register(s), too. For instance the macro BufferRetrieve returns the contents of the buffer as a string in registers ESI,ECX.
Arithmetic CPU flags are not preserved by subprogrammes. The exceptions are the macros Msg and MsgUnexpected, which preserve all CPU flags and registers. Nonetheless, many procedures use the Carry flag to signal an error, or the Zero flag to signal emptiness.
All procedures expect clear Direction flag on input, and they return it reset on output (DF=0).
Calling convention of operating system functions is hidden in system macros in the file sys*.htm.
Configuration files euroasm.ini are loaded to memory, assembled and immediately released.
Input files (the source file currently being assembled and its included files) are mapped to memory and kept open with sharing access allow read, deny write until the assembler/linker ends and the output file is completed in a memory stream.
Source files being assembled may be kept open by the text editor in which they are being written, but you won't be able to save them until the assembly terminates.
Output files (the target object|executable file and listing) are compiled in memory. When they are complete, input files are closed and only then is the output compilation flushed at once to an output disk file.
This method makes it possible to create an output listing with the same name as the input source, overwriting the source with its listing (or even with the assembled output file).
Requests for service from the operating system are encapsulated in macroinstructions gathered in easource/sys???.htm. In the Windows version of EuroAssembler it is the source file easource/syswin.htm which imports API services from the system library kernel32.dll. See syswin.htm for their description.
Encapsulation of OS calls by Sys* macroinstructions facilitates future porting of EuroAssembler from MS Windows to other operating systems.
Unique objects Ea, Src, dictionary of enumerated tokens used by EuroAssembler language, text of €ASM messages, literal strings and some ad hoc local tables are allocated statically, in [.data] or [.bss] segments.
All other €ASM objects are allocated dynamically at run time,
either on machine stack, or in the memory provided on request from the operating system.
Recursively invokable procedures protect themselves from stack overflow with the macro EaStackCheck.
EuroAssembler does not use the system heap. It allocates dynamic memory in portions called pools, each implemented as a linked list of pool blocks with a typical size of 64 KB (or larger, if requested).
In MS Windows it is provided by the API functions VirtualAlloc(), VirtualFree().
Memory once allocated from the pool is not returned to the OS at the moment when the object is discarded; there is no garbage collection.
Instead, the pool memory is returned as a whole to the operating system when the pool's owner is destroyed.
There are four classes in €ASM which maintain their own pools: EA, SRC, PGM, PASS.
Object methods choose the appropriate pool depending on the lifetime of each stored object,
see also DOM.
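The pool idea described above (blocks of typically 64 KB, no release of individual objects, everything released at once) can be sketched in Python. This is a simplified model under stated assumptions: the class and method names are hypothetical, a Python list replaces the linked list of blocks, and bytearrays replace memory obtained from VirtualAlloc().

```python
class Pool:
    BLOCK_SIZE = 64 * 1024  # typical block size mentioned in the text

    def __init__(self):
        self.blocks = []  # stands in for the linked list of pool blocks
        self.used = 0     # bytes already handed out from the last block

    def new(self, size):
        """Reserve size bytes and return (block_index, offset)."""
        if not self.blocks or self.used + size > len(self.blocks[-1]):
            # Request another block, larger than usual if the caller needs it.
            self.blocks.append(bytearray(max(self.BLOCK_SIZE, size)))
            self.used = 0
        offset = self.used
        self.used += size
        return len(self.blocks) - 1, offset

    def store(self, data):
        """Copy data into the pool and return its (block_index, offset)."""
        index, offset = self.new(len(data))
        self.blocks[index][offset:offset + len(data)] = data
        return index, offset

    def destroy(self):
        """Release all blocks at once; single objects are never freed."""
        self.blocks.clear()
        self.used = 0


pool = Pool()
print(pool.store(b"symbol name"), pool.new(100_000))
pool.destroy()
```

Tying the lifetime of every allocation to the pool's owner is what makes a per-object free operation unnecessary in this model.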
Although the memory can be allocated from the pool directly (using macros PoolNew or PoolStore), the pool serves mainly as a container for more sophisticated access methods:
STACK keeps a table of objects of the same size. It has nothing in common with the CPU stack SS:ESP except its name and the LIFO (Last In = First Out) access method. STACK is used by €ASM to reflect the structure of nested block objects CTX, CHUNK_HEAD, EAOPT.
LIST keeps a bidirectionally linked list of objects of the same size, which are not kept together in memory. Listed objects can be accessed only sequentially, either forward (FIFO | LILO) or backward (FILO | LIFO). This method is used to store €ASM objects whose number cannot be reliably estimated at the beginning, such as symbols, %variables, macros, sections.
STREAM is a write-only memory class which stores unformatted data strings sequentially to a collection of memory blocks. The StreamStore access method is similar to FileWrite. When the stream is completed, it may be flushed to a disk file at once with the macro StreamDump. This method is used in €ASM when output files are formatted.
BUFFER is used to store data items (strings) of variable size. Unlike the stream or list method, all data in a buffer are stored contiguously. If the initial buffer size specified on BufferCreate was underestimated and is exhausted, the buffer silently allocates another block of memory with doubled size from its pool, and copies the whole previous contents to the new location. Thus the entire buffer contents returned by BufferRetrieve is always contiguous.
Buffers are extensively used by €ASM. Leaving their contents abandoned until the termination of the parent PASS or PGM would have a negative impact on total memory consumption. Therefore buffers can also be borrowed from the stack of preallocated buffers Ea.BufferStack by the invocation of EaBufferReserve and returned with EaBufferRelease, not wasting the once allocated memory.
When the requested dynamic memory size exceeds the usual values for which buffers are preallocated, the buffer or stack will automatically request an additional memory portion from its pool, and if there is no more free memory in the pool, its manager requests another block from the operating system. Thus dynamic memory management works transparently for the programmer and it is limited only by the amount of OS virtual memory, no matter how big the identifiers, expressions, nesting levels etc. in the assembled source may be.
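The growth strategy described for BUFFER (double the size, copy the old contents, keep everything in one piece) can be modelled in a few lines of Python. The class below is a hypothetical sketch, not a translation of BufferCreate or BufferRetrieve; a bytearray stands in for memory taken from the pool.

```python
class Buffer:
    def __init__(self, initial_size):
        self.capacity = max(1, initial_size)
        self.data = bytearray(self.capacity)
        self.size = 0  # number of bytes actually stored

    def store(self, item):
        """Append item, doubling the capacity until it fits."""
        while self.size + len(item) > self.capacity:
            self.capacity *= 2
        if self.capacity > len(self.data):
            # Allocate a bigger block and copy the previous contents over.
            bigger = bytearray(self.capacity)
            bigger[:self.size] = self.data[:self.size]
            self.data = bigger
        self.data[self.size:self.size + len(item)] = item
        self.size += len(item)

    def retrieve(self):
        """Return the whole contents as one contiguous string."""
        return bytes(self.data[:self.size])


buf = Buffer(4)
buf.store(b"abc")
buf.store(b"defgh")
print(buf.retrieve(), buf.capacity)
```

In this model the abandoned smaller block is simply dropped; in the scheme described above it would stay in the pool until the pool's owner is destroyed.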
The executable file euroasm.exe can be recompiled in the easource subdirectory with the command ..\euroasm euroasm.htm, assuming that the stable euroasm.exe version has not yet been moved from the EuroAssembler home directory to somewhere on the system %PATH%. The target euroasm.exe is created in the subdirectory easource.
EuroAssembler can also be built from a browser at the page generate.php.
On my Intel Pentium machine running at 3 GHz with 8 GB RAM the complete rebuild reports:
...
I0660 32bit FLAT PE file "euroasm.exe" created, size=3444378. "euroasm.htm"{373}
I0650 Program "euroasm" assembled in 2 passes with errorlevel 0. "euroasm.htm"{373}
I0750 Source "euroasm" (787201 lines) assembled in 4 passes with errorlevel 0.
I0860 Listing file "euroasm.htm.lst" created, size=18537503.
I0980 Memory allocation 148160 KB. 25666337 statements assembled in 791 s.
I0990 EuroAssembler terminated with errorlevel 0.
Hints and techniques for extending the EuroAssembler source are scattered throughout the source files:
Add a new EUROASM option
Add a new PROGRAM option
Add a new operator
Add a new machine instruction
Add a new output format
Porting EuroAssembler to other OS
If you want to modify EuroAssembler, I recommend following these steps:
1. Copy the latest downloaded stable euroasm.exe both to €ASM home and to its source subdirectory (easource\euroasm.exe).
2. Using testman.php or generate.php, make sure that all tests of easource\euroasm.exe pass without error.
3. Using generate.php#Build, make sure that all modules can be rebuilt without errors. Or change to easource\, delete old modules with del *.obj and then rebuild all with ..\euroasm.exe euroasm.htm. The build should terminate with errorlevel 0.
4. Modify the EuroAssembler sources (easource\*.htm) with your enhancements.
5. Rebuild the modified sources with the downloaded stable version from €ASM home (repeat step 3).
6. Perform all tests with the new easource\euroasm.exe (repeat step 2).
7. If all passed, you can copy the modified easource\euroasm.exe to %PATH% and use it on your computer.
See also Licence for information concerning the modification of EuroAssembler sources.
euroasm.exe
The €ASM assembler and linker is not optimised for speed; there are plenty of places where it could be made to run faster:
LOOP Target although it is usually slower than SUB ECX,1 ; JNZ Target.
My goal was an application which gets along with a 486 processor and any 32bit version of MS Windows. The order of optimisation criteria was:
- Maintainability and extensibility,
- readability and understandability,
- debugability and stability (proper treatment of errors),
- size of the code,
- speed of assembly,
- economical usage of memory.
Computer users should never trust executable files downloaded from Internet. It is a good practice to download the project in the form of source files and recompile it on your own PC with a compiler which you trust.
In the case of a self-compiled program it is complicated, because you don't have a trusted compiler yet. A suggested sandbox solution for paranoid EuroAssembler users follows:
euroasm.zip, compute its hash (md5 euroasm.zip) and compare the obtained value with the hash published on the distribution site. The hash of each €ASM release is also published on the discussion forum, on the Twitter account @EuroAsm, or, in a justified case, you can try to ask the author for confirmation.
euroasm.exe is trustworthy.
easource and rebuild the source with the downloaded executable. Be sure to provide the same forged timestamp which was used when the original source was released, e.g. ..\euroasm.exe euroasm.htm, timestamp=1512345678. Otherwise the compiled file couldn't be binary-identical with the downloaded version.
easource\euroasm.htm.lst with the code in COFF objects or in the PE itself.
fc ..\euroasm.exe euroasm.exe.
When both files are identical, the assumption made in step 3 is true and euroasm.exe was successfully audited.
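Where the md5 and fc command-line tools are not at hand, the two checks of the audit (hash of the downloaded archive, byte-for-byte comparison of the two executables) can be approximated with the Python standard library. This is a sketch with hypothetical function names; the file paths and the published hash in the usage comment are placeholders to be supplied by the user.

```python
import filecmp
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare the file's MD5 with the value published by the author."""
    return md5_of_file(path) == published_hex.strip().lower()


def binary_identical(path_a, path_b):
    """Byte-for-byte comparison, similar in purpose to the fc command."""
    return filecmp.cmp(path_a, path_b, shallow=False)


# Usage (placeholders, adjust to your own download and build locations):
# matches_published_hash("euroasm.zip", "<hash from the distribution site>")
# binary_identical(r"..\euroasm.exe", "euroasm.exe")
```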