Maybe you're interested in learning to program in assembler, but don't know where to start. You've heard that this language is used to write boot sectors, device drivers, compilers and operating systems, so you try to write something like that, and then you flood the technical forums with questions, admitting that you're new to assembler and that something doesn't work. Before embarking on such specialized tasks, it is necessary to get a basic orientation and practice in the use of tools such as an assembler, disassembler, debugger or analyser. This is best done by using them often, first with simple examples like "Hello world", a calculator or drawing ASCII graphics, and later on less trivial applications. You don't have to worry that typing in assembler is slower than in other languages; the time you spend typing on the keyboard is many times shorter than the time you spend thinking about the logic of the program, and this is true in all languages.
Suppose you have seen some assembly instructions somewhere that you'd like to try. Now the main problem of working with a new programming language arises: where to write these instructions and how to make your computer execute them. Here you will learn how.
This tutorial is intended for those interested in programming in assembly language for x86 and x86-64 (Intel, AMD) personal computers. We will write programs for MS Windows, Linux and DOS.
Let's assume you have at least basic knowledge of English and you can operate a computer, install and run programs from the command line, you can edit files with plain text editor (nano, joe, notepad, etc.), you know hexadecimal notation and can perform basic operations with hexadecimal numbers, for example you can calculate examples of the type
by heart, on paper or at least with the help of a programmer calculator.
From application programmer's point of view, a computer consists of
The smallest unit of information is one bit, whose physical realization in the CPU can be imagined
as a flip-flop circuit with two stable states that can be flipped to provide a voltage at the output,
to which we have assigned a value of either logic 1 or logic 0 (and nothing in between).
The flip-flop circuit remembers its state, which can be read or changed (written) on demand.
A concatenation of multiple flip-flop circuits so that their bits can be written/read at the same time
is called a register.
A register can be 8-bit, 16-bit, 32-bit, etc. up to 512-bit.
Writing the contents of a register using zeros and ones would be cluttered; instead, the contents
of the register are usually written using two hexadecimal digits for each of its 8-bit bytes.
The registers are located on the CPU chip. Unlike memory, registers are very fast, but their number and size are limited. We will be working mainly with General Purpose Registers (GPR). The registers are accessed by the programmer using their name (not the address), and for most GPRs, subsets of registers can be named in addition to the entire register. The lower half of the 64-bit RAX register is called EAX, the lower quarter is AX, and the lower eighth is AL. For a complete list of registers see the manual. In some places we use the names rAX, rBX, etc. The lower case letter r here indicates that rAX can represent RAX, EAX or AX registers, depending on the current processor mode.
Similarly to RAX, the other general registers RBX, RCX, RDX, RBP, RSI, RDI, RSP, R8..R15 can also be divided.
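The way the named parts of RAX overlap can be modelled in a few lines of Python. This is only a sketch of the bit layout, not of any real tool; the function name sub_registers and the sample value are made up for the illustration, and AH (bits 8..15) is included alongside the parts named above.

```python
def sub_registers(rax: int) -> dict:
    """Return the named parts of a 64-bit RAX value (model only)."""
    rax &= 0xFFFFFFFFFFFFFFFF            # a register holds exactly 64 bits
    return {
        "RAX": rax,
        "EAX": rax & 0xFFFFFFFF,         # lower half (32 bits)
        "AX": rax & 0xFFFF,              # lower quarter (16 bits)
        "AH": (rax >> 8) & 0xFF,         # bits 8..15, the upper byte of AX
        "AL": rax & 0xFF,                # lower eighth (8 bits)
    }

# Hypothetical register contents, chosen so that every byte is different.
parts = sub_registers(0x1122334455667788)
for name, value in parts.items():
    print(f"{name:3} = {value:X}h")
```

Each name is just a different window onto the same flip-flops, so writing AL also changes the value seen through AX, EAX and RAX.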
The general registers are specialized or fixed in some instructions, which partly corresponds to the mnemonic of their name.
Other registers, such as ST0..ST7 (math coprocessor), MM0..MM7 (multimedia), YMM0..YMM15 (SIMD registers) are orthogonal, i.e. not specialized and interchangeable with one another.
In addition to registers, there are flip-flop circuits on the CPU chip called flags that are set automatically after certain operations, especially arithmetic. We will be mainly interested in the following:
Other flags (Parity, Auxiliary, Trap, Interrupt) are not normally used in common applications. The flags can be viewed as separate flip-flop circuits with a memory of one bit. For the sole purpose of saving them on the stack, they are grouped into a virtual register which can be pushed and popped with the instructions PUSHF and POPF, so that all flags are saved and restored at once.
The registers can also include the rIP instruction pointer, which points to the address of the next instruction during the execution of any instruction, except for transfer instructions where it will be replaced by the destination address to jump to.
The Carry Flag is exceptional in that we can set it to 1 with STC, set it to 0 with CLC, or change its value to the opposite with CMC. Similarly, we can set and reset the Direction Flag using STD and CLD, and the Interrupt Flag using STI and CLI. Other flags cannot be explicitly changed in this way, but the Zero Flag can be set to 1 by zeroing any register with SUB reg,reg.
The CPU is connected to memory by a data bus and an address bus (sets of wires). Whenever the CPU needs to read or write something, it puts the address on the address bus and reads or writes the data on the data bus.
Reading from and writing to a device works in a similar way. Devices include a keyboard, monitor, mouse, network card, and other similar peripherals. Unlike with memory, the number used to select a device is not called an address but a port, e.g. a keyboard has a fixed port of 64h, a printer has a port of 378h, etc. For an overview of personal computer ports, see TechHelp.
In general, from the point of view of an assembler programmer, it can be said that
This manipulation can be an arithmetic or logical operation, changing bits, setting to some value, etc. The steps to be performed by the processor are determined by machine instructions. These have a variable length of 1 to 15 bytes and are stored in the operating memory, one after the other. The CPU fetches and executes them sequentially from memory.
Each instruction has a mnemonic abbreviation (specified by the CPU manufacturer) followed by operands that specify where the information is to be read from and written to. The job of a program called an assembler is to convert the mnemonic abbreviations and operands into the binary code of the machine instructions (usually displayed in hexadecimal) and store them in a file so that they are executable by the operating system.
A typical instruction has two operands, input and output, and in Intel syntax they are written in the order instruction output, input. For example, ADD EAX,ECX instructs the processor to add the contents of the input register ECX to the contents of the output register EAX. The contents of the ECX register remain unchanged. The contents of the general registers are treated as fixed-point integers. The example above works with the 32-bit wide EAX and ECX registers, but the same applies to adding registers with widths of 8, 16, 32, or 64 bits, for example ADD AH,CL, ADD AH,CH, ADD RAX,RCX.
Fractional registers that do not have a name, such as the upper half of EAX, the third eighth of RCX, etc., cannot be directly added in this way, but we could use rotation instructions to temporarily move the contents of the desired fractional register to the named part, perform the addition using ADD AL,CL, and then, if necessary, reverse the rotation to return the fractional register back to its position.
In addition to the Intel syntax, there is also a syntax developed by AT&T, in which the input and output are swapped. However, we will not cover that here, since almost all assemblers and processor manufacturers prefer the Intel syntax.
When viewing assembly tutorials, we can notice many inconsistencies in the data description:
The contents of the ECX register in the ADD EAX,ECX example above are written in hexadecimal as 56 78 9A BC and thus begin with the most significant byte 56. This is seemingly inconsistent with storing a 32-bit word in memory starting with the least significant byte, but we think of a word in a register differently than a word in memory. If we were to store the ECX register in memory at, say, address 0 (which would be done with MOV [0],ECX) and then display the memory contents with a debugger or similar tool, we would see the bytes in the reverse order: BC 9A 78 56.
This shows that it depends on whether we view the contents of memory space as a multi-byte word or as a series of bytes.
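The two views of the same 32-bit value can be reproduced in Python. This is a minimal sketch that only models the byte order used by x86 (least significant byte at the lowest address); the variable names are made up for the illustration.

```python
ecx = 0x56789ABC                       # the value as we write it for a register

# MOV [0],ECX stores the least significant byte at the lowest address.
memory = ecx.to_bytes(4, byteorder="little")

print(f"register view: {ecx:08X}h")
print("memory view:  ", memory.hex(" ").upper())   # BC 9A 78 56

# Reading the four bytes back as one 32-bit word restores the original value.
restored = int.from_bytes(memory, byteorder="little")
```

The memory dump lists bytes by increasing address, while the register notation lists them from the most significant digit, which is why the same value appears reversed.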
Personal computers nowadays run almost exclusively in protected mode. In this mode, the operating system protects itself primarily to prevent the user from inadvertently or deliberately disturbing the memory of the system or of other users who might be working on the computer at the same time. Access is denied if the user attempts to read or write memory that has not been allocated to the user. Similarly, access to input/output ports is restricted. Direct reads and writes with the IN and OUT instructions are privileged in protected mode and their use is reserved for the operating system.
In the prehistoric times of computing, personal computers (then only 16-bit) ran in real mode, where the user had the entire operating memory and all ports to themselves. On today's PCs, this mode is only available through an emulator such as DOSBox, which is somewhat inconvenient compared to the native Linux or Windows environment. Yet, oddly enough, 16-bit mode is preferred in assembler courses, perhaps out of the (mistaken) belief that 16-bit mode is easier for beginners than 32-bit or 64-bit mode.
If the computer is switched to 16-bit real mode, either as part of emulation or by booting the computer into DOS, we will only have the 16-bit registers AX, BX, CX, DX, BP, SP, SI, DI available, and we must consider the segment registers CS, DS, SS, ES when addressing memory. We can address memory either by writing the address directly, e.g. MOV AX,[1234h],
or by loading the address into a register first and then using that register for addressing, e.g. MOV BX,1234h followed by MOV AX,[BX].
In real mode, at most one base register (BX or BP) and at most one index register (SI or DI) can be used for addressing, e.g. MOV AX,[BX+SI+1234h]. When BP is used, SS is the default segment register (unless explicitly specified otherwise); otherwise the default data segment register DS is used.
The 16-bit offset, calculated as the sum of the contents of BX, the contents of SI and the direct value 1234h, is added to the contents of the 16-bit segment register multiplied by 16, and the resulting linear address is then used to access memory.
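The real-mode address calculation can be sketched in Python as follows. The function name real_mode_address and the register values are hypothetical, and the model assumes that the offset wraps around at 64 KB, as it does with 16-bit addressing registers.

```python
def real_mode_address(segment: int, *offset_parts: int) -> int:
    """Linear address = segment * 16 + 16-bit offset (model only)."""
    offset = sum(offset_parts) & 0xFFFF          # the offset is only 16 bits wide
    return ((segment & 0xFFFF) << 4) + offset    # shifting left by 4 multiplies by 16

# Assumed contents: DS=2000h, BX=0100h, SI=0020h, as in MOV AX,[BX+SI+1234h]
addr = real_mode_address(0x2000, 0x0100, 0x0020, 0x1234)
print(f"{addr:05X}h")                            # 21354h
```

Because the segment is multiplied by only 16, many different segment:offset pairs address the same byte of memory.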
In protected 32-bit mode, memory addressing is much simpler. We have the 32-bit registers EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI, and we can use any combination of up to two of these registers to address memory (only ESP cannot serve as the index register), for example MOV EAX,[ECX+EDX+12346789h]. If ESP or EBP is used as the base in addressing, SS becomes the default segment register instead of DS. However, this usually doesn't matter, because in protected mode the operating system normally sets up DS, SS and ES to cover the same address space, and we don't have to bother with segment registers at all. So 32-bit mode is clearly much easier for the programmer than 16-bit mode.
Similar addressing rules are valid in 64-bit mode; besides RAX, RBX, RCX, RDX, RBP, RSP, RSI, RDI we can also use the other general registers R8..R15. The segment registers CS, DS, SS, ES play no role in address calculation in 64-bit mode.
Data type determines how to view a data item – whether to treat it as an integer value, a floating-point number, a text string, a bitmap, or some other structure. See the manual. Unlike in higher programming languages, data types are not controlled by the assembler. The basic data type is determined by the width of the data item (e.g. 8, 16, 32, 64 bits), but nothing prevents us from interpreting, for example, a 64-bit floating-point number as a binary number or as a string of characters (and getting the wrong result, of course). We can display the contents of an item (memory location) as text, convert it to a number in the format expected by humans, play it as a sound, display it as a photo, whatever.
What can we use as output and input in the format mnemonic output, input? There are four possibilities.
Instructions are called machine instructions, because they are built into the machine (processor). In assembler, we can still see pseudoinstructions that look similar, but they are directives for the assembler, not for the processor.
The latest processors of the x86-64 architecture can have over two thousand different machine instructions. The good news is that we only need a few dozen for normal programming, which we will now describe.
The best place to look for authoritative descriptions of each instruction is on the CPU manufacturer's web site, such as Intel. However, these are usually large PDFs that are difficult to work with and reference, so you'll probably prefer to look at their form converted to web-based HTML format.
Probably the most versatile instruction is MOV, short for MOVE. This is a rather unfortunate choice of name; COPY would be better, as the instruction copies data from one place to another rather than moving it. The information in the input register or memory location is preserved. In fact, there is no way to remove information from a register or memory; there is always some information left, at least all zeros.
In addition to copying between 8,16,32,64-bit registers and memory locations, the same mnemonic MOV is also used to move between general (GPR) and other registers such as segment, control, debug, MMX or SIMD.
We use MOV from memory to register to load the contents of a memory location (byte, word, doubleword) into a register, which is expressed by enclosing the memory location in square brackets: MOV ECX,[1234h], MOV ECX,[EBX], etc. Omitting the square brackets would load the address of the memory location instead of the contents stored in memory. Thus, MOV ECX,1234h would fill the ECX register with the number 1234h, and MOV ECX,EBX would just copy the contents of the EBX register into the ECX register. Not realizing the difference between the address (offset) of a memory location and its contents is a common source of error.
Another very commonly used instruction is LEA. While MOV register,[memory] fetches the contents of memory and MOV register,memory fetches the address of memory, LEA register,[memory] fetches the address even though the second operand is always in square brackets. The LEA instruction is also one byte longer than the MOV, so why use it? For instance, when we are interested in calculating the address and don't want to know its contents (which may not even exist in memory). In 32-bit and 64-bit mode, we can use memory addressing with two general registers (one is the base register and the other is the index register), and the index register can be scaled, i.e. internally multiplied by two, four or eight. This allows us to use address arithmetic for simple calculations:
LEA EDX,[ESI+ECX] fills EDX with the sum of the contents of the ESI and ECX registers.
LEA EAX,[8*EAX] fills EAX with eight times the original contents of EAX.
LEA EBX,[EDX+4*EDX] fills EBX with five times the contents of EDX.
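The address arithmetic performed by LEA can be modelled as base + index*scale + displacement, truncated to the register width. The following Python sketch assumes 32-bit registers; the function name lea32 and the sample register contents are made up for the illustration.

```python
MASK32 = 0xFFFFFFFF

def lea32(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Model of LEA reg32,[base + index*scale + disp]; no memory is read."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("the index register can only be scaled by 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK32   # the result wraps at 32 bits

esi, ecx, eax, edx = 1000, 24, 6, 7     # hypothetical register contents
sum_result = lea32(base=esi, index=ecx)            # LEA EDX,[ESI+ECX]
times8 = lea32(index=eax, scale=8)                 # LEA EAX,[8*EAX]
times5 = lea32(base=edx, index=edx, scale=4)       # LEA EBX,[EDX+4*EDX]
print(sum_result, times8, times5)
```

Multiplying by five works because the same register is used once as the base and once as the index scaled by four.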
Another area of application of LEA is address fetching in 64-bit mode, where, unlike MOV, it can use RIP-relative addressing instead of absolute addressing, allowing memory to be addressed within plus or minus 2 GB of the LEA instruction.
Data copy instructions also include XCHG, which is the mutual exchange of information between two registers or between a register and a memory location.
It is good to remember that MOV, LEA, XCHG leave the flags unchanged.
With a single MOV instruction, we can load 1 to 8 bytes from memory into a register, or up to 64 bytes using VMOV*. The processor groups memory accesses, so when we load or write only one byte, e.g. with MOV AL,[RBX] or MOV [RBX],AL, the CPU actually transfers, say, 16 contiguous bytes at a time, then selects the one desired byte and leaves the other bytes in their original state. The smallest granularity of a memory access is one byte.
What if we only need to change part of a byte, say just one bit? Then the programmer has to take care of that: read the whole byte, change only the desired bit (and leave the others as they are), and then write the byte back to memory. To manipulate individual bits, we can use the specialized instructions BTS (set bit to 1), BTR (reset bit to 0) or BTC (change the bit to opposite). Or we can use bitwise logic operations OR to set bits, AND to reset bits, or XOR to flip bits to the opposite state.
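The read, modify, write sequence with logical operations can be sketched in Python. The helper names below are hypothetical; they model what OR, AND and XOR with a one-bit mask do to a single byte, with bit numbers 0..7 assumed.

```python
def set_bit(byte: int, n: int) -> int:
    """OR with a mask sets bit n to 1 and leaves the others as they are."""
    return byte | (1 << n)

def reset_bit(byte: int, n: int) -> int:
    """AND with an inverted mask resets bit n to 0."""
    return byte & ~(1 << n) & 0xFF

def flip_bit(byte: int, n: int) -> int:
    """XOR with a mask changes bit n to the opposite state."""
    return byte ^ (1 << n)

value = 0b01010000            # the byte as read from memory (assumed contents)
value = set_bit(value, 0)     # 01010001
value = reset_bit(value, 4)   # 01000001
value = flip_bit(value, 7)    # 11000001
print(f"{value:08b}")         # this byte would then be written back to memory
```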
Arithmetic instructions compute with integers in binary. The contents of an eight-bit register can range from 00h to FFh, or in decimal notation from 0 to 255. We call such an interpretation unsigned numbers. In addition to 8-bit registers, we can also store integers in 16, 32, or 64-bit registers. If we add, say, BL=9Ah=154 to the 8-bit register AL containing 78h=120 with the instruction ADD AL,BL, the result is 112h=274. This exceeds the capacity of the 8-bit register (FFh), so only the lower eight bits, 12h, are stored in the AL register, and the processor sets the Carry Flag to 1 to indicate the overflow. This 1 carried out of AL does not go to the higher register AH, as it would in a 16-bit addition with ADD AX,BX. However, it can be included in the next addition if that is done with ADC (Add with Carry) instead of the plain ADD. With ADC an extra 1 is added to the result if the Carry Flag has been set: ADC AH,0. In practice, this is used to add and subtract numbers longer than the GPR capacity.
The following example illustrates the addition of two large numbers in 16-bit registers when 32-bit registers are not available, as was the case in DOS. For example, let us add two 32-bit numbers 89ABCDEFh and 55556666h in 16-bit mode. We will split the numbers into the register pairs DX:AX and BX:CX. The colon between the register names here represents the concatenation of two 16-bit registers into one virtual 32-bit register.
Thus, the result in the DX:AX register pair is DF013455h.
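The same calculation can be checked with a small Python model of ADD and ADC. It is only a sketch: the function name add_with_carry is made up, and the Carry Flag is represented by an ordinary integer 0 or 1.

```python
def add_with_carry(a: int, b: int, carry_in: int = 0, bits: int = 16):
    """Model of ADD (carry_in=0) or ADC: returns (result, Carry Flag)."""
    mask = (1 << bits) - 1
    total = (a & mask) + (b & mask) + carry_in
    return total & mask, total >> bits     # the bit that did not fit becomes CF

# 89ABCDEFh in DX:AX and 55556666h in BX:CX
dx, ax = 0x89AB, 0xCDEF
bx, cx = 0x5555, 0x6666

ax, cf = add_with_carry(ax, cx)            # ADD AX,CX  adds the lower halves
dx, cf = add_with_carry(dx, bx, cf)        # ADC DX,BX  adds the upper halves plus CF
print(f"{dx:04X}{ax:04X}h")                # DF013455h
```

Called with bits=8, the same function reproduces the earlier 8-bit case: 78h plus 9Ah leaves 12h in the register and sets the carry.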
In addition to ADD and ADC, other arithmetic operations include SUB and SBB (Subtract with Borrow). SBB differs from simple subtraction (SUB) in that it subtracts an extra 1 when the Carry Flag is set.
CMP is an instruction similar to SUB, but it does not store the difference (register contents are not changed); it only sets the flags according to the result of the hypothetical subtraction.
The logical OR, AND and XOR instructions perform their logical operation on operands of width 8, 16, 32 or 64 bits bit by bit, i.e. bit 0 of the output operand is combined with bit 0 of the input operand, bit 1 with bit 1, bit 2 with bit 2, etc.
The integers in the 8-bit register 0..255 were considered unsigned. But this is not the only possible interpretation; we can reserve the most significant bit for a sign and thus treat the number as signed. Then the values 01h..7Fh will correspond to the positive numbers 1..127 and the values FFh..80h to the negative ones -1..-128. Zero remains zero. So the numeric range has changed to -128..+127 for the 8-bit register, and of course it will be much larger for wider registers. The beauty of binary arithmetic is that signed and unsigned numbers add and subtract in the same way, using the same ADD and SUB instructions. It doesn't matter to these instructions whether we've presented them with signed or unsigned numbers; we can interpret the result of the arithmetic operation either way.
If we are operating on signed numbers, an overflow (going out of the allowed numeric range) is indicated by the Overflow Flag instead of the Carry Flag.
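The two interpretations of the same addition can be illustrated with a Python model of an 8-bit ADD. This is a sketch only; the names as_signed8 and add8_flags are hypothetical, and the flags are returned as booleans.

```python
def as_signed8(value: int) -> int:
    """Interpret an 8-bit pattern as a signed number in -128..+127."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value

def add8_flags(a: int, b: int):
    """Model of ADD on 8-bit operands: returns (result, CF, OF)."""
    raw = (a & 0xFF) + (b & 0xFF)
    result = raw & 0xFF
    carry = raw > 0xFF                                   # unsigned range exceeded
    signed_sum = as_signed8(a) + as_signed8(b)
    overflow = not (-128 <= signed_sum <= 127)           # signed range exceeded
    return result, carry, overflow

print(add8_flags(0x7F, 0x01))   # 127+1: fine as unsigned, overflows as signed
print(add8_flags(0xFF, 0x01))   # 255+1 carries as unsigned, -1+1 is fine as signed
```

The stored bit pattern is the same in both readings; only the flag we consult afterwards depends on the interpretation we have chosen.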
The NEG instruction converts a positive binary number to its negative value and vice versa. It does this by changing all bits to the opposite and adding one to the result. In an 8-bit register, the NEG AL instruction changes the value of AL from 02h to FEh, from 01h to FFh, from 00h to 00h, from FFh to 01h, etc.
A similar instruction is NOT, which differs from NEG in that it does not add any one to the inverted bits, so it is more suited to logical operations.
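The relation between the two instructions can be written down in Python. This is a minimal model of 8-bit operands; the function names not8 and neg8 are made up for the illustration.

```python
def not8(value: int) -> int:
    """Model of NOT on an 8-bit operand: every bit is inverted."""
    return ~value & 0xFF

def neg8(value: int) -> int:
    """Model of NEG on an 8-bit operand: invert all bits, then add one."""
    return (not8(value) + 1) & 0xFF

for al in (0x02, 0x01, 0x00, 0xFF):
    print(f"AL={al:02X}h  NEG -> {neg8(al):02X}h  NOT -> {not8(al):02X}h")
```

In this model 80h (-128) is the one value that NEG maps to itself, because +128 does not fit into the signed 8-bit range.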
Useful arithmetic operations are INC and DEC, which increment and decrement the contents of a register or memory location by one. With these two instructions, we must remember that they change arithmetic flags except for CF.
The arithmetic instructions include multiplication and division. However, they do not treat positive and negative numbers the same. If we want to multiply or divide signed binary numbers, we must either convert them to positive (using NEG) and then eventually convert the calculated value back to negative, or use signed multiplication and division IMUL and IDIV instead of their unsigned variants MUL and DIV. For multiplication and division, it is not true that we can use any register. The result of multiplying two 64-bit numbers may require up to 128 bits, so the fixed register pair rDX:rAX is used to store the product. For 32-bit multiplication, the result is stored in the pair EDX:EAX, and for 16-bit multiplication in DX:AX. Only 8-bit multiplication is an exception: the result of multiplying AL by the input 8-bit value goes into the AX register (DX remains unchanged). Overflow can no longer occur in principle, but the Carry Flag and Overflow Flag both set to one indicate that the result is large and has spilled into the upper register of the output pair (DX, EDX or RDX).
For integer division, the reverse procedure is used: the dividend is placed in the rDX:rAX register pair; the divisor can be another register or a memory location of the appropriate width. The quotient is returned in rAX and the remainder in rDX. However, overflow may occur here in unsigned division if the divisor is not greater than the number in the upper half of the input register pair (DX, EDX, RDX). The quotient would then not fit in the lower half (AX, EAX, RAX) and would therefore not be defined. The x86 architecture doesn't know what number it should store in the output register in this case and thus raises the same program exception (interrupt) as for division by zero, which can cause our program to crash. Therefore, before division, the upper half of the input register pair must be zeroed (in the case of DIV) or filled with copies of the sign bit of the lower half (in the case of IDIV), which means all ones when the dividend is negative. This is best done by zeroing rDX using SUB rDX,rDX for unsigned division, and by using the short instruction CWD, CDQ or CQO before signed division.
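The conditions under which unsigned division fails can be explored with a Python model of the 32-bit DIV. It is a sketch under stated assumptions: the function names div32 and cdq are hypothetical, and the processor's divide exception is represented by ordinary Python exceptions.

```python
def div32(edx: int, eax: int, divisor: int):
    """Model of unsigned DIV r/m32: EDX:EAX / divisor -> (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divide error: divisor is zero")
    dividend = (edx << 32) | eax               # the register pair EDX:EAX
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFFFFFF:
        raise OverflowError("divide error: quotient does not fit in EAX")
    return quotient, remainder                 # would end up in EAX and EDX

def cdq(eax: int) -> int:
    """Model of CDQ: the value of EDX after sign-extending EAX into EDX:EAX."""
    return 0xFFFFFFFF if eax & 0x80000000 else 0

print(div32(0, 100, 7))          # EDX zeroed beforehand: quotient 14, remainder 2
print(hex(cdq(0xFFFFFFFE)))      # EAX=-2 gives EDX of all ones
```

Calling div32(5, 0, 5) raises OverflowError in this model, because the upper half of the dividend is not smaller than the divisor.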
A stack is a contiguous area reserved from the total amount of operational memory and declared as a stack. The general register rSP (stack pointer) is used for addressing in the stack. The stack is most often used to temporarily store and then restore the contents of the general registers with the PUSH and POP instructions. When a program is loaded into memory, the operating system makes sure to reserve enough memory for the stack and sets its ESP or RSP pointer to its beginning, which is not the lowest address, but rather the highest. The addresses gradually decrease when stored on the stack using PUSH and, conversely, increase when removed from the stack using POP.
The subject of a PUSH can be general registers of 16, 32 or 64 bits wide, or memory variables of the same width, as well as segment registers, and a direct numeric value which will be sign-extended by the processor to the width of the operand. The processor first decrements the stack pointer rSP by 2, 4 or 8 bytes and stores the operand in the resulting space. The register rSP thus addresses the currently stored item.
The POP operation works in reverse: it moves the contents of the 2, 4, or 8 bytes addressed by the rSP register to the operand and then increments the rSP by 2, 4, or 8 bytes.
The use of the rSP register to address the stack is implicit; only the input or output operand is specified in the PUSH and POP instructions. Some assemblers allow more than one operand to be written to a single PUSH/POP instruction, but this is implemented internally as a series of separate PUSH or POP instructions. Writing multiple operands is mainly used to save source program lines.
Saving and restoring from the stack is done in principle by the LIFO method, i.e. Last In, First Out. The register saved to the stack last by PUSH is then restored first by the subsequent POP instruction; thus, we must restore them in the reverse order of saving. Example:
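The LIFO behaviour and the movement of the stack pointer can be modelled in Python. The class below is only an illustration: the name Stack, the initial address 1000h and the pushed values are assumptions, and memory is represented by a dictionary keyed by address.

```python
class Stack:
    """Minimal model of a stack that grows towards lower addresses."""

    def __init__(self, top: int = 0x1000, width: int = 4):
        self.memory = {}       # address -> stored value
        self.sp = top          # model of ESP, starts at the highest address
        self.width = width     # 4 bytes per item in 32-bit mode

    def push(self, value: int) -> None:
        self.sp -= self.width            # first decrement the stack pointer
        self.memory[self.sp] = value     # then store the operand there

    def pop(self) -> int:
        value = self.memory[self.sp]     # read the item addressed by the pointer
        self.sp += self.width            # then increment the stack pointer
        return value

stack = Stack()
eax, ebx = 0xAAAA, 0xBBBB      # hypothetical register contents
stack.push(eax)                # PUSH EAX
stack.push(ebx)                # PUSH EBX
ebx_restored = stack.pop()     # POP EBX  (saved last, restored first)
eax_restored = stack.pop()     # POP EAX
print(hex(eax_restored), hex(ebx_restored), hex(stack.sp))
```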
In 16- and 32-bit mode, instead of saving and then restoring several registers one by one, we can use the PUSHA and POPA instructions, which save and restore all 8 GPRs at once in the order eAX, eCX, eDX, eBX, eSP, eBP, eSI, eDI. While saving all eight registers is often unnecessary and slower, it saves code size because both PUSHA and POPA are encoded in a single byte. In 64-bit mode, PUSHA is not available and we have to store the registers individually.
If you remember from your programming language lessons about the prohibition of jumps and the harmfulness of the GOTO statement, you can forget about it in assembler. All program constructs such as IF, ELSE, WHILE, SWITCH, REPEAT UNTIL, etc. are implemented here using conditional jumps, where a condition is first evaluated, e.g. by the CMP or TEST instructions, and then a jump is made (or not made) to another place in the code using the Jcc instruction. This jump has a number of variants differing by the condition cc in the instruction's mnemonic name. For instance, JA (Jump if Above) first examines whether CF=0 and ZF=0 are simultaneously true and jumps to the target address (label) only if both conditions are met; otherwise the jump is not taken and execution continues with the instruction below it.
The terms Above and Below are used when we have compared unsigned numbers, such as two addresses, using CMP. The terms Greater and Less, on the other hand, are used after comparing signed integers.
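The difference between Above and Greater can be tried out with a Python model of an 8-bit CMP followed by JA or JG. This is a sketch only: the function names are made up, the flags are returned as a tuple of booleans, and the JG condition (ZF=0 and SF=OF) is taken from the processor manuals rather than from the text above.

```python
def as_signed8(value: int) -> int:
    """Interpret an 8-bit pattern as a signed number in -128..+127."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value

def cmp8(a: int, b: int):
    """Model of CMP a,b on 8-bit operands: returns the flags (CF, ZF, SF, OF)."""
    a &= 0xFF
    b &= 0xFF
    result = (a - b) & 0xFF                      # the difference is not stored
    cf = a < b                                   # borrow in the unsigned view
    zf = result == 0
    sf = bool(result & 0x80)                     # copy of the most significant bit
    of = not (-128 <= as_signed8(a) - as_signed8(b) <= 127)
    return cf, zf, sf, of

def ja(flags) -> bool:
    """Jump if Above (unsigned): CF=0 and ZF=0."""
    cf, zf, sf, of = flags
    return not cf and not zf

def jg(flags) -> bool:
    """Jump if Greater (signed): ZF=0 and SF=OF."""
    cf, zf, sf, of = flags
    return not zf and sf == of

flags = cmp8(0xFF, 0x01)       # 255 versus 1 unsigned, -1 versus +1 signed
print("JA taken:", ja(flags), " JG taken:", jg(flags))
```

The same comparison therefore leads to different jumps depending on whether we consider FFh to be 255 or -1.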
We don't need to mind the differences between short and near jumps, it is the assembler's concern to use the correct one.
The conditional jumps include the LOOP instruction, which first decrements the contents of the rCX register by 1, and if rCX is then non-zero, jumps to the label specified in the instruction operand; otherwise execution continues with the instruction below LOOP. If rCX was already zero before the LOOP instruction, it is first decremented to CX=65535 or even ECX=4294967295, which is non-zero, and therefore the loop repeats that many more times. This is probably not what we wanted, so the rCX register is tested with JCXZ or JECXZ before entering the loop, and the loop is skipped if rCX is zero:
In addition to conditional jumps, we can also jump to another location in the program unconditionally, i. e., each time a JMP appears in the instruction stream. This instruction replaces the rIP register (which normally contains the address of the next instruction) with the address being jumped to.
Related to unconditional jumps is the pair of CALL and RET instructions. Like JMP, the CALL instruction replaces the rIP with the jump address, but in addition, it stores the contents of the rIP on the stack beforehand, much as if we performed a hypothetical (non-existent) PUSH rIP operation. The RET instruction performs a hypothetical POP rIP operation, which is equivalent to jumping to the return address that was stored on the stack by the CALL instruction.
The CALL and RET instructions allow you to divide the flow of instructions into shorter subroutines (procedures) and so structure the program in a clean way. We can treat each subroutine or program macro as a black box, document its input, output, and function, and then forget about the details of its implementation.
The INT instruction, which causes a software interrupt, is similar to the CALL machine instruction. Its operand is a number 0..255, which in real mode specifies the sequence number of the double word in the interrupt vector table (IVT) indicating the address of the routine handling the interrupt. For example, on INT 21h the CPU looks at the address 21h*4 = 84h and loads two 16-bit words from that address into the IP and CS registers. At the address loaded this way there should be a subroutine performing the function expected from INT 21h; it is then terminated by IRET. The difference between the CALL/RET and INT/IRET instructions is that INT additionally stores the flags on the stack before storing the return address, and IRET then restores them.
The following eight shift and rotation instructions allow the contents of an 8, 16, 32, or 64-bit register or memory location to be manipulated. The number of shifts is specified in the second operand as an immediate number or as the contents of the fixed register CL. The contents are shifted or rotated by this number of bits either to the left, i.e. from the least significant bit (LSb) towards the most significant bit (MSb), or to the right, i.e. from MSb towards LSb.
For rotate via carry (RCL, RCR) instructions, the Carry Flag bit is added to the register bits as the ninth (or 17th, 33rd, 65th) bit.
The arithmetic instructions SAL, SAR are used to quickly multiply and divide signed numbers by powers of two, e.g. SAL AX,4 is equivalent to multiplying the contents of AX by sixteen (2^4), SAR EBX,3 divides the contents of EBX by eight, etc. To preserve the sign, the arithmetic right shift (SAR) refills the most significant bit (the sign bit) with its original value at each step.
Logical shifts (SHL, SHR) are useful for logical operations. The SHL and SAL instructions behave identically. Two examples:
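The difference between logical and arithmetic right shifts, and between a shift and a rotation, can be seen in a Python model of 8-bit operands. The function names are hypothetical and the sketch assumes shift counts from 0 to 7 only; it does not model the flags.

```python
def shl8(value: int, count: int) -> int:
    """Model of SHL/SAL: bits move left, zeros enter from the right."""
    return (value << count) & 0xFF

def shr8(value: int, count: int) -> int:
    """Model of SHR: bits move right, zeros enter from the left."""
    return (value & 0xFF) >> count

def sar8(value: int, count: int) -> int:
    """Model of SAR: bits move right, the sign bit is copied from the left."""
    value &= 0xFF
    signed = value - 0x100 if value & 0x80 else value
    return (signed >> count) & 0xFF      # Python's >> keeps the sign of negative ints

def rol8(value: int, count: int) -> int:
    """Model of ROL: bits leaving on the left re-enter on the right."""
    value &= 0xFF
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF

print(f"SHR F0h,2 = {shr8(0xF0, 2):02X}h")   # 3Ch, i.e. 240 // 4 = 60
print(f"SAR F0h,2 = {sar8(0xF0, 2):02X}h")   # FCh, i.e. -16 // 4 = -4
print(f"ROL 81h,1 = {rol8(0x81, 1):02X}h")   # 03h
```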
As we know, instructions can move data between registers and between registers and memory, but not from one memory location to another. This is not quite true: the MOVS instruction does this, at the cost of bypassing the standard way of encoding the address (ModRM+SIB). Instead, it expects the input address to be stored in register rSI and the output address in rDI. The amount of data transferred depends on the suffix of the mnemonic, i.e. MOVSB, MOVSW, MOVSD, MOVSQ transfer a single byte, a 16-bit word (WORD), a 32-bit word (DWORD) or a 64-bit word (QWORD). A single instruction can transfer more than one element if the repeat prefix REP is used before it. REP specifies that the transfer of a single element is repeated as many times as the contents of rCX say; after each transfer, rCX is decremented by 1 and the addresses in the rSI and rDI registers are changed by the size of the element being transferred, i.e. by 1, 2, 4, or 8 bytes. Whether the rSI and rDI addresses are incremented or decremented depends on the Direction Flag.
Two examples of using MOVS in 32-bit mode where ESI, EDI are used as index registers rSI, rDI:
The REP MOVSB instruction in the previous example first transferred ECX=3 bytes from the address at ESI=02h to the address at EDI=07h. In the continuation of the example we changed the direction of transfer to the left using STD, changed the width of the element from one byte (MOVSB) to one 16-bit word (MOVSW), and prescribed the transfer of only ECX=2 of these elements from the address ESI=05h to the address EDI=0Ah. The contents of the two addressing registers rSI, rDI change after the transfer of each element. If the REP prefix is not specified before the MOVS instruction, just one element is transferred (and both registers are incremented or decremented by the size of the element); otherwise the contents of rCX are examined, and as long as they are non-zero, the transfer of one element, including the change of rSI and rDI, is repeated and rCX is decremented by one. The contents of rCX in REP MOVS therefore determine the number of elements transferred. If rCX=0, REP MOVS is not executed even once and the contents of registers rCX, rSI, rDI are not changed.
A more interesting situation occurs when the transferred fields partially overlap. In the following example, the source string (address ESI=02h, length ECX=5) overlaps with the destination string (address EDI=07h, length ECX=5). In each of the five copy steps, one element (in this case a byte) is first read and transferred, even if this input element was transferred a moment ago. If ESI<EDI and DF=0, or if ESI>EDI and DF=1, the string is not copied, but only the portion between the starting addresses of the two registers is propagated along the length of the output string. In order for the source string to be moved forward or backward, ESI<EDI and DF=1, or ESI>EDI and DF=0, would have to be true.
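The effect of overlapping can be observed in a Python model of REP MOVSB. This is a sketch only: the function name rep_movsb is made up, memory is a bytearray, and the addresses (source at 2, destination at 4, length 5) are chosen for this illustration so that the two areas really overlap.

```python
def rep_movsb(memory: bytearray, esi: int, edi: int, ecx: int, df: int = 0):
    """Model of REP MOVSB: returns the final (ESI, EDI, ECX)."""
    step = -1 if df else 1               # the Direction Flag chooses the direction
    while ecx != 0:                      # nothing happens when ECX is zero
        memory[edi] = memory[esi]        # transfer one byte
        esi += step
        edi += step
        ecx -= 1
    return esi, edi, ecx

# ESI<EDI and DF=0: the two bytes between the start addresses are propagated.
forward = bytearray(b"0123456789AB")
rep_movsb(forward, esi=2, edi=4, ecx=5, df=0)
print(forward.decode())                  # 0123232329AB

# ESI<EDI and DF=1: start at the last elements and the string is moved intact.
backward = bytearray(b"0123456789AB")
rep_movsb(backward, esi=2 + 4, edi=4 + 4, ecx=5, df=1)
print(backward.decode())                 # 0123234569AB
```

With DF=1 the registers have to point at the last element of each string before the transfer starts, which is why 4 is added to both start addresses in the second call.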
Another useful string instruction is STOS, which stores the contents of the AL, AX, EAX or RAX register into memory at the address stored in the rDI register, and when stored, increments or decrements the contents of rDI by the size of the element, depending on the Direction Flag. Also, the REP prefix is often used before this instruction, allowing large sections of RAM to be reset.
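As a minimal sketch, clearing a block of memory with REP STOSD could look like this. The label Buffer and its size of 64 bytes are assumptions for the sake of the example.

```asm
; Assumption: Buffer is a hypothetical label of 64 bytes of writable memory.
        CLD                ; DF=0: EDI grows after each store.
        LEA EDI,[Buffer]   ; Destination address.
        SUB EAX,EAX        ; EAX=0 is the value to be stored.
        MOV ECX,16         ; 16 DWORDs = 64 bytes.
        REP STOSD          ; Store EAX 16 times. Finally EDI=Buffer+64, ECX=0.
```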
The opposite of STOS is LODS, which loads an element from the memory address given by the rSI register into AL, AX, EAX or RAX, and increments or decrements rSI after the load. It makes no sense to use the REP prefix with this instruction, since each loaded value would overwrite the contents of rAX before we could do any processing with it. LODSB is used in conjunction with STOSB when we want to copy a string byte by byte while responding to the copied characters. Example of a simple conversion of a zero-terminated string from lowercase a..z to uppercase A..Z:
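One possible shape of such a conversion loop is sketched below. It assumes that ESI already points to the zero-terminated source string and EDI to a sufficiently large destination buffer; the labels Next and Store are chosen freely.

```asm
; Assumption: ESI -> zero-terminated source string, EDI -> destination buffer.
        CLD                ; Process the string towards higher addresses.
Next:   LODSB              ; AL = next character, ESI is incremented.
        CMP AL,'a'
        JB Store           ; Below 'a': not a lowercase letter.
        CMP AL,'z'
        JA Store           ; Above 'z': not a lowercase letter.
        SUB AL,20h         ; 'a'..'z' -> 'A'..'Z' (the codes differ by 20h).
Store:  STOSB              ; Write the character, EDI is incremented.
        CMP AL,0           ; Was it the terminating zero?
        JNE Next           ; If not, continue with the next character.
```

The terminating zero is copied as well, so the output string is again zero-terminated.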
The SCAS instruction is used to find the position of a value in the AL, AX, EAX, or RAX register in the input string addressed by rDI. The compare sets the flags and then increments or decrements rDI by the width of the register being compared (1, 2, 4, or 8) depending on the Direction Flag. The instruction is used with the REPNE (repeat if not equal) prefix, with the number of repeats determined by the contents of rCX. Execution of the instruction is terminated either by finding a character in the string (setting the Zero Flag as a match sign), or by exhausting the length of the string in the rCX register. If the operation terminated due to exhausting of rCX, we can still use the status of the ZF to determine if the value was found in the last element of the input string.
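A minimal sketch of such a search in 32-bit mode. The names Text, TextSize and NotFound are assumptions of this example. JECXZ guards against an empty string, because REPNE SCASB with ECX=0 does not execute at all and leaves the flags in their previous state.

```asm
; Assumption: Text is a hypothetical string label and TextSize its length in bytes.
        CLD                  ; Scan towards higher addresses.
        LEA EDI,[Text]       ; Start of the scanned string.
        MOV ECX,TextSize     ; Maximal number of compared bytes.
        JECXZ NotFound       ; Empty string: nothing to scan.
        MOV AL,'w'           ; The byte we are looking for.
        REPNE SCASB          ; Compare AL with [EDI], repeat while not equal.
        JNE NotFound         ; ZF=0: the string was exhausted without a match.
        DEC EDI              ; ZF=1: EDI overshot by one, step back to the match.
        ; Here EDI points to the byte found.
NotFound:
```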
Somewhat less meaningful is scanning repeatedly with the REPE prefix (repeat while equal). We would use this if the EDI register is pointing to a string of all null characters, for example, and we want to find the position of the first non-zero character:
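Such a scan might be sketched like this; Buffer, BufferSize and AllZero are hypothetical names used only here.

```asm
; Assumption: Buffer is a hypothetical label and BufferSize its length in bytes.
        CLD                  ; Scan towards higher addresses.
        LEA EDI,[Buffer]
        MOV ECX,BufferSize
        JECXZ AllZero        ; Empty buffer: nothing to scan.
        SUB EAX,EAX          ; AL=0 is the value to be skipped.
        REPE SCASB           ; Compare AL with [EDI], repeat while equal.
        JE AllZero           ; ZF=1: every byte of the buffer was zero.
        DEC EDI              ; EDI overshot by one, step back.
        ; Here EDI points to the first non-zero byte.
AllZero:
```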
The last useful string instruction is CMPS, which compares an element in memory (BYTE, WORD, DWORD, or QWORD) addressed by rSI with an element of the same width addressed by rDI. It sets arithmetic flags and then increments or decrements rSI and rDI by the size of the element. This instruction can be repeated using the REPE prefix, too. The following example shows a search for a word or expression (Needle) within a longer character string (Haystack). The algorithm first searches for the first character of Needle using SCASB and then it compares the subsequent characters of the search word using CMPSB. If they are not there, it continues to search for the first character and repeats the process.
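The algorithm just described can be sketched as follows. This is only one possible arrangement, not necessarily the author's listing; it assumes 32-bit mode, hypothetical labels Haystack and Needle with lengths HaystackSize and NeedleSize, and 1 <= NeedleSize <= HaystackSize.

```asm
; Assumption: Haystack, HaystackSize, Needle, NeedleSize are defined elsewhere.
        CLD
        LEA EDI,[Haystack]
        MOV ECX,HaystackSize-NeedleSize+1 ; Number of possible start positions.
        MOV AL,[Needle]        ; The first character of Needle.
Scan:   JECXZ NotFound         ; No start position left.
        REPNE SCASB            ; Look for the first character.
        JNE NotFound           ; It does not occur any more.
        PUSH ECX               ; Keep the state of the outer search.
        PUSH EDI               ; EDI points behind the first character found.
        LEA ESI,[Needle+1]     ; Compare the rest of Needle
        MOV ECX,NeedleSize-1   ;  with the following characters of Haystack.
        REPE CMPSB             ; If ECX=0, ZF=1 is left over from SCASB.
        POP EDI                ; POP does not change the flags.
        POP ECX
        JNE Scan               ; Mismatch: search for the first character again.
        DEC EDI                ; Match: EDI -> start of Needle within Haystack.
        JMP Done
NotFound: SUB EDI,EDI          ; EDI=0 signals that Needle was not found.
Done:
```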
Both Linux and Windows require the Direction Flag to be clear (DF=0) when their functions are called, and they also guarantee that DF=0 on return. If we need to use a string instruction with the Direction Flag set (STD), we should clear it again using CLD, preferably right after that instruction has executed. Then we don't even need to clear this flag at the beginning of the examples given.
Remember that string instructions change the index registers after execution, so for example REPNE SCASB stops one element past the byte it has found, and thus (with DF=0) the byte found is at address EDI-1.
After familiarizing ourselves with the basic machine instructions, we finally get to try them out. The instructions need to be written to the source file as plain text in 8-bit encoding (not UTF-16) and without internal marks for bold or italics, headings, etc. We then submit the file to a program called an assembler, which converts it into another file, an executable one. This tutorial recommends using EuroAssembler, which is convenient because we don't have to specify any command line parameters, just the name of the source file. The output can be a directly executable file for DOS, Linux or Windows, so no linker is needed. We will not use any third party libraries either, only the API (Application Programming Interface) of the operating system.
To make programming amusing, every program should do something interesting, at least output something to the monitor. It usually starts by printing the phrase "Hello, world!". The monitor, keyboard, and mouse are devices under the control of the operating system, which does not allow us to use the IN and OUT instructions to write to the device directly. Maybe this would be possible in real DOS, but we would have a hard time dealing with the ports of the various video adapters that were used in the DOS era anyway. We will need to use the operating system services to output the character string to the monitor. We'll show how to output "Hello, world!" in 16-bit DOS, then in 32-bit Linux, 64-bit Linux, 32-bit MS Windows, and 64-bit MS Windows.
The DOS services are invoked by the INT 21h machine instruction and are described for example in the DOS Fn Index or INT 21h references. When we expand the INT 21h function list, we see two lines dealing with standard output, i.e., writing to the console: Int 21/AH=02h - DOS 1+ - WRITE CHARACTER TO STANDARD OUTPUT and Int 21/AH=09h - DOS 1+ - WRITE STRING TO STANDARD OUTPUT. We want to output the entire string, not just a single character, so we'll look at the second function, which has two input parameters:
AH = 09h
DS:DX -> '$'-terminated string
Calling this function therefore requires the following instructions:
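A minimal sketch of the call; HelloWorldAddress is a provisional symbolic name for the address of the string.

```asm
        MOV AH,09h                 ; DOS function WRITE STRING TO STANDARD OUTPUT.
        MOV DX,HelloWorldAddress   ; DS:DX -> '$'-terminated string.
        INT 21h                    ; Invoke the DOS service.
```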
There is a small problem to solve: how to write the address of the string "Hello, world!" provisionally symbolized by the word HelloWorldAddress, into the DS:DX pair?
In DOS we will prefer programs in the COM executable format. Such a program lives entirely within a single 64 KB block of memory, which is plenty for our small tasks, and it has the great advantage that all four segment registers are already set to the necessary contents by DOS, so we don't have to deal with them. When DOS executes our program, it loads it into memory and places a 256-byte data structure called the PSP in front of it. The first instruction of our program follows right after it, at address 256. The program cannot start at address 256 with the definition of the string "Hello, world!", because the processor would try to execute the characters of the string as program code and would probably report an error or freeze. So we will put the string definition at the end of the program; the assembler doesn't care.
After the INT 21h instruction we should terminate our program, otherwise it would print the greeting, but then it would try in vain to execute the bytes of the string as machine instructions and probably freeze. Looking through the list of DOS 21h functions, we find two items containing the text TERMINATE: Int 21/AH=00h - DOS 1+ - TERMINATE PROGRAM and Int 21/AH=4Ch - DOS 2+ - EXIT - TERMINATE WITH RETURN CODE. But there is another and much simpler way to terminate a COM format program: a plain RET instruction. It fetches a null word from the stack, causing the program to return to address CS:0, which is the beginning of the PSP, where the INT 20h machine instruction that terminates the DOS program is stored. So our program will look like this:
The text string was defined in the program using the DB (define bytes) pseudoinstruction and written in quotes. The assembler takes the address following the RET instruction, stores the 13 bytes of the string at that address, and assigns the symbolic name HelloWorldAddress to that address. So we have the source code; we write it to a file called e.g. hello.asm and save it.
Assuming that we have installed EuroAssembler according to the installation instructions, we can try to translate the source into executable form by typing euroasm hello.asm in the console (quotes around the filename are needed only if it contains spaces).
From the output messages generated by EuroAssembler during compilation, note the line I0760 16bit TINY BIN file "hello.bin" created from source, size=29. A program in COM format should have the file name extension .com, but the file created has the extension .bin. This is because we didn't tell the assembler to compile to COM format, so it used the default BIN format, which among other things does not take the PSP into account, so our program wouldn't work anyway.
How to tell EuroAssembler to generate COM? This is done by using a pair of pseudoinstructions PROGRAM and ENDPROGRAM and their operands FORMAT=, WIDTH=, MODEL=, SUBSYSTEM= (and many others). So we have to wrap our program between PROGRAM and ENDPROGRAM. As a mandatory label for the PROGRAM command we will give the name of the program (without the suffix). Let's call it HelloDos, for example. The same name is also given for ENDPROGRAM, but not in the label field, but as the first operand, as is usual for EuroAssembler block pseudo-instructions. The program name does not necessarily have to match the source file name, as is the case with other assemblers. In fact, we could define more than one PROGRAM/ENDPROGRAM block in a single file and thus produce several different executable programs at once. But we won't try that yet.
In addition to the name of the program, we have to specify the format and width of the resulting file in the parameters of the PROGRAM pseudoinstruction. However, EuroAssembler derives the width for COM format itself as WIDTH=16, so we could omit this operand. Similarly, we could omit the ENTRY=256 operand, since the program entry point is always fixed at this address for COM-format programs.
Running euroasm hello.asm again prints a series of messages. The last line is important, as it informs about errorlevel 0, i.e. no errors. Otherwise we would have to find and remove the errors in the source text first. Similarly, the line I0660 16bit TINY COM file "HelloDos.com" created, size=29. confirms that a program in COM format was generated and what size it is. After almost every message, EuroAssembler also appends the position in the source file to which the message refers. For example, "hello.asm"{7} associates the message with the seventh line of the source file, which is ENDPROGRAM HelloDos. Note also the message I0860 Listing file "hello.asm.lst" created, size=1039. EuroAssembler creates a listing of the translated file without being asked. The listing is again a plain text file, so we can view it with Notepad or a similar viewer.
We can see that the listing contains a copy of the source code indented to the right, and in a left part a column delimited by | (pipe character) containing a four-digit hexadecimal address terminated by a colon, followed by the machine code of the instruction. At the end of the program occurs a map showing how its sections were linked into the resulting file (**** ListMap), and also a list of global symbols (**** ListGlobals), in our case empty.
The listing produced by EuroAssembler is itself valid source code, since the left column delimited by the | characters is ignored as a machine comment. Thus, if we re-ran the translation with the command euroasm hello.asm.lst, we would again get the HelloDos.com file.
The translation of the program ended with errorlevel 0. We can then open a DOS emulator (DosBox), go to the directory where we have HelloDos.com and try to run it by typing HelloDos or HelloDos.com. You will probably see the text Hello, world! followed by a jumble of nonsense characters in the DOS window. This is caused by overlooking a detail in the service description DS:DX -> '$'-terminated string: the string being written out must end with a dollar sign. Along with the dollar sign, we can also add a Carriage Return and Line Feed pair, 13 and 10, to the end of the string to cause a line break:
HelloWorldAddress DB "Hello, world!",13,10,'$'
After correcting the line and translating the program again, everything should work as expected.
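Putting the pieces described so far together, the whole corrected source file might look like this. It is a sketch assembled from the steps above rather than a verbatim copy of any listing; the operands WIDTH=16 and ENTRY=256 are spelled out although they could be omitted for the COM format.

```asm
HelloDos PROGRAM FORMAT=COM, WIDTH=16, ENTRY=256
          MOV AH,09h                 ; DOS function WRITE STRING TO STANDARD OUTPUT.
          MOV DX,HelloWorldAddress   ; DS:DX -> '$'-terminated string.
          INT 21h                    ; Invoke the DOS service.
          RET                        ; Return to PSP:0, where INT 20h terminates the program.
HelloWorldAddress DB "Hello, world!",13,10,'$'
         ENDPROGRAM HelloDos
```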
When the computer is turned on, the CPU starts in real mode and executes the program hardwired in memory on the motherboard. At that time, DOS or other operating system is not yet booted from the disk, but the BIOS or UEFI interface is working, which can perform several basic functions of the computer: print characters and strings to the monitor, read the keyboard, load a program from the disk to boot the OS. These functions can be called with INT instructions, especially INT 10h for working with the video adapter and thus the monitor. For a complete list of functions invoked by INT interrupts, see the Interrupt Jump Table.
In the previous example, the DOS interface was used to write out the string. If we needed to write to the screen immediately after the computer is turned on, before DOS or any other operating system is booted, we would have to use the BIOS interface built into the motherboard firmware. This is the situation of, for example, a boot sector program: the firmware loads the 512-byte contents of one disk sector into memory and passes control to it. Let's try writing out "Hello, world!" using the BIOS services. These, like the DOS services, preserve the contents of all registers except those that return a result. We'll use the TELETYPE OUTPUT service, which expects the character to be written in AL, AH=0Eh as the service identifier, and BH=0 as the video adapter's internal page number.
Although EuroAssembler can generate a boot sector directly, by selecting PROGRAM FORMAT=BOOT, booting from a boot sector is complicated and inconvenient. For practical reasons, we will again generate the good old proven COM format:
HelloBio PROGRAM FORMAT=COM
    ; DS must address the segment with our string; in a COM program DOS has already set it.
    ; We could not use MOV DS,CS for this, as such an instruction is not supported.
    MOV SI,HelloWorldAddress ; Address of the string to be written.
    MOV CX,HelloWorldSize    ; Number of characters.
    CLD                      ; Reset the Direction Flag for safety.
    SUB BX,BX                ; Use the base (zero) page of the video adapter.
    MOV AH,0Eh               ; Number of the INT 10h BIOS function TELETYPE OUTPUT.
Next: LODSB                  ; Load one character of the string into AL, increment SI.
    INT 10h                  ; Call the BIOS function to print the character.
    LOOP Next                ; Decrement CX and jump for the next character while CX is non-zero.
    JMP $                    ; End the program by endlessly jumping to itself. Without the OS, there is no other way.
HelloWorldAddress DB "Hello, world!",10
HelloWorldSize EQU $ - HelloWorldAddress
  ENDPROGRAM HelloBio
After running HelloBio.com in the DosBox emulator, we again get the expected output Hello, world!. However, because the program loops at the end, we then have to force DosBox to terminate.
The executable format for Linux is called ELFX in EuroAssembler. Programs generated in this format get the file extension .x, which we can get rid of by renaming with mv HelloL32.x HelloL32 to obtain an executable program without an extension, as is customary in Linux. In addition to the parameters FORMAT=ELFX and WIDTH=32 of the PROGRAM pseudoinstruction, we must also specify the entry point of the program, i.e. the first instruction to be executed. We mark the entry point (ENTRY=) symbolically with a label, for example Start: or Main: etc.
If we run the programs in this tutorial under MS Windows, to run the Linux variant we would need to have the WSL emulator installed in Windows.
To output text in Linux, we again have to use the application interface of its kernel. In a 32-bit system, a kernel function is called with the INT 80h instruction; its parameters are passed in the registers EBX, ECX, EDX, ESI, EDI, EBP, while the identifier of the called function is passed in EAX. Of course, we only fill the input registers that the kernel function requires. In the case of the write function (sys_write has three parameters) these are the registers EBX, ECX, EDX. The kernel call returns the result of the function in the EAX register; the other registers are returned unchanged.
To output the string "Hello, world!" we'll need the sys_write function to write to standard output. According to the Linux Syscall Reference (32 bit), this function has the identifier EAX=0x04. The first parameter in the EBX register specifies fd, which is the file descriptor, also called the file handle. For standard output this is the number 1; for error output it would be 2. In the next two registers we specify the second and third parameters, which are the address and the size of the string to be output. We could specify the size as a number (in our example MOV EDX,13), but it makes more sense to specify it indirectly, as the difference between the address $ and HelloWorldAddress. If we later lengthen or shorten the HelloWorldAddress string, its size will be adjusted automatically. The dollar symbol $ denotes the current address of the instruction, in this case of the EQU pseudoinstruction. Since we specify the length explicitly, there is no need to end the string with a null character. So let's try it:
Compile the program in the hello.asm file again using euroasm hello.asm. If everything went well and EuroAssembler reported no errors, we can try to run it in native Linux or in WSL with the command ./HelloL32.x. In my case, the console printed: Hello, world!Segmentation fault (core dumped).
So the program works, but after printing the string, it crashed with a Segmentation fault. The error was in exiting the program: the RET instruction is not enough in Linux, we have to use the kernel function sys_exit. At the same time, we add a Line Feed character (10) at the end of the string to start a new line after the output is finished. Carriage Return is not usually used in Linux.
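With both corrections applied, the 32-bit Linux program could be sketched as follows. The identifier 1 of sys_exit comes from the 32-bit Linux syscall table; the label Start and the exit code 0 are choices made for this example.

```asm
HelloL32 PROGRAM FORMAT=ELFX, WIDTH=32, ENTRY=Start:
Start:    MOV EAX,4                  ; Kernel function sys_write.
          MOV EBX,1                  ; fd of standard output.
          MOV ECX,HelloWorldAddress  ; Address of the string.
          MOV EDX,HelloWorldSize     ; Size of the string.
          INT 80h                    ; Call the kernel.
          MOV EAX,1                  ; Kernel function sys_exit.
          SUB EBX,EBX                ; Exit code 0.
          INT 80h                    ; Call the kernel; this call does not return.
HelloWorldAddress DB "Hello, world!",10
HelloWorldSize    EQU $ - HelloWorldAddress
         ENDPROGRAM HelloL32
```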
After replacing the RET instruction with the system call, the program works as expected.
The program for 64-bit Linux looks similar. One difference is the parameter WIDTH=64, which we have to specify in the PROGRAM pseudoinstruction, since its default is WIDTH=32. Another difference between 32-bit and 64-bit Linux is that kernel functions are called not by INT 80h, but by the SYSCALL instruction. The numeric identifiers of kernel functions passed in RAX and the order of registers used for passing parameters also differ: instead of EBX, ECX, EDX, ESI, EDI, EBP, the parameters are passed in registers RDI, RSI, RDX, R10, R8, R9. Otherwise the functions remain the same. For an overview of kernel calls, see e.g. Linux System Call Table for x86 64.
We get a number of warnings when we try to compile it: W2340 This instruction requires option "EUROASM CPU=X64". Even with these warnings our program would work, but we prefer to give EuroAssembler the required option by adding the line EUROASM CPU=X64 before the HelloL64 PROGRAM pseudoinstruction, so that it does not unnecessarily point out that we are using 64-bit registers. After the new compilation we should get errorlevel 0 and we can test our 64-bit program:
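A sketch of the 64-bit variant under the rules above. The identifiers 1 (sys_write) and 60 (sys_exit) come from the x86-64 Linux syscall table; the label Start is a free choice.

```asm
          EUROASM CPU=X64
HelloL64 PROGRAM FORMAT=ELFX, WIDTH=64, ENTRY=Start:
Start:    MOV RAX,1                  ; Kernel function sys_write.
          MOV RDI,1                  ; fd of standard output.
          MOV RSI,HelloWorldAddress  ; Address of the string.
          MOV RDX,HelloWorldSize     ; Size of the string.
          SYSCALL                    ; Call the kernel.
          MOV RAX,60                 ; Kernel function sys_exit.
          SUB RDI,RDI                ; Exit code 0.
          SYSCALL                    ; Call the kernel; this call does not return.
HelloWorldAddress DB "Hello, world!",10
HelloWorldSize    EQU $ - HelloWorldAddress
         ENDPROGRAM HelloL64
```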
Function calls in MS Windows use the Win32 application interface defined in dynamically linked libraries. Most of the basic functions are available in the kernel32.dll library.
We will again write to standard output. Unlike Linux, however, there are no fixed fd file identifiers, as there were 0 for standard input, 1 for standard output and 2 for error output. Instead, we must first ask the operating system what identifier (file handle) is used for standard output today. To do this, we use a Win32 function called GetStdHandle. It takes as a parameter the identifier of standard output, which is the number -11. The value returned by the function in the EAX register is then already the file handle used in WriteFile function.
What remains to be solved is how to call the Win32 functions and how to pass parameters (address and string size) to them. 32-bit Windows uses the standard call convention (StdCall), where the parameters are first stored on the stack (PUSH) in order from last to first, and then the imported function is called (CALL). The programmer of the function in question takes care of removing the stored parameters from the stack. This is usually done by using RET n*4 instead of a regular RET, where n is the number of parameters on the stack. The RET n*4 instruction works like a regular return from a function (RET), but then it also increases ESP by n*4 bytes.
How do we make the CALL instruction recognize that we are calling a dynamically linked function? Either we include the winapi.lib import library in the program, or we define the name of the function with the IMPORT pseudoinstruction. This pseudoinstruction has the LIB= parameter, which defines the file (library) that contains the function. The library is defined in the description of each function on the Microsoft website in the Requirements paragraph. If the library name is "kernel32.dll", we do not need to specify it with the LIB= parameter. So let's try to write an executable for Windows that will have the PE (Portable Executable) format:
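A possible shape of such a program is sketched below. The parameter order of WriteFile (handle, buffer, size, pointer to the number of bytes written, overlapped) and the use of ExitProcess for termination follow the Microsoft documentation; the variable name Written and the label Start are assumptions of this example, and the listing is not necessarily identical to the author's.

```asm
HelloW32 PROGRAM FORMAT=PE, WIDTH=32, ENTRY=Start:
          IMPORT GetStdHandle, WriteFile, ExitProcess ; Functions from kernel32.dll.
Start:    PUSH -11                   ; Identifier of standard output.
          CALL GetStdHandle          ; Returns the file handle in EAX.
          PUSH 0                     ; 5th parameter: lpOverlapped is not used.
          PUSH Written               ; 4th parameter: address of the DWORD counter.
          PUSH HelloWorldSize        ; 3rd parameter: size of the string.
          PUSH HelloWorldAddress     ; 2nd parameter: address of the string.
          PUSH EAX                   ; 1st parameter: file handle.
          CALL WriteFile             ; The function removes its parameters itself.
          PUSH 0                     ; Exit code.
          CALL ExitProcess           ; Terminate the program.
Written   DD 0                       ; Receives the number of bytes written.
HelloWorldAddress DB "Hello, world!",13,10
HelloWorldSize    EQU $ - HelloWorldAddress
         ENDPROGRAM HelloW32
```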
Assemble it using euroasm hello.asm; the translation should finish without errors. After running the program in Windows with the command HelloW32.exe or just HelloW32 we should get the greeting Hello, world!.
MS Windows in 64-bit mode uses the same Win32 functions and the same kernel32.dll library as 32-bit Windows, but the calling convention differs significantly: instead of StdCall, FastCall is used. In this convention, registers RCX, RDX, R8, R9 carry the first four parameters. If a function requires more than four parameters, the remaining ones are stored on the stack, again in reverse order (from the last to the fifth). In addition, 32 bytes of stack space are always reserved for the four parameters passed in registers, even if the function has fewer than four parameters. The stack pointer (register RSP) must also be aligned to an integral multiple of 16 bytes before an external function is called (before the CALL instruction). If floating-point parameters are passed to the function, the lower half of XMM0, XMM1, XMM2, XMM3 is used instead of the corresponding RCX, RDX, R8, R9. The called function does not remove parameters from the stack when it finishes; this must be taken care of by the caller. Equipped with this knowledge, we will try to write out the string in 64-bit Windows:
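One way to satisfy these rules is sketched below; it is an illustration, not necessarily the author's listing. It assumes that RSP is 8 bytes short of a 16-byte boundary at the entry point (as it is after any CALL), so reserving 5*8 bytes once provides the 32 reserved bytes, room for the fifth parameter, and the required alignment. The names Written and Start are chosen freely.

```asm
          EUROASM CPU=X64
HelloW64 PROGRAM FORMAT=PE, WIDTH=64, ENTRY=Start:
          IMPORT GetStdHandle, WriteFile, ExitProcess ; Functions from kernel32.dll.
Start:    SUB RSP,5*8                ; 32 reserved bytes + the 5th parameter; RSP is now aligned.
          MOV RCX,-11                ; 1st parameter: identifier of standard output.
          CALL GetStdHandle          ; Returns the file handle in RAX.
          MOV RCX,RAX                ; 1st parameter: file handle.
          LEA RDX,[HelloWorldAddress] ; 2nd parameter: address of the string.
          MOV R8,HelloWorldSize      ; 3rd parameter: size of the string.
          LEA R9,[Written]           ; 4th parameter: address of the DWORD counter.
          SUB RAX,RAX                ; RAX=0.
          MOV [RSP+4*8],RAX          ; 5th parameter: lpOverlapped is not used.
          CALL WriteFile
          SUB RCX,RCX                ; 1st parameter: exit code 0.
          CALL ExitProcess           ; Terminate the program, so RSP need not be restored.
Written   DD 0                       ; Receives the number of bytes written.
HelloWorldAddress DB "Hello, world!",13,10
HelloWorldSize    EQU $ - HelloWorldAddress
         ENDPROGRAM HelloW64
```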
As you can see, especially in the last example for 64-bit Windows, calling a function as simple as writing out a greeting requires writing quite a large number of machine instructions. Let's try to do something about it. The key to reducing the programmer's workload is the use of macro instructions. Each macro instruction (alias a macro) can replace a number of machine instructions, and the macro can accept input parameters and thus modify its operation according to the programmer's needs.
The syntax of the macro instruction language and its nuances are properties of the assembler used; virtually every assembler handles them in its own way. Here we will focus on writing macro instructions in the EuroAssembler language. Its apparatus uses the percent sign % for the expressions used in writing macros. Pseudoinstructions starting with a percent sign refer to macros or auxiliary assembler variables. While an ordinary memory variable defined with the D pseudoinstruction, such as OrdinaryVar DD 1234h, defines a memory location called OrdinaryVar containing a DWORD with the value 1234h, the %OrdinaryVar variable represents something quite different: a variable of EuroAssembler itself. Its location is not in the compiled program code or its data; it exists in EuroAssembler's memory while it is running. Its content can be set by the %SET pseudoinstruction and it can be any text, arithmetic expression, string, number, etc. Whenever %OrdinaryVar appears in the source text, it will be replaced by that text.
Macroinstructions are defined by a pair of block pseudoinstructions %MACRO and %ENDMACRO. The identifier in the %MACRO label field becomes the name of the macroinstruction. The machine instructions within the %MACRO/%ENDMACRO block are the body of the macro. Whenever the name of a macro instruction appears in the source code, it is replaced with all the machine instructions from its definition. Example macro for 64-bit Linux:
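A minimal sketch of such a macro and of its use. The macro name WriteString and the numeric parameter references %1, %2 are those discussed in the following paragraph; HelloWorldAddress and HelloWorldSize are assumed to be defined elsewhere in the program.

```asm
WriteString %MACRO                   ; Definition of the macro.
          MOV RAX,1                  ; Kernel function sys_write.
          MOV RDI,1                  ; fd of standard output.
          MOV RSI,%1                 ; The first parameter: address of the string.
          MOV RDX,%2                 ; The second parameter: size of the string.
          SYSCALL
         %ENDMACRO WriteString

          WriteString HelloWorldAddress, HelloWorldSize ; Expansion of the macro.
```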
We have defined a macro called WriteString with two parameters: the address of the string and its length. Specifying the name of the macro in the program then causes it to be expanded into five machine instructions, with the first and second parameters available as variables %1 and %2. Instead of using numeric labels for the parameter variables (%1, %2), we could also use formal parameter names by prefixing the parameter name in the macro definition (StringAddress, StringSize) with a percent sign in the body of the macro:
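The same macro with formal parameter names might be sketched like this:

```asm
WriteString %MACRO StringAddress, StringSize
          MOV RAX,1                  ; Kernel function sys_write.
          MOV RDI,1                  ; fd of standard output.
          MOV RSI,%StringAddress     ; Same as %1.
          MOV RDX,%StringSize        ; Same as %2.
          SYSCALL
         %ENDMACRO WriteString
```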
We can further improve the macro by not hard-coding the file descriptor for the standard output passed in RDI as the number 1, but by specifying it as a parameter. And to avoid having to specify this parameter if we use its usual value of 1, we will specify it as a key parameter, i. e. with an equals sign and a default value of 1:
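A sketch of the macro extended by the key parameter fd. The names ErrorText and ErrorSize in the second invocation are hypothetical.

```asm
WriteString %MACRO StringAddress, StringSize, fd=1
          MOV RAX,1                  ; Kernel function sys_write.
          MOV RDI,%fd                ; File descriptor, 1 unless specified otherwise.
          MOV RSI,%StringAddress
          MOV RDX,%StringSize
          SYSCALL
         %ENDMACRO WriteString

          WriteString HelloWorldAddress, HelloWorldSize ; Writes to standard output.
          WriteString ErrorText, ErrorSize, fd=2        ; Writes to error output.
```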
We can now use the same macro to write to the error output instead of the standard output; just add the fd=2 parameter.
EuroAssembler allows you to define memory variables using literals, i. e., directly specified values. Instead of defining a DB pseudoinstruction string and inventing its symbolic name, we define it only when its value is used in an instruction:
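A small sketch comparing both ways of loading the address of a string; the size 13 is simply the length of this particular text.

```asm
; With a named symbol, the string is defined elsewhere by DB:
          MOV RSI,HelloWorldAddress
; With a literal, the string is defined right where it is used:
          MOV RSI,=B"Hello, world!"  ; RSI = address of the anonymous string.
          MOV RDX,13                 ; Size of the string.
```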
A literal is defined by an equals sign = followed by a type specification (BYTE, WORD, DWORD, QWORD, or just B, W, D, Q) and then its value. The advantage of a literal over a symbol is that we don't have to invent a name for it and we can immediately see its value in the instruction where it was used.
Using the macro language and literals, we can now write our own macros to output the string to standard output. For example, let's call them StdOutput. This work has already been done and the macros StdOutput are listed in the libraries supplied with EuroAssembler, namely the macrolibraries dosapi.htm for DOS 16 bits, linapi.htm for Linux 32 bits, linabi.htm for Linux 64 bits, winapi.htm for Windows 32 bits, winabi.htm for Windows 64 bits.
With the use of the StdOutput macro and of literals, our test programs become much simpler. Just include the appropriate library according to the target platform. The INCLUDE pseudoinstruction causes the named library (which is just another source file) to replace the INCLUDE line with its contents.
API (16 and 32 bit) or ABI (64 bit) libraries contain definitions for the macros StdOutput, TerminateProgram and several others.
For simplicity, we will write the DOS, Linux and Windows programs into a single source file hello.asm. Since the macros for writing to standard output and for terminating the program are named the same in all libraries (StdOutput and TerminateProgram), we should tell EuroAssembler to forget their definitions from the previous library before defining the next program, using the %DROPMACRO pseudoinstruction.
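The layout of such a file is sketched below for the first two programs only; the remaining three would follow the same pattern with linabi.htm, winapi.htm and winabi.htm. Two things are assumptions of this sketch and should be checked against the EuroAssembler manual: that StdOutput accepts a string literal without an explicit size, and that an asterisk operand makes %DROPMACRO forget all macros defined so far.

```asm
HelloDos PROGRAM FORMAT=COM
          INCLUDE dosapi.htm             ; Macros for 16-bit DOS.
          StdOutput =B"Hello, world!"
          TerminateProgram
         ENDPROGRAM HelloDos

         %DROPMACRO *                    ; Assumption: forget the macros from dosapi.htm.

HelloL32 PROGRAM FORMAT=ELFX, WIDTH=32, ENTRY=Start:
          INCLUDE linapi.htm             ; Macros for 32-bit Linux.
Start:    StdOutput =B"Hello, world!"
          TerminateProgram
         ENDPROGRAM HelloL32
```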
After translating the above file with the familiar euroasm hello.asm command, we should get five programs named HelloDos.com, HelloL32.x, HelloL64.x, HelloW32.exe, HelloW64.exe, which we can immediately try with the help of emulators (DosBox, WSL, wine).
The block of instructions between %MACRO and %ENDMACRO represents the definition of a macro. The definition itself does not do anything interesting yet, it just takes up space in the source code. Only when we try to use the macroinstruction in the program it will be expanded, i. e. the macro name will be replaced by the instructions from the macro body (and maybe also errors will be shown, if we made any while writing the macro).
The macro definition can be written at the beginning of the PROGRAM/ENDPROGRAM block, or before this block, or even in a separate included file (library), but always before the macro is used (expanded) for the first time. Macro instructions (together with EuroAssembler variables starting with the percent sign) pass through the boundaries of the PROGRAM/ENDPROGRAM block; they are visible throughout the source code starting from their definition. This is where they differ from symbols, which must be unique in the PROGRAM/ENDPROGRAM block and must not be repeated.
Thus, macros and %variables can be redefined within the same source file. However, macro redefinitions are somewhat uncommon, so EuroAssembler responds with the warning message W2512 Overwriting previously defined macro. If we need to overwrite a macro with a different definition of the same name, it is better to first make EuroAssembler forget the previous definition using the %DROPMACRO pseudoinstruction.
Passing the text to the user of our program was easy, in the previous examples we used an operating system service usually called write or similar, perhaps wrapped in a StdOutput macro. Now let's look at the opposite case, where we want to get something from the user. One possibility is to retrieve arguments from the command line that we used to run our program.
If our program is called MyCalc.exe, for example, and we typed MyCalc.exe 2 + 3 in the console, the operating system will provide us with a string containing the same information: MyCalc.exe 2 + 3. That is the name of the running program (the 0-th argument) and then an exact copy of the following characters, including spaces or other characters separating the arguments. Where is this string stored?
In DOS, it is in the PSP structure starting at byte 81h. The previous byte at address 80h contains the length of the string.
In Windows we get a pointer to a similar string using the API function GetCommandLine. If we need to have each argument separately, we have to retrieve the string e.g. with the LODSB instruction and respond to the separator characters (this is called parsing).
It's a bit different in Linux: the string is already parsed into the program name and its space-separated arguments, each of these items is terminated by a null character, and the pointers to them are stored on the stack. This is shown schematically in the figure for the GetArg macro. The addresses of the stack items in this figure grow upwards. The width of each item is 8 bytes in a 64-bit program or 4 bytes in a 32-bit program. The MOV RCX,[RSP] instruction at the beginning of the program would load the total number of arguments into RCX. The address of the first argument is obtained by MOV RSI,[RSP+2*8], the address of the second argument by MOV RSI,[RSP+3*8], and so on.
Or we can use the ready-made GetArg macro, which delivers the individual arguments already parsed, regardless of the operating system. Just use the appropriate macro library for DOS, Linux or Windows in the required 16, 32 or 64 bit width. Macro libraries are loaded with the INCLUDE pseudoinstruction, which takes as its parameter the file dosapi.htm, linapi.htm, linabi.htm, winapi.htm or winabi.htm. The macro GetArg is named the same in all of these libraries, and returns a pointer to the requested argument in rSI and its length in rCX. It sets the Carry Flag if the argument in question was not supplied on the command line.
Let's try programming a primitive four-function calculator. After entering two integers separated by the sign of an arithmetic operation, our calculator should return the correct result. The allowed operations will be selected by the signs +, -, *, / for addition, subtraction, multiplication and division.
The target platform will be 32-bit Windows, so the corresponding library will be winapi.htm. If we write 3 + 4 on the CalcW32 command line, we want to get the result 7. Let's call the source code file calc.asm, for example:
You should be puzzled by the definition of the variables Arg1, Arg2, Arg3 in the middle of the instruction flow. This is a bad programming technique, but it works in EuroAssembler thanks to the EUROASM AUTOSEGMENT=ENABLED parameter, which is enabled by default. Autosegmentation distinguishes whether a machine instruction or a data definition has been placed on a line, and accordingly divides the output of the translated instructions into separate sections for program code, initialized data, and uninitialized data. These sections have the traditional names [.text], [.data], [.bss]. If we look at the listing calc.asm.lst, we can see the automatic change of sections there. However, it is not very wise to rely on autosegmentation in longer programs; rather, we should position the Arg* data items below the program code already when writing the program. Next, note the help line DB "Example: %^PROGRAM 3 + 4", 13, 10, 0. The %^PROGRAM expression here is not a user-defined variable, because of the ^ (caret) following the percent sign. Such variables are system variables; their value is set by EuroAssembler itself, in this particular case to the program name (CalcW32). EuroAssembler has many system variables, each parameter of the EUROASM and PROGRAM pseudoinstructions has one, see the help crib.
The program compiled without error, but returns an incorrect result when run:
Thanks to the line StdOutput Arg1, Arg2, Arg3, Size=1 ; Argument checklist.
we have listed the arguments with which CalcW32 was run.
This is a good tactic for debugging. We can see that all three arguments were entered correctly.
A more experienced programmer would have found several more errors in CalcW32.
For example it is not specified where to continue the program after the Error help line is printed.
We want to send it to exit the program. So let's add an End: label to the TerminateProgram
pseudoinstruction and let the program jump to it after printing the Help.
Another error, the one that causes the incorrect result 3+4=g, is not distinguishing between
a binary number and the ASCII codes of the individual digits that GetArg returns.
When given the digit 3 as an argument, the GetArg macro returns its ASCII code,
which is hexadecimal 33h. Similarly, instead of the digit 4 we get 34h, and since we have added
the ASCII codes (and not the numbers), the result is 33h + 34h = 67h,
which is the ASCII code of the letter g and not of the digit 7.
Before using the ADD instruction for addition we need to convert the ASCII codes to plain binary numbers.
This is easily done by subtracting 30h or '0' from the ASCII code of the digit.
After the binary addition, we convert the sum back to ASCII by adding 30h, and we can print this ASCII code
as the result using StdOutput. Let's correct the program CalcW32:
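Independently of the assembler source, the arithmetic described above can be checked in a few lines of Python. This is only a sketch of the reasoning, not part of CalcW32; the variable names are chosen for illustration.

```python
# ASCII codes of the digit characters, as they arrive from the command line.
arg1 = ord("3")                    # 33h
arg3 = ord("4")                    # 34h

# Adding the raw ASCII codes reproduces the faulty result.
wrong = arg1 + arg3                # 33h + 34h = 67h
print(hex(wrong), chr(wrong))      # the code 67h is the letter g

# Subtracting '0' (30h) first gives plain binary values; adding '0'
# afterwards turns the one-digit sum back into a printable character.
total = (arg1 - ord("0")) + (arg3 - ord("0"))
right = total + ord("0")
print(total, chr(right))
```

Note that adding 30h yields a valid digit only while the sum stays in the range 0 to 9.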
The program already works fine, but only for adding single-digit numbers, and we have to keep the spaces between the arguments. When entering CalcW32 3+4 without spaces, the program prints the Help. Apparently it took the whole string 3+4 as one argument, and when trying to retrieve the non-existent second and third arguments, GetArg returned an error.
We will have to solve the input and output of binary numbers in ASCII form for numbers longer than a single digit. Unfortunately, operating systems do not offer any function to convert ASCII numbers to binary and back; we have to program everything ourselves.
For example, if we have the characters "123" as the input string of the first number, we expect the result in binary form as the number 123, i. e. hexadecimal 7Bh in an 8-bit register or 0000007Bh in a 32-bit register. How do we convert the string "123" to a binary number? First convert each successively read character in the range "0" to "9" to a number in the range 0 to 9 by subtracting "0", i. e. 30h.
Then there are two possible approaches to combining the numbers thus obtained into the result. The first starts from the last digit, multiplies each digit by its weight (1, 10, 100, etc.) and sums the products. The second starts from the first digit and, before adding each new digit, multiplies the result accumulated so far by 10.
The second method looks simpler because we don't have to calculate increasing
weights of 1, 10, 100, etc., instead we make do with repeated multiplication by 10.
We'll call this algorithm ASCIItoInteger. We'll probably use it more often
in various programs, so we'll encapsulate it in a procedure using the PROC and ENDPROC
pseudoinstructions. Then we will call it with CALL ASCIItoInteger
whenever we need to convert a number from ASCII characters to an integer binary value.
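For comparison, both ways of accumulating the digits (weights 1, 10, 100 versus repeated multiplication by 10) can be sketched in Python. The function names are hypothetical and the sketch ignores 32-bit overflow, which an assembler procedure would have to consider.

```python
def ascii_to_integer(text: bytes) -> int:
    """Repeated multiplication by 10, reading digits from the left.

    Conversion stops at the first byte that is not a digit '0'..'9'.
    """
    result = 0
    for code in text:
        if not 0x30 <= code <= 0x39:        # not in "0".."9"
            break
        result = result * 10 + (code - 0x30)
    return result


def ascii_to_integer_weighted(digits: bytes) -> int:
    """Weights 1, 10, 100, ... applied from the last digit.

    Assumes that the input consists of digits only.
    """
    result, weight = 0, 1
    for code in reversed(digits):
        result += (code - 0x30) * weight
        weight *= 10
    return result


print(ascii_to_integer(b"123"), hex(ascii_to_integer(b"123")))
print(ascii_to_integer_weighted(b"123"))
```

The first function needs neither the length of the number in advance nor a table of weights, which is why it maps more directly onto a short loop in assembler.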
All that remains is to complete the procedure with a description of exactly what it does, what values it expects on input, and what it provides on output:
We have thus produced something like a black box, for which we no longer have to think
about its instructions, but we are concerned only with its description as a whole.
Later we can put the whole procedure in a separate file and thus create a library that we can then include
in all programs that require string-to-number conversions. We can call the library for example libcvt32.asm.
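In the simple calculator example, we still need to solve the reverse conversion of a binary number to a series of ASCII characters representing that number in a format that we can print using StdOutput. Again, there are two options here. The first repeatedly divides the number by 10; each remainder is one decimal digit, obtained from the last digit to the first. The second divides the number by decreasing powers of ten; each quotient is one decimal digit, obtained from the first digit to the last.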
The second approach is made more complicated by the need to maintain divisors of 1_000_000_000, 100_000_000, 10_000_000 etc., so let's use the first method. Again, we will program this as a separate procedure as it is a frequently used function.
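A minimal Python sketch of the conversion, assuming that the first method means repeated division by 10 with the remainders collected as digits. The function name is hypothetical and the sketch handles unsigned values only.

```python
def integer_to_ascii(value: int) -> bytes:
    """Convert an unsigned integer to its ASCII decimal digits."""
    if value < 0:
        raise ValueError("this sketch handles unsigned values only")
    digits = bytearray()
    while True:
        value, remainder = divmod(value, 10)   # remainder is the lowest digit
        digits.append(remainder + 0x30)        # digit 0..9 to ASCII "0".."9"
        if value == 0:
            break
    digits.reverse()                           # digits were produced last-first
    return bytes(digits)


print(integer_to_ascii(123))
```

The digits come out in reverse order, so they are either reversed at the end, as here, or stored into the buffer from its end towards its beginning.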
Let's return to our calculator. The disadvantage of reading input from the command line is that it is only usable for one example; to enter another calculation, the program must be run again. Therefore, we will learn to read input characters entered from the keyboard so that we can enter different numerical examples repeatedly.
Direct reading of the keyboard and mouse using IN and OUT instructions is no longer practical in the age of USB peripherals and protected operating systems. The lowest level of keyboard handling in DOS and its emulations is provided by the INT 16h interface, specifically CHECK FOR KEYSTROKE (AH=01h) and, if it detects that a key has been pressed, GET KEYSTROKE (AH=00h), which returns the ASCII character in AL.
In Windows, Linux, and even DOS, a function to read from standard input is preferable.
It is redirectable and returns the text entered from the keyboard line by line.
The contents of a text file can be redirected to a program that reads standard
input, for example in Linux with cat answers.txt | program
and in DOS/Windows with type answers.txt | program.exe.
For reading standard input, use the function READ FROM FILE OR DEVICE in DOS, sys_read in Linux, and ReadFile in Windows. These functions do not return an ASCII character each time a key is pressed; instead they implement a line editor. This means that the typed text can be overwritten or deleted using the Backspace key, and it is returned to the program only when the Enter key is pressed. The returned text is terminated by a Line Feed character (10) in Linux, and by a pair of Carriage Return and Line Feed characters (13, 10) in DOS and Windows. In addition, these functions report the number of characters actually read, including the terminating 0Ah or 0Dh,0Ah.
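Because the reported count includes the line terminator, a program usually has to cut it off before interpreting the text. The following Python sketch shows the idea; the function name and the buffer layout are assumptions made for illustration.

```python
def strip_line_end(buffer: bytes, count: int) -> bytes:
    """Return the text of one input line without its terminator.

    buffer -- bytes filled by the read function
    count  -- number of bytes actually read, terminator included
    """
    line = buffer[:count]
    if line.endswith(b"\n"):       # Line Feed (10), present on all systems
        line = line[:-1]
    if line.endswith(b"\r"):       # Carriage Return (13), DOS and Windows
        line = line[:-1]
    return line


print(strip_line_end(b"3+4\r\n\x00\x00\x00", 5))   # DOS/Windows style line
print(strip_line_end(b"3+4\n\x00\x00\x00\x00", 4)) # Linux style line
```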
Let's try to improve our not-very-functional-yet calculator in the calc.asm source file
by reading from the standard input. Let's assume we already have a text file called libcvt32.asm,
in which we have stored the two conversion routines above, ASCIItoInteger and IntegerToASCII.
We will insert this file into calc.asm using INCLUDE libcvt32.asm,
so the calculator source code will be much shorter.
For a change, let's write a program for 32-bit Linux:
When trying to compile the previous program, EuroAssembler reported the following errors:
The error occurred because we used the labels called Next1, Next2, End
multiple times in the procedures and in the main program. We should come up with more unique names,
but there is an even better solution to get rid of duplicates: make the symbols local.
Local names start with a period, and EuroAssembler actually remembers them associated
with the name of the procedure or program in which the label was defined.
So End in the IntegerToASCII procedure, when renamed to .End,
will be stored internally as IntegerToASCII.End, and will not conflict with End
in the CalcL32 program.
What about the periods before the symbol name and the colons after it? The period at the beginning marks the symbol as local; in fact the symbol name is modified by prefixing the local name with the name of its namespace (program, procedure, structure). See also the namespace paragraph in the manual.
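The effect of the namespace prefix can be imitated with a toy symbol table in Python. This is only a model of the naming rule described above, not EuroAssembler's actual implementation; the class and method names are hypothetical.

```python
class SymbolTable:
    """Toy model: names starting with a period are qualified by the namespace."""

    def __init__(self):
        self.symbols = set()

    def define(self, name: str, namespace: str) -> str:
        # A local name such as ".End" is stored as "<namespace>.End".
        full = namespace + name if name.startswith(".") else name
        if full in self.symbols:
            raise ValueError("symbol " + full + " was already defined")
        self.symbols.add(full)
        return full


table = SymbolTable()
print(table.define("End", "CalcL32"))            # non-local name, kept as is
print(table.define(".End", "IntegerToASCII"))    # local name gets the prefix
print(table.define(".End", "ASCIItoInteger"))    # no conflict with the others
```

Defining the plain name End a second time in this table raises an error, which corresponds to the duplicate-label situation discussed above.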
The colon may, but need not, be appended after the symbol name to emphasize that it is a symbol
and not the name of a structure, register or instruction. Unlike in most other assemblers,
in €ASM a colon can be appended after the symbol name not only when defining it in the label field,
but also whenever the symbol is mentioned, for example in MOV RSI, Symbol:.
And if the colon is doubled, it also indicates the global visibility of the symbol,
so we don't have to explicitly specify it with GLOBAL Symbol, or with PUBLIC and EXTERN.
After changing the symbol names Next1, Next2, End1, End2 in libcvt32.asm
to their local versions by prefixing a period to each name, the program should compile
without errors and we can test it with longer numbers and different operations.
We can see that retrieving arguments from standard input works better than
getting them from the program command line.
Still, the ./CalcL32.x program had to be run
again after each example, since we have not yet programmed a transition
to a new input after each successful calculation. This is easily remedied
by simply including a jump to the beginning, i. e., End: JMP Start:
instead of TerminateProgram at the End label. Better yet, replace all jumps to End: with a jump to Start:.
Another change would be to replace the Linux kernel call with the
StdInput macro from the linapi.htm library,
which does essentially the same thing and is easily replaceable by the macro of the same name from
libraries for other operating systems.
The last thing to fix in the previous source code are the redundancies in the calculation
of the arithmetic operations Addition, Subtraction, Multiplication, Division:
loading the first number with MOV EAX,[Arg1] needs to be done only once,
and the value can then be used for all four possible operations.
Instructions
are repeated, so for the second, third and fourth arithmetic operations we can replace them
by jumping to the first one. The program source code in the calc.asm file is shortened
by these interventions:
So we have a working calc.asm calculator for 32-bit Linux. Porting to 32-bit Windows is easy:
instead of CalcL32 PROGRAM FORMAT=ELFX, WIDTH=32, ENTRY=Start: use CalcW32 PROGRAM FORMAT=PE, WIDTH=32, ENTRY=Start:
and instead of INCLUDE libcvt32.asm, linapi.htm use INCLUDE libcvt32.asm, winapi.htm.
That's it; the other instructions do not change, and the macros StdOutput, StdInput and TerminateProgram
change only their body but not their name.
It's a bit more complicated with the 64-bit program porting.
While 32-bit registers were used to handle both data and addresses in 32-bit mode,
in 64-bit mode the address width is increased to 64 bits, while the default data width
remains 32 bits (although it can also be increased to 64 bits by using registers starting with R).
Also remember that writing to a 32-bit register zeroes the upper 32 bits of the corresponding 64-bit register, too.
Thus, SUB EAX,EAX not only zeroes the EAX register, but also the upper (more significant)
half of the RAX register.
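The zero-extension rule can be modelled in Python with plain integers and bit masks. This is a sketch of the register behaviour, not assembler code, and the helper names are hypothetical. For contrast it also models a 16-bit write, which on x86-64 leaves the upper bits of the 64-bit register untouched.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF
MASK32 = 0xFFFF_FFFF
MASK16 = 0xFFFF


def write_eax(rax: int, value: int) -> int:
    """A 32-bit write replaces RAX with the zero-extended 32-bit value."""
    return value & MASK32


def write_ax(rax: int, value: int) -> int:
    """A 16-bit write keeps the upper 48 bits of RAX unchanged."""
    return (rax & MASK64 & ~MASK16) | (value & MASK16)


rax = 0x1122_3344_5566_7788
eax = rax & MASK32

after_sub_eax = write_eax(rax, (eax - eax) & MASK32)   # models SUB EAX,EAX
after_sub_ax = write_ax(rax, 0)                        # models SUB AX,AX

print(hex(after_sub_eax))
print(hex(after_sub_ax))
```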
Our library for converting integers to ASCII and back will look slightly different after porting to 64 bits.
Let's call it libcvt64.asm:
The calculator code for a 64-bit operating system will change only slightly,
as we will remain limited to 32-bit numbers, and working with non-address registers
will remain the same. Instead of the 32-bit libraries libcvt32.asm, linapi.htm, winapi.htm
we will include the libraries libcvt64.asm, linabi.htm, winabi.htm.
Instead of MOV ESI,Buffer it will be better to use LEA RSI,[Buffer].
Other instructions may remain the same as in the 32-bit variant:
Porting from 64-bit Linux to Windows is again easy: replace the linabi.htm library
with winabi.htm, change the name and format of the program from
CalcL64 PROGRAM FORMAT=ELFX to CalcW64 PROGRAM FORMAT=PE
and that's it. You get a calculator for 64-bit Windows.
Finding and removing program errors – debugging – is an essential part of a programmer's job, and this is doubly true in assembler. We'll demonstrate several methods for finding bugs.
In the previous examples we used the StdOutput macro to dump the specified arguments for checking.
This is a pretty good method of making sure that the processor passes through the places
in our program where we expect it to.
First of all, let's put the StdOutput dump right at the beginning, where the program ENTRY= points.
This will assure us that our program has loaded the correct library and that the dump to
standard output works. The text output by StdOutput doesn't need to be sophisticated,
it just needs to tell us that the program has reached a certain check-point. For example
StdOutput ="I am at line 123",Eol=Yes.
Very often it would be useful to see the contents of a particular register or memory location in addition to knowing where we are. While we could convert the contents of any register to decimal or hexadecimal ASCII form, store it in some temporary memory variable and then dump it using StdOutput, this is an inconvenient solution. It's better to use a specialized macro that does the same for all registers and leaves no trace in the debugged program (except for increasing its length).
The maclib/debug.htm library is available for debugging and contains a single Debug macro, which is independent of the operating system and of the width of the program being debugged (16, 32 or 64 bits). The price of this OS independence is that the Debug macro does not print the register information itself; instead it sends a formatted dump-string to a procedure with the default name DebugOutput, which is called as a callback. This procedure has to be included (temporarily) in the debugged program; fortunately, it is not difficult. DebugOutput's job is to write a string addressed by the rSI register, rCX bytes long, to the standard output. We can use the well-known StdOutput macro for the output.
Examples of the DebugOutput procedure for different program widths:
How to debug using the Debug macro?
Put EUROASM DEBUG=ON at the beginning of the program.
In addition to the API library (dosapi.htm, linapi.htm, linabi.htm, winapi.htm or winabi.htm), include the debug.htm library in the debugged program.
Insert Debug macroinstructions in critical places of our program, compile the program, run it, and then observe whether it passes through the places where it should.
The output of the computer state after each use of the Debug macro looks something like this in a 32-bit program:
The macro initially lists its position in the source code as "filename.ext"{1234},
i. e. the filename and the line number in curly braces, and then the contents of all GPRs
in hexadecimal notation. Inserting the macro into the program does not affect it,
all registers and flags are preserved.
Debug can also dump the contents of a single memory location whose address and size are specified
by the macro parameters. For example, Debug ESI, Size=32
will print the contents of the 32 bytes of memory at the address in ESI,
in addition to the contents of the GPRs.
The Size= parameter is in the range 1..256, the default value is 16.
In protected operating systems, care must be taken that the specified memory actually exists
(is allocated to the program), otherwise the program may crash.
A much more convenient approach to debugging is provided by specialized debugging applications – debuggers. For Linux we have the console application gdb or its graphical extension nemiver or ddd.
Very good debuggers are:
For DOS, Turbo Debugger, supplied by Borland along with Turbo Assembler or Turbo Pascal.
For 32-bit Windows, OllyDbg.
For 64-bit Windows, x64dbg.
The nice thing about these full-screen applications is that their authors have copied from each other
the uniform basic control using the function keys F4, F7, F8, etc.
We will use mostly the basic CPU screen which, divided into four parts, displays machine instructions,
register contents, memory contents and the stack. Stepping through the program (F7), possibly skipping
detailed procedures (F8), allows us to see how the registers, program memory
and stack have changed after each step.
The debugger shows the CPU addresses and disassembled machine code in the upper left quarter of the window.
The addresses correspond to what was assigned to them by the linker, we see them as hexadecimal numbers.
It would be useful to see the labels used in the source program instead.
Unfortunately, this linking of the numeric address to the symbols does not happen automatically,
even when a table of symbols with their addresses is present in the executable.
If we want to know the addresses that EuroAssembler has assigned to the symbols, let's look at the listing.
If we have left this enabled with the PROGRAM LISTGLOBALS=ON option,
at the end of the listing calc.asm.lst
we will see the virtual address (VA=) for each global symbol.
If we needed to know the address of other, non-global symbols, we would need to give them global visibility.
This could be done by adding explicit pseudoinstructions GLOBAL Division, Error, Multiplication etc.,
but it would be even easier to add two colons to the end of each label name.
After the new translation, we will see their global addresses in the listing:
OllyDbg for 32-bit Windows could even read addresses from the symbol table in the COFF object file, using the Ctrl-O keyboard shortcut. Although we translate the source code directly into an executable, without the intermediate COFF file, we could temporarily change PROGRAM FORMAT=PE
to PROGRAM FORMAT=COFF
and then generate the COFF object file and use it to load the symbol addresses with OllyDbg.