Author: soja_a

Posted: 08:38:35 2005-12-11

Modified: 09:44:35 2007-07-05 by fwaggle

An Introduction to Assembly Language Programming

by soja_a (©2000 alt.hacking)

All assembly language instructions are composed of three components: the opcode (short for "operation code"), the operand(s) and the addressing mode. By combining these three components, the actual machine language instruction can be "assembled." In this section, these three components will be discussed.

3.1 80x86 instruction set anatomy

As opposed to the Reduced Instruction Set Computing (RISC) paradigm, the 80x86 instruction set is of variable length, composed of several different modules which in turn can vary in size and number. The modules correspond to the three instruction components mentioned earlier and are organized according to the following scheme:

------------------------------------------------------------------------------
| prefix(es) | opcode    | ModR/M | SIB    | displacement   | immediate      |
| 0-4 bytes  | 1-2 bytes | 1 byte | 1 byte | 1,2 or 4 bytes | 1,2 or 4 bytes |
| (optional) |           | (both if needed)| (or none)      | (or none)      |
------------------------------------------------------------------------------

3.1.1 Prefixes

These are optional 1 byte codes that are classed into 4 categories: lock/repeat, segment override, operand size override and address size override. Up to one code from each category can be used.

3.1.2 ModR/M and SIB

These two 1-byte codes are used to specify the addressing modes discussed in Sect. 3.2. When a location in memory is specified, the ModR/M byte takes a value that reflects the type of addressing mode used ('Mod' = mode, 'R/M' = register/memory). The SIB (scale-index-base) byte is needed for a limited number of 32-bit addressing schemes. For detailed information about the codes used, see the Intel Architecture Software Developer's Manual.

[note: this is a *way* informative document, not for the newbie, running to some 266 pp. and available in PDF format on Intel's Web site]
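As a rough illustration of how the ModR/M byte packs its information, the sketch below uses Python purely as a calculator. The function name split_modrm is made up for this example. It splits a byte into the three bit fields described in the Intel manual: Mod in bits 7-6, a register (or opcode extension) field in bits 5-3, and R/M in bits 2-0. The value D8h is the ModR/M byte in one common encoding of MOV AX, BX; treat it as a sample value, since an assembler may choose a different encoding.

```python
def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    mod = (byte >> 6) & 0b11    # bits 7-6: addressing mode
    reg = (byte >> 3) & 0b111   # bits 5-3: register (or opcode extension)
    rm = byte & 0b111           # bits 2-0: register or memory operand
    return mod, reg, rm

# mod == 3 means the R/M field names a register rather than memory
print(split_modrm(0xD8))
```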

3.1.3 Displacement and immediate modules

In certain addressing schemes, a displacement value will follow the ModR/M byte. This displacement value can be either an 8-, 16- or 32-bit quantity. Likewise, for instructions using immediate mode addressing (see Sect. 3.2.2) an 8-, 16- or 32-bit quantity is needed for the immediate mode data.

3.2 Operands and Addressing modes

The 80x86 processor family supports 7 different types of operands, known as "addressing modes," used in conjunction with its instructions: register, immediate (constant), direct, indirect, indexed, based indexed, and based indexed plus displacement.

3.2.1 Register addressing

In this addressing mode, the operand(s) to the instruction are hardware registers. The instruction below uses register addressing:

MOV AX, BX ; moves the contents of BX to AX

where the opcode (MOV) operates on the two registers AX and BX, in this case moving the contents of BX into AX. (The ";" denotes a comment and the assembler ignores the remainder of the line)

Note also that the first operand is the "destination" and the second is the "source." This is the typical syntax of an assembly language instruction.

3.2.2 Immediate addressing

In this mode, a constant value is specified. The following instruction uses this mode:

MOV AX, 1 ; moves the value 1 to AX

where the MOV instruction in this case will transfer the value 1 (a constant) into register AX.

3.2.3 Direct (displacement-only) addressing

In this mode, a location in memory is specified by its numerical address. An example of this is:

MOV AX, DS:[100h] ; moves the contents of location DS:100h to AX

where the MOV instruction will find the value stored in memory location 100h in the DS segment and move it to the AX register. By default, the value in a direct addressing scheme is assumed to be an offset into the data segment (specified by DS).

3.2.4 Indirect (based) addressing

In this addressing mode, the contents of a register specify the location of the operand. This concept is analogous to the concept of 'pointers' in the C programming language. The instruction shown below uses indirect addressing:

MOV AX, [BX] ; moves the contents of the address contained in BX to AX

where '[BX]' indicates that the value of the BX register specifies the memory address of the operand that will be moved into AX. The registers that can be used in indirect addressing are BX, BP, DI and SI. BX, SI and DI all use DS as their default segment. The base pointer register BP uses the stack segment register SS as its default.

In all cases, the default segment can be overridden by inclusion of an explicit segment register in the instruction:

MOV AX, ES:[BX] ;moves the contents of the location ES:BX to AX

3.2.5 Indexed addressing

In this addressing mode, the location fetched is the sum of a base address plus a displacement value. This addressing mode is used to reference array elements and individual items from records. The instruction shown below uses indexed addressing:

MOV AX, 100h[BX] ; moves the contents of the address specified by
                 ; the contents of BX added to 100h

where '100h[BX]' indicates that the contents of the BX register are added to 100h, and the resultant sum used as the memory address of the operand (BX itself is not modified). In addition to BX, DI, SI and BP can also be used. As with indirect addressing, BX, DI and SI by default use the DS segment whereas BP uses the SS segment. Also, as with indirect addressing, the default segment can be explicitly overridden, as in:

MOV AX, ES:100h[BX]; moves from BX+100h in the extra segment

It should also be noted that accepted alternative notations for indexed addressing are:

MOV AX, [BX+100h]  ; does the same thing as the first example
MOV AX, [100h][BX] ; does the same thing as the first example

Additionally, the order of register and displacement is flexible, so other syntaxes are also legal.

3.2.6 Based indexed addressing

This addressing mode combines indexed and indirect addressing by using the contents of a second register as the displacement in an indexed addressing scheme. An example is the instruction:

MOV AX, [BX][DI] ; moves the contents of the address specified by
                 ; the sum of the contents of BX and DI

Legal examples use either BX or BP as the base register and either DI or SI as the index register, giving rise to four possible combinations. When BX is used as the base register, the default segment is the data segment (DS), but when BP is the base register, the default segment is the stack segment (SS). As before, these defaults can be overridden by the explicit use of a segment register:

MOV AX, ES:[BX][DI]; as above, but source is in extra segment

Again, an alternate syntax would be:

MOV AX, [BX+DI] ; based indexed addressing

3.2.7 Based indexed plus displacement addressing

In this addressing mode, a constant displacement is used in addition to the base and index registers. An example of this type of addressing is:

MOV AX, 100h[BP][SI] ; moves the data found at BP+SI+100h to AX

The four register combinations given in 3.2.6 can also be used in this scheme. The alternate syntaxes shown in 3.2.5 can also be applied to this addressing mode.

In summary, the addressing modes listed in Sec. 3.2.3-3.2.7 can be generalized as:

[displacement][base register][index register]

where up to two of the three elements can be omitted, and where either BX or BP can be used as base register and either DI or SI as index register. In addition, a segment register (segment:) can prefix any of these combinations.
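The generalization above can be sketched as a small Python model. This is an illustration, not how an assembler works: the function name effective_address and the regs dictionary are made up for the example, and the 16-bit wraparound of the offset is an assumption about 16-bit addressing that the text does not state. It computes the offset from the displacement, base and index, and picks the default segment (SS when BP is the base, DS otherwise) unless an override is given.

```python
def effective_address(disp=0, base=None, index=None, regs=None, override=None):
    """Return (segment name, 16-bit offset) for [disp][base][index]."""
    regs = regs or {}
    if base not in (None, "BX", "BP"):
        raise ValueError("base register must be BX or BP")
    if index not in (None, "SI", "DI"):
        raise ValueError("index register must be SI or DI")
    offset = disp
    if base:
        offset += regs[base]
    if index:
        offset += regs[index]
    # BP-based addresses default to SS, everything else to DS
    segment = override or ("SS" if base == "BP" else "DS")
    return segment, offset & 0xFFFF  # assumed: offsets wrap at 16 bits

regs = {"BX": 0x0200, "BP": 0x1000, "SI": 0x0010, "DI": 0xFFF0}
print(effective_address(0x100, "BX", regs=regs))        # models 100h[BX]
print(effective_address(0x100, "BP", "SI", regs=regs))  # models 100h[BP][SI]
```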

3.3 Fundamental 80x86 Opcodes

The 80x86 processor family supports 20 basic instruction types that account for the vast majority of the instructions executed by a typical program. It is these basic opcodes that we will focus on first.

3.3.1 Data transfer

3.3.1.1 'MOV' - move a value to a specified address

The MOV opcode is used typically to move a value into a register. The generic form of the instruction is:

MOV <dest>, <src>

where <dest> and <src> are the operands, using any of the addressing modes discussed in Sect. 3.2. Using a specific example:

MOV AX, 1 ; moves the value 1 to AX

The instruction specifies to the CPU to move the constant value 1 into the accumulator (the AX register).

Certain restrictions apply to the two operands. First, the <dest> operand can be a register or memory location, but not a constant (obviously). Second, the <dest> and <src> operands cannot both be memory locations -- one must be a register or constant. Third, if <dest> is a segmentation register, you cannot transfer a constant into it, so <src> cannot be immediate mode data.

*** NOTE: Similar restrictions apply to most two-operand 80x86 instructions ***

It is also important to remember that the size of the <dest> and <src> operands must be the same. Since the 80x86 processors support 8-, 16- and 32-bit operations you must choose the sizes of each operand to match.

Since register names imply their sizes, the major concern is immediate mode data and memory locations. In the case of immediate mode data, the assembler will pad a value with zeroes to make it fit in a register -- but will return an error if the constant value is too large for the register. Memory locations are more tricky, as the assembler is given an address without reference to size. The way to avoid confusion is to explicitly state the size in the memory reference. Three examples are:

MOV AH, byte ptr [100h]      ; transfers a byte from 100h to AH
MOV word ptr [SI], BX        ; transfers a word from BX to the address
                             ; pointed to by SI
MOV EAX, dword ptr [BX+100h] ; transfers a double word from the address
                             ; pointed to by BX+100h to EAX

In the first case, an 8-bit quantity is transferred; in the second, a 16-bit quantity; and in the third, a 32-bit quantity. It is also worth noting that the third case is limited to 80386 and higher processors. When <dest> is a segmentation register, the transfer is always 16-bit (word).

The MOV instruction leaves all flags unaffected.

3.3.1.2 'MOVZX' - move a value with zero-fill

The MOVZX instruction is used to move data into a larger destination, with the remaining space being filled with zeroes. The transfer can be 8 bits into a 16-bit destination, 8 bits into 32, or 16 bits into a 32-bit destination.

The form of the instruction is:

MOVZX <dest>, <src>

where <dest> is the larger operand and <src> the smaller. Specific examples are:

MOVZX AX, BL  ; transfers BL into AL and zeroes AH
MOVZX EAX, BX ; transfers BX into EAX with zero fill

The same restrictions apply to the operands of MOVZX as with MOV, with the single exception that (obviously) the operands are not of the same size. Also, flags are left unchanged.

3.3.1.3 'MOVSX' - move a value with sign extension

The MOVSX instruction is used to move data into a larger destination, with the remaining space being filled according to the sign of the binary quantity being transferred. In this instance, the binary quantity is explicitly treated as a signed integer, so the sign bit of a negative number must be transferred. The transfer takes the same size operands as MOVZX. The form of the instruction is:

MOVSX <dest>, <src>

where <dest> is the larger operand and <src> the smaller. Specific examples are:

MOVSX AX, byte ptr [BX] ; transfers byte pointed to by BX into AX
MOVSX EAX, AX           ; sign extends AX into 32 bits

The same restrictions apply to the operands of MOVSX as with MOVZX. Also, flags are left unchanged.
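The difference between zero fill and sign extension can be sketched as follows. This is a Python model of the arithmetic only; the function names zero_extend and sign_extend are made up for this example. The same 8-bit pattern widens to different 16-bit values depending on which rule is used.

```python
def zero_extend(value, from_bits):
    """MOVZX model: keep the low from_bits bits, upper bits become 0."""
    return value & ((1 << from_bits) - 1)

def sign_extend(value, from_bits, to_bits):
    """MOVSX model: copy the source's sign bit into every new upper bit."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):     # sign bit set: value is negative
        value -= 1 << from_bits            # reinterpret as a negative number
    return value & ((1 << to_bits) - 1)    # re-encode at the wider size

print(hex(zero_extend(0x80, 8)), hex(sign_extend(0x80, 8, 16)))
```

A source byte with the sign bit clear (7Fh, say) comes out the same under both rules.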

3.3.1.4 'XCHG' - exchanges two values

The XCHG instruction exchanges the contents of the two operands (naturally). Its generic form is:

XCHG <operand>, <operand>

By definition, neither operand can be a constant, and recall that both cannot be memory locations either. Therefore, in practical terms at least one operand must be a register. So, three specific examples are:

XCHG AX, BX                ; swaps two registers, AX and BX
XCHG AX, [BX]              ; swaps two 16-bit quantities
XCHG EAX, dword ptr [100h] ; swaps two 32-bit quantities (386 and above)

In all cases, the order of operands is irrelevant and sizes must be matched. And, like the MOV instruction, XCHG doesn't affect any flags.

3.3.1.5 'LxS' - load a register pair

The LxS family of instructions moves a 32-bit quantity into a general purpose register and segmentation register. They take the general form:

LxS <dest>, <src>

where <src> is a 32-bit quantity and <dest> is a general purpose (16-bit) register. The actual opcode depends on which segmentation register gets the other 16 bits of <src>. Two specific examples are:

LDS AX, [100h]  ; loads AX and DS with quantity at 100h
LES BX, 10h[SI] ; loads BX and ES with quantity at SI+10h

In addition, the 80386 and higher processors have the LFS, LGS and LSS opcodes for loading register pairs that include the FS, GS and SS segmentation registers. In all cases, the low order word (bits 0-15) of <src> is loaded into the general purpose register and the high order word (bits 16-31) is loaded into the segmentation register. These instructions don't affect any flags.
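The split of the 32-bit source into two words can be sketched in Python. The function name load_far_pointer and the sample bytes are made up for this illustration, and the sketch relies on the 80x86 storing the low-order byte first in memory.

```python
def load_far_pointer(memory, address):
    """LxS model: return (general register word, segment register word)."""
    low = int.from_bytes(memory[address:address + 2], "little")       # bits 0-15
    high = int.from_bytes(memory[address + 2:address + 4], "little")  # bits 16-31
    return low, high

memory = bytes([0x34, 0x12, 0x00, 0xB8])  # the dword B8001234h, low byte first
register, segment = load_far_pointer(memory, 0)
print(hex(register), hex(segment))
```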

3.3.1.6 'PUSHxx', 'POPxx' - send values to/from the stack

The PUSHxx family of instructions moves a 16- or 32-bit quantity onto the stack. It takes the general form:

PUSHxx <operand>

The POPxx family performs exactly the opposite task and takes the form:

POPxx <operand>

The most basic forms of these commands are PUSH and POP:

PUSH AX ; pushes AX onto the stack
POP BX  ; pops the top of the stack into BX

In addition to the general purpose registers, the segmentation registers can be pushed and (with the exception of CS) popped, as can any memory address using the addressing modes discussed previously. On 80286 and later processors, immediate mode values can be pushed (though naturally they cannot be popped). On 80386 and above processors, either 16- or 32-bit quantities can be pushed and popped. Although register names define their size, an ambiguity arises when a memory location is specified: the assembler cannot tell what size data is intended to be pushed/popped. This ambiguity can be prevented in one of two ways. The first way is to use PUSHW and POPW for word (16-bit) operations and PUSHD and POPD for double word (32-bit) operations. The second way is to use explicit typing:

PUSH word ptr [BX] ; push 16 bits from the address pointed to by BX
POP dword ptr [BX] ; pop 32 bits to the address pointed to by BX

There are other forms of PUSH and POP as well. PUSHA and POPA are implemented on 80286 and above processors and push/pop all the 16-bit general purpose registers in the order: ax, cx, dx, bx, sp, bp, si and di (note that POPA removes them from the stack in opposite order). On the 80386 and above, PUSHAD and POPAD do the same thing with 32-bit registers. These instructions take no operands.

The special case instructions PUSHF and POPF are used to push and pop the flag register. The major purpose of these instructions is to modify the trace flag, which can only be done by the following sequence: PUSHF followed by POP <register>, then modifying bit number 8 of the general purpose register, PUSHing it back onto the stack and a POPF to restore the flag register with a modified trace flag.
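The trace flag sequence described above can be traced with a toy model. Here a Python list stands in for the stack and a plain integer for the flag register; the function name set_trace_flag is made up, and the choice of AX as the scratch register is an assumption.

```python
TRACE_FLAG = 1 << 8   # bit number 8 of the flag register

def set_trace_flag(flags):
    """Mimic PUSHF / POP AX / OR AX, 0100h / PUSH AX / POPF."""
    stack = []
    stack.append(flags)   # PUSHF: flags go onto the stack
    ax = stack.pop()      # POP AX: copy them into a register
    ax |= TRACE_FLAG      # OR AX, 0100h: set bit 8
    stack.append(ax)      # PUSH AX: put the modified copy back
    return stack.pop()    # POPF: the modified value becomes the flags

print(hex(set_trace_flag(0x0202)))
```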

3.3.2 Arithmetic operations

The arithmetic logic unit (ALU) is part of the 80x86 CPU. It is able to perform basic binary (integer) arithmetic operations: add, subtract, multiply, divide, negate and compare. As with many 80x86 instructions, special smaller versions of the arithmetic instructions are available when the accumulator (AX) is used as the destination operand.

Therefore, programmers can often save space by favoring AX as the destination, especially for the arithmetic instructions. Unlike the data movement operations, these instructions will affect selected flags.

3.3.2.1 INC - add one to (increment) an operand

This is the most basic arithmetic operation of any processor. It takes the form:

INC <operand>

where the operand can be a general purpose register or a memory location using any of the previously discussed addressing modes (segmentation registers cannot be incremented). Some specific examples are:

INC AX                ; add one to the accumulator
INC SI                ; add one to the source index register
INC word ptr 100h[BX] ; add one to the word at the location BX+100h

INC sets the overflow, sign, zero, auxiliary carry and parity flags according to the result; all other flags, most notably carry, are left unchanged.

3.3.2.2 ADx - add two quantities together

These are more complex addition operations, involving two operands. They take the general form:

ADD <dest>, <src>
ADC <dest>, <src>

The ADD instruction adds the two operands and places the result in <dest>. The <dest> operand can be either a register or a location in memory. The <src> operand can be either register, memory or immediate data (but you cannot do memory to memory addition). Two examples:

ADD AX, 10h     ; adds 10h to AX and stores the result in AX
ADD 10h[BX], AX ; adds AX to the value located at [BX]+10h

The ADC instruction behaves identically to ADD, with one difference: it adds the value of the carry flag (prior to the execution of the ADC) into the sum. In practice, this means that ADC behaves identically to ADD when the carry flag is cleared, but will result in the sum+1 when the carry flag is set. Both instructions affect the same flags as INC, along with the carry flag, which is set if an unsigned arithmetic overflow occurs and cleared otherwise.
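One common use of ADC, not spelled out in the text, is adding quantities wider than a register: ADD handles the low words and ADC folds the resulting carry into the high words. The Python sketch below models that idea; the function names add16 and add32_in_halves are made up for the example.

```python
def add16(a, b, carry_in=0):
    """Model a 16-bit ADD (carry_in=0) or ADC (carry_in = old carry flag)."""
    total = a + b + carry_in
    return total & 0xFFFF, total >> 16   # (result, new carry flag)

def add32_in_halves(a, b):
    """Add two 32-bit values one word at a time: ADD, then ADC."""
    low, carry = add16(a & 0xFFFF, b & 0xFFFF)    # ADD on the low words
    high, carry = add16(a >> 16, b >> 16, carry)  # ADC on the high words
    return (high << 16) | low, carry

print(add16(0xFFFF, 1))
print(add32_in_halves(0x0001FFFF, 0x00000001))
```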

3.3.2.3 DEC - decrement an operand

This instruction is the analogue of INC in which we subtract one instead of adding it. Its generic form is:

DEC <operand>

where the operand can be a general purpose register or a memory location using any of the previously discussed addressing modes (segmentation registers cannot be decremented). Some specific examples are:

DEC AX                ; subtract one from the accumulator
DEC SI                ; subtract one from the source index register
DEC word ptr 100h[BX] ; subtract one from the word at the location BX+100h

The same flags are set by DEC as were by INC.

3.3.2.4 SxB - subtract one quantity from another

These are more complex subtraction operations, involving two operands. They take the general form:

SUB <dest>, <src>
SBB <dest>, <src>

The SUB instruction subtracts <src> from <dest> and places the result in <dest>. The <dest> operand can be either a register or a location in memory. The <src> operand can be either register, memory or immediate data (but you cannot do memory from memory subtraction). Two examples:

SUB AX, 10h     ; subtracts 10h from AX and stores the result in AX
SUB 10h[BX], AX ; subtracts AX from the value located at [BX]+10h

The SBB instruction behaves identically to SUB, with one difference: it also subtracts the value of the carry flag (prior to the execution of the SBB) from <dest>. These instructions affect the same flags as ADD and ADC.

3.3.2.5 CMP - compare two operands

The compare function is the same as a SUB with the exception that it doesn't store the result of the subtraction anywhere. Its purpose is to set the flags that a SUB instruction would set. Its general form is:

CMP <dest>, <src>

An example would be:

CMP AX, 10h ; compares 10h and AX

In this case, the flags take the same values that SUB AX, 10h would produce, but AX itself is left unchanged.

This instruction will be used primarily to set the flags for a conditional jump (see Sec. 3.3.4). For this purpose, the following interpretations can be placed on flag values:

                    unsigned    signed
<dest> =  <src>     Z=1         Z=1
<dest> <> <src>     Z=0         Z=0
<dest> <  <src>     C=1         S=0,O=1 OR S=1,O=0
<dest> >= <src>     C=0         S=1,O=1 OR S=0,O=0

Note that no single flag condition in this table tests for <dest> > <src>; that test has to combine Z=0 with the >= condition. Therefore, the choice of which operand is <dest> and which is <src> does make a difference, even though neither operand will be affected by the CMP. Of course, in the example above the immediate data must be <src>.
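The flag interpretations in the table can be checked with a small Python model of a 16-bit compare. The names cmp16 and signed_less are made up for this sketch, and the overflow formula is the standard one for subtraction rather than anything quoted from the text.

```python
def cmp16(dest, src):
    """Return the Z, C, S and O flags that CMP dest, src would produce."""
    result = (dest - src) & 0xFFFF
    z = int(result == 0)
    c = int(dest < src)    # unsigned borrow
    s = result >> 15       # sign bit of the result
    # signed overflow: operands differ in sign and the result's sign
    # differs from the sign of dest
    o = (((dest ^ src) & (dest ^ result)) >> 15) & 1
    return {"Z": z, "C": c, "S": s, "O": o}

def signed_less(flags):
    """Signed <dest> < <src> holds when S and O differ."""
    return flags["S"] != flags["O"]

print(cmp16(0x0005, 0x0010))   # 5 compared with 16
print(cmp16(0x8000, 0x0001))   # -32768 compared with 1 (as signed words)
```

In the second call the unsigned reading says <dest> >= <src> (C=0) while the signed reading says <dest> < <src> (S and O differ), which is why both columns are needed.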

3.3.2.6 NEG - produce the two's complement of the operand

The NEG command negates its operand. In signed integer arithmetic, that involves taking its "two's complement" (see Sec. 2.1 for details). Its general form:

NEG <operand>

An example of this is:

NEG AX ; AX = -AX

If the initial value of the 32-bit register EAX were 018F7AFFh, its value after a NEG EAX instruction would be FE708501h. Normally, the NEG instruction sets the carry flag. When the operand has a value of zero, NEG leaves it unchanged and clears the carry flag. The other special case arises when the argument to NEG has the sign bit set with all other bits cleared (i.e., it has the largest negative value allowed for a signed integer). In this case, NEG will again leave it unchanged but will set the overflow flag. In all cases, the NEG instruction will affect the Z, S, P and A flags just as a SUB instruction would.
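The two's complement arithmetic and the two special cases can be worked through in Python. This is a sketch: the function name neg is made up, and the carry rule (set for any nonzero operand) follows Intel's description of the instruction.

```python
def neg(value, bits=16):
    """Model NEG: return (result, carry flag, overflow flag)."""
    mask = (1 << bits) - 1
    result = (~value + 1) & mask               # invert every bit, then add one
    carry = int(value != 0)                    # clear only for a zero operand
    overflow = int(value == 1 << (bits - 1))   # most negative value has no positive twin
    return result, carry, overflow

result, carry, overflow = neg(0x018F7AFF, bits=32)
print(hex(result), carry, overflow)
```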

3.3.2.7 MUL, IMUL - integer multiplication

Unlike addition and subtraction, multiplication comes in two different forms: unsigned (MUL) and signed (IMUL). In addition, because of Intel's lack of foresight when designing the original 8086 instruction set, the two forms of multiplication are not strictly parallel. The general form for MUL is:

MUL <operand>

Note that MUL, unlike other arithmetic commands, takes only one argument. This was done out of necessity when the 8086 was designed, and lives on only for backwards compatibility. As a result, though, the instruction will only multiply the accumulator (AX) by the operand, placing the result back in the accumulator (extended into DX as described below). An example is:

MUL BX ; DX:AX = AX * BX

The operand can be a register or a location in memory, using any of the addressing modes. However, it cannot be a constant, as immediate mode addressing is not supported by this opcode. This is a severe restriction on the utility of MUL. One other important "feature" of MUL is that the size of the operand must be doubled to ensure that the result can be fit. Thus, an 8-bit operand requires a 16-bit result, a 16-bit operand requires a 32-bit result and 32-bit operands require 64-bit results. This is accomplished in the following way:

operand size    multiplicand    destination
byte            AL              AX
word            AX              DX:AX
dword           EAX             EDX:EAX (only 80386 and later)

The meaning of the DX:AX symbolism is that the high order word of the product is stored in DX and the low order word in AX. Likewise, EDX:EAX means the same thing for double word quantities. If the multiplication product exceeds the size of the operand (i.e., the product of MUL BX is larger than 16 bits), the carry and overflow flags will be set. The MUL instruction also changes the values of the S, Z, P and A flags, but in unpredictable fashion.
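The DX:AX split for a word-sized MUL can be sketched in Python. The function name mul16 is made up for the example; the third value it returns stands for the shared state of the carry and overflow flags.

```python
def mul16(ax, operand):
    """Model MUL with a 16-bit operand: return (DX, AX, carry/overflow flag)."""
    product = ax * operand
    dx, ax = product >> 16, product & 0xFFFF   # high word, low word
    return dx, ax, int(dx != 0)                # flags set when DX is needed

print(mul16(0x0100, 0x0010))   # product fits in 16 bits
print(mul16(0x1000, 0x0010))   # product spills into DX
```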

The IMUL instruction has several different forms:

IMUL <operand>
IMUL <register>, <immediate>            (on 80286 and later)
IMUL <register>, <operand>, <immediate> (on 80286 and later)
IMUL <register>, <operand>              (on 80386 and later)

In all of these cases, <operand> can be either a register or a memory location, and either 16 or 32 bits (8 bits is allowed in the first form only). <register> can be either a 16- or 32-bit register, and <operand> should be of the same size as <register>. <immediate> can be either an 8-bit constant or a constant of the same size as <register>. The first form operates just like the MUL instruction, except that it treats its operands as signed integers. The next two forms were a result of Intel's realization that it's often very useful to be able to multiply by a constant (especially for implementation of 2- and higher dimensional matrices). Two examples are:

IMUL AX, 100h     ; AX = AX * 100h
IMUL AX, BX, 100h ; AX = BX * 100h

Unlike the MUL instruction (and the first form of IMUL), these multiplications do not double the space of the operands. Notice that the product is being stored in a register that's the same size as the operand. If the product exceeds the size of the register, the carry and overflow flags are set and you lose the extra high order bits. As with MUL, the other flags are scrambled. The last form is the most general:

IMUL AX, 10h[BX] ; AX = AX * [BX+10h]

This form most closely resembles ADD and SUB. As with those instructions, <dest> and <src> must be the same size. And, as with the other IMUL instructions, the carry and overflow flags will be set if the product doesn't fit in the destination register; all other flags will be set to indeterminate values.

3.3.2.8 DIV, IDIV - integer division

Like MUL and IMUL, the 80x86 processors support both unsigned (DIV) and signed (IDIV) integer division. Their general forms are:

DIV <operand>
IDIV <operand>

where <operand> can be either a register or memory location. Their operation bears a close resemblance to that of the original form of the MUL and IMUL instructions: by definition the numerator is the accumulator and the denominator is the operand. Unlike MUL, the numerator is twice the size of the denominator, so the instruction doubles the size of the operand when dealing with the accumulator. Therefore, in the examples:

DIV BL  ; AL = AX/BL, AH = MOD(AX,BL)
DIV BX  ; AX = DX:AX/BX, DX = MOD(DX:AX,BX)
DIV EBX ; EAX = EDX:EAX/EBX, EDX = MOD(EDX:EAX,EBX) (80386 & later)

the truncated integer produced by AX/BL is stored in AL and the remainder in AH. When a 16-bit operand is used (BX), the numerator is a 32-bit quantity defined by using DX as the high order word and AX as the low order word. The truncated integer is stored in AX and the remainder in DX. The analogous situation occurs when a 32-bit operand (EBX) is used. The IDIV instruction operates in a similar fashion on signed integer quantities. Unlike the other arithmetic instructions, DIV and IDIV can result in a fatal error if the denominator is zero or the result of division overflows the accumulator. In both cases, an INT 0 system trap will result, BIOS will produce a "divide by zero" or "division error" message and program execution will abort. The flags are scrambled as a result of any DIV or IDIV instruction.
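The word-sized DIV, including both failure cases, can be modeled in Python. The function name div16 is made up for this sketch, and Python exceptions stand in for the INT 0 trap.

```python
def div16(dx, ax, operand):
    """Model DIV with a 16-bit operand: return (AX quotient, DX remainder)."""
    if operand == 0:
        raise ZeroDivisionError("divide error: denominator is zero")
    numerator = (dx << 16) | ax    # DX is the high word, AX the low word
    quotient, remainder = divmod(numerator, operand)
    if quotient > 0xFFFF:
        raise OverflowError("divide error: quotient does not fit in AX")
    return quotient, remainder

print(div16(0x0000, 100, 7))
print(div16(0x0001, 0x0000, 2))
```

Dividing a numerator of 10000h by 1 would need a 17-bit quotient, so this model raises OverflowError for it. This is the situation the text describes as overflowing the accumulator.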

3.3.3 Boolean operations

These operations explicitly deal with binary quantities. Most involve what is known as "Boolean algebra," a branch of math dealing with binary operations. The most common Boolean operations are AND, OR and NOT, though several other (less familiar) instructions are also included in this section.

3.3.3.1 AND - perform a logical AND

This instruction implements one of the fundamental Boolean algebra operations: the logical AND. AND is defined as giving a result of 1 only when both operands are one (it's like the concept of overlap in set theory). One way to define the AND (or any other Boolean) operation is by a 'truth table':

AND       arg 1
          0    1
 a      -----------
 r   0  | 0  | 0  |
 g      |----|----|
     1  | 0  | 1  |
 2      -----------

What the truth table shows is that an AND will give a result of 1 if and only if both operands are 1. Since the 80x86 processor deals with quantities that occupy multiple bits, we must expand the definition of AND to take into account 8-, 16- and 32-bit quantities. Fortunately, they can all be performed as bitwise operations (i.e., the result is the same as treating each bit independently). The generic format of the 80x86 AND instruction is:

AND <dest>, <src>

where <dest> and <src> can each be either a register or a memory location (though not both memory locations) and <src> can also be immediate mode (constant) data. Both <dest> and <src> must be of equal size: 8, 16 or 32 (on 80386 and later) bits. A specific example is:

AND AX, word ptr 10h[BX] ; AX = AX and [BX+10h]

where the result of the logical AND between AX and the 16 bit memory location specified by [BX+10h] is stored in the accumulator AX. The AND instruction clears the carry and overflow flags and sets the zero, sign and parity flags according to the result.

Note: most masking operations will use an AND instruction to mask bits to 0.

3.3.3.2 OR - perform a logical OR

This instruction implements another of the fundamental Boolean operations: the logical OR. An OR operation results in 1 if either of the operands is a 1. This operation is akin to the concept of union in set theory. Its truth table looks like:

OR        arg 1
          0    1
 a      -----------
 r   0  | 0  | 1  |
 g      |----|----|
     1  | 1  | 1  |
 2      -----------

In virtually all respects, the OR instruction behaves analogously to the AND instruction. It takes the same operands and affects the flags in identical fashion. It can be used to mask bits to 1.

3.3.3.3 XOR - perform an exclusive OR

The exclusive OR operation gives a result of 1 when the operands are unlike and 0 when they are alike. Its truth table therefore looks like:

XOR       arg 1
          0    1
 a      -----------
 r   0  | 0  | 1  |
 g      |----|----|
     1  | 1  | 0  |
 2      -----------

In all other ways, it behaves like AND and OR in its operation. XOR can be used to set a register to zero, since

XOR BX, BX ; zeroes BX

is shorter in length than the corresponding MOV instruction:

MOV BX, 0 ; zeroes BX

3.3.3.4 NOT - logical negation

Unlike arithmetic negation (see the NEG instruction), logical negation produces the Boolean complement of an argument: 1 is changed to 0, and vice versa. Again, this is done as a bitwise operation on the 80x86 processors, so NOT(1770h) = E88Fh. Its general form is:

NOT <operand>

where the <operand> can be a register or memory location. A specific example is:

NOT AX ; take the logical complement of AX

Unlike the other logical operations, NOT operates on a single operand and places the result back into the source. Also, it leaves all flags unchanged.
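The masking uses mentioned in this section can be tried out with Python's bitwise operators, which follow the same truth tables. The variable names and mask values below are chosen only for illustration; the final mask with FFFFh is needed because Python integers are not limited to 16 bits.

```python
MASK16 = 0xFFFF

ax = 0x1770
low_byte = ax & 0x00FF        # AND masks the upper eight bits to 0
with_high_bit = ax | 0x8000   # OR masks bit 15 to 1
toggled = ax ^ 0x00FF         # XOR flips only the selected bits
zeroed = ax ^ ax              # XOR of a value with itself is always 0
complement = ~ax & MASK16     # NOT, kept to 16 bits

print(hex(low_byte), hex(with_high_bit), hex(toggled), zeroed, hex(complement))
```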

3.3.3.5 Sxx - Shift instructions

Another logical operation that can be performed is the shift operation, in which bits are moved within a data element. The 80x86 shift instructions come in 4 permutations: left/right and logical/arithmetic. Their generic form is:

Sxx <operand>, <count>

where the <operand> can be a register or memory location and <count> is either immediate mode (constant) data (on 80286 or later) or the CL register. Some specific examples are:

SHL AX,4            ; perform a logical shift to the left by 4 bits
SHR word ptr [BX],1 ; perform a logical shift to the right by 1 bit
SAL AX,CL           ; perform an arithmetic shift to the left by CL bits
SAR AX,4            ; perform an arithmetic shift to the right by 4 bits

The basic shift operation involves moving bits within a byte, word or double word either to the left or to the right. The value contained in CL or specified in immediate mode must be no larger than the number of bits in the operand. Because of the architecture of the 80x86 family, the SHL and SAL instructions are synonymous and correspond to the same machine language instruction. The behavior of an SHL instruction is illustrated below:

      HO  ...  4  3  2  1  0
     ----------------------
     |  |  |  |  |  |  |  |
     ----------------------
C <- <- <- <- <- <- <- <- 0

In this example, the value of the low order bit (0) is shifted to its left (bit 1) and bit 0 is filled with a value of zero. Bit 1, in turn, has its value shifted to bit 2, bit 2 to bit 3 and so on until we reach the high order bit (HO) which receives the value of the bit to its right and sends its value to the carry flag (C). Therefore, if the AL register has a starting value of 6Ah (01101010), a SHL AL,1 instruction results in a value of D4h (11010100) in AL and the carry flag cleared. Note that in the illustration and example, a one bit shift is described. A multiple bit shift can be considered to be the equivalent of a repeated number of single bit shifts. In other words, a SHL AL,3 instruction when AL has a starting value of 6Ah results in a value of 50h (01010000) with the carry bit set. The zero, sign and parity flags are set according to the result. For a one bit shift, the overflow flag is set if the sign bit changed and cleared otherwise; for longer shifts it is undefined.

Note that if a zero bit shift is specified, the flags are left untouched.
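The "repeated single bit shifts" view of SHL/SAL translates directly into a short Python loop. The function name shl and its carry parameter (the carry flag's value before the shift) are made up for this sketch.

```python
def shl(value, count, bits=8, carry=0):
    """Model SHL/SAL as repeated one-bit shifts: return (result, carry flag)."""
    for _ in range(count):
        carry = (value >> (bits - 1)) & 1          # high order bit goes to carry
        value = (value << 1) & ((1 << bits) - 1)   # low order bit is filled with 0
    return value, carry

print(shl(0x6A, 1))
print(shl(0x6A, 3))
```

With a count of zero the loop never runs, so the incoming carry value is returned untouched.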

The SHR instruction performs an analogous operation to SHL/SAL, except that bits are shifted to the right. A one bit SHR is illustrated below:

     HO  ...  4  3  2  1  0
    ----------------------
    |  |  |  |  |  |  |  |
    ----------------------
0 -> -> -> -> -> -> -> -> C

In this case, the low order bit is transferred to C and the high order bit is filled with 0. The flags are affected in the same way as with SHL/SAL. The SAR instruction is an arithmetic (as opposed to logical) shift, meaning that the value placed into the high order bit is not always a zero. Rather, in an SAR instruction the value of the high order bit remains unchanged. An SAR is shown below:

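     HO  ...  4  3  2  1  0
    ----------------------
    |  |  |  |  |  |  |  |
    ----------------------
     -> -> -> -> -> -> -> C

An SAR sets the carry flag to the last bit shifted out and sets the zero, sign and parity flags according to the result; a one bit SAR always clears the overflow flag. (No flags are changed if the bit count is zero)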

Although shifts are often used to unpack data, another use for them is for multiplication by powers of 2 (SHL/SAL) and integer division by powers of 2 (SAR). However, when negative numbers are divided using SAR, the result is different than what IDIV would give. This results from rounding behavior: IDIV rounds negative results up (toward zero) while SAR rounds down (toward negative infinity). Such behavior can be useful at times.
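
The rounding difference is easiest to see with a small negative number. The sketch below models both operations in Python on 8-bit values; the helper names sar, to_signed and idiv_quotient are hypothetical and exist only for this illustration.

```python
def sar(value, count, bits=8):
    """Model SAR: shift right while keeping the high order (sign) bit."""
    mask = (1 << bits) - 1
    value &= mask
    sign = value & (1 << (bits - 1))
    for _ in range(count):
        value = (value >> 1) | sign   # high order bit remains unchanged
    return value


def to_signed(value, bits=8):
    """Interpret a `bits`-wide pattern as a two's complement number."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def idiv_quotient(dividend, divisor):
    """Quotient truncated toward zero, which is how IDIV rounds."""
    quotient = abs(dividend) // abs(divisor)
    return quotient if (dividend < 0) == (divisor < 0) else -quotient


# -7 is F9h as an 8-bit value.
shifted = to_signed(sar(0xF9, 1))   # -4: rounded toward negative infinity
divided = idiv_quotient(-7, 2)      # -3: rounded toward zero
```

For non-negative values the two approaches agree, so the difference only matters when the dividend is negative and not an exact multiple of the divisor.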

3.3.3.5 Rxx - Rotate instructions

The rotate operation is another logical operation that manipulates bits.

The 80x86 rotate instructions come in 4 permutations: left/right and ordinary/through carry. Their generic form is:

Rxx <operand>, <count>

where the <operand> can be a register or memory location and <count> is either immediate mode (constant) data (on 80286 or later) or the CL register. Some specific examples are:

ROL AX,4              ; rotate to the left by 4 bits
RCL word ptr [BX],1   ; rotate through carry to the left by 1 bit
ROR AX,CL             ; rotate to the right by CL bits
RCR AX,4              ; rotate through carry to the right by 4 bits

The rotate operation behaves much like a shift, but the bits "wrap around" from high order to low order (or vice versa) so that no information is lost. The behavior of an ROL instruction is illustrated below:

 HO ... 4  3  2  1  0
----------------------
|  |  |  |  |  |  |  |
----------------------
|<- <- <- <- <- <-   ^
---------------------|

In this example, the value of the low order bit (0) is shifted to its left (bit 1); bit 1, in turn, has its value shifted to bit 2, bit 2 to bit 3 and so on until we reach the high order bit (HO). The value of the high order bit is passed back to the low order bit (0). Therefore, if the AL register has a starting value of 6Ah (01101010), a ROL AL,1 instruction results in a value of D4h (11010100) in AL. Note that in the illustration and example, a one bit rotate is described. A multiple bit rotate can be considered to be the equivalent of a repeated number of single bit rotates. In other words, a ROL AL,3 instruction when AL has a starting value of 6Ah results in a value of 53h (01010011).

The ROR instruction operates identically to ROL except that the bits rotate in the opposite direction and the low order bit is moved to the high order position. Flags are affected identically except that the carry flag contains the value of the low order bit prior to the rotate.
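
As a rough illustration of the wrap-around behavior, ROL and ROR can be modeled in Python as shown below. The function names rol and ror and the bits parameter are assumptions made for this sketch, and the flags are not modeled.

```python
def rol(value, count, bits=8):
    """Model ROL: bits leaving the high order end re-enter at bit 0."""
    mask = (1 << bits) - 1
    value &= mask
    count %= bits
    return ((value << count) | (value >> (bits - count))) & mask


def ror(value, count, bits=8):
    """Model ROR as a left rotate by the complementary amount."""
    return rol(value, bits - (count % bits), bits)


# The examples from the text, with AL = 6Ah:
once = rol(0x6A, 1)     # D4h
thrice = rol(0x6A, 3)   # 53h
back = ror(thrice, 3)   # 6Ah again: no information was lost
```

Because nothing is discarded, rotating right by the same count always recovers the starting value, which is not true of the shift instructions.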

The RCL and RCR instructions behave similarly to their ROL and ROR counterparts, but involve the carry flag in the rotation between high order and low order bits. An illustration of the RCL operation is shown below:

 HO ... 4  3  2  1  0
----------------------
|  |  |  |  |  |  |  |
----------------------
|<- <- <- <- <- <-   ^
-------> C ----------|

In this example, the high order bit (HO) is passed to the carry flag (C) as the carry flag is moved to the low order bit (0). In all other respects, the behavior of the RCL instruction is identical to that of the ROL. The flags are modified as described for ROL, except that the carry flag takes on the value of the last bit rotated into it. Likewise, the RCR uses the carry bit as an intermediary when passing information from low order bit to high order bit.
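
One way to picture the rotate through carry is as a rotation of a value that is one bit wider than the operand, with the carry flag acting as the extra bit. The Python sketch below models this for illustration only; the function names rcl and rcr are hypothetical, and the carry flag is passed in and returned explicitly because it takes part in the rotation.

```python
def rcl(value, carry, count, bits=8):
    """Model RCL and return (result, carry)."""
    mask = (1 << bits) - 1
    value &= mask
    for _ in range(count):
        new_carry = (value >> (bits - 1)) & 1   # HO bit goes to C
        value = ((value << 1) | carry) & mask   # old C enters bit 0
        carry = new_carry
    return value, carry


def rcr(value, carry, count, bits=8):
    """Model RCR and return (result, carry)."""
    value &= (1 << bits) - 1
    for _ in range(count):
        new_carry = value & 1                          # bit 0 goes to C
        value = (value >> 1) | (carry << (bits - 1))   # old C enters HO bit
        carry = new_carry
    return value, carry


# AL = 6Ah with the carry flag set:
result, carry = rcl(0x6A, 1, 1)   # D5h, carry clear
```

With an 8-bit operand, nine single bit rotations through carry bring both the operand and the carry flag back to their starting values.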

3.3.3.6 Bit test instructions

The 80x86 instruction set also includes several different instructions for testing the values of individual bits in data elements and setting various flags to indicate their condition. The most basic of these is the TEST opcode:

TEST <operand1>, <operand2>

Where operand1 can be a register or memory address and operand2 can be either of those or immediate mode data. Both operands can be either 8-, 16- or 32-bit operands but must be the same size as one another. The TEST instruction performs a logical AND just like the AND instruction, but it does not store the result. It does change the flags just as an AND would: the carry (C) and overflow (O) flags are cleared, while the zero (Z), sign (S) and parity (P) flags reflect the result.

The 80386 processor and later models also support a number of single bit test operations: BT, BTS, BTR and BTC. Their general form is:

BTx <operand1>, <operand2>

Where operand1 can be a register or a memory location and operand2 can be a register or immediate mode. Both operands must be either 16- or 32-bit quantities. Two specific examples using the BT instruction are:

BT AX, 1        ; test bit 1 of AX
BT [SI], BX     ; test the location pointed to by SI, indexed by BX

In the first example, bit 1 of AX is copied into the carry flag. The carry flag can subsequently be checked to determine the value of bit 1. When <operand1> is a register, the bit number must fall within the range appropriate for the size of the register. In this case, the value must be within the range 0-15 as AX is a 16-bit quantity. In the second example, the memory location pointed to by SI is tested by copying the bit whose number is found in BX into the carry flag (SI is used as the pointer here because AX cannot serve as a base register in 16-bit addressing). Because <operand1> is a memory location, BX can have any value. If the value in BX is greater than 7, the address tested will be [SI]+BX/8 and the bit tested will be BX mod 8. All other flags are unaffected by the BT instruction.

The BTS, BTR and BTC instructions are variants of the BT instruction. They behave the same as BT, except that they set to 1 (BTS), reset to 0 (BTR) or invert (BTC) the tested bit after copying it into the carry flag. In all other respects they behave identically to BT.
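
The byte and bit arithmetic used for memory operands can be sketched in Python with a bytearray standing in for memory. This is a simplified model under stated assumptions: the function names mirror the mnemonics but are otherwise hypothetical, base plays the role of the pointer register, and only non-negative bit numbers are considered.

```python
def bt(memory, base, bit_number):
    """Return the tested bit, i.e. the value BT copies into the carry flag."""
    byte = memory[base + bit_number // 8]
    return (byte >> (bit_number % 8)) & 1


def bts(memory, base, bit_number):
    """Test the bit, then set it to 1."""
    carry = bt(memory, base, bit_number)
    memory[base + bit_number // 8] |= 1 << (bit_number % 8)
    return carry


def btr(memory, base, bit_number):
    """Test the bit, then reset it to 0."""
    carry = bt(memory, base, bit_number)
    memory[base + bit_number // 8] &= ~(1 << (bit_number % 8)) & 0xFF
    return carry


def btc(memory, base, bit_number):
    """Test the bit, then invert it."""
    carry = bt(memory, base, bit_number)
    memory[base + bit_number // 8] ^= 1 << (bit_number % 8)
    return carry


memory = bytearray([0x00, 0x02])
carry = bt(memory, 0, 9)   # bit 9 lives in byte 1, bit 1, so carry is 1
```

In every variant the carry flag reports the value the bit had before the instruction modified it.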

3.3.3.7 SETxx - set on condition

The SETxx instructions are a family of opcodes that either set to 1 or clear to 0 a byte operand based on the status of one or more flags. Their general form is:

SETxx <operand1>

where operand1 can be an 8-bit register or memory address. The various forms of SETxx are listed in the table below, where O (overflow), S (sign), Z (zero), P (parity) and C (carry) are used for the various flags involved:

SETxx    condition   mnemonic
----------------------------------------------
SETO     O=1         set if overflow
SETNO    O=0         set if no overflow
SETS     S=1         set if sign
SETNS    S=0         set if no sign
SETZ     Z=1         set if zero
SETNZ    Z=0         set if not zero
SETP     P=1         set if parity
SETNP    P=0         set if no parity
SETPE    P=1         set if parity even
SETPO    P=0         set if parity odd
SETC     C=1         set if carry
SETNC    C=0         set if no carry

In each of the preceding examples, <operand1> will receive a value of 1 if the condition is met by the flag bit in question and a value of 0 otherwise. Note that the SETP and SETPE instructions are synonymous, as are SETNP and SETPO. The following instructions are designed for use after an unsigned comparison using the CMP instruction:

SETA     C=0,Z=0     set if above (>)
SETNA    C=1 or Z=1  set if not above (<=)
SETAE    C=0         set if above or equal (>=)
SETNAE   C=1         set if not above or equal (<)
SETB     C=1         set if below (<)
SETNB    C=0         set if not below (>=)
SETBE    C=1 or Z=1  set if below or equal (<=)
SETNBE   C=0,Z=0     set if not below or equal (>)
SETE     Z=1         set if equal (=)
SETNE    Z=0         set if not equal (<>)

Again, note the redundancy. For instance, SETAE is synonymous with SETNB and also with SETNC. This is done for the sake of completeness, but can be both annoying and confusing to the novice. Another set of SETxx instructions was designed for use with signed comparisons using CMP:

SETG     S=O,Z=0       set if greater (>)
SETNG    S<>O or Z=1   set if not greater (<=)
SETGE    S=O           set if greater or equal (>=)
SETNGE   S<>O          set if not greater or equal (<)
SETL     S<>O          set if less (<)
SETNL    S=O           set if not less (>=)
SETLE    S<>O or Z=1   set if less or equal (<=)
SETNLE   S=O,Z=0       set if not less or equal (>)
SETE     Z=1           set if equal (=)
SETNE    Z=0           set if not equal (<>)

The SETE and SETNE instructions are identical to those used for unsigned comparisons. Again, redundancy results from the use of a complete set of opcodes. Note also that the sign flag (S) is compared to the overflow flag (O), not zero, in these instructions. These commands are especially useful for mimicking the comparison functions of higher level languages by converting a comparison (using CMP) to a Boolean value using SETxx.
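
The difference between the unsigned and signed families shows up when the same bit patterns are compared both ways. The Python sketch below models the flags that CMP would produce for 8-bit operands and then applies the SETA and SETG conditions from the tables above. The function names cmp_flags, seta and setg are hypothetical helpers written for this illustration.

```python
def cmp_flags(a, b, bits=8):
    """Return the C, Z, S and O flags that CMP a, b would produce."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "C": int(a < b),                 # unsigned borrow
        "Z": int(result == 0),
        "S": int(bool(result & sign_bit)),
        # Signed overflow: operands differ in sign and the result's sign
        # differs from that of the first operand.
        "O": int(bool((a ^ b) & (a ^ result) & sign_bit)),
    }


def seta(flags):
    """Unsigned greater than: C=0 and Z=0."""
    return int(flags["C"] == 0 and flags["Z"] == 0)


def setg(flags):
    """Signed greater than: S=O and Z=0."""
    return int(flags["S"] == flags["O"] and flags["Z"] == 0)


# FFh is 255 when unsigned but -1 when signed.
flags = cmp_flags(0xFF, 0x01)
above = seta(flags)     # 1: 255 > 1
greater = setg(flags)   # 0: -1 is not greater than 1
```

This is why the choice between the A/B mnemonics and the G/L mnemonics depends on how the programmer intends the operands to be interpreted, not on anything the processor knows about the data.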

3.3.4 Flow control operations

Another important class of instructions governs the part of the code being executed. Such "flow control" commands are analogous to the jumps, conditional statements and loops of higher level languages.

3.3.4.1 JMP - unconditional jump

This is actually a family of instructions that differ only in the form that the target address takes. They have a generic format:

JMP <dest>

Where <dest> is the destination address to jump to. <dest> can be an 8-, 16- or 32-bit offset (usually specified as a label - see Sect. 3), a 16- or 32-bit address or a 16-bit register. Some examples of this instruction are:

JMP label               ; jump to the memory location specified by label
JMP word ptr 10h[BX]    ; jump to a near indirect memory location
JMP dword ptr 10h[BX]   ; jump to a far indirect memory location
JMP SI                  ; jump to a location specified by SI

The difference between the second and third examples is the size of the <dest> operand: a 16-bit (word) specification results in a jump within the current segment (near), whereas the 32-bit (dword) specification results in a jump to another segment (far).

3.3.4.2 Jxx - conditional jumps

The members of this family differ in the condition tested for a jump. The general form is:

Jxx <dest>

where <dest> is restricted to be an 8-bit displacement from the current location. The various forms of the Jxx family are listed below:

Jxx      condition   mnemonic
----------------------------------------------
JO       O=1         jump if overflow
JNO      O=0         jump if no overflow
JS       S=1         jump if sign
JNS      S=0         jump if no sign
JZ       Z=1         jump if zero
JNZ      Z=0         jump if not zero
JP       P=1         jump if parity
JNP      P=0         jump if no parity
JPE      P=1         jump if parity even
JPO      P=0         jump if parity odd
JC       C=1         jump if carry
JNC      C=0         jump if no carry

(below are the conditional jumps for use after unsigned comparisons)

JA       C=0,Z=0     jump if above (>)
JNA      C=1 or Z=1  jump if not above (<=)
JAE      C=0         jump if above or equal (>=)
JNAE     C=1         jump if not above or equal (<)
JB       C=1         jump if below (<)
JNB      C=0         jump if not below (>=)
JBE      C=1 or Z=1  jump if below or equal (<=)
JNBE     C=0,Z=0     jump if not below or equal (>)
JE       Z=1         jump if equal (=)
JNE      Z=0         jump if not equal (<>)

(below are the conditional jumps for use after signed comparisons)

JG       S=O,Z=0       jump if greater (>)
JNG      S<>O or Z=1   jump if not greater (<=)
JGE      S=O           jump if greater or equal (>=)
JNGE     S<>O          jump if not greater or equal (<)
JL       S<>O          jump if less (<)
JNL      S=O           jump if not less (>=)
JLE      S<>O or Z=1   jump if less or equal (<=)
JNLE     S=O,Z=0       jump if not less or equal (>)
JE       Z=1           jump if equal (=)
JNE      Z=0           jump if not equal (<>)

As with the SETxx instructions, major redundancies exist in the Jxx family. Additionally, it is important to note that for each Jxx instruction there is a complementary JNxx instruction that jumps on the opposite condition. This is important because of the restriction on the displacement in the Jxx jumps. An 8-bit displacement is signed, so it can only reach targets from 128 bytes before to 127 bytes after the instruction that follows the jump. If you want to jump on a condition to a target outside that range, it is most easily accomplished by using the opposite conditional jump in conjunction with an unconditional jump to a distant address. For instance, if you want to jump to the location Far_away when the carry flag is set, you can do it with the following sequence:

        JNC Clear       ; skip over next instruction when carry clear
        JMP Far_away    ; jump to Far_away only when carry is set
Clear:  ...

Using these instructions, assembly language programmers can create the same flow control constructs that IF...THEN...ELSE statements provide in higher-level languages.

3.3.4.3 CALL - call a subroutine

This instruction is used to call a subroutine by jumping to the specified location while pushing the return address onto the hardware stack. The CALL instruction takes the generic form:

CALL <dest>

where <dest> can be either a memory location, a 16-bit register or an indirect reference. Several variants are shown below:

CALL subrout          ; call the subroutine subrout
CALL array[BX]        ; call a subroutine pointed to by array[BX]
CALL word ptr [BX]    ; call a subroutine pointed to by [BX]
CALL BX               ; call a subroutine pointed to by BX

In the first case, the CALL instruction will push either IP or the CS:IP pair onto the stack before jumping to subrout. Whether IP or CS:IP gets pushed is determined by whether subrout is located in the same or a different memory segment. In the last three cases, only IP will be pushed since <dest> is a 16-bit quantity (and the jump must be intrasegment). Note that the last two examples are not equivalent: CALL BX transfers control to the address held in BX itself, whereas CALL word ptr [BX] transfers control to the address stored in the memory word that BX points to. Use of a segmented address for <dest> will naturally result in a 32-bit address being pushed to the stack.

3.3.4.4 RETx - return from a subroutine

This set of instructions is the counterpart to CALL. They return from a subroutine by popping the calling address off the stack into IP (or CS:IP). There are three forms of this instruction:

RET     ; return by popping
RETN    ; return near
RETF    ; return far

In the second example, a "return near" is executed by popping a 16-bit address off of the stack and jumping to that location. In the third example, a "return far" is executed by popping a 32-bit segmented address from the stack and jumping to that location. In the first example, the assembler must decide whether a near or far return is called for.

Another variant of these instructions is:

RETx <disp>

Where <disp> is a stack displacement, in bytes, that is added to the stack pointer after the return address has been popped, thereby removing any parameters that the caller pushed onto the stack before the call. This form can be used with RET, RETN and RETF.

3.3.4.5 INT, INTO - generate a software interrupt

These instructions are related to a CALL instruction, but instead of transferring control to a subroutine they transfer control to an interrupt service routine, either written by you or provided by the operating system or BIOS. The form it takes is:

INT <value>

where <value> is a number between 0 and FFh. The function performed by the software interrupt is determined by the value provided. Various values used in I/O operations will be discussed in that section. The definition of other values should be found in the Intel Architecture Software Developers Manual, Volume 1, Chapter 4.

A special case version of the INT instruction is the INTO instruction, which takes no argument. In this case, INTO will execute an INT 4 if the overflow (O) flag is set. This instruction is used to trap overflow exceptions, but will probably crash the system if executed with no service routine provided. When executed, both INT and INTO push the flags and CS:IP onto the stack before transferring control to the designated interrupt service routine. Both instructions clear the trace (T) flag and, in real mode, the interrupt enable (I) flag; the other flags are left unchanged.

3.3.4.6 IRET - return from a software interrupt

The IRET instruction is analogous to RET for an interrupt service routine. It takes no arguments and pops both the return address and the flag register from the stack. It differs from RET only in popping the flag register.

3.3.4.7 LOOP - decrement CX and jump

The LOOP instruction is useful for coding the assembly language equivalent of DO...WHILE, DO...UNTIL and FOR loops. It functions primarily as a conditional jump instruction, but also decrements the loop counter register CX. It takes the form:

LOOP <dest>

Where <dest> is the destination address, with the same restrictions as the addresses in the conditional jump instructions. Functionally, it performs the equivalent of:

DEC CX
JNE <dest>

and will generally be used to jump to an instruction that precedes it in memory to achieve a loop. Note that LOOP can be put to other uses as well. Because the decrement occurs before the JNE, a starting value of 0 in CX will loop 65,536 times before CX reaches zero again. To avoid this scenario, Intel provides an additional conditional jump (JCXZ) that can be used immediately before a LOOP instruction, as in:

Begin:  JCXZ End
        LOOP Begin
End:    ...
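
The wrap-around that makes a zero count so expensive can be checked with a short Python model of the 16-bit counter. This is only a sketch: the function name loop_iterations and the guard_with_jcxz parameter are made up for the illustration, and the guard is modeled as a single check made before the loop is entered.

```python
def loop_iterations(cx, guard_with_jcxz=False):
    """Count how many times the loop body runs for a starting CX value."""
    cx &= 0xFFFF
    if guard_with_jcxz and cx == 0:
        return 0                   # JCXZ skips the loop entirely
    iterations = 0
    while True:
        iterations += 1            # the loop body runs once per pass
        cx = (cx - 1) & 0xFFFF     # LOOP decrements CX, wrapping at 16 bits
        if cx == 0:                # LOOP falls through once CX reaches zero
            break
    return iterations


unguarded = loop_iterations(0)                     # 65536 passes
guarded = loop_iterations(0, guard_with_jcxz=True) # 0 passes
```

For any non-zero starting value the guard makes no difference, so it costs little to include whenever the count is computed at run time and might be zero.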