X86 Assembly Language

1. Introduction to X86 Assembly Language
2. Analyzing Binary Code
- 2.1. Decompilation
- 2.2. Writing in assembly
3. Basic X86 Instructions

1. Introduction to X86 Assembly Language

Class: Malware Analysis and Incident Forensics
Topic: X86 Assembly language

2. Analyzing Binary Code

Malicious code can differ from compiler-generated code, because it can contain:

hand-written code
self modifying code
subtle semantics. For example, the code is represented in one way, but there can be a jump to an instruction that changes the behaviour of the code
obfuscation of control and data flows
plenty of other decoys.

2.1. Decompilation

Decompilers are tools that transform binary code into imperative source code; examples include Ghidra and Hex-Rays. They work heuristically, so a lot of information is lost, such as the names and types of variables, and decompiling an obfuscated binary leads to obfuscated source code. Decompilation is still useful and is widely used by reverse-engineering professionals.

The goal of the course is not to teach decompilation, but to show how to find malicious behaviour in an executable.

2.2. Writing in assembly

When writing in assembly there is no mediator: we use an Instruction Set Architecture (ISA), which works as an interface between humans and the machine. The assembler produces binary code from ISA-conformant assembly code. For Intel and AMD CPUs the most common ISA is x86. Most malware today is 32-bit, because its authors want to hit as many targets as possible (32-bit code also runs on 64-bit systems).

A word is the natural data unit of a specific CPU design. For x86 processors the word size is 32 bits; however, for backward compatibility, x86 instructions treat a word operand as 16 bits long, while a dword (double word) operand is 32 bits long.

Memory always operates at byte level. Endianness specifies how multi-byte sequences are read from and written to memory: big endian stores the most significant byte first, little endian stores the least significant byte first. Little endian is the most popular, and it is the byte order used by x86.
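As a minimal sketch of the two byte orders, the Python snippet below packs the same 32-bit value both ways using the standard `struct` module. The value `0x12345678` is an arbitrary illustration, not something taken from the course material.

```python
import struct

value = 0x12345678  # an arbitrary 32-bit dword

# "<I" = little endian unsigned 32-bit, ">I" = big endian unsigned 32-bit
little = struct.pack("<I", value)  # least significant byte first (x86 layout)
big = struct.pack(">I", value)     # most significant byte first

print(little.hex())  # 78563412
print(big.hex())     # 12345678

# Reading little-endian bytes with the wrong byte order yields a different number.
misread = struct.unpack(">I", little)[0]
print(hex(misread))  # 0x78563412
```

This is why a dword seen in a hex dump of x86 memory appears with its bytes reversed with respect to the value shown in a register.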

2.2.1. Register

Code uses many variables, but CPUs work on a few registers. A register holds data with fast access time. There are several categories of registers, all 32 bits wide:

General purpose. They are EAX, EBX, ECX, EDX, ESI and EDI; the first four can be accessed at different levels of granularity: the whole 32 bits (EAX), the lowest 16 bits (AX), or the two bytes of AX (AH and AL). ESP points to the top of the stack. EBP is set equal to ESP when entering a function; the compiler can also use it as an ordinary register to generate faster code.
Status register (EFLAGS): holds truth values on the state of the processor after the last executed instruction.
Program counter (EIP): holds the address of the next instruction to execute. EIP is not directly writable; it changes only through control flow instructions.
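The different levels of granularity can be modelled with bit masks. The sketch below is only an illustration in Python; the helper names `sub_registers` and `write_al` are hypothetical, and the value in EAX is arbitrary.

```python
def sub_registers(eax):
    """Return the AX, AH and AL views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF        # lowest 16 bits
    ah = (ax >> 8) & 0xFF    # high byte of AX
    al = ax & 0xFF           # low byte of AX
    return ax, ah, al


def write_al(eax, value):
    """Model 'mov al, value': only the lowest byte of EAX changes."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


eax = 0xDEADBEEF
ax, ah, al = sub_registers(eax)
print(hex(ax), hex(ah), hex(al))  # 0xbeef 0xbe 0xef
print(hex(write_al(eax, 0x11)))   # 0xdeadbe11
```

Writing to a sub-register leaves the remaining bits of the 32-bit register untouched, which hand-written code sometimes exploits to build a value piece by piece.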

2.2.2. Instruction Cycle

Instructions are written in memory and can range from 1 to 15 bytes. Since the size varies, it is easy to make mistakes, for example by starting to decode at the wrong byte. The bytes of an instruction are stored consecutively in memory, along with any immediate operands. Register operands are instead embedded in the binary representation of the opcode.

The CPU works like this:

Fetch an instruction from the address stored in EIP.
The control unit determines the meaning of the instruction and how long the instruction is.
Carry out the computation using the ALU, or move data.
EIP advances to the next adjacent instruction as soon as the execution of the current instruction starts.
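The cycle above can be sketched with a toy interpreter. The Python code below is a simplified model, not a real emulator: it understands only five real 32-bit x86 encodings (`nop`, `inc eax`, `mov eax, imm32`, `jmp rel8` and `hlt`), and `hlt` is used here only as a convenient way to stop the sketch. The function name `run` and the sample program are assumptions made for illustration.

```python
def run(code, max_steps=100):
    """Interpret a tiny subset of 32-bit x86 encodings and return the final EAX."""
    eax = 0
    eip = 0
    for _ in range(max_steps):
        opcode = code[eip]                       # fetch
        if opcode == 0x90:                       # nop (1 byte)
            eip += 1
        elif opcode == 0x40:                     # inc eax (1 byte)
            eip += 1
            eax = (eax + 1) & 0xFFFFFFFF
        elif opcode == 0xB8:                     # mov eax, imm32 (5 bytes)
            imm = int.from_bytes(code[eip + 1:eip + 5], "little")
            eip += 5
            eax = imm
        elif opcode == 0xEB:                     # jmp rel8 (2 bytes)
            rel = int.from_bytes(code[eip + 1:eip + 2], "little", signed=True)
            eip += 2                             # EIP first moves past the jmp itself
            eip += rel                           # the offset is relative to the next instruction
        elif opcode == 0xF4:                     # hlt: stop the sketch
            return eax
        else:
            raise ValueError(f"unsupported opcode {opcode:#x} at offset {eip}")
    raise RuntimeError("step limit reached")


program = bytes([
    0xB8, 0x05, 0x00, 0x00, 0x00,  # mov eax, 5
    0xEB, 0x01,                    # jmp +1 (skips the next byte)
    0x40,                          # inc eax (skipped)
    0x40,                          # inc eax
    0xF4,                          # hlt
])
print(run(program))  # 6
```

Note how the decoder has to know the length of each instruction before it can find the next one: the immediate of `mov eax, 5` occupies four bytes that would decode as different instructions if execution started in the middle of them.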

2.2.3. Memory Addressing

Addressing modes provide a way to express the address of data to be read from or written to main memory. A single instruction cannot both read from memory and write to memory: a register is needed as an intermediate step. The address expressions can take immediate operands, registers or both. Some common addressing modes are (the first operand is the destination):

Immediate: `mov eax, [0x1000]`
Register: `mov eax, [esi]`
Register + offset: `mov eax, [esp-8]`
Register*width + offset: `mov eax, [ebx*4+0xff]`
Base + Reg*width + offset: `mov eax, [edx+ebx*4+8]`
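All the modes listed above are special cases of one formula: base + index*scale + displacement. The Python sketch below computes the address for each of the five examples; the register values in `regs` are hypothetical, chosen only to make the arithmetic visible.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    """Compute base + index*scale + disp, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 allows only the scale factors 1, 2, 4 and 8")
    return (base + index * scale + disp) & 0xFFFFFFFF


# Hypothetical register contents
regs = {"esi": 0x2000, "esp": 0x0012FF80, "ebx": 3, "edx": 0x3000}

print(hex(effective_address(disp=0x1000)))                         # [0x1000]        -> 0x1000
print(hex(effective_address(base=regs["esi"])))                    # [esi]           -> 0x2000
print(hex(effective_address(base=regs["esp"], disp=-8)))           # [esp-8]         -> 0x12ff78
print(hex(effective_address(index=regs["ebx"], scale=4, disp=0xFF)))  # [ebx*4+0xff] -> 0x10b
print(hex(effective_address(base=regs["edx"], index=regs["ebx"],
                            scale=4, disp=8)))                     # [edx+ebx*4+8]   -> 0x3014
```

The scaled index is what makes array accesses cheap: with `ebx` as the element index and a width of 4, the last form reads element 3 of a dword array that starts 8 bytes after `edx`.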

2.2.4. Memory Map of a process

An operating system introduces the concept of a process, in which some parts of memory have a specific purpose. In Windows the lower 2GB are for user space and the upper 2GB are for kernel code; the stack grows towards low addresses 🤯.

A page of key user data is shared by the kernel with user space, and it is used by malware. The default base address of a program image is `0x00400000`; the program image has different parts: PE headers, code, imports and resources. The image can be loaded at this base address, or the address can be randomized (adding more entropy) using ASLR. Each program also needs memory for the stack and memory for dynamic variables (the heap); stack and heap grow in different directions:

the stack grows towards lower addresses
the heap grows towards greater addresses

Every thread of the program gets its own stack. Dynamic link libraries (DLLs) have their own program image; their addresses are randomized, but they are shared between processes. If a DLL is modified, a copy is created.

2.2.5. Memory Protection Mechanism

Paging gives finer-grained constraints when touching memory: it enforces read/write/execute permissions on regions, and a typical page size is 4096 bytes. Operating systems may also enforce high-level security mechanisms to hinder the exploitation of memory vulnerabilities:

DEP (Data Execution Prevention)
ASLR (Address Space Layout Randomization)
Heap allocation randomization
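To make the page granularity concrete, the following Python sketch splits an address into page number and offset for 4096-byte pages and checks an access against per-page permissions. The table `permissions` and the helpers `split_address` and `allowed` are hypothetical; real page tables are maintained by the OS and enforced by the hardware.

```python
PAGE_SIZE = 4096


def split_address(addr):
    """Return (page number, offset inside the page) for a 4096-byte page."""
    return addr // PAGE_SIZE, addr % PAGE_SIZE


# Hypothetical permissions: page number -> "rwx"-style string
permissions = {
    0x00401: "r-x",  # code page: readable and executable
    0x00402: "rw-",  # data page: readable and writable, not executable
}


def allowed(addr, access):
    """Check whether an access ('r', 'w' or 'x') is permitted at addr."""
    page, _ = split_address(addr)
    return access in permissions.get(page, "---")


print(split_address(0x00401ABC))  # (1025, 2748), i.e. page 0x401, offset 0xabc
print(allowed(0x00401ABC, "x"))   # True: executing code from a code page
print(allowed(0x00402010, "x"))   # False: executing data is what DEP forbids
```

In this model DEP corresponds to never granting `w` and `x` together on data pages, so bytes written by an attacker cannot be executed directly.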

3. Basic X86 Instructions

Programs are written using five kinds of instructions:

Data movement instructions are used to copy data: mov, push and pop. push first subtracts 4 from ESP, then writes the data to the new top of the stack. pop reads from the top of the stack, writes to a destination, and then adds 4 to ESP. Both of them work with registers.
Arithmetic instructions are relevant for many reasons: to do math, to compute offsets and function addresses, to modify pointers (reserving memory), and to perform data movement (_cheap obfuscation_).
Logic instructions: `and`, `or` and `xor` are bitwise operations. `xor` is used a lot to set a register to 0, as in `xor eax, eax`.
Rotate / shift instructions. Rotate and shift bit manipulations are very popular, and can be used to swap one half of a number with the other half.
Control flow instructions are used to change the behaviour of the program during execution. Control flow can be implemented in three ways:
- Unconditional branch, a so-called jump.
- Conditional branch: EIP is replaced with some desired address depending on the value of one or more bits of EFLAGS.
- Function calls and returns, which are special types of unconditional branches.
  
  To alter the program flow two ingredients are needed: a means to specify the target of a branch and a means to evaluate a condition.
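A few of the instructions listed above can be modelled in a handful of lines. The Python sketch below is an illustration under simple assumptions: the `Stack` class, the starting ESP value and the helper `rol32` are hypothetical names, and memory is modelled as a dictionary from addresses to dwords.

```python
MASK32 = 0xFFFFFFFF


class Stack:
    """Minimal model of the x86 stack: ESP plus a sparse memory."""

    def __init__(self, esp=0x0012FF80):  # hypothetical initial ESP
        self.esp = esp
        self.memory = {}

    def push(self, value):
        self.esp = (self.esp - 4) & MASK32   # first subtract 4 from ESP
        self.memory[self.esp] = value & MASK32  # then store at the new top

    def pop(self):
        value = self.memory[self.esp]        # read from the top of the stack
        self.esp = (self.esp + 4) & MASK32   # then add 4 to ESP
        return value


def rol32(value, count):
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32


stack = Stack()
eax = 0xCAFEBABE
stack.push(eax)              # push eax
ebx = stack.pop()            # pop ebx  (together: a disguised 'mov ebx, eax')
print(hex(ebx))              # 0xcafebabe

eax ^= eax                   # xor eax, eax
print(eax)                   # 0

print(hex(rol32(0x12345678, 16)))  # rol by 16 swaps the two halves: 0x56781234
```

The push/pop pair shows why data movement is not limited to `mov`: the same copy can be expressed with stack or arithmetic instructions, which is enough to confuse a naive pattern match.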

3.1. Unconditional Branch

An unconditional branch (jmp) can take as its destination an immediate target, specified as a relative offset or as an absolute address, or an absolute address provided in a register or memory operand.
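For the relative form, the offset is counted from the address of the instruction that follows the jump. The Python sketch below decodes the 5-byte `jmp rel32` encoding (opcode `0xE9`); the function names and the address `0x00401000` are assumptions used only for illustration.

```python
def jmp_rel32_target(instr_addr, rel32):
    """Target of a 5-byte 'jmp rel32' located at instr_addr."""
    next_instr = instr_addr + 5              # the jmp itself is 5 bytes long
    return (next_instr + rel32) & 0xFFFFFFFF


def decode_jmp_rel32(code, instr_addr):
    """Decode the bytes of a 'jmp rel32' and return its absolute target."""
    if code[0] != 0xE9:
        raise ValueError("not a jmp rel32")
    rel32 = int.from_bytes(code[1:5], "little", signed=True)
    return jmp_rel32_target(instr_addr, rel32)


# Forward jump: offset 0x10 from the next instruction
print(hex(decode_jmp_rel32(bytes.fromhex("e910000000"), 0x00401000)))  # 0x401015

# Backward jump: offset -5 lands on the jmp itself (an infinite loop)
print(hex(decode_jmp_rel32(bytes.fromhex("e9fbffffff"), 0x00401000)))  # 0x401000
```

A jump through a register or memory operand, by contrast, has no target visible in its encoding, which is one reason indirect jumps complicate static analysis.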

3.2. Conditional Branch

Conditional branches evaluate a condition over selected bits of the EFLAGS register. To compare two values we can use the cmp instruction, which subtracts the second operand from the first and sets the flags without storing the result. The flags most commonly evaluated are:

CF: carry flag
OF: overflow flag
SF: sign flag
ZF: zero flag

There is a big family of jcc instructions, in which cc is the condition; examples are je, jz, jg, jge, ja and jae.

Another common comparison instruction is test, which computes the bitwise AND of the operands without modifying them and sets the flags accordingly.

The problem is that malware won’t use only cmp and test: it can also use sub or dec in a smart way, since arithmetic instructions set the flags too.
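The relation between cmp, the four flags and the jcc conditions can be sketched as follows. This is a simplified Python model of 32-bit `cmp a, b`; the names `cmp_flags`, `CONDITIONS` and `taken` are hypothetical. Since `sub` sets the same flags as `cmp` (it just also stores the result), the same model covers a `sub` followed by a jcc.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """Flags after 'cmp a, b', i.e. after computing a - b on 32 bits."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "ZF": int(result == 0),                      # operands are equal
        "SF": result >> 31,                          # sign bit of the result
        "CF": int(a < b),                            # unsigned borrow
        "OF": (((a ^ b) & (a ^ result)) >> 31) & 1,  # signed overflow
    }


CONDITIONS = {
    "je":  lambda f: f["ZF"] == 1,
    "jz":  lambda f: f["ZF"] == 1,                      # same encoding as je
    "jg":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],  # signed greater
    "jge": lambda f: f["SF"] == f["OF"],                # signed greater or equal
    "ja":  lambda f: f["CF"] == 0 and f["ZF"] == 0,     # unsigned above
    "jae": lambda f: f["CF"] == 0,                      # unsigned above or equal
}


def taken(jcc, a, b):
    """Would 'cmp a, b' followed by the given jcc take the branch?"""
    return CONDITIONS[jcc](cmp_flags(a, b))


# 0xFFFFFFFF is -1 when read as signed, 4294967295 when read as unsigned.
print(taken("jg", 0xFFFFFFFF, 1))  # False: -1 > 1 does not hold
print(taken("ja", 0xFFFFFFFF, 1))  # True: 4294967295 > 1 holds
print(taken("je", 5, 5))           # True
```

The signed/unsigned pair `jg`/`ja` reads different flags for the same comparison, so the choice of jcc reveals how the program interprets its operands.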

3.3. Functions and stack frames

We can say that a function is a unit of code that controls register values and its portion of the stack independently of other units.

The base pointer can be used to reference the location of a local variable via a fixed offset. Accessing local variables through ESP is like chasing a moving target, because ESP changes with every push and pop.

Logically, a function is a self-contained unit of code. It is important to stick to some rules when writing functions in assembly: some rules depend on the OS we’re writing for, and others depend on the architecture of the CPU.

The stack is conceptually divided into frames, one for each currently active function. A frame includes:

the arguments for the function invocation
local function variables and other storage
return address for the call

When entering a function, EBP is set equal to ESP, and EBP can then be used to reference local variables via a fixed offset. To invoke functions we use the call instruction:

It stores on the stack the address of the instruction that follows it (the return address), then jumps to the function.

The ret instruction is used to resume the execution of the caller. Function boundaries exist only at a logical level: a malicious programmer can scramble the layout to make the boundaries of a function difficult to identify.
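The effect of call and ret on EIP and on the stack can be sketched with a small Python model. The class `Cpu`, the addresses and the 5-byte length assumed for the call (the size of a `call rel32`) are illustrative assumptions.

```python
class Cpu:
    """Minimal model of EIP, ESP and the stack for call/ret."""

    def __init__(self, eip, esp=0x0012FF80):  # hypothetical initial ESP
        self.eip = eip
        self.esp = esp
        self.stack = {}

    def push(self, value):
        self.esp -= 4
        self.stack[self.esp] = value

    def pop(self):
        value = self.stack[self.esp]
        self.esp += 4
        return value

    def call(self, target, length=5):
        """Push the address of the instruction after the call, then jump."""
        self.push(self.eip + length)
        self.eip = target

    def ret(self):
        """Pop the return address from the stack into EIP."""
        self.eip = self.pop()


cpu = Cpu(eip=0x00401000)
cpu.call(0x00401100)
print(hex(cpu.eip), hex(cpu.stack[cpu.esp]))  # 0x401100 0x401005
cpu.ret()
print(hex(cpu.eip), hex(cpu.esp))             # 0x401005 0x12ff80

# ret only pops whatever is on top of the stack: 'push addr' + 'ret' acts as a jump.
cpu.push(0x00402000)
cpu.ret()
print(hex(cpu.eip))                           # 0x402000
```

The last three lines illustrate why a ret does not necessarily mark the end of a function: it transfers control to whatever address happens to be on top of the stack.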

3.4. Calling Convention

As said above, there are some conventions to respect when writing assembly functions. These conventions tell us which values are preserved and how to pass values between functions. More specifically, a convention specifies three elements:

parameter passing
stack pointer adjustment (who cleans up the stack)
registers to preserve for the caller

When calling a function that interacts with the OS you must follow the Windows API rules.

3.4.1. CDECL

In the CDECL convention the arguments are pushed in reverse order (in such a way that the first argument is on top of the stack). The caller saves EAX, ECX and EDX onto the stack before the call if it needs their values afterwards; the callee, instead, must preserve EBX, EBP, EDI and ESI.

After the return, the caller adjusts ESP to remove the arguments from the stack, and the return value (if present) is stored in EAX.

CDECL Example
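A minimal sketch of a CDECL call, modelled in Python with the corresponding assembly in the comments. The callee `add_cdecl`, the `Machine` class, the return address and the argument values are all hypothetical; the point is only to show where the arguments sit relative to EBP and who restores ESP.

```python
MASK32 = 0xFFFFFFFF


class Machine:
    """Registers and stack memory needed to follow one call."""

    def __init__(self, esp=0x0012FF80):  # hypothetical initial ESP
        self.esp = esp
        self.ebp = 0
        self.eax = 0
        self.mem = {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value & MASK32

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


def add_cdecl(m):
    """Callee: a hypothetical 'int add(int a, int b)'."""
    m.push(m.ebp)                  # push ebp
    m.ebp = m.esp                  # mov ebp, esp
    a = m.mem[m.ebp + 8]           # first argument:  [ebp+8]
    b = m.mem[m.ebp + 12]          # second argument: [ebp+12]
    m.eax = (a + b) & MASK32       # return value goes in EAX
    m.ebp = m.pop()                # pop ebp
    m.pop()                        # ret (return address discarded in this sketch)


def caller(m):
    start = m.esp
    m.push(40)                     # push 40  (second argument first)
    m.push(2)                      # push 2   (first argument ends up on top)
    m.push(0x00401005)             # call add: pushes a hypothetical return address
    add_cdecl(m)
    after_ret = m.esp              # the two arguments are still on the stack
    m.esp += 8                     # add esp, 8: the caller cleans up
    return m.eax, start - after_ret, start - m.esp


print(caller(Machine()))  # (42, 8, 0)
```

The second value of the result shows that 8 bytes of arguments are still on the stack right after the return; they disappear only when the caller adds 8 to ESP.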

3.4.2. STDCALL (Windows API)

It is very similar to the CDECL convention, but it is the callee that has to adjust the stack pointer upon return. For this reason the instruction ret N is used, where N is the number of bytes to add to the stack pointer after the return address has been popped (so the total number of bytes added is \(N + 4\)).

STDCALL Example
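A minimal sketch of the difference between a plain ret and ret N, modelled in Python. The helper `ret`, the stack contents and the return address are hypothetical; two 4-byte arguments are assumed, so N is 8.

```python
def ret(esp, stack, n=0):
    """Model 'ret n': pop the return address, then add n to ESP."""
    return_address = stack[esp]
    esp += 4                      # the return address is popped into EIP
    esp += n                      # n more bytes (the arguments) are released
    return return_address, esp


esp = 0x0012FF80                  # hypothetical ESP before the call sequence
stack = {}
for arg in (40, 2):               # push the arguments in reverse order
    esp -= 4
    stack[esp] = arg
esp -= 4
stack[esp] = 0x00401005           # call pushes a hypothetical return address
before_ret = esp

_, esp_cdecl = ret(before_ret, stack)          # ret:   the arguments stay on the stack
_, esp_stdcall = ret(before_ret, stack, n=8)   # ret 8: the callee removes them

print(esp_cdecl - before_ret)     # 4
print(esp_stdcall - before_ret)   # 12, that is N + 4 with N = 8
print(hex(esp_stdcall))           # 0x12ff80: back to the value before the pushes
```

When reading a disassembly, a function ending in ret N is therefore a hint that it follows a callee-cleanup convention and that it takes N bytes of stack arguments.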

3.4.3. Caller and Callee saves recap

When calling a function, the caller and the callee push registers before the call and in the prologue, respectively:

Caller-save registers are pushed to the stack before a call, if the caller is going to use their values after a call: EAX, ECX, EDX
Callee-save registers are pushed to the stack in the prologue of the callee when it needs to use them: EDI, ESI, and EBX

It is important to note that in custom code we can find any sort of arrangement.