Basic Static Analysis

1. Lecture 2 - Querzoni

2. Basics of Static Analysis

This lecture introduces what we look for when analyzing malware and the main techniques that we’ll use during the course.

People in charge of defending an IT system have to understand what is happening and how to measure and contain the attack, acquiring pieces of information and using them to better protect the system in the future. Understanding how an attack unfolds is fundamental to protecting against similar attacks in the future.

Many attacks rely on malicious software, so it is important to study the internals of such software to work out who the attacker is and what their intentions are. Malware analysis is therefore a critical part of incident response.

After analyzing a piece of malware, ideally we would know:

  • How it works, that is, the enemy’s strategies.
  • How to identify it, and with it the threat behind it. We want to extract signatures and IoCs (Indicators of Compromise) to prevent further similar infections. Signatures can be used to create detection rules, but this approach is typically ineffective on its own because it is extremely simple for the attacker to make small changes to the sample that invalidate a signature. More general characteristics are therefore needed.
  • How to defend against it or eliminate it. This matters for the last phase of the Incident Response Process, the recovery: to perform a good recovery we need to know every action the malware performed on our system.

2.1. Signatures

By analysing the internals of a malware sample we can identify pieces of information that allow us to write rules, which are then used to identify similar malicious software from the same threat or from threats with similar characteristics. There are two big families of signatures:

  1. Host-based signatures. They are leveraged at host level and can be used to identify files, processes or registry keys on a victim computer that indicate an infection. They focus on what the malware does to the system, not on the malware itself, which makes them robust against the attacker’s ability to easily modify the signature of the malicious executable. This is, in effect, behavioural analysis.
  2. Network-based signatures. They are used to detect malware by analysing network traffic: does the malware probe its surroundings? Does it contact an external C2 (command and control) server? Network-based signatures can be identified through network inspection alone, without analysing the internals of the malware, but to be really effective they have to be coupled with internal malware analysis.

2.2. Safety Guidelines

The inspection relies on a set of tools, a methodology and personal experience. First of all, malware analysis is a risky activity: doing something wrong can cause harm, so all analysis has to be performed inside a safe environment. Malicious code may well have been written by criminals, so it is important to be extremely careful: if an IP address contained in a sample is live, there can be an adversary behind it.

  • Keep an offline backup
  • Always use a VM
  • Disconnect the VM from the internet
  • If you find references to external IP addresses, never try to contact them directly. Also keep in mind that there are ways for malware to know whether it is being executed in an analysis environment.

2.3. Approaches

We’ll use two main approaches: static analysis and dynamic analysis. In static analysis the content of the malware is inspected without running it, using tools such as strings, PE inspection tools and a disassembler. In dynamic analysis the inspection is performed by running the malware and monitoring its effects and its interactions with the operating system. Dynamic analysis is clearly far riskier than static analysis; it is important to use snapshots, together with tools such as Regshot, Process Monitor, Process Hacker, Capture BAT and debuggers, to get a high-level view of how the malware interacts with the environment (registry keys, filesystem and processes).

Static analysis lacks the context provided by the execution state, so the analyst has to infer it from experience, which is extremely difficult. Dynamic analysis has its own limits: more complex malware can detect that it is being run under a debugger and show non-standard behaviour that would never be performed under normal conditions.

The approach suggested is:

  1. Start with automatic tools to extract basic information.
  2. Perform static analysis, using the previously extracted information to understand the overall structure.
  3. For specific functions whose expected behaviour is not obvious from static analysis, use dynamic analysis.

Note that this is not a strictly sequential process: it is common to jump back and forth between static and dynamic analysis.

2.4. Basic Analysis

Basic analysis has no general definition, but it can be viewed as the analysis that can be carried out without knowing how to use advanced tools: plenty of tools automatically extract features from the malware, and they provide useful information as a starting point.

One really basic tool is strings, used to extract IP addresses, names of API functions and more. Using these tools is quick and easy, but with complex malware a lot of information will be missing, for example when obfuscation is used. On the dynamic analysis side, automatic tools may not be effective on all malware because the coder can use evasion techniques.
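To make the idea concrete, here is a minimal sketch of what a strings-like tool does: it scans raw bytes for runs of printable ASCII characters. The function name `extract_strings`, the minimum length of 4 and the sample bytes are assumptions made for illustration; the real tool also handles other encodings (for example UTF-16), which this sketch ignores.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return every run of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Hypothetical bytes standing in for a fragment of a sample.
blob = b"\x00\x01http://192.0.2.10/update\x00\xff\x10URLDownloadToFileA\x00ab\x00"

for text in extract_strings(blob):
    print(text)
# http://192.0.2.10/update
# URLDownloadToFileA
```

The two-character run `ab` is dropped because it is shorter than the minimum length, which is how such tools filter out most random noise.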

2.5. Advanced Analysis

To overcome the limitations caused by the obfuscation and evasion techniques used by the attacker, the analyst has to resort to more advanced tools. On the static side, this means inspecting the disassembled code of the malware and mixing that with the controlled execution of parts of the malware through debugging.

3. Types of Malware

There is no single categorization; in general we recognize:

  • Backdoor
  • Botnet
  • Downloader
  • Information-stealing malware
  • Launcher
  • Rootkit
  • Scareware
  • Spam sending
  • Worms

Real malware can mix characteristics from different categories, so it is very difficult to divide malware sharply into families.

4. Expected Behaviors

We are looking for a set of well defined behaviors.

4.1. Uniqueness

Malware is typically not meant to be an industrial-grade piece of software, so it may contain bugs and spaghetti code, and it can be incompatible with other software. Quite frequently malware is even incompatible with itself: when two instances of the same malware run at the same time, a race condition can occur between them. For this reason, on startup the malware checks for uniqueness, that is, whether another instance of itself is already running. There are several variations, such as patching the vulnerability used for exploitation.

Note that creating a marker to check uniqueness is risky for the malware, because it can be used against it for detection purposes.

4.2. Environment Checks

Most malware is written to target specific categories of users or machines, such as specific companies or geographical regions. The malware can therefore limit its spread through specific checks based on time/date, localization, the presence of a vulnerability and more.

4.3. Persistence

The malware will make sure it can survive changes in the state of the machine, such as a reboot. To guarantee its persistence it creates several copies of itself, buries them in the filesystem, and changes the configuration of the operating system so that it keeps being started.

4.4. Obfuscation and Evasion

4.5. Fingerprinting

Malware typically needs to contact a C2 server, and to announce its presence it has to fingerprint the infected machine. To do so, it collects information about the infected machine.

4.6. Communication

The malware often interacts with a C2 server, and it can use heterogeneous link types and protocols.

5. General Rules

When analyzing malware it is not necessary to understand 100% of the code; it is enough to understand the minimum portion needed to explain the behavior of the malware. On the other hand, to avoid missing key features it is important to use different tools and to know how to use them.

6. Basic Static Analysis Techniques

6.1. Antivirus Scanning

Antivirus software is a basic, automatic tool: given a piece of software, it gives its best guess as to whether that software is malware or not. Even today antiviruses can be fooled by malware coders.

VirusTotal is a web app that, given a suspicious file, scans it with a large set of antivirus engines and reports how many of them flag the file as malware (it also offers many more features). VirusTotal has some drawbacks too; if the malware you upload is unknown, you face three problems:

  1. It will not give any useful information.
  2. Once the sample is submitted, VirusTotal retains the right to collect it, so the adversary can check whether the malware has been found.
  3. If the malware is targeted, it may contain private information about the target.

6.2. Hashes

To avoid points (2) and (3), it is possible to search VirusTotal without uploading the sample, using just the hash (MD5 or SHA-1) of the sample. As a signature, a hash is extremely weak because the attacker can easily circumvent it: changing a single byte of the sample produces a completely different hash.
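As a small sketch, the hashes of a sample can be computed locally with Python's standard `hashlib` module, so only the digest ever needs to leave the analysis machine. The helper name `sample_hashes` and the byte strings used as stand-in samples are assumptions for illustration; SHA-256 is included alongside MD5 and SHA-1 because it is also widely used for lookups.

```python
import hashlib


def sample_hashes(data: bytes) -> dict[str, str]:
    """Compute the digests commonly used to look up a sample."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


# Two hypothetical samples that differ by a single byte.
sample_a = b"MZ" + b"\x00" * 62 + b"payload-v1"
sample_b = b"MZ" + b"\x00" * 62 + b"payload-v2"

print(sample_hashes(sample_a)["sha256"])
print(sample_hashes(sample_b)["sha256"])
# The two digests share nothing recognizable, which is why a hash only
# identifies one exact sample and not the malware behind it.
```

In practice the bytes would be read from the sample file in binary mode, inside the isolated analysis environment.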

The hash of a sample is a characteristic of the sample itself, not a characteristic of its behaviour.

A sample is a specific instance of a malware; the malware itself is the combination of the source code, the strategy decided by the coder and the compiled code. A single malware can have many different samples.

6.3. A file’s strings, functions and headers

To go deeper in understanding a malware it is necessary to look at its internals. A first, simple way to do so is to look for the strings contained in the malware itself; the tool used for this is strings. A common way for malware writers to hide strings is packing.

Packing was used in the past to reduce the size of software. It works like a ZIP file: the compressed code is decompressed in memory, so the software takes less space on disk than in memory. Running strings on a packed file extracts very little information. There are tools to understand which packing technique was used and also to unpack the file; one example is PEiD.
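The effect of packing on string extraction can be illustrated with a short sketch. It uses `zlib` compression as a stand-in for a packer, which is an assumption: real packers use their own formats and prepend an unpacking stub. The byte strings are hypothetical as well.

```python
import re
import zlib


def extract_strings(data: bytes, min_len: int = 6) -> list[str]:
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


# Hypothetical "program" content: padding plus two revealing strings.
original = (b"\x00" * 64 + b"URLDownloadToFileA\x00http://192.0.2.10/update\x00") * 4

packed = zlib.compress(original, 9)   # what would sit on disk
unpacked = zlib.decompress(packed)    # what would exist in memory at run time

print(len(original), len(packed))
print("URLDownloadToFileA" in extract_strings(original))  # True
print(b"URLDownloadToFileA" in packed)                    # False
print(unpacked == original)                               # True
```

The revealing strings are only visible again after decompression, which is why a packed file has to be unpacked (or observed in memory) before string extraction becomes useful.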

7. File execution in Windows

When an executable file is double-clicked in Windows, the OS loader performs a series of activities to launch it:

  • load the data of the executable file
  • interpret the content of the file
  • allocate memory
  • create the execution context

The structure of the executable is predefined: Windows uses the PE (Portable Executable) format, and from the metadata the OS can understand which kind of file it is.

PE_file-format.png

  • The MS-DOS header and the DOS stub are kept for compatibility.
  • The PE header contains the metadata used to correctly load and execute the file.
  • The section table and the sections themselves, typically at least two. Each section contains a specific kind of data, and the purpose of the section table is to describe the sections. The most common sections in an executable are:
    • Code section, named .text
    • Data sections, named .data (read/write data) and .rdata (read-only data)
    • Resource section, named .rsrc
    • Export data section
    • Import data section
    • Debug information section

Note that section names exist only for human interpretation, so a skilled malware writer can alter the name or the content of a specific section just to confuse the analyst, without affecting the execution of the malware.

7.1. DOS Header and STUB

They are structured as follows:

dos-header_stub.png

The DOS stub has no maximum size. Today it is typically small, but some malware hides code in it. The DOS header begins with the DOS magic signature, whose value is always the same: MZ. At the very end of the DOS header (its last 4 bytes, at offset 0x3C) there is the pointer to the PE header: it holds the file offset of the PE header, which is needed because the DOS stub that precedes it has variable length.

dos-header_stub_2.png
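The layout described above can be checked with a few lines of Python. This is a minimal sketch that assumes the standard 64-byte DOS header with the PE header offset stored as a little-endian 32-bit value at offset 0x3C; the function name `pe_header_offset` and the synthetic buffer are made up for illustration.

```python
import struct


def pe_header_offset(data: bytes) -> int:
    """Validate the MZ signature and return the file offset of the PE header."""
    if len(data) < 0x40:
        raise ValueError("file too short to contain a DOS header")
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (offset,) = struct.unpack_from("<I", data, 0x3C)
    return offset


# Synthetic buffer: 64-byte DOS header pointing to 0x80, a zero-filled
# stand-in for the DOS stub, then the PE signature.
dos_header = b"MZ" + b"\x00" * 58 + struct.pack("<I", 0x80)
sample = dos_header + b"\x00" * (0x80 - len(dos_header)) + b"PE\x00\x00"

offset = pe_header_offset(sample)
print(hex(offset))                   # 0x80
print(sample[offset:offset + 4])     # b'PE\x00\x00'
```

A pointer that leads far beyond the usual small stub is one hint that the stub area may be carrying extra content.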

7.2. PE Header

It contains quite a lot of fields that the operating system uses to create the memory footprint of the process that will host the running program. The first 4 bytes always contain the PE signature: the characters PE followed by two null bytes. The COFF File Header is 20 bytes long and is followed by the Optional Header which, despite its name, is not optional for executables. The Optional Header is composed of two parts, the header fields and the DataDirectory, for a total of 224 bytes in a 32-bit PE file.

7.2.1. COFF Header

Some of the fields in this header are deprecated, so we’ll focus on just two of them: NumberOfSections and Characteristics. The first holds how many sections are present in the file; the second is a set of flags that gives the OS information about the nature of the file (for example, whether it is an EXE or a DLL). Another important field is SizeOfOptionalHeader, which gives the total size of the Optional Header (the fixed header fields plus the variable-size DataDirectory).
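As an illustration, the 20-byte COFF header can be unpacked with the standard `struct` module. The sketch assumes the field order given by the PE format specification and builds a synthetic header instead of reading a real file; the names `CoffHeader`, `parse_coff_header` and `is_dll` are hypothetical helpers.

```python
import struct
from typing import NamedTuple

IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002  # Characteristics flag: file is runnable
IMAGE_FILE_DLL = 0x2000               # Characteristics flag: file is a DLL


class CoffHeader(NamedTuple):
    machine: int
    number_of_sections: int
    time_date_stamp: int
    pointer_to_symbol_table: int
    number_of_symbols: int
    size_of_optional_header: int
    characteristics: int


def parse_coff_header(data: bytes, pe_offset: int) -> CoffHeader:
    """Check the PE signature, then unpack the 20 bytes that follow it."""
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    return CoffHeader(*struct.unpack_from("<HHIIIHH", data, pe_offset + 4))


def is_dll(header: CoffHeader) -> bool:
    return bool(header.characteristics & IMAGE_FILE_DLL)


# Synthetic header: i386 machine, 3 sections, 224-byte Optional Header, DLL flags.
raw = b"PE\x00\x00" + struct.pack("<HHIIIHH", 0x014C, 3, 0, 0, 0, 224, 0x2102)
header = parse_coff_header(raw, 0)
print(header.number_of_sections, header.size_of_optional_header, is_dll(header))
# 3 224 True
```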

7.2.2. Optional Headers

The FileAlignment field concerns the way data is organized on disk: it gives the granularity of section alignment in the file (for example, sections start at multiples of 512 bytes). The SectionAlignment field concerns the way data is organized in memory, in the same manner as FileAlignment; the granularity is usually different because different storage technologies need different granularities to be accessed efficiently. ImageBase contains the preferred load address of the file in memory; since the address space is virtualized in modern OSs, the programmer can indicate a preferred load address. AddressOfEntryPoint contains the address, relative to ImageBase, of the first instruction to be executed, which is not necessarily the first instruction of the main function. SizeOfImage contains the overall size of the PE image in memory: the sum of all headers and sections, each aligned to SectionAlignment. SizeOfHeaders is the size of all headers plus the section table.
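The alignment arithmetic behind these fields can be sketched in a few lines. All numeric values below (alignments, sizes, ImageBase, entry point) are example values chosen for illustration, and `align_up` and `size_of_image` are hypothetical helper names; the second function simply follows the description of SizeOfImage given above and assumes contiguous sections.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return (value + alignment - 1) // alignment * alignment


def size_of_image(size_of_headers: int, virtual_sizes: list[int],
                  section_alignment: int) -> int:
    """Sum the headers and every section, each rounded up to SectionAlignment."""
    total = align_up(size_of_headers, section_alignment)
    for size in virtual_sizes:
        total += align_up(size, section_alignment)
    return total


FILE_ALIGNMENT = 0x200      # example value: 512 bytes on disk
SECTION_ALIGNMENT = 0x1000  # example value: 4 KiB in memory

# A section holding 0x1234 bytes occupies different amounts of space
# on disk and in memory.
print(hex(align_up(0x1234, FILE_ALIGNMENT)))      # 0x1400
print(hex(align_up(0x1234, SECTION_ALIGNMENT)))   # 0x2000

# Entry point as a virtual address: ImageBase plus the relative address.
image_base, address_of_entry_point = 0x400000, 0x1000
print(hex(image_base + address_of_entry_point))   # 0x401000

print(hex(size_of_image(0x400, [0x1234, 0x300], SECTION_ALIGNMENT)))  # 0x4000
```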

7.2.3. Section Table

The section table contains one entry for each section defined in the file:

section_table.png
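A single section table entry can be decoded as in the sketch below. The field layout (40 bytes per entry) follows the PE format specification; whether it matches the figure field by field is an assumption, and the entry itself is synthetic. `parse_section_header` is a hypothetical helper name.

```python
import struct

# Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")


def parse_section_header(raw: bytes) -> dict:
    """Decode one 40-byte section table entry into a dictionary."""
    (name, virtual_size, virtual_address, size_of_raw_data,
     pointer_to_raw_data, _reloc_ptr, _line_ptr, _reloc_count,
     _line_count, characteristics) = SECTION_HEADER.unpack(raw)
    return {
        # The name is null-padded and purely informational.
        "name": name.rstrip(b"\x00").decode("ascii", errors="replace"),
        "virtual_size": virtual_size,
        "virtual_address": virtual_address,
        "size_of_raw_data": size_of_raw_data,
        "pointer_to_raw_data": pointer_to_raw_data,
        "characteristics": characteristics,
    }


# Synthetic entry for a code section.
entry = SECTION_HEADER.pack(b".text", 0x1234, 0x1000, 0x1400, 0x400,
                            0, 0, 0, 0, 0x60000020)
section = parse_section_header(entry)
print(section["name"], hex(section["virtual_size"]), hex(section["size_of_raw_data"]))
# .text 0x1234 0x1400
```

A raw size much smaller than the virtual size is worth a second look, since the section will be filled in at run time rather than from the file.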

8. Imports

The OS has to take care of loading code from external libraries. Library linking can happen in three different ways: statically, dynamically and at runtime.

8.1. Static Linking

It is rarely used for Windows executables. The code of the linked functions becomes an integral part of the compiled code, which makes the executable larger. Statically linked libraries are a small problem when statically analyzing a sample, because they make the binary harder to read: there is no visible difference between the code written by the coder and the imported code.

8.2. Dynamic Linking

It is the most common method used in Windows applications: the OS searches for the necessary libraries when the program is loaded. The coder just declares which functions the program needs, and the OS links them at load time; the corresponding metadata is stored in the import table of the PE file. If a library is not present in the filesystem, the OS raises an error.

8.3. Runtime Linking

The library is loaded while the software is already running, and locating the library is the coder’s responsibility. It is unpopular in benign software but common in malware, especially when the malware is packed or obfuscated. It is most commonly done with the LoadLibrary and GetProcAddress functions. A malware coder prefers runtime linking because it makes the malware harder to reverse engineer and to detect.
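The following sketch is only an analogy in Python, not the Windows API: `importlib.import_module` plays the role of LoadLibrary and `getattr` plays the role of GetProcAddress. It shows why runtime linking hides dependencies: the library and function names are ordinary strings that can be assembled while the program runs, so they never appear in a declared import list. The helper name `resolve_at_runtime` is hypothetical.

```python
import importlib


def resolve_at_runtime(library_name: str, function_name: str):
    """Load a library by name and look up one of its functions by name."""
    module = importlib.import_module(library_name)  # analogue of LoadLibrary
    return getattr(module, function_name)           # analogue of GetProcAddress


# The names are built at run time, so neither "math" nor "sqrt" appears
# as a plain import anywhere in this file.
library = "".join(["ma", "th"])
function = "".join(reversed("trqs"))

sqrt = resolve_at_runtime(library, function)
print(sqrt(16.0))  # 4.0
```

For the analyst, the practical consequence is that the strings passed to the lookup functions, once recovered, are a substitute for the missing import entries.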

8.4. Clues 🔍 in libraries

The PE header lists every library and function that will be loaded at load time, and their names can reveal what the program does; for example, URLDownloadToFile indicates that the program downloads something. Modern malware interacts a lot with the operating system, so just inspecting the import table reveals many of the interactions between the malware and the OS. A tool that can be used to inspect the import table (and the export table) of a PE file is Dependency Walker.

Some commonly used libraries are Kernel32.dll, Advapi32.dll, User32.dll and so on. Windows makes a lot of different libraries available to the programmer.
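Reading clues from imported names can be partly automated with a lookup table. The sketch below is illustrative only: the table contains just the three functions mentioned in these notes, the descriptions are informal, and `import_clues` is a hypothetical helper. It also accounts for the fact that many Windows API functions are exported with an `A` (ANSI) or `W` (wide character) suffix.

```python
# Illustrative hint table; a real one would be far larger.
API_HINTS = {
    "URLDownloadToFile": "downloads a file from a URL",
    "LoadLibrary": "loads a library at run time",
    "GetProcAddress": "resolves a function address at run time",
}


def import_clues(imported_names: list[str]) -> dict[str, str]:
    """Map each recognized imported function name to a behavioural hint."""
    clues = {}
    for name in imported_names:
        base = name
        # Strip the ANSI/wide suffix when the remaining name is known.
        if name.endswith(("A", "W")) and name[:-1] in API_HINTS:
            base = name[:-1]
        if base in API_HINTS:
            clues[name] = API_HINTS[base]
    return clues


# Hypothetical list of names, as a tool listing the import table might report.
imports = ["URLDownloadToFileA", "ExitProcess", "LoadLibraryW", "GetProcAddress"]
for name, hint in import_clues(imports).items():
    print(f"{name}: {hint}")
```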

8.5. Clues 🔍 in the PE Header

The TimeDateStamp field provides the compilation timestamp. Combined with other information, it can be used to pinpoint when the software was created.
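The field stores the number of seconds elapsed since 1970-01-01 00:00 UTC, so converting it to a readable date is a one-liner. The helper name `compile_time` and the example values are assumptions for illustration. The value is written by the build toolchain and can be altered afterwards, which is why it should only be trusted in combination with other evidence.

```python
from datetime import datetime, timezone


def compile_time(time_date_stamp: int) -> datetime:
    """Convert a TimeDateStamp value to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(time_date_stamp, tz=timezone.utc)


print(compile_time(0).isoformat())      # 1970-01-01T00:00:00+00:00
print(compile_time(86400).isoformat())  # 1970-01-02T00:00:00+00:00
```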

9. Resource Hacker

It is a handy tool for browsing the .rsrc section, which contains the strings, icons and menus of the PE file.

Author: Andrea Ercolino

Created: 2022-12-12 Mon 12:10