Portable Executable File

Hi techies!! I was recently reading about static malware analysis and kept running into the term "PE file" without being sure what it actually is. Digging into it turned up many interesting details, so here I am writing a blog on the Portable Executable file to give a clearer picture. In this blog, I will cover the basics of file formats and then go through the PE file format, its structure, and the tools for viewing PE files.


What is File Format?



A file format is the structure of a file: how the data within it is organized. A program that uses the data must be able to recognize that layout in order to access the data. For example, a Web browser can process a file in the HTML file format and display it as a Web page, but it cannot display a file in a format designed for Microsoft's Word program. A file's format is usually indicated by its file name extension.


A few of the more common file formats are:
  • Word documents (.doc)
  • Executable programs (.exe)
  • Web text pages (.htm or .html)
  • Images (.gif and .jpg)
  • Adobe Acrobat files (.pdf)
  • Multimedia files (.mp3)
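
An extension is only a hint, since a file can be renamed. Many formats also begin with a recognizable byte signature, which is what analysis tools tend to check. Here is a minimal sketch of that idea in Python; the table covers only a few of the formats listed above, and the names SIGNATURES and guess_format are made up for this example.

```python
# Leading byte signatures ("magic numbers") for a few common formats.
# This table is illustrative and far from complete.
SIGNATURES = [
    (b"MZ", "MZ executable (.exe)"),
    (b"%PDF-", "PDF document (.pdf)"),
    (b"GIF87a", "GIF image (.gif)"),
    (b"GIF89a", "GIF image (.gif)"),
    (b"\xff\xd8\xff", "JPEG image (.jpg)"),
    (b"ID3", "MP3 audio with an ID3 tag (.mp3)"),
]


def guess_format(data: bytes) -> str:
    """Guess a file format from the first bytes of its content."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage: python guess_format.py <path-to-file>
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            print(guess_format(f.read(16)))
```

Reading only the first 16 bytes is enough here because every signature in the table is shorter than that.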

What is PE File Format?



The Portable Executable file format is used by 32-bit and 64-bit versions of Windows for executables, DLLs, COM components, .NET executables, .FON font files, NT kernel-mode drivers, etc. It contains the information the Windows OS loader needs to manage the wrapped executable code. PE is derived from COFF (Common Object File Format), which is still used for object code. The file extensions commonly associated with the PE format are: .cpl, .dll, .drv, .efi, .exe, .ocx, .scr and .sys.


Basic Structure of PE File



Diagram 1 shows the basic structure of the Portable Executable file format. You can also use a tool such as PE View to look at the basic structure of a PE file.


Diagram 1

1. DOS MZ Header


The first 64 bytes of every PE file hold this header, which is checked to decide whether the file is a valid PE file. In every valid PE file the first two bytes are 4D and 5A ("MZ" in ASCII), as shown in Exhibit 1. The letters are the initials of Mark Zbikowski, a well-known architect of MS-DOS. The header contains a number of fields. Here, we will discuss the two important ones: e_magic and e_lfanew.


  • e_magic is the first field, also called the magic number. Its primary purpose is to identify the file as compatible with the MS-DOS executable file type. The value for all MS-DOS-compatible executable files is 4D 5A, as shown in Exhibit 1, representing the ASCII characters MZ. This is why the MS-DOS header is sometimes referred to as the MZ header.
  • e_lfanew is the last field, a 4-byte value at offset 3Ch that holds the file offset of the PE header. The Windows loader reads this offset to skip the DOS stub and go directly to the PE header.
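
Reading these two fields by hand takes only a few lines. The following is a minimal sketch in Python, assuming the standard layout described above (e_magic at offset 0, e_lfanew as a little-endian 4-byte value at offset 3Ch). The function name read_dos_header is made up for this example.

```python
import struct

DOS_HEADER_SIZE = 64   # the DOS MZ header is 64 bytes long
E_LFANEW_OFFSET = 0x3C  # position of e_lfanew inside the header


def read_dos_header(data: bytes) -> int:
    """Validate e_magic and return e_lfanew, the offset of the PE header."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("too short to contain a 64-byte DOS header")
    # e_magic: the first two bytes must be 4D 5A ("MZ").
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: 4 bytes, little-endian, at offset 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    return e_lfanew
```

To try it on a real file, read the first 64 bytes with open(path, "rb").read(64) and pass them to the function.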


2. DOS Stub


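The DOS stub is a small MS-DOS program containing the string "This program cannot be run in DOS mode.". It acts as a warning message when someone tries to run the program under MS-DOS instead of Windows. It starts right after the 4-byte e_lfanew field, at file offset 40h, and runs up to the PE header. Its size is not fixed: the PE header often begins at offset 80h or later, which is why the loader relies on e_lfanew to find it.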

3. PE File Header


The PE header is also known as IMAGE_NT_HEADERS and contains three main components, as shown below:

i. Signature
  • The structure begins with the DWORD value 50h, 45h, 00h, 00h ("PE" followed by two terminating zero bytes). This signature indicates that the PE header starts here.
The diagram below illustrates the structure and values of a PE executable.


Exhibit 1
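
Putting the DOS header and the signature together gives a quick validity check. This is a minimal sketch in Python, assuming the layout described so far; has_pe_signature is a name made up for this example, and a real validator would check much more than these two markers.

```python
import struct


def has_pe_signature(data: bytes) -> bool:
    """Return True if data starts with MZ and e_lfanew points at "PE\\0\\0"."""
    if len(data) < 64 or data[:2] != b"MZ":
        return False
    # Follow e_lfanew (4 bytes at offset 0x3C) to the PE header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # Slicing past the end yields fewer than 4 bytes, so this is safe
    # even when e_lfanew points outside the data.
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"
```
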
ii. File Header
  • The next 20 bytes after the signature represent the file header. It contains information about the physical layout and properties of the file, which includes the following:
    • Machine - A number identifying the target CPU architecture, for example 014Ch for 32-bit x86 or 8664h for x64.
    • NumberOfSections - The number of sections the PE file holds. Values are stored little-endian, so the bytes 04h 00h mean the file contains four sections.
    • TimeDateStamp - The time when the linker (or, for an OBJ file, the compiler) produced this file, stored as the number of seconds since January 1, 1970.
    • PointerToSymbolTable - The file offset of the COFF symbol table, or zero if there is none.
    • NumberOfSymbols - The number of entries in the symbol table.
    • SizeOfOptionalHeader - As the name suggests, this is the size of the optional header, which is required for an executable file. For an object file the value is zero.
Exhibit 2
    • Characteristics - A set of flags that, among other things, help identify whether the file is a DLL or an executable.
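
The seven fields above map directly onto a 20-byte little-endian record, so they can be unpacked in one call. Here is a minimal sketch in Python. The names FileHeader, parse_file_header and is_dll are made up for this example; the DLL flag value 2000h is the one given for IMAGE_FILE_DLL in Microsoft's PE format documentation.

```python
import struct
from collections import namedtuple

FileHeader = namedtuple(
    "FileHeader",
    "machine number_of_sections time_date_stamp pointer_to_symbol_table "
    "number_of_symbols size_of_optional_header characteristics",
)

IMAGE_FILE_DLL = 0x2000  # Characteristics flag: the file is a DLL


def parse_file_header(data: bytes, pe_offset: int) -> FileHeader:
    """Parse the 20-byte file header that follows the 4-byte PE signature.

    pe_offset is the value of e_lfanew from the DOS header.
    """
    # H = 2-byte field, I = 4-byte field, "<" = little-endian.
    return FileHeader(*struct.unpack_from("<HHIIIHH", data, pe_offset + 4))


def is_dll(header: FileHeader) -> bool:
    """Check the DLL bit in the Characteristics flags."""
    return bool(header.characteristics & IMAGE_FILE_DLL)
```
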

iii. Optional Header
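This header follows the file header and contains information about the logical layout of the file. It typically takes up the next 224 bytes in a 32-bit (PE32) file and 240 bytes in a 64-bit (PE32+) file; the exact size is given by SizeOfOptionalHeader. Some of the important fields are:
  • Magic - An unsigned integer that identifies the state of the image file: 0x10b for a 32-bit (PE32) executable and 0x20b for a 64-bit (PE32+) executable. Exhibit 3 shows the value set to 0x10b for a 32-bit executable.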
Exhibit 3
  • AddressOfEntryPoint - The address, relative to the base address of the loaded image, where execution of the file starts.
  • SectionAlignment - The alignment of sections when they are loaded into memory. If the value is 4096 (1000h), each section starts at a multiple of 4096 bytes and occupies one or more slots of 4096 bytes each, no matter the actual size of the section.
  • SizeOfImage - The size of the whole image as loaded in memory, including all headers and sections. It must be a multiple of SectionAlignment.
  • FileAlignment - The alignment of sections in the file on disk, when the file is not loaded. It works like SectionAlignment; the only difference is the size of each slot, which is commonly 512 bytes (200h).
  • DataDirectory - The last 128 bytes hold DataDirectory, an array of 16 IMAGE_DATA_DIRECTORY structures of 8 bytes each. Each one relates to an important data structure in the PE file, for example the import table, the export table, etc.
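
The alignment fields are easiest to understand with a little arithmetic. The following minimal sketch in Python reads the fields discussed above and rounds a size up to an alignment. The field offsets (0, 16, 32, 36 and 56 bytes from the start of the optional header) are taken from Microsoft's PE format documentation and are the same for PE32 and PE32+; the function names are made up for this example.

```python
import struct


def align_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def parse_optional_header(opt: bytes) -> dict:
    """Read a few fields from the raw bytes of an optional header."""
    (magic,) = struct.unpack_from("<H", opt, 0)
    formats = {0x10B: "PE32", 0x20B: "PE32+"}
    if magic not in formats:
        raise ValueError(f"unexpected optional header magic {magic:#x}")
    (entry_point,) = struct.unpack_from("<I", opt, 16)
    # SectionAlignment and FileAlignment sit next to each other.
    section_alignment, file_alignment = struct.unpack_from("<II", opt, 32)
    (size_of_image,) = struct.unpack_from("<I", opt, 56)
    return {
        "format": formats[magic],
        "address_of_entry_point": entry_point,
        "section_alignment": section_alignment,
        "file_alignment": file_alignment,
        "size_of_image": size_of_image,
    }
```

For example, a section holding 1234h bytes of data takes up align_up(0x1234, 0x200) = 1400h bytes on disk with a FileAlignment of 200h, and align_up(0x1234, 0x1000) = 2000h bytes in memory with a SectionAlignment of 1000h.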

4. Section Table


This table immediately follows the optional header. It contains information about the sections present in the PE file. The total number of sections can also be seen in the file header under NumberOfSections. If the number of sections in a PE file is five, then there must be five IMAGE_SECTION_HEADER structures, 40 bytes each, right after the optional header. The main fields are:
  • Name1 - An 8-byte, null-padded UTF-8 string holding the section name (the field is called Name in the official structure definition). The name can be empty.
  • VirtualSize - The total size in bytes of the section when it is loaded into memory. It may be smaller than the size of the section on disk, because the size on disk is rounded up to FileAlignment. If it is larger, the rest of the section is filled with zeros.
  • SizeOfRawData - The size of the section's data in the file on disk, a multiple of FileAlignment.
  • PointerToRawData - The offset from the beginning of the file to the section's data, which makes it the field to use when locating a section on disk.
  • Characteristics - Flags that describe the section, such as whether it contains code and whether it is readable, writable, or executable.
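
Since every entry in the section table has the same 40-byte layout, walking the table is a simple loop. Here is a minimal sketch in Python; parse_section_table is a name made up for this example, and the caller is assumed to have already worked out where the table starts (right after the optional header) and how many entries it has (NumberOfSections).

```python
import struct

# Name (8 bytes), VirtualSize, VirtualAddress, SizeOfRawData,
# PointerToRawData, PointerToRelocations, PointerToLinenumbers,
# NumberOfRelocations, NumberOfLinenumbers, Characteristics.
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")  # 40 bytes in total


def parse_section_table(data: bytes, table_offset: int, count: int) -> list:
    """Return one dict per IMAGE_SECTION_HEADER in the section table."""
    sections = []
    for i in range(count):
        fields = SECTION_HEADER.unpack_from(
            data, table_offset + i * SECTION_HEADER.size
        )
        sections.append({
            # Strip the null padding from the 8-byte name.
            "name": fields[0].rstrip(b"\x00").decode("utf-8", errors="replace"),
            "virtual_size": fields[1],
            "virtual_address": fields[2],
            "size_of_raw_data": fields[3],
            "pointer_to_raw_data": fields[4],
            "characteristics": fields[9],
        })
    return sections
```

With the result in hand, the raw bytes of a section are data[pointer_to_raw_data : pointer_to_raw_data + size_of_raw_data].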

5. PE File Section


The PE file sections contain the main content of the file, including code, data, resources, and other executable files. Each section has a header (its entry in the section table) and a body (the raw data).
  • .text - This section, also known as CODE, is where all the instructions reside, to be executed by the CPU. It normally contains the entry point mentioned earlier.
  • .rdata - This section stores read-only data used by the program, such as literals and constant strings. The import and export information is often placed here as well, although some linkers use separate .idata and .edata sections for it.
  • .data - This section holds the program's global data, which can be accessed from anywhere in the program.
  • .rsrc - This section contains resources used by the executable, such as images, icons, and menus. Resource Hacker (ResHacker) is a resource editor that displays this section as a structured tree.

Tools

PE data can be viewed using various tools. Some of the free tools are listed below:
  • PE View - Available for Windows
  • PE Explorer - Available for Windows
  • FileAlyzer - Available for Windows
  • CFF Explorer - Available for Windows

Conclusion



So that was a brief look at the Portable Executable file structure. For starters, this should be enough to get a basic understanding of how a PE file is laid out. If you want to dig deeper, Microsoft's PE format documentation is a good next stop. I hope this blog was useful.

Comment and share if you like it.