1 SRE Basics

2 In this Section…
We briefly cover the following topics
  Assembly code
  Virtual machine/Java bytecode
  Windows PE file format

3 Assembly Code

4 High Level Languages
First, high level languages…
Ancient high level languages
  Basic --- little structure
  FORTRAN --- limited structure
  C --- “structured” language
C was designed to deal with complexity
OO languages take this one step further
Above languages considered primitive today

5 High Level Languages
Object oriented (OO) languages
  “Object” groups code and data together
  Considered best way to handle complexity (at least for now…)
Important OO ideas include
  Encapsulation, inheritance, polymorphism

6 High Level Languages
Program must deal with code and data
Data
  Variables, data structures, files, etc.
Code
  Reverser must study control flow
  Conditionals, switches, loops, etc.

7 High Level Languages
High level languages --- different users want different things
  Goes back (at least) to C vs FORTRAN
Today, major tradeoff is between simplicity and flexibility
  Simplicity --- easy to write short program to do exactly what you want (e.g., C)
  Flexibility --- language has it all (e.g., Java)

8 High Level Languages
Some languages compiled into native code
  exe is specific to the hardware
  C, C++, FORTRAN, etc.
Other languages “compiled” into “code”, which is interpreted by a virtual machine
  Java, C#
  Often possible to make compiled version
For reverser, this distinction is far more important than OO or not

9 Intro to Assembly
At the lowest level, machine binary
Assembly code lives between binary and high level languages
When reversing native code, we must deal with assembly code
Why assembly code? Why not “reverse” binary to, say, C?

10 Intro to Assembly
Reverser would like to deal with high level, but is stuck with low level
Ideally, want to create mental “link” from low level to high level
  Easier for code written in C
  Harder for OO code, such as C++
  Why?

11 Intro to Assembly
Perhaps biggest difference at assembly level is dealing with data
High level languages hide lots and lots of details on data manipulations
  For example, loading and storing
Also, low level instructions are primitive
  Each instruction does not do very much

12 Intro to Assembly
Consider following simple C program
  int multiply(int x, int y)
  {
      int z;
      z = x * y;
      return z;
  }
Simple, but far higher level than assembly code

13 Intro to Assembly
  int multiply(int x, int y)
  {
      int z;
      z = x * y;
      return z;
  }
In assembly code…
  1. Store state before entering function
  2. Allocate memory for z
  3. Load x and y into registers
  4. Multiply x by y and store result in register
  5. Copy result back to memory for z (optional)
  6. Restore state that was stored in 1.
  7. Return z
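
The seven steps can be sketched as a toy simulation. This is only an illustration in Python, not real x86: the register dictionary, the list used as a stack, and the sentinel value placed in EBX are all assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value):
    """Interpret a 32-bit pattern as a signed integer, like a C int."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def multiply_low_level(x, y):
    """Mimic the low-level steps behind `z = x * y; return z;`."""
    regs = {"eax": 0, "ebx": 0xCAFE}   # EBX holds a caller value (hypothetical)
    stack = []

    stack.append(regs["ebx"])           # 1. store state before using EBX
    stack.append(0)                     # 2. allocate a stack slot for z
    regs["eax"] = x & MASK32            # 3. load x and y into registers
    regs["ebx"] = y & MASK32
    regs["eax"] = (regs["eax"] * regs["ebx"]) & MASK32  # 4. multiply in a register
    stack[-1] = regs["eax"]             # 5. copy result back to z (optional)
    stack.pop()                         #    release the slot for z
    regs["ebx"] = stack.pop()           # 6. restore state saved in step 1
    return to_signed32(regs["eax"]), regs  # 7. return z (held in EAX)
```

Calling `multiply_low_level(6, 7)` returns the product together with the final register state, so one can check that EBX was restored.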

14 Intro to Assembly
Why are things so complicated at low level?
  It’s all about efficiency!
Reading memory and storing are slow
No single asm instruction to read memory, operate on it, and store result
  But this is common in high level languages

15 Intro to Assembly
Registers --- “local” processor memory
  So don’t have to read and write RAM
Stack --- “scratch paper” (in RAM)
  Holds register values, local variables, function parameters and return values
  E.g., storage for “z” in multiply example
Heap --- dynamic, variable-sized data
Data section --- e.g., string constants
Control flow --- high level “if” or “while” are much more complex at low level

16 Registers
Registers used in most instructions
Specifics here deal with “IA-32”
  Intel Architecture, 32-bit
  Used in “Wintel” machines
We use Intel notation
  AT&T notation also exists
Eight 32-bit registers (next slide)
  All 8 start with “E”
Also several system registers

17 Registers
EAX, EBX, EDX --- generic, used for int, Boolean, …, memory operations
ECX --- generic, used as counter
ESI/EDI --- generic, source/destination pointers when copying memory
  SI == source index, DI == destination index
EBP --- generic, stack “base” pointer
  Usually, stack position after return address
ESP --- stack pointer
  Current stack frame is between ESP and EBP

18 Flags
EFLAGS --- special register
  Status flags updated by various operations to “record” outcomes
  System flags too, but we don’t care about them
Flags are basic tool for conditionals
  For example, a TEST followed by a jump instruction
  TEST sets various flags, jump determines action to take, based on those flags

19 Instruction Format
Most instructions consist of…
  Opcode --- the “instruction”
  One or two operands --- “parameter(s)”
Operands (parameters) are data
Operands come in 3 flavors
  Register name --- for example, EAX
  Immediate --- e.g., hard-coded constant
  Memory address --- enclosed in [brackets]

20 Operand Examples
EAX
  Read from (or write to) EAX register, depending on opcode
0x30004040
  Immediate --- number is embedded in code
  Usually a constant in high-level code
[0x4000349e]
  This is a memory address
  Could be a global variable in high level code

21 Basic Instructions
We cover a few common instructions
  First we give general format
  Later, we give a few simple examples
There are lots of assembly instructions
But, most assembly code uses only a few
  About 14 assembly instructions account for more than 90% of all code

22 Opcode Counts
Typical opcode counts, “normal” code

23 Opcode Counts
Opcode counts, typical virus code

24 Instructions
We consider following operations
  Moving data
  Arithmetic
  Comparisons
  Conditional branches
  Function calls

25 Moving Data
MOV is the most popular opcode
2 operands, destination and source:
  MOV DestOperand, SourceOperand
Note the order
  Destination first, source second

26 Arithmetic
Six integer arithmetic operations
  ADD, SUB, MUL, DIV, IMUL, IDIV
Many variations based on operands
  ADD Op1, Op2 ; add, store result in Op1
  SUB Op1, Op2 ; sub Op2 from Op1 --> Op1
  MUL Op       ; mul Op by EAX ---> EDX:EAX
  DIV Op       ; div EDX:EAX by Op
               ; quotient ---> EAX, remainder ---> EDX
IMUL, IDIV --- like MUL and DIV, but signed
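
The EDX:EAX convention for the one-operand MUL and DIV forms is easy to misread, so here is a minimal Python sketch of the unsigned 32-bit case. The function names `mul32` and `div32` are made up for this illustration; the exceptions stand in for the divide error a real processor would raise.

```python
MASK32 = 0xFFFFFFFF

def mul32(eax, op):
    """Unsigned MUL: EAX * op gives a 64-bit result, returned as (EDX, EAX)."""
    product = (eax & MASK32) * (op & MASK32)
    return (product >> 32) & MASK32, product & MASK32

def div32(edx, eax, op):
    """Unsigned DIV: EDX:EAX / op, returned as (quotient for EAX, remainder for EDX)."""
    op &= MASK32
    if op == 0:
        raise ZeroDivisionError("divide by zero")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, op)
    if quotient > MASK32:
        # The quotient must fit in EAX; otherwise the CPU faults.
        raise OverflowError("quotient does not fit in EAX")
    return quotient, remainder
```

For example, `mul32(0xFFFFFFFF, 2)` spills the carry into EDX, and feeding that pair back through `div32` with divisor 2 recovers the original value.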

27 Comparisons
CMP opcode has 2 operands
  CMP Operand1, Operand2
Subtracts Operand2 from Operand1
  Result “stored” in flag bits
  If 0 then ZF flag is set
Other flags can be used to tell which is greater, depending on signed or unsigned
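
As a rough sketch of how CMP records its outcome, the Python below computes the four status flags most often consulted after a 32-bit comparison. The function name and the dictionary layout are assumptions for illustration; only the flag definitions (ZF, CF, SF, OF) come from the architecture.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000

def cmp_flags(op1, op2):
    """Return the status flags set by CMP op1, op2 (op1 - op2, result discarded)."""
    a, b = op1 & MASK32, op2 & MASK32
    result = (a - b) & MASK32
    return {
        "ZF": result == 0,                 # operands are equal
        "CF": a < b,                       # unsigned borrow: op1 below op2
        "SF": bool(result & SIGN_BIT),     # sign bit of the result
        # Signed overflow: operands differ in sign and the result's sign
        # differs from op1's sign.
        "OF": bool((a ^ b) & (a ^ result) & SIGN_BIT),
    }

def signed_less(flags):
    """Condition tested by JL: SF != OF."""
    return flags["SF"] != flags["OF"]
```

An unsigned comparison reads CF (as JB does), while a signed comparison reads SF and OF together, which is why the same CMP can serve both interpretations.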

28 Conditional Branches
Conditional branches use “Jcc” family of instructions (je, jne, jz, jnz, etc.)
Format is
  Jcc TargetAddress
If Jcc true, goto TargetAddress
  Otherwise, what happens?

29 Function Calls
Use CALL and RET
  CALL FunctionAddress
  ……
  RET ; pops return address
RET can be told to increment ESP
  Need to reset stack pointer
  Why?

30 Examples
  cmp ebx,0xf020
  jnz …
What does this do?
  Compares value in EBX with constant
  Jumps to specified address if operands are not same
  Note: JNE and JNZ are same instruction

31 Examples
  mov edi,[ecx+0x5b0]
  mov ebx,[ecx+0x5b4]
  imul edi,ebx
What does this do?
  First, add 0x5b0 to ECX register, get value at that memory and put in EDI
  Next, add 0x5b4 to ECX, get value at that memory and put in EBX
  Note that ECX points to some data structure
  Finally, EDI = EDI * EBX
  Note there are different forms of IMUL

32 Examples
  push eax
  push edi
  push ebx
  push esi
  push dword ptr [esp+0x24]
  call 0x10026eeb
What does this do?
  PUSH four register values
  PUSH something related to stack ptr
    Probably, parameter or local variable
    Would need to look at more code to decide
    Note “dword ptr” is effectively a cast
  CALL a function

33 Examples
  mov eax, dword ptr [ebp - 0x20]
  shl eax, 4
  mov ecx, dword ptr [ebp - 0x24]
  cmp dword ptr [eax+ecx+4], 0
  call 0x10026eeb
What does this do?
  Maybe “data structure in an array”
  In the CMP line
    ECX --- gets base pointer
    EAX --- current offset into the array
    Add 4 to get specific member of structure
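
The “data structure in an array” reading rests on the address arithmetic: SHL by 4 multiplies the index by 16, which suggests 16-byte elements, and the constant 4 selects a member inside the element. A minimal Python sketch of that calculation follows; the function and parameter names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def element_field_address(base, index, field_offset, shift=4):
    """Address computed by [eax+ecx+field_offset] after `shl eax, shift`.

    base         -- array base pointer (the value loaded into ECX)
    index        -- element index (the value loaded into EAX before the shift)
    field_offset -- offset of the member inside one element (4 in the slide)
    """
    element_offset = (index << shift) & MASK32   # shl eax, 4  ->  index * 16
    return (base + element_offset + field_offset) & MASK32
```

With a hypothetical base of 0x1000, element 3 starts at 0x1030 and its member at offset 4 lives at 0x1034.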

34 Examples
AT&T syntax
  pushl $14
  pushl $helloWorld
  pushl $1
  movl $4, %eax
  pushl %eax
  int $0x80
  addl $16, %esp
  pushl $0
  movl $1, %eax

35 Compilation
Converts high level representation of code to binary
Front end --- lexical analysis
  Verify syntax, etc.
Intermediate representation
Optimization
  Improve structure, eliminate redundancy, …

36 Compilation
Back end --- generates the actual code
  Instruction selection
  Register allocation
  Instruction scheduling --- pipelining, parallelism
Back end process might make disassembly hard to read
  Optimization too
Each compiler has its own quirks
  Can you automatically determine compiler?

37 Virtual Machines & Bytecode

38 Virtual Machines
Some languages instead generate intermediate bytecode
Bytecode runs in a virtual machine
  Virtual machine is a program that (historically) interprets bytecode
  Translates bytecode for the hardware
Bytecode analogous to assembly code

39 Virtual Machines
Advantages?
  Hardware independent
Disadvantages?
  Slow
Today, usually just-in-time compilers instead of interpreters
  Compile snippets of bytecode into native code as needed

40 Reversing Bytecode
Reversing bytecode is easy
  Unless special precautions are taken
  Even then, easier than native code
Bytecode usually contains lots of metadata
  Possible to reconstruct highly accurate high level language
Bytecode can be obfuscated
  In worst case, reverser must learn bytecode
  But bytecode is easier than native code

41 Windows PE Files

42 Windows PE File Format
Designed to be standard executable file format for all versions of OS…
  …on all supported processors
Only small changes since PE format was introduced
  E.g., support for 64-bit Windows

43 Windows PE Files
Trivia
  Q: What’s the difference between exe and dll?
  A: Not much --- one bit differs in PE files
  Q: What is size of smallest possible PE file?
  A: 133 bytes
PE file on disk is a file
Once loaded into memory, it’s a module
  File is mapped to module
  Address where module begins is HMODULE
  PE file may not all be mapped to module

44 Windows PE Files
WINNT.H is final word on what PE file looks like
Tools to examine PE files
  Dumpbin (Visual Studio)
  Depends
  PE Browse Professional
    In spite of its name, it’s free
  PEDUMP (by author of article)

45 PE File Sections
Each section is “chunk of code or data that logically belongs together”
  For example, all import tables in one section
Code is in .text section
Code is code, but many types of data
Data examples
  Program data (e.g., .rdata for read-only)
  API import/export tables
  Resources, relocation info, etc.
Can specify section names in C++ source

46 PE File Sections
When mapped, module starts on a page boundary
Linker can be told to merge sections
  E.g., to merge .text and .rdata:
    /MERGE:.rdata=.text
Some sections commonly merged
Some sections cannot be merged

47 Relative Virtual Addresses
Exe file specifies in-memory addresses
PE file specifies preferred load location
  But DLL can actually load just about anywhere
So, PE specifies addresses in a way that is independent of where it loads
  No hardcoded addresses in PE
Instead, Relative Virtual Addresses (RVAs)
  RVA is an offset relative to where PE is loaded

48 Relative Virtual Addresses
To find actual memory location, add RVA to the actual load address
For example, suppose Exe file is loaded at 0x400000
  And RVA is 0x1000
  Then code (.text) starts at 0x401000
In Windows terminology, actual address is known as Virtual Address (VA)
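
The RVA arithmetic is a single addition, sketched here in Python with the numbers from the slide. The helper names are made up for the example.

```python
def rva_to_va(load_address, rva):
    """Virtual Address = actual load address of the module + RVA."""
    return load_address + rva

def va_to_rva(load_address, va):
    """Inverse mapping: recover the RVA from a VA inside the module."""
    if va < load_address:
        raise ValueError("VA lies below the module's load address")
    return va - load_address
```

With a load address of 0x400000 and an RVA of 0x1000, `rva_to_va` gives 0x401000, matching the .text example.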

49 Data Directory
There are many data structures within exe
  For efficiency, must be loaded quickly
  E.g., imports, exports, resources, base relocations, etc.
DataDirectory
  Array of 16 data structures
  #define IMAGE_DIRECTORY_ENTRY_xxx defines array indexes (0 to 15)

50 Importing Functions
To use code or data from another DLL, must import it
When PE file loads, Windows loader locates imported functions/data
  Usually automatic, when program first starts
Imported DLLs may import others
  For example, any program created with Visual C++ imports KERNEL32.DLL…
  …and KERNEL32.DLL imports from NTDLL.DLL

51 Importing Functions
Each PE has Import Address Table (IAT)
  IAT contains arrays of function pointers
  One array per imported DLL
Each imported API has spot in IAT
  The only place where API address stored
  So, all calls to API go thru one function ptr
  E.g., CALL DWORD PTR [0x…]
But, by default it’s a little more complex…

52 PE File Structure
Next slides describe PE file structure
Note that all of these data structures defined in WINNT.H
Usually, 32-bit and 64-bit versions
  For example, IMAGE_NT_HEADERS32 and IMAGE_NT_HEADERS64
  Identical except for widened fields for 64-bit

53 MS-DOS Header
Every PE begins with small MS-DOS exe
  Prints message saying Windows required
MS-DOS Header
  IMAGE_DOS_HEADER
2 “important” values
  e_lfanew --- file offset of PE header
  e_magic --- 0x5A4D, “MZ” in ASCII… Why MZ?
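
A minimal Python sketch of reading the two “important” values. It relies on the IMAGE_DOS_HEADER layout in WINNT.H, where the header is 64 bytes long, e_magic is the first 2-byte field and e_lfanew is the 4-byte field at offset 0x3C. The helper `make_fake_dos_header` only builds a synthetic buffer for demonstration and is not part of any real tool.

```python
import struct

IMAGE_DOS_SIGNATURE = 0x5A4D   # "MZ" read as a little-endian 16-bit value
DOS_HEADER_SIZE = 0x40
E_LFANEW_OFFSET = 0x3C

def parse_dos_header(data):
    """Check e_magic and return e_lfanew (file offset of the PE header)."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("too short for an MS-DOS header")
    (e_magic,) = struct.unpack_from("<H", data, 0)
    if e_magic != IMAGE_DOS_SIGNATURE:
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    return e_lfanew

def make_fake_dos_header(e_lfanew=0x80):
    """Build a synthetic 64-byte header (illustration only)."""
    header = bytearray(DOS_HEADER_SIZE)
    header[0:2] = b"MZ"
    struct.pack_into("<I", header, E_LFANEW_OFFSET, e_lfanew)
    return bytes(header)
```

On a real file one would pass the first bytes of the exe instead of the synthetic buffer.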

54 IMAGE_NT_HEADERS Header
Primary location for PE specifics
Location in file given by e_lfanew
One version for 32-bit exes and another for 64-bit exes
  Only minor differences between them
  Magic field in the optional header specifies 32-bit (0x10B) or 64-bit (0x20B)

55 IMAGE_NT_HEADERS Header
Has 3 fields
  typedef struct _IMAGE_NT_HEADERS {
      DWORD Signature;
      IMAGE_FILE_HEADER FileHeader;
      IMAGE_OPTIONAL_HEADER32 OptionalHeader;
  } IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
In valid PE, Signature is 0x00004550
  In ASCII, this is “PE\0\0”

56 IMAGE_NT_HEADERS Header
  typedef struct _IMAGE_NT_HEADERS {
      DWORD Signature;
      IMAGE_FILE_HEADER FileHeader;
      IMAGE_OPTIONAL_HEADER32 OptionalHeader;
  } IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
IMAGE_FILE_HEADER predates PE
  Struct containing basic info about file
  Most important info is size of “optional data” that follows (not really optional)
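
To make the first two fields concrete, the Python sketch below checks the Signature and unpacks the 20-byte IMAGE_FILE_HEADER that follows it. The field names follow WINNT.H; the function name and the returned dictionary are choices made for this example. The IMAGE_FILE_DLL flag (0x2000) in Characteristics is the one bit that separates a DLL from an exe.

```python
import struct

IMAGE_NT_SIGNATURE = 0x00004550     # the bytes "PE\0\0", little-endian
IMAGE_FILE_DLL = 0x2000             # Characteristics bit set for DLLs
FILE_HEADER = struct.Struct("<HHIIIHH")   # 20 bytes
FILE_HEADER_FIELDS = (
    "Machine", "NumberOfSections", "TimeDateStamp", "PointerToSymbolTable",
    "NumberOfSymbols", "SizeOfOptionalHeader", "Characteristics",
)

def parse_file_header(data, e_lfanew):
    """Validate Signature at e_lfanew and return IMAGE_FILE_HEADER as a dict."""
    (signature,) = struct.unpack_from("<I", data, e_lfanew)
    if signature != IMAGE_NT_SIGNATURE:
        raise ValueError("missing PE signature")
    values = FILE_HEADER.unpack_from(data, e_lfanew + 4)
    return dict(zip(FILE_HEADER_FIELDS, values))

def is_dll(file_header):
    """True when the IMAGE_FILE_DLL bit is set."""
    return bool(file_header["Characteristics"] & IMAGE_FILE_DLL)
```

SizeOfOptionalHeader is the value a parser needs next, because the section table begins right after the optional header.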

57 IMAGE_NT_HEADERS Header
  typedef struct _IMAGE_NT_HEADERS {
      DWORD Signature;
      IMAGE_FILE_HEADER FileHeader;
      IMAGE_OPTIONAL_HEADER32 OptionalHeader;
  } IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
IMAGE_OPTIONAL_HEADER
  DataDirectory array (at end) is “address book” of important locations in exe
  Each entry contains RVA and size of data
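
A sketch of reading the “address book”: sixteen entries of two 32-bit values each (RVA, then size). The short names below abbreviate the IMAGE_DIRECTORY_ENTRY_xxx constants from WINNT.H; "RESERVED" for index 15 is a label chosen here, and the caller is assumed to already know the file offset of the array (it is the last 128 bytes of the optional header).

```python
import struct

DIRECTORY_NAMES = [
    "EXPORT", "IMPORT", "RESOURCE", "EXCEPTION", "SECURITY", "BASERELOC",
    "DEBUG", "ARCHITECTURE", "GLOBALPTR", "TLS", "LOAD_CONFIG",
    "BOUND_IMPORT", "IAT", "DELAY_IMPORT", "COM_DESCRIPTOR", "RESERVED",
]
DIRECTORY_ENTRY = struct.Struct("<II")   # VirtualAddress (an RVA), Size

def parse_data_directory(data, offset):
    """Return {name: (rva, size)} for the 16 DataDirectory entries."""
    entries = {}
    for index, name in enumerate(DIRECTORY_NAMES):
        rva, size = DIRECTORY_ENTRY.unpack_from(
            data, offset + index * DIRECTORY_ENTRY.size)
        entries[name] = (rva, size)
    return entries
```

An entry whose RVA and size are both zero means the exe has no data of that kind.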

58 PE Sections
Recall, section is “chunk of code or data that logically belongs together”
For example
  All data for exe’s import tables are in one section

59 Section Table
Section table contains array of IMAGE_SECTION_HEADER structs
An IMAGE_SECTION_HEADER has info about associated section
  Location, length, and characteristics
Number of such headers given by field:
  IMAGE_NT_HEADERS.FileHeader.NumberOfSections
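
The section table can be walked with a fixed-size record. The Python sketch below unpacks 40-byte IMAGE_SECTION_HEADER entries using the field order from WINNT.H; the function name, the subset of fields kept in the result, and the synthetic sample values in the usage note are assumptions for illustration.

```python
import struct

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")   # 40 bytes per section

def parse_section_table(data, offset, number_of_sections):
    """Return name, location, length and characteristics for each section."""
    sections = []
    for index in range(number_of_sections):
        (name, virtual_size, virtual_address, raw_size, raw_pointer,
         _relocs, _lines, _nrelocs, _nlines, characteristics) = \
            SECTION_HEADER.unpack_from(data, offset + index * SECTION_HEADER.size)
        sections.append({
            "Name": name.rstrip(b"\0").decode("ascii", "replace"),
            "VirtualAddress": virtual_address,     # RVA of the section
            "VirtualSize": virtual_size,           # size once mapped
            "PointerToRawData": raw_pointer,       # file offset
            "SizeOfRawData": raw_size,             # size on disk
            "Characteristics": characteristics,
        })
    return sections
```

Given a table packed with `SECTION_HEADER.pack(b".text", 0x1234, 0x1000, 0x1400, 0x400, 0, 0, 0, 0, 0x60000020)`, the first parsed entry is named `.text` and starts at RVA 0x1000.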

60 Alignment of Sections
Visual Studio 6.0
  4KB sections by default
Visual Studio .NET
  4KB by default, except for small files, which use 0x200-byte alignment
Also, .NET spec requires 8KB in-memory alignment (for IA-64 compatibility)

61 PE Sections
So far, overview of PE file format
Now, look inside important sections…
  …and some data structures within sections
Then we finish with look at PEDUMP
  Recall there are other similar utilities

62 Section Names
.text --- The default code section.
.data --- The default read/write data section. Global variables typically go here.
.rdata --- The default read-only data section. String literals and C++/COM vtables are examples of items put into .rdata.

63 Section Names
.idata --- The imports table. It has become common practice (explicitly, or via linker default behavior) to merge .idata into another section, typically .rdata. By default, the linker only merges the .idata section into another section when creating a release mode exe.
.edata --- The exports table. When creating an executable that exports APIs or data, the linker creates an .EXP file which contains an .edata section that's added into the final executable. Like the .idata section, the .edata section is often found merged into the .text or .rdata sections.

64 Section Names
.rsrc --- The resources. This section is read-only. However, it should not be renamed and should not be merged into other sections.
.bss --- Uninitialized data. Rarely found in exes created with recent linkers. Instead, the VirtualSize of the exe's .data section is expanded to make room for uninitialized data.
.crt --- Data added for supporting the C++ runtime (CRT). A good example is the function pointers that are used to call the constructors and destructors of static C++ objects.

65 Section Names
.tls --- Data for supporting thread local storage variables declared with __declspec(thread). This includes the initial value of the data, as well as additional variables needed by the runtime.
.reloc --- Base relocations in an exe. Base relocations are generally only needed for DLLs and not EXEs. In release mode, the linker doesn't emit base relocations for EXE files. Relocations can be removed when linking with the /FIXED switch.
.sdata --- "Short" read/write data that can be addressed relative to the global pointer. Used for IA-64 and other architectures that use a global pointer register. Regular-sized global variables on the IA-64 will go in this section.

66 Section Names
.srdata --- "Short" read-only data that can be addressed relative to the global pointer. Used on the IA-64 and other architectures that use a global pointer register.
.pdata --- The exception table. Contains an array of IMAGE_RUNTIME_FUNCTION_ENTRY structs, CPU-specific. Pointed to by IMAGE_DIRECTORY_ENTRY_EXCEPTION slot in the DataDirectory. Used for architectures with table-based exception handling, such as the IA-64. The only architecture that doesn't use table-based exception handling is the x86.
.didat --- Delayload import data. Found in exes built in nonrelease mode. In release mode, the delayload data is merged into another section.

67 Exports Section
Exe may export code or data
  Makes it available to other exes
Refer to an exported thing as a symbol
At minimum, to export symbol, must specify its address in defined way
Keyword ORDINAL tells linker to use numbers, not names, for symbols
  After all, names just a convenience for coders

68 IMAGE_EXPORT_DIRECTORY
Points to 3 arrays
  And a table of ASCII strings containing symbol names
Only required array is Export Address Table (EAT)
  Array of function pointers
  Addresses of exported functions
Export ordinal is an index into this array

69 IMAGE_EXPORT_DIRECTORY
Structure example

70 Example
exports table:
  Name:            KERNEL32.dll
  Characteristics: 00000000
  TimeDateStamp:   3B7DDFD8 -> Fri Aug 17 23:24:…
  Version:         …
  Ordinal base:    …
  # of functions:  000003A0
  # of Names:      000003A0

  Entry Pt  Ordn  Name
  00012ADA  …     ActivateActCtx
  000082C2  2     AddAtomA
  •••remainder of exports omitted

71 Example
Suppose we call GetProcAddress on AddAtomA API
System locates KERNEL32’s IMAGE_EXPORT_DIRECTORY
Gets start address of Export Names Table (ENT)
  It finds there are 0x3A0 entries in ENT
  Does binary search for AddAtomA
Suppose AddAtomA is 2nd entry…
  …loader reads 2nd value from export ordinal table

72 Example (Continued)
Call GetProcAddress on AddAtomA API…
AddAtomA has export ordinal 2
  Use this as index into EAT (taking into account base field value)
  Finds AddAtomA has RVA of 0x82C2
Add 0x82C2 to load address of KERNEL32 to get actual address of AddAtomA
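
The lookup walk-through can be sketched with three small Python lists standing in for the export name table, the export ordinal table and the EAT. Only the two entries shown on the slides are modelled. The ordinal base of 1 and the load address are assumptions for illustration (the transcript does not preserve them), and the ordinal table is modelled as holding zero-based EAT indexes, with the base added to obtain the export ordinal.

```python
import bisect

def get_proc_rva(name, names, name_ordinals, eat, ordinal_base):
    """Look up an export by name; return (RVA, export ordinal)."""
    position = bisect.bisect_left(names, name)    # binary search of sorted names
    if position == len(names) or names[position] != name:
        raise KeyError(name)
    eat_index = name_ordinals[position]           # same position in ordinal table
    return eat[eat_index], eat_index + ordinal_base

NAMES = ["ActivateActCtx", "AddAtomA"]    # export name table (sorted)
NAME_ORDINALS = [0, 1]                    # zero-based indexes into the EAT
EAT = [0x12ADA, 0x82C2]                   # RVAs from the example exports table
ORDINAL_BASE = 1                          # assumption, not from the slides
LOAD_ADDRESS = 0x10000000                 # hypothetical load address

rva, ordinal = get_proc_rva("AddAtomA", NAMES, NAME_ORDINALS, EAT, ORDINAL_BASE)
actual_address = LOAD_ADDRESS + rva       # RVA + load address = VA
```

The binary search works only because the name table is kept sorted, which is why a lookup by name costs a logarithmic number of string comparisons rather than a scan of all 0x3A0 names.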

73 Export Forwarding
Can forward export to another DLL
  That is, must find it at “forward” address
Example
  KERNEL32 HeapAlloc function forwarded to RtlAllocHeap function exported by NTDLL
  In EXPORTS section of KERNEL32, find
    EXPORTS
    HeapAlloc = NTDLL.RtlAllocHeap

74 Imports Section
Importing is opposite of exporting
IMAGE_IMPORT_DESCRIPTOR
  Points to 2 essentially identical arrays
  Import Address Table & Import Name Table
IAT and INT
  Contain ordinal, address, forwarding info
  After binding, IAT rewritten, INT retains original (pre-binding) info
Binding discussed next…

75 Imports Section
Example
  Importing APIs from USER32.DLL

76 Binding
Binding means IAT overwritten with actual addresses
  VAs overwrite RVAs
Why do this?
  Increased efficiency
Loader checks whether binding valid

77 Delayload Data
Hybrid between implicit & explicit importing
Not an OS issue
  A linker issue, at runtime
There is IAT and INT for the DLL
  Identical to regular IAT and INT
  But read by runtime library code instead of OS
Benefit?
  Calls then go directly to API…

78 Resources Section
For resources such as…
  icons, bitmaps, dialogs, etc.
Most complicated section to navigate
Organized like a file system…

79 Base Relocations
Executable has many memory addresses
As mentioned, PE file specifies preferred memory address to load the module
  ImageBase field in IMAGE_OPTIONAL_HEADER
If DLL loaded elsewhere, all addresses will be incorrect
  Base relocations tell loader all locations that need to be modified
Note that this is extra work for the loader
What about EXE, which is not a DLL?

80 Base Relocation Example
Consider the following line of code
  8B 0D 34 D4 40 00    mov ecx,dword ptr [0x0040D434]
Note that “8B 0D” specifies opcode
Also note the address 0x0040D434
Suppose preferred load is at 0x00400000
  If it loads at that address, it runs as-is
Suppose instead it loads at 0x00500000
Then code above needs to change to
  8B 0D 34 D4 50 00    mov ecx,dword ptr [0x0050D434]

81 Base Relocation Example
If not loaded at preferred address, then loader computes delta
For example on previous slide…
  delta = 0x00500000 - 0x00400000
  So, delta is 0x00100000
Also, there would be base relocation specifying location 0x…
Loader modifies address located here by delta
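
A small Python sketch of what the loader does with one relocation, applied to the six instruction bytes from the example. It assumes a preferred base of 0x00400000 and an actual base of 0x00500000, which is consistent with the address changing from 0x0040D434 to 0x0050D434. The function name is made up, and the fixup offset of 2 simply skips the two opcode bytes “8B 0D” within this buffer.

```python
import struct

MASK32 = 0xFFFFFFFF

def apply_base_relocation(image, fixup_offset, preferred_base, actual_base):
    """Add the load delta to the 32-bit address stored at fixup_offset."""
    delta = (actual_base - preferred_base) & MASK32
    (address,) = struct.unpack_from("<I", image, fixup_offset)
    struct.pack_into("<I", image, fixup_offset, (address + delta) & MASK32)
    return delta

# mov ecx, dword ptr [0x0040D434]
code = bytearray.fromhex("8B 0D 34 D4 40 00")
delta = apply_base_relocation(code, 2, 0x00400000, 0x00500000)
```

After the call, `code` holds `8B 0D 34 D4 50 00`, the encoding of the same instruction with the address 0x0050D434.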

82 Debug Directory
Contains debug info
  Not required to run the program
  But useful for development
Can be multiple forms of debug info
  Most common is PDB file

83 .NET Header
.NET executables are PE files
However, code/data is minimal
Purpose of PE is simply to get .NET-specific info into memory
  Metadata, intermediate language (IL)
MSCOREE.DLL at start of a .NET process
  This dll “takes charge” and uses metadata and IL from executable
  So PE has stub to get MSCOREE.DLL going

84 TLS Initialization
Thread Local Storage (TLS)
.tls section for thread local variables
  New threads initialized using .tls data
Presence of TLS data indicated by nonzero IMAGE_DIRECTORY_ENTRY_TLS in DataDirectory
  Points to IMAGE_TLS_DIRECTORY struct
  Contains virtual addresses, VAs (not RVAs)
  The actual struct is in .rdata, not in .tls

85 Program Exception Data
x86 architecture uses frame-based exception handling
  A fairly complex way to handle exceptions
IA-64 and others use table-based approach
  Table containing info about every function that might be affected by exception unwinding
  Table entry includes start and end addresses, how and where exception to be handled
  When exception occurs, search thru table…

86 PEDUMP
Tools for analyzing PE files
  Dumpbin (Visual Studio)
  Depends
  PE Browse Professional
    In spite of its name, it’s free
  PEDUMP (by author of article)

