COM850 Computer Hacking and Security

Lecture 1. x86
Prof. Taeweon Suh
Computer Science & Engineering
Korea University

x86?
What is x86?
- Generic term referring to processors from Intel, AMD, and VIA
- Derived from the model numbers of the first few generations of processors (8086, 80286, 80386), which all end in 86
- x86-16: 16-bit processor
- x86-32 (aka IA-32): 32-bit processor
- x86-64: 64-bit processor
- Intel holds about 80% of the PC market and AMD about 20%
- Apple has also been introducing Intel-based Macs since Nov. 2006
* IA: Intel Architecture
* aka: also known as

x86 History (as of 2008)

x86 History (Cont.)
[Figure: processor timeline starting with the 8086 in 1978, with groups labeled 4-bit, 8-bit, 32-bit (i386), 32-bit (i586), 32-bit (i686), and 64-bit (x86_64)]
- 2009: 1st Gen. Core i7 (Nehalem)
- 2011: 2nd Gen. Core i7 (Sandy Bridge)
- 2012: 3rd Gen. Core i7 (Ivy Bridge)
- 2013: 4th Gen. Core i7 (Haswell)

Moore's Law
- Transistor count doubles roughly every 18 months
[Figure: exponential growth in transistor count per chip; labeled points are 2,250, 42 million, and 1.7 billion (Montecito)]
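To get a feel for what an 18-month doubling period implies, the growth can be written as count(t) = count(0) · 2^(t / 1.5), with t in years. The sketch below is only an illustration of that arithmetic; the function name is hypothetical and the starting value is simply the 2,250 figure marked on the chart.

```python
def projected_transistors(initial_count, years, doubling_period_years=1.5):
    """Transistor count after `years`, assuming one doubling per period."""
    return initial_count * 2 ** (years / doubling_period_years)


# Three years is two doubling periods, so the count quadruples.
print(projected_transistors(2250, 3))

# Fifteen years is ten doubling periods, i.e. a factor of 1024.
print(projected_transistors(1, 15))
```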

Feature Size (Technology) Trend

Power Dissipation
- Through the early 2000s, Intel and AMD made every effort to increase clock frequency to enhance the performance of their CPUs
- But power consumption is the problem: $P \approx C V_{DD}^{2} f$
  - C: Capacitance
  - V_DD: Supply voltage
  - f: Frequency
* Prescott: 3.8 GHz, 31 pipeline stages
* Tejas: was slated to operate at 7 GHz or higher with 40 to 50 pipeline stages; cancelled in May 2004
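The relation P ≈ C · VDD² · f explains why frequency scaling hit a wall: power grows linearly with frequency but quadratically with supply voltage. The sketch below plugs in made-up numbers (the capacitance, voltages, and frequencies are assumptions chosen only for illustration, not data from any real chip) to compare the two effects.

```python
def dynamic_power(capacitance_f, vdd_volts, frequency_hz):
    """Dynamic power P = C * VDD^2 * f, in watts."""
    return capacitance_f * vdd_volts ** 2 * frequency_hz


# Illustrative values only (assumed, not measured).
baseline = dynamic_power(1e-9, 1.2, 3.0e9)
doubled_clock = dynamic_power(1e-9, 1.2, 6.0e9)   # same voltage, 2x frequency
lower_voltage = dynamic_power(1e-9, 0.9, 3.0e9)   # same frequency, 0.75x voltage

print(doubled_clock / baseline)   # about 2: power scales linearly with f
print(lower_voltage / baseline)   # about 0.56: power scales with VDD squared
```

Lowering the voltage by a quarter saves more power than halving the clock would cost in performance, which is why reducing VDD is the first lever on the next slide.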

Power Density Trend Source: Intel Corp.

Watch this! Click the chip Slide from Prof H.H. Lee in Georgia Tech

How to Reduce Power Consumption?
- Reduce the supply voltage with new technologies, i.e., by reducing transistor size
- Keep the clock frequency in a modest range
  - No longer increase the clock frequency
  - Then... what would be the problem? Performance
- So, the strategy is to integrate many simple CPUs on one chip
  - Dual Core, Quad Core, ...

Multi-core Processor Gala
Prof. Sean Lee’s Slide in Georgia Tech

Intel's Core 2 Duo
- 2 cores on one chip
- Two levels of caches (L1, L2) on chip
- 291 million transistors in 143 mm² with 65nm technology
[Figure: die photo with regions labeled L2 Cache, Core0, Core1, DL1, IL1] Source:

Intel's Core i7
- 4 cores on one chip
- Three levels of caches (L1, L2, L3) on chip
- 731 million transistors in 263 mm² with 45nm technology

Intel's Core i7 (2nd Gen.)
- 2nd Generation Core i7 (Sandy Bridge)
- L1: 32 KB, L2: 256 KB, L3: 8 MB
- 995 million transistors in 216 mm² with 32nm technology

Intel's Core i7 (3rd Gen.)
- 3rd Generation Core i7
- L1: 64 KB, L2: 256 KB, L3: 8 MB
- 1.4 billion transistors in 160 mm² with 22nm technology

AMD's Opteron: Barcelona (2007)
- 4 cores on one chip
- 1.9 GHz clock
- 65nm technology
- Three levels of caches (L1, L2, L3) on chip
- Integrated North Bridge

Intel Teraflops Research Chip
- 80 CPU cores
- Delivers more than 1 trillion floating-point operations per second (1 teraflops) of performance
- Introduced in September 2006

Intel's 48 Core Processor
- 48 x86 cores manufactured with 45nm technology
- Nicknamed "single-chip cloud computer"
- Debuted in December 2009

Model of Memory Hierarchy
[Figure: Reg File -> L1 Inst cache and L1 Data cache -> L2 Cache -> Main Memory -> DISK. The caches are SRAM; main memory is DRAM.]
Slide from Prof. Sean Lee at Georgia Tech

x86 Operation Modes
- Real Mode (= real address mode)
  - Programming environment of the 8086 processor
  - The 8086 is a 16-bit processor from Intel
- Protected Mode
  - Native state of the 32-bit Intel processor
  - 32-bit mode
- IA-32e mode (Intel) or Long mode (AMD)
  - 2 sub-modes: Compatibility mode and 64-bit mode
  - Compatibility mode is enabled by the operating system on a per-code-segment basis. This means that a single 64-bit OS can support both 64-bit applications running in 64-bit mode and legacy 32-bit applications running in compatibility mode.

Registers in x86
- Registers in 8086
  - 4 segment registers (16-bit): CS, DS, SS, ES
  - 8 general-purpose registers (16-bit): AX, BX, CX, DX, SP, BP, SI, DI
- Registers in x86-32 (Protected Mode)
  - 6 segment registers (16-bit): CS, DS, SS, ES, FS, GS
  - 8 general-purpose registers (32-bit): EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI

Registers in x86
- Registers in IA-32e (Long mode)
  - 6 segment registers (16-bit): CS, DS, SS, ES, FS, GS
  - 16 general-purpose registers (64-bit): RAX, RBX, RCX, RDX, RSP, RBP, RSI, RDI, R8 ~ R15

EFLAGS in x86

Software Compatibility
- Compatibility mode allows system software to implement binary compatibility with existing 16-bit and 32-bit x86 applications.
- Real mode is not supported when the processor is operating in long mode, because long mode requires that paged protected mode be enabled.
- Virtual 8086 mode is not supported when the processor is operating in long mode.
- My thoughts: applications in protected mode can manipulate either 32-bit or 16-bit data. According to Table 1-1, 16-bit code can be written without entering virtual 8086 mode or real mode.
Reference: AMD64 Architecture Programmer's Manual, Vol. 2: System Programming

Segmentation and Paging in Protected Mode

TLB in Processor
- Translation Lookaside Buffer (TLB)
- The TLB is there for virtual memory
[Figure: Intel Pentium Processor (1993). Inside the processor, the CPU core issues a virtual address to the TLB, which supplies the physical address sent to main memory; data returns from main memory to the core.]

Real Mode Addressing
- In real mode (8086), the general-purpose registers are all 16 bits wide
- Segment registers specify the base address of each segment
  - CS: Code Segment, for instructions
  - DS: Data Segment, for data
  - SS: Stack Segment, for the stack
  - ES: Extra Segment, which can be used to store more data
- Addressing method: (Segment << 4) + offset = physical address
- Example, with main memory (1 MB) spanning 0x0 to 0xFFFFF:
    mov ax, 2000h
    mov ds, ax
    mov al, [100h]
  - DS = 2000h, so the segment base is 2000h << 4 = 20000h
  - With offset 100h, the byte is loaded from physical address 20100h
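The segment:offset arithmetic from this slide can be checked with a few lines of Python. This is a minimal sketch, not emulator code; the function name is hypothetical, and the final mask models the 8086's 20-bit address bus, on which addresses wrap around at 1 MB.

```python
def real_mode_physical_address(segment, offset):
    """Return (segment << 4) + offset, wrapped to 20 bits as on the 8086."""
    return ((segment << 4) + offset) & 0xFFFFF


# The slide's example: DS = 2000h, offset = 100h.
print(hex(real_mode_physical_address(0x2000, 0x100)))  # 0x20100
```

Note that different segment:offset pairs can name the same byte; for instance 2010h:0000h also maps to 20100h.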

Protected Mode Addressing
[Figure: a segment register in the CPU and the descriptor tables in main memory]
- The part of a segment register that is visible to software holds the segment selector: Index, TI, RPL
  - TI: Table Indicator (TI = 0 selects the GDT, TI = 1 selects the LDT)
  - RPL: Requested Privilege Level
  - Index: selects one segment descriptor within the chosen table
- The part that is invisible to software holds the contents of the selected segment descriptor: Base, Limit, Access info
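The three selector fields can be pulled apart with shifts and masks. The sketch below assumes the standard 16-bit selector layout (RPL in bits 1:0, TI in bit 2, Index in bits 15:3); the function name and the sample selector values are illustrative choices, not taken from the slides.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into its Index, TI, and RPL fields."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,                            # bits 15..3
        "table": "LDT" if (selector >> 2) & 1 else "GDT",  # bit 2 (TI)
        "rpl": selector & 0b11,                            # bits 1..0
    }


print(decode_selector(0x23))  # {'index': 4, 'table': 'GDT', 'rpl': 3}
print(decode_selector(0x0F))  # {'index': 1, 'table': 'LDT', 'rpl': 3}
```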

Segment Descriptor Format
- Software (OS) creates the descriptor tables (GDT, LDT)
[Figure: segment descriptor layout when S == 1]

Address Translation in Protected Mode

Segmentation in Linux (Protected Mode)
- All Linux processes running in User mode use the same pair of segments to address instructions and data; likewise, all processes running in Kernel mode use the same pair
- CS, DS bases: 0x0
- Limit: 0xfffff (in 4 KB units, i.e., 4 GB)
- Thus, the logical address (offset) is the same as the linear address
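Two small calculations sit behind this slide: why a limit of 0xfffff means 4 GB, and why a base of 0x0 makes the offset equal to the linear address. The sketch below assumes the limit is counted in 4 KB units (the descriptor's granularity flag, which the slide does not spell out); the function names and the sample offset are hypothetical.

```python
GRANULE_BYTES = 4096  # assumed: limit counted in 4 KB units


def segment_size_bytes(limit, granularity_4k=True):
    """Size covered by a segment limit, in bytes."""
    unit = GRANULE_BYTES if granularity_4k else 1
    return (limit + 1) * unit


def logical_to_linear(base, offset):
    """Linear address = segment base + offset (32-bit wraparound)."""
    return (base + offset) & 0xFFFFFFFF


print(segment_size_bytes(0xFFFFF) == 4 * 1024 ** 3)   # True: 4 GB
print(hex(logical_to_linear(0x0, 0x08048000)))        # 0x8048000
```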

Paging

Page Translation in Protected Mode (4K Page, Non-PAE)

Page Translation in Protected Mode (4KB, PAE)
- 32-bit linear address
- 52-bit physical address
* PAE: Physical Address Extension

Address Translation in 64-bit Mode
- System-segment and gate descriptors in the GDT and LDT are expanded to 16 bytes in long mode (code and data segment descriptors remain 8 bytes)
- Segmentation is disabled in 64-bit mode
  - Thus, switching a logical processor into 64-bit mode causes it to enforce the flat memory model by largely disabling the segmented memory logic
  - However, any time the 64-bit OS kernel causes the logical processor to jump to a 16- or 32-bit legacy code segment, the segmentation logic is immediately re-enabled in order to maintain backward compatibility
Reference: x86 Instruction Set Architecture, Tom Shanley, MindShare, 2009

Code Segment Descriptor
- Segmentation is disabled in 64-bit mode
- Compatibility mode is enabled by the operating system on a per-code-segment basis
- L (Long) bit
  - 1: 64-bit mode
  - 0: Compatibility mode
Reference: AMD64 Architecture Programmer's Manual, Vol. 2: System Programming

Page Translation in 64-bit Mode
- 48-bit linear address
- 52-bit physical address
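With 4 KB pages, a 48-bit linear address is conventionally split into four 9-bit table indexes plus a 12-bit page offset (9 + 9 + 9 + 9 + 12 = 48). The sketch below assumes that 4-level, 4 KB layout; the field names follow the usual PML4 / PDPT / PD / PT terminology, and the function name and sample address are illustrative.

```python
def split_linear_address(addr):
    """Split a 48-bit linear address for 4-level paging with 4 KB pages."""
    return {
        "pml4": (addr >> 39) & 0x1FF,   # bits 47..39
        "pdpt": (addr >> 30) & 0x1FF,   # bits 38..30
        "pd": (addr >> 21) & 0x1FF,     # bits 29..21
        "pt": (addr >> 12) & 0x1FF,     # bits 20..12
        "offset": addr & 0xFFF,         # bits 11..0
    }


fields = split_linear_address(0x00007FFFFFFFE123)
print(fields)
```

Each 9-bit index selects one of 512 entries in its table, and the final entry supplies the physical page frame to which the 12-bit offset is appended.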

Linear Space Segmentation
A compiled program's memory is divided into 5 segments:
- Text segment (code segment), where the program (assembled machine instructions) is located
- Data and bss segments
  - The data segment is filled with the initialized global and static variables
  - The bss (Block Started by Symbol) segment is filled with the uninitialized global and static variables
- Heap segment, for dynamic allocation and deallocation of memory using malloc() and free()
- Stack segment, a scratchpad to store local variables and context during function calls

Stack Frame
- EBP (aka Frame Pointer (FP) or Local Base (LB) pointer) is used for referencing function parameters and local variables in the current stack frame
- Each stack frame contains
  - Parameters to the function
  - Local variables
  - 2 pointers: Saved Frame Pointer (SFP) and return address
    - SFP for restoring EBP to its previous value
    - Return address for restoring EIP to its previous value

Stack Layout with x86 Source: Reversing, Secrets of Reverse Engineering, Eldad Eilam, 2005

Stack Frame Example

Stack Frame Example
[Figure: stack memory drawn from high address (top) to low address (bottom), marking where ESP and EBP point after each step: push ebp (saved ebp); mov ebp, esp; sub esp, 0x20 (the values 1 to 6 are stored in this area); call 0x (pushes eip)]
- Compilation outcome could be different depending on compiler version and optimization flags

Stack Frame Example: LEAVE instruction
[Figure: stack memory from high address to low address with two frames. Caller frame: push ebp; mov ebp, esp; sub esp, 0x20; the values 1 to 6 at offsets 0x04 to 0x14; call 0x pushes eip. Callee frame: push ebp; mov ebp, esp; sub esp, 0x10; at offsets 0x04 to 0x10 it holds Result (a+b+c+d) and the bytes "J, I, H, G", "F, E, D, C", "B, A".]
- Compilation outcome could be different depending on compiler version and optimization flags

Stack Frame Example: after RET instruction
[Figure: stack memory from high address to low address after the callee returns. ESP is back in the caller frame (push ebp; mov ebp, esp; sub esp, 0x20; the values 1 to 6). The old eip, saved ebp, Result (a+b+c+d), and the bytes "J, I, H, G", "F, E, D, C", "B, A" are still in memory below ESP.]
- Compilation outcome could be different depending on compiler version and optimization flags

Backup Slides

Segment Selector

Floating Point Formats

Debugging Tools
- GDB, the GNU Project Debugger
- DDD, the Data Display Debugger
  - GUI front end to GDB
- Eclipse Integrated Development Environment (IDE)
  - Eclipse CDT (C/C++ Development Toolkit)
  - "Install New Software": Name: Galileo, URL:
- IDA Pro, the Interactive Disassembler Professional
  - Audits binaries with no source code
  - Supports more than 50 families of processors
  - IDA 5.0 is free for non-commercial use

Just in case...
- Compile your code with gcc
    gcc -g float-d.c -o float-d    # compiled with debugging info
- Disassemble the binary with objdump
    objdump -M intel -Stx float-d > float-d.dump

GDB
$ echo "set disassembly-flavor intel" > ~/.gdbinit
- Shows disassembly in Intel format (rather than AT&T format)
  - Operation <destination>, <source>
  - mov ebp, esp ; ebp <- esp
- GDB command summary
    (gdb) help
    (gdb) help disass
    (gdb) list
    (gdb) list 1,20
    (gdb) disass main
    (gdb) disass /mr main
    (gdb) info registers (or i r) ; display x86 registers
      Examples: (gdb) i r
                (gdb) i r $eip
    (gdb) x ; examine memory
      Examples: (gdb) x/10i $eip ; display 10 instructions from eip
                (gdb) x/2x $eip ; display 2 words (4 bytes each) in hex. Unit sizes: b (byte), h (halfword), w (word, 4B), g (8B)
    (gdb) nexti ; execute 1 machine instruction. Will not enter subfunctions (steps over calls)
    (gdb) stepi ; execute 1 machine instruction. Will step into subfunctions
    (gdb) next ; step program, proceeding through subroutine calls
    (gdb) step ; step program until it reaches a different source line

x86 Instructions
- CALL: call procedure
  - In a 32-bit near call, push the EIP of the instruction following the CALL instruction
  - Then, branch to the target specified in the operand
- LEAVE: high-level procedure exit
  - Release the stack frame set up by an earlier ENTER instruction
  - In 32-bit, ESP ← EBP; EBP ← pop();
- RET: return from procedure
  - In a 32-bit near return, EIP ← pop();
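The register-transfer descriptions above can be traced step by step with a toy model. The sketch below is not an emulator: the class name, the starting ESP and EBP values, and the code addresses are all arbitrary assumptions. It models only ESP, EBP, EIP, and 4-byte stack slots for 32-bit near calls.

```python
class ToyCpu:
    """Toy model of ESP/EBP/EIP and a downward-growing 32-bit stack."""

    def __init__(self, esp=0x1000, ebp=0x2000):
        self.mem = {}          # address -> 32-bit value
        self.esp = esp
        self.ebp = ebp
        self.eip = 0

    def push(self, value):
        self.esp -= 4          # decrement the stack pointer first
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def call(self, target, return_address):
        self.push(return_address)   # EIP of the instruction after CALL
        self.eip = target

    def leave(self):
        self.esp = self.ebp         # ESP <- EBP
        self.ebp = self.pop()       # EBP <- pop()

    def ret(self):
        self.eip = self.pop()       # EIP <- pop()


cpu = ToyCpu()
cpu.call(target=0x400, return_address=0x105)
cpu.push(cpu.ebp)      # push ebp      (save caller's frame pointer)
cpu.ebp = cpu.esp      # mov ebp, esp  (start the new frame)
cpu.esp -= 0x10        # sub esp, 0x10 (reserve space for locals)
cpu.leave()            # release the frame
cpu.ret()              # return to the caller
print(hex(cpu.eip), hex(cpu.esp), hex(cpu.ebp))  # 0x105 0x1000 0x2000
```

After LEAVE and RET, all three registers are back to their pre-call values, which is exactly what the saved frame pointer and return address in each stack frame are for.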

x86 Instructions
- PUSH: push a word, double-word, or quadword onto the stack
  - Decrement the stack pointer and then store the source operand on the top of the stack
- POP: pop a value from the stack
  - Load the value from the top of the stack and increment the stack pointer
- LEA: load effective address
  - For instance, LEA ecx, dword ptr [edx+edx] performs ECX ← EDX + EDX;
  - Note that even though most disassemblers add the words DWORD PTR before the operands, LEA really can't distinguish between a pointer and an integer. LEA never performs any actual memory accesses.
  - From the book "Reversing" (p. 512): Starting with Pentium 4, the situation has reversed and most compilers will use ADD and SUB when generating code. However, when surrounded by several other ADD or SUB instructions, the Intel compiler still seems to use LEA. This is probably because the execution unit employed by LEA is separate from the ones used by ADD and SUB. Using LEA makes sense when the main ALUs are busy: it improves the chances of achieving parallelism at runtime.

x86 Instructions
- TEST
  - Compute the bit-wise logical AND of the first and second operands
  - Set the flags (SF, ZF, and PF) according to the result
  - Then, discard the result
  - Example: test eax, eax
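The effect of TEST on the three flags named above can be modeled directly. This is a sketch under the usual flag definitions (ZF set when the result is zero, SF copied from the most significant bit, PF set when the low byte has an even number of 1 bits); the function name and operand values are illustrative.

```python
def flags_after_test(a, b, bits=32):
    """Flags produced by TEST a, b: AND the operands, keep only the flags."""
    result = a & b & ((1 << bits) - 1)
    return {
        "ZF": int(result == 0),
        "SF": (result >> (bits - 1)) & 1,
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),
    }


# test eax, eax is the common idiom for checking whether eax is zero.
print(flags_after_test(0, 0))                    # {'ZF': 1, 'SF': 0, 'PF': 1}
print(flags_after_test(0x80000001, 0x80000001))  # {'ZF': 0, 'SF': 1, 'PF': 0}
```

A conditional jump such as JZ or JNZ placed after the TEST then branches on ZF.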

Buffer Overflow Protection
- gcc -fstack-protector-all
- gcc -fno-stack-protector
- objdump -SD --disassembler-options=intel stack_example
