Overview of Back-end for CComp Zhaopeng Li Software Security Lab. June 8, 2009.



Outline
– Design Points
– Assembly Language : “x86”
– Low-level Intermediate Language
– Future Work

Design Points
Assembly Language
– Target: SCAP with an x86 abstract machine;
– The program logic may change in a later version;
– Or a different machine may be targeted.
Low-level Intermediate Language
– Hides some machine-specific details;
– Note that this level may be just a helper for generating code and proofs.

Assembly Language : “x86”

Some Topics about “x86”
Data Representation
– 32-bit vs “fake” 32-bit: we do not care how the data is stored as bits.
– Integer: 4 bytes
– Pointer: 4 bytes
Data Alignment
Callee-saved Registers
– EBX, ESI, EDI, EBP

Some Topics about “x86” (cont.)
Calling convention:
1. Parameters are passed on the stack, pushed from right to left; alternatively, the first three are passed in registers EAX, ECX and EDX, and the rest are passed on the stack;
2. Registers EAX, ECX and EDX may be used freely by the callee; any other register the callee uses must be saved on the stack and popped before the function returns;
3. The return value is stored in register EAX;
4. The caller cleans up the stack (the parameters).
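The stack-based variant of the convention can be illustrated with a toy model. This is only a sketch under stated assumptions: 4-byte words, a stack that grows downward, and hypothetical names (Machine, call_cdecl, combine3) that do not come from CComp.

```python
# Toy model of the stack-based calling convention described above.
# Assumptions: 4-byte words, downward-growing stack. All names are hypothetical.

class Machine:
    def __init__(self, stack_top=0x1000):
        self.reg = {"eax": 0, "ecx": 0, "edx": 0, "esp": stack_top}
        self.mem = {}

    def push(self, value):
        self.reg["esp"] -= 4
        self.mem[self.reg["esp"]] = value

    def pop(self):
        value = self.mem[self.reg["esp"]]
        self.reg["esp"] += 4
        return value


def call_cdecl(m, callee, args, return_address=0xDEAD):
    for a in reversed(args):          # rule 1: push parameters right to left
        m.push(a)
    m.push(return_address)            # effect of "call": push the old eip
    callee(m)                         # rule 3: callee leaves its result in eax
    m.pop()                           # effect of "ret": pop the return address
    m.reg["esp"] += 4 * len(args)     # rule 4: caller removes the parameters
    return m.reg["eax"]


def combine3(m):
    # Right after the call, the first parameter sits just above the return address.
    esp = m.reg["esp"]
    a = m.mem[esp + 4]
    b = m.mem[esp + 8]
    c = m.mem[esp + 12]
    m.reg["eax"] = a + 10 * b + 100 * c


m = Machine()
result = call_cdecl(m, combine3, [1, 2, 3])
```

Because the parameters are pushed right to left, the first one ends up at the lowest address, which is why the callee finds it at esp+4.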

Some Topics about “x86” (cont.)
Prolog (typical)
_function:
    push ebp        ; store the old base pointer
    mov esp, ebp    ; make the base pointer point to the current stack location
    sub x, esp      ; x is the size, in bytes, of the local variables
Epilog (typical)
    mov ebp, esp    ; reset the stack to "clean" away the local variables
    pop ebp         ; restore the original base pointer
    ret             ; return from the function
Equivalent short forms: "enter x, 0" for the prolog; "leave" followed by "ret" for the epilog.
[Figure: stack layout at function entry, after stack frame setup, and after the return, showing the parameters, old eip, old ebp and local variables relative to ebp and esp.]
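To see what the typical prolog and epilog do to esp and ebp, the following sketch replays them on a dictionary-based register file and memory. The function name run_frame and the concrete addresses are illustrative assumptions, and the function body between prolog and epilog is left empty.

```python
def run_frame(entry_esp, entry_ebp, locals_size, return_address=0x4242):
    """Replay prolog and epilog; return the popped eip and register snapshots."""
    reg = {"esp": entry_esp, "ebp": entry_ebp}
    mem = {entry_esp: return_address}   # "call" has already pushed the old eip
    trace = []

    # Prolog
    reg["esp"] -= 4                     # push ebp
    mem[reg["esp"]] = reg["ebp"]
    reg["ebp"] = reg["esp"]             # mov esp, ebp
    reg["esp"] -= locals_size           # sub x, esp
    trace.append(("after prolog", dict(reg)))

    # Epilog
    reg["esp"] = reg["ebp"]             # mov ebp, esp
    reg["ebp"] = mem[reg["esp"]]        # pop ebp
    reg["esp"] += 4
    eip = mem[reg["esp"]]               # ret
    reg["esp"] += 4
    trace.append(("after ret", dict(reg)))
    return eip, trace


eip, trace = run_frame(entry_esp=0x1000, entry_ebp=0x2000, locals_size=12)
```

After the return, ebp holds its entry value again and esp is 4 bytes above its entry value, because "ret" popped the return address.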

Assembly Abstract Machine “m86”
Code Heap (C)
– Code storage;
– Unchanged during execution.
Machine State
– Memory (M)
– Register File (R)
– Instruction Pointer (eip): the current instruction is c = C(eip); alternatively, just use the instruction sequence (I)

Assembly Language : “x86” (“AT&T syntax”)
Reg.   r  ::= eax | ebx | ecx | edx | esi | edi | esp | ebp
FReg.  fr ::= sf | zf
Int.   b  ::= n (integer)
Instr. i  ::= add r1, r2 | addi n, r | sub r1, r2 | subi n, r
            | mul r1, r2 | muli n, r
            | mov r1, r2 | movi n, r | movs r1, n(r2) | movl n(r1), r2
            | push r | pop r
            | cmp r1, r2 | cmpi n, r
            | je r, b | jne r, b | jg r, b | jge r, b | jmp b
            | call b | ret | enter n, 0 | leave
            | malloc r | free r
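A minimal sketch of how the abstract machine could execute a few of these instructions is given below. It assumes that the code heap C is an immutable tuple indexed by eip, that operands follow the AT&T order (source first, destination last), and that words are 4 bytes. The names State, step and run are hypothetical, and only a handful of instructions are covered.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    R: dict                                   # register file
    M: dict = field(default_factory=dict)     # memory: address -> word
    eip: int = 0                              # index into the code heap


def step(C, s):
    """Execute the current instruction c = C(eip) and update the state."""
    op, *args = C[s.eip]
    next_eip = s.eip + 1
    if op == "movi":                          # movi n, r : r := n
        n, r = args
        s.R[r] = n
    elif op == "mov":                         # mov r1, r2 : r2 := r1
        r1, r2 = args
        s.R[r2] = s.R[r1]
    elif op == "add":                         # add r1, r2 : r2 := r2 + r1
        r1, r2 = args
        s.R[r2] += s.R[r1]
    elif op == "addi":                        # addi n, r : r := r + n
        n, r = args
        s.R[r] += n
    elif op == "push":                        # push r
        (r,) = args
        s.R["esp"] -= 4
        s.M[s.R["esp"]] = s.R[r]
    elif op == "pop":                         # pop r
        (r,) = args
        s.R[r] = s.M[s.R["esp"]]
        s.R["esp"] += 4
    elif op == "jmp":                         # jmp b
        (b,) = args
        next_eip = b
    else:
        raise ValueError(f"instruction not covered by this sketch: {op}")
    s.eip = next_eip


def run(C, s, steps):
    for _ in range(steps):
        step(C, s)
    return s


# The code heap is a tuple, so it cannot change during execution.
C = (
    ("movi", 5, "eax"),
    ("push", "eax"),
    ("addi", 3, "eax"),
    ("pop", "ecx"),
    ("add", "ecx", "eax"),
)
final = run(C, State(R={"eax": 0, "ecx": 0, "esp": 0x100}), steps=len(C))
```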

Program Logic Based on SCAP
Specification (p, g)
– p : State -> Prop
– g : State -> State -> Prop
Inference Rules
– Well-formed program
  – Well-formed basic block
    – Well-formed instruction
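The shape of a specification (p, g) can be illustrated by modeling Prop as a Python bool. This only allows testing a specification on concrete states; it is not a proof and does not implement the SCAP inference rules. The example function (doubling eax) and all names are assumptions made for illustration.

```python
import copy


def p(s):
    """Precondition, State -> bool: a return address is on top of the stack."""
    return s["R"]["esp"] in s["M"]


def g(s, s_ret):
    """Guarantee, State -> State -> bool: relates the current state to the
    state at the return of the function."""
    R, R_ret = s["R"], s_ret["R"]
    return (R_ret["eax"] == 2 * R["eax"]
            and R_ret["ebp"] == R["ebp"]
            and R_ret["esp"] == R["esp"] + 4)


def double_eax(s):
    """Hypothetical function body: add eax, eax; ret."""
    s["R"]["eax"] += s["R"]["eax"]
    s["R"]["esp"] += 4                  # ret pops the return address
    return s


def satisfies(p, g, body, s):
    """Test (p, g) against one concrete start state."""
    if not p(s):
        return True                     # the specification says nothing here
    s_ret = body(copy.deepcopy(s))      # keep the start state untouched
    return g(s, s_ret)


start = {"R": {"eax": 21, "ebp": 0x2000, "esp": 0x1000}, "M": {0x1000: 0x4242}}
ok = satisfies(p, g, double_eax, start)
```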

Main Objectives
Code Generation
– Minimize the proof size, e.g. temporary results should be kept in registers, not on the stack
Assertion
– Building (p, g) for each basic block
– Generating (p, g) for each program point
Proof
– Generating proofs for functions/basic blocks
– (reusing the proofs of the source-level VCs)

Assertion Relationship
Intermediate Language:
    f  : {p} // {q}
        Basic block1
    L1 : {p1}
        Basic block2
x86 Assembly Language:
    f  : {(p’, g)}
        Basic block1
    L1 : {(p’1, g1)}
        Basic block2
where
    p’  = trans(p) /\ param p /\ stack-reg p
    g   = trans(q) /\ callee-saved-reg g /\ stack g
    p’1 = trans(p1) /\ param p1 /\ stack-reg p1
    g1  = ?

Figure Out G
Code and specification:
    f  : {R’(ebp) = R(ebp) /\ R’(esp) = R(esp) + 4}
        push ebp
        mov esp, ebp
        sub $12, esp
    L1 : {g1}
        Basic block2
        leave
        ret
R is the register file at the entry of f, R’ the one after the return, and R0 the one after "push ebp".
Step for "push ebp":
    state relation:  R0(ebp) = R(ebp) /\ R0(esp) = R(esp) - 4
    with g of f:     R’(ebp) = R(ebp) /\ R0(ebp) = R(ebp) /\ R’(esp) = R(esp) + 4 /\ R0(esp) = R(esp) - 4
    result g0:       R’(ebp) = R0(ebp) /\ R’(esp) = R0(esp) + 8
The method:
1. Get the state relation from the rule of the operational semantics;
2. Use the g of the previous program point;
3. Do substitution and arithmetic.

Figure Out G (cont.)
Step for "mov esp, ebp" (R1, M1: register file and memory after this instruction):
    state relation:  R1(ebp) = R0(esp) /\ R1(esp) = R0(esp)
    with g0:         R’(ebp) = R0(ebp) /\ R1(ebp) = R0(esp) /\ R’(esp) = R0(esp) + 8 /\ R1(esp) = R0(esp)
    result g1:       R’(ebp) = M1(R1(ebp)) /\ R’(esp) = R1(esp) + 8
The memory term appears because "push ebp" stored the old ebp at the address that ebp now holds.

Figure Out G (cont.)
Step for "sub $12, esp" (R2, M2: register file and memory after this instruction):
    state relation:  R2(ebp) = R1(ebp) /\ R2(esp) = R1(esp) - 12
    with g1:         R’(ebp) = M1(R1(ebp)) /\ R2(ebp) = R1(ebp) /\ R’(esp) = R1(esp) + 8 /\ R2(esp) = R1(esp) - 12
    result g2:       R’(ebp) = M2(R2(ebp)) /\ R’(esp) = R2(esp) + 20
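The derived guarantees can be sanity-checked on one concrete run. The sketch below executes the code of f on dictionary-based registers and memory, takes a snapshot after each prolog instruction, and evaluates each guarantee against the final state. It assumes that Basic block2 leaves ebp, esp and the saved ebp untouched, and it states the last guarantee over the state after "sub $12, esp", i.e. R’(esp) = R2(esp) + 20. All function names are hypothetical.

```python
def execute_f(esp, ebp, return_address=0x4242):
    """Run push/mov/sub, an empty Basic block2, then leave/ret."""
    R = {"esp": esp, "ebp": ebp}
    M = {esp: return_address}
    snaps = {}

    def snap(name):
        snaps[name] = (dict(R), dict(M))

    snap("entry")                       # R
    R["esp"] -= 4                       # push ebp
    M[R["esp"]] = R["ebp"]
    snap("0")                           # R0
    R["ebp"] = R["esp"]                 # mov esp, ebp
    snap("1")                           # R1, M1
    R["esp"] -= 12                      # sub $12, esp
    snap("2")                           # R2, M2
    # Basic block2: assumed not to change ebp, esp or the saved ebp.
    R["esp"] = R["ebp"]                 # leave = mov ebp, esp; pop ebp
    R["ebp"] = M[R["esp"]]
    R["esp"] += 4
    R["esp"] += 4                       # ret
    snap("final")                       # R'
    return snaps


def g_f(R, Rp):
    return Rp["ebp"] == R["ebp"] and Rp["esp"] == R["esp"] + 4


def g0(R0, Rp):
    return Rp["ebp"] == R0["ebp"] and Rp["esp"] == R0["esp"] + 8


def g1(R1, M1, Rp):
    return Rp["ebp"] == M1[R1["ebp"]] and Rp["esp"] == R1["esp"] + 8


def g2(R2, M2, Rp):
    return Rp["ebp"] == M2[R2["ebp"]] and Rp["esp"] == R2["esp"] + 20


snaps = execute_f(esp=0x1000, ebp=0x2000)
Rp = snaps["final"][0]
checks = [
    g_f(snaps["entry"][0], Rp),
    g0(snaps["0"][0], Rp),
    g1(snaps["1"][0], snaps["1"][1], Rp),
    g2(snaps["2"][0], snaps["2"][1], Rp),
]
```

A single run like this cannot establish the guarantees in general, but it does catch arithmetic slips in the offsets.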

Low-level Intermediate Language

Potential Benefits
– Hides some machine-specific details;
– Some optimizations could be done (optional);
– Makes the implementation simple and reusable
  – (Note that this level is just a helper for generating code and proofs.)
  – Only the code for translating from this level needs to be added when targeting a different assembly logic

The Language
Loc.    l    ::= r | s
Int.    o, b ::= n (integer)
Slot.   s    ::= local(o) | incoming(o) | outgoing(o)
Reg.    r    ::= r1 | r2 | r3 | …    // infinite pseudo-registers
Instr.  i    ::= bop(bop, l1, l2, l) | uop(uop, l1, l)
               | load(r, o, l) | store(l, r, o)
               | getstack(s, r) | setstack(r, s)
               | call(id, l) | return r
               | malloc(r) | free(r)
               | goto b | label(b) | cond(l1, cmp, l2, b_true)
BinOp.  bop  ::= add | sub | mul | …
UnOp.   uop  ::= minus | …
Comp.   cmp  ::= gt | ge | eq | ne | lt | le
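A small interpreter for part of this language may help make the instruction forms concrete. The operand roles are assumptions, since the slide gives only the syntax: bop(op, l1, l2, l) is read as l := l1 op l2, getstack(s, r) as r := s, setstack(r, s) as s := r, and cond as a jump to the label when the comparison holds. Instructions are encoded as tuples and slots as pairs such as ("incoming", 0). The function run_il and the example program are hypothetical.

```python
import operator

BINOPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
CMPS = {"gt": operator.gt, "ge": operator.ge, "eq": operator.eq,
        "ne": operator.ne, "lt": operator.lt, "le": operator.le}


def run_il(program, env):
    """Interpret a list of IL instructions; env maps locations to integers."""
    labels = {ins[1]: pc for pc, ins in enumerate(program) if ins[0] == "label"}
    pc = 0
    while pc < len(program):
        op, *a = program[pc]
        pc += 1
        if op == "bop":                       # l := l1 bop l2
            bop, l1, l2, l = a
            env[l] = BINOPS[bop](env[l1], env[l2])
        elif op == "getstack":                # r := slot
            s, r = a
            env[r] = env[s]
        elif op == "setstack":                # slot := r
            r, s = a
            env[s] = env[r]
        elif op == "cond":                    # jump if the comparison holds
            l1, cmp, l2, b_true = a
            if CMPS[cmp](env[l1], env[l2]):
                pc = labels[b_true]
        elif op == "goto":
            pc = labels[a[0]]
        elif op == "label":
            pass
        elif op == "return":
            return env[a[0]]
        else:
            raise ValueError(f"instruction not covered by this sketch: {op}")
    return None


# Sum n + (n-1) + ... + 1, with n taken from the first incoming slot.
# The grammar has no load-immediate form, so the constants 1 and 0 are
# preloaded into r3 and r4 (an assumption of this example).
program = [
    ("getstack", ("incoming", 0), "r1"),
    ("label", 1),
    ("cond", "r1", "le", "r4", 2),
    ("bop", "add", "r2", "r1", "r2"),
    ("bop", "sub", "r1", "r3", "r1"),
    ("goto", 1),
    ("label", 2),
    ("return", "r2"),
]
total = run_il(program, {("incoming", 0): 4, "r2": 0, "r3": 1, "r4": 0})
```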

Code Generation (optional)
Do some optimizations which do not affect the proof, such as:
– Branch tunneling
– Dead code elimination
Future optimizations
– Other low-level optimizations may be done here
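As an illustration of branch tunneling, the sketch below rewrites jumps whose target label is immediately followed by another goto, so that they go directly to the final destination. It uses a hypothetical tuple encoding of the intermediate language (("goto", b), ("label", b), ("cond", l1, cmp, l2, b)) and is not taken from the CComp implementation. Only a goto that directly follows a label is tunneled, and chains that form a cycle are left where the cycle is detected.

```python
def tunnel_branches(program):
    """Redirect jumps to 'label b; goto b2' straight to b2 (following chains)."""
    # Map each label to the label it forwards to, when the very next
    # instruction after the label is an unconditional goto.
    forward = {}
    for ins, nxt in zip(program, program[1:]):
        if ins[0] == "label" and nxt[0] == "goto":
            forward[ins[1]] = nxt[1]

    def resolve(b):
        seen = set()
        while b in forward and b not in seen:   # stop on cycles
            seen.add(b)
            b = forward[b]
        return b

    out = []
    for ins in program:
        if ins[0] == "goto":
            out.append(("goto", resolve(ins[1])))
        elif ins[0] == "cond":
            out.append(ins[:-1] + (resolve(ins[-1]),))
        else:
            out.append(ins)
    return out


before = [
    ("cond", "r1", "eq", "r2", 1),
    ("goto", 3),
    ("label", 1),
    ("goto", 2),
    ("label", 2),
    ("goto", 3),
    ("label", 3),
    ("return", "r1"),
]
after = tunnel_branches(before)
```

After tunneling, the blocks at labels 1 and 2 are no longer jump targets, which is what makes a later dead code elimination pass able to remove them.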