Presentation is loading. Please wait.

Presentation is loading. Please wait.

Mobility, Security, and Proof-Carrying Code Peter Lee Carnegie Mellon University Lecture 3 July 12, 2001 VC Generation and Proof Representation Lipari.

Similar presentations


Presentation on theme: "Mobility, Security, and Proof-Carrying Code Peter Lee Carnegie Mellon University Lecture 3 July 12, 2001 VC Generation and Proof Representation Lipari."— Presentation transcript:

1 Mobility, Security, and Proof-Carrying Code Peter Lee Carnegie Mellon University Lecture 3 July 12, 2001 VC Generation and Proof Representation Lipari School on Foundations of Wide Area Network Programming

2 Recap

3 Carnegie Mellon High-Level Architecture Explanation Code Verification condition generator Checker Safety policy Agent Host

4 Carnegie Mellon Safety Policies In PCC, the proof says that the code, when executed by a reference interpreter, will never abort. I.e., the code will always make progress in an operational semantics that defines safe execution. [Necula98]

5 Carnegie Mellon Safety Policy Specifications The safety policy is given in three parts: A verification-condition generator (VCGen). A specification of the pre & post conditions for all required procedures. A specification of the inference rules for constructing valid proofs. We can use a language called LF for the rule and pre/post specifications.

6 Carnegie Mellon Example Java Type-Safety Specification Our largest example of a safety- policy specification is for the “SpecialJ” Java native-code compiler. It contains about 140 inference rules. Roughly speaking, these rules can be separated into 5 classes.

7 Carnegie Mellon Safety Policy Rule Excerpts /\: pred -> pred -> pred. \/: pred -> pred -> pred. =>: pred -> pred -> pred. all: (exp -> pred) -> pred. pf: pred -> type. truei: pf true. andi: {P:pred} {Q:pred} pf P -> pf Q -> pf (/\ P Q). andel: {P:pred} {Q:pred} pf (/\ P Q) -> pf P. ander: {P:pred} {Q:pred} pf (/\ P Q) -> pf Q. … 1. Standard rules for first-order logic. Type of valid proofs, indexed by predicate. Syntax of predicates. Inference rules.

8 Carnegie Mellon =: exp -> exp -> pred. <>: exp -> exp -> pred. eq_le: {E:exp} {E':exp} pf (csubeq E E') -> pf (csuble E E'). moddist+: {E:exp} {E':exp} {D:exp} pf (= (mod (+ E E') D) (mod (+ (mod E D) E') D)). =sym: {E:exp} {E':exp} pf (= E E') -> pf (= E' E). <>sym: {E:exp} {E':exp} pf (<> E E') -> pf (<> E' E). =tr: {E:exp} {E':exp} {E'':exp} pf (= E E') -> pf (= E' E'') -> pf (= E E''). Safety Policy Rule Excerpts 2. Rules for reasoning about arithmetic and equality. “csuble” means  in the x86 machine.

9 Carnegie Mellon Safety Policy Rule Excerpts jint : exp. jfloat : exp. jarray : exp -> exp. jinstof : exp -> exp. of: exp -> exp -> pred. faddf: {E:exp} {E':exp} pf (of E jfloat) -> pf (of E' jfloat) -> pf (of (fadd E E') jfloat). ext: {E:exp} {C:exp} {D:exp} pf (jextends C D) -> pf (of E (jinstof C)) -> pf (of E (jinstof D)). 3. Rules for the Java type system.

10 Carnegie Mellon Safety Policy Sample Rules aidxi: {I:exp} {LEN:exp} {SIZE:exp} pf (below I LEN) -> pf (arridx (add (imul I SIZE) 8) SIZE LEN). wrArray4: {M:exp} {A:exp} {T:exp} {OFF:exp} {E:exp} pf (of A (jarray T)) -> pf (of M mem) -> pf (nonnull A) -> pf (size T 4) -> pf (arridx OFF 4 (sel4 M (add A 4))) -> pf (of E T) -> pf (safewr4 (add A OFF) E). 4. Rules describing the layout of data structures. This “sel4” means the result of reading 4 bytes from heap M at address A+4.

11 Carnegie Mellon Safety Policy Sample Rules nlt0_0: pf (csubnlt 0 0). nlt1_0: pf (csubnlt 1 0). nlt2_0: pf (csubnlt 2 0). nlt3_0: pf (csubnlt 3 0). nlt4_0: pf (csubnlt 4 0). 5. Quick hacks. Sometimes “unclean” things are put into the specification...

12 Carnegie Mellon A Note about Memory We define a type for valid heap memory states: mem : exp and operators for reading and writing heap memory: (sel M A) (upd M A E)

13 The VCGen, via Detailed Examples

14 Carnegie Mellon High-Level Architecture Explanation Code Verification condition generator Checker Safety policy Agent Host

15 Carnegie Mellon Example: Source Code public class Bcopy { public static void bcopy(int[] src, int[] dst) { int l = src.length; int i = 0; for(i=0; i<l; i++) { dst[i] = src[i]; }

16 Carnegie Mellon Example: Target Code ANN_LOCALS(_bcopy__6arrays5BcopyAIAI, 3).text.align 4.globl _bcopy__6arrays5BcopyAIAI _bcopy__6arrays5BcopyAIAI: cmpl$0, 4(%esp) jeL6 movl4(%esp), %ebx movl4(%ebx), %ecx testl%ecx, %ecx jgL22 ret L22: xorl%edx, %edx cmpl$0, 8(%esp) jeL6 movl8(%esp), %eax movl4(%eax), %esi L7: ANN_LOOP(INV = { (csubneq ebx 0), (csubneq eax 0), (csubb edx ecx), (of rm mem)}, MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM)) cmpl%esi, %edx jaeL13 movl8(%ebx, %edx, 4), %edi movl%edi, 8(%eax, %edx, 4) incl%edx cmpl%ecx, %edx jlL7 ret L13: call__Jv_ThrowBadArrayIndex ANN_UNREACHABLE nop L6: call__Jv_ThrowNullPointer ANN_UNREACHABLE nop

17 Carnegie Mellon Cut Points Each loop entry must be annotated as a cut point. VCGen requires this so that checking can be performed in a single scan of the code. As a convenience, the modified registers are also declared in the cut annotations.

18 Carnegie Mellon Example: Target Code ANN_LOCALS(_bcopy__6arrays5BcopyAIAI, 3).text.align 4.globl _bcopy__6arrays5BcopyAIAI _bcopy__6arrays5BcopyAIAI: cmpl$0, 4(%esp) jeL6 movl4(%esp), %ebx movl4(%ebx), %ecx testl%ecx, %ecx jgL22 ret L22: xorl%edx, %edx cmpl$0, 8(%esp) jeL6 movl8(%esp), %eax movl4(%eax), %esi L7: ANN_LOOP(INV = { (csubneq ebx 0), (csubneq eax 0), (csubb edx ecx), (of rm mem)}, MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM)) cmpl%esi, %edx jaeL13 movl8(%ebx, %edx, 4), %edi movl%edi, 8(%eax, %edx, 4) incl%edx cmpl%ecx, %edx jlL7 ret L13: call__Jv_ThrowBadArrayIndex ANN_UNREACHABLE nop L6: call__Jv_ThrowNullPointer ANN_UNREACHABLE nop VCGen requires annotations in order to simplify verification process.

19 Carnegie Mellon Example: Source Code public class Bcopy { public static void bcopy(int[] src, int[] dst) { int l = src.length; int i = 0; for(i=0; i<l; i++) { dst[i] = src[i]; }

20 Carnegie Mellon The VCGen Process (1) _bcopy__6arrays5BcopyAIAI: cmpl $0, src je L6 movl src, %ebx movl 4(%ebx), %ecx testl %ecx, %ecx jg L22 ret L22: xorl %edx, %edx cmpl $0, dst je L6 movl dst, %eax movl 4(%eax), %esi L7: ANN_LOOP(INV = … A0 = (type src_1 (jarray jint)) A1 = (type dst_1 (jarray jint)) A2 = (type rm_1 mem) A3 = (csubneq src_1 0) ebx := src_1 ecx := (sel4 rm_1 (add src_1 4)) A4 = (csubgt (sel4 rm_1 (add src_1 4)) 0) edx := 0 A5 = (csubneq dst_1 0) eax := dst_1 esi := (sel4 rm_1 (add dst_1 4))

21 Carnegie Mellon The VCGen Process (2) L7: ANN_LOOP(INV = { (csubneq ebx 0), (csubneq eax 0), (csubb edx ecx), (of rm mem)}, MODREG = (EDI, EDX, EFLAGS,FFLAGS,RM)) cmpl %esi, %edx jae L13 movl 8(%ebx,%edx,4), %edi movl %edi, 8(%eax,%edx,4) … A3 A5 A6 = (csubb 0 (sel4 rm_1 (add src_1 4))) edi := edi_1 edx := edx_1 rm := rm_2 A7 = (csubb edx_1 (sel4 rm_2 (add dst_1 4)) !!Verify!! (saferd4 (add src_1 (add (imul edx_1 4) 8)))

22 Carnegie Mellon The Checker (1) The checker is asked to verify that (saferd4 (add src_1 (add (imul edx_1 4) 8))) under assumptions A0 = (type src_1 (jarray jint)) A1 = (type dst_1 (jarray jint)) A2 = (type rm_1 mem) A3 = (csubneq src_1 0) A4 = (csubgt (sel4 rm_1 (add src_1 4)) 0) A5 = (csubneq dst_1 0) A6 = (csubb 0 (sel4 rm_1 (add src_1 4))) A7 = (csubb edx_1 (sel4 rm_2 (add dst_1 4)) The checker looks in the PCC for a proof of this VC.

23 Carnegie Mellon The Checker (2) In addition to the assumptions, the proof may use axioms and proof rules defined by the host, such as szint : pf (size jint 4) rdArray4: {M:exp} {A:exp} {T:exp} {OFF:exp} pf (type A (jarray T)) -> pf (type M mem) -> pf (nonnull A) -> pf (size T 4) -> pf (arridx OFF 4 (sel4 M (add A 4))) -> pf (saferd4 (add A OFF)).

24 Carnegie Mellon Checker (3) A proof for (saferd4 (add src_1 (add (imul edx_1 4) 8))) in the Java specification looks like this (excerpt): (rdArray4 A0 A2 (sub0chk A3) szint (aidxi 4 (below1 A7))) This proof can be easily validated via LF type checking.

25 Carnegie Mellon VCGen Summary VCGen is a symbolic evaluator for the object language. It essentially implements a reference interpreter, except: it uses symbolic values in order to model all possible executions, and instead of performing run-time checks, it asks a Checker to verify the safety of “dangerous” instructions.

26 Carnegie Mellon Exercises 3. The assumptions collected by the VCGen are maintained on a stack. Why? 4. When a loop invariant is encountered for the second time, what actions must the VCGen perform?

27 Carnegie Mellon Exercises 5. In principle, how big can a VC get, relative to the size of the program? 6. What kind of program might make a VC get very large?

28 Carnegie Mellon Another Example [by George Necula] void fir (int *data, int dlen, int *filter, int flen) { int i, j; for (i=0; i<=dlen-flen; i++) { int s = 0; for (j=0; j<flen; j++) s += filter[j] * data[i+j]; data[i] = s; } Skip this example

29 Carnegie Mellon Compiled Example ri = 0 sub t1 = rdl, rfl L0:CUT( ri,rj,rs,t2,t3,t4,rm ) le t2 = ri, t1 jeq t2, L3 rs = 0 rj = 0 L1:CUT(rj,rs,t2,t3,t4) lt t2 = rj, rfl jeq t2, L2 ult t2 = rj, rfl jeq t2, Labort ld t3 = [rf + 4*rj] add t2 = ri, rj ult t4 = t2, rdl jeq t4, Labort ld t2 = [rd + 4*t2] mul t2 = t3, t2 add rs = rs, t2 add rj = rj, 1 jmp L1 L2:ult t2 = ri, rdl jeq t2, Labort st [rd + 4*ri] = rs add ri = ri, 1 jmp L0 L3:ret Labort:call abort /* rd=data, rdl=dlen, rf=filter, rfl=flen */

30 Carnegie Mellon The Safety Policy The safety policy defines verification conditions of the form: true, E = E saferd(M, E), safewr(M, E, E) array(E A, E S, E L ), vector(E A, E S, E L ) Pre fir = array(rd,4,rdl), vector(rf,4,rfl) Post fir = true

31 Carnegie Mellon VCGen Example ri = 0 sub t1 = rdl, rfl L0:CUT( ri,rj,rs,t2,t3,t4,rm ) le t2 = ri, t1 jeq t2, L3 … L3:ret Assume precondition: array(cd,4,cdl) vector(cf,4,cfl) Set ri = 0 Set t1 = sub(cdl,cfl) Set rd=cd; rdl=cdl; rf=cf; rfl=cfl; rm=cm Set ri=ci; rj=cj; rs=cs; t2=c2; t3=c3; t4=c4; rm=cm’ Set t2 = le(ci, sub(cdl,cfl)) Assume not(le(ci, sub(cdl,cfl))) Check postcondition; Check rd,rdl,rf,rfl have initial values

32 Carnegie Mellon VCGen Example ri = 0 sub t1 = rdl, rfl L0:CUT( ri,rj,rs,t2,t3,t4,rm ) le t2 = ri, t1 jeq t2, L3 rs = 0 rj = 0 L1:CUT(rj,rs,t2,t3,t4) lt t2 = rj, rfl jeq t2, L2 … L2:ult t2 = ri, rdl jeq t2, Labort st [rd + 4*ri] = rs Set ri = 0 Set t1 = sub(cdl,cfl) Set ri=ci; rj=cj; rs=cs; t2=c2 t3=c3; t4=c4; rm=cm’ Set t2 = le(ci, sub(cdl,cfl)) Assume le(ci, sub(cdl,cfl)) Set rs = 0 Set rj = 0 Set rj=cj’; rs=cs’; t2=c2’; t3=c3’; t4=c4’ Set t2 = lt(cj’, cfl) Assume not(lt(cj’, cfl)) Set t2 = ult(ci, cdl) Assume ult(ci, cdl) Check safewr(cm’, add(cd,mul(4,ci)),cs’)

33 Carnegie Mellon More on the Safety Policy The safety policy is defined as an LF signature. rdarray : saferd(M,add(A,mul(S,I))) <- array(A,S,L), ult(I,L). rdvector : saferd(M,add(A,mul(S,I))) <- vector(A,S,L), ult(I,L). wrarray : safewr(M,add(A,mul(S,I)),V) <- array(A,S,L), ult(I,L).

34 Carnegie Mellon The Checker When the Checker is invoked on safewr(cm’, add(cd,mul(4,ci)), cs’) There are assumptions: assume0 : ult(ci,cdl). assume1 : not(lt(cj’,cfl)). assume2 : le(ci, sub(cdl,cfl)). assume3 : vector(cf,4,cfl). assume4 : array(cd,4,cdl).

35 Carnegie Mellon The Checker, cont’d The VC safewr(cm’, add(cd,mul(4,ci)), cs’) can be verified by using the rule wrarray : safewr(M,add(A,mul(S,I)),V) <- array(A,S,L), ult(I,L). and assumptions assume0 : ult(ci,cdl). assume4 : array(cd,4,cdl).

36 Carnegie Mellon Proof Representation A simple (but somewhat naïve) representation of the proof is simply the sequence of proof rules: wrarray, assume4, assume0 We shall see that better representations are possible. LF typechecking is sufficient for proofchecking.

37 Carnegie Mellon Optimized Code The previous example was somewhat simplified. More realistic code is optimized, usually based on inferences about integer values. Such optimizations require that arithmetic invariants be placed in the cut points.

38 Carnegie Mellon Optimized Example ri = 0 sub t1 = rdl, rfl L0:CUT(ri>0,{ri,rj,…}) le t2 = ri, t1 jeq t2, L3 rs = 0 rj = 0 L1:CUT(rj>0,{rj,rs,…}) lt t2 = rj, rfl jeq t2, L2 ld t3 = [rf + 4*rj] add t2 = ri, rj ld t2 = [rd + 4*t2] mul t2 = t3, t2 add rs = rs, t2 add rj = rj, 1 jmp L1 L2:st [rd + 4*ri] = rs add ri = ri, 1 jmp L0 L3:ret /* rd=data, rdl=dlen, rf=filter, rfl=flen */

39 Carnegie Mellon VCGen Example ri = 0 sub t1 = rdl, rfl L0:CUT(ri>0, {ri,rj,rs,t2,t3,t4,rm} le t2 = ri, t1 jeq t2, L3 rs = 0 rj = 0 … Set ri = 0 Set t1 = sub(cdl,cfl) Set ri=ci; rj=cj; rs=cs; t2=c2 t3=c3; t4=c4; rm=cm’ Set t2 = le(ci, sub(cdl,cfl)) Assume le(ci, sub(cdl,cfl)) Assume >(ci,0)

40 Practical Considerations

41 Carnegie Mellon Trusted Computing Base The trusted computing base is the software infrastructure that is responsible for ensuring that only safe execution is possible. Obviously, any bugs in the TCB can lead to unsafe execution. Thus, we want the TCB to be simple, as well as fast and small.

42 Carnegie Mellon VCGen’s Complexity Fortunately, we shall see that proofchecking can be quite simple, small, and fast. VCGen, at core, is also simple and fast. But in practice it gets to be quite complicated.

43 Carnegie Mellon VCGen’s Complexity Some complications: If dealing with machine code, then VCGen must parse machine code. Maintaining the assumptions and current context in a memory- efficient manner is not easy. Note that Sun’s kVM does verification in a single pass and only 8KB RAM!

44 Carnegie Mellon VC Explosion a == b a == c f(a,c) a := xc := x a := yc := y a=b => (x=c => safe f (y,c)  x<>c => safe f (x,y))  a<>b => (a=x => safe f (y,x)  a<>x => safe f (a,y)) Exponential growth in size of the VC is possible.

45 Carnegie Mellon VC Explosion a == b a == c f(a,c) a := xc := x a := yc := y INV: P(a,b,c,x) (a=b => P(x,b,c,x)  a<>b => P(a,b,x,x))  (  a’,c’. P(a’,b,c’,x) => a’=c’ => safe f (y,c’)  a’<>c’ => safe f (a’,y)) Growth can usually be controlled by careful placement of just the right “join-point” invariants.

46 Carnegie Mellon Stack Slots Each procedure will want to use the stack for local storage. This raises a serious problem because a lot of information is lost by VCGen (such as the value) when data is stored into memory. We avoid this problem by assuming that procedures use up to 256 words of stack as registers.

47 Show simplex/Matrix.java example

48 Carnegie Mellon Exercise 7. Just as with loop invariants, our actual join-point invariants includes a specification of the registers that might be modified since the dominating block. Why might this be a useful thing to do? Why might it be a bad thing to do?

49 Carnegie Mellon Callee-save Registers Standard calling conventions dictate that the contents of some registers be preserved. These callee-save registers are specified along with the pre/post- conditions for each procedure. The preservation of their values must be verified at every return instruction.

50 Introduction to Efficient Representation and Validation of Proofs

51 Carnegie Mellon High-Level Architecture Explanation Code Verification condition generator Checker Safety policy Agent Host

52 Carnegie Mellon Goals We would like a representation for proofs that is compact, fast to check, requires very little memory to check, and is “canonical” (in the sense of accommodating many different logics without requiring a total reimplementation of the checker).

53 Carnegie Mellon Three Approaches 1. Direct representation of a logic. 2. Use of a Logical Framework. 3. Oracle strings. We will reject (1). Today we introduce (2). Tomorrow, (3).

54 Carnegie Mellon Logical Framework For representation of proofs we use the Edinburgh Logical Framework (LF).

55 Carnegie Mellon LF Example This classic example illustrates how LF is used to represent the terms and rules of a logical system.

56 Carnegie Mellon LF Example in Elf Syntax exp : type pred : type pf : pred -> type true : pred /\ : pred -> pred -> pred => : pred -> pred -> pred all : (exp -> pred) -> pred truei : pf true andi : {P:pred} {R:pred} pf P -> pf R -> pf (/\ P R) andel : {P:pred} {R:pred} pf (/\ P R) -> pf P impi : {P:pred} {R:pred} (pf P -> pf R) -> pf (=> P R) alli : {P:exp -> pred} ({X:exp} pf (P X)) -> pf (all P) alle : {P:exp -> pred} {E:exp} pf (all P) -> pf (P E) The same example, but using Pfenning’s Elf syntax.

57 Carnegie Mellon LF as a Proof Representation LF is canonical, in that a single typechecker for LF can serve as a proofchecker for many different logics specified in LF. [See Avron, et al. ‘92] But the efficiency of the representation is poor.

58 Carnegie Mellon Size of LF Representation Proofs in LF are extremely large, due to large amounts of repetition. Consider the representation of P  P  P for some predicate P: The proof of this predicate has the following LF representation: (=> P (/\ P P )) (impi P (/\ P P ) ([X:pf P ] andi P P x x))

59 Carnegie Mellon Checking LF The nice thing is that typechecking is enough for proofchecking. [The theorem is in the LF paper.] But the proofs are extremely large. (impi P (/\ P P ) ([X:pf P ] andi P P X X)) : pf (=> P (/\ P P ))

60 Carnegie Mellon Implicit LF A dramatic improvement can be achieved by using a variant of LF, called Implicit LF, or LF i. In LF i, parts of the proof can be replaced by placeholders. (impi * * ([X:*] andi * * X X)) : pf (=> P (/\ P P ))

61 Carnegie Mellon Soundness of LF i The soundness of the LF i type system is given by a theorem that states: If, in context , a term M has type A in LF i (and  and A are placeholder-free), then there is a term M’ such that M’ has type A in LF.

62 Carnegie Mellon Typechecking LF i The typechecking algorithm for LF i is given in [Necula & Lee, LICS98]. A key aspect of the algorithm is that it avoids repeated typechecking of reconstructed terms. Hence, the placeholders save not only space, but also time.

63 Carnegie Mellon Effectiveness of LF i In experiments with PCC, LF i leads to substantial reductions in proof size and checking time. Improvements increase nonlinearly with proof size.

64 Carnegie Mellon The Need for Improvement Despite the great improvement of LF i, in our experiments we observe that LF i proofs are 10%- 200% the size of the code.

65 Carnegie Mellon How Big is a Proofs? A basic question is how much essential information is in a proof? In this proof, there are only 2 uses of rules and in each case they were the only rule that could have been used. (impi * * ([X:*] andi * * x x)) : pf (=> P (/\ P P ))

66 Carnegie Mellon Next Time We will improve on the compactness of proof representation by making use of the observation that large parts of proofs are deterministically generated from the inference rules.

67 Summary

68 Carnegie Mellon Summary Code safety in PCC is provided by three components, a safety policy, a VCGen, and a checker. The safety policy defines the verification conditions and the valid proofs. One possible representation for proofs is the LF language.

69 Carnegie Mellon Additional References For LF: Harper, Honsell, & Plotkin. A framework for defining logics. Journal of the ACM, 40(1), 143- 184, Jan. 1993. Avron, Honsell, Mason, & Pollack. Using typed lambda calculus to implement formal systems on a machine. Journal of Automated Reasoning, 9(3), 309-354, 1992.

70 Carnegie Mellon Additional References For Elf: Pfenning. Logic programming in the LF logical framework. Logical Frameworks, Huet & Plotkin (Eds.), 149-181, Cambridge Univ. Press, 1991. Pfenning. Elf: A meta-language for deductive systems (system description). 12th International Conference on Automated Deduction, LNAI 814, 811-815, 1994.


Download ppt "Mobility, Security, and Proof-Carrying Code Peter Lee Carnegie Mellon University Lecture 3 July 12, 2001 VC Generation and Proof Representation Lipari."

Similar presentations


Ads by Google