A Trustworthy Proof Checker


1 A Trustworthy Proof Checker
2/5/2019
Andrew W. Appel, Neophytos G. Michael, Roberto Virga (Princeton University)
Aaron Stump (Stanford University)
FCS & VERIFY, July 2002
A trustworthy proof checker for proofs of properties of machine-code programs.

2 Trusted Computing Base
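[Figure: two parallel diagrams. Theorem: a^n + b^n ≠ c^n rests on its proof, which rests on the axioms. Operating system: the applications gcc, emacs, netscape, rogomatic, and make rest on the kernel. In each diagram the bottom layer (axioms, kernel) is the trusted base.]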

3 The problem: Mobile Code Security
[Figure: the code producer compiles a source program and ships the resulting code to the code consumer, who executes it.]
    load  r3, 4(r2)
    add   r2,r4,r1
    store 1, 0(r7)
    store r1, 4(r7)
    add   r7,0,r3
    add   r7,8,r7
    beq   r3, .-20
Open question ("?" in the figure): what this code may do to the consumer's protected resources, such as private files, network access, launch control, etc.

4 Existing Practice: Hardware VM protection
[Figure: the code producer compiles a source program to machine code; the code consumer executes that machine code (the same instruction sequence as on slide 3) under an operating system whose virtual memory guards the protected resources.]
Disadvantages:
  Large trusted code base of the O.S.
  Clumsy, slow interfaces between trusted and untrusted code

5 Existing Practice: Bytecode Verification
[Figure: code producer: Java program → compiler → bytecode. Code consumer: bytecode verifier (OK) → just-in-time compiler (JIT) → native code → execute. A region of the diagram is marked Trusted Computing Base. The native code shown is the same instruction sequence as on slide 3.]
Advantage: clean, fast, O-O interface between trusted and untrusted code
Disadvantage: huge trusted computing base

6 Foundational Proof-Carrying Code
[Figure: code producer: source program → compiler → native code (the same instruction sequence as on slide 3); the compiler passes hints to a prover, which uses the machine spec + policy to build a safety proof, drawn as a nested proof term. Code consumer: a checker validates the safety proof against its own copy of the machine spec + policy; on OK, the native code is executed. A region of the diagram is marked Trusted Computing Base.]

7 Trusted Computing Base
The minimal set of code that must be trusted
Our goal: make the TCB as small as possible
The TCB consists of two pieces:
  The safety policy (a predicate in higher-order logic that characterizes whether a program is safe to execute)
  The proof checker (a small C program that checks safety proofs)

8 Trusted Computing Base (cont.)
Safety Policy
  Choose a logical framework (a programming language for logic): we choose LF
  Choose an object logic (axioms, inference rules): we choose higher-order logic
  Represent our theorem in the object logic: we will explain...
Proof Checking
  Build a proof checker for the logical framework: we use Twelf to prove theorems, but for checking we want something smaller and simpler...

9 LF, Twelf, and Higher Order Logic
What is LF? (Harper et al. 1993)
  A Logical Framework for defining and presenting logics
  Based on a general treatment of syntax, rules, and proofs by means of a typed first-order λ-calculus
  Its type system has three levels of terms:
    Objects
    Types that classify objects
    Kinds that classify families of types
  Equality is taken as βη-conversion
  The judgments-as-types principle
We use the Twelf implementation of LF (Pfenning et al. 99)
We implement a standard HOL with arithmetic

10 Programming in Twelf
Define formula constructors (an LF signature):
    num  : type.
    form : type.
    imp  : form -> form -> form.
    %infix right 10 imp.    % assumed fixity declaration, not on the slide, so that "A imp B" parses
Define proof constructors (axioms):
    pf    : form -> type.
    imp_i : (pf A -> pf B) -> pf (A imp B).
    imp_e : pf (A imp B) -> pf A -> pf B.

11 Theorems, proof checking in HOL
Proof of logical transitivity:
    imp_trans : pf (A imp B) -> pf (B imp C) -> pf (A imp C) =
      [p1 : pf (A imp B)] [p2 : pf (B imp C)]
      imp_i [p3 : pf A] imp_e p2 (imp_e p1 p3).
This shows the general form of a Twelf definition: name : τ = exp.
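To make the idea of checking such a proof term concrete, here is a minimal sketch in Python rather than Twelf. It is not the authors' checker: the tuple encoding of formulas and proofs, and the names `imp`, `check`, and `ctx`, are assumptions made for illustration. It only mirrors the two rules `imp_i` and `imp_e`, with the hypotheses p1 and p2 supplied as a context.

```python
def imp(a, b):
    """Formula constructor: the implication a -> b, encoded as a tuple (hypothetical encoding)."""
    return ("imp", a, b)

def check(proof, ctx):
    """Return the formula established by `proof` under the hypotheses in `ctx`, or raise."""
    tag = proof[0]
    if tag == "hyp":                      # use of a named hypothesis
        return ctx[proof[1]]
    if tag == "imp_i":                    # from a proof of B under hypothesis A, conclude A imp B
        _, name, a, body = proof
        b = check(body, {**ctx, name: a})
        return imp(a, b)
    if tag == "imp_e":                    # from A imp B and A, conclude B
        _, p, q = proof
        f = check(p, ctx)
        a = check(q, ctx)
        if not (isinstance(f, tuple) and f[0] == "imp" and f[1] == a):
            raise TypeError("imp_e: argument does not match the antecedent")
        return f[2]
    raise ValueError(f"unknown proof constructor: {tag}")

# The body of imp_trans, with p1 and p2 given as hypotheses.
ctx = {"p1": imp("A", "B"), "p2": imp("B", "C")}
imp_trans = ("imp_i", "p3", "A",
             ("imp_e", ("hyp", "p2"), ("imp_e", ("hyp", "p1"), ("hyp", "p3"))))
result = check(imp_trans, ctx)
```

Here `result` is the encoding of A imp C, the conclusion of the transitivity proof. Swapping `p1` and `p2` in the term makes `check` raise, which is the failure mode a proof checker is there to catch.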

12 The safety policy
“This program accesses memory only in the range 0-1000.”
“This program never executes an illegal instruction.”
Step I: define access predicates
    readable(x) = 0 ≤ x ≤ 1000
    writable(x) = 0 ≤ x ≤ 1000
Step II: define legal instructions...

13 Machine states, step relation
Machine state = register bank + memory
(r,m) ↦ (r',m'): the step relation is a map between machine states
[Figure: a state before and after a step, each drawn as a register bank (registers 1, 2, 3, ..., psr, pc) beside a memory; r and m before, r' and m' after.]

14 Machine instruction = step relation
add r1 := r2 + r3  ≡
    m' = m  ∧  r'(1) = r(2) + r(3)  ∧  r'(pc) = 1 + r(pc)  ∧  ∀i. (i ≠ 1 ∧ i ≠ pc) → r'(i) = r(i)
[Figure: register bank r and memory m before the step, r' and m' after.]
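The add rule can be read as a function from one machine state to the next. The following Python sketch assumes a register bank modeled as a dictionary keyed by register number plus the names "psr" and "pc"; that representation and the name `step_add` are illustrative choices, not part of the talk.

```python
def step_add(r, m):
    """One step of `add r1 := r2 + r3` on state (r, m); returns the new state (r', m')."""
    r2 = dict(r)                 # every register not mentioned below keeps its value
    r2[1] = r[2] + r[3]          # r'(1) = r(2) + r(3)
    r2["pc"] = r["pc"] + 1       # r'(pc) = 1 + r(pc)
    return r2, m                 # m' = m: memory is unchanged

r = {1: 0, 2: 2, 3: 6, "psr": 0, "pc": 7}
m = {}
r_next, m_next = step_add(r, m)
```

Copying the dictionary before updating it is what enforces the last conjunct of the rule: all registers other than 1 and pc are carried over unchanged.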

15 Instruction decoding; memory policy
(r,m) ↦ (r',m')  ≡
    ∃ w,i,j,k.
        m(r(pc)) = w  ∧  w = 3·2^12 + i·2^8 + j·2^4 + k
      ∧ m' = m  ∧  readable(r(j) + k)
      ∧ r'(i) = m(r(j) + k)  ∧  r'(pc) = 1 + r(pc)
      ∧ ∀x. (x ≠ i ∧ x ≠ pc) → r'(x) = r(x)            (load ri := m(rj + k))
  ∨ ( ... )  ∨ ( ... )  ∨ ...                            (the other instructions)
[Figure: the instruction word w, stored in memory m at the address held in pc, split into four fields op, d, s1, s2 that hold the opcode, i, j, and k.]
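The load clause combines decoding with the memory policy, which a short Python sketch can make explicit. It assumes that the four fields are 4 bits wide each (suggested by the powers 2^12, 2^8, 2^4, but not stated on the slide), that registers are keyed 0 to 15 plus "pc", and that memory is a dictionary from addresses to words. The names `decode` and `step_load` are hypothetical.

```python
def readable(addr):
    # Bounds taken from the example policy on the safety-policy slide.
    return 0 <= addr <= 1000

def decode(w):
    """Split a word into (op, i, j, k), assuming four 4-bit fields."""
    return (w >> 12) & 0xF, (w >> 8) & 0xF, (w >> 4) & 0xF, w & 0xF

def step_load(r, m):
    """Return (r', m') if the word at r['pc'] is a legal load, else None (no step exists)."""
    w = m.get(r["pc"])
    if w is None or not 0 <= w < 1 << 16:
        return None
    op, i, j, k = decode(w)
    addr = r[j] + k
    if op != 3 or not readable(addr) or addr not in m:
        return None              # wrong opcode, or the policy forbids the access
    r2 = dict(r)                 # registers other than i and pc are unchanged
    r2[i] = m[addr]              # r'(i) = m(r(j) + k)
    r2["pc"] = r["pc"] + 1       # r'(pc) = 1 + r(pc)
    return r2, m                 # m' = m

r = {n: 0 for n in range(16)}
r["pc"], r[2] = 100, 200
m = {100: 3 * 2**12 + 1 * 2**8 + 2 * 2**4 + 4, 204: 77}   # encodes load r1 := m(r2 + 4)
ok = step_load(r, m)
blocked = step_load({**r, 2: 2000}, m)                     # address 2004 is not readable
```

Returning `None` when the access is not readable models the key design point: an unsafe load is not an error state, it simply has no successor in the step relation.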

16 Making the specification concise & trustworthy
Described in [Michael & Appel 2000]:
  Separate syntax from semantics
  Factor the semantics
  Use the “New Jersey Machine-Code Toolkit” (NJMCT) to describe syntax
  Automatically translate NJMCT descriptions into concise and readable higher-order logic

17 Specifying safe execution
The ↦ relation includes only the legal instructions
Safety means, “no matter how many instructions you execute, the next instruction is legal”
The program is meant to be loaded at some start address:
    loaded(m,start,prog) = ∀i ∈ dom(prog). m(start+i) = prog(i)
Example: loaded(m, 100, (9017;4214;8099;4010;6231;1008))
[Figure: memory m holding the words 9017, 4214, 8099, 4010, 6231, 1008 starting at address 100.]
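The loaded predicate translates almost directly into code. This sketch assumes memory is a Python dictionary from addresses to words and a program is a list of words; both representations are illustrative.

```python
def loaded(m, start, prog):
    """True when m(start + i) = prog(i) for every index i of prog."""
    return all(m.get(start + i) == word for i, word in enumerate(prog))

prog = [9017, 4214, 8099, 4010, 6231, 1008]        # the example program from the slide
m = {100 + i: word for i, word in enumerate(prog)}
m[50] = 1234                                       # unrelated memory contents are allowed
```

With this memory, `loaded(m, 100, prog)` holds, while `loaded(m, 101, prog)` does not, since the predicate constrains only the addresses start through start + len(prog) - 1.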

18 Safety theorem
safe(prog) =
    ∀r,m,start. loaded(m,start,prog) ∧ r(pc) = start →
        ∀r',m'. (r,m) ↦* (r',m') → ∃r'',m''. (r',m') ↦ (r'',m'')
Theorem to be proved: safe(9017;4214;8099;4010;6231;1008)
[Figure: register bank r with pc = start, and memory m holding 9017, 4214, 8099, 4010, 6231, 1008 from address start, with the rest of memory unknown ("?"). The definition of safe is marked as part of the Trusted Computing Base.]
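The safety theorem quantifies over all initial states and all execution lengths, so it needs a proof and cannot be established by testing. Still, the shape of the property ("every reachable state has a successor") can be illustrated with a bounded simulation. The toy two-instruction machine below is entirely hypothetical and stands in for the real step relation; `stays_unstuck` checks one initial state for a fixed number of steps only.

```python
def make_step(prog):
    """Step function of a toy machine whose state is (pc, acc); None means no legal step."""
    def step(state):
        pc, acc = state
        if not 0 <= pc < len(prog):
            return None                  # pc left the program: nothing legal to execute
        ins = prog[pc]
        if ins == "inc":
            return (pc + 1, acc + 1)
        if ins == "jmp0":
            return (0, acc)
        return None                      # illegal instruction: the machine is stuck
    return step

def stays_unstuck(step, state, max_steps):
    """Bounded check: each of the first max_steps states reached has a successor."""
    for _ in range(max_steps):
        nxt = step(state)
        if nxt is None:
            return False
        state = nxt
    return True

loops_forever = stays_unstuck(make_step(["inc", "jmp0"]), (0, 0), 50)
hits_illegal = stays_unstuck(make_step(["inc", "bad"]), (0, 0), 50)
falls_off_end = stays_unstuck(make_step(["inc"]), (0, 0), 50)
```

A program that runs off its own end is as unsafe as one containing an illegal instruction, because in both cases some reachable state has no successor.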

19 Size of Safety Specification (Sparc)
[Table: size of the safety specification for the Sparc; the table contents are not preserved.]

20 Representation Issues in the Specification
Eliminating redundancy in LF terms
Dealing with arithmetic
Representation of axioms and trusted definitions:
  Encoding higher-order logic in LF
  Polymorphic programming in Twelf
  Explicit versus implicit programming in Twelf (avoiding term reconstruction)

21 Eliminating Redundancy
LF signatures contain lots of redundant information:
    imp_i : {A : form} {B : form} (pf A -> pf B) -> pf (A imp B).
Twelf's answer: parameters can be “declared” implicit:
    imp_i : (pf A -> pf B) -> pf (A imp B).
Implicit parameters in the TCB mean type reconstruction in the checker:
  The algorithm is large and complex
  It relies on higher-order unification, which is undecidable (some valid proofs may fail)

22 Eliminating Redundancy (cont.)
On the TCB side:
  We write axioms and trusted definitions in fully explicit style
On the proving side:
  Implicit versus explicit LF term sizes
  Other approaches to this problem: Necula's LFi, oracle-based checking
  We represent proofs as DAGs with structure sharing of common subexpressions
    Proof-size blowup is avoided
    The checker does not need to parse proofs
    But the constant factor is not so good
A tradeoff: TCB size versus proof size
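Why structure sharing avoids blowup can be seen in a small sketch. The `Dag` class below uses hash-consing, a standard sharing technique chosen here for illustration; the talk does not say how the authors' tools build their DAGs, and all names in the sketch are hypothetical.

```python
class Dag:
    """Stores each distinct (op, arg1, arg2) node once and hands out integer node ids."""
    def __init__(self):
        self.nodes = []      # node id -> (op, arg1, arg2)
        self.index = {}      # (op, arg1, arg2) -> node id

    def mk(self, op, arg1=None, arg2=None):
        key = (op, arg1, arg2)
        if key not in self.index:            # create the node only on first request
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

def tree_size(dag, n):
    """Number of nodes the term would have if written out as a tree, with no sharing."""
    _, a, b = dag.nodes[n]
    return 1 + sum(tree_size(dag, c) for c in (a, b) if c is not None)

dag = Dag()
t = dag.mk("A")
for _ in range(10):
    t = dag.mk("imp", t, t)      # each level reuses the previous term twice
```

After the loop the DAG holds 11 nodes, while the same term written as a tree has 2047, so the explicit form stays linear in the nesting depth instead of exponential.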

23 Term Reconstruction in the Prover
Twelf's term reconstruction algorithm (a.k.a. “type inference”) is extremely useful in writing proofs
Outside the TCB, write “compatibility lemmas” to interface with proofs that are written in implicit style

24 The Proof Checker
A small C program (~803 lines, 1/3 of the TCB)
Type-checks explicit LF proofs, then loads and executes only safe programs
Makes no use of libraries except read and _exit

25 Why do we need a parser?
Not for proofs: they are transmitted to the checker in DAG form
For axioms! Humans can't read axioms and trusted definitions in DAG form, and therefore can't trust them
(see Pollack '98, “How to believe a machine-checked proof”)

26 DAG representation of proofs & types
Each DAG node is 5 words
The entire DAG is transmitted as a single block
Node layout:
    op      opcode
    arg1    left child
    arg2    right child
    type    computed type
    match   weak head normal form
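A fixed five-word node makes the wire format trivial to read without a parser. The sketch below packs and unpacks such a block with Python's `struct` module. The 32-bit word size, the little-endian byte order, and the convention that children are node indices are all assumptions; the talk specifies only that a node is five words.

```python
import struct

# Five unsigned 32-bit little-endian words: op, arg1, arg2, type, match (sizes assumed).
NODE = struct.Struct("<5I")

def pack_dag(nodes):
    """Serialize a list of 5-tuples into one contiguous block of bytes."""
    return b"".join(NODE.pack(*n) for n in nodes)

def unpack_dag(block):
    """Split a block back into 5-tuples; reject blocks that are not whole nodes."""
    if len(block) % NODE.size != 0:
        raise ValueError("block length is not a multiple of the node size")
    return [NODE.unpack_from(block, off) for off in range(0, len(block), NODE.size)]

nodes = [(1, 0, 0, 0, 0), (2, 0, 0, 0, 0), (3, 0, 1, 0, 0)]
block = pack_dag(nodes)
```

Because every node has the same size, node number n sits at byte offset n * NODE.size, so the receiver can index into the block directly after a single read.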

27 Proof-checking measurements
In the paper, we report a time of 74 seconds to check a benchmark proof (~6,000 lines)
We have improved this to 0.48 seconds:
  The checker marks closed terms
  It avoids traversing closed terms during substitutions
  This adds 20 lines to the proof checker
Node layout with the closed-term flag: op, cl, arg1, arg2, type, match
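The closed-term optimization can be sketched on plain λ-terms with de Bruijn indices. This is an illustration of the idea, not the checker's C code: the `Term` class, the `free` field (which plays the role of the closed flag when it is 0), and the visit counter are hypothetical, and the substitution assumes the replacement term is closed so no index shifting is needed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Term:
    op: str                          # "var", "lam", or "app"
    idx: int = 0                     # de Bruijn index, used by "var" only
    a: Optional["Term"] = None
    b: Optional["Term"] = None
    free: int = 0                    # enclosing binders needed; 0 means the term is closed

def var(i):
    return Term("var", idx=i, free=i + 1)

def lam(body):
    return Term("lam", a=body, free=max(body.free - 1, 0))

def app(f, x):
    return Term("app", a=f, b=x, free=max(f.free, x.free))

def subst(t, j, s, counter, skip_closed=True):
    """Replace variable j in t by the closed term s; counter[0] counts nodes visited."""
    if skip_closed and t.free == 0:
        return t                     # a closed term contains no variable to replace
    counter[0] += 1
    if t.op == "var":
        return s if t.idx == j else t
    if t.op == "lam":
        return lam(subst(t.a, j + 1, s, counter, skip_closed))
    return app(subst(t.a, j, s, counter, skip_closed),
               subst(t.b, j, s, counter, skip_closed))

identity = lam(var(0))
big = app(identity, identity)
for _ in range(8):
    big = app(big, big)              # a large closed subterm
term = app(var(0), big)

fast, slow = [0], [0]
with_flag = subst(term, 0, identity, fast)
without_flag = subst(term, 0, identity, slow, skip_closed=False)
```

Both calls produce the same term, but with the flag only two nodes are visited (the outer application and the variable), whereas the unflagged version walks the whole closed subterm.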

28 Smallest possible TCB
[Figure: TCB sizes compared across systems; the sizes themselves are not preserved. Labels: open-source JVM with non-optimizing JIT; highly optimizing Java compiler; PCC system with optimizing compiler; our system, Foundational PCC.]

29 Future Work
Machine descriptions for other CPUs (Mips, Sparc so far)
The TCB is really small, but proof sizes are large. Work on finding the right tradeoff between TCB size and proof size:
  Compress the DAG in some way
  Use another compressed form of the LF syntactic notation
  Add a simple Prolog interpreter to the TCB that “rediscovers” the proof based on the sequence of TAL instructions given to the checker
    The TCB is no longer minimal, but proof sizes are greatly reduced

