Trustless Grid Computing in ConCert (Progress Report) Robert Harper Carnegie Mellon University.

Presentation transcript:

Acknowledgements Co-PIs: Karl Crary, Frank Pfenning, Peter Lee. Support: NSF ITR program. Students (who do the real work): Chang, DeLap, Dreyer, Kliger, Magill, Moody, Murphy, Petersen, Sarkar, Vanderwaart, Watkins. Thanks to FGC Organizers for the invitation!

Grid Computing “The network is a computer.” –Exploit idle resources on the network. –Many ad hoc grids. But what is a general grid model? –Trust model, programming model, participation model?

Application Model What is the (a?) grid computer? –Parallelism? –Dependencies? –Sharing resources? –Failures? Centralized vs. distributed. –Bottlenecks (e.g., SETI traffic at UCB). –Reliability, robustness.

Application Model Most grid apps are massively parallel. –Depth = 1, no dependencies. –Ray tracing, GIMPS, SETI. Is a grid useful for depth > 1? –Game-tree search. –Theorem proving. Is parallelism the only benefit? –What about data locality?

Host Model Active intervention required. –Must download code, apply upgrades. –Must decide on which grids to participate. Motivation to participate? –At scale, largely altruism, coolness. –Ad hoc grids on an intranet. –Economic models? (Cf Lillibridge, et al.)

Trust Relationships Hosts trust applications. –Denial of service attacks. –Privacy/secrecy attacks. –Accidental misbehavior (e.g., SETI). Applications trust hosts. –Spoofed answers. –Collusion among participants. Can we minimize these?

The ConCert Approach One computer, many keyboards. –Decentralized scheduling. –Emphasis on code mobility. Policy-based participation. –Declarative statement of participation criteria. –Applications must prove compliance. Dependency-based scheduling. –Arbitrary depth. –And/or dependencies. –Inspired by CILK/NOW.

The ConCert Network Client Hosts

Host Setup Locator Scheduler Worker Peer-to-Peer Discovery Protocol Distributed Scheduler Loader/Verifier/Runner

Scheduler Maintain ready and waiting queues. –Ready queue: available for “stealing”. –Wait queue: awaiting satisfying assignment. Work-stealing model. –Who has work to do? –Grab work, compute result, deliver to owner. Dependencies. –Supports depth > 1 parallelism. –Don’t care and don’t know parallelism.

Scheduler The unit of work on the grid is a cord.

Scheduler Cord structure: –Code: cached using MD5 fingerprints. –Certificate of compliance: (more later). –Dependencies: positive boolean formula. Assumptions: –Idempotent: can always be re-run. –Non-blocking: runs to completion (but may create more cords, often as continuations). –Communication only via dependencies. Satisfying assignment passed on activation.
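The cord structure above can be sketched in Python as follows. This is a minimal illustration, not the ConCert implementation: the `Cord` class, the tuple encoding of formulas, and the `satisfy` helper are all hypothetical names. It shows the MD5 fingerprint used as a cache key for code, and how a positive boolean formula (only "and" and "or", no negation) over cord identifiers yields a satisfying assignment once enough results have arrived.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union

# Hypothetical encoding of a positive boolean formula over cord identifiers:
# either a cord id (str) or a tuple ("and" | "or", left, right). No negation.
Formula = Union[str, Tuple[str, "Formula", "Formula"]]


def satisfy(dep: Formula, results: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return a satisfying assignment (the witness), or None if not yet satisfied."""
    if isinstance(dep, str):  # a single cord id: satisfied once its result exists
        return {dep: results[dep]} if dep in results else None
    op, left, right = dep
    lw, rw = satisfy(left, results), satisfy(right, results)
    if op == "and":  # both sides must be satisfied; merge their witnesses
        return {**lw, **rw} if lw is not None and rw is not None else None
    if op == "or":  # either side suffices; prefer the left witness
        return lw if lw is not None else rw
    raise ValueError(f"unknown connective: {op}")


@dataclass(frozen=True)
class Cord:
    code: bytes                 # object code, shipped at most once per host
    certificate: bytes          # certificate of compliance (opaque here)
    deps: Optional[Formula]     # None means the cord is ready immediately

    @property
    def fingerprint(self) -> str:
        # MD5 fingerprint of the code, used as the cache key.
        return hashlib.md5(self.code).hexdigest()
```

For example, with `deps = ("and", "F", ("or", "G", "K"))`, results for `F` and `K` alone are enough, and the witness passed on activation would be `{"F": ..., "K": ...}`.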

Worker Steal work from (self or) neighbor. Obtain cord from host. –Typically arguments + dependencies. –Code shipped at most once. Verify certificate of compliance. Load and execute as a DLL. –Currently combined with verification. –Should verify at most once (cache result). Deliver result to owner.
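The "verify at most once (cache result)" goal can be sketched as a small memo table. This is an assumption-laden illustration: `VerifiedCodeCache` and the `verify` callback are hypothetical, and the real checker is the host-side certificate verifier, which is not modeled here. The verdict is keyed on fingerprints of both the code and the certificate, so a rejected certificate does not poison a later valid one for the same code.

```python
import hashlib
from typing import Callable, Dict, Tuple


class VerifiedCodeCache:
    """Remember verification verdicts so each (code, certificate) pair is checked once."""

    def __init__(self, verify: Callable[[bytes, bytes], bool]) -> None:
        self._verify = verify  # hypothetical checker: (code, certificate) -> bool
        self._verdicts: Dict[Tuple[str, str], bool] = {}
        self.checks_run = 0    # counts real verifications, for illustration only

    def is_safe(self, code: bytes, certificate: bytes) -> bool:
        key = (hashlib.md5(code).hexdigest(), hashlib.md5(certificate).hexdigest())
        if key not in self._verdicts:
            self.checks_run += 1
            self._verdicts[key] = self._verify(code, certificate)
        return self._verdicts[key]
```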

Control Client. –Submit a job to the grid. –“One per keyboard.” Monitor. –Web server interface. –Displays cord status. –[Change policy.]

Moving Cords Around A client submits work, broken into cords, to the local conductor.

Moving Cords Around Idle peers steal cords to work on. Cords have destinations for their answers, shown by color here.

Some cords spawn new cords. They might depend on other cords before they can run. The destination of F and G is the green node, since they will be used to fill H’s dependencies.

Moving Cords Around When a cord finishes, the result is sent to its destination. The client interprets and displays the results. Simultaneously, unfinished cords continue to be stolen...

Moving Cords Around When the green node has answers for F and G, H is then ready to be stolen.
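The F, G, H walkthrough can be simulated with a ready queue and a wait queue. The sketch below is deliberately simplified and is not the ConCert conductor: it runs on a single node, supports only and-dependencies, and the `Conductor` class and `run` loop are hypothetical names. It shows H sitting in the wait queue until answers for F and G have been delivered, at which point H becomes available for stealing.

```python
from collections import deque


class Conductor:
    def __init__(self):
        self.ready = deque()   # cords any idle peer may steal
        self.waiting = {}      # cord name -> (work, needs, names still missing)
        self.results = {}      # delivered answers, by cord name

    def submit(self, name, work, needs=()):
        missing = {n for n in needs if n not in self.results}
        if missing:
            self.waiting[name] = (work, tuple(needs), missing)
        else:
            self.ready.append((name, work, tuple(needs)))

    def steal(self):
        return self.ready.popleft() if self.ready else None

    def deliver(self, name, value):
        self.results[name] = value
        # Move every waiting cord whose dependencies are now all satisfied.
        for waiter in list(self.waiting):
            work, needs, missing = self.waiting[waiter]
            missing.discard(name)
            if not missing:
                del self.waiting[waiter]
                self.ready.append((waiter, work, needs))


def run(conductor):
    """Play the role of an idle peer: steal, compute, deliver, until nothing is ready."""
    while (job := conductor.steal()) is not None:
        name, work, needs = job
        witness = {n: conductor.results[n] for n in needs}  # passed on activation
        conductor.deliver(name, work(witness))
    return conductor.results
```

Submitting `H` with `needs=("F", "G")` before `F` and `G` leaves `H` in the wait queue; once both answers are delivered it moves to the ready queue and is run with the witness for its dependencies.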

Popcorn/Grid Model my_cord: string × witness → string. –Marshals argument and result itself. –Witness is the satisfying assignment for its dependencies. Typical structure: –Input = entry point + arguments. –Dispatch on entry point. Cords as distributed continuations. –Perform some work, spawn new cords. –Supports various higher-level parallelism models.
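The "entry point + arguments" convention might look like the following Python sketch. Popcorn itself is a C-like language, so this is only an analogy: the JSON encoding, the entry-point names `square` and `sum_children`, and the dictionary form of the witness are all assumptions made for illustration. The point is that the cord is a single string-to-string function that does its own marshalling and dispatches on the entry point.

```python
import json
from typing import Dict


def my_cord(arg: str, witness: Dict[str, str]) -> str:
    """One function per program: unmarshal, dispatch on entry point, marshal the result."""
    request = json.loads(arg)  # the cord unmarshals its own argument
    entry, params = request["entry"], request["args"]
    if entry == "square":
        return json.dumps(params[0] ** 2)
    if entry == "sum_children":
        # A continuation-style entry point: combine the answers named in the witness.
        return json.dumps(sum(json.loads(v) for v in witness.values()))
    raise ValueError(f"unknown entry point: {entry}")
```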

ML/Grid Model One program for client and its cords. –Compiler separates client from cords. Compiler handles marshalling. Run-time checks enforce distinctions (more later). –Cord cannot perform I/O. –Client cannot submit itself as a cord. Compiles to TAL/Grid.

ML/Grid Model Primitives: –spawn : (unit → α) → α task –sync : α task → α –relax : α task list → α × α task list Must be provided as primitives. –Requires access to representations. Further higher-level libraries. –E.g., parallelism models.
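To make the three signatures concrete, here is a local analogy using Python's `concurrent.futures`. It is only a sketch of the types: tasks here are threads in one process, not cords on a grid, and reading `relax` as "wait for the first task to finish, return its value and the remaining tasks" is an interpretation of its signature rather than something the slides spell out.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, List, Tuple, TypeVar

A = TypeVar("A")
_pool = ThreadPoolExecutor(max_workers=4)  # stand-in for the grid


def spawn(thunk: Callable[[], A]) -> "Future[A]":
    """spawn : (unit -> a) -> a task"""
    return _pool.submit(thunk)


def sync(task: "Future[A]") -> A:
    """sync : a task -> a (blocks until the task has a result)"""
    return task.result()


def relax(tasks: "List[Future[A]]") -> "Tuple[A, List[Future[A]]]":
    """relax : a task list -> a * a task list (first finished result plus the rest)"""
    if not tasks:
        raise ValueError("relax needs at least one task")
    done, _ = wait(tasks, return_when=FIRST_COMPLETED)
    first = next(iter(done))
    return first.result(), [t for t in tasks if t is not first]
```

Under this reading, `sync` corresponds to an and-dependency on one task, and `relax` to an or-dependency over several.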

Examples GML ray-tracer (ICFP01 Contest). –Depth = 1. –Written in Popcorn/Grid, compiles to TALx86/Grid. Chess player. –Depth > 1, and-or dependencies. –Written in Popcorn/Grid, compiles to TALx86/Grid. Theorem prover for MLL. –Depth > 1, and-or dependencies. –Written in SML, runs on simulator. –Being ported to ML/Grid.

Some Problems Failures. –Fail-stop model is easily supported. –Demonic failures require result certification. Abandoning cords. –Or-dependencies are satisfied by first cord to deliver answer. –Parent must be prepared to receive result long after it is no longer needed. Sharing results. –Grid-wide cache of answers?

Result Certification Main idea: make host prove validity of answer. –Avoid need for application to trust hosts. Some applications admit native certification. –For theorem prover: the proof. –For factoring: the factors. Are there general result certification methods? –Work-stealing model precludes random allocation / redundancy methods (SETI, Bayanihan). –Centralized methods are not robust or scalable.
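Factoring is the simplest case of native certification: the answer carries its own evidence, and checking it is far cheaper than finding it. A minimal sketch follows, with hypothetical function names; requiring every factor to be prime is an assumption about what the application wants, and the trial-division primality test is only suitable for small illustrative inputs.

```python
from math import prod
from typing import List


def is_prime(n: int) -> bool:
    """Trial division; adequate only for small illustrative inputs."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def check_factoring(n: int, factors: List[int]) -> bool:
    """Accept a host's answer only if the claimed factors are prime and multiply to n."""
    return bool(factors) and all(is_prime(p) for p in factors) and prod(factors) == n
```

A spoofed answer such as `[7, 12]` for 91 is rejected without the application having to trust the host or redo the search.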

Result Certification A crazy idea: use the PCP theorem. –Use interactive dialog to spot-check a proof. Host proves that it ran given code on given data. –Execution trace is a proof that it did. –But traces can be huge! Engage in a dialog with O(1) rounds to check proof with high probability. –Avoids need to transmit trace itself. –But the representation is enormous!

Two Foundational Questions What is a type system for a GPL? –Enforce mobility constraints. –Clean type system to support development, compilation, certification. What policies can we support? –How to state policies? –How to prove compliance? –How to support multiple policies?

A Type System for GPL Main idea: modalities for mobility. –Cf. related ideas by Cardelli, Gordon, et al. –Cf. recent work by Walker. –Here: Curry-Howard applied to modal logic. Necessity (□A): a computation of A anywhere. –Classifies mobile code of type A. –Enforces marshalling and access restrictions. Possibility (◇A): a computation of A somewhere. –Classifies remote code of type A. –Ensures that access is limited to remote values.

Necessity for Mobility Truth (local) typing judgement: Valid (Mobile) Bindings True (Local) Bindings

Necessity for Mobility Validity (mobile) typing judgement: Mobile = does not use local resources.

Necessity for Mobility Box = marshal value and bindings. Values of boxed type are mobile.

Necessity for Mobility Unboxing = unbox and run mobile code. Implicit un-marshalling:
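The typing rules on the preceding four slides were figures and did not survive in this transcript. As a reference point only, the standard judgmental presentation of S4 necessity (in the style of Pfenning and Davies) is sketched below; the notation is an assumption and the original slides may differ. Here Δ holds valid (mobile) bindings u :: A, Γ holds true (local) bindings x : A, and □A is the type of mobile code of type A.

```latex
% Hypothesis rules: local and mobile variables
\[
\frac{}{\Delta;\ \Gamma, x{:}A \vdash x : A}
\qquad
\frac{}{\Delta, u{::}A;\ \Gamma \vdash u : A}
\]
% Box introduction: the body may use only mobile bindings (empty local context)
\[
\frac{\Delta;\ \cdot \vdash M : A}
     {\Delta;\ \Gamma \vdash \mathsf{box}\ M : \Box A}
\]
% Box elimination: unbox M and bind the mobile code to u
\[
\frac{\Delta;\ \Gamma \vdash M : \Box A
      \qquad
      \Delta, u{::}A;\ \Gamma \vdash N : C}
     {\Delta;\ \Gamma \vdash \mathsf{let\ box}\ u = M\ \mathsf{in}\ N : C}
\]
```

The empty local context in the introduction rule is what "mobile = does not use local resources" amounts to in this formulation.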

Necessity for Mobility Marshalling = cast into network form. –Base types, structured types: fairly typical. –Function types: certified binary. Code mobility is a form of semantic linking. –Import object from the network. –Un-marshall, verify, load, execute. –(More later.)

Possibility for Locality Possible (somewhere) typing judgement: What is here is somewhere:

Possibility for Locality Create a local reference to something somewhere:

Possibility for Locality Move to remote entity: May be useful for managing data locality. –Return call has type ◇□(A → B). –Cf. “upcalls”.
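The rules for possibility were also figures that are missing from this transcript. For reference, the corresponding standard S4 rules in the Pfenning and Davies style are sketched below; the notation is an assumption and may not match the slides. The judgement E ÷ A reads "E computes an A somewhere", and ◇A is the type of remote code of type A.

```latex
% What is here is somewhere
\[
\frac{\Delta;\ \Gamma \vdash M : A}
     {\Delta;\ \Gamma \vdash M \div A}
\]
% Diamond introduction: a local reference to something somewhere
\[
\frac{\Delta;\ \Gamma \vdash E \div A}
     {\Delta;\ \Gamma \vdash \mathsf{dia}\ E : \Diamond A}
\]
% Diamond elimination: move to the remote entity; only x and mobile bindings remain in scope
\[
\frac{\Delta;\ \Gamma \vdash M : \Diamond A
      \qquad
      \Delta;\ x{:}A \vdash E \div C}
     {\Delta;\ \Gamma \vdash \mathsf{let\ dia}\ x = M\ \mathsf{in}\ E \div C}
\]
```

In the elimination rule the local context Γ is dropped for the body E, which is how access is limited to the remote value and to mobile bindings.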

Modalities for Mobility These rules are for S4 modal logic. –Accessibility is reflexive and transitive. Is this the right notion of accessibility? –Symmetry = S5. “You can go home again.” –Judgmental form requires three contexts. –Explicit-world form uses a record of contexts. Other varieties of modal logic are also under consideration.

Policies and Certification Current certification methods are uniform. –∃ sec. policy ∀ problems: safety is assured. E.g., PCC for Java. E.g., TAL for Popcorn. –Safety means memory and type safety. Baseline requirement. But not adequate for all applications. Recall: policies should be per-host.

Foundational Certification Non-uniform setup: ∀ probs ∃ type system. –Shift the type system for object code out of the TCB (untrusted, problem-specific). –Must provide a proof that type system is safe. Compare Appel, et al. –Their goal: minimize TCB. –Our goal: support multiple safety policies. –Could be consolidated, but it’s a lot of work.

Foundational Safety Host specifies target architecture. –Fully realistic, e.g., IA-32 + OS + RTS. –No unsafe transitions. Safety policy: target does not get stuck. –Any type system must come with a proof of progress relative to the target machine. –Experience shows that progress proofs are readily mechanizable.

Foundational Certification (I)

Object code is essentially a DLL. Type system is specified in LF. –Using typical LF representations. Safety proof: well-typed ⇒ safe. –Represented as an LF term. –Obtained with Twelf proof search engine. Derivation: type annotations for code. –Makes mechanical checking feasible.

Foundational Certification (I) May cache type system and safety proof. –Reduces certificate size. –Many cords for one type system is typical. May use oracle strings for derivation. –Relies on details of operational behavior of host-side checker. –Therefore not completely declarative. –But significantly reduces certificate size.

Foundational Certification (II)

Object code is a DLL as before. Type checker is a program. –Currently, a Twelf logic program. –Could be ML code. Safety proof shows partial correctness of the checker. –Checking succeeds ⇒ safety. Annotations support mechanical checking. Time limit precludes looping. –Can refuse if limit is too large.

Examples TALT –Essentially TALx86 with a safety proof. –Proof is mechanically derived and checked. –Structured as a safety proof for an abstract machine plus a simulation lemma for target. TALT + Resource Bounds –Goal: ensure that object code yields processor at set intervals. –Precludes denial of CPU service.

Resource Bound Certification Type system enforces upper bound on yield interval. –Specified as a parameter of the type system. Basic method: –Conservative instruction counting (join points). –Yield processor at start of every basic block. –Prove that block can complete before next yield (else split block).
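The basic method can be illustrated with a toy pass over basic blocks. This is a sketch under stated assumptions, not the TALT algorithm: instructions are modeled as hypothetical `(opcode, cost)` pairs, the cost model is invented, and join points and calls are ignored. It inserts a yield at the start of every block and splits any block whose conservative instruction count would exceed the bound before the next yield.

```python
from typing import List, Tuple

Instr = Tuple[str, int]   # (opcode, conservative cost); a made-up representation
YIELD: Instr = ("yield", 0)


def insert_yields(blocks: List[List[Instr]], bound: int) -> List[List[Instr]]:
    """Start every block with a yield; split blocks so cost between yields <= bound."""
    out: List[List[Instr]] = []
    for block in blocks:
        current, used = [YIELD], 0
        for op, cost in block:
            if cost > bound:
                raise ValueError(f"{op} alone exceeds the yield interval")
            if used + cost > bound:      # block cannot complete in time: split it
                out.append(current)
                current, used = [YIELD], 0
            current.append((op, cost))
            used += cost
        out.append(current)
    return out
```

With a bound of 4, a block of three cost-2 instructions becomes two blocks, each beginning with a yield.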

Resource Bound Certification Smarter techniques are under development. –Better analysis of code behavior across calls. –Fewer yields overall. Run-time checks reduce overhead. –Use static analysis to insert minor yields that check true interval. –Minor yields re-calibrate, possibly incurring a major yield (system call).

A Meta-Grid? ConCert Conductor represents one model of grid computing. –Compute-intensive, distributed scheduling. –Not much reason to believe this is canonical. Can we support a variety of models inside of a single meta-grid? –Applications choose grid model. –Hosts are indifferent to programming model.

A Meta-Grid? The ur-grid: –A TCP port. –Foundational code certification. A grid framework: –Scheduler, recovery model, host policy. –Runs application cords.

A Meta-Grid? Key capability: safe dynamic loading and linking. –Current ConCert framework must be certified against host safety policy. –It must be able to load application policies and application code. Requires a fairly sophisticated theory of safe linking.

Semantic Linking Marshalling is meta-programming. –Create values of a grid type system. –Cast grid values as local values. Certification is how we marshal code. –Functions are marshalled as closures plus proof of compliance with host type system. –Ensures that cast will succeed, safely. The ur-grid is just an unmarshaller. Grid frameworks are meta-programs.

Summary Declarative approach to safe grids. –Passive, policy-based participation model. –Logic and proof technology for specifying policies and proving compliance. Close interplay between systems building and foundational theory. –Type systems for mobile code. –Type systems for various safety policies.

Thanks! Web site: Demonstration available after talk. Questions or comments?