Mark Gordon1 COMET: Code Offload by Migrating Execution Transparently OSDI'12 Mark Gordon, Anoushe Jamshidi, Scott Mahlke, Z. Morley Mao, and Xu Chen University.

Slides:

Advertisements

Similar presentations

© 2003 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice Performance Measurements of a User-Space.

Advertisements

MINJAE HWANG THAWAN KOOBURAT CS758 CLASS PROJECT FALL 2009 Extending Task-based Programming Model beyond Shared-memory Systems.

M. Muztaba Fuad Masters in Computer Science Department of Computer Science Adelaide University Supervised By Dr. Michael J. Oudshoorn Associate Professor.

University of Michigan Electrical Engineering and Computer Science 1 A Distributed Control Path Architecture for VLIW Processors Hongtao Zhong, Kevin Fan,

1 A GPU Accelerated Storage System NetSysLab The University of British Columbia Abdullah Gharaibeh with: Samer Al-Kiswany Sathish Gopalakrishnan Matei.

R2: An application-level kernel for record and replay Z. Guo, X. Wang, J. Tang, X. Liu, Z. Xu, M. Wu, M. F. Kaashoek, Z. Zhang, (MSR Asia, Tsinghua, MIT),

Distributed System Structures Network Operating Systems –provide an environment where users can access remote resources through remote login or file transfer.

Remote Procedure Call Design issues Implementation RPC programming

U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Emery Berger University of Massachusetts Amherst Operating Systems CMPSCI 377 Lecture.

Cache Coherent Distributed Shared Memory. Motivations Small processor count –SMP machines –Single shared memory with multiple processors interconnected.

1 Cheriton School of Computer Science 2 Department of Computer Science RemusDB: Transparent High Availability for Database Systems Umar Farooq Minhas 1,

Steven Pelley, Peter M. Chen, Thomas F. Wenisch University of Michigan

Distributed systems Programming with threads. Reviews on OS concepts Each process occupies a single address space.

Programming Language Semantics Java Threads and Locks Informal Introduction The Java Specification Language Chapter 17.

Distributed Systems 2006 Retrofitting Reliability* *With material adapted from Ken Birman.

© 2007 Pearson Education Inc., Upper Saddle River, NJ. All rights reserved.1 Computer Networks and Internets with Internet Applications, 4e By Douglas.

CS 603 Threads, Processes, and Agents March 18, 2002.

G Robert Grimm New York University Sprite LFS or Let’s Log Everything.

A. Frank - P. Weisberg Operating Systems Introduction to Tasks/Threads.

PRASHANTHI NARAYAN NETTEM.

Give an example to show the advantages to using multithreaded Clients See page 142 of the core book (Tanebaum 2002).

CSE 490dp Check-pointing and Migration Robert Grimm.

ThinkAir: Dynamic Resource Allocation and Parallel Execution in Cloud for Mobile Code Offloading Sokol Kosta, Pan Hui Deutsche Telekom Labs, Berlin, Germany.

Chapter 51 Threads Chapter 5. 2 Process Characteristics  Concept of Process has two facets.  A Process is: A Unit of resource ownership:  a virtual.

User-Level Process towards Exascale Systems Akio Shimada [1], Atsushi Hori [1], Yutaka Ishikawa [1], Pavan Balaji [2] [1] RIKEN AICS, [2] Argonne National.

Accelerating Mobile Applications through Flip-Flop Replication

DATA STRUCTURES OPTIMISATION FOR MANY-CORE SYSTEMS Matthew Freeman | Supervisor: Maciej Golebiewski CSIRO Vacation Scholar Program

@2011 Mihail L. Sichitiu1 Android Introduction Platform Overview.

Introduction to Threads CS240 Programming in C. Introduction to Threads A thread is a path execution By default, a C/C++ program has one thread called.

Operating Systems ECE344 Ashvin Goel ECE University of Toronto Threads and Processes.

Multi-core systems System Architecture COMP25212 Daniel Goodman Advanced Processor Technologies Group.

Clone-Cloud. Motivation With the increasing use of mobile devices, mobile applications with richer functionalities are becoming ubiquitous But mobile.

Chapter 2 (PART 1) Light-Weight Process (Threads) Department of Computer Science Southern Illinois University Edwardsville Summer, 2004 Dr. Hiroshi Fujinoki.

Scalable Web Server on Heterogeneous Cluster CHEN Ge.

G-JavaMPI: A Grid Middleware for Distributed Java Computing with MPI Binding and Process Migration Supports Lin Chen, Cho-Li Wang, Francis C. M. Lau and.

Architectures of distributed systems Fundamental Models

Predicting Coherence Communication by Tracking Synchronization Points at Run Time Socrates Demetriades and Sangyeun Cho 45 th International Symposium in.

Games Development 2 Concurrent Programming CO3301 Week 9.

Background: Operating Systems Brad Karp UCL Computer Science CS GZ03 / M th November, 2008.

1 Process migration n why migrate processes n main concepts n PM design objectives n design issues n freezing and restarting a process n address space.

C o n f i d e n t i a l 1 Course: BCA Semester: III Subject Code : BC 0042 Subject Name: Operating Systems Unit number : 1 Unit Title: Overview of Operating.

Mark Gordon1 COMET: Code Offload by Migrating Execution Transparently OSDI'12 Mark Gordon, Anoushe Jamshidi, Scott Mahlke, Z. Morley Mao, and Xu Chen University.

Processes CSCI 4534 Chapter 4. Introduction Early computer systems allowed one program to be executed at a time –The program had complete control of the.

Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.

GLOBAL EDGE SOFTWERE LTD1 R EMOTE F ILE S HARING - Ardhanareesh Aradhyamath.

Efficient Live Checkpointing Mechanisms for computation and memory-intensive VMs in a data center Kasidit Chanchio Vasabilab Dept of Computer Science,

Threads. Readings r Silberschatz et al : Chapter 4.

3/12/2013Computer Engg, IIT(BHU)1 PARALLEL COMPUTERS- 2.

Improving the Reliability of Commodity Operating Systems Michael M. Swift, Brian N. Bershad, Henry M. Levy Presented by Ya-Yun Lo EECS 582 – W161.

Introduction Contain two or more CPU share common memory and peripherals. Provide greater system throughput. Multiple processor executing simultaneous.

CSC 360- Instructor: K. Wu Review of Computer Organization.

Parallel Computing Presented by Justin Reschke

Background Computer System Architectures Computer System Software.

Silberschatz, Galvin and Gagne ©2009Operating System Concepts – 8 th Edition Chapter 4: Threads.

CSCI/CMPE 4334 Operating Systems Review: Exam 1 1.

강호영 Contents ZooKeeper Overview ZooKeeper’s Performance ZooKeeper’s Reliability ZooKeeper’s Architecture Running Replicated ZooKeeper.

Dynamic Mobile Cloud Computing: Ad Hoc and Opportunistic Job Sharing.

Introduction to threads

Applications Active Web Documents Active Web Documents.

Credits: 3 CIE: 50 Marks SEE:100 Marks Lab: Embedded and IOT Lab

The University of Adelaide, School of Computer Science

The University of Adelaide, School of Computer Science

Introduction Enosis Learning.

Meng Cao, Xiangqing Sun, Ziyue Chen May 28th, 2014

Introduction Enosis Learning.

Chapter 9: Virtual-Memory Management

Multithreaded Programming

MPJ: A Java-based Parallel Computing System

The University of Adelaide, School of Computer Science

The University of Adelaide, School of Computer Science

Presentation transcript:

Mark Gordon1 COMET: Code Offload by Migrating Execution Transparently OSDI'12 Mark Gordon, Anoushe Jamshidi, Scott Mahlke, Z. Morley Mao, and Xu Chen University of Michigan, AT&T Labs - Research

Mark Gordon2 Overview ● Introduction ● Distributed Shared Memory ● COMET Design ● Evaluation ● Summary

Mark Gordon3 What is offloading? ● Mobile devices – Have limited resources – Are well connected ● Can we bring network resources to mobile? – Can a system transparently make this available?

Mark Gordon4 Related Work ● MAUI and CloneCloud ● Utilize server resources – Computation, energy, memory, disk ● 'Capture and migrate' method level offloading ● Areas for improvement – Thread and synchronization support – Offload part of methods

Mark Gordon5 COMET's Goals 1. Improve mobile computation speed 2. Require no programmer effort 3. Generalize well with existing applications 4. Resist network failures

Mark Gordon6 Overview ● Introduction ● Distributed Shared Memory ● COMET Design ● Evaluation ● Summary

Mark Gordon7 Distributed Shared Memory ● COMET is offloading + DSM – Offloading bridges computation disparity – DSM provides logically shared address space ● DSM usually applied to cluster environments – Low latency, high throughput ● Mobile relies on wireless communication

Mark Gordon8 X=123 X=555 X=? X=123 DSM (continued) ● Conventional DSM (Munin) X=123 ● Waited an RTT for a write ● Read could take RTT also

Mark Gordon9 Java Memory Model ● Dictates which writes a read can observe ● Specifies 'happens-before' partial order – Access in single thread totally ordered – Lazy Release Consistency locking ● Fundamental memory unit is the field – Known alignment, known width

Mark Gordon10 Field DSM ● Track dirty fields locally ● Need 'happens-before' established? – Transmit dirty fields! (mark fields clean) ● Not clear it scales well past two endpoints – Not important to our motivation – Use classic cluster DSM on server

Mark Gordon11 Overview ● Introduction ● Distributed Shared Memory ● COMET Design ● Evaluation ● Summary

Mark Gordon12 VM-synchronization ● Used to establish 'happens-before' relation ● Directed operation between pusher and puller ● Synchronizes – Bytecode sources – Java thread stacks – Java heap

Mark Gordon13 Bytecode Update (Step 1 of 3) ● Operation begins by sending any new code I loaded file xyz.dex I have xyz.dex cachedSend xyz.dex [xyz.dex] Pusher Puller

Mark Gordon14 Stack Update (Step 2 of 3) ● Next we send over thread stacks Thread id: 2 job2::run pc:5 registers[42, 555, 0] workLoop pc:6 registers[0, [obj:9]] start pc:3 Registers[101, [obj:9]] Pusher Puller nom

Mark Gordon15 Heap Update (Step 3 of 3) ● Finally send over heap update – We send updates to any changed (or new) field – Only send updates of 'shared' heap Pusher Puller [obj:2].y = 1 [obj:4].z = [obj:3]...

Mark Gordon16 Lock ownership ● Annotate with lock ownership flag ● Establish 'happens-before' with VM-sync

Mark Gordon17 Thread Migration ● Thread migration trivial – Push VM-sync – Transfer lock ownership Pusher Puller

Mark Gordon18 Native Methods ● Written in C with bindings for Java – Math.sin(), OSFileSystem.write(), VMThread.currentThread() ● Native methods exist to – Access device resources (file system, display, etc) – For performance reasons – To work with existing libraries ● Not generally safe to run on either endpoint – Manually white list safe native methods

Mark Gordon19 Failure Recovery ● VM-synchronization is recovery safe ● Always leave enough information on client ● If server is lost resume threads running locally! ● A few caveats (native methods)

Mark Gordon20 Tau-Scheduler Τ = 2 * VM-synchronization time

Mark Gordon21 Implementation ● Built from gingerbread CyanogenMod source ● ~5000 lines of C code ● JIT not included Engine.c:offMigrateThread() offWriteU1(self, OFF_ACTION_MIGRATE); deactivate(self); offThreadWaitForResume(self);

Mark Gordon22 Overview ● Introduction ● Distributed Shared Memory ● COMET Design ● Evaluation ● Summary

Mark Gordon23 Evaluation Setup ● Samsung Captivate (1 GHz Hummingbird) ● 2 x 3.16GHz quad core Xeon X5460 cores

Mark Gordon24 Benchmarks ● 8 applications from Google Play – Average speed-up of 2.88X on WiFi / 1.28X on 3G – Average energy saving of 1.51X on WiFI / 0.84X on 3G ● 2 computation benchmark applications – 10.4X speed-up w/ WiFi on Linpack – 500+X speed-up w/ multi-threaded factoring

Mark Gordon25 Rhino ● Java JavaScript Interpreter – Ran with SunSpider JavaScript benchmark

Mark Gordon26 Overview ● Introduction ● Distributed Shared Memory ● COMET Design ● Evaluation ● Summary

Mark Gordon27 Summary ● Offloading+DSM=COMET – Improve computation speed – No programmer effort – Generalize well – Resist network failures

Mark Gordon28 Contributions ● Design/Impl. with four simultaneous goals – Fine granularity offloading – Mutli-threading support ● Field based DSM coherency

Mark Gordon29 Questions? ?

Mark Gordon30 Macrobenchmarks

Mark Gordon31 Macrobenchmarks (continued)