Asaf Cidon, Tomer M. London

Presentation transcript:

MARS: Adaptive Remote Execution Scheduler for Multithreaded Mobile Devices
Asaf Cidon*, Tomer M. London*, Sachin Katti, Christos Kozyrakis, Mendel Rosenblum
Stanford University
*Equal contributors

Hi everyone. Today I’m going to present MARS, a system that allows mobile devices to use the cloud as a system computing resource. My collaborator in this project is Tomer London, and we are advised by Sachin Katti, Christos Kozyrakis and Mendel Rosenblum, all of Stanford University.

New Class of Mobile Applications
Computer Vision
Augmented Reality
Motion Sensing

The computer of the future is the mobile device. It is personal, social and highly connected. Applications are the driver; the problem is the limited resources on the device.

There is a growing demand for higher computing performance on mobile devices, in order to support applications like computer vision, augmented reality and 3D games. The problem is that device CPUs are hitting the energy wall, so there’s an inherent limit to the amount of compute performance we can have on mobile devices.

The solution to this problem is to use the cloud. More specifically, we think the mobile device should view the cloud as a system resource (a “cloud-on-chip”): a new type of CPU that has a slower, less reliable interconnect.

We designed MARS, a simple remote execution scheduler that resides natively on the device and can utilize this “cloud-on-chip” interface. We built a simulator for our system and measured network, power and performance for three applications. This research is meant as a proof of concept for dynamic remote execution; we intend to implement it.

October 23, 2011

Mobile Client Trends
Mobile CPU performance is increasing, but hitting the ‘energy wall’
Can we improve performance and reduce energy consumption?
Opportunity: network bandwidth is increasing, so utilize the cloud
[Chart axis: Maximum Bandwidth (Mb/s)]

Why CPUs are hitting the energy wall. There is an almost infinite amount of computing resources around us, almost for free, coupled with wireless network bandwidth that is increasing exponentially, if you take a look at Wi-Fi… The question is whether we can exploit the exponentially increasing network resources to offload tasks from the mobile phone.

Consumer demand for performance on mobile devices: vision, graphics, games. Multicore devices, new architectures from Qualcomm, Nvidia, Intel; GPUs. Inherent limitation: power density and battery life. Client-server may help.

Static Client-Server Partitioning Doesn’t Work
Dynamic resources: network bandwidth and latency; available CPU and memory
Same code, different platforms: smartphones (single-core, multi-core), tablets

Start off by saying this isn’t a new idea, for example Google. The key point is that the partitioning is static, and that doesn’t work for remote execution in our context. Maybe put up a static client-server partition that is used now.

It’s tough to statically partition your code between a mobile device and a server, because of the variable network and hardware resources, and also because of the wide range of mobile device capabilities. Code written for an Android device can run on a three-year-old G1 and, at the same time, on one of Samsung’s new tablets.

MARS: Adaptive Remote Execution
Opportunistically offload computations to a remote server
Enhance computational capabilities
Decrease energy consumption
Make dynamic decisions
Adapt to network and CPU variability
[Diagram labels: Mobile Device, Data Center]

MARS works because it’s dynamic.

Agenda
Design of MARS
Simulator
Results and Analysis
Conclusions

Existing Remote Execution Systems
[Chart axes: the unit of remote execution (VM, RPC) against the target of performance optimization (single-thread application, multi-threaded application, system)]
Systems shown: Cloudlets [Satyanarayanan et al., ‘09], CloneCloud [Chun et al., ‘11], MAUI [Cuervo et al., ‘10], Odessa [Ra et al., ‘11], Chroma [Balan et al., ‘03], MARS “Cloud-on-Chip”

Say more about why VM is bad. Transition: the difference is that we do system-level performance optimizations.

Previous Systems: Application Partitioning
MARS “Cloud-on-Chip”: System Scheduling
[Diagram labels, previous systems: Process 1 with RPC 1 to RPC 5, Local Execution, Remote Execution]
[Diagram labels, MARS: Process 1, Process 2, Process 3, RPC Queue, Local Cores, Remote]

CLOUD ON CHIP: emphasize both in speech and graphically. Communicate the complexity of optimization solvers and their lack of scalability. You have to run an optimization program for each application, and the results change if it’s a multicore or if the resources change. This increases complexity and doesn’t scale.

Greedy Algorithm
Higher POR: better performance gain from offloading
Higher EOR: better energy saving from offloading
EOR ≥ 1/G ?
EOR < G ?

Controller Algorithm
G (Greediness) trades off utilization and energy efficiency
Priority queue, sorted by Performance Offload Rank (POR): RPC 3 (POR 2.5), RPC 5 (POR 1.9), RPC 6 (POR 1.8), RPC 4 (POR 1.3), RPC 2 (POR 0.4)
Remote Server Available: check EOR threshold
Local Core Available
[EOR axis: Local, Both, Remote, with thresholds at 1/G and G]

Greediness: as G → ∞, we don’t use EOR at all.
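The controller on this slide can be sketched roughly as below. This is only one reading of the diagram, not the authors' implementation: it assumes a free remote server takes the highest-POR RPC whose EOR is at least 1/G, and a free local core takes the lowest-POR RPC whose EOR is below G. The POR values come from the slide; the EOR values, the `Rpc` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rpc:
    name: str
    por: float  # Performance Offload Rank (values taken from the slide)
    eor: float  # Energy Offload Rank (hypothetical values for illustration)


def pick_for_remote(queue, g):
    """Assumed rule: offload the highest-POR RPC with EOR >= 1/G."""
    for rpc in sorted(queue, key=lambda r: r.por, reverse=True):
        if rpc.eor >= 1.0 / g:
            queue.remove(rpc)
            return rpc
    return None  # nothing in the queue is worth offloading


def pick_for_local(queue, g):
    """Assumed rule: run locally the lowest-POR RPC with EOR < G."""
    for rpc in sorted(queue, key=lambda r: r.por):
        if rpc.eor < g:
            queue.remove(rpc)
            return rpc
    return None  # everything left should wait for the remote server


def make_queue():
    # POR values as shown on the slide; EOR values are made up.
    return [
        Rpc("RPC 3", 2.5, 0.3),
        Rpc("RPC 5", 1.9, 1.2),
        Rpc("RPC 6", 1.8, 0.9),
        Rpc("RPC 4", 1.3, 3.0),
        Rpc("RPC 2", 0.4, 0.8),
    ]


if __name__ == "__main__":
    queue = make_queue()
    print("remote:", pick_for_remote(queue, g=2.0))
    print("local: ", pick_for_local(queue, g=2.0))
```

Under this reading, a very large G makes both checks pass for every RPC, which matches the note that EOR is not used at all as G → ∞.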

Agenda
Design of MARS
Simulator
Results and Analysis
Conclusions

Remote Execution Applications
Augmented Reality
Face Recognition
[Diagram labels: Pic, Barcode Detection, Rendering, Recognition]

Here are two applications. Some of their parts are computationally intensive (like face recognition and detection), and the programmer marked them…

Simulator Methodology
Trace-driven simulation
Clients: Nokia N900 (single core), NVIDIA Tegra 250 (multicore)
Server: Amazon EC2 Opteron 2007
Networks: Outdoors Wi-Fi, Indoors Wi-Fi, 3G
June 4, 2011

MARS vs. Static Policies Lower is better

Nokia N900 Power Consumption

                                  Wi-Fi          3G
Idle Network Power                1.31 Watts     0.66 Watts
Upload Network Power              1.464 Watts    2.36 Watts
Download Network Power            1.39 Watts     2.26 Watts
Upload Network Power Overhead     10.51%         72.03%

In Wi-Fi you need to maximize utilization. MARS made the right choice not to fully utilize remote execution.
Wi-Fi: performance and energy are highly correlated
3G: trade-off between performance and energy
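The slide does not say how the upload overhead row is computed. A quick check, assuming it is the share of upload power that exceeds idle power, (upload − idle) / upload, reproduces the 3G figure and lands within rounding of the Wi-Fi figure. The function name and the dictionary layout are illustrative only.

```python
# Power figures in Watts, as listed on the slide.
POWER = {
    "Wi-Fi": {"idle": 1.31, "upload": 1.464},
    "3G": {"idle": 0.66, "upload": 2.36},
}


def upload_overhead_pct(idle, upload):
    # Assumed definition: fraction of upload power above the idle baseline.
    return (upload - idle) / upload * 100.0


if __name__ == "__main__":
    for net, p in POWER.items():
        pct = upload_overhead_pct(p["idle"], p["upload"])
        print(f"{net}: {pct:.2f}%")
```

Under this assumption, 3G pays a much larger energy premium for transmitting than Wi-Fi does, which is consistent with the note that 3G forces a trade-off between performance and energy.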

Same Application, Different Networks

Talk about the fact that these are realistic traces. MARS decided to be conservative and to hardly offload tasks.

Remote Execution with Multicore

Agenda
Design of MARS
Simulator
Results and Analysis
Conclusions

Conclusions
Can’t always be greedy: performance and energy trade-off
MARS is optimized for multiple parallel applications and cores
MARS “Cloud-on-Chip”: validation of system-level remote execution scheduling
57% performance increase, 33% energy savings

Takeaways, not conclusions: can’t just be greedy (important even for multicore); performance gains; vision: cloud on chip.