1 Lecture 20: WSC, Datacenters
 Topics: warehouse-scale computing and datacenters (Sections 6.1-6.7) – the basics followed by a look at the future

2 Warehouse-Scale Computer (WSC)
 • 100K+ servers in one WSC; ~$150M overall cost
 • Requests from millions of users (Google, Facebook, etc.)
 • Cloud computing: a model where users can rent compute and storage within a WSC; there is an associated service-level agreement (SLA)
 • Datacenter: a collection of WSCs in a single building, possibly belonging to different clients and using different hardware/architecture (note: some won't agree with this definition)

3 Workloads
 • Typically, software developed in-house – MapReduce, BigTable, etc.
 • MapReduce: embarrassingly parallel operations performed on very large datasets, e.g., search on a keyword, aggregate a count over several documents
 • Hadoop is an open-source implementation of the MapReduce framework; it makes it easy for users to write MapReduce programs without worrying about low-level task/data management

4 MapReduce
 • The application writer provides Map and Reduce functions that operate on key-value pairs
 • Each Map function operates on a collection of records; a record is (say) a webpage or a Facebook user profile
 • The records live in the file system and are scattered across several servers; thousands of Map functions are spawned to work on all records in parallel
 • The Reduce function aggregates and sorts the results produced by the Mappers; the Reduces also run in parallel
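To make the programming model concrete, here is a minimal single-machine word-count sketch in Python (not from the textbook or the slides); map_fn, reduce_fn, and the toy run_mapreduce driver are hypothetical stand-ins for what a framework like Hadoop would distribute across thousands of servers.

```python
from collections import defaultdict

# Map: one record (e.g., a document) -> list of (key, value) pairs
def map_fn(doc_id, text):
    return [(word, 1) for word in text.split()]

# Reduce: one key plus all of its values -> an aggregated result
def reduce_fn(word, counts):
    return (word, sum(counts))

def run_mapreduce(records, map_fn, reduce_fn):
    # Shuffle phase: group intermediate values by key
    groups = defaultdict(list)
    for doc_id, text in records.items():
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)
    # Reduce phase: in a real framework, each key group runs in parallel
    return [reduce_fn(key, values) for key, values in sorted(groups.items())]

docs = {"d1": "the cat sat", "d2": "the dog sat"}
print(run_mapreduce(docs, map_fn, reduce_fn))
# [('cat', 1), ('dog', 1), ('sat', 2), ('the', 2)]
```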

5 MR Framework Duties
 • Replicate data for fault tolerance
 • Detect failed threads and re-start threads
 • Handle variability in thread response times
 • Use of MR within Google has been growing every year: from Aug'04 to Sep'09,
    - the number of MR jobs increased 100x+
    - the data being processed increased 100x+
    - the number of servers per job increased 3x
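A hedged sketch of one of these duties, handling slow threads ("stragglers") by picking candidates for speculative backup execution; the 1.5x-median threshold and the task dictionary below are illustrative assumptions, not the actual scheduler logic of any particular framework.

```python
import statistics

def pick_backup_candidates(running_times, slowdown_factor=1.5):
    """Return ids of tasks whose elapsed time exceeds slowdown_factor x the median,
    so the framework can speculatively launch backup copies on other servers."""
    median = statistics.median(running_times.values())
    return [task for task, elapsed in running_times.items()
            if elapsed > slowdown_factor * median]

# Example: task t7 is a straggler relative to its peers
elapsed = {"t1": 40, "t2": 42, "t3": 41, "t7": 95}
print(pick_backup_candidates(elapsed))   # ['t7']
```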

6 WSC Hierarchy
 • A rack can hold 48 1U servers (1U is 1.75 inches and is the standard unit of rack height)
 • A rack switch is used for communication within and out of a rack; an array switch connects an array of racks
 • Latency grows if data is fetched from remote DRAM or disk (300 µs vs. 0.1 µs for DRAM, 12 ms vs. 10 ms for disk)
 • Bandwidth within a rack is much higher than between arrays; hence, software must be aware of data placement and locality
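A rough back-of-the-envelope, in Python, of why locality matters, using the latencies quoted above; the 64 KB block size and the bandwidth figures are illustrative assumptions, not measurements.

```python
def access_time_us(latency_us, size_bytes, bandwidth_mb_per_s):
    """Latency plus transfer time for one block, in microseconds."""
    return latency_us + (size_bytes / (bandwidth_mb_per_s * 1e6)) * 1e6

BLOCK = 64 * 1024  # 64 KB block: an assumed transfer size

# Latencies from the slide; bandwidths are illustrative assumptions
print("local DRAM :", access_time_us(0.1, BLOCK, 20000), "us")
print("remote DRAM:", access_time_us(300, BLOCK, 100), "us")
print("local disk :", access_time_us(10000, BLOCK, 200), "us")
print("remote disk:", access_time_us(12000, BLOCK, 100), "us")
```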

7 Power Delivery and Efficiency
 Figure 6.9: Power distribution and where losses occur. Note that the best improvement is 11%. (From Hamilton [2010].)
 Source: H&P textbook. Copyright © 2011, Elsevier Inc. All rights reserved.

8 PUE Metric and Power Breakdown
 • PUE = total facility power / IT equipment power
 • PUE is always greater than 1; measured values range from 1.33 to 3.03, with a median of 1.69
 • Cooling power is roughly half the power used by the servers
 • Within a server (circa 2007), power breaks down as: processors 33%, DRAM memory 30%, disks 10%, networking 5%, miscellaneous 22%
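A quick numeric check of the definition; the wattages below are made up for illustration and happen to land near the median PUE quoted above.

```python
def pue(total_facility_kw, it_equipment_kw):
    # PUE = total facility power / IT equipment power (always > 1)
    return total_facility_kw / it_equipment_kw

servers_kw = 8000   # assumed IT load
cooling_kw = 4000   # "roughly half the power used by servers"
losses_kw  = 1500   # assumed power-distribution losses, lighting, etc.

print(round(pue(servers_kw + cooling_kw + losses_kw, servers_kw), 2))  # 1.69
```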

9 CapEx and OpEx
 • Capital expenditure (CapEx): infrastructure costs for the building, power delivery, cooling, and servers
 • Operational expenditure (OpEx): the monthly bill for energy, failures, personnel, etc.
 • CapEx can be amortized into a monthly estimate by assuming that the facility will last 10 years, server parts will last 3 years, and networking parts will last 4 years

10 CapEx/OpEx Case Study
 • 8 MW facility: facility cost $88M, server/networking cost $79M
 • Monthly expense: $3.8M. Breakdown:
    - Servers: 53% (amortized CapEx)
    - Networking: 8% (amortized CapEx)
    - Power/cooling infrastructure: 20% (amortized CapEx)
    - Other infrastructure: 4% (amortized CapEx)
    - Monthly power bill: 13% (true OpEx)
    - Monthly personnel salaries: 2% (true OpEx)
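A back-of-the-envelope version of this amortization, ignoring the cost of capital that the textbook folds in (so the totals only roughly match the breakdown above); the split of the $79M into servers vs. networking is an assumption.

```python
def monthly_capex(cost_millions, lifetime_years):
    # Straight-line amortization into a monthly charge, in $M/month
    return cost_millions / (lifetime_years * 12)

facility   = monthly_capex(88, 10)   # building, power, cooling: 10-year life
servers    = monthly_capex(67, 3)    # assumed share of the $79M; 3-year life
networking = monthly_capex(12, 4)    # assumed share of the $79M; 4-year life

print(round(facility, 2), round(servers, 2), round(networking, 2))
# ~0.73 + ~1.86 + ~0.25 = roughly $2.8M/month of amortized CapEx, before OpEx
```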

11 Improving Energy Efficiency
 • An unloaded server still dissipates a large fraction of its peak power
 • Ideally, we want energy-proportional computing, but in reality servers are not energy-proportional
 • We can approach energy proportionality by consolidating work onto a few heavily utilized servers (and turning the rest off)
 • See the figures on the next two slides for the power/utilization profile of a server and the utilization profile of servers in a WSC
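A sketch of why consolidation helps, assuming a simple linear power model in which an idle server draws about half of its peak power; the 500 W peak and the linear shape are assumptions for illustration.

```python
def server_power(utilization, peak_watts=500, idle_fraction=0.5):
    """Assumed linear power model: idle power plus a utilization-proportional part."""
    return peak_watts * (idle_fraction + (1 - idle_fraction) * utilization)

# Same total work: 10 servers at 10% utilization vs. 2 servers at 50%
spread_out   = 10 * server_power(0.10)
consolidated = 2 * server_power(0.50) + 8 * 0   # 8 servers powered off

print(spread_out, consolidated)   # 2750.0 W vs. 750.0 W for the same load
```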

12 Power/Utilization Profile
 Source: H&P textbook. Copyright © 2011, Elsevier Inc. All rights reserved.

13 Server Utilization Profile
 Figure 6.3: Average CPU utilization of more than 5,000 servers during a 6-month period at Google. Servers are rarely completely idle or fully utilized, instead operating most of the time at between 10% and 50% of their maximum utilization. (From Figure 1 in Barroso and Hölzle [2007].) The third column from the right in Figure 6.4 calculates percentages plus or minus 5% to come up with the weightings; thus, 1.2% for the 90% row means that 1.2% of servers were between 85% and 95% utilized.
 Source: H&P textbook. Copyright © 2011, Elsevier Inc. All rights reserved.

14 Other Metrics
 • Performance does matter, especially latency
 • An analysis of the Bing search engine shows that if a 200 ms delay is introduced in the response, the user's next click is delayed by 500 ms; a poor response time thus amplifies the user's non-productivity
 • Reliability (MTTF) and availability (MTTF / (MTTF + MTTR)) are very important, given the large scale
 • Even with a server MTTF of 25 years (amazing!), 50K servers would see about 5 server failures a day; similarly, an annual disk failure rate of 2-10% works out to roughly 1 disk failure every hour
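The arithmetic behind these claims, as a short sketch; the 50K-server fleet size matches the slide, while the 2 disks per server and the 4-hour MTTR are assumptions.

```python
def failures_per_day(num_units, mttf_years):
    # Expected failures per day for a fleet of independent units
    return num_units / (mttf_years * 365)

def availability(mttf_hours, mttr_hours):
    return mttf_hours / (mttf_hours + mttr_hours)

print(failures_per_day(50_000, 25))          # ~5.5 server failures per day

# Assume 2 disks per server and a 10% annual disk failure rate:
disk_failures_per_hour = 50_000 * 2 * 0.10 / (365 * 24)
print(disk_failures_per_hour)                # ~1.1 disk failures per hour

print(availability(mttf_hours=25 * 8760, mttr_hours=4))   # ~0.999982
```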
