System Software Considerations for Cloud Computing on Big Data


1 System Software Considerations for Cloud Computing on Big Data
March 17, 2011. Michael Kozuch, Intel Labs Pittsburgh.

2 Outline
Background: Open Cirrus
Cluster software stack
Big Data
Power
Recent news

3 Open Cirrus

4 Open Cirrus* Cloud Computing Testbed
Collaboration between industry and academia, sharing hardware infrastructure, software infrastructure, research applications, and data sets.
Sites: ISPRAS*, UIUC*, KIT*, ETRI*, China Telecom*, CESGA*, CMU*, GaTech*, China Mobile*, IDA*, MIMOS*.
Sponsored by HP, Intel, and Yahoo! (with additional support from NSF).
14 sites currently, with a target of around 20 in the next two years.

5 Open Cirrus* Objectives
Foster systems research around cloud computing.
Enable federation of heterogeneous datacenters.
Vendor-neutral, open-source stacks and APIs for the cloud.
Expose the research community to enterprise-level requirements.
Capture realistic traces of cloud workloads.
Each site runs its own research and technical teams, contributes individual technologies, and operates some of the global services.
Independently managed sites providing a cooperative research testbed.

6 Intel BigData Cluster
(Cluster topology diagram: a 45 Mb/s T3 link to the Internet; per-rack switches of 24-48 Gb/s connecting to nodes over 1 Gb/s links. Key: rXrY = row X rack Y, rXrYcZ = row X rack Y chassis Z.)
Node types:
Mobile rack: 8 (1U) nodes, 2x Xeon E5440 (quad-core) [Harpertown], 16 GB DRAM, 2x 1 TB disk
3U rack: 5 storage nodes, 12x 1 TB disk each
20 nodes: 1x Xeon (single-core) [Irwindale], 6 GB DRAM, 366 GB disk
10 nodes: 2x Xeon 5160 (dual-core) [Woodcrest], 4 GB RAM, 2x 75 GB disk
Blade racks (2): 40 nodes each, 2x Xeon E5345 (quad-core) [Clovertown], 8 GB DRAM, 2x 150 GB disk
1U racks (2): 15 nodes each, 2x Xeon E5420 (quad-core) [Harpertown], 8 GB DRAM, 2x 1 TB disk
12 nodes: 2x Xeon X5650 (six-core) [Westmere-EP], 48 GB DRAM, 6x 0.5 TB disk
2U rack: 15 nodes, 2x Xeon E5440 (quad-core) [Harpertown], 8 GB DRAM, 6x 1 TB disk
2U rack: 15 nodes, 2x Xeon E5520 (quad-core) [Nehalem-EP], 16 GB DRAM, 6x 1 TB disk

7 Cloud Software Stack

8 Cloud Software Stack – Key Learnings
Enable use of application frameworks (Hadoop, Maui-Torque).
Enable general IaaS use.
Provide a Big Data storage service.
Enable physical resource allocation.
Why physical? Virtualization overhead, access to physical resources, and security issues.
(Stack diagram: application frameworks and IaaS sit above a storage service and a resource allocator, which manages the physical nodes.)

9 Zoni Functionality
Provides each project with a mini-datacenter, isolating experiments from one another.
Allocation: assignment of physical resources to users.
Isolation: allows multiple mini-clusters to co-exist without interference.
Provisioning: booting of a specified OS.
Management: out-of-band (OOB) power management.
Debugging: OOB console access.
(Diagram: a gateway fronts Domain 0 and Domain 1, each with its own server pools and PXE/DNS/DHCP services.)
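To make the mini-datacenter idea concrete, a request for a physical allocation might look like the sketch below. This is only illustrative: the request fields and the commented-out ZoniClient calls are assumptions for this sketch, not Zoni's actual API.

```python
# Hypothetical sketch of a physical-allocation request in the spirit of Zoni.
# The field names and the ZoniClient calls below are illustrative assumptions,
# not the real Zoni interface.

request = {
    "project": "big-data-experiments",
    "nodes": 16,                     # physical machines, not VMs
    "isolation": "vlan",             # keep mini-clusters from interfering
    "image": "ubuntu-hadoop",        # OS image to PXE-boot on each node
    "oob": {"power": True, "console": True},  # out-of-band management hooks
}

# A client would submit the request and get back a handle to its mini-datacenter:
# cluster = ZoniClient("zoni.example.org").allocate(request)
# cluster.power_cycle(node="r2r1c3")   # OOB power management while debugging
print(request)
```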

10 Intel BigData Cluster Dashboard

11 Big Data

12 Example Applications
Application | Big Data | Algorithms | Compute Style
Scientific study (e.g., earthquake study) | Ground model | Earthquake simulation, thermal conduction, … | HPC
Internet library search | Historic web snapshots | Data mining | MapReduce
Virtual world analysis | Virtual world database | TBD |
Language translation | Text corpuses, audio archives, … | Speech recognition, machine translation, text-to-speech, … | MapReduce & HPC
Video search | Video data | Object/gesture identification, face recognition, … |
"There has been more video uploaded to YouTube in the last 2 months than if ABC, NBC, and CBS had been airing content 24/7/365 continuously since …" (Gartner)

13 Big Data
Interesting applications are data hungry: more data yields better accuracy, and the challenge is to hold response time constant.
The data grows over time.
The data is immobile: moving 100 TB over a 1 Gb/s link takes roughly 10 days (see the sketch below), so compute comes to the data.
Big Data clusters are the new libraries.
The value of a cluster is its data.
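A quick back-of-the-envelope check of the immobility claim, assuming the repository holds 100 TB (the size implied by the slide's figure):

```python
# Rough transfer time for moving a 100 TB repository over a 1 Gb/s link.
repo_bytes = 100e12      # 100 TB, assumed repository size
link_bps = 1e9           # 1 Gb/s wide-area link
transfer_days = repo_bytes * 8 / link_bps / 86400
print(f"{transfer_days:.1f} days")   # ~9.3 days, i.e. roughly 10 days
```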

14 Example Motivating Application: Online Processing of Archival Video
Research project: develop a context recognition system that is 90% accurate over 90% of your day.
Leverage a combination of low- and high-rate sensing for perception.
Federate many sensors for improved perception.
Big Data: terabytes of archived video from many egocentric cameras.
Example query 1: "Where did I leave my briefcase?" Sequential search through all video streams [Parallel Camera].
Example query 2: "Now that I've found my briefcase, track it." Cross-cutting search among related video streams [Parallel Time].

15 Big Data System Requirements
Provide high-performance execution over Big Data repositories: many spindles and many CPUs, with parallel processing.
Enable multiple services to access a repository concurrently.
Enable low-latency scaling of services.
Enable each service to leverage its own software stack: IaaS, with file-system protections where needed.
Enable slow resource scaling for growth.
Enable rapid resource scaling for power and demand: scaling-aware storage.

16 Storing the Data – Choices
Model 1: Separate compute/storage. Dedicated compute servers and storage servers; compute and storage can scale independently, and there are many opportunities for reliability.
Model 2: Co-located compute/storage. Combined compute/storage servers; no compute resources are under-utilized, and there is potential for higher throughput.

17 Cluster Model
(Diagram: an external network feeds a cluster switch of bandwidth BWswitch, which connects to R racks. Each rack holds N server nodes behind a top-of-rack (TOR) switch; each node has p cores, d disks of bandwidth BWdisk each, and a network link of bandwidth BWnode.)
The cluster switch quickly becomes the bottleneck, so local computation is crucial.
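Stated with the slide's symbols (a sketch of the reasoning, not a claim made on the slide): one rack's disks can source roughly N * d * BWdisk in aggregate, so all R racks together can source R * N * d * BWdisk, while any traffic that must cross racks is limited to BWswitch. Once R * N * d * BWdisk greatly exceeds BWswitch, work that cannot run where its data lives is throttled by the cluster switch.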

18 I/O Throughput Analysis
Assuming 20 racks of 20 two-disk servers and BWswitch = 10 Gb/s (see the sketch below):
Disk BW = 80 MB/s, so per rack N*d*BW ≈ 25 Gb/s.
SSD BW = 250 MB/s, so per rack N*d*BW ≈ 80 Gb/s.
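A minimal sketch that reproduces the per-rack figures above; the 80 Gb/s SSD number is computed the same way as the ~25 Gb/s disk number:

```python
# Per-rack aggregate storage bandwidth vs. the cluster switch, using the
# parameters given on the slide.
N, d = 20, 2                 # 20 servers per rack, 2 drives per server
bw_switch_gbps = 10          # cluster switch bandwidth from the slide

def rack_gbps(drive_mb_per_s):
    """Aggregate drive bandwidth of one rack, converted from MB/s to Gb/s."""
    return N * d * drive_mb_per_s * 8 / 1000

for label, bw in [("disk (80 MB/s)", 80), ("SSD (250 MB/s)", 250)]:
    print(f"{label}: {rack_gbps(bw):.0f} Gb/s vs. switch {bw_switch_gbps} Gb/s")
# ~26 Gb/s for disks (the slide rounds to 25) and 80 Gb/s for SSDs:
# either way, a single rack's drives can outrun the cluster switch.
```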

19 Data Location Information
Issues: many different file system possibilities (HDFS, PVFS, Lustre, etc.), many different application framework possibilities, and consumers that may be virtualized.
Solution: a standard cluster-wide Data Location Service, plus a Resource Telemetry Service to evaluate scheduling choices. This enables virtualized location information and file-system agnosticism.
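A minimal sketch of what a file-system-agnostic location query could look like; the class and method names below are assumptions for illustration, not the actual service interface.

```python
# Illustrative sketch of a cluster-wide Data Location Service query.
# DataLocationService and its methods are assumed names, not a real API.

class DataLocationService:
    def __init__(self, block_map):
        # block_map: {(path, block_index): [names of nodes holding a replica]}
        self.block_map = block_map

    def locate(self, path, block_index):
        """Return the nodes holding a replica, regardless of the underlying
        file system (HDFS, PVFS, Lustre, ...) or whether the caller is a VM."""
        return self.block_map.get((path, block_index), [])

dls = DataLocationService({("/data/video/part-0001", 0): ["r1r3n07", "r2r1n02"]})
print(dls.locate("/data/video/part-0001", 0))  # schedule compute on one of these
```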

20 Exposing Location Information
(Diagram: location-aware (LA) applications and runtimes query the Data Location Service and the Resource Telemetry Service. Panel (a) shows the non-virtualized case, with the LA runtime and DFS running on the host OS; panel (b) shows the virtualized case, with LA applications in virtual machines above a guest OS, VM runtime, and VMM, and the DFS beneath them.)

21 Power

22 Power Proportionality
(Chart: system efficiency versus demand scaling / power proportionality.)
Reference: "A Taxonomy and Survey of Energy-Efficient Data Centers and Cloud Computing Systems," Anton Beloglazov, Rajkumar Buyya, Young Choon Lee, and Albert Zomaya.

23 Power Proportionality and Big Data
(Chart: number of blocks stored on node i versus node number i, for the Hadoop Filesystem with 10K blocks across 100 nodes; annotations mark possible power savings of ~66% and ~0%, versus an optimal ~95%.)

24 Rabbit Filesystem
A reliable, power-proportional filesystem for Big Data workloads.
Simple strategy: maintain a "primary replica" (see the sketch below).
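To illustrate the "primary replica" idea, here is a minimal placement sketch under assumed parameters: keep one copy of every block on a small primary set of nodes so the remaining nodes can be powered down without losing availability. This is only a sketch of the strategy named on the slide, not Rabbit's actual placement code.

```python
import random

# Sketch: place one replica of every block on a small "primary" node set and
# the remaining replicas anywhere else. With this layout, all nodes outside
# the primary set can be powered off and every block stays readable.
# The node count, primary-set size, and replica count are assumptions.

nodes = [f"node{i:03d}" for i in range(100)]
primary = nodes[:10]                      # small, always-on subset
blocks, replicas = 10_000, 3

layout = {}
for b in range(blocks):
    first = random.choice(primary)        # the primary replica
    rest = random.sample([n for n in nodes if n != first], replicas - 1)
    layout[b] = [first] + rest

powered_off = set(nodes) - set(primary)
available = all(any(n not in powered_off for n in reps) for reps in layout.values())
print(available)   # True: every block still has a live replica
```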

25 Recent News

26 Recent News
"Intel Labs to Invest $100 Million in U.S. University Research"
Over five years.
Intel Science and Technology Centers (ISTCs): 3+2 year sponsored research.
Half a dozen or more by 2012.
Each can have a small number of Intel research staff on site.
A new ISTC focusing on cloud computing is possible.

27 Tentative Research Agenda Framing

28 Potential Questions

29 Potential Research Questions
Software stack:
Is physical allocation an interesting paradigm for the public cloud?
What are the right interfaces between the layers?
Can multi-variable optimization work across layers?
Big Data:
Can a hybrid cloud-HPC file system provide the best of both worlds?
How should the file system deal with heterogeneity?
What are the right file-system sharing models for the cloud?
Can physical resources be taken from the FS and given back?

30 Potential Research Questions
Power:
Can storage-service power be reduced without reducing availability?
How should a power-proportional FS maintain a good data layout?
Federation:
Which applications can cope with limited bandwidth between sites?
What are the optimal ways to join data across clusters?
How necessary is federation?
Overall: how should compute, storage, and power be managed to optimize for performance, energy, and fault tolerance?

31 Backup

32 Scaling– Power Proportionality
Demand scaling presents a performance/power trade-off.
Our servers: 250 W loaded, 150 W idle, 10 W off, 200 s setup time (see the break-even sketch below).
Research is underway for scaling cloud applications: control theory, load prediction, autoscaling.
Scaling beyond a single tier is less well understood.
(Diagram: a cloud-based app receiving requests at rate λ.)
Note: the proportionality issue is orthogonal to the FAWN design.
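Using the server figures on this slide, a rough break-even calculation shows when it pays to power a server off rather than leave it idle. This is a sketch: it ignores wear and SLA risk, and it assumes the 200 s setup runs at the loaded 250 W.

```python
# Break-even idle-gap length for powering a server off vs. leaving it idle.
# Power figures are from the slide; setup power is assumed to be the loaded 250 W.
P_idle, P_off, P_loaded, setup_s = 150, 10, 250, 200

# Energy left idle for a gap of T seconds:   P_idle * T
# Energy powered off, then restarted:        P_off * (T - setup_s) + P_loaded * setup_s
# Break-even T solves P_idle*T = P_off*(T - setup_s) + P_loaded*setup_s
T_breakeven = (P_loaded - P_off) * setup_s / (P_idle - P_off)
print(f"{T_breakeven:.0f} s")   # ~343 s: gaps longer than ~6 minutes favor powering off
```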

33 Scaling– Power Proportionality
Project 1: Multi-tier power management (e.g., Facebook).
Project 2: Multi-variable optimization.
Project 3: Collective optimization; Open Cirrus may have a key role.
(Stack diagram: requests at rate λ enter an IaaS layer (e.g., Tashi) running over a distributed file system (e.g., Rabbit), a resource allocator (e.g., Zoni), and the physical resources.)

