MATE-EC2: A Middleware for Processing Data with Amazon Web Services. Tekin Bicer, David Chiu*, and Gagan Agrawal, Department of Computer Science and Engineering.

1 MATE-EC2: A Middleware for Processing Data with Amazon Web Services
Tekin Bicer, David Chiu*, and Gagan Agrawal
Department of Computer Science and Engineering, Ohio State University
* School of Engineering and Computer Science, Washington State University, Vancouver (Presenter)
The 4th Workshop on Many-Task Computing on Grids and Supercomputers: MTAGS’11

2 Outline
T Bicer, D Chiu, G Agrawal. Mate-EC2: A Middleware for Processing Data with AWS (MTAGS 2011, Seattle WA)
Background
–Problem and Goal
System Overview
Experiments
Conclusion

3 The Problem
Today’s applications are increasingly data- and compute-intensive
Many-Task Computing paradigms are becoming pervasive: workflows (WF), Map-Reduce (MR)
E.g., Map-Reducible applications are solving common problems
–Data mining
–Graph processing
–etc.

4 Clouds
Infrastructure-as-a-Service
–Anyone, anywhere can allocate “unlimited” virtualized compute/storage resources
Amazon Web Services:
–Most popular IaaS provider
–Elastic Compute Cloud (EC2)
–Simple Storage Service (S3)

5 Amazon Web Services: EC2
On-Demand Instances (Virtual Machines)
–Types: Extra Large, Large, Small, etc.
For example, Extra Large Instance:
–Cost: $0.68 per allocated-hour
–15GB memory; 1.7TB disk (ephemeral)
–8 Compute Units
–High I/O performance

6 Amazon Web Services: S3
Accessible anywhere. High reliability and availability
Objects are arbitrary data blobs
Objects stored as (key,value) in Buckets
–Up to 5TB of data per object
–Unlimited objects per bucket

7 Amazon Web Services: S3 (Cont.)
Simple FTP-like interface using web service protocol
–Put, Get (Partial Get), and Delete
–SOAP and REST
High throughput (~40MB/sec)
–Scales well to multiple clients
Low costs
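
In the REST interface, a Partial Get is typically expressed as an ordinary GET carrying an HTTP `Range` header. The sketch below only builds that header for a byte range; the function name `range_header` is hypothetical and no request is actually sent.

```python
def range_header(offset: int, length: int) -> dict:
    """Build the HTTP Range header for bytes [offset, offset + length)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends, hence the "- 1".
    return {"Range": f"bytes={offset}-{offset + length - 1}"}
```

For example, `range_header(0, 1024)` asks for the first 1024 bytes of an object.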

8 Amazon Web Services: S3 (Cont.)
449 billion objects in S3 as of July 2011
–Doubling each year

9 Motivation and Focus
Virtualization is characteristic of any cloud environment: Clouds are black boxes
Storage and elastic compute services exhibit performance variabilities we should leverage

10 Goals
As users increasingly move to cloud-based solutions for computing, we need services and tools that can:
–Get the most out of cloud resources for data-intensive processing
–Provide a simple programming interface

11 Outline
Background
System Overview
Experiments
Conclusion

12 MATE-EC2 System Design
Cloud middleware that is able to
–Use a set of possibly heterogeneous EC2 instances to scalably and efficiently process data stored in S3
MATE is a generalized reduction PDC structure like Map-Reduce
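
As a rough illustration of the generalized reduction idea, the sketch below assumes the usual description of the pattern: each data element is processed and folded directly into a reduction object, and the per-split reduction objects are merged at the end. It is not MATE's actual API; the 1-D k-means step and all function names are hypothetical.

```python
from functools import reduce

def local_reduction(points, centers):
    """Fold one data split into a reduction object: per-center [sum, count]."""
    robj = [[0.0, 0] for _ in centers]
    for p in points:
        # Process the element and update the reduction object in place;
        # no intermediate (key, value) pairs are materialized.
        nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        robj[nearest][0] += p
        robj[nearest][1] += 1
    return robj

def combine(a, b):
    """Merge two reduction objects element-wise."""
    return [[sa + sb, ca + cb] for (sa, ca), (sb, cb) in zip(a, b)]

def kmeans_step(splits, centers):
    """One k-means iteration over several data splits (assumed 1-D points)."""
    merged = reduce(combine, (local_reduction(s, centers) for s in splits))
    # Keep the old center when no point was assigned to it.
    return [s / c if c else old for (s, c), old in zip(merged, centers)]
```

The reduction object here stays small (one sum and one count per center) regardless of how many points are processed, which is the property the "RObj Size" column in the experiments refers to.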

13 MATE and Map-Reduce
[Figure panels: MATE, Map-Reduce]

14 MATE-EC2 Design

15 MATE-EC2 Design: Data Organization
Objects: Physical representation of the data in S3
Chunks: Logical data partitions within objects (exploits memory utilization)
Units: Fixed data units within a chunk for retrieval (exploits concurrency)
Metadata: chunk offset, chunk size, unit size
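
The object, chunk, and unit hierarchy can be sketched as plain byte-range arithmetic. The record below carries the three metadata fields named on the slide; the class name `ChunkMeta` and both helper functions are hypothetical, and the real metadata format is not shown in the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChunkMeta:
    offset: int      # byte offset of the chunk within its S3 object
    size: int        # chunk size in bytes
    unit_size: int   # size of each retrieval unit in bytes

def chunk_layout(object_size, chunk_size, unit_size):
    """Partition an object of object_size bytes into chunk metadata records."""
    return [ChunkMeta(off, min(chunk_size, object_size - off), unit_size)
            for off in range(0, object_size, chunk_size)]

def unit_ranges(chunk):
    """Return (start, end) byte ranges, end exclusive, for each unit of a chunk."""
    end = chunk.offset + chunk.size
    return [(s, min(s + chunk.unit_size, end))
            for s in range(chunk.offset, end, chunk.unit_size)]
```

The last chunk of an object and the last unit of a chunk may be shorter than the nominal size, which is why both helpers clamp to the end of the enclosing range.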

16 MATE-EC2 Design: Data Retrieval
Threaded chunk retrieval: Chunk retrieved concurrently with a number of threads
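
Threaded chunk retrieval might look roughly like the following: the chunk is split into unit-sized byte ranges, the ranges are fetched concurrently, and the parts are reassembled in order. Here `fetch_range(start, end)` is a hypothetical callable standing in for an S3 partial get; it is an assumption of this sketch, not part of MATE-EC2.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(fetch_range, offset, size, unit_size, threads=16):
    """Retrieve one chunk by fetching its units concurrently, then reassemble."""
    end = offset + size
    ranges = [(s, min(s + unit_size, end)) for s in range(offset, end, unit_size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in input order, so units are joined in offset order
        # even if they complete out of order.
        parts = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(parts)
```

With an in-memory stand-in such as `fetch_range = lambda s, e: data[s:e]`, the function returns exactly `data[offset:offset + size]`.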

17 Dynamic Load Balancing
[Figure: Job Pool]

18 Dynamic Load Balancing
[Figure: Job Pool, Schedule for processing]
Simple greedy heuristic: Select a unit belonging to the least connected chunk
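
One possible reading of the greedy heuristic is sketched below: the pool tracks how many workers are currently connected to each chunk and always hands out a job from the chunk with the fewest active connections. The `JobPool` class, its methods, and the tie-breaking rule (first chunk in insertion order) are assumptions made for illustration.

```python
from collections import defaultdict

class JobPool:
    """Pending jobs grouped by chunk, with a count of active connections per chunk."""

    def __init__(self, jobs_by_chunk):
        self.pending = {c: list(jobs) for c, jobs in jobs_by_chunk.items()}
        self.active = defaultdict(int)

    def next_job(self):
        """Greedy choice: take a job from the chunk with the fewest active connections."""
        candidates = [c for c, jobs in self.pending.items() if jobs]
        if not candidates:
            return None
        chunk = min(candidates, key=lambda c: self.active[c])
        self.active[chunk] += 1
        return chunk, self.pending[chunk].pop(0)

    def done(self, chunk):
        """Record that a worker finished reading from the given chunk."""
        self.active[chunk] -= 1
```

Spreading requests this way avoids piling many concurrent readers onto the same stored data while other data sits idle.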

19 MATE-EC2 Processing Flow
(1) Compute node requests a job from Master

20 MATE-EC2 Processing Flow
(2) Chunk retrieved in units

21 MATE-EC2 Processing Flow
(3) Pass to Compute Layer, and process

22 MATE-EC2 Processing Flow
(4) Request another job from Master

23 MATE-EC2 Processing Flow
Slave Instance...
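
The four steps of the processing flow can be sketched as a worker loop. In this simplified version the master is just an in-process queue and the compute nodes are threads; `retrieve` and `process` are hypothetical callables supplied by the caller, so this illustrates the pull-based pattern rather than the middleware's implementation.

```python
import queue
import threading

def run_workers(jobs, retrieve, process, num_workers=4):
    """Workers repeatedly request a job, retrieve its data, and process it."""
    pool = queue.Queue()
    for job in jobs:
        pool.put(job)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                job = pool.get_nowait()      # (1) request a job from the master
            except queue.Empty:
                return                       # no jobs left, worker exits
            data = retrieve(job)             # (2) retrieve the chunk
            out = process(data)              # (3) pass to the compute layer
            with lock:
                results.append(out)
            # (4) loop around and request another job

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each worker pulls a new job only when it finishes the previous one, faster workers naturally take on more jobs, and results arrive in no particular order.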

24 Experiments
Goals
–Finding the most suitable setting for AWS
–Performance of MATE-EC2 on heterogeneous and homogeneous compute environments
–Performance comparison of MATE-EC2 and Map-Reduce

25 Experiments (Cont.)
Setup:
–4 Large EC2 slave instances
–1 Large instance for master instance
–For each application, the dataset is split into 16 data objects on S3
Large Instance:
–4 compute units (each comparable to 1.0-1.2GHz)
–7.5GB (memory)
–850GB (disk, ephemeral)
–High I/O

26 Experiments (Cont.)
App                I/O      Comp      RObj Size   Dataset
KMeans Clustering  Low/Med  Med/High  Small       8.2GB, 10.7 billion points
PCA                Low      High      Large       8.2GB
PageRank           High     Low/Med   Very Large  1GB, 9.6M nodes, 131M edges

27 Effect of Chunk Sizes (KMeans)
[Figure: 1 Data Retrieval Thread]
Performance increase:
–128KB vs. >8M
–2.07x to 2.49x speedup

28 Data Retrieval (KMeans)
[Figure: 16 Data Retrieval Threads] 8M vs. others speedup: 1.13x - 1.30x
[Figure: 128M Chunk Size] One Thread vs. others: 1.37x - 1.90x

29 Job Assignment (KMeans)
Speedup:
–1.01x for 8M
–1.1x to 1.14x for others

30 MATE-EC2 vs. Elastic MapReduce
Chunk Size: 128MB; Data retrieval Threads: 16
[Figure: PageRank] Speedups vs. EMR-combine: 4.08x to 7.54x
[Figure: KMeans] Speedups vs. EMR-combine: 3.54x to 4.58x

31 MATE-EC2 on Heterogeneous Instances
Overheads
–KMeans: 1%
–PCA: 1.1%, 7.4%, 11.7%

32 In Conclusion...
AWS environment is explored for data-intensive computing
–64M and 128M data chunks w/ 16 data retrieval threads seem to be optimal for our middleware
Our data retrieval and processing optimizations significantly improve the performance of our middleware
MATE-EC2 outperforms MR both in scalability and performance

33 Thank You
Questions & Discussion
Acknowledgments:
–We want to thank the MTAGS’11 organizers and the reviewers for their constructive comments
–This work was sponsored by:

34 Some Instance Types
Cluster GPU
–22 GB, 33.5 CU (2 x Intel Xeon X5570, quad-core “Nehalem” arch.)
–2 x NVIDIA Tesla “Fermi” M2050 GPUs
–1690 GB of instance storage
–64-bit platform, I/O Performance: Very High (10 Gigabit Ethernet)
Cluster Compute
–23 GB, 33.5 CU (2 x Intel Xeon X5570, quad-core “Nehalem”)
–1690 GB of instance storage
–64-bit platform, I/O Performance: Very High (10 Gigabit Ethernet)
Small/Large Instance
–1.7/7.5 GB, 1/4 CU (1/2 virtual cores with 1/2 EC2 Compute Units each)
–160/850 GB instance storage
–I/O Performance: Moderate/High (10 Gigabit Ethernet)

35 Bucket/Key
