Presentation is loading. Please wait.

Presentation is loading. Please wait.

Distributed Data Storage and Processing over Commodity Clusters Sector & Sphere Yunhong Gu Univ. of Illinois at of Chicago, Feb. 17, 2009.

Similar presentations


Presentation on theme: "Distributed Data Storage and Processing over Commodity Clusters Sector & Sphere Yunhong Gu Univ. of Illinois at of Chicago, Feb. 17, 2009."— Presentation transcript:

1 Distributed Data Storage and Processing over Commodity Clusters Sector & Sphere Yunhong Gu Univ. of Illinois at of Chicago, Feb. 17, 2009

2 What is Sector/Sphere? Sector: Distributed Storage System Sphere: Run-time middleware that supports simplified distributed data processing. Open source software, GPL, written in C++. Started since 2006, current version 1.18

3 Overview Motivation Sector Sphere Experimental studies Future work

4 Motivation Super-computer model: Expensive, data IO bottleneck Sector/Sphere model: Inexpensive, parallel data IO

5 Motivation Parallel/Distributed Programming with MPI, etc.: Flexible and powerful. BUT complicated, no data locality Sector/Sphere model: Clusters are a unity to the developer, simplified programming interface, data locality support from the storage layer. Limited to certain data parallel applications.

6 Motivation Systems for single data centers: Requires additional effort to locate and move data. Sector/Sphere model: Support wide-area data collection and distribution.

7 Sector: Distributed Storage System Security ServerMaster slaves SSL Client User account Data protection System Security Storage System Mgmt. Processing Scheduling Service provider System access tools App. Programming Interfaces Storage and Processing Data UDT Encryption optional

8 Sector: Distributed Storage System Sector stores files on the native/local file system of each slave node. Sector does not split files into blocks  Pro: simple/robust, suitable for wide area  Con: file size limit Sector uses replications for better reliability and availability The master node maintains the file system metadata. No permanent metadata is needed. Topology aware

9 Sector: Write/Read Write is exclusive Replicas are updated in a chained manner: the client updates one replica, and then this replica updates another, and so on. All replicas are updated upon the completion of a Write operation. Read: different replicas can serve different clients at the same time. Nearest replica to the client is chosen whenever possible.

10 Sector: Tools and API Supported file system operation: ls, stat, mv, cp, mkdir, rm, upload, download  Wild card characters supported System monitoring: sysinfo. C++ API: list, stat, move, copy, mkdir, remove, open, close, read, write, sysinfo.

11 Sphere: Simplified Data Processing Data parallel applications Data is processed at where it resides, or on the nearest possible node (locality) Same user defined functions (UDF) can be applied on all elements (records, blocks, or files) Processing output can be written to Sector files, on the same node or other nodes Generalized Map/Reduce

12 Sphere: Simplified Data Processing InputOutputUDF InputIntermediateUDFOutputUDF Input 1 OutputUDF Input 2

13 Sphere: Simplified Data Processing for each file F in (SDSS datasets) for each image I in F findBrownDwarf(I, …); SphereStream sdss; sdss.init("sdss files"); SphereProcess myproc; myproc->run(sdss,"findBrownDwarf", …); myproc->read(result); findBrownDwarf(char* image, int isize, char* result, int rsize);

14 Sphere: Data Movement Slave -> Slave Local Slave -> Slaves (Shuffle/Hash) Slave -> Client

15 Load Balance & Fault Tolerance The number of data segments is much more than the number of SPEs. When an SPE completes a data segment, a new segment will be assigned to the SPE. If one SPE fails, the data segment assigned to it will be re-assigned to another SPE and be processed again. Detect and remove "fault" nodes.

16 Open Cloud Testbed 4 Racks in Baltimore (JHU), Chicago (StarLight and UIC), and San Diego (Calit2) 10Gb/s inter-site connection on CiscoWave 1Gb/s inter-rack connection Two dual-core AMD CPU, 12GB RAM, 1TB single disk

17 Open Cloud Testbed

18 Example: Sorting a TeraByte Data is split into small files, scattered on all slaves Stage 1: On each slave, an SPE scans local files, sends each record to a bucket file on a remote node according to the key, so that all buckets are sorted. Stage 2: On each destination node, an SPE sort all data inside each bucket.

19 TeraSort 10-byte90-byte Key Value 10-bit Bucket-0 Bucket-1 Bucket Stage 1: Hash based on the first 10 bits Bucket-0 Bucket-1 Bucket-1023 Stage 2: Sort each bucket on local node Binary Record 100 bytes

20 Performance Results: TeraSort Data Size SphereHadoop (3 replicas) Hadoop (1 replica) UIC300GB UIC + StarLight600GB UIC + StarLight + Calit2 900GB UIC + StarLight + Calit2 + JHU 1.2TB Run time: seconds Sector v1.16 vs Hadoop 0.17

21 Performance Results: TeraSort Sorting 1.2TB on 120 nodes Hash vs. Local Sort: 981sec : 545sec Hash  Per rack: 220GB in/out; Per node: 10GB in/out  CPU: 130% MEM: 900MB Local Sort  No network IO  CPU: 80% MEM: 1.4GB Hadoop: CPU 150% MEM 2GB

22 CreditStone Merchant IDTime KeyValue 3-byte merch-000X merch-001X merch-999X Stage 1: Process each record and hash into buckets according to merchant ID merch-000X merch-001X merch-999x Stage 2: Compute fraudulent rate for each merchant Trans ID|Time|Merchant ID|Fraud|Amount | | |0|66.49 Text Record Transform Text Record Fraud

23 Performance Results: CreditStone RacksJHUJHU, SLJHU, SL, Calit2 JHU, SL, Calit2, UIC Number of Nodes Size of Dataset (GB) Size of Dataset (rows)15B29.5B44.5B58.5B Hadoop (min) Sector with Index (min) Sector w/o Index (min) * Courtesy of Jonathan Seidman of Open Data Group.

24 System Monitoring (Testbed)

25 System Monitoring (Sector/Sphere)

26 Future Work High Availability  Multiple master servers Scheduling Optimize data channel Enhance compute model and fault tolerance

27 For More Information Sector/Sphere code & docs: Open Cloud Consortium: NCDM:

28 Inverted Index 1st letter word_x word_y word_y word_z 1word_x Bucket-A Bucket-B Bucket-Z Stage 1: Process each HTML file and hash (word, file_id) pair to buckets Bucket-A Bucket-B Bucket-Z Stage 2: Sort each bucket on local node, merge same word HTML page_1 1word_y 1word_z word_z 1, 5, 10word_z


Download ppt "Distributed Data Storage and Processing over Commodity Clusters Sector & Sphere Yunhong Gu Univ. of Illinois at of Chicago, Feb. 17, 2009."

Similar presentations


Ads by Google