1 Distributed Data Storage and Processing over Commodity Clusters: Sector & Sphere. Yunhong Gu, Univ. of Illinois at Chicago. Presented at Univ. of Chicago, Feb. 17, 2009.

2 What is Sector/Sphere? Sector: a distributed storage system. Sphere: run-time middleware that supports simplified distributed data processing. Open source software (GPL), written in C++. Started in 2006; current version 1.18. http://sector.sf.net

3 Overview: Motivation; Sector; Sphere; Experimental studies; Future work.

4 Motivation Super-computer model: expensive, with a data I/O bottleneck. Sector/Sphere model: inexpensive, with parallel data I/O.

5 Motivation Parallel/distributed programming with MPI, etc.: flexible and powerful, but complicated, with no data locality. Sector/Sphere model: the cluster appears as a single entity to the developer, with a simplified programming interface and data-locality support from the storage layer; limited to certain data-parallel applications.

6 Motivation Systems designed for a single data center: require additional effort to locate and move data. Sector/Sphere model: supports wide-area data collection and distribution.

7 Sector: Distributed Storage System [Architecture diagram: a Security Server (user accounts, data protection, system security), the Master (storage system management, processing scheduling, service provider), and slave nodes (storage and processing of data); the Client uses system access tools and application programming interfaces. SSL connections carry control traffic, data moves over UDT, and encryption is optional.]

8 Sector: Distributed Storage System Sector stores files on the native/local file system of each slave node. Sector does not split files into blocks (pro: simple/robust, suitable for wide area; con: file size is limited). Sector uses replication for better reliability and availability. The master node maintains the file system metadata; no permanent metadata is needed. Topology aware.

9 Sector: Write/Read Write is exclusive. Replicas are updated in a chained manner: the client updates one replica, then this replica updates the next, and so on; all replicas are updated upon completion of a Write operation, as sketched below. Read: different replicas can serve different clients at the same time, and the replica nearest to the client is chosen whenever possible.
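
A minimal sketch of the chained update described above. The Replica type and its write() method are assumptions made purely for illustration; the real Sector slaves pass the data to one another over UDT rather than through an in-process call.

#include <cstddef>
#include <string>
#include <vector>

// Hypothetical replica handle; the type and its write() method are
// stand-ins for a slave node holding one copy of the file.
struct Replica {
    bool write(const std::string& data) {
        (void)data;                  // stand-in for writing the data to local disk
        return true;
    }
};

// Chained update: updateChain(chain, 0, data) models the client sending
// the data to the first replica; each replica writes it locally and then
// triggers the update of the next replica in the chain. The Write
// completes only when every replica has been updated.
bool updateChain(std::vector<Replica>& chain, std::size_t i, const std::string& data) {
    if (i == chain.size())
        return true;                 // end of chain: all replicas updated
    if (!chain[i].write(data))
        return false;                // a failed replica aborts the Write
    return updateChain(chain, i + 1, data);
}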

10 Sector: Tools and API Supported file system operations: ls, stat, mv, cp, mkdir, rm, upload, download (wildcard characters supported). System monitoring: sysinfo. C++ API: list, stat, move, copy, mkdir, remove, open, close, read, write, sysinfo.
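
A sketch of what client code built on the C++ API listed above might look like. The class name SectorClient, the method signatures, and the stub bodies are all assumptions made so the example is self-contained; the actual API shipped with Sector may differ.

#include <iostream>
#include <string>
#include <vector>

// Hypothetical client interface exposing the operations named on this
// slide; stub bodies are used so the sketch compiles stand-alone.
class SectorClient {
public:
    int  init(const std::string& master, int port)                 { return 0; }  // connect to the master
    int  mkdir(const std::string& path)                            { return 0; }
    int  upload(const std::string& local, const std::string& remote) { return 0; }
    int  list(const std::string& dir, std::vector<std::string>& files) { return 0; }
    int  remove(const std::string& path)                           { return 0; }
    void close()                                                   {}
};

int main() {
    SectorClient c;
    c.init("master.example.org", 6000);              // address and port are made up for the example
    c.mkdir("/sdss");                                // same operation the 'mkdir' tool exposes
    c.upload("image001.dat", "/sdss/image001.dat");  // copy a local file into the Sector namespace

    std::vector<std::string> files;
    c.list("/sdss", files);                          // equivalent of the 'ls' tool
    for (const std::string& f : files)
        std::cout << f << std::endl;

    c.remove("/sdss/image001.dat");                  // equivalent of the 'rm' tool
    c.close();
    return 0;
}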

11 Sphere: Simplified Data Processing Data-parallel applications. Data is processed where it resides, or on the nearest possible node (locality). The same user-defined function (UDF) can be applied to all elements (records, blocks, or files). Processing output can be written to Sector files, on the same node or on other nodes. A generalized Map/Reduce.

12 Sphere: Simplified Data Processing [Dataflow diagrams: Input -> UDF -> Output; Input -> UDF -> Intermediate -> UDF -> Output; Input 1 + Input 2 -> UDF -> Output.]

13 Sphere: Simplified Data Processing

// Serial version:
for each file F in (SDSS datasets)
    for each image I in F
        findBrownDwarf(I, …);

// Sphere version:
SphereStream sdss;
sdss.init("sdss files");
SphereProcess myproc;
myproc.run(sdss, "findBrownDwarf", …);
myproc.read(result);

// User-defined function:
findBrownDwarf(char* image, int isize, char* result, int rsize);
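
As a follow-up, a sketch of what a UDF body with the signature above might contain. The image-analysis logic is a placeholder, and the convention that the UDF returns the number of bytes written into the result buffer is an assumption, not taken from the Sphere documentation.

#include <cstdio>

// Placeholder analysis: in the real application this would run the
// actual brown-dwarf detection on the image pixels.
static bool looksLikeBrownDwarf(const char* image, int isize) {
    return isize > 0 && static_cast<unsigned char>(image[0]) > 200;   // dummy criterion
}

// UDF with the signature shown on the slide: Sphere passes in one image
// (one element of the input stream) and a caller-provided result buffer.
int findBrownDwarf(char* image, int isize, char* result, int rsize) {
    if (looksLikeBrownDwarf(image, isize))
        return std::snprintf(result, rsize, "candidate found, size=%d", isize);
    result[0] = '\0';
    return 0;     // bytes written to the result buffer (assumed convention)
}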

14 Sphere: Data Movement Slave -> Slave (local); Slave -> Slaves (shuffle/hash); Slave -> Client.

15 Load Balance & Fault Tolerance The number of data segments is much larger than the number of SPEs (Sphere Processing Engines). When an SPE completes a data segment, a new segment is assigned to it. If an SPE fails, the data segment assigned to it is re-assigned to another SPE and processed again. Faulty nodes are detected and removed (see the sketch below).
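
A sketch of the scheduling and fault-tolerance policy just described, using made-up Segment and SPE bookkeeping types: a finished SPE immediately receives the next pending segment, and a failed SPE's segment goes back on the queue so another SPE processes it again.

#include <deque>
#include <map>

struct Segment { int id; };                        // one unit of input data (illustrative)
enum class SpeStatus { Idle, Busy, Failed };

// Simplified scheduler pass: there are many more segments than SPEs, so
// idle SPEs keep receiving new segments, while a segment held by a failed
// SPE is re-queued and the faulty node is dropped from the assignment map.
void schedule(std::deque<Segment>& pending,
              std::map<int, SpeStatus>& spes,
              std::map<int, Segment>& assigned) {
    for (auto& [speId, status] : spes) {
        if (status == SpeStatus::Idle && !pending.empty()) {
            assigned[speId] = pending.front();     // hand out the next segment
            pending.pop_front();
            status = SpeStatus::Busy;
        } else if (status == SpeStatus::Failed) {
            auto it = assigned.find(speId);
            if (it != assigned.end()) {
                pending.push_back(it->second);     // re-queue the lost segment
                assigned.erase(it);                // stop tracking the faulty node
            }
        }
    }
}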

16 Open Cloud Testbed 4 racks in Baltimore (JHU), Chicago (StarLight and UIC), and San Diego (Calit2). 10 Gb/s inter-site connections on CiscoWave; 1 Gb/s inter-rack connections. Each node: two dual-core AMD CPUs, 12GB RAM, a single 1TB disk.

17 Open Cloud Testbed

18 Example: Sorting a Terabyte Data is split into small files scattered across all slaves. Stage 1: on each slave, an SPE scans the local files and sends each record to a bucket file on a remote node according to its key, so that the buckets themselves are ordered by key. Stage 2: on each destination node, an SPE sorts the data inside each bucket.

19 TeraSort [Diagram: each binary record is 100 bytes, a 10-byte key followed by a 90-byte value. Stage 1: hash each record into one of 1024 bucket files (Bucket-0 … Bucket-1023) based on the first 10 bits of the key. Stage 2: sort each bucket on the local node.]
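
The Stage 1 hashing can be written down concretely. A small sketch, assuming the record layout in the diagram (10-byte key, 90-byte value): the bucket index is simply the first 10 bits of the key, so the 1024 buckets are themselves ordered by key and only a local sort is needed in Stage 2.

#include <cstdint>

// One 100-byte TeraSort record: 10-byte key followed by a 90-byte value.
struct Record {
    unsigned char key[10];
    unsigned char value[90];
};

// Stage 1: the bucket index is the first 10 bits of the key, so every key
// in bucket i is smaller than every key in bucket i+1. After Stage 2 sorts
// each bucket locally, concatenating buckets 0..1023 yields the sorted output.
inline uint32_t bucketOf(const Record& r) {
    return (static_cast<uint32_t>(r.key[0]) << 2) | (r.key[1] >> 6);   // value in [0, 1023]
}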

20 Performance Results: TeraSort (run time in seconds; Sector v1.16 vs. Hadoop 0.17)

Racks                            Data Size   Sphere   Hadoop (3 replicas)   Hadoop (1 replica)
UIC                              300GB       1265     2889                  2252
UIC + StarLight                  600GB       1361     2896                  2617
UIC + StarLight + Calit2         900GB       1430     4341                  3069
UIC + StarLight + Calit2 + JHU   1.2TB       1526     6675                  3702

21 Performance Results: TeraSort Sorting 1.2TB on 120 nodes. Hash stage vs. local sort stage: 981 sec vs. 545 sec. Hash stage: per rack 220GB in/out, per node 10GB in/out; CPU 130%, memory 900MB. Local sort stage: no network I/O; CPU 80%, memory 1.4GB. Hadoop, for comparison: CPU 150%, memory 2GB.

22 CreditStone [Diagram: each text record has the form Trans ID|Time|Merchant ID|Fraud|Amount, e.g. 01491200300|2007-09-27|2451330|0|66.49, keyed by merchant ID and time. Stage 1: process each record and hash it into one of 1000 buckets (merch-000 … merch-999) according to the merchant ID. Stage 2: compute the fraudulent-transaction rate for each merchant.]
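
A sketch of the Stage 1 record handling, assuming the pipe-delimited record layout shown above. The bucketing rule (merchant ID modulo 1000) is an assumption chosen to match the merch-000 … merch-999 buckets in the diagram.

#include <sstream>
#include <string>
#include <vector>

// Parsed form of one "Trans ID|Time|Merchant ID|Fraud|Amount" record.
struct Transaction {
    std::string transId;
    std::string time;
    long        merchantId = 0;
    int         fraud = 0;              // 1 if the transaction is marked fraudulent
    double      amount = 0.0;
};

// Split one pipe-delimited text record into its fields.
Transaction parseRecord(const std::string& line) {
    std::istringstream in(line);
    std::string field;
    std::vector<std::string> f;
    while (std::getline(in, field, '|'))
        f.push_back(field);
    Transaction t;
    t.transId    = f.at(0);
    t.time       = f.at(1);
    t.merchantId = std::stol(f.at(2));
    t.fraud      = std::stoi(f.at(3));
    t.amount     = std::stod(f.at(4));
    return t;
}

// Stage 1 bucketing rule (assumed): 1000 buckets keyed by merchant ID, so
// every transaction of a given merchant lands in the same bucket and Stage 2
// can compute that merchant's fraud rate locally.
int bucketOf(const Transaction& t) {
    return static_cast<int>(t.merchantId % 1000);
}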

23 Performance Results: CreditStone

Racks                     JHU    JHU, SL   JHU, SL, Calit2   JHU, SL, Calit2, UIC
Number of Nodes           30     59        89                117
Size of Dataset (GB)      840    1652      2492              3276
Size of Dataset (rows)    15B    29.5B     44.5B             58.5B
Hadoop (min)              179    180       191               189
Sector with Index (min)   46     47        64                71
Sector w/o Index (min)    36     37        53                55

* Courtesy of Jonathan Seidman of Open Data Group.

24 System Monitoring (Testbed)

25 System Monitoring (Sector/Sphere)

26 Future Work High availability (multiple master servers); scheduling; optimized data channel; enhanced compute model and fault tolerance.

27 For More Information Sector/Sphere code & docs: http://sector.sf.net Open Cloud Consortium: http://www.opencloudconsortium.org NCDM: http://www.ncdm.uic.edu

28 Inverted Index [Diagram: from each HTML page (e.g. page_1 containing word_x, word_y, word_z), Stage 1 emits (word, page_id) pairs and hashes each pair into a bucket (Bucket-A … Bucket-Z) by the word's first letter. Stage 2 sorts each bucket on the local node and merges entries for the same word, e.g. word_z appearing in pages 1, 5, and 10 becomes word_z -> 1, 5, 10.]
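
A sketch of the two stages with simplified in-memory types: Stage 1 routes each (word, page_id) pair to one of 26 buckets keyed by the word's first letter; Stage 2 sorts a bucket and merges the page lists of identical words, as in the word_z -> 1, 5, 10 example.

#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Posting = std::pair<std::string, int>;        // (word, page_id)

// Stage 1: route a (word, page_id) pair to a bucket named by the word's
// first letter (Bucket-A ... Bucket-Z); assumes words start with a letter.
int bucketOf(const std::string& word) {
    return std::toupper(static_cast<unsigned char>(word[0])) - 'A';
}

// Stage 2: sort one bucket and merge the page ids of identical words,
// producing word -> sorted list of pages, e.g. "word_z" -> {1, 5, 10}.
std::map<std::string, std::vector<int>> mergeBucket(std::vector<Posting> bucket) {
    std::sort(bucket.begin(), bucket.end());        // sorts by word, then page_id
    std::map<std::string, std::vector<int>> index;
    for (const Posting& p : bucket)
        index[p.first].push_back(p.second);
    return index;
}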

