Column-Oriented Storage Techniques for MapReduce

Most slides and paper by:
- Avrilia Floratou (University of Wisconsin – Madison)
- Jignesh M. Patel (University of Wisconsin – Madison)
- Eugene J. Shekita (while at IBM Almaden Research Center)
- Sandeep Tata (IBM Almaden Research Center)

Ohio State CSE 788.11, WI 2012
Presentation by: David Fuhry and Karthik Tunga

Talk Outline
- Motivation
- Merits/Limits of Row- & Column-store with MapReduce (recap)
- Lazy Tuple Construction
- Compression
- Experimental Evaluation
- Conclusion

Presenters: Dave, Karthik

Motivation
- MapReduce is increasingly used for Big Data analysis: scalability, ease of use, fault tolerance, price.
- But MapReduce implementations lack some advantages often seen in parallel DBMSs: efficiency and performance, SQL, indexing, updating, transactions.
- This work (CIF) and RCFile both address efficiency and performance with column-oriented storage.

Motivation
[Diagram labels: Databases, MapReduce, Column-Oriented Storage, Performance, Programmability, Fault tolerance]

Row-Store: Merits/Limits with MapReduce

Table:
    A    B    C    D
    101  201  301  401
    102  202  302  402
    103  203  303  403
    104  204  304  404
    105  205  305  405

HDFS blocks: Store Block 1, Store Block 2, Store Block 3, …

- Data loading is fast (no additional processing).
- All columns of a data row are located in the same HDFS block.
- Queries that need only some columns still read all of them, which wastes storage bandwidth.
- Columns of different types are mixed in each block, so compression may add overhead.

Column-Store: Merits/Limits with MapReduce

HDFS blocks: Store Block 1, Store Block 2, Store Block 3, …

    Column group 1    Column group 2    Column group 3
    A                 B                 C    D
    101               201               301  401
    102               202               302  402
    103               203               303  403
    104               204               304  404
    105               205               305  405
    …                 …                 …    …

- Unnecessary I/O costs can be avoided: only the needed columns are loaded, and compression is easy.
- Column grouping may require additional network transfers.

Challenges
- How to incorporate columnar storage into an existing MR system (Hadoop) without changing its core parts?
- How can columnar storage operate efficiently on top of a DFS (HDFS)?
- Is it easy to apply well-studied techniques from the database field to the MapReduce framework, given that:
  - It processes one tuple at a time.
  - It does not use a restricted set of operators.
  - It is used to process complex data types.

Column-Oriented Storage in Hadoop

Table:
    Name   Age  Info
    Joe    23   "hobbies": {tennis}, "friends": {Ann, Nick}
    David  32   "friends": {George}
    John   45   "hobbies": {tennis, golf}
    Smith  65   "hobbies": {swimming}, "friends": {Helen}

The rows are split across nodes, and each node stores its rows column by column:

    1st node:  Name: Joe, David
               Age:  23, 32
               Info: {"hobbies": {tennis}, "friends": {Ann, Nick}}, {"friends": {George}}

    2nd node:  Name: John, Smith
               Age:  45, 65
               Info: {"hobbies": {tennis, golf}}, {"hobbies": {swimming}, "friends": {Helen}}

- Eliminate unnecessary I/O.
- Introduce a new InputFormat: ColumnInputFormat (CIF).

Replication and Co-location

[Diagram: nodes A to D holding replicas of the Name, Age, and Info column files for the rows Joe and David, shown first under the HDFS replication policy and then under CPP.]

- Introduce a new column placement policy (CPP).
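Note: the co-location idea can be simulated in a few lines. This is only a sketch under stated assumptions: the class ColocationSketch, the path layout "/data/split0/name", and the hash-based node choice are all hypothetical, and the code does not use Hadoop's block placement API. It only shows the property that matters: every column file under the same split directory is assigned to the same set of nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of co-locating column files; does not use Hadoop's placement API.
class ColocationSketch {
    // Choose `replicas` nodes from the split directory name only, so every column
    // file under the same split directory is assigned to the same nodes.
    // Assumes the path contains '/' and replicas <= nodes.size().
    static List<String> nodesFor(String columnFilePath, List<String> nodes, int replicas) {
        String splitDir = columnFilePath.substring(0, columnFilePath.lastIndexOf('/'));
        int first = Math.floorMod(splitDir.hashCode(), nodes.size());
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            chosen.add(nodes.get((first + i) % nodes.size()));
        }
        return chosen;
    }
}
```

Calling nodesFor for the name and age files of the same split returns the same node list, so a map task scheduled on any of those nodes can read all columns of its split locally.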

Example

    Age  Name
    23   Joe
    32   David
    45   John
    30   Mary
    50   Ann

Map method: if (age < 35) return name
Records shown: (23, Joe), (32, David)

What if age > 35? Can we avoid reading and deserializing the name field?

Outline: Column-Oriented Storage; Lazy Tuple Construction; Compression; Experiments; Conclusions

Lazy Tuple Construction

Deserialization of each record field is deferred until the field is actually accessed, i.e. when its get() method is called.

With an eagerly deserialized Record:

    Mapper(NullWritable key, Record value) {
        String name;
        int age = value.get("age");
        if (age < 35)
            name = value.get("name");
    }

With a LazyRecord (only the type of value changes; the map logic is identical):

    Mapper(NullWritable key, LazyRecord value) {
        String name;
        int age = value.get("age");      // deserializes only the age field
        if (age < 35)
            name = value.get("name");    // deserializes name only when it is needed
    }
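Note: a minimal, self-contained sketch of the deferred-deserialization idea. LazyRecordSketch is a hypothetical class written for illustration, not the LazyRecord from CIF: it keeps each field as raw bytes, decodes a field on the first get(), and counts how many decodes actually happen. Storing every field as a UTF-8 string is an assumption made to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of lazy deserialization; not the CIF LazyRecord class.
class LazyRecordSketch {
    private final Map<String, byte[]> raw = new HashMap<>();     // serialized field bytes
    private final Map<String, String> decoded = new HashMap<>(); // fields decoded so far
    int deserializations = 0;                                    // decode operations performed

    void putRaw(String field, String value) {
        raw.put(field, value.getBytes(StandardCharsets.UTF_8));
    }

    // Decode a field only when it is first requested, then cache the result.
    String get(String field) {
        String cached = decoded.get(field);
        if (cached != null) return cached;
        byte[] bytes = raw.get(field);
        if (bytes == null) return null;
        deserializations++;
        String value = new String(bytes, StandardCharsets.UTF_8);
        decoded.put(field, value);
        return value;
    }

    // Mirrors the slide's map logic: the name is touched only when age < 35.
    static String map(LazyRecordSketch record) {
        int age = Integer.parseInt(record.get("age"));
        return age < 35 ? record.get("name") : null;
    }
}
```

For a record with age 23 the map logic decodes two fields; for a record with age 45 it decodes only age and never touches the bytes of name.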

Skip List (Logical Behavior)

    Every record:      R1  R2  …  R10  …  R20  …  R90  …  R99  R100  …
    Skip 10 records:   R1  R10  R20  …  R90  R100  …
    Skip 100 records:  R1  R100  …

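Example (skip lists in the Name column)

Query: if (age < 35) return name

[Figure: the Age column (23, 39, 45, 30, …) beside the Name column (Joe, Jane, David, …, Mary, Ann, …, John; row numbers 0, 1, 2, …, 102). The Name column embeds skip-byte entries: Skip10 = 1002, Skip100 = 9017, and a later Skip10 = 868. A Skip10 entry spans 10 rows and a Skip100 entry spans 100 rows.]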

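Example (skip lists in the Info column)

Query: if (age < 35) return hobbies

[Figure: the Age column (23, 39, 45, 30, …) beside the Info column ("hobbies": tennis, "friends": Ann, Nick; Null; "friends": George; …; "hobbies": tennis, golf; …). The Info column embeds skip-byte entries: Skip10 = 2013, Skip100 = 19400, and a later Skip10 = 1246. A Skip10 entry spans 10 rows and a Skip100 entry spans 100 rows.]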
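Note: a minimal sketch of how embedded skip-byte entries let a reader jump ahead without decoding the values in between. The byte layout here is an assumption made for illustration, not CIF's actual file format: a 4-byte Skip100 entry before every 100th row, a 4-byte Skip10 entry before every 10th row, and each value stored as a 4-byte length followed by UTF-8 bytes. The class name SkipColumnSketch and the counter valuesScanned are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical column file with embedded skip pointers; not CIF's on-disk format.
class SkipColumnSketch {
    final ByteBuffer buf;
    int row = 0;            // row whose header the buffer position currently points at
    int valuesScanned = 0;  // rows stepped over one at a time (no skip entry usable)

    SkipColumnSketch(String[] values) {
        int n = values.length;
        byte[][] enc = new byte[n][];
        int[] pos = new int[n + 1];  // pos[i] = byte offset where row i's header starts
        for (int i = 0; i < n; i++) {
            enc[i] = values[i].getBytes(StandardCharsets.UTF_8);
            pos[i + 1] = pos[i] + headerSize(i) + 4 + enc[i].length;
        }
        buf = ByteBuffer.allocate(pos[n]);
        for (int i = 0; i < n; i++) {
            int bodyStart = pos[i] + headerSize(i);
            if (i % 100 == 0) buf.putInt(pos[Math.min(i + 100, n)] - bodyStart); // Skip100
            if (i % 10 == 0) buf.putInt(pos[Math.min(i + 10, n)] - bodyStart);   // Skip10
            buf.putInt(enc[i].length).put(enc[i]);
        }
        buf.flip();
    }

    static int headerSize(int i) {
        return (i % 100 == 0 ? 4 : 0) + (i % 10 == 0 ? 4 : 0);
    }

    // Advance to row `target` (must be >= the current row) and return its value.
    String read(int target) {
        while (row < target) {
            int skip100 = row % 100 == 0 ? buf.getInt() : -1;
            int skip10 = row % 10 == 0 ? buf.getInt() : -1;
            if (skip100 >= 0 && row + 100 <= target) { jump(skip100); row += 100; }
            else if (skip10 >= 0 && row + 10 <= target) { jump(skip10); row += 10; }
            else { jump(buf.getInt()); row++; valuesScanned++; }
        }
        jump(headerSize(row));               // step over the target row's skip entries
        byte[] out = new byte[buf.getInt()];
        buf.get(out);
        row++;
        return new String(out, StandardCharsets.UTF_8);
    }

    private void jump(int bytes) { buf.position(buf.position() + bytes); }
}
```

With 250 values, read(205) on a fresh reader takes two Skip100 jumps and then steps over only rows 200 to 204, so the values of rows 0 to 199 are never examined.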

Outline: Column-Oriented Storage; Lazy Record Construction; Compression; Experiments; Conclusions

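Compression

Two schemes are shown:

- Compressed Blocks: the column is stored as LZO/ZLIB-compressed blocks, with a record count per block (# records in B1, # records in B2). In the figure, block B1 holds RIDs 0 to 9 and block B2 holds RIDs 10 to 35. A block is decompressed when its records are read.
- Dictionary Compressed Skip Lists: map keys are replaced by dictionary codes ("hobbies": 0, "friends": 1), so the values become 0: {tennis}, 1: {Ann, Nick}; Null; 1: {George}; …; 0: {tennis, golf}. The skip-byte entries stay in the column (Skip10 = 210, Skip100 = 1709, and a later Skip10 = 304; a Skip10 entry spans 10 rows and a Skip100 entry spans 100 rows).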
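Note: a minimal sketch of dictionary-encoding the keys of a map-typed column, as in the "hobbies": 0, "friends": 1 dictionary on the slide. KeyDictionarySketch is a hypothetical class, and representing each map value as a plain string is an assumption made for brevity; this is not the CIF encoder.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of dictionary-encoding map keys; not the CIF encoder.
class KeyDictionarySketch {
    final Map<String, Integer> codeOf = new LinkedHashMap<>(); // key -> small integer code
    final List<String> keyOf = new ArrayList<>();              // code -> key

    // Replace each string key with its dictionary code, assigning codes on first sight.
    Map<Integer, String> encode(Map<String, String> row) {
        Map<Integer, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            Integer code = codeOf.get(e.getKey());
            if (code == null) {
                code = keyOf.size();
                codeOf.put(e.getKey(), code);
                keyOf.add(e.getKey());
            }
            out.put(code, e.getValue());
        }
        return out;
    }

    // Map the integer codes back to the original string keys.
    Map<String, String> decode(Map<Integer, String> row) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : row.entrySet()) {
            out.put(keyOf.get(e.getKey()), e.getValue());
        }
        return out;
    }
}
```

Because the same few keys repeat in every row, each key string is stored once in the dictionary and every row carries only the small integer codes.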

Outline: Column-Oriented Storage; Lazy Record Construction; Compression; Experiments; Conclusions

RCFile

Table:
    Name   Age  Info
    Joe    23   "hobbies": {tennis}, "friends": {Ann, Nick}
    David  32   "friends": {George}
    John   45   "hobbies": {tennis, golf}
    Smith  65   "hobbies": {swimming}, "friends": {Helen}

Row Group 1:
    Metadata
    Joe, David
    23, 32
    {"hobbies": {tennis}, "friends": {Ann, Nick}}, {"friends": {George}}

Row Group 2:
    Metadata
    John, Smith
    45, 65
    {"hobbies": {tennis, golf}}, {"hobbies": {swimming}, "friends": {Helen}}

RCFile: Inside each Row Group

    Compressed Metadata
    Compressed Column A:  101 102 103 104 105
    Compressed Column B:  201 202 203 204 205
    Compressed Column C:  301 302 303 304 305
    Compressed Column D:  401 402 403 404 405
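Note: a minimal sketch of the row-group layout, assuming integer-only rows and omitting the metadata and compression steps. RowGroupSketch and its method are hypothetical names, not the RCFile writer: rows are cut into groups of a fixed number of rows, and each group is transposed so that the values of one column sit next to each other inside the group.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an RCFile-style layout; not the RCFile writer.
class RowGroupSketch {
    // Split rows into groups of `groupSize` rows, then transpose each group so that
    // all values of one column are contiguous inside the group.
    static List<int[][]> toRowGroups(int[][] rows, int groupSize) {
        List<int[][]> groups = new ArrayList<>();
        for (int start = 0; start < rows.length; start += groupSize) {
            int n = Math.min(groupSize, rows.length - start);
            int cols = rows[start].length;
            int[][] columnMajor = new int[cols][n];
            for (int r = 0; r < n; r++) {
                for (int c = 0; c < cols; c++) {
                    columnMajor[c][r] = rows[start + r][c];
                }
            }
            groups.add(columnMajor);
        }
        return groups;
    }
}
```

With the five-row table A to D and a group size of 3, the first group holds column B as 201, 202, 203 and the second group holds column A as 104, 105. All columns of a row still live in the same group, which is the property RCFile keeps from the row store.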

CIF: Separate file for each column

    Compressed Metadata
    Compressed Column A:  101 102 103 104 105
    Compressed Column B:  201 202 203 204 205
    Compressed Column C:  301 302 303 304 305
    Compressed Column D:  401 402 403 404 405

[Figure label: 128 MB]

Experimental Setup
- 42-node cluster
- Each node: two quad-core 2.4 GHz sockets, 32 GB main memory, four 500 GB HDDs
- Network: 1 Gbit Ethernet switch

Overhead of Columnar Storage
- Synthetic dataset: 57 GB, 13 columns (6 integers, 6 strings, 1 map)
- Query: SELECT *
- Single-node experiment

Benefits of Column-Oriented Storage
- Query: projection of different columns
- Single-node experiment

Workload

Schema:

    URLInfo {
        String url
        String srcUrl
        time fetchTime
        String inlink[]
        Map metadata
        Map annotations
        byte[] content
    }

Query: if (url contains "ibm.com/jp") find all the distinct encodings reported by the page

- Dataset: 6.4 TB
- Query selectivity: 6%

Comparison of Column-Layouts (Map phase)
[Chart; baseline SEQ: 754 sec]

Comparison of Column-Layouts (Map phase)
[Chart; label: 3040]

Comparison of Column-Layouts (Total job)
[Chart; baseline SEQ: 806 sec]

Conclusions
- Describe a new column-oriented binary storage format in MapReduce.
- Introduce the skip list layout.
- Describe the implementation of lazy record construction.
- Show that lightweight dictionary compression for complex columns can be beneficial.

CIF / RCFile comparison

                                     CIF       RCFile
    Modify block placement policy    Y         N
    Modify block content             Y         Y
    Metadata stored                  In block  Separate file
    "Row group" size                 Large     Flexible
    Lazy tuple deserialization       Y         Y
    Skip lists                       Y         N

Comparison of Sequence Files

RCFile

Comparison of Column-Layouts

    Layout          Data Read (GB)  Map Time (sec)  Map Time Ratio  Total Time (sec)  Total Time Ratio
    Seq - uncomp.   6400            1416            -               1482              -
    Seq - record    3008            820             -               889               -
    Seq - block     2848            806             -               886               -
    Seq - custom    3040            754             1.0x            806               1.0x
    RCFile          1113            702             1.1x            761               1.1x
    RCFile - comp   102             202             3.7x            291               2.8x
    CIF - ZLIB      36              12.8            59.1x           77                10.4x
    CIF             96              12.4            60.8x           78                10.3x
    CIF - LZO       54              12.4            61.0x           79                10.2x
    CIF - SL        75              9.2             81.9x           70                11.5x
    CIF - DCSL      61              7.0             107.8x          63                12.8x

Comparison of Column-Layouts
[Chart; baseline SEQ: 754 sec]

CIF-DCSL yields the highest map-time speedup and improves the total job time by more than an order of magnitude (12.8x).

RCFile
[Chart; baseline SEQ: 754 sec]

Comparison of Sequence Files
[Chart; baseline SEQ: 754 sec]
