
1 Column-Oriented Storage Techniques for MapReduce
Avrilia Floratou (University of Wisconsin – Madison), Jignesh M. Patel (University of Wisconsin – Madison), Eugene J. Shekita (while at IBM Almaden Research Center), Sandeep Tata (IBM Almaden Research Center)

2 Motivation
Parallel DBMSs offer performance (in part via column-oriented storage); MapReduce offers ease of use and fault tolerance. Can we combine the two by bringing column-oriented storage to MapReduce?

3 Challenges
- How do we incorporate columnar storage into an existing MapReduce system (Hadoop) without changing its core parts?
- How can columnar storage operate efficiently on top of a distributed file system (HDFS)?
Problems unique to MapReduce:
- The use of complex data types, which are common in many MapReduce jobs.
- Hadoop's choice of Java as its default programming language.

4 Outline
How we address these problems:
- Column-Oriented Storage
- Lazy Tuple Construction
- Compression
- Experimental Evaluation
- Conclusions

5 Complex Data Types
The use of complex types causes two major problems:
- Deserialization costs. Switching to a binary storage format can improve Hadoop's scan performance by 3x.
- The lack of effective column-oriented compression techniques. Column-oriented storage formats tend to exhibit better compression ratios.

6 Hadoop's Choice of Java
Java objects require deserialization, and the overhead of deserializing and creating the objects corresponding to a complex type can be substantial. Lazy record construction mitigates this deserialization overhead in Hadoop.

7 Outline
- Column-Oriented Storage, and its interaction with Hadoop's data replication and scheduling
- Lazy Tuple Construction
- Compression
- Experimental Evaluation
- Conclusions

8 Column-Oriented Storage in Hadoop
Implementing a column-oriented storage format raises two questions:
- How do we generate roughly equal-sized splits so that the job can be effectively parallelized over the cluster?
- How do we make sure that corresponding values from different columns in the dataset are co-located on the same node? HDFS does not provide any co-location guarantees.

9 Example: a dataset with three columns c1, c2, c3 stored in three different files that are randomly spread over the cluster. Remote accesses will occur whenever a row has to be reconstructed. We will introduce a new format that avoids these problems.

10 Row-Store: Merits/Limits with MapReduce
Merits:
- Data loading is fast (no additional processing).
- All columns of a data row are located in the same HDFS block.
Limits:
- Not all columns are used, wasting storage bandwidth.
- Compressing columns of different types together may add additional overhead.
(Adapted from a slide by Professor Xiaodong Zhang.)

11 Column-Store: Merits/Limits with MapReduce
Merits:
- Unnecessary I/O costs can be avoided: only the needed columns are loaded, and compression is easy.
Limits:
- Additional network transfers are needed for column grouping.

12 Read Operation in Row-Store
- Read local rows concurrently.
- Discard unneeded columns.

13 Read Operation in Column-Store (figure; adapted from a slide by Professor Xiaodong Zhang)
The RCFile format, described next, avoids the problems that occur in both row-store and column-store.

14 Goals of RCFile
- Eliminate unnecessary I/O costs, like column-store: read only the needed columns from disk.
- Eliminate network costs in row construction, like row-store.
- Keep the fast data loading speed of row-store.
- Apply efficient data compression algorithms conveniently, like column-store.
In short: eliminate the limits of both row-store and column-store.

15 RCFile Format
A fast and space-efficient placement structure. A table is stored as a sequence of row groups, separated by a special synchronization marker. Each row group contains:
- Metadata: describes the columns in the data region and their starting offsets, and records the number of rows in the data region.
- Data region: all columns are packed and laid out in a column-oriented fashion.
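To make the layout concrete, here is a minimal sketch of what one row group holds. The field names are illustrative; this is not the actual RCFile implementation.

    // Minimal sketch of the RCFile row-group layout described above.
    // Field names are illustrative, not the real RCFile classes.
    class RowGroup {
        byte[] syncMarker;     // special synchronization marker separating row groups
        // --- metadata header ---
        int numRows;           // number of rows in the data region
        long[] columnOffsets;  // starting byte offset of each column in the data region
        // --- data region, laid out column by column ---
        byte[][] columnData;   // columnData[i]: all values of column i, packed contiguously
    }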

16 RCFile: Partitioning a Table into Row Groups (figure)

17 Inside a Row Group (figure)

18 RCFile: Inside Each Row Group (figure)

19 RCFile: Distributed Row-Group Data among Nodes (figure)

20 Optimizing RCFile
Main disadvantages:
- Tuning the row-group size is critical to performance.
- Extra metadata must be written for each row group, leading to additional space overhead.
- Adding a column to a dataset is expensive: the entire dataset has to be read and each block rewritten.

21 CIF Storage Format
When a dataset is loaded, it is broken into smaller partitions, each referred to as a split-directory. Each partition contains a set of files, one per column in the dataset; an additional file describing the schema is also kept in each split-directory (see the sketch below). How do we guarantee co-location of the columns of a row?
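Before turning to co-location, here is a sketch of the layout just described, with hypothetical paths and column names (the actual loader is the authors' code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch of the CIF on-disk layout: one split-directory per partition,
    // holding one file per column plus a schema file. Names are hypothetical.
    public class CifLayoutSketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path split = new Path("/data/mydataset/split-0"); // one split-directory
            fs.create(new Path(split, "schema")).close();     // schema file for the split
            for (String column : new String[] {"Name", "Age", "Info"}) {
                fs.create(new Path(split, column)).close();   // one column file per column
            }
        }
    }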

22 Column Placement Policy (CPP)
CPP is a new HDFS block placement policy that solves the co-location problem. CPP guarantees that the files corresponding to the different columns of a split are always co-located across replicas. HDFS allows its placement policy to be changed by setting the configuration property "dfs.block.replicator.classname" to point to the appropriate class.
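For example, a deployment might register the policy like this. The property name comes from the paper; the policy's class and package name here are hypothetical.

    import org.apache.hadoop.conf.Configuration;

    public class CppConfigSketch {
        public static void main(String[] args) {
            // Point HDFS at a custom block placement policy.
            // The class name below is illustrative, not the authors' actual class.
            Configuration conf = new Configuration();
            conf.set("dfs.block.replicator.classname",
                     "com.example.cif.ColumnPlacementPolicy");
        }
    }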

23 Column-Oriented Storage in CIF Format in Hadoop
Example dataset:

    Name  | Age | Info
    Joe   | 23  | "hobbies": {tennis}, "friends": {Ann, Nick}
    David | 32  | "friends": {George}
    John  | 45  | "hobbies": {tennis, golf}
    Smith | 65  | "hobbies": {swimming}, "friends": {Helen}

The dataset is split across two nodes. The 1st node stores the rows for Joe and David as three column files (Name, Age, Info); the 2nd node stores the rows for John and Smith in the same way.

24 Replication and Co-location (figure)
Under the default HDFS replication policy, the replicas of the Name, Age, and Info column files for a split (here, the rows for Joe and David) are scattered across nodes A, B, C, and D, so the columns of a row may not be on the same node. Under CPP, all column files of a split are co-located on the same node in every replica.

25 Outline
- Column-Oriented Storage
- Lazy Tuple Construction: used to mitigate the deserialization overhead in Hadoop, as well as eliminate disk I/O
- Compression
- Experiments
- Conclusions

26 Implementation
The basic idea: deserialize only those columns of a record that are actually accessed in a map function. We use a class called LazyRecord, which maintains two pointers:
- curPos: keeps track of the current record the map function is working on.
- lastPos: keeps track of the last record that was actually read and deserialized for a particular column file.
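A minimal sketch of this idea, with hypothetical, simplified names (the real implementation lives in the authors' code):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of lazy record construction: nothing is read or deserialized
    // until get() is called for a column. Names are illustrative.
    class LazyRecord {
        private final Map<String, ColumnReader> columns = new HashMap<>();
        private long curPos = -1;  // record the map function is currently on

        void next() { curPos++; }  // advancing to the next record costs nothing

        Object get(String column) {
            ColumnReader r = columns.get(column);
            r.skip(curPos - r.lastPos);            // skip(curPos - lastPos), see the skip-list slide
            Object value = r.readAndDeserialize(); // actual I/O happens only here
            r.lastPos = curPos;
            return value;
        }
    }

    // Per-column reader; skip() and readAndDeserialize() are assumed helpers.
    abstract class ColumnReader {
        long lastPos = -1;  // last record actually deserialized from this file
        abstract void skip(long n);
        abstract Object readAndDeserialize();
    }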

27 Example map function:

    class MyMapper {
        void map(NullWritable key, Record rec) {
            String url = (String) rec.get("url");
            if (url.contains("ibm.com/jp"))
                output.collect(null, rec.get("metadata").get("content-type"));
        }
    }

Each time the RecordReader is asked to read the next record, it increments curPos. No bytes are actually read or deserialized until one of the get() methods is called on the resulting Record object.

28 Example (figure)
The map method evaluates: if (age < 35) return name. The Age column (23, 32, 45, 30, 50) is read for every record, but the Name column (Joe, David, John, Mary, Ann) is only read and deserialized when the predicate holds, yielding (23, Joe) and (32, David). No bytes of the Name column are read for records that fail the predicate: we avoid reading and deserializing the name field.

29 Skip List Format
A skip-list format can be used within each column file to efficiently skip records. A column file contains two kinds of values:
- Regular serialized values.
- Skip blocks: information about byte offsets that enables skipping the next N records.
The skip() method is called by LazyRecord as skip(curPos - lastPos).
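A sketch of how skipping might work, assuming skip blocks that record the byte length of the next 10 and next 100 values (the exact on-disk layout is the authors' design; names here are illustrative):

    import java.io.DataInputStream;
    import java.io.IOException;

    // Sketch: skip n records in a column file stored in skip-list format.
    // Skip blocks let us jump over serialized values without deserializing them.
    class ColumnSkipper {
        DataInputStream in;
        int skip100;  // bytes occupied by the next 100 values (from the last skip block)
        int skip10;   // bytes occupied by the next 10 values

        void skip(long n) throws IOException {
            while (n > 0) {
                if (n >= 100)     { in.skipBytes(skip100); n -= 100; }
                else if (n >= 10) { in.skipBytes(skip10);  n -= 10;  }
                else              { skipOneValue();        n -= 1;   }
                readNextSkipBlock();  // hypothetical helper: refresh skip100/skip10
            }
        }

        void skipOneValue() throws IOException { /* read a length, then skip that many bytes */ }
        void readNextSkipBlock() throws IOException { /* hypothetical helper */ }
    }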

30 Example (figure)
For the query "if (age < 35) return hobbies", the Age column (23, 39, 45, 30, ...) is scanned, while the Info column ("hobbies": tennis, "friends": Ann, Nick; null; "friends": George; "hobbies": tennis, golf; ...) contains skip blocks (e.g., Skip10 = 2013, Skip100 = 19400, Skip10 = 1246 bytes) that allow jumping over the next 10 or 100 serialized values without reading them.

31 Outline
- Column-Oriented Storage
- Lazy Record Construction
- Compression: we propose two schemes to compress columns of complex types; both are amenable to lazy decompression
- Experiments
- Conclusions

32 Compressed Blocks
Compress a block of contiguous column values. The compressed block size is set at load time; it affects both the compression ratio and the decompression overhead. A header indicates the number of records in a compressed block and the block's size.
- Advantage: a block can be skipped entirely if none of its values are accessed.
- Disadvantage: if any value in the block is accessed, the entire block needs to be decompressed.
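A sketch of reading one compressed block lazily, assuming the header layout just described (record count, then compressed size; the codec and exact layout are assumptions):

    import java.io.DataInputStream;
    import java.io.IOException;

    // Sketch of lazy decompression for compressed blocks.
    class CompressedBlockReader {
        DataInputStream in;

        void readBlock(boolean anyValueAccessed) throws IOException {
            int numRecords = in.readInt();  // header: records in this block
            int blockSize  = in.readInt();  // header: compressed size in bytes
            if (!anyValueAccessed) {
                in.skipBytes(blockSize);    // skip the whole block, no decompression
            } else {
                byte[] raw = new byte[blockSize];
                in.readFully(raw);
                byte[] values = decompress(raw);  // accessing any value pays for the whole block
            }
        }

        byte[] decompress(byte[] raw) { return raw; /* placeholder for a real codec */ }
    }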

33 Dictionary Compressed Skip List
This scheme is tailored to map-typed columns: build a dictionary of keys for each block of map values, and store the compressed keys in a map using a skip-list format.
- Advantage: a value can be accessed without having to decompress an entire block of values, and the CPU overhead for decompression is lower.
- Disadvantage: it provides a worse compression ratio than compressed blocks.
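A sketch of the key-dictionary idea for one block (hypothetical names; the on-disk encoding is the authors' design):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch: dictionary-compress the keys of a map-typed column for one block.
    // Each distinct key is replaced by a small integer code.
    class KeyDictionarySketch {
        static Map<String, Integer> buildDictionary(Iterable<String> keysInBlock) {
            Map<String, Integer> dict = new LinkedHashMap<>();
            for (String key : keysInBlock) {
                dict.putIfAbsent(key, dict.size()); // codes assigned in first-seen order
            }
            return dict;
        }
        // Map entries are then written as (code, value) pairs in skip-list format,
        // so a single entry can be read without decompressing the whole block.
    }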

34 Outline
- Column-Oriented Storage
- Lazy Record Construction
- Compression
- Experiments
- Conclusions

35 Experimental Setup
- 42-node cluster; each node: 8 cores, 32 GB main memory, five 500 GB SATA 1.0 disks
- Network: 1 Gbit Ethernet switch
- Hadoop version: 0.21.0

36 Overhead of Columnar Storage (figure: scan time)
Synthetic dataset: 57 GB, 13 columns (6 integers, 6 strings, 1 map). Query: SELECT *. Single-node experiment. Takeaway: using a binary format can dramatically improve Hadoop's scan performance.

37 Benefits of Column-Oriented Storage (figure)
Query: projection of different columns. Single-node experiment. CIF reads much less data than SEQ, which leads to the speedup; however, gathering data from columns stored in different files incurs additional seeks.

38 Conclusions
- Described a new column-oriented binary storage format for MapReduce.
- Introduced a skip-list layout.
- Described the implementation of lazy record construction.
- Showed that lightweight dictionary compression for complex columns can be beneficial.
