
Big Table: Distributed Storage System For Structured Data
Sergejs Melderis
Dennis Kafura – CS5204 – Operating Systems


2 Unstructured Data vs. Structured Data
- Unstructured data is computerized information that does not have a data model
  - plain text, audio
- Structured data can be described by a data model
  - Flat
  - Hierarchical
  - Network
  - Relational
  - Dimensional
  - Object-relational

3 Relational Model and RDBMS
- most popular model for organizing structured data
- based on first-order predicate logic
- provides a declarative method for specifying data and queries via SQL
- data is organized in tables of fixed-length records
- variety of open source and commercial implementations
- provides ACID properties

4 NoSQL
- not a relational database
  - no fixed table schemas
  - no join operations
  - no SQL
- flexible data model, or none at all
- usually does not provide ACID properties
- scales horizontally

5 BigTable
- distributed, high-performance, fault-tolerant NoSQL storage system
- built on top of the Google File System
- designed to scale to a very large size on low-cost commodity hardware
- designed by Google and used in various projects (e.g., web indexing)
- the paper was published in 2006
- related implementations
  - HBase
  - Hypertable
  - Apache Cassandra
  - Neptune

6 BigTable Data Model
- sparse, distributed, persistent multi-dimensional sorted map
- the map is indexed by a row key, column family, column key, and a timestamp
  { row : { column_family : { column : { timestamp : value } } } }
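The nested map on this slide can be sketched in a few lines of Python. This is only an in-memory illustration of the indexing scheme; the class name `TinyTable` and its methods are hypothetical and are not part of Bigtable.

```python
from collections import defaultdict


class TinyTable:
    """Toy stand-in for the map: row -> column family -> column -> timestamp -> value."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, family, column, timestamp, value):
        self._rows[row][family][column][timestamp] = value

    def get(self, row, family, column, timestamp=None):
        """Return the value stored at `timestamp`, or the newest version if it is None."""
        versions = self._rows.get(row, {}).get(family, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

    def row_keys(self):
        """Row keys in lexicographic order, as a sorted map would expose them."""
        return sorted(self._rows)


table = TinyTable()
table.put("com.cnn.www", "anchor", "cnnsi.com", 9, "CNN")
table.put("com.cnn.www", "anchor", "my.look.ca", 9, "CNN.com")
newest_anchor = table.get("com.cnn.www", "anchor", "cnnsi.com")
```

The map is sparse in the sense that a row only stores the columns that were actually written; a lookup of anything else returns `None` here.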

7 Webtable
Row key: "com.cnn.www"
  Column key            Timestamp   Value
  "contents"            t6          "..."
  "anchor:cnnsi.com"    t9          "CNN"
  "anchor:my.look.ca"   t9          "CNN.com"

8 Relational Data Model
  Student:        student_id (PK), first_name, last_name, birthday, major, academic_level
  Course:         crn (PK), course, title, type, instructor_id, seats
  StudentCourse:  student_id, crn

9 Student table
  Row key:            student_id
  Column families:    info, courses
  Column qualifiers:  last_name, first_name, birthday, major, academic_level

10 Course table
  Row key:            crn
  Column families:    info, students
  Column qualifiers:  course, title, type, instructor_id, seats

11 Example
Row "905514" (Student table)
  info:first_name      "Sergejs"
  info:last_name       "Melderis"
  info:major           "Computer Science"
  courses:96322        "YES"
  courses:96320        "NO"
Row "96322" (Course table)
  info:course          "CS5204"
  info:title           "Operating Systems"
  info:instructor_id   "1983943"
  students:905514      "YES"
  students:905520

12 Students data view in JSON
  {
    "905514": {
      "info": {
        "first_name": { "t1": "Sergejs" },
        "last_name": { "t1": "Melderis" },
        "major": { "t1": "Comp Science" }
      },
      "courses": {
        "96322": { "t1": "YES" },
        "96320": { "t2": "NO" }
      }
    }
  }
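Because there are no joins, the Student and Course rows each carry the relationship themselves. The sketch below shows how a client might answer "which courses is this student in?" with two plain lookups. It assumes that "YES"/"NO" are enrollment flags and that t1 and t2 can be modeled as the integers 1 and 2; neither is stated on the slides. The helper names are hypothetical.

```python
# Rows copied from the example slides; timestamps t1/t2 are modeled as 1/2 (assumption).
students = {
    "905514": {
        "info": {
            "first_name": {1: "Sergejs"},
            "last_name": {1: "Melderis"},
            "major": {1: "Comp Science"},
        },
        "courses": {"96322": {1: "YES"}, "96320": {2: "NO"}},
    }
}
courses = {
    "96322": {
        "info": {"course": {1: "CS5204"}, "title": {1: "Operating Systems"}},
        "students": {"905514": {1: "YES"}},
    }
}


def latest(versions):
    """Return the value with the highest timestamp."""
    return versions[max(versions)]


def enrolled_course_titles(student_id):
    """Follow the 'courses' column family of a student row into the Course table."""
    titles = []
    for crn, versions in students[student_id]["courses"].items():
        # Treating "YES" as "enrolled" is an assumption about the slide's data.
        if latest(versions) == "YES" and crn in courses:
            titles.append(latest(courses[crn]["info"]["title"]))
    return titles
```

The column qualifiers in the `courses` family are themselves data (course CRNs), which is what lets a single row stand in for the StudentCourse join table.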

13 Rows
- row keys are arbitrary strings of up to 64 KB
- every read or write of data under a single row key is atomic
- rows are kept in lexicographic order by row key
- the row range of a table is dynamically partitioned into blocks called tablets
- tablets are the units of distribution and load balancing
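Since rows are sorted and each tablet holds a contiguous row range, finding the tablet for a row key is a binary search over the tablets' end keys. The following is a minimal sketch; the split points and tablet names are made up for illustration.

```python
import bisect

# Hypothetical split points: each tablet covers keys up to and including its end key;
# the last tablet is open-ended.
TABLET_END_KEYS = ["com.cnn.www", "com.google.www", "org.apache.www"]
TABLET_NAMES = ["tablet-0", "tablet-1", "tablet-2", "tablet-3"]


def tablet_for(row_key):
    """Return the name of the tablet whose row range contains `row_key`."""
    index = bisect.bisect_left(TABLET_END_KEYS, row_key)
    return TABLET_NAMES[index]
```

Keys that sort next to each other, such as pages from the same reversed domain, end up in the same tablet or in neighboring ones, which is why the choice of row key matters for locality.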

14 Columns
- Column keys are grouped into column families
- The column family is the basic unit of access control
- All data stored in a column family is usually of the same type
- The number of column families should be small
- The number of columns is unbounded
- A column key is named using family:qualifier

15 Timestamps
- Bigtable can contain multiple versions of the same data
- timestamps are 64-bit integers assigned by Bigtable or by the client
- the client can specify that only the last n versions of the data be kept
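The "keep up to n versions" setting amounts to a garbage-collection rule over the timestamp-to-value map of one cell. A minimal sketch, with a hypothetical function name:

```python
def keep_latest_versions(versions, n):
    """Return a copy of `versions` (timestamp -> value) with only the n newest entries."""
    if n <= 0:
        raise ValueError("n must be positive")
    newest = sorted(versions, reverse=True)[:n]
    return {timestamp: versions[timestamp] for timestamp in newest}


cell = {3: "v1", 5: "v2", 6: "v3"}
trimmed = keep_latest_versions(cell, 2)
```

The input map is left untouched; only the returned copy drops the older versions.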

16 Implementation
- client library
- one master server
- distributed lock service called Chubby
- many tablet servers, each serving a set of tablets
- tablet server
  - handles read and write requests
  - automatically splits tablets that have grown too large (tablets are about 100 - 200 MB by default)
- client data goes directly to the tablet servers, not through the master

17 Tablet Location
- a three-level hierarchy stores tablet locations
- the first level is a file stored in the lock service (Chubby) that holds the location of the root tablet
- the root tablet contains the locations of the METADATA tablets
- the METADATA tablets contain the locations of the user tablets
Figure: Lock Service -> Root tablet -> METADATA tablets -> UserTable1, UserTable2
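The three-level lookup can be sketched as two range searches that start from a single well-known pointer. Everything concrete below is hypothetical: the key encoding (`table:row`), the tablet names, and the server names are chosen only to make the walk visible and do not reflect the real METADATA schema.

```python
import bisect

LAST = "\uffff"  # sentinel end key that sorts after every key used here


def find_in_range_map(range_map, key):
    """range_map is a sorted list of (end_key, target); return the first target with end_key >= key."""
    end_keys = [end_key for end_key, _ in range_map]
    index = bisect.bisect_left(end_keys, key)
    if index == len(range_map):
        raise KeyError(key)
    return range_map[index][1]


# Level 1: the lock service stores where the root tablet is.
chubby_file = "root"

# Level 2 (root tablet) points at METADATA tablets; level 3 points at tablet servers.
tablets = {
    "root": [("UserTable1" + LAST, "meta-1"), (LAST, "meta-2")],
    "meta-1": [("UserTable1:m", "tabletserver-a"), ("UserTable1" + LAST, "tabletserver-b")],
    "meta-2": [(LAST, "tabletserver-c")],
}


def locate(table, row_key):
    """Walk lock service -> root tablet -> METADATA tablet -> tablet server."""
    key = f"{table}:{row_key}"
    root = tablets[chubby_file]
    metadata_tablet = tablets[find_in_range_map(root, key)]
    return find_in_range_map(metadata_tablet, key)
```

In this sketch every lookup walks all three levels; a real client would presumably cache locations so that most requests skip the walk.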

18 Distribution of data
- One master server
- Chubby distributed lock service
- Hundreds or thousands of tablet servers
- Each tablet contains a contiguous range of rows
- Master distributes tablets across the tablet servers
- Each tablet server holds tablets with different row ranges

19 Tablet Representation
Figure labels: Write Op, Read Op; Memory: memtable; GFS: tablet log, SSTable files

20 Compactions
- a compaction writes the contents of the memtable and/or existing SSTables into a new SSTable
- minor compaction: writes the memtable to a new SSTable
  - shrinks the memory usage of the tablet server
  - reduces the amount of commit log that must be read during recovery
- merging compaction: merges several SSTables (and the memtable) into one new SSTable
- major compaction: rewrites all SSTables into exactly one SSTable
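The interplay of commit log, memtable, and SSTables can be sketched with plain dictionaries. This is a toy model under simplifying assumptions: SSTables are in-memory dicts rather than files in GFS, deletions are not modeled, and the class and method names are hypothetical.

```python
class Tablet:
    def __init__(self):
        self.commit_log = []   # stands in for the tablet log in GFS
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # immutable snapshots, oldest first

    def write(self, key, value):
        # A write is logged first, then applied to the memtable.
        self.commit_log.append((key, value))
        self.memtable[key] = value

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None

    def minor_compaction(self):
        """Freeze the memtable into a new sorted SSTable and truncate the log."""
        if not self.memtable:
            return
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # these entries are no longer needed for recovery

    def major_compaction(self):
        """Rewrite everything into exactly one SSTable."""
        self.minor_compaction()  # flush pending writes first (a choice made in this sketch)
        merged = {}
        for sstable in self.sstables:  # oldest first, so newer values overwrite older ones
            merged.update(sstable)
        self.sstables = [dict(sorted(merged.items()))] if merged else []


tablet = Tablet()
tablet.write("a", 1)
tablet.minor_compaction()
tablet.write("a", 2)
tablet.write("b", 3)
tablet.minor_compaction()
sstables_before_major = len(tablet.sstables)
tablet.major_compaction()
```

After the two minor compactions a read of `"a"` has to consult two SSTables; after the major compaction it consults one, which is the point of merging.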

21 API
- create and delete tables and column families
- write or delete values
- look up values from individual rows
- scan over a subset of the data in a table
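To make the four groups of operations concrete, here is a toy in-memory store with one method per operation. It is not the Bigtable client API; the class `ToyStore` and all method names are invented for illustration, and timestamps are left out to keep it short.

```python
class ToyStore:
    def __init__(self):
        # table name -> {"families": set of family names, "rows": {row key -> {column key -> value}}}
        self.tables = {}

    def create_table(self, name, families):
        self.tables[name] = {"families": set(families), "rows": {}}

    def delete_table(self, name):
        del self.tables[name]

    def write(self, table, row, column, value):
        family = column.partition(":")[0]  # column keys are named family:qualifier
        entry = self.tables[table]
        if family not in entry["families"]:
            raise KeyError(f"unknown column family: {family}")
        entry["rows"].setdefault(row, {})[column] = value

    def delete(self, table, row, column):
        self.tables[table]["rows"].get(row, {}).pop(column, None)

    def lookup(self, table, row):
        """Return a copy of all columns stored for one row."""
        return dict(self.tables[table]["rows"].get(row, {}))

    def scan(self, table, start_row, end_row):
        """Yield (row key, columns) for start_row <= row key < end_row, in sorted order."""
        rows = self.tables[table]["rows"]
        for key in sorted(rows):
            if start_row <= key < end_row:
                yield key, dict(rows[key])


store = ToyStore()
store.create_table("students", ["info", "courses"])
store.write("students", "905514", "info:first_name", "Sergejs")
store.write("students", "905514", "courses:96322", "YES")
store.write("students", "905520", "info:first_name", "Another")  # made-up row for the scan
scanned_keys = [key for key, _ in store.scan("students", "905000", "905515")]
```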
