Bigtable: A Distributed Storage System for Structured Data


1 Bigtable: A Distributed Storage System for Structured Data
Authors: Chang et al., Google, Inc. Presenter: Victoria Cooper

2 Introduction Goal: create a distributed storage system for structured data that provides 1. Wide applicability 2. Scalability 3. High performance 4. High availability

3 Outline Data Model API Construction of Bigtable
Implementation and refinements Evaluation Applications Conclusions

4 Data Model Dynamic control over data layout, format, and locality properties
Names used for indexing can be arbitrary strings Clients can dynamically control whether data is served out of memory or from disk

5 Data Model A map: (row:string, column:string, time:int64) -> string
Row key: an uninterpreted array of bytes Column key Timestamp
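The three-level map above can be sketched in a few lines of Python. This is a minimal in-memory illustration of the model, not Bigtable's API; the class and method names (`SparseTable`, `put`, `get`) are invented for the example.

```python
class SparseTable:
    """Sketch of Bigtable's data model:
    (row:string, column:string, time:int64) -> string."""

    def __init__(self):
        # row key -> column key -> {timestamp: value}
        self.rows = {}

    def put(self, row, column, timestamp, value):
        self.rows.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def get(self, row, column, timestamp):
        return self.rows[row][column][timestamp]

# Cell values from the paper's Figure 1, used purely as sample data
t = SparseTable()
t.put("com.cnn.www", "contents:", 3, "<html>...v3")
t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
print(t.get("com.cnn.www", "contents:", 3))  # <html>...v3
```

Note that the table is sparse: a row stores only the columns actually written to it, which is why a nested dictionary is a natural fit here.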

6 Data Model: Figure 1

7 Rows: Tablets A row range of a table Dynamically partitioned
The unit of distribution and load balancing

8 Rows Reads of short row ranges are efficient and typically require communication with only a small number of machines
Clients can choose row keys to get good locality for their data accesses

9 Columns Column family: the unit of access control The family name comes before the qualifier in the key
Small number of column families Large number of columns

10 Columns Column keys use family:qualifier syntax
Family names must be printable Qualifiers can be arbitrary strings

11 Columns: Example 1 Column family: the language a web page is written in
Column key: stores each web page's language ID

12 Columns: Example 2 Family: anchor Each column key names a single anchor (the qualifier is the referring site)

13 Timestamps Multiple versions of the same data 64-bit integers
Can be assigned by Bigtable (real time) Can be assigned by the client Must be unique to avoid collisions Stored in decreasing timestamp order The most recent version is read first

14 Timestamps Two per-column-family settings for garbage collecting old versions
1) Bigtable garbage collects automatically 2) The client specifies that only the last n versions be kept
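The "keep the last n versions" policy above can be sketched as follows. The function name `keep_last_n` is invented for the example; the only property taken from the slides is that versions are ordered by decreasing timestamp, so the most recent versions come first.

```python
def keep_last_n(versions, n):
    """versions: {timestamp: value}. Returns the n newest versions,
    newest-first, mirroring Bigtable's decreasing-timestamp order."""
    newest_first = sorted(versions.items(), key=lambda kv: kv[0], reverse=True)
    return newest_first[:n]

# Four versions of one cell; GC keeps only the two most recent
cell = {1: "a", 5: "b", 3: "c", 9: "d"}
print(keep_last_n(cell, 2))  # [(9, 'd'), (5, 'b')]
```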

15 API Create tables/column families Delete tables/column families Alter: Cluster Table
Column family metadata (e.g., access control rights)

16 API: Figure 2

17 API: Figure 3

18 API Single-row transactions Allows cells to be used as integer counters
Execution of client-supplied scripts Written in Sawzall Can be used with MapReduce

19 Building Blocks Google File System (GFS) Google SSTable (file format)
Chubby (lock service)

20 Google File System (GFS)
Stores log and data files A Bigtable cluster runs on a shared pool of machines that run many other distributed applications A cluster management system: schedules jobs Manages resources Deals with machine failures Monitors machine status

21 SSTable Stores Bigtable data A map from keys to values
Persistent Immutable Ordered Both keys and values are arbitrary byte strings
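An SSTable's interface (a persistent, immutable, ordered map supporting point lookups and range scans) can be sketched in memory like this. The class is illustrative only: it builds the sorted key list once and never mutates it, and it omits persistence, blocks, and the block index.

```python
import bisect

class SSTable:
    """Sketch of the SSTable interface: immutable ordered map
    from byte-string keys to byte-string values."""

    def __init__(self, items):
        pairs = sorted(items)                # sorted once, never mutated
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def get(self, key):
        """Point lookup via binary search."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start, end):
        """Iterate over all key/value pairs in [start, end)."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))

sst = SSTable([(b"b", b"2"), (b"a", b"1"), (b"c", b"3")])
print(sst.get(b"b"))         # b'2'
print(sst.scan(b"a", b"c"))  # [(b'a', b'1'), (b'b', b'2')]
```

Keeping keys sorted is what makes both the binary-search lookup and the cheap range scan possible; the real format gets the same effect with a block index over sorted 64 KB blocks.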

22 SSTable Internally: a sequence of blocks (typically 64 KB) plus a block index used to locate blocks

23 Chubby Five active replicas; one replica is elected master and serves requests

24 Chubby Provides a namespace of directories and small files

25 Bigtable and Chubby Ensure there is at most one active master at a time
Store the bootstrap location of Bigtable data Discover tablet servers Finalize tablet server deaths Store schema (column family) information Store access control lists

26 Implementation Three major components: A library linked into every client One master server
Assigns tablets to tablet servers Many tablet servers Each manages a set of tablets

27 Implementation Clients communicate directly with tablet servers for reads and writes
Client data does not move through the master

28 Cluster Hierarchy: Cluster → Table → Tablet → Row Range → Data

29 Tablet Location: Figure 4

30 METADATA table Stores the location of a tablet under its row key
Each row typically stores ~1 KB of data The client library caches tablet locations If a cached location is incorrect, the client moves up the location hierarchy If the cache is empty, a lookup can take 3 network round-trips If the cache is stale, it can take up to 6 round-trips

31 Tablet Assignment Tablet server start-up:
Acquires an exclusive lock on a unique Chubby file The master monitors the servers' directory to discover tablet servers A tablet server stops serving its tablets if it loses its lock It tries to reacquire the lock If its file no longer exists, the server kills itself When a server dies, it releases its lock

32 Tablet Assignment It is the master's job to detect tablet servers that stop serving
The master periodically asks each tablet server whether it still holds its lock If a server can't be reached or has lost its lock, the master tries to acquire an exclusive lock on the server's file If it succeeds, the server is dead or having trouble contacting Chubby The master then deletes the server's file and reassigns its tablets

33 Tablet Assignment The set of existing tablets changes when:
A table is created A table is deleted Two tablets are merged A tablet is split Tablet splits are initiated by a tablet server The server updates the METADATA table and notifies the master

34 Tablet Serving: Figure 5

35 Tablet Serving Write operation: Checks that the request is well-formed
Checks authorization (against a list of permitted writers in a Chubby file) A valid mutation is written to the commit log Its contents are then inserted into the memtable Read operation: Checks well-formedness and authorization A valid read is executed on a merged view of the SSTables and the memtable
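The "merged view" read path above can be sketched as a lookup that consults the memtable first, then older SSTables in order; newer data shadows older data. The tables here are plain dicts and `merged_read` is an invented name; the real implementation merges sorted iterators rather than probing dicts.

```python
def merged_read(key, memtable, sstables):
    """Sketch of a tablet read over a merged view.
    sstables are ordered newest-first; the memtable is newest of all."""
    if key in memtable:
        return memtable[key]
    for sst in sstables:
        if key in sst:
            return sst[key]
    return None  # key not present in this tablet

memtable = {"row1": "new"}
sstables = [{"row2": "mid"}, {"row1": "old", "row3": "oldest"}]
print(merged_read("row1", memtable, sstables))  # new  (memtable shadows SSTable)
print(merged_read("row3", memtable, sstables))  # oldest
```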

36 Minor Compaction Converts the memtable into an SSTable; shrinks memory usage on the tablet server
Reduces the amount of data that must be read from the commit log during recovery

37 Merging Compaction Merges a few SSTables and the memtable into one new SSTable Discards the inputs once the compaction finishes

38 Major Compaction A merging compaction that rewrites all SSTables into exactly one SSTable Produces an SSTable containing no deleted data Reclaims resources used by deleted data Bigtable periodically applies major compactions to its tablets
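The major compaction described above can be sketched as merging newest-first tables into one, then dropping deletion markers along with the data they shadow. The `DELETED` sentinel and the function name are inventions for the sketch; real SSTables use special deletion entries and merge sorted streams.

```python
DELETED = object()  # illustrative stand-in for a deletion marker

def major_compact(tables):
    """tables: dicts ordered newest-first (memtable, then SSTables).
    Returns the single merged table with no deleted data."""
    merged = {}
    for table in tables:
        for key, value in table.items():
            merged.setdefault(key, value)  # newest version of each key wins
    # drop deletion markers: the data they shadowed is gone for good
    return {k: v for k, v in merged.items() if v is not DELETED}

memtable = {"a": DELETED, "d": "4"}     # "a" was deleted recently
older = [{"a": "1", "b": "2"}, {"c": "3"}]
print(major_compact([memtable] + older))  # "a" and its old value both vanish
```

This is also why only major compactions reclaim deleted data: a merging compaction over a subset of SSTables must keep deletion markers, since an even older SSTable outside the merge may still hold the shadowed value.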

39 Refinements Locality groups Compression Caching for read performance
Bloom filters Commit-log implementation Speeding up tablet recovery Exploiting immutability

40 Refinements Locality groups: Multiple column families grouped together
Each locality group gets a separate SSTable Makes reads more efficient Compression: Clients can specify a compression format Applied when compressing an SSTable A two-pass compression scheme: fast and space-efficient

41 Refinements Caching for read performance: Two levels of caching
Scan cache (key-value pairs) Block cache (SSTable blocks) Reduces the number of disk accesses Bloom filters: One per SSTable in a locality group Checked to see whether an SSTable might contain data for a given row/column pair
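A Bloom filter answers "might this SSTable contain this key?" with no false negatives and a small false-positive rate, so most lookups for absent rows skip the disk entirely. The sketch below is a generic textbook Bloom filter, not Bigtable's implementation; the parameters (`m` bits, `k` hashes) and the SHA-256-based hashing are assumptions made for the example.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions in an m-bit array."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0  # m-bit array packed into one int

    def _positions(self, item):
        # derive k positions by salting a cryptographic hash (an
        # arbitrary choice here; any k independent hashes would do)
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # False -> definitely absent; True -> possibly present
        return all(self.bits & (1 << p) for p in self._positions(item))

bf = BloomFilter()
bf.add("row1/anchor:cnnsi.com")
print(bf.might_contain("row1/anchor:cnnsi.com"))  # True
```

A negative answer is authoritative, which is exactly the property that lets a tablet server skip an SSTable without reading it.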

42 Refinements Commit-log implementation: Mutations are appended to a single commit log per tablet server One log has performance benefits But it complicates recovery: the log interleaves entries for many tablets To avoid duplicate log reads, recovery sorts the log entries by key Speeding up tablet recovery: Before the master moves a tablet to a different server, the source server performs a minor compaction Tablet server 1 stops serving the tablet The tablet is loaded onto tablet server 2 No recovery of log entries is required

43 Refinements Exploiting immutability: SSTables are immutable
No need to synchronize access when reading them Deleting data becomes garbage-collecting obsolete SSTables Tablets can be split quickly (children share the parent's SSTables) The memtable is the only mutable structure Each memtable row is copy-on-write, so reads and writes can proceed in parallel

44 Performance Evaluation
A Bigtable cluster of N tablet servers Each benchmark reads or writes 1 GB of data per tablet server to/from GFS Client machines had sufficient physical memory

45 Performance Evaluation
The machines were in a two-level tree-shaped switched network with Gbps of aggregate bandwidth available at the root Tablet servers, the master, test clients, and GFS servers all ran on the same set of machines Each machine ran a GFS server plus either a tablet server or a client, along with processes from other jobs

46 Performance Evaluation
Sequential write Random writes Sequential read Random reads Random reads from memory Scan

47 Write Benchmarks Sequential writes: Used row keys with names 0 to R-1, where R is the number of distinct Bigtable row keys in the test
The key space was partitioned into 10N equal ranges, assigned dynamically to the N clients Wrote a single string under each row key All row keys were distinct Random writes: Similar to sequential writes, but each row key was hashed modulo R before writing This spread the write load evenly across the row space
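The random-write key choice above (hash the sequential key modulo R so writes land uniformly across the row space) can be sketched as follows. The SHA-1-based hash is an assumption; the benchmark description says only that the row key was hashed modulo R, not which hash was used.

```python
import hashlib

def random_benchmark_key(i, R):
    """Map sequential key i to a pseudo-random key in [0, R),
    spreading write load evenly across the row space."""
    digest = hashlib.sha1(str(i).encode()).digest()
    return int.from_bytes(digest[:8], "big") % R

R = 1000
keys = [random_benchmark_key(i, R) for i in range(5)]
print(keys)  # five keys scattered across [0, R), not 0..4
```

Because the mapping is deterministic, the random-read benchmark can later regenerate exactly the same keys to find the written strings.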

48 Read Benchmarks Sequential reads: Row keys generated exactly as in the sequential write benchmark
Reads the string stored under each row key Random reads: Similar, but hashes each row key before reading the string stored under it

49 Read/Scan Benchmarks Random reads from memory: Similar to random reads, but the locality group is marked in-memory
Reads are served from tablet server memory Scan: Similar to sequential reads, but scans over all values in a row range Uses Bigtable's scan API, which reduces the number of RPCs

50 Performance Evaluation: Figure 6

51 Single Tablet-Server Slowest benchmark: random reads (from disk)
Random reads from memory are much faster than random reads from disk Random writes perform about the same as sequential writes Sequential reads outperform random reads Scans outperform sequential reads

52 Scaling The number of tablet servers was increased from 1 to 500
Aggregate performance does not increase linearly Per-server throughput drops significantly from 1 to 50 servers Random reads scaled worst

53 Real Applications Google Analytics Personalized Search Google Earth
Google Finance Orkut Writely (later Google Docs)

54 Real Applications: Table 1

55 Real Applications: Table 2

56 Personalized Search The user's data goes in Bigtable Row name: a unique userid
All user actions are stored A separate column family for each type of action Replicated over several Bigtable clusters

57 Google Earth A preprocessing table stores raw imagery The data is cleaned
and entered into a final serving table Rows are named by geographic segment Column families track the sources of data for each segment

58 Google Analytics Raw click table: One row per end-user session
Row name: the tuple (website's name, session creation time) Summary table: Various predefined summaries for each website

59 Lessons A system of this type has many vulnerabilities:
Memory/network corruption Problems in the systems it depends on Planned/unplanned maintenance Understand how features will actually be used Have proper system-level monitoring Value simplicity in design

60 Conclusion Bigtable is Google's distributed storage system for structured data
Used by many other Google applications Bigtable is scalable and efficient Google users found Bigtable easy to use and helpful

61 Future Work Support for secondary indices
Infrastructure for building cross-data-center replicated Bigtables with multiple master replicas Keep Bigtable working well and fix bugs as they arise

62 Thanks/Questions?

