1
Bigtable: A Distributed Storage System for Structured Data
Authors: Chang et al., Google Inc. Presenter: Victoria Cooper
2
Introduction Goal: create a distributed storage system for structured data that provides 1. Wide applicability 2. Scalability 3. High performance 4. High availability
3
Outline Data Model API Construction of Bigtable
Implementation and refinements Evaluation Applications Conclusions
4
Data Model Clients get dynamic control over data layout and format, and over locality properties
Names used for indexing can be arbitrary strings Clients can dynamically control whether data is served out of memory or from disk
5
Data Model A map: (row:string, column:string, time:int64) → string
Indexed by row key (an uninterpreted array of bytes), column key, and timestamp
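As an illustrative sketch (not Bigtable's actual API — the class and method names here are invented), this map can be modeled as a Python dictionary keyed by (row, column, timestamp):

```python
# Toy sketch of Bigtable's data model: a map from
# (row, column, timestamp) to an uninterpreted byte string.
# All names are illustrative, not Bigtable's actual API.

class ToyTable:
    def __init__(self):
        self.cells = {}  # (row, column, timestamp) -> bytes

    def put(self, row: str, column: str, timestamp: int, value: bytes):
        self.cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str, timestamp: int) -> bytes:
        return self.cells[(row, column, timestamp)]

t = ToyTable()
t.put("com.cnn.www", "contents:", 6, b"<html>...")
assert t.get("com.cnn.www", "contents:", 6) == b"<html>..."
```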
6
Data Model: Figure 1
7
Rows: Tablets A tablet is a row range of a table, created by dynamic partitioning
The tablet is the unit of distribution and load balancing Table → Tablet
8
Rows Reads of short row ranges are efficient and typically require communication with only a small number of machines
Choosing row keys well can therefore yield good locality
9
Columns The column family is the unit of access control The family name comes before the qualifier in the column key
A table has a small number of column families but may have a very large number of columns
10
Columns Column keys use family:qualifier syntax
Family names must be printable; qualifiers may be arbitrary strings
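A minimal sketch of splitting a column key into its family and qualifier (the helper name is invented for illustration):

```python
def split_column_key(column_key: str):
    """Split a column key of the form 'family:qualifier'.

    The family name must be printable; the qualifier may be any
    string, including empty (as in 'contents:'). Illustrative only.
    """
    family, _, qualifier = column_key.partition(":")
    assert family.isprintable()
    return family, qualifier

assert split_column_key("anchor:cnnsi.com") == ("anchor", "cnnsi.com")
assert split_column_key("contents:") == ("contents", "")
```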
11
Columns: Example 1 Column family: language (the language a web page is written in)
Each cell stores the web page's language ID
12
Columns: Example 2 Family: anchor Each column key names a single anchor (the referring site)
13
Timestamps Multiple versions of the same data, indexed by 64-bit integer timestamps
Can be assigned by Bigtable (real time) or by the client Client-assigned timestamps must be unique to avoid collisions Versions are stored in decreasing timestamp order, so the most recent is read first
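A toy sketch of the ordering rule above: versions of one cell kept sorted by timestamp descending, so the newest version is returned first (names invented for illustration):

```python
# Versions of one cell, kept in decreasing timestamp order so the
# most recent version is read first. Illustrative sketch only.

def insert_version(versions, timestamp, value):
    """Insert (timestamp, value), keeping the list sorted descending."""
    versions.append((timestamp, value))
    versions.sort(key=lambda tv: tv[0], reverse=True)

def latest(versions):
    """Most recent version: the first element of the descending list."""
    return versions[0]

cell = []
insert_version(cell, 3, "t3")
insert_version(cell, 6, "t6")
insert_version(cell, 5, "t5")
assert latest(cell) == (6, "t6")
```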
14
Timestamps Two per-column-family settings let Bigtable garbage-collect cell versions automatically
1) Keep only the last n versions 2) Keep only versions newer than a specified age
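A toy sketch of per-column-family garbage collection, assuming two illustrative policies (keep the last n versions; drop versions older than a cutoff timestamp) — the function and parameter names are invented:

```python
def garbage_collect(versions, keep_last_n=None, min_timestamp=None):
    """Apply garbage-collection settings to one cell's versions.

    `versions` is sorted by timestamp descending. Keep only the
    newest n versions and/or drop versions older than min_timestamp.
    Illustrative sketch, not Bigtable's actual mechanism.
    """
    kept = versions
    if keep_last_n is not None:
        kept = kept[:keep_last_n]
    if min_timestamp is not None:
        kept = [(ts, v) for ts, v in kept if ts >= min_timestamp]
    return kept

versions = [(9, "c"), (6, "b"), (3, "a")]
assert garbage_collect(versions, keep_last_n=2) == [(9, "c"), (6, "b")]
assert garbage_collect(versions, min_timestamp=5) == [(9, "c"), (6, "b")]
```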
15
API Create and delete tables and column families Alter cluster, table, and
column-family metadata (e.g., access control rights)
16
API: Figure 2
17
API: Figure 3
18
API Single-row transactions Allows cells to be used as integer counters
Execution of client-supplied scripts written in Sawzall Can be used with MapReduce
19
Building Blocks Google File System (GFS) Google SSTable (file format)
Chubby (lock service)
20
Google File System (GFS)
Stores log and data files A Bigtable cluster runs on a shared pool of machines that execute many other jobs for different reasons A cluster management system schedules jobs, manages resources, deals with machine failures, and monitors machine status
21
SSTable Stores Bigtable data A map from keys to values
Persistent Immutable Ordered Both keys and values are arbitrary byte strings
22
SSTable A sequence of blocks (typically 64 KB) plus a block index used to locate blocks
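A minimal sketch of how a block index can locate the one block that could contain a key — binary search over each block's last key. The class and block layout are invented for illustration:

```python
import bisect

# Toy SSTable block index: maps each block's last key to the block's
# byte offset; a binary search finds the single block that may
# contain a lookup key. Sizes and layout are illustrative.

class ToySSTableIndex:
    def __init__(self, last_keys, offsets):
        self.last_keys = last_keys   # sorted: last key of each block
        self.offsets = offsets       # byte offset of each block

    def block_for(self, key):
        """Offset of the one block that could contain `key`."""
        i = bisect.bisect_left(self.last_keys, key)
        if i == len(self.last_keys):
            return None  # key is past the end of the table
        return self.offsets[i]

idx = ToySSTableIndex(["dog", "lion", "zebra"], [0, 65536, 131072])
assert idx.block_for("cat") == 0        # "cat" <= "dog": first block
assert idx.block_for("mouse") == 131072 # "lion" < "mouse" <= "zebra"
```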
23
Chubby Five active replicas; one is elected master and serves requests
24
Chubby Namespace of directories and small files; each can be used as a lock
25
Bigtable and Chubby Bigtable uses Chubby to: Ensure there is at most one active master at a time
Store the bootstrap location of Bigtable data Discover tablet servers Finalize tablet server deaths Store column-family (schema) information Store access control lists
26
Implementation Three major components: A library linked into every client A master server
that assigns tablets to tablet servers Many tablet servers, each of which manages a set of tablets
27
Implementation Clients communicate directly with tablet servers for reads and writes; client data does not move through the master server
28
Cluster A cluster serves a set of tables; each table consists of tablets; each tablet contains the data of a row range
29
Tablet Location: Figure 4
30
METADATA table Stores the location of each tablet under a row key
Each METADATA row typically stores about 1 KB of data The client library caches tablet locations If the cache is incorrect, the client moves up the location hierarchy If the cache is empty, a lookup takes three network round trips If the cache is stale, it can take up to six
31
Tablet Assignment When a tablet server starts:
It acquires an exclusive lock on a uniquely named Chubby file The master monitors the servers' directory to discover tablet servers A tablet server stops serving its tablets if it loses its exclusive lock It tries to reacquire the lock as long as its file still exists If the file no longer exists, the server kills itself, releasing the lock
32
Tablet Assignment It is the master's job to detect tablet servers that stop serving their tablets
The master periodically asks each tablet server whether it still holds its lock If the server cannot be reached or has lost its lock, the master tries to acquire an exclusive lock on the server's file If it succeeds, the tablet server is either dead or having trouble contacting Chubby The master deletes the server's file and moves its tablets to the set of unassigned tablets
33
Tablet Assignment The set of existing tablets changes when:
A table is created A table is deleted Two tablets are merged A tablet is split Tablet splits are initiated by the tablet server, which updates the METADATA table and notifies the master
34
Tablet Serving: Figure 5
35
Tablet Serving Write operation: Checks that the request is well-formed
Checks authorization against a list of permitted writers in a Chubby file A valid mutation is written to the commit log, then its contents are inserted into the memtable Read operation: Checks well-formedness Checks authorization A valid read is executed on a merged view of the SSTables and the memtable
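The merged view used by reads can be sketched as follows: the memtable is consulted first, then SSTables from newest to oldest, so the most recent write for a key wins. This is an illustrative sketch with dicts standing in for SSTables:

```python
# Sketch of a read executed on a merged view of the SSTables and the
# memtable: the newest source containing the key wins. Illustrative;
# real SSTables are sorted immutable files, not dicts.

def merged_read(key, memtable, sstables):
    """`sstables` is ordered newest-first; the memtable is newest of all."""
    if key in memtable:
        return memtable[key]
    for sstable in sstables:
        if key in sstable:
            return sstable[key]
    return None

memtable = {"row2": "new"}
sstables = [{"row1": "v2"}, {"row1": "v1", "row3": "old"}]
assert merged_read("row2", memtable, sstables) == "new"
assert merged_read("row1", memtable, sstables) == "v2"
assert merged_read("row3", memtable, sstables) == "old"
```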
36
Minor Compaction Converts the memtable to an SSTable and writes it to GFS Shrinks memory usage on the tablet server
Reduces the amount of commit-log data that must be read during recovery
37
Merging Compaction Merges a few SSTables and the memtable into one new SSTable Discards the input SSTables and memtable when done
38
Major Compaction A merging compaction that rewrites all SSTables into exactly one SSTable The result contains no deleted data, reclaiming the resources deleted data used Bigtable periodically applies major compactions to all of its tablets
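A toy sketch of a major compaction: all SSTables and the memtable are merged into one output, newer entries shadow older ones, and deletion markers (and the data they shadow) are dropped. Dicts stand in for SSTables; the names are invented:

```python
# Sketch of a major compaction. Real SSTables are sorted immutable
# files merged with a streaming merge; dicts are used here only to
# illustrate the shadowing and deletion-marker rules.

DELETED = object()  # stand-in for a deletion marker

def major_compaction(memtable, sstables):
    """`sstables` is ordered newest-first; memtable is newest of all."""
    merged = {}
    # Apply oldest source first so newer writes overwrite older ones.
    for source in list(reversed(sstables)) + [memtable]:
        merged.update(source)
    # The output contains no deletion markers and no deleted data.
    return {k: v for k, v in merged.items() if v is not DELETED}

old = {"a": "1", "b": "2"}
newer = {"b": DELETED}       # a later deletion of row "b"
memtable = {"c": "3"}
assert major_compaction(memtable, [newer, old]) == {"a": "1", "c": "3"}
```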
39
Refinements Locality groups Compression Caching for read performance
Bloom filters Commit-log implementation Speeding up tablet recovery Exploiting immutability
40
Refinements Locality Groups Compression
Locality groups: multiple column families grouped together Each locality group gets its own SSTable Enables more efficient reads Compression: users specify a compression format per locality group Applied when compressing each SSTable A two-pass compression scheme is both fast and space-efficient
41
Refinements Caching for read performance: two levels of caching
The Scan Cache caches key-value pairs; the Block Cache caches SSTable blocks read from GFS Both reduce the number of disk accesses Bloom filters: one per SSTable in a locality group Answer whether an SSTable might contain data for a given row/column pair
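A minimal Bloom filter sketch: it answers "might this SSTable contain data for a given row/column pair?" with no false negatives, so most lookups for absent rows and columns avoid touching disk. The bit-array size and hash scheme are invented for illustration:

```python
import hashlib

# Toy Bloom filter. A positive answer means "maybe present"; a
# negative answer means "definitely absent". Parameters are
# illustrative, not what Bigtable actually uses.

class ToyBloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive several bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.num_bits

    def add(self, row, column):
        for pos in self._positions((row, column)):
            self.bits[pos] = True

    def might_contain(self, row, column):
        return all(self.bits[pos] for pos in self._positions((row, column)))

bf = ToyBloomFilter()
bf.add("com.cnn.www", "anchor:cnnsi.com")
assert bf.might_contain("com.cnn.www", "anchor:cnnsi.com")
```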
42
Refinements Commit-log implementation: mutations are appended to a single commit log per tablet server Using one log has performance benefits but complicates recovery Speeding up tablet recovery: avoid replaying the log when the master moves a tablet to a different server The source server performs a minor compaction, Tablet Server 1 stops serving the tablet, and the tablet is loaded onto Tablet Server 2 with no recovery of log entries required
43
Refinements Exploiting immutability: SSTables are immutable
No synchronization of accesses is needed when reading them Permanently deleting data becomes garbage-collecting obsolete SSTables Tablets can be split quickly Only the memtable is mutable; each memtable row is copy-on-write so reads and writes can proceed in parallel
44
Performance Evaluation
A Bigtable cluster of N tablet servers Each tablet server was configured to use 1 GB of memory and to write to a shared GFS cell Client machines had sufficient physical memory
45
Performance Evaluation
The machines were arranged in a two-level tree-shaped switched network, with on the order of 100 Gbps of aggregate bandwidth available at the root Tablet servers, the master, test clients, and GFS servers all ran on the same set of machines Every machine ran a GFS server; machines also ran either a tablet server or a client process, alongside processes from other jobs
46
Performance Evaluation
Six benchmarks: sequential write, random write, sequential read, random read, random read from memory, and scan
47
Write Benchmarks Sequential: used row keys 0 to R−1, where R is the number of distinct Bigtable row keys involved in the test
The key space was partitioned into 10N equal-sized ranges, assigned dynamically to the N clients A single string was written under each row key; the row keys were distinct Random: similar to the sequential write, except each row key was hashed modulo R before writing, so the write load spread evenly across the row space
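The benchmarks' key choice can be sketched as follows: sequential writes use keys 0..R−1 in order, while random writes hash the sequence number modulo R to scatter the load across the row space. The specific hash (MD5 here) is an assumption for illustration, not the paper's:

```python
import hashlib

# Sketch of benchmark row-key generation. The exact hash function is
# illustrative; the point is hashing modulo R to spread write load
# evenly across the row space.

def sequential_key(i, R):
    return i % R

def random_key(i, R):
    digest = hashlib.md5(str(i).encode()).hexdigest()
    return int(digest, 16) % R

R = 1000
seq = [sequential_key(i, R) for i in range(R)]
rnd = [random_key(i, R) for i in range(R)]
assert seq == sorted(seq)           # sequential keys arrive in order
assert rnd != sorted(rnd)           # hashed keys arrive scattered
assert all(0 <= k < R for k in rnd) # but stay within the row space
```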
48
Read Benchmarks Sequential: generated row keys exactly as the sequential write did
and read the string stored under each row key Random: similar, except each row key was hashed before reading the string stored under it
49
Read/Scan Benchmarks Random read from memory: similar to the random read benchmark,
but the locality group is marked in-memory, so reads are satisfied from tablet server memory Scan: similar to the sequential read, but scans over all values in a row range using Bigtable API support, which reduces the number of RPCs
50
Performance Evaluation: Figure 6
51
Single Tablet-Server Slowest: random reads
Random reads from memory are much faster than random reads from disk Random and sequential writes perform about the same (both append to the same commit log) Sequential reads outperform random reads Scans are faster still
52
Scaling Increased the number of tablet servers from 1 to 500
Aggregate throughput rises, but per-server performance does not scale linearly There is a significant drop in per-server throughput going from 1 to 50 servers Random reads scaled worst
53
Real Applications Google Analytics Personalized Search Google Earth
Google Finance Orkut Writely (later Google Docs)
54
Real Applications: Table 1
55
Real Applications: Table 2
56
Personalized Search Each user's data goes in Bigtable, in a row keyed by the user id
All user actions are stored, with a separate column family per type of action The data is replicated over several Bigtable clusters
57
Google Earth A preprocessing table stores raw imagery The data is cleaned
and entered into a final serving table Rows are named by geographic segment Column families track the sources of data for each segment
58
Google Analytics A raw click table and a summary table
The raw click table has a row for each end-user session, keyed by the tuple (website name, session creation time) The summary table stores various predefined summaries for each website
59
Lessons A system of this type has many vulnerabilities:
Memory and network corruption Problems in the systems it depends on Planned and unplanned maintenance Other lessons: understand how features will actually be used Have proper system-level monitoring Value simplicity
60
Conclusion Bigtable is Google's distributed storage system for structured data
It underlies many other Google applications Bigtable is scalable and efficient Google users found Bigtable easy to use and helpful
61
Future Work Support for secondary indices
Infrastructure for building cross-data-center replicated Bigtables with multiple master replicas Keeping Bigtable working well and fixing bugs as they arise
62
Thanks/Questions?