Databases Architectures & Hypertable Doug Judd CEO, Hypertable, Inc.

1 Databases Architectures & Hypertable Doug Judd CEO, Hypertable, Inc.

2 Database Terminology

3 Structured, Semi-Structured, and Unstructured Data
- Structured data is what RDBMSs store
  - Data is broken into discrete components
  - Types are associated with each component: integer, floating point, date, string
- Unstructured data is free-form text
- Semi-structured data is a combination of structured and unstructured data

4 Document-Oriented
- Semi-structured documents
- Accepts documents in a format such as JSON, XML, or YAML
- Often schema-less
- Auto-indexes fields
- Examples: CouchDB, MongoDB
- Best fit: XML or web documents
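
A toy sketch of what "schema-less" and "auto-index fields" mean in practice. Documents of different shapes, parsed from JSON, go into the same store, and every top-level scalar field is indexed on insert so lookups avoid a full scan. `TinyDocStore` is a hypothetical class. It does not reproduce the API or the indexing policy of CouchDB or MongoDB.

```python
import json
from collections import defaultdict
from typing import Any


class TinyDocStore:
    """Hypothetical schema-less store: every top-level scalar field is indexed on insert."""

    def __init__(self) -> None:
        self._docs: dict[int, dict[str, Any]] = {}
        # field name -> field value -> ids of documents holding that value
        self._index: dict[str, dict[Any, set[int]]] = defaultdict(lambda: defaultdict(set))
        self._next_id = 0

    def insert(self, doc: dict[str, Any]) -> int:
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = doc
        for field, value in doc.items():
            # Only hashable scalars are indexed; nested lists/objects are skipped in this sketch.
            if isinstance(value, (str, int, float, bool)):
                self._index[field][value].add(doc_id)
        return doc_id

    def find(self, field: str, value: Any) -> list[dict[str, Any]]:
        # Index lookup instead of a full scan; .get() avoids creating empty index entries.
        ids = self._index.get(field, {}).get(value, set())
        return [self._docs[i] for i in sorted(ids)]


store = TinyDocStore()
# Two documents with different shapes; no schema declaration is needed.
store.insert(json.loads('{"type": "post", "title": "Hello", "tags": ["db", "nosql"]}'))
store.insert(json.loads('{"type": "user", "name": "alice"}'))
print(store.find("type", "user"))  # [{'type': 'user', 'name': 'alice'}]
```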

5 Graph Databases
- Database designed to represent graphs
- APIs for performing graph operations:
  - Traversal (depth-first, breadth-first)
  - Shortest/cheapest path
  - Partitioning
- Some allow hypergraphs
- Examples: Neo4j, HyperGraphDB, InfoGrid, AllegroGraph, Sones, DEX, FlockDB, OrientDB, VertexDB, InfiniteGraph, Filament
- More info: sones graphdb landscape
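
A minimal sketch of two of the graph operations listed above: breadth-first traversal, and cheapest path using Dijkstra's algorithm, which assumes non-negative edge costs. The graph is a hypothetical adjacency dict. Graph databases expose these operations through their own APIs, and this code is not any of them.

```python
import heapq
from collections import deque

# Hypothetical weighted, directed graph: node -> {neighbor: edge cost}
Graph = dict[str, dict[str, float]]


def bfs_order(graph: Graph, start: str) -> list[str]:
    """Breadth-first traversal; neighbors are visited in insertion order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in graph.get(node, {}):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order


def cheapest_path(graph: Graph, src: str, dst: str) -> tuple[float, list[str]]:
    """Dijkstra's algorithm; assumes non-negative edge costs."""
    dist = {src: 0.0}
    prev: dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry superseded by a cheaper one
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        return float("inf"), []
    # Walk predecessor links back from the destination.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


g: Graph = {
    "a": {"b": 1, "c": 4},
    "b": {"c": 1, "d": 5},
    "c": {"d": 1},
    "d": {},
}
print(bfs_order(g, "a"))           # ['a', 'b', 'c', 'd']
print(cheapest_path(g, "a", "d"))  # (3.0, ['a', 'b', 'c', 'd'])
```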

6 Column-Oriented
- Data physically stored by column
  - RDBMSs are typically row-oriented
- Improved performance for column operations
- Better data compression
- Examples: Hypertable, HBase, Cassandra, Vertica
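
The sketch below contrasts the two physical layouts. Plain Python lists stand in for on-disk storage, which is an illustrative assumption since real column stores work with pages and files. In the columnar layout, summing one column reads only that column. A column also holds values of a single type that often repeat, so even simple run-length encoding shrinks it. The table contents are hypothetical.

```python
from itertools import groupby

# Hypothetical table: (user_id, country, clicks)
rows = [
    (1, "US", 10),
    (2, "US", 3),
    (3, "US", 7),
    (4, "DE", 5),
    (5, "DE", 2),
]

# Row-oriented layout: each record's fields are stored together.
row_store = list(rows)

# Column-oriented layout: each column's values are stored together.
col_store = {
    "user_id": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "clicks": [r[2] for r in rows],
}


def total_clicks_row_store(store):
    # Touches every full record even though only one field is needed.
    return sum(record[2] for record in store)


def total_clicks_col_store(store):
    # Reads a single column and nothing else.
    return sum(store["clicks"])


def run_length_encode(values):
    """Compress runs of repeated values into (value, count) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


print(total_clicks_row_store(row_store), total_clicks_col_store(col_store))  # 27 27
print(run_length_encode(col_store["country"]))  # [('US', 3), ('DE', 2)]
```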

7 In-Memory
- Data set stored in RAM
- Extremely fast access
- Limited capacity
- Examples: Memcached, Redis, MonetDB, VoltDB

8 Horizontal Scalability
- Scale out
- Increase capacity by adding machines
- Opposite of vertical scalability (scale up)
- Commodity hardware

9 Distributed Hash Table (DHT)
- Horizontally scalable
- Decentralized
- Fast access
- Restricted API: GET, SET, DELETE
- Peer-to-peer file sharing systems: BitTorrent, Napster, Gnutella, Freenet
- Examples: Dynamo, Cassandra, Riak, Project Voldemort, SimpleDB, S3, Redis, Scalaris, Membase
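
A minimal sketch of the restricted GET/SET/DELETE interface. It assumes the simplest placement rule, `hash(key) mod N`. Placement is a pure function of the key, so any peer or client can compute a key's owner without a central directory, and that is what makes the design decentralized. `SimpleDHT` and `Node` are hypothetical names, not the API of any system listed on the slide.

```python
import hashlib
from typing import Optional


class Node:
    """One storage node; holds only the keys that hash to it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, bytes] = {}


class SimpleDHT:
    """Hypothetical key-value store spread over nodes, exposing only GET/SET/DELETE.

    Placement is hash(key) mod N, the simplest possible scheme (an assumption,
    not how any of the listed systems place keys).
    """

    def __init__(self, node_names: list[str]) -> None:
        self.nodes = [Node(n) for n in node_names]

    def _node_for(self, key: str) -> Node:
        # md5 gives a hash that is stable across processes, unlike built-in hash().
        digest = hashlib.md5(key.encode()).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def set(self, key: str, value: bytes) -> None:
        self._node_for(key).data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._node_for(key).data.get(key)

    def delete(self, key: str) -> None:
        self._node_for(key).data.pop(key, None)


dht = SimpleDHT(["node-a", "node-b", "node-c"])
dht.set("cart:42", b"3 items")
print(dht.get("cart:42"))  # b'3 items'
dht.delete("cart:42")
print(dht.get("cart:42"))  # None
```

Mod-N placement reassigns most keys whenever a node joins or leaves. Dynamo-style systems avoid this by using consistent hashing instead.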

10 Scalable Database Architectures

11 Auto-Sharding
- Splits table data into horizontal shards
- Shards managed by a traditional RDBMS (e.g. MySQL, Postgres)
- Automated glue code handles sharding and request routing
- Examples: MongoDB, AsterData, Greenplum
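
The routing and splitting glue can be sketched as follows. The sketch assumes range-based sharding on a string key. A sorted list of split points maps each key to a shard, and any shard that grows past a size threshold is split at its median key. Each shard is a plain dict standing in for a MySQL or Postgres instance. `ShardRouter` and `max_rows_per_shard` are hypothetical, and the products listed above may partition by hash or use other split policies.

```python
import bisect
from typing import Optional


class ShardRouter:
    """Hypothetical router: range shards, split when a shard exceeds max_rows_per_shard.

    Each shard is a dict standing in for a separate RDBMS instance.
    """

    def __init__(self, max_rows_per_shard: int = 4) -> None:
        self.max_rows = max_rows_per_shard
        # split_points[i] is the smallest key that belongs to shard i + 1
        self.split_points: list[str] = []
        self.shards: list[dict[str, str]] = [{}]

    def _shard_index(self, key: str) -> int:
        # Number of split points <= key is the index of the owning shard.
        return bisect.bisect_right(self.split_points, key)

    def put(self, key: str, value: str) -> None:
        idx = self._shard_index(key)
        shard = self.shards[idx]
        shard[key] = value
        if len(shard) > self.max_rows:
            self._split(idx)

    def get(self, key: str) -> Optional[str]:
        return self.shards[self._shard_index(key)].get(key)

    def _split(self, idx: int) -> None:
        # Move the upper half of the shard's key range into a new shard.
        keys = sorted(self.shards[idx])
        mid_key = keys[len(keys) // 2]
        upper = {k: self.shards[idx].pop(k) for k in keys if k >= mid_key}
        self.shards.insert(idx + 1, upper)
        self.split_points.insert(idx, mid_key)


router = ShardRouter(max_rows_per_shard=4)
for i in range(10):
    router.put(f"user{i:02d}", f"profile {i}")
print(router.split_points)     # ['user02', 'user04', 'user06']
print(router.get("user07"))    # profile 7
```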

12 MongoDB

13 Dynamo
- Developed by Amazon.com for its shopping cart
- Designed for high write availability
- Eventually consistent DHT
- Implementations:
  - Cassandra
  - Project Voldemort
  - Riak
  - Dynomite

14 Eventual Consistency
- Database update semantics in a distributed system with data replication
- Strong consistency: after an update completes, all processes see the updated value
- Eventual consistency: all processes will eventually see the updated value
- The best-known eventually consistent system is DNS
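
The difference between the two semantics shows up in a small simulation. A write is applied to one replica immediately and queued for the others, so a read from a lagging replica returns stale data until the queue is drained. The classes, the explicit `propagate()` step, the single global counter used as a timestamp, and last-writer-wins conflict resolution are all assumptions chosen for brevity. Dynamo, for example, tracks versions with vector clocks instead.

```python
from collections import deque
from typing import Optional


class Replica:
    def __init__(self) -> None:
        # key -> (timestamp, value); a higher timestamp wins (last-writer-wins)
        self.store: dict[str, tuple[int, str]] = {}

    def apply(self, key: str, ts: int, value: str) -> None:
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.store.get(key)
        return entry[1] if entry else None


class EventuallyConsistentStore:
    """Hypothetical model: a write lands on one replica now and on the others later.

    Replication is an explicit queue drained by propagate(), and a single
    counter stands in for timestamps; real systems have neither luxury.
    """

    def __init__(self, n_replicas: int = 3) -> None:
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending: deque = deque()  # entries: (replica index, key, ts, value)
        self.clock = 0

    def write(self, key: str, value: str, via: int = 0) -> None:
        self.clock += 1
        self.replicas[via].apply(key, self.clock, value)
        for i in range(len(self.replicas)):
            if i != via:
                self.pending.append((i, key, self.clock, value))

    def read(self, key: str, replica: int) -> Optional[str]:
        return self.replicas[replica].read(key)

    def propagate(self) -> None:
        # Deliver every queued update; afterwards all replicas agree.
        while self.pending:
            i, key, ts, value = self.pending.popleft()
            self.replicas[i].apply(key, ts, value)


db = EventuallyConsistentStore()
db.write("www.example.com", "10.0.0.1", via=0)
print(db.read("www.example.com", replica=0))  # 10.0.0.1
print(db.read("www.example.com", replica=2))  # None: update not yet propagated
db.propagate()
print([db.read("www.example.com", r) for r in range(3)])  # ['10.0.0.1', '10.0.0.1', '10.0.0.1']
```

Under strong consistency, the second read would already return `10.0.0.1`.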

15 Eventual Consistency

16 Consistent Hashing
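
A minimal consistent-hashing sketch. Nodes and keys are hashed onto the same ring, and each key belongs to the first node point at or clockwise from its position. Each node gets many virtual points to even out its share. The class name, the use of MD5 as the ring hash, and the 100 virtual points per node are assumptions for illustration. Adding a fourth node only moves the keys that now fall just before one of the new node's points.

```python
import bisect
import hashlib


def ring_position(s: str) -> int:
    # Stable 64-bit position on the ring (built-in hash() is salted per process).
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hashing ring; each node owns many virtual points."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes  # virtual points per node (assumed default)
        self._ring: list[tuple[int, str]] = []
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # First point at or after the key's position owns it;
        # past the largest point, wrap around to the start of the ring.
        idx = bisect.bisect_left(self._ring, (ring_position(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"key{i}" for i in range(10_000)]
ring = HashRing(["A", "B", "C"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("D")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")  # about a quarter, all of them to D
```

By contrast, with `hash(key) mod N` placement, growing from three to four nodes changes the owner of about three quarters of the keys.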

17 Amazon AWS
- S3
  - Online storage web service
  - Designed for larger amounts of data
  - Cost: $0.15/GB per month
- SimpleDB
  - Designed for smaller amounts of data
  - Provides indexing and richer query capability
  - Cost: $0.27/GB per month + machine utilization fee
- RDS
  - Managed MySQL instances

18 Order Preserving Partitioner (Cassandra)

19 Order Preserving Partitioner Balance Problem
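
The balance problem can be reproduced with a hypothetical four-node cluster whose range boundaries split the alphabet evenly. An order-preserving partitioner keeps adjacent keys together, which makes range scans cheap. A skewed key distribution, however, piles onto one node. Hashing the key, in the spirit of Cassandra's RandomPartitioner, spreads the same keys evenly but gives up key order. The boundaries in `OPP_TOKENS` and the workload are assumptions for illustration.

```python
import bisect
import hashlib
from collections import Counter

NODES = ["node0", "node1", "node2", "node3"]
# Hypothetical range boundaries: node0 < "g" <= node1 < "n" <= node2 < "t" <= node3
OPP_TOKENS = ["g", "n", "t"]


def opp_node(key: str) -> str:
    """Order-preserving: a key goes to the node whose range contains it."""
    return NODES[bisect.bisect_right(OPP_TOKENS, key)]


def random_node(key: str) -> str:
    """Hash placement: ignores key order and spreads keys roughly evenly."""
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    return NODES[h % len(NODES)]


# Skewed workload (assumed): 900 keys start with "s", 100 start with "a".
keys = [f"s{i:05d}" for i in range(900)] + [f"a{i:05d}" for i in range(100)]

opp_load = Counter(opp_node(k) for k in keys)
hash_load = Counter(random_node(k) for k in keys)
print(opp_load)   # Counter({'node2': 900, 'node0': 100})
print(hash_load)  # roughly 250 keys per node
```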

20 Bigtable: the infrastructure that Google is built on
- Bigtable underpins 100+ Google services, including: YouTube, Blogger, Google Earth, Google Maps, Orkut, Gmail, Google Analytics, Google Book Search, Google Code, Crawl Database…
- Implementations:
  - Hypertable
  - HBase

21 Google Stack
- GFS: replicates data across machines
- MapReduce: efficiently processes data in GFS
- Bigtable: indexed table structure
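
The MapReduce programming model can be illustrated with the classic word-count job, run here in a single process. Map emits `(word, 1)` pairs, a shuffle groups them by key, and reduce sums each group. In the real stack, map tasks run on the machines that hold the GFS chunks and the shuffle crosses the network. The function names below are hypothetical, and this is not Google's or Hadoop's API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map: emit (word, 1) for every word in one input record."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: combine all values for one key."""
    return word, sum(counts)


def mapreduce(records):
    intermediate = chain.from_iterable(map_phase(k, v) for k, v in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog"), ("doc3", "The fox")]
print(mapreduce(docs))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```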

22 Google File System

23 Google File System

24 System Overview

25 Data Model
- Sparse, two-dimensional table with cell versions
- Cells are identified by a 4-part key:
  - Row (string)
  - Column Family (byte)
  - Column Qualifier (string)
  - Timestamp (long integer)
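
A minimal sketch of the 4-part key. It assumes cells are kept as one sorted sequence of key/value pairs, ordered by row, column family, column qualifier, and then timestamp with the newest version first. The Bigtable paper likewise stores versions in decreasing timestamp order. `SparseTable` and `latest` are hypothetical names, not Hypertable's API. The column family is modeled as a small integer to match the "(byte)" type above.

```python
import bisect
from typing import NamedTuple, Optional


class Key(NamedTuple):
    row: str
    family: int      # column family id, one byte (0-255)
    qualifier: str
    timestamp: int   # larger = newer


class SparseTable:
    """Hypothetical cell store: one sorted list; absent cells take no space."""

    def __init__(self) -> None:
        self._keys: list[tuple] = []
        self._values: list[bytes] = []

    @staticmethod
    def _sort_key(k: Key) -> tuple:
        # Negate the timestamp so newer versions sort before older ones.
        return (k.row, k.family, k.qualifier, -k.timestamp)

    def put(self, key: Key, value: bytes) -> None:
        if not 0 <= key.family <= 255:
            raise ValueError("column family id must fit in one byte")
        sk = self._sort_key(key)
        i = bisect.bisect_left(self._keys, sk)
        if i < len(self._keys) and self._keys[i] == sk:
            self._values[i] = value  # identical 4-part key: overwrite
        else:
            self._keys.insert(i, sk)
            self._values.insert(i, value)

    def latest(self, row: str, family: int, qualifier: str) -> Optional[bytes]:
        # A 3-tuple prefix sorts before all its 4-tuple versions,
        # so the first match is the newest version of the cell.
        i = bisect.bisect_left(self._keys, (row, family, qualifier))
        if i < len(self._keys) and self._keys[i][:3] == (row, family, qualifier):
            return self._values[i]
        return None


t = SparseTable()
t.put(Key("com.example/", 1, "title", 100), b"Old title")
t.put(Key("com.example/", 1, "title", 200), b"New title")
t.put(Key("org.sample/", 2, "anchor:home", 150), b"Sample")  # other rows need not have this column
print(t.latest("com.example/", 1, "title"))  # b'New title'
print(t.latest("org.sample/", 1, "title"))   # None: the cell is simply absent
```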

26 Table: Visual Representation

27 Table: Actual Representation

28 Scaling (part I)

29 Scaling (part II)

30 Scaling (part III)

31 Request Routing

32 Hypertable

33 Hypertable Overview
- Massively scalable database
- Modeled after Google's Bigtable
- High-performance implementation (C++)
- Thrift interface for all popular high-level languages: Java, Ruby, Python, PHP, etc.
- Open source (GPL license)
- Project started in March at Zvents

34 Hypertable In Use Today

35 Hypertable vs. HBase

36 Hypertable vs. HBase

Test                             Hypertable Advantage Relative to HBase (%)
Random Read Zipfian 80 GB        925
Random Read Zipfian 20 GB        777
Random Read Zipfian 2.5 GB       100
Random Write 10 KB values        51
Random Write 1 KB values         102
Random Write 100 byte values     427
Random Write 10 byte values      931
Sequential Read 10 KB values     1060
Sequential Read 1 KB values      68
Sequential Read 100 byte values  129
Scan 10 KB values                2
Scan 1 KB values                 58
Scan 100 byte values             75
Scan 10 byte values              220

37 Annual EC2 Cost Savings
- Assuming 200% improvement
- Extra large reserved instances
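
The savings arithmetic behind a slide like this can be sketched as follows. The sketch assumes that a 200% improvement means three times the throughput per machine, so the same workload needs one third as many instances. It also assumes the workload is throughput-bound. The prices are made up: `BASELINE`, `UPFRONT`, and `HOURLY` are hypothetical placeholders, not the slide's figures or current EC2 reserved-instance rates.

```python
import math

HOURS_PER_YEAR = 24 * 365


def instances_needed(baseline_instances: int, improvement_pct: float) -> int:
    """Assumes an X% improvement means (1 + X/100) times the throughput per instance."""
    speedup = 1 + improvement_pct / 100
    return math.ceil(baseline_instances / speedup)


def annual_cost(instances: int, upfront: float, hourly: float) -> float:
    """One-year reserved-instance cost: upfront fee plus hourly usage for a full year."""
    return instances * (upfront + hourly * HOURS_PER_YEAR)


# Hypothetical inputs; substitute real cluster sizes and current EC2 prices.
BASELINE = 30     # instances the slower system needs
UPFRONT = 1000.0  # assumed one-year upfront fee per instance (USD)
HOURLY = 0.10     # assumed hourly usage rate per instance (USD)

fewer = instances_needed(BASELINE, 200)
savings = annual_cost(BASELINE, UPFRONT, HOURLY) - annual_cost(fewer, UPFRONT, HOURLY)
print(fewer, round(savings, 2))  # 10 37520.0
```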

38 Resources
- Project Site
- Twitter: hypertable
- Commercial Support
- Performance Evaluation Write-up: blog.hypertable.com/?p=14

39 Q&A

