Databases Architectures & Hypertable Doug Judd CEO, Hypertable, Inc.


1 Databases Architectures & Hypertable Doug Judd CEO, Hypertable, Inc.

2 Database Terminology

3 www.hypertable.org Structured, Semi-Structured, and Unstructured Data
- Structured is what RDBMS store
  - Data is broken into discrete components
  - Types associated with each component: integer, floating point, date, string
- Unstructured is free-form text
- Semi-structured is a combination of structured and unstructured

4 Document-Oriented
- Semi-structured documents
- Accepts documents in a format such as JSON, XML, YAML
- Often schema-less
- Auto-index fields
- Examples: CouchDB, MongoDB
- Best fit: XML or Web documents

5 Graph Databases
- Database designed to represent graphs
- APIs for performing graph operations
  - Traversal (depth-first, breadth-first)
  - Shortest/Cheapest path
  - Partitioning
- Some allow hypergraphs
- Examples: Neo4j, HyperGraphDB, InfoGrid, AllegroGraph, Sones, DEX, FlockDB, OrientDB, VertexDB, InfiniteGraph, Filament
- More info: sones graphdb landscape
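
The traversal and shortest-path operations listed on this slide can be illustrated with a small sketch. It assumes an unweighted graph held in a plain dictionary; the function name `bfs_shortest_path` and the sample graph are made up for illustration and do not come from any of the products named above.

```python
from collections import deque


def bfs_shortest_path(graph, start, goal):
    """Breadth-first search returning one shortest path (fewest edges), or None."""
    queue = deque([[start]])   # each queue entry is a path from start
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical adjacency list: node -> list of directly reachable nodes.
sample_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
print(bfs_shortest_path(sample_graph, "a", "e"))
```

A cheapest-path query over weighted edges would need a different algorithm (for example Dijkstra's); breadth-first search only minimizes the number of hops.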

6 Column-Oriented
- Data physically stored by column
- RDBMS typically row-oriented
- Improved performance for column operations
- Better data compression
- Examples: Hypertable, HBase, Cassandra, Vertica
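
As a rough illustration of why a column layout helps column operations and compression, the sketch below pivots a few made-up rows into per-column lists, aggregates one column, and run-length encodes another. The data and helper names (`to_columns`, `run_length_encode`) are assumptions for the example, not how any listed system stores data on disk.

```python
def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


# Hypothetical row-oriented data.
rows = [
    {"id": 1, "city": "Oslo", "temp": 4},
    {"id": 2, "city": "Oslo", "temp": 6},
    {"id": 3, "city": "Lima", "temp": 20},
]

columns = to_columns(rows)
# A column operation touches only the values it needs.
average_temp = sum(columns["temp"]) / len(columns["temp"])
# Values of one type stored together often compress well.
city_runs = run_length_encode(columns["city"])
print(average_temp, city_runs)
```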

7 In-Memory
- Data set stored in RAM
- Extremely fast access
- Limited capacity
- Examples: Memcached, Redis, MonetDB, VoltDB

8 Horizontal Scalability
- Scale out
- Increase capacity by adding machines
- Opposite of vertical scalability (scale up)
- Commodity hardware

9 Distributed Hash Table (DHT)
- Horizontally scalable
- Decentralized
- Fast access
- Restricted API: GET, SET, DELETE
- Peer-to-peer file sharing systems: BitTorrent, Napster, Gnutella, Freenet
- Examples: Dynamo, Cassandra, Riak, Project Voldemort, SimpleDB, S3, Redis, Scalaris, Membase
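
The restricted GET/SET/DELETE interface can be sketched in a few lines. This is a single-process toy, assuming each "node" is just a dictionary and that keys are placed by hashing modulo the node count; the class name `TinyDHT` is hypothetical, and real DHTs typically use consistent hashing and network calls instead.

```python
import hashlib


class TinyDHT:
    """Toy hash-partitioned key/value store with a GET/SET/DELETE API."""

    def __init__(self, node_names):
        self.names = sorted(node_names)
        self.nodes = {name: {} for name in self.names}  # one dict per node

    def _owner(self, key):
        # Hash the key and pick a node; every client computes the same owner.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest, "big") % len(self.names)
        return self.nodes[self.names[index]]

    def set(self, key, value):
        self._owner(key)[key] = value

    def get(self, key):
        return self._owner(key).get(key)

    def delete(self, key):
        self._owner(key).pop(key, None)


dht = TinyDHT(["node-a", "node-b", "node-c"])
dht.set("user:42", "Ada")
print(dht.get("user:42"))
```

Note that the API offers no range scans or secondary lookups: because the hash scatters keys, only exact-key access is cheap.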

10 Scalable Database Architectures

11 Auto-Sharding
- Splits table data into horizontal shards
- Shards managed by traditional RDBMS (e.g. MySQL, Postgres)
- Automated glue code to handle sharding and request routing
- Examples: MongoDB, AsterData, Greenplum
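
The "glue code" for request routing might look roughly like the sketch below. It assumes range-based sharding on a string key with fixed split points; the `ShardRouter` class and the shard names are invented for illustration and are not taken from MongoDB, AsterData, or Greenplum.

```python
import bisect


class ShardRouter:
    """Maps a shard key to the shard whose key range contains it."""

    def __init__(self, split_points, shard_names):
        if len(shard_names) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = sorted(split_points)
        self.shard_names = shard_names

    def route(self, key):
        # Keys below the first split point go to shard 0, and so on.
        index = bisect.bisect_right(self.split_points, key)
        return self.shard_names[index]


# Hypothetical layout: [..., "h") -> shard-0, ["h", "p") -> shard-1, ["p", ...) -> shard-2
router = ShardRouter(["h", "p"], ["shard-0", "shard-1", "shard-2"])
for key in ["apple", "kiwi", "zebra"]:
    print(key, "->", router.route(key))
```

An auto-sharding layer would additionally move split points and migrate rows as shards grow; that part is omitted here.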

12 MongoDB

13 Dynamo
- Developed by Amazon.com for their Shopping Cart
- Designed for high write availability
- Eventually consistent DHT
- Implementations:
  - Cassandra
  - Project Voldemort
  - Riak
  - Dynomite

14 Eventual Consistency
- Database update semantics in a distributed system with data replication
- Strong consistency: after an update completes, all processes see the updated value
- Eventual consistency: eventually all processes will see the updated value
- The best-known eventually consistent system is DNS
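
The difference between the two semantics can be seen in a toy simulation. The sketch assumes one replica accepts the write immediately while updates to the others wait in a queue until `propagate()` is called; the class and method names are hypothetical and do not model any particular system's replication protocol.

```python
class EventuallyConsistentStore:
    """Toy replicated store: writes reach other replicas only on propagate()."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # queued (replica_index, key, value) updates

    def write(self, key, value):
        # The write completes once the first replica has it.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, replica_index, key):
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # Stand-in for background replication catching up.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore(replica_count=3)
store.write("cart", ["book"])
stale = store.read(2, "cart")      # replica 2 has not seen the write yet
store.propagate()
fresh = store.read(2, "cart")      # now every replica agrees
print(stale, fresh)
```

Under strong consistency, `write` would not return until every replica (or a quorum that reads must also consult) had applied the update, so the stale read could not happen.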

15 Eventual Consistency

16 Consistent Hashing

17 Amazon AWS
- S3
  - Online storage web service
  - Designed for larger amounts of data
  - Cost $0.15/GB per month
- SimpleDB
  - Designed for smaller amounts of data
  - Provides indexing and richer query capability
  - Cost $0.27/GB per month + machine utilization fee
- RDS
  - Managed MySQL instances

18 Order Preserving Partitioner (Cassandra)
  www.recipezaar.com       = 1091721999 … 629750272
+ www.ribbonprinters.com   = 1091721999 … 965293103
/ 2
= www.rgb????i?pQdp?.???   = 1091721999 … 297521687
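
The arithmetic on this slide can be approximated as follows. The sketch assumes each key is zero-padded to a common width and read as a big-endian integer, and the helper names are made up; it is not Cassandra's actual code, and it is not expected to reproduce the slide's integers digit for digit.

```python
def key_to_int(key: bytes, width: int) -> int:
    """Interpret a key as a big-endian integer, zero-padded on the right."""
    return int.from_bytes(key.ljust(width, b"\x00"), "big")


def midpoint(a: bytes, b: bytes) -> bytes:
    """Return the key halfway between a and b in integer space."""
    width = max(len(a), len(b))
    mid = (key_to_int(a, width) + key_to_int(b, width)) // 2
    return mid.to_bytes(width, "big")


left = b"www.recipezaar.com"
right = b"www.ribbonprinters.com"
mid = midpoint(left, right)
print(mid)  # starts with b"www.rgb", followed by mostly non-text bytes
```

The midpoint sorts between the two keys, which is what a partitioner needs from a split token, but most of its bytes are not readable text, much like the "?" characters on the slide.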

19 Order Preserving Partitioner Balance Problem

20 Bigtable: the infrastructure that Google is built on
- Bigtable underpins 100+ Google services, including: YouTube, Blogger, Google Earth, Google Maps, Orkut, Gmail, Google Analytics, Google Book Search, Google Code, Crawl Database…
- Implementations
  - Hypertable
  - HBase

21 Google Stack
- GFS: replicates data inter-machine
- MapReduce: efficiently process data in GFS
- Bigtable: indexed table structure

22 Google File System

23 Google File System

24 System Overview

25 Data Model
- Sparse, two-dimensional table with cell versions
- Cells are identified by a 4-part key
  - Row (string)
  - Column Family (byte)
  - Column Qualifier (string)
  - Timestamp (long integer)
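
One way to picture the 4-part key is as a tuple used to index a sparse map. The sketch below is only a mental model under that assumption; `CellKey`, `latest`, and the sample rows are hypothetical and say nothing about Hypertable's on-disk format or client API.

```python
from typing import NamedTuple, Optional


class CellKey(NamedTuple):
    row: str
    column_family: int      # the slide describes this as a byte
    column_qualifier: str
    timestamp: int          # larger means newer


# Sparse table: only cells that exist are stored; versions differ by timestamp.
cells = {
    CellKey("com.example.www", 1, "title", 100): "Old title",
    CellKey("com.example.www", 1, "title", 200): "New title",
    CellKey("com.example.www", 2, "lang", 150): "en",
}


def latest(table, row: str, family: int, qualifier: str) -> Optional[str]:
    """Return the newest version of one cell, or None if the cell is absent."""
    versions = [
        key for key in table
        if (key.row, key.column_family, key.column_qualifier) == (row, family, qualifier)
    ]
    if not versions:
        return None
    newest = max(versions, key=lambda key: key.timestamp)
    return table[newest]


print(latest(cells, "com.example.www", 1, "title"))
```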

26 Table: Visual Representation

27 Table: Actual Representation

28 Scaling (part I)

29 Scaling (part II)

30 Scaling (part III)

31 Request Routing

32 Hypertable

33 Hypertable Overview
- Massively scalable database
- Modeled after Google's Bigtable
- High-performance implementation (C++)
- Thrift interface for all popular high-level languages: Java, Ruby, Python, PHP, etc.
- Open source (GPL license)
- Project started March 2007 @ Zvents

34 Hypertable In Use Today

35 Hypertable vs. HBase

36 Hypertable vs. HBase

Test                               Hypertable Advantage Relative to HBase (%)
Random Read Zipfian 80 GB          925
Random Read Zipfian 20 GB          777
Random Read Zipfian 2.5 GB         100
Random Write 10KB values           51
Random Write 1KB values            102
Random Write 100 byte values       427
Random Write 10 byte values        931
Sequential Read 10KB values        1060
Sequential Read 1KB values         68
Sequential Read 100 byte values    129
Scan 10KB values                   2
Scan 1KB values                    58
Scan 100 byte values               75
Scan 10 byte values                220

37 Annual EC2 Cost Savings
- Assuming 200% improvement
- Extra large reserved instances

38 Resources
- Project Site: www.hypertable.org
- Twitter: hypertable
- Commercial Support: www.hypertable.com
- Performance Evaluation Write-up: blog.hypertable.com/?p=14

39 Q&A
