Data Model and Storage in NoSQL Systems (Bigtable, HBase) 1 Slides from Mohamed Eltabakh.

Slides:



Advertisements
Similar presentations
Chapter 10: Designing Databases
Advertisements

Tomcy Thankachan  Introduction  Data model  Building Blocks  Implementation  Refinements  Performance Evaluation  Real applications  Conclusion.
HBase. OUTLINE Basic Data Model Implementation – Architecture of HDFS Hbase Server HRegionServer 2.
CS525: Special Topics in DBs Large-Scale Data Management HBase Spring 2013 WPI, Mohamed Eltabakh 1.
The Hadoop RDBMS Replace Oracle with Hadoop John Leach CTO and Co-Founder J.
Christopher Shain Software Development Lead Tresata.
HBase Presented by Chintamani Siddeshwar Swathi Selvavinayakam
Google Bigtable A Distributed Storage System for Structured Data Hadi Salimi, Distributed Systems Laboratory, School of Computer Engineering, Iran University.
-A APACHE HADOOP PROJECT
7/2/2015EECS 584, Fall Bigtable: A Distributed Storage System for Structured Data Jing Zhang Reference: Handling Large Datasets at Google: Current.
Distributed storage for structured data
Bigtable: A Distributed Storage System for Structured Data
Inexpensive Scalable Information Access Many Internet applications need to access data for millions of concurrent users Relational DBMS technology cannot.
Gowtham Rajappan. HDFS – Hadoop Distributed File System modeled on Google GFS. Hadoop MapReduce – Similar to Google MapReduce Hbase – Similar to Google.
Thanks to our Sponsors! To connect to wireless 1. Choose Uguest in the wireless list 2. Open a browser. This will open a Uof U website 3. Choose Login.
Bigtable: A Distributed Storage System for Structured Data F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach M. Burrows, T. Chandra, A. Fikes, R.E.
SOFTWARE SYSTEMS DEVELOPMENT MAP-REDUCE, Hadoop, HBase.
Oracle Data Block Oracle Concepts Manual. Oracle Rows Oracle Concepts Manual.
Database Design – Lecture 16
HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce.
MapReduce – An overview Medha Atre (May 7, 2008) Dept of Computer Science Rensselaer Polytechnic Institute.
Panagiotis Antonopoulos Microsoft Corp Ioannis Konstantinou National Technical University of Athens Dimitrios Tsoumakos.
Distributed Indexing of Web Scale Datasets for the Cloud {ikons, eangelou, Computing Systems Laboratory School of Electrical.
Google’s Big Table 1 Source: Chang et al., 2006: Bigtable: A Distributed Storage System for Structured Data.
Data storing and data access. Plan Basic Java API for HBase – demo Bulk data loading Hands-on – Distributed storage for user files SQL on noSQL Summary.
BigTable and Accumulo CMSC 461 Michael Wilson. BigTable  This was Google’s original distributed data concept  Key value store  Meant to be scaled up.
1 Dennis Kafura – CS5204 – Operating Systems Big Table: Distributed Storage System For Structured Data Sergejs Melderis 1.
Hypertable Doug Judd Zvents, Inc.. hypertable.org Background.
Bigtable: A Distributed Storage System for Structured Data 1.
HBase. OUTLINE Basic Data Model Implementation – Architecture of HDFS Hbase Server HRegionServer 2.
+ Hbase: Hadoop Database B. Ramamurthy. + Motivation-0 Think about the goal of a typical application today and the data characteristics Application trend:
Partitioning Design For Performance and Maintainability Martin Cairns
Big Table - Slides by Jatin. Goals wide applicability Scalability high performance and high availability.
Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows,
Key/Value Stores CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook.
G063 - Distributed Databases. Learning Objectives: By the end of this topic you should be able to: explain how databases may be stored in more than one.
1 HBase Intro 王耀聰 陳威宇
Data storing and data access. Adding a row with Java API import org.apache.hadoop.hbase.* 1.Configuration creation Configuration config = HBaseConfiguration.create();
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation MongoDB Architecture.
CS 347Lecture 9B1 CS 347: Parallel and Distributed Data Management Notes 13: BigTable, HBASE, Cassandra Hector Garcia-Molina.
Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface.
HBase Elke A. Rundensteiner Fall 2013
Introduction of HBase Reporter: Hu Yi Overview HBase is an Apache open source project whose goal is to provide storage for the Hadoop Distributed.
CSC590 Selected Topics Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A.
Distributed Time Series Database
Nov 2006 Google released the paper on BigTable.
NOSQL DATABASE Not Only SQL DATABASE
Cloudera Kudu Introduction
Bigtable: A Distributed Storage System for Structured Data
1 HBASE – THE SCALABLE DATA STORE An Introduction to HBase XLDB Europe Workshop 2013: CERN, Geneva James Kinley EMEA Solutions Architect, Cloudera.
Data and Information Systems Laboratory University of Illinois Urbana-Champaign Data Mining Meeting Mar, From SQL to NoSQL Xiao Yu Mar 2012.
Bigtable: A Distributed Storage System for Structured Data Google Inc. OSDI 2006.
Department of Computer Science, Johns Hopkins University EN Instructor: Randal Burns 24 September 2013 NoSQL Data Models and Systems.
Apache Accumulo CMSC 491 Hadoop-Based Distributed Computing Spring 2016 Adam Shook.
SQL Basics Review Reviewing what we’ve learned so far…….
Bigtable A Distributed Storage System for Structured Data.
1 Gaurav Kohli Xebia Breaking with DBMS and Dating with Relational Hbase.
and Big Data Storage Systems
Amit Ohayon, seminar in databases, 2017
Column-Based.
HBase Mohamed Eltabakh
Software Systems Development
How did it start? • At Google • • • • Lots of semi structured data
CSE-291 (Cloud Computing) Fall 2016
NOSQL.
Gowtham Rajappan.
NOSQL databases and Big Data Storage Systems
Introduction to Apache
Hbase – NoSQL Database Presented By: 13MCEC13.
Cloud Computing for Data Analysis Pig|Hive|Hbase|Zookeeper
Presentation transcript:

Data Model and Storage in NoSQL Systems (Bigtable, HBase) 1 Slides from Mohamed Eltabakh

Bigtable Data Model 2 HBase’s Data Model is similar

Data Organization in HBase All data is stored in Tables Table rows have exactly one Key, and all rows in a table are physically ordered by key Tables have a fixed number of Column Families Each row can have many Columns in each column family Each column has a set of values, each with a timestamp Each rowkey:columnfamily:column:timestamp combination represents coordinates for a Cell

HBase Logical View 4

HBase: Keys and Column Families 5 Each row has a Key Each record is divided into Column Families Each column family consists of one or more Columns

What is a Column Family? A Column Family is a group of related columns All columns must be in a column family Each row can have a completely different set of columns for a column family Row:Column Family: Columns: Chris Friends Friends:Bob BobFriends:Chris JamesFriends:Jane

Rows and Cells Not exactly the same as rows in a traditional RDBMS Key: a byte array Data: Cells, qualified by column family, column, and timestamp (not shown here) Row Key: Column Families : (Defined by the Table) Columns: (Defined by the Row) (May vary between rows) Cells: (Created with Columns) Chris AttributesAttributes:Age30 Attributes:Height68 FriendsFriends:Bob1 (Bob’s a cool guy) Friends:Jane0 (Jane and I don’t get along)

Cell Timestamps All cells are created with a timestamp Column family defines how many versions of a cell to keep Updates always create a new cell Deletes create a tombstone Queries can include an “as-of” timestamp to return point-in-time values

Key Byte array Serves as the primary key for the table Indexed for fast lookup Column Family Has a name (string) Contains one or more related columns Column Belongs to one column family Included inside the row familyName:columnName 9 Column family named “Contents” Column family named “anchor” Column named “apache.com”

Version Number Unique within each key By default  System’s timestamp Data type is Long Value (Cell) Byte array 10 Version number for each row value

Notes on Data Model HBase schema consists of several Tables Each table consists of a set of Column Families Columns are not part of the schema HBase has Dynamic Columns Because column names are encoded inside the cells Different cells can have different columns 11 “Roles” column family has different columns in different cells

Notes on Data Model (Cont’d) The version number can be user-supplied Even does not have to be inserted in increasing order Version numbers are unique within each key Table can be very sparse Many cells are empty Keys are indexed as the primary key Has two columns [cnnsi.com & my.look.ca]

HBase Physical Model 13

HBase Physical Model Each column family is stored in a separate file Key & Version numbers are replicated with each column family Empty cells are not stored 14

Example 15 Number of Milliseconds since Epoch

HBase Data Partitioning HBase scales horizontally Needs to split data over many RegionServers Regions are the unit of scale

HBase Architecture Backup HBase Master

Three Major Components 18 The HBaseMaster One master The HRegionServer Many region servers The HBase client

HBase Components Region A subset of a table’s rows, like horizontal range partitioning Automatically done RegionServer (many slaves) Manages data regions Serves data for reads and writes Master Responsible for coordinating the slaves Assigns regions, detects failures Admin functions 19

Regions & RegionServers All HBase tables are broken into 1 or more regions Regions have a start row key and an end row key Each Region lives on exactly one RegionServer RegionServers may host many Regions When RegionServers die, Master detects this and assigns Regions to other RegionServers

Region Distribution TableRegion Server Users “Aaron” – “George”Node01 “George” – “Matthew” Node02 “Matthew” – “Zachary” Node01 Row Keys in Region “Aaron” – “George” “Aaron” “Bob” “Chris” Row Keys in Region “George” – “Matthew” “George” Row Keys in Region “Matthew” – “Zachary” “Matthew” “Nancy” “Zachary” -META- Table “Users” Table

Big Picture 22

Use Case – Time Series Requirement : Store real-time stock tick data Requirement : Accommodate many simultaneous readers & writers Requirement : Allow for reading of current price for any ticker at any point in time TickerTimestampSequenceBidAsk IBM09:15:03: MSFT09:15:04: GOOG09:15:04: IBM09:15:04:

Time Series Use Case – RDBMS Solution KeysColumnDataType Primary Key TickerVarchar TimestampDateTime Sequence_NumberInteger Bid_PriceDecimal Ask_PriceDecimal KeysColumnDataType Primary KeyTickerVarchar Bid_PriceDecimal Ask_PriceDecimal Latest Prices: Historical Prices:

Time Series Use Case – HBase Solution Row KeyFamily:Column [Ticker].[Reverse_Timestamp].[Reverse_Sequence_ Number] Prices:Bid Prices:Ask  No need to keep separate “latest price” table  A scan starting at “ticker” will always return the latest price row  Let us analyze this solution further (reverse timestamps are explained at the link below): Let us analyze this solution further (reverse timestamps are explained at the link below):

HBase vs. HDFS 26

HBase vs. RDBMS 27