Department of Computer Science, Johns Hopkins University EN 600.423 Instructor: Randal Burns 24 September 2013 NoSQL Data Models and Systems.

Slides:



Advertisements
Similar presentations
Tomcy Thankachan  Introduction  Data model  Building Blocks  Implementation  Refinements  Performance Evaluation  Real applications  Conclusion.
Advertisements

CS525: Special Topics in DBs Large-Scale Data Management HBase Spring 2013 WPI, Mohamed Eltabakh 1.
Data Management in the Cloud Paul Szerlip. The rise of data Think about this o For the past two decades, the largest generator of data was humans -- now.
Map/Reduce in Practice Hadoop, Hbase, MongoDB, Accumulo, and related Map/Reduce- enabled data stores.
Dynamo: Amazon's Highly Available Key-value Store Distributed Storage Systems CS presented by: Hussam Abu-Libdeh.
Overview on ZHT 1.  General terms  Overview to NoSQL dabases and key-value stores  Introduction to ZHT  CS554 projects 2.
Relational Database Alternatives NoSQL. Choosing A Data Model Relational database underpin legacy applications and meet business needs However, companies.
Bigtable: A Distributed Storage System for Structured Data Presenter: Guangdong Liu Jan 24 th, 2012.
HBase Presented by Chintamani Siddeshwar Swathi Selvavinayakam
Google Bigtable A Distributed Storage System for Structured Data Hadi Salimi, Distributed Systems Laboratory, School of Computer Engineering, Iran University.
NoSQL and NewSQL Justin DeBrabant CIS Advanced Systems - Fall 2013.
NoSQL Database.
Distributed storage for structured data
Bigtable: A Distributed Storage System for Structured Data
BigTable CSE 490h, Autumn What is BigTable? z “A BigTable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by.
Inexpensive Scalable Information Access Many Internet applications need to access data for millions of concurrent users Relational DBMS technology cannot.
Gowtham Rajappan. HDFS – Hadoop Distributed File System modeled on Google GFS. Hadoop MapReduce – Similar to Google MapReduce Hbase – Similar to Google.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Case Study: Amazon Dynamo Steve Ko Computer Sciences and Engineering University at Buffalo.
Bigtable: A Distributed Storage System for Structured Data F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach M. Burrows, T. Chandra, A. Fikes, R.E.
Systems analysis and design, 6th edition Dennis, wixom, and roth
CC P ROCESAMIENTO M ASIVO DE D ATOS O TOÑO 2014 Aidan Hogan Wrap-Up.
HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce.
MapReduce – An overview Medha Atre (May 7, 2008) Dept of Computer Science Rensselaer Polytechnic Institute.
Getting Biologists off ACID Ryan Verdon 3/13/12. Outline Thesis Idea Specific database Effects of losing ACID What is a NoSQL database Types of NoSQL.
Computer Science iBigTable: Practical Data Integrity for BigTable in Public Cloud CODASPY 2013 Wei Wei, Ting Yu, Rui Xue 1/40.
Apache Cassandra - Distributed Database Management System Presented by Jayesh Kawli.
Google’s Big Table 1 Source: Chang et al., 2006: Bigtable: A Distributed Storage System for Structured Data.
Bigtable: A Distributed Storage System for Structured Data Google’s NoSQL Solution 2013/4/1Title1 Chao Wang Fay Chang, Jeffrey Dean, Sanjay.
Changwon Nati Univ. ISIE 2001 CSCI5708 NoSQL looks to become the database of the Internet By Lawrence Latif Wed Dec Nhu Nguyen and Phai Hoang CSCI.
BigTable and Accumulo CMSC 461 Michael Wilson. BigTable  This was Google’s original distributed data concept  Key value store  Meant to be scaled up.
1 Dennis Kafura – CS5204 – Operating Systems Big Table: Distributed Storage System For Structured Data Sergejs Melderis 1.
Bigtable: A Distributed Storage System for Structured Data 1.
Big Table - Slides by Jatin. Goals wide applicability Scalability high performance and high availability.
Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows,
CC P ROCESAMIENTO M ASIVO DE D ATOS O TOÑO 2015 Lecture 11: Conclusion Aidan Hogan
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Cloud Data Models Lecturer.
CS 347Lecture 9B1 CS 347: Parallel and Distributed Data Management Notes 13: BigTable, HBASE, Cassandra Hector Garcia-Molina.
Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface.
By Vaibhav Nachankar Arvind Dwarakanath.  HBase is an open-source, distributed, column- oriented and sorted-map data storage.  It is a Hadoop Database;
CS525: Big Data Analytics MapReduce Computing Paradigm & Apache Hadoop Open Source Fall 2013 Elke A. Rundensteiner 1.
CSC590 Selected Topics Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A.
Dynamo: Amazon’s Highly Available Key-value Store DAAS – Database as a service.
NoSQL Or Peles. What is NoSQL A collection of various technologies meant to work around RDBMS limitations (mostly performance) Not much of a definition...
Nov 2006 Google released the paper on BigTable.
NOSQL DATABASE Not Only SQL DATABASE
Bigtable: A Distributed Storage System for Structured Data
1 HBASE – THE SCALABLE DATA STORE An Introduction to HBase XLDB Europe Workshop 2013: CERN, Geneva James Kinley EMEA Solutions Architect, Cloudera.
Data and Information Systems Laboratory University of Illinois Urbana-Champaign Data Mining Meeting Mar, From SQL to NoSQL Xiao Yu Mar 2012.
Bigtable: A Distributed Storage System for Structured Data Google Inc. OSDI 2006.
Introduction to NoSQL Databases Chyngyz Omurov Osman Tursun Ceng,Middle East Technical University.
Big Data Yuan Xue CS 292 Special topics on.
Kitsuregawa Laboratory Confidential. © 2007 Kitsuregawa Laboratory, IIS, University of Tokyo. [ hoshino] paper summary: dynamo 1 Dynamo: Amazon.
Bigtable A Distributed Storage System for Structured Data.
Google Cloud computing techniques (Lecture 03) 18th Jan 20161Dr.S.Sridhar, Director, RVCT, RVCE, Bangalore
CC Procesamiento Masivo de Datos Otoño Lecture 12: Conclusion
CS 405G: Introduction to Database Systems
and Big Data Storage Systems
Column-Based.
HBase Mohamed Eltabakh
How did it start? • At Google • • • • Lots of semi structured data
CS122B: Projects in Databases and Web Applications Winter 2017
NoSQL Database and Application
CSE-291 (Cloud Computing) Fall 2016
NOSQL.
NOSQL databases and Big Data Storage Systems
Storage Systems for Managing Voluminous Data
آزمايشگاه سيستمهای هوشمند علی کمالی زمستان 95
NoSQL Databases Antonino Virgillito.
Database Systems Summary and Overview
CMSC Cluster Computing Basics
Presentation transcript:

Department of Computer Science, Johns Hopkins University EN Instructor: Randal Burns 24 September 2013 NoSQL Data Models and Systems

Data Models Key/value: no structure to value data – Can support range queries on keys (ordered key index) – Can support no range queries (hashed key index) benefit of easier system management Tabular data: – Columns and sometimes types. Think extensible schemas. – Typically columns stored separately so that schemas are totally logical rather than physical Document databases: – Full text searchability within fields

From: data-modeling-techniques/

Big Table: Defining Tabular A Bigtable is a sparse, distributed, persistent multi-dimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes Implications: accessible by row (indexed) Accessible by column (schema) Values are unintrepreted (no type in BigTable) Timestamp: time evolving views of data (snapshots)

Why tabular? It adds the notion of columns to the key value store – A second dimension of indexing This translates into dynamic schemas – Create multiple columns that share rows. This can be viewed as a table. – Arbitrary views (collections) of columns can be evaluated dynamically Implication: one can run ALTER TABLE for free. The schema is not physically encoded in the disk storage so no reorganization costs. – This concept is being used in “column databases” too

Interpreting the Data Model One row, three columns (shown) Different views of data (as a function of time) Column “families”, e.g. anchor – Hint that data are of the same type (compressed together) – Can be created without any data in them – Hundreds of column families. Arbitrary numbers of columns.

Tablets: “Bigtable maintains data in lexicographic order by row key. The row range for a table is dynamically partitioned. Each row range is called a tablet, which is the unit of distribution and load balancing.” Each column indexed lexicographically by row. Hierarchy of tablets form a B+-tree-like data structure – Metadata in internal nodes and data only at leaves.

BigTable and Locking BigTable is strongly consistent – Relies on distributed Paxos Only available when lock service (chubby) is available Uses locking of ROOT table (and recoverability of locks) to maintain a single system image. In terms of the CAP theorem – CA (consistency and availability) – No P (partition tolerance)

Amazon Dynamo: AP not C Allow for divergent replicas and system delusion To the benefit of scalability and node autonomy

Divergent Replicas Concurrent updates to multiple replicas – Requires reconciliation – Determine conflicting updates and reconciliation Clever use of data structures – Merkle trees from snapshot/versioning literature – Vector clocks from Bayou

Hashing and Replica Placement Key/value mapping of data to three nodes – Three replicas Each write updates to whatever membership set it thinks it sees – These may be different, but should be symmetric within groups (a property of gossip protocols)

Data Model and CAP Does BigTable’s schema require consistency? Does consistent hashing require key/value? “Bigtable is a distributed storage system for managing structured data. It maintains a sparse, multi-dimensional sorted map and allows applications to access their data using multiple attributes [2]. Compared to Bigtable, Dynamo targets applications that require only key/value access with primary focus on high availability where updates are not rejected even in the wake of network partitions or server failures.”

Open Source Systems Hbase: open-source implementation of BigTable – Companion project with Hadoop! (Map/Reduce) and HDFS (Google File System) Cassandra: – BigTable data model – Dynamo distributed system and availability – How do ordered indexes work in Dynamo? OK, let’s do it.