NoSQL Or Peles

What is NoSQL
A collection of various technologies meant to work around RDBMS limitations (mostly performance).
Not much of a definition...

RDBMS Limitations
- Hard to scale horizontally (for updates)
  - Distributed ACID requires two-phase commit.
- Schemas can be a pain
  - Hard to change.
  - Data normalization can slow down queries.
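The cost of distributed ACID is easier to see in a minimal sketch of two-phase commit. The `Participant` class and `two_phase_commit` function are hypothetical names for illustration; a real protocol also needs durable logs, timeouts, and coordinator recovery.

```python
class Participant:
    """Hypothetical resource manager taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: the coordinator waits until every participant has voted.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only on a unanimous yes, otherwise roll everyone back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

Every update has to wait for the slowest participant in both phases, which is why the approach becomes harder to sustain as the number of machines grows.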

Web Scale
Some numbers:
- YouTube serves over 100 million videos a day.
- eBay adds over 10 TB of storage every week.
- Facebook holds over 80 billion photos and serves hundreds of thousands of requests per second.

Ideal System
- Available: can always read and write.
- Consistent: reads always pick up the latest write.
- Partition tolerant: the system can be split across multiple machines and datacenters.

Starbucks doesn’t use two phase commit
A great example, presented in the article of the same name (see References).
- Asynchronous execution
- Correlation
- Exception handling:
  - Write off
  - Retry
  - Compensation
- Two-phase commit would create a choke point.
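The retry and compensation strategies can be sketched as follows. This is an illustrative sketch only: `process_order` and its callback parameters are hypothetical names, and the flow is simplified to a single synchronous function.

```python
def process_order(charge, make_drink, refund, max_retries=2):
    """Hypothetical order flow: take payment first, deal with failures afterwards."""
    charge()
    for _attempt in range(1 + max_retries):
        try:
            return make_drink()      # success: hand over the drink
        except RuntimeError:
            continue                 # retry: remake the drink
    refund()                         # compensation: undo the payment
    return None
```

No step holds a lock while waiting on another; a failure is repaired after the fact instead of being prevented up front.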

CAP Theorem
CAP (Eric Brewer, 2000). Simply put, of the following three properties:
- Consistency
- Availability
- Partition tolerance
at most two can hold at the same time in any system.

CAP in practice

CA: Two-phase commit. Works best in a single data center. Scaling issues.
CP: Sharding. Data may become unavailable if a shard fails.
AP: May return inaccurate (stale) data. DNS is a prime example.

Consistency Types
- Strict
- Eventual
  - Causal
  - Read your writes
  - Session
  - Monotonic read
  - Monotonic write
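As one illustration, read-your-writes consistency can be layered on an eventually consistent store by having each client session remember the version of its last write. The `Replica` and `Session` classes below are hypothetical and ignore networking and concurrency.

```python
class Replica:
    """Toy copy of a single value, tagged with a version number."""

    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Ignore updates that are older than what this copy already holds.
        if version > self.version:
            self.version, self.value = version, value


class Session:
    """Hypothetical client session that enforces read-your-writes."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.last_written = 0

    def write(self, value):
        self.last_written = self.primary.version + 1
        self.primary.apply(self.last_written, value)

    def read(self):
        # Use a replica only if it has caught up with this session's writes;
        # otherwise fall back to the primary.
        for replica in self.replicas:
            if replica.version >= self.last_written:
                return replica.value
        return self.primary.value
```

Other sessions may still see the old value from a lagging replica until replication catches up; that gap is the "eventual" part.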

Concepts
- In memory vs. disk based.
- Shared everything vs. shared nothing.
- Master-slave vs. server symmetry.
- Elastic scalability.
- MapReduce.

Sharding
Split data across machines (database instances).
- Feature based sharding.
- Key based sharding.
- Lookup table.
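The three approaches can be sketched in a few lines. The shard names, feature names, and helper functions are made up for illustration; production systems usually add consistent hashing or rebalancing on top.

```python
import hashlib

SHARDS = ["db0", "db1", "db2"]  # hypothetical database instances


def key_based_shard(key, shards=SHARDS):
    # Hash the key so the same key always maps to the same shard.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


# Feature based sharding: each feature lives on its own instance.
FEATURE_SHARDS = {"users": "db0", "photos": "db1", "messages": "db2"}

# Lookup table: an explicit directory, so individual keys can be moved freely.
LOOKUP_TABLE = {"bob": "db2", "alice": "db0"}


def lookup_shard(key, default="db0"):
    return LOOKUP_TABLE.get(key, default)
```

Key based sharding needs no extra state but reshuffles keys when the shard count changes; a lookup table avoids that at the cost of an extra lookup on every request.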

NoSQL Categories
- Key-value stores
- Document stores
- Tabular stores

Lean & Mean: The Key-Value In-Memory DBs
- In-memory DBs are simpler and faster than their on-disk counterparts.
- Key-value stores offer a simple interface with no schema.
- Major limitation: data size is limited to RAM size.
- Often used as caches for on-disk DB systems.

Open Source In-Memory DBs
- Memcached/MemcacheDB
- Redis
Notes:
- Both are key-value stores that rely on hash partitioning.
- Memcached is an LRU based cache.
- Redis is more of a data structure server.
- Both use a shared-nothing architecture.

Memcached
Really a giant, distributed hash table.
Advantages:
- Relatively simple.
- Practically no server to server talk.
- Linear scalability.
Disadvantages:
- Doesn’t understand data: no server side operations. The key and value are always strings.
- It’s really meant to only be a cache, no more, no less.
- No recovery, limited elasticity.
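A toy model shows why such a cache needs practically no server to server talk: the client hashes the key to pick a node, and each node evicts on its own. `LRUCache` and `CacheClient` are hypothetical classes for illustration and are not the Memcached API.

```python
import zlib
from collections import OrderedDict


class LRUCache:
    """Toy stand-in for one cache node with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used


class CacheClient:
    """The client picks the node, so nodes never need to talk to each other."""

    def __init__(self, nodes):
        self.nodes = nodes

    def _node_for(self, key):
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def get(self, key):
        return self._node_for(key).get(key)

    def set(self, key, value):
        self._node_for(key).set(key, value)
```

Because a lost node only means cache misses, this design gets by without recovery, at the price of the limited elasticity noted above: changing the node count remaps most keys.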

Redis
- Like Memcached, it’s a distributed hash in memory.
- Offers support for lists and sets, as well as strings.
- Offers limited server side operations.
- Supports master-slave architecture and data replicas for scalability and high availability.
- Also supports a persistent mode that writes to disk.

Document Stores
- As the name implies, these databases store documents.
- Usually schema-free. The same database can store multiple documents.
- Allow indexing based on document content.
- Prominent examples: CouchDB, MongoDB.

Documents
- A document is just a collection of values, usually serialized in JSON.
- Many implementations offer nesting of documents.
Example:
    {
      "username" : "bob",
      "address" : {
        "street" : "123 Main Street",
        "city" : "Springfield",
        "state" : "NY"
      }
    }
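A short sketch of working with such a document, including a naive content-based index. The `build_index` helper and its dotted-path convention are assumptions for illustration, not the API of any particular document store.

```python
import json

raw = '''
{ "username": "bob",
  "address": { "street": "123 Main Street", "city": "Springfield", "state": "NY" } }
'''
doc = json.loads(raw)
city = doc["address"]["city"]   # nested documents are plain nested lookups


def build_index(docs, path):
    """Map each value found at a dotted path to the documents containing it."""
    index = {}
    for d in docs:
        value = d
        for part in path.split("."):
            value = value.get(part) if isinstance(value, dict) else None
        if value is not None:
            index.setdefault(value, []).append(d)
    return index


docs = [doc, {"username": "alice"}]   # schema-free: alice has no address
by_city = build_index(docs, "address.city")
```

Documents that lack the indexed field are simply left out of the index, which is how a schema-free store can index content without forcing every document into the same shape.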

CouchDB
- Written in Erlang.
- Offers ACID guarantees based on multi-version concurrency control (MVCC).
- Supports replication, but isn’t a real distributed database.

MongoDB
- Written in C++.
- Atomic operations on single documents only.
- Excellent scalability based on sharding.
- Support for server side JavaScript and MapReduce.

Tabular stores
- The original: Google’s BigTable
  - Proprietary, not open source.
- The open source elephant alternative: Hadoop with HBase.
  - A top-level Apache project.
  - Large number of users.
  - Contains a distributed file system, MapReduce, a database server (HBase), and more.
  - Rack aware.

Hadoop components

Hadoop basic components
- At its core, Hadoop is a framework for running MapReduce operations on large data sets.
- The data sets are placed as text files on the distributed file system.
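The MapReduce model itself fits in a few lines. The word count below runs in a single process and only mimics the map, shuffle, and reduce steps; it does not use the Hadoop API, and all function names are illustrative.

```python
from collections import defaultdict


def map_phase(line):
    # Emit a (word, 1) pair for every word in one line of input text.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values emitted for one key into a single result.
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Each call to `map_phase` and `reduce_phase` depends only on its own input, which is what lets a framework spread those calls across many machines.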

Hadoop MapReduce Flow

HBase
- A database engine built on top of the Hadoop distributed file system.
- Scales up to billions of rows with millions of columns.
- Has a Java interface for queries.
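A table with millions of possible columns is practical because tabular stores of this kind keep only the cells that exist. The toy model below illustrates the idea with a nested dictionary; the `put` and `get` helpers and the "family:qualifier" column names are assumptions for illustration, not the HBase Java interface.

```python
from collections import defaultdict

# Only cells that exist are stored, so rows may have very different columns.
table = defaultdict(dict)   # row key -> {"family:qualifier": value}


def put(row, column, value):
    table[row][column] = value


def get(row, column):
    # Missing rows and missing columns both read as None.
    return table.get(row, {}).get(column)


put("bob", "info:city", "Springfield")
put("alice", "info:email", "alice@example.com")
```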

The Tradeoff: SQL vs. NoSQL
RDBMS:
- Mature.
- Standard SQL (except for DDL and vendor extensions).
- Robust tools.
NoSQL:
- Scale.
- Schemaless.

References
- Eventual Consistency (Werner Vogels, CTO, Amazon)
- Starbucks doesn’t use two phase commit
- Hadoop: The Definitive Guide (O’Reilly)
- MongoDB: The Definitive Guide (O’Reilly)
- Many wiki pages

Questions