NoSQL CPSC 4670/5670.

Slides:



Advertisements
Similar presentations
Data Management in the Cloud Paul Szerlip. The rise of data Think about this o For the past two decades, the largest generator of data was humans -- now.
Advertisements

NoSQL Databases: MongoDB vs Cassandra
Graph databases …the other end of the NoSQL spectrum. Material taken from NoSQL Distilled and Seven Databases in Seven Weeks.
NoSQL Database.
CS 405G: Introduction to Database Systems 24 NoSQL Reuse some slides of Jennifer Widom Chen Qian University of Kentucky.
NoSQL W2013 CSCI 2141.
Massively Parallel Cloud Data Storage Systems S. Sudarshan IIT Bombay.
1 Yasin N. Silva Arizona State University This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
SIDDHARTH MEHTA PURSUING MASTERS IN COMPUTER SCIENCE (FALL 2008) INTERESTS: SYSTEMS, WEB.
Databases with Scalable capabilities Presented by Mike Trischetta.
Software Engineer, #MongoDBDays.
AN INTRODUCTION TO NOSQL DATABASES Karol Rástočný, Eduard Kuric.
MongoDB An introduction. What is MongoDB? The name Mongo is derived from Humongous To say that MongoDB can handle a humongous amount of data Document.
HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce.
1 The Map-Reduce Framework Compiled by Mark Silberstein, using slides from Dan Weld’s class at U. Washington, Yaniv Carmeli and some other.
WTT Workshop de Tendências Tecnológicas 2014
© , OrangeScape Technologies Limited. Confidential 1 Write Once. Cloud Anywhere. Building Highly Scalable Web applications BASE gives way to ACID.
Modern Databases NoSQL and NewSQL Willem Visser RW334.
NOSQL DATABASES Please remember to read the NOSQL Distilled book and the Seven Databases book.
Methodological Foundations of Biomedical Informatics (BMSC-GA 4449) Himanshu Grover.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Exam and Lecture Overview.
1 CS 430 Database Theory Winter 2005 Lecture 16: Inside a DBMS.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation MongoDB Architecture.
SLIDE 1IS 240 – Spring 2013 MapReduce, HBase, and Hive University of California, Berkeley School of Information IS 257: Database Management.
Introduction.  Administration  Simple DBMS  CMPT 454 Topics John Edgar2.
NoSQL Or Peles. What is NoSQL A collection of various technologies meant to work around RDBMS limitations (mostly performance) Not much of a definition...
NoSQL Systems Motivation. NoSQL: The Name  “SQL” = Traditional relational DBMS  Recognition over past decade or so: Not every data management/analysis.
NOSQL DATABASE Not Only SQL DATABASE
Introduction to MongoDB. Database compared.
Data and Information Systems Laboratory University of Illinois Urbana-Champaign Data Mining Meeting Mar, From SQL to NoSQL Xiao Yu Mar 2012.
CPT-S Advanced Databases 11 Yinghui Wu EME 49.
Group members: Phạm Hoàng Long Nguyễn Huy Hùng Lê Minh Hiếu Phan Thị Thanh Thảo Nguyễn Đức Trí 1 BIG DATA & NoSQL Topic 1:
1 Ahmed K. Ezzat, Tradeoffs Between SQL and NoSQL Data Mining and Big Data.
COMP7330/7336 Advanced Parallel and Distributed Computing MapReduce - Introduction Dr. Xiao Qin Auburn University
Amirhossein Saberi May CASSANDRA NAME A daughter of the Trojan king Priam, who was given the gift of prophecy by Apollo. When she cheated him, however,
Intro to NoSQL Databases Tony Hannan November 2011.
Lecture 4. MapReduce Instructor: Weidong Shi (Larry), PhD
Neo4j: GRAPH DATABASE 27 March, 2017
CSCI5570 Large Scale Data Processing Systems
CS 405G: Introduction to Database Systems
NO SQL for SQL DBA Dilip Nayak & Dan Hess.
NoSQL: Graph Databases
and Big Data Storage Systems
Cloud Computing and Architecuture
CS122B: Projects in Databases and Web Applications Winter 2017
MapReduce: Simplified Data Processing on Large Clusters
Trade-offs in Cloud Databases
MongoDB Er. Shiva K. Shrestha ME Computer, NCIT
Learning MongoDB ZhangGang
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing MapReduce - Introduction Dr. Xiao Qin Auburn.
Modern Databases NoSQL and NewSQL
NOSQL.
Introduction to NewSQL
NOSQL databases and Big Data Storage Systems
A Comparison of SQL and NoSQL Databases
A Comparison of SQL and NoSQL Databases
NoSQL Systems Overview (as of November 2011).
MongoDB Introduction, Installation & Execution
Massively Parallel Cloud Data Storage Systems
1 Demand of your DB is changing Presented By: Ashwani Kumar
CS6604 Digital Libraries IDEAL Webpages Presented by
NoSQL Databases An Overview
A Comparison of SQL and NoSQL Databases
NoSQL W2013 CSCI 2141.
Database Systems Summary and Overview
relational thoughts on NoSql
Transaction Properties: ACID vs. BASE
CS639: Data Management for Data Science
NoSQL & Document Stores
NoSQL databases An introduction and comparison between Mongodb and Mysql document store.
Presentation transcript:

NoSQL CPSC 4670/5670

NoSQL What does it mean? Not Only SQL.

Use Cases Massive write performance. Fast key value look ups. Flexible schema and data types. No single point of failure. Fast prototyping and development. Out of the box scalability. Easy maintenance.

Motives Behind NoSQL Big data. Scalability. Data format. Collect. Store. Organize. Analyze. Share Scalability. Data growth outruns the ability to manage it so we need scalable solutions. Data format. Manageability.

Scalability Scale up, Vertical scalability. Increasing server capacity. Adding more CPU, RAM. Managing is hard. Possible down times

Scalability Scale out, Horizontal scalability. Adding servers to existing system with little effort, aka Elastically scalable. Bugs, hardware errors, things fail all the time. It should become cheaper. Cost efficiency. Shared nothing. Use of commodity/cheap hardware. Heterogeneous systems. Controlled Concurrency (avoid locks). Service Oriented Architecture. Local states. Decentralized to reduce bottlenecks. Avoid Single point of failures. Asynchrony. Symmetry, you don’t have to know what is happening. All nodes should be symmetric.

What is Wrong With RDBMS? Nothing. One size fits all? Not really. Impedance mismatch. Object Relational Mapping doesn't work quite well. Rigid schema design. Harder to scale. Replication. Joins across multiple nodes? Hard. How does RDMS handle data growth? Hard. Need for a DBA. Many programmers are already familiar with it. Transactions and ACID make development easy. Lots of tools to use.

ACID Semantics Atomicity: All or nothing. Consistency: Consistent state of data and transactions. Isolation: Transactions are isolated from each other. Durability: When the transaction is committed, state will be durable. Any data store can achieve Atomicity, Isolation and Durability but do you always need consistency? No. By giving up ACID properties, one can achieve higher performance and scalability.

Brewer’s CAP Theorem A distributed system can support only two of the following characteristics: Consistency Availability Partition tolerance Proven by Nancy Lynch et al. MIT labs. http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf 19 September 2018

Consistency Consistency: Clients should read the same data. There are many levels of consistency. Strict Consistency – RDBMS. Tunable Consistency – Cassandra. Eventual Consistency – Amazon Dynamo. client perceives that a set of operations has occurred all at once – Pritchett More like Atomic in ACID transaction properties N : Number of nodes with a replica of data. W: Number of nodes that must acknowledge the update. R : Minimum number of nodes that succeeds read operation. W + R > N Strong Consistency W + R <= N Weak Consistency 19 September 2018

Availability Availability: Data to be available. node failures do not prevent survivors from continuing to operate – Wikipedia Every operation must terminate in an intended response – Pritchett 19 September 2018

Partition Tolerance Partial Tolerance: Data to be partitioned across network segments due to network failures. the system continues to operate despite arbitrary message loss – Wikipedia Operations will complete, even if individual components are unavailable – Pritchett 19 September 2018

BASE, an ACID Alternative Almost the opposite of ACID. Basically available: Nodes in the a distributed environment can go down, but the whole system shouldn’t be affected. Soft State (scalable): The state of the system and data changes over time. Eventual Consistency: Given enough time, data will be consistent across the distributed system.

A Clash of cultures ACID: Strong consistency. Less availability. Pessimistic concurrency. Complex. BASE: Availability is the most important thing. Willing to sacrifice for this (CAP). Weaker consistency (Eventual). Best effort. Simple and fast. Optimistic.

NoSQL Database Types Discussing NoSQL databases is complicated because there are a variety of types: Key-Value Store – Hash table of keys Column Store – Each storage block contains data from only one column Document Store – stores documents made up of tagged elements Graph Databases 19 September 2018

Other Non-SQL Databases XML Databases Codasyl Databases Object Oriented Databases Etc… Will not address these today 19 September 2018

Complexity

NoSQL Examples: Key-Value Store Hash tables of Keys Values stored with Keys Fast access to small data values Example – Project-Voldemort http://www.project-voldemort.com/ Linkedin Example – MemCacheDB http://memcachedb.org/ Backend storage is Berkeley-DB 19 September 2018

NoSQL Examples: Map Reduce Technique for indexing and searching large data volumes Two Phases, Map and Reduce Map Extract sets of Key-Value pairs from underlying data Potentially in Parallel on multiple machines Reduce Merge and sort sets of Key-Value pairs Results may be useful for other searches 19 September 2018

Map/Reduce map(key, val) is run on each item in set emits new-key / new-val pairs reduce(key, vals) is run for each unique key emitted by map() emits final output

MapReduce Examples: count words in docs Input consists of (url, contents) pairs map(key=url, val=contents): For each word w in contents, emit (w, “1”) reduce(key=word, values=uniq_counts): Sum all “1”s in values list Emit result “(word, sum)”

Count, Illustrated map(key=url, val=contents): For each word w in contents, emit (w, “1”) reduce(key=word, values=uniq_counts): Sum all “1”s in values list Emit result “(word, sum)” Count, Illustrated see 1 bob 1 run 1 see 1 spot 1 throw 1 bob 1 run 1 see 2 spot 1 throw 1 see bob throw see spot run

MapReduce Example: Grep Input consists of (url+offset, single line) map(key=url+offset, val=line): If contents matches regexp, emit (line, “1”) reduce(key=line, values=uniq_counts): Don’t do anything; just emit line

Map Reduce Map Reduce techniques differ across products Implemented by application developers, not by underlying software 19 September 2018

NoSQL Example: Document Store Schema Free. Usually JSON like interchange model. Query Model: JavaScript or custom. Aggregations: Map/Reduce. Indexes are done via B-Trees. Example: CouchDB http://couchdb.apache.org/ BBC Example: MongoDB http://www.mongodb.org/ Foursquare, Shutterfly JSON – JavaScript Object Notation 19 September 2018

CouchDB JSON Example { "_id": "guid goes here", "_rev": "314159", "type": "abstract", "author": "Keith W. Hare" "title": "SQL Standard and NoSQL Databases", "body": "NoSQL databases (either no-SQL or Not Only SQL) are currently a hot topic in some parts of computing.", "creation_timestamp": "2011/05/10 13:30:00 +0004" } 19 September 2018

CouchDB JSON Tags "_id" "_rev" "type", "author", "title", etc. GUID – Global Unique Identifier Passed in or generated by CouchDB "_rev" Revision number Versioning mechanism "type", "author", "title", etc. Arbitrary tags Schema-less Could be validated after the fact by user-written routine 19 September 2018

Mongodb Data types: bool, int, double, string, object(bson), oid, array, null, date. Database and collections are created automatically. Lots of Language Drivers. Capped collections are fixed size collections, buffers, very fast, FIFO, good for logs. No indexes. Object id are generated by client, 12 bytes packed data. 4 byte time, 3 byte machine, 2 byte pid, 3 byte counter. Possible to refer other documents in different collections but more efficient to embed documents. Replication is very easy to setup. You can read from slaves.

Mongodb Supports aggregation. Map Reduce with JavaScript. You have indexes, B-Trees. Ids are always indexed. Updates are atomic. Low contention locks. Querying mongo done with a document: Lazy, returns a cursor. Reduceable to SQL, select, insert, update limit, sort etc. There is more: upsert (either inserts of updates) Several operators: $ne, $and, $or, $lt, $gt, $incr,$decr and so on. Repository Pattern makes development very easy.

Mongodb - Sharding Config servers: Keeps mapping Mongos: Routing servers Mongod: master-slave replicas

Graph Stores Based on Graph Theory. Scale vertically, no clustering. You can use graph algorithms easily.

Graph Stores -- Neo4J Nodes, Relationship. Traversals. HTTP/REST. ACID. Web Admin. Not too much support for languages. Has transactions.

NoSQL Summary NoSQL databases reject: Programmer responsible for Overhead of ACID transactions “Complexity” of SQL Burden of up-front schema design Declarative query expression Yesterday’s technology Programmer responsible for Step-by-step procedural language Navigating access path 19 September 2018

Summary SQL Databases NoSQL Database Predefined Schema Standard definition and interface language Tight consistency Well defined semantics NoSQL Database No predefined Schema Per-product definition and interface language Getting an answer quickly is more important than getting a correct answer 19 September 2018

19 September 2018