NOSQL databases and Big Data Storage Systems

COP 6726: New Directions in Database Systems

NOSQL
The term NOSQL is generally interpreted as Not Only SQL. Most NOSQL systems are distributed databases with a focus on semi-structured data storage, high performance, availability, data replication, and scalability. Structured relational SQL systems offer many services (e.g., a query language and concurrency control) that such applications may not need, and they may be too restrictive.

Emergence of NOSQL Systems
Google developed a NOSQL system known as BigTable, which is used in many of Google’s applications that require vast amounts of data storage, such as Gmail, Google Maps, and website indexing. This system uses concepts from column-based or wide column stores. Amazon developed a NOSQL system called DynamoDB that is available through Amazon’s cloud services. This system uses concepts from key-value stores. Facebook developed a NOSQL system called Cassandra, which is now open source and known as Apache Cassandra. This system uses concepts from both key-value stores and column-based systems. Other software companies started developing their own solutions, for example MongoDB and CouchDB, which are classified as document-based NOSQL systems or document stores. Another category of NOSQL systems is graph-based NOSQL systems, or graph databases; these include Neo4j and GraphBase.

Characteristics of NOSQL Systems: Distributed Databases and Distributed Systems
NOSQL systems emphasize high availability and scalability. Replication improves data availability and can also improve read performance, because read requests can often be served by any of the replicas. Two major replication models are used: master-slave and master-master replication. However, write performance becomes more cumbersome, because an update must be applied to every copy of a replicated data item. Many NOSQL applications do not require serializable consistency, so more relaxed forms of consistency known as eventual consistency are used. Sharding (i.e., horizontal partitioning) of the file records is often employed in NOSQL systems. Most systems use hashing or range partitioning on object keys to achieve high-performance data access.
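The two ways of assigning keys to shards can be contrasted in a few lines. This is a minimal sketch, not any particular system's placement scheme: the three shards, the split points "h" and "p", and the use of MD5 as a stable hash are all illustrative assumptions.

```python
import bisect
import hashlib

# Assumed split points for range partitioning over 3 shards (illustrative):
# shard 0 holds keys < "h", shard 1 holds "h" <= key < "p",
# shard 2 holds keys >= "p".
SPLIT_POINTS = ["h", "p"]
NUM_SHARDS = len(SPLIT_POINTS) + 1


def hash_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash partitioning: a stable hash of the key picks the shard."""
    # MD5 is assumed here only as a stable, well-spread hash; Python's
    # built-in hash() on strings is salted per process, so keys would
    # move between runs.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(key: str, split_points=SPLIT_POINTS) -> int:
    """Range partitioning: each shard owns one contiguous range of keys."""
    return bisect.bisect_right(split_points, key)


def shards_for_range_scan(lo: str, hi: str, split_points=SPLIT_POINTS) -> list:
    """Shards that may hold keys in [lo, hi] under range partitioning."""
    return list(range(range_shard(lo, split_points),
                      range_shard(hi, split_points) + 1))


if __name__ == "__main__":
    for k in ["alice", "henry", "zoe"]:
        print(k, "range ->", range_shard(k), "hash ->", hash_shard(k))
    print(shards_for_range_scan("a", "c"))  # a short key range hits one shard
```

The trade-off shows up in `shards_for_range_scan`: range partitioning keeps neighboring keys together, so a range query touches few shards, while hash partitioning tends to spread keys (and load) more evenly but scatters any key range across all shards.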

Characteristics of NOSQL Systems: Data Models and Query Languages
NOSQL systems emphasize performance and flexibility over modeling power and complex querying.
Not requiring a schema: data can be stored without a predefined schema. Semi-structured, self-describing data can be described with languages such as JSON (JavaScript Object Notation) and XML (Extensible Markup Language).
Less powerful query languages: search (read) queries often locate single objects in a single file based on their object keys. Many NOSQL systems do not provide join operations as part of the query language itself, so joins must be implemented in the application programs.
Versioning: some NOSQL systems store multiple versions of data items, with the timestamp at which each version was created.
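Because many NOSQL query languages lack joins, the application does the join itself with repeated key lookups. The sketch below assumes two hypothetical collections held as Python dicts keyed by object id; the records deliberately do not share the same fields, mirroring the absence of a schema.

```python
# Hypothetical data: two key-addressed collections of self-describing
# records. The records need not share the same fields (no schema).
employees = {
    "e1": {"name": "Alice", "dept_id": "d1"},
    "e2": {"name": "Bob", "dept_id": "d2", "phone": "555-0100"},
    "e3": {"name": "Carol", "dept_id": "d9"},   # dangling reference
}
departments = {
    "d1": {"dname": "Research"},
    "d2": {"dname": "Administration"},
}


def get(collection: dict, key: str):
    """Key-based lookup, the typical NOSQL read operation."""
    return collection.get(key)


def employees_with_department(emps: dict, depts: dict) -> list:
    """A join done in application code: one key lookup per employee.

    Behaves like a left outer join: employees whose department is missing
    are kept, with None as the department name.
    """
    result = []
    for emp in emps.values():
        dept = get(depts, emp["dept_id"])
        result.append((emp["name"], dept["dname"] if dept else None))
    return result


print(employees_with_department(employees, departments))
```

The cost of this design is visible in the loop: the join issues one lookup per employee, and handling a missing department is the application's responsibility rather than the database's.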

Categories of NOSQL Systems
- Document-based NOSQL systems: these systems store data in the form of documents using well-known formats, such as JSON.
- NOSQL key-value stores: these systems have a simple data model based on fast access by the key to the value associated with the key.
- Column-based or wide column NOSQL systems: these systems partition a table by column into column families (i.e., vertical partitioning).
- Graph-based NOSQL systems: data is represented as graphs, and related nodes can be found by traversing the edges using path expressions.
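To make the four categories concrete, the sketch below stores one hypothetical fact (employee e1 works for department d5) in the shape each category might use, with plain Python structures standing in for the stores. The field names, keys, and encodings are illustrative assumptions, not the layout of any specific product.

```python
import json

# Hypothetical fact used in every model: employee e1 (John Smith)
# works for department d5 (Research).

# Document store: one nested, self-describing document per object.
document = {"_id": "e1", "name": "John Smith",
            "department": {"id": "d5", "dname": "Research"}}

# Key-value store: the store sees only the key; the value is opaque bytes.
kv_pairs = {"employee:e1": b'{"name": "John Smith", "dept": "d5"}'}

# Column-based store: row key -> column family -> column qualifier -> value.
wide_rows = {"e1": {"Name": {"Fname": "John", "Lname": "Smith"},
                    "Work": {"Dept": "d5"}}}

# Graph store: labeled nodes plus directed, typed edges.
graph_nodes = {"e1": ("EMPLOYEE", {"name": "John Smith"}),
               "d5": ("DEPARTMENT", {"dname": "Research"})}
graph_edges = [("e1", "WorksFor", "d5")]

# Reading e1's department id in each model:
from_document = document["department"]["id"]           # navigate nesting
from_kv = json.loads(kv_pairs["employee:e1"])["dept"]  # app decodes value
from_column_store = wide_rows["e1"]["Work"]["Dept"]    # family:qualifier
from_graph = next(dst for src, rel, dst in graph_edges # follow the edge
                  if src == "e1" and rel == "WorksFor")

print(from_document, from_kv, from_column_store, from_graph)
```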

The CAP Theorem
The three letters in CAP refer to three desirable properties of distributed systems with replicated data:
- Consistency (among replicated copies)
- Availability (of the system for read and write operations)
- Partition tolerance (in the face of the nodes in the system being partitioned by a network fault)
The CAP theorem states that it is not possible to guarantee all three properties at the same time in a distributed system with replicated data. In a NOSQL distributed data store, a weaker consistency level is often acceptable, whereas the other two properties (availability and partition tolerance) are important.
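Giving up strong consistency in exchange for availability usually means eventual consistency: replicas may briefly disagree but converge once updates propagate. The toy store below is a sketch under stated assumptions: replica 0 plays the master, and an explicit `propagate()` call stands in for background replication, which in a real system would run asynchronously.

```python
class EventuallyConsistentStore:
    """Toy replicated store (hypothetical): replica 0 acts as the master and
    applies writes immediately; the other replicas receive them only when
    propagate() runs. Any replica may answer a read, so a read can return
    a stale value until propagation completes."""

    def __init__(self, num_replicas: int = 3):
        self.replicas = [dict() for _ in range(num_replicas)]
        self.pending = []  # updates applied on the master but not yet shipped

    def write(self, key, value):
        self.replicas[0][key] = value
        self.pending.append((key, value))

    def read(self, key, replica: int):
        return self.replicas[replica].get(key)

    def propagate(self):
        """Ship queued updates, in order, to every non-master replica."""
        for key, value in self.pending:
            for copy in self.replicas[1:]:
                copy[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("cart:42", ["book"])
print(store.read("cart:42", replica=0))  # master already has the new value
print(store.read("cart:42", replica=2))  # stale until propagation runs
store.propagate()
print(store.read("cart:42", replica=2))  # replicas have converged
```

If the network splits the master from the other replicas, every replica can still answer reads (availability), but the answers stay stale until the updates arrive, which is exactly the consistency that was traded away.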

Document-based NOSQL Systems
Document-based systems typically store data as collections of similar documents. There is no requirement to specify a schema; rather, the documents are specified as self-describing data. There are many document-based NOSQL systems, including MongoDB and CouchDB.

MongoDB
MongoDB documents are stored in BSON (Binary JSON) format. Individual documents are stored in a collection. A collection can be created explicitly; for example, the following command creates a capped collection with a maximum size of 5242880 bytes and at most 200 documents:
    db.createCollection("project", {capped: true, size: 5242880, max: 200})
A collection does not have a schema. Each document in a collection has a unique _id field, whose value is an ObjectId by default. MongoDB has several CRUD operations, where CRUD stands for create, read, update, and delete:
    db.<collection_name>.insert(<document(s)>)
    db.<collection_name>.remove(<condition>)
    db.<collection_name>.find(<condition>)
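The semantics of these operations can be imitated with an in-memory stand-in. This is not the MongoDB driver API: `ToyCollection` is a hypothetical class, the counter is an assumed substitute for ObjectId generation, and `max_docs` only loosely mimics the `max` option of a capped collection by evicting the oldest document.

```python
import itertools
from collections import OrderedDict


class ToyCollection:
    """Hypothetical in-memory stand-in for a schema-less document collection.

    Documents are plain dicts and may have different fields. Each stored
    document gets a unique "_id" (a counter here, standing in for ObjectId).
    If max_docs is set, the oldest document is dropped once the limit is
    reached, loosely imitating a capped collection's `max` option.
    """

    def __init__(self, max_docs=None):
        self.max_docs = max_docs
        self.docs = OrderedDict()            # _id -> document, insertion order
        self._next_id = itertools.count(1)

    def insert(self, doc: dict):
        doc = dict(doc)                      # copy so callers keep their dict
        if "_id" not in doc:
            doc["_id"] = next(self._next_id)
        if doc["_id"] in self.docs:
            raise ValueError(f"duplicate _id: {doc['_id']}")
        if self.max_docs is not None and len(self.docs) >= self.max_docs:
            self.docs.popitem(last=False)    # evict the oldest document
        self.docs[doc["_id"]] = doc
        return doc["_id"]

    def find(self, condition=None) -> list:
        """Documents whose fields equal every field/value in `condition`."""
        condition = condition or {}
        return [d for d in self.docs.values()
                if all(d.get(k) == v for k, v in condition.items())]

    def remove(self, condition) -> int:
        """Delete matching documents and return how many were removed."""
        matches = self.find(condition)
        for d in matches:
            del self.docs[d["_id"]]
        return len(matches)
```

For example, with `project = ToyCollection(max_docs=2)`, inserting three project documents keeps only the two most recent, and `project.find({"Plocation": "Houston"})` returns the documents whose `Plocation` field equals `"Houston"`, regardless of which other fields they carry.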

Example of a simple document

MongoDB
Replication: the concept of a replica set is used to create multiple copies of the same data set on different nodes in the distributed system. It uses a variation of the master-slave approach for replication: a replica set has one primary copy and one or more secondary copies.
Sharding (or horizontal partitioning): sharding divides the documents of a collection into disjoint partitions known as shards. There are two ways to partition a collection into shards in MongoDB: range partitioning and hash partitioning, both based on the values of a designated shard key. A query (CRUD operation) is routed to the nodes that contain the shards holding the documents that the query requests.
Sharding focuses on improving performance via load balancing and horizontal scalability, whereas replication focuses on ensuring system availability when certain nodes fail in the distributed system.
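Query routing over a range-sharded collection can be sketched as follows. `ShardRouter` is a hypothetical class, not MongoDB's router: it assumes one key range per shard and equality-only conditions. A query that fixes the shard key goes to a single shard; any other query must be sent to every shard.

```python
import bisect


class ShardRouter:
    """Hypothetical router for a range-sharded collection.

    split_points divides the shard-key space into len(split_points) + 1
    ranges; range i is stored on shard i (one range per shard, an assumption
    made to keep the sketch small).
    """

    def __init__(self, shard_key: str, split_points: list):
        self.shard_key = shard_key
        self.split_points = sorted(split_points)
        self.shards = [[] for _ in range(len(split_points) + 1)]

    def _shard_for(self, value) -> int:
        return bisect.bisect_right(self.split_points, value)

    def insert(self, doc: dict) -> None:
        # This sketch requires the shard key to be present in every document.
        self.shards[self._shard_for(doc[self.shard_key])].append(doc)

    def targets(self, condition: dict) -> list:
        """Shards a query must visit: one if it pins the shard key, else all."""
        if self.shard_key in condition:
            return [self._shard_for(condition[self.shard_key])]
        return list(range(len(self.shards)))

    def find(self, condition: dict) -> list:
        """Equality match, evaluated only on the targeted shards."""
        return [doc for i in self.targets(condition) for doc in self.shards[i]
                if all(doc.get(k) == v for k, v in condition.items())]


router = ShardRouter("zip", [30000, 60000])
for doc in [{"name": "Ann", "zip": 10001},
            {"name": "Ben", "zip": 45000},
            {"name": "Cy", "zip": 77002}]:
    router.insert(doc)
print(router.targets({"zip": 45000}))   # targeted: one shard
print(router.targets({"name": "Cy"}))   # no shard key: all shards
```

This is why the choice of shard key matters for performance: queries that include it stay on one node, while the rest fan out across the cluster.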

NOSQL Key-Value Stores
The data model is relatively simple, and many of these systems have no query language, but rather a set of operations that can be used by application programmers. Key-value stores include DynamoDB, Voldemort, Oracle NoSQL, Redis, and Apache Cassandra.
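The "set of operations instead of a query language" idea reduces to an interface like the one below. `KeyValueStore` is a hypothetical minimal class, not the API of any of the systems listed; the JSON encoding of the value is likewise an assumption made by the application, not by the store.

```python
import json
from typing import Dict, Optional


class KeyValueStore:
    """Hypothetical minimal key-value interface: put, get, and delete by key,
    and nothing else. Values are opaque bytes; the store never parses them,
    so it cannot filter or join on their contents."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        """Remove the key; return True if it was present."""
        return self._data.pop(key, None) is not None


# The application, not the store, decides how a value is encoded.
kv = KeyValueStore()
kv.put("session:7f3a", json.dumps({"user": "alice", "cart": [101, 205]}).encode())
session = json.loads(kv.get("session:7f3a"))
print(session["cart"])
```

Keeping values opaque is what makes lookups by key fast and simple, but a question such as "which sessions contain item 205?" would require the application to scan and decode every value itself.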

Column-based NOSQL Systems
Google BigTable is a well-known example of this class of NOSQL systems. BigTable uses the Google File System (GFS) for data storage and distribution. Apache HBase is somewhat similar to Google BigTable, but it typically uses HDFS (Hadoop Distributed File System) for data storage. Apache Cassandra also uses concepts from column-based NOSQL systems.

HBase
Data is stored in tables, and each table has a table name. A table is associated with one or more column families. When data is loaded into a table, each column family can be associated with many column qualifiers, but the column qualifiers are not specified as part of creating a table. A column is specified by a combination of ColumnFamily:ColumnQualifier. The concept of a column family is somewhat similar to vertical partitioning, because the columns of a family are stored in the same files and accessed together. HBase can keep several versions of a data item, along with the timestamp associated with each version. A cell holds a basic data item; the key (address) of a cell is specified by a combination of table, row id, column family, column qualifier, and timestamp. A namespace is a collection of tables. HBase has low-level CRUD operations, and it uses Apache ZooKeeper for managing the data on distributed server nodes.
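The cell-addressing scheme can be sketched in memory. `ToyHTable` is a hypothetical class, not the HBase client API: a counter stands in for write timestamps, and it assumes a single version limit for the whole table (real HBase configures the number of versions per column family). Column families are fixed at creation, while qualifiers appear only when data is written.

```python
import itertools
from collections import defaultdict


class ToyHTable:
    """Hypothetical in-memory sketch of HBase-style cell addressing.

    Column families are fixed when the table is created; column qualifiers
    appear only when data is written. Within this table, a cell is addressed
    by (row key, "family:qualifier", timestamp).
    """

    def __init__(self, name, column_families, max_versions=3):
        self.name = name
        self.families = set(column_families)
        # Assumption: one version limit for the whole table (HBase sets
        # this per column family).
        self.max_versions = max_versions
        # row key -> column -> [(timestamp, value), ...], newest first
        self.rows = defaultdict(lambda: defaultdict(list))
        self._clock = itertools.count(1)     # stand-in for write timestamps

    def put(self, row, column, value):
        family, _, qualifier = column.partition(":")
        if family not in self.families or not qualifier:
            raise ValueError(f"bad column {column!r} for table {self.name}")
        versions = self.rows[row][column]
        versions.insert(0, (next(self._clock), value))
        del versions[self.max_versions:]     # drop versions beyond the limit

    def get(self, row, column, timestamp=None):
        """Newest value, or the newest one written at or before `timestamp`."""
        for ts, value in self.rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None


emp = ToyHTable("EMPLOYEE", ["Name", "Address", "Details"])
emp.put("row1", "Name:Fname", "John")        # qualifier created on write
emp.put("row1", "Name:Lname", "Smith")
emp.put("row1", "Details:Job", "Engineer")
emp.put("row1", "Details:Job", "Manager")    # a newer version of the cell
print(emp.get("row1", "Details:Job"))
```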

Examples in HBase

NOSQL Graph Databases
The data is represented as a graph, which is a collection of vertices (nodes) and edges. Both nodes and edges can be labeled to indicate the types of entities and relationships they represent, and it is generally possible to store data associated with both individual nodes and individual edges.
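Finding related nodes by traversing labeled edges, as in a path expression that follows one edge type repeatedly, amounts to a graph search restricted to that label. The sketch assumes a small hypothetical graph held in Python structures and uses breadth-first search; it illustrates the idea and is not how any particular graph database evaluates paths.

```python
from collections import deque

# Hypothetical graph: node id -> (label, properties), plus directed,
# labeled edges (source, edge label, target).
nodes = {
    "alice": ("EMPLOYEE", {"title": "Director"}),
    "bob":   ("EMPLOYEE", {"title": "Manager"}),
    "carol": ("EMPLOYEE", {"title": "Engineer"}),
    "dave":  ("EMPLOYEE", {"title": "Analyst"}),
    "d1":    ("DEPARTMENT", {"dname": "Research"}),
}
edges = [
    ("alice", "Supervises", "bob"),
    ("bob", "Supervises", "carol"),
    ("alice", "Supervises", "dave"),
    ("carol", "WorksFor", "d1"),
]


def traverse(start, edge_label, edge_list):
    """Nodes reachable from `start` by following one or more edges that carry
    `edge_label`, in breadth-first order (start itself is excluded)."""
    adjacency = {}
    for src, label, dst in edge_list:
        if label == edge_label:
            adjacency.setdefault(src, []).append(dst)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:       # guard against cycles
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(traverse("alice", "Supervises", edges))  # everyone alice supervises, directly or not
```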

Neo4j
The data model organizes data using the concepts of nodes and relationships. Nodes can have zero, one, or several labels. The nodes that have the same label are grouped into a collection that identifies a subset of the nodes in the database graph for querying purposes. Relationships are directed; each relationship has a start node and an end node as well as a relationship type. Properties can be specified via a map pattern, which is made of one or more "name : value" pairs enclosed in curly brackets, for example {Lname : 'Smith', Fname : 'John'}. Neo4j has a high-level query language, Cypher.
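The Neo4j concepts above (labels grouping nodes, directed relationships with a type, property maps) can be mirrored in a small in-memory model. `PropertyGraph` is a hypothetical class, not Neo4j's driver API; its `match` method is only a rough analogue of a Cypher pattern such as `MATCH (e:EMPLOYEE)-[:WorksFor]->(d:DEPARTMENT)`, assumed here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: set
    props: dict = field(default_factory=dict)   # the "name : value" map


@dataclass
class Relationship:
    start: str
    rel_type: str
    end: str
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory property graph (not the Neo4j API)."""

    def __init__(self):
        self.nodes = {}
        self.by_label = {}   # label -> set of node ids (the label's collection)
        self.rels = []

    def create_node(self, node_id, labels=(), **props):
        node = Node(node_id, set(labels), props)  # zero or more labels
        self.nodes[node_id] = node
        for label in node.labels:
            self.by_label.setdefault(label, set()).add(node_id)
        return node

    def create_relationship(self, start, rel_type, end, **props):
        # Relationships are directed (start -> end) and have one type.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must already exist")
        rel = Relationship(start, rel_type, end, props)
        self.rels.append(rel)
        return rel

    def match(self, start_label, rel_type, end_label):
        """Rough analogue of (a:start_label)-[:rel_type]->(b:end_label)."""
        starts = self.by_label.get(start_label, set())
        ends = self.by_label.get(end_label, set())
        return [(self.nodes[r.start], self.nodes[r.end]) for r in self.rels
                if r.rel_type == rel_type and r.start in starts and r.end in ends]


g = PropertyGraph()
g.create_node("e1", ["EMPLOYEE"], Lname="Smith", Fname="John")
g.create_node("d5", ["DEPARTMENT"], Dname="Research")
g.create_relationship("e1", "WorksFor", "d5")
print([(a.props["Lname"], b.props["Dname"])
       for a, b in g.match("EMPLOYEE", "WorksFor", "DEPARTMENT")])
```

Because relationships are directed, reversing the pattern (`g.match("DEPARTMENT", "WorksFor", "EMPLOYEE")`) finds nothing, and the per-label sets play the role of the label collections used to narrow a query.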

Create Nodes

Create Relationships

Basic simplified syntax of Cypher clauses

Examples of simple Cypher queries

What is next?

Take Home Message
- NOSQL
- Document-based NOSQL
- Key-value stores
- Column-based NOSQL
- Graph databases