NOSQL databases and Big Data Storage Systems

COP 6726: New Directions in Database Systems

NOSQL
The term NOSQL is generally interpreted as Not Only SQL.
Most NOSQL systems are distributed databases with a focus on semi-structured data storage, high performance, availability, data replication, and scalability. For many of these applications, structured relational SQL systems offer more services than are needed (e.g., a powerful query language, concurrency control) and impose a structured data model that may be too restrictive.

Emergence of NOSQL Systems
Google developed a NOSQL system known as BigTable, which is used in many of Google's applications that require vast amounts of data storage, such as Gmail, Google Maps, and Web site indexing. This system uses concepts from column-based or wide column stores. Amazon developed a NOSQL system called DynamoDB that is available through Amazon's cloud services. This system uses concepts from key-value stores. Facebook developed a NOSQL system called Cassandra, which is now open source and known as Apache Cassandra. This system uses concepts from both key-value stores and column-based systems. Other software companies started developing their own solutions. Examples include MongoDB and CouchDB, which are classified as document-based NOSQL systems or document stores. Another category of NOSQL systems is the graph-based NOSQL systems, or graph databases; these include Neo4j and GraphBase.

Characteristics of NOSQL Systems
Distributed databases and distributed systems. NOSQL systems emphasize high availability and scalability. Replication improves data availability and can also improve read performance. However, writes become more cumbersome, because an update must be applied to every copy of the replicated data. Two major replication models are used: master-slave and master-master replication. Many NOSQL applications do not require serializable consistency, so more relaxed forms of consistency known as eventual consistency are used. Sharding (i.e., horizontal partitioning) of the file records is often employed in NOSQL systems. Most systems use hashing or range partitioning on object keys to achieve high-performance data access.
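The hashing approach to sharding can be sketched as follows. This is a minimal illustration, not the scheme of any particular NOSQL system: the function names, the choice of MD5 as a stable hash, and the "user:N" key format are all assumptions made for the example.

```python
import hashlib


def shard_for_key(key: str, num_shards: int) -> int:
    """Map an object key to a shard number using a stable hash.

    hashlib is used instead of the built-in hash() because hash() of a
    string changes between Python processes.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Group keys by the shard that would store them."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for_key(key, num_shards)].append(key)
    return shards


if __name__ == "__main__":
    # Hypothetical object keys, for illustration only.
    keys = [f"user:{i}" for i in range(10)]
    for shard, members in partition(keys, 3).items():
        print(shard, members)
```

Because the shard is computed from the key alone, a read for a single key can go straight to one node. The trade-off is that a simple modulo scheme like this one moves most keys when the number of shards changes.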

Characteristics of NOSQL Systems
Data models and query languages. NOSQL systems emphasize performance and flexibility over modeling power and complex querying. Not requiring a schema. Data can be semi-structured and self-describing; there are various languages for describing semi-structured data, such as JSON (JavaScript Object Notation) and XML (Extensible Markup Language). Less powerful query languages. Search (read) queries often locate single objects in a single file based on their object keys. Many NOSQL systems do not provide join operations as part of the query language itself, so joins must be implemented in the application programs. Versioning. Some NOSQL systems provide storage of multiple versions of data items, with timestamps recording when each version was created.
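When the store offers only key-based reads, a join has to be done by the application. The sketch below shows the idea using plain dictionaries as stand-ins for two collections; the collection names, field names, and sample records are hypothetical, and `get` is a placeholder for whatever key-lookup call a real system provides.

```python
# Two "collections" indexed by object key. The documents are self-describing:
# note that only one employee carries a "phone" field.
employees = {
    "e1": {"name": "Employee One", "dept_id": "d5"},
    "e2": {"name": "Employee Two", "dept_id": "d5"},
    "e3": {"name": "Employee Three", "dept_id": "d4", "phone": "555-0100"},
}
departments = {
    "d4": {"dname": "Administration"},
    "d5": {"dname": "Research"},
}


def get(collection, key):
    """Single-object read by key, the typical NOSQL access path."""
    return collection.get(key)


def employees_with_department(emps, depts):
    """Application-side join: one extra key lookup per employee."""
    rows = []
    for key, emp in emps.items():
        dept = get(depts, emp["dept_id"]) or {}
        rows.append({"key": key, "name": emp["name"], "dname": dept.get("dname")})
    return rows


if __name__ == "__main__":
    for row in employees_with_department(employees, departments):
        print(row)
```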

Categories of NOSQL Systems
Document-based NOSQL systems: These systems store data in the form of documents using well-known formats, such as JSON.
NOSQL key-value stores: These systems have a simple data model based on fast access by the key to the value associated with the key.
Column-based or wide column NOSQL systems: These systems partition a table by column into column families (i.e., vertical partitioning).
Graph-based NOSQL systems: Data is represented as graphs, and related nodes can be found by traversing the edges using path expressions.

The CAP theorem
The three letters in CAP refer to three desirable properties of a distributed system with replicated data:
Consistency (among replicated copies)
Availability (of the system for read and write operations)
Partition tolerance (in the face of the nodes in the system being partitioned by a network fault)
The theorem states that it is not possible to guarantee all three properties at the same time. In a NOSQL distributed data store, a weaker consistency level is often acceptable, while the other two properties (i.e., availability and partition tolerance) are treated as important.
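What a weaker consistency level means in practice can be illustrated with a toy model of asynchronous replication. This is only a sketch under simplifying assumptions (one primary copy, updates queued and applied to replicas later, no failures); the class and method names are invented for the example and do not correspond to any real system.

```python
class Replica:
    """One copy of the data."""

    def __init__(self):
        self.data = {}


class ReplicatedStore:
    """One primary and several replicas with delayed propagation."""

    def __init__(self, num_replicas):
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(num_replicas)]
        self.pending = []  # updates accepted by the primary, not yet replicated

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, index, key):
        # May return stale data (or None) until propagate() has run.
        return self.replicas[index].data.get(key)

    def propagate(self):
        # Apply queued updates to every replica, in order.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


if __name__ == "__main__":
    store = ReplicatedStore(num_replicas=2)
    store.write("x", 1)
    print(store.read_from_replica(0, "x"))  # stale read before propagation
    store.propagate()
    print(store.read_from_replica(0, "x"))  # replicas have converged
```

The store stays available for reads on every replica, at the cost of a window in which replicas disagree with the primary; once updates stop and propagation completes, all copies converge.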

Document-based NOSQL Systems
Document-based systems typically store data as collections of similar documents. There is no requirement to specify a schema; rather, the documents are self-describing, so documents in the same collection may have different fields. There are many document-based NOSQL systems, including MongoDB and CouchDB.

MongoDB
MongoDB documents are stored in BSON (Binary JSON).
Individual documents are stored in collections.
db.createCollection("project", {capped: true, size: <size_in_bytes>, max: 200})
A collection does not have a schema. Each document in a collection has a unique ObjectId field, called _id. MongoDB has several CRUD operations, where CRUD stands for create, read, update, delete.
db.<collection_name>.insert(<document(s)>)
db.<collection_name>.remove(<condition>)
db.<collection_name>.find(<condition>)

Example of simple document
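As a stand-in for the slide's example, the sketch below builds a small self-describing document as a Python dictionary and round-trips it through JSON text. The field names and values are made up for illustration, and plain JSON is used here rather than BSON.

```python
import json

# A hypothetical "project" document with an embedded array of sub-documents.
project_doc = {
    "_id": "P1",
    "Pname": "ProductX",
    "Plocation": "Bellaire",
    "Workers": [
        {"Ename": "Worker One", "Hours": 32.5},
        {"Ename": "Worker Two", "Hours": 20.0},
    ],
}

# Serialize to JSON text and parse it back.
text = json.dumps(project_doc, indent=2)
restored = json.loads(text)

if __name__ == "__main__":
    print(text)
```

The field names travel with the data, which is what makes the document self-describing: another document in the same collection could omit "Workers" or add new fields without any schema change.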

MongoDB
Replication: The concept of a replica set is used to create multiple copies of the same data set on different nodes in the distributed system. It uses a variation of the master-slave approach for replication.
Sharding (or horizontal partitioning): Sharding divides the documents in a collection into disjoint partitions known as shards. There are two ways to partition a collection into shards in MongoDB: range partitioning and hash partitioning. A query (CRUD operation) is routed to the nodes that contain the shards holding the documents the query requests.
Sharding focuses on improving performance via load balancing and horizontal scalability, whereas replication focuses on ensuring system availability when certain nodes fail in the distributed system.
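Query routing under range partitioning can be sketched as below. This is a simplified model, not MongoDB's actual router: the `RangeRouter` class, its split points, and the integer shard key are assumptions for the example.

```python
import bisect


class RangeRouter:
    """Route queries on a shard key to range-partitioned shards."""

    def __init__(self, split_points):
        # split_points [100, 200] define three shards:
        # shard 0: key < 100, shard 1: 100 <= key < 200, shard 2: key >= 200
        self.split_points = sorted(split_points)

    def shard_for(self, key):
        """Shard that holds a single shard-key value."""
        return bisect.bisect_right(self.split_points, key)

    def shards_for_range(self, low, high):
        """Shards that may hold keys in the inclusive range [low, high]."""
        return list(range(self.shard_for(low), self.shard_for(high) + 1))


if __name__ == "__main__":
    router = RangeRouter([100, 200])
    print(router.shard_for(150))
    print(router.shards_for_range(50, 150))
```

A query on a single shard-key value touches exactly one shard, and a range query touches only the shards whose ranges overlap it. A query that does not mention the shard key at all would have to be sent to every shard.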

NOSQL Key-Value Stores
The data model is relatively simple: each value is stored under a unique key and retrieved by that key. In many of these systems there is no query language, but rather a set of operations that application programmers can use. Key-value stores include DynamoDB, Voldemort, Oracle NoSQL, Redis, and Apache Cassandra.
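The "set of operations" style of interface can be sketched with an in-memory stand-in. The `KeyValueStore` class and its `put`/`get`/`delete` method names are illustrative assumptions, not the API of any of the systems listed above.

```python
class KeyValueStore:
    """In-memory key-value store exposing operations instead of a query language."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        """Insert a new key-value pair or overwrite the existing value."""
        self._items[key] = value

    def get(self, key, default=None):
        """Return the value stored under key, or default if it is absent."""
        return self._items.get(key, default)

    def delete(self, key):
        """Remove key; return True if it existed, False otherwise."""
        if key in self._items:
            del self._items[key]
            return True
        return False


if __name__ == "__main__":
    store = KeyValueStore()
    # The value is opaque to the store; here it happens to be a dictionary.
    store.put("user:42", {"name": "Example User"})
    print(store.get("user:42"))
    print(store.delete("user:42"), store.get("user:42"))
```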

Column-based NOSQL Systems
Google BigTable is a well-known example of this class of NOSQL systems. BigTable uses the Google File System (GFS) for data storage and distribution. Apache HBase is somewhat similar to Google BigTable, but it typically uses HDFS (Hadoop Distributed File System) for data storage. Apache Cassandra also uses concepts from column-based NOSQL systems.

HBase
Data is stored in tables, and each table has a table name.
A table is associated with one or more column families. When the data is loaded into a table, each column family can be associated with many column qualifiers, but the column qualifiers are not specified as part of creating a table. A column is specified by a combination of ColumnFamily:ColumnQualifier. The concept of a column family is somewhat similar to vertical partitioning, because the columns in the same family are stored in the same files and accessed together. HBase can keep several versions of a data item, along with the timestamp associated with each version. A cell holds a basic data item. The key (address) of a cell is specified by a combination of (table, rowid, columnfamily, columnqualifier, timestamp). A namespace is a collection of tables. HBase has low-level CRUD operations. HBase uses Apache Zookeeper for managing the data on the distributed server nodes.

Examples in HBase
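The data model described above (column families fixed at table creation, qualifiers added at load time, versioned cells addressed by row, column, and timestamp) can be mimicked with a small in-memory sketch. This is not the HBase client API: `ToyTable`, its methods, and the sample table and column names are all hypothetical.

```python
from collections import defaultdict


class ToyTable:
    """In-memory imitation of an HBase-style table with versioned cells."""

    def __init__(self, name, column_families, max_versions=3):
        self.name = name
        self.column_families = set(column_families)  # fixed when the table is created
        self.max_versions = max_versions
        # (row_id, family, qualifier) -> list of (timestamp, value), newest first
        self._cells = defaultdict(list)

    def put(self, row_id, column, value, timestamp):
        """Store one version of a cell; column is 'Family:Qualifier'."""
        family, qualifier = column.split(":", 1)
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        versions = self._cells[(row_id, family, qualifier)]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]  # keep only the newest versions

    def get(self, row_id, column, timestamp=None):
        """Return the newest value, or the newest one at or before timestamp."""
        family, qualifier = column.split(":", 1)
        for ts, value in self._cells.get((row_id, family, qualifier), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None


if __name__ == "__main__":
    table = ToyTable("EMPLOYEE", ["Name", "Details"])
    # Qualifiers (Fname, Job) appear only when data is loaded.
    table.put("row1", "Name:Fname", "John", timestamp=1)
    table.put("row1", "Name:Fname", "Jon", timestamp=2)
    table.put("row1", "Details:Job", "Engineer", timestamp=1)
    print(table.get("row1", "Name:Fname"))
    print(table.get("row1", "Name:Fname", timestamp=1))
```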

NOSQL Graph Databases
The data is represented as a graph, which is a collection of vertices (nodes) and edges. Both nodes and edges can be labeled to indicate the types of entities and relationships they represent, and it is generally possible to store data associated with both individual nodes and individual edges.

Neo4j
The data model organizes data using the concepts of nodes and relationships. Nodes can have zero, one, or several labels. The nodes that have the same label are grouped into a collection that identifies a subset of the nodes in the database graph for querying purposes. Relationships are directed; each relationship has a start node and an end node as well as a relationship type. Properties can be specified via a map pattern, which is made of one or more "name: value" pairs enclosed in curly brackets; for example, {Lname: 'Smith', Fname: 'John'}. Neo4j has a high-level query language, Cypher.
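The node, label, relationship, and property concepts can be modeled in a few lines of Python. This is a conceptual sketch only, not Neo4j's API or Cypher: the `Graph` class, its method names, and the EMPLOYEE/DEPARTMENT/WorksFor sample data are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # zero, one, or several labels
    properties: dict = field(default_factory=dict)   # name: value pairs


@dataclass
class Relationship:
    start: Node                                      # relationships are directed
    end: Node
    rel_type: str
    properties: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = []
        self.relationships = []

    def create_node(self, labels, **properties):
        node = Node(set(labels), dict(properties))
        self.nodes.append(node)
        return node

    def create_relationship(self, start, rel_type, end, **properties):
        rel = Relationship(start, end, rel_type, dict(properties))
        self.relationships.append(rel)
        return rel

    def nodes_with_label(self, label):
        """All nodes grouped under one label."""
        return [n for n in self.nodes if label in n.labels]

    def outgoing(self, node, rel_type):
        """End nodes reached from node over relationships of one type."""
        return [r.end for r in self.relationships
                if r.start is node and r.rel_type == rel_type]


if __name__ == "__main__":
    g = Graph()
    emp = g.create_node(["EMPLOYEE"], Lname="Smith", Fname="John")
    dept = g.create_node(["DEPARTMENT"], Dname="Research")
    g.create_relationship(emp, "WorksFor", dept)
    for d in g.outgoing(emp, "WorksFor"):
        print(emp.properties["Lname"], "works for", d.properties["Dname"])
```

Because relationships are directed, traversing "WorksFor" from the department node returns nothing; a real graph database would let a query follow the relationship in either direction.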

Create Nodes

Create Relationships

Basic simplified syntax of Cypher clauses

Examples of simple Cypher queries

What is next?

Take Home Message
NOSQL
Document-based NOSQL
Key-Value Stores
Column-based NOSQL
Graph Databases
