CSE 482 Lecture 5: NoSQL.


Outline of Today’s Lecture
- The previous lecture covered relational databases and SQL
- Today’s lecture focuses on NoSQL

NoSQL
- Stands for "Not only SQL" (it does not mean "No SQL")
- Supports distributed data storage and processing across multiple servers
- Motivation
  - Many new applications require large-scale data storage
  - Traditional database systems provide many features (e.g., powerful query languages, concurrency control) that add complexity these applications do not need
  - The structured data model used by traditional database systems is also too restrictive for the new applications, e.g., the schema is often fixed and not flexible enough

Traditional versus New Applications
- Traditional: bank, grocery, and credit card transactions, etc.
  - Lots of read, write, and update operations
  - Fixed set of columns and data format
  - Consistency is important
- New: Facebook, Gmail or Yahoo Mail, Flickr, etc.
  - Mostly read or write operations (few updates)
  - Variable set of columns and data format
  - Availability is important

Required Characteristics of NoSQL
- Scalability
  - Store data in a cluster of machines
  - Can easily expand storage by adding more nodes to the cluster
- Availability
  - Data is replicated over multiple nodes to improve availability
  - However, replication makes writes more expensive, because any update must be applied to every copy of the replicated data item
  - NoSQL systems typically assume eventual consistency, i.e., all replicas will eventually become consistent (instead of guaranteeing consistency at all times)
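The eventual-consistency idea above can be illustrated with a toy, in-memory simulation. This is only a sketch under simplifying assumptions: the class names are hypothetical, there is no network, and propagation is triggered by an explicit sync() call rather than by a real replication protocol.

```python
class Replica:
    """One copy of the data, held by a single (simulated) node."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: a write lands on replica 0 first and reaches the others later."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # queued (replica_index, key, value) updates

    def write(self, key, value):
        # Apply the write to the first replica immediately...
        self.replicas[0].data[key] = value
        # ...and queue it for every other replica instead of waiting for them.
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, key, replica_index):
        # A read may be served by any replica, including one that is behind.
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Deliver all queued updates; afterwards every replica agrees.
        for i, key, value in self.pending:
            self.replicas[i].data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("status", "hello")
stale = store.read("status", replica_index=2)  # replica 2 has not seen the write yet
store.sync()
fresh = store.read("status", replica_index=2)  # now consistent with replica 0
```

The write returns without waiting for all replicas, which keeps the system responsive, at the cost of a window in which different replicas can return different answers.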

CAP Theorem
- For distributed database systems, we want
  - Consistency: all replicated copies are consistent
  - Availability: each read/write request must receive a response
  - Partition tolerance: the system must continue to operate even when a fault partitions the nodes in the network
- CAP theorem: it is not possible to guarantee all three at the same time
- NoSQL systems typically settle for weaker consistency levels

Types of NoSQL Systems
- Document-based systems: store data in the form of documents using well-known formats such as JSON
  - Example: MongoDB
- Key-value systems: use key-value pairs for fast access to data items; the value can be a record, an object, a document, or a complex data structure
  - Examples: Amazon’s DynamoDB, Facebook’s Cassandra
- Column-based systems: partition a table by column into column families, where each column family is stored in its own files
  - Example: Google’s BigTable
- Graph-based systems: data is represented as graphs, and related nodes are found by traversing the edges using path expressions
  - Example: GraphBase

MongoDB
- An open-source document database
- Stores data as collections of documents in binary JSON (BSON) format
- Each document in a given collection has a unique id (key)
- Structure: a MongoDB database contains collections (Collection 1, Collection 2, ...), and each collection is a set of JSON documents

CRUD Operations
- Create: create a document to be inserted into a collection
    db.<collection_name>.insert(<document(s)>)
- Read: find documents in the collection
    db.<collection_name>.find(<condition>)
- Update: update a document in the collection
    db.<collection_name>.update(<condition>, <update>)
- Delete: remove a document from the collection
    db.<collection_name>.remove(<condition>)
- Python (pymongo) tutorial: http://api.mongodb.com/python/current/tutorial.html
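For readers following along in Python, the four shell operations above map roughly onto pymongo's insert_one, find_one, update_one, and delete_one. The sketch below assumes a pymongo Collection object is passed in; the document and field values are hypothetical.

```python
# Hypothetical document and conditions, used only for illustration.
NEW_DOC = {"Name": "alice", "City": "Lansing"}
CONDITION = {"Name": "alice"}
UPDATE = {"$set": {"City": "Detroit"}}


def crud_demo(collection):
    """Run one create/read/update/delete cycle on a pymongo collection."""
    # Create: insert a copy, because insert_one adds an _id key to the dict it is given.
    collection.insert_one(dict(NEW_DOC))
    # Read: fetch the first document matching the condition.
    found = collection.find_one(CONDITION)
    # Update: change only the City field of the matching document.
    collection.update_one(CONDITION, UPDATE)
    updated = collection.find_one(CONDITION)
    # Delete: remove the matching document again.
    collection.delete_one(CONDITION)
    return found, updated
```

With a server running, this could be called as crud_demo(MongoClient()["test"]["profiles"]) after importing MongoClient from pymongo; the database and collection names here are placeholders.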

Obtaining and Installing MongoDB
- You can download MongoDB from https://www.mongodb.org/downloads#production
- After installation:
  - Create a data directory to store the data files
      prompt> md <data_directory>
  - Launch the server by executing mongod.exe
      prompt> mongod.exe --dbpath <data_directory>
  - Launch a client instance by executing mongo.exe
      prompt> mongo.exe

Launching the Server
MongoDB is ready to accept new commands

Launching the Client
MongoDB is ready to accept new commands

Some Useful Commands
- use <database_name>: if database_name exists, switch to that database; otherwise, create a new database with the given name (MongoDB only materializes the new database once data is first stored in it)
- db: show the name of the current database
- show dbs: display all the available databases
- show collections: display all the collections created under the current database

Collections
- To create a collection of documents:
    db.createCollection(collection_name, collection_options)
- Example:
    db.createCollection("posts", {capped: true, size: 1310720, max: 500})
  The options specify that the collection has upper limits on its storage space (size, in bytes) and on its number of documents (max)

Collections
- Capped versus uncapped collection
- Capped collection: documents are stored in a fixed-size circular queue
  - Documents are stored according to insertion order
  - If the number of documents exceeds max, the oldest document is removed
  - Fast, especially when a large number of inserts is needed
  - Does not require an index to maintain insertion order
  - You cannot delete documents from a capped collection
- Illustration with max = 3: the collection holds doc1, doc2, doc3; inserting doc4 evicts doc1, leaving doc2, doc3, doc4
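The circular-queue behaviour of a capped collection can be mimicked in plain Python with a bounded deque. This is an analogy only, not how MongoDB is implemented: it models the max-documents limit and ignores the size-in-bytes limit.

```python
from collections import deque

# A deque with maxlen=3 plays the role of a capped collection with max = 3.
capped = deque(maxlen=3)

for doc in ["doc1", "doc2", "doc3", "doc4"]:
    # Appending to a full deque silently drops the oldest element.
    capped.append(doc)

contents = list(capped)  # insertion order is preserved for the survivors
```

After the fourth insert, doc1 has been evicted and the remaining documents are still in insertion order, which matches the max = 3 illustration on the slide.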

Collections
- To create a collection:
    db.createCollection("collection_name", {capped: true, size: 1310720, max: 500, autoIndexId: true})
- capped: true/false. If true, you must also specify the size parameter.
- size: maximum size in bytes. If it is less than or equal to 4096, the collection will have a cap of 4096 bytes. Otherwise, the size is raised to an integer multiple of 256.
- max: maximum number of documents allowed in the collection
- autoIndexId: true/false. If true, automatically create an index on the _id field. MongoDB creates this index by default, and the option is deprecated in later releases.

Collections
- To check whether a collection is capped:
    db.collection_name.isCapped()
- To drop an existing collection:
    db.collection_name.drop()
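From Python, the same create, check, and drop steps might look like the sketch below. It assumes a pymongo Database object is passed in and reuses the "posts" example from the earlier slide; the helper names are hypothetical. pymongo has no isCapped() method, so the sketch reads the collection's options instead.

```python
# Options from the lecture's createCollection example.
CAPPED_OPTIONS = {"capped": True, "size": 1310720, "max": 500}


def create_capped_posts(db):
    """Create the capped 'posts' collection and report whether it is capped."""
    # create_collection raises CollectionInvalid if 'posts' already exists.
    posts = db.create_collection("posts", **CAPPED_OPTIONS)
    # options() returns the creation options stored by the server.
    return bool(posts.options().get("capped", False))


def drop_posts(db):
    """Drop the 'posts' collection (a no-op if it does not exist)."""
    db.drop_collection("posts")
```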

Insert
- Syntax: db.collection_name.insert(document)
- Can be used to insert one or more documents
- If collection_name does not exist, it will be created automatically

Insert

Example
Suppose we want to create a collection of social media profiles:
  { Name: 'bob', City: 'Detroit', Interests: [ 'sports', 'outdoor' ] }
  { Name: 'mary', City: 'Chicago', Interests: [ 'science', 'art' ] }
  { Name: 'john', City: 'Lansing', Interests: [ 'politics', 'music' ] }
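In Python, these three profiles become a list of dictionaries that can be inserted in one call. The sketch assumes a pymongo Collection is passed in; the function name is hypothetical.

```python
# The three profiles from the slide, written as Python dictionaries.
PROFILES = [
    {"Name": "bob", "City": "Detroit", "Interests": ["sports", "outdoor"]},
    {"Name": "mary", "City": "Chicago", "Interests": ["science", "art"]},
    {"Name": "john", "City": "Lansing", "Interests": ["politics", "music"]},
]


def insert_profiles(collection):
    """Insert all profiles and return how many documents were stored."""
    # Copies are inserted because pymongo adds an _id key to each dict it stores.
    result = collection.insert_many([dict(p) for p in PROFILES])
    return len(result.inserted_ids)
```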

Example

Example
- Find the users who live in Lansing
- Find the users who are older than 30
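These two queries could be written as filter documents like the ones below. The example profiles shown earlier have no age field, so the numeric Age field here is an assumption, and find_names is a hypothetical helper that expects a pymongo Collection.

```python
# Equality match on the City field.
LIVES_IN_LANSING = {"City": "Lansing"}

# Range match; assumes each profile stores a numeric Age field.
OLDER_THAN_30 = {"Age": {"$gt": 30}}


def find_names(collection, query):
    """Return the Name of every document matching the filter."""
    return [doc["Name"] for doc in collection.find(query)]
```

The same filter documents work unchanged in the mongo shell, e.g. db.profiles.find({City: 'Lansing'}), assuming the collection is named profiles.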

Example
- Find Lansing users who are older than 23
- Find the users who like outdoors or travel
- Find the users who don’t belong to any groups
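A possible set of filter documents for these three queries is sketched below. The Age and Groups field names are assumptions (neither appears in the example profiles), and "outdoor" is spelled as in the example data.

```python
# Several conditions in one filter are combined with an implicit AND.
LANSING_OVER_23 = {"City": "Lansing", "Age": {"$gt": 23}}

# $in on an array field matches if any listed value appears in the array.
LIKES_OUTDOOR_OR_TRAVEL = {"Interests": {"$in": ["outdoor", "travel"]}}

# $exists: False matches documents that lack the (assumed) Groups field entirely.
NO_GROUPS = {"Groups": {"$exists": False}}


def run_queries(collection):
    """Run each filter against a pymongo collection and collect matching names."""
    queries = {
        "lansing_over_23": LANSING_OVER_23,
        "outdoor_or_travel": LIKES_OUTDOOR_OR_TRAVEL,
        "no_groups": NO_GROUPS,
    }
    return {
        label: [doc["Name"] for doc in collection.find(query)]
        for label, query in queries.items()
    }
```

Note that the last filter would not match a user whose Groups field is present but empty; that case would need a different condition.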

Other Query Operators

Update
- Syntax: db.collection_name.update(selection_condition, update)
- Example: change Bob’s interests from music and outdoor to music and art
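One way to express this update from Python is with update_one and the $set operator, as sketched below. The function name is hypothetical and a pymongo Collection is assumed.

```python
# Which document to change...
SELECTION = {"Name": "bob"}

# ...and how: $set replaces only the Interests field and leaves the rest intact.
UPDATE = {"$set": {"Interests": ["music", "art"]}}


def update_bob(collection):
    """Apply the update and return the number of documents actually modified."""
    result = collection.update_one(SELECTION, UPDATE)
    return result.modified_count
```

Using an operator such as $set matters: in the legacy shell update(), passing a plain document as the second argument replaces the whole matched document rather than a single field.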

Remove
- Syntax: db.collection_name.remove(selection_condition)
- Example: remove all users who are older than 30
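The pymongo counterpart of remove() for several documents is delete_many. The sketch below again assumes a numeric Age field, which is not part of the example profiles, and a hypothetical helper name.

```python
# Assumes each profile stores a numeric Age field.
OLDER_THAN_30 = {"Age": {"$gt": 30}}


def remove_older_users(collection):
    """Delete every matching document and return how many were removed."""
    result = collection.delete_many(OLDER_THAN_30)
    return result.deleted_count
```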

Bulk Import of JSON file
- You can import the file directly using mongoimport at the command prompt:
    C:> mongoimport -d <database> -c <collectionName> --file <filename>
- Suppose you have a JSON file named users.json:
    C:> mongoimport -d test -c profiles --file users.json
  This will create a collection named profiles in the test database to store information about the 4 users
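By default mongoimport reads one JSON document per line (a single JSON array needs the --jsonArray flag). The lecture's users.json is not shown, so the sketch below writes a stand-in file from the three example profiles; the helper name and the temporary path are assumptions.

```python
import json
import os
import tempfile


def write_json_lines(path, documents):
    """Write one JSON document per line, the layout mongoimport reads by default."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in documents:
            f.write(json.dumps(doc) + "\n")


# Stand-in data: the three profiles from the earlier example slide.
users = [
    {"Name": "bob", "City": "Detroit", "Interests": ["sports", "outdoor"]},
    {"Name": "mary", "City": "Chicago", "Interests": ["science", "art"]},
    {"Name": "john", "City": "Lansing", "Interests": ["politics", "music"]},
]

# Write to a temporary directory so the sketch does not touch the working directory.
path = os.path.join(tempfile.mkdtemp(), "users.json")
write_json_lines(path, users)

# Read the file back to confirm there is exactly one document per line.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```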

Aggregate Function
- Syntax: db.collection_name.aggregate(aggregate_operation)
- For more examples: https://www.mkyong.com/mongodb/mongodb-aggregate-and-group-example/
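As an illustration of what an aggregate operation can look like, the sketch below counts users per city with a $group stage. The pipeline and helper name are illustrative choices, not taken from the lecture, and the sample data (including the extra user ann) is hypothetical. A plain-Python Counter shows the result the pipeline is meant to compute.

```python
from collections import Counter

# Group documents by City, count each group, then sort by descending count.
COUNT_BY_CITY = [
    {"$group": {"_id": "$City", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]


def users_per_city(collection):
    """Run the pipeline on a pymongo collection and return {city: count}."""
    return {row["_id"]: row["count"] for row in collection.aggregate(COUNT_BY_CITY)}


# Plain-Python equivalent on hypothetical sample data, for comparison.
sample = [
    {"Name": "bob", "City": "Detroit"},
    {"Name": "mary", "City": "Chicago"},
    {"Name": "john", "City": "Lansing"},
    {"Name": "ann", "City": "Lansing"},
]
local_counts = dict(Counter(doc["City"] for doc in sample))
```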

Accessing MongoDB using Python
- Install the pymongo library package:
    conda install pymongo
- After installing, launch the server (see slide 13)
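Once the server is running, a connection from Python might be opened as in the sketch below. The URI assumes a server on the local machine at MongoDB's default port 27017; the function name and timeout value are illustrative.

```python
# Assumption: the server runs locally on MongoDB's default port.
MONGO_URI = "mongodb://localhost:27017/"


def connect(uri=MONGO_URI, timeout_ms=2000):
    """Open a client connection and verify that the server responds."""
    # Imported here so this module can be loaded even where pymongo is absent.
    from pymongo import MongoClient

    client = MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
    # The ping command raises an exception if no server is reachable in time.
    client.admin.command("ping")
    return client
```

A database and collection are then selected by name, for example connect()["test"]["profiles"], where both names are placeholders.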

Using MongoDB to store tweets
- Launch the MongoDB server
- Python script to download tweets and store them in MongoDB
  - Use tweepy to download tweets from CDCgov
  - Use pymongo to
    - Open a connection to the MongoDB server
    - Store the JSON tweets
    - Query MongoDB to retrieve the tweets

Using MongoDB to store tweets Step 1: Use tweepy to retrieve tweets

Using MongoDB to store tweets
Step 2: Connect to MongoDB and store tweets
- Select the database
- The tweets are stored in a collection named twitter
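The storing step could be sketched as a small helper like the one below. It assumes the tweets are already available as a list of JSON-style dictionaries and that a pymongo Collection (the one named twitter) is passed in; the function name is hypothetical.

```python
def store_tweets(collection, tweets):
    """Insert tweet dictionaries into the collection and return how many were stored."""
    # Copy each tweet, because pymongo adds an _id key to the dicts it inserts.
    docs = [dict(t) for t in tweets]
    if not docs:
        # insert_many rejects an empty list, so there is nothing to do here.
        return 0
    result = collection.insert_many(docs)
    return len(result.inserted_ids)
```

The collection would be obtained from an open client as client[<database>]["twitter"], with the database name chosen as on the slide.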

Using MongoDB to store tweets Step 3: Query MongoDB to retrieve tweets

Using MongoDB to store tweets Step 3b: Using regular expression to find tweets about Zika
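A regular-expression query of this kind can be written with the $regex operator, as sketched below. The sketch assumes each stored tweet keeps its message in a text field and that the match should be case-insensitive; the helper name and sample texts are hypothetical. The same pattern is applied locally with Python's re module to show which texts it selects.

```python
import re

# Case-insensitive match for "zika" anywhere in the (assumed) text field.
ZIKA_QUERY = {"text": {"$regex": "zika", "$options": "i"}}


def find_zika_tweets(collection):
    """Return the text of every stored tweet that mentions Zika."""
    return [doc["text"] for doc in collection.find(ZIKA_QUERY)]


# Local illustration of the same pattern on hypothetical sample texts.
sample_texts = ["Zika virus update for travelers", "Flu season is here"]
pattern = re.compile(ZIKA_QUERY["text"]["$regex"], re.IGNORECASE)
local_matches = [t for t in sample_texts if pattern.search(t)]
```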

Summary
- Goals of this lecture:
  - To introduce NoSQL and explain how it differs from SQL
  - To introduce MongoDB
  - To give examples of how to interact with MongoDB using Python
- Next lecture
  - Data preprocessing