CSE 482 Lecture 5: NoSQL.

1 CSE 482 Lecture 5: NoSQL

2 Outline of Today’s Lecture
The previous lecture covered relational databases and SQL
Today’s lecture focuses on NoSQL

3 NoSQL
Not only SQL (does not mean No SQL)
Supports distributed data storage and processing across multiple servers
Motivation
  Many new applications require large data storage
  Traditional database systems provide many functionalities (e.g., powerful query languages, concurrency control) that are overly complex and not needed by these applications
  The structured data model used by traditional database systems is also too restrictive for the new applications
    E.g., the schema is often fixed and not flexible enough

4 Traditional versus New Applications
Traditional
  Bank, grocery, and credit card transactions, etc.
  Lots of read, write, and update operations
  Fixed set of columns and data format
  Consistency is important
New
  Facebook, Gmail or Yahoo mail, Flickr, etc.
  Mostly read or write (few update operations)
  Variable set of columns and data format
  Availability is important

5 Required Characteristics of NoSQL
Scalability
  Store data in a cluster of machines
  Can easily expand storage by adding more nodes to the cluster
Availability
  Data is replicated over multiple nodes to improve availability
  However, writes become more expensive because any update must be applied to every copy of the replicated data items
  NoSQL assumes eventual consistency, i.e., all replicas will eventually be consistent (instead of guaranteeing consistency at all times)
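
The trade-off between replication and consistency can be illustrated with a toy model in plain Python. This is only a sketch: the class names `Replica` and `EventuallyConsistentStore` and the explicit `sync()` step are hypothetical, and real NoSQL systems propagate updates in the background with far more elaborate protocols.

```python
class Replica:
    """One copy of the data, held by a single node (hypothetical helper)."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: a write lands on one replica first and spreads later."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # updates not yet applied to every replica

    def write(self, key, value):
        # Acknowledge the write after updating a single replica
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # A read may be served by any replica, including a stale one
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Apply the queued updates to all replicas, in write order
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("status", "online")
stale = store.read("status", replica_index=2)  # replica 2 has not seen the write yet
store.sync()
fresh = store.read("status", replica_index=2)  # all replicas have now converged
```

Between `write` and `sync`, a reader may see an old value; after `sync`, every replica agrees. That window is what "eventual" refers to.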

6 CAP Theorem
For distributed database systems, we want
  Consistency: all replicated copies are consistent
  Availability: each read/write request must receive a response
  Partition tolerance: the system must continue to operate even when a fault partitions the nodes in the network
CAP theorem: it is not possible to guarantee all three at the same time
NoSQL systems typically settle for weaker consistency levels

7 Types of NoSQL Systems
Document-based systems: store data in the form of documents using well-known formats such as JSON
  Example: MongoDB
Key-value systems: use key-value pairs for fast access to data items; the value can be a record, an object, a document, or a complex data structure
  Examples: Amazon’s DynamoDB, Facebook’s Cassandra
Column-based systems: partition a table by column into column families, where each column family is stored in its own files
  Example: Google’s BigTable
Graph-based systems: data is represented as graphs, and related nodes are found by traversing the edges using path expressions
  Example: GraphBase

8 MongoDB
An open-source document database
Stores data as collections of documents in binary JSON (BSON) format
Each document in a given collection has a unique id (key)
[Diagram: a MongoDB database containing Collection 1 and Collection 2, each a set of JSON documents]

9 CRUD Operations
Create: create a document to be inserted into the collection
  db.<collection_name>.insert(<document(s)>)
Read: find a document in the collection
  db.<collection_name>.find(<condition>)
Update: update a document in the collection
  db.<collection_name>.update(<condition>, <update>)
Delete: remove a document from the collection
  db.<collection_name>.remove(<condition>)

10 Obtaining and Installing MongoDB
You can download MongoDB from [link not preserved]
After installation:
  Create a data directory to store the data files
    prompt> md <data_directory>
  Launch the server by executing mongod.exe
    prompt> mongod.exe --dbpath <data_directory>
  Launch the client instance by executing mongo.exe
    prompt> mongo.exe

11 Launching the Server

12 Launching the Client
MongoDB is ready to accept new commands

13 Some Useful Commands
use <database_name>:
  If database_name exists, switches to the named database
  Otherwise, creates a new database with the given name
db: shows the name of the current database
show dbs: displays all the databases available
show collections: displays all the collections created under the current database

14 Collections
To create a collection of documents:
  db.createCollection(collection_name, collection_options)
Example:
  db.createCollection("posts", {capped:true, size: , max:500})
  Specifies that the collection has upper limits on its storage space (size, in bytes) and number of documents (max)

15 Collections
Capped versus uncapped collection
Capped collection:
  Documents are stored in a fixed-size circular queue
  Documents are stored according to insertion order
  If the number of documents exceeds the maximum, the oldest document is removed
  Fast, especially when a large number of inserts is needed
  Does not require an index for insertion order
  You cannot delete documents from a capped collection
[Figure: capped collection with Max = 3; it holds doc1, doc2, doc3, and after doc4 is inserted it holds doc2, doc3, doc4]
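
The eviction behaviour of a capped collection with Max = 3 can be mimicked in plain Python. This is an analogy only: `collections.deque` stands in for the circular queue and is not how MongoDB stores documents.

```python
from collections import deque

# A deque with maxlen behaves like a fixed-size circular queue:
# appending to a full deque drops the oldest item.
capped = deque(maxlen=3)  # plays the role of max = 3

for doc in ["doc1", "doc2", "doc3"]:
    capped.append(doc)
before = list(capped)  # insertion order is preserved

capped.append("doc4")  # the queue is full, so the oldest entry is evicted
after = list(capped)
```

Here `before` is `['doc1', 'doc2', 'doc3']` and `after` is `['doc2', 'doc3', 'doc4']`.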

16 Collections
To create a collection:
  db.createCollection("collection_name", {capped:true, size: , max:500, autoIndexId: true})
capped: true/false. If true, you must specify the size parameter.
size: If it is less than or equal to 4096, the collection will have a cap of 4096 bytes. Otherwise, the size is raised to an integer multiple of 256.
max: maximum number of documents allowed in the collection
autoIndexId: true/false. If true, automatically create an index on the _id field. MongoDB creates this index by default; set the option to false to skip it.
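
The size rule can be written out as a small function. `effective_capped_size` is a hypothetical helper, not a MongoDB or pymongo call; it only encodes the rule as stated on this slide, and actual behaviour may differ between MongoDB versions.

```python
def effective_capped_size(requested_bytes: int) -> int:
    """Return the cap (in bytes) implied by the rule on this slide."""
    if requested_bytes <= 4096:
        return 4096  # small requests are raised to the 4096-byte minimum
    # Otherwise round up to the next integer multiple of 256
    return -(-requested_bytes // 256) * 256


small = effective_capped_size(100)    # below the minimum
exact = effective_capped_size(5120)   # already a multiple of 256
odd = effective_capped_size(4097)     # rounded up to the next multiple
```

With these inputs, `small` is 4096, `exact` stays 5120, and `odd` becomes 4352.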

17 Collections
To check whether a collection is capped:
  db.collection_name.isCapped()
To drop an existing collection:
  db.collection_name.drop()

18 Insert
Syntax: db.collection_name.insert(document)
  Can be used to insert one or more documents
  If collection_name does not exist, it will be created automatically

19 Insert

20 Example
Suppose we want to create a collection of social media profiles
  { Name: 'bob', City: 'Detroit', Interests: [ 'sports', 'outdoor' ] }
  { Name: 'mary', City: 'Chicago', Interests: [ 'science', 'art' ] }
  { Name: 'john', City: 'Lansing', Interests: [ 'politics', 'music' ] }
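
From Python, the same three profiles could be inserted with pymongo's `insert_many`. A minimal sketch: `insert_profiles` is a hypothetical helper, and `collection` is assumed to be a pymongo collection handle obtained elsewhere.

```python
PROFILES = [
    {"Name": "bob", "City": "Detroit", "Interests": ["sports", "outdoor"]},
    {"Name": "mary", "City": "Chicago", "Interests": ["science", "art"]},
    {"Name": "john", "City": "Lansing", "Interests": ["politics", "music"]},
]


def insert_profiles(collection, profiles=PROFILES):
    """Insert all profile documents and return the generated _id values."""
    # Copy each dict: pymongo adds an _id key to the documents it inserts
    result = collection.insert_many([dict(p) for p in profiles])
    return result.inserted_ids
```

Copying the dictionaries keeps `PROFILES` unchanged, so the function can be called again on another collection without carrying old `_id` values along.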

21 Example

22 Example
Find the users who live in Lansing
Find the users who are older than 30
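
These two queries could be expressed as pymongo filter documents along the following lines. This is a sketch under assumptions: the `Age` field does not appear in the profile documents of slide 20 and is assumed here, and `find_names` is a hypothetical helper.

```python
# Equality match on a field
LIVES_IN_LANSING = {"City": "Lansing"}

# Comparison operator: $gt means "greater than" (Age is an assumed field)
OLDER_THAN_30 = {"Age": {"$gt": 30}}


def find_names(collection, query):
    """Return the Name field of every document matching the filter."""
    return [doc["Name"] for doc in collection.find(query)]
```

For example, `find_names(profiles, LIVES_IN_LANSING)` would list the Lansing users, where `profiles` is a pymongo collection handle.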

23 Example
Find Lansing users who are older than 23
Find the users who like outdoors or travel
Find the users who don’t belong to any groups
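
One possible set of pymongo filters for these three queries is sketched below. The `Age` and `Groups` field names are assumptions (neither appears in the slide 20 documents), "no groups" is interpreted as the field being absent, and `count_matches` is a hypothetical helper. The interest value is spelled `outdoor` to match the profile documents.

```python
# Several conditions in one filter are combined with a logical AND
LANSING_AND_OLDER_THAN_23 = {"City": "Lansing", "Age": {"$gt": 23}}

# $in matches when the array field contains any of the listed values
LIKES_OUTDOOR_OR_TRAVEL = {"Interests": {"$in": ["outdoor", "travel"]}}

# $exists: False matches documents that lack the field entirely
NO_GROUPS = {"Groups": {"$exists": False}}


def count_matches(collection, query):
    """Return how many documents in the collection match the filter."""
    return collection.count_documents(query)
```

If users without groups were stored with an empty `Groups` array instead, the last filter would need to test for an empty array rather than a missing field.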

24 Other Query Operators

25 Update
Syntax: db.collection_name.update(selection_condition, update)
Example: Change Bob’s interests from music and outdoor to music and art
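
In pymongo, an update like this could be written with `update_one` and the `$set` operator. A minimal sketch: `set_interests` is a hypothetical helper and `collection` is assumed to be a pymongo collection handle.

```python
def set_interests(collection, name, interests):
    """Replace the Interests array of the first user with the given name."""
    result = collection.update_one(
        {"Name": name},                      # selection condition
        {"$set": {"Interests": interests}},  # update: overwrite one field
    )
    return result.modified_count  # number of documents actually changed
```

Bob's change would then be `set_interests(profiles, "bob", ["music", "art"])`. Without `$set`, the second argument would not be a valid update document for `update_one`.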

26 Remove
Syntax: db.collection_name.remove(selection_condition)
Example: Remove all users who are older than 30
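
The pymongo counterpart of this removal is `delete_many`. The sketch below assumes an `Age` field on each profile; `remove_older_than` is a hypothetical helper.

```python
def remove_older_than(collection, age):
    """Delete every user whose Age exceeds the threshold; return the count."""
    result = collection.delete_many({"Age": {"$gt": age}})
    return result.deleted_count
```

Calling `remove_older_than(profiles, 30)` would remove all matching documents at once; `delete_one` would remove only the first match.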

27 Bulk Import of JSON file
You can import the file directly using mongoimport on the command prompt:
  C:> mongoimport -d <database> -c <collectionName> --file <filename>
Suppose you have a JSON file named users.json:
  C:> mongoimport -d test -c profiles --file users.json
This will create a collection named profiles in the test database to store information about the 4 users

28 Aggregate Function
Syntax: db.collection_name.aggregate(aggregate_operation)
For more examples: [link not preserved]
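
As one illustration of an aggregate operation, the pipeline below counts users per city. It is a sketch, not an example taken from the lecture: the pipeline and the helper `users_per_city` are made up here, using the `City` field from the profile documents.

```python
# Stage 1 groups documents by City and counts them;
# stage 2 sorts the groups by descending count.
USERS_PER_CITY = [
    {"$group": {"_id": "$City", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]


def users_per_city(collection):
    """Return a {city: number_of_users} dictionary."""
    return {row["_id"]: row["count"] for row in collection.aggregate(USERS_PER_CITY)}
```

Each stage receives the output of the previous one, which is why the sort can refer to the `count` field created by `$group`.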

29 Accessing MongoDB using Python
Install the pymongo library package:
  conda install pymongo
After installing, launch the server (see slides 10 and 11)
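
Once the server is running, a connection from Python might look like the sketch below. It assumes a server on the local machine at MongoDB's default port 27017; the database and collection names (`test`, `profiles`) are borrowed from the mongoimport example, and `get_collection` is a hypothetical helper.

```python
def get_collection(db_name="test", collection_name="profiles",
                   host="localhost", port=27017):
    """Return a handle to a collection on a running MongoDB server."""
    # Imported inside the function so this file loads even without pymongo
    from pymongo import MongoClient

    client = MongoClient(host, port)
    # Databases and collections are created lazily on first insert
    return client[db_name][collection_name]
```

The returned handle exposes methods such as `insert_one`, `find`, `update_one`, and `delete_many`.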

30 Using MongoDB to store tweets
Launch the MongoDB server
Write a Python script to download tweets and store them in MongoDB
  Use tweepy to download tweets from CDCgov
  Use pymongo to
    Open a connection to the MongoDB server
    Store the JSON tweets
    Query MongoDB to retrieve the tweets

31 Using MongoDB to store tweets
Step 1: Use tweepy to retrieve tweets

32 Using MongoDB to store tweets
Step 2: Connect to MongoDB and store tweets
The tweets are stored in a collection named twitter in the selected database
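
The storing step could be sketched as follows. This is not the script shown on the slide: `store_tweets` is a hypothetical helper, each tweet is assumed to have already been converted to a plain dictionary, and `collection` is assumed to be the pymongo handle for the twitter collection.

```python
def store_tweets(collection, tweets):
    """Insert raw tweet dictionaries and return how many were stored."""
    docs = [dict(t) for t in tweets]  # copy, since pymongo adds _id in place
    if not docs:
        return 0  # insert_many rejects an empty list
    return len(collection.insert_many(docs).inserted_ids)
```

Because MongoDB does not enforce a fixed schema, tweets with different sets of fields can go into the same collection.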

33 Using MongoDB to store tweets
Step 3: Query MongoDB to retrieve tweets

34 Using MongoDB to store tweets
Step 3b: Using regular expression to find tweets about Zika
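
A regular-expression query of this kind could use MongoDB's `$regex` operator, as sketched below. The field name `text` is an assumption about where the tweet body is stored, and `find_zika_tweets` is a hypothetical helper. The last lines reproduce the same case-insensitive match locally with Python's `re` module, on made-up sample strings.

```python
import re

# $options "i" makes the match case-insensitive
ZIKA_FILTER = {"text": {"$regex": "zika", "$options": "i"}}


def find_zika_tweets(collection):
    """Return the text of every stored tweet that mentions Zika."""
    return [doc["text"] for doc in collection.find(ZIKA_FILTER)]


# The same matching rule, checked locally on sample strings
pattern = re.compile("zika", re.IGNORECASE)
samples = ["Zika virus update", "Flu season tips"]
local_matches = [s for s in samples if pattern.search(s)]
```

Only the first sample string matches, so `local_matches` is `['Zika virus update']`.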

35 Summary
Goals of this lecture:
  To introduce NoSQL and explain how it differs from SQL
  To introduce MongoDB
  To give examples of how to interact with MongoDB using Python
Next lecture: Data preprocessing

