
Data Tier Options NWEN304 Advanced Network Applications.

1 Data Tier Options NWEN304 Advanced Network Applications

2 Data Tier Options
- What are the options?
- How do you use them?
- What are the tradeoffs?
- How will we use this with Heroku?

3 What is the Scale of the Problem?
1. 100 hours of video are uploaded to YouTube [YouTube statistics].
2. 19,000 apps are downloaded from Apple's App Store [Nerney 2012].
3. 276,000 photos are uploaded to Snapchat [Van Hoven 2014].
4. 350,000 tweets are posted on Twitter [Mirani 2013].
5. 3,000,000 likes are recorded on Facebook [Tepper 2012].
6. 44,000,000 messages are sent on WhatsApp [Bushey 2014].
7. 204,000,000 emails are sent [Knoblauch 2014].

4 Databases are a solution
- A database is an organised collection of data.
- Databases are essential to almost every business.
- They emphasise scalability, reliability, security, efficiency, etc.
- There are many different types of DBMS (database management system), each optimised for different things.

5 Relational Databases
Relational database management systems (RDBMS) such as MySQL, PostgreSQL, DB2 and SQL Server have been dominant since the 1980s. They store data in tables made up of rows and columns. Each table represents a collection of related items, and each item is a row in the table.

An example table for bank customers:

customer_id | name           | date_of_birth
------------+----------------+--------------
1           | Brian Kim      | 1948-09-23
2           | Karen Johnson  | 1989-11-18
3           | Wade Feinstein | 1965-02-29

6 Defining a Schema
A schema defines the structure of each table. It is written in SQL (Structured Query Language):

    CREATE TABLE customers (
        customer_id   INT NOT NULL PRIMARY KEY,
        name          VARCHAR(128),
        date_of_birth DATE
    );

This creates an empty customers table with the columns customer_id, name and date_of_birth.
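To see what the schema does in practice, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. SQLite is chosen only because it ships with Python. Unlike PostgreSQL, it does not enforce VARCHAR(128) lengths or validate DATE values. The sketch creates the customers table and shows the PRIMARY KEY and NOT NULL constraints rejecting bad rows.

```python
import sqlite3

# In-memory database; the DDL mirrors the CREATE TABLE on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customer_id   INT NOT NULL PRIMARY KEY,
        name          VARCHAR(128),
        date_of_birth DATE
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Brian Kim', '1948-09-23')")

# PRIMARY KEY: a second row with customer_id 1 is rejected.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Karen Johnson', '1989-11-18')")
    duplicate_rejected = False
except sqlite3.IntegrityError as err:
    duplicate_rejected = True
    print("Rejected:", err)

# NOT NULL: a row without a customer_id is rejected.
try:
    conn.execute("INSERT INTO customers (name) VALUES ('Wade Feinstein')")
    null_rejected = False
except sqlite3.IntegrityError as err:
    null_rejected = True
    print("Rejected:", err)

# The schema is itself stored in the database and can be inspected.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
print(columns)  # ['customer_id', 'name', 'date_of_birth']
```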

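7 SQL: Inserting Data

    INSERT INTO customers (customer_id, name, date_of_birth)
    VALUES (1, 'Brian Kim', '1948-09-23');

String and date values go in single quotes. In standard SQL (and PostgreSQL), double quotes denote identifiers such as column names.

customer_id | name      | date_of_birth
------------+-----------+--------------
1           | Brian Kim | 1948-09-23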

8 SQL: Retrieving Data

    SELECT * FROM customers WHERE name = 'Brian Kim';

customer_id | name      | date_of_birth
------------+-----------+--------------
1           | Brian Kim | 1948-09-23

    SELECT date_of_birth FROM customers WHERE name = 'Brian Kim';

date_of_birth
-------------
1948-09-23
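The same INSERT and SELECT statements can be run from application code. In this sketch, SQLite is again accessed through Python's sqlite3 module as a convenient stand-in. Values are passed as `?` placeholders instead of being pasted into the SQL string, so the driver handles quoting. Placeholder syntax varies by driver. For example, the psycopg2 driver for PostgreSQL uses `%s`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INT NOT NULL PRIMARY KEY, name VARCHAR(128), date_of_birth DATE)"
)

# Insert two of the example rows; "?" placeholders let the driver quote values.
rows = [
    (1, "Brian Kim", "1948-09-23"),
    (2, "Karen Johnson", "1989-11-18"),
]
conn.executemany(
    "INSERT INTO customers (customer_id, name, date_of_birth) VALUES (?, ?, ?)", rows
)

# SELECT * returns every column of the matching rows.
everything = conn.execute(
    "SELECT * FROM customers WHERE name = ?", ("Brian Kim",)
).fetchall()
print(everything)  # [(1, 'Brian Kim', '1948-09-23')]

# Selecting one column returns only that value for each matching row.
dob = conn.execute(
    "SELECT date_of_birth FROM customers WHERE name = ?", ("Brian Kim",)
).fetchall()
print(dob)  # [('1948-09-23',)]
```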

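9 SQL: Multiple Tables
Bank account details for each account, stored in a single table:

account_id | name           | date_of_birth | customer_id | account_type | balance
-----------+----------------+---------------+-------------+--------------+--------
1          | Brian Kim      | 1948-09-23    | 1           | cheque       | 500
2          | Karen Johnson  | 1989-11-18    | 2           | cheque       | 8,500
3          | Brian Kim      | 1948-09-23    | 1           | savings      | 2,500
4          | Wade Feinstein | 1965-02-29    | 3           | checking     | 160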

10 SQL: Multiple Tables
Using the same single accounts table as on slide 9: which rows have to be updated if Brian decides to change his last name? Is there a better way?

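11 SQL: Multiple Tables
A better way is to split the data into two tables. Each customer's details are stored once, and each account refers to its customer by customer_id. Brian's name change then updates one row in customers, instead of every row for each account he holds.

accounts:

account_id | customer_id | account_type | balance
-----------+-------------+--------------+--------
1          | 1           | cheque       | 500
2          | 2           | cheque       | 8,500
3          | 1           | savings      | 2,500
4          | 3           | checking     | 160

customers:

customer_id | name           | date_of_birth
------------+----------------+--------------
1           | Brian Kim      | 1948-09-23
2           | Karen Johnson  | 1989-11-18
3           | Wade Feinstein | 1965-02-29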
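Here is a small sketch of why the split helps, using SQLite through Python's sqlite3 module. Renaming Brian in the combined table touches one row per account he holds. In the split design, the same rename touches exactly one row. The new surname "Smith" is made up for the example, and date_of_birth is omitted to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Combined design: customer details repeated on every account row.
conn.execute("CREATE TABLE accounts_flat (account_id INT PRIMARY KEY, name TEXT, "
             "customer_id INT, account_type TEXT, balance INT)")
conn.executemany("INSERT INTO accounts_flat VALUES (?, ?, ?, ?, ?)", [
    (1, "Brian Kim", 1, "cheque", 500),
    (2, "Karen Johnson", 2, "cheque", 8500),
    (3, "Brian Kim", 1, "savings", 2500),
])

# Split design: names live only in customers; accounts refer to customer_id.
conn.execute("CREATE TABLE customers (customer_id INT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE accounts (account_id INT PRIMARY KEY, customer_id INT, "
             "account_type TEXT, balance INT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Brian Kim"), (2, "Karen Johnson")])
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?, ?)", [
    (1, 1, "cheque", 500), (2, 2, "cheque", 8500), (3, 1, "savings", 2500),
])

# Brian changes his last name (hypothetical new name).
flat = conn.execute("UPDATE accounts_flat SET name = 'Brian Smith' WHERE customer_id = 1")
split = conn.execute("UPDATE customers SET name = 'Brian Smith' WHERE customer_id = 1")
print(flat.rowcount, split.rowcount)  # 2 1
```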

12 SQL: Foreign Key

    CREATE TABLE accounts (
        account_id   INT NOT NULL PRIMARY KEY,
        customer_id  INT,
        account_type VARCHAR(20),
        balance      INT,
        FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
    );

A foreign key is a field (or collection of fields) in one table that uniquely identifies a row of another table. Here, each accounts.customer_id must match the customer_id of a row in customers.

13 SQL: Enforcing Relationships

    INSERT INTO accounts (account_id, customer_id, account_type, balance)
    VALUES (1, 555, 'checking', 500);

The database can now reject an insert into accounts whose customer_id (here 555) does not exist in the customers table. MySQL, for example, reports:

    Error: Cannot add or update a child row: a foreign key constraint fails
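You can try the same constraint locally with SQLite through Python's sqlite3 module. Two differences from the slide matter here. First, SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is issued on the connection. Second, its error text differs from the MySQL message quoted on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (customer_id INT NOT NULL PRIMARY KEY, name VARCHAR(128))")
conn.execute("""
    CREATE TABLE accounts (
        account_id   INT NOT NULL PRIMARY KEY,
        customer_id  INT,
        account_type VARCHAR(20),
        balance      INT,
        FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Brian Kim')")

# Accepted: customer 1 exists.
conn.execute("INSERT INTO accounts VALUES (1, 1, 'cheque', 500)")

# Rejected: there is no customer 555.
try:
    conn.execute("INSERT INTO accounts VALUES (2, 555, 'checking', 500)")
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("Rejected:", err)
```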

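14 SQL: Join: Combining Tables

    SELECT customers.name
    FROM customers
    JOIN accounts ON customers.customer_id = accounts.customer_id
    WHERE accounts.balance > 1000;

The JOIN matches each account to its customer. The query therefore returns the names of customers who hold an account with a balance over 1000:

name
-------------
Karen Johnson
Brian Kim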
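This sketch runs the join against the sample data from slide 11, using SQLite through Python's sqlite3 module. It returns the two expected names. The ORDER BY is added only to make the output order predictable, since SQL does not guarantee row order without it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE accounts (account_id INT PRIMARY KEY, customer_id INT, "
             "account_type TEXT, balance INT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Brian Kim"), (2, "Karen Johnson"), (3, "Wade Feinstein")])
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?, ?)", [
    (1, 1, "cheque", 500), (2, 2, "cheque", 8500),
    (3, 1, "savings", 2500), (4, 3, "checking", 160),
])

# The JOIN pairs each account with its customer; WHERE keeps balances over 1000.
names = [row[0] for row in conn.execute("""
    SELECT customers.name
    FROM customers
    JOIN accounts ON customers.customer_id = accounts.customer_id
    WHERE accounts.balance > 1000
    ORDER BY customers.name
""")]
print(names)  # ['Brian Kim', 'Karen Johnson']
```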

15 NoSQL
- NoSQL = "Not Only SQL".
- Developed by Internet companies to deal with their demands for performance, availability and data volume.
- Great for large-scale problems.
- Early examples: Google's BigTable, Amazon's Dynamo.
- Often described as "open source, distributed, non-relational databases".
- Main types: key-value stores, document stores, column-oriented databases and graph databases. We'll focus on the first two because they are potential options for use with Heroku.
- There are a lot of them: http://nosql-database.org/

16 Key-Value Stores
Examples: Redis, DynamoDB, Riak, Voldemort (LinkedIn).

    > put "the-key" "the-value"
    > get "the-key"
    version(0:1): "the-value"

- They are optimised for a single use case: extremely fast lookup by a known identifier (the key).
- Effectively, they are a hash table distributed across many servers.
- They do not use schemas, so you can store any kind of value.
- The downside is that values are "opaque blobs", so the store cannot support any querying mechanism other than lookup by key.
- Really useful for web sessions and records with IDs, e.g. user:$id:name = bob.
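The "hash table distributed across many servers" idea can be sketched in a few lines of Python. This is a toy built on two assumptions. Each "server" is just an in-memory dict, and keys are assigned to servers with simple modulo hashing. Production stores such as Dynamo and Riak use consistent hashing instead, so that adding a server does not move most keys. The class name ShardedKeyValueStore is hypothetical.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store: one dict per 'server', keys routed by hash."""

    def __init__(self, num_servers: int) -> None:
        self.servers = [dict() for _ in range(num_servers)]

    def _server_for(self, key: str) -> dict:
        # A stable hash (Python's built-in hash() for str is randomised per
        # process) so every client routes the same key to the same server.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]

    def put(self, key: str, value) -> None:
        self._server_for(key)[key] = value

    def get(self, key: str):
        # Returns None for unknown keys.
        return self._server_for(key).get(key)


store = ShardedKeyValueStore(num_servers=4)
store.put("the-key", "the-value")
store.put("user:42:name", "bob")  # the user:$id:name key pattern from the slide
print(store.get("the-key"))       # the-value
print(store.get("user:42:name"))  # bob
```

A get only works with a known key. Finding every user named bob would mean scanning every value on every server, which is the "opaque blob" limitation listed on the slide.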

17 Document Stores
Like key-value stores, except that:
- the database understands the structure of each value, which allows more complex queries;
- key-value pairs are stored as JSON-like documents;
- documents belong to collections.
Examples: MongoDB, CouchDB, Couchbase.

18
- A collection is a group of MongoDB documents. Like a table in an RDBMS, a collection typically holds documents that are related or have a similar purpose.
- Collections do NOT enforce a schema, so documents within a collection can have different fields.

    > db.createCollection("people")
    { "ok" : 1 }

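19 MongoDB: Inserting Data
You don't need to call createCollection first. Saving a document into a collection that doesn't exist yet creates the collection:

    db.people.save({_id: "the-key", name: "Shawn", age: 24, locationId: 123})

(db.collection.save() has been deprecated since MongoDB 4.2. insertOne() is the current way to insert a new document.)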

20 MongoDB: Inserting Data
There is no predefined schema for documents. Example (MongoDB):

    db.people.save({_id: "the-key", name: "Shawn", age: 24, locationId: 123})

Here, people is the collection that the document is saved into.

21 MongoDB: Inserting Data
Every document is identified by a key stored in its _id field, and you can choose this key yourself. In the slide 20 example, the key is "the-key".

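22 MongoDB: Inserting Data
If you don't specify an _id, one is generated automatically:

    db.people.save({name: "Shawn", age: 24, locationId: 123})

The generated key is an ObjectId, e.g. _id = ObjectId("545bdc1e"). This value is abbreviated; a full ObjectId is 24 hexadecimal characters.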
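The auto-generated _id is not purely random. MongoDB documents an ObjectId as 12 bytes: a 4-byte timestamp (seconds since the Unix epoch), a 5-byte random value and a 3-byte incrementing counter. The sketch below builds a value with that layout in plain Python. It is an illustration, not MongoDB's implementation; real applications get ObjectIds from their driver, such as the bson package that ships with PyMongo. If the slide's abbreviated "545bdc1e" is the leading 8 hex characters, those characters are the timestamp and decode to the creation time.

```python
import os
import time
from datetime import datetime, timezone

# Per-process state, as in the documented ObjectId layout.
_counter = int.from_bytes(os.urandom(3), "big")
_process_random = os.urandom(5)


def make_object_id() -> str:
    """Build a 24-hex-character id laid out like an ObjectId:
    4-byte timestamp | 5-byte random value | 3-byte counter."""
    global _counter
    _counter = (_counter + 1) % (1 << 24)  # wrap around at 3 bytes
    raw = (int(time.time()).to_bytes(4, "big")
           + _process_random
           + _counter.to_bytes(3, "big"))
    return raw.hex()


def creation_time(object_id_hex: str) -> datetime:
    # The first 8 hex characters (4 bytes) are seconds since the Unix epoch.
    return datetime.fromtimestamp(int(object_id_hex[:8], 16), tz=timezone.utc)


new_id = make_object_id()
print(new_id, len(new_id))        # a fresh id, 24 hex characters long
print(creation_time("545bdc1e"))  # 2014-11-06 20:37:50+00:00
```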

23 MongoDB: Retrieving Data
find() with no arguments retrieves everything in the collection:

    > db.people.find()
    {"_id": "the-key", "age": 24, "name": "Shawn", "locationId": 123}
    {"_id": ObjectId("545bdc1e"), "age": 35, "name": "Bob", "locationId": 456}

24 MongoDB: Retrieving Data
Unlike a key-value store, a document store lets you look up documents by any field, not just the key:

    > db.people.find({"name": "Shawn"})
    {"_id": "the-key", "age": 24, "name": "Shawn", "locationId": 123}
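To make "lookups on any field" concrete, here is a toy in-memory collection in Python. It is a list of dicts, queried by equality on any combination of fields. It mimics only the simplest form of MongoDB's find() filter, with no operators such as $gt. The third document is hypothetical. It is there to show that documents in one collection need not share fields.

```python
from typing import Any, Dict, List, Optional

Document = Dict[str, Any]

# A "collection" is a list of documents; nothing enforces a common schema.
people: List[Document] = [
    {"_id": "the-key", "name": "Shawn", "age": 24, "locationId": 123},
    {"_id": "545bdc1e", "name": "Bob", "age": 35, "locationId": 456},  # stands in for ObjectId("545bdc1e")
    {"_id": "no-age", "name": "Ana"},  # hypothetical document with fewer fields
]


def find(collection: List[Document], query: Optional[Document] = None) -> List[Document]:
    """Return documents whose fields equal every field in the query.
    An empty query matches every document."""
    query = query or {}
    return [doc for doc in collection
            if all(field in doc and doc[field] == value
                   for field, value in query.items())]


print(len(find(people)))                  # 3
print(find(people, {"name": "Shawn"}))    # lookup by a non-key field
print(find(people, {"locationId": 456}))  # Bob's document
```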

25 Tradeoffs: Reading Data

Database type | Access type                  | JOIN
--------------+------------------------------+-----
Relational    | Very flexible query model    | Yes
Key-value     | Primary key lookup           | No
Document      | Primary and secondary lookup | No

Relational databases are great for general-purpose data storage because of their very flexible query model. Other NoSQL databases are great for special-purpose data storage, because each can be optimised for specific usage patterns.

26 Tradeoffs: Writing Data
Consider a database storing balances for bank accounts:

account_id | balance
-----------+--------
1          | 500
2          | 8,500
3          | 2,500

27 Tradeoffs: Writing Data
Updating a single field in a table, a single value in a key-value store or a single document in a document store is easy. NoSQL databases are often optimised for exactly this. For example, withdrawing $100 from account 1:

    SQL:        UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
    Key-value:  put "1" (get "1" - 100)    (pseudocode)
    MongoDB:    db.accounts.updateOne({_id: 1}, {$inc: {balance: -100}})

28 Tradeoffs: Writing Data
Now consider transferring $100 from account 1 to account 2. This takes two updates, and NoSQL databases are not necessarily atomic across them. There is no guarantee that the system won't crash after running only the first command, which would leave $100 withdrawn but never deposited.

    > db.accounts.updateOne({_id: 1}, {$inc: {balance: -100}})
    > db.accounts.updateOne({_id: 2}, {$inc: {balance: 100}})
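The crash scenario can be simulated with Python's sqlite3 module in autocommit mode (`isolation_level=None`). In this mode each UPDATE is committed as soon as it runs. That stands in for a store that guarantees atomicity only for a single operation. The "crash" is a hypothetical exception raised between the two updates.

```python
import sqlite3

# Autocommit mode: every statement is committed on its own.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (account_id INT PRIMARY KEY, balance INT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 8500)])


def transfer(amount: int, crash_midway: bool) -> None:
    conn.execute("UPDATE accounts SET balance = balance - ? WHERE account_id = 1", (amount,))
    if crash_midway:
        raise RuntimeError("simulated crash between the two updates")
    conn.execute("UPDATE accounts SET balance = balance + ? WHERE account_id = 2", (amount,))


try:
    transfer(100, crash_midway=True)
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT account_id, balance FROM accounts"))
print(balances)               # {1: 400, 2: 8500}
print(sum(balances.values()))  # 8900: $100 has vanished
```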

29 Tradeoffs: Writing Data
Relational databases support transactions, which are atomic: either both statements succeed or neither does.

    START TRANSACTION;
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
    COMMIT;
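Here is the same transfer wrapped in a transaction, sketched with Python's sqlite3 module. Using the connection as a context manager commits when the block completes and rolls back if an exception escapes it. The simulated crash is again just a raised exception. After a real crash, the database would discard the uncommitted change on recovery, so the effect on the data is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id INT PRIMARY KEY, balance INT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 8500)])
conn.commit()


def transfer(amount: int, crash_midway: bool) -> None:
    # "with conn" commits if the block finishes and rolls back if it raises,
    # so the two UPDATEs take effect together or not at all.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE account_id = 1", (amount,))
        if crash_midway:
            raise RuntimeError("simulated crash between the two updates")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE account_id = 2", (amount,))


def balances() -> dict:
    return dict(conn.execute("SELECT account_id, balance FROM accounts"))


try:
    transfer(100, crash_midway=True)
except RuntimeError:
    pass
after_crash = balances()
print(after_crash)     # {1: 500, 2: 8500}: the partial update was rolled back

transfer(100, crash_midway=False)
after_transfer = balances()
print(after_transfer)  # {1: 400, 2: 8600}
```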

30 Other tradeoffs
- Maturity: SQL databases are much older and more mature. Some companies have moved from NoSQL to SQL (e.g. Pinterest). This gap will narrow as NoSQL databases mature.
- Scalability, replication, availability and consistency: NoSQL databases let you make different tradeoffs between these (more on this later).

31 General advice
While developing, start with a relational database. It is more flexible, although potentially less able to deal with large amounts of data. Move to an appropriate NoSQL database once you understand the problem space.

32 Heroku and Datastores
During the project I suggest you use PostgreSQL (I think this is enforced now?). You will use it again in SWEN304 (Roma's course?).

