Data and Information Systems Laboratory University of Illinois Urbana-Champaign Data Mining Meeting Mar, 2012 1 From SQL to NoSQL Xiao Yu Mar 2012.


1 From SQL to NoSQL
Xiao Yu, Mar 2012

2 RDBMS
The predominant choice in storing data
›Not so true for data mining PhDs since we put everything in txt files.
The relational model was first formulated in 1969 by Codd
›We are using RDBMS everywhere

3 Slide from Neo Technology, “A NoSQL Overview and the Benefits of Graph Databases”

4 When RDBMS met Web 2.0
Slide from Lorenzo Alberton, “NoSQL Databases: Why, what and when”

5 What’s Wrong with Relational DB?
Nothing is wrong. You just need to use the right tool.
Relational is hard to scale.
›Easy to scale reads
›Hard to scale writes

6 The Death of RDBMS?

7 What’s NoSQL?
The misleading term “NoSQL” is short for “Not Only SQL”.
Non-relational, schema-free, not (quite) ACID
Horizontally scalable, distributed, easy replication support
Simple API

8 Four (emerging) NoSQL Categories
Key-value stores
›Based on DHTs / Amazon’s Dynamo paper *
›Data model: (global) collection of K-V pairs
›Example: Voldemort
Column families
›BigTable clones **
›Data model: big table, column families
›Examples: HBase, Cassandra, Hypertable
* G. DeCandia et al., Dynamo: Amazon's Highly Available Key-value Store, SOSP 07
** F. Chang et al., Bigtable: A Distributed Storage System for Structured Data, OSDI 06

9 Four (emerging) NoSQL Categories
Document databases
›Inspired by Lotus Notes
›Data model: collections of K-V collections
›Examples: CouchDB, MongoDB
Graph databases
›Inspired by Euler & graph theory
›Data model: nodes, relationships, K-V properties on both
›Examples: AllegroGraph, VertexDB, Neo4j

10 Focus of Different Data Models
Slide from Neo Technology, “A NoSQL Overview and the Benefits of Graph Databases”

11 CAP theorem
Consistency, Availability, Partition Tolerance
[Diagram: RDBMS and (most) NoSQL systems placed on the CAP triangle]

12 When to use NoSQL?
Bigness
Massive write performance
›Twitter generates 7 TB per day (2010)
Fast key-value access
Flexible schema or data types
Schema migration
Write availability
›Writes need to succeed no matter what (CAP, partitioning)
Easier maintainability, administration and operations
No single point of failure
Generally available parallel computing
Programmer ease of use
Use the right data model for the right problem
Avoid hitting the wall
Distributed systems support
Tunable CAP tradeoffs
From http://highscalability.com/

13 Key-Value Stores
A table in a relational DB corresponds to a store/domain in a key-value DB:

id    hair_color  age  height
1923  Red         18   6’0”
3371  Blue        34   NA
…     …           …    …

Find users whose age is above 18?
Find all attributes of user 1923?
Find users whose hair color is Red and age is 19? (Join operation)
Calculate average age of all grad students?
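The questions on this slide can be tried against a toy key-value store. The following is a minimal sketch, assuming a plain Python dict stands in for the store/domain; the helper names `get` and `scan` are hypothetical and do not belong to Voldemort or any real product. It illustrates that a lookup by key is a single operation, while any question about attribute values forces a scan over every record.

```python
# Toy key-value store: key -> opaque value (here a dict of attributes).
# The two records mirror the slide's table; everything else is illustrative.
store = {
    1923: {"hair_color": "Red", "age": 18, "height": "6'0\""},
    3371: {"hair_color": "Blue", "age": 34, "height": None},
}


def get(key):
    """Fetch all attributes of one key: the operation K-V stores are built for."""
    return store.get(key)


def scan(predicate):
    """Answer an attribute query by visiting every record (no secondary index)."""
    return [key for key, value in store.items() if predicate(value)]


# "Find all attributes of user 1923?" is a single key lookup.
user_1923 = get(1923)

# "Find users whose age is above 18?" needs a full scan.
above_18 = scan(lambda v: v["age"] > 18)

# "Find users whose hair color is Red and age is 19?" is also a full scan.
red_and_19 = scan(lambda v: v["hair_color"] == "Red" and v["age"] == 19)

# An aggregate such as an average age touches every record as well.
average_age = sum(v["age"] for v in store.values()) / len(store)
```

In this sketch only `get` avoids touching the whole data set, which is why the slide singles out "find all attributes of user 1923" as the natural fit for the model.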

14 Example of Voldemort

15 Voldemort in LinkedIn
Sid Anand, LinkedIn Data Infrastructure (QCon London 2012)

16 RO Store Usage Pattern
Sid Anand, LinkedIn Data Infrastructure (QCon London 2012)

17 Voldemort vs MySQL
Sid Anand, LinkedIn Data Infrastructure (QCon London 2012)

18 Column Families – BigTable Alike
F. Chang et al., Bigtable: A Distributed Storage System for Structured Data, OSDI 06

19 BigTable Data Model
The row name is a reversed URL. The contents column family contains the page contents, and the anchor column family contains the text of any anchors that reference the page.
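The data model described on this slide can be pictured as a map from (row, column, timestamp) to a value. The following is a minimal sketch under that assumption; the functions `row_key`, `put` and `latest`, the timestamps and the cell values are all illustrative and are not part of any BigTable client API.

```python
from urllib.parse import urlsplit


def row_key(url):
    """Reverse the host name so pages of one domain get adjacent row keys."""
    host = urlsplit(url).hostname
    return ".".join(reversed(host.split(".")))


# Sparse map: (row, "family:qualifier", timestamp) -> value.
table = {}


def put(row, column, timestamp, value):
    table[(row, column, timestamp)] = value


def latest(row, column):
    """Return the most recent version of one cell, or None if it is absent."""
    versions = [(ts, v) for (r, c, ts), v in table.items()
                if r == row and c == column]
    return max(versions, key=lambda pair: pair[0])[1] if versions else None


row = row_key("http://www.cnn.com/")

# "contents" family: page contents, with two illustrative versions.
put(row, "contents:", 3, "<html>old</html>")
put(row, "contents:", 5, "<html>new</html>")

# "anchor" family: one column per referring site, value is the anchor text.
put(row, "anchor:cnnsi.com", 9, "CNN")
```

The column name in the anchor family carries the referring site, so a new referrer adds a new column to this one row rather than a new row or a schema change.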

20 More on Row and Column
Rows are stored in lexicographic order by row key
The table is dynamically split into “Tablets”
Each tablet contains the keys in the range [startKey, endKey)
Tablets are distributed on different nodes
All data in the same CF are usually of the same type
Data in the same CF are compressed and stored together
CF in a specific row is sorted
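Because rows are sorted and each tablet owns a half-open key range, finding the tablet for a row key is a binary search over the tablet start keys. The following is a minimal sketch, assuming hypothetical split points and node names; a real system locates tablets through its own metadata, which is not modeled here.

```python
import bisect

# Hypothetical split points: tablet i covers [tablet_starts[i], tablet_starts[i + 1]).
# The empty string sorts before every other key, so the first tablet has no lower bound.
tablet_starts = ["", "com.cnn", "com.google", "org"]
tablet_nodes = ["node-a", "node-b", "node-c", "node-d"]


def find_tablet(row_key):
    """Index of the tablet whose [startKey, endKey) range contains row_key."""
    return bisect.bisect_right(tablet_starts, row_key) - 1


def node_for(row_key):
    return tablet_nodes[find_tablet(row_key)]


# Lexicographic order keeps rows of the same domain next to each other.
ordered = sorted(["org.apache", "com.cnn.www", "com.cnn.money"])
```

A start key belongs to its own tablet (`find_tablet("com.google")` gives 2), while the same key is excluded from the previous tablet, which matches the half-open interval on the slide.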

21 BigTable API Examples
First example: adds one anchor to www.cnn.com and deletes a different anchor
Second example: uses a Scanner abstraction to iterate over all anchors in a particular row
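The two operations named on this slide can be imitated in memory. The following is a minimal sketch, assuming a hypothetical `Row` class with `apply` and `scan` methods; it is not the BigTable client API, and the anchor names and values are illustrative.

```python
class Row:
    """One row: a map from 'family:qualifier' column names to values."""

    def __init__(self, columns=None):
        self.columns = dict(columns or {})

    def apply(self, sets=(), deletes=()):
        """Apply a batch of changes to this row in one step."""
        updated = dict(self.columns)
        for column, value in sets:
            updated[column] = value
        for column in deletes:
            updated.pop(column, None)
        # Swap in the new state only after every change has been prepared.
        self.columns = updated

    def scan(self, family):
        """Iterate over the columns of one family in sorted order."""
        prefix = family + ":"
        for column in sorted(self.columns):
            if column.startswith(prefix):
                yield column, self.columns[column]


# Row for www.cnn.com, keyed elsewhere by its reversed URL "com.cnn.www".
cnn = Row({"anchor:www.abc.com": "ABC", "contents:": "<html>...</html>"})

# Add one anchor and delete a different one in a single mutation.
cnn.apply(sets=[("anchor:www.c-span.org", "CNN")],
          deletes=["anchor:www.abc.com"])

# Iterate over all anchors of the row.
anchors = list(cnn.scan("anchor"))
```

Grouping the set and the delete into one `apply` call mirrors the idea of a single-row mutation; the scan only returns columns of the requested family.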

22 BigTable Performance

23 Document Database - mongoDB
Table in relational DB
Documents in a collection
Initial release 2009
Open source, document DB
JSON-like documents with dynamic schema
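"Dynamic schema" means that documents in one collection need not share the same fields. The following is a minimal sketch, assuming a Python list of dicts stands in for a collection; the `find` helper is hypothetical and is not the MongoDB driver API, and the field names are illustrative.

```python
# A collection is a list of JSON-like documents; no shared schema is declared.
users = [
    {"_id": 1923, "hair_color": "Red", "age": 18, "height": "6'0\""},
    {"_id": 3371, "hair_color": "Blue", "age": 34},  # no "height" field at all
]


def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in criteria.items())]


# A new document can introduce fields without migrating the existing ones.
users.append({"_id": 4001, "age": 25, "interests": ["data mining", "NoSQL"]})

red_haired = find(users, hair_color="Red")
aged_25 = find(users, age=25)
```

In a relational table the missing `height` would be a NULL column and `interests` would need a schema change or a second table; here each document simply carries the fields it has.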

24 mongoDB Product Deployment
And much more…

25 mongoDB Features
Document-oriented storage
Full Index Support
Replication & High Availability
Auto-Sharding
Querying
Fast In-Place Updates
Map/Reduce
GridFS
Commercial Support
From http://www.mongodb.org/

26 sum(checkout)
From Gabriele Lana, CouchDB Vs MongoDB

27 And mongoDB is fast

100000 Indexed Queries
         avg      med      dev      total
mongoDB  0.00025  0.00021  0.00019  24.99955
mySQL    0.00199  0.00032  0.00518  199.32546

100 Non-Indexed Queries
         avg      med      dev      total
mongoDB  0.05662  0.03704  0.19267  5.66164
mySQL    1.99975  1.68468  1.99266  199.97469

http://www.idiotsabound.com/did-i-mention-mongodb-is-fast-way-to-go-mongo

28 Graph Database Data Model
Abstraction:
›Nodes
›Relations
›Properties
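The three abstractions on this slide map directly onto small data structures. The following is a minimal sketch, assuming an in-memory property graph with hypothetical node ids, a `KNOWS` relation type and helper functions `neighbors` and `traverse`; it is not the Neo4j API.

```python
from collections import deque

# Nodes: id -> properties.
nodes = {
    1: {"name": "Alice"},
    2: {"name": "Bob"},
    3: {"name": "Carol"},
}

# Relations: (source, type, target, properties). Relations carry properties too.
relations = [
    (1, "KNOWS", 2, {"since": 2010}),
    (2, "KNOWS", 3, {"since": 2011}),
]


def neighbors(node_id, rel_type):
    """Targets reachable from node_id over one outgoing relation of rel_type."""
    return [dst for src, kind, dst, _props in relations
            if src == node_id and kind == rel_type]


def traverse(start, rel_type, max_depth):
    """Breadth-first traversal: node ids within max_depth hops of start."""
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node_id, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for nxt in neighbors(node_id, rel_type):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found


friends_of_friends = [nodes[n]["name"] for n in traverse(1, "KNOWS", 2)]
```

A "friends of friends" question is a depth-2 traversal here, whereas in a relational schema it would typically be a self-join per hop.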

29 Neo4j - Build a Graph
Slide from Neo Technology, “A NoSQL Overview and the Benefits of Graph Databases”

30 Neo4j – Traverse a Graph
Slide from Neo Technology, “A NoSQL Overview and the Benefits of Graph Databases”

31 A Debatable Performance Evaluation
Comparing apples to oranges

32 Conclusion
Use the right data model for the right problem

