Data and Information Systems Laboratory, University of Illinois Urbana-Champaign. Data Mining Meeting, Mar 2012. From SQL to NoSQL. Xiao Yu, Mar 2012.

Presentation transcript:

From SQL to NoSQL
Xiao Yu, Mar 2012

RDBMS
The predominant choice for storing data
›Not so true for data mining PhDs, since we put everything in txt files.
The relational model was first formulated in 1969 by Codd
›We are using RDBMS everywhere

Slide from Neo Technology, "A NoSQL Overview and the Benefits of Graph Databases"

When RDBMS met Web 2.0
Slide from Lorenzo Alberton, "NoSQL Databases: Why, what and when"

What’s Wrong with Relational DB?
Nothing is wrong. You just need to use the right tool.
Relational is hard to scale.
›Easy to scale reads
›Hard to scale writes

The Death of RDBMS?

What’s NoSQL?
The misleading term “NoSQL” is short for “Not Only SQL”.
Non-relational, schema-free, non-(quite)-ACID
Horizontally scalable, distributed, easy replication support
Simple API

Four (emerging) NoSQL Categories
Key-value stores
›Based on DHTs / Amazon’s Dynamo paper *
›Data model: (global) collection of K-V pairs
›Example: Voldemort
Column Families
›BigTable clones **
›Data model: big table, column families
›Example: HBase, Cassandra, Hypertable
* G. DeCandia et al., Dynamo: Amazon's Highly Available Key-value Store, SOSP '07
** F. Chang et al., Bigtable: A Distributed Storage System for Structured Data, OSDI '06

Four (emerging) NoSQL Categories
Document databases
›Inspired by Lotus Notes
›Data model: collections of K-V collections
›Example: CouchDB, MongoDB
Graph databases
›Inspired by Euler & graph theory
›Data model: nodes, relationships, K-V properties on both
›Example: AllegroGraph, VertexDB, Neo4j

Focus of Different Data Models
Slide from Neo Technology, "A NoSQL Overview and the Benefits of Graph Databases"

CAP theorem
Consistency, Availability, Partition Tolerance
[Diagram of the three CAP properties; labels: RDBMS, NoSQL (most)]

When to use NoSQL?
Bigness
Massive write performance
›Twitter generates 7 TB per day (2010)
Fast key-value access
Flexible schema or data types
Schema migration
Write availability
›Writes need to succeed no matter what (CAP, partitioning)
Easier maintainability, administration and operations
No single point of failure
Generally available parallel computing
Programmer ease of use
Use the right data model for the right problem
Avoid hitting the wall
Distributed systems support
Tunable CAP tradeoffs
From

Key-Value Stores

id    hair_color  age  height
1923  Red         18   6’0”
3371  Blue        34   NA
…     …           …    …

Table in relational db
Store/Domain in Key-Value db
Find users whose age is above 18?
Find all attributes of user 1923?
Find users whose hair color is Red and age is 19? (Join operation)
Calculate average age of all grad students?
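The questions on this slide can be tried against a toy key-value store. This is a minimal in-memory sketch, not Voldemort's API: the `store` dict and the `get` and `scan` helpers are names made up for the illustration. Fetching all attributes of one user is a single lookup by key, while any question about an attribute has to inspect every value.

```python
# Toy key-value "store": each key maps to one opaque value (here a dict).
# The data mirrors the two rows of the slide's table; all names are illustrative.
store = {
    1923: {"hair_color": "Red", "age": 18, "height": "6'0\""},
    3371: {"hair_color": "Blue", "age": 34, "height": None},
}


def get(key):
    """Fast path: fetch all attributes of one user by key."""
    return store.get(key)


def scan(predicate):
    """Slow path: without a secondary index every value must be inspected."""
    return sorted(key for key, value in store.items() if predicate(value))


user = get(1923)                               # all attributes of user 1923
older_than_18 = scan(lambda v: v["age"] > 18)  # requires a full scan
average_age = sum(v["age"] for v in store.values()) / len(store)
```

Only the first query matches the access pattern a key-value store is built for; the other two touch every record, which is the trade-off the slide's questions point at.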

Example of Voldemort

Voldemort in LinkedIn
Sid Anand, LinkedIn Data Infrastructure (QCon London 2012)

RO Store Usage Pattern
Sid Anand, LinkedIn Data Infrastructure (QCon London 2012)

Voldemort vs MySQL
Sid Anand, LinkedIn Data Infrastructure (QCon London 2012)

Column Families – BigTable Alike
F. Chang et al., Bigtable: A Distributed Storage System for Structured Data, OSDI '06

BigTable Data Model
The row name is a reversed URL. The contents column family contains the page contents, and the anchor column family contains the text of any anchors that reference the page.
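The data model described on this slide can be sketched as a nested map. This is a toy in-memory illustration, not Bigtable's API: the `table` dict and the `reverse_host`, `put` and `get_latest` helpers are assumptions made for the example. The timestamp dimension is not mentioned on the slide; it comes from the Bigtable paper, where each cell can hold several timestamped versions.

```python
def reverse_host(host):
    """Turn 'www.cnn.com' into 'com.cnn.www' so pages of one domain sort together."""
    return ".".join(reversed(host.split(".")))


# table[row_key][column][timestamp] = value, where a column is "family:qualifier".
table = {}


def put(row, column, timestamp, value):
    """Store one timestamped version of a cell."""
    table.setdefault(row, {}).setdefault(column, {})[timestamp] = value


def get_latest(row, column):
    """Return the value with the highest timestamp for the given cell."""
    versions = table[row][column]
    return versions[max(versions)]


row = reverse_host("www.cnn.com")
put(row, "contents:", 1, "<html>v1</html>")   # page contents, first crawl
put(row, "contents:", 2, "<html>v2</html>")   # page contents, later crawl
put(row, "anchor:cnnsi.com", 1, "CNN")        # anchor text of a referring page
```

Here the qualifier of an `anchor:` column names the referring site and the cell value is the link text, so one row collects everything known about one page.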

More on Row and Column
Rows are stored in lexicographic order by row key
Table is dynamically split into “Tablets”
Each tablet contains the keys in a range [startKey, endKey)
Tablets are distributed on different nodes
All data in the same CF are usually of the same type
Data in the same CF are compressed and stored together
CF in a specific row is sorted
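Because rows are kept in lexicographic order and each tablet owns a half-open key range, finding the tablet for a row key reduces to a binary search over the tablet start keys. The following is a minimal sketch under that assumption; the split points and node names are invented, and a real system keeps this mapping in metadata rather than in a Python list.

```python
import bisect

# Tablet i covers [split_points[i], split_points[i + 1]); the last one is open-ended.
# Split points and node names are hypothetical.
split_points = ["", "com.g", "org.a"]
tablet_nodes = ["node-A", "node-B", "node-C"]


def find_tablet(row_key):
    """Return (tablet index, node) of the tablet whose key range contains row_key."""
    index = bisect.bisect_right(split_points, row_key) - 1
    return index, tablet_nodes[index]


cnn = find_tablet("com.cnn.www")
google = find_tablet("com.google.www")
boundary = find_tablet("com.g")          # a start key belongs to its own tablet
wiki = find_tablet("org.wikipedia.en")
```

Reversed URLs make rows of the same domain adjacent in this order, so they tend to land in the same tablet and can be read with one range scan.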

BigTable API Examples
First example: adds one anchor to a row and deletes a different anchor
Second example: uses a Scanner abstraction to iterate over all anchors in a particular row

BigTable Performance

Document Database - mongoDB
Table in relational db
Documents in a collection
Initial release 2009
Open source, document db
JSON-like documents with dynamic schema
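What "JSON-like documents with dynamic schema" means can be shown with plain Python dicts. This is a sketch, not MongoDB driver code: the `users` list stands in for a collection, the `find` helper is made up for the example, and the extra fields are invented. Only the `_id` key follows MongoDB's convention for the primary key field.

```python
import json

# Two documents in the same hypothetical "users" collection.
# Unlike rows of a relational table, they need not share the same fields.
users = [
    {"_id": 1923, "hair_color": "Red", "age": 18, "height": "6'0\""},
    {
        "_id": 3371,
        "hair_color": "Blue",
        "age": 34,
        "languages": ["en", "fr"],        # a field only this document has
        "address": {"city": "Urbana"},    # a nested sub-document
    },
]


def find(collection, **criteria):
    """Return the documents whose top-level fields equal all given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


red_users = find(users, hair_color="Red")
as_json = json.dumps(users[1], sort_keys=True)   # documents serialize to JSON
```

Adding a field to one document needs no schema migration, which is the "flexible schema" argument made earlier in the talk.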

mongoDB Product Deployment
And much more…

mongoDB Features
Document-oriented storage
Full Index Support
Replication & High Availability
Auto-Sharding
Querying
Fast In-Place Updates ?
Map/Reduce
GridFS
Commercial Support
From

sum(checkout)
From Gabriele Lana, CouchDB Vs MongoDB
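The code on the original slide is not in the transcript. As a stand-in, a sum over a `checkout` field can be phrased in the map/reduce style that both CouchDB and MongoDB offer. This is a plain-Python sketch of the pattern, not either database's API; the documents and the `user` and `checkout` field names are assumptions.

```python
from collections import defaultdict

# Hypothetical documents; the "user" and "checkout" fields are assumptions.
orders = [
    {"user": "ann", "checkout": 10},
    {"user": "bob", "checkout": 5},
    {"user": "ann", "checkout": 7},
]


def map_fn(doc):
    """Emit one (key, value) pair per document."""
    yield doc["user"], doc["checkout"]


def reduce_fn(key, values):
    """Combine all values emitted for one key."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    """Group the mapped pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


totals = map_reduce(orders, map_fn, reduce_fn)
```

Because the map step looks at one document at a time and the reduce step at one key at a time, both can run in parallel across shards.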

And mongoDB is fast
[Table: Indexed Queries, mongoDB vs mySQL; columns: avg, med, dev, total]
[Table: Non-Indexed Queries, mongoDB vs mySQL; columns: avg, med, dev, total]

Graph Database
Data Model Abstraction:
›Nodes
›Relations
›Properties
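The three abstractions can be sketched with ordinary Python structures. This is an in-memory illustration of the property-graph idea, not Neo4j's API: the node data, the `KNOWS` relation type, and the `neighbours` and `traverse` helpers are all invented for the example.

```python
from collections import deque

# Nodes carry properties (a dict per node id).
nodes = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carol"}}

# Relations are (start, type, end, properties); they carry properties too.
relationships = [
    (1, "KNOWS", 2, {"since": 2010}),
    (2, "KNOWS", 3, {"since": 2011}),
]


def neighbours(node_id, rel_type):
    """Ids of nodes reachable from node_id over one outgoing relation of rel_type."""
    return [end for start, kind, end, _ in relationships
            if start == node_id and kind == rel_type]


def traverse(start, rel_type, max_depth):
    """Breadth-first traversal; returns names of nodes within max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node_id, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in neighbours(node_id, rel_type):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nodes[nxt]["name"])
                queue.append((nxt, depth + 1))
    return found


friends = traverse(1, "KNOWS", 1)
friends_of_friends = traverse(1, "KNOWS", 2)
```

A question such as "friends of friends" is a short walk along relations here, whereas a relational schema would express it as repeated self-joins.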

Neo4j - Build a Graph
Slide from Neo Technology, "A NoSQL Overview and the Benefits of Graph Databases"

Neo4j – Traverse a Graph
Slide from Neo Technology, "A NoSQL Overview and the Benefits of Graph Databases"

A Debatable Performance Evaluation
Comparing Apples to Oranges

Conclusion
Use the right data model for the right problem