The NoSQL movement or the dawn of the post-relational age.

What is the buzz? Job trends, search trends, Twitter search.

Something for your CV

NoSQL
"Not only SQL", or "No SQL" (no SQL support).
Support for the full SQL language imposes constraints on datastores. So does ACID compliance. So does the need for a fixed database schema. Many applications need more specialised datastores.
NoSQL is a movement for choice in database architecture.
Resources: the CouchBase survey; Mike Loukides at O'Reilly (an excellent overview); Polyglot Persistence by Martin Fowler; the Wikipedia comparison; nosql-databases.org (a rather terrifying set of resources); Tim Anglade's compilation of interviews.

NoSQL is not new
Despite the widespread adoption of the relational data model for business applications, there has always been a wide variety of specialised databases:
Geographic Information Systems, for complex spatial relationships: ArcGIS, e.g. BCC KnowYourPlace.
OLAP (OnLine Analytic Processing), for analysis of transaction data.
Free-text databases, e.g. LexisNexis for legal documents.
Multi-dimensional sparse arrays: Pick and MUMPS.
Object-oriented databases, e.g. Zope for the Plone CMS.
These databases were directed at the need for complex and flexible data structures.

Forces for change
Volume of data: Facebook has over 30 petabytes (30,000 terabytes, or 30 million gigabytes).
Volume of transactions: of the order of 1 million writes/sec.
Changeability/flexibility of schema: constant beta.
Complexity of data: for example, UK legislation.

Use case: Terabytes of data need to be stored reliably with no schema requirements
Reliability is a big problem when volumes are large. In a farm of, say, 1000 servers, each with 8 spindles, there is a high probability that at least one disk will be down at any time.
Random-access update is too slow: append new data and merge in batch.
Platforms: BigTable from Google; HBase from Apache; Dynamo from Amazon.
See also: Doug Cutting on Apache's Hadoop.
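The "high probability" claim for the 1000-server farm can be checked with a little arithmetic. The sketch below assumes that disks fail independently and that each disk is unavailable 0.1% of the time; that figure is an illustrative assumption, not a number from the slides.

```python
def prob_at_least_one_down(num_disks: int, p_down: float) -> float:
    """Probability that at least one of num_disks independent disks is down."""
    return 1.0 - (1.0 - p_down) ** num_disks


servers = 1000
spindles_per_server = 8
num_disks = servers * spindles_per_server  # 8000 disks in the whole farm

# ASSUMPTION (not from the slides): each disk is down 0.1% of the time,
# independently of every other disk.
p_down = 0.001

p_any = prob_at_least_one_down(num_disks, p_down)
expected_down = num_disks * p_down  # mean number of disks down at any moment

print(f"{num_disks} disks, P(at least one down) = {p_any:.4f}")
print(f"expected number of disks down = {expected_down:.1f}")
```

Under these assumptions a failed disk is the normal state of the farm rather than an exception, which is why such stores replicate data across servers.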

Use case: Batch data analysis
Very large transaction datasets need to be filtered and summarised, for example to analyse log files by IP location. In the past these could have been overnight jobs; now they need to be done in minutes at most.
Map-Reduce is an architecture for large-scale distributed computation. MapReduce should really be called MapMergeReduce. Each MapReduce task is written in Java (or in a high-level language like Pig). A framework such as Hadoop coordinates the distribution of the map, merge and reduce jobs and the dataflows between them.
The input is a database of key/value pairs which are split ('sharded') over many spindles on many servers.
The user's map operation runs on every server hosting the shards and transforms each key/value input into zero, one or more key/value outputs.
Merge (shuffle) merges all pairs for the same key and distributes them (e.g. by hashing the keys) to multiple Reduce servers. This too can be user-configurable.
The user's reduce takes each group of values for the same key and produces zero, one or more key/values for each group.
Successive MapMergeReduce operations can be chained together in a pipeline.
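The map, merge and reduce steps can be sketched in a single Python process. This is only an illustration of the dataflow, not of Hadoop's API: the log format, the function names and the choice to count hits per IP address are all assumptions made for the example, and a real framework would run each step on many servers.

```python
from collections import defaultdict
from itertools import chain


def map_log_line(key, line):
    """Map: emit (ip, 1) for a log line, or nothing for a blank line."""
    fields = line.split()
    if not fields:
        return []
    return [(fields[0], 1)]  # ASSUMPTION: the IP address is the first field


def merge(pairs):
    """Merge (shuffle): group together all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_count(key, values):
    """Reduce: collapse one group of values into a single (key, total) pair."""
    return [(key, sum(values))]


def map_merge_reduce(records, mapper, reducer):
    """Run map, merge and reduce in sequence over (key, value) records."""
    mapped = chain.from_iterable(mapper(k, v) for k, v in records)
    groups = merge(mapped)
    reduced = chain.from_iterable(reducer(k, vs) for k, vs in groups.items())
    return dict(reduced)


# Hypothetical input: (line number, log line) pairs.
log = [
    (0, "10.0.0.1 GET /index.html"),
    (1, "10.0.0.2 GET /about.html"),
    (2, "10.0.0.1 GET /contact.html"),
    (3, ""),
]

hits_per_ip = map_merge_reduce(log, map_log_line, reduce_count)
print(hits_per_ip)
```

Because the output is again a set of key/value pairs, it could be fed straight into another `map_merge_reduce` call, which is the chaining the slide describes.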

Use case: Document storage and retrieval
Document store
Complex hierarchical documents present problems for storage in a relational database. Every repeated part of the document would be stored in its own table (shredding); each repeated part would need to be linked to its parent with a key; and reconstructing the document would require multiple joins over data distributed all over the file system.
Platforms: eXist, an open source XML store, queried with XQuery; MarkLogic, a commercial XML store; CouchDB, a JSON store, queried with JavaScript; MongoDB, a JSON store (example: telemetric data processing).
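To make the shredding problem concrete, here is a minimal sketch using plain Python dictionaries in place of real tables. The order document, the two "tables" and the field names are hypothetical, chosen only to show the parent key and the join needed to rebuild the document.

```python
# A hypothetical hierarchical document with one repeated part ("lines").
order = {
    "id": 1,
    "customer": "Alice",
    "lines": [
        {"product": "widget", "qty": 2},
        {"product": "gadget", "qty": 1},
    ],
}


def shred(doc):
    """Split one document into flat rows for two hypothetical tables."""
    order_row = {"id": doc["id"], "customer": doc["customer"]}
    # Each repeated part gets a key back to its parent and a sequence number.
    line_rows = [
        {"order_id": doc["id"], "seq": i, **line}
        for i, line in enumerate(doc["lines"])
    ]
    return order_row, line_rows


def reconstruct(order_row, all_line_rows):
    """Rebuild the document: the equivalent of a join on order_id."""
    lines = sorted(
        (row for row in all_line_rows if row["order_id"] == order_row["id"]),
        key=lambda row: row["seq"],
    )
    return {
        **order_row,
        "lines": [{"product": r["product"], "qty": r["qty"]} for r in lines],
    }


order_row, line_rows = shred(order)
rebuilt = reconstruct(order_row, line_rows)
print(rebuilt == order)
```

With one repeated part this needs one join; every further level of nesting adds another table and another join, whereas a document store keeps `order` in one piece.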

Use case: Fast put/get of keyed data
Key-value store
Where complex data is to be stored but the database is not interested in its internal structure, for example session data, user profiles or shopping carts.
The only operations are value = store.get(key), store.put(key, value) and store.delete(key).
Platforms: Project Voldemort, Rhino
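The three operations map directly onto a dictionary. The class below is an in-memory sketch of that interface only; it is not the API of Project Voldemort or any other product, and the session key and payload are made up for the example.

```python
class KeyValueStore:
    """Minimal in-memory key-value store; values are treated as opaque."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        """Return the stored value, or None if the key is absent."""
        return self._data.get(key)

    def put(self, key, value):
        """Store or overwrite the value for key."""
        self._data[key] = value

    def delete(self, key):
        """Remove key if present; deleting a missing key is not an error."""
        self._data.pop(key, None)


store = KeyValueStore()
# The store never looks inside the value: here it is just serialised bytes.
store.put("session:42", b'{"user": "alice", "cart": ["widget"]}')
value = store.get("session:42")
store.delete("session:42")
missing = store.get("session:42")
print(value, missing)
```

Since the store cannot see inside the value, any query other than lookup by key has to be done by the application.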

Use case: Page caching
Key-value cache
Where the generation of a page takes a significant time, it is better to cache the pages as key/value pairs, where the key is a URI and the value is the HTML page. As much of the cache as possible is kept in RAM for rapid access.
Issues: cache flushing.
Example: AidView, a site that views summarised data from an eXist document store.
Platforms: Memcached
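A page cache keyed by URI can be sketched as follows. The least-recently-used eviction policy, the class and method names and the stand-in render function are assumptions for illustration; the slides do not say which policy to use, and this is not Memcached's interface.

```python
from collections import OrderedDict


class PageCache:
    """In-memory page cache: key is a URI, value is the rendered HTML."""

    def __init__(self, max_pages, render):
        self._pages = OrderedDict()  # ordered from least to most recently used
        self._max_pages = max_pages
        self._render = render        # slow function that builds a page

    def get(self, uri):
        """Return the cached page, rendering and caching it on a miss."""
        if uri in self._pages:
            self._pages.move_to_end(uri)  # mark as most recently used
            return self._pages[uri]
        html = self._render(uri)
        self._pages[uri] = html
        if len(self._pages) > self._max_pages:
            self._pages.popitem(last=False)  # evict the least recently used
        return html

    def flush(self, uri):
        """Drop a page, e.g. when its underlying data has changed."""
        self._pages.pop(uri, None)


render_calls = []


def slow_render(uri):
    """Stand-in for an expensive page build; records each call."""
    render_calls.append(uri)
    return f"<html><body>Page for {uri}</body></html>"


cache = PageCache(max_pages=2, render=slow_render)
cache.get("/a")
cache.get("/a")  # served from the cache, no second render
cache.get("/b")
cache.get("/c")  # cache is full, so /a is evicted
cache.get("/a")  # has to be rendered again
print(render_calls)
```

The `flush` method is where the "cache flushing" issue lives: deciding when a cached page no longer reflects the underlying data is the hard part.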

Use case: Linked data
Graph database
Where data is composed of simple, highly interrelated facts. For example, there is an RDF version of Wikipedia called DBpedia.
Some use existing databases such as MySQL, but the specific form of the data and of the queries on it suggests native stores:
Triple (usually quad) stores to support RDF: Jena, Sesame, Virtuoso; queried with SPARQL. RDF has a rigid data model, [graph] subject-predicate-object, and is widely used for linked data.
Custom graph stores: Neo4J, with non-standard interfaces.
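The subject-predicate-object model is simple enough to sketch in a few lines. The class below is a toy in-memory triple store with wildcard matching, loosely analogous to a single SPARQL triple pattern; it is not the API of Jena, Sesame or Virtuoso, and the facts are invented for the example.

```python
class TripleStore:
    """Toy store of (subject, predicate, object) facts."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        """Record one fact; adding the same fact twice has no effect."""
        self._triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return all triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
store.add("alice", "knows", "bob")
store.add("bob", "knows", "carol")
store.add("alice", "worksFor", "acme")

alice_facts = store.match(s="alice")  # everything said about alice
who_knows = store.match(p="knows")    # every "knows" relationship
print(alice_facts)
print(who_knows)
```

A quad store would add a fourth element naming the graph each fact belongs to, which is the optional [graph] part of the model.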

XML/XQuery for graphs
Tutorial for using Neo4j to compute relationships in a graph.
Friends relationship; some friends as XML; a bit of XQuery; the knows relationship expanded.
Permissions; people; roles; a bit of XQuery; people and permissions.
Shortest path is difficult: Dijkstra's algorithm is tricky to implement in functional languages.
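For reference, this is roughly what Dijkstra's algorithm looks like in an imperative language. The sketch is in Python rather than XQuery, and the friends graph with its edge weights is hypothetical. It leans on a mutable priority queue and a mutable visited set, which may be part of why the algorithm is awkward to express in a functional language.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; returns (cost, path), or None if unreachable."""
    queue = [(0, start, [start])]  # priority queue ordered by cost so far
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue  # already settled via a cheaper route
        visited.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(
                    queue, (cost + weight, neighbour, path + [neighbour])
                )
    return None


# Hypothetical directed graph: friends[a][b] is the cost of the edge a -> b.
friends = {
    "alice": {"bob": 1, "carol": 4},
    "bob": {"carol": 1, "dave": 5},
    "carol": {"dave": 1},
    "dave": {},
}

result = shortest_path(friends, "alice", "dave")
print(result)
```

In an unweighted friends graph every edge costs the same, so a plain breadth-first search would find the shortest path as well.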

Dan McCreary's overview: The CIO's Guide to NoSQL

Risks
Lack of standardisation.
New technology.
Design cul-de-sac when requirements change.
Lack of available developer skills.
RDBMSs like Oracle and SQL Server are changing too, but just get more complex.
A dissenting view (warning: NSFW).