Lecture 8: Databases and Data Infrastructure
CS 6071 Big Data Engineering, Architecture, and Security
Fall 2015, Dr. Rozier

The “One Size Fits All” Database
Relational model dominant for decades
Tons of databases, all slight variations of each other
– PostgreSQL
– MySQL
– Oracle
– SQL Server
– DB2

Possible Issues
SQL is full-featured
– is that always necessary?
Do traditional DBMSs scale?
– horizontal vs. vertical scaling
– parallel DBMSs
ACID guarantees can be expensive
– are they always necessary?

Scalability
What is horizontal vs. vertical scalability?

Scalability
What would vertical scaling mean? Advantages? Disadvantages?

Scalability
What would horizontal scaling mean? Advantages? Disadvantages?

ACID Guarantees
A – Atomicity
C – Consistency
I – Isolation
D – Durability
Guarantees that DB operations are processed reliably.

Atomicity
A transaction must be “all or nothing”.
– If part of the transaction fails, the entire transaction fails and no state is changed.
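As a sketch of atomicity, the snippet below uses Python's built-in `sqlite3` module, with SQLite standing in for any ACID DBMS; the `log` table is hypothetical. The second insert violates a `NOT NULL` constraint, so the whole transaction, including the first insert, is rolled back.

```python
import sqlite3

def demo_atomicity() -> list:
    """Run a two-statement transaction whose second statement fails,
    then return the rows that survived (expected: none)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, msg TEXT NOT NULL)")
    conn.commit()
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute("INSERT INTO log (msg) VALUES ('step 1')")
            # Violates NOT NULL, so the whole transaction is rolled back.
            conn.execute("INSERT INTO log (msg) VALUES (NULL)")
    except sqlite3.IntegrityError:
        pass
    rows = conn.execute("SELECT msg FROM log").fetchall()
    conn.close()
    return rows

print(demo_atomicity())  # []
```

The empty result shows that the successful first insert did not leak out of the failed transaction.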

Consistency
Any transaction must bring the DB from one valid state to another valid state.
– Programming errors cannot result in the violation of any defined rules (constraints, cascades, triggers, etc.).

Isolation
Concurrent execution of transactions results in the same system state that would be obtained if the transactions were executed serially.
– Determines how and when changes made by one operation become visible to other operations.

Durability
Once a transaction has been committed, its effects persist, even in the event of power loss, crashes, or errors.
– Writing committed data to non-volatile storage is critical.
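A minimal sketch of how a system can make a commit durable: append the record to a log file and call `os.fsync` before reporting success, then rebuild state after a restart by replaying the log. The file name `commit.log` and the record format are hypothetical. A real DBMS write-ahead log also needs checksums, log truncation, and an `fsync` of the containing directory when the file is first created, and some storage devices may still cache writes despite `fsync`.

```python
import os
import tempfile

def durable_append(path: str, record: str) -> None:
    """Append one record and force it to non-volatile storage before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        f.flush()             # move Python's buffer into the OS page cache
        os.fsync(f.fileno())  # ask the OS to write the page cache to the device

def replay(path: str) -> list:
    """Recover committed records after a restart by re-reading the log."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

# Hypothetical log location in a fresh temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
durable_append(log_path, "COMMIT txn=1")
durable_append(log_path, "COMMIT txn=2")
print(replay(log_path))  # ['COMMIT txn=1', 'COMMIT txn=2']
```

Only after `durable_append` returns would the system acknowledge the commit to the client.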

Why might ACID be important?
Transactions in a DB
– Account A wants to transfer money to Account B. What is involved?
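The transfer question can be sketched with `sqlite3`; accounts `A` and `B` and their starting balances are hypothetical. The debit and credit run as one transaction, and a `CHECK (balance >= 0)` constraint plays the role of a consistency rule: an overdraft aborts the transaction and leaves both balances untouched.

```python
import sqlite3

def make_bank() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint encodes a consistency rule: no negative balances.
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Debit src and credit dst as one transaction; return True if committed."""
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

bank = make_bank()
print(transfer(bank, "A", "B", 30), balances(bank))   # True {'A': 70, 'B': 80}
print(transfer(bank, "A", "B", 500), balances(bank))  # False {'A': 70, 'B': 80}
```

If the two updates were not grouped into one transaction, a failure between them could debit A without ever crediting B.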

Why might ACID be unimportant?

NoSQL
Design points
– high availability
– horizontal scaling
No SQL
– usually just key-value stores (not always)
– great for web applications
Consistency
– many (not all) use an eventual consistency model
Classes
– Key-Value, Document, Column, Graph

Driving Force Behind NoSQL
The needs of the modern tech world.

NoSQL Advantages
Scales very well horizontally; easy to deploy on clusters of machines.
– Traditionally a problem for SQL databases.
– Better control over availability (or partial availability).
Data structures can be more flexible than SQL tables.
Popular for real-time applications and big data.

NoSQL Example: Key-Value
Key-Value Stores
– Dynamo
– Voldemort
– RAMCloud
– Riak
– Redis
– Oracle NoSQL Database (OnDB)
Key-Value Cache
– Memcached: fast, but not persistent

Key-Value DBs
An old idea…
– When was it first proposed? 1837, by Charles Babbage.

Key-Value Stores
Associative arrays, a.k.a. a hash or dictionary.
Store objects or records, which may have multiple fields, under a unique key.
In contrast to relational DBs, which have a well-defined schema, key-value stores treat values as opaque; each record may have different fields.
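A toy in-memory sketch of the interface a key-value store exposes: `put`, `get`, and `delete` on a unique key, with values the store never inspects. The `KVStore` class and the `user:1`-style key names are hypothetical; systems such as Redis or Riak add persistence, replication, and partitioning on top of this idea.

```python
from typing import Any, Dict, Optional

class KVStore:
    """Toy in-memory key-value store: each key maps to an opaque record."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

store = KVStore()
# No schema: records under different keys may have different fields.
store.put("user:1", {"name": "Ada", "email": "ada@example.com"})
store.put("user:2", {"name": "Charles", "projects": ["Analytical Engine"]})
store.put("session:xyz", "opaque-token-bytes")
print(store.get("user:2")["projects"])  # ['Analytical Engine']
store.delete("session:xyz")
print(store.get("session:xyz"))  # None
```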

BASE vs. ACID
NoSQL systems often use a different model than ACID:
B – Basically
A – Available
S – Soft state
E – Eventual consistency

Eventual Consistency

EC vs SEC
Eventual Consistency
– Liveness guarantee: updates will be observed eventually.
Strong Eventual Consistency
– Safety guarantee: any two nodes that have received the same (unordered) set of updates will be in the same state.
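Strong eventual consistency is often illustrated with conflict-free replicated data types (CRDTs); CRDTs are not named on the slide, so treat this as one swapped-in example. Below is a sketch of a grow-only counter (G-Counter): each replica increments only its own slot, and merging takes the element-wise maximum. Because that merge is commutative, associative, and idempotent, replicas that have seen the same set of updates reach the same state regardless of delivery order or duplication.

```python
from typing import Dict

class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        # Each replica only ever increases its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # Max is commutative, associative and idempotent, so the order and
        # repetition of merges do not affect the final state.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self) -> int:
        return sum(self.counts.values())

a, b, c = GCounter("a"), GCounter("b"), GCounter("c")
a.increment(2)
b.increment(3)
c.increment(1)

# Deliver the same updates to a and c, in different orders.
a.merge(c)
a.merge(b)
c.merge(b)
c.merge(a)
c.merge(b)  # duplicate delivery is harmless
print(a.value(), c.value())  # 6 6
```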

Conflict Resolution
Eventual consistency means conflicts are inevitable!

Conflict Resolution
Need to ensure replica convergence
– A system must be able to reconcile differences between multiple copies of a distributed DB.
– Exchange versions or updates between servers (anti-entropy).
– Choose an appropriate final state when concurrent updates have occurred (reconciliation).
Most common approach? Last writer wins.
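A sketch of last-writer-wins reconciliation combined with a simple anti-entropy exchange. The `Replica` class, the integer timestamps passed in by the caller, and breaking ties by replica name are all assumptions for illustration; real systems often use wall-clock or hybrid timestamps, and clock skew means the winning write is not always the one that happened last in real time.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# A version is (timestamp, replica name); the name breaks timestamp ties
# so that every replica picks the same winner.
Version = Tuple[int, str]

@dataclass
class Replica:
    name: str
    data: Dict[str, Tuple[Version, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        self.apply(key, (ts, self.name), value)

    def apply(self, key: str, version: Version, value: str) -> None:
        # Reconciliation: last writer wins, i.e. keep the higher version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

def anti_entropy(x: Replica, y: Replica) -> None:
    """Exchange all entries in both directions so the replicas converge."""
    for key, (ver, val) in list(x.data.items()):
        y.apply(key, ver, val)
    for key, (ver, val) in list(y.data.items()):
        x.apply(key, ver, val)

r1, r2 = Replica("r1"), Replica("r2")
r1.write("cart", "book", ts=10)    # concurrent updates to the same key
r2.write("cart", "laptop", ts=12)
anti_entropy(r1, r2)
print(r1.data["cart"][1], r2.data["cart"][1])  # laptop laptop
```

The losing write (`book`) is silently discarded; that data loss is the price of last-writer-wins.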

NoSQL Example: Document Stores
Documents contain semi-structured data
– e.g., a Students table: each student “document” would contain all the data for that student
– the fields stored can vary from document to document
Examples
– MongoDB, Couchbase

Document Stores
Central concept is a “document”
Documents
– Encapsulate and encode data in some standard format: XML, YAML, JSON, BSON, etc.
– Are addressed via a unique key
– Are distinguished from key-value stores by the existence of an API or query language that can access document contents
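A sketch of what separates a document store from a plain key-value store: the query below looks inside the documents rather than only fetching by key. The `students` collection, its contents, and the `find` helper are hypothetical; MongoDB and Couchbase expose far richer query languages. Documents in the same collection carry different fields, and a missing field simply costs nothing.

```python
import json
from typing import Any, Dict, List

# A toy "collection": documents addressed by a unique key (hypothetical data).
students: Dict[str, Dict[str, Any]] = {
    "s1": {"name": "Alice", "program": "PhD", "advisor": "Smith"},
    "s2": {"name": "Bob", "program": "MS"},  # no advisor field at all
    "s3": {"name": "Carol", "program": "PhD", "courses": ["CS 6071"]},
}

def find(collection: Dict[str, Dict[str, Any]], **criteria: Any) -> List[str]:
    """Return keys of documents whose fields match all criteria.

    Looking inside documents is what a plain key-value store, which treats
    values as opaque, cannot do."""
    return [
        key for key, doc in collection.items()
        if all(doc.get(name) == value for name, value in criteria.items())
    ]

print(find(students, program="PhD"))  # ['s1', 's3']
# Documents are typically stored in an encoding such as JSON.
print(json.dumps(students["s2"]))     # {"name": "Bob", "program": "MS"}
```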

Document Stores
How to organize documents?
– Collections
– Tags
– Non-visible metadata
– Directory hierarchies

Document Stores vs SQL
SQL – strongly typed
Document Stores – not strongly typed
Document stores are generally more flexible, map easily onto program objects, and handle optional values without a storage penalty.

Structure in a Document Store
Documents are, to some degree, self-describing.
(Example document, figure not preserved: Bob Q. Public, CEAS, EECS, PhD, CSE)
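The slide's example document did not survive extraction; only its values remain. As a hypothetical reconstruction, the JSON below attaches a label to each value. The field names `name`, `college`, `department`, `degree`, and `program` are guesses, and the original may well have used XML. Parsing it recovers the structure from the data itself, which is what self-describing means.

```python
import json

# Hypothetical reconstruction: the values come from the slide, but the field
# names are assumptions, since the original example was lost.
doc_text = """
{
  "student": {
    "name": "Bob Q. Public",
    "college": "CEAS",
    "department": "EECS",
    "degree": "PhD",
    "program": "CSE"
  }
}
"""

doc = json.loads(doc_text)
# Because the document is self-describing, a reader can discover its
# structure from the data alone, without an external schema.
for name, value in doc["student"].items():
    print(f"{name}: {value}")
```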

NoSQL Example: Column Stores
Data is organized by columns, rather than rows
Great for storing sparse datasets
Example
– HBase
  – modeled after Google BigTable
  – runs on HDFS (modeled after GFS)
  – can run Hadoop jobs that input/output HBase tables

Column Stores
Work very well for data warehousing, CRM systems, medical/clinical data, and other ad hoc inquiry systems.
Optimized for computing aggregates over large sets of similar items.

Column Stores
Row store
– Easy to add and modify records
– Queries may require access to unnecessary data
Column store
– Minimizes access to irrelevant data
– Record writes require multiple accesses
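A conceptual sketch of the two layouts using plain Python lists; real column stores work at the level of disk pages and compression, which lists do not model. The sales table is hypothetical. Summing one attribute touches only one column in the column layout, while appending a record must touch every column, mirroring the row-store and column-store trade-offs on this slide.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, float]

# Hypothetical sales table: (id, region, amount).
rows: List[Record] = [
    (1, "north", 120.0),
    (2, "south", 80.5),
    (3, "north", 42.25),
]

def total_row_store(table: List[Record]) -> float:
    # Row layout: every whole record is visited even though only "amount" is needed.
    return sum(record[2] for record in table)

# Column layout: each attribute stored contiguously.
columns: Dict[str, list] = {
    "id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

def total_column_store(cols: Dict[str, list]) -> float:
    # Only the "amount" column is touched.
    return sum(cols["amount"])

def append_column_store(cols: Dict[str, list], record: Record) -> None:
    # Writing one record touches every column: the price of fast scans.
    for name, value in zip(("id", "region", "amount"), record):
        cols[name].append(value)

print(total_row_store(rows), total_column_store(columns))  # 242.75 242.75
```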

Column Stores
Fundamental difference is in the layout of the storage.
In performance, seek time dominates CPU time.

NoSQL Example: Graph Databases
Graph-structured data can be very complex
– not a good fit for the relational model
Queries run on graph data are also unique
Example
– Neo4j
  – most popular by far
  – written in Java, with a Java API
  – fully transactional and consistent

Graph Databases
Nodes, properties, edges
– Nodes: entities to keep track of
– Edges: relationships between nodes
– Properties: information that relates to nodes or edges
Powerful for graph-like queries and associative data sets.
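A sketch of the node/edge/property model as an in-memory adjacency list, with a breadth-first traversal answering a multi-hop question such as friends-of-friends. The people, roles, and relationship names are hypothetical. In a relational schema each extra hop typically costs another self-join, whereas a graph database follows edges directly; Neo4j would express such a query in its Cypher language rather than in Python.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Nodes with properties (hypothetical data).
nodes: Dict[str, Dict[str, str]] = {
    "alice": {"role": "student"},
    "bob": {"role": "student"},
    "carol": {"role": "professor"},
    "dave": {"role": "student"},
}

# Directed edges; the relationship type is the edge's property.
edges: List[Tuple[str, str, str]] = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("carol", "ADVISES", "dave"),
]

adjacency: Dict[str, List[Tuple[str, str]]] = {n: [] for n in nodes}
for src, rel, dst in edges:
    adjacency[src].append((rel, dst))

def reachable(start: str, rel: str, max_hops: int) -> Set[str]:
    """Nodes reachable from start within max_hops edges of type rel (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    found: Set[str] = set()
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for r, nxt in adjacency[node]:
            if r == rel and nxt not in seen:
                seen.add(nxt)
                found.add(nxt)
                frontier.append((nxt, hops + 1))
    return found

print(sorted(reachable("alice", "KNOWS", 2)))  # ['bob', 'carol']
```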

NoSQL Today
Many systems are adding back SQL-like functionality – why?
– key-value queries are limited
Often referred to now as “Not Only SQL”
Tons of other examples; many of them have a free version

NewSQL
NoSQL focused on scalability and availability
Question: Can we do that and still maintain ACID?
– e.g., financial transactions
Goal is to scale out
Maintain SQL, but focus on online transaction processing (OLTP) workloads
– short-lived transactions that access small subsets of data
– in contrast to OLAP (i.e., analytical workloads)

Shared-Nothing Architectures
Nodes in a cluster don’t share resources
For databases, this means data is horizontally partitioned, or sharded, across the nodes in the cluster
How should we shard the data?
– …depends on the workload, among other things
Do shared-nothing architectures always increase performance?
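One common answer to the sharding question is hash partitioning, sketched below: hash the key and take it modulo the number of nodes. The `account:N` key format and the node count are hypothetical. SHA-256 is used because Python's built-in `hash()` is randomized per process for strings, so it would not give every client the same mapping.

```python
import hashlib
from typing import Dict, List

def shard_for(key: str, num_nodes: int) -> int:
    """Map a key to a node by hashing it (hash partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def partition(keys: List[str], num_nodes: int) -> Dict[int, List[str]]:
    """Group keys by the node that owns them."""
    shards: Dict[int, List[str]] = {i: [] for i in range(num_nodes)}
    for k in keys:
        shards[shard_for(k, num_nodes)].append(k)
    return shards

keys = [f"account:{i}" for i in range(1000)]
shards = partition(keys, num_nodes=4)
print({node: len(ks) for node, ks in shards.items()})
```

This spreads keys evenly for point lookups, but it also hints at why shared-nothing does not always help: a transaction touching keys on two different nodes needs cross-node coordination, and changing `num_nodes` remaps most keys (consistent hashing is a common remedy).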

Shared-Nothing Diagram

Conclusion
NoSQL
– move away from ACID properties
– come in several different forms
NewSQL
– designed specifically for OLTP workloads
– maintain ACID properties
– scale out using sharding/partitioning