CMU SCS Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos – A. Pavlo How to Scale a Database System.

Slides:



Advertisements
Similar presentations
From Startup to Enterprise A Story of MySQL Evolution Vidur Apparao, CTO Stephen OSullivan, Manager of Data and Grid Technologies April 2009.
Advertisements

Andy Pavlo April 13, 2015April 13, 2015April 13, 2015 NewS QL.
Project presentation by Mário Almeida Implementation of Distributed Systems KTH 1.
The NewSQL database you’ll never outgrow Taming the Big Data Fire Hose John Hugg Sr. Software Engineer, VoltDB.
What Should the Design of Cloud- Based (Transactional) Database Systems Look Like? Daniel Abadi Yale University March 17 th, 2011.
Presentation by Krishna
NoSQL and NewSQL Justin DeBrabant CIS Advanced Systems - Fall 2013.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#28: Modern Database Systems.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#25: OldSQL vs. NoSQL vs. NewSQL.
Massively Parallel Cloud Data Storage Systems S. Sudarshan IIT Bombay.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#23: Distributed Database Systems (R&G.
How WebMD Maintains Operational Flexibility with NoSQL Rajeev Borborah, Sr. Director, Engineering Matt Wilson – Director, Production Engineering – Consumer.
22-Aug-15 | 1 |1 | Help! I need more servers! What do I do? Scaling a PHP application.
Distributed Data Stores and No SQL Databases S. Sudarshan IIT Bombay.
Distributed Data Stores – Facebook Presented by Ben Gooding University of Arkansas – April 21, 2015.
Highly Available ACID Memory Vijayshankar Raman. Introduction §Why ACID memory? l non-database apps: want updates to critical data to be atomic and persistent.
Databases with Scalable capabilities Presented by Mike Trischetta.
Cloud Computing for the Enterprise November 18th, This work is licensed under a Creative Commons.
:: Conférence :: NoSQL / Scalabilite Etat de l’art Samuel BERTHE10 Mars 2014Epitech Nantes.
Distributed Data Stores and No SQL Databases S. Sudarshan Perry Hoekstra (Perficient) with slides pinched from various sources such as Perry Hoekstra (Perficient)
Goodbye rows and tables, hello documents and collections.
HadoopDB project An Architetural hybrid of MapReduce and DBMS Technologies for Analytical Workloads Anssi Salohalla.
HadoopDB Presenters: Serva rashidyan Somaie shahrokhi Aida parbale Spring 2012 azad university of sanandaj 1.
Modern Databases NoSQL and NewSQL Willem Visser RW334.
1 Moshe Shadmon ScaleDB Scaling MySQL in the Cloud.
1 Yasin N. Silva Arizona State University This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Changwon Nati Univ. ISIE 2001 CSCI5708 NoSQL looks to become the database of the Internet By Lawrence Latif Wed Dec Nhu Nguyen and Phai Hoang CSCI.
IMDGs An essential part of your architecture. About me
Alireza Angabini Advanced DB class Dr. M.Rahgozar Fall 88.
Mainframe (Host) - Communications - User Interface - Business Logic - DBMS - Operating System - Storage (DB Files) Terminal (Display/Keyboard) Terminal.
Usenix Annual Conference, Freenix track – June 2004 – 1 : Flexible Database Clustering Middleware Emmanuel Cecchet – INRIA Julie Marguerite.
SLIDE 1IS 257 – Fall 2014 NewSQL and VoltDB University of California, Berkeley School of Information IS 257: Database Management.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Distributed DB.
Databases Illuminated
Authors Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana.
Lecture 8: Databases and Data Infrastructure CS 6071 Big Data Engineering, Architecture, and Security Fall 2015, Dr. Rozier.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#18: Physical Database Design.
Topic Distributed DBMS Database Management Systems Fall 2012 Presented by: Osama Ben Omran.
NoSQL Or Peles. What is NoSQL A collection of various technologies meant to work around RDBMS limitations (mostly performance) Not much of a definition...
Mapping the Data Warehouse to a Multiprocessor Architecture
NOSQL DATABASE Not Only SQL DATABASE
MySQL Overview Jed Reynolds Write Your Questions on the Board! Landscape, Engines, HA, Performance Questions.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#24: Distributed Database Systems (R&G.
Data and Information Systems Laboratory University of Illinois Urbana-Champaign Data Mining Meeting Mar, From SQL to NoSQL Xiao Yu Mar 2012.
Cluster computing. 1.What is cluster computing? 2.Need of cluster computing. 3.Architecture 4.Applications of cluster computing 5.Advantages of cluster.
An Introduction to Super-Scalability But first…
CMPE 226 Database Systems May 3 Class Meeting Department of Computer Engineering San Jose State University Spring 2016 Instructor: Ron Mak
BIG DATA/ Hadoop Interview Questions.
Cofax Scalability Document Version Scaling Cofax in General The scalability of Cofax is directly related to the system software, hardware and network.
Abstract MarkLogic Database – Only Enterprise NoSQL DB Aashi Rastogi, Sanket V. Patel Department of Computer Science University of Bridgeport, Bridgeport,
CS 540 Database Management Systems NoSQL & NewSQL Some slides due to Magda Balazinska 1.
Intro to NoSQL Databases Tony Hannan November 2011.
CSCI5570 Large Scale Data Processing Systems
NO SQL for SQL DBA Dilip Nayak & Dan Hess.
Cloud Computing and Architecuture
CS122B: Projects in Databases and Web Applications Winter 2017
Faloutsos/Pavlo C. Faloutsos – A. Pavlo Lecture#1: Introduction
Modern Databases NoSQL and NewSQL
NOSQL.
Introduction to NewSQL
Mapping the Data Warehouse to a Multiprocessor Architecture
Massively Parallel Cloud Data Storage Systems
April 30th – Scheduling / parallel
NoSQL Databases An Overview
CS 440 Database Management Systems
Taming the Big Data Fire Hose
April 13th – Semi-structured data
Transaction Properties: ACID vs. BASE
Database System Architectures
NoSQL databases An introduction and comparison between Mongodb and Mysql document store.
Presentation transcript:

CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo How to Scale a Database System

CMU SCS hag·i·og·r a·phy (noun )

CMU SCS ChristosTheGreekGodofDatabases.com Pinterest meets Causal Encounters meets Kickstarter meets Twitter –With Christos!

CMU SCS More reads than writes. All media stored outside of DBMS. How do we choose the right database architecture? ChristosTheGreekGodofDatabases.com Faloutsos/PavloCMU SCS /6154

CMU SCS Outline Single-Node Databases NoSQL Systems NewSQL Systems Faloutsos/PavloCMU SCS /6155

CMU SCS Late-1990s / Early-2000s All the big players were heavyweight and expensive. –Oracle, DB2, Sybase, SQL Server, Informix. Open-source databases were missing important features. –Postgres, mSQL, MySQL. Faloutsos/PavloCMU SCS /6156

CMU SCS Mid-2000s MySQL + InnoDB is widely adopted by new web companies: –Supported transactions, replication, recovery. –Memcache for caching queries. Faloutsos/PavloCMU SCS /6157

CMU SCS Let’s go with MySQL. We’re getting a lot of traffic. Our database server is saturated! How do we increase the capacity of our database server? ChristosTheGreekGodofDatabases.com Faloutsos/PavloCMU SCS /6158

CMU SCS Buy a faster machine. Idea #1:

CMU SCS Scaling Up Application ServerDatabase Server (+) Requires no change to application. (+) Improvements are immediate. (-) Expensive! Diminishing Returns. (-) Single Point of Failure.  More disks.  More RAM.  Faster CPUs.  Use SSDs. Faloutsos/PavloCMU SCS /61510

CMU SCS Replicate database on multiple servers. Idea #2:

CMU SCS Replication Application ServerDatabase Server (+) Requires no change to application. (+) Parallelize read operations. (+) Improved fault tolerance. (-) Expensive! Diminishing Returns. (-) Writes limited to slowest node. Replicas Faloutsos/PavloCMU SCS /61512 Read Request

CMU SCS Cache query results. Idea #3:

CMU SCS Query Cache Application ServerDatabase Server (+) Reduce load on DBMS. (+) Fast API. (-) Extra roundtrip per query. (-) Requires application changes. (-) Doesn’t help write-heavy apps. Replicas Faloutsos/PavloCMU SCS /61514 Check Cache Query Request Update Cache memcache

CMU SCS Push SQL into stored procedures. Idea #4:

CMU SCS Stored Procedures Application Code Database Server (+) Reduces network roundtrips. (+) Less lock contention. (+) Modularization. (-) Application logic in two places. (-) PL/SQL is not standardized. Replicas def getPage(request): # Process request EXEC SQL # Process results if x == True: EXEC SQL else: EXEC SQL # Render HTML page return (html) def getPage(request): # Process request EXEC SQL # Process results if x == True: EXEC SQL else: EXEC SQL # Render HTML page return (html) BEGIN: EXEC SQL if x == True: EXEC SQL else: EXEC SQL return (results) END; BEGIN: EXEC SQL if x == True: EXEC SQL else: EXEC SQL return (results) END; def getPage(request): # Process request EXEC PROCEDURE # Render HTML page return (html) def getPage(request): # Process request EXEC PROCEDURE # Render HTML page return (html) Stored Procedure Faloutsos/PavloCMU SCS /61516

CMU SCS Shard database across multiple servers. Idea #5:

CMU SCS Sharding / Partitioning Application ServerDatabase Cluster (+) Parallelize all operations. (+) Much easier to add more hardware. (-) Most DBMSs don’t support this. (-) Joins are expensive. (-) Non-trivial to split database. Logical Partitions Faloutsos/PavloCMU SCS /61518

CMU SCS We want to scale out but writing a sharding layer is hard. Some parts of our application don’t need a full-featured DBMS. ChristosTheGreekGodofDatabases.com Faloutsos/PavloCMU SCS /615

CMU SCS Give up ACID guarantees for scalability. Idea #6:

CMU SCS Application Servers DBMS Servers Eventual Consistency Faloutsos/PavloCMU SCS /61521 Master Replicas Update Profile Get Profile ? ?

CMU SCS Late-2000s (NoSQL) NoSQL systems are able to scale horizontally right out of the box by giving traditional database features. Faloutsos/PavloCMU SCS /61522

CMU SCS We need to process payments. We don’t want to lose orders. We need joins and ACID transactions. ChristosTheGreekGodofDatabases.com

CMU SCS Strong Consistency Send Money Nice Christos Pictures! Thanks! -$100 +$100 Faloutsos/PavloCMU SCS /61524 Use Two- Phase Commit

CMU SCS Keep guarantees, optimize for workload type. Idea #7:

CMU SCS Early-2010s (NewSQL) New DBMSs that can scale across multiple machines natively and provide ACID guarantees.

CMU SCS Conclusion RDBMS (Single-Node): –MySQL, Postgres NoSQL (Multi-Node): –Key-Value, Documents, Graphs NewSQL (Multi-Node): –Transaction Processing, MySQL Sharding Faloutsos/PavloCMU SCS /61527

CMU SCS What DBMS should my start-up use?

CMU SCS Beyond the /615 Christos is teaching this fall: –Multimedia Databases and Data Mining Send me an if you’re interested in working on a database research project.