RetroDB (We have seen it all) Donald Kossmann, Systems Group, ETH Zurich.

Presentation transcript:

RetroDB (We have seen it all) Donald Kossmann, Systems Group, ETH Zurich

We got it all right… why is nobody listening? 

Why is nobody listening?
Web (e.g. Amazon, Facebook, Google)
– reinventing the wheel is cooler than listening
– do not worry about them
Enterprise (e.g., Amadeus, Credit Suisse, …)
– they do listen
– but, new problem: No more silos! (aka Big Data)
– RDBMS not a good match for that new problem
– we need to repackage!
(I do not know about Scientific applications)

Repackaging DB Technology Blob store as a service (HDFS++)

Repackaging DB Technology Blob store as a service (HDFS++) OLTP

Repackaging DB Technology Blob store as a service (HDFS++) OLTP OLAP Streaming

Repackaging DB Technology HDFS OLTP OLAP Streaming Graph Search … ML

Repackaging DB Technology
Data in Blob Store, Processing in Compute Nodes
Great advantages
– scales storage and processing individually
– no need to worry about “multi-tenancy” & silos
– fault-tolerance for free
– commodity building blocks (KVS, 2PC, SI, SQL, …)
– it is cool because Google does it
Great disadvantages
– poor data locality (data shipping)
– poor semantics (sharing increases noise)
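
The "data shipping" disadvantage on this slide can be illustrated with a small sketch. It is not from the talk: `BlobStore`, `compute_node_query`, and the `rows_shipped` counter are hypothetical names, and an in-memory dict stands in for a real blob service. The point it tries to show is that when the storage tier only understands get/put, every row crosses the tier boundary even if the query keeps only a few of them.

```python
from dataclasses import dataclass, field


@dataclass
class BlobStore:
    """Hypothetical storage tier: holds opaque blobs, knows nothing about queries."""
    blobs: dict = field(default_factory=dict)
    rows_shipped: int = 0  # rough proxy for network traffic between tiers

    def put(self, key, rows):
        self.blobs[key] = list(rows)

    def get(self, key):
        rows = self.blobs[key]
        self.rows_shipped += len(rows)  # the whole blob leaves the storage tier
        return list(rows)


def compute_node_query(store, key, predicate):
    """Hypothetical compute node: fetch the full blob, then filter locally."""
    rows = store.get(key)
    return [row for row in rows if predicate(row)]


if __name__ == "__main__":
    store = BlobStore()
    store.put("orders", [{"id": i, "amount": i * 10} for i in range(1000)])
    matches = compute_node_query(store, "orders", lambda r: r["amount"] > 9900)
    print(len(matches), "rows kept,", store.rows_shipped, "rows shipped")
```

Storage and compute scale independently in this layout, which is the advantage the slide lists; the cost is that selectivity does not reduce traffic unless the storage tier can evaluate part of the query itself.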

What we need to do!
Optimize Shared Memory DBMS
– split work between tiers: e.g., push down scans
– shared scans in storage tier
– new ways to implement ACID in client/server system
– (many more optimizations)
Get semantics right
– it is one big soup of data
– but everybody wants to look at it in different ways
And build a really good HDFS++
– across the storage hierarchy (DRAM, SSD, NVRAM, disk)
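
The two scan-related bullets above can be sketched as follows. This is an illustration under stated assumptions, not the design from the talk: `StorageTier`, `scan`, `shared_scan`, and the two counters are hypothetical names, and a Python list stands in for stored data. A pushed-down scan evaluates the predicate inside the storage tier so that only matching rows are shipped; a shared scan reads the data once and serves several queries in the same pass.

```python
class StorageTier:
    """Hypothetical storage tier that can evaluate predicates itself."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.rows_read = 0     # rows touched inside the storage tier
        self.rows_shipped = 0  # rows sent to compute nodes

    def scan(self, predicate):
        """Pushed-down scan: filter in the storage tier, ship only matches."""
        matches = []
        for row in self.rows:
            self.rows_read += 1
            if predicate(row):
                matches.append(row)
        self.rows_shipped += len(matches)
        return matches

    def shared_scan(self, predicates):
        """Shared scan: one pass over the data answers every registered query.

        `predicates` maps a query name to its filter function.
        """
        results = {name: [] for name in predicates}
        for row in self.rows:
            self.rows_read += 1  # each row is read once, however many queries
            for name, predicate in predicates.items():
                if predicate(row):
                    results[name].append(row)
        self.rows_shipped += sum(len(rows) for rows in results.values())
        return results


if __name__ == "__main__":
    tier = StorageTier({"id": i, "amount": i} for i in range(100))
    answers = tier.shared_scan({
        "big": lambda r: r["amount"] >= 90,
        "even": lambda r: r["id"] % 2 == 0,
    })
    print({name: len(rows) for name, rows in answers.items()}, tier.rows_read)
```

With separate scans, the number of rows read grows with the number of concurrent queries; with the shared scan it stays at one pass, at the price of evaluating every predicate on every row.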

What we need NOT do!
300 gazillion TPS in a single box
– great, but who needs that?
– what to do with the data once it is in there?
Think about caching
– if you have locality, make it explicit
Worry about eventual consistency, NoSQL, …
or dismiss anything else we have done!