The Three E’s of Big Data and What DB People can do About Them UC BERKELEY Michael Franklin – UC Berkeley Beckman Database Get Together October 14, 2013.

Presentation transcript:

The Three E’s of Big Data and What DB People can do About Them UC BERKELEY Michael Franklin – UC Berkeley Beckman Database Get Together October 14, 2013

The Big Data Problem - Nutshelled: Massive, Diverse, and Growing Data. Something's gotta give: Time, Quality, Money.

The 3 E’s of Big Data:

Extreme Elasticity - Machines: Option #1 – Build your own Cluster/WSC; Option #2 – Rent Machines from AWS; Option #3 – Try your luck on the Spot Market. [Figure labels: "(US East – Saturday Sept", "x Servers needed", "46K Servers (2010 estimate)"]

Extreme Elasticity - Algorithms Agarwal et al., BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data. ACM EuroSys 2013.
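The idea named on this slide, trading answer accuracy for response time, can be illustrated with a minimal sketch. This is not BlinkDB's implementation; it only assumes a uniform random sample and a normal-approximation error bar, and the function name `approximate_mean` and all parameters are hypothetical.

```python
import math
import random


def approximate_mean(values, sample_size, z=1.96, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, margin), where `margin` is the half-width of an
    approximate confidence interval (z=1.96 is roughly 95% under a
    normal approximation). Hypothetical helper for illustration only.
    """
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    rng = random.Random(seed)
    # Sample without replacement from the full data set.
    sample = rng.sample(values, sample_size)
    mean = sum(sample) / sample_size
    # Unbiased sample variance.
    variance = sum((x - mean) ** 2 for x in sample) / (sample_size - 1)
    # Standard error of the mean, scaled by the z-score.
    margin = z * math.sqrt(variance / sample_size)
    return mean, margin


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data, generated only for this demonstration.
    data = [rng.gauss(100.0, 15.0) for _ in range(200_000)]
    for n in (100, 10_000):
        estimate, margin = approximate_mean(data, n)
        print(f"n={n}: mean is about {estimate:.2f} +/- {margin:.2f}")
```

A larger sample costs more time but narrows the error bar, which is the time versus quality trade-off in miniature.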

Extreme Elasticity - People: Incentives; Fatigue, Fraud, & other Failure Modes; Latency & Prediction; Work Conditions; Interface; Answer Quality; Task Structuring; Task Routing

Extreme Elasticity. Algorithms: Approximate Answers; ML Libraries and Ensemble Methods; Active Learning. Machines: Cloud Computing – esp. Spot Instances; Multi-tenancy; Relaxed (eventual) consistency / Multi-version methods. People: Dynamic Task and Microtask Marketplaces; Visual analytics; Manipulative interfaces and mixed mode operation.

The Challenge

The Good News: We already know how to do this (kinda)! SQL → Result; MQL → Model. ✦ End Users tell the system what they want, not how to get it

MLbase: Progress. Query Planner / Optimizer; Runtime; ML Developer API; ML Library; MQL Parser (Contracts). Released July 2013; initial release: Spring 2014

For More Information: amplab.cs.berkeley.edu UC BERKELEY