Towards Adaptive Dataflow Infrastructure Joe Hellerstein, UC Berkeley.


Online Query Processing: The CONTROL Project (’96-’01)
- Data analysis on massive datasets takes forever
  - No feedback, 100% accuracy
- Challenge: make queries more like image delivery
  - But images are pre-encoded in progressive format
  - Query is ad hoc
- Solution: Online Aggregation
  - Continuous sampling w/o replacement
  - New pipelining query processing algorithms with good statistical properties (e.g. Ripple Joins) and user control (Online Reordering – “Juggle”)
  - Estimators and confidence intervals for aggregates
  - Streaming samples, streaming answers
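
The online aggregation idea on this slide (sample without replacement, report a running estimate with a confidence interval) can be illustrated with a minimal Python sketch. This is not the CONTROL implementation: the function name `online_avg`, the parameters `z` and `report_every`, and the normal-approximation interval with a finite-population correction are all assumptions chosen for illustration.

```python
import math
import random


def online_avg(values, z=1.96, report_every=1000, seed=0):
    """Yield (rows_seen, running_mean, ci_half_width) while scanning values.

    Hypothetical sketch: scanning a shuffled copy is equivalent to
    sampling without replacement, so each prefix is a random sample.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)
    total = len(order)
    n, mean, m2 = 0, 0.0, 0.0
    for x in order:
        # Welford's update keeps the running mean and sum of squared deviations.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % report_every == 0 or n == total:
            var = m2 / (n - 1) if n > 1 else 0.0
            # Finite-population correction: the interval shrinks to zero
            # once every row has been seen.
            fpc = (total - n) / (total - 1) if total > 1 else 0.0
            yield n, mean, z * math.sqrt(var / n * fpc)
```

A caller can iterate over `online_avg(grades)` and stop as soon as the half-width is small enough, which is the "streaming samples, streaming answers" behaviour the slide describes.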

Images Are Aggregates

Can do Online “Enumeration” Too: “Potter’s Wheel”

Volatility in Streaming Queries: Analogies for Sensors
- Query engines map queries to dataflows
  - Flow graph laid out by a query optimizer (typically on cluster)
  - Query executor runs the flow
- User priorities change during CONTROL queries
  - Breaks “compile-then-run” query optimization paradigm
  - Dynamic reordering of commutative tasks: f(g(x))? g(f(x))?
  - Dynamic reordering of data objects: x_1, x_2, x_3, …
  - Requires dynamic competition among choices: f(x) or f’(x)?
- Volatile networks are similar
  - Hard to predict rates of consumption/production a priori
  - Volatile over time, and queries may run “forever”
- Imagine interactive user “cockpit” on the sensor net!
  - Added metrics of power and data quality
  - And different kinds of volatility, no doubt
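
The "dynamic reordering of commutative tasks" point can be sketched as follows. This is a deliberately simplified illustration, not the Eddies algorithm: the class name `AdaptiveFilterChain` and the rule "try the filter with the lowest observed pass rate first" are assumptions, and operator cost and the real routing policy are ignored.

```python
class AdaptiveFilterChain:
    """Apply commutative boolean filters, reordering them per item.

    Hypothetical sketch: the order f(g(x)) vs. g(f(x)) is chosen at run
    time from observed pass rates instead of being fixed by an optimizer.
    """

    def __init__(self, predicates):
        self.predicates = list(predicates)
        self.seen = [0] * len(self.predicates)
        self.passed = [0] * len(self.predicates)

    def _pass_rate(self, i):
        # Smoothed estimate so that unseen filters start at 0.5.
        return (self.passed[i] + 1) / (self.seen[i] + 2)

    def process(self, item):
        # Most selective filter (lowest pass rate) goes first.
        order = sorted(range(len(self.predicates)), key=self._pass_rate)
        for i in order:
            self.seen[i] += 1
            if not self.predicates[i](item):
                return False
            self.passed[i] += 1
        return True
```

Because the filters commute, the set of items that pass is the same for any order; only the number of predicate evaluations changes as the statistics drift.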

Adaptive Dataflow: Convergence of DBs/Nets
- The idea from two angles
  - Queries are flows, query optimization is routing
  - Sensor queries need nets-style adaptivity
  - New networking SW looks like a query engine
  - Click, Scout. Also CANs.
  - Sensor Qs need DB-style semantic optimization (up to app)
- Telegraph: An Adaptive Dataflow System
  - Boxes & Arrows dataflow programming
  - Adaptive reoptimization of the flow graph (Eddies)
  - Adaptive prioritization of the delivery (Juggle)
  - Adaptive load-balancing/FT across nodes (FLuX)
  - Mix Push/Pull to blend streams and pools (Fjords)
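
To make "adaptive prioritization of the delivery" concrete, here is a small Python sketch of a buffer that delivers items from whichever group the user currently ranks highest. It is only an illustration of the idea attributed to Juggle on this slide; the class `PriorityReorderBuffer` and its methods are hypothetical, and the real operator's buffering and spooling policy is not shown in the slides.

```python
from collections import defaultdict, deque


class PriorityReorderBuffer:
    """Buffer items per group; deliver from the highest-priority group.

    Hypothetical sketch: priorities may change while the stream is
    running, and each pop() consults the priorities in force at that time.
    """

    def __init__(self):
        self.queues = defaultdict(deque)
        self.priority = defaultdict(float)  # unset groups default to 0.0

    def set_priority(self, group, weight):
        self.priority[group] = weight

    def push(self, group, item):
        self.queues[group].append(item)

    def pop(self):
        ready = [g for g, q in self.queues.items() if q]
        if not ready:
            return None
        best = max(ready, key=lambda g: self.priority[g])
        return self.queues[best].popleft()
```

Within a group, items keep their arrival order; only the interleaving across groups follows the user's current preferences.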

Extra Slides on Telegraph

Telegraph Apps to Date
- Web Queries: Election
- Enhanced P2P functionality
  - Query by album or artist, via joins with web data
  - Working on pure P2P query processing
- Initial sensor app
  - Join I-80 traffic movement with webcams and incidents
  - Smart Dust Mote simulations

Telenap: Amazon Meets Napster

Movie Stars Who Donated to Bush

Query >> Search: “Federated Facts and Figures” Yahoo join FECInfo

Query >> Search: “Federated Facts and Figures” APBNews join FECInfo