Telegraph Status Joe Hellerstein. Overview Telegraph Design Goals, Current Status First Application: FFF (Deep Web) Budding Application: Traffic Sensor.

Presentation transcript:

Telegraph Status Joe Hellerstein

Overview
Telegraph Design Goals, Current Status
First Application: FFF (Deep Web)
Budding Application: Traffic Sensor Data
Moving Forward

Telegraph: Adaptive Dataflow
Dataflow
–Siphon data from the “deep web”
–Harness data streaming from sensors/traces
–Flow through code
–The API and Architecture for ubiquitous computing
Why adaptive?
–Sensor nets & wide area internet: volatile!
–Like Telegraph Avenue, need to roll w/the changes
–Adaptive techniques for routing data to machines & code

Demos Delivered
The big push: FFF Election 2000 demo 10/2000 –
–Got Telegraph off the ground and live
–Shows power of analysis & integration on web
    It’s not just search any more!
–Served thousands of live, long-running queries
Initial Sensor Demo
–UCB Institute for Transportation Studies data
–Various web cams
–Project for SIMS InfoVis class
A harness for more sensor-oriented work in Telegraph

Telegraph v1 (alpha) infrastructure
Single-site (multi-source) dataflow engine
–All Java: some lessons here (paper in preparation)
Numerous dataflow operators built
–TeSS (Telegraph Screen Scraper)
–File reader
–Relational ops (filters, joins, grouping, aggregation)
–Some simple sequence analysis ops
–Eddy: adaptive flow ordering operator
Key architectural theme: gain adaptivity via new operators
Not changes to dataflow infrastructure!
This is our upgrade strategy to parallelism/distribution
Lots of performance/learning work remains here
–Boltzmann machines?
SQL-to-Dataflow parser
–SQL is a fine dataflow language for many tasks
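To make "adaptive flow ordering" concrete, here is a minimal Python sketch of an eddy-style router over filter operators. It is an illustration only, not Telegraph's Java implementation: the names `FilterOp` and `Eddy` are hypothetical, and the routing policy (greedily send each tuple to the pending filter with the lowest observed pass rate) is a simple stand-in for whatever policy the real eddy uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class FilterOp:
    """A filter operator that tracks how many tuples it has seen and passed."""
    name: str
    predicate: Callable[[Dict], bool]
    seen: int = 0
    passed: int = 0

    def selectivity(self) -> float:
        # Smoothed pass rate, so an operator with no history starts at 0.5.
        return (self.passed + 1) / (self.seen + 2)

    def apply(self, tup: Dict) -> bool:
        self.seen += 1
        ok = self.predicate(tup)
        if ok:
            self.passed += 1
        return ok


class Eddy:
    """Routes each tuple through all filters, choosing the order per tuple."""

    def __init__(self, ops: Iterable[FilterOp]):
        self.ops: List[FilterOp] = list(ops)

    def route(self, tup: Dict) -> bool:
        pending = list(self.ops)
        while pending:
            # Assumed policy: try the filter most likely to drop the tuple first.
            op = min(pending, key=lambda o: o.selectivity())
            if not op.apply(tup):
                return False  # tuple dropped, remaining filters are skipped
            pending.remove(op)
        return True

    def run(self, tuples: Iterable[Dict]) -> List[Dict]:
        return [t for t in tuples if self.route(t)]
```

The ordering logic lives entirely inside the operator, which matches the slide's theme of gaining adaptivity via new operators rather than changes to the dataflow infrastructure.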

Upcoming Telegraph Operators
Goal: Further adaptivity through competition
–Multiple mirrored sources
    Handle rate changes, failures, parallelism
–Multiple alternate operators
–STeM operator manages tradeoffs
    STate Module, unifies caches, rendezvous buffers, join state
    Competitive sources/operators share building/using STeMs
Vijayshankar Raman
[Figure: static dataflow; eddy + stems]
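One way to picture a STeM holding join state is as a keyed store that tuples are built into and probed against. The sketch below is a hedged illustration in Python: the class name `SteM`, its `build`/`probe` methods, and the `symmetric_join` driver are assumptions made for this example, not Telegraph's actual interfaces.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple


class SteM:
    """Hypothetical state module: an index over tuples from one source."""

    def __init__(self, key: str):
        self.key = key
        self.index: Dict[Hashable, List[Dict]] = defaultdict(list)

    def build(self, tup: Dict) -> None:
        # Store the tuple under its join-key value.
        self.index[tup[self.key]].append(tup)

    def probe(self, value: Hashable) -> List[Dict]:
        # Return stored tuples matching the value (empty list if none).
        return list(self.index.get(value, []))


def symmetric_join(stream: Iterable[Tuple[str, Dict]], key: str) -> Iterator[Tuple[Dict, Dict]]:
    """Join sources 'R' and 'S' as tuples arrive, in any interleaving.

    Each arriving tuple is built into its own source's SteM, then probed
    against the other source's SteM, so neither input has to arrive first.
    """
    stems = {"R": SteM(key), "S": SteM(key)}
    for source, tup in stream:
        stems[source].build(tup)
        other = "S" if source == "R" else "R"
        for match in stems[other].probe(tup[key]):
            yield (tup, match) if source == "R" else (match, tup)
```

Because the state sits in shared modules rather than inside one join operator, several competing sources or operators could in principle build into and probe the same STeM, which is the sharing the slide describes.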

Telegraph Nuts and Bolts 2
Parallelism & Fault Tolerance
–Continuous/long-running flows need fault-tolerance
–Big flows need parallelism
    Adaptive Load-Balancing req’d
–FLuX operator: Exchange plus…
    Adaptive flow partitioning
        –River
    Mobile operator state for full Load Balancing
    Replicated flows & redundant state (RAID for operators)
    Load rebalancing vs. vulnerability
Mehul Shah & Sirish Chandrasekaran
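The idea of adaptive flow partitioning can be sketched as hash partitioning into many small buckets, with buckets reassigned between nodes when load becomes skewed. The Python below is a minimal sketch under that assumption; `AdaptivePartitioner` and its methods are hypothetical names, and a real operator would also have to move the operator state attached to a bucket, which is only noted in a comment here.

```python
import zlib
from collections import Counter
from typing import Dict, Hashable, List, Optional, Tuple


class AdaptivePartitioner:
    """Hypothetical sketch: hash keys into buckets, map buckets to nodes."""

    def __init__(self, nodes: List[str], n_buckets: int = 16):
        self.nodes = list(nodes)
        self.n_buckets = n_buckets
        # Initial round-robin assignment of buckets to nodes.
        self.owner: Dict[int, str] = {
            b: self.nodes[b % len(self.nodes)] for b in range(n_buckets)
        }
        self.bucket_load: Counter = Counter()

    def bucket_of(self, key: Hashable) -> int:
        # crc32 gives a hash that is stable across runs.
        return zlib.crc32(str(key).encode("utf-8")) % self.n_buckets

    def route(self, key: Hashable) -> str:
        b = self.bucket_of(key)
        self.bucket_load[b] += 1
        return self.owner[b]

    def node_load(self) -> Dict[str, int]:
        load = {n: 0 for n in self.nodes}
        for b, count in self.bucket_load.items():
            load[self.owner[b]] += count
        return load

    def rebalance(self) -> Optional[Tuple[int, str, str]]:
        """Move one bucket from the most to the least loaded node, if it helps."""
        load = self.node_load()
        hot = max(load, key=load.get)
        cold = min(load, key=load.get)
        gap = load[hot] - load[cold]
        # Only buckets lighter than the gap strictly shrink the imbalance.
        candidates = [
            b for b in range(self.n_buckets)
            if self.owner[b] == hot and 0 < self.bucket_load[b] < gap
        ]
        if not candidates:
            return None
        b = max(candidates, key=lambda x: self.bucket_load[x])
        # A real system would ship this bucket's operator state to `cold` here.
        self.owner[b] = cold
        return (b, hot, cold)
```

Using many more buckets than nodes keeps each move small, so rebalancing can be done incrementally while the flow keeps running.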

Directions 1: Sensor Queries
Continuous queries over streaming data
–Relates to online query processing, data dissemination
–Applies to sensors and software traces
Goal: Live, sequence-centric query engine
–Scales with number of sensors: sampling
–Scales with number of queries: CQ/Dissemination
–Handles wide-area distributed computing
    adaptive aggregation within network fabric
–Theme: queries involving live data & history are hard
    prefetching, caching, and scheduling + live sequence queries
–Need to target some apps here!
    Learned a lot from Election 2000 demo
    Traffic data here?
Sam Madden, Yanlei Diao, Asha Tarachandani

Directions 2: Deep Web
Deep Web Trawling & Privacy Issues
–We’re about to fire off our deep web trawler
    Drives the FLuX work
–UCB Alumni Relations wants Telegraph to help find donors
–FFF lets people do some fascinating/creepy/wrong things
    Summarize, break down data en masse
    Look for anomalies, outliers, patterns
    Combine data from multiple sources
–Consider privacy & accuracy: countermeasures, incentives, etc.
    How to prevent/detect a trawler (a distributed trawler?)
    How to ensure that data combinations are validated
    How to avoid “Lies, Damn Lies and Statistics”
Mehul Shah (w/Hal Varian, Christos Papadimitriou, David Wagner: UCB & Lisa Hellerstein, Torsten Suel: Polytechnic)

Directions 3: Set Compression
Most data is in sets
–Not strings
Surprise: No body of work on compressing sets!
–Lossless
    Should be able to do as well as best permutation
    Should actually be able to do better!
–Lossy
    Supporting probabilistic containment: Bloom Filters
    Supporting probabilistic vector lookup: nothing
    Supporting aggregate information: see Stat dept.
Focused on fundamentals
–But hope for applications to non-sequence-centric sensor work (aggregation)
Amol Deshpande
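For the lossy case, a Bloom filter represents a set in a fixed number of bits and answers containment queries with possible false positives but no false negatives. A minimal Python sketch follows; the class name, the default sizes, and the use of salted SHA-256 digests as hash functions are choices made for this example, not anything specified in the slides.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size lossy set: may report false positives, never false negatives."""

    def __init__(self, n_bits: int = 1024, n_hashes: int = 4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes  # must be below 256 for the one-byte salt
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item) -> Iterator[int]:
        data = str(item).encode("utf-8")
        for i in range(self.n_hashes):
            # Salt the digest with the hash index to get independent positions.
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

With the defaults above the whole set occupies 128 bytes regardless of how many items are added; adding more items only raises the false-positive rate.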

Delicious Snacks
Architectural Issues
–Encapsulating flexibility/adaptivity in operators
–Extending infrastructure, set of operators to new apps
Theory/Adaptivity Issues
–Formally define optimality in a volatile environment
–Define adaptive policies approaching optimality
–Understand set compression
Performance Issues
–Pick attractive applications to refine performance goals
–Relax formal definitions and explore heuristic space
HCI issues
–Preference & “forgetting” in an ever-updating display
Societal Issues
–Poke at and deflect privacy and info-perception issues
“Concepts are delicious snacks with which we try to alleviate our amazement” – A.J. Heschel

More? See Or try the demo at