Telegraph Status Joe Hellerstein. Overview Telegraph Design Goals, Current Status First Application: FFF (Deep Web) Budding Application: Traffic Sensor.


1 Telegraph Status Joe Hellerstein

2 Overview
Telegraph Design Goals, Current Status
First Application: FFF (Deep Web)
Budding Application: Traffic Sensor Data
Moving Forward

3 Telegraph: Adaptive Dataflow
Dataflow
–Siphon data from the “deep web”
–Harness data streaming from sensors/traces
–Flow through code
–The API and Architecture for ubiquitous computing
Why adaptive?
–Sensor nets & wide area internet: volatile!
–Like Telegraph Avenue, need to roll w/the changes
–Adaptive techniques for routing data to machines & code

4 Demos Delivered
The big push: FFF Election 2000 demo 10/2000
–http://fff.cs.berkeley.edu
–Got Telegraph off the ground and live
–Shows power of analysis & integration on web
  It’s not just search any more!
–Served thousands of live, long-running queries
Initial Sensor Demo
–UCB Institute for Transportation Studies data
–Various web cams
–Project for SIMS InfoVis class
  A harness for more sensor-oriented work in Telegraph

5 Telegraph v1 (alpha) infrastructure
Single-site (multi-source) dataflow engine
–All Java: some lessons here (paper in preparation)
Numerous dataflow operators built
–TeSS (Telegraph Screen Scraper)
–File reader
–Relational ops (filters, joins, grouping, aggregation)
–Some simple sequence analysis ops
–Eddy: adaptive flow ordering operator
  Key architectural theme: gain adaptivity via new operators
  Not changes to dataflow infrastructure!
  This is our upgrade strategy to parallelism/distribution
  Lots of performance/learning work remains here
  –Boltzmann machines?
SQL-to-Dataflow parser
–SQL is a fine dataflow language for many tasks
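To illustrate the idea of an operator that orders the flow adaptively, here is a minimal Python sketch. It is not Telegraph's Java code, and it does not claim to reproduce the eddy's actual routing policy: the class name `Eddy`, its methods, and the greedy "try the filter with the highest observed drop rate first" rule are all assumptions made for illustration.

```python
class Eddy:
    """Toy adaptive router (hypothetical API): sends each tuple through every
    filter, trying first the filter that has dropped the largest fraction of
    the tuples it has seen so far."""

    def __init__(self, filters):
        # filters: dict mapping a filter name to a predicate function
        self.filters = filters
        self.seen = {name: 0 for name in filters}
        self.dropped = {name: 0 for name in filters}

    def _drop_rate(self, name):
        # Untried filters get an optimistic rate of 1.0 so they are explored.
        if self.seen[name] == 0:
            return 1.0
        return self.dropped[name] / self.seen[name]

    def route(self, tup):
        # Sorted so that ties are broken deterministically by name.
        pending = sorted(self.filters)
        while pending:
            name = max(pending, key=self._drop_rate)
            pending.remove(name)
            self.seen[name] += 1
            if not self.filters[name](tup):
                self.dropped[name] += 1
                return False  # tuple eliminated; skip remaining filters
        return True

    def run(self, stream):
        return [t for t in stream if self.route(t)]


eddy = Eddy({"even": lambda x: x % 2 == 0, "big": lambda x: x >= 90})
print(eddy.run(range(100)))  # tuples that pass both filters
print(eddy.seen)             # how often each filter was actually evaluated
```

The ordering decision lives entirely inside the operator, so the surrounding dataflow code does not change when the ordering does, which is the architectural point the slide makes.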

6 Upcoming Telegraph Operators
Goal: Further adaptivity through competition
–Multiple mirrored sources
  Handle rate changes, failures, parallelism
–Multiple alternate operators
–STeM operator manages tradeoffs
  STate Module, unifies caches, rendezvous buffers, join state
  Competitive sources/operators share building/using STeMs
Vijayshankar Raman
(Figure labels: static dataflow; eddy + stems)
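One way to picture a state module that holds join state is as a keyed store with two operations, build and probe. The Python sketch below wires two such stores into a symmetric hash join. The `STeM` class, its method names, and `stem_join` are hypothetical and simplified; caches, rendezvous buffers, and competition between sources are left out.

```python
from collections import defaultdict


class STeM:
    """Toy state module (hypothetical API): stores tuples by key."""

    def __init__(self):
        self.index = defaultdict(list)

    def build(self, key, tup):
        # Insert a tuple into the stored state.
        self.index[key].append(tup)

    def probe(self, key):
        # Return every stored tuple with a matching key.
        return list(self.index.get(key, []))


def stem_join(arrivals, key_r, key_s):
    """Join two interleaved inputs.

    arrivals: iterable of (source, tuple) pairs, source being "R" or "S".
    Each arriving tuple is built into its own source's STeM and then probes
    the other source's STeM, so results appear as soon as both sides of a
    match have arrived, whatever the arrival order.
    """
    stems = {"R": STeM(), "S": STeM()}
    keys = {"R": key_r, "S": key_s}
    for source, tup in arrivals:
        other = "S" if source == "R" else "R"
        k = keys[source](tup)
        stems[source].build(k, tup)
        for match in stems[other].probe(k):
            yield (tup, match) if source == "R" else (match, tup)


arrivals = [("R", (1, "a")), ("S", (1, "x")), ("S", (2, "y")),
            ("R", (2, "b")), ("R", (1, "c"))]
for pair in stem_join(arrivals, key_r=lambda t: t[0], key_s=lambda t: t[0]):
    print(pair)
```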

7 Telegraph Nuts and Bolts 2
Parallelism & Fault Tolerance
–Continuous/long-running flows need fault-tolerance
–Big flows need parallelism
  Adaptive Load-Balancing req’d
–FLuX operator: Exchange plus…
  Adaptive flow partitioning
  –River
  Mobile operator state for full Load Balancing
  Replicated flows & redundant state (RAID for operators)
  Load rebalancing vs. vulnerability
Mehul Shah & Sirish Chandrasekaran
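Adaptive flow partitioning can be sketched as hash partitioning into many small buckets whose ownership may be reassigned when load is skewed. The Python sketch below is an assumption-laden illustration, not the FLuX design: `AdaptivePartitioner` and its methods are hypothetical, and only bucket ownership moves here, whereas the slide also calls for moving operator state and replicating flows.

```python
class AdaptivePartitioner:
    """Toy partitioner (hypothetical API): keys hash to buckets, and buckets
    are owned by consumers. Ownership can change to even out load."""

    def __init__(self, num_buckets, consumers):
        self.num_buckets = num_buckets
        self.consumers = list(consumers)
        # Initial assignment: buckets dealt out round-robin.
        self.owner = {b: self.consumers[b % len(self.consumers)]
                      for b in range(num_buckets)}
        self.bucket_load = {b: 0 for b in range(num_buckets)}

    def route(self, key):
        # Pick the bucket for this key and return the consumer that owns it.
        b = hash(key) % self.num_buckets
        self.bucket_load[b] += 1
        return self.owner[b]

    def consumer_load(self):
        load = {c: 0 for c in self.consumers}
        for b, c in self.owner.items():
            load[c] += self.bucket_load[b]
        return load

    def rebalance(self):
        """Move one bucket from the busiest to the idlest consumer, choosing
        the bucket that leaves the smallest gap. Returns the moved bucket,
        or None if no move would reduce the gap."""
        load = self.consumer_load()
        hot = max(self.consumers, key=lambda c: load[c])
        cold = min(self.consumers, key=lambda c: load[c])
        gap = load[hot] - load[cold]
        # Moving a bucket of load w changes the gap to |gap - 2w|,
        # which is an improvement only when 0 < w < gap.
        candidates = [b for b, c in self.owner.items()
                      if c == hot and 0 < self.bucket_load[b] < gap]
        if not candidates:
            return None
        bucket = min(candidates,
                     key=lambda b: abs(gap - 2 * self.bucket_load[b]))
        self.owner[bucket] = cold
        return bucket


part = AdaptivePartitioner(num_buckets=4, consumers=["m0", "m1"])
for key in [0] * 6 + [2] * 3 + [1]:   # skewed integer keys
    part.route(key)
print(part.consumer_load())           # load before rebalancing
print(part.rebalance())               # bucket that was moved
print(part.consumer_load())           # load after rebalancing
```

Using more buckets than consumers is what makes rebalancing possible at a finer grain than "one partition per machine".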

8 Directions 1: Sensor Queries
Continuous queries over streaming data
–Relates to online query processing, data dissemination
–Applies to sensors and software traces
Goal: Live, sequence-centric query engine
–Scales with number of sensors: sampling
–Scales with number of queries: CQ/Dissemination
–Handles wide-area distributed computing
  adaptive aggregation within network fabric
–Theme: queries involving live data & history are hard
  prefetching, caching, and scheduling + live sequence queries
–Need to target some apps here!
  Learned a lot from Election 2000 demo
  Traffic data here?
Sam Madden, Yanlei Diao, Asha Tarachandani
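A continuous query differs from a one-shot query in that it emits a fresh answer whenever new data arrives. As a rough illustration, the Python sketch below keeps a time-based sliding window over a sensor stream and reports the running average after each reading. The function name, the `(timestamp, value)` reading format, and the sample speeds are assumptions, not part of Telegraph.

```python
from collections import deque


def windowed_average(readings, window_seconds):
    """Continuous sliding-window average (illustrative sketch).

    readings: iterable of (timestamp, value) pairs in timestamp order.
    window_seconds: positive window length.
    Yields (timestamp, average of the values whose timestamp lies in
    (timestamp - window_seconds, timestamp]) after every arrival.
    """
    window = deque()
    total = 0.0
    for ts, value in readings:
        window.append((ts, value))
        total += value
        # Evict readings that have fallen out of the window.
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)


# Made-up traffic speeds (seconds, mph) from a single hypothetical sensor.
speeds = [(0, 60.0), (10, 50.0), (20, 40.0), (40, 30.0)]
for ts, avg in windowed_average(speeds, window_seconds=30):
    print(ts, avg)
```

Because the function is a generator, it also works on an unbounded stream; only the current window is held in memory.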

9 Directions 2: Deep Web
Deep Web Trawling & Privacy Issues
–We’re about to fire off our deep web trawler
  Drives the FLuX work
–UCB Alumni Relations wants Telegraph to help find donors
–FFF lets people do some fascinating/creepy/wrong things
  Summarize, break down data en masse
  Look for anomalies, outliers, patterns
  Combine data from multiple sources
–Consider privacy & accuracy: countermeasures, incentives, etc.
  How to prevent/detect a trawler (a distributed trawler?)
  How to ensure that data combinations are validated
  How to avoid “Lies, Damn Lies and Statistics”
Mehul Shah (w/Hal Varian, Christos Papadimitriou, David Wagner: UCB & Lisa Hellerstein, Torsten Suel: Polytechnic)

10 Directions 3: Set Compression
Most data is in sets
–Not strings
Surprise: No body of work on compressing sets!
–Lossless
  Should be able to do as well as best permutation
  Should actually be able to do better!
–Lossy
  Supporting probabilistic containment: Bloom Filters
  Supporting probabilistic vector lookup: nothing
  Supporting aggregate information: see Stat dept.
Focused on fundamentals
–But hope for applications to non-sequence-centric sensor work (aggregation)
Amol Deshpande
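For readers unfamiliar with the Bloom filter mentioned under lossy compression, here is a minimal Python sketch of probabilistic containment. A Bloom filter summarizes a set in a fixed number of bits; membership tests never miss an added item but may report false positives. The class layout and the use of SHA-256 with a counter prefix to derive hash positions are illustrative choices, not something specified on the slide.

```python
import hashlib


class BloomFilter:
    """Lossy set summary: no false negatives, some false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item together
        # with a counter (an illustrative choice of hash family).
        data = repr(item).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # True only if every derived bit is set.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
for name in ["eddy", "stem", "flux"]:
    bf.add(name)
print("eddy" in bf)   # added items are always reported as present
print("river" in bf)  # usually absent, but a false positive is possible
```

The set here costs 128 bytes regardless of how long the stored strings are, which is the sense in which the summary is a compressed representation.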

11 Delicious Snacks
Architectural Issues
–Encapsulating flexibility/adaptivity in operators
–Extending infrastructure, set of operators to new apps
Theory/Adaptivity Issues
–Formally define optimality in a volatile environment
–Define adaptive policies approaching optimality
–Understand set compression
Performance Issues
–Pick attractive applications to refine performance goals
–Relax formal definitions and explore heuristic space
HCI Issues
–Preference & “forgetting” in an ever-updating display
Societal Issues
–Poke at and deflect privacy and info-perception issues
“Concepts are delicious snacks with which we try to alleviate our amazement” – A.J. Heschel

12 More?
See http://telegraph.cs.berkeley.edu
Or try the demo at http://fff.cs.berkeley.edu

