Information Capture and Re-Use Joe Hellerstein

Scenario
Ubiquitous computing is more than clients!
–sensors and their data feeds are key
–smart dust (MEMS sensors)
–biomedical monitoring devices (MEMS sensors)
–every item of value records its use/misuse (disposable computing)
–tacit information from human behavior
–video from surveillance cameras, broadcasts, etc.

There’s a Data Flood Coming

What does it look like?
–Never ends: interactivity required
–Big: data reduction/aggregation is key
–Unpredictable: this scale of devices and nets will not behave nicely
Key Technologies:
–CONTROL: early answers and interactivity; online aggregation for data reduction
–River/Eddy: massively parallel, adaptive dataflow

CONTROL
Continuous Output and Navigation Technology with Refinement On Line
Data-intensive jobs are long-running. How to give early answers and interactivity?
–statistical estimators, and their performance implications
–online query processing algorithms: ripple joins
–online interactivity over feeds: data “juggle”
Appreciate the interplay of massive data processing, statistics, and UIs
Challenges: apply to sequence data, scale up
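The "early answers" idea can be illustrated with a running estimate that is refined as rows arrive. The following is only a minimal sketch of online aggregation, not the CONTROL implementation: the function name `online_average`, the `report_every` parameter, and the normal-approximation confidence interval are assumptions made for illustration, and the interval is only meaningful if rows arrive in random order.

```python
import math
import random


def online_average(stream, z=1.96, report_every=1000):
    """Yield (rows_seen, running_mean, half_width) while the stream is still being read.

    half_width is the half-width of an approximate 95% confidence interval
    (normal approximation), assuming rows arrive in random order.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            variance = m2 / (n - 1)
            yield n, mean, z * math.sqrt(variance / n)


if __name__ == "__main__":
    rng = random.Random(0)  # hypothetical data source standing in for a feed
    feed = (rng.gauss(50.0, 10.0) for _ in range(5000))
    for rows, estimate, half_width in online_average(feed):
        print(f"{rows} rows: {estimate:.2f} +/- {half_width:.2f}")
```

Each report is a usable answer; the interval narrows as more of the feed is consumed, so a user can stop the query once the estimate is precise enough.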

River
We built the world’s fastest sorting machine
–On the “NOW”: 100 Sun workstations + SAN
–But it only beat the record under ideal conditions!
River: performance adaptivity for data flows on clusters
–simplifies management and programming
–perfect for sensor-based streams
Challenges: deploy over a wide area
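Why a record holds only "under ideal conditions" can be shown with a toy model: if work is split evenly up front, one slow node sets the finish time for the whole cluster. The sketch below compares that with handing each record to whichever consumer frees up first. It is a simplified simulation under assumed per-record costs, not River's actual mechanism; `static_partition_time` and `adaptive_time` are hypothetical names.

```python
import heapq


def static_partition_time(num_records, costs):
    """Finish time when records are divided evenly among consumers in advance.

    costs[i] is the (assumed) time consumer i needs per record.
    """
    share = [num_records // len(costs)] * len(costs)
    for i in range(num_records % len(costs)):
        share[i] += 1
    return max(s * c for s, c in zip(share, costs))


def adaptive_time(num_records, costs):
    """Finish time when each record goes to the consumer that becomes free first."""
    # Heap of (time at which the consumer is next free, consumer index).
    free_at = [(0.0, i) for i in range(len(costs))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_records):
        t, i = heapq.heappop(free_at)
        done = t + costs[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish


if __name__ == "__main__":
    costs = [1.0, 1.0, 1.0, 4.0]  # one consumer is four times slower
    print("static:  ", static_partition_time(120, costs))
    print("adaptive:", adaptive_time(120, costs))
```

In the adaptive variant the slow consumer simply receives fewer records, so the total time tracks the aggregate rate of the cluster rather than its slowest member.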

Eddy
How to order and reorder operators over time
–key complement to River: adapt not only to the hardware, but to the processing rates
Challenges: scale up, consider parallel scheduling
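Reordering operators at run time can be sketched with a set of filters whose order is chosen per tuple from the pass rates observed so far. This is a deliberately simplified, deterministic stand-in: the routing policy below (lowest observed pass rate first) is an assumption made for illustration and is not claimed to be the Eddy routing scheme; `run_eddy` and `run_fixed` are hypothetical names.

```python
def run_fixed(tuples, filters):
    """Apply filters in the given fixed order; return (survivors, evaluations)."""
    survivors, evaluations = [], 0
    for t in tuples:
        alive = True
        for f in filters:
            evaluations += 1
            if not f(t):
                alive = False
                break
        if alive:
            survivors.append(t)
    return survivors, evaluations


def run_eddy(tuples, filters):
    """Route each tuple to the pending filter with the lowest observed pass rate."""
    seen = [1] * len(filters)    # tuples routed to each filter (starts at 1 to avoid 0/0)
    passed = [1] * len(filters)  # tuples each filter let through
    survivors, evaluations = [], 0
    for t in tuples:
        pending = set(range(len(filters)))
        alive = True
        while pending and alive:
            # Ties are broken by filter index so the routing is deterministic.
            i = min(pending, key=lambda j: (passed[j] / seen[j], j))
            pending.remove(i)
            seen[i] += 1
            evaluations += 1
            if filters[i](t):
                passed[i] += 1
            else:
                alive = False
        if alive:
            survivors.append(t)
    return survivors, evaluations


if __name__ == "__main__":
    # The first filter passes most tuples, the second rejects most of them.
    filters = [lambda x: x % 10 != 9, lambda x: x % 5 == 0]
    print(run_fixed(range(1000), filters)[1], "evaluations with the fixed order")
    print(run_eddy(range(1000), filters)[1], "evaluations with adaptive routing")
```

Both functions return the same survivors; the adaptive version learns to try the more selective filter first and so evaluates fewer predicates. Because the counters are cumulative, this sketch reacts slowly if selectivities drift; a windowed or decayed count would be one way to address that.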

Telegraph: Putting it Together
Want to build next-gen global DB system. Capture and Re-Use embodied in a vertical solution.
Marriage of:
–CONTROL, River & Eddy
–OceanStore + optionally transactional storage that handle new hardware realities, scale
–Federation in the wide area via Negotiation/Economics
–Combinations of browse/query/mine at UI; no magic bullet there! CONTROL is key.

Integration with other options
Integration
–Use Oceanic Data Utility for distribution, caching, protection of streams
–Use negotiation architectures to connect federated and stored streams
–Be data-intensive backbone to diverse clients
–Be a scalable platform for tacit knowledge extraction
Cooperation
–Tacit information as a feed
–Capture/merge classroom feeds
–Use UI design tools for device-independent, interactive stream-based apps

Plan for Success
One Year
–Implement River/Eddy over parallel cluster, deploy CONTROL modules
–Deploy data analysis apps over sequence data (MEMS/Web/Video)
Three Year
–Integrate w/ wide area storage & processing
–Get data-intensive Endeavour apps running on architecture (e.g. tacit knowledge mining)
–Develop UI tools for interacting with never-ending streams