Towards Low Overhead Provenance Tracking in Near Real-Time Stream Filtering
Nithya N. Vijayakumar, Beth Plale, DDE Lab, Indiana University

1 Towards Low Overhead Provenance Tracking in Near Real-Time Stream Filtering
Nithya N. Vijayakumar, Beth Plale
DDE Lab, Indiana University
{nvijayak, plale}@cs.indiana.edu

2 Project Description
Provenance collection in stream filtering systems
Identify unique challenges posed by stream filtering systems to provenance tracking
Low overhead data model and collection model that address these challenges

3 Outline
Stream filtering systems
Challenges posed by stream filtering systems
Current provenance solutions applied to streams
Proposed provenance data model
Low overhead provenance collection model
Calder stream processing system
Implementation of provenance models in Calder
Application in LEAD
Future work

4 Stream filtering systems
Data-driven systems that accept events in real time
– appropriate when data is continuously generated
– a data stream is an indefinite sequence of time-ordered events
Filter (query or user-defined application)
– a processing unit that takes one or more event sequences as input and generates a new event sequence as output
– queries in a well-defined language or customized application code
– long running and associated with a lifetime
Applications
– monitoring, stock ticks in financial applications, performance measurements in network monitoring and traffic management, sensor data, scientific datasets
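
The filter abstraction above can be illustrated with a minimal Python sketch. It assumes events are (timestamp, value) pairs and models a filter as a generator over a single input sequence. The filter's lifetime is represented by a hypothetical `lifetime_end` timestamp. None of these names come from the slides.

```python
from typing import Callable, Iterable, Iterator, Tuple

Event = Tuple[float, float]  # (timestamp, value): an assumed event shape


def run_filter(
    events: Iterable[Event],
    predicate: Callable[[Event], bool],
    lifetime_end: float,
) -> Iterator[Event]:
    """Consume a time-ordered event sequence and emit a new (derived) sequence.

    The filter keeps running until its hypothetical lifetime expires, which
    models a long-running query over an indefinite stream.
    """
    for event in events:
        timestamp, _ = event
        if timestamp >= lifetime_end:
            break  # lifetime over: stop consuming the input stream
        if predicate(event):
            yield event


# Usage: keep readings above 30 until t = 100.
readings = [(10.0, 25.0), (20.0, 31.5), (50.0, 33.0), (120.0, 40.0)]
hot = list(run_filter(readings, lambda e: e[1] > 30.0, lifetime_end=100.0))
print(hot)  # [(20.0, 31.5), (50.0, 33.0)]
```

Because the filter is a generator, it processes events one at a time and never needs the whole (possibly unbounded) stream in memory.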

5 Challenges posed by stream filtering systems
Identifying provenance entities
– what is the atomic unit: an event, a stream, or a source?
Capturing stream filtering conditions with low overhead
– distributed environment
– environmental and configuration changes
Maintaining relevance with non-persistent data
– trace back the source of events long after they were derived
Dynamic accuracy estimation
– quality of service guarantees for derived streams
– provenance across streams
– deduce the accuracy of derived streams

6 Current provenance solutions applied to streams: What is the challenge?
Representing provenance for stream entities using the Virtual Data Grid system
– indefinite sequence of time-ordered datasets
– non-persistent data events
– accountability is needed more than reproducibility
Provenance collection using PASOA or Karma
– provenance must be collected for each stream and for the filters executed on streams
– communication between components of the stream filtering system is less important than the entities themselves

7 Current provenance solutions applied to streams (contd.)
Logging environmental conditions using Log4j
– non-trivial load on the service
– aggregating provenance traces is difficult
Augmenting accuracy and lineage using Trio
– lineage cannot be associated with datasets
– need to trace the accuracy of a set of events long after the stream is generated

8 Provenance data model: What to track?
Atomic units
– streams generated outside the system (base streams)
– declarative queries or application code that executes continuously (adaptive filters)
– streams generated by executing adaptive filters on base and derived streams (derived streams)
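
The three atomic units can be sketched as Python dataclasses. The class and field names below are illustrative assumptions, not Calder's schema. The sketch also shows that a derived stream's inputs may be base or derived streams, so its base sources are found recursively. The hypothetical `base_sources` helper walks that chain.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class BaseStream:
    """Stream generated outside the system (e.g. a sensor feed)."""
    stream_id: str


@dataclass
class AdaptiveFilter:
    """Declarative query or application code that executes continuously."""
    filter_id: str
    query: str


@dataclass
class DerivedStream:
    """Stream produced by running an adaptive filter on base/derived streams."""
    stream_id: str
    produced_by: AdaptiveFilter
    inputs: List[Union[BaseStream, "DerivedStream"]] = field(default_factory=list)


def base_sources(stream: Union[BaseStream, DerivedStream]) -> List[str]:
    """Return the IDs of every base stream a stream ultimately depends on."""
    if isinstance(stream, BaseStream):
        return [stream.stream_id]
    ids: List[str] = []
    for parent in stream.inputs:
        for sid in base_sources(parent):
            if sid not in ids:  # keep first-seen order, drop duplicates
                ids.append(sid)
    return ids


# Usage with hypothetical entities.
radar = BaseStream("B-radar")
gauge = BaseStream("B-gauge")
smooth = AdaptiveFilter("F-smooth", "SELECT avg(value) FROM radar")
storms = AdaptiveFilter("F-storm", "SELECT * FROM smoothed WHERE value > 40")
smoothed = DerivedStream("D-smoothed", smooth, [radar])
alerts = DerivedStream("D-alerts", storms, [smoothed, gauge, radar])
print(base_sources(alerts))  # ['B-radar', 'B-gauge']
```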

9 Provenance data model: How to store it?
Provenance stack
– base provenance information and a list of changes
– the latest information is identified by its timestamp and is current from that point onwards
Provenance tree
– a derived stream refers to the provenance of its input streams (base and derived) and adaptive filters
– provenance can refer to annotations outside the system (SAM)
Store the provenance history (compressed or uncompressed) of streams and filters
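
A provenance stack, as described above, might be sketched as follows. The sketch assumes each change is a set of name/value overrides. It also assumes that changes are applied in timestamp order on top of the base information. `ProvenanceStack`, `push`, and `current_at` are hypothetical names, and compression of the history is not modeled.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ProvenanceStack:
    """Base provenance plus a time-ordered list of changes.

    Each change is (timestamp, {name: value}) and stays current from its
    timestamp onwards until a later change overrides the same names.
    """
    base: Dict[str, Any]
    changes: List[Tuple[float, Dict[str, Any]]] = field(default_factory=list)

    def push(self, timestamp: float, update: Dict[str, Any]) -> None:
        # Insert in timestamp order so late-arriving changes still sort correctly.
        keys = [t for t, _ in self.changes]
        idx = bisect.bisect_right(keys, timestamp)
        self.changes.insert(idx, (timestamp, dict(update)))

    def current_at(self, timestamp: float) -> Dict[str, Any]:
        """Return the provenance in effect at `timestamp`."""
        state = dict(self.base)
        for t, update in self.changes:
            if t > timestamp:
                break  # remaining changes lie in the future
            state.update(update)
        return state


# Usage with hypothetical metadata.
stack = ProvenanceStack({"rate": "1/s", "source": "radar"})
stack.push(100.0, {"rate": "2/s"})
stack.push(50.0, {"source": "radar-backup"})
print(stack.current_at(75.0))  # {'rate': '1/s', 'source': 'radar-backup'}
```

Only the changes are stored, not a full copy per timestamp. That keeps the stack small while still letting the state at any past time be reconstructed.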

10 Low overhead provenance collection model
Base provenance
– collected from the user when registering a stream or filter
– documents the available information (inputs, filters, rate, sources, etc.)
– system and user-defined metadata are stored as name-value pairs in the base provenance information
– base provenance can be updated by the user
Dynamic provenance
– a subset of a stream is identified by a starting timestamp and an ending timestamp
– changes are logged with a starting timestamp and are current from then on
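
The collection model above can be sketched in Python. Base provenance is a name-value dictionary that the user can update. Dynamic provenance is a list of logged changes, each with a start timestamp and an optional end. The sketch asks which changes affected a subset of the stream given by a start and end timestamp. All class and method names are hypothetical, and the use of half-open intervals is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DynamicRecord:
    """A logged change; it applies from `start` until `end` (None = still current)."""
    start: float
    note: str
    end: Optional[float] = None

    def overlaps(self, t0: float, t1: float) -> bool:
        # Half-open intervals [start, end) and [t0, t1) overlap when each
        # one starts before the other ends.
        end = float("inf") if self.end is None else self.end
        return self.start < t1 and t0 < end


@dataclass
class StreamProvenance:
    base: Dict[str, str]  # system and user metadata as name-value pairs
    dynamic: List[DynamicRecord] = field(default_factory=list)

    def update_base(self, name: str, value: str) -> None:
        # The user may revise base provenance after registration.
        self.base[name] = value

    def log_change(self, start: float, note: str, end: Optional[float] = None) -> None:
        self.dynamic.append(DynamicRecord(start, note, end))

    def changes_between(self, t0: float, t1: float) -> List[str]:
        """Notes for every change affecting the stream subset [t0, t1)."""
        return [rec.note for rec in self.dynamic if rec.overlaps(t0, t1)]


# Usage with hypothetical values.
prov = StreamProvenance({"owner": "alice", "rate": "1 Hz"})
prov.update_base("rate", "2 Hz")
prov.log_change(10.0, "sensor recalibrated")
prov.log_change(20.0, "source offline", end=30.0)
print(prov.changes_between(25.0, 40.0))  # ['sensor recalibrated', 'source offline']
print(prov.changes_between(35.0, 40.0))  # ['sensor recalibrated']
```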

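11 A simple example
[Figure: provenance of a "Temperature Feed" stream involving entities D0010, Q0099, B0011, and D0005]
Base provenance: owner foo; permissions open to everyone
Dynamic provenance, 13:00:00 Feb-10-2006 to 13:34:56 Feb-10-2006: B0011 down; Sampling 0.85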
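
The example can be encoded as plain Python data. This rests on an assumed reading of the lost figure: D0010 is the "Temperature Feed" derived stream, Q0099 is the filter producing it, and B0011 and D0005 are its inputs. The timestamps use the slide's own format. The hypothetical `changes_at` helper reports which dynamic changes apply to an event at a given time.

```python
from datetime import datetime

FMT = "%H:%M:%S %b-%d-%Y"  # matches the slide's "13:00:00 Feb-10-2006" format

# Assumed interpretation of the figure (entity roles are not confirmed by the slide).
temperature_feed = {
    "id": "D0010",
    "filter": "Q0099",
    "inputs": ["B0011", "D0005"],
    "base": {"owner": "foo", "permissions": "open to everyone"},
    "dynamic": [
        {
            "start": datetime.strptime("13:00:00 Feb-10-2006", FMT),
            "end": datetime.strptime("13:34:56 Feb-10-2006", FMT),
            "changes": {"B0011": "down", "Sampling": 0.85},
        }
    ],
}


def changes_at(prov: dict, when: datetime) -> dict:
    """Merge the dynamic changes whose [start, end] interval contains `when`."""
    active: dict = {}
    for record in prov["dynamic"]:
        if record["start"] <= when <= record["end"]:
            active.update(record["changes"])
    return active


print(changes_at(temperature_feed, datetime(2006, 2, 10, 13, 15)))  # {'B0011': 'down', 'Sampling': 0.85}
print(changes_at(temperature_feed, datetime(2006, 2, 10, 14, 0)))   # {}
```

An event produced at 13:15 would therefore be flagged as derived while input B0011 was down, even if the question is asked long after the event itself is gone.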

12 Calder stream processing system
Distributed processing of streams
Service-oriented access to data streams
SQL-based rule-action support
Extends OGSA-DAI v6 GDS to streaming resources
Synchronous and asynchronous data delivery
[Figure: Calder architecture. Components: Data Management Subsystem, Stream Grid Data Service, Query Planning Service, Stream Rowset Service, Provenance Service, Monitoring Service, a computation node running the query processing engine, and the Calder pub-sub system. Flows shown: queries/requests and result data between users/applications and Calder; incoming data streams.]
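
The two delivery modes listed above can be contrasted with a toy publish-subscribe channel. This is a hypothetical stand-in, not Calder's or OGSA-DAI's API. Push subscribers are called inside `publish`, which is synchronous delivery. Pull subscribers get a buffered `queue.Queue` that they drain when ready, standing in for asynchronous delivery.

```python
import queue
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class Broker:
    """Toy publish-subscribe channel with two delivery modes (hypothetical API)."""

    def __init__(self) -> None:
        self._callbacks: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)
        self._queues: DefaultDict[str, List["queue.Queue[Any]"]] = defaultdict(list)

    def subscribe_push(self, topic: str, callback: Callable[[Any], None]) -> None:
        # Synchronous delivery: the callback runs inside publish().
        self._callbacks[topic].append(callback)

    def subscribe_pull(self, topic: str) -> "queue.Queue[Any]":
        # Buffered delivery: events wait in a queue until the consumer polls.
        q: "queue.Queue[Any]" = queue.Queue()
        self._queues[topic].append(q)
        return q

    def publish(self, topic: str, event: Any) -> None:
        for callback in self._callbacks[topic]:
            callback(event)
        for q in self._queues[topic]:
            q.put(event)


# Usage with a hypothetical radar event.
broker = Broker()
pushed: List[Any] = []
broker.subscribe_push("radar", pushed.append)
inbox = broker.subscribe_pull("radar")
broker.publish("radar", {"station": "S1", "reflectivity": 42})
print(pushed)              # [{'station': 'S1', 'reflectivity': 42}]
print(inbox.get_nowait())  # {'station': 'S1', 'reflectivity': 42}
```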

13 Calder Query Execution

14 Provenance collection in Calder
[Figure: provenance collection in Calder. Components: Query Planner Service, Monitoring Service, Provenance Service, XML database, computation nodes. Flows: query execution plan updates, monitoring updates, subscriptions to receive events of interest, provenance queries/updates and provenance results, provenance propagation.]
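
Provenance updates bound for an XML database might be serialized and queried as in the sketch below. It uses the standard-library `xml.etree.ElementTree`. The element and attribute names (`provenanceUpdate`, `property`, `entity`, `source`) are assumptions rather than Calder's schema. A plain Python list stands in for the XML database.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def provenance_update_xml(entity_id: str, source: str, timestamp: str,
                          fields: Dict[str, str]) -> str:
    """Serialize one provenance update (e.g. from the query planner or the
    monitoring service) as an XML fragment."""
    root = ET.Element("provenanceUpdate",
                      {"entity": entity_id, "source": source, "timestamp": timestamp})
    for name, value in fields.items():
        prop = ET.SubElement(root, "property", {"name": name})
        prop.text = value
    return ET.tostring(root, encoding="unicode")


def query(store: List[str], entity_id: str, name: str) -> List[str]:
    """Return every recorded value of property `name` for `entity_id`, in order."""
    values: List[str] = []
    for doc in store:
        root = ET.fromstring(doc)
        if root.get("entity") == entity_id:
            for prop in root.findall(f"property[@name='{name}']"):
                values.append(prop.text or "")
    return values


# Usage: a list of fragments stands in for the XML database.
store = [
    provenance_update_xml("D0001", "monitoring", "t1", {"rate": "1/s"}),
    provenance_update_xml("D0001", "query-planner", "t2", {"node": "n3"}),
    provenance_update_xml("D0001", "monitoring", "t3", {"rate": "2/s"}),
]
print(query(store, "D0001", "rate"))  # ['1/s', '2/s']
```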

15 Application in LEAD
Radar metadata is sent through the pub-sub system
User submits a filter query
Calder executes the filter query on incoming data streams
Filtered datasets are processed using data mining algorithms (MDA & ADaM)
Triggers (WS-Notifications) are sent to workflows that invoke forecast models
Provenance tracking will help in understanding why and when a trigger was sent
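
The last point can be illustrated with a minimal sketch in which every trigger records when it fired, which filter produced it, and which input events caused it. A simple threshold rule stands in for the filtering and data-mining steps; MDA and ADaM are not modeled. The event shape (ID, timestamp, reflectivity) and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Event = Tuple[str, float, float]  # (event_id, timestamp, reflectivity): assumed shape


@dataclass
class Trigger:
    sent_at: float        # when the trigger was sent
    filter_id: str        # which filter produced it
    evidence: List[str]   # IDs of the input events that caused it (the "why")


def detect(events: List[Event], filter_id: str,
           threshold: float, min_hits: int) -> List[Trigger]:
    """Emit a trigger each time `min_hits` events exceed `threshold`,
    recording the contributing event IDs as lightweight provenance."""
    triggers: List[Trigger] = []
    hits: List[str] = []
    for event_id, ts, value in events:
        if value > threshold:
            hits.append(event_id)
            if len(hits) == min_hits:
                triggers.append(Trigger(ts, filter_id, list(hits)))
                hits.clear()  # start collecting evidence for the next trigger
    return triggers


# Usage with hypothetical events.
events = [("e1", 1.0, 20.0), ("e2", 2.0, 45.0), ("e3", 3.0, 50.0),
          ("e4", 4.0, 10.0), ("e5", 5.0, 55.0), ("e6", 6.0, 60.0)]
triggers = detect(events, "Q-storm", threshold=40.0, min_hits=2)
for t in triggers:
    print(t)
```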

16 Future work
Complex Event Processing
– processing multiple streams
– identifying global behavior
Context Management
– informative search based on past usage
– predicting system characteristics
– managing profiles for users and dynamic system configuration

17 Thank you
Questions and Feedback Welcome!
Nithya Vijayakumar
nvijayak@cs.indiana.edu

