Supporting DM Tasks & DM Processes in a DSMS or a CEP System


1 Supporting DM Tasks & DM Processes in a DSMS or a CEP System
Motivation: Gain experience with current DSMSs and with the limitations that make it hard to support KDD applications on data streams. Case study: Naïve Bayesian Classifiers (NBCs), arguably the simplest mining algorithm, which can be implemented in SQL on a DBMS. The question is thus: can we support it using a DSMS and its query language? A slightly more general question is whether the NBC can be supported by various CEP systems, which claim to be powerful (e.g., they support rules). Could they be extended to support generic versions of the NBC, and perhaps other data stream mining methods?
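To make concrete why an NBC is expressible with plain counting (and hence with GROUP BY style aggregates), here is a minimal Python sketch. It is an illustration only, not the course's reference implementation: the class name `CountingNBC`, the restriction to categorical attributes, and the use of Laplace smoothing are all assumptions introduced here.

```python
from collections import defaultdict
import math


class CountingNBC:
    """Naive Bayesian classifier over categorical attributes, kept as count tables.

    Hypothetical illustration: every table below could be maintained by a
    COUNT(*) ... GROUP BY aggregate in a DBMS or DSMS.
    """

    def __init__(self):
        self.class_counts = defaultdict(int)   # label -> number of training tuples
        self.value_counts = defaultdict(int)   # (label, column, value) -> count
        self.column_values = defaultdict(set)  # column -> distinct values seen

    def train(self, attributes, label):
        # One pre-classified tuple only increments counters.
        self.class_counts[label] += 1
        for column, value in enumerate(attributes):
            self.value_counts[(label, column, value)] += 1
            self.column_values[column].add(value)

    def classify(self, attributes):
        # Pick the label maximizing log P(label) + sum of log P(value | label).
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / total)
            for column, value in enumerate(attributes):
                count = self.value_counts.get((label, column, value), 0)
                distinct = len(self.column_values[column]) or 1
                # Laplace smoothing (an assumption) avoids log(0) for unseen values.
                score += math.log((count + 1) / (n_label + distinct))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Because training only increments counters, the same tables can be updated one tuple at a time as a stream arrives, which is what makes the NBC a natural first test for a stream query language.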

2 CS240B Project
Download a DSMS or a CEP system of your choice and (after explaining why you selected it and not the others) explore how you can implement the following tasks:
1. Training: Select the dataset you will be using and train an NBC on it.
2. Testing: Use the NBC of point 1 to (a) classify unclassified tuples, and (b) test the accuracy of the classifier on testing examples. Report the accuracy of your current classifier at periodic (time- or count-based) intervals.
3. Periodically derive a new NBC from the stream of pre-classified tuples of point 2. Use the newly built NBC to repeat Steps 1 and 2. How does its accuracy compare to that of the NBC of point 1?
4. See if you can generalize your software and, e.g., design and develop generic NBCs, ensemble methods, other classifiers, etc.
It is understood that the limitations of DSMS and CEP systems will probably prevent you from completing all these tasks (listed in order of increasing difficulty). So you should make sure that you (1) download a good system and (2) write a clear report explaining your efforts. In particular, explain (i) the properties of the DSMS and the dataset that made you select them, (ii) your design and implementation approach, and (iii) the difficulties that prevented you from going further. Submit your project report and code. (For test sets, see:
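The testing and periodic-rebuild tasks above can be sketched outside any particular DSMS as a count-based "test, then train" loop. This is a hedged illustration under stated assumptions: `evaluate_and_rebuild`, `make_model`, and the stand-in `MajorityClassifier` are hypothetical names introduced here, and any classifier offering `train(attributes, label)` and `classify(attributes)` could be plugged in instead of the stand-in.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in for an NBC: predicts the most frequent label seen."""

    def __init__(self):
        self.labels = Counter()

    def train(self, attributes, label):
        self.labels[label] += 1

    def classify(self, attributes):
        return self.labels.most_common(1)[0][0] if self.labels else None


def evaluate_and_rebuild(stream, initial_model, make_model, window=100):
    """Yield (tuples_seen, window_accuracy) every `window` pre-classified tuples.

    Each tuple is first classified by the current model (testing) and then
    used to train a fresh candidate model. At each window boundary the
    candidate replaces the current model and a new candidate is started.
    A trailing partial window is not reported.
    """
    current, candidate = initial_model, make_model()
    correct = seen = 0
    for attributes, label in stream:
        correct += current.classify(attributes) == label  # test first
        candidate.train(attributes, label)                # then train
        seen += 1
        if seen % window == 0:
            yield seen, correct / window
            current, candidate = candidate, make_model()
            correct = 0
```

Comparing successive window accuracies shows whether the rebuilt classifier does better than the original one. A time-based interval would follow the same pattern, with the boundary test replaced by a timestamp comparison.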

3 DSMS (suggested by Ariyam Das)
1. Storm, developed by BackType. Provides an adapter for writing applications in almost any language. Release link: Has an experimental feature called Storm SQL, which allows users to run SQL queries over streaming data in Storm. Tutorial: Local Mode Setup:

4 DSMS (cont.) 2. Spark Streaming developed by Berkeley AMPLab
Transforms streaming computation into a series of deterministic micro-batch computations, which are then executed using Spark’s distributed processing framework. Tutorial 1: Tutorial 2: Release: Docs: 3. SQLStream Blaze. Not open source, but free to use for 30 days, or free community usage with data limits. Tutorial: Need to use only S-server and StreamLab.
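The micro-batch idea described for Spark Streaming can be illustrated in plain Python without using Spark itself. The sketch below is an assumption-laden toy: it cuts batches by tuple count rather than by time interval, and the names `micro_batches` and `process` are hypothetical, not part of any Spark API.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Split a (possibly unbounded) iterable into consecutive lists of at most batch_size items."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process(stream, batch_size, batch_fn):
    """Apply a deterministic batch function to each micro-batch, in arrival order."""
    for batch in micro_batches(stream, batch_size):
        yield batch_fn(batch)


# Usage: sum each batch of three values from a small stream.
print(list(process(range(7), 3, sum)))  # prints [3, 12, 6]
```

Since each batch is an ordinary finite collection, any batch computation (for example, updating NBC count tables) can be reused unchanged on the stream.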

5 DSMS (cont.) 4. Apache Flink
Efficient when used with the Hadoop connector. Download: Example: Dataflow model: 5. StreamBase (slightly outdated). Recently acquired by TIBCO. Doc: Applications can be written in StreamSQL: 6. Esper

