Continuous Data Stream Processing MAKE Lab Date: 2006/03/07 Post-Excellence Project Subproject 6.

Slide 2: Music Virtual Channel
[Figure: Music Virtual Channel architecture. Components shown: music collections 1..N on the Internet, music metadata, clustering engine, filtering engine, XML filtering engine, music channel simulator, V.C. players, interface, profile monitor, channel monitor, cluster monitor, cluster coordinator, favorite channel, peer search engine, profile database, MusicXML database.]

Slide 3: Research Directions
Streaming data management:
- Mining
  - Frequent tree pattern mining
  - Closed tree pattern mining
  - Frequent itemset mining (sliding window)
  - Frequent itemset mining (landmark model)
- Filtering
- Temporal query processing
  - Sequence query matching
  - Episode query matching
- Spatial query processing
  - Range search
  - KNN search
- Aggregate query processing
  - Top-K search

Slide 4: Sequence Query Matching
- Given a set of sequence queries (SQs), how can we continuously monitor an event stream and, as soon as segments arrive, report those that are approximate answers to some query within that query's error bound?
[Figure: an event stream and a sequence query with error bound ε = 1]
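
A minimal Python sketch of continuous sequence query matching. The slide does not say how the error of a segment is measured, so this assumes, purely for illustration, that it is the number of positions where the segment differs from the query (a Hamming-style count) and that a segment is an answer when that count is at most ε. The names `monitor` and `queries` are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Sequence, Tuple

def monitor(stream: Iterable[str],
            queries: Dict[str, Tuple[Sequence[str], int]]
            ) -> Iterator[Tuple[str, int, int]]:
    """Yield (query name, start index, end index) for every segment that
    matches a sequence query with at most epsilon mismatches.

    Assumption: the error of a segment is its number of mismatched positions.
    """
    longest = max(len(pattern) for pattern, _ in queries.values())
    # Only the most recent `longest` events are ever needed.
    window: Deque[str] = deque(maxlen=longest)
    for t, event in enumerate(stream):
        window.append(event)
        for name, (pattern, eps) in queries.items():
            m = len(pattern)
            if len(window) < m:
                continue  # not enough events yet for this query
            segment = list(window)[-m:]  # segment ending at the newest event
            mismatches = sum(a != b for a, b in zip(segment, pattern))
            if mismatches <= eps:
                # Reported as soon as the last event of the segment arrives.
                yield name, t - m + 1, t
```

For example, `list(monitor("ABCABD", {"SQ1": ("ABC", 1)}))` reports the exact match at positions 0 to 2 and the one-mismatch segment "ABD" at positions 3 to 5. Each arriving event costs O(total query length); a real system would share work among overlapping queries.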

Slide 5: Episode Query Matching
- Knowledge Discovery from Telecommunication Network Alarm Databases [ICDE96]
  - If an alarm of type A occurs, then an alarm of type B occurs within 30 seconds with probability 0.8.
  - If alarms of types A and B occur within 5 seconds, then an alarm of type C occurs within 60 seconds with probability 0.7.
  - If an alarm of type A precedes an alarm of type B, and C precedes D, all within 15 seconds, then E will follow within 4 minutes with probability 0.6.
[Figure: the episodes in these rules: A; A and B within 5 seconds; A before B and C before D within 15 seconds]
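
The first rule can be read as a conditional frequency. A minimal sketch, assuming alarms arrive as (timestamp in seconds, type) pairs; `rule_confidence` is a hypothetical helper, and this brute-force count is only an illustration of what the probability means, not the discovery algorithm of [ICDE96].

```python
from typing import List, Tuple

def rule_confidence(events: List[Tuple[float, str]], a: str, b: str,
                    window: float) -> float:
    """Fraction of type-`a` alarms followed by a type-`b` alarm within
    `window` seconds: a simple estimate for the rule "a => b within window"."""
    a_times = [t for t, kind in events if kind == a]
    b_times = [t for t, kind in events if kind == b]
    if not a_times:
        return 0.0
    followed = sum(
        1 for ta in a_times
        if any(ta < tb <= ta + window for tb in b_times)
    )
    return followed / len(a_times)

alarms = [(0, "A"), (10, "B"), (40, "A"), (100, "B"), (120, "A"), (130, "B")]
conf = rule_confidence(alarms, "A", "B", 30)  # 2 of the 3 A alarms are followed by B
```

The quadratic scan is fine for a handful of alarms; a streaming version would keep only the A alarms still inside the window.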

Slide 6: Top-K Query
- Suppose two continuous queries (the first two below) are registered; then a third continuous query is registered.
[Figure: a coordinator connected to Server 1, Server 2, Server 3, and Server 4]
- Queries:
  - Which two web documents are the most popular across the first and second servers?
  - Which two web documents are the most popular across the third and fourth servers?
  - Which two web documents are the most popular across the second and third servers?
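
A minimal sketch of the coordinator's side, assuming "popularity" means access counts summed over the servers a query names. The class `Coordinator` and its methods are hypothetical. It keeps one copy of each server's latest counts, so the second and third queries above share Server 3's data instead of each requesting it separately.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

class Coordinator:
    """Keeps one copy of each server's latest document counts and answers
    every registered continuous top-k query from those shared copies."""

    def __init__(self) -> None:
        self.server_counts: Dict[int, Counter] = {}
        self.queries: Dict[str, Tuple[Tuple[int, ...], int]] = {}

    def register(self, name: str, servers: Iterable[int], k: int) -> None:
        self.queries[name] = (tuple(servers), k)

    def update(self, server: int, counts: Dict[str, int]) -> None:
        # One update per server serves every query that includes that server.
        self.server_counts[server] = Counter(counts)

    def answer(self, name: str) -> List[str]:
        servers, k = self.queries[name]
        total: Counter = Counter()
        for s in servers:
            total.update(self.server_counts.get(s, Counter()))  # sum counts
        return [doc for doc, _ in total.most_common(k)]
```

Usage: register the three queries as `("Q1", [1, 2], 2)`, `("Q2", [3, 4], 2)` and `("Q3", [2, 3], 2)`, feed each server's counts through `update`, and call `answer` whenever a result is needed.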

Slide 7: Main Difficulties
- Heavy communication cost
  - Each server sends updates of its current data only when necessary.
- Multiple continuous queries
  - Most papers focus on one-time top-k queries or a single continuous top-k query.
  - Information sharing among queries is necessary.
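
The slide does not define "when necessary". One common way to make that concrete, shown here only as an assumed illustration, is a per-document tolerance: the server reports a count only after it has drifted more than `slack` from the value the coordinator last received. The class `Server` and the parameter `slack` are hypothetical.

```python
from typing import Dict, Optional, Tuple

class Server:
    """Counts local document accesses and reports a count to the coordinator
    only when it has drifted more than `slack` from the last reported value."""

    def __init__(self, slack: int) -> None:
        self.slack = slack                  # hypothetical tolerance parameter
        self.counts: Dict[str, int] = {}    # exact local counts
        self.reported: Dict[str, int] = {}  # values the coordinator holds

    def hit(self, doc: str) -> Optional[Tuple[str, int]]:
        self.counts[doc] = self.counts.get(doc, 0) + 1
        # Counts only grow, so the drift is never negative.
        if self.counts[doc] - self.reported.get(doc, 0) > self.slack:
            self.reported[doc] = self.counts[doc]
            return doc, self.counts[doc]    # message sent to the coordinator
        return None                         # nothing sent: saves communication
```

The coordinator's view of any document lags the true count by at most `slack` per server. Choosing the tolerance so that a top-k answer cannot change unnoticed is the hard part and is not shown here.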

Slide 8: Spatial Query Processing
[Figure: V.C. players and a search engine exchanging user profiles, selected channels, and recommended channels through a vote mechanism]
- Continuous queries for moving objects in high-dimensional space
  - Range search
  - KNN search

Slide 9: Problem Definition
- We are given a set of objects whose positions lie in an N-dimensional space (N > 20). The set of objects is highly dynamic: each object can move in an unrestricted fashion, i.e., we do not assume any pattern of motion.
- Continuously monitor the results of each query point for:
  - Range queries
  - KNN queries

Slide 10: Main Difficulties
- Heavy communication cost
  - An object sends an update only when the results of some queries might change (safe region [SIGMOD05]).
- Incremental update
  - Efficiently maintain the current results.
- Multiple continuous queries
  - Decide the quarantine area for each query.
- Mixed types of queries
  - Support both range queries and KNN queries.
[Figure: overlapping regions of queries Q1 and Q2]

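Slide 11: Range Query
- Query Q: center (x, y), radius r.
- For a cell C, let min = min dis(query, cell) and max = max dis(query, cell), the minimum and maximum distances between the query point and the cell. Each cell falls into one of three classes:
  - A: max < r (the cell lies entirely inside the range)
  - B: min ≤ r ≤ max (the cell partially overlaps the range)
  - C: min > r (the cell lies entirely outside the range)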
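
The three classes follow directly from the two distances. A minimal sketch, assuming cells are axis-aligned boxes given by their lower and upper corners `lo` and `hi` and that distance is Euclidean; `min_max_dist` and `classify` are hypothetical names. The same code works for any dimensionality N, matching the high-dimensional setting.

```python
import math
from typing import Sequence, Tuple

def min_max_dist(q: Sequence[float], lo: Sequence[float],
                 hi: Sequence[float]) -> Tuple[float, float]:
    """Minimum and maximum Euclidean distance from point q to box [lo, hi]."""
    # Per-axis gap to the nearest face (0 if q lies within the slab).
    near = [max(l - x, 0.0, x - h) for x, l, h in zip(q, lo, hi)]
    # Per-axis distance to the farthest face.
    far = [max(abs(x - l), abs(x - h)) for x, l, h in zip(q, lo, hi)]
    return (math.sqrt(sum(d * d for d in near)),
            math.sqrt(sum(d * d for d in far)))

def classify(q: Sequence[float], r: float,
             lo: Sequence[float], hi: Sequence[float]) -> str:
    dmin, dmax = min_max_dist(q, lo, hi)
    if dmax < r:
        return "A"  # whole cell inside the range
    if dmin > r:
        return "C"  # whole cell outside the range
    return "B"      # min <= r <= max: partial overlap
```

Only objects in class-B cells need an individual distance test; class A and class C cells decide membership for all of their objects at once.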

Slide 12: Range Query (Cont.)
- Moving query (MQ): how do we maintain the result for an MQ?

Slide 13: Range Query (Cont.)
- When to update? Each cell carries one class flag (A, B, or C) per query Q1, Q2, Q3:
  - A A A: no update and no recalculation
  - A A B: update and recalculation for some queries
  - A A C: no update and no recalculation
- We only need to consider objects marked with B (flag = 0/1).
[Figure: client and server, each keeping the flags for Q1, Q2, Q3]
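
A minimal client-side sketch of this rule. It assumes the object knows the class of its current cell for each query and remembers the membership bit it last reported (read here, as an assumption, as the "flag = 0/1"). Classes A and C fix membership without a distance test; only class B needs one. `membership` and `queries_to_report` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]

def membership(cls: str, center: Point, r: float, pos: Point) -> bool:
    if cls == "A":
        return True   # cell entirely inside: no distance test needed
    if cls == "C":
        return False  # cell entirely outside: no distance test needed
    return math.dist(center, pos) <= r  # class B: exact test

def queries_to_report(new_pos: Point, cell_classes: Dict[str, str],
                      queries: Dict[str, Tuple[Point, float]],
                      last_reported: Dict[str, bool]) -> List[str]:
    """Return the queries whose inside/outside bit changed, i.e. the only
    cases in which the object must send an update to the server."""
    changed = []
    for qid, cls in cell_classes.items():
        center, r = queries[qid]
        if membership(cls, center, r, new_pos) != last_reported[qid]:
            changed.append(qid)
    return changed
```

If the returned list is empty, the object stays silent, which is the communication saving the slides aim for.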

Slide 14: Range Query (Cont.)
- For a range query Q:
  - Result list (e.g., O3, O5, O7)
  - Covered cells, grouped by class A and B
- For a cell C (e.g., C2):
  - Affected queries, grouped by class: A (e.g., Q2, Q4, Q7) and B (e.g., Q3, Q6, Q9)
[Figure: query motion]
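
The two lists on this slide form a two-way index between queries and cells. A minimal sketch of one way to hold it, assuming cell and query identifiers are strings; `RangeQueryIndex` and its methods are hypothetical. Query motion is handled by removing the query's old cell entries and inserting the new ones.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set

class RangeQueryIndex:
    """Two-way index between range queries and grid cells, split by class."""

    def __init__(self) -> None:
        self.results: DefaultDict[str, Set[str]] = defaultdict(set)  # query -> objects
        self.covered: Dict[str, Dict[str, Set[str]]] = {}  # query -> class -> cells
        self.affected: DefaultDict[str, Dict[str, Set[str]]] = defaultdict(
            lambda: {"A": set(), "B": set()})               # cell -> class -> queries

    def register(self, qid: str, cell_classes: Dict[str, str]) -> None:
        # cell_classes maps cell id -> "A" or "B"; class-C cells are not stored.
        self.covered[qid] = {"A": set(), "B": set()}
        for cell, cls in cell_classes.items():
            self.covered[qid][cls].add(cell)
            self.affected[cell][cls].add(qid)

    def unregister(self, qid: str) -> None:
        for cls, cells in self.covered.pop(qid, {}).items():
            for cell in cells:
                self.affected[cell][cls].discard(qid)

    def move(self, qid: str, new_cell_classes: Dict[str, str]) -> None:
        # Query motion: replace the cell entries; the result list in
        # self.results must be refreshed separately.
        self.unregister(qid)
        self.register(qid, new_cell_classes)

    def queries_to_check(self, cell: str) -> Set[str]:
        # Only class-B queries need an exact test for objects in this cell.
        return set(self.affected[cell]["B"])
```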

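Slide 15: KNN Query
- Query Q: center (x, y), k = 3.
- On an object update, the current result either only needs its order updated, or must be re-computed.
[Figure: object updates labeled "update the order" and "re-computation"]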
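
The figure only names the two outcomes; the conditions below are an assumed reading. If the moved object ends up no farther than the current k-th neighbor distance d_max, the result stays valid after reordering; if a current neighbor moves beyond d_max, some unseen object may now be closer, so the result is re-computed. `handle_update` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

Entry = Tuple[float, str]  # (distance to the query, object id)

def handle_update(knn: List[Entry], k: int, obj: str,
                  new_dist: float) -> Tuple[Optional[List[Entry]], str]:
    """Classify one object update against a current k-NN result.

    Assumes `knn` holds exactly k entries sorted by distance and that every
    object outside it is at least as far away as the current k-th neighbor.
    """
    d_max = knn[-1][0]
    was_in = any(o == obj for _, o in knn)
    if new_dist <= d_max:
        # Still (or newly) within d_max: only the order of the result changes.
        updated = sorted([e for e in knn if e[1] != obj] + [(new_dist, obj)])
        return updated[:k], "update the order"
    if was_in:
        # A neighbor moved beyond d_max: an unseen object may now be closer.
        return None, "re-computation"
    return knn, "no change"
```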

Slide 16: KNN Query (Cont.)
- Query Q: center (x, y), k = 3.
- Query Q': center (x', y'), radius r, with r = d'_max.
[Figure: the distance d'_max]

Slide 17: KNN Query (Cont.)
- Query Q: center (x, y), k = 3.
- Query Q': center (x', y'), radius r, with r = d_max + d_query.
[Figure: the distances d_max and d_query]
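
Both radii turn the KNN query at the new position Q' into a range query that is guaranteed to contain the new answer. A minimal sketch under assumed readings: d'_max is the distance from Q' to the farthest of the previous k neighbors, d_max is the previous k-th neighbor distance, d_query is how far the query moved, and objects stay still while the query moves. Because the k old neighbors lie within d'_max of Q', the new k-th neighbor is no farther; by the triangle inequality d'_max ≤ d_max + d_query, so the cheaper bound is also safe. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def knn(points: Sequence[Point], q: Point, k: int) -> List[Point]:
    return sorted(points, key=lambda p: math.dist(p, q))[:k]

def radius_exact(prev_nn: Sequence[Point], q_new: Point) -> float:
    # r = d'_max: distance from the new query point to the farthest old neighbor.
    return max(math.dist(p, q_new) for p in prev_nn)

def radius_bound(prev_nn: Sequence[Point], q_old: Point, q_new: Point) -> float:
    # r = d_max + d_query: triangle-inequality bound that needs no new
    # distances to the old neighbors.
    d_max = max(math.dist(p, q_old) for p in prev_nn)
    d_query = math.dist(q_old, q_new)
    return d_max + d_query

def knn_after_move(points: Sequence[Point], q_new: Point, k: int,
                   r: float) -> List[Point]:
    # Range search with radius r, then keep the k closest candidates.
    candidates = [p for p in points if math.dist(p, q_new) <= r]
    return knn(candidates, q_new, k)
```

The exact radius prunes more candidates, while the bound avoids recomputing distances to the old neighbors.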

Slide 18: KNN Query (Cont.)
- Query Q: center (x, y), k = 3.
- Query Q': center (x', y'), radius r, with r = d_max + d_cell.
[Figure: the distances d_max and d_cell]

Slide 19: Tree Pattern Mining
- As the trees stream in, find the subtrees that occur more than θ·N times, where N is the number of trees received so far and 0 ≤ θ ≤ 1.
[Figure: a stream of trees T1, T2, T3 fed to STMer, which outputs the frequent tree patterns]
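
A deliberately simplified sketch of the θ·N test, not STMer. It only counts one-edge subtrees, i.e. (parent label, child label) pairs, keeps exact counts for every pattern, and counts each pattern at most once per tree, so a count is the number of trees containing the pattern. Trees are assumed to be nested `(label, children)` tuples; `edge_patterns` and `FrequentEdgeMiner` are hypothetical names. A real stream miner would enumerate larger subtrees and bound its memory.

```python
from collections import Counter
from typing import List, Set, Tuple

Tree = Tuple[str, List["Tree"]]  # (label, children)

def edge_patterns(tree: Tree) -> Set[Tuple[str, str]]:
    """All distinct (parent label, child label) pairs: the one-edge subtrees."""
    label, children = tree
    found: Set[Tuple[str, str]] = set()
    for child in children:
        found.add((label, child[0]))
        found |= edge_patterns(child)
    return found

class FrequentEdgeMiner:
    def __init__(self, theta: float) -> None:
        assert 0.0 <= theta <= 1.0
        self.theta = theta
        self.n = 0                       # N: number of trees received so far
        self.support: Counter = Counter()

    def add(self, tree: Tree) -> None:
        self.n += 1
        # A set, so each pattern is counted at most once per tree.
        self.support.update(edge_patterns(tree))

    def frequent(self) -> Set[Tuple[str, str]]:
        # Patterns occurring in more than theta * N trees.
        return {p for p, c in self.support.items() if c > self.theta * self.n}
```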

Slide 20: Closed Tree Pattern Mining
- Mining closed frequent subtrees over data streams
  - A subtree is closed if none of its proper supertrees has the same support as it does.
[Figure: example trees over labels A, B, C, D, their frequent subtrees, and the closed ones among them]
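
The closedness test can be applied to the frequent patterns alone, because any supertree with the same support as a frequent subtree is itself frequent. A minimal sketch, assuming each pattern is represented by its set of (parent label, child label) edges and that labels are distinct within a pattern, so that a proper edge superset stands in for a proper supertree. `closed_patterns` is a hypothetical name.

```python
from typing import Dict, FrozenSet, Tuple

Pattern = FrozenSet[Tuple[str, str]]  # a subtree as its set of (parent, child) edges

def closed_patterns(support: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Keep the patterns that have no proper super-pattern with equal support."""
    closed = {}
    for p, s in support.items():
        # p < q is a proper-subset test on frozensets.
        if not any(p < q and support[q] == s for q in support):
            closed[p] = s
    return closed
```

For example, if A-B has support 3 and A-B-C also has support 3, A-B is not closed, while B-C with support 4 is.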