Adaptive Mining Techniques for Data Streams using Algorithm Output Granularity. Mohamed Medhat Gaber, Shonali Krishnaswamy, Arkady Zaslavsky. In Proceedings of The Australasian Data Mining Workshop (AusDM 2003).

Similar presentations
Florida International University COP 4770 Introduction of Weka.

DECISION TREES. Decision trees  One possible representation for hypotheses.
Mining High-Speed Data Streams Presented by: Tyler J. Sawyer UVM Spring CS 332 Data Mining Pedro Domingos Geoff Hulten Sixth ACM SIGKDD International.
Third International Workshop on Knowledge Discovery from Data Streams, 2006 Classification of Changes in Evolving Data Streams using Online Clustering.
Data Mining Classification: Alternative Techniques
Machine Learning: Connectionist McCulloch-Pitts Neuron Perceptrons Multilayer Networks Support Vector Machines Feedback Networks Hopfield Networks.
An Approach to Evaluate Data Trustworthiness Based on Data Provenance Department of Computer Science Purdue University.
Machine Learning Neural Networks
CES 514 – Data Mining Lecture 8 classification (contd…)
Basic Data Mining Techniques Chapter Decision Trees.
Basic Data Mining Techniques
Incorporating Advice into Agents that Learn from Reinforcement Presented by Alp Sardağ.
Modeling Cross-Episodic Migration of Memory Using Neural Networks by, Adam Britt Definitions: Episodic Memory – Memory of a specific event, combination.
Recommender systems Ram Akella February 23, 2011 Lecture 6b, i290 & 280I University of California at Berkeley Silicon Valley Center/SC.
1 Nearest Neighbor Learning Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AI.
Evaluation of MineSet 3.0 By Rajesh Rathinasabapathi S Peer Mohamed Raja Guided By Dr. Li Yang.
Recommender systems Ram Akella November 26 th 2008.
Lecture14: Association Rules
Visualisation of Cluster Dynamics and Change Detection in Ubiquitous Data Stream Mining Authors Brett Gillick, Mohamed Medhat Gaber, Shonali Krishnaswamy,
Machine Learning: Ensemble Methods
Evaluation of Results (classifiers, and beyond) Biplav Srivastava Sources: [Witten&Frank00] Witten, I.H. and Frank, E. Data Mining - Practical Machine.
Chapter 5 Data mining : A Closer Look.
Discriminant Analysis Testing latent variables as predictors of groups.
DATA MINING AND MACHINE LEARNING Addison Euhus and Dan Weinberg.
Radial Basis Function (RBF) Networks
Evaluating Performance for Data Mining Techniques
Overview of Distributed Data Mining Xiaoling Wang March 11, 2003.
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.
Radial Basis Function Networks
Distributed Networks & Systems Lab. Introduction Collaborative filtering Characteristics and challenges Memory-based CF Model-based CF Hybrid CF Recent.
Inductive learning Simplest form: learn a function from examples
Department of Computer Science, University of Waikato, New Zealand Geoffrey Holmes, Bernhard Pfahringer and Richard Kirkby Traditional machine learning.
Introduction to Data Mining Group Members: Karim C. El-Khazen Pascal Suria Lin Gui Philsou Lee Xiaoting Niu.
Mobile Data Mining for Intelligent Healthcare Support By: Pari Delir Haghighi, Arkady Zaslavsky, Shonali Krishnaswamy, Mohamed Medhat.
1 Verifying and Mining Frequent Patterns from Large Windows ICDE2008 Barzan Mozafari, Hetal Thakkar, Carlo Zaniolo Date: 2008/9/25 Speaker: Li, HueiJyun.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
NEURAL NETWORKS FOR DATA MINING
Introduction to machine learning and data mining 1 iCSC2014, Juan López González, University of Oviedo Introduction to machine learning Juan López González.
Incident Threading for News Passages (CIKM 09) Speaker: Yi-lin,Hsu Advisor: Dr. Koh, Jia-ling. Date:2010/06/14.
Automatic Gain Control Response Delay and Acquisition in Direct- Sequence Packet Radio Communications Sure 2007 Stephanie Gramc Dr. Noneaker.
1 Chapter 11 Neural Networks. 2 Chapter 11 Contents (1) l Biological Neurons l Artificial Neurons l Perceptrons l Multilayer Neural Networks l Backpropagation.
Stefan Mutter, Mark Hall, Eibe Frank University of Freiburg, Germany University of Waikato, New Zealand The 17th Australian Joint Conference on Artificial.
Chapter 12 Query Processing (1) Yonsei University 2nd Semester, 2013 Sanghyun Park.
Lightweight Mining Techniques (Sensors A–E; Time Frame: 10, Time Threshold: 20).
K-Means Algorithm Each cluster is represented by the mean value of the objects in the cluster Input: set of objects (n), no of clusters (k) Output:
Sampling Design and Analysis MTH 494 Lecture-22 Ossam Chohan Assistant Professor CIIT Abbottabad.
1 Maintaining Knowledge-Bases of Navigational Patterns from Streams of Navigational Sequences Ajumobi Udechukwu, Ken Barker, Reda Alhajj Proceedings of.
Random Forests Ujjwol Subedi. Introduction What is Random Tree? ◦ Is a tree constructed randomly from a set of possible trees having K random features.
Data Mining and Decision Support
Data Mining By Farzana Forhad CS 157B. Agenda Decision Tree and ID3 Rough Set Theory Clustering.
Clustering Algorithms Sunida Ratanothayanon. What is Clustering?
Bab 5 Classification: Alternative Techniques Part 4 Artificial Neural Networks Based Classifer.
Eick: kNN kNN: A Non-parametric Classification and Prediction Technique Goals of this set of transparencies: 1.Introduce kNN---a popular non-parameric.
Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 3 Basic Data Mining Techniques Jason C. H. Chen, Ph.D. Professor of MIS School of Business.
Intro. ANN & Fuzzy Systems Lecture 20 Clustering (1)
Data Mining Practical Machine Learning Tools and Techniques Chapter 6.5: Instance-based Learning Rodney Nielsen Many / most of these slides were adapted.
Segmenting Popular Music Sentence by Sentence Wan-chi Lee.
Mining Data Streams with Periodically changing Distributions Yingying Tao, Tamer Ozsu CIKM’09 Supervisor Dr Koh Speaker Nonhlanhla Shongwe April 26,
Data Mining: Concepts and Techniques
Data Transformation: Normalization
Clustering.
Image Processing for Physical Data
A Unifying View on Instance Selection
Prepared by: Mahmoud Rafeek Al-Farra
Data Transformations targeted at minimizing experimental variance
Maintaining Frequent Itemsets over High-Speed Data Streams
Presentation transcript:

Adaptive Mining Techniques for Data Streams using Algorithm Output Granularity. Mohamed Medhat Gaber, Shonali Krishnaswamy, Arkady Zaslavsky. In Proceedings of The Australasian Data Mining Workshop (AusDM 2003). Advisor: Jia-Ling Koh. Speaker: Chun-Wei Hsieh. 08/26/2004

The General Processing Model

The Challenges: 1) The memory requirements of storing an unbounded stream exceed the available resources. 2) Mining algorithms typically require several passes over the data, while a stream allows only one. 3) Continuously transferring the stream or its results over limited bandwidth is costly.

The Strategies 1) Input data rate adaptation: sampling, filtering, aggregation
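As a toy illustration of the first of these techniques, random sampling simply drops a fraction of arriving items to reduce the effective input rate (filtering would instead drop items by a predicate, and aggregation would summarize them over windows). The function name and fixed seed below are illustrative choices, not from the slides:

```python
import random

def sample_stream(stream, keep_prob):
    """Input-rate adaptation by random sampling: keep each arriving
    item independently with probability `keep_prob`, preserving the
    original arrival order of the kept items."""
    rng = random.Random(0)  # fixed seed only for reproducibility
    return [x for x in stream if rng.random() < keep_prob]
```

With `keep_prob = 0.5` the downstream mining algorithm sees roughly half the arrival rate, at the cost of answering over a sample rather than the full stream.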

The Strategies (cont.) 2) Output concept level. 3) Approximate algorithms. 4) On-board analysis. 5) Algorithm output granularity.

Definitions. Algorithm threshold: a controllable parameter set according to a) the available memory, b) the remaining time to fill the available memory, and c) the data stream rate. Output granularity: the amount of generated results that is acceptable according to a specified accuracy measure. Time threshold: the time required to generate the results before any incremental integration, according to some accuracy measure.

Algorithm output granularity 1) Determine the time threshold and the algorithm output granularity. 2) According to the data rate, calculate the algorithm output rate and the algorithm threshold. 3) Mine the incoming stream using the calculated algorithm threshold. 4) Adjust the threshold after each time frame to adapt to changes in the data rate, using linear regression. 5) Repeat the last two steps until the time interval threshold elapses. 6) Perform knowledge integration of the results.
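The adaptation loop in steps 2–5 can be sketched as follows. This is a minimal illustration, assuming a simple rule in which the mining threshold is raised when the observed data rate trends upward (coarser output, fewer stored results) and lowered when it trends downward. The slides only state that linear regression over the data rate drives the adjustment, so the `sensitivity` factor and the update rule itself are illustrative assumptions:

```python
from typing import List

def linreg_slope(xs: List[float], ys: List[float]) -> float:
    # Ordinary least-squares slope of ys against xs.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    den = sum((x - mx) ** 2 for x in xs)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / den if den else 0.0

def aog_adjust(rate_samples: List[float],
               base_threshold: float = 1.0,
               sensitivity: float = 0.1) -> List[float]:
    """After each time frame, fit a regression line to the data-rate
    history and nudge the algorithm threshold along its slope."""
    thresholds = [base_threshold]
    for frame_end in range(2, len(rate_samples) + 1):
        xs = [float(i) for i in range(frame_end)]
        slope = linreg_slope(xs, [float(r) for r in rate_samples[:frame_end]])
        # Rising rate -> larger threshold (coarser granularity); never negative.
        thresholds.append(max(0.0, thresholds[-1] + sensitivity * slope))
    return thresholds
```

A steadily rising data rate thus drives the threshold steadily upward, trading result resolution for a survivable memory footprint.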

Light-weight Clustering Algorithm
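The slide's figure is not reproduced here. As a rough sketch of how a one-pass light-weight clustering step of this kind can work, assuming the common AOG formulation (keep a list of weighted centers; merge a new point into the nearest center when it is within a distance threshold, otherwise open a new center; the threshold is what the AOG loop adapts), with one-dimensional points for brevity:

```python
def lwc(points, distance_threshold):
    """One-pass light-weight clustering sketch (hypothetical helper).

    Each entry of `centers` is [center, weight]; merging replaces the
    center with the weighted mean of the old center and the new point.
    """
    centers = []
    for p in points:
        if centers:
            i, d = min(((i, abs(p - c)) for i, (c, _w) in enumerate(centers)),
                       key=lambda t: t[1])
            if d <= distance_threshold:
                c, w = centers[i]
                centers[i] = [(c * w + p) / (w + 1), w + 1]
                continue
        # No center is close enough: the point starts a new cluster.
        centers.append([p, 1])
    return centers
```

A larger `distance_threshold` merges more aggressively and stores fewer centers, which is exactly the memory/accuracy trade-off the algorithm output granularity controls.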

Light-weight Classification Algorithm 1) Data stream items arrive one by one; each item contains values for attributes a1, a2, …, an and a class category. 2) a) Measure the distance between the new item and the stored items. b) If the distance is less than a threshold, store the average of the two items and increase the weight of this averaged item by 1. c) After a time threshold for training, we obtain a sample result like the one in Table 1.

Light-weight Classification Algorithm 3) Using the table above, choose the K nearest items for the classification process. 4) Return the majority class category, taking into account the calculated weights of the K items.
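Steps 1–4 amount to a merge-based training pass followed by a weighted K-nearest-neighbour vote. Below is a minimal sketch with a single numeric attribute; whether merging should be restricted to items of the same class is an assumption here (the slides do not say), as are the function names:

```python
def lwclass_train(items, threshold):
    """Build the weighted table of steps 1-2.

    `items` is a sequence of (value, class) pairs; each table row is
    [value, class, weight]. A new item within `threshold` of a stored
    row of the same class is averaged into it and the weight grows.
    """
    table = []
    for v, cls in items:
        for row in table:
            if row[1] == cls and abs(row[0] - v) <= threshold:
                row[0] = (row[0] * row[2] + v) / (row[2] + 1)
                row[2] += 1
                break
        else:
            table.append([v, cls, 1])
    return table

def lwclass_predict(table, v, k):
    """Steps 3-4: weighted vote over the K nearest stored rows."""
    nearest = sorted(table, key=lambda r: abs(r[0] - v))[:k]
    votes = {}
    for _val, cls, w in nearest:
        votes[cls] = votes.get(cls, 0) + w
    return max(votes, key=votes.get)
```

Because each row's weight counts how many training items it absorbed, the vote in step 4 favours classes backed by many merged observations, not just by nearby rows.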

Light-weight Classification Algorithm

Experiment 1

Experiment 2

Experiment 3

Experiment 4

Experiment 5