A distributed method for mining association rules

Pham Nguyen Anh Huy*, Department of Information Technology, Vietnam National University, Ho Chi Minh City
Presented by Ho Tu Bao, School of Knowledge Science, Japan Advanced Institute of Science and Technology (JAIST)
*Work done during the author's three-month JSPS fellowship at JAIST.

Outline
- Introduction
- Background
- A distributed Apriori algorithm using mobile agents
- Experimental evaluation
- Conclusion

Introduction
- Association analysis is a new and attractive research area in data mining.
- The Apriori algorithm (R. Agrawal and R. Srikant, IBM, 1994) is a key technique for association analysis.
- Although the Apriori principle considerably reduces the search space, the algorithm still requires heavy computation, particularly for large databases.
- This research proposes a distributed version of the Apriori algorithm using mobile agents. The experiments show that computation time can be reduced by spreading the work over several computers in a distributed computing environment.

Outline
- Introduction
- Background
  - Association rules and the Apriori algorithm
  - Mobile agents and Aglets
- A distributed Apriori algorithm using mobile agents
- Experimental evaluation
- Conclusion

Association rules: market basket analysis
Market basket analysis studies customer buying habits by finding associations between the different items that customers place in their “shopping baskets”, in the form X ⇒ Y, where X and Y are sets of items.
Items: I = {I1 = beer, I2 = cake, I3 = onigiri}
Transactional database:
TID1: {I1, I2, I3}
TID2: {I1, I2}
TID3: {I2, I3}
TID4: {I2}
TID5: {I1, I2}
An association rule: {I1} ⇒ {I3}. How often do people buy beer and onigiri together?

Rule measures: support and confidence
For an association rule X ⇒ Y:
- support s = probability that a transaction contains both X and Y;
- confidence c = conditional probability that a transaction containing X also contains Y.
Examples: A ⇒ C (s = 50%, c = 66.6%); C ⇒ A (s = 50%, c = 100%).
[Figure: Venn diagram of customers who buy beer, customers who buy onigiri, and customers who buy both]
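As a minimal sketch of the two measures, the Python below computes support and confidence on the five-transaction basket from the market-basket slide. The helper names `support` and `confidence` are illustrative, not from the paper. The A ⇒ C figures quoted on the slide come from a different table: a support of exactly 50% cannot occur with five transactions.

```python
# Transactions from the market-basket slide (I1 = beer, I2 = cake, I3 = onigiri).
TRANSACTIONS = [
    {"I1", "I2", "I3"},  # TID1
    {"I1", "I2"},        # TID2
    {"I2", "I3"},        # TID3
    {"I2"},              # TID4
    {"I1", "I2"},        # TID5
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Conditional probability that a transaction containing lhs also contains rhs."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


if __name__ == "__main__":
    beer, onigiri = {"I1"}, {"I3"}
    print(f"support({{I1, I3}}) = {support(beer | onigiri, TRANSACTIONS):.2f}")
    print(f"confidence(I1 => I3) = {confidence(beer, onigiri, TRANSACTIONS):.2f}")
    print(f"confidence(I3 => I1) = {confidence(onigiri, beer, TRANSACTIONS):.2f}")
```

So beer and onigiri are bought together in one basket out of five (20%); one in three beer buyers also buys onigiri, and one in two onigiri buyers also buys beer.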

Association mining: Apriori algorithm
Association mining is composed of two steps (Agrawal, R., 1993):
1. Find all frequent itemsets: by definition, each of these itemsets occurs at least as frequently as a predetermined minimum support count.
2. Generate strong association rules from the frequent itemsets: by definition, these rules must satisfy both the minimum support and the minimum confidence.
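The second step can be sketched as follows, assuming the support counts of all frequent itemsets are already known. The counts below are copied from the nine-transaction worked example in this presentation; `strong_rules` and the 70% confidence threshold are illustrative choices, not from the paper. Every non-empty proper subset of a frequent itemset is tried as the left-hand side, and the rule is kept if its confidence reaches the threshold.

```python
from itertools import combinations

# Support counts from the nine-transaction worked example (itemsets over I1, I2, I5).
SUPPORT_COUNT = {
    frozenset({"I1"}): 6,
    frozenset({"I2"}): 7,
    frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4,
    frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2,
    frozenset({"I1", "I2", "I5"}): 2,
}


def strong_rules(itemset, support_count, min_conf):
    """Yield (lhs, rhs, confidence) for each rule lhs => rhs built from `itemset`
    whose confidence reaches `min_conf`. Every rule has the itemset's support,
    so minimum support already holds when `itemset` is frequent."""
    itemset = frozenset(itemset)
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            conf = support_count[itemset] / support_count[lhs]
            if conf >= min_conf:
                yield lhs, itemset - lhs, conf


if __name__ == "__main__":
    for lhs, rhs, conf in strong_rules({"I1", "I2", "I5"}, SUPPORT_COUNT, 0.7):
        print(sorted(lhs), "=>", sorted(rhs), f"confidence = {conf:.0%}")
```

Only the rules with I5 on the left-hand side reach 70% here, because every itemset containing I5 has the same support count (2).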

Association mining: Apriori principle
Min. support 50%, min. confidence 50%. For the rule A ⇒ C:
support = support({A, C}) = 50%
confidence = support({A, C}) / support({A}) = 66.6%
The Apriori principle: any subset of a frequent itemset must be frequent (equivalently, if an itemset is not frequent, none of its supersets is frequent).
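A minimal sketch of how the Apriori principle is used in practice: before counting a candidate k-itemset, check whether any of its (k-1)-subsets is missing from the frequent (k-1)-itemsets; if so, the candidate can be discarded without scanning the database. The L2 set is taken from the worked example in this presentation; the function name is illustrative.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_prev):
    """Apriori principle: a k-itemset can only be frequent if all of its
    (k-1)-subsets are frequent. Returns True if some (k-1)-subset is missing
    from `frequent_prev`, i.e. the candidate can be pruned without counting."""
    k = len(candidate)
    return any(frozenset(sub) not in frequent_prev
               for sub in combinations(candidate, k - 1))


# Frequent 2-itemsets L2 from the worked example.
L2 = {frozenset(s) for s in [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
                             ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]}

if __name__ == "__main__":
    print(has_infrequent_subset(("I1", "I2", "I3"), L2))  # False: keep the candidate
    print(has_infrequent_subset(("I1", "I3", "I5"), L2))  # True: {I3, I5} is not frequent
```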

The Apriori algorithm: finding frequent itemsets using candidate generation
C1 → … → Li-1 → Ci → Li → Ci+1 → … → Lk
- Find the frequent itemsets: the sets of items whose support is at least the minimum support.
- Any subset of a frequent itemset must also be a frequent itemset; e.g., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets.
- Iteratively find the frequent k-itemsets Lk (itemsets of cardinality k), for k = 1, 2, …, from the candidate itemsets Ck (Lk ⊆ Ck).
- Use the frequent itemsets to generate association rules.
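The loop C1 → L1 → … → Lk can be sketched in Python as below. This is a compact, unoptimized version assuming in-memory transactions; the join step merges two sorted (k-1)-itemsets that share their first k-2 items, as in the standard Apriori-gen, and the names `apriori` and `EXAMPLE_D` are illustrative. `EXAMPLE_D` is the nine-transaction database from the worked example.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_sup_count):
    """Level-wise Apriori: return {frozenset(itemset): support_count} for every
    itemset whose support count is at least `min_sup_count`."""
    transactions = [frozenset(t) for t in transactions]
    # C1 -> L1: one scan of the database counts every single item.
    item_counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): n for i, n in item_counts.items() if n >= min_sup_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = sorted(tuple(sorted(s)) for s in current)
        candidates = set()
        for a, b in combinations(prev, 2):
            # Join step: merge two (k-1)-itemsets that share their first k-2 items.
            if a[:-1] == b[:-1]:
                cand = frozenset(a + (b[-1],))
                # Prune step (Apriori principle): all (k-1)-subsets must be frequent.
                if all(frozenset(s) in current for s in combinations(cand, k - 1)):
                    candidates.add(cand)
        # One scan of D counts the surviving candidates Ck; Lk keeps the frequent ones.
        cand_counts = Counter(c for t in transactions for c in candidates if c <= t)
        current = {c: n for c, n in cand_counts.items() if n >= min_sup_count}
        frequent.update(current)
        k += 1
    return frequent


EXAMPLE_D = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]

if __name__ == "__main__":
    for itemset, count in sorted(apriori(EXAMPLE_D, 2).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

On this data the function returns the 13 itemsets of L1, L2 and L3 shown in the worked example.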

Example (min_sup_count = 2)
Transactional data D:
TID    List of item IDs
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3
C1 (scan D for the count of each candidate):
Itemset   Sup. count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2
L1 (compare each candidate's support count with the minimum support count): identical to C1, since every count is at least 2.
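The first database scan amounts to a simple count per item. A minimal sketch in Python, using the transaction table above; the names `D`, `C1` and `L1` mirror the slide, while the dictionary layout is an assumption.

```python
from collections import Counter

# Transactional database D from the worked example.
D = {
    "T100": ["I1", "I2", "I5"],
    "T200": ["I2", "I4"],
    "T300": ["I2", "I3"],
    "T400": ["I1", "I2", "I4"],
    "T500": ["I1", "I3"],
    "T600": ["I2", "I3"],
    "T700": ["I1", "I3"],
    "T800": ["I1", "I2", "I3", "I5"],
    "T900": ["I1", "I2", "I3"],
}
MIN_SUP_COUNT = 2

# One scan of D gives the support count of every candidate 1-itemset (C1).
C1 = Counter(item for items in D.values() for item in items)
# L1 keeps the candidates whose count reaches the minimum support count.
L1 = {item: count for item, count in sorted(C1.items()) if count >= MIN_SUP_COUNT}

if __name__ == "__main__":
    print(L1)  # {'I1': 6, 'I2': 7, 'I3': 6, 'I4': 2, 'I5': 2}
```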

Example (min_sup_count = 2)
C2 (generate candidates from L1 using the Apriori principle, then scan D for the count of each candidate):
Itemset     Sup. count
{I1, I2}    4
{I1, I3}    4
{I1, I4}    1
{I1, I5}    2
{I2, I3}    4
{I2, I4}    2
{I2, I5}    2
{I3, I4}    0
{I3, I5}    1
{I4, I5}    0
L2 (compare each candidate's support count with the minimum support count):
Itemset     Sup. count
{I1, I2}    4
{I1, I3}    4
{I1, I5}    2
{I2, I3}    4
{I2, I4}    2
{I2, I5}    2
C3 (generate candidates from L2 using the Apriori principle): {I1, I2, I3}, {I1, I2, I5}
Scan D for the count of each candidate, then compare with the minimum support count to obtain L3:
Itemset         Sup. count
{I1, I2, I3}    2
{I1, I2, I5}    2
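The step from L2 to C3 is where the Apriori principle pays off. The sketch below, with an illustrative `apriori_gen` helper, separates the join step (merge two 2-itemsets that share their first item) from the prune step (drop any candidate with an infrequent 2-subset): the join produces six 3-itemsets, and pruning leaves only the two candidates of C3, so only those two need to be counted.

```python
from itertools import combinations


def apriori_gen(frequent_prev):
    """Generate candidate k-itemsets from the frequent (k-1)-itemsets.

    Returns (joined, pruned): the itemsets produced by the join step, and the
    ones that survive the prune step (all (k-1)-subsets frequent)."""
    prev = sorted(tuple(sorted(s)) for s in frequent_prev)
    prev_set = {frozenset(s) for s in prev}
    joined = []
    for a, b in combinations(prev, 2):
        # Join two (k-1)-itemsets only if they agree on their first k-2 items.
        if a[:-1] == b[:-1]:
            joined.append(a + (b[-1],))
    pruned = [c for c in joined
              if all(frozenset(s) in prev_set for s in combinations(c, len(c) - 1))]
    return joined, pruned


# Frequent 2-itemsets L2 from the worked example.
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]

if __name__ == "__main__":
    joined, C3 = apriori_gen(L2)
    print(len(joined), "itemsets after the join step")  # 6 itemsets after the join step
    print("C3 =", C3)  # C3 = [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
```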

Agents and mobile agents
Mobile network agents are programs that:
- can migrate from system to system within a network environment;
- perform some processing at each host;
- decide for themselves when and where to move next.
How does an agent move? It saves its state, transports the saved state to the next system, and resumes execution from the saved state there.
An agent is a computational entity that:
- acts on behalf of other entities in an autonomous fashion;
- performs its actions with some level of proactivity and reactivity;
- exhibits some degree of cooperation.
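The save / transport / resume cycle can be imitated in a single Python process, as a rough sketch only: `pickle` serializes the agent's state, and a dictionary of hypothetical hosts stands in for the network. Unlike an aglet, a pickled object carries only its data, so the class definition must already exist where it is resumed; `pickle` should also never be used on data from untrusted hosts. `CountingAgent`, `migrate`, `tour` and the host names are all hypothetical.

```python
import pickle


class CountingAgent:
    """Hypothetical mobile agent: visits hosts in turn and counts transactions
    that contain one item. All of its state is plain data, so it can be saved,
    shipped to another host, and resumed there."""

    def __init__(self, itinerary, item):
        self.itinerary = list(itinerary)  # hosts still to visit
        self.item = item                  # item the agent is counting
        self.total = 0                    # partial result carried between hosts

    def run(self, local_data):
        # Processing performed at the current host.
        self.total += sum(1 for t in local_data if self.item in t)

    def next_host(self):
        # The agent itself decides where to go next (None when it is done).
        return self.itinerary.pop(0) if self.itinerary else None


def migrate(agent):
    saved = pickle.dumps(agent)  # 1. save state
    # 2. transport: a real system would send `saved` over the network here
    return pickle.loads(saved)   # 3. resume from the saved state on the new host


def tour(agent, hosts):
    """Move the agent through its itinerary, running it at each stop."""
    host = agent.next_host()
    while host is not None:
        agent = migrate(agent)
        agent.run(hosts[host])
        host = agent.next_host()
    return agent


# Hypothetical hosts, each holding its own local transactions.
HOSTS = {
    "host-a": [{"I1", "I2"}, {"I2", "I3"}],
    "host-b": [{"I1", "I3"}, {"I1"}],
}

if __name__ == "__main__":
    agent = tour(CountingAgent(["host-a", "host-b"], "I1"), HOSTS)
    print("I1 appears in", agent.total, "transactions")  # I1 appears in 3 transactions
```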

Distributed Computing using Mobile Programs

Mobile agent tools 

What are Aglets?
Aglets (agile applets) are Java objects that can move from one host on the Internet to another and perform arbitrary operations within the security limits. When an aglet moves, it takes along its program code as well as its data. The Aglets framework is implemented by IBM's Aglets Software Development Kit (ASDK), an environment for programming mobile Internet agents in Java.

Aglets at runtime
- Aglets currently use the Agent Transfer Protocol (ATP) as the default implementation of the communication layer; ATP is modeled after HTTP.
- ATP is used on the Tahiti aglet server.
- The Aglets Server Interface can be used to write applications capable of hosting, receiving, and dispatching aglets.

A distributed Apriori algorithm
Initialization (master):
(1) spawn n slave processes;
(2) divide the database into partitions;
(3) distribute the partitions to the slave processes.
Master process, at each iteration:
- send the global candidate (k-1)-itemsets Ck-1 to each slave process;
- wait for and receive the local supports, then count the global supports for Ck-1;
- compute the frequent (k-1)-itemsets Lk-1 and send clusters of Lk-1 to the slave processes;
- wait for and receive the local candidate k-itemsets from the slave processes;
- unionize the local candidate k-itemsets and prune them to form the global candidate k-itemsets Ck.
Slave processes, at each iteration:
- receive the global candidate (k-1)-itemsets Ck-1;
- count the local supports for Ck-1 and send them to the master process;
- receive frequent (k-1)-itemsets Lk-1 from the master process;
- generate local candidate k-itemsets and send them to the master process.
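The master/slave loop above can be simulated in one Python process. This is a sketch under several assumptions not stated in the paper: slaves are plain objects rather than aglets, the database is split round-robin, and a "cluster" of frequent itemsets is taken to mean a group sharing the same prefix (all items but the last), so joins never need to cross clusters. Global supports are the sums of local supports, and pruning uses the full set of frequent itemsets held by the master. `Slave`, `master` and the method names are hypothetical.

```python
from collections import Counter
from itertools import combinations


class Slave:
    """Hypothetical stand-in for one slave process (an aglet in the paper)."""

    def __init__(self, partition):
        self.partition = [frozenset(t) for t in partition]

    def local_items(self):
        # Candidate 1-itemsets seen in this partition.
        return {frozenset([i]) for t in self.partition for i in t}

    def count_local_supports(self, candidates):
        # Local support of each global candidate, on this partition only.
        return Counter(c for t in self.partition for c in candidates if c <= t)

    def generate_local_candidates(self, cluster):
        # Join step of Apriori-gen inside one cluster: the itemsets share the same
        # prefix, so any two of them join into an itemset one item larger.
        return {a | b for a, b in combinations(cluster, 2)}


def master(database, n_slaves, min_sup_count):
    # (1)-(3): spawn n slaves and give each one a partition (round-robin split).
    slaves = [Slave(database[i::n_slaves]) for i in range(n_slaves)]
    candidates = set().union(*(s.local_items() for s in slaves))
    frequent, k = {}, 1  # here k is the size of the current candidates
    while candidates:
        # Send candidates to every slave; global support = sum of local supports.
        global_support = Counter()
        for s in slaves:
            global_support.update(s.count_local_supports(candidates))
        level = {c: n for c, n in global_support.items() if n >= min_sup_count}
        frequent.update(level)
        # Cluster the frequent itemsets by prefix; hand clusters to slaves round-robin.
        clusters = {}
        for itemset in level:
            clusters.setdefault(tuple(sorted(itemset))[:-1], []).append(itemset)
        local = set()
        for j, cluster in enumerate(clusters.values()):
            local |= slaves[j % n_slaves].generate_local_candidates(cluster)
        # Unionize the local candidates and prune with the Apriori principle.
        candidates = {c for c in local
                      if all(frozenset(s) in level for s in combinations(c, k))}
        k += 1
    return frequent


EXAMPLE_D = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]

if __name__ == "__main__":
    result = master(EXAMPLE_D, n_slaves=3, min_sup_count=2)
    print(len(result), "frequent itemsets")  # 13 frequent itemsets
```

With three slaves, the result matches the 13 frequent itemsets of the centralized worked example, since summing local counts gives exactly the global counts.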

A distributed Apriori algorithm
[Figure: the master and the slaves exchanging itemsets; each slave works on its own database partition DB1, DB2, …, DBn]
- Master: SEND the global candidate (k-1)-itemsets Ck-1.
- Slaves: COUNT and SEND local supports for the global candidate (k-1)-itemsets (counting-support aglets).
- Master: COUNT global supports for Ck-1, then FIND and SEND the frequent (k-1)-itemsets Lk-1.
- Slaves: JOIN and SEND local candidate k-itemsets (Aprio_gen aglet).
- Master: UNIONIZE the local candidate k-itemsets and PRUNE them to form the global candidate k-itemsets Ck.

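Global support count and global candidate itemsets
Let the database be divided into partitions DB1, …, DBn, and let X.Supp_i be the local support count of itemset X in DBi. If X is a candidate itemset, the global support count of X is
$$X.\text{G-Supp} = \sum_{i=1}^{n} X.\text{Supp}_i$$
The set of global candidate k-itemsets GCk is formed from the local candidate k-itemsets, which are generated by Apriori-gen from an ID segment (p, q) of GLk-1. The set of global frequent k-itemsets is then
$$GL_k = \{\, X \in GC_k \mid X.\text{G-Supp} \ge \text{G-Min-Supp} \,\}$$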
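A small worked check of the global support formula, assuming (hypothetically) that the nine-transaction example is split into three partitions of three transactions each: the local counts of {I1, I2} are 1, 1 and 2, so its global support count is 4, the same as in the centralized example. The helper names are illustrative.

```python
from collections import Counter

# Hypothetical split of the nine-transaction example into partitions DB1..DB3.
PARTITIONS = [
    [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}],              # DB1: T100-T300
    [{"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"}],              # DB2: T400-T600
    [{"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"}],  # DB3: T700-T900
]


def local_supports(partition, candidates):
    """X.Supp_i: support count of each candidate X in one partition DB_i."""
    return Counter({x: sum(1 for t in partition if x <= t) for x in candidates})


def global_frequent(partitions, candidates, g_min_supp):
    """GL_k = {X in GC_k | X.G-Supp >= G-Min-Supp}, with X.G-Supp = sum of local counts."""
    g_supp = Counter()
    for p in partitions:
        g_supp.update(local_supports(p, candidates))
    return {x: n for x, n in g_supp.items() if n >= g_min_supp}


GC2 = [frozenset(pair) for pair in [("I1", "I2"), ("I1", "I4"), ("I3", "I4")]]

if __name__ == "__main__":
    print(global_frequent(PARTITIONS, GC2, 2))
```

Only {I1, I2} survives: {I1, I4} has a global count of 1 and {I3, I4} a count of 0.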

Experiments: synthetic datasets
Using synthetic datasets of varying sizes:
Name          |D|     |T|    Size (MB)
D100k.T30     100K    30     3
D100k.T100    100K    100    10
D320k.T150    320K    150    48
|D|: number of transactions; |T|: average number of items per transaction

Experiment environment
Software:
- Database: Oracle server
- Language: Java (Sun JDK 1.3)
- Mobile agents: IBM Aglets
- Protocol: ATP (Agent Transfer Protocol)
- Platform: Windows
Hardware:
- 15 PCs (Pentium 3, 300 MHz, 128 MB RAM) at the Knowledge Science Center, JAIST

Execution time (sec.) with different minimum support thresholds: 35%, 40%, 50%

Execution time with min_sup 35%

Execution time with min_sup 40%

Execution time with min_sup 50%

Rate of execution time
Execution time decreases nearly linearly as the number of slaves increases.

Conclusion
- We proposed a distributed Apriori algorithm for mining association rules.
- The experimental evaluation shows that as the number of slaves increases, the execution time decreases nearly linearly.
- Future work:
  - segment both the master and GLk for support counting;
  - develop incremental algorithms for association analysis using mobile agent (MA) technology.