Laboratory Research and Results Presentation. Content and Knowledge Management Laboratory (B), Data Mining Part. Director: Anthony J. T. Lee. Presenter: Wan-chuen Lin.



2 Outline
Introduction to the basic data mining concepts behind our research topics
Brief description of doctoral research
Topic 1: Mining frequent itemsets with multi-dimensional constraints
Topic 2: Mining the inter-transactional association rules of multi-dimensional interval patterns
Topic 3: Inter-sequence association rules mining
Topic 4: Mining association rules among time-series data

3 Introduction of Data Mining
Data mining is the task of discovering knowledge from large amounts of data. One of the fundamental data mining problems, frequent itemset mining, covers a broad spectrum of mining topics, including association rules, sequential patterns, etc. Frequent itemset mining aims to discover all the itemsets whose supports in the database exceed a user-specified threshold.

4 Introduction of Association Rules
An association rule is of the form X ⇒ Y, where X and Y are both frequent itemsets in the given database and X ∩ Y = ∅. The support of X ⇒ Y is the percentage of transactions in the given database that contain both X and Y, i.e., P(X ∪ Y). The confidence of X ⇒ Y is the percentage of transactions in the given database containing X that also contain Y, i.e., P(Y|X).
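The two measures defined on this slide can be computed directly from a transaction list. The following is a minimal sketch; the function names and the toy transactions are illustrative assumptions, not part of the original slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Estimate P(Y|X) as support(X ∪ Y) / support(X)."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)


# Hypothetical toy database, for illustration only.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
]

# Rule: {diapers} => {beer}
rule_support = support(transactions, {"diapers", "beer"})
rule_confidence = confidence(transactions, {"diapers"}, {"beer"})
```

Here two of the four transactions contain both items, and two of the three transactions containing diapers also contain beer.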

5 Introduction of Sequential Patterns
A sequence is an ordered list of itemsets, denoted by ⟨s_1 s_2 … s_n⟩, where s_j is an itemset. s_j is also called an element of the sequence, and is denoted as (x_1 x_2 … x_m), where x_k is an item. The support of a sequence α in a sequence database is the number of tuples containing α. A sequence α is called a sequential pattern if support(α) ≥ min-support.
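As a rough illustration of these definitions, the sketch below tests whether a tuple contains a sequence and counts support. It assumes the usual reading of "contains": the elements of the pattern are subsets of distinct elements of the tuple, in order. The names `contains` and `sequence_support` and the toy database are hypothetical.

```python
def contains(sequence, pattern):
    """True if each element of `pattern` is a subset of a distinct
    element of `sequence`, with the order preserved."""
    pos = 0
    for element in pattern:
        # Greedily advance to the first remaining element that covers it.
        while pos < len(sequence) and not set(element) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def sequence_support(database, pattern):
    """Number of tuples in the sequence database that contain `pattern`."""
    return sum(1 for seq in database if contains(seq, pattern))


# Hypothetical sequence database: each tuple is a list of itemsets.
database = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}],
    [{"a", "d"}, {"c"}, {"b", "c"}],
    [{"e", "f"}, {"a", "b"}, {"d", "f"}],
]

count = sequence_support(database, [{"a"}, {"c"}])
is_sequential_pattern = count >= 2  # min-support of 2 is an assumption
```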

6 Algorithm for Mining Frequent Itemsets
Apriori: candidate set generation-and-test.
Level-wise: it iteratively generates candidate k-itemsets from the previously found frequent (k-1)-itemsets, and then checks the supports of the candidates to form the frequent k-itemsets.
L_{k-1} → join → C_k → support check → L_k
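The level-wise loop can be sketched in a few lines. This is a simplified illustration of the join, prune, and support-check steps, not the presenter's implementation; `apriori`, `min_count`, and the toy transactions are assumed names and data.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every itemset whose count
    is at least `min_count`."""
    transactions = [frozenset(t) for t in transactions]

    # L_1: frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)

    k = 2
    while level:
        prev = list(level)
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        # Support check: count each surviving candidate against the database.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical toy database.
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
frequent = apriori(transactions, min_count=3)
```

With this data every single item and every pair is frequent, while {A, B, C} appears in only two transactions and is rejected at the support check.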

7 Algorithm for Mining Frequent Itemsets (cont'd)
FP-growth: the method constructs a compressed frequent pattern tree, called an FP-tree. It uses a divide-and-conquer strategy to recursively decompose the mining task into a set of smaller tasks on conditional databases, and concatenates the suffix itemset with the frequent itemsets generated from each conditional FP-tree.
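A compact sketch of this idea follows. It builds an FP-tree, reads each item's conditional pattern base from the prefix paths, and recurses with the growing suffix. The class and function names are illustrative, and the sketch leaves out optimizations such as the single-path shortcut.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, a count, and links to parent/children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, weight) pairs.
    Returns (header table {item: [nodes]}, {item: count})."""
    freq = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            freq[item] += weight
    freq = {i: c for i, c in freq.items() if c >= min_count}

    root = Node(None, None)
    header = defaultdict(list)
    for items, weight in weighted_transactions:
        # Keep frequent items only, most frequent first (ties by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, freq


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Yield (itemset, count) for every frequent itemset."""
    header, freq = build_tree(weighted_transactions, min_count)
    for item, nodes in header.items():
        pattern = suffix | {item}
        yield pattern, freq[item]
        # Conditional pattern base: the prefix path above each node of
        # `item`, weighted by that node's count.
        conditional = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_count, pattern)


# Hypothetical toy database; each transaction has weight 1.
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
patterns = dict(fp_growth([(t, 1) for t in transactions], min_count=3))
```

No candidate itemsets are generated: each recursive call only looks at the items that actually co-occur with the current suffix.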

8 Algorithm for Mining Sequential Patterns - PrefixSpan
It first finds the length-1 sequential patterns in the target database, and partitions the database into smaller projected databases, one for the prefix given by each sequential pattern found so far. The sequential patterns are mined by constructing the corresponding projected databases and mining each one recursively. The element order of each tuple is preserved throughout the mining process.
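The projection idea can be sketched as follows. To stay short, this sketch assumes every element of a sequence is a single item; the full algorithm also handles elements that are itemsets. The function name and toy data are hypothetical.

```python
def prefixspan(database, min_count):
    """Return {pattern: support} for sequences whose elements are single
    items (a simplification of the general itemset case)."""
    results = {}

    def mine(prefix, projected):
        # Count each item once per projected suffix.
        counts = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Project: keep what follows the first occurrence of `item`,
            # so the original element order is preserved.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, new_projected)

    mine((), [tuple(s) for s in database])
    return results


# Hypothetical toy sequence database.
database = [("a", "b", "c"), ("a", "c"), ("b", "a", "c"), ("a", "b")]
patterns = prefixspan(database, min_count=3)
```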

9 Brief Description of Doctoral Research
Mining calling path patterns in GSM networks.
Two problems of mining calling path patterns: mining PMFCPs and mining periodic PMFCPs.
Graph structures [(periodic) frequent calling path graph] and graph-based mining algorithms that proceed depth-first.
No candidate paths are generated, and the database is scanned only once if the whole graph structure can be held in main memory.

10 Brief Description of Doctoral Research (cont'd)
Bioinformatic data mining
Gene clustering
Sequence comparisons, alignments and compression: DNA sequence, protein sequence
Applications: phylogenetic tree to predict the function of a new protein; relationship between DNA sequence and disease

11 Topic 1: Mining Frequent Itemsets with Multi-dimensional Constraints
Frequent itemset mining often generates a very large number of frequent itemsets. Only a subset of the frequent itemsets and association rules is of interest to users, who then need additional post-processing to find the useful ones. Constraint-based mining pushes user-specified constraints deep inside the mining process to improve performance. With multi-dimensional items, constraints can be imposed on multiple dimensional attributes.

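12 Topic 1: Mining Frequent Itemsets with Multi-dimensional Constraints
Multi-dimensional constraints: each item is a row of a table with columns itemID, a_1, a_2, …, a_m, where a_1, …, a_m are the attributes (dimensions).
Item i_k = (k_1, k_2, …, k_m). For example, item A = i_A = (A_1, A_2, …, A_m), where A_1 = A.a_1.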

13 Topic 1: Mining Frequent Itemsets with Multi-dimensional Constraints
Multi-dimensional constraints can be categorized according to their constraint properties: anti-monotone, monotone, convertible and inconvertible.
They can also be classified according to the number of sub-constraints included:
Single constraint against multiple dimensions. Ex: max(S.cost) ≤ min(S.price)
Conjunction and/or disjunction of multiple sub-constraints. Ex: a sub-constraint C1 comparing S.cost with a value v1, combined with a sub-constraint C2 comparing S.price with a value v2.
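To make the first two constraint properties concrete, the sketch below checks them by brute force over a small item universe. It assumes the single-constraint example reads max(S.cost) ≤ min(S.price); the cost and price values, the second constraint (a cost-sum threshold), and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical two-dimensional items: each has a cost and a price.
COST = {"A": 30, "B": 50, "C": 45, "D": 10, "E": 20}
PRICE = {"A": 40, "B": 60, "C": 50, "D": 35, "E": 30}


def max_cost_le_min_price(itemset):
    """Single constraint over two dimensions: max(S.cost) <= min(S.price)."""
    return max(COST[i] for i in itemset) <= min(PRICE[i] for i in itemset)


def total_cost_at_least_60(itemset):
    """A second illustrative constraint: sum(S.cost) >= 60."""
    return sum(COST[i] for i in itemset) >= 60


def nonempty_subsets(items):
    items = sorted(items)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_anti_monotone(constraint, items):
    """Once an itemset violates the constraint, so does every superset."""
    subsets = list(nonempty_subsets(items))
    return all(not constraint(t)
               for s in subsets if not constraint(s)
               for t in subsets if s <= t)


def is_monotone(constraint, items):
    """Once an itemset satisfies the constraint, so does every superset."""
    subsets = list(nonempty_subsets(items))
    return all(constraint(t)
               for s in subsets if constraint(s)
               for t in subsets if s <= t)


first_is_am = is_anti_monotone(max_cost_le_min_price, COST)
second_is_m = is_monotone(total_cost_at_least_60, COST)
```

An anti-monotone constraint lets the miner discard an itemset and all its extensions as soon as the constraint fails, which is what makes pushing it into the mining process worthwhile. The brute-force check only verifies the property on this particular universe.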

14 Topic 1: Mining Frequent Itemsets with Multi-dimensional Constraints
We extend constraints so that they can be placed over multi-dimensional itemsets, and develop algorithms for mining frequent itemsets with multi-dimensional constraints by extending CFG (Constrained Frequent Pattern Growth).
Overview of our algorithm:
Phase 1: Frequency check
Phase 2: Constraint check
Phase 3: Conditional database construction

15 Example: C_am ≡ max(S.cost) ≤ min(S.price)
Database: BECA, BEA, DA, BDA, BDE, BDECA, BEC, BDEC, DEC, BDC
Frequent items: B, D, E, C, A. C(BDECA)=false, C(B)=true, C(D)=true, C(E)=true, C(C)=true, C(A)=true
A-conditional database: BEC, BE, D, BD, BDEC
Frequent items: B, D, E, C. C(BDECA)=false, C(BA)=false, C(DA)=true, C(EA)=true, C(CA)=false
EA-conditional database: D
Frequent items: ∅
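The three phases can be traced on this example with a short sketch. The database is the one on the slide. Everything else is an assumption: the minimum support count of 2 is inferred from which items the slide reports as frequent, the constraint is read as max(S.cost) ≤ min(S.price), and the cost and price values are hypothetical, chosen only so that the constraint checks match the slide's true/false values.

```python
DATABASE = ["BECA", "BEA", "DA", "BDA", "BDE", "BDECA", "BEC", "BDEC", "DEC", "BDC"]
MIN_COUNT = 2  # assumption inferred from the slide's frequent-item lists

# Hypothetical attribute values (not given in the slides).
COST = {"A": 30, "B": 50, "C": 45, "D": 10, "E": 20}
PRICE = {"A": 40, "B": 60, "C": 50, "D": 35, "E": 30}


def constraint(itemset):
    """Assumed anti-monotone constraint: max(S.cost) <= min(S.price)."""
    return max(COST[i] for i in itemset) <= min(PRICE[i] for i in itemset)


def frequent_items(db):
    """Phase 1: items whose support count reaches MIN_COUNT."""
    counts = {}
    for t in db:
        for item in set(t):
            counts[item] = counts.get(item, 0) + 1
    return {i for i, c in counts.items() if c >= MIN_COUNT}


def conditional_database(db, item, keep):
    """Phase 3: project the transactions containing `item` onto `keep`,
    dropping transactions that become empty."""
    projected = ["".join(i for i in t if i in keep) for t in db if item in t]
    return [p for p in projected if p]


# A-conditional database: transactions containing A, with A removed.
a_cond = conditional_database(DATABASE, "A", frequent_items(DATABASE) - {"A"})

# Phase 1 and Phase 2 inside the A-conditional database.
a_frequent = frequent_items(a_cond)
a_survivors = {i for i in a_frequent if constraint({i, "A"})}

# EA-conditional database, built from the items that passed both checks.
ea_cond = conditional_database(a_cond, "E", a_survivors - {"E"})
ea_frequent = frequent_items(ea_cond)
```

Because the constraint is anti-monotone, B and C can be dropped from the A-conditional database as soon as C(BA) and C(CA) fail, which is why only D remains in the EA-conditional database.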

16 Topic 2: Mining Inter-transactional Association Rules of Multi-dimensional Interval Patterns
A transaction could be the items bought by the same customer, the events that happened on the same day, and so on.
Intra-transactional association rules: associations among items within the same transaction. Ex: buy(X, diapers) => buy(X, beer) [support=80%]
Inter-transactional association rules: associations among different transactions. Ex: If the prices of IBM and SUN go up, Microsoft's will most likely [80%] increase the next day.
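One common way to mine inter-transactional rules is to slide a window over the transactions and tag each item with its offset inside the window, so that ordinary intra-transactional mining can then be applied to the extended transactions. The sketch below illustrates this for one time dimension. The window parameter `maxspan` (offsets 0 to `maxspan`), the event names, and the daily data are assumptions.

```python
def extended_transactions(daily, maxspan):
    """For each day d, collect (item, offset) pairs for the items seen on
    days d, d+1, ..., d+maxspan."""
    result = {}
    for day in sorted(daily):
        window = set()
        for offset in range(maxspan + 1):
            for item in daily.get(day + offset, ()):
                window.add((item, offset))
        result[day] = window
    return result


# Hypothetical daily events.
daily = {
    1: {"IBM_up", "SUN_up"},
    2: {"MSFT_up"},
    3: {"IBM_up", "SUN_up"},
    4: {"MSFT_up", "SUN_up"},
    5: {"IBM_up"},
}
ext = extended_transactions(daily, maxspan=1)

# Rule: IBM and SUN go up today => Microsoft goes up the next day.
antecedent = {("IBM_up", 0), ("SUN_up", 0)}
rule = antecedent | {("MSFT_up", 1)}
n_antecedent = sum(1 for w in ext.values() if antecedent <= w)
n_rule = sum(1 for w in ext.values() if rule <= w)
rule_confidence = n_rule / n_antecedent
```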

17 Topic 2: Mining Inter-transactional Association Rules of Multi-dimensional Interval Patterns
Interval data differ from point data in that they occupy regions of non-zero size.
Multi-dimensional intervals can be represented as line segments (1-D), rectangles (2-D), hyper-cubes (n-D), etc.
Extended item: an item together with its location, written item(Location).
Reference point: the smallest location among all the extended items.
Maxspan: a sliding window; only associations covered by it are considered.

18 Example
There are two cubes in the 3-dimensional space, located at (0,2,1) and (1,1,0).
Reference point: (0,1,0)
Relative to the reference point, the two items are denoted with locations (0,1,1) and (1,0,0).
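The numbers in this example are consistent with taking the reference point as the coordinate-wise minimum of the locations and then expressing each location relative to it. A minimal sketch under that assumption (the function name is illustrative):

```python
def normalize(locations):
    """Return (reference point, locations relative to it), taking the
    reference point as the coordinate-wise minimum."""
    reference = tuple(min(axis) for axis in zip(*locations))
    relative = [tuple(c - r for c, r in zip(loc, reference)) for loc in locations]
    return reference, relative


# The two cubes from the slide.
reference, relative = normalize([(0, 2, 1), (1, 1, 0)])
```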

19 Algorithm (Apriori-like) Example
Minimum support: 10% (10% × 20 = 2). Maxspan: 4.
L_1: the frequent extended items, each at location (0,0).

20 Algorithm (Apriori-like) Example (cont'd)
Reminder of the Apriori-like loop: L_{k-1} → join → C_k → support check → L_k
Extended items are listed by location.
L_2: {(0,0), (1,1)}, {(1,0), (0,1)}, {(0,0), (2,0)}, {(0,0), (3,0)}
L_3: {(3,0), (2,1), (0,3)}, {(1,0), (0,1), (2,1)}, {(3,0), (0,3), (4,1)}, {(2,0), (0,2), (4,0)}
L_4: {(0,3), (4,1), (2,1), (3,0)}

21 Topic 3: Inter-sequence Association Rules Mining
Inter-sequence model [figure: sequences arranged by transaction ID and transaction time]

22 Topic 3: Inter-sequence Association Rules Mining (cont'd)
Extended sequence: a sequence s annotated with its time point Δt.
Algorithm:
Step 1: Use PrefixSpan to find all sequential patterns.
Step 2: Use an Apriori-like method to check whether each set of extended sequences is large.
Use an L-bucket (list-bucket) and a C-bucket (candidate-bucket) to improve mining efficiency.

23 Example
min_support = 3, maxspan = 2
[Table: the database, with columns Tran. ID, Tran. Time, Sequence]
[Sequential patterns found by PrefixSpan]

24 Example (cont'd)
PrefixSpan result → L_1: each sequential pattern as an extended sequence at Δ0, i.e., sets of the form {Δ0}.
Candidates C_2: sets of extended sequences of the form {Δ0, Δ1} and {Δ0, Δ2}.

25 Example (cont'd)
Apriori-like: L_{k-1} → C_k → L_k
PrefixSpan result → L_1 (sets of the form {Δ0}) → C_2 (sets of the form {Δ0, Δ1} and {Δ0, Δ2}) → L_2

26 Topic 4: Mining Association Rules among Time-series Data
A line is an ordered and continuous list of the form {t_1, t_2, …, t_m} describing a property of the subject over time.
Step 1: Find the frequent lines and points in each line-set (Apriori-like algorithm).
Step 2: Use combinations of those frequent sets to find the associations among them (inter-transaction association rules).

27 Topic 4: Mining Association Rules among Time-series Data

28 Time-series Data Approximation
For the sake of the algorithm's efficiency, the fluctuation rate is partitioned equally into several classes.
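One possible reading of this step is sketched below: compute the relative change between consecutive values and map it to one of several equal-width classes. The slides do not specify the definition of the fluctuation rate, the range, or the number of classes, so all of these, along with the function names and the sample series, are assumptions.

```python
def fluctuation_rates(series):
    """Relative change between consecutive values (assumed definition)."""
    return [(b - a) / a for a, b in zip(series, series[1:])]


def to_classes(rates, low, high, n_classes):
    """Map each rate to an equal-width class index in 0..n_classes-1,
    clamping rates that fall outside [low, high]."""
    width = (high - low) / n_classes
    classes = []
    for r in rates:
        idx = int((r - low) // width)
        classes.append(min(max(idx, 0), n_classes - 1))
    return classes


# Hypothetical price series; three classes: down, flat, up.
prices = [100, 110, 99, 99, 108.9]
classes = to_classes(fluctuation_rates(prices), low=-0.15, high=0.15, n_classes=3)
```

After this step each time series becomes a short sequence of class labels, which is the kind of discrete input the Apriori-like line discovery can work on.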

29 Step 1: Line Discovery (Apriori-like) Step 2: Association Rule Mining

Data Mining Part Thank You!