Frequent Pattern Mining. Toon Calders, Bart Goethals. ADReM research group.

1 Frequent Pattern Mining
Toon Calders, Bart Goethals
ADReM research group

2 Outline
What is data mining?
- Definition
- Local patterns vs global models
- Supervised vs unsupervised
- What do we do?
Frequent set mining
More complex data types

3 What is data mining?
[Figure: Data, Information, $ $ $]
“The use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets.”

4 Supervised vs Unsupervised
Supervised:
- data has been annotated
- well-defined task: learn to annotate new data
E.g.: examples of good/bad customers
Unsupervised:
- only data has been given
- no annotation
- « find knowledge »

5 Local vs Global
Local pattern:
- tells something about a small subset of the data
E.g.: « 90% of the customers that purchase beer also buy chips »
Global model:
- fits a single model to all of the data, as a summary
E.g.: there is a linear relationship between $ spent and the income of the customers

6 What do we do?
Pattern mining
- local
- unsupervised
Useful for
- large datasets
- exploration: « what is this data like? »
Less suitable for
- well-studied and well-understood problem domains

7 Outline
What is data mining?
Frequent set mining
- Market basket analysis
- Association rules
- Interestingness measures
- Numerical attributes
More complex data types

8 Market Basket Analysis
Data: a collection of customer transactions
Goal: find sets of products that frequently occur together

9 Applications
Supermarket
- product placement
- special promotions
Web search
- which keywords often occur together in web pages?
Health care
- frequent sets of symptoms for a disease

10 Applications
Basically works for all data that can be represented as a set of examples/objects having certain properties
- patient / symptoms
- movies / ratings
- web pages / keywords
- basket / products
- …

11 Algorithms
Computationally a very hard problem
- with n products, there are 2^n sets of products
Hundreds of algorithms have been proposed
- for sparse/dense data
- many rows/columns
- data fits/does not fit in memory
- …
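
To make the 2^n search space concrete, here is a minimal sketch of level-wise frequent set mining in the style of Apriori. The slides do not name a specific algorithm, so this is a swapped-in illustration; the function name `frequent_sets`, the toy baskets, and the `min_support` threshold are all assumptions.

```python
from itertools import combinations


def frequent_sets(transactions, min_support):
    """Return {itemset: count} for every itemset contained in at least
    min_support transactions (level-wise search, Apriori style)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]  # level 1
    result = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate set.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Join frequent k-sets into (k+1)-sets, then prune every candidate
        # that has an infrequent k-subset: it cannot be frequent itself.
        k += 1
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
    return result


# Hypothetical toy data: 5 distinct products, so 2**5 = 32 possible sets.
baskets = [
    {"beer", "chips", "nuts"},
    {"beer", "chips"},
    {"beer", "chips", "nuts"},
    {"milk", "bread"},
    {"beer", "nuts"},
]
found = frequent_sets(baskets, min_support=3)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The pruning step is what keeps the search from visiting all 2^n sets: once { chips, nuts } is found to be infrequent, no superset of it is ever counted.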

12 Association Rules
Conditional probabilities
X → Y (c%): if X is in the transaction, then there is a probability of c% that Y is in it as well.
Based on the frequent sets, associations can be computed easily:
{ Beer, Chips } → { Snack nuts } 75%
{ adrem.html, cnts.html } → { islab.html } 80%
{ rain } → { overcast } 100%
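
As a rough illustration of how a rule's percentage follows from set counts, the sketch below estimates the conditional probability as support(X ∪ Y) / support(X). The helper names `support` and `confidence` and the toy baskets are assumptions chosen so that the first rule on the slide comes out at 75%.

```python
def support(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(transactions, lhs, rhs):
    """Estimate P(rhs in t | lhs in t) as support(lhs | rhs) / support(lhs)."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    base = support(transactions, lhs)
    if base == 0:
        raise ValueError("left-hand side never occurs in the data")
    return support(transactions, lhs | rhs) / base


# Hypothetical data: 4 baskets contain beer and chips, 3 of those also nuts.
baskets = [
    frozenset(b)
    for b in (
        {"beer", "chips", "nuts"},
        {"beer", "chips", "nuts"},
        {"beer", "chips", "nuts"},
        {"beer", "chips"},
        {"milk"},
    )
]
c = confidence(baskets, {"beer", "chips"}, {"nuts"})
print(f"{{beer, chips}} -> {{nuts}}: {c:.0%}")
```

Both counts needed here are supports of frequent sets, which is why rules come almost for free once the frequent sets are known.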

13 Interestingness Measures
Not all association rules are interesting
- Domain knowledge: pregnant → female, rain → overcast
- Redundancy: if A → B (100%), then also AC → B, AD → B, …
- Independence: if 70% of customers buy product A, then X → A (70%) and Y → A (70%) hold even for unrelated X and Y
- Too many rules

14 Interestingness Measures
Incorporating background knowledge
- e.g., via Bayesian network
- only produce rules that deviate from background knowledge
Redundancies
- Condensed representations: produce only a non-redundant subset of patterns

15 Interestingness Measures
Independence
- statistical significance tests (χ²)
Careful with conclusions!! 1000 tests with significance level 0.05 … (Bonferroni correction)
Too many rules
- Constraints
- Top-k mining
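
The warning about 1000 tests can be made concrete with a little arithmetic, together with a hand-rolled χ² statistic for one rule X → Y. This is only a sketch: the function name `chi_square_2x2` and the counts are assumptions, and all four marginal totals are assumed to be non-zero.

```python
def chi_square_2x2(n_xy, n_x, n_y, n):
    """Pearson chi-square statistic for independence of X and Y.

    n_xy: transactions containing both X and Y
    n_x, n_y: transactions containing X, respectively Y
    n: total number of transactions
    """
    observed = [
        [n_xy, n_x - n_xy],                    # X present: Y present / absent
        [n_y - n_xy, n - n_x - n_y + n_xy],    # X absent:  Y present / absent
    ]
    row_totals = [n_x, n - n_x]
    col_totals = [n_y, n - n_y]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical counts: 70% of 1000 customers buy A, and 70% of the 100
# customers who buy X also buy A, so X -> A (70%) carries no information.
independent = chi_square_2x2(n_xy=70, n_x=100, n_y=700, n=1000)
# Same margins, but 95 of the 100 X-buyers also buy A.
dependent = chi_square_2x2(n_xy=95, n_x=100, n_y=700, n=1000)

# Multiple testing: with 1000 tests at level 0.05, about 50 rules are
# expected to look significant by chance alone when nothing is going on.
alpha, n_tests = 0.05, 1000
expected_false_positives = alpha * n_tests
bonferroni_alpha = alpha / n_tests  # stricter per-test level

print(independent, dependent)
print(expected_false_positives, bonferroni_alpha)
```

For a 2x2 table the statistic has one degree of freedom, so a single test at level 0.05 compares it against roughly 3.84; under the Bonferroni correction each test uses the much stricter level 0.05 / 1000 instead.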

16 Numerical Attributes
Association rule mining is also possible for numerical attributes
- discretization: make continuous attributes ordinal
  - information loss
  - not appropriate if the order between the values is important
- other methods: a recent method based on rank correlation measures
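
A minimal sketch of the discretization route, assuming equal-width binning (the slides do not say which binning scheme is meant). The function name `equal_width_bins`, the `ages` data, and the `age_bin=` item labels are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0 .. n_bins - 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]  # all values identical: a single bin
    # The maximum value would fall into bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 22, 35, 41, 58, 63, 78]
bins = equal_width_bins(ages, 3)
# Each bin index becomes an ordinary item that a frequent set miner can count.
items = [f"age_bin={b}" for b in bins]
print(list(zip(ages, items)))
```

The information loss mentioned above is visible here: ages 18 and 35 end up as the same item, and the miner no longer knows that bin 0 lies below bin 1.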

17 Complex Patterns
Sets
Sequences
Graphs
Relational structures
Generation and counting of such patterns becomes much more complex too!

18 Sequences
CGATGGGCCAGTCGATACGTCGATGCCGATGTCACGA

19 Patterns in Sequences
Substrings
Regular expressions, e.g. (bb|[^b]{2})
Partial orders
Directed acyclic graphs
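
For the two simplest pattern types, a short sketch: counting frequent substrings of a fixed length in the sequence from the previous slide, and testing strings against the regular expression shown above. The function name `frequent_substrings`, the length `k=4`, and the threshold are assumptions for illustration.

```python
import re
from collections import Counter


def frequent_substrings(sequence, k, min_count):
    """Return {substring: count} for length-k substrings (overlaps allowed)
    that occur at least min_count times in sequence."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {s: n for s, n in counts.items() if n >= min_count}


dna = "CGATGGGCCAGTCGATACGTCGATGCCGATGTCACGA"
frequent_4mers = frequent_substrings(dna, k=4, min_count=3)
print(frequent_4mers)  # includes 'CGAT', which occurs 4 times

# The regular expression from the slide: either "bb", or two characters
# that are both different from "b".
pattern = re.compile(r"(bb|[^b]{2})")
for candidate in ("bb", "ac", "ab"):
    print(candidate, bool(pattern.fullmatch(candidate)))
```

Substrings are the easy case because every occurrence is a contiguous window; regular expressions, partial orders, and DAGs need progressively more expensive matching per candidate.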

20 Graphs

21 Patterns in Graphs

22 Rules
[Figure: graph-pattern rules annotated with frequencies (f: 5, f: 8, f: 4, f: 7, f: 4, f: 4) and confidences (0.8, 0.5, 0.57)]

23 Relational Databases

24 Patterns in RDBs
Queries
Query 1:
    SELECT L.drinker, V.bar
    FROM Likes L, Visits V
    WHERE V.drinker = L.drinker
      AND L.beer = 'Duvel'

25 Patterns in RDBs
Query 2:
    SELECT L.drinker, V.bar
    FROM Likes L, Visits V, Serves S
    WHERE V.drinker = L.drinker
      AND L.beer = 'Duvel'
      AND S.bar = V.bar
      AND S.beer = 'Duvel'

26 Patterns in RDBs
Association rule: Query 1 => Query 2
If a person who likes Duvel visits a bar, then that bar serves Duvel
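
One way to read the rule Query 1 => Query 2 is as a confidence: the fraction of Query 1 answers that are also Query 2 answers. The sketch below checks this on an in-memory SQLite database. The table contents, drinker and bar names are invented for illustration, and `DISTINCT` is added to both queries (it is not on the slides) so that answers are counted as sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Likes  (drinker TEXT, beer TEXT);
    CREATE TABLE Visits (drinker TEXT, bar TEXT);
    CREATE TABLE Serves (bar TEXT, beer TEXT);
    -- Hypothetical data
    INSERT INTO Likes  VALUES ('ann', 'Duvel'), ('bob', 'Duvel'), ('cid', 'Orval');
    INSERT INTO Visits VALUES ('ann', 'bar_a'), ('ann', 'bar_b'),
                              ('bob', 'bar_b'), ('bob', 'bar_c'), ('cid', 'bar_b');
    INSERT INTO Serves VALUES ('bar_a', 'Duvel'), ('bar_b', 'Duvel'), ('bar_c', 'Orval');
""")

QUERY_1 = """
    SELECT DISTINCT L.drinker, V.bar
    FROM Likes L, Visits V
    WHERE V.drinker = L.drinker AND L.beer = 'Duvel'
"""
QUERY_2 = """
    SELECT DISTINCT L.drinker, V.bar
    FROM Likes L, Visits V, Serves S
    WHERE V.drinker = L.drinker AND L.beer = 'Duvel'
      AND S.bar = V.bar AND S.beer = 'Duvel'
"""

q1 = set(conn.execute(QUERY_1).fetchall())
q2 = set(conn.execute(QUERY_2).fetchall())
# Query 2 only adds conditions to Query 1, so its answers are a subset.
confidence = len(q2) / len(q1)
print(sorted(q1))
print(sorted(q2))
print(confidence)
conn.close()
```

With this toy data, three of the four (drinker, bar) pairs from Query 1 survive the extra conditions of Query 2, so the rule holds with confidence 0.75.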
