
1 Overview

2 Apriori Algorithm
Support is 50% (2/4); confidence is 66.67% (2/3).
TX1: Shoes, Socks, Tie
TX2: Shoes, Socks, Tie, Belt, Shirt
TX3: Shoes, Tie
TX4: Shoes, Socks, Belt
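
A minimal Python check of these numbers (the slide does not name the rule; the counts are consistent with Socks ⇒ Tie, which is assumed here):

transactions = [
    {"Shoes", "Socks", "Tie"},                       # TX1
    {"Shoes", "Socks", "Tie", "Belt", "Shirt"},      # TX2
    {"Shoes", "Tie"},                                # TX3
    {"Shoes", "Socks", "Belt"},                      # TX4
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Assuming the rule shown is Socks => Tie:
print(support({"Socks", "Tie"}))                        # 0.5 -> support 50% (2/4)
print(support({"Socks", "Tie"}) / support({"Socks"}))   # 0.666... -> confidence 66.67% (2/3)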

3 Example
Five transactions from a supermarket:
TID 1: Beer, Diaper, Baby Powder, Bread, Umbrella
TID 2: Diaper, Baby Powder
TID 3: Beer, Diaper, Milk
TID 4: Diaper, Beer, Detergent
TID 5: Beer, Milk, Coca-Cola

4 Step 1
Min_sup = 40% (2/5). C1 → L1:
C1 (candidate 1-itemsets):
Beer: 4/5
Diaper: 4/5
Baby Powder: 2/5
Bread: 1/5
Umbrella: 1/5
Milk: 2/5
Detergent: 1/5
Coca-Cola: 1/5
L1 (frequent 1-itemsets):
Beer: 4/5
Diaper: 4/5
Baby Powder: 2/5
Milk: 2/5

5 Step 2 and Step 3
C2 → L2:
C2 (candidate 2-itemsets):
Beer, Diaper: 3/5
Beer, Baby Powder: 1/5
Beer, Milk: 2/5
Diaper, Baby Powder: 2/5
Diaper, Milk: 1/5
Baby Powder, Milk: 0
L2 (frequent 2-itemsets):
Beer, Diaper: 3/5
Beer, Milk: 2/5
Diaper, Baby Powder: 2/5

6 Step 4
Min_sup = 40% (2/5). C3 → empty:
Beer, Diaper, Baby Powder: 1/5
Beer, Diaper, Milk: 1/5
Beer, Milk, Baby Powder: 0
Diaper, Baby Powder, Milk: 0

7 Step 5
min_sup = 40%, min_conf = 70%
Rule A → B: Support(A,B) / Support(A) / Confidence
Beer → Diaper: 60% / 80% / 75%
Beer → Milk: 40% / 80% / 50%
Diaper → Baby Powder: 40% / 80% / 50%
Diaper → Beer: 60% / 80% / 75%
Milk → Beer: 40% / 40% / 100%
Baby Powder → Diaper: 40% / 40% / 100%

8 Results
Rules satisfying min_sup = 40% and min_conf = 70% (from the table in Step 5):
Beer → Diaper (support 60%, confidence 75%)
Diaper → Beer (support 60%, confidence 75%)
Milk → Beer (support 40%, confidence 100%)
Baby Powder → Diaper (support 40%, confidence 100%)
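
A compact Apriori sketch in plain Python that reproduces L1, L2 and the strong rules above; the candidate-generation and rule-enumeration code is a generic rendering of the steps, not code from the slides:

from itertools import combinations

transactions = [
    {"Beer", "Diaper", "Baby Powder", "Bread", "Umbrella"},
    {"Diaper", "Baby Powder"},
    {"Beer", "Diaper", "Milk"},
    {"Diaper", "Beer", "Detergent"},
    {"Beer", "Milk", "Coca-Cola"},
]
MIN_SUP, MIN_CONF, N = 0.4, 0.7, len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / N

# Step 1: frequent single items (L1)
items = sorted({i for t in transactions for i in t})
L = [frozenset([i]) for i in items if support({i}) >= MIN_SUP]
frequent = list(L)
k = 2
while L:
    # Candidate generation: unions of frequent itemsets that have size k
    candidates = {a | b for a in L for b in L if len(a | b) == k}
    L = [c for c in candidates if support(c) >= MIN_SUP]
    frequent += L
    k += 1

# Rule generation from frequent itemsets of size >= 2
for itemset in (f for f in frequent if len(f) > 1):
    for r in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(itemset, r)):
            conf = support(itemset) / support(lhs)
            if conf >= MIN_CONF:
                print(set(lhs), "=>", set(itemset - lhs),
                      f"support={support(itemset):.0%} confidence={conf:.0%}")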

9 Construct FP-tree from a Transaction Database
min_support = 3
TID 100: items bought {f, a, c, d, g, i, m, p} → (ordered) frequent items {f, c, a, m, p}
TID 200: {a, b, c, f, l, m, o} → {f, c, a, b, m}
TID 300: {b, f, h, j, o, w} → {f, b}
TID 400: {b, c, k, s, p} → {c, b, p}
TID 500: {a, f, c, e, l, p, m, n} → {f, c, a, m, p}
1. Scan the DB once, find frequent 1-itemsets (single-item patterns)
2. Sort frequent items in descending frequency order: F-list = f-c-a-b-m-p
3. Scan the DB again, construct the FP-tree
Header table (item: frequency, each with a head link into the tree): f: 4, c: 4, a: 3, b: 3, m: 3, p: 3
(Figure: the resulting FP-tree, root {}, with paths f:4 → c:3 → a:3 → m:2 → p:2, f:4 → c:3 → a:3 → b:1 → m:1, f:4 → b:1 and c:1 → b:1 → p:1.)
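
A sketch of steps 1-3 in Python; the Node class and the header-table dict are assumptions about the representation, not taken from the slide:

from collections import Counter, defaultdict

transactions = [
    list("facdgimp"), list("abcflmo"), list("bfhjow"),
    list("bcksp"),    list("afcelpmn"),
]
MIN_SUPPORT = 3

# Step 1: scan once for frequent single items; Step 2: sort them into the f-list.
counts = Counter(i for t in transactions for i in t)
flist = sorted((i for i, c in counts.items() if c >= MIN_SUPPORT),
               key=lambda i: (-counts[i], i))          # ['f', 'c', 'a', 'b', 'm', 'p']

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 1, {}

root = Node(None, None)
header = defaultdict(list)          # item -> list of its nodes (the node-links)

# Step 3: scan again, insert each transaction's frequent items in f-list order.
for t in transactions:
    ordered = [i for i in flist if i in t]
    node = root
    for item in ordered:
        if item in node.children:
            node.children[item].count += 1
        else:
            node.children[item] = Node(item, node)
            header[item].append(node.children[item])
        node = node.children[item]

print(root.children["f"].count)     # 4, matching the f:4 branch in the figure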

10 Find Patterns Having p From p-conditional Database
Starting at the frequent-item header table in the FP-tree
Traverse the FP-tree by following the link of each frequent item p
Accumulate all transformed prefix paths of item p to form p's conditional pattern base
Conditional pattern bases (item: conditional pattern base):
c: f:3
a: fc:3
b: fca:1, f:1, c:1
m: fca:2, fcab:1
p: fcam:2, cb:1
(Figure: the FP-tree and header table repeated from slide 9.)

11 From Conditional Pattern-bases to Conditional FP-trees
For each pattern-base:
Accumulate the count for each item in the base
Construct the FP-tree for the frequent items of the pattern base
m-conditional pattern base: fca:2, fcab:1
m-conditional FP-tree: {} → f:3 → c:3 → a:3
All frequent patterns relating to m: m, fm, cm, am, fcm, fam, cam, fcam → associations
(Figure: the full FP-tree and header table repeated from slide 9.)
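
A small check of the pattern enumeration for m; the prefix-path counts fca:2 and fcab:1 are taken from the slide, while the enumeration code itself is a sketch:

from itertools import combinations
from collections import Counter

# m's conditional pattern base, as on the slide: fca occurs twice, fcab once.
cond_base = [("fca", 2), ("fcab", 1)]

# Count items in the base and keep those reaching min_support = 3.
counts = Counter()
for path, n in cond_base:
    for item in path:
        counts[item] += n
frequent = sorted(i for i, c in counts.items() if c >= 3)   # ['a', 'c', 'f']

# Every subset of {f, c, a} combined with m is a frequent pattern.
patterns = ["m"] + ["".join(sorted(s)) + "m"
                    for r in range(1, len(frequent) + 1)
                    for s in combinations(frequent, r)]
print(patterns)   # the same eight patterns as the slide, up to letter ordering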

12 The Data Warehouse Toolkit, Ralph Kimball and Margy Ross, 2nd ed., 2002

13 k-means Clustering
Cluster centers c_1, c_2, ..., c_k with clusters C_1, C_2, ..., C_k

14 Error
The error function is the sum of squared distances of each point to its assigned center. It has a local minimum when each center c_j is the mean (centroid) of its cluster C_j.

15 k-means Example (K=2)
Pick seeds → reassign clusters → compute centroids → reassign clusters → compute centroids → reassign clusters → converged!
(Figure: the iterations illustrated on a 2-D point set, with x marking the centroids.)

16 Algorithm
Random initialization of k cluster centers
do {
  - assign each x_i in the dataset to the nearest cluster center (centroid) c_j according to d²
  - compute all new cluster centers
} until ( |E_new - E_old| < ε or number of iterations ≥ max_iterations )
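
A minimal NumPy sketch of this loop; the two-blob dataset and k = 2 are placeholder choices:

import numpy as np

def kmeans(X, k, max_iterations=100, eps=1e-6, rng=np.random.default_rng(0)):
    # Random initialization of k cluster centers (sampled from the data points).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    old_error = np.inf
    for _ in range(max_iterations):
        # Assign each x_i to the nearest center c_j according to squared distance d^2.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        error = d2[np.arange(len(X)), labels].sum()
        # Compute all new cluster centers as the mean of their assigned points.
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if abs(old_error - error) < eps:          # |E_new - E_old| < eps
            break
        old_error = error
    return centers, labels

X = np.vstack([np.random.default_rng(1).normal(0, 1, (20, 2)),
               np.random.default_rng(2).normal(5, 1, (20, 2))])
centers, labels = kmeans(X, k=2)
print(centers)   # roughly one center near (0, 0) and one near (5, 5)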

17 k-Means vs Mixture of Gaussians
Both are iterative algorithms that assign points to clusters.
k-means: minimize the sum of squared distances to the cluster centers.
Mixture of Gaussians: maximize the likelihood P(x|C=i).
Mixture of Gaussians is the more general formulation; it is equivalent to k-means when the covariance matrices are the identity, Σ_i = I.

18 Tree Clustering
Tree clustering algorithms allow us to reveal the internal similarities of a given pattern set and to structure these similarities hierarchically. They are typically applied to a small set of typical patterns. For n patterns these algorithms generate a sequence of 1 to n clusters.

19 Example
Similarity between two clusters is assessed by measuring the similarity of the furthest pair of patterns (one from each cluster). This is the so-called complete-linkage rule.

20 Impact of cluster distance measures
"Single-link": inter-cluster distance = distance between the closest pair of points (one from each cluster)
"Complete-link": inter-cluster distance = distance between the farthest pair of points
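
A sketch of the two inter-cluster distance measures in NumPy; the two small point sets are placeholders:

import numpy as np

def pairwise_distances(A, B):
    """All Euclidean distances between points of cluster A and points of cluster B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def single_link(A, B):
    # Distance between the closest pair of points, one from each cluster.
    return pairwise_distances(A, B).min()

def complete_link(A, B):
    # Distance between the farthest pair of points, one from each cluster.
    return pairwise_distances(A, B).max()

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[3.0, 0.0], [6.0, 0.0]])
print(single_link(A, B))    # 2.0
print(complete_link(A, B))  # 6.0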

21 There are two criteria proposed for clustering evaluation and selection of an optimal clustering scheme (Berry and Linoff, 1996):
Compactness: the members of each cluster should be as close to each other as possible. A common measure of compactness is the variance, which should be minimized.
Separation: the clusters themselves should be widely spaced.

22 Dunn index
The Dunn index is the ratio of the smallest inter-cluster distance to the largest cluster diameter; larger values indicate compact, well-separated clusters.

23 The Davies-Bouldin (DB) index (1979)
The DB index averages, over all clusters, the worst-case ratio (s_i + s_j) / d(c_i, c_j), where s_i is the average distance of cluster i's members to its centroid and d(c_i, c_j) is the distance between centroids; smaller values are better.
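
A sketch of both indices under common textbook definitions (assumed here: Dunn = minimum inter-cluster distance over maximum cluster diameter, DB with average within-cluster distance to the centroid); the example clusters are placeholders:

import numpy as np

def dunn_index(clusters):
    """min inter-cluster distance / max intra-cluster diameter (larger is better)."""
    inter = min(np.linalg.norm(a - b)
                for i, A in enumerate(clusters) for j, B in enumerate(clusters)
                if i < j for a in A for b in B)
    diam = max(np.linalg.norm(a - b) for C in clusters for a in C for b in C)
    return inter / diam

def davies_bouldin(clusters):
    """(1/k) * sum_i max_{j != i} (s_i + s_j) / d(c_i, c_j) (smaller is better)."""
    centroids = [C.mean(axis=0) for C in clusters]
    scatter = [np.mean(np.linalg.norm(C - c, axis=1)) for C, c in zip(clusters, centroids)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
                     for j in range(k) if j != i)
    return total / k

clusters = [np.array([[0.0, 0.0], [1.0, 0.0]]), np.array([[5.0, 0.0], [6.0, 0.0]])]
print(dunn_index(clusters), davies_bouldin(clusters))   # 4.0 0.2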

24 Pattern Classification (2nd ed.), Richard O. Duda, Peter E. Hart, and David G. Stork, Wiley Interscience, 2001
Pattern Recognition: Concepts, Methods and Applications, Joaquim P. Marques de Sá, Springer-Verlag, 2001

25 3-Nearest Neighbors
(Figure: a query point q and its 3 nearest neighbors, two of class x and one of class o, so q is assigned to class x.)
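
A minimal 3-NN sketch; the training points, labels and query are placeholders chosen to mirror the figure's outcome (two x neighbors against one o):

import numpy as np
from collections import Counter

def knn_classify(query, points, labels, k=3):
    # Sort training points by distance to the query and vote among the k nearest.
    distances = np.linalg.norm(points - query, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

points = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.5], [3.0, 3.0], [2.8, 3.1]])
labels = ["x", "x", "o", "o", "o"]
print(knn_classify(np.array([1.0, 1.2]), points, labels))   # 'x' (2 x's, 1 o among the 3 nearest)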

26 Machine Learning, Tom M. Mitchell, McGraw Hill, 1997

27 Bayes Naive Bayes

28 Example
Does the patient have cancer or not? A patient takes a lab test and the result comes back positive. The test returns a correct positive result (+) in only 98% of the cases in which the disease is actually present, and a correct negative result (-) in only 97% of the cases in which the disease is not present. Furthermore, only 0.008 of the entire population has this cancer.

29 Suppose a positive result (+) is returned...
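
Filling in the computation the slide presumably showed, using Bayes' rule with the numbers from the previous slide:

# P(cancer) = 0.008, P(+|cancer) = 0.98, P(+|not cancer) = 1 - 0.97 = 0.03
p_cancer, p_pos_given_cancer, p_pos_given_healthy = 0.008, 0.98, 0.03

num_cancer  = p_pos_given_cancer * p_cancer              # 0.98 * 0.008  = 0.00784
num_healthy = p_pos_given_healthy * (1 - p_cancer)       # 0.03 * 0.992  = 0.02976
p_cancer_given_pos = num_cancer / (num_cancer + num_healthy)   # normalization
print(p_cancer_given_pos)                                # ~0.21

So even after a positive test the posterior probability of cancer is only about 21%, because the prior is so small.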

30 Normalization The result of Bayesian inference depends strongly on the prior probabilities, which must be available in order to apply the method

31 Belief Networks
Burglary: P(B) = 0.001
Earthquake: P(E) = 0.002
Alarm: P(A | Burglary, Earthquake):
  B=t, E=t: 0.95
  B=t, E=f: 0.94
  B=f, E=t: 0.29
  B=f, E=f: 0.001
JohnCalls: P(J | Alarm): A=t: 0.90, A=f: 0.05
MaryCalls: P(M | Alarm): A=t: 0.70, A=f: 0.01

32 Full Joint Distribution
Each entry of the full joint distribution is the product of the conditional probabilities along the network: P(j, m, a, b, e) = P(j|a) P(m|a) P(a|b,e) P(b) P(e).

33 P(Burglary | JohnCalls=true, MaryCalls=true)
The hidden variables of the query are Earthquake and Alarm.
For Burglary = true in the Bayesian network:

34 P(b) is constant and can be moved outside the summations, and the P(e) term can be moved outside the summation over a. Given JohnCalls = true and MaryCalls = true, the probability that a burglary has occurred is about 28%.

35 Computation for Burglary=true
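
A sketch of the enumeration in Python, using the CPT values from slide 31; the function layout is mine, not the book's:

P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls=true | Alarm)

def unnormalized(b):
    """P(b) * sum_e P(e) * sum_a P(a|b,e) * P(j|a) * P(m|a), for j = m = true."""
    pb = P_B if b else 1 - P_B
    total = 0.0
    for e in (True, False):
        pe = P_E if e else 1 - P_E
        for a in (True, False):
            pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            total += pe * pa * P_J[a] * P_M[a]
    return pb * total

alpha = unnormalized(True) + unnormalized(False)
print(unnormalized(True) / alpha)   # ~0.284, i.e. about 28%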

36 Artificial Intelligence - A Modern Approach, Second Edition, S. Russell and P. Norvig, Prentice Hall, 2003

37 ID3 - Tree learning

38

39 The credit history loan table has the following information:
p(risk is high) = 6/14
p(risk is moderate) = 3/14
p(risk is low) = 5/14

40 In the credit history loan table we make income the property tested at the root. This divides the examples into C_1 = {1, 4, 7, 11}, C_2 = {2, 3, 12, 14}, C_3 = {5, 6, 8, 9, 10, 13}.

41 gain(income) = I(credit_table) - E(income) = 1.531 - 0.564 = 0.967 bits
gain(credit history) = 0.266
gain(debt) = 0.581
gain(collateral) = 0.756
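
A quick check of these figures from the class counts on slide 39; E(income) = 0.564 is taken from the slide, since recomputing it would need the per-example risk labels:

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

i_credit_table = entropy([6, 3, 5])          # risk high / moderate / low out of 14
print(round(i_credit_table, 3))              # 1.531 bits

e_income = 0.564                             # weighted entropy of C_1, C_2, C_3 (from the slide)
print(round(i_credit_table - e_income, 3))   # gain(income) ~ 0.967 bits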

42

43 Overfitting
Consider the error of hypothesis h over
the training data: error_train(h)
the entire distribution D of data: error_D(h)
Hypothesis h ∈ H overfits the training data if there is an alternative hypothesis h' ∈ H such that error_train(h) < error_train(h') and error_D(h) > error_D(h').

44 An ID3 tree consistent with the data
(Figure: decision tree)
Hair Color?
  Blond → Lotion Used?
    No → Sunburned (Sarah, Annie)
    Yes → Not Sunburned (Dana, Katie)
  Red → Sunburned (Emily)
  Brown → Not Sunburned (Alex, Pete, John)

45 Corresponding rules by C4.5
If the person's hair is blond and the person uses lotion, then nothing happens.
If the person's hair color is blond and the person uses no lotion, then the person turns red.
If the person's hair color is red, then the person turns red.
If the person's hair color is brown, then nothing happens.

46 Default rule
If the person uses lotion, then nothing happens.
If the person's hair color is brown, then nothing happens.
If no other rule applies, then the person turns red.

47 Artificial Intelligence, Patrick Henry Winston, Addison-Wesley, 1992
Artificial Intelligence - Structures and Strategies for Complex Problem Solving, Second Edition, G. L. Luger and W. A. Stubblefield, Benjamin/Cummings Publishing, 1993
Machine Learning, Tom M. Mitchell, McGraw Hill, 1997

48 Perceptron Limitations Gradient descent

49 XOR problem and Perceptron
Shown by Minsky and Papert in the mid-1960s.

50 Gradient Descent
To understand, consider a simpler linear unit, where o = w_0 + w_1 x_1 + ... + w_n x_n.
Let's learn the w_i that minimize the squared error E[w] = 1/2 Σ_{d∈D} (t_d - o_d)² over the training set D = {(x_1, t_1), (x_2, t_2), ..., (x_d, t_d), ..., (x_m, t_m)} (t for target).
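
A sketch of batch gradient descent for this linear unit; the training data are placeholders, and the update w_i ← w_i + η Σ_d (t_d - o_d) x_id follows from differentiating the squared error above:

import numpy as np

# Placeholder data: targets generated by a linear function plus a little noise.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])   # x_0 = 1 carries the bias weight w_0
true_w = np.array([1.0, 2.0, -3.0])
t = X @ true_w + rng.normal(scale=0.1, size=50)

w = np.zeros(3)
eta = 0.01
for _ in range(500):
    o = X @ w                     # linear unit output o = w . x
    w += eta * X.T @ (t - o)      # gradient step on E[w] = 1/2 * sum_d (t_d - o_d)^2
print(w)                          # close to [1, 2, -3]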

51 Feed-forward networks Back-Propagation Activation Functions

52 (Figure: a feed-forward network with inputs x_1 ... x_5 feeding unit x_k.)

53 In our example E becomes E[w] = 1/2 Σ_{d∈D} Σ_{k∈outputs} (t_kd - o_kd)².
E[w] is differentiable given that f is differentiable, so gradient descent can be applied.
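
A small backpropagation sketch with sigmoid units trained on XOR (tying back to the perceptron limitation on slide 49); the layer sizes, learning rate and iteration count are arbitrary choices:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)     # XOR targets

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)       # input -> hidden layer (8 units)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)       # hidden -> output unit
eta = 0.5

for _ in range(20000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    o = sigmoid(h @ W2 + b2)
    # Backward pass: deltas for the squared error E = 1/2 * sum (t - o)^2
    delta_o = (o - t) * o * (1 - o)
    delta_h = (delta_o @ W2.T) * h * (1 - h)
    # Gradient-descent weight updates
    W2 -= eta * h.T @ delta_o
    b2 -= eta * delta_o.sum(axis=0)
    W1 -= eta * X.T @ delta_h
    b1 -= eta * delta_h.sum(axis=0)

print(o.round(2).ravel())   # should be close to [0, 1, 1, 0] after training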

54 RBF-network

55 RBF-networks Support Vector Machines

56 Extension to Non-linear Decision Boundary
Map the input space to a feature space with a transformation φ(·).
Possible problems of the transformation: high computation burden, and it is hard to get a good estimate.
SVM solves these two issues simultaneously: kernel tricks give efficient computation, and minimizing ||w||² can lead to a "good" classifier.
(Figure: the mapping φ(·) from input space to feature space.)
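
A small illustration of the kernel trick: a degree-2 polynomial kernel equals a dot product in an explicitly transformed feature space, so the mapping φ never has to be computed. The feature map below is the standard one for this particular kernel; the two vectors are placeholders:

import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D input (quadratic, cross and linear terms)."""
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2, 1.0])

def poly_kernel(x, z):
    """K(x, z) = (x . z + 1)^2, computed in input space only."""
    return (np.dot(x, z) + 1.0) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(np.dot(phi(x), phi(z)))   # 4.0
print(poly_kernel(x, z))        # 4.0 -- same value, without ever mapping to feature space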

57 Machine Learning, Tom M. Mitchell, McGraw Hill, 1997
Simon Haykin, Neural Networks, Second Edition, Prentice Hall, 1999

