
1 Overview

2 Apriori Algorithm
Support is 50% (2/4); confidence is 66.67% (2/3).
TX1: Shoes, Socks, Tie
TX2: Shoes, Socks, Tie, Belt, Shirt
TX3: Shoes, Tie
TX4: Shoes, Socks, Belt
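
A minimal Python check of these numbers (the slide does not name the rule; the counts are consistent with Socks ⇒ Tie, which is assumed here):

transactions = [
    {"Shoes", "Socks", "Tie"},                       # TX1
    {"Shoes", "Socks", "Tie", "Belt", "Shirt"},      # TX2
    {"Shoes", "Tie"},                                # TX3
    {"Shoes", "Socks", "Belt"},                      # TX4
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Assuming the rule shown is Socks => Tie:
print(support({"Socks", "Tie"}))                        # 0.5 -> support 50% (2/4)
print(support({"Socks", "Tie"}) / support({"Socks"}))   # 0.666... -> confidence 66.67% (2/3)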

3 Example
Five transactions from a supermarket:
TID 1: Beer, Diaper, Baby Powder, Bread, Umbrella
TID 2: Diaper, Baby Powder
TID 3: Beer, Diaper, Milk
TID 4: Diaper, Beer, Detergent
TID 5: Beer, Milk, Coca-Cola

4 Step 1
Min_sup = 40% (2/5). C1 → L1:
C1 (candidate 1-itemsets):
Beer: 4/5
Diaper: 4/5
Baby Powder: 2/5
Bread: 1/5
Umbrella: 1/5
Milk: 2/5
Detergent: 1/5
Coca-Cola: 1/5
L1 (frequent 1-itemsets):
Beer: 4/5
Diaper: 4/5
Baby Powder: 2/5
Milk: 2/5

5 Step 2 and Step 3
C2 → L2:
C2 (candidate 2-itemsets):
Beer, Diaper: 3/5
Beer, Baby Powder: 1/5
Beer, Milk: 2/5
Diaper, Baby Powder: 2/5
Diaper, Milk: 1/5
Baby Powder, Milk: 0
L2 (frequent 2-itemsets):
Beer, Diaper: 3/5
Beer, Milk: 2/5
Diaper, Baby Powder: 2/5

6 Step 4
Min_sup = 40% (2/5). C3 → empty:
Beer, Diaper, Baby Powder: 1/5
Beer, Diaper, Milk: 1/5
Beer, Milk, Baby Powder: 0
Diaper, Baby Powder, Milk: 0

7 Step 5
min_sup = 40%, min_conf = 70%
Rule A → B: Support(A,B) / Support(A) / Confidence
Beer → Diaper: 60% / 80% / 75%
Beer → Milk: 40% / 80% / 50%
Diaper → Baby Powder: 40% / 80% / 50%
Diaper → Beer: 60% / 80% / 75%
Milk → Beer: 40% / 40% / 100%
Baby Powder → Diaper: 40% / 40% / 100%

8 Results
Rules satisfying min_sup = 40% and min_conf = 70% (from the table in Step 5):
Beer → Diaper (support 60%, confidence 75%)
Diaper → Beer (support 60%, confidence 75%)
Milk → Beer (support 40%, confidence 100%)
Baby Powder → Diaper (support 40%, confidence 100%)
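
A compact Apriori sketch in plain Python that reproduces L1, L2 and the strong rules above; the candidate-generation and rule-enumeration code is a generic rendering of the steps, not code from the slides:

from itertools import combinations

transactions = [
    {"Beer", "Diaper", "Baby Powder", "Bread", "Umbrella"},
    {"Diaper", "Baby Powder"},
    {"Beer", "Diaper", "Milk"},
    {"Diaper", "Beer", "Detergent"},
    {"Beer", "Milk", "Coca-Cola"},
]
MIN_SUP, MIN_CONF, N = 0.4, 0.7, len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / N

# Step 1: frequent single items (L1)
items = sorted({i for t in transactions for i in t})
L = [frozenset([i]) for i in items if support({i}) >= MIN_SUP]
frequent = list(L)
k = 2
while L:
    # Candidate generation: unions of frequent itemsets that have size k
    candidates = {a | b for a in L for b in L if len(a | b) == k}
    L = [c for c in candidates if support(c) >= MIN_SUP]
    frequent += L
    k += 1

# Rule generation from frequent itemsets of size >= 2
for itemset in (f for f in frequent if len(f) > 1):
    for r in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(itemset, r)):
            conf = support(itemset) / support(lhs)
            if conf >= MIN_CONF:
                print(set(lhs), "=>", set(itemset - lhs),
                      f"support={support(itemset):.0%} confidence={conf:.0%}")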

9 Construct FP-tree from a Transaction Database
min_support = 3
TID 100: items bought {f, a, c, d, g, i, m, p} → (ordered) frequent items {f, c, a, m, p}
TID 200: {a, b, c, f, l, m, o} → {f, c, a, b, m}
TID 300: {b, f, h, j, o, w} → {f, b}
TID 400: {b, c, k, s, p} → {c, b, p}
TID 500: {a, f, c, e, l, p, m, n} → {f, c, a, m, p}
1. Scan the DB once, find frequent 1-itemsets (single-item patterns)
2. Sort frequent items in descending frequency order: F-list = f-c-a-b-m-p
3. Scan the DB again, construct the FP-tree
Header table (item: frequency, each with a head link into the tree): f: 4, c: 4, a: 3, b: 3, m: 3, p: 3
(Figure: the resulting FP-tree, root {}, with paths f:4 → c:3 → a:3 → m:2 → p:2, f:4 → c:3 → a:3 → b:1 → m:1, f:4 → b:1 and c:1 → b:1 → p:1.)
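
A sketch of steps 1-3 in Python; the Node class and the header-table dict are assumptions about the representation, not taken from the slide:

from collections import Counter, defaultdict

transactions = [
    list("facdgimp"), list("abcflmo"), list("bfhjow"),
    list("bcksp"),    list("afcelpmn"),
]
MIN_SUPPORT = 3

# Step 1: scan once for frequent single items; Step 2: sort them into the f-list.
counts = Counter(i for t in transactions for i in t)
flist = sorted((i for i, c in counts.items() if c >= MIN_SUPPORT),
               key=lambda i: (-counts[i], i))          # ['f', 'c', 'a', 'b', 'm', 'p']

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 1, {}

root = Node(None, None)
header = defaultdict(list)          # item -> list of its nodes (the node-links)

# Step 3: scan again, insert each transaction's frequent items in f-list order.
for t in transactions:
    ordered = [i for i in flist if i in t]
    node = root
    for item in ordered:
        if item in node.children:
            node.children[item].count += 1
        else:
            node.children[item] = Node(item, node)
            header[item].append(node.children[item])
        node = node.children[item]

print(root.children["f"].count)     # 4, matching the f:4 branch in the figure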

10 Find Patterns Having p From p-conditional Database
Starting at the frequent-item header table in the FP-tree
Traverse the FP-tree by following the link of each frequent item p
Accumulate all transformed prefix paths of item p to form p's conditional pattern base
Conditional pattern bases (item: conditional pattern base):
c: f:3
a: fc:3
b: fca:1, f:1, c:1
m: fca:2, fcab:1
p: fcam:2, cb:1
(Figure: the FP-tree and header table repeated from slide 9.)

11 From Conditional Pattern-bases to Conditional FP-trees
For each pattern-base:
Accumulate the count for each item in the base
Construct the FP-tree for the frequent items of the pattern base
m-conditional pattern base: fca:2, fcab:1
m-conditional FP-tree: {} → f:3 → c:3 → a:3
All frequent patterns relating to m: m, fm, cm, am, fcm, fam, cam, fcam → associations
(Figure: the full FP-tree and header table repeated from slide 9.)
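
A small check of the pattern enumeration for m; the prefix-path counts fca:2 and fcab:1 are taken from the slide, while the enumeration code itself is a sketch:

from itertools import combinations
from collections import Counter

# m's conditional pattern base, as on the slide: fca occurs twice, fcab once.
cond_base = [("fca", 2), ("fcab", 1)]

# Count items in the base and keep those reaching min_support = 3.
counts = Counter()
for path, n in cond_base:
    for item in path:
        counts[item] += n
frequent = sorted(i for i, c in counts.items() if c >= 3)   # ['a', 'c', 'f']

# Every subset of {f, c, a} combined with m is a frequent pattern.
patterns = ["m"] + ["".join(sorted(s)) + "m"
                    for r in range(1, len(frequent) + 1)
                    for s in combinations(frequent, r)]
print(patterns)   # the same eight patterns as the slide, up to letter ordering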

12 The Data Warehouse Toolkit, Ralph Kimball and Margy Ross, 2nd ed., 2002

13 k-means Clustering
Cluster centers c_1, c_2, ..., c_k with clusters C_1, C_2, ..., C_k

14 Error
The error function is the sum of squared distances of each point to its assigned center. It has a local minimum when each center c_j is the mean (centroid) of its cluster C_j.

15 k-means Example (K=2)
Pick seeds → reassign clusters → compute centroids → reassign clusters → compute centroids → reassign clusters → converged!
(Figure: the iterations illustrated on a 2-D point set, with x marking the centroids.)

16 Algorithm
Random initialization of k cluster centers
do {
  - assign each x_i in the dataset to the nearest cluster center (centroid) c_j according to d²
  - compute all new cluster centers
} until ( |E_new - E_old| < ε or number of iterations ≥ max_iterations )
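
A minimal NumPy sketch of this loop; the two-blob dataset and k = 2 are placeholder choices:

import numpy as np

def kmeans(X, k, max_iterations=100, eps=1e-6, rng=np.random.default_rng(0)):
    # Random initialization of k cluster centers (sampled from the data points).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    old_error = np.inf
    for _ in range(max_iterations):
        # Assign each x_i to the nearest center c_j according to squared distance d^2.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        error = d2[np.arange(len(X)), labels].sum()
        # Compute all new cluster centers as the mean of their assigned points.
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if abs(old_error - error) < eps:          # |E_new - E_old| < eps
            break
        old_error = error
    return centers, labels

X = np.vstack([np.random.default_rng(1).normal(0, 1, (20, 2)),
               np.random.default_rng(2).normal(5, 1, (20, 2))])
centers, labels = kmeans(X, k=2)
print(centers)   # roughly one center near (0, 0) and one near (5, 5)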

17 k-Means vs Mixture of Gaussians
Both are iterative algorithms that assign points to clusters.
k-means: minimize the sum of squared distances to the cluster centers.
Mixture of Gaussians: maximize the likelihood P(x|C=i).
Mixture of Gaussians is the more general formulation; it is equivalent to k-means when the covariance matrices are the identity, Σ_i = I.

18 Tree Clustering
Tree clustering algorithms allow us to reveal the internal similarities of a given pattern set and to structure these similarities hierarchically. They are typically applied to a small set of typical patterns. For n patterns these algorithms generate a sequence of 1 to n clusters.

19 Example
Similarity between two clusters is assessed by measuring the similarity of the furthest pair of patterns (one from each cluster). This is the so-called complete-linkage rule.

20 Impact of cluster distance measures
"Single-link": inter-cluster distance = distance between the closest pair of points (one from each cluster)
"Complete-link": inter-cluster distance = distance between the farthest pair of points
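
A sketch of the two inter-cluster distance measures in NumPy; the two small point sets are placeholders:

import numpy as np

def pairwise_distances(A, B):
    """All Euclidean distances between points of cluster A and points of cluster B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def single_link(A, B):
    # Distance between the closest pair of points, one from each cluster.
    return pairwise_distances(A, B).min()

def complete_link(A, B):
    # Distance between the farthest pair of points, one from each cluster.
    return pairwise_distances(A, B).max()

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[3.0, 0.0], [6.0, 0.0]])
print(single_link(A, B))    # 2.0
print(complete_link(A, B))  # 6.0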

21 There are two criteria proposed for clustering evaluation and selection of an optimal clustering scheme (Berry and Linoff, 1996):
Compactness: the members of each cluster should be as close to each other as possible. A common measure of compactness is the variance, which should be minimized.
Separation: the clusters themselves should be widely spaced.

22 Dunn index
The Dunn index is the ratio of the smallest inter-cluster distance to the largest cluster diameter; larger values indicate compact, well-separated clusters.

23 The Davies-Bouldin (DB) index (1979)
The DB index averages, over all clusters, the worst-case ratio (s_i + s_j) / d(c_i, c_j), where s_i is the average distance of cluster i's members to its centroid and d(c_i, c_j) is the distance between centroids; smaller values are better.
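
A sketch of both indices under common textbook definitions (assumed here: Dunn = minimum inter-cluster distance over maximum cluster diameter, DB with average within-cluster distance to the centroid); the example clusters are placeholders:

import numpy as np

def dunn_index(clusters):
    """min inter-cluster distance / max intra-cluster diameter (larger is better)."""
    inter = min(np.linalg.norm(a - b)
                for i, A in enumerate(clusters) for j, B in enumerate(clusters)
                if i < j for a in A for b in B)
    diam = max(np.linalg.norm(a - b) for C in clusters for a in C for b in C)
    return inter / diam

def davies_bouldin(clusters):
    """(1/k) * sum_i max_{j != i} (s_i + s_j) / d(c_i, c_j) (smaller is better)."""
    centroids = [C.mean(axis=0) for C in clusters]
    scatter = [np.mean(np.linalg.norm(C - c, axis=1)) for C, c in zip(clusters, centroids)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
                     for j in range(k) if j != i)
    return total / k

clusters = [np.array([[0.0, 0.0], [1.0, 0.0]]), np.array([[5.0, 0.0], [6.0, 0.0]])]
print(dunn_index(clusters), davies_bouldin(clusters))   # 4.0 0.2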

24 Pattern Classification (2nd ed.), Richard O. Duda, Peter E. Hart, and David G. Stork, Wiley Interscience, 2001
Pattern Recognition: Concepts, Methods and Applications, Joaquim P. Marques de Sá, Springer-Verlag, 2001

25 3-Nearest Neighbors
(Figure: a query point q and its 3 nearest neighbors, two of class x and one of class o, so q is assigned to class x.)
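
A minimal 3-NN sketch; the training points, labels and query are placeholders chosen to mirror the figure's outcome (two x neighbors against one o):

import numpy as np
from collections import Counter

def knn_classify(query, points, labels, k=3):
    # Sort training points by distance to the query and vote among the k nearest.
    distances = np.linalg.norm(points - query, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

points = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.5], [3.0, 3.0], [2.8, 3.1]])
labels = ["x", "x", "o", "o", "o"]
print(knn_classify(np.array([1.0, 1.2]), points, labels))   # 'x' (2 x's, 1 o among the 3 nearest)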

26 Machine Learning, Tom M. Mitchell, McGraw Hill, 1997

27 Bayes Naive Bayes

28 Example
Does the patient have cancer or not? A patient takes a lab test and the result comes back positive. The test returns a correct positive result (+) in only 98% of the cases in which the disease is actually present, and a correct negative result (-) in only 97% of the cases in which the disease is not present. Furthermore, only 0.008 of the entire population has this cancer.

29 Suppose a positive result (+) is returned...
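
Filling in the computation the slide presumably showed, using Bayes' rule with the numbers from the previous slide:

# P(cancer) = 0.008, P(+|cancer) = 0.98, P(+|not cancer) = 1 - 0.97 = 0.03
p_cancer, p_pos_given_cancer, p_pos_given_healthy = 0.008, 0.98, 0.03

num_cancer  = p_pos_given_cancer * p_cancer              # 0.98 * 0.008  = 0.00784
num_healthy = p_pos_given_healthy * (1 - p_cancer)       # 0.03 * 0.992  = 0.02976
p_cancer_given_pos = num_cancer / (num_cancer + num_healthy)   # normalization
print(p_cancer_given_pos)                                # ~0.21

So even after a positive test the posterior probability of cancer is only about 21%, because the prior is so small.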

30 Normalization The result of Bayesian inference depends strongly on the prior probabilities, which must be available in order to apply the method

31 Belief Networks
Burglary: P(B) = 0.001
Earthquake: P(E) = 0.002
Alarm: P(A | Burglary, Earthquake):
  B=t, E=t: 0.95
  B=t, E=f: 0.94
  B=f, E=t: 0.29
  B=f, E=f: 0.001
JohnCalls: P(J | Alarm): A=t: 0.90, A=f: 0.05
MaryCalls: P(M | Alarm): A=t: 0.70, A=f: 0.01

32 Full Joint Distribution
Each entry of the full joint distribution is the product of the conditional probabilities along the network: P(j, m, a, b, e) = P(j|a) P(m|a) P(a|b,e) P(b) P(e).

33 P(Burglary | JohnCalls=true, MaryCalls=true)
The hidden variables of the query are Earthquake and Alarm.
For Burglary = true in the Bayesian network:

34 P(b) is constant and can be moved outside the summations, and the P(e) term can be moved outside the summation over a. Given JohnCalls = true and MaryCalls = true, the probability that a burglary has occurred is about 28%.

35 Computation for Burglary=true
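
A sketch of the enumeration in Python, using the CPT values from slide 31; the function layout is mine, not the book's:

P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls=true | Alarm)

def unnormalized(b):
    """P(b) * sum_e P(e) * sum_a P(a|b,e) * P(j|a) * P(m|a), for j = m = true."""
    pb = P_B if b else 1 - P_B
    total = 0.0
    for e in (True, False):
        pe = P_E if e else 1 - P_E
        for a in (True, False):
            pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            total += pe * pa * P_J[a] * P_M[a]
    return pb * total

alpha = unnormalized(True) + unnormalized(False)
print(unnormalized(True) / alpha)   # ~0.284, i.e. about 28%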

36 Artificial Intelligence - A Modern Approach, Second Edition, S. Russell and P. Norvig, Prentice Hall, 2003

37 ID3 - Tree learning

38

39 The credit history loan table has the following information:
p(risk is high) = 6/14
p(risk is moderate) = 3/14
p(risk is low) = 5/14

40 In the credit history loan table we make income the property tested at the root. This divides the examples into C_1 = {1, 4, 7, 11}, C_2 = {2, 3, 12, 14}, C_3 = {5, 6, 8, 9, 10, 13}.

41 gain(income) = I(credit_table) - E(income) = 1.531 - 0.564 = 0.967 bits
gain(credit history) = 0.266
gain(debt) = 0.581
gain(collateral) = 0.756
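
A quick check of these figures from the class counts on slide 39; E(income) = 0.564 is taken from the slide, since recomputing it would need the per-example risk labels:

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

i_credit_table = entropy([6, 3, 5])          # risk high / moderate / low out of 14
print(round(i_credit_table, 3))              # 1.531 bits

e_income = 0.564                             # weighted entropy of C_1, C_2, C_3 (from the slide)
print(round(i_credit_table - e_income, 3))   # gain(income) ~ 0.967 bits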

42

43 Overfitting
Consider the error of hypothesis h over
the training data: error_train(h)
the entire distribution D of data: error_D(h)
Hypothesis h ∈ H overfits the training data if there is an alternative hypothesis h' ∈ H such that error_train(h) < error_train(h') and error_D(h) > error_D(h').

44 An ID3 tree consistent with the data
(Figure: decision tree)
Hair Color?
  Blond → Lotion Used?
    No → Sunburned (Sarah, Annie)
    Yes → Not Sunburned (Dana, Katie)
  Red → Sunburned (Emily)
  Brown → Not Sunburned (Alex, Pete, John)

45 Corresponding rules by C4.5
If the person's hair is blond and the person uses lotion, then nothing happens.
If the person's hair color is blond and the person uses no lotion, then the person turns red.
If the person's hair color is red, then the person turns red.
If the person's hair color is brown, then nothing happens.

46 Default rule
If the person uses lotion, then nothing happens.
If the person's hair color is brown, then nothing happens.
If no other rule applies, then the person turns red.

47 Artificial Intelligence, Patrick Henry Winston, Addison-Wesley, 1992
Artificial Intelligence - Structures and Strategies for Complex Problem Solving, Second Edition, G. L. Luger and W. A. Stubblefield, Benjamin/Cummings Publishing, 1993
Machine Learning, Tom M. Mitchell, McGraw Hill, 1997

48 Perceptron Limitations Gradient descent

49 XOR problem and Perceptron
Shown by Minsky and Papert in the mid-1960s.

50 Gradient Descent
To understand, consider a simpler linear unit, where o = w_0 + w_1 x_1 + ... + w_n x_n.
Let's learn the w_i that minimize the squared error E[w] = 1/2 Σ_{d∈D} (t_d - o_d)² over the training set D = {(x_1, t_1), (x_2, t_2), ..., (x_d, t_d), ..., (x_m, t_m)} (t for target).
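
A sketch of batch gradient descent for this linear unit; the training data are placeholders, and the update w_i ← w_i + η Σ_d (t_d - o_d) x_id follows from differentiating the squared error above:

import numpy as np

# Placeholder data: targets generated by a linear function plus a little noise.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])   # x_0 = 1 carries the bias weight w_0
true_w = np.array([1.0, 2.0, -3.0])
t = X @ true_w + rng.normal(scale=0.1, size=50)

w = np.zeros(3)
eta = 0.01
for _ in range(500):
    o = X @ w                     # linear unit output o = w . x
    w += eta * X.T @ (t - o)      # gradient step on E[w] = 1/2 * sum_d (t_d - o_d)^2
print(w)                          # close to [1, 2, -3]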

51 Feed-forward networks Back-Propagation Activation Functions

52 (Figure: a feed-forward network with inputs x_1 ... x_5 feeding unit x_k.)

53 In our example E becomes E[w] = 1/2 Σ_{d∈D} Σ_{k∈outputs} (t_kd - o_kd)².
E[w] is differentiable given that f is differentiable, so gradient descent can be applied.
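
A small backpropagation sketch with sigmoid units trained on XOR (tying back to the perceptron limitation on slide 49); the layer sizes, learning rate and iteration count are arbitrary choices:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)     # XOR targets

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)       # input -> hidden layer (8 units)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)       # hidden -> output unit
eta = 0.5

for _ in range(20000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    o = sigmoid(h @ W2 + b2)
    # Backward pass: deltas for the squared error E = 1/2 * sum (t - o)^2
    delta_o = (o - t) * o * (1 - o)
    delta_h = (delta_o @ W2.T) * h * (1 - h)
    # Gradient-descent weight updates
    W2 -= eta * h.T @ delta_o
    b2 -= eta * delta_o.sum(axis=0)
    W1 -= eta * X.T @ delta_h
    b1 -= eta * delta_h.sum(axis=0)

print(o.round(2).ravel())   # should be close to [0, 1, 1, 0] after training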

54 RBF-network

55 RBF-networks Support Vector Machines

56 Extension to Non-linear Decision Boundary
Map the input space to a feature space with a transformation φ(·).
Possible problems of the transformation: high computation burden, and it is hard to get a good estimate.
SVM solves these two issues simultaneously: kernel tricks give efficient computation, and minimizing ||w||² can lead to a "good" classifier.
(Figure: the mapping φ(·) from input space to feature space.)
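
A small illustration of the kernel trick: a degree-2 polynomial kernel equals a dot product in an explicitly transformed feature space, so the mapping φ never has to be computed. The feature map below is the standard one for this particular kernel; the two vectors are placeholders:

import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D input (quadratic, cross and linear terms)."""
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2, 1.0])

def poly_kernel(x, z):
    """K(x, z) = (x . z + 1)^2, computed in input space only."""
    return (np.dot(x, z) + 1.0) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(np.dot(phi(x), phi(z)))   # 4.0
print(poly_kernel(x, z))        # 4.0 -- same value, without ever mapping to feature space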

57 Machine Learning, Tom M. Mitchell, McGraw Hill, 1997
Simon Haykin, Neural Networks, Second Edition, Prentice Hall, 1999

