Overview
Apriori Algorithm

For example, for the rule Socks ⇒ Tie:
Support = 50% (2/4), Confidence = 66.67% (2/3)

TID  Items
TX1  Shoes, Socks, Tie
TX2  Shoes, Socks, Tie, Belt, Shirt
TX3  Shoes, Tie
TX4  Shoes, Socks, Belt
Example
Five transactions from a supermarket (fralda = diaper, from the original Portuguese)

TID  Items
1    Beer, Diaper, Baby Powder, Bread, Umbrella
2    Diaper, Baby Powder
3    Beer, Diaper, Milk
4    Diaper, Beer, Detergent
5    Beer, Milk, Coca-Cola
Step 1
min_sup = 40% (2/5)

C1 (candidate 1-itemsets):
Item         Support
Beer         4/5
Diaper       4/5
Baby Powder  2/5
Bread        1/5
Umbrella     1/5
Milk         2/5
Detergent    1/5
Coca-Cola    1/5

L1 (frequent 1-itemsets):
Item         Support
Beer         4/5
Diaper       4/5
Baby Powder  2/5
Milk         2/5
Step 2 and Step 3

C2 (candidate 2-itemsets):
Itemset              Support
Beer, Diaper         3/5
Beer, Baby Powder    1/5
Beer, Milk           2/5
Diaper, Baby Powder  2/5
Diaper, Milk         1/5
Baby Powder, Milk    0

L2 (frequent 2-itemsets):
Itemset              Support
Beer, Diaper         3/5
Beer, Milk           2/5
Diaper, Baby Powder  2/5
Step 4
min_sup = 40% (2/5); no candidate in C3 is frequent, so L3 is empty

C3 (candidate 3-itemsets):
Itemset                    Support
Beer, Diaper, Baby Powder  1/5
Beer, Diaper, Milk         1/5
Beer, Milk, Baby Powder    0
Diaper, Baby Powder, Milk  0
Step 5
min_sup = 40%, min_conf = 70%

Rule                  Support(A, B)  Support(A)  Confidence
Beer ⇒ Diaper         60%            80%         75%
Beer ⇒ Milk           40%            80%         50%
Diaper ⇒ Baby Powder  40%            80%         50%
Diaper ⇒ Beer         60%            80%         75%
Milk ⇒ Beer           40%            40%         100%
Baby Powder ⇒ Diaper  40%            40%         100%
Results
Rules satisfying min_sup = 40% and min_conf = 70%:
Beer ⇒ Diaper and Diaper ⇒ Beer (support 60%, confidence 75%)
Milk ⇒ Beer and Baby Powder ⇒ Diaper (support 40%, confidence 100%)
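The Apriori steps above can be sketched in Python. This is a minimal illustration over the five supermarket transactions, not an optimized implementation; the function and variable names are our own.

```python
from itertools import combinations

# The five transactions from the example (min_sup = 40%, i.e. 2 of 5).
transactions = [
    {"Beer", "Diaper", "Baby Powder", "Bread", "Umbrella"},
    {"Diaper", "Baby Powder"},
    {"Beer", "Diaper", "Milk"},
    {"Diaper", "Beer", "Detergent"},
    {"Beer", "Milk", "Coca-Cola"},
]

def support(itemset):
    # fraction of transactions containing every item of the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(min_sup=0.4):
    items = sorted({i for t in transactions for i in t})
    frequent, k_sets = {}, [frozenset([i]) for i in items]
    while k_sets:
        # pruning step: keep only candidates meeting min_sup
        survivors = [s for s in k_sets if support(s) >= min_sup]
        frequent.update({s: support(s) for s in survivors})
        # join step: generate (k+1)-candidates from surviving k-itemsets
        k_sets = list({a | b for a in survivors for b in survivors
                       if len(a | b) == len(a) + 1})
    return frequent

freq = apriori()
# Rule generation: A -> B with confidence = support(A ∪ B) / support(A)
for s in freq:
    for r in range(1, len(s)):
        for a in map(frozenset, combinations(s, r)):
            conf = freq[s] / support(a)
            if conf >= 0.7:
                print(set(a), "->", set(s - a), f"conf={conf:.0%}")
```

Running this recovers exactly the four rules listed in the results: Beer ⇒ Diaper and Diaper ⇒ Beer at 75% confidence, Milk ⇒ Beer and Baby Powder ⇒ Diaper at 100%.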
Construct FP-tree from a Transaction Database

min_support = 3

TID  Items bought               (ordered) frequent items
100  f, a, c, d, g, i, m, p     f, c, a, m, p
200  a, b, c, f, l, m, o        f, c, a, b, m
300  b, f, h, j, o, w           f, b
400  b, c, k, s, p              c, b, p
500  a, f, c, e, l, p, m, n     f, c, a, m, p

1. Scan DB once, find frequent 1-itemsets (single-item patterns)
2. Sort frequent items in frequency-descending order: the f-list
3. Scan DB again, construct the FP-tree

F-list = f-c-a-b-m-p

Header table (item, frequency): f:4, c:4, a:3, b:3, m:3, p:3

Resulting FP-tree:
{}
├── f:4
│   ├── c:3
│   │   └── a:3
│   │       ├── m:2
│   │       │   └── p:2
│   │       └── b:1
│   │           └── m:1
│   └── b:1
└── c:1
    └── b:1
        └── p:1
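Steps 1-3 above can be sketched as follows. One pass counts the items and builds the f-list, a second pass re-orders each transaction; ties in frequency can be broken arbitrarily, so the code pins them to the slide's F-list order (an assumption made explicit below).

```python
from collections import Counter

# The transaction database from the slide (min_support = 3).
db = {
    100: ["f", "a", "c", "d", "g", "i", "m", "p"],
    200: ["a", "b", "c", "f", "l", "m", "o"],
    300: ["b", "f", "h", "j", "o", "w"],
    400: ["b", "c", "k", "s", "p"],
    500: ["a", "f", "c", "e", "l", "p", "m", "n"],
}

# Step 1: scan once, count item frequencies
counts = Counter(i for items in db.values() for i in items)

# Step 2: frequency-descending f-list. Ties in count can be broken
# arbitrarily; we pin them to the slide's convention F-list = f-c-a-b-m-p.
tie_order = "fcabmp"
f_list = sorted((i for i, c in counts.items() if c >= 3),
                key=lambda i: (-counts[i], tie_order.index(i)))

# Step 3 (first half): re-order each transaction's frequent items by the f-list;
# inserting these ordered lists into a prefix tree yields the FP-tree.
ordered = {tid: [i for i in f_list if i in items] for tid, items in db.items()}
print(f_list)        # ['f', 'c', 'a', 'b', 'm', 'p']
print(ordered[100])  # ['f', 'c', 'a', 'm', 'p']
```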
Find Patterns Having p From p-conditional Database

Starting at the frequent-item header table of the FP-tree
Traverse the FP-tree by following the link of each frequent item p
Accumulate all transformed prefix paths of item p to form p's conditional pattern base

Conditional pattern bases:
Item  Conditional pattern base
c     f:3
a     fc:3
b     fca:1, f:1, c:1
m     fca:2, fcab:1
p     fcam:2, cb:1
From Conditional Pattern-bases to Conditional FP-trees

For each pattern base:
Accumulate the count for each item in the base
Construct the FP-tree for the frequent items of the pattern base

m-conditional pattern base: fca:2, fcab:1
m-conditional FP-tree: the single path {} → f:3 → c:3 → a:3 (b is dropped, since its count 1 < min_support)

All frequent patterns containing m:
m, fm, cm, am, fcm, fam, cam, fcam → associations
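Because the m-conditional FP-tree above is a single path f:3-c:3-a:3, every frequent pattern containing m is simply m together with a subset of {f, c, a}. A small sketch of that enumeration:

```python
from itertools import combinations

# Items on the single path of the m-conditional FP-tree, each with count 3.
path = ["f", "c", "a"]
patterns = []
for r in range(len(path) + 1):
    for subset in combinations(path, r):
        # each subset of the path, plus m itself, is a frequent pattern
        patterns.append(frozenset(subset) | {"m"})
print(sorted("".join(sorted(p)) for p in patterns))
# ['acfm', 'acm', 'afm', 'am', 'cfm', 'cm', 'fm', 'm']
```

These are exactly the eight patterns listed on the slide: m, fm, cm, am, fcm, fam, cam, fcam.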
The Data Warehouse Toolkit, Ralph Kimball, Margy Ross, 2nd ed, 2002
k-means Clustering
Cluster centers c1, c2, ..., ck with clusters C1, C2, ..., Ck
Error
The squared-error function is E = Σ j=1..k Σ x∈Cj ||x − cj||²
E has a local minimum when each cluster center cj equals the centroid (mean) of its cluster Cj
k-means Example (k = 2)
Pick seeds
Reassign clusters
Compute centroids
Reassign clusters
Compute centroids
Reassign clusters
Converged!
Algorithm
Random initialization of k cluster centers
do {
  - assign each xi in the dataset to the nearest cluster center (centroid) cj according to d2
  - compute all new cluster centers
} until ( |E_new − E_old| < ε  or  number of iterations ≥ max_iterations )
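The algorithm above can be sketched directly in Python; this is a minimal version for points in the plane (names and the toy data are our own), with d2 the squared Euclidean distance.

```python
import random

def d2(p, q):
    # squared Euclidean distance
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, eps=1e-6, max_iterations=100):
    centers = random.sample(points, k)   # random initialization of k centers
    e_old = float("inf")
    for _ in range(max_iterations):
        # assignment step: each point goes to its nearest center
        clusters = [[] for _ in range(k)]
        for x in points:
            j = min(range(k), key=lambda j: d2(x, centers[j]))
            clusters[j].append(x)
        # update step: recompute each center as the mean of its cluster
        centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
        e_new = sum(d2(x, centers[j]) for j, cl in enumerate(clusters) for x in cl)
        if abs(e_new - e_old) < eps:     # |E_new - E_old| < eps: converged
            break
        e_old = e_new
    return centers, clusters

random.seed(1)
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, 2)
print(sorted(centers))
```

On these two well-separated blobs the centers converge to the blob means (1/3, 1/3) and (31/3, 31/3).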
k-Means vs Mixture of Gaussians
Both are iterative algorithms that assign points to clusters
k-Means: minimize the total squared distance of points to their cluster centers
Mixture of Gaussians: maximize the likelihood P(x|C=i)
The mixture of Gaussians is the more general formulation
It is equivalent to k-Means when every covariance is the identity, Σi = I
Tree Clustering
Tree clustering algorithms allow us to reveal the internal similarities of a given pattern set
and to structure these similarities hierarchically
Usually applied to a small set of typical patterns
For n patterns these algorithms generate a sequence of 1 to n clusters
Example
The similarity between two clusters is assessed by measuring the similarity of the furthest pair of patterns (one from each cluster)
This is the so-called complete-linkage rule
Impact of cluster distance measures
"Single-Link": inter-cluster distance = distance between the closest pair of points
"Complete-Link": inter-cluster distance = distance between the farthest pair of points
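The two rules can be contrasted on a tiny example (the point coordinates below are made up for illustration):

```python
from math import dist  # Euclidean distance (Python 3.8+)

def single_link(A, B):
    # distance between the closest pair of points, one from each cluster
    return min(dist(a, b) for a in A for b in B)

def complete_link(A, B):
    # distance between the farthest pair of points, one from each cluster
    return max(dist(a, b) for a in A for b in B)

A = [(0, 0), (0, 3)]
B = [(4, 0), (4, 3)]
print(single_link(A, B))    # 4.0, e.g. the pair (0,0)-(4,0)
print(complete_link(A, B))  # 5.0, the pair (0,0)-(4,3)
```

Single-link tends to produce elongated, chained clusters; complete-link favors compact ones.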
There are two criteria proposed for clustering evaluation and selection of an optimal clustering scheme (Berry and Linoff, 1996):
Compactness: the members of each cluster should be as close to each other as possible. A common measure of compactness is the variance, which should be minimized.
Separation: the clusters themselves should be widely spaced.
Dunn index
The Davies-Bouldin (DB) index (1979)
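The slides' formulas for the two indices did not survive extraction; the standard definitions are as follows, where d(Ci, Cj) is an inter-cluster distance, diam(Cl) the diameter of cluster Cl, and σi the average distance of the points of cluster i to its centroid ci. A good partition has a large Dunn index and a small DB index.

```latex
% Dunn index (higher is better: compact, well-separated clusters)
D = \min_{1 \le i \le k} \;\; \min_{\substack{1 \le j \le k \\ j \neq i}}
    \left\{ \frac{d(C_i, C_j)}{\max_{1 \le l \le k} \operatorname{diam}(C_l)} \right\}

% Davies-Bouldin index (lower is better)
DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i}
     \frac{\sigma_i + \sigma_j}{d(c_i, c_j)}
```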
Pattern Classification, 2nd ed., Richard O. Duda, Peter E. Hart, and David G. Stork, Wiley-Interscience, 2001
Pattern Recognition: Concepts, Methods and Applications, Joaquim P. Marques de Sá, Springer-Verlag, 2001
3-Nearest Neighbors
For query point q, the 3 nearest neighbors are 2 of class x and 1 of class o, so q is assigned to class x
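A minimal k-NN sketch matching the figure (the sample coordinates are made up so that the query's three nearest neighbours are 2 x's and 1 o):

```python
from collections import Counter
from math import dist

def knn(query, data, k=3):
    # sort labelled points by distance to the query, take the k nearest
    neighbours = sorted(data, key=lambda pl: dist(query, pl[0]))[:k]
    # majority vote over the neighbours' labels
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

data = [((1, 1), "x"), ((2, 1), "x"), ((2.5, 2), "o"),
        ((0, 3), "x"), ((5, 5), "o"), ((6, 5), "o")]
print(knn((1.5, 1.5), data))  # 'x': 2 of the 3 nearest are x, 1 is o
```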
Machine Learning, Tom M. Mitchell, McGraw Hill, 1997
Bayes Naive Bayes
Example
Does the patient have cancer or not?
A patient takes a lab test and the result comes back positive. The test returns a correct positive result (+) in only 98% of the cases in which the disease is actually present, and a correct negative result (−) in only 97% of the cases in which the disease is not present.
Furthermore, only a small fraction of the entire population has this cancer.
Suppose a positive result (+) is returned...
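The posterior follows from Bayes' rule. The prior is missing from the slide; the value 0.008 used below is an assumption taken from Mitchell's version of this example (the textbook cited at the end of this section):

```python
# Bayes' rule for the lab-test example. P(cancer) = 0.008 is an assumed prior
# (from Mitchell's version of this example); 98% / 97% are given above.
p_cancer = 0.008
p_pos_given_cancer = 0.98      # correct positive rate
p_pos_given_healthy = 0.03     # 1 - correct negative rate: 3% false positives

# normalisation term P(+)
p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_healthy * (1 - p_cancer))
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(f"P(cancer | +) = {p_cancer_given_pos:.3f}")   # ~ 0.21
```

Despite the positive test, the posterior probability of cancer is only about 21%, because the disease is so rare.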
Normalization The result of Bayesian inference depends strongly on the prior probabilities, which must be available in order to apply the method
Belief Networks

Burglary:  P(B) = .001        Earthquake:  P(E) = .002

Alarm:
Burg.  Earth.  P(A)
t      t       .95
t      f       .94
f      t       .29
f      f       .001

JohnCalls:            MaryCalls:
A   P(J)              A   P(M)
t   .90               t   .70
f   .05               f   .01
Full Joint Distribution
P(x1, ..., xn) = Π i=1..n P(xi | parents(Xi))
For this network: P(j, m, a, b, e) = P(j | a) P(m | a) P(a | b, e) P(b) P(e)
P(Burglary | JohnCalls = true, MaryCalls = true)
The hidden variables of the query are Earthquake and Alarm
For Burglary = true in the Bayesian network
P(b) is constant and can be moved outside the summations; the P(e) term can be moved outside the summation over a
Given JohnCalls = true and MaryCalls = true, the probability that a burglary has occurred is about 28%
Computation for Burglary=true
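The full computation can be sketched as inference by enumeration: sum the joint distribution over the hidden variables Earthquake and Alarm, for both values of Burglary, then normalize. The CPT values are those from the network above.

```python
from itertools import product

# CPTs of the burglary network (evidence: JohnCalls = MaryCalls = true)
P_b, P_e = 0.001, 0.002
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_j = {True: 0.90, False: 0.05}   # P(JohnCalls = true | Alarm)
P_m = {True: 0.70, False: 0.01}   # P(MaryCalls = true | Alarm)

def joint(b, e, a):
    # P(b) P(e) P(a | b, e) P(j | a) P(m | a), with j = m = true
    pb = P_b if b else 1 - P_b
    pe = P_e if e else 1 - P_e
    pa = P_a[(b, e)] if a else 1 - P_a[(b, e)]
    return pb * pe * pa * P_j[a] * P_m[a]

# sum out the hidden variables Earthquake and Alarm
unnorm = {b: sum(joint(b, e, a) for e, a in product([True, False], repeat=2))
          for b in (True, False)}
posterior = unnorm[True] / (unnorm[True] + unnorm[False])
print(f"P(Burglary | j, m) = {posterior:.3f}")   # 0.284
```

This reproduces the roughly 28% figure quoted above.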
Artificial Intelligence: A Modern Approach, 2nd ed., S. Russell and P. Norvig, Prentice Hall, 2003
ID3 - Tree learning
The credit history loan table has the following class distribution:
p(risk is high) = 6/14
p(risk is moderate) = 3/14
p(risk is low) = 5/14
In the credit history loan table we make income the property tested at the root
This makes the division into C1 = {1, 4, 7, 11}, C2 = {2, 3, 12, 14}, C3 = {5, 6, 8, 9, 10, 13}
gain(income) = I(credit_table) − E(income)
gain(income) = 1.531 − 0.564 = 0.967 bits
gain(credit history) = 0.266
gain(debt) = 0.581
gain(collateral) = 0.756
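The gain computation can be sketched as follows. The overall class distribution (6 high, 3 moderate, 5 low) is given above; the per-partition class counts are an assumption taken from Luger's credit table, which the slides do not reproduce.

```python
from math import log2

def info(counts):
    # entropy I of a class distribution, in bits
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

# whole table: 6 high, 3 moderate, 5 low (from the slide)
i_table = info([6, 3, 5])                  # ~ 1.531 bits

# class counts inside the income partitions C1, C2, C3; these per-partition
# counts are assumed from Luger's table: C1 all high, C2 2 high + 2 moderate,
# C3 1 moderate + 5 low
partitions = [[4], [2, 2], [1, 5]]
n = 14
e_income = sum(sum(p) / n * info(p) for p in partitions)
gain_income = i_table - e_income
print(f"I = {i_table:.3f}, E(income) = {e_income:.3f}, gain = {gain_income:.2f}")
```

This reproduces gain(income) ≈ 0.967 bits, the largest gain, which is why income is tested at the root.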
Overfitting
Consider the error of hypothesis h over
  Training data: error_train(h)
  Entire distribution D of data: error_D(h)
Hypothesis h ∈ H overfits the training data if there is an alternative hypothesis h' ∈ H such that
  error_train(h) < error_train(h')  and  error_D(h) > error_D(h')
An ID3 tree consistent with the data

Hair Color?
├── Blond → Lotion Used?
│   ├── No  → Sunburned (Sarah, Annie)
│   └── Yes → Not Sunburned (Dana, Katie)
├── Red   → Sunburned (Emily)
└── Brown → Not Sunburned (Alex, Pete, John)
Corresponding rules by C4.5
If the person's hair is blond and the person uses lotion, then nothing happens
If the person's hair is blond and the person uses no lotion, then the person turns red
If the person's hair is red, then the person turns red
If the person's hair is brown, then nothing happens
Default rule
If the person uses lotion, then nothing happens
If the person's hair is brown, then nothing happens
If no other rule applies, then the person turns red
Artificial Intelligence, Patrick Henry Winston, Addison-Wesley, 1992
Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 2nd ed., G. F. Luger and W. A. Stubblefield, Benjamin/Cummings, 1993
Machine Learning, Tom M. Mitchell, McGraw Hill, 1997
Perceptron Limitations Gradient descent
XOR problem and Perceptron
Shown by Minsky and Papert in the mid-1960s
Gradient Descent
To understand, consider the simpler linear unit, where o = w0 + w1 x1 + ... + wn xn
Let's learn the wi that minimize the squared error E[w] = ½ Σ d∈D (t_d − o_d)²
over the training set D = {(x1, t1), (x2, t2), ..., (xd, td), ..., (xm, tm)}  (t for target)
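Batch gradient descent for this linear unit can be sketched as follows; the weight update is w_i ← w_i + η Σ_d (t_d − o_d) x_id, the negative gradient of E. The toy data and learning rate are our own.

```python
# Gradient descent for a linear unit o = w . x with squared error
# E[w] = 1/2 * sum_d (t_d - o_d)^2.
def train(data, n_weights, eta=0.05, epochs=500):
    w = [0.0] * n_weights
    for _ in range(epochs):
        grads = [0.0] * n_weights
        for x, t in data:
            o = sum(wi * xi for wi, xi in zip(w, x))   # linear unit output
            for i, xi in enumerate(x):
                grads[i] += (t - o) * xi               # accumulate -dE/dw_i
        w = [wi + eta * g for wi, g in zip(w, grads)]  # batch update
    return w

# toy targets generated by t = 1 + 2*x, with x0 = 1 as the bias input
data = [((1.0, x), 1 + 2 * x) for x in (0.0, 1.0, 2.0, 3.0)]
w = train(data, 2)
print([round(wi, 3) for wi in w])   # converges to [1.0, 2.0]
```

Since the error surface of a linear unit is quadratic with a single global minimum, gradient descent with a small enough η converges to the exact weights.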
Feed-forward networks Back-Propagation Activation Functions
[Figure: feed-forward network with inputs x1, ..., x5 feeding unit xk]
In our example E becomes E[w] = ½ Σ d∈D Σ k∈outputs (t_kd − o_kd)²
E[w] is differentiable given that f is differentiable
Gradient descent can be applied
RBF-network
RBF-networks Support Vector Machines
Extension to Non-linear Decision Boundary
Map the input space to a higher-dimensional feature space via φ: input space → feature space
Possible problems of the transformation: high computational burden, and it is hard to get a good estimate
SVM solves these two issues simultaneously:
Kernel tricks for efficient computation
Minimizing ||w||² can lead to a "good" classifier
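The kernel trick can be illustrated with the quadratic kernel K(x, z) = (x · z)² in two dimensions, whose explicit feature map is φ(x) = (x1², √2·x1·x2, x2²): the kernel evaluates the feature-space inner product without ever computing φ. The sample vectors below are arbitrary.

```python
from math import sqrt, isclose

def kernel(x, z):
    # quadratic kernel: square of the input-space inner product
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    # explicit feature map corresponding to the quadratic kernel in 2-D
    return (x[0] ** 2, sqrt(2) * x[0] * x[1], x[1] ** 2)

x, z = (1.0, 2.0), (3.0, -1.0)
lhs = kernel(x, z)                                  # kernel in input space
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))    # inner product in feature space
print(isclose(lhs, rhs))                            # True: the two agree
```

In an SVM this is what makes training in a high-dimensional (even infinite-dimensional) feature space tractable: only kernel values are ever needed.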
Machine Learning, Tom M. Mitchell, McGraw Hill, 1997
Neural Networks, Simon Haykin, 2nd ed., Prentice Hall, 1999