# G54DMT – Data Mining Techniques and Applications Dr. Jaume Bacardit


G54DMT – Data Mining Techniques and Applications (http://www.cs.nott.ac.uk/~jqb/G54DMT)

Dr. Jaume Bacardit (jqb@cs.nott.ac.uk)

Topic 3: Data Mining. Lecture 5: Regression and Association Rules

Some slides are taken from chapter 5 of *Data Mining: Concepts and Techniques* by Han & Kamber.

## Outline of the lecture

- Regression
  - Definition
  - Representations
- Association rules
  - Definition
  - Methods
- Resources

## Regression

Regression problems are supervised problems in which the output variable is continuous. Many techniques with different names fall into this category:

- Regression
- Function approximation
- Modelling
- Curve-fitting

Given an input vector $X$ and its corresponding output $y$, we want to find a function $f$ such that the prediction $y' = f(X)$ is as close as possible to the true $y$.

## Evaluating regression

Regression is supervised learning: we know the true outputs, so we measure how much they differ from the predicted ones, for example with:

- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
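A minimal sketch of the three measures in plain Python. It assumes the true and predicted outputs arrive as equal-length lists of numbers. The function names are illustrative and do not belong to any particular library.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """MAE = (1/N) * sum_i |y_i - y'_i|"""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """MSE = (1/N) * sum_i (y_i - y'_i)^2"""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """RMSE = sqrt(MSE), expressed in the same units as y"""
    return math.sqrt(mean_squared_error(y_true, y_pred))


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_absolute_error(y_true, y_pred))  # 0.5
print(mean_squared_error(y_true, y_pred))   # 0.375
print(root_mean_squared_error(y_true, y_pred))
```

Because MSE squares each error, the single error of 1.0 accounts for two thirds of the MSE here, while the two errors of 0.5 together account for the rest. MSE penalises large errors more heavily than MAE does.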

## Linear regression

This is the most classic type of regression, and the most widespread in statistics. $f(X)$ is modelled as

$$y' = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$

(Figure: example of a linear regression fit, http://upload.wikimedia.org/wikipedia/en/thumb/1/13/Linear_regression.png/400px-Linear_regression.png)

## Linear regression

Linear regression is simple but has limited expressive power. Essentially the same fitted model is obtained for each of these four very different datasets (Anscombe's quartet, http://en.wikipedia.org/wiki/Anscombe%27s_quartet).
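The point can be checked directly by fitting ordinary least squares to each of the four datasets. The values below are Anscombe's quartet as it is commonly reproduced, for example in R's built-in `anscombe` dataset. The `least_squares_line` helper is a hypothetical name used only for this sketch.

```python
# Anscombe's quartet: datasets I-III share the same x values
x_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x_4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def least_squares_line(xs, ys):
    """Ordinary least squares fit of y' = w0 + w1 * x; returns (w0, w1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    return mean_y - w1 * mean_x, w1


fits = {name: least_squares_line(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (w0, w1) in fits.items():
    print(f"{name}: y' = {w0:.2f} + {w1:.3f} x")
```

All four fits come out at approximately $y' = 3.00 + 0.500\,x$, even though a scatter plot shows a linear trend, a curve, an outlier, and a vertical cluster. The fitted weights alone do not reveal which of these shapes the data has.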

## Linear regression

How do we find the weights $W$?

- Many mathematical methods are available:
  - Least squares
  - Ridge regression
  - Lasso
  - Etc.
- We can also use some kind of metaheuristic (e.g. a Genetic Algorithm).
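For a single input variable, both least squares and ridge regression have a closed-form solution. The sketch below covers both with one hypothetical function, `ridge_line`. It assumes the common convention that the intercept $w_0$ is not penalised. A metaheuristic such as a GA would instead search for weights that minimise the same objective numerically.

```python
def ridge_line(xs, ys, lam=0.0):
    """Fit y' = w0 + w1 * x by minimising sum (y - y')^2 + lam * w1^2.

    lam = 0 gives ordinary least squares. The intercept w0 is left
    unpenalised (a common convention, assumed here).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w1 = sxy / (sxx + lam)    # the penalty shrinks the slope towards 0
    w0 = mean_y - w1 * mean_x  # the line still passes through (mean_x, mean_y)
    return w0, w1


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]           # exactly y = 1 + 2x
print(ridge_line(xs, ys))            # (1.0, 2.0): least squares
print(ridge_line(xs, ys, lam=5.0))   # (2.5, 1.0): slope shrunk by the penalty
```

With more than one input, the same idea leads to solving a linear system rather than a single division.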

## Polynomial regression

Polynomial regression uses more complex and sophisticated functions, for example:

- $y' = w_0 + w_1 x + w_2 x^2 + \dots$
- $y' = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + \dots$

Now the job has two parts:

- Choosing the correct function (human inspection may help)
- Adjusting the weights of the model

Still, would a single mathematical function fit any type of data?
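Once the function has been chosen, adjusting the weights is still a linear least-squares problem, because $y'$ is linear in the $w_i$. A minimal sketch for the single-variable case builds the features $[1, x, x^2, \dots]$ and solves the normal equations $X^\top X\,w = X^\top y$ with Gaussian elimination. The function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix (copy)
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w


def fit_polynomial(xs, ys, degree):
    """Least-squares weights [w0, ..., wd] for y' = w0 + w1 x + ... + wd x^d."""
    n_terms = degree + 1
    rows = [[x ** p for p in range(n_terms)] for x in xs]  # design matrix X
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n_terms)] for i in range(n_terms)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n_terms)]
    return solve_linear_system(a, b)


xs = [0, 1, 2, 3, 4, 5]
ys = [1 + 2 * x + 3 * x * x for x in xs]
print(fit_polynomial(xs, ys, degree=2))  # close to [1, 2, 3]
```

The normal equations are the simplest route to write out by hand. For high degrees they become ill-conditioned, and numerical libraries usually prefer QR or SVD based solvers.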

## Piece-wise regression

The input space is partitioned into regions. A local regression model is generated from the training examples that fall inside each region. Example: approximating a sine function with linear regressions (Butz, 2010).
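A small sketch of the sine example follows. It assumes the simplest possible partition, equal-width intervals of $[0, 2\pi]$, with one least-squares line fitted per interval. This is not how XCSF or a tree learner chooses its regions. The class and helper names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y' = w0 + w1 * x; returns (w0, w1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    return mean_y - w1 * mean_x, w1


class PiecewiseLinear:
    """Splits [lo, hi] into equal-width regions and fits one line per region."""

    def __init__(self, lo, hi, n_regions):
        self.lo, self.n_regions = lo, n_regions
        self.width = (hi - lo) / n_regions
        self.models = []

    def region(self, x):
        # Clamp so that x == hi (or rounding noise) maps to a valid region
        return min(max(int((x - self.lo) / self.width), 0), self.n_regions - 1)

    def fit(self, xs, ys):
        groups = [([], []) for _ in range(self.n_regions)]
        for x, y in zip(xs, ys):
            gx, gy = groups[self.region(x)]
            gx.append(x)
            gy.append(y)
        # Each local model only sees the examples that fell inside its region
        self.models = [fit_line(gx, gy) for gx, gy in groups]
        return self

    def predict(self, x):
        w0, w1 = self.models[self.region(x)]
        return w0 + w1 * x


xs = [2 * math.pi * i / 200 for i in range(200)]
ys = [math.sin(x) for x in xs]


def max_error(model):
    return max(abs(model.predict(x) - y) for x, y in zip(xs, ys))


global_fit = PiecewiseLinear(0.0, 2 * math.pi, 1).fit(xs, ys)
local_fit = PiecewiseLinear(0.0, 2 * math.pi, 8).fit(xs, ys)
print(max_error(global_fit), max_error(local_fit))
```

A single global line misses the sine by roughly 0.5 at its peaks. Eight local lines keep the maximum error well below 0.1.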

## Piece-wise regression

How to partition the input space:

- Using a series of rules
  - With a (hyper)rectangular condition (XCSF)
  - With a (hyper)ellipsoidal condition (XCSF, LWPR)
  - With a neural condition (XCSF)
- Using a tree-like structure (CART, M5)

How to perform the regression for each local approximation:

- Pick any of the functions discussed before
- Plus some truly non-linear methods (SVR)
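Tree-like partitioning can be illustrated with a single CART-style split on one input variable. The sketch scores every threshold between consecutive sorted $x$ values by the total squared error of the two resulting regions, assuming each region predicts its mean. The function names are hypothetical. Real CART implementations add stopping rules and pruning.

```python
def sum_squared_error(values):
    """Squared error of predicting every value with the mean (a constant leaf)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, cost) of the split x <= threshold that minimises
    the total squared error of the two resulting regions."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sum_squared_error(left) + sum_squared_error(right)
        if cost < best_cost:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_cost = cost
    return best_threshold, best_cost


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
threshold, cost = best_split(xs, ys)
print(threshold)  # 3.5
```

Applying `best_split` recursively to each side yields a regression tree. CART keeps a constant in each leaf, whereas M5 fits a linear model there.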

## Piece-wise approximation with hyperellipsoids

This example uses XCSF (Wilson, 02) with hyperellipsoid conditions (Butz et al, 08).

(Figure panels: test function; XCSF's population (Stalph et al, 2010))

## Other regression methods

Neural networks

- An MLP is natively a regression method. Classification is done by discretising the output of the network.
- It has been proven that an MLP with enough hidden nodes can approximate any continuous function (on a bounded domain) to arbitrary accuracy.

Support Vector Regression

- As in SVM, depending on the kernel we get linear or non-linear regression.
- The margin specifies a tube around the approximated function. The errors of all points inside the tube are ignored.
- The support vectors are the points that lie on or outside the tube.
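The tube is usually formalised with the ε-insensitive loss $\max(0, |y - f(x)| - \varepsilon)$, which is zero for any point within ε of the prediction. The sketch below shows only this loss. It does not show the full SVR optimisation. The function names and the sample values are hypothetical.

```python
def epsilon_insensitive_loss(y, y_pred, epsilon):
    """Zero inside the tube |y - y_pred| <= epsilon, growing linearly outside it."""
    return max(0.0, abs(y - y_pred) - epsilon)


def points_outside_tube(ys, preds, epsilon):
    """Indices of examples whose error is not ignored by the loss."""
    return [i for i, (y, p) in enumerate(zip(ys, preds)) if abs(y - p) > epsilon]


ys = [1.0, 2.0, 3.0, 4.0]
preds = [1.05, 2.5, 2.9, 3.0]
eps = 0.2
print([epsilon_insensitive_loss(y, p, eps) for y, p in zip(ys, preds)])
print(points_outside_tube(ys, preds, eps))  # [1, 3]
```

Only the examples that lie on or outside the tube can become support vectors. Widening ε therefore tends to give a sparser model.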

## Association rules

Association rules try to find patterns that frequently appear together in the dataset. They can use the class label but do not have to, so we can consider them an unsupervised learning paradigm. Two types of elements are generated:

- Association rules: they have an antecedent and a consequent.
- Frequent itemsets: they only have an antecedent.

Both antecedent and consequent are logic predicates (generally of conjunctive form).

## Association rules mining

(Figure from Witten and Frank, 2005: http://www.cs.waikato.ac.nz/~eibe/Slides2edRev2.zip)

## Origin of association rules

These methods were originally employed to analyse shopping carts. The database is specified as a set of transactions, each of which includes one or more items from a given set of items. A frequent itemset is a set of items that appears in many transactions. These databases are extremely sparse.

| Tid | Items |
|---|---|
| 10 | A, C, D |
| 20 | B, C, E |
| 30 | A, B, C, E |
| 40 | B, E |
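Because the data are sparse, a natural representation stores each transaction as the set of items it contains rather than as a full row over every possible item. A minimal sketch over the four transactions above follows. The name `support_count` is hypothetical.

```python
# Each transaction stores only the items it contains (a sparse representation)
tdb = {
    10: {"A", "C", "D"},
    20: {"B", "C", "E"},
    30: {"A", "B", "C", "E"},
    40: {"B", "E"},
}


def support_count(itemset, database):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for items in database.values() if set(itemset) <= items)


print(support_count({"B", "E"}, tdb))  # 3 (transactions 20, 30, 40)
print(support_count({"A", "C"}, tdb))  # 2 (transactions 10, 30)
```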

## Beers and diapers

An urban myth about association rules says that, when they were applied to a very large volume of shopping carts, they discovered a very simple pattern: "Customers that buy beer also tend to buy diapers". The story has changed over time, and you can find an article about it here. It is a good example of data mining, because it found an unexpected pattern.

## Why is frequent pattern mining important?

- It discloses an intrinsic and important property of data sets
- It forms the foundation for many essential data mining tasks:
  - Association, correlation, and causality analysis
  - Sequential, structural (e.g., sub-graph) patterns
  - Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
  - Classification: associative classification
  - Cluster analysis: frequent pattern-based clustering
  - Data warehousing: iceberg cube and cube-gradient
  - Semantic data compression: fascicles
- Broad applications

## Evaluation of association rules

- Support
  - Percentage of examples covered by the rule, i.e. matched by both its antecedent and its consequent (for a frequent itemset, by its antecedent alone)
  - Applies to both association rules and frequent itemsets
- Confidence
  - Percentage of the examples matched by the antecedent that also match the consequent
  - Only applies to association rules

Typically, the user specifies a minimum support and confidence, and the algorithm finds all rules above these thresholds.
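Both measures can be computed directly from the four example transactions (Tid 10 to 40). In the sketch, support is a fraction of all transactions and confidence is $\mathrm{support}(A \cup B) / \mathrm{support}(A)$ for a rule $A \Rightarrow B$. The function names are hypothetical.

```python
transactions = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Fraction of the transactions matching the antecedent that also match the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


# Rule {B} => {E}: B and E appear together in 3 of 4 transactions,
# and every transaction containing B also contains E
print(support({"B", "E"}, transactions), confidence({"B"}, {"E"}, transactions))  # 0.75 1.0
# Rule {C} => {A}: 2 of the 3 transactions containing C also contain A
print(support({"A", "C"}, transactions), confidence({"C"}, {"A"}, transactions))
```

With, say, a minimum support of 50% and a minimum confidence of 70% (thresholds chosen only for illustration), the first rule would be kept and the second discarded.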

## Scalable methods for mining frequent patterns

The downward closure property of frequent patterns:

- Any subset of a frequent itemset must be frequent
- If {beer, diaper, nuts} is frequent, so is {beer, diaper}
- i.e., every transaction containing {beer, diaper, nuts} also contains {beer, diaper}

Scalable mining methods, in three major approaches:

- Apriori (Agrawal & Srikant @VLDB'94)
- Frequent pattern growth (FP-growth: Han, Pei & Yin @SIGMOD'00)
- Vertical data format approach (Charm: Zaki & Hsiao @SDM'02)
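The property can be verified exhaustively on the small example database (Tid 10 to 40). The sketch checks every itemset against each of its subsets with one item fewer. By transitivity, that covers all subsets. The function names are hypothetical. The check enumerates every possible itemset, which costs exponential time, so it is only feasible for a handful of items.

```python
from itertools import combinations

transactions = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]


def support_count(itemset, transactions):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


def downward_closure_holds(transactions):
    """True if no itemset has a higher support count than any of its subsets."""
    items = sorted(set().union(*transactions))
    for k in range(2, len(items) + 1):
        for itemset in combinations(items, k):
            s = support_count(itemset, transactions)
            for subset in combinations(itemset, k - 1):
                if support_count(subset, transactions) < s:
                    return False
    return True


print(downward_closure_holds(transactions))  # True
```

This is exactly why pruning is safe. If a subset falls below the minimum support, no superset can reach it.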

## Apriori: a candidate generation-and-test approach

Apriori pruning principle: if an itemset is infrequent, none of its supersets should be generated or tested! (Agrawal & Srikant @VLDB'94, Mannila et al. @KDD'94)

Method:

- Initially, scan the DB once to get the frequent 1-itemsets
- Generate length (k+1) candidate itemsets from length k frequent itemsets
- Test the candidates against the DB
- Terminate when no frequent or candidate set can be generated
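The generation step can be sketched as a join followed by a prune. The join merges pairs of frequent k-itemsets that differ in exactly one item. The prune drops any candidate that has an infrequent k-subset. This union-based join is simpler, but less efficient, than the sorted-prefix join used in the Apriori paper. After pruning it yields the same candidates. The function name is hypothetical.

```python
from itertools import combinations


def generate_candidates(frequent_k):
    """Build candidate (k+1)-itemsets from a set of frequent k-itemsets (frozensets)."""
    frequent_k = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) != len(a) + 1:
                continue  # join step: a and b must differ in exactly one item
            # Prune step: every k-subset of the candidate must itself be frequent
            if all(frozenset(s) in frequent_k for s in combinations(union, len(a))):
                candidates.add(union)
    return candidates


# Frequent 2-itemsets of the example database (minimum support count 2)
L2 = {frozenset(s) for s in ("AC", "BC", "BE", "CE")}
print(generate_candidates(L2))
```

Joining {A, C} with {B, C} produces {A, B, C}, but that candidate is pruned because {A, B} is not frequent. Only {B, C, E} survives.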

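## The Apriori algorithm: an example

Minimum support count: $sup_{min} = 2$.

Database TDB:

| Tid | Items |
|---|---|
| 10 | A, C, D |
| 20 | B, C, E |
| 30 | A, B, C, E |
| 40 | B, E |

1st scan, candidate 1-itemsets $C_1$:

| Itemset | sup |
|---|---|
| {A} | 2 |
| {B} | 3 |
| {C} | 3 |
| {D} | 1 |
| {E} | 3 |

Frequent 1-itemsets $L_1$:

| Itemset | sup |
|---|---|
| {A} | 2 |
| {B} | 3 |
| {C} | 3 |
| {E} | 3 |

Candidate 2-itemsets $C_2$ generated from $L_1$: {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}

2nd scan, supports of $C_2$:

| Itemset | sup |
|---|---|
| {A, B} | 1 |
| {A, C} | 2 |
| {A, E} | 1 |
| {B, C} | 2 |
| {B, E} | 3 |
| {C, E} | 2 |

Frequent 2-itemsets $L_2$:

| Itemset | sup |
|---|---|
| {A, C} | 2 |
| {B, C} | 2 |
| {B, E} | 3 |
| {C, E} | 2 |

Candidate 3-itemsets $C_3$ generated from $L_2$: {B, C, E}

3rd scan, frequent 3-itemsets $L_3$:

| Itemset | sup |
|---|---|
| {B, C, E} | 2 |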

## The Apriori algorithm

Pseudo-code, where $C_k$ is the set of candidate itemsets of size k and $L_k$ is the set of frequent itemsets of size k:

    L_1 = {frequent items};
    for (k = 1; L_k != ∅; k++) do begin
        C_{k+1} = candidates generated from L_k;
        for each transaction t in database do
            increment the count of all candidates in C_{k+1} that are contained in t
        L_{k+1} = candidates in C_{k+1} with min_support
    end
    return ∪_k L_k;
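A runnable Python sketch of the pseudo-code follows. It assumes that `min_support` is an absolute count and that transactions are sets of hashable items. It uses a simple union-based join with downward-closure pruning in place of the original sorted-prefix join. The function name `apriori` is hypothetical and does not refer to any library's API.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # L_1: one scan to count single items
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(level)
    k = 1
    while level:  # stop when L_k is empty
        # C_{k+1}: join L_k with itself, keeping only candidates whose k-subsets are all in L_k
        prev = set(level)
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k + 1 and all(
                    frozenset(s) in prev for s in combinations(union, k)
                ):
                    candidates.add(union)
        # One database scan: count every candidate contained in each transaction
        cand_counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    cand_counts[c] += 1
        # L_{k+1}: candidates that reach the minimum support
        level = {s: c for s, c in cand_counts.items() if c >= min_support}
        result.update(level)
        k += 1
    return result  # the union of all L_k


tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
frequent = apriori(tdb, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On the example database TDB with $sup_{min} = 2$, this should reproduce $L_1$, $L_2$ and $L_3$ from the worked example: nine frequent itemsets, ending with {B, C, E} at support 2.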

## Resources

- "The Elements of Statistical Learning" by Hastie et al. contains a lot of detail about statistical regression
- List of regression and association rules methods in KEEL
- Weka also contains both kinds of methods
- Chapter 5 of the Han and Kamber book is all about association rules (Han created the FP-growth method)
- Review of evolutionary algorithms for association rule mining

Questions?
