Integration of association rules into WUM Bastian Germershaus

contents: introduction; WUM (web usage miner) in general; association rules – a brief example; association rules – the theory; association rules in WUM; association rules in WUM – a demo

the problem: we have a large amount of data (e.g. from the server log of our website); we would like to know whether there are any rules in the behavior of our customers; we could use these rules later on to optimize our business

an example (Amazon.COM, as usual)

what Amazon.COM does: looking at their sales data; retrieving all orders by customer and current item; building rules: if a customer bought book X → he also bought books Y and/or Z

WUM – web usage miner (1): main goal: navigation pattern discovery (sequence of pages through the website, typical patterns, optimization of site navigation); three steps: log file cleaning, pattern analysis, visualization

WUM – web usage miner (2) Source: Myra Spiliopoulou: "Web Usage Mining for Web Site Evaluation" in Communications of the ACM, August 2000, Vol. 43

WUM – web usage miner (3): special requirements: the miner should understand abstract pattern descriptions ('MINT', an SQL-like query language); usage patterns should be more than a sequence of frequently accessed pages (integration of statistics about the routes connecting pages that are frequently accessed together)

WUM – web usage miner (4) Source: Myra Spiliopoulou: "Web Usage Mining for Web Site Evaluation" in Communications of the ACM, August 2000, Vol. 43

WUM – web usage miner (5): evaluation of discovered patterns is needed (statistical testing, semantic evaluation); discovered navigation patterns may help in restructuring the site (redesigning pages, inserting links); restructuring may confuse some users

association rules (1): example
we sell cell-phones, gadgets and accessories
Homepage (H)
  cell-phones (C1)
    Nokia (C21)
      3110 (C211)
      8110 (C212)
    Siemens (C22)
      C35 (C221)
      S 45 (C222)
  gadgets (G1)
    Palm (G11)
      Palm III (G111)
      Palm V (G112)
    Compaq (G12)
      Ipaq (G121)
  accessories (A1)
    Nokia (A11)
      hands-free kit (A111)
    Siemens (A12)
      hands-free kit (A121)

association rules (2): possible association rule: G121 & A111 (30) → C212 (13). Support: 0.065 (6.5 %); confidence: 0.433 (43.3 %). There are 200 different orders in the database; 13 of the 30 users who bought a Compaq Ipaq and a hands-free kit for Nokia phones also bought a Nokia 8110.
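The two figures follow directly from the three counts on the slide (200 orders, 30 with the antecedent, 13 with antecedent and consequent). A minimal check in Python; the variable names are illustrative and not part of WUM:

```python
# Counts taken from the slide; variable names are illustrative.
total_orders = 200       # all orders in the database
antecedent_orders = 30   # orders containing G121 and A111
rule_orders = 13         # of those, orders that also contain C212

# support: share of all orders in which the whole rule holds
support = rule_orders / total_orders
# confidence: share of antecedent orders in which the consequent also holds
confidence = rule_orders / antecedent_orders

print(f"support    = {support:.3f}")     # support    = 0.065
print(f"confidence = {confidence:.3f}")  # confidence = 0.433
```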

association rules (3): the sequence of pages does NOT matter; 'if – then' condition; parameters (support, confidence); useful rules apply reasonably often (→ support), are unusually reliable (→ confidence), and make interesting predictions

association rules in general (1): "if a customer came to our website through a banner and it is not his first visit then he buys an article"; this object has three attributes: came through banner, at least second visit, buys an article

association rules in general (2): binary attributes (0 or 1; yes or no); rules should have the form: if attribute X then attribute Y (X → Y); the attribute sets should be disjoint (X ∩ Y = Ø)

association rules in general (3): parameters for association rules: confidence, e.g. "in 60% of the cases where attribute X is true, attribute Y is also true"; support, e.g. "in 40% of the cases where attribute X is true, attribute Y is also true; that applies to 10% of all cases in the database"
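Written as formulas, the two parameters can be stated as follows. The notation is an assumption and does not come from the slides: D is the set of all objects in the database, t is one object (its set of true attributes), and X, Y are disjoint attribute sets.

```latex
\[
\operatorname{supp}(Z) = \frac{|\{\, t \in D : Z \subseteq t \,\}|}{|D|},
\qquad
\operatorname{supp}(X \rightarrow Y) = \operatorname{supp}(X \cup Y),
\qquad
\operatorname{conf}(X \rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
\]
```

With the shop example, supp = 13/200 and conf = (13/200)/(30/200) = 13/30.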

association rules in general (4): the goal of the Apriori algorithm used here is: "find all rules for which minimum support and minimum confidence hold"; two iterative steps: find 'large item sets' with minimum support; candidate generation and pruning

association rules in general (5): find large item sets: calculate the support of every candidate (support means the number of occurrences of the candidate relative to the total number of objects); remove every candidate whose support is smaller than the 'minimum support'; save the candidates with high incidence
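The support step can be sketched in a few lines of Python. This is an illustration under assumptions, not WUM's implementation: the function name `large_item_sets`, the toy orders and the threshold of 0.5 are made up; only the page codes are borrowed from the shop example.

```python
from collections import Counter

def large_item_sets(transactions, candidates, min_support):
    """Return {candidate: support} for all candidates reaching min_support."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate is fully contained in this object
                counts[c] += 1
    # keep only candidates whose relative frequency reaches the threshold
    return {c: counts[c] / n for c in candidates if counts[c] / n >= min_support}

# Toy data (hypothetical orders, page codes from the shop example)
orders = [frozenset(o) for o in (
    {"C212", "G121", "A111"},
    {"C212", "A111"},
    {"G121", "A111"},
    {"C221", "A121"},
)]
candidates = [frozenset({"A111"}), frozenset({"C212"}), frozenset({"A121"})]

large = large_item_sets(orders, candidates, min_support=0.5)
# A111 occurs in 3 of 4 orders, C212 in 2 of 4; A121 (1 of 4) is removed
```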

association rules in general (6): candidate generation and pruning: temporary candidates: for two sets X, Y of cardinality n that have n-1 attributes in common, build a temporary candidate X ∪ Y; pruning: eliminate every temporary candidate that contains a subset of cardinality n whose support is lower than the minimum support
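The join and prune steps can be sketched as follows. This is a minimal illustration that assumes all input sets are large item sets of the same cardinality n; the function name `generate_candidates` and the attributes A to D are hypothetical.

```python
from itertools import combinations

def generate_candidates(large_sets):
    """Join sets of size n that share n-1 attributes, then prune."""
    large_sets = set(large_sets)
    candidates = set()
    for x, y in combinations(large_sets, 2):
        union = x | y
        if len(union) != len(x) + 1:
            continue  # x and y do not share exactly n-1 attributes
        # prune: every subset of cardinality n must itself be a large set
        if all(frozenset(s) in large_sets
               for s in combinations(union, len(x))):
            candidates.add(union)
    return candidates

# Hypothetical large 2-sets over the attributes A, B, C, D
large_2 = {frozenset("AB"), frozenset("AC"), frozenset("BC"), frozenset("BD")}
result = generate_candidates(large_2)
# ABC survives; ABD is pruned (AD is not large), BCD is pruned (CD is not large)
```

The surviving candidates still need their support counted against the database before they count as large item sets of cardinality n+1.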

Seminar Webmining, Institut für Wirtschaftsinformatik 24 association rules in WUM (1)

association rules in WUM (2)

