
AI Week 23 Machine Learning Data Mining – Week 2 Lee McCluskey, room 2/07


1 AI Week 23 Machine Learning Data Mining – Week 2 Lee McCluskey, room 2/07 Email lee@hud.ac.uk http://scom.hud.ac.uk/scomtlm/cha2555/

2 Artform Research Group Focus on one area: Data Mining
Data Mining involves discovering patterns in large databases or data warehouses for different purposes. It is the science of extracting meaningful information from (large) databases.
Applications: market analysis and retail, decision support, financial analysis, discovering environmental trends.
Two types of learning: Data Mining can be supervised (“Learning from Example”) or unsupervised (“Learning from Observation”).
Data Mining is often part of a larger process aimed at getting more out of data warehouses, and involves data cleansing.
Data cleansing is the process of identifying and removing or correcting corrupted records in a database. This makes the data consistent with other similar data sets in the database. For example, the process may remove invalid post codes or spurious extreme values (e.g. -999999.999).
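
As a rough illustration of the cleansing step described on this slide, the Python sketch below drops records that have an invalid post code or a spurious extreme value. The record layout, the simplified post code pattern and the plausible-value range are all assumptions made for this example; they are not part of the original slides, and the sketch only removes records rather than correcting them.

```python
import re

# Hypothetical records: (post code, measured value). The layout is assumed.
records = [
    ("HD1 3DH", 12.5),
    ("XXXX", 7.0),             # invalid post code
    ("LS2 9JT", -999999.999),  # spurious extreme value used as a "missing" marker
    ("M1 1AE", 30.2),
]

# Simplified UK-style post code pattern (an assumption; real validation is stricter).
POSTCODE = re.compile(r"^[A-Z]{1,2}[0-9][0-9A-Z]? [0-9][A-Z]{2}$")

# Assumed plausible range for the measured value.
LOW, HIGH = -100.0, 100.0

def is_clean(record):
    """Return True if the record has a well-formed post code and an in-range value."""
    postcode, value = record
    return bool(POSTCODE.match(postcode)) and LOW <= value <= HIGH

# Keep only the records that pass both checks.
cleaned = [r for r in records if is_clean(r)]
print(cleaned)
```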

3 Association Rule Mining (ARM)
This is an “unsupervised learning activity”: briefly, looking for strong associations between features in data.
Definitions:
A transactional database is a set of “transactions”, e.g. the details of individual sales.
A transaction can be thought of as an “item-set” where each item is an attribute-value pair, e.g. {height = 6, temp = 20, weather = warm}.
As a special case we could have nominal item-sets, e.g. {bread, cheese, milk}.

4 Association Rule Mining (ARM): Important Definitions
An association rule is an expression X => Y, where X and Y are item-sets.
The support of an association rule is defined as the proportion of transactions in the database that contain every item of X ∪ Y.
The confidence of an association rule is defined as the probability that a transaction contains Y given that it contains X, that is:
confidence = (no. of transactions containing X ∪ Y) / (no. of transactions containing X)
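
Written symbolically, the two definitions on this slide can be stated as follows. Here D denotes the transactional database, t ranges over its transactions, and the names supp and conf are shorthand introduced only for this restatement.

```latex
\[
\mathrm{supp}(X \Rightarrow Y)
  = \frac{\bigl|\{\, t \in D : X \cup Y \subseteq t \,\}\bigr|}{|D|},
\qquad
\mathrm{conf}(X \Rightarrow Y)
  = \frac{\bigl|\{\, t \in D : X \cup Y \subseteq t \,\}\bigr|}
         {\bigl|\{\, t \in D : X \subseteq t \,\}\bigr|}
\]
```

Both quantities share the same numerator; only the denominator differs (all transactions for support, transactions containing X for confidence).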

5 Example
A trader deals in the following currencies in a series of 8 transactions:
1 Sterling, Yen, Dollar, Euro
2 Dollar, Euro, Rand, Sterling, Ruble
3 Pesos, Euro, Ruble, Rupee, Yen
4 Rupee, Sterling, Ruble, Euro, Dollar
5 Sterling, Dinars, Rand, Yen
6 Pesos, Kroner, Sterling, Dollar
7 Ruble, Rupee, Kroner, Sterling, Pesos
8 Dollar, Euro, Sterling
What is the SUPPORT and CONFIDENCE of the following rules?
{Ruble} → {Rupee}
{Sterling, Euro} → {Ruble}
{Sterling, Euro} → {Ruble, Pesos}
Find an association rule from the set of transactions that has
- at least 2 items in its antecedent,
- better support and better confidence than both rules above.
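
The support and confidence calculations for this exercise can be sketched in Python. The eight transaction sets are copied from the slide; the function names count, support and confidence are chosen here for illustration and simply follow the definitions given earlier in the deck.

```python
# The 8 transactions from the slide, each as a set of currencies.
transactions = [
    {"Sterling", "Yen", "Dollar", "Euro"},
    {"Dollar", "Euro", "Rand", "Sterling", "Ruble"},
    {"Pesos", "Euro", "Ruble", "Rupee", "Yen"},
    {"Rupee", "Sterling", "Ruble", "Euro", "Dollar"},
    {"Sterling", "Dinars", "Rand", "Yen"},
    {"Pesos", "Kroner", "Sterling", "Dollar"},
    {"Ruble", "Rupee", "Kroner", "Sterling", "Pesos"},
    {"Dollar", "Euro", "Sterling"},
]

def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(x, y):
    """Proportion of all transactions that contain X U Y."""
    return count(x | y) / len(transactions)

def confidence(x, y):
    """Transactions containing X U Y divided by transactions containing X."""
    return count(x | y) / count(x)

# The three rules posed on the slide, as (antecedent, consequent) pairs.
rules = [
    ({"Ruble"}, {"Rupee"}),
    ({"Sterling", "Euro"}, {"Ruble"}),
    ({"Sterling", "Euro"}, {"Ruble", "Pesos"}),
]
for x, y in rules:
    print(sorted(x), "->", sorted(y),
          "support =", support(x, y), "confidence =", confidence(x, y))
```

The final part of the exercise can be explored the same way, by calling support and confidence on candidate rules with two or more items in the antecedent.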

6 Aims of ARM
Given a transactional database D, the association rule problem is to find all rules whose support and confidence are greater than user-specified thresholds, denoted minimum support (MinSupp) and minimum confidence (MinConf) respectively.
The aim is to discover the most significant associations between the items in a transactional data set.
This process primarily involves the discovery of so-called frequent item-sets, i.e. item-sets whose support in the transactional data set is above MinSupp; rules built from them are then kept if their confidence is above MinConf.
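
A minimal sketch of this problem statement, assuming a tiny made-up database and made-up threshold values: it enumerates every item-set by brute force, keeps the frequent ones, and then splits each into rules X => Y that pass the confidence threshold. Exhaustive enumeration is only practical for a handful of items and is used here purely to make the two thresholds concrete.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration.
D = [
    {"bread", "cheese", "milk"},
    {"bread", "milk"},
    {"bread", "cheese"},
    {"milk"},
]
MIN_SUPP = 0.4  # assumed threshold values
MIN_CONF = 0.6

def supp(itemset):
    """Proportion of transactions in D containing every item in itemset."""
    return sum(1 for t in D if itemset <= t) / len(D)

items = sorted(set().union(*D))

# Step 1: frequent item-sets, i.e. those with support above MIN_SUPP.
frequent = [
    frozenset(c)
    for size in range(1, len(items) + 1)
    for c in combinations(items, size)
    if supp(frozenset(c)) > MIN_SUPP
]

# Step 2: split each frequent item-set into X => Y and keep the rules
# whose confidence is above MIN_CONF.
rules = []
for itemset in frequent:
    for size in range(1, len(itemset)):
        for x in combinations(sorted(itemset), size):
            x = frozenset(x)
            y = itemset - x
            conf = supp(itemset) / supp(x)
            if conf > MIN_CONF:
                rules.append((set(x), set(y), supp(itemset), conf))

for x, y, s, c in rules:
    print(sorted(x), "=>", sorted(y), f"support={s:.2f}", f"confidence={c:.2f}")
```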

7 Contrast: Classification Rule Mining
The output of DM is a (set of) classification rule(s) where the classes are known a priori (supervised learning) and there is only one class on the right-hand side (RHS):
Features => C(1)
…
Features => C(n)

8 Classification Rule Mining
Size = medium, colour = green, shape = square => c1
Size = small, colour = red, shape = square => c1
Size = small, colour = blue, shape = circle => c1
Size = small, colour = green, shape = triangle => c2
Size = large, colour = white, shape = circle => c2
The aim is to find “hypotheses” that are:
Characteristic – true of all members of a class
Discriminating – not true of ANY members of other classes
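
The two properties can be checked mechanically against the five examples on this slide. In the sketch below the examples are copied from the slide, while the helper names and the two candidate hypotheses for class c1 are hypothetical, chosen only to show one hypothesis that is discriminating but not characteristic and one that is both.

```python
# The five labelled examples from the slide: (features, class).
examples = [
    ({"size": "medium", "colour": "green", "shape": "square"}, "c1"),
    ({"size": "small", "colour": "red", "shape": "square"}, "c1"),
    ({"size": "small", "colour": "blue", "shape": "circle"}, "c1"),
    ({"size": "small", "colour": "green", "shape": "triangle"}, "c2"),
    ({"size": "large", "colour": "white", "shape": "circle"}, "c2"),
]

def is_characteristic(hypothesis, cls):
    """True if the hypothesis holds for all members of cls."""
    return all(hypothesis(f) for f, c in examples if c == cls)

def is_discriminating(hypothesis, cls):
    """True if the hypothesis holds for no member of any other class."""
    return not any(hypothesis(f) for f, c in examples if c != cls)

# Hypothetical candidate hypotheses for class c1.
def square(f):
    return f["shape"] == "square"

def neither_triangle_nor_large(f):
    return f["shape"] != "triangle" and f["size"] != "large"

for h in (square, neither_triangle_nor_large):
    print(h.__name__,
          "characteristic:", is_characteristic(h, "c1"),
          "discriminating:", is_discriminating(h, "c1"))
```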

9 Associative Classification
If we fuse ARM and CRM we get “Associative Classification”: use the association technique, but learn about particular items or item-sets. Associative Classification is a branch of data mining that combines classification and association rule mining. In other words, it utilises association rule discovery methods on classification data sets.
Typically:
- Find association rules using ARM
- Sift out the “Class Association Rules”, i.e. the ones that have the class of interest on their right-hand sides
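
The two typical steps can be sketched as follows, under some loudly stated assumptions: the five labelled examples from the classification rule mining slide are encoded as transactions with the class added as an ordinary item, the thresholds are made up, and rule discovery is a brute-force search restricted to single-item consequents and antecedents of at most two items. This is an illustration of the idea, not a specific associative classification algorithm.

```python
from itertools import combinations

# Each example becomes a transaction: attribute-value items plus one class item.
D = [
    {"size=medium", "colour=green", "shape=square", "class=c1"},
    {"size=small", "colour=red", "shape=square", "class=c1"},
    {"size=small", "colour=blue", "shape=circle", "class=c1"},
    {"size=small", "colour=green", "shape=triangle", "class=c2"},
    {"size=large", "colour=white", "shape=circle", "class=c2"},
]
MIN_SUPP, MIN_CONF = 0.3, 0.6  # assumed thresholds

def count(itemset):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in D if itemset <= t)

# Step 1: find association rules X => y by brute force.
items = sorted(set().union(*D))
rules = []
for size in (1, 2):
    for x in combinations(items, size):
        x = frozenset(x)
        if count(x) == 0:
            continue  # antecedent never occurs, confidence undefined
        for y in items:
            if y in x:
                continue
            both = count(x | {y})
            supp, conf = both / len(D), both / count(x)
            if supp > MIN_SUPP and conf > MIN_CONF:
                rules.append((x, y, supp, conf))

# Step 2: sift out the class association rules, i.e. those whose
# right-hand side is the class of interest.
CLASS_OF_INTEREST = "class=c1"
class_rules = [r for r in rules if r[1] == CLASS_OF_INTEREST]

for x, y, supp, conf in class_rules:
    print(sorted(x), "=>", y, f"support={supp:.2f}", f"confidence={conf:.2f}")
```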

10 Example in Road Traffic Control

11 Example in Road Traffic Control

12 Example in Road Traffic Control
Data:
Numeric data records from individual CARS (date, time, position, actual speed, expected speed)
Textual data on INCIDENTS (date, time start, time cleared, position, severity, road type, area, incident category, cause, road-effect, traffic-effect, reporter..)

13 Example in Road Traffic Control
- associations between variations in speeds and near-future incidents
- the effect of a particular type of incident (e.g. roadworks) on average speeds on nearby trunk roads
- looking for predictors of "heavy/slow traffic" incidents: look for associations with speed variations or accidents on roads downstream from the incident position (hence causing the incident)
- looking for associations between speeds around a bypass and a later "heavy traffic" incident within the town bypassed
- extraction of the roads that have most impact in causing congestion
- formulation of rules that can predict conditions after a period of road works or an incident (depending on the specific road, type of incident, etc.)

14 Conclusions
Data Mining is a powerful set of techniques to help discover hidden knowledge.
It can be supervised or unsupervised.
ARM, CRM and AC are three important classes of technique used in DM.

