Download presentation

Presentation is loading. Please wait.

1
**A Brief Tour of R’s arules package July 2012**

Stu Rodgers 1

2
Package arules Mining frequent itemsets and association rules is a popular and well researched approach for discovering interesting relationships between variables in large databases. provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. > library(arules) Loading required package: Matrix Loading required package: lattice Attaching package: ‘Matrix’ The following object(s) are masked from ‘package:base’: det Attaching package: ‘arules’ %in% Warning message: package ‘arules’ was built under R version

3
Data in arules > data() Data sets in package ‘arules’: Adult Adult Data Set AdultUCI Adult Data Set Epub Epub Data Set Groceries Groceries Data Set Income Income Data Set IncomeESL Income Data Set ...

4
**Using arules > data("Epub") > ls() [1] "Epub" > summary(Epub)**

transactions as itemMatrix in sparse format with 15729 rows (elements/itemsets/transactions) and 936 columns (items) and a density of most frequent items: doc_11d doc_813 doc_4c6 doc_955 doc_698 (Other) element (itemset/transaction) length distribution: sizes Notes – Epub contains downloads of documents from the Electronic Publication platform of the Vienna University of Economics and Business available via from January 2003 to December 2008.

5
Using arules > summary(Epub) -- continued Min. 1st Qu. Median Mean 3rd Qu. Max includes extended item information - examples: labels 1 doc_11d 2 doc_13d 3 doc_14c includes extended transaction information - examples: transactionID TimeStamp session_ :59: session_ :46: session_479a :50:38

6
Using arules > year <- strftime(as.POSIXlt(transactionInfo(Epub)[["TimeStamp"]]), "%Y") > table(year) year > Epub2003 <- Epub[year == "2003"] > length(Epub2003) [1] 987 > image(Epub2003) Notes strftime - converts between strings and POSIXlt class objects POSIXlt - objects for representing calendar dates and times (to nearest second) "%Y" – 4 digit year

7
**Visualizing transactions**

> image(Epub2003)

8
Using arules > data(“AdultUCI") > dim(AdultUCI) [1] > R demo ... Notes – AdultUCI - UCI machine learning repository (Asuncion and Newman 2007) of U.S. census bureau database. Contains attributes like age, work class, education, etc. In the original applications of the data, the attributes were used to predict the income level of individuals.

9
Questions Stu Rodgers

Similar presentations

OK

DIVIDING INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.

DIVIDING INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google

Ppt on history of olympics rings Ppt on role of individual in conservation of natural resources Ppt on cross-sectional study vs cohort study Ppt on the road not taken lesson Share ppt online Ppt on 60 years of indian parliament Ppt on tribal communities of india Ppt on statistics in maths what is the product Train the trainer ppt on principle of training Ppt on data collection methods in quantitative research