Statistical Analysis of Transaction Dataset 91.541 Data Visualization Homework 2 Hongli Li.

Presentation transcript:

Statistical Analysis of Transaction Dataset Data Visualization Homework 2 Hongli Li

Dataset Introduction
Generated by the IBM Quest Synthetic Data Generation Code
It is a transaction dataset
It is intended for mining association rules
Generation parameters:
- Number of transactions = 1000
- Average transaction length = 10 (default)
- Number of items = 30

Transaction Dataset

Metadata
No missing values
Actual number of transactions = 980
Actual average transaction length = 9.24
Actual number of items = 30
The most frequent item is Item 12 (64%)
The second most frequent item is Item 9 (62%)
Other Information
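The metadata above (transaction count, average length, item count, and per-item frequency) can be computed directly from a list of transactions. The following is a minimal sketch, assuming each transaction is a set of integer item ids; the function name `summarize` and the toy data are hypothetical and are not taken from the homework dataset.

```python
from collections import Counter

def summarize(transactions):
    """Return basic metadata for a list of transactions (each a set of item ids)."""
    n = len(transactions)
    # Count, for every item, how many transactions contain it.
    counts = Counter(item for t in transactions for item in set(t))
    avg_len = sum(len(set(t)) for t in transactions) / n
    # Frequency (support) = fraction of transactions containing the item.
    support = {item: c / n for item, c in counts.items()}
    # Most frequent first; ties broken by item id.
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "n_transactions": n,
        "avg_length": avg_len,
        "n_items": len(counts),
        "ranked_support": ranked,
    }

# Toy data (hypothetical, for illustration only).
toy = [{1, 2, 3}, {2, 3}, {2}, {1, 2, 4}]
stats = summarize(toy)
print(stats["n_transactions"], stats["avg_length"], stats["n_items"])
print(stats["ranked_support"][:2])
```

The percentages on the slide (64% for Item 12, 62% for Item 9) correspond to the support values in `ranked_support`.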

Pearson Correlation – Item × Item
A measure of the degree of linear relationship between two variables
Pearson correlation matrix of Item x Item
The two most correlated items are Item 24 and Item 1 (0.138)
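One way to obtain an Item x Item correlation is to encode each item as a 0/1 column over all transactions and compute the Pearson coefficient for every pair of columns. The sketch below assumes that encoding; the helper names `pearson` and `most_correlated_items` and the toy data are hypothetical, and the original analysis may have used a statistics package instead.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns None when either sequence is constant (zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)

def most_correlated_items(transactions):
    """Return (item_a, item_b, r) for the item pair with the largest r."""
    items = sorted(set().union(*transactions))
    # One 0/1 column per item: 1 if the transaction contains the item.
    columns = {i: [1 if i in t else 0 for t in transactions] for i in items}
    best = None
    for a, b in combinations(items, 2):
        r = pearson(columns[a], columns[b])
        if r is not None and (best is None or r > best[2]):
            best = (a, b, r)
    return best

# Toy data (hypothetical, for illustration only).
toy = [{1, 2}, {1, 2, 3}, {3}, {1}, {2, 3}, {1, 2}]
best = most_correlated_items(toy)
print(best)
```

A maximum as low as the 0.138 reported on the slide would indicate that no pair of items is strongly linearly related in this encoding.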

Pearson Correlation – TID × TID
Pivot the dataset to get the Item x TID matrix
Pearson correlation matrix of TID x TID
The most correlated transactions are TID 9 and TID 857; their correlation coefficient is 1
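The pivot step can be sketched as follows, assuming the pivoted matrix has one row per item and one column per transaction, with 0/1 entries. The names `pivot` and `pearson`, the item list, and the toy TIDs are hypothetical. For 0/1 vectors that are not constant, a coefficient of 1 can only occur when the two vectors are identical, so a TID pair with correlation 1 contains exactly the same items.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; None if either sequence has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)

def pivot(transactions, items):
    """Item x TID matrix: one row per item, one column per transaction."""
    return [[1 if i in t else 0 for t in transactions] for i in items]

# Toy data (hypothetical TIDs and items, for illustration only).
items = [1, 2, 3, 4]
toy = {1: {1, 2}, 2: {2, 3}, 3: {1, 2}}
tids = sorted(toy)
matrix = pivot([toy[t] for t in tids], items)

# Column j of the matrix is transaction tids[j] as a 0/1 vector over items.
cols = list(zip(*matrix))
r_same = pearson(cols[0], cols[2])  # TIDs 1 and 3 hold the same items
r_diff = pearson(cols[0], cols[1])  # TIDs 1 and 2 differ
print(r_same, r_diff)
```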

Conclusion
- Using statistical tools alone is hard!
- Mining algorithms are needed
- Visualization could help