DATA MINING: from data to information. Ronald Westra, Dept. of Mathematics, Knowledge Engineering, Maastricht University.

PART 2 Exploratory Data Analysis

VISUALISING AND EXPLORING DATA-SPACE. Data Mining Lecture II [Chapter 3 from Principles of Data Mining by Hand, Mannila, Smyth]

LECTURE 3: Visualising and Exploring Data-Space
Readings: Chapter 3 from Principles of Data Mining by Hand, Mannila, Smyth.
3.1 Obtaining insight into the structure of the data space
1. the distribution of the data over the space
2. Are there separate and disconnected parts?
3. Is there an underlying model?
4. data-driven hypothesis testing
5. Starting point: use the strong perceptual powers of humans

LECTURE 3: Visualising and Exploring Data-Space
3.2 Tools for representing a single variable
1. mean, variance, standard deviation, skewness
2. plot
3. moving-average plot
4. histogram, kernel density estimate
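A minimal sketch of the single-variable tools listed above, using numpy and scipy (the array x is a stand-in for one column of the data set; this is illustrative code, not the lecture's):

```python
import numpy as np
from scipy.stats import skew, gaussian_kde

x = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=1000)  # stand-in data

print("mean              :", x.mean())
print("variance          :", x.var(ddof=1))    # unbiased sample variance
print("standard deviation:", x.std(ddof=1))
print("skewness          :", skew(x))

counts, edges = np.histogram(x, bins=30)       # histogram of the values
kde = gaussian_kde(x)                          # kernel density estimate
density_at_mean = kde(x.mean())                # evaluate the smoothed density
```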

histogram

Box Plots

Overprinting

Contour plot

LECTURE 3: Visualising and Exploring Data-Space
3.3 Tools for representing two variables
1. scatter plot
2. moving-average plots

scatter plot

scatter plots

LECTURE 3: Visualising and Exploring Data-Space
3.4 Tools for representing multiple variables
1. all (or a selection) of the pairwise scatter plots
2. likewise, moving-average plots
3. 'trellis' or other parameterised plots
4. icons: star icons, Chernoff's faces
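A short sketch of two of these multi-variable tools, using pandas' built-in plotting helpers (the DataFrame df and the 'group' column are illustrative placeholders):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix, parallel_coordinates

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("ABCD"))

scatter_matrix(df, figsize=(6, 6))                  # all pairwise scatter plots
df["group"] = np.where(df["A"] > 0, "high", "low")  # a toy class label
plt.figure()
parallel_coordinates(df, "group")                   # parallel-coordinates plot
plt.show()
```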

Chernoff’s faces

Star Plots

Parallel coordinates

DIMENSION REDUCTION
3.5 PCA: Principal Component Analysis
3.6 MDS: Multidimensional Scaling

3.5 PCA: Principal Component Analysis
With the pairwise scatter plots we already noticed that the best projections are those in which the projected set of data points has the largest spread, i.e. the direction of maximum variance. This idea is worked out systematically in Principal Component Analysis.

3.5 PCA: Principal Component Analysis
Principal component analysis (PCA) is a vector-space transform often used to reduce multidimensional data sets to lower dimensions for analysis. Depending on the field of application, it is also called the discrete Karhunen-Loève transform (KLT), the Hotelling transform, or proper orthogonal decomposition (POD). PCA is now mostly used as a tool in exploratory data analysis and for building predictive models. It involves computing the eigenvalue decomposition of the data covariance matrix after mean-centring the data for each attribute. The results of a PCA are usually discussed in terms of component scores and loadings.

3.5 PCA: Principal Component Analysis
PCA is the simplest of the true eigenvector-based multivariate analyses. Often, its operation can be thought of as revealing the internal structure of the data in the way that best explains the variance in the data. If a multivariate dataset is visualised as a set of coordinates in a high-dimensional data space (one axis per variable), PCA supplies the user with a lower-dimensional picture: a "shadow" of this object when viewed from its (in some sense) most informative viewpoint.

3.5 PCA: Principal Component Analysis
PCA is closely related to factor analysis.

3.5 PCA: Principal Component Analysis
Consider a multivariate set in data space: a set drawn from a normal distribution in multiple dimensions, for instance. Observe that the spatial extent appears different in each dimension; also observe that in this case the set is almost 1-dimensional. Can we project the set so that the spatial extent in one direction becomes maximal?

3.5 PCA: Principal Component Analysis
Data X: n rows of p fields; the observation vectors are the rows of X.
STEP 1: Subtract the average value from the dataset X: mean-centred data.
The spatial extent of this cloud of points can be measured by the variance in the dataset X; these are entries of the matrix V = XᵀX (the sample covariance matrix, up to a constant factor). The projection of the dataset X onto a direction a is y = Xa. The spatial extent in direction a is the variance of the projected data:
σ_a² = yᵀy = (Xa)ᵀ(Xa) = aᵀXᵀXa = aᵀVa.
We now want to maximise this extent σ_a² over all possible vectors a (why?).

3.5 PCA: Principal Component Analysis
STEP 2: Maximise σ_a² = aᵀVa over all possible vectors a.
This is unbounded, just like maximising x² over x, so we restrict a to unit length: aᵀa − 1 = 0. So we have:
maximise: aᵀVa
subject to: aᵀa − 1 = 0
This can be solved with the method of Lagrange multipliers:
maximise f(x) subject to g(x) = 0  →  d/dx{ f(x) − λ g(x) } = 0
For our case this means:
d/da{ aᵀVa − λ(aᵀa − 1) } = 0  →  2Va − 2λa = 0  →  Va = λa
So we are looking for the eigenvectors and eigenvalues of the matrix V = XᵀX.

3.5 PCA: Principal Component Analysis
So the underlying idea is: suppose you have a high-dimensional, normally distributed data set. It will take the shape of a high-dimensional ellipsoid. An ellipsoid is structured from its centre by orthogonal axes with different radii; the largest radii have the strongest influence on the shape of the ellipsoid. The ellipsoid is described by the covariance matrix of the set of data points: the axes are defined by the orthogonal eigenvectors (from the centre, the centroid, of the set), the radii by the associated eigenvalues. So determine the eigenvalues and order them in decreasing size: λ₁ ≥ λ₂ ≥ … ≥ λ_p. The first n ordered eigenvectors then 'explain' the following fraction of the variance in the data: (λ₁ + … + λ_n) / (λ₁ + … + λ_p).
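A minimal numpy sketch of the procedure above: mean-centre the data, form the covariance matrix, take its eigendecomposition, order the eigenvalues, and report the fraction of variance explained by the leading components (function and variable names are mine, not the lecture's):

```python
import numpy as np

def pca(X, n_components=2):
    """Principal axes, projected scores and eigenvalue spectrum of X (n x p)."""
    Xc = X - X.mean(axis=0)                  # STEP 1: mean-centre the data
    V = Xc.T @ Xc / (Xc.shape[0] - 1)        # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(V)     # eigh: V is symmetric
    order = np.argsort(eigvals)[::-1]        # order eigenvalues, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    A = eigvecs[:, :n_components]            # the first n principal axes
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return A, Xc @ A, eigvals, explained     # axes, scores Y = Xc A, spectrum, fraction
```

For example, `A, Y, lam, frac = pca(X, 2)` projects the data onto the plane spanned by the two leading principal axes.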

3.5 PCA: Principal Component Analysis
[Figure: data cloud with its MEAN marked]

3.5 PCA: Principal Component Analysis
[Figure: the same data cloud with Principal axis 1 and Principal axis 2 drawn from the MEAN]

3.5 PCA: Principal Component Analysis
STEP 3: Plot the ordered eigenvalues versus their index number and inspect where a 'shoulder' occurs: this determines the number of eigenvalues to take into account. This is a so-called 'scree plot'.
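A sketch of such a scree plot, using the eigenvalue spectrum returned by the pca() sketch above (matplotlib assumed to be available):

```python
import matplotlib.pyplot as plt

def scree_plot(eigvals):
    """Ordered eigenvalues versus index; look for the 'shoulder' (elbow)."""
    plt.plot(range(1, len(eigvals) + 1), eigvals, "o-")
    plt.xlabel("component index")
    plt.ylabel("eigenvalue")
    plt.title("Scree plot")
    plt.show()
```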

3.5 PCA: Principal Component Analysis
For n points of p components, O(np² + p³) operations are required: O(np²) to form V = XᵀX and O(p³) for its eigendecomposition. In practice, use standard matrix decompositions (LU, SVD, etc.).
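For large n or p it is also common to avoid forming V = XᵀX explicitly and to work with the singular value decomposition of the centred data instead; a hedged numpy sketch, not part of the original slides:

```python
import numpy as np

def pca_via_svd(X, n_components=2):
    Xc = X - X.mean(axis=0)                   # mean-centre the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2 / (Xc.shape[0] - 1)      # eigenvalues of the covariance matrix
    A = Vt[:n_components].T                   # principal axes (right singular vectors)
    return A, Xc @ A, eigvals
```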

3.5 PCA: Principal Component Analysis
Many benefits: considerable data reduction, which is necessary for computational techniques such as Fisher discriminant analysis and clustering. This works very well in practice.

3.5 PCA: Principal Component Analysis
PCA is closely related to, and often confused with, Factor Analysis: Factor Analysis explains p-dimensional data by a smaller number m < p of factors.

EXAMPLE of PCA

Astronomical application (Dressler et al.): PCs for elliptical galaxies. Rotating to the principal components in B_T – Σ space improves the Faber-Jackson relation as a distance indicator.

Astronomical application: eigenspectra (KL transform), Connolly et al. 1995.

[Figure: panels labelled 1 PC, 2 PC, 3 PC, 4 PC]

3.6 Multi-Dimensional Scaling [MDS]
1. Same purpose: represent a high-dimensional data set in fewer dimensions.
2. In the case of MDS not by projection, but by reconstruction from the distance table. The computed points are represented in a Euclidean sub-space, preferably a 2D plane.
3. MDS performs better than PCA for strongly curved sets.

3.6 Multidimensional Scaling
The purpose of multidimensional scaling (MDS) is to provide a visual representation of the pattern of proximities (i.e., similarities or distances) among a set of objects.
INPUT: distances dist[Aᵢ, Aⱼ], where A is some class of objects
OUTPUT: positions X[Aᵢ], where X is a D-dimensional vector

3.6 Multidimensional Scaling

INPUT: distances dist[Aᵢ, Aⱼ], where A is some class of objects

3.6 Multidimensional Scaling
OUTPUT: positions X[Aᵢ], where X is a D-dimensional vector
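A hedged sketch of classical (metric) MDS: recover D-dimensional positions X[Aᵢ] from the distance table dist[Aᵢ, Aⱼ] by double-centring the squared distances and taking an eigendecomposition (numpy only; not the lecture's own code):

```python
import numpy as np

def classical_mds(D, dim=2):
    """D: (n, n) matrix of pairwise distances; returns an (n, dim) coordinate array."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]        # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    L = np.sqrt(np.clip(eigvals[:dim], 0.0, None))
    return eigvecs[:, :dim] * L              # the reconstructed positions
```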

3.6 Multidimensional Scaling
How many dimensions? Inspect a scree plot.

Multidimensional Scaling: Dutch dialects (Nederlandse dialekten)

3.6 Kohonen’s Self Organizing Map (SOM) and Sammon mapping 1.Same purpose : DIMENSION REDUCTION : represent a high dimensional set in a smaller sub-space e.g. 2D-plane. 2.SOM gives better results than Sammon mapping, but strongly sensitive to initial values. 3.This is close to clustering!

3.6 Kohonen’s Self Organizing Map (SOM)

Sammon mapping
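A minimal Sammon-mapping sketch (not the lecture's code): it minimises Sammon's stress with a general-purpose scipy optimiser rather than the classical gradient-descent update, which keeps the example short:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

def sammon(X, dim=2, seed=0):
    """Embed the rows of X into `dim` dimensions by minimising Sammon's stress."""
    D = pdist(X)                          # pairwise distances in the original space
    D[D == 0] = 1e-12                     # guard against division by zero
    scale = 1.0 / D.sum()

    def stress(y_flat):
        d = pdist(y_flat.reshape(-1, dim))        # distances in the embedding
        return scale * np.sum((D - d) ** 2 / D)   # Sammon's stress

    rng = np.random.default_rng(seed)
    y0 = rng.normal(size=(X.shape[0], dim)).ravel()   # random initial layout
    res = minimize(stress, y0, method="L-BFGS-B")     # gradients by finite differences
    return res.x.reshape(-1, dim)
```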

All information on the mathematics part of the course is available at: DAM/DataMiningPage.htm