Data statistics and transformation revision
Michael J. Watts

Lecture Outline
● Statistical operations on data
● Data transformations
  – The objectives of a data transform
  – Linear versus non-linear transformations
  – Transformations for pre-processing of data
  – DFT and FFT transformations
  – Wavelet transformations

Why Analyse Data?
● An important part of the problem-solving process
● Can suggest a method to use to solve the problem
● Answers many important questions about the data set and the problem
● Improves understanding of the problem

Why Analyse Data?
● What are the statistical parameters of the data?
  – mean, standard deviation, correlation
● What is the nature of the process?
  – periodic, chaotic, random?
  – the individual values of a random process cannot be predicted at all; only its statistics can be estimated
  – periodic processes are more easily modelled
  – chaotic processes are deterministic but harder to model

A Periodic Process

A Chaotic Process

A Random Process
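The three kinds of process can be made concrete with a small sketch. It assumes a sine wave for the periodic case, the logistic map x[t+1] = r·x[t]·(1 − x[t]) with r = 4 (a standard textbook chaotic system, not taken from the slides) for the chaotic case, and seeded Gaussian noise for the random case. The helper names are hypothetical.

```python
import numpy as np

def periodic(n, period=20):
    """Sine wave: repeats exactly every `period` samples."""
    t = np.arange(n)
    return np.sin(2 * np.pi * t / period)

def chaotic(n, x0=0.2, r=4.0):
    """Logistic map (assumed example): deterministic, yet tiny changes in x0 grow rapidly."""
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = r * x[t - 1] * (1 - x[t - 1])
    return x

def random_process(n, seed=0):
    """Independent Gaussian noise: past values say nothing about the next one."""
    return np.random.default_rng(seed).normal(size=n)

def lag1_autocorr(x):
    """Correlation between each value and the next; near 0 for pure noise."""
    x = x - x.mean()
    return float(x[:-1] @ x[1:] / (x @ x))

p = periodic(100)
c1 = chaotic(60, x0=0.2)
c2 = chaotic(60, x0=0.2 + 1e-9)   # almost the same starting point
print(np.allclose(p[:80], p[20:]))                      # shifting by one period changes nothing
print(abs(c1[5] - c2[5]), np.abs(c1[-20:] - c2[-20:]).max())
print(lag1_autocorr(periodic(1000)), lag1_autocorr(random_process(1000)))
```

The chaotic series is fully determined by its start value, but a difference of 1e-9 is amplified until the two runs are unrelated, which is why chaotic processes are harder to model than periodic ones.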

Why Analyse Data?
● How is the available data distributed?
  – does it naturally cluster together?
  – is it uniformly distributed?
  – does it cover enough of the problem space to be useful?

Clustered Data

Uniform Distribution

Why Analyse Data?
● Is there missing data, and how much?
  – is missing data a critical obstacle?
  – can other methods be used to compensate for the gaps?
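A minimal sketch of answering "how much is missing?" and of one simple way to compensate, assuming missing values are stored as NaN in a NumPy array. Column-mean imputation is chosen here only as an illustration; interpolation or model-based imputation are common alternatives. The function names are hypothetical.

```python
import numpy as np

def missing_report(X):
    """Number of NaN entries per column, and the overall fraction missing."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    return mask.sum(axis=0), float(mask.mean())

def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column (assumed strategy)."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)          # ignores NaNs when averaging
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan],
              [5.0, 8.0]])
per_column, fraction = missing_report(X)
filled = mean_impute(X)
```

Note that a column with no observed values at all has no mean to impute from; in that case `np.nanmean` warns and the gap stays NaN.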

Why Analyse Data?
● What features can be extracted from the data?
  – reducing the number of variables in the data set
  – can assist with modelling the problem
  – can make correlations / relationships easier to see

Statistical Data Analysis
● Discover repetitiveness in data
● Simple functions
  – mean
  – standard deviation
  – histogram

Statistical Data Analysis
● Arithmetic mean
  – a value that is representative of the population of values
● Standard deviation
  – a measure of how far values deviate from the mean
● Analysis must be appropriate for the data
  – measurement theory: the scale of measurement (nominal, ordinal, interval or ratio) determines which statistics are meaningful
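The simple functions above can be written out directly. This sketch uses a small made-up data set and computes the mean and standard deviation from their definitions, then bins the data into a histogram with `np.histogram`. Both the population (divide by n) and sample (divide by n − 1) standard deviations are shown, since the slides do not say which is meant.

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)   # illustrative values
n = len(data)

mean = data.sum() / n                                       # arithmetic mean
pop_std = np.sqrt(((data - mean) ** 2).sum() / n)           # population standard deviation
sample_std = np.sqrt(((data - mean) ** 2).sum() / (n - 1))  # sample (n - 1) version

# Histogram: count how many values fall into each of 4 equal-width bins over [min, max]
counts, edges = np.histogram(data, bins=4)
```

Here the mean is 5 and the population standard deviation is 2; the histogram shows most values concentrated in the second bin, around the mean.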

Correlation
● Correlation
  – finds linear dependencies between variables
  – correlation coefficients may change over time for time series data
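A sketch of both points, assuming the Pearson coefficient as the correlation measure. `pearson_r` computes it from its definition, and the hypothetical helper `rolling_r` recomputes it over a sliding window so that a relationship that changes over time becomes visible.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations, in [-1, 1]."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

def rolling_r(x, y, window):
    """Correlation recomputed over a sliding window of `window` samples."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.array([pearson_r(x[i:i + window], y[i:i + window])
                     for i in range(len(x) - window + 1)])

t = np.arange(20, dtype=float)
y = np.where(t < 10, t, -t)     # rises with t at first, falls with t later

whole = pearson_r(t, 2 * t + 1)  # an exact linear dependency
windows = rolling_r(t, y, window=5)
```

A single coefficient over the whole of `t` and `y` hides the change; the windowed values start at +1 and end at −1.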

Regression and Interpolation
● Regression analysis
  – finds a formula that approximates the data for a given output variable
● Interpolation
  – fills in gaps in the data
  – fits curves through the data points
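A minimal contrast between the two ideas, using NumPy and illustrative data. Regression is shown as a least-squares straight-line fit with `np.polyfit`, which gives a formula that need not pass through any point. Interpolation is shown as piecewise-linear `np.interp`, which passes through every known point and estimates values in the gaps between them.

```python
import numpy as np

# Regression: least-squares fit of y ≈ slope * x + intercept to noisy, illustrative data
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.2, 10.9, 14.1])
slope, intercept = np.polyfit(x, y, deg=1)   # slope ≈ 3.0, intercept ≈ 2.04

# Interpolation: estimate values at unobserved positions between known samples
x_known = np.array([0.0, 2.0, 4.0])
y_known = np.array([1.0, 5.0, 3.0])
gaps = np.array([1.0, 3.0])
filled = np.interp(gaps, x_known, y_known)   # straight lines between neighbours: 3.0 and 4.0
```

For higher-degree fits, NumPy's documentation recommends the newer `np.polynomial.Polynomial.fit`, which follows the same least-squares idea.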

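PCA
● Principal component analysis (PCA)
  – transforms correlated variables into uncorrelated principal components, removing redundancy between variables
  – reduces the number of variables in the data set by keeping only the leading components
  – makes the data easier to model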
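A compact PCA sketch using the singular value decomposition of the centred data (one standard way to compute it). The example data set is synthetic: its second variable is almost exactly twice the first, so one component should carry nearly all the variance. This is the kind of redundancy PCA removes.

```python
import numpy as np

def pca(X, k):
    """Project X (samples x variables) onto its first k principal components."""
    Xc = X - X.mean(axis=0)                           # centre each variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)               # fraction of variance per component
    scores = Xc @ Vt[:k].T                            # the data expressed in k new variables
    return scores, Vt[:k], explained

rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200)])  # nearly redundant pair

scores, components, explained = pca(X, k=1)
```

Keeping only the first component reduces two variables to one while losing almost no variance.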

ICA
● Independent component analysis (ICA)
  – separates a mixed signal into additive, statistically independent components whose sources are unknown
  – Example: the cocktail party problem, i.e. separating the individual speakers from a recording made at a cocktail party where several people are speaking simultaneously
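The slides do not name an ICA algorithm. The sketch below uses a minimal version of FastICA (Hyvärinen's fixed-point algorithm with a tanh non-linearity and one-by-one deflation) as a swapped-in illustration. A sine and a square wave stand in for two "speakers", and the mixing matrix `A` is a hypothetical choice. ICA can only recover sources up to order, sign and scale.

```python
import numpy as np

def fast_ica(X, n_iter=200, tol=1e-8, seed=0):
    """Minimal deflation FastICA sketch. X: (n_signals, n_samples) mixed observations."""
    X = X - X.mean(axis=1, keepdims=True)
    # Whitening: make the mixtures uncorrelated with unit variance
    d, E = np.linalg.eigh(np.cov(X))
    Z = E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ X
    n = Z.shape[0]
    rng = np.random.default_rng(seed)
    W = np.zeros((n, n))
    for i in range(n):
        w = rng.normal(size=n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            g = np.tanh(w @ Z)                                   # non-linearity
            w_new = (Z * g).mean(axis=1) - (1 - g ** 2).mean() * w
            w_new -= W[:i].T @ (W[:i] @ w_new)                   # stay orthogonal to earlier rows
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1) < tol
            w = w_new
            if converged:
                break
        W[i] = w
    return W @ Z   # estimated sources (up to order, sign and scale)

t = np.linspace(0, 8, 2000)
S = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])   # two stand-in "speakers"
A = np.array([[1.0, 0.5], [0.5, 1.0]])                    # hypothetical mixing matrix
X = A @ S                                                 # what two "microphones" record
S_hat = fast_ica(X)
```

Each row of `S_hat` should closely match one of the original sources, possibly flipped in sign or swapped in order.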

Clustering Methods
● Assigns each datum to one and only one subset of the data set (crisp clustering)
● k-means clustering
  – finds k centres in the data space
  – minimises the sum of squared distances between each data point and its nearest centre
  – equivalently, maximises the between-cluster sum of squares, keeping the cluster centres well separated
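A sketch of k-means using Lloyd's algorithm. It alternates between assigning every point to its nearest centre and moving each centre to the mean of its points. The farthest-point initialisation is a simple deterministic choice made for this example (k-means++ or random restarts are common in practice), and the two-blob data set is synthetic.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm: nearest-centre assignment, then centre update, until stable."""
    X = np.asarray(X, dtype=float)
    # Farthest-point initialisation (an assumption for this sketch)
    centres = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d2)])
    centres = np.array(centres)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)              # each point joins exactly one cluster
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    wcss = float(d2.min(axis=1).sum())          # within-cluster sum of squares (minimised)
    return labels, centres, wcss

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),    # synthetic blob around (0, 0)
               rng.normal(10.0, 0.5, (50, 2))])  # synthetic blob around (10, 10)
labels, centres, wcss = kmeans(X, k=2)
```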

Vector Quantisation
● Represents an n-dimensional space as an m-dimensional one, where m < n
● Approximately preserves the distances (neighbourhood relations) between examples
  – examples that are close in the n-dimensional space will be close in the m-dimensional space

Example: SOM for vector quantisation of data in bioinformatics
● A self-organising map (SOM) of a selected subset of genes expressed in 49 tissue samples (two types of leukaemia, ALL and AML)
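A minimal online SOM sketch showing vector quantisation from an n-dimensional input space onto a 2-dimensional grid. The grid size, iteration count and the linear learning-rate and neighbourhood schedules are illustrative assumptions. The data are two synthetic 10-dimensional groups, not the leukaemia gene-expression set. All function names are hypothetical.

```python
import numpy as np

# Sketch only: grid size, iteration count and decay schedules are assumed, not from the slides.
def train_som(X, grid=(5, 5), n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Online SOM: fits a 2-D grid of codebook vectors to n-dimensional inputs."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.integers(0, len(X), rows * cols)].astype(float)  # start from random samples
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays towards 0
        sigma = sigma0 * (1.0 - frac) + 0.5      # neighbourhood radius shrinks over time
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))          # best-matching unit
        grid_d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)  # distance measured on the map
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))            # neighbourhood weight
        W += lr * h[:, None] * (x - W)           # pull the BMU and its map neighbours towards x
    return W, coords

def best_matching_units(X, W):
    """Index of the nearest codebook vector for every sample (its quantised code)."""
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)

def quantisation_error(X, W):
    """Mean distance between each sample and its best-matching codebook vector."""
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return float(np.sqrt(d2.min(axis=1)).mean())

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 10)),   # two synthetic groups of
               rng.normal(5.0, 0.3, (50, 10))])  # 10-dimensional samples
W, coords = train_som(X)
bmus = best_matching_units(X, W)
```

Samples from the two groups end up on different map nodes, and each sample's grid position in `coords[bmus]` serves as its 2-dimensional code.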

The objectives of a data transformation
● Data rate reduction
  – meaningful features are extracted from the raw data
● Improving the quality of the information
  – via noise suppression or image enhancement
● Knowledge discovery and better understanding of the processes and events
● Finding similarities and analogies between processes and events

Linear versus non-linear transformations
● Linear transformation
  – F(x) of a raw data vector x such that F is a linear (strictly, affine) function of x, e.g. $F(x) = 2x + 1$
● Non-linear transformation
  – F(x) of a raw data vector x such that F is a non-linear function of x, e.g. the sigmoid $F(x) = \frac{1}{1 + e^{-cx}}$
● Other non-linear transformations
  – the logarithmic function, $F(x) = \log_{10} x$
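The three example functions can be compared directly. In this sketch, `a` and `b` are assumed names for the coefficients of the linear example and `c` is the sigmoid's steepness parameter. The linear transform rescales all gaps between values by the same factor. The sigmoid and logarithm compress large values.

```python
import numpy as np

def linear(x, a=2.0, b=1.0):
    """F(x) = a*x + b; the slide's example is a = 2, b = 1."""
    return a * x + b

def sigmoid(x, c=1.0):
    """F(x) = 1 / (1 + exp(-c*x)); maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-c * x))

x = np.array([-10.0, 0.0, 10.0, 100.0])
lin = linear(x)                      # equal input gaps stay equal
sig = sigmoid(x)                     # large inputs are squeezed towards 1
logs = np.log10(np.array([1.0, 10.0, 1000.0]))
```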

Transformations for pre-processing of data
● Sampling
  – The process of selecting a subset of the available data. It can be applied to continuous time series data such as speech and music.
● Discretisation
  – Representing continuous-valued data by the sub-intervals in which the real values lie.
● Normalisation
  – Mapping the real data onto a predefined scale, e.g. [0, 1]. It can be linear or non-linear.
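A sketch of the three pre-processing steps in NumPy. The helper names are hypothetical. Sampling is shown as keeping every k-th value, and discretisation as `np.digitize` against chosen bin edges. Normalisation is shown as linear min-max scaling, with a log-then-rescale variant as one assumed example of non-linear normalisation.

```python
import numpy as np

def sample(x, step):
    """Sampling: keep every `step`-th value of a series."""
    return np.asarray(x)[::step]

def discretise(x, edges):
    """Discretisation: replace each value by the index of the interval it falls in."""
    return np.digitize(x, edges)

def min_max(x, lo=0.0, hi=1.0):
    """Linear normalisation onto [lo, hi] (undefined if all values are equal)."""
    x = np.asarray(x, dtype=float)
    return lo + (x - x.min()) * (hi - lo) / (x.max() - x.min())

def log_normalise(x):
    """Non-linear normalisation (assumed example): log-compress positive data, then rescale."""
    return min_max(np.log10(np.asarray(x, dtype=float)))

every_third = sample(np.arange(10), 3)
bins = discretise(np.array([0.1, 0.5, 0.9]), edges=[1 / 3, 2 / 3])
scaled = min_max([2.0, 4.0, 6.0])
log_scaled = log_normalise([1.0, 10.0, 100.0])
```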

DFT and FFT Transformations
● Discrete Fourier Transform (DFT)
  – A linear transformation that represents N data samples as a sum of N harmonic (complex sinusoidal) components: $X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n / N}$, for $k = 0, \dots, N-1$
  – A pure sinusoid (e.g. sin) is characterised by a single frequency. Any finite sampled signal can be represented exactly as a sum of sinusoids with different frequencies.
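The DFT formula can be implemented literally as a matrix product. This makes its linearity explicit: transforming a weighted sum of signals gives the same weighted sum of their transforms. This O(N²) sketch is for illustration only and is checked against NumPy's `np.fft.fft`. A pure sinusoid with 5 cycles in 64 samples produces a single peak at k = 5.

```python
import numpy as np

def dft(x):
    """Direct DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N), as an N x N matrix product."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    M = np.exp(-2j * np.pi * np.outer(n, n) / N)   # the DFT matrix
    return M @ x

t = np.arange(64)
x = np.sin(2 * np.pi * 5 * t / 64)   # pure sinusoid: 5 cycles in 64 samples
y = np.cos(2 * np.pi * 3 * t / 64)
X = dft(x)
peak = int(np.argmax(np.abs(X[:32])))   # the single frequency shows up as one peak
```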

DFT and FFT Transformations
● Fast Fourier Transform (FFT)
  – A family of fast algorithms for computing the DFT in O(N log N) rather than O(N²) operations; the classic radix-2 version requires the number of samples N to be a power of 2.
● Applications of FFT transforms
  – speech and image data
  – sunspot activity analysis
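The power-of-2 requirement comes from the radix-2 Cooley-Tukey approach. It splits the samples into even- and odd-indexed halves, transforms each half recursively, and combines them with "twiddle" factors. This recursive sketch is written for clarity rather than speed and is checked against `np.fft.fft`. NumPy's own FFT also handles lengths that are not powers of 2.

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of 2")
    if N == 1:
        return x                                    # the DFT of one sample is itself
    even = fft_radix2(x[0::2])                      # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])                       # DFT of odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N) * odd
    return np.concatenate([even + twiddle, even - twiddle])

signal = np.random.default_rng(0).normal(size=16)
spectrum = fft_radix2(signal)
```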

DFT and FFT Transformations
● Fast Fourier Transform (FFT) of images
  – allows us to analyse the information content of the image
  – the FFT operator transforms the image from the spatial domain to the frequency domain
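A sketch of moving an image into the frequency domain with `np.fft.fft2` and then modifying it there. The circular low-pass filter is an assumed example application: it keeps only frequencies near the centre of the shifted spectrum, which removes fine detail. An 8×8 checkerboard, which contains only the constant (DC) term and the finest possible alternation, is smoothed to its mean value.

```python
import numpy as np

def spectrum(img):
    """2-D FFT with the zero-frequency (DC) term shifted to the centre."""
    return np.fft.fftshift(np.fft.fft2(img))

def low_pass(img, radius):
    """Keep only frequencies within `radius` of the centre, then return to the spatial domain."""
    F = spectrum(img)
    rows, cols = img.shape
    r, c = np.ogrid[:rows, :cols]
    mask = (r - rows // 2) ** 2 + (c - cols // 2) ** 2 <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

i, j = np.indices((8, 8))
checker = 0.5 + 0.5 * (-1.0) ** (i + j)   # alternating 0/1 pixels: the finest detail
smoothed = low_pass(checker, radius=2)    # the alternation is removed, leaving the mean
```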

Wavelet Transformations
● Wavelet transformation
  – A linear transformation of the signal, built from a non-linear wavelet function. It can represent slight, localised changes of the signal within a chosen window of the time scale.
● $W_{a,b}(x) = f(ax - b)$, where
  – f = non-linear function (the wavelet)
  – a = scaling parameter
  – b = shifting parameter
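A sketch using the Haar wavelet, the simplest choice of f. It is a discrete transform with scales doubling at each level and shifts in steps of two samples, which simplifies the continuous form of W_{a,b}. Each level splits the signal into local averages (the coarser approximation) and local differences (detail). A single jump in the signal appears in just one detail coefficient, showing how wavelets localise changes in time. The function names are hypothetical.

```python
import numpy as np

def haar_step(x):
    """One level of the Haar wavelet transform (len(x) must be even)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # local averages: coarser scale
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # local differences: changes at this scale
    return approx, detail

def inverse_haar_step(approx, detail):
    """Exact reconstruction from one level of coefficients."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

def haar(x, levels):
    """Multi-level transform: repeatedly split the approximation."""
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

x = np.array([1, 1, 1, 5, 5, 5, 5, 5], dtype=float)   # one jump between samples 2 and 3
a, d = haar_step(x)
a3, ds = haar(x, levels=3)
```

Because the transform is orthonormal, the signal's energy (sum of squares) is split between the approximation and detail coefficients without loss.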

Summary
● Data analysis is an integral part of the problem-solving process
  – can suggest means of solving the problem
  – assists in the modelling process
● Statistical techniques, clustering techniques and vector quantisation are all available methods

Summary
● There are many transforms available to apply to datasets, some more appropriate than others
● Linear and non-linear transformations are simple but effective operations
● DFT, FFT and wavelet transforms are powerful ways of analysing signals