Matlab Training Sessions 8: Introduction to Statistics.


1 Matlab Training Sessions 8: Introduction to Statistics

2 Course Outline
Weeks:
1. Introduction to Matlab and its Interface (Jan 13 2009)
2. Fundamentals (Operators)
3. Fundamentals (Flow)
4. Functions and M-Files
5. Importing Data
6. Plotting (2D and 3D)
7. Plotting (2D and 3D)
8. Statistical Tools in Matlab
Additional classes will begin next week (Feb 10 2009) and will continue from where the first 8 sessions left off. These sessions will be run by Andrew Pruszynski (4jap1@qlink.queensu.ca).
Course Website: http://www.queensu.ca/neurosci/matlab.php

3 Week 8 Lecture Outline
A. Basic Matlab Statistics
   1. Mean, Median, Variance
   2. Correlations
B. Statistics Toolbox
   1. Parametric and non-parametric statistical tests
   2. Curve fitting

4 Part A: Basics
The base Matlab installation contains basic statistical tools, including mean, median, standard deviation, variance, and correlations.
More advanced statistics are available from the Statistics Toolbox and include parametric and non-parametric comparisons, analysis of variance, and curve fitting tools.

5 Mean and Median
Mean: Average or mean value of a distribution
Median: Middle value of a sorted distribution
M = mean(A), M = median(A)
M = mean(A,dim), M = median(A,dim)
M = mean(A), M = median(A): Returns the mean or median value of vector A. If A is a matrix, mean/median returns a row vector containing the mean or median of each column. mean(A,dim) and median(A,dim) operate along dimension dim (dim = 2 works across each row).
Example:
A = [0 2 5 7 20];
B = [1 2 3
     3 3 6
     4 6 8
     4 7 7];
mean(A) = 6.8
mean(B) = 3.0000 4.5000 6.0000 (column-wise mean)
mean(B,2) = 2.0000 4.0000 6.0000 6.0000 (row-wise mean)

6 Mean and Median
Examples:
A = [0 2 5 7 20];
B = [1 2 3
     3 3 6
     4 6 8
     4 7 7];
Mean:
mean(A) = 6.8
mean(B) = 3.0 4.5 6.0 (column-wise mean)
mean(B,2) = 2.0 4.0 6.0 6.0 (row-wise mean)
Median:
median(A) = 5
median(B) = 3.5 4.5 6.5 (column-wise median)
median(B,2) = 2.0 3.0 6.0 7.0 (row-wise median)
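The values on this slide can be checked at the command prompt. The sketch below assumes B is the 4-by-3 matrix implied by the column-wise and row-wise results quoted above; the variable names on the left-hand side are illustrative only.

```matlab
% Vector and matrix from the slide (B is assumed to be 4 rows by 3 columns)
A = [0 2 5 7 20];
B = [1 2 3;
     3 3 6;
     4 6 8;
     4 7 7];

mean_A   = mean(A);          % 6.8
median_A = median(A);        % 5

col_means   = mean(B);       % [3.0 4.5 6.0], one value per column
row_means   = mean(B, 2);    % [2; 4; 6; 6], one value per row
col_medians = median(B);     % [3.5 4.5 6.5]
row_medians = median(B, 2);  % [2; 3; 6; 7]
```

Note that the row-wise results come back as a column vector, one entry per row of B.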

7 Standard Deviation and Variance
Standard deviation is calculated using the std() function.
std(X): Calculate the standard deviation of vector X. If X is a matrix, std() will return the standard deviation of each column.
Variance (defined as the square of the standard deviation) is calculated using the var() function.
var(X): Calculate the variance of vector X. If X is a matrix, var() will return the variance of each column.
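A short sketch of both functions follows. The data values are made up for illustration and do not come from the course files. By default std() and var() normalize by n-1 (the sample estimates).

```matlab
% Hypothetical example data (not from the slides)
x = [2 4 4 4 5 5 7 9];

s = std(x);          % sample standard deviation (normalized by n-1)
v = var(x);          % sample variance, equal to s^2

% For a matrix, both functions work down each column
M = [1 2;
     3 6;
     5 10];
col_std = std(M);    % [2 4]
col_var = var(M);    % [4 16]
```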

8 Standard Error of the Mean
Often the most appropriate measure of error/variance is the standard error of the mean.
Matlab does not contain a standard error function, so it is useful to create your own.
The standard error of the mean is defined as the standard deviation divided by the square root of the number of samples.

9 Standard Error of the Mean
In Class Exercise 1:
Create a function called se that calculates the standard error of a vector supplied to the function.
E.g. se(x) should return the standard error of vector x.

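10 Standard Error of the Mean
In Class Exercise 1: Solution
function [result] = se(input_vect)
% SE  Standard error of the mean of a vector.
% Matlab is case sensitive, so the function must be called as std, not STD.
result = std(input_vect)/sqrt(length(input_vect));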
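As a quick check of the definition, the same calculation can be written inline. This is only a sketch: the name se_inline and the sample vector are illustrative, and numel is used in place of length so that the count is always the number of elements.

```matlab
% Standard error of the mean: standard deviation / sqrt(number of samples)
se_inline = @(v) std(v) / sqrt(numel(v));   % hypothetical helper name

x = [1 2 3 4 5];
sem_x = se_inline(x);   % std(x) = sqrt(2.5), so sem_x = sqrt(0.5), about 0.7071
```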

11 In Class Exercise 2
From the class website download the file testdata1.txt (http://www.queensu.ca/neurosci/matlab.php)
This text file contains data from two subjects arranged in columns.
1. Load the text file into Matlab using any method you like (load, import, textread(), fscanf())
2. Calculate the mean and standard error for each subject
3. In figure 1, plot the data distribution for each subject using the hist() plotting function
4. In figure 2, plot the mean and standard error of each subject using a bar graph (bar() and errorbar() functions)

12 In Class Exercise 2 Solution
%read data (skip the one-line header)
[subj1, subj2] = textread('testdata1.txt','%f%f','headerlines',1);
%plot distributions of each subject
figure(1)
subplot(2,1,1)
hist(subj1)
subplot(2,1,2)
hist(subj2)
%plot mean and standard error on bar graph (se() is the function from exercise 1)
figure(2)
hold on
bar([1,2],[mean(subj1),mean(subj2)])
errorbar([1,2],[mean(subj1),mean(subj2)],[se(subj1), se(subj2)],'r')

13 In Class Exercise 2 Solution
[Figure 1: histograms for Subject 1 and Subject 2. Figure 2: bar graph of mean and standard error for Subject 1 and Subject 2.]

14-16 Data Correlations
Matlab can calculate statistical correlations using the corrcoef() function.
[R,P] = corrcoef(A,B)
Calculates a matrix R of correlation coefficients and a matrix P of p-values (the significance of each correlation, tested against the hypothesis of no correlation) for variables A and B. A p-value below 0.05 indicates a correlation that is significant at the 95% confidence level.

          A            B
R =  A    AcorA        BcorA        =   1            BcorA
     B    AcorB        BcorB            AcorB        1

          A            B
P =  A    sig(AcorA)   sig(BcorA)   =   1            sig(BcorA)
     B    sig(AcorB)   sig(BcorB)       sig(AcorB)   1
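The layout of R and P can be seen with a small made-up data set. The vectors a and b below are hypothetical (b is roughly 2*a) and are not from the course files.

```matlab
% Hypothetical data: b is close to a linear function of a
a = (1:10)';
b = [2.1 3.9 6.2 7.8 10.1 12.2 13.8 16.1 18.0 19.9]';

[R, P] = corrcoef(a, b);   % both outputs are 2-by-2 matrices

r_ab = R(1,2);   % correlation between a and b (same value as R(2,1))
p_ab = P(1,2);   % p-value for that correlation

% The diagonal of R is 1 because each variable correlates perfectly with itself
```

Passing a single two-column matrix, as in corrcoef([a, b]), gives the same result.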

17-20 Data Correlations
[Figure: plot of Variable 1 against Variable 2]
% Compute sample correlation
[r, p] = corrcoef([var1,var2])
r =
    1.0000    0.7051
    0.7051    1.0000
p =
    1.0000    0.0000
    0.0000    1.0000

21 In Class Exercise 3
From the class website download the file testdata2.txt (http://www.queensu.ca/neurosci/matlab.php)
This text file contains data from two variables arranged in columns.
1. Load the text file into Matlab using any method you like (load, import, textread(), fscanf())
2. Plot the data points
3. Calculate the correlation

22 In Class Exercise 3 Solution
%read data (skip the one-line header)
[var1, var2] = textread('testdata2.txt','%f%f','headerlines',1);
% Compute sample correlation
[r] = corrcoef([var1,var2])
% Plot data points
figure(1)
plot(var1,var2,'ro')
[Figure: plot of Variable 1 against Variable 2]

23 Part B: Statistics Toolbox
The Statistics Toolbox contains a large array of statistical tools. This lecture will concentrate on some of the statistics most commonly used in research:
1. Parametric and non-parametric comparisons
2. Curve fitting

24 Comparison of Means
A wide variety of mathematical methods exist for determining whether the means of different groups are statistically different.
Methods for comparing means can be either parametric (assumes the data are normally distributed) or non-parametric (does not assume a normal distribution).

25 Parametric Tests - TTEST
[H,P] = ttest2(X,Y)
Performs a two-sample t-test to determine whether the means of samples X and Y are statistically different.
H returns 0 or 1: 0 means the null hypothesis (that the means are the same) cannot be rejected, and 1 means it is rejected at the 5% significance level.
P returns the p-value of the test.


27 Parametric Tests - TTEST
Example: For the data from exercise 3
>> [H,P] = ttest2(var1,var2)
H = 1
P = 0.00000000000014877
[Figure: Variable 1 and Variable 2]
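For readers without the course data file, the call can be tried on two small made-up samples. The values below are hypothetical, and ttest2() requires the Statistics Toolbox.

```matlab
% Hypothetical samples with clearly different means (not the course data)
x = [5.1 4.9 5.6 5.8 6.0 5.5 5.3 5.7];
y = [6.9 7.2 7.5 6.8 7.1 7.4 7.0 7.3];

% Two-sample t-test (Statistics Toolbox)
[h, p] = ttest2(x, y);

% h is 1 when the null hypothesis of equal means is rejected at the 5% level;
% p is the p-value of the test
```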

28 Non-Parametric Tests - Ranksum
The Wilcoxon rank sum test assesses whether two groups are statistically different from each other by comparing their medians.
This test is non-parametric and should be used when the data are not normally distributed.
Matlab implements the Wilcoxon rank sum test using the ranksum() function.
ranksum(X,Y) statistically compares the medians of two data distributions X and Y.

29 Non-Parametric Tests - RankSum
Example: For the data from exercise 3
[P,H] = ranksum(var1,var2)
P = 1.1431e-014
H = 1
[Figure: Variable 1 and Variable 2]
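The same kind of check can be run with ranksum() on made-up samples. The values are hypothetical, and the function requires the Statistics Toolbox. Note that the output order is [P,H], the reverse of ttest2().

```matlab
% Hypothetical samples (not the course data); every x value is below every y value
x = [5.1 4.9 5.6 5.8 6.0 5.5 5.3 5.7];
y = [6.9 7.2 7.5 6.8 7.1 7.4 7.0 7.3];

% Wilcoxon rank sum test (Statistics Toolbox); p-value comes first, then h
[p, h] = ranksum(x, y);

% h is 1 when the null hypothesis of equal medians is rejected at the 5% level
```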

30 Curve Fitting
Plotting a line of best fit in Matlab can be performed using either a traditional least squares fit or a robust fitting method.
[Figure: example data with two fitted lines, labelled Least squares and Robust]

31 Curve Fitting
A least squares linear fit minimizes the sum of the squared distances between every data point and the line of best fit.
polyfit(X,Y,N) finds the coefficients of a polynomial P(X) of degree N that fits the data, using least-squares minimization. N = 1 gives a linear fit.
[P] = polyfit(X,Y,N) returns P, a vector containing the slope and the y-intercept for a linear fit.
[Y] = polyval(P,X) calculates the Y value for every X point on the line of best fit.
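The order of the coefficients in P is easiest to see with points that lie exactly on a known line. The data below are made up for illustration.

```matlab
% Hypothetical data lying exactly on the line y = 2x + 1
x = 1:5;
y = 2*x + 1;

P = polyfit(x, y, 1);     % linear fit: P(1) is the slope, P(2) is the y-intercept
y_fit = polyval(P, x);    % fitted y value at each x

slope     = P(1);         % approximately 2
intercept = P(2);         % approximately 1
```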

32 Curve Fitting
Example: Draw a line of best fit using least squares approximation for the data in exercise 3 (testdata2.txt)
[var1, var2] = textread('testdata2.txt','%f%f','headerlines',1);
P = polyfit(var1,var2,1);
Y = polyval(P,var1);
close all
figure(1)
hold on
plot(var1,var2,'ro')
plot(var1,Y)

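33 Curve Fitting
A least squares linear fit minimizes the sum of the squared distances between every data point and the line of best fit, so it is sensitive to outliers. A robust fit reduces the influence of outlying points.
P = robustfit(X,Y) returns the vector P containing the y-intercept and slope (in that order), obtained by performing a robust linear fit.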

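34 Curve Fitting
Example: Draw a line of best fit using robust fit approximation for the data in exercise 3 (testdata2.txt)
[var1, var2] = textread('testdata2.txt','%f%f','headerlines',1);
% robustfit takes the data only; a third argument would be read as a weight function
P = robustfit(var1,var2);
% robustfit returns [intercept; slope], so reorder for polyval
Y = polyval([P(2),P(1)],var1);
close all
figure(1)
hold on
plot(var1,var2,'ro')
plot(var1,Y)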
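The difference between the two fitting methods shows up when the data contain an outlier. The sketch below uses made-up data (points near y = 2x + 1 with the last point moved far off the line). robustfit() requires the Statistics Toolbox.

```matlab
% Hypothetical data near y = 2x + 1, with one outlier at the last point
x = (1:10)';
noise = [0.1 -0.2 0.15 -0.1 0.05 -0.15 0.2 -0.05 0.1 0]';
y = 2*x + 1 + noise;
y(10) = 60;                       % outlier (about 21 would be on the line)

P_ls  = polyfit(x, y, 1);         % least squares: [slope, intercept]
b_rob = robustfit(x, y);          % robust fit:    [intercept; slope]

slope_ls  = P_ls(1);
slope_rob = b_rob(2);

% Fitted lines; the robustfit coefficients are reordered for polyval
y_ls  = polyval(P_ls, x);
y_rob = polyval([b_rob(2), b_rob(1)], x);
```

With data like this the robust slope should stay closer to 2 than the least squares slope, because the outlier is given less weight.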

35 Ideas for Next Term?
Additional Statistics, ANOVAs, etc.
Curve fitting with quadratic functions and cubic splines
Algorithms and Data structures
Improving Program Execution Time
Assistance Tutorials for individual programming problems
Any Suggestions?

36 Getting Help
Help and Documentation
Digital
1. Accessible Help from the Matlab Start Menu
2. Updated online help from the Mathworks website: http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.html
3. Matlab command prompt function lookup
4. Built-in Demos
5. Websites
Hard Copy
1. Books, Guides, Reference: The Student Edition of Matlab, pub. Mathworks Inc.

