Download presentation

Presentation is loading. Please wait.

Published byChristian Sparks Modified over 2 years ago

1
CS CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J R W Hunter, 2006

2
CS Ordinal data X is an ordinal variable with values: a 1, a 2, a 3,... a k,... a K ordinal means that: a 1 a 2 a 3... a k... a K cumulative frequency at level k : c k = sum of frequencies of values less than or equal to a k c k = f 1 + f 2 + f f k = (f 1 + f 2 + f f k-1 ) + f k = c k-1 + f k also (%) cumulative relative frequency

3
CS CAS marks % relative frequencies % cumulative relative frequencies

4
CS Enzyme concentrations Concentration Freq. Rel.Freq. % Cum. Rel. Freq c < % 39.5 c < % 59.5 c < % 79.5 c < % 99.5 c < % c < % c < % c < % c < % c < % Totals

5
CS Cumulative histogram

6
CS Discrete two variable data

7
CS Continuous two variable data X Y

8
CS Time Series Time and space are fundamental (especially time) Time series: variation of a particular variable with time

9
CS Summarising data by numerical means Further summarisation (beyond frequencies) Measures of location (Where is the middle?) Mean Median Mode

10
CS Mean _ sum of observed values of X Sample Mean ( X ) = number of observed values x = n use only for quantitative data

11
CS Sigma Sum of n observations n x i = x 1 + x x i x n-1 + x n i = 1 If it is clear that the sum is from 1 to n then: x = x 1 + x x i x n-1 + x n Sum of squares x 2 = x x x i x n x n 2

12
CS x from frequencies If X is a categorical variable with values: a 1, a 2, a 3,... a k,... a K x = x 1 + x x i x n-1 + x n ( order of summation isnt important ) (e.g. piglets: ) Group together those x s which have value a 1, those with value a 2,... x = x.. + x.. + x x s which have value a 1 - there are f 1 of them x.. + x x s which have value a 2 - there are f 2 of them... x.. + x.. x s which have value a K - there are f K of them = f 1 * a 1 + f 2 * a f k * a k f K * a K K = f k * a k k = 1

13
CS Mean Litter size Frequency Cum. Freq a k f k Total 36 K x = f k * a k k = 1 = 1*5 + 0* 6 + 2*7 + 3*8 3*9 + 9*10 + 8*11 5*12 + 3*13 + 2*14 = 375 _ X = 375 / 36 = 10.42

14
CS Median Sample median of X = middle value when n sample observations are ranked in increasing order = the ((n + 1)/2) th value n odd: values: 183, 163, 152, 157 and 157 rank order: 152, 157, 157, 163, 183 median: 157 n even: values: 165, 173, 180, 164 rank order: 164, 165, 173, 180 median: ( )/2 = 169

15
CS Median Litter size Frequency Cum. Freq Total 36 Median = 10.5

16
CS Median from cumulative distribution cumulative % frequency polygon

17
CS Mode Sample mode = value with highest frequency (may not be unique) Litter size Frequency Cum. Freq Mode = 10

18
CS Skew left skewed symmetric right skewed mean mode

19
CS Variance Measure of spread: variance

20
CS Variance sample variance = s 2 sample standard deviation = s = variance

21
CS Variance and standard deviation Litter size Frequency Cum. Freq a k f k Total 36 K x 2 = f k * a k 2 k = 1 = 1*25 + 2*49 + 3*64 3*81 + 9* *121 5* * *196 = 4145 x = 375 ( x) 2 / n = 375*375 / 36 = 3906 s 2 = ( ) / (36-1) = 6.83 s = 2.6

22
CS Piglets Mean = Median = 10.5 Mode = 10 Std. devn. = 2.6

23
CS Quartiles and Range Lower quartile: value such that 25% of observations are below it ( Q1 ). Median: value such that 50% of observations are below (above) it ( Q2 ). Upper quartile: value such that 25% of observations are above it ( Q3 ). Range: the minimum ( m ) and maximum ( M ) observations. Box and Whisker plot: m Q1 Q2 Q3 M

24
CS Estimating quartiles

25
CS Linear Regression y = mc + c Calculate m and c so that (distance of point from line) 2 is minimised y x

26
CS Time Series - Moving Average Time Y 3 point MA 0 24 * * smoothing function can compute median, max, min, std. devn, etc. in window

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google