1
probability models: the Normal especially

10
checking distributional assumptions

12
[Figure: probability plot; axes: Theoretical Percentile, Empirical Percentile; log scale]

13
Modelling continuous variables: checking normality
- Normal probability plot: the points should lie close to a straight line.
- The p-value of a formal test is also reported (null hypothesis: the data are Normally distributed).
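The slides rely on software to draw the Normal probability plot and report the test. As a rough illustration of what the plot measures, the sketch below (Python standard library only) computes the plot's coordinates and summarises how straight the line is by the correlation between theoretical and empirical quantiles; values near 1 suggest approximate Normality. The plotting positions (Blom's, (i − 0.375)/(n + 0.25)), the simulated samples and the function names are illustrative assumptions, and this correlation is a descriptive check, not the formal test whose p-value the software reports (SciPy's `scipy.stats.shapiro` is one such formal test).

```python
import math
import random
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def normal_plot_points(data):
    """Return (theoretical, empirical) quantiles for a Normal probability plot.

    Assumes Blom's plotting positions (i - 0.375) / (n + 0.25); other
    conventions exist and give very similar plots.
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return theoretical, ordered


def straightness(data):
    """Correlation between theoretical and empirical quantiles (near 1 = straight line)."""
    theoretical, empirical = normal_plot_points(data)
    return pearson_r(theoretical, empirical)


# Simulated, illustrative samples: one Normal, one strongly right-skewed
rng = random.Random(1)
normal_sample = [rng.gauss(10, 2) for _ in range(200)]
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]
print(f"Normal sample: r = {straightness(normal_sample):.3f}")
print(f"Skewed sample: r = {straightness(skewed_sample):.3f}")
```

Plotting `theoretical` against `empirical` with any plotting library reproduces the picture; the skewed sample bends away from a straight line and gives the lower correlation.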

14
another statistic: the estimated standard error
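For the most common case, the sample mean, the estimated standard error is s/√n, where s is the sample standard deviation and n the sample size. A minimal sketch with a hypothetical data set (the function name is illustrative):

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 divisor
    return s / math.sqrt(n)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(f"mean = {statistics.fmean(data):.3f}, SE = {standard_error_of_mean(data):.4f}")
```

The standard error describes how much the sample mean would vary from sample to sample, not how much individual observations vary; it shrinks as n grows.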

20
Statistical inference
- Confidence intervals
- Hypothesis testing and the p-value
- Statistical significance vs. real-world importance

21
a formal statistical procedure: confidence intervals

22
Confidence intervals: an alternative to hypothesis testing
- A confidence interval is a range of plausible values for the population parameter.
- The confidence coefficient is the percentage of times that the method will, in the long run, capture the true population parameter.
- A common form is: sample estimator ± 2 × estimated standard error, which gives roughly 95% confidence when the estimator is approximately Normally distributed.
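The long-run meaning of the confidence coefficient can be checked by simulation. This sketch assumes Normally distributed data with a known true mean (50), draws many samples, builds the interval sample mean ± 2 × estimated standard error for each, and counts how often the interval captures the true mean. All parameter values and function names are arbitrary illustrative choices.

```python
import math
import random
import statistics


def interval_2se(sample):
    """Approximate 95% confidence interval for the mean: mean +/- 2 * estimated SE."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - 2 * se, m + 2 * se


def coverage(true_mean=50.0, sd=10.0, n=50, reps=5_000, seed=42):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        lo, hi = interval_2se(sample)
        hits += lo <= true_mean <= hi  # True counts as 1
    return hits / reps


print(f"Long-run coverage: {coverage():.3f}")  # expected to be roughly 0.95
```

Any single interval either contains the true mean or it does not; the 95% describes the method over many repetitions.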

28
another formal inferential procedure: hypothesis testing

29
Hypothesis testing
- Null hypothesis: usually "no effect".
- Alternative hypothesis: an effect exists.
- Make a decision based on the evidence (the data).
- There is a risk of getting it wrong! Two types of error:
  - Type I: rejecting the null hypothesis when we shouldn't (it is true).
  - Type II: not rejecting the null hypothesis when we should (it is false).
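As a sketch of the decision step, the function below runs a two-sided one-sample z-test and rejects the null hypothesis when the p-value falls below a chosen significance level. It assumes the population standard deviation is known (with an estimated standard deviation a t-test would normally be used), and the function name and numbers are hypothetical. Rejecting when the null is true would be a Type I error; not rejecting when it is false would be a Type II error.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, null_mean, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test with known population SD sigma.

    Returns (z statistic, p-value, whether H0 is rejected at level alpha).
    """
    z = (sample_mean - null_mean) / (sigma / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # probability of a |Z| at least this large under H0
    return z, p, p < alpha


# Hypothetical study: H0 says the mean is 10.0; 36 observations average 10.6
z, p, reject = z_test(sample_mean=10.6, null_mean=10.0, sigma=2.0, n=36)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```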

30
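Significance levels
- We cannot reduce the probabilities of both Type I and Type II errors to zero, so we control the probability of a Type I error.
- This probability is the significance level (α). It is chosen before the analysis and is not the same thing as the p-value.
- The p-value is the probability, calculated assuming the null hypothesis is true, of obtaining data at least as extreme as those observed. We reject the null hypothesis when p < α.
- α = 0.05 is generally considered a reasonable risk of a Type I error (compare "beyond reasonable doubt").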
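The claim that the significance level controls the Type I error rate can be checked by simulation: generate data for which the null hypothesis is true, test each data set at α = 0.05, and count the rejections. A minimal sketch, assuming Normal data with known standard deviation and a two-sided z-test; the sample size, number of repetitions, seed and function names are arbitrary.

```python
import math
import random
from statistics import NormalDist, fmean


def p_value_two_sided(sample, null_mean, sigma):
    """Two-sided z-test p-value, treating sigma as known."""
    z = (fmean(sample) - null_mean) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_one_error_rate(alpha=0.05, n=25, reps=20_000, seed=7):
    """Fraction of tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0 (mean = 0) is true here
        if p_value_two_sided(sample, null_mean=0.0, sigma=1.0) < alpha:
            rejections += 1
    return rejections / reps


print(f"Estimated Type I error rate: {type_one_error_rate():.3f}")  # should be near 0.05
```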

31
Statistical significance vs. practical importance
- Statistical significance is concerned with the ability to discriminate between treatments given the background variation.
- Practical importance relates to the scientific domain and is concerned with scientific discovery and explanation.
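A worked example with hypothetical numbers shows how the two can diverge. Suppose the observed difference from the null value is 0.01 units, the standard deviation is known to be 1 unit, and a two-sided z-test is used:

```latex
z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}, \qquad \bar{x} - \mu_0 = 0.01,\ \sigma = 1

n = 100: \quad z = \frac{0.01}{1/10} = 0.1, \quad p \approx 0.92

n = 1\,000\,000: \quad z = \frac{0.01}{1/1000} = 10, \quad p < 10^{-20}
```

The same tiny difference is not statistically significant with 100 observations but overwhelmingly significant with a million; whether a difference of 0.01 units matters is a question for the scientific domain, not for the p-value.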

32
Power
- Power is related to the Type II error: power = 1 − P(Type II error), i.e. the probability of rejecting the null hypothesis when it is false.
- Aim: keep power as high as possible.
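Power can be computed directly for simple tests. This sketch assumes a two-sided one-sample z-test with known standard deviation; the function name, the effect size (0.5 standard deviations) and the sample sizes are illustrative. It shows power rising with sample size, one of the usual ways of keeping power high.

```python
import math
from statistics import NormalDist


def power_two_sided_z(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test to detect a true shift of `effect`.

    power = 1 - P(Type II error) = P(reject H0 | true mean = null mean + effect)
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = effect * math.sqrt(n) / sigma       # mean of the z statistic under the alternative
    # Reject if Z > z_crit or Z < -z_crit, where Z ~ N(shift, 1)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-z_crit - shift)


for n in (10, 25, 50, 100):
    print(f"n = {n:3d}: power = {power_two_sided_z(effect=0.5, sigma=1.0, n=n):.3f}")
```

With no true effect (effect = 0) the "power" equals α, the Type I error rate, which is a useful sanity check on the formula.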

37
Statistical models
- Outcomes or responses: the results of the practical work, sometimes referred to as dependent variables.
- Causes or explanations: the conditions or environment within which the outcomes or responses have been observed, sometimes referred to as independent variables but more commonly known as covariates.

38
relationships: linear or otherwise

39
Correlations and linear relationships
- Pearson correlation: measures the strength of a linear relationship.
- A simple indicator lying between −1 and +1.
- Check your plots for linearity.
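A minimal implementation of the Pearson coefficient, written out so the formula is visible (Python 3.10+ also provides `statistics.correlation`). The function name and example data are illustrative.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations.

    Always lies between -1 and +1 and measures *linear* association only.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))  # points on an increasing straight line
print(pearson_r(x, [10 - v for v in x]))     # points on a decreasing straight line
```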

40
gene correlations

41
Interpreting correlations
The correlation coefficient measures the strength of the linear association between two variables. If the relationship is non-linear, the coefficient can still be calculated and may appear sensible, so beware: plot the data first.
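The warning is easy to demonstrate with made-up data: an exact but U-shaped relationship can give a correlation of zero, while an obviously curved relationship can give a correlation close to 1. A minimal sketch (data and names are illustrative):

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Exact but U-shaped relationship: y is completely determined by x, yet r is 0
x_sym = [-3, -2, -1, 0, 1, 2, 3]
r_u = pearson_r(x_sym, [v ** 2 for v in x_sym])

# Clearly curved (quadratic) relationship over positive x: r is still close to 1
x_pos = [1, 2, 3, 4, 5, 6, 7]
r_curve = pearson_r(x_pos, [v ** 2 for v in x_pos])

print(f"U-shaped: r = {r_u:.3f}")
print(f"Curved:   r = {r_curve:.3f}")
```

Neither number reveals the curvature; only a plot does.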

42
A matrix plot

44
Correlations
- P and N: p-value 0.001
- Fe and N: p-value 0.008
- Fe and P: p-value < 0.001
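The slides do not say how these p-values were computed or how many observations were used. The usual exact test of zero correlation (under bivariate Normality) uses t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom; because Python's standard library has no t distribution, this sketch swaps in Fisher's z approximation instead. It illustrates that a p-value depends on the sample size as well as on r. The function name and the r and n values are hypothetical.

```python
import math
from statistics import NormalDist


def correlation_p_value(r, n):
    """Approximate two-sided p-value for H0: population correlation = 0.

    Uses Fisher's z-transformation: atanh(r) * sqrt(n - 3) is roughly
    standard Normal when the true correlation is zero.
    """
    if n < 4 or not -1 < r < 1:
        raise ValueError("need n >= 4 and -1 < r < 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same r gives very different p-values at different (hypothetical) sample sizes
for n in (10, 30, 100):
    print(f"r = 0.4, n = {n:3d}: p = {correlation_p_value(0.4, n):.4f}")
```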

45
All are highly significant, but do the scatterplots support this interpretation?
- The points tend to be clustered in the bottom left corner of each plot.
- One or two observations are well separated from the cluster.
- Both features suggest a transformation (try logs).
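The concern about separated observations can be illustrated with hypothetical data (not the data from these slides): five clustered points with a perfect negative trend plus one distant point. The distant point alone produces a strong positive correlation on the raw scale; after taking logs its influence is reduced, though not removed, which is why the transformed data should be plotted too.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: a tight cluster with a perfect negative trend ...
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
# ... plus one observation well separated from the cluster
x_all, y_all = x + [100], y + [100]

r_cluster = pearson_r(x, y)
r_raw = pearson_r(x_all, y_all)
r_log = pearson_r([math.log(v) for v in x_all], [math.log(v) for v in y_all])
print(f"cluster only: {r_cluster:.3f}")
print(f"all points, raw scale: {r_raw:.3f}")
print(f"all points, log scale: {r_log:.3f}")
```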

47
Correlations (log scale)
- log P and log N: p-value 0.012
- log Fe and log N: p-value 0.043
- log P and log Fe: p-value < 0.001

48
what is a statistical model?

49
Statistical models
In experiments, many of the covariates are determined by the experimenter, but some may be aspects that the experimenter cannot control yet are relevant to the outcomes or responses. In observational studies, the covariates are usually not under the experimenter's control but are recorded as possible explanations of the outcomes or responses.

50
Specifying a statistical model
Models specify the way in which outcomes and causes link together, e.g.
Metabolite = Temperature
The = sign does not indicate equality in a mathematical sense, and the right-hand side should include an additional term for everything the model does not capture, giving the formula:
Metabolite = Temperature + Error
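One common way to make this concrete, assuming a straight-line effect of Temperature and Normally distributed errors, is the simple linear regression model below. The symbols β₀, β₁ and σ² are introduced here for illustration and do not appear in the slides.

```latex
\text{Metabolite}_i = \beta_0 + \beta_1\,\text{Temperature}_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2), \quad i = 1, \dots, n
```

Here β₀ + β₁ Temperatureᵢ is the "Temperature" part of the model and εᵢ is the "Error" part for observation i.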

51
Statistical model interpretation
Metabolite = Temperature + Error
The outcome Metabolite is explained by Temperature and by other things that we have not recorded, which we call Error. The task of the data analysis is then to find out whether the effect of Temperature is large compared with the effect of Error, so that we can say whether or not the Metabolite values we observe are explained by Temperature.
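A minimal sketch of that comparison for a straight-line model, using hypothetical Temperature and Metabolite values: the total variation in Metabolite is split into a part explained by the fitted line and a residual (Error) part, and their ratio per degree of freedom gives the usual F statistic for simple linear regression. The function names and data are illustrative assumptions, not the analysis behind these slides.

```python
from statistics import fmean


def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1 * x + error."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def variation_explained(x, y):
    """Split the variation in y into a model (Temperature) part and an Error part."""
    b0, b1 = fit_line(x, y)
    my = fmean(y)
    fitted = [b0 + b1 * a for a in x]
    ss_model = sum((f - my) ** 2 for f in fitted)            # explained by the line
    ss_error = sum((b - f) ** 2 for b, f in zip(y, fitted))  # left over: Error
    df_error = len(y) - 2                                    # n minus two fitted parameters
    f_ratio = ss_model / (ss_error / df_error) if ss_error > 0 else float("inf")
    return b0, b1, ss_model, ss_error, f_ratio


# Hypothetical data: temperature (degrees C) and metabolite level (arbitrary units)
temperature = [10, 15, 20, 25, 30, 35]
metabolite = [4.1, 5.0, 6.2, 6.8, 8.1, 8.9]
b0, b1, ss_model, ss_error, f_ratio = variation_explained(temperature, metabolite)
print(f"Metabolite = {b0:.2f} + {b1:.3f} * Temperature + Error")
print(f"SS(Temperature) = {ss_model:.3f}, SS(Error) = {ss_error:.3f}, F = {f_ratio:.1f}")
```

A large F means the variation attributed to Temperature is large relative to the Error variation; in practice it would be compared with an F distribution on 1 and n − 2 degrees of freedom to obtain a p-value.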

52
Summary
- Hypothesis tests and confidence intervals are used to make inferences.
- We build statistical models to explore relationships and explain variation.
- The modelling framework is a general one: general linear models, generalised additive models.
- Assumptions should be checked.
