# Website


Website: http://www.mun.ca/biology/quant/

Welcome to Biology 4605 / 7220 Model Based Statistics in Biology

Cookie Experiment

Was there a preference? Chocolate chip or cinnamon rolls: are they different? Use statistics: Binomial Test!

χ² = ___, p-value = ___
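A minimal sketch of how such a preference could be tested. The counts (14 of 20 choosing chocolate chip) are hypothetical, since the class results are not recorded here, and the function names are illustrative. It computes an exact two-sided binomial p-value and, for comparison, the χ² goodness-of-fit statistic with 1 degree of freedom.

```python
from math import comb, erfc, sqrt

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: add up the probabilities of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = pmf[k]
    # The small tolerance keeps floating-point ties on the right side.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))

def chi_square_1df(k, n, p=0.5):
    """Chi-square goodness-of-fit statistic for two categories and its
    p-value (for 1 degree of freedom the tail area is erfc(sqrt(x / 2)))."""
    observed = [k, n - k]
    expected = [n * p, n * (1 - p)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, erfc(sqrt(chi2 / 2))

# Hypothetical counts, not the class data.
n_students = 20
chose_chocolate_chip = 14

p_exact = binomial_two_sided_p(chose_chocolate_chip, n_students)
chi2, p_chi2 = chi_square_1df(chose_chocolate_chip, n_students)
print(f"exact binomial p = {p_exact:.4f}")
print(f"chi-square = {chi2:.2f}, p = {p_chi2:.4f}")
```

With a sample this small the exact test and the χ² approximation can give noticeably different p-values, which is one reason to know which model sits behind the number.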

Are we feeding you a bunch of lies?

Do statisticians use a bunch of fancy tests to bolster weak arguments? Are stats misused and misinterpreted?

"There are three kinds of lies: lies, damned lies, and statistics." - Leonard Henry Courtney (1832-1918), Journal of the Royal Statistical Society, No. 59 (1896)

Problems:

- Rare events
- Zero-inflated
- Mean is inappropriate

Hypothetical example: Fewer than one individual of an endangered species was observed per transect (mean: 0.57 ind./transect). Proceed with development!
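A small sketch of why the mean misleads here. The seven transect counts are invented so that their mean matches the 0.57 ind./transect in the hypothetical example; they are not real survey data.

```python
from statistics import mean, median

# Hypothetical counts per transect: six empty transects and one with 4 animals.
counts = [0, 0, 0, 0, 0, 0, 4]

avg = mean(counts)                            # 4 / 7
mid = median(counts)                          # the typical transect
share_zero = counts.count(0) / len(counts)    # proportion of empty transects

print(f"mean = {avg:.2f} ind./transect")
print(f"median = {mid}")
print(f"transects with zero individuals = {share_zero:.0%}")
```

The mean suggests the species is thinly spread everywhere, while the counts show it is absent from most transects and concentrated in one. A summary of a zero-inflated count needs more than the mean.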

Statistics are Balderdash!

"If your experiment needs statistics, you ought to have done a better experiment." - Ernest Rutherford (1871-1937)

Fair enough… Balance is important. What about field studies?

No! Hypothesis testing is inevitable

"Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis." - R.A. Fisher (1890-1962)

Hypothesis testing is statistical flotsam

"Everyone will have his own pet assortment of flotsam; mine include most of the theory of significance testing, including multiple comparison tests, and non-parametric statistics." - John Nelder (1924-2010)

The trouble with significance testing

Elementary statistics courses for biologists tend to lead to the use of a stereotyped set of tests:

1. Without critical attention to the underlying model involved;
2. Without due regard to the precise distribution of sampling errors;
3. With little concern for the scale of measurement;
4. Careless of dimensional homogeneity;
5. Without considering the ideal transformation;
6. Without any attempt at model simplification;
7. With too much emphasis on hypothesis testing and too little emphasis on parameter estimation.

- M.J. Crawley 1993

So how should we analyse our data?!

1. Use Model Based Statistics
2. Don't let significance testing do the thinking for you

"You are always better off thinking about why a model could generate your data and then testing that model." - L. Wilkinson et al. 1992

Figure: plant height against time in sunlight, showing the data and the model.

Classic approach

- Identify a test by name.
- Check its assumptions.
- Use automated routines provided in a package.
- Sort through the output for a p-value.
- Report whether p was less than 5%.

Model Based approach

- What is the response variable? What are the explanatory variables?
- Write the model.
- Check the residuals. Model appropriate? Error structure correct?
- Take corrective action.
- Report the model, parameter values, and standard errors.

In short: Write the model\* and discard the search for tests

- Data = Model + Residual
- Y = mX + b + Residual (Regression)

\*Don't panic… writing a model is easy

Figure: plant height against time in sunlight.
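A sketch of the decomposition Data = Model + Residual for the regression case Y = mX + b + Residual. The sunlight and height values are invented for illustration, and `fit_line` is a hypothetical helper name. It uses ordinary least squares and also reports the standard errors that the model based approach asks for.

```python
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = m*x + b.
    Returns slope, intercept, their standard errors, fitted values, residuals."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = y_bar - m * x_bar
    fitted = [m * xi + b for xi in x]                    # the Model part
    residuals = [yi - fi for yi, fi in zip(y, fitted)]   # the Residual part
    mse = sum(r ** 2 for r in residuals) / (n - 2)       # residual variance
    se_m = sqrt(mse / sxx)
    se_b = sqrt(mse * (1 / n + x_bar ** 2 / sxx))
    return m, b, se_m, se_b, fitted, residuals

# Hypothetical data: hours in sunlight and plant height (cm).
sunlight = [2, 4, 6, 8, 10]
height = [5.1, 8.9, 13.2, 16.8, 21.0]

m, b, se_m, se_b, fitted, residuals = fit_line(sunlight, height)
print(f"slope m = {m:.3f} (SE {se_m:.3f})")
print(f"intercept b = {b:.3f} (SE {se_b:.3f})")
for y_obs, y_fit, res in zip(height, fitted, residuals):
    print(f"data {y_obs:5.1f} = model {y_fit:6.3f} + residual {res:+.3f}")
```

Every observation is rebuilt exactly from its fitted value plus its residual. Inspecting those residuals is the "check the residuals" step.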

How to conceptualise a model

Quick example: Data, Verbal, Graphical, Formal

Data

M = Catch of scallops (kg); R = Seabed roughness (acoustic values)

| R | M |
|---|---|
| 1 | 0 |
| 1 | 25 |
| 2 | 0 |
| 2 | 50 |
| 3 | 25 |
| 4 | 0 |
| 4 |  |
| 4 | 50 |
| 5 | 0 |
| 5 | 25 |
| 5 | 75 |
| 5 | 100 |
| 5 | 150 |
| 5 | 175 |
| 5 | 200 |
| 6 | 25 |
| 6 | 50 |
| 6 | 75 |
| 6 | 125 |
| 6 | 150 |
| 6 | 175 |
| 7 | 0 |
| 7 | 25 |
| 8 | 0 |
| 8 | 50 |
| 9 | 25 |
| 10 | 0 |
|  | 25 |

Continued…

Grab samples: 1-4 = Sand, 5-6 = Gravel, 7-10 = Cobble

Verbal: Catch is higher in gravel than in finer (sand) or coarser (cobble) substrates.

Graphical: No obvious linear trend. Simplify: two-means model (gravel vs. other).

Formal: Two-means model

- M = K₁ if R = 5 or 6 (gravel)
- M = K₂ if R is neither 5 nor 6

Data = Model + Residual

M = [K₁, K₂] + Residuals
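The two-means model can be fitted by hand, as in the sketch below. It uses the (R, M) pairs that can be read unambiguously from the data slide; two entries whose digits did not survive are left out, and the slide notes that the data continue, so the estimates of K₁ and K₂ are illustrative only. The variable names are assumptions.

```python
from statistics import mean

# (R, M) pairs recoverable from the data slide; garbled entries are omitted.
data = [
    (1, 0), (1, 25), (2, 0), (2, 50), (3, 25), (4, 0), (4, 50),
    (5, 0), (5, 25), (5, 75), (5, 100), (5, 150), (5, 175), (5, 200),
    (6, 25), (6, 50), (6, 75), (6, 125), (6, 150), (6, 175),
    (7, 0), (7, 25), (8, 0), (8, 50), (9, 25), (10, 0),
]

GRAVEL = {5, 6}  # roughness values identified as gravel by the grab samples

gravel = [m for r, m in data if r in GRAVEL]
other = [m for r, m in data if r not in GRAVEL]

k1 = mean(gravel)  # estimate of K1: mean catch on gravel
k2 = mean(other)   # estimate of K2: mean catch on sand or cobble

# Data = Model + Residual: each catch minus the mean of its own group.
residuals = [m - (k1 if r in GRAVEL else k2) for r, m in data]

print(f"K1 (gravel) = {k1:.1f} kg from {len(gravel)} samples")
print(f"K2 (other)  = {k2:.1f} kg from {len(other)} samples")
print(f"residual sum of squares = {sum(e ** 2 for e in residuals):.1f}")
```

Each group mean is the least-squares estimate for its group, so the residuals sum to zero within each group.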

The General Linear Model

Data = Model + Normal Residual

- Data = [Two means] + Normal residual: t-test
- Data = [Several means] + Normal residual: One-way ANOVA
- Data = [Two factors] + Normal residual: Two-way ANOVA
- Data = [Line] + Normal residual: Regression
- Data = [Line + factors] + Normal residual: ANCOVA
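One way to see that the t-test is a special case of the general linear model: the pooled-variance t statistic, squared, equals the F ratio obtained by comparing a two-means model against a one-mean model. The sketch below checks this on two small invented groups; the data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean

def pooled_t(a, b):
    """Classic two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ss_within = (sum((v - mean(a)) ** 2 for v in a)
                 + sum((v - mean(b)) ** 2 for v in b))
    pooled_var = ss_within / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))

def two_means_f(a, b):
    """F ratio comparing Data = [two means] + residual
    against Data = [one mean] + residual."""
    values = a + b
    grand = mean(values)
    ss_one_mean = sum((v - grand) ** 2 for v in values)    # residual SS, one mean
    ss_two_means = (sum((v - mean(a)) ** 2 for v in a)
                    + sum((v - mean(b)) ** 2 for v in b))  # residual SS, two means
    df_resid = len(values) - 2
    # The two-means model uses one extra parameter, so the numerator has 1 df.
    return (ss_one_mean - ss_two_means) / (ss_two_means / df_resid)

# Hypothetical measurements for two groups.
group_a = [12.0, 15.0, 11.0, 14.0]
group_b = [18.0, 20.0, 17.0, 21.0, 19.0]

t = pooled_t(group_a, group_b)
f = two_means_f(group_a, group_b)
print(f"t = {t:.4f}, t squared = {t ** 2:.4f}, F = {f:.4f}")
```

The named test and the model comparison give the same answer. The model formulation extends directly to several means, lines, and combinations of both.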

Reasons for the model based approach

1. Statistics is modelling
2. Carryover between biological models and statistics
3. Model approach leads to learning of concepts and principles

Testing models

Let computers do the work

Packages compared: Excel, Minitab, SPSS, SAS, R

Criteria: spreadsheet visible; pull-down menus; easily graph data; basic stats functions; randomise data; General Linear Model; residual analysis; logistic regression; Generalized Linear Model; easy to learn; FREE

Course Goals

1. Introduce you to effective ways of thinking quantitatively about biological phenomena
2. Increase your skill and confidence in the application of quantitative methods
3. Develop your critical capacity, both for your own work and that of others