POPULATION DYNAMICS

Presentation transcript:

POPULATION DYNAMICS
Required background knowledge:
- Data and variability concepts
- Data collection
- Measures of central tendency and dispersion (mean, median, mode, variance, stdev)
- Normal distribution and SE
- Student's t-test and 95% confidence intervals
- Chi-Square tests
- MS Excel

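STATISTICS: z-DISTRIBUTION
If n is very, very large, we use the Z distribution to calculate normal deviates:
$Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
If n is not large, we must use the t distribution:
$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$ (Equation 3)
Here $\bar{x}$ is the sample mean, $\mu$ is the population mean, and $\sigma_{\bar{x}}$ and $s_{\bar{x}}$ are the standard error of the mean based on the population and the sample standard deviation respectively.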

But first... WHY do we do all this? HYPOTHESIS TESTING is an integral part of science:
- Pattern / Observation: rigorously describe
- Model: explanation or theory (maybe more than one)
- Hypothesis: prediction deduced from the model
- Null hypothesis (H0): generated as a falsification test
- Test: experiment
  - IF H0 is rejected: model supported
  - IF H0 is accepted (not rejected): model wrong

HYPOTHESIS TESTING
1. Collect data
2. Analyse data
3. Set up hypotheses:
   H0 = results are due to CHANCE alone
   H1 = results are significant and are not due to chance alone
4. Test hypotheses:
   - Determine the significance level for hypothesis testing (α), termed 'alpha'
   - Usually either α = 0.05 or α = 0.01
   - Calculate the probability value (p)
   - If p < α, reject H0 and accept H1 (i.e. results are significant and are NOT due to chance alone)
   - If p > α, do not reject H0 (i.e. results are not significant and can be attributed to chance alone)
The p-value is the measure of certainty that is compared with α: p < α is significant, p > α is not significant. Both are proportions. Expressed as percentages: with α = 0.05 you can say with 95% certainty that the pattern you have observed is not due to chance alone; with α = 0.01 you can say so with 99% certainty.


First, some important concepts about t-tests…

STATISTICS: t-DISTRIBUTION
Because it is based on the normal distribution, the t distribution has all the attributes of the normal distribution:
- Completely symmetrical
- Area under any part of the curve reflects the proportion of t values involved
- etc.
The shape of the t distribution varies with v (degrees of freedom: n − 1): the bigger the n, the less spread the distribution.
[Figure: frequency (%) against height (mm); t distribution curves for v = 100, v = 10, v = 5 and v = 1]
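The dependence of the curve's shape on v can be checked numerically. The sketch below is an illustration rather than part of the original slides: it evaluates the standard density of Student's t distribution (the helper names `t_pdf` and `normal_pdf` are assumptions made here) at the centre and at t = 3 for the four values of v in the figure, next to the normal curve.

```python
import math

def t_pdf(t, v):
    """Density of Student's t distribution with v degrees of freedom."""
    # Work with logarithms of the gamma function to stay stable for large v.
    log_norm = (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
                - 0.5 * math.log(v * math.pi))
    return math.exp(log_norm - (v + 1) / 2 * math.log(1 + t * t / v))

def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

for v in (1, 5, 10, 100):
    print(f"v = {v:3d}: density at t = 0 is {t_pdf(0, v):.4f}, "
          f"at t = 3 is {t_pdf(3, v):.4f}")
print(f"normal : density at z = 0 is {normal_pdf(0):.4f}, "
      f"at z = 3 is {normal_pdf(3):.4f}")
```

As v grows, the density in the tail (t = 3) shrinks and the peak rises toward the normal curve, which is the "less spread" behaviour described above.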

STATISTICS: t-DISTRIBUTION CONCEPTS
Tails of the t-distribution
- One-tailed hypothesis testing, α(1): e.g. H0: μ = 25, H1: μ < 25
- Two-tailed hypothesis testing, α(2): e.g. H0: μ = 25, H1: μ ≠ 25
Example: if our sample size is 11 (v = 10), what is the value of t beyond which 10% (0.1) of the curve is enclosed? There are two possible t-values, depending on whether the 0.1 lies in one tail or is split between the two tails.

STATISTICS: t-DISTRIBUTION: CONCEPTS
Critical values
The t-statistic, $t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$, is compared with the critical value of t for the chosen significance level (e.g. α = 0.05), in the same way that the p-value is compared with α. For a two-tailed test, α(2), there are two critical values, one in each tail.
If the t-statistic is greater than the upper critical value OR less than the lower critical value, reject H0 and accept H1 (i.e. results are significant and are NOT due to chance alone). Between the critical values the result is not significant.

STATISTICS: t-DISTRIBUTION: CONCEPTS
Critical values are found on the t-tables.
If our sample size is 11 (v = 10), what is the value of t beyond which 10% (0.1) of the curve is enclosed (i.e. what is the critical value of t)? Read along the row for v = 10:
- One-tailed, α(1) = 0.1: t = 1.372
- Two-tailed, α(2) = 0.1: t = 1.812
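Where no t-table is to hand, a critical value can be approximated numerically. The following is a minimal sketch using only the Python standard library, not the method used on the slides: it integrates the t density with Simpson's rule and finds the critical value by bisection. The function names and the search range of 0 to 100 are assumptions chosen for the α and v values used in these notes.

```python
import math

def t_pdf(t, v):
    """Density of Student's t distribution with v degrees of freedom."""
    log_norm = (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
                - 0.5 * math.log(v * math.pi))
    return math.exp(log_norm - (v + 1) / 2 * math.log(1 + t * t / v))

def upper_tail_area(t, v, steps=1000):
    """Area under the t curve to the right of t (t >= 0).

    Uses Simpson's rule on [0, t]; the curve is symmetrical, so the area
    to the right of 0 is exactly 0.5.
    """
    h = t / steps
    total = t_pdf(0, v) + t_pdf(t, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 - total * h / 3

def critical_t(alpha, v, tails):
    """t beyond which a proportion alpha lies in one tail (tails=1)
    or in both tails combined (tails=2)."""
    target = alpha / tails           # area wanted in a single tail
    lo, hi = 0.0, 100.0              # assumed search range
    for _ in range(60):              # bisection on the tail area
        mid = (lo + hi) / 2
        if upper_tail_area(mid, v) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"one-tailed, alpha(1) = 0.1, v = 10: {critical_t(0.1, 10, tails=1):.3f}")
print(f"two-tailed, alpha(2) = 0.1, v = 10: {critical_t(0.1, 10, tails=2):.3f}")
```

The two printed values can be compared with the v = 10 row of a t-table. The two-tailed value is larger because only half of the 0.1 sits in each tail.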

Steps of Student t-tests:
1. Establish hypotheses (determine if it is a one-tailed or two-tailed test)
   - One tail: H1 has > or < in it
   - Two tail: H1 has ≠ in it
2. Determine n, $\bar{x}$, μ, s and v (n − 1)
3. Calculate the t-statistic using $t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$
4. Determine the significance level for hypothesis testing (α), termed 'alpha'. Usually either α = 0.05 or α = 0.01 (the area in the tail for a one-tailed test, or in both tails combined for a two-tailed test)
5. Find the critical value of t: use the t-table, looking up the value for $t_{\alpha(1\ \text{or}\ 2),\,v}$
6. Compare the t-statistic with the critical value to know if you should accept or reject H0

STATISTICS: t-DISTRIBUTION: EXAMPLE
OBSERVATION MADE: The mean nitrate concentration of water in all the upstream tributaries of a large river prior to intensive agriculture is 22 mg.l⁻¹. Afterwards, the mean nitrate concentration in 25 of these tributaries is 24.23 mg.l⁻¹ and s = 4.24 mg.l⁻¹.
- Nitrate (before agriculture): μ = 22 mg.l⁻¹, n = ALL tributaries
- Nitrate (after agriculture): x̄ = 24.23 mg.l⁻¹, n = 25 sample tributaries
Based on this observation we want to determine if the intensification of agricultural practices has resulted in a significant change to the nitrate concentration of the freshwater resources.
HOW? We need to determine the probability that the sample (n = 25, x̄ = 24.23 mg.l⁻¹) could be randomly generated from a population with μ = 22 mg.l⁻¹.

STATISTICS: t-DISTRIBUTION: EXAMPLE
Student t-tests: steps for calculation
What is the probability that the sample (n = 25, x̄ = 24.23 mg.l⁻¹) could be randomly generated from a population with μ = 22 mg.l⁻¹?
1. Establish hypotheses: H0: μ = 22; H1: μ ≠ 22
2. Determine n, x̄, μ, s and v (n − 1): n = 25, x̄ = 24.23, μ = 22.00, s = 4.24, v = 24
3. Calculate the t-statistic:
   $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4.24}{\sqrt{25}} = \frac{4.24}{5} = 0.848$
   $t = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{24.23 - 22}{0.848} = 2.63$
4. Determine the significance level (α): either α = 0.05 or α = 0.01; here α = 0.05
5. Find the critical value of t in the t-table. One tail or two tail? Go to the hypotheses: H1: μ ≠ 22, so the test is two-tailed and the critical value is $t_{0.05(2),\,24} = 2.064$
6. Compare the t-statistic with the critical value: t = 2.63 > critical value = 2.064
SO... it is very unlikely that a random sample (size 25) would generate a mean of 24.23 mg.l⁻¹ from a population with a mean of 22 mg.l⁻¹. So unlikely, in fact, that we don't believe it can happen by chance: reject H0 and accept H1.

STATISTICS: t-DISTRIBUTION: EXAMPLES
- Nitrate (before agriculture): μ = 22 mg.l⁻¹, n = ALL tributaries
- Nitrate (after agriculture): x̄ = 24.23 mg.l⁻¹, n = 25 sample tributaries
What we can then say is that the before and after nitrate levels in the water are (statistically) significantly different from each other (p < 0.05).
We are not making any judgment about whether there is more nitrate in the water after than before, only that the concentrations are different ... though some things are self-evident!
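The nitrate example can be reproduced in a few lines. This is a sketch under stated assumptions: the function name `one_sample_t` is introduced here for illustration, and the critical value 2.064 is typed in from a t-table (two-tailed, α = 0.05, v = 24) rather than computed.

```python
import math

def one_sample_t(sample_mean, mu, s, n):
    """t-statistic for a sample mean tested against a population mean mu."""
    standard_error = s / math.sqrt(n)      # s_xbar = s / sqrt(n)
    return (sample_mean - mu) / standard_error

# Values from the nitrate example (mg per litre).
t_stat = one_sample_t(sample_mean=24.23, mu=22.0, s=4.24, n=25)

critical = 2.064   # t-table value for alpha(2) = 0.05, v = 24 (typed in, not computed)

# Two-tailed test: reject H0 if t falls beyond either critical value.
reject_h0 = abs(t_stat) > critical

print(f"t = {t_stat:.2f}, critical value = {critical}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Taking the absolute value of t covers both tails at once, which matches H1: μ ≠ 22.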

Now you try...
25 intertidal crabs were exposed to air at 24.3 °C, and their body temperatures were measured.
Q: Is the mean body temperature of this species of crab the same as the ambient air temperature of 24.3 °C?
Student-t steps to follow:
1. Establish hypotheses:
   H0: μ = 24.3 °C (i.e. crab body temp is NOT different from ambient temp)
   H1: μ ≠ 24.3 °C (i.e. crab body temp IS different from ambient temp)
2. Determine n, x̄, μ, s and v (n − 1): switch to Excel and do the calculations on the table of body temperature (°C) by crab ID
3. Calculate the t-statistic
4. Determine the significance level (α): α = 0.05
5. Find the critical value of t, $t_{0.05(2),\,v}$, in the t-table
6. Compare the t-statistic with the critical value
Result: t-statistic > critical value, so REJECT H0.


STATISTICS: 95% CONFIDENCE INTERVALS
When we express dispersion around some measure of central tendency, we normally use the standard deviation: x̄ ± s.
The t-distribution allows us to calculate the 95% (or 99%) confidence intervals around an estimate of the population mean. In other words: what are the limits around our estimate of the population mean WITHIN which we can be 95% (or 99%) confident that the REAL value of the population mean lies?
To do this, we need a set of t-tables (two-tailed, α(2)), v (n − 1) and $s_{\bar{x}}$.

STATISTICS: 95% CONFIDENCE INTERVALS
To do this, we need a set of t-tables, v (n − 1) and $s_{\bar{x}}$.
IF x̄ = 42.3 mm, n = 26 (v = 25) and $s_{\bar{x}}$ = 2.15 mm,
then the 95% confidence interval (CI) around the mean is calculated as:
$s_{\bar{x}} \times t_{0.05(2),\,25} = 2.15 \times 2.060 = 4.429$ mm
The confidence interval expression is then written as 42.3 mm ± 4.43 mm, i.e. we are 95% confident that μ lies between 37.87 mm and 46.73 mm.
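The same calculation as a short sketch. The function name `confidence_interval` is an assumption made for illustration, and the two-tailed t value (2.060 for α = 0.05, v = 25) is passed in from a t-table rather than computed.

```python
def confidence_interval(mean, standard_error, t_crit):
    """Lower and upper confidence limits around a sample mean.

    t_crit is the two-tailed critical t for the chosen confidence level
    and v = n - 1, read from a t-table.
    """
    half_width = standard_error * t_crit
    return mean - half_width, mean + half_width

# Values from the example: mean 42.3 mm, standard error 2.15 mm, v = 25.
lower, upper = confidence_interval(mean=42.3, standard_error=2.15, t_crit=2.060)
print(f"95% CI: {lower:.2f} mm to {upper:.2f} mm")
```

For a 99% interval only `t_crit` changes: the two-tailed table value for α = 0.01 and the same v would be used instead.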


Understanding stats...
Types of Data
The type of data collected influences their statistical analysis.
- Nominal data: gender, colour, species, genus, class, town, country, model etc. (e.g. male / female; blue / red / black / white)
- Continuous data: concentration, depth, height, weight, temperature, rate etc. (e.g. 100 g, 200 g)
- Discrete data: numbers per unit space, numbers per entity etc. (e.g. 5 people)

Understanding stats...
1. DATA type: nominal, continuous or discrete
2. Distribution: normal, binomial, Poisson etc.
3. Choice of statistical test: z-tests, t-tests, ANOVA etc.; chi-squared
Data do NOT have to be normally distributed.


Testing Patterns in Discrete (count) Data: the Chi-Square Test
Examples of count data:
- Number of petals per flower
- Number of segments per insect leg
- Number of worms per quadrat
- Number of white cars on campus ... etc.
You can convert continuous data to discrete data by assigning data to data classes (e.g. a frequency histogram of height (m) classes).
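Assigning continuous measurements to data classes can be sketched as follows. The heights and class boundaries below are hypothetical values made up for illustration (the slide's own histogram data are not in the transcript), and `class_frequencies` is a helper name introduced here.

```python
from bisect import bisect_right

def class_frequencies(values, edges):
    """Count how many values fall in each class [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        i = bisect_right(edges, value) - 1   # index of the class containing value
        if 0 <= i < len(counts):             # ignore values outside all classes
            counts[i] += 1
    return counts

# Hypothetical heights in metres and hypothetical class boundaries.
heights_m = [1.52, 1.58, 1.61, 1.64, 1.66, 1.71, 1.73, 1.79, 1.84]
edges = [1.5, 1.6, 1.7, 1.8, 1.9]

frequencies = class_frequencies(heights_m, edges)
for low, high, count in zip(edges, edges[1:], frequencies):
    print(f"{low:.1f} to {high:.1f} m: {count}")
```

The resulting frequencies are counts, so they are in the form that a chi-square test requires.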

STATISTICS: CHI-SQUARED TESTS
Often we want to determine if the population from which we have obtained count data conforms to a certain prediction.
A geneticist raises a progeny of 134 flowers from a cross:
- Hypothesised (EXPECTED) ratio: 3 : 1, OR ¾ : ¼, OR 0.75 : 0.25
- Observed numbers (n = 134): 113 yellow, 21 green
- OBSERVED ratio: 113 : 21, OR 5.4 : 1
- Expected numbers: 134 × 0.75 = 100.5 yellow, 134 × 0.25 = 33.5 green
Q: Does the OBSERVED ratio differ (SIGNIFICANTLY) from the EXPECTED ratio?
Goodness of fit:
$\chi^2 = \sum \left[ \frac{(O - E)^2}{E} \right]$ (Equation 4)
where O = Observed and E = Expected. The bigger the difference between O and E, the greater the χ². When there is no difference, χ² will be ZERO.

STATISTICS: CHI-SQUARED TESTS
Steps of χ² tests:
1. Establish hypotheses
2. Determine Observed and Expected frequencies
3. Calculate the χ²-statistic using $\chi^2 = \sum \left[ \frac{(O - E)^2}{E} \right]$
4. Determine the significance level for hypothesis testing (α = 0.05 or α = 0.01)
5. Find the critical value of χ² in the χ² table: $\chi^2_{\alpha,\,v}$, where v = number of categories (K) − 1
6. Compare the χ²-statistic with the critical value
7. If χ²-statistic > critical value, reject H0 (significant differences between O and E)
8. If χ²-statistic < critical value, do not reject H0 (no significant differences between O and E)
NB: must always use counts (frequencies), NOT percentages or proportions.

STATISTICS: CHI-SQUARED TESTS
Q: Does the OBSERVED ratio (113:21) differ (SIGNIFICANTLY) from the EXPECTED ratio (100.5:33.5)?
1. Establish hypotheses:
   H0: Observed and expected ratios are not significantly different
   H1: Observed and expected ratios are significantly different
2. Determine Observed and Expected frequencies:
   Yellow flowers: Observed = 113; Expected = 100.5
   Green flowers: Observed = 21; Expected = 33.5
3. Calculate the χ²-statistic:
   $\chi^2 = \frac{(113 - 100.5)^2}{100.5} + \frac{(21 - 33.5)^2}{33.5} = 1.555 + 4.664 = 6.22$
4. Determine the significance level for hypothesis testing (α = 0.05 or α = 0.01): here α = 0.05
5. Find the critical value of χ²: degrees of freedom v = K − 1, where K = number of categories. In this case there are two categories (yellow-flowering and green-flowering), so v = 2 − 1 = 1 and the critical value is $\chi^2_{0.05,\,1} = 3.841$
6. χ²-statistic (6.22) > critical value (3.841), therefore reject H0
A: The observed ratio is significantly different from the expected ratio.
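The flower example can be checked with a short script. This is a minimal sketch using only the standard library; the function name is an assumption made here. The p-value line relies on the fact that, for one degree of freedom only, the upper-tail area of the χ² distribution equals `erfc(sqrt(x / 2))`. The slides use a table instead.

```python
import math

def chi_square_statistic(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [113, 21]        # yellow, green (counts, not percentages)
ratio = [3, 1]              # hypothesised 3 : 1 ratio

n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]

chi2 = chi_square_statistic(observed, expected)

critical = 3.841            # chi-square table value for alpha = 0.05, v = 1
p_value = math.erfc(math.sqrt(chi2 / 2))   # valid for v = 1 only

print(f"expected = {expected}")
print(f"chi-square = {chi2:.2f}, critical value = {critical}")
print(f"p = {p_value:.4f}")
print("Reject H0" if chi2 > critical else "Do not reject H0")
```

Comparing χ² with the critical value and comparing p with α lead to the same decision.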

STATISTICS: CHI-SQUARED TESTS
Now you try...
A plant geneticist has done some crossing between plants and come up with the following numbers of different seeds (observed counts of YS, YW, GS and GW seeds).
Q: Has the geneticist sampled from a population having a ratio of 9:3:3:1?
1. Establish hypotheses:
   H0: Population sampled has YS:YW:GS:GW seeds in the ratio 9:3:3:1
   H1: Population sampled does not have YS:YW:GS:GW seeds in the ratio 9:3:3:1
2. Determine Observed and Expected frequencies (switch to Excel)
3. Calculate the χ²-statistic using $\chi^2 = \sum \left[ \frac{(O - E)^2}{E} \right]$: χ² = 8.97
4. Determine the significance level for hypothesis testing (α = 0.05 or α = 0.01): α = 0.05
5. Find the critical value of χ² in the χ² table: with four categories, v = 3, so the critical value is $\chi^2_{0.05,\,3} = 7.815$
6. Compare the χ²-statistic with the critical value
7. χ²-statistic (8.97) > critical value (7.815): reject the null hypothesis that the sample was drawn from a population showing a 9:3:3:1 ratio of YS:YW:GS:GW
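The decision for the seed example can be cross-checked by turning the reported statistic into a p-value. The sketch below is an addition rather than the slides' method. It uses the closed-form upper-tail area of the χ² distribution that holds for three degrees of freedom only, and the function name is an assumption. The observed seed counts are not in the transcript, so the reported χ² = 8.97 is used directly.

```python
import math

def chi2_upper_tail_df3(x):
    """Area to the right of x under the chi-square curve with v = 3.

    Closed form valid for three degrees of freedom only:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    """
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

chi2 = 8.97      # statistic reported for the 9:3:3:1 example
alpha = 0.05

p_value = chi2_upper_tail_df3(chi2)
print(f"p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")

# Sanity check: the table critical value for alpha = 0.05, v = 3
# should leave an upper-tail area of about 0.05.
print(f"area beyond 7.815: {chi2_upper_tail_df3(7.815):.4f}")
```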

STATISTICS: CHI-SQUARED TESTS ... final word ...
IF expected counts are LESS than ONE, then you must combine the categories.
NB: By combining data you reduce the value of K and also v.


Which stats test to use?
DATA:
- Continuous: looking for probabilities: Z-TESTS; comparing two means: T-TESTS
- Discrete: chi-squared
Use "Getting started with data.xls" for further advice.