Statistics & Biology Shelly’s Super Happy Fun Times February 7, 2012 Will Herrick.


A Statistician’s ‘Scientific Method’
1. Define your problem/question
2. Design an experiment to answer the question
   i. Collect the correct data
   ii. Choose an unbiased sample that is large enough to approximate the population
   iii. Quantify random variation with biological and technical replication
3. Perform experiments
4. Conduct hypothesis testing
5. Display the data/results
   i. Balance clutter vs. information

Important Terms
Categorical vs Quantitative Variables/Data
Random Variable
Mean
Median
Percentiles
Variance
Standard Deviation
Range
Interquartile Range: $\mathrm{IQR} = Q_3 - Q_1$
Outliers: any value $x$ with $x < Q_1 - 1.5 \times \mathrm{IQR}$ or $x > Q_3 + 1.5 \times \mathrm{IQR}$
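As a rough illustration of the terms above, the sketch below computes them with Python's standard `statistics` module. The data values are made up, the function name `summarize` is hypothetical, and two choices are assumptions not stated on the slide: the sample (n - 1) variance and the "inclusive" quartile method. Other quartile conventions can give slightly different fences.

```python
import statistics

def summarize(data):
    """Return descriptive statistics and the values outside the 1.5 x IQR fences."""
    # Quartiles; the 'inclusive' method is one of several common conventions.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(data),
        "median": q2,
        "variance": statistics.variance(data),  # sample variance (divides by n - 1)
        "sd": statistics.stdev(data),
        "range": max(data) - min(data),
        "iqr": iqr,
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

values = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # made-up data with one extreme value
summary = summarize(values)
print(summary)
```

Note how the single extreme value pulls the mean well above the median, which is why the median and IQR are preferred when outliers are possible.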

Normal Distribution
Frequently arises in nature
Does not always apply to a set of data
But many statistical methods require the data to be normally distributed!
μ = Mean, σ = Standard Deviation
Probability of a random variable falling between $x_1$ and $x_2$ = the area under the curve from $x_1$ to $x_2$
“Tail” Probabilities = Probability from –∞ to x or from x to +∞

Assessing Normality: Q-Q Plots
Many statistical tools require normally distributed data.
How to assess normality of your data?
‘Quantile’ or Q-Q plot: Quantiles of data vs quantiles of normal distribution with same mean and SD as data
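The coordinates of a Q-Q plot can be sketched without any plotting library. The example below pairs each sorted observation with the quantile of a normal distribution that has the same mean and SD as the data. The plotting position `(i + 0.5) / n` is an assumed convention (several exist), and the sample values and the name `qq_points` are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Pair each sorted observation with the matching normal quantile."""
    n = len(data)
    reference = NormalDist(mean(data), stdev(data))
    # (i + 0.5) / n is one common plotting position; it avoids probabilities 0 and 1.
    theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(data)))

sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 5.1]  # made-up measurements
points = qq_points(sample)
for theoretical_q, observed_q in points:
    print(f"{theoretical_q:6.2f}  {observed_q:6.2f}")
```

If the data are roughly normal, the pairs fall close to the line y = x; systematic curvature suggests skew or heavy tails.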

The Central Limit Theorem
Population vs Sample
– Sample mean and standard deviation are random variables!
Central Limit Theorem for Sample Proportions:
– p% of a population has a certain characteristic – NOT a random variable
– From a sample of size n, $\hat{p}$% of the sample has the characteristic – this IS a random variable
– As n gets large, $\hat{p}$ is approximately normally distributed with $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$ (with p expressed as a proportion)
Central Limit Theorem for Sample Means:
– A characteristic is distributed in a population with mean μ and standard deviation σ – but not necessarily normally
– A sample of size n is randomly chosen and the characteristic measured on each individual
– The average of the characteristic, $\bar{x}$, is a random variable!
– If n is sufficiently large, $\bar{x}$ is approximately normally distributed, with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
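A small simulation can make the sample-means version of the theorem concrete. The sketch below assumes an exponential population with μ = σ = 1 (chosen only because it is clearly non-normal), draws many samples of size n, and compares the spread of the sample means with the predicted σ/sqrt(n). The seed, sample size, and trial count are arbitrary.

```python
import random
import statistics

def sample_means(n, trials, rng):
    """Return the means of `trials` samples of size n from an exponential(1) population."""
    means = []
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 50                   # size of each sample
means = sample_means(n, 4000, rng)

observed_mean = statistics.fmean(means)
observed_sd = statistics.stdev(means)
predicted_sd = 1.0 / n ** 0.5  # sigma / sqrt(n) with sigma = 1

print(f"mean of sample means: {observed_mean:.3f} (population mean is 1)")
print(f"SD of sample means:   {observed_sd:.3f} (predicted {predicted_sd:.3f})")
```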

Error Bars: Standard Deviation vs Standard Error
Standard Deviation: The variation of a characteristic within a population.
– Independent of n!
– More informative
Standard Error: AKA the ‘standard deviation of the mean,’ this is how the sample mean varies with different samples.
– Remember sample means are random variables subject to experimental error
– It equals SD/sqrt(n)

Error Bars: Confidence Intervals
“95% Confidence Interval:” the range of values within which the population mean lies with 95% confidence: $\bar{x} \pm 1.96\, s/\sqrt{n}$, where $\bar{x}$ is the sample mean and $s$ the sample standard deviation.
This is the 95% confidence interval for large n (> 40).
For smaller n or different %, the equation is modified slightly.
Versions for population proportions exist too.
When to Use:
– Standard Deviation: When n is very large and/or you wish to emphasize the spread within the population.
– Standard Error: When comparing means between populations and you have moderate n.
– Confidence Intervals: When comparing between populations; frequently used in medicine for ease of interpretation.
– Range: Almost never.
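The three kinds of error bar can be computed side by side. This is a minimal sketch assuming the large-sample normal multiplier 1.96 (for small n a t critical value would replace it); the data are simulated and the function name `error_bars` is hypothetical.

```python
import math
import random
import statistics

def error_bars(data):
    """Return the mean with its SD, standard error, and large-sample 95% CI."""
    n = len(data)
    m = statistics.mean(data)
    sd = statistics.stdev(data)      # spread within the sample
    se = sd / math.sqrt(n)           # spread of the sample mean
    half_width = 1.96 * se           # normal approximation, reasonable for n > 40
    return {"n": n, "mean": m, "sd": sd, "se": se,
            "ci95": (m - half_width, m + half_width)}

rng = random.Random(1)
measurements = [rng.gauss(50.0, 5.0) for _ in range(60)]  # simulated, not real data
bars = error_bars(measurements)
print(bars)
```

The SD stays near the population value however many measurements are taken, while the SE and the confidence interval shrink as n grows.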

Design of Experiments: Statistical Models
Mathematical models are deterministic, but statistical models are random.
Given a set of data, fit it to a model so that dependent variables can be predicted from independent variables.
– But never exactly!
Ex: Suppose it’s known that x (independent) and y (dependent) have a linear relationship: $y = \beta_0 + \beta_1 x + \varepsilon$
Here, the β’s are parameters and ε is an error term of known distribution.
Find the parameters → make predictions
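For the linear example, "find the parameters, then make predictions" can be sketched with ordinary least squares, which is one standard way to estimate the β's (the slide does not name a fitting method, so this is an assumption). The data points and function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Estimate intercept b0 and slope b1 of y = b0 + b1 * x by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def predict(b0, b1, x):
    """Predicted mean response at x; the error term means real data will scatter around it."""
    return b0 + b1 * x

xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]  # made-up observations
b0, b1 = fit_line(xs, ys)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}, prediction at x = 5: {predict(b0, b1, 5):.3f}")
```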

Design of Experiments: Choosing Statistical Models
Quantitative vs Quantitative: Regression Model (curve fitting)
Categorical (dependent) vs Quantitative (independent): Logistic Regression, Multivariate Logistic Regression
Quantitative (dependent) vs Categorical (independent): ANOVA Model
Categorical vs Categorical: Contingency Tables

Design of Experiments: Sampling Problems
Bias: Systematic over- or under-representation of a particular characteristic.
Accuracy: A measure of bias. Unbiased samples are more accurate.
Precision: A measure of variability in the measurements.
Adjust sampling techniques to solve accuracy problems.
Increase the sample size to improve precision.

Hypothesis Testing
Null Hypothesis, $H_0$:
– A claim about the population parameter being measured
– Formulated as an equality
– The less exciting outcome, i.e. “No difference between groups”
Alternative Hypothesis, $H_a$:
– The opposite of the null hypothesis
– What the scientist typically expects to be true
– Formulated as a <, >, or ≠ relation

Hypothesis Testing: Example
Example: Comparing HASMC proliferation on collagen I and collagen III.
The null hypothesis: the proliferation on both collagens is the same.
The alternative hypothesis: the proliferation on collagens I and III is not the same.
$H_0: \mu_{\text{collagen I}} = \mu_{\text{collagen III}}$
$H_a: \mu_{\text{collagen I}} \neq \mu_{\text{collagen III}}$

5 Steps to Hypothesis Testing
1. Pick a significance level, α
2. Formulate the null and alternative hypotheses
3. Choose an appropriate test statistic
   A test statistic is a function computed from the data that fits a known distribution when the null hypothesis is true.
4. Compute a p-value for the test and compare with α
5. Formulate a conclusion

First… what is a p-value?
A p-value is the probability, assuming the null hypothesis is true, of observing data at least as extreme as the data actually observed.
If p = 0.05, data this extreme would arise by random chance in 5% of experiments in which the null hypothesis is true. It is not the probability that the null hypothesis is true, and 1 – p is not the probability that the effect is real.

Test Decision | $H_0$ True | $H_0$ False
Fail to reject $H_0$ | Correct decision | ERROR (Type II)
Reject $H_0$ | ERROR (Type I) | Correct decision
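One way to see what a p-value measures is a permutation test, a technique the slides do not cover and which is used here only as an illustration. If the null hypothesis of no difference holds, group labels are interchangeable, so shuffling them shows how often a difference in means at least as large as the observed one arises by chance alone. The cell counts below are invented, and the function name and defaults are hypothetical.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided p-value for a difference in means, estimated by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel the observations at random
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Invented proliferation counts for two substrates
collagen_i = [10, 11, 12, 13, 14, 15]
collagen_iii = [20, 21, 22, 23, 24, 25]
p = permutation_p_value(collagen_i, collagen_iii)
alpha = 0.05
print(f"p = {p:.4f}; reject H0 at alpha = {alpha}: {p < alpha}")
```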

Hypothesis Tests for Normally Distributed Data
t-tests:
– 1 sample t-test: Compare a single population mean to a fixed constant.
– 2 sample t-test: Compare 2 independent population means.
– Paired t-test: Compare 2 dependent population means.
z-tests: Like t-tests, except for population proportions instead of means.
F-tests: Decide whether the means of k populations are all equal.
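The test statistic for a 2 sample t-test can be sketched as follows. This uses Welch's unequal-variance form, which is an assumption since the slide does not say which variant is intended. The standard library has no t distribution, so the sketch stops at the statistic and its degrees of freedom; a p-value would come from a t table or a statistics package. The data are made up.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error contributed by each group
    va = statistics.variance(group_a) / n_a
    vb = statistics.variance(group_b) / n_b
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df

t_stat, dof = welch_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])  # made-up data
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```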

Non-Parametric Tests for Non-Normally Distributed Data
Wilcoxon-Mann-Whitney Rank Sum Test: Comparable to the 2-sample t-test.
Non-parametric tests are more versatile, but less powerful.
Still have assumptions to satisfy!
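The rank sum idea can be sketched by computing the Mann-Whitney U statistics directly. The example replaces values with their ranks in the pooled data (ties get the average rank) and derives U for each group; converting U to a p-value needs tables or a normal approximation and is left out. Function names and data are hypothetical.

```python
def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) computed from the rank sum of group_a in the pooled data."""
    ranks = rank_with_ties(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b

u_a, u_b = mann_whitney_u([1, 2, 3], [4, 5, 6])  # made-up, fully separated groups
print(u_a, u_b)
```

A U near 0 for one group means its values sit almost entirely below the other group's, regardless of how the values are distributed.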

Displaying Data
Bar chart: Categorical vs Quantitative, Small # of Sample Types
Pie chart: Bar chart alternative when dealing with population proportions.
Histogram: Observation frequency, use with large # of observations
Dot plot: Like a histogram with fewer observations
Scatter: Quantitative vs quantitative
Box plot: Quantitative vs categorical. Describes the data with median, range, 1st and 3rd quartiles for easy comparison between many groups.

Data Characteristic | Statistical Measure | When to Use
Center/“Typical” value | Mean | No outliers, large sample
Center/“Typical” value | Median | Possible outliers
Variability | Standard deviation | No outliers, large sample
Variability | IQR | Possible outliers
Variability | Range | Almost never

Correlation vs Causation Correlation describes the relationship between 2 random variables. Correlation coefficient:
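The formula for the correlation coefficient did not survive in this transcript. Assuming the slide refers to the usual Pearson coefficient r, it can be computed as in the sketch below; the function name and data are made up. r ranges from -1 to 1 and measures only the strength of a linear association, so a large r does not by itself show that one variable causes the other.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])     # perfectly increasing line
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # perfectly decreasing line
print(r_up, r_down)
```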

Biological vs Technical Replicates
All the cells in 1 flask are considered 1 biological source.
Therefore, replicate wells of cells seeded for an experiment are technical replicates. They only measure variability due to experimental error!
To increase n, the number of samples, we must repeat experiments with different flasks of cells!
It is not appropriate to use error bars if you have not repeated the experiment with biological replicates.
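In practice this means averaging the technical replicates within each flask first and then treating each flask as a single observation. The sketch below assumes a simple layout, a dictionary mapping each flask to its well readings; the readings and names are invented.

```python
import math
import statistics

# Invented readings: each key is a flask (biological replicate),
# each list holds the wells seeded from that flask (technical replicates).
readings = {
    "flask_1": [0.82, 0.85, 0.80],
    "flask_2": [0.95, 0.97, 0.93],
    "flask_3": [0.88, 0.90, 0.86],
}

def summarize_biological(readings):
    """Collapse wells to one mean per flask, then summarize across flasks."""
    flask_means = [statistics.fmean(wells) for wells in readings.values()]
    n = len(flask_means)                  # n counts flasks, not wells
    sd = statistics.stdev(flask_means)    # needs at least 2 flasks
    return {"n": n, "mean": statistics.fmean(flask_means),
            "sd": sd, "se": sd / math.sqrt(n)}

result = summarize_biological(readings)
print(result)
```

Here n is 3, not 9: pooling all nine wells as if they were independent would understate the error bars.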

Binomial Distribution
n independent trials
p probability of success of each trial
(1 – p) probability of failure
What is the probability that there will be k successes in n independent trials?
$P(X = k) = \binom{n}{k}\, p^k (1 - p)^{n - k}$, where $\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$
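The binomial probability is short enough to compute directly. A minimal sketch using `math.comb` from the standard library; the function name `binomial_pmf` and the example numbers are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Example: probability of exactly 2 successes in 4 trials when p = 0.5
prob = binomial_pmf(2, 4, 0.5)
# Probabilities over all possible k should add up to 1
total = sum(binomial_pmf(k, 4, 0.5) for k in range(5))
print(prob, total)
```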