Random variable Distribution

200 trials where I flipped the coin 50 times and counted the number of heads (no_of_heads) in each trial – shown as a histogram of absolute frequencies.

…or I can describe the frequencies in percentages (histogram of no_of_ones, percentage scale)

…or as a cumulative histogram (of no_of_ones)

…which can also be on a percentage scale.
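Purely as an illustration, here is a minimal Python sketch of the experiment behind these histograms; the seed and variable names are mine, the slides only specify 200 trials of 50 flips:

```python
import random
from collections import Counter

random.seed(1)                      # any seed; for reproducibility only
trials, flips = 200, 50

# no_of_heads in each trial: sum of 50 fair Bernoulli draws
counts = [sum(random.random() < 0.5 for _ in range(flips))
          for _ in range(trials)]

freq = Counter(counts)              # absolute frequencies
cum = 0
for k in sorted(freq):
    cum += freq[k]
    # absolute frequency, percentage, and cumulative percentage
    print(f"{k:2d} heads: {freq[k]:3d}  "
          f"{100 * freq[k] / trials:5.1f} %  "
          f"cum {100 * cum / trials:5.1f} %")
```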

When my population is infinite, the number of elements is infinite too – but I can still characterize it by the proportion of all observations falling in any interval (i.e. the probability that a randomly chosen value lies in that interval). For a discrete variable: an enumeration of all values and their probabilities pᵢ = P(X = xᵢ) – as a table or a formula.

A continuous variable is characterized by its distribution function and its probability density.

The distribution function F(x) = P(X < x) has these basic properties:
1. P(a ≤ X < b) = F(b) − F(a);
2. F(x₁) ≤ F(x₂) for x₁ < x₂.
It is actually an idealized cumulative histogram with infinitely narrow columns.

How to “idealize” a normal histogram: if my columns are infinitely narrow, there will be “nothing” in them – therefore the percentage of observations in an interval is divided by the “width” of the column. In the limit I get the probability density

f(x) = lim (Δx→0) P(x ≤ X < x + Δx) / Δx = F′(x)

For a probability density it holds that f(x) ≥ 0 and that the total area under it is one, ∫ f(x) dx = 1.

From the distribution, the mean and the variance can be computed:
Discrete variable: μ = Σ xᵢ pᵢ, σ² = Σ (xᵢ − μ)² pᵢ
Continuous variable: μ = ∫ x f(x) dx, σ² = ∫ (x − μ)² f(x) dx
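As a sketch of the discrete-variable formulas – the fair-die distribution below is my own example, not from the slides:

```python
# discrete distribution: values x_i and probabilities p_i (a fair die here)
xs = [1, 2, 3, 4, 5, 6]
ps = [1 / 6] * 6

mu = sum(x * p for x, p in zip(xs, ps))                # mean
var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))   # variance
print(mu, var)                                         # 3.5 2.9166...
```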

Quantile (quartile): when the area under the probability density to the left of a value is 0.75, i.e. 75 %, that value is the 75% quantile of the distribution (the upper quartile).
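A small sketch of the same idea in code, assuming SciPy is available and using the standard normal distribution as an example:

```python
from scipy.stats import norm

q75 = norm.ppf(0.75)    # inverse distribution function: F(q75) = 0.75
print(q75)              # about 0.674 - the upper quartile of N(0, 1)
print(norm.cdf(q75))    # 0.75, closing the loop
```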

Testing of hypotheses + the χ² test

I cannot prove any hypothesis. That's why I formulate a null hypothesis (H₀), and by rejecting it I demonstrate its opposite. The alternative hypothesis, H₁ or H_A, is the negation of the null hypothesis. I, the biologist, am the one who formulates the null hypothesis – so the null hypothesis should be constructed in such a way that it is interesting if it is rejected.

Errors in decisions. When the data are random (and in biology they practically always are), I have to take into account that I can make a wrong decision – statistics recognizes the Type I error and the Type II error, which are an unavoidable part of our decisions. In addition we can make an error by a mistake in computation, but that one isn't necessary :-).

Recipe for testing a hypothesis:
1. I formulate the null hypothesis.
2. I choose the level of significance and thus obtain the critical value (from tables).
3. I compute the test criterion from my data.
4. When the value of the test criterion is higher than the critical value, I reject the null hypothesis.

χ² test (test of goodness of fit). Example – I hybridize peas, crossing the F₁ generation and scoring phenotypes in F₂. I have 80 offspring – I expect 60:20 (the 3:1 ratio), but I observe 70:10. Is it just random variability, or do Mendel's ratios not hold in this case?

1. Rejecting the null hypothesis about the 3:1 ratio is interesting from the biological point of view. I could just as well test a null hypothesis about a 4.2371:1 ratio statistically, but rejecting it wouldn't bring us any biologically interesting information. 2. The null hypothesis, stated formally: the probability of the dominant phenotype's manifestation is 0.75 (in an infinitely large population of potential offspring the ratio of phenotypes is 3:1).

Calculation: χ² = Σ (f_observed − f_expected)²/f_expected = (70 − 60)²/60 + (10 − 20)²/20 ≈ 1.67 + 5.00 = 6.67, where f is an absolute frequency, i.e. a count of random independent observations. DF = 1 (number of categories − 1, for an a priori given hypothesis), critical value = 3.84. The value of the test criterion > the critical value, so I reject the null hypothesis – I say that the ratio in F₂ is statistically significantly different from the expected 3:1 at α = 0.05 – or I write (χ² = 6.67, df = 1, P < 0.05).
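The same calculation in Python: the hand-rolled sum is exactly the formula above, and the SciPy call (if the library is available) gives the p-value as well:

```python
observed = [70, 10]     # dominant : recessive phenotypes in F2
expected = [60, 20]     # the 3:1 ratio out of 80 offspring

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)             # 6.67 > 3.84, so reject H0 at alpha = 0.05 (df = 1)

from scipy.stats import chisquare
stat, p = chisquare(f_obs=observed, f_exp=expected)
print(stat, p)          # same statistic, p ~ 0.01 < 0.05
```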

What can happen – flipping the coin. Reality – the coin is OK, i.e. P₀ = P₁ = 0.5 (BUT WE DON'T KNOW THIS). 100 flips, I get 55:45. Then χ² = (55 − 50)²/50 + (45 − 50)²/50 = 1.0 < 3.84. I cannot reject the null hypothesis. Right decision.

What can happen – flipping the coin. Reality – the coin is OK, i.e. P₀ = P₁ = 0.5 (BUT WE DON'T KNOW THIS). 100 flips, I get 60:40. Then χ² = (60 − 50)²/50 + (40 − 50)²/50 = 4.0 > 3.84. I reject the null hypothesis at the 5% level of significance. I have made a Type I error (and I have hanged an innocent). We know the probability of this error: it is α. The level of significance α is exactly the probability of rejecting the null hypothesis given that it is true.

What can happen – flipping the coin. Reality – the coin is biased, i.e. P₀ ≠ 0.5 ≠ P₁ (BUT WE DON'T KNOW THIS). 100 flips, I get 60:40. Then χ² = (60 − 50)²/50 + (40 − 50)²/50 = 4.0 > 3.84. I reject the null hypothesis at the 5% level of significance. Right decision (and I hang a blackguard).

What can happen – flipping the coin. Reality – the coin is biased, i.e. P₀ ≠ 0.5 ≠ P₁ (BUT WE DON'T KNOW THIS). 100 flips, I get 55:45. Then χ² = (55 − 50)²/50 + (45 − 50)²/50 = 1.0 < 3.84. I cannot reject the null hypothesis (and the blackguard goes free). I have committed a Type II error. Its probability is denoted β and it is usually unknown. 1 − β is the power of the test. Generally, the power of the test grows with the deviation from the null hypothesis and with the number of observations. As we don't know β, the correct formulation of our outcome is: "Based on our data we cannot reject the null hypothesis." The formulation "We have proved the null hypothesis" is wrong!

Decision table: for a given number of observations, the better we are protected against one type of error, the more the outcome is predisposed to the other one.

                 H₀ is true           H₀ is false
Not rejected     right decision       Type II error (β)
Rejected         Type I error (α)     right decision

If I decide to test at the 1% level of significance, the critical value is then 6.63.

What can happen – flipping the coin. Reality – the coin is OK, i.e. P₀ = P₁ = 0.5 (BUT WE DON'T KNOW THIS). 100 flips, I get 60:40. Then χ² = (60 − 50)²/50 + (40 − 50)²/50 = 4.0 < 6.63. I don't reject the null hypothesis at the 1% level of significance. – OK, I didn't hang an innocent.

What can happen – flipping the coin. Reality – the coin is biased, i.e. P₀ ≠ 0.5 ≠ P₁ (BUT WE DON'T KNOW THIS). 100 flips, I get 60:40. Then χ² = (60 − 50)²/50 + (40 − 50)²/50 = 4.0 < 6.63. I don't reject the null hypothesis at the 1% level of significance. Type II error (the blackguard goes free).

After 20 flips of the coin

Power of the test. Reality – the coin is biased, i.e. P₀ = 0.55; P₁ = 0.45 (BUT WE DON'T KNOW THIS) – and the outcomes go exactly according to these probabilities.
100 flips, I get 55:45. Then χ² = (55 − 50)²/50 + (45 − 50)²/50 = 1.0 < 3.84. I don't reject – Type II error.
1000 flips, I get 550:450. Then χ² = (550 − 500)²/500 + (450 − 500)²/500 = 10.0 > 3.84. I reject, and that is correct.
Reality – the coin is biased, i.e. P₀ = 0.51; P₁ = 0.49.
100 flips, I get 51:49. Then χ² = (51 − 50)²/50 + (49 − 50)²/50 = 0.04 < 3.84. I don't reject – Type II error.
1000 flips, I get 510:490. Then χ² = (510 − 500)²/500 + (490 − 500)²/500 = 0.4 < 3.84. I don't reject – Type II error.
10 000 flips, I get 5100:4900. Then χ² = (5100 − 5000)²/5000 + (4900 − 5000)²/5000 = 4.0 > 3.84. I reject, and that is correct.
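A Monte Carlo sketch of these power calculations; the simulation parameters mirror the slide's examples, while the helper names are mine:

```python
import random

def chi2_stat(heads, n):
    """Chi-square statistic for H0: P(heads) = 0.5."""
    exp = n / 2
    return (heads - exp) ** 2 / exp + (n - heads - exp) ** 2 / exp

def power(p_heads, n, reps=1000, crit=3.84):  # crit: 5% critical value, df = 1
    """Fraction of simulated experiments in which H0 is rejected."""
    rejected = 0
    for _ in range(reps):
        heads = sum(random.random() < p_heads for _ in range(n))
        if chi2_stat(heads, n) > crit:
            rejected += 1
    return rejected / reps

random.seed(0)
for p, n in [(0.55, 100), (0.55, 1000), (0.51, 1000), (0.51, 10000)]:
    print(f"P(heads) = {p}, {n} flips: power ~ {power(p, n):.2f}")
```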

The power of a test grows:
- with the number of independent observations,
- with the magnitude of the deviation from the null hypothesis,
- with lowering the protection against the Type I error.

Percentage of heads in a sample sufficient to reject the null hypothesis P₀ = P₁ = 0.5 by the χ² test, as a function of the total number of observations (the figure distinguishes the regions P > 0.05, 0.01 < P < 0.05, and P < 0.01).

Examples of use: phenotype ratios 3:1 or 9:3:3:1 (number of degrees of freedom = number of categories − 1 for an a priori hypothesis, i.e. DF = 3 for the 9:3:3:1 ratio).

Examples of use: sex ratio 1:1. Assumptions! Random sampling! The same probability for every observation. In practice, rejection of the null hypothesis can be a sign of three facts:
1. The null hypothesis is wrong.
2. The null hypothesis is right, but the decision is a consequence of a Type I error.
3. The null hypothesis is right, but the assumptions of the test were violated.

Examples of use: bees' orientation according to disk colour, H₀: 1:1:1. How to ensure independence? A sufficiently large sample.

Examples of use: Hardy–Weinberg equilibrium, p² + 2pq + q² = 1. Attention – we subtract one more degree of freedom for a parameter that we estimate from the data, so DF = 3 − 1 − 1 = 1.
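A sketch of this test in Python; the genotype counts are made up for illustration, but the bookkeeping (estimating p from the data and losing one extra degree of freedom) follows the slide:

```python
observed = {"AA": 50, "Aa": 40, "aa": 10}   # hypothetical sample, n = 100
n = sum(observed.values())

# estimate the allele frequency p = freq(A) from the data itself
p = (2 * observed["AA"] + observed["Aa"]) / (2 * n)
q = 1 - p

expected = {"AA": p ** 2 * n, "Aa": 2 * p * q * n, "aa": q ** 2 * n}
chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

# DF = 3 categories - 1 - 1 estimated parameter = 1; critical value 3.84
print(p, chi2)
```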

What are critical values? The higher the deviation from the null hypothesis, the higher the chi-square.

What are critical values? When the area of the upper tail is 5%, then 11.1 is the critical value at the 5% level of significance (here DF = 5).

Nowadays the opposite procedure is used more often: we have computed, say, χ² = 14, and the area of the “tail” beyond it is the “probability” P. P is the probability that a result deviating this much or more from the null hypothesis would arise just by chance, if H₀ is right.
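With SciPy this opposite procedure is one call to the χ² survival function (the tail area beyond the computed statistic):

```python
from scipy.stats import chi2

print(chi2.sf(14, df=5))      # tail area beyond chi^2 = 14 with DF = 5
print(chi2.sf(6.67, df=1))    # the pea example: about 0.0098, i.e. P < 0.05
```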

We usually write: the result is significant at α = 0.05, or (χ² = 6.67, df = 1, P < 0.05).

And what about when the χ² value is near zero, with P > 0.99? Can we take that as evidence of the truth of H₀?

TOO GOOD TO BE TRUE

The χ² distribution is derived purely theoretically, but I simulated these values by flipping the coin. Problem – the chi-square distribution is continuous, whereas frequencies are by definition discrete.

That's why Yates' correction (for continuity) is sometimes used: each |f_observed − f_expected| is reduced by 0.5 before squaring. But the test is then too conservative (i.e. the probability of a Type I error is usually smaller than α, and so the power of the test is smaller too). It is not recommended when the expected frequencies are > 5, and it is often not used even if just a few of them are smaller.
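A sketch of Yates' correction applied to the 60:40 coin example; note how the corrected statistic drops below the 5% critical value, which illustrates the conservativeness:

```python
def chi2_yates(observed, expected):
    """Goodness-of-fit statistic with Yates' continuity correction."""
    return sum((abs(o - e) - 0.5) ** 2 / e
               for o, e in zip(observed, expected))

print(chi2_yates([60, 40], [50, 50]))   # 3.61 < 3.84: no longer significant
```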