Lecture 3: Statistics Review I Date: 9/3/02  Distributions  Likelihood  Hypothesis tests

Sources of Variation  Definition: Sampling variation results because we only sample a fraction of the full population (e.g. the mapping population).  Definition: There is often substantial experimental error in the laboratory procedures used to make measurements. Sometimes this error is systematic.

Parameters vs. Estimates  Definition: The population is the complete collection of all individuals or things you wish to make inferences about. Statistics calculated on populations are parameters.  Definition: The sample is a subset of the population on which you make measurements. Statistics calculated on samples are estimates.

Types of Data  Definition: Often the data are discrete, meaning they can take on one of countably many different values.  Definition: Many complex and economically valuable traits are continuous. Such traits are quantitative, the random variables associated with them are continuous, and the loci affecting them are quantitative trait loci (QTL).

Random We are concerned with the outcome of random experiments.  production of gametes  union of gametes (fertilization)  formation of chiasmata and recombination events

Set Theory I Set theory underlies probability.  Definition: A set is a collection of objects.  Definition: An element is an object in a set.  Notation: s ∈ S means “s is an element of S”.  Definition: If A and B are sets, then A is a subset of B if and only if s ∈ A implies s ∈ B.  Notation: A ⊆ B means “A is a subset of B”.

Set Theory II  Definition: Two sets A and B are equal if and only if A ⊆ B and B ⊆ A. We write A = B.  Definition: The universal set is the superset of all other sets, i.e. all other sets are included within it. Often represented as Ω.  Definition: The empty set contains no elements and is denoted as ∅.

Sample Space & Event  Definition: The sample space for a random experiment is the set Ω that includes all possible outcomes of the experiment.  Definition: An event is a set of possible outcomes of the experiment. An event E is said to happen if any one of the outcomes in E occurs.

Example: Mendel I  Mendel took inbred lines of smooth (AA) and wrinkled (BB) peas and crossed them to make the F1 generation, then crossed the F1 again to make the F2 generation. The smooth allele A is dominant to B.  The random experiment is the random production of gametes and fertilization to produce peas.  The sample space of F2 genotypes is {AA, AB, BB}.

Random Variable  Definition: A function from set S to set T is a rule assigning to each s ∈ S an element t ∈ T.  Definition: Given a random experiment on sample space Ω, a function from Ω to T is a random variable. We often write X, Y, or Z. If we were very careful, we’d write X(s).  Simply, X is a measurement of interest on the outcome of a random experiment.

Example: Mendel II  Let X be the number of A alleles in a randomly chosen F2 genotype. X is a random variable.  Its set of possible values is T = {0, 1, 2}.

Discrete Probability Distribution  Suppose X is a random variable with possible outcomes {x_1, x_2, …, x_m}. Define the discrete probability distribution for random variable X as p(x_i) = P(X = x_i), with p(x_1) + p(x_2) + … + p(x_m) = 1.

Example: Mendel III  For X, the number of A alleles in an F2 genotype, the distribution is p(0) = 1/4, p(1) = 1/2, p(2) = 1/4.
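The 1:2:1 distribution in this example can be derived by enumerating gamete combinations. A minimal sketch (exact arithmetic via the stdlib fractions module; not part of the original slides):

```python
from fractions import Fraction

# F2 cross: each parent transmits allele A or B with probability 1/2,
# so the offspring genotype probabilities follow the 1:2:1 ratio.
# X counts the number of A alleles in the offspring genotype.
half = Fraction(1, 2)
p = {x: Fraction(0) for x in (0, 1, 2)}
for maternal in (0, 1):          # 1 = transmits A, 0 = transmits B
    for paternal in (0, 1):
        p[maternal + paternal] += half * half

print(p)  # probabilities 1/4, 1/2, 1/4 for x = 0, 1, 2
```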

Cumulative Distribution  The discrete cumulative distribution function is defined as F(x) = P(X ≤ x) = Σ over x_i ≤ x of p(x_i).  The continuous cumulative distribution function is defined as F(x) = ∫ from −∞ to x of f(t) dt.

Continuous Probability Distribution  If f(x) = dF(x)/dx exists, then f(x) is the continuous probability density. As in the discrete case, the integral of f(x) over the sample space is 1.

Expectation and Variance  Definition: The expectation of X is E(X) = Σ_i x_i p(x_i) in the discrete case and E(X) = ∫ x f(x) dx in the continuous case.  Definition: The variance of X is Var(X) = E[(X − E(X))^2] = E(X^2) − [E(X)]^2.
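These formulas can be applied directly to the Mendel F2 variable X (number of A alleles, with 1:2:1 distribution); a quick sketch:

```python
# Mean and variance of X, the number of A alleles in an F2 genotype,
# computed directly from its 1:2:1 distribution.
p = {0: 0.25, 1: 0.5, 2: 0.25}
mean = sum(x * px for x, px in p.items())
var = sum((x - mean) ** 2 * px for x, px in p.items())
print(mean, var)  # 1.0 0.5
```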

Moments and MGF  Definition: The rth moment of X is E(X^r).  Definition: The moment generating function is defined as E(e^(tX)).

Example: Mendel IV  Define the random variable Z as the indicator of the smooth phenotype: Z = 1 for a smooth pea and Z = 0 for a wrinkled pea.  If we hypothesize that smooth dominates wrinkled in a single-locus model, then the corresponding probability model is P(Z = 1) = 3/4 and P(Z = 0) = 1/4.

Example: Mendel V

Joint and Marginal Cumulative Distributions  Definition: Let X and Y be two random variables. Then the joint cumulative distribution is F(x, y) = P(X ≤ x, Y ≤ y).  Definition: The marginal cumulative distribution is F_X(x) = F(x, ∞), and similarly F_Y(y) = F(∞, y).

Joint Distribution  Definition: The joint distribution is p(x, y) = P(X = x, Y = y) in the discrete case, with joint density f(x, y) in the continuous case.  As before, the sum or integral over the sample space is 1.

Conditional Distribution  Definition: The conditional distribution of X given that Y = y is p(x | y) = p(x, y) / p(y).  Lemma: If X and Y are independent, then p(x | y) = p(x), p(y | x) = p(y), and p(x, y) = p(x)p(y).

Example: Mendel VI P(homozygous | smooth seed) = P(AA) / P(smooth) = (1/4) / (3/4) = 1/3.
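This conditional probability can be reproduced in a few lines with exact arithmetic; a sketch:

```python
from fractions import Fraction

# F2 genotype probabilities; AA and AB are smooth, BB is wrinkled.
p = {"AA": Fraction(1, 4), "AB": Fraction(1, 2), "BB": Fraction(1, 4)}
p_smooth = p["AA"] + p["AB"]
p_homozygous_and_smooth = p["AA"]   # AA is the only smooth homozygote
p_cond = p_homozygous_and_smooth / p_smooth
print(p_cond)  # 1/3
```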

Binomial Distribution  Suppose there is a random experiment with two possible outcomes, which we call “success” and “failure”. Suppose there is a constant probability p of success for each experiment and multiple experiments of this type are independent. Let X be the random variable that counts the total number of successes in n experiments. Then X ~ Bin(n, p).

Properties of Binomial Distribution  P(X = k) = C(n, k) p^k (1 − p)^(n − k) for k = 0, 1, …, n.  E(X) = np and Var(X) = np(1 − p).
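The binomial mean and variance can be verified numerically from the pmf; a sketch assuming illustrative values n = 10, p = 0.25:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Check E(X) = np and Var(X) = np(1 - p) numerically.
n, p = 10, 0.25
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
print(mean, var)  # close to np = 2.5 and np(1 - p) = 1.875
```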

Examples: Binomial Distribution  recombinant fraction θ between two loci: count the number of recombinant gametes in n sampled.  phenotype in Mendel’s F2 cross: count the number of smooth peas in the F2.

Multinomial Distribution  Suppose you consider genotype in Mendel’s F2 cross, or a 3-point cross.  Definition: Suppose there are m possible outcomes with probabilities p_1, p_2, …, p_m, and the random variables X_1, X_2, …, X_m count the number of times each outcome is observed in n trials. Then P(X_1 = x_1, …, X_m = x_m) = [n! / (x_1! x_2! … x_m!)] p_1^x_1 p_2^x_2 … p_m^x_m.
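The multinomial pmf can be sketched directly; the genotype counts below are hypothetical, using the F2 probabilities 1/4, 1/2, 1/4:

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X_1 = x_1, ..., X_m = x_m) for n = sum(counts) trials."""
    coef = factorial(sum(counts))
    for x in counts:
        coef //= factorial(x)
    return coef * prod(p**x for p, x in zip(probs, counts))

# Probability of observing 3 AA, 5 AB, 2 BB genotypes among 10 F2 peas.
print(multinomial_pmf([3, 5, 2], [0.25, 0.5, 0.25]))
```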

Poisson Distribution  Consider the Binomial distribution when p is small and n is large, but np = λ is constant. Then P(X = k) → e^(−λ) λ^k / k!.  The distribution obtained is the Poisson distribution.

Properties of Poisson Distribution  P(X = k) = e^(−λ) λ^k / k! for k = 0, 1, 2, ….  E(X) = Var(X) = λ.
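The Binomial-to-Poisson limit can be checked numerically; a stdlib-only sketch with the arbitrary choices λ = 2, k = 3:

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Binomial(n, lam/n) pmf at k approaches the Poisson pmf as n grows
# with np = lam held fixed.
lam, k = 2.0, 3
for n in (10, 100, 1000):
    p = lam / n
    b = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(n, b, poisson_pmf(k, lam))
```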

Normal Distribution  Confidence intervals for recombinant fraction can be estimated using the Normal distribution.

Properties of Normal Distribution  Density f(x) = (1 / (σ √(2π))) e^(−(x − μ)^2 / (2σ^2)).  E(X) = μ and Var(X) = σ^2.
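As an illustration of the normal approximation mentioned on the previous slide, a sketch of a 95% confidence interval for a recombinant fraction, with hypothetical counts (r recombinants out of n gametes) and the standard-normal 97.5% quantile 1.96 hard-coded:

```python
from math import sqrt

# theta_hat = r/n is approximately N(theta, theta(1 - theta)/n)
# for large n, giving the usual Wald-type interval.
r, n = 18, 100                       # hypothetical counts
theta_hat = r / n
se = sqrt(theta_hat * (1 - theta_hat) / n)
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
print(ci)
```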

Chi-Square Distribution  Many hypothesis tests in statistical genetics use the chi-square distribution.  If Z_1, …, Z_k are independent standard normal random variables, then Z_1^2 + … + Z_k^2 follows a chi-square distribution with k degrees of freedom.

Likelihood I  Likelihoods are used frequently with genetic data because they handle the complexities of genetic models well.  Let θ be a parameter or vector of parameters that affect the random variable X, e.g. θ = (μ, σ^2) for the normal distribution.

Likelihood II  Then, given an observed independent sample of size n, namely x_1, x_2, …, x_n, we can write the likelihood conditioned on the parameter θ as L(θ) = f(x_1; θ) f(x_2; θ) … f(x_n; θ).  Normally, θ is not known to us. To find the θ that best fits the data, we maximize L(θ) over all θ.

Example: Likelihood of Binomial  If x successes are observed in n trials, L(p) = C(n, x) p^x (1 − p)^(n − x), so log L(p) = log C(n, x) + x log p + (n − x) log(1 − p). The maximum likelihood estimate is p̂ = x/n.
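The binomial likelihood can be maximized numerically and compared with the analytic answer; the counts below are hypothetical:

```python
from math import comb, log

def binom_loglik(p, n, x):
    """log L(p) = log C(n, x) + x log p + (n - x) log(1 - p)."""
    return log(comb(n, x)) + x * log(p) + (n - x) * log(1 - p)

# Hypothetical data: 37 successes in 100 trials. A grid search over
# (0, 1) recovers the analytic MLE p-hat = x/n.
n, x = 100, 37
grid = [i / 1000 for i in range(1, 1000)]
p_mle = max(grid, key=lambda p: binom_loglik(p, n, x))
print(p_mle)  # 0.37, equal to x/n
```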

The Score  Definition: The first derivative of the log likelihood with respect to the parameter is the score.  For example, the score for the binomial parameter p is S(p) = d log L(p)/dp = x/p − (n − x)/(1 − p).
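A quick numerical check of the score's behavior around the binomial MLE (counts are hypothetical):

```python
# Score for the binomial parameter: S(p) = x/p - (n - x)/(1 - p).
# It is positive below the MLE x/n, negative above it, and zero at it.
def score(p, n, x):
    return x / p - (n - x) / (1 - p)

n, x = 100, 37  # hypothetical counts
print(score(0.25, n, x), score(x / n, n, x), score(0.5, n, x))
```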

Information Content  Definition: The information content is I(θ) = −E[d^2 log L(θ)/dθ^2].  When the second derivative is evaluated at the maximum likelihood estimate rather than averaged, it is called the observed information.
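For the binomial, the information works out to I(p) = n / (p(1 − p)), and this can be checked against a numerical second derivative of the log likelihood at the MLE; counts below are hypothetical:

```python
from math import comb, log

def loglik(p, n, x):
    return log(comb(n, x)) + x * log(p) + (n - x) * log(1 - p)

# At the MLE, the observed information (negative numerical second
# derivative of log L) matches the Fisher information n / (p(1 - p)).
n, x = 100, 37
p_hat = x / n
h = 1e-5
d2 = (loglik(p_hat + h, n, x) - 2 * loglik(p_hat, n, x)
      + loglik(p_hat - h, n, x)) / h**2
observed_info = -d2
expected_info = n / (p_hat * (1 - p_hat))
print(observed_info, expected_info)
```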

Hypothesis Testing  Most experiments begin with a hypothesis. This hypothesis must be converted into a statistical hypothesis.  Statistical hypotheses consist of a null hypothesis H0 and an alternative hypothesis HA.  Statistics are used to reject H0 and accept HA. Sometimes we cannot reject H0 and we retain it instead.

Rejection Region I  Definition: Given the cumulative distribution function F(X) of the test statistic X, the critical region (region of rejection) for a hypothesis test is the area under the probability distribution where the observed test statistic is unlikely to fall if H0 is true.  The rejection region may or may not be symmetric.

Rejection Region II  [Figure: sampling distribution of the test statistic under H0, with shaded rejection regions in the tails; the tail areas F(x_l) and 1 − F(x_u) (two-tailed) or 1 − F(x_c) (one-tailed) total α, leaving probability 1 − α in the acceptance region.]

Acceptance Region  The region where H0 cannot be rejected.

One-Tailed vs. Two-Tailed  Use a one-tailed test when H0 is unidirectional, e.g. H0: θ ≥ 0.5.  Use a two-tailed test when H0 is bidirectional, e.g. H0: θ = 0.5.

Critical Values  Definition: Critical values are those values corresponding to the cut-off point between rejection and acceptance regions.

P-Value  Definition: The p-value is the probability, assuming H0 is true, of observing a test statistic at least as extreme as the one obtained.  Reject H0 when the p-value ≤ α.  The significance level of the test is α.

Chi-Square Test: Goodness-of-Fit  Calculate the expected counts e_i under H0, then compute X^2 = Σ_i (o_i − e_i)^2 / e_i over the a categories.  X^2 is distributed approximately as chi-square with a − 1 degrees of freedom. When the expected values depend on k estimated parameters, then df = a − 1 − k.
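The goodness-of-fit calculation can be sketched for Mendel's 1:2:1 F2 genotype ratio, with hypothetical counts and the chi-square(2 df) 95% critical value 5.991 hard-coded rather than looked up:

```python
# Pearson goodness-of-fit test of the 1:2:1 ratio.
observed = [26, 52, 22]          # hypothetical AA, AB, BB counts
n = sum(observed)
expected = [n * q for q in (0.25, 0.5, 0.25)]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)            # about 0.48 for these counts
print(chi2 > 5.991)    # False: no evidence against the 1:2:1 ratio
```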

Chi-Square Test: Test of Independence  e_ij = n p_0i p_0j, with the marginal probabilities p_0i and p_0j estimated from the data.  degrees of freedom = (a − 1)(b − 1)  Example: test for linkage

Likelihood Ratio Test  G = 2 ln(LR), where LR is the ratio of the maximized likelihoods under HA and H0.  G ~ χ^2 with degrees of freedom equal to the difference in the number of free parameters.

LR: Goodness-of-Fit & Independence Tests  goodness-of-fit: G = 2 Σ_i o_i ln(o_i / e_i)  independence test: G = 2 Σ_ij o_ij ln(o_ij / e_ij)
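A sketch of the goodness-of-fit G statistic for the 1:2:1 ratio, with hypothetical counts; it comes out close to the Pearson chi-square for the same data:

```python
from math import log

# Likelihood-ratio G statistic: G = 2 * sum o_i * ln(o_i / e_i).
observed = [26, 52, 22]          # hypothetical AA, AB, BB counts
n = sum(observed)
expected = [n * q for q in (0.25, 0.5, 0.25)]
G = 2 * sum(o * log(o / e) for o, e in zip(observed, expected))
print(G)  # about 0.49, near the Pearson value for the same counts
```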

Compare  2 and Likelihood Ratio  Both give similar results.  LR is more powerful when there are unknown parameters involved.

LOD Score  LOD stands for log of odds.  It is commonly denoted by Z.  The interpretation is that HA is 10^Z times more likely than H0. The p-values obtained by the LR statistic for LOD score Z are approximately 10^(−Z).
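The LR-to-LOD conversion is just a change of logarithm base: if G = 2 ln(LR), then Z = log10(LR) = G / (2 ln 10). A sketch with a hypothetical G value:

```python
from math import log

G = 9.21                  # hypothetical value of the LR statistic
Z = G / (2 * log(10))     # LOD score
print(Z, 10 ** (-Z))      # Z is about 2, approximate p about 0.01
```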

Nonparametric Hypothesis Testing  What do you do when the test statistic does not follow some standard probability distribution?  Use an empirical distribution: assume H0 and resample (bootstrap, jackknife, or permutation) to generate the null distribution of the test statistic.
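A minimal permutation sketch of this resampling idea, with illustrative data and group names (all hypothetical): shuffle the group labels many times and recompute the statistic to build an empirical null distribution.

```python
import random

# Permutation test for a difference in group means.
random.seed(0)
group_a = [2.1, 2.5, 1.9, 2.8, 2.4]   # illustrative measurements
group_b = [1.6, 1.8, 2.0, 1.5, 1.7]
pooled = group_a + group_b

def mean_diff(a, b):
    return sum(a) / len(a) - sum(b) / len(b)

observed = mean_diff(group_a, group_b)
null = []
for _ in range(10_000):
    random.shuffle(pooled)            # relabel under H0
    null.append(mean_diff(pooled[:5], pooled[5:]))
p_value = sum(d >= observed for d in null) / len(null)
print(observed, p_value)
```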