DATA ANALYSIS Module Code: CA660 Lecture Block 4.

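2 Examples using Standard Distributions/Sampling Distributions
Background: Recombinant Interference
Greater physical distance between loci → greater chance to recombine (homologous case). Departure from additivity increases with distance, hence the need for mapping.
Example: 2 loci A, B on the same chromosome, segregating for two alleles at each locus (A, a, B, b) → gametes AB, Ab, aB, ab. Parental types AB, ab give Ab and aB recombinants. Simple ratio: denote the recombinant fraction as R.F. (r), where
$r = \dfrac{\text{no. of recombinant gametes (Ab + aB)}}{\text{total no. of gametes}}$
Example: for 3 linked loci A, B, C (in that order), the relationship is based on simple probability theory. A and C are recombinant only if exactly one of the intervals AB, BC recombines. So, assuming no interference (recombination in the two intervals is independent):
$r_{AC} = r_{AB}(1 - r_{BC}) + r_{BC}(1 - r_{AB}) = r_{AB} + r_{BC} - 2\,r_{AB}\,r_{BC}$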
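
The two relationships above can be checked numerically. The sketch below is a minimal illustration that assumes no interference. The function names and the gamete counts are hypothetical and are not taken from the lecture.

```python
def recombination_fraction(n_recombinant: int, n_total: int) -> float:
    """R.F. r = no. of recombinant gametes / total no. of gametes."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_recombinant / n_total


def three_point_rf(r_ab: float, r_bc: float) -> float:
    """R.F. between outer loci A and C (order A-B-C), assuming NO interference:
    A-C is recombinant iff exactly one of the intervals AB, BC recombines."""
    return r_ab * (1 - r_bc) + r_bc * (1 - r_ab)


# Hypothetical gamete counts from an AB/ab parent (illustration only)
counts = {"AB": 420, "ab": 400, "Ab": 95, "aB": 85}
r_ab = recombination_fraction(counts["Ab"] + counts["aB"], sum(counts.values()))
print(r_ab)
print(three_point_rf(0.1, 0.2))  # same value as 0.1 + 0.2 - 2(0.1)(0.2)
```
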

3 Example cont.: LINKAGE/GENETIC MAP CONSTRUCTION
Genetic map: models the linear arrangement of a group of genes/markers (easily identified genetic features, e.g. a change in a known gene, or a piece of DNA with no known function). The map is based on homologous recombination during meiosis. If two or more markers are located close together on a chromosome, their alleles are usually inherited together through meiosis.
4 basic steps after the marker data are obtained:
1. Pairwise linkage: all 2-locus combinations (based on observed and expected frequencies of genotypic classes).
2. Grouping markers into linkage groups (based on R.F.'s, significance level etc.). With good genome coverage (many markers, good data and genetic model), the no. of linkage groups should approach the haploid no. of chromosomes for the organism.
3. Ordering markers within each group (key step, computationally demanding; precision important).
4. Estimation of multipoint R.F.'s. Physical distance (no. of DNA base pairs between two genes) is not the same as map distance, so R.F.'s are transformed to map distances.
Ultimate physical map = DNA sequence (restriction maps also common).
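
Step 2 (grouping) can be sketched as single-linkage clustering. Any pair of markers whose estimated R.F. is at or below a cut-off is joined, and groups are merged transitively. The cut-off `max_rf = 0.25`, the marker names and the R.F. values below are hypothetical. Real mapping software typically also uses LOD scores or significance levels, which this sketch does not model.

```python
from itertools import combinations


def linkage_groups(markers, rf, max_rf=0.25):
    """Single-linkage grouping: join markers whose pairwise R.F. <= max_rf.
    rf maps frozenset({m1, m2}) -> estimated R.F. (max_rf is an assumed cut-off)."""
    parent = {m: m for m in markers}

    def find(m):
        # Union-find root lookup with path halving
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in combinations(markers, 2):
        r = rf.get(frozenset((a, b)))
        if r is not None and r <= max_rf:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    # Report groups in the order their first marker appears
    return sorted(groups.values(), key=lambda g: markers.index(g[0]))


markers = ["M1", "M2", "M3", "M4", "M5"]                      # hypothetical markers
rf = {frozenset(p): 0.5 for p in combinations(markers, 2)}    # unlinked by default
rf.update({
    frozenset(("M1", "M2")): 0.05,
    frozenset(("M2", "M3")): 0.20,
    frozenset(("M1", "M3")): 0.24,
    frozenset(("M4", "M5")): 0.10,
})
print(linkage_groups(markers, rf))
```
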

4 STANDARD DISTRIBUTIONS: Examples/Extensions
GENETIC LINKAGE and MAPPING
Linkage phase: chromatid associations of alleles of linked loci. Alleles on the same chromosome = coupling; on different homologues = repulsion.
Genetic recombination: define R.F. (in terms of gametes or phenotypes). In the homologous case, the greater the distance between loci, the greater the chance of recombining. High interference = problem for multiple-locus models, because R.F.'s between loci are not additive. Hence the need for a mapping function.
Haldane's Mapping Function: assume crossovers occur randomly along the chromosome length, with average number $\lambda$. Model as Poisson, so
$P\{\text{no crossover}\} = e^{-\lambda}$ and $P\{\text{crossover}\} = 1 - e^{-\lambda}$
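
The Poisson assumption can be checked by simulation: draw crossover counts with mean $\lambda$ and compare the fraction of zero counts with $e^{-\lambda}$. This sketch uses Knuth's multiplication method for Poisson sampling. The value $\lambda = 0.5$ and the function names are hypothetical choices.

```python
import math
import random


def poisson_sample(lam, rng):
    """Knuth's method: count extra uniforms needed before their product
    drops to or below e^{-lam}; that count is Poisson(lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def fraction_no_crossover(lam, trials=50_000, seed=1):
    """Monte Carlo estimate of P{no crossover} in an interval."""
    rng = random.Random(seed)
    zeros = sum(poisson_sample(lam, rng) == 0 for _ in range(trials))
    return zeros / trials


lam = 0.5  # hypothetical mean no. of crossovers in the interval
print(fraction_no_crossover(lam), math.exp(-lam))
```
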

5 Example continued
$P\{\text{recombinant}\} = 0.5\,P\{\text{crossover}\}$ (for each pair of homologues, one crossover results in one-half recombinant gametes).
Define the expected no. of recombinants in terms of the mapping function, with $m = 0.5\lambda$:
$r = 0.5\,(1 - e^{-2m})$ (form of Haldane's M.F.), with inverse $m = -0.5\ln(1 - 2r)$.
This converts an estimated R.F. to Haldane's map distance.
Thus, for locus order ABC, $m_{AC} = m_{AB} + m_{BC}$ (since $m_{AB} = -0.5\ln(1 - 2r_{AB})$ etc.). Substituting for each of these gives the usual relationship between R.F.'s (for the no-interference situation):
$1 - 2r_{AC} = (1 - 2r_{AB})(1 - 2r_{BC})$, i.e. $r_{AC} = r_{AB} + r_{BC} - 2\,r_{AB}\,r_{BC}$
Net effect: transform to a straight line, i.e. $m_{AC}$ is linear in $m_{AB}$ and $m_{BC}$.
In practice: too simple, and it applies only under specific conditions. It may not relate directly to physical distance (a common mapping function issue).
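
Haldane's function and its inverse can be coded directly. The sketch below converts two adjacent R.F.'s to map distances (in Morgans), adds them, and converts back. This recovers $r_{AB} + r_{BC} - 2r_{AB}r_{BC}$. The R.F. values are hypothetical.

```python
import math


def haldane_rf(m):
    """R.F. from map distance m (Morgans): r = 0.5(1 - e^{-2m})."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))


def haldane_distance(r):
    """Inverse of Haldane's M.F.: m = -0.5 ln(1 - 2r), valid for 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * r)


r_ab, r_bc = 0.10, 0.20                                 # hypothetical adjacent-interval R.F.'s
m_ac = haldane_distance(r_ab) + haldane_distance(r_bc)  # map distances are additive
r_ac = haldane_rf(m_ac)                                 # back to the R.F. scale
print(round(m_ac, 4), round(r_ac, 4))
```
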

6 Examples: RECOMBINANTS, BINOMIAL and MULTINOMIAL
Binomial: the no. of recombinant gametes produced by a heterozygous parent for a 2-locus model follows a Binomial distribution with parameters $n$ and $\theta = P\{\text{gamete recombinant}\}$ (= R.F.). So for $r$ recombinants in a sample of $n$:
$P\{R = r\} = \binom{n}{r}\,\theta^{r}(1 - \theta)^{n - r}, \quad r = 0, 1, \ldots, n$
Multinomial: in the 3-locus model (A, B, C) there are 4 possible classes of gametes (non-recombinants, AB recombinants, BC recombinants and double recombinants at loci ABC). The joint probability distribution for the r.v.'s requires counting the number in each class:
$P\{a, b, c, d\} = \dfrac{n!}{a!\,b!\,c!\,d!}\,P_1^{a}P_2^{b}P_3^{c}P_4^{d}$
where $a + b + c + d = n$ and $P_1, P_2, P_3, P_4$ are the probabilities of observing a member of each of the 4 classes respectively.
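
Both probability functions above are short to code. As an assumed example, the class probabilities $(0.81, 0.09, 0.09, 0.01)$ are what $r_{AB} = r_{BC} = 0.1$ with no interference would give. They are not values from the lecture.

```python
import math


def binomial_pmf(r, n, theta):
    """P{R = r} for R ~ B(n, theta): no. of recombinant gametes out of n."""
    return math.comb(n, r) * theta**r * (1 - theta) ** (n - r)


def multinomial_pmf(counts, probs):
    """n!/(a! b! c! d!) * P1^a P2^b P3^c P4^d for class counts (a, b, c, d)."""
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)  # exact: each partial quotient is an integer
    p = float(coef)
    for c, pi in zip(counts, probs):
        p *= pi**c
    return p


# Assumed class probabilities: non-rec., AB rec., BC rec., double rec.
P = (0.81, 0.09, 0.09, 0.01)
print(binomial_pmf(2, 10, 0.1))
print(multinomial_pmf((8, 1, 1, 0), P))
```
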

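7 Sampling and Sampling Distributions: Extended Examples (refer to primer)
Central Limit Theorem: if $X_1, X_2, \ldots, X_n$ are a random sample of r.v. $X$ (mean $\mu$, variance $\sigma^2$), then, in the limit as $n \to \infty$, the sampling distribution of the standardized mean has a Standard Normal distribution, N(0,1):
$U = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \to N(0, 1)$
Probabilities for the sampling distribution, with limits for large $n$:
$P\left\{a \le \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le b\right\} \to \Phi(b) - \Phi(a)$
where U = standardized Normal deviate.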

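8 Large Sample Theory
In particular, $\Phi(u) = \displaystyle\int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}}\,e^{-t^{2}/2}\,dt$ is the C.D.F. (or D.F.) of U.
In general, the closer the random variable X is to Normal in its behaviour, the faster the approximation approaches U. Generally, $n \ge 30$ is treated as "large sample" theory.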
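
A small simulation illustrates the C.L.T. for a clearly non-Normal parent. Samples of size $n = 30$ are drawn from an Exponential distribution with mean 1 and s.d. 1, and each sample mean is standardized. The proportion of standardized means inside $(-1.96, 1.96)$ should then be close to 0.95. The parent distribution, $n$, and the number of repetitions are all assumptions chosen for illustration.

```python
import math
import random


def clt_coverage(n=30, reps=20_000, seed=0):
    """Fraction of standardized means of Exp(1) samples lying in (-1.96, 1.96)."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and s.d. of the Exponential(rate = 1) parent
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        u = (xbar - mu) / (sigma / math.sqrt(n))  # standardized Normal deviate U
        hits += -1.96 < u < 1.96
    return hits / reps


print(clt_coverage())
```
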

9 Attribute and Proportionate Sampling (recall primer: sample proportion and sample mean are synonymous)
Probability statements: if X and Y are independent Binomially distributed r.v.'s with parameters n, p and m, p respectively, then X + Y ~ B(n + m, p) (show e.g. by m.g.f.'s).
So $Y = X_1 + X_2 + \cdots + X_n \sim B(n, p)$ for IID $X_i \sim B(1, p)$. Since we know $\mu_Y = np$ and $\sigma_Y = \sqrt{npq}$, then clearly, for the sample proportion $\hat{p} = Y/n$,
$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{pq/n}$
and, further,
$\dfrac{\hat{p} - p}{\sqrt{pq/n}} \approx U$
is the (large-sample) sampling distribution of a proportion.
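
The mean and variance of $\hat{p} = Y/n$ can be confirmed exactly by summing over the Binomial p.m.f. The values $n = 20$, $p = 0.3$ are hypothetical.

```python
import math


def proportion_moments(n, p):
    """Exact mean and variance of p_hat = Y/n for Y ~ B(n, p), summed over the p.m.f."""
    pmf = [math.comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)]
    mean = sum(y / n * w for y, w in enumerate(pmf))
    var = sum((y / n - mean) ** 2 * w for y, w in enumerate(pmf))
    return mean, var


n, p = 20, 0.3  # hypothetical sample size and population proportion
mean, var = proportion_moments(n, p)
print(mean, var, p * (1 - p) / n)  # var should match pq/n
```
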

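10 Difference in Proportions
Can use $\chi^2$: contingency-table type set-up.
Alternatively, set up as a parallel to the estimate or test of the difference between 2 (independent) means, so for a 100(1 − α)% C.I.:
$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}$
Under $H_0: P_1 - P_2 = 0$, the S.E. can be written using the pooled estimate $\hat{p} = \dfrac{X + Y}{n_1 + n_2}$ (X, Y = # successes):
$\text{S.E.} = \sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}, \quad n_1, n_2 \text{ large}.$
Small samples: use n − 1 in place of n. The C.I. above is 2-sided.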
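
The unpooled C.I. above takes only a few lines of code. This sketch fixes $z_{\alpha/2} = 1.96$ (95%) as an assumed default and is applied to the counts used in the slide 13 example.

```python
import math


def diff_prop_ci(x1, n1, x2, n2, z=1.96):
    """Two-sided C.I. for P1 - P2 using the unpooled S.E. (z = 1.96 gives 95%)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se


lower, upper = diff_prop_ci(34, 113, 54, 139)
print(round(lower, 4), round(upper, 4))
```

Here the interval contains 0, which agrees with not rejecting $H_0$ at the 5% level.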

11 C.L.T. and Approximations: Summary
General form of the theorem: for an infinite sequence of independent r.v.'s with means and variances as before, the standardized sum approaches U for n large enough.
Note: no condition on the form of the distribution of the X's (the raw data).
Strictly, approximations of discrete distributions can be improved by a correction for continuity, e.g. for $X \sim B(n, p)$:
$P\{X \le x\} \approx \Phi\left(\dfrac{x + 0.5 - np}{\sqrt{npq}}\right)$
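
The effect of the continuity correction is easy to see for a small Binomial. This sketch compares the exact $P\{X \le 4\}$ for $X \sim B(20, 0.3)$ (a hypothetical case) with the Normal approximation, computed with and without the $+0.5$. $\Phi$ is computed from `math.erf`.

```python
import math


def phi(z):
    """Standard Normal C.D.F."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binom_cdf(x, n, p):
    """Exact P{X <= x} for X ~ B(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def normal_approx_cdf(x, n, p, continuity=True):
    """Normal approximation to P{X <= x}, optionally with the +0.5 correction."""
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    return phi((x + (0.5 if continuity else 0.0) - mu) / sd)


n, p, x = 20, 0.3, 4  # hypothetical example
exact = binom_cdf(x, n, p)
corrected = normal_approx_cdf(x, n, p)
uncorrected = normal_approx_cdf(x, n, p, continuity=False)
print(exact, corrected, uncorrected)
```
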

12 Generalising the Sampling Distribution Concept (see primer)
For the sampling distribution of any statistic, a sample characteristic is an unbiased estimator of the parent population characteristic if the mean of the corresponding sampling distribution equals the parent characteristic. Also, the sample average proportion is an unbiased estimator of the parent average proportion.
Sampling without replacement from a finite population gives the Hypergeometric distribution. The finite population correction (fpc) is $\sqrt{(N - n)/(N - 1)}$, where N, n are the parent population size and sample size respectively, and it multiplies the standard error. The same correction applies to the variance also (multiply by $(N - n)/(N - 1)$).
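
The fpc can be verified exactly by enumerating every sample drawn without replacement from a tiny population. The sketch uses exact fractions. It checks that the mean of the sample means equals the population mean (unbiasedness), and that their variance equals $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$, with $\sigma^2$ the population variance using divisor $N$. The population values are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def fpc_check(population, n):
    """Enumerate all samples of size n without replacement; return
    (mean of sample means, population mean, variance of sample means, fpc prediction)."""
    N = len(population)
    pop = [Fraction(v) for v in population]
    mu = sum(pop) / N
    sigma2 = sum((v - mu) ** 2 for v in pop) / N  # population variance, divisor N
    means = [sum(s) / n for s in combinations(pop, n)]
    mean_of_means = sum(means) / len(means)
    var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)
    predicted = sigma2 / n * Fraction(N - n, N - 1)  # variance with fpc applied
    return mean_of_means, mu, var_of_means, predicted


result = fpc_check([2, 4, 6, 8, 10, 12], 3)  # hypothetical population, N = 6, n = 3
print(result)
```
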

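13 Examples in Context
Rates of prevalence of CF antibody to P1 virus among children of a given age group: of 113 boys tested, 34 have the antibody, while of 139 girls tested, 54 have the antibody. Is the evidence strong for a higher prevalence rate in girls?
$H_0: p_1 = p_2$ vs $H_1: p_1 < p_2$ (where $p_1, p_2$ are the proportions of boys, girls with antibody respectively).
Soln. $\hat{p}_1 = 34/113$, $\hat{p}_2 = 54/139$, pooled $\hat{p} = 88/252$, and
$U = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{113} + \frac{1}{139}\right)}}$
Cannot reject $H_0$ (e.g. at α = 0.05 the one-sided critical value is −1.645).
Actual p-value = $P\{U \le -1.44\} \approx 0.075$.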
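
The pooled test for this example can be reproduced as follows. The function name is hypothetical. Carried through without rounding, the arithmetic gives $U \approx -1.45$ and a p-value of about 0.073. The small difference from −1.44 comes from rounding the intermediate proportions. The conclusion is unchanged.

```python
import math


def phi(z):
    """Standard Normal C.D.F."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion U statistic and left-tailed p-value (H1: p1 < p2)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, phi(z)


z, p_value = two_prop_z(34, 113, 54, 139)
print(round(z, 3), round(p_value, 4), z > -1.645)  # True -> cannot reject at 5%
```
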

14 Examples contd.
A large-scale 1980 survey in a country showed 30% of the adult population with a given genetic trait. If this is still the current rate, what is the probability that, in a random sample of 1000, the number with the trait will be (a) < 250, (b) 316 or more?
Soln. Let X = no. of successes (with trait) in the sample. For an expected proportion of 0.3 in the population, we suppose X ~ B(1000, 0.3).
Since np = 300 and √(npq) = √210 = 14.49, X is approximately Normal with mean 300 and s.d. 14.49, i.e. N(300, 210).
(a) P{X < 280}, i.e. P{X ≤ 279} ≈ Φ((279.5 − 300)/14.49), with continuity correction
(b) P{X ≥ 316} ≈ 1 − Φ((315.5 − 300)/14.49)
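
The Normal approximation with continuity correction can be checked against the exact Binomial sum. The exact sum is computed on the log scale with `math.lgamma` to avoid huge intermediate numbers. The cut-off is a function argument, so any threshold from the question can be plugged in. The calls below use the values 280 and 316 from the worked solution. All function names are hypothetical.

```python
import math

N_SAMPLE, P_TRAIT = 1000, 0.3
MU = N_SAMPLE * P_TRAIT                                 # np = 300
SD = math.sqrt(N_SAMPLE * P_TRAIT * (1 - P_TRAIT))      # sqrt(npq) = sqrt(210)


def phi(z):
    """Standard Normal C.D.F."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_fewer_than(k):
    """P{X < k} = P{X <= k - 1}, Normal approximation with continuity correction."""
    return phi((k - 0.5 - MU) / SD)


def prob_at_least(k):
    """P{X >= k}, Normal approximation with continuity correction."""
    return 1.0 - phi((k - 0.5 - MU) / SD)


def exact_binom_pmf(x, n=N_SAMPLE, p=P_TRAIT):
    """Exact B(n, p) p.m.f. evaluated via log-gamma for numerical safety."""
    log_p = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
             + x * math.log(p) + (n - x) * math.log(1 - p))
    return math.exp(log_p)


print(prob_fewer_than(280), sum(exact_binom_pmf(x) for x in range(280)))
print(prob_at_least(316), sum(exact_binom_pmf(x) for x in range(316, N_SAMPLE + 1)))
```
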

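15 Examples contd.
Blood pressure readings were taken before and after 6 months on medication in women students (aged 25-35); sample of 15. Calculate (a) a 95% C.I. for the mean change in B.P.; (b) test at the 1% level of significance (α = 0.01) that the medication reduces B.P.
Data: Subject | 1st reading (x) | 2nd reading (y) | d = x − y
(a) With $\bar{d} = \dfrac{1}{n}\sum d_i$ and $s_d^2 = \dfrac{1}{n-1}\sum (d_i - \bar{d})^2$, the 95% C. limits are
$\bar{d} \pm t_{0.025,\,14}\,\dfrac{s_d}{\sqrt{n}}$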

16 Contd.
The value for t is based on d.o.f. = 14. From the t-table, $t_{0.025,14} = 2.145$. So the 95% C.I. is $8.80 \pm 6.08$, i.e. (2.72, 14.88). We are 95% confident that there is a mean difference (reduction) in B.P. of between 2.72 and 14.88.
(b) The claim is that $\mu_d > 0$, so we look at $H_0: \mu_d = 0$ vs $H_1: \mu_d > 0$. The t-statistic is as before, $t = \bar{d}/(s_d/\sqrt{n})$, but with a right-tailed (one-sided only) rejection region. For d.o.f. = 14, $t_{0.01} = 2.624$. The calculated value from our data is clearly in the rejection region, so $H_0$ is rejected in favour of $H_1$ at α = 0.01.
The data strongly support a reduction in B.P. after medication.
[Figure: $t_{14}$ density, with the acceptance region to the left and the 1% rejection region to the right of $t_{0.01} = 2.624$]
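
The paired analysis can be sketched from the raw differences. Because the slide's data table is not reproduced, the example instead backs out the S.E. from the reported interval ($6.08 / 2.145$). It then recomputes $t \approx 3.10$, which exceeds 2.624. The t-table values are hard-coded. The function names are hypothetical.

```python
import math
import statistics

T_025_14 = 2.145  # t-table: upper 2.5% point, 14 d.o.f.
T_01_14 = 2.624   # t-table: upper 1% point, 14 d.o.f.


def paired_summary(diffs):
    """Return (mean d, s_d, S.E., t) for paired differences d = x - y (H0: mu_d = 0)."""
    n = len(diffs)
    d_bar = statistics.fmean(diffs)
    s_d = statistics.stdev(diffs)  # sample s.d., divisor n - 1
    se = s_d / math.sqrt(n)
    return d_bar, s_d, se, d_bar / se


def paired_ci(d_bar, se, t_crit):
    """Two-sided C.I.: d_bar +/- t_crit * S.E."""
    return d_bar - t_crit * se, d_bar + t_crit * se


# Slide summary figures (raw data not available here)
d_bar, half_width = 8.80, 6.08
se = half_width / T_025_14  # S.E. = s_d / sqrt(15), recovered from the C.I.
t_stat = d_bar / se
print(paired_ci(d_bar, se, T_025_14), round(t_stat, 2), t_stat > T_01_14)
```
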