Lecture 11: Linkage Analysis IV
Date: 10/01/02
- Linkage grouping
- Locus ordering
- Confidence in locus ordering

Linkage Grouping and Locus Ordering
- Definition: Linkage grouping is the combining of loci into linkage groups based on their linkage to one another.
- Definition: Locus ordering is the arrangement of the loci in a linkage group in a linear order.

Linkage Group
- Biologically speaking, a linkage group is a group of loci located on the same chromosome.
- Statistically speaking, a linkage group is a group of loci grouped together because they meet some statistical criterion.
- The two definitions can disagree: if large segments of the genome lack markers, loci on the same chromosome may be statistically isolated into multiple groups.

The Basis for Linkage Grouping
- By now you are familiar with recombination fractions $\theta_{ij}$, LOD scores $z_{ij}$, and p-values for linkage $p_{ij}$, where i and j index two different markers.
- Any or all of these statistics can be used to group loci.

Overview of Linkage Grouping
- Iterate through grouping criteria until a grouping with acceptable properties is found.
- For given grouping criteria, iterate through the loci to determine the grouping pattern.
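The inner loop (fixed criteria, iterate through loci) can be sketched as transitive grouping: two loci are joined whenever their pairwise statistics pass the criteria, and the linkage groups are the connected components. This is a minimal sketch under assumptions of our own: the criteria are a maximum recombination fraction and a minimum LOD score (0.40 and 3.0, chosen only for illustration), and the pairwise values in `PAIRS` are hypothetical. The final loop plays the role of the outer loop by trying a looser and a stricter LOD threshold.

```python
from itertools import combinations

def linkage_groups(loci, theta, lod, max_theta=0.40, min_lod=3.0):
    """Join two loci when theta <= max_theta and LOD >= min_lod; return the
    connected components (linkage groups), largest first."""
    parent = {x: x for x in loci}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(loci, 2):
        key = frozenset((a, b))
        if theta[key] <= max_theta and lod[key] >= min_lod:
            parent[find(a)] = find(b)

    groups = {}
    for x in loci:
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values(), key=lambda g: (-len(g), g))

# Hypothetical pairwise statistics: (recombination fraction, LOD score).
LOCI = ["M1", "M2", "M3", "M4", "M5"]
PAIRS = {("M1", "M2"): (0.10, 8.0), ("M2", "M3"): (0.25, 3.5),
         ("M1", "M3"): (0.33, 1.8), ("M4", "M5"): (0.05, 12.0)}
theta, lod = {}, {}
for a, b in combinations(LOCI, 2):
    t, z = PAIRS.get((a, b), (0.50, 0.0))  # unlisted pairs treated as unlinked
    theta[frozenset((a, b))], lod[frozenset((a, b))] = t, z

for min_lod in (3.0, 5.0):  # outer loop: looser, then stricter criteria
    print(min_lod, linkage_groups(LOCI, theta, lod, min_lod=min_lod))
```

With the stricter threshold, M3 drops out as an isolated marker, which is the kind of sensitivity to the criteria that the robustness check looks for.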

Identifying an Acceptable Grouping
- A grouping that is robust to changes in the grouping criteria is desirable.
- A grouping pattern that matches biological expectation (e.g., the number of groups equals the number of chromosomes) is desirable.
- Very large linkage groups are a sign that the grouping criteria are too loose.
- Many isolated markers suggest bad data, a small population, or a small number of markers.

Log Likelihood in Terms of Gamete Frequencies

Log Likelihood in Terms of Recombination Fractions

Log Likelihood in Terms of Interference
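For a backcross, the three parameterizations named above can be sketched as follows. This is a hedged sketch, not the original slides' equations: it assumes order A-B-C, parents ABC and abc, and notation of our own, namely $n_0$ parental gametes (ABC, abc), $n_1$ single recombinants between A and B (Abc, aBC), $n_2$ single recombinants between B and C (ABc, abC), and $n_3$ double recombinants (AbC, aBc), with class frequencies $g_0, \dots, g_3$ split equally between the two gametes in each class.

```latex
\begin{aligned}
&\text{Gamete frequencies:} && l = \sum_{k=0}^{3} n_k \log\frac{g_k}{2}, \qquad \sum_{k=0}^{3} g_k = 1 \\
&\text{Recombination fractions:} && \theta_{AB} = g_1 + g_3,\quad \theta_{BC} = g_2 + g_3,\quad \theta_{AC} = g_1 + g_2 \\
& && g_1 = \tfrac12(\theta_{AB} + \theta_{AC} - \theta_{BC}),\quad
     g_2 = \tfrac12(\theta_{BC} + \theta_{AC} - \theta_{AB}),\quad
     g_3 = \tfrac12(\theta_{AB} + \theta_{BC} - \theta_{AC}) \\
&\text{Interference:} && g_3 = C\,\theta_{AB}\theta_{BC},\quad
     g_1 = \theta_{AB} - C\,\theta_{AB}\theta_{BC},\quad
     g_2 = \theta_{BC} - C\,\theta_{AB}\theta_{BC} \\
& && g_0 = 1 - \theta_{AB} - \theta_{BC} + C\,\theta_{AB}\theta_{BC}
\end{aligned}
```

Substituting either set of expressions for $g_k$ (with $g_0 = 1 - g_1 - g_2 - g_3$ in the recombination-fraction case) into $l$ gives the log likelihood in terms of $(\theta_{AB}, \theta_{BC}, \theta_{AC})$ or $(\theta_{AB}, \theta_{BC}, C)$. For nonzero $\theta$'s, each is a one-to-one reparameterization of the three free class frequencies, so with C unrestricted every order reaches the same maximum.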

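Determining Order: Two-Locus Recombination Fractions
- When the order is unknown, assume ABC and calculate the three pairwise recombination fractions.
- Order from largest to smallest: the pair with the largest recombination fraction is placed at the two ends.
- Or order from smallest to largest: start with the most tightly linked pair and add loci next to it.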

Backcross Example
Gamete pair | Count
ABC and abc |
ABc and abC |
AbC and aBc | 10, 9
Abc and aBC | 11, 1

Backcross Example (contd)
- The largest recombination fraction is between B and C (0.30). Therefore, B and C are located at the ends and A is in the middle: BAC.
- The smallest recombination fraction is between A and B, so place A and B together. C is more closely linked to A than to B, so place C next to A to get CAB.
- Both methods arrive at the same order (CAB is BAC read in reverse). The second method extends more easily to more than three loci.
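Both ordering rules can be sketched in a few lines. The gamete counts below are hypothetical (they are not the lecture's data) and the parental haplotypes are assumed to be ABC and abc, so a gamete counts as recombinant between two loci when one allele is upper-case and the other lower-case.

```python
from itertools import combinations

LOCI = "ABC"
# Hypothetical backcross gamete counts; parents assumed to be ABC and abc.
COUNTS = {"ABC": 35, "abc": 33, "ABc": 10, "abC": 10,
          "AbC": 5, "aBc": 5, "Abc": 1, "aBC": 1}

def pairwise_rf(counts):
    """Recombination fraction for every locus pair."""
    n = sum(counts.values())
    rf = {}
    for i, j in combinations(range(len(LOCI)), 2):
        rec = sum(c for g, c in counts.items() if g[i].isupper() != g[j].isupper())
        rf[frozenset((LOCI[i], LOCI[j]))] = rec / n
    return rf

def order_by_largest(rf):
    """The pair with the largest fraction goes at the ends; the other locus is in the middle."""
    a, b = sorted(max(rf, key=rf.get))
    middle = [x for x in LOCI if x not in (a, b)]
    return [a, *middle, b]

def order_by_smallest(rf):
    """Start from the tightest pair, then attach each remaining locus at the
    end it is most tightly linked to (the same loop works for more loci)."""
    order = sorted(min(rf, key=rf.get))
    rest = [x for x in LOCI if x not in order]
    while rest:
        loc, end = min(((x, e) for x in rest for e in (order[0], order[-1])),
                       key=lambda p: rf[frozenset(p)])
        if end == order[0]:
            order.insert(0, loc)
        else:
            order.append(loc)
        rest.remove(loc)
    return order

rf = pairwise_rf(COUNTS)
print({"".join(sorted(k)): round(v, 2) for k, v in rf.items()})
print(order_by_largest(rf), order_by_smallest(rf))
```

With these counts the fractions are 0.12 (A-B), 0.22 (A-C), and 0.30 (B-C), and the two rules return ['B', 'A', 'C'] and ['C', 'A', 'B'], the same order read in opposite directions.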

Determining Order: Log Likelihood Approach
Order | $C$ | $l(\theta, C)$ | $l(\theta, C = 0)$ | $l(\theta, C = 1)$ | $l(\theta, C \le 1)$
ABC
ACB
BAC

Log Likelihood Approach (contd)
- When the likelihood is fully parameterized, every locus order gives the same maximized log likelihood, so the log likelihood cannot be used to order loci.
- One must constrain the parameters in order to use the log likelihood for ordering. The most general and probably most biologically reasonable restriction is $C \le 1$.

Log Likelihood Approach (contd)
- Even when the parameters are constrained, the competing orders have the same number of parameters, so there are no degrees of freedom for a statistical test (e.g., a log likelihood ratio test).
- One can resort to the odds ratio, but its interpretation is difficult.

Log Likelihood Approach (contd)
- The interpretation is that order BAC is 7708 times more likely than ABC (odds of 7708:1 for BAC versus ABC).
- But the odds change with the assumption made about C. In addition, there is no distributional information, so there is no way to determine significance.
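The points above can be checked numerically with a small sketch. The counts are hypothetical, the parents are assumed to be ABC and abc, and the four gamete classes for a given order are parental, single recombinant in the first interval, single recombinant in the second, and double recombinant. With C free, the fitted class frequencies reproduce the data exactly for every order, so the maximized log likelihoods tie; fixing C = 1 separates the orders, and the odds between two orders are the exponentiated difference. Under C = 1 the likelihood factorizes by interval, so the observed adjacent recombination fractions are the maximizing values.

```python
import math

LOCI = "ABC"
# Hypothetical backcross gamete counts; parents assumed to be ABC and abc.
COUNTS = {"ABC": 35, "abc": 33, "ABc": 10, "abC": 10,
          "AbC": 5, "aBc": 5, "Abc": 1, "aBC": 1}

def is_rec(g, a, b):
    i, j = LOCI.index(a), LOCI.index(b)
    return g[i].isupper() != g[j].isupper()

def class_counts(order):
    """[parental, single in interval 1, single in interval 2, double]."""
    x, y, z = order
    k = [0, 0, 0, 0]
    for g, c in COUNTS.items():
        k[is_rec(g, x, y) + 2 * is_rec(g, y, z)] += c
    return k

def class_probs(t1, t2, c):
    d = c * t1 * t2  # double-recombinant frequency
    return [1 - t1 - t2 + d, t1 - d, t2 - d, d]

def loglik(k, p):
    # Multinomial log likelihood over the four classes; the constant for
    # splitting each class between its two gametes is dropped.
    return sum(n * math.log(q) for n, q in zip(k, p) if n > 0)

def fit(order):
    k = class_counts(order)
    n = sum(k)
    t1, t2 = (k[1] + k[3]) / n, (k[2] + k[3]) / n  # adjacent recombination fractions
    c_hat = (k[3] / n) / (t1 * t2)                 # unconstrained coincidence
    return {"C_hat": c_hat,
            "l_free": loglik(k, class_probs(t1, t2, c_hat)),
            "l_C1": loglik(k, class_probs(t1, t2, 1.0))}

fits = {o: fit(o) for o in ("ABC", "ACB", "BAC")}
for o, f in fits.items():
    print(o, {key: round(v, 3) for key, v in f.items()})
odds = math.exp(fits["BAC"]["l_C1"] - fits["ABC"]["l_C1"])
print("odds BAC:ABC under C = 1:", round(odds))
```

Only BAC has an unconstrained estimate of C below 1 with these counts, so under $C \le 1$ its unconstrained fit is admissible while the other two orders would be pushed to the boundary.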

Multilocus Ordering
- Three-locus approach
- Maximum likelihood approach
- SARF and PARF
- SALOD
- Least squares method based on map distances (covered later)

Three-Locus Approach
- Find the triplet with the tightest linkage.
- Add the next most tightly linked locus by comparing it with each of the 3 pairs available from the established triplet.
- Problems: loci that produce contradictory triplet orders cannot be placed; the result is a locally optimal order, not necessarily the globally optimal one; information is potentially lost by considering triplets only.

Three-Locus Approach (example)
- Suppose you order markers 1, 2, and 3 in the order …, using recombination fraction ranking or log likelihood.
- You now want to place marker 4.
- You find that 1-2-4, 1-4-3, … are supported. What is the four-locus order?
- What if instead you find that 1-2-4, 1-4-3, and … are supported?
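The placement step can be sketched as a consistency check: try every insertion point for the new locus and keep those whose induced three-locus orders agree with the supported triplets. This is a hypothetical illustration with our own loci D, E, F (established) and G (new); it assumes a supported triplet order exists for the new locus with every pair of placed loci, which for an established triplet is exactly the 3 pairs.

```python
from itertools import combinations

def induced(order, triple):
    """Sub-order of the three loci as they appear in `order`."""
    return tuple(x for x in order if x in triple)

def place_locus(order, new, supported):
    """Return every insertion of `new` into `order` that agrees with the
    supported three-locus orders for `new` plus each pair of placed loci.
    An empty list signals a contradiction; several results signal ambiguity."""
    table = {frozenset(t): tuple(t) for t in supported}
    fits = []
    for pos in range(len(order) + 1):
        cand = order[:pos] + [new] + order[pos:]
        ok = True
        for a, b in combinations(order, 2):
            want = table[frozenset((a, b, new))]
            got = induced(cand, want)
            if got != want and got != want[::-1]:  # an order equals its reverse
                ok = False
                break
        if ok:
            fits.append(cand)
    return fits

# Hypothetical triplet results for placing G into the established order D-E-F.
print(place_locus(["D", "E", "F"], "G", [("D", "E", "G"), ("D", "G", "F"), ("E", "G", "F")]))
print(place_locus(["D", "E", "F"], "G", [("D", "E", "G"), ("D", "G", "F"), ("G", "E", "F")]))
```

The first call finds the single consistent order D-E-G-F; in the second, the triplets contradict each other and G cannot be placed, which is the first problem listed for this approach.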

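Multilocus Ordering: Maximum Likelihood
- Assume that there is no interference. Then the likelihood of the data can be written in terms of neighboring pairs of loci only.
- Let there be l loci. For j = i + 1, let $\theta_{ij}$ be the recombination fraction and $n_{ij}$ the number of informative gametes between loci i and j, and let $r_{ij}$ be the number of those gametes that are recombinant.
- Then the likelihood for a particular order is given by:
$$L = \prod_{i=1}^{l-1} \theta_{ij}^{\,r_{ij}} (1-\theta_{ij})^{\,n_{ij}-r_{ij}}, \qquad j = i+1$$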
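For a handful of loci, the maximum likelihood order can be found by brute force: each interval's factor is maximized at $\hat\theta_{ij} = r_{ij}/n_{ij}$, and the l!/2 distinct orders are compared. This sketch assumes hypothetical, fully informative backcross data (parents WXYZ and wxyz), so every $n_{ij}$ equals the sample size; exhaustive search is only feasible for small l, which is what motivates the heuristics that follow.

```python
import math
from itertools import combinations, permutations

LOCI = "WXYZ"
# Hypothetical, fully informative backcross gametes; parents assumed WXYZ and wxyz.
GAMETES = {"WXYZ": 20, "wxyz": 20, "Wxyz": 5, "wXYZ": 5,
           "WXyz": 8, "wxYZ": 8, "WXYz": 3, "wxyZ": 3}

def pair_counts(gametes):
    """Map each locus pair to (r_ij, n_ij): recombinants and informative gametes."""
    n = sum(gametes.values())  # every gamete is informative in this toy backcross
    out = {}
    for i, j in combinations(range(len(LOCI)), 2):
        r = sum(c for g, c in gametes.items() if g[i].isupper() != g[j].isupper())
        out[frozenset((LOCI[i], LOCI[j]))] = (r, n)
    return out

def interval_loglik(r, n):
    """Maximum over theta of r*log(theta) + (n - r)*log(1 - theta), at theta = r/n."""
    th = r / n
    return (r * math.log(th) if r else 0.0) + ((n - r) * math.log(1 - th) if r < n else 0.0)

def order_loglik(order, pc):
    """No interference: the log likelihood is a sum over adjacent intervals only."""
    return sum(interval_loglik(*pc[frozenset(p)]) for p in zip(order, order[1:]))

def ml_order(loci, pc):
    """Exhaustive search over the l!/2 distinct orders (small l only)."""
    orders = [p for p in permutations(loci) if p[0] < p[-1]]  # drop mirror images
    best = max(orders, key=lambda o: order_loglik(o, pc))
    return best, order_loglik(best, pc)

pc = pair_counts(GAMETES)
print(ml_order(LOCI, pc))
```

With these counts the best order is W-X-Y-Z, whose three adjacent pairs are also the three most tightly linked pairs.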

SARF or PARF
- SARF is the Sum of Adjacent Recombination Fractions.
- PARF is the Product of Adjacent Recombination Fractions.
- The goal is to find the order that minimizes SARF or PARF.

SALOD
- SALOD stands for Sum of Adjacent LOD scores; one maximizes SALOD to find the best order.
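The three criteria differ only in how the adjacent two-point statistics are combined. This sketch scores every distinct order of four hypothetical loci. The pairwise counts are illustrative, and the LOD used is the usual backcross two-point LOD, $\log_{10}[L(\hat\theta)/L(0.5)]$.

```python
import math
from itertools import permutations

# Hypothetical backcross pairwise data: (recombinant gametes, informative gametes).
PAIRS = {"WX": (10, 72), "XY": (16, 72), "YZ": (6, 72),
         "WY": (26, 72), "XZ": (22, 72), "WZ": (32, 72)}

def theta_lod(a, b):
    """Two-point estimate theta_hat = r/n and LOD = log10[L(theta_hat) / L(0.5)]."""
    r, n = PAIRS[a + b] if a + b in PAIRS else PAIRS[b + a]
    th = r / n
    log10_l = (r * math.log10(th) if r else 0.0) + ((n - r) * math.log10(1 - th) if r < n else 0.0)
    return th, log10_l - n * math.log10(0.5)

def adjacent(order):
    return [theta_lod(a, b) for a, b in zip(order, order[1:])]

def sarf(order):
    return sum(t for t, _ in adjacent(order))

def parf(order):
    return math.prod(t for t, _ in adjacent(order))

def salod(order):
    return sum(z for _, z in adjacent(order))

def best_order(criterion, loci="WXYZ", maximize=False):
    orders = [p for p in permutations(loci) if p[0] < p[-1]]  # mirror images are the same order
    return (max if maximize else min)(orders, key=criterion)

print(best_order(sarf), best_order(parf), best_order(salod, maximize=True))
```

Here all three criteria agree on W-X-Y-Z; with real data they need not, since PARF weights small fractions differently and SALOD also reflects the amount of information in each pair.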

SALOD PIC

SALOD EN

Traveling Salesman Problem (TSP)
- Locus ordering is a special case of the TSP: given l cities with distances $x_{ij}$ between them, find the shortest route that visits every city exactly once.

Seriation
- Used to obtain the minimum SARF.
- Start: Select the ith locus. Place the most tightly linked locus next to it.
- Iterate: Find the remaining unplaced locus that is most closely linked to the ith locus. Place it beside whichever of the two end loci of the current order it is closest to.
- Finish: Repeating this from each starting locus gives l locus orders. Choose the order with the minimum CI.
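The steps above can be sketched as follows, using SARF as the final criterion because the slide says seriation is used to obtain the minimum SARF. The recombination fractions are hypothetical. The next locus is chosen by its linkage to the starting locus i and is attached at the closer end, as the slide describes.

```python
def seriation(loci, theta):
    """One candidate order per starting locus; keep the one with the smallest SARF."""
    def t(a, b):
        return theta[frozenset((a, b))]

    def sarf(order):
        return sum(t(a, b) for a, b in zip(order, order[1:]))

    candidates = []
    for i in loci:
        rest = [x for x in loci if x != i]
        first = min(rest, key=lambda x: t(i, x))  # Start: tightest partner of locus i
        order = [i, first]
        rest.remove(first)
        while rest:
            nxt = min(rest, key=lambda x: t(i, x))  # Iterate: closest to locus i
            rest.remove(nxt)
            # Place it beside whichever end locus it is closest to.
            if t(nxt, order[0]) <= t(nxt, order[-1]):
                order.insert(0, nxt)
            else:
                order.append(nxt)
        candidates.append(order)
    best = min(candidates, key=sarf)  # Finish: l orders, keep the minimum SARF
    return best, sarf(best)

# Hypothetical pairwise recombination fractions (recombinants out of 72 gametes).
THETA = {frozenset(k): r / 72 for k, r in
         {"WX": 10, "XY": 16, "YZ": 6, "WY": 26, "XZ": 22, "WZ": 32}.items()}
print(seriation("WXYZ", THETA))
```

Each start costs only O(l^2) comparisons, so the l starts together are far cheaper than enumerating l!/2 orders, at the price of no optimality guarantee.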

Simulated Annealing Algorithm
- Let the current state of the system be i, with energy $E_i$.
- Perturb the system to generate the next state j, with energy $E_j$.
- If $E_i - E_j \ge 0$, the new state j is accepted as being better.
- If $E_i - E_j < 0$, state j is accepted with probability
$$P(\text{accept } j) = \exp\left(\frac{E_i - E_j}{T}\right)$$
where T is the temperature.

Simulated Annealing Algorithm (cont)
- For a given temperature T, thermal equilibrium is eventually reached, and the distribution of states is given by the Boltzmann distribution
$$P(i) = \frac{\exp(-E_i/T)}{\sum_k \exp(-E_k/T)}$$
- Once thermal equilibrium is established, lower T and perturb the state until a new thermal equilibrium is reached.

Simulated Annealing for SARF
- Choose a starting order of the l loci.
- Perturb the arrangement. Sample perturbations include selecting a random segment and inverting its order, or randomly replacing one locus with another.
- Accept or reject each new order using the preceding equations, where $E_i$ is the SARF for gene order i.
- Choose T larger than the largest change in $E_i$ within each stage. A cooling factor of 0.9 (T is multiplied by 0.9 after each stage) has proven useful in simulation.
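A minimal sketch of annealing on SARF follows, under several simplifying assumptions. The energy is SARF, and the only perturbation is random segment inversion (the locus-replacement move is not implemented). Each stage runs a fixed number of moves as a stand-in for reaching equilibrium, and the initial temperature is the largest energy change seen in a trial sample of moves. The map positions `POS`, and the use of Haldane's map function to turn them into recombination fractions, are illustrative assumptions.

```python
import math
import random

def sarf(order, theta):
    """Energy: sum of adjacent recombination fractions."""
    return sum(theta[a][b] for a, b in zip(order, order[1:]))

def anneal_sarf(theta, start, seed=0, cooling=0.9, stages=60, moves_per_stage=200):
    rng = random.Random(seed)
    n = len(start)

    def perturb(order):
        # Invert a randomly chosen segment order[i..j].
        i, j = sorted(rng.sample(range(n), 2))
        return order[:i] + order[i:j + 1][::-1] + order[j + 1:]

    cur = list(start)
    e_cur = sarf(cur, theta)
    best, e_best = cur, e_cur
    # Start hotter than the largest energy change seen in a trial sample of moves.
    T = max(abs(sarf(perturb(cur), theta) - e_cur) for _ in range(100)) or 1.0
    for _ in range(stages):
        for _ in range(moves_per_stage):
            cand = perturb(cur)
            e_cand = sarf(cand, theta)
            delta = e_cur - e_cand  # E_i - E_j in the slide's notation
            # Accept improvements; accept worse orders with probability exp(delta / T).
            if delta >= 0 or rng.random() < math.exp(delta / T):
                cur, e_cur = cand, e_cand
                if e_cur < e_best:
                    best, e_best = cur, e_cur
        T *= cooling  # lower the temperature for the next stage
    return best, e_best

# Hypothetical map positions in Morgans; Haldane's map function gives theta.
POS = [0.00, 0.12, 0.20, 0.41, 0.55, 0.63, 0.80, 0.95]
THETA = [[0.5 * (1 - math.exp(-2 * abs(p - q))) for q in POS] for p in POS]
start = [3, 7, 0, 5, 1, 6, 2, 4]
order, energy = anneal_sarf(THETA, start)
print(order, round(energy, 4))
```

Because `POS` is sorted, the minimum-SARF order for this toy map is loci 0 through 7 in sequence (or the reverse), which gives an easy check on a run. The sketch keeps the best order seen, which plain annealing does not require but which costs nothing.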

Statistical Support for Locus Order
- The likelihood ratio can be calculated when using likelihood-based methods, but it is difficult to interpret.
- A bootstrap approach is also possible.

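Bootstrap Confidence in Locus Order
       | Locus 1  | Locus 2  | Locus 3  | Locus 4  | Locus 5
Gene 1 | $p_{11}$ | $p_{12}$ | $p_{13}$ | $p_{14}$ | $p_{15}$
Gene 2 | $p_{21}$ | $p_{22}$ | $p_{23}$ | $p_{24}$ | $p_{25}$
Gene 3 | $p_{31}$ | $p_{32}$ | $p_{33}$ | $p_{34}$ | $p_{35}$
Gene 4 | $p_{41}$ | $p_{42}$ | $p_{43}$ | $p_{44}$ | $p_{45}$
Gene 5 | $p_{51}$ | $p_{52}$ | $p_{53}$ | $p_{54}$ | $p_{55}$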

Confidence Interval for Gene Order
Locus Name | $P_{ii}$ | 95% Interval
A
B
C
D
E
F
G
H
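One way to fill such tables is sketched below under assumptions of our own. Individuals are resampled with replacement and reordered in each replicate, using exhaustive minimum SARF as a stand-in for whatever ordering method is used. $p_{ik}$ is taken to be the share of replicates that put locus i at position k. $P_{ii}$ is read off at the locus's position in the full-data order, and the "95% interval" is the 2.5th to 97.5th percentile of its bootstrap positions. The sample is hypothetical, and positions are 0-based.

```python
import random
from itertools import combinations, permutations

LOCI = "WXYZ"
# Hypothetical backcross sample (one string per gamete); parents assumed WXYZ and wxyz.
COUNTS = {"WXYZ": 20, "wxyz": 20, "Wxyz": 5, "wXYZ": 5,
          "WXyz": 8, "wxYZ": 8, "WXYz": 3, "wxyZ": 3}
GAMETES = [g for g, c in COUNTS.items() for _ in range(c)]

def min_sarf_order(sample):
    """Order minimizing SARF, by exhaustive search (small l only)."""
    n = len(sample)
    theta = {}
    for i, j in combinations(range(len(LOCI)), 2):
        rec = sum(g[i].isupper() != g[j].isupper() for g in sample)
        theta[frozenset((LOCI[i], LOCI[j]))] = rec / n
    orders = [p for p in permutations(LOCI) if p[0] < p[-1]]
    return min(orders, key=lambda o: sum(theta[frozenset(e)] for e in zip(o, o[1:])))

def bootstrap_order(gametes, reps=200, seed=1):
    rng = random.Random(seed)
    ref = min_sarf_order(gametes)
    counts = {x: [0] * len(ref) for x in ref}  # counts[locus][k]: times at position k
    positions = {x: [] for x in ref}
    for _ in range(reps):
        o = min_sarf_order(rng.choices(gametes, k=len(gametes)))  # resample individuals
        # An order and its mirror image are the same; orient to agree with ref.
        if sum(a == b for a, b in zip(o, ref)) < sum(a == b for a, b in zip(o[::-1], ref)):
            o = o[::-1]
        for k, x in enumerate(o):
            counts[x][k] += 1
            positions[x].append(k)
    p = {x: [c / reps for c in row] for x, row in counts.items()}
    summary = {}
    for i, x in enumerate(ref):
        s = sorted(positions[x])
        # Percentile interval (an assumption): 2.5% and 97.5% of bootstrap positions.
        summary[x] = (p[x][i], s[int(0.025 * (reps - 1))], s[int(0.975 * (reps - 1))])
    return ref, p, summary

ref, p, summary = bootstrap_order(GAMETES)
for x, (p_ii, lo, hi) in summary.items():
    print(x, round(p_ii, 2), (lo, hi))
```

The same `counts` table, divided by the number of replicates, is the gene-by-position matrix of $p_{ik}$ values; each row sums to 1.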

Dropping Problematic Loci
- Loci that are particularly difficult to order can affect the order of other loci, and removing them can improve confidence in the map. These tend to be tightly linked loci whose order would need a larger sample size to resolve.
- A framework map is one in which certain loci are dropped based on some criterion (e.g., $\log_{10}(L_1/L_2) > 3$, where $L_1$ and $L_2$ are the likelihoods of the best and second-best orders).
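The $\log_{10}(L_1/L_2) > 3$ criterion can be sketched as follows. The likelihood assumes no interference, and the pairwise data are hypothetical. The greedy rule in `framework` (drop whichever locus raises the support most, until the criterion holds) is our own illustrative choice; the slide does not say how the loci to drop are chosen.

```python
import math
from itertools import permutations

# Hypothetical backcross pairwise data: (recombinant gametes, informative gametes).
PAIRS = {"WX": (10, 72), "XY": (16, 72), "YZ": (6, 72),
         "WY": (26, 72), "XZ": (22, 72), "WZ": (32, 72)}

def log10_interval(a, b):
    """Maximum over theta of log10[theta^r (1 - theta)^(n - r)], at theta = r/n."""
    r, n = PAIRS[a + b] if a + b in PAIRS else PAIRS[b + a]
    th = r / n
    return (r * math.log10(th) if r else 0.0) + ((n - r) * math.log10(1 - th) if r < n else 0.0)

def support(loci):
    """log10(L1/L2): best order versus runner-up, assuming no interference."""
    orders = [p for p in permutations(loci) if p[0] < p[-1]]
    lls = sorted((sum(log10_interval(a, b) for a, b in zip(o, o[1:])) for o in orders),
                 reverse=True)
    return lls[0] - lls[1]

def framework(loci, threshold=3.0):
    """Hypothetical greedy rule: while support <= threshold, drop the locus
    whose removal raises support the most; stop at three loci."""
    loci = list(loci)
    while len(loci) > 3 and support(loci) <= threshold:
        drop = max(loci, key=lambda x: support([y for y in loci if y != x]))
        loci.remove(drop)
    return loci

print(round(support("WXYZ"), 2), framework("WXYZ"))
```

With these numbers, the four-locus map falls below the threshold because Y and Z are tightly linked and their relative order is weakly supported. Dropping Z leaves W-X-Y with support above 3.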

Bootstrap Plus Jackknife
- Drop each locus in turn and estimate PCO.
- If PCO increases significantly when a locus is discarded, that locus is considered “bad”.
- If PCO decreases significantly when a locus is discarded, that locus is considered essential.

Summary
- How to make linkage groups.
- How to calculate the log likelihood for the three-locus model.
- Determining order for the three-locus model.
- Computational approaches to multilocus ordering.
- Statistical support for locus order.