Estimating Hypermutation Rates During In Vivo Immune Responses. Steven H. Kleinstein, Department of Computer Science, Princeton University.

Presentation transcript:


Immune System Protects Body From Pathogens
B cells bind antigens with antibody receptors.
Commonly accepted: somatic hypermutation is restricted to germinal centers.
[Figure: a B cell (…ATGC…) responds to flu antigens; over ~3 weeks, B cells undergo hypermutation and selection.]

Motivating Experiment
In an auto-immune mouse model, mutating B cells were observed in extra-follicular areas of the spleen (not germinal centers).
Goal: estimate the mutation rate to show (hyper?)mutation.
Auto-immune mouse: MRL/lpr, AM14 heavy chain transgenic (William, Euler, Christensen, and Shlomchik. Science); extra-follicular areas.
Control: primary anti-hapten response to NP (Jacob et al., 1991; Jacob and Kelsoe, 1992; Jacob et al., 1993; Radmacher et al., 1998); germinal centers.
Microdissection (10 cells).
[Figure labels: B cells, T cells, dividing B cells, FDC.]

What’s hard about estimating the mutation rate?
The number of divisions in vivo is unknown.
Most recognized in vivo estimates took educated guesses (McKean et al., 1984 and Sablitzky et al., 1985).
[Figure: number of mutations plotted against number of cell divisions, with one line for a high mutation rate and one for a low mutation rate, and the observed number of mutations marked.]

Clonal Trees Provide Needed Information
Analyze the pattern of shared and unique mutations among sequences from each microdissection.
Clonal tree ‘shapes’ reflect the underlying dynamics.
Alignment: Germline GGGATTCTC; 1 -C-----G G-; 3 A------GA; 4 A---C--GA
[Figure: clonal tree rooted at the germline, with edges labeled 1(G → A), 9(C → A), (T → G), 2(G → C), 5(T → C).]
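As a small illustration of what "shared and unique mutations" means, the sketch below calls mutations for rows 3 and 4 of the alignment on this slide and splits them into shared and unique sets. The function name `call_mutations` and the convention that `-` means "same as germline" are assumptions made for this example; this is not the tree-building procedure used in the talk.

```python
def call_mutations(germline, aligned):
    """Return (position, germline_base, new_base) for each difference.

    Positions are 1-based. A '-' in the aligned row is assumed to mean
    "identical to the germline at this position".
    """
    if len(germline) != len(aligned):
        raise ValueError("sequences must have equal length")
    return [
        (i, g, b)
        for i, (g, b) in enumerate(zip(germline, aligned), start=1)
        if b != "-" and b != g
    ]


germline = "GGGATTCTC"
seq3 = set(call_mutations(germline, "A------GA"))
seq4 = set(call_mutations(germline, "A---C--GA"))

# Mutations carried by both sequences sit on a shared branch of the tree;
# mutations carried by only one sequence sit on that sequence's own branch.
shared = sorted(seq3 & seq4)
unique_to_4 = sorted(seq4 - seq3)
```

Here the three shared mutations place sequences 3 and 4 on a common branch, and the single extra mutation in sequence 4 hangs below it.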

Relating Tree Shapes to Underlying Dynamics
Investigate with computer simulation of B cell clonal expansion.
Parameters: # divisions (d), mutation rate (μ), lethal frequency (λ), # sequences (s).
Compare: a rate of 0.2 division⁻¹ for 14 divisions versus a rate of 0.4 division⁻¹ for 7 divisions.
Relevant shape measures can differentiate similar clones (simulation data: 5 sequences / tree).
[Figure: example trees grown from an initial sequence, with sampled sequences labeled A to D.]
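A minimal sketch of such a simulation follows, using the four parameters named on the slide. The details are assumptions for illustration only: division is synchronous, each daughter gains a Poisson-distributed number of new mutations with mean `mu`, each new mutation is lethal with probability `lam`, and `s` surviving cells are sampled at the end. The names `poisson` and `simulate_clone` are hypothetical, and this is not the simulator used in the talk.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_clone(d, mu, lam, s, rng):
    """Expand one founder for d divisions and sample up to s survivors.

    Each cell is a tuple of integer mutation IDs inherited from its ancestors.
    """
    next_id = 0
    cells = [()]  # founder carries no mutations
    for _ in range(d):
        daughters = []
        for cell in cells:
            for _ in range(2):  # every cell divides into two daughters
                n_new = poisson(rng, mu)
                if any(rng.random() < lam for _ in range(n_new)):
                    continue  # a lethal mutation removes this daughter
                new = tuple(range(next_id, next_id + n_new))
                next_id += n_new
                daughters.append(cell + new)
        cells = daughters
        if not cells:
            break  # the clone went extinct
    return rng.sample(cells, min(s, len(cells)))


rng = random.Random(0)
slow = simulate_clone(d=14, mu=0.2, lam=0.0, s=5, rng=rng)
fast = simulate_clone(d=7, mu=0.4, lam=0.0, s=5, rng=rng)
```

Both settings give the same expected number of mutations per sequence (0.2 × 14 = 0.4 × 7 = 2.8), which is why mutation counts alone cannot separate them and tree shape is needed.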

Intermediate Vertices Are a Useful Measure
Compare: a rate of 0.2 division⁻¹ for 14 divisions versus a rate of 0.4 division⁻¹ for 7 divisions.
Shape measures can supplement information from mutation counting (simulation data).

Method for Estimating Mutation Rate (μ)
Find the mutation rate that produces a distribution of tree ‘shapes’ most equivalent to the observed set of trees.
Trees are treated as equivalent (=) when they match in number of mutations, intermediate vertices, and sequences at root.
Assumes an equivalent mutation rate in all trees, although the number of divisions may differ.
Also developed an analytical method based on the same underlying idea (The Journal of Immunology (2003) Vol. 171 No. 9).
[Figure: experimental observations give a set of observed tree shapes; simulation of B cell expansion at a given mutation rate gives a distribution of tree shapes; the two are compared.]

Details of the Simulation Method
For each value of the mutation rate (μ), calculate the likelihood by…
1. Run the simulation many times to fill in the equivalent matrix E(t,d): the number of simulated trees ‘equivalent’ to observed tree t after d divisions.
2. Likelihood of experimentally observed tree t:
3. Likelihood of experimental dataset:
Sample space is a subset of all simulation runs.
Use Golden Section Search to optimize the mutation rate (μ).
[Table: equivalent matrix E(t,d), with one row per observed tree up to Tree T and one column per number of divisions, 1, 2, … D.]
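Step 1 can be sketched as a counting exercise. In the example below each tree is reduced to a shape tuple, and a simulated tree counts as equivalent to an observed tree when the tuples are equal. The tuple layout (number of mutations, intermediate vertices, sequences at root), the function name `fill_equivalent_matrix`, and all the numbers are assumptions made for illustration; the likelihood formulas of steps 2 and 3 are not reproduced here.

```python
from collections import Counter


def fill_equivalent_matrix(observed_shapes, simulated_shapes_by_d):
    """Build E[t][d]: how many simulated trees match observed tree t after d divisions.

    observed_shapes: list of shape tuples, one per observed tree.
    simulated_shapes_by_d: dict mapping d to the shape tuples of all
    simulated trees that were run for d divisions.
    """
    counts_by_d = {d: Counter(shapes) for d, shapes in simulated_shapes_by_d.items()}
    return [
        {d: counts[shape] for d, counts in counts_by_d.items()}
        for shape in observed_shapes
    ]


# Toy data: (mutations, intermediate vertices, sequences at root)
observed = [(3, 1, 0), (5, 2, 1)]
simulated = {
    1: [(3, 1, 0), (3, 1, 0), (1, 0, 2)],
    2: [(5, 2, 1), (3, 1, 0)],
}
E = fill_equivalent_matrix(observed, simulated)
```

Counting once per value of d with `Counter` keeps the fill linear in the number of simulated trees, which matters when many simulations are run per likelihood evaluation.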

Finding the Optimal Mutation Rate
Golden Section Search works by successive bracketing of the minimum/maximum.
Direct search method (no derivative).
Simple implementation, linear convergence.
Method is effective with 128,000 simulations per likelihood.
Not tolerant of noise: make sure the evaluation is precise.
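For reference, a generic golden section search for a maximum can be written in a few lines. This is a textbook version, assuming the objective is unimodal on the starting bracket; the quadratic objective below is a stand-in chosen for the example, not the simulation-based likelihood from the talk, and the name `golden_section_max` is hypothetical.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618


def golden_section_max(f, lo, hi, tol=1e-6):
    """Locate the maximizer of a unimodal function f on [lo, hi].

    Only one new function evaluation is needed per iteration, because one
    interior point of the old bracket is reused in the new bracket.
    """
    x1 = hi - INV_PHI * (hi - lo)
    x2 = lo + INV_PHI * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            # Maximum lies in [x1, hi]; old x2 becomes the new x1.
            lo, x1, f1 = x1, x2, f2
            x2 = lo + INV_PHI * (hi - lo)
            f2 = f(x2)
        else:
            # Maximum lies in [lo, x2]; old x1 becomes the new x2.
            hi, x2, f2 = x2, x1, f1
            x1 = hi - INV_PHI * (hi - lo)
            f1 = f(x1)
    return (lo + hi) / 2


# Stand-in objective with its peak at 0.3
best = golden_section_max(lambda x: -(x - 0.3) ** 2, 0.0, 1.0)
```

Because the method decides which side to keep by comparing two function values, a noisy objective can send the bracket the wrong way, which is the reason the slide stresses a precise evaluation.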

Master-Worker Framework
Naturally parallel application easily makes use of cluster resources.
Distribution using MPI -- ask about my code sample.
[Figure: a master process sends parameters to Worker 1, Worker 2, … Worker P and collects their results.]
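The pattern itself is easy to show without MPI. The sketch below swaps in Python's standard `concurrent.futures.ThreadPoolExecutor` in place of MPI, and a trivial random trial in place of a simulation run, so it illustrates only the split/run/combine structure. The names `worker` and `master` and the toy trial are assumptions; in CPython, threads will not give real CPU parallelism for this kind of work.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def worker(task):
    """Run one batch of toy trials and return how many succeeded."""
    seed, p, n_runs = task
    rng = random.Random(seed)  # independent, reproducible stream per worker
    return sum(rng.random() < p for _ in range(n_runs))


def master(p, total_runs, n_workers):
    """Split the runs across workers, collect their counts, and combine them."""
    per_worker = total_runs // n_workers
    tasks = [(seed, p, per_worker) for seed in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        counts = list(pool.map(worker, tasks))  # results come back in task order
    return sum(counts) / (per_worker * n_workers)


estimate = master(p=0.25, total_runs=128_000, n_workers=4)
```

Each worker only needs the parameters and a seed, and the master only needs a count back, so very little data is exchanged compared with the computation done per task.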

Estimating the Lethal Frequency (λ)
Simulation model parameters: mutation rate (μ), # divisions (d), # sequences (s), lethal frequency (λ).
Choose λ so the expected R/(R+S) equals the observed value over all mutations.
Only replacement mutations can be lethal, so the expected R/(R+S) depends on λ and on the fraction of all mutations that are replacements, which follows from the germline DNA sequence.
Example: …CAT… (H) → …CAC… (H) is a silent mutation; …CAT… (H) → …TAT… (Y) is a replacement mutation.
[Figure: experimental data give the observed R/(R+S); the germline DNA sequence and the lethal frequency (λ) give the expected R/(R+S).]
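The silent/replacement distinction can be checked mechanically with the standard genetic code. The sketch below classifies the two example mutations from this slide and computes the fraction of all single-base substitutions in a coding sequence that are replacements. It assumes every substitution is equally likely and counts mutations to a stop codon as replacements; both are simplifications for illustration, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify(codon, mutant):
    """Label a codon change as 'silent' or 'replacement'."""
    same = CODON_TABLE[codon] == CODON_TABLE[mutant]
    return "silent" if same else "replacement"


def replacement_fraction(sequence):
    """Fraction of all single-base substitutions that change the amino acid."""
    total = replacements = 0
    usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        codon = sequence[start:start + 3]
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                replacements += classify(codon, mutant) == "replacement"
    if total == 0:
        raise ValueError("sequence must contain at least one full codon")
    return replacements / total


silent_example = classify("CAT", "CAC")       # H -> H
replacement_example = classify("CAT", "TAT")  # H -> Y
cat_fraction = replacement_fraction("CAT")
```

For the single codon CAT, eight of the nine possible substitutions change the amino acid, so most random mutations in coding sequence are replacements before any selection or lethality is applied.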

Validating the Simulation Method
Use simulation to construct synthetic data sets with a limited number of trees/sequences, reflecting currently available experimental data.
Method works even with a limited number of clonal trees and sequences.
Method precision (SD = division⁻¹)

Testing Method Assumption…
Assumption: all cells in a single microdissection divided the same number of times (i.e., division is synchronous).
The assumption does not significantly impact the rate estimate.
[Figure: generation of synthetic data under asynchronous division versus synchronous division.]

Mutation Rate in Autoimmune Response
Estimated mutation rate is 1.0 ± 0.1 x base-pair⁻¹ division⁻¹.
Experimental data set: 31 trees from 7 mice, 6 sequences / tree, from extra-follicular areas (William et al., Science, 2002).
Estimated λ of 0.55 based on R/(R+S).

Mutation Rate in Primary NP Response
Estimated mutation rate is 1.1 ± 0.1 x base-pair⁻¹ division⁻¹.
Experimental data set: 23 trees, 7 sequences / tree, from germinal centers (Jacob et al., 1991; Jacob and Kelsoe, 1992; Jacob et al., 1993; Radmacher et al., 1998).
Testing impact on estimate: data are based on larger picks; positive selection may be a factor.

Summary
- Developed simulation and analytical methods to estimate in vivo mutation rates (and lethal frequencies)
- First rigorous method for in vivo estimates
- Synthetic datasets used to show that…
- Methods are precise (± 0.1 x base-pair⁻¹ division⁻¹)
- Major assumptions do not impact results
- Distributed computing used to speed evaluation
- Extra-follicular B cells in autoimmune mouse hypermutate
- Mutation rate (1.0 ± 0.1 x ) similar to NP response (1.1 ± 0.1 x )
Rigorous method to compare mutation rates under varying experimental conditions.

Acknowledgements
Yoram Louzoun (Bar-Ilan University), Mark Shlomchik (Yale University), Jaswinder Pal Singh (Princeton University).
For more information: Kleinstein, Louzoun and Shlomchik, The Journal of Immunology (2003) Vol. 171 No. 9.
PICASso: Program in Integrative Information, Computer and Application Sciences.