IMa2(Isolation with Migration)

IMa2(Isolation with Migration) Reporter: Junning Liu 2017.01.15

INTRODUCTION (https://bio.cst.temple.edu/~hey/software/software) The program implements a method for generating posterior probabilities for complex demographic population genetic models. IMa2 works similarly to the older IMa program, with some important additions. Unlike the original IM and IMa programs, which were limited to two populations, IMa2 can handle data and implement a model for multiple populations (between one and ten sampled populations) that have a known phylogenetic history. The program is based on the 'Isolation with Migration' model and uses Bayesian inference with Markov chain Monte Carlo.

IM and IMa Assumptions
1: The major overall assumption is that the history of a sample from two populations can reasonably be described by an Isolation with Migration model.
2: Selective neutrality. The method assumes that the variation within the data set is neutral (i.e. not affected by directional or balancing selection).
3: No recombination within loci.
4: Free recombination between loci.
5: Mutation has followed the model applied to the data.

IM and IMa Mutation Models:
1: The Infinite Sites (IS) model (Kimura 1969).
2: The Hasegawa-Kishino-Yano (HKY) model (Hasegawa et al. 1985).
3: The Stepwise Mutation Model (SMM) (Kimura and Ohta 1978).
4: Compound Locus Models.

IM and IMa Parameters
[Figure: parameters of the Isolation with Migration model]

IMa2: more parameters exist when there are more than two populations.
[Figure: an isolation-with-migration model for three sampled populations]

IMa2 The general k-population model includes the following assumptions:
1: The history of the sampled populations can be represented by a bifurcating phylogenetic tree.
2: The population phylogeny is rooted, and both the topology of the tree and the order of splitting events in time are known.
3: Each sampled population, as well as each ancestral population, is constant in size and follows Fisher–Wright population assumptions (Ewens 1979).
4: Gene flow may have occurred, in either or both directions, between each pair of populations that coexist over one or more time intervals.
5: No gene flow occurred between unsampled populations and sampled populations or their ancestors.

Running IMa2
IMa2 works with several types of files, some of which must be prepared by the user if they are needed. The primary input file is the data file, and every analysis requires one. The input file formats are:
Data File Format
Parameter Prior File Format
Nested Model File Format

Input data file format: If there are two populations then the tree string is: (0,1):2
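
To make the tree-string notation concrete, here is a minimal Python sketch that reads a string such as (0,1):2 and reports which pair of populations merges into which ancestor. The function name `parse_population_tree` is hypothetical and not part of IMa2. The three-population string used in the usage lines, ((0,1):3,2):4, is an assumed extension of the two-population pattern and should be checked against the IMa2 documentation.

```python
def parse_population_tree(tree: str) -> dict[int, tuple[int, int]]:
    """Parse a tree string like '(0,1):2' into {ancestor: (child, child)}.

    Illustrative helper only; it assumes every internal node is written
    as '(left,right):ancestor_number'.
    """
    text = tree.replace(" ", "")
    splits: dict[int, tuple[int, int]] = {}

    def read_int(pos: int) -> tuple[int, int]:
        # Consume a run of digits starting at pos.
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        if end == pos:
            raise ValueError(f"expected a population number at position {pos}")
        return int(text[pos:end]), end

    def read_node(pos: int) -> tuple[int, int]:
        # A node is either a bare population number or '(node,node):number'.
        if text[pos:pos + 1] != "(":
            return read_int(pos)
        left, pos = read_node(pos + 1)
        if text[pos:pos + 1] != ",":
            raise ValueError(f"expected ',' at position {pos}")
        right, pos = read_node(pos + 1)
        if text[pos:pos + 2] != "):":
            raise ValueError(f"expected '):' at position {pos}")
        ancestor, pos = read_int(pos + 2)
        splits[ancestor] = (left, right)
        return ancestor, pos

    _, end = read_node(0)
    if end != len(text):
        raise ValueError("unexpected trailing characters in tree string")
    return splits


two_pop = parse_population_tree("(0,1):2")
three_pop = parse_population_tree("((0,1):3,2):4")  # assumed three-population form
```

In the two-population case the result says that sampled populations 0 and 1 descend from ancestral population 2.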

RUNNING IMa2 Input data file format:
line 1 - arbitrary text, usually explaining the content of the file. After line 1, but before line 2, comments can be included to provide explanatory information. Each comment line must begin with a '#'.
line 2 - an integer, the number of populations, npops.
line 3 - the population names in order, separated by one or more spaces.
line 4 - the population tree string in modified Newick format.
line 5 - an integer, the number of loci in the data set, nloci.
line 6 - basic information for locus 1. This line contains, in order and separated by spaces: the locus name; the sample size for each population; the size of the locus; the mutation model; the inheritance scalar; possibly a mutation rate; and possibly a range of mutation rates.
line 7 - data for gene copy #1 from population 0.
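
The line-by-line layout above can be sketched as a small Python reader for the header of a data file. This is an illustration of the described layout, not IMa2's own parser. The names `parse_ima2_header` and `SAMPLE`, and every value in the sample text (locus name, sizes, the model code `H`), are placeholders invented for the example. For simplicity it skips any line starting with '#', although the format only allows comments between lines 1 and 2.

```python
def parse_ima2_header(text: str) -> dict:
    """Read lines 1-6 of a data file laid out as described above."""
    lines = text.splitlines()
    title = lines[0]  # line 1: arbitrary descriptive text
    # Drop blank lines and '#' comment lines before reading the rest.
    body = [ln.strip() for ln in lines[1:]
            if ln.strip() and not ln.lstrip().startswith("#")]
    npops = int(body[0])            # line 2: number of populations
    pop_names = body[1].split()     # line 3: population names
    tree = body[2]                  # line 4: population tree string
    nloci = int(body[3])            # line 5: number of loci
    fields = body[4].split()        # line 6: information for locus 1
    locus = {
        "name": fields[0],
        "sample_sizes": [int(x) for x in fields[1:1 + npops]],
        "length": int(fields[1 + npops]),
        "mutation_model": fields[2 + npops],
        "inheritance_scalar": float(fields[3 + npops]),
        # Optional mutation rate and rate range, kept as raw strings.
        "optional": fields[4 + npops:],
    }
    return {"title": title, "npops": npops, "pop_names": pop_names,
            "tree": tree, "nloci": nloci, "locus1": locus}


# Made-up sample header; none of these values come from a real data set.
SAMPLE = """Example data set (placeholder values)
# comments may appear here, each starting with '#'
2
pop0 pop1
(0,1):2
1
locusA 2 1 10 H 1.0
"""

header = parse_ima2_header(SAMPLE)
```

Because the number of sample-size fields on line 6 depends on npops, the reader has to parse line 2 before it can split the locus line.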

RUNNING IMa2 The program is run by typing a command at a command line prompt. It is usually simplest to keep the program and data files in the same directory (folder) and to open the command prompt window in that directory.
./IMa2 -iinputfile -ooutputfile -q2 -m1 -t3 -b10000000 -hn20 -s123
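
For scripted runs, the same command can be assembled in Python. The sketch below only rebuilds the example command shown on the slide. The helper name `build_ima2_command` is hypothetical, and the flag meanings in the comments are my reading of the IMa2 options, so treat them as assumptions and confirm them in the program documentation.

```python
import shlex


def build_ima2_command(options: dict, program: str = "./IMa2") -> list[str]:
    """Turn {'flag': value} pairs into IMa2-style '-flagvalue' arguments."""
    # IMa2 options are written with no space between flag and value.
    return [program] + [f"-{flag}{value}" for flag, value in options.items()]


# Values copied from the example command; meanings are assumptions.
options = {
    "i": "inputfile",    # input data file
    "o": "outputfile",   # main results file
    "q": 2,              # assumed: upper bound of population size prior
    "m": 1,              # assumed: upper bound of migration rate prior
    "t": 3,              # assumed: upper bound of splitting time prior
    "b": 10000000,       # assumed: burn-in duration
    "hn": 20,            # assumed: number of Markov chains
    "s": 123,            # assumed: random number seed
}

command = build_ima2_command(options)
command_line = shlex.join(command)
print(command_line)
```

Passing `command` to `subprocess.run` from the directory that holds the program and the data file would start the run; keeping the options in a dictionary makes it easy to change one value, such as the seed, between replicate runs.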

Output files The program generates up to five different types of output files: the main results file; genealogy files (ending in .ti); a Markov chain state file (ending in .mcf) for restarting a run; migration histogram files (ending in .mpt) for plotting counts and times of migration events; and burntrend files showing trend lines during the burn-in period.

Output file Main Results File:
Input and starting information
Load genealogies (L) mode information (L mode only)
MCMC information (M mode only)
Parameter comparisons, greater-than probabilities
Means, variances and correlations
Marginal peak locations and probabilities
Joint peak location and posterior probabilities (L mode only)
Histograms
ASCII curves: approximate posterior densities
ASCII plots of parameter trends