Parametric Inference and Drosophila Alignments Female Male Karyotype A project to compare and contrast Drosophila.


Parametric Inference and Drosophila Alignments

Female Male Karyotype A project to compare and contrast Drosophila

Alignment of an exon

DroAna_ _  GTCGCTCAACCAGCATTTGCAAAAGTCGCAGAACTTGCGCTCATTGGATTTCCAGTACTC
DroMel_4_  GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroMoj_ _  GTCGCTTAACCAGCATTTACAGAAATCGCAATACTTGCGTTCATTGGATTTCCAGTACTC
DroPse_1_  GTCGCTCAGCCAGCACTTGCAGAAGTCGCAGTACTTGCGCTCGTTTGATTTCCAGAATTC
DroSim_ _  GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroVir_ _  GTCGCTCAACCAGCATTTGCAGAAGTCGCAATACTTGCGTTCATTCGACTTCCAGTACTC
DroYak_1_  GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC
           ****** * ****** ** ** ** ***** **** ** ** ** ** ****** * **

Alignment of an intron

DroAna_ _  CTGAAGGAAT TCTATATT AAAGAAGATTTCTCATCATTGGTTG
DroMel_4_  CTGCGGGATTAGGGGTCATTAGAGT GCCGAAAAGCGA GTTT
DroMoj_ _  CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA
DroPse_1_  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
DroSim_ _  CTGCGGGATTAGGAGTCATTAGAGT GCGGAAAAGCGG GTT-
DroVir_ _  CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA
DroYak_1_  CTGCGGGATTAGCGGTCATTGGTGT GAAGAATAGATC CTTT
           *** * * *

DroAna_ _  AATC-----ACTTAC
DroMel_4_  ATTCTATGGACTCAC
DroMoj_ _  ----TATTTACTCAC
DroPse_1_  TGTACTTAC
DroSim_ _  ATTCTATGGACTCAC
DroVir_ _  ----TATTTACTCAC
DroYak_1_  ATTTCATAAACTCAC
           *** **



DroAna_ _ CTGAAGGAAT TCTATATT AAAGAAGATTTCTCATCATTGGTTG DroMel_4_ CTGCGGGATTAGGGGTCATTAGAGT GCCGAAAAGCGA GTTT DroMoj_ _ CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA DroPse_1_ CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG---- DroSim_ _ CTGCGGGATTAGGAGTCATTAGAGT GCGGAAAAGCGG GTT- DroVir_ _ CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA DroYak_1_ CTGCGGGATTAGCGGTCATTGGTGT GAAGAATAGATC CTTT DroAna_ _ AATC-----ACTTAC DroMel_4_ ATTCTATGGACTCAC DroMoj_ _ ----TATTTACTCAC DroPse_1_ TGTACTTAC DroSim_ _ ATTCTATGGACTCAC DroVir_ _ ----TATTTACTCAC DroYak_1_ ATTTCATAAACTCAC droAna CTGAAGGAATT--CTATATTAAAGAAGATTTCTCATCATT-GGTTGAATCACTTAC---- droMel CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGACTCAC---- droMoj CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAA-ATTCAAATATTTTATTGAC droPse CTGGAAGAGTT--TTGATTAGTAGGGGATCCATGGGGGCG-AGGAGAGGCCATCATCGTG droSim CTGCGGGATTAGGAGTCATTAGAGTGCGGAAAAGCGGGTT-ATTCTATGGACTCAC---- droVir CTGCAGCAGTTAAATA-ATTGTAATAAACAA--TTCTCTA-ATTTAAATATTTGGTCCAC droYak CTGCGGGATTAGCGGTCATTGGTGTGAAGAATAGATCCTTTATTTCATAAACTCAC---- droAna droMel droPse TACTTAC droMoj TCAC--- droSim droVir TCAC--- droYak droAna CTGAAGGAATTCTA--TATTAAAG dm2.chr2L CTGCGGGATTAGGGGTCATTAGAG TGCCGAAA----AGCGAGT-TTATTC droMoj1.contig_2959 CTGGAATAGTTAATTTCATTGTAA CACATAAA------CGTTTTAAATTC dp3.chr4_group3 CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGG----AGAGGCCATCATCG droSim1.chr2L CTGCGGGATTAGGAGTCATTAGAG TGCGGAAA----AGCGGG--TTATTC droVir1.scaffold_6 CTGCAGCAGTTAA-ATAATTGTAA TAAACAA TTCTCTAATTT droYak1.chr2L CTGCGGGATTAGCGGTCATTGGTG TGAAGAAT----AGATCCT-TTATTT droAna AAGATTTCTCATCATTGGTTGAATC ACTTAC dm2.chr2L TATGGACTCAC droMoj1.contig_ AAATATTT TATTGACTCAC dp3.chr4_group TGT--ACTTAC droSim1.chr2L TATGGACTCAC droVir1.scaffold_ AAATATTTGGTCCACTCAC droYak1.chr2L CATAAACTCAC 64% 50% 43%

DroAna_ _ GTCGCTCAACCAGCATTTGCAAAAGTCGCAGAACTTGCGCTCATTGGATTTCCAGTACTC DroMel_4_ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC DroMoj_ _ GTCGCTTAACCAGCATTTACAGAAATCGCAATACTTGCGTTCATTGGATTTCCAGTACTC DroPse_1_ GTCGCTCAGCCAGCACTTGCAGAAGTCGCAGTACTTGCGCTCGTTTGATTTCCAGAATTC DroSim_ _ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC DroVir_ _ GTCGCTCAACCAGCATTTGCAGAAGTCGCAATACTTGCGTTCATTCGACTTCCAGTACTC DroYak_1_ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC ****** * ****** ** ** ** ***** **** ** ** ** ** ****** * ** Alignment of an exon DroAna_ _ GTCGCTCAACCAGCATTTGCAAAAGTCGCAGAACTTGCGCTCATTGGATTTCCAGTACTC DroEre_ _ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC DroMel_4_ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC DroMoj_ _ GTCGCTTAACCAGCATTTACAGAAATCGCAATACTTGCGTTCATTGGATTTCCAGTACTC DroPse_1_ GTCGCTCAGCCAGCACTTGCAGAAGTCGCAGTACTTGCGCTCGTTTGATTTCCAGAATTC DroSim_ _ GTCGCTCAGCCAGCA-TTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC DroVir_ _ GTCGCTCAACCAGCATTTGCAGAAGTCGCAATACTTGCGTTCATTCGACTTCCAGTACTC DroYak_1_ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC ****** * ****** ** ** ** ***** **** ** ** ** ** ****** * ** X

Core Promoter Sequences Contribute to ovo-B Regulation in the Drosophila melanogaster Germline Beata Bielinska, Jining Lü, David Sturgill and Brian Oliver Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland Vol. 169, , January 2005

DroAna_ _ TTTCGGTGATTTTGAGTCT CATATTGTATATTGTCTTCTT----- DroEre_ _ TCCGGGTGATTTTCCGTTG CTTTTT-TTTTTTGCCTGCTT----- DroMel_4_ TC--GGTGATTTTCCGTTG CTTTTT-TATTGTGTGTGCAC----- DroMoj_ _ TTTCGTTGTTATTACATTCTATTTTAATTTCGGAGTAATCTTCGTT CTCTTG DroPse_1_ TCTCGGCAGTTTTTCGTTGTAATATA-TTGGGGACTATTTGT DroSim_ _ DroVir_ _ TTTCGTTGTTATTTAATT ATTTAAGGCTCGTTTTCTTTTGCCCACCCCCCTA DroYak_1_ TC--GGTGATTTTCCGTTG TTTCTT-T-TTTCGCCCGCAC----- DroAna_ _ CTCGAAAGTTCCTTGACTCCTAGCATCCA------TTACATTACATTAGA---- DroEre_ _ TCGAAAAGTTCTAT------TGGGTTCCACACGGTTTTCATATAGTTTGAA--- DroMel_4_ TCG-AAAGTTCTAT------TAGGTTCCACAGGGTTTTTATA CA--- DroMoj_ _ CGCTTTTCGC----TTTCGGGCAAGTGCCGTT----AACTTTTGCTTTACA--AGAATGT DroPse_1_ GAAATTTTCT TTTAGATACAAAAATAC--- DroSim_ _ DroVir_ _ CCCTATTCGCTCGGTTTCGGGCAACTGCCGTTGCACATTTATAACGTAAC----GAATGT DroYak_1_ TC-----GTTCTAT------TAGGTTCCACAAGGTTTTCATA TA--- DroAna_ _ TCTATTATT TCTA DroEre_ _ CATAAT DroMel_4_ TATGATT----AATT CGTA DroMoj_ _ AAAACTTATG CGCGCATCAGTGCATACATACAAACATA-- DroPse_1_ AAAGGATCGGT--TT TATC DroSim_ _ DroVir_ _ AAAACTCATGATGCGCATGCAGCACTAACACATGCATACATGCATACATACATACATATA DroYak_1_ CATAGTTTGATAGTT TGTA Core Promoter Sequences Contribute to ovo-B Regulation in the Drosophila melanogaster Germline Beata Bielinska, Jining Lü, David Sturgill and Brian Oliver Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland Vol. 169, , January 2005

Female Male Karyotype Differences among the Drosophila

Available Drosophila whole-genome multiple alignments:
  MAVID
  MULTIZ (currently no D. erecta)

MAVID

DroAna_ _  CTGAAGGAAT TCTATATT AAAGAAGATTTCTCATCATTGGTTG
DroEre_ _  CTGCGGGATTAGGGGTCATTGGTGT GCCAAAAGTCGC GTTT
DroMel_4_  CTGCGGGATTAGGGGTCATTAGAGT GCCGAAAAGCGA GTTT
DroMoj_ _  CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA
DroPse_1_  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
DroSim_ _  CTGCGGGATTAGGAGTCATTAGAGT GCGGAAAAGCGG GTT-
DroVir_ _  CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA
DroYak_1_  CTGCGGGATTAGCGGTCATTGGTGT GAAGAATAGATC CTTT
           *** * * *

DroAna_ _  AATC-----ACTTAC
DroEre_ _  ACTTTATAGACTCAC
DroMel_4_  ATTCTATGGACTCAC
DroMoj_ _  ----TATTTACTCAC
DroPse_1_  TGTACTTAC
DroSim_ _  ATTCTATGGACTCAC
DroVir_ _  ----TATTTACTCAC
DroYak_1_  ATTTCATAAACTCAC
           *** **

N. Bray and L. Pachter, MAVID: Constrained ancestral alignment of multiple sequences, Genome Research 14 (2004).

DroAna_ _ CTGAAGGAAT TCTATATT AAAGAAGATTTCTCATCATTGGTTG DroMel_4_ CTGCGGGATTAGGGGTCATTAGAGT GCCGAAAAGCGA GTTT DroMoj_ _ CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA DroPse_1_ CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG---- DroSim_ _ CTGCGGGATTAGGAGTCATTAGAGT GCGGAAAAGCGG GTT-DroVir_ _ CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA DroYak_1_ CTGCGGGATTAGCGGTCATTGGTGT GAAGAATAGATC CTTT *** * * * DroAna_ _ AATC-----ACTTAC DroMel_4_ ATTCTATGGACTCAC DroMoj_ _ ----TATTTACTCAC DroPse_1_ TGTACTTAC DroSim_ _ ATTCTATGGACTCAC DroVir_ _ ----TATTTACTCAC DroYak_1_ ATTTCATAAACTCAC *** ** N. Bray and L. Pachter, MAVID: Constrained ancestral alignment of multiple sequences, Genome Research 14 (2004) p MAVID

MULTIZ

droAna               CTGAAGGAATTCTA--TATTAAAG
dm2.chr2L            CTGCGGGATTAGGGGTCATTAGAG TGCCGAAA----AGCGAGT-TTATTC
droMoj1.contig_2959  CTGGAATAGTTAATTTCATTGTAA CACATAAA------CGTTTTAAATTC
dp3.chr4_group3      CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGG----AGAGGCCATCATCG
droSim1.chr2L        CTGCGGGATTAGGAGTCATTAGAG TGCGGAAA----AGCGGG--TTATTC
droVir1.scaffold_6   CTGCAGCAGTTAA-ATAATTGTAA TAAACAA TTCTCTAATTT
droYak1.chr2L        CTGCGGGATTAGCGGTCATTGGTG TGAAGAAT----AGATCCT-TTATTT
                     *** * * * *

droAna               AAGATTTCTCATCATTGGTTGAATC ACTTAC
dm2.chr2L            TATGGACTCAC
droMoj1.contig_      AAATATTT TATTGACTCAC
dp3.chr4_group       TGT--ACTTAC
droSim1.chr2L        TATGGACTCAC
droVir1.scaffold_    AAATATTTGGTCCACTCAC
droYak1.chr2L        CATAAACTCAC
                     *** **

Blanchette et al., Aligning multiple sequences with the threaded blockset aligner, Genome Research 14 (2004).

droAna CTGAAGGAATTCTA--TATTAAAG dm2.chr2L CTGCGGGATTAGGGGTCATTAGAG TGCCGAAA----AGCGAGT-TTATTC droMoj1.contig_2959 CTGGAATAGTTAATTTCATTGTAA CACATAAA------CGTTTTAAATTC dp3.chr4_group3 CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGG----AGAGGCCATCATCG droSim1.chr2L CTGCGGGATTAGGAGTCATTAGAG TGCGGAAA----AGCGGG--TTATTC droVir1.scaffold_6 CTGCAGCAGTTAA-ATAATTGTAA TAAACAA TTCTCTAATTT droYak1.chr2L CTGCGGGATTAGCGGTCATTGGTG TGAAGAAT----AGATCCT-TTATTT *** * * * * droAna ACTTAC dm2.chr2L TATGGACTCAC droMoj1.contig_2959 TATTGACTCAC dp3.chr4_group3 TGT--ACTTAC droSim1.chr2L TATGGACTCAC droVir1.scaffold_6 GGTCCACTCAC droYak1.chr2L CATAAACTCAC *** **



CLUSTAL W

droAna  CTGAAGGAATT--CTATATTAAAGAAGATTTCTCATCATT-GGTTGAATCACTTAC----
droMel  CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGACTCAC----
droMoj  CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAA-ATTCAAATATTTTATTGAC
droPse  CTGGAAGAGTT--TTGATTAGTAGGGGATCCATGGGGGCG-AGGAGAGGCCATCATCGTG
droSim  CTGCGGGATTAGGAGTCATTAGAGTGCGGAAAAGCGGGTT-ATTCTATGGACTCAC----
droVir  CTGCAGCAGTTAAATA-ATTGTAATAAACAA--TTCTCTA-ATTTAAATATTTGGTCCAC
droYak  CTGCGGGATTAGCGGTCATTGGTGTGAAGAATAGATCCTTTATTTCATAAACTCAC----
        *** * * * * *

droAna
droMel
droMoj  TCAC---
droPse  TACTTAC
droSim
droVir  TCAC---
droYak

Higgins et al., CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Research 22 (1994).


How is an alignment made from the sequences?

>dm2.chr2L
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGACTCAC
>dp3.chr4_group3
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTACTTAC

dm2.chr2L        CTGCGGGATTAGGGGTCATTAGAG TGCCGAAAAGCGAGT-TTATTC
dp3.chr4_group3  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG

dm2.chr2L        TATGGACTCAC
dp3.chr4_group3  TGT--ACTTAC

Each alignment can be summarized by counting the number of matches (#M), mismatches (#X), gaps (#G), and spaces (#S).

dm2.chr2L        CTGCGGGATTAGGGGTCATTAGAG TGCCGAAAAGCGAGT-TTATTC
dp3.chr4_group3  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG

dm2.chr2L        TATGGACTCAC
dp3.chr4_group3  TGT--ACTTAC

#M=31, #X=22, #G=3, #S=12

DroMel_4_  CTGCGGGATTAGGGGTCATTAGAGT GCCGAAAAGCGA GTTT
DroPse_1_  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----

DroMel_4_  ATTCTATGGACTCAC
DroPse_1_  TGTACTTAC

#M=27, #X=18, #G=3, #S=28

Since 2(#M+#X)+#S = n+m, where n and m are the lengths of the two sequences, the three counts #X, #G and #S suffice.

This notation follows Chapter 7 (Parametric Sequence Alignment) by Colin Dewey and Kevin Woods in the new book Algebraic Statistics for Computational Biology (edited by L. Pachter and B. Sturmfels).
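The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the software used for the slides: the function name `summarize_alignment` and the toy rows are hypothetical, and it assumes a gap is a maximal run of consecutive spaces within one row.

```python
def summarize_alignment(row_a, row_b, gap="-"):
    """Return (#M, #X, #G, #S) for a pairwise alignment given as two gapped rows.

    Assumption: a "space" is one gap character, and a "gap" is a maximal
    run of consecutive spaces in the same row.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    matches = mismatches = gaps = spaces = 0
    open_row = None  # row ("a" or "b") that held a space in the previous column
    for ca, cb in zip(row_a, row_b):
        if ca == gap and cb == gap:
            raise ValueError("a column cannot contain two spaces")
        if ca == gap or cb == gap:
            row = "a" if ca == gap else "b"
            spaces += 1
            if row != open_row:  # a new run of spaces starts here
                gaps += 1
            open_row = row
        else:
            open_row = None
            if ca == cb:
                matches += 1
            else:
                mismatches += 1
    return matches, mismatches, gaps, spaces


# Toy example (hypothetical sequences, not taken from the slides).
m, x, g, s = summarize_alignment("AC-GTT", "AGCG--")
n_plus_m = len("ACGTT") + len("AGCG")
print(m, x, g, s, 2 * (m + x) + s == n_plus_m)
```

The final check mirrors the identity 2(#M+#X)+#S = n+m: every match or mismatch column consumes one letter from each sequence, and every space column consumes one letter in total.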

We can mark a point in space for every alignment… In the example of our two sequences there are different alignments, but only different summaries. So we don’t need to mark that many points. But is still quite a large number. Fortunately, there are only 69 vertices on the convex hull. That is something we can draw…
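To get a feel for how quickly the number of alignments grows, here is a small sketch that counts them. It assumes the usual convention that an alignment is a path built from three kinds of columns (letter over letter, letter over space, space over letter), in which case the count for lengths n and m is the Delannoy number D(n, m). The function name `count_alignments` is hypothetical.

```python
def count_alignments(n, m):
    """Count pairwise alignments of sequences with lengths n and m.

    Assumption: every path through the alignment grid counts as a
    distinct alignment, so the answer is the Delannoy number D(n, m),
    with D(i, j) = D(i-1, j) + D(i, j-1) + D(i-1, j-1).
    """
    row = [1] * (m + 1)  # D(0, j) = 1: only spaces can be inserted
    for _ in range(n):
        new_row = [1]  # D(i, 0) = 1
        for j in range(1, m + 1):
            new_row.append(new_row[j - 1] + row[j] + row[j - 1])
        row = new_row
    return row[m]


for n in (1, 2, 3, 10):
    print(n, count_alignments(n, n))
```

Python integers are unbounded, so the same function can be called with the lengths of the two intron sequences without overflow.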

For the sequences:

>mel
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGAC
>pse
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC

the polytope is: [figure]

Vertex 49: #X=24, #S=10, #G=2. There are eight alignments that have this summary.

mel  CTGCGGGATTAGGGGTCATTAGAGT GCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel  CTGCGGGATTAGGGGTCATTAGAGT GCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel  CTGCGGGATTAGGGGTCATTAGAG TGCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel  CTGCGGGATTAGGGGTCATTAGAG TGCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel  CTGCGGGATTAGGGGTCATTAGA GTGCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel  CTGCGGGATTAGGGGTCATTAGA GTGCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel  CTGCGGGATTAGGGGTCATTAG AGTGCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel  CTGCGGGATTAGGGGTCATTAG AGTGCCGAAAAGCGAGTTTATTCTATGGAC
pse  CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

Consensus at a vertex

mel  CTGCGGGATTAGGGGTCATTAGAGT===------===GCCGAAAAGCGAGTTTATTCTA=TGGAC
pse  CTGGAAGAGTTTTGATTAGTAG===GGGATCCATGGGGGCGAGGAGAGGCCATCATC==GTGTAC

The vertices of the polytope have special significance. They correspond to optimal alignments, that is, alignments that maximize

X*(#X) + G*(#G) + S*(#S)

for some choice of X, G and S.

Example: the eight alignments summarized by #X=24, #S=10, #G=2 are optimal for the parameters: M = 100, X = -100, S = -30, G =

Vertex  (#X, #S, #G)   #alignments
40      (15, 16, 16)   1080
41      (17, 30, 2)    4
42      (18, 14, 5)    4
43      (18, 16, 4)    56
44      (20, 10, 6)    16
45      (20, 10, 7)    24
46      (23, 8, 6)     6
47      (23, 8, 8)     165
48      (24, 8, 3)     38
49      (24, 10, 2)    8
50      (25, 8, 2)     24
51      (25, 62, 3)    2
52      (28, 48, 2)    1
53      (29, 8, 1)     6

Finding the polytope is what we call parametric inference. Colin Dewey's polytope propagation software can find the vertices in 20 seconds.
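As a rough illustration of how a choice of parameters singles out a vertex, the sketch below scores the fourteen vertices listed on this slide. It is not the polytope propagation software. The values M, X and S are taken from the example above, but the gap parameter is not recoverable from the transcript, so `HYPOTHETICAL_G` is a placeholder chosen only for illustration. #M is recovered from 2(#M+#X)+#S = n+m, and only the listed vertices (14 of the 69) are compared.

```python
# Sequences as shown on the slides (each split where the transcript wraps).
MEL = "CTGCGGGATTAGGGGTCATTAGAGTGCCGA" "AAAGCGAGTTTATTCTATGGAC"
PSE = "CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGA" "GGAGAGGCCATCATCGTGTAC"

# Vertex label -> (#X, #S, #G), copied from the table on this slide.
VERTICES = {
    40: (15, 16, 16), 41: (17, 30, 2), 42: (18, 14, 5), 43: (18, 16, 4),
    44: (20, 10, 6), 45: (20, 10, 7), 46: (23, 8, 6), 47: (23, 8, 8),
    48: (24, 8, 3), 49: (24, 10, 2), 50: (25, 8, 2), 51: (25, 62, 3),
    52: (28, 48, 2), 53: (29, 8, 1),
}


def score(summary, M, X, S, G, total_length):
    """Score M*#M + X*#X + S*#S + G*#G, with #M derived from the summary."""
    x, s, g = summary
    m = (total_length - s) // 2 - x  # from 2(#M + #X) + #S = n + m
    return M * m + X * x + S * s + G * g


def best_listed_vertices(M, X, S, G):
    """Return the best-scoring labels among the listed vertices, and the score."""
    total = len(MEL) + len(PSE)
    scores = {k: score(v, M, X, S, G, total) for k, v in VERTICES.items()}
    top = max(scores.values())
    return sorted(k for k, sc in scores.items() if sc == top), top


HYPOTHETICAL_G = -400  # placeholder: the slide's G value is missing
print(best_listed_vertices(100, -100, -30, HYPOTHETICAL_G))
```

With this placeholder gap penalty, vertex 49 scores highest among the listed vertices. Changing `HYPOTHETICAL_G` moves the optimum to other vertices, which is the dependence on parameters that the polytope records.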

The MAVID, MULTIZ and CLUSTALW multiple alignments did not contain a D. melanogaster -- D. pseudoobscura pairwise alignment corresponding to a vertex on the polytope.

Reasons:
- D. melanogaster and D. pseudoobscura are not neighbors on the tree and were therefore aligned during a heuristic "progressive alignment" step.
- The correct alignment requires a model that includes a parameter for the transition/transversion ratio.

Example where robust alignments are crucial:

Transcription-associated mutational asymmetry in mammalian evolution, Green et al., Nature Genetics 33 (2003).

Observation: A→G > T→C and G→A > C→T in mammalian genes on the coding strand of transcribed regions. In fact, A→G transitions were 58% more frequent than T→C, and G→A transitions were 18% more frequent than C→T.

This is established by examining human-chimpanzee-baboon alignments (with baboon the outgroup).

Peter Huggins has confirmed that there is transcription-associated mutational asymmetry in Drosophila (ratios of 40%).

[Figure: trees for human, chimp, baboon and for D. mel, D. sim, D. yak]

from Green et al. But the real problem is testing non-coding DNA…

How do we build the polytope for

>mel
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGAC
>pse
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC

?

Associated to every pair of sequences is a polynomial built from the "summaries" of the alignments. For example, vertex 49 (#X=24, #S=10, #G=2) corresponds to the monomial 8 X^24 S^10 G^2, where the coefficient 8 is the number of alignments with that summary.
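One way to make the polynomial concrete is to compute it by dynamic programming on short sequences. The sketch below is an assumption-laden illustration, not the authors' implementation: it stores the polynomial as a `Counter` mapping exponent triples (#X, #S, #G) to coefficients, tracks three states so that a new gap is counted only when a run of spaces begins, and treats every grid path as a distinct alignment. All names are hypothetical.

```python
from collections import Counter


def shift(poly, dx, ds, dg):
    """Multiply a polynomial by the monomial X^dx * S^ds * G^dg."""
    return Counter({(x + dx, s + ds, g + dg): c for (x, s, g), c in poly.items()})


def alignment_polynomial(a, b):
    """Map each summary (#X, #S, #G) to the number of alignments having it."""
    n, m = len(a), len(b)

    def table():
        return [[Counter() for _ in range(m + 1)] for _ in range(n + 1)]

    H = table()  # last column pairs two letters (match or mismatch)
    I = table()  # last column pairs a letter of a with a space
    D = table()  # last column pairs a space with a letter of b
    H[0][0][(0, 0, 0)] = 1  # the empty alignment
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                dx = 0 if a[i - 1] == b[j - 1] else 1
                prev = H[i - 1][j - 1] + I[i - 1][j - 1] + D[i - 1][j - 1]
                H[i][j] = shift(prev, dx, 0, 0)
            if i > 0:
                # extending a run costs one space; starting one also opens a gap
                I[i][j] = shift(I[i - 1][j], 0, 1, 0) + shift(
                    H[i - 1][j] + D[i - 1][j], 0, 1, 1)
            if j > 0:
                D[i][j] = shift(D[i][j - 1], 0, 1, 0) + shift(
                    H[i][j - 1] + I[i][j - 1], 0, 1, 1)
    return H[n][m] + I[n][m] + D[n][m]


# Toy example (hypothetical sequences): each key is (#X, #S, #G).
poly = alignment_polynomial("AC", "AG")
for summary in sorted(poly):
    print(summary, poly[summary])
```

The coefficients sum to the total number of alignments, and the convex hull of the exponent triples is the polytope discussed on these slides.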

Polytope propagation

NP_{i,j} = S*NP_{i-1,j} + S*NP_{i,j-1} + (X or M)*NP_{i-1,j-1}

NP_{i,j} : Newton polytope for positions [1,i] and [1,j] in each sequence
+ : convex hull of union
* : Minkowski sum

[Figure: alignment grid for the sequences A A C A T T A G A and AGATTACCACA]
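The recursion can be sketched directly for the simplified two-parameter case, where each polytope lives in the (#X, #S) plane and the gap parameter G is ignored, as in the formula on this slide. This is a minimal sketch under those assumptions, not Colin Dewey's software: multiplying by a monomial becomes a translation (the Minkowski sum with a single point), and addition becomes the convex hull of the union, computed here with a standard monotone-chain hull. A match contributes the exponent (0, 0).

```python
def convex_hull(points):
    """Vertices of the convex hull of 2D integer points (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def translate(vertices, step):
    """Minkowski sum of a polytope with a single point (a monomial)."""
    return [(x + step[0], s + step[1]) for x, s in vertices]


def newton_polytope(a, b):
    """Vertices, as (#X, #S) pairs, of the alignment polytope of a and b."""
    n, m = len(a), len(b)
    NP = [[[] for _ in range(m + 1)] for _ in range(n + 1)]
    NP[0][0] = [(0, 0)]  # the empty alignment
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            pts = []
            if i > 0:  # S * NP[i-1][j]
                pts += translate(NP[i - 1][j], (0, 1))
            if j > 0:  # S * NP[i][j-1]
                pts += translate(NP[i][j - 1], (0, 1))
            if i > 0 and j > 0:  # (X or M) * NP[i-1][j-1]
                step = (0, 0) if a[i - 1] == b[j - 1] else (1, 0)
                pts += translate(NP[i - 1][j - 1], step)
            NP[i][j] = convex_hull(pts)  # "+" is the hull of the union
    return NP[n][m]


# Toy example (hypothetical sequences, not the ones in the figure).
print(sorted(newton_polytope("AC", "AG")))
```

Only hull vertices are stored at each cell, which is why the method stays tractable even though the number of alignments grows exponentially. Extending the sketch to the three-parameter model would need a 3D hull routine and the three-state recursion for gaps.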


Next Steps

Biology
- Align all introns and intergenic regions between all pairs of Drosophila species parametrically. This will result in thousands of polytopes.
- Distinguish robust alignments, which do not depend critically on changes in parameters, from unreliable alignments.
- Investigate biological questions parametrically.

Mathematics
- Optimize polytope propagation, and investigate other fast methods for building alignment polytopes.
- Study the structure of alignment polytopes.
- Develop a parametric framework for multiple alignment.