
1 1/30 Comparative Genomics

2 2/30 Overview of the Talk: Comparing Genomes; Homologies & Families; Sequence Alignments

3 3/30 Evolution at the DNA Level. …ACTGACATGTACCA… / …AC----CATGCACCA… Sequence edits: mutation, deletion. Rearrangements: inversion, translocation, duplication.

4 4/30 Why Compare Genomes? We can better understand evolution and speciation. We can find important, functional regions of the sequence (codons, promoters, regulatory regions). It can help us locate genes in other species that are missing or not well defined (also through comparison and alignments).

5 5/30 Comparing Genomes. Mammals have roughly 3 billion base pairs in their genomes. Over 98% of human genes are shared with primates, with gene similarity of 95-98% or more. Even the fruit fly shares 60% of its genes with humans! (March 2000) Differences: gene structure, sequence. Remember: a single nucleotide change can cause diseases such as sickle cell anemia and cancer.

6 6/30 How Does Ensembl Predict Homology? Uses all the species. Uses a representative protein (the longest) for every gene. Builds a gene tree. Reference: EnsemblCompara GeneTrees: Analysis of complete, duplication aware phylogenetic trees in vertebrates. Vilella AJ, Severin J, Ureta-Vidal A, Durbin R, Heng L, Birney E. Genome Res. 2008 Nov 24.

7 7/30 Steps in Homology Prediction. (1) Load the longest protein for every gene from all species. (2) Run WU Blastp + Smith-Waterman on the longest translation of every gene against every other (Blast Reciprocal Hit / Blast Score Ratio). (3) Cluster the proteins and build multiple alignments (MCoffee). (4) From each alignment, build a gene tree (TreeBest). (5) Reconcile each gene tree with the species tree to determine internal nodes (TreeBest). (6) Orthologues, paralogues… [Figure label: …MEDPATA…]
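The "Blast Reciprocal Hit" idea in the second step can be sketched in a few lines of Python. This is a minimal illustration, not the Ensembl pipeline: the function names, gene labels and scores below are all made up for the example, and the Blast Score Ratio filter is not implemented.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits is an iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy scores (hypothetical, not real BLAST output).
human_vs_mouse = [
    ("hINS", "mIns1", 210.0),
    ("hINS", "mIns2", 230.0),
    ("hIGF1", "mIgf1", 300.0),
]
mouse_vs_human = [
    ("mIns1", "hINS", 205.0),
    ("mIns2", "hINS", 228.0),
    ("mIgf1", "hIGF1", 298.0),
]

pairs = reciprocal_best_hits(human_vs_mouse, mouse_vs_human)
print(pairs)
```

In this toy data mIns1 is left out because hINS scores higher against mIns2. Cases like this are one reason a later tree-building step is needed to recover 1-to-many relationships.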

8 8/30 Viewing Trees in Ensembl

9 9/30 Types of Homologues. Orthologues: any pairwise gene relation where the ancestor node is a speciation event. Paralogues: any pairwise gene relation where the ancestor node is a duplication event.
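The two definitions translate directly into code: find the most recent common ancestor node of two genes in the gene tree and look at its event type. The sketch below assumes a gene tree whose internal nodes are already labelled; the `Node` class, the gene names and the toy tree are hypothetical.

```python
class Node:
    def __init__(self, name=None, event=None, children=()):
        self.name = name              # gene name (leaves only)
        self.event = event            # "speciation" or "duplication" (internal nodes)
        self.children = list(children)


def leaves(node):
    """Set of gene names under this node."""
    if not node.children:
        return {node.name}
    names = set()
    for child in node.children:
        names |= leaves(child)
    return names


def common_ancestor(node, a, b):
    """Lowest node whose subtree contains both genes a and b."""
    for child in node.children:
        names = leaves(child)
        if a in names and b in names:
            return common_ancestor(child, a, b)
    return node


def relation(root, a, b):
    if a == b or not {a, b} <= leaves(root):
        raise ValueError("need two distinct genes from the tree")
    event = common_ancestor(root, a, b).event
    return "orthologues" if event == "speciation" else "paralogues"


# Toy tree: one ancient duplication (copies X and Y), then a human/mouse split in each copy.
root = Node(event="duplication", children=[
    Node(event="speciation", children=[Node("human_X"), Node("mouse_X")]),
    Node(event="speciation", children=[Node("human_Y"), Node("mouse_Y")]),
])

print(relation(root, "human_X", "mouse_X"))
print(relation(root, "human_X", "human_Y"))
print(relation(root, "human_X", "mouse_Y"))
```

Note that under these definitions human_X and mouse_Y are paralogues even though they come from different species, because their common ancestor node is the duplication.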

10 10/30 The Gene Tree for INS (insulin precursor). A red square is a duplication event (paralogues). A blue square is a speciation event (orthologues).

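11 Reconciliation [Figure: an unrooted gene tree with leaves from species M, R and H is reconciled with the species tree; internal nodes are labelled as duplication nodes or speciation nodes, and gene losses (M', R', H') are marked.]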
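One common way to label the internal nodes is the LCA-mapping rule: map each gene-tree node to the lowest species-tree node that covers all species below it, and call the node a duplication when it maps to the same species node as one of its children. The sketch below uses that rule as a stand-in; whether TreeBest works exactly this way is not stated on the slide. It assumes a gene tree that is already rooted and binary, a species topology ((M,R),H) that is an assumption, and hypothetical gene names whose first letter gives the species. Gene losses are not inferred.

```python
# Species tree as child -> parent. The topology ((M,R),H) is an assumption.
PARENT = {"M": "MR", "R": "MR", "MR": "MRH", "H": "MRH", "MRH": None}


def ancestors(species):
    """Path from a species-tree node up to the root, inclusive."""
    path = []
    while species is not None:
        path.append(species)
        species = PARENT[species]
    return path


def species_lca(a, b):
    """Lowest species-tree node that is an ancestor of both a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("species are not in the same tree")


def reconcile(tree, events):
    """Label internal nodes of a rooted binary gene tree.

    tree is a nested 2-tuple of gene names; a leaf's species is its first letter.
    Appends (mapped_species_node, event) to events in post-order and returns
    the species-tree node this subtree maps to.
    """
    if isinstance(tree, str):
        return tree[0]
    left, right = (reconcile(child, events) for child in tree)
    here = species_lca(left, right)
    event = "duplication" if here in (left, right) else "speciation"
    events.append((here, event))
    return here


# Toy gene tree: two copies of a gene; the second copy has no M gene.
gene_tree = ((("M1", "R1"), "H1"), ("R2", "H2"))
events = []
reconcile(gene_tree, events)
for mapped, event in events:
    print(mapped, event)
```

Here the root is labelled a duplication because both of its children map to the same ancestral species node, MRH.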

12 12/30 Orthologue Types What is ‘1 to 1’? What is ‘1 to many’?
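As a rough illustration of these labels: given a list of orthologue pairs between two species, a pair is "1 to 1" when neither gene has another orthologue in the other species, and "1 to many" when one of them does. The sketch below is a simplified per-pair count, not Ensembl's implementation; the function name and the toy pairs are hypothetical, and the "many to many" label is added for completeness.

```python
from collections import defaultdict


def orthology_types(pairs):
    """Classify each (gene_a, gene_b) orthologue pair by partner counts."""
    partners_of_a = defaultdict(set)
    partners_of_b = defaultdict(set)
    for a, b in pairs:
        partners_of_a[a].add(b)
        partners_of_b[b].add(a)

    result = {}
    for a, b in pairs:
        a_has_many = len(partners_of_a[a]) > 1   # a has several orthologues in species B
        b_has_many = len(partners_of_b[b]) > 1   # b has several orthologues in species A
        if a_has_many and b_has_many:
            result[(a, b)] = "many to many"
        elif a_has_many or b_has_many:
            result[(a, b)] = "1 to many"
        else:
            result[(a, b)] = "1 to 1"
    return result


# Toy human-mouse orthologue pairs (illustrative only).
pairs = [("hINS", "mIns1"), ("hINS", "mIns2"), ("hIGF1", "mIgf1")]
types = orthology_types(pairs)
for pair, kind in types.items():
    print(pair, kind)
```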

13 13/30 Protein Families. How: cluster proteins for every isoform in every species, plus UniProt proteins. BLASTP comparison of: all Ensembl ENSP…; all metazoan (animal) proteins in UniProt.

14 14/30 Homologues Exercise. 1. Find the human MYL6 gene: go to its gene summary. 2. How many paralogues does it have? Find them in the gene tree. 3. Which paralogue is closest to the human MYL6 gene? In what taxon is the common ancestor?

15 15/30 Pan-taxonomic compara Anopheles gambiae Caenorhabditis elegans Drosophila melanogaster Aspergillus nidulans Neurospora crassa Saccharomyces cerevisiae Schizosaccharomyces pombe B_aphidicola_Tokyo_1998 B_burgdorferi_DSM_4680 B_subtilis E_coli_K12 M_tuberculosis_H37Rv N_meningitidis_A P_horikoshii S_aureus_N315 S_pneumoniae_TIGR4 S_pyogenes_SF370 W_pipientis_wMel Anolis carolinensis Ciona savignyi Danio rerio Equus caballus Gallus gallus Homo sapiens Macaca mulatta Monodelphis domestica Mus musculus Ornithorhynchus anatinus Pan troglodytes Pongo pygmaeus Xenopus tropicalis Dictyostelium discoideum Plasmodium falciparum Plasmodium vivax Arabidopsis thaliana Oryza sativa Vitis vinifera

16 16/30 www.ensemblgenomes.org

17 17/30 Families

18 18/30 Ensembl Proteins in the Family

19 19/30 Overview of the Talk: Comparing Genomes; Homologies and Families; Sequence Alignments

20 20/30 Aligning Whole Genomes: Why? To identify homologous regions. To spot problematic gene predictions. Conserved regions could be functional. To define syntenic regions (long regions of DNA sequence where order and orientation are highly conserved).

21 21/30 Aligning Large Genomic Sequences. Difficulties: requires significant computing resources; scalability, as more and more genomes are sequenced; time constraints; because the "true" alignment is not known, it is difficult to measure alignment accuracy and choose the right method.

22 22/30 Whole Genome Alignments. BLASTZ-net (nucleotide level): closer species, e.g. human – mouse. Translated BLAT (amino acid level): more distant species, e.g. human – zebrafish. EPO/PECAN: multispecies alignments. ORTHEUS: used to determine ancestral alleles.

23 23/30 Which Multispecies Alignments? Mercator-Pecan: 16 amniota vertebrates + constrained elements. Enredo-Pecan-Ortheus (EPO): for 6 primates; for 5 teleost fish + constrained elements; for 12 eutherian mammals; for 34 eutherian mammals + constrained elements.

24 24/30 Non-Coding Regions. "Phylogenetic footprinting": conserved non-coding regions can be functional. Regulatory regions were discovered in this way for the genes Hoxb-1, Hoxb4, PAX6, SOX9.
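The footprinting idea can be illustrated with a sliding window over a pairwise alignment: windows where the two sequences stay nearly identical are candidate conserved elements. This is a deliberately naive sketch, not the method behind the constrained-element tracks; the function name, window size, threshold and sequences are all invented for the example.

```python
def conserved_windows(seq_a, seq_b, window=10, min_identity=0.9):
    """Start positions of alignment windows with identity >= min_identity.

    seq_a and seq_b are two rows of the same alignment (equal length,
    '-' for gaps). Gap columns never count as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    hits = []
    for start in range(len(seq_a) - window + 1):
        columns = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for x, y in columns if x == y and x != "-")
        if matches / window >= min_identity:
            hits.append(start)
    return hits


# Toy alignment: the first 10 columns are identical, the last 10 differ.
species_1 = "ACGTACGTAC" + "TTTTTTTTTT"
species_2 = "ACGTACGTAC" + "GGGGGGGGGG"

print(conserved_windows(species_1, species_2, window=5, min_identity=1.0))
```

With more than two species, the same scan can be applied column by column to a multiple alignment, which is closer in spirit to the multispecies tracks used in the exercises.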

25 25/30 More Examples. Highly conserved transcription factor binding sites discovered, e.g. a 401 bp non-coding sequence involved in transcriptional regulation of interleukins. New genes (human-mouse comparison), e.g. APOA5, identified as a paralogue of APOA4 in human and mouse.

26 26/30 Going Beyond Mammals. Where human-mouse is too conserved, go to other species. Chicken (mammals and birds: 300 MYA), e.g. a cardiac-specific enhancer of Nkx2-5. Human and fish (400-450 MYA): in 2002, comparison of human to Fugu rubripes led to the identification of 1000 genes.

27 27/30 Regulatory Features of the PDX1 gene Region in Detail shows conservation of sequence in regions involved in PDX1 transcriptional regulation (1.6-2.8 kb upstream of the gene).

28 28/30 Alignments Exercise. 1. Have a look at Region in Detail for the ACN9 gene. 2. Turn on the BLASTZ alignment against macaque. What parts of the macaque genome align to this region in human? 3. Turn on the constrained elements for the 33 eutherian mammals. How does this track differ from the BLASTZ alignment?

29 29/30 Alignments Continued. 1. Zoom out one box in the zoom slide. Are there constrained elements upstream of the ACN9 transcript that overlap a regulatory feature? 2. View the '6 primates alignment' using the Alignments links at the left.

30 30/30 Compara Team at EBI Javier Herrero Kathryn Beal Stephen Fitzgerald Leo Gordon

