Todd D. Taylor, Ph.D. Genome Annotation and Comparative Analysis Team Computational and Experimental Systems Biology Group RIKEN Genomic Sciences Center.

Presentation transcript:

Todd D. Taylor, Ph.D. Genome Annotation and Comparative Analysis Team Computational and Experimental Systems Biology Group RIKEN Genomic Sciences Center Bioinformatics and Comparative Genome Analysis Course Institut Pasteur Tunis - Tunisia April 2, 2007

Human
- Chromosome 21 (Nature, May 2000): 17 of 33.5 Mb
- Chromosome 18p (Nature, September 2005): 16 Mb
- Chromosome 11q (Nature, March 2006): 81 Mb
- ~4-5% contribution to the Human Genome Project
Chimpanzee
- Chromosome 22q (Nature, May 2004): 33.5 Mb (syntenic to human chr21)
- Chromosome Y (Nature Genetics, January 2006)
Other work
- Development of novel methods for gene and promoter prediction
- Identifying genes missed by other high-throughput methods
- Identification of unique regulatory mechanisms

Looking for similarities
- Compare with distant species, like mouse
- Regions that are conserved may be important
Looking for differences
- Compare with close species, like primates
- Regions that are different may be important
Of course, there are exceptions to every rule!

Figure: phylogenetic tree of the amniotes leading to Homo.
Taxa: Homo, Pan, Gorilla, Pongo, gibbons, Old World monkeys, New World monkeys, prosimians, Lagomorpha, rodents, Metatheria, Prototheria, Sauropsida (Reptilia + Aves).
Clades: Hominidae, Hominoidea, Catarrhini, Anthropoidea, Primates, Eutheria (Placentalia), Mammalia, Amniota (amniotes).
Time labels: 5 MYa, ~250 MYa, ~350 MYa.
Mammalian characters listed: heterodonty; mammary glands; homoeothermy; hair; placentation (in most), amnion, internal fertilization; sweat and sebaceous glands; anucleate red blood cells.

34% maps to identical sequence in human genome. Hiram Clawson and Kate Rosenbloom (UCSC), 09 June 2006

95% maps to identical sequence in human genome. Hiram Clawson and Kate Rosenbloom (UCSC), 09 June 2006

Nobrega, et al. Science 302, 413 (2003)

- Size
- Intelligence
- Language
- Ageing
- Disease susceptibility
- Cancer
- Schizophrenia
- Autism
- Triplet expansion diseases
- AIDS
- Hepatitis

Newton, April 2002 issue

Science 295, (2002)

1.23% substitution

- Number of simple repetitive sequences
- Insertion of Alu and L1 elements
- Unique sequences
- Local duplications
- Translocations
- Inversions
- Fewer CpG islands predicted in chimp

- Compare with a small 'representative' human chromosome (21)
- Clone-based sequencing strategy
- Map chimp BAC-end sequences to human chr. 21
- Screen libraries for additional clones to fill gap regions
- 3 gaps, over 99% coverage

Figure: sequence identity between human Chr21 q-arm and chimp Chr22 q-arm (identity axis 85% to 100%; 5 Mb).

Figure: sequence identity between human Chr21 q-arm and chimp Chr22 q-arm (identity axis 85% to 100%; 1 Mb).

Chimpanzee Sequencing & Analysis Consortium. Nature (2005) 437:69-87

Overall: 1.44%
SINE/Alu: 1.81%
LINE/L1: 1.38%
CpG islands: 2.26%
Simple repeats: 4.06%

Figure: two panels plotting base change and insertion frequency, with insertion size.

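Family / Subfamily / HS21 PTR22
LINE/L1 / L1HS / 112
LTR/ERV1 / HERVIP10FH / 145
LTR/ERV1 / MER41A-int / 102
LTR/ERV1 / MER4A1-int / 50
LTR/ERV1 / MER83B-int / 110
MER
SINE/Alu / AluYa5 / 233
SINE/Alu / AluYb8 / 372
SINE/Alu / AluYb9 / 71
DNA/MER2 / Tigger3 / 4267
LTR/ERV1 / LTR49-int / 1123
LTR/MaLR / MLT1E-int / 05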

Human-specific characteristics have been acquired during the 5 million years since the divergence between Pan and Homo.
Figure: phylogeny of Hominidae along a time axis: Pongo (orangutan), Gorilla, Pan (chimpanzee), Homo (human); 5-6 MYa marked.

Homo       ACGTGTTTGAAATATTACTGATTGTAA
Pan        ACGAGTTTGAAATATTATTGATTGTAA
Gorilla    ACGTGTTTGAATCATTATTGATTGTAA
Orangutan  ACGTGTTTAAATTATTATTGGTTGCAA
LCA        ACGTGTTTGAAATATTATTGATTGTAA
(LCA: the last common ancestor)
Figure: tree of Pongo (orangutan), Gorilla, Pan (chimpanzee) and Homo along a time axis, with the LCA and the outgroup marked.
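The alignment above can be read as an outgroup comparison: where Homo and Pan differ, the base shared by Gorilla and Orangutan is taken as ancestral, and the change is assigned to the lineage that departs from it. The following is a minimal sketch of that reasoning only. The function names and the simple "all outgroups must agree" rule are assumptions for illustration, not the method used in the original analysis.

```python
# Sequences copied from the slide.
HOMO      = "ACGTGTTTGAAATATTACTGATTGTAA"
PAN       = "ACGAGTTTGAAATATTATTGATTGTAA"
GORILLA   = "ACGTGTTTGAATCATTATTGATTGTAA"
ORANGUTAN = "ACGTGTTTAAATTATTATTGGTTGCAA"
LCA_SLIDE = "ACGTGTTTGAAATATTATTGATTGTAA"


def classify_differences(homo, pan, outgroups):
    """Assign each Homo/Pan difference to a lineage using the outgroups.

    Returns (1-based position, Homo base, Pan base, lineage) tuples.
    Assumed rule: if every outgroup carries the Pan base, the change is
    placed on the Homo lineage, and vice versa; otherwise 'ambiguous'.
    """
    results = []
    for i, (h, p) in enumerate(zip(homo, pan)):
        if h == p:
            continue
        out_bases = {seq[i] for seq in outgroups}
        if out_bases == {p}:
            lineage = "Homo"
        elif out_bases == {h}:
            lineage = "Pan"
        else:
            lineage = "ambiguous"
        results.append((i + 1, h, p, lineage))
    return results


def infer_lca(homo, pan, outgroups):
    """Infer the Homo/Pan last common ancestor, column by column.

    Columns where Homo and Pan agree keep that base; otherwise the base
    shared by all outgroups is used if it matches one of the two,
    and 'N' marks columns that cannot be resolved this way.
    """
    lca = []
    for i, (h, p) in enumerate(zip(homo, pan)):
        if h == p:
            lca.append(h)
            continue
        out_bases = {seq[i] for seq in outgroups}
        if len(out_bases) == 1 and out_bases <= {h, p}:
            lca.append(next(iter(out_bases)))
        else:
            lca.append("N")
    return "".join(lca)


DIFFS = classify_differences(HOMO, PAN, [GORILLA, ORANGUTAN])
INFERRED_LCA = infer_lca(HOMO, PAN, [GORILLA, ORANGUTAN])
```

On these sequences the sketch places the difference at position 4 on the Pan lineage and the one at position 18 on the Homo lineage, and the inferred ancestor matches the LCA row of the slide.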

IN/DEL examination based on 10,292,002 finished sequences (RIKEN)
Samples: human, chimpanzee, gorilla, orangutan
Table rows: total; PCR primers designable; good amplification for both*; insertion to the human sequence; insertion to the chimp sequence
* Positive amplification found for both chimp and human template DNA

Example 1: deletion in the human lineage
Example 2: insertion in the human lineage
Example 3: deletion in the chimp lineage
Example 4: allelic deletion in the chimp lineage
Lane labels: Pt, Hs, Gg, Pp

- 284 genes
  - 223 known
  - 19 novel CDS
  - 25 novel transcripts
  - 12 putative
  - 5 predicted
- 85 pseudogenes

- We lacked information for 6 genes located in sequencing gaps
- 6 hsa21 genes are absent from the ptr22 sequence (H2BFS, 5 KAP genes from the 21q22.1 cluster)
- 4 hsa21 genes appear to be pseudogenes in chimp
- 3 ptr22 pseudogenes are absent from the hsa21 sequence
- 1 hsa21 pseudogene has a complete ORF in ptr22

- 83% of genes have at least one amino acid replacement
- 10% of the potential ptr22 proteins are predicted to have a different length
  - Amino acid insertion or deletion
  - Different start codon
  - Different stop codon
  - Other, more complex rearrangement

- Shorter in chimp: ADAMTS5
- Longer in chimp: C21orf30

Splice-site diversity
- 17 bp deletion in chimpanzee
- Human and chimpanzee splice sites are different

Figure: sequence identity of the human chr21 genes, ordered according to their chromosomal position. Labeled genes: C21orf71, C21orf9, TCP10L, C21orf96, FLJ32835.

Human-specific replacements 1. KIAA COL6A2 3. HUNK 4. AGPAT3 5. DSCR3 6. PWP2H 7. STCH 8. SLC5A3 9. CHAF1B 10. SIM2 11. KCNE2 12. APP 13. C21orf C21orf IFNAR1 16. UBASH3A 17. TMPRSS3 18. DSCR1 19. C21orf7 20. ADARB1 21. TSGA2 22. IFNAR2 23. C21orf KCNE1 25. C21orf2 26. C21orf ATP5A 28. CLDN8 29. C21orf DNMTA1 Chimp-specific replacements 1. BACE2 2. TIAM1 3. BACH1 4. FAM3B 5. C21orf33 6. ADAMTS1 7. C21orf ITGB2 9. HLCS 10. DNMT3L 11. IFNGR2 12. PPIA3L 13. C21orf MRPL CLDN KRTAP CCT8 18. DSCR2 19. TFF2 20. BTG3 21. HSF2BP 22. C21orf115

Chimpanzee Sequencing & Analysis Consortium. Nature (2005) 437:69-87

Correlate phenotype with genotype. Using Affymetrix arrays, it could be shown that the amount of transcript per gene varies in a species-specific manner (Enard et al. 2001). Which DNA sequence differences are responsible for the observed differences in transcript levels?

Figure labels: transcription start site (TSS), promoter, enhancer, 5' UTR, 3' UTR; transcriptional control; RNA stability.

Figure labels: annotated genes; detected genes; upregulated (in human); downregulated (in human). 237 genes annotated for chromosome represented on the Affymetrix A-E arrays

189 annotated genes represented on the Affymetrix A-E arrays (Hellmann, Pääbo)

- Identifying cis-regulatory elements in the human genome is a major challenge of the post-genomic era
  - Promoters and enhancers that regulate gene expression in normal and diseased cells and tissues
- Inter-species sequence comparisons have emerged as a major technique for identifying human regulatory elements
  - Particularly those to the sequenced mouse, chicken and fish genomes
- A significant fraction of empirically defined human regulatory modules
  - Too weakly conserved in other mammalian genomes, such as the mouse, to distinguish them from nonfunctional DNA
  - Completely undetectable in nonmammalian genomes
- Identification of such significantly divergent functional sequences will require complementary methods in order to complete the functional annotation of the human genome
- Deep intra-primate sequence comparison is a novel alternative to the commonly used distant species comparisons

Non-coding sequences with primate-specific conservation include three regulatory elements

Nature (2003) 424:

Conjoined gene A-B: a fused transcript formed by combining the exons of two or more distinct genes (child genes).
- Transcript A-B combines at least one exon (complete or partial overlap) from both Gene A and Gene B
- Usually only supported by a few mRNA/EST sequences, and rarely by a CCDS
- Currently, about 32 known cases found by searching NCBI Entrez (including 8 from chr 11 recently submitted by our group)
Figure labels: child gene A, child gene B, conjoined gene A-B; exon, intron.
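The definition above (a transcript sharing at least one exon, by complete or partial overlap, with each of two child genes) can be sketched as an interval-overlap test. This is only an illustration: the coordinates are hypothetical, the function names are made up here, and the sketch assumes that all exons lie on the same chromosome and strand and that the two child genes do not overlap each other.

```python
from typing import List, Tuple

# An exon as (start, end), inclusive coordinates on one chromosome and strand.
Exon = Tuple[int, int]


def overlaps(a: Exon, b: Exon) -> bool:
    """True if two exons overlap completely or partially."""
    return a[0] <= b[1] and b[0] <= a[1]


def shares_exon(transcript: List[Exon], gene: List[Exon]) -> bool:
    """True if any transcript exon overlaps any exon of the gene."""
    return any(overlaps(t, g) for t in transcript for g in gene)


def is_conjoined(transcript: List[Exon],
                 gene_a: List[Exon],
                 gene_b: List[Exon]) -> bool:
    """Apply the slide's criterion: at least one exon from both child genes."""
    return shares_exon(transcript, gene_a) and shares_exon(transcript, gene_b)


# Hypothetical coordinates, for illustration only.
GENE_A = [(100, 200), (300, 400)]
GENE_B = [(1000, 1100), (1300, 1400)]
FUSED_TRANSCRIPT = [(100, 200), (350, 400), (1000, 1100)]   # exons from A and B
A_ONLY_TRANSCRIPT = [(100, 200), (300, 400)]                # exons from A only

FUSED_IS_CONJOINED = is_conjoined(FUSED_TRANSCRIPT, GENE_A, GENE_B)
A_ONLY_IS_CONJOINED = is_conjoined(A_ONLY_TRANSCRIPT, GENE_A, GENE_B)
```

A real screen would also need splice-aware alignment of the mRNA/EST evidence to the genome; that step is outside this sketch.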

Chr1 SRP9 – EPHX1 fusion (1 EST evidence-DA417873) Alternate splicing and novel exons observed in fused mRNA

Number of mRNAs examined: 456 (326 conjoined genes)
At least one exon* from both child genes conserved in:
- Chimpanzee mRNAs: 125 (69 conjoined genes)
- Mouse mRNAs: 30 (15 conjoined genes)
- Both chimpanzee and mouse mRNAs: 25 (11 conjoined genes)
27% conjoined genes conserved in chimpanzee
6.5% conjoined genes conserved in mouse
* Exons considered were part of conjoined gene mRNAs
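As a worked check of the conservation fractions, the counts on this slide can be turned into percentages at both the mRNA level and the conjoined-gene level. The sketch below does only that arithmetic; the dictionary layout and function name are assumptions for illustration. On these counts, the quoted 27% and 6.5% appear to correspond to the mRNA-level ratios (125/456 and 30/456) rather than the gene-level ones, though the source does not say which denominator was intended.

```python
# (mRNAs, conjoined genes) as given on the slide.
COUNTS = {
    "examined": (456, 326),
    "chimpanzee": (125, 69),
    "mouse": (30, 15),
    "both": (25, 11),
}


def percent_conserved(category):
    """Return (mRNA-level %, gene-level %) for a category, rounded to 0.1."""
    mrnas, genes = COUNTS[category]
    total_mrnas, total_genes = COUNTS["examined"]
    return (round(100 * mrnas / total_mrnas, 1),
            round(100 * genes / total_genes, 1))


CHIMP_PCT = percent_conserved("chimpanzee")
MOUSE_PCT = percent_conserved("mouse")
BOTH_PCT = percent_conserved("both")
```

With these inputs the chimpanzee figures come out at 27.4% of mRNAs and 21.2% of conjoined genes, and the mouse figures at 6.6% and 4.6%.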

RIKEN
Yoshiyuki Sakaki, Tulika P. Srivastava, Vineet K. Sharma, Asao Fujiyama, Masahira Hattori, Atsushi Toyoda, Yoko Kuroki, Yasushi Totoki, Hideki Noguchi, Hidemi Watanabe, Takehiko Itoh (MRI)
Chimpanzee Chr 22 Sequencing Consortium
- Chinese National Human Genome Center at Shanghai, China
- KRIBB Genome Research Center, Daejeon, Korea
- National Yang Ming University Genome Research Center, Taipei, Taiwan
- National Institute of Genetics, Mishima, Japan
- RIKEN Genomic Sciences Center, Yokohama, Japan
- GBF, Dept. of Genome Analysis, Braunschweig, Germany
- Institute for Molecular Biotechnology, Jena, Germany
- Max-Planck Institute for Molecular Genetics, Berlin, Germany