

Genome organization & its genetic implications
Lander, ES (2011) Initial impact of the sequencing of the human genome. Nature 470:187
Feuillet, C, JE Leach, J Rogers, PS Schnable, K Eversole (2011) Crop genome sequencing: lessons and rationales. Trends Plant Sci 16:77

DNA sequencing technologies
                      First gen (Sanger)    Next gen (454/Illumina/APG)
Read length           800 bases             ... bases
Speed                 0.1 Gb/day            1-5 Gb/day
Cost / human genome   $70,000,000           $75,000-$250,000
Metzker, M (2010) Sequencing technologies – the next generation. Nature Rev Genet 11:31

What are the challenges for the correct assembly of genome sequence information?
Genome size
  Eukaryotic genomes ~ 10^9 – ... bp
Genome composition
  Eukaryotic genomes ~ 50% repetitive DNA

Genome size – the C-value paradox
[Figure: genome size in base pairs]

Genome size – the C-value paradox
The amount of DNA in the haploid cell of an organism is not related to its evolutionary complexity or number of genes.

Genome composition
Complexity = length in nucleotides of the longest non-repeating sequence that can be formed by splicing together all unique sequence in a sample
Eukaryotic genomes contain different classes of DNA based on sequence complexity:
  highly repetitive
  middle repetitive
  unique

Genome composition – DNA re-association kinetics
[Figure axis labels: complexity; [moles of nucleotide / liter] x sec]

Genome composition – DNA re-association kinetics for a complex eukaryotic genome
[Figure: re-association curve with highly repetitive, middle repetitive and single copy sequence components; axis: [moles of nucleotide / liter] x sec]
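The shape of a re-association curve can be sketched with the ideal second-order rate law, in which the fraction of DNA still single-stranded is 1 / (1 + k x C0t), where C0t is the product of the starting nucleotide concentration and time. The sketch below assumes that idealized law; the function names, component fractions and rate constants are hypothetical illustration values, not data from the slides.

```python
# Minimal sketch of ideal second-order re-association kinetics.
# All component fractions and rate constants below are made-up values.

def fraction_single_stranded(cot, k):
    """Fraction of DNA still single-stranded at a given C0t value."""
    return 1.0 / (1.0 + k * cot)

def genome_curve(cot, components):
    """Sum the contributions of several sequence classes.

    components: list of (fraction_of_genome, apparent_rate_constant).
    """
    return sum(f / (1.0 + k * cot) for f, k in components)

# Hypothetical three-component genome: repeats re-associate fastest.
COMPONENTS = [
    (0.25, 1000.0),  # highly repetitive
    (0.25, 1.0),     # middle repetitive
    (0.50, 0.001),   # single copy
]

if __name__ == "__main__":
    for exponent in range(-4, 5):
        cot = 10.0 ** exponent
        print(f"C0t = 1e{exponent:+d}  single-stranded = "
              f"{genome_curve(cot, COMPONENTS):.3f}")
```

Under this law a component is half re-associated when C0t equals 1/k, so sequence classes with larger rate constants (more copies per genome) drop out of the single-stranded pool at lower C0t values.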

From genome composition to genome organization How are unique, middle repetitive and highly repetitive sequences organized in the genome?

Genome organization
[Figure: arrangement of genes and repeats in E. coli, S. cerevisiae, H. sapiens and Z. mays; labels: gene island, gene desert]

Genetic complexity
Eukaryotic genomes contain ~ 20,000 – 30,000 genes
30% of protein coding genes are members of gene families
  duplication & divergence of sequence & gene function

Gene complexity
What does a gene look like from a sequence or transcript perspective?
  no "typical gene"
Introns and exons
  introns can be numerous and long, i.e. some genes are more intron than exon!
  alternative splicing variants are common
Not all genes encode proteins
  non-coding structural RNAs (e.g. rRNA, tRNA, snRNA, snoRNA)
  non-coding regulatory RNAs (e.g. miRNA, lncRNA)

Implications of gene and genetic complexity
Forward genetics: Have mutant – want gene
Via map-based cloning:
  Map your mutation
  Look at the genome sequence in the map interval to identify candidate genes
Candidate gene identification may not be trivial, even with good genome annotation!
  Especially an issue for plant genome sequences – only Arabidopsis and rice are considered "finished" quality
Note: further genetic tests are required, even if the perfect candidate is identified.

Gene identification - open reading frames
5'atgcccaagctgaatagcgtagaggggttttcatcatga
frame 1  atg ccc aag ctg aat agc gta gag ggg ttt tca tca tga
          M   P   K   L   N   S   V   E   G   F   S   S   *
frame 2  tgc cca agc tga ata gcg tag agg ggt ttt cat cat ga
          C   P   S   *   I   A   *   R   G   F   H   H
How to tell real ORFs from random chance ORFs?
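The frame-by-frame reading of the example sequence can be reproduced with a short script. This is a minimal sketch, assuming the standard genetic code and scanning the forward strand only; the helper names (`translate`, `find_orfs`, `chance_open`) and the `min_codons` cutoff are illustrative choices, not part of the slides. `chance_open` gives a rough answer to the chance-ORF question under the simplifying assumption of equal base frequencies, where 3 of the 64 codons are stops.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq, frame=0):
    """Translate one forward reading frame (0, 1 or 2); '*' marks a stop."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(frame, len(seq) - 2, 3))

def find_orfs(seq, min_codons=10):
    """Return (frame, start, end) for ATG...stop stretches on the forward strand."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE[codon] == "*":
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs

def chance_open(n_codons):
    """Probability that n random codons contain no stop (equal base frequencies)."""
    return (61 / 64) ** n_codons

SEQ = "atgcccaagctgaatagcgtagaggggttttcatcatga"

if __name__ == "__main__":
    print(translate(SEQ, 0))   # frame 1 on the slide
    print(translate(SEQ, 1))   # frame 2 on the slide
    print(find_orfs(SEQ))
```

Long open frames are unlikely to arise by chance under this model, which is one reason length cutoffs are a common first filter; as the tarsal-less example shows, a cutoff will miss genuinely translated short ORFs.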

Gene identification - short ORFs can be translated!
e.g. the Drosophila tarsal-less gene
Galindo et al. PLoS Biol 5(5): e106 doi: /journal.pbio

Gene identification – database searching e.g.

Gene identification – shared synteny
Preserved localization of genes on chromosomes of different species
e.g. mouse chromosome 11 and parts of 5 different human chromosomes
  Perfect correspondence in order, orientation and spacing of 23 putative genes, and 245 conserved sequence blocks in noncoding regions
Caution! Even regions of high synteny may not show perfect gene-for-gene correspondence
From Gibson & Muse (2002) A Primer of Genome Science, Sinauer Inc.

Gene identification – shared synteny Preserved localization of genes on chromosomes of different species e.g. maize – sorghum (G) - rice (H) Schnable et al. Science 326:1112

Gene identification – promoter elements
TATA-box elements
  5'-TATAAA-3' or variant
  plant and animal promoters
CpG islands
  Regions of higher than expected CpG dinucleotide content, unmethylated in active promoters
  ~ 40% of mammalian promoters
  ~ 70% of human promoters
  but NOT in plant promoter regions
Y patch (pyrimidine-rich patch)
  plant, not mammalian, promoters
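"Higher than expected CpG content" is usually quantified as an observed/expected ratio: the number of CG dinucleotides multiplied by the sequence length, divided by the product of the C and G counts. The sketch below assumes that commonly used definition and also includes a literal search for the TATAAA consensus; the function names and test sequences are hypothetical, and real island callers add window-size and GC-content criteria that are not shown here.

```python
# Minimal sketch: CpG observed/expected ratio and a literal TATA-box search.
# Function names and example sequences are illustrative only.

def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # no CpG is possible without both bases
    return seq.count("CG") * len(seq) / (c_count * g_count)

def find_tata(seq, motif="TATAAA"):
    """Return 0-based start positions of exact matches to the motif."""
    seq = seq.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions

if __name__ == "__main__":
    print(cpg_obs_exp("CGCGCGCG"))
    print(cpg_obs_exp("CCCCGGGG"))
    print(find_tata("ggcTATAAAgcgc"))
```

The exact-match search does not capture the "or variant" case noted on the slide; position weight matrices are the usual way to score variants.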

Gene identification – introns & exons Long gene space more intron than exon Extreme example - human clotting factor VIII gene

Gene identification – alternative splicing variants Pistoni et al. RNA Biol 7:441

Gene identification – trans-splicing Gingeras, Nature 461: 206

Gene identification – non-coding RNAs
non-coding structural RNAs
  rRNA & tRNA – transcription & translation
  snoRNA – small nucleolar RNAs; guide chemical modification of rRNAs & tRNAs
  snRNA – small nuclear RNAs; guide splicing reactions
non-coding regulatory RNAs
  miRNA & siRNA – small interfering RNAs; RNAi pathway
  lncRNA – long noncoding RNAs

Origins of long non-coding RNAs Kapranov, Nature Rev Genet 8:413 Overlapping transcriptional architecture e.g. the human phosphatidylserine decarboxylase (PISD) gene

Functions of lncRNAs
Wilusz et al. Genes Dev. 23: 1494–1504

Genome - Transcriptome - Proteome
Genome: full complement of an organism's hereditary information
Transcriptome: full set of RNA molecules, coding and non-coding, transcribed from the genome
Proteome: full set of proteins expressed from a genome
Not a 1:1:1 correspondence

Implications of gene and genetic complexity What is the take-home message for forward genetics?

Implications of gene and genetic complexity
Reverse genetics: Have gene – want phenotype
  Predict phenotypes based on gene function in other organisms
  Knock out or knock down your gene of interest & look for corresponding changes in phenotype

Gene families
Gene duplication followed by:
  Duplication of gene function
  Divergence of gene function
  Loss of gene function leading to a pseudogene
e.g. human globin gene family

Gene families
Gene duplication followed by:
  Duplication of gene function
  Divergence of gene function
  Loss of gene function leading to a pseudogene
e.g. human beta-globin gene cluster, chromosome 11
  Five functional genes and two pseudogenes

Gene families – paralogs & orthologs
Homologs
  Protein or DNA sequences having shared ancestry
Orthologs
  Homologs created by a speciation event
  May or may not retain the same function!
Paralogs
  Homologs created by a gene duplication event
  May or may not retain the same function!
It is not always easy or possible to distinguish orthologs from paralogs when comparing genes or proteins between species

Gene families – paralogs & orthologs globin gene paralogs

Gene families – paralogs & orthologs orthologs paralogs orthologs Storz et al. IUBMB Life 63:313

Implications of gene and genetic complexity
What are the implications of gene families for forward genetics (i.e. looking for candidate genes that condition a mutant phenotype)?
What are the implications of gene families for reverse genetics (i.e. altering gene function and looking for a phenotype)?

Genome organization – repeated sequences
~ 50% of the genome
Segmental duplications and copy number variation
Tandemly repeated genes
  rRNA, tRNA and histone gene products needed in large amounts
Duplicated gene families
Transposons
Tandem simple sequence repeats
  centromeric & telomeric repeats
  minisatellites
  microsatellites

Repeated sequences – segmental duplications & copy number variants
Segmental duplications
  > 1 kb block of duplicated sequence with > 90% sequence identity
  recombine to mediate further copy number variants
Koszul & Fischer, C.R. Biologies 332:254

Repeated sequences – segmental duplications & copy number variants

Copy number variant (CNV)
  Deviation from diploid copy number at a locus
Copy number polymorphism (CNP)
  CNV present in >1% of a population
Recent association with human developmental syndromes
Girirajan et al. Annu Rev Genet 45:203
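The CNV and CNP definitions translate directly into a frequency check. The sketch below is illustrative only: it assumes copy number has already been called per individual at one autosomal locus, treats the 1% threshold as a fraction of individuals, and uses hypothetical function names and made-up sample data.

```python
# Minimal sketch of the CNV / CNP definitions applied to per-individual calls.
# Sample data and function names are hypothetical.

def cnv_frequency(copy_numbers, expected=2):
    """Fraction of individuals whose copy number deviates from the diploid value."""
    if not copy_numbers:
        raise ValueError("need at least one individual")
    variant = sum(1 for c in copy_numbers if c != expected)
    return variant / len(copy_numbers)

def is_cnp(copy_numbers, threshold=0.01, expected=2):
    """True if the CNV is present in more than `threshold` of the sample."""
    return cnv_frequency(copy_numbers, expected) > threshold

if __name__ == "__main__":
    rare = [2] * 199 + [1]           # one deletion carrier in 200
    common = [2] * 98 + [3, 1]       # two variant carriers in 100
    print(cnv_frequency(rare), is_cnp(rare))
    print(cnv_frequency(common), is_cnp(common))
```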

Transposon-derived repeated sequences ~ 45% of human & 85% of maize genome

Transposon-derived repeated sequences
Many are truncated & inactive
Considered to be important in the evolution of genome organization & function
Gogvadze & Buzdin Cell Mol Life Sci 66:3727

Repeated sequences – short tandem repeats
Centromeric
  Long array (~100,000 bp) of short tandem repeats
  ~ 5 bp Drosophila, ~ 150 bp maize, ~ 170 bp human
  not conserved across species
  in some cases not even conserved in all chromosomes of the same species
  Association with a centromere-specific histone H3
Telomeric
  Length varies between species: ~ 300 base pairs to ... kilobase pairs
  Conserved, G-rich repeat sequence
  vertebrates TTAGGG; most plants TTTAGGG

Repeated sequences – short tandem repeats
Minisatellites (Variable number tandem repeats, VNTRs)
  ... bp repeat units
  ...,000 bp arrays
  The original DNA fingerprinting marker via Southern blotting
  Now supplanted by microsatellites

Repeated sequences – short tandem repeats
Microsatellites (Simple sequence repeats, SSRs)
  Di, tri or tetra-nucleotide repeats; 1-10 repeat units per locus
  Repeat numbers expand or contract over a short evolutionary, or even generational, time-frame
  Amplified by PCR
    Primers based on unique flanking sequence
    Products fractionated by capillary or acrylamide gel electrophoresis
  Co-dominant mapping & fingerprinting markers
    Both alleles can be detected in a heterozygous individual
[Figure: variety A [CACACACA] / [GTGTGTGT]; variety B [CACA] / [GTGT]]
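The variety A / variety B contrast can be illustrated by scanning sequences for di-, tri- and tetra-nucleotide tandem repeats. This is a minimal sketch using a regular expression with a back-reference; the function name, the `min_repeats` parameter and the flanking bases around the CA repeats are hypothetical additions for illustration.

```python
import re

def find_ssrs(seq, min_repeats=2):
    """Find tandem repeats of 2-4 bp units.

    Returns a list of (start, unit, repeat_count), skipping runs of a
    single base such as AAAA.
    """
    seq = seq.upper()
    # Lazy {2,4}? prefers the shortest unit, so CACACACA is reported as (CA)4.
    pattern = re.compile(r"([ACGT]{2,4}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # homopolymer run, not a di/tri/tetra-nucleotide repeat
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits

# Hypothetical alleles: same flanking sequence, different CA repeat number.
VARIETY_A = "GGATCACACACATTG"
VARIETY_B = "GGATCACATTG"

if __name__ == "__main__":
    print(find_ssrs(VARIETY_A))
    print(find_ssrs(VARIETY_B))
    # PCR products from primers in the flanks would differ by this many bp:
    print(len(VARIETY_A) - len(VARIETY_B))
```

Because the flanking sequence is shared, the two alleles differ only in length, which is what electrophoresis resolves and why both alleles are visible in a heterozygote.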