chromosome organization, what about genome organization?


We have talked about chromosome organization. What about genome organization?

Eukaryotic genomes are complex and DNA amounts and organization vary widely between species.

C-value paradox: the amount of DNA in the haploid genome of an organism is not related to its evolutionary complexity or number of genes.

There are different classes of eukaryotic DNA based on sequence complexity.

Reassociation Kinetics

3 Main Components in Eukaryotic Genomes

The human genome
- Two versions of the human genome sequence were published in February 2001
- DNA sequences that encode proteins make up only 5% of the genome
- ~50% of the sequence consists of transposable elements
- Clusters of gene-rich regions are separated by gene deserts
- Chromosome 19 has the highest gene density; chromosomes 13 and Y show the lowest

The human genome
- Total gene count is estimated at 30,000-40,000, with an average gene size of 27 kb
- Hundreds of genes share homology with those of bacteria
- The number of introns varies greatly (from 0 for histone genes to 234 for titin)

The human genome
- Genes are larger and contain more and larger introns than those in invertebrates (the dystrophin gene is 2.5 Mb)
- Genes are not evenly spaced on chromosomes
- The most common genes include those involved in nucleic acid metabolism (7.5%), receptors (5%), protein kinases (2.8%) and cytoskeletal structural proteins (2.8%)

Genome organization in plants
- Genome size varies widely (100 Mb-5,500 Mb)
- Many tandem gene duplications and larger duplications; some interchromosomal duplications also observed
- Large-genome plants also have clustered genes separated by long stretches of intergenic DNA
- In maize, the intergenic sequences are composed mainly of transposons

Single Copy Sequences

Genes can be difficult to identify/predict. Why?

The human genome turns out to have only 30,000 to 40,000 genes, less than half of the 100,000 that had been predicted. Why? For comparison: Drosophila, 13,000; nematode, 19,000.

Problems? It is more complicated than that.
- Some gene products are RNA (tRNA, rRNA, others) instead of protein
- Some nucleic acid sequences that do not encode gene products (noncoding regions) are necessary for production of the gene product (protein or RNA)

Coding region

Noncoding regions
- Regulatory regions
- RNA polymerase binding site
- Transcription factor binding sites
- Introns
- Polyadenylation [poly(A)] sites

Unique genes

Promoters
- Sequences can be quite distant from the coding region

Introns/exons
- Most eukaryotic genes have introns
- Introns are often much longer than exons
- Often many introns
- mRNA is much shorter than genomic DNA
- Can vary between the same gene in different species

Splice sites
- Eukaryotes only
- Removal of internal parts of the newly transcribed RNA
- Takes place in the cell nucleus
- Splice sites are difficult to predict

Alternative splicing
- Different splice patterns from the same sequence, therefore different products from the same gene

Alternative splicing
- Multiple promoters
- Multiple terminators
- Alternatively spliced introns
- 59% of genes
- Average of ~3 forms

Exon Shuffling

Why genome size doesn't matter
- More sophisticated regulation of expression?
- Proteome vastly larger than genome? Alternative splicing, RNA editing
- Posttranslational modifications?
- Cellular location?
- Moonlighting

Gene identification
- Open reading frames
- Sequence conservation: database searches, synteny
- Sequence features: CpG islands
- Evidence for transcription: ESTs, microarrays, SAGE
- Gene inactivation: transformation, TEs, RNAi

Open reading frames
5' atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa 3'
Frame 1:
atg ccc aag ctg aat agc gta gag ggg ttt tca tca ttt gag gac gat gta taa
 M   P   K   L   N   S   V   E   G   F   S   S   F   E   D   D   V   *
Frame 2:
tgc cca agc tga ata gcg tag agg ggt ttt cat cat ttg agg acg atg tat
 C   P   S   *   I   A   *   R   G   F   H   H   L   R   T   M   Y
Frame 3:
gcc caa gct gaa tag cgt aga ggg gtt ttc atc att tga gga cga tgt ata
 A   Q   A   E   *   R   R   G   V   F   I   I   *   G   R   C   I
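The three-frame translation on this slide can be reproduced with a short script. This is a minimal sketch, not part of the original lecture: it assumes the standard genetic code, handles only the three forward frames (the reverse strand is ignored), and the names `CODON_TABLE` and `translate_frame` are illustrative.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_frame(dna, frame):
    """Translate one forward reading frame (frame = 1, 2 or 3).

    Stop codons are written as '*'; trailing bases that do not fill
    a complete codon are dropped. Input must contain only A, C, G, T.
    """
    dna = dna.upper()
    start = frame - 1
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    seq = "atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa"
    for frame in (1, 2, 3):
        print(frame, translate_frame(seq, frame))
```

Only frame 1 runs from the start codon to the final stop without interruption; frames 2 and 3 hit stop codons early, which is the signal an ORF search looks for.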

Database searches

Synteny

CpG islands
- CpG is subject to methylation, and most eukaryotes (not Drosophila) show fewer CpG dinucleotides than base composition would predict
- Concentrations of CpG may be detected using restriction enzymes whose recognition sequences include CpG

CpG islands Defined as regions of DNA of at least 200 bp in length that have a G+C content above 50% and a ratio of observed vs. expected CpGs close to or above 0.6. Used to help predict gene sequences, especially promoter regions.
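The three criteria above can be checked for a candidate sequence with a few lines of code. This is a sketch under stated assumptions: the slide does not say how the expected CpG count is computed, so the common convention (number of C times number of G, divided by sequence length) is assumed here, and the function names are illustrative.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA string.

    Assumption: expected CpG count = (#C * #G) / length, a widely used
    convention that is not stated on the slide itself.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    observed = seq.count("CG")  # CpG dinucleotides on this strand
    gc_fraction = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0
    ratio = observed / expected if expected else 0.0
    return gc_fraction, ratio


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the slide's thresholds: >= 200 bp, G+C > 50%, obs/exp >= 0.6."""
    gc_fraction, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and ratio >= min_ratio
```

In practice such a test would be applied to a sliding window along a chromosome rather than to one fixed sequence; the window step is a design choice left out of this sketch.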

Evidence for transcription
- cDNAs, ESTs (expressed sequence tags)
- Microarrays

Gene families
- E.g. globins, actin, myosin
- Clustered or dispersed
- Pseudogenes

Pseudogenes
- Nonfunctional copies of genes
- Formed by duplication of an ancestral gene, or by reverse transcription (and integration)
- Not expressed due to mutations that produce a stop codon (nonsense or frameshift) or prevent mRNA processing, or due to lack of regulatory sequences

Duplicated genes
- Encode closely related (homologous) proteins
- Formed by duplication of an ancestral gene followed by mutation
- Five functional genes and two pseudogenes

Coding sequences less than 5% of the genome!

Noncoding RNAs
- Do not have translated ORFs
- Small
- Not polyadenylated

Noncoding RNAs
- Transfer RNAs: < 500
- Ribosomal RNAs: tandem arrays on several chromosomes
- Small nucleolar RNAs (snoRNAs): single genes
- Small nuclear RNAs (snRNAs): spliceosomes; multiple dispersed copies; many pseudogenes

Some noncoding sequences are being found to be highly evolutionarily conserved across diverse species over millions of years. Some of them are in “gene deserts”. They must have a function to be maintained. What is it?

Repetitive DNA
- Moderately repeated DNA: tandemly repeated rRNA, tRNA and histone genes (gene products needed in high amounts); large duplicated gene families; mobile DNA
- Simple-sequence DNA: tandemly repeated short sequences; found in centromeres and telomeres (and others); used in DNA fingerprinting to identify individuals

Segmental duplications
- Found especially around centromeres and telomeres
- Often come from nonhomologous chromosomes
- Many can come from the same source
- Tend to be large (10 to 50 kb)
- Unique to humans?

Repeat sequences – 50% or more of the genome

Mobile DNA
- Moves within genomes
- Most of the moderately repeated DNA sequences found throughout higher eukaryotic genomes
- L1 LINE is ~5% of human DNA (~50,000 copies)
- Alu is ~5% of human DNA (>500,000 copies)
- Some encode enzymes that catalyze movement

Transposon-derived repeats
- Long interspersed elements (LINEs)
- Short interspersed elements (SINEs)
- LTR (long terminal repeat) retrotransposons
- DNA transposons
- 45% or more of the genome

LINEs
- LINE1: active
- LINE2: inactive
- LINE3: inactive
- Many truncated inactive sequences

Exception: Alu elements
- Derived from signal recognition particle 7SL
- Does not share its 3’ end with a LINE
- Only active SINE in the human genome

LTR (long terminal repeat)
- Flank viral retrotransposons and retroviruses
- Repeats contain genes necessary for movement and replication
- Retroviruses have acquired a CP gene
- Many fossils

DNA transposons
- Terminal inverted repeats
- Transposase
- 7 major classes
- Transposition no longer occurs in humans
- Horizontal transfer

Different regions of the genome differ in density of repeats
- Most LINEs accumulate in AT-rich regions
- Alu elements accumulate in GC-rich regions. Why? Promote protein translation under stress?

Simple sequence repeats
- Tandem repeats of a particular k-mer
- 1-13 base repeat unit: microsatellites (e.g. trinucleotide repeats)
- 14-500 base repeat unit: minisatellites, "variable numbers of tandem repeats"
- 3% of the genome
- Used in mapping
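Because a simple sequence repeat is just a k-mer repeated in tandem, short ones can be located with a back-referencing regular expression. This is a minimal sketch, not a method from the lecture: `find_ssrs` is a hypothetical name, the minimum of three copies is an arbitrary illustrative threshold, and only perfect (uninterrupted) repeats are found.

```python
import re


def find_ssrs(seq, max_unit=13, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats.

    max_unit=13 follows the microsatellite range on the slide;
    min_copies=3 is an assumption chosen only for illustration.
    """
    seq = seq.upper()
    # Lazy unit match so the shortest repeating unit is reported
    # (e.g. "A" x 6 rather than "AA" x 3).
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


if __name__ == "__main__":
    # A trinucleotide (CAG) repeat embedded in flanking sequence.
    print(find_ssrs("GGTCAGCAGCAGCAGTT"))
```

Real repeat finders also tolerate mismatches and indels within the array, which a plain regular expression cannot do.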