Yaroslav Ryabov Lognormal Pattern of Exon size distributions in Eukaryotic genomes.

Slides:



Advertisements
Similar presentations
Mutations in DNA Gene Expression Protein Synthesis DNA Patterns of Inheritance
Advertisements

Section 8.6: Gene Expression and Regulation
© 2006 W.W. Norton & Company, Inc. DISCOVER BIOLOGY 3/e
Basic Biology for CS262 OMKAR DESHPANDE (TA) Overview Structures of biomolecules How does DNA function? What is a gene? How are genes regulated?
The origins & evolution of genome complexity Seth Donoughe Lynch & Conery (2003)
Prepared with lots of help from friends... Metsada Pasmanik-Chor, Zohar Yakhini and NUMEROUS WEB RESOURCES. BioInformatics / Computational Biology Introduction.
Phylogenetic Shadowing Daniel L. Ong. March 9, 2005RUGS, UC Berkeley2 Abstract The human genome contains about 3 billion base pairs! Algorithms to analyze.
RNA Ribonucleic Acid.
Gene Structure: DNA RNA Protein Dr. Jason Tasch. Nucleic Acids Sequence of Nucleotides Nucleotide composed of: –Nitrogenous Base Purine Pyrimidine –Sugar.
FROM GENE TO PROTEIN: TRANSCRIPTION & RNA PROCESSING Chapter 17.
How Proteins are Made. I. Decoding the Information in DNA A. Gene – sequence of DNA nucleotides within section of a chromosome that contain instructions.
Biology 1060 Chapter 17 From Gene to Protein. Genetic Information Important: Fig Describe how genes control phenotype –E.g., explain dwarfism in.
Protein Synthesis - The “Stuff of Life”
What must DNA do? 1.Replicate to be passed on to the next generation 2.Store information 3.Undergo mutations to provide genetic diversity.
RNA and Protein Synthesis
RNA AND PROTEIN SYNTHESIS RNA vs DNA RNADNA 1. 5 – Carbon sugar (ribose) 5 – Carbon sugar (deoxyribose) 2. Phosphate group Phosphate group 3. Nitrogenous.
RNA and Protein Synthesis
Chapter 10 Transcription RNA processing Translation Jones and Bartlett Publishers © 2005.
Ch. 21 Genomes and their Evolution. New approaches have accelerated the pace of genome sequencing The human genome project began in 1990, using a three-stage.
Chapter 13. The Central Dogma of Biology: RNA Structure: 1. It is a nucleic acid. 2. It is made of monomers called nucleotides 3. There are two differences.
Molecular Biology in a Nutshell (via UCSC Genome Browser) Personalized Medicine: Understanding Your Own Genome Fall 2014.
8.4 Transcription KEY CONCEPT Transcription converts a gene into a single-stranded RNA molecule.
DNA to Protein – 12 Part one AP Biology. What is a Gene? A gene is a sequence of DNA that contains the information or the code for a protein or an RNA.
Fea- ture Num- ber Feature NameFeature description 1 Average number of exons Average number of exons in the transcripts of a gene where indel is located.
Chapter 17 From Gene to Protein. Gene Expression DNA leads to specific traits by synthesizing proteins Gene expression – the process by which DNA directs.
What’s So Special About DNA? DNA is one of the most boring macromolecules imaginable - its made of only four building blocks and has a perfectly monotonous.
GENETICS UNIT FIVE DAY 1. OPENER If you did your Spring Break Genetics Review HW, take it out and raise your hand in the NEXT 30 SECONDS.
Gene Regulations and Mutations
AP Biology Discussion Notes Friday 02/06/2015. Goals for Today Be able to describe RNA processing and why it is EVOLUTIONARILY important. In a more specific.
Eukaryotic Genomes  The Organization and Control of Eukaryotic Genomes.
ABC for the AEA Basic biological concepts for genetic epidemiology Martin Kennedy Department of Pathology Christchurch School of Medicine.
Protein Synthesis - The “Stuff of Life”
Transcription and mRNA Modification
Genes and How They Work Chapter The Nature of Genes information flows in one direction: DNA (gene)RNAprotein TranscriptionTranslation.
Chapter 10 Opener. Figure 10.1 Metabolic Diseases and Enzymes.
Complexities of Gene Expression Cells have regulated, complex systems –Not all genes are expressed in every cell –Many genes are not expressed all of.
Protein Synthesis -The “Stuff of Life”
Mutations Learning Goal: Identify mutations in DNA (point mutation and frameshift mutation caused by insertion or deletion) and explain how they can affect.
BASIC GENETICS, COMMON TO ALL LIVING THINGS GENOME NUCLEOTIDES CHROMOSOME GENE DNA MUTATION NATURAL SELECTION.
Exam #1 is T 2/17 in class (bring cheat sheet). Protein DNA is used to produce RNA and/or proteins, but not all genes are expressed at the same time or.
Chapter 19 The Organization & Control of Eukaryotic Genomes.
D.N.A Describe how you would go about genetically engineering a bacterium to produce human epidermal growth factor (EGF), a protein used in treating burns.
CFE Higher Biology DNA and the Genome Transcription.
The Central Dogma of Molecular Biology DNA  RNA  Protein  Trait.
DNA TranscriptionTranslation The Central Dogma TraitRNA Protein Molecular Genetics - From DNA to Trait RNA processing.
Protein Synthesis - The “Stuff of Life”. 2 Proteins Proteins are the “workhorse” molecule found in organisms. The blue print for proteins is coded in.
Biotechnology is a field of applied biology and biochemistry, that involves the use of living organisms and bioprocesses in engineering, technology, medicine.
Genetic Code and Interrupted Gene Chapter 4. Genetic Code and Interrupted Gene Aala A. Abulfaraj.
Gene Structure and Regulation. Gene Expression The expression of genetic information is one of the fundamental activities of all cells. Instruction stored.
1 Gene Finding. 2 “The Central Dogma” TranscriptionTranslation RNA Protein.
Gene Expression and Protein Synthesis
Fig Prokaryotes and Eukaryotes
Gene Structure: DNA RNA Protein
Exam #1 is T 9/23 in class (bring cheat sheet).
BTY100-Lec#4.2 DNA to Protein (Central Dogma).
I. Central Dogma "Central Dogma": Term coined by Francis Crick to explain how information flows in cells.
Exam #1 W 9/26 at 7-8:30pm in UTC 2.102A Review T 9/25 at 5pm in WRW 102 and in class 9/26.
Chapter 5 RNA and Transcription
Transcription and Gene Expression
Lecture 6 By Ms. Shumaila Azam
DNA Test Review.
From Gene to Protein.
Transcription Chapter 10 Section 1a.
How Proteins are Made.
Gene Expression I pp
7.2 Transcription & Gene Expression
The Structure of the Genome
Unit 7 Part 2 Notes: From Gene to Protein
Genetics Transcription & Translation.
From gene to protein.
Presentation transcript:

Yaroslav Ryabov Lognormal Pattern of Exon size distributions in Eukaryotic genomes

Outline Global and local approaches in analysis of genetic information Vocabulary of contemporary genetics: Exons and Introns What we can learn from exon size distributions? Lognormal pattern and two clases of exons How we can model exon size distributions of real genomes? What could be the biological reason for observed pattern of exons size distributions? Conclusions

Analyzing Genetic Information Local approach analyzing details of nucleotide sequence Basic Local Alignment Search Tool 1990 Watson and Crick 1953 Marshall Nirenberg 1968 Global approach analyzing properties of entire genome Georg Johann Mendel 1866 Recombination of inherited properties Frequency of mutations Size of genome etc.

Analyzing Genetic Information in Post-Genomic Era More than 60 animal genomes with complete annotations

Analyzing Genetic Information in Post-Genomic Era Back to Global Approach ? More than 60 animal genomes with complete annotations

Prokaryote and Eukaryote pro + karyon before + nucleos eu + karyon good + nucleos

Gene expression DNA Transcription with RNA polymerase mRNA Translation with ribosome Protein

DNA Transcription with RNA polymerase mRNA Exons Introns Splicing Splicing in Eukaryotes

Prokaryote vs Eukaryote Most of DNA code is used to produce some cellular product Substantial fraction “silent” DNA regions ExonsIntrons Short genomes: ~ exons Long genomes: ~ exons Long exons: ~ base pairs Short exons: ~ 100 base pairs

Prokaryote vs Eukaryote Exon size distributions Homo SapiensFlavobacteria Bacterium

Homo Sapiens Flavobacteria Bacterium exons b.p. mean exon size exons 267 b.p. mean exon size What we can learn from exon size distributions ?

Assumption: Probability to split DNA at any location and any time is Constant Consequence: The lengths of intervals between splitting points obey Exponential distribution Poisson Process = Const frequency of splitting

Assumption: Probability to split DNA at any location and any time is Constant or in logarithm scale Consequence: The lengths of intervals between splitting points obey Exponential distribution Poisson Process = Const frequency of splitting

Poisson Process Kolmogoroff Process Lognormal distribution Exponential distribution

Lognormal distribution is a consequence of Central Limit Theorem Sum of random variables Normal distribution Lognormal distribution Product of random variables

Kolmogoroff process (1941)

Assumption: Probability to split any exon is Independent of exon size

Kolmogoroff process (1941) Assumption: Probability to split any exon is Independent of exon size

Kolmogoroff process (1941) Assumption: Probability to split any exon is Independent of exon size

Kolmogoroff process (1941) Assumption: Probability to split any exon is Independent of exon size

Kolmogoroff process (1941) Log (Exon Size) Number of Counts Assumption: Probability to split any exon is Independent of exon size Consequence: The lengths of exons obey Lognormal distribution M 

Real Genomes: Two Lognormal Peaks

Parameters of Two Lognormal peaks model: M location of peak maximum Ryabov & Gribskov, Nucleic Acids Research, 36, (2008)

Parameters of Two Lognormal peaks model:  peak width Ryabov & Gribskov, Nucleic Acids Research, 36, (2008)

Parameters of Two Lognormal peaks model: peak area Narrow and Wide, %

Summary of Observed Facts Exon lengths distributions of studied eukaryotic genomes can be fitted by Two Lognormal Distributions Parameters of those two peaks follow two distinctive patterns: changes of peak width and relative peak area correlate with complexity of species. This may indicate presence of two different classes of exons with different evolutionary histories

How we can model exon size distributions of real genomes ? Growth of total genome length Duplications in genome code Merging exons together (intron loss) Exon loss

any exon of length has probability to be modified during time interval If Parameters of Kolmogoroff Process as model parameters for elementary “exon modification” event average number of exons with sizes and with peak positionpeak width Then the process converges to a lognormal distribution with

distribution Exons splitting (decreasing Exon length) Increasing Exon length Parameters of Q(k) function Exons duplicating

Initial exon size distributionA mockup of a bacterial genome with 500 exons Having 1000 bp mean exons length Modeling exon size distributions in real genomes

time steps for p=0.001 Modeling exon size distributions in real genomes

time steps for p=0.001 Modeling exon size distributions in real genomes

time steps for p=0.001 Modeling exon size distributions in real genomes

time steps for p=0.001 Modeling exon size distributions in real genomes

time steps for p=0.001 Modeling exon size distributions in real genomes

time steps for p=0.001 Modeling exon size distributions in real genomes

What cloud be the biological reason for two exons peaks? Narrow peak Wide peak Number of Counts Log (Exon Size)

Narrow peak Holds approximately Constant position Has approximately Constant width Has greater relative occupation for Complex organisms Wide peak Changes position Width increases with increasing complexity of and organism Has greater relative occupation for Simple organisms What could be the biological reason for two exons peaks?

DNA mRNA Protein Alterative Splicing Untranslated regions of mRNA Introns Two ways of Gene Expression Regulation

DNA mRNA Protein Untranslated regions of mRNA Introns Two ways of Gene Expression Regulation

Conclusions Analysis of global properties of eukaryotic genomes reveals two distinct peaks in statistical distribution of exon sizes The observed peak could be fitted by a sum of two lognormal distributions which may imply that they originated by two different exon splitting pathways described in the general frameworks of Kolmogoroff splitting process Two observed peaks of exons could be correlated with the phenomenon of alternative splicing and with exons contributing into untranslated regions of mRNA. This suggests that the observed separation of exons in two different classes may be originated from two different ways of protein expression regulation.

Acknowledgments Michael Gribskov Alexander Berezhkovskii