1
Plant Genomic Structure
Phillip SanMiguel Director/Purdue Agricultural Genomics Center
2
DNA Sequencing: Machines and Methods
3
“Next Generation” Sequencing Platforms at the Purdue Genomics Core Facility
Roche “454” GS-FLX
  Long read (~400 bp) instrument
  Replacing “Sanger” sequencers (for most high-throughput assays)
Applied Biosystems SOLiD
  Short read (50 bp) instrument
  Mainly for re-sequencing (SNP discovery)
  Also expression analysis
4
Sequencer Niches
Sanger
  Smaller sample sets (e.g., checking a construct)
  Longer reads (~800 bases)
  Most accurate base calls
454
  Most de novo sequencing projects
SOLiD
  Resequencing, especially for SNP discovery
  RNA sequencing (microarray-like expression analysis)
5
Raw Capacity
First Gen Sequencers
  Sanger: roughly 1 million; $1200 per million nt
Second Gen Sequencers
  454: roughly 400 million; $25 per million nt
  SOLiD: roughly 6 billion; $0.50 per million nt
Third Gen Sequencers
  ???
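The per-million-nucleotide prices above turn platform comparison into simple arithmetic. Below is a minimal Python sketch, assuming the slide's approximate prices and deliberately ignoring library preparation, labor, and read-length differences; the function name `sequencing_cost` and the 1 billion nt target are hypothetical.

```python
# Approximate raw-sequence prices from the "Raw Capacity" slide, in USD per million nt.
# Assumption: library preparation, labor, and read-length differences are ignored.
COST_PER_MILLION_NT = {
    "Sanger": 1200.0,
    "454": 25.0,
    "SOLiD": 0.50,
}


def sequencing_cost(total_nt: float, platform: str) -> float:
    """Return the raw sequencing cost in USD for `total_nt` nucleotides on `platform`."""
    return total_nt / 1e6 * COST_PER_MILLION_NT[platform]


if __name__ == "__main__":
    target_nt = 1e9  # hypothetical target: 1 billion nt of raw sequence
    for platform in COST_PER_MILLION_NT:
        print(f"{platform}: ${sequencing_cost(target_nt, platform):,.0f}")
```

Under these assumptions, 1 billion nt of raw sequence costs about $1,200,000 on Sanger, $25,000 on 454, and $500 on SOLiD.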
6
“Library” Construction
No currently existing DNA sequencing instrument can “read” the order of bases in a DNA strand for long stretches.
~1000 bases is the maximum read length (Sanger sequencers).
Two limitations:
  “clone” length
  read length
7
Length Limitations: “Clone” Length
“Cloning” is classically defined as storing a fragment of DNA from one species in a bacterial (E. coli) “host”.
  This also effectively amplifies that fragment.
Next gen sequencing methods use PCR “amplicons” to accomplish the same thing.
Both bacterial clones and amplicons have length limitations.
8
Bacterial (1st gen) Clones
~300 kb for single copy vectors (BACs)
~50 kb for phage-mediated vectors (fosmids, cosmids, phage lambda)
~20 kb for high copy number vectors (pUC, pBluescript, pGEM, etc.)
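Insert size sets how many clones are needed to span a genome. Below is a minimal sketch, assuming the approximate maximum insert sizes above and the 120 Mb Arabidopsis genome size quoted on the C-value slide. It also uses the standard Poisson (Lander–Waterman) estimate that a fraction e^(-c) of the genome is missed at c-fold coverage by randomly placed clones. The function names are hypothetical.

```python
import math

# Approximate maximum insert sizes (bp) from the "Bacterial (1st gen) Clones" slide.
INSERT_SIZE_BP = {
    "BAC": 300_000,
    "fosmid/cosmid/lambda": 50_000,
    "high-copy plasmid": 20_000,
}


def clones_for_coverage(genome_bp: int, insert_bp: int, fold: float = 1.0) -> int:
    """Smallest number of clones whose inserts add up to `fold` genome equivalents."""
    return math.ceil(fold * genome_bp / insert_bp)


def fraction_uncovered(fold: float) -> float:
    """Poisson estimate of the genome fraction missed by randomly placed clones."""
    return math.exp(-fold)


if __name__ == "__main__":
    arabidopsis_bp = 120_000_000  # 120 Mb, from the C-value slide
    for vector, insert_bp in INSERT_SIZE_BP.items():
        n = clones_for_coverage(arabidopsis_bp, insert_bp)
        print(f"{vector}: {n} clones per genome equivalent")
    print(f"missed at 1x: {fraction_uncovered(1.0):.0%}")
```

For Arabidopsis, one genome equivalent is about 400 BACs or 6,000 high-copy plasmid clones. Even then, roughly 37% of the genome would be missed by random clones, which is why libraries are built at several-fold coverage.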
9
Amplicon (2nd gen) Length Limits
Ultimately limited by PCR amplification length limits
  Standard PCR limited to ~2 kb
But amplicons are created millions at a time, under conditions of limited reactants (microreactors)
  ~1 kb limit for 454
  ~0.5 kb limit for SOLiD
10
Read Length
Sanger read lengths max out at 1 kb
  Have heard of >1500 base reads with the ancient LI-COR (Sanger) sequencer
454 read lengths will go to ~ bases within 6 months
SOLiD read lengths: 50 bases
Illumina read length max: ~100 bases
Future: Pacific Biosciences, 10 kb?
11
“Paired ends” and “Mate pairs”
Can read both ends of a fragment of known length
Currently “paired end” denotes a normal fragment library amplicon
  Read both ends
  Span limited to amplicon size
Currently “mate pair” denotes a specialized library construct
  Span not limited to amplicon size
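The paired end versus mate pair distinction comes down to the span a read pair can cover. Below is a minimal sketch, assuming the reads have already been mapped to a reference and reusing the approximate amplicon limits from the "Amplicon (2nd gen) Length Limits" slide; the function names and coordinates are hypothetical.

```python
# Approximate amplicon length limits (bp) from the "Amplicon (2nd gen) Length Limits" slide.
AMPLICON_LIMIT_BP = {"454": 1_000, "SOLiD": 500}


def fragment_span(fwd_start: int, rev_end: int) -> int:
    """Span (bp) of a read pair on the reference, given the leftmost base of the
    forward read and the rightmost base of the reverse read (1-based, inclusive)."""
    if rev_end < fwd_start:
        raise ValueError("reverse read must end at or after the forward read start")
    return rev_end - fwd_start + 1


def needs_mate_pair_library(span_bp: int, platform: str) -> bool:
    """True if the span is longer than a normal fragment ("paired end") amplicon allows."""
    return span_bp > AMPLICON_LIMIT_BP[platform]


if __name__ == "__main__":
    span = fragment_span(fwd_start=10_001, rev_end=13_000)  # hypothetical mapping
    print(span, needs_mate_pair_library(span, "SOLiD"))
```

A 3 kb span, for example, cannot come from a normal SOLiD fragment amplicon, so it would require a mate pair construct.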
12
Plant Genome Structure
13
What is a Genome?
All the DNA (chromosomes) in a cell
Each cell has a complete copy
Exceptions?
14
Useful approximate conversion of base numbers to mass
1 billion (10⁹) bp ~= 1 pg of DNA
1 trillion (10¹²) bp ~= 1 ng of DNA
1 quadrillion (10¹⁵) bp ~= 1 µg of DNA
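These rules of thumb make it easy to move between genome size and DNA mass, for example to estimate how many genome copies go into a reaction. Below is a minimal sketch, assuming the slide's 10⁹ bp per pg approximation by default. A commonly cited, more precise factor is about 0.978 × 10⁹ bp per pg, which can be passed in through the `bp_per_pg` parameter. The function names are hypothetical.

```python
# Rule of thumb from the slide: 1e9 bp of double-stranded DNA weighs about 1 pg.
# A commonly cited, more precise factor is ~0.978e9 bp per pg; pass it via bp_per_pg.
BP_PER_PG = 1e9


def bp_to_pg(bp: float, bp_per_pg: float = BP_PER_PG) -> float:
    """Mass in picograms of `bp` base pairs of DNA."""
    return bp / bp_per_pg


def pg_to_bp(pg: float, bp_per_pg: float = BP_PER_PG) -> float:
    """Number of base pairs in `pg` picograms of DNA."""
    return pg * bp_per_pg


def genome_copies(dna_ng: float, genome_bp: float, bp_per_pg: float = BP_PER_PG) -> float:
    """Number of haploid (1C) genome copies in `dna_ng` nanograms of DNA."""
    return (dna_ng * 1e3) / bp_to_pg(genome_bp, bp_per_pg)


if __name__ == "__main__":
    print(bp_to_pg(120e6))             # Arabidopsis, 120 Mb
    print(genome_copies(1000, 400e6))  # 1 µg of rice DNA, 400 Mb genome
```

With the 400 Mb rice genome from the C-value slide, 1 µg of DNA holds about 2.5 million genome copies.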
15
The C-value Paradox
Arabidopsis 120 Mb
Rice 400 Mb
Sorghum 750 Mb
Maize Mb
16
Small Genome Size Angiosperms
[Figure: bar chart of genome size (Gbp) for small-genome angiosperms]
Cardamine amara "Large Bitter-cress": 0.046
Aesculus hippocastanum "Horse Chestnut": 0.104
Rosa wichuraiana: 0.104
Epilobium palustre: 0.124
Thlaspi alpestre "Penny Cress": 0.124
Arabidopsis thaliana "Thale Cress": 0.145
17
Smallest Genome Size Cereals
[Figure: bar chart of genome size (Gbp, axis 0 to 0.40) for the smallest grass genomes]
Oropetium thomaeum: 0.21
Chloris gayana “Callide Rhodesgrass”: 0.29
Agropyron smithii “Western Wheatgrass”: 0.33
Brachypodium sylvaticum "False Brome": 0.39
Hygroryza aristata: 0.41
Oryza sativa "Rice": 0.41
18
Brachypodium sylvaticum
Cereal Genome Sizes
[Figure: bar chart of genome size (Gbp, linear axis 0 to 14) for Oropetium thomaeum, Chloris gayana, Agropyron smithii, Brachypodium sylvaticum, Hygroryza aristata, Oryza sativa, Sorghum bicolor, Zea mays, Triticum urartu, Lygeum spartum "Esparto grass", and Triticum aestivum]
19
Angiosperm genome sizes, linear scale
[Figure: bar chart of genome size (Gbp, linear axis 0 to 100) for the grass species of the previous slide plus Cardamine amara, Aesculus hippocastanum, Rosa wichuraiana, Epilobium palustre, Thlaspi alpestre, Arabidopsis thaliana, Fritillaria davisii, Trillium rhombifolium, and Fritillaria assyriaca]
20
Angiosperm genome sizes, log scale
[Figure: the same species as the previous slide, plotted on a log axis of genome size (Gbp) with ticks up to 100]
21
The Elements of Genomes That Are Shaped by Evolution
To a first approximation, there are only two components of genomes:
“Genes”
  Defined here as “functional genes”: those that directly contribute to cellular structure and function
Transposable Elements
  Can catalyze their own duplication via “transposition”
  “Selfish DNA”; cellular “purpose” ambiguous
22
What is a Gene? Y1-Q60
In a study of the maize genome it is appropriate to start with a gene. Y1 was cloned by Brent Buckner, and I sequenced it. Y1-Q60 is a non-mutant allele, a “typical” maize gene: 6 exons, introns, transcription start sites, polyadenylation sites, and transposable elements. A number of Miniature Inverted-repeat Transposable Elements (MITEs) are associated with Y1 and with many, if not most, other maize genes. An Ins2 element and a Stowaway element lie just 5’ of the gene. Keep in mind this is a non-mutant allele of Y1. Further, in another non-mutant Y1 line, B73, a Tourist element is inserted in the 6th exon, in the non-coding region but before the polyadenylation sites used in the Q60 line; in B73 the Tourist provides the polyadenylation site. The checkerboard element is the Mu3 element Brent used to tag Y1 and subsequently clone it. Finally, back to the Q60 allele: a couple of kb upstream of Y1 is another transposable element, a retroelement.
23
Transposable Element Interlude
24
Why Consider TEs Differently From Genes?
Although their fate is tied to the fate of the organism they “inhabit”, TEs have their own agenda
Maintenance of the sequence integrity of genes from generation to generation is critical to survival of a species
Deleting a TE is rarely detrimental to an organism
25
Why Not Ignore TEs?
TEs frequently compose the majority of a eukaryotic genome
TEs contribute regulatory regions to genes by transposing upstream of a gene
TEs serve as focal points for deletion events that can delete genes as well
TEs can carry genes!
26
Types of Transposable Elements (TEs)
Type I, aka “retroelements”
  Transpose via an RNA intermediate
  Tend to dominate larger genomes
  (More detail in next slide)
Type II, DNA transposable elements
  Thousands of different types
  Usually “cut and paste” but can be “copy and paste”
27
Retroelements
[Figure: structures of four retroelement types, 5' to 3'; PBS = primer binding site, PPT = polypurine tract]
Retrovirus: LTR, PBS, gag, prot, RT, RNase H, int, env, PPT, LTR
Ty3/Gypsy-like: LTR, PBS, gag, prot, RT, RNase H, int, PPT, LTR
Ty1/Copia-like: LTR, PBS, gag, prot, int, RT, RNase H, PPT, LTR
LINE: gag, RT, RNase H, (A)n
Retroelements are transposable elements that transpose via an RNA intermediate. Here 4 types are shown. Retroviruses, to my knowledge, have only been shown to exist in animals. The next two lines are called viral-like retroelements or retrotransposons. They are related to retroviruses and share a number of features with them. They are flanked by long direct repeats, “Long Terminal Repeats” (LTRs), anywhere from 30 bp to 4.5 kb long. I don’t want to dwell too much on the internal structure of retroelements, but they usually have one long ORF or a few ORFs with many identifiable features. LINEs do not have LTRs. For my purposes, LTRs are critical features of a retrotransposon...
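The structures above can be distinguished by a few features: presence of LTRs, presence of env, and the position of integrase (int) relative to reverse transcriptase (RT), which is upstream in Ty1/Copia-like elements and downstream in Ty3/Gypsy-like elements. Below is a minimal sketch of that rule, assuming the coding domains have already been identified and listed 5' to 3'; the function name and labels are hypothetical, and real classification relies on sequence similarity rather than domain order alone.

```python
from typing import List


def classify_retroelement(domains: List[str], has_ltrs: bool) -> str:
    """Rough classification from the 5'->3' order of coding domains, following
    the structures on the "Retroelements" slide (hypothetical helper)."""
    if not has_ltrs:
        # LINEs lack LTRs and end in a poly(A) tail.
        return "non-LTR (LINE-like)"
    if "env" in domains:
        # The envelope gene is characteristic of retroviruses.
        return "Retrovirus-like"
    if "int" in domains and "RT" in domains:
        # Ty1/Copia: integrase upstream of RT; Ty3/Gypsy: integrase downstream.
        if domains.index("int") < domains.index("RT"):
            return "Ty1/Copia-like"
        return "Ty3/Gypsy-like"
    return "unclassified LTR element"


if __name__ == "__main__":
    print(classify_retroelement(["gag", "prot", "int", "RT", "RNase H"], has_ltrs=True))
    print(classify_retroelement(["gag", "prot", "RT", "RNase H", "int"], has_ltrs=True))
    print(classify_retroelement(["gag", "RT", "RNase H"], has_ltrs=False))
```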
28
Long Terminal Repeat
[Figure: map of an LTR retrotransposon (5' LTR, PBS, gag, prot, int, RT, RNase H, PPT, 3' LTR) with the 5' LTR and its flanks expanded]
Here is the LTR of a retrotransposon I’ll introduce to you later; here I’m just using it as an example of a typical retrotransposon. Retrotransposons, once you have sequenced them, have obvious ends, because they have long terminal repeats. An LTR by itself doesn’t look like much. Sure, it usually begins with “TG” and ends with “CA”, but that is not much to go on unless you have the other LTR to compare it with. Upon integration of the retrotransposon, the two LTRs should be identical, so if you compare one with the other it is obvious where the element begins and ends. Further, immediately flanking the element are small target site duplications of host DNA, generally 5 bp, caused by the staggered cutting mechanism of the retrotransposon’s integrase and subsequent DNA repair. Also, the internal end of each LTR should be next to either a Primer Binding Site (PBS) or a Polypurine Tract (PPT). The PBS is usually complementary to the 3’ end of a transfer RNA, in this case tRNA-Met. The polypurine tract is less identifiable: just a stretch of “G”s and “A”s. Generally, if you are focused on a lambda-clone-sized insert of genomic maize DNA containing a gene, you may not be overwhelmed by the presence of retrotransposons. Yeah, you’ll find part of one or two some distance from the gene, but these are large elements, not like MITEs. But genes make up only a small part of the maize genome. To study the maize genome you need to characterize a bigger clone.
Opie2 LTR (1254 bp):
GGACC (5 bp duplication of host DNA) | TGAAAGGGAAA... (5' end of LTR) | ...AGGTGCTCTCA (3' end of LTR) | AT | TGGTATCGGA... (PBS)
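The boundary features described above (TG...CA ends, identical LTRs, matching target site duplications, and a PBS just inside the 5' LTR) can be checked mechanically once candidate coordinates are known. Below is a minimal sketch under several assumptions: coordinates are 0-based, the TSD length defaults to 5 bp, and the default PBS motif is just the first seven PBS bases shown for Opie2 (a real search would compare against tRNA 3' ends). The exact-identity test only suits very young elements; older ones need an alignment-based comparison. The function name and toy sequence are hypothetical; the toy "LTR" simply joins the two Opie2 end fragments shown on the slide and is not the real 1254 bp LTR.

```python
def check_ltr_boundaries(seq: str, start: int, end: int, ltr_len: int,
                         tsd_len: int = 5, pbs_motif: str = "TGGTATC",
                         pbs_window: int = 20) -> dict:
    """Check boundary features of a candidate LTR retrotransposon (hypothetical helper).

    start: 0-based index of the first base of the 5' LTR.
    end:   0-based index just past the last base of the 3' LTR.
    """
    seq = seq.upper()
    ltr5 = seq[start:start + ltr_len]
    ltr3 = seq[end - ltr_len:end]
    tsd5 = seq[max(0, start - tsd_len):start]
    tsd3 = seq[end:end + tsd_len]
    after_ltr5 = seq[start + ltr_len:start + ltr_len + pbs_window]
    return {
        # Freshly inserted elements have identical LTRs.
        "ltrs_identical": ltr5 == ltr3,
        # LTRs typically begin with TG and end with CA.
        "tg_ca_ends": all(x.startswith("TG") and x.endswith("CA") for x in (ltr5, ltr3)),
        # Staggered cutting by integrase leaves a short host duplication on both sides.
        "tsd_match": len(tsd5) == tsd_len and tsd5 == tsd3,
        # The primer binding site lies just inside the 5' LTR.
        "pbs_found": pbs_motif in after_ltr5,
    }


if __name__ == "__main__":
    tsd = "GGACC"
    ltr = "TGAAAGGGAAA" + "AGGTGCTCTCA"        # toy LTR built from the two slide fragments
    internal = "AT" + "TGGTATCGGA" + "ACGTACGT"  # spacer, PBS fragment, arbitrary filler
    seq = "CCCC" + tsd + ltr + internal + ltr + tsd + "GGGG"
    start = 4 + len(tsd)
    end = start + 2 * len(ltr) + len(internal)
    print(check_ltr_boundaries(seq, start, end, len(ltr)))
```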
29
LTRs Can Be Used as Molecular “Clocks”
The 2 LTRs of a retrotransposon are initially identical (upon insertion)
Subsequent mutations will tend to cause the LTRs to diverge
Hence, the higher the divergence, the older the element
30
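Insertion Times
[Figure: estimated insertion times of retrotransposons in a region around maize adh1-F (10 kb scale bar), on paired axes of estimated substitutions/kb between LTRs (ticks 10 to 80) and millions of years ago (ticks 1 to 6); the point where adh1-S and adh1-F diverge is marked; labels: adh1-F, u22]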
31
Forces that Shape Genomes
Genomes are linear, so there are only 2 possible ways to change genome size:
  Insertion/duplication: copy number increase
  Deletion: copy number decrease
But either type of change can happen at any scale, from a single base to the entire genome
32
Types of changes (vastly different scales)
Whole genome duplication
  Polyploidy
Translocations
“Tandem Segmental Duplication”
Transposition
  Catalyzed by TE-encoded protein
Simple Sequence Repeats
  Polymerase slipping
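Simple sequence repeats, the smallest-scale change listed above, are short tandem units expanded or contracted by polymerase slipping, which makes them easy to scan for. Below is a minimal regular-expression sketch, assuming only perfect repeats of 1 to 6 bp units with at least four copies (thresholds chosen for illustration); the function name is hypothetical, and dedicated SSR finders also tolerate imperfect repeats.

```python
import re


def find_ssrs(seq: str, min_unit: int = 1, max_unit: int = 6, min_copies: int = 4):
    """Return (start, unit, copies) for perfect tandem repeats of short units.

    The lazy quantifier tries the shortest unit first, so "ATATAT" is reported
    as unit "AT" rather than "ATAT".
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


if __name__ == "__main__":
    print(find_ssrs("GGATATATATATCC"))  # a 5-copy AT repeat starting at index 2
```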
33
Gene Duplications – Substrates for Evolution
Single genes may have crucial roles
  Their mutation would have detrimental effects upon the organism
After duplication, one copy of a gene becomes expendable
  Frequently the extra copy is deleted
  In cases where it is retained, mutation may change the function of one copy
34
Putting Together the Pieces
The complexity of a genome is generated via moderately simple mechanisms
  Duplications and transpositions occur
  Extra gene copies and TEs may mutate under neutral or even positive selection pressures
The result: genic areas are fairly stable, whereas upstream and downstream regions rapidly evolve.
35
The Picture? Complex, but Driven by Understandable Mechanisms