The Medicago truncatula genome: a progress report Dr. Bruce A. Roe Advanced Center for Genome Technology Department of Chemistry and Biochemistry University.


1 The Medicago truncatula genome: a progress report
Dr. Bruce A. Roe, Advanced Center for Genome Technology, Department of Chemistry and Biochemistry, University of Oklahoma
broe@ou.edu www.genome.ou.edu
Plant and Animal Genome, San Diego, January 11, 2004
Photos by Steve Hughes, Genetic Resource Centre (PIRSA-SARDI), Adelaide, Australia. http://www.fao.org/ag/AGP/AGPC/doc/gallery/pictures/meditrunc/meditrunc.htm

2 Why sequence the Medicago genome?
An important forage crop
A genetically tractable model legume
A relatively small (~500 Mbp) diploid genome
Active legume research community
Medicago Research Consortium
Large collection of ESTs
Excellent BAC library
Integrated physical and genetic map
Large number of BAC-end sequences

3 DNA GenBank Sequence Pipeline at the University of Oklahoma Genome Center, OU-ACGT
DNA shearing (Hydroshear TM)
Colony picking (QPixII TM)
Growing subclones (HiGro TM)
Subclone isolation I (Mini-Staccato TM)
Subclone isolation II (VPrep TM)
Thermocycling (ABI 9700)
Sequencing (ABI 3700)
Data assembly and analysis
Primer synthesis
Miscellaneous liquid handling
Closure

4 Subclone Isolation (Mini-Staccato TM)
This Zymark robot has a 384-cannula array, four built-in shakers, three attached storage racks, built-in barcoding and a Twister II robotic arm. This automation has allowed us to perform the DNA isolation completely unattended from as many as eighty 384-well plates of bacterial cells per day.

5 Subclone Isolation (Mini-Staccato TM)
Once all three solutions have been added, the plates are transferred from the SciClone workspace deck to a storage rack by the Twister II robotic arm.

6 Subclone Isolation and Sequencing Reaction Pipetting (Velocity 11 VPrep)
Liquid handling station with 384-channel pipettor head
Four movable shelves on either side of the pipettor head
Used for subclone isolation, sequencing reaction set-up and clean-up.

7 Data assembly and Analysis
Sun V880 server: 32 GB RAM running Solaris 8 OS and 3 TB of data stored on RAID-5 arrays with autoloader tape backup
Also: 12 workstations each with 1 GB RAM
Phred/Phrap/Consed, Exgap

8 Initial WGS Skimming for ~500 Mb Medicago truncatula genome Collected ~25,000 end-sequences from ~12,500 plasmid-based WGS clones. Of these ~25,000 sequences, ~1,000 have homology with Medicago truncatula ESTs. URL: http://www.genome.ou.edu/medicago.html

9 Phrap assembly of our Medicago truncatula whole genome shotgun survey sequencing data at 0.005-fold genomic sequence coverage
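The fold-coverage figure quoted above relates total sequenced bases to genome size. As a minimal sketch (the function names are hypothetical, and the slides do not state a read length, so any read length passed in is the reader's assumption), the relationship can be written as:

```python
def fold_coverage(n_reads: int, mean_read_len: float, genome_size: int) -> float:
    """Fold coverage = total read bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * mean_read_len / genome_size


def bases_for_coverage(coverage: float, genome_size: int) -> float:
    """Total bases of sequence needed to reach a given fold coverage."""
    return coverage * genome_size


# Genome size (~500 Mb) and coverage (0.005-fold) are the values quoted on the slides.
GENOME_SIZE = 500_000_000
print(f"{bases_for_coverage(0.005, GENOME_SIZE):,.0f} bases of sequence")
```

At this depth almost none of the single-copy genome is sampled more than once, so reads that do assemble into contigs tend to come from high-copy sequence.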

10 DotPlot of a Phrap assembled whole genome shotgun contig showing multiple repeated regions (both axes in bases, 0 to 700)

11 DotPlot of a Phrap assembled whole genome shotgun contig showing 4 repeated blocks of ~600 bases (both axes in bases, 0 to 1000)

12 Yet another genomic contig (Contig 1931) showing extensive repeated regions (both axes in bases, 0 to 600)

13 >Contig1931 TTTACGTCCCCGTAGTGAACTATTTCCTAAGTTGACTAGTCAATTAGGTG ATAGTTCGTCCGGATGACGTACCGCCGTGAACCCGATATGAGAATTTCAT GTGGTGCATCCTTCTATGTTTGATAAGGTCATTTTGAACGGTCGGATTGA ACGTGGCTGGTGTCGTTCACGATAGAGGCACGTTTAGGTCCCTACGGTGA ACTAGTTCCTAAGTTGACTAGTCAATTAGGTGATAGTTTGTCCGGATGAC GTACCTCCGTGAACCCGATCTGAGAAATTCAAGTTTCTGCATCCTTCTAT GTTTGATAAGGTCATTTTGAACGGTCGGATTGAAGGTGGCTGGTGTTCTT CACATTCTAGGCACGTTTAGGTTCCCGCGGTGAACTAGTTCCTAAGTTGA CTAGTCAATTAGGTGATAGTTCGTCCGGATGACCTACCTCCGTGAACCCG ATATTAGAAATTCAAGTTTCTGCATCCTTCTATGTTTGATAAGGTCATTT TGAACGGTCAGATTGAACGTGGCTGGTGTCGTTCACGATCTAGGCACGTT TAGGTCCCCGCAGTGAACTAGTTCCTAAGTTGACTAGTCAATTAGGTGAT AGTTTGTCCGGATGACGTGACTCCGTAAAGCCAGTATGAGAACTTCTAGT TTCTGCATCCTTTTATGTTTGATAAGGTCATTTTGAACGGTGGGATTGAA CGTTGTTGGTGTCGTTCACGATCTAGGCACGTTTAGGTCCCCGCAGTGAA CTAGTTCCTTAGTTGACTAGTCAATTAGGTGATAGTTCGTCCGGATGACG TATCTCCGTCAGCCCGATCTGAGAAATTCAAATTTCTGCATCCTTCTATG TTTGATAAGGTCATTTTGAACGGTCGGATTGAACGTGGCTGGTGTCGTGC ACGATCAAGGCACGTTTAGGTCCCCGCAGCGAACTAGTTCCTAAGTTGAC TAGTCAATTAGGTGATACCTTGTCCGGATGACGTACCTCCGTGAACCCGA TCTGAGAAATTCAAGTTTCTGCATCCTTCTATGTTTGATAAGGTCATTTT GAACGGTTGGATTGAACATGGCTGGTGTCGTTCACGATCTAGGCACGTTT AGGTCCCCGCAGTGAACTAGTTCCTAAGTTGACTAGTCAATTAGGTGATA GTTCGTCTGGATGACGTACCTCCTTGAACCCAATATGAGAAATTCAATTT TCTTCATCCTTCTATGTTTGATAAGGTCATTTTGAACGGTCGGATTGAAC GTGCCTGGTGTCGTTCACGATCGAGGCACGTTTAGGTCCCCGCAGTGAAC...
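The self-comparison dot plots shown for these contigs can be approximated with a simple exact word match. This is only a sketch: the slides do not name the dot-plot program or its parameters, so the word size and the function names below are assumptions for illustration.

```python
from collections import Counter


def dotplot_matches(seq: str, word: int = 10):
    """Return sorted (i, j) pairs with i < j where the same word starts at i and j."""
    positions = {}
    for i in range(len(seq) - word + 1):
        positions.setdefault(seq[i:i + word], []).append(i)
    matches = []
    for starts in positions.values():
        # Every pair of start positions sharing a word is one off-diagonal dot.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                matches.append((starts[a], starts[b]))
    return sorted(matches)


def repeat_offsets(matches):
    """Count dots per diagonal (j - i); tandem repeats peak at multiples of the period."""
    return Counter(j - i for i, j in matches)


# Toy input (not from the slides): a 10-base unit repeated three times.
toy = "ACGTTGCAGT" * 3
offsets = repeat_offsets(dotplot_matches(toy, word=5))
print(offsets.most_common(2))
```

Running the same functions on the Contig1931 sequence above (with spaces removed) would give the off-diagonal lines that the slides show graphically; exact matching misses diverged repeat copies, which a scoring-based dot plot would still pick up.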

14 Summary of our Medicago truncatula WGS Sequencing Assembly with only 0.005-fold Genomic Sequence Coverage
The largest contig (21,157 bp) contained the 26S rRNA genes.
19 smaller contigs (105,455 bp total) were from the chloroplast genome.
The remaining ~500 contigs, ranging in size from 2,000 to 12,000 bp, contain highly repetitive DNA that appears to be unique to Medicago, as it had no significant homology in the GenBank database.
We concluded that a more directed strategy was needed.

15 Mapped BAC approach in collaboration with Doug Cook and DJ Kim at U.C. Davis with funding from the Noble Foundation, Ardmore, OK

16 The first ~1000 Medicago truncatula BACs
Initially concentrated on BACs with known biological markers and in regions of biological interest that were supplied to us by the UC Davis group.
Requests for sequencing specific BACs were directed to Doug Cook and DJ Kim at UC Davis, and they supplied us with the BACs once these BACs had been characterized.
Once the BACs were received, we created the shotgun libraries, isolated the sequencing templates and obtained the working draft sequence, followed by closure and finishing.
All data was made publicly available in GenBank within 24 hours of sequence assembly.

17 UC Davis -------- Oklahoma University

18

19 The next ~750 Medicago truncatula BACs
With recent NSF funding, we will be sequencing BACs from chromosomes 1, 4, 6, and 8 with the goal of completing the sequence of the euchromatic regions of these chromosomes over the next 3 years.
Chromosomes 2 and 7 will be sequenced at TIGR, chromosome 3 at The Sanger Institute and chromosome 5 at Genoscope.
All data will be released immediately as before.

20 www.genome.ou.edu/medicago.html

21 www.genome.ou.edu/medicago_totals.html

22 Medicago-specific gene with ESTs but no known homology Gene density of this BAC is ~1 gene per 10 kb

23 Medicago-specific gene with ESTs but no known homology

24 myosin-like protein Gene density ~1 gene per 10 kb

25 myosin-like protein

26

27 Gene Size Distribution (All Sequence Data) (FgenesH vs. Genscan)

28 Exon Size Distribution (All Sequence Data) (FgenesH vs. Genscan)

29 Intron Size Distribution (All Sequence Data) (FgenesH vs. Genscan)

30 Gene Density of the ~450 Mb Medicago truncatula genome
                                             FgeneSH        Genscan
Total number of genes                         13,397         11,488
Total length of genes                     30,793,326     51,687,528
Total exon length                         15,794,243     14,400,445
Total number of exons                         59,808         55,792
Total intron length                       14,999,083     37,287,083
Total number of introns                       46,412         44,305
Base Pairs Sequenced                      87,423,457     87,423,457
Gene Space (Gene Length/BP Sequenced)            35%            59%
Gene Density (Genes/200Mb)                    30,649         26,281
                                       1 gene/6.5 kb  1 gene/7.6 kb
Arabidopsis: 25,498 protein coding genes
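The derived figures on this slide (gene space, genes per 200 Mb, and kb per gene) follow arithmetically from the raw counts. The sketch below recomputes them from the slide's own numbers; the function and key names are hypothetical, chosen only for illustration.

```python
BASES_SEQUENCED = 87_423_457  # base pairs sequenced, from the slide

# Raw counts from the slide for each gene predictor.
PREDICTIONS = {
    "FgeneSH": {"genes": 13_397, "gene_length": 30_793_326},
    "Genscan": {"genes": 11_488, "gene_length": 51_687_528},
}


def summarize(genes: int, gene_length: int,
              bases: int = BASES_SEQUENCED, scale: int = 200_000_000) -> dict:
    """Derive gene spacing, gene space and extrapolated gene count."""
    return {
        "kb_per_gene": bases / genes / 1000,         # average spacing between genes
        "gene_space_pct": 100 * gene_length / bases,  # share of sequence inside genes
        "genes_per_scale": genes * scale / bases,     # extrapolation to `scale` bases
    }


for name, counts in PREDICTIONS.items():
    s = summarize(**counts)
    print(f"{name}: 1 gene/{s['kb_per_gene']:.1f} kb, "
          f"gene space {s['gene_space_pct']:.0f}%, "
          f"{s['genes_per_scale']:,.0f} genes per 200 Mb")
```

The extrapolation assumes the sequenced BACs are representative of the 200 Mb being scaled to; the GC-content slide notes that the sequenced clones come mainly from gene-rich regions.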

31 Medicago GC Content for ~90 Mb of Genomic BAC Clones Sequenced (mainly from gene rich regions)
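GC content of the kind summarized on this slide is a simple base count. A minimal sketch follows; the window and step sizes are arbitrary assumptions (the slide does not say how its values were binned), and the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq: str, window: int = 100, step: int = 50):
    """GC percent for each full window, sliding along the sequence by `step`."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


# First 50 bases of Contig1931 as shown on slide 13.
print(gc_content("TTTACGTCCCCGTAGTGAACTATTTCCTAAGTTGACTAGTCAATTAGGTG"))
```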

32 Metabolic Overview of Medicago: 13,396 FgeneSH predicted genes using the COG Database
DNA Metabolism 23%
Cellular Processes 23%
Metabolism 24%
Poorly Characterized 17%
No Hits 5%
Multiple COG Hits 8%

33 Metabolic Overview (detailed view) of Medicago 13,396 FgeneSH predicted genes using the COG Database

34 Gene Duplication: Three copies of the phosphoglycerate kinase gene in one BAC

35 Gene Duplication: Three copies of phosphoglycerate kinase in one BAC
AC138448.fg.10 MATKRSVGTLKEAELKGKRVFVRVDLNVPLDDNLNITDDTRIRAAVPTIKYLTGYGAKVILSSHL-----
AC138448.fg.11 MA-KKSVGDLSGAELKGKKVFVRADLNVPLDDNQNITDDTRIRAAIPTIKYLIQNGAKVILSSHL-----
AC138448.fg.8  MATKRSVGTLKEGELKGKRVFVRVDLNVPLDDNLNITDDTRIRAAVPTIKYLTGYGAKVILSSHLEIYKT
AC138448.fg.10 ------------------------------------------GRPKGVTPKYSLKPLVPRLSELLGTQVK
AC138448.fg.11 ------------------------------------------GRPKGVTPKYSLAPLVPRLSELIGIEVI
AC138448.fg.8  EVSVSEYNLAVSEYKLAISDTYRYRIRVRHDSSPFLEYRGSQGRPKGVTPKYSLKPLVPRLSELLETQVK
AC138448.fg.10 IADDSIGEEVEKLVAQIPEGGVLLLENVRFHKEEEKNDPEFAKKLASLADLYVNDAFGTAHRAHASTEGV
AC138448.fg.11 KAEDSIGPEVEKLVASLPDGGVLLLENVRFYKEEEKNDPEHAKKLAALADLYVNDAFGTAHRAHASTEGV
AC138448.fg.8  ISDDCIGEEVEKLVAQIPEGGVLLLENVRFHKEEEKNEPEFAKKLASLADLYVNDAFGTAHRAHASTEGV
AC138448.fg.10 AKYLKPSVAGFLMQKELDYLVGAVSNPKKPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIFTFYKA
AC138448.fg.11 TKYLKPSVAGFLLQKELDYLVGAVSSPKRPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIFTFYKA
AC138448.fg.8  AKYLKPSVAGFLMQKELDYLVGAVSNPKKPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIYTFYKA
AC138448.fg.10 QGYAVGSSLVEEDKLDLATTLIEKAKAKGVSLLLPTDVVIADKFAADANDKIVPASSIPDGWMGLDIGPD
AC138448.fg.11 QGLAVGSSLVEEDKLELATTLIAKAKAKGVSLLLPSDVVIADKFAPDANSQIVPASAIPDGWMGLDIGPD
AC138448.fg.8  QGYSIGSSLVEEDKLDLATSLMEKAKAKGVSLLLPTDVVIADKFSADANDKIVPASSIPDGWMGLDIGPD
AC138448.fg.10 SIKTFNEALDKSQTIIWNGPMGVFEFDKFAAGTEAIAKKLAEVSGKGVTTIIGGGDSVAAVEKVGLADKM
AC138448.fg.11 SIKTFNEALDTTQTIIWNGPMGVFEFDKFAVGTESIAKKLADLSGKGVTTIIGGGDSVAAVEKVGVADVM
AC138448.fg.8  SIKTFNEALDKSQTIIWNGPMGVFEFDKFAAGTEAIAKKLAEVSGKGVTTIIGGGDSVAAVEKVGLADKM
AC138448.fg.10 SHISTGGGASLELLEGKPLPGVLALDDA*      401 amino acids
AC138448.fg.11 SHISTGGGASLELLEGKELPGVLALDEATPVAV* 405 amino acids, differs at 42 positions
AC138448.fg.8  SHISTGGGASLELLEGKPLPGVLALDDA*      448 amino acids, differs at 6 positions
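The "differs at N positions" annotations on this alignment can be checked by walking two aligned rows column by column. The slide does not say whether gap columns were counted as differences, so the sketch below reports substitutions and gap columns separately; the function name is hypothetical.

```python
def compare_aligned(a: str, b: str):
    """Return (substitutions, gap_columns) for two equal-length aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    substitutions = gap_columns = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue              # column is a gap in both rows: nothing to compare
        if x == "-" or y == "-":
            gap_columns += 1      # insertion/deletion relative to the other copy
        elif x != y:
            substitutions += 1    # aligned residues differ
    return substitutions, gap_columns


# First 20 alignment columns of each copy, taken from the slide.
FG10 = "MATKRSVGTLKEAELKGKRV"
FG11 = "MA-KKSVGDLSGAELKGKKV"
FG8 = "MATKRSVGTLKEGELKGKRV"

print(compare_aligned(FG10, FG11))  # fg.10 vs fg.11
print(compare_aligned(FG10, FG8))   # fg.10 vs fg.8
```

Concatenating the seven alignment blocks for each copy and passing the full rows gives whole-protein counts under the same gap convention.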

36 Printrepeat Analysis of M. truncatula BAC AC121240 vs. A. thaliana Chr.2 Expansion, Duplication, Repeat Elements ~5 kb region ~25 kb region

37 PIP of M. truncatula BAC AC121240 vs. A. thaliana Chr.2

38 Medicago truncatula Summary and Conclusions
Average predicted gene density of 1 gene per 6.5 to 7.6 kb by FgeneSH and Genscan, respectively.
Genome characteristics such as %GC, intron/exon size and conserved unique 5’ splice sites reveal Medicago characteristics.
The sequence of the Medicago truncatula genome shows homology to the sequenced Arabidopsis thaliana genome, but expansion, rearrangements and duplications are evident.

39 Data Release and Preliminary Annotation
All our sequence data is available through links on our web site to GenBank and on our ftp site at URL: ftp.genome.ou.edu/medicago
Keyword and blast searches can be done on our web site at URL: http://www.genome.ou.edu/medicago.html
Additional annotation via the Genome Browser database is available on our web site at URL: http://www.genome.ou.edu/medicago_table.html
E-mail suggestions for additional annotation to Bruce Roe at: broe@ou.edu

40 Three Year Plan
Obtain the contiguous sequence of the gene-rich regions of four of the 8 Medicago truncatula chromosomes at OU, with the remaining four being completed by our international partners at TIGR, Sanger, and Genoscope.
This information will serve as a solid foundation for anticipated comparative and functional legume genomics.

41 Laboratory Organization Bruce Roe, PI Informatics Support Teams Production Administration Jim White Steve Kenton Hongshing Lai Sean Qian Rose Morales-Diaz* Mounir Elharam* Yonas Tesfai Steve Shaull** Doug White Work-study Undergraduates** Kay Lynn Hale Dixie Wishnuck Tami Womack Mary Catherine Williams DNA Synthesis Phoebe Loh* Sulan Qi Bart Ford* Reagents & Equip. Maint. Mounir Elharam* Doug White Axin Hua Weihong Xu Jami Milam Sara Downard** Limei Yang Angie Prescott* Audra Wendt** Mandi Aycock** Ziyun Yao Steve Shaull* Youngju Yoon Trang Do Anh Do Lily Fu Yang Ye James Yu Tessa Manning** Fu Ying Liping Zhou Ruihua Shi Junjie Wu Stephan Deschamps Shelly Oommen Christopher Lau Yanhong Li Research Teams Doris Kupfer Julia Kim* Sun So Graham Wiley** Lauren Ritterhouse** Lin Song Ying Ni Huarong Jiang ShaoPing Lin Honggui Jia Hongming Wu Baifang Qin Peng Zhang Fares Najar Chunmei Qu Keqin Wang Carson Qu Shuling Li
Funding from the Noble Foundation, DOE, and NSF
Collaborators at Univ. Minnesota, UC Davis, TIGR, Sanger, Genoscope, and the Noble Foundation
* Previous undergraduate research student ** Present undergraduate research student

42 ACGT The ACGT Team

43

44 Conserved Intron/Exon Boundary Features by a FELINES** Analysis of 181,444 Medicago truncatula ESTs in GenBank vs Genomic Sequence
             Size Range        Mean Length
Exons        6 - 5,789 nt      268 nt
Introns      20 - 3,921 nt     429 nt
Intron Conserved Splice Site Sequence Elements    Percent
Introns w/ 5’ GU                                  99.21%
Introns w/ 5’ GC                                   0.36%*
Introns w/ 5’ AU                                   0.31%
Introns w/ U12 branch sites instead of A12         0.13%
* Compared to 0.5 - 2.5% in fungi, and 0.5% in mammals with an EST minimum identity of 90%
** S. Drabenstot, D. Kupfer, J. White, D. Dyer, B. Roe, K. Buchanan and J. Murphy. FELINES: A Utility for Extracting and Examining EST-Defined Introns and Exons. Nucleic Acids Research 31(22), e141 (2003).
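The 5’ splice-site classes tabulated on this slide can be tallied from a set of intron sequences by looking at the first two bases of each intron. The sketch below is not the FELINES implementation; it is a minimal illustration with hypothetical function names, and the example introns are made up.

```python
from collections import Counter


def classify_donor(intron: str) -> str:
    """Classify an intron by its 5' dinucleotide, reported in RNA notation."""
    donor = intron[:2].upper().replace("U", "T")  # accept DNA or RNA input
    return {"GT": "GU", "GC": "GC", "AT": "AU"}.get(donor, "other")


def donor_percentages(introns):
    """Percent of introns in each 5' splice-site class."""
    counts = Counter(classify_donor(seq) for seq in introns)
    total = sum(counts.values())
    return {cls: 100 * n / total for cls, n in counts.items()}


# Made-up intron 5' ends, for illustration only.
example = ["GTAAGT", "GTACGA", "GCAAGT", "ATATCC"]
print(donor_percentages(example))
```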

45 Consensus Logogram of the 5’GU vs the 5’AU Class of Introns in Medicago truncatula determined by FELINES AU intron consensus GU intron consensus

