Some Jolly Fun with Barley ESTs David Marshall & All the Folks in Computational Biology.



BLAST for Recognition of Undesirable Clones
Summary of 84 Barley Libraries (ver. 0.90)
Category (#, %)
High quality sequences: 282,720
E. coli genome
Lambda genome
rRNA: 6,
Chloroplast: 2,
Mitochondrion
Fungal cDNA
Repetitive Elements
Low complexity: 1,
Odd vector
Both polyA & polyT
Total Good: 271,
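A screening step of this kind could be sketched as below. This is only an illustration, not the pipeline used for the barley libraries: it assumes the ESTs were searched against a contaminant database with BLAST tabular output (12 tab-separated columns), and the subject IDs, the e-value cutoff and the sample rows are all hypothetical.

```python
from collections import Counter

# Hypothetical mapping from screening-database subject IDs to slide categories.
CATEGORY_BY_SUBJECT = {
    "ecoli_genome": "E. coli genome",
    "lambda_genome": "Lambda genome",
    "barley_rRNA": "rRNA",
    "barley_chloroplast": "Chloroplast",
}


def flag_clones(tabular_text, max_evalue=1e-10):
    """Return {est_id: category} for ESTs whose best hit is a contaminant.

    Rows are BLAST tabular lines: query, subject, % identity, length,
    mismatches, gap opens, q.start, q.end, s.start, s.end, e-value, bit score.
    The e-value cutoff is an assumed value, not one taken from the slides.
    """
    best = {}  # est_id -> (bit score, category)
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        category = CATEGORY_BY_SUBJECT.get(subject)
        if category is None or evalue > max_evalue:
            continue
        # Keep only the highest-scoring contaminant hit per EST.
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, category)
    return {est: cat for est, (_, cat) in best.items()}


# Made-up rows for illustration only.
rows = [
    ("EST001", "ecoli_genome", "99.0", "450", "4", "0", "1", "450", "1000", "1449", "0.0", "820"),
    ("EST002", "barley_chloroplast", "98.5", "400", "6", "0", "1", "400", "200", "599", "1e-180", "700"),
    ("EST002", "barley_rRNA", "85.0", "60", "9", "0", "1", "60", "10", "69", "1e-12", "80"),
    ("EST003", "barley_rRNA", "80.0", "40", "8", "0", "1", "40", "5", "44", "0.001", "40"),
]
sample = "\n".join("\t".join(row) for row in rows)

flags = flag_clones(sample)
summary = Counter(flags.values())
print(flags)
print(summary)
```

Each EST is assigned to at most one category (its best-scoring contaminant hit), so the category counts can be subtracted from the high quality total without double counting.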

Unigenes in ESTs in Current Assembly
Ideally: one “unigene” per gene in the genome, expecting ~50,000 based on rice.
Maximum unigene count in ESTs: the sum of the number of contigs and singletons following assembly:
Contigs 24,208
Singletons 24,899
Total 49,107
Minimum unigene count in ESTs: the sum of the number of contigs and singletons that have good 3’ ends:
Contigs 14,589
Singletons 7,219
Total 21,880
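The two bounds are simple sums, which can be checked with a few lines of Python. The variable names are illustrative; the counts are those quoted on the slide. Note that the components of the minimum count add up to 21,808 rather than the stated total of 21,880, so one of those three figures appears to contain a transcription slip.

```python
# Counts as quoted on the slide; names are illustrative only.
contigs_all, singletons_all = 24_208, 24_899   # everything after assembly
contigs_3p, singletons_3p = 14_589, 7_219      # only those with good 3' ends

max_unigenes = contigs_all + singletons_all
min_unigenes = contigs_3p + singletons_3p
expected_genes = 50_000  # rough rice-based expectation quoted on the slide

print(max_unigenes)  # 49107
print(min_unigenes)  # 21808
print(f"{min_unigenes / expected_genes:.0%} to {max_unigenes / expected_genes:.0%} of the expected gene count")
```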

The Immediate Objective: Microarray Chip, Gene Expression Data

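Barley 2H Caleosins
[Figure: Hvcal1 and Hvcal2 on the barley 2H map (Steptoe x Morex) shown against Oscal1 and Oscal2 on the rice R4 gene map (BAC OSJB), linked by EST alignment; axis labels in cM, including 0 cM and 77 cM.]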

TIGR Rice Caleosin Gene Models: OSCal01 (R4), OSCal03 (R3), OSCal02 (R4)

Comparison of Gene Structures of Barley and Rice Caleosins

Homology of Wheat G3 Deletion line mapped ESTs to Rice Chromosomes

General Conclusions
EST sequence
»May lack polyA
»Reading frame may be ambiguous
»Exon/intron boundaries may not be obvious
We don’t have all barley genes despite >330,000 ESTs (probably between 33% and 50%)
Value of comparative studies with rice
»BUT poor annotation (actually appalling)
»Rice genomic sequencing is work in progress
Comparative route is OK but can’t be the only game in town
»Several examples of genes not being there!
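The first two caveats about EST sequence (missing polyA, ambiguous reading frame) can be illustrated with a small sketch. It is a minimal example under stated assumptions, not a method from the talk: the frame ranking simply uses the longest stop-free run of codons in each of the six frames, and the 10-base polyA threshold is an arbitrary choice.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (other letters are kept)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_open_stretch(seq, offset):
    """Length, in codons, of the longest run without a stop codon in one frame."""
    best = run = 0
    for i in range(offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best


def rank_frames(seq):
    """Return (open codons, strand, offset) for all six frames, longest first."""
    seq = seq.upper()
    results = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            results.append((longest_open_stretch(strand_seq, offset), strand, offset))
    return sorted(results, reverse=True)


def has_polya_tail(seq, min_run=10):
    """True if the read ends in at least min_run A residues (assumed threshold)."""
    return seq.upper().endswith("A" * min_run)


# Made-up read for illustration only.
est = "ATGGCTGCTGCTGCTGCTTAA"
for open_codons, strand, offset in rank_frames(est):
    print(strand, offset, open_codons)
print(has_polya_tail(est))
```

On a short read several frames often tie or come close, which is exactly the ambiguity the slide refers to; a protein-level comparison is usually needed to pick between them.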

Major Issues
Data validation
»Errors in public database sequence
»Errors in annotation
»‘Chinese whispers’: anchoring annotation in biochemistry
Comparative Data
»Rice > wheat > maize, but also Arabidopsis
»When is homology actually orthology?
»Partial data sets
»% match only part of the story
»Need for domain/feature information (mammalian/bacterial bias)
»Everything is work in progress?
Where are the data sources?
»dbEST
»Nr nucleotide database at NCBI
»Gramene at CSHL
»TIGR
»GrainGenes/wEST at USDA, Albany
»CUGI > AGI
»Iowa State/USDA
»Harvest/Foxpro
»ContEST at SCRI
»The horse’s mouth
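The point that % match is only part of the story can be made concrete by reporting how much of the query an alignment spans alongside its percent identity. The sketch below is illustrative only; the hit labels and values are made up, and query coverage is just one of several measures that could be reported.

```python
def query_coverage(q_start, q_end, query_length):
    """Fraction of the query spanned by the aligned region (1-based, inclusive)."""
    return (abs(q_end - q_start) + 1) / query_length


# (label, % identity, q_start, q_end, query_length), hypothetical values
hits = [
    ("short_domain_hit", 98.0, 1, 90, 600),
    ("full_length_hit", 82.0, 10, 590, 600),
]

for label, identity, q_start, q_end, q_len in hits:
    coverage = query_coverage(q_start, q_end, q_len)
    print(f"{label}: identity={identity:.0f}% coverage={coverage:.0%}")
```

Here the hit with the higher percent identity covers only a small part of the query, so ranking on identity alone would favour a match that may reflect a single shared domain rather than a full-length homolog.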

Phenotype Sequence
Sd1: green revolution gene in rice.
»Mutation in gibberellin 20-oxidase (plant hormone production pathway)
»One member of a small gene family
»Other members have subtly different patterns of expression and are able to partially compensate for the mutation
Rht1: green revolution gene in wheat.
»Mutation in receptor response pathway
»Copies in all 3 wheat genomes
Barley: commercially significant dwarfs from both of these and several other pathway or response genes.

Acknowledgements
Robbie Waugh, Peter Hedley, David Caldwell, Luke Ramsay, Hui Liu, Linda Cardle, Paul Shaw, Arnise Druker, Doreen Ware, Dave Mathews, Tim Close, Olin Anderson