ANALYSIS AND VISUALIZATION OF SINGLE COPY ORTHOLOGS IN ARABIDOPSIS, LETTUCE, SUNFLOWER AND OTHER PLANT SPECIES. Alexander Kozik and Richard W. Michelmore.

Presentation transcript:

Alexander Kozik and Richard W. Michelmore, University of California, Davis, Dept. of Vegetable Crops, Davis, CA 95616, USA

Approximately 3,700 of the genes in the Arabidopsis Col-0 genome are single copy. These genes were used to identify conserved orthologs in several other plant species. Using computational approaches, we identified 1104 lettuce, 686 sunflower, 1704 tomato, 2016 soybean, 1701 maize and 1290 rice ESTs that are conserved orthologs of these Arabidopsis genes. Each EST sequence in these sets has an unambiguous single strong BLAST hit to the Arabidopsis genome. In reciprocal BLAST searches (Arabidopsis single copy genes versus EST assemblies), more than 80% of cases had only a single strong hit. This indicates that the majority of these conserved orthologs are represented by single genes in multiple plant species.

The total number of Arabidopsis genes that have similarity (BLAST expectation value of 1e-20 or better) to at least one of these selected ESTs is 2205, which is approximately 60% of the total number of single copy genes in Arabidopsis. Only 248 sequences were in common between the EST collections from different species and the Arabidopsis single copy genes. This can be partially explained by incomplete representation within each EST collection.

Analysis and visualization of single copy genes along the Arabidopsis chromosomes revealed that these genes are distributed throughout the genome regardless of large-scale chromosomal duplications. This indicates that deduction of the order of genes in common ancestors is required for informative analyses of synteny.
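The "single strong BLAST hit" criterion described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes BLAST results have already been reduced to (query, subject, E-value) tuples, and it treats a hit as "strong" when its E-value is 1e-20 or better. The function name, the tuple layout and the sequence identifiers are hypothetical, and the original analysis may have judged ambiguity differently.

```python
from collections import defaultdict

def single_strong_hit_queries(hits, evalue_cutoff=1e-20):
    """Return {query: subject} for queries with exactly one strong hit.

    hits: iterable of (query_id, subject_id, evalue) tuples (assumed layout).
    A hit counts as strong when evalue <= evalue_cutoff.
    """
    strong = defaultdict(set)
    for query, subject, evalue in hits:
        if evalue <= evalue_cutoff:
            strong[query].add(subject)
    # Keep only queries whose strong hits point to a single subject.
    return {q: next(iter(s)) for q, s in strong.items() if len(s) == 1}

# Hypothetical example data (identifiers are placeholders, not real loci).
hits = [
    ("EST_A", "AT_GENE_1", 1e-50),  # one strong hit
    ("EST_A", "AT_GENE_2", 1e-5),   # weak hit, ignored
    ("EST_B", "AT_GENE_3", 1e-40),  # two strong hits: ambiguous
    ("EST_B", "AT_GENE_4", 1e-30),
    ("EST_C", "AT_GENE_5", 1e-10),  # no strong hit
]
result = single_strong_hit_queries(hits)
print(result)
```

In this toy input only EST_A is retained, because EST_B has two strong hits and EST_C has none.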
SINGLE COPY ORTHOLOGS SUMMARY

source                                  number of single copy orthologs
lettuce                                 1104
sunflower                               686
tomato                                  1704
soybean                                 2016
maize                                   1701
rice                                    1290
common between all                      248
common between lettuce and sunflower    431
Arabidopsis (total)                     2205 (out of 3,714 single copy genes)

Graphical representation of BLAST search of lettuce, sunflower, tomato, soybean, maize and rice ESTs against Arabidopsis genome. The picture displays potential conserved orthologs (single copy genes in Arabidopsis). Each box (element) is a single copy Arabidopsis gene having homology to selected sets of plant ESTs. Genes are plotted along five Arabidopsis chromosomes according to their physical positions.

Patterns of segmental duplications in Arabidopsis genome (generated by GenomePixelizer). Regions selected by white boxes are shown in large scale above.

Segmental duplication between Arabidopsis chromosomes 4 and 5 (panels labeled CHRM 5 and CHRM 4)

Color Scheme:
Black - single copy genes
Purple - kinases
Green - cytochrome
Red - resistance genes
Yellow - ribosomal proteins
Gray lines connect genes with sequence identity 40% or greater

Note: Single copy genes are distributed evenly through both segments of the duplicated region.
Image was generated by GenomePixelizer using the “locus zoomer” function.
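As a quick arithmetic check on the summary figures, the following Python sketch recomputes the share of Arabidopsis single copy genes with a hit (2205 out of 3,714). It also computes, for illustration only, the fraction of each EST collection selected as single copy orthologs. The collection sizes are the ones listed in the pipeline description. The per-collection ratio is an assumption made for this example and is not a quantity reported in the source.

```python
SINGLE_COPY_GENES = 3714       # Arabidopsis single copy genes
ARABIDOPSIS_WITH_HIT = 2205    # genes with similarity to at least one selected EST

# Selected single copy ortholog ESTs per species (summary table).
orthologs = {
    "lettuce": 1104, "sunflower": 686, "tomato": 1704,
    "soybean": 2016, "maize": 1701, "rice": 1290,
}
# Sizes of the EST collections (pipeline description).
collection_sizes = {
    "lettuce": 68197, "sunflower": 67180, "tomato": 113932,
    "soybean": 341564, "maize": 362510, "rice": 107329,
}

# Share of single copy genes represented by at least one selected EST.
coverage = ARABIDOPSIS_WITH_HIT / SINGLE_COPY_GENES
print(f"Arabidopsis single copy genes with a hit: {coverage:.1%}")

# Illustrative ratio: selected ESTs as a percentage of each collection.
yield_pct = {sp: 100 * orthologs[sp] / collection_sizes[sp] for sp in orthologs}
for species, pct in sorted(yield_pct.items(), key=lambda kv: -kv[1]):
    print(f"{species:10s} {pct:.2f}% of ESTs selected")
```

The first ratio comes out slightly below 60%, consistent with the rounded figure quoted in the abstract.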
Additional information is available at:
Credits: This work was funded by USDA IFAFS Plant Genome Program to the Compositae Genome Project
Questions and comments to Alexander Kozik,
Raw data and detailed description of the sequence extraction pipeline is available at:

Putative scenario of gene loss after segmental duplication
Because of extensive gene loss after duplication, deduction of gene order in ancestral genomes is required for informative synteny analysis between different genomes.

PIPELINE TO IDENTIFY SINGLE COPY ORTHOLOGS
Input: Arabidopsis predicted proteins (27,169 seqs)
[step 1] BLAST search of Arabidopsis proteins against themselves and selection of Arabidopsis single copy genes
Result: Arabidopsis single copy genes (3,714 seqs)
EST collections: lettuce ESTs (68,197 seqs), sunflower ESTs (67,180 seqs), tomato ESTs (113,932 seqs), maize ESTs (362,510 seqs), soybean ESTs (341,564 seqs), rice ESTs (107,329 seqs)
[step 2] BLAST search of Arabidopsis single copy genes versus full sets of ESTs; selection of ESTs with BLAST hits to Arabidopsis single copy subset
[step 3] BLAST search of selected ESTs versus all Arabidopsis predicted proteins and selection of ESTs with a single strong hit to Arabidopsis genome (Exp cutoff 1e-20)

PIPELINE TO EXTRACT ALIGNMENTS AT NUCLEOTIDE LEVEL
The diagram marks the stages below as [step 1] through [step 5], the last being the final step of the pipeline.
GenBank files of Arabidopsis genome (DNA sequences of entire chromosomes and corresponding annotation)
GenBank Parser: spliced DNA sequences corresponding to ORFs
translation: translated (protein) sequences [subject]
ESTs (unigene) set [query]
BLASTX search [ESTs vs proteins]: BLAST output (alignment)
BLAST parser (Tcl/Tk script): tab-delimited file with info about BLAST alignments (start points and end points for each sequence in BLAST report)
SeqsExtractorFromBlastX (Python script): extraction of DNA sequences corresponding to BLAST alignments from “spliced DNA” (subject) and EST (query) files. Script automatically counts codon usage.
Output: spreadsheet with info about codon usage

MULTIPLE ALIGNMENT VISUALIZED WITH TkLife
Rows: Arabidopsis, lettuce, sunflower, alignment summary
Alignment summary categories:
codon match (and amino acid match)
codon mismatch and amino acid match (synonymous substitutions)
codon mismatch and amino acid mismatch (non-synonymous substitutions)
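The three codon categories used in the alignment summary (codon match, synonymous substitution, non-synonymous substitution) can be illustrated with a short Python sketch. It is not the SeqsExtractorFromBlastX script. It assumes two in-frame, already aligned nucleotide sequences and the standard genetic code. The function name and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_codons(seq_a, seq_b):
    """Count codon matches, synonymous and non-synonymous differences.

    Both sequences are assumed to be aligned and to start in frame.
    Codons containing gaps or ambiguous bases are skipped.
    """
    counts = {"match": 0, "synonymous": 0, "non_synonymous": 0}
    usable = min(len(seq_a), len(seq_b)) // 3 * 3
    for i in range(0, usable, 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if codon_a == codon_b:
            counts["match"] += 1
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["synonymous"] += 1       # codon differs, amino acid same
        else:
            counts["non_synonymous"] += 1   # codon and amino acid differ
    return counts

# Hypothetical example: ATG/ATG match, GCT/GCC synonymous (Ala),
# AAA/AGA non-synonymous (Lys vs Arg).
counts = classify_codons("ATGGCTAAA", "ATGGCCAGA")
print(counts)
```

Counting per category in this way gives the kind of tally that could feed a codon usage spreadsheet, although the layout of the original output is not described in the source.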