Strain-, species-, and genus-specific core unique proteins from selected organisms CUPID: Core and Unique Protein IDentification Raja Mazumder and Darren.

Presentation transcript:

Strain-, species-, and genus-specific core unique proteins from selected organisms

CUPID: Core and Unique Protein IDentification

Raja Mazumder and Darren A Natale

Protein Information Resource, 3300 Whitehaven St., Georgetown University, Washington, DC

The identification of unique proteins at different taxonomic levels has both scientific and practical value. Strain-, species- and genus-specific proteins can provide insight into the criteria that define an organism and its relationships with close relatives. Such proteins can also serve as taxon-specific diagnostic targets. We developed a general protocol to identify proteins unique to different taxa, and applied it to a set of food- and water-borne pathogens. Specifically, we: a) identify proteins that are unique to a particular strain, species, or genus; b) extract the set of proteins common to two or more strains or species; and c) determine the organism most closely related to a particular genus, species or strain.

Computational identification of strain-, species- and genus-specific proteins

A pipeline using a combination of computational and manual analyses of BLAST results was developed to identify strain-, species-, and genus-specific proteins and to catalog the closest sequenced relative for each protein in a proteome.

[Figure: Flow chart of the method used to detect strain-, species-, or genus-specific proteins.]
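The three kinds of protein sets named above (unique, core, and core unique) can be illustrated with plain set operations. This is a minimal sketch, not the CUPID pipeline itself: it assumes homology has already been reduced to shared family labels, whereas CUPID works from BLAST results and manual review. The function names, family labels, and organism names are all hypothetical.

```python
from functools import reduce


def core_proteins(group):
    """Families present in every proteome of the group (assumes a non-empty group)."""
    return reduce(set.intersection, (set(fams) for fams in group.values()))


def unique_proteins(group, outgroup):
    """Families found somewhere in the group but in no proteome outside it."""
    inside = set().union(*group.values())
    outside = set().union(*outgroup.values())
    return inside - outside


def core_unique_proteins(group, outgroup):
    """Families shared by every member of the group and absent outside it."""
    return core_proteins(group) - set().union(*outgroup.values())


# Hypothetical family assignments for two strains of one species
# and one related organism outside that species.
strains = {
    "strain_1": {"famA", "famB", "famC"},
    "strain_2": {"famA", "famB", "famD"},
}
others = {"relative_1": {"famB", "famE"}}

core = core_proteins(strains)
unique = unique_proteins(strains, others)
core_unique = core_unique_proteins(strains, others)
```

Here famA is the only family that is both shared by the two strains and absent from the relative, so it is the only core unique family at the species level in this toy data.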
Proteins were designated as not unique if any of the following applied: (a) the protein is part of a merged entry (containing identical proteins from different organisms) and one of the source organisms is considered non-self; (b) the query protein hits a non-self subject with E < 0.001; (c) the non-self best hit, when itself used as a query (in a reverse BLAST), retrieves the initial query as an organism-specific best hit (that is, they are reciprocal best hits, and thus potential orthologs); or (d) the non-self best hit fails to retrieve the initial query as its potential ortholog but does retrieve the initial query, and manual inspection reveals homology. Sequences that could not retrieve themselves with an expect value of 1e-14 or better in the forward BLAST using the BLOSUM62 matrix were retested using the PAM30 matrix.

[Figure: Example output from CUPID.]

Acknowledgements: Development of CUPID was supported by grant AWD from the District of Columbia Water and Sewer Authority. The bioinformatics infrastructure on which CUPID is based was supported in part by grant U01-HG02712 from the National Institutes of Health.

This work is described in: Mazumder R, Natale DA, Murthy S, Thiagarajan R, Wu CH. Computational identification of strain-, species- and genus-specific proteins. BMC Bioinformatics Nov 23;6:279.

Discussion: The salient features of CUPID are that it: a) provides sets of proteins unique to a strain, species, and genus; b) includes a check for additional homologs based on reciprocal hits; c) uses different parameters for short sequences; d) provides the identity of the nearest non-self neighbor; and e) allows retrieval of unique, core, and core unique proteins at different taxonomic levels. One use of the CUPID system is to help identify diagnostic targets specific to a particular clade of these pathogens. The unique proteins form a short list of diagnostic targets which can be validated in the laboratory. A summary of the annotations shows that many of the proteins are not annotated.
A significant number of the ones that are annotated are related to virulence and/or are external proteins. Proteins predicted to be external to the bacterial cell (possibly involved in host interactions and virulence) may be used to develop protein-based detection systems. In addition, it should be possible to use the DNA encoding these proteins as the basis for diagnostics. However, it is important to note that protein-identified DNA probes must be verified as to their uniqueness at the DNA level.

[Table: counts of unique proteins per organism. Columns: Organism; Strain-specific proteins; Species-specific proteins; Genus-specific proteins. Rows: Escherichia coli O157:H; Escherichia coli K; Helicobacter pylori J; Helicobacter pylori. Values not recoverable.]

[Table: Summary of annotations for genus-specific unique proteins from selected water- and wastewater-borne pathogens and pathogen-related organisms. Columns: Organism; Unique; Virulence; Misc.; Unknown; Phage; External. Rows: Helicobacter hepaticus ATCC; Escherichia coli O157:H; Salmonella enterica subsp. enterica serovar Typhi Ty; Salmonella typhimurium LT; Escherichia coli K; Helicobacter pylori J; Helicobacter pylori. Values not recoverable.]
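The "not unique" criteria (a) to (d) and the matrix retest rule from the methods can be sketched as a small decision function. This is an illustrative sketch under stated assumptions rather than the CUPID implementation: the `Evidence` record, its field names, and both function names are hypothetical; the criteria are treated as alternatives; and the manual-inspection step is represented by a boolean that a curator would supply. Only the two thresholds (E < 0.001 and 1e-14) and the matrix names come from the text.

```python
from dataclasses import dataclass
from typing import Optional

NONSELF_E_CUTOFF = 1e-3   # criterion (b): non-self hit with E < 0.001
SELF_E_CUTOFF = 1e-14     # self-retrieval threshold for the forward BLAST


@dataclass
class Evidence:
    """Hypothetical per-protein summary of BLAST and curation results."""
    merged_entry_has_nonself: bool        # criterion (a)
    best_nonself_evalue: Optional[float]  # None when there is no non-self hit
    reverse_best_hit_is_query: bool       # criterion (c): reciprocal best hits
    reverse_retrieves_query: bool         # criterion (d): query retrieved, not as best hit
    manual_homology: bool                 # criterion (d): curator's judgment


def is_not_unique(ev: Evidence) -> bool:
    """Return True if any of criteria (a) to (d) marks the protein as not unique."""
    if ev.merged_entry_has_nonself:                        # (a)
        return True
    if ev.best_nonself_evalue is None:                     # no non-self hit at all
        return False
    if ev.best_nonself_evalue < NONSELF_E_CUTOFF:          # (b)
        return True
    if ev.reverse_best_hit_is_query:                       # (c)
        return True
    return ev.reverse_retrieves_query and ev.manual_homology  # (d)


def choose_matrix(self_evalue_blosum62: Optional[float]) -> str:
    """Keep BLOSUM62 if the sequence retrieves itself at 1e-14 or better, else retest with PAM30."""
    if self_evalue_blosum62 is not None and self_evalue_blosum62 <= SELF_E_CUTOFF:
        return "BLOSUM62"
    return "PAM30"


# A weak non-self hit that turns out to be a reciprocal best hit: not unique by (c).
weak_reciprocal = Evidence(False, 0.5, True, True, False)
# No non-self hit and no merged entry: the protein stays a candidate unique protein.
no_hits = Evidence(False, None, False, False, False)
```

Keeping the curator's decision as an explicit input makes it visible that criterion (d) cannot be settled by score thresholds alone.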