Biological Data Integration
July 22, 2003
GTL Data and Tools Workshop, Gaithersburg, MD
Cathy H. Wu, Ph.D.
Professor of Biochemistry & Molecular Biology
Director, Protein Information Resource
Georgetown University Medical Center
2
AdoCbl supports AdoCbl-dependent diol/glycerol dehydratases (EC ) (Salmonella PduO is experimentally characterized; many are predicted to support the same enzyme based on gene context)
AF1290 (EC ), AF1288 (EC )
In Archaeoglobus fulgidus, the AF1290 gene (SF member) co-occurs with the methylmalonyl-CoA mutase (EC ) gene
This leads to the prediction that ATR of the PduO type can support AdoCbl-dependent methylmalonyl-CoA mutase and therefore corresponds to the cblB complementation group of the methylmalonic aciduria disorder
The prediction is experimentally verified: human ATR was cloned by complementation of an ATR-deficient Salmonella mutant
Diagram labels: AdoCbl; Propionyl-CoA Metabolism; Propanediol Utilization; AdoCbl Cofactor Biosynthesis
Three types of ATR (EC ): PduO type (SF036411, SF015651), EutT type (SF012294), CobA type (SF015617)
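The gene-context reasoning on this slide (two genes found near each other suggest a functional link) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the family labels `ATR_PduO` and `MCM`, the toy gene orders, the window size, and both function names are hypothetical and are not taken from PIR or from any real genome.

```python
from typing import Dict, List


def neighbors_within(genome: List[str], a: str, b: str, window: int) -> bool:
    """Return True if families a and b occur within `window` gene positions."""
    pos_a = [i for i, fam in enumerate(genome) if fam == a]
    pos_b = [i for i, fam in enumerate(genome) if fam == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


def cooccurrence_count(
    genomes: Dict[str, List[str]], a: str, b: str, window: int = 3
) -> int:
    """Count genomes in which families a and b are chromosomal neighbors."""
    return sum(neighbors_within(g, a, b, window) for g in genomes.values())


# Toy, made-up gene orders (each genome is an ordered list of family labels).
TOY_GENOMES: Dict[str, List[str]] = {
    "genome_1": ["X", "ATR_PduO", "Y", "MCM", "Z"],
    "genome_2": ["MCM", "X", "Y", "Z", "W", "ATR_PduO"],
    "genome_3": ["X", "Y", "Z"],
}

if __name__ == "__main__":
    print(cooccurrence_count(TOY_GENOMES, "ATR_PduO", "MCM"))
```

A count like this is only a hint; as the slide notes, the resulting prediction still had to be verified experimentally.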
3 Bioinformatics System Requirements for Function and Pathway Discovery
Data Integration: coupling homology search results with integrative biology information (genome context, protein fusions, phylogenetic profiles, pathways, protein interactions, complexes, gene/protein expression)
Associative Analysis: associating complete genomes with phylogenies, pathways, and networks
Evidence Attribution: attributing sources and strengths of evidence
User Interactivity: allowing interactive, iterative, and custom-tailored analyses
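One way to picture the Data Integration and Evidence Attribution requirements together is a record that carries its source and evidence strength alongside the annotation. The sketch below is an assumption, not a description of any PIR schema: the two-level `Evidence` scale, the `Annotation` fields, and the `best_supported` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List


class Evidence(IntEnum):
    """Hypothetical two-level strength scale; higher means stronger."""
    PREDICTED = 1
    EXPERIMENTAL = 2


@dataclass(frozen=True)
class Annotation:
    protein_id: str     # placeholder identifier, not a real accession
    function: str       # the functional assignment being made
    source: str         # where the assignment came from
    evidence: Evidence  # how strongly it is supported


def best_supported(annotations: Iterable[Annotation]) -> List[Annotation]:
    """Order annotations strongest-first; ties keep their input order."""
    return sorted(annotations, key=lambda a: a.evidence, reverse=True)


if __name__ == "__main__":
    records = [
        Annotation("toy_1", "function A", "homology search", Evidence.PREDICTED),
        Annotation("toy_1", "function A", "literature", Evidence.EXPERIMENTAL),
        Annotation("toy_1", "function B", "genome context", Evidence.PREDICTED),
    ]
    for rec in best_supported(records):
        print(rec.evidence.name, rec.source, rec.function)
```

Keeping the source and strength on every record lets a user see why an assignment was made before relying on it.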
4 Biological Data Integration
Challenge: Voluminous, Complex, Dynamic, Heterogeneous, Distributed
Issues
  Nomenclature and Ontology
  Distribution Formats
  Annotation Errors and Error Propagation
Approaches (UniProt/iProClass)
  Standardized Nomenclature (Protein Names)
  Controlled Vocabulary (Features, Keywords)
  Accepted Nomenclature/Ontologies (EC, GO, NCBI Taxonomy)
  Common Distribution Formats (XML/DTD, MySQL/DB Schema, Object Models)
  Evidence Attribution
  Family Classification and Rule-Based Annotation
5 Evidence Attribution
Sources and Strengths of Evidence
Experimentally Verified vs. Computationally Predicted
Retrospective Literature Survey
Classification-Driven, Rule-Based Annotation
Systematic Detection and Correction of Annotation Errors
Consistent Annotation of Protein Names, Features, Keywords/GO Terms
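The idea of classification-driven, rule-based annotation with systematic error detection can be sketched as a lookup from family to expected name, followed by a consistency check. This is a minimal illustration under stated assumptions: the `RULES` table, the name strings, the `Entry` type, the `check_names` function, and the toy entries are hypothetical; only the family identifiers SF012294 (EutT type) and SF015617 (CobA type) come from the slides.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    protein_id: str        # placeholder identifier, not a real accession
    family: Optional[str]  # family assignment, or None if unclassified
    name: str              # protein name currently attached to the entry


# Hypothetical rule table: family identifier -> name the rule would assign.
RULES: Dict[str, str] = {
    "SF012294": "ATR, EutT type",
    "SF015617": "ATR, CobA type",
}


def check_names(
    entries: List[Entry], rules: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (protein_id, current_name, expected_name) for inconsistent entries."""
    flagged = []
    for e in entries:
        expected = rules.get(e.family) if e.family else None
        # Entries without a family, or without a matching rule, are skipped.
        if expected is not None and e.name != expected:
            flagged.append((e.protein_id, e.name, expected))
    return flagged


if __name__ == "__main__":
    toy_entries = [
        Entry("toy_1", "SF012294", "ATR, EutT type"),
        Entry("toy_2", "SF015617", "hypothetical protein"),
        Entry("toy_3", None, "unclassified"),
    ]
    for protein_id, current, expected in check_names(toy_entries, RULES):
        print(f"{protein_id}: '{current}' -> suggested '{expected}'")
```

A flagged entry is a candidate for curator review rather than an automatic correction, which keeps the suggested name distinguishable from experimentally verified annotation.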