
Ontology Driven Data Collection for EuPathDB Jie Zheng, Omar Harb, Chris Stoeckert Center for Bioinformatics, University of Pennsylvania.




1 Ontology Driven Data Collection for EuPathDB Jie Zheng, Omar Harb, Chris Stoeckert Center for Bioinformatics, University of Pennsylvania

2 Issues associated with Data Collection
– Heterogeneity of free text
– Difficult data integration that requires human intervention
– Limited support for complex queries

3 Examples: GenBank

4 Data Collection for EuPathDB
Apply ontologies to the design of data submission forms:
– A form to collect sequence data and information on pathogen isolates: the geographic location where the isolate specimen was collected, and host organism information (species, age, clinical information)
– A form to collect genetic manipulation data with the resulting phenotypes: the mutation method, and the effects of the genetic modification on the parasite and on the location, function, and involvement in biological processes of the resulting modified protein
These data are important for parasite epidemiology and for research on vaccines and anti-parasitic drugs.
Enable queries such as:
– Compare sequence data from Plasmodium isolates restricted to East Africa with those from West Africa, controlling for host age and health
– List genes that, when knocked out, cause a parasite growth defect during the erythrocytic cycle
– List genes fused to green fluorescent protein (GFP) that, when expressed, localize to the cell membrane
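Queries like these become simple filters once region, host age, and host health are stored as separate, controlled fields instead of being buried in free text. The sketch below illustrates the first query under stated assumptions: the `Isolate` record, its field names, and the sample records are hypothetical, and this is not EuPathDB's search implementation.

```python
from dataclasses import dataclass

# Hypothetical isolate record: each property a curator would otherwise have to
# dig out of free text is stored in its own field.
@dataclass(frozen=True)
class Isolate:
    isolate_id: str
    genus: str
    region: str            # controlled value, e.g. "East Africa"
    host_age_years: int
    host_healthy: bool

def comparable_groups(isolates, genus, regions, age_range, healthy):
    """Group isolate IDs by region, keeping only hosts in the same age band and health state."""
    low, high = age_range
    groups = {region: [] for region in regions}
    for iso in isolates:
        if (iso.genus == genus
                and iso.region in groups
                and low <= iso.host_age_years <= high
                and iso.host_healthy == healthy):
            groups[iso.region].append(iso.isolate_id)
    return groups

# Made-up records for illustration only.
isolates = [
    Isolate("iso-1", "Plasmodium", "East Africa", 4, True),
    Isolate("iso-2", "Plasmodium", "West Africa", 5, True),
    Isolate("iso-3", "Plasmodium", "West Africa", 30, True),   # outside the age band
    Isolate("iso-4", "Plasmodium", "East Africa", 3, False),   # different host health
    Isolate("iso-5", "Toxoplasma", "East Africa", 4, True),    # different genus
]
print(comparable_groups(isolates, "Plasmodium", ["East Africa", "West Africa"], (0, 5), True))
# {'East Africa': ['iso-1'], 'West Africa': ['iso-2']}
```

With free-text source notes, the same question would first require a curator to extract the region, age, and health state from each record by hand.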

5 EuPathDB
EuPathDB (Eukaryotic Pathogen Database Resources) is a NIAID Bioinformatics Resource Center covering eukaryotic parasites.
Aurrecoechea C, et al. EuPathDB: a portal to eukaryotic pathogen databases. Nucleic Acids Res. 2010.

6 Isolate Data
– Isolate datasets need to be imported from GenBank and integrated
– GenBank did not specify the metadata needed for isolates, so manual curation is required:
  – Harmonize values to enable host queries, e.g. "Human" -> "Homo sapiens"
  – Deconvolute free-text descriptions, e.g. "isolated from storm waters", "isolated from Homo sapiens patient infected with HIV"
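The slide states that this harmonization required manual curation. Purely to illustrate the structured result curation aims for, the following sketch normalizes host names through a synonym table and splits the two example descriptions into separate fields with toy regular-expression rules. The synonym table, field names, and rules are assumptions, not the curation procedure used; `NCBITaxon:9606` is the NCBI Taxonomy term for Homo sapiens.

```python
import re

# Hypothetical synonym table: free-text host names -> one canonical species name.
HOST_SYNONYMS = {
    "human": "Homo sapiens",
    "humans": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
}
# NCBI Taxonomy term ID for each canonical host name.
HOST_TERM_IDS = {"Homo sapiens": "NCBITaxon:9606"}

def normalize_host(text):
    """Return the canonical host name for a free-text value, or None if it is not recognized."""
    return HOST_SYNONYMS.get(text.strip().lower())

def deconvolute(description):
    """Toy rule-based split of an 'isolated from ...' note into separate fields."""
    match = re.match(r"isolated from (.+)", description.strip(), re.IGNORECASE)
    if not match:
        return {"unparsed": description}
    source = match.group(1)
    fields = {}
    # Pull out an infection mentioned as "infected with X".
    infection = re.search(r"infected with (\S+)", source, re.IGNORECASE)
    if infection:
        fields["host_infection"] = infection.group(1)
    # Look for a known host name as a whole word or phrase.
    for synonym, canonical in HOST_SYNONYMS.items():
        if re.search(rf"\b{re.escape(synonym)}\b", source, re.IGNORECASE):
            fields["host"] = canonical
            fields["host_term"] = HOST_TERM_IDS[canonical]
            break
    # No host found: treat the source as an environmental sample.
    if "host" not in fields:
        fields["environmental_source"] = source
    return fields
```

For example, `deconvolute("isolated from Homo sapiens patient infected with HIV")` yields separate `host`, `host_term`, and `host_infection` fields, while `deconvolute("isolated from storm waters")` yields only `environmental_source`. Rules this naive would misfire on many real GenBank notes, which is why curation, or better, structured collection at submission time, is needed.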

7 Isolate Data: GenBank -> EuPathDB

8 Isolate Submission Form
– Target isolate information
– Geographic location
– Source organism sample information or environmental sample information
– Sequence information
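One way to capture this form's structure in code, assuming hypothetical class and field names, is a record in which source organism and environmental sample information are mutually exclusive, matching the "or" on the form. The check that the sequence uses only IUPAC nucleotide codes is an added assumption, not a rule stated on the slide.

```python
from dataclasses import dataclass
from typing import Optional

# IUPAC nucleotide codes, including ambiguity codes.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")

@dataclass
class SourceOrganismSample:
    host_species: str
    host_age: Optional[str] = None
    clinical_info: Optional[str] = None

@dataclass
class EnvironmentalSample:
    sample_type: str                  # e.g. "storm water"

@dataclass
class IsolateSubmission:
    isolate_name: str                 # target isolate information
    organism: str
    geographic_location: str
    sequence: str                     # sequence information
    source_organism: Optional[SourceOrganismSample] = None
    environmental: Optional[EnvironmentalSample] = None

    def validate(self):
        """Return a list of problems; an empty list means the record passes these checks."""
        errors = []
        # The form asks for source organism OR environmental sample information, not both.
        if (self.source_organism is None) == (self.environmental is None):
            errors.append("give exactly one of source organism or environmental sample information")
        if not self.sequence or not set(self.sequence.upper()) <= IUPAC_NUCLEOTIDES:
            errors.append("sequence must be non-empty and use IUPAC nucleotide codes")
        return errors
```

For example, a record with a `SourceOrganismSample("Homo sapiens")` and sequence `"ACGTN"` passes, while a record that fills in both sample sections, or neither, is rejected.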

9 Ontology-based Representation of Isolate Data
The data collected in the submission form are shown in bold. Fields that require ontology terms are shown in thick-bordered boxes.
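Fields that require ontology terms can be checked before submission by testing that each holds a term ID of the form `PREFIX:LOCAL_ID` from an accepted ontology. This is a minimal sketch; the field names are hypothetical, and the ontology each field accepts (NCBITaxon for host species, the GAZ gazetteer for geographic location) is an assumption, since the ontologies chosen are not listed in this transcript.

```python
import re

# A compact ontology term ID (CURIE) such as "NCBITaxon:9606".
CURIE = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z0-9_]+)$")

# Hypothetical: the ontology prefixes each ontology-backed field accepts.
ALLOWED_PREFIXES = {
    "host_species": {"NCBITaxon"},
    "geographic_location": {"GAZ"},
}

def check_ontology_fields(record, allowed=ALLOWED_PREFIXES):
    """Map each offending field name to a problem description; an empty dict means all fields pass."""
    problems = {}
    for field_name, prefixes in allowed.items():
        match = CURIE.match(record.get(field_name) or "")
        if match is None:
            problems[field_name] = "missing or not an ontology term ID"
        elif match.group(1) not in prefixes:
            problems[field_name] = f"terms from {match.group(1)} are not allowed here"
    return problems
```

A record with `"host_species": "NCBITaxon:9606"` but a free-text `"geographic_location": "Kenya"` would be flagged only for the location field.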

10 Isolate Submission Form

11 Ontology Selection

12 Excel Format
– According to our community advisors, data are generally already collected in this format
  – Lowers the barrier to use
– Easily and automatically converted to a GenBank submission-ready format
– Allows multiple sequences to be submitted at once

13 Parser for GenBank Submission
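The parser itself is not shown in this transcript. A minimal sketch of the idea, assuming the spreadsheet has been saved as CSV with hypothetical column names, writes one FASTA record per row and carries the metadata as bracketed `[modifier=value]` source modifiers in the definition line, a format accepted by NCBI submission tools such as table2asn. A direct .xlsx reader (for example the third-party openpyxl package) could replace the CSV step.

```python
import csv
import io

# Hypothetical spreadsheet columns -> NCBI FASTA defline source modifiers.
COLUMN_TO_MODIFIER = {
    "organism": "organism",
    "isolate": "isolate",
    "country": "country",
    "host": "host",
    "collection_date": "collection-date",
}

def rows_to_fasta(csv_text, width=60):
    """Turn one spreadsheet row per sequence into one FASTA record per sequence."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        modifiers = []
        for column, modifier in COLUMN_TO_MODIFIER.items():
            value = (row.get(column) or "").strip()
            if value:                      # skip empty cells
                modifiers.append(f"[{modifier}={value}]")
        header = " ".join([">" + row["seq_id"].strip()] + modifiers)
        # Drop whitespace inside the sequence cell and wrap to fixed-width lines.
        sequence = "".join((row.get("sequence") or "").split()).upper()
        lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        records.append("\n".join([header] + lines))
    return "\n".join(records) + "\n"
```

A row such as `Seq1,Cryptosporidium parvum,iso-1,Kenya,Homo sapiens,,acgt acgt` becomes the header `>Seq1 [organism=Cryptosporidium parvum] [isolate=iso-1] [country=Kenya] [host=Homo sapiens]` followed by the sequence line `ACGTACGT`; each additional row adds another record, which is what makes multi-sequence submission straightforward.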

14 Genetic Manipulation and Phenotype Data
– T. brucei RNAi knockdowns
– Integrate phenotype data from other resources (GeneDB)
– Allow individuals to submit phenotype data through User Comments on EuPathDB gene pages
– Either way, these are free-text descriptions, which limits their utility for data exploration

15 Genetic Manipulation and Phenotype Submission Form
Genetic manipulation
– Mutation method, including selective marker and reporter if available
– Mutation type (effect on gene function)
Phenotype data: the impact of the genetic manipulation on four possible observed features
– Quality of the organism
– Cellular location of the gene product
– Molecular function of the gene product
– Biological process of the gene product
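A possible data model for this form, sketched under assumptions: class and field names are hypothetical, and the feature-to-ontology mapping (PATO for organism quality, GO for the three gene-product features) is inferred from the ontologies listed for the phenotype section. Once term IDs are stored instead of free text, the GFP query on the Data Collection slide becomes a filter; `GO:0005886` is the GO term for plasma membrane (synonym: cell membrane).

```python
from dataclasses import dataclass, field

# Assumed mapping of the four phenotype features on the form to the ontology
# used to annotate them.
FEATURE_ONTOLOGY = {
    "organism quality": "PATO",
    "cellular location": "GO",
    "molecular function": "GO",
    "biological process": "GO",
}

@dataclass(frozen=True)
class Phenotype:
    feature: str       # one of the FEATURE_ONTOLOGY keys
    term_id: str       # ontology term ID, e.g. "GO:0005886" (plasma membrane)

    def __post_init__(self):
        # Reject unknown features and terms from the wrong ontology.
        expected = FEATURE_ONTOLOGY.get(self.feature)
        if expected is None:
            raise ValueError(f"unknown phenotype feature: {self.feature}")
        if self.term_id.split(":", 1)[0] != expected:
            raise ValueError(f"{self.feature} needs a {expected} term, got {self.term_id}")

@dataclass
class GeneticManipulation:
    gene_id: str
    mutation_method: str           # how the gene was modified
    mutation_type: str             # effect on gene function
    selective_marker: str = ""     # optional
    reporter: str = ""             # optional, e.g. "GFP"
    phenotypes: list = field(default_factory=list)

def genes_with_phenotype(manipulations, feature, term_id, reporter=None):
    """IDs of manipulated genes annotated with the given term, optionally filtered by reporter."""
    return [
        m.gene_id
        for m in manipulations
        if (reporter is None or m.reporter == reporter)
        and Phenotype(feature, term_id) in m.phenotypes
    ]
```

For instance, `genes_with_phenotype(records, "cellular location", "GO:0005886", reporter="GFP")` lists GFP-tagged genes localized to the cell membrane, and `Phenotype("organism quality", "GO:0005886")` is rejected because organism quality expects a PATO term.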

16 Ontology-based Representation of Genetic Manipulation with Resulting Phenotype Data
The data collected in the submission form are shown in bold. Fields that require ontology terms are shown in thick-bordered boxes. The Ontology for Parasite Lifecycle (OPL) will be used to annotate life cycle stages.

17 Ontology-based Representation of Genetic Manipulation – Gene Knock Out

18 Genetic Manipulation Section (figure; ontology shown: OBI)

19 Phenotype Section (figure; fields shown: cellular location, biological process; ontologies shown: GO, OBI, OPL, PATO)

20 Web-based Form
– Collects data directly from specific components of the EuPathDB web site
– Changes dynamically based on the user's inputs (e.g. life cycle stage options depend on the species; the selective marker, reporter, and similar sections are displayed only when needed)
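The dynamic behavior amounts to a function from the current inputs to the choices and sections the form displays. In a web page this logic would run in the browser; the Python sketch below shows only the decision rules. The partial stage lists and the method-to-section rules are illustrative assumptions, not EuPathDB's configuration, and a real form would take life cycle stages from OPL rather than a hard-coded table.

```python
# Illustrative, partial life cycle stage options per genus; a real form would
# draw these from the Ontology for Parasite Lifecycle (OPL).
STAGES_BY_GENUS = {
    "Plasmodium": ["sporozoite", "liver stage", "ring", "trophozoite", "schizont", "gametocyte"],
    "Trypanosoma": ["procyclic form", "bloodstream form"],
}

# Hypothetical rules for which optional sections a mutation method needs.
METHODS_NEEDING_MARKER = {"gene knockout", "gene tagging"}
METHODS_NEEDING_REPORTER = {"gene tagging"}

def form_layout(species, mutation_method):
    """Decide which choices and optional sections the form shows for the current inputs."""
    genus = species.split()[0] if species.strip() else ""
    return {
        "lifecycle_stage_options": STAGES_BY_GENUS.get(genus, []),
        "show_selective_marker": mutation_method in METHODS_NEEDING_MARKER,
        "show_reporter": mutation_method in METHODS_NEEDING_REPORTER,
    }
```

For example, `form_layout("Trypanosoma brucei", "gene knockout")` offers only the two trypanosome stages and shows the selective marker section but not the reporter section. Keeping these rules in data tables rather than in page code makes them easy to revise as feedback on the prototype arrives.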

21 Future Work
Submission forms are at the prototype stage.
– Distribute the isolate submission forms to EuPathDB users
– Incorporate the genetic manipulation and phenotype form into the EuPathDB website
– Evaluate the submission forms based on the data collected
– Improve the submission forms based on feedback

22 Acknowledgements
– Stoeckert Lab
– Haiming Wang and the EuPathDB Team
– EuPathDB Community: Dr. G. Robinson, Dr. R. Chalmers, Dr. C.J. Janse, Dr. G. Widmer, Dr. L. Xiao, Dr. S.M. Khan
– Funding:
  – NIH grant 5R01GM93132-1
  – National Institute of Allergy and Infectious Diseases at the National Institutes of Health, Award NO1-AI900038C, Contract No. HHSN272200900038C

23 Thank You!

