EBI is an Outstation of the European Molecular Biology Laboratory. Anatomy ontology ArrayExpress Helen Parkinson,


1 www.ebi.ac.uk/arrayexpress EBI is an Outstation of the European Molecular Biology Laboratory. Anatomy ontology evaluation @ ArrayExpress Helen Parkinson, PhD

2 Content
- ArrayExpress use cases
- Fuzzy matching of ontology terms
- Data driven ontology building
- Wish list

3 ArrayExpress: Overview
[Diagram labels: Submit; Hybs; Experiment queries; Public/Private; ATLAS; Summarize; Public Only; Re-annotate; Gene queries; Genes; Cross expt/species queries]

4 Fuzzy matching of ontology terms – why?
- Clean up ArrayExpress OE and synonym tables
- OE based integration
- Constrain OEs on data entry/validation
- Improved searches in repository/DW web interface
- Data integration across species, experiments and experimental designs
- Automated mapping of free text to ontology terms for data import

5 Phonetic Matching
- Precompute phonetic encodings of all terms in the ontology
- Match each target term by comparing these encodings
- Soundex: Robert Russell and Margaret Odell (1918), famously described by Donald Knuth
- Double Metaphone: Lawrence Philips (2000)
- Metaphone: Lawrence Philips
- Most matches are single
- Highest success rate
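The precompute-then-compare idea on this slide can be sketched as follows. This is a minimal illustration using American Soundex only, not the ArrayExpress implementation. The helper names (`soundex`, `phonetic_key`, `build_index`, `match`), the per-word encoding of multi-word terms, and the sample term list are all assumptions made for the example.

```python
from collections import defaultdict

# Soundex digit groups (American Soundex).
_CODES = {}
for _digit, _letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                         "4": "l", "5": "mn", "6": "r"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of a single word ('' if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")  # vowels (and y) have no code
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (encoded + "000")[:4]


def phonetic_key(term: str) -> tuple:
    """Encode a (possibly multi-word) term word by word. Assumed convention."""
    return tuple(soundex(w) for w in term.split())


def build_index(ontology_terms):
    """Precompute encodings for all ontology terms."""
    index = defaultdict(list)
    for term in ontology_terms:
        index[phonetic_key(term)].append(term)
    return index


def match(target: str, index) -> list:
    """Return ontology terms whose encoding equals that of the target."""
    key = phonetic_key(target)
    return list(index.get(key, [])) if key else []


if __name__ == "__main__":
    # Hypothetical ontology terms, for illustration only.
    idx = build_index(["liver", "kidney", "cerebral cortex", "hypothalamus"])
    for free_text in ["kidny", "cerebral cortx", "root hair"]:
        print(free_text, "->", match(free_text, idx))
```

Because the index maps each encoding to a list, a target can return zero, one, or several candidate terms. Double Metaphone or Metaphone could be substituted for `soundex` without changing the indexing logic.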

6 Algorithm comparisons

7 Percent matches using automated mapping

8 Failures to match
- Species (or Kingdom)-specific terms (e.g. plant anatomy)
- Conflated terms (e.g. diseased cell types)
- Compound terms (e.g. "cerebral cortex and hypothalamus")
- Genuinely missing terms
- Esoteric terms less of a priority
- Most trivial misspellings, however, were matched
- Dirty input data
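To illustrate why a compound annotation such as "cerebral cortex and hypothalamus" fails whole-string matching, the sketch below splits the annotation into parts before lookup. This is an assumed pre-processing step, not something the slides describe. The term set, the separator list, and the function names are hypothetical, and exact lookup stands in for whichever matcher is actually used.

```python
import re

# Hypothetical ontology terms, for illustration only.
ONTOLOGY_TERMS = {"cerebral cortex", "hypothalamus", "liver"}

# Assumed separators: comma, semicolon, slash, or the word "and".
_SEPARATORS = re.compile(r"\s*(?:,|;|/|\band\b)\s*")


def split_compound(text: str) -> list:
    """Split a free-text annotation into candidate single terms."""
    parts = _SEPARATORS.split(text.strip().lower())
    return [p for p in parts if p]


def match_annotation(text: str) -> dict:
    """Map each candidate term to whether it is found in the ontology.

    The whole string is tried first, so a genuine term that happens to
    contain a separator is not split when it matches as it stands.
    """
    normalized = text.strip().lower()
    if normalized in ONTOLOGY_TERMS:
        return {normalized: True}
    return {part: part in ONTOLOGY_TERMS for part in split_compound(text)}


if __name__ == "__main__":
    print(match_annotation("cerebral cortex and hypothalamus"))
    print(match_annotation("root hair"))
```

Splitting on "and" is a heuristic: a legitimate term that contains "and" but is missing from the ontology would be split incorrectly, so the results are best treated as suggestions for a curator.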

9 Implications
- Need more terms in some commonly-used ontologies
- Synonyms are important: they generate less noise and give better coverage
- Choice of ontology can limit expressivity, which will be frustrating to biologists

10 Why?
- Clean up ArrayExpress OE and synonym tables
- Add accessions/DB links to these tables
- Constrain OEs on data entry/validation
- Improved searches in repository/DW web interface
- Generate suggestions for new OE terms
- Evaluate domain coverage by a given ontology

11 Developing the Ontology (ArrayExpress Ontology Development and Future Directions, 17.10.2015)
- Define Scope: ArrayExpress already has some useful structure given the current database plus rich source of use cases and competency questions.
- Build: Ontology Capture: Identify key concepts and relationships within our domain and give explicit definitions to these features:
  - Middle-out approach – specify core of basic terms then specialise and generalise as required
  - Mappings – text mining approach to do initial semi-automated mappings to external resources for rapid coverage
  - Manual mapping for data warehouse data, and selected data sets

12 Capture to Code: Definitions and Hierarchy

13 Semantic Roadmap
Position of the ArrayExpress Experimental Factor Ontology in the ‘bigger picture’
[Diagram labels: AE Ontology; Disease Ontology; Common Anatomy Reference Ontology; Cell Type Ontology; Chemical Entities of Biological Interest (ChEBI); NCI; Various Species Anatomy Ontologies]
Key is orthogonal coverage, reuse of existing resources and shared frameworks

14 Wish list
- NOT to build our own anatomy ontology
- CARO extension
- CARO evaluation
- Mapping CARO to relevant multi-species ontologies
- Application of CARO to ArrayExpress data
- Use of CARO in ArrayExpress tools

15 Acknowledgments
Anna Farne, Ele Holloway, James Malone, Margus Lukk, ArrayExpress Production Team, Helen Parkinson, Tim Rayner, Faisal Rezwan, Eleanor Williams, Mengyao Zhao, Holly Zheng

