Ontology-based Annotation & Query of TMA data Nigam Shah Stanford Medical Informatics

1 Ontology-based Annotation & Query of TMA data Nigam Shah Stanford Medical Informatics (nigam@stanford.edu)

2 Tissue Microarrays www.nature.com/clinicalpractice/onc

3 Stanford tissue microarray database http://tma.stanford.edu/tma_portal/

4 Key analysis issue
- Tissue microarrays query a large number of samples/patients for one protein.
- The key query dimension in TMA data is a tissue sample.
- TMAD lacks a commonly used ontology for describing the diagnosis [or annotations] of a given TMA sample, so such a query is not easy to perform.

5 Ontologies considered
- The NCI Thesaurus, version 05.09g
- SNOMED-CT, from UMLS 2005AA

6 Available annotations for a block
- Each donor block in the TMA has semi-structured text associated with it.
ID | Organ | Diagnosis | Subclass 1 | Subclass 2 | Subclass 3 | Subclass 4
2334 | Ovary | MMMT | | | |
3335 | Prostate | Carcinoma | Adeno | intraductal | |
7022 | Bladder | Carcinoma | Transitional cell | In situ | |
7288 | Testis | teratoma | immature | Embryonal carcinoma | |
8060 | Liver | Carcinoma | hepatocellular | No vascular invasion | HepC cirrhosis |
6662 | Soft tissue | Sarcoma | Leiomyo | epithelioid | |
6663 | lung | Sarcoma | Leiomyo | epithelioid | |
4713 | stomach | carcinoma | unknown | | |

7 Map text to ontology terms
- Make all possible permutations
- Rules to weed out bad permutations
- Check for an exact match with NCI and SNOMED-CT terms (and/or synonyms)
- Rules to weed out bad matches
Example: Prostate | Carcinoma | Adeno | intraductal gives 24 permutations:
  Prostate Carcinoma Adeno intraductal
  ...
  Carcinoma Prostate intraductal Adeno
  ...
  Adeno Carcinoma intraductal Prostate
  ...
  Prostate intraductal Adeno Carcinoma
Match: Prostate_Ductal_Adenocarcinoma
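The permutation-and-lookup steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TERM_INDEX` is a hypothetical toy dictionary standing in for the labels and synonyms of the NCI Thesaurus and SNOMED-CT, the fusing of adjacent words (so that "Adeno" + "Carcinoma" can hit "adenocarcinoma") is a guess at how such a match might arise, and the rules for weeding out bad permutations and bad matches are not modeled.

```python
from itertools import permutations

# Hypothetical toy index: normalized label or synonym -> ontology term.
# A real run would load labels and synonyms from the ontologies themselves.
TERM_INDEX = {
    "prostate ductal adenocarcinoma": "Prostate_Ductal_Adenocarcinoma",
    "prostate intraductal adenocarcinoma": "Prostate_Ductal_Adenocarcinoma",
    "leiomyosarcoma": "Leiomyosarcoma",
}


def normalize(text):
    """Lower-case and collapse whitespace so lookups are exact but case-blind."""
    return " ".join(text.lower().split())


def candidates(words):
    """Yield every ordering of the words, plus variants with one adjacent pair fused."""
    for perm in permutations(words):
        yield normalize(" ".join(perm))
        # Assumption: a prefix such as "Adeno" may fuse with the word after it.
        for i in range(len(perm) - 1):
            fused = perm[:i] + (perm[i] + perm[i + 1],) + perm[i + 2:]
            yield normalize(" ".join(fused))


def map_to_terms(words, index=TERM_INDEX):
    """Return the ontology terms whose label or synonym exactly matches a candidate."""
    return sorted({index[c] for c in candidates(words) if c in index})


prostate_words = ["Prostate", "Carcinoma", "Adeno", "intraductal"]
prostate_terms = map_to_terms(prostate_words)
sarcoma_terms = map_to_terms(["Sarcoma", "Leiomyo"])
```

With four words there are 4! = 24 orderings, which is where the figure on the slide comes from; the factorial growth is one reason rules for discarding permutations early are useful.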

8 Sample matches (from NCI-T)
ID | Organ | Diagnosis | Subclass 1 | Subclass 2 | Subclass 3 | Ontology terms
2334 | Ovary | MMMT | | | | Malignant_Mixed_Mesodermal_Mullerian_Tumor
3335 | Prostate | Carcinoma | Adeno | intraductal | | Prostate_Ductal_Adenocarcinoma
7022 | Bladder | Carcinoma | Transitional cell | In situ | | Stage_0_Transitional_Cell_Carcinoma, Transitional_Cell_Carcinoma, Bladder_Carcinoma, Carcinoma_in_situ
7288 | Testis | teratoma | immature | Embryonal carcinoma | | Immature|Teratoma, Testicular_Embryonal_Carcinoma, Immature_Teratoma
8060 | Liver | Carcinoma | hepatocellular | No vascular invasion | HepC cirrhosis | Hepatocellular_Carcinoma
6662 | Soft tissue | Sarcoma | Leiomyo | epithelioid | | Soft_Tissue_Sarcoma, Leiomyosarcoma, Epithelioid_Sarcoma
6663 | lung | Sarcoma | Leiomyo | epithelioid | | Lung_Sarcoma, Leiomyosarcoma, Epithelioid_Sarcoma
4713 | stomach | carcinoma | unknown | | | Gastric_carcinoma

9 Results and validation
- Mapped the term-sets for 8495 records, which correspond to 783 distinct term-sets.
- 577 term-sets (6614 records) matched to the NCI Thesaurus
- 365 term-sets (3465 records) matched to SNOMED-CT
- In total, mapped 6871 records (80% of the annotated records in TMAD; 641 distinct term-sets) to one or more ontology terms.
Validation
 | NCI appropriate | NCI inappropriate | SNOMED-CT appropriate | SNOMED-CT inappropriate
Set-1 | 41 | 9 | 41 | 9
Set-2 | 42 | 8 | 43 | 7
Set-3 | 46 | 4 | 38 | 12
Total | 129 | 21 | 122 | 28
Average (%) | 43.0 (86%) | 7.0 (14%) | 40.66 (81%) | 9.33 (19%)
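The summary figures on this slide can be re-derived from the reported totals. The short check below is a sketch that assumes three validation sets, as listed on the slide; the function name `summarize` is introduced here purely for illustration.

```python
# Record coverage: mapped records over all records with term-sets.
total_records = 8495
mapped_records = 6871
coverage_pct = 100 * mapped_records / total_records  # roughly 80%


def summarize(appropriate, inappropriate, n_sets=3):
    """Per-set averages and overall percentages from the 'Total' row."""
    total = appropriate + inappropriate
    return {
        "mean_appropriate": appropriate / n_sets,
        "mean_inappropriate": inappropriate / n_sets,
        "pct_appropriate": 100 * appropriate / total,
        "pct_inappropriate": 100 * inappropriate / total,
    }


nci_summary = summarize(129, 21)     # totals reported for the NCI Thesaurus
snomed_summary = summarize(122, 28)  # totals reported for SNOMED-CT
```

Both totals sum to 150, so the percentages are taken over 150 validated mappings per ontology.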

10 Browsing interface

11 Node colors in the browsing interface
- Parent and sibling nodes with data (burlywood)
- Child nodes with data (yellow)
- Child nodes with no data (grey)

12 Click on the “anchor” link to get data

13 Updates since February

14 How does ontology-based annotation help?
- Better search: we can retrieve samples of, for example, all retroperitoneal tumors or all malignant uterine neoplasms.
- Better integration of data: we can correlate gene expression with protein expression across multiple tumor types.
  - Tissue microarray data from TMAD
  - Gene expression data from GEO
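The "better search" point amounts to a subsumption query: ask for a general term and get back every sample annotated with that term or any of its descendants. The sketch below assumes a toy child-to-parents map (`PARENTS`) and toy annotations; the edges are hypothetical and are not taken from the actual NCI Thesaurus hierarchy.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: child term -> set of parent terms.
PARENTS = {
    "Bladder_Carcinoma": {"Carcinoma"},
    "Transitional_Cell_Carcinoma": {"Carcinoma"},
    "Hepatocellular_Carcinoma": {"Carcinoma"},
    "Leiomyosarcoma": {"Sarcoma"},
}

# Toy annotations: sample ID -> ontology terms assigned to it.
ANNOTATIONS = {
    7022: {"Bladder_Carcinoma", "Transitional_Cell_Carcinoma"},
    8060: {"Hepatocellular_Carcinoma"},
    6662: {"Leiomyosarcoma"},
}


def descendants(term, parents=PARENTS):
    """Return the term itself plus every term reachable below it."""
    children = defaultdict(set)
    for child, parent_set in parents.items():
        for parent in parent_set:
            children[parent].add(child)
    seen, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def samples_under(term, annotations=ANNOTATIONS, parents=PARENTS):
    """Sample IDs annotated with the term or any of its descendants."""
    wanted = descendants(term, parents)
    return sorted(s for s, terms in annotations.items() if terms & wanted)


carcinoma_samples = samples_under("Carcinoma")
sarcoma_samples = samples_under("Sarcoma")
```

The same lookup works for any dataset annotated with the same ontology, which is what makes it possible to line up TMAD samples with GEO samples by diagnosis.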

15 Integrating mRNA and protein expression
[Figure: two data matrices, Proteins x Samples and Genes x Samples]

16 Partial alignment of NCI-T and SNOMED-CT as a “bonus”

17 Steps in Alignment
- Anchor identification
  - Identify similar class labels in the ontologies to be aligned
  - Usually done by string matching
- Ontology structure
  - Use the “similar” classes as anchors and examine the local [graph] structure around them to inform the “similarity” metric
[Figure: two example ontology graphs, one with nodes Root, Term-1 to Term-5 and one with nodes R, t1 to t7]
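The two alignment steps can be sketched as follows. This is a minimal illustration, not the method used in the talk: anchors come from exact matches on normalized labels, and the structural score is one simple choice (the share of a node's neighbours whose anchors are neighbours of the candidate). The two toy ontologies and all function names are hypothetical.

```python
def normalize(label):
    """Make labels comparable: lower-case, underscores and hyphens to spaces."""
    return label.lower().replace("_", " ").replace("-", " ").strip()


def find_anchors(labels_a, labels_b):
    """Step 1: pair classes whose normalized labels are identical."""
    by_norm = {normalize(b): b for b in labels_b}
    return {a: by_norm[normalize(a)] for a in labels_a if normalize(a) in by_norm}


def neighbors(edges, node):
    """Parents and children of a node, given (child, parent) edge pairs."""
    return {p for c, p in edges if c == node} | {c for c, p in edges if p == node}


def structural_score(a, b, edges_a, edges_b, anchors):
    """Step 2: fraction of neighbours of `a` whose anchors are neighbours of `b`."""
    near_a, near_b = neighbors(edges_a, a), neighbors(edges_b, b)
    denom = max(len(near_a), len(near_b))
    if denom == 0:
        return 0.0
    shared = sum(1 for n in near_a if anchors.get(n) in near_b)
    return shared / denom


# Hypothetical toy ontologies as (child, parent) edges.
EDGES_A = {("Bladder_Carcinoma", "Carcinoma"),
           ("Transitional_Cell_Carcinoma", "Carcinoma")}
EDGES_B = {("bladder carcinoma", "carcinoma"),
           ("urothelial carcinoma", "carcinoma")}

labels_a = {n for edge in EDGES_A for n in edge}
labels_b = {n for edge in EDGES_B for n in edge}
anchors = find_anchors(labels_a, labels_b)
score = structural_score("Transitional_Cell_Carcinoma", "urothelial carcinoma",
                         EDGES_A, EDGES_B, anchors)
```

In this toy case the two unmatched classes share an anchored parent, so the structure step flags them as a candidate pair even though their labels differ.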

18 We might improve alignment …
[Figure: the two example ontology graphs (Root, Term-1 to Term-5; R, t1 to t7) with paired labels Term-2 / t1 and Term-5 / t5. Panel labels: "Ontology [graph] structure based step" and "Provide anchors from annotated data" (sample S2 linked to both t5 and Term-5)]
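One way to read "provide anchors from annotated data" is that two terms from different ontologies become a candidate anchor when they annotate the same records. The sketch below counts such co-annotations; the `min_support` threshold, the `snomed:` labels, and the toy record sets are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def co_annotation_anchors(nci_by_record, snomed_by_record, min_support=2):
    """Count (NCI term, SNOMED term) pairs that annotate the same record."""
    counts = Counter()
    for record, nci_terms in nci_by_record.items():
        for pair in product(nci_terms, snomed_by_record.get(record, ())):
            counts[pair] += 1
    # Keep only pairs seen on at least `min_support` records.
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Toy annotations keyed by record ID; the SNOMED labels are hypothetical.
NCI_BY_RECORD = {
    6662: {"Leiomyosarcoma", "Soft_Tissue_Sarcoma"},
    6663: {"Leiomyosarcoma", "Lung_Sarcoma"},
}
SNOMED_BY_RECORD = {
    6662: {"snomed:leiomyosarcoma"},
    6663: {"snomed:leiomyosarcoma"},
}

candidate_anchors = co_annotation_anchors(NCI_BY_RECORD, SNOMED_BY_RECORD)
```

Pairs that pass the threshold could then be handed to a structure-based alignment step as extra anchors, alongside those found by string matching.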

19 Better text-mapping leads to better alignment
Count | 2/17 | 7/23
Distinct terms | 783 | 791
Terms with NCI match | 577 | 620
Terms with SNOMED-CT match | 365 | 610
Terms with any match | 641 | 654
Terms with both match | 295 | 576

20 Summary Ability to map word-groups to ontology terms

21 Credits and acknowledgements
- Pathology: Robert Marinelli, Matt van de Rijn
- Medical Informatics: Kaustubh Supekar, Daniel Rubin, Mark Musen
- Funding: NIH

