A Visual Analytics Approach to Augmenting Formal Concepts with Relational Background Knowledge in a Biological Domain. 7th December 2010. Elma Akand*, Mike.


1 A Visual Analytics Approach to Augmenting Formal Concepts with Relational Background Knowledge in a Biological Domain
7th December 2010
Elma Akand*, Mike Bain, Mark Temple
*CSE, UNSW / School of Biomedical and Health Sciences, UWS
The Sixth Australasian Ontology Workshop, Adelaide University of South Australia

2 Outline
• Machine learning and data mining in bioinformatics
• Domain ontologies in biomedical applications
• Formal Concept Analysis
• MCW algorithm (Mining Closed itemsets for Web apps)
• BioLattice: a web-based browser
• Experimental application: systems biology
  - Part 1: Concept ranking by gene interaction
  - Part 2: Relational learning of multiple-stress rules

3 Machine learning & data mining in bioinformatics
• Bioinformatics: "Bioinformatics is the study of information content and information flow in biological systems and processes" (Michael Liebman, 1995)
• Machine learning & data mining
  - Can offer automatic knowledge acquisition
  - Discover knowledge by analyzing data from different perspectives, and can contribute greatly to building a knowledge base
• Our work: focus on knowledge-based machine learning
  - Previous work: learning from ontologies
  - Current work: ontology construction by learning
  - Potential application areas: ontologies are central to eCommerce and eHealth
  - Current application area: systems biology (predict gene function, data integration)

4 Ontology
• In philosophy: concerned with the nature and relations of being
• In knowledge representation: the study of the categorization of things
  - Informal ontology: natural language
  - Formal ontology: first-order logic or a variant
  - Upper ontology: general
  - Domain ontology: specific
• Ontology: "specification of a conceptualization" (Gruber, 1993)
• Conceptualization: "formalization of knowledge in declarative form" (Genesereth and Nilsson, 1987)

5 Gene Ontology
• Missing concepts and relations
• One gene may be annotated with different GO terms, one of which is a specialization of the other
• Example (from the diagram): gene x; concepts a, b; relations (i) x → a, (ii) x → b and (iii) b → a

6 Formal Concept Analysis (FCA)
• Based on mathematical order theory (Rudolf Wille, early 1980s)
  - Derives conceptual structures out of data
  - Method for data analysis, knowledge representation and information management
• Components
  - Formal context, concept, concept lattice

Example formal context:

            four-legged  hair-covered  intelligent  marine  thumbed
  cats           x            x
  dogs           x            x
  dolphins                                  x          x
  gibbons                     x             x                   x
  humans                                    x                   x
  whales                                    x          x

7 Formal concepts in a concept lattice
Concepts (extent, intent) in the lattice diagram, whose nodes are labelled Top, Bottom and 1 to 6:
• ({cats, gibbons, dogs, dolphins, humans, whales}, ∅)
• ({gibbons, dolphins, humans, whales}, {intelligent})
• ({dolphins, whales}, {intelligent, marine})
• ({cats, gibbons, dogs}, {hair-covered})
• ({cats, dogs}, {hair-covered, four-legged})
• ({gibbons, humans}, {intelligent, thumbed})
• ({gibbons}, {intelligent, hair-covered, thumbed})
• (∅, {intelligent, hair-covered, thumbed, marine, four-legged})

• Formal context: an n by m Boolean matrix
  - m attributes A (columns)
  - n objects O (rows)
• Formal concept: defined by a Galois connection, where X is a subset of A and Y is a subset of O
• Concept lattice loosely interpretable in ontology terms:
  - concept definitions and sub-concept relations (cf. T-box)
  - concept membership by objects (cf. A-box)
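As a rough illustration of how the eight concepts above follow from the animal context, the following Python sketch closes every subset of objects under the two derivation operators. It is a brute-force enumeration for a toy context only, not the MCW algorithm from the talk, and the function names (`common_attributes`, `objects_having`, `formal_concepts`) are hypothetical.

```python
from itertools import combinations

# Formal context from the slides: object -> set of attributes it has.
CONTEXT = {
    "cats": {"four-legged", "hair-covered"},
    "dogs": {"four-legged", "hair-covered"},
    "dolphins": {"intelligent", "marine"},
    "gibbons": {"hair-covered", "intelligent", "thumbed"},
    "humans": {"intelligent", "thumbed"},
    "whales": {"intelligent", "marine"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def common_attributes(objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    result = set(ATTRIBUTES)
    for obj in objects:
        result &= CONTEXT[obj]
    return frozenset(result)


def objects_having(attributes):
    """Objects that have every attribute in `attributes`."""
    return frozenset(o for o, attrs in CONTEXT.items() if attributes <= attrs)


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every object subset."""
    concepts = set()
    names = sorted(CONTEXT)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(subset)
            extent = objects_having(intent)
            concepts.add((extent, intent))
    return concepts


if __name__ == "__main__":
    # Print concepts from the largest extent to the smallest.
    for extent, intent in sorted(formal_concepts(), key=lambda c: -len(c[0])):
        print(sorted(extent), sorted(intent))
```

Enumerating all object subsets is exponential in the number of objects, which is one reason the talk turns to closed itemset mining for larger data.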

8 FCA in data mining
• FCA can be seen as a clustering technique in machine learning
  - Most of the work is in a propositional framework
• In data mining, closed itemset mining is an efficient alternative to FCA
  - A frequent itemset X is closed if there exists no proper superset Y ⊃ X with support(Y) = support(X)
  - E.g., if X = {a,b,c,d}, Y = {a,b,c,d,e} and support(Y) = support(X), then X is not closed
• Parameters to avoid building the entire lattice
  - Extent size must be greater than minsup
• Existing closed itemset mining algorithms
  - Use data structures to speed up closed itemset mining
  - But may not build the lattice, or include extents

9 MCW algorithm (Mining Closed itemsets for Web apps)
• Vertical data format
• IT-tree (itemset-tidset tree) search space
  - each node holds X × t(X), and all its children have prefix X
• Pruning
  - 4 set difference closure operators
• Subsumption check
  - A look-up table to record all attributes and their occurrences in closed concepts
• Lattice
  - adding concepts following a general-to-specific order

Vertical database (item: tidset):
  D: 2 4 5 6
  A: 1 3 4 5
  C: 1 2 3 4 5 6
  T: 1 3 5 6
  W: 1 2 3 4 5

Look-up table:
  attribute   Concept_id
  D           C1, C2
  T           C3, C4
  A           C4, C5
  W           C2, C4, C5, C6
  C           C1, C2, C3, C4, C5, C6, C7

Is {TA}{135} closed? No: i(135) = {TAWC}
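The closedness question on this slide can be checked mechanically from the vertical database. The Python sketch below is a brute-force illustration, not the MCW implementation: the helper names `t`, `i`, `closure`, `closed_itemsets` and `lookup_table` are hypothetical, and the threshold `minsup = 2` (extent size strictly greater than 2) is an assumption, chosen because it yields seven closed itemsets, the same number as concept ids in the look-up table.

```python
from itertools import combinations

# Vertical database from the slide: item -> tidset (transaction ids).
TIDSETS = {
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "D": {2, 4, 5, 6},
    "T": {1, 3, 5, 6},
    "W": {1, 2, 3, 4, 5},
}
ALL_TIDS = frozenset().union(*TIDSETS.values())


def t(itemset):
    """Tidset of an itemset: transactions that contain every item."""
    tids = set(ALL_TIDS)
    for item in itemset:
        tids &= TIDSETS[item]
    return frozenset(tids)


def i(tids):
    """Itemset of a tidset: items present in every listed transaction."""
    return frozenset(item for item, ts in TIDSETS.items() if set(tids) <= ts)


def closure(itemset):
    """Closure of an itemset: i(t(X))."""
    return i(t(itemset))


def is_closed(itemset):
    return closure(itemset) == frozenset(itemset)


def closed_itemsets(minsup):
    """Brute force: close every item subset whose support exceeds minsup."""
    found = set()
    items = sorted(TIDSETS)
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            if len(t(subset)) > minsup:
                found.add(closure(subset))
    return found


def lookup_table(concepts):
    """Number of closed concepts in which each attribute occurs."""
    return {item: sum(item in c for c in concepts) for item in sorted(TIDSETS)}


if __name__ == "__main__":
    print(sorted(t("TA")), sorted(closure("TA")), is_closed("TA"))
    print(lookup_table(closed_itemsets(minsup=2)))
```

Under that assumed threshold, the per-attribute counts agree with the number of concept ids listed for each attribute in the look-up table (2 for D, T and A, 4 for W, 7 for C).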

10 Closure operators
Based on CHARM (Zaki, 2005)
• {TA}{135} = {TW}{135} -> {TAW}{135}
• {D}{2456} ⊂ {C}{123456} -> {DC}{2456}
• {D}{2456} and {W}{12345} -> {DW}{245}

Vertical database (item: tidset):
  D: 2 4 5 6
  A: 1 3 4 5
  C: 1 2 3 4 5 6
  T: 1 3 5 6
  W: 1 2 3 4 5
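The three worked examples above are instances of the tidset comparisons that CHARM-style search uses when combining two itemsets: equal tidsets, one tidset strictly containing the other, or incomparable tidsets. The following Python sketch only classifies a pair and reports the combined itemset with its tidset; it does not reproduce the tree updates or pruning of CHARM or MCW, and the function names `tidset` and `compare` are hypothetical.

```python
# Vertical database from the slide: item -> tidset (transaction ids).
TIDSETS = {
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "D": {2, 4, 5, 6},
    "T": {1, 3, 5, 6},
    "W": {1, 2, 3, 4, 5},
}


def tidset(items):
    """Intersection of the tidsets of all items (items must be non-empty)."""
    tids = None
    for item in items:
        tids = set(TIDSETS[item]) if tids is None else tids & TIDSETS[item]
    return frozenset(tids)


def compare(x, y):
    """Classify how t(x) relates to t(y) and return the combined node.

    Returns (relation, x union y, t(x) intersect t(y)).
    """
    x, y = frozenset(x), frozenset(y)
    tx, ty = tidset(x), tidset(y)
    if tx == ty:
        relation = "equal"         # x and y always occur together
    elif tx < ty:
        relation = "subset"        # wherever x occurs, y occurs too
    elif tx > ty:
        relation = "superset"      # wherever y occurs, x occurs too
    else:
        relation = "incomparable"  # neither implies the other
    return relation, x | y, tx & ty


if __name__ == "__main__":
    for left, right in [("TA", "TW"), ("D", "C"), ("D", "W")]:
        relation, items, tids = compare(left, right)
        print(left, right, relation, sorted(items), sorted(tids))
```

In the first two cases the combined itemset has the same support as `x`, so `x` on its own cannot be closed; in the incomparable case the combination is a new candidate with a smaller tidset.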

11 Concept lattice as a visual analytics approach
• Visual analytics
  - combination of information visualization with machine learning and data analysis (Keim et al., 2008)
• Visualization of concept lattice
  - provides an overview of the structure of the domain
  - means for further data analysis, e.g., classification, clustering, implication discovery, rule learning
• Previous work
  - lattice navigation since Godin et al. (1993)
  - browsable concept lattice, e.g., Kim & Compton (2004)
• Our current work
  - augmenting the concept lattice by integrating multiple sources of knowledge (Gene Ontology, protein interactions) for further analysis & machine learning

12 Case study: Yeast systems biology

13 Browsable concept lattice
[Figure: lattice browser; the only recoverable label is "more general"]

14 Biological validation (1): synthetic lethality
• Two genes A and B have a synthetic lethal interaction if the cell is viable when either A or B is deleted individually, but cannot grow when both are deleted.
• Our results show that 72 (119) concepts in the lattice are more likely than random chance, at p < 0.01 (p < 0.05), to contain synthetic lethal pairs.

15 Biological validation (2): ILP learning of concept definitions
• Input data
  - Protein-protein interaction data
  - Microarray gene-expression data
  - Transcription factor binding data (ChIP-chip)
  - Ontology data
  - Biochemical pathway data
• Inductive Logic Programming
• First-order rule:
  concept(A) :- ppi(B,A,C), ppi(B,A,E), ppi(B,C,E), tfbinds(D,C), tfbinds(F,E)

16 Example rule
RSM19 is required for H2O2 response; RSM19, RSM22 and MRPS17 are in the "mitochondrial ribosomal small subunit" stable complex; and RSM22 and MRPS17 are bound by transcription factors under amino acid starvation.
[Figure label: Transcription factors]

17 Conclusions
• Many real-world domains are data-intensive
• Machine learning and data mining applications are required to generate predictive and useful outputs
• We focus on knowledge-based learning for comprehensibility: use ontologies
• Formal concept analysis as a framework for ontology structure
• Use data mining techniques for efficient concept lattice generation
• Visual analytics approach: browsable lattice, added background knowledge
• Initial validation on a case study from yeast systems biology

18 Future work
• Investigate pseudo-intents to simplify the concept lattice
• Investigate variants of concept lattice structures
  - e.g., concept lattice of the inverse context
• Add concept definitions to background knowledge in ILP

