BeeSpace Informatics Research: From Information Access to Knowledge Discovery ChengXiang Zhai Nov. 14, 2007.


1 BeeSpace Informatics Research: From Information Access to Knowledge Discovery ChengXiang Zhai Nov. 14, 2007

2 BeeSpace Technology: From V3 to V4
[Diagram labels: Literature Search & Navigation; Query; Docs; Function Analysis; Genes; Function; Entities; Relations; ER Graph Mining; Question; Answers; Knowledge Base; Inference Engine; Expert Knowledge]

3 New Functions in V4
Massive Entity/Relation Extraction
Graph Indexing and Mining
Integration of Expert Knowledge & Reasoning
Personalization & Info/Knowledge Sharing
“Plug and Play” (PnP)

4 Massive Entity Recognition
Class 1: Small Variation (Dictionary/Ontology)
–Organism, Anatomy, Biological Process, Pathway, Protein Family
Class 2: Medium Variation
–Gene, cis Regulatory Element
Class 3: Large Variation
–Phenotype, Behavior
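For the Class 1 (small variation) entity types, a dictionary lookup is the natural starting point. The following is a minimal sketch of such a tagger, not the BeeSpace implementation: the mini-dictionary, the example sentence, and the names `build_matchers` and `tag_entities` are all illustrative assumptions.

```python
import re

# Hypothetical mini-dictionary; a real system would load an ontology or lexicon.
DICTIONARY = {
    "Organism": ["Apis mellifera", "Drosophila melanogaster"],
    "Anatomy": ["mushroom body", "antenna"],
    "Biological Process": ["foraging", "olfactory learning"],
}

def build_matchers(dictionary):
    """Compile one case-insensitive regex per entity type, longest terms first."""
    matchers = {}
    for etype, terms in dictionary.items():
        ordered = sorted(terms, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
        matchers[etype] = re.compile(pattern, re.IGNORECASE)
    return matchers

def tag_entities(text, matchers):
    """Return (start, end, type, surface form) tuples sorted by position."""
    hits = []
    for etype, rx in matchers.items():
        for m in rx.finditer(text):
            hits.append((m.start(), m.end(), etype, m.group(0)))
    return sorted(hits)

MATCHERS = build_matchers(DICTIONARY)
sentence = "Foraging in Apis mellifera depends on the mushroom body."
entities = tag_entities(sentence, MATCHERS)
for start, end, etype, surface in entities:
    print(f"{start:>3}-{end:<3} {etype:<20} {surface}")
```

Exact matching of this kind is unlikely to be enough for Class 2 and Class 3 entities, whose surface forms vary more; those would presumably need learned or pattern-based recognizers.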

5 Massive Relation Extraction
Expression Location
–the expression of a gene in some location (tissues, body parts)
Homology/Orthology
–one gene is homologous to another gene
Biological process
–one gene has some role in a biological process
Genetic/Physical/Regulatory Interaction
–one gene interacts with another gene in a certain fashion (3 types of relations)
–a simple case: Protein-Protein Interaction (PPI)

6 Entity Relation Graph Mining
The extracted entities and relations form a weighted graph
Need to develop techniques to mine the graph for knowledge
–Store graphs
–Index graphs
–Mining algorithms (neighbor finding, path finding, entity comparison, outlier detection, frequent subgraphs, …)
–Mining language

7 Integration of Expert Knowledge
How can we combine expert knowledge with knowledge extracted from literature?
Possible strategies:
–Interactive mining (human knowledge is used to guide the next step of mining)
–Trainable programs (focused miners, each targeting a certain kind of knowledge)
–Inference-based integration

8 Inference-Based Discovery
Encode all kinds of knowledge in the same knowledge representation language
Perform logic inferences
Example
–Regulate(GeneA, GeneB, ContextC). [Literature mining]
–SeqSimilar(GeneA, GeneA’) [Sequence mining]
–Regulate(X,Y,C) ← Regulate(Z,Y,C) & SeqSimilar(X,Z) [Human knowledge]
–⇒ Regulate(GeneA’, GeneB, ContextC)
–ADD: InPathway(GeneB, P1)
–InPathway(X,P) ← Regulate(X,Y,C) & InPathway(Y,P) [Human knowledge]
–⇒ InvolvedInPathway(GeneA’, P1)
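The example on this slide can be reproduced with a small forward-chaining loop. This is a sketch under stated assumptions, not the BeeSpace inference engine: rules are read as "head if body", variables are written with a leading `?`, GeneA’ is spelled with an ASCII apostrophe, and a symmetry rule for SeqSimilar is added because the slide's derivation appears to need it. The derived pathway fact uses the rule head's predicate name, InPathway, rather than InvolvedInPathway.

```python
def is_var(term):
    """Variables are strings with a leading '?' (a convention of this sketch)."""
    return term.startswith("?")

def unify(pattern, fact, bindings):
    """Match one pattern against one ground fact; return extended bindings or None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    new = dict(bindings)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new

def match_body(body, facts, bindings):
    """Yield every variable binding that satisfies all atoms in the rule body."""
    if not body:
        yield bindings
        return
    for fact in facts:
        b = unify(body[0], fact, bindings)
        if b is not None:
            yield from match_body(body[1:], facts, b)

def forward_chain(facts, rules):
    """Apply all rules repeatedly until no new fact can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # Exhaust the generator before mutating the fact set.
            for b in list(match_body(body, facts, {})):
                derived = (head[0],) + tuple(b.get(t, t) for t in head[1:])
                if derived not in facts:
                    facts.add(derived)
                    changed = True
    return facts

FACTS = [
    ("Regulate", "GeneA", "GeneB", "ContextC"),  # literature mining
    ("SeqSimilar", "GeneA", "GeneA'"),           # sequence mining
    ("InPathway", "GeneB", "P1"),                # the "ADD" step on the slide
]
RULES = [
    # Assumption of this sketch: sequence similarity is symmetric.
    (("SeqSimilar", "?X", "?Y"), [("SeqSimilar", "?Y", "?X")]),
    (("Regulate", "?X", "?Y", "?C"),
     [("Regulate", "?Z", "?Y", "?C"), ("SeqSimilar", "?X", "?Z")]),
    (("InPathway", "?X", "?P"),
     [("Regulate", "?X", "?Y", "?C"), ("InPathway", "?Y", "?P")]),
]

kb = forward_chain(FACTS, RULES)
for fact in sorted(kb - set(FACTS)):
    print(fact)
```

The loop terminates because the rules introduce no new constants, so only finitely many facts can be derived. It does not handle the uncertainty that a real engine over mined facts would have to model.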

9 Personalization & Workflow Management
Different users have different tasks → personalization
–Tracking a user’s history and learning a user’s preferences
–Exploiting the preferences to customize/optimize the support
–Allowing a user to define/build special function modules
Workflow management

10 Information/Knowledge Sharing
Different users may perform similar tasks → Information/Knowledge sharing
–Capturing user intentions
–Recommending information/knowledge
–How do we solve the problem of privacy?
Massive collaborations?
–Each user contributes a small amount of knowledge
–All the knowledge can be combined to infer new knowledge

11 Plug and Play
Users’ tasks vary significantly
Need flexible combinations of basic modules
Need to move toward a “discovery workbench”
–How do we design basic modules?
–How do we support synthesis of information and knowledge?

12 BeeSpace V4
[Diagram labels: Literature Search & Navigation; Text Mining; Entities; Relations; ER Graph Mining; Knowledge Base; Inference Engine; Expert Knowledge; Vertical Search Services; PnP Function Analyzers; Customized Knowledge Base; User]

13 Discussion Task Model? PnP Modules? Massive Collaboration?

14 BeeSpace V4: System Architecture
[Diagram labels: Literature Search & Navigation; Entities; Relations; ER Graph Mining; Machine Learning; NLP; Expert Knowledge; Special Search; PnP Function Analyzers; User; Information Extraction; User Modeling & Personalization; Topic Modeling; NCBI Genome Databases …; Hypothesis; Knowledge Base; Inference Engine; User Interface/Workflow Manager]

15 BeeSpace V4: System Architecture
[Same diagram as slide 14, with components annotated by team member: Yue, Peixiang, Xin, Xu, Moushumi, Yuanhua]

16 Modules
Navigation & Search (Improve V3) [Yuanhua]
Information Extraction [Yue]
ER Graph Mining [Peixiang]
Specialized Search [Xu]
Function Analyzers [Xin]
User Modeling, Personalization, Workflow [Yuanhua]
Inference Engine [Yue]

17 Informatics Research Themes
Specialized Search
–Hypothesis search
Information Extraction
–Entities, relations
Graph Mining
–Indexing, query language, mining algorithms
Function analyzers
–Gene set annotator
Personalization
–User model
Inference engine
–Knowledge representation language, uncertainty

18 Example of Interactive Graph Mining
[Diagram: graph over genes A1, A2, A3, A4, A5, A1’, A4’ and behaviors B1, B2, B3, B4, with edges labeled isa, Co-occur-fly, Co-occur-mos, Co-occur-bee, Orth-mos, orth, Reg]
1. X=NeighborOf(B4, Behavior, {co-occur, isa}) → {B1,B2,B3}
2. Y=NeighborOf(X, Gene, {co-occur, orth}) → {A1,A1’,A2,A3}
3. Y=Y + {A5, A6} → {A1,A1’,A2,A3,A5,A6}
4. Z=NeighborOf(Y, Gene, {reg}) → {A4,A4’}
X=PathBetween({A4,A4’}, B4, {co-occur, reg, isa})
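The mining steps on this slide can be sketched as two operators over a typed, edge-labeled graph. The slide's diagram does not survive in the transcript, so the edge list below is hypothetical, chosen only so that each step returns the sets shown on the slide. Edge labels are simplified to co-occur, orth, reg and isa, and the function names `neighbor_of` and `path_between` are assumptions standing in for NeighborOf and PathBetween.

```python
from collections import defaultdict, deque

# Node types; A6 is added by hand in step 3 and has no edges in this toy graph.
NODE_TYPE = {n: "Behavior" for n in ("B1", "B2", "B3", "B4")}
NODE_TYPE.update({n: "Gene" for n in
                  ("A1", "A1'", "A2", "A3", "A4", "A4'", "A5", "A6")})

# Hypothetical undirected, labeled edges (not recovered from the slide).
EDGES = [
    ("B4", "isa", "B1"), ("B4", "co-occur", "B2"), ("B4", "isa", "B3"),
    ("B1", "co-occur", "A1"), ("B1", "co-occur", "A1'"),
    ("B2", "co-occur", "A2"), ("B3", "co-occur", "A3"),
    ("A1", "orth", "A1'"),
    ("A1", "reg", "A4"), ("A1'", "reg", "A4'"), ("A5", "reg", "A4"),
]

adj = defaultdict(set)
for u, label, v in EDGES:
    adj[u].add((label, v))
    adj[v].add((label, u))

def neighbor_of(sources, node_type, labels):
    """One-hop neighbors of `sources` with the given type, via allowed edge labels."""
    if isinstance(sources, str):
        sources = {sources}
    return {v for s in sources for (label, v) in adj.get(s, ())
            if label in labels and NODE_TYPE.get(v) == node_type}

def path_between(sources, target, labels):
    """Breadth-first search: one shortest path from each source to `target`."""
    paths = {}
    for src in sorted(sources):
        parent = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == target:
                break
            for label, nxt in sorted(adj.get(node, ())):
                if label in labels and nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        if target in parent:
            path, node = [], target
            while node is not None:
                path.append(node)
                node = parent[node]
            paths[src] = path[::-1]
    return paths

X = neighbor_of("B4", "Behavior", {"co-occur", "isa"})
Y = neighbor_of(X, "Gene", {"co-occur", "orth"})
Y = Y | {"A5", "A6"}                      # user adds genes interactively
Z = neighbor_of(Y, "Gene", {"reg"})
paths = path_between(Z, "B4", {"co-occur", "reg", "isa"})
print(sorted(X), sorted(Y), sorted(Z), paths, sep="\n")
```

Step 3 is where the interaction comes in: the user edits an intermediate result by hand before mining continues. The sketch ignores edge weights and the per-organism edge labels (fly, mos, bee) that appear on the slide.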

19 Inference-Based Discovery
Encode all kinds of knowledge in the same knowledge representation language
Perform logic inferences
Example
–Regulate(GeneA, GeneB, ContextC). [Literature mining]
–SeqSimilar(GeneA, GeneA’) [Sequence mining]
–Regulate(X,Y,C) ← Regulate(Z,Y,C) & SeqSimilar(X,Z) [Human knowledge]
–⇒ Regulate(GeneA’, GeneB, ContextC)
–ADD: InPathway(GeneB, P1)
–InPathway(X,P) ← Regulate(X,Y,C) & InPathway(Y,P) [Human knowledge]
–⇒ InvolvedInPathway(GeneA’, P1)

20 PnP Function Analyzers
Basic objects
–GeneSet, DocSet, SentSet, TermSet
Basic operators
–Gene summarizer
–GeneSet annotator
–…

21 EntitySet GeneSet BehaviorSet … Doc/SentSet ModelOrg …
Splitter Filter/Attractor Converter …
GeneSearch: GeneSet → Doc/SentSet
DocSplitter: Doc/SentSet → {Set1, …, Setk}
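The two operator signatures on this slide suggest that PnP modules compose by matching output and input types. The sketch below illustrates that idea under stated assumptions: the `Doc` record, the toy corpus, the placeholder gene names, substring matching as the search method, and the `pipeline` helper are all hypothetical and do not come from the BeeSpace system.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List, NamedTuple

class Doc(NamedTuple):
    doc_id: str
    organism: str
    text: str

GeneSet = FrozenSet[str]
DocSet = List[Doc]

# Hypothetical toy corpus with placeholder gene names.
CORPUS: DocSet = [
    Doc("d1", "bee", "geneA expression rises in foragers."),
    Doc("d2", "fly", "geneA and geneB interact in the fly brain."),
    Doc("d3", "bee", "geneC shows no change."),
]

def gene_search(genes: GeneSet, corpus: DocSet = CORPUS) -> DocSet:
    """GeneSearch: GeneSet -> DocSet (naive case-insensitive substring match)."""
    return [d for d in corpus
            if any(g.lower() in d.text.lower() for g in genes)]

def doc_splitter(docs: DocSet, key: Callable[[Doc], str]) -> Dict[str, DocSet]:
    """DocSplitter: DocSet -> {Set1, ..., Setk}, one subset per key value."""
    groups = defaultdict(list)
    for d in docs:
        groups[key(d)].append(d)
    return dict(groups)

def pipeline(*steps):
    """Chain operators left to right; each step consumes the previous output."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

find_and_split = pipeline(
    gene_search,
    lambda docs: doc_splitter(docs, key=lambda d: d.organism),
)
groups = find_and_split(frozenset({"geneA"}))
print({org: [d.doc_id for d in docs] for org, docs in groups.items()})
```

Keeping every operator a plain function over a few shared object types is one way a "discovery workbench" could let users recombine modules without new glue code.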

