1
The LINDI Project: Linking Information for New Discoveries
Two Main Thrusts:
• UIs for building and reusing hypothesis-seeking strategies
• Statistical language analysis techniques for extracting propositions
2
LINDI: Target Components
1. Special UI for retrieving appropriate docs
2. Language analysis on docs to detect causal relationships between concepts
3. Probabilistic representation of concepts and relationships
4. UI + User: Hypothesis creation
3
Design Goals of LINDI UI
Support for the development of extended search strategies
1. Text filtering and manipulation tool to help the development of strategies
2. Text visualization and analysis tool to help the formulation of hypotheses
4
The User Interface
• A general search interface should support
  – History
  – Context
  – Comparison
  – Operators: Intersection, Union, Slicing
  – Operator Reuse
  – Visualization (where appropriate)
• We have an initial implementation
• It needs lots of work
5
Scenario: Explore Functions of a Gene
• Objective
  – Determine the functions of a newly sequenced Gene X.
• Known facts
  – Gene X is co-expressed (activated in the same cell) with Genes A, B, and C
  – The relationship of Genes A, B, and C with certain types of diseases (from medical literature)
• Question
  – What types of diseases is Gene X related to?
6
Explore Functions of New Gene X
[Diagram. Labels: Query, Medical Literature, Projection, Gene-A Keywords, Gene-B Keywords, Gene-C Keywords, Intersection, Keywords, Mapping, Slicing, Possible Function for Gene-X]
Slide adapted from K. Patel
8
Architecture of LINDI UI
• Data Layer
• Annotation Layer
• User Interface Layer
9
Data Layer
• Purpose
  – Hide different formats of text collections
• Components
  – Data: abstractions representing records of a text collection
  – Operations: performed on the data
• Data
  – A set of records
  – Each record is a set of tuples with types
• Operations
  – union, intersection, projection, mapping
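The data model described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LINDI implementation: the names `make_record`, `union`, `intersection`, `projection`, and `mapping` are hypothetical, a typed tuple is assumed to be a `(type, value)` pair, and `mapping` is assumed to mean "apply a function to every record".

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple, TypeVar

# Assumption: a typed tuple is a (type, value) pair; a record is a set of them.
TypedTuple = Tuple[str, str]
Record = FrozenSet[TypedTuple]
DataSet = Set[Record]
T = TypeVar("T")


def make_record(pairs: Iterable[TypedTuple]) -> Record:
    """Build an immutable (hashable) record from (type, value) pairs."""
    return frozenset(pairs)


def union(a: DataSet, b: DataSet) -> DataSet:
    """Records that appear in either data set."""
    return a | b


def intersection(a: DataSet, b: DataSet) -> DataSet:
    """Records that appear in both data sets."""
    return a & b


def projection(data: DataSet, keep_types: Set[str]) -> DataSet:
    """Keep only tuples whose type is in keep_types; drop records left empty."""
    projected: DataSet = set()
    for record in data:
        kept = frozenset(t for t in record if t[0] in keep_types)
        if kept:
            projected.add(kept)
    return projected


def mapping(data: DataSet, fn: Callable[[Record], T]) -> List[T]:
    """Apply fn to every record (result order follows set iteration order)."""
    return [fn(record) for record in data]
```

Because records are frozen sets, two records with the same typed tuples compare equal, so `union` and `intersection` reduce to ordinary set operations regardless of the original collection format.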
10
Annotation Layer
• Purpose
  – Associate data sets with the operations that produced them (history)
  – History is a first-class object
• Advantage
  – Streamline a sequence of operations
  – Reuse operations
  – Parameterize operations
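One way to read "history is a first-class object" is that a sequence of operations is itself a value that can be stored, rerun, and rerun with different parameters. The sketch below illustrates that idea only; the classes `Step` and `History` and the helpers `keep_matching` and `take_first` are hypothetical and are not taken from the LINDI code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class Step:
    """One recorded operation: a label, the function, and its parameters."""
    name: str
    fn: Callable[..., Any]
    params: Tuple[Any, ...] = ()


@dataclass
class History:
    """An ordered list of steps that can be extended and replayed."""
    steps: List[Step] = field(default_factory=list)

    def then(self, name: str, fn: Callable[..., Any], *params: Any) -> "History":
        # Return a new History so earlier histories stay reusable.
        return History(self.steps + [Step(name, fn, params)])

    def run(self, data: Any, **overrides: Tuple[Any, ...]) -> Any:
        # Replay every step; a keyword named after a step replaces its parameters.
        for step in self.steps:
            params = overrides.get(step.name, step.params)
            data = step.fn(data, *params)
        return data


def keep_matching(docs: List[str], term: str) -> List[str]:
    """Toy 'query': keep documents that contain the term."""
    return [d for d in docs if term in d]


def take_first(docs: List[str], n: int) -> List[str]:
    """Toy 'slice': keep the first n documents."""
    return docs[:n]


# Record a two-step strategy once, then reuse it.
strategy = History().then("query", keep_matching, "GA").then("slice", take_first, 2)
```

Calling `strategy.run(docs)` replays the recorded steps as they were defined, while `strategy.run(docs, query=("GB",))` reruns the same sequence with a different query parameter, which is the reuse and parameterization the slide refers to.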
11
User Interface
• This version completed Aug 10, 2000
  – Designed by Marti Hearst and Hao Chen
  – Code written by Hao Chen
• Direct manipulation of information objects and access operations
  – Query
  – Intersection
  – Union
  – Mapping
  – Slicing
• Record and reuse of past operations
• Parameterization of operations
• Streamlining of operations
12
Initial Palette
13
Query Structure Determined by Collection Type
14
Query Operation Results
15
Projection Operation and Subsequent Results
16
Parameterized Query: Repeat operations with different values
[Figure labels: GC, GB, GA]
17
Intersection over Projected Attribute
19
Example Interaction with UI Prototype
1. Query on gene names
2. Project out only MeSH headings
3. Intersect the results
4. Map to create a ranking
5. Slice out the top-ranked
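The five steps above can be sketched end to end on a toy corpus. Everything here is an illustrative assumption: the corpus, the gene labels, the function names, and in particular the ranking rule (count of retrieved documents that carry each shared heading), which the slides do not specify.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Doc = Dict[str, Set[str]]

# Hypothetical toy corpus: each document lists gene names and MeSH headings.
CORPUS: List[Doc] = [
    {"genes": {"GA"}, "mesh": {"Neoplasms", "Apoptosis"}},
    {"genes": {"GB"}, "mesh": {"Neoplasms", "Inflammation"}},
    {"genes": {"GC"}, "mesh": {"Neoplasms", "Apoptosis"}},
    {"genes": {"GA", "GB"}, "mesh": {"Apoptosis"}},
    {"genes": {"GC"}, "mesh": {"Neoplasms"}},
]


def query(corpus: List[Doc], gene: str) -> List[Doc]:
    """Step 1: retrieve documents that mention the gene."""
    return [d for d in corpus if gene in d["genes"]]


def project(docs: List[Doc], attribute: str) -> Set[str]:
    """Step 2: keep only one attribute (here, the MeSH headings)."""
    return set().union(*(d[attribute] for d in docs))


def intersect(heading_sets: List[Set[str]]) -> Set[str]:
    """Step 3: headings shared by every gene's result."""
    return set.intersection(*heading_sets)


def map_rank(headings: Set[str], docs: List[Doc]) -> List[Tuple[str, int]]:
    """Step 4 (assumed rule): rank headings by how many documents carry them."""
    counts = Counter(h for d in docs for h in d["mesh"] if h in headings)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def slice_top(ranking: List[Tuple[str, int]], k: int) -> List[Tuple[str, int]]:
    """Step 5: keep the k top-ranked entries."""
    return ranking[:k]


def explore(corpus: List[Doc], genes: List[str], k: int) -> List[Tuple[str, int]]:
    """Run the whole strategy for a list of co-expressed genes."""
    projected = [project(query(corpus, g), "mesh") for g in genes]
    shared = intersect(projected)
    retrieved = [d for d in corpus if d["genes"] & set(genes)]
    return slice_top(map_rank(shared, retrieved), k)
```

In the Gene X scenario, the genes passed to `explore` would be the co-expressed Genes A, B, and C, and the top-ranked shared headings would serve as candidate functions for Gene X to examine further.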
20
Second Version of UI
• LINDI Miner
• Circa May 2002
  – Designed by Marti Hearst
  – Implemented by Melody Ivory
• Emphasize reusing results of prior text analysis
• See lindi-miner.ppt
21
The Language Analysis Component
• Goal: Extract Propositions from Text and Make Inferences
• Why Extract Propositions from Text?
  – Text is how knowledge at the propositional level is communicated
  – Text is continually being created and updated by the outside world
22
Example: Etiology
• Given
  – medical titles and abstracts
  – a problem (incurable rare disease)
  – some medical expertise
• Find causal links among titles
  – symptoms
  – drugs
  – results
23
Traditional Semantic Grammars
• Example (Burton & Brown 79)
  – Interpreting “What is the current thru the CC when the VC is 1.0?”
  – Grammar rules (non-terminal names lost in extraction, shown as …):
      … := … when …
      … := what is …
      … := …
      … := … is …
      … := VC
  – Resulting semantic form is: (RESETCONTROL (STQ VC 1.0) (MEASURE CURRENT CC))
24
Example: Statistical Semantic Grammar
• To detect causal relationships between medical concepts
  – Title: Magnesium deficiency implicated in increased stress levels.
  – Interpretation: related-to
  – Inference:
    » Increase(stress, decrease(mg))
25
Statistical Semantic Grammars
• Empirical NLP has made great strides
  – But mainly applied to syntactic structure
• Semantic grammars are powerful, but
  – Brittle
  – Time-consuming to construct
• Idea:
  – Use what we now know about statistical NLP to build up a probabilistic grammar
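The slides do not say how the probabilistic grammar would be estimated. As a minimal sketch of one standard option, rule probabilities can be set by relative frequency: count how often each rule is observed in annotated examples and normalize per left-hand side. The function name `estimate_rule_probs` and the rule labels in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# A rule is (left-hand side, right-hand side), e.g.
# ("CAUSAL", ("CONCEPT", "implicated in", "CONCEPT")).
Rule = Tuple[str, Tuple[str, ...]]


def estimate_rule_probs(observed_rules: Iterable[Rule]) -> Dict[Rule, float]:
    """Relative-frequency estimate: P(rule) = count(rule) / count(rules with same LHS)."""
    counts = Counter(observed_rules)
    lhs_totals: Counter = Counter()
    for (lhs, _rhs), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}
```

For instance, if a hypothetical `CAUSAL` category is observed three times as `CONCEPT implicated in CONCEPT` and once as `CONCEPT causes CONCEPT`, the two rules receive probabilities 0.75 and 0.25. Unlike a hand-built semantic grammar, unseen phrasings can then be added by annotating more examples rather than rewriting rules.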