Path Knowledge Discovery: Association Mining Based on Multi-Category Lexicons Chen Liu, Wesley W. Chu, Fred Sabb, Stott Parker and Joseph Korpela.


1 Path Knowledge Discovery: Association Mining Based on Multi-Category Lexicons Chen Liu, Wesley W. Chu, Fred Sabb, Stott Parker and Joseph Korpela

2 Outline Motivation Infrastructure Path Mining: Discovering Sequences of Associations Path Content Retrieval Method Validation: Comparing to Traditional Meta Analysis Process Conclusion

3 Motivation (1/2) – Knowledge discovery Increasingly, scientific discovery requires the connection of concepts across disciplines Often there is no direct association between two given concepts in the existing scientific literature In such situations, we must search for chains of associations – How to search for chains of associations? Traditional search methods require researchers to manually review documents in a potential chain When searching a large corpus, a manual search of all returned documents becomes infeasible This can lead to biased or arbitrary methods of reduction

4 Motivation (2/2) What GENES are associated with ADHD? [Figure labels: ADHD, Attention Deficit, Working Memory Dysfunction, PFC, DRD2 A1; ADHD, DRD2 A1]

5 Path Knowledge Discovery

6 Infrastructure for Path Mining Discovery (1/2) Sources of Knowledge – Multilevel Lexicon Evolving concept hierarchy Concepts are mapped to specific domains/matched with synonyms – Semi-Structured Corpus Distributed in HTML/XML format Maps concepts to documents at varying granularities SYNDROME ADHD ADD Attention Deficit Disorder Attention Deficit Hyperactivity Disorder Bipolar Disorder … COGNITIVE CONCEPT Declarative Memory Episodic Memory … Content… … … …

7 Infrastructure for Path Mining Discovery (2/2) Facilitating Knowledge Discovery – Association index How frequently two concepts occur together in a paper Measures the strengths of relations Facilitates path mining – Document element index In which documents the concepts occur Provides evidence of relations between concepts Facilitates path content retrieval
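
The two indexes described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy corpus, the function name `build_indexes`, and the use of a raw co-occurrence count as the association strength are all assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: document id -> lexicon concepts found in that document.
corpus = {
    "doc1": {"ADHD", "working memory", "PFC"},
    "doc2": {"ADHD", "working memory"},
    "doc3": {"working memory", "PFC", "DRD2"},
}

def build_indexes(corpus):
    """Return (association_index, document_index) for a concept-tagged corpus."""
    # Unordered concept pair -> number of documents in which both occur (assumed measure).
    association_index = defaultdict(int)
    # Concept -> ids of the documents that mention it.
    document_index = defaultdict(set)
    for doc_id, concepts in corpus.items():
        for concept in concepts:
            document_index[concept].add(doc_id)
        for a, b in combinations(sorted(concepts), 2):
            association_index[frozenset((a, b))] += 1
    return association_index, document_index

association_index, document_index = build_indexes(corpus)
```

In this sketch the association index supports path mining (how strong is a link?), while the document index supports path content retrieval (which documents provide the evidence?).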

8 Path Mining Given a query, find the sequences of associations among concepts between different domains of knowledge Find the paths based on their occurrences in the corpus (i.e. pair-wise associations) Measure the strength of each path Path Ranking: Find the most relevant path for a query Syndromes: Shrink-Wrap-Loving Tech Syndrome Symptoms: Impaired Response Inhibition Cognitive Concepts: Impulsivity Brain Signaling: Thinner Orbitofrontal Cortex Genes: DRD4 VNTR

9 Using Wildcards in a Path Query – Allow paths to match with any concept in a concept domain Example: Researcher is interested in paths connecting concept C to concepts from the γ domain, via any concept in domain β
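
A wildcard path query of this kind might be expanded as in the sketch below. The lexicon contents, the strength values, the `("any", domain)` step notation, and the function name `expand_path_query` are hypothetical and chosen only for illustration; the slides do not specify a query syntax.

```python
from itertools import product

# Hypothetical lexicon: concept domain -> concepts in that domain.
lexicon = {
    "cognitive concept": ["working memory", "impulsivity"],
    "gene": ["DRD2", "DRD4"],
}

# Hypothetical pair-wise association strengths (e.g. co-occurrence counts).
strength = {
    frozenset(("ADHD", "working memory")): 5,
    frozenset(("ADHD", "impulsivity")): 3,
    frozenset(("working memory", "DRD2")): 2,
    frozenset(("impulsivity", "DRD4")): 4,
}

def expand_path_query(query, lexicon, strength):
    """Enumerate concrete paths for a query whose steps are either
    ("concept", name) or ("any", domain), keeping only paths in which
    every adjacent pair has a non-zero association."""
    candidates = [
        [name] if kind == "concept" else lexicon[name]
        for kind, name in query
    ]
    paths = []
    for path in product(*candidates):
        links = [frozenset(pair) for pair in zip(path, path[1:])]
        if all(strength.get(link, 0) > 0 for link in links):
            paths.append(path)
    return paths

# Concept C = ADHD, via any cognitive concept (domain β), to any gene (domain γ).
paths = expand_path_query(
    [("concept", "ADHD"), ("any", "cognitive concept"), ("any", "gene")],
    lexicon,
    strength,
)
```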

10 Types of Associations in Path Local Association; Global Association

11 Types of Associations in Path Local Association Approach; Global Association Approach

12 Types of Associations in Path Local Association Approach; Global Association Approach

13 Phenograph: Aggregated Results of Path Mining Combine the paths that satisfy the path query.

14 Path Ranking Pick top K paths for a query Weakest link approach – For each path, use the strength of the weakest link as the strength of the whole path – Among all paths, pick the top K paths with highest strengths
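
The weakest-link ranking above can be sketched in a few lines. The strength values and function names here are assumptions made for illustration; only the rule itself (path strength is the minimum link strength, then keep the top K) comes from the slide.

```python
import heapq

# Hypothetical pair-wise association strengths.
strength = {
    frozenset(("ADHD", "working memory")): 5,
    frozenset(("working memory", "DRD2")): 2,
    frozenset(("ADHD", "impulsivity")): 3,
    frozenset(("impulsivity", "DRD4")): 4,
}

def path_strength(path, strength):
    """Strength of a path is the strength of its weakest link."""
    return min(strength.get(frozenset(pair), 0) for pair in zip(path, path[1:]))

def top_k_paths(paths, strength, k):
    """Return the k paths with the highest weakest-link strength."""
    return heapq.nlargest(k, paths, key=lambda p: path_strength(p, strength))

paths = [
    ("ADHD", "working memory", "DRD2"),  # weakest link: 2
    ("ADHD", "impulsivity", "DRD4"),     # weakest link: 3
]
best = top_k_paths(paths, strength, k=1)
```

Note that the first path contains the single strongest link (5) yet ranks lower, because one weak link caps the strength of the whole chain.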

15 Path Content Retrieval Content is important for understanding the interrelations specified by the paths Differences from traditional information retrieval: – Query is a set of relations instead of query terms – Retrieved content should be in fine granularity so that it can explicitly explain the relations – Specific types of content may be required (e.g. quantitative results from experiments, tables, etc.)

16 Process Flow of Path Content Retrieval

17 Path Content Retrieval Example: Document Content Explorer (1/2) Facilitates Path Content Retrieval – Coarse Granularity: Displays list of papers returned using the user-defined query Papers listed with summary data

18 Path Content Retrieval Example: Document Content Explorer (2/2) – Fine Granularity: Content from the paper is displayed with relevant material highlighted for easier viewing Different types of content appear in corresponding tabs Concepts are highlighted in the matching content

19 Method Validation: Applying Path Knowledge Discovery to Phenomics Research Mined corpus of 9000 papers – Retrieved from PubMed Central using a query designed by domain experts Searched for data supporting the heritability of cognitive control Cognitive control – Complex process that involves different phenotype components – Each phenotype component is measured by different behavioral tasks – Heritability of these behavioral tasks is reported in scientific publications

20 Traditional Manual Approach: Meta-Analysis Search corpus to find “relevant” publications – Publications retrieved using a literature search engine – Researcher manually reviews the publications to determine which are relevant – Researcher determines which publications form a chain of associations Using the content found, extract the measures of cognitive tasks (e.g. heritability) and their corresponding cognitive processes Combine the heritability measures for different cognitive processes to compute the heritability of “cognitive control” Problems of the manual approach: – Reading papers, digesting the content, and picking the numbers manually is time-consuming, prone to bias, and not scalable.

21 Automated Approach: Path Knowledge Discovery (1/2) Path mining: – Searched for paths connecting cognitive control with indicators Path content retrieval: – Found relevant quantitative results in those publications Meta-Analysis: – Researchers then reviewed those results to perform the meta-analysis [Figure labels: cognitive control, sub-processes, cognitive tasks]

22 Automated Approach: Path Knowledge Discovery (2/2) Comparison to manual analysis: – 12 out of 15 tasks were correctly associated with corresponding sub-processes – Increased corpus size: 150 (manual) << 9000 (automated) Able to use quantitative measures for ranking relations rather than matching manually – Reduces error and bias

23 Conclusion Path Knowledge Discovery – Identifies and measures a path of knowledge – Retrieves relevant coarse- and fine-granularity content describing the relations specified in the path Validated the methodology using the heritability example in cognitive control Significantly increases the scalability and efficiency of conducting complex cross-discipline analysis

24 Back up slides

25 Path Content Retrieval Query processing – Translate the path to queries digestible by search systems Example – Schizophrenia -> working memory -> PFC – Translate to: (schizophrenia AND working memory) OR (working memory AND PFC)
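
The translation step on this slide can be sketched as below: each adjacent pair of concepts in the path becomes an AND clause, and the clauses are joined with OR. The function name `path_to_query` is hypothetical, and a real search system may need additional quoting of multi-word terms.

```python
def path_to_query(path):
    """Turn a path of concepts into a Boolean query: one AND clause per
    adjacent pair of concepts, with the clauses combined by OR."""
    clauses = [f"({a} AND {b})" for a, b in zip(path, path[1:])]
    return " OR ".join(clauses)

query = path_to_query(["schizophrenia", "working memory", "PFC"])
print(query)
```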

26 Lexicon-Based Query Expansion – Expand according to the synonyms: ADHD AND impaired response inhibition -> (attention deficit hyperactivity disorder OR attention deficit disorder OR ADHD OR ADD) AND impaired response inhibition – Expand according to concepts/sub-concepts: underactive prefrontal cortex AND dopamine receptors -> underactive prefrontal cortex AND (DRD1 OR DRD2 OR D5-like)
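
Both kinds of expansion on this slide amount to replacing a term with an OR group drawn from the lexicon. The sketch below assumes the lexicon is available as two plain dictionaries (`synonyms` and `sub_concepts`, both hypothetical names) and reproduces the slide's two examples.

```python
# Hypothetical lexicon tables, filled with the entries shown on the slide.
synonyms = {
    "ADHD": [
        "attention deficit hyperactivity disorder",
        "attention deficit disorder",
        "ADHD",
        "ADD",
    ],
}
sub_concepts = {
    "dopamine receptors": ["DRD1", "DRD2", "D5-like"],
}

def expand_term(term, table):
    """Replace a term by an OR group if the table has an entry for it."""
    alternatives = table.get(term)
    if not alternatives:
        return term
    return "(" + " OR ".join(alternatives) + ")"

def expand_query(terms, table):
    """Expand each term of an AND query using the given lexicon table."""
    return " AND ".join(expand_term(term, table) for term in terms)

by_synonym = expand_query(["ADHD", "impaired response inhibition"], synonyms)
by_sub_concept = expand_query(
    ["underactive prefrontal cortex", "dopamine receptors"], sub_concepts
)
```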

27 Path Content Retrieval Retrieve relevant path content – Vector space model Multi-granularity content – First rank by coarse-granularity content Documents Sections – For each item of coarse-granularity content, rank its fine-granularity content Assertions (sentences) Figures Tables
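
The two-stage ranking above can be sketched with a very small vector space model. Several simplifications are assumed here: raw term-frequency vectors with cosine similarity (no IDF weighting), whitespace tokenization, documents as the only coarse unit, and sentences as the only fine unit. The sample texts are invented.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between raw term-frequency vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0

def rank_content(query, documents):
    """documents: doc id -> list of sentences.
    Rank documents first (coarse granularity), then rank the sentences
    inside each document (fine granularity)."""
    ranked_docs = sorted(
        documents,
        key=lambda d: cosine(query, " ".join(documents[d])),
        reverse=True,
    )
    return [
        (d, sorted(documents[d], key=lambda s: cosine(query, s), reverse=True))
        for d in ranked_docs
    ]

# Invented sample content for illustration only.
documents = {
    "doc1": [
        "working memory deficits appear in schizophrenia",
        "participants were recruited locally",
    ],
    "doc2": [
        "the weather was mild",
        "schizophrenia was mentioned once",
    ],
}
results = rank_content("schizophrenia working memory", documents)
```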

