Automating Keyphrase Extraction with Multi-Objective Genetic Algorithms (MOGA) Jia-Long Wu Alice M. Agogino Berkeley Expert System Laboratory U.C. Berkeley.


1 Automating Keyphrase Extraction with Multi-Objective Genetic Algorithms (MOGA) Jia-Long Wu Alice M. Agogino Berkeley Expert System Laboratory U.C. Berkeley

2 Outline: Role of Keyphrases. Phrase Extraction Algorithms. Phrase Extraction with Multi-Objective Genetic Algorithm. Experiment and Results. Results Evaluation. Conclusion. Future Research.

3 Role of Keyphrases: Concept representations. Document indexing. Enhance document retrieval / browsing. Query formulation assistance. Document surrogates.

4 Vision of Unified Language System: Design Research Repository. Corporate Design Repository. Design Education Materials. Unified Language System for Engineering Design. Unified Subject Headings. Context Mapping Mechanism. Semantic Network.

5 Keyphrase Extraction Algorithms: Heuristic, Syntactic, Machine Learning. Requires prior training. Heuristic cut-off thresholds in number of phrases. Focuses on single document. Redundancy when aggregated for the whole document collection.

6 Keyphrase Extraction with MOGA: Phrase extraction as an optimization problem. Candidate phrases generation. Optimize phrase selection with MOGA: Model & Genetic Operators. Phenotype & Genotype: candidate phrases (3d scanning, abstraction, active control system) and chromosome 101. Crossover: parents 10011 and 01101, offspring 10001.
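The genotype/phenotype mapping and crossover on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the helper names `decode` and `single_point_crossover` are hypothetical, and single-point crossover is an assumption (the slide does not say which crossover operator was used). A cut after the third bit happens to be consistent with the offspring 10001 shown on the slide.

```python
# Candidate phrases as listed on the slide.
CANDIDATES = ["3d scanning", "abstraction", "active control system"]


def decode(chromosome, candidates):
    """Map a bit string (genotype) to the selected phrases (phenotype).

    A '1' at position i means candidate phrase i is selected.
    """
    return [phrase for bit, phrase in zip(chromosome, candidates) if bit == "1"]


def single_point_crossover(parent_a, parent_b, cut):
    """Swap the tails of two equal-length bit strings at index `cut`.

    Assumption: single-point crossover; the slide does not name the operator.
    """
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


selected = decode("101", CANDIDATES)
children = single_point_crossover("10011", "01101", cut=3)
print(selected)   # phrases switched on by chromosome 101
print(children)   # the first child matches the slide's offspring
```

With chromosome 101, the first and third candidate phrases are selected and the second is left out.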

7 Keyphrase Extraction with MOGA: Optimize phrase selection with MOGA (cont.): Model & Genetic Operators (cont.). Evaluation fitness functions: minimize clustering measure / dispersion (Bookstein ’98); minimize number of phrases. Non-Dominated Sorting Genetic Algorithm (NSGA-II). Mutation example: 10010 to 11010.
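The two ingredients named on this slide, mutation and non-dominated selection over two minimized objectives, can be sketched as follows. This is an illustration under stated assumptions: bit-flip mutation is assumed (the slide only shows one changed bit), the objective values are made up, and the dispersion measure itself is not computed here. The function extracts only the first non-dominated front; full NSGA-II also ranks the remaining fronts and uses crowding distance to preserve diversity.

```python
import random


def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit independently with probability `rate` (assumed operator)."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[b] if rng.random() < rate else b for b in chromosome)


def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the points that no other point dominates (first Pareto front)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical (dispersion, number_of_phrases) pairs, for illustration only.
scores = [(0.9, 2), (0.5, 4), (0.6, 4), (0.3, 7), (0.4, 9)]
front = non_dominated(scores)
print(front)

rng = random.Random(0)
print(bit_flip_mutation("10010", rate=0.2, rng=rng))
```

In the toy data, (0.6, 4) is dominated by (0.5, 4) and (0.4, 9) by (0.3, 7), so three trade-off points remain on the front.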

8 Experiment and Results: Data set: 34 papers from Design Theory and Methodology Conference ’01. Candidate phrases: ~5000 noun phrases extracted. Genetic Algorithm Parameters: population size 100; converges at 5000 generations; 5 hours on Xeon 1.8GHz CPU.

9 Experiment and Results: Pareto plot of Dispersion versus Number of Phrases

10 Experiment and Results: Histogram of the number of optimal solutions in which a keyphrase appears
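The quantity behind this histogram, how many Pareto-optimal solutions each phrase appears in, could be tallied roughly as below. This is a sketch only: `phrase_frequencies` is a hypothetical helper, and the three bit strings stand in for real Pareto-optimal chromosomes, which the transcript does not provide.

```python
from collections import Counter


def phrase_frequencies(pareto_solutions, candidates):
    """Count, for each candidate phrase, how many solutions select it."""
    counts = Counter()
    for chromosome in pareto_solutions:
        for bit, phrase in zip(chromosome, candidates):
            if bit == "1":
                counts[phrase] += 1
    return counts


candidates = ["3d scanning", "abstraction", "active control system"]
pareto = ["101", "100", "111"]  # hypothetical Pareto-optimal chromosomes

freq = phrase_frequencies(pareto, candidates)
for phrase, n in freq.most_common():
    print(f"{phrase}: selected in {n} of {len(pareto)} solutions")
```

Binning these counts over all candidate phrases would give a histogram of the kind the slide describes.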

11 Evaluation

12 6 domain experts participated in the evaluation. Core phrases vs. Non-core phrases. Less than 10% are deemed irrelevant. Significant deviation between evaluators.

                      Relevant Core Phrases      Relevant Non-Core Phrases   Relevant Noise Phrases
                      (out of 385 candidates)    (out of 994 candidates)     (out of 300 phrases)
Average               363.5                      905.5                       26.0
Percentage Relevant   94.42%                     91.10%                      8.67%
Standard Deviation    13.08                      74.77                       4.61
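The "Percentage Relevant" figures follow directly from the averages and the pool sizes reported on this slide, which the short check below reproduces. The dictionary layout and the function name `percent_relevant` are illustrative choices, not part of the presentation.

```python
# (average number rated relevant, size of the phrase pool), from the slide.
ROWS = {
    "core": (363.5, 385),
    "non-core": (905.5, 994),
    "noise": (26.0, 300),
}


def percent_relevant(average_relevant, pool_size):
    """Average share of the pool rated relevant, as a percentage."""
    return round(100.0 * average_relevant / pool_size, 2)


for name, (avg, total) in ROWS.items():
    print(f"{name}: {percent_relevant(avg, total):.2f}% relevant")
```

The results agree with the slide: about 94.42% for core phrases, 91.10% for non-core phrases, and 8.67% for noise phrases.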

13 Conclusion: Keyphrase extraction can be successfully implemented as a multi-objective global optimization problem. Reasonably good keyphrases can be extracted without prior training or domain knowledge. Trade-off information between objectives such as number of phrases vs. average quality of phrases can be gained from Pareto solutions. Preferences can be made based on the user needs and trade-off information.

14 Future Research: Test on a larger text collection. Implement extracted keyphrases in an IR system as a browsing and query expansion tool and compare to a full-text search IR system. Evaluate with more raters and a 1-5 scale. Build domain thesauri with extracted keyphrases and semantic discovery algorithms (e.g. Latent Semantic Analysis).

15 Metathesaurus in Digital Library

16 Thank you! Comments? Questions? jialong@me.berkeley.edu aagogino@me.berkeley.edu

17 Mode Analysis of Scaled Evaluation

