
Concept-Based Analysis of Scientific Literature Chen-Tse Tsai, Gourab Kundu, Dan Roth UIUC.





2 Concept-Based Analysis of Scientific Literature Chen-Tse Tsai, Gourab Kundu, Dan Roth CS @ UIUC

3 Understanding Research Communities
Consider the following questions:
- What are the key applications studied by the community?
- Which applications have matured enough to be used as techniques for other applications?
- What methods were developed to solve a particular problem?
In this paper:
- Extract concepts from scientific papers. A concept is a cluster of possible mentions, e.g. {svm, support vector machines, maximal margin classifiers, …}.
- Analyze computational linguistics research by answering the questions above.

4 Outline
Computational Approach
- Concept Mention Extraction
- Citation-Context Based Concept Clustering
Evaluation of Algorithms
Understanding Computational Linguistic Research

5 Concept Mention Extraction
Identify and categorize mentions of concepts (Gupta and Manning, 2011):
- Categories: TECHNIQUE and APPLICATION, e.g. “We apply support vector machines on text classification.”
- Unsupervised bootstrapping algorithm (Yarowsky, 1995; Collins and Singer, 1999)
The proposed algorithm:
1. Extract noun phrases (Punyakanok and Roth, 2001).
2. For each category, initialize a decision list with seeds.
3. For several rounds:
   1. Annotate NPs using the decision lists.
   2. Extract top features from the newly annotated phrases and add them to the decision lists.
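
The bootstrapping loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: noun phrases are assumed to be already extracted, each paired with a set of context features; the feature names (`prev:apply`, `next:on`, ...), the seed lists, and the ranking of "top features" by raw count are all hypothetical choices made for the example.

```python
from collections import Counter

def bootstrap(phrases, seeds, rounds=3, top_k=1):
    """phrases: list of (noun_phrase, set_of_features); seeds: {category: set_of_features}.
    Returns ({phrase_index: category}, {category: decision_list})."""
    decision = {cat: set(feats) for cat, feats in seeds.items()}
    labels = {}
    for _ in range(rounds):
        # Annotate still-unlabeled NPs using the current decision lists.
        newly = {cat: [] for cat in decision}
        for i, (_, feats) in enumerate(phrases):
            if i in labels:
                continue
            scores = {cat: len(feats & rules) for cat, rules in decision.items()}
            best = max(scores, key=scores.get)
            unique = list(scores.values()).count(scores[best]) == 1
            if scores[best] > 0 and unique:  # label only on an unambiguous match
                labels[i] = best
                newly[best].append(feats)
        # Add the most frequent unclaimed features of the newly annotated
        # phrases to each category's decision list (assumed scoring: raw count).
        for cat, feat_sets in newly.items():
            counts = Counter(f for feats in feat_sets for f in feats)
            claimed = set().union(*decision.values())
            ranked = sorted(counts.items(), key=lambda fc: (-fc[1], fc[0]))
            fresh = [f for f, _ in ranked if f not in claimed]
            decision[cat].update(fresh[:top_k])
    return labels, decision

# Hypothetical toy input: noun phrases with hand-made context features.
PHRASES = [
    ("support vector machines", {"prev:apply", "next:on"}),
    ("text classification", {"prev:on", "head:classification"}),
    ("conditional random fields", {"prev:use", "next:on"}),
    ("named entity recognition", {"prev:for", "head:recognition"}),
    ("sentiment classification", {"prev:for", "head:classification"}),
]
SEEDS = {"TECHNIQUE": {"prev:apply"}, "APPLICATION": {"prev:on"}}

labels, decision = bootstrap(PHRASES, SEEDS)
for i, category in sorted(labels.items()):
    print(PHRASES[i][0], "->", category)
```

In this toy run, "named entity recognition" is only reached in the third round, after `prev:for` has been learned from "sentiment classification"; that is the kind of propagation the bootstrapping rounds are meant to produce.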

6 Citation-Context Based Concept Clustering (CitClus)
Cluster mentions into semantically coherent concepts:
1. Group concept mentions by citation context.
2. Merge clusters based on lexical similarity between the mentions in the clusters.
[Figure: four papers with concept mentions next to citations. Paper1: "support vector machine", "c4.5". Paper2: "svm-based classification", "decision trees". Paper3: "svm". Paper4: "maximal margin classifiers". Citations shown: (Cortes, 1995), (Vapnik, 1995), (Quinlan, 1993).]
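
The two clustering steps might look like the sketch below. It is an illustration under assumptions rather than the method from the paper: the pairing of each mention with a citation is invented for the example (the transcript does not preserve it), and lexical similarity is approximated here by token-level Jaccard overlap with an arbitrary threshold.

```python
import re
from collections import defaultdict

def tokens(mention):
    """Lowercased alphanumeric tokens of a mention."""
    return set(re.findall(r"[a-z0-9]+", mention.lower()))

def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def citclus(mention_citations, threshold=0.3):
    # Step 1: group mentions that occur next to the same citation.
    by_citation = defaultdict(set)
    for mention, citation in mention_citations:
        by_citation[citation].add(mention)
    clusters = [set(group) for group in by_citation.values()]
    # Step 2: repeatedly merge two clusters when any pair of their
    # mentions is lexically similar (assumed measure: token Jaccard).
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(jaccard(a, b) >= threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters

# Hypothetical mention/citation pairs, loosely modeled on the slide's example.
PAIRS = [
    ("support vector machine", "Cortes,1995"),
    ("c4.5", "Quinlan,1993"),
    ("svm-based classification", "Vapnik,1995"),
    ("decision trees", "Quinlan,1993"),
    ("maximal margin classifiers", "Vapnik,1995"),
    ("svm", "Cortes,1995"),
]

clusters = citclus(PAIRS)
for cluster in clusters:
    print(sorted(cluster))
```

With these pairs, citation context alone links "c4.5" to "decision trees" and "support vector machine" to "svm", which share no tokens; the lexical merge then joins the two SVM-related groups through "svm" and "svm-based classification".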

7 Outline
Computational Approach
- Concept Mention Extraction
- Citation-Context Based Concept Clustering
Evaluation of Algorithms
Understanding Computational Linguistic Research

8 Evaluation of Mention Extraction
ACL Anthology Network Corpus (Radev et al., 2009)
Training data: 11,005 abstracts
Test data: 474 abstracts (Gupta and Manning, 2011)

Approach       Technique              Application
               Pre.   Rec.   F1       Pre.   Rec.   F1
GM 2011        30.5   46.7   36.9     27.6   57.5   37.3
Our approach   48.2   48.8   48.5     44.0   47.3   45.6
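
As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, so each F1 column can be recomputed from the other two. The snippet below does that with the reported numbers; the function and variable names are just for illustration.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) taken from the table above.
REPORTED = {
    ("GM 2011", "Technique"): (30.5, 46.7, 36.9),
    ("GM 2011", "Application"): (27.6, 57.5, 37.3),
    ("Our approach", "Technique"): (48.2, 48.8, 48.5),
    ("Our approach", "Application"): (44.0, 47.3, 45.6),
}

for (approach, category), (p, r, reported) in REPORTED.items():
    print(approach, category, round(f1(p, r), 1), "reported:", reported)
```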

9 Evaluation of Concept Clustering
Manually cluster the extracted mentions from 1,000 full-text papers.
- CitClus: the proposed approach
- LexClus: group the concept mentions by lexical similarity
CitClus groups:
- “maximal entropy classifier” and “logistic classifier”
- “topic modeling” and “latent dirichlet allocation”

Approach   Technique   Application
LexClus    1.72        1.62
CitClus    1.28        1.49

10 Outline
Computational Approach
- Concept Mention Extraction
- Citation-Context Based Concept Clustering
Evaluation of Algorithms
Understanding Computational Linguistic Research

11 Trends Analysis
[Figure: trend plots for CitClus, LexClus and LDA, annotated "The emergence of SVM" and "The emergence of topic modeling".]
Topic modeling is high in the 90s because LDA cannot generate a tight enough cluster for a specific concept.

12 Predictive Quality
For a concept, predict the number of papers in a year given the number of papers in the previous three years.
- Linear regression over every three consecutive years
- The better the grouping of mentions into coherent concepts, the more stable the trend graph.

Approach   SVM    Decision Tree   Topic Modeling   Sentiment Analysis
LexClus    0.97   0.83            0.73             0.48
CitClus    0.52   0.37            …                0.46
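
One possible reading of this setup is sketched below: fit a least-squares line through three consecutive yearly counts and extrapolate it one year ahead. Both this reading and the error measure (mean relative error, which needs nonzero true counts) are assumptions; the slide names neither the exact regression setup nor the metric behind the table, and the yearly counts are made up.

```python
def predict_next(y0, y1, y2):
    """Fit a least-squares line to counts at x = 0, 1, 2 and evaluate it at x = 3."""
    mean = (y0 + y1 + y2) / 3
    slope = (y2 - y0) / 2          # closed form for three equally spaced points
    return mean + 2 * slope        # the fitted line passes through (1, mean)

def mean_relative_error(counts):
    """Average |prediction - truth| / truth over all years with three predecessors."""
    errors = []
    for i in range(3, len(counts)):
        prediction = predict_next(*counts[i - 3:i])
        errors.append(abs(prediction - counts[i]) / counts[i])
    return sum(errors) / len(errors)

# Hypothetical yearly paper counts for one concept under two clusterings.
smooth_trend = [10, 12, 14, 16, 18]   # coherent cluster: steady growth
noisy_trend = [10, 30, 5, 40, 8]      # loose cluster: erratic counts

print(predict_next(1, 2, 3))              # 4.0
print(mean_relative_error(smooth_trend))  # 0.0
print(mean_relative_error(noisy_trend))
```

Under this reading, a lower error means the yearly counts follow a steadier trend, which matches the slide's point that better concept grouping yields a more stable trend graph.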

13 Relations Between Concept Categories
For a given concept, calculate the ratio between the number of application mentions and the number of technique mentions.
Three concepts in the ACL community: support vector machines, machine translation, POS tagging.
[Figure: three plots: SVM (#app/#tech), MT (#tech/#app), POS tagging (#tech/#app).]
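
The ratio itself is simple to compute once mentions are counted per year and per category. A small sketch, with made-up counts and hypothetical names, assuming years with a zero denominator are skipped:

```python
def yearly_ratio(numerator, denominator):
    """Return {year: numerator[year] / denominator[year]} for years with a positive denominator."""
    return {
        year: numerator.get(year, 0) / count
        for year, count in sorted(denominator.items())
        if count > 0
    }

# Hypothetical per-year mention counts for a single concept.
app_mentions = {2000: 1, 2005: 6, 2010: 20}
tech_mentions = {2000: 10, 2005: 12, 2010: 10}

app_over_tech = yearly_ratio(app_mentions, tech_mentions)
print(app_over_tech)
```

Swapping the two arguments gives the #tech/#app ratio used for the MT and POS tagging plots.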

14 Relations Between Concept Categories
For a given application, what techniques have been applied to it?
[Figure: plots for machine translation and named entity recognition, annotated "Phrase-based and MERT", "Decision Tree", "Decision Tree disappears" and "CRF".]

15 Conclusion
This work proposed algorithms for identifying, categorizing, and clustering mentions of scientific concepts.
These tools can provide a fairly deep understanding of, and useful insight into, research communities.

