
1 A Two Tier Framework for Context-Aware Service Organization & Discovery
I2R-NUS-MSRA at TAC 2011: Entity Linking
Wei Zhang 1, Jian Su 2, Bin Chen 2, Wenting Wang 2, Zhiqiang Toh 2, Yanchuan Sim 2, Yunbo Cao 3, Chin Yew Lin 3 and Chew Lim Tan 1
1 National University of Singapore
2 Institute for Infocomm Research
3 Microsoft Research Asia
Text Analysis Conference, November 14-15, 2011

2 Outline
• I2R-NUS team at TAC
  • Incorporate the new technologies proposed in our recent papers (IJCAI 2011, IJCNLP 2011)
    • Acronym Expansion
    • Semantic Features
    • Instance Selection
  • Investigate three algorithms for NIL query clustering
    • Spectral Graph Partitioning (SGP)
    • Hierarchical Agglomerative Clustering (HAC)
    • Latent Dirichlet Allocation (LDA)
• Combination system
  • Offline combination with the system of the MSRA team at the KB linking step

4 Acronym Expansion - Motivation
• Expanding an acronym from its context reduces the ambiguity of a name
  • E.g. TSE refers to 33 entries in Wikipedia, whereas Tokyo Stock Exchange is unambiguous.

5 Step 1 – Find Expansion Candidates
• Identifying candidate expansions (e.g. for ACM)

6 Step 2 – Candidate Expansions Ranking
• Use an SVM classifier to rank the candidates
• Our SVM-based acronym expansion
  • can link acronyms and full strings that occur in different sentences of an article
    • Sentence distance between acronym and expansion
  • can handle acronyms with swapped letters
    • Number of common characters between the acronym and the leading characters of the expansion
    • E.g. Communist Party of China vs. CCP
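
The two ranking features named on this slide can be sketched as follows. This is a minimal illustration, not the system's actual feature extractor: the function names, the whitespace tokenization and the order-insensitive letter matching are assumptions made here to show how a swapped-letter case such as CCP can still be matched.

```python
from collections import Counter


def common_leading_chars(acronym, expansion):
    """Count acronym letters matched by word-initial letters of the expansion.

    Matching ignores letter order (an assumption), so "CCP" still fully
    matches "Communist Party of China".
    """
    acronym_letters = Counter(ch.upper() for ch in acronym if ch.isalpha())
    leading_letters = Counter(
        word[0].upper() for word in expansion.split() if word[0].isalpha()
    )
    # Multiset intersection keeps, per letter, the smaller of the two counts.
    return sum((acronym_letters & leading_letters).values())


def sentence_distance(acronym_sentence_index, expansion_sentence_index):
    """Number of sentences between the acronym and the candidate expansion."""
    return abs(acronym_sentence_index - expansion_sentence_index)


def candidate_features(acronym, acronym_sentence_index,
                       expansion, expansion_sentence_index):
    """Hypothetical feature vector for one (acronym, candidate) pair."""
    return [
        common_leading_chars(acronym, expansion),
        sentence_distance(acronym_sentence_index, expansion_sentence_index),
    ]
```

A vector such as `candidate_features("CCP", 4, "Communist Party of China", 1)` would then be one input row for the ranking classifier.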

8 Related Work on Context Similarity
A Wikipedia-LDA Model for Entity Linking with Batch Size Changing Instance Selection. The 5th International Joint Conference on Natural Language Processing, November 8-13, 2011, Chiang Mai, Thailand.
• Zhang et al., 2010; Zheng et al., 2010; Dredze et al., 2010
  • Term matching
• However:
  1) Michael Jordan is a leading researcher in machine learning and artificial intelligence.
  2) Michael Jordan is currently a full professor at the University of California, Berkeley.
  3) Michael Jordan (born February, 1963) is a former American professional basketball player.
  4) Michael Jordan wins NBA MVP of 91-92 season.
  • No term match

9 Our System - A Wikipedia-LDA Model
  1) Michael Jordan is a leading researcher in machine learning and artificial intelligence.
  2) Michael Jordan is currently a full professor at the University of California, Berkeley.
  3) Michael Jordan (born February, 1963) is a former American professional basketball player.
  4) Michael Jordan wins NBA MVP of 91-92 season.
• Topic: Science (1, 2)
• Topic: Basketball (3, 4)

10 Wikipedia – LDA Model
[Figure: documents, P(word_i | category_j), P(category_i | document_j)]

11 Wikipedia – LDA Model
  1) Michael Jordan is a leading researcher in machine learning and artificial intelligence.
  2) Michael Jordan is currently a full professor at the University of California, Berkeley.
  3) Michael Jordan (born February, 1963) is a former American professional basketball player.
  4) Michael Jordan wins NBA MVP of 91-92 season.
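
To illustrate why topic distributions help where term matching fails, the sketch below compares contexts by their P(category | document) vectors instead of their words. The probability values are invented for illustration only, and cosine similarity is an assumed choice of measure; the slides do not state which measure the system uses.

```python
import math


def topic_similarity(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


# Hypothetical P(topic | context) over the topics ("Science", "Basketball")
# for the four Michael Jordan sentences; these numbers are made up.
contexts = {
    1: [0.90, 0.10],  # leading researcher in machine learning
    2: [0.80, 0.20],  # full professor at Berkeley
    3: [0.10, 0.90],  # former professional basketball player
    4: [0.05, 0.95],  # wins NBA MVP
}

same_entity = topic_similarity(contexts[1], contexts[2])
different_entity = topic_similarity(contexts[1], contexts[3])
```

Sentences 1 and 2 share almost no content words beyond the name, yet their topic vectors are close, while sentences 1 and 3 are far apart.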

13 Related Work
• Vector Space Model
  • Difficult to combine bag of words (BOW) with other features
  • Performance needs to be improved
• Supervised Approaches
  • Using manually annotated training instances
    • Dredze et al., 2010; Zheng et al., 2010
  • Using automatically generated training instances
    • Zhang et al., 2010

14 Related Work
• Auto-generate training instances (Zhang et al., 2010)
  • (News Article) Obama Campaign Drops The George W. Bush Talking Point …

15 Related Work
• From “George W. Bush” articles
  • No positive instances are generated for “George H. W. Bush”, “George P. Bush” and “George Washington Bush”
  • No negative instances are generated for “George W. Bush”
• Such positive/negative training instance distributions may differ from those of the original ambiguous cases in the raw text collection
• The distribution of the unambiguous mentions may differ from that of the test data

16 The Approach in Our System
• An instance selection approach
  • Select an informative, representative, and diverse subset from the auto-generated data set
  • Reduce the effect of the distribution differences

17 Instance Selection
[Figure: 2-D data set illustration with the SVM hyperplane]
1. Train an SVM classifier on a small initial data set
2. Test on the auto-generated data set
3. Select informative, representative and diverse instances
4. Add these selected instances to the initial data set
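
One selection round of this kind might look like the sketch below. It assumes the classifier has already produced a signed distance to the hyperplane for every auto-generated instance. Treating "informative" as a small absolute margin and "diverse" as a cosine-similarity cap against instances already chosen are assumptions made here, and the representativeness criterion is omitted. The names `select_batch` and `max_similarity` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_batch(instances, margins, batch_size, max_similarity=0.9):
    """Pick up to batch_size instance indices for the next training round.

    instances: feature vectors of the auto-generated data set
    margins:   signed distances to the current hyperplane (assumed given)
    """
    # Informative first: instances closest to the decision boundary.
    by_uncertainty = sorted(range(len(instances)), key=lambda i: abs(margins[i]))
    chosen = []
    for i in by_uncertainty:
        # Diverse: skip instances too similar to one already selected.
        if all(cosine(instances[i], instances[j]) < max_similarity for j in chosen):
            chosen.append(i)
        if len(chosen) == batch_size:
            break
    return chosen
```

The returned indices would be added to the initial data set before the classifier is retrained.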

19 Spectral Clustering
• Advantages over other clustering techniques
  • Globally optimized results
  • Efficient in time and space
  • Generally produces a better result
• Success in many areas
  • Image segmentation
  • Gene expression clustering

20 Spectral Clustering
• Eigen decomposition on the graph Laplacian: A = Q Λ Q^{-1}
• Dimensionality reduction
• (Luxburg, 2006)
[Figure: George W. Bush, George H.W. Bush]
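
As a rough illustration of partitioning by eigenvectors of the graph Laplacian, the sketch below splits a small similarity graph in two using the sign of the Fiedler vector of the unnormalized Laplacian L = D - W. This is a simplified two-way variant, not necessarily the one used in the system. The power iteration, the function name and the toy weights are all assumptions, and the weight matrix is assumed symmetric with a zero diagonal.

```python
import math


def fiedler_bipartition(weights, iterations=200):
    """Two-way spectral partition of a weighted undirected graph.

    Approximates the eigenvector of the second-smallest eigenvalue of
    L = D - W by power iteration on (shift * I - L), after removing the
    constant component, then splits nodes by sign.
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    shift = 2.0 * max(degrees)  # upper bound on the largest eigenvalue of L
    x = [float(i) for i in range(n)]  # any non-constant start vector
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]  # project out the constant eigenvector
        # y = (shift * I - L) x = (shift - d_i) x_i + sum_j w_ij x_j
        y = [
            (shift - degrees[i]) * x[i]
            + sum(weights[i][j] * x[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return [1 if v >= 0 else 0 for v in x]


# Toy graph: two tightly linked triangles joined by one weak edge (2-3).
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
labels = fiedler_bipartition(W)
```

In an entity-linking setting the nodes would be query documents and the weights their pairwise similarities, so the two sides of the cut correspond to two candidate entities.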

21 Hierarchical Agglomerative Clustering
• Convert a doc into a feature vector: Wikipedia concepts, bag-of-words and named entities.
• Estimate the weight of each feature using the Query Relevance Weighting Model (Long and Shi, 2010)
  • This model shows good performance in Web People Search
  • In our work, the original query name, its Wikipedia redirected names and its coreference chain mentions are all considered as appearances of the query name in the text.
• Similarity scores: cosine similarity and overlap similarity.
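
The two similarity scores can be sketched over sparse weighted feature vectors as below. The slides do not define "overlap similarity"; the standard overlap coefficient |A ∩ B| / min(|A|, |B|) over the feature sets is assumed here, and the feature names and weights in the example are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors given as feature->weight dicts."""
    dot = sum(weight * b[feature] for feature, weight in a.items() if feature in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def overlap_similarity(a, b):
    """Overlap coefficient on the feature sets (an assumed definition)."""
    if not a or not b:
        return 0.0
    shared = len(set(a) & set(b))
    return shared / min(len(a), len(b))


# Hypothetical weighted features for two documents mentioning the same name.
doc1 = {"bush": 1.0, "president": 2.0, "texas": 1.0}
doc2 = {"bush": 1.0, "president": 1.0}
```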

22 Hierarchical Agglomerative Clustering
• Docs referring to the same entity are clustered according to pairwise doc similarity scores.
• Start with singletons: each doc is a cluster
• If there are two docs D and D' in clusters C_i and C_j respectively:
  • The two clusters C_i and C_j are merged to form a new cluster C_ij if Sim(D, D') > γ
  • Calculate the similarity between the new cluster C_ij and all remaining clusters
• γ = 0.25
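
The merge rule on this slide can be sketched as threshold-based single-link clustering over a precomputed pairwise similarity matrix. Merging the most similar pair of clusters first is an assumption made here, and the function name and toy matrix are hypothetical.

```python
def threshold_hac(sim, gamma=0.25):
    """Agglomerative clustering: merge clusters while some cross pair exceeds gamma.

    sim: symmetric matrix of document-pair similarity scores
    Returns clusters as sorted lists of document indices.
    """
    clusters = [[i] for i in range(len(sim))]  # start with singletons
    while len(clusters) > 1:
        best_score, best_pair = gamma, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster similarity = best score of any doc pair (D, D').
                link = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if link > best_score:
                    best_score, best_pair = link, (a, b)
        if best_pair is None:
            break  # no remaining pair of clusters exceeds the threshold
        a, b = best_pair
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)


# Toy similarity matrix for four documents (values are illustrative only).
SIM = [
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
```

With γ = 0.25, documents 0 and 1 merge first, document 2 then joins through its link to document 1, and document 3 stays a singleton.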

23 Latent Dirichlet Allocation (LDA)
• LDA has been applied to many NLP tasks, such as summarization and text classification
• In our approach, the learned topics can represent the underlying entities of the ambiguous names
• Generative story:
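
The slide's own generative story did not survive in this transcript. For orientation, the standard LDA generative process can be sketched as below: draw a topic mixture θ from a Dirichlet prior for each document, then for each word draw a topic z from θ and a word w from that topic's word distribution. The topic-word tables and the value of alpha are invented for illustration, with topics standing in for candidate entities.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, alpha, length, rng):
    """Generate one document as a list of (topic, word) pairs.

    topic_word: list of word->probability dicts, one per topic
    alpha:      symmetric Dirichlet prior on the document's topic mixture
    """
    theta = sample_dirichlet([alpha] * len(topic_word), rng)
    tokens = []
    for _ in range(length):
        z = rng.choices(range(len(topic_word)), weights=theta)[0]
        words = list(topic_word[z])
        w = rng.choices(words, weights=[topic_word[z][x] for x in words])[0]
        tokens.append((z, w))
    return theta, tokens


# Hypothetical topics standing in for two entities that share a name.
TOPIC_WORD = [
    {"professor": 0.5, "learning": 0.3, "berkeley": 0.2},
    {"nba": 0.5, "mvp": 0.3, "season": 0.2},
]
theta, tokens = generate_document(TOPIC_WORD, 0.5, 8, random.Random(0))
```

Inference runs this story in reverse: given the observed words of each query document, it estimates the topic mixture, and documents dominated by the same topic can be grouped as the same entity.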

24 Three Clustering Systems Combination
• Three-class SVM classifier to decide which system to trust
• Features: scores given by the three systems
Combine with the system of MSRA team at KB linking step
• Binary SVM classifier to decide which system to trust
• Features: scores given by the two systems

25 Experiment for Three Clustering Algorithms
Algorithms    Eval 09   Eval 10   Eval 10 +
SGP           0.745     0.954     0.809
HAC           0.666     0.950     0.789
LDA           0.782     0.981     0.841
Combination   0.795     0.982     0.852

26 Submissions
Systems   Acc.    Precision   Recall   F1
Full      0.863   0.815       0.849    0.831
Partial   0.844   0.797       0.829    0.813
Highest   -       -           -        0.846
Median    -       -           -        0.716

27 Conclusion
• Incorporate the new technologies proposed in our recent papers (IJCAI 2011, IJCNLP 2011)
  • Acronym Expansion
  • Semantic Features
  • Instance Selection
• Investigate three algorithms for NIL query clustering
  • Spectral Graph Partitioning (SGP)
  • Hierarchical Agglomerative Clustering (HAC)
  • Latent Dirichlet Allocation (LDA)

