A research literature search engine with abbreviation recognition


1 A research literature search engine with abbreviation recognition
Cheng-Tao Chu Pei-Chin Wang

2 Outline
Features
Demo
Issues involved
Implementation
  Tailored Edit Distance
  Probabilistic Model
  Translation Model
  Score Combination
Evaluation
Q&A

3 Features
Given a query containing authors, proceedings, or title keywords, return relevant papers
Retrieve the desired papers even when author or proceedings names are abbreviated
Web interface for queries and user evaluation

4 Demo It’s show time

5 Issues involved
Tag an arbitrary query into author, proceedings, and other keyword fields
Recognize author
  P. Raghavan -> Prabhakar Raghavan
              -> Padma Raghavan
              -> … Raghavan
  Probability of each possible candidate

6 Issues involved (cont.)
Recognize proceedings name
  More than a look-up table
  IJCAI -> International Joint Conference on AI
        -> IJCAI Workshop
How to combine the weight of each candidate
  Score from Lucene
  Score for a possible author
  Score for a possible proceedings
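The point that proceedings recognition needs more than a look-up table can be illustrated with a small sketch. It matches an acronym against the initials of each venue name and also against venue names that contain the acronym as a token. The function names, the stop-word list, and the venue list are all hypothetical and are not taken from the presented system.

```python
# Hypothetical stop words skipped when forming initials (an assumption).
STOP_WORDS = {"of", "on", "the", "and", "for", "in"}

def initials(name: str) -> str:
    """Upper-case initials of the non-stop-word tokens in a venue name."""
    return "".join(w[0].upper() for w in name.split() if w.lower() not in STOP_WORDS)

def acronym_candidates(query: str, venues: list[str]) -> list[str]:
    """Venues whose initials start with the acronym, or that contain it as a token."""
    q = query.upper()
    hits = []
    for venue in venues:
        tokens = {t.upper() for t in venue.split()}
        if initials(venue).startswith(q) or q in tokens:
            hits.append(venue)
    return hits

# Illustrative venue list, not data from the presentation.
venues = [
    "International Joint Conference on Artificial Intelligence",
    "IJCAI Workshop",
    "Very Large Data Bases",
]
print(acronym_candidates("IJCAI", venues))
```

Each returned candidate would still need a weight before it can be combined with the other scores, which is the second issue on this slide.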

7 Implementation
[Architecture diagram. Components: DBLP XML, Parser, Tagger, Database, Query, Browser, Search Engine, Retrieved Documents, Probabilistic Model, Tailored Edit Distance]

8 Tailored Edit Distance
Heuristic
  Award for consecutive matching
  Award for matching capitalized characters
  More penalty on substitution, less on insertion/deletion
Probabilistic representation
  Transform edit distance cost to probability
  Normalize the cost
  Use training data to estimate the distribution
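The heuristics on this slide can be sketched as a dynamic-programming edit distance with adjusted costs. The cost and bonus constants below are assumptions chosen for illustration, and the "consecutive match" award is approximated by checking whether the previous pair of characters also matched. The slide says training data is used to estimate the distribution; since that data is not available here, `to_probabilities` swaps in a plain softmax over negative costs as a stand-in.

```python
import math

SUB_COST = 2.0    # assumed: substitution is penalized more heavily
INDEL_COST = 1.0  # assumed: insertion/deletion is penalized less
CAP_BONUS = 0.5   # assumed award for matching a capitalized character
RUN_BONUS = 0.5   # assumed award when the previous characters also matched

def tailored_distance(a: str, b: str) -> float:
    """Edit distance with match awards; lower (even negative) means more similar."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * INDEL_COST
    for j in range(1, m + 1):
        d[0][j] = j * INDEL_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                diag = 0.0
                if a[i - 1].isupper():
                    diag -= CAP_BONUS
                if i > 1 and j > 1 and a[i - 2] == b[j - 2]:
                    diag -= RUN_BONUS
            else:
                diag = SUB_COST
            d[i][j] = min(d[i - 1][j - 1] + diag,   # match or substitution
                          d[i - 1][j] + INDEL_COST,  # deletion from a
                          d[i][j - 1] + INDEL_COST)  # insertion into a
    return d[n][m]

def to_probabilities(costs: dict[str, float]) -> dict[str, float]:
    """Softmax over negative costs (a stand-in for a trained distribution)."""
    weights = {name: math.exp(-cost) for name, cost in costs.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

With these constants a substitution costs more than a single deletion, so `tailored_distance("abc", "abd")` is larger than `tailored_distance("abc", "ab")`.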

9 Conceptual Histogram

10 Probabilistic Model
Translation Model
  Use tailored edit distance to estimate the distribution
  Return a distribution of candidate names
  (Assuming independence between the full name and its abbreviation given the evidence)
Network Structure
  [Diagram nodes: Full Name; First Name, Middle Name, Last Name; First Ini., Mid. Ini., Last Ini.]
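One way to read this slide is that each part of the name is scored separately and the per-part scores are multiplied, then normalized into a distribution over candidates. The sketch below follows that reading under several assumptions: only first and last names are aligned (middle names are ignored), candidates have a uniform prior, a single letter is treated as an initial, and `difflib.SequenceMatcher` stands in for the tailored edit distance. The function names and the floor value `0.01` are hypothetical.

```python
from difflib import SequenceMatcher

FLOOR = 0.01  # assumed minimum likelihood, so no candidate gets probability zero

def part_likelihood(observed: str, full: str) -> float:
    """Likelihood of an observed query token given one part of a candidate name."""
    obs = observed.rstrip(".").lower()
    full = full.lower()
    if len(obs) == 1:  # treated as an initial
        return 1.0 if full.startswith(obs) else FLOOR
    # Stand-in similarity; the slides use the tailored edit distance here.
    return max(SequenceMatcher(None, obs, full).ratio(), FLOOR)

def candidate_distribution(query: str, candidates: list[str]) -> dict[str, float]:
    """Normalized scores over candidate full names for an abbreviated query."""
    q_parts = query.split()
    q_first, q_last = q_parts[0], q_parts[-1]
    scores = {}
    for name in candidates:
        parts = name.split()
        scores[name] = (part_likelihood(q_first, parts[0])
                        * part_likelihood(q_last, parts[-1]))
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

# "Alice Smith" is a made-up distractor, not data from the presentation.
dist = candidate_distribution(
    "P. Raghavan", ["Prabhakar Raghavan", "Padma Raghavan", "Alice Smith"])
print(dist)
```

Under a uniform prior the two Raghavan candidates tie; a prior estimated from data, such as publication counts, would be one way to break that tie.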

11 Score Combination
Lucene score formula
Assign a weight to each candidate as its combination score
  Set idf(t) to (weight of that term + original idf(t))
  Assign a boost value to each term in the query
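The two combination options on this slide can be sketched in a few lines. The idf below assumes the formula of Lucene's classic TF-IDF similarity, `1 + ln(numDocs / (docFreq + 1))`; the slides do not say which Lucene version or similarity was used. The field name `author` and the helper names are hypothetical. The boost strings use the `^` operator from Lucene's query parser syntax.

```python
import math

def classic_idf(doc_freq: int, num_docs: int) -> float:
    """idf as in Lucene's classic TF-IDF similarity (assumed here)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def combined_idf(term_weight: float, doc_freq: int, num_docs: int) -> float:
    """Option 1 from the slide: weight of the term plus its original idf."""
    return term_weight + classic_idf(doc_freq, num_docs)

def boosted_query(candidates: dict[str, float], field: str) -> str:
    """Option 2 from the slide: one boosted phrase clause per candidate."""
    clauses = [f'{field}:"{name}"^{weight:.2f}' for name, weight in candidates.items()]
    return " OR ".join(clauses)

# Illustrative weights, not results from the presentation.
print(boosted_query({"Prabhakar Raghavan": 0.5, "Padma Raghavan": 0.5}, "author"))
```

The candidate weights would come from the probabilistic model, so a more likely expansion of an abbreviation contributes more to the final document score.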

12 Evaluation
Test data construction
Evaluation by test data
  Precision
User evaluation
Comparison with Google Scholar

13 Q&A

