A research literature search engine with abbreviation recognition


A research literature search engine with abbreviation recognition
Cheng-Tao Chu, Pei-Chin Wang

Outline
- Features
- Demo
- Issues involved
- Implementation
  - Tailored Edit Distance
  - Probabilistic Model: Translation Model
  - Score Combination
- Evaluation
- Q&A

Features
- Given a query containing author names, a proceedings name, or title keywords, return the relevant papers
- Retrieve the desired papers even when author or proceedings names are abbreviated
- Web interface for querying and user evaluation

Demo
- It's show time

Issues involved
- Tag an arbitrary query into author, proceedings, and other-keyword fields
- Recognize the author
  - P. Raghavan -> Prabhakar Raghavan
  - P. Raghavan -> Padma Raghavan
  - P. Raghavan -> … Raghavan
- Estimate the probability of each possible candidate

Issues involved (cont.)
- Recognize the proceedings name
  - More than a look-up table
  - IJCAI -> International Joint Conference on AI
  - IJCAI -> IJCAI Workshop
- How to combine the weights of the candidates
  - Score from Lucene
  - Score for a possible author
  - Score for a possible proceedings

Implementation
(Architecture diagram; components shown: DBLP XML, Parser, Tagger, Database, Query, Browser, Search Engine, Retrieved Documents, Probabilistic Model, Tailored Edit Distance)

Tailored Edit Distance
- Heuristic
  - Award for consecutive matching
  - Award for matching a capitalized character
  - More penalty on substitution, less on insertion/deletion
- Probabilistic representation
  - Transform the edit distance cost into a probability
  - Normalize the cost
  - Use training data to estimate the distribution
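The heuristics on this slide can be sketched as a weighted edit distance. This is a minimal illustration, not the authors' implementation: the function names, the numeric costs and awards, and the exponential normalization are all assumptions, and the slide's last step (estimating the distribution from training data) is replaced here by a simple softmax over negated costs.

```python
import math


def tailored_edit_distance(abbr, full, sub=2.0, indel=1.0,
                           cap_award=0.5, run_award=0.25):
    """Weighted edit distance between an abbreviation and a full string.

    All four parameters are hypothetical values chosen for illustration:
    substitution costs more than insertion/deletion, and matches earn an
    award that grows with the length of the current run of matches, plus
    an extra award when the matched character is capitalized.
    """
    n, m = len(abbr), len(full)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    run = [[0] * (m + 1) for _ in range(n + 1)]  # match-run length on chosen path
    for i in range(1, n + 1):
        cost[i][0] = i * indel
    for j in range(1, m + 1):
        cost[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = abbr[i - 1], full[j - 1]
            if a == b:
                award = run_award * run[i - 1][j - 1]
                if a.isupper():
                    award += cap_award
                diag = cost[i - 1][j - 1] - award
                diag_run = run[i - 1][j - 1] + 1
            else:
                diag = cost[i - 1][j - 1] + sub
                diag_run = 0
            up = cost[i - 1][j] + indel    # drop a character of abbr
            left = cost[i][j - 1] + indel  # skip a character of full
            best = min(diag, up, left)
            cost[i][j] = best
            run[i][j] = diag_run if best == diag else 0
    return cost[n][m]


def costs_to_probabilities(costs):
    """Turn candidate costs into a distribution (assumed softmax stand-in)."""
    weights = [math.exp(-c) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]
```

With these assumed weights, an abbreviation that is a prefix of a candidate costs less than one that needs a substitution, so the prefix candidate receives the larger probability after normalization.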

Conceptual Histogram

Probabilistic Model: Translation Model
- Use tailored edit distance to estimate the distribution
- Return a distribution of candidate names (assuming the full name and its abbreviation are conditionally independent given the evidence)
- Network structure
  - Full Name
  - First Name, Middle Name, Last Name
  - First Ini., Mid. Ini., Last Ini.
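One way to read the network structure is that each part of the name (first, middle, last) is scored separately and the per-part scores are multiplied, then normalized over the candidates. The sketch below follows that reading only. The names `part_likelihood` and `rank_candidates` and the constants inside them are hypothetical, and the per-part score is a crude stand-in for the edit-distance-based estimate the slide refers to.

```python
def part_likelihood(observed, full):
    """Assumed likelihood that `observed` (possibly an initial) stands for `full`."""
    obs = observed.rstrip(".").lower()
    ful = full.lower()
    if not obs:
        return 0.5   # part absent from the query: uninformative
    if obs == ful:
        return 1.0   # exact match
    if ful.startswith(obs):
        return 0.6   # initial or prefix match
    return 0.01      # mismatch, kept non-zero so the product never vanishes


def rank_candidates(query, candidates):
    """Return [(candidate, probability)] sorted by descending probability.

    `query` and every candidate are (first, middle, last) tuples. The
    per-part likelihoods are multiplied, which encodes the independence
    assumption, and then normalized over the candidate list.
    """
    scored = []
    for cand in candidates:
        p = 1.0
        for observed, full in zip(query, cand):
            p *= part_likelihood(observed, full)
        scored.append((cand, p))
    if not scored:
        return []
    total = sum(p for _, p in scored)
    ranked = [(cand, p / total) for cand, p in scored]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


ranked = rank_candidates(
    ("P.", "", "Raghavan"),
    [("Prabhakar", "", "Raghavan"),
     ("Padma", "", "Raghavan"),
     ("Peter", "", "Norvig")],
)
```

In this toy run the two Raghavan candidates tie and the unrelated name ends up last.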

Score Combination
- Lucene score formula
- Assign weights to each candidate as [formula not preserved]
- Combination score
  - Set idf(t) to (weight of that term + original idf(t))
  - Assign a boost value to each term in the query
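The idf adjustment can be illustrated with a simplified TF-IDF score written in plain Python. This is a sketch under assumptions, not Lucene code: it keeps only the tf, idf, and boost factors of the classic Lucene similarity (tf as the square root of the term frequency, idf as 1 + ln(numDocs / (docFreq + 1))) and leaves out the coordination, query-norm, and length-norm factors. The function names and the example numbers are hypothetical.

```python
import math


def classic_idf(num_docs, doc_freq):
    """idf in the style of Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def combined_score(doc_term_freqs, query_terms, num_docs, doc_freqs):
    """Simplified score: sum over query terms of tf * idf'^2 * boost.

    `query_terms` maps each term to (candidate_weight, boost). The
    adjusted idf' is candidate_weight + original idf, as on the slide.
    """
    score = 0.0
    for term, (weight, boost) in query_terms.items():
        tf = math.sqrt(doc_term_freqs.get(term, 0))
        idf = weight + classic_idf(num_docs, doc_freqs.get(term, 0))
        score += tf * idf ** 2 * boost
    return score


# Hypothetical corpus statistics and candidate weights.
doc_freqs = {"raghavan": 9, "prabhakar": 4, "padma": 4}
query_terms = {
    "raghavan": (0.0, 1.0),
    "prabhakar": (0.7, 1.0),  # more probable expansion of "P."
    "padma": (0.3, 1.0),      # less probable expansion of "P."
}
score_a = combined_score({"prabhakar": 1, "raghavan": 1}, query_terms, 1000, doc_freqs)
score_b = combined_score({"padma": 1, "raghavan": 1}, query_terms, 1000, doc_freqs)
```

Because both first names have the same document frequency in this toy setup, the paper by the more probable candidate scores higher purely because of its larger candidate weight.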

Evaluation
- Test data construction
- Evaluation by test data: precision
- User evaluation
- Comparison with Google Scholar
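The slide does not say how precision was computed. As one common reading, the sketch below averages precision over the top k results of each test query. The function name, the cutoff `k`, and the data layout are assumptions for illustration.

```python
def precision_at_k(results_by_query, relevant_by_query, k=10):
    """Mean precision over the top-k results of each test query."""
    scores = []
    for query, results in results_by_query.items():
        top = results[:k]
        if not top:
            scores.append(0.0)  # a query with no results contributes zero
            continue
        relevant = relevant_by_query.get(query, set())
        hits = sum(1 for doc in top if doc in relevant)
        scores.append(hits / len(top))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical test data: ranked result lists and the papers judged relevant.
results = {"q1": ["a", "b"], "q2": ["c", "d"]}
relevant = {"q1": {"a"}, "q2": {"c", "d"}}
mean_precision = precision_at_k(results, relevant, k=2)
```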

Q&A