Applying Key Phrase Extraction to aid Invalidity Search

Manisha Verma, Vasudeva Varma
SIEL, LTRC, IIIT Hyderabad

Outline
Introduction
Related Work
Motivation and Contribution
Approaches
Experiments and Results
Future Work
Questions

Introduction

Invalidity Search
The task is to uncover patents or other published prior art that may render a granted patent invalid. In other words, the goal is to find prior art that the patent examiner overlooked, so that the patent can be declared invalid.

Input and Process
Input: a patent application.
Process: use existing search engines to find similar work. Queries are created manually, and the searcher goes through many documents (articles, granted patents, etc.) to find similar ones.

Related Work

There are two ways of approaching the problem:
1. Create a query from a patent and try different retrieval models.
2. Use different models to create a query from a patent, then use an existing retrieval model.
Our work employs the second approach.

Approach 1
Use the claim text or abstract to create a query from the patent. The following have been used to improve recall and precision: re-ranking using several features, cluster-based pseudo-relevance feedback, and scoring based on subtopics, among others.

Approach 2
Select words or phrases from different sections of a patent, and find out which section yields the best queries.
Select words from a patent using tf-idf.
Assign a weight to each word to mark its importance. The common weighting methods explored are tf and tf-idf.
Identify the optimal query length, i.e. the number of words to keep in a query generated from a patent. This value is determined empirically.
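The word selection and weighting steps of Approach 2 can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `tfidf_query`, the smoothed idf formula, and the toy corpus are all assumptions made for illustration, and `query_length` stands in for the empirically tuned number of query words.

```python
import math
from collections import Counter

def tfidf_query(patent_tokens, corpus_token_lists, query_length=2):
    """Return the query_length highest-weighted (term, weight) pairs."""
    n_docs = len(corpus_token_lists)
    # Document frequency: number of corpus documents containing each term.
    doc_freq = Counter()
    for tokens in corpus_token_lists:
        doc_freq.update(set(tokens))
    term_freq = Counter(patent_tokens)
    weights = {}
    for term, tf in term_freq.items():
        # Smoothed idf (an assumption) so unseen terms do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
        weights[term] = tf * idf
    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:query_length]

# Toy data, invented for illustration only.
corpus = [
    "a method for cooling an engine".split(),
    "a method for heating a room".split(),
    "an engine with a turbine".split(),
]
patent = "a turbine blade cooling method for a turbine engine".split()
query = tfidf_query(patent, corpus, query_length=2)
```

Each returned pair carries the term and its weight, so the same output can feed either an unweighted or a weighted query; varying `query_length` over a held-out set is one way the optimal length could be found empirically.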

Motivation and Contribution

Explore and evaluate different ways to select phrases to make queries for patents. Although several key phrase extraction approaches have been proposed in the literature, they have not been used to create queries for the invalidity search task. We evaluate and analyze the performance of queries created using state-of-the-art unsupervised and supervised key phrase extraction techniques.

Approaches

Key Phrase Extraction Techniques
Unsupervised: TextRank (R. Mihalcea et al.), SingleRank (X. Wan et al.), tf-idf, tf.
Supervised: RankPhrase (X. Jiang et al.), KEA (I. H. Witten et al.).

Unsupervised Approaches
TextRank: represent the text as a graph using co-occurrence statistics, then run an iterative algorithm to find the dominant nodes (words) in the graph.
SingleRank: follows the same approach as TextRank. The difference is that TextRank selects phrases containing the top-ranked words, whereas SingleRank does not filter out any low-scoring words.
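The graph-based ranking described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published TextRank or SingleRank code: the graph is unweighted, the `window` and `damping` values are illustrative defaults, and `score_phrases` (a hypothetical helper) follows the SingleRank-style idea of scoring a candidate phrase by the sum of its word scores without discarding low-scoring words.

```python
from collections import defaultdict

def rank_words(tokens, window=2, damping=0.85, iterations=50):
    """Score words by PageRank over an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        # Link each word to the words that follow it inside the window.
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        new_scores = {}
        for word in neighbors:
            # Each neighbor shares its score equally among its own neighbors.
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            new_scores[word] = (1 - damping) + damping * incoming
        scores = new_scores
    return scores

def score_phrases(candidates, scores):
    """Rank candidate phrases by the sum of their word scores."""
    totals = {
        phrase: sum(scores.get(word, 0.0) for word in phrase.split())
        for phrase in candidates
    }
    return sorted(totals.items(), key=lambda kv: -kv[1])

# Toy text, invented for illustration only.
tokens = "fuel injection valve controls fuel flow fuel pressure sensor".split()
scores = rank_words(tokens)
ranked = score_phrases(["fuel injection valve", "pressure sensor"], scores)
```

A TextRank-style selection would instead keep only the top-scoring words and join those that are adjacent in the text into phrases.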

Supervised Approaches
KEA: use features to represent key phrases, and train a classifier on manually annotated data.
RankPhrase: treat key phrase extraction as a ranking problem. The same features as in KEA are used.
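The slide does not list the features. As an illustration, the original KEA system describes each candidate phrase by its TF×IDF value and by the relative position of its first occurrence in the document. The sketch below computes both for one phrase; the function name `phrase_features`, the smoothed idf, and the default returned for an absent phrase are assumptions, and the classifier itself is omitted.

```python
import math

def phrase_features(phrase, doc_tokens, doc_freq, n_docs):
    """Return (tf_idf, first_occurrence) for a candidate phrase in one document."""
    words = phrase.split()
    k = len(words)
    # Start positions where the phrase occurs as a contiguous token run.
    positions = [
        i for i in range(len(doc_tokens) - k + 1) if doc_tokens[i : i + k] == words
    ]
    if not positions:
        # Assumed default: absent phrases get zero weight and the latest position.
        return 0.0, 1.0
    tf = len(positions) / len(doc_tokens)
    # Smoothed idf (an assumption); doc_freq maps phrase -> number of documents.
    idf = math.log((1 + n_docs) / (1 + doc_freq.get(phrase, 0)))
    first_occurrence = positions[0] / len(doc_tokens)
    return tf * idf, first_occurrence

# Toy data, invented for illustration only.
doc = "fuel injection valve controls fuel flow".split()
tfidf, first = phrase_features("fuel flow", doc, {"fuel flow": 1}, n_docs=9)
```

These feature pairs, together with a key phrase / not key phrase label per candidate, would form the training rows for a classifier (KEA) or a ranker (RankPhrase).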

Training Supervised Approaches
To annotate patents with key phrases, take some applications with relevance judgments. For every phrase in the document:
1. Fire it as a query.
2. Calculate the MAP and recall of that phrase (using the relevance judgments).
Select the phrases with high MAP and recall, prune them based on tf-idf scores, and use the remaining phrases as the key phrases for the document. A sample of documents annotated this way is used to train the supervised approaches.
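The annotation procedure can be sketched as follows. This is a hedged illustration, not the authors' pipeline: `search` is a hypothetical callable that returns a ranked list of patent ids for a phrase, the thresholds `min_ap` and `min_recall` are invented, and the tf-idf pruning step is left out. For a single phrase query, the MAP figure reduces to average precision, which is what the sketch computes.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0

def recall(ranked_ids, relevant_ids):
    """Fraction of the relevant ids that appear in the ranked list."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids) & set(relevant_ids)) / len(relevant_ids)

def label_phrases(candidates, search, relevant_ids, min_ap=0.5, min_recall=0.5):
    """Mark a phrase as a key phrase when its query retrieves well."""
    labels = {}
    for phrase in candidates:
        ranked = search(phrase)  # fire the phrase as a query
        labels[phrase] = (
            average_precision(ranked, relevant_ids) >= min_ap
            and recall(ranked, relevant_ids) >= min_recall
        )
    return labels

# Toy stand-in for a search engine, invented for illustration only.
fake_index = {
    "fuel injection": ["p1", "p7", "p2"],
    "said device": ["p9", "p8", "p1"],
}
relevant = {"p1", "p2"}
labels = label_phrases(fake_index, lambda q: fake_index[q], relevant)
```

In this toy run, "fuel injection" retrieves both relevant patents near the top and is labeled a key phrase, while "said device" is not.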

Experiments And Results

Our Data
1.3 million patents (NTCIR).
1000 patent applications. For each application, a list of patents which claim the same invention is provided.

Unsupervised vs Supervised

Performance on different sections

Results
The experiments indicate that key phrase extraction techniques do improve invalidity search results. Queries created using the unsupervised and supervised approaches perform better than those formed by tf or tf-idf. Among the supervised approaches, queries created from phrases extracted by KEA show 29% and 37% improvement in MAP over TextRank and tf-idf respectively.

Future Work
Weigh queries generated using both approaches.
Try the approaches on different patent collections.
Explore combinations of the two approaches for query construction.

References
X. Xue and W. B. Croft. Automatic query generation for patent search. In CIKM '09: Proceeding of the 18th ACM conference on Information and knowledge management, pages 2037–2040, NY, USA, 2009. ACM.
R. Mihalcea and P. Tarau. TextRank: Bringing order into texts. In Proc. of EMNLP, 2004.
X. Xue and W. B. Croft. Transforming patents into prior-art queries. In SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 808–809, NY, USA, 2009. ACM.
X. Jiang, Y. Hu, and H. Li. A ranking approach to key phrase extraction. In SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 756–757, NY, USA, 2009. ACM.

Questions?