
1 How to Make Manual Conjunctive Normal Form Queries Work in Patent Search Le Zhao and Jamie Callan Language Technologies Institute School of Computer Science Carnegie Mellon University

2 Technology Survey Task @ Chem
Document Collection
– 1.3 million patents + 0.18 million scientific articles
– Tend to be long, have XML field structure
Topics
– 6 topics (last year only 2 groups submitted runs, not reusable)
– About use/detection of chemicals (in certain applications)
– Similar to ad hoc retrieval queries

3 Example Topic: TS-20
tests for HCG hormone
The hormone Human Chorionic Gonadotrophin (HCG) is produced when a woman becomes pregnant. Tests are usually carried out by analysing blood or urine. We are looking for articles and patents on these pregnancy test kits or the chemical tests used to produce them.
Human Chorionic Gonadotrophin OR HCG pregnancy Human Chorionic Gonadotrophin OR HCG

4 Our Runs
Automatic Queries
– Unweighted bag-of-words baseline
– Weighting and combining words from different query fields
Manual Queries
– Interactive search using Boolean CNF queries
  (test OR check OR detection OR detect) AND (HCG OR “Human Chorionic Gonadotrophin” OR “Chorionic Gonadotropin” OR Choriogonadotropin OR Choriogonin)
– Effective, used by lawyers, librarians, medical, IR
Diagram labels: thesaurus & interaction; MeSH etc. thesauri; check top ranked results
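To make the CNF structure concrete, here is a minimal Python sketch of how such a query could be evaluated against plain text: a document matches when every AND-ed clause has at least one OR-ed term or phrase present. This is only an illustration, not the Lemur setup used for the runs. The helper names (`tokenize`, `contains_phrase`, `matches_cnf`) and the three sample documents are hypothetical, and no stemming is applied, so "tests" would not match "test".

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))

def matches_cnf(text, cnf):
    """A CNF query is a list of clauses (AND); each clause is a list of
    alternative terms or phrases (OR)."""
    tokens = tokenize(text)
    return all(
        any(contains_phrase(tokens, tokenize(term)) for term in clause)
        for clause in cnf
    )

# The example query from the slide, written as a list of OR-clauses.
QUERY = [
    ["test", "check", "detection", "detect"],
    ["HCG", "Human Chorionic Gonadotrophin", "Chorionic Gonadotropin",
     "Choriogonadotropin", "Choriogonin"],
]

# Hypothetical documents, invented for illustration only.
docs = {
    "d1": "A urine test kit for the detection of HCG in early pregnancy.",
    "d2": "Synthesis of chorionic gonadotropin analogues.",
    "d3": "A method to detect glucose in blood.",
}

matched = [doc_id for doc_id, text in docs.items() if matches_cnf(text, QUERY)]
print(matched)
```

Only the first document satisfies both clauses; the other two each satisfy one clause, which is why adding synonyms to a clause can recover documents that a single keyword would miss.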

5 Lemur CGI
Identify synonyms
0.5 hours per topic

6 Results at Large (xinfAP)
Figure credit: Mihai Lupu
– Not much difference on average
– Worst manual queries have reasonable AP
– Manual queries lower some high AP topics slightly

7 Observations
Weighting different query fields helped.
Boolean CNF query (manual interaction)
– Good
  Expressive
  Helps a lot for hard (low AP) queries
– Bad
  Takes time & care to create & interact
  Manual error in formulating those queries
Phrase or window restrictions improve top precision, but destroy lower-level recall/precision
– Difficult to identify from top rank, new tools needed
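The phrase/window trade-off noted above can be illustrated with a small Python sketch that compares three matching modes for the same two terms: exact phrase, unordered window, and plain conjunction. The function names, the window width of 5, and the sample documents are assumptions made for this illustration; they are not taken from the submitted runs.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def in_exact_phrase(tokens, terms):
    """True if the terms occur adjacent and in the given order."""
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))

def in_unordered_window(tokens, terms, width):
    """True if all terms occur, in any order, within `width` consecutive tokens."""
    wanted = set(terms)
    return any(wanted <= set(tokens[start:start + width])
               for start in range(len(tokens)))

def in_document(tokens, terms):
    """Plain conjunction: all terms occur anywhere in the document."""
    return set(terms) <= set(tokens)

terms = ["pregnancy", "test"]

# Hypothetical documents, invented for illustration only.
docs = {
    "d1": "a home pregnancy test kit",
    "d2": "a test used to confirm pregnancy",
    "d3": ("pregnancy is discussed in chapter one while "
           "the appendix lists each laboratory test"),
}

tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
by_phrase = [d for d in docs if in_exact_phrase(tokens[d], terms)]
by_window = [d for d in docs if in_unordered_window(tokens[d], terms, 5)]
by_conjunction = [d for d in docs if in_document(tokens[d], terms)]
print(by_phrase, by_window, by_conjunction)
```

The tighter the restriction, the fewer documents match: the phrase keeps only the document where the terms are adjacent, the window also admits a paraphrase, and the plain conjunction admits a document where the terms are unrelated. That is one way to read the slide's point about higher top precision at the cost of lower-level recall.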

8 Comparisons with Best Runs
Fraunhofer-SCAI
– Semantic search (similar to our CNF queries)
– IPC classification filtering
– Doc field based term weighting
Topics where our manual queries did better
– TS-22 detect => detection test predict check determine determination
– TS-29 minimum inhibitory concentration => …
– Expanded all terms, but not all resulted in

9 Thanks to track organizers
NSF grant IIS-1018317
Questions?

