
Structured Queries for Legal Search, TREC 2007 Legal Track. Yangbo Zhu, Le Zhao, Jamie Callan, Jaime Carbonell, Language Technologies Institute, School of Computer Science, Carnegie Mellon University.


1 Structured Queries for Legal Search. TREC 2007 Legal Track. Yangbo Zhu, Le Zhao, Jamie Callan, Jaime Carbonell. Language Technologies Institute, School of Computer Science, Carnegie Mellon University. 11/06/2007

2 Agenda: Introduction; Main task (ad hoc search); Routing task (relevance feedback)

3 What is legal search?
Goal: retrieve all documents responsive to a production request.
Production request: describes a set of documents that the plaintiff demands the defendant produce.
Recall-oriented: missing an important document carries high risk, and finding one carries high value.
Sample request text: "All documents discussing, referencing, or relating to company guidelines, strategies, or internal approval for placement of tobacco products in movies that are mentioned as G-rated."
[Figure: the Final Query drawn as a tree of AND, OR, and W/5 operators over the terms guide, strategy, approval, family, “G rated”, movie, film]

4 Data set: 7 million business records from tobacco companies and research institutes. Metadata: title, author, organizations, etc. OCR text: contains errors. 50 topics generated from four hypothetical complaints created by lawyers.

5 Main task: ad hoc search
Indri query formulation
Without Boolean constraint: #combine(ranking function)
With Boolean constraint: #filreq( #band(Boolean constraint) #combine(ranking function) )
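The two query templates above can be sketched as string builders. The function names ranking_only and with_boolean_filter are hypothetical; the sketch assumes the Boolean constraint and the ranking function have already been rendered as Indri query text.

```python
def ranking_only(ranking_function: str) -> str:
    """Query without a Boolean constraint: rank every document."""
    return f"#combine({ranking_function})"


def with_boolean_filter(boolean_constraint: str, ranking_function: str) -> str:
    """Query with a Boolean constraint.

    #filreq keeps only documents matching its first argument (the #band
    Boolean constraint) and scores them with its second argument (the
    #combine ranking function).
    """
    return f"#filreq( #band({boolean_constraint}) #combine({ranking_function}) )"


if __name__ == "__main__":
    print(ranking_only("movie film"))
    print(with_boolean_filter("#uw(movie film)", "movie film"))
```

Separating the filter from the ranker means the Boolean part decides *which* documents are returned, while the #combine part decides only their order.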

6 Boolean constraint: translate the Final Query into Indri operators
Original expression      Indri operator
x AND y                  #uw(x y)
x OR y                   #syn(x y)
x BUT NOT y              #filrej(y x)
Phrase: “x y”            #1(x y)
Proximity: (x W/k y)     #uw(k+2)(x y)
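The mapping in the table can be applied recursively to a parsed query. This is a sketch under assumptions: the tuple-based tree (with operator names "AND", "OR", "BUT_NOT", "PHRASE", "W") and the function to_indri are hypothetical, and #uw(k+2) is written in Indri's #uwN form, so W/5 becomes #uw7. The example tree is one plausible reading of the slide's Final Query diagram, not a confirmed transcription of it.

```python
from typing import Union

# Hypothetical query tree: a term is a str, an operator node is a tuple
# whose first element names the operator.
Node = Union[str, tuple]


def to_indri(node: Node) -> str:
    """Translate a Boolean/proximity tree into Indri syntax per the slide."""
    if isinstance(node, str):
        return node
    op, *args = node
    if op == "AND":        # x AND y     -> #uw(x y)
        return "#uw(" + " ".join(to_indri(a) for a in args) + ")"
    if op == "OR":         # x OR y      -> #syn(x y)
        return "#syn(" + " ".join(to_indri(a) for a in args) + ")"
    if op == "BUT_NOT":    # x BUT NOT y -> #filrej(y x)
        x, y = args
        return f"#filrej({to_indri(y)} {to_indri(x)})"
    if op == "PHRASE":     # "x y"       -> #1(x y)
        return "#1(" + " ".join(args) + ")"
    if op == "W":          # x W/k y     -> #uwN(x y) with N = k + 2
        k, x, y = args
        return f"#uw{k + 2}({to_indri(x)} {to_indri(y)})"
    raise ValueError(f"unknown operator: {op}")


if __name__ == "__main__":
    final_query = ("AND",
                   ("OR", "guide", "strategy", "approval"),
                   ("W", 5,
                    ("OR", "family", ("PHRASE", "G", "rated")),
                    ("OR", "movie", "film")))
    # -> #uw(#syn(guide strategy approval) #uw7(#syn(family #1(G rated)) #syn(movie film)))
    print(to_indri(final_query))
```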

7 Ranking functions
Bag of words: (guide strategy approval family G rated movie film)
Respect phrase operators: (guide strategy approval family #1(G rated) movie film)
Group synonyms together: (#syn(guide strategy approval) #syn(family #1(G rated)) #syn(movie film))
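The three ranking-function variants differ only in how the same concept groups are rendered. A minimal sketch, assuming the Final Query's terms are available as hypothetical concept groups in which a multi-word string denotes a phrase; each returned string would be the argument of #combine(...).

```python
from typing import List


def render_term(term: str, respect_phrases: bool) -> str:
    """Render one term; multi-word terms become #1 phrases if requested."""
    words = term.split()
    if len(words) > 1 and respect_phrases:
        return "#1(" + " ".join(words) + ")"
    return " ".join(words)


def bag_of_words(groups: List[List[str]]) -> str:
    return " ".join(render_term(t, False) for g in groups for t in g)


def with_phrases(groups: List[List[str]]) -> str:
    return " ".join(render_term(t, True) for g in groups for t in g)


def with_synonym_groups(groups: List[List[str]]) -> str:
    # Each concept group is collapsed into one #syn so its members count as one term.
    return " ".join(
        "#syn(" + " ".join(render_term(t, True) for t in g) + ")" for g in groups
    )


if __name__ == "__main__":
    groups = [["guide", "strategy", "approval"], ["family", "G rated"], ["movie", "film"]]
    for build in (bag_of_words, with_phrases, with_synonym_groups):
        print(f"#combine({build(groups)})")
```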

8 Experiments and findings: Boolean constraints improve both recall and precision. Structured queries outperform bag-of-words queries. * B is the number of documents matching the Final Query; its average value is 5000.

9 Per-topic performance: difference from the median of 29 manual runs [chart panels: est_RB, est_PB]
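Reading est_RB and est_PB as estimated recall and precision at depth B (B as defined on slide 8), the per-topic comparison can be sketched as below. The est_ prefix marks values the track estimates from sampled judgments; this sketch does not reproduce that estimator and simply computes exact recall and precision at B over a fully judged toy topic. All function names are hypothetical.

```python
from statistics import median
from typing import List, Set


def precision_at_b(ranking: List[str], relevant: Set[str], b: int) -> float:
    """Fraction of the top-b ranked documents that are relevant."""
    if b <= 0:
        return 0.0
    hits = sum(1 for doc in ranking[:b] if doc in relevant)
    return hits / b


def recall_at_b(ranking: List[str], relevant: Set[str], b: int) -> float:
    """Fraction of all relevant documents that appear in the top b."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranking[:b] if doc in relevant)
    return hits / len(relevant)


def diff_to_median(score: float, other_run_scores: List[float]) -> float:
    """Per-topic difference between one run's score and the median run."""
    return score - median(other_run_scores)


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4"]
    relevant = {"d1", "d3", "d9"}
    b = 2  # per topic, B = number of documents matching the Final Query
    print(recall_at_b(ranking, relevant, b), precision_at_b(ranking, relevant, b))
```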

10 Routing task of Legal Track 2007
Structured queries are known to be hard to construct by hand, but not with supervision.
Questions: Do weighted queries help? Do metadata and annotations help?
A definitive answer comes from supervised structured query construction.

11 Structured query: #weight( w1 t1 w2 t2 … wn tn )
Example learned weights (weight, term.field):
0.00851 trademark.sentence       0.00846 trademark
0.00665 gmp.product              0.00653 basement.product
0.00625 steenland                0.00606 steenland.sentence
0.00602 gouda.sentence           0.00600 gouda
0.00587 steenland.organization   0.00561 toi
0.00550 toi.sentence             0.00544 lett.product
0.00486 chocol.ti                0.00479 legal.sentence
0.00474 children.per_desc        0.00467 legal.s
0.00459 legal                    0.00453 legal.organization
0.00435 kid.sentence             0.00433 kid
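A weight list like the one above maps directly onto an Indri #weight query, with fielded terms written in term.field form. A minimal sketch; the (weight, term, field) tuple layout and the name weight_query are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# (weight, term, field); field None means the term is not field-restricted.
WeightedTerm = Tuple[float, str, Optional[str]]


def weight_query(terms: List[WeightedTerm]) -> str:
    """Build an Indri #weight query; fielded terms use term.field syntax."""
    parts = []
    for weight, term, field in terms:
        token = f"{term}.{field}" if field else term
        parts.append(f"{weight} {token}")  # str(float) keeps e.g. 0.00851 as written
    return "#weight( " + " ".join(parts) + " )"


if __name__ == "__main__":
    print(weight_query([
        (0.00851, "trademark", "sentence"),
        (0.00846, "trademark", None),
        (0.00665, "gmp", "product"),
    ]))
```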

12 Supervised Structured Query Construction
Relevance feedback => supervised learning: train a linear SVM with keyword and keyword.field features.
SVM classifier: f(d) = Σ_i w_i f_i(d), where w_i are the learned term weights and the term features f_i are chosen to be tf-idf/LM scores.
Retrieval: #weight( w1 t1 w2 t2 … ), where f_i is the tf-idf/LM score of term t_i.
Advantage: given enough training data, we know for sure whether one type of feature helps.
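The slides do not say which SVM package was used. As a self-contained sketch, the hypothetical train_linear_svm below swaps in a Pegasos-style stochastic subgradient solver for a linear SVM (hinge loss plus L2 regularization, no bias term), visiting examples in a fixed order rather than sampling them so runs are reproducible. Features are sparse dicts keyed by keyword or keyword.field. The hypothetical top_query_terms keeps the largest positive weights, which could become the w_i t_i pairs of a #weight query; dropping negative weights is an assumption, not something the slides state.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]  # e.g. {"kid": 0.4, "kid.sentence": 0.2}


def train_linear_svm(X: List[Features], y: List[int],
                     lam: float = 0.01, epochs: int = 50) -> Features:
    """Linear SVM trained with Pegasos-style subgradient steps (labels +1/-1)."""
    w: Features = {}
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(w.get(f, 0.0) * v for f, v in x.items())
            # Regularization step: shrink every weight toward zero.
            for f in w:
                w[f] *= 1.0 - eta * lam
            # Hinge-loss step: only examples inside the margin move the weights.
            if margin < 1.0:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * label * v
    return w


def top_query_terms(w: Features, k: int) -> List[Tuple[float, str]]:
    """Keep the k largest positive weights as (weight, feature) pairs."""
    positive = [(v, f) for f, v in w.items() if v > 0]
    return sorted(positive, reverse=True)[:k]


if __name__ == "__main__":
    X = [{"trademark.sentence": 1.0, "the": 1.0},
         {"trademark.sentence": 0.8, "the": 1.0},
         {"weather": 1.0, "the": 1.0},
         {"weather": 0.9, "the": 1.0}]
    y = [1, 1, -1, -1]
    print(top_query_terms(train_linear_svm(X, y), k=2))
```

Omitting the bias term costs nothing for retrieval: adding the same constant to every document's score leaves the ranking unchanged, which is also why a #weight query needs no bias.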

13 Example: Query 13
Request: "All documents to or from employees of a tobacco company or tobacco organization referring to the marketing, placement, or sale of chocolate candies in the form of cigarettes."
Final query: (cand! OR chocolate) w/10 cigarette!
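Applying the slide-6 mapping to Query 13 also requires handling the `!` suffix, read here as a trailing truncation wildcard. The sketch below is one assumed approach, not the authors' method: it expands each truncated stem against a hypothetical index vocabulary, wraps the matches in #syn, and turns w/10 into #uw12 (k + 2). If a stem matches nothing, it falls back to the bare stem; the function names are hypothetical too.

```python
from typing import Iterable, List


def expand_truncation(token: str, vocabulary: Iterable[str]) -> List[str]:
    """Expand a trailing-'!' token such as 'cand!' to matching vocabulary words."""
    if not token.endswith("!"):
        return [token]
    stem = token[:-1]
    matches = sorted(w for w in vocabulary if w.startswith(stem))
    return matches or [stem]  # fall back to the stem if nothing matches


def syn(words: List[str]) -> str:
    return words[0] if len(words) == 1 else "#syn(" + " ".join(words) + ")"


def proximity_query(left: List[str], right: List[str], k: int,
                    vocabulary: Iterable[str]) -> str:
    """(left OR ...) w/k (right OR ...) rendered as #uwN with N = k + 2."""
    vocab = list(vocabulary)
    lhs = [w for tok in left for w in expand_truncation(tok, vocab)]
    rhs = [w for tok in right for w in expand_truncation(tok, vocab)]
    return f"#uw{k + 2}({syn(lhs)} {syn(rhs)})"


if __name__ == "__main__":
    vocab = ["candy", "candies", "chocolate", "cigar", "cigarette", "cigarettes"]
    print(proximity_query(["cand!", "chocolate"], ["cigarette!"], 10, vocab))
```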

14 Annotations
Feature types in the feedback query: named entities (NE), e.g. bush.person; sentences, e.g. violate.sent; metadata, e.g. television.title.
The example feedback query shown is the weighted query from slide 11.
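Annotation-aware features of the kind listed above (bush.person, television.title) can be produced by pairing each token with the field it occurs in. A minimal sketch with assumptions: the document is a hypothetical dict from field name to tokens, with "body" holding plain text. Body tokens yield keyword features and all other fields yield keyword.field features. Values are raw counts for simplicity, whereas slide 12 says the features were tf-idf/LM scores.

```python
from collections import Counter
from typing import Dict, List


def extract_features(fields: Dict[str, List[str]]) -> Counter:
    """Map an annotated document to keyword and keyword.field counts."""
    feats: Counter = Counter()
    for field, tokens in fields.items():
        for tok in tokens:
            tok = tok.lower()
            if field == "body":
                feats[tok] += 1               # plain keyword feature
            else:
                feats[f"{tok}.{field}"] += 1  # annotation/metadata feature
    return feats


if __name__ == "__main__":
    doc = {
        "body": ["Bush", "said", "bush"],
        "person": ["Bush"],        # named-entity annotation
        "title": ["Television"],   # metadata field
    }
    print(extract_features(doc))
```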

15 Performance: on 39 topics of Legal 2006 (2/3 of judged documents for training, the remaining 1/3 for testing); on 10 topics of the Legal 2007 routing task.
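The 2/3 training, 1/3 testing protocol for the Legal 2006 topics can be sketched per topic as below. The slides do not say how documents were assigned, so the seeded random shuffle and the name split_judged are assumptions.

```python
import random
from typing import List, Tuple


def split_judged(judged: List[str], train_frac: float = 2 / 3,
                 seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle one topic's judged documents and split them into train/test."""
    docs = list(judged)
    random.Random(seed).shuffle(docs)  # seeded so the split is reproducible
    cut = round(len(docs) * train_frac)
    return docs[:cut], docs[cut:]


if __name__ == "__main__":
    train, test = split_judged([f"doc{i}" for i in range(9)])
    print(len(train), len(test))
```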

16 Routing conclusions
Supervised learning gives a principled way of constructing structured queries, covering both annotations and query term weights.
Answers from the supervised learning algorithm: weights help; annotations help less.

17 Thank you! Questions?

