
1 Named Entity Recognition in an Intranet Query Log
Richard Sutcliffe (1), Kieran White (1), Udo Kruschwitz (2)
(1) University of Limerick, Ireland
(2) University of Essex, UK

2 Outline
Introduction
The Log at Essex
Manual Log Analysis
Automatic SNE Recognition
Using SNEs to Improve Retrieval
Conclusions

3 Introduction
Web log analysis has become an active area (Jansen et al., 2000)
A search engine can be general or specific
Our study is of an intranet (specific) log
Work follows from Kruschwitz (2003) and Kruschwitz et al. (2009)
NEs are very important in QA
Aim here was to link web log analysis and QA via NEs

4 Introduction
QA – What color is the top stripe on the U.S. flag?
Web Logs – student union
Named Entities – LTB 3, Chaplaincy, SPSS

5 The Log at Essex
Log of the UKSearch engine
Period: 1st October 2006 to 30th September 2007
40,006 queries
Interaction sequence
– Iterative refinement of search terms
– Suggests terms to augment or replace the query
– 35,463 interaction sequences
A session comprises one or more interaction sequences
Indexes web pages in the essex.ac.uk domain
– and any files in that domain linked from an indexed web page

6 The Log at Essex – Cont.
35527 95091B81DF16D8CFA6E7991A5D737741 Tue May 01 12:57:14 BST 2007 000 outside options outside options outside options
35528 95091B81DF16D8CFA6E7991A5D737741 Tue May 01 12:57:36 BST 2007 100 outside options art history outside options outside options art history outside options art history
35529 95091B81DF16D8CFA6E7991A5D737741 Tue May 01 12:57:57 BST 2007 200 history art outside options outside options art history history art history of art
Appearance of raw log

7 The Log at Essex – Cont.
[Tue,May,1,12,57,14,BST,2007]
>>>
*T *Tue * outside options
*T *Tue *USA outside options art history
*T *Tue *USA history of art
<<<
Log shown as session with the first interaction sequence
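The relationship between records, interaction sequences and sessions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the simplified record layout `(session_id, step, query)`, the shortened session identifiers, and the rule that a step counter of 0 opens a new interaction sequence are all hypothetical and are not taken from the actual log format.

```python
from collections import defaultdict

# Hypothetical, simplified records: (session_id, step, query).
# The real log carries more fields; these values are illustrative only.
records = [
    ("95091B81", 0, "outside options"),
    ("95091B81", 1, "outside options art history"),
    ("95091B81", 2, "history of art"),
    ("A1B2C3D4", 0, "student union"),
]

def group_sessions(records):
    """Group queries by session, then by interaction sequence.

    Assumption: a step counter of 0 starts a new interaction sequence.
    """
    sessions = defaultdict(list)
    for session_id, step, query in records:
        if step == 0 or not sessions[session_id]:
            sessions[session_id].append([])  # open a new interaction sequence
        sessions[session_id][-1].append(query)
    return dict(sessions)

sessions = group_sessions(records)
```

Here `sessions["95091B81"]` holds a single interaction sequence of three successively refined queries.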

8 Manual Log Analysis
Subset of log
Fourteen days
– Seven during holidays
– Seven during term
Each group of seven days comprised one Monday, one Tuesday etc.
– 1,794 queries
– 632 during holidays
– 1,162 during term
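One way to draw such a sample is to pick one date per weekday from each period. The sketch below is not the authors' procedure: the term and holiday date ranges are hypothetical placeholders, and random selection is an assumption, since the slides do not say how the days were chosen.

```python
import random
from datetime import date, timedelta

def date_range(start, end):
    """All dates from start to end inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

def one_per_weekday(days, rng):
    """Pick one date for each weekday (Monday=0 ... Sunday=6) present in days."""
    by_weekday = {}
    for d in days:
        by_weekday.setdefault(d.weekday(), []).append(d)
    return [rng.choice(by_weekday[w]) for w in sorted(by_weekday)]

rng = random.Random(0)  # fixed seed so the sample is repeatable

# Hypothetical period boundaries, chosen only to fall inside the log period.
term = date_range(date(2006, 10, 9), date(2006, 12, 15))
holiday = date_range(date(2006, 12, 18), date(2007, 1, 12))

sample = one_per_weekday(term, rng) + one_per_weekday(holiday, rng)

# The reported counts are consistent: holiday plus term queries give the total.
total_queries = 632 + 1162
```

The resulting `sample` contains fourteen dates, seven per period, with each weekday represented once in each group.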

9 Manual Log Analysis – Cont.
Twenty mutually exclusive topics
Plus “Other”
Each query was assigned to one of these

10 Manual Log Analysis – Cont. Topics used in manual classification

11 Manual Log Analysis – Cont. Topics used in manual classification

12 Manual Log Analysis – Cont. Topic analysis of 14-day subset

13 Manual Log Analysis – Cont. Topic analysis of 14-day subset

14 Manual Log Analysis – Cont.
Top six categories:
– Academic or other use
– Computer use
– Administration of studies
– Person name
– Structure and regulations
– Calendar / timetable
These account for 62% of queries

15 Manual Log Analysis – Cont.
Four non-exclusive features
– Acronym lower case
– Initial capitals
– All capitals
– Typographic or spelling error
Between zero and four features are assigned to each query
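The features above were assigned by hand. As a rough illustration of how three of them could be approximated automatically, the sketch below checks each whitespace-separated token. The per-token definitions, the feature labels and the `KNOWN_ACRONYMS` set are assumptions made for this example; spelling errors are left out because detecting them would need a dictionary or a spell checker.

```python
# Hypothetical acronym list; a real one would have to be compiled for the site.
KNOWN_ACRONYMS = {"spss", "ltb"}

def surface_features(query, known_acronyms=KNOWN_ACRONYMS):
    """Return the set of surface features found in a query (non-exclusive)."""
    features = set()
    for token in query.split():
        if token.islower() and token in known_acronyms:
            features.add("acronym_lower_case")
        if len(token) > 1 and token.isupper():
            features.add("all_capitals")
        if token[0].isupper() and token[1:].islower():
            features.add("initial_capitals")
    return features

examples = {q: surface_features(q) for q in ["spss", "Chaplaincy", "LTB 3", "student union"]}
```

Because the checks are independent, a single query can receive several features at once, which matches the non-exclusive design described on the slide.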

16 Manual Log Analysis – Cont. Features used in manual classification

17 Manual Log Analysis – Cont. Typo / Spelling analysis of 14-day subset

18 Automatic SNE Recognition - Training
1,035 distinct instances of SNEs were manually identified in queries
Each manually classified as being one of 35 SNE types
Presented each SNE to bing.com restricted to essex.ac.uk
Selected all snippets in top ten documents
– SNE plus five tokens on each side
Presented each snippet to OpenNLP's MaxEnt-based name finder
– Identifying type of SNE in snippet
– Creating 35 name finder models
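The windowing step (SNE plus five tokens on each side) can be sketched as follows. This is an illustration rather than the authors' code: whitespace tokenisation and case-insensitive matching are assumptions, the example snippet is invented, and the output uses the `<START:type> ... <END>` markup that the OpenNLP name finder reads as training data.

```python
def annotate(snippet, sne, sne_type, width=5):
    """Return one training line per occurrence of sne in snippet.

    Each line keeps the SNE plus up to `width` tokens on each side and wraps
    the SNE in OpenNLP-style <START:type> ... <END> tags.
    """
    tokens = snippet.split()
    target = [t.lower() for t in sne.split()]
    n = len(target)
    lines = []
    for i in range(len(tokens) - n + 1):
        if [t.lower() for t in tokens[i:i + n]] == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + n:i + n + width]
            tagged = [f"<START:{sne_type}>"] + tokens[i:i + n] + ["<END>"]
            lines.append(" ".join(left + tagged + right))
    return lines

# Invented snippet, used only to show the output shape.
training_lines = annotate("You can use SPSS on campus", "spss", "software")
```

Here `training_lines` holds a single line with `SPSS` tagged as `software`. Lines produced this way for one SNE type could then be collected into a training file for that type's model.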

19 Automatic SNE Recognition - Training Examples of 35 SNE types

20 Automatic SNE Recognition - Training Examples of 35 SNE types

21 Automatic SNE Recognition - Training Examples of 35 SNE types

22 Automatic SNE Recognition - Training Examples of 35 SNE types

23 Automatic SNE Recognition - Evaluation
Selected 500 queries from log
Searched for these in the essex.ac.uk domain, using bing.com
Recorded first snippet in top document returned
– 280 snippets were found
Presented each snippet to the 35 OpenNLP models
– Identifying one or more of the relevant SNE types

24 Automatic SNE Recognition - Evaluation Results. P=C/(C+F). R=C/(C+M).

25 Automatic SNE Recognition - Evaluation Results. P=C/(C+F). R=C/(C+M).
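The two measures quoted on the results slides can be computed as below. The reading of C, F and M as counts of correct, false-positive and missed instances is an assumption based on the formulas, and the numbers in the usage lines are hypothetical, not results from the study.

```python
def precision(correct, false_positives):
    """P = C / (C + F); returns 0.0 when nothing was proposed."""
    proposed = correct + false_positives
    return correct / proposed if proposed else 0.0

def recall(correct, missed):
    """R = C / (C + M); returns 0.0 when there was nothing to find."""
    relevant = correct + missed
    return correct / relevant if relevant else 0.0

# Hypothetical counts for one SNE type.
p = precision(correct=8, false_positives=2)
r = recall(correct=8, missed=2)
```

A recogniser that rarely proposes a wrong label keeps F small, so P stays high even when M is large and R is low.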

26 Automatic SNE Recognition - Evaluation
A clearly defined SNE type with good training examples results in good performance
P was 1.0 for buildings, campuses, forms, online services, person names, regulations and policies, research groups, room names and software
P was 0.94 for departments / schools / units
Most interesting: departments / schools / units, online services and room names, where there were 15, 41 and 11 correct instances respectively

27 Automatic SNE Recognition - Evaluation
Generally the algorithm works very well
Training examples were limited and their numbers varied widely
Some NEs were well defined
– online services, departments / schools / units
Others were very poorly defined
– documentation, equipment
The algorithm is disinclined to give false positives
Thus P tends to be high

28 Using SNEs for QA
Person names should match variants of themselves plus anaphors
– Kruschwitz = Udo Kruschwitz = he
Person names could match a post name
– Kruschwitz = Director of Recruitment and Publicity
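A minimal sketch of the name matching proposed above, assuming that a shorter mention matches a longer one when its tokens form the tail of the longer name, and that post names are resolved through a lookup table. The function names and the table structure are hypothetical, and anaphors such as "he" are not handled because they need discourse context.

```python
# Post-to-holder table; the single entry comes from the slide's own example.
POSTS = {"director of recruitment and publicity": "Udo Kruschwitz"}

def resolve_post(mention, posts=POSTS):
    """Replace a post name by its holder's name if the post is known."""
    return posts.get(mention.lower(), mention)

def same_person(a, b):
    """True if the shorter mention's tokens are the tail of the longer one's."""
    short, long_ = sorted((a.lower().split(), b.lower().split()), key=len)
    return bool(short) and long_[-len(short):] == short

matches_variant = same_person("Kruschwitz", "Udo Kruschwitz")
matches_post = same_person(resolve_post("Director of Recruitment and Publicity"), "Kruschwitz")
```

Matching on the tail of the name treats a bare surname as a variant of the full name. Two people who share a surname would also match, so a real system would need further disambiguation.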

29 Using SNEs to Improve Retrieval
SNEs are linked
– Course code, course name, degree code, degree name
– Department, research centre, research group, person
– Room number, person, building, department
Thus a search for
– C700 should match B.Sc. Biochemistry
– a group could match its department
– a room number could return the name of the occupant, the building or the department
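The linking idea can be illustrated with a small symmetric lookup structure used to expand a query. This is a sketch only: the `SneLinks` class is hypothetical, the C700 link comes from the slide, and the room, person and department entries are invented placeholders.

```python
from collections import defaultdict

class SneLinks:
    """Undirected links between SNEs, used to expand a query term."""

    def __init__(self):
        self._links = defaultdict(set)

    def link(self, a, b):
        """Record that a and b refer to related entities."""
        self._links[a].add(b)
        self._links[b].add(a)

    def expand(self, sne):
        """Return the SNEs directly linked to sne, in sorted order."""
        return sorted(self._links.get(sne, set()))

links = SneLinks()
links.link("C700", "B.Sc. Biochemistry")           # degree code <-> degree name
links.link("Room X", "Person Y")                   # placeholder room and occupant
links.link("Room X", "Department Z")               # placeholder department

expanded = links.expand("C700")
```

A query for a degree code could then be run together with the linked degree name, and a query for a room could also return its occupant or department.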

30 Conclusions
Categorised queries in an intranet log
Thus identified important SNE types
Extracted instances of these using a search engine
Carried out initial training experiment with MaxEnt
Proposed methods of using SNEs for IR and QA
Hence used a web log to improve future search

