1 The BT Digital Library A case study in intelligent content management Paul Warren

1 The BT Digital Library: A case study in intelligent content management
Paul Warren, paul.w.warren@bt.com
http://www.sekt-project.com

2 Sections: semantics in content management; the starting point; limitations of conventional technology; enhancing the experience; the users’ view; using the technology

3 Semantics in content management: intelligent content management

4 The need for semantics
Content management systems need to:
- index by meaning, not just text
- combine information from heterogeneous sources
Users need information:
- identified by semantics, not just keywords
- precise and complete
- selected by their interests and their task context, defined semantically
- from heterogeneous sources, accessed uniformly

5 Higher precision, greater recall
Precision:
- Find me information about Washington the man, not the state or city
- Find me information about a company called X which operates in industry Y
Recall:
- Finding all relevant documents
- E.g. ask for information about ‘George W Bush’ and be given documents on ‘the President’

6 Interests and context
Need information about Jaguar?
- interested in cars, the natural world, South America …
- with a context defined by current activities
Not just about searching:
- interest & context to share information …
- … and to push information to user
- … plus many integrated applications

7 Too much relevant information
Documents with duplicate information.
Goal to:
- extract what is unique from each document
- help users prioritise their reading
Need to:
- aggregate from disparate sources
- remove duplication
- present meaningfully: classified, summarised

8 The starting point: the BT digital library before SEKT

9 The BT digital library
- Two major document databases
- 5 million articles: abstracts plus some full text
- Originally text-based with some attribute-based querying, e.g. author, date
- Information spaces defined by queries

10 An information space
- Query-defined alerts, emailed weekly as database updated
- Public info spaces: anyone can subscribe, forming communities
- Private info spaces: defined by user
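The idea of an information space as a stored query that produces periodic alerts can be sketched as below. This is only an illustration, not the BT library's implementation: the `InfoSpace` class, its field names, the substring matching, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InfoSpace:
    """Hypothetical model: an information space is a named, stored query."""
    name: str
    query: str                                    # text query that defines the space
    public: bool = False                          # public spaces can be subscribed to by anyone
    subscribers: set = field(default_factory=set)
    seen: set = field(default_factory=set)        # record ids already included in an alert

    def matches(self, record):
        # Assumption: naive case-insensitive substring match on the abstract.
        return self.query.lower() in record["abstract"].lower()

    def new_matches(self, records):
        """Return matching records that have not appeared in an earlier alert."""
        fresh = [r for r in records if r["id"] not in self.seen and self.matches(r)]
        self.seen.update(r["id"] for r in fresh)
        return fresh


records = [
    {"id": 1, "abstract": "Knowledge management in large firms"},
    {"id": 2, "abstract": "Alloy fatigue testing"},
    {"id": 3, "abstract": "Ontologies for knowledge management"},
]

space = InfoSpace("KM", "knowledge management", public=True)
space.subscribers.add("alice")

first_alert = space.new_matches(records)          # first weekly run
records.append({"id": 4, "abstract": "Knowledge Management and the semantic web"})
second_alert = space.new_matches(records)         # next run after a database update

print([r["id"] for r in first_alert], [r["id"] for r in second_alert])
```

Tracking the ids already sent keeps each alert limited to records added since the previous run, which matches the "emailed weekly as database updated" behaviour described on the slide.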

11 Personalisation
Personalised entry page shows user’s info spaces, journals of interest, recent reading and ‘jottings’ (bookmarks).

12 Limitations of conventional technology: why we need semantics

13 Queries
- Text string ‘knowledge management’: 4161 ABI + 5029 Inspec records
- Descriptor ‘knowledge management’: 3213 ABI + 2783 Inspec records
- So careful query formulation needed …
- … but average query length is 1.8 words
- Little use of ‘advanced’ functions …
- … 80% of queries use no query modifier

14 Poor relevancy of results
A simple keyword search tends to offer high recall and low precision.
Ambiguity in the query, e.g.:
- synonymy, where several terms could describe the same concept
- homonymy, where a word has many different meanings
[Diagram: overlapping sets of relevant documents and retrieved documents]
- |A|: relevant documents retrieved
- |B|: non-relevant documents retrieved
- |C|: non-relevant documents (not retrieved)
- |D|: relevant documents (not retrieved)
Recall = |A|/(|A|+|D|) (proportion of relevant documents retrieved)
Precision = |A|/(|A|+|B|) (proportion of retrieved documents that are relevant)
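A small worked example of the recall and precision formulas may help. It is a minimal sketch that assumes |D| counts relevant documents that were not retrieved, which is the reading the recall formula requires. The function name and the document ids are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) from sets of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    a = len(retrieved & relevant)   # |A|: relevant documents retrieved
    b = len(retrieved - relevant)   # |B|: non-relevant documents retrieved
    d = len(relevant - retrieved)   # |D|: relevant documents not retrieved (assumed reading)
    precision = a / (a + b) if (a + b) else 0.0
    recall = a / (a + d) if (a + d) else 0.0
    return precision, recall


# Hypothetical result set: 4 documents retrieved, 3 relevant overall, 2 in common.
retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc1", "doc2", "doc5"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.67
```

Here |A| = 2, |B| = 2 and |D| = 1, so precision is 2/4 and recall is 2/3.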

15 Presenting results
Searches:
- Only 17% results read after 1st page …
- … no more than 10 results checked
- Same query, same results, regardless of user’s preference & context
Document descriptors:
- Lots, many irrelevant to readership
- Where relevant, not fine-grained, e.g. knowledge management

16 Enhancing the experience: what semantics can offer a digital library

17 A new experience
Hybrid searching:
- concepts, instances, information spaces, and text
- search results meaningfully classified
Automatic annotation:
- identifying companies, people, …
- hyperlinked to a knowledgebase
Topics, finer grained than document descriptors:
- semi-automatically generated
- automatic document classification
An extended corpus:
- crawling the Web for related pages
- Web pages added to share knowledge

18 A better experience
Semantics to improve precision & recall:
- Washington the man, not city or state
- references to the President, not just George W Bush
Information spaces defined on semantic queries:
- not just text queries
Taking account of interests and context:
- semantically defined
Natural language results

19 The users’ view: what users want

20 Initial questionnaire & focus group
Users want:
- Improved searching and indexing: based on a user’s profile, integrated into working environment
- To stay in control: advise but not decide; frustrated by too many email alerts

21 Features: what the users think
Very important / important:
- summarising results of search with personal interests and preferences
- advanced attribute-based search
- looking beyond the library
- suggesting candidate topic areas
- highlighting & hyperlinking named entities
- natural language queries

22 After that …
Important / minor importance:
- retrieving similar articles
- re-using old queries
- agent searches
- access from a range of devices

23 Using the technology: applying semantics to the BT Digital Library

24 Search: knowledge management
[Screenshot] knowledge management as:
- info space
- topic
- term
With clustered results

25 A complex query
[Screenshot annotations: microsoft (2 companies, term); semantic web (info space, topic, term); sem web info space; Microsoft-authored; Microsoft as term]

26 Querying a concept
alloy: a term, but also a concept in ontology …
- … with properties
- … definition
- … sub-concepts

27 Document with markup
Identified: Bhargava, Waterbury, Connecticut, USA, IEE
Click for related documents, e.g. by Bhargava

28 Categorising results …

29 ... and more categories

30 In summary
Semantic technology:
- provides intelligence in content management
- enhances the user experience
- satisfies proven user needs

