Mining the web to improve semantic-based multimedia search and digital libraries


1 Mining the web to improve semantic-based multimedia search and digital libraries
http://gate.ac.uk/ http://nlp.shef.ac.uk/
Horacio Saggion, Kalina Bontcheva
University of Sheffield
21 November 2006
IST Event 2006, Web Mining and Semantic Web: Networking with industry and academia
[This work has been partially supported by the SEKT (http://sekt.semanticweb.org/), PrestoSpace (http://www.prestospace.org) and TAO (http://www.tao-project.eu/) projects]

2 2(9) Web mining and semantic annotation: why?
Semantic annotation produces an explicit representation of the knowledge in given content
–Knowledge is often implicit in the data sources
–…or hard to extract automatically with sufficient accuracy
Knowledge can frequently be mined from the web and merged with the original content to improve semantic search and reasoning capabilities

3 3(9) Web mining and semantic annotation: how?
GATE is a widely used open-source infrastructure for text mining (http://gate.ac.uk):
–Ten years old, with 1000s of users at 100s of sites
–Supports major document formats and languages
–Helps build semantic annotation components
–Integrate these with content and knowledge mined from the web
–Create, test, and deploy these into an end-to-end application (some examples next)

4 4(9) RichNews: Multimedia Annotation
The problem:
–Access to archive material in the BBC is provided by some form of semantic annotation and indexing
–Manual annotation is time-consuming (up to 10x real time) and expensive
Rich News (developed within the PrestoSpace project) aims to (partially) automate the annotation of news programs
–Developed on BBC TV and radio news
–Involving a human in the loop is possible if desired
Recordings of broadcasts go in one end
An index of semantic metadata describing each news story comes out the other
http://gate.ac.uk/sale/www05/web-assisted-annotation.pdf

5 5(9) Web mining in RichNews
Why web mining:
–Speech recognition (ASR) produces poor-quality transcripts with many mistakes
–Closed captions/subtitles are not always available
–The same news stories can also be found on the BBC and other web sites
The solution:
–Obtain key terms from the ASR transcripts
–Search the web for related stories from the same date
–Find the best matching stories
–Obtain semantic annotations from this richer text
–Merge with the semantic annotations on the transcript to obtain more precise knowledge, grounded in the video stream
http://gate.ac.uk/sale/www05/web-assisted-annotation.pdf
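The first three steps of the solution (key terms, candidate stories, best match) can be illustrated with a minimal sketch. It is not the RichNews implementation: the stop-word list, the raw term-frequency ranking, the cosine similarity score, and the function names `key_terms` and `best_match` are all assumptions made for illustration, and the candidate stories are hard-coded rather than retrieved by a web search.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not taken from the slides).
STOPWORDS = {"the", "and", "about", "for", "with", "from", "that", "this"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]


def key_terms(transcript, k=5):
    """Return the k most frequent terms of a (noisy) ASR transcript."""
    counts = Counter(tokenize(transcript))
    return [term for term, _ in counts.most_common(k)]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(transcript, candidates):
    """Return (score, title) of the candidate story closest to the transcript.

    `candidates` maps a story title to its text; in a real system these
    would come from a web search restricted to the broadcast date.
    """
    transcript_vec = Counter(tokenize(transcript))
    scored = [
        (cosine(transcript_vec, Counter(tokenize(text))), title)
        for title, text in candidates.items()
    ]
    return max(scored)


if __name__ == "__main__":
    # Hypothetical ASR output with recognition errors ("minster", "parliment").
    asr = "the prime minster spoke about the budget budget today in parliment"
    stories = {
        "Budget debate": "The Prime Minister spoke about the budget in Parliament today",
        "Football": "The football match ended in a draw",
    }
    print(key_terms(asr, 3))
    print(best_match(asr, stories))
```

Even with misrecognised words in the transcript, the correctly recognised terms are enough to pick the related web story, whose cleaner text can then be annotated and merged back with the transcript annotations.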

6 6(9) RichNews Example

7 7(9) TAO – Augmenting Software Artefacts with Semantics
TAO project – http://www.tao-project.eu
Transitioning Applications to Ontologies
Case study on augmenting software artefacts with semantics
Learning ontologies from multiple software artefacts
Knowledge about a software project is often spread across different sources on the web:
–Source code, discussion messages, bug descriptions, documentation

8 8(9) New Challenges
Moving towards mining and semantically annotating Web 2.0
–Opinion mining from blogs and discussion forums
–Mining wikis
–Social network analysis
Mining multimedia content
Initial experiments are under way in ongoing projects, but further work is needed on this emerging social-oriented web

9 9(9) Thank you!
These slides: http://gate.ac.uk/sale/talks/ist06/ist-event06.ppt
Further details:
–RichNews: http://gate.ac.uk/sale/www05/web-assisted-annotation.pdf
–SEKT: http://gate.ac.uk/sale/iswc06/iswc06.pdf
–TAO: http://www.tao-project.eu

