ArtEquAKT
Harith Alani, Sanghee Kim, Wendy Hall, Paul Lewis, David Millard, Nigel Shadbolt, Mark Weal.


Overview
- Union of three projects: Artiste, Equator and AKT
- Aims:
  - Use natural language technology (NLT) to automatically extract relevant information about the life and work of artists from online documents
  - Feed this information automatically into an ontology designed for this domain
  - Generate stories by extracting and structuring information from the knowledge base, in the form of biographical narratives produced in response to user requests

Objectives
- To find out how effective these technologies are when used together
- To explore the way in which the limitations of one process affect the others (e.g. how ambiguity during extraction might be reflected at the generation stage)
- To generate biographies that might not be as readable as those on the web but which:
  - contain information that is difficult to find manually
  - gather information from disparate sources

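[Figure: ArtEquAKT architecture diagram. Components: web pages, Information Extraction, Ontology, KB, KBDB, Linky, story template and Servlets, grouped into Information Extraction, Knowledge Management and Narrative Generation. Numbered steps: 1. Extraction, 2. Population, 3. Consolidation, 4. Indexing, 5. Interaction, 6. Instantiation, 7. Rendering.]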

Information Extraction
[Figure: the architecture diagram repeated, with the Information Extraction part highlighted]

Knowledge Extraction Procedure

Search and Filter Documents
- Query search engines ('Yahoo', 'Altavista') using the artist's name as the query
- Calculate the similarity of the retrieved documents to an example document
- Use normalised term frequency for the similarity computation
- Apply heuristics (e.g. sentence length) to filter out documents that consist mostly of tables and/or links
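The filtering step can be sketched as follows. This is a minimal illustration, not the ArtEquAKT code: it assumes the normalisation is cosine normalisation of raw term-frequency vectors, and the names `filter_documents`, `min_similarity` and `min_sentence_length`, the threshold values and the sample documents are all hypothetical. Querying the search engines is left out.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of lower-cased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two TF vectors; dividing by the vector lengths is
    the (assumed) normalisation step. Returns 0.0 for empty vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_sentence_length(text):
    """Average number of words per sentence; tables and link lists score low."""
    lengths = [len(re.findall(r"[A-Za-z]+", s)) for s in re.split(r"[.!?]+", text)]
    lengths = [n for n in lengths if n > 0]
    return sum(lengths) / len(lengths) if lengths else 0.0


def filter_documents(docs, example, min_similarity=0.2, min_sentence_length=8):
    """Keep documents that resemble the example and read as running prose.
    Both thresholds are hypothetical values chosen for illustration."""
    ex = tf_vector(example)
    return [
        d for d in docs
        if cosine(tf_vector(d), ex) >= min_similarity
        and mean_sentence_length(d) >= min_sentence_length
    ]


example = ("Rembrandt was a Dutch painter born in Leiden. "
           "He painted many portraits and self-portraits in Amsterdam.")
docs = [
    "Rembrandt van Rijn was born in Leiden and became the leading portrait painter in Amsterdam.",
    "Home | Links | Rembrandt | Prints | Contact",
    "Rembrandt | Leiden | 1606 | Amsterdam | 1669 | painter | portraits",
]
kept = filter_documents(docs, example)
```

Here only the first document is kept: the navigation bar fails both tests, while the table-like third document is similar enough to the example but is rejected by the sentence-length heuristic.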

Relation Extraction
- Natural language processing techniques are used to extract relations
- Extraction is guided by an ontology
- GATE (General Architecture for Text Engineering) and WordNet are used for entity recognition (e.g. person names, place names or dates)
- Term expansion using WordNet (synonyms, hypernyms and hyponyms), e.g. 'depict' maps to 'portray' (synonym) and 'represent' (hypernym)
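A rough sketch of how term expansion widens the set of words that signal a relation, using the 'depict' example from the slide. It assumes a hard-coded lexicon in place of real WordNet lookups (a library such as NLTK provides a WordNet interface that could supply these), and `LEXICON`, `expand` and `mentions_relation` are hypothetical names, not GATE or WordNet APIs.

```python
import re

# Hypothetical mini-lexicon standing in for WordNet; a real system would look
# these relations up in WordNet rather than hard-code them.
LEXICON = {
    "depict": {"synonym": {"portray"}, "hypernym": {"represent"}, "hyponym": set()},
}


def expand(term, kinds=("synonym", "hypernym", "hyponym")):
    """Return the term together with its related terms of the requested kinds."""
    entry = LEXICON.get(term, {})
    expanded = {term}
    for kind in kinds:
        expanded |= entry.get(kind, set())
    return expanded


def mentions_relation(sentence, relation_term):
    """True if any word starts with an expanded form of the relation term.
    Prefix matching is a crude stand-in for proper morphological analysis."""
    words = re.findall(r"[a-z]+", sentence.lower())
    forms = expand(relation_term)
    return any(word.startswith(form) for word in words for form in forms)


print(mentions_relation("This painting portrays a religious subject.", "depict"))  # True
print(mentions_relation("He moved to Amsterdam in 1631.", "depict"))               # False
```

Without expansion, the first sentence would be missed, since it never uses the word 'depict'.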

An Example
- Given the sentence:
  "Rembrandt Harmenszoon van Rijn was born on July 15, 1606, in Leiden, the Netherlands."
- The following facts are extracted:
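The extracted facts themselves are not preserved in this transcript. As an illustration only, a single hand-written regular expression can pull the name, date of birth and place of birth out of the example sentence. ArtEquAKT used GATE and ontology-guided rules instead; the pattern, the `extract_birth_facts` function and the output field names here are assumptions.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July", "August",
     "September", "October", "November", "December"], start=1)}

# Hypothetical surface pattern:
# "<Person> was born on <Month> <day>, <year>, in <Place>[, <Country>]."
BIRTH = re.compile(
    r"(?P<person>[A-Z][\w ]+?) was born on (?P<month>[A-Z][a-z]+) (?P<day>\d{1,2}), "
    r"(?P<year>\d{4}), in (?P<place>[A-Z][\w ]*?)(?:, (?P<country>[\w ]+?))?\."
)


def extract_birth_facts(sentence):
    """Return birth facts found in the sentence, or None if the pattern fails."""
    m = BIRTH.search(sentence)
    if m is None or m["month"] not in MONTHS:
        return None
    return {
        "person": m["person"],
        "date_of_birth": date(int(m["year"]), MONTHS[m["month"]], int(m["day"])),
        "place_of_birth": m["place"],
        "country": m["country"],  # None when no country follows the place
    }


facts = extract_birth_facts(
    "Rembrandt Harmenszoon van Rijn was born on July 15, 1606, in Leiden, the Netherlands.")
```

A single fixed pattern like this breaks on the slightest rephrasing, which is why the slides rely on entity recognition and WordNet-based term expansion rather than surface patterns alone.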

Future Information Extraction Work
- Incorporate a learning capability into relation extraction
- Widen the scope of the NLP tool to improve performance
- Extract information about paintings
- Extract links to painting images
- Investigate term expansion using WordNet further (e.g. take context into account when mapping synonyms or hypernyms)

Knowledge Management
[Figure: the architecture diagram repeated, with the Knowledge Management part highlighted]

- Ontology of artists based on the CIDOC CRM
- The ontology guides the extraction process
- Populating the ontology (feeding the KB)
- Knowledge consolidation
- An ontology server provides a set of inference queries

Artequakt Ontology

Populating the Ontology
Source: Potted_biography.html
"In 1631, when Rembrandt's work had become well known and his studio in Leiden was flourishing, he moved to Amsterdam. He became the leading portrait painter in Holland and received many commissions for portraits as well as for paintings of religious subjects. … It is estimated that he painted between 50 and 60 self-portraits."
[Figure: KB entries populated from this passage. Each records its source document (Potted_biography.html) and the sentence it came from; the values shown include Rembrandt, Leiden, Amsterdam and "between 50 and 60 self-portraits", and the verb "received" is annotated as third-person past.]
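Populating the KB amounts to storing each extracted fact together with its provenance (source document and sentence), which the later consolidation and verification steps need. A minimal in-memory sketch, assuming plain Python objects rather than the project's ontology tooling; the `Fact` and `KnowledgeBase` classes and the relation names `studio_in` and `moved_to` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str   # hypothetical relation names, not the ArtEquAKT ontology's
    value: str
    source: str     # document the fact was extracted from
    sentence: str   # sentence that supports it


class KnowledgeBase:
    """Toy in-memory KB indexing facts by (subject, relation), keeping provenance."""

    def __init__(self):
        self._facts = defaultdict(list)

    def add(self, fact):
        self._facts[(fact.subject, fact.relation)].append(fact)

    def values(self, subject, relation):
        return [f.value for f in self._facts.get((subject, relation), [])]

    def sources(self, subject, relation, value):
        return {f.source for f in self._facts.get((subject, relation), []) if f.value == value}


kb = KnowledgeBase()
sentence = ("In 1631, when Rembrandt's work had become well known and his studio in "
            "Leiden was flourishing, he moved to Amsterdam.")
kb.add(Fact("Rembrandt", "studio_in", "Leiden", "Potted_biography.html", sentence))
kb.add(Fact("Rembrandt", "moved_to", "Amsterdam", "Potted_biography.html", sentence))
```

Keeping every extraction, duplicates included, rather than overwriting values is what makes the later frequency counts and source-trust judgements possible.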

Knowledge Consolidation
After extracting information on Rembrandt from 10 web sites, the KB was populated with the following:
- Rembrandt instances: 26 Rembrandt, 37 Rembrandt Harmenszoon, 2 Van Rijn
- Date of birth: 15/7/1606, 1606, 1620, 1641
- Place of birth: Leiden, Leyden, Netherlands, Holland
We need to merge duplicates and verify inconsistencies before we can use this knowledge.

Duplication
- Same old problem!
- Our approach to consolidation:
  - Simple heuristics to consolidate most duplicates
    - Artist names are treated as unique: all Rembrandts are merged
    - Less specific information is merged into more detailed information: 1606 is merged into 15/7/1606
  - Term expansion using WordNet
    - Synonyms: Leiden and Leyden, The Netherlands and Holland
    - Holonyms (part of): Leiden is part of The Netherlands
  - Knowledge comparison
    - Rembrandt, Rembrandt Harmenszoon and Van Rijn share a date of birth and a place of birth
    - Difficult when there are multiple values; verification might help
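Two of these heuristics, merging less specific dates into more detailed ones and resolving place names through synonyms and holonyms, can be sketched on the Rembrandt data from the consolidation slide. The synonym and part-of tables stand in for WordNet and are hypothetical, as are all function names; folding a region into a place inside it is one interpretation of the holonym heuristic, and name merging is not shown.

```python
from collections import Counter

# Hypothetical tables standing in for WordNet synonym and holonym ("part of") lookups.
SYNONYMS = {"Leyden": "Leiden", "Holland": "Netherlands", "The Netherlands": "Netherlands"}
PART_OF = {"Leiden": "Netherlands", "Amsterdam": "Netherlands"}


def parse_date(text):
    """'15/7/1606' -> (1606, 7, 15); '1606' -> (1606, None, None)."""
    parts = [int(p) for p in text.split("/")]
    day, month, year = [None] * (3 - len(parts)) + parts
    return (year, month, day)


def refines(a, b):
    """True if date a is more specific than b and agrees with every field b knows."""
    return a != b and all(y is None or x == y for x, y in zip(a, b))


def consolidate_dates(values):
    """Fold each date into the most specific compatible date, summing counts.
    Ties between equally specific candidates are broken arbitrarily here;
    such cases would need the verification step."""
    counts = Counter(parse_date(v) for v in values)
    merged = Counter()
    for d, n in counts.items():
        candidates = [t for t in counts if refines(t, d)]
        target = max(candidates, key=lambda t: sum(x is not None for x in t), default=d)
        merged[target] += n
    return merged


def consolidate_places(values):
    """Map synonyms to one name, then fold a region into an extracted place inside it."""
    canonical = Counter(SYNONYMS.get(v, v) for v in values)
    merged = Counter()
    for place, n in canonical.items():
        inside = [p for p in canonical if PART_OF.get(p) == place]
        merged[inside[0] if inside else place] += n
    return merged


dates = consolidate_dates(["15/7/1606", "1606", "1620", "1641"])
places = consolidate_places(["Leiden", "Leyden", "Netherlands", "Holland"])
```

On the slide's values this yields 15/7/1606 with two supporting extractions plus the unresolved 1620 and 1641, and a single place of birth, Leiden, with four. The remaining conflicting dates are exactly what verification has to score.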

Verification
- Inconsistency
  - We don't aim for "the right answer", but for some sort of confidence value
  - Different sources may provide different information, e.g. Renoir's date of birth is given as:
    - 5 Feb 1841 in …
    - 25 Feb 1841 in …
  - Which one is more likely to be correct?
- Trust: certain sources can be trusted more than others, but how do we judge that?
- Frequency: certain facts might be extracted more often than others
- Extraction: some extraction rules are more reliable than others
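One way to turn the three cues on the slide into a confidence value is sketched below. The scoring scheme (summing source trust times rule reliability over all extractions, then normalising) is an assumption rather than the project's method, and the sources, rules and scores are invented for illustration.

```python
from collections import defaultdict


def confidence(extractions, source_trust, rule_reliability, default=0.5):
    """Score candidate values from (value, source, rule) extractions.

    Each extraction contributes trust(source) * reliability(rule), so values
    extracted more often (frequency) accumulate more weight. Scores are then
    normalised to sum to 1. Unknown sources or rules get `default`.
    """
    scores = defaultdict(float)
    for value, source, rule in extractions:
        scores[value] += source_trust.get(source, default) * rule_reliability.get(rule, default)
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()} if total else {}


# Hypothetical sources, rules and scores for the Renoir example.
extractions = [
    ("25 Feb 1841", "site_a", "born_on_date"),
    ("25 Feb 1841", "site_b", "born_on_date"),
    ("5 Feb 1841", "site_c", "date_in_brackets"),
]
trust = {"site_a": 0.9, "site_b": 0.6, "site_c": 0.8}
reliability = {"born_on_date": 0.9, "date_in_brackets": 0.5}
scores = confidence(extractions, trust, reliability)
```

With these made-up numbers the 25 Feb reading scores roughly 0.77 and the 5 Feb reading roughly 0.23; the open question on the slide, how to judge trust in the first place, is simply pushed into the `trust` table.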

Narrative Generation
[Figure: the architecture diagram repeated, with the Narrative Generation part highlighted]

Biography Templates
- Specified as XML FOHM structures in Auld Linky
- Leaves of the template may be:
  - queries into the DB for whole paragraphs
  - NLG using queries into the KB
- Context can be used to adjust the shape of the template according to user preferences
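The two kinds of template leaf can be illustrated with a small Python sketch: one leaf looks up a ready-made paragraph in the DB, and a fallback leaf builds a sentence from a single KB fact. ArtEquAKT itself expressed templates as FOHM structures in Auld Linky; the classes, the dictionaries and the `render_first` helper here are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the paragraph DB and the KB.
PARAGRAPH_DB = {
    ("Rembrandt", "style"): ["His early work was devoted to showing the lines, light and "
                             "shade, and color of the people he saw about him."],
}
KB = {("Rembrandt", "date_of_birth"): "July 15, 1606"}


@dataclass
class ParagraphLeaf:
    """Leaf that queries the DB for a whole paragraph on a topic."""
    topic: str

    def render(self, artist):
        found = PARAGRAPH_DB.get((artist, self.topic))
        return found[0] if found else None


@dataclass
class SentenceLeaf:
    """Leaf that generates a sentence from a single KB fact."""
    relation: str
    pattern: str  # e.g. "{artist} was born on {value}."

    def render(self, artist):
        value = KB.get((artist, self.relation))
        return self.pattern.format(artist=artist, value=value) if value else None


def render_first(leaves, artist):
    """Return the output of the first leaf that produces text, or None."""
    for leaf in leaves:
        text = leaf.render(artist)
        if text:
            return text
    return None


birth = [ParagraphLeaf("dob"), SentenceLeaf("date_of_birth", "{artist} was born on {value}.")]
print(render_first(birth, "Rembrandt"))
```

Since this toy DB holds no date-of-birth paragraph for Rembrandt, the birth section falls back to the generated sentence "Rembrandt was born on July 15, 1606."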

[Figure: example biography template. A sequence of sections: Birth, Family, Art, Death. Under Birth, two alternative leaves: "Search for: paragraph with DOB" (e.g. "The greatest artist of the Dutch school, Rembrandt Harmenszoon van Rijn, was born on July 15, …") and "Construct sentence with DOB" (e.g. "Rembrandt was born on July 15, …"). Under Art, a level-of-detail (LoD) node holding a paragraph about paintings ("In addition to portraits, Rembrandt attained fame for his landscapes, while as an etcher he ranks among the foremost of all time.") and a paragraph about style ("His early work was devoted to showing the lines, light and shade, and color of the people he saw about him."), both marked Low Expertise, and a paragraph about influences ("He was influenced by the work of Caravaggio and was fascinated by the work of many other Italian artists."), marked High Expertise.]

[Figure: the same template instantiated, with the selected nodes numbered in reading order; the paragraph about influences (High Expertise) is not selected.]
Resulting biography:
"The greatest artist of the Dutch school, Rembrandt Harmenszoon van Rijn, was born on July 15, … In addition to portraits, Rembrandt attained fame for his landscapes, while as an etcher he ranks among the foremost of all time. His early work was devoted to showing the lines, light and shade, and color of the people he saw about him. On October 4, 1669, Rembrandt died in Amsterdam."
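The context-dependent shaping in this example can be sketched as filtering leaves by the reader's expertise. This assumes two levels and that a high-expertise reader receives everything a low-expertise reader does plus the extra leaves; the slides do not say how LoD nodes are evaluated, and `TEMPLATE`, `LEVELS` and `instantiate` are hypothetical. Only the Art and Death sections are included, because the Birth paragraph is truncated in the transcript.

```python
LEVELS = {"low": 0, "high": 1}

# Hypothetical Python rendering of part of the template: each section is a
# sequence of (paragraph, expertise level) leaves.
TEMPLATE = [
    ("Art", [
        ("In addition to portraits, Rembrandt attained fame for his landscapes, while as "
         "an etcher he ranks among the foremost of all time.", "low"),
        ("His early work was devoted to showing the lines, light and shade, and color "
         "of the people he saw about him.", "low"),
        ("He was influenced by the work of Caravaggio and was fascinated by the work of "
         "many other Italian artists.", "high"),
    ]),
    ("Death", [("On October 4, 1669, Rembrandt died in Amsterdam.", "low")]),
]


def instantiate(template, expertise):
    """Walk the sections in sequence order, keeping leaves at or below the reader's level."""
    limit = LEVELS[expertise]
    return [text for _, leaves in template for text, level in leaves if LEVELS[level] <= limit]


print(" ".join(instantiate(TEMPLATE, "low")))
```

For a low-expertise reader this reproduces the Art and Death part of the biography above, without the paragraph about influences.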

Future Biography Generation Work
- Use co-reference techniques to smooth out the chosen paragraphs
- Develop a 'memory' of what has already been said (to catch paragraphs that contain multiple 'facts')
- Use conflicting factual data as a resource:
  - compare conflicting accounts
  - generate statistical sentences: "Most sources agree that…"
- Reference the source material so readers can evaluate it
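The proposed "Most sources agree that…" sentences could be produced by counting how often each conflicting value was extracted. A minimal sketch; the majority threshold, the wording of the other branches, the `agreement_sentence` name and the Renoir counts are all assumptions.

```python
from collections import Counter


def agreement_sentence(subject, predicate, values):
    """Summarise conflicting extracted values as an agreement sentence."""
    if not values:
        return None
    counts = Counter(values)
    top, n = counts.most_common(1)[0]
    total = len(values)
    if n == total:
        return f"All {total} sources agree that {subject} {predicate} {top}."
    if n > total / 2:
        others = ", ".join(v for v in counts if v != top)
        return (f"Most sources agree that {subject} {predicate} {top} "
                f"({n} of {total}); others give {others}.")
    return f"Sources disagree: {subject} {predicate} " + " or ".join(counts) + "."


# Hypothetical extraction counts for Renoir's date of birth.
print(agreement_sentence("Renoir", "was born on",
                         ["25 Feb 1841", "25 Feb 1841", "25 Feb 1841", "5 Feb 1841"]))
```

This prints "Most sources agree that Renoir was born on 25 Feb 1841 (3 of 4); others give 5 Feb 1841."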

Future Direction for ArtEquAKT
- Improve the individual processes
- Incorporate images:
  - use their context (descriptions etc.) to extract knowledge about them
  - deploy them in biographies to accompany the text
- Use inference:
  - generate new relations in the KB
  - use NLP to generate sentences describing them
- Apply the technology in a physical setting (e.g. on a PDA around a gallery space)