ArtEquAKT Harith Alani, Sanghee Kim, Wendy Hall, Paul Lewis, David Millard, Nigel Shadbolt, Mark Weal.


1 ArtEquAKT Harith Alani, Sanghee Kim, Wendy Hall, Paul Lewis, David Millard, Nigel Shadbolt, Mark Weal

2 Overview
- Union of three projects: Artiste, Equator, and AKT
- Aims:
  - Use NLT to automatically extract relevant information about the life and work of artists from online documents
  - Feed this information automatically to an ontology designed for this domain
  - Generate stories by extracting and structuring information from the knowledge base in the form of biographical narratives in response to user requests

3 Objectives
- To find out how effective these technologies are when used together
- To explore the way in which the limitations of one process affect the others (e.g. how ambiguity during extraction might be reflected at the generation stage)
- To generate biographies that might not be as readable as those on the web but which:
  - contain information that is difficult to find out manually
  - gather information from disparate sources

4 [Architecture diagram]
Components: Web (web pages), Information Extraction, Knowledge Management, Narrative Generation, Ontology, KB, DB, Linky, story template, Servlets
Steps: 1. Extraction, 2. Population, 3. Consolidation, 4. Indexing, 5. Interaction, 6. Instantiation, 7. Rendering

5 [Architecture diagram as on slide 4; section highlighted: Information Extraction]

6 Knowledge Extraction Procedure

7 Search and Filter Documents
- Query search engines (‘Yahoo’, ‘Altavista’) given artist name as a query
- Calculate the similarity of retrieved documents to an example document
- Use term frequency with normalisation for similarity computation
- Apply some heuristics (e.g. sentence length) to filter out documents which contain mostly tables and/or links
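
The similarity and filtering steps above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the slide only says "term frequency with normalisation", so the use of cosine similarity, the tokeniser, the function names, and both thresholds are assumptions.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> dict[str, float]:
    """Length-normalised term frequencies: count / total number of tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    return {term: n / total for term, n in Counter(tokens).items()}


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse term vectors (assumed measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def looks_like_prose(text: str, min_avg_words: float = 8.0) -> bool:
    """Heuristic: pages made of tables or links have very short 'sentences'."""
    sentences = [s for s in re.split(r"[.!?\n]+", text) if s.strip()]
    if not sentences:
        return False
    avg = sum(len(s.split()) for s in sentences) / len(sentences)
    return avg >= min_avg_words


def filter_documents(example: str, candidates: list[str],
                     threshold: float = 0.3) -> list[str]:
    """Keep prose-like candidates that are similar enough to the example."""
    ref = term_frequencies(example)
    return [
        doc for doc in candidates
        if looks_like_prose(doc)
        and cosine_similarity(ref, term_frequencies(doc)) >= threshold
    ]
```

In practice the threshold values would need tuning against real search results; the ones here are placeholders.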

8 Relation Extraction
- Natural language processing techniques to extract relations
- Guided by an ontology
- Use GATE (General Architecture for Text Engineering) and WordNet for entity recognition (e.g. person name, place name, or date)
- Term expansion using WordNet (synonym, hypernym, and hyponym, e.g. ‘depict’ maps to ‘portray’ (synonym) and ‘represent’ (hypernym))

9 An Example
- Given the sentence:
  Rembrandt Harmenszoon van Rijn was born on July 15, 1606, in Leiden, the Netherlands.
- The following facts are extracted:
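
The facts shown on the original slide are not preserved in this transcript. Purely to illustrate the kind of output such a step could produce, here is a hypothetical pattern-based sketch. It does not use GATE or WordNet, and the regular expression, the function name, and the relation names `date_of_birth` and `place_of_birth` are all assumptions.

```python
import re

# Hypothetical pattern for "<name> was born on <Month D, YYYY>, in <place>."
BORN_PATTERN = re.compile(
    r"^(?P<name>.+?) was born on "
    r"(?P<date>[A-Z][a-z]+ \d{1,2}, \d{4}), in (?P<place>[^.]+)\."
)


def extract_birth_facts(sentence: str) -> list[tuple[str, str, str]]:
    """Return (subject, relation, value) triples; relation names are illustrative."""
    match = BORN_PATTERN.search(sentence)
    if match is None:
        return []
    name = match.group("name")
    return [
        (name, "date_of_birth", match.group("date")),
        (name, "place_of_birth", match.group("place")),
    ]


facts = extract_birth_facts(
    "Rembrandt Harmenszoon van Rijn was born on July 15, 1606, "
    "in Leiden, the Netherlands."
)
```

A single hand-written pattern like this is brittle; the slides describe an ontology-guided approach with entity recognition, which would cover far more sentence forms.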

10

11 Future Information Extraction Work
- Incorporate a learning capability in extracting relations
- Need to widen the scope of the NLP tool to increase performance
  - Extract information about ‘painting’
  - Extract links to painting images
- Further investigation about term expansion using WordNet (e.g. consider contexts in mapping synonyms or hypernyms)

12 [Architecture diagram as on slide 4; section highlighted: Knowledge Management]

13
- Ontology of artists based on CIDOC CRM
- The ontology guides the extraction process
- Populating the Ontology (feeding the KB)
- Knowledge consolidation
- Ontology server providing a set of inference queries

14 Artequakt Ontology

15 Populating the Ontology
[The slide shows an XML fragment whose markup was lost in this transcript; the surviving values are listed below.]
Source document: Potted_biography.html
Paragraph: In 1631, when Rembrandt's work had become well known and his studio in Leiden was flourishing, he moved to Amsterdam. He became the leading portrait painter in Holland and received many commissions for portraits as well as for paintings of religious subjects. ... It is estimated that he painted between 50 and 60 self-portraits.
Extracted values: Rembrandt; leiden; amsterdam; between 50 and 60 self-portraits
Sentence (from Potted_biography.html): He became the leading portrait painter in Holland and received many commissions for portraits as well as for paintings of religious subjects
Sentence (from Potted_biography.html, truncated on the slide): He became the leading portrait painter in Holland and received
Sentence properties: third-person; past; 0
...

16 Knowledge Consolidation
- After extracting info on Rembrandt from 10 web sites, the KB was populated with the following:
  - Rembrandt instance: 26 Rembrandt, 37 Rembrandt Harmenszoon, 2 Van Rijn
  - Date of birth: 15/7/1606, 1606, 1620, 1641
  - Place of birth: Leiden, Leyden, Netherlands, Holland
- We need to merge duplications and verify inconsistencies before we can use this knowledge

17 Duplication
- Same old problem!
- Our approach for consolidation
  - Simple heuristics to consolidate most duplicates
    - Artist names are unique: all Rembrandts are merged
    - Merge less specific info into more detailed ones: 1606 is merged into 15/7/1606
  - Term expansion using WordNet
    - Synonyms: Leiden and Leyden, The Netherlands and Holland
    - Holonyms (part of): Leiden is part of The Netherlands
  - Knowledge Comparison
    - Rembrandt, Rembrandt Harmenszoon, and Van Rijn share a date of birth and a place of birth
    - Difficult with multiple info; verification might help
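
The merging heuristics on this slide can be sketched as below, using the date and place values from the Rembrandt example. This is a minimal sketch under stated assumptions: the two lookup tables are hand-written stand-ins for WordNet synonym and holonym queries, and all function names are hypothetical.

```python
def parse_date(text: str) -> tuple[int, ...]:
    """'15/7/1606' -> (1606, 7, 15); '1606' -> (1606,)."""
    return tuple(int(part) for part in reversed(text.split("/")))


def merge_dates(values: list[str]) -> list[str]:
    """Drop any date that is a less specific prefix of another date."""
    parsed = {v: parse_date(v) for v in values}
    kept = []
    for value, parts in parsed.items():
        subsumed = any(
            len(other) > len(parts) and other[: len(parts)] == parts
            for other in parsed.values()
        )
        if not subsumed:
            kept.append(value)
    return kept


# Hand-written stand-in for a WordNet synonym lookup (assumption).
SYNONYMS = {"leyden": "leiden", "holland": "netherlands",
            "the netherlands": "netherlands"}
# Hand-written stand-in for a WordNet holonym (part-of) lookup (assumption).
PART_OF = {"leiden": "netherlands"}


def canonical(place: str) -> str:
    """Map a place name to one lower-case canonical spelling."""
    key = place.strip().lower()
    return SYNONYMS.get(key, key)


def merge_places(values: list[str]) -> list[str]:
    """Collapse synonyms, then drop a region when a place inside it is present."""
    places: list[str] = []
    for value in values:
        name = canonical(value)
        if name not in places:
            places.append(name)
    containers = {PART_OF[p] for p in places if p in PART_OF}
    return [p for p in places if p not in containers]


dates = merge_dates(["15/7/1606", "1606", "1620", "1641"])
places = merge_places(["Leiden", "Leyden", "Netherlands", "Holland"])
```

After merging, 1620 and 1641 still remain alongside 15/7/1606. These are genuine conflicts rather than duplicates, which is the case the slide says verification might help with.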

18 Verification
- Inconsistency
  - We don’t aim for “the right answer”, but for some sort of a confidence value
  - Different sources may provide different info, e.g. Renoir’s dob is:
    - 5 Feb 1841 in www.pillipscollection.org/html/lbp.html
    - 25 Feb 1841 in www.abcgallery.com/R/renoir/renoirbio.html
    - which one is more likely to be correct?
- Trust: certain sources can be more trusted than others, but how do we judge that?
- Frequency: certain facts might be extracted more often than others
- Extraction: some extraction rules are more reliable than others
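
One way the three signals above (trust, frequency, extraction reliability) might be combined into a confidence value is a weighted vote. The slides do not give a formula, so the scoring rule below, the `Extraction` type, and the weight ranges are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    value: str           # the extracted fact, e.g. a date of birth
    source_trust: float  # assumed weight in [0, 1] for the source
    rule_weight: float   # assumed reliability in [0, 1] of the extraction rule


def confidence(extractions: list[Extraction]) -> dict[str, float]:
    """Share of the total weighted votes that each value receives.

    Frequency enters because every extraction of a value adds one vote;
    trust and rule reliability scale the size of that vote.
    """
    votes: dict[str, float] = defaultdict(float)
    for item in extractions:
        votes[item.value] += item.source_trust * item.rule_weight
    total = sum(votes.values())
    if total == 0:
        return {}
    return {value: score / total for value, score in votes.items()}


# Made-up observations: value "A" is seen twice, value "B" once.
scores = confidence([
    Extraction("A", source_trust=0.5, rule_weight=1.0),
    Extraction("A", source_trust=0.5, rule_weight=1.0),
    Extraction("B", source_trust=0.5, rule_weight=1.0),
])
```

The output is a ranking with scores rather than a single answer, in line with the slide's aim of a confidence value instead of "the right answer". How to assign the trust weights is left open here, as it is on the slide.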

19 [Architecture diagram as on slide 4; section highlighted: Narrative Generation]

20 Biography Templates
- Specified as XML FOHM structures in Auld Linky
- Leaves of the template may be:
  - Queries into the DB for whole paragraphs
  - NLG using queries into the KB
- Context can be used to adjust the shape of the template according to user preferences
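
The idea of a template whose leaves are either paragraph queries or generated sentences, filtered by context, can be sketched as follows. This is not FOHM or Auld Linky: the in-memory `PARAGRAPH_DB` and `KB` dictionaries, the `Leaf` type, and the assignment of expertise levels to leaves are assumptions; the two text snippets are taken from the Rembrandt example in these slides.

```python
from dataclasses import dataclass, field
from typing import Callable

# Toy stand-ins for the paragraph database and the knowledge base.
PARAGRAPH_DB = {
    "style": ("His early work was devoted to showing the lines, light and "
              "shade, and color of the people he saw about him."),
}
KB = {("Rembrandt", "date_of_birth"): "July 15, 1606"}


@dataclass
class Leaf:
    section: str
    render: Callable[[str], str]  # artist name -> text
    # Contexts in which the leaf is shown; this assignment is an assumption.
    contexts: set[str] = field(default_factory=lambda: {"low", "high"})


def paragraph_leaf(topic: str) -> Callable[[str], str]:
    """Leaf type 1: fetch a whole stored paragraph from the DB."""
    return lambda artist: PARAGRAPH_DB[topic]


def dob_sentence(artist: str) -> str:
    """Leaf type 2: construct a sentence from a fact in the KB."""
    return f"{artist} was born on {KB[(artist, 'date_of_birth')]}."


TEMPLATE = [
    Leaf("Birth", dob_sentence),
    Leaf("Art", paragraph_leaf("style"), contexts={"high"}),
]


def generate(artist: str, expertise: str) -> str:
    """Render, in order, the leaves whose context matches the reader."""
    return " ".join(
        leaf.render(artist) for leaf in TEMPLATE if expertise in leaf.contexts
    )


low = generate("Rembrandt", "low")
high = generate("Rembrandt", "high")
```

Here the context only hides or shows leaves. The slide's phrase "adjust the shape of the template" suggests the real structures can also reorder or swap alternatives, which this sketch does not attempt.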

21 [Biography template diagram]
Template sections: Birth, Family, Art, Death
Structure types: Sequence, LoD
Leaves shown:
- Search for: Paragraph with DOB: "The greatest artist of the Dutch school, Rembrandt Harmenszoon van Rijn, was born on July 15, 1606."
- Construct Sentence with DOB: "Rembrandt was born on July 15, 1606."
- Paragraph about paintings: "In addition to portraits, Rembrandt attained fame for his landscapes, while as an etcher he ranks among the foremost of all time."
- Paragraph about style: "His early work was devoted to showing the lines, light and shade, and color of the people he saw about him."
- Paragraph about influences: "He was influenced by the work of Caravaggio and was fascinated by the work of many other Italian artists."
Context labels: Low Expertise, Low Expertise, High Expertise

22 [Biography template diagram with generated output]
Template sections: Birth, Family, Art, Death
Structure types: Sequence, LoD
Leaves shown:
- Paragraph about paintings: "In addition to portraits, Rembrandt attained fame for his landscapes, while as an etcher he ranks among the foremost of all time."
- Paragraph about style: "His early work was devoted to showing the lines, light and shade, and color of the people he saw about him."
- Paragraph about influences: "He was influenced by the work of Caravaggio and was fascinated by the work of many other Italian artists."
Context labels: Low Expertise, Low Expertise, High Expertise
Generated biography: The greatest artist of the Dutch school, Rembrandt Harmenszoon van Rijn, was born on July 15, 1606. In addition to portraits, Rembrandt attained fame for his landscapes, while as an etcher he ranks among the foremost of all time. His early work was devoted to showing the lines, light and shade, and color of the people he saw about him. On October 4, 1669, Rembrandt died in Amsterdam

23 Future Biography Generation Work
- Use co-referencing techniques to smooth out chosen paragraphs
- Develop a ‘memory’ of what has been previously said (to catch paragraphs that include multiple ‘facts’)
- Use conflicting factual data as a resource:
  - compare conflicting accounts
  - generate statistical sentences “Most sources agree that…”
- Reference material so readers can evaluate the source

24 Future Direction for ArtEquAKT
- Improve the individual processes
- Incorporate images
  - Use their context (descriptions etc) to extract knowledge about them
  - Deploy them in biographies to accompany the text
- Use inference
  - generate new relations in the KB
  - use NLP to generate sentences to describe them
- Apply technology to a physical setting (e.g. on a PDA around a gallery space)

