Ontology-based Extraction of Information from the Internet (Jan Korst, Philips Research, 26 November 2004)

1 Ontology-based Extraction of Information from the Internet
Jan Korst, Philips Research
Joint work with Michael Verschoor, Nick de Jong, and Gijs Geleijnse

2 Overview
- Context
- Ontologies
- Searching for enumerations/tables in web pages
- Case study: searching for famous persons on the web
- Concluding remarks

3 Context: a recommender system
- Inputs: ontologies and metadata; preferences, personal history, and calendar; electronic program guide and cultural agenda
- Processing: matching and reasoning
- Output: recommendations for TV shows, museum exhibitions, theatre shows, etc.

4 Ontologies
An ontology is a “specification of a conceptualization” [Tom Gruber]. In other words: a formal description of the concepts in a certain domain and of their relationships.
Example: the music domain
- concepts: composers, songs, albums, performers, …
- relationships: …
Semantic web languages such as RDF(S) and OWL are useful for defining ontologies for given knowledge domains.

5 Ontologies
An ontology $O$ is defined by a 4-tuple $(C, I, P, T)$, where:
- $C$ is a set of classes $c$, e.g. composer, song, album, performer, …
- $I = \{ I(c) \mid c \in C \}$, with $I(c)$ the set of instances of class $c$;
- $P$ is a set of properties $p(c, c')$ for some $c, c' \in C$, e.g. is_composer_of(composer, song) and is_contained_in(song, album);
- $T = \{ T(p) \mid p \in P \}$, where for each $p = p(c, c') \in P$, $T(p) \subseteq \{ (s, p, o) \mid s \in I(c), o \in I(c') \}$ is the set of true statements (triples).
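The 4-tuple maps naturally onto a small data structure. The following Python sketch is illustrative only: the class name `Ontology`, its field names, and the toy composer/song data are assumptions, not part of the original work. It shows how the constraint on $T(p)$ (subjects from $I(c)$, objects from $I(c')$) can be enforced whenever a triple is added.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical sketch of the 4-tuple O = (C, I, P, T)."""
    classes: set[str]                                  # C
    instances: dict[str, set[str]]                     # I: class c -> I(c)
    properties: dict[str, tuple[str, str]]             # P: property name -> (c, c')
    statements: dict[str, set[tuple[str, str, str]]] = field(default_factory=dict)  # T: p -> T(p)

    def add_statement(self, s: str, p: str, o: str) -> None:
        """Add (s, p, o) to T(p) after checking s in I(c) and o in I(c')."""
        c, c_prime = self.properties[p]
        if s not in self.instances.get(c, set()) or o not in self.instances.get(c_prime, set()):
            raise ValueError(f"({s}, {p}, {o}) does not fit {p}({c}, {c_prime})")
        self.statements.setdefault(p, set()).add((s, p, o))


# Toy data, made up for illustration.
onto = Ontology(
    classes={"composer", "song"},
    instances={"composer": {"Bach"}, "song": {"Air on the G String"}},
    properties={"is_composer_of": ("composer", "song")},
)
onto.add_statement("Bach", "is_composer_of", "Air on the G String")
```

Swapping subject and object (a song "composing" a composer) raises a `ValueError`, which is the kind of type check an ontology makes possible when populating $T$.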

6 Problem statement
For a partially given ontology $O' = (C, I', P, T')$ of a given knowledge domain, with $I' \subseteq I$ and $T' \subseteq T$, extend $I'$ to $I''$ and $T'$ to $T''$ so that they approximate $I$ and $T$ as well as possible. In other words: how can we populate databases?
Research questions:
- Can this be automated?
- Can we do this by extracting information from the web?

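7 Quality of Approximation
For each class $c$, we define precision and recall as follows:
$$\mathrm{precision}(c) = \frac{|I''(c) \cap I(c)|}{|I''(c)|}, \qquad \mathrm{recall}(c) = \frac{|I''(c) \cap I(c)|}{|I(c)|}$$
For each property $p$, precision and recall are defined likewise, with $T''(p)$ and $T(p)$ in place of $I''(c)$ and $I(c)$.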
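As a quick sanity check of the two ratios, here is a minimal Python sketch; the function name `precision_recall` and both toy sets are made up for illustration.

```python
def precision_recall(found: set[str], reference: set[str]) -> tuple[float, float]:
    """precision = |found & reference| / |found|, recall = |found & reference| / |reference|."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


found = {"france", "germany", "brazil", "team"}                # plays the role of I''(c)
reference = {"france", "germany", "brazil", "italy", "spain"}  # toy stand-in for I(c)
print(precision_recall(found, reference))  # (0.75, 0.6)
```

Three of the four extracted terms are correct (precision 3/4), and three of the five reference instances are found (recall 3/5).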

8 Searching for enumerations on the web
Basic idea: words in an enumeration tend to belong to the same class. Given a small subset of instances of a given class, we want to extend this subset automatically: more-of-the-same.
Algorithm:
- select web pages in which a given sequence or subset of instances occurs, using Google;
- scan these pages for enumerations in which one or more of the given instances occur;
- extract the other terms in these enumerations.
A similar approach has been applied to a corpus of documents in molecular biology [Nenadić, Spasić & Ananiadou, 2002].
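A minimal sketch of the more-of-the-same step, assuming the web pages have already been retrieved as plain text. The enumeration detector (split on sentence punctuation, then on commas and ‘and’/‘or’) is a deliberately crude stand-in; the slides do not say how enumerations are recognised, and `enumerations` and `more_of_the_same` are hypothetical names.

```python
import re
from collections import Counter


def enumerations(text: str, min_items: int = 3) -> list[list[str]]:
    """Crude detector (assumption): within each sentence-like segment, split on
    commas and on the words 'and'/'or'; keep segments with >= min_items short items."""
    result = []
    for segment in re.split(r"[.;:!?\n]", text):
        items = [i.strip().lower() for i in re.split(r",|\band\b|\bor\b", segment)]
        items = [i for i in items if i]
        if len(items) >= min_items and all(len(i.split()) <= 3 for i in items):
            result.append(items)
    return result


def more_of_the_same(pages: list[str], seeds: set[str]) -> Counter:
    """Count the non-seed terms in enumerations that contain at least one seed."""
    seeds = {s.lower() for s in seeds}
    counts: Counter = Counter()
    for page in pages:
        for enum in enumerations(page):
            if seeds & set(enum):
                counts.update(i for i in enum if i not in seeds)
    return counts


pages = [
    "Composers: Bach, Vivaldi, Mozart and Haydn.",
    "Baroque: Bach, Handel, Telemann.",
    "Shopping list: bread, milk, eggs and butter.",
]
print(more_of_the_same(pages, {"bach", "vivaldi", "mozart"}).most_common())
```

The shopping list is an enumeration too, but it contains no seed, so its terms are not counted. Aggregated over many pages, such counts give ranked term lists with the kind of noise visible in the examples that follow.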

9 General structure of the algorithm
1. Preselection of relevant web pages
2. Extraction of instances/statements
3. Filter to remove false positives

10 Examples "bach vivaldi mozart" > [63] bach[154], mozart[46], vivaldi[45], haydn[17], beethoven[14], ensembles[9], handel[9], chopin[7], haendel[5], schubert[5], bizet[4], j[4], albinoni[3], brahms[3], s[3], sanz[3], tartini[3], 2[2], chaconne[2], corelligeminiani[2], gershwin[2], gluck[2], http[2], inteacutegrale[2], minor[2], paganini[2], ravel[2], strauss[2], stravinsky[2], tchaikovsky[2], teleman[2], telemann[2], albeniz[1], bellini[1], benda[1], berlioz[1], bloch[1], boccherini[1], boellman[1], boieldieu[1], bruch[1], caccini[1], caldera[1], corelli[1], diabelli[1], dowland[1], giuliani[1], grieg[1], homekcrrcom[1], jsbach[1], martin[1], milano[1], ortiz[1], pergolesi[1], prokofiev[1], purcell[1], rimskykorsakov[1], schumann[1], smetana[1], title[1], torelli[1], vieuxtemps[1]

11 Examples (2) "france germany england italy" > [54] france[322], germany[259], brazil[257], italy[239], argentina[223], england[218], spain[215], holland[212], yugoslavia[140], croatia[133], denmark[129], norway[122], chile[91], belgium[88], nigeria[83], romania[83], mexico[66], bulgaria[59], colombia[54], scotland[34], austria[33], cameroon[30], team[25], usa[22], sth[18], states[16], morocco[13], ar[12], netherlands[12], saudi[11], africa[10], bahamas[10], paraguay[10], czech[8], jamaica[8], scandinavia[8], canada[7], japan[7], acquitane[4], australia[4], bali[4], caribbean[4], china[4], czechoslovakia[4], luxembourg[4], poland[4], us[4], flanders[2], acadeacutemiques[1], asn[1], cortona[1], europe[1], korea[1], park[1]

12 Examples (3) poincare hilbert brouwer > [90] brouwer[20], hilbert[20], abel[18], deligne[18], gregory[18], mandelbrot[18], taylor[18], turing[18], cavalieri[17], poisson[17], banach[16], kolmogorov[16], wiener[16], goldbach[15], grassmann[15], cohen[13], hausdorff[13], jacobi[13], kronecker[13], torricelli[13], vinogradov[13], riemann[12], dedekind[11], frege[11], artin[10], babbage[10], barrow[10], boole[10], bourgain[10], eukleidõs[10], euler[10], fraenkel[10], heaviside[10], legendre[10], möbius[10], shannon[10], tchebychev[10], borel[9], fibonacci[9], fisher[9], grothendieck[9], aryabhata[8], birkhoff[8], bolyai[8], cayley[8], church[8], descartes[8], hypatie[8], markov[8], minkowski[8], bolzano[7], cramer[7], dee[7], painlevÕ[7], cantor[6], morgan[6], puthagoras[6], gauss[5], haldane[5], hauptman[5], irons[5], lejeune[5], schwartz[5], lie[4], bayes[3], poincareacute[3], poincarÕ[3], biography[2], brahmagupta[2], carnap[2], goumldel[2], gödel[2], …

13 Hypernym-based filtering
Two patterns that indicate hypernym relations are distinguished [Hearst, 1992]:
- “$h$ such as $i_1, i_2, \ldots, i_n$”
- “$i_1, i_2, \ldots, i_n$ and other $h$”
In these patterns, $h$ is the plural of the intended class (e.g. “composers”) and $i_1, \ldots, i_n$ are candidate instances.
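The two Hearst patterns can be approximated with regular expressions. This Python sketch assumes that instances are runs of capitalised words and that enumerations use commas plus a final ‘and’/‘or’; both are simplifying assumptions, and `hearst_instances` is a hypothetical helper, not the filter used in the talk.

```python
import re

# Assumption: a candidate instance is one or more capitalised words.
ITEM = r"[A-Z][\w'-]*(?: [A-Z][\w'-]*)*"
ITEMS = rf"{ITEM}(?:, {ITEM})*(?:,? (?:and|or) {ITEM})?"
SEPARATOR = r",? (?:and|or) |, "   # try ", and " before ", " when splitting


def hearst_instances(text: str, hypernym: str) -> set[str]:
    """Collect candidate instances of the class whose plural is `hypernym`."""
    h = re.escape(hypernym)
    found: set[str] = set()
    # Pattern 1: "h such as i1, i2, ..., in"
    for m in re.finditer(rf"\b{h} such as ({ITEMS})", text):
        found.update(re.split(SEPARATOR, m.group(1)))
    # Pattern 2: "i1, i2, ..., in and other h"
    for m in re.finditer(rf"({ITEMS}),? and other {h}\b", text):
        found.update(re.split(SEPARATOR, m.group(1)))
    return found


text = ("He visited countries such as France, Germany, and Italy. "
        "Spain, Portugal and other countries were skipped.")
print(sorted(hearst_instances(text, "countries")))
```

A candidate from the enumeration search could then be kept only if it also appears in such a pattern with the intended plural, which filters out terms like “team” or “http” from the examples.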

14 Geographic Data
Extract all countries, with precision and recall measured for each of the following input sets:
- France, China, Germany
- Georgia, Ghana, Latvia
- Kiribati, Monaco, Togo
Find out which countries have a border in common.

15 Case Study: Finding Famous Persons on the Web
Objective: generate a long list of famous persons by searching the web.
- A famous person is a person who gets enough hits when Googled.
- We restrict ourselves to persons who have already died.

16 Definition of number of hits
Using only the last name is not specific enough, e.g. Bach, Smith.
Even the full name might not be specific enough, e.g. Theo van Gogh.
In addition, some persons score better with their middle name, others without it, e.g. Johann Sebastian Bach vs. Johann Bach, and Antonio Vivaldi vs. Antonio Lucio Vivaldi.
Still others are best known by their initials only, e.g. HG Wells, DH Lawrence.

17 Definition of number of hits
We use the number of hits found with the query “<last name> (<year of birth> – <year of death>)”, e.g. “Bach (1685 – 1750)”.
By not using the full name, we combine different variants, e.g. Johann Sebastian Bach and JS Bach.
For kings, queens, popes, etc., the Roman numeral is used as the last name. This combines the variants in different languages, e.g. Charles V, Carlos V, Karel V.

18 Basic idea
We use potential time intervals “(<year of birth> – <year of death>)” as the starting point to search for persons.
Issue exact queries to Google of the form allintitle: "(y1 – y2)", where y1 ∈ [ ] and y2 − y1 ∈ [ ], and analyse the summaries (snippets) that Google returns.
Look for the six words that precede “(y1 – y2)” and analyse these words.
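Query generation and the ‘six preceding words’ step might look as follows in Python. The ranges for y1 and y2 − y1 are not given on the slide, so here they are plain parameters; the function names and the example snippet are invented, and actually sending the queries to Google is not shown.

```python
import re
from typing import Iterable, Iterator


def interval_queries(birth_years: Iterable[int], lifespans: Iterable[int]) -> Iterator[str]:
    """Yield allintitle queries for every (y1, y2) with y1 in birth_years and y2 - y1 in lifespans."""
    spans = list(lifespans)            # materialise once; reused for every y1
    for y1 in birth_years:
        for span in spans:
            yield f'allintitle: "({y1} – {y1 + span})"'


def preceding_words(snippet: str, y1: int, y2: int, n: int = 6) -> list[str]:
    """Return (at most) the n words that precede '(y1 – y2)' in a snippet."""
    # Accept an en dash or a hyphen, with optional spaces, since web pages vary.
    m = re.search(rf"\(\s*{y1}\s*[–-]\s*{y2}\s*\)", snippet)
    if m is None:
        return []
    return snippet[:m.start()].split()[-n:]


for q in interval_queries(range(1685, 1687), range(65, 67)):
    print(q)
print(preceding_words("Classical Archive: Johann Sebastian Bach (1685 - 1750) biography", 1685, 1750))
```

The words returned this way still contain noise such as “Classical Archive:”, which is what the cleaning steps on the following slides deal with.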

19 Google batch processing
To process the Google queries we use a program that allows batch processing (Nick de Jong). The program allows parallel execution of multiple queries:
file with queries → GoogleQuery → file with results
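A batch runner of this kind can be sketched with Python's standard `concurrent.futures` module. The `search` callable is a placeholder for whatever actually issues a query (the GoogleQuery program is not described further), and the tab-separated output format is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def run_batch(queries: list[str], search: Callable[[str], str],
              workers: int = 8) -> list[tuple[str, str]]:
    """Run `search` on all queries in parallel; pool.map keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(zip(queries, pool.map(search, queries)))


def run_batch_file(query_file: Path, result_file: Path,
                   search: Callable[[str], str]) -> None:
    """File with queries (one per line) in, file with tab-separated query/result pairs out."""
    queries = [q for q in query_file.read_text(encoding="utf-8").splitlines() if q.strip()]
    lines = [f"{q}\t{r}" for q, r in run_batch(queries, search)]
    result_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Threads suit this workload because each query spends most of its time waiting on the network rather than computing.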

20 Main Problem: how to separate person names from other names
Person name vs. other name:
- Art Blakey vs. Art Deco
- Mae West vs. West Virginia
- Raul Delcroix vs. Real Decreto
- HP Lovecraft vs. HP Inkjet
- Koye Somefun vs. Have SomeFun
Potential approaches:
- filter out non-persons by using a list of stop words;
- filter out non-persons by using an exhaustive list of first names;
- carry out further tests (“X was born in”).
We only used a list of 500 stop words, including: Album, Anniversary, Archive, Articles, Biographie, Biography, Births, Boats, Burials, Catalog, Census, …
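The first two approaches can be sketched in Python as below. The stop-word and first-name sets are tiny hypothetical excerpts (the real stop list has 500 entries), and `looks_like_person` is an illustrative name.

```python
from typing import Optional


def looks_like_person(name: str, stop_words: set[str],
                      first_names: Optional[set[str]] = None) -> bool:
    words = [w.strip(".,").lower() for w in name.split()]
    # Approach 1: reject the candidate if any of its words is a stop word.
    if any(w in stop_words for w in words):
        return False
    # Approach 2 (optional): require the first word to be a known first name.
    if first_names is not None and (not words or words[0] not in first_names):
        return False
    return True


STOP = {"album", "biography", "deco", "inkjet", "virginia"}   # hypothetical excerpt
FIRST = {"art", "mae", "raul", "koye"}                         # hypothetical excerpt

print(looks_like_person("Art Blakey", STOP))            # True
print(looks_like_person("Art Deco", STOP))              # False
print(looks_like_person("Have SomeFun", STOP))          # True: no stop word involved
print(looks_like_person("HP Lovecraft", STOP, FIRST))   # False: "hp" is not a first name
```

The last two calls show why neither list suffices on its own: “Have SomeFun” passes the stop-word test, while a first-name list rejects a name given by initials.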

21 Additional Problem: a single person can be presented in various ways
- Vasilij Kandinskij; Wassily Kandinsky; Vasily Kandinsky; Vassily Kandinsky; Kandinsky, Wassily; Kandinsky Wassily
- Johann Sebastian Bach; JS Bach; Johann Sebastian; Sebastian Bach; Bach, Johann Sebastian

22 Example of the word sequences that are found: [allintitle: "( )" -genealogy -genealogie] 111 Rose-Philippine Duchesne ( Wellesley, 1st Duke of Wellington ( Home Study Service Rose Philippine Duchesne Arthur, 1st Duke of Wellington ( The Duke of Wellington ( Wellesley, 1st Duke of Wellington ( Arthur Wellesley, Duke of Wellington. ( Wellesley, first Duke of Wellington ( People > Duke of Wellington ( > Pobl > Dug Wellington ( medal depicting Duke of Wellington ( Arthur Wellesley Wellington ( Wellesley, 1st Duke of Wellington ( John Landseer ( Wellington, Arthur Wellesley,Duke of, Learning Library: WELLINGTON, DUKE OF (

23 Another Example: George Frederick Handel ( GEORGE F. HANDEL ( X. George Frederick Handel. ( Handel, George Frideric ( George Frederic Handel,... George Frederic Handel ( CD:Composers - H: Handel, George Frederic (German/British Classical DVD: Handel, George Frederic (German/British, George Frederic Handel (... George Frideric HANDEL ( Georg Frideric Handel | from Alibris George Frideric Handel ( New Window. George Frideric Handel ( up artist Handel, George F. ( Giulio Cesare. by GF Handel ( piece by HANDEL, Georg Friedrich (

24 First, reduce capitals
If a word consists of capitals only, replace all letters but the first by lower case, e.g. HANDEL → Handel. Exceptions:
- if the word contains a hyphen, the first letter after the hyphen also stays a capital, e.g. SAINT-SAENS → Saint-Saens;
- if the word is a Roman numeral, it is left unchanged, e.g. Louis XIV → Louis XIV;
- if the word starts with ‘MC’, the letter after ‘Mc’ stays a capital, e.g. MCCULLOCH → McCulloch;
- if the word is an abbreviation (initials), it is left unchanged, e.g. DE KNUTH → DE Knuth.
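The capital-reduction rule and its exceptions translate into a short Python function. How initials are recognised is not specified on the slide; this sketch assumes that an all-capital word with at most two letters is an abbreviation, which is a guess (it would, for instance, turn ‘JRR’ into ‘Jrr’). The function names are hypothetical.

```python
import re

# Roman numerals I..MMMCMXCIX (the empty string also matches, so callers check for it).
ROMAN = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")


def reduce_word(word: str, max_initials: int = 2) -> str:
    """Apply the capital-reduction rule to a single word."""
    if not word.isupper():
        return word                      # not all capitals: leave untouched
    if sum(ch.isalpha() for ch in word) <= max_initials:
        return word                      # assumed initials, e.g. "DE", "F."
    core = word.strip(".,;:")
    if core and ROMAN.fullmatch(core):
        return word                      # Roman numeral, e.g. "XIV"
    if "-" in word:                      # "SAINT-SAENS" -> "Saint-Saens"
        return "-".join(part[:1] + part[1:].lower() for part in word.split("-"))
    if word.startswith("MC"):            # "MCCULLOCH" -> "McCulloch"
        return "Mc" + word[2:3] + word[3:].lower()
    return word[:1] + word[1:].lower()   # "HANDEL" -> "Handel"


def reduce_capitals(text: str) -> str:
    return " ".join(reduce_word(w) for w in text.split())


print(reduce_capitals("GEORGE F. HANDEL"))   # George F. Handel
```

The order of the checks matters: initials and Roman numerals are tested before the generic rule, so that ‘XIV’ and ‘DE’ survive unchanged.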

26 Example: George Frederick Handel ( George F. Handel ( X. George Frederick Handel. ( Handel, George Frideric ( George Frederic Handel,... George Frederic Handel ( CD:Composers - H: Handel, George Frederic (German/British, Classical Dvd: Handel, George Frederic (German/British, George Frederic Handel (... George Frideric Handel ( Georg Frideric Handel | from Alibris George Frideric Handel ( New Window. George Frideric Handel ( up artist Handel, George F. ( Giulio Cesare. by GF Handel ( piece by Handel, Georg Friedrich (

27 Delete prefixes and suffixes
Delete parts that cannot be part of the name. First delete the suffix. Next, scan through the words from back to front until, e.g., a colon or a period is encountered.
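One possible reading of the deletion rule, as a Python sketch. It assumes that the input is the text in front of ‘(y1 – y2)’ and treats tokens of at most two capitals (optionally with a period) as initials, so that ‘X.’ and ‘F.’ do not stop the scan. This reproduces the cleaned Handel strings on slide 29 for the cases tested, but it is an interpretation, not the original code; `is_initial` and `strip_affixes` are hypothetical names.

```python
def is_initial(word: str) -> bool:
    """Assumption: an initial is an upper-case token of at most two letters, e.g. 'F.' or 'GF'."""
    core = word.rstrip(".")
    return 0 < len(core) <= 2 and core.isupper()


def strip_affixes(text: str) -> str:
    """`text` is the part of a snippet that precedes '(y1 – y2)'."""
    words = text.split()
    # 1. Delete the suffix: drop punctuation-only tokens, then trailing punctuation.
    while words and not words[-1].strip(".,;:|-"):
        words.pop()
    if words and not is_initial(words[-1]):
        words[-1] = words[-1].rstrip(".,;:|")
    # 2. Scan back to front; stop at a word with a colon or a final period.
    kept = []
    for word in reversed(words):
        if ":" in word or (word.endswith(".") and not is_initial(word)):
            break
        kept.append(word)
    return " ".join(reversed(kept))


print(strip_affixes("Giulio Cesare. by GF Handel"))   # by GF Handel
```

Note that a prefix such as “by” survives this step; those strings are dropped later because only two- and three-word names without stop words are stored.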

29 Example: George Frederick Handel George F. Handel X. George Frederick Handel Handel, George Frideric George Frederic Handel Handel, George Frederic George Frederic Handel George Frideric Handel Georg Frideric Handel from Alibris George Frideric Handel George Frideric Handel up artist Handel, George F. by GF Handel piece by Handel, Georg Friedrich

30 Correct inversions
- If two words remain and the first ends with a comma, reverse them, e.g. West, Mae → Mae West.
- If three words remain and the first ends with a comma, reverse them, e.g. Handel, George Frederick → George Frederick Handel.
- If three words remain and the second ends with a comma, reverse them, e.g. Van Gogh, Vincent → Vincent van Gogh.
Problem: not all inverted names contain commas.
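The three inversion rules in Python. Lower-casing the particle in ‘Van Gogh’ is needed to reproduce the last example; the `PARTICLES` set and the function name are hypothetical choices.

```python
PARTICLES = {"van", "von", "de", "da", "der", "den", "du", "la", "le"}  # hypothetical list


def correct_inversion(name: str) -> str:
    words = name.split()
    if len(words) in (2, 3) and words[0].endswith(","):
        # "West, Mae" -> "Mae West"; "Handel, George Frederick" -> "George Frederick Handel"
        return " ".join(words[1:] + [words[0][:-1]])
    if len(words) == 3 and words[1].endswith(","):
        # "Van Gogh, Vincent" -> "Vincent van Gogh"
        first, last = words[2], [words[0], words[1][:-1]]
        if last[0].lower() in PARTICLES:
            last[0] = last[0].lower()
        return " ".join([first] + last)
    return name  # no comma: leave as is


print(correct_inversion("Van Gogh, Vincent"))   # Vincent van Gogh
```

A name such as ‘Frescobaldi Girolamo’, inverted without a comma, passes through unchanged, which is exactly the problem noted on the slide.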

32 Example: George Frederick Handel George F. Handel X. George Frederick Handel George Frideric Handel George Frederic Handel George Frideric Handel Georg Frideric Handel from Alibris George Frideric Handel George Frideric Handel up artist Handel, George F. by GF Handel piece by Handel, Georg Friedrich

33 Save two- and three-word names
Scan the list of strings and store those consisting of two or three words, provided that they do not contain stop words. In addition, count how often each of them is found.

34 Example: George Frederick Handel
Candidate strings: George Frederick Handel; George F. Handel; X. George Frederick Handel; George Frideric Handel; George Frederic Handel; George Frederic Handel; George Frederic Handel; George Frideric Handel; Georg Frideric Handel; from Alibris George Frideric Handel; George Frideric Handel; up artist Handel, George F.; by GF Handel; piece by Handel, Georg Friedrich
Stored two- and three-word names, with the number of times each was found:
- George Frederic Handel: 5
- George Frideric Handel: 2
- George F. Handel: 1
- George Frederick Handel: 1
- Georg Frideric Handel: 1
- by GF Handel: 1
For each last-name/years combination, the form that was found most often is used.
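Storing, counting, and selecting the most frequent form can be sketched with `collections.Counter`. The stop words are a small excerpt of the list on slide 20, the candidate strings are illustrative, and `count_names`/`best_form` are hypothetical names.

```python
from collections import Counter
from typing import Optional

# Small excerpt of the stop-word list from slide 20 (the full list has 500 words).
STOP_WORDS = {"album", "anniversary", "archive", "articles", "biographie",
              "biography", "births", "boats", "burials", "catalog", "census"}


def count_names(candidates: list[str], stop_words: set[str] = STOP_WORDS) -> Counter:
    """Store two- and three-word strings that contain no stop words, and count them."""
    counts: Counter = Counter()
    for cand in candidates:
        words = cand.split()
        if 2 <= len(words) <= 3 and not any(w.strip(".,").lower() in stop_words for w in words):
            counts[" ".join(words)] += 1
    return counts


def best_form(candidates: list[str], stop_words: set[str] = STOP_WORDS) -> Optional[str]:
    """Most frequently found form (ties: the one seen first), or None if nothing was stored."""
    counts = count_names(candidates, stop_words)
    return counts.most_common(1)[0][0] if counts else None


cands = ["George Frederic Handel", "George Frideric Handel", "George Frederic Handel",
         "X. George Frederick Handel",        # four words: not stored
         "Handel Biography Archive"]          # contains stop words: not stored
print(best_form(cands))   # George Frederic Handel
```

In a full run, `best_form` would be applied once per last-name/years combination.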

35 Unexpected Observations
- Franz-Eugen Schlachter (1859 – 1911) has 64,500 hits, but all from the same server! It concerns an online Bible, in which each Bible page is implemented as a separate web page with Franz-Eugen Schlachter in the title. We can use the similar-pages information that Google gives to filter these out.
- Koop Juliana ( ) has 8,200 hits (“koop” is Dutch for “buy”). “Koop Juliana” results in considerably fewer hits than “Juliana (1948 – 1980)”. That can be an indication that the first name is not correct.
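The slide proposes Google's similar-pages information for this filter. A simpler substitute, shown here as an assumption rather than the method actually used, is to measure what share of the result URLs comes from a single host; `dominant_host_share`, `is_single_server_artifact`, and the 0.8 threshold are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def dominant_host_share(urls: list[str]) -> float:
    """Fraction of result URLs that come from the most frequent host name."""
    hosts = Counter(urlparse(u).hostname for u in urls)
    if not hosts:
        return 0.0
    return hosts.most_common(1)[0][1] / len(urls)


def is_single_server_artifact(urls: list[str], max_share: float = 0.8) -> bool:
    """Flag a candidate whose hits mostly come from one server (threshold is an assumption)."""
    return dominant_host_share(urls) > max_share


# Made-up result URLs: nine pages of one online Bible plus one unrelated page.
urls = [f"http://bible.example.org/page{i}" for i in range(9)] + ["http://www.example.com/x"]
print(dominant_host_share(urls), is_single_server_artifact(urls))   # 0.9 True
```

This only inspects the URLs that are actually returned, so it works on a sample of the hits rather than on the full hit count.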

36 Number of Persons Found, per century (1000 – 1099, 1100 – 1199, …, 1900 – 1999)
Total: 51,909

37 Top 16 born between 1500 and 1599: William Shakespeare ( ) Rene Descartes ( ) Galileo Galilei ( ) Francis Bacon ( ) John Dowland ( ) Orlandus Lassus ( ) Johannes Kepler ( ) Thomas Hobbes ( ) Frescobaldi Girolamo ( ) Claudio Monteverdi ( ) Peter Paul Rubens ( ) Tycho Brahe ( ) Michel de Montaigne ( ) John Calvin ( ) Elizabeth I ( ) Andrea Palladio ( ) Gibbons Orlando (1508 – 1580) Nicolas Poussin ( ) 6790

38 Top 16 born between 1600 and 1699: Johann Sebastian Bach ( ) Antonio Vivaldi ( ) Henry Purcell ( ) Georg Philipp Telemann ( ) Georg Friedrich Haendel ( ) Voltaire ( ) Isaac Newton ( ) Domenico Scarlatti ( ) Arcangelo Corelli ( ) Francois Couperin ( ) Jean-Philippe Rameau ( ) Alessandro Scarlatti ( ) Tomaso Albinoni ( ) Jean-Baptiste Lully ( ) Giuseppe Tartini ( ) de la Barca ( ) John Locke ( ) Blaise Pascal ( ) 22700

39 Top 16 born between 1700 and 1799: Wolfgang Amadeus Mozart ( ) Ludwig van Beethoven ( ) Franz Schubert ( ) Napoleon Bonaparte ( ) Joseph Haydn ( ) Johann Wolfgang Goethe ( ) Immanuel Kant ( ) Gioacchino Rossini ( ) Benjamin Franklin ( ) Washington Irving ( ) Luigi Boccherini ( ) Luigi Cherubini ( ) William Blake ( ) Arthur Schopenhauer ( ) Thomas Jefferson ( ) Jean-Jacques Rousseau ( ) Boyce William ( ) Heinrich Heine ( ) 15900

40 Top 16 born between 1800 and 1899: Charles Darwin ( ) Albert Einstein ( ) Johannes Brahms ( ) James Joyce ( ) Peter Iljitsch Tschaikowsky ( ) Robert Schumann ( ) Frederic Chopin ( ) Giuseppe Verdi ( ) Claude Debussy ( ) Winston Churchill ( ) Franz Liszt ( ) Richard Wagner ( ) Richard Strauss ( ) Antonin Dvorak ( ) Maurice Ravel ( ) Gustav Mahler ( ) 34300

41 Top 16 born between 1900 and 1999: nov nov Ronald Reagan ( ) Yasser Arafat ( ) Benjamin Britten ( ) Ronald Reagan ( ) John Peel ( ) Benjamin Britten ( ) Samuel Barber ( ) Samuel Barber ( ) John Fitzgerald Kennedy ( ) John Peel ( ) Robertson Davies ( ) Robertson Davies ( ) Yasser Arafat ( ) John F. Kennedy ( ) Peter Ustinov ( ) Peter Ustinov ( ) Kurt Cobain ( ) Kurt Cobain ( ) Salvador Dali ( ) Salvador Dali ( ) Christopher Reeve ( ) Jon Lee ( ) Jon Lee ( ) Marlon Brando ( ) Marlon Brando ( ) Christopher Reeve ( ) Van Gogh ( ) Jean-Paul Sartre ( ) Albert Camus ( ) 9730 Chostakovitch Dimitri ( ) Jean-Paul Sartre ( ) 9630 Albert Camus ( ) Ted Hughes ( ) 8970 Van Gogh ( ) Jim Morrison ( ) 8930 Steve Reich ( ) 8370

42 Top 16 born between 1000 and 1999: Johann Sebastian Bach ( ) Wolfgang Amadeus Mozart ( ) Charles Darwin ( ) Albert Einstein ( ) Ludwig van Beethoven ( ) Franz Schubert ( ) Napoleon Bonaparte ( ) Johannes Brahms ( ) James Joyce ( ) Leonardo da Vinci ( ) William Shakespeare ( ) Joseph Haydn ( ) Peter Iljitsch Tschaikowsky ( ) Johann Wolfgang Goethe ( ) Robert Schumann ( ) Ronald Reagan ( ) 44800

43 Testing recall
Herinneringen in Steen (“Memories in Stone”): 195 persons
Found: James Baldwin, Olaf Palme, Simone Signoret, Henry Moore, Carel Willink, Joan Miro, Theolonius Monk, Georges Brassens, John Lennon, Jean-Paul Sartre, Simone de Beauvoir, Mae West, Kurt Gödel, Elvis Presley, Maria Callas, Charlie Chaplin, Benjamin Britten, Paul Robeson, Mao Zedong, Agatha Christie, Lotte Lehmann, Robert Stolz, Edward Kennedy, Pablo Picasso, Pablo Casals, Maurits Cornelis Escher, Ezra Pound, Jim Morrison, Louis Armstrong, Igor Stravinsky, Jimi Hendrix, Barnett Newman, Charles de Gaule, Judy Garland, Dwight David Eisenhower, Ho Tsji Minh, Martin Luther King, Robert Kennedy, Erneste Guevara, John William Coltrane, …
45 not found: Louis Paul Boon, Adriaan Roland Holst, Stijn Streuvels, Ernest Claes, Johannes XXIII, Dag Hammarskjöld, William Christopher Handy, Lucien Guitry, Antony Fokker, Pieter Jelles Troelstra, Paul van Ostaijen, Hugo Verriest, …

44 Testing recall
Het Kunst Boek (“The Art Book”): the first 200 (dead) persons
Found: Jaques-Laurent Agasse, Josef Albers, Allesandro Algardi, Washington Allston, Jacopo Amigoni, Fra Angelico, Antonello da Messina, Alexander Archipenko, Giuseppe Arcimboldo, Hendrick Avercamp, Francis Bacon, Giacomo Balla, Fra Bartolommeo, Jean-Michel Basquiat, Jacopo Bassano, Pompeo Batoni, Willi Baumeister, Frederic Bazille, Domenico Beccafumi, Max Beckmann, Gentille Bellini, Giovanni Bellini, Hans Bellmer, Gianlorenzo Bernini, Josef Beuys, Albert Bierstadt, …
45 not found: Andrea del Sarto, Sofonisba Anguissola, Jean Arp, John James Audubon, Hans Baldung, Andre Beauneveu, Bernardo Bellotto, George Bellows, …

45 Testing recall
The Science Book: the 156 (dead) persons
Found: Leon Battista Alberti, Nicolas Copernicus, Andreas Vesalius, Conrad Gesner, Tycho Brahe, William Gilbert, Johannes Kepler, Galileo Galilei, John Napier, William Harvey, Blaise Pascal, Pierre de Fermat, Christiaan Huygens, James Clerk Maxwell, Robert Boyle, Nicolaus Steno, Giovanni Domenico Cassini, Isaac Newton, Edmond Halley, Carolus Linnaeus, Lazzaro Spallanzani, Johan Heinrich Lambert, Joseph Priestley, Antoine Laurent Lavoisier, William Herschel, Henry Cavendish, James Hutton, Edward Jenner, Pierre-Simon Laplace, Georges Cuvier, Thomas Robert Malthus, Alexander von Humboldt, Allesandro Volta, Thomas Young, …
Not found: Fibonacci, Piero della Francesca, Jeremiah Horrocks, Antoni van Leeuwenhoek, Rudolph Jacob Camerarius, George Hadley, Carl Wilhelm Scheele, James Hall, Joseph von Frauenhofer, William Smith, …

46 Testing precision
Counting false positives: 4900 – …
False positives found: Povijest Jugoslavije ( ), Oeuvre Poetique ( ), Alabama Wills (1808 – 1870), Black Tennesseans ( ), Nippon Porcelain ( ), Personal Favorites ( ), Wheeling Glass ( ), Political Impact ( ), Movie Set ( ), Transatlantic Dialogues ( ), Sailing Navy ( ), Home Children ( ), Peace Pilgrim ( ), Briton Riviere ( ), La Regle ( ), Farm Tractors ( ), Western Warfare ( ), Le Peintre ( ), Exakta Cameras ( ), Offene Briefe ( ), Portraitmatilde Muti ( ), Nature Morte ( ), Dessins Inconnus ( ), Jacques Lacan-Seminaires ( ), Legendary Parties ( ), Memory Joggers ( ), Klondike Ho ( ), Events From ( )
Estimated precision for the first 5000: 0.90

47 Some observations
- Composers dominate the top for some centuries.
- Recently deceased persons have relatively high scores.
- Person names consisting of only one word, such as the pseudonyms Voltaire, Caravaggio, and Nadar, are not yet found.
- Likewise, names consisting of four or more words, such as Joost van den Vondel, are not yet found.
- Also, persons who died as teenagers, such as Jeanne d’Arc and Anne Frank, are not found.
- More advanced approximate pattern matching is required to better cluster the name variations of one person and to handle potential errors in years.
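As an illustration of approximate matching, the following sketch clusters name variants greedily with `difflib.SequenceMatcher` from the Python standard library. This is a swapped-in technique, not what the talk used; the 0.8 similarity threshold and the function names are arbitrary assumptions.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Case-insensitive similarity test based on SequenceMatcher.ratio()."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def cluster_names(names: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Greedy clustering: each name joins the first cluster whose first member is similar enough."""
    clusters: list[list[str]] = []
    for name in names:
        for cluster in clusters:
            if similar(name, cluster[0], threshold):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


names = ["Wassily Kandinsky", "Vassily Kandinsky", "Vasily Kandinsky",
         "George Frideric Handel", "George Frederic Handel"]
for cluster in cluster_names(names):
    print(cluster)
```

Inverted forms such as ‘Kandinsky, Wassily’ score low against ‘Wassily Kandinsky’, so inversions have to be corrected before clustering.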

48 Concluding remarks
- Enumeration search offers an interesting approach to finding more-of-the-same, since it is generally applicable.
- The famous-persons case study indicates that non-trivial results can already be obtained with simple techniques.
- Further research: extend the case study to include information on the nationality, profession, etc. of persons, and automatically search for biographic data.
- Other intended application domains: the music and medical domains.

49 Fun Section
Election of ‘De Grootste Nederlander’ (‘The Greatest Dutchman’): Vincent van Gogh

50 Fun Section
Persons who were born and died in the same years:
- Sir Christopher Wren (1632 – 1723), Anthony van Leeuwenhoek (1632 – 1723)
- Leo Tolstoy (1828 – 1910), Henri Dunant (1828 – 1910)
- Edouard Manet (1832 – 1883), Gustave Dore (1832 – 1883)
- JRR Tolkien (1892 – 1973), Pearl Buck (1892 – 1973)
- Miles Davis (1926 – 1991), Klaus Kinski (1926 – 1991)
