
From Enumerative Structures in Texts towards Hierarchical Structures in Ontologies Nathalie Aussenac-Gilles Mouna KAMEL – Bernard ROTHENBURGER IC3 team.


1 From Enumerative Structures in Texts towards Hierarchical Structures in Ontologies
Nathalie Aussenac-Gilles, Mouna KAMEL, Bernard ROTHENBURGER
IC3 team at IRIT, Toulouse
ILIKS 2011, Aix-en-Provence

2 Semantic relations
Lexico-syntactic pattern: NP0 (is|are) NP1
- Slingbacks are shoes which are secured by a strap behind the heel, …
- Court shoes, known in the US as pumps, are typically high-heeled, …
- Platform shoe: shoe with very thick soles and heels
- The remaining elite American companies are Allen Edmonds and Alden Shoe Company.
- High-heeled footwear is footwear that raises the heels, …
24/05/ ILIKS Hierarchical structures
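As a rough illustration of how a pattern such as NP0 (is|are) NP1 could be applied to the sentences on this slide, here is a minimal Python sketch. The regular expression, the function name `extract_is_a`, and the approximation of noun phrases by runs of words are all assumptions made for this example; they are not the authors' implementation, and a real system would rely on a chunker or parser.

```python
import re

# Naive stand-in for "NP0 (is|are) NP1". Noun phrases are approximated by
# runs of word characters; an optional appositive between commas is skipped.
IS_A = re.compile(
    r"^(?P<np0>[\w' -]+?)(?:,[^,]*,)? (?:is|are) (?P<np1>[\w-]+)"
)

def extract_is_a(sentence):
    """Return (NP0, first word of NP1), or None when the pattern does not apply."""
    match = IS_A.match(sentence)
    if match is None:
        return None
    return match.group("np0").strip(), match.group("np1")

examples = [
    "Slingbacks are shoes which are secured by a strap behind the heel",
    "High-heeled footwear is footwear that raises the heels",
    "Platform shoe: shoe with very thick soles and heels",
]
for sentence in examples:
    print(extract_is_a(sentence))
```

The third sentence carries the same kind of relation but is expressed with a colon rather than a copula, so this pattern alone does not capture it.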

3 Semantic relations
Lexico-syntactic patterns: NP0 (is|are) NP1, ???
- Ballet flats, known in the UK as ballerinas, ballet pumps or skimmers, are shoes with a very low heel …
- Variants include kitten heels (typically 1½-2 inches high) and stiletto heels (with a very narrow heel post) and wedge heels (with a wedge-shaped sole rather than a heel post).

4 Semantic relations lexico-syntactic patterns ???

6 Semantic relations TITLES lexico-syntactic patterns ???

7 What's new?
- Going beyond the sentence
  - Discourse analysis (Rhetorical Structure Theory, Segmented Discourse Representation Theory, …)
- Using typo-dispositional clues
  - Logical structure of a text, Textual Architecture Model, …
- Typology of Enumerative Structures (ES)
  - Vertical and paradigmatic ES
  - Translation: Enumerative Structure > Hierarchical structure
- Evaluation

8 Discourse Structure
The key U.S. and foreign annual interest rates below are a guide to general levels but don't always represent actual transactions.
PRIME RATE: 10 1/2%. The base rate on corporate loans at large U.S. money center commercial banks.
FEDERAL FUNDS: 8 3/4% high, 8 11/16% low, 8 5/8% near closing bid, 8 11/16% offered. Reserves traded among commercial banks for overnight use in amounts of $1 million or more. Source: Fulton Prebon (U.S.A.) Inc.
DISCOUNT RATE: 7%. The charge on loans to depository institutions by the New York Federal Reserve Bank.
CALL MONEY: 9 3/4% to 10%. The charge on loans to brokers on stock exchange collateral.
COMMERCIAL PAPER placed directly by General Motors Acceptance Corp.: 8.50% 30 to 44 days; 8.25% 45 to 65 days; 8.375% 66 to 89 days; 8% 90 to 119 days; 7.875% 120 to 149 days; 7.75% 150 to 179 days; 7.50% 180 to 270 days.


11 Discourse Structure
(190) [The key U.S. and foreign annual interest rates below are a guide to general levels but don't always represent actual transactions. A]
[PRIME RATE: 10 1/2%. The base rate on corporate loans at large U.S. money center commercial banks. B]
[FEDERAL FUNDS: 8 3/4% high, 8 11/16% low, 8 5/8% near closing bid, 8 11/16% offered. Reserves traded among commercial banks for overnight use in amounts of $1 million or more. Source: Fulton Prebon (U.S.A.) Inc. C]
[DISCOUNT RATE: 7%. The charge on loans to depository institutions by the New York Federal Reserve Bank. D]
[CALL MONEY: 9 3/4% to 10%. The charge on loans to brokers on stock exchange collateral. E]
[COMMERCIAL PAPER placed directly by General Motors Acceptance Corp.: 8.50% 30 to 44 days; 8.25% 45 to 65 days; 8.375% 66 to 89 days; 8% 90 to 119 days; 7.875% 120 to 149 days; 7.75% 150 to 179 days; 7.50% 180 to 270 days. F] wsj_0602
RST tree (figure): Elaboration-Set-Member, List (B, C, D, E, F)
Carlson, L. and Marcu, D. (2001). Discourse Tagging Manual. Unpublished manuscript.
The representation in RST is generally carried out manually.

12 Typo-dispositional markers
- Typographical markers
- Dispositional markers

13 Enumerative Structure
The act of enumerating: stating the successive elements of the same conceptual domain, these elements being hierarchically linked, directly or indirectly, to a classifying concept.
An enumeration can take several forms (examples from the "Shoe" Wikipedia page).

14 Enumerative Structures (ES)
PRIMER {ITEM} CONCLUSION
- Horizontal vs. Vertical
- Syntagmatic vs. Paradigmatic


18 Enumerative Structures (ES)
Erich von Hornbostel and Curt Sachs adopted Mahillon's scheme and published an extensive new scheme for classification in Zeitschrift für Ethnologie. Hornbostel and Sachs used most of Mahillon's system, but replaced the term autophone with idiophone.
Primer: The original Hornbostel-Sachs system classified instruments into four main groups:
Items:
- Idiophones, which would be an instrument that you could hit, strike, shake or scrape – such as the xylophone and rattle. They produce sound by vibrating themselves; they are sorted into concussion, percussion, shaken, scraped, split, and plucked idiophones.
- Membranophones, which would be an instrument that uses a stretched skin, or membrane (key word being "stretched") such as drums or kazoos, produce sound by a vibrating membrane; they are sorted into predrum membranophones, tubular drums, friction idiophones, kettledrums, friction drums, and mirlitons.
- Chordophones, which would be an instrument that uses stretched string or cord – such as the piano or cello, produce sound by vibrating strings; they are sorted into zithers, keyboard chordophones, lyres, harps, lutes, and bowed chordophones.
- Aerophones, which would be an instrument that you produce a sound by blowing air into – such as the pipe organ or oboe, produce sound by vibrating columns of air; they are sorted into free aerophones, flutes, organs, reedpipes, and lip-vibrated aerophones.
Conclusion: Sachs later added a fifth category, electrophones, such as theremins, which produce sound by electronic means. Within each category are many subgroups. The system has been criticised and revised over the years, but remains widely used by ethnomusicologists and organologists.
From the "Musical instrument" Wikipedia page
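The PRIMER {ITEM} CONCLUSION layout labelled on this slide can be captured in a small data structure. The Python sketch below is only illustrative: the class name, the field names, the "- " bullet marker, and the rule "lines before the first bullet form the primer, bulleted lines are items, later non-bulleted lines form the conclusion" are assumptions of this example, not the annotation module described in the talk.

```python
from dataclasses import dataclass, field

@dataclass
class EnumerativeStructure:
    primer: str
    items: list = field(default_factory=list)
    conclusion: str = ""

def split_vertical_es(lines, bullet="- "):
    """Split the lines of a vertical ES into primer, items and conclusion."""
    primer_parts, items, conclusion_parts = [], [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip empty lines
        if text.startswith(bullet):
            items.append(text[len(bullet):])
        elif items:
            # Non-bulleted text after the first item is treated as conclusion.
            conclusion_parts.append(text)
        else:
            primer_parts.append(text)
    return EnumerativeStructure(
        primer=" ".join(primer_parts),
        items=items,
        conclusion=" ".join(conclusion_parts),
    )

es = split_vertical_es([
    "The original Hornbostel-Sachs system classified instruments into four main groups:",
    "- Idiophones",
    "- Membranophones",
    "- Chordophones",
    "- Aerophones",
    "Sachs later added a fifth category, electrophones.",
])
print(es.primer)
print(es.items)
print(es.conclusion)
```

This relies purely on layout clues, which is why it only suits vertical ES; a horizontal ES written as running text would need linguistic analysis instead.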


20 Horizontal vs. Vertical ES
Horizontal:
Under IAU definitions, there are eight planets in the Solar System. In order of increasing distance from the Sun, there are the four terrestrial planets, Mercury, Venus, Earth, and Mars, then the four gas-giant ones, Jupiter, Saturn, Uranus, and Neptune.
versus
Vertical:
Under IAU definitions, in the Solar System and in order of increasing distance from the Sun, there are eight planets:
four terrestrial planets:
  - Mercury,
  - Venus,
  - Earth,
  - Mars.
four gas-giant planets:
  - Jupiter,
  - Saturn,
  - Uranus,
  - Neptune.
Annotations on the slide: PRIMER, ITEMS

21 Horizontal vs. Vertical ES
Same planets example as on slide 20: the vertical ES is less ambiguous than the horizontal one.

22 False enumerative structures?
This overconsumption is mainly responsible for the growing resistance of bacteria to antibiotics:
The more a country consumes antibiotics the more resistant the bacteria become: in France, Staphylococcus aureus is resistant to methicillin in 57% of the cases, though the observed frequency is only 1% in Denmark and 9% in Germany.
and to each noticeable and lasting decrease of antibiotic consumption corresponds a decrease of this resistance phenomenon.
From "The Sunday Times" Wikipedia page
Segmentation: [This overconsumption A] [is mainly responsible for the growing resistance of bacteria to antibiotics B]. [The more a country consumes antibiotics the more resistant the bacteria become C]: [In France, Staphylococcus aureus is resistant to methicillin in 57% of the cases D], [though the observed frequency is only 1% in Denmark and 9% in Germany E]. [and to each noticeable and lasting decrease of antibiotic consumption F] [corresponds a decrease of this resistance phenomenon. G]
Discourse analysis: justify(non-volitional-cause(A,B), sequence(motivation(contrast(D,E),C), non-volitional-cause(F,G)))
Enumerative structure: SE(Primer([A,B]), enum(item([C,E,D]), item([F,G])))


24 Syntagmatic vs. Paradigmatic ES
Syntagmatic ES (there are dependencies between items):
Character shoes have: a one to three inch heel which is usually made of leather
versus
Paradigmatic ES (heads of items are syntactically equivalent):
Men's shoes can also be decorated in various ways:
- Plain-toes: have a sleek appearance and no extra decorations on the vamp.
- Cap-toes: has an extra layer of leather that "caps" the toe. This is possibly the most popular decoration.

25 Paradigmatic ES > Hierarchical Structure
A shoe is mainly composed of:
- the sole, which protects the bottom of the feet, more or less raised on the back of the heel
- and the vamp, upper part that wraps the foot
Relation: Part-of
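The translation from a paradigmatic ES to a hierarchical structure can be sketched as the production of (item head, relation, root) triples. In the Python example below, the function names, the list of leading words to drop, and the rule "the head of an item is the text before its first comma" are hypothetical simplifications chosen for this illustration; the root "shoe" and the relation "part-of" are supplied by hand rather than derived from the primer.

```python
# Words dropped from the start of an item before taking its head (assumed list).
LEADING_WORDS = {"and", "the", "a", "an"}

def item_head(item):
    """Naive head of an item: the text before the first comma, minus leading words."""
    words = item.split(",")[0].split()
    while words and words[0].lower() in LEADING_WORDS:
        words.pop(0)
    return " ".join(words)

def es_to_triples(root, relation, items):
    """Link the head of every item to the root concept with the given relation."""
    return [(item_head(item), relation, root) for item in items]

triples = es_to_triples(
    "shoe",
    "part-of",
    [
        "the sole, which protects the bottom of the feet, more or less raised on the back of the heel",
        "and the vamp, upper part that wraps the foot",
    ],
)
for triple in triples:
    print(triple)
```

Because the item heads of a paradigmatic ES are syntactically equivalent, the same relation can be applied uniformly to every item, which is what makes this kind of ES a natural source of hierarchical structures.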

26 Typology of primers
The primer is incomplete (proposition which is syntactically incomplete and for which the missing components are given by the items)
Case 1: the primer is a noun phrase
Women's dress shoes:
- Pumps
- Slingbacks
- Loafers
- Mules
- Ballet flats
- Sandals
root of the tree > noun phrase
relation > is-a
Case 2: the primer is only composed of a noun phrase and of a verb
The sole of a sandal can be made of: rubber, leather, wood, tatami, or rope
root of the tree > noun phrase
relation > meaning of the verb

27 Typology of primers
The primer is complete (proposition which is syntactically complete)
Case 1: the primer contains a numeral or a linguistic clue such as "categories", "types", "following", etc.
Snowshoes today are divided into three types: aerobic/running (small and light; not intended for backcountry use); recreational (a bit larger; meant for use in gentle-to-moderate walks of 3–5 miles (4.8–8.0 km)); and mountaineering (the largest, meant for serious hill-climbing, long-distance trips and off-trail use).
root of the tree > term which co-occurs with this marker
relation > is-a
Case 2: the primer contains no such marker
Some shoes are exclusively worn by women:
- pumps
- Stiletto heels
- Ballet flats
root of the tree > subject or object ???
relation > is-a
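The typology of primers can be read as a small decision procedure. The Python sketch below is a hedged illustration only: the function names and the list of numeral words are assumptions, and the flag `is_complete` is supplied by the caller because deciding whether a proposition is syntactically complete would require a parser that this example does not include. Only the clue words "categories", "types" and "following" come from the slide.

```python
import re

# Clue words quoted on the slide; the numeral list is an assumption of this sketch.
CLUE_WORDS = {"categories", "types", "following"}
NUMERAL_WORDS = {"two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"}

def has_marker(primer):
    """True if the primer contains a numeral or one of the clue words."""
    tokens = re.findall(r"[a-z]+|\d+", primer.lower())
    return any(t.isdigit() or t in CLUE_WORDS or t in NUMERAL_WORDS for t in tokens)

def primer_type(primer, is_complete):
    """Classify a primer; is_complete must come from a syntactic analysis."""
    if not is_complete:
        return "incomplete"
    if has_marker(primer):
        return "complete with marker"
    return "complete without marker"

print(primer_type("Women's dress shoes :", is_complete=False))
print(primer_type("Snowshoes today are divided into three types:", is_complete=True))
print(primer_type("Some shoes are exclusively worn by women :", is_complete=True))
```

Once the type is known, the root of the tree and the relation can be chosen with the rules listed on the two typology slides.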

28 Application
Enrichment of the OntoTopo ontology (ANR-07-MDCO-005)
Domain: map data description – geography
Resources:

29 Application
Source ontology: 728 concepts
Wikipedia: 402 pages (183 Enumerative Structures)
Module for annotating: 383 Parallel ES
Type of primer: Rate
- Incomplete: 27%
- Complete with marker: 54%
- Complete without marker: 18%

30 Application
Source ontology: 728 concepts
Wikipedia: 402 pages (183 ES)
Module for annotating: 383 PES
Module for extracting hierarchical structures: ~300 Hierarchical Structures
Result: ~400 new concepts, ~300 new instances

31 Conclusion
Results:
- PES are increasingly found in electronic documents
- Additional tools based on the layout are possible
- A translation process by successive annotations exists; this process depends on the type of the primer
- It can dramatically improve an ontology
Perspectives:
- Combine structure with the lexicon and syntax for ontology learning
- Tackle more complex grammatical constructions and spelling variations
- Improve ontology enrichment with hierarchical structures

