FNERC (towards final version v.3) Edinburgh, March 2002.


1 FNERC (towards final version v.3) Edinburgh, March 2002

2 >lingway█ Edinburgh meeting
Table of Contents
>FNERC V.2 (recap)
>FNERC V.3
>Improvements
>Application to 2nd Domain
>Machine learning

3 FNERC V2 Overview

4 FNERC V2: Evaluation (Zoning; RE + Context)

5 FNERC: name matching / normalization
>Name matching consists of matching co-referential NE, NUMEX and TIMEX expressions within the same product description
>FNERC: use of a value “attribute” that is added during the FNERC module
>Example: if a PROCESSOR is annotated twice in the same product description (say, Intel PIII and Intel Pentium III), both annotations carry the same value Id, so the module adds only one entry when filling the NE – PROCESSOR slot
>For normalization, the same value Id is used to fill the slot with the first synonym listed in the ontology
>Run with an XSLT stylesheet against the XHTML input file
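The name matching and normalization logic on this slide can be sketched in a few lines. The slide states that the real module runs as an XSLT stylesheet over XHTML; the Python below only illustrates the logic, and the ontology contents, the value Ids (`proc-p3`, `proc-celeron`) and the function name `fill_slot` are all hypothetical.

```python
from collections import OrderedDict

# Hypothetical mini-ontology: value Id -> synonyms.
# By assumption, the first synonym is the normalized (canonical) form.
ONTOLOGY = {
    "proc-p3": ["Intel Pentium III", "Intel PIII", "Pentium III"],
    "proc-celeron": ["Intel Celeron", "Celeron"],
}

# Reverse index: lowercased surface form -> value Id.
SURFACE_TO_ID = {
    synonym.lower(): value_id
    for value_id, synonyms in ONTOLOGY.items()
    for synonym in synonyms
}


def fill_slot(mentions):
    """Return the normalized slot fillers for the mentions of one product description.

    Mentions that share a value Id are treated as co-referential, so each
    value Id contributes a single filler: the first synonym in the ontology.
    """
    fillers = OrderedDict()
    for mention in mentions:
        value_id = SURFACE_TO_ID.get(mention.lower())
        if value_id is None:
            continue  # surface form unknown to the ontology: skipped in this sketch
        fillers.setdefault(value_id, ONTOLOGY[value_id][0])
    return list(fillers.values())


print(fill_slot(["Intel PIII", "Intel Pentium III"]))  # ['Intel Pentium III']
```

Keying the slot on the value Id rather than on the surface string is what makes both deduplication and normalization fall out of the same lookup.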

6 FNERC V3: improvements
>Conclusions from V2
>Zoning (1st and 2nd domain)
>Adding contextual rules (1st domain)?

7 FNERC: 2nd Domain
>Ontology matters
>Location: Country, Region, City
>Employer organization: non-profit, government body, public, private
>Background knowledge: education, language, skill
>Job categories
>Contract
>Job Title
>Department
>Lexicons

8 FNERC: 2nd Domain
>NERC adaptation:
>No sentence tokenization needed (no entity at the sentence scale)
>LgXmlsegmenter for zoning (allows empty tags to be declared)
>Rules: lists, regular expressions, context

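9 FNERC: 2nd Domain
>Location: not a necessary feature
>Country: lists + patterns (Pays : France / Dans toute l’Europe) [Country: France / Throughout Europe]
>Region: lists + patterns (Région parisienne) [Paris region]
>City: lists + patterns (Ville : Boulogne / lieu de travail : Arcueil (94)) [City: Boulogne / place of work: Arcueil (94)]
>Miscellaneous:
–(92): area code
–Situation géographique : Poste basé à Toulouse, déplacements occasionnels à l'étranger. [Location: position based in Toulouse, occasional travel abroad.]
>Employer organization: leader / Ecrivez nous à [leader / Write to us at]
>Generic patterns for MAJMIN: Dans le cadre de son développement, Cybion recherche … / Cybion, leader français de la veille et de l'intelligence / Ecrivez nous à : SOCIETEL, 13 rue des forêts / Illicom recrute ! [As part of its development, Cybion is looking for … / Cybion, French leader in monitoring and intelligence / Write to us at: SOCIETEL, 13 rue des forêts / Illicom is hiring!]
>Specific patterns: Organisation des nations Unies, Compagnie Française du pétrole [United Nations Organization, French Petroleum Company]
>Other:
–« grand groupe bancaire » [“large banking group”]
–Nous recherchons pour une importante société des Réseaux et des Telecom basée en IDF [We are recruiting for a major networks and telecom company based in Île-de-France]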
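The "lists + patterns" combination for City can be illustrated as follows. This is a minimal sketch, not the FNERC rule formalism (which the slides do not show): the gazetteer `CITIES`, the regular expression and the function name `find_cities` are assumptions built only from the three City examples quoted on the slide.

```python
import re

# Hypothetical city list (gazetteer); a real lexicon would be far larger.
CITIES = {"Boulogne", "Arcueil", "Toulouse"}

# Cue phrases taken from the slide's examples, matched case-insensitively,
# followed by a capitalized word and an optional two-digit area code.
CITY_PATTERN = re.compile(
    r"(?i:ville\s*:|lieu\s+de\s+travail\s*:|basée?\s+à)\s*"
    r"(?P<city>[A-ZÀ-ÖØ-Ý][\w-]*)"
    r"(?:\s*\((?P<dept>\d{2})\))?"
)


def find_cities(text):
    """Return pattern hits, flagging whether each one is confirmed by the list."""
    hits = []
    for match in CITY_PATTERN.finditer(text):
        city = match.group("city")
        hits.append(
            {"city": city, "dept": match.group("dept"), "in_list": city in CITIES}
        )
    return hits


print(find_cities("lieu de travail : Arcueil (94)"))
# [{'city': 'Arcueil', 'dept': '94', 'in_list': True}]
```

Keeping the pattern hit and the list lookup separate lets a pattern propose cities that are missing from the lexicon, which can then be reviewed by hand.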

10 FNERC: 2nd Domain
>Background knowledge:
>education: lists and patterns (Formation: bac + 4/5 / Formation BTS/DUT / Formation: Economie/Gestion, Sciences, Documentation) [Education: baccalauréat + 4/5 years / BTS/DUT diploma / Economics/Management, Sciences, Documentation]
>language: lists and patterns (langues requises: / bilingue anglais-japonais) [required languages: / bilingual English-Japanese]
>skill: lists and patterns (connaissances techniques: / Maîtrise de Word, Excel et Internet nécessaire / Ingénieur réseaux confirmé (Novell, MC2, MCP)) [technical knowledge: / proficiency in Word, Excel and Internet required / experienced network engineer (Novell, MC2, MCP)]
>Job categories:
>mapping with Job Title?
>Contract: lists
>Job Title: lists + layout
>Straightforward: “Titre : Administrateurs Systèmes & Réseaux” [Title: Systems & Network Administrators]
>Specific size and font layout
>Redundancy of structure: B1_illicom_1.html
>Department: patterns
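A rough sketch of how the education and Job Title examples above could be captured with lists and regular expressions. The diploma list, the regular expressions and the function names are illustrative assumptions; the layout-based Job Title cues (font size, redundancy of structure) are not covered here.

```python
import re

# Hypothetical diploma list, limited to the two diplomas named on the slide.
DIPLOMAS = ["BTS", "DUT"]

# "bac + 4/5" style levels: years of study after the baccalauréat.
BAC_LEVEL = re.compile(r"bac\s*\+\s*(\d)(?:\s*/\s*(\d))?", re.IGNORECASE)

# The "straightforward" Job Title case: an explicit "Titre :" label.
TITLE_LINE = re.compile(r"^\s*Titre\s*:\s*(.+?)\s*$", re.IGNORECASE | re.MULTILINE)


def education_levels(text):
    """Return (min_years, max_years) pairs for each 'bac + N' or 'bac + N/M' mention."""
    levels = []
    for match in BAC_LEVEL.finditer(text):
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) else low
        levels.append((low, high))
    return levels


def diplomas(text):
    """Return the listed diplomas that occur as whole words in the text."""
    return [d for d in DIPLOMAS if re.search(rf"\b{re.escape(d)}\b", text)]


def job_title(text):
    """Return the text following a 'Titre :' label, or None if there is no label."""
    match = TITLE_LINE.search(text)
    return match.group(1) if match else None


print(education_levels("Formation: bac + 4/5"))                  # [(4, 5)]
print(diplomas("Formation BTS/DUT"))                             # ['BTS', 'DUT']
print(job_title("Titre : Administrateurs Systèmes & Réseaux"))   # Administrateurs Systèmes & Réseaux
```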

11 NERC V3: adaptation to a new domain
>Adaptation:
>machine learning techniques
>human customization of rules

12 Machine learning and NERC V3
>Goal: help with writing the rules for a new domain
>Approach:
>3 spaces (left, entity, right)
>Positive and negative examples
>Rule induction (iterative)
>References:
>Markus Junker, Michael Sintek, and Matthias Rinck: Learning for Text Categorization and Information Extraction with ILP
>Dayne Freitag: Toward General-Purpose Learning for Information Extraction
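One possible way to represent a training example as the three spaces (left, entity, right) with a positive or negative label is sketched below. The window width of 3 is an assumption suggested by the positions 1 to 3 used on the "Types of rules" slide; the `Example` class, the `<pad>` token and the sample tokens (including "exigé") are made up for illustration.

```python
from dataclasses import dataclass

PAD = "<pad>"  # hypothetical filler for windows that run past the text boundary


@dataclass(frozen=True)
class Example:
    left: tuple      # tokens immediately before the candidate entity
    entity: tuple    # tokens of the candidate entity itself
    right: tuple     # tokens immediately after the candidate entity
    positive: bool   # True if the candidate is a correct entity, False otherwise


def make_example(tokens, start, end, positive, width=3):
    """Build the three spaces for the candidate span tokens[start:end]."""
    padding = (PAD,) * width
    padded = padding + tuple(tokens) + padding
    s, e = start + width, end + width  # span indices shifted by the left padding
    return Example(
        left=padded[s - width:s],
        entity=padded[s:e],
        right=padded[e:e + width],
        positive=positive,
    )


tokens = ["Formation", ":", "bac", "+", "4/5", "exigé"]
example = make_example(tokens, 2, 5, positive=True)
print(example.left)    # ('<pad>', 'Formation', ':')
print(example.entity)  # ('bac', '+', '4/5')
print(example.right)   # ('exigé', '<pad>', '<pad>')
```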

13 Example 1

14 Example 1 (representation)

15 Types of rules (left)
>Word in position 3
>Bigram in positions 2, 3
>Trigram
>Word (position 1, 2 or 3)
>Bigram in positions (1, 2)
>Same for properties
>Combination of word + properties
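The word-based rule types listed on this slide can be enumerated mechanically from a left context. The sketch below rests on two assumptions that the slide does not confirm: the left context is a window of three tokens numbered 1 to 3 from left to right (so position 3 is adjacent to the entity), and "Word (position 1, 2 or 3)" means the word may appear anywhere in the window. Property-based rules are left out because the slides do not define the properties; the rule encoding and the function names are hypothetical.

```python
def left_candidates(left):
    """List candidate rules for a 3-token left window, in the slide's order."""
    w1, w2, w3 = left
    rules = [
        ("word@3", w3),
        ("bigram@2,3", (w2, w3)),
        ("trigram", (w1, w2, w3)),
    ]
    # One position-free rule per distinct word, keeping first-seen order.
    rules += [("word@any", word) for word in dict.fromkeys(left)]
    rules.append(("bigram@1,2", (w1, w2)))
    return rules


def matches(rule, left):
    """Check whether a candidate rule fires on a 3-token left window."""
    kind, value = rule
    w1, w2, w3 = left
    if kind == "word@3":
        return w3 == value
    if kind == "bigram@2,3":
        return (w2, w3) == value
    if kind == "trigram":
        return (w1, w2, w3) == value
    if kind == "word@any":
        return value in left
    if kind == "bigram@1,2":
        return (w1, w2) == value
    raise ValueError(f"unknown rule type: {kind}")


window = ("niveau", "de", "formation")
for rule in left_candidates(window):
    print(rule)
```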

16 Example: 1st iteration
Rule (left) = "formation" in position 3
Rest: next rule (left) = "Niveau"
etc.
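The iteration described here (pick a rule, set aside what it covers, repeat on the rest) resembles a greedy covering loop, sketched below. The slides do not say how rules are scored or when induction stops, so the precision threshold, the coverage-based selection, the restriction to word-at-position rules, the lowercased tokens and the toy data are all assumptions; this is not the ILP method of the cited references.

```python
def candidates(left):
    """Word-at-position rules for one left window (position 3 is next to the entity)."""
    return {(pos, word) for pos, word in enumerate(left, start=1)}


def covers(rule, left):
    pos, word = rule
    return left[pos - 1] == word


def induce_rules(examples, min_precision=1.0):
    """Greedy covering over (left_window, is_positive) pairs.

    Each iteration keeps the rule that covers the most remaining positives
    while meeting the precision threshold, then removes those positives.
    Returns the ordered rules and the positives that no rule covers.
    """
    remaining = [left for left, is_pos in examples if is_pos]
    negatives = [left for left, is_pos in examples if not is_pos]
    rules = []
    while remaining:
        best, best_tp = None, 0
        pool = set().union(*(candidates(left) for left in remaining))
        for rule in sorted(pool):  # sorted for a deterministic tie-break
            tp = sum(covers(rule, left) for left in remaining)
            fp = sum(covers(rule, left) for left in negatives)
            if tp / (tp + fp) >= min_precision and tp > best_tp:
                best, best_tp = rule, tp
        if best is None:
            break  # no acceptable rule left: the rest goes to the expert
        rules.append(best)
        remaining = [left for left in remaining if not covers(best, left)]
    return rules, remaining


PAD = "<pad>"
# Toy data, invented for illustration only.
data = [
    ((PAD, PAD, "formation"), True),
    (("profil", ":", "formation"), True),
    (("une", "solide", "formation"), True),
    ((PAD, PAD, "niveau"), True),
    (("profil", ":", "niveau"), True),
    ((PAD, PAD, ":"), True),
    ((PAD, PAD, ":"), False),
    (("profil", ":", "autonome"), False),
]
rules, uncovered = induce_rules(data)
print(rules)      # [(3, 'formation'), (3, 'niveau')]
print(uncovered)  # [('<pad>', '<pad>', ':')]
```

On this toy data the first iteration selects "formation" in position 3 and the second selects "niveau", while the positive that shares its whole window with a negative stays uncovered and would be handed to the expert.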

17 Result = input to the expert
>A set of (evaluated) rules
>A first (evaluated) system
>A set of cases not covered by the rules

18 FNERC V3 Schedule
>First results: end of March
>Final version and evaluation: mid-April
>Final report for D2.4: end of April

