Dealing with Italian Temporal Expressions: the ITA-Chronos System Matteo Negri Fondazione Bruno Kessler - IRST, Trento - Italy EVALITA 2007.

1 Dealing with Italian Temporal Expressions: the ITA-Chronos System
Matteo Negri
Fondazione Bruno Kessler - IRST, Trento - Italy
EVALITA: Evaluation of NLP Tools for Italian
Rome - Italy, September 10, 2007

2 EVALITA /10/2007 M. Negri Dealing with Italian Temporal Expressions: the ITA-Chronos System
Outline
Chronos: a multilingual system for TE recognition/normalization
System description
Some examples
Results at EVALITA 2007

3 Chronos
Multilingual (ITA/ENG) tool for temporal expression (TE) recognition and normalization according to the TIMEX2 standard
Approach
–Rule-based system (ENG-Chronos: 1500 rules; ITA-Chronos: 981 rules)
–Six phases: Preprocessing, Detection, Bracketing, Information Gathering, Anchors Selection, Normalization
ENG-Chronos participated in TERN-04 with good results on the Recognition+Normalization Task
–Ranked 2nd, with 76% TERN-Value (best system: 78%)

4 ITA-Chronos: System Architecture
Plain Text
Tokenization, POS Tagging, Multiwords Recognition
Detection: Basic Tagging Rules
Bracketing: Composition Rules
Information Gathering: Tagging Rules for SET, Anchor_Dir, Anchor_Val, MOD, Type, T_Cat, Heur, Op, Quant, Val_Ext
Intermediate Annotation (Attributes)
Normalization: Anchors Selection, Dates Normalization
Tagged Text
Diagram groups: Detection and Bracketing; Normalization

5 STEP1: Preprocessing
The first phase of the process performs:
–Tokenization
–POS tagging
–Multiwords recognition
The preprocessed input text is then passed to the TE detection phase, where around 400 tagging rules are in charge of finding all the TEs it contains.

6 STEP2: Detection
Markable expressions are detected considering the presence of lexical triggers in the input text
–anno, oggi, Venerdì, Natale, quotidianamente, 10/09/2007, 1982, etc.
Basic Tagging Rules
–Regular expressions checking for: word senses, parts of speech, symbols, or words satisfying specific predicates
PATTERN: t1 t2 t3
t1[pos=E] t2[pos=N] t3[pred=TimeUnit-p]
OUTPUT: t1 t2 t3
Tagging rule matching with "Fra tre giorni"
…E = preposition
…N = numeral
…TimeUnit-p satisfied by: secondo, minuto, ora, giorno, settimana, mese, etc.
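The tagging rule on this slide can be sketched in Python as a sequence of per-token constraints. This is only an illustration, not the Chronos rule engine: the Token class, the TIME_UNITS lexicon (including the plural forms), and the detect function are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Hypothetical lexicon standing in for the TimeUnit-p predicate.
# Plural forms are added here so that "giorni" matches without a lemmatizer.
TIME_UNITS = {
    "secondo", "secondi", "minuto", "minuti", "ora", "ore",
    "giorno", "giorni", "settimana", "settimane", "mese", "mesi",
}


@dataclass
class Token:
    text: str
    pos: str  # "E" = preposition, "N" = numeral; other tags are placeholders


def is_time_unit(token):
    """Stand-in for the TimeUnit-p predicate."""
    return token.text.lower() in TIME_UNITS


# PATTERN t1[pos=E] t2[pos=N] t3[pred=TimeUnit-p], one check per token.
RULE = [
    lambda t: t.pos == "E",
    lambda t: t.pos == "N",
    is_time_unit,
]


def detect(tokens, rule=RULE):
    """Return (start, end) token spans, end exclusive, where the rule matches."""
    width = len(rule)
    spans = []
    for i in range(len(tokens) - width + 1):
        window = tokens[i:i + width]
        if all(check(tok) for check, tok in zip(rule, window)):
            spans.append((i, i + width))
    return spans


tokens = [Token("Fra", "E"), Token("tre", "N"), Token("giorni", "S"), Token("parto", "V")]
spans = detect(tokens)  # the first three tokens form one candidate TE
```

A real rule set would also test word senses and symbols, as the slide notes; those checks would be further entries in the constraint list.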

7 STEP3: Bracketing
Considers the context surrounding the detected triggers
–inizio, fine, prima, dopo, fa, successivo, precedente, durante, circa, almeno, 3, sesto, etc.
Composition rules:
–In charge of handling conflicts between possible multiple taggings (e.g. when a recognized TE contains, overlaps, or is adjacent to one or more detected TEs)
PATTERN: T-EXP1 T-EXP2
T-EXP1[start = n] [end = m] T-EXP2[start = no
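The composition rule above is cut off in this transcript, so the following Python sketch does not reproduce it. It only illustrates one simple way to resolve the conflicts the slide lists (containment, overlap, adjacency), under the assumption that conflicting spans are merged into a single TE. The function name compose and the end-exclusive span convention are hypothetical.

```python
def compose(spans):
    """Merge spans that contain, overlap, or touch each other.

    Spans are (start, end) token offsets with end exclusive.
    Merging unconditionally is an assumption made for illustration;
    the actual composition rules may decide case by case.
    """
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Contained, overlapping, or adjacent: extend the previous span.
            previous_start, previous_end = merged[-1]
            merged[-1] = (previous_start, max(previous_end, end))
        else:
            merged.append((start, end))
    return merged


# (0, 2) and (2, 4) are adjacent; (6, 7) is contained in (5, 8).
result = compose([(5, 8), (0, 2), (2, 4), (6, 7)])
```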

8 STEP4: Information gathering
Goal: mine relevant information for normalization
Considers triggers+context to assign values to
–TIMEX2 attributes (e.g. SET, MOD, ANCHOR_DIR)
–TEMPORARY attributes (e.g. Type, T_Cat, Heur, Op, Quant)
This is done by running separate sets of specialized tagging rules
Such information is stored in the Intermediate Annotation, and input to the normalization component

9 Information Gathering: Example
TIMEX2 attributes
MOD: più di, circa, oltre …
SET: ogni, tutti …
ANCHOR_DIR: prima, durante, dopo...
TEMPORARY attributes
type: [T-ABS | T-REL]
t-cat: [second, minute, hour, day,…]
op: [=, +, -]
quant: [n 0]
heur: [CR-DATE | PR-DATE]

17 Information Gathering: Example
Detected TE: oltre tre anni dopo
TIMEX2 attributes
MOD: più di, circa, oltre … → MORE_THAN
SET: ogni, tutti …
ANCHOR_DIR: prima, durante, dopo... → ENDING
TEMPORARY attributes
type: [T-ABS | T-REL] → T-REL
t-cat: [second, minute, hour, day,…] → YEAR
op: [=, +, -] → +
quant: [n 0] → 3
heur: [CR-DATE | PR-DATE] → PR-DATE
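The attribute assignment for "oltre tre anni dopo" can be sketched in Python as a set of trigger lookups. The lexicons below are hypothetical and contain only a few entries; the real system uses separate sets of tagging rules that also look at context. Setting heur to PR-DATE whenever an operator is found is a simplification that happens to fit this example.

```python
# Hypothetical trigger lexicons, reduced to what the example needs.
MOD_TRIGGERS = {"oltre": "MORE_THAN", "circa": "APPROX"}
ANCHOR_DIR_TRIGGERS = {"dopo": "ENDING"}  # value taken from the slide
OP_TRIGGERS = {"dopo": "+", "prima": "-"}
NUMERALS = {"uno": 1, "due": 2, "tre": 3, "quattro": 4}
T_CAT_TRIGGERS = {
    "giorno": "DAY", "giorni": "DAY",
    "mese": "MONTH", "mesi": "MONTH",
    "anno": "YEAR", "anni": "YEAR",
}


def gather(te_text):
    """Collect TIMEX2 and temporary attributes for one detected TE."""
    attrs = {}
    for word in te_text.lower().split():
        if word in MOD_TRIGGERS:
            attrs["MOD"] = MOD_TRIGGERS[word]
        if word in ANCHOR_DIR_TRIGGERS:
            attrs["ANCHOR_DIR"] = ANCHOR_DIR_TRIGGERS[word]
        if word in OP_TRIGGERS:
            attrs["op"] = OP_TRIGGERS[word]
        if word in NUMERALS:
            attrs["quant"] = NUMERALS[word]
        elif word.isdigit():
            attrs["quant"] = int(word)
        if word in T_CAT_TRIGGERS:
            attrs["t-cat"] = T_CAT_TRIGGERS[word]
    if "op" in attrs:
        # Simplification: an operator marks the TE as relative and
        # anchored to a previously detected date.
        attrs["type"] = "T-REL"
        attrs["heur"] = "PR-DATE"
    return attrs


annotation = gather("oltre tre anni dopo")
```

The resulting dictionary plays the role of one entry in the Intermediate Annotation that is passed on to normalization.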

18 Intermediate Annotation: Example
Plain Text (adige _id): …Così il 31 Luglio del 2002, quindi oltre tre anni dopo l'incidente, il giovane venne nuovamente ricoverato e sottoposto ad un intervento che si dimostrerà risolutivo…
Detection and Bracketing: …quindi oltre tre anni dopo l'incidente…
Intermediate Annotation

19 STEP5: Anchors Selection
Goal: connect each detected T-REL to an appropriate anchor date
–While the meaning of T-ABSs (13 Marzo 2005) is context-independent, T-RELs (tre anni dopo) can only be interpreted with respect to a reference TE
The heur attribute is used for this purpose
–2 heuristics:
CR-DATE: connects a T-REL to the document's creation date (found at the beginning of the doc, or induced from the doc's name, e.g. adige _…)
PR-DATE: connects a T-REL to the nearest detected TE with a compatible granularity (a t-cat with at least the same degree of specificity)
t-cat= month month, week, day, century
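The PR-DATE heuristic can be sketched in Python as follows. Several points are assumptions made for illustration and are not confirmed by the slides: the linear SPECIFICITY ordering, the restriction to TEs that precede the T-REL, and the dictionary layout with pos and t_cat keys.

```python
# Assumed ordering from least to most specific granularity.
SPECIFICITY = {
    name: rank
    for rank, name in enumerate(
        ["century", "year", "month", "week", "day", "hour", "minute", "second"]
    )
}


def select_anchor(t_rel, candidates):
    """Pick the nearest preceding TE whose t-cat is at least as specific.

    Each TE is a dict with a token position "pos" and a granularity "t_cat".
    Returns None when no compatible candidate exists.
    """
    needed = SPECIFICITY[t_rel["t_cat"]]
    compatible = [
        c for c in candidates
        if c["pos"] < t_rel["pos"] and SPECIFICITY[c["t_cat"]] >= needed
    ]
    # The nearest preceding TE is the one with the highest position.
    return max(compatible, key=lambda c: c["pos"], default=None)


candidates = [
    {"text": "31 Luglio del 2002", "t_cat": "day", "pos": 2},
    {"text": "XX secolo", "t_cat": "century", "pos": 8},  # too coarse for a year
]
t_rel = {"text": "oltre tre anni dopo", "t_cat": "year", "pos": 10}
anchor = select_anchor(t_rel, candidates)
```

Although the century expression is closer, it is skipped because its granularity is coarser than the year-level T-REL, so the full date is selected.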

20 STEP6: Dates Normalization
Goal: fill the VAL attribute of each detected TE
T-ABSs: regular expressions considering their surface form (1990s → 199)
T-RELs: rewriting rules considering
–the anchor (e.g. 2002)
–the operator (OP) to be applied (e.g. +)
–the quantity (QUANT) to be added/subtracted (e.g. 3)
Example: tre anni dopo
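Both normalization paths can be sketched in Python. The function names are hypothetical, the relative case handles only year-level arithmetic, and the anchor is assumed to be a VAL string that starts with a four-digit year; none of this is taken from the actual Chronos rules.

```python
import re


def normalize_decade(text):
    """T-ABS case: map a decade such as "1990s" to the VAL "199"."""
    match = re.fullmatch(r"(\d{3})0s", text)
    return match.group(1) if match else None


def normalize_relative_years(anchor_val, op, quant):
    """T-REL case: apply OP and QUANT (in years) to the anchor's year.

    anchor_val is assumed to start with a four-digit year, e.g. "2002"
    or "2002-07-31". The result is truncated to year granularity.
    """
    year = int(anchor_val[:4])
    if op == "+":
        year += quant
    elif op == "-":
        year -= quant
    # op "=" leaves the anchor year unchanged.
    return f"{year:04d}"


decade_val = normalize_decade("1990s")
# "tre anni dopo" anchored to 2002: 2002 + 3
relative_val = normalize_relative_years("2002-07-31", "+", 3)
```

Other granularities (months, days) would need calendar-aware arithmetic, which is left out of this sketch.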

21 ITA-Chronos at EVALITA 2007
Results over the EVALITA-07 test set (2715 computation time, ~50 words/sec)
Higher scores on MOD and SET attributes
–Activated by the presence of triggers that are easy to identify
Lower scores with ANCHOR_VAL and ANCHOR_DIR
–Require the analysis of a larger context, e.g. including verb tense
[Table: Value, Precision, Recall, F-Measure for Rec. and Rec.+Norm.]

22 Web Demo