JavaNLP time annotations


1 SUTime
JavaNLP time annotations

2 What does SUTime do?
Similar to GUTime
Recognizes time expressions using patterns
  Deterministic, based on regular expression patterns
  Greedy (picks the longest sequence of tokens that may represent a time expression)
Normalizes time expressions
  Annotations follow the TimeML TIMEX3 standard
  XSD:
  Extensions for time expressions that are not supported by the TIMEX3 standard
Resolves relative times with respect to a reference date
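The "deterministic, greedy" behaviour can be illustrated with a small sketch. This is not SUTime's rule set: the class name GreedyTimeMatcher and its three month-based patterns are made up for illustration, and matching here is on characters rather than tokens. At each position every pattern is tried and the longest match wins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy recognizer (hypothetical, not SUTime's rules): regex patterns, longest match wins. */
final class GreedyTimeMatcher {
    private static final String MONTH =
        "(?:January|February|March|April|May|June|July|August"
        + "|September|October|November|December)";

    // Illustrative patterns only: "March", "March 3", "March 3, 2011".
    private static final List<Pattern> PATTERNS = List.of(
        Pattern.compile("\\b" + MONTH + "\\b"),
        Pattern.compile("\\b" + MONTH + "\\s+\\d{1,2}\\b"),
        Pattern.compile("\\b" + MONTH + "\\s+\\d{1,2},\\s+\\d{4}\\b"));

    /** Scans left to right; at each position keeps the longest match of any pattern. */
    static List<String> findTimeExpressions(String text) {
        List<String> found = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int bestEnd = -1;
            for (Pattern p : PATTERNS) {
                Matcher m = p.matcher(text);
                m.region(pos, text.length());
                // Let \b see the characters outside the region.
                m.useTransparentBounds(true);
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestEnd = m.end();
                }
            }
            if (bestEnd > pos) {
                found.add(text.substring(pos, bestEnd));
                pos = bestEnd;   // continue after the match, so matches never overlap
            } else {
                pos++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findTimeExpressions("The meeting moved from March 3 to June 12, 2011."));
    }
}
```

Because the longest candidate is always taken, "June 12, 2011" is reported as one expression rather than as "June" or "June 12".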

3 SUTime Time Representation
Main temporal types
  Time - An instance in time ( ), can be partially specified (Friday), with limited granularity
  Duration - A length of time (3 days)
  Range - Time interval with start and end points
  Set - A set of temporals
    Periodic sets: Every Friday

4 SUTime Representation
Standard dates and times (in years, months, days, day of week, hours, minutes, seconds, milliseconds)
Common times: seasons (e.g. winter), time of day (e.g. morning), weekend
Partial times (June => XXXX-06)
Relative time (last week)
Duration
  Exact durations (specified in milliseconds or in fields)
  Inexact durations (a few years => PXY)
  Duration ranges (2 to 3 months => P2M/P3M)
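The duration values above use ISO 8601-style notation, so a consumer can read exact date-based ones with the standard library. A minimal sketch, assuming a range is written LOWER/UPPER as in P2M/P3M and that an X marks an unknown quantity as in PXY; the class DurationRangeSketch is hypothetical and is not part of SUTime.

```java
import java.time.Period;

/** Hypothetical helper for reading duration values such as "P3D" or "P2M/P3M". */
final class DurationRangeSketch {

    /** Splits a duration range such as "P2M/P3M" into its lower and upper bounds. */
    static Period[] parseRange(String value) {
        String[] parts = value.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected LOWER/UPPER, got: " + value);
        }
        return new Period[] { Period.parse(parts[0]), Period.parse(parts[1]) };
    }

    /** Inexact durations such as "PXY" carry an X placeholder and cannot be parsed as a Period. */
    static boolean isExact(String value) {
        return !value.contains("X");
    }

    public static void main(String[] args) {
        Period[] range = parseRange("P2M/P3M");
        System.out.println(range[0].getMonths() + " to " + range[1].getMonths() + " months");
        System.out.println("PXY exact? " + isExact("PXY"));
    }
}
```

java.time.Period only covers year, month, week and day fields; durations with a time part (hours, minutes, seconds) would need java.time.Duration instead.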

5 SUTime Limitations
Holidays are not supported
Support for ranges is poor
  "from 3 to 4 p.m." is identified as 15:57:00
  "12-13 March 2011" (12-13 is ignored)
Resolving relative expressions with respect to the given reference date can be problematic
Handling of ambiguous phrases is poor
  Some common words (e.g. spring/fall) are always identified as a temporal expression
Patterns are language (English) specific

6 SUTime Usage
TimeAnnotator
  TimeAnnotator timeAnnotator = new TimeAnnotator("sutime", properties);
  Properties: Specifies SUTime options (prefixed by "sutime.")
Pipeline
  TimeAnnotator should come after the tokenizer, sentence splitter, and POS tagger
  Optional (also before): NER or NumberAnnotator/QuantifiableEntityNormalizingAnnotator

7 SUTime Options
Property: Description
sutime.markTimeRanges: Whether time ranges should be marked (e.g. if markTimeRanges is true, July to August => range). Default = false.
sutime.includeNested: Whether nested time expressions should be included (e.g. if markTimeRanges is true, July to August => range; if includeNested is also true, both July and August will be marked as time expressions as well). Default = false.
sutime.teRelHeurLevel: Heuristics for determining how to resolve relative time
  NONE = no heuristics (default) (refdate = , Friday => )
  BASIC = basic heuristics taking into account past tense (refdate = , It happened Friday => )
  MORE = more heuristics with since/until
sutime.includeRange: Whether range attributes should be included in the TIMEX3 XML output. Default = false.
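As a sketch, the options above can be collected in a java.util.Properties object using the "sutime." prefix. The property names are the ones listed on this slide; the values chosen here and the class name SUTimeOptionsSketch are only an example.

```java
import java.util.Properties;

/** Hypothetical holder that builds the option set described on the slide. */
final class SUTimeOptionsSketch {

    static Properties buildOptions() {
        Properties props = new Properties();
        // Mark "July to August" as a range (default is false).
        props.setProperty("sutime.markTimeRanges", "true");
        // Also keep the nested expressions "July" and "August" (default is false).
        props.setProperty("sutime.includeNested", "true");
        // NONE (default), BASIC, or MORE.
        props.setProperty("sutime.teRelHeurLevel", "BASIC");
        // Include range attributes in the TIMEX3 XML output (default is false).
        props.setProperty("sutime.includeRange", "true");
        return props;
    }

    public static void main(String[] args) {
        // Per the usage slide, these would be passed on as:
        //   new TimeAnnotator("sutime", buildOptions());
        buildOptions().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```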

8 SUTime input annotations
DocDateAnnotation (String): If present, the string is interpreted as a date/time and used as the reference document date with respect to which other temporal expressions are resolved.
SentencesAnnotation (List<CoreMap>): If present, time expressions will be extracted from each sentence and each sentence will be annotated individually.
TokensAnnotation (List<CoreLabel>): Required either at the entire annotation level or per sentence level.
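To see why a reference document date matters, consider a bare weekday such as "Friday". The sketch below is not SUTime's algorithm: it only shows, with java.time, how the same word resolves to different dates depending on the reference date and on whether the sentence is in the past tense. The class RelativeDaySketch, the ISO-formatted date string, and the rule "past tense picks the preceding Friday" are all assumptions made for illustration.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

/** Hypothetical illustration of resolving a partially specified time against a reference date. */
final class RelativeDaySketch {

    /**
     * Resolves a weekday against the document date (assumed here to be an ISO string, yyyy-MM-dd).
     * pastTense = true picks the closest such day on or before the reference date,
     * otherwise the closest one on or after it.
     */
    static LocalDate resolve(DayOfWeek day, String docDate, boolean pastTense) {
        LocalDate ref = LocalDate.parse(docDate);
        return pastTense
            ? ref.with(TemporalAdjusters.previousOrSame(day))
            : ref.with(TemporalAdjusters.nextOrSame(day));
    }

    public static void main(String[] args) {
        // 2011-03-16 is a Wednesday.
        System.out.println(resolve(DayOfWeek.FRIDAY, "2011-03-16", false)); // following Friday
        System.out.println(resolve(DayOfWeek.FRIDAY, "2011-03-16", true));  // preceding Friday
    }
}
```

Without any reference date, "Friday" can only be kept as a partially specified time.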

9 SUTime output annotations
Timex.Annotations (List<CoreMap>): List of time expressions (each a CoreMap)
  On the entire annotation and also for each sentence
Time annotations (for each time expression/CoreMap)
  Annotation: Description
  Timex.Annotation: Timex object with TIMEX3 XML attributes. Use for exporting TIMEX3 information.
  TimeExpression.Annotation: TimeExpression object. Use getTemporal() to get the internal temporal representation.
  TimeExpression.ChildrenAnnotation (List<CoreMap>): List of chunks forming this time expression (inner chunks can be tokens, nested time expressions, numeric expressions, etc.)

10 SUTime output annotations
Standard annotations (for each time expression)
  Annotation: Description
  TextAnnotation (String): Text of this time expression.
  TokensAnnotation (List<CoreLabel>): Tokens that make up this time expression.
  CharacterOffsetBeginAnnotation (Integer): The index of the first character of this time expression.
  CharacterOffsetEndAnnotation (Integer): The index of the first character after this time expression.
  TokenBeginAnnotation (Integer): The index of the first token of this time expression.
  TokenEndAnnotation (Integer): The index of the first token after this time expression.
Note: Indices are 0-based, and always relative to the original annotation. Begin indices are inclusive, end indices are exclusive.
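The index convention in the note (0-based, begin inclusive, end exclusive) is the same one String.substring uses, so character offsets can be applied to the original text directly. A small sketch with a hypothetical helper class, OffsetConventionSketch, and a made-up sentence:

```java
/** Hypothetical helper showing the 0-based, half-open [begin, end) offset convention. */
final class OffsetConventionSketch {

    /** Returns {begin, end} for the first occurrence of the expression in the text. */
    static int[] spanOf(String text, String expression) {
        int begin = text.indexOf(expression);
        if (begin < 0) {
            throw new IllegalArgumentException("Expression not found: " + expression);
        }
        // End is exclusive: the index of the first character after the expression.
        return new int[] { begin, begin + expression.length() };
    }

    /** Recovers the covered text from a half-open span. */
    static String covered(String text, int begin, int end) {
        return text.substring(begin, end);
    }

    public static void main(String[] args) {
        String text = "It happened last week.";
        int[] span = spanOf(text, "last week");
        System.out.println(span[0] + ".." + span[1] + " -> " + covered(text, span[0], span[1]));
    }
}
```

With this convention the length of an expression is simply end minus begin, and adjacent spans never share an index.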

11 Comparison with GUTime
Language
  SUTime: Java
  GUTime: Perl
Timex
  SUTime: TIMEX3 with extensions
  GUTime: TIMEX3 tag, but mostly follows ACE TIMEX2 (extension of TempEx)
Demo
Comments
  SUTime: No support for holidays. Limited support for ranges and ambiguous phrases.
  GUTime: Some support for holidays. No support for ranges, poor support for years that are written out.
TempEval2 (English Test)
  SUTime: Time expression identification: P=0.89, R=0.94, F1=0.91. Attribute accuracy: Type=0.94, Value=0.72
  GUTime: Time expression identification: P=0.89, R=0.79, F1=0.84. Attribute accuracy: Type=0.95, Value=0.68

12 SUTime and GUTime examples
Date
  SUTime: <TIMEX3 tid="t1" value=" " type="DATE">October of 1963</TIMEX3>
  GUTime: <TIMEX3 tid="t1" TYPE="DATE" VAL="196310">October of 1963</TIMEX3>
Duration
  <TIMEX3 tid="t1" TYPE="DURATION" VAL="P56Y">fifty six years</TIMEX3>
Set
  SUTime: <TIMEX3 tid="t1" value="XXXX-WXX-7" type="SET" quant="every third" periodicity="P3W">Every third Sunday</TIMEX3>
  GUTime: <TIMEX3 tid="t1" TYPE="DATE" SET="YES" VAL="XXXXWXX-0" PERIODICITY="F3W" GRANULARITY="G1D">Every third Sunday</TIMEX3>
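Tags like these are plain XML, so their attributes can be read with the JDK's DOM parser. In the examples shown, SUTime writes lower-case attribute names (value, type) while GUTime writes upper-case ones (VAL, TYPE), and XML attribute names are case-sensitive. The sketch below, with a hypothetical class Timex3AttributeSketch, tries several spellings in turn; it is one possible way to consume the output and is not part of either tool.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical reader for a single TIMEX3 tag. */
final class Timex3AttributeSketch {

    /** Parses one TIMEX3 element from a string. */
    static Element parse(String timexXml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(timexXml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + timexXml, e);
        }
    }

    /** Returns the first non-empty attribute among the given names, or "" if none is set. */
    static String attr(Element timex, String... names) {
        for (String name : names) {
            String v = timex.getAttribute(name);   // "" when the attribute is absent
            if (!v.isEmpty()) {
                return v;
            }
        }
        return "";
    }

    public static void main(String[] args) {
        Element set = parse("<TIMEX3 tid=\"t1\" value=\"XXXX-WXX-7\" type=\"SET\" "
            + "quant=\"every third\" periodicity=\"P3W\">Every third Sunday</TIMEX3>");
        System.out.println(set.getTextContent() + " -> " + attr(set, "value", "VAL")
            + " (" + attr(set, "type", "TYPE") + ")");
    }
}
```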

13 Examples (GUTime unsupported)
Time
  SUTime: <TIMEX3 tid="t1" value=" T17:05:00" type="TIME">5:05 in the afternoon</TIMEX3>
  GUTime: 5:05 in the afternoon
Date - Written out year
  SUTime: <TIMEX3 tid="t1" value="1994-WI" type="DATE">winter of nineteen ninety four</TIMEX3>
  GUTime: <TIMEX3 tid="t1" TYPE="DATE">winter</TIMEX3> of nineteen ninety four
Duration Range
  SUTime: <TIMEX3 tid="t1" alt_value="P2M/P3M" type="DURATION">two to three months</TIMEX3>
  GUTime: two to three months
Reference Date is

14 Examples (SUTime unsupported)
Holiday
  SUTime: last Christmas
  GUTime: <TIMEX3 tid="t1" TYPE="DATE" ALT_VAL=" ">last Christmas</TIMEX3>
Ambiguous words
  SUTime: The <TIMEX3 tid="t1" value="2011-SP" type="DATE">spring</TIMEX3> water was cool and refreshing.
  GUTime: The <TIMEX3 tid="t1" TYPE="DATE">spring</TIMEX3> water was cool and refreshing.
Reference Date is

