JavaNLP time annotations SUTime

What does SUTime do?
- Similar to GUTime
- Recognizes time expressions using patterns
  - Deterministic, based on regular expression patterns
  - Greedy (picks the longest sequence of tokens that may represent a time expression)
- Normalizes time expressions
  - Annotations follow the TimeML TIMEX3 standard
  - XSD:
  - Extensions for time expressions that are not supported by the TIMEX3 standard
- Resolves relative times with respect to a reference date
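
The "deterministic, greedy" matching described on this slide can be sketched with plain `java.util.regex`. This is only a toy illustration under stated assumptions: the class name, the month list, and the four patterns are hypothetical and are not SUTime's actual rules. It shows the idea of preferring the longest candidate by listing longer alternatives first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy illustration of deterministic, greedy pattern matching (hypothetical patterns, not SUTime's rules). */
class GreedyTimeMatcherSketch {
    private static final String MONTH =
        "(?:January|February|March|April|May|June|July|August|September|October|November|December)";

    // Java tries alternatives left to right, so the longer forms are listed first.
    // At any start position the longest of these candidates therefore wins.
    private static final Pattern TIME_EXPR = Pattern.compile(
        "\\b(?:" + MONTH + " of \\d{4}"   // e.g. "October of 1963"
        + "|" + MONTH + " \\d{4}"         // e.g. "March 2011"
        + "|" + MONTH                     // e.g. "June"
        + "|\\d{4})\\b");                 // a bare four-digit year

    /** Returns every non-overlapping match, scanning the text left to right. */
    static List<String> findTimeExpressions(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = TIME_EXPR.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findTimeExpressions("He left in October of 1963 and returned in June."));
    }
}
```

With the sample sentence, "October of 1963" is reported as one expression rather than as a separate month and year.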

SUTime Time Representation
Main temporal types:
- Time: an instant in time ( ), can be partially specified (Friday), with limited granularity
- Duration: a length of time (3 days)
- Range: a time interval with start and end points
- Set: a set of temporals
  - Periodic sets: Every Friday

SUTime Representation
Time
- Standard dates and times (in years, months, days, day of week, hours, minutes, seconds, milliseconds)
- Common times: seasons (e.g. winter), time of day (e.g. morning), weekend
- Partial times (June => XXXX-06)
- Relative times (last week)
Duration
- Exact durations (specified in milliseconds or in fields)
- Inexact durations (a few years => PXY)
- Duration ranges (2 to 3 months => P2M/P3M)
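
The normalized values quoted on this slide (XXXX-06, P2M/P3M) follow ISO-8601-style notation. A minimal sketch, assuming only the standard `java.time` classes, shows how such strings can be produced. The class and method names are hypothetical and this is not SUTime's internal representation.

```java
import java.time.Period;
import java.util.Locale;

/** Sketch of the value strings from the slide, built with java.time (not SUTime's own classes). */
class TemporalValueSketch {
    /** Partial date with an unknown year, as in the slide's "June => XXXX-06". */
    static String partialMonth(int month) {
        return String.format(Locale.ROOT, "XXXX-%02d", month);
    }

    /** Exact duration given in calendar fields; Period.toString() yields the ISO-8601 form. */
    static String exactDuration(Period p) {
        return p.toString();
    }

    /** Duration range written as "min/max", as in the slide's "2 to 3 months => P2M/P3M". */
    static String durationRange(Period min, Period max) {
        return min + "/" + max;
    }

    public static void main(String[] args) {
        System.out.println(partialMonth(6));
        System.out.println(exactDuration(Period.ofDays(3)));
        System.out.println(durationRange(Period.ofMonths(2), Period.ofMonths(3)));
        // An inexact duration such as "a few years" (PXY on the slide) has no java.time
        // equivalent, because the X stands for an unknown quantity.
    }
}
```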

SUTime Limitations
- Holidays are not supported
- Support for ranges is poor
  - from 3 to 4 p.m is identified as 15:57
  - March 2011 (12-13 is ignored)
- Resolving relative expressions with respect to the given reference date can be problematic
- Handling of ambiguous phrases is poor
  - Some common words (e.g. spring/fall) are always identified as temporal expressions
- Patterns are language-specific (English)
- …
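
Why resolving relative expressions can be problematic is easy to see with a bare weekday name: "Friday" may mean the Friday before or the Friday after the reference date. The sketch below uses only `java.time`. The reference date is a hypothetical one chosen for illustration, and the code does not reproduce how SUTime itself chooses between the two readings.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

/** Shows the two candidate resolutions of a bare weekday name relative to a reference date. */
class RelativeDayAmbiguitySketch {
    /** The most recent occurrence of the weekday strictly before the reference date. */
    static LocalDate previousOccurrence(LocalDate ref, DayOfWeek day) {
        return ref.with(TemporalAdjusters.previous(day));
    }

    /** The first occurrence of the weekday strictly after the reference date. */
    static LocalDate nextOccurrence(LocalDate ref, DayOfWeek day) {
        return ref.with(TemporalAdjusters.next(day));
    }

    public static void main(String[] args) {
        LocalDate ref = LocalDate.of(2011, 3, 16); // hypothetical reference date (a Wednesday)
        System.out.println("past reading:   " + previousOccurrence(ref, DayOfWeek.FRIDAY));
        System.out.println("future reading: " + nextOccurrence(ref, DayOfWeek.FRIDAY));
    }
}
```

Both readings are valid dates, so additional context such as verb tense is needed to pick one.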

SUTime Usage
TimeAnnotator
- TimeAnnotator timeAnnotator = new TimeAnnotator("sutime", properties);
- Properties: specifies SUTime options (prefixed by sutime.)
Pipeline
- TimeAnnotator should come after the tokenizer, sentence splitter, and POS tagger
- Optional (also before): NER or NumberAnnotator/QuantifiableEntityNormalizingAnnotator

SUTime Options
| Property | Description |
|---|---|
| sutime.markTimeRanges | Whether time ranges should be marked (e.g. if markTimeRanges is true, July to August => range). Default = false. |
| sutime.includeNested | Whether nested time expressions should be included (e.g. if markTimeRanges is true, July to August => range; if includeNested is true, both July and August will also be marked as time expressions). Default = false. |
| sutime.teRelHeurLevel | Heuristics for determining how to resolve relative time. NONE = no heuristics (default) (refdate = , Friday => ). BASIC = basic heuristics taking into account past tense (refdate = , It happened Friday => ). MORE = more heuristics with since/until. |
| sutime.includeRange | Whether range attributes should be included in the TIMEX3 XML output. Default = false. |
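
The options in this table are ordinary string properties. A minimal sketch of setting them with `java.util.Properties` follows. The property names come from the slide, while the class name and the chosen values are only an illustration. The resulting object is the kind of `properties` argument the TimeAnnotator constructor on the usage slide expects.

```java
import java.util.Properties;
import java.util.TreeSet;

/** Builds a Properties object holding the sutime.* options listed on the slide. */
class SutimeOptionsSketch {
    static Properties buildOptions() {
        Properties props = new Properties();
        props.setProperty("sutime.markTimeRanges", "true");  // mark "July to August" as a range
        props.setProperty("sutime.includeNested", "true");   // also keep "July" and "August"
        props.setProperty("sutime.teRelHeurLevel", "BASIC"); // one of NONE, BASIC, MORE
        props.setProperty("sutime.includeRange", "true");    // range attributes in TIMEX3 output
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildOptions();
        // Print in sorted order so the output is stable.
        for (String name : new TreeSet<>(props.stringPropertyNames())) {
            System.out.println(name + " = " + props.getProperty(name));
        }
    }
}
```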

SUTime input annotations
- DocDateAnnotation (String): if present, the string is interpreted as a date/time and used as the reference document date with respect to which other temporal expressions are resolved.
- SentencesAnnotation (List<CoreMap>): if present, time expressions will be extracted from each sentence and each sentence will be annotated individually.
- TokensAnnotation (List<CoreLabel>): required either at the entire annotation level or at the per-sentence level.

SUTime output annotations
Timex.Annotations (List<CoreMap>)
- List of time expressions (each a CoreMap)
- On the entire annotation and also for each sentence
Time annotations (for each time expression/CoreMap)
| Annotation | Description |
|---|---|
| Timex.Annotation | Timex object with TIMEX3 XML attributes. Use for exporting TIMEX3 information. |
| TimeExpression.Annotation | TimeExpression object. Use getTemporal() to get internal temporal representation. |
| TimeExpression.ChildrenAnnotation (List) | List of chunks forming this time expression (inner chunks can be tokens, nested time expressions, numeric expressions, etc.) |
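
For readers unfamiliar with TIMEX3, the exported information is an XML element wrapping the expression text, with attributes such as tid, type, and value. The sketch below formats such an element by hand. It illustrates the tag shape only, assuming the duration example "3 days" from the representation slide. The class and helper names are hypothetical and this is not the Timex class or actual SUTime output.

```java
/** Formats a TIMEX3-style element by hand (illustration only, not SUTime's Timex class). */
class Timex3TagSketch {
    /** Minimal XML escaping for element text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps the expression text in a TIMEX3 element carrying tid, type, and value attributes. */
    static String toTimex3(String tid, String type, String value, String text) {
        return "<TIMEX3 tid=\"" + tid + "\" type=\"" + type + "\" value=\"" + value + "\">"
            + escape(text) + "</TIMEX3>";
    }

    public static void main(String[] args) {
        // "3 days" is a duration; P3D is its ISO-8601 value.
        System.out.println(toTimex3("t1", "DURATION", "P3D", "3 days"));
    }
}
```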

SUTime output annotations
Standard annotations (for each time expression)
| Annotation | Description |
|---|---|
| TextAnnotation (String) | Text of this time expression. |
| TokensAnnotation (List) | Tokens that make up this time expression. |
| CharacterOffsetBeginAnnotation (Integer) | The index of the first character of this time expression. |
| CharacterOffsetEndAnnotation (Integer) | The index of the first character after this time expression. |
| TokenBeginAnnotation (Integer) | The index of the first token of this time expression. |
| TokenEndAnnotation (Integer) | The index of the first token after this time expression. |
Note: Indices are 0-based and always relative to the original annotation. Begin indices are inclusive, end indices are exclusive.
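
The index convention in the note (0-based, inclusive begin, exclusive end) is the same one `String.substring` uses, so character offsets can be applied to the original text directly. A small sketch with a made-up sentence and hypothetical class name:

```java
/** Demonstrates 0-based, begin-inclusive, end-exclusive character offsets. */
class OffsetConventionSketch {
    /** Recovers the expression text from the document text and its offsets. */
    static String spanText(String document, int begin, int end) {
        return document.substring(begin, end);
    }

    public static void main(String[] args) {
        String document = "He left in October of 1963.";
        String expression = "October of 1963";
        int begin = document.indexOf(expression); // index of the first character
        int end = begin + expression.length();    // index of the first character after it
        System.out.println(begin + ".." + end + " -> " + spanText(document, begin, end));
    }
}
```

Because the end index is exclusive, the length of the expression is simply end minus begin.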

Comparison with GUTime
| | SUTime | GUTime |
|---|---|---|
| Language | Java | Perl |
| Timex | TIMEX3 with extensions | TIMEX3 tag, but follows ACE TIMEX2 mostly (extension of TempEx) |
| Demo | http://nlp.stanford.edu:8080/sutime | http://nlp.stanford.edu:8080/gutime |
| Comments | No support for holidays. Limited support for ranges, ambiguous phrases. | Some support for holidays. No support for ranges, poor support for years that are written out. |
| TempEval2 (English Test) | Time expression identification: P=0.89, R=0.94, F1=0.91. Attribute accuracy: Type=0.94, Value=0.72 | Time expression identification: P=0.89, R=0.79, F1=0.84. Attribute accuracy: Type=0.95, Value=0.68 |
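
The F1 figures in the TempEval2 row are the harmonic mean of the reported precision and recall, F1 = 2PR / (P + R). A short check, using a hypothetical helper class, reproduces both values from the table:

```java
/** Recomputes the F1 scores in the comparison table from the reported precision and recall. */
class F1CheckSketch {
    /** Harmonic mean of precision and recall. */
    static double f1(double precision, double recall) {
        return 2 * precision * recall / (precision + recall);
    }

    /** Rounds to two decimal places, matching the precision used on the slide. */
    static double round2(double x) {
        return Math.round(x * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println("SUTime F1: " + round2(f1(0.89, 0.94)));
        System.out.println("GUTime F1: " + round2(f1(0.89, 0.79)));
    }
}
```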

SUTime and GUTime examples (columns: Type, SUTime, GUTime)
- Date: October of 1963
- Duration: fifty six years
- Set: Every third Sunday

Examples (GUTime unsupported) (columns: Type, SUTime, GUTime)
- Time: 5:05 in the afternoon
- Date, written-out year: winter of nineteen ninety four
- Duration range: two to three months
Reference Date is

Examples (SUTime unsupported) (columns: Type, SUTime, GUTime)
- Holiday: last Christmas
- Ambiguous words: The spring water was cool and refreshing / The spring water was cool and refreshing.
Reference Date is