Presentation on theme: "JavaNLP time annotations"— Presentation transcript:
1 JavaNLP time annotations
SUTime
2 What does SUTime do?
Similar to GUTime.
- Recognizes time expressions using patterns
  - Deterministic, based on regular expression patterns
  - Greedy (picks the longest sequence of tokens that may represent a time expression)
- Normalizes time expressions
  - Annotations follow the TimeML TIMEX3 standard
  - XSD:
  - Extensions for time expressions that are not supported by the TIMEX3 standard
- Resolves relative times with respect to a reference date
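The greedy, pattern-based recognition described on this slide can be illustrated with a small self-contained sketch. It is not SUTime's implementation: the class `GreedyTimeMatcher` and its four toy patterns are hypothetical, and the sketch matches over characters rather than tokens. At each position it tries every pattern and keeps the longest match, which is the "greedy" behaviour the slide refers to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyTimeMatcher {
    // Hypothetical toy patterns for illustration only; SUTime's real rules are much richer.
    private static final String MONTH =
        "(?:January|February|March|April|May|June|July|August"
        + "|September|October|November|December)";

    private static final List<Pattern> PATTERNS = List.of(
        Pattern.compile("\\b" + MONTH + "\\b"),
        Pattern.compile("\\b" + MONTH + " \\d{4}\\b"),
        Pattern.compile("\\b\\d{1,2} " + MONTH + " \\d{4}\\b"),
        Pattern.compile("\\b\\d{4}\\b"));

    // Scans left to right; at each position keeps the longest match over all patterns.
    static List<String> findTimes(String text) {
        List<String> found = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int bestEnd = -1;
            for (Pattern p : PATTERNS) {
                Matcher m = p.matcher(text);
                m.region(pos, text.length());
                m.useTransparentBounds(true); // let \b see characters outside the region
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestEnd = m.end();
                }
            }
            if (bestEnd > pos) {
                found.add(text.substring(pos, bestEnd));
                pos = bestEnd; // skip past the consumed expression
            } else {
                pos++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findTimes("She left in March 2011 and returned 12 June 2012."));
    }
}
```

Because the longest match wins, "March 2011" is reported as one expression instead of "March" followed by "2011".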
3 SUTime Time Representation
Main temporal types:
- Time: an instance in time ( ); can be partially specified (Friday), with limited granularity
- Duration: a length of time (3 days)
- Range: a time interval with start and end points
- Set: a set of temporals
  - Periodic sets: Every Friday
4 SUTime Representation
- Standard dates and times (in years, months, days, day of week, hours, minutes, seconds, milliseconds)
- Common times: seasons (e.g. winter), time of day (e.g. morning), weekend
- Partial times (June => XXXX-06)
- Relative time (last week)
- Duration
  - Exact durations (specified in milliseconds or in fields)
  - Inexact durations (a few years => PXY)
  - Duration ranges (2 to 3 months => P2M/P3M)
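The value strings on this slide follow ISO 8601 style notation, which the standard `java.time` classes can reproduce for the exact cases. The sketch below is only an illustration of the notation; the class `TemporalValues` and its helper names are hypothetical and are not part of SUTime. Inexact durations such as PXY have no `java.time` equivalent and are left out.

```java
import java.time.Month;
import java.time.Period;

class TemporalValues {
    // Exact duration given in fields; Period.toString() yields the ISO 8601 form.
    static String duration(Period p) {
        return p.toString();
    }

    // Duration range written as "min/max", mirroring the slide's P2M/P3M notation.
    static String durationRange(Period min, Period max) {
        return min + "/" + max;
    }

    // Partial date with an unknown year: "XXXX" stands in for the missing field.
    static String monthOnly(Month m) {
        return String.format("XXXX-%02d", m.getValue());
    }

    public static void main(String[] args) {
        System.out.println(duration(Period.ofDays(3)));                          // P3D
        System.out.println(durationRange(Period.ofMonths(2), Period.ofMonths(3))); // P2M/P3M
        System.out.println(monthOnly(Month.JUNE));                               // XXXX-06
    }
}
```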
5 SUTime Limitations
- Holidays are not supported
- Support for ranges is poor
  - "from 3 to 4 p.m." is identified as 15:57:00
  - "12-13 March 2011" ("12-13" is ignored)
- Resolving relative expressions with respect to the given reference date can be problematic
- Handling of ambiguous phrases is poor
  - Some common words (e.g. spring/fall) are always identified as temporal expressions
- Patterns are language (English) specific
- …
6 SUTime Usage
TimeAnnotator
  TimeAnnotator timeAnnotator = new TimeAnnotator("sutime", properties);
  - Properties: specifies SUTime options (prefixed by "sutime.")
Pipeline
  - TimeAnnotator should come after the tokenizer, sentence splitter, and POS tagger
  - Optional (also before): NER or NumberAnnotator/QuantifiableEntityNormalizingAnnotator
7 SUTime Options
Properties (each prefixed by "sutime.") and their descriptions:
- markTimeRanges: Whether time ranges should be marked (e.g. if markTimeRanges is true, July to August => range). Default = false.
- includeNested: Whether nested time expressions should be included (e.g. if markTimeRanges is true, July to August => range; if includeNested is true, both July and August will also be marked as time expressions). Default = false.
- teRelHeurLevel: Heuristics for determining how to resolve relative time.
  - NONE = no heuristics (default) (refdate = , Friday => )
  - BASIC = basic heuristics taking into account past tense (refdate = , It happened Friday => )
  - MORE = more heuristics with since/until
- includeRange: Whether range attributes should be included in the TIMEX3 XML output. Default = false.
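A minimal sketch of how these options might be collected in a `java.util.Properties` object before being handed to the `TimeAnnotator` constructor shown on the usage slide. The property names come from the table above with the "sutime." prefix; passing booleans as the strings "true"/"false" is an assumption, and the class `SUTimeOptions` is hypothetical.

```java
import java.util.Properties;
import java.util.TreeSet;

class SUTimeOptions {
    // Builds the option set; all four options are switched away from their defaults.
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("sutime.markTimeRanges", "true");  // mark "July to August" as a range
        props.setProperty("sutime.includeNested", "true");   // also mark "July" and "August"
        props.setProperty("sutime.teRelHeurLevel", "BASIC"); // NONE (default), BASIC, or MORE
        props.setProperty("sutime.includeRange", "true");    // range attributes in TIMEX3 output
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        // Print the options in a stable, sorted order.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            System.out.println(key + " = " + props.getProperty(key));
        }
    }
}
```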
8 SUTime input annotations
- DocDateAnnotation (String): If present, the string is interpreted as a date/time and used as the reference document date with respect to which other temporal expressions are resolved.
- SentencesAnnotation (List<CoreMap>): If present, time expressions will be extracted from each sentence, and each sentence will be annotated individually.
- TokensAnnotation (List<CoreLabel>): Required either at the entire annotation level or at the per-sentence level.
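To make the role of the reference document date concrete, the sketch below resolves a bare weekday such as "Friday" against a reference date, optionally using a past-tense hint in the spirit of the BASIC heuristic level. This is an assumption-laden illustration using `java.time`, not SUTime's actual resolution rule; the class `RelativeDayResolver` and the `pastTense` flag are hypothetical.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

class RelativeDayResolver {
    // Resolves a weekday name relative to the reference date.
    // With a past-tense hint, look backwards; otherwise look forwards.
    static LocalDate resolve(LocalDate referenceDate, DayOfWeek day, boolean pastTense) {
        return pastTense
            ? referenceDate.with(TemporalAdjusters.previousOrSame(day))
            : referenceDate.with(TemporalAdjusters.nextOrSame(day));
    }

    public static void main(String[] args) {
        LocalDate ref = LocalDate.of(2011, 3, 16); // a Wednesday
        System.out.println(resolve(ref, DayOfWeek.FRIDAY, false)); // 2011-03-18
        System.out.println(resolve(ref, DayOfWeek.FRIDAY, true));  // 2011-03-11
    }
}
```

The same surface expression resolves to two different dates depending on the hint, which is why relative expressions are sensitive to both the reference date and the heuristic level.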
9 SUTime output annotations
Timex.Annotations (List<CoreMap>)
- List of time expressions (each a CoreMap)
- Set on the entire annotation and also on each sentence
Time annotations (for each time expression/CoreMap):
- Timex.Annotation: Timex object with TIMEX3 XML attributes. Use for exporting TIMEX3 information.
- TimeExpression.Annotation: TimeExpression object. Use getTemporal() to get the internal temporal representation.
- TimeExpression.ChildrenAnnotation (List<CoreMap>): List of chunks forming this time expression (inner chunks can be tokens, nested time expressions, numeric expressions, etc.)
10 SUTime output annotations
Standard annotations (for each time expression):
- TextAnnotation (String): Text of this time expression.
- TokensAnnotation (List<CoreLabel>): Tokens that make up this time expression.
- CharacterOffsetBeginAnnotation (Integer): The index of the first character of this time expression.
- CharacterOffsetEndAnnotation (Integer): The index of the first character after this time expression.
- TokenBeginAnnotation (Integer): The index of the first token of this time expression.
- TokenEndAnnotation (Integer): The index of the first token after this time expression.
Note: Indices are 0-based and always relative to the original annotation. Begin indices are inclusive, end indices are exclusive.
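The 0-based, begin-inclusive and end-exclusive convention in the note is the same one `String.substring` uses, so a pair of character offsets can be used directly to recover the text of an expression. A minimal sketch with plain strings (the class `SpanOffsets` is hypothetical and does not use the annotation classes listed above):

```java
class SpanOffsets {
    // Returns {begin, end} for the first occurrence of expr: begin inclusive, end exclusive.
    static int[] span(String text, String expr) {
        int begin = text.indexOf(expr);
        if (begin < 0) {
            throw new IllegalArgumentException("expression not found: " + expr);
        }
        return new int[] {begin, begin + expr.length()};
    }

    // Recovers the covered text from half-open offsets.
    static String covered(String text, int begin, int end) {
        return text.substring(begin, end);
    }

    public static void main(String[] args) {
        String text = "See you on Friday morning.";
        int[] s = span(text, "Friday morning");
        // With half-open offsets, end - begin is the length of the expression.
        System.out.println(covered(text, s[0], s[1]) + " has length " + (s[1] - s[0]));
    }
}
```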
11 Comparison with GUTime
Language
  SUTime: Java
  GUTime: Perl
Timex
  SUTime: TIMEX3 with extensions
  GUTime: TIMEX3 tag, but follows ACE TIMEX2 mostly (extension of TempEx)
Demo
Comments
  SUTime: No support for holidays. Limited support for ranges and ambiguous phrases.
  GUTime: Some support for holidays. No support for ranges, poor support for years that are written out.
TempEval2 (English Test)
  SUTime: Time Expression Identification: P=0.89, R=0.94, F1=0.91. Attribute Accuracy: Type=0.94, Value=0.72
  GUTime: Time Expression Identification: P=0.89, R=0.79, F1=0.84. Attribute Accuracy: Type=0.95, Value=0.68
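The F1 values in the TempEval2 row are consistent with the usual harmonic mean of precision and recall, F1 = 2PR / (P + R). A short sketch that recomputes them from the reported P and R (the class `F1Check` is hypothetical; only the input numbers come from the slide):

```java
class F1Check {
    // Harmonic mean of precision and recall; defined as 0 when both are 0.
    static double f1(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    // Rounds to two decimal places, matching the precision used on the slide.
    static double round2(double x) {
        return Math.round(x * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println("SUTime F1: " + round2(f1(0.89, 0.94)));
        System.out.println("GUTime F1: " + round2(f1(0.89, 0.79)));
    }
}
```

Both systems have the same precision on this test set, so the F1 gap comes entirely from recall (0.94 versus 0.79).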
12 SUTime and GUTime examples
Date
  SUTime: <TIMEX3 tid="t1" value=" " type="DATE">October of 1963</TIMEX3>
  GUTime: <TIMEX3 tid="t1" TYPE="DATE" VAL="196310">October of 1963</TIMEX3>
Duration
  <TIMEX3 tid="t1" TYPE="DURATION" VAL="P56Y">fifty six years</TIMEX3>
Set
  SUTime: <TIMEX3 tid="t1" value="XXXX-WXX-7" type="SET" quant="every third" periodicity="P3W">Every third Sunday</TIMEX3>
  GUTime: <TIMEX3 tid="t1" TYPE="DATE" SET="YES" VAL="XXXXWXX-0" PERIODICITY="F3W" GRANULARITY="G1D">Every third Sunday</TIMEX3>
13 Examples (GUTime unsupported)
Time
  SUTime: <TIMEX3 tid="t1" value=" T17:05:00" type="TIME">5:05 in the afternoon</TIMEX3>
  GUTime: 5:05 in the afternoon
Date - Written out year
  SUTime: <TIMEX3 tid="t1" value="1994-WI" type="DATE">winter of nineteen ninety four</TIMEX3>
  GUTime: <TIMEX3 tid="t1" TYPE="DATE">winter</TIMEX3> of nineteen ninety four
Duration Range
  SUTime: <TIMEX3 tid="t1" alt_value="P2M/P3M" type="DURATION">two to three months</TIMEX3>
  GUTime: two to three months
Reference Date is
14 Examples (SUTime unsupported)
Holiday
  SUTime: last Christmas
  GUTime: <TIMEX3 tid="t1" TYPE="DATE" ALT_VAL=" ">last Christmas</TIMEX3>
Ambiguous words
  SUTime: The <TIMEX3 tid="t1" value="2011-SP" type="DATE">spring</TIMEX3> water was cool and refreshing
  GUTime: The <TIMEX3 tid="t1" TYPE="DATE">spring</TIMEX3> water was cool and refreshing.
Reference Date is