What’s “NEXT”? Navigating through Dense Annotation Spaces Branimir K. Boguraev Mary S. Neff Language Engineering for Content Analysis IBM T.J. Watson Research.

1 What’s “NEXT”? Navigating through Dense Annotation Spaces Branimir K. Boguraev Mary S. Neff Language Engineering for Content Analysis IBM T.J. Watson Research Center Yorktown Heights, NY

2 Outline
- Dense annotation spaces
- Navigational challenges
- Elements of the annotation-matching formalism
- Support for navigational control
- Conclusion
- Future work

3 Dense Annotation Spaces
Example sentence: Service Reps can read customer name, in order to contact the customer.
[Figure: annotation layers over the example sentence]
  Part-of-speech tags, one per word: {np} {nps} {md} {vb} {nn} {nn} {in} {nn} {to} {vb} {dt} {nn}
  Phrases: [NP] [NP] [NP] [NP], [VG] [VG], [PP]
  Grammatical relations: [SUB], [OBJ] [OBJ]
  Clauses: [SC], [SENT]

4 Annotation ‘trees’
[Figure: the example sentence from slide 3, "Service Reps can read customer name, in order to contact the customer.", with the same annotation layers ({pos} tags, [NP], [VG], [PP], [SUB], [OBJ], [SC], [SENT]) viewed as trees]

5 Annotation lattice
[Figure: the example sentence from slide 3, "Service Reps can read customer name, in order to contact the customer.", with the same annotation layers ({pos} tags, [NP], [VG], [PP], [SUB], [OBJ], [SC], [SENT]) viewed as a lattice]

6 Navigational Challenges
[Figure: nested name annotations]
  [PName                          ]
  [Title] [Name                   ]
          [First] [Middle] [Last]
What is visible to the lattice traversal engine?

7 Annotation-Based Finite State Transducer (AFst)
- UIMA-based
- A finite state calculus over typed feature structures
  - Cf. “grep” over a sequence of annotations, specified as types and features
  np = /[NP
       . Token[pos=~"DT"] |
       . Token[pos=~"JJ"]*
       . ( Token[pos=~"NN"] | Token[pos=~"NNS"] )
       . /]NP ;
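
The "grep over annotations" idea can be illustrated with a small Python sketch. It is not AFst itself: it only mimics the shape of the `np` rule over a list of part-of-speech tags. The function names `match_np` and `find_nps` are hypothetical, and because the alternative after the determiner is truncated in the transcript, the sketch assumes the determiner is optional.

```python
def match_np(tags, start):
    """Return the exclusive end index of an NP starting at `start`, or None.

    Mirrors the rule shape: determiner (assumed optional), any number of
    adjectives, then one noun (NN or NNS).
    """
    i = start
    if i < len(tags) and tags[i] == "DT":      # assumption: DT is optional
        i += 1
    while i < len(tags) and tags[i] == "JJ":   # JJ*
        i += 1
    if i < len(tags) and tags[i] in ("NN", "NNS"):
        return i + 1
    return None


def find_nps(tags):
    """Scan left to right and collect non-overlapping NP spans."""
    spans, i = [], 0
    while i < len(tags):
        end = match_np(tags, i)
        if end is None:
            i += 1            # no match here, try the next token
        else:
            spans.append((i, end))
            i = end           # resume after the matched span
    return spans


# Illustrative tag sequence, not taken from the slides.
tags = ["DT", "JJ", "JJ", "NN", "VB", "NNS", "IN", "DT", "NN"]
spans = find_nps(tags)
```

The real formalism matches typed feature structures rather than bare tag strings, so this only conveys the sequential part of the pattern.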

8 Pitching the Iterator: support for navigational control
[Figure: the example sentence from slide 3, "Service Reps can read customer name, in order to contact the customer.", with the same annotation layers ({pos} tags, [NP], [VG], [PP], [SUB], [OBJ], [SC], [SENT])]

9 AFst Traversal Regime
Defining a particular path through the annotation space requires a lattice traversal engine that can focus, simultaneously, on:
- Sequential constraints ~ pattern matching
  - Horizontal: prenominal mod and nominal head
- Structural constraints
  - Vertical: iterate over NP with specific configurational relationship, e.g. not sentence initial, not in a PP
- Configurational constraints
  - Type prioritization

10 Linearizing the Lattice: what’s “next”?
- Unambiguous Typeset iterator, inferred from grammar: … [SUB] . [VG] . [OBJ] . [PP] …
- UIMA natural annotation sort order:
  - Start position ascending
  - Length descending
  - Type priority, defined in UIMA descriptors
[Figure: annotation layers [NP] [NP] [NP] [NP], [VG] [VG], [PP], [SUB], [OBJ] [OBJ]]
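
The three-part sort order listed on this slide can be sketched as a Python sort key. This is an illustration only, not UIMA code: the `Annotation` class, the `TYPE_PRIORITY` list and the character offsets are assumptions made for the example (in UIMA the priorities come from the descriptors).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    type: str
    begin: int   # start offset, inclusive
    end: int     # end offset, exclusive


# Hypothetical priority list: earlier types win ties on position and length.
TYPE_PRIORITY = ["SENT", "SC", "SUB", "OBJ", "PP", "NP", "VG", "Token"]


def sort_key(a):
    # 1. start position ascending, 2. length descending, 3. type priority
    return (a.begin, -(a.end - a.begin), TYPE_PRIORITY.index(a.type))


def linearize(annotations):
    """Return the annotations in the order an iterator would visit them."""
    return sorted(annotations, key=sort_key)


# Offsets assume the slide's example sentence; they are illustrative.
annotations = [
    Annotation("NP", 0, 12),      # "Service Reps"
    Annotation("Token", 0, 7),    # "Service"
    Annotation("SUB", 0, 12),
    Annotation("SENT", 0, 70),
    Annotation("VG", 13, 21),     # "can read"
]
order = [a.type for a in linearize(annotations)]
```

With these assumed priorities, annotations that share a span ([SUB] and [NP] here) are ordered by type alone, which is the situation the grammar-inferred type set is meant to resolve.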

11 Linearizing the Lattice: what’s “next”?
Grammar-wide declarations
  boundary % Sentence[] ;
  honour % Address[] ;
  month = Token[lemma=~"January"] |
          Token[lemma=~"February"] |
          … ;
  date = /[Year
         . :month |
         . Token[string=~:^[12]\d{3}$:]
         /]Year ;

12 Focus: Selecting Nested Boundary Annotations
<nameValuePair>
  <name>Focus</name>
  <value>
    <array>
      <string>Section[label~=:Education:]</string>
      <string>Sentence[number==1]</string>
    </array>
  </value>
</nameValuePair>

13 Linearizing the Lattice: what’s “next”?
Grammar-wide declarations
  match % first, last, longest, shortest, all
  advance % skip, step
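
The slide only names the `match` and `advance` options, so the following Python sketch is a guess at what they could mean, not a description of AFst. It assumes that `first` and `last` refer to the order in which candidate matches are produced, that `skip` resumes after the end of the chosen match, and that `step` resumes one position further on. The function names `select` and `next_start` are hypothetical.

```python
def select(candidates, policy):
    """Choose among candidate (begin, end) spans found at one start position.

    `candidates` is assumed to be listed in the order the rules produced them.
    """
    if not candidates:
        return []
    if policy == "first":
        return [candidates[0]]
    if policy == "last":
        return [candidates[-1]]
    if policy == "longest":
        return [max(candidates, key=lambda s: s[1] - s[0])]
    if policy == "shortest":
        return [min(candidates, key=lambda s: s[1] - s[0])]
    if policy == "all":
        return list(candidates)
    raise ValueError(f"unknown match policy: {policy}")


def next_start(start, chosen, advance):
    """Decide where matching resumes after position `start`."""
    if advance == "skip" and chosen:
        # Assumed meaning: jump past the chosen match (always move forward).
        return max(start + 1, max(end for _, end in chosen))
    # Assumed meaning of "step", also used when nothing matched.
    return start + 1


candidates = [(0, 2), (0, 5), (0, 3)]   # illustrative spans
longest = select(candidates, "longest")
resume_skip = next_start(0, longest, "skip")
resume_step = next_start(0, longest, "step")
```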

14 What’s “next”?: Switching Levels, Mixed Iterator
Refocus the iterator to examine inner contour: @descend, @ascend
  findDrSmith =
      /PName[@descend]
      . Title[string=~"Dr."]
      . /Name[@descend]
      . First[] |
      . Last[string=="Smith"]
      . /Name[@ascend]
      . /PName[@ascend] ;
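
The effect of descending into an annotation and ascending again can be sketched in Python with an explicit tree. This is only an analogy: in AFst the nesting comes from spans in the annotation lattice, whereas the hypothetical `Node` class below stores children directly. The sketch also assumes that `First[]` is optional (the alternative after it is truncated in the transcript), compares the title by equality rather than by pattern, and uses an invented first name as sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    type: str
    text: str = ""
    children: list = field(default_factory=list)


def is_dr_smith(pname):
    """Check a PName node against the shape of the findDrSmith rule."""
    if pname.type != "PName" or len(pname.children) != 2:
        return False
    title, name = pname.children              # "descend" into PName
    if title.type != "Title" or title.text != "Dr.":
        return False
    if name.type != "Name":
        return False
    inner = list(name.children)               # "descend" into Name
    if inner and inner[0].type == "First":    # assumption: First is optional
        inner = inner[1:]
    # Returning to the caller plays the role of the two "ascend" steps.
    return (len(inner) == 1
            and inner[0].type == "Last"
            and inner[0].text == "Smith")


# Sample data is invented for illustration.
pname = Node("PName", children=[
    Node("Title", "Dr."),
    Node("Name", children=[Node("First", "John"), Node("Last", "Smith")]),
])
found = is_dr_smith(pname)
```

An iterator that stays at the top level would see only the `PName` node here; the inner `Title`, `First` and `Last` nodes become visible only after descending.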

15 Alternate Multiple Level Access
Upper/lower context without switching levels
  Token[_costarts=~Sentence[number==1]] ;
  Subject[_covers=~PName[]] ;
  PName[_costarts=~NP[],_coends=~NP[]] ;
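
One plausible reading of the `_costarts`, `_coends` and `_covers` tests is as plain comparisons of span offsets. The Python sketch below works under that assumption; the `Span` type, the function names and the offsets are illustrative and are not part of AFst or UIMA.

```python
from collections import namedtuple

# begin is inclusive, end is exclusive (character offsets).
Span = namedtuple("Span", "type begin end")


def costarts(a, b):
    """True if both annotations start at the same offset."""
    return a.begin == b.begin


def coends(a, b):
    """True if both annotations end at the same offset."""
    return a.end == b.end


def covers(a, b):
    """True if annotation `a` spans at least everything `b` spans."""
    return a.begin <= b.begin and b.end <= a.end


# Illustrative offsets, not taken from the slides.
sentence = Span("Sentence", 0, 70)
token = Span("Token", 0, 7)
subject = Span("Subject", 0, 12)
pname = Span("PName", 0, 12)
np = Span("NP", 0, 12)

token_starts_sentence = costarts(token, sentence)
subject_covers_pname = covers(subject, pname)
pname_is_whole_np = costarts(pname, np) and coends(pname, np)
```

Because each test only looks at offsets, a pattern can consult the upper or lower context of an annotation without moving the iterator to another level.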

16 Grammar cascading
- From simpler to more complex analyses
- Lower levels of output feed as inputs into higher levels
  - Small noun phrases & verb groups
  - Prepositional, possessive & adjectival phrases
  - More complex noun phrases
  - Variety of clause types
  - Grammatical relations (subject, object)

17 Implementations
- Shallow parsing
- Named entity detection interleaved with shallow parsing
- Terminology identification in new domains
- Temporal expression parsing
- Privacy policy rules
- Information extraction from resumes
- Information extraction from contact center telephone calls

18 Future work list
- Alternate (semi-ambiguous) iterator, useful for “disambiguator” grammars
  - Actor[] Director[]
- Tree-walk iterator for tree representations where children are explicitly referenced in features

19 Performance Notes
Performance is a function of
- How grammar is written
- Optimisation of fst graph (grammar compiler)
- Optimisation of symbol compiler
- Optimisation of executor
However … for the benefit of the curious … IBM Software Group (Dublin) optimised the last two, and …

20 IBM LanguageWare (Dublin) text analysis performance results
The analysis:
- AFst rules and FST dictionary
- 26 rules, 7 dictionaries (things like first names, indicators like Corp., etc.)
- Creating Person and Company annotations
The test:
- Test set: Enron, 924 files (4.5 MB)
The results:
  Annotation   Precision   Recall
  Company      0.81        0.67
  Person       0.93        0.91
Processing time: 3.4 seconds
This is 10 times faster than the best-of-breed internal reference annotators.
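
For readers who want a single figure per annotation type, the reported precision and recall can be combined into an F1 score, and the test set size and processing time into a throughput. The slide reports neither; the Python sketch below only derives them from the numbers given above, reading "4.5Mb" as 4.5 megabytes.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) as reported on the slide.
reported = {
    "Company": (0.81, 0.67),
    "Person": (0.93, 0.91),
}

# Derived values, not reported in the presentation.
scores = {name: f1(p, r) for name, (p, r) in reported.items()}

# Assumes the 4.5 test set size is in megabytes and 3.4 s is wall-clock time.
throughput_mb_per_s = 4.5 / 3.4
```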

21 Perpetrators … er… Responsible parties
- Bran Boguraev
- Mary Neff
- Bran Lambov
- D.J. McCloskey
- Thilo Goetz
- Thomas Hampp
- Oliver Suhre
- Roy Byrd
- Herb Chong
- Albert Eskenazi
- Paul Kaye
- Son Bao Pham
- Lokesh Shresta
- Max Silberztein

22 For more on AFst and tools -- Tomorrow, 12:25 in Fez 1: A Development Environment for Configurable Meta-Annotators in a Pipelined NLP Environment Youssef Drissi, Branimir Boguraev, David Ferrucci, Paul Keyser, and Anthony Levas

