Tutorial on Standoff Markup as used in: HCRC Map Task Corpus, MATE/NITE Workbench
Amy Isard, HCRC Language Technology Group, University of Edinburgh

2 Standoff Annotation
- Don’t keep all your data in one big document
- One document for each annotation level (with its own DTD)
- Links between documents

3 LTG link syntax (1)
- An element can point to one or more contiguous elements in the same or a different document
- Each element is identified by a unique ID
- A link is shown as an attribute on an element
- Default attributes in the DTD tell a program that this is a link
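The pointer syntax used in these examples (a document name, then `#id(...)`, optionally followed by `..id(...)` for a contiguous range) can be split into its parts with a short Python sketch. The grammar below is inferred only from the `href` values on slide 6 and is an assumption, not the full LTG link syntax; `parse_href` and `StandoffLink` are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumed grammar, inferred only from the examples on slide 6:
#   DOCUMENT "#id(" ID ")" [ "..id(" ID ")" ]
HREF_PATTERN = re.compile(
    r"^(?P<doc>[^#]+)#id\((?P<start>[^)]+)\)(?:\.\.id\((?P<end>[^)]+)\))?$"
)


class StandoffLink(NamedTuple):
    document: str  # file that holds the target elements
    start_id: str  # ID of the first element in the contiguous range
    end_id: str    # ID of the last element (equal to start_id for a single element)


def parse_href(href: str) -> StandoffLink:
    """Split a pointer such as 'words.xml#id(w1)..id(w5)' into its parts."""
    match = HREF_PATTERN.match(href)
    if match is None:
        raise ValueError(f"unrecognised link syntax: {href!r}")
    start = match.group("start")
    # A pointer to a single element has no '..id(...)' part.
    end = match.group("end") or start
    return StandoffLink(match.group("doc"), start, end)


print(parse_href("words.xml#id(w1)..id(w5)"))
# StandoffLink(document='words.xml', start_id='w1', end_id='w5')
print(parse_href("words.xml#id(w6)"))
# StandoffLink(document='words.xml', start_id='w6', end_id='w6')
```

Treating a single-element pointer as a range whose start and end coincide lets later code handle both forms the same way.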

4 LTG link syntax (2)
Attributes describing a link whose target will be embedded in the original element in the output document:
    href      CDATA  #IMPLIED
    xml:link  CDATA  #FIXED "simple"
    show      CDATA  #FIXED "embed"
    actuate   CDATA  #FIXED "auto"
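Because the link attributes are declared with defaults in the DTD, a document only needs to carry `href`; a parser that applies attribute defaults supplies the rest, and a program can then recognise the element as a link. The Python sketch below relies on the standard library's expat binding, which by default reports attributes derived from declarations in an internal DTD subset. Declaring the attributes on a `move` element inside the document (rather than in an external DTD file) is an assumption made to keep the example self-contained, and `find_links` is a hypothetical helper.

```python
import xml.parsers.expat

# Hypothetical moves document. The ATTLIST mirrors slide 4, but declaring it
# on <move> in an internal DTD subset is an assumption for self-containment.
MOVES_WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE moves [
  <!ELEMENT moves (move*)>
  <!ELEMENT move EMPTY>
  <!ATTLIST move
      id       ID    #REQUIRED
      type     CDATA #IMPLIED
      href     CDATA #IMPLIED
      xml:link CDATA #FIXED "simple"
      show     CDATA #FIXED "embed"
      actuate  CDATA #FIXED "auto">
]>
<moves><move id="m1" type="instruct" href="words.xml#id(w1)..id(w5)"/></moves>
"""


def find_links(text: str) -> list[dict[str, str]]:
    """Collect the attributes of every element the DTD marks as a simple link."""
    links = []

    def start_element(name, attrs):
        # expat adds attribute defaults from the internal DTD subset, so
        # xml:link, show and actuate are present although the element omits them.
        if attrs.get("xml:link") == "simple" and "href" in attrs:
            links.append({"element": name, **attrs})

    parser = xml.parsers.expat.ParserCreate()  # no namespace processing
    parser.StartElementHandler = start_element
    parser.Parse(text, True)
    return links


for link in find_links(MOVES_WITH_DTD):
    print(link["element"], link["href"], "show =", link["show"])
```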

5 Standoff Example (1): Words XML
turn right for three centimetres okay

6 Standoff Example (2): Moves XML
    <move type="instruct" speaker="spk1" id="m1" href="words.xml#id(w1)..id(w5)"/>
    <move type="align" speaker="spk1" id="m2" href="words.xml#id(w6)"/>
    …
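Following a move's `href` back to the words level amounts to locating the first and last IDs and taking every element between them in document order. The Python sketch below does this with the standard library's ElementTree. The markup of `words.xml` is not shown on these slides, so `<word id="...">` is an assumed element name, and `resolve` is a hypothetical helper; the moves are those from slide 6.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical words level: the slides do not show the markup of words.xml,
# so <word id="..."> is an assumed element name.
WORDS_XML = (
    '<words><word id="w1">turn</word> <word id="w2">right</word> '
    '<word id="w3">for</word> <word id="w4">three</word> '
    '<word id="w5">centimetres</word> <word id="w6">okay</word></words>'
)
# The moves level from slide 6.
MOVES_XML = (
    '<moves>'
    '<move type="instruct" speaker="spk1" id="m1" href="words.xml#id(w1)..id(w5)"/>'
    '<move type="align" speaker="spk1" id="m2" href="words.xml#id(w6)"/>'
    '</moves>'
)

# Assumed pointer grammar: DOCUMENT#id(START), optionally followed by ..id(END)
HREF = re.compile(r"^([^#]+)#id\(([^)]+)\)(?:\.\.id\(([^)]+)\))?$")


def resolve(href: str, documents: dict[str, ET.Element]) -> list[ET.Element]:
    """Return the contiguous run of elements an href points to, in document order."""
    match = HREF.match(href)
    if match is None:
        raise ValueError(f"unrecognised link syntax: {href!r}")
    doc_name, start_id, end_id = match.groups()
    # Every element that carries an id, in document order.
    ordered = [el for el in documents[doc_name].iter() if "id" in el.attrib]
    ids = [el.get("id") for el in ordered]
    first = ids.index(start_id)
    last = ids.index(end_id or start_id)  # a single-element link has no end id
    return ordered[first : last + 1]


documents = {"words.xml": ET.fromstring(WORDS_XML)}
for move in ET.fromstring(MOVES_XML).iter("move"):
    text = " ".join(w.text for w in resolve(move.get("href"), documents))
    print(move.get("id"), move.get("type"), text)
# m1 instruct turn right for three centimetres
# m2 align okay
```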

7 Standoff Example (3): Moves and Words XML
turn right for three centimetres okay
    <move type="align" speaker="spk1" id="m2" href="words.xml#id(w6)"/>
    …

8 Advantages of Standoff Annotation
- Levels of annotation can have crossing branches (not normally possible in XML, where elements must nest)
- New levels of annotation can be added without disturbing existing ones
- Editing one level of annotation has minimal knock-on effects on others
- People can work on different levels at the same time without worrying about creating different versions
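Crossing branches arise when an element on one level overlaps an element on another level without either containing the other; a single XML document cannot represent that, because its elements must nest. The Python sketch below checks two levels for crossings, treating each annotation as a span of word positions. The spans and the function names are hypothetical, not taken from the corpus.

```python
# Each span is (first_word, last_word): inclusive indices into the shared
# words level. The spans below are hypothetical, not taken from the corpus.
Span = tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """True if the spans overlap but neither contains the other."""
    overlap = a[0] <= b[1] and b[0] <= a[1]
    a_inside_b = b[0] <= a[0] and a[1] <= b[1]
    b_inside_a = a[0] <= b[0] and b[1] <= a[1]
    return overlap and not (a_inside_b or b_inside_a)


def can_share_one_tree(level_a: list[Span], level_b: list[Span]) -> bool:
    """True if no span of one level crosses a span of the other, which is what
    nesting both levels in one XML tree requires (assuming each level is
    already well nested on its own)."""
    return not any(crosses(a, b) for a in level_a for b in level_b)


moves = [(0, 4), (5, 5), (6, 9)]   # hypothetical move spans
disfluencies = [(3, 8)]            # hypothetical disfluency spanning move boundaries
print(can_share_one_tree(moves, disfluencies))  # False
print(can_share_one_tree(moves, [(0, 1)]))      # True
```

With standoff annotation the check is unnecessary: each level keeps its own tree and only points at the shared words.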

9 Example Map Task Annotation Structure
[Figure: Words, Disfluencies, Dialogue Moves and Dialogue Games levels over a short exchange between speakers S1 and S2 ("turn right for three centimetres", "three or four centimetres", "okay"); labels include reparandum and repair (disfluency), instruct, ack and align (moves), and an instruct game.]

10 HCRC Map Task XML Corpus Architecture
[Figure: corpus annotation levels: Gaze, Timed Units, Tokens, Tagged Words, Automatic Syntax, Moves, Games, Transactions, Disfluencies, Landmark References, Other Speaker’s Words.]

11 Tools and Software
- LTXML tools: www.ltg.ed.ac.uk/software
- MATE workbench (NITE): mate.nis.sdu.dk (nite.nis.sdu.dk)
- Map Task XML: www.hcrc.ed.ac.uk/maptask

12 knit
- Part of the LTXML toolkit
- Allows you to “expand” links according to how they have been defined in the DTD (e.g. replace or embed)
- Command-line program, can be used in pipelines
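What expanding an embed link involves can be sketched in Python: copy the linking document and, for every element with an `href`, append copies of the target elements as its children. This is not knit's implementation or interface. `knit_embed` and `link_targets` are hypothetical names, only the embed behaviour described on slide 4 is modelled (replace links are not), and the `<word>` markup is assumed.

```python
import copy
import re
import xml.etree.ElementTree as ET

# Hypothetical input documents: <word id="..."> is an assumed element name;
# the moves follow slide 6.
WORDS_XML = (
    '<words><word id="w1">turn</word> <word id="w2">right</word> '
    '<word id="w3">for</word> <word id="w4">three</word> '
    '<word id="w5">centimetres</word> <word id="w6">okay</word></words>'
)
MOVES_XML = (
    '<moves>'
    '<move type="instruct" speaker="spk1" id="m1" href="words.xml#id(w1)..id(w5)"/>'
    '<move type="align" speaker="spk1" id="m2" href="words.xml#id(w6)"/>'
    '</moves>'
)

# Assumed pointer grammar: DOCUMENT#id(START), optionally followed by ..id(END)
HREF = re.compile(r"^([^#]+)#id\(([^)]+)\)(?:\.\.id\(([^)]+)\))?$")


def link_targets(href, documents):
    """Elements from the first to the last id named in href, in document order."""
    doc_name, start_id, end_id = HREF.match(href).groups()
    ordered = [el for el in documents[doc_name].iter() if "id" in el.attrib]
    ids = [el.get("id") for el in ordered]
    return ordered[ids.index(start_id) : ids.index(end_id or start_id) + 1]


def knit_embed(root, documents):
    """Copy root, embedding copies of each link's targets inside the linking element."""
    knitted = copy.deepcopy(root)  # leave the standoff source untouched
    for element in list(knitted.iter()):  # snapshot, because the tree grows below
        href = element.get("href")
        if href is None:
            continue
        for target in link_targets(href, documents):
            clone = copy.deepcopy(target)
            clone.tail = None  # drop whitespace that followed it in words.xml
            element.append(clone)
    return knitted


documents = {"words.xml": ET.fromstring(WORDS_XML)}
knitted = knit_embed(ET.fromstring(MOVES_XML), documents)
print(ET.tostring(knitted, encoding="unicode"))
```

The result is one hierarchical view (moves over words); the source files themselves stay separate.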

14 Standoff Example (4): Moves XML with embed links
turn right for three centimetres okay …

15 Standoff Example (4): Moves XML with replace links
turn right for three centimetres okay …

16 Working with knit
- Use knit on one XML document to work with one hierarchical view of the data
- To work across hierarchies, knit several views and navigate using the structures plus the unique IDs of elements

17 Stylesheets
- Stylesheet: template rules
   – pattern which specifies which tree it applies to
   – pattern which specifies which tree it should output
- Stylesheet processor
   – reads XML document and stylesheet
   – carries out the instructions in the stylesheet
   – outputs a new XML document or

18 Template Matching
XPath is a language for addressing parts of an XML document, and is used by XSLT in the match attribute of a template, e.g. matches any sentence element. A stylesheet processor goes through the XML document matching elements to templates and carries out the instructions in the template.
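The matching loop a stylesheet processor performs can be imitated in a few lines of Python. This is a toy stand-in, not XSLT: a "pattern" here is only an element name rather than an XPath expression, the fallback rule only loosely follows XSLT's built-in template, and the element names `sentence` and `word` as well as the function names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the element names "sentence" and "word" are assumptions.
DOC = (
    "<text><sentence><word>turn</word> <word>right</word></sentence>"
    "<sentence><word>okay</word></sentence></text>"
)


def process_contents(element, templates):
    """Output an element's own text, then each child (via its template) and the text after it."""
    parts = [element.text or ""]
    for child in element:
        parts.append(apply_templates(child, templates))
        parts.append(child.tail or "")
    return "".join(parts)


def apply_templates(element, templates):
    """Use the template whose pattern (here just a tag name) matches the element."""
    template = templates.get(element.tag)
    if template is None:
        # Fallback loosely modelled on XSLT's built-in rule: keep text, recurse.
        return process_contents(element, templates)
    return template(element, templates)


def sentence_template(element, templates):
    # Matches any sentence element and brackets its processed contents.
    return "[" + process_contents(element, templates).strip() + "]"


def word_template(element, templates):
    # Matches any word element and upper-cases its text.
    return (element.text or "").upper()


TEMPLATES = {"sentence": sentence_template, "word": word_template}
print(apply_templates(ET.fromstring(DOC), TEMPLATES))  # [TURN RIGHT][OKAY]
```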

19 Standard Stylesheet Example

20 The MATE Workbench
- For display, querying, and especially annotation of XML corpora
- Flexible user-defined user interfaces
- Uses stylesheets to create Java display objects which have defined user interface behaviours
- In MATE’s internal data representation, elements with link pointers are viewed as parents of the elements they point to

21 MATE query language
- Easy to write queries over more than one hierarchy
- In the MATE query language you define variables by element type and then the relationships between them
- ($a ^ $b) means that element $a is a parent of element $b, either in the same document or via a link

22 MATE example query
Find all words which are in a move whose label is “instruct” and which are part of a disfluency:
    ($w word)($m move)($d disfluency); ($m ^ $w) and ($m label ~ instruct) and ($d ^ $w)
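Read procedurally, the query collects the words each instruct move is a parent of via its link, collects the words under any disfluency, and keeps the words in both sets. The Python sketch below is not the MATE query engine, and it rests on several assumptions: the `<word>` and `<disfluency>` markup is hypothetical, the move label is taken from the `type` attribute used on slide 6 (the query itself says `label`), and `~` is treated as a regular-expression match.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical levels: the <word> markup and the disfluency are assumptions;
# the moves follow slide 6.
WORDS_XML = (
    '<words><word id="w1">turn</word><word id="w2">right</word>'
    '<word id="w3">for</word><word id="w4">three</word>'
    '<word id="w5">centimetres</word><word id="w6">okay</word></words>'
)
MOVES_XML = (
    '<moves>'
    '<move type="instruct" speaker="spk1" id="m1" href="words.xml#id(w1)..id(w5)"/>'
    '<move type="align" speaker="spk1" id="m2" href="words.xml#id(w6)"/>'
    '</moves>'
)
DISFLUENCIES_XML = (
    '<disfluencies><disfluency id="d1" href="words.xml#id(w4)..id(w5)"/></disfluencies>'
)

# Assumed pointer grammar: DOCUMENT#id(START), optionally followed by ..id(END)
HREF = re.compile(r"^([^#]+)#id\(([^)]+)\)(?:\.\.id\(([^)]+)\))?$")


def linked_ids(element, documents):
    """IDs of the elements that `element` is a parent of via its href ($a ^ $b)."""
    doc_name, start_id, end_id = HREF.match(element.get("href")).groups()
    ids = [el.get("id") for el in documents[doc_name].iter() if "id" in el.attrib]
    return ids[ids.index(start_id) : ids.index(end_id or start_id) + 1]


def instruct_words_in_disfluencies(documents, moves, disfluencies):
    # ($m ^ $w) and ($m label ~ instruct): words under an instruct move
    in_instruct = {
        w
        for m in moves.iter("move")
        if re.search("instruct", m.get("type", ""))
        for w in linked_ids(m, documents)
    }
    # ($d ^ $w): words under any disfluency
    in_disfluency = {
        w for d in disfluencies.iter("disfluency") for w in linked_ids(d, documents)
    }
    both = in_instruct & in_disfluency
    # Report matching words in the order they occur in the words level.
    return [w.text for w in documents["words.xml"].iter("word") if w.get("id") in both]


documents = {"words.xml": ET.fromstring(WORDS_XML)}
result = instruct_words_in_disfluencies(
    documents, ET.fromstring(MOVES_XML), ET.fromstring(DISFLUENCIES_XML)
)
print(result)  # ['three', 'centimetres']
```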

23 Conclusions
- Standoff markup is not just theoretically a good idea
- Map Task standoff annotations have been in place for 5 years and are used regularly
- Accessible to linguists with modest technical backgrounds

