Digital Humanities – Stanford University – 2011-06-20: Expressive power of markup languages and graph structures. Yves Marcoux, Université de Montréal, Canada.


1 Expressive power of markup languages and graph structures. Digital Humanities – Stanford University – 2011-06-20. Yves Marcoux (Université de Montréal, Canada), Michael Sperberg-McQueen (Black Mesa Technologies), Claus Huitfeldt (University of Bergen, Norway)

2 Overview of the talk
1. Problem setting
  – Graph representations of XML documents
  – Need for more complex structures
  – Overlap-only TexMECS
2. Main result and consequence
3. Future work

3 1. Problem setting

4 Graph representations of structured documents

5 XML document = tree
[Figure: an XML document and its tree, with nodes top, a, b, c]
Embedding in markup ⇔ Child-parent in tree

6 Any tree ⇔ an XML document
[Figure: tree with nodes top, a, b, c]

7 Any tree ⇔ an XML document
[Figure: tree with nodes top, a, b, c, d, e]
Perfect correspondence!
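The tree/XML correspondence stated on slides 5 to 7 can be illustrated with a minimal Python sketch. The tuple representation, the function names `tree_to_xml` and `xml_to_tree`, and the shape of the sample tree are assumptions made for illustration; the slides only show the node labels. Text content and attributes are ignored here.

```python
import xml.etree.ElementTree as ET

# A tree node is a pair (label, [children]); labels are assumed to be valid XML names.
def tree_to_xml(node):
    """Serialize a tree: embedding in the markup mirrors child-parent in the tree."""
    label, children = node
    inner = "".join(tree_to_xml(child) for child in children)
    return f"<{label}>{inner}</{label}>"

def xml_to_tree(text):
    """Parse element-only XML back into the same (label, [children]) form."""
    def convert(element):
        return (element.tag, [convert(child) for child in element])
    return convert(ET.fromstring(text))

# Hypothetical tree using the labels from the slides; its shape is an assumption.
tree = ("top", [("a", []), ("b", [("d", []), ("e", [])]), ("c", [])])
xml_text = tree_to_xml(tree)
round_trip = xml_to_tree(xml_text)
```

Serializing and parsing again returns the original tree, which is the sense in which the correspondence is "perfect" for element structure.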

8 Document Object Models
DOMs are essentially graph representations of structured documents
"Patched" for attributes, namespaces, etc.
DOM manipulations = graph modifications
It suffices to make sure that the graph remains a tree
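The condition "the graph remains a tree" can be checked mechanically after each modification. The following is a minimal sketch, not any DOM implementation's actual method; the adjacency-list representation and the name `is_tree` are assumptions.

```python
def is_tree(children):
    """children: dict mapping a node to the ordered list of its child nodes.

    The graph is a tree if it has exactly one node without a parent, every
    other node has exactly one parent, and every node is reachable from the root.
    """
    nodes = set(children)
    for kids in children.values():
        nodes.update(kids)
    parent_count = {node: 0 for node in nodes}
    for kids in children.values():
        for kid in kids:
            parent_count[kid] += 1  # a child listed twice counts as two parents
    roots = [node for node in nodes if parent_count[node] == 0]
    if len(roots) != 1 or any(count > 1 for count in parent_count.values()):
        return False
    # Reachability from the single root rules out detached cycles.
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen == nodes

document = {"top": ["a", "b", "c"]}
still_tree_before = is_tree(document)
document["b"] = ["a"]            # "a" now has two parents: no longer a tree
still_tree_after = is_tree(document)
```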

9 Need for more complex structures

10 Overlap et al.
In real life (outside of XML documents), information is often not purely hierarchical
Classical examples:
  – verse structure vs sentence structure
  – speech structure vs line structure
  – reordering, discontinuity, etc.
In general: multiple structures applied (at least in part) to the same contents
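To make "not purely hierarchical" concrete, here is a small Python sketch that treats each structural unit as a half-open range over the same content and tests whether two units overlap without nesting. The range encoding, the function name `properly_overlap`, and the sample indices are illustrative assumptions, not taken from the slides.

```python
def properly_overlap(a, b):
    """True if half-open ranges a and b share content but neither contains the other."""
    (a_start, a_end), (b_start, b_end) = a, b
    return a_start < b_start < a_end < b_end or b_start < a_start < b_end < a_end

# Hypothetical units over three text segments numbered 0, 1, 2.
verse_line = (0, 2)   # a verse line covering segments 0 and 1
speech = (1, 3)       # a speech covering segments 1 and 2
short_speech = (0, 1) # a speech covering segment 0 only

crossing = properly_overlap(verse_line, speech)        # cannot be nested in one tree
nested = properly_overlap(verse_line, short_speech)    # plain containment
```

A pair for which `properly_overlap` is true cannot both be elements of a single XML tree over that content, which is the situation the later slides address.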

11 Example 1
(Peer) Hvorfor bande? (Åse) Tvi, du tør ej! ¶ Alt ihob er tøv og tant! ¶
[Figure: graph with nodes vers, peer, åse over the text "Hvorfor bande?", "Tvi, du tør ej!", "Alt ihob er tøv og tant!"]

12 Example 2 (last verse spoken in chorus by Peer & Åse)
[Figure: graph with nodes vers, peer, åse over the text "Hvorfor bande?", "Tvi, du tør ej!", "Alt ihob er tøv og tant!"]

13 Example 3 (last verse spoken in chorus by Peer & Åse)
[Figure: graph with nodes vers, peeråse over the text "Hvorfor bande?", "Tvi, du tør ej!", "Alt ihob er tøv og tant!"]

14 Example 4

15 OO-TexMECS

16 TexMECS
A particular proposal to address the overlap problem with overlapping markup+
MECS (Huitfeldt 1992-1996)
  – Multi-element code system
TexMECS (Huitfeldt & SMcQ 2003)
  – "Trivially extended MECS"
Markup Languages for Complex Documents (MLCD) project

17 Overlap-only TexMECS
TexMECS allows overlapping markup... but also much more:
  – virtual elements, interrupted elements, etc.
OO-TexMECS 101
  – Start-tags: [tag syntax lost in extraction]
  – Overlapping elements allowed
  – Natural notion of well-formedness

18 OO-TexMECS example
(Peer) Hvorfor bande? (Åse) Tvi, du tør ej! ¶ Alt ihob er tøv og tant! ¶
[OO-TexMECS encoding lost in extraction; only the final end-tag |doc> survives]
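A rough Python sketch of what a well-formedness check for overlap-only markup could look like follows. It assumes TexMECS-style start-tags of the form `<name|` and end-tags of the form `|name>` (consistent with the surviving `|doc>`), and it assumes one plausible reading of "natural" well-formedness: every end-tag closes an open element of the same name, and nothing is left open at the end. The function name `parse_oo_texmecs`, the rule that an end-tag closes the most recently opened element of that name, and the sample string are illustrative assumptions, not the authors' definition.

```python
import re

# Assumed tag syntax: start-tag "<name|", end-tag "|name>".
TAG = re.compile(r"<([^\s<>|]+)\||\|([^\s<>|]+)>")

def parse_oo_texmecs(text):
    """Return (name, start, end) ranges over the tag-free text, or raise ValueError."""
    open_starts = {}   # element name -> offsets at which still-open elements started
    elements = []
    plain_length = 0   # number of text characters seen so far, tags excluded
    position = 0
    for match in TAG.finditer(text):
        plain_length += match.start() - position
        position = match.end()
        start_name, end_name = match.groups()
        if start_name is not None:
            open_starts.setdefault(start_name, []).append(plain_length)
        elif open_starts.get(end_name):
            # Unlike XML, the closed element need not be the most recently opened one.
            elements.append((end_name, open_starts[end_name].pop(), plain_length))
        else:
            raise ValueError(f"end-tag |{end_name}> has no open element")
    unclosed = [name for name, starts in open_starts.items() if starts]
    if unclosed:
        raise ValueError(f"unclosed elements: {unclosed}")
    return elements

# Elements a and b overlap: a covers "xy", b covers "yz".
overlapping = parse_oo_texmecs("<a|x<b|y|a>z|b>")
```

The only difference from an XML-style check is that no stack discipline is enforced between elements of different names, which is what admits overlap.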

19 Earlier result
In 2008 [M2008], we identified a particular class of graphs that we showed to correspond exactly to OO-TexMECS
  – Those graphs are essentially+ the « restricted GODDAGs » (r-GODDAGs) of [SH2004]
  – All trees are r-GODDAGs
  – Some non-trees are r-GODDAGs too
  – So: OO-TexMECS is more expressive than XML

20 Example 1: r-GODDAG? √
[Figure: graph with nodes vers, peer, åse over the text "Hvorfor bande?", "Tvi, du tør ej!", "Alt ihob er tøv og tant!"]

21 Example 2: r-GODDAG?
[Figure: graph with nodes vers, peer, åse over the text "Hvorfor bande?", "Tvi, du tør ej!", "Alt ihob er tøv og tant!"]

22 Example 3: r-GODDAG?
[Figure: graph with nodes vers, peeråse over the text "Hvorfor bande?", "Tvi, du tør ej!", "Alt ihob er tøv og tant!"]

23 Example 4: r-GODDAG?

24 However…
That kind of result depends on the class of « possible » graphs
Proof used « noDAGs » (node-ordered)
  – Already a fairly restricted class (though not as restricted as r-GODDAGs)
Would we get the same result with a larger universe of discourse…
  – Arbitrary graphs?

25 Example
[Figure: text "A B"; two graphs with node labels b, a over leaves "A", "B", the second with an additional empty leaf ""]

26 2. Main result and consequence

27 The result (1/4)
Essentially: r-GODDAGs are really the only graphs you can express with OO-TexMECS

28 The result (2/4)
Universe of discourse: CODGs (child-ordered graphs)
  – finite, directed graphs, otherwise unrestricted
  – can have cycles
  – same child multiple times
  – many « roots »
  – can be disconnected
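One way to picture this universe of discourse is as a plain adjacency structure with ordered child lists. The sketch below is an illustration only: the dictionary representation, the helper names `roots` and `has_cycle`, and the sample graph are assumptions, and every node is assumed to appear as a key.

```python
def roots(graph):
    """Nodes with no incoming edge, in sorted order; there may be none or several."""
    with_parent = {child for kids in graph.values() for child in kids}
    return sorted(node for node in graph if node not in with_parent)

def has_cycle(graph):
    """Depth-first search with three colours; a grey child means a back edge."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for child in graph[node]:
            if colour[child] == GREY:
                return True
            if colour[child] == WHITE and visit(child):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in graph)

# Hypothetical CODG: child lists are ordered and may repeat a child.
codg = {
    "r1": ["x", "y", "x"],  # same child "x" twice
    "r2": ["y"],            # second root sharing child "y"
    "x": [],
    "y": [],
    "p": ["q"],             # disconnected component...
    "q": ["p"],             # ...that forms a cycle
}
codg_roots = roots(codg)
codg_cyclic = has_cycle(codg)
```

None of these features is allowed in a tree, which is why the class is a much larger starting point than XML's document model.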

29 Example 4: CODG? √

30 The result (3/4)
Proof did not carry over
Defining condition for graphs expressible in OO-TexMECS did not carry over
  – completion-acyclic noDAGs
  – vs full-completion-acyclic CODGs
But essentially:
  – completion-acyclic noDAGs =
  – full-completion-acyclic CODGs = r-GODDAGs

31 The result (4/4)
So, essentially, the graphs expressible in OO-TexMECS are:
  – the completion-acyclic noDAGs =
  – the full-completion-acyclic CODGs =
  – the r-GODDAGs
Consequence: if you need more complex structures than r-GODDAGs, you must extend XML with more than overlap+

32 Future work
Optimal verification algorithm for full-completion-acyclicity
Optimal serialization algorithm for full-completion-acyclic CODGs
Graphs with partially ordered children
Other constructs of TexMECS

33 Thank you! Questions?

