Generating the PyOpenGL Docs Hey, you got Python in my XML!


1 Generating the PyOpenGL Docs Hey, you got Python in my XML!

2 How I Stopped Worrying About Efficiency And used Python to process XML

3 An Exploration ElementTree (lxml.etree) and Kid XML Templating

4 The Task Merge Python-specific Information into OpenGL.org's Man-page Documentation

5 Details
    Docbook XML “Fragment” Files (434 files)
    Embedded MathML
    “Driver” Documents
    Python-specific API (live introspection)

6 Original Solution
    Java (uh-oh)
    Saxon
    Docbook-XSL
    Sun Resolver
    Oasis XML Catalogs (setup req'd)
    MathML DTD (setup req'd)
    MathML XSL Stylesheets
    Custom XSL to join, merge and process

7 Resulting Problems
    Did I mention Java and XSL?
    Fragile configuration

8 Way too Slow
    Loading huge document
    Profligate XPath queries
    Run time: >70m

9 Way too Difficult
    No easy way to alter results
    Python-specific content required DocBook
    Post-processing required to add Pythonic data

10 Hang Performance Need something we can keep up to date

11 Small Bites
    Read Individual Documents
    Model from Document (where interesting)
    Index in Python

12 Read How?
    from lxml import etree as ET
    tree = ET.XML( data )
    tree = ET.parse( filename )
    tree = ET.parse( file_like_object )
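
A minimal sketch of the three entry points on this slide. The sample markup and the file name are made up for illustration, and the import falls back to the standard library's xml.etree.ElementTree (which shares these calls) so the sketch runs without lxml installed.

```python
import io
import os
import tempfile

try:
    from lxml import etree as ET
except ImportError:  # assumption: the stdlib module is close enough for this sketch
    import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from the real man pages
data = '<refentry id="glClear"><refname>glClear</refname></refentry>'

# 1. From a string: ET.XML returns the root element directly
root = ET.XML(data)

# 2. From a filename: ET.parse returns an ElementTree wrapping the root
with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'glClear.xml')
    with open(filename, 'w') as out:
        out.write(data)
    tree_from_file = ET.parse(filename)

# 3. From a file-like object: same call, different argument
tree_from_stream = ET.parse(io.BytesIO(data.encode('utf-8')))
```

Note the difference in return types: ET.XML gives an element, while ET.parse gives a tree whose root is reached with getroot().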

13 Propriety Counts!
    XML is the land of the anal-retentive
    Namespace declarations are key if you want to process real documents

14 Node Types
    node = ET.Element( '{namespace}tagname' )
    node.tag == '{namespace}tagname'
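
To make the '{namespace}tagname' convention concrete, here is a small sketch. The DocBook 5 namespace URL is used only as an example value, and split_tag is a hypothetical helper, not part of ElementTree or lxml.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback so the sketch runs anywhere
    import xml.etree.ElementTree as ET

DOCBOOK_NS = 'http://docbook.org/ns/docbook'  # example namespace for illustration

# Creating a namespaced element and parsing one give the same tag string
created = ET.Element('{%s}para' % DOCBOOK_NS)
parsed = ET.XML('<para xmlns="%s">Clears buffers.</para>' % DOCBOOK_NS)

# Without the xmlns declaration the tag is just the bare name
unqualified = ET.XML('<para>Clears buffers.</para>')


def split_tag(tag):
    """Split '{namespace}name' into (namespace, name); namespace is None if absent."""
    if tag.startswith('{'):
        namespace, name = tag[1:].split('}', 1)
        return namespace, name
    return None, tag
```

This is why a search for 'para' silently finds nothing in a namespaced document: the real tag is the full '{...}para' string.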

15 Node Content
    for x in node:
    node[i:j]
    node.remove( child_node )
    node.index( child_node )
    node.append( child_node )
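
The list-like behaviour above can be sketched as follows. The element names are invented for the example. Because the import falls back to the standard library, which has no Element.index(), the position lookup goes through list(); with lxml, section.index(note) does the same job.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback so the sketch runs anywhere
    import xml.etree.ElementTree as ET

section = ET.Element('section')
for name in ('title', 'para', 'note', 'para'):
    ET.SubElement(section, name)

# Iteration yields the direct children in document order
tags = [child.tag for child in section]

# Slicing returns a plain list of child elements
first_two = section[0:2]

# Position lookup; lxml also offers section.index(note)
note = section[2]
position = list(section).index(note)

# Remove one child and append a new one
section.remove(note)
section.append(ET.Element('footer'))
tags_after = [child.tag for child in section]
```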

16 Text Content and Tails
    content = '<node>a<child>b</child>c</node>'
    node = ET.XML( content )
    child = node[0]
    node.text == 'a'
    child.text == 'b'
    # note that this is not inside child!
    child.tail == 'c'
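
A runnable version of the text/tail example, assuming the sample markup is a parent element wrapping a single child (the element names here are an assumption chosen to match the variable names on the slide).

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback so the sketch runs anywhere
    import xml.etree.ElementTree as ET

content = '<node>a<child>b</child>c</node>'
node = ET.XML(content)
child = node[0]

# Text before the first child belongs to the parent
parent_text = node.text

# Text inside the child belongs to the child
child_text = child.text

# Text after the child's closing tag hangs off the child as its tail,
# even though it sits inside the parent in the document
child_tail = child.tail

# itertext() walks text and tails in document order
flattened = ''.join(node.itertext())
```

Forgetting the tail is the usual way text goes missing when elements are moved or rewritten by hand.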

17 Attributes
    node.get( key )
    node.set( key, value )
    node.keys()
    node.items()
    Note: namespaced attributes use '{namespace}name' keys
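
A short sketch of the attribute calls, including the namespace caveat. The attribute values and the choice of the XLink namespace are illustrative assumptions.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback so the sketch runs anywhere
    import xml.etree.ElementTree as ET

XLINK_NS = 'http://www.w3.org/1999/xlink'  # example namespace for illustration

node = ET.Element('link')
node.set('id', 'glClear')
node.set('{%s}href' % XLINK_NS, 'glClear.xml')

plain = node.get('id')
missing = node.get('role', 'none')            # second argument is the default
qualified = node.get('{%s}href' % XLINK_NS)   # full '{namespace}name' key is required
unqualified = node.get('href')                # bare name does not match

names = sorted(node.keys())
pairs = dict(node.items())
```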

18 Processing Approaches Search vs. Recursion

19 Why Search?
    Search better for “interpretation” tasks
    Find all “definitions” and register them
    Find all “references” and register them
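
The "find and register" idea can be sketched as below. The markup is a heavily simplified stand-in for DocBook reference pages (real citerefentry elements carry more structure), and the two dictionaries are hypothetical index structures rather than the talk's actual ones.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback so the sketch runs anywhere
    import xml.etree.ElementTree as ET

# Simplified, made-up sample document
doc = ET.XML(
    '<book>'
    '<refentry id="glClear"><refname>glClear</refname>'
    '<para>See <citerefentry>glClearColor</citerefentry>.</para></refentry>'
    '<refentry id="glClearColor"><refname>glClearColor</refname>'
    '<para>See <citerefentry>glClear</citerefentry>.</para></refentry>'
    '</book>'
)

definitions = {}   # name -> element that defines it
references = {}    # name -> list of entries that mention it

for entry in doc.findall('.//refentry'):
    name = entry.findtext('refname')
    definitions[name] = entry
    for ref in entry.findall('.//citerefentry'):
        references.setdefault(ref.text, []).append(name)
```

Nothing in the document is changed here; the search only builds Python-side lookups that later steps can use.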

20 Why Recursive Processing?
    Recursive better for “transformation” tasks
    Convert this to a that
    Process all children deciding what to do with them
    Filter, modify, rewrite
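
A minimal sketch of a recursive transformation: every source element becomes a div or span carrying the old tag name as its class. The INLINE set and the sample markup are assumptions for illustration, not the talk's actual rules.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback so the sketch runs anywhere
    import xml.etree.ElementTree as ET

INLINE = {'emphasis', 'function', 'parameter'}  # assumed inline tag names


def transform(node):
    """Return a new div/span tree mirroring node, with the old tag as CSS class."""
    result = ET.Element('span' if node.tag in INLINE else 'div')
    result.set('class', node.tag)
    result.text = node.text
    result.tail = node.tail   # keep the text that follows the element
    for child in node:
        result.append(transform(child))
    return result


source = ET.XML('<para>Call <function>glClear</function> first.</para>')
html = transform(source)
```

Each call decides what to do with one node and then hands its children back to the same function, which is what makes filtering or rewriting individual node types easy to add.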

21 Searching (XPath)
    lxml.etree-specific (not standard)
    Convenient, but use with caution
    nodes = node.xpath( './/d:nodetype', namespaces={'d':'namespace_url'} )
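
A sketch of a namespace-aware search. To stay runnable without lxml it uses findall(), the ElementPath subset that lxml.etree shares with the standard library, and only calls xpath() when the element actually has that method. The namespace URL and markup are example values.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback so the sketch runs anywhere
    import xml.etree.ElementTree as ET

NS = {'d': 'http://docbook.org/ns/docbook'}  # prefix 'd' is our own choice

root = ET.XML(
    '<refentry xmlns="http://docbook.org/ns/docbook">'
    '<refnamediv><refname>glClear</refname></refnamediv>'
    '<refname>glFlush</refname>'
    '</refentry>'
)

# An unprefixed search finds nothing: the real tags are namespace-qualified
unqualified = root.findall('.//refname')

# The prefix in the path is mapped to the namespace URL by the dictionary
names = [node.text for node in root.findall('.//d:refname', NS)]

# lxml only: the same query through full XPath, namespaces passed by keyword
xpath_names = None
if hasattr(root, 'xpath'):
    xpath_names = [node.text for node in root.xpath('.//d:refname', namespaces=NS)]
```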

22 XPath Paths
    '//something' => anywhere in document
    './/something' => descendant of current
    './something' => direct child of current
    './something/*' => direct children of something

23 Compound Paths
    './/something/other' => descendant “other” with direct parent “something”
    './/something//other' => descendant “other” with intermediate ancestor “something”
    './/something/*/other' => others with something as grandparent

24 XPath Qualifiers
    './something[1]' => first something child
    './something[last()]' => last something child
    "./something[text()='blah']" => where content is blah
    './/*[@id="this"]' => where id attribute is this
    (also contains(), starts-with())
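
The path and qualifier forms from the last three slides, exercised against a made-up document. Paths that the shared ElementPath subset understands go through findall(); the text(), starts-with() and absolute '//' forms need full XPath, so they are only run when lxml's xpath() is available.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback so the sketch runs anywhere
    import xml.etree.ElementTree as ET

doc = ET.XML(
    '<root>'
    '<something id="this">blah<other>direct</other></something>'
    '<something><wrap><other>nested</other></wrap></something>'
    '<group><something>deep</something></group>'
    '</root>'
)


def texts(path):
    """Text of every element matched by path, in document order."""
    return [node.text for node in doc.findall(path)]


descendants = len(doc.findall('.//something'))       # any depth below doc
children = len(doc.findall('./something'))           # direct children only
child_tags = [n.tag for n in doc.findall('./something/*')]

direct_parent = texts('.//something/other')          # parent is something
any_ancestor = texts('.//something//other')          # something anywhere above
grandparent = texts('./something/*/other')           # exactly one level between

first = doc.findall('./something[1]')[0]
last = doc.findall('./something[last()]')[0]
by_id = doc.findall(".//*[@id='this']")

# Full XPath only (lxml): text() tests, functions, absolute paths
xpath_counts = None
if hasattr(doc, 'xpath'):
    xpath_counts = (
        len(doc.xpath("./something[text()='blah']")),
        len(doc.xpath('//something')),
        len(doc.xpath(".//*[starts-with(@id, 'th')]")),
    )
```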

25 For each Document
    Load document
    Search for interesting “sections”
    Convert “interesting” sections to Model objects (Model includes indexing)
    Retain remaining sections as XML content
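
One way the per-document steps might look in code. FunctionDoc, load_document and the choice of refnamediv as the "interesting" section are all hypothetical; the talk does not show its actual model classes.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback so the sketch runs anywhere
    import xml.etree.ElementTree as ET


class FunctionDoc:
    """Hypothetical model object for one documented function."""

    def __init__(self, name, purpose):
        self.name = name
        self.purpose = purpose
        self.sections = []   # uninterpreted sections, kept as XML elements


def load_document(source, index):
    """Parse one document, model its header, and register it in index."""
    root = ET.XML(source)
    header = root.find('refnamediv')   # the "interesting" section in this sketch
    model = FunctionDoc(header.findtext('refname'), header.findtext('refpurpose'))
    # Everything else stays as XML for later recursive rendering
    model.sections = [child for child in root if child.tag != 'refnamediv']
    index[model.name] = model
    return model


# Made-up sample document
sample = (
    '<refentry>'
    '<refnamediv><refname>glClear</refname>'
    '<refpurpose>clear buffers to preset values</refpurpose></refnamediv>'
    '<refsect1><title>Description</title></refsect1>'
    '</refentry>'
)
index = {}
model = load_document(sample, index)
```

Because each document is loaded on its own, no single huge tree has to be held or queried, which is the "small bites" point made earlier in the talk.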

26 Pythonic Data
    Load the Python modules/objects
    Use introspection to produce Model objects (attach to model objects previously created)
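
A sketch of the introspection step using the standard inspect module. A SimpleNamespace with one stand-in function replaces the real PyOpenGL module so the example runs anywhere; PyFunction, introspect and manual_index are hypothetical names, and real PyOpenGL entry points may be wrapper objects rather than plain functions.

```python
import inspect
import types


class PyFunction:
    """Hypothetical model of one Python-level function."""

    def __init__(self, name, signature, doc):
        self.name = name
        self.signature = signature
        self.doc = doc


def glClear(mask):
    """glClear(mask) -> None"""


# Stand-in for an imported module (assumption: not the real OpenGL.GL)
fake_module = types.SimpleNamespace(glClear=glClear, GL_COLOR_BUFFER_BIT=0x4000)


def introspect(module):
    """Build a PyFunction for every plain function found on module."""
    found = {}
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        found[name] = PyFunction(
            name, str(inspect.signature(obj)), inspect.getdoc(obj) or ''
        )
    return found


# Attach the Python-side data to entries created from the XML earlier
manual_index = {'glClear': []}   # name -> list of attached Python models
for name, function in introspect(fake_module).items():
    if name in manual_index:
        manual_index[name].append(function)
```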

27 Now What? Create tags in ElementTree? I don't think so. This is page layout!

28 Kid to the Rescue
    Happens to be based on ElementTree (though not lxml.etree)
    Doesn't actually matter for this application

29 Template Functions Matter
    Kid (and Genshi) let you define XML functions
    Reusable template fragments
    Pass in Python objects (model)
    ElementTree elements are also Python objects

30 Dispatch Fragments on Type
    Model objects displayed with simple templates
    Uninteresting markup uses recursive transformation (divs and spans from tag names)
    Somewhat interesting markup uses special fragments (e.g. make cross-refs anchors)
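
The dispatch idea, sketched with plain Python functions standing in for Kid template functions (this is not Kid syntax). FunctionDoc, the SPECIAL table and the output markup are assumptions made for the example.

```python
from html import escape

try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback so the sketch runs anywhere
    import xml.etree.ElementTree as ET


class FunctionDoc:
    """Hypothetical model object."""

    def __init__(self, name, purpose):
        self.name = name
        self.purpose = purpose


def render_model(model):
    """Simple fragment for a model object."""
    return '<h1 id="%s">%s</h1><p>%s</p>' % (
        escape(model.name), escape(model.name), escape(model.purpose)
    )


def render_xref(node):
    """Special fragment: turn a cross-reference into an anchor."""
    target = escape(node.text or '')
    return '<a href="#%s">%s</a>' % (target, target)


def render_generic(node):
    """Uninteresting markup: a div named after the tag, children recursed."""
    inner = escape(node.text or '')
    for child in node:
        inner += render(child) + escape(child.tail or '')
    return '<div class="%s">%s</div>' % (node.tag, inner)


SPECIAL = {'citerefentry': render_xref}   # tag name -> special fragment


def render(item):
    """Pick a fragment by the type (or tag) of the thing being displayed."""
    if isinstance(item, FunctionDoc):
        return render_model(item)
    return SPECIAL.get(item.tag, render_generic)(item)


para = ET.XML('<para>See <citerefentry>glClear</citerefentry>.</para>')
para_html = render(para)
model_html = render(FunctionDoc('glClear', 'clear buffers'))
```

Adding a new special case means adding one function and one entry in the table; the generic path handles everything else.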

31 Iterate
    CSS Tweaking
    Index File

32 Advantages
    Much faster (70s vs >70m)
    Far easier to control output
    Far easier to set up
    < 1200 lines of code including everything

33 You got Python in my XML! Happy processing :).

