Published by Berenice Bailey. Modified over 7 years ago.
1
Generating the PyOpenGL Docs Hey, you got Python in my XML!
2
How I Stopped Worrying About Efficiency and Used Python to Process XML
3
An Exploration ElementTree (lxml.etree) and Kid XML Templating
4
The Task Merge Python-specific Information into OpenGL.org's Man-page Documentation
5
Details
    DocBook XML “Fragment” Files (434 files)
    Embedded MathML
    “Driver” Documents
    Python-specific API (live introspection)
6
Original Solution
    Java (uh-oh)
    Saxon
    Docbook-XSL
    Sun Resolver
    OASIS XML Catalogs (setup req'd)
    MathML DTD (setup req'd)
    MathML XSL Stylesheets
    Custom XSL to join, merge and process
7
Resulting Problems Did I mention Java and XSL? Fragile configuration
8
Way too Slow
    Loading huge document
    Profligate XPath queries
    Run time: >70m
9
Way too Difficult No easy way to alter results Python-specific content required DocBook Post-processing required to add Pythonic data
10
Hang Performance Need something we can keep up to date
11
Small Bites Read Individual Documents Model from Document (where interesting) Index in Python
12
Read How?
    from lxml import etree as ET
    root = ET.XML( data )                # returns the root Element
    tree = ET.parse( filename )          # returns an ElementTree
    tree = ET.parse( file_like_object )
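The three loading calls on this slide can be tried side by side. The sketch below is illustrative only: the sample markup is made up, and it assumes a fallback to the standard library's `xml.etree.ElementTree` when lxml is not installed (the calls used here behave the same in both).

```python
import io
import os
import tempfile

try:
    from lxml import etree as ET  # what the slides use
except ImportError:  # assumption: stdlib ElementTree is close enough here
    import xml.etree.ElementTree as ET

data = b"<refentry><title>glClear</title></refentry>"  # made-up sample

# 1. From bytes already in memory: XML() returns the root Element.
root_from_string = ET.XML(data)

# 2. From a filename: parse() returns an ElementTree, so ask for its root.
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as handle:
    handle.write(data)
    filename = handle.name
try:
    root_from_file = ET.parse(filename).getroot()
finally:
    os.remove(filename)

# 3. From any file-like object.
root_from_stream = ET.parse(io.BytesIO(data)).getroot()

titles = [
    root.find("title").text
    for root in (root_from_string, root_from_file, root_from_stream)
]
```

Note that `ET.XML` hands back an element while `ET.parse` hands back a tree wrapper; `getroot()` bridges the two.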
13
Propriety Counts! XML is the land of the anal-retentive Namespace declarations are key if you want to process real documents
14
Node Types
    node = ET.Element( '{namespace}tagname' )
    node.tag == '{namespace}tagname'
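A small sketch of what the `{namespace}tagname` form looks like on a real namespace. The document fragment is invented for illustration; the MathML namespace URL is the real one. The import fallback to the standard library is an assumption so the snippet runs without lxml.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback
    import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"

# The prefix "mml" only exists in the serialized text...
doc = ET.XML('<para xmlns:mml="%s"><mml:math/></para>' % MML)
math = doc[0]

# ...once parsed, the tag is the full "{namespace}localname" string.
parsed_tag = math.tag

# New elements are created with the same notation.
created = ET.Element("{%s}mi" % MML)
math.append(created)
```

Elements without a namespace, such as `para` here, keep a plain tag name.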
15
Node Content
    for x in node:    # iterate over children
    node[i:j]
    node.remove( child_node )
    node.index( child_node )
    node.append( child_node )
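The list-like behaviour of a node can be sketched as follows. The element names are made up. One caveat, stated as an assumption of this sketch: `node.index( child )` is an lxml method that the standard library's `Element` does not offer, so the position is computed through `list(root)` to keep the example runnable with either library.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback
    import xml.etree.ElementTree as ET

root = ET.XML("<refsect1><title/><para/><para/><variablelist/></refsect1>")

# Iteration visits the direct children in document order.
tags = [child.tag for child in root]

# Slicing returns a plain list of child elements.
middle = [child.tag for child in root[1:3]]

# Position of a child; under lxml, root.index(last) gives the same answer.
last = root[3]
position = list(root).index(last)

# Children can be removed and appended like list items.
root.remove(root[0])
root.append(ET.Element("para"))
after = [child.tag for child in root]
```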
16
Text Content and Tails
    content = '<node>a<child>b</child>c</node>'
    node = ET.XML( content )
    child = node[0]
    node.text == 'a'
    child.text == 'b'
    child.tail == 'c'    # note that this is not inside child!
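The text/tail split is easiest to see on a tiny document. This sketch assumes the slide's markup was something like `<node>a<child>b</child>c</node>` (the tags are a reconstruction), and falls back to the standard library when lxml is missing.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback
    import xml.etree.ElementTree as ET

content = "<node>a<child>b</child>c</node>"  # assumed markup
node = ET.XML(content)
child = node[0]

before_child = node.text   # text before the first child
inside_child = child.text  # text inside the child
after_child = child.tail   # text after the child's end tag, stored ON the child

# itertext() walks text and tails in document order.
full_text = "".join(node.itertext())
```

Because the tail is stored on the child, moving or deleting `child` takes the trailing `c` with it.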
17
Attributes
    .get( key )
    .set( key, value )
    .keys()
    .items()
    Note: namespaces
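The "Note: namespaces" remark matters because a namespaced attribute is only reachable under its `{namespace}name` key. A minimal sketch with an invented `link` element and the real XLink namespace, again assuming a standard-library fallback:

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback
    import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
node = ET.XML(
    '<link xmlns:xlink="%s" id="glClear" xlink:href="glClear.xml"/>' % XLINK
)

plain = node.get("id")                    # ordinary attribute
missing = node.get("missing")             # absent attributes give None
unqualified = node.get("href")            # None: the attribute is namespaced
qualified = node.get("{%s}href" % XLINK)  # found under its qualified name

node.set("role", "function")              # set() takes a key and a value
names = sorted(node.keys())
```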
18
Processing Approaches Search vs. Recursion
19
Why Search? Search better for “interpretation” tasks Find all “definitions” and register them Find all “references” and register them
20
Why Recursive Processing? Recursive better for “transformation” tasks Convert this to a that Process all children deciding what to do with them Filter, modify, rewrite
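A recursive "convert this to a that" pass might look like the sketch below. It is not the talk's actual code: the `BLOCK_TAGS` set and the `to_html` helper are hypothetical, and the rule (block tags become `div`, everything else `span`, with the original tag name kept as the class) is just one plausible transformation.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback
    import xml.etree.ElementTree as ET

BLOCK_TAGS = {"refsect1", "para"}  # hypothetical choice of block-level tags


def to_html(node):
    """Rebuild node as a div/span, then recurse into its children."""
    tag = "div" if node.tag in BLOCK_TAGS else "span"
    out = ET.Element(tag, {"class": node.tag})
    out.text = node.text  # keep leading text
    out.tail = node.tail  # keep text that follows the node
    for child in node:
        out.append(to_html(child))
    return out


source = ET.XML("<para>Call <function>glClear</function> first.</para>")
rendered = ET.tostring(to_html(source)).decode("ascii")
```

Each call decides what to do with one node and delegates the children, which is where filtering or rewriting rules would go.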
21
Searching (XPath)
    lxml.etree-specific (not standard)
    Convenient, but use with caution
    nodes = node.xpath( './/d:nodetype', namespaces={'d':'namespace_url'} )
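A search-and-register pass over a namespaced document can be sketched like this. To stay runnable without lxml, the sketch uses `findall()` with a prefix map, which both lxml and the standard library accept; under lxml, `doc.xpath(path, namespaces=NS)` takes the same path strings. The sample document is invented, and treating `refname` as a "definition" and `function` as a "reference" is an assumption for illustration.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback
    import xml.etree.ElementTree as ET

NS = {"d": "http://docbook.org/ns/docbook"}  # prefix map used by the queries

doc = ET.XML(
    '<d:refentry xmlns:d="http://docbook.org/ns/docbook">'
    "<d:refnamediv><d:refname>glClear</d:refname></d:refnamediv>"
    "<d:refsect1><d:para>See <d:function>glClearColor</d:function>"
    " and <d:function>glClear</d:function>.</d:para></d:refsect1>"
    "</d:refentry>"
)

# "Definitions": names this page documents.
definitions = [node.text for node in doc.findall(".//d:refname", NS)]

# "References": function names mentioned anywhere in the page.
references = [node.text for node in doc.findall(".//d:function", NS)]
```

The prefix in the query (`d`) only has to match the prefix map, not the prefix used in the file.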
22
XPath Paths
    '//something' => anywhere in document
    './/something' => descendant of current
    './something' => direct child of current
    './something/*' => direct children of something
23
Compound Paths
    './/something/other' => descendant “other” with direct parent “something”
    './/something//other' => descendant “other” with intermediate ancestor “something”
    './/something/*/other' => others with something as grandparent
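The relative path forms can be checked against a small tree. This is a sketch with made-up element names; it uses `findall()`, which understands these particular relative paths in both lxml and the standard library, rather than lxml's `xpath()`, so that it runs either way. The absolute `'//something'` form is left out because `findall()` on an element does not accept it.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback
    import xml.etree.ElementTree as ET

root = ET.XML(
    "<root>"
    '<something><other id="o1"/><wrap><other id="o2"/></wrap></something>'
    '<other id="o3"/>'
    "</root>"
)


def ids(path):
    """Return the id attributes of the matches, in document order."""
    return [node.get("id") for node in root.findall(path)]


direct_children = ids("./other")
any_descendant = ids(".//other")
child_tags = [node.tag for node in root.findall("./something/*")]
direct_parent = ids(".//something/other")
some_ancestor = ids(".//something//other")
grandparent = ids(".//something/*/other")
```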
24
XPath Qualifiers
    './something[1]' => first something child
    './something[last()]' => last something child
    "./something[text()='blah']" => where content is blah
    './/*[@id="this"]' => where id attribute is this
    (also: contains, starts-with)
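Positional and attribute qualifiers can be sketched on an invented list. Only the qualifiers that `findall()` supports in both lxml and the standard library are used here, so the snippet does not need lxml; the text test is done in plain Python instead, since `text()`, `contains()` and `starts-with()` are full-XPath features that need lxml's `xpath()`.

```python
try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback
    import xml.etree.ElementTree as ET

root = ET.XML(
    "<list>"
    '<item id="first">one</item>'
    '<item id="this">blah</item>'
    '<item id="last">three</item>'
    "</list>"
)

first = root.findall("./item[1]")[0].get("id")      # positions start at 1
last = root.findall("./item[last()]")[0].get("id")
by_id = root.findall(".//*[@id='this']")[0].text    # note the * before [

# Plain-Python stand-in for "./item[text()='blah']".
by_text = [item.get("id") for item in root.findall("./item") if item.text == "blah"]
```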
25
For each Document Load document Search for interesting “sections” Convert “interesting” sections to Model objects (Model includes indexing) Retain remaining sections as XML content
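The per-document steps above can be sketched as one function. Everything named here is hypothetical: the `Function` model class, the `INDEX` dictionary and `load_document` are illustrative stand-ins, not the talk's actual classes, and the sample page is invented.

```python
from dataclasses import dataclass, field

try:
    from lxml import etree as ET
except ImportError:  # assumption: stdlib fallback
    import xml.etree.ElementTree as ET


@dataclass
class Function:  # hypothetical model object
    name: str
    purpose: str
    sections: list = field(default_factory=list)  # leftover XML elements


INDEX = {}  # hypothetical index: function name -> model object


def load_document(data):
    """Load one page, model the interesting part, keep the rest as XML."""
    root = ET.XML(data)
    model = Function(
        name=root.findtext("refnamediv/refname"),
        purpose=root.findtext("refnamediv/refpurpose"),
    )
    # Anything not modelled is retained as plain elements.
    model.sections = [child for child in root if child.tag != "refnamediv"]
    INDEX[model.name] = model
    return model


page = (
    "<refentry>"
    "<refnamediv><refname>glClear</refname>"
    "<refpurpose>clear buffers</refpurpose></refnamediv>"
    "<refsect1><title>Description</title></refsect1>"
    "</refentry>"
)
model = load_document(page)
```

Keeping the uninteresting sections as elements means they can still be transformed later without being modelled.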
26
Pythonic Data Load the Python modules/objects Use introspection to produce Model objects (attach to model objects previously created)
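Live introspection could be sketched with the standard `inspect` module. This is an assumption about the approach, not the talk's code: `fake_gl` is a stand-in module built on the spot so the example does not need PyOpenGL, and the `docs` dictionary is a hypothetical stand-in for the model objects created from the XML.

```python
import inspect
import types

# Stand-in for a real module such as OpenGL.GL (hypothetical).
fake_gl = types.ModuleType("fake_gl")


def glClear(mask):
    """Clear buffers to preset values."""


fake_gl.glClear = glClear


def introspect(module):
    """Collect signature and docstring for each function in the module."""
    found = {}
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        found[name] = {
            "signature": str(inspect.signature(obj)),
            "doc": inspect.getdoc(obj),
        }
    return found


# Attach the Python-side data to previously created model records.
docs = {"glClear": {"purpose": "clear buffers"}}  # hypothetical model store
for name, info in introspect(fake_gl).items():
    docs.setdefault(name, {})["python"] = info
```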
27
Now What? Create tags in ElementTree? I don't think so. This is page layout!
28
Kid to the Rescue Happens to be based on ElementTree (though not lxml.etree) Doesn't actually matter for this application
29
Template Functions Matter Kid (and Genshi) let you define XML functions Reusable template fragments Pass in Python objects (model) ElementTree elements are also Python objects
30
Dispatch Fragments on Type Model objects displayed with simple templates Uninteresting markup uses recursive transformation (divs and spans from tag names) Somewhat interesting markup uses special fragments (e.g. make cross-refs anchors)
31
Iterate CSS Tweaking Index File
32
Advantages
    Much faster (70s vs >70m)
    Far easier to control output
    Far easier to set up
    < 1200 lines of code including everything
33
You got Python in my XML! Happy processing :).