Hoyle paper 019-31 SUGI 31 Reading Microsoft Word XML files with SAS® Larry Hoyle, Policy Research Institute, University of Kansas.

1 Hoyle paper SUGI 31 Reading Microsoft Word XML files with SAS® Larry Hoyle, Policy Research Institute, University of Kansas

2 Three Scenarios: extracting text and attributes; extracting data from tables; extracting drawing object parameters.

3 XML - Syntax: A document must begin with the prolog tag. Tags are paired and case sensitive, and there must be exactly one root tag. A pair of tags together with its content is called an "element". Tags can be qualified by attributes. Elements can be nested, but each must start and end in the same parent. Sample element contents: "Some content", "Other content".
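The syntax rules on this slide can be checked with any XML parser. The following is a minimal sketch in Python rather than SAS; the sample document, its tag names (`root`, `item`) and the `kind` attribute are hypothetical and are not the slide's original markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: prolog first, one root tag, paired tags,
# attributes qualifying the tags, and nested elements.
SAMPLE = """<?xml version="1.0"?>
<root>
  <item kind="first">Some content</item>
  <item kind="second">Other content</item>
</root>"""


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(SAMPLE)
# Each element is a tag pair plus its content; attributes qualify the tag.
items = [(el.tag, el.get("kind"), el.text) for el in root]
print(items)

# Tags are case sensitive: <Item> is not closed by </item>.
print(is_well_formed("<root><Item>x</item></root>"))
# Elements must start and end in the same parent.
print(is_well_formed("<root><a><b>x</a></b></root>"))
# There must be exactly one root tag.
print(is_well_formed("<a/><b/>"))
```

The three malformed strings each break one rule from the slide, so the parser rejects all of them.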

4 Word XML

5 Word XML

6 Extracting Text and Properties

7 What Does SAS Need? The SAS XML engine needs an XMLMAP file. XML Mapper can generate the XMLMAP, and it only needs to be generated once for each type of extract.

8 Example Document: Styles and Colors Have Meaning. I have never been so humiliated in my life. That was very rude treatment. What a pleasant experience. Your staff was both quick and pleasant. It took about the time I expected to reach someone. I have nothing to say. The sky is blue and the sea is green. You are the worst organization in the world. I love you guys.

9 Style and Color: Style is “Treated”, a statement about treatment. Color is “Red”, which represents negative affect.

10 Example Document as XML: I have never been so humiliated in my life. That was very rude treatment. What a pleasant experience. Your staff was both quick and pleasant. It took about the time I expected to reach someone. I have nothing to say. The sky is blue and the sea is green. You are the worst organization in the world. I love you guys. Paragraph property path: /w:wordDocument/w:body/wx:sect/w:p/w:pPr; run property path: /w:wordDocument/w:body/wx:sect/w:p/w:r/w:rPr

11 Rows: The XMLMap has to describe a path that delineates rows. In this case it is each text element in a run (in a paragraph…): /w:wordDocument/w:body/wx:sect/w:p/w:r/w:t

12 Columns: the Text. The XMLMap has to describe a path that delineates each column. The text itself is: /w:wordDocument/w:body/wx:sect/w:p/w:r/w:t

13 Columns: the Text Element Number. A sequential number for the text element is incremented on: /w:wordDocument/w:body/wx:sect/w:p/w:r/w:t

14 Columns: the Paragraph Number. A sequential number for the paragraph is incremented on: /w:wordDocument/w:body/wx:sect/w:p
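The row and column paths on slides 11 to 14 can be sketched outside SAS to show what the resulting dataset looks like. This is an illustration in Python, not the XMLMap from the paper. The namespace URIs are the ones Word 2003 XML normally declares and are an assumption here, as are the sample document and the column names `parnum`, `textnum` and `text`.

```python
import xml.etree.ElementTree as ET

# Assumed Word 2003 XML namespaces (not shown on the slides).
NS = {
    "w": "http://schemas.microsoft.com/office/word/2003/wordml",
    "wx": "http://schemas.microsoft.com/office/word/2003/auxHint",
}

# Hypothetical two-paragraph document; the second paragraph has two runs.
SAMPLE = """<?xml version="1.0"?>
<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint">
  <w:body>
    <wx:sect>
      <w:p><w:r><w:t>What a pleasant experience.</w:t></w:r></w:p>
      <w:p>
        <w:r><w:t>I have nothing to say.</w:t></w:r>
        <w:r><w:t>I love you guys.</w:t></w:r>
      </w:p>
    </wx:sect>
  </w:body>
</w:wordDocument>"""


def text_rows(xml_text):
    """One row per w:t element, with paragraph and text-element counters."""
    root = ET.fromstring(xml_text)  # root is w:wordDocument
    rows = []
    text_num = 0
    paragraphs = root.findall("w:body/wx:sect/w:p", NS)
    for par_num, par in enumerate(paragraphs, start=1):
        for t in par.findall("w:r/w:t", NS):
            text_num += 1  # sequential number over all text elements
            rows.append({"parnum": par_num, "textnum": text_num, "text": t.text})
    return rows


for row in text_rows(SAMPLE):
    print(row)
```

The paragraph counter advances once per `w:p`, while the text counter advances once per `w:t`, so two runs in the same paragraph share a paragraph number but get distinct text numbers.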

15 Columns: Paragraph Color /w:wordDocument/w:body/wx:sect/w:p/

16 Columns: Run Color /w:wordDocument/w:body/wx:sect/w:

17 Columns: Run Style /w:wordDocument/w:body/wx:sect/w:p/w:r/ (column type character, datatype string, length 11)

18 The Data as Read into SAS

19 Tables

20 Our Sample Tables: read all data from all tables into one dataset; add variables to indicate table, row, and column.

21 The Tables Dataset

22 The Tables Dataset

23 Word XML: Tables. Absolute path: /w:wordDocument/w:body/wx:sect/w:tbl/w:tr/w:tc/w:p/w:r/w:t Relative path: w:tc/w:p/w:r/w:t

24 Count Table Beginnings: w:tbl

25 Count Table Endings: w:tbl
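The idea of reading every table into one dataset with table, row and column indicators can be sketched as follows. This is a Python illustration under stated assumptions, not the SAS approach from the paper: the namespace URIs and the sample tables are assumed, and joining all text elements of a cell into one string is a simplification chosen here. Numbering each `w:tbl` as it is encountered plays the role of the table counter on slides 24 and 25.

```python
import xml.etree.ElementTree as ET

# Assumed Word 2003 XML namespaces (not shown on the slides).
NS = {
    "w": "http://schemas.microsoft.com/office/word/2003/wordml",
    "wx": "http://schemas.microsoft.com/office/word/2003/auxHint",
}

# Hypothetical document with a 2x2 table followed by a 1x1 table.
SAMPLE = """<?xml version="1.0"?>
<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint">
  <w:body><wx:sect>
    <w:tbl>
      <w:tr>
        <w:tc><w:p><w:r><w:t>a</w:t></w:r></w:p></w:tc>
        <w:tc><w:p><w:r><w:t>b</w:t></w:r></w:p></w:tc>
      </w:tr>
      <w:tr>
        <w:tc><w:p><w:r><w:t>c</w:t></w:r></w:p></w:tc>
        <w:tc><w:p><w:r><w:t>d</w:t></w:r></w:p></w:tc>
      </w:tr>
    </w:tbl>
    <w:tbl>
      <w:tr><w:tc><w:p><w:r><w:t>e</w:t></w:r></w:p></w:tc></w:tr>
    </w:tbl>
  </wx:sect></w:body>
</w:wordDocument>"""


def table_cells(xml_text):
    """Return (table, row, column, text) tuples for every table cell."""
    root = ET.fromstring(xml_text)
    cells = []
    tables = root.findall("w:body/wx:sect/w:tbl", NS)
    for tbl_num, tbl in enumerate(tables, start=1):
        for row_num, tr in enumerate(tbl.findall("w:tr", NS), start=1):
            for col_num, tc in enumerate(tr.findall("w:tc", NS), start=1):
                # Relative path from the cell down to its text elements.
                parts = [t.text or "" for t in tc.findall("w:p/w:r/w:t", NS)]
                cells.append((tbl_num, row_num, col_num, "".join(parts)))
    return cells


for cell in table_cells(SAMPLE):
    print(cell)
```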

26 Graphics

27 Drawing Object Parameters: VML is the Vector Markup Language. This example will only read lines (they're easiest). Other drawing objects have different XML elements.

28 Our Example Drawing

29 Word XML: Drawn Lines

30 One Row for Each Line Element: /w:wordDocument/w:body/wx:sect/w:p/w:r/w:pict/v:group/v:line

31 Columns: Parameters as Attributes /w:wordDocument/w:body/wx:sect/w:p/w:r/
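Reading one row per `v:line` element, with its parameters taken from attributes, might look like the sketch below. It is written in Python for illustration and is not the paper's XMLMap. The namespace URIs, the sample drawing, and the attribute names `from`, `to`, `strokecolor` and `strokeweight` are assumptions based on common VML usage; the slide's own attribute paths are truncated in this transcript.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces for Word 2003 XML and VML (not shown on the slides).
NS = {
    "w": "http://schemas.microsoft.com/office/word/2003/wordml",
    "wx": "http://schemas.microsoft.com/office/word/2003/auxHint",
    "v": "urn:schemas-microsoft-com:vml",
}

# Hypothetical drawing: a group that holds two lines.
SAMPLE = """<?xml version="1.0"?>
<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
    xmlns:v="urn:schemas-microsoft-com:vml">
  <w:body><wx:sect><w:p><w:r><w:pict>
    <v:group>
      <v:line from="0,0" to="100,50" strokecolor="red" strokeweight="2pt"/>
      <v:line from="10,80" to="90,80" strokecolor="#0000ff" strokeweight="1pt"/>
    </v:group>
  </w:pict></w:r></w:p></wx:sect></w:body>
</w:wordDocument>"""

LINE_PATH = "w:body/wx:sect/w:p/w:r/w:pict/v:group/v:line"


def line_rows(xml_text):
    """One row per v:line, with columns taken from its attributes."""
    root = ET.fromstring(xml_text)
    rows = []
    for line in root.findall(LINE_PATH, NS):
        # Assumes "x,y" pairs of plain numbers, as in the sample above.
        x1, y1 = line.get("from").split(",")
        x2, y2 = line.get("to").split(",")
        rows.append({
            "x1": float(x1), "y1": float(y1),
            "x2": float(x2), "y2": float(y2),
            "strokecolor": line.get("strokecolor"),
            "strokeweight": line.get("strokeweight"),
        })
    return rows


for row in line_rows(SAMPLE):
    print(row)
```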

32 The Dataset

33 Example Code in Paper: convert colors; parse stroke weight (e.g. 2pt); detect the keyword “flip” and flip coordinates.
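The three post-processing steps listed on this slide could be sketched as below. This is not the code from the paper, which is written in SAS; it is a Python illustration with several assumptions: that colors arrive as `#rrggbb` hex strings or as color names, that the target is the SAS `CXrrggbb` color form, that stroke weights are given in points, and that a flip is signalled by a `flip:x` or `flip:y` entry in a CSS-like style string. All function names are hypothetical.

```python
import re


def to_sas_color(value):
    """Convert '#rrggbb' to the SAS 'CXRRGGBB' form; pass other values through."""
    if re.fullmatch(r"#[0-9a-fA-F]{6}", value):
        return "CX" + value[1:].upper()
    return value  # e.g. a color name such as 'red'


def parse_stroke_weight(value):
    """Return the numeric part of a weight such as '2pt', or None if unparsable."""
    match = re.fullmatch(r"\s*(\d*\.?\d+)\s*pt\s*", value)
    return float(match.group(1)) if match else None


def flip_coords(style, coords):
    """Swap x and/or y endpoints when the style string requests a flip."""
    x1, y1, x2, y2 = coords
    props = {}
    for item in style.split(";"):
        if ":" in item:
            key, val = item.split(":", 1)
            props[key.strip().lower()] = val.strip().lower()
    flip = props.get("flip", "")
    if "x" in flip:
        x1, x2 = x2, x1
    if "y" in flip:
        y1, y2 = y2, y1
    return (x1, y1, x2, y2)


print(to_sas_color("#ff0000"))
print(parse_stroke_weight("2pt"))
print(flip_coords("position:absolute;flip:y", (0, 0, 100, 50)))
```

Swapping endpoints is one plausible reading of "flip coordinates"; a different convention, such as mirroring within a bounding box, would need a different function.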

34 As Drawn by SAS

35 Contact Information: Larry Hoyle, Policy Research Institute, University of Kansas sugi31 sugi31
