Hoyle paper 019-31 SUGI 31 Reading Microsoft Word XML files with SAS® Larry Hoyle, Policy Research Institute, University of Kansas.

Presentation transcript:

Hoyle paper SUGI 31 Reading Microsoft Word XML files with SAS® Larry Hoyle, Policy Research Institute, University of Kansas

Three Scenarios
- Extracting text and attributes
- Extracting data from tables
- Extracting drawing object parameters

XML - Syntax
[Example XML; the markup is not preserved in this transcript. Its element text read "Some content" and "Other content".]
- Must begin with this prolog tag
- Paired tags
- Must have 1 root tag
- Tags are case sensitive
- Tags and content together are called an "element"
- Tags can be qualified by attributes
- Elements can be nested, and must start and end in the same parent
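The syntax rules on this slide can be checked with any XML parser. The sketch below uses Python's standard library purely for illustration (the slides themselves use SAS); the element and attribute names (`root`, `item`, `kind`) are invented, since the slide's own example did not survive in the transcript.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one root element, paired tags, attributes on tags.
WELL_FORMED = """<?xml version="1.0"?>
<root>
  <item kind="first">Some content</item>
  <item kind="second">Other content</item>
</root>"""

# Tags are case sensitive, so <Item> is not closed by </item>.
BAD_CASE = "<root><Item>Some content</item></root>"

# Elements must end inside the parent they started in.
BAD_NESTING = "<root><a><b>Some content</a></b></root>"


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(WELL_FORMED)
# Each child element carries an attribute and some text content.
contents = [(item.get("kind"), item.text) for item in root]
print(contents)
print(is_well_formed(BAD_CASE), is_well_formed(BAD_NESTING))
```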

Word XML

Extracting Text and Properties

What Does SAS Need?
- The SAS XML engine needs an XMLMAP file
- XML Mapper can be used to generate the XMLMAP
- The XMLMAP only needs to be generated once for each type of extract

Example Document: Styles and Colors Have Meaning
I have never been so humiliated in my life. That was very rude treatment. What a pleasant experience. Your staff was both quick and pleasant. It took about the time I expected to reach someone. I have nothing to say. The sky is blue and the sea is green. You are the worst organization in the world. I love you guys.

Style and Color
- Style is "Treated": a statement about treatment
- Color is "Red": represents negative affect

Example Document as XML
[XML listing of the example document; the markup is not preserved in this transcript.]
Paragraph property: /w:wordDocument/w:body/wx:sect/w:p/w:pPr
Run property: /w:wordDocument/w:body/wx:sect/w:p/w:r/w:rPr

Rows
The XMLMap has to describe a path that delineates rows. In this case it's each text element in a run (in a paragraph...):
/w:wordDocument/w:body/wx:sect/w:p/w:r/w:t

Columns – the Text
The XMLMap has to describe a path that delineates each column. The text itself is:
/w:wordDocument/w:body/wx:sect/w:p/w:r/w:t

Columns – the Text Element Number
A sequential number for the text element is:
/w:wordDocument/w:body/wx:sect/w:p/w:r/w:t

Columns – the Paragraph Number
A sequential number for the paragraph is:
/w:wordDocument/w:body/wx:sect/w:p
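To make the row and column paths concrete, here is a minimal Python sketch of the same extraction: one row per `w:t` element, with a running text-element number and a paragraph number. This is an illustration outside SAS, not the XMLMap from the paper. The sample XML is hand-written, and the column names (`par_num`, `text_num`, `text`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by Word 2003 XML documents.
NS = {
    "w": "http://schemas.microsoft.com/office/word/2003/wordml",
    "wx": "http://schemas.microsoft.com/office/word/2003/auxHint",
}

# Tiny hand-written sample, not the actual file from the slides.
SAMPLE = """<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint">
  <w:body><wx:sect>
    <w:p><w:r><w:t>What a pleasant experience.</w:t></w:r>
         <w:r><w:t>I love you guys.</w:t></w:r></w:p>
    <w:p><w:r><w:t>I have nothing to say.</w:t></w:r></w:p>
  </wx:sect></w:body>
</w:wordDocument>"""


def read_text_rows(xml_text):
    """Return one dict per w:t element, numbered by paragraph and by text element."""
    root = ET.fromstring(xml_text)  # the root element is w:wordDocument
    rows = []
    text_num = 0
    paragraphs = root.findall("w:body/wx:sect/w:p", NS)
    for par_num, par in enumerate(paragraphs, start=1):
        for t in par.findall("w:r/w:t", NS):
            text_num += 1  # sequential across the whole document
            rows.append({"par_num": par_num, "text_num": text_num, "text": t.text})
    return rows


rows = read_text_rows(SAMPLE)
for row in rows:
    print(row)
```

The paragraph counter advances once per `w:p`, while the text counter advances once per `w:t`, which mirrors the two "sequential number" columns described on the slides.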

Columns – Paragraph Color
/w:wordDocument/w:body/wx:sect/w:p/

Columns – Run Color
/w:wordDocument/w:body/wx:sect/w:

Columns – Run Style
/w:wordDocument/w:body/wx:sect/w:p/w:r/
character string 11
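The color and style columns come from attributes on property elements rather than from element text. The paths on these slides are cut off in the transcript, so the sketch below rests on assumptions: it supposes the run color and style live in `w:rPr/w:color` and `w:rPr/w:rStyle`, that the paragraph color lives in `w:pPr/w:rPr/w:color`, and that each value is held in a `w:val` attribute. It is a Python illustration with hand-written sample XML, not the paper's XMLMap.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W, "wx": "http://schemas.microsoft.com/office/word/2003/auxHint"}
VAL = "{%s}val" % W  # ElementTree addresses namespaced attributes as {uri}name

# Hand-written sample; element choices are assumptions (see lead-in).
SAMPLE = """<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint">
  <w:body><wx:sect>
    <w:p>
      <w:pPr><w:rPr><w:color w:val="FF0000"/></w:rPr></w:pPr>
      <w:r>
        <w:rPr><w:rStyle w:val="Treated"/><w:color w:val="FF0000"/></w:rPr>
        <w:t>That was very rude treatment.</w:t>
      </w:r>
    </w:p>
    <w:p><w:r><w:t>The sky is blue and the sea is green.</w:t></w:r></w:p>
  </wx:sect></w:body>
</w:wordDocument>"""


def attr_val(parent, path):
    """Return the w:val attribute of the first element matching path, or None."""
    element = parent.find(path, NS)
    return None if element is None else element.get(VAL)


def read_styled_rows(xml_text):
    """One row per w:t, carrying paragraph color, run color and run style."""
    root = ET.fromstring(xml_text)
    rows = []
    for par in root.findall("w:body/wx:sect/w:p", NS):
        par_color = attr_val(par, "w:pPr/w:rPr/w:color")
        for run in par.findall("w:r", NS):
            run_color = attr_val(run, "w:rPr/w:color")
            run_style = attr_val(run, "w:rPr/w:rStyle")
            for t in run.findall("w:t", NS):
                rows.append({"text": t.text, "par_color": par_color,
                             "run_color": run_color, "run_style": run_style})
    return rows


styled = read_styled_rows(SAMPLE)
for row in styled:
    print(row)
```

Runs without explicit properties yield `None` in the color and style columns, which corresponds to missing values in a dataset.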

The Data as Read into SAS

Tables

Our Sample Tables
- Read all data from all tables into one dataset
- Add variables to indicate table, row, column

The Tables Dataset

Word XML – Tables
Absolute path: /w:wordDocument/w:body/wx:sect/w:tbl/w:tr/w:tc/w:p/w:r/w:t
Relative path: w:tc/w:p/w:r/w:t

Count Table Beginnings
w:tbl

Count Table Endings
w:tbl
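The goal stated earlier, one dataset holding every cell with table, row and column indicators, can be sketched in Python as follows. Note the swapped-in technique: the slides count `w:tbl` beginnings and endings, whereas this sketch simply enumerates the nested `w:tbl`, `w:tr` and `w:tc` elements. The sample XML is hand-written and the column names (`table`, `row`, `col`, `text`) are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.microsoft.com/office/word/2003/wordml",
    "wx": "http://schemas.microsoft.com/office/word/2003/auxHint",
}

# Hand-written sample with two tables separated by an ordinary paragraph.
SAMPLE = """<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint">
  <w:body><wx:sect>
    <w:tbl>
      <w:tr><w:tc><w:p><w:r><w:t>a</w:t></w:r></w:p></w:tc>
            <w:tc><w:p><w:r><w:t>b</w:t></w:r></w:p></w:tc></w:tr>
      <w:tr><w:tc><w:p><w:r><w:t>c</w:t></w:r></w:p></w:tc>
            <w:tc><w:p><w:r><w:t>d</w:t></w:r></w:p></w:tc></w:tr>
    </w:tbl>
    <w:p><w:r><w:t>Text between tables</w:t></w:r></w:p>
    <w:tbl>
      <w:tr><w:tc><w:p><w:r><w:t>e</w:t></w:r></w:p></w:tc></w:tr>
    </w:tbl>
  </wx:sect></w:body>
</w:wordDocument>"""


def read_table_cells(xml_text):
    """One row per w:t inside a table cell, tagged with table, row and column."""
    root = ET.fromstring(xml_text)
    cells = []
    tables = root.findall("w:body/wx:sect/w:tbl", NS)
    for tbl_num, tbl in enumerate(tables, start=1):
        for row_num, tr in enumerate(tbl.findall("w:tr", NS), start=1):
            for col_num, tc in enumerate(tr.findall("w:tc", NS), start=1):
                # Relative path from the cell down to its text elements.
                for t in tc.findall("w:p/w:r/w:t", NS):
                    cells.append({"table": tbl_num, "row": row_num,
                                  "col": col_num, "text": t.text})
    return cells


cells = read_table_cells(SAMPLE)
for cell in cells:
    print(cell)
```

Text outside any table, such as the paragraph between the two tables here, is not matched by the table path and so never reaches the dataset.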

Graphics

Drawing Object Parameters
- VML: Vector Markup Language
- This example will only read lines
  - (they're easiest)
- Other drawing objects have different XML elements

Our Example Drawing

Word XML – Drawn Lines

One Row for Each Line Element
/w:wordDocument/w:body/wx:sect/w:p/w:r/w:pict/v:group/v:line

Columns: Parameters as Attributes
/w:wordDocument/w:body/wx:sect/w:p/w:r/
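Because the line parameters are attributes of each `v:line` element, reading them amounts to one row per element and one column per attribute. The Python sketch below illustrates this with hand-written sample XML. The attribute list (`from`, `to`, `strokecolor`, `strokeweight`, `style`) is an assumption: these are standard VML line attributes, but the transcript does not show which ones the paper actually reads.

```python
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.microsoft.com/office/word/2003/wordml",
    "wx": "http://schemas.microsoft.com/office/word/2003/auxHint",
    "v": "urn:schemas-microsoft-com:vml",
}

# Hand-written sample drawing with two lines in one group.
SAMPLE = """<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
    xmlns:v="urn:schemas-microsoft-com:vml">
  <w:body><wx:sect><w:p><w:r><w:pict>
    <v:group>
      <v:line from="0,0" to="100,50" strokecolor="red" strokeweight="2pt"/>
      <v:line style="flip:y" from="10,20" to="30,40" strokecolor="#0000ff"/>
    </v:group>
  </w:pict></w:r></w:p></wx:sect></w:body>
</w:wordDocument>"""

# Assumed set of attributes to turn into columns.
COLUMNS = ("from", "to", "strokecolor", "strokeweight", "style")


def read_lines(xml_text):
    """One dict per v:line element; absent attributes become None."""
    root = ET.fromstring(xml_text)
    path = "w:body/wx:sect/w:p/w:r/w:pict/v:group/v:line"
    return [{name: line.get(name) for name in COLUMNS}
            for line in root.findall(path, NS)]


lines = read_lines(SAMPLE)
for line in lines:
    print(line)
```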

The Dataset

Example Code in Paper
- Convert colors
- Parse stroke weight (e.g. 2pt)
- Detect the keyword "flip" and flip coordinates
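The paper's code for these three steps is not in the transcript, so the following Python sketch is only one plausible reading of them, not the author's implementation. Assumptions: colors arrive as `#rrggbb` or one of a few names and are converted to SAS `CXrrggbb` notation; stroke weights are given in points; and "flip" appears in the style string as `flip:x`, `flip:y` or `flip:x y`, where flipping swaps the corresponding coordinates of the two endpoints. All function names are hypothetical.

```python
import re

# Small illustrative subset of named colors; a real table would be longer.
NAMED_COLORS = {"red": "FF0000", "green": "008000",
                "blue": "0000FF", "black": "000000"}


def to_sas_color(vml_color):
    """Convert '#rrggbb' or a known color name to SAS 'CXrrggbb' notation."""
    color = vml_color.strip().lower()
    if re.fullmatch(r"#[0-9a-f]{6}", color):
        return "CX" + color[1:].upper()
    if color in NAMED_COLORS:
        return "CX" + NAMED_COLORS[color]
    raise ValueError("unrecognized color: %r" % vml_color)


def parse_points(weight):
    """Parse a stroke weight such as '2pt' into a number of points."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*pt\s*", weight)
    if match is None:
        raise ValueError("unrecognized stroke weight: %r" % weight)
    return float(match.group(1))


def flip_line(start, end, style):
    """Swap endpoint coordinates along each axis named after 'flip:' in style."""
    (x1, y1), (x2, y2) = start, end
    match = re.search(r"flip\s*:\s*([xy ]+)", style or "")
    axes = match.group(1) if match else ""
    if "x" in axes:
        x1, x2 = x2, x1
    if "y" in axes:
        y1, y2 = y2, y1
    return (x1, y1), (x2, y2)


print(to_sas_color("#ff0000"), parse_points("2pt"))
print(flip_line((0, 0), (100, 50), "position:absolute;flip:y"))
```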

As Drawn by SAS

Contact Information
Larry Hoyle
Policy Research Institute, University of Kansas