Programming for WWW (ICE 1338), Lecture #9, July 23, 2004. In-Young Ko (iko.AT.icu.ac.kr), Information and Communications University (ICU)



July 23, Programming for WWW (Lecture#9) In-Young Ko, Information Communications University
Announcements
- Class hours on Friday, July 30th will be moved to 3:00PM~5:30PM

Review of the Previous Lecture
- Interaction between Java Applets and JavaScript
- CGI programming
- Perl pattern matching

Contents of Today’s Lecture
- Perl modules
- Cookies
- Introduction to PHP
- XML and XML Processing

CGI.pm Module
- CGI.pm: a Perl module that serves as a library
- The use declaration makes a module available to a program
  - To make only part of a module available, specify the part name after a colon
    e.g., use CGI ":standard";
- Common CGI.pm functions
  - "Shortcut" functions produce tags, using their parameters as attribute values
    e.g., h2("Very easy!"); produces <h2>Very easy!</h2>
  - In this example, the parameter to the function h2 is used as the content of the <h2> tag
(AW lecture notes)

CGI.pm Module (cont.)
- Tags can have both content and attributes
  - Each attribute is passed as a name/value pair
  - Attribute names are passed with a preceding dash
    e.g., textarea(-name => "Description", -rows => "2", -cols => "35");
    Produces: <textarea name="Description" rows="2" cols="35"></textarea>
- Tags and their attributes are distributed over the parameters of the function
    e.g., ol(li({-type => "square"}, ["milk", "bread", "cheese"]));
    Output: <ol> <li type="square">milk</li> <li type="square">bread</li> <li type="square">cheese</li> </ol>
(AW lecture notes)

CGI.pm Module (cont.)
- Producing output for return to the user
  - A call to header() produces:
      Content-type: text/html;charset=ISO-8859-1
      (followed by a blank line)
  - The start_html function creates the head of the return document, as well as the opening <body> tag
  - The parameter to start_html is used as the title of the document
    e.g., start_html("Bill’s Bags"); produces
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "DTD/xhtml1-transitional.dtd">
      <html ...> <head><title>Bill’s Bags</title> </head><body>
  - The end_html function generates </body></html>
(AW lecture notes)

CGI.pm Module (cont.)
- The param function is given a widget’s name; it returns the widget’s value
  - If the query string has name=Abraham in it, param("name") will return "Abraham"
    e.g., my($age, $gender, $vote) = (param("age"), param("gender"), param("vote"));
(AW lecture notes)
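The lookup that param performs can be illustrated outside Perl. The following is a minimal sketch in Python, assuming the standard urllib.parse module stands in for CGI.pm; the helper name param and the sample query string are illustrative and are not part of the lecture material.

```python
from urllib.parse import parse_qs


def param(query_string, name):
    """Return the first value submitted for `name`, or None if it is absent."""
    values = parse_qs(query_string).get(name)
    return values[0] if values else None


if __name__ == "__main__":
    # Hypothetical query string, as a form submission might produce it.
    query = "name=Abraham&age=55&gender=male"
    print(param(query, "name"))
    print(param(query, "vote"))
```

A widget that was not submitted yields None here, so a caller can tell a missing field apart from an empty one.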

Cookies
- A session is the collection of all of the requests made by a particular browser from the time the browser is started until the user exits the browser
- The HTTP protocol is stateless, but there are several reasons why it is useful for the server to relate a request to a session
  - Shopping carts for many different simultaneous customers
  - Customer profiling for advertising
  - Customized interfaces for specific clients
- Approaches to storing client information:
  - Store it on the server: too much to store!
  - Store it on the client machine: this works
(AW lecture notes)

Cookies (cont.)
- A cookie is an object sent by the server to the client
- Cookies are created by some software system on the server (maybe a CGI program)
- At the time a cookie is created, it is given a lifetime
- Every time the browser sends a request to the server that created the cookie, while the cookie is still alive, the cookie is included
- A browser can be set to reject all cookies
(AW lecture notes)

Using CGI.pm for Cookies
- CGI.pm includes support for cookies
    cookie(-name => a_name, -value => a_value, -expires => a_time);
  - The time is a number followed by a unit code (d, s, m, h, M, y)
    e.g., -expires => '+5d'
- Cookies must be placed in the HTTP header at the time the header is created
    e.g., header(-cookie => $my_cookie);
- To fetch the cookies from an HTTP request, call cookie with no parameters; a hash of all cookies is returned
- To fetch the value of one particular cookie, send the cookie’s name to the cookie function
    e.g., $age = cookie('age');
(AW lecture notes)

A Cookie Example
- A cookie that tells the client the time of his or her last visit to this site
- Use the Perl function localtime to get the parts of the time
    ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime;
    = ($sec, $min, $hour, $mday, $mon, $year);
    $day_cookie = cookie(-name => 'last_time', -value => -expires => '+5d');
(AW lecture notes)
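The cookie round trip described in the slides above (the server sets a named value with a lifetime, the browser sends it back) can be sketched in Python. This is only an illustration, assuming the standard http.cookies module in place of CGI.pm; the function names are hypothetical, and the lifetime is expressed with Max-Age rather than the '+5d' expires shorthand.

```python
from http.cookies import SimpleCookie


def make_last_visit_cookie(timestamp, days=5):
    """Build the value of a Set-Cookie header that stores the last visit time."""
    cookie = SimpleCookie()
    cookie["last_time"] = timestamp
    # The lifetime is given in seconds; 5 days mirrors the '+5d' used in the slides.
    cookie["last_time"]["max-age"] = days * 24 * 60 * 60
    return cookie["last_time"].OutputString()


def read_cookie(cookie_header, name):
    """Return one cookie's value from a request's Cookie header, or None."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    return jar[name].value if name in jar else None


if __name__ == "__main__":
    # Sent by the server in the response header.
    print("Set-Cookie:", make_last_visit_cookie("20040723T150000"))
    # Sent back by the browser on a later request.
    print(read_cookie("last_time=20040723T150000; age=30", "last_time"))
```

As in the CGI.pm version, the cookie has to be emitted as part of the HTTP header, before any document content is written.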

Perl References
- Textbook chapters 4 and 5
- Perl.com:
- A Perl Tutorial:
- Perl Pattern Matching:
- Perl Functions: /pod/perlfunc.html
- Perl Modules:

PHP (PHP: Hypertext Preprocessor)
- Developed in 1994 by Rasmus Lerdorf to allow him to track visitors to his Web site
- Used for form handling, file processing, and database access
- A server-side scripting language whose scripts are embedded in HTML documents; similar to JavaScript, but on the server side
- An alternative to CGI, Active Server Pages (ASP), and Java Server Pages (JSP)
- The PHP processor has two modes: copying HTML text and interpreting PHP code
- Syntax is similar to that of JavaScript
- Dynamically typed
(AW lecture notes)

An Example PHP Code
- ‘hello.php’ on the Web server:
    <html>
    <head><title>PHP Test</title></head>
    <body>
    <?php echo '<p>Hello World</p>'; ?>
    </body>
    </html>
- The document content received by the client:
    <html>
    <head><title>PHP Test</title></head>
    <body>
    <p>Hello World</p>
    </body>
    </html>

A PHP Example: A Hit Counter
- counter.php
    <?php
    $counter_file = ("counter.txt");
    // file() reads the file into an array of lines; the count is on the first line
    $visits = file($counter_file);
    $visits[0]++;
    // Overwrite the file with the incremented count
    $fp = fopen($counter_file, "w");
    fputs($fp, "$visits[0]");
    fclose($fp);
    echo "There have been $visits[0] visitors so far";
    ?>
- Now add the following to your page where you wish the counter to appear
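The read, increment, and write-back cycle of the hit counter can be restated in Python for comparison. This is a minimal sketch, not the lecture's code: the function name count_visit is hypothetical, a missing or empty counter file is assumed to mean zero visits, and, like the PHP version, it does no file locking, so simultaneous requests could lose an update.

```python
import os
import tempfile
from pathlib import Path


def count_visit(counter_file):
    """Increment the visit count stored in counter_file and return the new total."""
    path = Path(counter_file)
    # Assumption of this sketch: a missing or empty file counts as zero visits.
    text = path.read_text().strip() if path.exists() else ""
    visits = int(text) + 1 if text else 1
    path.write_text(str(visits))
    return visits


if __name__ == "__main__":
    # Use a temporary directory so the demonstration leaves no file behind.
    with tempfile.TemporaryDirectory() as folder:
        counter = os.path.join(folder, "counter.txt")
        for _ in range(3):
            print(f"There have been {count_visit(counter)} visitors so far")
```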

PHP References
- PHP.net:
- PHP Manual:
- Examples and Tutorials: nd_Tutorials/

SGML (Standard Generalized Markup Language)
- SGML is a meta-markup language developed in the early 1980s (ISO 8879, 1986)
- HTML was developed using SGML in the early 1990s, specifically for Web documents
- Problems with HTML:
  1. Fixed set of tags and attributes
     - Users cannot define new tags or attributes
     - So, the tags cannot connote any particular meaning
  2. No restrictions on arrangement or order of tag appearance
- SGML is too large and complex to use, and it is very difficult to build a parser for it
(AW lecture notes)

XML (Extensible Markup Language)
- XML is a light version of SGML that provides a way of storing and transferring data of any kind
- XML vs. HTML
  - HTML is a markup language used to describe the layout of any kind of information
  - XML is a meta-markup language that can be used to define markup languages that can define the meaning of specific kinds of information
- XML does not predefine any tags
- All documents described with an XML-derived markup language can be parsed with a single parser
(AW lecture notes)

XML Syntax
- A flexible text format that was originally designed for large-scale electronic publishing of documents
- An XML document is a hierarchical organization of one or more named elements
- An element is composed of an opening-tag, data (string or another element), and a closing-tag
  - An opening-tag is an element name surrounded by ‘<’ and ‘>’
  - A closing-tag is an element name surrounded by ‘</’ and ‘>’
- An element may have zero or more attributes
  - An attribute is a name-value pair that specifies a property of the element

XML Syntax (cont.)
- All XML documents begin with an XML declaration: <?xml version="1.0"?>
- XML comments are just like HTML comments
- XML names:
  - Must begin with a letter or an underscore
  - Can include digits, hyphens, and periods
  - Have no length limitation
  - Are case sensitive (unlike HTML names)
- Syntax rules for XML:
  - Every XML document defines a single root element, whose opening tag must come before any other element (right after the XML declaration)
  - Every element that has content must have a closing tag
  - Tags must be properly nested
  - All attribute values must be quoted
(AW lecture notes)
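The syntax rules above are what a parser enforces when it checks that a document is well formed. As a rough illustration, the sketch below feeds a few small documents to Python's standard xml.etree.ElementTree parser; the function name is_well_formed and the sample element names are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts `text` as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per syntax rule from the list above.
SAMPLES = {
    "properly nested": '<?xml version="1.0"?><class><name>Prog. for WWW</name></class>',
    "improper nesting": "<class><name>Prog. for WWW</class></name>",
    "unquoted attribute value": "<class id=ICE1338></class>",
    "case mismatch in tag names": "<Class></class>",
    "two root elements": "<class></class><class></class>",
}

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted; each of the others breaks exactly one of the listed rules.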


An XML Document Example
<class> Prog. for WWW ICE1338 <students> Y.K. Ko </student> D.W. Kim </student> </students> </class>
Callouts on the slide: an opening-tag, a closing-tag, an element, an attribute, the root element, a value

XML Document Structures
- Logical Structure: tells what elements are to be included in a document and in what order
  - A new nested tag needs to be defined to provide more info about the content of a tag
  - Nested tags are better than attributes, because attributes cannot describe structure and the structural complexity may grow
  - Attributes should always be used to identify numbers or names of elements (like HTML id and name attributes)
- Physical Structure: governs the content in a document in the form of storage units called entities
(AW lecture notes)

XML Logical Structure Examples
......</patient>
<patient> Maggie Dee Magpie ......</patient>
<patient> Maggie Dee Magpie ......</patient>
(AW lecture notes)
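The trade-off between attributes and nested tags can be made concrete with a small sketch. The two documents below are hypothetical reconstructions (the element names name, first, middle, and last are assumptions, since the slide's tags did not survive); the code uses Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant 1: the whole name is one attribute value.
AS_ATTRIBUTE = '<patient name="Maggie Dee Magpie"/>'

# Hypothetical variant 2: the name is broken into nested elements.
AS_NESTED = """<patient>
  <name><first>Maggie</first><middle>Dee</middle><last>Magpie</last></name>
</patient>"""


def last_name_from_attribute(xml_text):
    """The attribute is an opaque string, so the parts must be split by convention."""
    return ET.fromstring(xml_text).get("name").split()[-1]


def last_name_from_elements(xml_text):
    """With nested tags, the document structure itself identifies the last name."""
    return ET.fromstring(xml_text).findtext("name/last")


if __name__ == "__main__":
    print(last_name_from_attribute(AS_ATTRIBUTE))
    print(last_name_from_elements(AS_NESTED))
```

The attribute version only works while every name follows the "last word is the family name" convention, which is the kind of hidden structure the slide warns about.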

DTD (Document Type Definition)
- A DTD is a set of structural rules called declarations
  - Specify a set of elements, along with how and where they can appear in a document (in BNF)
- Purpose: provide a standard form for a collection of XML documents
- Not all XML documents have or need a DTD
- The DTD for a document can be internal or external
- All of the declarations of a DTD are enclosed in the block of a DOCTYPE markup declaration
- DTD declarations have the form: <!keyword ...>
- Possible declaration keywords: ELEMENT, ATTLIST, ENTITY, and NOTATION
(AW lecture notes)

A DTD Example
<!DOCTYPE HYPERLIB [
<!ATTLIST AUTHOR function (manager | editor | contrib) #REQUIRED>
]>


More on DTD…
- Internal and External DTDs
  - Internal DTDs
  - External DTDs
- Problems with DTDs:
  - Syntax is different from XML, so a DTD cannot be parsed with an XML parser
  - It is confusing to deal with two different syntactic forms
  - DTDs do not allow specification of particular kinds of data
(AW lecture notes)

XML Entities
- Entities allow users to assign a name to some content, and use that name to refer to that content
- Used as "macros" for content (e.g., special characters, images, documents)
- Entity Categories
  - The Document Entity: the root of the entity tree, the whole document
  - Internal General Entities: association of an arbitrary piece of text with a name
  - External General Entities: incorporate content from external files

Internal General Entities
- Predefined Entities
    Entity                                 Entity Name   Replacement Text
    The left angle bracket                 lt            <
    The right angle bracket                gt            >
    The ampersand                          amp           &
    The single quote or apostrophe         apos          '
    The double quote                       quot          "
- Character References: refer to Unicode characters using &#decimal; or &#xhex;
- Internal Entity Declaration
- Internal Entity Reference: &entityname;
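How a parser resolves these references can be seen in a short sketch. It assumes Python's standard xml.etree.ElementTree parser, which expands internal general entities declared in the document's own DTD subset; the entity name school and the note element are invented for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical document: one internal entity declaration, then references to it,
# to predefined entities, and to characters by decimal and hexadecimal code.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY school "Information and Communications University">
]>
<note>&school; &lt;ICU&gt; &amp; friends &#65;&#x42;</note>"""


def expanded_text(xml_text):
    """Parse the document and return the root element's text after expansion."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    print(expanded_text(DOC))
    # Going the other way: escape() replaces &, < and > with predefined entities.
    print(escape("a < b & c"))
```

An application reading the parsed text never sees the references themselves, only their replacement text.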

External General Entities
- Provides a mechanism for dividing a document up into logical chunks, each of which can be stored in a separate file
- When the parent file is parsed by an XML processor, it will have the effect of inserting the contents of each of the individual files at the location of the respective entity references
- External entities can contain binary data, which can be used to reference images and other non-XML content in the document

External Entity Example
<!DOCTYPE [ [ ] ]>
…
<document>&section1;...&sectionm;</document>
(AW lecture notes)

Namespaces
- Markup vocabulary: the collection of all of the element types and attribute names of a markup language (a tag set)
- An XML document may define its own tag set and also use that of another tag set: CONFLICTS!
- XML namespace: a collection of names used in XML documents as element types and attribute names
  - The name of an XML namespace has the form of a URI
  - A namespace declaration is an attribute of the form xmlns:prefix="URI"
  - The prefix is a short name for the namespace, which is attached to names from the namespace in the XML document
    e.g.,
  - In the document, you can use
(AW lecture notes)

Namespace Example
<h:html xmlns:xdc=" xmlns:h="
Book Review
XML: A Primer
Author Price Pages Date
Simon St. Laurent / /01
</h:html>
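The way prefixes keep two vocabularies apart can be shown with a small sketch. The document below is a simplified, hypothetical stand-in for the slide's example: the two example.org URIs are placeholders (the original URIs are truncated in the source), and the parser is Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs; they only need to be distinct.
HTML_NS = "http://example.org/html"
BOOK_NS = "http://example.org/books"

DOC = f"""<h:html xmlns:h="{HTML_NS}" xmlns:xdc="{BOOK_NS}">
  <h:title>Book Review</h:title>
  <xdc:title>XML: A Primer</xdc:title>
</h:html>"""

# Prefix-to-URI map used when searching; the prefixes here are local to the query.
NS = {"h": HTML_NS, "xdc": BOOK_NS}

root = ET.fromstring(DOC)
page_title = root.find("h:title", NS).text   # the title element of the HTML vocabulary
book_title = root.find("xdc:title", NS).text  # the title element of the book vocabulary

if __name__ == "__main__":
    print(root.tag)  # the parser stores names as {URI}localname
    print(page_title)
    print(book_title)
```

Both elements are called title, yet they never collide, because the parser identifies each by its namespace URI rather than by the prefix.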

XML Processors
- XML Parsers: read XML documents and provide access to their content and structure via DOM (e.g., Xerces, Sun’s Java XML Parser)
- Document Filtering (Validation)
  - Document Type Definition (DTD): a grammar for a class of XML documents
  - XML Schema (XSD): a successor of DTD; describes the structure of an XML document
- XML Presentation
  - eXtensible Stylesheet Language (XSL): a language to define the transformation and presentation of an XML document

XML Processors
[Diagram. Labels: XML Document, Databases, XML Parser, DTD/XML Schema (XML grammar/structure, validation), XSL Description, XSL Processor, HTML Presentation, DOM Objects (DOM API), Parsing Events (SAX API)]

XML APIs
- SAX (Simple API for XML) – XML-DEV
  - Stream-based access interface (sequential access)
  - Notifies an application of a stream of parsing events
  - Needs a content handler to handle the parsing events (e.g., start and end of an element)
  - Appropriate for handling a large XML document
- DOM (Document Object Model) – W3C
  - Object-oriented access interface (random access)
  - Builds a tree of nodes based on the structure and information in an XML document
  - Types of nodes: Document, Element, Attr, …
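The event-driven style of SAX can be sketched as below. This is a minimal example, not from the slides: it assumes the JAXP `SAXParser` bundled with the JDK and uses `DefaultHandler` as the content handler; the class name `SaxEventsDemo` and the sample input are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventsDemo {
    // Parses the given XML text and returns the element events in the
    // order in which the parser reported them.
    static List<String> events(String xml) throws Exception {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                log.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(events("<a><b/></a>"));
    }
}
```

The parser never builds a tree here; the handler sees each element once, in document order, which is why SAX suits large documents that would be costly to hold in memory as DOM objects.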

DOM Representation
XML Document:
<class>
  Prog. Lang.
  ICE1341
  Y.K. Ko
  D.W. Kim
</class>
DOM Representation: Document (Root Node), Elements (Child Nodes), Node Values (Text Nodes)
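The three layers named on this slide (document, elements, text nodes) can be made visible by walking a parsed tree. The sketch below is illustrative only: the child tag name `title` is a hypothetical stand-in, since the element names of the slide's example are not preserved here, and the input is written without whitespace between tags so that no extra whitespace-only text nodes appear.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomWalkDemo {
    // Parses the XML text and lists every node of the tree in depth-first order.
    static List<String> walk(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> out = new ArrayList<>();
        visit(doc, out);
        return out;
    }

    private static void visit(Node node, List<String> out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                out.add("Document");
                break;
            case Node.ELEMENT_NODE:
                out.add("Element " + node.getNodeName());
                break;
            case Node.TEXT_NODE:
                // The character data lives in a Text child, not in the element itself.
                out.add("Text \"" + node.getNodeValue() + "\"");
                break;
            default:
                out.add("Other");
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            visit(children.item(i), out);
        }
    }

    public static void main(String[] args) throws Exception {
        for (String line : walk("<class><title>Prog. Lang.</title></class>")) {
            System.out.println(line);
        }
    }
}
```

A common surprise is that `getNodeValue()` on an element returns null; the text is reached through the element's Text child, as the walk shows.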

Java API Hierarchy for DOM
Node: getChildNodes(): NodeList, getAttributes(): NamedNodeMap, getNodeName(): String, getNodeValue(): String, appendChild(Node), removeChild(Node), setNodeValue(String)
  Attr: getName(): String, getValue(): String, setValue(String)
  CharacterData: getData(): String, getLength(): int, setData(String)
    Text: splitText(int): Text
    Comment
  Document: createAttribute(String): Attr, createElement(String): Element, createTextNode(String): Text, getDocumentElement(): Element, getElementsByTagName(String): NodeList
  Element: getAttribute(String): String, getTagName(): String
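The factory methods on `Document` can be combined with `appendChild` to build a tree from scratch. The following is a minimal sketch under stated assumptions: the element and attribute names (`class`, `title`, `code`) are hypothetical, and writing the tree out with the JAXP identity `Transformer` is one possible way to produce XML text, not something the slides prescribe.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomBuildDemo {
    // Builds a small DOM tree in memory and returns it serialized as XML text.
    static String buildXml() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();

        // Nodes are created by the Document, then attached to a parent.
        Element root = doc.createElement("class");
        root.setAttribute("code", "ICE1341");
        doc.appendChild(root);

        Element title = doc.createElement("title");
        title.appendChild(doc.createTextNode("Prog. Lang."));
        root.appendChild(title);

        // An identity transform copies the DOM tree to a character stream.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildXml());
    }
}
```

Passing a `StreamResult` that wraps a `java.io.File` instead of a `StringWriter` would write the same content to a file.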

An Example of Creating DOM Objects from an XML File
    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    try {
        // Obtain a parser through the JAXP factory classes
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        // Parse the file into a DOM tree
        Document doc = docBuilder.parse(new File("sample.xml"));
        Element rootEle = doc.getDocumentElement();
        // Visit the children of the root element
        NodeList children = rootEle.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node subEle = children.item(i);
            …
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

Related Materials
- W3C’s XML Web Site
- XML Specification
- XML Concepts
- DTD Tutorial
- XML Schema Tutorial
- W3C’s XSL Site
- XML Entities and their Applications
- Other XML-related Notes

Related Materials
- W3C Document Object Model
- A simple way to read an XML file in Java
- Working with XML (java.sun.com/xml/jaxp/dist/1.0.1/docs/tutorial/index.html)
- Java Technology and XML FAQs (java.sun.com/xml/faq.html)
- Java API Manual (java.sun.com/j2se/1.4.2/docs/api/)
  - See org.w3c.dom and javax.xml.parsers
- XML.org

Homework #3
- Due by Friday, July 30th
- Design an XML document structure to represent the results from your Web wrapper
  - You can use DTD or XSD for writing the grammar, but it is not a requirement
  - You can just sketch the structure by drawing a tree hierarchy
- Write a program to generate a DOM hierarchy of the wrapper results by using a DOM library, and link the program with your Web wrapper
- Produce an XML file from the DOM representation of the results
- Submit the following things electronically to the TA
  - The XML document structure design
  - Your Web wrapper program with the XML generation part
  - An output XML file