
1 XMLandKM XML and KM Powering Information and Retrieval for the Semantic Web Frank Cervone Assistant University Librarian for Information Technology, Northwestern University Darlene Fichter Data Library Coordinator, University of Saskatchewan Library

2 XMLandKM Introductions Who are you? Where do you work? What is your experience with KM? What is your interest in XML?

3 XMLandKM Outline Semantic Web and KM What is XML? SGML & HTML - where do they fit? XML - Structure and Elements XML Applications –Integration of disparate content News –Expertise profiling –Enterprise solutions

4 XMLandKM Semantic Web “The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” Tim Berners-Lee and others

5 XMLandKM One Goal Support elaborate precise searches by integrating and utilizing all relevant sources of information / relationships. Illustration from Scientific American May 1, 2001

6 XMLandKM Is XML a magical fix? Not likely. It does not magically integrate redundant data versions. We’re unlikely to replace systems with a single, common, shared version of integrated data just for this reason. But, if used correctly, XML can help.

7 XMLandKM Harness the Power of Semantics If we wish to harness this power, then we need to –Understand and resolve the different words and meanings we use to refer to the same things –Consider ways and means of defining standard terminology and establishing agreed-upon meaning, usually through standard metadata –Be able to use XML messaging and transformations between applications

8 XMLandKM Pieces

9 XMLandKM XML – Codification of Knowledge Knowledge Representation In order for the “idea” to become a reality computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning.

10 XMLandKM Why talk about the semantic web? Many of the “information intensive” processes of KM are facing the same challenge –Capture – formalize existing knowledge –Select and assess relevance, value.. –Store – in repository with schema –Share – distribute based on interest and work –Apply – retrieve, use in daily work –Create new knowledge Beckman, T. Eight stage process of KM

11 XMLandKM XML & KM – What’s the connection? Many KM activities have nothing to do with technology Some KM activities have technology as a key enabler or component – in these cases XML is often under the hood –Knowing about XML means we can exploit the opportunities and see the limitations

12 XMLandKM XML Overview Structured data interchange –A common syntax for expressing structure in data Designed to account for “unstructured” data –documents Inherently conveys meaning/structure Content and display separate from structure Delivered via standard text files

13 XMLandKM XML in 7 bullets New, but not that new Structured data in a text file via markup Self-describing information Looks like HTML but isn't Verbose text, isn't meant to be read License-free, platform-independent and well-supported A family of technologies (parts adapted from Bert Bos)

14 XMLandKM Driving Forces for XML Adoption Internationalized media-independent electronic publishing Definition of platform-independent protocols for the exchange of data –electronic commerce –knowledge harvesting Information delivery to user agents –automatic processing after receipt

15 XMLandKM Benefits of Adoption Easier to develop software –handle specialized information distributed over the Web Processing information using lighter-weight software Allows greater end-user control of information display –style sheets Metadata for resource discovery

16 XMLandKM The *ML family SGML HTML XML From World Wide Web Consortium note W3C Data Formats, by Tim Berners-Lee.

17 XMLandKM SGML Designed for documents Very powerful Very complicated “Well defined” = strict rules Rigid - not very extensible Inappropriate for wide-spread use

18 XMLandKM HTML Simple, general-purpose document markup language Simple hyperlinking Designed for collaborative authoring Combined authoring and viewing roles

19 XMLandKM HTML Evolution Started with simple document description –Few tags designed for structuring documents Quickly evolved –forms –images –tables –frames –fonts

20 XMLandKM HTML shortcomings Not easily extensible –HTML standards change too slowly –Browser-specific tags ("extensions") –Totally geared toward document display Limited data formatting –mathematics Can't mark up data in any structurally meaningful way

21 XMLandKM Why can’t HTML be used for information exchange? HTML markup provides no inherent method of knowing what the information is about Browser paradigm is too constraining Metadata schemes are deficient –Search engines return far too many hits Can't relate information items (pages) to one another One-way linking is somewhat limited

22 XMLandKM How HTML confuses content and presentation …

23 XMLandKM Example - content and presentation mixture in HTML 005.72 M849et2001 Enterprise application integration with XML and Java Upper Saddle River, NJ : Prentice Hall PTR, 2001

24 XMLandKM But what does it mean?

25 XMLandKM XML represents structure, not presentation Enterprise application integration with XML and Java J.P. Morganthal, with Bill la Forge Upper Saddle River, NJ Prentice Hall PTR 2001
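The slide's original markup did not survive, so the following is only a minimal sketch of the idea, assuming hypothetical element names (`book`, `title`, `author`, `place`, `publisher`, `year`). Once the record is marked up by meaning rather than by layout, a program can ask for a field by name.

```python
import xml.etree.ElementTree as ET

# Element names are hypothetical; the values come from the slide's book record.
RECORD = """
<book>
  <title>Enterprise application integration with XML and Java</title>
  <author>J.P. Morganthal, with Bill la Forge</author>
  <place>Upper Saddle River, NJ</place>
  <publisher>Prentice Hall PTR</publisher>
  <year>2001</year>
</book>
"""

book = ET.fromstring(RECORD)

# Fields are retrieved by what they are, not by where they appear on a page.
publisher = book.findtext("publisher")
year = int(book.findtext("year"))
print(publisher, year)
```

The same lookup against the HTML version would have to guess which line of display text holds the publisher.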

26 XMLandKM XML is hierarchical

27 XMLandKM Nesting rosette theme

28 XMLandKM Elements, Attributes, and Content Enterprise application integration with XML and Java J.P. Morganthal, with Bill la Forge
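As a rough illustration of the three terms in the slide title, the sketch below builds one record in Python. The names `book`, `title`, and `callno` are assumptions made for the example (the slide's own tags were lost); the call number is the one shown on the earlier HTML slide.

```python
import xml.etree.ElementTree as ET

# "book" and "title" are hypothetical element names; "callno" is a
# hypothetical attribute holding the call number from the HTML slide.
book = ET.Element("book", attrib={"callno": "005.72 M849et2001"})
title = ET.SubElement(book, "title")
title.text = "Enterprise application integration with XML and Java"

element_name = book.tag                # the element: "book"
attribute_value = book.get("callno")   # an attribute of that element
content = book.find("title").text      # the content of a child element

serialized = ET.tostring(book, encoding="unicode")
print(serialized)
```

Whether a given fact belongs in an attribute or in a child element is a design choice; the sketch simply shows both forms side by side.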

29 XMLandKM DOM – Document Object Model DOM – a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents Built into web browsers and servers –Used by web browser for dynamic display capabilities
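To make "access and update" concrete, here is a small sketch using the DOM implementation in Python's standard library (`xml.dom.minidom`). The document and its element names are invented for the example; a browser would expose the same kind of interface to scripts.

```python
from xml.dom.minidom import parseString

# A tiny invented document; element names are hypothetical.
doc = parseString("<book><title>Draft title</title><year>2001</year></book>")

# Access: reach a node through the standard DOM interface.
title_node = doc.getElementsByTagName("title")[0]
old_title = title_node.firstChild.data

# Update content: change the text held by the title element.
title_node.firstChild.data = "Enterprise application integration with XML and Java"

# Update structure: create a new element and attach it to the tree.
publisher = doc.createElement("publisher")
publisher.appendChild(doc.createTextNode("Prentice Hall PTR"))
doc.documentElement.appendChild(publisher)

updated = doc.documentElement.toxml()
print(updated)
```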

30 XMLandKM Document Type Definition (DTD) A set of syntax rules for creating tags Defines –What tags can be used –The order they should appear in –Which tags can be nested –Which tags have attributes Can be part of an XML document –Typically defined externally


32 XMLandKM Attributes

33 XMLandKM Schemas Introduces a mechanism for strong typing –Allows a schema to be directly imported into a database to create a table Standardized NULL representation Key representation

34 XMLandKM Well-formed and valid Well-formed –Conforms to the general rules of XML syntax, which are very rigorous –Example – a tag must always be ended Discourse Analysis Valid –Documents that conform to the specific DTD in use
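A minimal sketch of a well-formedness check, assuming a hypothetical `title` element around the slide's "Discourse Analysis" text. It relies on the fact that Python's standard parser rejects any input that breaks the general XML syntax rules.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

closed = "<title>Discourse Analysis</title>"        # end tag present
unclosed = "<title>Discourse Analysis"              # end tag missing
overlapping = "<b><i>Discourse Analysis</b></i>"    # elements overlap

for sample in (closed, unclosed, overlapping):
    print(is_well_formed(sample), sample)
```

This only covers the "well-formed" half of the slide. Checking that a document is valid against a DTD needs a validating parser, which the parser used here is not.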

35 XMLandKM XML-Link and XML Pointer Open set of linking elements Non-directional –arbitrary –non-hierarchical XML Pointer –Enables addressing any part of a text A more powerful HTML “anchor” tag XML-Link –Enables attaching a behavior to a link –Extended links, similar to a web ring


37 XMLandKM Displaying XML information in the browser XML parser built in –Relates data stream to DTD and style sheet Style Sheets –Only method for formatting XML data for display Similar to HTML CSS –More powerful XSLT –Processing language that allows for transformation of data presentation

38 XMLandKM XHTML “Next generation” HTML HTML that conforms to XML standards Will eventually support integration with other XML applications Device independent web-access

39 XMLandKM XHTML Example Bare bones example validate

40 XMLandKM HTML 4 - XHTML Major Differences All related to “well formedness” –Tag/attributes must be in lower-case –Elements must nest, no overlap –All non-empty elements must be closed –All empty elements must be terminated –Attribute values must be quoted –Attributes cannot be minimized –Scripts should be downloaded from server

41 XMLandKM XML Life Cycle Authoring Presentation Search and Retrieval Integration

42 XMLandKM The Big Picture

43 XMLandKM Just “Add Water & Stir” XML (document or database) XSLT style sheet XSLT Processor (XML Parser) Browser (XML Parser)

44 XMLandKM Authoring Tools Editors (getting the content in) –XML and XSLT Editors XML Spy XML Notepad XMetal Xeena –Word processors WordPerfect –Content Management Systems

45 XMLandKM XML Spy Structured/document editor –XML –DTD –schemas (DCD, XDR, BizTalk, XSD) –XSLT Views for: –Structured editing (grid view, table view) –Document editing (WYSIWYG) Full Unicode support –MSXML3 is used by default, but can be changed

46 XMLandKM XML Notepad Quick and dirty editor for Windows Doesn't use DTD to guide editing –if present, however, validates it on document loading

47 XMLandKM XMetal Professional, full-featured XML/SGML editing tool –word processor-like view –source view –tag view SGML or XML DTD's –context-sensitive lists of allowed elements and attributes –supports CALS tables, DOM, CSS, and HTML Integrated browser preview for XML documents.

48 XMLandKM Xeena Loads DTD and provides tree-view syntax directed editing Aware of the DTD grammar –Makes only authorized elements icons sensitive –Ensures that all documents generated are valid according to the given DTD

49 XMLandKM WordPerfect Word processor with advanced support for authoring XML and SGML documents in a WYSIWYG environment Includes –Wizards –Automatic element insertion –Automatic generation of documents. The DTD, layout information, and mapping files are incorporated into a single WordPerfect template.

50 XMLandKM Content Management Systems Many CM system repositories use XML under the hood for tagging and storing information Or can “speak” XML – export as XML to allow integration with other applications Open any trade magazine and see the standard vendor names proclaim their support for XML To the document creator, XML is “invisible”

51 XMLandKM XML Conversion Tools Examples: Logictran RTF Converter HTML Tidy –Free Windows program –Converts HTML to XHTML or XML

52 XMLandKM Logictran RTF Converter Converts Word and RTF documents to HTML, XML, SGML The converter allows you to create output for any DTD. You can generate HTML, XHTML, OEB and Docbook.

53 XMLandKM XSLT Processors Means of converting files between XML dialects and other formats –MSXML built into Internet Explorer –Xalan

54 XMLandKM XML Parsers Examples Expat –Written in C (ported to other languages), used by LIBWWW, Apache, … XML4J –from alphaWorks, in Java, based on Apache Xerces, supports DOM and SAX Many other parsers

55 XMLandKM Servers Apache XML – built in Xerces XML parser, Xalan XSLT

56 XMLandKM Browsers Internet Explorer 6 –XML support is fairly extensive –Namespaces are supported –Supports Style sheets in CSS as well as XSLT 1.0 Parser is still an issue Netscape 6.1 –supports HTML 4.0, XML, CSS, DOM, namespaces, simple Xlink –Does NOT support XSLT Opera – supports XML

57 XMLandKM XML Standards & Applications Many activities where XML has a role OASIS has an extensive list of applications –RSS (news headlines) –MathML –SMIL –DocBook

58 XMLandKM XML Standards – Multiplying Like Rabbits Software applications (transactions, interchange) Publishing

59 XMLandKM Software Applications Office tools and groupware Decision support systems Functional/transactional systems for HR, CRM.. Intelligent systems (ES, IPSS) User support

60 XMLandKM Publishing Digital rights (EBX,…) DocBook, e-book, TEI News (RSS, ICE, NITF, NewsML) Special subject area formats (MathML, ChemML, CellML, GeneXML)

61 XMLandKM MathML Display of mathematical formulas Formula: x² + 3x - 3 = 0 MathML content: x² + 3 &invisibletimes; x - 3 = 0
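The slide's MathML tags were lost, so the sketch below rebuilds one plausible encoding of the same formula using presentation MathML elements (`mrow`, `msup`, `mi`, `mn`, `mo`). It is an assumption about how the example might have been marked up, not a copy of the original; the invisible-times operator is written as the Unicode character U+2062.

```python
import xml.etree.ElementTree as ET

def leaf(parent, tag, text):
    """Append a child element that holds only text."""
    node = ET.SubElement(parent, tag)
    node.text = text
    return node

math = ET.Element("math", xmlns="http://www.w3.org/1998/Math/MathML")
row = ET.SubElement(math, "mrow")

power = ET.SubElement(row, "msup")   # x squared: base, then exponent
leaf(power, "mi", "x")
leaf(power, "mn", "2")
leaf(row, "mo", "+")
leaf(row, "mn", "3")
leaf(row, "mo", "\u2062")            # invisible times between 3 and x
leaf(row, "mi", "x")
leaf(row, "mo", "-")
leaf(row, "mn", "3")
leaf(row, "mo", "=")
leaf(row, "mn", "0")

markup = ET.tostring(math, encoding="unicode")
print(markup)
```

The point of the markup is that identifiers, numbers, and operators are labelled as such, so software can render or process the formula rather than treat it as a picture.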

62 XMLandKM Publishing: News Web site news Syndicated news Headlines Full text KM applications Integrating internal, external news, creating auto-categorization of news, adding items to the news based on new additions to the repository, user profiling ICE RSS NewsML NITF

63 XMLandKM RSS (Rich Site Summary) CRM News Web news format Simple application Take a look at the bits and pieces

64 XMLandKM RSS – Why? The Need –Quick, easy, and consistent announcements pushed out to other sites –Incorporate news and other information feeds on a site

65 XMLandKM How it works

66 XMLandKM Before RSS No standard Every one put up what was new and described it differently Special one off programs to create parsers and screen scrapers

67 XMLandKM The Result > 1700 sites sharing news Many sites re-posting the headlines Examples: xmlTree - directory of content

68 XMLandKM RSS Syntax RSS file has two major placeholders for data: channel and items.

69 XMLandKM Channel Element The channel element must contain the following: title or name of the channel, short description of the channel, link to the web site of the channel, and the language in which the web site is written. Also, numerous optional elements can be included with the channel, such as copyright, webmaster, publication date and so on.

70 XMLandKM Item Element RSS file can have up to 15 item elements. Item elements are used to store the headlines and are the meat of the document. Item elements have the following elements: title link description

71 XMLandKM RSS Code First line contains an XML declaration: The next item is the DTD identifier

72 XMLandKM RSS Statement Next, the rss element –must specify the version attribute. –may contain an encoding attribute the default is UTF-8

73 XMLandKM Channel Definition Contains a single channel element. –Title, description, link to channel’s web site, language, one or more item elements, lots of optional elements moreover... US politics news US politics news - news headlines from around the web, refreshed every 15 minutes en-us

74 XMLandKM Item Elements Up to 15 item elements 'Author Unknown' by Don Foster acks/index.html Salon Nov 2 2000 6:51AM
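The RSS slides above lost their markup, so here is a hedged reconstruction of what such a file might look like, together with a small reader that checks the rules the slides state (a `version` attribute, one `channel`, four required channel elements, at most 15 items). The text values come from the slides; both URLs are placeholders, the version number is an assumption, and putting the "Salon ..." line in `description` is a guess.

```python
import xml.etree.ElementTree as ET

# URLs are placeholders (the originals were lost); version 0.91 is assumed.
FEED = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>moreover... US politics news</title>
    <description>US politics news - news headlines from around the web, refreshed every 15 minutes</description>
    <link>http://example.com/</link>
    <language>en-us</language>
    <item>
      <title>'Author Unknown' by Don Foster</title>
      <link>http://example.com/story.html</link>
      <description>Salon Nov 2 2000 6:51AM</description>
    </item>
  </channel>
</rss>
"""

REQUIRED = ("title", "description", "link", "language")

def read_feed(text):
    """Check the structure described on the slides and return the headlines."""
    rss = ET.fromstring(text)
    if rss.tag != "rss" or "version" not in rss.attrib:
        raise ValueError("rss element must specify a version attribute")
    channel = rss.find("channel")
    if channel is None:
        raise ValueError("rss element must contain a channel element")
    missing = [name for name in REQUIRED if channel.find(name) is None]
    if missing:
        raise ValueError(f"channel is missing: {missing}")
    items = channel.findall("item")
    if len(items) > 15:
        raise ValueError("more than 15 item elements")
    headlines = [(i.findtext("title"), i.findtext("link")) for i in items]
    return channel.findtext("title"), headlines

channel_title, headlines = read_feed(FEED)
print(channel_title, headlines)
```

A site that re-posts headlines needs little more than this: fetch the file, read the items, and render each title as a link.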

75 XMLandKM From Simple Documents to Complex Hierarchical Many objects and elements Many “namespaces”

76 XMLandKM Namespaces A single XML document may contain elements and attributes that are defined for and used by two or more XML- based languages without conflict or ambiguity

77 XMLandKM Example Working Knowledge Overview and case studies of knowledge management 5. Knowledge Transfer …
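The markup for this example was lost, so the following is a sketch of how the same text could be tagged with two vocabularies in one document. The `book` namespace URI and its element names are invented for illustration; the `dc` prefix is bound to the Dublin Core element set as an assumption about what the slide used.

```python
import xml.etree.ElementTree as ET

# "http://example.com/ns/book" is a made-up namespace for this sketch.
DOC = """
<book:book xmlns:book="http://example.com/ns/book"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Working Knowledge</dc:title>
  <dc:description>Overview and case studies of knowledge management</dc:description>
  <book:chapter>
    <book:title>5. Knowledge Transfer</book:title>
  </book:chapter>
</book:book>
"""

NS = {
    "book": "http://example.com/ns/book",
    "dc": "http://purl.org/dc/elements/1.1/",
}

root = ET.fromstring(DOC)

# Both vocabularies define "title"; the namespace keeps them apart.
work_title = root.findtext("dc:title", namespaces=NS)
chapter_title = root.findtext("book:chapter/book:title", namespaces=NS)
print(work_title, "|", chapter_title)
```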

78 XMLandKM OEB - Open E-Book In September 1999, the group published the Open E-Book 1.0 Publication Structure The Open E-book standard is essentially XHTML—that is, a clean version of HTML 4.0 along with support for CSS.

79 XMLandKM RDF - Resource Description Framework Framework for metadata Interoperability of information exchange between applications Applications: –Resource discovery –Knowledge sharing and exchange –Content rating –Intellectual property rights

80 XMLandKM RDF Example <rdf:RDF xmlns:rdf=" syntax-ns#" xmlns:dc=""> <rdf:Description rdf:about="http://your.url" dc:creator="Frank Cervone" dc:title="My RDF document" dc:description="Exciting RDF Stuff." dc:date="2000-11-10" /> </rdf:RDF>
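The namespace URIs in the slide's RDF example were partly lost. The sketch below fills them in with the standard RDF syntax namespace and, as an assumption, the Dublin Core 1.1 element set, then reads the metadata back out. It treats the RDF as plain XML, which is enough to show the idea but is not a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"   # assumed; the slide's URI was lost

DOC = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://your.url"
                   dc:creator="Frank Cervone"
                   dc:title="My RDF document"
                   dc:description="Exciting RDF Stuff."
                   dc:date="2000-11-10" />
</rdf:RDF>
"""

root = ET.fromstring(DOC)
description = root.find(f"{{{RDF_NS}}}Description")

# The resource being described, then every Dublin Core property about it.
resource = description.get(f"{{{RDF_NS}}}about")
metadata = {
    name.split("}")[1]: value
    for name, value in description.attrib.items()
    if name.startswith(f"{{{DC_NS}}}")
}
print(resource, metadata)
```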

81 XMLandKM Emerging Standards For KM XTM OPML RFML FLBC ebXML

82 XMLandKM XTM: Topic Maps Used to organize information into knowledge bases Topic maps are a new ISO standard for describing knowledge structures and associating them with information resources “GPS” for information “A book without an index is like a country without a map”

83 XMLandKM OPML Outline Processor Markup Language –Outline-structured information Used for data that is easily browsed and edited –Specifications –Legal briefs –Product plans –Presentations –Screenplays –Directories
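As a small illustration of "outline-structured information", the sketch below walks a made-up OPML product plan and prints it as an indented outline. The outline text is invented for the example; the `opml`, `head`, `body`, and `outline` elements and the `text` attribute follow the OPML format.

```python
import xml.etree.ElementTree as ET

# An invented product-plan outline.
OPML = """
<opml version="1.0">
  <head><title>Product plan</title></head>
  <body>
    <outline text="Goals">
      <outline text="Ship beta"/>
      <outline text="Gather feedback"/>
    </outline>
    <outline text="Risks"/>
  </body>
</opml>
"""

def walk(node, depth=0):
    """Yield (depth, text) for every outline element, in document order."""
    for child in node.findall("outline"):
        yield depth, child.get("text")
        yield from walk(child, depth + 1)

rows = list(walk(ET.fromstring(OPML).find("body")))
for depth, text in rows:
    print("  " * depth + text)
```

Because nesting carries the structure, the same few lines of code can browse a legal brief, a screenplay, or a directory.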

84 XMLandKM RFML Relational-functional markup language Used to define relationship and functions among data elements –Tables within relational databases –Relational views

85 XMLandKM FLBC Formal Language for Business Communication –Automated communication –Conversation management –Dialog management –Based on speech act theory Formally defined message types Broad range of message types Defined in terms of intentions Clear delineation between message type and content

86 XMLandKM XML in Use Portals Content management & syndication Content management: industry sector Integration Analytical/decision making Search and retrieval Visualization

87 XMLandKM Applications: Portals Portals are an obvious place for XML to be used. Most are integrating diverse data sources. Examples: –Hummingbird’s Enterprise Portal Suite allows XML-based third-party application integration for a variety of scripting languages Basically “write with your own tools/platform”, exchange data with XML –DataChannel, Sybase Enterprise Portal, Citrix XPS

88 XMLandKM Content Production & Syndication Interwoven –Intranet/extranet content management and authoring based on intelligent business rules, profiling etc. –Newest component of Interwoven’s suite of tools focuses on content distribution and uses XML. –OpenSyndicate uses an XML repository which allows content to be stored as objects and reused for multiple projects.

89 XMLandKM Open Syndicate

90 XMLandKM Content: Industry Specific Solutions Ringtail Solutions –Suite of litigation support and KM modules for legal practitioner

91 XMLandKM Integration InfoShark –Used to integrate data from a host of services and programs, from 100’s to 1000’s of transactions each day –Automates data exchange between Oracle, IBM DB2 and Microsoft SQL for use over Internet, intranets, and extranets –Being used by Montgomery County for eGov services of all types

92 XMLandKM Analytical/Decision Making Spotfire –DecisionSite 6.2 is powered by an XML-based application manager that provides tools, guides, and resources for genomics, chemistry, and manufacturing

93 XMLandKM Search and Retrieval Powerful and precise searching Examples of search engines that support XML: Verity, Inktomi Builders, harvesters and automatic metadata creation and categorization (Data Harmony’s Machine Aided Indexer)

94 XMLandKM Visualization –visual mapping technology provides enterprises with data search and discovery,

95 XMLandKM Not a Silver Bullet “ XML is not the answer to all the world’s problems—it creates new problems, that are awfully damn interesting to solve.” Simon St. Laurent, author of XML: A Primer, on the xml-dev mailing list

96 XMLandKM Thank you! Frank Cervone Assistant University Librarian for Information Technology, Northwestern University Darlene Fichter Data Library Coordinator, University of Saskatchewan

