
IELM 511: Information System design Introduction Part 1. ISD for well structured data – relational and other DBMS Part 2. ISD for systems with non-uniformly.


1 IELM 511: Information System Design. Introduction. Part 1. ISD for well structured data – relational and other DBMS: Info storage (modeling, normalization); Info retrieval (Relational algebra, Calculus, SQL); DB integrated APIs. Part 2. ISD for systems with non-uniformly structured data: Basics of web-based IS (www, web2.0, …); Markups, HTML, XML; Design tools for Info Sys: UML. Part III (one out of): APIs for mobile apps; Security, Cryptography; IS product lifecycles; Algorithm analysis, P, NP, NPC.

2 Agenda Brief introduction to file storage and protocols Basics of xml

3 Files: unit-containers of data Files are a common mechanism of data interchange in communication Data stored in a file must be readable by an application. Standards: Format in which [some specific type of] data is stored Type of Standards: Closed, Open, Open with proprietary extensions Examples ?

4 Protocols Implications of closed/open standards Closed: Can be used to leverage dominance in one application to other applications Examples ? Protocol analyzers and reverse engineers can discover standards → arms race Open: Seamless data exchange, Choice of applications, Competitive pricing and features, Resistance to catastrophic attacks, …

5 Data Protocols: documents Text files: A sequence of standard ASCII encoded characters - Can be read by _many_ applications - No formatting information, except for line feeds, tabs. Text, HTML, SGML, XML, … HTML: HyperText Markup Language - Sequence of standard ASCII encoded characters - File contains: DATA + how data should be DISPLAYED - Open standard, can be read/displayed by _many_ applications

6 Data Protocols: HTML HTML is a markup language A markup language specifies: - what markup is allowed and whereabouts - what markup is required - how markup is to be distinguished from text - what the markup means

7 Data Protocols: HTML.. Significance of HTML: - Can refer to other sources of data anywhere on the internet (URL) - Example page: the title text "Title appears on the title bar of the window" and the body text "Here is the URL of HKUST home page", where the body text carries a hyperlink - Billions of documents can be linked in a network of information (www)
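
The page on this slide can be sketched roughly as follows. This is a minimal reconstruction, not the original markup: the exact tags and the `http://www.ust.hk/` address are assumptions, and `LinkAndTitleParser` and `extract_title_and_links` are hypothetical helper names. The sketch uses Python's standard `html.parser` module to pull out the title and the hyperlink targets, which is the information a browser uses for the title bar and for navigation.

```python
from html.parser import HTMLParser

# Hypothetical page; the tag layout and the URL are illustrative assumptions.
PAGE = """<html>
  <head><title>Title appears on the title bar of the window</title></head>
  <body>
    <p>Here is the URL of <a href="http://www.ust.hk/">HKUST home page</a></p>
  </body>
</html>"""


class LinkAndTitleParser(HTMLParser):
    """Collects the <title> text and the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears between <title> and </title> is kept
        if self._in_title:
            self.title += data


def extract_title_and_links(html_text):
    parser = LinkAndTitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.links
```

Calling `extract_title_and_links(PAGE)` returns the title string and a list with the single link target.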

8 Data Protocols: SGML Standard Generalized Markup Language - Sequence of standard ASCII encoded characters - Open standard, can be read/displayed by _many_ applications - SGML is a metalanguage: - HTML describes how data should be displayed - SGML describes the structure of the data: It provides a means to describe a markup language, e.g. HTML.

9 Data Protocols: SGML.. SGML is NOT a markup language, it describes markup languages SGML allows you to describe a markup language independently of what the markup is intended to do

10 Data Protocols: SGML Fundamental notions in SGML: Markup Entities, Markup Elements and their Attributes, Document type. Entities are strings of bytes; each entity must have a name. The string may be an entire file, or just one character. Elements: All documents are made up of some type of higher level objects, e.g. paragraphs, titles, pictures, lists, etc. Document type: A DTD describes the entities and elements that can appear in a document, and their allowed structure.

11 Data Protocols: SGML Practical SGML: - Very powerful - Arbitrary freedom to the designer to describe document structure - Complex to create a consistent DTD that stays stable over time So ? It provides a mechanism to impose structure on unstructured data - Mother of HTML - Superset (and Mother) of XML

12 Data Protocols: XML eXtensible Markup Language - Sequence of encoded text characters (Unicode, UTF-8 by default; plain ASCII is a subset) - A metalanguage that allows users to define their own markup language - A simplification of SGML - Is becoming the dominant document protocol for most applications. XML provides a way to - describe how the document is structured [this can also be done using style-files, which can be shared] - Document contains data organized in schemas defined using the declared structure

13 XML (some motivation) The Vision All applications on the web are easy to make open. Goods and services are easy to find. For example, any customer can: - Discover all sites that have some used book he/she cannot find. Then order the book from one of them. - Open a spreadsheet or Java application and easily let either of them talk directly to any site that manages the customer's portfolios. Then make changes to the portfolio. In short, make it easy to discover and interact with structured data and applications on the web. -Adam Bosworth, Microsoft Inc. Currently, all Microsoft Office™ document types can be stored in XML format.

14 Use of XML Practical use of XML: Schema of entities can match structure of Databases → Allows direct data exchange between documents and DB Practical use of XML in collaboration: Companies can share schemas → Shared terminology, easier to write applications to exchange data Practical use of XML w.r.t. DB-backed apps: HTML is widely used for formatting and structuring Web docs, but it is not suitable for specifying structured data that is extracted from databases.
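
The first point, that an XML structure can mirror a database structure, can be illustrated with a small sketch. It assumes a hypothetical `students` table with `name` and `id` columns (the second student's id is made up for illustration) and uses only Python's standard `sqlite3` and `xml.etree.ElementTree` modules. `table_to_xml` and `demo` are hypothetical helper names, not part of any library.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table, row_tag):
    """Export every row of `table` as one <row_tag> element.

    The table name is interpolated directly into the SQL text, which is
    acceptable only because this sketch uses a fixed, trusted name.
    """
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY rowid")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table)
    for row in cursor:
        item = ET.SubElement(root, row_tag)
        # One child element per column, so the XML mirrors the table schema
        for column, value in zip(columns, row):
            child = ET.SubElement(item, column)
            child.text = "" if value is None else str(value)
    return root


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT, id TEXT)")
    conn.executemany(
        "INSERT INTO students VALUES (?, ?)",
        [("Kenny", "12345678"), ("Anton", "87654321")],  # second id is illustrative
    )
    root = table_to_xml(conn, "students", "student")
    conn.close()
    return ET.tostring(root, encoding="unicode")
```

`demo()` returns a single XML string with one `student` element per table row; going the other way (XML into the table) follows the same column-to-element mapping.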

15 Structured, Semi Structured and Unstructured Data Structured Data: Information stored in databases is known as structured data because - it is represented in a strict format. - The DBMS ensures that all data follows the structure, constraints specified Semi-Structured Data: Data that may have a certain structure, but not all the information collected will have identical structure. The schema information is mixed in with the data values; each data object can have different attributes that are not known in advance. Unstructured Data: Files of data, but with very limited indication of the type of data. Examples: A text document with a story. Web pages in HTML that contain some data.

16 XML basics The basic object in XML is the XML document. Each XML document describes a collection of objects; each object is described by zero-or-more attributes and zero-or-more elements. Attributes are used in XML to provide additional information about elements. Example content: Kenny 12345678 Anton. Elements: class, student, … Attribute: code
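
A document of the kind this slide describes might look like the string below. Only the element names `class` and `student`, the attribute name `code`, and the values Kenny, 12345678 and Anton come from the slide; the child element names `name` and `id`, the attribute value `IELM511`, and the helper `summarize` are assumptions made for illustration. The sketch reads the document with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction: <name>, <id> and the code value are assumed.
CLASS_XML = """<class code="IELM511">
  <student>
    <name>Kenny</name>
    <id>12345678</id>
  </student>
  <student>
    <name>Anton</name>
  </student>
</class>"""


def summarize(xml_text):
    root = ET.fromstring(xml_text)
    return {
        "root": root.tag,            # element name of the root object
        "code": root.get("code"),    # attribute: extra information about the element
        "students": [s.findtext("name") for s in root.findall("student")],
    }
```

Note that the second student has no `id` element; XML itself does not object to this, which is what makes it suitable for data that does not all share one fixed structure.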

17 XML basics.. Objects are enclosed in ‘tags’ Each tag has a user-defined name XML specifies the structure of data, but not how to display it XML docs must have exactly one ‘root’ object → each document is a tree An element may contain elements or text Example content: Kenny 12345678 Anton
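
Because every document has exactly one root, it can be walked like any other tree. The sketch below is illustrative only: `outline` and `tree_depth` are hypothetical helpers built on Python's standard `xml.etree.ElementTree`, and the sample document is an assumed shape, not the slide's original markup.

```python
import xml.etree.ElementTree as ET

# Assumed sample document with a single root element.
SAMPLE = "<class><student><name>Kenny</name></student><student><name>Anton</name></student></class>"


def outline(element, depth=0):
    """Return the element names, indented by their depth in the tree."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


def tree_depth(element):
    """Number of levels from this element down to its deepest descendant."""
    return 1 + max((tree_depth(child) for child in element), default=0)
```

For `SAMPLE`, `outline(ET.fromstring(SAMPLE))` lists `class` at the top level, each `student` one level in, and each `name` two levels in.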

18 XML basics… XML separates the ‘structure of the data’ from ‘how to display it’ The formatting for displaying XML documents is specified by -- Cascading Style Sheet (CSS) files -- eXtensible Stylesheet Language Transformations (XSLT) files (preferred by W3C)

19 XSLT basics Example data: Adam, Eve, Anton
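
The idea behind a stylesheet transformation, taking XML data and producing a displayable document from it, can be imitated in a few lines. The sketch below is plain Python, not XSLT: Python's standard library has no XSLT processor, so the rule "each person becomes a list item" is written by hand. The element names `people` and `person` and the helper `to_html_list` are assumptions; only the three names come from the slide.

```python
import xml.etree.ElementTree as ET

# Assumed input structure; only the names Adam, Eve, Anton are from the slide.
PEOPLE_XML = "<people><person>Adam</person><person>Eve</person><person>Anton</person></people>"


def to_html_list(xml_text):
    """Turn every <person> element into an <li> inside one <ul>."""
    source = ET.fromstring(xml_text)
    ul = ET.Element("ul")
    for person in source.findall("person"):
        li = ET.SubElement(ul, "li")
        li.text = person.text
    return ET.tostring(ul, encoding="unicode")
```

The data in `PEOPLE_XML` is untouched by the display rule, so a different rule (a table, a comma-separated line) could be applied to the same document.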

20 Well-formed and Valid XML documents Well formed XML documents contain no syntax errors: - single root - every tag has a closing tag - proper nesting of open/close tags … Valid XML document: An xml document that is well-formed, and obeys the structure specified by a separate xml DTD file or Schema file DTD: Document Type Definition simplified example of a DTD:
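
A simplified DTD for a class/student document could look like the `SAMPLE_DTD` string below; the element names `name` and `id` and the content rules are assumptions, not the slide's original example. The accompanying function checks only well-formedness, using Python's standard `xml.etree.ElementTree`. That parser does not validate against a DTD, so checking validity would need a validating parser, which is outside this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD, shown for illustration only; it is not used by the check below.
SAMPLE_DTD = """<!ELEMENT class (student*)>
<!ATTLIST class code CDATA #REQUIRED>
<!ELEMENT student (name, id?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT id (#PCDATA)>"""


def is_well_formed(xml_text):
    """True if the text parses as XML: single root, closed and properly nested tags."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

For example, `is_well_formed("<a><b></b></a>")` is true, while a second root element, an unclosed tag, or crossed tags such as `<a><b></a></b>` all make it false.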

21 Software support for XML, XSLT Most web browsers, text editors and word processors (e.g. MS Word) can parse XML Design software support: Dreamweaver (Adobe™), Stylus Studio, … DTDs are useful only when data and document strictly follow a fixed structure; practically, most IT applications use an alternative, XML Schema, to specify document structure.

22 XML: summary Separation of data from its display/visualization → - we can use the same layout (Stylesheet) for all pages on a website - uniformly change the layout of a web-site by changing the Stylesheets User-defined tags → Different companies can use mutually agreed terminology, exchange info For example: if all spreadsheets are stored in a universally agreed XML format then any spreadsheet file can be manipulated by arbitrary software, like MS Excel, Google spreadsheet, OpenOffice, …

23 Closing notes: motivation for XML XML requires humans to tag the content of data/text with its meaning. The alternative, i.e. using computer programs to understand the meaning of the data, is an important but difficult problem called Natural Language Processing. Example of an ambiguous sentence: "I saw the man on the hill with a telescope." Who has the telescope?

24 References and Further Reading: XML on Wikipedia; XML on w3schools. Reference paper: Andre Bergholz, Extending your markup: An XML tutorial, IEEE Internet Computing, July-August 2000, pp. 74-79. Next: UML

