IMS5401 Web-based Systems Development. Topic 2 (cont): Elements of the Web. Ambition versus Achievability. (e) Document formatting - display.

Slide 1: IMS5401 Web-based Systems Development. Topic 2 (cont): Elements of the Web. Ambition versus Achievability. (e) Document formatting - display; (f) Document formatting - searching.

Slide 2: Agenda. 1. Document formatting; 2. Topic 2 (e) Mark-up and page display; 3. Topic 2 (f) Mark-up and document searching; 4. Implications of document mark-up for web applications and information professionals.

Slide 3: Elements of the Web. [Diagram: THE WEB, made up of four elements: connecting computers; digital representation of documents; display and organisation of documents; linking documents.]

Slide 4: 1. Document formatting. For the writer: need a standard capability for formatting and for describing document content. For the reader: need software which can find documents and can display them as the author intended. Different types of formatting: formatting for layout/structure; formatting for appearance/style; formatting for content description. Note: interpret ‘document’ very broadly.

Slide 5: The document formatting problem: Display. [Diagram: the creator of a document writes and formats it, fixing the document format (content, layout, style); the audience reads it through a document reader, which produces the document appearance (content, layout, style).]

Slide 6: The document formatting problem: Searching. [Diagram: creators of documents describe the content of their documents (Doc 1, Doc 2, Doc 3, ... Doc n); the audience uses a document reader to select the document with contents best suited to its information needs (here Doc 3) and reads it.]

Slide 7: Needs for formatting for display. Handle documents with content of all types: text, graphics, photos, sound, film, etc. Support all possible document display needs. Device independence for writer and reader: the writer should be sure that the formatted document can be read by anyone; the reader should be sure of being able to read any document; both reader and writer should feel sure that the reader sees the document exactly as the writer intended it.

Slide 8: Needs for formatting for searching. Can readers find the relevant documents which they need? Can authors ensure their documents are described and classified appropriately? The biggest library in the world is useless if the content is not organised and classified so people can find what they want.

Slide 9: 2. Topic 2(e) Mark-up languages and document display: brief history. GML (late 1960s, IBM); SGML (late 1970s to mid 1980s); HTML (1990); CSS (mid 1990s); XHTML (late 1990s); XML (late 1990s).

Slide 10: What is a Mark-up Language? A set of standard formatting symbols, incorporated into a document to direct the computer how to format and display it. Can be used to describe: document structure/layout (ie headings, paragraphs, titles, etc); text appearance (fonts, typeface, etc); style sheets (templates, etc). (Can also include metadata for contents; see next section.)

Slide 11: Features of a hypertext mark-up language. Tags: all document elements are marked by tags delimited by angle brackets; eg <p> means start paragraph; </p> means end paragraph. Attributes: some tags can have attributes which specify extra detail; eg <p align="center"> means start a new paragraph and centre it on the page. Links: the anchor tag (<a>) allows you to link an element in the document to a location in another web-based document.
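The three features on this slide (tags, attributes, links) can be seen by running a small fragment through a parser. The sketch below uses Python's standard `html.parser` module; the fragment and its URL are made-up placeholders, not taken from the lecture.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record each start tag with its attributes, plus every link target."""

    def __init__(self):
        super().__init__()
        self.tags = []   # list of (tag name, attribute dict)
        self.links = []  # href values found on anchor tags

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        self.tags.append((tag, attributes))
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])


# Hypothetical fragment: a centred paragraph containing one link.
SAMPLE = (
    '<p align="center">See the '
    '<a href="http://example.org/notes.html">lecture notes</a>.</p>'
)

lister = TagLister()
lister.feed(SAMPLE)
lister.close()

print(lister.tags)   # tags and their attributes
print(lister.links)  # link targets
```

The paragraph tag carries the `align` attribute, and the anchor tag carries the link target in its `href` attribute.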

Slide 12: A sample of a ML: some formatted text... Test sample of a Mark-up Language. This brief piece of text aims to show how a mark-up language works by including tags which show how the document should be displayed by the computer. Note how this text has been formatted to include features such as: headers/titles; paragraphs; italics and bolding; multiple fonts; spacing and text alignment.

Slide 13: ... and the HTML tags to format it. Test sample of a Mark-up Language. Test sample of a Mark-up Language. This brief piece of text aims to show how a mark-up language works by including tags which show how the document should be displayed by the computer. Note how this text has been formatted to include features such as: headers/titles; paragraphs; italics and bolding; multiple fonts; spacing and text alignment.
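The tags themselves do not appear in the text of this slide, only the words they enclosed. As an illustration, the sketch below pairs a guessed piece of HTML 4-style mark-up for part of the sample with a small Python routine that strips the tags again. The mark-up is an assumption, not the lecturer's original; the repeated heading is consistent with the title appearing once in a `<title>` tag and once in a heading tag.

```python
from html.parser import HTMLParser

# Hypothetical reconstruction of part of the sample; the real tags are unknown.
MARKUP = """<html>
<head><title>Test sample of a Mark-up Language</title></head>
<body>
<h1 align="center">Test sample of a Mark-up Language</h1>
<p>This brief piece of text aims to show how a <i>mark-up language</i> works.</p>
<ul>
<li>Headers/titles</li>
<li><i>Italics</i> and <b>Bolding</b></li>
</ul>
</body>
</html>"""


class TextOnly(HTMLParser):
    """Keep the character data and discard every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


parser = TextOnly()
parser.feed(MARKUP)
parser.close()

plain = " ".join(parser.chunks)
print(plain)
```

Once the tags are removed, all information about headings, emphasis and list structure is gone and only a run of words remains.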

Slide 14: Evolution: From GML to HTML. GML (Generalized Mark-up Language): the original mark-up language; developed at IBM to cope with problems of multiple document formats. SGML (Standard Generalized Mark-up Language): an international standard for mark-up based on GML. HTML (Hypertext Mark-up Language): a ‘quick and dirty’ mark-up language developed by Tim Berners-Lee for formatting documents for display on his web.

Slide 15: HTML: Hypertext Mark-up Language. The original web ML. Grew from a very simple Version 1.0 (only about 20 tags to format layout) to a large and complex ML. Four main ‘standard’ versions: 2.0 (1995); 3.2 (1997); 4.0 (1997); 4.01 (1999). Initial simplicity lost as new formatting capabilities were added; many ‘non-standard’ tags. No more up-grades after 4.01; replaced by XHTML.

Slide 16: Dealing with non-text elements of documents. HTML permits the inclusion of graphics, sound, video, animation, program scripts, etc as objects within the document. Standard file formats have evolved (covered in the lecture on digital representation). Inclusion of program-like objects is covered in the lecture on interactivity.

Slide 17: Improving on HTML: Cascading Style Sheets (CSS). Developed to make up for the inadequacies of HTML in controlling document appearance/style. Works like templates and style sheets in Word, PowerPoint, etc. Enables definition of standard text styles (fonts, typeface, etc). Followed by other variants of HTML, also developed to improve, fix or enhance it: DHTML, MathML, VRML, etc.
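The "cascading" idea can be illustrated with a deliberately simplified model: style rules are applied in order, and a later, more specific rule overrides an earlier one property by property. The Python sketch below is only that simplified model. Real CSS also weighs selector specificity, rule origin and inheritance, and the rule names used here are hypothetical.

```python
def cascade(*rule_sets):
    """Merge style rules in order; later rules win property by property."""
    resolved = {}
    for rules in rule_sets:
        resolved.update(rules)
    return resolved


# Hypothetical rules, from most general to most specific.
site_style = {"font-family": "serif", "color": "black", "text-align": "left"}
page_style = {"font-family": "sans-serif"}
heading_style = {"text-align": "center"}

resolved = cascade(site_style, page_style, heading_style)
print(resolved)
```

The heading keeps the site-wide colour, takes the page's font, and applies its own alignment, which is the template-like behaviour the slide compares to style sheets in Word.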

Slide 18: Making HTML extensible. What if I want to write a document which needs a special type of formatting? Work around it within HTML? Get new tags built into HTML? Build a new specialised ML? The concept of the extensible language. XHTML: a sort of HTML 5.0, but now with extensibility built in; it allows you to create and define your own tags. XML (see next section).

Slide 19: ‘Normal’ technology for computer-based document formatting. Document creation software (eg MS-Word) puts formatting instructions in the document. The document is read with the same software that the writer used to create it (eg MS-Word). Formatting controls are unique to each software package (copyright). Software companies sometimes won’t support the formats used by rival companies. ‘Neutral’ formats (eg .rtf, .txt, etc) are more widely accessible but may lose formatting.

Slide 20: Technology for formatting web documents: Writing HTML. Initially, all HTML formatting was done by hand in text editors. Then, specialist HTML composers were developed: FrontPage, Dreamweaver, etc. Now, even MS-Word can generate an HTML file. The ‘quality’ of HTML from composer packages is very variable; does it comply with the W3C standard? Composer code often needs to be ‘patched’.

Slide 21: Technology for displaying formatted web documents: Browsers. Berners-Lee’s browser for displaying HTML-formatted pages; Mosaic, Netscape, IE, etc. The inter-relationship between browsers and HTML. Plug-ins. Error-handling with HTML. Proposed error-handling with XML.
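The difference between the two error-handling styles can be sketched with Python's standard library: a lenient HTML tokenizer carries on past a missing end tag, while an XML parser rejects the same input outright. This is an analogy only, since `html.parser` is not a browser; the broken fragment is a made-up example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment with an unclosed <b> tag.
BROKEN = "<p>An unclosed <b>bold paragraph</p>"


class Collector(HTMLParser):
    """Collect character data, ignoring whether tags are balanced."""

    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)


# HTML-style handling: forgiving, the text still comes through.
lenient = Collector()
lenient.feed(BROKEN)
lenient.close()
html_text = "".join(lenient.text)

# XML-style handling: the document is not well-formed, so parsing fails.
try:
    ET.fromstring(BROKEN)
    xml_accepted = True
except ET.ParseError:
    xml_accepted = False

print(html_text)
print(xml_accepted)
```

Forgiving parsers keep sloppy pages readable, while strict parsers force authors and tools to produce correct mark-up.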

Slide 22: 3. Topic 2(f): Mark-up for document searching. Imagine a library without a catalogue; how would you find things in it? Imagine a librarian without a cataloguing system; how would you know where to put everything? The web can be thought of as a vast library of documents. Where is its librarian and cataloguing system?! So how do we find things in it? The need for metadata.

Slide 23: Metadata. Metadata = data about data. Metadata for a document is data about the document and its contents. Relate to: book indexes, library catalogues, etc; data dictionary entries, entity attributes, etc. Usefulness for the reader: to help me find a document; to help me find the ‘best’ document for my needs. Usefulness for the author: to help users find my document; to make it clear what my document is about.

Slide 24: Indexing documents and metadata. Ideally every document should include its metadata. Metadata may need to be about: the document itself (eg author, title, date created, version, etc); the document’s content (eg content description, topics, related documents, etc); document elements, which may also need metadata (eg source, creator, copyright, etc). Different sorts of documents/document elements have different sorts of relevant metadata. Can we develop universal metadata formats?

Slide 25: Mark-up Languages and metadata. Metadata was always considered an important issue in ML development (SGML, etc) ... but it was not a big deal in the original development of an ML for the web (remember Berners-Lee’s original limited vision of the web). Therefore, HTML had very primitive metadata capabilities.

Slide 26: Metadata and HTML. HTML allowed metadata to be contained in two main tags: one for information about the document and one for information about its content. Primitive capabilities. Not a required element of the document. No standards for how metadata should be managed within the tags. Ignored by most HTML authors.
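The tag names themselves are missing from this slide. As a sketch of how a search tool might read author-supplied metadata, the Python example below assumes the familiar HTML `<title>` and `<meta name=... content=...>` tags; that choice, the sample page and the author name are assumptions made for illustration.

```python
from html.parser import HTMLParser


class MetadataReader(HTMLParser):
    """Pull the page title and any named <meta> entries out of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attributes.get("name"):
            self.meta[attributes["name"].lower()] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page; every value here is a placeholder.
PAGE = """<html><head>
<title>Elements of the Web</title>
<meta name="author" content="A. Lecturer">
<meta name="keywords" content="mark-up, metadata, searching">
</head><body><p>Lecture notes.</p></body></html>"""

reader = MetadataReader()
reader.feed(PAGE)
reader.close()

print(reader.title)
print(reader.meta)
```

Nothing forces an author to supply these entries or to fill them in consistently, which is the weakness the slide points out.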

Slide 27: Searching methods and metadata: finding documents on the web. What do I want when I search for documents? All documents relevant to my query (rigour, usually called recall in information retrieval). No documents which are irrelevant to my query (precision). A ranked list in order of relevance. A ranked list in order of ‘quality’ (validity, currency, size, ...?). (Note the contradictions inherent in these needs.)
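The tension between the first two needs can be made concrete. In information retrieval, the share of relevant documents that a search finds is called recall (the slide's "rigour"), and the share of returned documents that are relevant is called precision. The Python sketch below computes both for two made-up result sets.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of search results."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: four documents are truly relevant to the query.
relevant = {"doc1", "doc2", "doc3", "doc4"}

# A cautious search returns little, all of it relevant.
narrow = {"doc1", "doc2"}
# A generous search returns everything relevant, plus a lot of noise.
broad = {"doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8"}

print(precision_recall(narrow, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))   # (0.5, 1.0)
```

Widening a search to catch every relevant document tends to pull in irrelevant ones, which is one of the contradictions the slide notes.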

Slide 28: Possible search/indexing methods. “Brute force”; usage or linkage-based; librarian-based; author-based. (See tutorial resources for detailed explanations.)

Slide 29: “Brute force” search/indexing. The basis of most early web search engines. Index every word in the document. Determine relevance and ranking from word frequency and position. No cataloguing required. Can be completely automated and therefore very cheap. Gives lots of ‘hits’ for almost any word, but very inaccurate and imprecise.
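A minimal sketch of the brute-force approach, assuming ranking by word frequency only (word position is left out to keep it short). The function names and the three sample documents are made up for illustration.

```python
import re
from collections import Counter, defaultdict


def build_index(docs):
    """Map every word to a count of its occurrences in each document."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word][doc_id] += 1
    return index


def search(index, word):
    """Return document ids ranked by how often the word occurs."""
    counts = index.get(word.lower(), Counter())
    return [doc_id for doc_id, _ in counts.most_common()]


# Hypothetical documents.
docs = {
    "cars": "Jaguar cars: the jaguar dealership sells jaguar parts",
    "cats": "The jaguar is a large cat native to the Americas",
    "markup": "Mark-up languages describe document structure",
}

index = build_index(docs)
print(search(index, "jaguar"))   # ['cars', 'cats']
print(search(index, "unicorn"))  # []
```

A reader looking for the animal gets the car page first simply because it repeats the word more often, which shows why this method is cheap but imprecise.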

Slide 30: Usage and linkage-based search/indexing methods. Initially most famously used by Google; now very common. Identify document content by its associated links. Measure document quality by its popularity or by the number of links to it. No cataloguing required. Can be completely automated and therefore very cheap. Accuracy is variable; coverage is incomplete. Gives lots of hits, but patchy results.
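The simplest version of "measure quality by the number of links to it" is a count of inbound links. The Python sketch below does just that on a made-up link graph. It is not Google's actual algorithm, which also weights each link by the importance of the page it comes from.

```python
from collections import Counter


def count_inbound_links(links):
    """links maps each page to the pages it links to; return inbound counts."""
    inbound = Counter()
    for source, targets in links.items():
        # dict.fromkeys drops duplicate links while keeping a stable order.
        for target in dict.fromkeys(targets):
            if target != source:  # ignore self-links
                inbound[target] += 1
    return inbound


# Hypothetical site structure.
links = {
    "home": ["guide", "faq"],
    "blog": ["guide"],
    "faq": ["guide", "home"],
    "guide": [],
}

inbound = count_inbound_links(links)
ranking = [page for page, _ in inbound.most_common()]
print(ranking)
```

The page that nobody links to ("blog") never enters the ranking at all, a small-scale instance of the incomplete coverage the slide mentions.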

Slide 31: Directory-based search/indexing methods. Initially used by Yahoo. A ‘librarian’ views the site and identifies its content (indexes) and quality (relevance and ranking). The ‘librarian’ includes the site in a catalogue of sites. Labour-intensive and therefore expensive. Relies on the librarian’s expertise and judgement of content. Generally accurate and precise, but requires skilled librarians.

Slide 32: Author-based search/indexing methods. Basis for metadata systems. The document author includes document and content information in the document. Indexes are based on author-supplied information. Uses only the author’s time; therefore cheap for the document distributor. Relies on the author’s understanding of indexing and metadata. As accurate and precise as the document author makes it.

Slide 33: XML: Extensible Mark-up Language. Developed by W3C and others as the ‘big one’: “the universal format for structured documents and data on the web”. Key concern is metadata, but it also aims to provide a framework within which display formats such as HTML, CSS, etc can sit as specialised languages. Provides a framework within which users can create and define their own mark-up tags for specialist applications (hence “extensible”). Based on SGML, but guided by experience with HTML.
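What "define their own mark-up tags" looks like in practice can be sketched with a small XML record read through Python's standard `xml.etree.ElementTree` module. The tag names (`lecture`, `title`, `topic`) are invented for this example and do not come from any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical record using made-up tags that describe content, not appearance.
RECORD = """<lecture code="IMS5401">
  <title>Elements of the Web</title>
  <topic>Document formatting</topic>
  <topic>Document searching</topic>
</lecture>"""

root = ET.fromstring(RECORD)

code = root.get("code")                           # attribute on the root tag
title = root.findtext("title")                    # text of the first <title>
topics = [t.text for t in root.findall("topic")]  # every <topic> entry

print(code, title, topics)
```

Because the tags say what each piece of content is, a program can pick out the topics directly instead of guessing from how the text is displayed.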

Slide 34: The role of other MLs after XML. HTML, CSS, etc are so widely used that they must still be supported. Future versions will be (should be!) sub-sets of XML; eg after HTML 4.01, the next version was XHTML. New specialist standards and formats are being developed to live under XML and as a bridge back to old formats. Move to ‘tighter’, less forgiving mark-up interpreters to ensure compliance.

Slide 35: XML and metadata standards. For XML to extend to cover all document types, it needs to have standards for these document types. Hence, various standards are being developed within the XML framework to try to define the ‘standard’ document elements for different document types. Leads us to the Semantic web and web services (see later lecture).

Slide 36: 4. Implications of mark-up for web applications. The growing complexity of mark-up languages and browsers. The merging (collision?) of form (document appearance and display) with function (document description and searching). What matters most for your application? How easily can it be implemented in a way which readers/users can deal with?

Slide 37: Initial Vision. [Diagram: Document End: developer creates with a standard mark-up language (HTML). User End: user reads with a standard browser (Mosaic).]

Slide 38: Actuality. [Diagram: Document End: developer creates with HTML, DHTML, XHTML, XML + other media + scripts, using scripting languages and composer tools which generate the marked-up document (Dreamweaver, FrontPage, CMS, etc). User End: user accesses with Netscape, Internet Explorer, Opera, etc to display ‘standard’ media, + plug-ins to display other media (Acrobat, Real Player, etc).]

Slide 39: Developing around the limits of HTML/XML and browsers. Standards for all document formatting needs? Support for standards by the big industry players? Technical expertise needed in order to publish material which displays properly on the web? Technical expertise needed to find and read material on the web? Web access from non-computer-screen devices?

Slide 40: Ambition and Achievability (1). Can a single formatting standard satisfy every document display and content description requirement? Can the HTML web be turned into the XML web? Conceptual simplicity vs practical complexity. See Jon Bosak’s remarks (next slide).

Slide 41: Ambition and Achievability (2): Jon Bosak on mark-up, linking and display. Early visionaries went further than adopters would follow. Breakthroughs came from newcomers who simplified earlier advanced techniques. Original more complex work is now being re-introduced and seen as necessary. The biggest roadblock to success with more advanced solutions is the success of the simple limited solutions. (Prolog to: Goldfarb, C (2004) ‘XML handbook’, 5th ed.)

Slide 42: What does this mean for information professionals? Learning about HTML tags? Learning about browser capabilities and differences? Learning about search engines? Learning about metadata? Learning about XML? Web services and the semantic web.
