Download presentation
Presentation is loading. Please wait.
Published byJared Miles Modified over 9 years ago
1
HTML and Web Pages Transparency No. 1 HTML and Web Pages Cheng-Chia Chen
2
HTML and Web Pages Transparency No. 2 Outlines HyperText, Markup language and HTML URLs and related schemes Brief introduction to HTML Brief introduction to CSS Limitations of HTML unicode
3
HTML and Web Pages Transparency No. 3 References (& further readings) HTML: HTML(5) Tutorial HTML(5) Tutorial from w3schools.com Introduction to HTML Introduction to HTML (HTML5) from MDNHTML5MDN CSS: CSS TutorialCSS Tutorial from w3schools.com CSS Getting StartedCSS Getting Started tutorial (from MDN )MDN HTTP: HTTP Made Really Easy HTTP:The Protocol Every Web Developer Must Know-Part 1
4
HTML and Web Pages Transparency No. 4 HTML What is HTML ? a textual laguage for describing web pages. an acronym for Hyper-Text Markup Langugage. a combination of two concepts: hypertext, markup language HyperText: Collections of documents connected by hyperlinks A concept predating WWW for many years (see textbook): Paul Otlet, original philosophical treatise (1934) Vannevar Bush, hypothetical Memex system (automated desk in which docs stored in microfilms and referenced by code numbers) (1945) --- first hypertext mechanism Ted Nelson: first working system and ther term 'hypertext' introduced (1968) Hypermedia: generalizes hypertext beyond text
5
HTML and Web Pages Transparency No. 5 Are languages that allow you to add markups to the data of a document. Markup is text (control codes, tags, etc.) that is added to the data of a document in order to convey additional information about it can be used to make explicit the formal structure of text pre XML(1998) Charles Goldfarb, Ed Mosher and Ray Lorie invent GML in 1969 at IBM implement the INTIME system (1970) culminated in Standard Generalized Markup Language, SGML (1986) Example: DTD, element, attribute, tag: <!DOCTYPE greeting [ ]> hello world! Markup Languages
6
HTML and Web Pages Transparency No. 6 The Origins of the WWW WWW was invented by Tim Berners-Lee at CERN (1989) Hypertext across the Internet (replacing FTP) Three constituents: HTML + URL + HTTP HTML for content representation URL for addressing HTTP for request/response content transfer HTML is an SGML language for hypertext URL is an notation for locating files on servers HTTP is a high-level protocol for file transfers
7
HTML and Web Pages Transparency No. 7 The Design of HTML Simple, purist design principles HTML describes the logical structure of a document Browsers are free to interpret tags differently HTML is a lightweight file format Size of file containing just ” Hello World! ”: Postscript11,274 bytes PDF4,915 bytes MS Word19,456 bytes HTML28 bytes
8
HTML and Web Pages Transparency No. 8 The History of HTML 1992: HTML 1.0, Tim-Berners Lee original proposal 1993: HTML+, some physical layout 1994: HTML 2.0, standard with best features 1995: Non-standard Netscape features 1996: Competing Netscape and Explorer features 1996: HTML 3.2, the Browser Wars end 1997: HTML 4.0, stylesheets are introduced 1999: HTML 4.01, slight modification to 4.0 2000: XHTML 1.0, an XML version of HTML 4.01 2001: XHTML 1.1, modularization 2002: XHTML 2.0, simplified and generalized 2014-10-28: HTML 5 (History)HTML 5History and Current Status...Current Status...
9
HTML and Web Pages Transparency No. 9 Uniform Resource Locator referecnes: spec: http://www.ietf.org/rfc/rfc1738.txthttp://www.ietf.org/rfc/rfc1738.txt wiki: http://en.wikipedia.org/wiki/URL A Web resource is located by a URL http://www.w3.org:80/TR/html4/... URL = [scheme:][//authority][path][?query][#fragment] Relative URL & query string: employee /emp?id=12&name=jack Fragment identifier (reference; ref) http://www.w3.org/TR/HTML4/#minitoc Java API for URL: Java.net.URLJava.net.URL scheme authority or server path
10
HTML and Web Pages Transparency No. 10 URIs, URNs, and IRIs URI = URL + URN. Means to describe all resources in the information space, even those that do not have a physical presence. Can use ASCII code only. references: // java API for URI : java.net.URIjava.net.URI http://www.w3.org/Addressing/ http://en.wikipedia.org/wiki/Uniform_Resource_Identifier Uniform Resource Identifier (URI) (RFC2936, 3986)RFC29363986 scheme:scheme-specific-part Conventions about use of /, #, and ? common schems: http,https, ftp, mailto, file Use %hh for escaping of special char. E.g., # %23. Uniform Resource Name (URN) (RFC2141)RFC2141 a pointer to a resource, but without a reference to its location. Ex: urn:isbn:0-471-94128-X urn:jdbc:... International Resource Identifier (IRI) (RFC3987)RFC3987 Allow use of non-ascii code; mapped to URI by complex encoding function. ex: http://www.bl å b æ rgr ø d.dk/bl å b æ rgr ø d.html --> http://www.xn--blbrgrd-fxak7p.dk/bl%E5b%E6rgr%F8d.html
11
HTML and Web Pages Transparency No. 11 Overall structure of an HTML (from w3schools)documentHTMLw3schools DOCTYPE head The Title of the Document meta script -->stylelink... Brief introduction to HTML
12
HTML and Web Pages Transparency No. 12 Simple Formatting (1/2) Good Advice Good Advice for Everyday Life For UNIX programmers Never type: rm -rf /* on your computer. For Nuclear Scientists Never press the Big Red Button. pre : preformatted text ul : unordred list li: list item ol: ordered list h1 ~ h6. : headings; b : bold i : italics tt: fixed-width font br: break of line font: font style designation
13
HTML and Web Pages Transparency No. 13 Simple Formatting (2/2)
14
HTML and Web Pages Transparency No. 14 More Formatting Things To Do Feed the cat. Try out the shell command: foreach x ( `ls` ) cat $x | tr "aeiouy" "x" > $x end Buy ticket for Timbuktu.
15
HTML and Web Pages Transparency No. 15 Hyperlinks: Source Document Source Document Better look here.
16
HTML and Web Pages Transparency No. 16 Hyperlinks: Target Document Target Document... Chapter 17: Dangerous Shell Commands Never execute a shell command that inadvertently changes all vowels to the character 'x'.
17
HTML and Web Pages Transparency No. 17 Tables (link)link PostScript 11,274 bytes PDF 4,915 bytes MS Word 19,456 bytes HTML 28 bytes
18
HTML and Web Pages Transparency No. 18 Tables for Alignment Using Tables
19
HTML and Web Pages Transparency No. 19 Fill-Out Forms ( ) Collects named values from the client Allow clients to invoke remote services
20
HTML and Web Pages Transparency No. 20 GUI Elements Small Medium Large Cheese Pepperoni Anchovies Small Medium Large Cheese Pepperoni Anchovies Types of input: text, password, radio, checkbox, button hidden, file, reset, image, submit. - New Input Types in HTML5New Input Types in HTML5
21
HTML and Web Pages Transparency No. 21 GUI Elements Write something here...
22
HTML and Web Pages Transparency No. 22 Logical Versus Physical (Logical) structure the page starts with a header the entries are written in a list numbers are emphasized (Physical) layout&rendering headers are centered, huge, and grey lists have square bullets emphasis is rendered in bold-style italics ●To promote the reuse and unification of document styles,It is desirable to separate the physical layout and presentation info of a document from its structure.
23
HTML and Web Pages Transparency No. 23 Brief introduction to to CSS ( , ) Cascading Stylesheets separate structure from layout/formatting The essential concepts are selectors and properties Properties may have different values: colorred, yellow, rgb(212,120,20),#ff0000 font-stylenormal, italics, oblique font-size12pt, larger, 150%, 1.5em text-alignleft, right, center, justify line-heightnormal, 1.2em, 120% displayblock, inline, inline-block,inline-block list-item,...,none + + display: none => hide the content and take no space 1em = the length of "M"; 1ex = the length of "x".
24
HTML and Web Pages Transparency No. 24 HTML Block and Inline Elements ( ) Every HTML element has a default display value. Most elements have either block or inline value. Block-level Elements A block-level element always starts on a new line and takes up the full width available (stretches out to the left and right as far as it can). Examples:, -,, Inline Elements An inline element does not start on a new line and only takes up as much width as necessary. Examples:,,.
25
HTML and Web Pages Transparency No. 25 Generic block/in-line elements : define a section in a documentdiv a block-level element often used as a container for other HTML elements. has no required attributes, but style and class are common. When used together with CSS, the element can be used to style blocks of content: Ex: Ex1; HTML Layouts (using div)Ex1HTML Layouts (using div) : define a section in a documentspan used to group inline-elements in a document. provides no visual change by itself. provides a way to add a hook to a part of a text or a part of a document. Ex: Ex1; Ex2Ex1Ex2
26
HTML and Web Pages Transparency No. 26 Structure of a Stylesheet A selector is a list of tag names For each selector, some properties are assigned values: b {color: red; font-size: 12pt} i {color: green} Longer selectors give context sensitivity: table b {color: red; font-size: 12pt} form b {color: yellow; font-size: 12pt} i {color: green} /* css comment; no // comment! */ The most specific selector is chosen to apply
27
HTML and Web Pages Transparency No. 27 Selector Possible selector tags element name(type) a { color: red; } => … element id #redLink { color: red; } => … element class.redLink { color: red; } => … [attr] => … [attr=‘abc’] => … e: first-child => Every e element which is the first child of its parent. Complete list of CSS3 selectors. Complete list of CSS3 selectors
28
HTML and Web Pages Transparency No. 28 Tag relationship in selectors Tag relationship... T1 T2 … => T2 inside T1 … T1>T2 … => T2 immediate inside T1 … T1+T2 … => T2 immediately following T1 … T1~T2 … => T2 following T1 (T1 and T2 are sibling but T2 need not immediately follow T1) More selectors (including those in css3) More selectors
29
HTML and Web Pages Transparency No. 29 Specificity in Action b {color: red;} b b {color: blue;} b.foo {color: green;} b b.foo {color: yellow;} b.bar {color: maroon;} CSS Test Hey! Wow! Amazing! Impressive! k00l! Fantastic! Hey! Wow! Amazing! Impressive! K00l! Fantastic! Rule for selector conflicts: 1. By the lexical graphical order of the triple in each selector: (#IDs, #classes, #elements) 2. last-wins if 1 cannot resolve.
30
HTML and Web Pages Transparency No. 30 Applying a Stylesheet h1 { color: #888; font: 50px/50px "Impact"; text-align: center; } ul { list-style-type: square; } em { font-style: italic; font-weight: bold; } Phone Numbers Phone Numbers John Doe, (202) 555-1414 Jane Dow, (202) 555-9132 Jack Doe, (212) 555-1742
31
HTML and Web Pages Transparency No. 31 HTML Validity HTML has a formal syntax specification 800 lines of DTD notation A validator gives syntax errors for invalid documents Most HTML documents on the Web are invalid (data taken from the textbook(2006) ): Valid documents may contain this logo: available validators:validators a unifying validating service (unicorn)unifying validating service unicorn from w3schools:w3schools www.microsoft.com 123 errors www.cnn.com 58 errors www.ibm.com 30 errors www.google.com 27 errors www.sun.com 19 errors
32
HTML and Web Pages Transparency No. 32 Validation Errors Line 3, column 7: document type does not allow element "BODY" here. ^ Line 4, column 13: document type does not allow element "B" here; assuming missing "CAPTION" start-tag 123 ^ Line 4, column 20: end tag for element "I" which is not open. 123 ^ Line 4, column 28: end tag for "B" omitted, but its declaration does not permit this. 123 ^ Line 4, column 11: start tag was here. 123 ^ Line 4, column 28: end tag for "CAPTION" omitted, but its declaration does not permit this. 123 ^ Line 4, column 11: start tag was here. 123 ^ Line 4, column 28: end tag for "TABLE" which is not finished. 123 ^ Line 6, column 6: end tag for "HTML" which is not finished. 123
33
HTML and Web Pages Transparency No. 33 Reasons for Invalidity Ignorance of the HTML standard Lack of testing ”This page is optimized for the XYZ browser” ”This page is best viewed in 1024x768” Automatic tools generate invalid HTML output Forgiving browsers try to interpret invalid input Lousy HTML This is not very good. In fact, it is quite bad But the browser does something.
34
HTML and Web Pages Transparency No. 34 Problems with Invalidity There are several different browsers Each browsers has many different implementations Each implementation must interpret invalid HTML There are many arbitrary choices to make The HTML standard has been undermined HTML renders differently for most clients
35
HTML and Web Pages Transparency No. 35 A Standard for Invalid HTML The HTML Tidy tool tries to save the situationHTML Tidy tool Invalid HTML is transformed to (almost) valid HTML Still many arbitrary choices, but now we agree Lousy HTML This is not very good. In fact, it is quite bad But the browser does something. Lousy HTML This is not very good. In fact, it is quite bad But the browser does something.
36
HTML and Web Pages Transparency No. 36 HTML for Recipes Rhubarb Cobbler Wed, 4 Jun 95 This recipe is suggested by Jane Dow. Rhubarb Cobbler made with bananas as the main sweetener. It was delicious. 2 1/2 cups diced rhubarb 2 tablespoons sugar 2 fairly ripe bananas 1/4 teaspoon cinnamon dash of nutmeg Combine all and use as cobbler, pie, or crisp. This recipe has 170 calories, 28% from fat, 58% from carbohydrates, and 14% from protein. Related recipes: Garden Quiche is also yummy.
37
HTML and Web Pages Transparency No. 37 Limitations of HTML HTML is designed for hypertext, not for recipes Structure and presentation is intertwined HTML validation is less than recipe validation HTML standards have been undermined We need a special Recipe Markup Language!
38
HTML and Web Pages Transparency No. 38 Unicode : Bytes vs. Characters HTML files are represented as text files A text file is logically a sequence of characters But physically a sequence of bytes Several mappings exist: ASCII EBCDIC Unicode Unicode aims to cover all characters in all past or present written languages
39
HTML and Web Pages Transparency No. 39 Unicode Characters A character is a symbol that appears in a text letters of the alphabet pictograms (like ©) modifiers (such as accents) : å Unicode characters are abstract entities described by names such as: LATIN CAPITAL LETTER A LATIN CAPITAL LETTER A WITH RING ABOVE HIRAGANA LETTER SA ( さ ) RUNIC LETTER THURISAZ THURS THORN No graphical nor byte representations by themselves.
40
HTML and Web Pages Transparency No. 40 Unicode Glyphs A glyph is a graphical presentation of a part of a text. A typical example is: Å // as a glyph This glyph may represent either of the characters: LATIN CAPITAL LETTER A WITH RING ABOVE ANGSTROM SIGN Or even a sequence of two characters: LATIN CAPITAL LETTER A + COMBINING RING ABOVE There are cases in which a single character maps to two or more glyphs Conclusion: It is not a 1-1 mapping between chars and glyphs.
41
HTML and Web Pages Transparency No. 41 Unicode Code Points A code point is a unique number assigned to every Unicode character Code points are between 0 and 1,114,112 organized into 17 planes ; 1 plane = 256 page; 1 page has 256 code points. Only around 100,000 are used today The character HIRAGANA LETTER SA ( さ )is assigned the code point 12,373 Code point 0 through 127 coincide with ASCII Some code points are never assigned to characters. 11011--- -------- reserved for UTF16 encoding.
42
HTML and Web Pages Transparency No. 42 Unicode Character Encoding A character encoding is used to interpret a sequence of bytes as a sequence of code points The bytes are first parsed into code units Code units have a fixed length One or more code units may be required to denote a code point Examples are UTF-8, UTF-16, UTF-32
43
HTML and Web Pages Transparency No. 43 UTF-8 A code unit is a single byte A code point is from 1 to 4 code units Code units between 0 and 127 directly represent the corresponding code points 0000000~01111111 represent themselves. 110XXXXX indicates that 2 code units are used 1110XXXX indicates that 3 code units are used 11110XXX indicates that 4 code units are used The remaining code units looks like 10XXXXXX Possible forms of a UTF-8 code: 1. 0xxxxxxx ; 2. 110xxxxx 10xxxxxx 3. 1110xxxx 10xxxxxx 10xxxxxx; 4. 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
44
HTML and Web Pages Transparency No. 44 UTF-8 Example Given the sequence : 11100011 10000001 10010101 The fist byte indicates that the first codepoint needs 3 bytes: 11100011 10000001 10010101 After extracting all content bits: 0011 0000 0101 0101 We form a two-byte codpoint : 0x3055 or 12,373, which is the character HIRAGANA LETTER SA
45
HTML and Web Pages Transparency No. 45 UTF-16 A code unit consists of 2 bytes Code point below 65,536 are in a single code unit Higher code points y are represented as: 110110XX XXXXXXXX 110111XX XXXXXXXX where xxx…xxx = (y-2 16 ) 2 or y = (xx…xxx) 2 + 2 16 They are rarely used!! But then 11011000 00000000 represent copoint 0xd800 or the half of a copoint > 65535 ? This can be resolved because Unicode assign no code points between the numbers: 11011000 00000000 (55,296; 0xd800) and 11011111 11111111 (57,343; 0xdfff)
46
HTML and Web Pages Transparency No. 46 Byte Order When reading several bytes at once, we must consider the byte order of the architecture UTF-16 starts any text with the special code point: 11111110 11111111 (65,279; 0xFEFF) called zero-width non-breaking space (BOM; Byte order Mark) The dual code point 11111111 11111110 (65,534; 0xFFFE) is never assigned (i.e., not a legal code point). UTF-16LE (LSB First) and UTF-16BE (MSB first) may avoid this.
47
HTML and Web Pages Transparency No. 47 UTF-16 Example 11111110 11111111 00110000 01010101 (BigEndin) 00110000 01010101 12,373 (0x3055) HIRAGANA LETTER SA 11111111 11111110 00110000 01010101 (LittleEndin) 01010101 00110000 0x5530 != 12,373
48
HTML and Web Pages Transparency No. 48 ISO-8859-1 Another popular character encoding Only 256 code points Single byte code units Coincides with ASCII on code points 0-127 Cannot represent general Unicode In all, there are hundreds of different encodings... ISO-8859-1 ~ISO-8859-9 big5, MS950,...
49
HTML and Web Pages Transparency No. 49 Character Encodings in HTML The document may declare its own encoding: <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> For the meta tags to be parsed correctly the encoding should coincides with ASCII. Unicode characters may be represented as character reference of the form: さ // decimal さ // hexidecimal
50
HTML and Web Pages Transparency No. 50 Unicode in Java Java represents characters as UTF-16 code units Not as UTF-16 code points! A pragmatic choice to use only 16 bits The length function on strings may be wrong returns the number of code units Some strings may represent illegal data eg: char c = 0xFFFE ; // not a legal unicode codpoint Java uses classes java.io.InputString/OutputStream to represent byte streams while using java.io.Reader/Writer to represent I/O chactcter streams.
51
HTML and Web Pages Transparency No. 51 Different ways to get an InputStream There are many ways to get an InputStream depending the input source: From a File => new FileInputStream(File | “FileName”)FileInputStream From byte[] => new ByteArrayInputStream(byte[] )ByteArrayInputStream From URLConnection =>URLConnection new URL(“http://...”).openStream() or con = new URL(“…”).getConnection(); con.connect(); con.getInputStream() From java.net.Socket ServerSocketjava.net.SocketServerSocket New Socket(“remoteHost”, port).get{Input/OutputStream}() New ServerSocket(port).accept().get{Input/Output}Stream() ; …
52
HTML and Web Pages Transparency No. 52 From bytes to characters in Java InputStreams allow you to read in data in bytes, but how to read in characters from an input stream ? Ans: Use a byte-char transformer: InputStreamReader.InputStreamReader InputStream in = … // Charset or CharsetDecoderCharsetCharsetDecoder Reader rd = new InputStreamReader(in, “UTF-8”))UTF-8 int ch = rd.read(); rd.read(new char[100], 0, 50 ) BufferedReader brd = new BufferedReader(rd); while(s = brd.readLine() != null) out.println (s) ;
53
HTML and Web Pages Transparency No. 53 From characters to bytes in java Similarly, character data can be written to a Writer, which can then be transformed into bytes via OutputStreamWriter. Ex: Write a String s = ” 政大資科系 ” into bytes in “UTF-8” format in file “f1.txt”. OutputStream ops = new FileOutputStream(“f1.txt) ; // or ops = new Socket(…).getOutputStream() ; // or ops = new ByteArrayOutputStream()ByteArrayOutputStream BufferedWriter wr = new BufferedWriter( new OutputStreamWriter( // Charset or CharsetEncoderCharsetEncoder new FileOutputStream(“f1.txt”), “UTF-8”) ); wr.write(s, 0, s.length) ; // use default charset // cf: new FileWriter(“f1.txt”).write(s,0,s.length) ;
54
HTML and Web Pages Transparency No. 54 Summary History and structure of HTML and CSS Survivor’s guides to these technologies Limitations of HTML for general data
55
HTML and Web Pages Transparency No. 55 Essential Online Resources http://www.w3.org/TR/html4/ http://www.w3.org/Addressing/ http://www.w3.org/Style/CSS/ http://validator.w3.org/ http://www.w3.org/
Similar presentations
© 2025 SlidePlayer.com Inc.
All rights reserved.