LIS650 lecture 0 Re-Introductory lecture Thomas Krichel 2006-02-04
today Today's contents –Administrative introduction to the course –Substantive introduction to the course –Talk about you! –Introduction to the web –Introduction to XML –Introduction to character sets –A few words about images Fairly general, abstract and tough lecture. Can lead to serious angst.
course resources Course home page is at http://openlib.org/home/krichel/courses/lis650n06s The course resource page http://openlib.org/home/krichel/courses/lis650 The class mailing list https://lists.liu.edu/mailman/listinfo/cwp-lis650-krichel Me. Send me email. Unless you request privacy, I answer to the class mailing list. I come here on several days to counsel students. I announce all times on the mailing list.
general assessment First quiz next lecture. If you miss a lecture, let me know in advance. In addition to the quizzes, we have –the web site plan (to be handed in next week) –the web site assessment (done, hopefully) –the final web site (to be handed in at the end) The final grade is calculated by computer. Quizzes go through a complicated discounting scheme. It disregards the worst quiz performance.
web site assessment Assess the web site of a university Library and Information Science department. A pretty complete list is at http://informationr.net/wl/ A list of admissible departments is at http://openlib.org/home/krichel/courses/lis650/doc/departments.html Write a text that does not describe, but comments on, the web site. Please try to keep your text short, no more than 2 pages.
the final web site Contents should be equivalent to a student essay. Good contents and good architecture are important for a straight A. The site should be a contribution to knowledge on a topic. Personal sites are not allowed. Deadline to finish the web site: one week after the end of the last lecture. You will not be able to change your web site between the deadline and the time that the grade is issued.
course history The course was first run as an institute from 2002-05-13 to 2002-05-17. Its title was Webmastering I: the static web site. To the curriculum committee, this title did not sound academic enough. In 2003 Web Site Architecture and Design (WebSAD) became the full title. In 2005 Passive Web Site Architecture and Design became the title. WebSAD is what we basically learn.
teaching WebSAD WebSAD combines many aspects: –Authoring pages –Work on the organization of data to fit onto pages –Set display style of different pages –Define look and feel of the site –Organize the contribution of data –Maintain a technical web installation Some of them can be learned in a course, but others can not. Emphasis has to be on learnable elements.
teaching philosophy Pointing and clicking in computer software is not enough. Explain underlying principles. Promote standards –XHTML 1.0 strict –CSS level 2.1 Avoid proprietary software. Provide a reasonably rigorous introduction to digital information.
LIS650 contents Deals with the maintenance of a passive web site. Such a web site remains the same whatever the user does with it. There is no customization for different users or times. Topics include –(x)html –css –site usability and information architecture, as far as relevant for passive web sites –http, URI, web server
things this course does not do Forms: allow you to design forms that users fill in. But you do not have the programming skills to do something with the form. Frames: allow you to put several documents into one physical document. Most experts advise against them. We do not cover image maps. We don't cover some advanced CSS properties. Some exotic features of HTML are overlooked.
Other course: LIS651 Deals with building active web sites. –Users fill in a form –Users submit the form –The web server returns a page that is specific to the request of the user. Teaches a language called PHP that is widely used to generate such web sites. –Gets you introduced to computer programming. –Gets you to train analytical thinking. Teaches relational databases to store and retrieve information. –Gets you to think about the structure of information.
world wide web According to the W3C: the World Wide Web (Web) is a network of information resources. The Web relies on four standards to make these resources readily available to the widest possible audience: –A uniform naming scheme for locating resources on the Web (i.e. URIs). –Protocols, for access to named resources over the Internet (e.g., http). –Hypertext, for easy navigation among resources (e.g., HTML). –Vocabularies for types of objects on the Web (i.e. MIME types)
URI introduction Every resource available on the Web -- HTML document, image, video clip, program, etc. -- has an address that may be encoded by a Uniform Resource Identifier, or URI. URIs typically consist of three pieces: –The name of the mechanism used to access the resource or otherwise resolve it –The name of the machine hosting the resource. –The name of the resource itself, given as a path.
example URI http://openlib.org/home/krichel This URI may be read as follows: There is a document available via the HTTP protocol, residing on the Internet host openlib.org, accessible via the path "/home/krichel". mailto:krichel@openlib.org This URI may be read as follows: There is an email user krichel in the domain openlib.org to whom email may be sent.
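The three pieces of a URI can be pulled apart with Python's standard `urllib.parse.urlparse`. This is only an illustration of the idea, not something the course requires; the mailto: address below is a made-up placeholder.

```python
from urllib.parse import urlparse


def uri_pieces(uri: str) -> tuple[str, str, str]:
    """Split a URI into (access mechanism, host, path) with the standard-library parser."""
    parts = urlparse(uri)
    return parts.scheme, parts.netloc, parts.path


print(uri_pieces("http://openlib.org/home/krichel"))
# ('http', 'openlib.org', '/home/krichel')

# A mailto: URI has no host part; everything after the colon ends up in the path.
print(uri_pieces("mailto:user@example.org"))  # hypothetical address
# ('mailto', '', 'user@example.org')
```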
Internet application protocols Computers connected to the Internet (hosts) use different application level protocols to do things. Common protocols include –http –dns –telnet –smtp –ssh –ftp All of the ones cited are client/server protocols: –the client issues a request –the server gives a response We need a web server when we want to disseminate web pages.
our server It is the machine wotan.liu.edu. Wotan is the head of the gods in Germanic legend. It is a humble PC. It runs both http and ssh server software.
the http protocol http stands for the hypertext transfer protocol. http is a widely used application level protocol on the web. http is stateless. Each transaction is self-contained. Each transaction has no relationship to the previous one.
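To make the request/response cycle concrete, here is a small sketch using only Python's standard library. It starts a throwaway web server on the local machine (not wotan) and sends it two separate GET requests. The handler, the port and the paths are illustrative assumptions. Because the server keeps no memory between requests, each answer depends only on the request that produced it, which is what "stateless" means here.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    """Answers every GET with a tiny HTML page that names the requested path."""

    def do_GET(self):
        body = f"<p>You asked for {self.path}</p>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(port: int, path: str) -> tuple[int, str, str]:
    """One self-contained transaction: open a connection, send a request, read the response."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Content-Type"), resp.read().decode("utf-8")
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), EchoPathHandler)  # port 0: the OS picks a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

first = fetch(port, "/~user/test.html")
second = fetch(port, "/~user/index.html")  # the server does not remember the first request
print(first)
print(second)

server.shutdown()
server.server_close()
```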
more on http http has a limited vocabulary of requests and responses. It is no good, say, for operating a machine remotely. http is insecure. The contents of http transactions (requests/responses) can be observed. We therefore do not use it to build (upload) our web pages.
communication with wotan The protocol that we use for communicating with the server is the secure shell, ssh for short. It is based on public-key cryptography. There are two PC programs commonly used as ssh clients –putty for issuing commands –winscp for file transfer. winscp is the one we will use. It offers a range of other facilities besides file transfer. Mac users should investigate a program called fugu: http://rsug.itd.umich.edu/software/fugu/
important rule When you compose web pages, you use winscp. When you look at your own web pages, you use a common web user agent. Never use winscp to look at your own web pages. You will not rot in hell, but you will be confused.
user name & password You can choose your user name as a short form of your own name. It should be all lowercase and cannot contain spaces. Your final project pages can be placed in a subdirectory, say at http://wotan.liu.edu/~user/project, where user is your user name. We will worry about that later.
registration time As part of the course, you are being provided with web space on the server wotan.liu.edu, at the URL http://wotan.liu.edu/~user where user is a user name that you will choose now. You may wish to make the user name some short form of your name. Remember you will be able to have that site for many years to come.
free software I maintain the wotan.liu.edu server, but you can build your own server if –you have Internet access –you have an old PC to spare All the server software, as well as putty and winscp, is free and open-source. It is one of my fundamental beliefs that free information should run on free software.
installing winscp http://winscp.net/eng/download.php has –an installation package, for use if you have administrator rights on the machine where you are installing it –the application itself, for use otherwise, i.e. to just download and run the application At installation time, when/if asked about the default interface, I suggest you use the Windows Explorer style rather than the default Norton Commander style. You can change that later, so no panic.
other stuff: installing user agents Download and install a recent version of at least two browsers. I suggest –Mozilla Firefox at http://www.mozilla.org/products/firefox/ –Opera at http://www.opera.com You can also get –Internet Explorer –Safari –Lynx –Konqueror
open a wotan session with winscp If you see a list of sessions, click on new session. –The host name is wotan.liu.edu. –Give your user name. –Click on save; this will save the session, after ok. You will be led to the list of saved sessions; double-click to open a session. At first connection you will see a warning you can ignore. You can save the password as part of the session. It is risky to do that in a public classroom. You may want to do it at home.
initial remote files on wotan A set of files starting with a dot. –These are places where Linux Masters exert their black magic. –Leave them alone. A directory called public_html –This is the place where web masters exert their magic. You can go into that directory to see the files that you have on your web site at the moment. –There should be two files: validated.html and main.css –Do NOT double-click any file!
HTML HTML is the hypertext markup language. HTML is a markup language that is widely used on the Web. The latest, and probably last version of HTML is version 4.01. It is described at http://www.w3.org/TR/html4/
XHTML The W3C, the standard-making body for the Web, has issued XHTML, a replacement of HTML that is compatible with XML. We will work with XHTML. But we will call it HTML by abuse of language. Some say that XHTML is a version of HTML.
SGML HTML XML You will probably have come across these terms. SGML was developed first. HTML and XML were derived from SGML in different ways. –HTML is an SGML DTD. –XML is a simplified subset of SGML. One common thing here is the ML. It stands for Markup Language. Markup is everything in a document that is not content.
SGML Standard Generalized Markup Language Descriptive approach with three separate layers –structure: types of information in document –content: the information itself –style: defines how to typeset the document Developed for the publishing industry by a group of consultants. So complicated that no software implements it fully. But an important idea that remains of it is the document type definition.
Document Type Definition (DTD) The DTD is a non-SGML language that describes SGML document types. It describes –information the document handles, e.g. title, chapter –relationships between fields, e.g. a chapter contains sections; a title comes at the top of the document HTML is an SGML DTD.
XML Since SGML is so complicated, it is not good for use on the Web. So the W3C has issued XML, the eXtensible Markup Language. Every XML document is an SGML document, but not the other way round. Thus XML is like SGML with many features removed. XML defines the syntax that we will use to write HTML. We have to study that syntax in some detail, now.
nodes "node" is a word used to characterize everything that can be put in an XML document. We will study the following types of nodes –character data –elements –attributes –comments –DTD declarations There are other types of nodes that we don't need to learn about here.
node type: character data Character data is simply a sequence of characters. Examples –"abec" –"8 [[ + 2 ¼" At the end of the lecture, we will discuss character data again.
node type: XML elements XML is based on elements. There are basically three ways of writing an element. The first way is to write <element/>. Here element is the name of the element. Such an element is called an empty element. Example: <bang/> This is an empty element, the name of which is bang.
non-empty elements If name is the name of the element, you can give the element the contents contents by writing <name>contents</name>. contents is often simple character data. Here <name> is called the start tag. </name> is called the end tag. Both tags surround the contents of the element. Remember the previous slide? Then note that <name/> is just a shortcut for <name></name>. Elements within other elements are called child elements.
element & character data examples bonjour здравствуйте She says hello to you. Bibbelsches Bohnesupp mit Quetschekuche or Dibbellabbes mit Abbeltratsch I koh Glos essa, und es duard ma ned wei. Ja mogu esti staklo, i ne boli me. Kristala jan dezaket, ez det minik ematen.
node type: attributes Elements can have attributes. Here is an element with two attributes: <element attribute_name_one="value_one" attribute_name_two="value_two"/> Here attribute_name_one and attribute_name_two are attribute names and value_one and value_two are attribute values. The element itself is empty. Example: bonjour
more on attributes Attribute names are separated from their values by the = sign. No element can have two attributes with the same name. So you cannot have something like <element attribute_name_one="value_one" attribute_name_one="value_two"/>
more on attributes Attribute values are simple strings. You can not have an element inside an attribute value. Thus you can not write, for example ">chocolate Attribute values can be enclosed in single or double quotes. It does not matter. Double quotes are more common, so I suggest you use those.
more examples Александер Сергеевич Пушкин Alexander S. Pushkin Alexandre Pouchkine
node type: comments In an XML document, you can make comments about your code. These are notes to yourself. Comments start with <!-- and end with -->. Comments cannot be nested. They can appear almost anywhere in the document, but not inside a tag. They can enclose elements.
node type: DTD declaration XML documents, like any SGML documents, accept document type declarations. A document type declaration tells us something about the vocabulary of elements and attributes used in the document. It should appear before the root element, after the XML declaration, if you have one. It starts with <!DOCTYPE and ends with >. We will come back to the document type declaration later.
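As a sketch, Python's standard `xml.dom.minidom` parser can list the node types in a small document. The sample document and the element names greeting and name are invented for illustration. Note that minidom does not list attributes among an element's children, so the code reads them separately.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# A tiny, hypothetical document with one node of each kind discussed above.
SAMPLE = ('<!DOCTYPE greeting>'
          '<!-- a note to myself -->'
          '<greeting lang="fr">bonjour <name>Thomas</name></greeting>')


def list_nodes(node, found=None):
    """Walk the DOM tree depth-first and record (node type, value) pairs."""
    if found is None:
        found = []
    if node.nodeType == Node.DOCUMENT_TYPE_NODE:
        found.append(("DTD declaration", node.name))
    elif node.nodeType == Node.COMMENT_NODE:
        found.append(("comment", node.data))
    elif node.nodeType == Node.TEXT_NODE:
        found.append(("character data", node.data))
    elif node.nodeType == Node.ELEMENT_NODE:
        found.append(("element", node.tagName))
        # Attributes belong to their element but are not among its child nodes.
        for name, value in node.attributes.items():
            found.append(("attribute", f'{name}="{value}"'))
    for child in node.childNodes:
        list_nodes(child, found)
    return found


for kind, value in list_nodes(parseString(SAMPLE)):
    print(f"{kind:16} {value!r}")
```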
XML document An XML document is a piece of data that is written in XML. But sometimes the author of a document makes a mistake, and the XML is in fact wrong in some way. If there is no mistake, the document is called well-formed. If a document is not well-formed, it really is not an XML document.
some rules for well-formedness All elements must be properly nested. You can only close the outer element after all inner elements are closed. Examples –<outer><inner></outer></inner> not well-formed –<outer><inner></inner></outer> well-formed An element that is nested inside another element is called a child of that element.
more rules for well-formedness An attribute must have a value. Thus you cannot write <element attribute>...</element>. The value may be empty, as in <element attribute="">...</element> or <element attribute=''>...</element>.
more rules for well-formedness There must be one single element in the document that all other elements are children of. It is called the root element. All other elements are called children of the root.
more rules for well-formedness Whitespace that surrounds the root element is ignored. The root element may be preceded by a prologue. This is anything before the root element. The DTD declaration can only appear in the prologue.
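A parser checks these rules for you. The sketch below assumes Python's standard `xml.etree.ElementTree`, which rejects any document that is not well-formed, and tries one small invented document per rule from the last few slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical test documents, one per well-formedness rule.
CASES = {
    "properly nested":      "<a><b>text</b></a>",
    "improperly nested":    "<a><b>text</a></b>",
    "empty attribute":      '<a title="">text</a>',
    "attribute, no value":  "<a title>text</a>",
    "duplicate attribute":  '<a x="1" x="2"/>',
    "element in attribute": '<a x="<b/>">text</a>',
    "two root elements":    "<a/><b/>",
    "comment before root":  "<!-- note --><a/>",
}

for label, doc in CASES.items():
    print(f"{label:22} {is_well_formed(doc)}")
```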
XML example file: validated.html This is an XML file. Look at it through the "view source" feature of your user agent. Please look at it to find all the node types. Examine how the well-formedness constraints are implemented. Make sure you understand every aspect of its syntax.
copying validated.html validated.html is your model web page. To create a new web page, right click (remember never double-click) on validated.html, and choose "duplicate" from the menu. Do not choose "copy". You will be asked to supply a name for the file. Erase any contents in the dialog box, and then enter the file name you want to create (say test.html). Always have that file name end with ".html". You may be asked to give your password again. Did I say you should not double-click in winscp?
test.html In your test.html file, look for the </body> string. Right before that string, insert <p>Hello, world!</p> Save your file. Do not double-click test.html! Open a web user agent and point it to the URL http://wotan.liu.edu/~user/test.html where user is your user name.
public_html Imagine you are user user and you have a file file in public_html. The web server will map requests to http://wotan.liu.edu/~user/file to show the file /home/user/public_html/file. Here user stands for your user name, and file is the file name, and "/" is the directory separator.
Web page and MIME type If file ends with ".html", the web server tells the web browser that the file is an HTML file. This is done using the MIME type text/html. Therefore you should give all HTML files the extension ".html". Only when the user agent knows that the page is a web page will it render it accordingly.
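Python's standard `mimetypes` module keeps a table from file extensions to MIME types, similar in spirit to the table a web server consults. This is only an analogy: the server on wotan has its own configuration, and the URLs other than the ones for test.html and main.css are hypothetical.

```python
import mimetypes

# Guess a MIME type from the extension at the end of each URL.
for url in ("http://wotan.liu.edu/~user/test.html",
            "http://wotan.liu.edu/~user/main.css",
            "http://wotan.liu.edu/~user/photo.jpg",   # hypothetical image
            "http://wotan.liu.edu/~user/notes"):      # no extension at all
    mime_type, _encoding = mimetypes.guess_type(url)
    print(url, "->", mime_type)
```

The file without an extension gets no type at all, which is why every HTML file should end in ".html".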
index.html The web server on wotan will map requests to http://wotan.liu.edu/~user/ to show the file /home/user/public_html/index.html If this file is not there, the server prepares an HTML document from the list of files that it finds in the directory. Then it sends it to the user agent. Once you have a file index.html, the web user can no longer see the list of individual files in your directory.
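The mapping from URLs to files on the last two slides can be sketched as a small Python function. It is a hypothetical simplification: it does not check whether index.html actually exists, does not produce the directory listing, and does none of the security checks a real server performs.

```python
from urllib.parse import urlparse


def url_to_file(url: str) -> str:
    """Hypothetical sketch of how a ~user URL maps to a file on the server.

    /~user/file  ->  /home/user/public_html/file
    /~user/      ->  /home/user/public_html/index.html
    """
    path = urlparse(url).path                  # e.g. "/~user/test.html"
    if not path.startswith("/~"):
        raise ValueError(f"not a ~user URL: {url}")
    user, _, rest = path[2:].partition("/")    # "user", "/", "test.html"
    if rest == "" or rest.endswith("/"):
        rest += "index.html"                   # a directory was requested
    return f"/home/{user}/public_html/{rest}"


print(url_to_file("http://wotan.liu.edu/~user/test.html"))
print(url_to_file("http://wotan.liu.edu/~user/"))
```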
characters: concept A character set combines two things –Character repertoire: a set of characters e.g. "A", "" "", "" –Character code positions: a number defined for each character in the repertoire. Character encoding is a way to encode the code positions in bytes. To correctly display a document, the user agent needs to know both!
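A short Python sketch of the difference between code positions and encodings: `ord()` gives a character's code position in Unicode, and `str.encode()` turns the character into bytes under a chosen encoding. The three encodings shown are just examples.

```python
from typing import Optional

ENCODINGS = ("utf-8", "latin-1", "utf-16-le")


def encode_everywhere(ch: str) -> dict[str, Optional[bytes]]:
    """Map each encoding name to the bytes for ch, or None if the encoding cannot hold it."""
    result: dict[str, Optional[bytes]] = {}
    for encoding in ENCODINGS:
        try:
            result[encoding] = ch.encode(encoding)
        except UnicodeEncodeError:
            result[encoding] = None
    return result


for ch in ("A", "é", "ж"):
    # Same code position, different bytes depending on the encoding.
    print(f"{ch} U+{ord(ch):04X}", encode_everywhere(ch))
```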
playing safe with characters Only use the characters on the US keyboard, don't insert symbols. Save as ASCII or UTF-8. All ASCII files are also UTF-8 files. Never save as "Unicode" within MS Notepad. If you encounter a character that is not on your keyboard, use an SGML entity. The SGML entity is the last special SGML thing that we have to study.
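The claim that ASCII files are also UTF-8 files can be checked directly in Python. The last lines assume that Notepad's "Unicode" option means UTF-16 little-endian with a byte order mark, which is how it is commonly described.

```python
import codecs


def ascii_is_utf8(text: str) -> bool:
    """True when text is pure ASCII, in which case its ASCII and UTF-8 bytes are identical."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        return False  # text holds at least one non-ASCII character


print(ascii_is_utf8("Hello, world!"))      # True
print(ascii_is_utf8("Je suis Français."))  # False: "ç" is not ASCII

# Assumed Notepad "Unicode": a byte order mark, then two bytes per character here.
# These bytes are not valid UTF-8 and are not plain ASCII either.
utf16 = codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")
print(utf16)  # b'\xff\xfeH\x00i\x00'
```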
SGML entities SGML entities are a way to represent non-ASCII characters when only ASCII input is possible. Codes can be &code; –Example: "&eacute;" inserts an "é". –This is called a character entity –Codes are often abbreviations of the character names Codes can also be numbers, in decimal or hexadecimal form –Example: &#38; or &#x26; to insert an ampersand. –This is called a numeric entity.
XHTML entities They are officially defined in three files that are maintained by the W3C –http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent –http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent –http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent A sample line is –
important entities used in XML There are a few that you need to know and use. –&lt; stands for < –&gt; stands for > –&amp; stands for & –&quot; stands for " –&nbsp; makes a non-breaking space (this one comes from XHTML, not from XML itself) Every time you want to insert < or & in a document, you have to use the entities instead. Examples: –firstname.lastname@example.org –Je suis Fran&ccedil;ais. –Marks &amp; Spencers –3 &lt; 4
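Python's standard `html` module converts between characters and entities, which is handy for checking your own escaping. Note that it knows HTML's full list of named entities, while a plain XML parser, used on the last line, only knows the few predefined ones such as &lt; and &amp;.

```python
import html
import xml.etree.ElementTree as ET

# Escaping: turn characters that XML treats as markup into entities.
print(html.escape("Marks & Spencers"))  # Marks &amp; Spencers
print(html.escape("3 < 4"))             # 3 &lt; 4

# Unescaping: named and numeric entities turn back into characters.
print(html.unescape("Je suis Fran&ccedil;ais."))  # Je suis Français.
print(html.unescape("&#38; &#x26;"))              # & &
print(repr(html.unescape("&nbsp;")))              # '\xa0', the non-breaking space

# An XML parser resolves the predefined entities while reading the document.
print(ET.fromstring("<p>Marks &amp; Spencers: 3 &lt; 4</p>").text)  # Marks & Spencers: 3 < 4
```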
other example Look at http://wotan.liu.edu/home/krichel/examples/xml/gradesheet.xml. First consider the rendered version as it appears in the browser. It illustrates the type of XML data file that Thomas uses to compose his grades and feeds them into the computer. It is well-formed XML. Second, consider the source code of the web page. Why are there all these &lt; and &gt;?
special topic: images The appeal of the web to the masses has a lot to do with its capability to transport images. Image formats are independent of the web, but there are three classic formats that are widely supported by user agents. –GIF –JPEG –PNG The resolution of the image is an important factor.
resolution On a pixel image, the term resolution is often used to say how many pixels there are horizontally and vertically. The larger the number of pixels, the wider the image will appear on the screen. But you will never know how large it is on the screen, because that depends on how many pixels your user's screen draws per inch of display. The web is a bad place for control freaks.
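A worked example of the point above: when a browser shows an image at one image pixel per screen pixel, its physical width is the pixel width divided by the screen's pixels per inch. The image size and the three screen densities below are hypothetical, and browser zoom or display scaling would change the result.

```python
def width_on_screen_inches(width_px: int, screen_ppi: float) -> float:
    """Physical width of an image drawn one image pixel per screen pixel."""
    return width_px / screen_ppi


# The same hypothetical 800-pixel-wide image on three hypothetical screens.
for ppi in (72, 96, 160):
    print(f"{ppi:3} ppi -> {width_on_screen_inches(800, ppi):.2f} inches")
```

The same file comes out at about 11.11, 8.33 and 5.00 inches wide, and the author cannot know which one the user sees.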
GIF stands for graphics interchange format. Developed by CompuServe. Patent issues over its LZW compression made the format abhorred by the free software community. 256 colors maximum. Uses a lossless compression technique.
GIF has three tricks interlacing: –while the file downloads, the browser can show every eighth row first and fill in the remaining rows in later passes –the user gets an idea of the picture before it is sharp transparency –some GIFs are transparent, so you can see them on top of existing content –technically, the GIF declares one color as the transparent color, and pixels of that color are not drawn by the user agent animation –some GIFs are in fact sequences of images that can be rendered one after the other.
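The interlacing passes are fixed by the GIF89a specification: every eighth row starting at row 0, then every eighth row starting at row 4, then every fourth row starting at row 2, then every second row starting at row 1. A small Python sketch of the resulting row order:

```python
# (first row, step) for each of the four GIF89a interlace passes.
INTERLACE_PASSES = ((0, 8), (4, 8), (2, 4), (1, 2))


def gif_interlace_order(height: int) -> list[int]:
    """Return the row numbers of an image in the order an interlaced GIF stores them."""
    return [row for start, step in INTERLACE_PASSES for row in range(start, height, step)]


print(gif_interlace_order(10))  # [0, 8, 4, 2, 6, 1, 3, 5, 7, 9]
```

After the first pass the browser can stretch each received row over the rows below it, which gives the blurry preview that sharpens as later passes arrive.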
JPEG The Joint Photographic Experts Group is a standard-making body for images. JPEG images can support millions of colors. The compression is lossy, i.e. the JPEG file will look like the original image, but not be the same. The compression does not work well with drawings. There are no copyright and patent problems with JPEG
Homework Look at the course home page. Install winscp and browsers at home. Prepare a web site plan of one page maximum. Bring a printed copy with you next week. Prepare for the quiz at the beginning of next lecture.
web site plan What is the intent of the web site? Who commissioned the web site? Whom is the site for? What pages will be on the site? –Name each page. –Establish hierarchy between pages. Any special technical challenges?
http://openlib.org/home/krichel Please shut down the computers when you are done. Thank you for your attention!