How the WWW operates - some history and terminology Mark Levene (Follow the links to learn more!)
Bush 1945 – As We May Think
The memex is a desktop machine, consisting of:
1) A user interface.
2) A repository of documents.
3) A search engine.
4) A linking mechanism.
5) Memex II can learn from its experience.
Quote from As We May Think
The human mind … operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain. … trails that are not frequently followed are prone to fade …. Yet the speed of action, the intricacy of trails, … is awe-inspiring beyond all else in nature.
A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.
There is a new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record.
Nelson's Hypertext
A universal hypertext. Xanadu is a distributed network of documents (1960s). User interface: transpointing windows. Elaborate copyright mechanism. Superseded by the WWW.
Engelbart's oN-Line System (NLS) (1968)
First working hypertext system, where documents were linked together.
Tim Berners-Lee's WWW
CERN 1990 – first browser
Web protocols:
– URL
– HTTP
– HTML
World Wide Web Consortium (W3C) founded in 1994
Mosaic – the Web browser that changed history
Released late 1993; developed by Marc Andreessen.
Netscape triggered the boom of the WWW throughout the 90s.
Browser wars with Microsoft: IE won (2003 stats: 95.6% IE, 3.7% NS).
Difference between the internet and the web
Internet: the physical computer network infrastructure on which the web is built.
The World Wide Web (web) is a virtual network defined through the web protocols.
The internet also supports other services and protocols, such as email, FTP and instant messaging.
[Figure: graph of Web pages related to www.dcs.bbk.ac.uk]
IP Addresses
Internet Protocol (IP) address: each machine connected to the Internet is identified by a unique 32-bit number (in IPv4).
My IP address is: 184.108.40.206 (ipconfig.exe from command prompt).
IP addresses may be dynamic.
IP addresses have corresponding Domain Name System (DNS) names.
My DNS name is: dhcp34.dcs.bbk.ac.uk
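To make the "32-bit number" concrete, here is a minimal sketch of how a dotted-quad IPv4 address packs into one integer. The function name `ipv4_to_int` and the sample address `192.0.2.1` (a reserved documentation address) are illustrative assumptions, not taken from the slides; the standard-library `ipaddress` module is used only as a cross-check.

```python
import ipaddress


def ipv4_to_int(dotted: str) -> int:
    """Pack the four octets of a dotted-quad address into one 32-bit integer."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid IPv4 address: {dotted!r}")
    value = 0
    for octet in octets:
        # Shift the bits collected so far left by one byte, then append the next octet.
        value = (value << 8) | octet
    return value


# Hypothetical example address (reserved for documentation), not from the slides.
EXAMPLE_ADDRESS = "192.0.2.1"
as_int = ipv4_to_int(EXAMPLE_ADDRESS)

# Cross-check against the standard library's own conversion.
assert as_int == int(ipaddress.IPv4Address(EXAMPLE_ADDRESS))
print(as_int, format(as_int, "032b"))
```

Each of the four decimal parts is one byte, which is why no part can exceed 255.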
URLs – Uniform Resource Locators
Address of an internet resource, e.g. http://www.dcs.bbk.ac.uk/~mark
– http is the protocol (others: ftp, mailto, file)
– www.dcs.bbk.ac.uk is the domain name
– ~mark is the path to the resource
A query string follows a ? to run a script (dynamic URL), e.g.
– http://www.google.com/search?q=url
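The URL components listed above can be pulled apart with Python's standard `urllib.parse` module. This is a small sketch using the two URLs from the slide; the helper name `describe` and its dictionary keys are illustrative choices.

```python
from urllib.parse import parse_qs, urlsplit


def describe(url: str) -> dict:
    """Split a URL into the parts named on the slide."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,         # e.g. http
        "domain": parts.netloc,           # e.g. www.dcs.bbk.ac.uk
        "path": parts.path,               # e.g. /~mark
        "query": parse_qs(parts.query),   # key/value pairs after the ?
    }


static_url = describe("http://www.dcs.bbk.ac.uk/~mark")
dynamic_url = describe("http://www.google.com/search?q=url")

print(static_url)
print(dynamic_url)
```

Note that `urlsplit` reports the path with its leading slash, and `parse_qs` returns a list of values per key because a key may appear more than once in a query string.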
HTTP – HyperText Transfer Protocol
Protocol of messages exchanged by a user agent (client) and a web server.
The most common request is GET:
– GET URL (agent's request)
– HTTP/1.1 200 OK (server's response)
– Response headers (include the content type)
– Blank line
– Response data follows
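The message layout above can be sketched without touching the network: one function builds the text of a GET request and another splits a response into status line, headers and data at the blank line. The function names and the canned `SAMPLE_RESPONSE` are assumptions made for illustration; the `Host` header is added because HTTP/1.1 requires it, although the slide does not show it.

```python
def build_get_request(host: str, path: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",        # mandatory in HTTP/1.1
        "Connection: close",
        "",                     # blank line ends the request headers
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw: str):
    """Split a response into (status code, reason, headers, data)."""
    head, _, data = raw.partition("\r\n\r\n")   # the blank line separates headers from data
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, data


# Hypothetical canned response, used here in place of a real server.
SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>Hello</html>"
)

request_text = build_get_request("www.dcs.bbk.ac.uk", "/~mark")
status, reason, headers, data = parse_response(SAMPLE_RESPONSE)
print(request_text)
print(status, reason, headers["content-type"], data)
```

In practice a library such as `http.client` or `urllib.request` handles this exchange; the sketch only makes the message structure visible.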
HTML – HyperText Markup Language
I am assuming you all have some knowledge of HTML!
The combination of the three components, URL, HTTP and HTML, defines the basic functionality of the web.
Server Log Files
– IP or DNS address of agent making request
– Timestamp, status, transfer volume
– Referrer URL (where the request was made from)
– Requested URL (from the HTTP request)
– User Agent (browser, OS)
– Other information such as authentication
Cookies
A cookie is a piece of text that a web site can store on the user's machine when the user is browsing the site. This information can be retrieved later by the web site, for example in order to identify a user returning to the site. Cookies can be used for statistics and personalisation. They raise some security and privacy issues.
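As a rough illustration of the store-then-retrieve cycle, the sketch below uses Python's standard `http.cookies` module to build the header a site would send and to read back what a returning browser would present. The cookie name `visitor_id` and its value are made-up examples.

```python
from http.cookies import SimpleCookie

# First visit: the web site asks the browser to store a piece of text.
outgoing = SimpleCookie()
outgoing["visitor_id"] = "abc123"        # hypothetical name and value
outgoing["visitor_id"]["path"] = "/"     # cookie applies to the whole site
set_cookie_header = outgoing.output(header="Set-Cookie:")
print(set_cookie_header)

# Later visit: the browser sends the stored text back in a Cookie header,
# and the site parses it to recognise the returning user.
incoming = SimpleCookie()
incoming.load("visitor_id=abc123")
returning_visitor = incoming["visitor_id"].value
print(returning_visitor)
```

The site itself stores nothing on the user's machine directly; it only sends a header, and the browser decides whether to keep the cookie and when to send it back.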
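Tracking Users with Cookies Across Multiple Sites
Participants: browser, web site, banner ad server.
1) Browser to web site: HTTP request for web page.
2) Web site to browser: send web page, which includes ad links.
3) Browser to banner ad server: HTTP request for ad, with cookie.
4) Banner ad server to browser: send ad and update cookie.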
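The exchange above can be simulated in a few lines to show why one ad server embedded in many sites can follow a browser across them. Everything here is a toy model under stated assumptions: the classes `AdServer` and `Browser`, the `user-N` identifiers and the `.example` site names are all hypothetical, and no real HTTP traffic is involved.

```python
import itertools


class AdServer:
    """Toy banner ad server that identifies browsers by a cookie it issues."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.visits = {}  # cookie value -> list of sites where the ad was shown

    def serve_ad(self, cookie, referring_site):
        if cookie is None:
            # First contact: issue a fresh identifier as the cookie.
            cookie = f"user-{next(self._next_id)}"
        # Record which site embedded the ad for this cookie.
        self.visits.setdefault(cookie, []).append(referring_site)
        return cookie  # "send ad and update cookie"


class Browser:
    """Toy browser that keeps one cookie for the ad server."""

    def __init__(self):
        self.ad_cookie = None

    def visit(self, site, ad_server):
        # Loading the page triggers a second request, to the ad server,
        # carrying whatever cookie that server set earlier.
        self.ad_cookie = ad_server.serve_ad(self.ad_cookie, site)


ads = AdServer()
first_browser = Browser()
for site in ("news.example", "shop.example"):
    first_browser.visit(site, ads)

second_browser = Browser()
second_browser.visit("news.example", ads)

print(ads.visits)
```

The two web sites never share data with each other; the profile accumulates only at the ad server, keyed by its own cookie.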
W3C Extended Logging Definitions
cs = client-to-server actions
s = server actions
c = client actions
sc = server-to-client actions
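A field name in an extended log combines one of these prefixes with an identifier, as in `c-ip`, `sc-status` or `cs(User-Agent)`. The following sketch splits such names; the helper `split_field` and its regular expression are illustrative assumptions rather than part of any logging library.

```python
import re

PREFIXES = {
    "cs": "client-to-server",
    "sc": "server-to-client",
    "s": "server",
    "c": "client",
}

# A prefix is followed either by "-identifier" or by "(Header-Name)".
FIELD_PATTERN = re.compile(r"(cs|sc|s|c)(?:-(.+)|\((.+)\))")


def split_field(name: str):
    """Return (prefix meaning, identifier); the meaning is None for unprefixed fields."""
    match = FIELD_PATTERN.fullmatch(name)
    if match is None:
        return None, name  # e.g. date, time
    prefix, plain, header = match.groups()
    return PREFIXES[prefix], plain or header


for field in ("date", "c-ip", "s-port", "cs-uri-stem", "sc-status", "cs(User-Agent)"):
    print(field, "->", split_field(field))
```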
Example of extended log entries format
Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent)
2003-01-07 08:58:12 220.127.116.11 DCSNT\gtuff01 18.104.22.168 80 GET /support/ - 302 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0)
2003-01-07 08:58:19 22.214.171.124 - 126.96.36.199 80 GET /intranet/cs/ - 401 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0)
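Because spaces inside values are written as `+`, an entry in this format can be split on whitespace and paired with the field names. A minimal sketch, assuming the eleven-field layout of the example above (the function name `parse_entry` is illustrative):

```python
FIELDS = (
    "date time c-ip cs-username s-ip s-port cs-method "
    "cs-uri-stem cs-uri-query sc-status cs(User-Agent)"
).split()

# First entry from the example above (raw string keeps the backslash literal).
SAMPLE_ENTRY = (
    r"2003-01-07 08:58:12 220.127.116.11 DCSNT\gtuff01 18.104.22.168 80 "
    r"GET /support/ - 302 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0)"
)


def parse_entry(line: str, fields=FIELDS) -> dict:
    """Pair each whitespace-separated value with its field name."""
    values = line.split()
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    return dict(zip(fields, values))


entry = parse_entry(SAMPLE_ENTRY)
# A "-" value means the field was empty for this request.
print(entry["c-ip"], entry["cs-method"], entry["cs-uri-stem"], entry["sc-status"])
# Turn the "+" placeholders back into spaces for display.
print(entry["cs(User-Agent)"].replace("+", " "))
```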
Another example of extended log entries format
Fields: date time c-ip cs-username cs-method cs-uri-stem cs-uri-query sc-status cs-host cs(User-Agent) cs(Cookie) cs(Referer)
2003-02-01 00:01:44 22.214.171.124 - GET /library/HM.js - 200 www.i-resign.com Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1) i%2Dresign%2Dlogin=UID=3649008;+interstitial=not;+ASPSESSIONIDGQQGQYAO=OAMCCDGBODIOFHLAFHFAGKHD -
2003-02-01 00:02:19 126.96.36.199 - GET /uk/discussion/new_topic.asp t=331 200 www.i-resign.com Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1) - http://www.google.co.uk/search?q=i+hate+my+job&ie=UTF-8&oe=UTF-8&hl=en&meta=cr%3DcountryUK%7CcountryGB
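The referrer field is what lets a site see which search query brought a visitor in. The sketch below extracts the search terms from the second entry, assuming the twelve-field order that the data suggests (date through referrer); the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed field order, inferred from the sample data.
FIELDS = (
    "date time c-ip cs-username cs-method cs-uri-stem cs-uri-query "
    "sc-status cs-host cs(User-Agent) cs(Cookie) cs(Referer)"
).split()

ENTRY = (
    "2003-02-01 00:02:19 126.96.36.199 - GET /uk/discussion/new_topic.asp t=331 200 "
    "www.i-resign.com Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1) - "
    "http://www.google.co.uk/search?q=i+hate+my+job&ie=UTF-8&oe=UTF-8"
    "&hl=en&meta=cr%3DcountryUK%7CcountryGB"
)

record = dict(zip(FIELDS, ENTRY.split()))

# The referrer is itself a URL, so its query string can be decoded
# to recover what the visitor typed into the search engine.
referrer_query = parse_qs(urlsplit(record["cs(Referer)"]).query)
search_terms = referrer_query["q"][0]

print(record["cs-host"], record["cs-uri-stem"], record["sc-status"])
print(search_terms)
```

`parse_qs` decodes both the `+` signs and the percent-escapes, so the `meta` parameter comes back as readable text as well.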
Brief History of Search Engines
Yahoo! (www.yahoo.com) – (1994-) directory service and search engine.
Infoseek – (1994-2001) search engine.
Inktomi – (1995-) search engine infrastructure, acquired by Yahoo! in 2003.
AltaVista – (1995-) search engine, acquired by Overture in 2003.
AlltheWeb – (1999-) search engine, acquired by Overture in 2003.
Ask Jeeves (www.ask.com) – (1996-) Q&A and search engine, acquired by IAC/InterActiveCorp in 2005.
Overture – (1997-) pay-per-click search engine, acquired by Yahoo! in 2003.
Bing (www.bing.com) – (2009-) Microsoft's rebranded search engine; it was Live in 2006 and MSN Search before that.
Google (www.google.com) – (1998-) search engine.