Basic Internet and Networking Concepts Representation and Management of Data on the Internet.


Presentation on theme: "Basic Internet and Networking Concepts Representation and Management of Data on the Internet."— Presentation transcript:

1 Basic Internet and Networking Concepts Representation and Management of Data on the Internet

2 The Internet and the World-Wide Web: TCP/IP and Web Browsers

3 The Internet and the Web • Internet means Inter-Network: a world-wide network of many LANs (local-area networks) of various types • Web means World-Wide Web: a large collection of information arranged as hypertext and stored on many computers that are part of the Internet • The two are related but not identical: the Web is one of the applications that run on the Internet

4 A Bit of History • The Internet grew very rapidly throughout the 1980s and 1990s • Fewer than 600 computers were connected to the Internet in 1983 • Now there are tens (if not hundreds) of millions of computers • The Web started in 1989 and grew very rapidly during the 1990s • The current Web has billions of pages

5 Internet Applications • Email • Telnet • FTP • Newsgroups • World-Wide Web • Chat ...

6 The Web

7 Web Browsers • Web browsers provide a very convenient interface for viewing the information stored on the Web • Mosaic, the first widely used graphical browser, was introduced in 1993 and sharply increased the popularity of the Web

8 TCP/IP • TCP/IP is the common language of the Internet: IP is the Internet Protocol, TCP is the Transmission Control Protocol • IP transmits packets of data from one host (computer) to another • TCP uses many packets to transmit a long stream of data

9 TCP vs. IP • IP routes each packet from the source host to the destination host; it is oblivious to the fact that each packet is usually part of a data stream • TCP handles a long data stream correctly: it divides the stream into many packets at the source, reassembles the packets in the right order at the destination, and handles errors and lost data
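The division of labor on this slide can be illustrated with a toy sketch. It is not real TCP: the function names `segment` and `reassemble` are made up for illustration, and sequence numbers are simply byte offsets. It only shows the idea of splitting a stream into numbered packets and restoring the order at the destination.

```python
import random

def segment(stream: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split a byte stream into (sequence_number, payload) packets."""
    return [(offset, stream[offset:offset + size])
            for offset in range(0, len(stream), size)]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Sort packets by sequence number and join their payloads."""
    return b"".join(payload for _, payload in sorted(packets))

message = b"a long stream of data sent over many packets"
packets = segment(message, size=8)
random.shuffle(packets)            # the network may deliver packets out of order
restored = reassemble(packets)     # the receiver restores the original order
```

Real TCP additionally acknowledges data and retransmits lost packets, which this sketch leaves out.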

10 Sockets • Sockets are a common interface that makes TCP streams look like file streams • Modern programming languages support sockets • A read from or a write to a socket may block: until data arrives, or until data can be sent • Use multiple threads so that blocking does not cause the whole GUI to freeze
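A minimal sketch of the threading advice, assuming Python's standard `socket`, `threading` and `queue` modules. `socket.socketpair()` stands in for a real TCP connection so that the example runs without a network; the `reader` function and the `inbox` queue are illustrative names.

```python
import queue
import socket
import threading

def reader(sock: socket.socket, inbox: queue.Queue) -> None:
    """Do the blocking reads here, away from the main (GUI) thread."""
    while True:
        data = sock.recv(1024)     # blocks until data arrives
        if not data:               # empty read: the peer closed the connection
            break
        inbox.put(data)
    inbox.put(None)                # marks the end of the stream

# Two connected sockets, standing in for a client/server TCP connection.
local, remote = socket.socketpair()
inbox: queue.Queue = queue.Queue()
worker = threading.Thread(target=reader, args=(local, inbox), daemon=True)
worker.start()

remote.sendall(b"hello")           # the "remote host" sends some data
remote.close()

received = b""
while (chunk := inbox.get(timeout=5)) is not None:
    received += chunk
worker.join(timeout=5)
local.close()
```

In a real GUI program the main thread would poll the queue from its event loop instead of waiting on it as this script does.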

11 A Short Overview of How the Web Works

12 Web Servers • Pieces of information are stored on the Web as HTML pages • These HTML pages are stored as files on particular hosts of the Internet • These hosts are called Web servers • Each server runs an HTTP-daemon in order to make its HTML pages available to other hosts

13 Browsers • We use a browser to display HTML pages • The browser is responsible for fetching the HTML pages and displaying their contents according to the HTML rules

14 HTTP Daemons • An HTTP-daemon is an application that runs constantly on the server and waits for requests from remote hosts • A host can ask the daemon for an HTML page (a file) that is located on the server • Technically, any host connected to the Internet can act as a Web server by running an HTTP-daemon application

15 Browser - HTTPD Interaction [Figure: a browser and the host www.cs.huji.ac.il, which runs an HTTPD application with access to a disk. The user requests http://www.cs.huji.ac.il/index.html; the browser sends GET /index.html; the HTTPD sends the content of index.html]

16 Browser - HTTPD Interaction • The user requests http://www.cs.huji.ac.il/index.html • The browser contacts the HTTP-daemon running on the host www.cs.huji.ac.il and requests the HTML page /index.html • The HTTP-daemon translates the requested name to a specific file in its local file system • The HTTP-daemon reads the file index.html from the disk and sends the content of the file to the browser • The browser receives the HTML page, parses it according to the HTML rules and displays it
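The step in which the daemon translates the requested name to a file can be sketched as follows. This is an illustration only: `translate` and the idea of a single document root are assumptions, not a description of any particular HTTP-daemon.

```python
from pathlib import Path

def translate(doc_root: str, requested: str) -> Path:
    """Map a requested name such as /index.html to a file under doc_root."""
    root = Path(doc_root).resolve()
    target = (root / requested.lstrip("/")).resolve()
    # Refuse names that would escape the document root (e.g. /../secret).
    if target != root and root not in target.parents:
        raise PermissionError(f"{requested} is outside the document root")
    return target
```

A daemon would then read the returned file from disk and send its content to the browser; the check guards against requests that try to reach files outside the served directory.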

17 Proxy Servers • A proxy server acts as a delegate of browsers for accessing the Web • The browser sends its request for a document to the proxy • The proxy contacts the appropriate Web server and fetches the document on behalf of the browser

18 Proxy Server [Figure: a browser, a proxy server running a proxy application with a cache, and an HTTPD. The user requests a document; the browser requests the document from the proxy; the proxy requests the document from the HTTPD; the HTTPD sends the content of index.html]

19 Advantages of Proxy Servers • Proxy servers have several advantages over direct access: they can be combined with a firewall to enable restricted access to the Internet; they enable caching of popular documents; they can extend the functionality of the browser by translating from one protocol to another (for example, from FTP to HTTP and vice versa)
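The caching advantage can be sketched in a few lines. The class `CachingProxy` and the stand-in `fake_fetch` function are hypothetical; a real proxy would issue an HTTP request and would also decide when cached copies expire.

```python
from typing import Callable

class CachingProxy:
    """Fetch documents on behalf of browsers and keep copies of them."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch              # function that contacts the Web server
        self._cache: dict[str, bytes] = {}
        self.origin_requests = 0         # how often the Web server was contacted

    def get(self, url: str) -> bytes:
        if url not in self._cache:       # cache miss: ask the Web server
            self._cache[url] = self._fetch(url)
            self.origin_requests += 1
        return self._cache[url]          # cache hit: answer from the local copy

def fake_fetch(url: str) -> bytes:
    """Stand-in for a real HTTP request, so the sketch runs offline."""
    return f"<html>content of {url}</html>".encode()

proxy = CachingProxy(fake_fetch)
first = proxy.get("http://www.cs.huji.ac.il/index.html")
second = proxy.get("http://www.cs.huji.ac.il/index.html")   # served from cache
```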

20 Dynamically Generated Documents [Figure: a browser and the host www.excite.com, which runs an HTTPD application. The user requests http://www.excite.com/search?what=something; the browser sends GET /search?what=something; the HTTPD executes a search program and sends back the resulting page]

21 IP Addresses, Host Names and URLs

22 IP Addresses • Every host connected to the Internet has a unique IP address that identifies it • IP (version 4) addresses are 32-bit numbers, usually written as four decimal numbers separated by dots, e.g. 135.17.98.240, where each number is the value of one of the four bytes composing the address
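The relation between the dotted notation and the underlying 32-bit number can be made concrete with a short sketch. The helper names `ip_to_int` and `int_to_ip` are illustrative; Python's standard `ipaddress` module offers the same conversions.

```python
def ip_to_int(address: str) -> int:
    """Pack the four dotted decimal bytes into one 32-bit number."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted IPv4 address: {address!r}")
    value = 0
    for byte in parts:
        value = (value << 8) | byte    # shift in one byte at a time
    return value

def int_to_ip(value: int) -> str:
    """Unpack a 32-bit number into dotted decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```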

23 Internet Addresses • Many hosts have, in addition to an IP address, a human-readable Internet address (or hostname) • Here are some examples of Internet addresses: www.cs.huji.ac.il, www.cocacola.com, shum.cc.huji.ac.il • The first part is the name of a particular host (i.e., computer) • The rest is the domain name

24 Internet Addresses (cont’d) • Hostnames have a hierarchical structure, e.g. www.cs.huji.ac.il • www is a computer in the Dept. of Computer Science (cs) at the Hebrew University of Jerusalem (huji), which belongs to the academic domain (ac) of Israel (il) • The rightmost name describes the main domain of the host (il - Israel); to its left there is a sub-domain, and further to the left there are more specific sub-domains
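Reading a hostname from right to left, as described here, can be sketched with a small helper. The name `domain_chain` is made up for illustration.

```python
def domain_chain(hostname: str) -> list[str]:
    """List the domains a hostname belongs to, from most general to most specific."""
    labels = hostname.split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

chain = domain_chain("www.cs.huji.ac.il")
# chain starts at the main domain "il" and ends with the full hostname
```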

25 Generic Domains • There are 7 special domains that are called generic domains: com - commercial organizations (www.cocacola.com); edu - educational institutions (www.berkeley.edu); gov - U.S. governmental organizations (www.cia.gov); int - international organizations; mil - U.S. military; net - networks (InterNIC); org - other organizations (www.w3.org)

26 Country Domains • Generic domains usually refer to hosts inside the U.S. • Other countries use two-letter country domains: il - Israel, uk - United Kingdom, jp - Japan, se - Sweden • These domains usually have sub-domains that correspond to the generic domains; for example, co.il is the domain of all the commercial organizations in Israel, and ac.il is the domain of all the academic institutions inside Israel

27 URLs • Each piece of information on the Web has a unique identifying address called a URL (Uniform Resource Locator) • A URL takes the following form: http://www.huji.ac.il/index.html • It has 3 parts: a protocol field (http), a hostname field (www.huji.ac.il) and a file field (/index.html)

28 URL Fields • The protocol field (“http” in the previous example) specifies the way in which the information should be accessed • The hostname field specifies the host on which the information is found • The file field specifies the particular location in the host's file system where the file is found • More complex forms of URLs are possible
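The three fields can be extracted with Python's standard `urllib.parse.urlsplit`. The second URL, with a port number and a query string, is a made-up example of one of the "more complex forms".

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.huji.ac.il/index.html")
protocol = parts.scheme        # how the information should be accessed
hostname = parts.hostname      # the host on which the information is found
file_path = parts.path         # the location of the file on that host

# A more complex (hypothetical) URL with an explicit port and a query string.
complex_parts = urlsplit("http://www.excite.com:8080/search?what=something")
port = complex_parts.port
query = complex_parts.query
```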

29 Using IP Addresses • How does the browser know the IP address of the Web server? • One possibility is that the user explicitly specifies the IP address of the server in the host field of the URL, for example: http://135.17.98.240/index.html • However, it is inconvenient for people to remember such addresses

30 Back to the Browser • When we address a host on the Internet, we usually use its hostname (e.g., in a URL) • The browser needs to map this hostname to the corresponding IP address of the given host • There is no one-to-one correspondence between the sections of an IP address and the sections of a hostname

31 Translating Hostnames to IP Addresses • The translation of hostnames to IP addresses requires a lookup table • Since there are millions of hosts on the Internet, it is not feasible for the browser to hold a table that maps all hostnames to their IP addresses • Moreover, new hosts are added to the Internet every day and hosts change their names

32 DNS • The browser (and other Internet applications) uses a DNS server to map hostnames to IP addresses • DNS (Domain Name System) is a hierarchical scheme for naming hosts
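From a program's point of view the lookup is a single call. A minimal sketch using Python's standard `socket.gethostbyname`, which asks the system resolver (and through it, typically, a DNS server); the wrapper name `lookup` is illustrative.

```python
import socket

def lookup(hostname: str) -> str:
    """Return an IPv4 address for hostname, as dotted decimal text."""
    # If hostname is already an IPv4 address, it is returned unchanged.
    return socket.gethostbyname(hostname)
```

Calling `lookup("www.cs.huji.ac.il")` needs a working network connection, and the address it returns can change over time, which is one reason applications resolve names at run time rather than storing addresses.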

33 Basic Networking Concepts

34 Local-Area Networks • A Local-Area Network (LAN) covers a small distance and a small number of computers • A LAN often connects the machines in a single room, floor or building

35 LANs (Local-Area Networks) • Limited size • Privately owned • Centrally managed • Usually hosts physically connected via cables • Homogeneous devices & protocols • Known features (latency, bandwidth, ...)

36 WANs (Wide-Area Networks)

37 Wide-Area Networks • A Wide-Area Network (WAN) connects two or more LANs, often over long distances • A LAN is usually owned by one organization, but a WAN often connects different groups in different countries

38 Measures • Bandwidth: Kbps, Mbps, Gbps (kilo-, mega-, gigabits per second). A byte is 8 bits, so the exact conversion to KBps, MBps, GBps (bytes per second) divides by 8; dividing by 10 gives a rough figure that allows for overhead • Latency: the initial delay for the first useful bit to go from the source to the destination
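The two measures can be combined in a back-of-the-envelope calculation. The sketch below assumes a simple first-order model, transfer time ≈ latency + size / bandwidth, which is not stated on the slide; the function names are illustrative.

```python
def bytes_per_second(bits_per_second: float, rule_of_thumb: bool = False) -> float:
    """Convert bits/s to bytes/s: exactly (divide by 8) or roughly (divide by 10)."""
    divisor = 10 if rule_of_thumb else 8
    return bits_per_second / divisor

def transfer_time(size_bytes: float, bandwidth_bps: float, latency_s: float) -> float:
    """Estimate seconds to deliver size_bytes: initial delay plus sending time."""
    return latency_s + (size_bytes * 8) / bandwidth_bps
```

For small transfers the latency term dominates the estimate, and for large transfers the bandwidth term does.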

39 Bandwidth vs. Latency • Which technology provides the largest bandwidth between Tel Aviv and NY? A jumbo jet loaded with DVDs, but the latency is terrible (20 hours) • Latency is at times more important, and is generally harder to improve, than bandwidth

40 What is a protocol? Example, an automated phone dialogue: • The caller dials 06 7647834 • "Welcome to Mount Hermon ski site. For ski conditions press 1, for reservation of ski packages press 5, ..." • The caller presses 5 • "Please select the type of your credit card. For Visa press 1, ..."

41 Layering models [Figure: layering diagram; recoverable labels: sketches, modem, CAD, protocol]

42 TCP/IP • A protocol is a set of rules that determine how things communicate with each other • The software that manages Internet communication follows a suite of protocols called TCP/IP • The Internet Protocol (IP) determines the format of the information as it is transferred • The Transmission Control Protocol (TCP) dictates how messages are reassembled and handles lost information

43 TCP/IP protocol suite • Application: HTTP, FTP, TELNET, ... • Transport: TCP, UDP • Internet: IP • Link: Ethernet, Token-Ring, ...

44 TCP/IP protocol suite Taken from "TCP/IP Illustrated Vol. 1" / Richard Stevens

45 Packet headers Taken from "TCP/IP Illustrated Vol. 1" / Richard Stevens

46 IP Layer • Transmission of packets between two hosts • IP addresses • Routing protocol

47 IP Addresses
Class  From       Till             Net ID  Host ID
A      0.0.0.0    127.255.255.255  7 bit   24 bit
B      128.0.0.0  191.255.255.255  14 bit  16 bit
C      192.0.0.0  223.255.255.255  21 bit  8 bit
D      224.0.0.0  239.255.255.255  28 bit  -
E      240.0.0.0  247.255.255.255  27 bit  -
Address layout (32 bits): Class | Network ID | Host ID
• InterNIC
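The class of an address follows from its first byte alone. A minimal sketch of the classful scheme, with an illustrative function name; it uses the standard boundaries, in which class C ends at 223.255.255.255.

```python
def address_class(address: str) -> str:
    """Return the class (A to E) of a dotted IPv4 address under classful addressing."""
    first = int(address.split(".")[0])
    if not 0 <= first <= 255:
        raise ValueError(f"invalid first byte in {address!r}")
    if first < 128:
        return "A"     # leading bit  0
    if first < 192:
        return "B"     # leading bits 10
    if first < 224:
        return "C"     # leading bits 110
    if first < 240:
        return "D"     # leading bits 1110
    return "E"         # leading bits 1111
```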

48 Routing

49 Routing Principles • A router sits on two or more LANs and routes packets between them • A router does not have a global, end-to-end picture of the route a packet should take • Routing is done hop by hop • “Best Effort” delivery: no guarantee of delivery
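Hop-by-hop routing can be sketched with a toy topology in which each router knows only the next hop for each destination network. The router names, network names and tables below are invented for illustration.

```python
from typing import Optional

# Each router's table maps a destination network to the next hop.
# "deliver" means the destination network is directly attached.
ROUTING_TABLES = {
    "R1": {"net-A": "deliver", "net-B": "R2", "net-C": "R2"},
    "R2": {"net-A": "R1", "net-B": "deliver", "net-C": "R3"},
    "R3": {"net-A": "R2", "net-B": "R2", "net-C": "deliver"},
}

def route(start: str, destination: str, max_hops: int = 16) -> Optional[list[str]]:
    """Follow next-hop entries from router to router; None means the packet is dropped."""
    path = [start]
    router = start
    for _ in range(max_hops):
        next_hop = ROUTING_TABLES[router].get(destination)
        if next_hop is None:
            return None            # no route known: best effort, no guarantee
        if next_hop == "deliver":
            return path            # reached a router attached to the destination
        router = next_hop
        path.append(router)
    return None                    # hop limit exceeded

path = route("R1", "net-C")
```

No router in the sketch holds the full path; the path emerges from the sequence of local decisions.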

50 Router Protocol • Routers constantly talk to each other to collectively decide which routes are best • Routers can dynamically adjust routes as congestion appears or if a link or router goes down

51 Transport Layer • TCP: connection oriented; reliable, keeps order • UDP: connectionless; unreliable; fast

52 Client-Server Model [Figure: a client application on the client machine (190.30.42.155) connects to a port of the server application on the server machine (144.12.34.99)]

53 Well-Known Ports • FTP 21 • Telnet 23 • HTTPD 80 ...

54 Firewalls • A firewall poses restrictions on the traffic into or out of a local-area network • Examples: hiding sensitive data from the outside world; preventing local users from accessing specific sites outside the local-area network

55 How a Firewall Works • All the traffic (of IP packets) into or out of the local-area network is forced to go through a single host • A firewall application is installed on this host • The firewall examines all incoming and outgoing IP packets and discards illegal packets
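The examine-and-discard step can be sketched as a function that decides, per packet, whether it may pass. Everything specific here is an assumption made for illustration: the `Packet` fields, the LAN range 10.0.0.0/8, the blocked outside host and the single open inbound port.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass(frozen=True)
class Packet:
    source: str
    destination: str
    port: int                                   # destination port

LOCAL_NET = ip_network("10.0.0.0/8")            # assumed address range of the LAN
BLOCKED_SITES = {ip_address("203.0.113.7")}     # assumed forbidden outside host
OPEN_INBOUND_PORTS = {80}                       # assumed: only a Web server is exposed

def allowed(packet: Packet) -> bool:
    """Return True if the packet may cross the firewall, False if it is discarded."""
    src, dst = ip_address(packet.source), ip_address(packet.destination)
    if src in LOCAL_NET and dst not in LOCAL_NET:      # outbound traffic
        return dst not in BLOCKED_SITES
    if src not in LOCAL_NET and dst in LOCAL_NET:      # inbound traffic
        return packet.port in OPEN_INBOUND_PORTS
    return True    # traffic that does not cross the LAN boundary is not filtered
```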

56 HTTP Protocol, Server-Side and Client-Side Technologies: CGI, Servlets, JSP, JavaScript

57 HTTP Protocol • Hypertext Transfer Protocol • Used between Web clients (e.g., browsers) and Web servers (and proxies) • Text based • Built on top of TCP • Stateless protocol

58 HTTP Transaction -- Client • Client request: sends a request line, e.g. GET /index.html HTTP/1.0; sends optional header information, e.g. User-Agent: browser name, Accept: formats the browser understands, ...; sends a blank line (\r\n); can send POST data

59 HTTP Transaction -- Server • Server response: sends a status line, e.g. HTTP/1.0 200 OK; sends header information, e.g. Content-type: text/html, Content-length: 3022, ...; sends a blank line (\r\n); sends the document data
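Because the protocol is text based, both sides of the transaction can be sketched with plain string handling. The helpers `build_request` and `parse_response` are illustrative names; they follow the HTTP convention of ending each line with \r\n and handle only well-formed messages.

```python
def build_request(path: str, headers: dict[str, str]) -> bytes:
    """Build a GET request: request line, header lines, then a blank line."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a response into status code, headers and document data."""
    head, _, body = raw.partition(b"\r\n\r\n")      # the blank line ends the headers
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body

request = build_request("/index.html", {"User-Agent": "sketch"})
status, headers, body = parse_response(
    b"HTTP/1.0 200 OK\r\nContent-type: text/html\r\nContent-length: 5\r\n\r\nhello"
)
```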

60 Reacting to Responses of Clients • HTML pages are static documents • To interact with the user, Internet tools and techniques are needed that get input from the user and react according to this input • Sometimes output must be produced as a result of querying a database; the output in this case is not known in advance

61 Server Technologies • Some Web applications use online input to create pages on the fly (for example, search engines) • A request will include, in addition to the URL of the service provider, a list of parameters • For example, http://www.google.com/search?q=search-word • The creation of the pages may also require interaction with some applications (for example, database queries)
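The parameter list travels in the part of the URL after the question mark. A short sketch of extracting it with Python's standard `urllib.parse` module; the helper name `request_parameters` is illustrative, and it keeps only the first value of each parameter.

```python
from urllib.parse import parse_qs, urlsplit

def request_parameters(url: str) -> dict[str, str]:
    """Return the name/value parameters carried in the URL's query string."""
    query = urlsplit(url).query
    return {name: values[0] for name, values in parse_qs(query).items()}

params = request_parameters("http://www.google.com/search?q=search-word")
```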

62 Creating Pages on the Fly in the Server • There are four common ways to serve page requests that include input parameters: CGI (Common Gateway Interface) programming; Java Servlets; JSP (Java Server Pages); Microsoft ASP (Active Server Pages), which is similar to JSP

63 CGI Programming • CGI is an interface, not a language: it defines how a Web server passes a request to an external program • A CGI script is a program (often written in a scripting language) that the server runs to create HTML code • An early technology
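A CGI program can be written in any language. Under the CGI convention the server places the request's query string in the QUERY_STRING environment variable, and the program writes a header, a blank line and the HTML to standard output. The sketch below is a minimal illustration; the parameter name `what` and the page text are made up.

```python
import html
import os
from urllib.parse import parse_qs

def respond(environ) -> str:
    """Build the CGI output for one request: header, blank line, HTML body."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    what = params.get("what", ["nothing"])[0]
    # Escape user input before placing it in the page.
    body = f"<html><body><p>You searched for: {html.escape(what)}</p></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body

if __name__ == "__main__":
    # When run by a Web server as a CGI script, the environment holds the request.
    print(respond(os.environ), end="")
```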

64 Java Servlets • Servlets are Java programs that some Web servers can run • A Servlet creates pages on the fly and these pages are returned to the requesting browser

65 JSP and ASP • JSP (Java Server Pages): create an HTML page that has Java code embedded among the HTML tags • This page is actually a template • The code could, for example, issue a database query and create an HTML table for the result • The Web server executes the code in the template and produces a pure HTML page that is returned to the client • Microsoft ASP (Active Server Pages): the code is typically VB (Visual Basic) scripts; the Web server must be a Microsoft IIS server

66 Client Technologies • Some technologies interact with the user on the client level (Web browser) • JavaScript is a scripting language that can be added to HTML pages • Web browsers can run the script and change the output accordingly • Scripts have limited interaction with the client's file system, through cookies • Cookies are small files that store some personal information in the file system of the client

67 Separating Content from Style: XML and Style Sheets

68 Separating Content from Style • In HTML, the contents and the style of pages are inseparable; HTML tags actually refer only to the style • XML (eXtensible Markup Language) is a new markup language for marking the semantics (meaning) of the data • XML tags describe the meaning of each portion of text in an XML document

69 XML Tags • XML tags are similar to attributes in a relation • However, the attributes are the same for all the records of the relation • In XML documents, each portion of text has its own tag (slide examples: databases, operating systems) • XML tags can be nested

70 Parsing XML Documents • XML facilitates easy parsing of documents according to their semantics • For example, the CS Department has many Web pages of courses • Can we write a program that reads all these pages and prints a list of the names of courses? • If XML tags are used, it is easy to do that
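A minimal sketch of such a program, using Python's standard `xml.etree.ElementTree`. The tag names `department`, `course` and `name` are hypothetical, since the slides do not show the actual markup.

```python
import xml.etree.ElementTree as ET

# A made-up page; real course pages would define their own tags.
PAGE = """<department>
  <course><name>databases</name></course>
  <course><name>operating systems</name></course>
</department>"""

def course_names(xml_text: str) -> list[str]:
    """Return the text of every <name> element that sits inside a <course>."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall("./course/name")]

names = course_names(PAGE)
```

The program relies only on what the tags mean, not on how a page is laid out, which is the point the slide makes.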

71 Using XML • XML is important in the context of data exchange between applications • It is possible to define a common set of tags that are suited for specific applications • For example, MathML is used for exchanging mathematical information

72 Showing XML Documents in Browsers • XML documents contain data with semantic tags • For a graphical representation, information about the style must be added; for example, HTML tags provide information about the style

73 Style Sheets • Style is added to XML documents by means of style sheets • There are two style-sheet languages: CSS (Cascading Style Sheets) describes how to show the data graphically; XSL (Extensible Stylesheet Language) can also transform the data

74 Putting it All Together • A common architecture for Web applications has several tiers: a DBMS (database management system) for storing and processing information; a Web server for producing pages in response to client requests; a browser that supports JavaScript (for creating dynamic pages) and CSS (for creating the desired visual output)

75 How Should XML be Used? • How can we query XML documents easily and effectively? • How can we store XML documents efficiently? • What is the proper way to include other resources in XML documents (e.g., figures, sounds, etc.)? • How can we use a general style and semantically well-defined information without making the process of creating documents too cumbersome?

76 Course topics • Server-side programming: JDBC for connecting to the DBMS, Servlets, JSP • Client-side programming: JavaScript, CSS • Data storage and processing on the Web: XML, XSL

77 Search Engines • What are search engines? • How do they work? • Shortcomings of search engines • Some popular search engines: Infoseek, HotBot, Altavista, Excite, Lycos, Yahoo!, Jeeves, ...

