COMP 150-IDS: Internet Scale Distributed Systems (Spring 2016)

Slides:



Advertisements
Similar presentations
REST Vs. SOAP.
Advertisements

II. Basic Web Concepts.
Introduction to Computing Using Python CSC Winter 2013 Week 8: WWW and Search  World Wide Web  Python Modules for WWW  Web Crawling  Thursday:
Developing a Web Site: Links Using a link is a quicker way to access information at the bottom of a Web page than scrolling down A user can select a link.
Copyright 2012 & 2015 – Noah Mendelsohn A Brief Introduction to Requests for Comments – The Specifications for the Internet Noah Mendelsohn Tufts University.
HTTP Hypertext Transfer Protocol. HTTP messages HTTP is the language that web clients and web servers use to talk to each other –HTTP is largely “under.
HTTP Overview Vijayan Sugumaran School of Business Administration Oakland University.
2/9/2004 Web and HTTP February 9, /9/2004 Assignments Due – Reading and Warmup Work on Message of the Day.
HTTP & REST Noah Mendelsohn Tufts University Web: COMP 150-IDS: Internet Scale.
Web Architecture Dr. Frank McCown Intro to Web Science Harding University This work is licensed under a Creative Commons Attribution-NonCommercial- ShareAlike.
1 HTML and CGI Scripting CSC8304 – Computing Environments for Bioinformatics - Lecture 10.
CP476 Internet Computing Lecture 5 : HTTP, WWW and URL 1 Lecture 5. WWW, HTTP and URL Objective: to review the concepts of WWW to understand how HTTP works.
Copyright 2012 & 2015 – Noah Mendelsohn Introduction to: The Architecture of the World Wide Web Noah Mendelsohn Tufts University
Chapter 13. Applets and HTML HTML Applets Computer Programming with JAVA.
Chapter 29 World Wide Web & Browsing World Wide Web (WWW) is a distributed hypermedia (hypertext & graphics) on-line repository of information that users.
IS-907 Java EE World Wide Web - Overview. World Wide Web - History Tim Berners-Lee, CERN, 1990 Enable researchers to share information: Remote Access.
2: Application Layer 1 Chapter 2: Application layer r 2.1 Principles of network applications  app architectures  app requirements r 2.2 Web and HTTP.
Module: Software Engineering of Web Applications Chapter 2: Technologies 1.
XP 1 Charles Edeki AIU Live Chat for Unit 2 ITC0381.
REST API Design. Application API API = Application Programming Interface APIs expose functionality of an application or service that exists independently.
4.01 How Web Pages Work.
4.01 How Web Pages Work.
COMP 150-IDS: Internet Scale Distributed Systems (Spring 2017)
Hypertext Transfer Protocol
Objective % Select and utilize tools to design and develop websites.
Block 5: An application layer protocol: HTTP
Web Basics: HTML and HTTP
HTTP – An overview.
Hypertext Transfer Protocol
The Hypertext Transfer Protocol
JavaScript and Ajax (Internet Background)
Web Development Web Servers.
Application layer 1 Principles of network applications 2 Web and HTTP
Introduction to: The Architecture of the World Wide Web
COMP 150-IDS: Internet Scale Distributed Systems (Spring 2017)
Versioning, Extensibility & Postel’s Law
1993 version of Mosaic browser.
CNIT 131 Internet Basics & Beginning HTML
COMP2322 Lab 2 HTTP Steven Lee Feb. 8, 2017.
Data Communications and Computer Networks Chapter 2 CS 3830 Lecture 9
HTTP Protocol Specification
Hypertext Transfer Protocol
Internet transport protocols services
Introduction Web Environments
Objective % Select and utilize tools to design and develop websites.
Tutorial (4): HTTP Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
IS333D: MULTI-TIER APPLICATION DEVELOPMENT
Introduction to: The Architecture of the World Wide Web
WEB API.
The Internet and HTTP and DNS Examples
Chapter 27 WWW and HTTP.
HTTP Hypertext Transfer Protocol
Hypertext Transfer Protocol
Versioning, Extensibility & Postel’s Law
Introduction to: The Architecture of the World Wide Web
Hypertext Transfer Protocol
CS 5565 Network Architecture and Protocols
World Wide Web Uniform Resource Locator hostname [:port]/path
COMP 150-IDS: Internet Scale Distributed Systems (Spring 2016)
Hypertext Transfer Protocol
Kevin Harville Source: Webmaster in a Nutshell, O'Rielly Books
The HTTP Protocol COSC 2206 Internet Tools The HTTP Protocol
Web Server Design Week 5 Old Dominion University
CORBA Programming B.Ramamurthy Chapter 3 5/2/2019.
HTTP Hypertext Transfer Protocol
4.01 How Web Pages Work.
4.01 How Web Pages Work.
Hypertext Transfer Protocol
Q/ Compare between HTTP & HTTPS? HTTP HTTPS
CSCI-351 Data communication and Networks
Presentation transcript:

COMP 150-IDS: Internet Scale Distributed Systems (Spring 2016) URIs and RFC 3986 Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah

Goals What is named when we use the Web Learn the detailed design of URIs See how the naming principles we’ve explored are reflected in Web architecture and URIs Learn to read RFCs and to study the art of writing specifications Understand why grammars are important

Review: Naming Questions 3 3

Some characteristics of names Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too few names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Review Web Architecture Basics

Architecting a universal Web Identification: URIs Interaction: HTTP Data formats: HTML, JPEG, etc.

What Happens When We Browse a Web Page? Consider: What are all the things that are “named” in the interaction between browser and Web Server?

The user clicks on a link URI is http://webarch.noahdemo.com/demo1/test.html URI is http://webarch.noahdemo.com/demo1/test.html

The http “scheme” tells client to send HTTP GET msg URI is http://webarch.noahdemo.com/demo1/test.html URI is http://webarch.noahdemo.com/demo1/test.html HTTP GET

The client sends an HTTP GET URI is http://webarch.noahdemo.com/demo1/test.html HTTP GET demo1/test.html GET /demo1/test.html HTTP/1.0 Host: webarch.noahdemo.com User-Agent: Noahs Demo HttpClient v1.0 Accept: */* Accept-language: en-us Host: webarch.noahdemo.com

The server sends an HTTP Response HTTP/1.1 200 OK Date: Tue, 28 Aug 2007 01:49:33 GMT Server: Apache Transfer-Encoding: chunked Content-Type: text/html <html> <head> <title>Demo #1</title> </head> <body> <h1>A very simple Web page</h1> </body> </html> The server sends an HTTP Response HTTP GET HTTP Status Code 200 Means Success! demo1/test.html Host: webarch.noahdemo.com HTTP RESPONSE

The server sends an HTTP Response HTTP/1.1 200 OK Date: Tue, 28 Aug 2007 01:49:33 GMT Server: Apache Transfer-Encoding: chunked Content-Type: text/html <html> <head> <title>Demo #1</title> </head> <body> <h1>A very simple Web page</h1> </body> </html> The server sends an HTTP Response HTTP GET demo1/test.html The “representation” returned is an HTML document Host: webarch.noahdemo.com HTTP RESPONSE

Architecting a universal Web Identification: URIs Interaction: HTTP Data formats: HTML, JPEG, etc.

Assign URIs for all Resources A resource is something that has information (e.g. a Web page) If a resource doesn’t have a URI, you can’t link to it…it’s not part of the Web.

The Structure of URIs 16 16

A simple URI http://uss.tufts.edu/stuserv/acadcal/

A simple URI http://uss.tufts.edu/stuserv/acadcal/

A simple URI Scheme http://uss.tufts.edu/stuserv/acadcal/

Schemes http://uss.tufts.edu/stuserv/acadcal/ mailto:noah@cs.tufts.edu Schemes let us name different kinds of things, accessed in different ways.

Authority: who controls allocation of this name? A simple URI Authority http://uss.tufts.edu/stuserv/acadcal/ Authority: who controls allocation of this name?

// Fixed in grammar to indicate authority follows A simple URI // Fixed in grammar to indicate authority follows http://uss.tufts.edu/stuserv/acadcal/

A simple URI http://uss.tufts.edu/stuserv/acadcal/ Path Path: provides for hierarchical naming… … also supports “../xxx” relative syntax

A simple URI http://uss.tufts.edu/stuserv/acadcal/ Path Path: provides for hierarchical naming… … maps well to heirarchical information systems

A more complex URI http://www.tufts.edu?student=smith

A more complex URI http://www.tufts.edu?student=smith Query component The query is part of the URI... However, in many cases, all URIs with a common path are processed by the same server-side code Also…HTML forms are useful for filling in the query components

Fragments identify parts of documents http://tools.ietf.org/html/rfc3986#section-3.5 Fragment interpretation depends on the media type of the returned representation (text/html)…this is useful but tricky and causes a variety of problems.

Characteristics of URIs 28 28

Some characteristics of URIs Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Some characteristics of URIs Both supported Some characteristics of URIs Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Some characteristics of URIs Depends on scheme Some characteristics of URIs Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Some characteristics of URIs Yes… URIs “on the side of a bus” is an important goal… but some URIs are complex Some characteristics of URIs Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Some characteristics of URIs Allowed but not required Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Some characteristics of URIs With most schemes, absolute URIs are global Some characteristics of URIs Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Some characteristics of URIs FILE: scheme is not global! Some characteristics of URIs Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Some characteristics of URIs NO!! Status code 404 is key to Web scalability Some characteristics of URIs Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Some characteristics of URIs Some aliases required e.g.: http vs. HTTP… worst cases depend on users Some characteristics of URIs Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Some characteristics of URIs Depends on scheme and user… see Metadata in URI finding Some characteristics of URIs Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Some characteristics of URIs URIs are the structuring mechanism for the Web as a whole Some characteristics of URIs Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Some characteristics of URIs Designed to allow mappings to hierarchical systems Some characteristics of URIs Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Some characteristics of URIs Decentralized allocation except: Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Some characteristics of URIs scheme names centrally registered with IANA Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Some characteristics of URIs For http and mailto schemes: central Domain Name (DNS) registration required for authority Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Some characteristics of URIs Yes. E.g.: ASCII-only, spaces and some punctuation must be %encoded Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed?

Some characteristics of URIs Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names) Opaque vs. data-carrying? Reflect structure of system? Supports navigation: e.g. “..”? Who can generate them? Constraints from environment E.g. no “-” in C/C++ variable names Indirect identification allowed? URIs are silent on this…but HTTP redirection provides for indirect identification

Grammars 46 46

What are formal grammars? Grammars are formal languages for specifying other languages A grammar allows you to: Always: determine whether a given string is “in” the specified language Often: associate structures in the grammar with parts of the string The Chomsky hierarchy: Different grammars have different expressive power Regular expressions recognize “regular languages” (ab*)  a, ab, abb, abbb Context-free grammars are more powerful: typically used for programming languages The ABNF used in RFC’s is a context-free grammar Context-free grammars can be recognized (parsed) by a finite-state pushdown automaton Tutorial at: http://en.wikipedia.org/wiki/Chomsky_hierarchy

Why use formal grammars for specifying languages? Precise and rigorous Less ambiguous than an explanation in English Membership of a string in a language can be checked automatically Tools to process the language can often be constructed automatically from the grammar

ABNF: the grammar for IETF RFCs ABNF example from RFC 3986: URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] ABNF is itself specified in RFC 2234 ABNF is convenient to use in fixed-font specification documents like RFCs

Summary 50 50

Summary The structure and interpretation of URIs is set out in RFC 3986 URIs embody many of the principles we have studied Formal grammars are powerful tools for specifying names The design decisions embodied in URIs are keys to the success of the Web!!