Web Intelligence By Otto Borchert April 28, 2003.


1 Web Intelligence By Otto Borchert April 28, 2003

2 Background: Application Layer / HTTP, Agents
Present: Google / PageRank
Future: Semantic Web / OWL

3 Hypertext Transfer Protocol (HTTP)
Application-level protocol (World Wide Web)
Runs over TCP, normally port 80
Information retrieved using a URL (Uniform Resource Locator): protocol://host:port/path
Typical HTTP message format
–START_LINE
–MESSAGE_HEADER
–(blank line, i.e. an empty CRLF)
–MESSAGE_BODY
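
The message layout above can be illustrated with a small Python sketch that splits a raw HTTP message at its CRLF line breaks and at the blank line that ends the header. The host `www.example.com`, the `/form` path, and the function name `split_message` are hypothetical; a real client or server would also have to honor `Content-Length` (or chunked encoding) when reading the body from the socket.

```python
# Hypothetical raw HTTP/1.1 request as it would travel over the TCP connection.
# Every line ends in CRLF ("\r\n"); an empty line separates header from body.
RAW = (
    "POST /form HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: text/plain\r\n"
    "Content-Length: 5\r\n"
    "\r\n"
    "hello"
)


def split_message(raw: str) -> tuple[str, dict[str, str], str]:
    """Split an HTTP message into START_LINE, MESSAGE_HEADER fields, and MESSAGE_BODY."""
    head, _, body = raw.partition("\r\n\r\n")  # the blank line ends the header
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


start, headers, body = split_message(RAW)
print(start)                      # POST /form HTTP/1.1
print(headers["content-length"])  # 5
print(body)                       # hello
```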

4 Request Messages
Given by the client on the START_LINE. Methods include:
–OPTIONS: request information about available options
–GET (one of the two most commonly used): retrieve the document identified in the URL
–HEAD (the other most commonly used): retrieve metainformation about the document identified in the URL (e.g., find out how old a page is)
–POST: give information to the server
–PUT: store a document under the specified URL
–DELETE: delete the specified URL
–TRACE: loop back the request message
–CONNECT: for use by proxies

5 Example request
GET http://www.cs.ndsu.nodak.edu/index.html HTTP/1.1
–Gives the entire URL in the START_LINE
GET /index.html HTTP/1.1
Host: www.cs.ndsu.nodak.edu
–Gives only the path in the START_LINE; the host goes in the MESSAGE_HEADER
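
Both request forms can be generated from one URL with Python's standard `urllib.parse.urlsplit`. This is a sketch rather than a full client, and the helper name `build_requests` is hypothetical. One detail worth knowing: HTTP/1.1 requires a `Host` header in every request, so the sketch includes it even when the full URL is already in the START_LINE.

```python
from urllib.parse import urlsplit


def build_requests(url: str) -> tuple[str, str]:
    """Return the same GET request in both forms shown on the slide."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Form 1: the entire URL in the START_LINE (the form a client sends to a proxy).
    # HTTP/1.1 still requires a Host header in every request.
    absolute = f"GET {url} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"
    # Form 2: only the path in the START_LINE, the host in the MESSAGE_HEADER.
    origin = f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"
    return absolute, origin


absolute, origin = build_requests("http://www.cs.ndsu.nodak.edu/index.html")
print(origin.splitlines()[0])  # GET /index.html HTTP/1.1
print(origin.splitlines()[1])  # Host: www.cs.ndsu.nodak.edu
```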

6 Server reply
The server replies with a response message whose START_LINE contains the version of HTTP being used, a 3-digit status code indicating whether or not the request was successful, and a reason phrase explaining that code

7 Codes
1xx – Informational (Request received, continuing process)
2xx – Success (Action successfully received, understood, and accepted)
3xx – Redirection (further action must be taken to complete the request)
4xx – Client Error (request contains bad syntax or cannot be fulfilled)
5xx – Server Error (server failed to fulfill an apparently valid request)
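
Because the first digit alone determines the class, a client can classify any reply with integer division. A minimal sketch; the names `STATUS_CLASSES` and `status_class` are hypothetical.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Map a 3-digit HTTP status code to its class using its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (200, 301, 404, 503):
    print(code, status_class(code))
```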

8 Example Replies
HTTP/1.1 200 OK
–Web page request succeeded; the page is returned and displayed
HTTP/1.1 404 Not Found
–The usual not-found error
HTTP/1.1 301 Moved Permanently
–The page has moved; the reply includes a MESSAGE_HEADER line (Location:), similar to the Host line in a request, telling where the page has moved to
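
A client reading these replies first splits the status line into version, code, and reason phrase, and on a redirect looks up the `Location` header for the new address. The sketch below assumes the header names have already been lowercased into a dictionary; the function names and the `www.example.com` address are hypothetical.

```python
from typing import Optional


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a status line such as 'HTTP/1.1 404 Not Found' into its three parts."""
    version, _, rest = line.partition(" ")
    code, _, reason = rest.partition(" ")  # the reason phrase may itself contain spaces
    return version, int(code), reason


def redirect_target(status_line: str, headers: dict[str, str]) -> Optional[str]:
    """For a 3xx reply, return the new address from the Location header (else None)."""
    _, code, _ = parse_status_line(status_line)
    if 300 <= code < 400:
        return headers.get("location")
    return None


print(parse_status_line("HTTP/1.1 404 Not Found"))  # ('HTTP/1.1', 404, 'Not Found')
print(redirect_target("HTTP/1.1 301 Moved Permanently",
                      {"location": "http://www.example.com/new.html"}))
```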

9 HTTP extras
In version 1.0, each request used its own TCP connection; version 1.1 added persistent connections, so several requests can share one connection
HTTP was designed with web caching in mind: a client can check the date a page was last modified and store the newest versions of frequently accessed pages on a local machine
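
The caching idea can be sketched as a conditional GET: the client remembers the `Last-Modified` date it received and sends it back in an `If-Modified-Since` header; the server answers `304 Not Modified` (with no body) if the page has not changed, or `200 OK` with the new page otherwise. Here the server is simulated by a local function so the example runs without a network; `server_reply`, `fetch`, the in-memory `cache`, the URL, and the timestamps are all hypothetical. Dates are formatted and parsed with the standard `email.utils` helpers, which produce the GMT date format HTTP uses.

```python
from email.utils import formatdate, parsedate_to_datetime

# Hypothetical local cache: URL -> (Last-Modified date string, page body).
cache: dict[str, tuple[str, str]] = {}


def server_reply(if_modified_since, page_mtime, page_body):
    """Toy server: answer 304 Not Modified if the client's copy is current, else 200 OK."""
    last_modified = formatdate(page_mtime, usegmt=True)
    if if_modified_since is not None:
        client_time = parsedate_to_datetime(if_modified_since).timestamp()
        if client_time >= int(page_mtime):  # HTTP dates have one-second resolution
            return 304, last_modified, ""
    return 200, last_modified, page_body


def fetch(url, page_mtime, page_body):
    """Client side: send If-Modified-Since whenever a cached copy exists."""
    cached = cache.get(url)
    since = cached[0] if cached else None
    status, last_modified, body = server_reply(since, page_mtime, page_body)
    if status == 304:
        return status, cached[1]        # reuse the local copy; no body was sent
    cache[url] = (last_modified, body)  # store the newest version
    return status, body


url = "http://www.example.com/index.html"
print(fetch(url, 1_000_000_000, "version 1"))  # (200, 'version 1')
print(fetch(url, 1_000_000_000, "version 1"))  # (304, 'version 1')
print(fetch(url, 1_000_000_060, "version 2"))  # (200, 'version 2')
```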

10 Is the web intelligent?
Intelligence is a poorly defined word anyway. For example, would you consider these intelligent?
–Document analysis systems for cataloging and summarizing Web pages
–Profiling systems for placing selective Web advertising
–Data mining and analysis
–Tools for searching databases supported by Web browsers
–Translation tools that convert to and from human languages
–Statistical software for network caching, routing, and tracking
–Knowledge-based systems for automated e-mail reading
–Smart agents for Internet-based product and service marketing
–Video object recognition and searching

11 Is the web intelligent? (2)
One of the most important advances in making the web intelligent is the use of agents. These agents take many forms, including many of those listed on the previous slide.

12 What is an agent?
No standard definition. An agent can be a:
–Web crawler
–Travel agent
–Secretary
It is hard to distinguish between an agent and a program; an agent normally performs actions based on the data it finds, without much human intervention
Agents can be intelligent as well
Agents act as the glue for many of the following ideas
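
As a concrete example of acting on found data without human intervention, a toy web crawler reads a page, extracts its links, and decides on its own which page to visit next. In this sketch the "web" is an in-memory dictionary of hypothetical `example.com` pages rather than real HTTP fetches, and `LinkExtractor`, `crawl`, and `WEB` are illustrative names; it uses only Python's standard `html.parser` and `urllib.parse`.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute URL of every <a href="..."> link on one page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links such as "a.html" against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def crawl(start: str, pages: dict[str, str], limit: int = 10) -> list[str]:
    """Breadth-first crawl over an in-memory web (URL -> HTML), visiting each URL once."""
    seen, queue, visited = {start}, deque([start]), []
    while queue and len(visited) < limit:
        url = queue.popleft()
        visited.append(url)
        parser = LinkExtractor(url)
        parser.feed(pages.get(url, ""))  # unknown URLs are treated as empty pages
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical three-page site standing in for real HTTP fetches.
WEB = {
    "http://example.com/": '<a href="a.html">A</a> <a href="b.html">B</a>',
    "http://example.com/a.html": '<a href="/b.html">B again</a>',
    "http://example.com/b.html": '<a href="http://example.com/">Home</a>',
}
print(crawl("http://example.com/", WEB))
```

Here the crawler visits the home page, then `a.html`, then `b.html`, and stops because every link it finds afterwards has already been seen.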

13 The Present of Web Intelligence: Google
Presently the most used search engine the Internet has to offer
Uses a unique blend of computer hardware and software to answer millions of user searches each day
Based on a system called PageRank

14 PageRank
Developed by Larry Page and Sergey Brin at Stanford University (Google’s founders)
Uses a system of link ranking
–A link from page A to page B counts as a vote by page A for page B
–If page A is a strong (highly ranked) page to begin with, its link makes page B stronger as well
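
The link-ranking idea can be written as the commonly cited PageRank recurrence, in its normalized form PR(p) = (1 - d)/N + d * sum of PR(q)/C(q) over all pages q that link to p, where C(q) is the number of out-links of q, N the number of pages, and d a damping factor (0.85 is the value usually quoted). The sketch below computes it by repeated iteration on a hypothetical three-page graph. It illustrates the published idea only, not Google's production ranking, which combines it with many other signals.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Power iteration for PR(p) = (1 - d)/N + d * sum(PR(q) / C(q)) over q linking to p."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank, split evenly, to every page it links to,
                # so a link from a strong page is worth more.
                share = d * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for p in pages:
                    new[p] += d * rank[page] / n
        rank = new
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this graph C ends up with the highest rank because both A and B link to it, and A outranks B because A's single in-link comes from the strong page C, while B receives only half of A's vote.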

15 Word Association
On top of PageRank, there is also a system of word matching:
–Word counts (Do the words exist on the page?)
–Proximity checks (Are the words close together?)
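
The two checks can be sketched for a single page as below. This is a hypothetical illustration of word counting and proximity, not Google's actual (unpublished) scoring; the tokenizer, the function names, and the sample page text are assumptions. Proximity is measured as the size, in words, of the smallest stretch of the page that contains every query word.

```python
import re
from itertools import product
from typing import Optional


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_counts(page: str, query: str) -> dict[str, int]:
    """Word-count check: how often does each query word appear on the page?"""
    words = tokenize(page)
    return {q: words.count(q) for q in tokenize(query)}


def proximity(page: str, query: str) -> Optional[int]:
    """Proximity check: smallest window (in words) holding every query word, or None."""
    words = tokenize(page)
    positions = [[i for i, w in enumerate(words) if w == q] for q in set(tokenize(query))]
    if not all(positions):
        return None  # some query word never appears
    # Brute force over one occurrence of each word; fine for a short toy page.
    return min(max(combo) - min(combo) + 1 for combo in product(*positions))


PAGE = "The Semantic Web adds meaning to the web. Semantic markup helps agents."
print(word_counts(PAGE, "semantic web"))  # {'semantic': 2, 'web': 2}
print(proximity(PAGE, "semantic web"))    # 2
```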

16 Can’t you cheat PageRank?
People try every day! Higher search ranking == more exposure
Link farms
–Places that consist of little more than millions of links to a web page, in the hope that the target will move higher on the list
–Google’s answer: page importance. Once link farms are discovered, they are given a negative rank, so if you have a page on a link farm, its rank will go down as well

17 Another way to cheat
Put lots of words related to your page into the page (even if they are not visible)
Google’s answer: PageRank is primary, and cheaters are given lower priority

18 Moral Decisions
Wired article
–A computer screen shows (location, query) pairs for random searches on Google’s engines
–One search during the late hours on the West Coast was “How to stop a friend from committing suicide”
–Not much can be done about it except making sure such searchers get the right information the next time

19 The Future of Web Intelligence: The Semantic Web

20 What is the Semantic Web?
As the web presently stands, it is complete nonsense to most software applications
–“The ball is round” and “The round ball” are two completely different statements, but to a program both are just strings of similar words
The Semantic Web is a series of protocols meant to enrich the current web with meaning

21 Series of Protocols
RDF – Resource Description Framework
OWL – Web Ontology Language (an extension of RDF)

22 Resource Description Framework
From the World Wide Web Consortium web page: RDF “defines a mechanism for describing resources that makes no assumptions about a particular application domain, nor defines (a priori) the semantics of any application domain. The definition of the mechanism should be domain neutral, yet the mechanism should be suitable for describing information about any domain”

23 RDF – Some examples
Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila.
–Abstract, conceptual framework
–Concrete syntax using XML

24 Abstract example
Subject (Resource)
–http://www.w3.org/Home/Lassila
Predicate (Property)
–Creator
Object (Literal)
–"Ora Lassila"
[Graphic]
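
The abstract model is just a set of (subject, predicate, object) triples, so it can be held in an ordinary data structure and queried by pattern matching. A minimal Python sketch; the `Triple` type and the `objects` helper are hypothetical names, and the predicate is kept as the bare word `Creator` from the slide rather than a full URI.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource being described (a URI)
    predicate: str  # the property
    obj: str        # the value: a literal or another resource


statements = [
    Triple("http://www.w3.org/Home/Lassila", "Creator", "Ora Lassila"),
]


def objects(store: list[Triple], subject: str, predicate: str) -> list[str]:
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in store if t.subject == subject and t.predicate == predicate]


print(objects(statements, "http://www.w3.org/Home/Lassila", "Creator"))  # ['Ora Lassila']
```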

25 Concrete syntax
Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila.
[XML example with the literal "Ora Lassila"]
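
The same statement can be serialized to RDF/XML with Python's standard `xml.etree.ElementTree`. The `rdf` namespace URI below is the standard RDF one; the `s` namespace for the `Creator` property is an assumed example schema, and `to_rdf_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # standard RDF namespace
S = "http://description.org/schema/"                 # assumed example schema namespace

ET.register_namespace("rdf", RDF)
ET.register_namespace("s", S)


def to_rdf_xml(subject: str, predicate: str, literal: str) -> str:
    """Serialize one (resource, property, literal) statement as RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # rdf:Description names the subject resource via rdf:about.
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": subject})
    # The property becomes a child element whose text is the literal object.
    prop = ET.SubElement(desc, f"{{{S}}}{predicate}")
    prop.text = literal
    return ET.tostring(root, encoding="unicode")


print(to_rdf_xml("http://www.w3.org/Home/Lassila", "Creator", "Ora Lassila"))
```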

26 Web Ontology Language
What is an ontology?
–It “defines the terms used to describe and represent an area of knowledge”
OWL defines ontologies for use on the web
Actually an extension of RDF

27 Ontologies
–Date and Time
–Countries of the World
–Wines
–Space Shuttle Information

28 Some example OWL statements
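
As an illustration of the kind of statements OWL adds on top of RDF, the sketch below builds a tiny ontology fragment in RDF/XML: three classes, a subclass relation, and an object property with a domain and range. The class and property names are loosely modeled on the wine ontology used in W3C's OWL Guide but are simplified and hypothetical here, as are the helper names; the `rdf`, `rdfs`, and `owl` namespace URIs are the standard ones. `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def owl_class(parent, name, superclass=None):
    """Add an owl:Class, optionally declared as rdfs:subClassOf another class."""
    cls = ET.SubElement(parent, f"{{{OWL}}}Class", {f"{{{RDF}}}ID": name})
    if superclass:
        ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": "#" + superclass})
    return cls


def object_property(parent, name, domain, range_):
    """Add an owl:ObjectProperty relating instances of `domain` to instances of `range_`."""
    prop = ET.SubElement(parent, f"{{{OWL}}}ObjectProperty", {f"{{{RDF}}}ID": name})
    ET.SubElement(prop, f"{{{RDFS}}}domain", {f"{{{RDF}}}resource": "#" + domain})
    ET.SubElement(prop, f"{{{RDFS}}}range", {f"{{{RDF}}}resource": "#" + range_})
    return prop


root = ET.Element(f"{{{RDF}}}RDF")
owl_class(root, "PotableLiquid")
owl_class(root, "Wine", superclass="PotableLiquid")  # every Wine is a PotableLiquid
owl_class(root, "WineGrape")
object_property(root, "madeFromGrape", domain="Wine", range_="WineGrape")

ET.indent(root)  # pretty-print; Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```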

29 Conclusion
Web intelligence is a broad new field for exploration
Present efforts like Google can be improved upon with more semantic information
Any questions?

