World Wide Web
• Hypertext documents
  – Text
  – Links
• Web
  – billions of documents
  – authored by millions of diverse people
  – edited by no one in particular
  – distributed over millions of computers, connected by a variety of media

Mining the Web (Chakrabarti and Ramakrishnan)

History of Hypertext
• Citation, hyperlinking
• Ramayana, Mahabharata, Talmud
  – branching, non-linear discourse, nested commentary
• Dictionary, encyclopedia
  – self-contained networks of textual nodes joined by referential links

Hypertext systems
• Memex [Vannevar Bush]
  – stands for "memory extension"
  – photoelectrical-mechanical storage and computing device
  – Aim: to create and help follow hyperlinks across documents
• Hypertext
  – Coined by Ted Nelson
  – Xanadu hypertext: system with
    · robust two-way hyperlinks, version management, controversy management, annotation and copyright management

World-wide Web
• Initiated at CERN (the European Organization for Nuclear Research)
  – By Tim Berners-Lee
• GUIs
  – Berners-Lee (1990)
  – Erwise and Viola (1992), Midas (1993)
• Mosaic (1993)
  – a hypertext GUI for the X Window System
  – HTML: markup language for rendering hypertext
  – HTTP: Hypertext Transfer Protocol, for sending HTML and other data over the Internet
  – CERN HTTPD: server of hypertext documents
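
To make the "text plus links" view of an HTML document concrete, here is a minimal sketch that separates the visible text of a page from its hyperlinks using Python's standard `html.parser` module. The sample page, the `example.org` URL, and the class name `LinkAndTextExtractor` are illustrative assumptions, not part of the slides.

```python
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Collects href targets of <a> tags and the non-empty text chunks of a page."""

    def __init__(self):
        super().__init__()
        self.links = []       # hyperlink targets, in document order
        self.text_parts = []  # visible text fragments, whitespace-trimmed

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


# Hypothetical page used only for illustration.
page = (
    '<html><body><p>See the '
    '<a href="http://example.org/next.html">next page</a>.</p></body></html>'
)

extractor = LinkAndTextExtractor()
extractor.feed(page)
print(extractor.links)
print(extractor.text_parts)
```

A crawler would fetch the page over HTTP first; that step is left out here so the sketch runs without network access.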

The early days of the Web: CERN HTTP traffic grows by 1000 between (image courtesy W3C)

The early days of the Web: The number of servers grows from a few hundred to a million between 1991 and 1997 (image courtesy Nielsen)

1994: the landmark year
• Foundation of the "Mosaic Communications Corporation"
• First World-wide Web conference
• MIT and CERN agreed to set up the World-wide Web Consortium (W3C)

Web: A populist, participatory medium
• number of writers ≈ number of readers
• the evolution of MEMES
  – ideas, theories etc. that spread from person to person by imitation
  – Now they have constructed the Internet!!
  – E.g.: "Free speech online", chain letters, and viruses

Abundance and authority crisis
• liberal and informal culture of content generation and dissemination
• very little uniform civil code
• redundancy and non-standard form and content
• millions of qualifying pages for most broad queries
  – Example: java or kayaking
• no authoritative information about the reliability of a site

Problems due to uniform accessibility
• little support for adapting to the background of specific users
• commercial interests routinely influence the operation of Web search
  – "Search Engine Optimization"!!

Hypertext data
• Semi-structured or unstructured
  – No schema
• Large number of attributes

Crawling and indexing
• Purpose of crawling and indexing
  – quick fetching of a large number of Web pages into a local repository
  – indexing based on keywords
  – ordering responses to maximize the chance that the first few responses satisfy the user's information need
• Earliest search engine: Lycos (Jan 1994)
• Followed by: Alta Vista (1995), HotBot and Inktomi, Excite
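
The keyword indexing and response ordering described on this slide can be sketched with a small inverted index. This is a toy illustration under stated assumptions, not how any of the named search engines worked: the pages and `example.org` URLs are made up, the helper names (`tokenize`, `build_index`, `search`) are hypothetical, and ranking by raw keyword count is a deliberately naive stand-in for real relevance scoring.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase the text and split on whitespace, trimming surrounding punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def build_index(pages):
    """Build an inverted index: keyword -> {url: number of occurrences}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index, query):
    """Return URLs containing every query word, most keyword occurrences first."""
    words = tokenize(query)
    if not words:
        return []
    postings = [index.get(w, {}) for w in words]
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates &= set(posting)
    scored = [(sum(p[url] for p in postings), url) for url in candidates]
    # Sort by descending score, breaking ties by URL for a stable order.
    return [url for _, url in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Hypothetical local repository of fetched pages.
pages = {
    "http://example.org/a": "Kayaking trips and kayaking gear",
    "http://example.org/b": "Java programming and kayaking",
    "http://example.org/c": "Java island travel",
}
index = build_index(pages)
print(search(index, "kayaking"))
print(search(index, "java kayaking"))
```

Note that the query "java" matches both the programming page and the travel page, which is the ambiguity behind the "millions of qualifying pages" problem raised earlier.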

Topic directories
• Yahoo! directory to locate useful Web sites
• Efforts for organizing knowledge into ontologies
  – Centralized: Yahoo!
  – Decentralized: About.COM and the Open Directory

Clustering and classification
• Clustering
  – discover groups in the set of documents such that documents within a group are more similar than documents across groups
  – Subjective disagreements due to
    · different similarity measures
    · large feature sets
• Classification
  – For assisting human efforts in maintaining taxonomies
  – E.g.: IBM's Lotus Notes text processing system & Universal Database text extenders
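
As a rough illustration of the clustering idea, the sketch below represents each document as a term-frequency vector, compares documents by cosine similarity, and groups them greedily. Cosine similarity, the greedy scheme, the threshold of 0.3, and the four sample documents are all assumptions chosen for brevity; the slides do not prescribe a particular measure or algorithm.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a document as a bag of lowercase words with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_cluster(docs, threshold=0.3):
    """Put each document in the first cluster whose seed document is similar
    enough; otherwise start a new cluster with this document as its seed."""
    clusters = []
    for name, text in docs.items():
        vec = term_vector(text)
        for cluster in clusters:
            if cosine(vec, cluster["seed"]) >= threshold:
                cluster["members"].append(name)
                break
        else:
            clusters.append({"seed": vec, "members": [name]})
    return [c["members"] for c in clusters]


# Hypothetical documents: two about Java software, two about kayaking.
docs = {
    "d1": "java compiler java virtual machine",
    "d2": "java virtual machine bytecode",
    "d3": "kayaking river paddle",
    "d4": "river kayaking trip",
}
print(greedy_cluster(docs))
```

Changing the similarity measure or the threshold changes the groups, which is one concrete form of the "subjective disagreements" noted on the slide.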

Hyperlink analysis
• Take advantage of the structure of the Web graph
  – Indicators of prestige of a page (e.g., citations)
  – HITS & PageRank
• Bibliometry
  – bibliographic citation graph of academic papers
• Topic distillation
  – Adapting to idioms of Web authorship and linking styles
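
The slide names PageRank without giving its formula. As a sketch of how link structure can yield a prestige score, here is the commonly described simplified PageRank computed by power iteration. The damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages with no outgoing links, and the four-page graph are assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict page -> score; scores sum to 1.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives a baseline share from the random-jump term.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical Web graph: page "c" is cited by "a", "b", and "d".
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph the most-cited page "c" ends up with the highest score and the uncited page "d" with the lowest, mirroring the citation-as-prestige intuition from bibliometry.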

Resource discovery and vertical portals
• Federations of crawling and search services
  – each specializing in specific topical areas
• Goal-driven Web resource discovery
  – language analysis does not scale to billions of documents
  – counter by throwing more hardware

Structured vs. Web data mining
• Traditional data mining
  – data is structured and relational
  – well-defined tables, columns, rows, keys, and constraints
• Web data
  – readily available data rich in features and patterns
  – spontaneous formation and evolution of
    · topic-induced graph clusters
    · hyperlink-induced communities
• Goal of book: discovering patterns which are spontaneously driven by semantics,