Characterizing the Web CSCI 572: Information Retrieval and Search Engines Summer 2011.

May-18-11 CS572-Summer2011 CAM-2
Outline
The web
– Scale
– Complexity
– Growth
Differences between then and now
Where the web is headed

The Web
Massive-scale directed graph
Driven by the underlying REST architecture
– The key abstraction of information is a resource, named by a URL.
– The representation of a resource is a sequence of bytes, plus representation metadata to describe those bytes.
– All interactions are context-free: each interaction contains all of the information necessary to understand the request.
– Components perform only a small set of well-defined methods on a resource, producing a representation that captures the current or intended state of that resource, and transfer that representation between components.
– Representation metadata are encouraged in support of caching and representation reuse.
– The presence of intermediaries is promoted.
Copyright © Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy. All rights reserved.
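To make the "directed graph" view concrete, here is a minimal sketch that treats pages as nodes and hyperlinks as directed edges. The URLs, the `links` list, and the `build_graph` helper are all invented for illustration; none of them come from the slides.

```python
from collections import defaultdict

# Hypothetical pages and hyperlinks (source -> destination), for illustration only.
links = [
    ("a.example/index", "a.example/about"),
    ("a.example/index", "b.example/home"),
    ("b.example/home", "a.example/index"),
    ("c.example/post", "a.example/index"),
]

def build_graph(edges):
    """Return (out_links, in_links) adjacency maps for a list of directed edges."""
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for src, dst in edges:
        out_links[src].add(dst)
        in_links[dst].add(src)
    return out_links, in_links

out_links, in_links = build_graph(links)

# In-degree: how many distinct pages link to each URL.
in_degree = {url: len(sources) for url, sources in in_links.items()}
```

Because links are directed, a page such as `c.example/post` can link out without anything linking back to it, which is one reason a crawler that only follows links may never discover it.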

Scale
[Figure legend] GYBA = sorted on Google, Yahoo!, Bing and Ask; YGBA = sorted on Yahoo!, Google, Bing and Ask

How is the scale measured?
Number of web pages indexed by search engines?
– Is this an accurate representation?
Published data from major ISPs?
– Is this accurate information?
What's missing?
– The "deep" web, or dynamic pages
– Pages behind security firewalls

Why is scale important?
It strongly influences the ultimate use cases of the web
– Discovery and retrieval of information via:
   Search engines
   Web services and grid computing
   Targeted communities like social networking and the growing field of analytics
It strongly influences the way we build software for web-scale systems
– New programming paradigms, e.g., MapReduce
– New technologies to handle huge-scale computing, or "Big Data"
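As a rough illustration of the MapReduce paradigm mentioned above, the following single-process sketch counts words with explicit map, shuffle, and reduce steps. It only imitates the programming model; a real framework would distribute these phases across many machines. All function names here are illustrative assumptions, not any framework's API.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in doc.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)

def word_count(docs):
    mapped = chain.from_iterable(map_phase(d) for d in docs)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = word_count(["the web is big", "the web grows"])
```

The appeal at web scale is that each `map_phase` call depends on one document only and each `reduce_phase` call on one key only, so both steps can in principle run in parallel.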

Complexity

Proliferation of content types available
By some accounts, 16K to 51K content types*
What to do with content types?
– Parse them
   How? Extract their text and structure
– Index their metadata
   In an indexing technology like Lucene, Solr, or Compass, or in Google Appliance
– Identify what language they belong to
   N-grams
*
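The slide names n-grams as the tool for language identification without giving details. One possible minimal sketch, under the assumption that each language is represented by a character-trigram profile built from a sample text, is shown below. The two sample sentences, the overlap score, and the function names are made up for illustration; a practical identifier would train on far larger corpora.

```python
from collections import Counter

def ngram_profile(text, n=3):
    """Count overlapping character n-grams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

# Tiny illustrative training samples (assumptions, not real corpora).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the web is a large graph of pages",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y la web es un gran grafo de páginas",
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}

def guess_language(text):
    """Pick the language whose profile shares the most n-grams with the text."""
    profile = ngram_profile(text)
    scores = {
        lang: sum((profile & reference).values())  # multiset intersection size
        for lang, reference in PROFILES.items()
    }
    return max(scores, key=scores.get)
```

For example, `guess_language("the pages of the web")` selects `"en"` and `guess_language("el perro y la web")` selects `"es"` with these samples, because each input shares many more trigrams with its own language's profile.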

Growth
Steady growth on a logarithmic scale since the mid-90s
Well into the hundreds of millions of websites and tens of billions of web pages (even without the deep web)

What does growth mean to us (you)?
Need for efficient algorithms for all sorts of things
– Mining the web for information on you to target ads
– Mining the web for information on you to decide whether to hire you or not
– Disseminating news effectively (to you)
– Disseminating media effectively (to you)
– Providing rich browser experiences to lure you to web sites so that you can be sold products
NOTE: I underlined "you" everywhere above for those who missed it; we'll get back to this

The Web: Then and Now
Before
– The purpose of the web was for geeks to exchange email, post on bulletin boards regarding their favorite D&D games, and send files to one another
– Scope was limited to geeks; broad infection was many years away
– Search* since 1996: Hotbot, Excite, WebCrawler, AskJeeves, Yahoo!, Google, DogPile, Altavista, Lycos, MSN Search, AOL Search, Infoseek, Netscape, Metacrawler, AllTheWeb
*

The Web: Then and Now
Now
– The purpose is limitless
   Computation with services, semantic description of content, proliferation of content, rich browsers, clients, interaction, media
   The social web is the next big thing
– Scope is (I kid you not) a 2-year-old on up
– Search* now: Google, with competitors like Yahoo and Bing bringing up the rear and trying to build out open source computational infrastructures to compete
*

The movement towards the social web
Social networking companies have figured out that mining info about you guys can help build the "semantic" information that was once dreamed about by the likes of Tim Berners-Lee in his 2001 Scientific American article
Why did the semantic web fail to gain acceptance while the social web has succeeded?
– The realization that machines are poor annotators of information and that they are even worse trust establishers
– And that you guys are the experts at this!

Social Web and "Big Data"
Many challenges induced by the complexity, scale, and growth of the traditional web only increase when the social web is taken into account
The development of algorithms to crawl the social graph has led to several Ph.D.s and is a huge money maker for existing businesses
– Analytics is what they call this nowadays
Search is a HUGE challenge and an interesting research problem within the social web
– Instead of using information retrieval to deduce a "rank" for a page, use the trust value assigned via your social graph
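The slide does not say how a trust value would be turned into a ranking. One possible reading, sketched below purely as an assumption, scores each page by the summed trust the searcher places in the friends who shared it. The names `trust`, `shares`, and `social_rank`, and all of the data, are hypothetical.

```python
# Hypothetical data: how much each user trusts each friend (0.0 to 1.0),
# and which pages each friend has shared.
trust = {"alice": {"bob": 0.9, "carol": 0.4}}
shares = {
    "bob": ["pageA", "pageB"],
    "carol": ["pageB", "pageC"],
}

def social_rank(user, trust, shares):
    """Order pages by the total trust the user has in the friends who shared them."""
    scores = {}
    for friend, weight in trust.get(user, {}).items():
        for page in shares.get(friend, []):
            scores[page] = scores.get(page, 0.0) + weight
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda page: (-scores[page], page))

ranking = social_rank("alice", trust, shares)
```

Here `pageB` ranks first for alice because both of her friends shared it, even though nothing about the page's text or link structure was consulted. A user with no entries in `trust` simply gets an empty ranking.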

Wrapup
The web has changed dramatically in the last 10 years
Understand the different dimensions of the web and the variation points
– Scale, complexity and growth are only a selected few
Understand where the web is going and why