WIRED Week 6
Syllabus Review
Readings Overview
Search Engine Optimization
Assignment Overview & Scheduling
Projects and/or Papers Discussion


Presentation transcript:

Web Search Engines
Independent of IR model
Distributed index and servers
  - Crawler
  - Query server
  - Indexer
Crawlers and Spiders
  - Centralized control, Coordinated, Refresh, Filtering
  - Not the main problem
Queries
  - Interface, processing, results
Indexing
  - Data normalization, load balancing, data sharing

Harvesting
Not just Web data
  - Caching, Duplication, Normalization
Armies of crawlers
Filtering collected data
Gatherers
  - Collects and extracts on various schedules
  - Works with several brokers
Brokers
  - Indexes and interfaces to queries
  - Works with other Brokers and Gatherers
Topical Agents?

Web Crawling Issues
Follow chains of URLs to gather more URLs
Extract index (content) from each page
Lather-Rinse-Repeat
Update crawler to-do list
Associate frequency of crawls
Breadth or Depth first?
Endless looping
Duplicate pages/sites
Changed page (or not really?)
Dynamically generated pages
Intranet pages
Markup language getting in the way
NOROBOTS
What should a crawler get?
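The crawl loop on this slide (follow URLs, update the to-do list, avoid endless looping) can be sketched roughly as below. This is a minimal illustration, not a production crawler: `TOY_WEB`, `crawl`, and the `fetch_links` callback are hypothetical names, and the "web" is an in-memory dictionary so that nothing is fetched over the network.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> list of outgoing links (an assumption).
TOY_WEB = {
    "http://a.example/": ["http://a.example/about", "http://b.example/"],
    "http://a.example/about": ["http://a.example/"],  # links back: a loop
    "http://b.example/": ["http://a.example/", "http://c.example/"],
    "http://c.example/": [],
}

def crawl(seed, fetch_links, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were visited."""
    frontier = deque([seed])  # the crawler's to-do list
    seen = {seed}             # guards against endless looping and duplicates
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()  # frontier.pop() would give depth-first order
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order

visited = crawl("http://a.example/", lambda url: TOY_WEB.get(url, []))
print(visited)
```

Swapping `popleft()` for `pop()` is the whole difference between breadth-first and depth-first here; the `seen` set is what keeps the loop between the two `a.example` pages from running forever.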

Indexing the Web
Inverted File Index
  - Sorted words with pointers to location(s) & page(s)
  - Pointers are the focus (inversion)
What about pages and sites?
  - Massive redundancy on well-organized sites
      Navigation
      Topics
      Content
“State of the art indexing techniques” = 30% of text (not page) size (p. 383)
How can you tune an index for massively changing documents?
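As a rough illustration of "sorted words with pointers to location(s) & page(s)", the sketch below builds a tiny inverted index in memory. The function name, the tokenizer (lowercased runs of letters and digits), and the sample pages are assumptions made for the example only.

```python
import re
from collections import defaultdict

def build_inverted_index(pages):
    """Map each word to {page_id: [word positions within that page]}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            index[word].setdefault(page_id, []).append(position)
    return dict(sorted(index.items()))  # keep the vocabulary sorted

# Hypothetical sample pages.
pages = {
    "p1": "Web search engines crawl the Web",
    "p2": "Search the index",
}
index = build_inverted_index(pages)
print(index["web"])     # pages and positions where "web" occurs
print(index["search"])
```

The index stores pointers rather than text, which is why a query can be answered from the index alone without rereading the pages.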

Ranking
Boolean and Vector models mostly used
  - Why?
  - Works from the index, not the text
Which ranking methods are best?
  - Datasets
  - Syntaxes
  - Users & Testing

Ranking Methods
TF-IDF
  - Simple, smaller data sets
Boolean Spread
  - Degrees of match
  - Within a document
  - Set of documents
  - Links between documents (meta docs?)
Vector Spread
  - Standard cosine between query and index (to document)
  - Links with answer or pointing to answer
Most Cited
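A small sketch of TF-IDF weighting combined with the standard cosine between a query and each document. The weighting variant (raw term frequency times log(N/df)), the whitespace tokenizer, and the sample documents are assumptions; the slide does not specify them, and it does not cover the link-based "spread" step.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return ({doc_id: {word: weight}}, {word: idf}) for a small corpus."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(w for tokens in tokenized.values() for w in set(tokens))
    idf = {w: math.log(n_docs / df) for w, df in doc_freq.items()}
    vectors = {
        doc_id: {w: tf * idf[w] for w, tf in Counter(tokens).items()}
        for doc_id, tokens in tokenized.items()
    }
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return [(doc_id, score)] sorted by descending cosine score."""
    vectors, idf = tfidf_vectors(docs)
    query_tf = Counter(query.lower().split())
    query_vec = {w: tf * idf.get(w, 0.0) for w, tf in query_tf.items()}
    scores = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical documents.
docs = {
    "d1": "web crawler follows links",
    "d2": "inverted index of words",
    "d3": "links between web pages",
}
ranking = rank("inverted index", docs)
print(ranking)
```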

Is Web ranking different?
Links are the difference that makes the difference
  - Internal links on a page
  - Internal links on a site
  - Relationships between sites
  - Link freshness
Kleinberg’s HITS method (1998)
  - Hypertext Induced Topic Search
  - Number of pages that point to (processed) query
  - Authorities (relevant content by links)
  - Hubs (links to varied authorities)
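The hub and authority idea can be sketched as a simple iteration: a page's authority score sums the hub scores of pages linking to it, and its hub score sums the authority scores of the pages it links to. The code below is an illustrative simplification that runs over a whole toy graph rather than a query-specific subgraph; the graph, the function name, and the iteration count are assumptions.

```python
import math

def hits(graph, iterations=50):
    """Return (hub, authority) score dicts for a {page: [linked pages]} graph."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages that link in.
        auth = dict.fromkeys(nodes, 0.0)
        for source, targets in graph.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub: sum of authority scores of the pages linked to.
        hub = {n: sum(auth[t] for t in graph.get(n, [])) for n in nodes}
        # Normalize both score sets to unit length so values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth

# Hypothetical link graph: three hub-like pages pointing at pages A and B.
graph = {
    "hub1": ["A", "B"],
    "hub2": ["A", "B"],
    "hub3": ["A"],
    "A": [],
    "B": [],
}
hub, auth = hits(graph)
print(max(auth, key=auth.get))  # the page with the highest authority score
```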

Problems with Hubs & Authorities
Are more links always better?
What about pages without many outgoing links?
How do you count multiple links from within one page to another?
Do automatically generated sites/pages have an advantage?
  - CMS systems may have linking “fingerprints”
  - Metadata
How varied are the link weights?
  - Simple counts
  - Modified by other IR measures

Anatomy of a LS Web Search Engine
Initial Google Design
PageRank
  - PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
  - “A model of user behavior”: the probability of a random surfer visiting a page is its PageRank, with a damping factor (boredom)
  - Pages point to a page
  - Highly ranked pages point to a page
  - Anchor text is mined (the label for the link)
  - Proximity included
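The PageRank formula on this slide can be evaluated by repeated substitution. In the sketch below, T1..Tn are the pages linking to A and C(T) is the number of links leaving T, as in the formula; the three-page graph, the function name, the iteration count, and d = 0.85 (the value Brin and Page say is usually used) are choices made for the example.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T linking to A."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = dict.fromkeys(pages, 1.0)  # arbitrary starting values
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            incoming = sum(
                pr[source] / len(targets)  # PR(T) / C(T)
                for source, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Hypothetical graph: A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pr = pagerank(links)
for page in sorted(pr, key=pr.get, reverse=True):
    print(page, round(pr[page], 3))
```

C ends up ranked highest because it is pointed to by both A and B, and A comes second because it receives all of C's rank; this is the "highly ranked pages point to a page" effect from the slide.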

Anatomy 2
Repository of page content
Document index
  - Forward (sorted)
  - Inverted (sorter)
Lexicon of words & pointers
Hit Lists of word occurrence(s)
Crawlers
Ranking
Feedback of selection (~)

Popularity?
Do you always want the most popular information source?
  - Talk Radio
  - New York Times Bestseller List
  - “Lincoln’s Doctor’s Dog”
  - “The C.S.I. Diet and Cookbook”
Trend or Fad?
Blogs, Editorials and Propaganda vs. “Facts”?
Result Diversity
Death of the Mid-List

Metasearch Issues
One place for everything?
First or Last place to look?
Better or different interface?
Combined, sorted results would be best
  - How to sort?
  - Sorting for different types of queries
Syntax Errors
State Information (monitoring)
Copyright issues (robots)
User, content and interface mismatches/challenges
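One possible answer to "How to sort?" is to merge the ranked lists by position alone, since scores from different engines are not comparable. The sketch below uses reciprocal rank fusion, a technique the slides do not mention; it is shown only as an example of the problem. The smoothing constant `k=60`, the function name, and the result lists are assumptions.

```python
from collections import defaultdict

def fuse_rankings(result_lists, k=60):
    """Merge ranked result lists with reciprocal rank fusion.

    Each result earns 1 / (k + rank) from every list it appears in;
    results are returned by descending total, ties broken alphabetically.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical result lists from two search engines.
engine_a = ["p1", "p2", "p3"]
engine_b = ["p2", "p4", "p1"]
merged = fuse_rankings([engine_a, engine_b])
print(merged)
```

Pages that several engines agree on rise to the top, while a page that only one engine returns still appears lower in the merged list.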

Web Searching Metaphors
How do people visualize the Web?
Is Browsing better?
Do we need new metaphors for using the Web?
  - Searching
  - Browsing
  - What else?

Search Engine Optimization
Found by spiders and submissions
  - More links to and from site
  - Registration on major directories
  - Links to and from major directories
Real contact information helps prove validity
  - META tag
  - Header and footer of home page
  - About Us or Contact Us pages
  - Location/Map page

Good Design is SEO
Basic interface
Well-structured links
  - Comprehensive Site Navigation
  - Updated and accurate links
Easy to find (via the Web or on the site itself)
Clear labels
  - TITLEs
  - Headings
  - Term consistency
  - Link consistency
Small sizes to download quickly

Web Search Tests
Perform searches with targeted keywords
Compare and contrast top results with your potential site
  - Similar terms
  - Links (external and internal)
  - Popularity (sites that link to the site)
Use Data to
  - Build a keyword list
  - Build an introductory text
      Blurbs
      Description (2 sentences max)
Any page found via a Web search engine should offer a search for the site itself
Regularly monitor Search with your terms

Internal Search
Robots.txt
Log and analyze search results
  - Measure success and failure
  - Tune for click-through productivity
  - Keep list of terms
  - Match terms to pages
      Add terms
      Script terms to certain pages
  - Provide list (links) of most recent search terms
  - Provide list (links) of most popular search terms
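A minimal sketch of logging and analyzing internal search terms: count the most popular terms and flag the ones that returned nothing (a simple stand-in for "success and failure"). The log format of `(term, result_count)` pairs and all names below are hypothetical; a real site search would have its own log layout.

```python
from collections import Counter

def summarize_search_log(entries, top_n=2):
    """Return (most popular terms, sorted list of terms that had zero results)."""
    popular = Counter()
    failed = set()
    for term, result_count in entries:
        term = term.strip().lower()  # normalize so variants count together
        popular[term] += 1
        if result_count == 0:
            failed.add(term)
    return popular.most_common(top_n), sorted(failed)

# Hypothetical log entries: (search term, number of results returned).
log = [
    ("Syllabus", 4),
    ("pagerank", 0),
    ("syllabus ", 4),
    ("robots.txt", 2),
    ("PageRank", 0),
    ("syllabus", 3),
]
top_terms, failed_terms = summarize_search_log(log)
print(top_terms)
print(failed_terms)
```

The popular list can feed the "most popular search terms" links, and the failed list shows which terms still need to be matched to pages.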

Page Design
Use CSS
  -
  - Keep content in pages, not CSS templates
Put JavaScript, etc. in external files
  -
  - tag too for alternate content
Continually verify external links
ALT tags & Accessibility Compliance
Index link on Splash page (if needed)
Exact consistency on internal links (ending “/”s)
Redirects

Search and MIME types
Flash now supports internal text
PDF files
  - Add comments and authorship info
  - Modify existing PDFs
      Check Document Properties > Fonts: listed fonts show that the PDF can be indexed (it is not a group of graphics files)
  - Provide text abstract or summary of PDF
PPT, use text if possible
Java interfaces prove difficult
Dynamic pages should have key(word) static elements
FORMs not always completely indexed

Track your Tracking
Keep list of sites submitted to
  - When, Who, address, exact URL submitted
  - Suggested categories, Current site description
  - Terms and Conditions
Keep list of “goal” keywords
Keep list of sites where you check keywords
  - Keywords
  - Dates
  - Successes/Failures

Assignment Overview & Scheduling
Leading WIRED Topic Discussions
  - # in class = # of weeks left?
Web Information Retrieval System Evaluation & Presentation
  - 5 page written evaluation of a Web IR System
  - technology overview (how it works)
  - a brief history of the development of this type of system (why it works better)
  - intended uses for the system (who, when, why)
  - (your) examples or case studies of the system in use and its overall effectiveness

How can (Web) IR be better?
  - Better IR models
  - Better User Interfaces
More to find vs. easier to find
Scriptable applications
New interfaces for applications
New datasets for applications
Projects and/or Papers Overview