WIRED Week 2 Syllabus Update Readings Overview.

1 WIRED Week 2 Syllabus Update Readings Overview

2 Why IR?
IR originally mostly for systems, not people
IR in the last 25 years: classification and categorization, systems and languages, user interfaces and visualization
A small world of concern
The Web changed everything
Huge amount of accessible information
Varied information sources
Relatively easy to look for information
Improving IR means improving learning
Digital technology changes everything (again)
We cut out the middle man and pass the savings on to you!

3 WIRED Focus
Information Retrieval: representation, storage, organization of, and access to information items
Focus is on the user information need
User information need: Find all docs containing information on Austin which:
  Are hosted by utexas.edu
  Discuss restaurants
Emphasis is on the retrieval of information (not data, not just a keyword match)
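The Austin example on this slide can be approximated with a naive filter. The sketch below is only an illustration under stated assumptions: the `Doc` type, the `matches_need` helper, and the sample documents are hypothetical, and plain substring matching stands in for a real relevance judgment. The last sample document is relevant to the need but is missed by the filter, which is one way to see why the slide says retrieval is not just a keyword match.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Doc:
    """Hypothetical document record: a URL plus its extracted text."""
    url: str
    text: str


def matches_need(doc: Doc) -> bool:
    """Naive keyword version of the information need on the slide."""
    host = urlparse(doc.url).hostname or ""
    # "Hosted by utexas.edu": the domain itself or any subdomain of it.
    on_utexas = host == "utexas.edu" or host.endswith(".utexas.edu")
    words = doc.text.lower()
    # "Information on Austin" and "discuss restaurants", reduced to substrings.
    return on_utexas and "austin" in words and "restaurant" in words


# Made-up sample collection (assumption, not data from the course).
DOCS = [
    Doc("https://www.utexas.edu/dining", "Restaurants near campus in Austin"),
    Doc("https://example.com/austin", "Best Austin restaurants"),
    Doc("https://cs.utexas.edu/faculty", "Faculty directory, Austin, Texas"),
    Doc("https://www.utexas.edu/eat", "Where to eat tacos in Austin"),
]

hits = [d.url for d in DOCS if matches_need(d)]
print(hits)
```

The fourth document never uses the word "restaurant", so the keyword filter drops it even though a person with this information need would probably want it.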

4 The Search
Who is John Battelle?
Magazine editor: WIRED, The Industry Standard
Web 2.0 conference organizer
Business 2.0 magazine columnist
Federated Media Publishing
Boingboing.net “manager”

5 Database of Intentions
What do you think the database of intentions is?
Is it more than Google’s Zeitgeist?
What we’re thinking about and interested in.
Everything we want to know and when we want to know it.
“the aggregate results of every search ever entered, every result list ever tendered, and every path taken as a result” (Battelle, p 6)
“a real time history of post-Web culture” (p 6)
What other databases like this are there?
How is this possible?

6 Searchiness? The “tasking” of search?
Everything could be a search task?
Every task has an ad associated with it?
Our expectations are met and made with search.
How would the Web work without search?
Yahoo and links, LOTS of links
You are your clickstream?
Products & services based on it
“marketing, media, technology, pop culture, international law, and civil liberties” (p 13)

7 Elements of Search
Crawl
Index
Runtime system (query processor)
Segments the data
Analyzes the crawl
Optimizes everything
Interface
Query
Results
Users
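As a rough sketch of how the crawl, the index, and the query processor fit together, the toy example below builds an inverted index over a few hand-written pages and answers AND queries against it. The `CRAWLED` data and the `build_index` and `search` helpers are assumptions made for illustration; they do not describe any real engine.

```python
from collections import defaultdict

# Hypothetical stand-in for crawler output: page name -> page text.
CRAWLED = {
    "a.html": "web search engines crawl the web",
    "b.html": "an index maps words to pages",
    "c.html": "search the index at query time",
}


def build_index(pages):
    """Index step: map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Runtime step: return pages containing every query term (AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    # Look up each term's posting set and intersect them.
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


INDEX = build_index(CRAWLED)
print(search(INDEX, "search index"))
```

The point of the split is that the expensive work (reading every page) happens once at index time, so a query only touches the posting sets for its own terms.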

8 Search before Google
Traditional systems: SMART (Salton)
  Strongly typed information (traditional databases)
  Not always interactive or easy to use
Library catalogs online
  Controlled vocabulary & limited records
Internet: Archie & Veronica
  Titles only (mostly) over text
Web: WWW Wanderer, WebCrawler
  Full text, HTML & links

9 AltaVista gets serious
Web now large enough to be a challenge
Now enough content that you’d want to search it
Costs of hardware & bandwidth falling
Parallel crawlers
Significant CPU resources
1995 = 16 million documents
Why didn’t people get it?

10 The Web goes Pro
Lycos
  Anchor text & content location context
Yahoo
  Directory & clean interface for browsing links
  Advertising & user (logs) analysis
AOL
  Gateway to the internet for many
Excite
  Consumer-driven, word relationships
  Acquisitions of Magellan, WebCrawler ++
  MyExcite - the Portal
  @Home (compete with AOL)

11 Google is Born
Larry Page & Sergey Brin
Links are the key (Bibliometrics)
  Impact factor (“link it if you like it”)
  Patterns of citation (links) expand the text
  Defending & setting the context of your work by associating it with others
BackRub
  Crawl pages, store links, analyze them, publish
  Large computing challenges
PageRank
  Link counts with a recipe for deriving (relative) value
  Value depends on who links to you & their rank too
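The PageRank idea on this slide, where a page's value depends on who links to it and on the rank of those linking pages, can be sketched as a simple power iteration. This is a simplified teaching version, not Google's production method: the damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the four-page `LINKS` graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # rank evenly over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


# Hypothetical link graph: C is linked from A, B, and D; nothing links to D.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

RANKS = pagerank(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

In this graph C ends up with the highest rank because three pages link to it, and A ranks well above D even though each has at most one incoming link, because A's single link comes from the highly ranked C.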

12 Google goes Pro
More resources for more data
Help with (significant) analysis design
Lack of commercial approach may have been a strength
Not ads, but just good search
Simple (non-existent) design of interface had an impact
More people getting online
Broadband adoption & stabilizing browsers
Growing content (to say the least)

13 Assignments
Read weekly Primary Readings & participate in class discussions: 10%
Re-design Search Results interface: 10%
Web (log) analytics: 25%
“Google 2010” (5 page paper): 10%
Class Topic Presentation: 15%
Main Project: 30%

14 Projects and/or Papers Overview
How can (Web) IR be better?
Better IR models
Better User Interfaces
More to find vs. easier to find
Scriptable applications
New interfaces for applications
New datasets for applications

