HUMANS do it better! dmoz: The Open Directory Project.



What is dmoz?
dmoz stands for Directory MOZilla
Also known as the Open Directory Project (ODP)
Searchable directory, similar to Yahoo!
Administered by Netscape as a non-commercial entity

Who maintains dmoz?
Data maintained by “expert” volunteers
  – Anyone can become an editor
  – 47,083 editors
ODP categorizes “quality” information
  – 378,028 categories

Interface features
Simple
No ads
Browseable directory
Regular and advanced search

Web coverage
dmoz - 3,260,681 documents
Google - 2,073,418,204 documents
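To put the two counts side by side, the short sketch below computes how large the dmoz directory is relative to Google's index. It uses only the two figures quoted on the slide; the function name `coverage_ratio` is made up for illustration.

```python
# Document counts quoted on the "Web coverage" slide.
DMOZ_DOCS = 3_260_681
GOOGLE_DOCS = 2_073_418_204


def coverage_ratio(directory_docs, engine_docs):
    """Return the directory size as a fraction of the engine's index size."""
    return directory_docs / engine_docs


ratio = coverage_ratio(DMOZ_DOCS, GOOGLE_DOCS)
print(f"dmoz lists about {ratio:.2%} as many documents as Google indexes")
print(f"roughly 1 dmoz document per {GOOGLE_DOCS // DMOZ_DOCS} Google documents")
```

The ratio says nothing about quality; it only shows the difference in scale between a hand-edited directory and a crawled index.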

dmoz directory structure
Top
  Arts
  Health
    Conditions & Diseases
      Sleep Disorders
        Narcolepsy
    Fitness
  World
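A directory of this kind is a tree of categories, and each category can be named by its path from the root. The sketch below is one way to model that. It assumes the slide shows Top with the children Arts, Health and World, Health with the children Conditions & Diseases and Fitness, and Narcolepsy nested under Sleep Disorders; the nested-dict layout and the function name are illustrative choices, not the ODP's own data model.

```python
# Nested dict mirroring the category tree assumed from the slide.
TREE = {
    "Top": {
        "Arts": {},
        "Health": {
            "Conditions & Diseases": {"Sleep Disorders": {"Narcolepsy": {}}},
            "Fitness": {},
        },
        "World": {},
    }
}


def category_paths(tree, prefix=()):
    """Yield every category as a tuple of names from the root down."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path
        # Recurse into the subcategories, carrying the path so far.
        yield from category_paths(children, path)


for path in category_paths(TREE):
    print("/".join(path))
```

Browsing the directory corresponds to walking down one of these paths one level at a time.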

RDF Format
<d:Title>Top</d:Title></Topic>
<d:Title>Arts</d:Title> </Topic>
John phillips Blown glass
A small display of glass by John Phillips
</ExternalPage>
<d:Title>Computers</d:Title> </Topic>
FME HUB
Formal Methods Europe (FME) is a European organization supported by the Commission of the European Union (via ESSI of the ESPRIT programme), with the mission of promoting and supporting the industrial use of formal methods for computer systems development.
</ExternalPage>
Computer Timeline
A brief description of the eras in computing.
</ExternalPage>
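The slide text keeps only fragments of the RDF markup, so the sketch below works on a small hand-written sample rather than the real dump. Everything not visible on the slide is an assumption made for illustration: the `RDF` root element, the `about` and `r:id` attributes, the `d:Description` element, the placeholder namespace URIs and the example URL. Only `Topic`, `ExternalPage`, `d:Title` and the titles and descriptions come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the root element, attributes, d:Description and
# namespace URIs are placeholders, not the actual dmoz dump format.
SAMPLE = """<RDF xmlns="http://example.org/odp#"
     xmlns:d="http://example.org/dc#"
     xmlns:r="http://example.org/rdf#">
  <Topic r:id="Top/Arts">
    <d:Title>Arts</d:Title>
  </Topic>
  <ExternalPage about="http://example.org/glass">
    <d:Title>John phillips Blown glass</d:Title>
    <d:Description>A small display of glass by John Phillips</d:Description>
  </ExternalPage>
</RDF>"""

# Prefix map used by ElementTree to resolve the placeholder namespaces.
NS = {"o": "http://example.org/odp#", "d": "http://example.org/dc#"}


def external_pages(xml_text):
    """Return one dict per ExternalPage element with its url, title, description."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.findall("o:ExternalPage", NS):
        pages.append({
            "url": page.get("about"),
            "title": page.findtext("d:Title", default="", namespaces=NS),
            "description": page.findtext("d:Description", default="", namespaces=NS),
        })
    return pages


for page in external_pages(SAMPLE):
    print(page["title"], "|", page["description"])
```

For a file the size of the full directory, a streaming parser such as `xml.etree.ElementTree.iterparse` would probably be a better fit than loading everything at once.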

Using dmoz data
Data is freely available for download
Must provide attribution and back-link
No Warranty

dmoz data
Many sites use dmoz data
  – AOL Search
  – Google
  – Lycos
  – HotBot
  – over 200 others
Some sites add enhancements and extensions
  – Google adds page rank
  – Lycos adds targeted ads

Searching dmoz
Boolean
  – implicitly AND
  – AND, OR, ANDNOT
  – allows shorthand (+, |, -)
Wildcard search (pup*)
Phrasal search
Mixed searches
Field-based queries
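As a rough illustration of how these query features combine, the sketch below matches a query against one piece of text using implicit AND, the `+` and `-` shorthand, a trailing-`*` wildcard and quoted phrases. It is a simplified stand-in written for this summary, not dmoz's actual parser: the `|` (OR) shorthand and field-based queries are left out, and the function names are invented.

```python
import fnmatch
import re
import shlex


def term_matches(term, text):
    """Check one term: quoted phrases match as substrings, words may use '*'."""
    term = term.lower()
    text = text.lower()
    if " " in term:
        return term in text  # phrasal search
    words = re.findall(r"\w+", text)
    return any(fnmatch.fnmatchcase(word, term) for word in words)


def matches(query, text):
    """Implicit AND over all terms; a leading '-' means the term must be absent."""
    for token in shlex.split(query):  # shlex keeps quoted phrases together
        negate = token.startswith("-")
        term = token.lstrip("+-")
        if not term:
            continue
        if term_matches(term, text) == negate:
            return False
    return True


print(matches("pup* training", "Puppy training tips"))
print(matches("pup* -cat", "Puppies and cat toys"))
print(matches('+"chinese calendar"', "The Chinese calendar explained"))
```

A mixed search is simply a query that uses several of these forms at once, for example a phrase plus a wildcard term.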

Search relevance
Queries performed against fields in the RDF database
  – For documents: title, description, URL
  – For categories: title, terms/keywords
Keywords are chosen manually; potentially more relevant
Results clustered by category and ranked according to the number of matches within a given category
  – Some inconsistency in the ranking, and the method doesn't seem to be publicly documented
  – Some documents are flagged with a star and appear at the top of a directory listing (these do not seem to get special promotion in search results)
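The clustering and ranking described above can be sketched as follows: match the query against each document's title and description, group the hits by category, and order the categories by hit count. The documents are invented for the example, and the tie-break by category name and the plain substring matching are assumptions; the slide notes that the real ranking method is not publicly documented.

```python
from collections import Counter

# Invented sample records: (category, title, description).
DOCS = [
    ("Health/Sleep Disorders", "Narcolepsy Network", "Support group for narcolepsy patients"),
    ("Health/Sleep Disorders", "Sleep Clinic", "Treats narcolepsy and insomnia"),
    ("Health/Fitness", "Rest and Recovery", "Why athletes with narcolepsy need naps"),
    ("Arts", "Blown glass", "A small display of glass"),
]


def rank_categories(query, docs):
    """Count matching documents per category and sort by count, highest first."""
    terms = query.lower().split()
    counts = Counter()
    for category, title, description in docs:
        haystack = f"{title} {description}".lower()
        # Implicit AND: every query term must appear somewhere in the fields.
        if all(term in haystack for term in terms):
            counts[category] += 1
    # Ties are broken alphabetically by category name (an assumption).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for category, hits in rank_categories("narcolepsy", DOCS):
    print(category, hits)
```

Because the fields are written by editors, a match in a title or description is more likely to reflect what the page is about than a match somewhere in crawled page text.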

Relevance feedback
Not directly supported
Web forms for reporting feedback

Engine
Uses I-Search
Open source
Modules may be added to enable searching of different document types
dmoz extensions to I-Search
  – RDF parsing module
  – Special search module, to return sub-records

More about I-Search
Supports many different kinds of queries
  – Vector search (or at least some sort of weighted keyword search)
  – Soundex (looks for similar-sounding words; English and similar languages only)
  – Boolean search
  – Geographic search (hits within a given x1,y1,x2,y2 box)
  – Field searches (for structured documents, like RDF)
Thesaurus expansion and stopword lists supported
Queries translated into RPN (reverse Polish notation) and pushed onto a stack
Operations/operands are handled in a generic fashion
Has a number of options for searching (for exact terms):
  – dictionary (hash table)
  – binary search of sorted index
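The RPN point is easier to see with a small example. The sketch below evaluates a Boolean query that has already been written in postfix order, using a stack of result sets. It is the textbook stack evaluation, not I-Search's own code; the index contents and the function name are invented, and the term lookup uses a Python dict, which corresponds to the hash-table option listed above.

```python
# Invented inverted index: term -> set of matching document ids.
INDEX = {
    "chinese": {1, 2, 3},
    "calendar": {2, 3, 4},
    "ram": {3, 5},
}

# Each operator combines two result sets in the same generic way.
OPERATORS = {
    "AND": lambda left, right: left & right,
    "OR": lambda left, right: left | right,
    "ANDNOT": lambda left, right: left - right,
}


def evaluate_rpn(tokens, index):
    """Evaluate a postfix Boolean query and return the matching document ids."""
    stack = []
    for token in tokens:
        if token in OPERATORS:
            if len(stack) < 2:
                raise ValueError("operator is missing an operand")
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            # Operand: look the term up (dict lookup, i.e. a hash table).
            stack.append(index.get(token.lower(), set()))
    if len(stack) != 1:
        raise ValueError("malformed RPN query")
    return stack[0]


# Infix "(chinese AND calendar) ANDNOT ram" written in postfix order.
print(evaluate_rpn(["chinese", "calendar", "AND", "ram", "ANDNOT"], INDEX))
```

Postfix order removes the need for parentheses and precedence rules at evaluation time, which is one reason operators and operands can be treated uniformly.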

dmoz vs. UNCA Library Catalog
UNCA Library Catalog has a fixed vocabulary
Library catalog created by trained professionals; dmoz uses “expert” volunteers
Both use field-based queries
dmoz always searches the same fields

dmoz vs. Google
Google uses dmoz’s data
Google is a search engine (good for finding specific information)
dmoz is a directory (good for finding general information)
Google adds page ranking to dmoz documents

Query 1: When is the next year of the Ram on the Chinese calendar?
+"Chinese calendar" +"year of the ram"
  Documents returned
    – Google: 10
    – dmoz: 0
    – Library: 0
  No dead links
  No overlap
  Relevance
    – Google: 70%
    – dmoz: N/A
    – Library: N/A
+"Chinese calendar"
  Documents returned
    – Google: 15,200
    – dmoz: 10; 7 categories
    – Library: 2
  No dead links
  Overlap
    – 4 pages (Google/dmoz)
  Relevance
    – Google: 30%
    – dmoz: 30%
    – Library: 50%

Query 2: According to Douglas Adams, author of "HitchHiker's Guide to the Galaxy," what is the answer to the question: "What is the meaning of life?"
"douglas adams" hitchhiker guide galaxy "meaning of life"
  Documents returned
    – Google: ~364
    – dmoz: 0
    – Library: 0
  No dead links
  No overlap
  Relevance
    – Google: 60%
    – dmoz: N/A
    – Library: N/A
"meaning of life" answer
  Documents returned
    – Google: 49,700
    – dmoz: 1
    – Library: 0
  No dead links
  No overlap
  Relevance
    – Google: 0%
    – dmoz: 0%
    – Library: 0%

Query 3: Find Morgan horse breeders in North Carolina
morgan horse breeders north carolina
  Documents returned
    – Google: 1,140
    – dmoz: 0
    – Library: 0
  No dead links
  No overlap
  Relevance
    – Google: 40%
    – dmoz: N/A
    – Library: N/A

Questions?