HUMANS do it better! dmoz: The Open Directory Project.


1 HUMANS do it better! dmoz: The Open Directory Project

2 What is dmoz?
dmoz stands for Directory MOZilla
Also known as the Open Directory Project (ODP)
Searchable directory, similar to Yahoo!
Administered by Netscape as a non-commercial entity

3 Who maintains dmoz?
Data maintained by “expert” volunteers
–Anyone can become an editor
–47,083 editors
ODP categorizes “quality” information
–378,028 categories

4 Interface features
Simple
No ads
Browseable directory
Regular and advanced search
http://www.dmoz.org/

5 Web coverage
dmoz - 3,260,681 documents
Google - 2,073,418,204 documents

6 dmoz directory structure
Top
  Arts
  Health
    Conditions & Diseases
      Sleep Disorders
        Narcolepsy
    Fitness
  World

7 RDF Format
<d:Title>Top</d:Title>
</Topic>
<d:Title>Arts</d:Title>
</Topic>
John phillips Blown glass
A small display of glass by John Phillips
</ExternalPage>
<d:Title>Computers</d:Title>
</Topic>
FME HUB
Formal Methods Europe (FME) is a European organization supported by the Commission of the European Union (via ESSI of the ESPRIT programme), with the mission of promoting and supporting the industrial use of formal methods for computer systems development.
</ExternalPage>
Computer Timeline
A brief description of the eras in computing.
</ExternalPage>
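A minimal sketch of how records like those on slide 7 might be read with Python's standard library. The slide lost its opening tags, so the sample document here is an assumption: the namespace URIs, the `r:id` and `about` attributes, and the example.org URL are illustrative and not taken from the slide. The helper names `read_pages` and `read_topics` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the slide; namespaces, attributes
# and the URL are assumptions made for illustration.
SAMPLE = """<RDF xmlns="http://dmoz.org/rdf/"
     xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/">
  <Topic r:id="Top/Arts">
    <d:Title>Arts</d:Title>
  </Topic>
  <ExternalPage about="http://example.org/glass">
    <d:Title>John phillips Blown glass</d:Title>
    <d:Description>A small display of glass by John Phillips</d:Description>
  </ExternalPage>
</RDF>"""

# Prefix map used only for lookups in this sketch.
NS = {"o": "http://dmoz.org/rdf/", "d": "http://purl.org/dc/elements/1.0/"}


def read_pages(xml_text):
    """Return (url, title, description) for each ExternalPage element."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.findall("o:ExternalPage", NS):
        title = page.findtext("d:Title", default="", namespaces=NS)
        desc = page.findtext("d:Description", default="", namespaces=NS)
        pages.append((page.get("about"), title, desc))
    return pages


def read_topics(xml_text):
    """Return the title of each Topic element."""
    root = ET.fromstring(xml_text)
    return [t.findtext("d:Title", default="", namespaces=NS)
            for t in root.findall("o:Topic", NS)]
```

Calling `read_pages(SAMPLE)` yields one tuple per listed page, and `read_topics(SAMPLE)` yields the category titles. A full dump would be too large to load in one piece, so a streaming parser would be the more realistic choice there.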

8 Using dmoz data
Data is freely available for download
http://dmoz.org/rdf.html
http://dmoz.org/license.html
Must provide attribution and back-link
No Warranty

9 dmoz data
Many sites use dmoz data
–AOL Search
–Google
–Lycos
–HotBot
–over 200 others
Some sites add enhancements and extensions
–Google adds page rank
–Lycos adds targeted ads

10 Searching dmoz
Boolean
–implicitly AND
–AND, OR, ANDNOT
–allows shorthand (+, |, -)
Wildcard search (pup*)
Phrasal search
Mixed searches
Field-based queries
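The query features on slide 10 can be illustrated with a small Python sketch. It is not dmoz's parser: the mapping of `+`, `|` and `-` to AND, OR and ANDNOT, and the matching rules for phrases and trailing `*`, are assumptions based only on the slide's wording. The names `parse_query` and `term_matches` are hypothetical.

```python
import re

# Shorthand prefixes from the slide, mapped to long-form operators (assumed).
PREFIX_TO_OP = {"+": "AND", "|": "OR", "-": "ANDNOT"}

# A clause is an optional prefix followed by a quoted phrase or a bare word.
CLAUSE = re.compile(r'([+|-]?)(?:"([^"]+)"|(\S+))')


def parse_query(query):
    """Split a query into (operator, term) pairs; bare terms default to AND."""
    clauses = []
    for prefix, phrase, word in CLAUSE.findall(query):
        op = PREFIX_TO_OP.get(prefix, "AND")
        clauses.append((op, (phrase or word).lower()))
    return clauses


def term_matches(term, text):
    """Match a term against text: prefix wildcard, phrase, or exact word."""
    lowered = text.lower()
    words = lowered.split()
    if term.endswith("*"):
        # Wildcard: any word starting with the part before the asterisk.
        return any(w.startswith(term[:-1]) for w in words)
    if " " in term:
        # Phrase: the words must appear together, in order.
        return term in lowered
    return term in words
```

For example, `parse_query('+"Chinese calendar" -ram pup*')` produces one AND clause for the phrase, one ANDNOT clause for `ram`, and one implicit AND clause for the wildcard term.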

11 Search relevance
Queries performed against fields in the RDF database
–For documents: title, description, URL
–For categories: title, terms/keywords
Keywords are chosen manually; potentially more relevant
Results clustered by category and ranked according to the number of matches within a given category
–Ranking shows some inconsistency, and the method doesn't seem to be publicly documented
–Some documents are flagged with a star and appear at the top of a directory listing (these do not seem to get special promotion in search results)
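The clustering rule described on slide 11 (group hits by category, then order categories by how many hits each holds) can be sketched in Python as follows. Since the slide notes that the real ranking is not publicly documented, this is only an illustration of the stated rule; the function name, the tie-break by category name, and the sample hits are assumptions.

```python
from collections import defaultdict


def cluster_by_category(hits):
    """Group (category, title) hits by category, largest cluster first."""
    clusters = defaultdict(list)
    for category, title in hits:
        clusters[category].append(title)
    # Sort by descending match count, then by category name to break ties.
    return sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Hypothetical hits, invented for illustration only.
EXAMPLE_HITS = [
    ("Society/Holidays", "Chinese New Year"),
    ("Science/Astronomy", "Lunar calendars"),
    ("Society/Holidays", "Year of the Ram"),
]
```

With `EXAMPLE_HITS`, the `Society/Holidays` cluster comes first because it holds two matches, followed by `Science/Astronomy` with one.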

12 Relevance feedback
Not directly supported
Web forms for reporting feedback
http://dmoz.org/cgi-bin/feedback.cgi

13 Engine
Uses I-Search
http://www.etymon.com/Isearch/
Open source
Modules may be added to enable searching of different document types
dmoz extensions to I-Search
–RDF parsing module
–Special search module, to return sub-records

14 More about I-Search
Supports many different kinds of queries
–Vector search (or at least some sort of weighted keyword search)
–Soundex (looks for similar-sounding words; English and similar languages only)
–Boolean search
–Geographic search (hits within a given x1,y1,x2,y2 box)
–Field searches (for structured documents, like RDF)
Thesaurus expansion and stopword lists supported
Queries translated into RPN (reverse Polish notation) and pushed onto a stack
Operations/operands are handled in a generic fashion
Has a number of options for searching (for exact terms):
–dictionary (hash table)
–binary search of sorted index
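Two ideas from slide 14 can be sketched in Python: evaluating a Boolean query held in RPN with a stack, and the two exact-term lookup options (hash table versus binary search of a sorted index). This is not I-Search's code. The function names, the set-based postings, and the sample index are assumptions made for illustration.

```python
import bisect

# Each operator combines two sets of document ids.
OPERATORS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "ANDNOT": lambda a, b: a - b,
}


def evaluate_rpn(tokens, index):
    """Evaluate a postfix Boolean query against a term -> doc-id-set index."""
    stack = []
    for token in tokens:
        if token in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            # Dictionary (hash table) lookup of an exact term.
            stack.append(index.get(token, set()))
    if len(stack) != 1:
        raise ValueError("malformed RPN query")
    return stack[0]


def lookup_sorted(sorted_terms, postings, term):
    """Find a term by binary search; postings[i] belongs to sorted_terms[i]."""
    i = bisect.bisect_left(sorted_terms, term)
    if i < len(sorted_terms) and sorted_terms[i] == term:
        return postings[i]
    return set()


# Hypothetical index, invented for illustration only.
EXAMPLE_INDEX = {"chinese": {1, 2, 3}, "calendar": {2, 3}, "ram": {3}}
```

The infix query `chinese AND calendar ANDNOT ram` becomes the token list `["chinese", "calendar", "AND", "ram", "ANDNOT"]`. Operands are pushed as they arrive, and each operator pops two sets and pushes the combined result, so every operator is handled by the same generic step.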

15 dmoz vs. UNCA Library Catalog
UNCA Library Catalog has a fixed vocabulary
Library catalog created by trained professionals; dmoz uses “expert” volunteers
Both use field-based queries
dmoz always searches the same fields

16 dmoz vs. Google
Google uses dmoz’s data
Google is a search engine (good for finding specific information)
dmoz is a directory (good for finding general information)
Google adds page ranking to dmoz documents

17 Query 1: When is the next year of the Ram on the Chinese calendar?
+"Chinese calendar" +"year of the ram"
Documents returned
–Google: 10
–dmoz: 0
–Library: 0
No dead links
No overlap
Relevance
–Google: 70%
–dmoz: N/A
–Library: N/A
+"Chinese calendar"
Documents returned
–Google: 15,200
–dmoz: 10; 7 categories
–Library: 2
No dead links
Overlap
–4 pages (Google/dmoz)
Relevance
–Google: 30%
–dmoz: 30%
–Library: 50%

18 Query 2: According to Douglas Adams, author of "Hitchhiker's Guide to the Galaxy," what is the answer to the question: "What is the meaning of life?"
"douglas adams" hitchhiker guide galaxy "meaning of life"
Documents returned
–Google: ~364
–dmoz: 0
–Library: 0
No dead links
No overlap
Relevance
–Google: 60%
–dmoz: N/A
–Library: N/A
"meaning of life" answer
Documents returned
–Google: 49,700
–dmoz: 1
–Library: 0
No dead links
No overlap
Relevance
–Google: 0%
–dmoz: 0%
–Library: 0%

19 Query 3: Find Morgan horse breeders in North Carolina
morgan horse breeders north carolina
Documents returned
–Google: 1,140
–dmoz: 0
–Library: 0
No dead links
No overlap
Relevance
–Google: 40%
–dmoz: N/A
–Library: N/A

20 Questions?

