Commercial Online Databases and the Internet
OSS 99 Global Information Forum, May 24, 1999
Anne Caputo, Dow Jones Interactive Publishing
Traditional Search Services Challenge the Web
- The Internet Searchoff, September 1997-February 1998 (Susan Feldman, DATASEARCH)
- Goal: compare searching traditional online services with the World Wide Web
  - Effectiveness in finding information
  - When to use which one
  - Strengths of each approach
Searchoff Ground Rules
- Be a trained, experienced searcher
- Use a real question from a client
- Search either Dialog or Dow Jones Interactive
- Relevance-rank the results
- Rank the top 30 retrieved documents on a scale of 1 to 5
Subjects Searched
- Business
- Technology
- Medicine/Pharmaceuticals
- Science
- Humanities
- Engineering
- Other
[Chart: subject breakdown; value labels 38%, 18%, 14%, 10%, 8%, 6%]
Web Search Engines Used
- Alta Vista
- Hotbot
- Excite
- Infoseek
- Lycos
- Webferret
[Chart: search engine breakdown; value labels 45%, 20%, 14%, 5%, 2%]
Internet Search-Off Results
[Chart: relevance points and number of documents, Web totals vs. Dialog/Dow Jones totals]
Searching time
- Total minutes of searching time:
  - DIALOG/Dow Jones: 594 minutes
  - WWW search engines: 1230 minutes
- Plus formatting time
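
The two totals above imply a simple ratio. A minimal sketch of that arithmetic follows, using only the figures on the slide. The variable names are illustrative, and the formatting time is left out because the slide gives no number for it.

```python
# Total searching time reported on the slide, in minutes.
traditional_minutes = 594   # DIALOG / Dow Jones
web_minutes = 1230          # WWW search engines

# How many times longer the Web searches took, and the absolute difference.
ratio = web_minutes / traditional_minutes
extra_minutes = web_minutes - traditional_minutes

print(f"Web searching took {ratio:.2f}x as long")
print(f"Extra time on the Web: {extra_minutes} minutes ({extra_minutes / 60:.1f} hours)")
```

On these figures the Web searches took roughly twice as long, before any formatting time is counted.
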
Searching Assumptions: traditional search engines
- Information exists on the subject
- The information is high quality
- The information is current
- The information is expensive
- To find it, we need expertise and training to know how and where to search
- It will be a surprise if we can't find something
Searching assumptions: World Wide Web
- There MIGHT be information on the topic
- Quality and timeliness are unpredictable
- The information is free
- There's no telling how the search engine works
- Searching requires no skill
- Searching requires no training
- It will be a surprise if we find something
Retrieved Documents by Relevance
[Chart: retrieved documents ranked 1 (less relevant) to 5 (more relevant), Web vs. DIALOG/Dow Jones]
Conclusion
- DIALOG training has influenced an entire generation of searchers: we automatically shift into Boolean
Digression
- Nested Boolean searches don't take advantage of the strong points of Web search engines
- Statistical search engines search a whole territory
- Boolean engines search for a point in that territory
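
The contrast between a "point" and a "territory" can be illustrated with a toy example. This is only a sketch under stated assumptions: the three documents are invented, and the scoring (term frequency weighted by a simple inverse document frequency) is a generic stand-in, not the method of any particular search engine named in this talk.

```python
import math
from collections import Counter

# Hypothetical toy documents, invented for illustration only.
docs = {
    "d1": "market report on pharmaceutical companies",
    "d2": "drug study results for a new pharmaceutical product",
    "d3": "history of chemical engineering companies",
}

def tokenize(text):
    return text.lower().split()

def boolean_and(query_terms, docs):
    """Return ids of documents containing every query term (a 'point')."""
    return sorted(
        doc_id for doc_id, text in docs.items()
        if all(term in tokenize(text) for term in query_terms)
    )

def statistical_rank(query_terms, docs):
    """Rank every document sharing at least one query term (a 'territory'),
    scored by term frequency times a simple inverse document frequency."""
    n = len(docs)
    tokens = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    scores = {}
    for doc_id, counts in tokens.items():
        score = 0.0
        for term in query_terms:
            # Number of documents that contain this term.
            df = sum(1 for c in tokens.values() if term in c)
            if df and counts[term]:
                score += counts[term] * math.log(1 + n / df)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

query = ["pharmaceutical", "drug", "study"]
print(boolean_and(query, docs))       # only documents with all three terms
print(statistical_rank(query, docs))  # every partial match, best first
```

The Boolean AND returns only the document that matches all three terms, while the ranked search also surfaces the partially matching document further down the list.
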
Web Strategies
- Map the territory: use your searching skills to create lists of related terms
- Omit Boolean operators; let the search engine work without interference
- Put the most important and rarest words first
- Use MORE LIKE THIS to improve results
- Use phrases when possible to eliminate irrelevant materials
- Ignore the useless hits and pursue the good ones
- Don't worry about finding six million documents; just look at the top 30
- Rephrase the search
- Move to another search engine if you don't find anything
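
Three of the strategies above (omit Boolean operators, put the rarest words first, use phrases) can be sketched as a small query builder. Everything here is hypothetical: the function name `build_web_query`, the searcher-assigned rarity scores, and the assumption that the target engine treats text in double quotes as a phrase.

```python
BOOLEAN_OPERATORS = {"and", "or", "not"}

def build_web_query(terms, rarity):
    """Build a plain query string for a statistical Web search engine.

    terms  -- words or multi-word phrases describing the topic
    rarity -- hypothetical searcher-assigned scores; higher means rarer
              or more important (terms without a score default to 0)
    """
    # Drop Boolean operators so the engine can rank without interference.
    kept = [t for t in terms if t.lower() not in BOOLEAN_OPERATORS]
    # Put the most important and rarest terms first (ties keep input order).
    ordered = sorted(kept, key=lambda t: rarity.get(t, 0), reverse=True)
    # Quote multi-word phrases, assuming the engine supports phrase matching.
    return " ".join(f'"{t}"' if " " in t else t for t in ordered)

query = build_web_query(
    ["drug", "AND", "clinical trial", "OR", "statins"],
    {"statins": 3, "clinical trial": 2, "drug": 1},
)
print(query)
```

Rephrasing the search then amounts to calling the function again with a different term list or different scores.
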
Conclusions: traditional search services
Strengths:
- Predictable archives
- Chemical Engineering
- Electrical Engineering
- History and background on companies
- History and historical figures
- Market reports, industry reports
- Current drug studies (authoritative)
- Industry newsletters and journals
- Financial industry coverage
- Scholarly journal articles
- High quality information
- Quick searches when you know the information is likely to be there
Conclusions: The Web
- Pictures and illustrations
- Some conference coverage and papers
- Product information that comes from the company
- Small companies – products/background
- Medical statistics (current)
- If you know where to find the information
Conclusions: use both
- To supplement each other for:
  - Standards
  - Articles on topics of general interest
  - Popular subjects
  - Organizations
  - Directory information
  - Reviews/evaluations/how-to information
  - Government regulations and other agency information
  - Competitive intelligence
  - Obscure topics
  - Clues for finding information on and offline
Conclusions: general
- Time is money
  - Free information that takes too long to find and format is expensive information
- The Web is a new tool
  - We need to learn to use both online sources well
- Vary strategies and approach to take advantage of each medium