Advancing Science: OSTI’s Current and Future Search Strategies Jeff Given IT Operations Manager Computer Protection Program Manager Office of Scientific.

Presentation transcript:

Advancing Science: OSTI’s Current and Future Search Strategies
Jeff Given, IT Operations Manager and Computer Protection Program Manager
Office of Scientific and Technical Information, U.S. Department of Energy
November 14th, 2007

About OSTI
A U.S. Department of Energy program within the Office of Science
Maintains appropriate public access to DOE research results
Covers all collections of scientific and technical information resulting from R&D activities at facilities within the national DOE complex
Provides stewardship for the Department’s 60-year legacy of classified and unclassified scientific and technical reports
Maintains an electronic repository of over 4 million DOE-produced R&D records dating to the 1940s

About OSTI
OSTI accelerates the advancement of discovery by speeding access to R&D findings.
Science.gov - 50 million pages of U.S. government science information from 17 U.S. government science organizations
WorldWideScience.org million+ pages of international research information from the governments of 17 countries
Science Accelerator - federated search of important DOE databases such as E-print Network (includes 1 million documents and 27,000 Web sites) and Information Bridge (includes over 145,000 DOE full-text reports)

Overview
Users and search
Current problems with search and retrieval
OSTI strategy for overcoming problems
Future and current work

Do You Know?
How big is the web?
How much of the world’s information is on the web?
How similar are the major search engines in terms of search results?
What percentage of a typical web site’s functionality is actually used?

Users and Search
User goals:
–Find authoritative and relevant information.
–Users don’t want to search, they want to get something done.
Broad-scope search engines
–Google, Yahoo, MSN (GYM)
Narrow-scope search engines
–Specialized, topical, vertical: PubMed, music.yahoo.com, Information Bridge

Search - Data Availability
The web now encompasses over 100 million web sites (and a far larger number of pages).
The deep web (the part Google cannot index) has been estimated to be several orders of magnitude larger than the surface web.
Only about 5% of the world’s total information is online today.
Only 15% of DOE’s R&D information is full-text searchable on the internet.

User Search Statistics
87% of online users have gone online to research a scientific topic.
25% of a knowledge worker’s time is spent searching for information.

Relevancy Bias
The conventional wisdom is that the major search engines serve up similar results.
Survey participants reported ~70% overlap in the top 10 results on Google and Yahoo!.
In fact, across the 500 most popular search terms, Google and Yahoo! share on average only 3.8 of their top 10 results.
Only ~5% of searchers go beyond the first page of results.
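The overlap figure above is a simple average over queries, which can be sketched as follows. This is only an illustration of the arithmetic: the function names `top_k_overlap` and `mean_overlap` are hypothetical, and the result lists are made-up placeholders, not data from the study the slide cites.

```python
def top_k_overlap(results_a, results_b, k=10):
    """Count the URLs that appear in the top k of both ranked lists."""
    return len(set(results_a[:k]) & set(results_b[:k]))


def mean_overlap(paired_results, k=10):
    """Average the top-k overlap over many queries.

    `paired_results` is a list of (engine_a_results, engine_b_results)
    pairs, one pair per query.
    """
    overlaps = [top_k_overlap(a, b, k) for a, b in paired_results]
    return sum(overlaps) / len(overlaps)


# Placeholder data: two queries, two hypothetical engines.
query_1 = (["u1", "u2", "u3"], ["u3", "u4", "u1"])  # 2 shared results
query_2 = (["x1", "x2"], ["y1", "y2"])              # 0 shared results
average = mean_overlap([query_1, query_2])
```

With real data, each pair would hold the top 10 URLs returned by two engines for the same search term.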

Site Usage Statistics
More than 95% of your customers will use less than 5% of the features and functions of your site.
For a site to be successful, it must accommodate the typical user.

Users and Search Summary
Users want authoritative, relevant information, fast and easy.
Search is prevalent; information users spend a significant portion of their time searching.
Not all data is online, and not all information available online is included in GYM searches.
If relevancy rankings don’t return “relevant information” on the first page, the data is not found most of the time.

Problem Areas
Users want authoritative, relevant information, fast and easy.
Search is prevalent; information users spend a significant portion of their time searching.
Not all data is online, and not all information available online is included in GYM searches.
If relevancy rankings don’t return “relevant information” on the first page, the data is not found most of the time.

Problem Areas
Failure rate for desktop information seekers keeps rising (~30%).
Search success inversely proportional to amount of data?

OSTI’s Focus
OSTI’s focus has been and remains to make scientific and technical information searchable and retrievable.

OSTI Strategies
Distribution of DOE content to major search engines.
–Sitemap Protocol: low development time, low maintenance, reduces the number of unnecessary repeated data requests from crawlers
–Allows for nearly 100% coverage of each content source
–~60% of October’s traffic to Information Bridge came from Google referrals
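As a rough illustration of why the Sitemap Protocol is cheap to adopt, the sketch below builds a minimal sitemap with Python's standard library. It is not OSTI's implementation: the function name `build_sitemap` and the `example.org` URLs are assumptions made for the example. The `urlset`, `url`, `loc`, and `lastmod` elements and the namespace come from the published protocol; the optional `lastmod` value is what lets a crawler skip records that have not changed.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap Protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(records):
    """Return sitemap XML as a string for (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in records:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # lastmod is optional; crawlers can use it to avoid re-fetching
        # pages that have not changed since their last visit.
        ET.SubElement(entry, "lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical record URLs, used only for illustration.
sitemap_xml = build_sitemap([
    ("https://example.org/record/1", "2007-10-01"),
    ("https://example.org/record/2", "2007-10-15"),
])
```

Listing every record in a file like this is what makes near-complete coverage of a content source possible, since the crawler no longer depends on following links to discover each page.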

OSTI Strategies
Enabling vertical search capabilities to authoritative, relevant Scientific and Technical Information (STI)
–Federated search: includes authoritative, subject-matter-relevant searches of deep web content
–Web harvesting: includes content harvested/crawled from authoritative, subject-matter-specific URLs
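The federated search idea can be sketched in a few lines: one query is sent to several sources at once, and the hits are merged into a single ranked list that keeps track of where each hit came from. This is a minimal sketch under stated assumptions, not a description of OSTI's software: `federated_search`, the two stand-in source functions, and the relevance scores are all hypothetical, and real sources would be remote databases queried over the network.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, sources):
    """Send one query to every source in parallel and merge the hits.

    `sources` maps a source name to a callable that takes the query and
    returns a list of (title, score) pairs.
    """
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in sources.items()}
        merged = []
        for name, future in futures.items():
            for title, score in future.result():
                # Keep the source name so each hit can be attributed.
                merged.append({"source": name, "title": title, "score": score})
    # Highest-scoring hits first, regardless of which source returned them.
    return sorted(merged, key=lambda hit: hit["score"], reverse=True)


# Stand-in sources with invented scores, for illustration only.
def search_reports(query):
    return [(f"Report on {query}", 0.9), (f"Survey of {query}", 0.4)]


def search_eprints(query):
    return [(f"E-print about {query}", 0.7)]


hits = federated_search("fusion", {"reports": search_reports,
                                   "eprints": search_eprints})
```

Querying the sources live, rather than crawling them ahead of time, is what allows this approach to reach deep web content that ordinary crawlers cannot index.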

OSTI Strategies
Development and maintenance of DOE STI data collections
–Information Bridge
–Energy Citations
–DOE Patent Database

OSTI Strategies
Attribution to source of data
–Makes users who find data via search engines aware of the source of the data
–Users are more likely to bookmark and revisit high-quality vertical search engines

OSTI Strategies - Overview
Content distribution via major search engines
+ Providing STI-specific vertical search capabilities enabled via federated search and web harvesting
+ Increasing awareness of OSTI vertical search applications via attribution on search engine referrals
= More users getting the most relevant results from the swath of the available internet

Future Work – Data Types
Enabling search on non-text information
–Numeric data
–Video
–Images
–Audio

Future Work – Mobile
A 30% search failure rate is tolerable on the desktop, but not necessarily for mobile device searches.
Ipsos Insight's 2005 "The Face of the Web" study shows significant increases in ownership of mobile phones, mobile surfing by mainstream users, and adoption of wireless mobile technology by adults aged 35 and older.
Digital natives type with two thumbs at incredible speeds (est. 1.5 digital natives in Japan can type at the equivalent of desktop speeds of 100 words/min).

Future Work – Visualization & Social Tools
Visualization – identification of scientific communities (publishing groups) and crossover areas in scientific research
Social Tools
–75% of a user’s time spent on top news sites is spent reading user comments about the story, and only 25% on the story itself
–Over 60% of web content utilized by users age 25 and under is user generated

Future Work
Utilize HCI labs and testing results to optimize web sites
Expand reach of federated search by adding additional deep web content *
Add functionality to OSTI’s federated vertical search engines
* CompletePlanet.com – searchable directory of Deep Web sources

Contact Information
Jeff Given
Office of Scientific and Technical Information