Exploring the Deep Web
Peter L. Kraus
J. Willard Marriott Library – University of Utah
What is the Deep Web?
The Deep Web is the hidden part of the Web, containing a huge volume of content that is inaccessible to conventional search engines and, consequently, to most users.
How big is the Deep Web?
- 550 billion documents
- 500 times the content of the surface Web
- Google has identified 1.2 billion documents
- An Internet search typically covers .03% (1/3000) of available content
What's in the Deep Web?
- Searchable databases
- Downloadable files and spreadsheets
- Image and multimedia files
- Data sets
- Various file formats, such as .pdf
- Lots of government information
Why use the Deep Web?
- Higher-quality sources
  - Selected and organized by subject experts
- Dynamic display
- Customized data sets
- Some data is visual and not word-searchable
- Regular search engines miss vast resources available in the Deep Web
Why are we talking about government sites in the Deep Web?
- Governments have the mandate and the capacity to gather information that individuals don't
- Most government information is copyright-free
- Government information is authoritative
- Governments have the financial and human resources to maintain Deep Web sites
The Web Today
- Web sites from the federal government occupy only about 1% of the entire global Web; however, they hold 85% of the Deep Web.
- The content of these web sites includes items in either .html or .pdf format (reports, records, data sets, etc.), a wide diversity of files.
- There is little standardization or uniformity.
- The common term for this content is Grey Literature.
Definition of Grey Literature
That which is produced on all levels of government, academics, business and industry in print and electronic formats, but which is not controlled by commercial publishers.
Growth and Life of Federal Information
- On federal web sites, the amount of information grew 13-fold between 1992 and 2003
- The average life expectancy of a federal web resource was 4 months (2003)
What can libraries do?
- LOCKSS-DOCS project, an archival project (BYU and UU are members)
- Cooperative efforts in specific subject areas (Western Waters Digital Library)
- Individual institutional initiatives, such as institutional repositories, which reflect the institution's research productivity (information often funded by federal grants)
Finding Naked People - Forsyth, Fleck (1996) (54 citations)
This paper demonstrates an automatic system for telling whether there are naked people present in an image. The approach combines color and texture properties to obtain a mask for skin regions, which is shown to be effective for a wide range of shades and colors of skin.
http.cs.berkeley.edu/~daf/newo2.ps.Z
Graph showing number of citations to Finding Naked People
Searching for "University of Utah": displaying records 1-25 of a total of 27
Development and Evaluation of Stitched Sandwich Panels
Larry E. Stanley; Daniel O. Adams
NASA Langley Research Center
NASA/CR-2001-211025, June 2001; 20010702
… test panels were produced initially at the University of Utah and later at NASA Langley Research Center …
http://techreports.larc.nasa.gov/ltrs/PDF/2001/cr/NASA-2001-cr211025.pdf
International Deep Web Resources
- International organizations collect an amazing amount of data
- Statistical data is often best organized in database and spreadsheet formats
- Like the US government, individual countries post data files and databases
- This information may not be available in print sources in schools and libraries
United Nations Official Documents System
http://documents.un.org/
Why use the ODS?
- Full-text official United Nations documents (1993 onward), online and free
- Retrospective digitization in progress
- Highly relevant material for almost any international topic
- Timely and authoritative
United Nations Statistical Databases
Value of the information:
- Authoritative
- Comparative
- Time series
- Compact
Database topics include:
- Commodity trade
- Demographics
- Disability statistics
- Social indicators
- Statistics on men and women
Individual Country Statistics
http://www.census.gov/main/www/stat_int.html
Why use this kind of information?
- Aggregate statistical sources are often not as up to date
- Individual countries are often more specific in their indicators than aggregate sources
- Information in databases, spreadsheets, and downloadable files is usually NOT searchable by web crawlers