Exploring the Deep Web Peter L. Kraus J. Willard Marriott Library – University of Utah.

1 Exploring the Deep Web. Peter L. Kraus, J. Willard Marriott Library – University of Utah

2 What is the Deep Web? The Deep Web is the hidden part of the Web: a huge volume of content that is inaccessible to conventional search engines and, consequently, to most users.

3 How big is the Deep Web? 550 billion documents; 500 times the content of the surface Web. Google has identified 1.2 billion documents. An Internet search typically searches 0.03% (about 1/3000) of available content.
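
The figures on this slide can be cross-checked with a little arithmetic. The sketch below is only an illustration: the variable names are made up for this example, and the combined total (Deep Web plus implied surface Web) is an assumption about what "available content" means, not something the slide states.

```python
# Figures quoted on the slide.
DEEP_WEB_DOCS = 550e9            # 550 billion documents
DEEP_TO_SURFACE_RATIO = 500      # Deep Web is 500 times the surface Web
GOOGLE_DOCS = 1.2e9              # documents Google has identified
QUOTED_COVERAGE_PERCENT = 0.03   # share a typical search is said to cover

# Surface Web size implied by the 500x ratio.
implied_surface_docs = DEEP_WEB_DOCS / DEEP_TO_SURFACE_RATIO

# Assumption: "available content" = Deep Web + implied surface Web.
total_docs = DEEP_WEB_DOCS + implied_surface_docs
google_share_percent = 100 * GOOGLE_DOCS / total_docs

# Express the quoted percentage as "one part in N".
one_in_n = 100 / QUOTED_COVERAGE_PERCENT

print(f"Implied surface Web: {implied_surface_docs:,.0f} documents")
print(f"Google's share of the combined total: {google_share_percent:.2f}%")
print(f"{QUOTED_COVERAGE_PERCENT}% is one part in about {one_in_n:,.0f}")
```

The 500x ratio implies a surface Web of about 1.1 billion documents, close to the 1.2 billion attributed to Google. The quoted 0.03% works out to one part in roughly 3,333, so "1/3000" is a rounded figure.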

4 What's in the Deep Web? Searchable databases; downloadable files and spreadsheets; image and multimedia files; data sets; various file formats, such as .pdf; lots of government information.

5 Why use the Deep Web? Higher-quality sources, selected and organized by subject experts; dynamic display; customized data sets; some data is visual and not word-searchable; regular search engines miss vast resources available in the Deep Web.

6 Why are we talking about government sites in the Deep Web? Governments have the mandate and the capacity to gather information that individuals don't. Most government information is copyright free. Government information is authoritative. Governments have the financial and human resources to maintain Deep Web sites.

7

8 The Web Today. Web sites from the federal government occupy only about 1% of the entire global Web; however, they hold 85% of the Deep Web. The content of these web sites includes items in either .html or .pdf format (reports, records, data sets, etc.), a diversity of files. There is little standardization or uniformity. The common term for this content is Grey Literature.

9 Definition of Grey Literature: That which is produced on all levels of government, academics, business and industry in print and electronic formats, but which is not controlled by commercial publishers.

10 Growth and Life of Federal Information. On federal web sites, the amount of information grew 13-fold between 1992 and 2003. The average life expectancy of a federal web resource is 4 months (2003).
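
To put the 13-fold figure in annual terms, the sketch below converts it to an average yearly growth rate. It assumes steady compound growth over the 11 years from 1992 to 2003, which the slide does not claim; the variable names are illustrative only.

```python
# Figures quoted on the slide.
GROWTH_FACTOR = 13               # information grew 13-fold
START_YEAR, END_YEAR = 1992, 2003

# Assumption: growth compounded at a constant rate each year.
years = END_YEAR - START_YEAR
annual_factor = GROWTH_FACTOR ** (1 / years)
annual_growth_percent = (annual_factor - 1) * 100

print(f"Average growth: about {annual_growth_percent:.0f}% per year over {years} years")
```

Under that assumption the volume of federal web information grew by roughly 26% per year, while individual resources lasted only about four months on average.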

11 What can libraries do? LOCKSS-DOCS project, an archival project (BYU and UU are members); cooperative efforts in specific subject areas (Western Waters Digital Library); individual institutional initiatives, such as institutional repositories, reflecting institutional productivity in research (information often funded by federal grants).

12

13

14

15

16

17

18

19

20

21 Finding Naked People - Forsyth, Fleck (1996) (54 citations). This paper demonstrates an automatic system for telling whether there are naked people present in an image. The approach combines color and texture properties to obtain a mask for skin regions, which is shown to be effective for a wide range of shades and colors of skin. http.cs.berkeley.edu/~daf/newo2.ps.Z

22 Graph showing number of citations to Finding Naked People

23

24 Arches National Park : NASA Landsat 7 10/3/99

25

26

27 Searching for "University of Utah": displaying records 1 - 25 of a total of 27. Development and Evaluation of Stitched Sandwich Panels. Larry E. Stanley; Daniel O. Adams. NASA Langley Research Center. NASA/CR-2001-211025, June 2001; 20010702. "….. test panels were produced initially at the University of Utah and later at NASA Langley Research Center……" http://techreports.larc.nasa.gov/ltrs/PDF/2001/cr/NASA-2001-cr211025.pdf

28

29

30

31

32

33 Marriott Library, Salt Lake City, Utah, United States 9/18/2003 (TerraServer)

34

35 Utah Seismic Hazards (National Atlas)

36 International Deep Web Resources. International organizations collect an amazing amount of data. Statistical data is often best organized in database and spreadsheet format. Like the US Government, individual countries post data files and databases. This information may not be available in print sources in schools and libraries.

37 United Nations Official Documents System http://documents.un.org/

38 Why use the ODS? Full-text official United Nations documents (1993 -) online, free; retrospective digitization in process; highly relevant material for almost any international topic; timely and authoritative.

39

40

41

42 United Nations Statistical Databases. Value of the information: authoritative, comparative, time series, compact. Database topics include: commodity trade, demographics, disability statistics, social indicators, statistics on men and women.

43 http://unstats.un.org/unsd/databases.htm

44

45

46

47 Individual Country Statistics http://www.census.gov/main/www/stat_int.html

48 Why use this kind of information? Aggregate statistical sources are often not as up-to-date. Individual countries are often more specific in their indicators than aggregate sources. Information in databases, spreadsheets, and downloadable files is usually NOT searchable by web crawlers.
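
One common reason database content goes unseen is that a crawler follows hyperlinks but does not fill in search forms. The toy model below is a simplified illustration of that idea, not a description of any real search engine: the site, its pages, the records, and every function name are hypothetical.

```python
from collections import deque

# Hypothetical static pages: URL -> (page text, outgoing links).
STATIC_PAGES = {
    "/": ("Statistics office home", ["/about", "/search"]),
    "/about": ("About the statistics office", ["/"]),
    "/search": ("Enter a country name to query the database", []),
}

# Hypothetical records that exist only behind the search form.
DATABASE = {
    "utah": "Population time series for Utah",
    "norway": "Commodity trade figures for Norway",
}


def query_database(term):
    """Stand-in for a form submission that builds a page on demand."""
    return DATABASE.get(term.lower(), "No records found")


def crawl(start):
    """Breadth-first crawl that follows links only and never submits forms."""
    seen, queue, index = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = STATIC_PAGES[url]
        index[url] = text
        for link in links:
            if link in STATIC_PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("/")
crawled_text = " ".join(index.values())

print("Pages indexed:", sorted(index))
print("Utah record in crawl index:", "Utah" in crawled_text)
print("Utah record via form query:", query_database("Utah"))
```

The crawler reaches the search page itself but none of the records behind it, while a person who types a query into the form retrieves them directly.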

49

50

51

52 For Further Information: Marriott Library, University of Utah; 801-581-8394; www.lib.utah.edu/documents; Peter.Kraus@utah.edu

