
1 8/31/2000 Information Organization and Retrieval
What is Information? The Nature, Growth and Characteristics of Information
University of California, Berkeley School of Information Management and Systems
SIMS 202: Information Organization and Retrieval

2 What is Information?
There is no “correct” definition
Can involve philosophy, psychology, signal processing, physics
Cookie Monster’s definition:
– “news or facts about something”
Oxford English Dictionary
– information: informing, telling; thing told, knowledge, items of knowledge, news
– knowledge: knowing familiarity gained by experience; person’s range of information; a theoretical or practical understanding of; the sum of what is known

3 Assignment 1
What is information, according to your background or area of expertise?

4 Types of Information
Differentiation by form.
Differentiation by content.
Differentiation by quality.
Differentiation by associated information.

5 Information Properties
Information can be communicated electronically
– Broadcasting
– Networking
Information can be easily duplicated and shared
– Problems of Ownership
– Problems of Control
Adapted from ‘Silicon Dreams’ by Robert W. Lucky

6 Intuitive Notion (Losee 97)
Information must
– Be something, although the exact nature (substance, energy, or abstract concept) is not clear;
– Be “new”: repetition of previously received messages is not informative
– Be “true”: false or counterfactual information is “mis-information”
– Be “about” something
This human-centered approach emphasizes meaning and use of message

7 Information from the Human Perspective
Levels in cognitive processing
– perception
– observation/attention
– reasoning, assimilating, forming inferences
Knowledge: justified true belief
Belief: an idea held based on some support; an internally accepted statement, result of inductive processes combining observed facts with a reasoning process
Does information require a human mind?
– Communication and information transfer among ants
– A tree falls in the forest … is there information there?
– Existence of quarks

8 Meaning vs. Form
Form of information as the information itself
Meaning of a signal vs. the signal itself
– What aspects of a document are information?
Representation (Norman 93)
– Why do we write things down?
  Socrates thought writing would obliterate serious thought
  Sounds and gestures fade away
– Artifacts help us to reason
– Anything not present in the representation can be ignored
– Things left out of the representation are often what we don’t know how to represent

9 Information Hierarchy
Wisdom
Knowledge
Information
Data

10 Information Hierarchy
Data
– The raw material of information
Information
– Data organized and presented by someone
Knowledge
– Information read, heard or seen and understood
Wisdom
– Distilled and integrated knowledge and understanding

11 Information
Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
-- T.S. Eliot, “The Rock”
Where is the information we have lost in data?

12 Origins
Very early history of content representation
– Sumerian tokens and “envelopes”
– Alexandria - pinakes
– Indices

13 Origins
Biblical Indexes and Concordances
– Hugo de St. Caro – 1247 A.D.: 500 Monks -- KWOC
– Book indexes (Nuremberg Chronicle)
Library Catalogs
Journal Indexes
“Information Explosion” following WWII
– Cranfield Studies of indexing languages and information retrieval
– Development of bibliographic databases
  Index Medicus -- production and Medlars searching
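The KWOC (keyword out of context) style of index mentioned above can be sketched in a few lines: every significant word of a title becomes a heading, and the full title is listed under each of its headings. This is only an illustration; the stop-word list and the sample titles are assumptions, not part of the lecture.

```python
from collections import defaultdict

# Hypothetical stop-word list; a real index would use a much longer one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for"}


def kwoc_index(titles):
    """Map each significant keyword to the sorted list of titles containing it."""
    index = defaultdict(set)
    for title in titles:
        for word in title.lower().split():
            keyword = word.strip(".,;:!?\"'()")
            if keyword and keyword not in STOP_WORDS:
                index[keyword].add(title)
    # Sort headings and the titles under each heading, as a printed index would.
    return {keyword: sorted(found) for keyword, found in sorted(index.items())}


if __name__ == "__main__":
    sample_titles = [  # illustrative titles, not taken from the lecture
        "The Organization of Information",
        "Information Retrieval in Libraries",
    ]
    for keyword, found in kwoc_index(sample_titles).items():
        print(keyword.upper())
        for title in found:
            print("    " + title)
```

Each title appears once per keyword, which is why such indexes were labor-intensive to compile by hand.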

14 Information Theory
Claude Shannon, 1940’s, studying communication
Ways to measure information
– Communication: producing the same message at its destination as that seen at its source
– Problem: a “noisy channel” can distort the message
Between transmitter and receiver, the message must be encoded
Semantic aspects are irrelevant
[Diagram: Message source, Transmitter, Channel (with Noise), Receiver, Destination]
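One standard way to measure information in Shannon's sense is entropy, H = -Σ p(x) log2 p(x), the average number of bits per symbol. The sketch below computes it from symbol frequencies and simulates a simple noisy channel that flips bits at random. The function names, the flip probability, and the sample message are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def entropy_bits(message):
    """Shannon entropy of the symbol frequencies in `message`, in bits per symbol."""
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def noisy_channel(bits, flip_probability, rng):
    """Binary symmetric channel: flip each bit independently with the given probability."""
    return [b ^ 1 if rng.random() < flip_probability else b for b in bits]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    sent = [1, 0, 1, 1, 0, 0, 1, 0]
    received = noisy_channel(sent, 0.1, rng)
    print("entropy of 'abab':", entropy_bits("abab"), "bits/symbol")
    print("bits changed by the channel:", sum(s != r for s, r in zip(sent, received)))
```

The measure depends only on how often each symbol occurs, so "abab" and "baba" score the same. That is the sense in which semantic aspects are irrelevant.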

15 Information Theory
Better called “Communication Theory”
Communication may be over time and space
[Diagram 1: Source, Encoding, Channel (with Noise), Decoding, Destination; Message]
[Diagram 2: Source, Encoding (writing/indexing), Storage, Decoding (Retrieval/Reading), Destination; Message]

16 What kinds of information are there?
Text
– books, periodicals, WWW, memos, ads
– published/refereed
Film
Photos, other Images
Broadcast TV, Radio
Telephone Conversations
Databases

17 How much information is there?
(Estimates courtesy Hal Varian and Peter Lyman: http://www.sims.berkeley.edu/emc)

18 How Much Information?
Stored Information
– Print
– Film
– Optical
– Magnetic
Communicated
– Internet
– Broadcast
– Phone
– Mail

19 Print
Annual Production
– Books 968,735 = 8 Terabytes (compressed image)
– Newspapers 22,643 = 25 Terabytes
– Journals 40,000 = 2 Terabytes
– Magazines 80,000 = 10 Terabytes
– Office Documents 12x10^9 pages = 312 Terabytes
– TOTAL 357 Terabytes (1824 scanned, 35 text)

20 Print
Library of Congress Printed book collection
– About 18 Million books
– About 130 Terabytes (compressed image)
– For all of LC we should also assume
  13M photographs, 5MB each = 65 TB
  4M maps, say 200 TB
  500K files, 1GB each = 500 TB
  3.5M sound recordings, ~2000 TB
  Grand total: 3 petabytes (~3000 terabytes)
Books in Print
– 3.2 Million titles
– About 26 Terabytes
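The Library of Congress grand total can be checked with a short calculation. This is a sketch using decimal units (1 TB = 10^6 MB = 10^3 GB), which is an assumption; the item counts and sizes are the ones quoted on the slide, and the dictionary keys are illustrative labels.

```python
MB_PER_TB = 10**6  # decimal units assumed
GB_PER_TB = 10**3

# Terabytes per collection, using the counts and per-item sizes from the slide.
lc_terabytes = {
    "printed books": 130,
    "photographs": 13_000_000 * 5 / MB_PER_TB,  # 13M items at 5 MB each
    "maps": 200,
    "files": 500_000 * 1 / GB_PER_TB,           # 500K items at 1 GB each
    "sound recordings": 2000,
}

total_tb = sum(lc_terabytes.values())

if __name__ == "__main__":
    for name, terabytes in lc_terabytes.items():
        print(f"{name:<18}{terabytes:>8.0f} TB")
    print(f"{'total':<18}{total_tb:>8.0f} TB (about {total_tb / 1000:.1f} petabytes)")
```

The itemized figures sum to a little under 3,000 TB, which the slide rounds to 3 petabytes.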

21 Film and Image
Film
– Photographs = 410 Petabytes per year
– Movies = 16 Terabytes (Commercial Production of about 4000 films)
– X-Rays = 12 Petabytes

22 Optical Media
CD-Music   90,000 items = 58 TB
CD-ROM      3,000 items =  3 TB
DVD-Video   5,000 items = 22 TB
Total                     83 TB

23 Magnetic Media
Audio Tape 184,200,000 = 184.2 Petabytes
Video Tape 355,000,000 = 1420
Floppy disks = 0.07
Removable disks = 1.69
Hard Disks = 500

24 Totals Stored Per Year
Medium    Type of content     Terabytes/Year   Terabytes/Year
                                 Upper Bound      Lower Bound
Paper     Books                            8                7
          Newspapers                      25               20
          Periodicals                     12               12
          Office documents               312              312
          SUBTOTAL                       357              351
Film      Photographs                410,000          100,000
          Cinema                          16               16
          X-Rays                      12,000           12,000
          SUBTOTAL                   422,000          112,016
Optical   Music CDs                       58               40
          Data CDs                         3                3
          DVDs                            22               22
          SUBTOTAL                        83               65
Magnetic  Camcorder                  300,000          300,000
          Disk drives              2,555,000        1,000,200
          SUBTOTAL                 2,855,000        1,300,200
TOTAL                              3,277,440        1,412,632
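The subtotals in this table can be recomputed from the individual rows. The sketch below does so, under one assumption: the lower bound for disk drives is taken to be 1,000,200, the value implied by the magnetic subtotal of 1,300,200. The data structure and function name are illustrative.

```python
# Terabytes/year per content type as (upper bound, lower bound), copied from the table.
# Assumption: disk drives lower bound = 1,000,200 (implied by the 1,300,200 subtotal).
ESTIMATES = {
    "Paper": {
        "Books": (8, 7),
        "Newspapers": (25, 20),
        "Periodicals": (12, 12),
        "Office documents": (312, 312),
    },
    "Film": {
        "Photographs": (410_000, 100_000),
        "Cinema": (16, 16),
        "X-Rays": (12_000, 12_000),
    },
    "Optical": {"Music CDs": (58, 40), "Data CDs": (3, 3), "DVDs": (22, 22)},
    "Magnetic": {
        "Camcorder": (300_000, 300_000),
        "Disk drives": (2_555_000, 1_000_200),
    },
}


def subtotals(estimates):
    """Return {medium: (upper_sum, lower_sum)} summed over each medium's rows."""
    return {
        medium: tuple(sum(bounds[i] for bounds in rows.values()) for i in (0, 1))
        for medium, rows in estimates.items()
    }


if __name__ == "__main__":
    per_medium = subtotals(ESTIMATES)
    for medium, (upper, lower) in per_medium.items():
        print(f"{medium:<10}{upper:>12,}{lower:>12,}")
    print(f"{'TOTAL':<10}"
          f"{sum(u for u, _ in per_medium.values()):>12,}"
          f"{sum(l for _, l in per_medium.values()):>12,}")
```

The recomputed upper bound for film is 422,016 rather than the 422,000 shown, so the slide appears to round that subtotal; every other subtotal and the lower-bound total match exactly.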

25 Internet Traffic -- Historical
[Chart: Nov ‘92, Apr ‘95]
Dec 1996 = 1500Tb
Dec 1997 = 3000Tb

26 Internet Traffic
[Chart: Nov ‘92, Apr ‘95]

27 Currently...
There are an estimated 2.1 Billion pages on the Web
– About 21 Terabytes
– About 7500 further Terabytes in web-accessed DBs.
610 Billion email messages per year = 11285 TB
Internet Traffic is doubling every 100 days
An estimated 62 Million Americans now use the internet (US Commerce Dept 1998)
Radio took 38 years to get 50 M listeners, TV took 13 years, the Net took 4 years...
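Two quick calculations help put these figures in perspective: what "doubling every 100 days" means as a yearly growth factor, and what average item size the page and email totals imply. This is a sketch assuming decimal units (1 TB = 10^9 KB) and a 365-day year; the function names are illustrative.

```python
KB_PER_TB = 10**9  # decimal units assumed


def annual_growth_factor(doubling_days, days_per_year=365):
    """Factor by which a quantity grows in one year if it doubles every `doubling_days`."""
    return 2 ** (days_per_year / doubling_days)


def average_size_kb(total_terabytes, item_count):
    """Average size per item, in kilobytes, implied by a total volume and a count."""
    return total_terabytes * KB_PER_TB / item_count


if __name__ == "__main__":
    print(f"yearly growth factor: {annual_growth_factor(100):.1f}x")
    print(f"average web page: {average_size_kb(21, 2_100_000_000):.1f} KB")
    print(f"average email message: {average_size_kb(11285, 610_000_000_000):.1f} KB")
```

Doubling every 100 days compounds to more than a twelvefold increase per year, while the quoted totals imply pages of about 10 KB and messages of about 18.5 KB.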

28 Internet - Recent Statistics
5 M Level 2 Domains (NW June 1999)
43.2 Million Hosts (NW January 1999)
206/246 IP countries (NW July 1998)
300 Million Users (Newsbytes, Mar 2000)
(830 Million Telephone Terminations)
Source: Vint Cerf

29 Internet Hosts (000s) 1989-2006
Source: Vint Cerf

30 Projected Voice and Data Traffic (Gb/s)
Source: America's Network, May 15, 1998

31 Users on the Internet - May 1999
CAN/US - 90.65M
Europe - 40.09M
Asia/Pac - 26.97M
Latin Am - 5.29M
Africa - 1.14M
Mid-east - 0.88M
Total - 165M
Source: Vint Cerf

32 Language Distribution of Web Content
Source: Jack Xu: Excite

33 Language Distribution on a 634 Million Web Page Corpus

34 Sources on Information, Computer, and Network Use
http://www.sims.berkeley.edu/emc/
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/bam/www/numbers.html
– Statistical snippets extracted from the news
http://www.wcom.com/about_the_company/cerfs_up/
– Vint Cerf’s pages
http://www.firstmonday.dk/issues/issue3_10/coffman/index.html
– The size and growth rate of the Internet by K.G. Coffman and Andrew Odlyzko

35 Human Memory
– Landauer 86: Human brain holds 200MB
  looked at rate of information intake and rate of forgetting, and amount of information adults need for normal tasks
– 6B people on earth implies total memory of all people alive about 1,200 petabytes
– Another way: estimate that people take in a byte/sec
  lifetime 25,000 days or 2B sec
  result is 2 GB (doesn’t count synthesizing new info)
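Both memory estimates are simple multiplications, sketched below in decimal units. The lifetime is assumed to be roughly 25,000 days (about 68 years), which is the figure consistent with the slide's 2 billion seconds; the function names are illustrative.

```python
BYTES_PER_MB = 10**6
BYTES_PER_GB = 10**9
BYTES_PER_PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60


def collective_memory_pb(megabytes_per_person, population):
    """Total memory of a population, in petabytes, given a per-person capacity in MB."""
    return megabytes_per_person * BYTES_PER_MB * population / BYTES_PER_PB


def lifetime_intake_gb(bytes_per_second, lifetime_days):
    """Information taken in over a lifetime, in gigabytes, at a constant intake rate."""
    return bytes_per_second * lifetime_days * SECONDS_PER_DAY / BYTES_PER_GB


if __name__ == "__main__":
    print(f"all people alive: {collective_memory_pb(200, 6 * 10**9):,.0f} petabytes")
    # Assumption: a lifetime of about 25,000 days, i.e. roughly 2 billion seconds.
    print(f"one lifetime at 1 byte/sec: {lifetime_intake_gb(1, 25_000):.2f} GB")
```

The second estimate comes out about ten times larger than Landauer's 200 MB, which is plausible given that the first method also accounts for forgetting.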

36 Information Overload
“The greatest problem of today is how to teach people to ignore the irrelevant, how to refuse to know things, before they are suffocated. For too many facts are as bad as none at all.” (W.H. Auden)

37 To organize is to (1) furnish with organs, make organic, make into living tissue, become organic; (2) form into an organic whole; give orderly structure to; frame and put into working order; make arrangements for.
Knowledge is knowing, familiarity gained by experience; person’s range of information; a theoretical or practical understanding of; the sum of what is known.
To retrieve is to (1) recover by investigation or effort of memory, restore to knowledge or recall to mind; regain possession of; (2) rescue from a bad state, revive, repair, set right.
Information is (1) informing, telling; thing told, knowledge, items of knowledge, news.
The Oxford English Dictionary, cf. Rowley

38 Information Life Cycle
[Diagram: cycle of Authoring/Modifying, Organizing/Indexing, Storing/Retrieval, Distribution/Networking, Accessing/Filtering, Using/Creating; phases Creation, Searching, Utilization; states Active, Semi-Active, Inactive; Retention/Mining, Disposition, Discard]

39 Authoring/Modifying
Converting Data+Information+Knowledge to New Information.
Creating information from observation, thought.
Editing and Publication.
Gatekeeping

40 Organizing/Indexing
Collecting and Integrating information.
Affects Data, Information and Metadata.
“Metadata” Describes data and information.
– More on this later.
Organizing Information.
– Types of organization?
Indexing

41 Storing/Retrieving
Information Storage
– How and Where is Information stored?
Retrieving Information.
– How is information recovered from storage
– How to find needed information
– Linked with Accessing/Filtering stage

42 Distribution/Networking
Transmission of information
– How is information transmitted?
Networks vs Broadcast.

43 Accessing/Filtering
Using the organization created in the O/I stage to:
– Select desired (or relevant) information
– Locate that information
– Retrieve the information from its storage location (often via a network)
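The select, locate, and retrieve steps can be sketched with a tiny inverted index standing in for the organization built in the O/I stage. This is one simple possibility, not the course's prescribed method; the document store, its contents, and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(store):
    """Organizing/indexing stage: map each word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in store.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def access(index, store, query):
    """Select the ids matching every query word, then retrieve those documents."""
    words = query.lower().split()
    if not words:
        return {}
    # Select and locate: intersect the id sets recorded for each query word.
    selected = set.intersection(*(index.get(word, set()) for word in words))
    # Retrieve: fetch the selected documents from their storage location.
    return {doc_id: store[doc_id] for doc_id in sorted(selected)}


if __name__ == "__main__":
    store = {  # hypothetical storage: document id -> text
        1: "shannon information theory",
        2: "library catalogs and indexes",
        3: "information retrieval indexes",
    }
    index = build_index(store)
    print(access(index, store, "information indexes"))
```

The index is consulted instead of scanning every document, which is the practical payoff of doing the organizing work up front.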

44 Using/Creating
Using Information.
Transformation of Information to Knowledge.
Knowledge to New Data and New Information.

45 Key issues in this course
How to find the appropriate information resources or information-bearing objects for someone’s (or your own) needs.
– Retrieving
How to describe information resources or information-bearing objects in ways so that they may be effectively used by those who need to use them.
– Organizing

46 Key Issues
[Diagram: cycle of Authoring/Modifying, Organizing/Indexing, Storing/Retrieval, Distribution/Networking, Accessing/Filtering, Using/Creating; phases Creation, Searching, Utilization; states Active, Semi-Active, Inactive; Retention/Mining, Disposition, Discard]

47 Next Week
Introduction to IR
The search process

