1 Cataloging and Metadata: What does the Future Hold – Issues and Perspectives
Michael Norman, Head of Content Access Management, University of Illinois at Urbana-Champaign


3 Please, not another "how we do it good"!
It will be okay. I've got a lot of good things to show you and, hopefully, to advance the discussion on these important issues and solutions. I will recount some of the successes we have had and detail some of the mistakes we have made this past year or so. Good, quality, shareable metadata is so damn important.

4 @ UIUC Library
We have learned a lot through the various digitization projects we have been involved with, including the Open Content Alliance, the Illinois Harvest project, and the Google Digitization Project starting in the next few months. We have learned quite a bit about cataloging and metadata, access systems, search and metasearch, digital preservation, and better ways to make all this information findable.

5 Conversation about the ILS
Today, I want to have a dialogue with you about where we think we are concerning online catalogs, metadata, other access options outside the library world, and metasearch, and about where we go from here to offer better search for our users. We are at a critical stage. Our online catalog is not very good at allowing users to find what they seek. There are better options.

7 Problems with our online catalogs
- Search is difficult
- Does not include many of the available resources in our collections, including images, digital collections, many of our electronic resources, and archival materials
- Our metadata does not include much of the pertinent information needed to make a judgment about a resource
- Our metadata is hard to discern

9 Better Options than our Online CATS
I'm almost at a point where I'd advise our users at the University of Illinois at Urbana-Champaign to begin their search outside our online catalog (particularly at Microsoft Live Book Search, Amazon, and Google Book Search). Then, after she or he gets results, come back and search our catalog to see if we have it (either digital or print version). It does not make me very happy to say that.

10 Why search elsewhere first?
- User can evaluate a resource much better through search at Microsoft, Amazon, and Google
- Information such as tables of contents, indices, bibliographies, cover images, cover data, summaries, biographies, reviews, etc. makes it easier to determine whether a resource helps one's research or not

12 Key Phrases – Amazon's CAPs and SIPs

14 Amazon's CAPs and SIPs
- Capitalized Phrases (CAPs) are people, places, events, or important topics mentioned frequently in a book.
- Statistically Improbable Phrases (SIPs) are the most distinctive phrases in the text of books in Amazon's Search Inside the Book program.
- To identify SIPs, Amazon scans the text of all books in the Search Inside program. If a phrase occurs a large number of times in a particular book relative to all Search Inside! books, that phrase is a SIP in that book.
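The SIP idea described above (a phrase that is frequent in one book relative to the whole collection) can be illustrated with a small sketch. This is not Amazon's actual method, which is not described in the talk; it assumes two-word phrases, a simple ratio of relative frequencies as the score, and a minimum of two occurrences in the book. The function names and sample texts are invented for illustration.

```python
import re
from collections import Counter


def phrases(text, n=2):
    """Return the lowercase word n-grams found in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def improbable_phrases(book, other_books, n=2, top=3):
    """Rank phrases by how over-represented they are in `book`.

    Assumption: the score is (rate in this book) / (rate in the whole
    collection, this book included). Higher means more distinctive.
    """
    book_counts = Counter(phrases(book, n))
    corpus_counts = Counter(book_counts)
    for text in other_books:
        corpus_counts.update(phrases(text, n))

    book_total = sum(book_counts.values())
    corpus_total = sum(corpus_counts.values())

    scores = {}
    for phrase, count in book_counts.items():
        if count < 2:  # ignore phrases that appear only once in the book
            continue
        book_rate = count / book_total
        corpus_rate = corpus_counts[phrase] / corpus_total
        scores[phrase] = book_rate / corpus_rate

    # Sort by score (highest first), breaking ties alphabetically.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top]


book = ("the prince must learn the art of war "
        "the prince must avoid flattery the art of war matters")
others = [
    "the cat sat on the mat and the dog sat on the log",
    "the art of cooking is the art of patience",
]
print(improbable_phrases(book, others))
```

Phrases such as "the art" score lower here because they also appear in another book in the collection, which is the intuition behind "statistically improbable".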

15 Machiavelli in Amazon

16 CAPs, Search in the Book, etc.

17 Interlinking of Citations

18 Amazon's Inside the Book

19 Microsoft Live Books

20 Microsoft Live – Inside Book – Index

21 Microsoft Live Books – Bibliographies

22 Google Book Search

23 Google Book Search – Full text content

24 Google – Publisher supplied

25 Google – TOCs, Summaries

26 Google – References from books, articles, related books

27 Google's Metadata Records

28 Google's Metadata Records (continued)

29 Multiple sources of data
- Amazon, Microsoft, and Google are getting this data from various sources, including publishers, vendors such as Bowker, digitization of materials, and harvesting of metadata from evaluative sources
- Millions of items with full-text or partial full-text content

31 Still far behind in breadth of collection
Amazon, Google, and Microsoft still don't have it right. When we do a search, we are searching everything. If you do a search in Microsoft, it is searching across the entire body of full-text content. It is hard to do an advanced search by title, author, series title, publisher, etc. They do not have the breadth of titles or sources that we have or that OCLC WorldCat has. We have a couple hundred years of collecting on them. In 5 to 6 years, yes, they probably will. Eventually, users may be able to search across 60 million full-text resources.

32 Why Amazon, Microsoft, Google?
Why am I showing what Amazon, Microsoft, and Google are doing in regard to search? To make us all feel bad. Maybe. Just a little. Really, to show alternatives to our online catalogs: what is out there. But also to show us some of the opportunities, how we can do better. Central to this is metadata: creating surrogate records that help lead users to what they want.

33 UIUC work with Open Content Alliance

34 Examples of digitized books

35 Downloading of resources

36 The present

37 NCSU Endeca Catalog

38 Vanderbilt's Primo

39 Vanderbilt Primo title level

40 Oklahoma State's Aquabrowser

41 Title level – Aquabrowser

42 Aquabrowser – Searchable TOCs and Summaries

43 UIUC Various Access Systems
- Voyager ILS
- CONTENTdm – digital images
- DSpace – IDEALS, Illinois institutional repository
- DLXS – digital text
- Olive – newspapers and serials
- Online Research Resources (ORR) – local electronic resources management system
- Discover/SFX OpenURL knowledge base

44 Metasearch – Is it the answer?

45 UIUC's Information Gateway

46 Easy Search Results (metasearch)

47 Illinois Harvest – metasearch across formats from OAI Harvesting

48 Illinois Harvest – results with images, learning objects, digitized books, and streaming audio

49 Positives
- They are pulling in metadata from multiple sources, including publishers, intermediate vendors, and digitization projects
- They are adding value, such as Google Maps and textual analysis
- We are still cataloging for a surrogate-record environment, and we have got to move beyond that quickly
- We do not have the metadata structures to pull in and incorporate much of the data that is out there: the metadata that Amazon, Microsoft, and Google are bringing to bear

50 Possibilities
- We have access to the same sources of metadata
- We can get ONIX feeds from publishers
- We can harvest tables of contents, indexes, and bibliographies from the works we are digitizing
- We can add cover images, book reviews, summaries, and abstracts
- We can crunch data and perform data mining as well as they can
- With the help of OCLC, we can layer such applications as WorldCat Identities and authority control on top of all this

51 WorldCat Identities

52 WorldCat Identities – Machiavelli

53 WorldCat Identities Display

54 Identities – Continued

56 Metadata
- MARC records still have a role to play, but MARC cannot be the only game in town anymore
- It is not a flexible enough structure or standard to accommodate researchers' needs, especially with the technological opportunities we have today
- It cannot accommodate much of the data we need to produce interconnectivity (linking) between resources

57 MARC – Where are we at now?
- Libraries: we still do most of our cataloging in MARC
- Other viable schemas: Dublin Core (both Simple and Qualified), MODS, MARCXML
- Preservation metadata schemas (such as PREMIS)
- Content standards (such as AACR2 and CCO)
- Controlled vocabularies (such as LCSH, TGN, AAT, and other applicable vocabularies)
- Transmission standards such as METS

58 ONIX (Online Information Exchange)
- ONIX is a standard format that publishers use to distribute electronic information about their books to wholesale, e-tail, and retail booksellers, and to other publishers
- Standard XML template for organizing data storage
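To make the ONIX description above concrete, here is a minimal sketch of reading a publisher feed. The sample record is invented and heavily simplified. Its element names follow ONIX 2.1 reference tags as best I understand them (for example, ProductIDType 15 for ISBN-13 and ContributorRole A01 for author), so check them against the official specification before relying on them. The function name, record reference, and ISBN are placeholders.

```python
import xml.etree.ElementTree as ET

# Invented, simplified ONIX-style message (no namespace, one product).
SAMPLE_ONIX = """<ONIXMessage>
  <Product>
    <RecordReference>example.org.0001</RecordReference>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780000000002</IDValue>
    </ProductIdentifier>
    <Title>
      <TitleType>01</TitleType>
      <TitleText>An Invented Title</TitleText>
    </Title>
    <Contributor>
      <ContributorRole>A01</ContributorRole>
      <PersonName>Jane Example</PersonName>
    </Contributor>
  </Product>
</ONIXMessage>"""


def summarize_products(onix_xml):
    """Pull a few descriptive fields out of each Product element."""
    root = ET.fromstring(onix_xml)
    summaries = []
    for product in root.iter("Product"):
        # Map identifier type code -> identifier value.
        ids = {
            pid.findtext("ProductIDType"): pid.findtext("IDValue")
            for pid in product.findall("ProductIdentifier")
        }
        summaries.append({
            "reference": product.findtext("RecordReference"),
            "isbn13": ids.get("15"),
            "title": product.findtext("Title/TitleText"),
            "contributors": [
                c.findtext("PersonName")
                for c in product.findall("Contributor")
            ],
        })
    return summaries


print(summarize_products(SAMPLE_ONIX))
```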

59 Metadata Encoding & Transmission Standard (METS)
- The METS schema provides a flexible mechanism for encoding descriptive, administrative, and structural metadata for a digital library object, and for expressing the complex links between these various forms of metadata
- Provides a useful standard for the exchange of digital library objects between repositories
- METS provides the ability to associate a digital object with behaviours or services
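As a rough illustration of how METS ties descriptive metadata to files and structure, the sketch below builds a tiny METS document for a digitized book. It covers only a descriptive section, a file section, and a structural map, and it is not validated against the METS schema. The function name, identifiers, and URLs are made up for the example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)
ET.register_namespace("dc", DC_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(object_id, title, page_urls):
    """Wrap a title and a list of page-image URLs in a minimal METS tree."""
    root = ET.Element(m("mets"), {"OBJID": object_id})

    # Descriptive metadata: a single Dublin Core title inside mdWrap/xmlData.
    dmd = ET.SubElement(root, m("dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, m("mdWrap"), {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, m("xmlData"))
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title

    # File inventory and structural map, linked through FILEID pointers.
    file_sec = ET.SubElement(root, m("fileSec"))
    group = ET.SubElement(file_sec, m("fileGrp"), {"USE": "page images"})
    struct = ET.SubElement(root, m("structMap"), {"TYPE": "physical"})
    book = ET.SubElement(struct, m("div"), {"TYPE": "book", "DMDID": "DMD1"})

    for number, url in enumerate(page_urls, start=1):
        file_id = f"FILE{number}"
        file_el = ET.SubElement(group, m("file"), {"ID": file_id})
        ET.SubElement(file_el, m("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url})
        page = ET.SubElement(book, m("div"),
                             {"TYPE": "page", "ORDER": str(number)})
        ET.SubElement(page, m("fptr"), {"FILEID": file_id})
    return root


mets = build_mets("example:0001", "An Invented Title",
                  ["https://example.org/p1.jpg", "https://example.org/p2.jpg"])
print(ET.tostring(mets, encoding="unicode"))
```

The same wrapper could in principle carry a MARCXML or MODS record in place of the Dublin Core title by changing what goes inside the xmlData element.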

60 Interconnectivity
- We can start to create a search environment that allows one to move from a citation to full-text content, to other works about or cited within a work, and on to the next full-text resource
- Each year over the next 7 years, we will be able to move from full-text content to full-text content
- Moving from bibliography to bibliography, citation to citation; OpenURL can show us the way
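One way to picture the citation-to-resource linking mentioned above is an OpenURL built from a citation. The sketch below assembles an OpenURL 1.0 (Z39.88-2004) link in key/encoded-value form for a book. The resolver address is a placeholder, not a real service, and the function name is invented.

```python
from urllib.parse import urlencode


def book_openurl(resolver_base, title, author=None, isbn=None, year=None):
    """Build an OpenURL 1.0 key/encoded-value link for a book citation."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
        ("rft.btitle", title),
    ]
    # Optional citation elements are included only when known.
    if author:
        params.append(("rft.au", author))
    if isbn:
        params.append(("rft.isbn", isbn))
    if year:
        params.append(("rft.date", str(year)))
    return resolver_base + "?" + urlencode(params)


# Hypothetical resolver address, used only for illustration.
link = book_openurl("https://resolver.example.edu/openurl",
                    "The Prince", author="Machiavelli, Niccolo")
print(link)
```

A link resolver receiving such a request can then decide where the user's library holds the item, in print or online.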

61 Automating Metadata Generation
I'm the chair of the Automating Metadata Generation Task Force formed by the ALCTS Big Heads of Technical Services, and we will have a white paper out this fall outlining the capabilities and possibilities of automating the creation of metadata records. And, yes, we can automate many of our processes for creating metadata.

62 Our structures and standards cannot support this presently
- Can't fit a lot of this data into a MARC record
- No real standards for indexes, tables of contents, citations, bibliographies; the mark-up languages can accommodate this
- To easily pull these valuable data from a resource, we need to be able to easily identify and harvest them
- Can get this data from publishers for recent publications and pull it from digitization projects for older materials
- Pull together using a metadata record, ONIX, and a METS wrapper

63 New Systems
- Need a system that can read MARC and XML or can easily convert MARC to MARCXML
- Allows search across surrogate records and full-text content
- Relevancy ranking
- User can easily discern different formats pulled in through metasearch (monographs, articles, images, datasets, citations, etc.)
- Strong structured search and also powerful keyword indexing
- Easy to determine how best to get this piece of information (e.g., Open WorldCat)
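The MARC-to-MARCXML conversion in the first requirement above is largely mechanical, which the following sketch tries to show. It assumes the MARC record has already been parsed into a plain Python dictionary (reading binary MARC is not covered here) and serializes it with the MARCXML "slim" element names. The dictionary layout, function name, and control number are invented for the example.

```python
import xml.etree.ElementTree as ET

MARCXML_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARCXML_NS)


def _tag(name):
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{MARCXML_NS}}}{name}"


def record_to_marcxml(record):
    """Serialize an already-parsed MARC record (a dict) as a MARCXML tree."""
    root = ET.Element(_tag("record"))
    ET.SubElement(root, _tag("leader")).text = record["leader"]
    for tag, value in record.get("controlfields", []):
        ET.SubElement(root, _tag("controlfield"), {"tag": tag}).text = value
    for tag, ind1, ind2, subfields in record.get("datafields", []):
        field = ET.SubElement(root, _tag("datafield"),
                              {"tag": tag, "ind1": ind1, "ind2": ind2})
        for code, value in subfields:
            ET.SubElement(field, _tag("subfield"), {"code": code}).text = value
    return root


# Illustrative record; the control number is a placeholder.
SAMPLE_RECORD = {
    "leader": "00000nam a2200000 a 4500",
    "controlfields": [("001", "example0001")],
    "datafields": [
        ("100", "1", " ", [("a", "Machiavelli, Niccolò,"), ("d", "1469-1527.")]),
        ("245", "1", "4", [("a", "The prince /"), ("c", "Niccolò Machiavelli.")]),
    ],
}

print(ET.tostring(record_to_marcxml(SAMPLE_RECORD), encoding="unicode"))
```

Because every MARC field maps one-to-one onto an XML element, no information is lost, but none of MARC's structural limits are lifted either.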

64 New Systems (continued)
- Ability to harvest data from multiple sources
- Ability to keep this data current and accurate
- Ability to track changes to this data, ensuring we always keep the best
- Have to automate a lot of these processes
- Technologies exist to allow us to do it
- Collaboration