OAIster: What’s with the Weird Name? Kat Hagedorn UM Library Information Technology November 28, 2005.

1 OAIster: What’s with the Weird Name? Kat Hagedorn UM Library Information Technology November 28, 2005

2 What is OAIster?  Is/was a means for UM to test the OAI protocol… (hence the name)  A method for sharing metadata among institutions and groups of people  A means of developing a search service for end-users worldwide

3 Basics of OAI

4 What does OAIster collect?  Harvests all metadata from all OAI data providers (within reason)  Only keeps metadata that points to digital objects, e.g., articles, photographs, datasets, etc. in digitized form  All available via search service…
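As a rough illustration of the "only keeps metadata that points to digital objects" rule, here is a minimal Python sketch. It assumes each harvested record has already been parsed into a dictionary with an `identifier` list, and it treats any http(s) URL in that list as evidence of a digital object. Both the record shape and the test are assumptions for illustration; the slides do not say how OAIster actually makes this decision.

```python
from urllib.parse import urlparse


def points_to_digital_object(identifiers):
    """Return True if any identifier looks like a web URL (assumed criterion)."""
    for value in identifiers:
        parsed = urlparse(value.strip())
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return True
    return False


# Hypothetical, hand-made records; not real harvested data.
records = [
    {"title": "Scanned letter", "identifier": ["http://example.org/letter/1"]},
    {"title": "Catalog entry only", "identifier": ["QC793 .P37"]},
]

# Keep only the records that appear to link to something online.
kept = [r for r in records if points_to_digital_object(r["identifier"])]
```

A real filter would probably need more than this, since a URL can just as easily point to a catalog page as to the object itself.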

5 Searching OAIster  Time to show off OAIster…  http://www.oaister.org/

6 A little history  Service is now 3.5 years old  Started with 66 data providers and a little over 200K records  Now have 572 data providers and “a little” over 6 million records  37% US, 63% international

7 Visibility of OAI  Surprising who hasn’t made their metadata shareable through OAI  Harvard, Yale, Stanford…the big ones  Initially perplexing, but now clearer:  always done at the end  only recently thought of at initiation of projects  truthfully, many institutions not collaborative…

8 Examples of data providers  Many data providers are huge, e.g.,  arXiv: physics preprint and postprint articles  pubmed: medical articles, although restricted  pictureaustralia: images from government and academic institutions in Australia  lcoa: Library of Congress digital archives  usc: University of Southern California census data

9 Examples of data providers  Most are small, though  Many around 100 records  Value of making their records available  increased visibility  inclusion in bigger search service than theirs  incorporation in Yahoo! Search

10 Yahoo! Search  Two years ago, collaborated with team at Yahoo! Search to send our metadata to them for indexing  e.g., “gardens at albury” in Yahoo! Search  know it’s not static html roboting  IspartOf Victorian Railways collection.  Many, many more hits  Also send metadata to Google

11 System design (diagram labels): UM harvester; Record storage; XSLT transformation tool; BibClass indexes; OAI-enabled DC records; Non-OAI-enabled DC records; XSL stylesheets (per source type); Search interface (XPAT)

12 Transformation of metadata  Most metadata needs to be brushed off  adding an http:// to the front of URLs  Or raked  removing instances of <![CDATA[  Or wrung out  instead of “Where’s Waldo,” it’s “Where’s the incorrect UTF-8 character?”  And should be normalized…
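The three kinds of cleanup named on this slide can be sketched in Python as follows. This is only an illustration: the system design slide lists an XSLT transformation tool with per-source stylesheets, not Python, and the function names here are hypothetical. Removing the closing `]]>` along with the opening `<![CDATA[` is also an assumption, since the slide mentions only the opening marker.

```python
import re


def add_scheme(url):
    """'Brush off': prepend http:// when a URL has no scheme at all."""
    url = url.strip()
    if re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", url):
        return url
    return "http://" + url


def strip_cdata(text):
    """'Rake': drop CDATA markers that leaked into a field value."""
    return text.replace("<![CDATA[", "").replace("]]>", "")


def decode_utf8_lenient(raw):
    """'Wring out': decode bytes, replacing invalid UTF-8 with U+FFFD."""
    return raw.decode("utf-8", errors="replace")
```

Replacing bad bytes with U+FFFD at least makes the damage visible; finding and repairing the original character is the harder "Where's Waldo" problem the slide alludes to.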

13 Why normalize?  Sample date values:  <date>2-12-01</date>  <date>2002-01-01</date>  <date>0000-00-00</date>  <date>1822</date>  <date>between 1827 and 1833</date>  <date>18--?</date>  <date>November 13, 1947</date>  <date>SEP 1958</date>  <date>235 bce</date>  <date>Summer, 1948</date>
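To make the normalization problem concrete, here is a minimal Python sketch that pulls a single four-digit year out of values like the samples above. It is an assumed, deliberately naive rule, not OAIster's actual normalization: it takes the first standalone four-digit number, treats year 0000 as missing, and returns None for anything it cannot interpret (for example `2-12-01`, `18--?`, or `235 bce`).

```python
import re


def extract_year(value):
    """Return the first standalone four-digit year in value, or None."""
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", value.strip())
    if match is None:
        return None
    year = int(match.group(1))
    # 0000-00-00 is a placeholder, not a real date.
    return year if year != 0 else None


samples = ["2002-01-01", "between 1827 and 1833", "SEP 1958", "18--?"]
years = [extract_year(s) for s in samples]
```

Even this simple rule loses information: a range such as "between 1827 and 1833" collapses to its first year, which is one reason normalization needs per-provider judgment.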

14 Why use a CV?  Sample subject values:  <subject>30,51,52</subject>  <subject>1852, Apr. 22. E[veritt] Judson, letter to Philuta [Judson].</subject>  <subject>Slavery--United States--Controversial literature</subject>  <subject>view of interior with John Henry sculpture</subject>  <subject>Particles (Nuclear physics) -- Research.</subject>

15 Best practices  Fixing more than half of the data providers is cumbersome  Individuals at OAI-enabled institutions started a “Best Practices” group to inform data providers what they ought to do  http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?TableOfContents

16 2nd phase OAI  “Best Practices” group sponsored by the Digital Library Federation, which also…  Sponsors our latest grant  Better and more easily calculated statistics  Search interface improvements  Clustering / classification techniques  Using richer metadata

17 Clustering / classification  Using automated means to take a selection of metadata and determine “what it’s about”  Working with Emory University (one of our grant partners) to test their tool  Results will be integrated into search so users can search within a smaller group of OAIster records

18 Using richer metadata  Data providers must use simple Dublin Core  Very sparse schema for describing objects  dc:title must contain main title, sorted title and alternative titles  dc:subject doesn’t distinguish between geographical, hierarchical, temporal…

19 Using richer metadata  Encouraging use of richer metadata, especially MODS (Metadata Object Description Schema) from LOC  Developed testbed for grant deliverables  currently only shows MODS work…  http://www.hti.umich.edu/m/mods/

20 Other stuff  Well, make it smaller somehow…  Clean up Boolean interface  squinch fields together  include more normalization  Make it available through federated search  Proselytize sharing metadata  Test, test, test

21 Contact me  Kat Hagedorn  UM Library Information Technology  khage@umich.edu  www.oaister.org

