Towards an open library of relational metadata: the experience of RePEc (Research Papers in Economics)
Thomas Krichel, 2003-11-07
who am I? I was an economist. I was a leisure-time digital librarian –NetEc since 1993 –RePEc since 1997. I am "just another Perl hacker". I am a visionary –but I'm not like St. John the Baptist.
who is he?
he is "St. IGNUcius", a humorous creation of Richard M. Stallman (RMS). RMS is the father of the free software movement –a geek –a visionary. St. IGNUcius shows an emphasis on the moral case for free software, rather than the business case.
moral case and business case Other folks in the free software movement avoid the "f" word –free can mean cheap –cheap can mean bad. They stress the business case for free software. They use the term "open source software" (OSS).
RMS and us Amen, I tell you: we librarians need to learn more from the OSS movement. We need to make the concepts that come from free software more a part of our business. Let us look at a key concept: free software.
free software according to RMS Free software comes with four freedoms –The freedom to run the software, for any purpose –The freedom to study how the program works, and adapt it to your needs –The freedom to redistribute copies so you can help your neighbor –The freedom to improve the program, and release your improvements to the public, so that the whole community benefits
what has this to do with us? Just replace free software with free information. Libraries are about free information. But the analogy is not quite that simple. –When we talk about free information, we usually mean things that we can freely read (download…): free as in $0. –We do not usually mean free information as information we are free to do things with: free as in freedom.
moral and business There is a moral case for free information. –We rely on it. There is a business case for free information. –We need to make our own.
we rely on the moral case The citizen should be informed… Individuals in the organization should have free access… This is how we justify resources given to us. Often, members of the community who pay get privileged access.
from moral case to business case To form the business case for free information, think of "free information" as "freedom to do things" rather than $0. Thus libraries can make a crucial business case for themselves as agents that transform information. Recall that there are whole industries out there that produce free information.
Now for something different RePEc is an example of an Open Library. An Open Library is loosely defined as an application of the OSS principles to libraries –vague –in the making –but has some history. Looking at RePEc will fix ideas.
History It started with me as a research assistant in the Economics Department of Loughborough University of Technology. A predecessor of the Internet allowed me to download free software without effort, but academic papers had to be gathered in a painful way.
CoREJ published by HMSO –Photocopied contents tables of recently published economics journals received at the Department of Trade and Industry –Typed list of the working papers recently received by the University of Warwick library. The latter was the more interesting.
working papers Early accounts of research findings, published by economics departments –in universities –in research centers –in some government offices –in multinational administrations. Disseminated through exchange agreements. Important because of the 4-year publishing delay.
I planned to circulate the Warwick working paper list over listserv lists. I argued it would be good for them –increase incentives to contribute –increase revenue for ILL. After many attempts, Warwick refused. Towards the end of that time, I was offered a lectureship and decided to start working on my own collection.
1993: BibEc and WoPEc Fethy Mili of Université de Montréal had a good collection of papers and gave me his data. I put his bibliographic data on a gopher and called the service "BibEc" I also gathered the first ever online electronic working papers on a gopher and called the service "WoPEc".
NetEc consortium –BibEc: printed papers –WoPEc: electronic papers –CodEc: software –WebEc: web resource listings –JokEc: jokes –HoPEc. A lot of Ec!
WoPEc to RePEc WoPEc was a catalog record collection. WoPEc remained the largest web access point, but getting contributions was tough. In 1996 I wrote the basic architecture for RePEc –ReDIF –Guildford Protocol
1996: RePEc principle Many archives –archives offer metadata about digital objects (mainly working papers) One database –The data from all archives forms one single logical database despite the fact that it is held on different servers. Many services –users can access the data through many interfaces. –providers of archives offer their data to all interfaces at the same time. This provides for an optimal distribution.
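The "many archives, one database" principle can be sketched in Python. This is a minimal illustration, not RePEc's actual software. It assumes three things: each archive has already been parsed into a list of records (dicts) carrying a `Handle` field; every handle begins with `RePEc:<archive code>:`, as `RePEc:sur:surrec:9601` does; and the function name `build_database` is hypothetical.

```python
def build_database(archives):
    """Merge records held in many archives into one logical database.

    Hypothetical sketch: `archives` maps an archive code (e.g. "sur") to a
    list of record dicts, each with a "Handle". We assume every handle must
    begin with "RePEc:<archive code>:" so that archives cannot collide.
    """
    database = {}
    for code, records in archives.items():
        prefix = f"RePEc:{code}:"
        for record in records:
            handle = record["Handle"]
            if not handle.startswith(prefix):
                raise ValueError(f"{handle} is outside archive {code!r}")
            if handle in database:
                raise ValueError(f"duplicate handle {handle}")
            database[handle] = record
    return database


# Two archives contribute; the sample records are abbreviated.
archives = {
    "sur": [{"Handle": "RePEc:sur:surrec:9601",
             "Title": "Dynamic Aspect of Growth and Fiscal Policy"}],
    "edi": [{"Handle": "RePEc:edi:desuruk",
             "Primary-Name": "University of Surrey"}],
}
db = build_database(archives)

# Two hypothetical "services" read the same database in different ways.
paper_titles = [r["Title"] for r in db.values() if "Title" in r]
institution_names = [r["Primary-Name"] for r in db.values() if "Primary-Name" in r]
print(paper_titles)
print(institution_names)
```

In this sketch each archive controls only its own handle prefix. Any number of services can therefore load the same merged data without coordinating with each other.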
RePEc is based on 330+ archives –WoPEc –EconWPA –DEGREE –S-WoPEc –NBER –CEPR –US Fed in Print –IMF –OECD –MIT –University of Surrey –CO PAH
to form a 209k-item dataset –119,000 working papers –87,000 journal articles –1,000 software components –600 book and chapter listings –3,500 author contact and publication listings –7,300 institutional contact listings
RePEc is used in many services –BibEc and WoPEc –Decomate Z39.50 service –EconPapers –NEP: New Economics Papers –Inomics –RePEc author service –IDEAS –RuPEc –EDIRC –LogEc
… describes documents
Template-Type: ReDIF-Paper 1.0
Title: Dynamic Aspect of Growth and Fiscal Policy
Author-Name: Thomas Krichel
Author-Person: RePEc:per: :thomas_krichel
Author-
Author-Name: Paul Levine
Author-
Author-WorkPlace-Name: University of Surrey
Classification-JEL: C61; E21; E23; E62; O41
File-URL: ftp://www.econ.surrey.ac.uk/pub/RePEc/sur/surrec/surrec9601.pdf
File-Format: application/pdf
Creation-Date:
Revision-Date:
Handle: RePEc:sur:surrec:9601
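To make the record format concrete, here is a minimal Python sketch that reads templates like the paper template above into dictionaries. It is not the official ReDIF software. It handles only the simple one "Field: value" per line layout shown on this slide; the full ReDIF specification has further rules that this sketch does not attempt to follow. The function name `parse_redif` is hypothetical. The sample is an abbreviated copy of the record above.

```python
from collections import defaultdict


def parse_redif(text):
    """Split ReDIF-style text into templates of field -> list of values.

    Simplified sketch: one "Field: value" pair per line, a new template
    begins at each Template-Type line, and repeated fields (such as two
    Author-Name lines) keep all their values in order. Lines without a
    colon, and fields seen before any Template-Type, are ignored.
    """
    templates = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so URLs and handles stay intact.
        field, value = (part.strip() for part in line.split(":", 1))
        if field == "Template-Type":
            current = defaultdict(list)
            templates.append(current)
        if current is not None:
            current[field].append(value)
    return [dict(t) for t in templates]


sample = """\
Template-Type: ReDIF-Paper 1.0
Title: Dynamic Aspect of Growth and Fiscal Policy
Author-Name: Thomas Krichel
Author-Name: Paul Levine
Classification-JEL: C61; E21; E23; E62; O41
File-URL: ftp://www.econ.surrey.ac.uk/pub/RePEc/sur/surrec/surrec9601.pdf
Handle: RePEc:sur:surrec:9601
"""

paper = parse_redif(sample)[0]
print(paper["Author-Name"])  # ['Thomas Krichel', 'Paul Levine']
jel_codes = [c.strip() for c in paper["Classification-JEL"][0].split(";")]
print(jel_codes)             # ['C61', 'E21', 'E23', 'E62', 'O41']
```

Keeping every value in a list means a field can repeat without losing data. This matters for multi-author papers.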
… describes institutions
Template-Type: ReDIF-Institution 1.0
Primary-Name: University of Surrey
Primary-Location: Guildford
Secondary-Name: Department of Economics
Secondary-Phone: (01483)
Secondary-
Secondary-Fax: (01483)
Secondary-Postal: Guildford, Surrey GU2 5XH
Secondary-Homepage:
Handle: RePEc:edi:desuruk
what do open libraries do? Identify records. Relate identified records. These actions require human control. They prepare for the assessment of performance.
key to success Have a small group of volunteers. Disseminate as widely as possible. Demonstrate to authors and institutions that it works for them –institutional registration –author registration
institutional registration It started with one sad geezer making a list of departments that had a web site. I persuaded him that his data would be more widely used if integrated into the RePEc database. Now he is a happy geezer and one of our three crucial volunteers.
author registration It started when funding allowed us to hire a crazy programmer to write an author registration system. The system went online as "HoPEc" in late … and has since been renamed "RePEc author service" (RAS). A recent grant from OSI allows for a rewrite and expansion.
RePEc author service RePEc document data has author names as strings. The authors register with RAS to list their contact details and identify the papers they wrote. This is classic authority control, but done by the authors. In a ranking of the 800 most important economists, 400 are registered with RAS.
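The way author registration turns name strings into identified links can be sketched as a two-step process. This is an assumption about the workflow, not a description of the RAS code. The service suggests papers whose Author-Name strings match name variants the author supplies. Only the suggestions the author confirms become links between the person and the papers. The function names, the person identifier and the `RePEc:hyp:…` handles are hypothetical.

```python
def suggest_papers(name_variants, papers):
    """Return handles of papers listing any of the author's name variants.

    `papers` maps a paper handle to its list of Author-Name strings.
    Matching is an exact, case-insensitive string comparison.
    """
    wanted = {name.casefold() for name in name_variants}
    return sorted(
        handle for handle, authors in papers.items()
        if any(author.casefold() in wanted for author in authors)
    )


def confirm_claims(person_id, suggestions, accepted_by_author):
    """Keep only the suggestions the author accepted, as (person, paper) links."""
    return {(person_id, handle) for handle in suggestions
            if handle in accepted_by_author}


papers = {
    "RePEc:sur:surrec:9601": ["Thomas Krichel", "Paul Levine"],
    "RePEc:hyp:wpaper:0001": ["T. Krichel"],      # hypothetical record
    "RePEc:hyp:wpaper:0002": ["Thomas Krichel"],  # hypothetical namesake's paper
}
suggested = suggest_papers(["Thomas Krichel", "T. Krichel"], papers)
links = confirm_claims("hypothetical-person-id", suggested,
                       accepted_by_author={"RePEc:sur:surrec:9601",
                                           "RePEc:hyp:wpaper:0001"})
print(suggested)
print(sorted(links))
```

Here the namesake's paper is suggested but never linked, because the author did not accept it. The author's own check is what turns string matches into identified records.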
authors' incentives Authors perceive the registration as a way to achieve common advertising for their papers. Author records are used to aggregate usage logs across RePEc user services for all papers of an author. This stimulates an "I am bigger than you are" mentality. Size matters!
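Aggregating usage by author works in two steps. First sum the per-paper counts from every service, then add them up over each author's claimed papers. A minimal sketch follows. The log format (a mapping from paper handle to download count for each service), the function name `rank_authors`, the author ids and all counts are invented for illustration.

```python
from collections import Counter


def rank_authors(claims, service_logs):
    """Rank authors by downloads of their claimed papers across all services.

    claims:       author id -> set of paper handles the author claimed
    service_logs: service name -> {paper handle: download count}
    Returns (author id, total downloads) pairs, largest total first.
    """
    per_paper = Counter()
    for log in service_logs.values():
        per_paper.update(log)  # Counter.update adds the counts per handle
    totals = {
        author: sum(per_paper[handle] for handle in handles)
        for author, handles in claims.items()
    }
    # Sort by descending total, then by author id to break ties.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Invented numbers, for illustration only.
service_logs = {
    "IDEAS": {"RePEc:sur:surrec:9601": 30, "RePEc:hyp:wpaper:0001": 5},
    "EconPapers": {"RePEc:sur:surrec:9601": 12},
}
claims = {
    "author-a": {"RePEc:sur:surrec:9601"},
    "author-b": {"RePEc:sur:surrec:9601", "RePEc:hyp:wpaper:0001"},
}
print(rank_authors(claims, service_logs))  # [('author-b', 47), ('author-a', 42)]
```

In this sketch a co-authored paper counts in full for each of its authors. That is one possible design choice, not necessarily how RePEc services weight co-authorship.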
KEY idea 1 RePEc attracts a community of users and contributors. The community itself is the focus of attention. RePEc describes the living rather than the dead. Forget about documents!
KEY idea 2 Forget about users! Disseminate widely. Users will come through Google anyway. And Google loves RePEc services –it puts RePEc services at the top when the query consists of the name of an author.
open library idea: serials data Serial-level information is a crucial component of academic library data. Idea: build and maintain free serial records. Two ways to build them: –Use volunteers and collect in a decentralized way. –Make an expensive central collection, disseminate it well, and charge $$$ for record changes later.
another open library idea: law Many legal texts are de jure free. De facto, there are two companies that have comprehensive collections and charge a lot of money for the free information bundled with proprietary information. Our moral case calls for a replacement! (It will also create jobs for us.)
free legal open library Have all laws and cases –online as text –identified & related. Have citation metadata, so that legal citations can be verified while composing case data. Have a registration procedure to verify the integrity of data.
open library idea II: drugs Collect data on the composition of all drugs –drug composition reported by drug companies, using open archives –drug components documented by governments, using an open archive. The open library brings the two together!
Am I crazy? Money does not make the world go round. Ideas do. When RMS proposed a free replacement for UNIX in the early 80s, most people dismissed the idea. Today it is a reality! Similarly, when I started to work on RePEc, a totally free and improved A&I dataset, in 1993, nobody gave it a high probability of success. It is a reality!
obstacles to open libraries –lack of imagination & entrepreneurship –inability to form alliances –user-centered thinking –document-centered thinking –technical competence required (OAI-PMH, XML and XML Schema, Unicode) –the "C" word
what I do for open libraries Create an open library for library science: the rclis (reckless) dataset. Create a supporting organization: the open library society. co-workers welcome!