RePEc, a case to illustrate the evolution and future trends of repositories and open access Thomas Krichel 2007-01-15.

RePEc: an early example of an open library Thomas Krichel


2 Acknowledgment I am grateful to Thomas Baiget for organizing today's event, for his hospitality here in Barcelona, and for the title. –It is not quite accurate for my current talk –I will try to adjust

3 open access Open access is very fashionable. It is essentially about publishing –don't charge the reader (with money) –charge the author (or provider) My work is not primarily about open access. It's more about open libraries.

4 Open libraries An Open Library is loosely defined as an application of the open source software principles to library data. –vague –in the making –but has some history My main work has been on open libraries for scholarly communication.

5 Open libraries and scholarly communication RePEc is an example of an Open Library. Looking at RePEc will fix ideas. It is an open library –for scholarly communication –in Economics.

6 scholarly communication Is mainly about scholars communicating –between themselves –to students, occasionally Thus it is essentially a community activity. Traditionally, there have been two intermediaries acting as external agents. –libraries –publishers

7 when tradition ends Two external shocks –The Internet arrives and reduces distribution costs to zero –Computer technology arrives and reduces storage costs somewhat The opportunity sets of community members and external agents increase. Proposition: the future depends much on what the community members decide. External agents have little impact.

8 discipline communities Scholars of various disciplines have varying habits of research, publication, and evaluation. It is likely that the Internet will emphasize those differences rather than reduce them.

9 examples: disciplines with established informal publishing Preprint communities –Physics –Mathematics, partially Working paper communities –Computer Science CiteSeer (working paper disappearing) –Economics RePEc

10 change is tough Change has to come from inside the discipline. There has to be a pioneering individual who –is technically well versed –is managerially smart –has extraordinary forward thinking –is willing to take considerable risk with her career Ginsparg, Krichel, Giles & Lawrence are rare.

11 RePEc History It started with me as a research assistant in the Economics Department of Loughborough University of Technology in 1990. A predecessor of the Internet allowed me to download free software without effort, but academic papers had to be gathered in a painful way.

12 CoREJ published by HMSO –Photocopied contents tables of recently published economics journals received at the Department of Trade and Industry –Typed list of the working papers recently received by the University of Warwick library The latter was the more interesting.

13 working papers Early accounts of research findings, published by economics departments –in universities –in research centers –in some government offices –in multinational administrations Disseminated through exchange agreements. Important because of the four-year publishing delay.

14 1991-1992 I planned to circulate the Warwick working paper list over listserv lists. I argued it would be good for them –increase incentives to contribute –increase revenue for ILL After many trials, Warwick refused. Toward the end of that time, I was offered a lectureship and decided to start working on my own collection.

15 1993: BibEc and WoPEc Fethy Mili of Université de Montréal had a good collection of papers and gave me his data. I put his bibliographic data on a gopher and called the service "BibEc". I also gathered the first ever online electronic working papers on a gopher and called the service "WoPEc".

16 NetEc consortium –BibEc: printed papers –WoPEc: electronic papers –CodEc: software –WebEc: web resource listings –JokEc: jokes –HoPEc A lot of Ec!

17 WoPEc to RePEc WoPEc was a catalog record collection. WoPEc remained the largest web access point, but getting contributions was tough. In 1996 I wrote the basic architecture for RePEc. –ReDIF –Guildford Protocol

18 1997: RePEc principle Many archives –archives offer metadata about digital objects (mainly working papers) One database –The data from all archives forms one single logical database despite the fact that it is held on different servers. Many services –users can access the data through many interfaces. –providers of archives offer their data to all interfaces at the same time. This provides for an optimal distribution.
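The "many archives, one database, many services" principle can be sketched in a few lines. This is only an illustration under stated assumptions: the archive code `xxx`, its record, and the two service functions are hypothetical, and real RePEc archives are remote servers rather than in-memory lists.

```python
# Hypothetical archives: each one offers metadata records about its own papers.
archives = {
    "sur": [{"handle": "RePEc:sur:surrec:9601",
             "title": "Dynamic Aspect of Growth and Fiscal Policy"}],
    "xxx": [{"handle": "RePEc:xxx:wpaper:0001",   # made-up archive and paper
             "title": "A Hypothetical Paper"}],
}

def build_database(archives):
    """Merge the records of all archives into one logical database keyed by handle."""
    database = {}
    for records in archives.values():
        for record in records:
            database[record["handle"]] = record
    return database

def listing_service(database):
    """One service: an alphabetical list of all titles."""
    return sorted(record["title"] for record in database.values())

def search_service(database, word):
    """Another service over the same data: handles whose title contains a word."""
    word = word.lower()
    return [handle for handle, record in sorted(database.items())
            if word in record["title"].lower()]

database = build_database(archives)
print(listing_service(database))
print(search_service(database, "growth"))
```

Both services read the same merged data, so an archive that contributes a record once is visible in every interface.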

19 RePEc is based on 670+ archives –WoPEc –EconWPA –DEGREE –S-WoPEc –NBER –CEPR –Blackwell –US Fed in Print –IMF –OECD –MIT –University of Surrey –CO PAH –Elsevier

20 to form a 451k item dataset –199,000 working papers –247,000 journal articles –1,450 software components –2,600 book and chapter listings –11,600 author contact and publication listings –10,100 institutional contact listings

21 RePEc is used in many services –Econpapers –Decomate –Z39.50 service –NEP: New Economics Papers –Inomics –RePEc author service –IDEAS –RuPEc –EDIRC –LogEc –CitEc

22 … describes documents
Template-Type: ReDIF-Paper 1.0
Title: Dynamic Aspect of Growth and Fiscal Policy
Author-Name: Thomas Krichel
Author-Person: RePEc:per:1965-06-05:thomas_krichel
Author-Email:
Author-Name: Paul Levine
Author-Email:
Author-WorkPlace-Name: University of Surrey
Classification-JEL: C61; E21; E23; E62; O41
File-URL: pub/RePEc/sur/surrec/surrec9601.pdf
File-Format: application/pdf
Creation-Date: 199603
Revision-Date: 199711
Handle: RePEc:sur:surrec:9601
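To show how such a template can be read by software, here is a minimal sketch in Python. It assumes one "Field: value" pair per line and ignores everything else the ReDIF specification may allow (continuation lines, field grouping); the function name `parse_template` is made up for this illustration.

```python
from collections import defaultdict

def parse_template(text):
    """Parse 'Field: value' lines into a dict of lists keyed by lowercased field name.

    Simplifying assumption: every field sits on a single line.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split(":", 1)  # split on the first colon only
        fields[name.strip().lower()].append(value.strip())
    return dict(fields)

sample = """Template-Type: ReDIF-Paper 1.0
Title: Dynamic Aspect of Growth and Fiscal Policy
Author-Name: Thomas Krichel
Author-Name: Paul Levine
Classification-JEL: C61; E21; E23; E62; O41
Handle: RePEc:sur:surrec:9601"""

record = parse_template(sample)
print(record["author-name"])
print(record["handle"])
```

Repeated fields such as Author-Name are kept as lists, so the order of authors survives. Splitting on the first colon only matters because handles contain colons themselves.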

23 … describes persons (RAS)
template-type: ReDIF-Person 1.0
name-full: MANKIW, N. GREGORY
name-last: MANKIW
name-first: N. GREGORY
handle: RePEc:per:1984-06-16:N__GREGORY_MANKIW
email:
homepage: mankiw/mankiw.html
workplace-institution: RePEc:edi:deharus
workplace-institution: RePEc:edi:nberrus
Author-Article: RePEc:aea:aecrev:v:76:y:1986:i:4:p:676-91
Author-Article: RePEc:aea:aecrev:v:77:y:1987:i:3:p:358-74
Author-Article: RePEc:aea:aecrev:v:78:y:1988:i:2:p:173-77
…
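The document handles in these templates are what lets a person record point at a paper or article record. As a rough sketch, a document handle can be split into its parts. The labels "archive", "series" and "item" are my reading of the examples shown here, not a statement of the specification, and the function `split_handle` is hypothetical.

```python
def split_handle(handle):
    """Split a document handle such as 'RePEc:sur:surrec:9601' into three parts.

    Assumption: the form is RePEc:<archive>:<series>:<item>, where the item
    part may itself contain colons (as in journal article handles).
    """
    parts = handle.split(":", 3)  # at most four pieces; the rest stays in the item
    if len(parts) != 4 or parts[0] != "RePEc":
        raise ValueError(f"unexpected handle: {handle!r}")
    _, archive, series, item = parts
    return {"archive": archive, "series": series, "item": item}

paper = split_handle("RePEc:sur:surrec:9601")
article = split_handle("RePEc:aea:aecrev:v:76:y:1986:i:4:p:676-91")
print(paper)
print(article)
```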

25 what do open libraries do? Identify records Relate identified records These actions require human control. They prepare for assessment of performance.

26 key to success Have a small group of volunteers Disseminate as widely as possible Demonstrate to authors and institutions that it works for them. –institutional registration –author registration

27 institutional registration It started with one sad geezer making a list of departments that have a web site. I persuaded him that his data would be more widely used if integrated into the RePEc database. Now he is a happy geezer and one of our three crucial volunteers.

28 RePEc author service RePEc document data has author names as strings. The authors register with RAS to list contact details and identify the papers they wrote. This is classic authority control, but done by the authors.

29 author registration It started when funding allowed us to hire a crazy programmer to write an author registration system. The system went online as "HoPEc" in late 2000. It has been renamed "RePEc author service" (RAS). A recent grant from OSI allows for a rewrite and expansion.


31 LogEc It is a service by Sune Karlsson that tracks usage of items in the RePEc database –abstract views –downloads Christian Zimmermann sends a monthly usage summary by mail to –archive maintainers –RAS registrants

32 authors' incentives Authors perceive the registration as a way to achieve common advertising for their papers. Author records are used to aggregate usage logs across RePEc user services for all papers of an author. Stimulates an "I am bigger than you are" mentality. Size matters!
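The aggregation step can be illustrated with a short sketch. All names and numbers below are invented for the example; the real LogEc data and its processing are not shown in this talk.

```python
from collections import Counter

# Hypothetical author records: each author has claimed a set of papers.
author_papers = {
    "author_a": ["paper:1", "paper:2"],
    "author_b": ["paper:2", "paper:3"],  # paper:2 is co-authored
}

# Hypothetical usage logs: (user service, paper, downloads in one month).
usage_logs = [
    ("service_one", "paper:1", 10),
    ("service_two", "paper:1", 5),
    ("service_one", "paper:2", 7),
    ("service_two", "paper:3", 2),
]

def downloads_per_paper(logs):
    """Sum downloads for each paper across all user services."""
    totals = Counter()
    for _service, paper, count in logs:
        totals[paper] += count
    return totals

def downloads_per_author(author_papers, logs):
    """Sum the per-paper totals over all papers an author has claimed."""
    per_paper = downloads_per_paper(logs)
    return {author: sum(per_paper[paper] for paper in papers)
            for author, papers in author_papers.items()}

totals = downloads_per_author(author_papers, usage_logs)
print(totals)
```

The per-author total is only possible because authors have identified their papers. With author names as plain strings, the logs could not be joined reliably.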

33 recently In 2004, Peter Jacso compared RePEc services with the EconLit proprietary professional database. –IDEAS and LogEc were Peter's pick –EconLit was Peter's pan. He slammed the working paper coverage of EconLit. He could have slammed other things.

34 RePEc / EconLit partnership RePEc now delivers all its working paper data to EconLit, without getting the journal data of EconLit in return. This may seem absolutely perverse! A bunch of volunteers laboring for a multi-million $$$ concern! In fact it serves RePEc well because it adds officialdom.

35 partnership with library officialdom Recently, the Zentralbibliothek fuer Wirtschaftswissenschaften (ZBW) has become interested in working with RePEc. The aim is to build a new dataset, called LoTEc for the moment. LoT means Long Term, but could also mean a lot.

36 automated full-text collection The idea behind LoTEc is to build a system that collects full-text from metadata. –take title and author data –search Google for it –examine the search results to see if they could be the full text habarovsk.pdf is a funding application with more ideas about such systems.
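The three steps can be sketched as follows. This is not the LoTEc or konz code: the search call itself is left out, the candidate results are made up, and the screening rule (a PDF link whose snippet repeats most title words) and its 0.8 threshold are assumptions chosen only for illustration.

```python
import re

def words(text):
    """Lowercased set of alphanumeric words in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_query(title, authors):
    """Step 1: turn title and author data into a search query string."""
    return f'"{title}" ' + " ".join(authors)

def could_be_full_text(title, candidate, min_overlap=0.8):
    """Step 3: crude screen of one search result (hypothetical rule).

    A candidate passes if it links to a PDF and its snippet contains
    at least min_overlap of the title's words.
    """
    if not candidate["url"].lower().endswith(".pdf"):
        return False
    title_words = words(title)
    if not title_words:
        return False
    overlap = len(title_words & words(candidate["snippet"])) / len(title_words)
    return overlap >= min_overlap

title = "Dynamic Aspect of Growth and Fiscal Policy"
query = build_query(title, ["Krichel", "Levine"])

# Step 2 (the search itself) is omitted; these results are invented.
candidates = [
    {"url": "http://example.org/papers/growth.pdf",
     "snippet": "Dynamic aspect of growth and fiscal policy, working paper"},
    {"url": "http://example.org/abstract.html",
     "snippet": "Dynamic aspect of growth and fiscal policy, abstract page"},
    {"url": "http://example.org/other.pdf",
     "snippet": "Fiscal policy in open economies"},
]
hits = [c["url"] for c in candidates if could_be_full_text(title, c)]
print(query)
print(hits)
```

A screen like this only proposes candidates. Deciding whether a file really is the full text still needs a human, which is where the author comes in.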

37 konz project The konz project software is a prototype for LoTEc –uses DBLP –title searches only –limited to PDF / MS office full-text Results of konz are thin. Only 5% of full-texts were found. Probably the fault of bad programming (by myself).

38 will LoTEc do better? Recent work on a small sample of 460 papers by Bergstrom finds a self-archiving ratio of about 80% for top-level recent research papers. There is a chance that we will find these if we can improve on the konz software. Initial tests are now running.

39 what good is a paper in an archive? Unless we have the author's permission, we cannot make a paper available in archived form (on our server). All we could do is make metadata available. To gather permissions from the authors, the plan is to use the RePEc author service in stage four of the ACIS project.

40 validation / permissions interface For a paper that an author has written, RAS would present a set of potential full-text files and ask –is this a full-text of this paper? –can we make an archived copy of this paper available for public use? This interface is supposed to be set up during 2007.

41 KEY idea 1 RePEc attracts a community of users and contributors The community itself is the focus of attention RePEc describes the living rather than the dead. Forget about documents!

42 KEY idea 2 Forget about users! Disseminate widely. Users will come through Google anyway. And Google loves RePEc services –puts RePEc services top when the query consists of the name of an author

43 obstacles to open libraries –lack of imagination & entrepreneurship –inability to form alliances –user-centered thinking –document-centered thinking –technical competence required: OAI-PMH, XML and XML Schema, Unicode –the "C" word

44 Thank you for your attention!

