Applying open source ideas in digital libraries: the examples of RePEc and rclis. Thomas Krichel, 2005-06-01.


who is me?
I was an economist.
I was a leisure digital librarian.
  – NetEc since 1993
  – RePEc since 1997
I am "just another Perl hacker".
I am a visionary
  – but I'm not like St. John the Baptist

who is he?

he is "St. IGNUcius"
A humorous creation of Richard M. Stallman (RMS).
RMS is the father of the free software movement
  – a geek
  – a visionary
St. IGNUcius shows an emphasis on the moral case for free software, rather than the business case.

moral case and business case
Other folks in the free software movement avoid the "f" word
  – free can mean cheap
  – cheap can mean bad
They stress the business case of free software.
They use the term "open source software" (OSS).

RMS and us
Amen, I tell you: we librarians need to learn more from the OSS movement.
We need to make the concepts coming out of free software more a part of our business.
Let us look at a key concept: free software.

free software according to RMS
Free software comes with four freedoms
  – The freedom to run the software, for any purpose
  – The freedom to study how the program works, and adapt it to your needs
  – The freedom to redistribute copies so you can help your neighbor
  – The freedom to improve the program, and release your improvements to the public, so that the whole community benefits

what has this to do with us?
Just replace free software with free information.
Libraries are about free information.
But the analogy is not quite as simple.
  – When we talk about free information, we usually mean things that we can freely read (download…). Free as in: $0.
  – We do not usually mean free information as information we are free to do things with. Free as in freedom.

moral and business There is a moral case for free information. –We rely on it. There is a business case for free information. –We need to make our own.

we rely on the moral case The citizen should be informed… Individuals in the organization should have free access… This is how we justify resources given to us. Often, members of the community who pay get privileged access.

from moral case to business case
To form the business case for free information, think of "free information" as "freedom to do things" rather than $0.
Thus libraries can make a crucial business case for themselves as agents who transform information.
Recall that there are whole industries out there that produce free information.

Now for something different
RePEc is an example of an Open Library.
An Open Library is loosely defined as an application of the OSS principles to libraries.
  – vague
  – in the making
  – but has some history
Looking at RePEc will fix ideas.

History
It started with me as a research assistant in the Economics Department of Loughborough University of Technology.
A predecessor of the Internet allowed me to download free software without effort
but academic papers had to be gathered in a painful way.

CoREJ
published by HMSO
  – photocopied contents tables of recently published economics journals received at the Department of Trade and Industry
  – typed list of the working papers recently received by the University of Warwick library
The latter was the more interesting.

working papers
early accounts of research findings
published by economics departments
  – in universities
  – in research centers
  – in some government offices
  – in multinational administrations
disseminated through exchange agreements
important because of 4 year publishing delay

I planned to circulate the Warwick working paper list over listserv lists.
I argued it would be good for them
  – increase incentives to contribute
  – increase revenue for ILL
After many trials, Warwick refused.
Towards the end of that time, I was offered a lectureship, and decided to get working on my own collection.

1993: BibEc and WoPEc
Fethy Mili of Université de Montréal had a good collection of papers and gave me his data.
I put his bibliographic data on a gopher and called the service "BibEc".
I also gathered the first ever online electronic working papers on a gopher and called the service "WoPEc".

NetEc consortium
  BibEc   printed papers
  WoPEc   electronic papers
  CodEc   software
  WebEc   web resource listings
  JokEc   jokes
  HoPEc
a lot of Ec!

WoPEc to RePEc
WoPEc was a catalog record collection.
WoPEc remained the largest web access point
but getting contributions was tough.
In 1996 I wrote the basic architecture for RePEc.
  – ReDIF
  – Guildford Protocol

creation of RePEc
It came about when I finally got one other partner, the Dutch DEGREE project, a library-led consortium for working paper publication.
I also had a contact in Sweden, Sune Karlsson, for whom I was instrumental in securing funding for a Swedish version of WoPEc called S-WoPEc.
I put together a protocol that would allow us to work together.

1997: RePEc principle
Many archives
  – archives offer metadata about digital objects (mainly working papers)
One database
  – The data from all archives forms one single logical database despite the fact that it is held on different servers.
Many services
  – users can access the data through many interfaces.
  – providers of archives offer their data to all interfaces at the same time. This provides for an optimal distribution.
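The "many archives, one database, many services" principle can be sketched in a few lines of Python. This is only an illustration of the idea, not the RePEc software: the function names, the archive code "xxx" and its record are invented here, and real archives are files on separate servers rather than in-memory lists.

```python
from typing import Dict, List

Record = Dict[str, str]


def merge_archives(archives: Dict[str, List[Record]]) -> Dict[str, Record]:
    """Merge per-archive record lists into one logical database keyed by handle."""
    database: Dict[str, Record] = {}
    for archive_name, records in archives.items():
        for record in records:
            # Remember which archive supplied the record.
            database[record["handle"]] = {**record, "archive": archive_name}
    return database


def title_listing(database: Dict[str, Record]) -> List[str]:
    """One possible user service: an alphabetical list of titles."""
    return sorted(record["title"] for record in database.values())


def archive_sizes(database: Dict[str, Record]) -> Dict[str, int]:
    """Another service over the same data: record counts per archive."""
    sizes: Dict[str, int] = {}
    for record in database.values():
        sizes[record["archive"]] = sizes.get(record["archive"], 0) + 1
    return sizes


# Two archives; the second archive and its record are invented for illustration.
archives = {
    "sur": [{"handle": "RePEc:sur:surrec:9601",
             "title": "Dynamic Aspect of Growth and Fiscal Policy"}],
    "xxx": [{"handle": "RePEc:xxx:wpaper:0001",
             "title": "A Hypothetical Working Paper"}],
}
database = merge_archives(archives)
print(title_listing(database))
print(archive_sizes(database))
```

Both services read the same merged data, so an archive that contributes a record once becomes visible in every interface at the same time.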

RePEc is based on 440+ archives
WoPEc, EconWPA, DEGREE, S-WoPEc, NBER, CEPR, US Fed in Print, IMF, OECD, MIT, University of Surrey, CO PAH

to form a 300+k item dataset
  – 146,000 working papers
  – 154,000 journal articles
  – 1,600 software components
  – 900 book and chapter listings
  – 6,400 author contact and publication listings
  – 8,400 institutional contact listings

RePEc is used in many services
  – EconPapers
  – NEP: New Economics Papers
  – Inomics
  – RePEc author service
  – Z39.50 service by the DEGREE partners
  – IDEAS
  – RuPEc
  – EDIRC
  – LogEc
  – CitEc
My concern is NEP, a human mediated current awareness service for RePEc. This could be the subject of a more academic talk…

… describes documents
  Template-Type: ReDIF-Paper 1.0
  Title: Dynamic Aspect of Growth and Fiscal Policy
  Author-Name: Thomas Krichel
  Author-Person: RePEc:per: :thomas_krichel
  Author-
  Author-Name: Paul Levine
  Author-
  Author-WorkPlace-Name: University of Surrey
  Classification-JEL: C61; E21; E23; E62; O41
  File-URL: ftp:// pub/RePEc/sur/surrec/surrec9601.pdf
  File-Format: application/pdf
  Creation-Date:
  Revision-Date:
  Handle: RePEc:sur:surrec:9601
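Because a template is just a sequence of "attribute: value" lines, a reader for it can be very small. The following Python sketch assumes a simplified reading of the format (one field per line, attribute names compared case-insensitively, repeated attributes kept in order); it is not the ReDIF reference parser and ignores anything the full specification may add. The sample uses only fields whose values are legible on the slide.

```python
from typing import List, Tuple

Field = Tuple[str, str]


def parse_template(text: str) -> List[Field]:
    """Parse 'attribute: value' lines into an ordered list of (attribute, value)."""
    fields: List[Field] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so values such as handles keep theirs.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # this sketch skips lines that are not attribute lines
        fields.append((key.strip().lower(), value.strip()))
    return fields


def values(fields: List[Field], attribute: str) -> List[str]:
    """Return every value of a (possibly repeated) attribute, in template order."""
    return [value for key, value in fields if key == attribute.lower()]


sample = """\
Template-Type: ReDIF-Paper 1.0
Title: Dynamic Aspect of Growth and Fiscal Policy
Author-Name: Thomas Krichel
Author-Name: Paul Levine
Author-WorkPlace-Name: University of Surrey
Classification-JEL: C61; E21; E23; E62; O41
File-Format: application/pdf
Handle: RePEc:sur:surrec:9601
"""

fields = parse_template(sample)
print(values(fields, "Author-Name"))
print(values(fields, "Handle"))
```

Keeping the fields as an ordered list rather than a dictionary matters because attributes such as Author-Name repeat, and the order of the lines ties each workplace to the author it follows.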

… describes persons (RAS)
  template-type: ReDIF-Person 1.0
  name-full: MANKIW, N. GREGORY
  name-last: MANKIW
  name-first: N. GREGORY
  handle: RePEc:per: :N__GREGORY_MANKIW
  homepage: mankiw/mankiw.html
  workplace-institution: RePEc:edi:deharus
  workplace-institution: RePEc:edi:nberrus
  Author-Article: RePEc:aea:aecrev:v:76:y:1986:i:4:p:
  Author-Article: RePEc:aea:aecrev:v:77:y:1987:i:3:p:
  Author-Article: RePEc:aea:aecrev:v:78:y:1988:i:2:p:
  ….

… describes institutions
  Template-Type: ReDIF-Institution 1.0
  Primary-Name: University of Surrey
  Primary-Location: Guildford
  Secondary-Name: Department of Economics
  Secondary-Phone: (01483)
  Secondary-
  Secondary-Fax: (01483)
  Secondary-Postal: Guildford, Surrey GU2 5XH
  Secondary-Homepage:
  Handle: RePEc:edi:desuruk

institutional registration This works through a system called EDIRC. Christian Zimmermann started it as a list of departments that have a web site. I persuaded him that his data would be more widely used if integrated into the RePEc database. Now he is a crucial RePEc leader.

author registration
It started when funding allowed us to hire a student programmer to write an author registration system.
The system went online as "HoPEc" in late …
It has been renamed "RePEc author service" (RAS).
In 2002 a grant from OSI allowed for a rewrite and expansion.

RePEc author service
RePEc document data has author names as strings.
The authors register with RAS to list contact details and identify the papers they wrote.
This is classic access control, but done by the authors.
Currently one in three items in RePEc has at least one identified author.

LogEc
It is a service by Sune Karlsson that tracks usage of items in the RePEc database
  – abstract views
  – downloads
Christian Zimmermann sends a monthly usage summary by mail to
  – archive maintainers
  – RAS registrants

authors' incentives
Authors perceive the registration as a way to achieve common advertising for their papers.
Author records are used to aggregate usage logs across RePEc user services for all papers of an author.
This stimulates an "I am bigger than you are" mentality.
Size matters!
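Aggregating usage across services for all papers of an author is, at heart, a join between author records and per-service logs. A minimal Python sketch follows; the person identifiers, the second paper handle and all download counts are invented for illustration, and the real LogEc processing is not described in these slides.

```python
from collections import Counter
from typing import Dict, List


def usage_by_author(authorship: Dict[str, List[str]],
                    service_logs: Dict[str, Dict[str, int]]) -> Counter:
    """Sum downloads over all services and all papers, credited to each author."""
    totals: Counter = Counter()
    for log in service_logs.values():
        for handle, downloads in log.items():
            # Papers without an identified author contribute to nobody's total.
            for person in authorship.get(handle, []):
                totals[person] += downloads
    return totals


# Paper handle -> registered authors (identifiers are hypothetical).
authorship = {
    "RePEc:sur:surrec:9601": ["person:krichel", "person:levine"],
    "RePEc:xxx:wpaper:0001": ["person:krichel"],
}
# Service name -> downloads per paper (numbers are made up).
service_logs = {
    "IDEAS": {"RePEc:sur:surrec:9601": 10, "RePEc:xxx:wpaper:0001": 3},
    "EconPapers": {"RePEc:sur:surrec:9601": 5},
}

totals = usage_by_author(authorship, service_logs)
for person, downloads in totals.most_common():
    print(person, downloads)
```

The ranking printed at the end is exactly what feeds the "I am bigger than you are" effect: it only exists because authors have claimed their papers.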

recently
In 2004, Péter Jacsó compared RePEc services with the EconLit proprietary professional database.
  – IDEAS and LogEc were Péter's pick
  – EconLit was Péter's pan.
He slammed the working paper coverage of EconLit.
He could have slammed other things.

RePEc / EconLit partnership
RePEc now delivers all its working paper data to EconLit, without getting the journal data of EconLit in return.
This may seem absolutely perverse! A bunch of volunteers laboring for a multi-million $$$ concern!
In fact it serves RePEc well because it adds officialdom.

summary until here
We are talking about an open library as a collaboratory for the creation of large bibliographic aggregates.
Thus we are mainly about the supply of data, rather than of services. This is one limitation. I will come to this later.
RePEc only works for Economics. This is another limitation. I will talk about this now.

scholarly communication
is mainly about scholars communicating
  – between themselves
  – to students, occasionally
thus it is essentially a community activity
traditionally, there have been two intermediaries acting as external agents.
  – libraries
  – publishers

when tradition ends
Two external shocks
  – There comes the Internet and reduces distribution costs to zero
  – There comes computer technology and reduces storage costs somewhat
The opportunity sets of community members and external agents increase.
Proposition: the future depends much on what the community members decide. External agents have little impact.

discipline communities Scholars of various disciplines have varying habits of research, publication, and evaluation It is likely that the Internet will emphasize those differences rather than reducing them.

examples: disciplines with established informal publishing
Preprint communities
  – Physics: arxiv.org
  – Mathematics: arxiv.org, partially
Working paper communities
  – Computer Science: CiteSeer (working paper disappearing)
  – Economics: RePEc

change is tough
Change has to come from inside the discipline.
There has to come a pioneering individual who
  – is technically well versed
  – is managerially smart
  – has extraordinary forward thinking
  – is willing to take considerable risk with her career
Ginsparg, Krichel, Giles & Lawrence are rare.

and what about libraries?
Libraries do it systematically wrong
  – concentrate on access
  – concentrate on readers
  – concentrate on documents
They need to
  – move from access to impact
  – move from the reader to the writer
  – move from documents to people

example: the institutional repository
The name is as attractive as a prison toilet.
They have been set up in many universities but remain empty.
They imply a top-down, Stalin-style centralization.
They are resisted as any interference with departmental affairs by administration.
They are set up for general purposes, and end up pleasing nobody.

despite that: minimal communality
Every discipline has some form of more informal communication. Many times they are conferences.
Every discipline needs some formal evaluation
  – peer-review
  – overall personal review
This cannot be done by computer and needs human input.

rclis
rclis stands for Research in Computing and Library and Information Science.
It is pronounced "reckless".
It is a RePEc clone.
It is my attempt to show that the same ideas that propel RePEc can also work in that area.

technical innovation
RePEc is built on "attribute: value" templates.
rclis is built on a purpose-built format called the Academic Metadata Format (AMF).
I set up this format. It is tailor-made to suit the needs of rclis and RePEc.
There is some usage of AMF in RePEc
  – RePEc OAI interface
  – ernad, the software feeding NEP

E-LIS
It is the largest LIS eprint archive on this planet.
It lives at …
It contains over 2000 papers.
It runs in Italy but uses a system of national editors to feed in material.

DoIS
DoIS is a service based on a Spanish LIS bibliography.
It used to run at Manchester Computing but moved to … when, because of JISC regulations, we had to move from there.
It contains 13k records, 9k with free full text, but the data has many errors.

using already existing resources
There is already a very large computer science bibliography called DBLP, see trier.de
The data has no abstracts. It has some full-text links, mainly to toll-gated sites.
I have done work to convert parts of it to AMF.
I am now searching for free full-text versions of the papers anywhere on the Web. This is the Konz project.

the Konz project
Current state
  – I use the Google API to search for titles.
  – I examine responses and download pages.
  – I scan the pages for PDF and Word files.
  – I examine the text in the file to find the title.
Limitations
  – PDF and Word full text only
  – conference paper data still being processed
  – significant hardware and disk problems.
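The last two steps listed above (scanning a page for PDF and Word files, then checking whether a file's text contains the title) can be sketched in Python with the standard library alone. This is an illustration under assumptions, not the Konz code: the search and download steps are left out, extracting text from the PDF or Word file is assumed to have happened already, and the function names and the matching rule (case- and punctuation-insensitive substring match) are choices made for this sketch.

```python
import re
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def candidate_files(html: str) -> List[str]:
    """Return links that look like PDF or Word files, judged by extension."""
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links
            if link.lower().endswith((".pdf", ".doc"))]


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def mentions_title(document_text: str, title: str) -> bool:
    """Check whether the already extracted full text contains the title."""
    return normalize(title) in normalize(document_text)


page = '<a href="paper.pdf">paper</a> <a href="index.html">home</a> <a href="DRAFT.DOC">draft</a>'
print(candidate_files(page))
print(mentions_title("DYNAMIC ASPECT of Growth\nand Fiscal Policy. T. Krichel",
                     "Dynamic Aspect of Growth and Fiscal Policy"))
```

Judging files by extension misses full text served as HTML, which is one of the limitations the slide itself names.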

Khabarovsk proposal
There is a generic possibility of building full-text links out of bibliographic records using search engines.
The authoritative bibliographic record can be used as a container to hold other objects that have a relationship to the paper
  – full-text instance
  – display page
  – comment
  – cv of author etc…
See khabarovsk.pdf
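The "record as container" idea can be pictured as a small data structure. The Python sketch below is only one possible reading of the proposal: the class and field names and the example.org URLs are invented here, and the proposal document itself may structure things differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RelatedObject:
    """Something on the Web that stands in a named relationship to a paper."""
    relation: str  # e.g. "full-text", "display-page", "comment", "author-cv"
    url: str


@dataclass
class BibliographicRecord:
    """An authoritative record that also holds objects related to the paper."""
    handle: str
    title: str
    related: List[RelatedObject] = field(default_factory=list)

    def attach(self, relation: str, url: str) -> None:
        self.related.append(RelatedObject(relation, url))

    def by_relation(self, relation: str) -> List[str]:
        return [obj.url for obj in self.related if obj.relation == relation]


record = BibliographicRecord("RePEc:sur:surrec:9601",
                             "Dynamic Aspect of Growth and Fiscal Policy")
# The URLs below are placeholders, not real locations.
record.attach("full-text", "http://example.org/copies/surrec9601.pdf")
record.attach("display-page", "http://example.org/papers/surrec9601.html")
print(record.by_relation("full-text"))
```

A search-engine hit that passes a title check would simply become one more attached object, leaving the bibliographic record itself unchanged.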

DoCIS
Konz currently finds free versions for 25k papers out of 98k searched. Not particularly exciting.
This data is integrated with DBLP AMF data and the result forms a new service called DoCIS.
DoCIS lives at …

DoCIS service
DoCIS is implemented in mod_perl with swish++ and is therefore very fast.
The web pages are written by XSLT scripts directly from the AMF data.
The service is available to copy from the web; I am more than happy to run it on other sites.
But the most interesting things are the service principles.

construction transparency
DoCIS is an open digital library service because it allows users to inspect exactly how the service runs
  – DoCIS is built using open source software.
  – There is a special interface that allows users to see almost all internal files. Non-visible files are specially documented.
The hope is that it may be used for teaching purposes.

transportability
Everything in DoCIS is built in such a way that it should be easy to move the service somewhere else and establish copies.
The ideas may not make a lot of technical sense but they should increase the non-proprietary nature of the system.
Note that this has not been tested.

usage transparency
All usage is logged and the logs are made public.
It is hoped that the logs can be used for digital library research.
Ways will be found to aggregate usage on different physical installations.

open digital service
DoCIS is an example of a new type of service where the source code of the library is openly available.
It is an open library service.
This contrasts favorably with the black box approach of the commercial search engines.

to do list
  – finish a version of Konz that recognizes HTML full text
  – integrate DoCIS and DoIS
  – finish conversion of DBLP to AMF
  – open institutional registration for rclis
  – open author registration for rclis
  – open a NEP-like service for rclis

Am I crazy?
Money does not make the world go round. Ideas do.
When RMS proposed a free replacement for UNIX in the early 80s, most people dismissed the idea. Today it is reality!
Similarly, when I started to work on RePEc, a totally free and improved A&I dataset, in 1993, nobody gave it a high probability to succeed. It is a reality!

obstacles to open libraries
lack of imagination & entrepreneurship
inability to form alliances
user-centered thinking
document-centered thinking
technical competence required
  – OAI-PMH
  – XML and XML Schema
  – Unicode
the "C" word

Thank you for your attention!
Collaboration is welcome!