What Is to Be Done? Thomas Krichel

1 What Is to Be Done? Thomas Krichel

2 Background Trained economist, 1984 to 2000. Worked on electronic dissemination of academic papers in economics since 1993. Activist for free online scholarship and academic self-documentation. Pioneered the business model of the Open Archives Initiative. Professor of Library and Information Science.

3 The basic idea Scholars are not paid for writing scholarly papers. –Simply a historical fact –We assume that this will not change as we move into a more online/digital future. Publishers appropriate copyright to sell to one academic the output that is freely given up by another. This is socially inefficient.

4 Topic Author self-archiving Free online scholarship Academic self-documentation Academic disintermediation

5 Harnad Steady State Analysis Toll-gated academic publishing in the Gutenberg world with positive marginal costs. Post-Gutenberg world leads to abolition of toll-gates. All that scholars have to do is self-archive. Refutes all arguments against self-archiving. Analysis limited to free access to academic papers, not related data

6 The dynamics do matter Toll-gated layer exists: publishers, editors, scholarly societies. A free layer is slowly starting –how to create a prepublication tradition –how the system is to be funded –which organizational model: discipline-based or institution-based

7 Anti-Harnad analysis Academic world has always needed non-academic intermediaries. Important documents are revealed in the marketing process. Money makes the world go around.

8 Interim phase A time for visionaries Important first-mover advantage Time of bad business models and failed plans –NCSTRL –CogPrints Fight between the intermediaries of scholarly communication (publishers and libraries)

9 Disintermediation Publishers wish to reduce libraries to bursars of funds and access control. Libraries want to get into the publishing business. Academics will have to decide which (if any) vision will work.

10 Institution-based initiatives Idea: libraries of universities should make papers from all disciplines available on institutional servers. Problem: low incentives for academics to collaborate –prime solidarity of scholar with discipline –no preprint tradition

11 Putting it up on the web Prepublication by individuals over the web is an important step. Problems are: –no stability of document existence and location, –thus impossible to use as a building block for a review of any kind –information retrieval difficulties –no certification of findings

12 Arms theory Free research library out of documents that are made available –by academics directly –by intermediaries through author pressure Bibliographic layer costly to maintain and will be left to commercial entities.

13 Discipline-based systems For the time being, they only work in the preprint disciplines (mathematics, physics), leading to centralized systems, and in the working paper disciplines (computer science, economics), leading to decentralized systems.

14 Polymorph scenario Development per discipline Different stages or scenarios for each group Institution based archives help, but do not complete the picture What is the complete picture?

15 Recall Bellman… To find an optimal path over time, find the optimum steady state. Then calculate the optimum path leading to that steady state… (in practice, a problem of time-consistency arises but we will ignore this problem)

16 Optimal steady state? All papers freely accessible online Anybody can enter the academic process and make a paper available Extensive linking –Papers to authors –Authors to institutions –Document to document (review, references) –Document to group (peer-review)

17 What material to be deposited The papers written by reasonable authors, recognized as genuine scientific writers. In practice, it is those authors that are affiliated with an academic, or otherwise recognized, institution who produce such output.

18 Top to bottom approach Register institutions first Register authors second Make papers available third Contrast that with the approach of librarians….
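
The ordering on this slide can be illustrated with a small sketch. It is only an illustration under stated assumptions: the `Registry` class, its method names and the identifiers used below are hypothetical and are not taken from any RePEc software. It shows one way a system might refuse an author whose institution is unknown, and a paper whose authors are unknown.

```python
class Registry:
    """Toy registry enforcing the order: institutions, then authors, then papers.

    All names here are hypothetical; this is not an existing system's API.
    """

    def __init__(self):
        self.institutions = set()
        self.authors = {}   # author id -> institution id
        self.papers = {}    # paper id -> list of author ids

    def register_institution(self, inst_id):
        self.institutions.add(inst_id)

    def register_author(self, author_id, inst_id):
        # An author is only accepted if the institution is already registered.
        if inst_id not in self.institutions:
            raise ValueError(f"unknown institution: {inst_id}")
        self.authors[author_id] = inst_id

    def deposit_paper(self, paper_id, author_ids):
        # A paper is only accepted if every author is already registered.
        if not author_ids:
            raise ValueError("a paper needs at least one author")
        unknown = [a for a in author_ids if a not in self.authors]
        if unknown:
            raise ValueError(f"unregistered authors: {unknown}")
        self.papers[paper_id] = list(author_ids)


registry = Registry()
registry.register_institution("inst:example")
registry.register_author("per:example", "inst:example")
registry.deposit_paper("paper:example", ["per:example"])
```

Affiliation then acts as the elementary quality filter: nothing enters the paper layer without passing through the two layers above it.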

19 Quality control Elementary quality control through the affiliation of authors. No other basic means available in contested disciplines. No other elements of quality control but of course potential for extensive quality appraisal.

20 Global/Local approach Registry of all institutions is a difficult process Local registry per discipline possible –EDIRC –World List of LIS departments

21 Incentives Free access improves exposure of research (see work by Steve Lawrence). But the general promise is not enough for –Deposit on web –Formal deposit in a free system. Formal deposit system has to demonstrate that it does well. Lies, damn lies, and statistics

22 Impact Impact is key to academic work in almost all disciplines. Scientists have small change, but big ego. When impact can be quantified, academics start to listen, and they forget about Churchill. Our dean can't read, our dean can count.

23 Peer review to impact review Collection quality control through peer-review is part of the Gutenberg universe. Pure impact review –Access logs / Download logs / Citation counts –Promotes open access. Global impact review –Uses data of grouping of papers

24 Impact review formula requirements Based on the production of authors Only indirectly modifiable by authors –Deposit more papers –Deposit better papers Three elements of peer evaluation –Citations –Collection inclusion –Collection review
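
The slides do not give an actual formula, so the following is only a sketch of a shape that would meet the stated requirements. The weights, field names and the linear form are all assumptions made for illustration. The score is computed from an author's papers, and the only levers an author has are to deposit more papers or better ones.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Paper:
    citations: int            # citations received by the paper
    collections: int          # number of collections that include the paper
    collection_rating: float  # rating of those collections (0.0 if none)


# Hypothetical weights, chosen only for illustration.
W_CITATION = 1.0
W_INCLUSION = 2.0
W_REVIEW = 3.0


def paper_score(paper: Paper) -> float:
    """Combine the three elements of peer evaluation into one number."""
    return (W_CITATION * paper.citations
            + W_INCLUSION * paper.collections
            + W_REVIEW * paper.collection_rating)


def author_score(papers: Iterable[Paper]) -> float:
    """An author's score is the sum over the papers the author has deposited."""
    return sum(paper_score(p) for p in papers)


first = Paper(citations=3, collections=1, collection_rating=0.5)
second = Paper(citations=1, collections=0, collection_rating=0.0)
```

With these assumed weights, an author who deposits both `first` and `second` scores higher than one who deposits only `first`, which is the incentive the slide asks for.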

25 Citations Often criticized, but the only means that we have to assess impact between papers Publicly accessible citations indexes will go a long way to promote open scholarship. Such indexes can be constructed by computer Since citation styles are widely different, an approach per discipline is required.
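
To make "constructed by computer" concrete, here is a minimal sketch of the last step of building a citation index: inverting reference lists into cited-by lists. It assumes the hard, discipline-specific part (parsing reference strings and matching them to known papers) has already been done; the function names and paper identifiers are hypothetical.

```python
from collections import defaultdict


def build_citation_index(references):
    """Invert {citing paper: [cited papers]} into {cited paper: {citing papers}}."""
    cited_by = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            if cited != citing:  # skip a paper listing itself
                cited_by[cited].add(citing)
    return cited_by


def citation_counts(references):
    """Number of distinct papers citing each cited paper."""
    index = build_citation_index(references)
    return {paper: len(citers) for paper, citers in index.items()}


# Hypothetical reference lists, already matched to paper identifiers.
references = {
    "paper:A": ["paper:B", "paper:C"],
    "paper:B": ["paper:C"],
    "paper:C": [],
}
counts = citation_counts(references)
```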

26 Collection inclusion This is classic peer-review. Post-Gutenberg age should allow inclusion in several collections. But copyright surrender prevents this. Combat this in principle, but don't expect results any time soon. Promote licensing of publishers.

27 Collection evaluation Classic ISI citation impact reviews This data can be used in initial version of an impact review formula. Later that data should be endogenized.

28 Contents is king Nothing can be done without an initial stock of papers. All disciplines have some form of informal publication channel. Collect and archive elements in these channels. Get a coalition of collectors together.

29 Genesis 3:19 Without volunteer efforts things will not get done. In particular, the people on the upper echelons have to be volunteers. People can not be expected to be paid for the collection work. Infrastructure work can be supported through funding.

30 GNU Thinking When Richard Stallman launched the GNU project, many people thought he needed to get in the shade. I guess some of you think that about me! Yet computer geeks can make a complete operating system available over the Internet at no cost. Decentralization is the word of the day.

31 Tools and tasks Tools –Open Archives Initiative Protocol for Metadata Harvesting –Academic Metadata Format Tasks –Deposit –Describe –Identify –Relate

32 OAI and task model Free online scholarship through open archives doing the first two tasks (deposit and describe). Aggregators will be needed to perform the other two tasks (identify and relate). They can also use the OAI protocols.
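
As a rough illustration of how an aggregator might talk to an open archive, the sketch below builds a harvesting request URL. The verb `ListRecords` and the argument `metadataPrefix=oai_dc` are those of version 2.0 of the OAI Protocol for Metadata Harvesting; the base URL and the helper name `oai_request_url` are assumptions. The sketch only builds the URL and does not contact any server.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a base URL, a verb and its arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return f"{base_url}?{urlencode(params)}"


# Hypothetical archive address, used only for illustration.
BASE_URL = "https://archive.example.org/oai"

identify_url = oai_request_url(BASE_URL, "Identify")
harvest_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
```

An aggregator would fetch `harvest_url`, store the returned records, and then do the identifying and relating on its own copy of the data.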

33 AMF and task model AMF appears as a basic framework for aggregators to communicate with basic data providers and export data. Aggregator will need to set database structure from pile of AMF data.

34 Example from RePEc crefwp99.html

35 Thank you very much!

36 arXiv Too well-known to talk about here So I will talk more about RePEc. One important development: arXiv will start to identify authors.

37 RePEc Comprehensive academic self-documentation system; in fact, the very essence of an academic self-documentation system –run in a decentralized way by academic volunteers –comprehensive picture of academic output activity. Originates with the WoPEc project founded by Thomas Krichel in 1993. And so on…

38 RePEc principle Many archives –archives offer metadata about digital objects (mainly working papers) One database –The data from all archives forms one single logical database despite the fact that it is held on different servers. Many services –users can access the data through many interfaces. –providers of archives offer their data to all interfaces at the same time. This provides for an optimal distribution.
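
The "many archives, one database, many services" principle can be sketched in a few lines. This is an illustration only, not RePEc's actual software: the function names, the record layout and the second archive are hypothetical. Records from separate archives are merged into one logical database keyed by handle, and two independent services read from that same database.

```python
def merge_archives(archives):
    """Combine per-archive record lists into one logical database keyed by handle."""
    database = {}
    for archive_name, records in archives.items():
        for record in records:
            handle = record["handle"]
            if handle in database:
                raise ValueError(
                    f"duplicate handle {handle!r} in archive {archive_name!r}")
            # Remember which archive supplied the record.
            database[handle] = dict(record, archive=archive_name)
    return database


def list_titles(database):
    """Service 1: an alphabetical title listing."""
    return sorted(record["title"] for record in database.values())


def find_by_archive(database, archive_name):
    """Service 2: all handles supplied by one archive."""
    return [h for h, r in database.items() if r["archive"] == archive_name]


archives = {
    "sur": [{"handle": "RePEc:sur:surrec:9601",
             "title": "Dynamic Aspect of Growth and Fiscal Policy"}],
    # Hypothetical second archive, for illustration only.
    "demo": [{"handle": "demo:wp:0001",
              "title": "A Hypothetical Working Paper"}],
}
database = merge_archives(archives)
```

Because every service reads the same merged data, an archive that joins is visible in all interfaces at once.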

39 RePEc is based on 190+ archives: WoPEc, EconWPA, DEGREE, S-WoPEc, NBER, CEPR, US Fed in Print, IMF, OECD, MIT, University of Surrey, CO PAH

40 …to form one dataset... Over 140,000 items in over 1,000 series. Contains working paper, published paper, software, personal and institutional data. Largest distributed free source about online scientific publications, with over 45,000 electronic papers. Data is encoded using the purpose-built ReDIF format. All archives follow a convention called the Guildford protocol on how to store ReDIF files and other data on their servers. Therefore the archives can be mirrored.

41 … describes documents
Template-Type: ReDIF-Paper 1.0
Title: Dynamic Aspect of Growth and Fiscal Policy
Author-Name: Thomas Krichel
Author-Person: RePEc:per: :thomas_krichel
Author-
Author-Name: Paul Levine
Author-
Author-WorkPlace-Name: University of Surrey
Classification-JEL: C61; E21; E23; E62; O41
File-URL: pub/RePEc/sur/surrec/surrec9601.pdf
File-Format: application/pdf
Creation-Date:
Revision-Date:
Handle: RePEc:sur:surrec:9601
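
A template of this kind is a sequence of `Field: value` lines, which makes it easy to read by program. The sketch below is a minimal reader under that assumption only; it is not a full ReDIF parser, the function names are hypothetical, and the sample uses a subset of the fields shown on the slide.

```python
def parse_template(text):
    """Parse 'Field: value' lines into an ordered list of (field, value) pairs."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so handles keep their own colons.
        name, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"not a 'Field: value' line: {line!r}")
        fields.append((name.strip(), value.strip()))
    return fields


def values(fields, name):
    """All values of a field, in order; fields such as Author-Name may repeat."""
    return [value for field, value in fields if field == name]


SAMPLE = """\
Template-Type: ReDIF-Paper 1.0
Title: Dynamic Aspect of Growth and Fiscal Policy
Author-Name: Thomas Krichel
Author-Name: Paul Levine
Classification-JEL: C61; E21; E23; E62; O41
Handle: RePEc:sur:surrec:9601
"""

fields = parse_template(SAMPLE)
authors = values(fields, "Author-Name")
handle = values(fields, "Handle")[0]
```

A list of pairs is used instead of a dictionary because order and repetition carry meaning: each author's details follow that author's `Author-Name` line.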

42 … describes persons (HoPEc)
Template-Type: ReDIF-Person 1.0
Name-Full: KRICHEL, THOMAS
Name-First: THOMAS
Name-Last: KRICHEL
Postal: 1 Martyr Court 10 Martyr Road Guildford GU1 4LF England
Homepage:
Workplace-Institution: RePEc:edi:desuruk
Author-Paper: RePEc:sur:surrec:9801
Author-Paper: RePEc:sur:surrec:9601
Author-Paper: RePEc:rpc:rdfdoc:concepts
Author-Paper: RePEc:rpc:rdfdoc:ReDIF
Handle: RePEc:per: :THOMAS_KRICHEL

43 … describes institutions (EDIRC)
Template-Type: ReDIF-Institution 1.0
Primary-Name: University of Surrey
Primary-Location: Guildford
Secondary-Name: Department of Economics
Secondary-Phone: (01483)
Secondary-
Secondary-Fax: (01483)
Secondary-Postal: Guildford, Surrey GU2 5XH
Secondary-Homepage:
Handle: RePEc:edi:desuruk
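
The three template types link to each other through handles: a person points to a workplace handle and to paper handles. The sketch below shows how such links might be followed once the records sit in one database. The record layout and the function name are assumptions, and the person's key `person:example` is a placeholder because the person handle on the slide is incomplete.

```python
RECORDS = {
    "RePEc:edi:desuruk": {
        "type": "institution",
        "name": "University of Surrey",
    },
    "RePEc:sur:surrec:9601": {
        "type": "paper",
        "title": "Dynamic Aspect of Growth and Fiscal Policy",
    },
    # Placeholder key: the real person handle is not fully shown on the slide.
    "person:example": {
        "type": "person",
        "name": "KRICHEL, THOMAS",
        "workplace": "RePEc:edi:desuruk",
        "papers": ["RePEc:sur:surrec:9801", "RePEc:sur:surrec:9601"],
    },
}


def describe_person(records, handle):
    """Follow a person's workplace and paper handles to the records they name."""
    person = records[handle]
    workplace = records.get(person["workplace"], {}).get("name")
    known = [records[h]["title"] for h in person["papers"] if h in records]
    unresolved = [h for h in person["papers"] if h not in records]
    return {
        "name": person["name"],
        "workplace": workplace,
        "papers": known,
        "unresolved": unresolved,
    }


profile = describe_person(RECORDS, "person:example")
```

Handles that point to records not yet in the database are reported as unresolved instead of being dropped, since another archive may supply them later.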

44 Weaknesses of RePEc No funding Difficult to grasp innovative concepts –relational database for the academic process –plethora of user and contributor services Setting-up costs are large, constant attention required Little support from the top of the academic food chain

45 Academic Metadata Format Data and metadata for action. Librarians have only documented the world; what matters is to change it. Tool for academic self-documentation –simple to compose –drop-in functionality with OAI –intuition that comes from natural language

46 Open Archives Initiative Most important for Free Online Scholarship is the implicit shift in business model towards institution-based archiving.

47 AMF View of the world Author self-archiving will work if it is part of the advertisement of academics Creator has to be the descriptive focus, not the creation

48 A model of AMF instances Persons Institutions Collections Resources –Text This is what is really important about AMF

49 Natural language Nouns –person, organization, collection, text Adjectives like –name, title, status, etc. Verbs like –isauthorof, hassponsor, ispartof, etc.

50 Example 1 Simeon M. Warner AMF Design in brief ome/krichel/southampton_ _1.ppt

51 id and ref For propeller head use. Records (instances of nouns) that are authoritative can have an id. Non-authoritative records can refer to authoritative ones, using a ref.
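
The id/ref idea can be sketched without any XML. The code below is not AMF syntax; it uses plain Python dictionaries, and the identifiers, field layout and function names are assumptions made for illustration. An authoritative record carries an `id`; a non-authoritative record carries a `ref` that is looked up to reach the authoritative one.

```python
def index_authoritative(records):
    """Map each id to its authoritative record."""
    return {record["id"]: record for record in records if "id" in record}


def follow(record, index):
    """Return the authoritative record behind a ref, or the record itself."""
    if "ref" in record:
        return index[record["ref"]]
    return record


def author_names(text, index):
    """Names of a text's authors, following refs where they occur."""
    return [follow(author, index)["name"] for author in text["hasauthor"]]


# Hypothetical records, for illustration only.
records = [
    {"noun": "person", "id": "person:1", "name": "Simeon M. Warner"},
    {"noun": "text", "id": "text:1", "title": "AMF Design in brief",
     "hasauthor": [
         {"noun": "person", "ref": "person:1"},          # refers to the record above
         {"noun": "person", "name": "An Unregistered Author"},  # described inline
     ]},
]

index = index_authoritative(records)
names = author_names(index["text:1"], index)
```

The inline author has neither `id` nor `ref`, so it is used as written; only records with an `id` can be the target of a `ref`.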

52 Example 2

