Hussein Suleman University of Cape Town Department of Computer Science Advanced Information Management Laboratory High Performance.


1 Hussein Suleman hussein@cs.uct.ac.za University of Cape Town Department of Computer Science Advanced Information Management Laboratory High Performance Computing Laboratory April 2007 What and Why of IRs

2 Discovering Research 1/3

3 Discovering Research 2/3

4 Discovering Research 3/3

5 Overview  Institutional Repositories  Self Archiving  Copyright and publishing  Software issues  The UCT-CS archive

6 Traditional (pre-DL) Open Access  Documents on an academic’s Web page.  Problems: Persistence – will the documents always be there? Authority – can we trust the authenticity of the website? Standards – are the formats and metadata the same across all websites? Discovery – how do we find these documents? Google?

7 Early Digital Libraries  Digital libraries are repositories/archives that aim to address all previous OA problems using software for management of documents.  In the 1990s, subject archives were popular, e.g., arXiv, RePEc, NCSTRL.  Problems: Sustainability – repositories run by organisations with limited funding. Development skill – staff needed to develop and maintain all software needed to run the archive. Should the archive be centralised or distributed?

8 Institutional Repositories  Institutional Repositories (IRs) are digital libraries run by an educational/research institution to archive documents owned/produced locally.  Types of institutional repositories: Departmental; Special Collections; Centralised; Departmental Federated; University Federated.

9 The Departmental Archive

10 Special Collections

11 Centralised IR

12 Federated IR  Federation across Departmental IRs: connected through OAI-PMH so discovery can take place in one place (“campus portal”) but each department runs its own archive.  Federation across Universities: connected through OAI-PMH so discovery can take place in a central location, e.g., Google. Example: Networked Digital Library of Theses and Dissertations (NDLTD).
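A minimal sketch of the "campus portal" idea: each departmental archive answers an OAI-PMH ListRecords request with Dublin Core records, and the portal merges them into one index. The XML namespaces are those of OAI-PMH v2.0 and Dublin Core; the department names, identifiers, titles, and the helper functions `sample_response`, `parse_records`, and `build_portal_index` are hypothetical and only illustrate the flow. Canned responses are used instead of network calls.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH v2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def sample_response(identifier, title):
    """Return a canned ListRecords response (hypothetical data, one record)."""
    return f"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>{identifier}</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>{title}</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text, source):
    """Extract identifier and title from every record in one response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext("oai:metadata/oai_dc:dc/dc:title", default="", namespaces=NS)
        records.append({"source": source, "identifier": identifier, "title": title})
    return records


def build_portal_index(responses):
    """Merge records from all departmental archives, keyed by OAI identifier."""
    index = {}
    for source, xml_text in responses.items():
        for record in parse_records(xml_text, source):
            index[record["identifier"]] = record
    return index


# Two hypothetical departmental archives feeding one portal.
responses = {
    "cs": sample_response("oai:cs.example.org:1", "A CS technical report"),
    "maths": sample_response("oai:maths.example.org:7", "A mathematics thesis"),
}
portal = build_portal_index(responses)
```

Each department keeps running its own archive; only the metadata is copied into the portal, which is why discovery can happen in one place.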

13 Self-archiving  Self-archiving means taking control of and responsibility for the preservation of and access to your research publications. Traditionally, by adding documents to your website. Recently, by adding documents to an IR at your institution.  Self-archiving is the best way to fill IRs: authors understand their documents best, and the cost is lower than centralised archiving.

14 Why Self-Archive?  Take ownership of your research!  Easier access for collaborators (“reprints” are dead).  National/regional/institutional rules and laws.  Greater visibility to research.  Can provide access even if university does not subscribe to journals.  Complete view of individual research output.

15 Why Self-Archive in an IR?  Takes away burden to maintain website.  Professional support from library.  Much better reliability: backup, migration, quality of service, etc.  Provides consistent and standardised formats/metadata.  Institutions may require it.  Complete view of institutional research output.

16 Issues: Publication and Pre-Prints  If we put pre-publication documents into an IR, does this affect publication?  Generally, NO. Why? Computer Scientists and Physicists have done this for decades with “technical reports”. The version in the archive is (often substantially) different from the reviewed and published version. Theses and dissertations are not considered pre-publication by publishers.

17 Issues: Copyright and Post-Prints  If we deposit post-publication documents into an IR, doesn’t this violate copyright?  Generally, NO. Why? Most society publishers will allow archiving on a website or IR, e.g., ACM. Most commercial publishers allow archiving on a website or IR after some time (typically 12-24 months). Newer commercial publisher agreements make greater allowance for IRs. You can always negotiate with a publisher!

18 Issues: Publishers and Government  Commercial publishers require copyright transfer - Open Access publishers do not.  Some governments are mandating OA for research: US/UK/SA considering laws. Many governments have laws regarding theses.  Moral: Commercial publishers have to adapt – exclusive copyright transfer will not work if governments do not allow it!

19 Software Issues 1/2  What do you need to contribute to or access an institutional repository?  Web Browser  To contribute: maybe a way to create PDFs, e.g., Adobe Acrobat. Open Source and Freeware software is available!

20 Software Issues 2/2  Free software is available to create an IR – Open Society Institute maintains a list. All packages support OAI-PMH, so they can be connected to other systems.  EPrints  DSpace  Etc.

21 EPrints

22 DSpace

23 The UCT-CS Repository  Author self-submission  Checking of submissions  Archive-everything!  UCT-CS-specific metadata and classification systems  Hierarchical browsing  Simple and fielded searching  OAI-PMH compliance

24 Open Access  If it can’t be found in Google …  1734 hits directly from Google in March 2007.  Example: http://www.google.com/search?q=questionnaire+system+UML Kritzinger, Pieter, Marshini Chetty, Jesse Landman, Michael Marconi and Oksana Ryndina (2003) ChattaBox: A Case Study in Using UML and SDL for Engineering Concurrent Communicating Software Systems. In Proceedings Southern African Telecommunications Networks and Applications Conference, George, South Africa.

25 Why we have a repository  It was faster than simply waiting!  CS departments internationally archive technical reports (NCSTRL).  Research websites don’t last long (enough).  UCT doesn’t have an ETD project yet.  We need to improve ACCESS to our work.  We need to preserve our research output.  Bureaucracy (UCT, NRF, DoE, etc.) requires tracking publications.  We (think we) know what we are doing.

26 What we archive  Books and Book Chapters  Conference Papers and Posters  Journals (online and paginated)  Newspaper and Magazine Articles  Preprints  Presentation Slides  Conference Proceedings  Departmental Technical Reports  Electronic Theses and Dissertations  Other Stuff …

27 Infrastructure Requirements  Software: EPrints v2.2.1 plus a few changes here and there.  Server: Tacked onto an existing machine at first! 3GHz Pentium Xeon/1GB/35GB  Operating System: FreeBSD 6.0  Web server: Apache v1.3.7  Administrator: shared with other systems …

28 Community Building  Filling the archive: Get official support. Twist arms of staff. Fill archive with own publications to make others look bad. Twist arms of staff even harder. Get (student) researchers to twist student arms. “The domino effect”.

29 Copyright

30 Metadata/Citation Rendering

31 Academic Overview of Research

32 NRF / DoE Credit  We already have a departmental listing of all research output.  Where copyright does not allow, we include just a citation – no files – for completeness.

33 Interoperability  Our archive is compliant with the Open Archives Initiative’s Protocol for Metadata Harvesting (OAI-PMH) v2.0.  Metadata can be freely harvested by any service provider.  baseURL: http://pubs.cs.uct.ac.za/perl/oai2
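As a rough illustration of what "harvesting" looks like from the service provider's side, the sketch below builds OAI-PMH request URLs against the baseURL given on the slide. `Identify` and `ListRecords` are standard OAI-PMH v2.0 verbs and `oai_dc` is the Dublin Core metadata prefix every compliant archive must support; the helper `oai_request` is a hypothetical name. No request is actually sent, since the archive may no longer be reachable at this address.

```python
from urllib.parse import urlencode

# baseURL as given on the slide.
BASE_URL = "http://pubs.cs.uct.ac.za/perl/oai2"


def oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL: the verb first, then any other arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return f"{base_url}?{urlencode(params)}"


# Ask the archive to describe itself.
identify_url = oai_request(BASE_URL, "Identify")

# Ask for all records in Dublin Core.
list_url = oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
```

A harvester would fetch `list_url`, parse the XML response, and repeat the request with the returned resumption token until the archive reports no more records.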

34 Communities and Metadata  Participate in OAI: Metadata can be in Dublin Core.  Participate in NDLTD: Metadata can be in ETDMS. Set for theses and dissertations only.  Participate in NCSTRL: Metadata can be in RFC1807. Set for technical reports only. OAI-PMH Request:  http://pubs.cs.uct.ac.za/perl/oai2?verb=ListRecords&metadataPrefix=oai_rfc1807&set=747970653D746563687265706F7274
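The `set` value in the request above looks opaque, but it decodes cleanly as hex-encoded ASCII. Treating it that way is an assumption drawn from the decoded result, not something the slides state. The sketch below pulls the arguments out of the request URL and decodes the set specification, then re-encodes it as a round-trip check.

```python
from urllib.parse import urlsplit, parse_qs

# The OAI-PMH request from the slide, with extraction spaces removed.
REQUEST = (
    "http://pubs.cs.uct.ac.za/perl/oai2"
    "?verb=ListRecords&metadataPrefix=oai_rfc1807"
    "&set=747970653D746563687265706F7274"
)

# Split the query string into its OAI-PMH arguments.
query = parse_qs(urlsplit(REQUEST).query)
verb = query["verb"][0]
metadata_prefix = query["metadataPrefix"][0]
set_spec = query["set"][0]

# Assumption: the set spec is hex-encoded ASCII.
decoded = bytes.fromhex(set_spec).decode("ascii")

# Round trip: encode the readable form back to the hex form used in the URL.
reencoded = decoded.encode("ascii").hex().upper()
```

Here `decoded` is `type=techreport`, which matches the slide's note that this set covers technical reports only.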

35 Unique Non-local IP Addresses

36 Unique Crawler IP Addresses

37 Which Resources are Accessed

38 Distribution of Accesses

39 In Summary  An institutional repository is relatively simple to set up and administer, and there is lots of support out there in the community.  Don’t worry too much about the politics.  Concentrate on the advantages and start small. Over time you can deal with large-scale projects and complete institutional buy-in.

40 Links  Open Archives Initiative http://www.openarchives.org/  Budapest Open Access Initiative http://www.soros.org/openaccess/  BioMed Central http://www.biomedcentral.com/  UCT CS Research Archive http://pubs.cs.uct.ac.za  EPrints http://www.eprints.org/  DSpace http://www.dspace.org/

41 That’s all Folks! direct all questions and comments to: hussein@cs.uct.ac.za

