Presentation on theme: "Humanities Research Data – Rate me! Wolfram Horstmann Summer School, 3 July 2012."— Presentation transcript:
Humanities Research Data – Rate me! Wolfram Horstmann Digital.Humanities@Oxford Summer School, 3 July 2012
The Research Data Question Data-driven research is called the 4th Paradigm in the Sciences. Where are humanities in the current discussion about research data? http://www.flickr.com/photos/desconciertos/160752180/
Ratings, Skepticism & Anxiety Research Excellence Framework is a reality. But it is objected that: “Humanities research threatened by demands for 'economic impact'” Guardian 13 October 2009 http://www.flickr.com/photos/komoda/7187391601/
Outline The current awareness of the importance of research data provides opportunities for the humanities to show their value. ~ The challenge is to communicate what research data means for the humanities. ~ The proposal is to state the obvious more clearly: text and images as research data of the humanities and libraries as humanities research facilities.
Texts and Images as Data Humanities work with texts and images as other subject areas work with matter, wetware, hardware or numbers. http://www.flickr.com/photos/gorgmorg/9944210/
Libraries as Research Facilities The humanities institutionalized their research facilities centuries ago; other subject areas did so much later, with labs and centers like CERN or EMBL. http://vi.sualize.us/carl_spitzweg_bucherworm_1850_books_library_ladder_reading_picture_2Qp9.html
The Advent of the Digital Transforming the physical research facilities into digital is a laborious and expensive exercise – and its potential is not yet exploited. http://www.bodley.ox.ac.uk/librarian/rpc/manchesterpres/slide15.jpg http://www.flickr.com/photos/flex/27334821/ http://tei.oucs.ox.ac.uk/Talks/2008-08-kazan/exercise-2.xml
Digital Humanities & Libraries World Data Centers or the EBI are centralized – can Humanities Data Centers be at each institution? http://adamcrymble.blogspot.com.es/2012/01/is-old-bailey-online-film-or-science.html
Digital Resources in the Bodleian ~approaching petabyte scale of highly structured storage for texts and images ~2.000.000 digitized images, another million to come in the next 3 years, plus 350.000 Google Books ~100 virtual machines … and by far most of these are resources of the Humanities. REFERENCE MISSING
Cultures of Knowledge An example of highly structured, intellectually curated data: more than 12.000 unique people and 3500 locations identified in 60.000 letters with 25.000 annotations. http://www.history.ox.ac.uk/cofk/
What’s the Score? In only a few months over 10.000 scores have been described by the public. http://www.whats-the-score.org/
Broadside Ballads Collaborative research introduces novel qualities into humanities research data management. http://ballads.bodley.ox.ac.uk
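Google Books at the Bodleian Approaching one download a minute: 350.000 Google books with an estimated 10.000.000 pages and 25.000.000.000 words
Domain | 12-18 Mar | 19-25 Mar | 26 Mar - 1 Apr | 2-8 Apr | 9-15 Apr | 16-22 Apr | 23-29 Apr | 30 Apr - 6 May | 7-13 May | 14-20 May | 21-27 May | 28 May - 2 Jun
Total | 5150 | 3338 | 7111 | 3010 | 3955 | 4528 | 6901 | 4566 | 6883 | 5300 | 5165 | 2844
.uk | 1202 | 2088 | 5950 | 1705 | 2532 | 3360 | 5386 | 3445 | 3667 | 2704 | 3092 | 1347
.ac.uk | 1033 | 1328 | 5751 | 1610 | 1262 | 2970 | 4482 | 3123 | 2988 | 2525 | 2803 | 1194
.ox.ac.uk | 991 | 1296 | 5636 | 1559 | 1249 | 2938 | 4435 | 3111 | 2973 | 2498 | 2737 | 1186
Bodleian Libraries | 291 | 464 | 516 | 306 | 319 | 524 | 562 | 680 | 552 | 499 | 649 | 224
.bodley | 0 | 0 | 15 | 3 | 3 | 8 | 14 | 8 | 6 | 21 | 7 | 4
.bodleian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
.ouls | 106 | 48 | 43 | 26 | 15 | 88 | 89 | 94 | 39 | 50 | 112 | 39
.sers | 79 | 187 | 102 | 63 | 64 | 154 | 105 | 131 | 139 | 181 | 126 | 26
.library-public | 0 | 4 | 0 | 3 | 3 | 0 | 3 | 3 | 3 | 0 | 2 | 0
.bodley-open | 3 | 9 | 17 | 4 | 7 | 18 | 10 | 14 | 11 | 6 | 17 | 5
.bodley-public | 5 | 14 | | 12 | 19 | 28 | 21 | 32 | 18 | 21 | 30 | 18
.odl | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
.ouls-open | 98 | 202 | 325 | 195 | 205 | 223 | 313 | 381 | 322 | 212 | 348 | 128
.saclib | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 14 | 10 | 4 | 3 | 1
.taylor | 0 | 0 | 0 | 0 | 1 | 5 | 6 | 3 | 4 | 3 | 4 | 3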
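The "one download a minute" figure can be checked roughly against the weekly totals reported on this slide. The sketch below is only illustrative and rests on two assumptions not stated on the slide: that the Total values are download counts, and that every column covers a full seven-day week (the last one, 28 May - 2 Jun, appears to be shorter). The names `weekly_totals` and `downloads_per_minute` are made up for this example.

```python
# Weekly figures read from the "Total" row on the slide
# (assumed here to be download counts per week).
weekly_totals = [5150, 3338, 7111, 3010, 3955, 4528,
                 6901, 4566, 6883, 5300, 5165, 2844]

# Assumption: each column is treated as a full 7-day week.
MINUTES_PER_WEEK = 7 * 24 * 60


def downloads_per_minute(weekly_count: float) -> float:
    """Convert a weekly count into an average rate per minute."""
    return weekly_count / MINUTES_PER_WEEK


peak = max(weekly_totals)
mean = sum(weekly_totals) / len(weekly_totals)

print(f"peak week: {downloads_per_minute(peak):.2f} downloads per minute")
print(f"mean week: {downloads_per_minute(mean):.2f} downloads per minute")
```

Under these assumptions both rates stay below one per minute, which is consistent with the slide's wording "approaching".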
Size matters! Even though humanities often use qualitative and hermeneutic methodology – rather than quantitative – the size of data is significant. http://randommization.com/2011/03/08/library-has-giant-books-for-facade/
Structure matters! Sizable numbers alone will not give a thorough idea of digital humanities data – structure is equally important. This can only be understood by example. http://cacm.acm.org/magazines/2010/4/81499-the-data-structure-canon/fulltext
Collaboration matters! Involvement of colleagues in collaborative research and the public in crowdsourcing makes a difference. http://www.flickr.com/photos/ludovicmauduit/2646525907
1st Challenge: Diversity Humanities have a varied typology of research data, often requiring idiographic approaches. Thus, standardization is difficult (cf. citation), and so is finding computational skills. http://www.ucl.ac.uk/archaeology/studying/undergraduate/courses/ARCL2037
2nd Challenge: Openness As with all researchers, competition, privacy and exploitation are impediments to data sharing. Do humanities more than others keep the “ivory tower” attitude? http://www.flickr.com/photos/uncene/364730693/
Accessibility of Humanities Texts From some 30.000.000 bibliographic records it is hard to fill the humanities corpus. This might constrain discoverability of Humanities resources. Lösch, M., Waltinger, U., Horstmann, W., & Mehler, A. (2011). Building a DDC-annotated Corpus from OAI Metadata. Journal of Digital Information, 12(2) Waltinger, U., Mehler, A., Lösch, M., & Horstmann, W. (2011). Hierarchical Classification of OAI Metadata Using the DDC Taxonomy. In Chambers et al (Eds.), Advanced Language Technologies for Digital Libraries (Vol. 6699, pp. 29 - 40). Berlin / Heidelberg: Springer.
3rd Challenge: Inherent Obstacles Humanities research data show some peculiarities. An extreme example is the closure of archaeological data to protect sites against tomb raiders. Hogenaar, A., H. Tjalsma, & M. Priddy. 2011. “Research in the Humanities and Social Sciences” http://dx.doi.org/10.2390/PUB-2011-7
4th Challenge: Implementing Policy Funders' policies are an approach for opening up data – but humanities produce much data outside of the regular project life cycle. Deposit of resources or datasets Grant Holders in all areas must make any significant electronic resources or datasets created as a result of research funded by the Council available in an accessible and appropriate depository for at least three years after the end of their grant. The choice of depository should be appropriate to the nature of the project and accessible to the targeted audiences for the material produced. http://www.ahrc.ac.uk/FundingOpportunities/Documents/Research%20Funding%20Guide.pdf
1st Opportunity: Public Understanding Humanities research data are often more easily understood by the public than science data. The “Impact Regime” may even be an advantage for the humanities. http://www.queenvictoriasjournals.org/home.do
2nd Opportunity: Cultural Heritage Humanities research data are more likely to be accessed and preserved than research data in other subject areas. http://www.europeana.eu/portal/
3rd Opportunity: Infrastructure The requirements of infrastructure for many humanities research data resemble those of digital libraries. No new research facilities have to be built. National Library of China
4th Opportunity: New Metrics It is likely that humanities research data have a web impact advantage. High societal interest could result in higher webometric and usage statistics ratings. http://newsinfo.iu.edu/pub/libs/images/usr/9584_h.jpg
Another mindset? …to see text & images as humanities research data. ~ …to see the humanities as data intensive. ~ …to see a web impact advantage for the humanities. ~ …to see libraries as humanities research facilities.
Recommendations Exploit the good accessibility of humanities research themes through newspapers, exhibitions, crowdsourcing and citizen science. ~ Make as many research outputs web accessible as possible. ~ Invest in and support new metrics such as usage statistics and web-impact. ~ Strengthen partnership between humanities and other disciplines and libraries.