Presentation on theme: "Humanities Research Data – Rate me! Wolfram Horstmann Summer School, 3 July 2012."— Presentation transcript:
Humanities Research Data – Rate me! Wolfram Horstmann Summer School, 3 July 2012
The Research Data Question Data-driven research is called the 4th Paradigm in the Sciences. Where are the humanities in the current discussion about research data?
Ratings, Skepticism & Anxiety The Research Excellence Framework is a reality. But it is objected that: “Humanities research threatened by demands for 'economic impact'” (Guardian, 13 October 2009)
Outline The current awareness of the importance of research data provides opportunities for the humanities to show their value. ~ The challenge is to communicate what research data means for the humanities. ~ The proposal is to state the obvious more clearly: text and images as research data of the humanities and libraries as humanities research facilities.
HUMANITIES AND LIBRARIES AS SOULMATES
Texts and Images as Data Humanities work with texts and images as other subject areas work with matter, wetware, hardware or numbers.
Libraries as Research Facilities The humanities institutionalized their research facilities centuries ago; other subject areas did so much later, with labs and centers like CERN or EMBL.
The Advent of the Digital Transforming the physical research facilities into digital ones is a laborious and expensive exercise – and its potential is not yet exploited.
Digital Humanities & Libraries World Data Centers or the EBI are centralized – can Humanities Data Centers be at each institution?
Digital Resources in the Bodleian ~approaching petabyte scale of highly structured storage for texts and images ~ digitized images, another Million to come in the next 3 years, plus Google Books ~100 virtual machines … and by far most of these are resources of the Humanities. REFERENCE MISSING
Cultures of Knowledge An example of highly structured, intellectually curated data: more than unique people and 3500 locations identified in letters with annotations.
What’s the Score? In only a few months over scores have been described by the public.
Broadside Ballads Collaborative research introduces novel qualities into humanities research data management.
Google Books at the Bodleian Approaching one download a minute: Google books with estimated pages and words [Table: weekly downloads from 19 March to 2 June, with totals, broken down by network domain (uk, ac.uk, ox.ac.uk) and Bodleian Libraries subnets; values not preserved in the transcript]
THE STORY SO FAR
Size matters! Even though humanities often use qualitative and hermeneutic methodology – rather than quantitative – the size of data is significant.
Structure matters! Sizable numbers will not give a thorough idea of digital humanities data – structure is equally important. This can only be understood by example.
Collaboration matters! Involvement of colleagues in collaborative research and the public in crowdsourcing makes a difference.
RESEARCH DATA CHALLENGES IN THE HUMANITIES
1st Challenge: Diversity Humanities have a varied typology of research data, often requiring idiographic approaches. Thus, standardization is difficult (cf. citation), and so is finding computational skills.
2nd Challenge: Openness As with all researchers, competition, privacy and exploitation are impediments to data sharing. Do the humanities, more than others, keep the “ivory tower” attitude?
Accessibility of Humanities Texts From some bibliographic records it is hard to fill the humanities corpus. This might constrain discoverability of Humanities resources. Lösch, M., Waltinger, U., Horstmann, W., & Mehler, A. (2011). Building a DDC-annotated Corpus from OAI Metadata. Journal of Digital Information, 12(2) Waltinger, U., Mehler, A., Lösch, M., & Horstmann, W. (2011). Hierarchical Classification of OAI Metadata Using the DDC Taxonomy. In Chambers et al (Eds.), Advanced Language Technologies for Digital Libraries (Vol. 6699, pp ). Berlin / Heidelberg: Springer.
3rd Challenge: Inherent Obstacles Humanities research data show some peculiarities. An extreme example is the closure of archaeological data to protect sites against tomb raiders. Hogenaar, A., H. Tjalsma, & M. Priddy, “Research in the Humanities and Social Sciences”
4th Challenge: Implementing Policy Funders' policies are an approach for opening up data – but humanities produce much data outside of the regular project life cycle. Deposit of resources or datasets Grant Holders in all areas must make any significant electronic resources or datasets created as a result of research funded by the Council available in an accessible and appropriate depository for at least three years after the end of their grant. The choice of depository should be appropriate to the nature of the project and accessible to the targeted audiences for the material produced.
RESEARCH DATA OPPORTUNITIES IN THE HUMANITIES
1st Opportunity: Public Understanding Humanities research data are often more easily understood by the public than science data. The “Impact Regime” may even be an advantage for the humanities.
2nd Opportunity: Cultural Heritage Humanities research data are more likely to be accessed and preserved than research data in other subject areas.
3rd Opportunity: Infrastructure The infrastructure requirements for many humanities research data resemble those of digital libraries. No new research facilities have to be built. National Library of China
4th Opportunity: New Metrics It is likely that humanities research data have a web impact advantage. High societal interest could result in higher webometric and usage statistics ratings.
Another mindset? …to see text & images as humanities research data. ~ …to see the humanities as data intensive. ~ …to see a web impact advantage for the humanities. ~ …to see libraries as humanities research facilities.
Recommendations Exploit the good accessibility of humanities research themes through newspapers, exhibitions, crowdsourcing and citizen science. ~ Make as many research outputs web accessible as possible. ~ Invest in and support new metrics such as usage statistics and web-impact. ~ Strengthen partnership between humanities and other disciplines and libraries.