Computational Paradigms in the Humanities – eHumanities and their role and impact in transdisciplinary research
Gerhard Budin, University of Vienna
Using Computers in the Humanities
Enhancing the empirical basis
Improving the testability of hypotheses
Examples:
  – corpus linguistics
  – computational linguistics
  – their theories and methods as research tools
Dynamic visualization technologies enabling virtual reconstructions in digital archaeology
Visualization of social relations and many other techniques
  -> supporting the emergence of new research paradigms that use all forms of ICT, and revealing the epistemic or heuristic role of these technologies
Research documentation: databases, repositories, etc.
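To illustrate how corpus data can make a linguistic hypothesis testable, here is a minimal sketch of the log-likelihood (G²) keyness test commonly used in corpus linguistics to check whether a word is significantly more frequent in one corpus than in another. It is not taken from the presentation: the function names and the example counts are hypothetical, and the threshold 3.84 is the chi-square critical value for p < 0.05 with one degree of freedom.

```python
import math

# Chi-square critical value for p < 0.05 with 1 degree of freedom
CRITICAL_P05 = 3.84


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Log-likelihood (G2) score for one word observed in two corpora.

    freq_a, freq_b: occurrences of the word in corpus A and corpus B
    size_a, size_b: total number of tokens in corpus A and corpus B
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected counts under the null hypothesis that the word is equally
    # frequent (relative to corpus size) in both corpora
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # by convention, 0 * ln(0) contributes 0
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def is_significant(freq_a: int, size_a: int, freq_b: int, size_b: int,
                   critical: float = CRITICAL_P05) -> bool:
    """True if the frequency difference exceeds the chosen critical value."""
    return log_likelihood(freq_a, size_a, freq_b, size_b) > critical


if __name__ == "__main__":
    # Hypothetical counts: 100 hits in corpus A and 10 in corpus B,
    # each corpus containing 10,000 tokens
    score = log_likelihood(100, 10_000, 10, 10_000)
    print(round(score, 2), is_significant(100, 10_000, 10, 10_000))  # 85.47 True
```

The two-cell form used here only compares one word across two corpora; with very large data sets even tiny differences become "significant", so effect-size measures are usually reported alongside it.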
Computational linguistics as a result – a case study
Early development as a hybrid discipline
Based on the analytical philosophy of science and the formal philosophy of language
The machine translation hype of the 1950s spawned whole generations of researchers all over the world
Nature of this paradigm:
  – Highly international and global, team-oriented
  – Transdisciplinary while preserving a diversity of theories and methods (well beyond SSH and computer science)
  – Highly competitive: it has developed its own traditions of formal evaluation of research results, and many repositories
  – The language of research (incl. publication) is predominantly English
  – while the objects of study are essentially all the languages of the world (incl. field linguistics)
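Formal evaluation in computational linguistics typically compares system output with a manually annotated gold standard. A minimal sketch of one widely used scheme, precision, recall and F1 over sets of annotations, is shown below; the span format (start token, end token, label) and the sample data are hypothetical.

```python
from typing import Hashable, Set, Tuple


def precision_recall_f1(predicted: Set[Hashable],
                        gold: Set[Hashable]) -> Tuple[float, float, float]:
    """Compare system annotations with a gold standard.

    Annotations can be any hashable items, e.g. (start, end, label) spans.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical named-entity spans: (start token, end token, label)
    gold = {(0, 1, "ORG"), (4, 5, "LOC"), (7, 9, "ORG")}
    predicted = {(0, 1, "ORG"), (4, 5, "LOC")}
    print(precision_recall_f1(predicted, gold))
```

Exact set matching is the strictest variant; shared evaluation campaigns often also define partial-match or label-only scores.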
Research Infrastructures for Language Resources and Language Technologies – the CLARIN project: Common Language Resource and Technology Infrastructure
Basic information on CLARIN
A European network for building and strengthening collaborative infrastructures for scientific research wherever language resources and language technologies (LRT) are relevant
An FP7 project in the ESFRI area (“preparatory phase”)
Interdisciplinary orientation
Builds upon existing research infrastructures as well as on previous projects and initiatives (LIRICS, Elsnet, EAGLES, etc.), and focuses on long-term sustainability and preservation as well as on access and collaboration
What are language resources and language technologies?
(Digital) collections of language data, language corpora:
  – Full texts (in all languages, in diverse text types/genres)
  – Digital lexical resources, terminologies, ontologies
  – Lexicographical resources (dictionary production)
  – All modalities and presentation forms (spoken/speech, written, multimodal, etc.)
  – The most diverse forms of use and purposes
  – In all languages, in all domains, in all application contexts where they occur (…but needed for research)
Technologies for:
  – Text analysis, corpus analysis, language processing
  – Speech recognition, speech production, text production (multimodal)
  – Machine translation, computer-assisted translation (multimodal)
  – Dictionary production, technical documentation, technical communication, etc.
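One of the most basic corpus-analysis tools is the keyword-in-context (KWIC) concordance, which lists every occurrence of a word together with its surrounding context. The sketch below assumes plain-text input and a naive regex tokenizer with case-insensitive matching; real corpus tools use proper tokenization and annotation, and the function names are hypothetical.

```python
import re
from typing import List, Tuple


def kwic(text: str, keyword: str, window: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every occurrence.

    Tokenization is a naive regex over word characters, and matching is
    case-insensitive; both are simplifying assumptions.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


def print_concordance(hits: List[Tuple[str, str, str]], width: int = 30) -> None:
    """Print hits with the keyword aligned in a central column."""
    for left, key, right in hits:
        print(f"{left:>{width}}  [{key}]  {right}")


if __name__ == "__main__":
    sample = "Language data and language tools. Shared language resources matter."
    print_concordance(kwic(sample, "language", window=2))
```

Because the tokenizer ignores punctuation, contexts run across sentence boundaries; a sentence-aware tokenizer would be needed to prevent that.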
Aspects of shared research infrastructures
Shared human resources: researchers, technicians and other support staff; systematic training, international mobility (PhD and other levels)
Shared technical infrastructures (in the narrower sense of the term): hardware, software (in particular language technologies, i.e. tools, etc.), computing power (grids, etc.), broadband connectivity, rooms (labs with appropriate equipment), etc.
Common logistical and legal procedures (well tested and validated): IPR regulations, procedures for accessing data and software, open access (OA) approach
Shared language resources needed for research
Network of collaboratories
Goals
Unite existing digital archives into a federation of connected archives with unified web access
Provide language and speech technology tools as web services operating on (language) data in archives
  -> a service-oriented architecture (SOA) using SW standards
  -> implementation of relevant interoperability standards (beyond technical interoperability: semantic interoperability for semantic web services)
Provide scholars with access to data, support them in their work (on collaborative platforms) and encourage them to share their data and tools with research colleagues free of charge (where possible)
Overcome the high degree of fragmentation (caused by a lack of coordination, visibility, interoperability and sustainability)
Provide expertise in all countries (service network)
Provide language-independent tools that can be shared
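The idea of a federation of archives behind one unified access point can be illustrated with a toy sketch: every archive exposes the same search interface, and a federating layer sends one query to all of them and merges the results while recording where each hit came from. This is not CLARIN's actual protocol; the classes, the substring matching and the in-memory archives are hypothetical stand-ins for remote repositories and web services.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Hit:
    archive: str       # provenance: which archive returned the hit
    resource_id: str
    snippet: str


class Archive(Protocol):
    """Common interface every federated archive is assumed to offer."""
    name: str

    def search(self, query: str) -> List[Hit]: ...


class InMemoryArchive:
    """Hypothetical stand-in for a remote archive or search web service."""

    def __init__(self, name: str, documents: Dict[str, str]) -> None:
        self.name = name
        self.documents = documents

    def search(self, query: str) -> List[Hit]:
        q = query.lower()
        return [Hit(self.name, rid, text)
                for rid, text in self.documents.items() if q in text.lower()]


def federated_search(archives: List[Archive], query: str) -> List[Hit]:
    """Send one query to every archive and merge the results."""
    hits: List[Hit] = []
    for archive in archives:
        try:
            hits.extend(archive.search(query))
        except ConnectionError:
            # A real remote archive may be unreachable; skip it rather
            # than failing the whole federated query
            continue
    # Deterministic order: by archive, then by resource identifier
    return sorted(hits, key=lambda h: (h.archive, h.resource_id))


if __name__ == "__main__":
    archive_a = InMemoryArchive("archive-a", {"a1": "Spoken corpus of Viennese German",
                                              "a2": "Legal terminology database"})
    archive_b = InMemoryArchive("archive-b", {"b1": "German speech recordings"})
    for hit in federated_search([archive_b, archive_a], "german"):
        print(hit.archive, hit.resource_id, hit.snippet)
```

The shared interface is the point of the design: once each archive implements it, new archives can join the federation without changes to the search layer, which is the technical side of the interoperability goal listed above.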
Conclusion: research benefits for eHumanities and computational science
CLARIN is a concrete contribution to initiatives such as eScience/computational science, eHumanities, etc. at the level of research infrastructures by:
  – Strengthening the empirical data basis (depth and breadth)
  – Enabling empirical tests of hypotheses in the language sciences on the basis of massive data sets (scalability) and computer-intensive processing using various methods
  – Supporting the emergence of new research paradigms that use multimodal and multilingual language corpora and technologies
  – Raising the question of identity: who are we? Computer scientists or humanities scholars, both, neither, or something different?
  – Conclusions for assessment and evaluation:
      a “hybrid method” approach is key
      team orientation and other habits of the natural sciences and computer science prevail
      ERIH: computational linguistics (CL) is barely represented so far; more focus is needed on cross-disciplinary coverage