October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media.

1 October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media

2 October 6, 2015 WAHSP/BILAND Research team: Stephen Snelders (UU), Pim Huijnen (UU), Daan Odijk (ISLA, UvA), Fons Laan (ISLA), Maarten de Rijke (ISLA), Toine Pieters (UU)

3 10/6/2015 Research Creating big-data resources

4 National Library of the Netherlands Digital Newspaper Archive: > 10,000,000 pages, > 1,200 titles, 1618-1995, > 30,000,000 articles. Still growing...

5 How did/do you study 30 million newspaper articles?

6 Dutch press on Germany, Frank van Vree (1989). Sampling: from > 1,200 titles to 4; from 1618-1995 to 1930-1939; from > 31,000,000 articles to 4,000.

7 10/6/2015 Research

8 Developing semantic document selection tools

9 October 6, 2015 Research WE NEED: A semi-automatic and interactive open-source application An application that does not replace, but supports the intuition and insights of the historical researcher with expert knowledge of a specific topic or domain. An application that is user-friendly.

10 October 6, 2015 Research Problem: context and background of Dutch drug and eugenics debates in time. Aim: understanding and evaluation of public debates around drugs, addiction and eugenics in the Netherlands, 1900-1945. Research question: what are the dynamics (in terms of patterns and trends) of public debates and sentiments around drugs, addiction and eugenics in the Dutch newspapers in the first half of the twentieth century?

11 October 6, 2015 Research Poe’s detective finds the truth by using data in those newspaper articles that do not concern the murder. In a similar way we will find terms and sentiments in those newspaper articles that may seem irrelevant, but are not.

12 E-everything: Information extraction. Recognize structure in text. Part of speech: noun, verb, … Entities: people, organisations, locations, temporal expressions, … Relations: who, what, with whom, how, why.
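As a rough illustration of what recognizing entities and temporal expressions in text can mean, here is a minimal rule-based sketch. It is not the tooling used in the project: real information extraction would rely on trained part-of-speech taggers and named-entity recognizers. The patterns, the function name `extract`, and the sample sentence are all assumptions made for this example.

```python
import re

# Hypothetical toy patterns; a real system would use trained taggers
# rather than regular expressions.
# Four-digit years between 1600 and 1999, as simple temporal expressions.
YEAR = re.compile(r"\b(1[6-9]\d{2})\b")
# Runs of two or more capitalized words, a crude stand-in for named entities.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


def extract(text):
    """Return crude entity candidates and years found in the text."""
    return {
        "entities": CAPITALIZED_RUN.findall(text),
        "years": [int(y) for y in YEAR.findall(text)],
    }


# Invented sample sentence, used only to exercise the patterns.
sample = "Toine Pieters and Stephen Snelders studied opium debates from 1900 to 1945."
result = extract(sample)
print(result)
```

The capitalized-run heuristic misses names containing lowercase particles (such as "van") and single-word names, which is one reason trained recognizers are normally preferred.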

13 E-everything: Information extraction (2)

14 10/6/2015 Enjoyable but what does it tell us?

15 10/6/2015 Research

16 10/6/2015 Research Start Query: Opium

17 10/6/2015 Research Drugs and drug policy

18 Odijk, D., de Rooij, O., Peetz, M.-H., Pieters, T., de Rijke, M., Snelders, S. (2012). "Semantic Document Selection". In: TPDL 2012: Theory and Practice of Digital Libraries. Springer, September 2012.

19 10/6/2015 Combining and clustering queries

20 10/6/2015 Research By carefully inspecting the word counts, we found quantitative evidence for historical turning points that indicated the criminalization of the drugs debate around 1924
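To make the idea of inspecting word counts over time concrete, the sketch below computes the yearly relative frequency of a term and reports the year with the largest rise. It is only an illustration under stated assumptions: the articles are invented toy data (not taken from the newspaper archive), the term "police" is an arbitrary choice, and the function names are hypothetical rather than part of the WAHSP/BILAND software.

```python
import re
from collections import Counter


def yearly_relative_frequency(articles, term):
    """Map each year to the share of that year's tokens equal to `term`."""
    hits = Counter()
    totals = Counter()
    for year, text in articles:
        tokens = re.findall(r"\w+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(term.lower())
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


def largest_jump(series):
    """Return the year whose value rose most relative to the preceding year."""
    years = sorted(series)
    pairs = zip(years, years[1:])
    _, year = max(pairs, key=lambda p: series[p[1]] - series[p[0]])
    return year


# Invented toy articles as (year, text) pairs; not real archive content.
articles = [
    (1922, "opium sold as medicine by the pharmacist"),
    (1923, "opium and morphine used as medicine"),
    (1924, "police arrest opium smuggler police seize opium"),
    (1925, "police and courts act against smuggling"),
]

freq = yearly_relative_frequency(articles, "police")
print(freq)
print(largest_jump(freq))
```

Normalizing by the total number of tokens per year matters on a growing archive, since raw counts would rise simply because more pages were digitized for later years.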

21 10/6/2015 Research Eugenics case; query overerving (heredity), 1867: primarily associations with health-related terms/entities

22 10/6/2015 Research Eugenics case;

23 10/6/2015 Research Eugenics case; query overerving, 1935: In 1935, however, the medical context of using the term inheritance made way for a legal and racial context.
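One simple way to compare the context of a query term between two years is to count the words that appear in the same articles as the query. The sketch below does this on invented toy articles; the data, the stopword list, and the function name `cooccurring_terms` are assumptions for illustration and do not reproduce the project's actual association method.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list, only sufficient for the toy data below.
STOPWORDS = {"the", "of", "and", "a", "in", "is", "to", "on"}


def cooccurring_terms(articles, query, year):
    """Count words that share an article with `query` in the given year."""
    counts = Counter()
    for article_year, text in articles:
        if article_year != year:
            continue
        tokens = re.findall(r"\w+", text.lower())
        if query in tokens:
            counts.update(t for t in tokens if t != query and t not in STOPWORDS)
    return counts


# Invented toy articles as (year, text) pairs; not real archive content.
articles = [
    (1867, "overerving of disease in the family doctor reports disease"),
    (1867, "the doctor on overerving and health"),
    (1935, "overerving and race in the law"),
    (1935, "court ruling on overerving race and law"),
]

context_1867 = cooccurring_terms(articles, "overerving", 1867)
context_1935 = cooccurring_terms(articles, "overerving", 1935)
print(context_1867.most_common(2))
print(context_1935.most_common(2))
```

Comparing the two counters side by side shows which vocabulary surrounds the query in each year; on real data the counts would need normalization and a proper stopword list.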

24 10/6/2015 NEW HORIZONS in DIGITAL HUMANITIES. E-Humanity Approaches to Reference Cultures: The Emergence of the United States in Public Discourse in the Netherlands, 1890-1990. Challenges: 1. OCR repair; 2. Improving text-mining software and data infrastructure; 3. Developing new historical research strategies; 4. Educating historians and other humanities researchers.

