Building a New Political Sphere? Early European Integration in Dutch Digitized Newspapers
Mariona Coll Ardanuy, Maarten van den Bos, Caroline Sporleder


1 Building a New Political Sphere? Early European Integration in Dutch Digitized Newspapers
Mariona Coll Ardanuy, Maarten van den Bos, Caroline Sporleder

2 Introduction
December 8th, 2014, CLARIN-NeDiMAH Workshop, The Hague
● October 29th, 2004: Treaty establishing a Constitution for Europe
● 2005: rejection by referendum in France and the Netherlands
● 2005: ratification process is stopped due to lack of public support

3 European Integration Historiography
● Need for a transnational history of public opinion on the European project
● Focus on the role of:
  – Non-state actors
  – Civil society
  – Public opinion
● Shift the focus away from interstate relations and government policies

4 The role of media
● Mass digitization of historical materials
● Enhancement of digital techniques
Media: «an important but mostly overlooked player in integration history», H.J. Trenz (2008)

5 Approach
● Automatic extraction of networks of people mentioned in news stories
  – Weighted according to significance
  – Distributed according to co-occurrence
  – Dynamic to represent change
  – Containing the most relevant concepts discussed

6 Social Networks in Historical Research
● Examples:
  – Padgett and Ansell 1993: Rise and action of the Medici
  – Rochat et al. 2014: Rise of Venetian maritime empire
  – Jackson 2014: Unseen relationships in medieval Scotland
● Graphs created manually
● Mostly do not use newspapers as a source

7 Computational Background
● In European integration studies, computational techniques to map out public discourse are still at the earliest stage:
  – de Roode (2012): 1000 editorials, by hand
  – Medrano (2003) and Meyer (2010): traditional techniques applied to newspapers
● Entity-centric analysis of data, popular in quantitative literary analysis (Coll Ardanuy and Sporleder 2014)

8 Advantages of the network approach
● No rigid theory
● Draws our attention to public debate
● Bird's eye view of a period of time
● Generator of hypotheses

9 The Data
● Digitized newspapers from the Dutch Royal Library
● Three newspapers:
  – De Tijd (Catholic)
  – Het Vrije Volk (socialist)
  – De Telegraaf (no formal political affiliation)
● Period:
● Restrict search: 'Europa', 'Europese', 'Europeaan'
● High OCR confidence
● Total number of articles: 6128

10 The Method: outline
● Obtaining the nodes
  – Human Name Recognition
  – Coreference Resolution
● Establishing the relations
● Building the network

11 Obtaining the Nodes: Human Name Recognition
● Stanford Named Entity Recognizer
  – Training data for modern Dutch (CoNLL-2002 shared task)
  – tokens, 3032 person names
● Heuristics to increase recall:
  – Create a list of 1650 titles or professions preceding human names:
    ● From Wikipedia (Lijst_van_beroepen)
    ● Expand the list by capturing the uncapitalized word between an age expression and a capitalized word
  – Capture any capitalized word after an age expression or a title or profession
● F-score improvement from 0.70 to 0.76
Ex 1: 'de 63-jarige Frank Donoghue'
Ex 2: 'de kapitein Ben Shaw'
Ex 3: 'de 21-jarige pianist Theo'
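The two heuristics on this slide could be sketched roughly as follows. This is only an illustration of the idea, not the authors' implementation: the regular expressions, the function names `expand_titles` and `find_names`, and the one-entry title list are all assumptions, and the demo sentence is stitched together from the three slide examples.

```python
import re

AGE = r"\d{1,3}-jarige"                     # age expression, e.g. "63-jarige"
NAME = r"[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*"  # one or more capitalized words

# Age expression, optional lowercase word (candidate profession), then a name.
AGE_PATTERN = re.compile(rf"\b({AGE})\s+(?:([a-z]+)\s+)?({NAME})")


def expand_titles(text, titles):
    """Add lowercase words found between an age expression and a name."""
    found = {m.group(2) for m in AGE_PATTERN.finditer(text) if m.group(2)}
    return set(titles) | found


def find_names(text, titles):
    """Return (name, age_expression, title) triples in order of appearance."""
    hits = {}  # start offset of the name -> (name, age, title)
    for m in AGE_PATTERN.finditer(text):
        hits[m.start(3)] = (m.group(3), m.group(1), m.group(2))
    if titles:
        alternatives = "|".join(re.escape(t) for t in sorted(titles))
        title_pattern = re.compile(rf"\b({alternatives})\s+({NAME})")
        for m in title_pattern.finditer(text):
            # Keep the age-based hit if the same name was already captured.
            hits.setdefault(m.start(2), (m.group(2), None, m.group(1)))
    return [hits[offset] for offset in sorted(hits)]


# Hypothetical sentence built from the three slide examples.
text = ("de 63-jarige Frank Donoghue sprak met de kapitein Ben Shaw "
        "en de 21-jarige pianist Theo")
titles = expand_titles(text, {"kapitein"})  # stand-in for the 1650-entry list
for name, age, title in find_names(text, titles):
    print(name, age, title)
```

The sketch only recognizes ASCII capitals at the start of a name; a real system would need to handle accented initials and Dutch name particles such as "van den".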

12 Obtaining the Nodes: Coreference Resolution
● For each identified human name, we keep:
  – Age information
  – Title or profession information
● Coreference resolution per document by string matching:
  – Assumption: two identical surface forms in the same article refer to the same person; we keep the less ambiguous referent
● We do not perform name disambiguation; coreference across the whole dataset is resolved by string matching, from least to most ambiguous, with these considerations:
  – Match initials, hypocorisms
  – Age or title/profession conflict
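A minimal sketch of string-matching coreference in the spirit of this slide is given below. It is an assumption-laden illustration, not the authors' code: the function names, the rule that a longer name counts as less ambiguous, and the choice to leave a mention unmerged when it fits zero or several entities are all hypothetical. Hypocorisms and age conflicts are not handled here; only initials and title/profession conflicts are.

```python
def tokens_compatible(short, full):
    """True if a token equals the full token or is its initial ('R.' vs 'Robert')."""
    stripped = short.rstrip(".")
    return stripped == full or (len(stripped) == 1 and full.startswith(stripped))


def names_compatible(mention, entity_name):
    """True if the mention could be a shorter form of the entity name."""
    m, e = mention.split(), entity_name.split()
    if len(m) > len(e) or m[-1] != e[-1]:
        return False  # the last token (assumed surname) must match exactly
    return all(tokens_compatible(a, b) for a, b in zip(m[:-1], e[:-1]))


def resolve(mentions):
    """mentions: list of (name, title_or_None). Returns a list of entity dicts."""
    entities = []
    # Names with more tokens are treated as less ambiguous and processed first.
    for name, title in sorted(mentions, key=lambda item: -len(item[0].split())):
        candidates = [
            entity for entity in entities
            if names_compatible(name, entity["name"])
            and not (title and entity["title"] and title != entity["title"])
        ]
        if len(candidates) == 1:
            entity = candidates[0]
            entity["aliases"].add(name)
            entity["title"] = entity["title"] or title
        else:
            # No match, or several possible matches: keep as a separate entity.
            entities.append({"name": name, "title": title, "aliases": {name}})
    return entities
```

With this rule, "R. Schuman" and a bare "Schuman" would both be attached to an existing "Robert Schuman", while a "Schuman" introduced with a conflicting profession would stay separate.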

13 Establishing the Relations
● Edge: co-occurrence of nodes per article
● Characteristics of the network:
  – Undirected
  – Weighted
  – Dynamic
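The edge definition on this slide (co-occurrence of nodes per article, undirected and weighted) can be sketched as below. The function name and the use of a raw article count as the weight are assumptions made for illustration; the slides describe a different weighting for the final edges.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(articles):
    """articles: iterable of lists of resolved person names, one list per article.

    Returns a Counter mapping an alphabetically sorted pair (a, b) to the
    number of articles in which both people are mentioned.
    """
    edges = Counter()
    for people in articles:
        # set() so a person mentioned twice in one article counts once;
        # sorted() so (a, b) and (b, a) collapse into one undirected edge.
        for a, b in combinations(sorted(set(people)), 2):
            edges[(a, b)] += 1
    return edges
```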

14 Establishing the Relations: Attributes of the node
December 8th, 2014, HistoInformatics Workshop, Barcelona
● We extract three attributes per node:
  – Other names by which the entity is referred to
  – Year the entity was born
  – Title or profession preceding the entity

15 Establishing the Relations: Attributes of the edge
● We extract three attributes per pair of nodes:
  – Tf-idf weighting for the common documents
  – Most common words of the common documents
  – List of co-occurring news articles
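The slide does not spell out how tf-idf is applied, so the following is only one possible reading: terms from the articles shared by a pair are scored by their frequency in those articles times the log inverse document frequency over the whole collection. The function name, the input layout, and the tie-breaking rule are assumptions.

```python
import math
from collections import Counter


def edge_attributes(docs, mentions, a, b, top_n=3):
    """docs: {doc_id: list of tokens}; mentions: {doc_id: set of person names}.

    Returns the shared articles of a and b and their top_n tf-idf terms.
    """
    common = sorted(
        d for d in docs if a in mentions.get(d, ()) and b in mentions.get(d, ())
    )
    # Term frequency over the shared articles only.
    tf = Counter(tok for d in common for tok in docs[d])
    # Document frequency over the whole collection.
    df = Counter(tok for toks in docs.values() for tok in set(toks))
    n_docs = len(docs)
    scores = {tok: count * math.log(n_docs / df[tok]) for tok, count in tf.items()}
    # Highest score first; ties are broken alphabetically.
    top_terms = sorted(scores, key=lambda tok: (-scores[tok], tok))[:top_n]
    return {"articles": common, "top_terms": top_terms}
```

A term that appears in every article of the collection scores zero under this scheme, which is why a search keyword such as 'europa' would not surface as characteristic of any single edge.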

16 Fragment of a network

17 Building the Network
● Dynamic networks: succession of yearly static networks
● Python library NetworkX to construct the networks
● Network analysis software Gephi to visualize them
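A dynamic network as a succession of yearly static networks might be assembled along these lines. The sketch uses only the standard library so that it runs anywhere; the function name and the input layout are assumptions, not the authors' pipeline.

```python
from collections import Counter, defaultdict
from itertools import combinations


def yearly_networks(articles):
    """articles: iterable of (year, list of person names) pairs.

    Returns {year: Counter of undirected edges}, one static network per year.
    Years whose articles never mention two people together get no entry.
    """
    networks = defaultdict(Counter)
    for year, people in articles:
        for pair in combinations(sorted(set(people)), 2):
            networks[year][pair] += 1
    return dict(networks)
```

Each yearly edge counter could then be loaded into NetworkX, for example with `Graph.add_weighted_edges_from((a, b, w) for (a, b), w in edges.items())`, and written out with `networkx.write_gexf` for inspection in Gephi.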

18 Analysis: expected results
● Important presence of personae such as: Robert Schuman, Dirk Stikker, Ernest Bevin, Konrad Adenauer, Winston Churchill, Georges Bidault, Willem Drees, Alcide de Gasperi, Jean Monnet
● Socialist newspaper gives more weight to local stories and less weight to big names
  – 10 most common nodes in De Tijd: 16%
  – 10 most common nodes in Het Vrije Volk: 10%
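The slide does not say how the two percentages were computed. One plausible reading, sketched here purely as an assumption, is the share of all person mentions in a newspaper that falls on its ten most frequent nodes; the function name is hypothetical.

```python
from collections import Counter


def top_share(mentions, k=10):
    """Fraction of all mentions accounted for by the k most frequent names."""
    counts = Counter(mentions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in counts.most_common(k)) / total
```

Under this reading, a lower value means attention is spread over more people, which would be consistent with a paper that gives more weight to local stories.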

19 Analysis: interesting results
● Importance of politics:
  – central actors in the networks are consistently politicians
  – political process rather than economic
● Ideologically motivated process:
  – process of peace
  – central concepts are: 'solidariteit' and 'gemeenschapszin'
● Continued centrality of American politicians even though from May 1950 the process was seen as a truly European matter

20 Conclusions
● Ongoing work
● Computational techniques to strengthen the empirical foundations of new European integration history
● Network extraction:
  – Raises new questions
  – Allows more refined search
  – Extends the scope of inquiry
● European integration history is transnational, multilingual and ramified

21 Thank you for your attention

