Analyses of first names in The Netherlands: full population studies

Gerrit Bloothooft, Institute of Linguistics OTS, Utrecht University

Bloothooft, Gerrit (The Netherlands). Naming and subcultures in the Netherlands (2 a), Tuesday 14.00

Ladies and gentlemen, in The Netherlands naming mechanisms changed considerably during the last century. Until 1950, naming after relatives was dominant: by tradition, well over 90% of children received the first name of one of their grandparents. This has changed radically since then, leaving parents free to follow their own preferences. I assume that these preferences are related to socio-cultural factors that originate in subcultures in society, and in this presentation I will demonstrate that this is indeed the case. This requires identifying the relevant subcultures and their naming preferences.

Dutch studies on first names: limited scientific work so far; a dictionary (20,000 entries); few socio-linguistic studies, with limited scope and small samples. The topic is extremely popular in the media. CTL colloquium June 2006

Research dimensions in onomastics: name form and spelling, origin, motives, time and place. All of these require a lot of data.

Full population: the Gemeentelijke Basis Administratie (GBA), the Civil Administration; available electronically from 1994; legal right to use the data for scientific research; 16+ million people.

Connected! UiL-OTS and the Meertens Institute were connected to the GBA on June 1, 2006. The right to make a rich data extraction for the full population (all persons with Dutch nationality) is planned for July 1, 2006.

Research proposal to NWO: "The first name revolution in the 20th century in The Netherlands: the first name as a measure of social and linguistic change".

Milestones: traditional naming (after relatives) decreased enormously during the 20th century, especially in the second half. The name law of 1970 gave parents full freedom. As a result, naming children became a very personal linguistic and social expression during the last 50 years.

Major topics: changes in naming after relatives; relations between names and social classes (sets and spelling); regional spread of names and dialectal influences; life cycles of names.

What do we get (per person): all first names; date, place, postal code and country of birth; gender; date of death (after 1994); parents: first names, date and place of birth; children: first names, date and place of birth; the administrative number of every person with their own record. This is unprecedented (also internationally)!

Looking for mechanisms: all research topics can be described as the search for large-scale mechanisms and relations, moving away from the individual name towards much higher levels of aggregation.

Towards name sets: from 16+ million names, with over 200,000 different first names, to a much smaller number of name sets with homogeneous properties.

A previous study (2000-2004): first names from the National Social Security Bank (SVB) for all children born since 1983: first name (official, no nickname, but..), year of birth, family code (separate table), postal code (four digits).

I was fortunate that we had recently acquired a database with the first names of all children born since 1983. This database included the year of birth, a family code, through which we knew which children belong to the same family, and a postal code, which allows geographical studies. The data came from the National Social Security Bank, which is responsible for all social payments, including those for children. Almost all parents are entitled to receive this payment for their children.

A very rich source: 4.2 million children (1983-2002), about 200,000 per year; 1.9 million families; 176,800 different first names; 108,500 unique names; 3,120 names with a frequency > 100 represent 85% of the children.

The database is very large. It contains the first names of over 3.5 million children from the period we studied, 1983-1999. These children came from 1.9 million families. They had over 150 thousand different names, of which two-thirds occurred only once. A little more than 3000 names had a frequency over 100, and these represented 85% of all children. Many studies can be envisioned using such a rich and complete source. I present only one investigation, specifically targeted at the rare information on the names of children per family.
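The summary figures above (number of different names, names that occur only once, and the share of children covered by names with a frequency above 100) are simple counts over a name-frequency table. The sketch below assumes the input is a plain list with one first name per child; the function name `name_table_summary` is hypothetical and the example input is toy data, not the SVB data.

```python
from collections import Counter

def name_table_summary(names, threshold=100):
    """Summarise a name-frequency table built from one first name per child."""
    counts = Counter(names)
    total = sum(counts.values())
    # names borne by exactly one child ("unique names")
    unique = sum(1 for c in counts.values() if c == 1)
    # names whose frequency exceeds the threshold, and the share of children they cover
    frequent = [c for c in counts.values() if c > threshold]
    coverage = sum(frequent) / total if total else 0.0
    return {
        "children": total,
        "distinct_names": len(counts),
        "unique_names": unique,
        "frequent_names": len(frequent),
        "coverage": coverage,
    }

# toy input (not real data), with a low threshold so the example stays small
print(name_table_summary(["Tim"] * 5 + ["Eva"] * 3 + ["Jelle", "Noura"], threshold=2))
```

With the real table one would keep the default `threshold=100`; `coverage` then corresponds to the "85% of the children" figure.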

Data reduction needed: there are far too many names to describe one by one. We need names with common properties, not from an etymological or linguistic point of view, but based on the choices of parents: name use!

Naming and social classes. Hypothesis: there are social classes with their own naming preferences. These classes/subcultures may relate to culture/language (Frisian, Arabic, Turkish, Surinamese, Antillean, ...), religion (Catholic, Protestant, Islam, ...), sociological status (education, income, ...) and geography (urban, rural, regional, ...).

Research aims: identification of social classes (and their naming preferences) on the basis of the first names of children per family; study of the relation between these subcultures (first names) and socio-cultural and geographic factors.

I believe such subcultures exist, and therefore the research aims are to identify subcultures and their naming preferences on the basis of the first names of children per family. If we succeed, we want to analyze these subcultures with respect to geographic and socio-cultural factors.

Method (a chain of names): parents choose first names (with higher probability) from a set that is popular in their subculture (relatives, friends, neighbours, ...) [social group size is about 150]. This is informative only if there is more than one child (more than one name) in a family. Pairs of first names (from one family) are the unit of analysis.

If parents choose names for their children from a certain set, this is only informative for us if there is more than one child. With two or more first names from a family, we know that if one of the names belongs to a set, the other names are likely to belong to that set as well. We therefore use pairs of first names (from the same family) as the unit for further analysis.

Method (a chain of names): children in one family: Mark, Peter, Linda. If Mark is popular in a subculture, then Peter and Linda may be popular as well. Name pairs: Mark - Peter, Peter - Mark, Mark - Linda, Linda - Mark, Peter - Linda, Linda - Peter.

Here is an example. A family has the children Mark, Peter and Linda. If we have found that Mark is popular in a subculture, this family suggests that Peter and Linda may be popular in that subculture as well. The name pairs that underlie further analysis are in this case Mark - Peter, Peter - Mark, Mark - Linda, Linda - Mark, Peter - Linda and Linda - Peter. On the basis of such name pairs we can find, for instance, the names of all brothers and sisters of Mark, and their frequencies.

Method (a chain of names): select all families with two or more children (1.17 million families, 2.81 million children); derive all pairs of first names from a single family (2.12 million different pairs in all); compute the frequency of each pair. The higher the frequency of a pair, the more likely it is that the first names in the pair belong to the same set.

In our database we had 1.17 million families with more than one child. These families had 2.81 million children. In all, 2.12 million DIFFERENT pairs of names were found (many of them with a low frequency).
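The pair derivation described in these method slides can be sketched in a few lines. This is a minimal illustration, assuming each family is given as a list of its children's first names; the function name `name_pairs` is hypothetical. As in the Mark, Peter, Linda example, pairs are ordered, so a family with n children yields n(n-1) pairs, and single-child families are skipped because they carry no pair information.

```python
from collections import Counter
from itertools import permutations

def name_pairs(families):
    """Count ordered pairs of sibling first names over all families.

    `families` is an iterable of lists, one list of first names per family.
    """
    pair_counts = Counter()
    for children in families:
        if len(children) < 2:
            continue  # a single child gives no information about name sets
        # permutations(..., 2) yields every ordered sibling pair,
        # e.g. ("Mark", "Peter") and ("Peter", "Mark")
        pair_counts.update(permutations(children, 2))
    return pair_counts

# toy families (not real data)
counts = name_pairs([["Mark", "Peter", "Linda"], ["Esther", "Judith"], ["Esther"]])
print(counts.most_common(3))
```

`Counter.most_common` then lists the pairs by frequency, which is how a ranking of the most frequent name pairs can be produced from the full set of families.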

Most frequent name pairs (frequency, pair of first names):
1091 Johannes - Maria
790 Johannes - Johanna
754 Jeroen - Martijn
727 …
572 Mohamed - Fatima
459 Lars - Niels

Here are the most frequent pairs of first names. Johannes and Maria are at the top, followed by Johannes and Johanna. Interestingly, the combination Johanna and Maria is already in fourth place, indicating strong links between the three names Johannes, Maria and Johanna. Other examples of closely related names are the Dutch names Jeroen and Martijn, the Arabic names Mohamed and Fatima, and the Scandinavian names Lars and Niels.

Clustering of first names: define a measure that reflects the relationship between two names; combine names which mutually have a strong relationship into a set (Johannes, Maria, Johanna, …).

Name relationship measure. Esther: 7,967 girls, 12,973 brothers and sisters, 276 times a sister Judith (= 2.1%). Judith: 4,828 girls, 8,033 brothers and sisters, 276 times a sister Esther (= 3.4%). Geometric average (2.7%): a symmetric measure of the relationship between the two names.

In some more detail, let us look at the names Esther and Judith. There are almost 8000 girls named Esther, and they have about 13,000 brothers and sisters. Of these, 276 were named Judith, which is 2.1% of all their brothers and sisters. There were fewer girls named Judith, but of course they had 276 sisters named Esther as well, which is 3.4% of all brothers and sisters of Judith. We used the geometric mean of both percentages (√(2.1 × 3.4) ≈ 2.7% in this case) as a measure of the relationship between the two first names. A value of 100% would imply that all siblings of Esther were named Judith, and vice versa.
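The relationship measure can be written out directly. With c the number of times a child named A has a sibling named B, and S_A and S_B the total numbers of brothers and sisters of all children named A and B, the two percentages are 100c/S_A and 100c/S_B, and their geometric mean simplifies to 100c/√(S_A·S_B). The sketch below (the function name `relation_measure` is hypothetical) reproduces the Esther/Judith value from the slide's own figures.

```python
import math

def relation_measure(pair_count, siblings_a, siblings_b):
    """Symmetric relationship between first names A and B, in percent.

    pair_count: number of times a child named A has a sibling named B
                (by symmetry, also the number of times B has a sibling A)
    siblings_a: total number of brothers and sisters of all children named A
    siblings_b: total number of brothers and sisters of all children named B
    """
    pct_a = 100 * pair_count / siblings_a  # share of A's siblings named B
    pct_b = 100 * pair_count / siblings_b  # share of B's siblings named A
    return math.sqrt(pct_a * pct_b)  # geometric mean of both percentages

# Esther and Judith, using the figures on the slide
print(round(relation_measure(276, 12973, 8033), 1))
```

Using the geometric rather than the arithmetic mean keeps the measure low when the relation is strong in only one direction, for example when a very frequent name co-occurs with a rare one.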

Clustering of first names: name pairs from a (subculture-related) set have the highest relation measure. Esther: Judith 2.7, Mirjam 2.4, Ruben 1.2, David 1.1. Judith: Esther 2.7, Mirjam 1.6, Ruben 1.0, Miriam 0.8.

If we look at the sibling names most strongly related to Esther and to Judith, we immediately see a preference for names from the Old Testament at the top. These names are likely members of the same set.

Clustering: start with strongly related name pairs; add each new name pair to an existing cluster or start a new cluster; iterative procedure.

Clustering results: 4,013 first names (pairs with a frequency > 4); result: 340 name sets. A limited number of large sets and a high number of small sets; the top 25 sets are the most illustrative: 2,887 first names, 2.64 million children (75%).

An iterative clustering procedure was used to assign each first name to a set. We limited the material to first names from pairs with a frequency higher than four, which left about four thousand first names. The procedure resulted in 340 different name sets. Some included a large number of names, others only a few. The top 25 of these sets already proved very illustrative: they contained almost 2900 first names and covered 75% of all 3.5 million children.
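The clustering steps (start from strongly related pairs, then add each pair to an existing cluster or start a new one) can be illustrated with a greedy sketch. This is not the procedure actually used, which the slides describe only as iterative. The assumptions are: pairs are given as hypothetical `(name_a, name_b, pair_freq, relation)` tuples; pairs with a frequency of 4 or less are dropped; pairs are visited once, from the strongest to the weakest relation; and a pair whose two names already sit in clusters is skipped rather than merging those clusters.

```python
def cluster_names(pairs, min_pair_freq=5):
    """Greedy one-pass clustering of first names into name sets (a sketch).

    pairs: iterable of (name_a, name_b, pair_freq, relation) tuples.
    Only pairs with pair_freq >= min_pair_freq (i.e. > 4 by default) are used.
    """
    cluster_of = {}  # name -> index into `clusters`
    clusters = []    # list of sets of first names
    kept = [p for p in pairs if p[2] >= min_pair_freq]
    # visit pairs from the strongest to the weakest relation
    for a, b, _freq, _rel in sorted(kept, key=lambda p: p[3], reverse=True):
        in_a, in_b = a in cluster_of, b in cluster_of
        if not in_a and not in_b:
            # neither name assigned yet: start a new cluster
            cluster_of[a] = cluster_of[b] = len(clusters)
            clusters.append({a, b})
        elif in_a and not in_b:
            # extend the cluster that already contains a
            cluster_of[b] = cluster_of[a]
            clusters[cluster_of[a]].add(b)
        elif in_b and not in_a:
            # extend the cluster that already contains b
            cluster_of[a] = cluster_of[b]
            clusters[cluster_of[b]].add(a)
        # both names already assigned: clusters are left unchanged
    return clusters
```

A fuller version could repeat the pass, merge clusters when both names are already assigned, or require a minimum relation value. These are design choices that the slides leave open.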

Features of name sets: period of maximum popularity (refine!): traditional, pre-modern (1950-1980), modern; language: Dutch, Frisian, English, American, French, Spanish, Italian, [Arabic, Turkish], common Western; topic area: nature, history & culture, Old Testament; length: short (one syllable), long.

In general terms, the name sets could be described using the following features. First, the period of maximum popularity of the first names: there are traditional names, predominantly used until 1950; names that gained popularity in the fifties and sixties, which we called pre-modern names; and names that have become popular since then, the modern names. Furthermore, language is an important factor: we have Dutch, Frisian, English, American, French, Spanish, Italian, Arabic and Turkish names. In this presentation I exclude the Arabic and Turkish first names because they form separate, closed classes. Then there are sets with names that are quite common in Western societies, like Mark. There are also sets related to topic areas, such as nature, history & culture and the Old Testament. Finally, the length of the name proved to be an important factor, with a distinction between short names (of one syllable) and longer names.

A map of name sets: a presentation of the name sets as a map, based on the mutual relations between name sets. The closer two name sets are on the map, the more related they are.

Since it is impossible to discuss the resulting name sets one by one, I attempted to present them in the form of a map. In such a map, the closer the name sets, the more related they are.

[Figure: map of name sets. Sets shown: Short American & English; Spanish & Italian; Long American & English; Pre-modern English & French; Long names from the Old Testament; Names from nature; Long names from history and culture; Short modern Common Western; Pre-modern Common Western; Long French; Scandinavian; Pre-modern Dutch; Short modern Dutch; Traditional Dutch (Latin | Dutch); Short traditional Dutch; Frisian.]

Here is the result. We see … Name sets in bold type are the relatively larger ones, while color indicates declining name sets. The decline is most dramatic for the Traditional Dutch set, which dropped from over 90% to about 10% of all names in a few generations, and for the pre-modern sets, but the American & English sets also had their maximum popularity a few years ago.

Dimensions [Figure: schematic of the map, with the labels Dutch, Frisian / Common Western / Foreign; Traditional / Pre-modern / Modern; Short / Long.]

The previous map can be schematized with a few major dimensions: the language factor, Dutch & Frisian -> Common Western -> Foreign; independent of this, the time factor, Traditional -> Pre-modern -> Modern; and the length factor, short -> long names.

[Figure: the map of name sets, each labelled with its most frequent first name. Spanish & Italian: RICARDO; Long American & English: MICHAEL; Short American & English / Pre-modern English & French: DENNIS, KIM; Names from the Old Testament: DANIËL; Names from nature: IRIS; Names from history and culture: LAURENS; Short modern: TIM; Common Western Pre-modern: MARK; Common Western / French / Scandinavian: NIELS, CHARLOTTE; Pre-modern Dutch: JEROEN; Short modern Dutch: BART; Traditional Dutch: JOHANNES | JAN; Short traditional Dutch: TEUN; Frisian: JELLE.]

For those of you who want to get a flavor of the first names themselves, here are the name sets again, each with its most frequent first name.

Geographical distribution at the level of four-digit postal code areas [3584]: big differences between postal code areas (city quarters, villages (religion)); enough children for characterisation, on average 1200 births per postal code area in 20 years; some further grouping of names needed.

We now wanted to characterize each postal code area. For this, we computed for each name group the deviation from the grand average for The Netherlands. The name group with the largest positive deviation won.
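The characterization rule in the notes (compare each name group's share in an area with its national share and let the largest positive deviation win) can be sketched as follows. The slides do not say how the deviation was measured. Here it is assumed to be the difference in shares, in percentage points, and the function name `dominant_group` and the toy counts are hypothetical.

```python
def dominant_group(area_counts, national_counts):
    """Name group whose share in an area exceeds its national share the most.

    area_counts, national_counts: dicts mapping name group -> number of children.
    Deviation = area share minus national share, in percentage points
    (an assumption; the exact measure is not given in the slides).
    """
    area_total = sum(area_counts.values())
    nat_total = sum(national_counts.values())

    def deviation(group):
        area_share = 100 * area_counts.get(group, 0) / area_total
        nat_share = 100 * national_counts[group] / nat_total
        return area_share - nat_share

    return max(national_counts, key=deviation)

# toy counts (not real data)
national = {"Frisian": 10, "Short": 50, "Foreign": 40}
area = {"Frisian": 30, "Short": 40, "Foreign": 30}
print(dominant_group(area, national))
```

In this toy example, Frisian names make up 30% of the area against 10% nationally, so that group wins even though Short names are more numerous in the area.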

Further grouping: Traditional names (Latin form); Traditional names (Dutch); Frisian names; Pre-modern names (Dutch, Western); Foreign names (English); Short modern names (Dutch, Western, Scandinavian); Names from OT, history, culture, nature; Arabic & Turkish names [unrelated group]; Other [low frequency]. (%: 8 5 3 12 24 13 7 23)

In terms of our previous map, this grouping appears as follows. [Figure: the map of name sets with the further grouping overlaid: Traditional (Latin, Dutch), Frisian, Pre-modern, Short, Foreign, History & Culture.]

Traditional (Dutch): Aaltje, Barend, Dirkje, Evert, Geertje, Harm, Jantje, Klaas, Margje, Teunis

Traditional (Latin form): Adriana, Bernardus, Christina, Eduard, Elisabeth, Franciscus, Geertruida, Hubertus, Johanna, Krijn, Maria

Frisian names: Aafke, Bauke, Douwe, Froukje, Joppe, Jitske, Jelle, Menno, Sietske, Onno, Wietske, Wiebe

Pre-modern names (Dutch, Western): Anniek, Anita, Carla, Frank, Jochem, Jeroen, Linda, Mark, Marloes, Paul, Suzanne

Foreign names (English): Amanda, Dennis, Danny, Chantal, Henry, Isabella, Kim, Kevin, Melissa, Ricardo, Samantha, Stephen

Short names (modern, Dutch, Western, Scandinavian): Anne, Bart, Eva, Gijs, Lisa, Kaj, Niels, Sanne, Sofie, Tim

Short names - Religion [Figure: map; religion legend: none, Protestant, Catholic]

Old Testament, history, culture, nature: Daniël, Esther, Judith, Naomi, Willemijn, Diederik, Frederieke, Maurits, Iris, Fleur, Jasmijn

[Figure: maps of income and religion; income legend from lowest to highest]

Arabic and Turkish names: Fatima, Mohamed, Noura, Hamza, Sara, Yassin, Fatma, Mustafa, Hatice, Mehmet

Further geographical analysis: per postal code area, the percentage of children per name group (8 values); these percentages reflect the social composition of the area; factor analysis on the data from 3584 postal code areas; 10 typical profiles.

10 profiles: Traditional – Latin form; Traditional – Dutch; Transitional, Traditional Dutch to pre-modern; Transitional, Traditional Latin form to foreign; Pre-modern; Foreign; Short; Elite; Arabic-Turkish; Frisian.

Example profile: Traditional – Latin form (percentage of children per name group)
Traditional – Latin form: 37
Traditional – Dutch: 18
Frisian names: 1
Pre-modern names: 8
Foreign names: 12
Short names: 6
Names from OT, history, culture, nature: 6
Arabic and Turkish names: 0
Other: 12

Naming map of the Netherlands [Figure: map of postal code areas by profile; labels: Frisian, pre-modern, Arab-Turkish, traditional Dutch, elite, >foreign, traditional Latin, short, foreign]

EU constitution votes and education level [Figure: maps]

Educational level [Figure: map; education level legend from highest to lowest]

Conclusions: successful data reduction; name groups & subcultures relate to language, income, education and religion; geographic representation at the level of four-digit postal code areas is just right; the factor time should be included.

The Wegener connection: a direct marketing company that organises a national consumer questionnaire twice a year; 200,000 families per year; a wide range of information, including income and education level; includes the first names and year of birth of all family members.

Correlation at family level (instead of postal code level): name set & income of parents; educational level (of both parents); preferences of parents (newspapers, underwear, cars, insurance, holidays, ...).

Mathematical studies: the life cycle of a name; Zipf's behavior (a few names with a high frequency, a lot of names that are unique); the information function of a name in communication.
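Zipf's behavior can be checked on a name-frequency table by regressing log frequency on log rank. Under Zipf's law f(r) ≈ C/r^s the slope is -s, close to -1 in the classic case. The sketch below uses only the standard library and ordinary least squares. Quantifying the "information function" of a name as its self-information in bits (-log2 of its relative frequency) is our assumption, not something the slides specify, and all function names are hypothetical.

```python
import math
from collections import Counter

def rank_frequency(names):
    """Name frequencies sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(names).values(), reverse=True)

def zipf_exponent(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    Under Zipf's law f(r) ~ C / r**s the slope is -s. Needs >= 2 frequencies.
    """
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

def self_information(count, total):
    """Bits of information carried by a name borne by `count` of `total` people."""
    return -math.log2(count / total)

# toy frequencies that follow f = 60 / r exactly (not real data)
print(zipf_exponent([60, 30, 20, 15, 12, 10]))
```

A rare name carries many bits and a very frequent name few, which is one way to express why the many unique names are the most informative in communication. On real data, the fit is usually restricted to a range of ranks, because the long tail of names occurring once flattens the curve.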


Research dimensions in onomastics: name form and spelling, origin, motives, time and place. YES, we can do great research on this with the full population data!

Contact. Book: Over voornamen, Het Spectrum (2004). E-mail: Gerrit.Bloothooft@let.uu.nl. Homepage: www.let.uu.nl/~Gerrit.Bloothooft/personal. Mail: Trans 10, 3512 JK Utrecht, The Netherlands.