The biologists Tine Tammes and Jan Willem Moll and the astronomer Jacobus Kapteyn tried to apply Pearson’s statistics
Ida H. Stamhuis, Vrije Universiteit Amsterdam
From 1983 I trained as a historian of science. 1989 PhD: Statistics in the Netherlands in the 19th century. See also STAtOR 2002 and 2003 for three short articles. About statistics in Dutch medicine and biology: Statistical Mind 1850-1940, Vol. II (Stamhuis, Klep and Maarseveen, eds).
Two of today's statements deal with the past:
– Statisticians should know the history of their own field; too many wheels are reinvented over and over again.
– “For evident reasons a theory for the benefit of biologists would be best worked out by a biologist. If he cannot do it, because he is but a poor statistician, the next best thing – still not approximately good – would be to have the work entrusted to a biologist working in close cooperation with an expert statistician. Almost the worst possible thing will be to put the task wholly on a regular statistician.” Kapteyn (1916)
Story about three scientists in Groningen: a case of collaboration between scientists (and statisticians?) with the aim of applying statistics in biology
– Jacobus Kapteyn (1851-1922), astronomer (statistician?)
– Jan Willem Moll (1851-1933), botanist
– Tine Tammes (1871-1947), geneticist
How can we use this story to give substance to the two statements? For panel discussion.
Small community: in 1914 the ‘Senaat’ consisted of 43 members and met regularly; the Faculty of Science and Mathematics had a much smaller number. Kapteyn and Moll: Ossenmarkt. Tammes: Oranjesingel. Botanical Institute at Rozenstraat.
Jacobus Kapteyn (1851-1922). Professor in Groningen in 1878 (27 years old). No astronomical observatory. Photographic plates from Cape Town. 1900: 345,000 stars recorded in a catalogue (3 vol.).
More than a million stars were known: too many to study individually. Kapteyn developed the ‘Plan of Selected Areas’. He agreed with colleagues to concentrate on 206 stellar areas ‘uniformly demarcated over the sky’. Kapteyn developed statistical methods.
Jan Willem Moll (1851-1933). Studied botany in Amsterdam. Became professor of botany in Groningen in 1890. 1899: Botanical Laboratory at the Rozenstraat. The famous Hugo de Vries was his friend and colleague.
From 1894 Hugo de Vries started to apply statistical methods, which he learnt from Adolphe Quetelet (1796-1874), Francis Galton (1822-1911) and Walter Weldon (1860-1906). (This resulted in the rediscovery of the Mendelian laws.) He wrote to Moll about it. Moll’s reaction: “I browsed through Galton, but don't understand it yet. It takes me a surprising effort to imagine these things, especially when something new is introduced.”
1890: Tine Tammes entered the botanical institute. 1892: teacher’s certificate in physics, chemistry, cosmography. 1897: teacher’s certificate in botany, zoology, mineralogy, geology. In 1897 she became Moll’s assistant. 1911: honorary doctor. 1919: extraordinary professor, in the Netherlands the first in genetics and the second female professor.
1902: Kapteyn was asked by ‘several botanical students’ and other interested persons to explain (in the Botanical Institute) the statistical methods of Quetelet, Galton, and Pearson ‘in a popular way’. Moll probably initiated this, and Tammes was probably one of the ‘students’.
What happened? Kapteyn started to study and discovered Karl Pearson’s Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material (1895). In 1903 he published Skew Frequency Curves in Biology and Statistics. Controversy between the two scholars: see the paper in ISR (by Stamhuis and Seneta).
Kapteyn 1903 – Pearson 1905 – Kapteyn 1906 – Pearson 1906 [– Kapteyn (and Van Uven) 1916]. 'Mathematical statistics' plus examples about skew frequency curves. Accusations: inappropriate starting points, plagiarism, serious mathematical mistakes.
The astronomer David Gill did not encourage his collaborator Kapteyn: “Why a man of special astronomical gifts, like yourself should waste his days in abstract mathematical work which so many men are capable of working at -- whilst there are so few to do what you can so well do -- I don’t know. After all what is the value or interest of a frequency curve compared with the structure of the universe?”
How successful was Kapteyn in his mediating role? Moll's skew curve machine: “those who are familiar with Galton's apparatus will readily understand the present one”. “The effect of the various causes is strictly proportional to the absolute dimensions”. Ripe berries: if the diameters are normally distributed, their cubes are skew.
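The berries example can be checked numerically. The following is only a minimal sketch: the mean of 10 and standard deviation of 2 for the diameters are hypothetical values chosen for illustration, not figures from Kapteyn or Moll, and `skewness` is a helper defined here.

```python
import random
import statistics


def skewness(values):
    """Moment coefficient of skewness: mean((x - mean)^3) / sd^3."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(x - mean) ** 3 for x in values]) / sd ** 3


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical berry diameters: normal, mean 10, standard deviation 2.
diameters = [rng.gauss(10.0, 2.0) for _ in range(100_000)]
# Cubes of the diameters (proportional to the volumes of spherical berries).
cubes = [d ** 3 for d in diameters]

print(f"skewness of diameters: {skewness(diameters):+.3f}")
print(f"skewness of cubes:     {skewness(cubes):+.3f}")
```

The diameters come out close to symmetric, while their cubes have a clearly positive skewness: the same material looks normal or skew depending on which quantity is measured.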
How successful was Kapteyn in his mediating role? Let us look at Tine Tammes' efforts to use statistics. She was known and admired by her colleagues for her inclination towards statistics. What did she grasp and apply?
Publications in which Tammes applied statistics:
(1904). On the influence of nutrition on the fluctuating variability of some plants.
(1907). Der Flachsstengel. Eine statistisch-anatomische Monographie.
(1911). Das Verhalten fluktuierend variierender Merkmale bei der Bastardierung.
(1914). Die statistische Methode bei der Beschreibung von Nahrungsmitteln, insbesondere von Stärke. Onzième Congrès International de Pharmacie, 1108-1114.
(1915). Die genotypische Zusammensetzung einiger Varietäten derselben Art und ihr genetischer Zusammenhang.
1904: nutrition of plants. She stressed the importance of statistics: “The introduction of the statistical method (...) into botany has enabled us to formulate more sharply the formerly vague and insufficiently defined question of the influence of nutrition and also to interpret the results obtained easily and accurately.” She referred to Kapteyn: “300 measurements or countings, a number which, according to the calculations of Prof. Kapteyn, gives in investigations of this kind a sufficient guarantee of accuracy.”
She used the median M instead of the arithmetical mean, and the variability coefficient Q/M instead of the standard deviation. Q is the difference between the third and the first quartile. She introduced a so-called sensibility coefficient for the median as well as for the variability. The sensibility coefficient of the median was defined as (M_1 - M_2) / M_1, with M_1 representing the median of the results of good nutrition and M_2 the median of the results of bad nutrition.
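A minimal sketch of these quantities, assuming Q is the third quartile minus the first as stated on the slide. The two samples are hypothetical numbers invented for illustration, not Tammes' measurements, and the function names are ours. Note that quartile conventions differ; this sketch uses the default method of Python's `statistics.quantiles`.

```python
import statistics


def median_and_variability(values):
    """Return (M, Q/M): the median and the variability coefficient.

    Q is the third quartile minus the first quartile.
    """
    m = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    return m, (q3 - q1) / m


def sensibility_coefficient(value_good, value_bad):
    """(M_1 - M_2) / M_1, with M_1 from good and M_2 from bad nutrition."""
    return (value_good - value_bad) / value_good


# Hypothetical measurements (for example stem lengths in cm).
good_nutrition = [52, 55, 58, 60, 61, 63, 64, 66, 70, 75]
bad_nutrition = [30, 33, 35, 36, 38, 40, 41, 43, 47, 52]

m_good, v_good = median_and_variability(good_nutrition)
m_bad, v_bad = median_and_variability(bad_nutrition)

print(f"good nutrition: M = {m_good}, Q/M = {v_good:.3f}")
print(f"bad nutrition:  M = {m_bad}, Q/M = {v_bad:.3f}")
print(f"sensibility coefficient of the median: "
      f"{sensibility_coefficient(m_good, m_bad):.3f}")
```

Both M and Q/M depend only on the ordering of the observations, so they can be read off a sorted list of measurements without computing sums of squares.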
Moll later explicitly referred to her introduction of this coefficient when he argued that she was worthy to become an honorary doctor and later an extraordinary professor. For botanists the definition of such a concept was apparently something exceptional and admirable.
Der Flachsstengel, eine statistisch-anatomische Monographie (1907), 285 pages. She introduced the Pearson correlation coefficient. She needed 10 pages to explain what was new to her readership: “only recently correlation phenomena have begun to be closely studied”. She composed a correlation table. She did not calculate the correlation coefficient directly, but used a formula, again devised with the help of Kapteyn, so that the median again played its standard role.
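For readers unfamiliar with correlation tables, here is a minimal sketch of the direct product-moment calculation from such a table. This is explicitly not the median-based formula Tammes used, which the slide does not give; the class values and counts are hypothetical.

```python
import math


def pearson_r_from_table(x_values, y_values, counts):
    """Pearson correlation from a correlation (frequency) table.

    counts[i][j] is the number of observations with x_values[i] and y_values[j].
    """
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(row[j] for row in counts) for j in range(len(y_values))]
    mean_x = sum(x * t for x, t in zip(x_values, row_totals)) / n
    mean_y = sum(y * t for y, t in zip(y_values, col_totals)) / n

    sxx = syy = sxy = 0.0
    for i, x in enumerate(x_values):
        for j, y in enumerate(y_values):
            c = counts[i][j]
            sxx += c * (x - mean_x) ** 2
            syy += c * (y - mean_y) ** 2
            sxy += c * (x - mean_x) * (y - mean_y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical table: rows are classes of one character, columns of another.
x_classes = [1, 2, 3]
y_classes = [10, 20, 30]
table = [
    [8, 3, 1],
    [3, 10, 3],
    [1, 3, 8],
]

print(f"r = {pearson_r_from_table(x_classes, y_classes, table):.3f}")
```

Counts concentrated on the diagonal of the table give a coefficient near 1; counts spread evenly over the table give a coefficient near 0.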
Das Verhalten fluktuierend variierender Merkmale bei der Bastardierung (1911). Hybridizations of four flax varieties, studying various quantitative characters: seed length and width, flower colour and others. Seed length in particular produced clear results.
Mendel’s laws for quantitative characters: the multiple factor hypothesis. No dominance and recessiveness. [Crossing diagram: AA x aa becomes Aa; self-pollination of Aa gives AA, Aa and aa. With several factors, everything from all A to all a occurs, for example a genotype with 3 A and 5 a; the number of A alleles follows B(8, ½).]
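The distribution B(8, ½) on this slide can be tabulated directly. A minimal sketch, assuming four independent factors without dominance, so that each of the 8 allele positions in a second-generation plant is A with probability ½; the function name is ours.

```python
from math import comb


def allele_count_distribution(n_factors):
    """P(k alleles of type A) for k = 0..2n under B(2n, 1/2)."""
    n_alleles = 2 * n_factors
    return [comb(n_alleles, k) / 2 ** n_alleles for k in range(n_alleles + 1)]


dist = allele_count_distribution(4)
for k, p in enumerate(dist):
    print(f"{k} A, {8 - k} a: {p:.4f}")
```

The slide's example genotype with 3 A and 5 a corresponds to the entry k = 3, while the pure parental types sit at the two ends, k = 0 and k = 8.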
This work was groundbreaking from the viewpoint of genetics: it brought continuous characters into the 'Mendelian domain'. Compared to her colleagues she took more characteristics of the relevant distribution into account: not only the chance of extreme outcomes, but also the shape of the corresponding binomial (Bernoulli) distribution. As a result she could draw well-argued conclusions on the number of factors that contributed to the value of a particular character. She was apparently more familiar with this kind of probabilistic reasoning than the others working on the same problem.
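Why both the extremes and the shape carry information about the number of factors can be sketched as follows. This is an illustration under simplifying assumptions (independent factors, no dominance, equal effects), not a reconstruction of Tammes' actual calculation; the function names are ours.

```python
from math import sqrt


def extreme_fraction(n_factors):
    """Probability that a second-generation plant carries only A alleles.

    Each factor is AA with probability 1/4, so n factors give (1/4)^n.
    The same value holds for plants carrying only a alleles.
    """
    return 0.25 ** n_factors


def allele_count_sd(n_factors):
    """Standard deviation of the number of A alleles under B(2n, 1/2)."""
    return sqrt(2 * n_factors * 0.25)


for n in range(1, 6):
    print(f"{n} factor(s): extreme fraction = {extreme_fraction(n):.5f}, "
          f"sd of allele count = {allele_count_sd(n):.3f}")
```

With more factors the parental extremes quickly become too rare to count reliably in a sample of realistic size, which is one reason the overall shape of the distribution is a useful second source of evidence.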
Die statistische Methode bei der Beschreibung von Nahrungsmitteln, insbesondere von Stärke (1914), in a volume of an international pharmaceutical congress. “In a statistical investigation it is very important that the necessary material is randomly chosen. When this aspect is not given sufficient attention, the whole investigation is without meaning.” She explained her choice of M and Q/M. She took on the role of a mediator to make statistical methods better known in the pharmaceutical world.
Die genotypische Zusammensetzung einiger Varietäten derselben Art und ihr genetischer Zusammenhang (1915). Several Mendelian crossings. Observed and expected numbers in several classes were compared. Like her colleagues in Mendelian genetics, she did not apply a formal test, such as the chi-square test, to decide whether empirical outcomes and theoretical expectations were sufficiently in accordance with each other.
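For comparison, this is roughly what such a formal test looks like. A minimal sketch with hypothetical counts for a 1:2:1 Mendelian ratio; the numbers are invented for illustration and are not taken from Tammes' paper. The critical value 5.991 is the standard 5% point of the chi-square distribution with 2 degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts in three classes (for example AA, Aa, aa).
observed = [28, 47, 25]
ratios = [1, 2, 1]  # Mendelian expectation 1:2:1

total = sum(observed)
expected = [total * r / sum(ratios) for r in ratios]
stat = chi_square_statistic(observed, expected)

CRITICAL_5PCT_DF2 = 5.991  # 3 classes, so 2 degrees of freedom

print(f"expected counts: {expected}")
print(f"chi-square statistic: {stat:.3f}")
if stat < CRITICAL_5PCT_DF2:
    print("no significant deviation from 1:2:1 at the 5% level")
else:
    print("significant deviation from 1:2:1 at the 5% level")
```

Without such a test, agreement between observed and expected numbers has to be judged by eye, which is what the Mendelian geneticists of the period did.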
Concluding remarks. At first sight Kapteyn’s efforts were not very successful. Nothing in the three extensive papers on skew frequency curves was used by Tammes. The character of the methods Tammes used was different from Kapteyn’s: Kapteyn's arguments were essentially mathematical, whereas the methods Tammes used were, from a mathematical viewpoint, rather elementary. Nevertheless it is obvious that for her the application of statistical concepts was essential, that she was admired by her colleagues and that she attained important results. It also became clear that Kapteyn played a decisive role. In this sense Kapteyn was successful in his role of mediator.