The three Bibliometric ‘Laws’ – plus one

Slides:



Advertisements
Similar presentations
Chapter 2: Frequency Distributions
Advertisements

Table of Contents Exit Appendix Behavioral Statistics.
BIBLIOMETRICS Presented by Asha. P Research Scholar DOS in Library and Information Science Research supervisor Dr.Y.Venkatesha Associate professor DOS.
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Probabilistic Information Retrieval.
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Probabilistic Information Retrieval.
J-Shaped Distributions
CHAPTER 6 Statistical Analysis of Experimental Data
Slide 1 Statistics Workshop Tutorial 4 Probability Probability Distributions.
Measures of Central Tendency U. K. BAJPAI K. V. PITAMPURA.
Boolean, bibliometrics, and beyond
Chapter 1: Introduction to Statistics
Chapter 2 The Research Enterprise in Psychology. n Basic assumption: events are governed by some lawful order  Goals: Measurement and description Understanding.
1 Statistical Analysis - Graphical Techniques Dr. Jerrell T. Stracener, SAE Fellow Leadership in Engineering EMIS 7370/5370 STAT 5340 : PROBABILITY AND.
CS324e - Elements of Graphics and Visualization Java Intro / Review.
R. LAKSHMI MLIS – Final Year Student.  Publication productivity of any literature is the index to know the growth of that science. Measuring the publication.
Correlation1.  The variance of a variable X provides information on the variability of X.  The covariance of two variables X and Y provides information.
Chapter 3: Central Tendency. Central Tendency In general terms, central tendency is a statistical measure that determines a single value that accurately.
1 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Basic Principle of Statistics: Rare Event Rule If, under a given assumption,
Bibliometric research methods Faculty Brown Bag IUPUI Cassidy R. Sugimoto.
Information Retrieval and Web Search Text properties (Note: some of the slides in this set have been adapted from the course taught by Prof. James Allan.
Physics 114: Exam 2 Review Lectures 11-16
COURSE: JUST 3900 INTRODUCTORY STATISTICS FOR CRIMINAL JUSTICE Instructor: Dr. John J. Kerbs, Associate Professor Joint Ph.D. in Social Work and Sociology.
The Examination of Residuals. Examination of Residuals The fitting of models to data is done using an iterative approach. The first step is to fit a simple.
Assumes that events are governed by some lawful order
1 Statistical Properties for Text Rong Jin. 2 Statistical Properties of Text  How is the frequency of different words distributed?  How fast does vocabulary.
Exploring Text: Zipf’s Law and Heaps’ Law. (a) (b) (a) Distribution of sorted word frequencies (Zipf’s law) (b) Distribution of size of the vocabulary.
LECTURE 14 TUESDAY, 13 OCTOBER STA 291 Fall
Lotkaian Informetrics and applications to social networks L. Egghe Chief Librarian Hasselt University Professor Antwerp University Editor-in-Chief “Journal.
Basic Statistical Terms: Statistics: refers to the sample A means by which a set of data may be described and interpreted in a meaningful way. A method.
Exploring Text: Zipf’s Law and Heaps’ Law. (a) (b) (a) Distribution of sorted word frequencies (Zipf’s law) (b) Distribution of size of the vocabulary.
Fundamental Concepts of Algebra
Chapter 2: Frequency Distributions. Frequency Distributions After collecting data, the first task for a researcher is to organize and simplify the data.
Linear Correlation (12.5) In the regression analysis that we have considered so far, we assume that x is a controlled independent variable and Y is an.
1 Frequency Distributions. 2 After collecting data, the first task for a researcher is to organize and simplify the data so that it is possible to get.
Statistical Properties of Text
1 Statistical Analysis - Graphical Techniques Dr. Jerrell T. Stracener, SAE Fellow Leadership in Engineering EMIS 7370/5370 STAT 5340 : PROBABILITY AND.
Publication Pattern of CA-A Cancer Journal for Clinician Hsin Chen 1 *, Yee-Shuan Lee 2 and Yuh-Shan Ho 1# 1 School of Public Health, Taipei Medical University.
Theoretical distributions: the Normal distribution.
The normal distribution
Descriptive Statistics: Tabular and Graphical Methods
Statistical analysis.
 APPLICATION OF BRADFORD LAW AND LEIMKUHLER MODEL ON JOURNAL OF SOIL BIOLOGY AND ECOLOGY (JSBE) : A COMPARATIVE STUDY Dr. Sangeeta Paliwal Associate.
Chapter 2: Methods for Describing Data Sets
Topic 3: Measures of central tendency, dispersion and shape
Statistical analysis.
Chapter 5 Probability 5.2 Random Variables 5.3 Binomial Distribution
Measures of Central Tendency: Mean, Mode, Median
Banaras Hindu University
Martin Rajman, Martin Vesely
Univariate Descriptive Statistics
Module 6: Presenting Data: Graphs and Charts
From Randomness to Probability
From Randomness to Probability
Basic Statistical Terms
Numerical Measures: Skewness and Location
Exponential & Logarithmic Equations
Descriptive and inferential statistics. Confidence interval
The Scientific Method.
The normal distribution
Information Science in International Perspective
Information Science in International Perspective
Peter Ingwersen, IVA, Denmark University College, Oslo, Norway 2011
Sampling Distributions
The Long Tail Challenge
Analysis and Interpretation of Experimental Findings
Exponential & Logarithmic Equations
Section 11-1 Review and Preview
Exponential & Logarithmic Equations
From Randomness to Probability
MGS 3100 Business Analysis Regression Feb 18, 2016
Presentation transcript:

The three Bibliometric ‘Laws’ – plus one Peter Ingwersen Royal School of LIS, DK Oslo University College, NO

Three Bibliometric “laws” Lotka’s law (1926): www.lis.uiuc.edu/~jdownie/biblio/lotka.html Productivity of authors (researchers); how many researchers have written 1, 2, 3… articles? Frequency distribution (probability) Bradford’s law (1934): www.lis.uiuc.edu/~jdownie/biblio/bradford.html Scatter of the articles of a given subject over journals equal-production groups Zipf’s lov (1949): www.lis.uiuc.edu/~jdownie/biblio/zipf.html The frequency of words in text Rank/Frequency distribution Ingwersen 2011

Lotka’s law Frequency distribution of scientific production Lotka’s law of authorship describes the publication frequencies for authors within a given domain: " . . . the number (of authors) making n contributions is about 1/n² of those making one; … and the proportion of all contributors, that make a single contribution, is about 60 percent" (Lotka, 1926) Ingwersen 2011

Lotka’s law f(n) = C / n2 - where f(n) denotes the number of authors within a domain who, in a given period, have published n articles C is a constant and denotes the number of researchers that have produced 1 article The number of researchers that have published n articles is proportional with 1 / n2 – that is … Ingwersen 2011

Lotka’s law If 100 researchers have published one article each within a given domain then Lotka’s law predicts that: 100 / 22 = 25 researchers, will have published 2 articles, 100 / 32 = 11 researchers, will have published 3 articles, and 100 / 72 = 2 researchers will have published 7 articles … Ingwersen 2011

Lotka further This van be illustrated (and verified/falsified) by means of Dialog Rank command In large corpora and over a suitable time period (domain dependent) Lotka’s Law is quite accurate, but not (necessarily) in a statistical manner. Ingwersen 2011

Bradford’s law Bradford’s law of scattering is often used as a guideline to determine the number of core journals within a given subject Most of the articles on a given subject have a tendency to be bunched together in a small number of journals The rest tends to scattered over a large number of journals Ingwersen 2011

Bradford’s law (verbal) Two different wordings: a verbal and a graphical “If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the number of periodicals in the nucleus and succeeding zones will be as 1 : n : n2” (S. C. Bradford, 1934) Ingwersen 2011

Bradford’s law Bradford’s law is usually presented in equal-production groups form, with this special procedure: rank the journal-titles in order of their productivity (i.e. in order of the frequency with which they publish articles on the specified subject) divide the journal-titles into m groups or zones, where each zone produces 1/m of the total number of articles then the proportion of journals in each zone will be definable by the progression n0:n1:n2 ..., where n is the Bradford multiplier  1 : n : n2 Ingwersen 2011

Bradford’s law e.g. m = 3 zones: top 8 journals produce 110 articles next 29 journals produce 133 articles next 127 journals produce 152 articles number of journals in each zone follows the progression 8:29:127 ... which is roughly equal to the progression 1:4:17 ... which is roughly equal to 40:41:42, so n = 4 (8x1 : 8x4 : 8x16) Ingwersen 2011

Bradford’s law The mathematical proportion between the number of journals in the core and in zone 2 is a constant = n (Bradford’s multiplier); and to zone 2 n² Bradford expressed this relationship as 1:n:n², but he never defined the law mathematically – n was set at approx. 5 Simple formulation: an 80/20 rule, that is, 20% of the journals in the bibliography covers 80% of the articles in the bibliography on a given subject over a given time period Ingwersen 2011

a is defined arbitrarily Bradford’s law Example: Bradford’s geophysical data: The number of articles should ideally be the same in the three zones Bradford’s law is not statistically exact, but is very widespread as a rule-of-thumb a is defined arbitrarily Ingwersen 2011

Bradford’s law 1, n, n2 = 1, 5, 25 a Journ. Articles #Jn #Art 9 journals/ 429 articles 9 journals = 1/3 of articles 258 journals/ 404 articles 5*5*9 = 225 journals = 1/3 of articles 59 journals/ 499 articles 5*9 = 45 journals = 1/3 of articles 1, n, n2 = 1, 5, 25 a Journ. Articles #Jn #Art Ingwersen 2011

Bradford’s law (graphical)  Groos’ drop Zone 3 Linear scale of cumulative number of relevant articles Zone 2 Zone 1 Ingwersen 2011 Logarithmic scale of cumulative number of journals

Bradford’s law Requirements for the bibliography if Bradford’s law is to apply: Its subject must be well-defined It must be complete, that is cover all relevant journals and articles Only a limited number of years should be examined so that all journals have the same chance of contributing with articles Groos’ drop: may reflect that the bibliography is not complete, or that the data set is flawed – dirty (duplicates)? Ingwersen 2011

Bradford’s ’Law’, graphic version – Dialog search Cumulated articles No. of journals (log) Eget eksempel Ingwersen 2011

… plus Garfield’s ’Law of Concentration’, 1971 Instead of ’bibliometric scatter’ Garfield suggested ’bibliometric concentration’ – rather an axiom: Each discipline has a small core of journals covering a large portion of articles (Bradford) The ’long tail’ (”comet tail”) of each discuipline consists of the cores of other disciplines! Hence, the limited no. of journals in SCI Ingwersen 2011

Zipf’s law Zipf’s law is used to predict word frequencies in full text The law states that if you (in a relatively large text corpus): “…list the words occurring within that text in order of decreasing frequency, the rank of a word on that list multiplied by its frequency will equal a constant.” (Zipf, 1949) r x f = c, where r is the rank number and f the frequency of each term (happens to describe the linear part of a Bradford curve) Nor Zipf’s law is statistically exact, but it is extremely valuable for term weighting purposes in search engines with automatic indexing Ingwersen 2011

Zipf’s law Ingwersen 2011

Zipf’s law (linear and logarithmic) Ingwersen 2011