Bibliometric [scientometric, webometric, informetric …] Searching Data used for assessing impact of scholarly output Ying Sun.

1 Bibliometric [scientometric, webometric, informetric …] Searching Data used for assessing impact of scholarly output Ying Sun

2 Central idea Use of quantitative methods – statistics – to study & characterize recorded communication - ‘literature’ - of all kinds In order to: –describe research output with various indicators & distributions –use in evaluating scholarly scientific performance New tools have significantly increased & changed the role of searching & searchers Dr. Ying Sun, Digital Information Retrieval

3 ToC 1. Goals, definitions 2. Reasons, applications – why? 3. Data sources for bibliometric analyses 4. Methods & measures – how? 5. A sample of examples 6. Implications for searching. Caveats

4 1. GOALS, DEFINITIONS Bibliometrics, scientometrics, webometrics …

5 Metric Studies Applied in many fields: Sociometrics, Econometrics, Biometrics … –deal with statistical properties, relations, & principles of a variety of entities in their domain Metric studies in information science follow these by concentrating on statistical properties & the discovery of associated relations & principles of information objects, structures, & processes

6 Goals of Metric Studies To characterize entities under study statistically –more ambitiously to discover regularities & relations in their distributions & dynamics in order to observe predictive regularities & formulate laws: describe numerically, predict, apply Same in information science –portray statistically entities under study: literature, documents, … all kinds of information objects & processes as related to science, institutions, the Web … but also people – authors; more recently: also scholarly productivity

7 Definitions Bibliometrics –“...the application of mathematical and statistical methods to books and other media of communication.” Alan Pritchard (1969) –“… the quantitative treatment of the properties of recorded discourse and behavior pertaining to it.” Robert Fairthorne (1969)

8 Definitions but with differing contexts Scientometrics: bibliometric & other metric studies specifically concentrating on science Informetrics: study of the quantitative aspects of information in any form - broadest Webometrics: quantitative analysis of web-related phenomena Cybermetrics: quantitative aspects of information resources on the whole Internet E-metrics: measures of electronic resources, particularly in libraries For simplicity, we will use bibliometrics here to cover all

9 2. BASE, REASONS, USE Why? What? What for?

10 Based on what in entities has been & could be COUNTED In documents (as entities): –authors –their institutions, countries –sources – e.g. journals –references – who & what is cited –age of references –anything else that is countable In Web entities –identifying relationships between Web objects –link structures: out-links, in-links, self-links; nodes, central nodes; in a way analogous to citations And derivation of structures based on any of these
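
The link counts listed above can be tallied much like citation counts. The following is a minimal sketch, assuming links are available as (source, target) pairs; the function name `link_counts` and the example site names are hypothetical and not part of the lecture.

```python
from collections import Counter


def link_counts(links):
    """Tally in-links, out-links and self-links from (source, target) pairs."""
    in_links, out_links, self_links = Counter(), Counter(), Counter()
    for source, target in links:
        if source == target:
            self_links[source] += 1   # a node linking to itself
        else:
            out_links[source] += 1    # link leaving the source
            in_links[target] += 1     # link arriving at the target
    return in_links, out_links, self_links


# Hypothetical link data between three sites.
links = [
    ("a.example", "b.example"),
    ("c.example", "b.example"),
    ("b.example", "a.example"),
    ("a.example", "a.example"),
]
in_links, out_links, self_links = link_counts(links)
print(in_links.most_common(1))  # the node with the most in-links
```

In-link counts play the role that citation counts play for documents: the most linked-to node is a candidate "central node".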

11 A lot is based on citations Citation analysis: –analysis of data derived from references cited in footnotes or bibliographies of scholarly publications Used to be just counts Now it also leads to examination & mapping of intellectual impact of scholars, projects, institutions, journals, disciplines, and nations Becoming increasingly popular & widely used – with important implications for searching

12 Reasons for bibliometric studies Understanding of patterns –discovery of regularities, behavior –“order out of documentary chaos” [Bradford, 1948] Analysis of structures & dynamics –discovery of connections, relations, networks –search for regularities - possible predictions Discovery of impacts, effects –relation between entities & amounts of their various uses –providing support for making of decisions, policies

13 Major branches of bibliometrics Relational Older - patterns, structures, relations, mappings –where bibliometrics started Data on what was observed –e.g. no. of articles/citations by/to an author; no. of journals with articles relevant to a topic; no. of articles/citations in/to a journal … Used for description, mapping of relations & prediction Evaluative Newer – impacts, effects –where bibliometrics became a big deal in many arenas Data from what was observed but looking for –measures of impact, prominence, ranking, bang Discovers who’s up & how much up Used for decisions, policies

14 Seeking … Thelwall (2008) Relational: Relational bibliometrics seeks to illuminate relationships within research, such as the cognitive structure of research fields, the emergence of new research fronts, or national and international co-authorship patterns Evaluative: Evaluative bibliometrics seeks to assess the impact of scholarly work, usually to compare the relative scientific contributions of two or more individuals or groups

15 Major Approaches Empirical Collection & study of data –establishment of measures –statistical & graphic analyses We will pursue some of these here –concentrate on empirical Theoretical Building of generalized models, theories –often mathematical, abstract –becoming highly specialized We will NOT pursue this here –but you should be aware that there are a lot of theoretical efforts

16 Users Relational Mostly scholars Mostly research oriented But also librarians for decisions –e.g. on collections, purchase, weeding Evaluative – new audience Library managers Analysts University administrators (deans, provosts) Directors of institutional research National governments & ministries Grant & funding agencies

17 Used in a Variety of Functions & Areas In collection development: identifying the most-useful materials by analyzing circulation records; journal / e-journal usage statistics; etc. In information retrieval: identifying top-ranked documents, authors: those most highly-cited; most highly co-cited; most popular; etc. In the sociology of knowledge: identifying structural and temporal relationships between documents, authors, research areas, universities etc. In policy making: justifying, managing or prioritizing support for a course of action in a number of areas – e.g. science policy, institutional policy

18 Use of Evaluative Bibliometrics Academic, research & government institutions for: –promotion and tenure, hiring, salary raises –decisions for support of departments, disciplines –grant decisions; research policy making –visualization of scholarly networks, identifying key contributions & contributors –monitoring scholarly developments –determining journal citation impact Resource allocation: –identifying authors most worthy of support; –research areas most worthy of funding –journals most worthy of support or purchase; etc.

19 Major bibliometric factors for evaluation of academic performance For individuals Number of publications in peer reviewed journals The impact factor of those journals The h-index For institutions Total no. of publications Total no. of citations Various ratios - per faculty, project …

20 Impact Indicators and Studies Several governments mandate citation analysis to –assess quality of research and institutions –inform decisions on support – determine support for journals –rank institutions, programs, departments, projects Many institutions practice it regularly

21 3. DATA SOURCES FOR BIBLIOMETRIC ANALYSES Where does stuff for analysis come from?

22 Main Sources for Bibliometric Analyses Bibliographies, indexes –once popular, not any more –once done manually - limited Documents in databases –computerization enabled wide collection of data & development of new methods Science statistics And then there are citations –as they became automated, use of bibliometrics exploded Web & Internet –mining connections & other networked aspects –but also applying some older methods to new data

23 Institute for Scientific Information (ISI, now Thomson Reuters) ISI launched in 1962 by Eugene Garfield –started by publishing Science Citation Index (SCI) & later Social Sciences Citation Index (SSCI) and Arts & Humanities Citation Index (A&HCI) [all still in Dialog] –these morphed into Web of Science (WoS) All cover only an ISI-selected set of journals –thus all citation results & studies are based on that set of journals, not the universe of journals and books, but the citations themselves are to whatever is cited –a limited source set is true of any database – Scopus, Google Scholar etc.

24 Impact of ISI Citation Databases Major source for bibliometric analysis Revolutionized use of citations –e.g. easy citation counts, tracing, establishment of connections … became possible Provided data for new types of analysis –e.g. mapping of fields, identifying research fronts Laid base for evaluative bibliometrics Instigated new types of searching –above & beyond subject searching

25 Expansion of Citation Data Sources Starting in the early 2000s citation data have been offered by a number of databases other than Web of Science, most notably –Scopus –Google Scholar and a lot of others This dramatically expanded the availability of data & types of analyses –a number of innovations were introduced –use of such data also expanded Challenge to WoS databases

26 Connections Data from relational bibliometrics is used for sorting, ranking, mapping … in evaluative bibliometrics Raw data obtained from relational analyses is then “milked” in many ways –often combined with other data e.g. ranked citation counts and financial data, enrollment data …

27 4. METHODS & MEASURES – HOW?

28 Overview A few older bibliometric laws & methods: –Lotka’s law deals with distribution of authors in a field –Bradford’s law deals with distribution of articles relevant to a subject across journals where they appear From citations: –citation age (or obsolescence) –co-citation –clustering & co-citation maps –bibliographic coupling –journal impact factor –self citation (auto-citation) –& many more.

29 Lotka’s law Lotka’s law (1926) – papers & authors Alfred Lotka (American mathematician, chemist and statistician) Formal The number of authors who have published n papers in a given field is roughly $1/n^a$ times the number of authors who have published only one paper, where a is close to 2 Plainly A large proportion of the total literature in a field is authored by a small proportion of the total number of authors, with the number of authors falling off regularly as output rises; the majority of authors produce but one paper e.g. given 100 authors who each wrote one article over a specific period, we also expect 25 authors with 2 articles ($100/2^2 = 25$), 11 with 3 articles ($100/3^2 \approx 11$), 6 with 4 articles ($100/4^2 \approx 6$) etc.
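
The arithmetic in the example above can be reproduced with a few lines of code. This is only a sketch: the function name `lotka_expected_authors` is hypothetical, and the exponent is assumed to be exactly 2, as in the slide's example, although in practice it is estimated per field.

```python
def lotka_expected_authors(single_paper_authors, n, a=2.0):
    """Expected number of authors with n papers under Lotka's law.

    single_paper_authors: number of authors who published exactly one paper.
    a: the exponent; assumed to be 2 here, as in the worked example.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return single_paper_authors / n ** a


for n in range(1, 5):
    print(n, round(lotka_expected_authors(100, n)))
```

With 100 single-paper authors this gives 100, 25, 11 and 6 authors for 1, 2, 3 and 4 papers respectively.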

30 Bradford’s law Bradford’s law (1934) – papers & journals Samuel C. Bradford (British mathematician and librarian) Formal If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as $n : n^2 : n^3$ Plainly Most articles in a subject are produced by a few journals (called the nucleus) and the rest are made up of many separate sources that increase in number in a regular, exponential way Like Lotka’s law, this is a law that generally follows laws of diminishing returns

31 Bradford’s Law: How did he do it? He grouped periodicals with articles relevant to a subject (from a bibliography) into 3 zones in order of decreasing yield –from journals with the largest no. of articles to those with the smallest; at the end are journals with one article each on the subject Each zone had the SAME number of articles but a different no. of journals The number of journals in each zone increases exponentially –e.g. suppose that a researcher has five core scientific journals for his or her subject. Suppose that in a month there are 12 articles of interest in those journals. Suppose further that in order to find another dozen articles of interest, the researcher would have to go to an additional 10 journals. Then that researcher's Bradford multiplier $b_m$ is 2 (i.e. 10/5). For each new dozen articles, that researcher will need to look in $b_m$ times as many journals: 5, 10, 20, 40, etc.
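
The growth of the zones in this example can be sketched in code. The function name `bradford_zones` is hypothetical, and the sketch assumes an idealized constant multiplier, which real bibliographies only approximate.

```python
from itertools import accumulate


def bradford_zones(core_journals, multiplier, zones):
    """Number of journals in each successive Bradford zone."""
    return [round(core_journals * multiplier ** k) for k in range(zones)]


zone_sizes = bradford_zones(core_journals=5, multiplier=2, zones=4)
journals_searched = list(accumulate(zone_sizes))
print(zone_sizes)         # journals added per zone
print(journals_searched)  # total journals searched after each zone
```

Each zone yields the same number of articles (a dozen in the example), so the cumulative journal count shows how quickly the cost of each additional dozen grows.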

32 Cited Half-life Formal Definition: the number of years that the number of citations takes to decline to 50% of its current total value Plainly How far back in time one must go to account for one half of the citations a journal receives in a given year –e.g. if in 2008 the journal XYZ has a cited half-life of 7.0, it means that articles published in XYZ between 2002 and 2008 (inclusive) account for 50% of all citations to articles from that journal (anyplace) in 2008
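
A simplified way to compute this is to walk back year by year from the current year until half of the citations are accounted for. The sketch below assumes citation counts grouped by the publication year of the cited article; the function name and the numbers are hypothetical, and it returns whole years only, without the fractional interpolation used in published figures such as 7.0.

```python
def cited_half_life(citations_by_pub_year, current_year):
    """Whole years, counting back from current_year, covering 50% of citations.

    citations_by_pub_year maps the publication year of cited articles to the
    number of citations those articles received in current_year.
    """
    total = sum(citations_by_pub_year.values())
    if total == 0:
        raise ValueError("no citations to analyse")
    running = 0
    oldest = min(citations_by_pub_year)
    for age, year in enumerate(range(current_year, oldest - 1, -1), start=1):
        running += citations_by_pub_year.get(year, 0)
        if 2 * running >= total:
            return age
    raise ValueError("citations fall outside the year range")


# Hypothetical 2008 citation counts for journal XYZ, by cited publication year.
xyz_2008 = {2008: 5, 2007: 10, 2006: 10, 2005: 10, 2004: 5,
            2003: 5, 2002: 5, 2001: 20, 2000: 30}
print(cited_half_life(xyz_2008, 2008))
```

In this made-up data the years 2002 to 2008 hold 50 of the 100 citations, so the result is 7, matching the slide's example.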

33 Citing Half-life Formal Definition: the median age of all articles cited in the journal during a given year Plainly A measure of how current (or how old) the references cited in a journal are –e.g. if in 2008 the citing half-life for journal XYZ was 9.0, it means that 50% of the articles cited (references) in XYZ were published between 2000 and 2008 (inclusive)

34 Co-Citation A popular similarity measure between two entities Formal The frequency with which two items of earlier literature are cited together by the later literature 1. frequency with which two documents are cited together, or 2. frequency with which two authors are cited together irrespective of what document Plainly As of 2.: How often are two authors cited together? If authors A and B are both cited by C, they may be said to be related to one another, even though they don’t directly reference each other –if A and B are both cited by many other articles, they have a stronger relationship. The more items they are cited by, the stronger their relationship is

35 Use of Co-Citation Co-citation is often used as a measure of similarity –if authors or documents are co-cited they are likely to be similar in some way This means that if collections of documents are arranged according to their co-citation counts then this should produce a pattern reflecting cognitive scientific relationships Author co-citation analysis (ACA) is a technique that measures the similarity of pairs of authors through the frequency with which their work is co-cited These are then arranged in maps showing the structure of a field, domain, area of research …
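
Co-citation counts of this kind can be derived directly from reference lists. The following is a minimal sketch, assuming each citing paper is represented by the list of items (documents or authors) it cites; the function name and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of cited items appears in the same reference list."""
    counts = Counter()
    for refs in reference_lists.values():
        # sorted(set(...)) removes duplicates and gives each pair a stable order
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical citing papers C1..C3 and the items they cite.
reference_lists = {
    "C1": ["A", "B"],
    "C2": ["A", "B", "D"],
    "C3": ["B", "D"],
}
counts = cocitation_counts(reference_lists)
print(counts.most_common())
```

The resulting pair counts are the raw similarity values that a co-citation map would then cluster and lay out.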

36 Map of Author Co-citation Analysis of information science Zhao & Strotmann (2008)

37 Bibliographic Coupling Formal Links two items that reference the same items, so that if A and B both reference C, they may be said to be related, even though they don't directly reference each other. The more items they both reference in common, the stronger their relationship is It is backward chaining, while co-citation is forward chaining Plainly Occurs when two works reference a common third work in their bibliographies e.g. if in one article S cites K & in another article B cites K, but neither S nor B cites the other in those articles, then S & B are bibliographically coupled because they both cite K.
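
Coupling strength is simply the number of references two works share. A minimal sketch, with a hypothetical function name and made-up reference lists that mirror the S, B and K example:

```python
def coupling_strength(refs_a, refs_b):
    """Number of references that two works have in common."""
    return len(set(refs_a) & set(refs_b))


# Hypothetical bibliographies: S and B both cite K, but not each other.
refs_s = ["K", "M"]
refs_b = ["K", "N"]
print(coupling_strength(refs_s, refs_b))
```

Unlike co-citation, this value is fixed once both works are published, because it depends only on their own bibliographies.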

38 Journal Impact Factor in Journal Citation Reports (JCR) Formal The average number of times articles from the journal published in the past two years have been cited in the JCR year. The number of citations published in the year X to articles in the journal published in years X − 1 and X − 2, divided by the number of articles published in the journal in the years X − 1 and X − 2. Plainly Measures how often articles in a specific journal have been cited –a Journal Impact Factor for journal XYZ of 2.5 means that, on average, the articles published in XYZ one or two years ago have been cited two and a half times in the JCR year How to use Journal Citation Reports
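
The formal definition is a single division, sketched below. The function name and the counts are hypothetical; which items count as "articles" in the denominator is decided by the database producer and is not modelled here.

```python
def journal_impact_factor(citations_in_year_x, articles_prev_two_years):
    """Citations in year X to articles from X-1 and X-2, per such article."""
    if articles_prev_two_years <= 0:
        raise ValueError("the journal must have published at least one article")
    return citations_in_year_x / articles_prev_two_years


# Hypothetical journal XYZ: 200 articles in 2006-2007, cited 500 times in 2008.
print(journal_impact_factor(500, 200))
```

With these made-up numbers the result is 2.5, the value used in the slide's plain-language example.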

39 H-Index - Hirsch (2005) Formal For a scientist, it is the largest number h such that s/he has at least h publications cited at least h times each, while the other publications have no more than h citations each –it is more than a straight citation count because it takes into account BOTH: number of publications one had AND number of citations one received Plainly The number h of a scientist's papers that have each received at least h citations An example (Dr. Tefko Saracevic as listed in Web of Science): –115 articles –of these 23 were cited at least 23 times –others were cited less –Dr. Saracevic's h-index is 23
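
The definition translates into a short ranking procedure: sort papers by citation count and find the last rank at which the count is still at least the rank. The sketch below uses a hypothetical function name and made-up citation counts.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))
```

Here four papers have at least four citations each, but there are not five papers with at least five, so h is 4.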

40 h-index Differences There are differences in typical h values in different fields, determined in part by –the average number of references in a paper in the field –the average number of papers produced by each scientist in the field –the size (number of scientists) of the field Thus, comparison of h-indexes of scientists in different fields may not be valid Keep it to the same field! –e.g. h indices in biological sciences tend to be higher than in physics

41 Citation Frequency: Citations are Skewed. Research Front A few articles are cited a lot, others less, a lot very little or not at all –80-20 distribution: 20% of articles may account for 80% of the citations –about one half of one percent of cited papers were cited over 200 times; out of about 38 million source items about half were not cited at all (Garfield, 2005) This led to the identification of a “research front” –cluster of highly cited papers in a domain –showing also links among the highly cited papers in the form of maps indicating what papers are frequently cited together, i.e. co-cited For searchers: identifying current & evolving research fronts in a domain
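
How skewed a particular set of citation counts is can be checked by computing the share of citations held by the most-cited fraction of papers. This is a sketch with a hypothetical function name and made-up counts chosen to show an 80-20 split.

```python
def top_share(citations, top_fraction=0.2):
    """Share of all citations received by the most-cited fraction of papers."""
    if not citations:
        raise ValueError("no papers given")
    ranked = sorted(citations, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))  # size of the top group
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:k]) / total


# Hypothetical citation counts for ten papers.
counts = [120, 40, 10, 8, 6, 5, 4, 3, 2, 2]
print(top_share(counts))
```

In this toy data the top two of ten papers hold 160 of 200 citations, a share of 0.8.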

42 5. A SAMPLE OF EXAMPLES

43 Web of Science Citation Report for an Author

44 Web of Science Citation Report for an Author

45 Web of Science Journal Citation Report

46 Scopus citation tracking for an author

47 Scopus journal analyzer

48 Histogram for JASIST using Garfield's HistCite LCS = Local Citation Score: count of how much cited in JASIST GCS = Global Citation Score: count of how much cited in all journals in WoS LCR = Local Cited References: how many references from JASIST NCR = Number of Cited References: how many references in the paper

49 WoS: Essential Science Indicators

50 WoS: InCites

51 SCImago Journal & Country Rank (SJR) a great resource – from Spain

52 SJR Journal Analysis for Information Processing & Management

53 SJR Country Indicators

54 University Rankings Times Higher Education ranking: QS World University Rankings - Top 400 Universities rankings/2010 Shanghai ranking: Academic Ranking of World Universities - Shanghai Jiao Tong University –Miscellaneous Information on University Rankings Leiden ranking: Top 100 & 250 universities, Europe & world - Centre for Science and Technology Studies (CWTS), Leiden University, Netherlands ranking-2010-cwts.html

55 6. IMPLICATIONS FOR SEARCHING. CAVEATS What to watch for? Ethical issues as well

56 Role of Searchers Relational bibliometric searching Older: Connected with subject searches –adding dimension of authors, sources … Performing citation analyses –e.g. identifying key papers, authors, sources –citation pearl growing Evaluative bibliometric searching Newer - higher responsibility: Called to perform searches related to bibliometric indicators of impact –often by administrators, decision makers, policy makers, managers e.g. for tenure & promotion; resource allocation; grants; purchase decisions; justification …

57 Implication for searching because of scatter Journals & articles are scattered, so are authors –many articles are in core journals – easy to find –BUT: a number of relevant articles will be scattered throughout other journals –these need to be found so as not to miss relevant articles in non-core journals High precision searching concentrates on top producing journals and authors in a subject High recall searching includes the long tail of authors and journals –but the long tail could be very long; need to know when to stop Key: Adjusting effectiveness & efficiency of searching to laws of diminishing returns
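
The trade-off between recall and effort can be made concrete by asking how many journals must be searched to reach a target share of the relevant articles. The sketch below assumes the number of relevant articles per journal is known; the function name and the counts are hypothetical.

```python
def journals_needed(articles_per_journal, target_percent):
    """Fewest journals, taken most productive first, covering target_percent of articles."""
    ranked = sorted(articles_per_journal, reverse=True)
    total = sum(ranked)
    running = 0
    for journals_searched, count in enumerate(ranked, start=1):
        running += count
        # integer comparison avoids floating-point rounding at the threshold
        if running * 100 >= target_percent * total:
            return journals_searched
    return len(ranked)


# Hypothetical subject: 4 productive journals and a long tail of 10 single-article journals.
articles = [20, 10, 6, 4] + [1] * 10
for target in (50, 80, 100):
    print(target, journals_needed(articles, target))
```

In this toy data 2 journals give 50% of the articles and 4 give 80%, but full recall requires all 14, which illustrates why a searcher needs a stopping rule.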

58 Caveats for Citations Citation rates & practices differ greatly among fields –citation & publication practices are NOT homogeneous within specialties and fields of science (Leydesdorff, 2008) The context could be negative A citation may not be relevant to the work The second, third … author may not be cited at all Matthew effect (rich get richer) or success-breeds-success mechanism works in citations –already well-known individuals receive a disproportionately high rate of citation Self citation practices & citation padding –author citing him/herself; journal articles citing their own journal

59 Caveat for Author & Citation Disambiguation Distinguishing Saracevic, T. from other authors is not hard – to zero in on that one author –Belkin, N. is harder; Kantor, P. still harder; Sun, Y. almost impossible –thus, VERY careful disambiguation is necessary; sometimes very time consuming; sometimes never sure Citations in articles are often messy & careless –e.g. my name while being cited was misspelled in many creative ways –no corrections are made by databases –thus, variations have to be explored to be included in citation counts
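
Exploring name variations can be partly supported by approximate string matching. The sketch below uses Python's standard `difflib.get_close_matches`; the function names, the similarity cutoff of 0.8 and the author names are all hypothetical, and any candidate it returns still needs manual checking, since similar names may belong to different people.

```python
import difflib


def normalize(name):
    """Lower-case a cited name and drop commas, periods and extra spaces."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def name_variants(target, cited_names, cutoff=0.8):
    """Cited names whose normalized form is close to the target author's name."""
    by_key = {}
    for name in cited_names:
        by_key.setdefault(normalize(name), []).append(name)
    if not by_key:
        return []
    close = difflib.get_close_matches(
        normalize(target), list(by_key), n=len(by_key), cutoff=cutoff
    )
    return [name for key in close for name in by_key[key]]


# Hypothetical cited-author strings as they might appear in reference lists.
cited = ["Smithson, J.", "Smithsen J", "Smitson, J", "Jones, K."]
variants = name_variants("Smithson J", cited)
print(variants)
```

This only gathers spelling variants of one name; it does nothing to separate different authors who share the same surname and initial, which is the harder problem described above.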

60 Caveats for h-index - (Hirsch, 2005) “Obviously, a single number can never give more than a rough approximation to an individual’s multifaceted profile, and many other factors should be considered in combination in evaluating an individual.” “Furthermore, the fact that there can always be exceptions to rules should be kept in mind, especially in life-changing decisions such as the granting or denying of tenure.”

61 Caveat for Webometrics & Web Sources Thelwall (2008) Web data is not quality controlled –caveat emptor (search for what it means) Web data is not standardized –e.g. there does not seem to be a simple way to separate out web citations in online journal articles from those in online course reading lists It can be impossible to find the publication date of a web page –results typically combine new and old web pages Web data is incomplete in several senses and in arbitrary ways

62 Caveat for Journal Impact Factor (JIF) Assumption: journals with higher JIFs tend to publish higher impact research & hence tend to be better regarded. But: –JIFs vary greatly from field to field, because citation practices differ greatly –even within discrete subject fields, ranking journals based upon JIFs is problematic – it is but one measure, other characteristics are important –because of the JIF's popularity, journal citations are sometimes misused: e.g. recommendations to authors to cite other articles in a given journal to improve its JIF

63 Caveat for Coverage: differences can be substantial Different databases cover different articles, citations, handle them differently … –there is no one answer to: “How many citations did X receive?” For the same author (institution …) different databases will provide different –no. of articles, citations; h-index; … overlap may not be great –in citations there are even ghost citations (listed as citing an article but there is no actual citation in the article) Careful comparisons & use of multiple databases are necessary A whole literature on these inconsistencies emerged –one of the frequent analyzers is Peter Jacso, U of Hawaii
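
When citing items from two databases can be matched by a common identifier, the size of the overlap is easy to quantify. The sketch below assumes such matching has already been done; the function name and the identifiers are hypothetical.

```python
def coverage_overlap(citing_a, citing_b):
    """Compare the sets of citing items two databases report for one author."""
    a, b = set(citing_a), set(citing_b)
    union = a | b
    return {
        "only_a": len(a - b),
        "only_b": len(b - a),
        "both": len(a & b),
        # Jaccard index: shared items as a fraction of all items found
        "jaccard": len(a & b) / len(union) if union else 0.0,
    }


# Hypothetical citing-paper identifiers found in two databases.
database_a = {"p1", "p2", "p3", "p4"}
database_b = {"p3", "p4", "p5"}
overlap = coverage_overlap(database_a, database_b)
print(overlap)
```

A low Jaccard value signals that a citation count from either database alone would understate the total, which is why the slide recommends using more than one source.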

64 Acknowledgment To Dr. Tefko Saracevic: this lecture was created based on his work
