Nitish Mathew Thanks to Dr. C. Lee Giles Dr. Paul Cohen


1 Citation Indexing
Nitish Mathew. Thanks to Dr. C. Lee Giles and Dr. Paul Cohen.

2 Outline
What is Citation Indexing: Concept, Web of Science, Bias. Autonomous Citation Indexing. Future Application: Technology Forecasting. Summary.

3 Why do a literature search?
Avoid unwitting duplication of research: wasted time, effort & funds; plagiarism issues.

4 Concept of Citations
Citations symbolize the conceptual association of scientific ideas as recognized by publishing research authors. By the references they cite in their papers, authors make explicit linkages between their current research and prior work in the archive of scientific literature.

5 Distinction between "citation" and "reference"
"If Paper R contains a bibliographic footnote using and describing Paper C, then R contains a reference to C, and C has a citation from R. The number of references a paper has is measured by the number of items in its bibliography as endnotes, footnotes, etc. The number of citations a paper has is found by looking it up [in a] citation index and seeing how many other papers mention it."
Source: Price D. J. D. Little science, big science...and beyond. New York: Columbia University Press, 1986.

6 R contains a reference to C, C has a citation from R
Paper R (excerpt): … To start, it is important to clarify the terminological distinction between "citation" [6] and "reference". In his classic book Little Science, Big Science, Derek Price gave a clear definition of both terms. He said: "It seems to me a great pity to waste a good technical term by using the words citation and reference interchangeably. I therefore propose and adopt the convention that if Paper R contains a bibliographic footnote using and describing Paper C, then R contains…
Reference list of Paper R: [6] The concept of citation indexing: A unique and innovative tool for navigating the research literature. Current Contents, January 3, 1994.
Paper C: Little science, big science...and beyond.
Paper C (excerpt): This is my first Current Contents® (CC®) essay under the rubric of Citation Comments. As discussed in last week's CC, this new monthly feature will focus on the applications of the Institute for Scientific Information's (ISI's) databases. 1 An appropriate topic to launch this new series is perhaps the most rudimentary -- the basic concept of citation indexing. To start, it is important to clarify the terminological distinction between "citation" and "reference". In his classic book Little Science, Big Science, Derek Price gave a clear definition of both terms. He said: "It seems to me a great pity to waste a good technical term by using the words citation and reference interchangeably. I therefore propose and adopt the convention that if Paper R contains a bibliographic footnote using and describing Paper C, then R contains a

7 Citation Index
[Diagram with nodes labeled Paper C, Paper X, Paper Y, Paper R, Paper Q]

8 Citation Indexing
A citation index indexes the citations an article makes, linking the article with the cited works. It was originally designed mainly for literature search, so that researchers could find subsequent articles that cite a given article. Invented by Dr. Eugene Garfield. Example of a citation indexing firm: Institute for Scientific Information® (ISI).
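The idea can be sketched in a few lines of Python. This is only an illustration, not ISI's actual system: the paper labels and the `references` table are made up, and `build_citation_index` is a hypothetical helper name. It inverts each paper's reference list so that, for any cited work, the citing papers can be looked up.

```python
from collections import defaultdict

# Hypothetical data: each paper maps to the papers listed in its bibliography.
references = {
    "R": ["C", "X"],
    "Q": ["C"],
    "Y": ["C", "R"],
    "X": [],
    "C": [],
}

def build_citation_index(references):
    """Invert reference lists: cited paper -> sorted list of citing papers."""
    index = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            index[cited].append(citing)
    return {cited: sorted(citing) for cited, citing in index.items()}

citation_index = build_citation_index(references)

print("Papers citing C:", citation_index["C"])
print("References in R:", len(references["R"]))
print("Citations to C:", len(citation_index["C"]))
```

The number of references comes straight from a paper's own bibliography, while the number of citations is only available after the whole collection has been inverted, which is why a dedicated index is needed.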

9 Institute for Scientific Information® (ISI)
ISI indexes the linkages by listing both the cited and citing works. The ISI® databases: Science Citation Index® (SCI®), Social Sciences Citation Index® (SSCI®), Arts & Humanities Citation Index® (A&HCI®). Multidisciplinary: they cover virtually all disciplines, whereas traditional indexing and abstracting services are limited to a single field.

10 Web of Knowledge
ISI Web of Knowledge® is a dynamic, integrated, Web-based environment. ISI Web of Science® provides access to the Science Citation Index (over 3,200 journals), the Social Sciences Citation Index (1,400 journals), and the Arts & Humanities Citation Index. Updated weekly. Journals from 1986 onward are available to Penn State users; previous years of each index are available in print at the Libraries.

11

12 Web of Science
Web of Science searches current and retrospective multidisciplinary information from nearly 8,500 research journals worldwide. Users can navigate forward, backward, and through the literature, searching all disciplines and time spans to uncover information relevant to their research.
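Forward and backward navigation can be pictured as walking a citation graph in either direction. The sketch below is an assumption about how such navigation might be modeled, not a description of Web of Science itself; the `references` data and the helper names `invert` and `reachable` are hypothetical.

```python
from collections import deque

# Hypothetical reference lists (citing paper -> cited papers).
references = {"R": ["C"], "Q": ["R"], "Y": ["Q"], "C": ["X"], "X": []}

def invert(graph):
    """Turn 'paper -> papers it cites' into 'paper -> papers citing it'."""
    inverted = {node: [] for node in graph}
    for citing, cited_list in graph.items():
        for cited in cited_list:
            inverted.setdefault(cited, []).append(citing)
    return inverted

def reachable(start, graph, max_depth):
    """Breadth-first walk: papers within max_depth links of start."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        paper, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in graph.get(paper, []):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                queue.append((neighbour, depth + 1))
    return found

citations = invert(references)
backward = reachable("R", references, max_depth=2)  # earlier work R builds on
forward = reachable("R", citations, max_depth=2)    # later work building on R
print("Backward from R:", backward)
print("Forward from R:", forward)
```

Backward navigation follows reference lists into older literature; forward navigation needs the inverted links and leads to newer work.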

13 Advantages
Compared to traditional indexing: no subjective judgments to be made about relevant descriptors; faster; no limit to index terms, since all cited references are indexed.

14 Problems with ISI Databases
Require manual effort during indexing. Expensive. Bias issues. One possible solution: Autonomous Citation Indexing.
Adapted from 'Citation Indexing - Its Theory and Application in Science, Technology, and Humanities' by Eugene Garfield.

15 Bias in Citation Databases
Bibliometric indicators do not represent all publishing. Though these databases have international coverage, they have a certain amount of bias: they contain more minor US journals than minor European journals, and non-English language journals are not as comprehensively indexed. From a non-English speaking world perspective, bibliometric indicators represent only international level, predominantly English language, higher impact, peer-reviewed, publicly available research output.
Source: Bibliometric Indicators and the Social Sciences, prepared for ESRC, J. Sylvan Katz, SPRU, University of Sussex, UK, December 1999.

16 Bias in Citation Databases
One of the recurrent criticisms: journal selection is biased by the internal management decisions of ISI. Only journals are indexed; monographs are left out. A lack of correlation between the most highly cited authors based on the journal sample and those based on the monograph sample suggests that there may be two distinct populations of highly cited authors.
Source: Blaise Cronin and Herbert W. Snyder. Comparative citation rankings of authors in monographic and journal literature: a study of sociology. Journal of Documentation, 53(3):263–273, 1997.

17 ResearchIndex/CiteSeer
ResearchIndex: a scientific literature digital library that incorporates autonomous citation indexing, citation context, full-text indexing, related document identification, query-sensitive summaries, awareness and tracking, and citation graph analysis.
Source: Presentation on "Searching the World Wide Web: General and Scientific Information Access", Steve Lawrence.

18 CiteSeer – How does it work?
Downloads papers from the Web. Converts them to text and parses them. Obtains citations and does full-text indexing. Stores them in a database. Supports queries by citations or keywords.
Source: CiteSeer: An Automatic Citation Indexing System (1998), C. Lee Giles, Kurt D. Bollacker, Steve Lawrence, Digital Libraries 98 - The Third ACM Conference on Digital Libraries.

19 CiteSeer - Document Acquisition
Web search engines are used for crawling. Heuristics are used to locate papers (e.g. pages containing the words "publications", "papers", "postscript", etc.). CiteSeer locates and downloads Postscript files identified by ".ps", ".ps.Z", or ".ps.gz" extensions. URLs and Postscript files that are duplicates of those already found are detected and skipped.
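A rough sketch of these heuristics follows. It is an illustration under stated assumptions rather than CiteSeer's code: the function names are hypothetical, the URLs use the placeholder domain example.org, and duplicates are detected only by exact URL match (the slide also mentions duplicate Postscript files, which would need a content comparison not shown here).

```python
from urllib.parse import urlparse

# Extensions and hint words taken from the slide.
PAPER_EXTENSIONS = (".ps", ".ps.Z", ".ps.gz")
HINT_WORDS = ("publications", "papers", "postscript")

def looks_like_publication_page(page_text):
    """Heuristic: the page mentions at least one of the hint words."""
    text = page_text.lower()
    return any(word in text for word in HINT_WORDS)

def is_postscript_url(url):
    """True if the URL path ends with a known Postscript extension."""
    return urlparse(url).path.endswith(PAPER_EXTENSIONS)

def select_new_papers(links, already_seen):
    """Keep Postscript links that have not been seen before (exact URL match)."""
    selected = []
    for url in links:
        if is_postscript_url(url) and url not in already_seen:
            already_seen.add(url)
            selected.append(url)
    return selected

seen = set()
links = [
    "http://example.org/a.ps",
    "http://example.org/b.ps.gz",
    "http://example.org/a.ps",        # duplicate, skipped
    "http://example.org/index.html",  # not a Postscript file
]
new_papers = select_new_papers(links, seen)
print(new_papers)
```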

20 Document Parsing
The downloaded Postscript files are first converted into text. The information extracted includes the URL, header, abstract, introduction, citations, citation context, and full text. Issues in citation parsing include natural-language citations and citations to the same article written in different forms (which affects citation statistics).
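The problem of variant citations to the same article can be illustrated with a simple word-overlap grouping. This is a stand-in technique chosen for brevity, not the matching algorithm CiteSeer uses; the threshold of 0.6 and the sample strings are assumptions.

```python
import re

def tokens(citation):
    """Lowercase the citation and keep only alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", citation.lower()))

def jaccard(a, b):
    """Share of distinct words the two citations have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_citations(citations, threshold=0.6):
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups = []
    for citation in citations:
        for group in groups:
            if jaccard(tokens(citation), tokens(group[0])) >= threshold:
                group.append(citation)
                break
        else:
            groups.append([citation])
    return groups

sample = [
    "Price, D. J. D. Little science, big science...and beyond. "
    "Columbia University Press, 1986.",
    "D. Price. Little Science, Big Science... and Beyond. "
    "New York: Columbia University Press, 1986",
    "Garfield, E. Citation Indexing. Wiley, 1979.",
]
groups = group_citations(sample)
for group in groups:
    print(len(group), "variant(s):", group[0])
```

Without a grouping step of this kind, the two Price entries would be counted as citations to two different works, which distorts the citation statistics.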

21 Querying and Browsing
First query: a keyword search returns a list of citations matching the query, or a list of articles. Finding related documents: a combination of weighted similarity measures is used.
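A weighted combination of similarity measures might look like the sketch below. The two measures (shared words and shared cited works, both scored with Jaccard overlap), the equal weights, and the sample documents are all assumptions made for illustration; the slide does not say which measures or weights CiteSeer uses.

```python
def jaccard(a, b):
    """Overlap between two collections, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def combined_similarity(doc_a, doc_b, weights):
    """Weighted sum of a text-based and a citation-based similarity."""
    text_sim = jaccard(doc_a["words"], doc_b["words"])
    cite_sim = jaccard(doc_a["cites"], doc_b["cites"])
    return weights["text"] * text_sim + weights["cites"] * cite_sim

def related(name, docs, weights):
    """Other documents ordered from most to least similar to docs[name]."""
    query = docs[name]
    scores = [
        (combined_similarity(query, other, weights), other_name)
        for other_name, other in docs.items()
        if other_name != name
    ]
    return [other_name for _, other_name in sorted(scores, reverse=True)]

# Hypothetical documents: a few keywords and the works each one cites.
docs = {
    "A": {"words": ["citation", "index", "search"], "cites": ["C", "X"]},
    "B": {"words": ["citation", "index", "bias"], "cites": ["C", "X"]},
    "D": {"words": ["polymer", "cluster"], "cites": ["Y"]},
}
weights = {"text": 0.5, "cites": 0.5}

print(related("A", docs, weights))
```

Combining measures lets two papers rank as related when they cite the same works even if their wording differs, and vice versa.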

22 Advantages of CiteSeer
Completely autonomous: cheaper and more widely available. More up-to-date databases: not limited to a pre-selected set of journals or held back by publication delays. Literature search based on the context of citations. Ability to recognize variant forms of citations. No bias from subjective selection of journals. Not restricted to journal papers: preprints, technical reports, and conference proceedings are also indexed. User feedback on each article.
Source: Autonomous Citation Matching (1999), Steve Lawrence, C. Lee Giles, Kurt Bollacker, Proceedings of the Third International Conference on Autonomous Agents.

23 Areas of Improvement
1. Does not cover the significant journals comprehensively (might be less of a disadvantage over time as more journals become available online).
2. Cannot distinguish subfields as accurately (e.g. CiteSeer will not disambiguate two authors with the same name).
3. The similar-document retrieval system could be enhanced and improved.
4. The heuristics used to locate articles could be improved.

24 Future prospects – Technology Forecasting
DIVA (Database Information Visualization and Analysis system): bibliometric analysis of collections of scientific literature and patents for technology forecasting. Documents, drawn from the technological field of interest, are visualized as clusters on a two-dimensional map, permitting exploration of the relationships among the documents and document clusters. This can yield insight into trends in the technological field of interest.
Source: DIVA: A Visualization System for Exploring Document Databases For Technology Forecasting, by Steven Morris, Zheng Wu, Camille DeYong, Sinan Salman, Dagmawi Yemenu. Computers and Industrial Engineering, Vol. 43, No. 4.

25 Clustering of documents

26 Document Maps

27 Document timelines

28 Document timelines

29 Document timelines

30 Document timelines ‘Polymers’ cluster report showing a plot of links to all other clusters by year

31 Document timelines ‘Polymers’ cluster report showing a plot of links to each other cluster by year.

32 A comment on bibliometric analysis
Bibliometric analysis has been compared to "a drunk who is looking for his keys under a street lamp". When asked by a passer-by why he is looking there, he replies: "This is where the lamp is".

33 A comment on bibliometric analysis
Critics say that publications (and citations) just provide "easy data" and that the assessment of "real quality" needs more "qualitative considerations".

34 Summary
Citation indexing is more than 40 years old. A simple concept with far-reaching influences and applications. Many possibilities for improving existing systems and for developing new uses in the networked world.

