CiteSearch: Multi-faceted Fusion Approach to Citation Analysis Kiduk Yang and Lokman Meho Web Information Discovery Integrated Tool Laboratory School of.

1 CiteSearch: Multi-faceted Fusion Approach to Citation Analysis
Kiduk Yang and Lokman Meho
Web Information Discovery Integrated Tool Laboratory
School of Library and Information Science, Indiana University
January 18, 2007

2 CiteSearch: What, Why, & How
- Goal
  - Quality Assessment of Scholarly Publications
- Motivation
  - Lack of a comprehensive citation database
  - Limitations of conventional citation analysis
    - One-dimensional assessment
    - Misleading evaluation
- Approach
  - Multi-faceted, Fusion-based Citation Analysis
    - Combine data from multiple citation databases
    - Assess quality using various quality evaluation measures

3 CiteSearch Study: Overview
- Objectives
  - Investigate current citation analysis environment
  - Test the viability of CiteSearch system
- Method
  - Search citation databases and compare the results
- Setup
  - Study sample
    - Publications of 15 SLIS faculty members (approx. 1,100 publications)
  - Databases used
    - Google Scholar, Scopus, Web of Science
  - Citation sources
    - Journals and conference papers in 1996-2005

4 Citation Databases
- Breadth of coverage
  - Web of Science: 36M records; 8,700 titles; journals (240 open access) & conference papers
  - Scopus: 28M records; 15,000 titles; journals (500 open access) & conference papers
  - Google Scholar: 500M records; number of titles unknown; 30+ document types
- Coverage years
  - Web of Science: A&HCI 1975-; SCI 1900-; SSCI 1956-
  - Scopus: 1996-present (with cited references); 1966-present (without cited references)
  - Google Scholar: unknown
- Subject area
  - Web of Science: all
  - Scopus: all
  - Google Scholar: all
- Data collection
  - WoS: 100 hours
  - Scopus: 200 hours
  - GS: over 3,000 hours

5 Scopus and WoS: Citation Count
- Scopus vs. WoS
  - 14.0% (278) more citations by Scopus
  - More comprehensive coverage by Scopus (15,000 vs. 8,700 periodicals)
- Scopus + WoS
  - Scopus increases WoS citations by 35% (710)
  - WoS increases Scopus citations by 19.0% (432)
  - Relatively low overlap (58%) and high uniqueness (42%)
- Venn diagram: Scopus (2,301); Web of Science (2,023); overlap 58% (1,591); Scopus only 26% (710); WoS only 16% (432); Scopus ∪ WoS (2,733)
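
The percentages on this slide all follow from the three Venn-diagram region counts. The sketch below reproduces that arithmetic; the function name `overlap_summary` and its dictionary keys are illustrative assumptions, not part of CiteSearch.

```python
def overlap_summary(both, only_a, only_b):
    """Summarize two overlapping citation sets from their Venn region counts.

    both   -- citations found in both databases
    only_a -- citations found only in database A
    only_b -- citations found only in database B
    (Hypothetical helper for illustration; not taken from the slides.)
    """
    union = both + only_a + only_b
    total_a = both + only_a
    total_b = both + only_b
    return {
        "union": union,
        "total_a": total_a,
        "total_b": total_b,
        # Share of the combined set covered by each region.
        "overlap_pct": 100 * both / union,
        "only_a_pct": 100 * only_a / union,
        "only_b_pct": 100 * only_b / union,
        # How much each database adds on top of the other.
        "a_adds_to_b_pct": 100 * only_a / total_b,
        "b_adds_to_a_pct": 100 * only_b / total_a,
    }


# Region counts from the slide: A = Scopus, B = Web of Science.
summary = overlap_summary(both=1591, only_a=710, only_b=432)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

The same function applies to any pair of sources, for example Google Scholar against the combined Scopus and WoS set, given that pair's region counts.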

6 Impact of Scopus By Research Area
- Varies significantly between research areas

7 Impact of Scopus on Faculty Members' Relative Ranking
- Scopus significantly alters the relative ranking of those faculty members who appear in the middle of the rankings

8 Scopus + WoS: Citation Count By Document Type
- Conference Papers Only
- Venn diagram: Scopus (359); WoS (229); overlap 18% (92); Scopus only 54% (267); WoS only 28% (137); Scopus ∪ WoS (496)

9 Scopus + WoS: Summary of Results
- Coverage
  - Varies greatly between research areas
    - Increase in citations ranges from 5% to 99% by combining results from both databases
  - Scopus has much better coverage of conference proceedings
    - Overlap: 18%
    - Scopus only: 54%
    - WoS only: 28%
- Ranking by citation count
  - Relative ranking of faculty members changes significantly for those in the middle

10 Google Scholar Citations By Document Type

11 Citations By Language

12 Impact of GS By Research Area

13 Impact of GS on Faculty Members' Relative Ranking
- GS does not significantly alter the rankings of faculty members

14 GS vs. Scopus ∪ WoS
- GS increases WoS ∪ Scopus citations by 93% (2,552)
- Scopus ∪ WoS increases GS citations by 26% (1,104)
- GS identifies 53% (or 1,448) more citations than WoS ∪ Scopus
  - GS has much better coverage of conference proceedings (1,849 by GS vs. 496 by Scopus ∪ WoS)
- GS has over twice as many unique citations as Scopus ∪ WoS (2,552 vs. 1,104, respectively)
- Venn diagram: Google Scholar (4,181); Scopus ∪ WoS (2,733); overlap 31% (1,629); GS only 48% (2,552); Scopus ∪ WoS only 21% (1,104); GS ∪ Scopus ∪ WoS (5,285)

15 GS + Scopus ∪ WoS: Summary of Results
- Coverage
  - Varies greatly between research areas
    - 23% to 144% increase by combining GS & Scopus ∪ WoS
    - 5% to 98% increase by combining Scopus & WoS
  - GS has strong coverage in CS & IS
    - HCI, IR, computational linguistics, social informatics
  - Scopus ∪ WoS has stronger coverage in LS
    - Bibliometrics, collection development, information policy
  - GS provides significantly better coverage of non-English materials
    - GS (7%); Scopus (1%); WoS (1%)
- Ranking
  - No significant changes in relative ranking of faculty members

16 Findings
- Scopus, WoS, and GS complement rather than replace each other
- GS can be useful in showing evidence of broader international impact than could be shown through Scopus and WoS
- GS can be very useful for citation searching purposes; however, it is not conducive to large-scale comparative citation analyses
- Scopus significantly alters the relative citation ranking of scholars as measured by Web of Science; GS does not

17 Conclusions
- Multiple sources of citations should be used to generate accurate citation counts and rankings
  - Citation databases complement one another
  - Small overlap between sources may significantly influence relative ranking
- Multi-faceted citation analysis is needed
  - Citation coverage varies by research area, document type, language
- CiteSearch can greatly facilitate citation analysis
  - Enormous effort is required to
    - Refine search strategy
    - Parse search results
    - Eliminate noise (duplicate citations)
    - Extract & normalize citation metadata
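
The clean-up effort named on this slide (eliminating duplicate citations and normalizing metadata) can be illustrated with a small sketch. It is not the CiteSearch implementation: the record fields (`title`, `authors`, `year`), the matching key, and the sample records are all assumptions made up for the example.

```python
import re


def citation_key(record):
    """Build a normalized key so the same citation matches across sources.

    Assumes each record is a dict with 'title', 'authors' (non-empty list of
    "Surname, Initials" strings) and 'year'. These fields are hypothetical.
    """
    # Lowercase the title and collapse punctuation/whitespace runs to one space.
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    # Use only the first author's surname, lowercased.
    first_author = record["authors"][0].split(",")[0].strip().lower()
    return (title, first_author, record["year"])


def merge_citations(*sources):
    """Merge (source_name, records) pairs, keeping one entry per citation."""
    merged = {}
    for source_name, records in sources:
        for rec in records:
            entry = merged.setdefault(
                citation_key(rec), {"record": rec, "found_in": set()}
            )
            entry["found_in"].add(source_name)
    return merged


# Made-up sample records, formatted differently by each source.
scopus = [{"title": "Citation Analysis: A Review", "authors": ["Smith, J."], "year": 2004}]
wos = [
    {"title": "citation analysis - a review", "authors": ["SMITH, J"], "year": 2004},
    {"title": "Another Paper", "authors": ["Lee, K."], "year": 2003},
]

for key, entry in merge_citations(("Scopus", scopus), ("WoS", wos)).items():
    print(key, sorted(entry["found_in"]))
```

Recording which sources returned each citation keeps the per-database overlap and uniqueness counts available after the merge. Exact-key matching like this would miss variants such as misspelled titles, so a real pipeline would likely need fuzzier matching.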

18 CiteSearch System: Overview
- A Web-based citation search and analysis tool
  - Work-in-progress prototype system
1. Search multiple citation sources
  - Google Scholar, Web of Science, Scopus, EBSCO, ProQuest, etc.
2. Extract and compile citation metadata
  - Parse & normalize the search results
3. Compute various citation-based quality evaluation measures
  - Document-based measures
    - Weighted citation counts, CiteRank
  - Author-based measures
    - Weighted publication counts, H-Index, Mentor-Index
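
Of the author-based measures listed, the H-Index has a standard public definition: the largest h such that the author has h publications with at least h citations each. The sketch below computes it from a list of per-publication citation counts. The function name is an assumption, and the slides do not say how CiteSearch itself computes the measure.

```python
def h_index(citation_counts):
    """Return the largest h such that h publications have >= h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            # Counts are sorted descending, so no later rank can qualify.
            break
    return h


# Made-up citation counts for one author's five publications.
print(h_index([10, 8, 5, 4, 3]))
```

Because the result depends on the citation counts fed in, the same author can receive a different H-Index from each database, which is one reason to merge sources before computing it.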

21 CiteSearch System: Architecture

22 End

23 CiteSearch System: Work-in-Progress
- Federated Citation Search
  - To compile comprehensive & usable citation data
  1. Query multiple citation databases
  2. Filter out noise
    - e.g., invalid, duplicate citations
  3. Extract & normalize metadata
    - bibliographical metadata (e.g., title, author, year, source, etc.)
    - citation metadata (e.g., doctype, subject, language, etc.)
- Multi-faceted Citation Analysis
  - To produce multi-faceted quality/impact assessment measures that
    - account for variance in citation quality (e.g., Weighted citation counts, CiteRank)
    - consider various facets of evaluation metric (e.g., Document type, language)
    - accommodate different aspects of quality assessment (e.g., H-Index, Mentor-Index)
  1. Compute citation-based quality scores (CQS) for each publication
  2. Compute CQS for authors, schools, publishers using publication CQS
  3. Compute CQS for each publication weighted by author/school/publisher scores
  4. Compute CQS for authors, schools, publishers using weighted publication CQS
  5. Repeat steps 3 and 4 until convergence
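
The five-step CQS loop has the shape of a mutual-reinforcement iteration. The slides give no formulas, so the sketch below is only one possible reading under loudly stated assumptions: the base publication score is its normalized citation count, an author's score is the sum of equal shares of their publications' scores, the weighting in step 3 multiplies the base score by (1 + mean author score), and only authors are modeled (no schools or publishers). Every name and formula here is hypothetical.

```python
def normalize(scores):
    """Scale scores so they sum to 1 (all zeros if the total is 0)."""
    total = sum(scores.values())
    return {k: (v / total if total else 0.0) for k, v in scores.items()}


def author_scores(pub_scores, pubs):
    """Steps 2 and 4: credit each author an equal share of each publication's score."""
    totals = {}
    for pub_id, (_, authors) in pubs.items():
        share = pub_scores[pub_id] / len(authors)
        for author in authors:
            totals[author] = totals.get(author, 0.0) + share
    return normalize(totals)


def iterate_cqs(pubs, tol=1e-9, max_iter=100):
    """pubs maps pub_id -> (citation_count, [author, ...]); each needs >= 1 author."""
    # Step 1 (assumed): base publication score = normalized citation count.
    base = normalize({p: float(c) for p, (c, _) in pubs.items()})
    # Step 2: author scores from the unweighted publication scores.
    authors = author_scores(base, pubs)
    pub_scores = base
    for _ in range(max_iter):
        # Step 3 (assumed weighting): boost each publication by its authors' mean score.
        weighted = normalize({
            p: base[p] * (1.0 + sum(authors[a] for a in names) / len(names))
            for p, (_, names) in pubs.items()
        })
        # Step 4: recompute author scores from the weighted publication scores.
        new_authors = author_scores(weighted, pubs)
        delta = max(abs(new_authors[a] - authors[a]) for a in authors)
        pub_scores, authors = weighted, new_authors
        # Step 5: stop once author scores no longer change noticeably.
        if delta < tol:
            break
    return pub_scores, authors


# Made-up data: three publications and two authors.
pubs = {"p1": (10, ["alice"]), "p2": (5, ["alice", "bob"]), "p3": (1, ["bob"])}
pub_cqs, author_cqs = iterate_cqs(pubs)
print(pub_cqs)
print(author_cqs)
```

Normalizing after every step keeps the scores bounded, and the `max_iter` cap guarantees termination even if a particular weighting scheme does not settle.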

24 CiteSearch Study: Citation Databases
- Web of Science
  - 3 Institute for Scientific Information (ISI) databases
  - Standard tool for citation studies worldwide
  - 35 million records from 9,000 publishers
- Scopus
  - Produced by Elsevier
  - 27 million records from 15,000 publishers
- Google Scholar
  - 500 million records
    - UBC (http://weblogs.elearning.ubc.ca/googlescholar/archives/025964.html)
  - Unknowns
    - Coverage (subject, publisher, time-span)
    - Document type and refereed status of records

25 Google Scholar Citations by Year

26 Sources of Unique Citations

27 CiteSearch Study: GS + Scopus + WoS
- Venn diagram: Google Scholar (4,203); Scopus (2,308); WoS (2,025); GS ∪ Scopus ∪ WoS (5,307)
- Region values as listed: 4.3% (230); 18.3% (970); 48.3% (2,561); 11.7% (617); 8.2% (435); 3.8% (204); 5.3% (282)

