SciTech Strategies, Inc. BETTER MAPS BETTER DECISIONS Science Mapping and Applications: Choices and Trade-offs Kevin W. Boyack, SciTech Strategies Standards.

Presentation transcript:

SciTech Strategies, Inc.
BETTER MAPS BETTER DECISIONS
Science Mapping and Applications: Choices and Trade-offs
Kevin W. Boyack, SciTech Strategies
Standards Workshop, August 11, 2011

Agenda
• Background
• Applications
• Choices and Trade-offs
• SciTech Choices
• SciTech Mapping Process
• Applications Enabled by SciTech Process
• Additional Findings Related to Choices
• Summary

Science Mapping
• Most common definition: visualizing or creating a visual map of scientific information
  » Documents, journals, authors, words …
• This requires
  » Data acquisition and cleaning
  » Defining elements and similarity
  » Partitioning and/or layout of the data
• And then one creates the visual

Why Do We Map Science?
• Basic understanding
  » Structure and dynamics of science
    • High level, all of science
    • Discipline, specialty
    • Researchers and groups
• Search and retrieval
  » “More like this”, “related articles”
• Metrics
  » Research
  » Evaluation
• Policy / Decision Making

Why Do We Map Science?

Academic View
• Structure and dynamics
  » High level, all of science
  » Discipline, specialty
  » Researchers and groups
• Search and retrieval
  » “More like this”, “related articles”
• Metrics
  » Research
  » Evaluation
• Policy

World View
• Structure and dynamics of science
  » High level, all of science
  » Discipline, specialty
  » Researchers and groups
• Search and retrieval
  » “More like this”, “related articles”
• Metrics
  » Research
  » Evaluation
• Policy

Map of Science & Technology Indicators

Choices
• Data source
• Unit of analysis
  » Document, Journal, Author, Word (phrase)
• Sample size
  » Specialty, Discipline, All of science, Single year, Multiple years
• Similarity approach
  » Citation, Text, Thresholds
• Partitioning and/or layout
  » MDS, K-K, F-R, DrL (OpenOrd), Blondel, …

Trade-offs

Choice | Trade-offs | Comments
Data source | Free vs. Costly; Single vs. Multiple Sources; Content vs. Context | Citation data isn’t free; De-duplication isn’t trivial; Coverage comes at a cost
Unit of analysis | Breadth vs. Detail | Base on research question; Don’t base on data available
Sample size | Specialty vs. All of science; Content vs. Context; Single year vs. Multiple years | Mapping ALL is costly; Specialty maps lack context; Stability or instability?
Similarity approach | Simple vs. Complex; Citation vs. Text vs. Hybrid; Threshold vs. Accuracy | Comp cost: Index < Vector; Hybrid costly, but likely best; How much is really needed?
Partitioning / Layout | Simple vs. Complex; Accuracy vs. Useability | Simple often size limited; Is intuition satisfied?; Are distributions reasonable?; Useful levels of aggregation?

Mapping Choices
• Are often made based on what is available
  » Data, algorithms, expertise
• When they should be made based on
  » The research question or application
  » Balancing of the applicable trade-offs
• If the application is EVALUATION
  » Special care must be taken
  » Partition accuracy is critical
    • What is a discipline?

Our Choices

Choice | Trade-offs | Comments
Data source | Free vs. Costly; Single vs. Multiple Sources; Content vs. Context | Citation data is essential and worth paying for; Future: full text?
Unit of analysis | Breadth vs. Detail | Documents
Sample size | Specialty vs. All of science; Content vs. Context; Single year vs. Multiple years | Mapping ALL of SCIENCE; Context is very useful; Link single years; Show instability
Similarity approach | Simple vs. Complex; Citation vs. Text vs. Hybrid; Threshold vs. Accuracy | Co-citation analysis; Future: hybrid likely; Top-n only, rest is noise
Partitioning / Layout | Simple vs. Complex; Accuracy vs. Useability | Complex (DrL x 10), robust; Small clusters

We are continually testing, refining, and improving our processes.

Standards?

Choice | Trade-offs | Standards?
Data source | Free vs. Costly; Single vs. Multiple Sources; Content vs. Context | Bibliographic data; Cleaning to different levels; Use what you can get!
Unit of analysis | Breadth vs. Detail | Articles, journals, authors
Sample size | Specialty vs. All of science; Content vs. Context; Single year vs. Multiple years |
Similarity approach | Simple vs. Complex; Citation vs. Text vs. Hybrid; Threshold vs. Accuracy |
Partitioning / Layout | Simple vs. Complex; Accuracy vs. Useability | Anecdotal validation in most cases

Possible Standards
• Units
• Data sources
• Data cleanliness
• Workflows
• Validation
• Datasets
• Products (e.g. reference maps?)
• …

Our Process

Generate annual models (research communities)
• Select single year of source data
• Apply thresholds to select references
• Generate relatedness matrix (k50/cosine)
• Filter matrix to top-n values per reference
• Cluster references → intellectual base
• Assign current papers → research front

Link annual models
• Link communities from adjacent years using overlaps in their intellectual bases → threads
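The relatedness and top-n filtering steps of this process can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a plain cosine index on co-citation counts (the K50 measure named on the slide is not reproduced), and the function names, the `min_cites` threshold, and the `top_n` value are hypothetical choices, not the values SciTech uses.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def cocitation_counts(papers):
    """Count citations and co-citations.

    papers: dict mapping a citing paper id to the list of reference ids it cites.
    Returns (cited, pair): citation counts per reference and co-citation
    counts per sorted reference pair.
    """
    cited = defaultdict(int)
    pair = defaultdict(int)
    for refs in papers.values():
        unique = sorted(set(refs))
        for r in unique:
            cited[r] += 1
        for a, b in combinations(unique, 2):
            pair[(a, b)] += 1
    return cited, pair


def cosine_top_n(cited, pair, min_cites=2, top_n=2):
    """Keep references cited at least min_cites times (assumed threshold),
    compute cosine similarity for each co-cited pair, and keep only the
    top_n most similar neighbors of each reference."""
    keep = {r for r, c in cited.items() if c >= min_cites}
    neighbors = defaultdict(list)
    for (a, b), c in pair.items():
        if a in keep and b in keep:
            sim = c / sqrt(cited[a] * cited[b])
            neighbors[a].append((sim, b))
            neighbors[b].append((sim, a))
    # Sort by descending similarity, breaking ties by reference id.
    return {
        r: sorted(n, key=lambda x: (-x[0], x[1]))[:top_n]
        for r, n in neighbors.items()
    }


if __name__ == "__main__":
    papers = {
        "p1": ["A", "B", "C"],
        "p2": ["A", "B"],
        "p3": ["B", "C"],
        "p4": ["A", "D"],
    }
    cited, pair = cocitation_counts(papers)
    for ref, nbrs in sorted(cosine_top_n(cited, pair, top_n=1).items()):
        print(ref, nbrs)
```

The filtered neighbor lists would then be handed to a clustering or layout step; that step is not shown here.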

Process-Specific Applications
• Highly granular model of science gives detailed structure

Process-Specific Applications
• Detailed structure enables metrics for multidisciplinary sets of topics
• Competencies is the name we use for these sets of topics

Process-Specific Applications
• Research leadership (based on competencies) rather than research evaluation (based on rankings)
[Figure panels: UCSD, TAMU, UCSF, MIT, Stellenbosch, Delhi]

Process-Specific Applications
• Mapping all of science shows context for target literature

Process-Specific Applications
• Linking annual models shows detailed evolution and instability
• Consider a topic in a given year. It can be linked (or not) in time in the following ways:
[Figure axes: Prior year linkage, Following year linkage]

Process-Specific Applications
• Linkage types correlate with various properties (e.g. textual coherence, prolific authors)
• These can be used as predictors of future linkage
  » Identification of emerging topics at the micro-level
[Figure axes: Prior year linkage, Following year linkage. Labels: Isolate 34%, Death 11%, Birth 14%, Continuation 41%, Stasis]
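One way to read the four linkage types is as the combinations of "linked to the prior year" and "linked to the following year". The sketch below works through that reading. It assumes links are formed by Jaccard overlap between the intellectual bases (reference sets) of communities in adjacent years; the overlap measure, the `min_overlap` threshold, and all function names are hypothetical and are not taken from the slides.

```python
def link_years(prev_bases, next_bases, min_overlap=0.3):
    """Link communities in adjacent years whose reference sets overlap.

    prev_bases, next_bases: dict mapping community id -> set of reference ids.
    Uses Jaccard overlap with an assumed threshold.
    Returns a set of (prev_id, next_id) pairs.
    """
    links = set()
    for prev_id, prev_refs in prev_bases.items():
        for next_id, next_refs in next_bases.items():
            union = prev_refs | next_refs
            if union and len(prev_refs & next_refs) / len(union) >= min_overlap:
                links.add((prev_id, next_id))
    return links


def linkage_type(has_prior, has_following):
    """Map the two linkage flags onto the four types named on the slide."""
    if has_prior and has_following:
        return "continuation"
    if has_prior:
        return "death"      # linked backward only: the topic ends here
    if has_following:
        return "birth"      # linked forward only: the topic starts here
    return "isolate"        # linked in neither direction


def classify_year(topics, links_from_prior, links_to_following):
    """Classify every topic of one year given its backward and forward links."""
    linked_back = {cur for _, cur in links_from_prior}
    linked_forward = {cur for cur, _ in links_to_following}
    return {t: linkage_type(t in linked_back, t in linked_forward) for t in topics}


if __name__ == "__main__":
    year1 = {"a": {1, 2, 3}}
    year2 = {"b": {2, 3, 4}, "c": {9}}
    year3 = {"d": {2, 3, 4, 5}, "e": {7, 8}}
    back = link_years(year1, year2)
    forward = link_years(year2, year3)
    print(classify_year(year2, back, forward))
```

Splits, merges, and stasis within continuation would need the number of links per topic, which this sketch does not track.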

Process-Specific Applications
• Superstar authors (top 1%) are predictive
  » Superstars avoid isolates
  » Superstars move out of topics before they die
  » Superstars are heavily involved in splits and merges (non-stasis parts of continuation)
[Figure labels: Isolate 34%, Death 11%, Birth 14%, Continuation 41%, Stasis]

Accuracy of Similarity Approaches
• Identified a large data corpus (2.15M articles)
  » Textual information (abstracts, MeSH) from Medline
  » Citation information from Scopus
• Compare 13 approaches: 3 citation, 9 text, 1 hybrid
• For each approach
  » Calculate article-article similarities
  » Filter similarities to reduce the number to a similarity file of reasonable size (2.15M articles, 20M sim pairs)
  » Cluster the articles
  » Measure accuracy of the cluster solution using multiple metrics
    • Coherence (Jensen-Shannon divergence)
    • Grant-to-article linkage concentration
• Analyze results
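The two accuracy metrics rest on standard quantities, sketched below. The Jensen-Shannon divergence compares two word distributions, and a Herfindahl index measures how concentrated a grant's articles are across clusters. How the study combines these into its final coherence and concentration scores is not stated on the slide, so the functions here (names hypothetical) show only the building blocks.

```python
from math import log2


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    p, q: dicts mapping a term to its probability; each should sum to 1.
    Returns a value between 0 (identical) and 1 (no shared terms).
    """
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        # Kullback-Leibler divergence from a to the mixture m.
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def herfindahl(counts):
    """Herfindahl index of a list of counts, e.g. the number of articles
    acknowledging one grant that fall in each cluster.
    1.0 means all articles sit in a single cluster."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((c / total) ** 2 for c in counts)


if __name__ == "__main__":
    doc = {"gene": 0.5, "protein": 0.5}
    cluster = {"gene": 0.4, "protein": 0.4, "cell": 0.2}
    print(js_divergence(doc, cluster))
    print(herfindahl([8, 1, 1]))
```

Under this reading, a cluster solution that keeps each grant's articles together yields Herfindahl values closer to 1.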

Results

Method | Comp Cost | % Coverage
Co-word MeSH | Medium | 95.77%
LSA MeSH | Very high | 98.22%
SOM MeSH | Very high | 99.97%
bm25 MeSH | Medium | 93.39%
Co-word TA | High | 83.41%
LSA TA | Very high | 90.92%
Topics TA | High | 94.40%
bm25 TA | High | 93.91%
pmra | Low | 94.23%
Bib coupling | Low | 96.62%
Co-citation | Low | 98.37%
Direct citation | Low | 92.68%
Hybrid | Medium | 96.83%

[Further columns without transcribed values: Coherence (combined), Herfindahl, Pr80 (all grants)]
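The three citation-based methods compared here (bibliographic coupling, co-citation, direct citation) differ only in which citation link they count. A minimal sketch with raw counts follows; the function names and the toy data are hypothetical, and the study's normalization and thresholds are not reproduced.

```python
def direct_citation(cites, a, b):
    """True if a cites b or b cites a."""
    return b in cites.get(a, set()) or a in cites.get(b, set())


def bibliographic_coupling(cites, a, b):
    """Number of references that a and b both cite."""
    return len(cites.get(a, set()) & cites.get(b, set()))


def co_citation(cites, a, b):
    """Number of documents that cite both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)


if __name__ == "__main__":
    # cites maps a document id to the set of ids it cites.
    cites = {
        "p1": {"r1", "r2"},
        "p2": {"r2", "r3"},
        "p3": {"p1", "p2"},
    }
    print(direct_citation(cites, "p3", "p1"))
    print(bibliographic_coupling(cites, "p1", "p2"))
    print(co_citation(cites, "p1", "p2"))
```

Bibliographic coupling is fixed once a paper is published, while co-citation counts keep changing as new citing papers appear.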

Summary
• Science mapping involves a great number of choices
  » There are many trade-offs to consider
  » Choices constrain or enable analysis types
• SciTech has made a set of mapping choices, and developed a methodology that enables
  » Development of a detailed, multi-year model of science for structure and evolution
  » Metrics on competencies (sets of topics) that are important at the institutional level
  » Context to be mapped along with any target literature
  » Prediction of continuance
  » Identification of emerging topics
• We continue to explore accuracy and usefulness of maps

Relevant Literature
• Boyack, K. W., Newman, D., Duhon, R. J., Klavans, R., Patek, M., Biberstine, J. R., Schijvenaars, B., Skupin, A., Ma, N., & Börner, K. (2011). Clustering more than two million biomedical publications: Comparing the accuracies of nine text-based similarity approaches. PLOS One, 6(3): e
• Klavans, R., & Boyack, K. W. (2011). Using global mapping to create more accurate document-level maps of research fields. Journal of the American Society for Information Science and Technology, 62(1),
• Boyack, K. W., & Klavans, R. (2010). Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately? Journal of the American Society for Information Science and Technology, 61(12),
• Klavans, R., & Boyack, K. W. (2010). Toward an objective, reliable, and accurate method for measuring research leadership. Scientometrics, 82(3),
• Klavans, R., & Boyack, K. W. (2009). Toward a consensus map of science. Journal of the American Society for Information Science and Technology, 60(3),
• Boyack, K. W. (2009). Using detailed maps of science to identify potential collaborations. Scientometrics, 79(1),
• Klavans, R., & Boyack, K. W. (2006). Quantitative evaluation of large maps of science. Scientometrics, 68(3),
• Klavans, R., & Boyack, K. W. (2006). Identifying a better measure of relatedness for mapping science. Journal of the American Society for Information Science and Technology, 57(2),
• Boyack, K. W., Klavans, R., & Börner, K. (2005). Mapping the backbone of science. Scientometrics, 64(3),
• Boyack, K. W. (2004). Mapping knowledge domains: Characterizing PNAS. Proceedings of the National Academy of Sciences, 101,
• Börner, K., Chen, C., & Boyack, K. W. (2003). Visualizing knowledge domains. Annual Review of Information Science and Technology, 37,

Thank you!