Open Source Software (OSS) for Bibliometric and Scientometrics Analysis
Dr. Samir Kumar Jalal, Dy. Librarian, Central Library, IIT Kharagpur


Introduction Online web-based bibliographical databases, such as ISI Web of Science, Scopus, CiteSeer, Google Scholar or NLM’s MEDLINE are common sources of data for bibliometric research. Many of these databases have serious problems and usage limitations since they were mainly designed for information retrieval, rather than for bibliometric analysis. In order to overcome these limitations, one common solution consists of downloading data from online databases, cleaning it and storing it in customized ad hoc databases.

OSS for Biblio/Sciento Metrics
1. Publish or Perish
2. SciMAT
3. Bibexcel
4. CiteSpace II
5. Science of Science (Sci2) Tool
6. HistCite
7. VOSviewer
8. UCINET
9. Pajek
10. NodeXL

Publish or Perish (POP)
Publish or Perish is a software program that retrieves and analyses academic citations. It was developed by Anne-Wil Harzing and is freely available for download.
– It obtains raw data from Google Scholar and Microsoft Academic Search, then analyses the data to compute a set of citation metrics.
– It also works with external data.

SciMAT
SciMAT is a Java-based application that runs on Linux, Windows and other operating systems. It generates a knowledge base from key fields such as author, title, publisher, keywords, journal name and references. The knowledge base has sixteen entities (author, document, affiliation, etc.) and is stored in an SQLite database.

Bibexcel
Bibexcel is a tool for bibliometric and citation analysis. It was developed by Olle Persson of Umeå University, Sweden, to assist users in analyzing bibliographic data.

CiteSpace II
CiteSpace was developed by Dr. Chaomei Chen, Professor of Informatics at the College of Computing and Informatics, Drexel University, Philadelphia, USA. It is free, Java-based software for structural and temporal analysis of networks derived from publication data, such as collaboration networks, author co-citation networks and document co-citation networks.
Available: http://cluster.cis.drexel.edu/~cchen/citespace

Science of Science (Sci2) Tool
The Sci2 Tool is specifically designed for the study of science. It supports temporal, geospatial, topical and network analysis and visualization of scholarly datasets. It was developed by the Cyberinfrastructure for Network Science Center team.
Available: https://sci2.cns.iu.edu

HistCite HistCite software package was developed by Eugene Garfield. It is used for the purpose of bibliometric analysis and information visualization. HistCite operates on Windows OS. It is used to know the important journals in a particular subject or to identify the key authors, institutions and articles in particular subject area.

VOSviewer – It is a free software tool for constructing and visualizing bibliometric networks. – The network may be for authors, title, country etc. – Latest version: 1.6.4 ( April 7, 2016)

UCINET
UCINET 6 for Windows is a software package for social network analysis, developed by Lin Freeman, Martin Everett and Steve Borgatti and published by Analytic Technologies, USA. It incorporates the NetDraw network visualization tool. UCINET 6 can be downloaded free of cost for a 90-day trial, after which it must be purchased.

Pajek
Pajek, developed by Vladimir Batagelj and Andrej Mrvar, is a freely available program that runs on Windows. It can perform various kinds of network analysis and visualization.

NodeXL: Network Overview, Discovery and Exploration for Excel NodeXL Basic is a free, open-source template for Microsoft® Excel® 2007, 2010, 2013 and 2016 that makes it easy to explore network graphs. – Graph Metric Calculations – Flexible Import and Export – Powerful Vertex Grouping – Zoom and Scale

Publish or Perish-POP (ver. 4): Hands-On

Publish or Perish (POP) PoP is used to do a quick literature review to identify the most cited articles or scholars in a particular field; PoP may be used to identify whether any research has been done in a particular area at all; PoP is very well suited for doing bibliometric research on both authors and journals.

Publish or Perish (POP)
Publish or Perish is a software program that retrieves and analyses academic citations.
– It obtains raw data from Google Scholar and Microsoft Academic Search, then analyses the data to compute a set of citation metrics.
– It also works with external data.

POP-Important Metrics
– Total number of papers
– Total number of citations
– Average citations per paper
– Citations per author, papers per author, citations per year
– Hirsch's h-index: a scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np - h) papers have no more than h citations each.
– Egghe's g-index: given a set of articles ranked in decreasing order of the number of citations they received, the g-index is the (unique) largest number such that the top g articles received (together) at least g² citations.
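The two definitions above can be checked by hand on a small list of citation counts. The following is a minimal sketch, not Publish or Perish's own code; the function names are illustrative, and the g-index is capped at the number of papers, which is one common convention.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


example = [10, 8, 5, 4, 3]  # made-up citation counts for five papers
print(h_index(example), g_index(example))  # prints "4 5"
```

In the example, four papers have at least four citations each (h = 4), while all five papers together have 30 citations, which is at least 5² = 25 (g = 5).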

Disadvantages of h-index
A disadvantage of the h-index is that it cannot decline. Academics who "retire" after 10 or 20 active years of publishing keep their high h-index even if they never publish...

POP-Important Metrics
The contemporary h-index: It adds an age-related weighting to each cited article, giving less weight to older articles. For an article published during the current year, its citations count four times. For an article published 4 years ago, its citations count only once (4/4). For an article published 6 years ago, its citations count 4/6 times, and so on.
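The weighting described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the weight is taken as 4 divided by the article's age in years, and the age is floored at 1 so that current-year articles count four times, as the slide states. The function name and the sample data are hypothetical.

```python
def contemporary_h_index(papers, current_year, gamma=4.0):
    """papers: list of (publication_year, citations) tuples.

    Each citation count is scaled by gamma / age before the usual
    h-index ranking is applied to the weighted scores.
    """
    weighted = []
    for year, cites in papers:
        age = max(current_year - year, 1)  # assumption: age floored at 1 year
        weighted.append(gamma * cites / age)
    weighted.sort(reverse=True)
    h = 0
    for rank, score in enumerate(weighted, start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h


# Made-up data: (year, citations)
papers = [(2016, 2), (2012, 8), (2010, 12)]
print(contemporary_h_index(papers, current_year=2016))
```

With these inputs each paper ends up with a weighted score of 8 (2 × 4, 8 × 4/4 and 12 × 4/6), so older papers need more raw citations to contribute the same amount.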

POP-Important Metrics
– The individual h-index (hI): It divides the standard h-index by the average number of authors in the articles that contribute to the h-index, in order to reduce the effects of co-authorship.
– The age-weighted citation rate (AWCR): The number of citations to a given paper is divided by the age of that paper, and the results are summed.
– The AR-index: The square root of the sum of all age-weighted citation counts over all papers that contribute to the h-index.
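A small sketch of these three metrics is given below. It follows the wording of the slide rather than any particular release of the software; the record layout (`cites`, `authors`, `year`) is hypothetical, and the age of a paper is assumed to be the difference in years, floored at 1.

```python
import math


def h_core(papers):
    """Return the papers that contribute to the h-index."""
    ranked = sorted(papers, key=lambda p: p["cites"], reverse=True)
    core = []
    for rank, paper in enumerate(ranked, start=1):
        if paper["cites"] >= rank:
            core.append(paper)
        else:
            break
    return core


def individual_h_index(papers):
    """h divided by the mean number of authors of the h-core papers."""
    core = h_core(papers)
    if not core:
        return 0.0
    mean_authors = sum(p["authors"] for p in core) / len(core)
    return len(core) / mean_authors


def awcr(papers, current_year):
    """Sum of citations divided by paper age (assumption: age >= 1 year)."""
    return sum(p["cites"] / max(current_year - p["year"], 1) for p in papers)


def ar_index(papers, current_year):
    """Square root of the age-weighted citation sum over the h-core only."""
    return math.sqrt(awcr(h_core(papers), current_year))


# Made-up records for illustration
papers = [
    {"cites": 8, "authors": 2, "year": 2012},
    {"cites": 4, "authors": 2, "year": 2014},
    {"cites": 1, "authors": 1, "year": 2015},
]
print(individual_h_index(papers), awcr(papers, 2016), ar_index(papers, 2016))
```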

POP-External Data
Scopus: Data can be exported from Scopus in *.csv format. The export should be limited to "citation information only".
Web of Science: Data can be downloaded from the Web of Science database in plain text format.

External Data Source: Scopus

External Data Source: WoS

How to Merge two or more files?
1) Click the Start button and type cmd
2) Go to the folder that holds the files, e.g.: cd C:\merge
3) At the C:\merge prompt, type dir to see the files
4) Type: copy *.csv merge.csv (or, for text files, copy *.txt merge.txt)
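Plain concatenation, as done by the copy command above, repeats the header row of every CSV file in the merged output. Where that is a problem, a short script can do the same merge while keeping a single header. The sketch below assumes all files share the same columns; the function name and default output name are illustrative.

```python
import csv
from pathlib import Path


def merge_csv(folder, output_name="merge.csv"):
    """Merge every *.csv file in `folder` into one file with a single header."""
    folder = Path(folder)
    out_path = folder / output_name
    # Skip a previous merge result so it is not merged into itself.
    sources = sorted(p for p in folder.glob("*.csv") if p.name != output_name)
    header_written = False
    with out_path.open("w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for src in sources:
            # utf-8-sig drops a byte-order mark if the export contains one.
            with src.open(newline="", encoding="utf-8-sig") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                if header is None:
                    continue  # empty file
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                writer.writerows(reader)
    return out_path


if __name__ == "__main__":
    print(merge_csv("."))  # merges the CSV files in the current folder
```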

POP- Search
Author Search: Finds an author's publications indexed in Google Scholar, along with their citations, using a query such as "Mike Thelwall".
Journal Search: Finds the publications of a particular journal indexed in Google Scholar, along with their citations. Put the journal title in double quotes.
General Citation Search: Allows several criteria to be combined in one query. For example, one can search for articles on nanoparticles from the "Journal of Nanotechnology and Nanoscience" published during 2013-2015.

POP-Results

POP- Export Results
Results can be exported as follows:
– File > Save as CSV: saves the results in CSV format
– Copy > Statistics with Excel header: copies the detailed statistics
– Copy > Results with Excel header: copies the detailed results

POP- Results
Distribution of papers by number of authors:
60 paper(s) with 1 author(s)
530 paper(s) with 2 author(s)
495 paper(s) with 3 author(s)
330 paper(s) with 4 author(s)
170 paper(s) with 5 author(s)
96 paper(s) with 6 author(s)
52 paper(s) with 7 author(s)
44 paper(s) with 8 author(s)
16 paper(s) with 9 author(s)
9 paper(s) with 10 author(s)
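Once copied out of the tool, a distribution like the one above can be summarised with a few lines of code. The sketch below uses the figures from the slide; the function name is illustrative, and the summary only covers the rows listed (papers with more than 10 authors, if any, are not included).

```python
# Number of authors -> number of papers, as listed on the slide
distribution = {1: 60, 2: 530, 3: 495, 4: 330, 5: 170,
                6: 96, 7: 52, 8: 44, 9: 16, 10: 9}


def collaboration_summary(dist):
    """Total papers, mean authors per paper and share of single-author papers."""
    papers = sum(dist.values())
    authorships = sum(n_authors * count for n_authors, count in dist.items())
    return {
        "papers": papers,
        "mean_authors": authorships / papers,
        "single_author_share": dist.get(1, 0) / papers,
    }


summary = collaboration_summary(distribution)
print(summary["papers"])                       # prints 1802
print(round(summary["mean_authors"], 2))
print(round(summary["single_author_share"], 3))
```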

POP-Limitations
Some limitations of the POP software:
– POP with Google Scholar supports only 1000 papers
– POP with Microsoft Academic Search supports 1888 papers
– For Scopus, POP supports the *.csv format
– For Web of Science, POP supports the Tab-delimited (Win), Tab-delimited (Win, UTF-8) and Plain Text formats

POP- Benefit Using POP, one can calculate various metrics value with a single click for a particular set of data, retrieved from either Google Scholar, Scopus or Web of Science. Using POP, the required data set to test Lotka’s law can easily be found. Various statistics and results can easily be copied to Excel for further analysis.

Bibexcel: Hands-On

Bibexcel: Introduction
Bibexcel is a tool for bibliometric and citation analysis. It was designed by Olle Persson of Umeå University, Sweden, to assist users in analyzing bibliographic data.

Bibexcel: Features
– Lists of authors with number of documents, lists of journals, etc.
– Co-citation, bibliographic coupling, mapping and clustering analysis
– Interaction with other software such as Pajek, Excel and SPSS
– Import of several data types: Scopus (*.ris) and Web of Science (*.txt)
– Flexible data management and analysis

Bibexcel: Menu File Edit DOC file Edit OUT file Add data Classify Analyze Misc Mapping Help

How Bibexcel works?
– Download bibliographic data from Scopus or Web of Science
– Prepare the data so that it is suitable for use in Bibexcel
– Analyze the data
– Prepare reports

What Bibexcel can do? Step-1: Download data from Web of Science in Plain text format OR from Scopus in RIS format. Step-2: Restructuring of downloaded data Step-3: Creating an OUT-file. Step-4: Analysis of data; Step-5: Export files to Pajek for visualizations.

How to do the Re-structuring of data?
Step-1: Restructuring the data involves two operations:
– Insert carriage returns in the file
– Convert the bibliographic records to DIALOG format
Step-2: To insert carriage returns, go to the Bibexcel menu: Edit doc file > Replace line feed with carriage return.
Step-3: To convert the bibliographic records to DIALOG format, select the *.tx2 file and then choose either:
– Misc > Convert to Dialog format > Convert from Web of Science, or
– Misc > Convert to Dialog format > Convert from Scopus RIS format.

WoS- Record Looks Like!
PT- Journal|
AU- Brown S; Blackmon K|
TI- Aligning manufacturing strategy and business-level competitive strategy in new competitive environments: The case for strategic resonance|
SO- JOURNAL OF MANAGEMENT STUDIES|
NR- 190|
CD- 1998, IND WEEK 1207, P22, V247; 1998, IND WEEK 1207, P24, V247; ADLER PS, 1990, P55, CALIFORNIA MANAG SPR; ANDERSON J, 1991, V1, P86, INT J PRODUCTION OPE; ZAJAC EJ, 2000, V21, P429, STRATEGIC MANAGE J; ZAJAC EJ, 1989, V10, P413, STRATEGIC MANAGE J|
J9- J MANAGE STUD-OXFORD|
JN- JOURNAL OF MANAGEMENT STUDIES, 2005, V42, N4, P793-815|
UT- ISI:000229369000004
ER ||

Scopus- Record Looks Like! [*.doc]
TY- JOUR|
TI- Surface modification of polyacrylonitrile co-polymer membranes using pulsed direct current nitrogen plasma|
T2- Thin Solid Films|
VL- 597|
SP- 171|
EP- 182|
PY- 2015|
DO- 10.1016/j.tsf.2015.11.050|
AU- Pal, D.; Neogi, S.; De, S.|
N1- Export Date: 17 April 2016|
M3- Article|
DB- Scopus|
UR- http://www.scopus.com/inward/record.url?eid=2-s2.0-84959475022&partnerID=40&md5=7f25e70a7940e813bd6cfcd7c1a3ac78|
ER- ||
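The record layout shown above (a two-letter tag, a hyphen, the value, and a closing "|") is simple enough to read outside Bibexcel. The following is a minimal sketch that assumes exactly this layout; it is not Bibexcel's parser, and the function names are illustrative.

```python
def parse_record(text):
    """Split a Dialog-style record into a {tag: value} dictionary."""
    fields = {}
    for chunk in text.split("|"):
        tag, sep, value = chunk.strip().partition("-")
        tag = tag.strip()
        if not sep or len(tag) != 2:
            continue  # skip empty chunks and anything without a two-letter tag
        fields[tag] = value.strip()
    return fields


def split_units(value):
    """Split a ';'-separated field (e.g. AU) into individual units."""
    return [unit.strip() for unit in value.split(";") if unit.strip()]


record = (
    "TY- JOUR| TI- Surface modification of polyacrylonitrile co-polymer "
    "membranes using pulsed direct current nitrogen plasma| "
    "T2- Thin Solid Films| PY- 2015| AU- Pal, D.; Neogi, S.; De, S.| ER- ||"
)
fields = parse_record(record)
print(fields["T2"])               # prints "Thin Solid Films"
print(split_units(fields["AU"]))  # prints ['Pal, D.', 'Neogi, S.', 'De, S.']
```

The `split_units` helper mirrors what the "Any ; separated field" option does conceptually: it treats each semicolon-separated entry as one unit to be counted.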

How do you analyze the file in Bibexcel?
The first stage is to create an OUT file and calculate frequency distributions.
How do you create an OUT file? An OUT file is a tab-delimited text file that can be imported into Excel. It can be created as follows:
Step-1: Select the *.doc file.
Step-2: Enter the field tag (e.g. AU for Author, TI for Title) in the box marked "Old Tag" in the Frequency Distribution panel.
Step-3: Under "Select field to be analyzed", choose "Any ; separated field" from the drop-down.
Step-4: Click on Prep.
Step-5: Answer the prompts that follow; the OUT file is then generated.

Creating Various file types in Bibexcel Step-1: *.tx2 file [carriage return] Step-2: *.doc file Step-3: *.OUT file Step-4: *.CIT file [output file under frequency dist.] Step-5: *.OUX file Step-6: *.COC file Step-7: *.jn1 or *.jn2 etc file

Bibexcel: Frequency Distribution
First of all, choose the OUT file.
Fractionalize: If the Fractionalize box is checked, the fractional counting method is used: if a document is written by two authors, each author is credited with half an article.
Whole Counts: If the Fractionalize box is unchecked, the whole counts method is used: each author is credited with one full article.
Whole String: If we select "Whole string" from the list under "Select type of unit", Bibexcel counts the whole author name as one unit.
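The difference between the two counting methods is easy to see on a toy example. The sketch below is an illustration of whole and fractional counting in general, not of Bibexcel's internals; the function name and author labels are made up.

```python
from collections import defaultdict


def count_authors(documents, fractionalize=False):
    """documents: list of author lists, one list per document.

    Whole counts give every author 1 credit per document;
    fractional counts give every author 1/n for a document with n authors.
    """
    totals = defaultdict(float)
    for authors in documents:
        if not authors:
            continue
        credit = 1.0 / len(authors) if fractionalize else 1.0
        for author in authors:
            totals[author] += credit
    return dict(totals)


docs = [["Author A", "Author B"], ["Author A"]]
print(count_authors(docs))                      # whole counts
print(count_authors(docs, fractionalize=True))  # fractional counts
```

Under whole counts Author A gets 2 and Author B gets 1; under fractional counts they get 1.5 and 0.5, so the credits always sum to the number of documents.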

Result-1: Document Identification No. with author / title / journal
Step-1: Select the *.doc file.
Step-2: Select the field delimiter "Any ; separated field" [from the "Select field to be analyzed" box].
Step-3: Type AU in the "Old Tag" box.
Step-4: Press the Prep button.
Step-5: Click OK, OK and Yes.
Step-6: The result is generated in the *.OUT file.
Step-7: The list of authors or titles can be saved in a new file:
– Click on "View whole file"
– Type a file name in the box "Type New file name here"
– Click on Start under "Select documents"

Result-2: Author / title / journal with no. of articles
Step-1: Make the OUT file for the AU tag and select the *.OUT file.
Step-2: Select the field delimiter "Any ; separated field" [from the "Select field to be analyzed" box].
Step-3: Type AU in the "Old Tag" box.
Step-4: Choose "Whole String" [from Frequency Distribution], check "Sort descending" and press the Start button.
Step-5: Click OK, OK and Yes.
Step-6: The result is generated in the *.CIT file.
How to save the file: the list of authors or titles can be saved in a new file:
– Click on "View whole file"
– Type a file name in the box "Type New file name here"
– Click on Start under "Select documents"

Result-3: Remove Duplicates
– Select the *.doc file and view the whole file
– Put AU in Old Tag
– Check "Remove duplicates"
– Frequency Distribution: Whole String
– Select Field: Any comma separated
– Click on "Prep" to create the *.out file
– Select the *.out file and click on "Start" to remove duplicate authors
– The result is the *.cit file

Result-4: Sort by Number of Docs.
– Select the *.out file and view the whole file
– Put AU in Old Tag
– Check "Sort descending"
– Frequency Distribution: Whole String
– Select Field: Any comma separated
– Click on "Start"

Result-5: Fractionalize Method

Result-6: List of Journals with Citations
Step-1: Select the *.doc file.
Step-2: Create the *.OUT file: select "Cited Journals with whole string" and "Any ; separated field", type T2 in Old Tag and click on "Prep". The result is an *.out file with document ID and journal name.
Step-3: Select the *.out file, enter the tag TC in Old Tag, click "Add Field to unit" and then answer "No". The result is a *.jn1 file: the list of journals with document ID and citations.

Result-7: Generate Map on Journals

Co-occurrence analysis: Creating Pajek Files in Bibexcel
Step-1: Download data from Scopus in *.ris format.
Step-2: Go to Bibexcel menu > Edit doc file > Replace line feed with carriage return. This gives the *.tx2 file.
Step-3: Go to Bibexcel menu > Misc > Convert to Dialog format > Convert from Scopus RIS format. This gives the *.doc file.

Co-occurrence analysis
Step-4: Select the *.doc file. In the "Frequency distribution" box, choose "Whole string" from the drop-down menu, check the checkbox labeled "Make new out-file" and type AU (Author) in the Old Tag field. Click Start. This creates a new *.oux file.
Step-5: Choose the *.oux file and open it in The List by clicking on "View file". Then, in the "Select field to be analyzed" box, choose "Any ; separated field" from the drop-down menu and type AU in Old Tag. Press the Prep button. This creates a new *.out file in which all the authors are listed by case.
Step-6: In The List, mark the *.out file and choose Analyze > Co-occurrence > Make pairs via listbox. Answer No to the first question and OK to the second. This results in a *.coc file.

Co-occurrence analysis
Step-7: Use the *.coc file to create a network file. Go to Mapping and choose "Create .net file for Pajek". Answering No and then Yes gives an undirected graph; answering Yes and Yes gives a directed graph.
Step-8: Use the *.cit file to create a VEC file: go to Mapping > Create VEC file. This results in a *.vec file.
Step-9: To partition the co-citation matrix, use the *.coc file. Go to Analyze > Co-occurrence > Cluster pairs. This creates the files *.pe2, *.pe3, *.pe4 and *.pe5.

Co-occurrence analysis
Step-10: Use the *.pe2 file: go to menu > Mapping > Create clu file.
Step-11: Import the files into Pajek: *.net under Networks, *.vec under Vectors and *.clu under Partitions.
Step-12: After opening these files in Pajek, choose the following option from the Pajek menu: Draw > Draw-Partition-Vector.
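The core of the steps above (making author pairs and writing a network file) can be sketched in a few lines. This is an illustration of the idea, not a replacement for Bibexcel: it counts how often two authors appear on the same document and writes the pairs as an undirected network in Pajek's plain-text .net layout (`*Vertices`, then `*Edges`). The function names and the sample author lists are illustrative.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_pairs(documents):
    """Count how many documents each pair of authors shares."""
    pairs = Counter()
    for authors in documents:
        # sorted(set(...)) removes repeats and gives each pair a fixed order
        for a, b in combinations(sorted(set(authors)), 2):
            pairs[(a, b)] += 1
    return pairs


def to_pajek_net(pairs):
    """Render the pair counts as text in Pajek .net format (undirected)."""
    labels = sorted({name for pair in pairs for name in pair})
    index = {name: i for i, name in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[name]} "{name}"' for name in labels]
    lines.append("*Edges")
    lines += [f"{index[a]} {index[b]} {weight}"
              for (a, b), weight in sorted(pairs.items())]
    return "\n".join(lines) + "\n"


docs = [["Pal, D.", "Neogi, S.", "De, S."], ["Pal, D.", "De, S."]]
print(to_pajek_net(cooccurrence_pairs(docs)))
```

Saving the returned text to a file with a .net extension gives a file that can be loaded as a network in Pajek.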

Result: Graph

SciMAT- 1.1.03: Hands-On

Why SciMAT?
– SciMAT is written in the Java programming language and runs on Linux, Windows and other operating systems.
– The SciMAT source code can be downloaded and modified.
– SciMAT generates a knowledge base from key fields such as author, title, publisher, keywords, journal name and references. The knowledge base has sixteen entities (author, document, affiliation, etc.).
– It uses an SQLite database to store the knowledge base, which can be opened with any database browser that reads SQLite files.

SciMAT-1.1.03: Installation
Step-1: Download the software from http://sci2s.ugr.es/scimat/download.html
Step-2: Install Java version 6 or later.
Step-3: To run SciMAT v1.1.03, unpack the downloaded zip file and execute the SciMAT jar file.
Step-4: The user guide can be downloaded here: http://sci2s.ugr.es/scimat/software/v1.01/SciMAT-v1.0-userGuide.pdf
Step-5: Install an SQLite viewer [optional and free to download].

SciMAT-1.1.03: Modules

SciMAT-1.1.03: File If a new project is selected, a new window will appear asking for the path where the knowledge base file will be stored and the name of the file. We can give any name for the file.

SciMAT-1.1.03: File Once you open the project, New Project and Existing Project option will be inactive and Add file option will be activated. The add files option allows the user to add bibliographical information, exported from bibliographical databases, to the knowledge base. SciMAT is able to read bibliographical information exported in ISI Web of Knowledge format (ISI-CE) or RIS (Scopus) format.

SciMAT-1.1.03: File
New Project: You need to create a project for a particular data set and specify the path where the new project will be saved. The first time, there is no need to open the project because it is already active.
Open Project: Once a project has been created, you need to open it for further processing and analysis.
Close Project: Once the work is over, you need to close the project.
Add files: Adds files for analysis. It is better to download the data from Scopus in *.ris format and add it here.
Export > Groups: The Export option works only after the group work has been completed. The exported file is an XML file and can be opened in a text editor.

External Data Source: Scopus

External Data Source: WoS

Download data from Web of Science
Step-1: Go to the website (subscription-based).
Step-2: Enter the query in the search box.
Step-3: Add the results to the Marked List. The number of marked records is shown next to Marked List.
Step-4: Choose "Save to other file formats" and select Plain Text as the file format.
Step-5: Give the record range, 1 to 500.

How to Merge two or more files?
1) Click the Start button and type cmd
2) Go to the folder that holds the files, e.g.: cd C:\merge
3) At the C:\merge prompt, type dir to see the files
4) Type: copy *.csv merge.csv (or copy *.txt merge.txt, or copy *.ris merge.ris)

SciMAT-1.1.03: Functionality SciMAT uses the SQLite database engine in order to store the knowledge base built in the previous step. The knowledge base can be opened with any database browser that reads SQLite files. Note: you can use Sqlite viewer to view and browse the data stored in SQLite database under Knowledge base.
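Because the knowledge base is an ordinary SQLite file, it can also be inspected with a short script instead of a viewer. The sketch below only lists whatever tables the file contains and their row counts, so it makes no assumption about SciMAT's table names; the function names and the file path in the example are hypothetical.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in an SQLite file."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [row[0] for row in rows]


def row_counts(db_path):
    """Return {table name: number of rows} for every table in the file."""
    counts = {}
    with closing(sqlite3.connect(db_path)) as conn:
        for name in list_tables(db_path):
            # Names come from sqlite_master, so quoting them is sufficient here.
            counts[name] = conn.execute(
                f'SELECT COUNT(*) FROM "{name}"'
            ).fetchone()[0]
    return counts


if __name__ == "__main__":
    # Hypothetical path: replace with the knowledge base file of your project.
    print(row_counts("my_project_knowledge_base.db"))
```

Note that `sqlite3.connect` creates an empty database if the path does not exist, so a result with no tables usually means the path is wrong.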

SciMAT-1.1.03: Knowledge Base The module to manage the knowledge base is responsible for building it, importing the data from different bibliographical sources, and cleaning and fixing the possible errors in the entities. It can be considered as a first stage in the preprocessing step.

Knowledge Base and Managers
Knowledge Base: SciMAT generates a knowledge base from a set of scientific documents, in which the relations between the different entities related to each document (authors, keywords, journal, references, etc.) are stored. The knowledge base is composed of sixteen entities.
Knowledge Base Manager: The module that manages the knowledge base is responsible for building it, importing the data from different bibliographical sources, and cleaning and fixing possible errors in the entities.

Knowledge Base and Managers Step-1: The first step in this module is to build a new project or load an existing one. It can be done through the menu File or using the buttons of the toolbar. If a new project is selected, a new window will appear asking for the path where the knowledge base file will be stored and the name of the file. We can give any extension for the file. Step-2: Once you create a new project or you open the project, New Project and Existing Project option will be inactive and Add file option, Knowledge Base Manager, Group Manager, Export and Import option will be activated.

Knowledge Base and Managers
Step-3: The Add files option allows the user to add bibliographical information exported from bibliographical databases to the knowledge base. In particular, SciMAT is able to read bibliographical information exported in ISI Web of Knowledge format (ISI-CE) or RIS (Scopus) format. To add RIS files, follow Add Files > In RIS (May 2004 format). SciMAT then asks whether you want to import the data with references; doing so slows down the import. You can answer Yes.

Supported File Format Scopus>RIS format Web of Science> Text format

Period Manager Period Manager>Add>2000-2005 >click add Go to right panel>Add> Add document

SciMAT-1.1.03: Results
– Top Authors: Knowledge base > Authors > Author Manager
– Top Journals: Knowledge base > Journals > Journal Manager
– Top Cited Article Titles: Knowledge base > Documents > Document Manager
– Year-wise Documents: Knowledge base > Publish Dates > Publish Date Manager

Science Mapping Analysis There are 11 Steps explained in the Manual

Similarity Measures- Cosine Technique
The cosine similarity metric is the normalized dot product of two attribute vectors, that is, the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any other angle. It is thus a judgment of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, and two vectors at 90° have a similarity of 0.
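As a small illustration of the "normalized dot product", the sketch below computes the cosine similarity of two numeric vectors. It is a generic implementation, not SciMAT's code; returning 0.0 for an all-zero vector is a choice made here to avoid a division by zero.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: similarity with a zero vector is defined as 0
    return dot / (norm_u * norm_v)


print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same orientation, close to 1
print(cosine_similarity([1, 0], [0, 1]))        # vectors at 90 degrees, 0.0
```

The first pair differs only in magnitude, which is why the similarity stays at (approximately) 1.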

Similarity Measures- Jaccard's Index
The Jaccard coefficient measures the similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets: J(A, B) = |A ∩ B| / |A ∪ B|
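The definition translates directly into set operations. A minimal sketch follows; the keyword sets are made up, and treating two empty sets as having similarity 0.0 is a convention chosen here.

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are given similarity 0
    return len(a & b) / len(union)


keywords_1 = {"bibliometrics", "citation", "h-index"}
keywords_2 = {"citation", "h-index", "mapping"}
print(jaccard_index(keywords_1, keywords_2))  # prints 0.5
```

The two sets share two keywords out of four distinct keywords overall, which gives 2 / 4 = 0.5.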

Bibliographic Coupling
Kessler (1963) proposed a technique known as bibliographic coupling, which measures the similarity between two scientific documents as the number of references they have in common. Small (1973) proposed another technique, co-citation, which measures the similarity between two documents as the number of documents that cite both of them.
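Both measures can be computed from the same input: a mapping from each citing paper to its reference list. The sketch below uses made-up paper and reference labels and illustrative function names.

```python
def bibliographic_coupling(references, paper_a, paper_b):
    """Number of references that two citing papers have in common."""
    return len(set(references[paper_a]) & set(references[paper_b]))


def co_citation(references, doc_a, doc_b):
    """Number of papers whose reference lists contain both documents."""
    return sum(
        1 for cited in references.values()
        if doc_a in cited and doc_b in cited
    )


# Citing paper -> list of cited documents (hypothetical labels)
references = {
    "P1": ["R1", "R2", "R3"],
    "P2": ["R2", "R3", "R4"],
    "P3": ["R1", "R2"],
}
print(bibliographic_coupling(references, "P1", "P2"))  # prints 2 (R2 and R3)
print(co_citation(references, "R1", "R2"))             # prints 2 (P1 and P3)
```

Bibliographic coupling looks at what two papers cite, so it is fixed once they are published; co-citation looks at who cites them together, so it can grow over time.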

Cluster analysis Clustering is the process of classifying objects into sub-groups based on some similarity criteria. Hierarchical clustering is a technique by which we can obtain a sequence of partitions in which each partition is nested into the next partition in the sequence. – Agglomerative – Divisive

Agglomerative Clustering
The agglomerative algorithm for hierarchical clustering starts by placing each object in the data set in an individual cluster and then gradually merges those individual clusters. The divisive algorithm, however, starts with the whole data set as a single cluster and then breaks it down into smaller clusters. Single Link and Complete Link are two hierarchical agglomerative clustering procedures.
– Single Link Clustering Algorithm
– Complete Link Clustering Algorithm

Single Link Clustering Algorithm In the Single Link clustering algorithm, clusters are merged at each stage by the single shortest link between them. During each iteration, after the clusters x and y are merged, the distance between the new cluster, say n, and some other cluster, say z, is given by dnz = min(dxz, dyz), where dnz denotes the distance between the two closest members of clusters n and z. If the clusters n and z were to be merged, then for any object in the resulting cluster, the distance to its nearest neighbor would be at most dnz.

Complete Link Clustering algorithm In the Complete Link clustering algorithm, at each stage, after the clusters x and y are merged, the distance between the new cluster, say n, and some other cluster, say z, is given by dnz = max(dxz, dyz), where dnz denotes the distance between the most distant members of clusters n and z. If n and z were to be merged, then every object in the resulting cluster would be no farther than dnz from every other object in the cluster. All objects in the cluster are thus linked to each other at some maximum distance.
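The two update rules, dnz = min(dxz, dyz) for Single Link and dnz = max(dxz, dyz) for Complete Link, can be compared with a compact sketch. This is an illustrative implementation written for clarity rather than speed, not the routine used by any of the tools above; the function name and the sample positions are made up.

```python
def agglomerative(dist, n_clusters, linkage="single"):
    """Merge clusters until n_clusters remain.

    dist: symmetric matrix (list of lists) of pairwise distances.
    linkage: "single" uses min(dxz, dyz), "complete" uses max(dxz, dyz).
    """
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    combine = min if linkage == "single" else max
    clusters = {i: [i] for i in range(len(dist))}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    while len(clusters) > n_clusters:
        x, y = min(d, key=d.get)  # closest pair of clusters
        merged = clusters.pop(x) + clusters.pop(y)
        # Distance from the new cluster n to every other cluster z
        new_d = {
            z: combine(d[(min(x, z), max(x, z))], d[(min(y, z), max(y, z))])
            for z in clusters
        }
        d = {k: v for k, v in d.items() if x not in k and y not in k}
        clusters[x] = merged  # the new cluster reuses the id of x
        for z, value in new_d.items():
            d[(min(x, z), max(x, z))] = value
    return sorted(sorted(c) for c in clusters.values())


# Four items placed on a line at these positions
positions = [0, 1, 3, 5.5]
dist = [[abs(a - b) for b in positions] for a in positions]
print(agglomerative(dist, 2, "single"))    # prints [[0, 1, 2], [3]]
print(agglomerative(dist, 2, "complete"))  # prints [[0, 1], [2, 3]]
```

On the same data the two rules give different partitions: Single Link chains the third item onto the first cluster through its nearest member, while Complete Link keeps clusters compact by looking at the most distant members.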

SciMAT-1.1.03:Results-Filter

SciMAT: Limitations SciMAT does not support *.csv files from Scopus;

Thank you
