A collaboration graph for E-LIS
Thomas Krichel
Long Island University & Novosibirsk State University & Open Library Society
3 November 2011

Introduction
Thanks
– To Ángel Sánchez Villegas for the use of the e-lis domain.
– To Tomas Baiget, who has encouraged me to present here.
Warnings
– Data shown here were correct as of 1 November
– I am glossing over some technical details.
– Over 30 slides

overview
Introduction to AuthorClaim
Introduction to a co-authorship network based on restricting AuthorClaim to E-LIS documents
Web interface and campaign

a known problem
In publishing systems such as E-LIS, the authors are usually entered by name.
It is well known that the name of an author does not identify an author
– multiple ways to express the name of the same person
– multiple people sharing one expression of their names

a tried solution
One way to partially solve this problem is to have a system where authors can
– claim papers that they have written
– disclaim papers written by their homonyms
The first system of this kind was the RePEc Author Service
– created by Thomas Krichel in 1999
– now has registered over economists

AuthorClaim
AuthorClaim is an interdisciplinary version of the RePEc Author Service.
It was created by Thomas Krichel in
Lives at
Over authorships of over documents can be claimed.
Among the documents are the E-LIS papers.

445 E-LIS papers claimed …
72 Tomas Baiget
61 Ulrich Herb
43 Antonella De Robbio
39 Thomas Krichel
26 Andrea Marchitelli & fernanda peset
20 Ross MacIntyre
16 Dirk Lewandowski
15 Bożena Bednarek-Michalska
14 Lidia Derfert-Wolf
11 Zeno Tajoli & Imma Subirats

by 36 authors
9 Derek Law & Emma McCulloch & Philipp Mayr
8 Jeffrey Beall
7 nuria Lloret Lloret Romero
6 Benjamin John Keele
5 Adrian Pohl & Maria Francisca Abad-Garcia
4 Walther Umstaetter
3 Andrea Scharnhorst & Jose Manuel Barrueco & Thomas Hapke & Christian Hauschke & Klaus Graf
2 Frank Havemann & Eberhard R. Hilf & Bhojaraju Gunjal & Chris L. Awre
1 Loet Leydesdorff & Peter Bolles Hirtle & Alexei Botchkarev & Christina K. Pikas & Oliver Flimm & Sridhar Gutam

so far so good
I don’t really want to talk about AuthorClaim but about a service that we can build once we have identified authors.
When we have this data, we can find out who has been writing papers with whom.
In other words, we can study the co-authorship network.

co-authorship
When two registered authors claim to have authored the same paper, we say that they are co-authors.
The co-authorship relationship creates a link between the two authors.
The link is symmetric: if Thomas is a co-author of Imma, then Imma is a co-author of Thomas.
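The rule on this slide can be sketched in a few lines of Python. This is only an illustration: the `claims` records and the function name `coauthor_links` are invented here and are not taken from AuthorClaim or icanis.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_links(claims):
    """Map each author to the set of their co-authors.

    `claims` maps a paper identifier to the registered authors who claimed it.
    """
    links = defaultdict(set)
    for authors in claims.values():
        # Every pair of claimants of the same paper becomes one link.
        for a, b in combinations(sorted(set(authors)), 2):
            links[a].add(b)  # a is a co-author of b ...
            links[b].add(a)  # ... and b is a co-author of a (symmetry)
    return dict(links)


# Hypothetical claim records, invented for illustration only.
claims = {
    "paper-1": ["Thomas", "Imma"],
    "paper-2": ["Imma", "Antonella", "Thomas"],
    "paper-3": ["Ulrich"],  # claimed by a single author: creates no link
}
links = coauthor_links(claims)
```

Storing each link in both directions is what makes the relationship symmetric; a paper claimed by only one registered author contributes nothing to the network.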

58 papers have been co-claimed …
12 fernanda peset
10 Tomas Baiget
8 Imma Subirats
6 Antonella De Robbio
4 nuria Lloret Lloret Romero

by 16 co-authors
2 Andrea Marchitelli & Ulrich Herb & Ross MacIntyre & Bożena Bednarek-Michalska & Thomas Krichel & Dirk Lewandowski & Lidia Derfert-Wolf
1 Derek Law & Emma McCulloch & Sridhar Gutam & Philipp Mayr

network and components
When we start with one co-author and move to her co-authors, and then to their co-authors, which other authors can be reached?
The set of authors that can be reached from any one of them by following co-authorship relationships is called a component of the network.

components in the network
“Scottish”: Derek Law & Emma McCulloch
“Polish”: Bożena Bednarek-Michalska & Lidia Derfert-Wolf
“German”: Dirk Lewandowski & Sridhar Gutam & Philipp Mayr
“Giant”: Andrea Marchitelli & Ulrich Herb & Thomas Krichel & Antonella De Robbio & fernanda peset & Imma Subirats & Ross MacIntyre & nuria Lloret Lloret Romero & Tomas Baiget
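Finding components amounts to repeatedly picking an unvisited author and collecting everyone reachable from them. The sketch below does this in Python. The edge list is an assumption: the links inside the giant component are my reconstruction from the distance and path slides of this talk, authors are labelled by surname, and the two links inside the “German” component are guessed, since the slides name its members but not who wrote with whom.

```python
from collections import defaultdict

# Assumed co-authorship links (reconstruction, not the actual E-LIS data).
EDGES = [
    ("Herb", "Baiget"),
    ("Baiget", "Krichel"), ("Baiget", "Peset"), ("Baiget", "Subirats"),
    ("Subirats", "De Robbio"), ("Subirats", "Krichel"), ("Subirats", "Peset"),
    ("De Robbio", "Krichel"), ("De Robbio", "Marchitelli"),
    ("De Robbio", "MacIntyre"), ("Peset", "Lloret"),
    ("Law", "McCulloch"),
    ("Bednarek-Michalska", "Derfert-Wolf"),
    ("Lewandowski", "Gutam"), ("Lewandowski", "Mayr"),  # guessed links
]


def components(edges):
    """Return the components as sets of authors, largest first."""
    links = defaultdict(set)
    for a, b in edges:
        links[a].add(b)
        links[b].add(a)
    seen, result = set(), []
    for start in links:
        if start in seen:
            continue
        # Collect everyone reachable from `start`.
        component, stack = set(), [start]
        while stack:
            author = stack.pop()
            if author in component:
                continue
            component.add(author)
            stack.extend(links[author] - component)
        seen |= component
        result.append(component)
    return sorted(result, key=len, reverse=True)


found = components(EDGES)
sizes = [len(c) for c in found]
```

With these assumed links, `sizes` comes out as one component of nine authors, one of three and two of two, which is the split shown on the slide.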

the giant component
The size of the giant component is larger than the combined size of all other components.
It is very common in real networks that there is a giant component.
As the network grows, older small components join the giant component and new small components are created.
We therefore study the giant component.

centrality
Who is at the center of the E-LIS author network, i.e. the most central author in E-LIS?
The answer is that it depends on how we measure centrality.
Two measures are commonly used
– closeness centrality
– betweenness centrality
Both depend on a measure of distance.

distance
To understand centrality, we need a measure of distance.
– We say that two authors have distance one if they are co-authors.
– We say that two authors have distance two if they are not co-authors, but have a common co-author.
– etc.

distances for Imma Subirats
Tomas Baiget 1
Antonella De Robbio 1
Ulrich Herb 2
Thomas Krichel 1
nuria Lloret Lloret Romero 2
Andrea Marchitelli 2
Ross MacIntyre 2
fernanda peset 1
Imma Subirats 0

distances for Ulrich Herb
Tomas Baiget 1
Antonella De Robbio 3
Ulrich Herb 0
Thomas Krichel 2
nuria Lloret Lloret Romero 3
Andrea Marchitelli 4
Ross MacIntyre 4
fernanda peset 2
Imma Subirats 2
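Distances of this kind can be computed with a breadth-first search: co-authors are at distance one, their not-yet-seen co-authors at distance two, and so on. The Python sketch below is an illustration only. The edge list is my reconstruction of the giant component from the slides of this talk (surnames as labels), not data taken from E-LIS or icanis.

```python
from collections import defaultdict, deque

# Assumed links of the giant component (reconstruction from the slides).
EDGES = [
    ("Herb", "Baiget"),
    ("Baiget", "Krichel"), ("Baiget", "Peset"), ("Baiget", "Subirats"),
    ("Subirats", "De Robbio"), ("Subirats", "Krichel"), ("Subirats", "Peset"),
    ("De Robbio", "Krichel"), ("De Robbio", "Marchitelli"),
    ("De Robbio", "MacIntyre"), ("Peset", "Lloret"),
]


def distances_from(start, edges):
    """Breadth-first search: distance from `start` to every reachable author."""
    links = defaultdict(set)
    for a, b in edges:
        links[a].add(b)
        links[b].add(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        author = queue.popleft()
        for other in links[author]:
            if other not in dist:  # first visit is along a shortest route
                dist[other] = dist[author] + 1
                queue.append(other)
    return dist


imma = distances_from("Subirats", EDGES)
ulrich = distances_from("Herb", EDGES)
```

Under the assumed links, `imma` and `ulrich` reproduce the two tables above, which is how the edge list was checked.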

closeness centrality
The average distance of Imma is much smaller than the average distance of Ulrich.
In fact, we can calculate the average distance of every author from all other authors.
This is what we call the closeness centrality of an author.
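As a sketch of the calculation, the Python below averages each author's distances to all other authors in the giant component. The edge list is again my reconstruction from the slides, with surnames as labels, so the numbers it yields are those of the reconstruction and not official results. Note that with this definition a lower value means a more central author; many textbooks report the reciprocal instead.

```python
from collections import defaultdict, deque

# Assumed links of the giant component (reconstruction from the slides).
EDGES = [
    ("Herb", "Baiget"),
    ("Baiget", "Krichel"), ("Baiget", "Peset"), ("Baiget", "Subirats"),
    ("Subirats", "De Robbio"), ("Subirats", "Krichel"), ("Subirats", "Peset"),
    ("De Robbio", "Krichel"), ("De Robbio", "Marchitelli"),
    ("De Robbio", "MacIntyre"), ("Peset", "Lloret"),
]


def closeness(edges):
    """Average distance from each author to all other reachable authors."""
    links = defaultdict(set)
    for a, b in edges:
        links[a].add(b)
        links[b].add(a)
    result = {}
    for start in links:
        # Breadth-first search from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            author = queue.popleft()
            for other in links[author]:
                if other not in dist:
                    dist[other] = dist[author] + 1
                    queue.append(other)
        others = [d for name, d in dist.items() if name != start]
        result[start] = sum(others) / len(others)
    return result


scores = closeness(EDGES)
ranking = sorted(scores, key=scores.get)  # most central (smallest) first
```

Under the assumed links, Imma Subirats has the smallest average distance and nuria Lloret Lloret Romero the largest.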

shortest paths
In order to find the distance between two authors, we have to evaluate all possible paths between them.
We need to find the shortest paths between them.
There are well-known algorithms to find them.
The distance is the length of the shortest path.
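One of the well-known algorithms is breadth-first search with a record of each author's predecessor, which is enough for an unweighted network like this one. The sketch below is an illustration and not the method icanis uses. The edge list is my reconstruction of the giant component from the slides of this talk, with surnames as labels.

```python
from collections import defaultdict, deque

# Assumed links of the giant component (reconstruction from the slides).
EDGES = [
    ("Herb", "Baiget"),
    ("Baiget", "Krichel"), ("Baiget", "Peset"), ("Baiget", "Subirats"),
    ("Subirats", "De Robbio"), ("Subirats", "Krichel"), ("Subirats", "Peset"),
    ("De Robbio", "Krichel"), ("De Robbio", "Marchitelli"),
    ("De Robbio", "MacIntyre"), ("Peset", "Lloret"),
]


def shortest_path(source, target, edges):
    """Return one shortest path as a list of authors, or None if unreachable."""
    links = defaultdict(set)
    for a, b in edges:
        links[a].add(b)
        links[b].add(a)
    previous = {source: None}  # who we came from on the way to each author
    queue = deque([source])
    while queue:
        author = queue.popleft()
        if author == target:
            break
        for other in sorted(links[author]):
            if other not in previous:
                previous[other] = author
                queue.append(other)
    if target not in previous:
        return None
    # Walk back from the target to the source.
    path, step = [], target
    while step is not None:
        path.append(step)
        step = previous[step]
    return path[::-1]


path = shortest_path("Baiget", "MacIntyre", EDGES)
distance = len(path) - 1  # number of links on the path
```

When several shortest paths tie, this sketch returns whichever one the search meets first, so it may pick a different intermediary than the paths listed in the talk while still giving the same distance.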

diameter
When we have found all shortest paths, we can find the length of the longest of the shortest paths between any two authors.
This is called the diameter.
In our network the diameter is four.
This is much smaller than the number of co-authors in the network (16).
We say that our network has the “small world” property.
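The diameter can be obtained by running a breadth-first search from every author and keeping the largest distance seen. The Python sketch below assumes a connected network and uses my reconstruction of the giant component from the slides (surnames as labels), not the actual E-LIS data.

```python
from collections import defaultdict, deque

# Assumed links of the giant component (reconstruction from the slides).
EDGES = [
    ("Herb", "Baiget"),
    ("Baiget", "Krichel"), ("Baiget", "Peset"), ("Baiget", "Subirats"),
    ("Subirats", "De Robbio"), ("Subirats", "Krichel"), ("Subirats", "Peset"),
    ("De Robbio", "Krichel"), ("De Robbio", "Marchitelli"),
    ("De Robbio", "MacIntyre"), ("Peset", "Lloret"),
]


def diameter(edges):
    """Length of the longest shortest path in a connected network."""
    links = defaultdict(set)
    for a, b in edges:
        links[a].add(b)
        links[b].add(a)
    longest = 0
    for start in links:
        # Breadth-first search from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            author = queue.popleft()
            for other in links[author]:
                if other not in dist:
                    dist[other] = dist[author] + 1
                    queue.append(other)
        longest = max(longest, max(dist.values()))
    return longest


result = diameter(EDGES)
```

Under the assumed links the result is four, for example between Ulrich Herb and Andrea Marchitelli.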

shortest paths from Tomas Baiget
→ Thomas Krichel
→ fernanda peset → nuria Lloret Lloret Romero
→ fernanda peset
→ Imma Subirats → Antonella De Robbio → Ross MacIntyre
→ Ulrich Herb
→ Imma Subirats → Antonella De Robbio
→ Imma Subirats → Antonella De Robbio → Andrea Marchitelli
→ Imma Subirats

shortest paths from Antonella De Robbio
→ Imma Subirats → fernanda peset → nuria Lloret Lloret Romero
→ Imma Subirats
→ Imma Subirats → Tomas Baiget → Ulrich Herb
→ Imma Subirats → Tomas Baiget
→ Imma Subirats → fernanda peset
→ Andrea Marchitelli
→ Ross MacIntyre
→ Thomas Krichel

shortest paths from Ross MacIntyre
→ Antonella De Robbio → Imma Subirats → fernanda peset → nuria Lloret Lloret Romero
→ Antonella De Robbio → Imma Subirats → fernanda peset
→ Antonella De Robbio → Imma Subirats → Tomas Baiget → Ulrich Herb
→ Antonella De Robbio → Thomas Krichel
→ Antonella De Robbio → Imma Subirats → Tomas Baiget
→ Antonella De Robbio → Imma Subirats
→ Antonella De Robbio → Andrea Marchitelli

what do the paths tell us?
We find that some authors appear more often as intermediaries than other authors.
In fact, we can count the number of times an author appears as an intermediary in the paths.
This is what we call the betweenness centrality of an author.
A large number of authors have a betweenness of zero.
They are called marginal authors.
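The counting described here can be sketched as follows. The function takes a list of paths and counts every author who is neither the first nor the last on a path. The sample input is the set of shortest paths from Tomas Baiget as I read them from the flattened slide, with surnames as labels, so both the parsing and the function name `intermediary_counts` are my assumptions. A full calculation would use the paths starting from every author.

```python
from collections import Counter


def intermediary_counts(paths):
    """Count how often each author appears strictly inside a path."""
    counts = Counter()
    for path in paths:
        counts.update(path[1:-1])  # drop the two endpoints
    return counts


# Shortest paths from Tomas Baiget, as read from the slide (assumed parsing).
PATHS_FROM_BAIGET = [
    ["Baiget", "Krichel"],
    ["Baiget", "Peset", "Lloret"],
    ["Baiget", "Peset"],
    ["Baiget", "Subirats", "De Robbio", "MacIntyre"],
    ["Baiget", "Herb"],
    ["Baiget", "Subirats", "De Robbio"],
    ["Baiget", "Subirats", "De Robbio", "Marchitelli"],
    ["Baiget", "Subirats"],
]

counts = intermediary_counts(PATHS_FROM_BAIGET)
marginal = sorted(
    {name for path in PATHS_FROM_BAIGET for name in path} - set(counts)
)
```

This counts one listed path per pair of authors. The textbook definition of betweenness instead splits the credit among all shortest paths that tie, so its values can differ from a simple count like this one.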

summary
We build a network.
We find two ways to evaluate authors
– closeness
– betweenness
Now let us look at the results.

ranking for closeness
rank name closeness
1 Imma Subirats
Antonella De Robbio
Tomas Baiget
Thomas Krichel
fernanda peset
Andrea Marchitelli
Ross MacIntyre
Ulrich Herb
nuria Lloret Lloret Romero 2.75

ranking for betweenness
rank name betweenness
1 Antonella De Robbio
Imma Subirats
Tomas Baiget
fernanda peset
Andrea Marchitelli, Ross MacIntyre, nuria Lloret Lloret Romero, Thomas Krichel, Ulrich Herb are all marginal.

web service
E-LIS and AuthorClaim data are readily available in bulk.
There is software called icanis, developed by yours truly, that can calculate and visualize the results.
It is configurable via XSLT.
Almost instantaneous updates are possible in principle, but not implemented.

coll.e-lis.org
This is a site that I have set up.
I think we need a site in the rclis domain but I am not sure what the name should be.
coll.e-lis.org is a bad name too.
So this is meant as a prototype.

features
Rankings for closeness.
Full path searching from author pages
– with support for partial name entry
– but with no highlighting of the matching parts
Unclear documentation

ranking
Ranking is the way forward with populating scholarly communication services.
RePEc has shown this time and again.
Co-authorship ranking is particularly interesting because authors have to convince their co-authors to publish papers in E-LIS and to claim them in AuthorClaim.

campaign
We need to do some work on the site.
Then we can have a campaign and award a cash prize.
I am thinking about donating $200 to the top of each category or $300 to a joint winner.
The competition would be time-limited, say about three months next summer.
During that time we would do frequent updates of the site.

Thank you for your attention! write to