A collaboration graph for E-LIS Thomas Krichel Long Island University & Novosibirsk State University & Open Library Society 3 November 2011.


1 A collaboration graph for E-LIS Thomas Krichel Long Island University & Novosibirsk State University & Open Library Society 3 November 2011

2 Introduction Thanks – To Ángel Sánchez Villegas for use of the e-lis domain. – To Tomas Baiget, who has encouraged me to present here. Warnings – Data shown here were correct as of 1 November 2011. – I am glossing over some technical details. – Over 30 slides

3 overview Introduction to AuthorClaim Introduction to a co-authorship network based on restricting AuthorClaim to E-LIS documents Web interface and campaign

4 a known problem In publishing systems such as E-LIS, the authors are usually entered by name. It is well known that the name of an author does not identify an author – multiple ways to express the name of the same person – multiple people sharing one expression of their names

5 a tried solution One way to partially solve this problem is to have a system where authors can – claim papers that they have written – disclaim papers written by their homonyms The first system of this kind was the RePEc Author Service – created by Thomas Krichel in 1999 – has now registered over 30,000 economists

6 AuthorClaim AuthorClaim is an interdisciplinary version of the RePEc Author Service. It was created by Thomas Krichel in 2008. It lives at http://authorclaim.org. Over 100,000,000 authorships of over 35,000,000 documents can be claimed. Among the documents are the E-LIS papers.

7 445 E-LIS papers claimed …
72  Tomas Baiget
61  Ulrich Herb
43  Antonella De Robbio
39  Thomas Krichel
26  Andrea Marchitelli & fernanda peset
20  Ross MacIntyre
16  Dirk Lewandowski
15  Bożena Bednarek-Michalska
14  Lidia Derfert-Wolf
11  Zeno Tajoli & Imma Subirats

8 by 36 authors
9  Derek Law & Emma McCulloch & Philipp Mayr
8  Jeffrey Beall
7  nuria Lloret Lloret Romero
6  Benjamin John Keele
5  Adrian Pohl & Maria Francisca Abad-Garcia
4  Walther Umstaetter
3  Andrea Scharnhorst & Jose Manuel Barrueco & Thomas Hapke & Christian Hauschke & Klaus Graf
2  Frank Havemann & Eberhard R. Hilf & Bhojaraju Gunjal & Chris L. Awre
1  Loet Leydesdorff & Peter Bolles Hirtle & Alexei Botchkarev & Christina K. Pikas & Oliver Flimm & Sridhar Gutam

9 so far so good I don’t really want to talk about AuthorClaim but about the services that we can build when we have identified authors. When we have this data, we can find out who has been writing papers with whom. In other words, we can study the co-authorship network.

10 co-authorship When two registered authors claim to have authored the same paper, we say that they are co-authors. The co-authorship relationship creates a link between the two authors. The link is symmetric, meaning that the fact that Thomas is a co-author of Imma means that Imma is a co-author of Thomas.
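As a rough illustration of this definition, the sketch below turns claim records into symmetric co-author links. The `claims` data, the author names, and the function name `coauthor_links` are hypothetical; this is not the AuthorClaim data format.

```python
from itertools import combinations

# Hypothetical claim records: paper id -> registered authors who claimed it.
claims = {
    "paper-1": ["Ann", "Bob"],
    "paper-2": ["Bob", "Cy", "Ann"],
    "paper-3": ["Dee"],  # claimed by a single author: creates no link
}

def coauthor_links(claims):
    """Return the set of undirected co-author links implied by the claims."""
    links = set()
    for authors in claims.values():
        # Every pair of claimants of the same paper is linked.
        for a, b in combinations(sorted(set(authors)), 2):
            # A frozenset has no order, so the link is symmetric by construction.
            links.add(frozenset((a, b)))
    return links

links = coauthor_links(claims)
```

Storing each link as an unordered pair means that "Ann is a co-author of Bob" and "Bob is a co-author of Ann" are the same record.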

11 58 papers have been co-claimed … 12 fernanda peset 10 Tomas Baiget 8 Imma Subirats 6 Antonella De Robbio 4 nuria Lloret Lloret Romero

12 by 16 co-authors 2 Andrea Marchitelli & Ulrich Herb & Ross MacIntyre & Bożena Bednarek-Michalska & Thomas Krichel & Dirk Lewandowski & Lidia Derfert-Wolf 1 Derek Law & Emma McCulloch & Sridhar Gutam & Philipp Mayr

13 network and components When we start with one co-author, and we move to her co-authors, what other authors can be reached? We call the authors we can reach by starting from any one of them and following co-authorship relationships a component of the network.
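A minimal sketch of how components could be found, assuming the network is held as a mapping from each author to the set of her co-authors. The graph and the names `graph` and `components` are made up for illustration.

```python
from collections import deque

# Hypothetical co-authorship network: author -> set of co-authors.
graph = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B"},
    "D": {"E"}, "E": {"D"},
}

def components(graph):
    """Group authors into sets that are mutually reachable via co-authorship."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), {start}
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    members.add(neighbour)
                    queue.append(neighbour)
        result.append(members)
    return result

# The giant component is simply the largest group.
giant = max(components(graph), key=len)
```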

14 components in the network “Scottish”: Derek Law & Emma McCulloch “Polish”: Bożena Bednarek-Michalska & Lidia Derfert-Wolf “German”: Dirk Lewandowski & Sridhar Gutam & Philipp Mayr “Giant”: Andrea Marchitelli & Ulrich Herb & Thomas Krichel & Antonella De Robbio & fernanda peset & Imma Subirats & Ross MacIntyre & nuria Lloret Lloret Romero & Tomas Baiget

15 the giant component The size of the giant component is larger than the combined size of all other components. It is very common, in real existing networks, that there is a giant component. As the network grows, older small components join the giant component and new small components are created. We therefore study the giant component.

16 centrality Who is at the center of the E-LIS author network, i.e. the most central author in E-LIS? The answer is that it depends on how we measure centrality. Two measures are commonly used – closeness centrality – betweenness centrality Both depend on a measure of distance.

17 distance To understand that we need a measure of distance. – We say that two authors have distance one if they are co-authors. – We say that two authors have distance two if they are not co-authors, but have a common co-author. – etc

18 distances for Imma Subirats
Tomas Baiget                1
Antonella De Robbio         1
Ulrich Herb                 2
Thomas Krichel              1
nuria Lloret Lloret Romero  2
Andrea Marchitelli          2
Ross MacIntyre              2
fernanda peset              1
Imma Subirats               0

19 distances for Ulrich Herb
Tomas Baiget                1
Antonella De Robbio         3
Ulrich Herb                 0
Thomas Krichel              2
nuria Lloret Lloret Romero  3
Andrea Marchitelli          4
Ross MacIntyre              4
fernanda peset              2
Imma Subirats               2

20 closeness centrality The average distance of Imma is much smaller than the average distance of Ulrich. In fact, we can calculate the average distance of every author from all other authors. This is what we call the closeness centrality of an author.
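To make the averaging concrete, the sketch below takes the distances listed on the two preceding slides (leaving out each author's distance of 0 to herself) and computes the mean. The function name `closeness` is only illustrative.

```python
# Distances from Imma Subirats to the other authors (slide 18).
imma = {
    "Tomas Baiget": 1, "Antonella De Robbio": 1, "Ulrich Herb": 2,
    "Thomas Krichel": 1, "nuria Lloret Lloret Romero": 2,
    "Andrea Marchitelli": 2, "Ross MacIntyre": 2, "fernanda peset": 1,
}

# Distances from Ulrich Herb to the other authors (slide 19).
ulrich = {
    "Tomas Baiget": 1, "Antonella De Robbio": 3, "Thomas Krichel": 2,
    "nuria Lloret Lloret Romero": 3, "Andrea Marchitelli": 4,
    "Ross MacIntyre": 4, "fernanda peset": 2, "Imma Subirats": 2,
}

def closeness(distances):
    """Average distance from one author to all other authors."""
    return sum(distances.values()) / len(distances)

print(closeness(imma))    # 12 / 8 = 1.5
print(closeness(ulrich))  # 21 / 8 = 2.625
```

With this definition a smaller value means a more central author, and the two results agree with the values given for these authors in the closeness ranking.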

21 shortest paths In order to find the distance between two authors, we have to evaluate all possible paths between them. We need to find the shortest paths between them. There are well-known algorithms to find them. The distance is the length of the shortest path.
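One such well-known algorithm, for a network where every link has the same length, is breadth-first search. The sketch below is a generic version on a made-up graph; it is not taken from icanis, and the name `distances_from` is hypothetical.

```python
from collections import deque

# Hypothetical network: author -> set of co-authors.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

def distances_from(graph, source):
    """Breadth-first search: shortest-path length from source to every reachable author."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                # The first visit is always along a shortest path.
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

dist = distances_from(graph, "A")
```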

22 diameter When we have found all shortest paths, we can find the length of the longest shortest path between any two authors. This is called the diameter. In our network the diameter is four. This is much smaller than the number of authors in the giant component (9). We say that our network has the “small world” property.
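In symbols, writing d(u, v) for the distance between authors u and v and V for the set of authors in the giant component (notation introduced here, not used in the slides), the definition reads:

```latex
\operatorname{diam}(G) \;=\; \max_{u,\, v \in V} d(u, v)
```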

23 shortest paths from Tomas Baiget → Thomas Krichel → fernanda peset → nuria Lloret Lloret Romero → fernanda peset → Imma Subirats → Antonella De Robbio → Ross MacIntyre → Ulrich Herb → Imma Subirats → Antonella De Robbio → Imma Subirats → Antonella De Robbio → Andrea Marchitelli → Imma Subirats

24 shortest paths from Antonella De Robbio → Imma Subirats → fernanda peset → nuria Lloret Lloret Romero → Imma Subirats → Imma Subirats → Tomas Baiget → Ulrich Herb → Imma Subirats → Tomas Baiget → Imma Subirats → fernanda peset → Andrea Marchitelli → Ross MacIntyre → Thomas Krichel

25 shortest paths from Ross MacIntyre → Antonella De Robbio → Imma Subirats → fernanda peset → nuria Lloret Lloret Romero → Antonella De Robbio → Imma Subirats → fernanda peset → Antonella De Robbio → Imma Subirats → Tomas Baiget → Ulrich Herb → Antonella De Robbio → Thomas Krichel → Antonella De Robbio → Imma Subirats → Tomas Baiget → Antonella De Robbio → Imma Subirats → Antonella De Robbio → Andrea Marchitelli

26 what do the paths tell us? We find that some authors appear more often as intermediaries than other authors. In fact, we can evaluate the number of times an author appears as an intermediary in the paths. This is what we call the betweenness centrality of an author. A large number of authors have a betweenness of zero. They are called marginal authors.
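The sketch below counts intermediary appearances using the common textbook definition of betweenness, where a pair of authors with several shortest paths splits its credit equally among them. This is an assumption: the slides do not say how icanis weights or scales its scores, so the numbers here are not comparable with the ranking shown later. The graph and function names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical network: a square A-B-D-C-A with E attached to D.
graph = {
    "A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"},
    "D": {"B", "C", "E"}, "E": {"D"},
}

def bfs_counts(graph, source):
    """Distance and number of shortest paths from source to each reachable author."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                sigma[neighbour] = 0
                queue.append(neighbour)
            if dist[neighbour] == dist[node] + 1:
                # Every shortest path to node extends to a shortest path to neighbour.
                sigma[neighbour] += sigma[node]
    return dist, sigma

def betweenness(graph):
    """Sum, over unordered pairs, of the share of shortest paths passing through each author."""
    info = {v: bfs_counts(graph, v) for v in graph}
    score = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are in different components
        for v in graph:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            # v lies on a shortest s-t path exactly when the distances add up.
            if dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score

scores = betweenness(graph)
marginal = sorted(v for v, b in scores.items() if b == 0)
```

In this toy network D scores highest because every path to E passes through it, while E is never an intermediary and is therefore marginal in the sense used above.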

27 summary We build a network. We find two ways to evaluate authors – closeness – betweenness Now let us look at the results.

28 ranking for closeness
rank  name                        closeness
1     Imma Subirats               1.5
2     Antonella De Robbio         1.75
2     Tomas Baiget                1.75
2     Thomas Krichel              1.75
5     fernanda peset              1.875
6     Andrea Marchitelli          2.5
6     Ross MacIntyre              2.5
8     Ulrich Herb                 2.625
9     nuria Lloret Lloret Romero  2.75

29 ranking for betweenness
rank  name                 betweenness
1     Antonella De Robbio  2.7
1     Imma Subirats        2.7
3     Tomas Baiget         2.025
4     fernanda peset       1.575
Andrea Marchitelli, Ross MacIntyre, nuria Lloret Lloret Romero, Thomas Krichel, Ulrich Herb are all marginal.

30 web service E-LIS and AuthorClaim data are readily available in bulk. There is software called icanis, developed by yours truly, that can calculate and visualize results. It is configurable via XSLT. Almost instantaneous updates are in principle possible, but not implemented.

31 coll.e-lis.org This is a site that I have set up. I think we need a site in the rclis domain but I am not sure what the name should be. coll.e-lis.org is a bad name too. So this is meant as a prototype.

32 features Rankings for closeness. Full path searching from author pages – with support for partial name entry – but there is no highlighting of the matched parts Unclear documentation

33 ranking Ranking is the way forward with populating scholarly communication services. RePEc has shown this time and again. Co-authorship ranking is particularly interesting because authors have to convince their co-authors to publish papers in E-LIS and to claim them in AuthorClaim.

34 campaign We need to do some work on the site. Then we can have a campaign and award a cash prize. I am thinking about donating $200 to the top of each category or $300 to a joint winner. The competition would be time-limited, say about three months next summer. During that time we would do frequent updates of the site.

35 Thank you for your attention! http://openlib.org/home/krichel write to krichel@openlib.org

