Topical Scientific Community —A combined perspective of topic and topology Jin Mao Postdoc, School of Information, University of Arizona Sept 4, 2015.

Presentation transcript:

1 Topical Scientific Community —A combined perspective of topic and topology Jin Mao Postdoc, School of Information, University of Arizona Sept 4, 2015

2 Complex Network
Multidisciplinary: mathematics (graph theory), physics, social science, informetrics, …
Network science: an emergent cross-disciplinary area

3 Examples of complex networks
Internet, WWW, transport networks, protein interaction networks, social networks, ...

4 Graph Theory Leonhard Euler's paper on “Seven Bridges of Königsberg”, published in 1736. Other problems in modern science

5 Definition of Graph (Network)
G = {V, E}
V is a set of nodes, points, or vertices.
E is a set of edges, lines, ties, or connections.
An adjacency matrix represents which vertices (nodes) of a graph are adjacent to which other vertices.
Example: V := {1, 2, 3, 4, 5, 6}; E := {{1,2}, {1,5}, {2,3}, {2,5}, {3,4}, {4,5}, {4,6}}
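
A minimal sketch of how the adjacency matrix of the example graph could be built. The function name `adjacency_matrix` and the nested-list representation are illustrative assumptions, not part of the slides.

```python
def adjacency_matrix(vertices, edges):
    """Return the adjacency matrix of an undirected, unweighted graph as nested lists."""
    index = {v: i for i, v in enumerate(vertices)}  # vertex label -> row/column index
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1  # undirected graph: mirror the entry
    return matrix


# Example graph from the slide
V = [1, 2, 3, 4, 5, 6]
E = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
A = adjacency_matrix(V, E)
for row in A:
    print(row)
```

Row i of the matrix lists the neighbors of vertex i, so the row sum equals the degree of that vertex.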

6 Types of Graph
Undirected graph
Directed graph
Unweighted graph
Weighted graph: each edge has an associated weight, usually given by a weight function w: E → R.

7 Graph Structures
Path, connectivity, component

8 Structural Measures
Degree centrality: number of edges incident on a node
In-degree: number of edges entering the node
Out-degree: number of edges leaving the node
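
The degree definitions can be sketched as follows. The helper name `degrees` and the three-edge directed graph are hypothetical and only serve as an illustration.

```python
from collections import Counter


def degrees(directed_edges):
    """Count in-degree and out-degree for each node of a directed graph."""
    in_deg, out_deg = Counter(), Counter()
    for source, target in directed_edges:
        out_deg[source] += 1  # the edge leaves `source`
        in_deg[target] += 1   # the edge enters `target`
    return in_deg, out_deg


# Hypothetical directed graph: a -> b, a -> c, b -> c
in_deg, out_deg = degrees([("a", "b"), ("a", "c"), ("b", "c")])
total_degree_c = in_deg["c"] + out_deg["c"]  # degree = edges incident on the node
```

In an undirected graph the two counts coincide, and the degree is simply the number of incident edges.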

9 Structural Measures
Length of shortest path
Diameter
Node scale
Edge scale
Density
Betweenness centrality
Closeness centrality
Eigenvector centrality
Edge betweenness
Clustering coefficient
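
Three of these measures can be sketched on the six-node example graph with edges {1,2}, {1,5}, {2,3}, {2,5}, {3,4}, {4,5}, {4,6}. The function names are illustrative assumptions. The graph is assumed to be undirected, unweighted, and connected, and density is taken as the fraction of possible edges that are present.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(adj):
    """Longest shortest path over all node pairs (assumes a connected graph)."""
    return max(max(shortest_path_lengths(adj, s).values()) for s in adj)


def density(num_nodes, num_edges):
    """Share of the possible undirected edges that actually exist."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
adj = {v: set() for v in range(1, 7)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(shortest_path_lengths(adj, 1))
print(diameter(adj), density(len(adj), len(edges)))
```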

10 Research in network science
To generalize statistical properties of complex networks:
Small-world network: small diameter, large clustering coefficient
Scale-free network: power-law degree distribution
…
Research paradigm: model or reflect complex circumstances with networks
Open question: what are the physical meanings of these generalized statistical properties in a specific domain?

11 Community Structure in Complex Network
Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821-7826.
In social network analysis: clique, clan, k-shell, … are used to identify interesting social groups.
Definition: a subgraph is a community if its internal degree is larger than its external degree.
Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., & Parisi, D. (2004). Defining and identifying communities in networks. PNAS, 101(9), 2658-2663.
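
A minimal sketch of the degree-based definition. It assumes the "weak" reading, in which the internal degree summed over all members of the subgraph must exceed the summed external degree. The function names and the two-triangle graph are hypothetical.

```python
def internal_external_degree(adj, members):
    """Return (internal, external) degree sums for the subgraph induced by `members`."""
    members = set(members)
    internal = external = 0
    for node in members:
        for neighbor in adj[node]:
            if neighbor in members:
                internal += 1  # each internal edge is counted once per endpoint
            else:
                external += 1
    return internal, external


def is_community(adj, members):
    """Weak-sense check: total internal degree exceeds total external degree."""
    internal, external = internal_external_degree(adj, members)
    return internal > external


# Hypothetical graph: two triangles joined by the single edge c-d
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(is_community(adj, {"a", "b", "c"}))  # one triangle
print(is_community(adj, {"c", "d"}))       # the bridge endpoints
```

A stricter variant would require the inequality to hold for every member node individually rather than for the sums.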

12 Community Structure in Scientific Communications
The scientific community is a diverse network of interacting scientists. It includes many "sub-communities" working on particular scientific fields. (Wikipedia)
Sociology of science; philosophy of science
Kuhn, The Structure of Scientific Revolutions
The concept is abstract and vague when it comes to identifying members.
What exactly is it? A research group? A research team? A research institution?

13 Community Structure in Scientific Communications
Implication of the complex network research paradigm: model the scientific system with networks, and detect scientific communities from the networks of scientists.
Scholarly networks: coauthor network, author citation network, author coupling network, …
Semantic methods: author clustering
→ Combine them both: topology-based community detection and topic-based community detection (Ding, 2010)

14 Community Structure in Scientific Communication
In scientometrics, researchers hold that community structure can reflect the structure of science.
It is an approach to understanding researchers, topics, publications, … and their relations.

15 Community Structure in Scientific Communication
Methods for discovering scientific communities now exist, and research on detecting them continues by exploring various features of the scientific circumstance.
We need to go further:
Where does a scientific community come from?
How does a scientific community emerge?
What are the properties of a scientific community, both statistical and on the ground?
What is the role of a scientific community?
In this paper, we have observed a new form of scientific community with topic constraints, i.e., the topical scientific community, and attempted to investigate its properties as rooted in the scientific circumstance.

16 Topical Scientific Community
Definition: in the research progress on a specific topic, a group of researchers forms a topical scientific community to address the research questions of that topic by collaborating with each other intensively.
Two significant features:
Interact in the same semantic space
Form collaborations

17 Topical Scientific Community Figure 1. The conception of topical scientific community

18 Topical Scientific Community Detection Approach
Dataset: Web of Science (WoS), metrics field (~2014), 6959 papers
Fields: title, abstract, author, address, corresponding author, and year
Author name disambiguation:
1. Standardize: surname plus the initials of all the words in the given names, e.g., “Strotmann, Andreas” is transformed into “Strotmann, A”.
2. Extract the affiliations of the author names and keep the organizations: “School of Library and Information Science, Indiana University” is extracted as “Indiana University”.
3. Disambiguate:
a) Generally, one name with the same organization is treated as a distinct author name. However, one author can have many organizations in practice.
b) In a particular paper, the same standardized name with multiple organizations is assumed to be the same author.
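
The three disambiguation steps could be sketched as below. All function names are hypothetical. In particular, `extract_organization` assumes the organization is the last comma-separated part of the affiliation string, which matches the slide's example but may not match real WoS address formats. The second affiliation in the usage example is invented for illustration.

```python
def standardize_name(full_name):
    """Step 1: 'Surname, Given Names' -> 'Surname, INITIALS'."""
    surname, _, given = full_name.partition(",")
    initials = "".join(word[0].upper() for word in given.split())
    return f"{surname.strip()}, {initials}"


def extract_organization(affiliation):
    """Step 2 (assumption): keep the last comma-separated part as the organization."""
    return affiliation.split(",")[-1].strip()


def author_key(name, affiliation):
    """Step 3a: a standardized name plus an organization identifies one author."""
    return standardize_name(name), extract_organization(affiliation)


def authors_in_paper(entries):
    """Step 3b: within one paper, merge the organizations of the same standardized name."""
    merged = {}
    for name, affiliation in entries:
        merged.setdefault(standardize_name(name), set()).add(
            extract_organization(affiliation)
        )
    return merged


paper = [
    ("Strotmann, Andreas", "School of Library and Information Science, Indiana University"),
    ("Strotmann, A", "Hypothetical Institute, Example University"),  # invented entry
]
print(author_key(*paper[0]))
print(authors_in_paper(paper))
```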

19 Topical Scientific Community Detection Approach
Discover topics: in machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents.
LDA graphical model
We get:
Topic z: term distribution, P(w|z)
Document d: topic distribution, P(z|d)
Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3:993-1022.
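
To show how the two distributions fit together, here is a small sketch that mixes them into the probability of a word in a document, P(w|d) = Σ_z P(w|z) P(z|d). The toy topics and numbers are hypothetical and are not taken from the dataset.

```python
def word_probability(word, doc_topics, topic_terms):
    """P(w|d): sum over topics z of P(w|z) * P(z|d)."""
    return sum(
        p_topic * topic_terms[topic].get(word, 0.0)
        for topic, p_topic in doc_topics.items()
    )


# Hypothetical toy distributions
topic_terms = {  # P(w|z)
    "patent": {"patent": 0.6, "innov": 0.4},
    "geography": {"countri": 0.7, "nation": 0.3},
}
doc_topics = {"patent": 0.75, "geography": 0.25}  # P(z|d)

p_patent = word_probability("patent", doc_topics, topic_terms)
p_countri = word_probability("countri", doc_topics, topic_terms)
```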

20 Topical Scientific Community Detection Approach
Author Topic model (AT): authors have their word preferences pertaining to their research topics, and to write a paper is to generate words from the topics of its authors.
AT graphical model
We get:
Topic z: term distribution, P(w|z)
Author x: topic distribution, P(z|x)
We can infer:
Document d: topic distribution, P(z|d)
Rosen-Zvi M, Griffiths T, Steyvers M, et al. The author-topic model for authors and documents[C]//Proceedings of the 20th conference on Uncertainty in artificial intelligence. AUAI Press, 2004: 487-494.
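
One plausible way to infer P(z|d) from the author distributions is sketched here. It assumes every author of a paper contributes equally, so the document distribution is the plain average of its authors' P(z|x). The slides do not state which inference rule was actually used, and the author labels and numbers are hypothetical.

```python
def document_topic_distribution(author_ids, author_topics):
    """Average the topic distributions P(z|x) of a paper's authors (uniform weights)."""
    num_topics = len(author_topics[author_ids[0]])
    totals = [0.0] * num_topics
    for author in author_ids:
        for topic, prob in enumerate(author_topics[author]):
            totals[topic] += prob
    return [total / len(author_ids) for total in totals]


# Hypothetical P(z|x) over two topics
author_topics = {"A": [0.8, 0.2], "B": [0.4, 0.6]}
p_z_given_d = document_topic_distribution(["A", "B"], author_topics)
```

Because each input distribution sums to 1, the averaged distribution also sums to 1.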

21 Topical Scientific Community Detection Approach
1) Construct a topical scientific collaboration network for each topic.
The authors pertaining to the topic become the nodes of the network.
The collaborations between the authors form the edges of the network.
2) Detect components as topical scientific communities.
Any two components are isolated from each other, showing no collaboration between their authors.
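
The two steps can be sketched as follows. How authors are assigned to a topic is not specified in the slides, so `topic_authors` is taken as given. Edge weights are assumed to count coauthored papers, and the function names and toy papers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_topic_network(papers, topic_authors):
    """Step 1: nodes are the topic's authors; edges are their coauthorships."""
    nodes = set(topic_authors)
    weights = defaultdict(int)
    for authors in papers:
        present = sorted(set(authors) & nodes)  # ignore authors outside the topic
        for u, v in combinations(present, 2):
            weights[(u, v)] += 1  # assumed weight: number of coauthored papers
    return nodes, dict(weights)


def connected_components(nodes, edges):
    """Step 2: group nodes that are linked by any chain of collaborations."""
    adj = {node: set() for node in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node] - seen:
                seen.add(neighbor)
                stack.append(neighbor)
        components.append(component)
    return components


# Hypothetical papers (author lists) and topic membership
papers = [["A", "B"], ["B", "C"], ["D", "E"], ["A", "B"], ["F", "Z"]]
nodes, weights = build_topic_network(papers, {"A", "B", "C", "D", "E", "F"})
components = connected_components(nodes, weights)
# Assumption: a community needs at least two collaborating authors
communities = [c for c in components if len(c) >= 2]
```

Author F has no collaborator inside the topic and therefore forms a single-node component, which this sketch does not count as a community.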

22 Results: Topics
AT, LDA
The optimal range for K seems to be about 40 to 80. 50 topics are reported.

23 Results: Topics
Terms:
Topic 2: Geograph | Topic 10: Patent | Topic 14: Trends
24 countri 0.1951 | 35 technolog 0.1866 | 51 new 0.1088
71 output 0.0597 | 36 patent 0.1833 | 780 chang 0.0532
83 nation 0.0491 | 2392 innov 0.0611 | 33 develop 0.0512
1665 european 0.0404 | 2350 industri 0.0523 | 1553 emerg 0.0431
5149 world 0.0404 | 4135 sector 0.0238 | 65 gener 0.0261
4955 usa 0.0274 | 935 compani 0.0212 | 2168 histori 0.0216
Authors:
4018 Musavi,SM 0.3800 | 189 Zhang,Y 0.4769 | 5808 Block,JA 0.3297
1694 Miguel,S 0.3333 | 6341 Heimeriks,G 0.3263 | 7187 Babu,AR 0.2609
7017 Lindqvist,OV 0.3279 | 407 Erfanmanesh,M 0.3261 | 3707 Bhavnani,SK 0.2508
2773 Anwar,MA 0.3103 | 1633 Tang,J 0.3214 | 7660 Marcondes,CH 0.2143

24 Results: Topical Scientific Collaboration Networks
Metric | # of nodes | # of edges | Edge weights | Density
Min | 787 | 119 | 1 | 1.74E-04
Max | 1250 | 385 | 36 | 8.89E-04
Avg | 962.68 | 220.7 | 1.23 | 4.73E-04
Std. dev | 87.57 | 73.24 | 1.03 | 1.57E-04

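25 Results: Topical Scientific Communities
Topic | # of com. | Topic | # of com. | Topic | # of com. | Topic | # of com. | Topic | # of com.
0 | 147 | 10 | 157 | 20 | 101 | 30 | 71 | 40 | 90
1 |  | 11 | 127 | 21 | 85 | 31 | 108 | 41 | 75
2 | 155 | 12 | 126 | 22 | 148 | 32 | 103 | 42 | 98
3 | 108 | 13 | 83 | 23 | 124 | 33 | 86 | 43 | 94
4 | 89 | 14 | 66 | 24 | 125 | 34 | 109 | 44 | 74
5 | 95 | 15 | 122 | 25 | 110 | 35 | 117 | 45 | 149
6 | 96 | 16 | 101 | 26 | 100 | 36 | 80 | 46 | 69
7 | 115 | 17 | 104 | 27 | 74 | 37 | 80 | 47 | 74
8 | 84 | 18 | 87 | 28 | 157 | 38 | 115 | 48 | 87
9 | 72 | 19 | 74 | 29 | 111 | 39 | 123 | 49 | 75
5110 communities in 50 topics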

26 Results: Topical Scientific Communities
Figure 5. The network of topical scientific community “C12_41”
7 member authors have coauthored 2 papers in this topic, forming 19 internal collaborations.

27 The Characteristics of Topical Scientific Community vs. Global Scientific Community
Network metrics:
Metric | Topical Collaboration Networks (Avg.) | Metrics | Slovenian Scientists | Synthetic Chemistry
No. of nodes | 264 | 8,106 | 12,609 | 6,645
No. of communities | 102 | 1,674 | 689 | 532
Avg. no. of nodes in communities | 2.58 | 4.84 | 18.30 | 12.49
A global scientific community is detected from the collaboration network of a discipline or disciplines.
Modularity-based approaches: maximize inner links and minimize outer links for the communities.
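
For context, the quantity that modularity-based approaches try to maximize can be sketched as below, using the standard Newman-Girvan modularity Q for an undirected, unweighted graph. The slides do not say which modularity variant or optimization algorithm the compared studies used. The function name and the two-triangle graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Q = sum over communities c of l_c / m - (d_c / (2 * m)) ** 2.

    m is the number of edges, l_c the number of inner links of community c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    inner_links = defaultdict(int)
    total_degree = defaultdict(int)
    for u, v in edges:
        total_degree[community_of[u]] += 1
        total_degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            inner_links[community_of[u]] += 1
    return sum(
        inner_links[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Hypothetical graph: two triangles joined by one outer link (c-d)
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
q_split = modularity(edges, split)    # one community per triangle
q_merged = modularity(edges, merged)  # everything in one community
```

The partition into two triangles scores higher than the single merged community, which is the behavior a modularity-maximizing method relies on.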

28 The Characteristics of Topical Scientific Community VS Global Scientific Community

29 The Characteristics of Topical Scientific Community
Statistical properties:
Metric | Nodes | Edges | Edge weights | Density
Min | 2 | 1 | 1 | 0.11
Max | 27 | 48 | 36 | 1.00
Avg | 2.58 | 2.12 | 1.23 | 0.95
Std. dev | 1.41 | 2.71 | 1.03 | 0.13
Topical scientific community is a kind of meso-level structure emerging from the collaboration of researchers.
Members interact intensively.

30 The Characteristics of Topical Scientific Community
The contributions of topical scientific community:
Fewer authors, significant portion of papers
Improved author productivity

31 Discussion
Topical scientific community emerges from author collaborations in research activities.
Topical coherence drives the collaborations between researchers to some extent.
Topical scientific community reflects some kind of research organization with high productivity.
Implications
Limitations:
One dataset
Topics are latent and unsupervised; the approach still needs to be compared with other methods.
Future study:
Generalize and go further on its characteristics
Dynamics: growth law, coevolution with topics, …

32 Beyond the paper
Graph representation is used in text mining.
Some tasks can be addressed by using network measures: ranking entities/texts, keyword/key phrase extraction, feature selection, …
Implications

33 Thank you! Q&A

