Jevin West, Information School, University of Washington Ian Wesley-Smith, Information School, University of Washington Carl T. Bergstrom, Department of.


Jevin West, Information School, University of Washington Ian Wesley-Smith, Information School, University of Washington Carl T. Bergstrom, Department of Biology, University of Washington Article-Level EigenFactor (ALEF)

Article-level Eigenfactor WSDM Cup Challenge

Journal Ranking
Leading eigenvector of the random walk matrix P
Normalization
West, J.D. et al. (2010) College & Research Libraries
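As a rough illustration of what "leading eigenvector of the random walk matrix P" means, the sketch below runs plain power iteration on a tiny row-stochastic matrix. The function name, the toy matrix, and the tolerance are assumptions made for this example; the slide does not show the authors' implementation or their normalization step.

```python
def leading_eigenvector(P, tol=1e-12, max_iter=1000):
    """Power iteration for the distribution pi satisfying pi = pi P.

    P is a row-stochastic matrix given as a list of rows (hypothetical input format).
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the random walk: nxt[j] = sum_i pi[i] * P[i][j]
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]  # renormalize so the scores sum to 1
        if sum(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Toy 3-journal example (made-up transition probabilities).
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
scores = leading_eigenvector(P)
```

For this toy matrix the scores settle near 4/9, 3/9 and 2/9, which can be checked by hand against pi = pi P.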

Hierarchical Mapping without ALEF

Hierarchical Mapping with ALEF

Flow Distribution (figure: PageRank and ALEF; axis: Time)

Smart Teleportation
Lambiotte & Rosvall (2012) Phys. Rev. E

Mechanics
1. Calculate step weight
2. Make row stochastic
3. Take one step on the network adjacency matrix
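The three steps listed on this slide can be sketched as follows. The slide does not define the step weight, so this example assumes it is each paper's total degree (citations received plus citations made); that choice, the function name, and the toy citation list are all assumptions for illustration rather than the authors' code.

```python
def alef_like_scores(n, citations):
    """Sketch of the three listed steps on a citation list.

    citations: list of (citing, cited) pairs with paper ids 0..n-1 (hypothetical format).
    """
    out_deg = [0] * n
    in_deg = [0] * n
    for src, dst in citations:
        out_deg[src] += 1
        in_deg[dst] += 1

    # Step 1 (assumed definition): step weight = in-degree + out-degree.
    weight = [in_deg[i] + out_deg[i] for i in range(n)]

    # Steps 2 and 3: dividing by out_deg[src] is the row normalization of the
    # adjacency matrix; accumulating into scores[dst] is the single step.
    scores = [0.0] * n
    for src, dst in citations:
        scores[dst] += weight[src] / out_deg[src]

    total = sum(scores)
    return [s / total for s in scores] if total else scores


# Toy network: paper 3 cites 2, paper 2 cites 1 and 0, paper 1 cites 0.
toy_scores = alef_like_scores(4, [(1, 0), (2, 0), (2, 1), (3, 2)])
```

Because only one step is taken, the cost is a single pass over the edge list, which is in keeping with the "simple mechanics" and "fast calculation" claims made later in the deck.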

Flow Distribution (JSTOR) (figure: Total Flow/Paper by Year). Black = ALEF, Green = citations, Red = PageRank, Blue = unrecorded teleport

Tree Depth and Cluster Size (figure panels: Tree Depth, Cluster Size; x-axis: Year). Black = ALEF, Green = OUTDIR, Red = DIR-R, Blue = DIR-UR

ALEF Strengths
–Performs well
–Simple mechanics
–Fast calculation
–High-resolution partitions

West, Wesley-Smith, Bergstrom (2016) A recommendation system based on hierarchical clustering of an article-level citation network. IEEE Transactions on Big Data

Papers
–J.D. West, M. Rosvall, C.T. Bergstrom (2016) Ranking and mapping article-level citation networks, in prep
–J.D. West, I. Wesley-Smith, C.T. Bergstrom (2016) A recommendation system based on hierarchical clustering of an article-level citation network. IEEE Transactions on Big Data
–I. Wesley-Smith, C.T. Bergstrom, J.D. West (2016) Static Ranking of Scholarly Papers using Article-Level Eigenfactor (ALEF). WSDM Conference: Entity Ranking Challenge Workshop
–I. Wesley-Smith, J.D. West (2016) Babel: A platform for research in scholarly article recommendation. WWW Conference, Workshop on Big Scholarly Data

babel.eigenfactor.org
Ian Wesley-Smith

Article-level Eigenfactor WSDM Cup Challenge

Data Pipeline: Raw Data -> Citation Score -> Author Scores -> Blend Features -> Randomize Zeroes -> Final Scores

Citation Scores

Citation Scores (figure: Average Paper Score by Year (ALEF); axes: Average Score, Year)

Citation Variants

Author Scores

Author Scores
Author Score = average citation score of all of an author's papers
How should paper credit be assigned?
–Equal or fractional?
Why not sum?
–Unique scores: 72.15% vs 28.27%
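The slide leaves the credit question open, so the sketch below shows both options as a weighted mean: with equal credit every paper counts fully for each of its authors, while with fractional credit a paper with k authors counts 1/k for each. The function name, the dictionary layout, and the sample data are hypothetical.

```python
from collections import defaultdict


def author_scores(paper_scores, paper_authors, fractional=False):
    """Average citation score per author.

    paper_scores: {paper_id: citation score}
    paper_authors: {paper_id: [author, ...]}
    fractional: if True, a paper with k authors contributes weight 1/k to each.
    """
    totals = defaultdict(float)
    weights = defaultdict(float)
    for paper, authors in paper_authors.items():
        score = paper_scores.get(paper)
        if score is None or not authors:
            continue  # skip papers with no score or no listed authors
        credit = 1.0 / len(authors) if fractional else 1.0
        for author in authors:
            totals[author] += credit * score
            weights[author] += credit
    # Weighted mean rather than a sum, as the slide specifies an average.
    return {a: totals[a] / weights[a] for a in totals}


paper_scores = {"p1": 0.6, "p2": 0.2}
paper_authors = {"p1": ["alice", "bob"], "p2": ["alice"]}
equal = author_scores(paper_scores, paper_authors)
frac = author_scores(paper_scores, paper_authors, fractional=True)
```

In this toy data the two schemes differ only for the author with both a solo and a co-authored paper, which is where the choice of credit assignment matters.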

Other Features? Affiliation Score?

Other Features
Matching datasets is hard
Author Affiliation: University of Washington
–george washington university
–university of washington bioengineering
–university of washington information school
–university of washington school of law
–university of washington tacoma
–university of washington bothell
Coverage is low: 25% of paper-author pairs have an affiliation

Blend Features

Blend Features
Weighted Average
–Weights found via manual parameter sweep
–Citation Score: 70%
–Author Score: 30%
Axiom: Derived scores shouldn’t outweigh the source
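A minimal sketch of the 70/30 weighted average stated on this slide. It assumes both features are already on a comparable scale and that every paper has both values; the function name and sample numbers are made up for the example.

```python
def blend(citation_score, author_score, w_citation=0.7, w_author=0.3):
    """Weighted average of the two features, using the weights from the slide."""
    return w_citation * citation_score + w_author * author_score


# Hypothetical per-paper features.
citation = {"p1": 0.8, "p2": 0.1}
author = {"p1": 0.4, "p2": 0.5}
blended = {p: blend(citation[p], author[p]) for p in citation}
```

Keeping the citation weight above 0.5 reflects the axiom on the slide: the author score is derived from citation scores, so it is not allowed to outweigh them.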

Randomize Zeroes

Random Chance?
Our best isn’t much better than random
–Random: 52.6%
–1st: 68.3% (+30%)
This judging is favorable to random chance
Unscored papers assigned [0, minval * 0.999]
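The last line of this slide can be sketched as below. The example assumes that "unscored" means a score of zero, that minval is the smallest positive score, and that values are drawn uniformly from the interval; none of these details are spelled out on the slide, and the function name is hypothetical.

```python
import random


def randomize_zeroes(scores, seed=None):
    """Give each unscored (zero) paper a random value in [0, minval * 0.999].

    minval is taken to be the smallest positive score (an assumption), so every
    randomized paper still ranks below every scored paper.
    """
    rng = random.Random(seed)
    positive = [s for s in scores.values() if s > 0]
    if not positive:
        return dict(scores)  # nothing to anchor the interval on
    upper = min(positive) * 0.999
    return {
        paper: (s if s > 0 else rng.uniform(0.0, upper))
        for paper, s in scores.items()
    }


raw = {"a": 0.5, "b": 0.0, "c": 0.2, "d": 0.0}
final = randomize_zeroes(raw, seed=1)
```

Scored papers keep their values and their relative order; only the previously tied zero-score papers are spread out at random below them.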

Phase I Results

Submissions

Submissions

ALEF Paper Scores (figure: Average Paper Score by Year (ALEF); axes: Average Score, Year)

Final Paper Scores (figure: Average Paper Score by Year (Final); axes: Average Score, Year)

Phase I – Evaluation Results

Phase I – Test Results > (-3.3%) 15th -> 2nd

Eigenfactor™ & Author Scores

Logistics
Phase II
–Vertices: 49,870,036
–Edges: 949,577,946
Calculate Citation Scores: 34 minutes
Build Paper-Author Matrix: ~2 hours
Calculate Author Scores: 2 minutes
Author Score Feature: 5 minutes
Blending: 30 seconds

ALEF Summary
–Simple, fast variant of PageRank for article-level citation networks
–Ranks and maps
–More experiments and modifications
–Data cleaning issues
–Thanks to Microsoft Academic Graph and WSDM Cup Challenge

Acknowledgements
–Carl Bergstrom, Department of Biology, University of Washington
–Daril Vilhena, Department of Biology, University of Washington
–Martin Rosvall, Department of Physics, Umeå University
–Aditya Gandhi, Information School, University of Washington
–Metaknowledge Network, Templeton Foundation

Resources
Info, Data, Code - Babel -
–J.D. West, M. Rosvall, C.T. Bergstrom (2016) Ranking and mapping article-level citation networks, in prep
–J.D. West, I. Wesley-Smith, C.T. Bergstrom (2016) A recommendation system based on hierarchical clustering of an article-level citation network. IEEE Transactions on Big Data
–I. Wesley-Smith, C.T. Bergstrom, J.D. West (2016) Static Ranking of Scholarly Papers using Article-Level Eigenfactor (ALEF). WSDM Conference: Entity Ranking Challenge Workshop
–I. Wesley-Smith, J.D. West (2016) Babel: A platform for research in scholarly article recommendation. WWW Conference, Workshop on Big Scholarly Data
Jevin West - Ian Wesley-Smith –