Recommendation in Scholarly Big Data

Presentation transcript:

Recommendation in Scholarly Big Data Hello, everyone. Today, I'm going to present the project I've done this semester. In this project, I focus on recommendation in scholarly big data, especially for AceMap. Since you all attend Professor Wang's lecture, I think you are all familiar with AceMap, so I will not introduce it. Lequn Wang 2016.6.10

Contents: 1. Motivation 2. ZeroRank 3. Recommender System. In this presentation, I will first talk about the motivation and then show you the two parts of my work, which are ZeroRank and the Recommender System.

Motivation More than 100 million papers in our dataset. For the convenience of users. As Professor Wang says, a researcher can only read thousands of papers; a hard-working professor may read 20 thousand. However, there are millions of papers out there, so researchers have to decide which papers to read. A recommendation system helps researchers, especially new researchers, find good, related papers efficiently, and that is one of the goals of our AceMap system.

ZeroRank Ranking Current-Year Scientific Literature with No Citations An interesting new question I will first introduce ZeroRank. Here, "zero" means zero citations. Our goal is to rank newly published papers with no citations, so as to recommend new papers. We have surveyed much work on ranking scientific publications; however, none of it addresses exactly the problem of ranking newly published papers. So this is a new problem we introduce, and I think it is interesting and useful.

ZeroRank For example, when the results of Infocom come out, researchers in the computer networking field will definitely want to know what new papers have been published, what others have been doing recently, and what the hot topics in the field are. However, very many papers are published at a conference. Our goal is to recommend new papers that have high potential to gain a large number of citations in the future.

How? What makes a paper gain citations? Author Venue Affiliation Text (difficult to use) Since these newly published papers have no citation information, our model uses author, venue, and affiliation information. We have considered text information; however, it is difficult to incorporate into our model, and some researchers even claim that text information is useless in predicting a paper's popularity.

Random Walk (like the PageRank or HITS algorithm) Our model can be divided into two parts. The first part is feature extraction: we perform a random walk on the heterogeneous network of papers, authors, venues, and affiliations, in the style of PageRank or HITS. After the random walk, for each paper we get a score for each of the three features.
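To make the feature-extraction step concrete, here is a minimal sketch of one PageRank-style walk between papers and a single entity type (authors, say). The toy adjacency matrix, iteration count, and damping factor are hypothetical stand-ins, not the values used in ZeroRank:

```python
import numpy as np

def normalize_columns(M):
    """Column-normalize M so each column sums to 1 (a transition matrix)."""
    col_sums = M.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    return M / col_sums

def paper_scores(paper_entity, n_iter=50, damping=0.85):
    """PageRank-style walk alternating between papers and one entity type
    (authors, venues, or affiliations); returns one score per paper."""
    A = paper_entity.astype(float)
    to_entity = normalize_columns(A.T)   # columns = papers
    to_paper = normalize_columns(A)      # columns = entities
    n = A.shape[0]
    p = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        e = to_entity @ p                              # paper -> entity step
        p = damping * (to_paper @ e) + (1 - damping) / n   # entity -> paper step
        p /= p.sum()
    return p

# Toy graph: 4 papers (rows) linked to 3 authors (columns).
paper_author = np.array([[1, 0, 0],
                         [1, 1, 0],
                         [0, 1, 1],
                         [0, 0, 1]])
print(paper_scores(paper_author))  # one author-feature score per paper
```

Running the same walk on the paper-venue and paper-affiliation graphs would give the other two feature scores.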

Learning to Rank A boosting ranking model In the second phase, we train a learning-to-rank model to rank the papers.
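The slides do not specify which boosting ranker was used, so the following is only a plausible stand-in: a pairwise approach that trains a gradient-boosted classifier on feature-difference vectors, over synthetic data:

```python
import numpy as np
from itertools import combinations
from sklearn.ensemble import GradientBoostingClassifier

def make_pairs(X, y):
    """Label each (i, j) pair by whether paper i out-cites paper j;
    the ranker learns from the feature difference x_i - x_j."""
    diffs, labels = [], []
    for i, j in combinations(range(len(y)), 2):
        diffs.append(X[i] - X[j])
        labels.append(1 if y[i] > y[j] else 0)
    return np.array(diffs), np.array(labels)

rng = np.random.default_rng(0)
X = rng.random((60, 3))                              # author, venue, affiliation scores
y = 3 * X[:, 0] + X[:, 1] + rng.normal(0, 0.1, 60)   # synthetic future citation counts

model = GradientBoostingClassifier(random_state=0).fit(*make_pairs(X, y))

# Score each paper against the average paper and sort descending.
scores = model.predict_proba(X - X.mean(axis=0))[:, 1]
ranking = np.argsort(-scores)
print(ranking[:5])  # indices of the top-5 predicted papers
```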

Experiment Result 1 How does our model perform compared with other methods? 2 datasets: data mining, database 4 other ranking or citation-prediction methods 4 performance metrics We select two subsets from MAG: data mining and database.

Experiment Result 1 How does our model perform compared with other methods? Data mining: This is the comparison using NDCG on the data mining dataset. We can see that ZeroRank, represented by the blue line, performs best among the compared methods.
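For reference, NDCG@k is a standard ranking metric; here the relevance $rel_i$ would be the eventual citation level of the paper placed at rank $i$:

```latex
\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)},
\qquad
\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}
```

where IDCG@k is the DCG@k of the ideal (citation-sorted) ordering, so NDCG@k equals 1 for a perfect ranking.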

Experiment Result 1 How does our model perform compared with other methods? Database: This is the same comparison on the database dataset.

Experiment Result 1 How does our model perform compared with other methods? Data mining: These are the comparisons using the other three performance metrics on data mining.

Experiment Result 1 How does our model perform compared with other methods? Database: And these are the same comparisons on database.

Experiment Result 2 Which feature is the most important? In this experiment, we use the learning-to-rank algorithm to quantitatively measure the importance of different features in determining whether a paper will receive more citations. We select 31 subfields of computer science as our datasets. We ran ZeroRank on the 31 subfields, and the result is shown in the figure. We can see that the author feature is dominant, which differs from the conclusion of other researchers that the conference is the most important.
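As one illustration of how a learning-to-rank model can yield such importances, the hypothetical boosted pairwise model sketched earlier exposes them directly (the feature names are ours, and this is sklearn's generic importance measure, not necessarily the paper's):

```python
# Per-feature importances from the boosted trees fitted above.
for name, w in zip(["author", "venue", "affiliation"],
                   model.feature_importances_):
    print(f"{name}: {w:.2f}")
```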

What I did Discuss the design of our model. Participate in implementing our model. Implement some of the comparison models. Do most of the experiments. Paper writing. We have organized our work into a paper and submitted it to CIKM. In this part of the work, I participated in discussing and implementing our model. Besides, I implemented some of the comparison models and did most of the experiments. I also wrote the sections for my part of the work and helped revise the whole paper.

Recommender System Do personalized recommendation. Find topic-level similarity between users and papers. Now, I will show you the second part of my work: the recommender system. In the AceMap system, we have information about users and the papers they have liked. We want to use this information to do personalized recommendation for researchers.

Why CTR? Find topic-level similarity between users and papers. Built to solve the cold-start problem. We surveyed many methods and found that collaborative topic regression (CTR) matches our system well. The first reason we use CTR is that it can find topic-level similarity between users and papers. Professor Wang has said many times that he wants our AceMap system to be able to recommend papers that are not in the same field but have similar or the same ideas or methods; I think this recommender is a first try at that. The second reason is that CTR is built to solve the cold-start problem. Because AceMap is new, the number of users is small, and traditional recommendation methods cannot handle papers that no one has liked yet. In CTR, we can use the text of these papers to do recommendation.

CTR I will not show you the details of CTR, only the basic idea.

What is CTR? CTR is a combination of CF and LDA. LDA represents a document as a topic vector. CF represents each item and each user as a latent vector. In CTR, the item vector is generated from both the paper content and user preferences. The inner product of a user-paper pair measures how likely the user is to like the paper. CTR is a combination of collaborative filtering and latent Dirichlet allocation.
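Concretely, in the original CTR formulation (Wang and Blei, 2011), the paper's latent vector is anchored to its LDA topic proportions $\theta_j$:

```latex
v_j = \theta_j + \epsilon_j, \quad \epsilon_j \sim \mathcal{N}(0, \lambda_v^{-1} I),
\qquad
\hat{r}_{ij} = u_i^\top v_j
```

Here $u_i$ is the user's latent vector and $\hat{r}_{ij}$ the predicted preference. For a paper no one has liked yet, $v_j$ falls back to the topic vector $\theta_j$, which is exactly why CTR handles cold-start papers.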

Experiment 1 Goal: use field-of-study information to improve performance. Method 1: user simulation. Method 2: only recommend papers with the same field of study. Since we have the field-of-study information from Microsoft, we want to use it to improve the performance of our model. We tried two methods. The first is to simulate users: each simulated user likes a particular field of study and is randomly assigned some papers in that field. The second is to recommend to users only the papers that have the same field of study as the papers they like.
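A toy sketch of Method 1, with made-up paper IDs and fields (the real simulation would draw from the MAG field-of-study labels):

```python
import random

# Each simulated user "likes" papers sampled from a single field of study.
papers_by_field = {
    "data mining": ["p1", "p2", "p3", "p4"],
    "database":    ["p5", "p6", "p7", "p8"],
}

def simulate_users(n_users, likes_per_user=2, seed=0):
    rng = random.Random(seed)
    users = []
    for u in range(n_users):
        field = rng.choice(sorted(papers_by_field))
        likes = rng.sample(papers_by_field[field], likes_per_user)
        users.append({"user": f"u{u}", "field": field, "likes": likes})
    return users

print(simulate_users(3))
```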

Experiment 1 (failed) The field-of-study information is not precise. Papers have multiple fields of study. Some papers do not have this information. The results show that both methods failed to improve the performance of the model. The three reasons listed here may account for the failure.

Experiment 2 The second experiment, which Yuning has shown, is that as the number of users grows, the performance of the system goes up dramatically. So as AceMap accumulates more users, the recommendations will get better and better.

Experiment 3 For the real-world system AceMap (using MAG) This is a demonstration, also shown by Yuning.

What I did Participate in doing the survey. Discuss how to build the system. Do the experiments. Deliver the final results in AceMap. For this recommender system, I surveyed methods with the group members on how to do recommendation, and we finally decided to use CTR in our system. I discussed mostly with Yuning how to process the data and how to tune the parameters of the model. In the experiments and implementation, Yuning was in charge of processing the data and dealing with the database; I focused on the model part: training the model, tuning the parameters, getting the results, and evaluating them.

Q&A That's all for today's presentation. Any questions?