
1 Mining Advisor-Advisee Relationships from Research Publication Networks
Chi Wang, Jiawei Han, Yuntao Jia, Jie Tang, Duo Zhang, Yintao Yu. SIGKDD, 2010.
Presented by Hung-Yi Cai, 2010/12/29.
Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology)

2 Outline
• Motivation
• Objectives
• Previous study
• Methodology
  ─ Problem Formulation
  ─ Assumption and Framework
  ─ Preprocessing
  ─ TPFG Model
  ─ Model Learning
• Experiments
• Conclusions
• Comments

3 Motivation
• Information networks contain abundant knowledge about relationships among people or entities.
• Discovering those relationships can benefit many applications, such as expert finding and research community analysis.

4 Objectives
• To propose a time-constrained probabilistic factor graph model (TPFG) that takes a research publication network as input and models the advisor-advisee relationship mining problem with a joint likelihood objective function.
• To design an efficient learning algorithm that optimizes this objective function.

5 Previous study
• This work differs from existing studies in Relation Mining and Relational Learning.
  ─ Relation Mining: these studies mainly apply text mining and language processing techniques to text data and structured data, including web pages, user profiles, and corpora of literature.
  ─ Relational Learning: these studies address classification when objects or entities are presented in multiple relations.

6 Methodology
• Problem Formulation
• Assumption and Framework
• Preprocessing
• TPFG Model
• Model Learning

7 Problem Formulation

8 Assumption and Framework
• Assumption 1 is based on commonsense knowledge about advisor-advisee relationships.
• Under Assumption 2, all authors in the network have a strict order defined by the possible advising relationships.

9 Preprocessing
• The purpose of preprocessing is to generate the candidate graph H′ and reduce the search space.
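
To make the idea of a candidate graph concrete, here is a minimal sketch, assuming publication records of the form (year, author list). The function name `build_candidate_graph` and the single filter it applies (a candidate advisor must have started publishing strictly earlier than the advisee) are illustrative assumptions only; they are not the filtering rules used in the presentation.

```python
from collections import defaultdict


def build_candidate_graph(papers):
    """Map each author to the set of coauthors kept as candidate advisors.

    papers: iterable of (year, [author, ...]) records (hypothetical format).
    """
    first_year = {}
    coauthors = defaultdict(set)
    for year, authors in papers:
        for a in authors:
            first_year[a] = min(year, first_year.get(a, year))
            coauthors[a].update(b for b in authors if b != a)

    candidates = {}
    for a, others in coauthors.items():
        # Assumed stand-in filter: keep a coauthor only if they started
        # publishing strictly earlier than the author did.
        candidates[a] = {b for b in others if first_year[b] < first_year[a]}
    return candidates


# Made-up toy records, for illustration only.
papers = [
    (1995, ["alice"]),
    (2001, ["alice", "bob"]),
    (2003, ["bob", "carol", "alice"]),
    (2004, ["bob", "carol"]),
]
graph = build_candidate_graph(papers)
```

Any filter of this kind shrinks the search space because every coauthor pair it removes is one fewer edge the later inference step has to score.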

10 Preprocessing
• Then we have the following rule.
  ─ Author a_j is not considered to be a_i's advisor if one of the following conditions holds:

11 TPFG Model
• By modeling the network as a whole, this step can incorporate both structural information and temporal constraints, and can better analyze the relationships among individual links.

12 TPFG Model
• The graph is composed of two kinds of nodes: variable nodes and function nodes.
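
As a rough illustration of the two node kinds, the sketch below stores a tiny generic factor graph in plain Python: variable nodes carry a domain, function nodes carry a scope and a potential. The variable names, domains, and potential values are made up for the example and are not the TPFG definitions; the joint score is simply the product of all function-node values.

```python
from itertools import product

# Variable nodes: name -> list of possible values (hypothetical toy domains).
variables = {"y1": [0, 1], "y2": [0, 1]}

# Function nodes: (scope, function) pairs with made-up potential values.
factors = [
    (("y1",), lambda y1: 0.7 if y1 == 1 else 0.3),
    (("y2",), lambda y2: 0.4 if y2 == 1 else 0.6),
    # A hard constraint: both variables may not be 1 at the same time.
    (("y1", "y2"), lambda y1, y2: 0.0 if (y1 == 1 and y2 == 1) else 1.0),
]


def joint_score(assignment):
    """Unnormalized joint: product of every function node on its scope."""
    score = 1.0
    for scope, fn in factors:
        score *= fn(*(assignment[v] for v in scope))
    return score


def best_assignment():
    """Brute-force maximizer, feasible only for tiny graphs."""
    names = list(variables)
    best = max(
        product(*(variables[n] for n in names)),
        key=lambda vals: joint_score(dict(zip(names, vals))),
    )
    return dict(zip(names, best))
```

Brute force is used here only to keep the example short; on a real network the number of assignments grows exponentially, which is why message passing is needed.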

13 Model Learning
• To maximize the objective function and compute the ranking score for each edge in the candidate graph H′, this step needs to infer the marginal maximal joint probability on TPFG, according to Eq. (10).
• Sum-product + junction tree: sum-product is a general message-passing algorithm for computing marginal functions on a factor graph.
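
For reference, the textbook form of the sum-product messages on a tree-structured factor graph is given below. The notation is the generic one and is an assumption here, not that of Eq. (10): N(·) denotes the neighbours of a node and X_f the arguments of function node f.

```latex
\begin{align}
\mu_{x \to f}(x) &= \prod_{h \in N(x) \setminus \{f\}} \mu_{h \to x}(x) \\
\mu_{f \to x}(x) &= \sum_{X_f \setminus \{x\}} f(X_f) \prod_{y \in N(f) \setminus \{x\}} \mu_{y \to f}(y) \\
p(x) &\propto \prod_{h \in N(x)} \mu_{h \to x}(x)
\end{align}
```

Replacing the sum in the second line by a max gives the max-product variant, and taking logarithms of that gives max-sum.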

14 Model Learning
• New TPFG Inference Algorithm: the original sum-product or max-sum algorithm runs into difficulty because it requires each node to wait for all-but-one message to arrive.

15 Model Learning
• After the two phases of message propagation, we can collect the two messages on any edge and obtain the marginal function.
• The improved message propagation is still separated into two phases.
  ─ Phase 1: the messages sent_i, passed from each node to its ascendants, are generated in a similar order as before.
  ─ Phase 2: the messages recv_i, returned from the ascendants, are stored in each node.
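
The two-phase idea can be sketched on an ordinary rooted tree with pairwise potentials. This is a generic sum-product example, not the TPFG-specific algorithm: the tree, the potentials `phi` and `psi`, and the function names are all made up, and only the names `sent` and `recv` echo the slide.

```python
# Hypothetical toy tree: node -> parent (None marks the root).
parent = {"a": None, "b": "a", "c": "a", "d": "b"}
states = [0, 1]
# Made-up unary potentials phi[node][state].
phi = {"a": [0.6, 0.4], "b": [0.5, 0.5], "c": [0.2, 0.8], "d": [0.9, 0.1]}
children = {n: [c for c, p in parent.items() if p == n] for n in parent}


def psi(xp, xc):
    """Made-up pairwise potential favouring equal parent/child states."""
    return 2.0 if xp == xc else 1.0


def topo_order():
    """Nodes ordered so that every parent precedes its children."""
    order, stack = [], [n for n, p in parent.items() if p is None]
    while stack:
        n = stack.pop()
        order.append(n)
        stack.extend(children[n])
    return order


def belief_without_parent(n, sent):
    """phi[n] multiplied by all messages sent up by n's children."""
    b = list(phi[n])
    for c in children[n]:
        b = [b[x] * sent[c][x] for x in states]
    return b


def two_phase_marginals():
    order = topo_order()
    sent, recv = {}, {}
    # Phase 1: children before parents; each node sends one message upward.
    for n in reversed(order):
        if parent[n] is None:
            continue
        b = belief_without_parent(n, sent)
        sent[n] = [sum(psi(xp, x) * b[x] for x in states) for xp in states]
    # Phase 2: parents before children; each node stores the returned message.
    for n in order:
        p = parent[n]
        if p is None:
            recv[n] = [1.0 for _ in states]
            continue
        # Parent's belief, excluding n's own upward message.
        bp = [phi[p][xp] * recv[p][xp] for xp in states]
        for s in children[p]:
            if s != n:
                bp = [bp[xp] * sent[s][xp] for xp in states]
        recv[n] = [sum(psi(xp, x) * bp[xp] for xp in states) for x in states]
    # Combine both directions at every node and normalize.
    marginals = {}
    for n in order:
        b = belief_without_parent(n, sent)
        b = [b[x] * recv[n][x] for x in states]
        z = sum(b)
        marginals[n] = [v / z for v in b]
    return marginals
```

Because each node stores what it sent and what it received, every marginal is available after exactly one upward and one downward sweep, with no node waiting on messages outside that schedule.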

17 Experiments
• Experiment setup
  ─ Data set: DBLP, consisting of 654,628 authors and 1,076,946 publications, with publication times from 1970 to 2007.
  ─ Methods: Sum-Product + Junction Tree (JuncT), Loopy Belief Propagation (LBP), Independent Maxima (IndMAX), SVM, RULE
  ─ Evaluation aspects: ROC curve
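
For readers unfamiliar with the evaluation measure, the following sketch shows how an ROC curve and its area can be computed from ranking scores and binary ground-truth labels. The function names and the four example scores are hypothetical and unrelated to the reported experiments.

```python
def roc_curve(scores, labels):
    """Return ROC points (FPR, TPR) from sweeping a threshold over scores.

    labels are 1 for a true relation and 0 otherwise; tied scores are
    processed together so they yield a single point.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up scores and labels, for illustration only.
points = roc_curve([0.9, 0.8, 0.7, 0.4], [1, 0, 1, 0])
area = auc(points)
```

In this toy case three of the four positive-negative pairs are ranked correctly, which matches the computed area of 0.75.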

18 Experiments
• Accuracy: Effect of rules in TPFG
  ─ Using R3 as the filtering rule and YEAR2 as the graduation year estimation method.

19 Experiments
• Accuracy: Effect of network structure
  ─ Using DFS with a bounded maximal depth d from the given set of nodes, denoted DFS=d, we can obtain closures with controlled depth for a given set of authors to test.
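
A depth-bounded DFS of this kind can be sketched as follows, assuming the network is given as an adjacency dictionary. The function name `bounded_dfs` and the toy graph are hypothetical. A node is revisited when it is reached again at a smaller depth, so the bound is respected regardless of the order in which edges are explored.

```python
def bounded_dfs(graph, seeds, d):
    """Return every node reachable from the seeds within at most d edges."""
    best_depth = {}
    stack = [(s, 0) for s in seeds]
    while stack:
        node, depth = stack.pop()
        # Skip if this node was already reached at the same or smaller depth.
        if node in best_depth and best_depth[node] <= depth:
            continue
        best_depth[node] = depth
        if depth < d:
            for neighbour in graph.get(node, ()):
                stack.append((neighbour, depth + 1))
    return set(best_depth)


# Made-up coauthor chain a - b - c - d, for illustration only.
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
closure = bounded_dfs(toy_graph, ["a"], 2)
```

With d = 2 the closure around "a" contains "a", "b", and "c"; raising d to 3 would also pull in "d".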

20 Experiments
• Accuracy: Effect of training data

21 Experiments
• Accuracy: Case study
  ─ TPFG can discover some interesting relations beyond the “ground truth” from a single source.

22 Experiments
• Scalability Performance

23 Experiments
• Application: Visualization of genealogy

24 Experiments
• Application: Expert finding and Bole search

25 Conclusions
• This paper studied the mining of advisor-advisee relationships from a research publication network as an attempt to discover hidden semantic knowledge in information networks.
• It proposes a time-constrained probabilistic factor graph (TPFG) model to integrate local intuitive features in the network; results on the DBLP data set demonstrate the effectiveness of the proposed approach.

26 Comments
• Advantages
  ─ The TPFG model can mine advisor-advisee relationships from a research publication network.
• Applications
  ─ Relationship Mining

