

Presentation on theme: "Multi-Modal Bayesian Embeddings for Learning Social Knowledge Graphs Zhilin Yang 12, Jie Tang 1, William W. Cohen 2 1 Tsinghua University 2 Carnegie Mellon."— Presentation transcript:

1 Multi-Modal Bayesian Embeddings for Learning Social Knowledge Graphs Zhilin Yang 12, Jie Tang 1, William W. Cohen 2 1 Tsinghua University 2 Carnegie Mellon University

2 AMiner: academic social network Research interests

3 Text-Based Approach List of publications → Infer → Research interests

4 Text-Based Approach Term Frequency => "challenging problem" TF-IDF => "line drawing" (frequent but irrelevant terms)
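The weakness of the text-based baselines on this slide can be seen in a toy run. Below is a minimal, self-contained TF-IDF sketch (the toy documents and scoring are illustrative assumptions, not the paper's pipeline): frequency-based scores happily surface phrases like "challenging problem" or "line drawing" that are not research interests.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score each term of each document by term frequency times
    inverse document frequency (a standard text-based baseline)."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    n = len(docs)
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: tf[t] / len(doc) * math.log(n / doc_freq[t])
                       for t in tf})
    return scores

# Toy publication titles: with no knowledge base to filter terms,
# a phrase like "line drawing" scores as high as a real interest.
docs = [
    ["line", "drawing", "challenging", "problem"],
    ["face", "recognition", "challenging", "problem"],
    ["image", "segmentation"],
]
scores = tf_idf(docs)
```

Rare-but-meaningless terms get a high IDF boost, which is exactly the failure mode the knowledge-driven approach on the next slide is meant to fix.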

5 Knowledge-Driven Approach List of publications Research interests Infer Artificial Intelligence Data Mining Machine Learning Clustering Association Rules Knowledge bases

6 Problem: Learning Social Knowledge Graphs Mike Jane Kevin Jing Natural Language Processing Deep Learning for NLP Recurrent networks for NER Deep Learning

7 Problem: Learning Social Knowledge Graphs Mike Jane Kevin Jing Natural Language Processing Deep Learning for NLP Social network structure Social text Knowledge base Recurrent networks for NER Deep Learning

8 Problem: Learning Social Knowledge Graphs Mike Jane Kevin Jing Deep Learning Natural Language Processing Deep Learning for NLP Recurrent networks for NER Infer a ranked list of concepts Kevin: Deep Learning, Natural Language Processing Jing: Recurrent Networks, Named Entity Recognition

9 Challenges Mike Jane Kevin Jing Natural Language Processing Deep Learning for NLP Recurrent networks for NER Deep Learning Two modalities – users and concepts How to leverage information from both modalities? How to connect these two modalities?

10 Approach Jane Kevin Jing Deep Learning for NLP Recurrent networks for NER Natural Language Processing Deep Learning Learn user embeddings Learn concept embeddings Social KG Model

11 User Embedding Concept Embedding

12 User Embedding Concept Embedding Gaussian distribution for user embeddings Gaussian distribution for concept embeddings Align users and concepts
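The alignment step on slide 12 can be read as: each latent topic keeps one Gaussian in the user-embedding space and one in the concept-embedding space, and an embedding is tied to the topic under which its Gaussian density is highest. A toy sketch (the two topic means, unit variance, and diagonal covariance are illustrative assumptions):

```python
import numpy as np

def gaussian_log_density(x, mean, var):
    """Log density of an isotropic Gaussian, used here to score
    how well an embedding fits a topic's Gaussian."""
    return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var))

# Hypothetical 2-topic setup in a 2-d embedding space.
topic_means = np.array([[0.0, 0.0], [5.0, 5.0]])
topic_var = 1.0

user_emb = np.array([4.8, 5.1])  # an embedding near topic 1's mean
scores = [gaussian_log_density(user_emb, m, topic_var) for m in topic_means]
best_topic = int(np.argmax(scores))
```

Because users and concepts share the same topic index, the two Gaussian families give the two modalities a common latent space to align in.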

13 Inference and Learning Collapsed Gibbs sampling Iterate between: 1.Sample latent variables

14 Inference and Learning Iterate between: 1.Sample latent variables 2.Update parameters Collapsed Gibbs sampling

15 Inference and Learning Iterate between: 1.Sample latent variables 2.Update parameters 3.Update embeddings Collapsed Gibbs sampling
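The three-step loop on slides 13–15 can be sketched on concept embeddings alone. This is a deliberately simplified toy version (fixed unit variance, no users, no Dirichlet or Normal-Gamma priors), not the paper's collapsed Gibbs sampler:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fit(concept_emb, n_topics, n_iters=20, lr=0.01, seed=0):
    """Toy version of the loop: (1) sample a topic per concept given
    the topic Gaussians, (2) re-estimate each topic's Gaussian mean,
    (3) nudge each embedding toward its assigned topic mean."""
    rng = np.random.default_rng(seed)
    emb = concept_emb.copy()
    means = emb[rng.choice(len(emb), n_topics, replace=False)].copy()
    for _ in range(n_iters):
        # 1. sample latent variables (topic assignments)
        z = np.array([rng.choice(n_topics,
                                 p=softmax(-0.5 * ((x - means) ** 2).sum(axis=1)))
                      for x in emb])
        # 2. update parameters (topic means)
        for k in range(n_topics):
            if (z == k).any():
                means[k] = emb[z == k].mean(axis=0)
        # 3. update embeddings (pull toward the assigned topic mean)
        emb += lr * (means[z] - emb)
    return z, means, emb

# Two well-separated toy clusters of concept embeddings.
data = np.vstack([np.zeros((5, 2)), 10.0 * np.ones((5, 2))])
z, means, emb = fit(data, n_topics=2)
```

Step 3 is what distinguishes this family of models from plain LDA-style samplers: the observations themselves (the embeddings) are updated during learning.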

16 AMiner Research Interest Dataset
- 644,985 researchers
- Terms from these researchers' publications, filtered with Wikipedia
- Evaluation:
  - Homepage matching: 1,874 researchers, using homepages as ground truth
  - LinkedIn matching: 113 researchers, using LinkedIn skills as ground truth
Code and data available: https://github.com/kimiyoung/genvector

17 Homepage Matching (using homepages as ground truth)

Method        Precision@5
GenVector     78.1003%
GenVector-E   77.8548%
Sys-Base      73.8189%
Author-Topic  74.4397%
NTN           65.8911%
CountKG       54.4823%

Legend: GenVector = our model; GenVector-E = our model w/o embedding update; Sys-Base = AMiner baseline (key term extraction); CountKG = rank by frequency; Author-Topic = classic topic models; NTN = neural tensor networks.
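Precision@5, the metric in these tables, is simple to compute: the fraction of a method's top-5 ranked concepts that appear in the ground-truth set (here, terms matched on a researcher's homepage). A minimal sketch with made-up example data:

```python
def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k ranked concepts found in the
    ground-truth set of research interests."""
    return sum(1 for t in ranked[:k] if t in relevant) / k

# Hypothetical ranked output for one researcher.
ranked = ["deep learning", "nlp", "novel approach",
          "face recognition", "line drawing"]
relevant = {"deep learning", "nlp", "face recognition"}
p = precision_at_k(ranked, relevant)  # 3 of the top 5 are relevant
```

The reported numbers average this quantity over all matched researchers.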

18 LinkedIn Matching (using LinkedIn skills as ground truth)

Method        Precision@5
GenVector     50.4424%
GenVector-E   49.9145%
Author-Topic  47.6106%
NTN           42.0512%
CountKG       46.8376%

Legend: GenVector = our model; GenVector-E = our model w/o embedding update; CountKG = rank by frequency; Author-Topic = classic topic models; NTN = neural tensor networks.

19 Error Rate of Irrelevant Cases

Method        Error rate
GenVector     1.2%
Sys-Base      18.8%
Author-Topic  1.6%
NTN           7.2%

Manually labeled terms that are clearly NOT research interests, e.g., "challenging problem".
Legend: GenVector = our model; Sys-Base = AMiner baseline (key term extraction); Author-Topic = classic topic models; NTN = neural tensor networks.

20 Qualitative Study: Top Concepts within Topics (* marks off-topic concepts)

GenVector: Query expansion, Concept mining, Language modeling, Information extraction, Knowledge extraction, Entity linking, Language models, Named entity recognition, Document clustering, Latent semantic indexing

Author-Topic: Speech recognition, Natural language, *Integrated circuits, Document retrieval, Language models, Language model, *Microphone array, Computational linguistics, *Semidefinite programming, Active learning

21 Qualitative Study: Top Concepts within Topics (* marks off-topic concepts)

GenVector: Image processing, Face recognition, Feature extraction, Computer vision, Image segmentation, Image analysis, Feature detection, Digital image processing, Machine learning algorithms, Machine vision

Author-Topic: Face recognition, *Food intake, Face detection, Image recognition, *Atmospheric chemistry, Feature extraction, Statistical learning, Discriminant analysis, Object tracking, *Human factors

22 Qualitative Study: Research Interests (* marks irrelevant terms)

GenVector: Feature extraction, Image segmentation, Image matching, Image classification, Face recognition

Sys-Base: Face recognition, Face image, *Novel approach, *Line drawing, Discriminant analysis

23 Qualitative Study: Research Interests (* marks irrelevant terms)

GenVector: Unsupervised learning, Feature learning, Bayesian networks, Reinforcement learning, Dimensionality reduction

Sys-Base: *Challenging problem, Reinforcement learning, *Autonomous helicopter, *Autonomous helicopter flight, Near-optimal planning

24 Online Test: A/B test with live users, mixing the results with Sys-Base

Method      Error rate
GenVector   3.33%
Sys-Base    10.00%

25 Other Social Networks? Mike Jane Kevin Jing Natural Language Processing Deep Learning for NLP Social network structure Social text Knowledge base Recurrent networks for NER Deep Learning

26 Conclusion
- Study a novel problem: learning social knowledge graphs
- Propose a model: multi-modal Bayesian embedding, integrating embeddings into graphical models
- Build the AMiner research interest dataset: 644,985 researchers, with homepage and LinkedIn matching as ground truth
- Online deployment on AMiner

27 Thanks! Code and data: https://github.com/kimiyoung/genvector

28 Social Networks Mike Jane Kevin Jing AMiner, Facebook, Twitter… Huge amounts of information

29 Knowledge Bases Computer Science Artificial Intelligence System Deep Learning Natural Language Processing Wikipedia, Freebase, Yago, NELL… Huge amounts of knowledge

30 Bridge the Gap Mike Jane Kevin Jing Computer Science Artificial Intelligence System Deep Learning Natural Language Processing Better user understanding e.g. mine research interests on AMiner

31 Approach Social network + knowledge base + social text → user embeddings + concept embeddings → Social KG Model

32 Model Documents (one per user) Concepts for the user Parameters for topics

33 Model Generate a topic distribution for each document (from a Dirichlet)

34 Model Generate Gaussian distribution for each embedding space (from a Normal Gamma)

35 Model Generate the topic for each concept (from a Multinomial)

36 Model Generate the topic for each user (from a Uniform)

37 Model Generate embeddings for users and concepts (from a Gaussian)
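The generative story on slides 33–37 can be sketched end to end. This is a simplified toy sampler, not the paper's model: the Normal-Gamma step is replaced by fixed unit variance, and drawing the user's topic uniformly from the document's concept topics is my reading of the "from a Uniform" step.

```python
import numpy as np

def generate(n_docs, n_concepts_per_doc, n_topics, dim, alpha=0.1, seed=0):
    """Toy generative process: Dirichlet topic mixtures per document,
    one Gaussian per topic in each embedding space, topics per concept
    from a multinomial, the user's topic from a uniform over the
    document's concept topics, embeddings from the topic Gaussians."""
    rng = np.random.default_rng(seed)
    concept_means = rng.normal(0, 3, (n_topics, dim))  # concept space
    user_means = rng.normal(0, 3, (n_topics, dim))     # user space
    out = []
    for _ in range(n_docs):
        theta = rng.dirichlet(alpha * np.ones(n_topics))       # slide 33
        z = rng.choice(n_topics, n_concepts_per_doc, p=theta)  # slide 35
        concepts = rng.normal(concept_means[z], 1.0)           # slide 37
        y = rng.choice(z)                                      # slide 36
        user = rng.normal(user_means[y], 1.0)                  # slide 37
        out.append((theta, z, concepts, y, user))
    return out

docs = generate(n_docs=3, n_concepts_per_doc=4, n_topics=2, dim=2)
```

Inference then inverts this story: given the observed embeddings, recover the latent topics and topic Gaussians.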

38 Model General

39 Inference and Learning Collapsed Gibbs sampling for inference; update the embeddings during learning (different from LDA, where the observed variables are discrete). Sample latent variables → Update parameters → Update embeddings

40 Methods for Comparison

Method        Description
GenVector     Our model
GenVector-E   Our model w/o embedding update
Sys-Base      AMiner baseline: key term extraction
CountKG       Rank by frequency
Author-Topic  Classic topic models
NTN           Neural tensor networks

