

Presentation on theme: "Multi-Modal Bayesian Embeddings for Learning Social Knowledge Graphs Zhilin Yang 12, Jie Tang 1, William W. Cohen 2 1 Tsinghua University 2 Carnegie Mellon."— Presentation transcript:

1 Multi-Modal Bayesian Embeddings for Learning Social Knowledge Graphs Zhilin Yang 12, Jie Tang 1, William W. Cohen 2 1 Tsinghua University 2 Carnegie Mellon University

2 AMiner: academic social network Research interests

3 Text-Based Approach List of publications → Infer → Research interests

4 Text-Based Approach Term Frequency => "challenging problem" TF-IDF => "line drawing" (frequent but irrelevant terms)
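The weakness of the text-based baselines on this slide can be seen in a toy run. Below is a minimal, self-contained TF-IDF sketch (the toy documents and scoring are illustrative assumptions, not the paper's pipeline): frequency-based scores happily surface phrases like "challenging problem" or "line drawing" that are not research interests.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score each term of each document by term frequency times
    inverse document frequency (a standard text-based baseline)."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    n = len(docs)
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: tf[t] / len(doc) * math.log(n / doc_freq[t])
                       for t in tf})
    return scores

# Toy publication titles: with no knowledge base to filter terms,
# a phrase like "line drawing" scores as high as a real interest.
docs = [
    ["line", "drawing", "challenging", "problem"],
    ["face", "recognition", "challenging", "problem"],
    ["image", "segmentation"],
]
scores = tf_idf(docs)
```

Rare-but-meaningless terms get a high IDF boost, which is exactly the failure mode the knowledge-driven approach on the next slide is meant to fix.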

5 Knowledge-Driven Approach List of publications Research interests Infer Artificial Intelligence Data Mining Machine Learning Clustering Association Rules Knowledge bases

6 Problem: Learning Social Knowledge Graphs Mike Jane Kevin Jing Natural Language Processing Deep Learning for NLP Recurrent networks for NER Deep Learning

7 Problem: Learning Social Knowledge Graphs Mike Jane Kevin Jing Natural Language Processing Deep Learning for NLP Social network structure Social text Knowledge base Recurrent networks for NER Deep Learning

8 Problem: Learning Social Knowledge Graphs Mike Jane Kevin Jing Deep Learning Natural Language Processing Deep Learning for NLP Recurrent networks for NER Infer a ranked list of concepts Kevin: Deep Learning, Natural Language Processing Jing: Recurrent Networks, Named Entity Recognition

9 Challenges Mike Jane Kevin Jing Natural Language Processing Deep Learning for NLP Recurrent networks for NER Deep Learning Two modalities – users and concepts How to leverage information from both modalities? How to connect these two modalities?

10 Approach Jane Kevin Jing Deep Learning for NLP Recurrent networks for NER Natural Language Processing Deep Learning Learn user embeddings Learn concept embeddings Social KG Model

11 User Embedding Concept Embedding

12 User Embedding Concept Embedding Gaussian distribution for user embeddings Gaussian distribution for concept embeddings Align users and concepts
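The alignment step on slide 12 can be read as: each latent topic keeps one Gaussian in the user-embedding space and one in the concept-embedding space, and an embedding is tied to the topic under which its Gaussian density is highest. A toy sketch (the two topic means, unit variance, and diagonal covariance are illustrative assumptions):

```python
import numpy as np

def gaussian_log_density(x, mean, var):
    """Log density of an isotropic Gaussian, used here to score
    how well an embedding fits a topic's Gaussian."""
    return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var))

# Hypothetical 2-topic setup in a 2-d embedding space.
topic_means = np.array([[0.0, 0.0], [5.0, 5.0]])
topic_var = 1.0

user_emb = np.array([4.8, 5.1])  # an embedding near topic 1's mean
scores = [gaussian_log_density(user_emb, m, topic_var) for m in topic_means]
best_topic = int(np.argmax(scores))
```

Because users and concepts share the same topic index, the two Gaussian families give the two modalities a common latent space to align in.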

13 Inference and Learning Collapsed Gibbs sampling Iterate between: 1.Sample latent variables

14 Inference and Learning Iterate between: 1.Sample latent variables 2.Update parameters Collapsed Gibbs sampling

15 Inference and Learning Iterate between: 1.Sample latent variables 2.Update parameters 3.Update embeddings Collapsed Gibbs sampling
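The three-step loop on slides 13–15 can be sketched on concept embeddings alone. This is a deliberately simplified toy version (fixed unit variance, no users, no Dirichlet or Normal-Gamma priors), not the paper's collapsed Gibbs sampler:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fit(concept_emb, n_topics, n_iters=20, lr=0.01, seed=0):
    """Toy version of the loop: (1) sample a topic per concept given
    the topic Gaussians, (2) re-estimate each topic's Gaussian mean,
    (3) nudge each embedding toward its assigned topic mean."""
    rng = np.random.default_rng(seed)
    emb = concept_emb.copy()
    means = emb[rng.choice(len(emb), n_topics, replace=False)].copy()
    for _ in range(n_iters):
        # 1. sample latent variables (topic assignments)
        z = np.array([rng.choice(n_topics,
                                 p=softmax(-0.5 * ((x - means) ** 2).sum(axis=1)))
                      for x in emb])
        # 2. update parameters (topic means)
        for k in range(n_topics):
            if (z == k).any():
                means[k] = emb[z == k].mean(axis=0)
        # 3. update embeddings (pull toward the assigned topic mean)
        emb += lr * (means[z] - emb)
    return z, means, emb

# Two well-separated toy clusters of concept embeddings.
data = np.vstack([np.zeros((5, 2)), 10.0 * np.ones((5, 2))])
z, means, emb = fit(data, n_topics=2)
```

Step 3 is what distinguishes this family of models from plain LDA-style samplers: the observations themselves (the embeddings) are updated during learning.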

16 AMiner Research Interest Dataset
- 644,985 researchers
- Terms from these researchers' publications, filtered with Wikipedia
- Evaluation:
  - Homepage matching: 1,874 researchers, using homepages as ground truth
  - LinkedIn matching: 113 researchers, using LinkedIn skills as ground truth
Code and data available: https://github.com/kimiyoung/genvector

17 Homepage Matching (using homepages as ground truth)

Method        Precision@5
GenVector     78.1003%
GenVector-E   77.8548%
Sys-Base      73.8189%
Author-Topic  74.4397%
NTN           65.8911%
CountKG       54.4823%

Legend: GenVector = our model; GenVector-E = our model w/o embedding update; Sys-Base = AMiner baseline (key term extraction); CountKG = rank by frequency; Author-Topic = classic topic models; NTN = neural tensor networks.
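Precision@5, the metric in these tables, is simple to compute: the fraction of a method's top-5 ranked concepts that appear in the ground-truth set (here, terms matched on a researcher's homepage). A minimal sketch with made-up example data:

```python
def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k ranked concepts found in the
    ground-truth set of research interests."""
    return sum(1 for t in ranked[:k] if t in relevant) / k

# Hypothetical ranked output for one researcher.
ranked = ["deep learning", "nlp", "novel approach",
          "face recognition", "line drawing"]
relevant = {"deep learning", "nlp", "face recognition"}
p = precision_at_k(ranked, relevant)  # 3 of the top 5 are relevant
```

The reported numbers average this quantity over all matched researchers.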

18 LinkedIn Matching (using LinkedIn skills as ground truth)

Method        Precision@5
GenVector     50.4424%
GenVector-E   49.9145%
Author-Topic  47.6106%
NTN           42.0512%
CountKG       46.8376%

Legend: GenVector = our model; GenVector-E = our model w/o embedding update; CountKG = rank by frequency; Author-Topic = classic topic models; NTN = neural tensor networks.

19 Error Rate of Irrelevant Cases

Method        Error rate
GenVector     1.2%
Sys-Base      18.8%
Author-Topic  1.6%
NTN           7.2%

Manually labeled terms that are clearly NOT research interests, e.g., "challenging problem".
Legend: GenVector = our model; Sys-Base = AMiner baseline (key term extraction); Author-Topic = classic topic models; NTN = neural tensor networks.

20 Qualitative Study: Top Concepts within Topics (* marks off-topic concepts)

GenVector: Query expansion, Concept mining, Language modeling, Information extraction, Knowledge extraction, Entity linking, Language models, Named entity recognition, Document clustering, Latent semantic indexing

Author-Topic: Speech recognition, Natural language, *Integrated circuits, Document retrieval, Language models, Language model, *Microphone array, Computational linguistics, *Semidefinite programming, Active learning

21 Qualitative Study: Top Concepts within Topics (* marks off-topic concepts)

GenVector: Image processing, Face recognition, Feature extraction, Computer vision, Image segmentation, Image analysis, Feature detection, Digital image processing, Machine learning algorithms, Machine vision

Author-Topic: Face recognition, *Food intake, Face detection, Image recognition, *Atmospheric chemistry, Feature extraction, Statistical learning, Discriminant analysis, Object tracking, *Human factors

22 Qualitative Study: Research Interests (* marks irrelevant terms)

GenVector: Feature extraction, Image segmentation, Image matching, Image classification, Face recognition

Sys-Base: Face recognition, Face image, *Novel approach, *Line drawing, Discriminant analysis

23 Qualitative Study: Research Interests (* marks irrelevant terms)

GenVector: Unsupervised learning, Feature learning, Bayesian networks, Reinforcement learning, Dimensionality reduction

Sys-Base: *Challenging problem, Reinforcement learning, *Autonomous helicopter, *Autonomous helicopter flight, Near-optimal planning

24 Online Test: A/B test with live users, mixing the results with Sys-Base

Method      Error rate
GenVector   3.33%
Sys-Base    10.00%

25 Other Social Networks? Mike Jane Kevin Jing Natural Language Processing Deep Learning for NLP Social network structure Social text Knowledge base Recurrent networks for NER Deep Learning

26 Conclusion
- Study a novel problem: learning social knowledge graphs
- Propose a model: multi-modal Bayesian embedding, integrating embeddings into graphical models
- Build the AMiner research interest dataset: 644,985 researchers, with homepage and LinkedIn matching as ground truth
- Online deployment on AMiner

27 Thanks! Code and data: https://github.com/kimiyoung/genvector

28 Social Networks Mike Jane Kevin Jing AMiner, Facebook, Twitter… Huge amounts of information

29 Knowledge Bases Computer Science Artificial Intelligence System Deep Learning Natural Language Processing Wikipedia, Freebase, Yago, NELL… Huge amounts of knowledge

30 Bridge the Gap Mike Jane Kevin Jing Computer Science Artificial Intelligence System Deep Learning Natural Language Processing Better user understanding e.g. mine research interests on AMiner

31 Approach Social network + knowledge base + social text → user embeddings + concept embeddings → Social KG Model

32 Model Documents (one per user) Concepts for the user Parameters for topics

33 Model Generate a topic distribution for each document (from a Dirichlet)

34 Model Generate Gaussian distribution for each embedding space (from a Normal Gamma)

35 Model Generate the topic for each concept (from a Multinomial)

36 Model Generate the topic for each user (from a Uniform)

37 Model Generate embeddings for users and concepts (from a Gaussian)
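The generative story on slides 33–37 can be sketched end to end. This is a simplified toy sampler, not the paper's model: the Normal-Gamma step is replaced by fixed unit variance, and drawing the user's topic uniformly from the document's concept topics is my reading of the "from a Uniform" step.

```python
import numpy as np

def generate(n_docs, n_concepts_per_doc, n_topics, dim, alpha=0.1, seed=0):
    """Toy generative process: Dirichlet topic mixtures per document,
    one Gaussian per topic in each embedding space, topics per concept
    from a multinomial, the user's topic from a uniform over the
    document's concept topics, embeddings from the topic Gaussians."""
    rng = np.random.default_rng(seed)
    concept_means = rng.normal(0, 3, (n_topics, dim))  # concept space
    user_means = rng.normal(0, 3, (n_topics, dim))     # user space
    out = []
    for _ in range(n_docs):
        theta = rng.dirichlet(alpha * np.ones(n_topics))       # slide 33
        z = rng.choice(n_topics, n_concepts_per_doc, p=theta)  # slide 35
        concepts = rng.normal(concept_means[z], 1.0)           # slide 37
        y = rng.choice(z)                                      # slide 36
        user = rng.normal(user_means[y], 1.0)                  # slide 37
        out.append((theta, z, concepts, y, user))
    return out

docs = generate(n_docs=3, n_concepts_per_doc=4, n_topics=2, dim=2)
```

Inference then inverts this story: given the observed embeddings, recover the latent topics and topic Gaussians.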

38 Model General

39 Inference and Learning Collapsed Gibbs sampling for inference; update the embeddings during learning (different from LDA, where the observed variables are discrete). Sample latent variables → Update parameters → Update embeddings

40 Methods for Comparison

Method        Description
GenVector     Our model
GenVector-E   Our model w/o embedding update
Sys-Base      AMiner baseline: key term extraction
CountKG       Rank by frequency
Author-Topic  Classic topic models
NTN           Neural tensor networks

