Medical Semantic Similarity with a Neural Language Model. Dongfang Xu, School of Information. Using the Skip-Gram Model for Word Embedding.

Similar presentations
A Word at a Time Computing Word Relatedness using Temporal Semantic Analysis Kira Radinsky, Eugene Agichteiny, Evgeniy Gabrilovichz, Shaul Markovitch.

1 Latent Semantic Mapping: Dimensionality Reduction via Globally Optimal Continuous Parameter Modeling Jerome R. Bellegarda.
Deep Learning in NLP Word representation and how to use it for Parsing
DIMENSIONALITY REDUCTION BY RANDOM PROJECTION AND LATENT SEMANTIC INDEXING Jessica Lin and Dimitrios Gunopulos Ângelo Cardoso IST/UTL December
Latent Semantic Analysis (LSA). Introduction to LSA Learning Model Uses Singular Value Decomposition (SVD) to simulate human learning of word and passage.
Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, Bing Qin
Distributed Representations of Sentences and Documents
Yang-de Chen Tutorial: word2vec Yang-de Chen
Longbiao Kang, Baotian Hu, Xiangping Wu, Qingcai Chen, and Yan He Intelligent Computing Research Center, School of Computer Science and Technology, Harbin.
Richard Socher Cliff Chiung-Yu Lin Andrew Y. Ng Christopher D. Manning
CS365 Course Project Billion Word Imputation Guide: Prof. Amitabha Mukherjee Group 20: Aayush Mudgal [12008] Shruti Bhargava [13671]
Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng Computer Science Department, Stanford University, Stanford, CA 94305, USA Improving Word.
Pseudo-supervised Clustering for Text Documents Marco Maggini, Leonardo Rigutini, Marco Turchi Dipartimento di Ingegneria dell’Informazione Università.
Katrin Erk Vector space models of word meaning. Geometric interpretation of lists of feature/value pairs In cognitive science: representation of a concept.
2014 EMNLP Xinxiong Chen, Zhiyuan Liu, Maosong Sun State Key Laboratory of Intelligent Technology and Systems Tsinghua National Laboratory for Information.
Constructing Knowledge Graph from Unstructured Text Image Source: Kundan Kumar Siddhant Manocha.
Efficient Estimation of Word Representations in Vector Space
Gene Clustering by Latent Semantic Indexing of MEDLINE Abstracts Ramin Homayouni, Kevin Heinrich, Lai Wei, and Michael W. Berry University of Tennessee.
Evgeniy Gabrilovich and Shaul Markovitch
Omer Levy Yoav Goldberg Ido Dagan Bar-Ilan University Israel
Cold Start Problem in Movie Recommendation JIANG CAIGAO, WANG WEIYAN Group 20.
Accurate Cross-lingual Projection between Count-based Word Vectors by Exploiting Translatable Context Pairs SHONOSUKE ISHIWATARI NOBUHIRO KAJI NAOKI YOSHINAGA.
Link Distribution on Wikipedia [0407]KwangHee Park.
Advisor: Hsin-Hsi Chen Reporter: Chi-Hsin Yu Date: From Word Representations:... ACL2010, From Frequency... JAIR 2010 Representing Word... Psychological.
Learning to Rank: From Pairwise Approach to Listwise Approach Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li Presenter: Davidson Date:
Ganesh J, Soumyajit Ganguly, Manish Gupta, Vasudeva Varma, Vikram Pudi
Ganesh J1, Manish Gupta1,2 and Vasudeva Varma1
Vector Semantics Dense Vectors.
Efficient Estimation of Word Representations in Vector Space By Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. Google Inc., Mountain View, CA. Published.
Short Text Similarity with Word Embedding Date: 2016/03/28 Author: Tom Kenter, Maarten de Rijke Source: CIKM’15 Advisor: Jia-Ling Koh Speaker: Chih-Hsuan.
Parsing Natural Scenes and Natural Language with Recursive Neural Networks INTERNATIONAL CONFERENCE ON MACHINE LEARNING (ICML 2011) RICHARD SOCHER CLIFF.
Item Based Recommender System SUPERVISED BY: DR. MANISH KUMAR BAJPAI TARUN BHATIA ( ) VAIBHAV JAISWAL( )
DeepWalk: Online Learning of Social Representations
Intrinsic Subspace Evaluation of Word Embedding Representations Yadollah Yaghoobzadeh and Hinrich Schu ̈ tze Center for Information and Language Processing.
Distributed Representations for Natural Language Processing
Jonatas Wehrmann, Willian Becker, Henry E. L. Cagnini, and Rodrigo C
Neural Machine Translation
Sivan Biham & Adam Yaari
Korean version of GloVe Applying GloVe & word2vec model to Korean corpus speaker : 양희정 date :
Comparison with other Models Exploring Predictive Architectures
HyperNetworks Engın denız usta
CRF &SVM in Medication Extraction
Wei Wei, PhD, Zhanglong Ji, PhD, Lucila Ohno-Machado, MD, PhD
A Deep Learning Technical Paper Recommender System
Intro to NLP and Deep Learning
Distributed Representations of Words and Phrases and their Compositionality Presenter: Haotian Xu.
Vector-Space (Distributional) Lexical Semantics
Efficient Estimation of Word Representation in Vector Space
Word2Vec CS246 Junghoo “John” Cho.
Neural Language Model CS246 Junghoo “John” Cho.
Distributed Representation of Words, Sentences and Paragraphs
Word Embeddings with Limited Memory
Image Captions With Deep Learning Yulia Kogan & Ron Shiff
Learning Emoji Embeddings Using Emoji Co-Occurrence Network Graph
Word Embedding Word2Vec.
Word embeddings based mapping
Word embeddings based mapping
Sadov M. A. , NRU HSE, Moscow, Russia Kutuzov A. B
Socialized Word Embeddings
Vector Representation of Text
Word embeddings Text processing with current NNs requires encoding into vectors. One-hot encoding: N words encoded by length N vectors. A word gets a.
Word embeddings (continued)
Learning to Rank with Ties
Introduction to Sentiment Analysis
Artificial Intelligence 2004 Speech & Natural Language Processing
Latent semantic space: Iterative scaling improves precision of inter-document similarity measurement Rie Kubota Ando. Latent semantic space: Iterative.
Fast and Discriminative Semantic Embedding
Vector Representation of Text
CS249: Neural Language Model
Professor Junghoo “John” Cho UCLA
Presentation transcript:

Medical Semantic Similarity with a Neural Language Model. Dongfang Xu, School of Information. Using the Skip-Gram Model for Word Embedding.

Outline Introduction & Background – Word embedding – Skip-Gram Model Similarity Experiment – Training Process – Result

Word Embedding Word embedding, also called word representation, is a technique in NLP in which words are represented (embedded) as low-dimensional continuous vectors, so that semantically similar words are mapped to nearby points. 1. Methods for producing such representations include neural networks, dimensionality reduction on the word co-occurrence matrix, and probabilistic models. 2. They are based on the distributional hypothesis: linguistic items with similar distributions have similar meanings.

Word Embedding Count-based methods compute statistics of how often each word co-occurs with its neighbouring words in a large text corpus, and then map these count statistics down to a small, dense vector for each word. Predictive methods instead try to predict a word from its neighbours in terms of learned small, dense embedding vectors (which are treated as parameters of the model). Tips: to check whether the embedding vectors are meaningful, we can always run a similarity comparison (see the sketch below); word embeddings boost performance in NLP tasks such as syntactic parsing and sentiment analysis.
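As a concrete illustration of the similarity check mentioned above, here is a minimal sketch using gensim; the saved-vectors file name and the query terms are illustrative assumptions, not taken from the slides.

```python
from gensim.models import KeyedVectors

# Load previously trained embedding vectors (the file name is a placeholder).
vectors = KeyedVectors.load("embeddings.kv")

# If the embedding is meaningful, the nearest neighbours of a medical term
# should themselves be semantically related terms.
for word in ["diabetes", "insulin"]:               # illustrative query terms
    print(word, vectors.most_similar(word, topn=5))

# Direct pairwise comparison: cosine similarity between two word vectors.
print(vectors.similarity("diabetes", "insulin"))
```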

Word Embedding For predictive methods, word embeddings come in two forms: (1) the Continuous Bag-of-Words model (CBOW) – Input of training: w_{i−2}, w_{i−1}, w_{i+1}, w_{i+2} – Output of training: w_i – Idea: predict the word given its context. – Advantages: slightly better accuracy for frequent words; works better with a larger dataset. (2) the Skip-Gram model – Input of training: w_i – Output of training: w_{i−2}, w_{i−1}, w_{i+1}, w_{i+2} – Idea: predict the context given a word. – Advantages: works well with a small amount of training data and represents even rare words or phrases well.
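To make the difference between the two models concrete, the toy sketch below (not code from the slides) lists the (input, output) training pairs that CBOW and Skip-Gram would derive from a sentence with a window size of 2.

```python
def training_pairs(tokens, window=2, mode="skipgram"):
    """Generate (input, output) training pairs for a toy CBOW/Skip-Gram setup."""
    pairs = []
    for i, centre in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        if mode == "cbow":
            pairs.append((context, centre))              # context words -> centre word
        else:
            pairs.extend((centre, c) for c in context)   # centre word -> each context word
    return pairs

sentence = "the patient was diagnosed with type two diabetes".split()
print(training_pairs(sentence, mode="cbow")[:2])
print(training_pairs(sentence, mode="skipgram")[:4])
```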

Skip-Gram Model

Notation: $w_t$ is the input (centre) word; $c$ is the window size; $T$ is the number of words in the training corpus; $W$ is the vocabulary size; $v_w$ and $v'_w$ are the input and output vector representations of word $w$. Objective function: maximise the average log probability
$$\frac{1}{T}\sum_{t=1}^{T}\;\sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j} \mid w_t),$$
where the probability of the output word $w_O$ given the input word $w_I$ is defined by the softmax
$$p(w_O \mid w_I) = \frac{\exp\!\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\!\left({v'_{w}}^{\top} v_{w_I}\right)}.$$
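As a toy illustration of the softmax above (not part of the original slides; the vocabulary size and dimensionality are arbitrary), the probability p(w_O | w_I) can be computed from the input and output vectors like this:

```python
import numpy as np

W, d = 5, 4                        # toy vocabulary size and embedding dimension
rng = np.random.default_rng(0)
V_in = rng.normal(size=(W, d))     # input vectors  v_w
V_out = rng.normal(size=(W, d))    # output vectors v'_w

def p_output_given_input(o, i):
    """Softmax probability p(w_O | w_I) over the whole vocabulary."""
    scores = V_out @ V_in[i]       # inner products v'_w . v_{w_I} for every w
    scores -= scores.max()         # subtract the max for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[o]

print(p_output_given_input(o=2, i=0))
```

In practice the full softmax is too costly for large vocabularies, which is why Mikolov et al. (see References) train with approximations such as hierarchical softmax or negative sampling.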

Outline Introduction & Background – Word embedding – Skip-Gram Model Similarity Experiment – Training Process – Method – Result & Discussion Conclusion

Similarity Experiment

Training Process Two sets of training data – MedTrack: a collection of 17,198 clinical patient records used in the TREC 2011 and 2012 Medical Records Track. – OHSUMED: a collection of 348,566 MEDLINE medical journal abstracts used in the TREC 2000 Filtering Track. Replace terms and compound terms with concept IDs – using MetaMap (which maps free text to UMLS concepts) to convert each free-text sequence into a concept sequence.
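MetaMap itself is a standalone NLM tool, so the sketch below is only a toy stand-in for this replacement step: it swaps known surface terms for concept IDs using a hand-built dictionary whose entries are made up for illustration.

```python
# Toy stand-in for the MetaMap step: map surface terms to concept IDs.
# The dictionary entries are illustrative placeholders, not real UMLS output.
concept_map = {
    "heart attack": "C0000001",
    "aspirin": "C0000002",
}

def to_concept_sequence(text):
    """Replace known terms/compound terms with concept IDs, longest match first."""
    for term in sorted(concept_map, key=len, reverse=True):
        text = text.replace(term, concept_map[term])
    return text.split()

print(to_concept_sequence("patient given aspirin after heart attack"))
```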

Training Process Parameter settings for the Skip-Gram model – window size. – dimension of the embedding vectors.
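A hedged sketch of what this training step could look like with gensim's Word2Vec in Skip-Gram mode (assuming gensim 4.x); the corpus file name, window size, and vector dimension are illustrative assumptions, since the slides do not state the exact values used.

```python
from gensim.models import Word2Vec

# Each line of the (hypothetical) corpus file is one record or abstract that has
# already been converted into a sequence of UMLS concept IDs.
with open("medtrack_concepts.txt", encoding="utf-8") as f:
    sentences = [line.split() for line in f]

model = Word2Vec(
    sentences,
    sg=1,             # 1 = Skip-Gram, 0 = CBOW
    vector_size=200,  # dimension of the embedding vectors (illustrative value)
    window=5,         # context window size (illustrative value)
    min_count=5,      # drop very rare concepts
    workers=4,
)
model.wv.save("embeddings.kv")  # concept vectors reused in the similarity experiment
```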

Methods Experiment steps – 1. After obtaining the vector of each concept from the training dataset, compute the cosine similarity for each concept pair in the test dataset. – 2. Compute the Pearson correlation coefficient between the similarity values given by the experts and those given by the neural language model (NLM). – 3. Compare the performance of the NLM with other semantic similarity approaches.
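A minimal sketch of steps 1 and 2 using numpy/scipy; the concept pairs and expert scores below are made-up placeholders rather than the actual Ped or Cav data.

```python
from gensim.models import KeyedVectors
from scipy.stats import pearsonr

vectors = KeyedVectors.load("embeddings.kv")   # trained concept vectors (placeholder path)

# Hypothetical test set: (concept ID 1, concept ID 2, expert similarity judgement).
test_pairs = [
    ("C0000001", "C0000002", 3.2),
    ("C0000003", "C0000004", 1.5),
    ("C0000005", "C0000006", 2.8),
]

# Step 1: cosine similarity between the two concept vectors of each pair.
model_scores = [vectors.similarity(c1, c2) for c1, c2, _ in test_pairs]
expert_scores = [score for _, _, score in test_pairs]

# Step 2: Pearson correlation between model similarities and expert judgements.
r, p_value = pearsonr(model_scores, expert_scores)
print(f"Pearson r = {r:.3f} (p = {p_value:.3f})")
```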

Methods Test datasets – Ped: 29 UMLS medical concept pairs developed by Pedersen et al. [15]. Semantic similarity judgements were provided by 3 physicians and 9 clinical terminologists, with a reported inter-coder correlation. – Cav: 45 MeSH/UMLS concept pairs developed by Caviedes and Cimino [5]. Similarity between concept pairs was judged by 3 physicians, with no exact consensus value reported by Caviedes and Cimino.

Methods

Results

References
De Vine, L., Zuccon, G., Koopman, B., Sitbon, L., & Bruza, P. (2014, November). Medical semantic similarity with a neural language model. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM '14). ACM.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (NIPS 2013).

Thank you!