Medical Semantic Similarity with a Neural Language Model. Dongfang Xu, School of Information. Using the Skip-Gram Model for Word Embedding.

Similar presentations
A Word at a Time Computing Word Relatedness using Temporal Semantic Analysis Kira Radinsky, Eugene Agichteiny, Evgeniy Gabrilovichz, Shaul Markovitch.

1 Latent Semantic Mapping: Dimensionality Reduction via Globally Optimal Continuous Parameter Modeling Jerome R. Bellegarda.
Deep Learning in NLP Word representation and how to use it for Parsing
DIMENSIONALITY REDUCTION BY RANDOM PROJECTION AND LATENT SEMANTIC INDEXING Jessica Lin and Dimitrios Gunopulos Ângelo Cardoso IST/UTL December
Latent Semantic Analysis (LSA). Introduction to LSA Learning Model Uses Singular Value Decomposition (SVD) to simulate human learning of word and passage.
Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, Bing Qin
Distributed Representations of Sentences and Documents
Yang-de Chen Tutorial: word2vec Yang-de Chen
Longbiao Kang, Baotian Hu, Xiangping Wu, Qingcai Chen, and Yan He Intelligent Computing Research Center, School of Computer Science and Technology, Harbin.
Richard Socher Cliff Chiung-Yu Lin Andrew Y. Ng Christopher D. Manning
CS365 Course Project Billion Word Imputation Guide: Prof. Amitabha Mukherjee Group 20: Aayush Mudgal [12008] Shruti Bhargava [13671]
Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng Computer Science Department, Stanford University, Stanford, CA 94305, USA Improving Word.
Pseudo-supervised Clustering for Text Documents Marco Maggini, Leonardo Rigutini, Marco Turchi Dipartimento di Ingegneria dell’Informazione Università.
Katrin Erk Vector space models of word meaning. Geometric interpretation of lists of feature/value pairs In cognitive science: representation of a concept.
2014 EMNLP Xinxiong Chen, Zhiyuan Liu, Maosong Sun State Key Laboratory of Intelligent Technology and Systems Tsinghua National Laboratory for Information.
Constructing Knowledge Graph from Unstructured Text Image Source: Kundan Kumar Siddhant Manocha.
Efficient Estimation of Word Representations in Vector Space
Gene Clustering by Latent Semantic Indexing of MEDLINE Abstracts Ramin Homayouni, Kevin Heinrich, Lai Wei, and Michael W. Berry University of Tennessee.
Evgeniy Gabrilovich and Shaul Markovitch
Omer Levy Yoav Goldberg Ido Dagan Bar-Ilan University Israel
Cold Start Problem in Movie Recommendation JIANG CAIGAO, WANG WEIYAN Group 20.
Accurate Cross-lingual Projection between Count-based Word Vectors by Exploiting Translatable Context Pairs SHONOSUKE ISHIWATARI NOBUHIRO KAJI NAOKI YOSHINAGA.
Link Distribution on Wikipedia [0407]KwangHee Park.
Advisor: Hsin-Hsi Chen Reporter: Chi-Hsin Yu Date: From Word Representations:... ACL2010, From Frequency... JAIR 2010 Representing Word... Psychological.
Learning to Rank: From Pairwise Approach to Listwise Approach Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li Presenter: Davidson Date:
Ganesh J, Soumyajit Ganguly, Manish Gupta, Vasudeva Varma, Vikram Pudi
Ganesh J1, Manish Gupta1,2 and Vasudeva Varma1
Vector Semantics Dense Vectors.
Efficient Estimation of Word Representations in Vector Space By Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. Google Inc., Mountain View, CA. Published.
Short Text Similarity with Word Embedding Date: 2016/03/28 Author: Tom Kenter, Maarten de Rijke Source: CIKM’15 Advisor: Jia-Ling Koh Speaker: Chih-Hsuan.
Parsing Natural Scenes and Natural Language with Recursive Neural Networks INTERNATIONAL CONFERENCE ON MACHINE LEARNING (ICML 2011) RICHARD SOCHER CLIFF.
Item Based Recommender System SUPERVISED BY: DR. MANISH KUMAR BAJPAI TARUN BHATIA ( ) VAIBHAV JAISWAL( )
DeepWalk: Online Learning of Social Representations
Intrinsic Subspace Evaluation of Word Embedding Representations Yadollah Yaghoobzadeh and Hinrich Schu ̈ tze Center for Information and Language Processing.
Distributed Representations for Natural Language Processing
Jonatas Wehrmann, Willian Becker, Henry E. L. Cagnini, and Rodrigo C
Neural Machine Translation
Sivan Biham & Adam Yaari
Korean version of GloVe Applying GloVe & word2vec model to Korean corpus speaker : 양희정 date :
Comparison with other Models Exploring Predictive Architectures
HyperNetworks Engın denız usta
CRF &SVM in Medication Extraction
Wei Wei, PhD, Zhanglong Ji, PhD, Lucila Ohno-Machado, MD, PhD
A Deep Learning Technical Paper Recommender System
Intro to NLP and Deep Learning
Distributed Representations of Words and Phrases and their Compositionality Presenter: Haotian Xu.
Vector-Space (Distributional) Lexical Semantics
Efficient Estimation of Word Representation in Vector Space
Word2Vec CS246 Junghoo “John” Cho.
Neural Language Model CS246 Junghoo “John” Cho.
Distributed Representation of Words, Sentences and Paragraphs
Word Embeddings with Limited Memory
Image Captions With Deep Learning Yulia Kogan & Ron Shiff
Learning Emoji Embeddings Using Emoji Co-Occurrence Network Graph
Word Embedding Word2Vec.
Word embeddings based mapping
Word embeddings based mapping
Sadov M. A. , NRU HSE, Moscow, Russia Kutuzov A. B
Socialized Word Embeddings
Vector Representation of Text
Word embeddings Text processing with current NNs requires encoding into vectors. One-hot encoding: N words encoded by length N vectors. A word gets a.
Word embeddings (continued)
Learning to Rank with Ties
Introduction to Sentiment Analysis
Artificial Intelligence 2004 Speech & Natural Language Processing
Latent semantic space: Iterative scaling improves precision of inter-document similarity measurement Rie Kubota Ando. Latent semantic space: Iterative.
Fast and Discriminative Semantic Embedding
Vector Representation of Text
CS249: Neural Language Model
Professor Junghoo “John” Cho UCLA
Presentation transcript:

Medical Semantic Similarity with a Neural Language Model. Dongfang Xu, School of Information. Using the Skip-Gram Model for Word Embedding.

Outline Introduction & Background – Word embedding – Skip-Gram Model Similarity Experiment – Training Process – Result

Word Embedding Word embedding, also called word representation, is a technique in NLP in which words are represented (embedded) as low-dimensional continuous vectors, so that semantically similar words are mapped to nearby points. 1. Methods for producing such representations include neural networks, dimensionality reduction on the word co-occurrence matrix, and probabilistic models. 2. They are based on the distributional hypothesis: linguistic items with similar distributions have similar meanings.

Word Embedding Count-based methods compute statistics of how often each word co-occurs with its neighbouring words in a large text corpus, and then map these count statistics down to a small, dense vector for each word. Predictive methods instead try to predict a word from its neighbours in terms of learned small, dense embedding vectors (which are treated as parameters of the model). Tips: to check whether the embedding vectors are meaningful, we can always run a similarity comparison (see the sketch below); word embeddings boost performance in NLP tasks such as syntactic parsing and sentiment analysis.
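As a concrete illustration of the similarity check mentioned above, here is a minimal sketch using gensim; the saved-vectors file name and the query terms are illustrative assumptions, not taken from the slides.

```python
from gensim.models import KeyedVectors

# Load previously trained embedding vectors (the file name is a placeholder).
vectors = KeyedVectors.load("embeddings.kv")

# If the embedding is meaningful, the nearest neighbours of a medical term
# should themselves be semantically related terms.
for word in ["diabetes", "insulin"]:               # illustrative query terms
    print(word, vectors.most_similar(word, topn=5))

# Direct pairwise comparison: cosine similarity between two word vectors.
print(vectors.similarity("diabetes", "insulin"))
```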

Word Embedding For predictive methods, word embeddings come in two forms: (1) the Continuous Bag-of-Words model (CBOW) – Input of training: w_{i−2}, w_{i−1}, w_{i+1}, w_{i+2} – Output of training: w_i – Idea: predict the word given its context. – Advantages: slightly better accuracy for frequent words; works better with a larger dataset. (2) the Skip-Gram model – Input of training: w_i – Output of training: w_{i−2}, w_{i−1}, w_{i+1}, w_{i+2} – Idea: predict the context given a word. – Advantages: works well with a small amount of training data and represents even rare words or phrases well.
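To make the difference between the two models concrete, the toy sketch below (not code from the slides) lists the (input, output) training pairs that CBOW and Skip-Gram would derive from a sentence with a window size of 2.

```python
def training_pairs(tokens, window=2, mode="skipgram"):
    """Generate (input, output) training pairs for a toy CBOW/Skip-Gram setup."""
    pairs = []
    for i, centre in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        if mode == "cbow":
            pairs.append((context, centre))              # context words -> centre word
        else:
            pairs.extend((centre, c) for c in context)   # centre word -> each context word
    return pairs

sentence = "the patient was diagnosed with type two diabetes".split()
print(training_pairs(sentence, mode="cbow")[:2])
print(training_pairs(sentence, mode="skipgram")[:4])
```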

Skip-Gram Model

Notation: $w_t$ is the input (centre) word; $c$ is the window size; $T$ is the number of words in the training corpus; $W$ is the vocabulary size; $v_w$ and $v'_w$ are the input and output vector representations of word $w$. Objective function: maximise the average log probability
$$\frac{1}{T}\sum_{t=1}^{T}\;\sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j} \mid w_t),$$
where the probability of the output word $w_O$ given the input word $w_I$ is defined by the softmax
$$p(w_O \mid w_I) = \frac{\exp\!\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\!\left({v'_{w}}^{\top} v_{w_I}\right)}.$$
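As a toy illustration of the softmax above (not part of the original slides; the vocabulary size and dimensionality are arbitrary), the probability p(w_O | w_I) can be computed from the input and output vectors like this:

```python
import numpy as np

W, d = 5, 4                        # toy vocabulary size and embedding dimension
rng = np.random.default_rng(0)
V_in = rng.normal(size=(W, d))     # input vectors  v_w
V_out = rng.normal(size=(W, d))    # output vectors v'_w

def p_output_given_input(o, i):
    """Softmax probability p(w_O | w_I) over the whole vocabulary."""
    scores = V_out @ V_in[i]       # inner products v'_w . v_{w_I} for every w
    scores -= scores.max()         # subtract the max for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[o]

print(p_output_given_input(o=2, i=0))
```

In practice the full softmax is too costly for large vocabularies, which is why Mikolov et al. (see References) train with approximations such as hierarchical softmax or negative sampling.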

Outline Introduction & Background – Word embedding – Skip-Gram Model Similarity Experiment – Training Process – Method – Result & Discussion Conclusion

Similarity Experiment

Training Process Two sets of training data – MedTrack: a collection of 17,198 clinical patient records used in the TREC 2011 and 2012 Medical Records Track. – OHSUMED: a collection of 348,566 MEDLINE medical journal abstracts used in the TREC 2000 Filtering Track. Replace terms and compound terms with concept IDs – using MetaMap (which maps free text to UMLS concepts) to convert each free-text sequence into a concept sequence.
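MetaMap itself is a standalone NLM tool, so the sketch below is only a toy stand-in for this replacement step: it swaps known surface terms for concept IDs using a hand-built dictionary whose entries are made up for illustration.

```python
# Toy stand-in for the MetaMap step: map surface terms to concept IDs.
# The dictionary entries are illustrative placeholders, not real UMLS output.
concept_map = {
    "heart attack": "C0000001",
    "aspirin": "C0000002",
}

def to_concept_sequence(text):
    """Replace known terms/compound terms with concept IDs, longest match first."""
    for term in sorted(concept_map, key=len, reverse=True):
        text = text.replace(term, concept_map[term])
    return text.split()

print(to_concept_sequence("patient given aspirin after heart attack"))
```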

Training Process Parameter settings for the Skip-Gram model – window size. – dimension of the embedding vectors.
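A hedged sketch of what this training step could look like with gensim's Word2Vec in Skip-Gram mode (assuming gensim 4.x); the corpus file name, window size, and vector dimension are illustrative assumptions, since the slides do not state the exact values used.

```python
from gensim.models import Word2Vec

# Each line of the (hypothetical) corpus file is one record or abstract that has
# already been converted into a sequence of UMLS concept IDs.
with open("medtrack_concepts.txt", encoding="utf-8") as f:
    sentences = [line.split() for line in f]

model = Word2Vec(
    sentences,
    sg=1,             # 1 = Skip-Gram, 0 = CBOW
    vector_size=200,  # dimension of the embedding vectors (illustrative value)
    window=5,         # context window size (illustrative value)
    min_count=5,      # drop very rare concepts
    workers=4,
)
model.wv.save("embeddings.kv")  # concept vectors reused in the similarity experiment
```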

Methods Experiment steps – 1. After obtaining the vector of each concept from the training dataset, compute the cosine similarity for each concept pair in the test dataset. – 2. Compute the Pearson correlation coefficient between the similarity values given by the experts and those given by the neural language model (NLM). – 3. Compare the performance of the NLM with other semantic similarity approaches.
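A minimal sketch of steps 1 and 2 using numpy/scipy; the concept pairs and expert scores below are made-up placeholders rather than the actual Ped or Cav data.

```python
from gensim.models import KeyedVectors
from scipy.stats import pearsonr

vectors = KeyedVectors.load("embeddings.kv")   # trained concept vectors (placeholder path)

# Hypothetical test set: (concept ID 1, concept ID 2, expert similarity judgement).
test_pairs = [
    ("C0000001", "C0000002", 3.2),
    ("C0000003", "C0000004", 1.5),
    ("C0000005", "C0000006", 2.8),
]

# Step 1: cosine similarity between the two concept vectors of each pair.
model_scores = [vectors.similarity(c1, c2) for c1, c2, _ in test_pairs]
expert_scores = [score for _, _, score in test_pairs]

# Step 2: Pearson correlation between model similarities and expert judgements.
r, p_value = pearsonr(model_scores, expert_scores)
print(f"Pearson r = {r:.3f} (p = {p_value:.3f})")
```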

Methods Test datasets – Ped: 29 UMLS medical concept pairs developed by Pedersen et al. [15]. Semantic similarity judgements were provided by 3 physicians and 9 clinical terminologists, with a reported inter-coder correlation. – Cav: 45 MeSH/UMLS concept pairs developed by Caviedes and Cimino [5]. Similarity between concept pairs was judged by 3 physicians, with no exact consensus value reported by Caviedes and Cimino.

Methods

Results

References
De Vine, L., Zuccon, G., Koopman, B., Sitbon, L., & Bruza, P. (2014, November). Medical semantic similarity with a neural language model. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM '14). ACM.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (NIPS 2013).

Thank you!