
1 Word embeddings based mapping
Raymond ZHAO Wenlong (Updated on 15/08/2018)

2 Word embeddings
- Vector space models represent words as low-dimensional, fixed-size vectors
- They try to capture word relations via inner products
- They can group semantically similar words and encode rich linguistic patterns (e.g. word2vec (Mikolov et al., 2013) or GloVe (Pennington et al., 2014))
- To apply a vector model to a sentence or document, one must select an appropriate composition function
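A minimal sketch of the point about capturing word relations via inner products: load pre-trained GloVe vectors and compare cosine similarities. The file name glove.6B.100d.txt and the example words are illustrative assumptions, not taken from the slides.

```python
# Minimal sketch: measuring word relatedness with pre-trained GloVe vectors.
# The file name below is an assumption; any GloVe text file works the same way.
import numpy as np

def load_glove(path):
    """Read GloVe text format: one word per line followed by its vector."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

glove = load_glove("glove.6B.100d.txt")
# Semantically similar words should score higher than unrelated ones.
print(cosine(glove["laptop"], glove["notebook"]))   # relatively high
print(cosine(glove["laptop"], glove["banana"]))     # relatively low
```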

3 A typical NN Model
- A composition function g + a classifier (on the final representation)
- A composition function is a mathematical process for combining multiple word vectors into a single vector
- Unordered functions: treat input texts as bags of word embeddings
- Syntactic functions: take word order and sentence structure into account, e.g. CNNs and RNNs; g may depend on a parse tree of the input sequence (a recursive NN composes over a syntactic parse tree)
- Syntactic functions require more training time on huge datasets
- A deep unordered model: apply a composition function g to the sequence of word embeddings Vw; the output is a vector z that serves as input to a logistic regression classifier (contrasted in the sketch below)
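As a short illustration of the two kinds of composition function g, here is a hedged Keras sketch (not from the slides) that builds an unordered bag-of-embeddings average and an order-sensitive LSTM over the same sequence of word embeddings Vw; vocabulary size, dimensions, and layer sizes are arbitrary assumptions.

```python
# Sketch of two composition functions g over the same sequence of word embeddings.
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim, max_len, num_classes = 20000, 100, 50, 5   # illustrative sizes

inputs = layers.Input(shape=(max_len,), dtype="int32")
embedded = layers.Embedding(vocab_size, embed_dim)(inputs)        # sequence of word vectors Vw

# Unordered composition: treat the text as a bag of embeddings (word order ignored).
z_unordered = layers.GlobalAveragePooling1D()(embedded)

# Syntactic composition: an RNN (here an LSTM) that is sensitive to word order.
z_syntactic = layers.LSTM(64)(embedded)

# Either representation z can feed a logistic-regression-style classifier.
model_unordered = tf.keras.Model(inputs, layers.Dense(num_classes, activation="softmax")(z_unordered))
model_syntactic = tf.keras.Model(inputs, layers.Dense(num_classes, activation="softmax")(z_syntactic))
```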

4 SWEM - A deep unordered model
- SWEM (simple word-embedding model), Duke University, ACL 2018
- Source code is on GitHub
- Obtains near state-of-the-art accuracy on sentence- and document-level tasks

5 Paper’s results: document-level classification
- Datasets: Yahoo! Answers and AG News
- SWEM exhibits stronger performance than both LSTM and CNN compositional architectures
- Marries the speed of unordered functions with the accuracy of syntactic functions
- Computationally efficient: fewer parameters

6 Paper’s results: sentence-level tasks
- SWEM yields inferior accuracy on sentence-level tasks
- Sentences are approximately 20 words long on average

7 Simple word-embedding model
- SWEM-aver: takes every word in the sequence into account via averaging (uses the information of each word)
- SWEM-max: max pooling extracts the most salient features (uses the information of key words)
- SWEM-concat: concatenates the SWEM-aver and SWEM-max representations
- SWEM-hier: SWEM-aver over local windows, then global max pooling over the windows (n-gram-like)
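The four pooling variants listed above can be written in a few lines of NumPy; this is a sketch of the operations, not the authors' code, and the window size of 5 for SWEM-hier is an assumption.

```python
# Sketch of the four SWEM pooling operations over a sequence of word embeddings.
# `emb` is an (n_words, dim) matrix of word vectors for one text.
import numpy as np

def swem_aver(emb):
    return emb.mean(axis=0)                      # average pooling over all words

def swem_max(emb):
    return emb.max(axis=0)                       # max pooling: most salient value per dimension

def swem_concat(emb):
    return np.concatenate([swem_aver(emb), swem_max(emb)])   # both views, a 2*dim vector

def swem_hier(emb, window=5):
    # Average within each local window (n-gram-like), then max-pool over the windows.
    n = emb.shape[0]
    windows = [emb[i:i + window].mean(axis=0) for i in range(max(1, n - window + 1))]
    return np.max(np.stack(windows), axis=0)

emb = np.random.randn(20, 100)    # 20 words, 100-dimensional embeddings
print(swem_concat(emb).shape)     # (200,)
```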

8 The experiments: SWEM-aver using Keras (current baseline model)
- Use our Amazon review texts (830k texts, 19.8k unique tokens)
- Use pre-trained GloVe word embeddings (trained on a 1B-token corpus)
- Current baseline model: multiclass logistic regression with activation='sigmoid' and loss='categorical_crossentropy'
- Multi-class classification on Keras/TensorFlow
- Current accuracy of the CPU classification:
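A minimal Keras sketch of the baseline described above: SWEM-aver over frozen GloVe embeddings feeding a multiclass logistic-regression head. The vocabulary size, sequence length, and number of classes are illustrative assumptions; the slide's activation='sigmoid' with categorical_crossentropy is kept as stated, although softmax is the more common choice for mutually exclusive classes.

```python
# Sketch of the SWEM-aver baseline on Keras/TensorFlow (illustrative sizes).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim, max_len, num_classes = 19800, 100, 200, 4
# Placeholder matrix; in practice each row holds the GloVe vector of one vocabulary token.
embedding_matrix = np.random.randn(vocab_size, embed_dim).astype("float32")

model = tf.keras.Sequential([
    layers.Embedding(vocab_size, embed_dim, trainable=False),  # frozen pre-trained embeddings
    layers.GlobalAveragePooling1D(),                           # SWEM-aver composition
    layers.Dense(num_classes, activation="sigmoid"),           # slide's setting; softmax is more usual
])
model.build(input_shape=(None, max_len))
model.layers[0].set_weights([embedding_matrix])                # plug in the GloVe rows
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train_onehot, validation_split=0.1, epochs=5)
```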

9 The experiments: SWEM-aver algorithm using Keras
- Current experiments on the RAM, Screen Size, Hard Disk, and Graphics Coprocessor configurator attributes

10 The experiments: SWEM-max algorithm
- Current experiments on the RAM, Screen Size, Hard Disk, and Graphics Coprocessor configurator attributes

11 The experiments: SWEM-concat algorithm
- Concatenate the SWEM-aver and SWEM-max representations
- Remove punctuation (gives a slight improvement)

12 The experiments: to do
- Try the SWEM-hier algorithm
- Try SVM/CRF classifiers instead of the current multiclass logistic regression (see the sketch below)
- Try a topic model for short texts
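One possible way to try the SVM item above (an assumption, not the authors' setup): build SWEM-aver features from the pre-trained GloVe vectors and train a scikit-learn linear SVM on them.

```python
# Sketch: linear SVM on SWEM-aver features instead of the logistic-regression head.
# `glove` is a dict from word to vector, as in the earlier loading sketch.
import numpy as np
from sklearn.svm import LinearSVC

def swem_aver_features(texts, glove, dim=100):
    feats = []
    for text in texts:
        vecs = [glove[w] for w in text.lower().split() if w in glove]
        feats.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
    return np.vstack(feats)

# Hypothetical training data: the labelled Amazon review texts and their classes.
# X = swem_aver_features(train_texts, glove)
# clf = LinearSVC().fit(X, train_labels)
# predictions = clf.predict(swem_aver_features(test_texts, glove))
```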

13 Thanks
Thanks to Dr. Wong

