Word2vec Tutorial
Zhe Ye, yezhejack@sjtu.edu.cn, 2017.9.28

1 Word2vec Tutorial
Zhe Ye, yezhejack@sjtu.edu.cn, 2017.9.28

2 Outline
One-hot representation vs word vectors
Requirement: virtual environment, Python, corpus, Gensim
Training word vectors
Evaluation: analogy, word clustering

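3 One-hot representation vs word vectors
One-hot representation
Sparse: uses 3000K dimensions to represent a vocabulary of 3000K word types
Words are not related: any two distinct one-hot vectors are orthogonal
Word vectors
Dense: uses 300 (or fewer) dimensions to represent a vocabulary of 3000K word types
Words are related: the vectors of different words can be compared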
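
The contrast on this slide can be sketched in a few lines of plain Python. The five-word vocabulary and the three-dimensional dense vectors below are hypothetical values picked by hand for illustration, not trained vectors, and the code assumes a current Python 3 interpreter.

```python
# Toy vocabulary; the slide's real setting is about 3000K word types.
vocab = ["we", "are", "very", "happy", "glad"]
word_to_index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Return a vector with one dimension per word type and a single 1."""
    vector = [0.0] * len(vocab)
    vector[word_to_index[word]] = 1.0
    return vector


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical dense vectors, hand-picked for illustration (not trained).
dense = {
    "happy": [0.9, 0.1, 0.3],
    "glad": [0.8, 0.2, 0.3],
    "are": [-0.5, 0.7, 0.0],
}

# Distinct one-hot vectors never share a non-zero dimension,
# so their dot product is always 0.
print(dot(one_hot("happy"), one_hot("glad")))

# Dense vectors can overlap, so a related pair can score higher
# than an unrelated pair.
print(dot(dense["happy"], dense["glad"]), dot(dense["happy"], dense["are"]))
```

With one-hot vectors every pair of distinct words looks equally unrelated, while dense vectors leave room for graded similarity.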

5 Virtual environment
Features
Provides a separate set of dependency libraries for each environment
Does not require admin or sudo rights to install packages
Two well-known tools that provide these features
Anaconda: convenient on Windows (e.g. for installing scipy)
Virtualenv

6 Python
Very popular in NLP and data science
Simple and easy to understand
Version: Python 2.7

7 Corpus
Tokenized plain text (Chinese and English are both fine)
我们 很 高兴 (Chinese: "We are very happy")
We are very happy .
Tokenized plain text resource
atmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz
Tokenizers
LTP for Chinese
Stanford Tokenizer
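
As a minimal sketch of what "tokenized plain text" means in practice: each line is one sentence and tokens are separated by whitespace, so reading the corpus reduces to splitting lines. The helper name `read_tokenized` is hypothetical, and an in-memory buffer stands in for a real corpus file.

```python
import io


def read_tokenized(lines):
    """Yield one list of tokens per non-empty line, splitting on whitespace."""
    for line in lines:
        tokens = line.split()
        if tokens:
            yield tokens


# In-memory stand-in for a corpus file, using the two sentences from the slide.
corpus = io.StringIO(u"我们 很 高兴\nWe are very happy .\n")
sentences = list(read_tokenized(corpus))
print(sentences)
```

For a file on disk, the same function could be fed `io.open(path, encoding="utf-8")`, where `path` is whatever location the corpus was saved to.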

8 Gensim
Implements a wrapper for word2vec
Provides a Python API

10 Training word vectors
Linux + virtualenv + gensim is recommended
Windows 10 (64-bit) + Anaconda + gensim also works

12 Evaluation
Analogy
vector('Paris') - vector('France') + vector('Italy') ≈ vector('Rome')
Word clustering
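
The analogy test can be sketched as vector arithmetic followed by a nearest-neighbour search under cosine similarity. The two-dimensional vectors below are hypothetical, chosen by hand so that capital and country differ by a roughly shared offset; real word vectors would come from a trained model, and the function name `analogy` is an illustrative choice.

```python
import math

# Hypothetical 2-d vectors, hand-picked so capital - country is roughly constant.
vectors = {
    "paris": [1.0, 3.0],
    "france": [1.0, 1.0],
    "rome": [3.0, 3.2],
    "italy": [3.0, 1.0],
    "berlin": [5.0, 3.1],
    "germany": [5.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Return the word nearest to vector(a) - vector(b) + vector(c).

    The three query words are excluded from the candidates.
    """
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(target, vectors[word]))


print(analogy("paris", "france", "italy"))
```

With a trained gensim model, the comparable query is `model.wv.most_similar(positive=["paris", "italy"], negative=["france"])`, which ranks the vocabulary by cosine similarity to the combined vector.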

