BERT (李宏毅 Hung-yi Lee)
Reference: Contextual Word Representations: Putting Words into Computers (note on the slide: "this one is good; it mainly analyzes BERT").
Word Embedding
1-of-N Encoding: each word is a one-hot vector, e.g. apple = [1 0 0 0 0], bag = [0 1 0 0 0], cat = [0 0 1 0 0], dog = [0 0 0 1 0], elephant = [0 0 0 0 1].
Word Class: words are grouped into classes, e.g. Class 1 = {dog, cat, bird}, Class 2 = {ran, jumped, walk}, Class 3 = {flower, tree, apple}.
Word Embedding: each word is mapped to a continuous vector, so that related words (dog, rabbit, cat; run, jump; tree, flower) end up close together.
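To make the 1-of-N encoding concrete, here is a minimal Python sketch; the toy vocabulary and the helper name one_hot are illustrative, not from the slides:

```python
# Minimal sketch of 1-of-N (one-hot) encoding for a toy vocabulary.
vocab = ["apple", "bag", "cat", "dog", "elephant"]

def one_hot(word):
    """Return the 1-of-N encoding of `word` as a list with a single 1."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("apple"))  # [1, 0, 0, 0, 0]
print(one_hot("dog"))    # [0, 0, 0, 1, 0]
```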
A word can have multiple senses.
Examples:
Have you paid that money to the bank yet?
It is safest to deposit your money in the bank.
The victim was found lying dead on the river bank.
They stood on the river bank to fish.
The hospital has its own blood bank. (Is this a third sense, or not?)
More Examples
This is the escort ship Kaga (加賀號護衛艦); he is Nero (尼祿). This is also Kaga; she is also Nero. (The slide shows images of different entities that share the same names.)
Contextualized Word Embedding
Each occurrence of a word gets its own embedding based on the context it appears in. Examples: "… money in the bank …", "… the river bank …", "… own blood bank …" each produce a different embedding for "bank".
Embeddings from Language Model (ELMO)
RNN-based language models (trained from lots of sentences), e.g. given "潮水 退了 就 知道 誰 沒穿 褲子" ("when the tide goes out, you find out who wasn't wearing pants"). The network reads the sentence token by token (<BOS>, 潮水, 退了, …) and at each step predicts the next token (潮水, 退了, 就, …).
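A hedged PyTorch sketch of such an RNN-based language model: the model sees the tokens so far and predicts the next one. The class name RNNLM, the LSTM choice, and all sizes are illustrative assumptions, not details given in the lecture.

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    """Minimal RNN language model: predict the next token from the previous ones."""
    def __init__(self, vocab_size, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))     # hidden state at every position
        return self.out(h)                      # next-token logits at each step

# Training pairs shift the sentence by one position:
# input  = [<BOS>, 潮水, 退了, 就], target = [潮水, 退了, 就, 知道]
model = RNNLM(vocab_size=10000)
logits = model(torch.randint(0, 10000, (1, 4)))  # (1, 4, 10000)
```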
ELMO
Each layer in the deep LSTM can generate a latent representation (h1 for the first layer, h2 for the second, and so on). Which one should we use?
ELMO
Use them all: the ELMO embedding of each token is a weighted sum of the layer representations, ELMO = α1 h1 + α2 h2, where α1 and α2 are learned with the downstream task (different tasks end up assigning large or small weights to different layers).
潮水 退了 就 知道 ……
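A minimal PyTorch sketch of this weighted sum, assuming the per-layer representations h1, h2 are already computed. The module name ELMoCombiner and the softmax normalization of the α weights are my assumptions; the key point matches the slide: α1 and α2 are parameters learned together with the downstream task.

```python
import torch
import torch.nn as nn

class ELMoCombiner(nn.Module):
    """Weighted sum of layer representations; weights trained with the downstream task."""
    def __init__(self, num_layers=2):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(num_layers))   # alpha_1, alpha_2, ...

    def forward(self, layer_reps):
        # layer_reps: list of tensors [h1, h2, ...], each of shape (seq_len, dim)
        weights = torch.softmax(self.alpha, dim=0)           # keep the weights normalized
        return sum(w * h for w, h in zip(weights, layer_reps))

# Dummy layer representations for a 5-token sentence:
h1, h2 = torch.randn(5, 512), torch.randn(5, 512)
elmo_embedding = ELMoCombiner()([h1, h2])                    # (5, 512)
```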
Bidirectional Encoder Representations from Transformers (BERT)
BERT = Encoder of the Transformer. It is learned from a large amount of text without annotation: given an input sequence such as 潮水 退了 就 知道 ……, BERT outputs an embedding for each token.
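As a hedged illustration of "an embedding for each token", the sketch below uses the Hugging Face transformers package (an assumption; the lecture does not prescribe a library) to show that the same word "bank" gets different contextual embeddings in different sentences:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def bank_embedding(sentence):
    # The encoder returns one contextual embedding per token.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state.squeeze(0)   # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("bank")]

v1 = bank_embedding("it is safest to deposit your money in the bank")
v2 = bank_embedding("the victim was found lying dead on the river bank")
print(torch.cosine_similarity(v1, v2, dim=0))   # noticeably below 1: different senses
```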
Training of BERT
Approach 1: Masked LM. One of the input tokens in 潮水 退了 就 知道 …… is replaced with [MASK]; a linear multi-class classifier whose output size is the vocabulary size takes BERT's representation at the masked position and predicts the masked word. (Can we compare BERT and ELMO?)
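A hedged sketch of the masked-LM idea using BertForMaskedLM from Hugging Face transformers (an assumed library choice): replace a token with [MASK] and let a classifier over the whole vocabulary predict it from BERT's output at that position. The example sentence is an English paraphrase of the lecture's 潮水 example.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

text = "when the tide goes out you [MASK] who was not wearing pants"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                       # (1, seq_len, vocab_size)

# Find the masked position and take the most likely word from the vocabulary.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0]
predicted_id = logits[0, mask_pos].argmax(-1).item()
print(tokenizer.convert_ids_to_tokens(predicted_id))      # the model's guess for [MASK]
```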
Training of BERT
Approach 2: Next Sentence Prediction. [CLS] is the position that outputs the classification result; [SEP] marks the boundary between the two sentences. A linear binary classifier on the [CLS] output predicts whether the second sentence really follows the first. Example: [CLS] 醒醒 吧 [SEP] 你 沒有 妹妹 ("wake up" / "you don't have a little sister") → yes. Approaches 1 and 2 are used at the same time.
Training of BERT
Approach 2 (negative example): [CLS] 醒醒 吧 [SEP] 眼睛 業障 重 ("wake up" / "your eyes are clouded by karma") → no, the second sentence does not follow the first.
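A hedged sketch of next-sentence prediction, assuming Hugging Face's BertForNextSentencePrediction: the tokenizer inserts [CLS] and [SEP] for a sentence pair, and a binary classifier on the [CLS] output judges whether the second sentence follows the first. The English sentences paraphrase the slide's examples.

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

# For a sentence pair, the tokenizer builds: [CLS] sentence A [SEP] sentence B [SEP]
inputs = tokenizer("wake up", "you do not have a little sister", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits   # shape (1, 2); index 0 means "B follows A" here
print(logits.softmax(-1))             # probability that the pair is consecutive
```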
How to use BERT – Case 1
Input: single sentence; output: class. Examples: sentiment analysis (our HW), document classification. A linear classifier on the [CLS] output is trained from scratch, while BERT itself is fine-tuned. Input format: [CLS] w1 w2 w3 (the sentence).
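A hedged PyTorch sketch of Case 1: a linear classifier on the [CLS] representation is trained from scratch while BERT itself is fine-tuned. The two-class sentiment setup, the learning rate, and the helper training_step are illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(768, 2)   # trained from scratch, e.g. positive/negative sentiment

optimizer = torch.optim.Adam(
    list(bert.parameters()) + list(classifier.parameters()), lr=2e-5)

def training_step(sentence, label):
    inputs = tokenizer(sentence, return_tensors="pt")
    cls_vec = bert(**inputs).last_hidden_state[:, 0]      # output at the [CLS] position
    loss = nn.functional.cross_entropy(classifier(cls_vec), torch.tensor([label]))
    optimizer.zero_grad()
    loss.backward()                                       # gradients also fine-tune BERT
    optimizer.step()
    return loss.item()

training_step("this movie is great", 1)
```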
How to use BERT – Case 2
Input: single sentence; output: a class for each word. Example: slot filling. A linear classifier is applied to BERT's output at every token position. Input format: [CLS] w1 w2 w3 (the sentence).
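A hedged sketch of Case 2: the same idea, but the linear classifier is applied to BERT's output at every token position. The number of slot labels and the example sentence are made up for illustration.

```python
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
num_slots = 5                                   # illustrative number of slot labels
token_classifier = nn.Linear(768, num_slots)    # shared across positions, trained from scratch

inputs = tokenizer("book a flight to taipei", return_tensors="pt")
hidden = bert(**inputs).last_hidden_state       # (1, seq_len, 768)
slot_logits = token_classifier(hidden)          # one set of class scores per token
print(slot_logits.argmax(-1))                   # predicted slot id per position (untrained)
```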
How to use BERT – Case 3
Input: two sentences; output: class. Example: Natural Language Inference, i.e. given a "premise", determining whether a "hypothesis" is true (entailment), false (contradiction), or undetermined (neutral). A linear classifier on the [CLS] output makes the prediction. Input format: [CLS] Sentence 1 [SEP] Sentence 2 (tokens w1 … w5).
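Case 3 follows the same pattern as Case 1, except the input is a sentence pair and the classifier has three outputs. A hedged sketch (the premise/hypothesis strings are illustrative and the head is untrained here):

```python
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
nli_head = nn.Linear(768, 3)   # entailment / contradiction / neutral, trained from scratch

# The tokenizer builds [CLS] premise [SEP] hypothesis [SEP] for us.
inputs = tokenizer("a man is playing a guitar",   # premise
                   "a person is making music",    # hypothesis
                   return_tensors="pt")
cls_vec = bert(**inputs).last_hidden_state[:, 0]
print(nli_head(cls_vec).softmax(-1))              # meaningful only after fine-tuning on NLI data
```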
How to use BERT – Case 4
Extraction-based Question Answering (QA), e.g. SQuAD.
Document: D = {d1, d2, ⋯, dN}; Query: Q = {q1, q2, ⋯, qM}.
The QA model takes D and Q and outputs two integers (s, e); the answer is the span A = {ds, ⋯, de} taken from the document (e.g. s = 17, e = 17, or s = 77, e = 79).
How to use BERT – Case 4
The question and document are fed in together: [CLS] q1 q2 [SEP] d1 d2 d3. A vector learned from scratch is dot-producted with BERT's output at each document token; the softmax over these scores (e.g. 0.3, 0.5, 0.2) selects the start position s, here d2.
How to use BERT – Case 4
A second vector, also learned from scratch, is dot-producted with each document token's output; its softmax (e.g. 0.1, 0.2, 0.7) selects the end position e, here d3. With s = 2 and e = 3, the answer is "d2 d3".
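A hedged sketch of this span-prediction mechanism: two vectors learned from scratch are dot-producted with BERT's output at each token, and softmaxes over the scores give the start and end positions. A real system would restrict the softmax to document positions and train the two vectors; the question/document strings here are illustrative.

```python
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
start_vec = nn.Parameter(torch.randn(768))   # learned from scratch
end_vec = nn.Parameter(torch.randn(768))     # learned from scratch

question = "where was the victim found"
document = "the victim was found lying on the river bank"
inputs = tokenizer(question, document, return_tensors="pt")
hidden = bert(**inputs).last_hidden_state[0]              # (seq_len, 768)

# Dot product with every token representation, then softmax over positions.
start_scores = (hidden @ start_vec).softmax(-1)
end_scores = (hidden @ end_vec).softmax(-1)
s, e = start_scores.argmax().item(), end_scores.argmax().item()
print(tokenizer.decode(inputs["input_ids"][0][s:e + 1]))  # answer span (after training)
```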
BERT dominates the leaderboards …… for example, it dominates the SQuAD 2.0 leaderboard.
Enhanced Representation through Knowledge Integration (ERNIE)
Designed for Chinese: ERNIE is a BERT-style model from Baidu (the slide writes it punningly as 擺渡).
What does BERT learn?
BERT Rediscovers the Classical NLP Pipeline (https://arxiv.org/abs/1905.05950)
Multilingual BERT
Trained on 104 languages. With task-specific training data only in English (classes 1/2/3), the fine-tuned model can then be tested on task-specific data in Chinese (Zh), i.e. zero-shot cross-lingual transfer. See "Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT".
Generative Pre-Training (GPT)
GPT is the decoder of the Transformer. Model sizes: ELMO about 94M parameters, BERT 340M, GPT-2 1542M; GPT-2 was trained on 40 GB of text.
Generative Pre-Training (GPT)
GPT predicts the next token with masked self-attention stacked over many layers: to output 退了 after <BOS> 潮水, the representation b2 is a weighted sum of the values v1, v2 of the tokens seen so far, with attention weights α2,1, α2,2 computed from the query q2 against the keys k1, k2.
Generative Pre-Training (GPT)
At the next step, to output 就, b3 attends over <BOS>, 潮水, 退了 with weights α3,1, α3,2, α3,3, computed from the query q3 against the keys k1, k2, k3 and applied to the values v1, v2, v3.
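A hedged PyTorch sketch of this masked (causal) self-attention step: position t can only attend to positions 1…t, so b2 depends on the first two tokens and b3 on the first three. Single head, random placeholder weights, no extras beyond 1/sqrt(d) scaling.

```python
import torch

def causal_self_attention(a, Wq, Wk, Wv):
    """Masked self-attention (sketch): position t attends only to positions 1..t."""
    q, k, v = a @ Wq, a @ Wk, a @ Wv                    # queries, keys, values
    scores = q @ k.T / k.shape[-1] ** 0.5               # unnormalized alpha_{t,i}
    mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))    # hide future positions
    alpha = scores.softmax(-1)                          # attention weights
    return alpha @ v                                    # b_t = sum_i alpha_{t,i} * v_i

# Toy example: 4 tokens (<BOS>, 潮水, 退了, 就), dimension 8, random weights.
a = torch.randn(4, 8)
Wq, Wk, Wv = (torch.randn(8, 8) for _ in range(3))
b = causal_self_attention(a, Wq, Wk, Wv)   # b[1] uses a[0..1]; b[2] uses a[0..2]
```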
Zero-shot Learning?
Reading Comprehension (CoQA): feed the model d1, d2, ⋯, dN, "Q:", q1, q2, ⋯, qM, "A:" and let it continue with the answer.
Summarization: feed d1, d2, ⋯, dN, "TL;DR:".
Translation: feed "English sentence 1 = French sentence 1", "English sentence 2 = French sentence 2", "English sentence 3 =" and let the model produce the translation. (The slide calls this 展現神蹟, "performing miracles".)
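A small sketch of how such zero-shot prompts can be assembled as plain strings before being fed to GPT-2; the helper names are my own, and only the "Q:" / "A:" / "TL;DR:" / "=" conventions come from the slide.

```python
def coqa_prompt(document, question):
    # d_1 ... d_N, "Q:", q_1 ... q_M, "A:"  -- the model continues with the answer.
    return f"{document}\nQ: {question}\nA:"

def summarization_prompt(document):
    # d_1 ... d_N, "TL;DR:"  -- the model continues with a summary.
    return f"{document}\nTL;DR:"

def translation_prompt(examples, new_sentence):
    # "English sentence = French sentence" pairs, then an unfinished pair to complete.
    lines = [f"{en} = {fr}" for en, fr in examples]
    return "\n".join(lines + [f"{new_sentence} ="])

print(translation_prompt([("good morning", "bonjour")], "thank you"))
```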
Visualization https://arxiv.org/abs/1904.02679
(The results below are from GPT-2.)
There are more and more examples:
https://talktotransformer.com/
GPT-2 Credit: Greg Durrett
Can BERT speak?
Unified Language Model Pre-training for Natural Language Understanding and Generation
BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model
Insertion Transformer: Flexible Sequence Generation via Insertion Operations
Insertion-based Decoding with Automatically Inferred Generation Order