Presentation on theme: "Concepts of CNN, RNN and Applications in Document Summarization"— Presentation transcript:

1 Concepts of CNN, RNN and Applications in Document Summarization
Qingxia Liu

2 Overview
Concepts
  CNN: convolution operation; pooling; typical architecture
  RNN: three typical architectures; bidirectional RNNs; gated RNNs: LSTM, GRU
  Architectures: encoder-decoder / seq2seq; autoencoder; attention
Applications in document summarization
  Staged method: CNNLM [IJCAI’15_optimizing]
  Jointly learning: NN-SE [ACL’16_neural], NeuSum [ACL’18_score]
Ref: Ian J. Goodfellow, Yoshua Bengio, Aaron C. Courville: Deep Learning. Adaptive Computation and Machine Learning, MIT Press, 2016.

3 Neural Network

4 CNN (Convolutional Neural Network)
Convolution operation: a special kind of linear operation
  input: data with a grid-like topology
  parameters: a kernel
  output: a feature map
(A minimal example is sketched below.)
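A minimal sketch of the convolution operation on a 2-D grid, written as the cross-correlation that deep-learning libraries typically compute; the input, kernel, and sizes are illustrative.

```python
import numpy as np

def conv2d(x, kernel):
    """'Valid' cross-correlation of a 2-D input grid with a 2-D kernel.
    Most libraries implement this (no kernel flip) and call it convolution."""
    kh, kw = kernel.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    feature_map = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # each output is a weighted sum of a small neighborhood of the input
            feature_map[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return feature_map

x = np.arange(16, dtype=float).reshape(4, 4)   # grid-like input
k = np.array([[1.0, 0.0], [0.0, -1.0]])        # kernel (shared parameters)
print(conv2d(x, k))                            # 3x3 feature map
```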

5 CNN: ReLU
NN layer: linear operation + nonlinear activation

6 CNN: Pooling
Pooling outputs a summary statistic of nearby values (pool width = 3 in the example).
Max pooling reports the maximum output within a rectangular neighborhood.
Properties:
  invariance to translation: if the input is translated by a small amount, most pooled outputs do not change (a strong prior)
  handles inputs of varying size
  downsampling improves computational efficiency (fewer values to process)
Pooling also appears in other architectures, e.g. Boltzmann machines and autoencoders.
(A pooling sketch follows.)
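A hedged sketch of non-overlapping max pooling with the slide's pool width of 3; strides, padding, and multi-dimensional inputs are simplified away.

```python
import numpy as np

def max_pool1d(x, width=3):
    """Non-overlapping max pooling: each output summarizes `width` neighboring
    inputs by their maximum, downsampling the sequence by a factor of `width`."""
    n = len(x) - len(x) % width          # drop the incomplete tail
    return x[:n].reshape(-1, width).max(axis=1)

x = np.array([1.0, 5.0, 2.0, 0.0, 3.0, 4.0])
print(max_pool1d(x, width=3))            # [5. 4.]
```

Shifting the large values by one position inside each window leaves the pooled output unchanged, which is the translation invariance mentioned above.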

7 RNN (Recurrent Neural Network)
Motivation: sequence modeling, using hidden units to carry information across time steps.

8 RNN: three typical architectures
(1) many to many, e.g. machine translation, POS tagging
(2) one to many, e.g. image captioning
(3) many to one, e.g. sentiment analysis

9 CNN vs. RNN
Input/output:
  CNN: grid data -> one result
  RNN: sequence -> sequence (one output per time step)
Parameter sharing:
  CNN: the kernel is shared across neighboring inputs (invariance to translation)
  RNN: the same update rule is applied at every time step, using the previous state
Most RNNs can process sequences of variable length.
(A shared-parameter RNN update is sketched below.)
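To make the parameter-sharing contrast concrete, here is a hedged sketch of a vanilla RNN: the same weight matrices are reused at every time step, so one set of parameters handles sequences of any length. All shapes and names are illustrative.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Unroll a vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).
    The same parameters are applied at every time step."""
    h = np.zeros(W_hh.shape[0])
    hs = []
    for x_t in xs:                       # sequence of arbitrary length
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hs.append(h)
    return hs                            # one hidden state per time step

rng = np.random.default_rng(0)
d_in, d_h = 4, 3
params = (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h))
xs = [rng.normal(size=d_in) for _ in range(5)]
print(len(rnn_forward(xs, *params)))     # 5 hidden states from one parameter set
```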

10 Gated RNNs
Motivation: plain RNNs have trouble with long-term dependencies because gradients vanish or explode.
Gated RNNs: LSTM, GRU, SRU.

11 LSTM (Long Short Term Memory) (Sepp Hochreiter et al. 1997)
Three gates: forget, input, output.
Cell state: the memory; hidden state: the output used at each step.
(See the standard equations below.)
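For reference, the standard LSTM equations (Hochreiter & Schmidhuber, 1997); the slide's three gates, cell state, and hidden state correspond to the terms below.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate cell state}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state (memory)}\\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state (usage)}
\end{aligned}
```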

12 GRU (Gated Recurrent Unit) (Kyunghyun Cho et al. 2014)
Motivation: a more efficient gated unit.
Pros (vs. LSTM): more recent; fewer parameters (no output gate); often better performance on smaller datasets.
Versions: fully gated unit; minimal gated unit (one gate).
Two gates: reset gate vector r_t and update gate vector z_t, playing roles similar to the forget and input gates in LSTM, combined into the hidden state.
(The fully gated version is given below.)
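The fully gated GRU (Cho et al., 2014) for comparison; r_t and z_t are the reset and update gates named above. The roles of z_t and 1 - z_t in the last line are swapped in some presentations, so treat this as one common convention.

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) && \text{update gate}\\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) && \text{reset gate}\\
\tilde{h}_t &= \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) && \text{candidate state}\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{hidden state}
\end{aligned}
```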

13 Recursive Neural Network

14 Encoder-decoder / Seq2seq Architectures
Models p(target | context).
Encoder: input is a sequence X = (x1, x2, ..., xn); output is a context C, a vector (or sequence of vectors) that summarizes the input sequence.
Decoder: input is the context C; output is the output sequence Y.
Encoder and decoder are trained jointly; X and Y need not have the same length.
Applications: machine translation, speech recognition, QA.
(A minimal sketch follows.)
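A minimal encoder-decoder sketch in PyTorch under assumed vocabulary sizes and dimensions; here the context C is simply the encoder's final hidden state, and decoding/training loops are omitted.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder-decoder sketch: the context C is the encoder's final hidden state."""
    def __init__(self, src_vocab=1000, tgt_vocab=1000, emb=64, hid=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src, tgt):
        _, context = self.encoder(self.src_emb(src))      # C summarizes X
        dec_states, _ = self.decoder(self.tgt_emb(tgt), context)
        return self.out(dec_states)                        # scores over Y's vocabulary

model = Seq2Seq()
src = torch.randint(0, 1000, (2, 7))   # X and Y need not have the same length
tgt = torch.randint(0, 1000, (2, 5))
print(model(src, tgt).shape)           # torch.Size([2, 5, 1000])
```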

15 Attention mechanism (Bahdanau et al. 2015)
Motivation: the encoder loses more information as the input sequence gets longer.
Idea: locate the region of focus during decoding instead of relying on a single context vector, using a mask over all the encoder hidden states.
Soft attention: multiplies features by a (soft) mask of values between 0 and 1.
Hard attention: a mask of values that are exactly 0 or 1.
  Cons: makes the system non-differentiable; it can be trained with REINFORCE, but this requires sampling discrete actions and leads to high variance.
(A soft-attention sketch follows.)
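A hedged sketch of soft, additive (Bahdanau-style) attention: a score for every encoder hidden state, a softmax mask of values between 0 and 1, and a context vector formed as the weighted sum. Shapes and parameter names are illustrative.

```python
import numpy as np

def soft_attention(enc_states, dec_state, W_e, W_d, v):
    """Additive attention: score each encoder state against the decoder state,
    softmax the scores into a soft mask, and mix the encoder states."""
    scores = np.tanh(enc_states @ W_e.T + dec_state @ W_d.T) @ v   # (n,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # soft mask in (0, 1), sums to 1
    context = weights @ enc_states                 # weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(0)
n, d_h, d_a = 6, 8, 5
enc = rng.normal(size=(n, d_h))                    # one hidden state per input token
dec = rng.normal(size=d_h)                         # current decoder state
ctx, w = soft_attention(enc, dec, rng.normal(size=(d_a, d_h)),
                        rng.normal(size=(d_a, d_h)), rng.normal(size=d_a))
print(w.round(2), ctx.shape)
```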

16 Autoencoder
Trained to copy its input to its output.
(A minimal sketch follows.)
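A minimal autoencoder sketch in PyTorch: an encoder compresses the input through a bottleneck, a decoder reconstructs it, and the training signal is the reconstruction error. Layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# Autoencoder: trained to copy its input to its output through a bottleneck.
autoencoder = nn.Sequential(
    nn.Linear(784, 32), nn.ReLU(),   # encoder: compress to a 32-d code
    nn.Linear(32, 784),              # decoder: reconstruct the input
)
x = torch.rand(16, 784)
loss = nn.functional.mse_loss(autoencoder(x), x)   # reconstruction error
loss.backward()
```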

17 Applications in Document Summarization
Extractive summarization
Problem: input D = {s1, s2, ..., sm}, si = (w1, w2, ..., wn); output S ⊆ D with |S| ≤ k.
Factors: salience and redundancy.
Two-staged method:
  CNNLM [IJCAI’15_optimizing]: CNN + unsupervised learning, then optimization-based selection.
    Wenpeng Yin, Yulong Pei: Optimizing Sentence Modeling and Selection for Document Summarization. IJCAI 2015.
Jointly learned methods:
  NN-SE [ACL’16_neural]: CNN + LSTM encoder, LSTM decoder + MLP.
    Jianpeng Cheng, Mirella Lapata: Neural Summarization by Extracting Sentences and Words. ACL 2016.
  NeuSum [ACL’18_score]: hierarchical BiGRU encoder, GRU decoder + MLP.
    Qingyu Zhou, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou, Tiejun Zhao: Neural Document Summarization by Jointly Learning to Score and Select Sentences. ACL 2018.
Groups: CNNLM: University of Munich + CMU; NN-SE: Edinburgh; NeuSum: HIT + MSRA.

18 Document Summarization

19 CNNLM [IJCAI’15_optimizing]
Sentence representation: CNN with unsupervised learning via a language-model objective (CBOW-style: predict the next word from the sentence vector plus context words).
Sentence selection: similarity matrix S, PageRank vector p.
Notes:
  l is the number of words a kernel covers at a time; l × d is the kernel width (number of words times the word-vector dimension).
  NCE (noise-contrastive estimation) is the loss, as in word2vec: it replaces the softmax prediction (very costly for a large vocabulary) by turning the multi-class problem (probability of each vocabulary word) into a binary one (is this word the true next word?), trained on pairs of a context with the true next word and with a sampled false next word, similar in spirit to a language model scoring the n-gram.
  S: sentence vectors give pairwise similarities, so a set of n sentences becomes an n × n similarity matrix (Sij is the similarity between sentences i and j).
  p: PageRank over the similarity matrix (every pair of sentences is connected by an edge weighted by cosine similarity); pi is the importance of the i-th sentence in the cluster.
(A sketch of this selection step follows.)
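A hedged sketch of the selection idea described above, not the authors' code: sentence vectors give a cosine-similarity matrix S, and power iteration over the normalized matrix yields a PageRank-style importance score p_i per sentence. The toy vectors and damping factor are assumptions.

```python
import numpy as np

def sentence_pagerank(sent_vecs, damping=0.85, iters=50):
    """PageRank over a cosine-similarity sentence graph; p[i] scores sentence i."""
    X = sent_vecs / np.linalg.norm(sent_vecs, axis=1, keepdims=True)
    S = np.maximum(X @ X.T, 0.0)                    # n x n similarities, clipped at 0
    np.fill_diagonal(S, 0.0)                        # no self-edges
    S = S / (S.sum(axis=1, keepdims=True) + 1e-12)  # row-normalize edge weights
    n = len(S)
    p = np.full(n, 1.0 / n)
    for _ in range(iters):                          # power iteration
        p = (1 - damping) / n + damping * (S.T @ p)
    return p

vecs = np.random.default_rng(0).normal(size=(5, 16))   # toy sentence vectors
print(sentence_pagerank(vecs).round(3))                 # importance per sentence
```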

20 NN-SE [ACL’16_neural]
Jointly learned: hierarchical encoder + sentence extractor (sequence labeling).
Encoder (hierarchical):
  CNN: word vectors -> sentence vector
  LSTM: sentence vectors -> document vector
Extractor (attention-based):
  LSTM: previous decision + state vector at t-1 -> state vector at t
  MLP: state vector at t + sentence vector -> sentence score, p(yt = 1 | D)
Notes: the task is single-document summarization; the approach is data-driven, with a hierarchical document encoder and an attention-based extractor. p(t-1) expresses how much to trust the previous decision (training: compared directly with the gold labels; testing: curriculum learning, using p(y(t-1) | D)).
(An extractor sketch follows.)
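A hedged PyTorch sketch of the extractor idea, not the authors' released code: an LSTM runs over sentence vectors, each state is concatenated with its sentence vector and scored by an MLP to give p(y_t = 1 | D); gating the input by the previous decision is omitted here.

```python
import torch
import torch.nn as nn

class SentenceExtractor(nn.Module):
    """Sequence labeling over sentences: LSTM state + sentence vector -> MLP -> p(y_t=1|D)."""
    def __init__(self, sent_dim=128, hid=128):
        super().__init__()
        self.lstm = nn.LSTM(sent_dim, hid, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(hid + sent_dim, hid), nn.Tanh(),
                                 nn.Linear(hid, 1))

    def forward(self, sent_vecs):                  # (batch, m, sent_dim)
        states, _ = self.lstm(sent_vecs)           # one state per sentence
        scores = self.mlp(torch.cat([states, sent_vecs], dim=-1)).squeeze(-1)
        return torch.sigmoid(scores)               # extraction probabilities

doc = torch.randn(1, 6, 128)                       # 6 sentence vectors (assumed dims)
print(SentenceExtractor()(doc))                    # p(y_t = 1 | D) per sentence
```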

21 NeuSum [ACL’18_score]
Jointly learned: hierarchical encoder + sentence extractor.
Encoder:
  word level -> sentence level: BiGRU
  sentence level -> document level: BiGRU
Extractor: scores and selects sentences jointly.
Notes (HIT, MSRA): the task is single-document summarization. The encoder stage encodes sentences at two different levels (the object encoded is always a sentence; only the level of information received differs). The extractor stage scores candidate sentences, and the scores must take already-selected sentences into account.
Qingyu Zhou, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou, Tiejun Zhao: Neural Document Summarization by Jointly Learning to Score and Select Sentences. ACL 2018.
(An encoder sketch follows.)
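A hedged sketch of a hierarchical BiGRU encoder with assumed dimensions, not the released NeuSum code: a word-level BiGRU produces one vector per sentence, and a sentence-level BiGRU re-encodes those vectors with document context; scoring and selection are left out.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Word-level BiGRU -> sentence vectors; sentence-level BiGRU -> document-aware vectors."""
    def __init__(self, emb=64, hid=64):
        super().__init__()
        self.word_gru = nn.GRU(emb, hid, bidirectional=True, batch_first=True)
        self.sent_gru = nn.GRU(2 * hid, hid, bidirectional=True, batch_first=True)

    def forward(self, word_embs):                  # (num_sents, num_words, emb)
        _, h = self.word_gru(word_embs)            # final states: (2, num_sents, hid)
        sent_vecs = torch.cat([h[0], h[1]], dim=-1)        # one vector per sentence
        doc_states, _ = self.sent_gru(sent_vecs.unsqueeze(0))
        return doc_states.squeeze(0)               # (num_sents, 2*hid), document-aware

doc = torch.randn(5, 12, 64)                       # 5 sentences, 12 words each (assumed)
print(HierarchicalEncoder()(doc).shape)            # torch.Size([5, 128])
```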

22 Conclusion
Motivation: CNN vs. RNN for representing (state, action) pairs
  CNN: window-like, local patterns
  RNN: long-term dependencies


Download ppt "Concepts of CNN, RNN and Applications in Document Summarization"

Similar presentations


Ads by Google