Diverse Beam Search Ashwin Kalyan


Diverse Beam Search (Ashwin Kalyan)
Speaker notes: Explain the objective we minimize in sequence modeling, then cover the three problems with current RNN-based modeling:
1) Loss-evaluation mismatch (cite the paper from Sasha Rush's lab).
2) Train-test mismatch: the model is never exposed to its own predictions (cite the NIPS '16 curriculum paper that tries to solve this; it may not be necessary to relate this to DAgger, etc.).
3) Broken inference: exact inference is NP-hard (show the |V|^T complexity), so approximate inference methods like beam search are used (explain beam search right here).
Take-away: the modeling is not precise and, on top of that, inference is approximate, so we cannot claim the "most-likely" caption under the model is of high quality. Broken inference further yields sequences with only "minor" changes between them; give an example of broken beam search.
In this work, we do not fix the modeling; instead we fix the inference procedure to have more diversity. The hope is to decode lists of sentences that differ from each other significantly: "diversity".

Sequence Modeling – RNN Recap
Task: model the sequence. [figure: unrolled RNN cells]

Sequence Modeling – RNN Recap
Task: model the sequence. Effectively, RNNs model the probability of the next token given the history, i.e. P(y_t | y_1, …, y_{t-1}), and the joint probability is P(y_1, …, y_T) = ∏_{t=1}^{T} P(y_t | y_1, …, y_{t-1}).
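This factorization can be sketched in a few lines of Python. The `next_token_probs` function below is a toy stand-in for an RNN's softmax output; its hard-coded conditional table, vocabulary, and probabilities are invented for illustration, not part of any real model.

```python
import math

# Toy stand-in for an RNN's softmax output: a hard-coded conditional
# table over a tiny vocabulary (all words and probabilities made up).
def next_token_probs(history):
    table = {
        (): {"a": 0.6, "the": 0.3, "<eos>": 0.1},
        ("a",): {"cat": 0.7, "dog": 0.2, "<eos>": 0.1},
        ("a", "cat"): {"<eos>": 0.9, "cat": 0.05, "dog": 0.05},
    }
    return table.get(tuple(history), {"<eos>": 1.0})

def sequence_log_prob(tokens):
    # Chain rule: log P(y) = sum over t of log P(y_t | y_1, ..., y_{t-1})
    logp, history = 0.0, []
    for tok in tokens:
        logp += math.log(next_token_probs(history)[tok])
        history.append(tok)
    return logp

# P("a cat <eos>") = 0.6 * 0.7 * 0.9 = 0.378
print(round(math.exp(sequence_log_prob(["a", "cat", "<eos>"])), 4))
```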

Sequence Modeling
When the output is a sequence, we maximize the log-likelihood of the ground-truth sequence on the training set, i.e. Σ_t log P(y_t | y_1, …, y_{t-1}). In captioning, the image feature can be added as the first "word". [figure: training example "This is a good boy"]
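Concretely, this training objective amounts to a teacher-forced negative log-likelihood: each ground-truth token is scored given the ground-truth prefix. A minimal sketch, where `toy_model` and its probabilities are fabricated stand-ins for a trained RNN:

```python
import math

# Fabricated stand-in for an RNN's conditional distribution; a real model
# would change its output depending on the prefix.
def toy_model(prefix):
    return {"This": 0.4, "is": 0.3, "a": 0.2, "good": 0.05, "boy": 0.05}

def nll_loss(model, target):
    # Teacher forcing: condition on the gold prefix y_<t at every step
    # and accumulate -log P(y_t | y_<t).
    loss = 0.0
    for t in range(len(target)):
        probs = model(target[:t])
        loss -= math.log(probs[target[t]])
    return loss

print(round(nll_loss(toy_model, ["This", "is", "a", "good", "boy"]), 3))
```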

Inference in Seq2Seq models
There are |V| choices for each word, so the search space has |V|^T sequences; inferring the "most-likely" sequence under the model is NP-hard. A tractable alternative is to use greedy approximate methods that decode sequences word by word, in a left-to-right manner.

Inference in Seq2Seq models
Now that we have a "trained" model, how do we generate sequences?
Method 1: Sampling. This does not necessarily generate good-quality sequences: the initial words are crucial in deciding the sentence, and sampling a sub-optimal continuation can hurt.
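Method 1 can be sketched as ancestral sampling: draw each word from the model's conditional, feed it back in, and stop at the end token. The `toy_model` below is a fabricated two-step distribution, not a trained network.

```python
import random

# Fabricated conditional distribution standing in for a trained RNN.
def toy_model(history):
    if not history:
        return {"This": 0.5, "A": 0.5}
    return {"is": 0.3, "<eos>": 0.7}

def sample_sequence(model, max_len=5, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(max_len):
        dist = model(out)
        # Draw the next word in proportion to the model's probabilities.
        tok = rng.choices(list(dist), weights=list(dist.values()))[0]
        if tok == "<eos>":          # stop once the end token is drawn
            break
        out.append(tok)
    return out

print(sample_sequence(toy_model, seed=0))
```

Note how an unlucky draw of the very first word commits the whole sentence to that continuation, which is exactly the weakness the slide points out.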

Inference in Seq2Seq models
Method 2: Beam Search. Instead of sampling, select the top-B words at each time step. [figure: beam tree rooted at "This" and "A"]

Inference in Seq2Seq models
Method 2: Beam Search. Instead of sampling, select the top-B words at each time step. [figure: beams "This" and "A" expanded with "is", "picture", "man", "person"]

Inference in Seq2Seq models
Method 2: Beam Search. Instead of sampling, select the top-B words at each time step. The number of sequences grows as B^T! Solution: retain only the top-B sequences. In other words, Beam Search = truncated BFS. [figure: beam tree]

Inference in Seq2Seq models
Method 2: Beam Search. Instead of sampling, select the top-B words at each time step. The top-2 sequences are selected. [figure: beam tree with the two highest-scoring paths highlighted]

Inference in Seq2Seq models
Method 2: Beam Search. Instead of sampling, select the top-B words at each time step. Continuations of the pruned beams are discarded. [figure: beam tree with pruned branches]

Inference in Seq2Seq models
Method 2: Beam Search. Instead of sampling, select the top-B words at each time step, till the end token is generated or the max time step is reached. [figure: completed beam tree]
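Putting these steps together, beam search as truncated BFS can be sketched in plain Python. This is a minimal sketch, not the authors' implementation; `toy_model` and its probabilities are made up purely to exercise the decoder.

```python
import math

def beam_search(model, beam_width, max_len):
    # Each beam is a (tokens, log-probability) pair; start from the empty prefix.
    beams = [([], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            if tokens and tokens[-1] == "<eos>":
                candidates.append((tokens, logp))   # finished beam carried over
                continue
            for tok, p in model(tokens).items():
                candidates.append((tokens + [tok], logp + math.log(p)))
        # Truncated BFS: keep only the top-B scoring prefixes at each step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Made-up toy model to exercise the decoder.
def toy_model(history):
    if not history:
        return {"a": 0.6, "the": 0.4}
    if history[-1] in ("a", "the"):
        return {"kitchen": 0.7, "stove": 0.3}
    return {"<eos>": 1.0}

for tokens, logp in beam_search(toy_model, beam_width=2, max_len=3):
    print(" ".join(tokens), round(math.exp(logp), 2))
```

With B=2 the decoder keeps "a kitchen" and "the kitchen" and discards the "stove" continuations, mirroring the pruning shown on the slides.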

Problems with Decoding
Beam Search tends to produce nearly identical sequences that typically differ only in their endings.
Beam Search outputs (B=4):
A kitchen with a stove.
A kitchen with a stove and a sink.
A kitchen with a stove and a microwave.
A kitchen with a stove and a refrigerator.

Problems with Decoding
Beam Search tends to produce nearly identical sequences that typically differ only in their endings.
Beam Search outputs (B=4):
A woman and a child sitting at a table with food.
A woman and a child sitting at a table with a plate of food.
A woman and a child sitting at a table eating food.
A little girl sitting at a table with a plate of food.

Problems with Decoding
Beam Search tends to produce nearly identical sequences that typically differ only in their endings. This is inadequate when there are multiple "correct" sequences: the same image can be described in many ways (you can talk about different objects in the image, different perspectives, etc.), and a sentence can have multiple correct translations! It is also computationally wasteful, since you repeatedly forward the same set of inputs through the RNN.

Inference in Seq2Seq models
Method 3: Diverse Beam Search. Select top-B words that result in "different" sequences, scoring each word by its log-likelihood plus a diversity term. [figure: beams rooted at "This" and "Person"]

Inference in Seq2Seq models
Method 3: Diverse Beam Search. Select top-B words that result in "different" sequences. [figure: beams "This is" and "Person in"]

Inference in Seq2Seq models
Method 3: Diverse Beam Search. Select top-B words that result in "different" sequences. The second beam cannot select "a" again if the diversity term Δ is, say, the Hamming distance. [figure: beams "This is a" and "Person in a"]

Inference in Seq2Seq models
Method 3: Diverse Beam Search. Select top-B words that result in "different" sequences. Since it cannot select "a" when Δ is, say, the Hamming distance, the second beam picks a different word. [figure: beams "This is a" and "Person in striped"]
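The walkthrough above can be sketched as a group-based decoder: beam groups are expanded one after another at each time step, and later groups pay a penalty λ for reusing tokens that earlier groups already chose at the same position (Hamming diversity). This is a simplified sketch of the procedure with a made-up toy model, not the released implementation.

```python
import math

def diverse_beam_search(model, num_groups, group_width, max_len, lam=0.5):
    groups = [[([], 0.0)] for _ in range(num_groups)]
    for t in range(max_len):
        chosen_at_t = []              # tokens earlier groups picked at step t
        for g in range(num_groups):
            candidates = []
            for tokens, logp in groups[g]:
                if tokens and tokens[-1] == "<eos>":
                    candidates.append((tokens, logp))
                    continue
                for tok, p in model(tokens).items():
                    # Hamming diversity: penalize tokens already used at
                    # this position by earlier groups.
                    penalty = lam * chosen_at_t.count(tok)
                    candidates.append((tokens + [tok], logp + math.log(p) - penalty))
            candidates.sort(key=lambda c: c[1], reverse=True)
            groups[g] = candidates[:group_width]
            chosen_at_t += [seq[-1] for seq, _ in groups[g] if len(seq) == t + 1]
    return [beam for group in groups for beam in group]

# Made-up toy model: without the penalty, both groups would start with "a".
def toy_model(history):
    if not history:
        return {"a": 0.7, "the": 0.3}
    return {"<eos>": 1.0}

for tokens, _ in diverse_beam_search(toy_model, num_groups=2,
                                     group_width=1, max_len=2, lam=2.0):
    print(" ".join(tokens))
```

With λ large enough, the second group avoids "a" and starts with "the" instead, which is the diversification behavior illustrated on the slides.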

Diverse Beam Search
Modifies the inference procedure to produce "diverse" lists.
Diverse Beam Search outputs (B=4):
A kitchen with a stove and a microwave.
A kitchen with a stove and a sink.
A kitchen with a sink and a refrigerator.
The kitchen is clean and ready to be used.

Diverse Beam Search
Modifies the inference procedure to produce "diverse" lists.
Diverse Beam Search outputs (B=4):
A woman and a child are eating a meal.
A woman and a child sitting at a table with a plate of food.
A young girl is eating a piece of cake.
Two girls are sitting at a table with a cake.

Diverse Beam Search
Modifies the inference procedure to produce "diverse" lists.
Tries to capture the "inherent multi-modal" nature of the task.
Quantitatively finds better (more human-like) captions.
Diversity comes for free (almost!): requires about the same memory and computation.

General Problems in Sequence Modeling
Loss-Evaluation Mismatch: we care about producing "human-like" captions but optimize a surrogate loss, i.e. the log-likelihood of the ground-truth caption given the image.
Train-Test Mismatch: at train time, the model is not exposed to its own outputs; at test time, each input is sampled (or selected) from its own previous predictions.
Broken Inference: approximate (left-to-right greedy) inference methods are used, and training is not "aware" of the inference procedure.

Work with Michael Cogswell, Ramprasath Selvaraju, Qing Sun, David Crandall, Stefan Lee, and Dhruv Batra.
paper: https://arxiv.org/abs/1610.02424
code: https://github.com/ashwinkalyan/dbs/
email: ashwinkv@gatech.edu

Questions?

Interesting Related Papers Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning [paper] Sequence-to-Sequence Learning as Beam-Search Optimization [paper] Learning to Decode for Future Success [paper] Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training [paper] Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks [paper]