Diverse Beam Search Ashwin Kalyan


Diverse Beam Search (Ashwin Kalyan)
Speaker notes: Explain the objective we minimize in sequence modeling, then cover the three problems with current RNN-based modeling:
1) Loss-evaluation mismatch (cite the paper from Sasha Rush's lab).
2) Train-test mismatch: the model is never exposed to its own predictions (cite the NIPS '16 curriculum paper that tries to solve this; it may not be necessary to relate this to DAgger, etc.).
3) Broken inference: exact inference is NP-hard (show the |V|^T complexity), so approximate inference methods like beam search are used (explain beam search right here).
Take-away: the modeling is not precise and, on top of that, inference is approximate, so we cannot claim the "most-likely" caption under the model is of high quality. Broken inference further yields sequences with only "minor" changes between them; give an example of broken beam search.
In this work, we do not fix the modeling; instead we fix the inference procedure to have more diversity. The hope is to decode lists of sentences that differ from each other significantly: "diversity".

Sequence Modeling – RNN Recap
Task: model the sequence. [figure: unrolled RNN cells]

Sequence Modeling – RNN Recap
Task: model the sequence. Effectively, RNNs model the probability of the next token given the history, i.e. P(y_t | y_1, …, y_{t-1}), and the joint probability is P(y_1, …, y_T) = ∏_{t=1}^{T} P(y_t | y_1, …, y_{t-1}).
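This factorization can be sketched in a few lines of Python. The `next_token_probs` function below is a toy stand-in for an RNN's softmax output; its hard-coded conditional table, vocabulary, and probabilities are invented for illustration, not part of any real model.

```python
import math

# Toy stand-in for an RNN's softmax output: a hard-coded conditional
# table over a tiny vocabulary (all words and probabilities made up).
def next_token_probs(history):
    table = {
        (): {"a": 0.6, "the": 0.3, "<eos>": 0.1},
        ("a",): {"cat": 0.7, "dog": 0.2, "<eos>": 0.1},
        ("a", "cat"): {"<eos>": 0.9, "cat": 0.05, "dog": 0.05},
    }
    return table.get(tuple(history), {"<eos>": 1.0})

def sequence_log_prob(tokens):
    # Chain rule: log P(y) = sum over t of log P(y_t | y_1, ..., y_{t-1})
    logp, history = 0.0, []
    for tok in tokens:
        logp += math.log(next_token_probs(history)[tok])
        history.append(tok)
    return logp

# P("a cat <eos>") = 0.6 * 0.7 * 0.9 = 0.378
print(round(math.exp(sequence_log_prob(["a", "cat", "<eos>"])), 4))
```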

Sequence Modeling
When the output is a sequence, we maximize the log-likelihood of the ground-truth sequence on the training set, i.e. Σ_t log P(y_t | y_1, …, y_{t-1}). In captioning, the image feature can be added as the first "word". [figure: training example "This is a good boy"]
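Concretely, this training objective amounts to a teacher-forced negative log-likelihood: each ground-truth token is scored given the ground-truth prefix. A minimal sketch, where `toy_model` and its probabilities are fabricated stand-ins for a trained RNN:

```python
import math

# Fabricated stand-in for an RNN's conditional distribution; a real model
# would change its output depending on the prefix.
def toy_model(prefix):
    return {"This": 0.4, "is": 0.3, "a": 0.2, "good": 0.05, "boy": 0.05}

def nll_loss(model, target):
    # Teacher forcing: condition on the gold prefix y_<t at every step
    # and accumulate -log P(y_t | y_<t).
    loss = 0.0
    for t in range(len(target)):
        probs = model(target[:t])
        loss -= math.log(probs[target[t]])
    return loss

print(round(nll_loss(toy_model, ["This", "is", "a", "good", "boy"]), 3))
```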

Inference in Seq2Seq models
There are |V| choices for each word, so the search space has |V|^T sequences; inferring the "most-likely" sequence under the model is NP-hard. A tractable alternative is to use greedy approximate methods that decode sequences word by word, in a left-to-right manner.

Inference in Seq2Seq models
Now that we have a "trained" model, how do we generate sequences?
Method 1: Sampling. This does not necessarily generate good-quality sequences: the initial words are crucial in deciding the sentence, and sampling a sub-optimal continuation can hurt.
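Method 1 can be sketched as ancestral sampling: draw each word from the model's conditional, feed it back in, and stop at the end token. The `toy_model` below is a fabricated two-step distribution, not a trained network.

```python
import random

# Fabricated conditional distribution standing in for a trained RNN.
def toy_model(history):
    if not history:
        return {"This": 0.5, "A": 0.5}
    return {"is": 0.3, "<eos>": 0.7}

def sample_sequence(model, max_len=5, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(max_len):
        dist = model(out)
        # Draw the next word in proportion to the model's probabilities.
        tok = rng.choices(list(dist), weights=list(dist.values()))[0]
        if tok == "<eos>":          # stop once the end token is drawn
            break
        out.append(tok)
    return out

print(sample_sequence(toy_model, seed=0))
```

Note how an unlucky draw of the very first word commits the whole sentence to that continuation, which is exactly the weakness the slide points out.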

Inference in Seq2Seq models
Method 2: Beam Search. Instead of sampling, select the top-B words at each time step. [figure: beam tree rooted at "This" and "A"]

Inference in Seq2Seq models
Method 2: Beam Search. Instead of sampling, select the top-B words at each time step. [figure: beams "This" and "A" expanded with "is", "picture", "man", "person"]

Inference in Seq2Seq models
Method 2: Beam Search. Instead of sampling, select the top-B words at each time step. The number of sequences grows as B^T! Solution: retain only the top-B sequences. In other words, Beam Search = truncated BFS. [figure: beam tree]

Inference in Seq2Seq models
Method 2: Beam Search. Instead of sampling, select the top-B words at each time step. The top-2 sequences are selected. [figure: beam tree with the two highest-scoring paths highlighted]

Inference in Seq2Seq models
Method 2: Beam Search. Instead of sampling, select the top-B words at each time step. Continuations of the pruned beams are discarded. [figure: beam tree with pruned branches]

Inference in Seq2Seq models
Method 2: Beam Search. Instead of sampling, select the top-B words at each time step, till the end token is generated or the max time step is reached. [figure: completed beam tree]
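Putting these steps together, beam search as truncated BFS can be sketched in plain Python. This is a minimal sketch, not the authors' implementation; `toy_model` and its probabilities are made up purely to exercise the decoder.

```python
import math

def beam_search(model, beam_width, max_len):
    # Each beam is a (tokens, log-probability) pair; start from the empty prefix.
    beams = [([], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            if tokens and tokens[-1] == "<eos>":
                candidates.append((tokens, logp))   # finished beam carried over
                continue
            for tok, p in model(tokens).items():
                candidates.append((tokens + [tok], logp + math.log(p)))
        # Truncated BFS: keep only the top-B scoring prefixes at each step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Made-up toy model to exercise the decoder.
def toy_model(history):
    if not history:
        return {"a": 0.6, "the": 0.4}
    if history[-1] in ("a", "the"):
        return {"kitchen": 0.7, "stove": 0.3}
    return {"<eos>": 1.0}

for tokens, logp in beam_search(toy_model, beam_width=2, max_len=3):
    print(" ".join(tokens), round(math.exp(logp), 2))
```

With B=2 the decoder keeps "a kitchen" and "the kitchen" and discards the "stove" continuations, mirroring the pruning shown on the slides.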

Problems with Decoding
Beam Search tends to produce nearly identical sequences that typically differ only in their endings.
Beam Search outputs (B=4):
A kitchen with a stove.
A kitchen with a stove and a sink.
A kitchen with a stove and a microwave.
A kitchen with a stove and a refrigerator.

Problems with Decoding
Beam Search tends to produce nearly identical sequences that typically differ only in their endings.
Beam Search outputs (B=4):
A woman and a child sitting at a table with food.
A woman and a child sitting at a table with a plate of food.
A woman and a child sitting at a table eating food.
A little girl sitting at a table with a plate of food.

Problems with Decoding
Beam Search tends to produce nearly identical sequences that typically differ only in their endings. This is inadequate when there are multiple "correct" sequences: the same image can be described in many ways (you can talk about different objects in the image, different perspectives, etc.), and a sentence can have multiple correct translations! It is also computationally wasteful, since you repeatedly forward the same set of inputs through the RNN.

Inference in Seq2Seq models
Method 3: Diverse Beam Search. Select top-B words that result in "different" sequences, scoring each word by its log-likelihood plus a diversity term. [figure: beams rooted at "This" and "Person"]

Inference in Seq2Seq models
Method 3: Diverse Beam Search. Select top-B words that result in "different" sequences. [figure: beams "This is" and "Person in"]

Inference in Seq2Seq models
Method 3: Diverse Beam Search. Select top-B words that result in "different" sequences. The second beam cannot select "a" again if the diversity term Δ is, say, the Hamming distance. [figure: beams "This is a" and "Person in a"]

Inference in Seq2Seq models
Method 3: Diverse Beam Search. Select top-B words that result in "different" sequences. Since it cannot select "a" when Δ is, say, the Hamming distance, the second beam picks a different word. [figure: beams "This is a" and "Person in striped"]
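The walkthrough above can be sketched as a group-based decoder: beam groups are expanded one after another at each time step, and later groups pay a penalty λ for reusing tokens that earlier groups already chose at the same position (Hamming diversity). This is a simplified sketch of the procedure with a made-up toy model, not the released implementation.

```python
import math

def diverse_beam_search(model, num_groups, group_width, max_len, lam=0.5):
    groups = [[([], 0.0)] for _ in range(num_groups)]
    for t in range(max_len):
        chosen_at_t = []              # tokens earlier groups picked at step t
        for g in range(num_groups):
            candidates = []
            for tokens, logp in groups[g]:
                if tokens and tokens[-1] == "<eos>":
                    candidates.append((tokens, logp))
                    continue
                for tok, p in model(tokens).items():
                    # Hamming diversity: penalize tokens already used at
                    # this position by earlier groups.
                    penalty = lam * chosen_at_t.count(tok)
                    candidates.append((tokens + [tok], logp + math.log(p) - penalty))
            candidates.sort(key=lambda c: c[1], reverse=True)
            groups[g] = candidates[:group_width]
            chosen_at_t += [seq[-1] for seq, _ in groups[g] if len(seq) == t + 1]
    return [beam for group in groups for beam in group]

# Made-up toy model: without the penalty, both groups would start with "a".
def toy_model(history):
    if not history:
        return {"a": 0.7, "the": 0.3}
    return {"<eos>": 1.0}

for tokens, _ in diverse_beam_search(toy_model, num_groups=2,
                                     group_width=1, max_len=2, lam=2.0):
    print(" ".join(tokens))
```

With λ large enough, the second group avoids "a" and starts with "the" instead, which is the diversification behavior illustrated on the slides.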

Diverse Beam Search
Modifies the inference procedure to produce "diverse" lists.
Diverse Beam Search outputs (B=4):
A kitchen with a stove and a microwave.
A kitchen with a stove and a sink.
A kitchen with a sink and a refrigerator.
The kitchen is clean and ready to be used.

Diverse Beam Search
Modifies the inference procedure to produce "diverse" lists.
Diverse Beam Search outputs (B=4):
A woman and a child are eating a meal.
A woman and a child sitting at a table with a plate of food.
A young girl is eating a piece of cake.
Two girls are sitting at a table with a cake.

Diverse Beam Search
Modifies the inference procedure to produce "diverse" lists.
Tries to capture the "inherent multi-modal" nature of the task.
Quantitatively finds better (more human-like) captions.
Diversity comes for free (almost!): requires about the same memory and computation.

General Problems in Sequence Modeling
Loss-Evaluation Mismatch: we care about producing "human-like" captions but optimize a surrogate loss, i.e. the log-likelihood of the ground-truth caption given the image.
Train-Test Mismatch: at train time, the model is not exposed to its own outputs; at test time, each input is sampled (or selected) from its own previous predictions.
Broken Inference: approximate (left-to-right greedy) inference methods are used, and training is not "aware" of the inference procedure.

Work with Michael Cogswell, Ramprasath Selvaraju, Qing Sun, David Crandall, Stefan Lee, and Dhruv Batra.
paper: https://arxiv.org/abs/1610.02424
code: https://github.com/ashwinkalyan/dbs/
email: ashwinkv@gatech.edu

Questions?

Interesting Related Papers Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning [paper] Sequence-to-Sequence Learning as Beam-Search Optimization [paper] Learning to Decode for Future Success [paper] Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training [paper] Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks [paper]