Learning from Descriptive Text
Tamara L. Berg, Stony Brook University

Vision, Language, Humans
Tags: canon, eos, macro, japan, vacation, frog, animal, toad, amphibian, pet, eye, feet, mouth, finger, hand, prince, photo, art, light, photo, flickr, blurry, favorite, nice.
“It's the perfect party dress. With distinctly feminine details such as a wide sash bow around an empire waist and a deep scoopneck, this linen dress will keep you comfortable and feeling elegant all evening long.”

Visually Descriptive Text
“It was an arresting face, pointed of chin, square of jaw. Her eyes were pale green without a touch of hazel, starred with bristly black lashes and slightly tilted at the ends. Above them, her thick black brows slanted upward, cutting a startling oblique line in her magnolia-white skin–that skin so prized by Southern women and so carefully guarded with bonnets, veils and mittens against hot Georgia suns” (from Gone with the Wind by Margaret Mitchell)
Visually descriptive language provides:
1) Information about how people construct natural language for imagery. (How do people describe the world?)
2) Information about the world, especially the visual world. (How does the world work?)
3) Guidance for computational visual recognition. (What should we recognize?)

What’s in a description?
What’s in this image? man, baby, sling, shirt, glasses, ladder, fridge, table, watermelon, chair, boxes, cups, water bottle, wall, pacifier, beard, …
What do people describe?
“A bearded man is holding a child in a sling.”
“A bearded man stands while holding a small child in a green sheet.”
“A bearded man with a baby in a sling poses.”
“Man standing in kitchen with little girl in green sack.”
“Man with beard and baby”

What’s in a description?
1) Given an image, predict what people will describe (e.g. Spain & Perona, 2010).
“two women sitting brunette blonde on bench reading magazine”
✔ women, ✔ bench, ✔ magazine, ✖ grass, ✖ skirt, …
2) Given a caption, predict what’s in the image.
“looking for castles in the clouds out my car window”
✔ clouds, ✖ car, ✖ window, ? castle

Who’s in the picture? T.L. Berg, A.C. Berg, J. Edwards, D.A. Forsyth
President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, 2003. Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters.
Model                              Accuracy of labeling
Vision model, no language model    67%
Vision model + language model      78%

Vision is hard
Green sheep
World knowledge (from descriptive text) can be used to smooth noisy vision predictions!
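As a rough illustration of the smoothing idea on this slide, the sketch below reweights noisy vision scores for an object's colour with a prior estimated from text. It is a minimal sketch, not the method used in the talk: the geometric-mean combination, the `weight` parameter, and all numbers (including the counts of colour words seen next to "sheep") are assumptions made up for the example.

```python
def smooth(vision_scores, text_prior, weight=0.5):
    """Blend vision scores with a text-derived prior (geometric mean), then renormalize.

    weight=0 trusts vision only; weight=1 trusts the text prior only.
    Labels missing from the prior get a tiny floor value instead of zero.
    """
    combined = {
        label: (score ** (1 - weight)) * (text_prior.get(label, 1e-6) ** weight)
        for label, score in vision_scores.items()
    }
    total = sum(combined.values())
    return {label: value / total for label, value in combined.items()}


# Hypothetical classifier output for the colour of a detected sheep.
vision = {"green": 0.55, "white": 0.45}

# Hypothetical counts of "<colour> sheep" in a pool of descriptive text.
counts = {"green": 1, "white": 60, "black": 12}
total_count = sum(counts.values())
prior = {colour: n / total_count for colour, n in counts.items()}

smoothed = smooth(vision, prior)
best = max(smoothed, key=smoothed.get)
print(best, smoothed)
```

With these made-up numbers the text prior overrides the slightly higher vision score for "green", so the smoothed label is "white".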

Learning World Knowledge
BabyTalk: Understanding and Generating Simple Image Descriptions. Kulkarni, Premraj, Dhar, Li, Choi, A.C. Berg, T.L. Berg, CVPR 2011
Attributes:
“green green grass by the lake”
“a very shiny car in the car museum in my hometown of upstate NY.”
Relationships:
“very little person in a big rocking chair”
“Our cat Tusik sleeping on the sofa near a hot radiator.”
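One simple way to turn captions like these into attribute statistics is to count which attribute words appear shortly before which object words. The sketch below does this with hand-written word lists and a small token window; the vocabularies, the window size, and the function name `attribute_object_pairs` are all assumptions for illustration, and a real system would more likely rely on a parser or tagger.

```python
from collections import Counter

# Hypothetical vocabularies, chosen to cover the example captions only.
ATTRIBUTES = {"green", "shiny", "little", "big", "hot"}
OBJECTS = {"grass", "car", "person", "chair", "sofa", "radiator", "cat", "lake"}


def attribute_object_pairs(caption, window=2):
    """Return the set of (attribute, object) pairs where the object word
    occurs within `window` tokens after the attribute word."""
    tokens = [t.strip(".,!?").lower() for t in caption.split()]
    pairs = set()
    for i, tok in enumerate(tokens):
        if tok in ATTRIBUTES:
            for nxt in tokens[i + 1 : i + 1 + window]:
                if nxt in OBJECTS:
                    pairs.add((tok, nxt))
                    break  # attach the attribute to the nearest object only
    return pairs


captions = [
    "green green grass by the lake",
    "a very shiny car in the car museum in my hometown of upstate NY.",
    "very little person in a big rocking chair",
    "Our cat Tusik sleeping on the sofa near a hot radiator.",
]

counts = Counter()
for caption in captions:
    counts.update(attribute_object_pairs(caption))

for pair, n in sorted(counts.items()):
    print(pair, n)
```

Counts of this kind, normalized per object, are one possible source for a text prior over attributes.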

System Flow
1) Input image.
2) Extract objects/stuff: a) dog, b) person, c) sofa.
3) Predict attributes (one score list per detected region):
brown 0.32, striped 0.09, furry .04, wooden .2, feathered .04, ...
brown 0.94, striped 0.10, furry .06, wooden .8, feathered .08
brown 0.01, striped 0.16, furry .26, feathered .06
4) Predict prepositions:
near(a,b) 1, near(b,a) 1, against(a,b) .11, against(b,a) .04, beside(a,b) .24, beside(b,a) .17, ...
near(a,c) 1, near(c,a) 1, against(a,c) .3, against(c,a) .05, beside(a,c) .5, beside(c,a) .45
near(b,c) 1, near(c,b) 1, against(b,c) .67, against(c,b) .33, beside(b,c) .0, beside(c,b) .19
5) Predict labeling (vision potentials smoothed with text potentials):
<<null,person_b>,against,<brown,sofa_c>>
<<null,dog_a>,near,<null,person_b>>
<<null,dog_a>,beside,<brown,sofa_c>>
6) Generate natural language description: “This is a photograph of one person and one brown sofa and one dog. The person is against the brown sofa. And the dog is near the person, and beside the brown sofa.”
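The last step of this flow, going from labeled triples to sentences, can be sketched with a plain template. The code below is only an illustration under assumptions: the tuple layout, the helper names `noun_phrase` and `describe`, and the one-sentence-per-triple template are invented here and do not reproduce the generation method of the BabyTalk system, which also merges sentences and adds an opening summary.

```python
def noun_phrase(attribute, obj):
    """Render an (attribute, object) pair; attribute may be None (the slide's 'null')."""
    return f"the {attribute} {obj}" if attribute else f"the {obj}"


def describe(triples):
    """Emit one templated sentence per <<attr1,obj1>,prep,<attr2,obj2>> triple."""
    sentences = []
    for (attr1, obj1), prep, (attr2, obj2) in triples:
        sentence = f"{noun_phrase(attr1, obj1)} is {prep} {noun_phrase(attr2, obj2)}."
        sentences.append(sentence[0].upper() + sentence[1:])
    return " ".join(sentences)


# The three triples from the slide, with 'null' written as None.
triples = [
    ((None, "person"), "against", ("brown", "sofa")),
    ((None, "dog"), "near", (None, "person")),
    ((None, "dog"), "beside", ("brown", "sofa")),
]

description = describe(triples)
print(description)
```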

BabyTalk results (objects, attributes, prepositions)
“This is a picture of one sky, one road and one sheep. The gray sky is over the gray road. The gray sheep is by the gray road.”
“Here we see one road, one sky and one bicycle. The road is near the blue sky, and near the colorful bicycle. The colorful bicycle is within the blue sky.”
“This is a picture of two dogs. The first dog is near the second furry dog.”

What should we recognize?
Recognition is beginning to work.
Open question: what should we recognize?
Maybe objects aren’t (always) the right base-level entities.

Object Recognition: Parts, Poselets, Attributes
For example: [Fergus, Perona, Zisserman 2003], [Bourdev, Malik 2009], …
Slide Credit: Ali Farhadi

Learn which terms in descriptions are depictable attributes
Automatically Discovering Attributes from Noisy Web Data. T.L. Berg, A.C. Berg, J. Shih, ECCV 2010
Fully beaded with megawatt crystals, this Christian Louboutin suede pump matches the gleam in your eye. Pump's linear heel plays up the alluring curves of its dipped sides. Round toe frames low-cut vamp. Tonally topstitched collar. 4" straight, covered heel shows off signature red sole. Creamy leather lining with padded insole. "Fifi" is made in Italy.

Given Web Images + Noisy Text Descriptions:
1) Discover visual attribute terms in text descriptions (likely domain dependent)
2) Learn appearance models for attributes without labeled data
3) Characterize attributes by: type, localizability
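One plausible reading of step 1 is that a term counts as "visual" when its presence in a description can be predicted from image features. The sketch below scores candidate terms by the leave-one-out accuracy of a nearest-centroid classifier. Everything in it is an assumption for illustration: the two-dimensional features (imagined as redness and brightness), the six toy items, the classifier choice, and the function names; it is not presented as the procedure of the cited paper.

```python
def centroid(points):
    """Mean of a non-empty list of equal-length feature tuples."""
    return tuple(sum(dim) / len(points) for dim in zip(*points))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def visualness(term, items):
    """Leave-one-out accuracy of predicting whether `term` occurs in an item's
    description from its image features, using a nearest-centroid classifier."""
    correct = 0
    for i, (features, text) in enumerate(items):
        rest = items[:i] + items[i + 1 :]
        pos = [f for f, t in rest if term in t.split()]
        neg = [f for f, t in rest if term not in t.split()]
        predicted = sq_dist(features, centroid(pos)) < sq_dist(features, centroid(neg))
        correct += predicted == (term in text.split())
    return correct / len(items)


# Hypothetical (redness, brightness) features paired with noisy descriptions.
items = [
    ((0.90, 0.5), "red leather pump made in italy"),
    ((0.80, 0.4), "red suede heel"),
    ((0.10, 0.5), "black pump made in italy"),
    ((0.20, 0.6), "black leather flat"),
    ((0.85, 0.6), "red sandal"),
    ((0.15, 0.4), "blue heel made in italy"),
]

scores = {term: visualness(term, items) for term in ("red", "italy")}
print(scores)
```

On this toy data "red" is perfectly predictable from the features while "italy" is not, which is the kind of gap a term-ranking step would exploit.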

Object Recognition Scenes For example: [Oliva, Torralba 2001], [SUN 2010], … Slide Credit: Ali Farhadi

What are the right quanta of recognition?
Farhadi & Sadeghi, Recognition using Visual Phrases, CVPR 2011

Participating in phrases profoundly affects the appearance of objects
Farhadi & Sadeghi, Recognition using Visual Phrases, CVPR 2011

What should we recognize?
“a sleeping dog in NTHU”
“the dog is sleeping”
“A dog is sleeping in”
“sleeping dog in delhi”
Maybe descriptive text can inform entity hypotheses!

What should we recognize?
“the cat is in the bag”
“cat in a bag”
“cat in bag”
“cat in the bag”
Maybe descriptive text can inform entity hypotheses!
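The four captions on this slide differ only in function words, which suggests a simple way to surface candidate entities: normalize captions and count the phrases that recur. The sketch below is a minimal version of that idea; the list of dropped words and the function name `normalize` are assumptions for the example, not something stated in the talk.

```python
from collections import Counter

# Hypothetical list of function words to ignore when comparing captions.
DROPPED = {"the", "a", "an", "is"}


def normalize(caption):
    """Lower-case a caption and drop function words so near-identical phrasings collide."""
    return " ".join(t for t in caption.lower().split() if t not in DROPPED)


captions = [
    "the cat is in the bag",
    "cat in a bag",
    "cat in bag",
    "cat in the bag",
]

phrase_counts = Counter(normalize(c) for c in captions)
top_phrase, top_count = phrase_counts.most_common(1)[0]
print(top_phrase, top_count)
```

A phrase that many independent captions collapse to, such as "cat in bag" here, is a candidate for a unit worth recognizing as a whole.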

Conclusion
Use large pools of descriptive text to:
1) Learn how people describe the visual world
2) Learn how the world works
3) Guide future efforts in recognition
Apply this knowledge to multi-modal collections & applications

Acknowledgements
Collaborators: Alex Berg, David Forsyth, Jaety Edwards, Jonathan Shih, Girish Kulkarni, Visruth Premraj, Sagnik Dhar, Vicente Ordonez, Siming Li, Yejin Choi, Kota Yamaguchi
Funded by NSF Faculty Early Career Development (CAREER) Program: Award #1054133