Data-driven Generation of Image Descriptions Vicente Ordonez-Roman The State University of New York Previously: Advisor: Tamara Berg.

1 Data-driven Generation of Image Descriptions Vicente Ordonez-Roman The State University of New York Previously: Advisor: Tamara Berg

2 What most Computer Vision systems aim to say about a picture sky trees water building bridge river tree Computer Vision

3 What we are able to say about a picture One of the many stone bridges in town that carry the gravel carriage roads. An old bridge over dirty green water. A stone bridge over a peaceful river. Our Goal

4 Let's just borrow captions from similar images! Im2Text: Describing Images Using 1 Million Captioned Photographs. Vicente Ordonez, Girish Kulkarni, Tamara L. Berg. Advances in Neural Information Processing Systems. NIPS 2011.

5 Harness the Web! Smallest house in paris between red (on right) and beige (on left). Bridge to temple in Hoan Kiem lake. The water is clear enough to see fish swimming around in it. A walk around the lake near our house with Abby. Hangzhou bridge in West lake. The daintree river by boat.... Images + Captions from the Web Transfer Caption(s) Matching using Global Image Features (GIST + Color) e.g. The water is clear enough to see fish swimming around in it.
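The matching step on this slide can be sketched as a nearest-neighbor lookup. This is a minimal illustration, not the authors' implementation: the feature vectors below are made-up 3-D stand-ins for a global descriptor (GIST + color), and the function names `l2_distance` and `transfer_captions` are hypothetical.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def transfer_captions(query_feat, database, k=1):
    """Return the captions of the k database images closest to the query.

    database: list of (feature_vector, caption) pairs. Each feature vector
    stands in for a global image descriptor; computing real GIST or color
    features is outside the scope of this sketch.
    """
    ranked = sorted(database, key=lambda item: l2_distance(query_feat, item[0]))
    return [caption for _, caption in ranked[:k]]

# Toy vectors (assumed values, not real descriptor output).
database = [
    ([0.9, 0.1, 0.2], "Smallest house in paris between red (on right) and beige (on left)."),
    ([0.2, 0.8, 0.7], "The water is clear enough to see fish swimming around in it."),
    ([0.3, 0.7, 0.9], "Hangzhou bridge in West lake."),
]
query = [0.2, 0.8, 0.6]
print(transfer_captions(query, database, k=1))
```

With a collection of a million images, an exhaustive scan like this would likely be replaced by an approximate nearest-neighbor index, but the ranking idea stays the same.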

6 GIST

7 Use the web to collect images + captions 6,000,000,000 photographs! (*) A lot of them with captions (lots of them publicly available) 90,000,000,000 pictures! (**) A lot of them with captions (a lot of them not publicly available) (*) http://blog.flickr.net/en/2011/08/04/6000000000/ (**) http://www.quora.com/How-many-photos-are-uploaded-to-Facebook-each-day

8 Dog with a ball in its mouth running around like crazy on the green grass. cat in a sink A 10-kg cat called Hercules.. and got caught in a pet door when trying to sneak into another house to steal dog food. 'Nuff said Flickr images + captions

9 Dog with a ball in its mouth running around like crazy on the green grass. cat in a sink A 10-kg cat called Hercules.. and got caught in a pet door when trying to sneak into another house to steal dog food. 'Nuff said Flickr images + captions Dog with a ball in its mouth running around like crazy on the green grass.

10 cat in a sink A 10-kg cat called Hercules.. and got caught in a pet door when trying to sneak into another house to steal dog food. 'Nuff said Flickr images + captions

11 Dog with a ball in its mouth running around like crazy on the green grass. cat in a sink A 10-kg cat called Hercules.. and got caught in a pet door when trying to sneak into another house to steal dog food. 'Nuff said Flickr images + captions cat in a sink

12 Dog with a ball in its mouth running around like crazy on the green grass. cat in a sink A 10-kg cat called Hercules.. and got caught in a pet door when trying to sneak into another house to steal dog food. 'Nuff said Flickr images + captions

13 Dog with a ball in its mouth running around like crazy on the green grass. cat in a sink A 10-kg cat called Hercules.. and got caught in a pet door when trying to sneak into another house to steal dog food. 'Nuff said Flickr images + captions A 10-kg cat called Hercules.. and got caught in a pet door when trying to sneak into another house to steal dog food. 'Nuff said

14 Solution: Collect hundreds of millions of captions, then filter them. We found good captions have visual concepts and relation words (by, in, over, beside, on top of). ~1 good caption for every 1000 bad captions. Im2Text: Describing Images Using 1 Million Captioned Photographs. Vicente Ordonez, Girish Kulkarni, Tamara L. Berg. Advances in Neural Information Processing Systems. NIPS 2011.
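The filtering rule on this slide can be illustrated with a small sketch. The relation words come from the slide; the `VISUAL_TERMS` set and the function names are assumptions made for the example, and the real filter presumably used a much larger vocabulary of visual concepts.

```python
# Relation words listed on the slide.
RELATION_WORDS = ("by", "in", "over", "beside", "on top of")

# Hypothetical stand-in for a vocabulary of visual concepts.
VISUAL_TERMS = {"dog", "cat", "bridge", "river", "tree", "grass", "sky"}

def tokenize(caption):
    """Lowercase the caption and strip surrounding punctuation from each word."""
    words = (w.strip(".,!?'\"()").lower() for w in caption.split())
    return [w for w in words if w]

def is_good_caption(caption):
    """Keep a caption only if it has a visual term and a relation word."""
    tokens = tokenize(caption)
    # Pad with spaces so multi-word relations like "on top of" match whole words.
    padded = " " + " ".join(tokens) + " "
    has_relation = any(" " + rel + " " in padded for rel in RELATION_WORDS)
    has_visual = any(tok in VISUAL_TERMS for tok in tokens)
    return has_relation and has_visual

captions = [
    "Dog with a ball in its mouth running around like crazy on the green grass.",
    "cat in a sink",
    "'Nuff said",
]
print([c for c in captions if is_good_caption(c)])
```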

15 SBU Captioned Photo Dataset Our dog Zoe in her bed Interior design of modern white and brown living room furniture against white wall with a lamp hanging. The Egyptian cat statue by the floor clock and perpetual motion machine in the pantheon Man sits in a rusted car buried in the sand on Waitarere beach Emma in her hat looking super cute Little girl and her dog in northern Thailand. They both seemed interested in what we were doing 1 million captioned photos!

16 Results (1) while walking by the water (2) plane flying over the sun (3) shot this in a moving car at the nkve highway (4) sunset over creve coeur lake and the page bridge (5) sunset on 12th sep 2009 as seen from the field polder near my house (6) window over yellow door (7) sunset over capitol hill as seen from the roof of my building (8) an orange sky over the irish sea (9) beautiful golden sunset reflected in the waves of the ocean (10) red sky probably caused by volcanic ash from iceland (11) a view of sunset over river brahmaputa from koliyabhumura bridge (12) red sky in the morning

17 Results (1) burnt wooden door in derelict building portugal (2) peterborough cathedral norman door in south wall (3) amazing wooden door with wider light above (4) door in wall (5) girl looking in a classroom window (6) a interesting cross in a window of an ancient city (7) this mirror decorated with fruit painting was left behind by the previous owners (8) unusual exterior wall postbox at st albans post office in st peters street al1 (9) door in oxford uk in black and white (10) 19 plate behind glass in brass mat and preserver (11) this is some of the window decoration external on the house just over the porch 0364 (12) cat in a window

18 Results (1) img8783 ginger in the red chair (2) red sky in the morning (3) the cat is in the bag and the bag is in the river quot (4) the light in the kitchen made everythin glow my little girl is growing up (5) my cat in a box that is far too small for her (6) one of the towel animals in the cabin edno ot jivotnite napraveno ot havlieni karpi v kabinata (7) baby in her later years turned from green to red but she never went fully red all over (8) if you take pictures through the hole in the bottom of a flower pot the whole of the eldritch world is revealed (9) glazed ceramic poop form in orange wooden box (10) rock garden in library (11) it s funny to capture the preciousest cat in the house at his most devillicious (12) the pink will get replaced by orange and blue in the fall

19 Results (1) starfish from the book toys to knit dashing dachs superwash sock yarn in goldfish backing is orange fabric stuffing is pillow stuffing (2) mural of birds and trees in the crypt of wat ratburana ayutthaya (3) carvings in the rock wall (4) acrylic on paper scarlet macaws communicate in the color red with yellow and blue as visual grammar (5) epsom and table salt crystals growing in concentrated green tea solution (6) the hops dried to a golden green in a matter of a few days almost too pretty to bag up (7) after staring at the gorgeous colors of the leaves claes discovered that there were about 100 birds sleeping in the tree (8) you know you re in wisconsin when the beach has pine needles in the sand (9) i was walking down the sidewalk and i saw this glove craft dropped in the dirt it seemed really unusual (10) made by fusing plastic bags (11) bark pattern from a ponderosa pine tree in grand canyon national park (12) the peasant that found a statue of the black virgin on a rock in a river

20 What to do next?

21 Use High Level Content to Rerank (Objects, Stuff, People, Scenes, Captions) The bridge over the lake on Suzhou Street. The Daintree river by boat. Bridge over Cacapon river. Iron bridge over the Duck river.... Transfer Caption(s) e.g. The bridge over the lake on Suzhou Street.
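One way to picture the reranking step is a weighted sum of per-candidate scores. This is only a sketch under stated assumptions: the score names, the score values, and the weights are invented for illustration, and the `rerank` helper is hypothetical. In the talk the combination is learned (the results slide mentions linear regression and a linear SVM).

```python
def rerank(candidates, weights):
    """Sort candidate captions by a weighted sum of their scores.

    candidates: list of dicts with a 'caption' string and a 'scores' dict
    mapping a score name (e.g. global similarity, object match) to a float.
    weights: dict mapping the same score names to their weights.
    """
    def combined(candidate):
        return sum(w * candidate["scores"].get(name, 0.0)
                   for name, w in weights.items())
    return sorted(candidates, key=combined, reverse=True)

# Assumed scores: the second caption is slightly worse on global similarity
# but agrees better with detected content (a bridge, a lake scene).
candidates = [
    {"caption": "The Daintree river by boat.",
     "scores": {"global": 0.9, "object": 0.1, "scene": 0.5}},
    {"caption": "The bridge over the lake on Suzhou Street.",
     "scores": {"global": 0.8, "object": 0.9, "scene": 0.7}},
]
weights = {"global": 1.0, "object": 1.0, "scene": 0.5}  # hypothetical weights
print(rerank(candidates, weights)[0]["caption"])
```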

22 Some success… Amazing colours in the sky at sunset with the orange of the cloud and the blue of the sky behind. Strange cloud formation literally flowing through the sky like a river in relation to the other clouds out there. Fresh fruit and vegetables at the market in Port Louis Mauritius. Tree with red leaves in the field in autumn. A female mallard duck in the lake at Luukki Espoo The sun was coming through the trees while I was sitting in my chair by the river Under the sky of burning clouds. Stained glass window in Eusebius church.

23 Still far from perfect Kentucky cows in a field. The cat in the window. Incorrect objects

24 Still far from perfect The sky is blue over the Gherkin. The boat ended up a kilometre from the water in the middle of the airstrip. Tree beside the river. Water over the road. Incorrect context Completely wrong

25 How to Evaluate? Ground truth: The car is parked next to the train station besides a building. Candidates: (1) There is car parked in front of an office building (2) This is the building that hosted the ceremony (3) A vehicle stopped next to my house. Similar to evaluation on Machine Translation
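To make the machine-translation analogy concrete, here is a simplified unigram BLEU (clipped unigram precision times a brevity penalty) applied to the example on this slide. The slides do not say which BLEU variant or n-gram order was used, so treat `bleu1` as an illustrative assumption rather than the exact metric behind the reported numbers.

```python
import math
from collections import Counter

def tokens(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,!?") for w in text.lower().split())
    return [w for w in words if w]

def bleu1(candidate, reference):
    """Simplified BLEU-1: clipped unigram precision with a brevity penalty."""
    cand, ref = tokens(candidate), tokens(reference)
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Each candidate word counts at most as often as it occurs in the reference.
    clipped = sum(min(n, ref_counts[w]) for w, n in Counter(cand).items())
    precision = clipped / len(cand)
    # Penalize candidates that are shorter than the reference.
    penalty = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return penalty * precision

REFERENCE = "The car is parked next to the train station besides a building."
CANDIDATES = [
    "There is car parked in front of an office building",
    "This is the building that hosted the ceremony",
    "A vehicle stopped next to my house",
]
scores = [bleu1(c, REFERENCE) for c in CANDIDATES]
for caption, score in zip(CANDIDATES, scores):
    print(f"{score:.3f}  {caption}")
```

Word overlap rewards the first candidate for sharing "car", "parked", and "building" with the ground truth, even though all three candidates could plausibly describe the same photo. That limitation is one reason the talk also reports human evaluations.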

26 BLEU score evaluation against Human Captions
Method                                         BLEU score
Global matching (1k)                           0.0774
Global matching (10k)                          0.0909
Global matching (100k)                         0.0917
Global matching (1 million)                    0.1177
Global + Content matching (linear regression)  0.1215
Global + Content matching (linear SVM)         0.1259

27 Human Visual Verification View overlooking Kuala Lumpur from my office building Please choose the image that better corresponds to the given caption:

28 Human Visual Verification View overlooking Kuala Lumpur from my office building Please choose the image that better corresponds to the given caption: Caption from Flickr Random image

29 Human Visual Verification View overlooking Kuala Lumpur from my office building Please choose the image that better corresponds to the given caption: Caption from Flickr Random image
Caption used                   Success rate
Original human caption         96.0%
Top caption                    66.7%
Best from our top 4 captions   92.7%

30 Human Visual Evaluation The view from the 13th floor of an apartment building in Nakano awesome. Please choose the image that better corresponds to the given caption: Caption produced by our system Random image
Caption used                   Success rate
Original human caption         96.0%
Top caption                    66.7%
Best from our top 4 captions   92.7%

32 What to do next?

33 Let's not borrow captions from other images, let's just borrow short phrases! Collective Generation of Natural Image Descriptions. Polina Kuznetsova, Vicente Ordonez, Alexander C. Berg, Tamara L. Berg, Yejin Choi. Association for Computational Linguistics. ACL 2012. Large Scale Retrieval for Image Description Generation. Vicente Ordonez, Xufeng Han, Polina Kuznetsova, Girish Kulkarni, Margaret Mitchell, Kota Yamaguchi, Karl Stratos, Amit Goyal, Jesse Dodge, Alyssa Mensch, Hal Daume III, Alexander C. Berg, Yejin Choi, Tamara L. Berg. In submission to IJCV special issue on Big Data.

34 Retrieving noun phrases from similar object detections

35 this dog was laying in the middle of the road on a back street in jaco Closeup of my dog sleeping under my desk. Detect: dog Find matching dog detections by visual similarity Peruvian dog sleeping on city street in the city of Cusco, (Peru) Contented dog just laying on the edge of the road in front of a house.. Retrieving verb phrases from similar object detections

36 Find matching region detections using appearance + arrangement Mini Nike soccer ball all alone in the grass Comfy chair under a tree. I positioned the chairs around the lemon tree -- it's like a shrine Object: car Cordoba - lonely elephant under an orange tree... Retrieving prepositional phrases from region + detection matches

37 Retrieving prepositional phrases from scene matches View from our B&B in this photo Extract scene descriptor Find matching images by scene similarity Pedestrian street in the Old Lyon with stairs to climb up the hill of fourviere I'm about to blow the building across the street over with my massive lung power. Only in Paris will you find a bottle of wine on a table outside a bookstore

38 Data Processing 1 million images:
– Run object detectors
– Run region based stuff detectors (e.g. grass, sky, etc.)
– Run global scene classifiers
– Parse captions associated with images and retrieve phrases referring to objects (NPs, VPs), region relationships (PPstuff), and general scene context (PPscene).

39 Recognition, aka Vision is hard Detecting one hundred objects

40 Sometimes you can make it (a little) better Detecting mentioned objects The background is a vintage paint by number painting I have and the fabulous forest dress is by candyjunky! Kevins mom, so punxrawk in Kevs black flag hat Look in the mountain for a lion face Ecuador, amazon basin, near coca, rain forest, passion fruit flower

41 Everything together bird in water in Lincoln City Oregon coast Objects Actions Scene Stuff looking for food

42 Everything together Retrieved phrases bird in water on the beach bird in water in Lincoln City Oregon coast bird in water in Atlantic City looking for food

43 Binary Integer Linear Programming [Diagram: the objective combines phrase vision confidence (phrase s_ij at position k) with pairwise phrase cohesion (phrase s_ij at position k followed by phrase s_pq at position k+1); the cohesion terms shown are n-gram cohesion and head-word co-occurrence.]

44 Composing Descriptions Compose descriptions from phrases with ILP approach
Linguistic constraints
– Allow only one phrase of each type
– Enforce plural/singular agreement between NP and VP
Discourse constraints
– Prevent inclusion of repeated phrasing
Phrase cohesion constraints
– n-gram statistics between phrases
– Co-occurrence statistics between head words of phrases (last word or main verb) to encourage longer range cohesion
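The composition step can be illustrated with a brute-force stand-in for the ILP: pick one phrase per type and score each combination by summed vision confidence plus cohesion between adjacent phrases. Everything numeric here is invented for the example, the phrase order is fixed instead of being chosen by position variables, and the agreement and repetition constraints are omitted, so this is a sketch of the objective rather than the actual solver.

```python
from itertools import product

def compose(candidates, cohesion, order=("NP", "VP", "PP_stuff", "PP_scene")):
    """Exhaustively pick one phrase per type to maximize the total score.

    candidates: dict mapping a phrase type to a list of (phrase, confidence).
    cohesion: dict mapping (phrase_a, phrase_b) to a pairwise cohesion score
    for phrase_a immediately followed by phrase_b (missing pairs score 0).
    Returns (description, score).
    """
    pools = [candidates[t] for t in order if candidates.get(t)]
    best_score, best_phrases = float("-inf"), []
    for combo in product(*pools):
        phrases = [phrase for phrase, _ in combo]
        score = sum(conf for _, conf in combo)
        score += sum(cohesion.get(pair, 0.0) for pair in zip(phrases, phrases[1:]))
        if score > best_score:
            best_score, best_phrases = score, phrases
    return " ".join(best_phrases), best_score

# Phrases from the "Everything together" slides; all scores are assumed.
candidates = {
    "NP": [("bird", 0.9)],
    "VP": [("looking for food", 0.6)],
    "PP_stuff": [("in water", 0.7)],
    "PP_scene": [("on the beach", 0.5),
                 ("in Lincoln City Oregon coast", 0.4),
                 ("in Atlantic City", 0.3)],
}
cohesion = {
    ("in water", "in Lincoln City Oregon coast"): 0.3,
    ("in water", "on the beach"): 0.1,
}
description, score = compose(candidates, cohesion)
print(description)
```

Exhaustive search is only workable for a handful of candidate phrases. Formulating the same objective as an integer linear program lets a solver handle larger candidate sets together with the constraints listed on the slide.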

45 Good Results This is a sporty little red convertible made for a great day in Key West FL. This car was in the 4th parade of the apartment buildings. Taken in front of my cat sitting in a shoe box. Cat likes hanging around in my recliner. This is a brass viking boat moored on beach in Tobago by the ocean.

46 Bad Results This is a shoulder bag with a blended rainbow effect One of the most shirt in the wall of the house. Here you can see a cross by the frog in the sky. Not relevant Grammatically incorrect. Cognitive absurdity.

47 BLEU score evaluation
Method                                   BLEU score
HMM (using cognitive phrases)            0.111
HMM (without using cognitive phrases)    0.114
ILP (using cognitive phrases)            0.114
ILP (without using cognitive phrases)    0.116

48 Human Forced Choice Evaluation
Caption used                                          ILP Selection
ILP vs. HMM (no images, no cognitive phrases)         67.2%
ILP vs. HMM (no images, with cognitive phrases)       66.3%
ILP vs. HMM (with images, no cognitive phrases)       53.17%
ILP vs. HMM (with images, with cognitive phrases)     54.5%
ILP vs. NIPS 2011 (Global matching 1M)                71.8%
ILP vs. HUMAN                                         16%

49 Visual Turing Test In some cases (16%), ILP generated captions were preferred over human written ones! Us vs Original Human Written Caption

50 What's next?

51 Meaning from large-scale computer vision To be presented at ICCV 2013 Images with the word house Images recognized as more likely to produce the word house

52 Meaning from large-scale computer vision Images with the word girl Images recognized as more likely to produce the word girl To be presented at ICCV 2013

53 Meaning from large-scale computer vision Mammals Birds Instruments Structures Plants Other Weights learned to recognize images with desk in caption Top weighted classifier outputs Weights learned over outputs of ~8k classifiers To be presented at ICCV 2013

54 Meaning from large-scale computer vision Mammals Birds Instruments Structures Plants Other Weights learned to recognize images with tree in caption Top weighted classifier outputs Weights learned over outputs of ~8k classifiers To be presented at ICCV 2013

55 Meaning from large-scale computer vision Mammals Birds Instruments Structures Plants Other Weights learned to recognize images with tree in caption Top weighted classifier outputs Weights learned over outputs of ~8k classifiers

56 Questions?

