SBU Digital Media CSE 595 Words and Pictures Tamara L. Berg SUNY Stony Brook.

1 SBU Digital Media CSE 595 Words and Pictures Tamara L. Berg SUNY Stony Brook

2 SBU Digital Media Class Info
• CSE 595: Words & Pictures
• Instructor: Tamara Berg (tlberg@cs.sunysb.edu)
  Office: 1411 Computer Science
  Lectures: Tues/Thurs 1:20-2:20pm, Rm 2129 CS
  Office Hours: Tues/Thurs 2:20-3:20pm and by appt.
• Course Webpage: http://tamaraberg.com/teaching/Fall_12/wordspics

3 SBU Digital Media About Me
Joined Stony Brook in 2008
  - PhD from UC Berkeley, 2007
  - Yahoo! Research, 2007-2008
Research in computer vision and natural language processing: combining information from multiple forms of digital media for applications like image search and recognition.

4 SBU Digital Media You?
• MS/PhD?
• Experience in Computer Vision, Natural Language Processing, AI, Machine Learning?
• Familiar with Matlab?

5 SBU Digital Media What’s in this picture?

6 SBU Digital Media What does the picture tell us?
• Green, textured region - maybe a tree?
• Fuzzy black thing with a face-like part - maybe an animal?

7 SBU Digital Media What do the words tell us? Tags: leaves, endangered, green, i love nature, chennai, nilgiri langur, monkey, forest, wildlife, perch, black, wallpaper, ARK OF WILDLIFE, topv111, WeeklySurvivor, top20HallFame, topv333, 100v10f, captive, simian

8 SBU Digital Media What do words+picture tell us? Tags: leaves, endangered, green, i love nature, chennai, nilgiri langur, monkey, forest, wildlife, perch, black, wallpaper, ARK OF WILDLIFE, topv111, WeeklySurvivor, top20HallFame, topv333, 100v10f, captive, simian

9 SBU Digital Media Consumer Photo Collections
Over the hills and far away Road, Hills, Germany, Hoffenheim, Outstanding Shots, specland, Baden-Wuerttemberg
Heavenly Peacock, AlbinoPeacock, WhiteBeauty, Birds, Wildlife, FeathredaleWildlifePark, PictureAustralia, ImpressedBeauty
End of the world - Verdens Ende - The lighthouse 1 Verdens ende, end of the world, norway, lighthouse, ABigFave, vippefyr, wood, coal
Flickr: 3+ billion photographs, 3-5 million uploaded per day

10 SBU Digital Media Museum and Library Collections
• Fine Arts Museum of San Francisco (82,000 images)
  Woman of Head Howard H G Mrs Gift America North bust States United Sculpture marble bowl stemmed small Irridescent glass
• New York Public Library Digital Collection
  The new board walk, Rockaway, Long Island
  Part of New England, New York, east New Iarsey and Long Iland.

11 SBU Digital Media Web Collections Billions of Web Pages

12 SBU Digital Media Video
OUTSIDE IN THE RAIN THE SENATOR WEARING HIS UH BASEBALL CAP A BOSTON RED SOX CAP AS HE TALKED TO HIS SUPPORTERS HERE IN THE RAIN THE UH SENATOR THEY'RE DOING HIS BEST TO TRY TO MAKE HIS CASE THAT HE WILL BE THE MAN FOR THE MIDDLE CLASS AND UH TRY TO CONVINCE HIS SUPPORTERS TO EXPRESS THEIR SUPPORT THROUGH A VOTE ON TUESDAY IN THERE WE ARE TWENTY FOUR HOURS FROM THE GREAT MOMENT THAT THE WORLD IN AMERICA IS WAITING FOR IT I NEED TO YOU IN THESE HOURS TO GO OUT AND DO THE HARD WORK NOT ON THOSE DOORS MAKE THOSE PHONE CALLS TO TALK TO FRIENDS TAKE PEOPLE TO THE POLLS HELP US CHANGE THE DIRECTION OF THIS GREAT NATION FOR THE BETTER CAN YOU IMAGINE A UH SENATOR BEGINNING HIS DAY IN FLORIDA TODAY
TrecVid 2006: video frames with speech processing output

13 SBU Digital Media Consumer Products
Soft and glossy patent calfskin trimmed with natural vachetta cowhide, open top satchel for daytime and weekends, interior double slide pockets and zip pocket, seersucker stripe cotton twill lining, kate spade leather license plate logo, imported. 2.8" drop length, 14"h x 14.2"w x 6.9"d. Katespade.com
It's the perfect party dress. With distinctly feminine details such as a wide sash bow around an empire waist and a deep scoopneck, this linen dress will keep you comfortable and feeling elegant all evening long. * Measures 38" from center back, hits at the knee. * Scoopneck, full skirt. * Hidden side zip, fully lined. * 100% Linen. Dry clean. bananarepublic.com
Internet retail transactions totaled $145 billion in 2006 and $175 billion in 2007 (Forrester Research).

14 SBU Digital Media Lots of Data!

15 SBU Digital Media What do we want to do?

16 SBU Digital Media What do we want to do? Organize Search Browse

17 SBU Digital Media What do we want to do? Organize Search Browse

18 SBU Digital Media What do we want to do? Organize Search Browse Computing Iconic Summaries for General Visual Concepts. R. Raguram and S. Lazebnik, 2008.

19 SBU Digital Media What do we want to do? Image Search circa 2007 Organize Search Browse

20 SBU Digital Media What do we want to do? Image Search now Organize Search Browse

21 SBU Digital Media What do we want to do? Image re-ranking for “monkey”. Tamara L. Berg and David A. Forsyth, “Animals on the Web,” CVPR 2006. Organize Search Browse

22 SBU Digital Media What do we want to do? Visual shopping at like.com Organize Search Browse

23 SBU Digital Media What do we want to do? Visual attribute discovery. Tamara L. Berg, Alexander C. Berg, and Jonathan Shih, “Automatic Attribute Discovery and Characterization from Noisy Web Data,” ECCV 2010. Organize Search Browse

24 SBU Digital Media What do we want to do? Visual attribute discovery. J. Wang, K. Markert, and M. Everingham, “Learning models for object recognition from natural language descriptions,” BMVC 2009. Organize Search Browse

25 SBU Digital Media Types of Words & Pictures

26 SBU Digital Media General web pages

27 SBU Digital Media General web pages. Image re-ranking for “monkey”. Tamara L. Berg and David A. Forsyth, “Animals on the Web,” CVPR 2006. Improving Search

28 SBU Digital Media General web pages. F. Schroff, A. Criminisi, and A. Zisserman, “Harvesting Image Databases from the Web,” ICCV 2007. Mining to build big computer vision data sets.

29 SBU Digital Media General web pages Pros? Cons?

30 SBU Digital Media Tags or keywords + images Tags: canon, eos, macro, japan, frog, animal, toad, amphibian, pet, eye, feet, mouth, finger, hand, prince, photo, art, light, photo, flickr, blurry, favorite, nice.

31 SBU Digital Media Tags or keywords + images Gang Wang, Derek Hoiem, and David Forsyth, Building text features for object image classification. CVPR, 2009. Using tags and similar images for novel image classification

32 SBU Digital Media Tags or keywords + images Tag Order as implicit cue to expected size “Reading Between The Lines: Object Localization Using Implicit Cues from Image Tags” Sung Ju Hwang and Kristen Grauman

33 SBU Digital Media Tags or keywords + images Tags: canon, eos, macro, japan, frog, animal, toad, amphibian, pet, eye, feet, mouth, finger, hand, prince, photo, art, light, photo, flickr, blurry, favorite, nice. Pros? Cons?

34 SBU Digital Media President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, 2003. Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters Captioned images

35 SBU Digital Media President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, 2003. Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters Captioned images for face labeling Captions provide direct information about depiction!

36 SBU Digital Media Who's Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation Jie Luo, Barbara Caputo, Vittorio Ferrari NIPS 2009 Captioned images for face and pose labeling

37 SBU Digital Media Videos with transcripts

38 SBU Digital Media M. Everingham, J. Sivic, and A. Zisserman, “‘Hello! My name is... Buffy’ - Automatic naming of characters in TV video,” BMVC 2006. Videos with transcripts for face labeling

39 SBU Digital Media Learning by Watching

40 SBU Digital Media P. Buehler, M. Everingham, and A. Zisserman. "Learning sign language by watching TV (using weakly aligned subtitles)". CVPR 2009. Learning Sign Language

41 SBU Digital Media David L. Chen and Raymond J. Mooney, “Learning to Sportscast: A Test of Grounded Language Acquisition” (2008). Learning to Sportscast

42 SBU Digital Media Learning About Semantics

43 SBU Digital Media Traditional Recognition car shoe person

44 SBU Digital Media Beyond traditional recognition

45 SBU Digital Media Beyond traditional recognition “It was an arresting face, pointed of chin, square of jaw. Her eyes were pale green without a touch of hazel, starred with bristly black lashes and slightly tilted at the ends. Above them, her thick black brows slanted upward, cutting a startling oblique line in her magnolia-white skin–that skin so prized by Southern women and so carefully guarded with bonnets, veils and mittens against hot Georgia suns” – Scarlett O’Hara, Gone with the Wind.

46 SBU Digital Media Attributes. Visual attribute learning from text. Tamara L. Berg, Alexander C. Berg, and Jonathan Shih, “Automatic Attribute Discovery and Characterization from Noisy Web Data,” ECCV 2010.

47 SBU Digital Media Object relationships

48 SBU Digital Media Object relationships Object relationships – prepositions & adjectives Beyond Nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers Abhinav Gupta and Larry S. Davis In ECCV 2008 Car is on the street

49 SBU Digital Media Cross-Language Learning Learning Bilingual Lexicons using the Visual Similarity of Labeled Web Images Shane Bergsma and Benjamin Van Durme 2011

50 SBU Digital Media Descriptive Text
• Visually descriptive language offers:
  1) information about the world, especially the visual world.
  2) training data for how people construct natural language to describe imagery.
“It was an arresting face, pointed of chin, square of jaw. Her eyes were pale green without a touch of hazel, starred with bristly black lashes and slightly tilted at the ends. Above them, her thick black brows slanted upward, cutting a startling oblique line in her magnolia-white skin–that skin so prized by Southern women and so carefully guarded with bonnets, veils and mittens against hot Georgia suns” – Scarlett O’Hara, Gone with the Wind.

51 SBU Digital Media Generating descriptions for images

52 SBU Digital Media Generating Captions for News Images with Articles. “How Many Words is a Picture Worth? Automatic Caption Generation for News Images,” Feng & Lapata 2010

53 SBU Digital Media Generating Simple Descriptions for images
“This picture shows one person, one grass, one chair, and one potted plant. The person is near the green grass, and in the chair. The green grass is by the chair, and near the potted plant.”
Baby Talk: Understanding and Generating Simple Image Descriptions (2011). Girish Kulkarni, Visruth Premraj, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, Tamara L. Berg

54 SBU Digital Media Im2Text: Describing Images Using 1 Million Captioned Photographs Vicente Ordonez, Girish Kulkarni, Tamara L. Berg Stony Brook University NIPS 2011 One of the many stone bridges in town that carry the gravel carriage roads. An old bridge over dirty green water. A stone bridge over a peaceful river. Generate Natural Sounding Descriptions

55 SBU Digital Media Summary
• Enormous amounts of data.
• Lots of commercial and academic applications.
• We should combine information from words & pictures intelligently.

56 SBU Digital Media Overall Class Goal
• Gain exposure to interesting and current research on Words & Pictures.
  No prior experience in Computer Vision or Natural Language Processing is required.
• We will be reading a variety of research papers over the course of the semester.
• Please read the papers!

57 SBU Digital Media General knowledge lectures Computer Vision Natural Language Processing Features & Representations Clustering Discriminative Models & Classification Generative & Topic Models

58 SBU Digital Media Your responsibilities
• Homework – 3 relatively simple assignments.
• Project – final project including proposal, update, and final presentation & write-up.
• Participation – read papers and participate in topic discussions.
• Topic presentations – one in-class topic presentation in groups of 4-5.
30% 10%
Late assignments/projects will be accepted with a 10% reduction in value per day late.

59 SBU Digital Media Homework & Projects
• Assignments should be completed individually in Matlab.
• Projects will be in groups of 3 and can be completed in the language of your choice on the topic of your choice (must involve text and images/video).

60 SBU Digital Media Participation Experiment
• Goal: interesting, lively discussions about research topics.
• To encourage this, at the end of each class please submit a paper noting how many (if any) questions you posed, answers you provided, or significant comments you made.
• If this does not work, we will revert to having short sporadic pop quizzes on papers.

61 SBU Digital Media Note about papers
• You won’t understand everything, especially at first.
• Don’t sweat the small stuff.
• Try to grasp the overall idea, what’s novel, what’s interesting, pros/cons of the method, how it relates to other things we’ve read.

62 SBU Digital Media Topic Presentations
• You will give one topic presentation during the semester in groups of 4-5.
• Suggested papers for each topic presentation are listed on the course website.
• You are welcome to swap papers (if relevant to your topic), but please ask me at least 1 week prior to the presentation.

63 SBU Digital Media Reference Books
1) Forsyth, David A., and Ponce, J. Computer Vision: A Modern Approach, Prentice Hall, 2003.
2) Hartley, R. and Zisserman, A. Multiple View Geometry in Computer Vision, Academic Press, 2002.
3) Jurafsky and Martin, SPEECH and LANGUAGE PROCESSING: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, McGraw Hill, 2008.
4) Christopher D. Manning and Hinrich Schuetze. Foundations of Statistical Natural Language Processing.

64 SBU Digital Media For next class
• Get access to Matlab
  - Student Matlab licenses can be purchased from MathWorks for $99
• Do a Matlab tutorial
  - One link is on the course website; many others are available online.

65 SBU Digital Media Class Info
• CSE 595: Words & Pictures
• Instructor: Tamara Berg (tlberg@cs.sunysb.edu)
  Office: 1411 Computer Science
  Lectures: Tues/Thurs 1:20-2:20pm, Rm 2129 CS
  Office Hours: Tues/Thurs 2:20-3:20pm and by appt.
• Course Webpage: http://tamaraberg.com/teaching/Fall_12/wordspics

