Published by Tamsin McKinney. Modified over 8 years ago.
Ghadeer Abu Hariri – Weizmann Institute
Predicting the Location of “Interactees” in Novel Human-Object Interactions – Chao-yeh Chen & Kristen Grauman
Temporal Perception and Prediction in Ego-Centric Video – Yipin Zhou & Tamara L. Berg
Predicting the location of “interactees”
What is an interactee? The person or object that another person is interacting with.
Related work: human-object interactions for recognition
[1] Peursum, P., West, G., Venkatesh, S.: Combining image regions and human activity for indirect object recognition in indoor wide-angle views. In: ICCV (2005)
[2] Gupta, A., Kembhavi, A., Davis, L.: Observing human-object interactions: using spatial and functional compatibility for recognition. PAMI 31 (2009)
[3] Desai, C., Ramanan, D., Fowlkes, C.: Discriminative models for static human-object interactions. In: Workshop on Structured Models in Computer Vision (SMiCV), CVPR (2010)
[4] Yao, B., Fei-Fei, L.: Modeling mutual context of object and human pose in human-object interaction activities. In: CVPR (2010)
[5] Yao, B., Fei-Fei, L.: Grouplet: a structured image representation for recognizing human and object interactions. In: CVPR (2010)
[6] Ikizler-Cinbis, N., Sclaroff, S.: Object, scene and actions: combining multiple features for human action recognition. In: ECCV (2010)
[7] Prest, A., Schmid, C., Ferrari, V.: Weakly supervised learning of interactions between humans and objects. PAMI 34 (2012) 601–614
[8] Delaitre, V., Fouhey, D., Laptev, I., Sivic, J., Gupta, A., Efros, A.: Scene semantics from long-term observation of people. In: ECCV (2012)
Carried object detection: methods to detect carried objects
[1] Haritaoglu, I., Harwood, D., Davis, L.: W4: real-time surveillance of people and their activities. PAMI (2000)
[2] Damen, D., Hogg, D.: Detecting carried objects in short video sequences. In: ECCV (2008)
Object importance and saliency
[1] Spain, M., Perona, P.: Some objects are more equal than others: measuring and predicting importance. In: ECCV (2008)
[2] Hwang, S.J., Grauman, K.: Learning the relative importance of objects from tagged images for retrieval and cross-modal search. IJCV 100 (2012) 134–153
[3] Berg, A., Berg, T., Daume, H., Dodge, J., Goyal, A., Han, X., Mensch, A., Mitchell, M., Sood, A., Stratos, K., Yamaguchi, K.: Understanding and predicting importance in images. In: CVPR (2012)
[4] Liu, T., Sun, J., Zheng, N., Tang, X., Shum, H.: Learning to detect a salient object. In: CVPR (2007)
[5] Endres, I., Hoiem, D.: Category independent object proposals. In: ECCV (2010)
[6] Lee, Y.J., Kim, J., Grauman, K.: Key-segments for video object segmentation. In: ICCV (2011)
[7] Alexe, B., Deselaers, T., Ferrari, V.: What is an object? In: CVPR (2010)
Approach:
- data collection description
- interactee definition
- learning and prediction procedures
- applications of interactee prediction
Our definition considers two main issues:
Specifically, we say that an image displays a human-interactee interaction if either of the following holds:
1. The person is watching a specific object or person and paying specific attention to it.
2. The person is physically touching another object/person with a specific purpose.
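This two-pronged definition maps directly onto a simple annotation schema; a minimal sketch (the field and function names below are my own, not the paper's):

```python
# Illustrative schema for the interactee definition; field names are hypothetical.
from dataclasses import dataclass

@dataclass
class InteractionLabel:
    gaze_at_target: bool       # person is watching the target with specific attention
    touch_with_purpose: bool   # person is physically touching the target purposefully

def is_interactee_interaction(label: InteractionLabel) -> bool:
    """An image shows a human-interactee interaction if either condition holds."""
    return label.gaze_at_target or label.touch_with_purpose
```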
Approach:
- data collection description
- interactee definition
- learning and prediction procedures
- applications of interactee prediction
Interactee dataset collection
[1] Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: SUN database: large-scale scene recognition from abbey to zoo. In: CVPR (2010)
[2] Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. IJCV 88 (2010) 303–338
255/130,519 754/28,952
We use Amazon Mechanical Turk (MTurk) to collect bounding box annotations for the people and interactees in each image. The online interface instructs the annotators how to determine the interactee using the definition above.
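A common way to consolidate several annotators' boxes into one ground-truth box is to keep the boxes that agree by intersection-over-union and average them; the paper's exact consolidation rule may differ, so the sketch below is only illustrative:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def consolidate(boxes, thresh=0.5):
    """Average the annotator boxes that agree with the most central one."""
    # pick the box with the highest total IoU to all others (the "medoid")
    best = max(boxes, key=lambda b: sum(iou(b, o) for o in boxes))
    support = [b for b in boxes if iou(b, best) >= thresh]
    n = len(support)
    return tuple(sum(b[i] for b in support) / n for i in range(4))
```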
Approach:
- data collection description
- interactee definition
- learning and prediction procedures
- applications of interactee prediction
* Maji, S., Bourdev, L., Malik, J.: Action recognition from a distributed representation of pose and appearance. In: CVPR (2011)
f_p := body pose
f_o := orientation of head and torso
f_s := scene layout
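These cues (body pose f_p, head/torso orientation f_o, scene layout f_s) feed a mixture density network (MDN) that outputs a distribution over interactee locations. As an illustration of the inference step only, the sketch below evaluates a 2-D Gaussian mixture (the kind of output an MDN produces) over candidate positions and picks the most likely interactee center; the mixture parameters are assumed to be already predicted, and all names are illustrative:

```python
import numpy as np

def mixture_density(points, weights, means, sigmas):
    """Isotropic 2-D Gaussian mixture density at each of N points (shape (N, 2))."""
    d = points[:, None, :] - means[None, :, :]       # (N, K, 2) offsets to components
    sq = (d ** 2).sum(-1) / (2 * sigmas ** 2)        # (N, K) scaled squared distances
    comp = np.exp(-sq) / (2 * np.pi * sigmas ** 2)   # (N, K) per-component densities
    return comp @ weights                            # (N,) mixture density

def most_likely_center(weights, means, sigmas, grid_res=50):
    """Grid-search the normalized image plane for the densest point."""
    xs, ys = np.meshgrid(np.linspace(0, 1, grid_res), np.linspace(0, 1, grid_res))
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1)
    dens = mixture_density(pts, weights, means, sigmas)
    return tuple(pts[dens.argmax()])
```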
Avidan, S., Shamir, A.: Seam carving for content-aware image resizing. ACM Trans. Graph. 26 (2007) 10
Approach:
- data collection description
- interactee definition
- learning and prediction procedures
- applications of interactee prediction
Applications of interactee prediction. Our method can be used to improve:
1) Object detection speed/accuracy: one can avoid exhaustive sliding windows and ignore places that are unlikely to contain objects involved in the interaction.
2) Image retargeting: retargeting methods typically try to avoid destroying key gradients in the image, or aim to preserve the people or other foreground objects. Our idea is to protect from distortion not only the people in the image, but also their predicted interactees.
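Concretely, a seam-carving style retargeter can be steered by adding a large energy bonus inside every protected box (detected people plus predicted interactees), so seams route around them. A minimal sketch, with the function name and boost value chosen for illustration:

```python
import numpy as np

def importance_map(h, w, protect_boxes, base_energy=None, boost=1e3):
    """Seam-carving energy boosted inside protected boxes.

    protect_boxes: (x1, y1, x2, y2) boxes for people and predicted interactees.
    """
    e = np.zeros((h, w)) if base_energy is None else base_energy.astype(float).copy()
    for (x1, y1, x2, y2) in protect_boxes:
        e[y1:y2, x1:x2] += boost   # seams avoid high-energy regions
    return e
```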
Experimental Results. We evaluate four things:
(1) How accurately do we predict interactees, compared to several baselines?
(2) How well can humans perform this task? The following table shows the human subjects’ results alongside ours, for the subset of images in either dataset where the interactee is not visible within the person bounding box.
(3) Does interactee localization boost object detection?
(4) Does interactee localization help retargeting?
We see that our method outperforms the baselines. While Ours uses action-independent training as usual, we also show a variant where the MDN is trained only with images from the proper action class (Ours (categ-dep)); as expected, this further improves accuracy. For the detection experiment, we run the Deformable Part Model (DPM) object detector on the entire image, then apply our method/baselines.
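One simple way to combine the two signals in the detection experiment is to blend each detection's DPM score with the probability the predicted interactee distribution assigns to its box. The weighting scheme below is illustrative, not the paper's exact formulation:

```python
def rescore_detections(detections, prior, alpha=0.5):
    """Re-rank object detections using an interactee-location prior.

    detections: list of (box, score) pairs from an object detector (e.g. DPM).
    prior: callable box -> probability that the interactee lies at that box,
           e.g. derived from the predicted localization distribution (hypothetical).
    """
    rescored = [(box, (1 - alpha) * s + alpha * prior(box)) for box, s in detections]
    return sorted(rescored, key=lambda t: t[1], reverse=True)
```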
Temporal Perception and Prediction in Ego-Centric Video
In this work we introduce two tasks related to temporal prediction:
1) Given two short video snippets of an activity, the goal is to predict their correct temporal ordering.
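The ordering task can be posed as binary classification over a pair of snippet feature vectors. As an illustrative stand-in for the learned models evaluated in the paper, here is a minimal logistic-regression sketch on concatenated features:

```python
import numpy as np

def train_order_classifier(pairs, labels, lr=0.1, epochs=200):
    """Logistic regression on concatenated snippet features.

    pairs: list of (feat_a, feat_b); label 1 means snippet a comes first in time.
    """
    X = np.array([np.concatenate([a, b]) for a, b in pairs])
    y = np.array(labels, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(a first)
        g = p - y                                 # gradient of the log loss
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def predict_order(w, b, snip_a, snip_b):
    """Return 1 if snippet a is predicted to come first, else 0."""
    x = np.concatenate([snip_a, snip_b])
    return 1 if (x @ w + b) > 0 else 0
```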
2) Given a longer context video plus two video snippets sampled from before or after the context video, the goal is to predict which video snippet was captured closest in time after the context video.
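A nearest-neighbor style baseline for this task can be sketched as picking the candidate snippet whose features lie closest to the context video's representation; the actual feature extraction (e.g. CNN features per snippet) is abstracted away here:

```python
import numpy as np

def pick_next_snippet(context_feat, candidates):
    """Return the index of the candidate closest to the context feature.

    context_feat: feature vector summarizing the context video (illustrative).
    candidates: list of candidate snippet feature vectors.
    """
    dists = [np.linalg.norm(context_feat - c) for c in candidates]
    return int(np.argmin(dists))
```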
For predicting pairwise temporal ordering, the following five methods are evaluated: NN, Frac NN, DTW, LRSVM, FcNet.
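DTW, one of the listed baselines, aligns two variable-length feature sequences; below is a standard textbook implementation (not the authors' code):

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two feature sequences of shape (T, D)."""
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            # extend the cheapest of the three admissible alignments
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```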
Future Prediction Task
Personalized models significantly outperform the general models, and human performance is quite good.