Learning to Detect Human-Object Interactions with Knowledge

Slides:



Advertisements
Similar presentations
Complex Networks for Representation and Characterization of Images For CS790g Project Bingdong Li 9/23/2009.
Advertisements

INTRODUCTION Heesoo Myeong, Ju Yong Chang, and Kyoung Mu Lee Department of EECS, ASRI, Seoul National University, Seoul, Korea Learning.
Intelligent Systems Lab. Recognizing Human actions from Still Images with Latent Poses Authors: Weilong Yang, Yang Wang, and Greg Mori Simon Fraser University,
Data Structures For Image Analysis
Cue Integration in Figure/Ground Labeling Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, U.C. Berkeley We present a model of edge and region grouping.
Jinhui Tang †, Shuicheng Yan †, Richang Hong †, Guo-Jun Qi ‡, Tat-Seng Chua † † National University of Singapore ‡ University of Illinois at Urbana-Champaign.
Richard Socher Cliff Chiung-Yu Lin Andrew Y. Ng Christopher D. Manning
Source-Selection-Free Transfer Learning
A Two-level Pose Estimation Framework Using Majority Voting of Gabor Wavelets and Bunch Graph Analysis J. Wu, J. M. Pedersen, D. Putthividhya, D. Norgaard,
Paired Sampling in Density-Sensitive Active Learning Pinar Donmez joint work with Jaime G. Carbonell Language Technologies Institute School of Computer.
Recognition Using Visual Phrases
Colour and Texture. Extract 3-D information Using Vision Extract 3-D information for performing certain tasks such as manipulation, navigation, and recognition.
1.Learn appearance based models for concepts 2.Compute posterior probabilities or Semantic Multinomial (SMN) under appearance models. -But, suffers from.
Matching Geometric Models via Alignment Alignment is the most common paradigm for matching 3D models to either 2D or 3D data. The steps are: 1. hypothesize.
Parsing Natural Scenes and Natural Language with Recursive Neural Networks INTERNATIONAL CONFERENCE ON MACHINE LEARNING (ICML 2011) RICHARD SOCHER CLIFF.
Reading literacy. Definition of reading literacy: “Reading literacy is understanding, using and reflecting on written texts, in order to achieve one’s.
DeepWalk: Online Learning of Social Representations
New NSW Geography syllabus 7-10
Scalable Person Re-identification on Supervised Smoothed Manifold
Convolutional Neural Network
Graph Representations
Convolutional Neural Fabrics by Shreyas Saxena, Jakob Verbeek
Deep Compositional Cross-modal Learning to Rank via Local-Global Alignment Xinyang Jiang, Fei Wu, Xi Li, Zhou Zhao, Weiming Lu, Siliang Tang, Yueting.
Multi-level predictive analytics and motif discovery across large dynamic spatiotemporal networks and in complex sociotechnical systems: An organizational.
Learning to Generate Networks
Pick samples from task t
Recognizing Deformable Shapes
Compositional Human Pose Regression
Nonparametric Semantic Segmentation
Presenter: Hajar Emami
Li Fei-Fei, UIUC Rob Fergus, MIT Antonio Torralba, MIT
On the Generative Discovery of Structured Medical Knowledge
Context-Aware Modeling and Recognition of Activities in Video
Outline Nonlinear Dimension Reduction Brief introduction Isomap LLE
Building and Analyzing Genome-Wide Gene Disruption Networks
Learning with information of features
The Open World of Micro-Videos
RIO: Relational Indexing for Object Recognition
Brief Review of Recognition + Context
MEgo2Vec: Embedding Matched Ego Networks for User Alignment Across Social Networks Jing Zhang+, Bo Chen+, Xianming Wang+, Fengmei Jin+, Hong Chen+, Cuiping.
Presented by: Jacky Ma Date: 11 Dec 2001
Outline Background Motivation Proposed Model Experimental Results
Anastasia Baryshnikova  Cell Systems 
Graphs G = (V, E) V are the vertices; E are the edges.
Graph Vocabulary.
LOGAN: Unpaired Shape Transform in Latent Overcomplete Space
Jana Diesner, PhD Associate Professor, UIUC
VERB PHYSICS: Relative Physical Knowledge of Actions and Objects
Human-object interaction
Visual Question Answering
Keshav Balasubramanian
Building Dynamic Knowledge Graphs From Text Using Machine Reading Comprehension Rajarshi Das, Tsendsuren Munkhdalai, Xingdi Yuan, Adam Trischler, Andrew.
Task Fashion Landmark Detection. Task Fashion Landmark Detection.
Motivation It can effectively mine multi-modal knowledge with structured textural and visual relationships from web automatically. We propose BC-DNN method.
Motivation Semantic Transformation Module Most of the existing works neglect the semantic relationship between the visual feature and linguistic knowledge,
Modeling IDS using hybrid intelligent systems
Presented By: Harshul Gupta
Background Task Fashion image inpainting Some conceptions
Week 3 Presentation Ngoc Ta Aidean Sharghi.
Deep Structured Scene Parsing by Learning with Image Descriptions
Peng Cui Tsinghua University
SFNet: Learning Object-aware Semantic Correspondence
Improving Cross-lingual Entity Alignment via Optimal Transport
Motivation The subjects/objects are correlated to each other under semantic relationships.
Construcing Narrative Event Evolutionary Graph for Script Event Prediction 刘潇远.
Multi-Target Detection and Tracking of UAVs from a UAV
Learning to Cluster Faces on an Affinity Graph
Visual Grounding.
ICCV 2019.
CVPR 2019 Poster.
Presentation transcript:

Learning to Detect Human-Object Interactions with Knowledge

Motivation In HOI detection task, the label space is often large and intrinsically having long-tail distribution issue. Verbs and objects in HOIs share certain characteristics across various types of scenes.

Contributions Construct a knowledge graph to model the dependencies of the verbs and object categories. offer a new perspective into HOI detection with multi-modal embeddings Achieve improved performance on two benchmarks

Method Framework

Method Graph Modeling for HOIs Graph architecture: GCN Operation Node: Verb word embedding or Object word embedding. Edge: connects a valid pair of verb and object category according to the ⟨verb, object⟩ annotations from training dataset, and ⟨object1, predicate, object2⟩ triplets from general visual relationships dataset. Adjacency Matrix: initialized with binary values defining the connections (or disconnections) of nodes. GCN Operation Traditional GCN:

Method Visual Representation Same as previous works: Appearance Feature: ROI Feature. Spatial Feature: Relative Location Encoding: Fused Feature:

Method Multi-Modal Joint Embedding Learning The goal is to learn the transformations of visual feature 𝑓 ℎ𝑜 𝑋 ℎ𝑜 → 𝜑 ℎ𝑜 and GCN feature 𝑓 𝑔 𝐻 𝑣 → 𝜑 𝑔 , such that the learned paired embedding Similarity Loss:

Inference

Experiments HICO-DET

Experiments Ablation Study