Learning to Detect Human-Object Interactions with Knowledge

Slides:

Advertisements

Similar presentations

Complex Networks for Representation and Characterization of Images For CS790g Project Bingdong Li 9/23/2009.

Advertisements

INTRODUCTION Heesoo Myeong, Ju Yong Chang, and Kyoung Mu Lee Department of EECS, ASRI, Seoul National University, Seoul, Korea Learning.

Intelligent Systems Lab. Recognizing Human actions from Still Images with Latent Poses Authors: Weilong Yang, Yang Wang, and Greg Mori Simon Fraser University,

Data Structures For Image Analysis

Cue Integration in Figure/Ground Labeling Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, U.C. Berkeley We present a model of edge and region grouping.

Jinhui Tang †, Shuicheng Yan †, Richang Hong †, Guo-Jun Qi ‡, Tat-Seng Chua † † National University of Singapore ‡ University of Illinois at Urbana-Champaign.

Richard Socher Cliff Chiung-Yu Lin Andrew Y. Ng Christopher D. Manning

Source-Selection-Free Transfer Learning

A Two-level Pose Estimation Framework Using Majority Voting of Gabor Wavelets and Bunch Graph Analysis J. Wu, J. M. Pedersen, D. Putthividhya, D. Norgaard,

Paired Sampling in Density-Sensitive Active Learning Pinar Donmez joint work with Jaime G. Carbonell Language Technologies Institute School of Computer.

Recognition Using Visual Phrases

Colour and Texture. Extract 3-D information Using Vision Extract 3-D information for performing certain tasks such as manipulation, navigation, and recognition.

1.Learn appearance based models for concepts 2.Compute posterior probabilities or Semantic Multinomial (SMN) under appearance models. -But, suffers from.

Matching Geometric Models via Alignment Alignment is the most common paradigm for matching 3D models to either 2D or 3D data. The steps are: 1. hypothesize.

Parsing Natural Scenes and Natural Language with Recursive Neural Networks INTERNATIONAL CONFERENCE ON MACHINE LEARNING (ICML 2011) RICHARD SOCHER CLIFF.

Reading literacy. Definition of reading literacy: “Reading literacy is understanding, using and reflecting on written texts, in order to achieve one’s.

DeepWalk: Online Learning of Social Representations

New NSW Geography syllabus 7-10

Scalable Person Re-identification on Supervised Smoothed Manifold

Convolutional Neural Network

Graph Representations

Convolutional Neural Fabrics by Shreyas Saxena, Jakob Verbeek

Deep Compositional Cross-modal Learning to Rank via Local-Global Alignment Xinyang Jiang, Fei Wu, Xi Li, Zhou Zhao, Weiming Lu, Siliang Tang, Yueting.

Multi-level predictive analytics and motif discovery across large dynamic spatiotemporal networks and in complex sociotechnical systems: An organizational.

Learning to Generate Networks

Pick samples from task t

Recognizing Deformable Shapes

Compositional Human Pose Regression

Nonparametric Semantic Segmentation

Presenter: Hajar Emami

Li Fei-Fei, UIUC Rob Fergus, MIT Antonio Torralba, MIT

On the Generative Discovery of Structured Medical Knowledge

Context-Aware Modeling and Recognition of Activities in Video

Outline Nonlinear Dimension Reduction Brief introduction Isomap LLE

Building and Analyzing Genome-Wide Gene Disruption Networks

Learning with information of features

The Open World of Micro-Videos

RIO: Relational Indexing for Object Recognition

Brief Review of Recognition + Context

MEgo2Vec: Embedding Matched Ego Networks for User Alignment Across Social Networks Jing Zhang+, Bo Chen+, Xianming Wang+, Fengmei Jin+, Hong Chen+, Cuiping.

Presented by: Jacky Ma Date: 11 Dec 2001

Outline Background Motivation Proposed Model Experimental Results

Anastasia Baryshnikova Cell Systems

Graphs G = (V, E) V are the vertices; E are the edges.

Graph Vocabulary.

LOGAN: Unpaired Shape Transform in Latent Overcomplete Space

Jana Diesner, PhD Associate Professor, UIUC

VERB PHYSICS: Relative Physical Knowledge of Actions and Objects

Human-object interaction

Visual Question Answering

Keshav Balasubramanian

Building Dynamic Knowledge Graphs From Text Using Machine Reading Comprehension Rajarshi Das, Tsendsuren Munkhdalai, Xingdi Yuan, Adam Trischler, Andrew.

Task Fashion Landmark Detection. Task Fashion Landmark Detection.

Motivation It can effectively mine multi-modal knowledge with structured textural and visual relationships from web automatically. We propose BC-DNN method.

Motivation Semantic Transformation Module Most of the existing works neglect the semantic relationship between the visual feature and linguistic knowledge,

Modeling IDS using hybrid intelligent systems

Presented By: Harshul Gupta

Background Task Fashion image inpainting Some conceptions

Week 3 Presentation Ngoc Ta Aidean Sharghi.

Deep Structured Scene Parsing by Learning with Image Descriptions

Peng Cui Tsinghua University

SFNet: Learning Object-aware Semantic Correspondence

Improving Cross-lingual Entity Alignment via Optimal Transport

Motivation The subjects/objects are correlated to each other under semantic relationships.

Construcing Narrative Event Evolutionary Graph for Script Event Prediction 刘潇远.

Multi-Target Detection and Tracking of UAVs from a UAV

Learning to Cluster Faces on an Affinity Graph

Visual Grounding.

CVPR 2019 Poster.

Presentation transcript:

Learning to Detect Human-Object Interactions with Knowledge

Motivation In HOI detection task, the label space is often large and intrinsically having long-tail distribution issue. Verbs and objects in HOIs share certain characteristics across various types of scenes.

Contributions Construct a knowledge graph to model the dependencies of the verbs and object categories. offer a new perspective into HOI detection with multi-modal embeddings Achieve improved performance on two benchmarks

Method Framework

Method Graph Modeling for HOIs Graph architecture: GCN Operation Node: Verb word embedding or Object word embedding. Edge: connects a valid pair of verb and object category according to the ⟨verb, object⟩ annotations from training dataset, and ⟨object1, predicate, object2⟩ triplets from general visual relationships dataset. Adjacency Matrix: initialized with binary values defining the connections (or disconnections) of nodes. GCN Operation Traditional GCN:

Method Visual Representation Same as previous works: Appearance Feature: ROI Feature. Spatial Feature: Relative Location Encoding: Fused Feature:

Method Multi-Modal Joint Embedding Learning The goal is to learn the transformations of visual feature 𝑓 ℎ𝑜 𝑋 ℎ𝑜 → 𝜑 ℎ𝑜 and GCN feature 𝑓 𝑔 𝐻 𝑣 → 𝜑 𝑔 , such that the learned paired embedding Similarity Loss:

Inference

Experiments HICO-DET

Experiments Ablation Study