Progressive Cross-media Correlation Learning

1 Progressive Cross-media Correlation Learning
IGTA 2018 Progressive Cross-media Correlation Learning Xin Huang and Yuxin Peng* Institute of Computer Science and Technology, Peking University, Beijing, China

2 Outline Introduction Method Experiment Conclusion

3 Introduction What is cross-media retrieval?
Single-media retrieval: retrieve relevant results of the same media type as the query (e.g., image → image and text → text). Cross-media retrieval: retrieve relevant results of media types different from the query (e.g., image → text and text → image). Recently, deep neural networks have achieved great success in many artificial intelligence tasks. The deep convolutional neural network, usually trained on image classification, has proven to be a powerful building block for many multimedia applications. For example, such a network can be used to tag images for image retrieval, and its features can support object detection in images and videos.
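To make the building-block idea concrete, here is a minimal sketch (an illustration, not the setup used in this work) of a CNN pretrained on image classification serving as a generic feature extractor; the ResNet-50 backbone, preprocessing, and 2048-d feature size are assumptions.

import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Load a CNN pretrained on image classification and drop its classifier head,
# keeping the 2048-d pooled features (backbone choice is illustrative).
cnn = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
cnn.fc = nn.Identity()
cnn.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def image_feature(path):
    # Returns a 2048-d feature vector usable for tagging, retrieval,
    # or as input to a detection head.
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return cnn(img).squeeze(0)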

4 Common Representation
Introduction Problem: the "heterogeneity gap", i.e., different modalities have inconsistent representations. Cross-modal common representation learning: learn projections that represent data of different modalities with the same type of "feature"; this is a current research hot spot. Existing methods: traditional methods (mainly linear projections) and DNN-based methods; DNN-based methods are the current mainstream. [Slide figure: image, text, video, and audio examples mapped into a common representation.]
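As a concrete illustration of common representation learning (a minimal sketch under assumed feature and layer sizes, not the architecture of any particular method above), two small projection networks map image and text features into one shared space where they can be compared directly:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CommonSpaceProjector(nn.Module):
    # Projects image features and text features into a shared common space.
    def __init__(self, img_dim=2048, txt_dim=300, common_dim=256):
        super().__init__()
        self.img_proj = nn.Sequential(nn.Linear(img_dim, 512), nn.ReLU(),
                                      nn.Linear(512, common_dim))
        self.txt_proj = nn.Sequential(nn.Linear(txt_dim, 512), nn.ReLU(),
                                      nn.Linear(512, common_dim))

    def forward(self, img_feat, txt_feat):
        # L2-normalize so cosine similarity is comparable across modalities.
        z_img = F.normalize(self.img_proj(img_feat), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_feat), dim=-1)
        return z_img, z_txt

Cross-media retrieval then reduces to nearest-neighbour search in this space, e.g., ranking texts by z_img @ z_txt.T for an image query.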

5 Introduction Motivation
Key problem: cross-media correlation learning from data. Existing methods indiscriminately take all data for training. [Slide figure: image/text pairs with complex correlation, e.g., a bicycle chain, a pygmy hippopotamus, a magpie, and the US Third Army.] Easy samples: easy to capture the correlation, with clear cues. Hard samples: semantically rich, but with misleading and noisy information. Hard samples bring negative effects, especially in the early period of model training!

6 Outline Introduction Method Experiment Conclusion
Next, we will present the details of our method.

7 Method Progressive Cross-media Correlation Learning:
Core idea: train gradually from easy samples to hard samples, guided by a large-scale dataset. Step 1: Reference Model Training. Large-scale data with general knowledge is much more reliable; the reference model serves as a teacher to guide sample selection.

8 Method Step 1: Reference Model Training
Hierarchical correlation learning architecture. Pairwise constraint: coexistence cue. Semantic constraint: semantic consistency.
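One plausible way to read these two constraints, sketched below under assumptions (the talk does not give the exact loss functions): the pairwise (coexistence) constraint is modelled as a margin loss that pulls co-occurring image/text pairs together, and the semantic constraint as a shared classifier enforcing consistent category predictions.

import torch
import torch.nn as nn
import torch.nn.functional as F

classifier = nn.Linear(256, 200)  # shared semantic classifier; 200 categories as in XMediaNet

def reference_model_loss(z_img, z_txt, labels, margin=0.2):
    # Pairwise constraint: a matched image/text pair should be closer
    # than a mismatched one (negatives taken by shifting the batch).
    pos = 1.0 - F.cosine_similarity(z_img, z_txt)
    neg = 1.0 - F.cosine_similarity(z_img, z_txt.roll(1, dims=0))
    pairwise = F.relu(margin + pos - neg).mean()

    # Semantic constraint: both modalities should predict the same category.
    semantic = F.cross_entropy(classifier(z_img), labels) + \
               F.cross_entropy(classifier(z_txt), labels)
    return pairwise + semantic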

9 Method Step 2: Relevance Significance Metric; Step 3: Difficulty Assignment
Step 2: Relevance Significance Metric. Use the reference model to generate common representations for the target data, perform intra-media and inter-media retrieval within the target data, and evaluate the relevance significance of each pair (intra-media relevance significance and inter-media relevance significance). Step 3: Difficulty Assignment for Target Data. Intuitively, high relevance significance means easy samples.
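A rough sketch of how such relevance significance scores could be computed from the reference model's (L2-normalized) common representations; the exact metric of the paper is not reproduced here, and the reciprocal-rank and neighbour-purity scores below are assumptions.

import torch

def inter_media_significance(z_img, z_txt, i):
    # Rank of the paired text when querying with image i; reciprocal rank in (0, 1].
    sims = z_img[i] @ z_txt.T
    rank = (sims > sims[i]).sum().item() + 1
    return 1.0 / rank

def intra_media_significance(z, labels, i, k=10):
    # Fraction of the k nearest same-modality neighbours sharing the query's label.
    sims = z[i] @ z.T
    sims[i] = float("-inf")  # exclude the query itself
    topk = sims.topk(k).indices
    return (labels[topk] == labels[i]).float().mean().item()

# High significance -> easy sample; low significance -> hard sample (Step 3).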

10 Method Step 4: Progressive Training of Target Model
Select the top-k instances with the largest relevance significance. Initialize Iter as 1; as Iter grows, more samples are selected. In the late period of training, hard samples are also considered to preserve diversity.
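A minimal sketch of such a progressive selection schedule (the growth rate and starting fraction are illustrative assumptions): the training pool starts with the easiest samples and is enlarged as Iter grows, so hard samples enter only in the late period.

def select_training_pool(significance, iter_idx, total_iters, start_frac=0.3):
    # significance: one relevance-significance score per target pair.
    # Returns indices of the top-k easiest pairs for this iteration.
    n = len(significance)
    frac = min(1.0, start_frac + (1.0 - start_frac) * iter_idx / total_iters)
    k = max(1, int(frac * n))
    order = sorted(range(n), key=lambda j: significance[j], reverse=True)
    return order[:k]  # eventually the whole set, so hard samples preserve diversity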

11 Outline Introduction Method Experiment Conclusion

12 Experiment Compared methods: 9 state-of-the-art methods in total:
CCA [Hotelling, Biometrika 1936] CFA [Li et al., ACM MM 2003] KCCA [Hardoon et al., Neural Computation 2004] Corr-AE [Feng et al., ACM MM 2014] JRL [Zhai et al., IEEE TCSVT 2014] LGCFL [Kang et al., IEEE TMM 2015] DCCA [Yan et al., CVPR 2015] CMDN [Peng et al., IJCAI 2016] Deep-SM [Wei et al., IEEE TCYB 2017]

13 Experiment Datasets
Reference data: XMediaNet dataset (constructed by our laboratory), the first large-scale cross-media dataset with 5 media types (text, image, audio, video, and 3D model), 200 categories, and 100,000 instances. We focus on the image-and-text scenario, so we use the image and text training set of XMediaNet, with 32,000 pairs. Target data: Wikipedia dataset: 2,866 image/text pairs from 10 high-level semantic categories, randomly split into a training set of 2,173 pairs, a testing set of 462 pairs, and a validation set of 231 pairs, following prior work. NUS-WIDE-10k dataset: a subset of the NUS-WIDE dataset containing 10,000 image/text pairs from 10 semantic categories, split into a training set of 8,000 pairs, a testing set of 1,000 pairs, and a validation set of 1,000 pairs.

14 Experiment Results: MAP scores on the 2 datasets compared with existing methods. PCCL outperforms all compared methods on both datasets.
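For reference, a standard way to compute the MAP score used here (mean of per-query average precision over the ranked retrieval list); evaluation details such as list length follow common practice and are assumptions.

import numpy as np

def average_precision(relevant, ranked_ids):
    # Precision averaged at each rank where a relevant item is retrieved.
    hits, precisions = 0, []
    for r, doc in enumerate(ranked_ids, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / r)
    return float(np.mean(precisions)) if precisions else 0.0

def mean_average_precision(all_relevant, all_rankings):
    # MAP: mean of average precision over all queries.
    return float(np.mean([average_precision(rel, rank)
                          for rel, rank in zip(all_relevant, all_rankings)]))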

15 Outline Introduction Method Experiment Conclusion

16 Conclusion
Proposed approach: Progressive Cross-media Correlation Learning (PCCL). Idea: use a large-scale cross-media dataset to guide progressive sample selection on another small-scale dataset. 4 steps: 1) Reference Model Training; 2) Relevance Significance Metric; 3) Difficulty Assignment for Target Data; 4) Progressive Training of Target Model. Achieves accuracy improvements on 2 widely used cross-media datasets.

17 Cross-media Retrieval
We have released the XMedia dataset with 5 media types. This dataset and the source codes of our related works are available online. Interested in cross-media retrieval? We hope our recent overview will be helpful for you: Yuxin Peng, Xin Huang, and Yunzhen Zhao, "An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges", IEEE TCSVT, arXiv: Beyond this work, we have …

18 IGTA 2018 Thank you! Github Homepage (Source Codes)

