Deep Robust Unsupervised Multi-Modal Network

Presentation on theme: "Deep Robust Unsupervised Multi-Modal Network"— Presentation transcript:

1 Deep Robust Unsupervised Multi-Modal Network
Yang Yang 1, Yi-Feng Wu 1, De-Chuan Zhan 1, Zhi-Bin Liu 2, Yuan Jiang 1
1. National Key Laboratory for Novel Software Technology, Nanjing University
2. Tencent WXG, Shenzhen
{yangy, wuyf, zhandc,

2 Consistent Multi-Modal Learning
Training phase Multiple Complete Modalities Multi-modal Models

3 Inconsistent Multi-Modal Learning
Incomplete Anomaly

4 Outline
Introduction, Our approach, Experiments, Conclusion

5 Problem to tackle
Inconsistent modal feature embedding
Instances with partial modalities: caused by data-collection failures, deficiencies of the instances themselves, and various other reasons.
Anomalies of modalities: complete instances are not necessarily consistent across modalities; such cases are defined as "inconsistent anomalies".

6 Introduction: Problem to tackle
Instances with partial modalities: can the auto-encoder structure help?
Anomalies of modalities: can a deep energy-based model help learn robust weights for each modality?

7 Outline
Related work, Our approach, Experiments, Conclusion
Next, I will introduce our approach.

8 Our approach: Deep Robust Unsupervised Multimodal Network (DRUMN), threshold-based
Incomplete modalities: the auto-encoder network of each modality is trained to minimize the reconstruction error over all instances, both complete and incomplete.
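A minimal sketch of how a per-modality reconstruction loss can cover incomplete instances, assuming a missing modality is simply skipped through a presence mask. The toy one-layer auto-encoder, the function name `reconstruction_loss`, and the `mask` argument are hypothetical; the slides do not give the actual architecture or loss.

```python
import numpy as np

def reconstruction_loss(x, mask, w_enc, w_dec):
    """Mean squared reconstruction error over instances that have this modality.

    x:     (n, d) features of one modality; rows of missing instances are ignored
    mask:  (n,) 1.0 where the modality is present, 0.0 where it is missing
    w_enc: (d, h) encoder weights; w_dec: (h, d) decoder weights (toy model)
    """
    hidden = np.tanh(x @ w_enc)                     # encode
    x_hat = hidden @ w_dec                          # decode
    per_instance = ((x - x_hat) ** 2).sum(axis=1)   # squared error per instance
    # Zero out instances whose modality is missing, even if their rows are NaN.
    per_instance = np.where(mask > 0, per_instance, 0.0)
    return float(per_instance.sum() / max(mask.sum(), 1.0))
```

Under this assumption, the overall objective would sum `reconstruction_loss` over the modalities, so an instance with only an image still contributes to the image auto-encoder.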

9 Our approach: Deep Robust Unsupervised Multimodal Network (DRUMN), threshold-based
Auto-encoder method; EBM model; anomaly detection
Deep EBM structure: score matching (SM) (Hyvärinen 2005)
Anomaly detection: 1. the probability is lower than a pre-defined threshold; 2. based on the reconstruction error
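The two detection rules can be sketched as simple thresholding. This is a hypothetical illustration: `flag_anomalies`, the example scores, and the threshold values are assumptions. It relies only on the standard energy-based relation p(x) ∝ exp(−E(x)), under which a probability below a threshold corresponds to an energy above one.

```python
def flag_anomalies(scores, threshold):
    """Return the indices whose score (energy or reconstruction error) exceeds threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]

# Made-up scores for four instances.
energies = [0.2, 0.3, 5.1, 0.25]        # rule 1: high energy means low probability
recon_errors = [0.01, 0.9, 0.02, 0.03]  # rule 2: high reconstruction error

by_energy = flag_anomalies(energies, threshold=1.0)
by_error = flag_anomalies(recon_errors, threshold=0.5)
```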

10 Our approach: Deep Robust Unsupervised Multimodal Network (DRUMN), threshold-based
Loss function:
Eco is the hyper-parameter for eliminating the class anomalies.
E and Er are the hyper-parameters for eliminating the feature anomalies.
Real-world data always contain noise and outlying entries, which make the similarity matrix unreliable. We therefore employ the square-root loss function instead of the least-squares function. This modification calibrates each modality by taking the different noise levels of the modalities into account, and it increases the robustness of the second term in SLIM.
There is a large number of hyper-parameters to tune, namely K(K+3)/2, which is impractical.
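One possible reading of the count K(K+3)/2, offered as an assumption because the slide does not spell it out: if K is the number of modalities, one Eco threshold per pair of modalities gives K(K−1)/2 values, and one E plus one Er per modality gives 2K more, for a total of K(K+3)/2. The function name `threshold_count` is hypothetical.

```python
def threshold_count(k):
    """Count thresholds under the assumed breakdown for k modalities."""
    pairwise = k * (k - 1) // 2   # assumed: one Eco per pair of modalities
    per_modality = 2 * k          # assumed: one E and one Er per modality
    return pairwise + per_modality

# The assumed breakdown matches the closed form K(K+3)/2 from the slide.
for k in (2, 3, 5):
    assert threshold_count(k) == k * (k + 3) // 2
```

For two modalities (image and text) this already means 5 thresholds, and the count grows quadratically with K.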

11 Our approach: Deep Robust Unsupervised Multimodal Network (DRUMN), auto-adaptive weights
Energy variance: instances with low energy variance are either easy instances, about which the model is already confident, or anomalies, which are hard to confirm. We therefore prefer samples with high energy variance, which are more uncertain.
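A minimal sketch of weighting instances by energy variance. It assumes each instance has a short list of recorded energy values (for example across modalities or training steps; the slides do not specify which) and that weights are simply proportional to the variance. The function name `variance_weights` and the normalization are hypothetical, not the authors' formula.

```python
from statistics import pvariance

def variance_weights(energies):
    """Weight each instance by the variance of its recorded energies.

    energies: list of per-instance lists of energy values.
    Returns weights that sum to 1; higher variance gives a higher weight.
    """
    variances = [pvariance(values) for values in energies]
    total = sum(variances)
    if total == 0:
        # All instances are equally stable: fall back to uniform weights.
        return [1.0 / len(variances)] * len(variances)
    return [v / total for v in variances]

# The first instance is stable (low variance); the second fluctuates.
weights = variance_weights([[1.0, 1.0, 1.0], [0.0, 2.0, 4.0]])
```

With these inputs the stable instance gets weight 0 and the fluctuating one gets all the weight, which mirrors the preference for uncertain samples stated on the slide.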

12 Outline
Related work, Our approach, Experiments, Conclusion
Next, the experiments.

13 Experiments: Data Sets and Compared Methods
Public datasets:
FLICKR25K: 25,000 image-text pairs
IAPR TC-12: 20,000 image-text pairs
WIKI: 2,866 documents extracted from Wikipedia
NUS-WIDE: 195,834 image-text pairs
Inconsistent dataset:
WKG Game-Hub: 32,222 image-text pairs collected from the Game-Hub of "Strike of Kings"
Incomplete, inconsistent anomalies

14 Experiments: Cross-Modal Retrieval (public datasets)
Here, "T" and "I" denote text and image, respectively. "I->T" denotes the case where the query is an image and the retrieved results are texts, and "T->I" denotes the case where the query is a text and the retrieved results are images.

15 Experiments: Cross-Modal Retrieval (WKG dataset)
WKG Game-Hub is a multi-label dataset and suffers from label imbalance. When computing MAP, we therefore count a ranked result as similar to the query only if the two share more than 1 label (L>1) or more than 3 labels (L>3).
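The label-sharing rule can be sketched as follows, assuming labels are represented as Python sets and a result counts as relevant when it shares more than `min_shared` labels with the query. The function names and the normalization by the number of relevant results in the ranked list are assumptions; the slides do not give the exact MAP formula.

```python
def average_precision(query_labels, ranked_labels, min_shared):
    """Average precision for one query over a ranked list of label sets."""
    hits = 0
    precision_sum = 0.0
    for rank, labels in enumerate(ranked_labels, start=1):
        # Relevant only if strictly more than min_shared labels are shared.
        if len(query_labels & labels) > min_shared:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(queries, rankings, min_shared):
    """Mean of average_precision over all (query, ranking) pairs."""
    scores = [average_precision(q, r, min_shared) for q, r in zip(queries, rankings)]
    return sum(scores) / len(scores) if scores else 0.0

query = {"hero", "skin", "event"}
ranking = [{"hero", "skin"}, {"hero"}, {"hero", "skin", "event"}]
ap_l1 = average_precision(query, ranking, min_shared=1)  # L>1
ap_l3 = average_precision(query, ranking, min_shared=3)  # L>3
```

In this toy example the first and third results share more than one label with the query, so the L>1 score is (1/1 + 2/3)/2, while no result shares more than three labels and the L>3 score is 0.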

16 Experiments: Retrieval results
Qualitatively, we can observe that DRUMN captures the general latent feature representation shared by the images and the texts. Notably, most of the results shown are correct.

17 Experiments: Influence of the Number of Incomplete Multi-Modal Instances
DRUMN achieves the best performance on most datasets. Moreover, DRUMN shows its advantage at high incomplete ratios.

18 Experiments: Anomaly Detection
Qualitatively, DRUMN detects both class and feature anomalies better than the DRUMN-Thres method. In detail, the class and feature anomalies detected by DRUMN are all correct, whereas the first class-anomaly result of DRUMN-Thres is wrong.

19 Outline
Related work, Our approach, Experiments, Conclusion
To conclude our work:

20 Conclusion
Main contribution:
A novel paradigm that addresses inconsistent modalities in multi-modal learning
A Deep Robust Unsupervised Multi-modal Network (DRUMN)
Future work:
How to fully incorporate supervised information in the semi-supervised scenario?

21 THANK YOU !

