HCP: A Flexible CNN Framework for Multi-Label Image Classification
Source: IEEE Transactions on Pattern Analysis & Machine Intelligence, 2016, 38(9):1901-1907.
Authors: Wei Y., Xia W., Lin M., et al.
Speaker: Jiefan Tan
Date: 2017/6/29
National Taiwan University, Department of Computer Science and Information Engineering
Outline
- Introduction
- Proposed method
  - Hypotheses Extraction
  - Training HCP
- Experiment
- Conclusions
Introduction - single-label image classification
Traditional pipeline: hand-crafted features fed to a classifier.
Replacing hand-crafted features with CNN features improves classification accuracy by roughly 10%.
Introduction - single-label vs. multi-label images
Single-label images: objects are roughly aligned.
Multi-label images: objects are often mis-aligned or occluded.
Proposed method - infrastructure of the proposed HCP
Proposed method - Hypotheses Extraction
Pipeline: source image → BING bounding-box proposals → hypothesis bounding boxes → selected hypotheses resized to the shared CNN input size.
Filter out hypotheses with:
1. Small area (< 900 pixels)
2. Extreme height/width (or width/height) ratios (> 4)
Goals: high object-detection recall rate, a small number of hypotheses, and high computational efficiency (a filtering sketch follows below).
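A minimal sketch of the hypothesis filtering step, assuming proposals are given as (x, y, w, h) boxes. The 900-pixel area and 4.0 ratio thresholds come from the slide; the function name and box format are illustrative, not the paper's implementation.

```python
def filter_hypotheses(boxes, min_area=900, max_ratio=4.0):
    """Keep proposal boxes that are large enough and not overly elongated.

    boxes: list of (x, y, w, h) tuples (illustrative format, not the paper's API).
    """
    kept = []
    for (x, y, w, h) in boxes:
        area = w * h
        ratio = max(w / h, h / w)  # extreme side ratio in either direction
        if area >= min_area and ratio <= max_ratio:
            kept.append((x, y, w, h))
    return kept

# Example: a tiny box and an elongated box are discarded.
proposals = [(10, 10, 20, 20), (0, 0, 200, 40), (50, 50, 100, 80)]
print(filter_hypotheses(proposals))  # [(50, 50, 100, 80)]
```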
Proposed method - Training HCP
Pre-training (parameter initialization) on single-label images.
Image-fine-tuning (initializes the final fully-connected layer) on multi-label images.
Hypotheses-fine-tuning on the extracted hypotheses, fused by cross-hypothesis max-pooling:
$v^{(j)} = \max\bigl(v_1^{(j)}, v_2^{(j)}, \dots, v_m^{(j)}\bigr)$
where $v_i$ ($i = 1, \dots, m$) is the output vector for the $i$-th hypothesis, $j = 1, \dots, c$ indexes the $j$-th component of $v_i$, $m$ is the number of hypotheses, and $c$ is the number of categories.
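A minimal sketch of cross-hypothesis max-pooling, assuming the shared CNN has already produced an m x c score matrix for one image; the NumPy code is illustrative, not the authors' implementation.

```python
import numpy as np

def cross_hypothesis_max_pooling(scores):
    """Fuse per-hypothesis predictions into one image-level prediction.

    scores: array of shape (m, c) -- m hypotheses, c categories.
    Returns a length-c vector v with v[j] = max_i scores[i, j].
    """
    return scores.max(axis=0)

# Example with m = 3 hypotheses and c = 4 categories.
scores = np.array([[0.1, 0.7, 0.2, 0.0],
                   [0.8, 0.1, 0.1, 0.0],
                   [0.2, 0.2, 0.6, 0.3]])
print(cross_hypothesis_max_pooling(scores))  # [0.8 0.7 0.6 0.3]
```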
Proposed method - Training HCP (cont.)
After cross-hypothesis max-pooling, the shared CNN is trained with a squared loss:
$J = \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{c} \bigl(\hat{p}_{ik} - p_{ik}\bigr)^2$
where $p_i$ is the ground-truth probability vector of the $i$-th image, $\hat{p}_i$ is the predicted probability vector, $N$ is the number of training images, and $c$ is the number of categories.
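A minimal sketch of the squared loss over a batch, assuming the ground-truth probability vectors are obtained by L1-normalizing the binary label vectors (as in the paper); the NumPy code below is illustrative, not the authors' implementation.

```python
import numpy as np

def squared_loss(pred, labels):
    """Multi-label squared loss J = (1/N) * sum_i sum_k (pred_ik - p_ik)^2.

    pred:   (N, c) predicted probability vectors (after cross-hypothesis max-pooling).
    labels: (N, c) binary label matrix; each row is L1-normalized into the
            ground-truth probability vector p_i = y_i / ||y_i||_1.
    """
    p = labels / labels.sum(axis=1, keepdims=True)
    return np.mean(np.sum((pred - p) ** 2, axis=1))

# Example: N = 2 images, c = 3 categories.
pred = np.array([[0.6, 0.3, 0.1],
                 [0.2, 0.5, 0.3]])
labels = np.array([[1, 0, 1],
                   [0, 1, 0]])
print(squared_loss(pred, labels))
```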
Proposed method - HCP
Experiment - datasets and shared CNN
Dataset: PASCAL VOC 2007 (trainval/test = 5011/4952 images)
Shared CNN: AlexNet, VGGNet
Experiment - results
1. Evaluation metric: mAP (mean Average Precision)
2. Effect of the number of hypotheses on performance
(A sketch of the mAP computation follows below.)
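A minimal sketch of how mAP can be computed for multi-label predictions, using scikit-learn's average_precision_score per class and averaging. Note this is the standard (non-interpolated) AP; the official VOC 2007 evaluation uses an interpolated AP, so this is illustrative rather than the paper's exact evaluation code.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(y_true, y_score):
    """mAP: average precision computed per class, then averaged over classes.

    y_true:  (N, c) binary ground-truth label matrix.
    y_score: (N, c) predicted scores (e.g., HCP outputs after max-pooling).
    """
    aps = [average_precision_score(y_true[:, k], y_score[:, k])
           for k in range(y_true.shape[1])]
    return float(np.mean(aps))

# Example with N = 4 images and c = 2 categories.
y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_score = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.6], [0.1, 0.3]])
print(mean_average_precision(y_true, y_score))
```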
Experiment - results (cont.)
Conclusions
- No ground-truth bounding box information is required for training.
- Robust to noisy and/or redundant hypotheses.
- Can be well pre-trained on a large single-label image dataset.
- The HCP outputs are intrinsically multi-label prediction results.
Thank you!