General Ideas of Pedestrian Detection

April 16, 2015, CVLAB seminar
Presenter: Woong GI, Chang

Contents
1. Brief introductions of pedestrian detection through review articles
   Pedestrian Detection: An Evaluation of the State of the Art. Piotr Dollar et al. (IEEE TPAMI 2012)
     Explains several datasets, mostly Caltech Pedestrian, and the evaluation methodology
   Ten Years of Pedestrian Detection, What Have We Learned? Rodrigo Benenson et al. (ECCV 2014)
     Compares several approaches to pedestrian detection and reports the current state of the art
2. Code implementation
   Seeking the strongest rigid detector. Rodrigo Benenson et al. (CVPR 2013)
     This detector is the baseline of the current state of the art
3. Future work

Brief Introductions of Pedestrian Detection, Part 1
Pedestrian Detection: An Evaluation of the State of the Art. Piotr Dollar et al. (IEEE TPAMI 2012)

Pedestrian Detection?
Pedestrian detection is an essential and significant task in any intelligent video surveillance system, as it provides the fundamental information for semantic understanding of video footage. It has an obvious extension to automotive applications because of its potential for improving safety systems. Decade-old ideas still rule detection quality. Source: Wikipedia (2015)

Caltech Pedestrian Dataset
Annotation Examples (Test Dataset images)

350,000 pedestrian bounding boxes labeled in 250,000 frames.

Comparison with other well-known datasets

Occlusions and temporal correspondences are annotated.
(a) BB-full: the entire pedestrian (green box). BB-vis: the visible region (yellow box). Labels: individual pedestrians are 'Person', groups of pedestrians are 'People'.
(b) Histogram, over all pedestrians, of the fraction of time each one is occluded: over 70% of pedestrians are occluded in at least one frame.
(c) Histogram, over occluded pedestrians, of the fraction of the pedestrian that is occluded: 35% to 80% occluded counts as heavy occlusion, over 80% as full occlusion.
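The occlusion bands above can be derived from the two annotated boxes. A minimal sketch, assuming boxes are (x, y, w, h) tuples in pixels, that BB-vis lies inside BB-full, and that the occluded fraction is 1 minus area(BB-vis)/area(BB-full). The 'partial' band (up to 35%) follows Dollar et al.; the handling of the exact boundary values and the function names are our own assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) in pixels


def occluded_fraction(bb_full: Box, bb_vis: Box) -> float:
    """Fraction of BB-full not covered by BB-vis (assumes BB-vis lies inside BB-full)."""
    full_area = bb_full[2] * bb_full[3]
    vis_area = bb_vis[2] * bb_vis[3]
    return 1.0 - vis_area / full_area


def occlusion_level(bb_full: Box, bb_vis: Box) -> str:
    """Map the occluded fraction to none / partial / heavy / full (boundary handling assumed)."""
    f = occluded_fraction(bb_full, bb_vis)
    if f <= 0.0:
        return "none"
    if f <= 0.35:
        return "partial"
    if f <= 0.80:
        return "heavy"
    return "full"


# Half of the pedestrian's box is hidden.
print(occlusion_level((0, 0, 10, 20), (0, 0, 10, 10)))  # heavy
```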

Statistics of pedestrians. (d) Heat map of which regions of the BB are most likely to be occluded. (e) Three example occlusion types on a regularly spaced 3 by 6 grid of cells. (f) The top 7 of the 126 occlusion types observed on the 3 by 6 grid, from ~54k occluded pedestrians.

Statistics of pedestrians. (a) Expected center location of pedestrian BBs for the ground truth, log-normalized. (b) Expected center location of pedestrian BBs for HOG detections, log-normalized.

Evaluation Methodology
Per-Window Versus Full-Image Evaluation
Full-image evaluation, which includes the effects of spatial and scale stride and of non-maximal suppression, is more suitable than per-window evaluation; per-window evaluation is mainly useful for isolating the evaluation of binary classifiers. (c) Examples of detections that can give rise to false positives (top) or false negatives (bottom) in full-image evaluation.
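Because full-image evaluation scores the detector after non-maximal suppression, the suppression step is part of what gets measured. A minimal greedy NMS sketch, assuming detections are (x, y, w, h, score) tuples and an intersection-over-union threshold of 0.5; individual detectors use their own NMS variants and thresholds, so both choices are assumptions.

```python
from typing import List, Tuple

Det = Tuple[float, float, float, float, float]  # (x, y, w, h, score)


def iou(a: Det, b: Det) -> float:
    """Intersection over union of two (x, y, w, h, ...) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(dets: List[Det], thr: float = 0.5) -> List[Det]:
    """Keep the highest-scoring box, drop boxes overlapping it by more than thr, repeat."""
    remaining = sorted(dets, key=lambda d: d[4], reverse=True)
    kept: List[Det] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best, d) <= thr]
    return kept


windows = [(0, 0, 10, 10, 0.9), (1, 0, 10, 10, 0.8), (50, 50, 10, 10, 0.7)]
print(greedy_nms(windows))  # the second window is suppressed by the first
```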

Matching for 'Person' annotations
A detection BBdt and a ground-truth box BBgt match if their area of overlap divided by their area of union exceeds 50% (the PASCAL criterion). Each BBdt and BBgt may be matched at most once; if a BBdt matches several BBgt, the match with the highest overlap is used.
Unmatched BBdt: false positives. Unmatched BBgt: false negatives.
Performance is plotted as miss rate against false positives per image (FPPI) on log-log axes, by varying the threshold on detection confidence.
The full BB is always used for matching, not the visible BB, even for partially occluded pedestrians.
Because of the high computational demands, all reported results on the Caltech dataset are computed on every 30th frame.
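The matching rule can be sketched as a greedy procedure for one image. Assumptions: boxes are (x, y, w, h) tuples, detections are visited in order of decreasing confidence (as in the PASCAL protocol), and each detection takes the still-unmatched ground-truth box with the highest overlap above 0.5. This is an illustration, not the Caltech evaluation code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) in pixels


def iou(a: Box, b: Box) -> float:
    """Area of overlap divided by area of union."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def match_image(dets: List[Tuple[Box, float]], gts: List[Box], thr: float = 0.5) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for one image."""
    gt_used = [False] * len(gts)
    tp = fp = 0
    # Most confident detections claim ground truth first.
    for box, _score in sorted(dets, key=lambda d: d[1], reverse=True):
        best_j, best_o = -1, thr
        for j, gt in enumerate(gts):
            if gt_used[j]:
                continue
            o = iou(box, gt)
            if o > best_o:  # highest overlap above the threshold wins
                best_j, best_o = j, o
        if best_j >= 0:
            gt_used[best_j] = True  # each BBgt is matched at most once
            tp += 1
        else:
            fp += 1  # unmatched BBdt
    fn = gt_used.count(False)  # unmatched BBgt
    return tp, fp, fn


gts = [(0, 0, 10, 20), (100, 100, 10, 20)]
dets = [((1, 0, 10, 20), 0.9), ((1, 0, 10, 20), 0.85), ((50, 50, 10, 20), 0.8)]
print(match_image(dets, gts))  # (1, 2, 1): a duplicate and a stray detection are FPs
```

Sweeping a confidence threshold over `dets` and accumulating these counts over all images gives the points of the miss-rate versus FPPI curve.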

Filtering Ground Truth: for 'People' annotations and annotations at other scales
Ignore regions are introduced as bounding boxes BBig. Four types of BB are treated as BBig:
1) any BB under 20 pixels high, 2) BBs truncated by the image boundaries, 3) 'Person?' (ambiguous cases), 4) 'People'.
A BBdt matched to a BBig does not count as a true positive, and an unmatched BBig does not count as a false negative. Matches to BBgt are preferred: a BBdt can only match a BBig if it does not match any BBgt, and multiple matches to a single BBig are allowed.
Summary: BBdt not matched: FP. BBgt not matched: FN. BBdt matched to BBgt: TP. BBdt matched to BBig: not a TP, ignored.
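The ignore-region rules extend the per-image matching as sketched below. Assumptions: boxes are (x, y, w, h) tuples, detections are visited by decreasing confidence, and overlap with a BBig is measured as the intersection divided by the area of the detection (a 'People' region can be much larger than one detection, so intersection over union would be unfair). That last criterion is our reading of Dollar et al.'s protocol and should be treated as an assumption; `evaluate_image` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def intersection(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    return iw * ih


def iou(a: Box, b: Box) -> float:
    inter = intersection(a, b)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def evaluate_image(dets: List[Tuple[Box, float]], gts: List[Box], igs: List[Box],
                   thr: float = 0.5) -> Dict[str, int]:
    """Count TP/FP/FN for one image; detections on an ignore region count as neither."""
    gt_used = [False] * len(gts)
    counts = {"tp": 0, "fp": 0, "fn": 0, "ignored": 0}
    for box, _score in sorted(dets, key=lambda d: d[1], reverse=True):
        # 1) Matches to BBgt are preferred: try the unmatched ground truth first.
        best_j, best_o = -1, thr
        for j, g in enumerate(gts):
            o = iou(box, g)
            if not gt_used[j] and o > best_o:
                best_j, best_o = j, o
        if best_j >= 0:
            gt_used[best_j] = True
            counts["tp"] += 1
            continue
        # 2) Otherwise try any BBig; several detections may share one BBig.
        area = box[2] * box[3]
        if area > 0 and any(intersection(box, ig) / area > thr for ig in igs):
            counts["ignored"] += 1
        else:
            counts["fp"] += 1
    # Unmatched BBgt are misses; unmatched BBig are simply dropped.
    counts["fn"] = gt_used.count(False)
    return counts


gts = [(0, 0, 10, 20)]
igs = [(100, 100, 50, 50)]  # e.g. a 'People' region
dets = [((0, 0, 10, 20), 0.9), ((110, 110, 10, 20), 0.8),
        ((120, 120, 10, 20), 0.75), ((300, 300, 10, 20), 0.7)]
print(evaluate_image(dets, gts, igs))
```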

Filtering Detections: used when evaluating on only a subset of the dataset
Strict filtering: all detections (BBdt) outside the selected range are removed before matching. A detection just outside the range can then no longer match a BBgt inside it, which produces false negatives, so performance is under-reported.
Post filtering: detections outside the selected range are allowed to match BBgt; any unmatched BBdt outside the range is removed and does not count as a false positive. Thus performance is over-reported.
Expanded filtering: when evaluating in a scale range from S0 to S1 pixels, all detections outside the range S0/r to S1*r are removed, with r = 1.25.
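Expanded filtering amounts to widening the scale window before discarding detections. A minimal sketch, assuming the scale of a detection is its height in pixels, detections are (x, y, w, h, score) tuples, and both bounds are inclusive; setting r = 1 gives strict filtering.

```python
from typing import List, Tuple

Det = Tuple[float, float, float, float, float]  # (x, y, w, h, score)


def expanded_filter(dets: List[Det], s0: float, s1: float, r: float = 1.25) -> List[Det]:
    """Keep detections whose height lies in [s0 / r, s1 * r]; r = 1 reduces to strict filtering."""
    lo, hi = s0 / r, s1 * r
    return [d for d in dets if lo <= d[3] <= hi]


dets = [(0, 0, 8, 20, 0.9), (0, 0, 10, 25, 0.8), (0, 0, 37, 90, 0.7), (0, 0, 42, 101, 0.6)]
# The medium-scale range (30 to 80 pixels) expands to 24 to 100 pixels.
print([d[3] for d in expanded_filter(dets, 30, 80)])  # [25, 90]
```

The surviving detections are then matched as usual, so a detection at 90 pixels can still claim a 78-pixel pedestrian that strict filtering would have turned into a false negative.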

Standardized Aspect Ratio
Green box: original bounding box. Yellow box: bounding box with a fixed aspect ratio (width/height = 0.41). This standardized bounding box is used for evaluation.
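One way to standardize a box is to keep its height and horizontal center fixed and adjust only the width, which is how we understand Dollar et al. do it (treat as an assumption). A minimal sketch with (x, y, w, h) boxes:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def standardize_aspect(box: Box, aspect: float = 0.41) -> Box:
    """Return a box with width = aspect * height, keeping the height and horizontal center."""
    x, y, w, h = box
    new_w = aspect * h
    center_x = x + w / 2.0
    return (center_x - new_w / 2.0, y, new_w, h)


print(standardize_aspect((10, 20, 30, 100)))  # a 100-pixel pedestrian becomes 41 pixels wide
```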

Performance on the Caltech Dataset (evaluation settings)
Overall: all BBs.
Near scale: unoccluded pedestrians 80 pixels or more in height.
Medium scale: unoccluded pedestrians 30 to 80 pixels high.
No occlusion: unoccluded pedestrians 50 pixels or more in height.
Partial occlusion: unoccluded, partially occluded, and heavily occluded pedestrians.
Reasonable: unoccluded and partially occluded pedestrians 50 pixels or more in height.
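Each setting is a filter on the ground truth, so it can be written as a predicate on a pedestrian's height and occluded fraction. A hypothetical encoding of several settings listed above; treating "partially occluded" as at most 35% occluded follows Dollar et al., while the handling of boundary values is our assumption.

```python
from typing import Callable, Dict

# Hypothetical encoding: each setting is a predicate on (height in pixels, occluded fraction).
SETTINGS: Dict[str, Callable[[float, float], bool]] = {
    "near": lambda h, occ: h >= 80 and occ == 0.0,
    "medium": lambda h, occ: 30 <= h < 80 and occ == 0.0,
    "no_occlusion": lambda h, occ: h >= 50 and occ == 0.0,
    # Reasonable: unoccluded or partially occluded (assumed: at most 35% occluded).
    "reasonable": lambda h, occ: h >= 50 and occ <= 0.35,
}


def in_setting(name: str, height: float, occluded: float) -> bool:
    """True if a ground-truth pedestrian belongs to the named evaluation setting."""
    return SETTINGS[name](height, occluded)


print(in_setting("reasonable", 60, 0.2), in_setting("reasonable", 60, 0.5))  # True False
```

Pedestrians that fail the predicate would become ignore regions rather than being deleted, so detections on them are not penalized.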

Information Inferred from the Statistics
1) If a detector could detect every bounding box except the fully occluded ones, its recall under the reasonable measure would be
$$\text{recall} = 1 - \frac{0.5 \times \underbrace{126\text{k} \times 0.1}_{\text{occluded}}}{0.5 \times 350\text{k}} = 96.40\% \quad\Rightarrow\quad \text{miss rate} = 3.60\%$$
(BBs under 50 pixels high are not counted in the reasonable measure.)
2) Novel features and data.
3) Medium-scale pedestrian detection: critical for automotive applications.
4) Occlusion handling: temporal integration (tracking).

Brief Introductions of Pedestrian Detection, Part 2
Ten Years of Pedestrian Detection, What Have We Learned? Rodrigo Benenson et al. (ECCV 2014)
and Pedestrian Detection: An Evaluation of the State of the Art. Piotr Dollar et al. (IEEE TPAMI 2012)

Brief Chronology
After the evaluation metric changed from per-window (FPPW) to per-image (FPPI), some early detectors turned out to underperform. Differences in detection performance are dominated by the choice of training data.

Listing of Methods Under 50% Log-Average Miss Rate, by Solution Family
DPM: DPM variants. DN: deep networks. DF: decision forests.
Based on raw numbers alone, boosted decision trees (DF) seem particularly well suited for pedestrian detection.
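The ranking above uses Dollar et al.'s single summary number: the miss rate averaged over nine FPPI values evenly spaced in log space between 10^-2 and 10^0. A sketch, assuming the averaging is done in log space (a geometric mean, which is how we understand the reference toolbox computes it), that the curve is a list of (FPPI, miss rate) points sorted by increasing FPPI, and that a reference FPPI lying before the first curve point gets a miss rate of 1.0.

```python
import math
from typing import List, Tuple


def log_average_miss_rate(curve: List[Tuple[float, float]]) -> float:
    """Geometric mean of the miss rate at nine FPPI values evenly spaced in log space in [1e-2, 1e0].

    `curve` holds (fppi, miss_rate) points sorted by increasing FPPI (decreasing threshold).
    At each reference FPPI we take the miss rate of the last point whose FPPI does not exceed it.
    """
    refs = [10 ** (-2 + 0.25 * k) for k in range(9)]
    samples = []
    for ref in refs:
        mr = 1.0  # assumed value when the curve has not started yet
        for fppi, miss in curve:
            if fppi <= ref:
                mr = miss
            else:
                break
        samples.append(max(mr, 1e-10))  # guard against log(0)
    return math.exp(sum(math.log(s) for s in samples) / len(samples))


curve = [(0.005, 0.8), (0.2, 0.4)]
print(round(log_average_miss_rate(curve), 3))
```

Averaging in log space matches the log-log plots: each decade of FPPI contributes equally to the summary.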

Runtime of Each Detector
Log-average miss rate versus the runtime of each detector on images from the Caltech Pedestrian Dataset (2012). The runtimes of all detectors are normalized to the rate of a single modern machine, so all times are directly comparable. Source: Pedestrian Detection: An Evaluation of the State of the Art. Piotr Dollar et al. (IEEE TPAMI 2012)

Which Detector Is the Best in 2014?
SpatialPooling+
Deep Network group
Uses the Caltech training dataset + optical flow
Katamari
Decision Forests group (AdaBoost)
Uses the Caltech training dataset
SquaresChnFtrs + DCT + SDt + 2Ped
LDCF
Decision Forest group (RealBoost)

Code Implementation
Ten Years of Pedestrian Detection, What Have We Learned? Rodrigo Benenson et al. (ECCV 2014)
Seeking the strongest rigid detector. Rodrigo Benenson et al. (CVPR 2013)

Overview of the SquaresChnFtrs Detector Structure
Image normalization: Automatic Colour Equalization (ACE), Rizzi et al. (2003)
Features: traditional HOG; SquaresChnFtrs All; All Features
Training method: AdaBoost with 2000 level-2 decision trees
Stage 1: 5000 random negative samples
Stages 2 and 3: bootstrapping, each adding 5000 additional hard negatives
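The three-stage schedule is a hard-negative bootstrapping loop. A minimal sketch, assuming a finite pool of candidate negatives (in practice the detector is run over negative images), that a score above 0 means "detected", and that `train` and `score` are hypothetical callables standing in for the boosted classifier. The toy 1-D threshold model only exercises the loop.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

M = TypeVar("M")  # whatever model type the hypothetical `train` returns


def train_with_bootstrapping(
    positives: Sequence[float],
    negative_pool: Sequence[float],
    train: Callable[[List[float], List[float]], M],
    score: Callable[[M, float], float],
    n_initial: int = 5000,
    n_hard: int = 5000,
    n_rounds: int = 2,
    seed: int = 0,
) -> Tuple[M, List[int]]:
    """Stage 1 trains on random negatives; each later stage adds the hardest false positives."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(negative_pool)), min(n_initial, len(negative_pool)))
    model = train(list(positives), [negative_pool[i] for i in chosen])
    for _ in range(n_rounds):
        taken = set(chosen)
        # Hard negatives: unused negatives that the current model scores as positive.
        false_pos = [i for i in range(len(negative_pool))
                     if i not in taken and score(model, negative_pool[i]) > 0]
        false_pos.sort(key=lambda i: score(model, negative_pool[i]), reverse=True)
        chosen.extend(false_pos[:n_hard])
        model = train(list(positives), [negative_pool[i] for i in chosen])
    return model, chosen


# Toy 1-D "detector": threshold halfway between the class means.
def toy_train(pos: List[float], neg: List[float]) -> float:
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def toy_score(threshold: float, x: float) -> float:
    return x - threshold


pool = [i / 10 for i in range(100)]
model, used = train_with_bootstrapping([10.0] * 20, pool, toy_train, toy_score, n_initial=10, n_hard=5)
print(len(used), round(model, 2))
```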

SquaresChnFtrs Detector
Image normalization
Global normalization: Automatic Colour Equalization (ACE), GreyWorld.
Local normalization: gradient orientation features are normalized by the gradient magnitude in the same area.
On the ETH dataset, where illumination changes occur frequently, the effect of normalization is more pronounced.
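Of the two global normalizations, GreyWorld is simple enough to sketch: it assumes the average colour of a scene is grey and rescales each channel so its mean matches the overall grey level. A pure-Python sketch on a flat list of RGB pixels (the pixel layout and clipping at 255 are our assumptions); ACE is considerably more involved and is not shown.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B) in [0, 255]


def grey_world(image: List[Pixel]) -> List[Pixel]:
    """Scale each channel so that its mean equals the mean grey level of the whole image."""
    n = len(image)
    means = [sum(p[c] for p in image) / n for c in range(3)]
    grey = sum(means) / 3.0
    gains = [grey / m if m > 0 else 1.0 for m in means]  # leave an empty channel untouched
    return [tuple(min(255.0, p[c] * gains[c]) for c in range(3)) for p in image]


# A bluish, green-starved image: after normalization every channel has the same mean.
print(grey_world([(100, 50, 150), (100, 50, 150)]))
```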

Features (10 channels in total)
HOG features: 6 gradient orientation channels + 1 gradient magnitude channel.
LUV features: 3 color channels.
Weak classifier
ThreeStumps: a level-2 decision tree is evaluated as three stumps whose outcomes form a three-bit vector, which is used to index a table with 2^3 = 8 entries.
Level-2 decision tree: traditional HOG; SquaresChnFtrs All; All Features; 2000 level-2 decision trees.
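The ThreeStumps trick can be illustrated in a few lines: evaluate all three stumps, pack the bits, and read the answer from an 8-entry table, so no branching on the tree structure is needed. The bit order (root, left child, right child), the "x > threshold gives 1" convention, and the function names are our own assumptions; `eval_tree` is the ordinary branching evaluation, included only to check the table.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[int, float]  # (feature index, threshold); the bit is 1 when x[feature] > threshold


def build_table(leaves: Sequence[float]) -> List[float]:
    """Expand the 4 leaves of a level-2 tree into an 8-entry table indexed by (root, left, right) bits."""
    table = []
    for idx in range(8):
        root, left, right = idx >> 2, (idx >> 1) & 1, idx & 1
        # The root bit selects which child stump matters; the other child's bit is a don't-care.
        table.append(leaves[left] if root == 0 else leaves[2 + right])
    return table


def eval_three_stumps(x: Sequence[float], stumps: Sequence[Stump], table: Sequence[float]) -> float:
    """Evaluate all three stumps unconditionally and use the bits as a table index."""
    bits = [1 if x[f] > t else 0 for f, t in stumps]  # stumps = (root, left child, right child)
    return table[(bits[0] << 2) | (bits[1] << 1) | bits[2]]


def eval_tree(x: Sequence[float], stumps: Sequence[Stump], leaves: Sequence[float]) -> float:
    """Reference branching evaluation of the same level-2 tree."""
    (fr, tr), (fl, tl), (fg, tg) = stumps
    if x[fr] <= tr:
        return leaves[0] if x[fl] <= tl else leaves[1]
    return leaves[2] if x[fg] <= tg else leaves[3]


stumps = [(0, 0.5), (1, 0.5), (2, 0.4)]
leaves = [-1.0, -0.5, 0.5, 1.0]
table = build_table(leaves)
print(eval_three_stumps([0.2, 0.8, 0.5], stumps, table))  # -0.5
```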

Training Method
AdaBoost: each new weak classifier is tweaked in favor of the instances misclassified by the previous classifiers.
VadaBoost: a regularized AdaBoost variant that minimizes not only the average of the margin but also its variance.
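The re-weighting idea behind AdaBoost fits in a short sketch. This is plain discrete AdaBoost with decision stumps as weak learners, chosen for brevity (the detector itself boosts level-2 trees, and VadaBoost's variance penalty is not shown). Labels are assumed to be -1 or +1, and the exhaustive stump search is only practical for toy data.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[float, int, float, int]  # (alpha, feature, threshold, polarity)


def stump_predict(x: Sequence[float], f: int, t: float, p: int) -> int:
    return p if x[f] >= t else -p


def train_adaboost(X: List[Sequence[float]], y: List[int], n_rounds: int) -> List[Stump]:
    """Discrete AdaBoost with exhaustive stump search; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble: List[Stump] = []
    for _ in range(n_rounds):
        # Pick the stump with the lowest weighted error.
        best = None
        for f in range(len(X[0])):
            for t in sorted({x[f] for x in X}):
                for p in (1, -1):
                    err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(xi, f, t, p) != yi)
                    if best is None or err < best[0]:
                        best = (err, f, t, p)
        err, f, t, p = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid division by zero and log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, f, t, p))
        # Re-weight: misclassified samples gain weight, so the next stump focuses on them.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, t, p)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: List[Stump], x: Sequence[float]) -> int:
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1


X = [[0.0], [1.0], [2.0], [3.0]]
y = [-1, -1, 1, 1]
ens = train_adaboost(X, y, n_rounds=3)
print([predict(ens, x) for x in X])  # [-1, -1, 1, 1]
```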

Which Feature Pool?
Features at scale 1 (128 x 64 model):
Random 30k (ChnFtrs detector)
RandomSymmetric 30k
SquaresChnFtrs-8x8: 5120 features
SquaresChnFtrs All
All Features (using 90 GB and 16 cores on a GPU-enabled server)
Random++: AdaBoost over 10 runs of Random 30k
Features at scale 2 (256 x 128 model):
All Features
SquaresChnFtrs 30k

Katamari-v1 detector
The SquaresChnFtrs framework with the following additional features: DCT, SDt (optical flow), and 2Ped (context).
Sources: Nam et al. (arXiv 2014); Park et al. (CVPR 2013); Ouyang & Wang (CVPR 2013)



What is driving the quality progress?
Solution family: overall, DPM, DN, and DF all reach top performance; DF looks promising.
Better classifiers: no conclusive empirical evidence of a difference between linear and non-linear kernels.
Deformable parts: for pedestrians, no clear evidence of a benefit, including for occlusion handling.
Multi-scale models: provide a generic extension, but with a minor effect.
Deep architectures: so far, using a deep architecture does not give a clear advantage.
Training data: data is important.
Additional (test-time) data: optical flow and stereo vision provide meaningful improvements.
Exploiting context: ground plane constraints, auto-context, geometry, and so on.
Better features: the focus of most approaches. Despite the many features proposed, the 10 channels (6 gradient orientations + 1 gradient magnitude + 3 LUV) or their variants reach top performance.
Reference: Benenson, R. et al. (2014), presentation slides

Detectors are still improving…
Filtered Channel Features + optical flow (SDt) + RealBoost (level-4 decision trees)
Dataset: Caltech + KITTI
Filtered Channel Features for Pedestrian Detection (CVPR 2015)
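Filtered channel features apply a small filter bank to each feature channel and hand the responses to the boosted trees as candidate features. A pure-Python sketch of that step, assuming channels are 2-D lists and filters are applied as "valid" correlations; the two 2x2 filters (a box filter and a checkerboard-like filter) are illustrative values only, not the filter banks compared in the CVPR 2015 paper.

```python
from typing import List

Grid = List[List[float]]


def filter_channel(channel: Grid, kernel: Grid) -> Grid:
    """'Valid' 2-D correlation of one feature channel with one filter."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(channel) - kh + 1, len(channel[0]) - kw + 1
    return [[sum(channel[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def filtered_channel_features(channels: List[Grid], bank: List[Grid]) -> List[float]:
    """Apply every filter to every channel and flatten all responses into one feature vector."""
    feats: List[float] = []
    for ch in channels:
        for k in bank:
            for row in filter_channel(ch, k):
                feats.extend(row)
    return feats


# Illustrative 2x2 bank: a box (average) filter and a checkerboard-like filter.
BANK = [[[0.25, 0.25], [0.25, 0.25]], [[1.0, -1.0], [-1.0, 1.0]]]
channel = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(filter_channel(channel, BANK[0]))  # local averages of the channel
```

With the real 10 channels, every (channel, filter, position) triple becomes one candidate feature for the boosted decision trees.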


Future Work
Features alone can explain a decade of progress in detection quality. There is room for further improvement by increasing model capacity (and with better features).
Search for good features.
Stronger use of additional data: the Caltech dataset and other training datasets such as KITTI.
Better context: optical flow, geometry, stereo vision, etc.
Study deep architectures.

Next Presentation
About features and context: state-of-the-art features; optical flow and stereo vision.
About training methods: training methods used in pedestrian detection, including boosting; object detection with deep architectures.

References
Benenson, R., Mathias, M., Tuytelaars, T., & Van Gool, L. (2013, June). Seeking the strongest rigid detector. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE.
Benenson, R., Omran, M., Hosang, J., & Schiele, B. (2014). Ten Years of Pedestrian Detection, What Have We Learned? arXiv preprint arXiv:
Dollar, P., Wojek, C., Schiele, B., & Perona, P. (2012). Pedestrian detection: An evaluation of the state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4).
Eurocar news (2015). Volvo V40 Pedestrian Detection with auto brake. Retrieved from auto-brake/gallery-detail.html
Rizzi, A., Gatta, C., & Marini, D. (2003). A new algorithm for unsupervised global and local color correction. Pattern Recognition Letters, 24(11).
Wikipedia (2015). Pedestrian detection. Retrieved from
