Performance Tuning on Multicore Systems for Feature Matching within Image Collections Xiaoxin Tang*, Steven Mills, David Eyers, Zhiyi Huang, Kai-Cheung.


1 Performance Tuning on Multicore Systems for Feature Matching within Image Collections. Xiaoxin Tang*, Steven Mills, David Eyers, Zhiyi Huang, Kai-Cheung Leung and Minyi Guo*. Department of Computer Science, University of Otago, New Zealand; *Department of Computer Science, Shanghai Jiao Tong University, China

2 Contents Motivation Our work Evaluation Conclusion

3 Contents Motivation Our work Evaluation Conclusion

4 Similarity Search Definition: –To preprocess a database of N objects so that, given a query object, one can effectively determine its nearest neighbors in the database. Applications: –pattern recognition, chemical similarity analysis, statistical classification, etc.

5 The problem – KNN Search
K Nearest Neighbor Search:
–Feature: an array of D elements, $f = [e_1, e_2, \dots, e_D]$
–Feature Space: a set of features, $F_s = \{f_1, f_2, \dots, f_N\}$
–Feature Similarity: Euclidean distance, $d(f_i, f_j) = \sqrt{\sum_{m=1}^{D} (f_i^m - f_j^m)^2}$, where $f_i^m$ is the m-th element of $f_i$
–Search: given a query feature $f_q$, find the k features in $F_s$ that have the shortest distances to $f_q$.
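The search defined on this slide can be sketched as a brute-force scan. This is a minimal illustration, not the presenters' implementation: the names `Feature`, `euclidean` and `knn_brute_force` are hypothetical, and features are assumed to be stored as `float` vectors of equal length.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Feature = std::vector<float>;  // one feature: D elements (assumed float)

// Euclidean distance between two features of the same dimension D.
double euclidean(const Feature& a, const Feature& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t m = 0; m < a.size(); ++m) {
        const double diff = static_cast<double>(a[m]) - b[m];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Returns (distance, index) pairs for the k features in `space` that are
// closest to `query`, ordered nearest first. Scans every feature once.
std::vector<std::pair<double, std::size_t>>
knn_brute_force(const std::vector<Feature>& space, const Feature& query,
                std::size_t k) {
    std::vector<std::pair<double, std::size_t>> scored;
    scored.reserve(space.size());
    for (std::size_t i = 0; i < space.size(); ++i) {
        scored.emplace_back(euclidean(space[i], query), i);
    }
    k = std::min(k, scored.size());
    // Only the first k entries need to be sorted.
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end());
    scored.resize(k);
    return scored;
}
```

Every query touches all N features, so the cost per query grows with both N and D; this is the baseline that tree-based or hashing-based searchers try to beat.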

6 Our Case Study Feature Matching: a fundamental problem in many computer vision tasks –Use the SIFT algorithm to generate features for each image; –Use a k-Nearest Neighbors (k-NN) algorithm to find similar features between images

7 Challenges Very time-consuming: –datasets become larger: hundreds or thousands of images; –image resolution increases: 2300×1500 pixels, or higher. New platforms: –HPC is moving into the multi-/many-core era: AMD 16-core and 64-core machines.

8 Motivation Performance evaluation: –Identify common problems that may limit the performance of feature matching on multi-/many-core platforms. Performance tuning: –Find general methods to solve the identified problems.

9 Contents Motivation Our work Evaluation Conclusion

10 Data Distribution

11 Data Size

12 Problems Unbalanced workload: –Levels of parallelism; –Scheduling policy. Poor last-level cache utilization: –Memory architecture.

13 Levels of parallelism [Figure: diagram of parallelism levels Level_1 through Level_4. Labels: Reference Images, Query Images, Features, Level_1&2; search methods: Linear, KD-tree, Kmeans, LSH, Others.]

14 Scheduling policy OpenMP scheduling policy: –Static: the scheduler will assign an equal number of tasks to each thread (not used); –Dynamic: when one thread finishes its current task, it will take new tasks from the global task queue; –Guided: chunk size is adjusted dynamically when tasks are requested from the task queue.
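How a scheduling policy is selected in OpenMP can be sketched as follows. The workload here is a hypothetical stand-in (`match_one_image` is not from the presentation); it only models the fact that images with more features take longer to match. With g++, OpenMP is enabled by the `-fopenmp` flag; without it the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for matching one image: the cost grows with the
// number of features, so tasks are unbalanced when feature counts differ.
long match_one_image(int feature_count) {
    assert(feature_count >= 0);
    long work = 0;
    for (int i = 0; i < feature_count; ++i) {
        work += i % 7;
    }
    return work;
}

// Processes all images in parallel. schedule(dynamic, 1) lets a thread fetch
// the next image as soon as it finishes its current one; replacing it with
// schedule(static) or schedule(guided) selects the other two policies.
long match_all(const std::vector<int>& feature_counts) {
    long total = 0;
    const int n = static_cast<int>(feature_counts.size());
    #pragma omp parallel for schedule(dynamic, 1) reduction(+ : total)
    for (int i = 0; i < n; ++i) {
        total += match_one_image(feature_counts[i]);
    }
    return total;
}
```

Dynamic scheduling trades a little queue overhead for better balance when task sizes vary; guided scheduling reduces that overhead by handing out large chunks first and smaller ones later.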

15 Memory architecture
More cores are sharing the memory and the last-level cache:
–Memory bandwidth: AMD 16-core 12.8 GB/s; AMD 64-core 25.6 GB/s
–Last-level cache: AMD 16-core 6 MB; AMD 64-core 16 MB
Large images may not fit in the cache; the resulting memory accesses cause the search to hit the memory wall.
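A rough way to see when a feature set outgrows the last-level cache is to compare its size in bytes with the cache capacity. The sketch below assumes 128-dimensional `float` descriptors, which is the usual SIFT layout but is not stated on the slide; the function names and the feature count in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by `count` features of `dims` float elements each.
std::size_t feature_bytes(std::size_t count, std::size_t dims) {
    return count * dims * sizeof(float);
}

// Smallest number of equal pieces such that each piece is no larger than
// `cache_bytes`. Returns 1 when the whole set already fits.
std::size_t parts_needed(std::size_t count, std::size_t dims,
                         std::size_t cache_bytes) {
    assert(cache_bytes > 0);
    const std::size_t total = feature_bytes(count, dims);
    if (total == 0) {
        return 1;
    }
    return (total + cache_bytes - 1) / cache_bytes;  // ceiling division
}
```

For example, with 4-byte floats, 50,000 hypothetical features of 128 dimensions occupy 25,600,000 bytes, so `parts_needed(50000, 128, 6144 * 1024)` gives 5 pieces against a 6 MB cache. In practice the cache is shared with query data and other threads, so the usable budget is smaller than the full capacity.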

16 Divide-and-Merge We propose Divide-and-Merge: –The whole feature space is split into several smaller sub-spaces; –Each sub-space is searched independently; –The results are merged.
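The three steps can be sketched for a single query as follows. This is an illustration under stated assumptions rather than the presenters' code: the sub-spaces are contiguous index ranges, each sub-space is searched by brute force, and the names `search_subspace` and `divide_and_merge` are hypothetical. Squared distances are used because they give the same ordering as Euclidean distances.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Feature = std::vector<float>;
using Neighbor = std::pair<double, std::size_t>;  // (squared distance, global index)

double squared_distance(const Feature& a, const Feature& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t m = 0; m < a.size(); ++m) {
        const double diff = static_cast<double>(a[m]) - b[m];
        sum += diff * diff;
    }
    return sum;
}

// Search step: top-k neighbours of `query` within space[begin, end).
std::vector<Neighbor> search_subspace(const std::vector<Feature>& space,
                                      std::size_t begin, std::size_t end,
                                      const Feature& query, std::size_t k) {
    std::vector<Neighbor> local;
    local.reserve(end - begin);
    for (std::size_t i = begin; i < end; ++i) {
        local.emplace_back(squared_distance(space[i], query), i);
    }
    const std::size_t keep = std::min(k, local.size());
    std::partial_sort(local.begin(), local.begin() + keep, local.end());
    local.resize(keep);
    return local;
}

// Divide the space into `parts` ranges, search each, then merge the
// per-range candidates and keep the k best overall.
std::vector<Neighbor> divide_and_merge(const std::vector<Feature>& space,
                                       const Feature& query, std::size_t k,
                                       std::size_t parts) {
    parts = std::max<std::size_t>(1, parts);
    const std::size_t chunk = (space.size() + parts - 1) / parts;
    std::vector<Neighbor> merged;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t begin = std::min(p * chunk, space.size());
        const std::size_t end = std::min(begin + chunk, space.size());
        const auto local = search_subspace(space, begin, end, query, k);
        merged.insert(merged.end(), local.begin(), local.end());
    }
    const std::size_t keep = std::min(k, merged.size());
    std::partial_sort(merged.begin(), merged.begin() + keep, merged.end());
    merged.resize(keep);
    return merged;
}
```

With an exact searcher the merged result matches a search over the whole space, because each of the true k nearest neighbours is also among the k nearest within its own sub-space. With an approximate searcher the result may differ from the undivided search.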

17 Divide-and-Merge

18 Time complexity Accurate algorithms: –Brute force: –Apply DM: Approximate algorithms: –Randomized KD-Tree: –Apply DM:

19 Contents Motivation Our work Evaluation Conclusion

20 Hardware and Software configuration
Name | CPU | Cache | Memory | OS | Compiler
AMD 16-core (AMD16) | AMD Opteron Processor 8380, 4 cores × 4 @ 2.5 GHz | L1: 128 KB, L2: 512 KB, L3: 6144 KB | 16 GiB, DDR2 800 MHz, 12.8 GB/s | Ubuntu 12.04.1 | g++-4.4
AMD 64-core (AMD64) | AMD Opteron Processor 6276, 8 cores × 8 @ 2.3 GHz | L1: 48 KB, L2: 1000 KB, L3: 16384 KB | 64 GiB, DDR3 1333 MHz, 21.32 GB/s | Ubuntu 12.04.1 | g++-4.4
Environment: OpenCV + OpenMP, one of the setups most frequently used by computer vision researchers to exploit parallel platforms.

21 Levels of parallelism

22 Scheduling policy(on level_1&2)

23 Scheduling policy(on level_3)

24 Memory architecture 1. Original Execution 2. Apply Divide-and-Merge

25 Evaluation on Manawatu Dataset

27 Contents Motivation Our work Evaluation Conclusion

28 We have shown that performance tuning is demanding on modern multicore systems. We have comprehensively evaluated the impact of three factors that influence large-scale image feature matching: levels of parallelism, scheduling policy, and memory architecture. We have proposed a Divide-and-Merge algorithm that can greatly improve the speedup and scalability of feature matching algorithms on multicore machines.

