ACCELERATING QUERY-BY-HUMMING ON GPU
Pascal Ferraro, Pierre Hanna, Laurent Imbert, Thomas Izard
ISMIR 2009
Presenter: Chung-Che Wang (focus on GPU performance)
2 Outline
- Introduction
- Aligning two pieces of music (omitted)
- Parallel implementation
- Tests and results (recognition rate omitted)
- Conclusions
3 Introduction
- Powerful query-by-humming methods have a high computational cost
- Up to 160 times faster by using a GPU
- The same program is executed on many data elements in parallel
- New challenges: memory operations and computational resource allocation
4 Parallel Implementation (1/4)
- CUDA can be seen as an extension of C that allows developers to define C functions, called kernels
- Kernels must be written in C
- The GPU operates as a coprocessor: CUDA threads execute on the GPU, while the rest of the program runs on the CPU
5 Parallel Implementation (2/4)
- Virtually launches N kernels executing the algorithm in parallel (N: number of entries in the database)
- The database is usually large --> must be stored in the global memory
- Global memory is not cached on the GPU --> extremely important to follow the right access pattern
6 Parallel Implementation (3/4)
In order to optimize memory accesses:
- store the database in "note"-major order (instead of "song"-major)
- only store the current rows of the Smith-Waterman matrices
- base allocation on the query's fixed size rather than the variable sizes of the pieces of music
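The "note"-major layout above can be sketched in plain C. On the GPU, thread t handles song t, so at note step k all threads read neighbouring addresses — the coalesced access pattern global memory needs. The function and index names here are illustrative assumptions, not the paper's code:

```c
#include <stdlib.h>

/* Song-major layout: the notes of one song are contiguous. */
static inline int idx_song_major(int song, int note, int maxLen) {
    return song * maxLen + note;
}

/* Note-major layout: note k of ALL songs is contiguous, so adjacent
 * GPU threads (adjacent songs) read adjacent addresses at each step. */
static inline int idx_note_major(int song, int note, int numSongs) {
    return note * numSongs + song;
}

/* Transpose a song-major database into note-major order
 * (illustrative: assumes all songs padded to maxLen notes). */
void to_note_major(const int *songMajor, int *noteMajor,
                   int numSongs, int maxLen)
{
    for (int song = 0; song < numSongs; song++)
        for (int note = 0; note < maxLen; note++)
            noteMajor[idx_note_major(song, note, numSongs)] =
                songMajor[idx_song_major(song, note, maxLen)];
}
```

The transpose itself is cheap and done once on the host; the payoff is that every subsequent kernel step touches one contiguous stretch of global memory.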
7 Parallel Implementation (4/4)
- Each multiprocessor executes: conversion of the query to a note vector, then comparison between the query and each reference
- Each processor stores its intermediate Smith-Waterman matrices (only one row) in its own shared memory space
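The one-row trick above can be sketched in plain C: what each GPU thread would compute, keeping only the current row of the Smith-Waterman matrix (plus one diagonal temporary) instead of the full m x n matrix. The match/mismatch/gap scores are illustrative assumptions, not the paper's note-sequence scoring:

```c
#include <string.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Smith-Waterman local-alignment score in O(m) memory: a single row,
 * overwritten in place, plus one scalar holding the diagonal cell. */
int sw_score(const char *query, const char *ref,
             int match, int mismatch, int gap)
{
    int m = (int)strlen(query), n = (int)strlen(ref);
    int row[m + 1];            /* current row H[i][*], reused every i */
    int best = 0;

    for (int j = 0; j <= m; j++) row[j] = 0;

    for (int i = 1; i <= n; i++) {
        int diag = 0;          /* H[i-1][j-1]: old row[j-1] before update */
        for (int j = 1; j <= m; j++) {
            int up = row[j];   /* H[i-1][j], read before overwrite */
            int s  = diag + (query[j-1] == ref[i-1] ? match : mismatch);
            int h  = MAX(0, MAX(s, MAX(up + gap, row[j-1] + gap)));
            diag = up;
            row[j] = h;
            if (h > best) best = h;
        }
    }
    return best;               /* best local-alignment score */
}
```

Because the row has the query's fixed length m, every thread needs the same, small amount of shared memory regardless of how long its reference song is — which is exactly why allocation can be based on the query's size.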
8 Tests and Results (1/3)
Query data corpus: MIR-QBSH (used in MIREX 2007) queries
Databases:
- DB1: ground-truth + noise from the Essen Collection
- DB2: ground-truth + noise from the whole Essen Collection
- DB3: a subset of the RISM A/II collection, proposed during MIREX 2005
Note: the ground-truth MIDIs are rather short, while the Essen Collection mainly consists of long data files
9 Tests and Results (2/3)
Three different platforms (table omitted)
10 Tests and Results (3/3)
Timings of the different algorithms on various GPUs and databases, in mm:ss (table omitted)
11 Conclusions
- Great care must be taken when programming memory operations: a bad allocation strategy can have a significant impact on the computation time
- Future work: optimize the pre-processing phase, which currently runs exclusively on the CPU --> it takes 75-90% of the overall computation time
- Implement this stage on the GPU using the CUDA CUFFT library
12 The End