Audio Thumbnailing of Popular Music Using Chroma-Based Representations Matt Williamson Chris Scharf Implementation based on: IEEE Transactions on Multimedia,


1 Audio Thumbnailing of Popular Music Using Chroma-Based Representations Matt Williamson Chris Scharf Implementation based on: IEEE Transactions on Multimedia, Vol. 7, No. 1, February 2005 Mark A. Bartsch, Member, IEEE, and Gregory H. Wakefield, Member, IEEE

2 Introduction Multimedia content is growing rapidly Efficient method of browsing is necessary Indexing and retrieval methods are media-dependent

3 Primary goal Minimize audition time for a given type of media

4 Current methods Images –Downsampling Produces a smaller version of image (thumbnail) Reduces cost of delivery and display

5 Current methods Audio: speech –Symbolic representation Produces a transcript of the audio

6 What about music? Adapt an existing method: –Downsampling (time compression) Results in highly distorted, unintelligible audio

7 What about music? Adapt an existing method (cont’d): –Symbolic representation (score transcription) Extremely difficult Results in essentially meaningless information Does not convey other important elements: –Vocal style –Instruments used –Processing effects used

8 Essential problem: Adapting existing methods cannot reduce the audition time for music and effectively convey the “gist” of the song

9 Possible Solution: Audio thumbnailing via chroma-based analysis

10 Audio thumbnailing Produces a short clip of the selection to represent the “gist” of the song

11 Chroma-based analysis Based on the extraction of chroma features from the audio Chroma Feature Extraction Algorithm: –Frame Segmentation –Feature Calculation –Correlation Calculation –Correlation Filtering –Thumbnail Selection

12 Chroma Feature Extraction Extract frequencies from audio file Calculate chroma values from frequencies Categorize chroma values into pitch classes –12 pitch classes: A, A#/Bb, B, C, C#/Db, …, G#/Ab
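The mapping from a frequency to a pitch class can be sketched as follows. This is a minimal illustration, not the presenters' code: it assumes chroma is the fractional part of the base-2 logarithm of frequency, measured relative to a hypothetical reference of A = 440 Hz, and that the 12 classes are equal-tempered semitones. The names `chroma`, `pitch_class`, and `ref_hz` are introduced here for illustration only.

```python
import math

# 12 equal-tempered pitch classes, starting from A (index 0)
PITCH_CLASSES = ["A", "A#/Bb", "B", "C", "C#/Db", "D",
                 "D#/Eb", "E", "F", "F#/Gb", "G", "G#/Ab"]

def chroma(freq_hz, ref_hz=440.0):
    """Position within the octave, in [0, 1), relative to ref_hz (assumed A = 440 Hz)."""
    octaves = math.log2(freq_hz / ref_hz)
    return octaves - math.floor(octaves)

def pitch_class(freq_hz, ref_hz=440.0):
    """Index 0..11 of the nearest pitch class (0 = A)."""
    return int(round(12 * chroma(freq_hz, ref_hz))) % 12

for f in (220.0, 440.0, 261.63):
    print(f, PITCH_CLASSES[pitch_class(f)])
```

Frequencies one octave apart (220 Hz and 440 Hz) land in the same class, which is the property that makes chroma useful for comparing musical content.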

13 Frame Segmentation Authors’ Implementation: –Determined via beat tracking algorithm –Range: 0.25s to 0.56s Our Implementation: –Average of range: 0.41s
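The fixed-length segmentation can be sketched as below. This is an assumption-laden illustration: frames are taken as non-overlapping, and a trailing partial frame is dropped; the slides do not say how either case was handled. `frame_bounds` is a hypothetical helper name.

```python
def frame_bounds(num_samples, sample_rate, frame_seconds=0.41):
    """Return (start, end) sample indices of consecutive fixed-length frames.

    Assumes non-overlapping frames; any trailing partial frame is dropped.
    """
    frame_len = int(round(frame_seconds * sample_rate))
    return [(start, start + frame_len)
            for start in range(0, num_samples - frame_len + 1, frame_len)]

# Example: 2 seconds of audio sampled at 1000 Hz
print(frame_bounds(2000, 1000))
```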

14 Feature Calculation Calculate 12-element chroma feature vector, v_t, for each frame: –Apply FFT to each frame –Constraints: Minimum frequency: 20 Hz –Lower limit of human hearing Maximum frequency: 2000 Hz –Higher frequencies affect the perception of chroma
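One possible way to compute such a vector with NumPy is sketched below. The details are assumptions rather than the authors' or presenters' method: a Hann window is applied before the FFT, each FFT bin between 20 Hz and 2000 Hz is assigned to its nearest equal-tempered pitch class relative to a hypothetical reference of A = 440 Hz, and bin magnitudes are summed per class. `chroma_vector` and its parameters are illustrative names.

```python
import numpy as np

def chroma_vector(frame, sample_rate, fmin=20.0, fmax=2000.0, ref_hz=440.0):
    """12-element chroma vector for one frame (index 0 = A).

    Assumptions: Hann window, magnitude spectrum, summed per pitch class.
    """
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Keep only bins inside the 20 Hz .. 2000 Hz band
    keep = (freqs >= fmin) & (freqs <= fmax)

    # Nearest pitch class of each kept bin
    classes = np.round(12 * np.log2(freqs[keep] / ref_hz)).astype(int) % 12

    v = np.zeros(12)
    np.add.at(v, classes, magnitude[keep])  # accumulate magnitude per class
    return v

# Example: a 0.41 s frame of a 440 Hz sine sampled at 8000 Hz
sr = 8000
t = np.arange(int(0.41 * sr)) / sr
v = chroma_vector(np.sin(2 * np.pi * 440.0 * t), sr)
print(int(np.argmax(v)))  # index of the strongest pitch class
```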

15 Correlation Calculation Calculate similarity matrix C –Each element is equal to the correlation between two feature vectors –High correlation along diagonals in the matrix indicates repetitions within the song
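A small sketch of the similarity matrix follows. It assumes "correlation" means the Pearson correlation between two 12-element chroma vectors (each vector is mean-centered and scaled to unit length before taking dot products); the slides do not give the exact normalization. A frame whose chroma vector is constant would divide by zero here and is not handled.

```python
import numpy as np

def similarity_matrix(features):
    """features: (n_frames, 12) array -> (n_frames, n_frames) matrix C.

    C[i, j] is the Pearson correlation between frames i and j (an assumption).
    """
    centered = features - features.mean(axis=1, keepdims=True)
    normed = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    return normed @ normed.T

# Example: a 4-frame pattern played twice produces a bright diagonal at lag 4
rng = np.random.default_rng(0)
pattern = rng.random((4, 12))
C = similarity_matrix(np.vstack([pattern, pattern]))
print(np.round(np.diag(C, k=4), 3))
```

In the example, the off-diagonal at offset 4 is all ones because frame i and frame i+4 are identical, which is the kind of stripe a repeated section leaves in C.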

16 Correlation Filtering Calculate the filtered time-lag matrix T: –Exposes similarity between extended segments that are separated by constant lag –Filtering is performed along the diagonals of C Uses a symmetric rectangular windowing function (a uniform moving average filter) –T is then “rotated” so that the diagonals are oriented vertically
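The filtering and "rotation" can be sketched together as below. This is an illustration under stated assumptions: the moving average runs forward from each start frame over `window` frames, so that `T[start, lag]` scores the segment beginning at `start` against the one beginning at `start + lag`; entries with too few frames to fill the window are left at zero. The exact window alignment used by the authors is not given in the slides.

```python
import numpy as np

def time_lag_matrix(C, window):
    """Filter C along its diagonals and re-index by (start frame, lag).

    T[i, lag] = mean of C[i + k, i + k + lag] for k = 0 .. window - 1
    (forward alignment is an assumption). Unfillable entries stay 0.
    """
    n = C.shape[0]
    T = np.zeros((n, n))
    kernel = np.ones(window) / window  # uniform moving average
    for lag in range(n):
        diag = np.diagonal(C, offset=lag)  # C[i, i + lag] for all valid i
        if len(diag) < window:
            continue
        T[: len(diag) - window + 1, lag] = np.convolve(diag, kernel, mode="valid")
    return T

# Example: an 8-frame matrix with a repetition at lag 4
C = np.eye(8)
for i in range(4):
    C[i, i + 4] = 1.0
T = time_lag_matrix(C, window=2)
print(T[0, 4], T[0, 3])
```

Storing each diagonal of C as a column of T is what turns diagonal stripes into vertical ones.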

17 Thumbnail Selection Select maximum value in T –The location of this value indicates: Occurrence of the segment (the y-coordinate) Lag time (the x-coordinate) –Constraints: Minimum lag time = 1/10 of song length Maximum start time = 3/4 of song length –To reduce susceptibility to “fading repeat”
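The constrained maximum search can be sketched as follows. It assumes T is indexed as `T[start, lag]` in frames and that "song length" is the number of frames; the rounding of the two limits and the helper name `select_thumbnail` are choices made for this illustration.

```python
import numpy as np

def select_thumbnail(T, min_lag_frac=0.1, max_start_frac=0.75):
    """Return (start, lag) of the largest entry of T that satisfies
    lag >= 1/10 of the song length and start <= 3/4 of the song length."""
    n = T.shape[0]
    min_lag = int(np.ceil(min_lag_frac * n))
    max_start = int(np.floor(max_start_frac * n))

    # Exclude disallowed entries by setting them to -inf
    masked = np.full(T.shape, -np.inf)
    masked[: max_start + 1, min_lag:] = T[: max_start + 1, min_lag:]

    start, lag = np.unravel_index(np.argmax(masked), masked.shape)
    return int(start), int(lag)

# Example: the global maximum sits at lag 0 and is rejected
T = np.zeros((20, 20))
T[0, 0] = 5.0    # lag too small
T[18, 5] = 3.0   # starts too late (possible fading repeat)
T[3, 8] = 2.0    # allowed
print(select_thumbnail(T))
```

The minimum-lag limit keeps the search away from the main diagonal, where every segment trivially matches itself.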

18 Results Jimmy Buffett – “Math Suks” –System: [64, 89] Lifehouse – “You and Me” –System: [38, 63] Gavin DeGraw – “I Don’t Want To Be” –System: [95, 120] Super Mario Brothers Theme –System: [18, 43]

19 Conclusion Successfully extracted time segments which closely match the chorus of the song Feature Calculation issue: –Authors’ implementation unclear

20 Possible Uses Audio domain: –Improved search capability Searching for similar songs –Audio fingerprinting Other domains: –Detection of irregular heartbeats

21 Suggested Improvements and Alternatives Image-based analysis on the waveform Tested alternatives –MSE on signal frequencies Chroma-based analysis proved more accurate

