Published by Manuel Turrill. Modified over 2 years ago.

1
The evaluation and optimisation of multiresolution FFT parameters for use in automatic music transcription algorithms

2
Automatic music transcription (AMT)

3
AMT Algorithms

4
Time & Frequency Resolution
Short window: time resolution increases, frequency resolution decreases.
Long window: time resolution decreases, frequency resolution increases.
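The trade-off can be made concrete with a small calculation. This is a minimal sketch, assuming a 44.1 kHz sample rate (the slides do not state one); the function name `resolutions` is illustrative only.

```python
def resolutions(fft_length, sample_rate=44100):
    """Return (bin spacing in Hz, window duration in seconds).

    sample_rate defaults to 44100 Hz, which is an assumption,
    not a value taken from the presentation.
    """
    freq_res = sample_rate / fft_length   # smaller value = finer frequency resolution
    time_res = fft_length / sample_rate   # smaller value = finer time resolution
    return freq_res, time_res


for n in (256, 2048, 8192):
    df, dt = resolutions(n)
    print(f"N={n:5d}  bin spacing={df:8.2f} Hz  window={dt * 1000:7.2f} ms")
```

A longer window narrows the bins but lengthens the span of audio each frame covers, which is the opposition the slide describes.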

5
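Multiresolution FFT (MRFFT)
[Figure: sub-bands separated at cut-off frequencies FcA, FcB, FcC and FcD, each with its own FFT (FFT A, FFT B, FFT C, FFT D). Labels: High Frequency Resolution, High Time Resolution.]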

6
Time Freq Plane - Dressler

7
Window Length - Bin Alignment
Note-bin alignment: the position of a fundamental frequency relative to an FFT bin frequency.
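One way to quantify this alignment is the distance between a note's fundamental and the nearest bin centre, measured in bin widths. The sketch below assumes equal temperament with A4 = 440 Hz and a 44.1 kHz sample rate; neither is stated in the slides, and the function names are hypothetical.

```python
def note_frequency(midi_note):
    """Equal-tempered fundamental in Hz, assuming A4 (MIDI note 69) = 440 Hz."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)


def bin_offset(freq, fft_length, sample_rate=44100):
    """Offset of freq from the nearest FFT bin centre, as a fraction of one
    bin width. 0 means perfectly aligned; +/-0.5 means halfway between bins."""
    exact_bin = freq * fft_length / sample_rate
    return exact_bin - round(exact_bin)


# Compare how well A2 (MIDI 45, 110 Hz) lines up for a few window lengths.
for n in (1024, 4096, 8192):
    print(n, round(bin_offset(note_frequency(45), n), 3))
```

Because the offset changes with `fft_length`, window length affects alignment as well as time and frequency resolution.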

8
Note bin alignment

10
MRFFT Optimisation
Parameters optimised:
- Cut-off frequencies
- Sub-band FFT length
Optimised based on 3 characteristics determined by window length:
- Time resolution
- Frequency resolution
- Note-bin alignment

11
Scoring
- Calculate a score for time resolution, frequency resolution and note-bin alignment in each sub-band.
- Weight each score according to the number of notes in the sub-band.
- Range-correct each score to lie between 0 and 1.
- Sum the scores across all bands to generate the MRFFT score.

12
Note Bin Scoring
If 2 note frequencies fall within the same bin, the FFT length is discounted as unsuitable.
Weighted sub-band FFT bin score = sub-band FFT bin score * (notes in sub-band / total notes across all bands)
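The two rules on this slide can be sketched as follows. Assigning each note to its nearest bin by rounding is an assumption about what "fall within the same bin" means, the 44.1 kHz sample rate is also assumed, and both function names are hypothetical.

```python
def notes_share_a_bin(note_freqs, fft_length, sample_rate=44100):
    """True if any two note frequencies map to the same (nearest) FFT bin.

    Nearest-bin assignment via round() is an assumption; the slides do not
    say how a note is assigned to a bin.
    """
    bins = [round(f * fft_length / sample_rate) for f in note_freqs]
    return len(set(bins)) < len(bins)


def weighted_bin_score(bin_score, notes_in_band, total_notes):
    """Weight a sub-band's bin score by its share of all notes (slide formula)."""
    return bin_score * (notes_in_band / total_notes)


low_notes = [100.0, 105.0]                 # two closely spaced example frequencies
print(notes_share_a_bin(low_notes, 256))   # short FFT: both notes land in one bin
print(notes_share_a_bin(low_notes, 8192))  # long FFT: the notes are separated
print(weighted_bin_score(0.8, 12, 48))
```

An FFT length for which `notes_share_a_bin` returns True would be discounted for that sub-band.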

13
Scoring Process
The algorithm moves the cut-off frequencies A, B and C through all combinations of positions. For each position, all FFT lengths between 256 and 8192 samples, in increments of 128, are evaluated on each sub-band. All combinations of FFT lengths on all combinations of sub-bands are evaluated and scored.
[Figure: Subband A, Subband B, Subband C and Subband D, bounded by FcA, FcB, FcC and FcD, on a frequency axis labelled 80 Hz and 5 kHz.]
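The size of this exhaustive search can be illustrated with a short enumeration. This is a sketch only: the grid of candidate cut-off positions is hypothetical (the slides do not give one), and `candidate_configurations` is an illustrative name.

```python
from itertools import combinations, product

# Every FFT length from 256 to 8192 samples in steps of 128, as on the slide.
FFT_LENGTHS = list(range(256, 8192 + 1, 128))


def candidate_configurations(cutoff_positions, fft_lengths=FFT_LENGTHS, bands=4):
    """Yield (cut-offs, per-band FFT lengths) for every combination.

    cutoff_positions is a hypothetical ascending grid of frequencies in Hz;
    bands - 1 of them are chosen as the movable cut-offs.
    """
    for cutoffs in combinations(cutoff_positions, bands - 1):
        for lengths in product(fft_lengths, repeat=bands):
            yield cutoffs, lengths


# Tiny demonstration grid so the enumeration stays small.
demo = list(candidate_configurations([100, 200, 400, 800], fft_lengths=[256, 512]))
print(len(demo), demo[0])

# With the full list of lengths, each cut-off placement alone has this many
# FFT-length assignments for 4 bands:
print(len(FFT_LENGTHS) ** 4)
```

The count grows with the fourth power of the number of candidate lengths, so each configuration needs a cheap score for the full sweep to be practical.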

14
Solutions
1. 4-band MRFFT, 256-8192 range
2. 3-band MRFFT, 256-8192 range
3. Dressler 4-band MRFFT, 256-2048 range
4. Dressler fixed FFT length, variable bands, 256-2048 range
5. 4-band MRFFT, 256-2048 range
6. 1-band FFT, 8192

15
Results – Subband Divisions
[Figure: sub-band divisions for Band A, Band B, Band C and Band D.]

16
Results – MRFFT Score

17
Transcription Test – Low F Bands
[Figure: Original, Solution 1 and Solution 6, with FcA and FcB marked.]
The high frequency resolution of Solution 6 is reflected in its low-frequency transcription accuracy.

18
Transcription Test – High F Bands
[Figure: Solution 1, Solution 3 and Solution 6.]

19
F-Measure Results
Recall is the fraction of the relevant notes that were retrieved, i.e. how many of the correct notes the system extracted.
Precision is the fraction of the retrieved notes that are relevant, i.e. how many of the extracted notes were correct.
F-Measure is the weighted harmonic mean of precision and recall.
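These three measures follow directly from counts of correct, spurious and missed notes. The sketch below uses the standard F-beta definition, with `beta = 1` weighting precision and recall equally; the slides do not say which weighting was used, and the note counts in the example are made up for illustration.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """Return (precision, recall, F) from note counts.

    beta = 1 gives the balanced F1 score; the weighting used in the
    presentation is not stated, so this default is an assumption.
    """
    retrieved = true_positives + false_positives
    relevant = true_positives + false_negatives
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Illustrative counts: 8 correct notes, 2 spurious notes, 8 missed notes.
print(f_measure(8, 2, 8))
```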

20
Peak Picker
A threshold is set dynamically for each analysis window of the STFT as a percentage of the maximum magnitude within the window, with a heuristically chosen minimum threshold. If a bin magnitude exceeds the threshold, a note is transcribed at that point.
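A minimal sketch of such a picker for a single analysis window follows. The values `fraction=0.5` and `min_threshold=0.1` are placeholders, not the values used in the presentation, and `pick_peaks` is an illustrative name.

```python
def pick_peaks(magnitudes, fraction=0.5, min_threshold=0.1):
    """Return the indices of bins whose magnitude exceeds the window threshold.

    The threshold is a fraction of the window's maximum magnitude, but never
    lower than min_threshold. Both parameter values are hypothetical.
    """
    if not magnitudes:
        return []
    threshold = max(fraction * max(magnitudes), min_threshold)
    return [k for k, m in enumerate(magnitudes) if m > threshold]


loud_window = [0.0, 1.0, 0.4, 0.6]
quiet_window = [0.01, 0.02, 0.015]
print(pick_peaks(loud_window))    # bins above half of the window maximum
print(pick_peaks(quiet_window))   # nothing passes the minimum threshold
```

Because the threshold follows the window maximum, one very strong partial raises the bar for every other bin in that window.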

21
Peak Picker Robustness

22
Solution 1 Vs Solution 6 Picker

23
MRFFT Implementation
6016 FFT is performed on the entire frequency spectrum. The spectral information is then filtered to include only the frequencies required by that band.
A note frequency (orange magnitude) that is not in the frequency band considered generates cross-channel interference (red magnitudes) that contributes to the magnitudes in the sub-band of interest.
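The filtering step can be sketched as bin selection on a full-spectrum magnitude array. This is an illustration under assumptions, not the presenter's implementation: the 44.1 kHz sample rate, the half-open band edges and the function names are all hypothetical.

```python
def band_bins(fft_length, low_hz, high_hz, sample_rate=44100):
    """Indices of the bins whose centre frequency lies in [low_hz, high_hz)."""
    return [k for k in range(fft_length // 2 + 1)
            if low_hz <= k * sample_rate / fft_length < high_hz]


def filter_to_band(magnitudes, fft_length, low_hz, high_hz, sample_rate=44100):
    """Zero every bin outside the band, leaving in-band magnitudes untouched."""
    keep = set(band_bins(fft_length, low_hz, high_hz, sample_rate))
    return [m if k in keep else 0.0 for k, m in enumerate(magnitudes)]


# Toy 8-point example: bin centres at 0, 5512.5, 11025, 16537.5 and 22050 Hz.
spectrum = [0.1, 0.9, 0.3, 0.7, 0.2]
print(band_bins(8, 0, 12000))
print(filter_to_band(spectrum, 8, 0, 12000))
```

Selecting bins after the transform does not remove energy that an out-of-band note has already leaked into the kept bins, which is the interference described on the slide.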

24
Cross talk indicators

25
Adjacent bins
Adjacent bins in the optimised MRFFT represent fundamental frequencies. Therefore any cross-channel interference will contribute to the energy contained in FFT bins representing note frequencies. This may contribute to false positives.

26
F-Measure conclusions
The F-Measure results are largely disappointing. This can be attributed to the inability of the implemented peak picker to handle fluctuations in the magnitude of local maxima. Characteristics of the MRFFT, such as adjacent bins that each represent a note, and interference generated by the sub-band division method, contribute to this problem. Large variations in spectral magnitude also contribute.

27
Conclusions
The theoretical scoring of MRFFT parameters gave favourable results for the optimised FFT. The ‘real world’ sinusoidal extraction test initially gave disappointing F-Measure results for the MRFFT solutions compared to the single-band 8192 FFT. However, closer analysis of the transcribed files showed positive aspects of the MRFFT analysis, as performance improved in the higher frequencies. Further investigation of the results revealed inadequacies in the implemented peak picker and indicated issues with the construction of the MRFFT that require further investigation.
