Presentation on theme: "Toward Automatic Music Audio Summary Generation from Signal Analysis Seminar „Communications Engineering“ 11. December 2007 Patricia Signé."— Presentation transcript:
Agenda
►Introduction
►State of the art
►Static vs Dynamic features
►Automatic Music Audio Summary generation
  - Extraction of information from the signal
  - Representation by states: multi-pass approach
►Conclusion
►Questions

Introduction
►Recent topic of interest, driven by commercial needs (browsing of online music), documentation (browsing of archives), and music information retrieval.
►Storage of audio summaries has been standardized, e.g. the SDS of the MPEG-7 standard: a set of tools allowing the storage of sequential or hierarchical summaries.
►Only a few techniques exist for the automatic generation of audio summaries. This is in sharp contrast to video and text, where multiple methods exist for automatic summary generation.
►A summary can be parameterized at three levels:
  - the type of the source
  - the goal of the summary
  - the output format

State of the art
►„Sequences“ approach
  - A similarity matrix applied to well-chosen features gives a visual representation of the structural information of a piece of music (Foote's work on the similarity matrix).
  - The signal features used in that study are the Mel Frequency Cepstral Coefficients (MFCC).
  - If a specific segment of music ranging from time t1 to t2 is repeated later in the music from t3 to t4, the succession of features in both time periods is assumed to be identical.
  - A key point of current work is the use of static features (MFCC) as the signal observation.

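The similarity-matrix idea can be sketched in a few lines. This is a minimal illustration, not Foote's implementation: the frame values are made-up stand-ins for MFCC vectors, cosine similarity is only one possible choice of measure, and the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_matrix(frames):
    """Return S with S[i][j] = similarity between frame i and frame j."""
    return [[cosine_similarity(fi, fj) for fj in frames] for fi in frames]

# Toy stand-in for MFCC frames (hypothetical values): section A, section B, section A again.
section_a = [[1.0, 0.0, 0.2], [0.8, 0.1, 0.3]]
section_b = [[0.0, 1.0, 0.5], [0.1, 0.9, 0.4]]
frames = section_a + section_b + section_a

S = similarity_matrix(frames)

# Frames 0-1 (t1..t2) are repeated as frames 4-5 (t3..t4), so the
# off-diagonal entries S[0][4] and S[1][5] are as high as the main diagonal.
for row in S:
    print(" ".join(f"{value:.2f}" for value in row))
```

In the printed matrix, the repetition appears as a short stripe parallel to the main diagonal, which is the visual cue the "sequences" approach looks for.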
Static vs Dynamic features
►Static features represent the signal around a given time, but do not model any temporal evolution. When looking for repeated patterns in the music, this makes it necessary either to find an identical evolution of the features or to average the features over a period of time in order to obtain states.
►Dynamic features directly model the temporal evolution of the spectral shape over a fixed time duration. The choice of the duration over which the modeling is performed determines the kind of information that can be derived from the signal analysis.
►Feature extraction:

Static vs Dynamic features
►Using static features implies that, when looking for repeated patterns in the piece of music, it is necessary to find an identical evolution of the features.
►Advantages of using dynamic features:
  - They solve the above-mentioned problem of static features: even if some arrangement of the music masks the repetition of the initial melody sequence, repeated patterns will still be recognized.
  - For an appropriate choice of the modeling time duration, the search for repeated patterns in the music can be far easier.
  - The amount of data can be greatly reduced: for a 4-minute piece of music, the size of the similarity matrix is around 24000*24000 in the case of the MFCC, whereas it can be only 240*240 in the case of the dynamic features.

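The data reduction can be checked with a short calculation. The hop sizes below are assumptions, not values given on the slide: 10 ms per static (MFCC) frame and 1 s per dynamic-feature frame. The function name is hypothetical.

```python
def similarity_matrix_size(duration_s, hop_s):
    """Number of frames and number of matrix entries for a given hop size."""
    n_frames = round(duration_s / hop_s)
    return n_frames, n_frames * n_frames

duration_s = 4 * 60  # a 4-minute piece

# Assumed hop sizes: 10 ms for MFCC frames, 1 s for dynamic-feature frames.
mfcc_frames, mfcc_entries = similarity_matrix_size(duration_s, 0.010)
dyn_frames, dyn_entries = similarity_matrix_size(duration_s, 1.0)

print(mfcc_frames, dyn_frames)      # 24000 240
print(mfcc_entries // dyn_entries)  # 10000
```

Under these assumptions, the matrix built on dynamic features holds 10000 times fewer entries than the one built on MFCC frames.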
Automatic Music Audio Summary generation
►Consider the musical piece as a succession of states, each state representing somewhat similar information found in different parts of the piece.
►The states we are looking for are specific to each piece of music, so no supervised learning is possible to find them.
►For automatic audio summary generation, use a human-like segmentation and structuring approach that analyses the data in successive passes.
►From the signal data:
  1. Dynamic feature extraction: a first listening allows the detection of variations in the music without knowing whether a specific part is repeated later. This segmentation defines a set of templates which we call „potential“ states.
  2. Finding the structure by using the previously created templates:
    - The templates are compared in order to reduce redundancies.
    - The reduced set of templates is used as the initialization of a K-Means algorithm.
    - The middle states, which are the output of the K-Means algorithm, are used to initialize the learning of a Hidden Markov Model (HMM).
    - Finally, the optimal representation of the piece as an HMM state sequence is obtained by applying the Viterbi algorithm.

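The last steps of the multi-pass approach can be sketched as follows. This is a simplified illustration under stated assumptions, not the method from the talk: the frames and templates are made-up 2-D values, the template comparison and the HMM learning pass are skipped, and the HMM uses assumed unit-variance Gaussian emissions centred on the K-Means output with a fixed self-transition probability. All names are hypothetical.

```python
import math

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(frames, init_centroids, n_iter=20):
    """Plain K-Means, initialised from the reduced set of templates."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for f in frames:
            k = min(range(len(centroids)), key=lambda i: sq_dist(f, centroids[i]))
            clusters[k].append(f)
        for k, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def viterbi(frames, centroids, stay_prob=0.9):
    """Most likely state path for a simple HMM.

    Assumptions: unit-variance Gaussian emissions centred on the centroids,
    uniform initial probabilities, fixed self-transition probability.
    """
    n = len(centroids)
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (n - 1)) if n > 1 else float("-inf")

    def log_trans(j, k):
        return log_stay if j == k else log_move

    def log_emit(f, k):
        return -0.5 * sq_dist(f, centroids[k])

    score = [log_emit(frames[0], k) for k in range(n)]
    back = []  # back[t][k] = best predecessor of state k at frame t + 1
    for f in frames[1:]:
        new_score, pointers = [], []
        for k in range(n):
            j = max(range(n), key=lambda p: score[p] + log_trans(p, k))
            new_score.append(score[j] + log_trans(j, k) + log_emit(f, k))
            pointers.append(j)
        score = new_score
        back.append(pointers)

    # Backtrack from the best final state.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy dynamic-feature frames (hypothetical values) following the pattern A B A.
frames = [[2.0, 0.1], [1.9, 0.0], [2.1, 0.1],
          [0.0, 2.0], [0.1, 1.9], [0.0, 2.1],
          [2.0, 0.0], [1.9, 0.1], [2.0, 0.1]]
templates = [[2.0, 0.0], [0.0, 2.0]]  # reduced set of "potential" states

middle_states = kmeans(frames, templates)
path = viterbi(frames, middle_states)
print(path)  # [0, 0, 0, 1, 1, 1, 0, 0, 0]
```

The decoded path labels the first and last sections with the same state, which is the kind of state-sequence representation from which a summary could then be assembled.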
Conclusion
►Automatic generation of a music audio summary from signal analysis, without using any other information.
►Consider the musical piece as a succession of states, each state representing somewhat similar information found in different parts of the piece.
►Audio signal:
  - Derive dynamic features representing the time evolution of the energy content in various frequency bands.
  - From this observation, derive a representation of the music in terms of states.

Thanks!