Presentation on theme: "Comparison of Different Retrieval Models in Content-Based Music Retrieval Introduction Key Melody Extraction Retrieval Models Demos Experiments Conclusions."— Presentation transcript:
Comparison of Different Retrieval Models in Content-Based Music Retrieval Introduction Key Melody Extraction Retrieval Models Demos Experiments Conclusions 曾元顯、江陳威、白恒瑞、蘇珮君 Presented by Yuen-Hsien Tseng Digital Media Center National Taiwan Normal University
Introduction Desire for Content-based Music Retrieval –for copyright search Industrial interests and economic issues –for casual users to locate music work Entertainment, games, education, life –for researchers to analyze music collections Feedback to music composition theory Real-world Applications –Automatic surveillance of TV or Radio channels for possible copyright infringement –Search and buy music via Cell phones (Philips) – User identification and games ( 張智星 ) Combining speech and music recognition
Matching Problems in Different Levels Audio Signal –Hi-fidelity Recording Search by recording directly from CD (duplicate signal retrieval) –Noisy or low-quality recording Search by recordings from Radio, TV music with cell phones –Search the same tunes regardless of the same audio different pianos, singers （月亮代表我的心，陶 吉吉 ）, instruments (Canon, Pachelbel, piano vs violin), orchestras, directors Symbolic Level –See next page (focus of this work) Semantic Level –Search happy but andante songs –May need more (manually- or auto-derived) metadata
Major problems Vocabulary mismatch: the major failure in IR tasks –The terms used to search the documents do not occur in the desired documents (or mismatch the undesired) Key mismatch: melodies may be sung in any key Pitch error: incorrect perception of the pitch level Note insertion and deletion: imperfect recall Fragmentation, Consolidation, etc.: perception error or different interpretation Query formulation: melody, rhythm, chord, etc. Result “browsing”: music is for listening, not for reading
Previous Work (1/3) Acoustic feature extraction for similarity matching –loudness, pitch, brightness, bandwidth, harmonicity –all computed from music waveform –search method: mostly query by examples –acoustic features are not familiar to ordinary users –Ref: Pfeiffer, et al [‘96], Wold, et al[‘96], Foote [‘97] Hawley [‘90] developed a system: –allows MIDI keyboard input –matches tunes from their beginnings Ghias et al [‘95] developed a system: –allows queries from a microphone –converts into melodic contour of 3 symbols for key-invariance –approximate matching against 183 songs allowing note deletion, insertion, and transposition
Previous Work (2/3) McNab, et al [‘97]: New Zealand DL’s MELody inDex –also allows acoustic input by singing, humming, or playing –melody transcription, pitch encoding, approximate matching –allow matches by “interval and rhythm”,“contour and rhythm”, … –match of 9400 tunes by dynamic programming takes 21 seconds –http://www.cs.waikato.ac.nz/~nzdl/meldex David Huron [‘93] "Humdrum Toolkit” –hundreds of music tools including a DP matching tool “simil” –http://dactyl.som.ohio-state.edu/Humdrum/ Huron and Kornstaedt's "Themefinder" –allow searches by pitch, interval, contour, etc. –http://www.themefinder.org/ Typke's “Tune Server” –JavaApplet tune recognizer : convert WAV to Parsons' Code (UDR) –http://wwwipd.ira.uka.de/tuneserver/ (over 10000 tunes)
Previous Work (3/3) More recent update at scholar.google.com This work is based on our previous one in SIGIR 1999 with: –More retrieval models compared –Experiment re-done and re-confirmed Yuen-Hsien Tseng, " Content-Based Retrieval for Music Collections," Proceedings of the 22nd International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '99, Aug. 15-19, Berkeley, U.S.A., 1999, pp.176-182.
Strategies to Overcome the Major Problems Key-invariant encoding to allow queries in any key Error-tolerant retrieval model for random pitch errors and random note insertion and deletion Query suggestion for fragmentation and consolidation Flexible and friendly interface for query formulation and relevance feedback Key melodies extraction for improving retrieval and browsing
Key Melody Extraction Key melodies: –representative fragments of music –same as keywords that give quick overview of documents –memorable parts that people easily recall –meet most user queries –reduce response time due to less data –speed up the process of browsing and selection Assumption: key melodies are repeated patterns –repetition is one of the basic composing rules [Jones, 1974] We develop an overlapping repetition identification –so far no linear algorithm is found [ Crawford & Iliopoulos, 1998 ]
Key Melody Extraction: The Algorithm Step 1: Convert the input string into a LIST of 2-note sequence. Step 2: Merge adjacent m-notes into (m+1)-notes according to the merge and accept rules until no sequences to merge –Merge rule: For any two adjacent sequences K1 and K2, if both of their occurring frequencies are greater than a threshold, they can be merged. –Accept rule: If the occurring frequency of K1 is greater than a threshold and K1 did not merge with its neighbors, then accept K1 as a key melody Step 3: Filter and sort the results according to some criteria. Complexity : O(n m) if looking up occurring frequency is O(1) –n the length of the melody –m the length of the longest repeated patterns
Key Melody Extraction: An Example Assume input string: ” CCCDCCCECC", threshold=1,separator=X. Step 1: create a list of 2-note L =(CC:5, CC:5, CD:1, DC:1, CC:5, CC:5, CE:1, EC:1, CC:5) Step 2: merging After 1st iteration: merge L into L1=(CCC:2, X:0, CCC:2, X:0), drop : (CC:5, CC:5, CD:1, DC:1, CC:5, CC:5, CE:1, EC:1), FinalList : (CC:5) After 2nd iteration: merge L1 into L2 = (X:0), drop : (X:0, X:0), FinalList : (CC:5, CCC:2) After 3rd iteration: merge L2 into L3 = ( ), drop : (X:0),FinalList : (CC:5, CCC:2) Step 3 : filtering the item CC may be removed since it is a substring of CCC.
Key Melody Extraction: Results MIDI files are collected from the Internet Each MIDI contains several tracks Each track contains the staff information of an instrument Currently only melody from each track is extracted
Pitch Profile Encoding To allow queries in any key (pitch level) Use the following symbols: Pn, Mn, and R –R : denotes a repetition of the previous note –P : a ascending note compared to the note that precedes –M : a descending note –n : number of semitones ascending or descending Example: Happy Birthday –Standard Pitch Notation: C4 C4 D4 C4 F4 E4 –Simplified Notation: 1 1 2 1 4 3 – Pitch Profile Encoding : C4 R P2 M2 P5 M1 Error ratio is 2 -- a drawback : m random errors caused by pitch error or note insertion or deletion in the original string results in at most 2m errors in the encoding string
Retrieval Model (Vector Space) Vector space model: –Documents and queries are presented by vectors –Each element in a vector is determined by an indexing scheme –The value of each element is determined by a weight scheme The similarity between document d i and query q j : Using inverted file, the query evaluation can be computed in sub-linear time
Indexing and Weight Schemes N-gram (n-note) indexing scheme: –N-gram is often referred to any n consecutive characters –N-gram method is used in note level (thus, n-note indexing) –In our n-note approach, all m-notes, where 1<= m <= n, are all indexed. Weighting scheme: –The shorter m-notes will match the input query in the presence of random errors, while the longer m-notes will favor the ones with more accurate melodies. –The weight of each m-note in queries is given by tf * ( 2(m-1) + 1 ) –The weight of each m-note in documents takes the value of 1 or 0, indicating its presence in the documents or not.
Retrieval Model: Basic Dynamic Programming DP algorithm: used for comparison ( from Sankoff & Kruskal, 1983 ) d[i, j] = min( d[i-1, j] + w(Q[i], 0), d[i-1, j-1] + w(Q[i], D[j]), d[i, j-1] + w(0, D[j]) ) –d[i, j] : accumulated distance between sequence Q and D –w(Q[i], D[j]) : cost of substituting Q[i]with D[j] –w(Q[i], 0) : cost of inserting Q[i] –w(0, D[j]) : cost of deleting D[j] –d[0, 0] = 0 –d[i, 0] = d[i-1, 0] + w(Q[i], 0), 1<= i <= n –d[0, j] = d[0, j-1] + w(0, D[j]), 1<= j <= m Complexity: O(n * m), n : length of Q, m : length of D ( i, j ) (i-1, j) (i, j-1) (i-1,j-1)
Dynamic Programming: Examples Assume costs of substitution, insertion, deletion are all 1 Fail if query string is not in the beginning. Initial conditions
Sliding-Window Dynamic Programming Windowed DP –Match only document segments of the same length –Same-length match slides through the document –Complexity: O(n 2 * m), n : length of Q, m : length of D
Modified Dynamic Programming Change the initial condition –d[0, 0] = 0 –d[i, 0] = d[i-1, 0] + w(A[i], 0), 1<= i <= n –d[0, j] = d[0, j-1] + w(0, B[j]), 1<= j <= m =>d[0, j] = 0, 1<= j <= m Initial conditions
Experiment (1999, 2000) Eight retrieval modes are compared:
Methodology Subjects: –8 subjects are invited –3 of them had a few years of training on piano –1 had a year of training on guitar –4 had only minimum musical training in school education Procedures: –Each subject is presented a title list of 30 classic music and 135 Chinese pop music –Queries are made based on their recall of the songs chosen –Top 100 reuslts are examined. If fails, cause of mismatch is advised and queries are modified accordingly by subjects –Before formal experiments, subjects are allowed to practice the procedure with 2 songs
Results from the experiments Key melody search is better than direct melody search Longer n is better for n-note indexing –longer n yields more index terms for the same collection Exact pitch encoding is better –only for the same n-note indexing –only suitable for skilled users : only 3 queries are made in correct key at the first time Dynamic programming is slightly better, but requires a large amount of evaluation time –In average, 89 seconds for 444 classic key melodies, and 536 seconds for 203 classic melodies –For n-note indexing with inverted files, only 1 or 2 seconds
Conclusions Key-invariant encoding allows queries in any key Random pitch errors, note insertion, and note deletion can be overcome by carefully designed indexing, weighting, and retrieval schemes without resorting to much time-consuming DP matching Burst errors due to fragmentation or other music factors require query suggestion –neither DP nor our approach performs well for this case –5 out of 13 cases in pop music require query reformulation due to fragmentation
Future Work More types of mismatches to analyze for automatic query candidate suggestion Effective melody extraction for polyphonic melodies –monophonic melody is assumed in this work, -- no notes occurs simultaneously –more difficulty to form a query that has simultaneous notes -- more discrepancy between queries and polyphonic melodies –Uitdenbogerd, et al,  discussed effective algorithms for extracting monophonic melodies from polyphonic ones Automatic key melody extraction allowing variations –repeated melodies with small variations enrich the melodies –Bakhmutova, et al,  has presented a semi-automatic procedure for revealing similar melodies
Related Researches and Projects ISMIR since 2000 –International Symposium on Music Information Retrieval WOCMAT since 2005 –Workshop On Computer Music and Audio Technology Digital archive application –Data mining in digital music archive Free music audio, sound processing tools and music- related visualization and mining tools –http://www.music-ir.org/evaluation/tools.html Music IR evaluation since 2005 –http://www.music-ir.org/mirexwiki/index.php/Main_Page –Test collection: music documents, query sets, and judgment –Major hurdle: copyright issue
Research Opportunities Apply language model in text retrieval to MIR –May need music composition theory Music mining –Key feature extraction –summarization –Clustering (detecting similar inter- or intra-patterns) –Classification (genre, emotion) –Annotation –Aggregated Analysis Applications –Games, entertainment, education, culture renovation, and researches –Innovative, multimedia, multimodal, and interdisciplinary
Your consent to our cookies if you continue to use this website.