Published by Victor Bourns. Modified over 9 years ago.
1
Singer Identification and Clustering of Popular Music Recordings
Wei-Ho Tsai
wesley@iis.sinica.edu.tw
Institute of Information Science, Academia Sinica
2
Extracting Information From Music
Music Information Retrieval (MIR)
– To develop ways of managing collections of musical material for preservation, access, research, and other uses.
[Figure: MIR communities & research areas, after Futrelle & Downie, 2002]
3
Extracting Voice Information From Music
Viewing MIR from a speech-processing perspective
4
Singer Recognition Tasks (I)
Singer Identification
– Determining who is singing
5
Singer Recognition Tasks (II)
Singer Detection
– Determining whether or not a specified singer is present in a music recording
6
Singer Recognition Tasks (III)
Singer Tracking
– Locating where a specified singer is present in a music recording
7
Singer Recognition Tasks (IV)
Singer Clustering
– Grouping the music recordings of the same singer into a cluster
8
Potential Applications
Indexing
– Finding cameos or guest appearances in live concert recordings.
– Identifying the singers in a movie's musical interludes.
Music recommendation systems
– Suggesting music by singers with similar voices.
Karaoke services
– Efficiently organizing the customer's recordings.
– Personalization.
Copyright protection
– Distinguishing between an original song and a cover-band version.
– Rapidly scanning suspect websites for piracy.
9
Singer's Vocal Characteristics
Humans use several levels of perceptual cues to distinguish among singers
10
Major Challenges In Singer Recognition
The vast majority of popular music contains background accompaniment during most or all vocal passages
– It is therefore infeasible to acquire isolated solo voice data for extracting the singer's vocal characteristics
The proposed solution: vocal segment detection followed by solo vocal signal modeling
11
Vocal/Non-vocal Segmentation
12
Gaussian Mixture Model (I)
Model description
– The distribution of the feature vector x is represented by a mixture of M component Gaussian densities, where the i-th Gaussian density has its own mean vector and covariance matrix
– A Gaussian mixture model (GMM) is characterized by the mixture weights, mean vectors, and covariance matrices of its M components
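The model description above matches the standard GMM form, written out here for reference. The symbols w_i, μ_i, Σ_i, λ, and the feature dimension D are conventional notation assumed for this sketch, since the slide's own equations are not preserved in this transcript.

```latex
\begin{align}
p(\mathbf{x} \mid \lambda) &= \sum_{i=1}^{M} w_i \, b_i(\mathbf{x}),
  \qquad \sum_{i=1}^{M} w_i = 1, \\
b_i(\mathbf{x}) &= \frac{1}{(2\pi)^{D/2} \, |\boldsymbol{\Sigma}_i|^{1/2}}
  \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^{\top}
  \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right), \\
\lambda &= \{ w_i, \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i \}_{i=1}^{M}.
\end{align}
```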
13
Gaussian Mixture Model (II)
Parameter estimation
– Using the EM algorithm, an initial model is created, and a new model is then estimated by maximizing the auxiliary function
– Setting the derivative of the auxiliary function to zero with respect to each parameter to be re-estimated yields the re-estimation formulas
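The EM re-estimation described above can be sketched for the simplest case, a one-dimensional GMM. This is a minimal illustration under stated assumptions, not the presentation's implementation: the function names, the scalar features, and the `var_floor` safeguard are all hypothetical choices made for the example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of the data under the 1-D GMM."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )

def em_step(data, weights, means, variances, var_floor=1e-6):
    """One EM iteration; returns the updated (weights, means, variances).

    Assumes every sample has nonzero likelihood under the current model
    and every component receives some posterior mass.
    """
    n, m = len(data), len(weights)
    # E-step: posterior probability of each component given each sample.
    post = []
    for x in data:
        joint = [weights[i] * gaussian_pdf(x, means[i], variances[i])
                 for i in range(m)]
        total = sum(joint)
        post.append([j / total for j in joint])
    # M-step: closed-form updates obtained by zeroing the derivative
    # of the auxiliary function with respect to each parameter.
    new_w, new_mu, new_var = [], [], []
    for i in range(m):
        occ = sum(post[t][i] for t in range(n))
        mu = sum(post[t][i] * data[t] for t in range(n)) / occ
        var = sum(post[t][i] * (data[t] - mu) ** 2 for t in range(n)) / occ
        new_w.append(occ / n)
        new_mu.append(mu)
        new_var.append(max(var, var_floor))  # hypothetical variance floor
    return new_w, new_mu, new_var
```

Calling `em_step` repeatedly from an initial model should not decrease `log_likelihood`, which is the property that makes EM usable for training both the vocal and non-vocal GMMs.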
14
Distilling Singers' Voices From Music
Substantial similarities exist between the instrumental regions and the accompaniment of the vocal signal
The solo voice can be modeled by suppressing the background music, which is estimated from the instrumental regions
15
Solo Vocal Signal Modeling (I)
Model description
– The background music model, denoted b, can be approximately estimated from the instrumental regions of the music
– The aim is to find the solo voice model, denoted s, that is optimal in the maximum-likelihood sense
16
Solo Vocal Signal Modeling (II)
Parameter estimation
– An auxiliary function is defined for the solo voice model
– Setting its derivative to zero with respect to each parameter to be re-estimated yields the re-estimation formulas
17
Solo Vocal Signal Modeling (III)
Re-estimation formulas for linear spectral features
– Suppose V is a linear spectral feature and S and B are additive in the time domain; then v_t = s_t + b_t
– The density of v_t is the convolution of the solo and background music densities
– The quantities needed for re-estimation can then be shown to take an explicit form
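To make the convolution statement concrete: if S and B are assumed independent and each is modeled by a GMM, the density of v_t = s_t + b_t is again a Gaussian mixture with one component per pair (i, j). The notation below (weights w_{s,i} and w_{b,j}, means μ, covariances Σ) is assumed for illustration. The result is a general property of sums of independent Gaussian variables, offered as a sketch and not as the slide's exact formula.

```latex
\begin{align}
p(\mathbf{v}) &= \int p_s(\mathbf{s}) \, p_b(\mathbf{v} - \mathbf{s}) \, d\mathbf{s} \\
&= \sum_{i} \sum_{j} w_{s,i} \, w_{b,j} \,
   \mathcal{N}\!\left( \mathbf{v};\;
   \boldsymbol{\mu}_{s,i} + \boldsymbol{\mu}_{b,j},\;
   \boldsymbol{\Sigma}_{s,i} + \boldsymbol{\Sigma}_{b,j} \right).
\end{align}
```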
18
Solo Vocal Signal Modeling (IV)
Re-estimation formulas for cepstral features
– Suppose V is a cepstral feature and S and B are additive in the time domain; then v_t = log[exp(s_t) + exp(b_t)], which is approximated by v_t ≈ max(s_t, b_t)
– The re-estimation formulas can then be derived under this approximation
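The quality of the max approximation above can be checked numerically for scalar values. The sketch below compares the exact log-sum with the approximation; the function names are hypothetical, and scalars stand in for a single feature dimension.

```python
import math

def log_add(s, b):
    """Exact log[exp(s) + exp(b)], computed in a numerically stable way."""
    hi, lo = max(s, b), min(s, b)
    return hi + math.log1p(math.exp(lo - hi))

def max_approx(s, b):
    """The approximation used on the slide: keep only the larger term."""
    return max(s, b)

def approx_error(s, b):
    """Gap between the exact value and the approximation.

    Equals log(1 + exp(-|s - b|)), so it lies in (0, log 2] and shrinks
    quickly as one source dominates the other.
    """
    return log_add(s, b) - max_approx(s, b)
```

The error is largest, log 2, when the two sources are equally strong, and it falls below 0.01 once they differ by about 5 in the log domain.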
19
Singer Identification (SID)
Block diagram
20
SID Experiments
Music data
– 200 tracks from Mandarin pop music CDs
– 10 female & 10 male singers
– 5 tracks/singer for training; 5 tracks/singer for testing
– 20 min of instrumental-only data for training the non-vocal GMM
– 22.05 kHz sampling rate (down-sampled from 44.1 kHz)
Vocal/Non-vocal segmentation
– 82.3% frame accuracy
21
Singer Clustering (I)
Block diagram
22
Singer Clustering (II)
An example of the characteristic vectors
23
Singer Clustering (III)
Determining the number of clusters
– Bayesian Information Criterion (BIC): measures how well a model fits a data set and how simple the model is
– The BIC is computed for each candidate K-clustering
– A reasonable number of clusters is determined by selecting the K with the best BIC value
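The selection procedure above can be sketched with the common form of the criterion, BIC = log-likelihood minus a complexity penalty of 0.5 × (number of free parameters) × log(number of samples). The slide's own formula is not preserved in this transcript, so this form, the optional `penalty_weight`, and the function names are assumptions for illustration.

```python
import math

def bic(log_likelihood, n_params, n_samples, penalty_weight=1.0):
    """BIC in its 'larger is better' form (an assumed convention).

    The first term rewards fit; the second penalizes model complexity.
    """
    return log_likelihood - 0.5 * penalty_weight * n_params * math.log(n_samples)

def select_num_clusters(candidates, n_samples, penalty_weight=1.0):
    """Pick the K with the highest BIC.

    candidates: iterable of (K, log_likelihood, n_params) tuples, one per
    candidate clustering. How the log-likelihood of a K-clustering is
    obtained is left to the caller.
    """
    scored = [(bic(ll, p, n_samples, penalty_weight), k)
              for k, ll, p in candidates]
    return max(scored)[1]

# Purely illustrative numbers, not results from the presentation.
toy_candidates = [(1, -500.0, 2), (2, -300.0, 4), (3, -299.0, 6)]
best_k = select_num_clusters(toy_candidates, n_samples=200)
```

In the toy example the jump from K = 1 to K = 2 improves the fit far more than the penalty costs, while K = 3 gains almost nothing, so K = 2 is selected.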
24
Singer Clustering Experiments (I)
Music data
– 200 tracks (20 singers; 10 tracks/singer)
Assessment method
– Cluster purity: defined for each cluster k in terms of n_k, the total number of recordings in cluster k, and n_kp, the number of recordings in cluster k that were performed by singer p
– Average purity: defined over all clusters, where M is the total number of recordings and K the number of clusters
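The purity formulas themselves are not preserved in this transcript. The sketch below uses one definition that is consistent with the symbols listed above, and it should be read as an assumption: cluster purity ρ_k = Σ_p n_kp² / n_k², and average purity = (1/M) Σ_k n_k ρ_k. The symbol ρ and the function names are hypothetical.

```python
from collections import Counter

def cluster_purity(singer_labels):
    """Purity of one cluster, given the singer label of each recording in it.

    Assumed definition: sum over singers p of n_kp^2, divided by n_k^2.
    It equals 1.0 when all recordings in the cluster share one singer.
    The cluster must be non-empty.
    """
    n_k = len(singer_labels)
    counts = Counter(singer_labels)  # n_kp for each singer p
    return sum(c * c for c in counts.values()) / (n_k * n_k)

def average_purity(clusters):
    """Purity averaged over clusters, weighted by cluster size n_k.

    clusters: list of non-empty lists of singer labels, one per cluster.
    The total number of recordings M is the sum of the cluster sizes.
    """
    m_total = sum(len(c) for c in clusters)
    return sum(len(c) * cluster_purity(c) for c in clusters) / m_total

# Toy example: one pure cluster and one mixed cluster.
toy_clusters = [["a", "a", "a"], ["b", "b", "a", "c"]]
toy_average = average_purity(toy_clusters)
```

Here the first cluster has purity 1.0 and the second (4 + 1 + 1) / 16 = 0.375, so the size-weighted average is (3 × 1.0 + 4 × 0.375) / 7 ≈ 0.643.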
25
Singer Clustering Experiments (II)
Results
26
Summary
We have
– Separated vocal from non-vocal segments of music;
– Isolated singers' vocal characteristics from the background music;
– Distinguished singers from one another.
We will
– Handle a wider variety of music data, including duets, trios, choruses, background vocals, and music with multiple simultaneous or non-simultaneous singers;
– Deal with other problems of voice information retrieval from music, such as lyric transcription and singing language recognition.
27
To Probe Further (I)
Selected references
– Music information retrieval
A. L. Uitdenbogerd, “Music IR: past, present, and future,” Proceedings of International Symposium on Music Information Retrieval, 2000.
J. Futrelle and J. S. Downie, “Interdisciplinary communities and research issues in music information retrieval,” Proceedings of International Conference on Music Information Retrieval, pp. 215–221, 2002.
– Artist recognition
B. Whitman, G. Flake, and S. Lawrence, “Artist detection in music with Minnowmatch,” Proceedings of IEEE Workshop on Neural Networks for Signal Processing, 2001.
A. Berenzweig, D. P. W. Ellis, and S. Lawrence, “Using voice segments to improve artist classification of music,” Proceedings of International Conference on Virtual, Synthetic and Entertainment Audio, 2002.
– Singer identification
Y. E. Kim and B. Whitman, “Singer identification in popular music recordings using voice coding features,” Proceedings of International Conference on Music Information Retrieval, pp. 164–169, 2002.
C. C. Liu and C. S. Huang, “A singer identification technique for content-based classification of MP3 music objects,” Proceedings of International Conference on Information and Knowledge Management, pp. 438–445, 2002.
T. Zhang, “Automatic singer identification,” Proceedings of International Conference on Multimedia and Expo, 2003.
W. H. Tsai, H. M. Wang, and D. Rodgers, “Automatic singer identification of popular music recordings via estimation and modeling of solo vocal signal,” Proceedings of European Conference on Speech Communication and Technology, 2003.
– Singer clustering
W. H. Tsai, H. M. Wang, D. Rodgers, S. S. Cheng, and H. M. Yu, “Blind clustering of popular music recordings based on singer voice characteristics,” to appear in Proceedings of International Conference on Music Information Retrieval, 2003.
28
To Probe Further (II)
General resources
– Important conferences
International Conference on Music Information Retrieval
International Computer Music Conference
IEEE International Conference on Multimedia and Expo
ACM International Multimedia Conference
International Conference on New Interfaces for Musical Expression
– Organizations
International Computer Music Association (http://www.computermusic.org/)
The Australasian Computer Music Association (http://www.acma.asn.au/)
ACM Multimedia (http://www.acm.org/sigmm/)
Acoustical Society of America (http://asa.aip.org/)
– Journals
Computer Music Journal (http://www-mitpress.mit.edu/catalog/item/default.asp?ttype=4&tid=15)
Journal of New Music Research (http://www.swets.nl/jnmr/jnmr.html)
Computing in Musicology (http://www.ccarh.org/publications/books/cm/)
– Useful links
http://www.leighsmith.com/Browsers/Cmusic.html
http://www2.siba.fi/Kulttuuripalvelut/computers.html