Audio data extraction and Visualization

OR: How I stopped worrying and learned to love the digital
Joshua Neumann
University of Florida
Twitter: joshua_neumann5

What IS Music Data?
Audio signal processing is a form of music information retrieval (MIR) for extracting data from sounded (rather than notated) music
Music information retrieval for notated music (scores) is predicated on Optical Music Recognition
  Single Interface for Music Score Searching and Analysis (SIMSSA) at McGill
  Center for Computer-Assisted Research in the Humanities (CCARH) at Stanford
Audio signal processing is based on feature extraction from an audio file

What are Features?
Features are the individual components of a sounded audio file
Basic components of an audio file are beats, pitches, volume, and sound quality
How these combine, in isolated moments and over time, forms the elements of the expressive palette:
  Tempo
  Dynamics
  Articulations
  Pitch relationships
  Phrasing
  Breathing
  Timbre
  Vocal diction (for texted music)

Why extract features?
Feature extraction is deconstructivist by nature
Feature extraction results in data that describes aspects of musical sounds
Data tends to follow one of two main paths:
  Visualization and Analysis
    Centre for the History and Analysis of Recorded Music (CHARM)
    Research Centre for Musical Performance as Creative Practice
    Cambridge Centre for Musical Performance Studies
    Performance Studies Network
  Machine Learning: serves as “ground truth” or “control group” data
    Music Information Retrieval Evaluation eXchange (MIREX, hosted at U Illinois)

How to extract features?
Humdrum: open source, thorough, but requires coding ability
Transcribe!: less thorough, easy interface, pay for usage
Sonic Visualiser: open source, thorough, easy interface, frequently updated
  Upload file
  Select analysis type
  Allow machine to identify features, or enter them manually
Machine extraction works better with popular music, as its forms, harmonies, melodies, and performances tend to be simpler and more regulated than those of Art Music
Human entry of information gives control to the user
Machine extraction is accurate about 70-80% of the time, which is about the same rate at which two random human analysts agree

Sonic Visualiser Demo
Tempo
Spectrogram
Melodic Range Spectrogram
Hz extraction
Polyphonic or note transcription
Chordino
  Harmonic feature extraction
  Chord estimation
Dynamics

Opera Feature Extraction
Try to determine where the beats fall in this example: do they line up with where Sonic Visualiser wants to place them?
Human data creation/curation is useful for machine learning (MIREX)
Issues: recording copyright often prevents
  How to work around? Tempo tap tracks
Technology is not advanced enough yet to automate the data capture process
Now you have data, so what do you do with it?
Go to Sonic Visualiser and demo tempo extraction (marked vs. unmarked files) and computer estimation vs. human entry
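One way to put a number on whether machine-placed beats line up with human taps is to count how many tapped beats have an estimated beat close by. The sketch below is only an illustration: the function name, the 70 ms tolerance, and the beat times are assumptions made for this example, not values taken from Sonic Visualiser or from the study.

```python
from bisect import bisect_left

def matched_fraction(human_beats, machine_beats, tolerance=0.07):
    """Fraction of human beat times (seconds) that have a machine beat within tolerance."""
    machine = sorted(machine_beats)
    if not human_beats or not machine:
        return 0.0
    hits = 0
    for t in human_beats:
        i = bisect_left(machine, t)
        # Only the machine beats just before and just after t can be the nearest.
        neighbours = machine[max(i - 1, 0):i + 1]
        if any(abs(t - m) <= tolerance for m in neighbours):
            hits += 1
    return hits / len(human_beats)

# Hypothetical beat times in seconds (not real data).
human = [0.50, 1.40, 2.55, 3.30, 4.60]
machine = [0.52, 1.50, 2.50, 3.50, 4.58]
print(matched_fraction(human, machine))
```

A stricter comparison would also forbid one machine beat from matching two different taps; this version keeps the matching simple.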

Data Visualization
Centre for the History and Analysis of Recorded Music (CHARM) developed Scape Plots as part of their “Mazurka Project”
Scape Plots use color-coding to convey information about feature relationships across a musical performance
  Dyna-Scapes reveal the relationships of dynamics within a performance
  Time Scapes reveal the relationships of tempo across a performance
Scape Plots also reveal the hierarchical structure of features in a performance
A performer’s use of a feature through time contributes to the identity of his or her performance, much like a fingerprint
Time Scape color coding is a scale based on the order of the light/color spectrum (ROYGBIV)
  Green represents the average, or ‘global’, tempo
  Yellow, Orange, and Red are progressively faster than average (mildly, moderately, significantly)
  Blue, Indigo, and Violet are progressively slower than average (mildly, moderately, significantly)
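The color scale and the hierarchical layout can be sketched together. This is not the CHARM implementation: it assumes each cell of the plot compares the mean tempo of one window of consecutive beats with the global mean, and the cut-offs for mild, moderate, and significant deviation are hypothetical.

```python
def classify(ratio, mild=0.05, moderate=0.15, strong=0.30):
    """Map a local/global tempo ratio to a ROYGBIV colour name.

    The three thresholds are hypothetical, chosen only for illustration.
    """
    deviation = ratio - 1.0
    size = abs(deviation)
    if size < mild:
        return "green"  # close to the global tempo
    level = 0 if size < moderate else 1 if size < strong else 2
    faster = ["yellow", "orange", "red"]
    slower = ["blue", "indigo", "violet"]
    return faster[level] if deviation > 0 else slower[level]

def time_scape(tempi):
    """Return rows of colours; row k holds every window of k + 1 consecutive values."""
    global_mean = sum(tempi) / len(tempi)
    rows = []
    for width in range(1, len(tempi) + 1):
        row = []
        for start in range(len(tempi) - width + 1):
            window = tempi[start:start + width]
            local_mean = sum(window) / width
            row.append(classify(local_mean / global_mean))
        rows.append(row)
    return rows

tempi = [60, 66, 72, 90, 54, 48]  # hypothetical beat-level tempi in BPM
# Print the apex (whole performance) first and the beat-level row last.
for row in reversed(time_scape(tempi)):
    print(" ".join(row))
```

The single cell at the apex covers the whole performance, so it is always green; the widest row at the bottom shows beat-by-beat detail.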

Time Scapes for Franco Corelli and Marcello Giordani
Time Scapes describe, in a tangible way and in an instant, the musical events of a given performance.
Structural crossings (blue) are temporal-textual links, which reveal non-linear textual connections that are at best difficult to grasp, regardless of the number of times someone hears a performance.
Links emerging from temporal-textual emphases can be an indicator of a performer’s dramatic interpretation and understanding of a given aria.
Nessun dorma comes in Act III, after an exchange of riddles to win the Princess’s hand in marriage, her devastation at Calaf’s success, and his counter-riddle: if she learns his name by dawn, he will surrender his life, freeing her from marrying him. The Princess has decreed that no one in Peking will sleep, on pain of death, until the Prince’s name is discovered.

Data Visualization
Franco Corelli, 04 March 1961
Marcello Giordani, 07 November 2009
Quick overview of structure of graph
Nessun dorma! (1-12)
Tu pure, o Principessa…e di speranza! (13-36)
Ma il mio mistero…splenderà! (37-64)
Ed il mio bacio…fa mia (65-82)
(coro) Il nome suo nessun saprà…(Calaf) Tramontate stelle! (83-106)
All’alba vincerò! ( )
Vincerò! ( )
Vincerò! ( )
Coda ( )

Corelli and Giordani Time Scapes, Cont.
Corelli’s textual-temporal emphases from 04 March 1961 suggest his primary concern is to assure the Princess, and perhaps himself, that he will be victorious at dawn.
For Marcello Giordani, the textual-temporal emphasis over the choral interjection suggests that he is as concerned for the crowd as he is for himself and what is at stake at this point in the opera.
Giordani confirmed this assessment in an interview, stating that “here, Calaf is love and loves everyone, so he is concerned for the Princess and the Crowd.”

Performance Network Analysis
Performances of the same musical work exist in natural relationship with each other
There are many ways of measuring these relationships
Tradition: accounts for behavioral patterns that already exist in a corpus; allows new patterns or sets of patterns to enter the corpus; and reflects the change in the overall corpus that results from new admissions
Statistical Mean: accounts for a corpus of numbers; allows new numbers to enter the corpus; and reflects the change resulting from new admissions
THEREFORE: Statistical Mean (quantitative) APPROXIMATES Performance Tradition (qualitative)
Statistical Correlation (Pearson) measures the strength of similarity between data sets (of performances and embodiments of the tradition)
Network Construction facilitates visual arrangement of these entities
Analyzing the network can reveal characteristics that are either unprecedented or significantly more time-consuming to discover with purely qualitative analysis
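The mean-as-tradition idea and the Pearson measure can be sketched in a few lines. This is a minimal illustration under stated assumptions: "tradition" is taken to be the beat-by-beat mean of every performance admitted so far, and all tempo values and function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def tradition_so_far(performances):
    """Beat-by-beat mean of all performances admitted to the corpus so far."""
    return [sum(values) / len(values) for values in zip(*performances)]

# Hypothetical tempo streams (BPM), in chronological order.
corpus = [
    [60, 64, 70, 58],
    [62, 66, 68, 55],
]
newcomer = [50, 72, 75, 60]

before = tradition_so_far(corpus)
after = tradition_so_far(corpus + [newcomer])
print("tradition before:", before)
print("tradition after: ", after)
print("newcomer vs. prior tradition:", round(pearson(newcomer, before), 3))
```

Admitting the newcomer shifts every value of the mean, which is the sense in which the mean reflects change resulting from new admissions.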

Turandot at the Metropolitan Opera
Data sets for Turandot at the Metropolitan Opera
Six excerpts of varying length; all feature primarily solo singing
Act I
  “Signore, ascolta!”: 75 inter-beat tempo relationships (IBTR) (76 beats)
  “Non piangere, Liù”: 91 IBTR (92 beats)
Act II
  “In questa Reggia”: 199 IBTR (200 beats)
  “Straniero, ascolta!”: 575 IBTR (576 beats)
Act III
  “Nessun dorma”: 130 IBTR (131 beats)
  “Tu che di gel sei cinta”: 92 IBTR (93 beats)
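Each excerpt has one fewer IBTR than beats because every value sits between two consecutive beat markers. The sketch below assumes that an IBTR is simply the tempo implied by the interval between two neighboring beats; the slides do not define the calculation, and the beat times are hypothetical.

```python
def inter_beat_tempi(beat_times):
    """Tempo in BPM implied by each pair of consecutive beat times (seconds)."""
    return [60.0 / (later - earlier)
            for earlier, later in zip(beat_times, beat_times[1:])]

beats = [0.0, 1.0, 1.8, 2.4, 3.4]  # hypothetical beat markers in seconds
tempi = inter_beat_tempi(beats)
print(len(beats), "beats give", len(tempi), "inter-beat values")
print([round(t, 1) for t in tempi])
```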

Data sets for Turandot at the Metropolitan Opera
Each excerpt appears in nineteen recordings
Each excerpt also has eighteen arithmetical representations of “tradition”
Overwhelming amount of IBTR data in this small study
  “Signore, ascolta!”: 2775
  “Non piangere, Liù”: 3367
  “In questa Reggia”: 7363
  “Straniero, ascolta!”: 21275
  “Nessun dorma”: 4810
  “Tu che di gel sei cinta”: 3404
42994 unique IBTR data points
43216 beat marker data points (86210 total)
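The totals above follow from multiplying each excerpt's IBTR count by the 37 data streams (19 recordings plus 18 tradition streams). A quick check, using the per-excerpt counts given for the six excerpts; the variable names are only illustrative.

```python
# IBTR counts per excerpt; each excerpt has one more beat than IBTR.
ibtr_per_excerpt = {
    "Signore, ascolta!": 75,
    "Non piangere, Liù": 91,
    "In questa Reggia": 199,
    "Straniero, ascolta!": 575,
    "Nessun dorma": 130,
    "Tu che di gel sei cinta": 92,
}
streams = 19 + 18  # recordings plus arithmetical representations of tradition

ibtr_points = {name: n * streams for name, n in ibtr_per_excerpt.items()}
total_ibtr = sum(ibtr_points.values())
total_beats = sum((n + 1) * streams for n in ibtr_per_excerpt.values())

for name, points in ibtr_points.items():
    print(f"{name}: {points}")
print("IBTR data points:", total_ibtr)
print("Beat marker data points:", total_beats)
print("Total:", total_ibtr + total_beats)
```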

Data sets for Turandot at the Metropolitan Opera
Correlation (Pearson) reduces the amount of data to 1332 points per excerpt, measuring the average strength of relationship between each pair of data streams within an excerpt
Results in 7992 correlation values for the Turandot study
Still prohibitively difficult to ascertain relationships between and among performances and tradition
One alternative is to view the thirty-seven scape plots for each excerpt’s performances and tradition, but given the limitations of the human mind, this is also time-consuming and makes it difficult to develop an understanding of the whole and its constituent parts
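The figure of 1332 matches the number of ordered pairs among 37 streams (37 × 36), and six excerpts give 7992. The sketch below only counts pairs; the node names are placeholders. Because Pearson correlation is symmetric, each unordered pair appears twice in that count.

```python
from itertools import combinations, permutations

performances = [f"performance_{i}" for i in range(1, 20)]  # 19 recordings
traditions = [f"tradition_{i}" for i in range(1, 19)]      # 18 tradition streams
nodes = performances + traditions

ordered_pairs = list(permutations(nodes, 2))    # A->B and B->A counted separately
unordered_pairs = list(combinations(nodes, 2))  # each pair counted once

excerpts = 6
print(len(nodes), "streams per excerpt")
print(len(ordered_pairs), "ordered pairs per excerpt")
print(len(unordered_pairs), "unordered pairs per excerpt")
print(len(ordered_pairs) * excerpts, "ordered pairs across the study")
```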

CYTOSCAPE
Cytoscape is an open source software platform for visualizing complex networks and integrating these with any type of attribute data.
  Primary uses in molecular and systems biology, genomics, and proteomics
  Also useful for Social Science and General Complex Network Analysis
Requires CSV file (sometimes Excel files work, but not always)
Basic Relationship: a SOURCE INTERACTS with a TARGET
  Sources and Targets are nodes (each node can be a source or a target or both)
  Interactions are edges (Correlation values in my analysis)
Layout options reflect different attributes of a network’s structure
Go to Cytoscape and demo with Nessun dorma network
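A source-interaction-target table is straightforward to produce as CSV. The sketch below is one possible layout, not the file used in the study: the column names, the "correlates_with" label, the node names, and the correlation values are all assumptions for illustration. Cytoscape asks which column plays which role when a table is imported, so the header names themselves are free to choose.

```python
import csv
import io

# Hypothetical correlation values between named streams (not real results).
edges = [
    ("Corelli_1961", "Giordani_2009", 0.62),
    ("Corelli_1961", "Tradition_01", 0.81),
    ("Giordani_2009", "Tradition_18", 0.77),
]

def edges_to_csv(edge_list):
    """Serialise (source, target, weight) tuples as a source/interaction/target table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "interaction", "target", "correlation"])
    for source, target, weight in edge_list:
        writer.writerow([source, "correlates_with", target, f"{weight:.2f}"])
    return buffer.getvalue()

text = edges_to_csv(edges)
print(text)
# To load the table in Cytoscape, save the text to a file first, for example:
# with open("nessun_dorma_network.csv", "w", newline="") as f:
#     f.write(text)
```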

joshuaoneumann@ufl.edu
Result of mapping all 1332 correlation values for 19 performances and 18 instances of tradition for Calaf’s Act I aria, “Non piangere, Liù,” at once. Color coding reflects the strength of correlation. It’s pretty, but not terribly practical.

Ordered correlation network
Limiting the number of relationships under consideration clarifies the kinds and strengths of relationships in a network
First-order correlation networks consider only the strongest relationship from each node to another
Second-order correlation networks consider the two strongest relationships from each node, and so on
Go to Cytoscape
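The filtering step can be sketched as keeping, for every node, only its top few correlations. This is an illustration under assumptions: "strongest" is read as the highest correlation value (not the largest absolute value), and the node names and values are hypothetical.

```python
def ordered_network(correlations, order=1):
    """Keep, for every node, only its `order` strongest correlations.

    `correlations` maps (source, target) pairs to correlation values and is
    expected to contain both directions of every pair.
    """
    by_source = {}
    for (source, target), r in correlations.items():
        by_source.setdefault(source, []).append((r, target))
    kept = []
    for source, links in by_source.items():
        # Sort descending by correlation and keep the first `order` links.
        for r, target in sorted(links, reverse=True)[:order]:
            kept.append((source, target, r))
    return kept

# Hypothetical symmetric correlations among four streams.
pairs = {("A", "B"): 0.9, ("A", "C"): 0.4, ("A", "D"): 0.2,
         ("B", "C"): 0.7, ("B", "D"): 0.3, ("C", "D"): 0.8}
correlations = {**pairs, **{(t, s): r for (s, t), r in pairs.items()}}

first = ordered_network(correlations, order=1)
second = ordered_network(correlations, order=2)
for source, target, r in first:
    print(source, "->", target, r)
```

In this toy data the first-order network splits into two pairs (A with B, C with D), the kind of grouping that a full network of every correlation tends to hide.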

GROUPINGS
Groupings raise questions about “why” or “how”
Two clear groupings emerge for Turandot’s aria “In questa Reggia”
Comparison of all performance demographic information provides a clear picture of the most significant factor in these groupings: productions appear to have affected performance practices

Mapping Knowledge Structure
Preserving the “source-interaction-target” formula in data prep allows the mapping of knowledge structures
Linkable to semantic web ontologies
Useful for meta-analysis of a project or showing how one’s project fits into and changes a field of study

QUESTIONS?
Joshua Neumann
University of Florida
Twitter: joshua_neumann5
