Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Representation of Musical Information Donald Byrd School of Music Indiana University Updated 8 March 2006 Copyright © 2003-06, Donald Byrd.

Similar presentations

Presentation on theme: "1 Representation of Musical Information Donald Byrd School of Music Indiana University Updated 8 March 2006 Copyright © 2003-06, Donald Byrd."— Presentation transcript:

1 1 Representation of Musical Information Donald Byrd School of Music Indiana University Updated 8 March 2006 Copyright © , Donald Byrd

2 30 Jan Classification: Surgeon General’s Warning Classification (ordinary hierarchic) is dangerous –Almost everything in the real world is messy –Absolute correlations between characteristics are rare –Example: some mammals lay eggs; some are “naked” –Example: musical instruments (piano as percussion, etc.) Nearly always, all you can say is “an X has characteristic A, and usually also B, C, D…” Leads to: –People who know better claiming absolute correlations –Arguments among experts over which characteristic is most fundamental –Don changing his mind

3 rev. 3 Feb. 3 Dimensions of Music Representations (1) (After Wiggins et al (1993). A Framework for the Evaluation of Music Representation Systems.)

4 30 Jan Dimensions of Music Representations (2) Expressive completeness –How much of all possible music can the representation express? –Includes synthesized as well as acoustic sounds! –Waveform (=audio) is truly “complete” –Exception, sort of: conceptual music E.g., Tom Johnson: Celestial Music for Imaginary Trumpets (notes on 100 ledger lines), Cage: 4’ 33” (of silence), etc. Structural generality –How much of the structure in any piece of music can the representation express? –Music notation with repeat signs, etc. still expresses nowhere near all possible structure

5 30 Jan Representation vs. Encoding Representation: what information is conveyed? –More abstract (conceptual) –Basic = general type of info; specific = exact type Encoding: how is the information conveyed? –More concrete: in computer (“bits”)…or on paper (“atoms”)!) One representation can have many encodings –“Atoms” example: music notation in printed or Braille form –“Bits” example: any kind of text in ASCII vs. Unicode –“Bits” example: formatted text in HTML, RTF,.doc

6 27 Jan. 6 Basic Representations of Music & Audio Audio (e.g., CD, MP3): like speech Time-stamped Events (e.g., MIDI file): like unformatted text Music Notation: like text with complex formatting

7 27 Jan. 7 Basic Representations of Music & Audio AudioTime-stamped EventsMusic Notation Common examplesCD, MP3 fileStandard MIDI FileSheet music UnitSample Event Note, clef, lyric, etc. Explicit structurenone little (partial voicing much (complete information) voicing information) Avg. rel. storage Convert to left- easyOK job: easy Convert to right1 note: pretty easyOK job: fairly hard- other: hard or very hard Ideal formusic musicmusic bird/animal sounds sound effects speech

8 rev. 15 Feb. 8 Basic and Specific Representations vs. Encodings Basic and Specific Representations (above the line) Encodings (below the line)

9 30 Jan Selfridge-Field on Describing Musical Information Cf. Selfridge-Field, E. (1997). Describing Musical Information. What is Music Representation? (informal use of term!) –Codes in Common Use: solfegge (pitch only), CMN, etc. –“Representations” for Computer Application: “total”, MIDI Parameters of Musical Information –Contexts: sound, notation/graphical, analytic, semantic; gestural? –Concentrates on 1st three Processing Order: horizontal or vertical priority Code Categories –Sound Related Codes: MIDI and other –Music Notation Codes: DARMS, SCORE, Notelist, Braille!?, etc. –Musical Data for Analysis: Plaine and Easie, Kern, MuseData, etc. –Representations of Musical Patterns and Process –Interchange Codes: SMDL, NIFF, etc.; almost obsolete!

10 10 Review: The Four Parameters of Notes Four basic parameters of a definite-pitched musical note 1. pitch: how high or low the sound is: perceptual analog of frequency 2. duration: how long the note lasts 3. loudness: perceptual analog of amplitude 4. timbre or tone quality Above is decreasing order of importance for most Western music …and decreasing order of explicitness in CMN!

11 11 Review: How to Read Music Without Really Trying CMN shows at least six aspects of music: –NP1. Pitches (how high or low): on vertical axis –NP2. Durations (how long): indicated by note/rest shapes –NP3. Loudness: indicated by signs like p, mf, etc. –NP4. Timbre (tone quality): indicated with words like “violin”, “pizzicato”, etc. –Start times: on horizontal axis –Voicing: mostly indicated by staff; in complex cases also shown by stem direction, beams, etc. See “ Essentials of Music Reading” musical example.

12 10 Feb. 12 Complex Notation (Selfridge-Field’s Fig. 1-4) Complications on staff 2: Editorial additions (small notes) Instruments sharing notes only some of the time Mixed durations in double stops Multiple voices (divisi notation) Rapidly gets worse with more than 2!

13 rev. 12 Feb. 13 Complex Notation (Selfridge-Field’s Fig. 1-4) Multiple voices rapidly gets worse with more than 2 2 voices in mm. 5-6: not bad: stem direction is enough 3 voices in m. 7: notes must move sideways 4 voices in m. 8: almost unreadable—without color! Acceptable because exact voice is rarely important

14 1 Feb Domains of Musical Information Independent graphic and performance info common –Cadenzas (classical), swing (jazz), rubato passages (all music) CMN “counterexamples” show importance of independent graphic and logical info –Debussy: bass clef below the staff –Chopin: noteheads are normal 16ths in one voice, triplets in another Mockingbird (early 1980’s) pioneered three domains: –Logical: “ note is a qtr note” (= ESF(Selfridge-Field)’s “notation”) –Performance: “ note sounds for 456/480ths of a quarter” (= ESF’s “sound”; also called gestural) –Graphic: “ notehead is diamond shaped” (= ESF’s “ notation”) –Nightingale and other programs followed SMDL added fourth domain –Analytic: for Roman numerals, Schenkerian level, etc. (= ESF’s “analytic”)

15 10 Feb. 15 Different Classifications of Music Encodings

16 16 Mozart: Variations for piano, K. 265, on “Ah, vous dirais-je, Maman”, a.k.a. Twinkle

17 27 Jan. 17 Representation Example: a Bit of Mozart The first few measures of Variation 8 of the “Twinkle” Variations

18 27 Jan. 18 In Notation Form: Nightingale Notelist %Notelist-V2 file='MozartRepresentationEx' partstaves=2 0 startmeas=193 C stf=1 type=3 C stf=2 type=10 K stf=1 KS=3 b K stf=2 KS=3 b T stf=1 num=2 denom=4 T stf=2 num=2 denom=4 A v=1 npt=1 stf=1 S1 'Variation 8' D stf=1 dType=5 N t=0 v=1 npt=1 stf=1 dur=5 dots=0 nn=72 acc=0 eAcc=3 pDur=228 vel= appear=1 R t=0 v=2 npt=1 stf=2 dur=-1 dots= appear=1 N t=240 v=1 npt=1 stf=1 dur=5 dots=0 nn=74 acc=0 eAcc=3 pDur=228 vel= appear=1 N t=480 v=1 npt=1 stf=1 dur=5 dots=0 nn=75 acc=0 eAcc=2 pDur=228 vel= appear=1 N t=720 v=1 npt=1 stf=1 dur=5 dots=0 nn=77 acc=0 eAcc=3 pDur=228 vel= appear=1 / t=960 type=1 N t=960 v=1 npt=1 stf=1 dur=4 dots=0 nn=79 acc=0 eAcc=3 pDur=456 vel= appear=1 (etc. File size: 1862 bytes)

19 5 Feb. 19 An Event Form: Standard MIDI File (file dump) 0: 4D E0 4D54 MThd ‡MT 16: 726B FF B70 C000 FF58 rk......Q..p¿..X 32: FF 2F00 4D54 726B ñ4./.MTrk.. 48: FF E 6F U....Piano.êH8Å 64: C90 4A A 400C 904B 80: B40 0C90 4D D 400C 96: 904F F F : F D : 804D 400D FF2F 004D B : FF E6F 8F B Pianoè.êA+ÅdÄ 160: C C : C FF2F 192: 00.

20 27 Jan. 20 An Event Form: Standard MIDI File (interpreted) Header format=1 ntrks=3 division=480 Track #1 start t=0 Tempo microsec/MIDI-qtr= t=0 Time sig=2/4 MIDI-clocks/click=24 32nd-notes/24-MIDI-clocks=8 t=2868 Meta event, end of track Track end Track #2 start t=0 Meta Text, type=0x03 (Sequence/Track Name) leng=5 Text = t=0 NOn ch=1 num=72 vel=56 t=228 NOff ch=1 num=72 vel=64 t=240 NOn ch=1 num=74 vel=56 t=468 NOff ch=1 num=74 vel=64 (etc. File size: 193 bytes)

21 5 Feb MIDI (Musical Instrument Digital Interface) (1) Invented in early 80’s –Dawn of personal computers –Designed as simple (& cheap to implement) real-time protocol for communication between synthesizers –Low bandwidth: Kbps Top bit of byte: 1 = status, 0 = data –Numbers usually 7 bits (range 0-127); sometimes 14 or even 21 Message types –Channel Voice –Channel Mode –System Common –System Real-Time –System Exclusive

22 5 Feb MIDI (2) Important standard Events are mostly Channel Voice msgs –Note On: channel (1-16), note number (0-127), on velocity –Note Off: channel, note number, off velocity Can change “voice” any time with Program Change msg A way around the 16-channel limit: cables –may or may not correspond to a physical cable –each cable supports 16 channels independent of others –Systems with 4 (=64 channels) or 8 cables (=128) are common MIDI Monitor allows watching MIDI in real time –Freeware and open source!

23 8 Feb MIDI Sequencers Record, edit, & play SMFs (Standard MIDI Files) Standard views –Piano roll often with velocity, controllers, etc., in parallel –Event list –Other: Mixer, “Music notation”, etc. –Standard editing Adding digital audio –Personal computers & software-development tools have gotten more & more powerful – => "digital audio sequencers”: audio & MIDI (stored in hybrid encodings) Making results more musical: “Humanize” –Timing, etc. isn’t mechanical—but not really musical!

24 5 Feb Another Warning: Terminology (1) A perilous question: “How many voices does this synthesizer have?” Syllogism –Careless and incorrect use of technical terms is dangerous to your learning very much –Experts use technical terms carelessly most of the time –Beginners often use technical terms incorrectly –Therefore, your learning very much is in danger Somewhat exaggerated, but only somewhat

25 6 Feb Another Warning: Terminology (2) Not-too-serious case: “system” –Confusion because both standard (common) computer term & standard (rare but useful) music term Serious case: patch, program, timbre, or voice –Vocabulary def.: Patch: referring to event-based systems such as MIDI and most synthesizers (particularly hardware synthesizers), a setting that produces a specific timbre, perhaps with additional features. The terms "voice", "timbre", and "program" are all used for the identical concept; all have the potential to cause substantial confusion and should be avoided as much as possible –“Patch” is the only unambiguous term of the four –…but the official MIDI specification (& almost everything else) talks about “voices” (as in “Channel Voice messages control the instrument's 16 voices”) –…and to change the “voice”, you use a “program change”!

26 6 Feb Another Warning: Terminology (3) Some terminology is just plain difficult Example: “Representation” vs. “Encoding” –Distinction: 1st is more abstract, 2nd more concrete –…but what does that mean? –Explaining milk to a blind person: “a white liquid...” Don’s precision involves being very careful with terminology, difficult or not –Vocabulary is important source –Cf. other sources –Contributions are welcome

27 5 Feb. 27 Standard MIDI Files (1) File format = encoding Standard approved in 1988 Very compact Files made up of chunks with 4-character type One Header chunk (“Mthd”) –Gives format, number of tracks, basis for timing Any number of Track chunks (“MTrk”) –Stream of MIDI events and metaevents preceded by time –1st track is always timing track

28 5 Feb Standard MIDI Files (2) Metaevents –Set Tempo (in timing track only) –Text, Lyrics, Key/time signatures, instrument name, etc. What’s missing? –Voice information limited to 16 channels –Dynamics, beams, tuplets, articulation, expression marks, note spelling, etc.: much less structure than CMN Attempts to overcome limitations –Expressive MIDI, NotaMIDI, etc. –ZIPI –In a (more ambitious) way, Csound, etc. –None of the limited attempts caught on

29 12 Feb Separating Representations Doesn’t Work! (1) Really “doesn’t work well for many purposes” We shouldn’t be surprised –Close relative of “Classification is Dangerous to Your Health” Example: many popular notation encodings (e.g., MusicXML) add event info Example: multiple domains for notation add in event info (performance domain) Example: Csound combines audio & events Hybrid systems

30 3 Feb Separating Representations Doesn’t Work! (2) Extreme example of musical necessity: Jimi Hendrix’s version of the Star-Spangled Banner at Woodstock (1969) –Goes from pure melody => noteless texture => back repeatedly –What would music-IR system do to recognize the Star- Spangled Banner? –…or Taps? (a very different problem!) Attempts have been/are being made to combine all three basic representations

31 31 Even One Note can be Hairy Experience in the early days of Kurzweil (ca. 1983) –Piano middle C(!) never sounded “good”...except first, low-quality recording Couldn’t tell why from waveform, spectrogram, etc. –Variable sampling rates were unusable An expensive mistake: cost ca. $1,000,000 –Scale on the flute didn’t sound realistic to a flutist—but it was –Lesson 1: expectations influence perception –Lesson 2: nothing about music is clear-cut or simple

32 11 Feb Musical Acoustics (1) Acoustics involves physics Musical (opposed to architectural, etc.) acoustics –Frequency (=> pitch) –Amplitude (=> loudness) –Spectrum, envelope, & “other” characteristics => timbre –Partials vs. harmonics Psychoacoustics involves psychology/perception –Perceptual coding (“lossy” compression via MP3, etc.) Timbre –Old idea (thru ca. 1960’s?): timbre is produced by static relationships of partials –…but attack helps identification more than steady state! –Reality: rich (interesting) sounds are complex; nothing is static –Time domain (waveform) vs. frequency domain (spectrogram) views

33 11 Feb Musical Acoustics (2): addsynenv addsynenv does additive synthesis of up to six partials –Each has arbitrary partial no., starting phase, and "ADSR" type envelope –Partial no. can be non-integer => not harmonic –ADSR = Attack/Decay/Sustain/Release (3 breakpoints) –…but addsynenv allows much more complex envelopes –Plays a single note in the waveform specified by partials & envelopes –Simultaneously displays “spectrogram” or “sonogram” –…but not waveform –Phase in real world normally has little effect, but can be critical in recording & digital worlds (e.g., cancellation) Additive synthesis can’t create aperiodic (non-definite pitch) sounds, or realistic attacks

34 10 Feb Musical Acoustics (3): Frequency, Temperament, Scales Frequency: A4 (A above middle C) = 440 Hz, etc. Why do we need temperaments (tuning systems)? –Frequency ratio (FR) of notes an octave (12 semitones) apart = 2:1 –FR of notes a Perfect 5th (7 st, e.g., C up to G) = 3:2 –84 st = 7 octaves; around “circle of fifths”, 84 st = 12 P5s –But (3/2)^12 = does not = 2^7! –The standard (but not only) solution: equal temperament With 12 semitones & equal temperament, semitone FR = 12th root of 2 : 1 = ca …:1 Etc.: see table in Wikipedia “Musical Acoustics” article Scales –Diatonic (major, minor, etc.), chromatic, other –Is addsynenv “whole tone” preset a timbre, or a scale!?

35 31 Jan. 35 An Audio Form: AIFF, 22K, 8 bit mono (file dump) 0: 464F 524D AE F 4D4D FORM..ÉÆAIFFCOMM 16: D 32: E SSND..Éà.. 48: : FF FEFC ˛¸ 80: FBFB FCFC FCFD FEFF ˚˚¸¸¸˝˛ : FFFF FEFE ˛˛ 112: FFFF FFFF FF FE FCFC FCFC FBFC ˛¸¸¸¸˚¸ 128: FDFE FF FFFF ˝˛ : FEFE FF ˛˛ : FF FEFE FDFD FBFA FBFC FEFF FFFF....˛˛˝˝˚˙˚¸˛ : FF FEFE FEFD FDFE FF ˛˛˛˝˝˛ : : FFFD FBFA FAFC FDFF FF00 00FF FFFF FEFD.˝˚˙˙¸˝ ˛˝ 224: FDFC FBFB FCFE ˝¸˚˚¸˛ : FFFD FBF8 F6F5 F5F ˝˚¯ˆıı˜ 256: F9FB FDFD FDFE FEFE FEFE FEFD FDFC FDFF ˘˚˝˝˝˛˛˛˛˛˛˝˝¸˝. (etc. File size: 99,254 bytes)

36 10 Feb Uncompressed Audio Files: Formats File formats are (containers for) encodings Only common formats are “PCM”: WAVE (.wav: especially Windows), AIFF (especially Macintosh) Sampling rate, sample size, no. of channels independent of format Typically in file header (different between formats) Higher sampling rate => reproduce higher frequencies (sampling theorem, a.k.a. Nyquist theorem) –Max. possible frequency = 1/2 SR Larger sample size => better signal-to-noise ratio (SNR) and/or dynamic range –Rule of thumb: 6 dB/bit –16-bit samples (CD) = maximum of ca. 96 dB SNR

37 1 Feb Uncompressed Audio Files: Size Header size usually negligible compared to data Mozart AIFF file is low fidelity mono –22,050 samples/channel/sec. * 1 byte/sample * 1 channels * 4.5 sec. = 99,000 bytes –At CD quality, would take 44,100 samples/channel/sec. * 2 bytes/sample * 2 channels * 4.5 sec. = 792,000 bytes CD can store up to 74 min.(or 80) of music Total amount of digital data –44,100 samples/channel/sec. * 2 bytes/sample * 2 channels * 74 min. * 60 sec./min. = 783,216,000 bytes

38 13 Feb Specs for Some Common Audio Formats

39 11 Feb An Event/Audio Hybrid Form: Csound Developed by Barry Vercoe (MIT) in early 80’s Builds on Music IV, Music V, etc. (Bell Labs in early 60’s) Widely used in electronic/computer music studios Hybrid representation: orchestra = audio, score = events –Orchestra: defines “instruments” (timbre, articulation, dynamics) –Score: tells instruments when to play and with what parameter values One statement per line and comments after a semicolon First character of score statement is opcode Remaining data consists of numeric parameter fields (pfields) to be used by that action

40 3 Feb. 40 Csound Example Sine-tone generator to play the first five notes of the Mozart example, at quarter note = 80 => an eighth note lasts 3/8 =.375 sec. Score: f ; a sine wave function table ; a few notes of Mozart i i i i i Opcode f interprets parameter fields as follows: p1 - function table number being created p2 - creation time, or time at which the table becomes readable p3 - table size (number of points) p4 - generating subroutine, chosen from a prescribed list. i (note) statements invoke the p1 instrument at time p2, duration p3 seconds; passes all p-fields to that instrument.

41 16 Feb Compressed Audio Files: Formats Don’t confuse data compression with dynamic-range compression (a.k.a. audio level compression or limiting) Codec = compressor/decompressor Lossless compression –Standard methods (e.g.,, etc.) don’t do much for audio –Sometimes called packing –Audio specific methods MLP used for DVD-Audio Apple and Microsoft Lossless Nearly lossless compression : DTS Lossy compression –Depends on psychoacoustics (“perceptual coding”)

42 22 Feb Compressed Audio Files: Lossy Compression Depends on psychoacoustics (“perceptual coding”) 1. Divide signal into sub-bands by frequency 2. Take advantage of: Masking (“shadows”), via amplitude within critical bands Threshhold of audibility (varies with frequency) Redundancy among channels (5.1 even more than stereo!) Etc. MPEG-1 layers I thru III (MP-1, 2, 3), AAC get better & better compression via more & more complex techniques –“There is probably no limit to the complexity of psychoacoustics.” --Pohlmann, 5th ed. –However, there probably is an “asymptotic” limit to compression! Implemented in hardware or software codecs

43 17 Feb Compressed Audio Files: Lossy Compression Evaluation via critical listening is essential –ITU 5-point scale 5 = imperceptible, 4 = perceptible but not annoying, 3 = slightly annoying, 2 = annoying, 1 = very annoying –Careful tests: often double-blind, triple-stimulus, hidden reference E.g., ISO qualifying AAC with 31 expert listeners (cf. Hall article) –Test materials chosen to stress codecs Common useful tests: glockenspiel, castanets, triangle, harpsichord, speech, trumpet Soulodre’s worst-case tracks: bass clarinet arpeggio, bowed double bass, harpsichord arpeggio, pitch pipe, muted trumpet References: Hall article “Cramped Quarters”, Pohlmann Principles of Digital Audio (both on my reserve)

44 17 Feb Hybrid Representation & Compression Events (with “predefined” timbre) take very little space –Mozart fragment AIFF (CD-quality audio): 794,166 bytes –Mozart fragment MIDI file: 193 bytes –Timbre takes same amount of space, regardless of music length! –Problem: don’t have exact timbre for any performance Mike Hawley’s approach: find structure in audio; create events & timbre definition –Hawley, Michael J. (1990). The Personal Orchestra, or, Audio Data Compression by 10000:1. Usenix Computing Systems Journal 3(2), pp. 289—329. Could hybrid event/audio representation lead to his “audio data compression by a factor of 10,000”? Maybe, but no time soon!

45 20 Feb Explicit Structure & Music Notation (1) David Huron on “Explanatory Goals of Music Analysis” “Music exhibits a multitude of different kinds of structures.” –Formal: Symmetrical partitioning of tone row in Webern's Op. 24 Concerto –Functional: Defensive vigil and the early morning singing of the Mekranoti indians of Brazil –Physiological: Critical bands & Haydn's String Quartet Op. 17, No. 3 –Perceptual: Auditory attention & ramp dynamics in Beethoven's Piano Sonata No. 4, Op. 7 –Idiomatic: Arnold's Fantasy for Trumpet –Economic: Prokofiev's piano Concerto No. 4 for left-handed pianist –Personal: B-A-C-H and J.S. Bach's Brandenburg Concerto No. 2 –Social: Frank Sinatra's All or Nothing at All heard in 1959 vs –Cultural: Geographical difference between Slavic & Germanic folk cadences –Linguistic: Rhythmic contrast between Frère Jacques & English Country Garden

46 7 Mar Explicit Structure & Music Notation (2) Music notation is about explicit information/structure Huron was talking about explanation; do we really need explicit information for IR? –O’Maidin (and others): no need for logical domain Interesting examples –Unmarked triplets: obvious in Schubert, etc.; not in Bartok Qtet 5, I –Bach Goldberg Variations; D-major Prelude, WTC Book II: notation sidesteps triplets by changing time signature (and tempo) –Hendrix’ Star-Spangled Banner improvisation Byrd & Isaacson –Need logical domain because (for many purposes) must interpret at some point, & silly for everyone who needs it to re-do! –Byrd & Isaacson (2005). A Music Representation Requirement Specification for Academia. CMJ 27, no. 4 (2003), pp. 43–57 –But logical info may be less authoritative

47 47 Explicit Structure & Music Notation (3) Joan Public’s problem: find a song, given some of the melody and some lyrics –Needs notes and text (lyrics) –Common question for music librarians, esp. in public libraries Musicologist’s problem: authorship/origin of works in manuscripts –Full symbolic data is important, even “insignificant” details of notation (John Howard)

48 22 Feb Music Notation: Fully Structured Representation Many forms (“specific” representations) –Early and modern Western, Gamelan notation (Java), Indian, etc. Evolved with music they represent CMN probably most elaborate, especially w/r/t time –Western polyphony very demanding of synchronization History of Western music notation in ten minutes –Manuscript Facsimiles at –Neumes (6th to 13th century) –Mensural notation: black (ca ), white (ca ) –CMN (a.k.a. TMN) is really “CWMN” (ca present) included barlines, piano/forte from beginning …but other dynamics gradually: pp/ff c.1750, mf c.1762, mp c.1837, etc.; Romanticism => wider range …and more and more specific (metronome & expression marks, etc.) –Tablature goes back centuries, too: important for lute (& viols?)

49 2 Mar A Systematic Approach to Music Representation Byrd & Isaacson (2003). A Music Representation Requirement Specification for Academia. Computer Music Journal 27(4), pp Intended to help choose representation for encoded music in Variations2 Purpose influences choice of encoding as well as representation Wiggins et al. (1993) give three sorts of tasks: 1. recording: user wants record of musical object, to be retrieved later 2. analysis: user wants analyzed version, not “raw” musical object 3. generation/composition Descriptive vs. prescriptive notation :: pitch in CMN vs. tablature Tuplets are exceptionally complex & subtle; often invisible (unmarked) A “huge table” takes up half of the article CMN has endless details, but most aren’t important for most purposes! Assigned importance “Required”, “Very Desirable”, “Desirable” Result: chose MusicXML (but priorities changed & little done with it)

50 rev. 15 Feb. 50 Music Notation Software and Intelligence (1) Cf. Byrd, D. (1994). Music Notation Software and Intelligence. Cases where famous composers flagrantly violate important rules, yet results are easily readable Fig. 1. Changing time signature in middle of the measure (J.S. Bach) Fig. 2. A measure with four horizontal positions for notes that are all on the downbeat (Brahms) Very different ways to have two clefs in effect at the same time: Fig. 3. Bizarrely obvious (Debussy) Fig. 4. So subtle, must think about the 3/8 meter to see bass and treble clefs are both in effect throughout the measure (Ravel) Really nothing very strange going on in any of these

51 22 Feb Music Notation Software and Intelligence (2) Rules of CMN interact and aren’t always consistent Programmers try to help users by having programs do things “automatically” A good idea if software knows enough to do the right thing “almost all” the time Notation programs convert CMN to performance (MIDI) and vice-versa => makes things worse Severo Ornstein’s complaint: programs that assume a defined rhythmic structure

52 22 Feb Surprise: Music Notation has Meta-Principles! 1. Maximize readability (intelligibility) –Avoid clutter = “Omit Needless Symbols” –Try to assume just the right things for audience –Audience for CMN is (primarily) performers –General principle of any communication Applies to talks as well as music notation! –Examples: Schubert (avoid tuplet numerals), Bach (avoid tuplets) 2. Minimize space used –Save space => fewer page turns (helps performer); also cheaper to print (helps publisher) –Squeezing much music into little space is a major factor in complexity of CMN –Especially important for music: real-time, performer’s hands full –Examples: Telemann, Debussy, Ravel (for all, reduce staves)

53 27 Feb Music Notation: Attempts at Standard Encodings (1) Structured encodings on computers (all CMN) Almost all are text; a few (e.g., NIFF) are binary Three “generations” Early (DARMS, MUSTRAN, SCORE, IML, etc.) –From 1960’s: mainframe computers, batch processing (no interactive editing), punch card input –Logical and (usually limited) Graphic domains –IML used for Princeton’s Josquin Masses project: first(?) music IR –SCORE important in music publishing bcs emphasizes Graphic –Mozart (“Twinkle” Variation no. 8) in MUSTRAN (first 5 notes) GS, K3$K, 2=4, WMPW, ‘Variation 8’, 8C+, 8D+, 8E+, 8F+, /, 4G+ 2nd generation (Humdrum/kern, MuseData, Notelist, etc.) –From 1970’s and 80’s: mini and early personal computers –Added Performance domain –Far more verbose: 1st 5 notes of Mozart in Notelist = c. 12 lines

54 27 Feb Music Notation: Attempts at Standard Encodings (2) Recent (3rd generation) –NIFF, GUIDO, SMDL, MusicXML, MEI, etc. –Almost all intended as interchange codes –Some are extensible (GUIDO, MEI) –Most based on XML Non-XML –NIFF = Notation Interchange File Format Binary, not text—probably a mistake Mostly a flop: maybe because apparent complexity scared developers –GUIDO text, but not XML

55 27 Feb GUIDO and Representational Adequacy Representational adequacy: simple things have simple representations –Simple (really “encodings”!) => almost as compact as MUSTRAN –Mozart first 5 notes: Mozart first 5 notes: [ \clef \key \meter c+2/8 d e& f g/4 ] –…but with other symbols: [ \clef \text \key \meter \intens c+2/8 d e& f g/4 ] Design Layers: Basic (Logical domain only), Advanced (adds Graphic), Extended (adds Performance, user extensible; unfinished?) “GUIDO Music Notation Fmt” talk: GUIDO is (was?) an academic project => free, open-source GUIDO NoteServer: Other GUIDO tools: MIDI => GMN, GMN => gif (like NoteServer) –Used in Variations2 to display themes GUIDOXML exists, but never much done with it –GUIDO’s developers never liked XML, felt it accomplished nothing GUIDO was fairly popular, but seems to be dying out—too bad –Almost everyone else likes XML: cause and effect?

56 27 Feb Music Notation: Attempts at Standard Encodings (3) XML-based (concept of markup language) –SGML = Standard Generalized Markup Language –“Application” of SGML for music SMDL = Standard Music Description Language: early & v. powerful, but a flop –XML = eXtensible Markup Language is hugely popular –Applications of XML for music MusicXML is by far most popular; most verbose (5 notes of Mozart = 270 lines!) MEI also significant; others include MusiXML, MNML, NIFFML, etc. etc. Castan’s site lists programs importing & exporting each encoding –Gives an idea of which are most important/popular –MusicXML is hands-down winner; next are GUIDO, NIFF, SCORE

57 5 Mar HTML, XML, and Markup Languages Basic limitation of HTML: too concrete –Prescriptive, not descriptive –Says what to show, but no way to say what “G minor” is (semantics) –Limits reuse of information Solution: markup languages –SGML = Standard Generalized Markup Language –“Applications” of SGML HTML = HyperText Markup Language (Web pages) SMDL = Standard Music Description Language: early & v. powerful, but a flop –XML = eXtensible Markup Language is hugely popular Subset of/simpler than SGML; mostly replaced it (except for HTML) Stricter than SGML => HTML replaced by XHTML Application defined by DTD or schema (cf. MusicXML’s) Musical metadata examples: iTunes Music Library.xml, Variations2 –Can say “G minor” is a “key”, but still no explicit semantics Improvement: the Semantic Web (?): see –XML-based; still no real semantics, but more explicit structure

58 3 Mar MusicXML MusicXML from Michael Good/Recordare, Inc. –Cf. Designed as a “practical interchange format” Now the de facto standard encoding of CMN –Interfaces to MusicPad, karaoke, etc., as well as notation programs “Recordare editions” (songs) & examples on Web site – Heavily influenced by MuseData; also Humdrum/kern(?) –=> Conversion to/from can be very high-quality –Dolet for Finale does MusicXML Finale –Dolet 1.0 also did MuseData (CCARH encoding) 1.0 emphasizes Logical; much Performance info too MusicXML 1.1 improves formatting (Graphical) Also has support for tablature

59 1 Mar MusicXML and Variations2 Real-world situation: choice of encoding for Variations2 –In Byrd & Isaacson Requirements, MusicXML 1.0 implemented all “Required” and most “Very Desirable”; so did MEI; others? –Decision came to “MusicXML vs. Everything Else” (Jan. 2004) –Also considered MEI (Perry Roland), GUIDO, Humdrum/kern, NIFF –Did MusicXML 1.0 “have serious enough drawbacks, considering the whole situation and not just the representation itself, to make anything else worth considering?” Most important: software support...other programs and ours (planned an integrated notation engine) Other considerations: complexity, migration paths (future conversion), flexibility/extensibility, etc. –Conclusion: no.

60 14 Feb. 60 Declarative vs. Procedural Representations Representation in cognition: Hofstadter’s example –Declarative: How many people live in Chicago? –Procedural: How many chairs are there in your living room? –Probably mixed: Recall of a melody Principle (and application to programming) –Declarative: information is explicit (programming: what to do) –Procedural: information is implicit (programming: how to do it) –In programming languages, well-known ones are procedural All representations we’ve looked at are declarative –Exception: possible to use CSound procedurally –Very limited exception: repeat signs, D.C., D.S., etc. Procedural representations of music much more specialized –Pla(?), Common Music, Stella, BP1(?)

61 14 Feb. 61 Declarative vs. Procedural Representations and Variations2 Wiggins et al. (1993) give three sorts of tasks: 1. recording: user wants record of musical object, to be retrieved later 2. analysis: user wants analyzed version, not “raw” musical object 3. generation/composition Variations2 concerned most with 1st, less with 2nd, least with 3rd Declarative representations (not procedural) are much more appropriate for 1st and usually 2nd => considered only declarative for Variations2 Cf. Byrd & Isaacson Requirements

62 rev. 15 Feb. 62 Procedural Representation of a Simplified Version of the Mozart Example In C-like syntax int scale[] = { 0, 2, 3, 5, 7, 8, 10 }; procedure PlayTetrachord(float startTime, int startNoteNum, float duration) { float time = startTime; for (int i = 0; i<4; i++) { PlayNote(time, duration, startNoteNum+scale[i]); time += duration; } procedure main() { PlayTetrachord(0, 72, duration,.5); PlayNote(2, 72+scale[4], 1); PlayNote(3, 72+scale[4], 2); PlayTetrachord(4, 53, duration,.5); }

63 2 Mar Music Collections: Available or Not Cf. “Candidate Music IR Test Collections” list – How much music content is available in digital form? –In audio form, lots and lots At cost: iTunes, Rhapsody, Napster, etc. (millions of tracks) “Free”: Classical Music Archives, etc. (tens of 1000s of tracks) –In event (mostly MIDI file) form, far less At cost: ?? “Free”: Classical Music Archives, etc. (tens of 1000s of tracks) –In notation (CMN and tablature) form, similar to event form At cost: Sunhawk, MusicNotes, etc. (tens of 1000s of movements) Free: CCARH (1000s of mvmts), “tab” sites (tens of 1000s) Not even much in notation form that’s not available; why? –No standard encoding (until now) => must usually convert –Much smaller audience for notation than audio

64 3 Mar Encoding & Representation Conversion (1) Converting specific representations should be easier than basic representations –Example: CMN => tablature or vice-versa Converting encodings of the same representation should be easier than specific representations –Example: MusicXML Finale; MP3.wav Generally true, but: –Any encoding of audio to any other is relatively simple –Likewise for time-stamped events –Notation is by far most difficult: with complex music, big problem Examples –Nightingale to MIDI File, back to Nightingale –Finale equivalents with MusicXML and MIDI File

65 rev. 22 Feb. 65 Encoding & Representation Conversion (2) Nightingale to Notelist, back to Nightingale –Loses text font/position, articulation mark and slur position, instrument names, etc. Nightingale to MIDI File, back to Nightingale –Totally loses text, articulation marks and slurs, etc.—but keeps instrument names! –Distorts logical note duration (uses approx. performance duration); loses note spelling, clefs, dynamic marks. Finale equivalents with MusicXML and MIDI File –Finale/MusicXML probably better than Nightingale/Notelist –Finale/MIDI can be better than Nightingale/MIDI—but why? Both representations and programs lose/distort information! Murphy’s Law(?): “Anything that can go wrong, will.”

66 20 Feb. 66 Encoding Conversion Demo (1) Nightingale: Open Webern example

67 20 Feb. 67 Encoding Conversion Demo (2) Nightingale: Open Webern, Save Notelist, Open Notelist

68 20 Feb. 68 Representation Conversion Demo Nightingale: Open Webern, Export MIDI File, Import MIDI File

69 3 Mar Encoding & Representation Conversion Why is information distorted/lost? Two steps in getting from one program to another 1. Export: notation program converts internal form to the one it’s saving –If the format it’s saving can’t represent some information, it can’t put that information into the file; but even if format can, program might not do it correctly—or might not try. 2. Import: notation program converts format it’s opening into its internal form –Even if some kind of information is in the file and program can handle it, it might not do it correctly—or might not try. What information is distorted/lost because of limits of encoding vs. limits of programs? Can tell only by looking at encoding, and may not be easy Example: notation program => MIDI file => same notation program –MIDI files can handle text, but both Nightingale & (default) Finale lose it!

70 6 Mar Converting to Notation from Audio or “Non-music” Audio Music Recognition (AMR) –Monophonic (not too hard) vs. polyphonic (very hard) –Programs available since ca. 2000, but little visible progress –Still subject of much serious research –Not “ready for prime time” for anything but monophonic Optical Music Recognition (OMR) –Landmark report by Selfridge-Field et al (1994) –Programs available since ca. 1994; significant progress –Table of programs available on my Web site –Still subject of some serious research, e.g., Byrd & Schindele –Far more promising than AMR in near future Chris Raphael (1999): “OMR is orders of magnitude easier”

71 8 Mar OMR: The State of the Art (1) Byrd & Schindele (2006), Prospects for Improving OMR with Multiple Recognizers proposes multiple recognizers –Recognition = segmentation + classification –Programs studied: PhotoScore, SharpEye, SmartScore –Hand error count (with “feel”, some automatic) of not much music –Cf. Craig Sapp’s error analysis of Mozart page with SharpEye –Main conclusions MROMR development requires much music => automatic comparison of results Will “always” need access to specialized editing UI to correct errors –Probably not hard to get MROMR output back into SharpEye SharpEye is most accurate, but error rates still too high MROMR is worth trying Evaluation is a major roadblock –Difficult to evaluate: see Droettboom & Fujinaga, etc. –Not even a standard test database(s) –But prospects for improvement are good

72 5 Mar OMR: The State of the Art (2) Practical now for much music, but nowhere near all Bill Clemmons’ 1-to-10 scale of music suitability for OMR (2004) –“SmartScore works amazingly well for some things and so badly with others that creating a file from scratch is less time consuming” –Has students scan standard SATB choral octavo, parts & accompaniment (1), & piano reduction of Parsifal prelude (10) –Choral octavo is so clean, can get accurate file after only a handful of corrections; Parsifal import is essentially unusable Lots of already-scanned images: in Variations2, CD Sheet Music, on the Web

Download ppt "1 Representation of Musical Information Donald Byrd School of Music Indiana University Updated 8 March 2006 Copyright © 2003-06, Donald Byrd."

Similar presentations

Ads by Google