Picture to Score: Driving Vector Animations with Music in the XML Ecosystem

paper, specified "short paper"
  1. 1. Stephen Ramsay

    University of Nebraska–Lincoln

  2. 2. Brian Pytlik-Zillig

    University of Nebraska–Lincoln

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Audio can be united with video using a number of different techniques. Among the most common are “score to picture” and procedural generation.
“Score to picture” is a feature of most modern DAWs (Digital Audio Workstations), such as
Pro Tools, Logic, Cubase, and
REAPER. A composer plays forward the video—usually in the very advanced stages of post production—and sets cues within the software around which a musical soundtrack can be structured. Thus a composer might set a cue to indicate suspense leading up to a particular moment, or the beginning and end of a romantic scene that should be accompanied with incidental music.

Procedural generation goes the other way. Here, a composer creates music—often in a sophisticated audio synthesis environment like
Max/MSP, Pd, Impromptu, or
SuperCollider—and uses properties of the audio signal or of the overall program flow to cue events in a video presentation. Since these are full-fledged (if visual) programming languages, driving video with them often means combining the complexity of software engineering with the complexity of handling audio and video signals.

In this short presentation, we describe our experiments with a method of uniting audio and video that lies somewhere between these two approaches. Unlike the practice associated with contemporary filmmaking, our method begins with a musical score and uses events indicated within it as the set of cues for an animation. Rather than use procedural programming or digital signal processing to inform the creation of cues, we use the ordinary conventions of Western musical notation.
To accomplish this, we first represent the score in MusicXML. This might seem an odd choice, given that the MIDI (Musical Instrument Digital Interface) standard was designed precisely to indicate performance events over time. MusicXML, by contrast, was primarily conceived as a way of providing interoperability among software for rendering musical scores as printable objects. Yet MusicXML contains, as one explanation of the standard puts it, a “MIDI-compatible part” concerned with how the music should sound (as opposed to how it should look) (MakeMusic, 2016).
Our system exploits these MIDI-compatible elements—along with several other features of the markup—in order to indicate where a change might occur in an animation. In this way, we are able to use such things as rehearsal marks (sectional markings intended to make it easier for conductors to refer to particular passages), tempo markings, indications of changes in volume (amplitude), emphases, articulations, orchestration, and other aspects of musical notation as cues. And since everything about the duration of a piece and the relationship of the cues within the piece are discernable from the MusicXML file alone, we are able to produce SVG animations that are perfectly in sync with the music from which they are “generated”. In the simplest case, this might involve simple changes in color or the movement of shapes, but the system is fully capable of quite advanced 2D animation.
From an artistic standpoint, our way of doing things hearkens back to the earliest days of animation when popular short films were synced to the music of Wagner, Rossini, and Dukas. In this sense, ours is perhaps a new way of doing an old-fashioned thing. But unlike earlier eras, artists today have access to very sophisticated tools for producing digital art. Digital artists regularly use vector graphics programs (like Adobe
Illustrator and the free
Inkscape) that can generate SVG, and scoring programs (like
Finale, Sibelius, and the free
MuseScore) that can generate MusicXML. What is missing, we think, is a robust way to bridge these two technologies.

Our system provides a very sophisticated bridge in the form of
Indigo—an SVG animation system developed at CDRH that we have recently re-engineered along the lines we illustrate above. In this presentation, we briefly explain how Indigo works and demonstrate how it can facilitate interoperability between SVG and MusicXML (perhaps with the world premier of an original animated score in honor of this year's conference theme).

Our presentation requires only the most rudimentary knowledge of musical notation and SVG.


MakeMusic (2016). MusicXML: Tutorial. http://www.musicxml.com/tutorial/the-midi-compatible-part/ (accessed 27 Oct 2016).

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info


ADHO - 2016
"Digital Identities: the Past and the Future"

Hosted at Jagiellonian University, Pedagogical University of Krakow

Kraków, Poland

July 11, 2016 - July 16, 2016

454 works by 1072 authors indexed

Conference website: https://dh2016.adho.org/

Series: ADHO (11)

Organizers: ADHO