USING NATURAL LANGUAGE PROCESSING TECHNIQUES FOR MUSICAL PARSING

Rens Bod
School of Computing, University of Amsterdam

1. INTRODUCTION

We investigate whether probabilistic parsing techniques from Natural
Language Processing (NLP) can be used for musical parsing. As in natural
language, a piece of music can be segmented into groups or phrases which
can be conveniently represented by a phrase-structure tree
(Longuet-Higgins 1976; Lerdahl & Jackendoff 1983; Desain & Honing 1999).
One of the main challenges for musical parsers is the problem of
ambiguity: several different phrase structures may be compatible with a
given musical sequence while a listener typically hears only one
structure. In this paper we will consider three parsing techniques from
the NLP literature that use a probabilistic heuristic to resolve ambiguity.
We develop a new parser which combines two of these techniques, and which
can correctly predict 86.5% of the phrases in the Essen Folksong
Collection (Schaffrath 1995).

Probabilistic parsing techniques have recently attracted considerable
interest in NLP (see Bod 1998; Manning & Schuetze 1999). Instead of using
a formal grammar, a probabilistic parser learns how to parse new input by
generalizing from examples of previously annotated data. In the case of
ambiguity, a probabilistic parser computes the most probable phrase
structure for a given input. With the current availability of large
annotated resources in computer-assisted musicology, it is natural to ask
whether such probabilistic techniques can also be applied to musical
parsing.

In section 2 we will test three such techniques on the Essen Folksong
Collection, which consists of 6,251 (mostly) European folksongs encoded
with the Essen Associative Code (ESAC). ESAC represents pitches by
scale-degree numbers corresponding to the movable solmization syllables
"do", "re", "mi", etc. Thus 1 corresponds to "do", 2 corresponds to "re",
and so on. Chromatic alterations are represented by adding either a "#"
or a "b" after the number. A plus ("+") or minus ("-") sign is added
before the number if a note falls above or below the principal octave,
respectively. Duration is represented by adding a period or (one or more)
underscores after the number: a period increases the duration by 50% and
an underscore increases it by 100%. A pause is represented by 0, possibly
followed by duration indicators. ESAC uses hard returns [HRt] to indicate
phrase boundaries. To make a connection with NLP annotations, we converted
ESAC's phrase boundary indications to bracketing representations, where
"(" indicates the start of a phrase and ")" the end of a phrase. For more
details on ESAC, see Selfridge-Field (1995).
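
To make the encoding concrete, the following minimal Python sketch decodes
a single ESAC pitch token under the conventions just described. The
function name parse_esac_token, the token grammar assumed by the regular
expression, and the additive reading of repeated duration marks are
illustrative choices of ours, not prescribed by ESAC itself:

import re

# Token grammar assumed here, following the description above:
# optional octave sign, scale degree (0 = pause), optional accidental,
# then duration marks: "." adds 50%, each "_" adds 100% of the base.
TOKEN = re.compile(r'([+-]?)([0-7])([#b]?)(\.?)(_*)')

def parse_esac_token(token, base_duration=1.0):
    m = TOKEN.fullmatch(token)
    if m is None:
        raise ValueError('not an ESAC pitch token: %r' % token)
    sign, degree, accidental, dot, underscores = m.groups()
    octave = {'+': 1, '-': -1, '': 0}[sign]
    duration = base_duration * (1.0 + 0.5 * len(dot) + len(underscores))
    return {'degree': int(degree),     # 0 denotes a pause
            'octave': octave,          # offset from the principal octave
            'accidental': accidental,  # '#', 'b' or ''
            'duration': duration}

print(parse_esac_token('3_'))
# {'degree': 3, 'octave': 0, 'accidental': '', 'duration': 2.0}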

The following gives an example of the ESAC encoding of the German
children's song "Ziehe durch durch die goldne Bruecke", together with its
phrase structure (we leave out barlines and the meter signature, which are
not used by our parsers):

(1)

(553_553_)(55654_2)(24422442)(244543_1)(55_355_3)(55565432_)

The only other information that we (automatically) added to the ESAC
encodings is a label annotation for each phrase (the label P) and for the
whole song (the label S), such that we obtain a conventional parse tree
for each song. Thus, the structure in (1) becomes:

(2)

S( P(553_553_) P(55654_2) P(24422442) P(244543_1) P(55_355_3) P(55565432_))
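
For illustration, this conversion step can be sketched in a few lines of
Python (the helper name label_esac_phrases is our own):

import re

def label_esac_phrases(bracketed):
    # Extract the phrase contents between "(" and ")", wrap each in a
    # P label, then wrap the whole sequence in an S label, as in (2).
    phrases = re.findall(r'\(([^()]*)\)', bracketed)
    return 'S( ' + ' '.join('P(%s)' % p for p in phrases) + ')'

print(label_esac_phrases('(553_553_)(55654_2)'))
# S( P(553_553_) P(55654_2))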

2. THE PROBABILISTIC PARSING TECHNIQUES

We have considered three techniques from the NLP literature in building a
parser for the Essen Folksong Collection: the Treebank grammar technique
(Charniak 1996; Bod 1993), the Markov grammar technique (Collins 1999),
and the Data-Oriented Parsing (DOP) technique (Bod 1998). We will see
that the best results are obtained by a parser which combines the Markov
grammar technique with the DOP technique. Throughout, each probabilistic
parser is trained on a randomly selected training set drawn from the full
data, while the remaining data is used to test the parser.

2.1 The Treebank Grammar Technique

The Treebank grammar technique simply reads the (context-free) rules off
the phrase structures in the training set. For example, the following
rules can be read from the structure in (2):

(3)

S -> PPPPPP
P -> 553_553_
P -> 55654_2
P -> 24422442
P -> 244543_1
P -> 55_355_3
P -> 55565432_

Next, each rule is assigned a probability by observing how often it is
used in the training data. That is, the probability of a rule R is
computed by dividing the number of occurrences of R by the total number of
occurrences of rules that expand the same nonterminal as R. The
probability of a phrase structure tree for a musical sequence (i.e. a
folksong) is computed as the product of the probabilities of the rules it
consists of. We used a standard best-first parsing algorithm to compute
the most probable phrase structure for each folksong (see Charniak 1993:
chapter 5). Although we may already foresee that a Treebank grammar is
doomed to misparse test sequences if it does not find the correct rule
in the training set, it will serve as the basis for our more sophisticated
parsing techniques.
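
For concreteness, the relative-frequency estimation step can be sketched
as follows (a minimal illustration of ours; the best-first parsing
algorithm itself is not shown):

from collections import Counter

def treebank_grammar(treebank):
    # Each tree is a list of (lhs, rhs) rules; e.g. the structure in (2)
    # contributes ('S', 'PPPPPP') plus one ('P', <pitch string>) rule per
    # phrase.  P(lhs -> rhs) = count(lhs -> rhs) / count(rules with lhs).
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in treebank:
        for lhs, rhs in tree:
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {(lhs, rhs): n / lhs_counts[lhs]
            for (lhs, rhs), n in rule_counts.items()}

# Toy usage with the rules in (3):
song = [('S', 'PPPPPP'), ('P', '553_553_'), ('P', '55654_2'),
        ('P', '24422442'), ('P', '244543_1'), ('P', '55_355_3'),
        ('P', '55565432_')]
grammar = treebank_grammar([song])
print(grammar[('P', '553_553_')])  # 1/6: one of six P-rules in this toy set
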
We randomly divided the Essen Folksong Collection into a training
set of 5,251 folksongs and a test set of 1,000 folksongs. The training set
folksongs were used to extract the Treebank grammar rules and their
probabilities. The resulting parser correctly predicted 28.7% of the
phrases for the test set folksongs.

2.2 The Markov Grammar Technique

An alternative to Treebank grammars which has become quite popular in NLP
is known as a Markov grammar (Collins 1999). While a Treebank grammar can
only assign probabilities to rules that have literally been seen in the
training set, a Markov grammar can assign a probability to any possible
rule. This is accomplished by decomposing a rule and its probability via
a Markov process (see Collins 1999: 44-48). For example, a third-order
Markov process estimates the probability p of a rule P -> 12345 as

  p(12345 | P) = p(1 | P) x p(2 | P, 1) x p(3 | P, 1, 2)
                 x p(4 | P, 2, 3) x p(5 | P, 3, 4) x p(END | P, 4, 5)

The conditional probability p(END | P, 4, 5) encodes the probability that
a P-rule ends after pitches 4 and 5.
Thus even if the rule P -> 12345 does not appear literally in the training
set, we can still estimate its probability by using a certain finite
history of pitches. The extension to a larger history follows by obvious
generalization of the above example. We used the smoothing technique given
in Charniak (1993: 39-43) to back off to shorter histories if zero counts
were obtained for certain histories. The probability of a phrase structure
is computed as the product of the probabilities of the rules that partake
in the structure, just as with Treebank grammars.
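
A minimal sketch of this decomposition is given below (our own
illustration; for simplicity it uses add-one smoothing where the paper
uses the backoff smoothing of Charniak 1993):

from collections import Counter

class MarkovGrammar:
    def __init__(self, order=4):
        self.order = order        # length of the pitch history
        self.ngram = Counter()    # counts of (label, history, next symbol)
        self.context = Counter()  # counts of (label, history)
        self.vocab = {'END'}

    def train(self, rules):
        # rules: iterable of (label, rhs) pairs, rhs a sequence of pitches.
        for label, rhs in rules:
            seq = list(rhs) + ['END']
            self.vocab.update(seq)
            for i, sym in enumerate(seq):
                hist = tuple(seq[max(0, i - self.order):i])
                self.ngram[(label, hist, sym)] += 1
                self.context[(label, hist)] += 1

    def rule_prob(self, label, rhs):
        # p(rhs | label) as a product of conditional pitch probabilities,
        # each conditioned on the label and the previous `order` pitches.
        p = 1.0
        seq = list(rhs) + ['END']
        for i, sym in enumerate(seq):
            hist = tuple(seq[max(0, i - self.order):i])
            p *= ((self.ngram[(label, hist, sym)] + 1)
                  / (self.context[(label, hist)] + len(self.vocab)))
        return p

mg = MarkovGrammar(order=4)
mg.train([('P', ['5', '5', '3_', '5', '5', '3_'])])
print(mg.rule_prob('P', ['5', '5', '3_', '5', '5', '3_']))
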
Using the same training/test set division as in 2.1, a Markov
grammar technique based on a history of four pitches correctly predicted
77.1% of the phrases in the Essen Folksong test set, which is a huge
improvement over the Treebank grammar technique. (Experiments with
higher- or lower-order Markov models resulted in lower accuracy.)

2.3 Extending the Markov Grammar with the DOP Technique

Although the Markov grammar obtained better results than the Treebank
grammar, it does not take into account global context in computing the
probability of a structure. In order to include global context, we
conditioned on the S-rule higher in the structure when computing the
probability of a P-rule. This approach corresponds to the Data-Oriented
Parsing (DOP) technique (Bod 1998), which can condition on any higher or
lower rule in a tree, and which has recently been integrated with the
Markov grammar technique (Sima'an 2000). As an example, take again the
P-rule P -> 12345 and a higher S-rule such as S -> PPPP; a third-order
DOP Markov model then computes the probability of this rule as

  p(1 | S -> PPPP) x p(2 | S -> PPPP, 1) x p(3 | S -> PPPP, 1, 2)
  x p(4 | S -> PPPP, 2, 3) x p(5 | S -> PPPP, 3, 4)
  x p(END | S -> PPPP, 4, 5)

The probability of a phrase structure is again computed as the product of
the probabilities of the rules that partake in the structure.
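
In implementation terms, the only change with respect to the Markov
grammar sketch in 2.2 is the conditioning label: the S-rule is folded into
it. A sketch reusing the MarkovGrammar class above (the encoding of the
S-rule as a string is our own choice, not prescribed by the paper):

# Fold the S-rule into the conditioning label of the MarkovGrammar class
# from section 2.2, so that P-rules are conditioned on global context.
def dop_label(n_phrases, phrase_label='P'):
    # E.g. a four-phrase song yields the label ('S->PPPP', 'P').
    return ('S->' + phrase_label * n_phrases, phrase_label)

dop = MarkovGrammar(order=4)
phrases = [('2', '4', '4', '2'), ('2', '4', '4', '2')]  # toy two-phrase song
dop.train([(dop_label(len(phrases)), rhs) for rhs in phrases])
print(dop.rule_prob(dop_label(len(phrases)), ('2', '4', '4', '2')))
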
This DOP Markov parser correctly predicted 86.5% of the phrases in
the Essen Folksong test set (using a history of four pitches and the same
training/test set division as before), an absolute improvement of 9.4%
over the Markov parser. We also checked our results on 10 random divisions
of training/test set data. The DOP Markov parser obtained an average
phrase accuracy of 86.2% with a standard deviation of 1.8%, while the
Markov parser obtained an average of 76.9% with a standard deviation of
1.3%. These differences were statistically significant according to paired
t-testing.
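
A sketch of this significance test over the 10 matched divisions (our own
illustration using scipy; the accuracy figures below are hypothetical
placeholders, not the actual per-division results):

from scipy import stats

# Hypothetical per-division phrase accuracies on the same 10 random
# training/test divisions (placeholders, not the paper's data).
dop_markov = [0.862, 0.871, 0.845, 0.880, 0.859,
              0.867, 0.852, 0.874, 0.848, 0.864]
markov     = [0.769, 0.772, 0.751, 0.781, 0.765,
              0.774, 0.758, 0.777, 0.762, 0.780]

# Paired t-test: the parsers are compared on identical divisions, so the
# per-division differences carry the signal.
t, p = stats.ttest_rel(dop_markov, markov)
print('t = %.2f, p = %.4f' % (t, p))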

3. CONCLUSION

We have given an example-based statistical parsing technique for Western
folksongs which obtains a phrase accuracy of 86.5% on a randomly held-out
test set from the Essen Folksong Collection. While this collection has
previously been used to investigate melodic arch contours (Huron 1996),
this work presents the first parser tested on this collection, to the best
of our knowledge. Our parser may be used as a baseline for Essen folksongs
against which other (e.g. rule-based) approaches can be compared. Our
parser may also be used to speed up the time-consuming annotation of newly
collected folksongs, thereby contributing to the creation of larger
musical databases. As future research, we want to investigate whether our
results can be improved by incorporating musical knowledge into the
parser.

REFERENCES

R. Bod, 1993. Using an Annotated Corpus as a Stochastic Grammar,
Proceedings EACL'93, Utrecht.
R. Bod, 1998. Beyond Grammar: An Experience-Based Theory of Language. CSLI
Publications, Stanford.
E. Charniak, 1993. Statistical Language Learning, Cambridge, The MIT
Press.
E. Charniak, 1996. Tree-bank Grammars, Proceedings AAAI-96, Menlo Park.
M. Collins, 1999. Head-Driven Statistical Models for Natural Language
Parsing. PhD thesis, University of Pennsylvania.
P. Desain and H. Honing, 1999. Computational Model of Beat Induction: The
Rule-Based Approach. Journal of New Music Research.
D. Huron, 1996. The Melodic Arch in Western Folksongs. Computing in
Musicology, 10, 2-23.
F. Lerdahl and R. Jackendoff, 1983. A Generative Theory of Tonal Music.
Cambridge, The MIT Press.
H. Longuet-Higgins, 1976. Perception of Melodies. Nature 263, October 21,
646-653.
C. Manning and H. Schuetze, 1999. Foundations of Statistical Natural
Language Processing. Cambridge, The MIT Press.
H. Schaffrath, 1995. The Essen Folksong Collection in the Humdrum Kern
Format. D. Huron (ed.). Menlo Park, CA: Center for Computer Assisted
Research in the Humanities.
E. Selfridge-Field, 1995. The Essen Musical Data Package. Menlo Park, CA:
Center for Computer Assisted Research in the Humanities.
K. Sima'an, 2000. Tree-gram Parsing: Lexical Dependencies and Structural
Relations, Proceedings ACL'2000, Hong Kong, China.


Conference Info

ACH/ALLC 2001, hosted at New York University, New York, NY, United
States, July 13-16, 2001.
