VII International Symposium on Systematic and Comparative Musicology III International Conference on Cognitive Musicology2001 Jyväskylä, Finland / Conference Program, Proceedings & List of Participants
Petri Toiviainen & Tuomas Eerola, University of Jyväskylä, Finland
A common problem in comparative musicology and ethnomusicology is that large collections of music are difficult to classify and visualize. A tool that could be applied to either acoustic signals or symbolic representations would therefore be useful. The features that are extracted from a collection of music and subsequently used by the tool should be psychologically relevant for the task. This study presents a simple data-mining tool for databases that use a symbolic representation of melodic information. The statistical distributions of melodic events are considered suitable features for several reasons. Firstly, the distributions are relatively straightforward to analyze computationally. Secondly, it has been shown that listeners are sensitive to pitch distributional information (Kessler et al., 1984; Oram & Cuddy, 1995; Krumhansl et al., 1999), and such distributions can be used to predict similarity relationships between melodies (Eerola et al., 2001). It is also noteworthy that ethnomusicology has a long tradition of using statistical information to classify music (Freeman & Merriam, 1956; Lomax, 1968) and that there have been more recent attempts to classify musical styles according to their statistical features (Järvinen, Toiviainen, & Louhivuori, 1999).
Feature extraction and visualization (SOM)
The method is based on first extracting common statistical measures of music. These consist of the distributions of pitches, intervals, and durations as well as the distributions of pitch, interval, and duration transitions (Figure 1). It is assumed that all melodies are transposed to a common key before the statistical features are extracted from each melody separately.
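The extraction step can be illustrated with a minimal sketch. It assumes that a melody is available as a list of (MIDI pitch, duration) pairs already transposed to a common key, that pitches are reduced to pitch classes, and that events are counted without any duration or metrical weighting. The function names and the input format are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def distribution(events):
    """Relative frequency of each distinct event in a sequence."""
    counts = Counter(events)
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()} if total else {}


def transitions(events):
    """Successive pairs of events (first-order transitions)."""
    return list(zip(events, events[1:]))


def melodic_features(melody):
    """Six statistical features of one melody.

    melody: list of (midi_pitch, duration_in_quarter_notes) pairs,
    assumed to be transposed to a common key beforehand.
    """
    pitches = [pitch for pitch, _ in melody]
    pitch_classes = [pitch % 12 for pitch in pitches]
    # Signed intervals in semitones between successive notes.
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    durations = [duration for _, duration in melody]
    return {
        "pitch": distribution(pitch_classes),
        "interval": distribution(intervals),
        "duration": distribution(durations),
        "pitch_transition": distribution(transitions(pitch_classes)),
        "interval_transition": distribution(transitions(intervals)),
        "duration_transition": distribution(transitions(durations)),
    }


# Hypothetical four-note melody: C4, D4, E4, C4.
example = [(60, 1.0), (62, 1.0), (64, 2.0), (60, 1.0)]
features = melodic_features(example)
```

Each distribution is stored as a dictionary from event to relative frequency. Before training a map, the dictionaries would have to be expanded into fixed-length vectors over a shared set of categories, such as the twelve pitch classes.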
Figure 1. The distribution of pitches, intervals, and durations extracted from the melody "Och riddaren hangångar sig till hafsjöstranden ned".
The representations of the musical materials obtained with the statistical analysis constitute a set of vectors with a large number of components. Because of the high dimensionality of the data, the mutual relations of the data items can be difficult to determine. Therefore, we used the self-organizing map (SOM) for the visualization of these mutual relations (Kohonen, 1997). The SOM is an artificial neural network that simulates the process of self-organization in the central nervous system with a simple, yet effective, numerical algorithm. It consists of a two-dimensional planar array of simple processing units, each of which is associated with a reference vector. The dimensionality of these reference vectors is equal to that of the vectors used as input. After being trained with the input vectors, the SOM provides a non-linear topographic mapping from the multidimensional input space to the two-dimensional array. In other words, each input vector is mapped to some unit in the array, and vectors that are close to each other in the input space are mapped near each other. In addition, the SOM identifies the most salient features of the input set by detecting in each part of the input vector distribution the dimensions with the highest variance. Figure 2 depicts schematically the principles of the mapping provided by the SOM.
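The training procedure can be sketched as follows. This is a generic textbook-style SOM with a Gaussian neighborhood and linearly decaying learning rate and radius. The map size, number of epochs, decay schedules, and function names are all assumptions chosen for illustration, since the paper does not report the settings that were actually used.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position of the unit whose reference vector is closest to x."""
    return min(
        weights,
        key=lambda unit: sum((w - xi) ** 2 for w, xi in zip(weights[unit], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM on a list of equal-length input vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One randomly initialized reference vector per unit of the planar array.
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
        for x in rng.sample(data, len(data)):    # random presentation order
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Move the reference vector toward the input, scaled by
                # the learning rate and the neighborhood function.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Hypothetical input vectors: A and B are similar, C is dissimilar.
vectors = {"A": [0.9, 0.1, 0.0], "B": [0.85, 0.15, 0.0], "C": [0.0, 0.1, 0.9]}
som = train_som(list(vectors.values()), rows=4, cols=4)
locations = {name: best_matching_unit(som, v) for name, v in vectors.items()}
```

After training, `locations` holds the grid position of each input vector. In a well-ordered map, similar vectors such as A and B are expected to lie closer together on the grid than dissimilar ones such as B and C.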
Figure 2. A schematic presentation of the self-organizing map. Similar vectors, such as A and B, are mapped near each other. Dissimilar vectors, such as B and C, are mapped far away from each other.
[Figure 1 panels: duration distribution; note distribution (pitch classes C to B); interval distribution (-P5 to +P5).]
After the extraction of statistical features, each feature is used to train a separate SOM. The maps thus obtained can be used separately for visualization. Furthermore, a Super SOM can be trained with vectors consisting of the outputs of these SOMs (see Figure 3). This yields a two-dimensional Supermap on which melodies with similar features are located close to each other. In other words, melodies that display similar statistical properties in terms of pitch, interval, and duration distributions and their transitions are located at adjacent positions on the Supermap. Perceptual validity increases if the Super SOM is trained with a weighting scheme that reflects empirical findings on the importance of each feature for listeners' similarity judgments.
Figure 3. Overview of the method (feature extraction, Super SOM).
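One way to build the input vectors for a Super SOM is sketched below. The paper does not specify what the "outputs" of the feature SOMs are. This sketch assumes that each output is the grid position of the melody's best-matching unit, normalized to the range 0 to 1, and that the weighting scheme is applied by scaling each feature's coordinates. The weights and grid positions are hypothetical placeholders, not empirical values.

```python
def supermap_input(bmu_by_feature, grid_shape, weights):
    """Weighted concatenation of per-feature map positions for one melody.

    bmu_by_feature: feature name -> (row, col) of the best-matching unit
    grid_shape:     (rows, cols) shared by all feature maps (an assumption)
    weights:        feature name -> perceptual weight (hypothetical values)
    """
    rows, cols = grid_shape
    vector = []
    for feature in sorted(bmu_by_feature):  # fixed order across melodies
        r, c = bmu_by_feature[feature]
        w = weights.get(feature, 1.0)
        # Normalize grid coordinates to [0, 1] before weighting.
        vector.append(w * r / max(rows - 1, 1))
        vector.append(w * c / max(cols - 1, 1))
    return vector


# Hypothetical positions of one melody on three 5 x 5 feature maps.
bmus = {"pitch": (0, 3), "interval": (2, 2), "duration": (4, 1)}
feature_weights = {"pitch": 1.0, "interval": 0.5, "duration": 0.25}
melody_vector = supermap_input(bmus, (5, 5), feature_weights)
```

Scaling a feature's coordinates by its weight makes that feature contribute proportionally more or less to the Euclidean distances that drive the training of the Super SOM.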
The method applied to a large folk song collection
The method was applied to a corpus of melodies that consists of 6,252 folk songs from the Essen collection (Schaffrath, 1995) and 2,226 Chinese folk songs. The songs in the Essen collection are mainly from Germanic regions, and the Chinese songs are from the northern part and border region of Ningxia and Shanxi. One of the advantages of using this particular corpus was that all songs are encoded in a symbolic format (Humdrum **kern format; Huron, 1994) and an electronic version of the database is published and distributed by the Center for Computer Assisted Research in the Humanities (CCARH). Another benefit of this corpus is that all transcriptions include the genre, geographical region, rhythm type, key, and a free description of the content and context in the form of keywords. This additional information can be used as search criteria in the visualization tool and thus extends the utility of the corpus.
The demonstration of the method is divided into three tools. Tool 1 provides a coarse overview of features by displaying the organization of each map together with the entropy of each feature. Entropy is a measure of complexity that has been used previously in discriminating musical styles (Knopoff & Hutchinson, 1983; Snyder, 1990). This tool shows the songs with similar features in proximate areas and can thus be used to investigate the similarity relationships between the songs. It also enables the playback of any chosen song on each SOM. A demonstration of this tool is available on the WWW ().
Tool 2 provides a visualization of the statistical features as represented by the SOMs. Tool 3 combines keyword search with the similarity relations of the features. This tool can be used to find stylistic clusters or the specific locations of songs that match any selected criteria, such as "ballads", 3/4 time signatures, "Tirol", or any combination of these. This makes it easier to formulate and answer musically and culturally interesting questions about the corpus.
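The entropy reported by Tool 1 can be illustrated with the standard Shannon entropy of an event distribution. The sketch below measures entropy in bits and applies it to a plain sequence of events. Whether the tool uses this exact formulation or a normalized variant is not stated, so the function should be read as an assumption.

```python
import math
from collections import Counter


def entropy(events):
    """Shannon entropy (in bits) of the distribution of events in a sequence."""
    counts = Counter(events)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical pitch-class sequences: a single repeated pitch class
# versus four equally frequent pitch classes.
uniform_melody = entropy([0, 0, 0, 0])
varied_melody = entropy([0, 2, 4, 7])
```

A melody that repeats one pitch class has zero entropy, and entropy grows as events are spread more evenly over more categories, which is why the measure can serve as a rough indicator of complexity.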
Figure 4. An example of Tool 3 displaying the mapping locations of the melodies of the Han (a Chinese ethnic group).
Conclusions and future directions
A method for the analysis of large corpora of music and specific practical tools for musical data mining were presented. The method was based on the statistical distribution of symbolic events and the subsequent investigation of similarity relationships. A self-organizing map (SOM) was used to visualize the feature vectors. Examples included keyword-based investigation of musical features as well as separate topological maps of all the songs for each extracted feature. One possible application of the method is to find stylistic disparities or similarities between materials from distant cultural regions and to use this information when formulating hypotheses for cross-cultural comparisons. However, the method itself still leaves considerable room for improvement. For example, taking into account the overall melodic contour, hierarchical reduction of the melodic surface, perceptual weighting of the events according to metrical position and salience, and phrasing would make the method more sophisticated and increase its perceptual relevance. Further research is needed to assess the applicability of the present method to audio-based material.
References
Eerola, T., Järvinen, T., Louhivuori, J., & Toiviainen, P. (2001). Statistical features and perceived similarity of folk melodies. Music Perception, 18, 275-296.
Freeman, L. C., & Merriam, A. P. (1956). Statistical classification in anthropology: An application to ethnomusicology. American Anthropologist, 58, 464-472.
Huron, D. (1994). UNIX tools for musical research: The humdrum toolkit reference manual. Stanford, CA: Center for Computer Assisted Research in Humanities.
Järvinen, T., Toiviainen, P., & Louhivuori, J. (1999). Classification and categorization of musical styles with statistical analysis and self-organizing maps. In A. Patrizio, G. A. Wiggins, & H. Pain (Eds.), Proceedings of the AISB'99 Symposium on Musical Creativity (pp. 54-57). Edinburgh: The Society for the Study of Artificial Intelligence and Simulation of Behaviour.
Kessler, E. J., Hansen, C., & Shepard, R. N. (1984). Tonal schemata in the perception of music in Bali and the west. Music Perception, 2, 131-165.
Knopoff, L., & Hutchinson, W. (1983). Entropy as a measure of style: The influence of sample length. Journal of Music Theory, 27, 75-97.
Kohonen, T. (1997). Self-organizing maps (2nd ed.). Berlin: Springer.
Krumhansl, C. L., Louhivuori, J., Toiviainen, P., Järvinen, T., & Eerola, T. (1999). Melodic expectation in Finnish folk hymns: Convergence of statistical, behavioral, and computational approaches. Music Perception, 17(2), 151-196.
Lomax, A. (1968). Folk song style and culture. Washington, D.C.: American Association for the Advancement of Science.
Oram, N., & Cuddy, L. L. (1995). Responsiveness of Western adults to pitch-distributional information in melodic sequences. Psychological Research, 57, 103-118.
Schaffrath, H. (1995). The Essen Folksong Collection in Kern Format [computer database]. D. Huron (Ed.). Menlo Park, CA: Center for Computer Assisted Research in the Humanities.
Snyder, J. L. (1990). Entropy as a measure of musical style: The influence of a priori assumptions. Music Theory Spectrum, 12, 121-160.