Music acoustic features: Do machine predictions correspond to human judgments?
Presenter Name: Maya Flannery
School/Affiliation: McMaster University
Co-Authors: Matthew Woolhouse
Abstract:
Researchers’ methods of music description and classification have long been criticized. Musical genre, for example, maintains little consistency between category definitions and has consequently been called intrinsically ambiguous. Recent methods have approached music classification differently: in terms of the structural and expressive musical cues used by composers and performers. This approach allows for more consistently defined music characteristics and, as a result, for stimuli to be reliably produced and manipulated in experiments. The present study investigated the effectiveness and potential benefits of such an approach. First, a number of machine learning algorithms were trained to predict levels of six musical features (i.e., articulation, dynamic, register, tempo, texture, and timbre) from the output of a music information retrieval tool named Essentia. We refer to these features as Music Acoustic Features (MAFs). The best-performing algorithms were then used to predict levels of MAFs in 44 real-world musical excerpts. Finally, in a listening task, participants (N = 43) provided ratings for the same six MAFs and excerpts. The results of the two methods, machine predictions and human judgments, were then compared for consistency. Significant correlations were found between the MAF levels obtained by the two methods. The procedure outlined here showed that MAFs can be reliably produced and manipulated, effectively measured within audio stimuli, and readily perceived by listeners. MAFs can thus be applied in music research as a reliable way to develop experiments with well-defined musical stimuli. Furthermore, since MAFs can be identified within existing audio, they can enrich previous research by clarifying ambiguous results with clear and consistent descriptions of music.
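The comparison step described above can be illustrated with a minimal sketch. It assumes that listener ratings are averaged per excerpt and that consistency is measured with a Pearson correlation; the abstract does not state which correlation statistic or aggregation was used. The function names, excerpt identifiers, and all numbers below are hypothetical and serve only as an illustration, not as the study's data or analysis code.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def compare_feature(machine_levels, listener_ratings):
    """Correlate machine-predicted levels of one MAF with mean listener ratings.

    machine_levels:   {excerpt_id: predicted level}
    listener_ratings: {excerpt_id: [rating from each listener]}
    Averaging ratings per excerpt is an assumption of this sketch.
    """
    human_means = {ex: mean(r) for ex, r in listener_ratings.items()}
    # Only compare excerpts present in both sources, in a fixed order.
    shared = sorted(machine_levels.keys() & human_means.keys())
    return pearson_r(
        [machine_levels[ex] for ex in shared],
        [human_means[ex] for ex in shared],
    )


# Made-up values for a single feature (tempo) and four excerpts.
machine_tempo = {"ex01": 0.2, "ex02": 0.5, "ex03": 0.9, "ex04": 0.4}
listener_tempo = {
    "ex01": [1, 2, 2],
    "ex02": [3, 4, 3],
    "ex03": [5, 5, 4],
    "ex04": [3, 2, 3],
}

r_tempo = compare_feature(machine_tempo, listener_tempo)
print(f"tempo: r = {r_tempo:.2f}")
```

In a design like the one described, the same comparison would be repeated for each of the six MAFs across all 44 excerpts, with a significance test applied to each coefficient.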