Automating music stimuli creation and analyses: A music synthesis algorithm for producing ground truth data
Name: Maya Flannery
School/Affiliation: McMaster University
Co-Authors: Lauren Fink
Virtual or In-person: In-person
Abstract:
Researchers in both academia and industry create predictive computer models to automatically “tag” musical features. Such algorithmic processes produce objective descriptions of very large amounts of music, allowing for detailed analyses of music listening behaviour. However, the quality of such analyses depends on the type and size of the ground truth datasets used to train predictive models: training data must be plentiful and accurately labelled to produce accurate, generalizable predictions. This work outlines an algorithm that automatically synthesizes labelled musical excerpts and trains corresponding predictive models. The algorithm proceeds in six steps: 1) Parameterization, where features of music that can be manipulated along a continuum (e.g., tempo) are defined as inputs to the algorithm; 2) Permutation and Combination, where the parameters sampled in Step 1 are combined to produce complete symbolic representations of musical excerpts; 3) Synthesis, where the excerpts from Step 2 are rendered as audio; 4) Feature Extraction, where music information retrieval tools extract low-level features from the audio; 5) Identification, where the extracted features are used to train machine learning models to predict the relevant labels; and lastly, 6) Verification, where human feedback is integrated into model training to evaluate how closely the models follow human perception. This algorithm allows for the creation of large and diverse ground truth datasets, increasing predictive models' generalizability to real-world audio. These datasets and their corresponding models can be applied widely, both in analyses of existing recorded music and in the generation of future experimental stimuli, providing standardized methods that improve the consistency and interpretability of results.
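
The following is a minimal sketch of how Steps 1 through 5 might be chained together, assuming Python with librosa for feature extraction and scikit-learn for the predictive model. The parameter names, sample values, tone-plus-click synthesis routine, and classifier choice are illustrative assumptions, not the authors' implementation.

import itertools

import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Step 1: Parameterization -- musical features manipulable along a continuum.
# (Hypothetical sample points; a real study would choose these deliberately.)
PARAMETERS = {
    "tempo_bpm": [60, 90, 120, 150],
    "pitch_midi": [48, 60, 72],
}

# Step 2: Permutation and Combination -- enumerate every labelled excerpt
# specification, e.g. {"tempo_bpm": 60, "pitch_midi": 48}.
excerpt_specs = [
    dict(zip(PARAMETERS, values))
    for values in itertools.product(*PARAMETERS.values())
]

def synthesize(spec, sr=22050, duration=2.0):
    """Step 3 (Synthesis) placeholder: render one symbolic spec as audio.

    A real system would drive a synthesizer; here a sine tone at the
    requested pitch is overlaid with a click track at the requested tempo.
    """
    t = np.arange(int(sr * duration)) / sr
    freq = 440.0 * 2.0 ** ((spec["pitch_midi"] - 69) / 12.0)
    tone = 0.5 * np.sin(2 * np.pi * freq * t)
    clicks = librosa.clicks(
        times=np.arange(0.0, duration, 60.0 / spec["tempo_bpm"]),
        sr=sr,
        length=len(t),
    )
    return tone + clicks, sr

def extract_features(audio, sr):
    """Step 4: low-level features via a MIR toolkit (here, librosa)."""
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
    onset = librosa.onset.onset_strength(y=audio, sr=sr)
    return np.concatenate(
        [mfcc.mean(axis=1), mfcc.std(axis=1), [onset.mean(), onset.std()]]
    )

# Step 5: Identification -- train a model to recover a generating label
# (tempo class, as an example) from the extracted low-level features.
X = np.array([extract_features(*synthesize(spec)) for spec in excerpt_specs])
y = np.array([spec["tempo_bpm"] for spec in excerpt_specs])

model = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(model, X, y, cv=3))

# Step 6 (Verification) would wrap this loop: collect human judgements of
# the synthesized excerpts and compare them against the model's predictions.

As one possible realization of the Verification step, listener judgements of the synthesized excerpts could be compared against model predictions, and excerpts where the two disagree could be re-labelled or down-weighted before retraining.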