The introduction of techniques for compressing and streaming audio data has, in recent years, significantly changed the way music is consumed and archived. Personal music collections may nowadays comprise tens of thousands of titles, and even mobile devices are able to store several thousand songs. Yet these magnitudes are small compared to the vast amount of music data digitally available on the Internet.
Several features have been proposed to describe music at a low, signal-processing level. Some of these have already been incorporated into the MPEG-7 standard as description schemes for annotating audio data. However, in contrast to text documents, which can be represented sufficiently well by statistics about the terms they contain, audio data appears far too complex to be described by signal statistics alone. Moreover, such a representation only allows query-by-example.
Learning a mapping between audio features and contextual interpretations would be the key to solving this problem, enabling users to formulate queries in a way that is close to how they describe music content, e.g. using natural language or at least combinations of terms. This task requires models of how music is perceived, as well as methods for the extraction, analysis, and representation of linguistic descriptions of music. On the other hand, more sophisticated audio features and analysis of musical structure can narrow the semantic gap. But even if such a mapping can be found, it cannot be considered universally valid; it will rather be biased by the user's preferences, making it necessary to consider personalization at some point as well.
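To make the idea of such a mapping concrete, the following is a minimal, purely illustrative sketch (not a method proposed in this volume): it assumes hypothetical per-track feature vectors (e.g. averaged MFCCs) and hypothetical tag annotations, trains one classifier per tag, and then answers a term-based query such as "calm" by ranking unseen tracks.

```python
# Illustrative sketch: mapping hypothetical low-level audio features to
# semantic tags, so that a query can be phrased as a term instead of an
# audio example. Feature values and tags below are synthetic placeholders.
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one feature vector per track, plus tag lists.
features = np.random.rand(100, 13)            # e.g. 13 mean MFCC values per track
tags = [["calm", "acoustic"] if x[0] < 0.5 else ["energetic"] for x in features]

binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(tags)             # tag lists -> binary indicator matrix

# One binary classifier per tag learns the feature-to-term mapping.
model = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(features, Y)

# A term-based query: rank unseen tracks by their predicted score for "calm".
new_tracks = np.random.rand(5, 13)
calm_idx = list(binarizer.classes_).index("calm")
scores = model.predict_proba(new_tracks)[:, calm_idx]
print(np.argsort(scores)[::-1])               # track indices, most "calm" first
```

In practice the choice of features, the vocabulary of terms, and the learning method are exactly the open questions addressed by the contributions in this field; a personalized system would additionally adapt such a model to an individual user's descriptions.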
Previous LSAS workshops brought together a multidisciplinary group of researchers and included presentations covering signal-processing, social, musicological, and usability aspects of semantic audio analysis. Our purpose is to continue fostering this line of research in a rapidly expanding and promising field.