ICME 2006 Toronto

T2.3: Music Information Retrieval

Sunday, July 9, 2006 (13:30 - 16:30, Varley)

Instructor

Professor George Tzanetakis, University of Victoria, Canada

Abstract

Music has always been profoundly transformed by advances in technology. Examples of such transformations include the use of music notation, the invention of recording and, more recently, digital music storage and distribution. Portable music players such as Apple’s iPods can today store thousands of songs, and online music sales have been steadily increasing. It is likely that in the near future anyone will be able to digitally access all of the music recorded in human history. To interact efficiently with these large collections of music, it is necessary to develop tools that have some understanding of the actual musical content. Music Information Retrieval (MIR) is an emerging research discipline that deals with all aspects of organizing and extracting information from music. Interest in MIR has been steadily increasing, as evidenced by the number of MIR-related papers in ICME, ICASSP and other conferences, and by the fact that ISMIR, a conference focused solely on MIR, is now in its sixth year. This tutorial will provide an overview of the current state of the art in MIR, with special emphasis on the use of signal processing and machine learning techniques. Many of the techniques used in MIR have their roots in more traditional areas such as Speech Recognition, Psychoacoustics and Audio Compression. However, music has several unique characteristics that have led researchers to develop music-specific signal processing methods.

Course Outline

The tutorial will be roughly divided among the following topics. The provided times are approximate, and connections will be made between the ideas in each topic.

I. History and Overview of Music, Information Retrieval and Music Information Retrieval
A quick introduction to the fundamentals of music and information retrieval, followed by a general overview of the history and current state of the art in MIR, both in academia and in industry.

II. Audio Feature Extraction
Audio feature extraction forms the basis of most algorithms that extract content information from music signals in audio format. Specific topics covered include the Short-Time Fourier Transform, Wavelets, Perceptually-motivated filterbanks, Linear Prediction and Audio Compression, with specific emphasis on how each is applied to the processing of music signals.
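As a rough illustration of the kind of processing this topic refers to, the following Python sketch computes a Short-Time Fourier Transform with a naive DFT and derives one simple feature (the spectral centroid) from each frame. It is not taken from the tutorial materials: the function names, the Hann window, the frame and hop sizes and the test tone are all assumptions chosen for illustration, and a practical system would use an FFT library instead of the direct sum.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitudes(signal, frame_size=64, hop=32):
    """Return the magnitude spectrum (bins 0..frame_size/2) of each frame."""
    window = hann(frame_size)
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        # Window the frame to reduce spectral leakage.
        frame = [signal[start + i] * window[i] for i in range(frame_size)]
        spectrum = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT of the windowed frame at bin k.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def spectral_centroid(magnitudes, sample_rate, frame_size):
    """Magnitude-weighted mean frequency (Hz) of one spectrum."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / frame_size
    return sum(k * bin_hz * m for k, m in enumerate(magnitudes)) / total


# Hypothetical test signal: a 1000 Hz sine sampled at 8000 Hz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(256)]
frames = stft_magnitudes(tone)
peak_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
centroid = spectral_centroid(frames[0], SAMPLE_RATE, 64)
```

With these assumed parameters each bin spans 125 Hz, so the tone falls on bin 8 and the centroid of each frame sits close to 1000 Hz. Sequences of such per-frame features are the typical input to the similarity, segmentation and classification methods listed in the next topic.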

III. Content-based Similarity Retrieval, Segmentation and Classification
Content-based similarity retrieval enables music to be searched based on how it sounds rather than on metadata such as the artist or the album. The automatic classification of music into genres, styles and moods will also be covered. Segmentation is the process of locating changes of “texture” in music. Different approaches to segmentation, such as abrupt-change detection, hidden Markov modeling and use of the similarity matrix, will be covered.
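To make the similarity-matrix idea concrete, here is a minimal Python sketch that builds a frame-by-frame cosine similarity matrix and flags a boundary wherever two adjacent frames are dissimilar. It is only one simple way to combine the similarity matrix with abrupt-change detection; the function names, the threshold of 0.5 and the toy two-dimensional feature vectors are assumptions made for this example, not material from the tutorial.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(features):
    """Entry [i][j] is the similarity between frames i and j."""
    return [[cosine_similarity(a, b) for b in features] for a in features]


def abrupt_changes(matrix, threshold=0.5):
    """Indices i where frame i differs sharply from frame i - 1."""
    return [i for i in range(1, len(matrix)) if matrix[i - 1][i] < threshold]


# Hypothetical per-frame features: three frames of one "texture",
# followed by three frames of another.
features = [
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
    [0.1, 1.0], [0.0, 0.9], [0.2, 1.0],
]
matrix = similarity_matrix(features)
boundaries = abrupt_changes(matrix)
print(boundaries)
```

In this toy input the only low adjacent similarity is between the third and fourth frames, so a single boundary is reported at index 3. Real systems typically look at larger neighborhoods of the matrix rather than only the first off-diagonal.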

IV. Music-specific Processing
Although initial work in Music Information Retrieval (MIR) mainly used ideas and features from Speech Analysis, a recent trend has been to develop music-specific feature extraction and analysis algorithms. Examples include: tempo induction, rhythm representations, pitch-content representations, structural analysis and polyphonic score and audio alignment.

V. Query-by-humming
In query-by-humming, the user sings or hums a melody to the computer, which then searches a database of musical material to find the corresponding score or audio recording. This challenging problem combines robust pitch extraction and segmentation, complex approximate string matching techniques and efficient database searching.
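The approximate string matching step can be sketched as follows, assuming that pitch extraction and note segmentation have already produced a sequence of MIDI note numbers. The sketch converts each melody to pitch intervals, so that a query hummed in a different key still matches, and ranks database entries by Levenshtein edit distance. The function names, the tiny database and the hummed query are hypothetical, and this is one common textbook formulation, not necessarily the method presented in the tutorial.

```python
def intervals(pitches):
    """Semitone differences between consecutive notes (key-invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def edit_distance(a, b):
    """Levenshtein distance between two sequences, row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def rank_melodies(query, database):
    """Return (distance, name) pairs, best match first."""
    q = intervals(query)
    return sorted(
        (edit_distance(q, intervals(pitches)), name)
        for name, pitches in database.items()
    )


# Hypothetical database of melodies as MIDI note numbers.
DATABASE = {
    "ode_to_joy": [64, 64, 65, 67, 67, 65, 64, 62],
    "twinkle": [60, 60, 67, 67, 69, 69, 67],
    "scale_up": [60, 62, 64, 65, 67, 69, 71, 72],
}

# Query hummed two semitones higher, with one wrong note.
hummed = [66, 66, 67, 69, 69, 67, 65, 64]
ranked = rank_melodies(hummed, DATABASE)
print(ranked[0])
```

The wrong note changes two adjacent intervals, so the intended melody is still ranked first with a distance of 2. A linear scan like this does not scale to large collections, which is why the topic also lists efficient database searching as part of the problem.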

VI. Fingerprinting & Watermarking
Audio fingerprinting is the process of extracting a digital signature based on audio content that uniquely characterizes a specific piece of music independently of its playback medium and audio compression. It can be used for copyright protection, linking songs with metadata and identifying unknown pieces of music. Audio watermarking is the process of embedding additional information into a piece of music by modifying the actual audio signal in such a way that the changes are imperceptible to the listener. Techniques and audio features that have been proposed for fingerprinting and watermarking will be described.

VII. Content-Aware User Interfaces
Various interesting user interfaces have been proposed for visualizing, browsing and interacting with collections of audio and music signals. The most interesting ones directly use the results of content analysis to inform their appearance and interaction with the user.

VIII. Challenges and connections to other research areas
MIR is a new and emerging research area, and there are many challenges for the future. One of the most important is the problem of identifying and characterizing multiple sound sources in a mixture. Robust speech recognition and auditory scene analysis are examples of areas that would benefit from progress on this problem.

Audio-visual equipment, including a laptop projector and an audio system for playing audio examples, will be provided. Participants will receive handouts of the tutorial slides, pointers to online resources and software, an annotated bibliography of approximately 200 papers, and an MIR overview paper covering the topics of the tutorial (approximately 20-30 pages).

Speaker Biography

George Tzanetakis is an Assistant Professor of Computer Science at the University of Victoria (also cross-listed in Music). He received his PhD degree in Computer Science from Princeton University in May 2002 and was a Post-Doctoral Fellow at Carnegie Mellon University, working on query-by-humming systems with Prof. Dannenberg and on video and audio retrieval with the Informedia group. In addition, he was chief designer of the audio fingerprinting technology of Moodlogic Inc., and developed a real-time music/speech classification system for All Music Publishing, The Netherlands. His research deals with all stages of audio content analysis, such as feature extraction, segmentation and classification, with a specific focus on Music Information Retrieval (MIR). His work on musical genre classification is frequently cited and received the IEEE Signal Processing Society Young Author Award in 2004. He has presented tutorials on MIR and audio feature extraction at several international conferences. He is associate editor of Computer Music Journal and of the IEEE Transactions on Speech and Audio Processing. He is also an active musician and has studied saxophone performance, music theory and composition.


©2008 Conference Management Services, Inc. -||- email: webmaster@icme2006.org -||- Last updated Friday, June 23, 2006