Publications catalog - books


Information Retrieval for Music and Motion

Meinard Müller

Abstract/Description - provided by the publisher

Not available.

Keywords - provided by the publisher

Database Management; Theory of Computation; Computer Applications; Information Storage and Retrieval; Multimedia Information Systems; Computer Graphics

Availability
Detected institution Publication year Browse Download Request
Not detected 2007 SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-74047-6

Electronic ISBN

978-3-540-74048-3

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer Berlin Heidelberg 2007

Table of contents

Introduction

In this chapter, we provide motivating and domain-specific introductions of the information retrieval problems raised in this book. Sect. 1.1 covers the music and Sect. 1.2, the motion domain. These two sections also include an outline of the two parts, provide a summary of all chapters, and discuss general literature relevant to music information retrieval and motion retrieval, respectively. Finally, in Sect. 1.3, we reveal the conceptual relations between the two parts. In particular, we point out the general concepts for content-based information retrieval, which apply to both music and motion domains, and even beyond.

Keywords: Dynamic Time Warping; Audio Feature; Inverted List; Motion Capture Data; Music Information Retrieval.

Part I - Analysis and Retrieval Techniques for Music Data | Pp. 1-13

Fundamentals on Music and Audio Data

In the first part of this monograph, we discuss content-based analysis and retrieval techniques for music and audio data. To account for the interdisciplinary character of this research field, we start in this chapter with some fundamentals on music representations and digital signal processing. In particular, we summarize basic facts on the score, MIDI, and audio formats (Sect. 2.1). We then review various forms of the Fourier transform (Sect. 2.2) and give a short account of digital convolution filters (Sect. 2.3). In doing so, we hope to refine and sharpen the understanding of the required basic signal transforms. This will be essential for the design as well as for the proper interpretation of musically relevant audio features, see Chap. 3.

Keywords: Discrete Fourier Transform; Magnitude Response; Audio Data; Music Representation; Musical Tone.

Part I - Analysis and Retrieval Techniques for Music Data | Pp. 17-50
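
As a rough illustration of the two signal-processing tools named in the abstract above, the following minimal Python sketch computes a naive discrete Fourier transform and a finite convolution. The function names `dft` and `convolve` and the test signal are assumptions made for this example; they are not taken from the book, and a real system would normally use an FFT library.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a finite sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def convolve(x, h):
    """Full linear convolution of signal x with filter coefficients h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


# Hypothetical test signal: a cosine with 2 cycles over 8 samples.
signal = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
magnitudes = [abs(c) for c in dft(signal)]

# Two-tap averaging filter applied to a short ramp.
smoothed = convolve([1.0, 2.0, 3.0], [0.5, 0.5])
```

For a real-valued cosine, the energy shows up in frequency bin 2 and in its mirror bin 6, which is why magnitude spectra are usually read only up to half the number of bins.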

Pitch- and Chroma-Based Audio Features

Automatic music processing poses a number of challenging questions because of the complexity and diversity of music data. As discussed in Sect. 2.1, one generally has to account for various aspects such as the data format (e.g., score, MIDI, audio), the instrumentation (e.g., orchestra, piano, drums, voice), and many other parameters such as articulation, dynamics, or tempo. To make music data comparable and algorithmically accessible, the first step in all music processing tasks is to extract suitable features that capture relevant key aspects while suppressing irrelevant details or variations. Here, the notion of similarity is of crucial importance in the design of audio features. In some applications, and particularly in the case of music retrieval, one may be interested in characterizing an audio recording irrespective of certain details concerning the interpretation or instrumentation. Conversely, other applications may be concerned with measuring just the niceties that relate to a musician’s individual articulation or emotional expressiveness.

Keywords: Audio Signal; Emotional Expressiveness; Short Time Fourier Transform; Audio Feature; Musical Note.

Part I - Analysis and Retrieval Techniques for Music Data | Pp. 51-67
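
To make the idea of a chroma feature concrete, here is a minimal sketch that folds spectral energy into the twelve pitch classes. It assumes equal temperament with A4 = 440 Hz = MIDI pitch 69, takes a hypothetical list of (frequency, energy) pairs as input, and normalizes the result to sum to one. These choices are assumptions for illustration and do not reproduce the filter-bank procedure of the chapter.

```python
import math


def freq_to_midi(freq_hz):
    """Nearest MIDI pitch for a frequency, assuming A4 = 440 Hz = MIDI 69."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def chroma_vector(partials):
    """Fold (frequency_hz, energy) pairs into 12 pitch classes (C=0, ..., B=11).

    The result is normalized to sum to 1 (an assumption of this sketch).
    """
    chroma = [0.0] * 12
    for freq_hz, energy in partials:
        chroma[freq_to_midi(freq_hz) % 12] += energy
    total = sum(chroma)
    return [c / total for c in chroma] if total > 0 else chroma


# Hypothetical frame containing A4, A5 and E5.
frame = [(440.0, 1.0), (880.0, 0.5), (659.26, 0.5)]
chroma = chroma_vector(frame)
```

Because A4 and A5 land in the same pitch class, the vector is unchanged when a note is moved by an octave, which is the kind of robustness to instrumentation details that the abstract describes.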

Dynamic Time Warping

Dynamic time warping (DTW) is a well-known technique to find an optimal alignment between two given (time-dependent) sequences under certain restrictions (Fig. 4.1). Intuitively, the sequences are warped in a nonlinear fashion to match each other. DTW was originally used to compare different speech patterns in automatic speech recognition, see [170]. In fields such as data mining and information retrieval, DTW has been successfully applied to automatically cope with time deformations and different speeds associated with time-dependent data. In this chapter, we introduce and discuss the main ideas of classical DTW (Sect. 4.1) and summarize several modifications concerning local as well as global parameters (Sect. 4.2). To speed up classical DTW, we describe in Sect. 4.3 a general multiscale DTW approach. In Sect. 4.4, we show how DTW can be employed to identify all subsequences within a long data stream that are similar to a given query sequence. A discussion of related alignment techniques and references to the literature can be found in Sect. 4.5.

Keywords: Automatic Speech Recognition; Edit Distance; Dynamic Time Warping; Cost Matrix; Constraint Region.

Part I - Analysis and Retrieval Techniques for Music Data | Pp. 69-84
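
The classical and subsequence variants of DTW mentioned in the abstract above can be sketched in a few lines of Python. This is a minimal illustration that assumes the three standard step sizes (diagonal, vertical, horizontal), unit step weights, and an absolute-difference local cost on numbers. The function names are hypothetical and the multiscale speed-up is not shown.

```python
def accumulated_cost(x, y, cost, free_start=False):
    """Accumulated cost matrix D with one padding row and column.

    With free_start=True the padding row is zero, so an alignment of x may
    begin at any position of y (subsequence variant).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0] = [0.0] * (m + 1) if free_start else [0.0] + [inf] * m
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
            D[i][j] = cost(x[i - 1], y[j - 1]) + best_prev
    return D


def dtw(x, y, cost=lambda a, b: abs(a - b)):
    """Return (total cost, optimal warping path) for classical DTW."""
    D = accumulated_cost(x, y, cost)
    i, j = len(x), len(y)
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        # Step back to the cheapest of the three predecessor cells.
        _, i, j = min(
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        )
    path.reverse()
    return D[len(x)][len(y)], path


def subsequence_dtw(query, stream, cost=lambda a, b: abs(a - b)):
    """Return (best cost, end index in stream) of the best-matching subsequence."""
    D = accumulated_cost(query, stream, cost, free_start=True)
    n, m = len(query), len(stream)
    end = min(range(1, m + 1), key=lambda j: D[n][j])
    return D[n][end], end - 1


total, path = dtw([1, 2, 3, 3, 4], [1, 2, 2, 3, 4])
best_cost, end_index = subsequence_dtw([1, 2, 3], [5, 5, 1, 2, 3, 5])
```

Both variants fill the same matrix and differ only in the boundary condition: the subsequence version allows a free start in the long stream and reads the result from the minimum of the last row instead of the corner cell.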

Music Synchronization

Modern digital music libraries contain textual, visual, and audio data. Recall that musical information is represented in diverse data formats which, depending upon the respective application, differ fundamentally in their structure and content. In this chapter, we introduce various synchronization tasks to automatically link different data streams given various formats (score, MIDI, audio) that represent the same piece of music (Sect. 5.1). Particularly, two different synchronization procedures are described in detail. First, we present an efficient and robust multiscale DTW approach for time-aligning two different CD recordings of the same piece (Sect. 5.2). Using chroma-based audio features, our algorithm yields good synchronization results for harmony-based music at a reasonable resolution level that is sufficient in view of music retrieval and navigation applications. Second, we discuss an algorithm for score-audio synchronization, which aligns the musical onset times given by a score with their physical occurrences in a CD recording of the same piece (Sect. 5.3). Using semantically meaningful onset features, this algorithm works particularly well for piano music and yields alignments at a high temporal resolution. In Sect. 5.4, we describe possible research directions, give further references to the literature, and discuss some problems related to music synchronization. The first three sections of this chapter closely follow [5], [142], and [141], respectively.

Keywords: Dynamic Time Warp; Audio Feature; Synchronization Algorithm; Synchronization Task; Note Onset.

Part I - Analysis and Retrieval Techniques for Music Data | Pp. 85-108

Audio Matching

In the context of music retrieval, the query-by-example paradigm has attracted a large amount of attention: given a query in the form of a music excerpt, the task is to automatically retrieve all excerpts from the database containing parts or aspects which are somehow similar to the query. This problem is particularly difficult for digital waveform-based audio data such as CD recordings. Because of the complexity of such data, the notion of similarity used to compare different audio clips is a delicate issue and largely depends on the respective application as well as on the user requirements. In this chapter, we consider the problem of audio matching. Here the goal is to retrieve all audio clips from the database that in some sense represent the same musical content as the query clip. This is typically the case when the same piece of music is available in several interpretations and arrangements. For example, given a 20-s excerpt of Bernstein’s interpretation of the theme of Beethoven’s Fifth Symphony, the goal is to find all other corresponding audio clips in the database; this includes the repetition in the exposition or in the recapitulation within the same interpretation as well as the corresponding excerpts in all recordings of the same piece conducted, e.g., by Karajan or Sawallisch. Even more challenging is to also include arrangements such as Liszt’s piano transcription of Beethoven’s Fifth or a synthesized version of a corresponding MIDI file. Obviously, the degree of difficulty increases with the degree of variations one wants to permit in the audio matching.

Keywords: Quantization Function; Inverted List; Audio Clip; Chroma Index; Query Length.

Part I - Analysis and Retrieval Techniques for Music Data | Pp. 109-139
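
One simple way to picture query-by-example matching on feature sequences is a sliding-window comparison: the query is shifted over the database sequence, and each position receives a distance based on the average cosine similarity of the aligned feature vectors. The sketch below assumes this particular distance and uses hypothetical three-dimensional toy vectors instead of real chroma features. It does not handle tempo differences and it is not the index-based method suggested by the chapter keywords.

```python
import math


def normalize(v):
    """Scale a feature vector to unit Euclidean length (zero vectors stay as-is)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def matching_curve(query, database):
    """Distance 1 - (mean cosine similarity) for every window start position."""
    q = [normalize(v) for v in query]
    d = [normalize(v) for v in database]
    m = len(q)
    curve = []
    for start in range(len(d) - m + 1):
        similarity = sum(
            sum(a * b for a, b in zip(q[k], d[start + k])) for k in range(m)
        ) / m
        curve.append(1.0 - similarity)
    return curve


# Toy data: each vector stands in for one feature frame.
database = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
query = [[0, 0, 2], [3, 0, 0]]  # same content as frames 2..3, different loudness

curve = matching_curve(query, database)
best_start = min(range(len(curve)), key=lambda i: curve[i])
```

Positions where the curve drops close to zero are match candidates. Normalizing each frame makes the comparison insensitive to overall loudness, one of the variations a matching procedure may want to ignore.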

Audio Structure Analysis

The alignments and crosslinks obtained from audio synchronization (Chap. 5) and audio matching (Chap. 6) can be used to conveniently switch between different versions of a piece of music (interdocument navigation). We will now address the problem of audio structure analysis, which lays the basis for intradocument navigation. One major goal of the structural analysis of an audio recording is to automatically extract the repetitive structure or, more generally, the musical form of the underlying piece of music. Recent approaches such as [14,37,46,49,81,127,129,139,161] work well for music where the repetitions largely agree with respect to instrumentation and tempo, as is typically the case for popular music. For other classes of music including Western classical music, however, musically similar audio segments may exhibit significant variations in parameters such as dynamics, timbre, execution of note groups, musical key, articulation, and tempo progression. In this chapter, we propose a robust and efficient algorithm for structure analysis that identifies musically similar segments. To obtain a flexible and robust algorithm, the idea is to simultaneously account for possible variations at various stages and levels. At the feature level, we use coarse chroma-based audio features that absorb microvariations. To cope with local variations, we design an advanced cost measure by integrating contextual information (Sect. 7.2). Finally, we describe a new strategy for structure extraction that can cope with more global variations (Sects. 7.3 and 7.4). Our experimental results with classical and popular music show that our algorithm performs successfully even in the presence of significant musical variations (Sect. 7.5). In Sect. 7.1, we start by summarizing a general strategy for audio structure analysis and introduce some notation that is used throughout this chapter. Related work and future research directions will be discussed in Sect. 7.6.
In this chapter, we closely follow [139]. The enhancement strategy of self-similarity matrices by introducing a contextual local cost measure was first described in [137].

Keywords: Audio Signal; Path Structure; Popular Music; Musical Structure; Path Relation.

Part I - Analysis and Retrieval Techniques for Music Data | Pp. 141-168
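
The notion of a self-similarity matrix with a contextual cost measure can be illustrated as follows. In this sketch, which is a simplification built on stated assumptions, the local cost of a frame pair is replaced by the average cost along the diagonal over a short window, so that repeated segments show up as clearer low-cost diagonal stripes. The scalar toy features, the window length, and the function names are hypothetical, and the sketch does not compensate for tempo differences.

```python
def cost_matrix(features, cost):
    """Self-distance matrix: local cost between every pair of frames."""
    n = len(features)
    return [[cost(features[i], features[j]) for j in range(n)] for i in range(n)]


def contextual_cost_matrix(C, length):
    """Average C along the diagonal direction over up to `length` frames.

    Near the matrix border the window is shortened to the frames available.
    """
    n = len(C)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            steps = min(length, n - i, n - j)
            out[i][j] = sum(C[i + k][j + k] for k in range(steps)) / steps
    return out


# Toy feature sequence in which the segment (0, 1, 2) is repeated once.
features = [0, 1, 2, 0, 1, 2]
C = cost_matrix(features, lambda a, b: abs(a - b))
enhanced = contextual_cost_matrix(C, length=3)
```

In the enhanced matrix, the cell pairing frame 0 with frame 3 stays at zero because the following frames also agree, while isolated coincidences are averaged with their differing neighbors and therefore stand out less.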

SyncPlayer: An Advanced Audio Player

In the previous chapters, we have discussed various MIR techniques and algorithms for automatically generating annotations and linking structures of interrelated music data. The generated data can be used to support inter- and intradocument browsing and retrieval in complex and inhomogeneous music collections, thus allowing users to discover and explore music in an intuitive and multimodal way. To demonstrate the potential of our MIR techniques, we have developed the SyncPlayer system [114], which is a client-server based advanced audio player. The SyncPlayer integrates novel functionalities for multimodal presentation of audio as well as symbolic data and comprises a search engine for lyrics and other metadata. In Sect. 8.1, we give an overview of the SyncPlayer system, which consists of a server as well as a client component. The server component, as will be described in Sect. 8.2, includes functionalities such as audio identification, data retrieval, and data delivery. In contrast, the client component constitutes the user front end of the system and provides the user interfaces for the services offered by the server (Sect. 8.3). A discussion of related work and possible extensions of our system can be found in Sect. 8.4. A demo version of the SyncPlayer is available at [199].

Keywords: Audio Recording; Short Time Fourier Transform; Audio Segment; Lyric Annotation; Music Collection.

Part I - Analysis and Retrieval Techniques for Music Data | Pp. 169-183

Fundamentals on Motion Capture Data

The second part of this monograph deals with content-based analysis and retrieval of 3D motion capture data as used in computer graphics for animating virtual human characters. In this chapter, we provide the reader with some fundamental facts on motion representations. We start with a short introduction on motion capturing and introduce a mathematical model for the motion data as used throughout the subsequent chapters (Sect. 9.1). We continue with a detailed discussion of general similarity aspects that are crucial in view of motion comparison and retrieval (Sect. 9.2). Then, in Sect. 9.3, we formally introduce the concept of kinematic chains, which are generally used to model flexibly linked rigid bodies such as robot arms or human skeletons. Kinematic chains are parameterized by joint angles, which in turn can be represented in various ways. In Sect. 9.4, we describe and compare three important angle representations based on rotation matrices, Euler angles, and quaternions. Each of these representations has its strengths and weaknesses depending on the respective analysis or synthesis application.

Keywords: Euler Angle; Kinematic Chain; Unit Quaternion; Forward Kinematic; Motion Capture Data.

Part II - Analysis and Retrieval Techniques for Motion Data | Pp. 187-209
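
The relation between joint rotations and joint positions in a kinematic chain can be sketched with unit quaternions. The example below assumes a single open chain rooted at the origin, quaternions stored as (w, x, y, z) tuples, and each joint rotation given relative to its parent. The function names and the two-bone test chain are hypothetical and serve only to illustrate forward kinematics.

```python
import math


def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation by `angle` radians about `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2) / norm
    return (math.cos(angle / 2), ax * s, ay * s, az * s)


def quat_mul(p, q):
    """Hamilton product of two quaternions."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )


def rotate(q, v):
    """Rotate the 3D vector v by the unit quaternion q (computes q v q*)."""
    w, x, y, z = q
    conjugate = (w, -x, -y, -z)
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, v[0], v[1], v[2])), conjugate)
    return (rx, ry, rz)


def forward_kinematics(joint_rotations, bone_offsets):
    """Joint positions of an open chain; rotations are relative to the parent."""
    position = (0.0, 0.0, 0.0)
    orientation = (1.0, 0.0, 0.0, 0.0)
    positions = [position]
    for q, offset in zip(joint_rotations, bone_offsets):
        orientation = quat_mul(orientation, q)  # accumulate down the chain
        step = rotate(orientation, offset)
        position = tuple(p + s for p, s in zip(position, step))
        positions.append(position)
    return positions


# Two unit-length bones along x, each joint bent by 90 degrees about z.
bend = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
joints = forward_kinematics([bend, bend], [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
```

Accumulating the rotations by quaternion multiplication avoids building rotation matrices explicitly, one of the practical trade-offs between the angle representations compared in the chapter.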

DTW-Based Motion Comparison and Retrieval

As we have seen in Chap. 4, dynamic time warping is a flexible tool for comparing time series in the presence of nonlinear time deformations. In this context, the choice of suitable local cost or distance measures is of crucial importance, since they determine the kind of (spatial) similarity between the elements (frames) of the two sequences to be aligned. For the mocap domain, we introduce two conceptually different local distance measures – one based on joint angle parameters and the other based on 3D coordinates – and discuss their respective strengths and weaknesses (Sect. 10.1). The importance of DTW is then illustrated by some synthesis and analysis applications (Sect. 10.2). By comparing a motion data stream to itself, one obtains a cost or distance matrix that exhibits self-similarities within the motion. In Sect. 10.3, we describe how this idea can be exploited for motion retrieval. Finally, in Sect. 10.4, we discuss some work related to DTW-based motion retrieval.

Keywords: Point Cloud; Dynamic Time Warping; Cost Matrix; Unit Quaternion; Walk Motion.

Part II - Analysis and Retrieval Techniques for Motion Data | Pp. 211-226
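
A local distance measure based on joint angle parameters can be sketched by comparing the unit quaternions of corresponding joints. The version below assumes poses given as lists of (w, x, y, z) unit quaternions, uses the angle between quaternions scaled to the range [0, 1], and weights all joints equally unless weights are supplied. These details are assumptions for illustration and may differ from the measure defined in the chapter. The 3D-coordinate-based alternative is not shown.

```python
import math


def quat_distance(q1, q2):
    """Rotation distance in [0, 1] between two unit quaternions.

    The absolute value of the inner product makes q and -q, which describe
    the same rotation, have distance 0.
    """
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    return (2 / math.pi) * math.acos(min(1.0, dot))  # clamp rounding errors


def pose_distance(pose_a, pose_b, weights=None):
    """Weighted sum of per-joint quaternion distances between two poses."""
    if weights is None:
        weights = [1.0 / len(pose_a)] * len(pose_a)
    return sum(w * quat_distance(a, b) for w, a, b in zip(weights, pose_a, pose_b))


identity = (1.0, 0.0, 0.0, 0.0)
half_turn_z = (0.0, 0.0, 0.0, 1.0)  # rotation by 180 degrees about z

# Hypothetical two-joint poses that differ in the second joint only.
pose_a = [identity, identity]
pose_b = [identity, half_turn_z]
distance = pose_distance(pose_a, pose_b)
```

A function of this kind can serve as the local cost in a DTW alignment of two motion sequences, or be evaluated on all frame pairs of one sequence to obtain the self-distance matrix mentioned in the abstract.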