Publications catalog - books


Advances in Multimedia Information Processing: 6th Pacific Rim Conference on Multimedia, Jeju Island, Korea, November 11-13, 2005, Proceedings, Part I

Yo-Sung Ho ; Hyoung Joong Kim (eds.)

Conference: 6th Pacific-Rim Conference on Multimedia (PCM). Jeju Island, South Korea. November 13, 2005 - November 16, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Multimedia Information Systems; Information Storage and Retrieval; Computer Communication Networks; Information Systems Applications (incl. Internet); Computer Graphics; Image Processing and Computer Vision

Availability

Detected institution	Publication year	Browse	Download	Request
Not detected	2005	SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-30027-4

Electronic ISBN

978-3-540-32130-9

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2005

Publication rights information

Subject coverage

Computer and information sciences

Table of contents

Check that your institution gives you access to download or request the complete book or any of its chapters.

doi: 10.1007/11581772_61

Classification of Audio Signals Using Gradient-Based Fuzzy c-Means Algorithm with Divergence Measure

Dong-Chul Park; Duc-Hoai Nguyen; Seung-Hwa Beack; Sancho Park

Multimedia databases usually store thousands of audio files such as music, speech and other sounds. One of the challenges in modern multimedia systems is to classify and retrieve certain kinds of audio from the database. This paper proposes a novel classification algorithm for content-based audio retrieval. The algorithm, called Gradient-Based Fuzzy c-Means Algorithm with Divergence Measure (GBFCM(DM)), is a neural network-based algorithm which utilizes the Divergence Measure to exploit the statistical nature of the audio data to improve the classification accuracy. Experimental results confirm that the proposed algorithm improves accuracy by 3.025%-5.05% over conventional algorithms such as the k-Means or the Self-Organizing Map.

Keywords: Discrete Wavelet Transform; Audio Signal; Audio Data; Code Vector; Multimedia Database.

Pp. 698-708
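
As a rough illustration of the idea in the abstract above, the following sketch swaps a divergence between Gaussian feature models into the standard fuzzy c-means membership formula. It is not the authors' GBFCM(DM) update rule: the symmetric Kullback-Leibler divergence, the diagonal-covariance assumption, and the function names are all assumptions made for this example.

```python
import numpy as np

def sym_kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    mu_p = np.asarray(mu_p, dtype=float)
    var_p = np.asarray(var_p, dtype=float)
    mu_q = np.asarray(mu_q, dtype=float)
    var_q = np.asarray(var_q, dtype=float)
    d2 = (mu_p - mu_q) ** 2
    # KL(p||q) + KL(q||p); the log-variance terms cancel in the sum.
    return 0.5 * np.sum(var_p / var_q + var_q / var_p - 2.0
                        + d2 * (1.0 / var_p + 1.0 / var_q))

def fuzzy_memberships(divs, m=2.0, eps=1e-12):
    """Standard fuzzy c-means memberships, with each divergence
    used in place of a squared distance (assumption of this sketch)."""
    divs = np.asarray(divs, dtype=float) + eps  # avoid division by zero
    inv = divs ** (-1.0 / (m - 1.0))
    return inv / inv.sum()
```

Given one audio item and a set of cluster prototypes, each described by a mean and variance vector, the divergences to all prototypes can be passed to `fuzzy_memberships` to obtain soft cluster assignments that sum to one.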

doi: 10.1007/11581772_62

Variable Bit Quantization for Virtual Source Location Information in Spatial Audio Coding

Sang Bae Chon; In Yong Choi; Jeongil Seo; Koeng-Mo Sung

In [1,2,3], Binaural Cue Coding (BCC) was introduced for multi-channel spatial rendering in MPEG-4 SAC (Spatial Audio Coding) to reduce the bitrate of multi-channel audio signals. In [4,5], Virtual Source Location Information (VSLI) was introduced to replace the Inter-Channel Level Difference, the most decisive parameter in a BCC system. Here, Variable Bit Quantization (VBQ) for VSLI is proposed to reduce the bitrate at the quantization block in VSLI-based BCC systems by removing the statistically invalid range.

Keywords: Sound Source; Audio Signal; Global Vector; Audio Code; Left Center.

Pp. 709-719

doi: 10.1007/11581772_63

The Realtime Method Based on Audio Scenegraph for 3D Sound Rendering

Jeong-Seon Yi; Suk-Jeong Seong; Yang-Hee Nam

Recent studies have shown that the combination of auditory and visual cues enhances the sense of immersion in virtual reality or interactive entertainment applications. However, realtime 3D audiovisual rendering has a high computational cost. In this paper, to reduce realtime computation, we suggest a novel framework for optimized 3D sound rendering, in which we define an Audio Scenegraph that contains reduced 3D scene information and the parameters necessary for computing early reflections of sound. During the pre-computation phase of our framework, graphic reduction and sound source reduction are carried out according to the environment, which contains a complex 3D scene, sound sources, and a listener. That is, the complex 3D scene is reduced to a set of facets significant for sound rendering, and the resulting scene is represented as the Audio Scenegraph we defined. The graph is then transmitted to the sound engine, which clusters a number of sound sources to reduce the realtime calculation of sound propagation. Sound source reduction requires estimating the early reflection time to test perceptual culling, and clustering the sounds that can reach the facets of each sub space according to the estimation results. During the realtime phase, according to the position, direction and sub-space index of the listener, sounds inside the sub space are played by the image method, and sounds outside the sub space are played by assigning clustered sounds to buffers. Even if the number of sounds increases, realtime calculation is very stable because most calculations about sounds can be performed offline. With this method, 3D sound rendering took a very consistent time regardless of the complexity of the 3D scene, even with hundreds of sound sources. As a future study, the perceptual acceptance of the grouping algorithm needs to be estimated by a user test.

Keywords: Sound Source; Sound Propagation; Graphic Reduction; Graphic Engine; Sound Engine.

Pp. 720-730
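
The image method mentioned in the abstract above can be sketched as follows for a single first-order reflection: the source is mirrored across a reflecting plane, and the reflected path length equals the distance from that mirror image to the listener. This is a generic textbook sketch, not the paper's Audio Scenegraph pipeline; the function names and the speed-of-sound constant are assumptions, and the code does not check whether the reflection point actually lies within a finite facet.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def image_source(source, plane_point, plane_normal):
    """Mirror a source position across an infinite plane."""
    n = np.asarray(plane_normal, dtype=float)
    n = n / np.linalg.norm(n)
    s = np.asarray(source, dtype=float)
    # Signed distance from the source to the plane along the normal.
    dist = np.dot(s - np.asarray(plane_point, dtype=float), n)
    return s - 2.0 * dist * n

def first_reflection_delay(source, listener, plane_point, plane_normal):
    """Travel time in seconds of the first-order reflection off the plane."""
    img = image_source(source, plane_point, plane_normal)
    path = np.linalg.norm(img - np.asarray(listener, dtype=float))
    return path / SPEED_OF_SOUND
```

For example, a source at (0, 0, 1) and a listener at (3, 0, 3) above the floor plane z = 0 give an image source at (0, 0, -1) and a reflected path of 5 m.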

doi: 10.1007/11581772_64

Dual-Domain Quantization for Transform Coding of Speech and Audio Signals

Jun-Seong Hong; Jong-Hyun Choi; Chang-Beom Ahn; Chae-Bong Sohn; Seoung-Jun Oh; Hochong Park

A new quantization method for transform coding of speech and audio signals is proposed. The spectral coefficients obtained by the first transform are split into frequency bands, and those of each band are transformed again on a band basis, resulting in another set of coefficients for each band. Then, the efficiency of Huffman coding in the two transform domains is analyzed on a band basis, and the domain with better performance is selected for each band as the final quantization domain. In addition, a set of frequently occurring domain selection patterns is pre-defined in order to decrease the number of side-information bits needed to indicate the selected domains. The proposed quantization method based on the dual-domain approach is applied to the ITU G.722.1 signal codec, and the improvement in quantization performance for various speech and audio signals is verified.

Keywords: Quantization Method; Audio Signal; Selection Mode; Quantization Performance; Huffman Code.

Pp. 731-741
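
The per-band domain selection described in the abstract above might look roughly like the sketch below: for each band, the bit cost of the quantized coefficients is estimated in both domains and the cheaper domain is chosen. The code-length function is an Exp-Golomb-style stand-in, assumed here in place of the codec's actual Huffman tables, and the pre-defined selection patterns mentioned in the abstract are not modeled.

```python
def code_length_bits(q):
    """Hypothetical code length for one quantized integer coefficient:
    Exp-Golomb-style length for |q| plus one sign bit when q is nonzero.
    A stand-in for real Huffman tables."""
    mag = abs(q)
    bits = 2 * ((mag + 1).bit_length() - 1) + 1
    return bits + (1 if mag else 0)

def select_domains(first_domain_bands, second_domain_bands):
    """For each band, return 0 if the first transform domain is cheaper
    to code (ties included), otherwise 1 for the second domain."""
    choices = []
    for band_a, band_b in zip(first_domain_bands, second_domain_bands):
        cost_a = sum(code_length_bits(q) for q in band_a)
        cost_b = sum(code_length_bits(q) for q in band_b)
        choices.append(0 if cost_a <= cost_b else 1)
    return choices
```

A band whose energy is concentrated in one coefficient after the second transform costs fewer bits there, so the second domain would be selected for that band.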

doi: 10.1007/11581772_65

A Multi-channel Audio Compression Method with Virtual Source Location Information

Han-gil Moon; Jeong-il Seo; Seungkwon Beak; Koeng-Mo Sung

Binaural cue coding (BCC) was introduced as an efficient representation method for MPEG-4 SAC (Spatial Audio Coding). However, in a low bit-rate environment, the spectrum of BCC output signals degrades with respect to the perceptual level. The system proposed in this paper estimates VSLI (virtual source location information) as the side information. The VSLI is the angle representation of spatial images between channels on the playback layout. The subjective assessment results show that the proposed method provides better audio quality than the BCC method for encoding multi-channel signals.

Keywords: Audio Signal; Side Information; Reference Channel; Spatial Image; Angle Information.

Pp. 742-753

doi: 10.1007/11581772_66

A System for Detecting and Tracking Internet News Event

Zhen Lei; Ling-da Wu; Ying Zhang; Yu-chi Liu

News event detection is the task of discovering relevant, yet previously unreported real-life events and reporting them to users in human-readable form, while event tracking aims to automatically assign event labels to news stories when they arrive. A new method and system for performing the event detection and tracking task is proposed in this paper. The event detection and tracking method is based on subject extraction and an improved support vector machine (SVM), in which subject concepts can concisely and precisely express the meaning of a longer text. The improved SVM first prunes the negative examples, retaining or deleting each negative sample according to distance and class label, then trains on the new set with SVM to obtain a classifier and maps the SVM outputs into probabilities. The experimental results on real-world data sets indicate that the proposed method is feasible and advanced.

Pp. 754-764
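
The abstract above does not say how the SVM outputs are mapped into probabilities. One common choice is a sigmoid fitted to the decision scores (Platt-style scaling), sketched below under that assumption. The plain gradient-descent fit, the parameter names, and the sign convention p = sigmoid(a*f + b) are choices made for this example and are not taken from the paper.

```python
import numpy as np

def fit_sigmoid(scores, labels, lr=0.1, steps=2000):
    """Fit p = sigmoid(a * score + b) to SVM decision scores by
    minimizing the negative log-likelihood with gradient descent.
    labels: 1 for positive examples, 0 for negative examples."""
    f = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    a, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * f + b)))
        a -= lr * np.mean((p - y) * f)  # gradient with respect to a
        b -= lr * np.mean(p - y)        # gradient with respect to b
    return a, b

def to_probability(score, a, b):
    """Map one SVM decision score to a probability of the positive class."""
    return 1.0 / (1.0 + np.exp(-(a * score + b)))
```

The fit should use held-out decision scores rather than the SVM's own training set, otherwise the resulting probabilities tend to be overconfident.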

doi: 10.1007/11581772_67

A Video Summarization Method for Basketball Game

Eui-Jin Kim; Gwang-Gook Lee; Cheolkon Jung; Sang-Kyun Kim; Ji-Yeun Kim; Whoi-Yul Kim

There have been various research efforts on automatic summarization of sports video. However, most previous works were based on event detection and thus cannot reflect the semantic importance of scenes and content of a game. In this paper, a summarization method for basketball video is presented. The proposed method keeps track of score changes of the game by reading the numbers on the score board. Analysis of the score variation yields a video summary that consists of semantically important and interesting scenes such as reversal or pursuit. Experimental results indicate that the proposed method can summarize basketball video with reasonable accuracy.

Keywords: Candidate Region; False Recognition; Dominant Color; Video Summarization; Sport Video.

Pp. 765-775
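
To illustrate the score-variation analysis described in the abstract above, the sketch below finds "reversal" moments, that is, timestamps at which the leading team changes, from a list of scoreboard readings. The input format of (time, home score, away score) tuples and the function name are assumptions; the paper's scoreboard recognition step and its other scene types, such as pursuit, are not modeled.

```python
def find_reversals(score_readings):
    """Return the timestamps at which the leading team changes.

    score_readings: iterable of (time, home_score, away_score) tuples
    in chronological order. Ties do not count as a change of leader."""
    reversals = []
    last_leader = 0  # +1 home leads, -1 away leads, 0 nobody has led yet
    for t, home, away in score_readings:
        diff = home - away
        leader = (diff > 0) - (diff < 0)
        if leader != 0:
            if last_leader != 0 and leader != last_leader:
                reversals.append(t)
            last_leader = leader
    return reversals
```

Clips around the returned timestamps would be natural candidates for a summary, since they mark the points where the lead changes hands.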

doi: 10.1007/11581772_68

Improvement of Commercial Boundary Detection Using Audiovisual Features

Jun-Cheng Chen; Jen-Hao Yeh; Wei-Ta Chu; Jin-Hau Kuo; Ja-Ling Wu

Detection of commercials in TV videos is difficult because their diversity sets a high barrier to constructing an appropriate model. In this work, we try to deal with this problem through a top-down approach. We take into account the domain knowledge of commercial production and extract features that describe the characteristics of commercials. Based on clues from speech-music discrimination, video scene detection, and caption detection, a multi-modal commercial detection scheme is proposed. Experimental results show good performance of the proposed scheme in detecting commercials in news and talk show programs.

Keywords: Video Scene; News Program; Caption Ratio; Caption Text; Boundary Refinement.

Pp. 776-786

doi: 10.1007/11581772_69

Automatic Dissolve Detection Scheme Based on Visual Rhythm Spectrum

Seong Jun Park; Kwang-Deok Seo; Jae-Gon Kim; Samuel Moon-Ho Song

The automatic video parser, a necessary tool for the development and maintenance of a video library, must accurately detect video scene changes so that the resulting video clips can be indexed in some fashion and stored in a video database. Abrupt scene changes and wipes are detected fairly well. However, dissolve changes have often been missed. In this paper, we propose a robust dissolve detection scheme based on Visual Rhythm Spectrum. The Visual Rhythm Spectrum contains distinctive patterns or visual features for many different types of video effects. The efficiency of the proposed scheme is demonstrated using a number of video clips and some performance comparisons are made with other existing approaches.

Keywords: Window Size; Discrete Cosine Transform; Visual Feature; Video Clip; Inverse Discrete Cosine Transform.

Pp. 787-798

doi: 10.1007/11581772_70

A Study on the Relation Between the Frame Pruning and the Robust Speaker Identification with Multivariate t-Distribution

Younjeong Lee; Joohun Lee; Hernsoo Hahn

In this paper, we performed robust speaker identification based on frame pruning and on the multivariate t-distribution, respectively, and then studied a theoretical basis for frame pruning using the other methods. Based on the results from the two methods, we showed that the robust algorithms based on frame weights provide the theoretical basis of the frame pruning method, by considering the correspondence between the weight in frame pruning and the conditional expectation of the t-distribution. Both methods showed good performance when coping with outliers occurring in a given time period; the frame pruning method, which removes less reliable frames, is recommended as one good method, and multivariate t-distributions are generally used instead of Gaussian mixture models (GMM) as a robust approach to speaker identification. In experiments, we found that robust speaker identification has higher performance than the typical GMM algorithm. Moreover, we showed that the trend of frame likelihood using frame pruning is similar to that of the robust algorithms.

Keywords: Gaussian Mixture Model; Conditional Expectation; Robust Algorithm; Clean Speech; Speaker Identification.

Pp. 799-808
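
Frame pruning, as described in the abstract above, removes the least reliable frames before scoring a speaker model. A minimal sketch, assuming per-frame log-likelihoods have already been computed for each enrolled speaker, could look like this. The pruning fraction, the dictionary input format, and the function names are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def pruned_score(frame_loglik, prune_fraction=0.1):
    """Average log-likelihood after discarding the lowest-scoring frames."""
    ll = np.sort(np.asarray(frame_loglik, dtype=float))
    n_drop = int(len(ll) * prune_fraction)  # frames treated as outliers
    return ll[n_drop:].mean()

def identify(frame_loglik_by_speaker, prune_fraction=0.1):
    """Return the speaker whose model gives the best pruned score.

    frame_loglik_by_speaker: dict mapping a speaker name to that
    model's per-frame log-likelihoods for the test utterance."""
    return max(
        frame_loglik_by_speaker,
        key=lambda s: pruned_score(frame_loglik_by_speaker[s], prune_fraction),
    )
```

With a single badly corrupted frame, the unpruned average can favor the wrong speaker, while dropping the worst 10% of frames restores the expected decision.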