Publications catalog - books


Advances in Multimedia Information Processing: 7th Pacific Rim Conference on Multimedia, Hangzhou, China, November 2-4, 2006, Proceedings

Yueting Zhuang ; Shi-Qiang Yang ; Yong Rui ; Qinming He (eds.)

In conference: 7th Pacific-Rim Conference on Multimedia (PCM). Hangzhou, China. November 2-4, 2006

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Computer Applications; Multimedia Information Systems; Information Storage and Retrieval; Computer Communication Networks; Information Systems Applications (incl. Internet); Image Processing and Computer Vision

Detected institution: Not detected
Year of publication: 2006
Browse: SpringerLink


Resource type:


Print ISBN


Electronic ISBN


Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Expressive Speech Recognition and Synthesis as Enabling Technologies for Affective Robot-Child Communication

Selma Yilmazyildiz; Wesley Mattheyses; Yorgos Patsis; Werner Verhelst

This paper presents our recent and ongoing work on expressive speech synthesis and recognition as enabling technologies for affective robot-child interaction. We show that current expression recognition systems can be used to discriminate between several archetypal emotions, but also that the old adage “there’s no data like more data” is more valid than ever in this field. We developed a new speech synthesizer capable of high-quality concatenative synthesis. The robot will use this system to synthesize expressive nonsense speech through prosody transplantation and a recorded database of expressive speech examples. With these enabling components in place, we are preparing experiments toward effective child-machine communication of affect and emotion.

Pp. 1-8

DBN Based Models for Audio-Visual Speech Analysis and Recognition

Ilse Ravyse; Dongmei Jiang; Xiaoyue Jiang; Guoyun Lv; Yunshu Hou; Hichem Sahli; Rongchun Zhao

We present an audio-visual automatic speech recognition system that significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system consists of three components: (i) a visual module, (ii) an acoustic module, and (iii) a Dynamic Bayesian Network-based recognition module. The visual module locates and tracks the speaker's head and mouth movements and extracts relevant speech features, represented by contour information and 3D deformations of lip movements. The acoustic module extracts noise-robust features, i.e. the Mel Filterbank Cepstrum Coefficients (MFCCs). Finally, we propose two models based on Dynamic Bayesian Networks (DBN), one that treats the audio and video streams separately and one that integrates the features from both streams. We also compare the proposed DBN-based system with a classical Hidden Markov Model. The novelty of the framework is that the characteristics of the audiovisual speech signal persist from the extraction step through the learning step. Experiments on continuous audiovisual speech show that the segmentation boundaries of phones in the audio stream and visemes in the video stream are close to manual segmentation boundaries.

Pp. 19-30

An Extensive Method to Detect the Image Digital Watermarking Based on the Known Template

Yang Feng; Senlin Luo; Limin Pan

There are many types of digital watermarking algorithms, and each type requires its own detection method. However, the embedding method is usually unknown, so it is not possible to know whether hidden information exists. This paper proposes an extensive digital watermark detection method based on a known template. The method extracts feature parameters from the spatial, DCT and DWT domains of the image and the template, and then applies detection strategies to those parameters to detect the watermark. Experimental results show that the correct detection rate is more than 97%. The results indicate that the extensive detection method is feasible and valuable in both theory and practice.

Pp. 31-40

Fast Mode Decision Algorithm in H.263+/H.264 Intra Transcoder

Min Li; Guiming He

In this paper, we propose a fast transform-domain mode decision algorithm for an H.263+ to H.264 intra transcoder. In the transcoder, the residual signals carried by the H.263+ bitstream are compared against a threshold to decide whether to reuse the prediction direction provided by H.263+ or to re-estimate it. The DCT coefficients in the H.263+ bitstream are then converted to H.264 transform coefficients entirely in the transform domain. Finally, using the new prediction mode and direction, the H.264 transform residual coefficients are coded to generate the H.264 bitstream. Simulation results show that the performance of the proposed algorithm is close to that of a cascaded pixel-domain transcoder (CPDT), while the computational complexity of transcoding is significantly lower.

Pp. 41-47

Binary Erasure Codes for Packet Transmission Subject to Correlated Erasures

Frederik Vanhaverbeke; Frederik Simoens; Marc Moeneclaey; Danny De Vleeschauwer

We design simple binary codes that are well suited to reconstructing erased packets over a transmission medium characterized by correlation between successive erasures. We demonstrate the effectiveness of these codes for the transmission of HDTV video packets over a DSL connection.

Pp. 48-55
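
This record does not describe the codes from "Binary Erasure Codes for Packet Transmission Subject to Correlated Erasures". As a rough illustration of why binary (XOR) codes can cope with bursts of consecutive erasures, here is a minimal sketch of interleaved XOR parity, a standard technique that is not necessarily the authors' construction. The names `encode`, `recover` and `depth` are illustrative assumptions.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(packets: list, depth: int) -> list:
    """Return `depth` parity packets; parity j covers packets j, j+depth, ...

    Assumes len(packets) >= depth and that all packets have equal length.
    """
    return [reduce(xor_bytes, packets[j::depth]) for j in range(depth)]


def recover(received: list, parity: list, depth: int) -> list:
    """Fill in erased packets (None), given at most one erasure per group."""
    out = list(received)
    for j in range(depth):
        group = range(j, len(out), depth)
        missing = [i for i in group if out[i] is None]
        if len(missing) > 1:
            raise ValueError(f"group {j} has more than one erasure")
        if missing:
            present = [out[i] for i in group if out[i] is not None]
            # XOR of the parity with all surviving packets yields the lost one.
            out[missing[0]] = reduce(xor_bytes, present, parity[j])
    return out
```

Because consecutive packets fall into different parity groups, a burst of up to `depth` consecutive erasures leaves at most one missing packet per group, provided the parity packets themselves arrive.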

Image Desynchronization for Secure Collusion-Resilient Fingerprint in Compression Domain

Zhongxuan Liu; Shiguo Lian; Zhen Ren

Collusion is a major threat to image fingerprinting. Recently, an idea was introduced for collusion-resilient fingerprinting by desynchronizing images in raw data. In this paper, we consider a compression-domain image desynchronization method and its system security. First, appropriate desynchronization forms for the compression domain are presented. Second, the system security is discussed and a secure scheme is proposed. Third, to evaluate the visual degradation caused by space desynchronization, we propose a metric called the Synchronized Degradation Metric (SDM). Performance analysis, including experiments, indicates the effectiveness of the proposed scheme and the metric.

Pp. 56-63

Euclidean Distance Transform of Digital Images in Arbitrary Dimensions

Dong Xu; Hua Li

A new algorithm for the Euclidean distance transform is proposed in this paper. It propagates from the boundary to the interior of the object layer by layer, like the inverse propagation of a water wave. It can be applied in a space of any dimension and has linear time complexity. Euclidean distance transforms of 2-D and 3-D digital images are computed in the experiments. Voronoi diagrams and Delaunay triangulations can also be produced by this method.

Pp. 72-79
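
To make concrete what the Euclidean distance transform in the entry above computes, here is a brute-force reference sketch that works in any number of dimensions. It is not the layer-by-layer, linear-time algorithm from the paper; it simply applies the definition, taken here as the distance from each object point to the nearest background point. The function name and the point-set representation are assumptions made for illustration.

```python
import itertools
import math


def brute_force_edt(shape, object_points):
    """Distance from each object point to the nearest background point.

    shape: grid size per axis, e.g. (5, 5) or (3, 3, 3).
    object_points: set of integer coordinate tuples belonging to the object.
    Returns a dict mapping every grid point to its Euclidean distance
    (0.0 for background points).
    """
    grid = list(itertools.product(*(range(n) for n in shape)))
    background = [p for p in grid if p not in object_points]
    if not background:
        raise ValueError("need at least one background point")
    return {
        p: min(math.dist(p, b) for b in background) if p in object_points else 0.0
        for p in grid
    }
```

The cost grows with the number of object points times the number of background points, which is what a linear-time method avoids. A brute-force version like this is mainly useful for checking a faster implementation on small inputs.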

JPEG2000 Steganography Possibly Secure Against Histogram-Based Attack

Hideki Noda; Yohsuke Tsukamizu; Michiharu Niimi

This paper presents two steganographic methods for JPEG2000 still images that preserve the histograms of discrete wavelet transform (DWT) coefficients. The first is a histogram quasi-preserving method using quantization index modulation (QIM) with a dead zone in the DWT domain. The second is a histogram-preserving method based on histogram matching using two quantizers with a dead zone. Compared with conventional JPEG2000 steganography, the two methods show better histogram preservation. The proposed methods are promising candidates for JPEG2000 steganography that is secure against histogram-based attacks.

Pp. 80-87
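
The entry above mentions quantization index modulation (QIM) with a dead zone. The following is a minimal sketch of plain QIM under assumed conventions: a bit is carried by the parity of the quantization index, and coefficients whose index is 0 (the dead zone) are left untouched. It does not implement the histogram-preserving refinements the paper proposes. The step size and all function names are illustrative assumptions.

```python
def embed_bit(coeff, bit, step):
    """Quantize coeff to a nearby nonzero index whose parity equals bit."""
    q = round(coeff / step)
    if q == 0:
        return coeff  # dead zone: coefficient carries no payload
    if abs(q) % 2 != bit:
        sign = 1 if q > 0 else -1
        toward_zero, away = q - sign, q + sign
        # Pick the closer neighbouring index, but never move into the dead zone.
        closer_inside = abs(coeff - toward_zero * step) <= abs(coeff - away * step)
        q = toward_zero if toward_zero != 0 and closer_inside else away
    return q * step


def extract_bit(coeff, step):
    """Return the embedded bit, or None for a dead-zone coefficient."""
    q = round(coeff / step)
    return None if q == 0 else abs(q) % 2


def embed(coeffs, bits, step):
    """Embed bits into the coefficients that lie outside the dead zone."""
    out, remaining = [], iter(bits)
    for c in coeffs:
        if round(c / step) == 0:
            out.append(c)
            continue
        b = next(remaining, None)
        # Once the payload is exhausted, leave coefficients unchanged.
        out.append(c if b is None else embed_bit(c, b, step))
    return out


def extract(coeffs, step):
    """Read one bit from every coefficient outside the dead zone."""
    bits = (extract_bit(c, step) for c in coeffs)
    return [b for b in bits if b is not None]
```

Embedding never creates or removes a zero index, so the extractor skips exactly the coefficients the embedder skipped. If the payload is shorter than the number of usable coefficients, `extract` also returns bits from the unmodified tail, so a real scheme would need to signal the payload length.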

A System for Generating Personalized Virtual News

Jian-Jun Xu; Jun Wen; Dan-Wen Chen; Yu-Xiang Xie; Ling-Da Wu

To improve the degree of immersion of strategic situation representation in strategic war gaming, this paper presents the concept of virtual news and a model for generating it automatically. By analyzing the characteristics of news video, we give the design and generation algorithm for a virtual news narrative template, which borrows ideas from Natural Language Processing and combines them with the particular features of news video. A narrative template revision algorithm based on time constraints is also proposed. Virtual news is generated automatically, driven by the narrative template, by retrieving relevant news segments from a multimedia database and selecting an appropriate representation method based on the EEDU (Extended Entity-Description-Utility) model. This approach can generate virtual news from a text description of the strategic situation provided by users, and can furthermore provide a personalized service for decision-makers. Finally, experimental results indicate the validity of our system.

Pp. 96-105

Image Fingerprinting Scheme for Print-and-Capture Model

Won-gyum Kim; Seon Hwa Lee; Yong-seok Seo

This paper addresses an image fingerprinting scheme for the print-and-capture model, performed with a photo printer and a digital camera. When an image is captured with a digital camera, various kinds of distortion, such as noise, geometrical distortion, and lens distortion, are applied slightly and simultaneously. We consider several steps to extract fingerprints from the distorted image in the print-and-capture scenario. To embed an ID into an image as a fingerprint, multi-bit embedding is applied: we embed a 64-bit ID as a fingerprint into the spatial domain of color images. To restore a captured image from distortions, a noise reduction filter is applied and a rectilinear tiling pattern is used as a template. To make the template, a multi-bit fingerprint is embedded repeatedly, like a tiling pattern, into the spatial domain of the image. Experiments show that the fingerprint is extracted successfully from an image captured by a digital camera.

Pp. 106-113