Publications catalog - books
Intelligent Multimedia Processing with Soft Computing
Yap-Peng Tan ; Kim Hui Yap ; Lipo Wang (eds.)
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Not available.
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2005 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-23053-3
Electronic ISBN
978-3-540-32367-9
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2005
Publication rights information
© Springer-Verlag Berlin Heidelberg 2005
Table of contents
Knowledge Extraction in Stereo Video Sequences Using Adaptive Neural Networks
Anastasios Doulamis
In this chapter, an adaptive neural network architecture is proposed for efficient knowledge extraction in video sequences. The system focuses on video object segmentation and tracking in stereoscopic video sequences. The proposed scheme includes: (a) a retraining algorithm for adapting the network weights to current conditions, (b) a semantically meaningful object extraction module for creating a retraining set and (c) a decision mechanism, which detects the time instances when a new network retraining is activated. The retraining algorithm optimally adapts the network weights by exploiting information about the current conditions while keeping the deviation of the network weights minimal. The algorithm results in the minimization of a convex function subject to linear constraints, and thus a single minimum exists. A description of the current conditions is provided by a segmentation fusion scheme, which appropriately combines color and depth information. Experimental results on real-life video sequences are presented to indicate the promising performance of the proposed adaptive neural network-based scheme.
Pp. 235-252
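The abstract above describes the retraining step only as a convex minimization under linear constraints. As a rough illustration, and not the chapter's actual algorithm, the sketch below assumes the simplest problem of that kind: find the weight change `dw` of smallest Euclidean norm that satisfies linear constraints `A dw = b`, whose closed-form solution is `dw = A^T (A A^T)^{-1} b`. The names `min_norm_update`, `solve_linear`, `A`, and `b` are hypothetical and do not come from the source.

```python
def solve_linear(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("constraints are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def min_norm_update(A, b):
    """Return the minimum-norm dw with A dw = b (assumed form of the constraints)."""
    m, n = len(A), len(A[0])
    # Gram matrix A A^T of the constraint rows.
    gram = [[sum(p * q for p, q in zip(A[i], A[j])) for j in range(m)] for i in range(m)]
    lam = solve_linear(gram, b)  # Lagrange multipliers
    # dw = A^T lam
    return [sum(A[i][k] * lam[i] for i in range(m)) for k in range(n)]


A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
b = [1.0, 1.0]
dw = min_norm_update(A, b)
```

Because the objective is strictly convex and the constraints are linear, this toy problem has exactly one minimizer, which mirrors the uniqueness remark in the abstract.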
An Efficient Genetic Algorithm for Small Search Range Problems and Its Applications
Ja-Ling Wu; Chun-Hung Lin; Chun-Hsiang Huang
Genetic algorithms have been applied to many optimization and search problems and shown to be very efficient. However, their efficiency is not guaranteed in applications where the search space is small, such as block motion estimation in video coding, or, equivalently, where the chromosome length is relatively short (less than 5, for example). Since the characteristics of these small search space applications differ markedly from those of the conventional search problems on which common genetic algorithms work well, new treatments of genetic algorithms for small search range problems are of interest. In this paper, the efficiency of the genetic operations of common genetic algorithms, such as crossover and mutation, is analyzed for this special situation. As expected, the resulting efficiency and performance of the genetic operations are quite different from those of their traditional counterparts. To fill this gap, a lightweight genetic search algorithm is presented that efficiently generates near-optimal solutions for these kinds of applications. The control overheads of the lightweight genetic search algorithm are very low compared with those of conventional genetic algorithms. Simulations show that many computations can be saved by applying the newly proposed algorithm while the search results remain acceptable.
Pp. 253-280
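To make the small-search-range setting above concrete, here is a minimal sketch of a generic genetic search for block motion estimation. It is not the lightweight algorithm proposed in the chapter, whose details the abstract does not give. The chromosome is assumed to be a motion vector `(dy, dx)` within `±search` pixels, the fitness is the sum of absolute differences (SAD), and all names and parameters (`ga_motion_search`, `pop_size`, `generations`, and so on) are hypothetical.

```python
import random


def sad(block, frame, top, left):
    """Sum of absolute differences between block and the frame patch at (top, left)."""
    return sum(abs(block[i][j] - frame[top + i][left + j])
               for i in range(len(block)) for j in range(len(block[0])))


def ga_motion_search(block, frame, origin, search=3, pop_size=6, generations=8, seed=0):
    """Generic elitist GA over motion vectors (dy, dx); origin must lie inside the frame."""
    rng = random.Random(seed)
    oy, ox = origin
    max_top = len(frame) - len(block)
    max_left = len(frame[0]) - len(block[0])

    def clamp(dy, dx):
        # Keep the vector inside the search window and inside the frame.
        dy = max(-search, min(search, dy))
        dx = max(-search, min(search, dx))
        dy = max(-oy, min(max_top - oy, dy))
        dx = max(-ox, min(max_left - ox, dx))
        return dy, dx

    def cost(mv):
        return sad(block, frame, oy + mv[0], ox + mv[1])

    # Seed the population with the zero vector plus random candidates.
    population = [(0, 0)] + [
        clamp(rng.randint(-search, search), rng.randint(-search, search))
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[:pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a[0], b[1])  # crossover: dy from one parent, dx from the other
            if rng.random() < 0.5:  # mutation: nudge each component by one pixel
                child = (child[0] + rng.choice((-1, 1)), child[1] + rng.choice((-1, 1)))
            children.append(clamp(*child))
        population = parents + children
    return min(population, key=cost)


# Toy data: an 8x8 frame and a 3x3 block cut from position (3, 4).
frame = [[(i * 7 + j * 13) % 31 for j in range(8)] for i in range(8)]
block = [row[4:7] for row in frame[3:6]]
best = ga_motion_search(block, frame, origin=(2, 3))
```

With a window of ±3 pixels there are only 49 candidate vectors, so the bookkeeping of a conventional genetic algorithm can easily cost more than an exhaustive scan; this is the kind of overhead the abstract says a lightweight variant aims to reduce.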
Manifold Learning and Applications in Recognition
Junping Zhang; Stan Z. Li; Jue Wang
A large amount of data with varying intrinsic features is empirically thought of as lying on a high-dimensional nonlinear manifold in the observation space. For different categories of objects, we present two recognition approaches: the combination of a manifold learning algorithm and linear discriminant analysis (MLA+LDA), and nonlinear auto-associative modeling (NAM). For recognition of similar objects, e.g. face recognition, MLA+LDA is used. Otherwise, NAM is employed for objects from largely different categories. Experimental results on different benchmark databases show the advantages of the proposed approaches.
Pp. 281-300
Face Recognition Using Discrete Cosine Transform and RBF Neural Networks
Weilong Chen; Meng Joo Er; Shiqian Wu
In this chapter, an efficient method for face recognition based on the Discrete Cosine Transform (DCT), Fisher’s Linear Discriminant (FLD) and Radial Basis Function (RBF) neural networks is presented. First, the dimensionality of the original face image is reduced by using the DCT, and large-area illumination variations are alleviated by discarding the first few low-frequency DCT coefficients. Next, the truncated DCT coefficient vectors are clustered using the proposed clustering algorithm. This process makes the subsequent FLD more efficient. After the FLD is applied, the most discriminating and invariant facial features are retained and the training samples are well clustered. As a consequence, further parameter estimation for the RBF neural networks is accomplished easily, which facilitates fast training of the RBF neural networks. Simulation results show that the proposed system achieves excellent performance, with high training and recognition speed, a high recognition rate, and very good illumination robustness.
Pp. 301-326
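The first step in the abstract above, reducing dimensionality with the DCT and discarding the lowest-frequency coefficients, can be sketched as follows. This is an illustrative assumption rather than the chapter's implementation: it uses an unnormalized 2-D DCT-II, orders coefficients by `u + v` as a simple stand-in for a zigzag scan, and the names `dct2`, `dct_features`, `skip`, and `keep` are hypothetical.

```python
import math


def dct2(image):
    """Unnormalized 2-D DCT-II of a rectangular list-of-lists image."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for x in range(rows):
                cx = math.cos(math.pi * (2 * x + 1) * u / (2 * rows))
                for y in range(cols):
                    cy = math.cos(math.pi * (2 * y + 1) * v / (2 * cols))
                    total += image[x][y] * cx * cy
            out[u][v] = total
    return out


def dct_features(image, skip=1, keep=6):
    """Drop the `skip` lowest-frequency coefficients, then keep the next `keep`."""
    coeffs = dct2(image)
    # Order by increasing u + v, a rough low-to-high frequency ordering.
    order = sorted(
        ((u, v) for u in range(len(coeffs)) for v in range(len(coeffs[0]))),
        key=lambda uv: (uv[0] + uv[1], uv[0]),
    )
    return [coeffs[u][v] for u, v in order[skip:skip + keep]]


# Toy 4x4 "image" and a uniformly brighter copy of it.
image = [[10, 20, 30, 40], [15, 25, 35, 45], [20, 30, 50, 60], [5, 15, 25, 95]]
brighter = [[p + 50 for p in row] for row in image]
features = dct_features(image)
features_brighter = dct_features(brighter)
```

A uniform brightness offset changes only the DC coefficient at `(0, 0)`, so with `skip=1` the two feature vectors agree up to rounding error. Real illumination changes are not uniform, which is presumably why the abstract speaks of discarding the first few coefficients rather than just one.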
Probabilistic Reasoning for Closed-Room People Monitoring
Ji Tao; Yap-Peng Tan
In this chapter, we present a probabilistic reasoning approach to recognizing people entering and leaving a closed room by exploiting low-level visual features and high-level domain-specific knowledge. Specifically, people in the view of a monitoring camera are first detected and tracked so that their color and facial features can be extracted and analyzed. Then, recognition of people is carried out using a mapped feature similarity measure and exploiting the temporal correlation and constraints among each sequence of observations. Recognition is optimal in the sense of maximizing the joint posterior probability of the multiple observations. Experimental results on real and synthetic data are reported to show the effectiveness of the proposed approach.
Pp. 327-348
Human-Machine Communication by Audio-Visual Integration
Satoshi Nakamura; Tatsuo Yotsukura; Shigeo Morishima
The use of audio-visual information is inevitable in human communication. Complementary usage of audio-visual information enables more accurate, robust, natural, and friendly human communication in real environments. These types of information are also required for computers to realize natural and friendly interfaces, which are currently unreliable and unfriendly.
In this chapter, we focus on synchronous multi-modalities, specifically audio information of speech and image information of a face, for audio-visual speech recognition, synthesis and translation. Human audio speech and visual speech information both originate from movements of the speech organs triggered by motor commands from the brain. Accordingly, such speech signals represent the information of an utterance in different ways, and the audio and visual speech modalities have strong correlations and complementary relationships. There is a very strong demand to improve current speech recognition performance, which degrades drastically in real environments when speech is exposed to acoustic noise, reverberation and speaking style differences. The integration of audio and visual information is expected to make the system robust and reliable and to improve its performance. There is also a demand to improve speech synthesis intelligibility. Multi-modal speech synthesis of audio speech and lip-synchronized talking face images can improve intelligibility and naturalness. Section 1 of this chapter describes audio-visual speech detection and recognition, which aim to improve the robustness of speech recognition in actual noisy environments. Section 2 introduces a talking face synthesis system based on a 3-D mesh model and an audio-visual speech translation system. The audio-visual speech translation system recognizes input speech in an original language, translates it into a target language and synthesizes output speech in the target language.
Pp. 349-368
Probabilistic Fusion of Sorted Score Sequences for Robust Speaker Verification
Ming-Cheung Cheung; Man-Wai Mak; Sun-Yuan Kung
Fusion techniques have been widely used in multi-modal biometric authentication systems. While these techniques are mainly applied to combine the outputs of modality-dependent classifiers, they can also be applied to fuse the decisions or scores from a single modality. The idea is to consider the multiple samples extracted from a single modality as independent but coming from the same source. In this chapter, we propose a single-source, multi-sample data-dependent fusion algorithm for speaker verification. The algorithm is data-dependent in that the fusion weights are dependent on the verification scores and the prior score statistics of claimed speakers and background speakers. To obtain the best out of the speaker’s scores, scores from multiple utterances are sorted before they are probabilistically combined. Evaluations based on 150 speakers from a GSM-transcoded corpus are presented. Results show that data-dependent fusion of speaker’s scores is significantly better than the conventional score averaging approach. It was also found that the proposed fusion algorithm can be further enhanced by sorting the score sequences before they are probabilistically combined.
Pp. 369-387
Adaptive Noise Cancellation Using Online Self-Enhanced Fuzzy Filters with Applications to Multimedia Processing
Meng Joo Er; Zhengrong Li
Adaptive noise cancellation is a significant research issue in multimedia signal processing and a widely used technique in teleconference systems, hands-free mobile communications, acoustical echo and feedback cancellation and so on. For the purpose of implementing real-time applications in nonlinear environments, an online self-enhanced fuzzy filter for adaptive noise cancellation is proposed. The proposed filter is based on radial-basis-function networks and is functionally equivalent to the Takagi-Sugeno-Kang fuzzy system. As a prominent feature of the online self-enhanced fuzzy filter, the system is hierarchically constructed and self-enhanced during the training process using a novel online clustering strategy for structure identification. In the process of system construction, instead of selecting the centers and widths of membership functions arbitrarily, an online clustering method is applied to ensure reasonable representation of input terms. It not only ensures proper feature representation, but also optimizes the structure of the filter by reducing the number of fuzzy rules. Moreover, the filter is adaptively tuned to be optimal by the proposed hybrid sequential algorithm for parameter determination. Owing to the online self-enhanced system construction and the hybrid learning algorithm, a low computational load and lower memory requirements are achieved. This is beneficial for applications in real-time multimedia signal processing.
Pp. 389-414
Image Denoising Using Stochastic Chaotic Simulated Annealing
Lipo Wang; Leipo Yan; Kim-Hui Yap
In this chapter, we present a new approach to image denoising based on a novel optimization algorithm called stochastic chaotic simulated annealing. The original Bayesian framework of image denoising is reformulated into a constrained optimization problem using continuous relaxation labeling. To solve this optimization problem, we then use a noisy chaotic neural network (NCNN), which adds noise and chaos into the Hopfield neural network (HNN) to facilitate efficient searching and to avoid local minima. Experimental results show that this approach can offer good quality solutions to image denoising.
Pp. 415-429
Soft Computation of Numerical Solutions to Differential Equations in EEG Analysis
Mingui Sun; Xiaopu Yan; Robert J. Sclabassi
Computational localization and modeling of functional activity within the brain, based on multichannel electroencephalographic (EEG) data, are important in basic and clinical neuroscience. One of the key problems in analyzing EEG data is to evaluate surface potentials of a theoretical volume conductor model in response to an internally located current dipole with known parameters. Traditionally, this evaluation has been performed by means of either finite boundary or finite element methods, which are computationally demanding. This paper presents a soft computing approach using an artificial neural network (ANN). Off-line training is performed for the ANN to map the forward solutions of the spherical head model to those of a class of spheroidal head models. When the ANN is placed on-line and a set of potential values of the spherical model is presented at the input, the ANN generalizes the knowledge learned during the training phase and produces the potentials of the selected spheroidal model with a desired eccentricity. In this work we investigate theoretical aspects of this soft-computing approach and show that the numerical computation can be formulated as a machine learning problem and implemented by a supervised function approximation ANN. We also show that, for the case of the Poisson equation, the solution is unique and continuous with respect to boundary surfaces. Our experiments demonstrate that this soft-computing approach produces highly accurate results with only a small fraction of the computational cost required by the traditional methods.
Pp. 431-451