Publications catalog - books
Intelligent Data Engineering and Automated Learning: IDEAL 2005: 6th International Conference, Brisbane, Australia, July 6-8, 2005, Proceedings
Marcus Gallagher; James P. Hogan; Frederic Maire (eds.)
Conference: 6th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL). Brisbane, QLD, Australia. July 6-8, 2005
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Database Management; Algorithm Analysis and Problem Complexity; Artificial Intelligence (incl. Robotics); Information Storage and Retrieval; Information Systems Applications (incl. Internet); Computers and Society
Availability
Detected institution | Year of publication | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2005 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-26972-4
Electronic ISBN
978-3-540-31693-0
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2005
Publication rights information
© Springer-Verlag Berlin Heidelberg 2005
Table of contents
doi: 10.1007/11508069_11
Multi-attributes Image Analysis for the Classification of Web Documents Using Unsupervised Technique
Samuel W. K. Chan
The aim of this research is to develop a system based on multi-attribute image analysis and a neural-network self-organizing feature map (SOFM) that facilitates the automated classification of images or icons in Web documents. Four different image attribute sets are extracted. The system integrates the different image attributes without requiring any particular primitive to be dominant. The system has been implemented, and the results show meaningful clusters. Its performance is compared with the Hierarchical Agglomerative Clustering (HAC) algorithm. Evaluation shows that, in our approach, similar images fall into the same region, making it possible to retrieve images by family relationships.
- Data Mining and Knowledge Engineering | Pp. 78-85
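As a rough illustration of the clustering step described in the abstract (not the paper's four attribute sets or its exact SOFM configuration, and with all parameter values assumed), the following sketch trains a tiny one-dimensional self-organizing map on feature vectors and maps new vectors to their best-matching unit, so that similar vectors land on the same unit:

```python
import numpy as np

def train_sofm(data, n_units=4, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organizing feature map on the row vectors in `data`."""
    rng = np.random.default_rng(seed)
    # Initialize unit weights from randomly chosen data points.
    weights = data[rng.choice(len(data), n_units, replace=False)].astype(float)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                    # decaying learning rate
        sigma = max(sigma0 * (1 - t / epochs), 1e-3)   # shrinking neighborhood
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
            for j in range(n_units):
                h = np.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))  # neighborhood kernel
                weights[j] += lr * h * (x - weights[j])
    return weights

def map_to_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))
```

With two well-separated groups of vectors and two units, each group converges onto its own unit, which mirrors the "similar images fall into the same region" behaviour the abstract reports.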
doi: 10.1007/11508069_12
Automatic Image Annotation Based on Topic-Based Smoothing
Xiangdong Zhou; Jianye Ye; Lian Chen; Liang Zhang; Baile Shi
Automatic image annotation has attracted much attention recently due to its wide applicability (such as image retrieval by semantics). Most known statistical model-based annotation methods learn the joint distribution of keywords and the image blobs produced by segmentation or grid approaches. These methods suffer from the sparseness of the image blobs; as a result, the estimated joint distribution needs to be “smoothed”. In this paper, we present a topic-based smoothing method to overcome the sparseness problem and integrate it with a general image annotation model. Experimental results on 5,000 images demonstrate that our method achieves a significant improvement in annotation effectiveness over an existing method.
- Data Mining and Knowledge Engineering | Pp. 86-93
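The core idea, smoothing a sparse keyword-blob distribution with a coarser topic-level distribution, can be sketched as a simple linear interpolation. This is an illustrative stand-in, not the paper's actual model; the interpolation weight and the pooling scheme are assumptions:

```python
import numpy as np

def smooth_counts(counts, topic_of_blob, lam=0.7):
    """Smooth a sparse keyword-blob count matrix with topic-level statistics.
    counts: (n_blobs, n_words); topic_of_blob: a topic id per blob."""
    counts = np.asarray(counts, float)
    n_blobs, n_words = counts.shape
    # Maximum-likelihood P(word | blob); all-zero rows fall back to uniform.
    row = counts.sum(axis=1, keepdims=True)
    p_ml = np.where(row > 0, counts / np.maximum(row, 1), 1.0 / n_words)
    topics = np.asarray(topic_of_blob)
    p_smooth = np.empty_like(p_ml)
    for b in range(n_blobs):
        # Topic-level distribution: pool counts over all blobs in the same topic.
        pool = counts[topics == topics[b]].sum(axis=0)
        p_topic = pool / pool.sum() if pool.sum() > 0 else np.full(n_words, 1.0 / n_words)
        # Interpolate the sparse per-blob estimate with the topic estimate.
        p_smooth[b] = lam * p_ml[b] + (1 - lam) * p_topic
    return p_smooth
```

A keyword never seen with a given blob still receives nonzero probability as long as it occurs somewhere in the blob's topic, which is exactly the sparseness fix the abstract motivates.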
doi: 10.1007/11508069_13
A Focused Crawler with Document Segmentation
Jaeyoung Yang; Jinbeom Kang; Joongmin Choi
A focused crawler is a topic-driven document-collecting crawler, suggested as a promising alternative for maintaining up-to-date Web document indices in search engines. A major problem with previous focused crawlers is their liability to miss highly relevant documents that are linked from off-topic documents. This problem originates mainly from the lack of consideration of structural information within a document; traditional weighting methods such as TFIDF, as employed in document classification, can lead to this problem.
To improve the performance of focused crawlers, this paper proposes a locality-based document segmentation scheme to determine the relevance of a document to a specific topic. We segment a document into a set of sub-documents using contextual features around the hyperlinks. This information is used to decide whether the crawler should fetch the documents linked from hyperlinks in an off-topic document.
- Data Mining and Knowledge Engineering | Pp. 94-101
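A minimal sketch of the segmentation idea, under assumed simplifications (a fixed word window around each anchor as the "sub-document", and plain term-frequency cosine similarity rather than the paper's exact features or TFIDF variant):

```python
import re
from collections import Counter
from math import sqrt

def link_contexts(html, window=5):
    """Segment a page into per-link sub-documents: each hyperlink's anchor
    text plus `window` words of surrounding text on either side."""
    text = re.sub(r"<[^>]+>", " ", html)          # strip tags to get the word stream
    words = re.findall(r"\w+", text.lower())
    contexts = []
    for m in re.finditer(r'<a\s+href="([^"]+)"[^>]*>(.*?)</a>', html, re.S | re.I):
        anchor_words = re.findall(r"\w+", m.group(2).lower())
        if not anchor_words:
            continue
        # Locate the anchor text in the word stream and take a window around it.
        for i in range(len(words) - len(anchor_words) + 1):
            if words[i:i + len(anchor_words)] == anchor_words:
                ctx = words[max(0, i - window): i + len(anchor_words) + window]
                contexts.append((m.group(1), ctx))
                break
    return contexts

def cosine(a, b):
    """Cosine similarity between two bags of words (topic vs. link context)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Scoring each link's local context against the topic, instead of the whole page, lets a crawler follow an on-topic link out of an otherwise off-topic page.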
doi: 10.1007/11508069_14
An Intelligent Grading System Using Heterogeneous Linguistic Resources
Yu-Seop Kim; Woo-Jin Cho; Jae-Young Lee; Yu-Jin Oh
In this paper, we propose an intelligent grading system that uses heterogeneous linguistic resources. In earlier work we used a latent semantic kernel as the sole resource and found that a deficit of indexed terms caused a performance bottleneck. To solve this, we expand the answer papers written by students and instructors using WordNet, one of the most widely used linguistic resources. We supplement the papers with words semantically related to their indexed terms, selecting the added words from the synonyms and hyponyms in WordNet. To avoid the criterion decision problem, we use the partial score of each question and evaluate the correlation coefficient between the grading results of the proposed approach and those of human instructors. The proposed approach achieves a correlation coefficient with the instructors of up to 0.94, which is 0.06 higher than that of the earlier work.
- Data Mining and Knowledge Engineering | Pp. 102-108
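The expansion step can be sketched as follows. The `related` table stands in for WordNet synonym/hyponym lookups (a real system would query WordNet), and Jaccard overlap stands in for the paper's kernel-based similarity; both substitutions are assumptions:

```python
def expand(terms, related):
    """Add semantically related words (synonyms/hyponyms) to a term set.
    `related` is a toy stand-in for WordNet lookups."""
    expanded = set(terms)
    for t in terms:
        expanded.update(related.get(t, ()))
    return expanded

def similarity(student, model, related):
    """Jaccard overlap between the expanded answer term sets."""
    s, m = expand(student, related), expand(model, related)
    return len(s & m) / len(s | m)
```

The point of the example: two answers that share no surface terms score zero without expansion but score positively once related words are added, which is how the expansion relieves the indexed-term deficit.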
doi: 10.1007/11508069_15
Probabilistic Data Generation for Deduplication and Data Linkage
Peter Christen
In many data mining projects the data to be analysed contains personal information, such as names and addresses. Cleaning and pre-processing such data likely involves deduplication or linkage with other data, which is often complicated by a lack of unique entity identifiers. In recent years there has been an increased research effort in data linkage and deduplication, mainly in the machine learning and database communities. Publicly available test data with known deduplication or linkage status is needed so that new linkage algorithms and techniques can be tested, evaluated and compared. However, publishing data that contains personal information is normally impossible due to privacy and confidentiality issues. An alternative is to use artificially created data, which has the advantages that content and error rates can be controlled and the deduplication or linkage status is known, so controlled experiments can be performed and replicated easily. In this paper we present a freely available data set generator capable of creating data sets containing names, addresses and other personal information.
- Data Mining and Knowledge Engineering | Pp. 109-116
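A toy version of such a generator, illustrating the key property, records whose duplicate status is known by construction and whose errors are controlled: the field lists, the single-substitution error model, and the `orig-`/`dup-` id scheme are all assumptions for this sketch, not the real tool's behaviour:

```python
import random
import string

FIRST = ["anna", "james", "maria", "peter"]
LAST = ["smith", "jones", "lee", "brown"]

def corrupt(value, rng):
    """Introduce a single character-substitution error into a field."""
    i = rng.randrange(len(value))
    return value[:i] + rng.choice(string.ascii_lowercase) + value[i + 1:]

def generate(n_originals, n_duplicates, seed=0):
    """Create records with known linkage status: 'dup-i' is a corrupted
    copy of 'orig-i', so error rates and match status are controlled."""
    rng = random.Random(seed)
    originals = [(f"orig-{i}", rng.choice(FIRST), rng.choice(LAST))
                 for i in range(n_originals)]
    duplicates = []
    for _ in range(n_duplicates):
        rec_id, first, last = rng.choice(originals)
        i = rec_id.split("-")[1]
        duplicates.append((f"dup-{i}", corrupt(first, rng), last))
    return originals + duplicates
```

Because each duplicate carries its original's index, a linkage algorithm run on this data can be scored exactly, which is the evaluation use-case the abstract describes.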
doi: 10.1007/11508069_16
Mining Job Logs Using Incremental Attribute-Oriented Approach
Idowu O. Adewale; Reda Alhajj
With the emergence of grid computing, researchers in different fields are making use of the huge computing power of the grid to carry out massive computing tasks that are beyond the power of a single processor. When a computing task (or job) is submitted to the grid, some useful information about the job is logged in a database by the scheduler. The computing infrastructure that makes up the grid is expensive; hence, understanding the resource usage pattern is of great importance. In this paper, we propose an incremental attribute-oriented approach that mines data within a given time interval. We test our approach on real-life logs of jobs submitted to the Western Canada Research Grid (WestGrid), and we develop an incremental attribute-oriented mining tool to implement the proposed approach. Our approach uncovers hidden patterns and changes that take place over time.
- Data Mining and Knowledge Engineering | Pp. 117-124
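Attribute-oriented induction generalizes raw attribute values up a concept hierarchy and merges identical generalized tuples while counting them; the incremental twist is that summaries of successive time windows can be combined without re-mining old data. A minimal sketch with invented toy hierarchies (the real job-log attributes and hierarchies are not given in the abstract):

```python
from collections import Counter

# Toy concept hierarchies mapping raw job-log values to higher-level concepts.
HIERARCHY = {
    "cpu_hours": lambda v: "short" if v < 1 else ("medium" if v < 10 else "long"),
    "queue": lambda v: {"gpu1": "gpu", "gpu2": "gpu"}.get(v, "cpu"),
}

def generalize(records):
    """Attribute-oriented induction: replace raw values with concepts,
    then merge identical generalized tuples while counting occurrences."""
    gen = Counter()
    for rec in records:
        gen[tuple(HIERARCHY[k](rec[k]) for k in sorted(HIERARCHY))] += 1
    return gen

def merge(summary_a, summary_b):
    """Incremental step: fold the summary of a new time window into an old one."""
    return summary_a + summary_b
```

Because the merged summary of two windows equals the summary of their union, each window's logs can be mined once and discarded, which is what makes the approach incremental.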
doi: 10.1007/11508069_17
Dimensional Reduction of Large Image Datasets Using Non-linear Principal Components
Silvia S. C. Botelho; Willian Lautenschlger; Matheus Bacelo de Figueiredo; Tania Mezzadri Centeno; Mauricio M. Mata
In this paper we apply a neural network (NN) to reduce image datasets, distilling the massive datasets down to a new space of smaller dimension. Because these data may contain nonlinearities, traditional multivariate analyses, such as Principal Component Analysis (PCA), may not represent them faithfully. Alternatively, Nonlinear Principal Component Analysis (NLPCA) can be performed by a NN model to address that deficiency. However, as the dimension of the image increases, the NN may easily saturate. This work presents an original methodology that uses a set of cascaded multi-layer NNs with a bottleneck structure to extract nonlinear information from a large set of image data. We illustrate its good performance with a set of tests comparing this methodology against PCA in the treatment of oceanographic data associated with the mesoscale variability of an oceanic boundary current.
- Data Mining and Knowledge Engineering | Pp. 125-132
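The bottleneck structure behind NLPCA is an autoencoder whose narrow middle layer carries the nonlinear principal components. A deliberately tiny sketch (2-3-1-3-2 architecture, numerical gradients for brevity; the layer sizes, learning rate, and training scheme are all assumptions, and the paper's cascaded networks are not reproduced):

```python
import numpy as np

def init_params(rng, sizes=(2, 3, 1, 3, 2)):
    """Weights and biases for a bottleneck autoencoder with the given layer sizes."""
    Ws = [rng.normal(0, 0.5, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(b) for b in sizes[1:]]
    return Ws, bs

def forward(Ws, bs, X):
    """Encoder (tanh), 1-D linear bottleneck, decoder (tanh), linear output."""
    h = np.tanh(X @ Ws[0] + bs[0])
    z = h @ Ws[1] + bs[1]          # the nonlinear principal component
    h2 = np.tanh(z @ Ws[2] + bs[2])
    return h2 @ Ws[3] + bs[3]

def loss(Ws, bs, X):
    """Mean squared reconstruction error."""
    return float(np.mean((forward(Ws, bs, X) - X) ** 2))

def train(X, steps=300, lr=0.05, eps=1e-5, seed=0):
    """Plain gradient descent; gradients are numerical to keep the sketch short."""
    rng = np.random.default_rng(seed)
    Ws, bs = init_params(rng)
    for _ in range(steps):
        for p in Ws + bs:
            g = np.zeros_like(p)
            for i in np.ndindex(*p.shape):
                old = p[i]
                p[i] = old + eps; up = loss(Ws, bs, X)
                p[i] = old - eps; dn = loss(Ws, bs, X)
                p[i] = old
                g[i] = (up - dn) / (2 * eps)
            p -= lr * g
    return Ws, bs
```

Trained on 2-D points lying on a curve, the network learns to reconstruct them through a single bottleneck unit, which a one-component linear PCA cannot do for curved data.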
doi: 10.1007/11508069_18
Classification by Instance-Based Learning Algorithm
Yongguang Bao; Eisuke Tsuchiya; Naohiro Ishii; Xiaoyong Du
The basic k-nearest-neighbor classification algorithm works well in many domains but has several shortcomings. This paper proposes a tolerant instance-based learning algorithm, TIBL, and a combining method based on simple voting of TIBL classifiers, which integrates genetic algorithms, tolerant rough sets and the k-nearest-neighbor classification algorithm. The proposed algorithms seek to reduce storage requirements and increase generalization accuracy compared to the basic k-nearest-neighbor algorithm and other learning models. Experiments have been conducted on benchmark datasets from the UCI Machine Learning Repository. The results show that the TIBL algorithm and its combining method improve the performance of k-nearest-neighbor classification and also achieve higher generalization accuracy than other popular machine learning algorithms.
- Data Mining and Knowledge Engineering | Pp. 133-140
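For reference, the baseline the paper improves on, k-nearest-neighbor classification, plus a Hart-style condensing pass as a simple stand-in for the storage reduction TIBL targets (TIBL itself uses genetic algorithms and tolerant rough sets, which are not reproduced here):

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training instances."""
    d = np.linalg.norm(train_X - x, axis=1)
    idx = np.argsort(d)[:k]
    return Counter(train_y[i] for i in idx).most_common(1)[0][0]

def condense(train_X, train_y, k=1):
    """Hart-style condensing: keep an instance only if the set retained so
    far misclassifies it. Shrinks storage while preserving the boundary."""
    keep = [0]
    for i in range(1, len(train_X)):
        X = train_X[keep]
        y = [train_y[j] for j in keep]
        if knn_predict(X, y, train_X[i], k=min(k, len(keep))) != train_y[i]:
            keep.append(i)
    return keep
```

On two well-separated classes, condensing keeps only a couple of prototypes yet classifies new points the same way as the full training set.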
doi: 10.1007/11508069_19
Analysis/Synthesis of Speech Signals Based on AbS/OLA Sinusoidal Modeling Using Elliptic Filter
Kihong Kim; Jinkeun Hong; Jongin Lim
The analysis-by-synthesis/overlap-add (AbS/OLA) sinusoidal model has been applied to a broad range of speech and audio signal processing tasks, such as coding, analysis and synthesis, fundamental frequency modification, and time- and frequency-scale modification. This model uses an iterative analysis-by-synthesis procedure to estimate the sinusoidal parameters (amplitudes, frequencies, and phases). However, one drawback of this model is that the analysis frame length is generally fixed. Since the sinusoidal components have different frequencies, an analysis frame of fixed length cannot provide optimal spectral resolution for each sinusoidal parameter. In this paper, to overcome this drawback and estimate the sinusoidal parameters more accurately, an AbS/OLA sinusoidal model using an elliptic filter is presented and evaluated against the conventional AbS/OLA sinusoidal model. Our proposed model achieves better performance than the conventional model in terms of spectral characteristics, phase characteristics, and synthetic speech quality.
- Data Mining and Knowledge Engineering | Pp. 141-148
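The baseline iterative analysis-by-synthesis loop can be sketched as greedy peak picking with subtraction: estimate the strongest sinusoid's amplitude, frequency, and phase, subtract it, and repeat on the residual. This illustrates the conventional model only, not the paper's elliptic-filter refinement, and assumes bin-aligned frequencies for simplicity:

```python
import numpy as np

def abs_analyze(frame, fs, n_sines=3):
    """Greedy analysis-by-synthesis: repeatedly take the strongest spectral
    peak, subtract that sinusoid, and continue on the residual."""
    residual = frame.astype(float).copy()
    n = len(frame)
    t = np.arange(n) / fs
    params = []
    for _ in range(n_sines):
        spec = np.fft.rfft(residual)
        k = int(np.argmax(np.abs(spec[1:])) + 1)   # strongest bin, skipping DC
        freq = k * fs / n
        amp = 2 * np.abs(spec[k]) / n              # cosine amplitude at an exact bin
        phase = np.angle(spec[k])
        params.append((amp, freq, phase))
        residual -= amp * np.cos(2 * np.pi * freq * t + phase)
    return params

def synthesize(params, n, fs):
    """Rebuild the frame as a sum of the estimated sinusoids."""
    t = np.arange(n) / fs
    return sum(a * np.cos(2 * np.pi * f * t + p) for a, f, p in params)
```

For a frame built from bin-aligned sinusoids, two iterations recover the parameters exactly and the synthesized frame matches the original.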
doi: 10.1007/11508069_20
Robust Model Adaptation Using Mean and Variance Transformations in Linear Spectral Domain
Donghyun Kim; Dongsuk Yook
In this paper, we propose a robust speech adaptation technique using continuous-density hidden Markov models (HMMs) in unknown environments. The technique is an improved maximum likelihood linear spectral transformation (ML-LST) method, which aims to find appropriate noise parameters in the linear spectral domain. Previously, ML-LST and many transform-based adaptation algorithms have been applied only to the Gaussian mean vectors of HMM systems. In the improved ML-LST for rapid adaptation, the mean vectors and covariance matrices of an HMM-based speech recognizer are transformed simultaneously using a small number of transformation parameters. We show that the variance transformation provides important information that can be used to handle environmental noise, in a similar manner to the mean transformation.
- Data Mining and Knowledge Engineering | Pp. 149-154
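The key mechanism, transforming means and variances jointly with a few parameters chosen by maximum likelihood, can be sketched for a diagonal Gaussian. This is an illustration of the principle only: the per-dimension affine transform and the grid search stand in for the paper's actual ML-LST estimation:

```python
import numpy as np

def adapt_gaussian(mean, var, a, b):
    """Apply a per-dimension linear transform x -> a*x + b to a diagonal
    Gaussian: the mean shifts and scales, the variance scales by a**2."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return a * mean + b, (a ** 2) * var

def log_likelihood(x, mean, var):
    """Diagonal-Gaussian log-likelihood of one observation."""
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

def best_transform(data, mean, var, candidates):
    """Pick the (a, b) whose transformed model maximizes the data likelihood
    (a tiny grid search standing in for ML estimation)."""
    scored = [(sum(log_likelihood(x, *adapt_gaussian(mean, var, a, b)) for x in data),
               (a, b))
              for a, b in candidates]
    return max(scored)[1]
```

Because the variance is transformed along with the mean, data that has been both shifted and rescaled by noise is matched correctly, whereas a mean-only transform would misfit the spread.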