Publication catalog - books



Discovery Science: 9th International Conference, DS 2006, Barcelona, Spain, October 7-10, 2006, Proceedings

Ljupčo Todorovski ; Nada Lavrač ; Klaus P. Jantke (eds.)

Conference: 9th International Conference on Discovery Science (DS), Barcelona, Spain, October 7-10, 2006

Summary/Description (provided by the publisher)

Not available.

Keywords (provided by the publisher)

Philosophy of Science; Artificial Intelligence (incl. Robotics); Database Management; Information Storage and Retrieval; Computer Appl. in Administrative Data Processing; Computer Appl. in Social and Behavioral Sciences

Availability
Detected institution | Year of publication | Access
Not detected | 2006 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-46491-4

Electronic ISBN

978-3-540-46493-8

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Automatic Water Eddy Detection in SST Maps Using Random Ellipse Fitting and Vectorial Fields for Image Segmentation

Armando Fernandes; Susana Nascimento

The impact of water eddies off the Iberian coast on the chemistry and biology of ocean ecosystems, on the circulation of ocean waters, and on climate still needs to be studied. Identifying water eddies in sea surface temperature (SST) maps is time-consuming for oceanographers because of the large number of SST maps available. This motivates the present investigation, which aims to develop an automatic system capable of performing that task. The system consists of a pre-processing stage in which a vectorial field is calculated using an optical flow algorithm that takes one SST map and a matrix of zeros as input. Next, a binary image of the modulus of the vectorial field is created using an iterative thresholding algorithm. Finally, five edge points of the binary image, classified according to the direction of their gradient vectors, are randomly selected, and an ellipse corresponding to a water eddy is fitted to them.

II - Long Papers | Pp. 77-88
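The last two stages of the eddy-detection pipeline can be sketched in Python. This is a minimal sketch under stated assumptions: the abstract does not name its thresholding method, so `iterative_threshold` uses the Ridler-Calvard (isodata) scheme as a stand-in, and `conic_through_five_points` fits the general conic through five points exactly (via signed 5x5 minors) rather than reproducing the authors' fitting code. The gradient-direction classification of edge points is omitted, and all function names are hypothetical.

```python
import random
from fractions import Fraction


def iterative_threshold(values, tol=1e-9):
    """Ridler-Calvard (isodata) threshold over a flat list of pixel values (assumed method)."""
    t = sum(values) / len(values)
    while True:
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        if not low or not high:
            return t
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < tol:
            return new_t
        t = new_t


def det(m):
    """Determinant by Gaussian elimination; exact when the entries are Fractions."""
    m = [row[:] for row in m]
    n = len(m)
    d = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            d = -d  # a row swap flips the sign
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return d


def conic_through_five_points(points):
    """(A, B, C, D, E, F) with A x^2 + B xy + C y^2 + D x + E y + F = 0 through the
    five points: the null vector of the 5x6 design matrix, built from signed minors."""
    rows = []
    for x, y in points:
        x, y = Fraction(x), Fraction(y)
        rows.append([x * x, x * y, y * y, x, y, Fraction(1)])
    return tuple((-1) ** j * det([row[:j] + row[j + 1:] for row in rows])
                 for j in range(6))


def is_ellipse(coeffs):
    a, b, c = coeffs[:3]
    return b * b - 4 * a * c < 0


def fit_random_ellipse(edge_points, rng, max_tries=100):
    """Draw five random edge points until the conic through them is an ellipse."""
    for _ in range(max_tries):
        coeffs = conic_through_five_points(rng.sample(edge_points, 5))
        if is_ellipse(coeffs):
            return coeffs
    return None
```

Five points in general position determine a unique conic, so each random draw either yields an ellipse (B^2 - 4AC < 0) or is discarded; the exact `Fraction` arithmetic avoids round-off in the determinants.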

Mining Approximate Motifs in Time Series

Pedro G. Ferreira; Paulo J. Azevedo; Cândida G. Silva; Rui M. M. Brito

The problem of discovering previously unknown frequent patterns in time series, also called motifs, has been introduced recently. A motif is a subseries pattern that appears a significant number of times. Results demonstrate that motifs may provide valuable insights about the data and have a wide range of applications in data mining tasks. The main motivation for this study was the need to mine time series data from protein folding/unfolding simulations. We propose an algorithm that extracts approximate motifs, i.e. motifs that capture portions of time series with similar and possibly symmetric behavior. Preliminary results on the analysis of protein unfolding data support this proposal as a valuable tool. Additional experiments demonstrate that the utility of our algorithm is not limited to this particular problem; rather, it can be applied to many real-world problems.

II - Long Papers | Pp. 89-101
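The abstract does not give the motif algorithm itself, so the following is only a brute-force illustration of what an approximate motif is. It assumes z-normalised sliding windows, Euclidean distance within a radius `r`, and, for the "symmetric" case, accepting a window's mirror image obtained by negating its normalised values. Overlapping windows are excluded as trivial matches, but matches are not forced to be disjoint from each other; `top_motif` and its parameters are hypothetical.

```python
import math


def znorm(window):
    """Z-normalise a window so that matches compare shape, not offset or scale."""
    mean = sum(window) / len(window)
    sd = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    return [(v - mean) / sd if sd > 0 else 0.0 for v in window]


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_motif(series, w, r, symmetric=False):
    """Return (start, match_starts) for the length-w window with the most
    non-overlapping neighbours within distance r (brute force, O(n^2 w))."""
    windows = [znorm(series[i:i + w]) for i in range(len(series) - w + 1)]
    best_start, best_matches = None, []
    for i, a in enumerate(windows):
        matches = []
        for j, b in enumerate(windows):
            if abs(i - j) < w:  # skip trivial matches that overlap the candidate
                continue
            d = dist(a, b)
            if symmetric:  # also accept the mirror image (rising vs. falling)
                d = min(d, dist(a, [-v for v in b]))
            if d <= r:
                matches.append(j)
        if len(matches) > len(best_matches):
            best_start, best_matches = i, matches
    return best_start, best_matches
```

For example, in a series made of three repetitions of `[0, 1, 2, 1, 0]`, the window starting at 0 matches the windows at 5 and 10; a peak followed by a valley only matches when `symmetric=True`.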

Identifying Historical Period and Ethnic Origin of Documents Using Stylistic Feature Sets

Yaakov HaCohen-Kerner; Hananya Beck; Elchai Yehudai; Dror Mughaz

Text classification is an important and challenging research domain. This paper investigates identifying the historical period and ethnic origin of documents using stylistic feature sets. The application domain is Jewish Law articles written in Hebrew-Aramaic. Such documents present various interesting problems for stylistic classification. First, these documents include words from both languages. Second, Hebrew and Aramaic are morphologically richer than English. The classification is done using six different sets of stylistic features: quantitative features, orthographic features, topographic features, lexical features and vocabulary richness. Each set includes various baseline features, some of them formalized by us. SVM was chosen as the machine learning method because it has been very successful in text classification. The quantitative set was found to be very successful and superior to all other sets. Its features are domain-independent and language-independent. It will be interesting to apply these feature sets in general, and the quantitative set in particular, to other domains as well as to other languages.

II - Long Papers | Pp. 102-113
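The abstract names the feature sets but not their individual features, so the quantitative features below (word count, mean word length, mean sentence length) are hypothetical examples. The classifier is a minimal linear SVM trained with Pegasos-style subgradient steps, cycling deterministically over the data; it is a stand-in for whatever SVM implementation the authors used. Because `\w` in Python's `re` is Unicode-aware, the same tokenizer also splits Hebrew and Aramaic text into words.

```python
import re


def quantitative_features(text):
    """Hypothetical quantitative features: word count, mean word length,
    and mean sentence length in words."""
    words = re.findall(r"\w+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = len(words)
    mean_word_len = sum(len(w) for w in words) / n if n else 0.0
    words_per_sentence = n / len(sentences) if sentences else 0.0
    return [float(n), mean_word_len, words_per_sentence]


def train_linear_svm(xs, ys, lam=0.1, epochs=200):
    """Pegasos-style subgradient training of a linear SVM with labels +1/-1.
    A constant 1.0 feature is appended as a (regularised) bias term."""
    xs = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - 1 / t) * wi for wi in w]  # shrink: step size 1/(lam * t)
            if margin < 1:  # hinge loss is active: move towards y * x
                w = [wi + y * xi / (lam * t) for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In practice each document would be turned into a feature vector with `quantitative_features`, scaled, and passed to the SVM with labels for, say, two historical periods.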

A New Family of String Classifiers Based on Local Relatedness

Yasuto Higa; Shunsuke Inenaga; Hideo Bannai; Masayuki Takeda

This paper introduces a new family of string classifiers based on local relatedness. We use three types of local relatedness measurements. We show that finding the optimal classifier for two given sets of strings (the positive set and the negative set) is NP-hard for all three measurements. To obtain practically efficient algorithms for finding the best classifier, we investigate pruning heuristics and fast string-matching techniques based on the properties of the local relatedness measurements.

II - Long Papers | Pp. 114-124
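The three local relatedness measurements are not named in this listing, so the sketch below assumes, for illustration only, the length of the longest common substring as the measure. A hypothetical classifier `(pattern, k)` labels a string positive when it shares a substring of length at least `k` with `pattern`. `best_classifier` searches exhaustively over patterns taken from substrings of the positive strings, which is the kind of search space where the pruning heuristics mentioned in the abstract would apply.

```python
def lcstr_len(a, b):
    """Length of the longest common substring of a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the common run ending at (i, j)
                best = max(best, cur[j])
        prev = cur
    return best


def classify(s, pattern, k):
    """Hypothetical classifier: positive iff s shares a substring of length >= k with pattern."""
    return lcstr_len(s, pattern) >= k


def best_classifier(pos, neg, max_len=4):
    """Exhaustive search over (pattern, k), patterns drawn from substrings of the
    positive strings; returns (number_correct, pattern, k)."""
    candidates = {p[i:j] for p in pos for i in range(len(p))
                  for j in range(i + 1, min(len(p), i + max_len) + 1)}
    best = None
    for pattern in sorted(candidates):
        for k in range(1, len(pattern) + 1):
            score = (sum(classify(s, pattern, k) for s in pos)
                     + sum(not classify(s, pattern, k) for s in neg))
            if best is None or score > best[0]:
                best = (score, pattern, k)
    return best
```

Restricting candidates to substrings of the positive set keeps this toy search finite; the NP-hardness result concerns the general optimisation, which is why pruning matters.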

On Class Visualisation for High Dimensional Data: Exploring Scientific Data Sets

Ata Kabán; Jianyong Sun; Somak Raychaudhury; Louisa Nolan

Parametric Embedding (PE) has recently been proposed as a general-purpose algorithm for class visualisation. It takes class posteriors produced by a mixture-based clustering algorithm and projects them in 2D for visualisation. Although this fully modularised combination of objectives (clustering and projection) is attractive for its conceptual simplicity, we show that for high dimensional data a better combination of these objectives can be achieved by integrating both into a consistent probabilistic model. In this way, the projection step fulfils a regularisation role, guarding against the curse of dimensionality. As a result, the trade-off between clustering and visualisation enhances the predictive abilities of the overall model. We present results on synthetic data and on two real-world high-dimensional data sets: observed spectra of early-type galaxies and gene expression arrays.

II - Long Papers | Pp. 125-136
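For reference, the baseline PE step can be written as gradient descent on a KL objective. This sketch assumes the usual PE formulation with uniform class priors and unit-variance Gaussians in 2D, q(c | r_n) proportional to exp(-||r_n - phi_c||^2 / 2), and minimises the sum over points of KL(p(.|x_n) || q(.|r_n)). It illustrates the modular pipeline the authors improve upon, not their integrated probabilistic model; the learning rate, step count and random initialisation are arbitrary choices.

```python
import math
import random


def pe_posteriors(r, phi):
    """q(c | r_n) with uniform class priors: softmax of -||r_n - phi_c||^2 / 2."""
    q = []
    for rn in r:
        logits = [-0.5 * ((rn[0] - pc[0]) ** 2 + (rn[1] - pc[1]) ** 2) for pc in phi]
        m = max(logits)  # subtract the max for numerical stability
        e = [math.exp(v - m) for v in logits]
        z = sum(e)
        q.append([v / z for v in e])
    return q


def kl_objective(p, q):
    """Sum over points of KL(p(.|x_n) || q(.|r_n))."""
    return sum(pc * math.log(pc / qc) for pn, qn in zip(p, q)
               for pc, qc in zip(pn, qn) if pc > 0)


def parametric_embedding(p, steps=500, lr=0.05, seed=0):
    """Gradient descent on the PE objective, using
    dE/dr_n = sum_c (p_nc - q_nc)(r_n - phi_c) and
    dE/dphi_c = -sum_n (p_nc - q_nc)(r_n - phi_c)."""
    rng = random.Random(seed)
    n, k = len(p), len(p[0])
    r = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    phi = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(k)]
    for _ in range(steps):
        q = pe_posteriors(r, phi)
        grad_r = [[0.0, 0.0] for _ in range(n)]
        grad_phi = [[0.0, 0.0] for _ in range(k)]
        for i in range(n):
            for c in range(k):
                diff = p[i][c] - q[i][c]
                for d in range(2):
                    g = diff * (r[i][d] - phi[c][d])
                    grad_r[i][d] += g
                    grad_phi[c][d] -= g
        r = [[v - lr * g for v, g in zip(rn, gn)] for rn, gn in zip(r, grad_r)]
        phi = [[v - lr * g for v, g in zip(pc, gc)] for pc, gc in zip(phi, grad_phi)]
    return r, phi
```

Here `p` would be the posteriors from a fitted mixture model; the returned `r` are the 2D point coordinates and `phi` the class centres.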

Mining Sectorial Episodes from Event Sequences

Takashi Katoh; Kouichi Hirata; Masateru Harao

In this paper, we introduce a sectorial episode of the form C ↦ r, where C is a set of events and r is an event. The sectorial episode C ↦ r means that every event of C is followed by the event r. Then, after formulating two measures for sectorial episodes, we design an algorithm that extracts all of the sectorial episodes from a given event sequence by traversing it just once. Finally, we apply the algorithm to bacterial culture data and extract sectorial episodes from it.

II - Long Papers | Pp. 137-148
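The one-pass idea can be illustrated for an episode C ↦ r, with C a set of events and r an event, under explicit assumptions, since the exact episode semantics and measures are not spelled out in this abstract. In this sketch an occurrence of C ↦ r is counted whenever event r occurs at time t and every event of C occurs in the half-open window [t - w, t); `w`, `max_size` and this support-count semantics are hypothetical choices, not the authors' definitions. The sequence is scanned once with a sliding window, enumerating candidate sets C from the events currently in the window.

```python
from collections import Counter, deque
from itertools import combinations


def mine_sectorial_episodes(sequence, w, minsup, max_size=3):
    """One pass over (time, event) pairs. Counts an occurrence of C -> r each time
    event r happens at time t and all events of C occur in [t - w, t)."""
    window = deque()  # (time, event) pairs that may still precede a later event
    support = Counter()
    for t, r in sorted(sequence):
        while window and window[0][0] < t - w:
            window.popleft()  # too old to precede any event at time >= t
        preceding = sorted({e for s, e in window if s < t})
        for k in range(1, min(max_size, len(preceding)) + 1):
            for c in combinations(preceding, k):
                support[(frozenset(c), r)] += 1
        window.append((t, r))
    return {ep: n for ep, n in support.items() if n >= minsup}
```

With `sequence = [(1, "a"), (2, "b"), (3, "r"), (5, "a"), (6, "b"), (7, "r")]`, `w=3` and `minsup=2`, the episode {a, b} ↦ r is found with count 2.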

A Voronoi Diagram Approach to Autonomous Clustering

Heidi Koivistoinen; Minna Ruuska; Tapio Elomaa

Clustering is a basic tool in unsupervised machine learning and data mining. Distance-based clustering algorithms rarely have the means to autonomously come up with the correct number of clusters from the data. A recent approach to identifying the natural clusters is to compare the point densities in different parts of the sample space.

In this paper we put forward an agglomerative clustering algorithm that accesses density information by constructing a Voronoi diagram of the input sample. The volumes of the points' Voronoi cells reflect the point density in the respective parts of the instance space: the smaller the cell, the denser the region. Scanning through the input points and their Voronoi cells once, we combine the densest parts of the instance space into clusters.

Our empirical experiments demonstrate that the proposed algorithm produces high-accuracy clusterings for many different types of data. The Voronoi approach clearly outperforms the k-means algorithm on data conforming to its underlying assumptions.

II - Long Papers | Pp. 149-160
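A full Voronoi construction in higher dimensions needs a computational-geometry library, but the density intuition of the Voronoi clustering paper is easy to see in one dimension, where each point's Voronoi cell is the interval between the midpoints to its neighbours. The sketch below is a 1-D simplification with a hypothetical merge rule (join neighbours whose gap is at most `factor` times the smaller of their two cell lengths); it is not the authors' algorithm, and approximating the unbounded end cells by the gap to the single neighbour is an assumption.

```python
def voronoi_cells_1d(xs):
    """Lengths of the 1-D Voronoi cells of sorted points; the unbounded end cells
    are approximated by the gap to their single neighbour (an assumption)."""
    n = len(xs)
    cells = []
    for i in range(n):
        if i == 0:
            cells.append(xs[1] - xs[0])
        elif i == n - 1:
            cells.append(xs[-1] - xs[-2])
        else:
            cells.append((xs[i + 1] - xs[i - 1]) / 2)  # midpoint to midpoint
    return cells


def voronoi_cluster_1d(points, factor=1.5):
    """Single left-to-right scan: join Voronoi neighbours i, i+1 when their gap is
    at most factor * min(cell_i, cell_i+1), i.e. the local density does not drop
    sharply between them; otherwise start a new cluster."""
    xs = sorted(points)
    if len(xs) < 2:
        return [xs] if xs else []
    cells = voronoi_cells_1d(xs)
    clusters = [[xs[0]]]
    for i in range(len(xs) - 1):
        gap = xs[i + 1] - xs[i]
        if gap <= factor * min(cells[i], cells[i + 1]):
            clusters[-1].append(xs[i + 1])
        else:
            clusters.append([xs[i + 1]])
    return clusters
```

On the points 0, 1, 2, 10, 11, 12 the two boundary cells around the wide gap are large (4.5), the gap itself (8) exceeds 1.5 times that, and the scan returns the clusters [0, 1, 2] and [10, 11, 12] without being told the number of clusters.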

Itemset Support Queries Using Frequent Itemsets and Their Condensed Representations

Taneli Mielikäinen; Panče Panov; Sašo Džeroski

The purpose of this paper is two-fold: First, we give efficient algorithms for answering itemset support queries for collections of itemsets from various representations of the frequency information. As index structures we use itemset tries of transaction databases, frequent itemsets and their condensed representations. Second, we evaluate the usefulness of condensed representations of frequent itemsets to answer itemset support queries using the proposed query algorithms and index structures. We study analytically the worst-case time complexities of querying condensed representations and evaluate experimentally the query efficiency with random itemset queries to several benchmark transaction databases.

II - Long Papers | Pp. 161-172
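One condensed representation, the closed frequent itemsets, answers support queries through a standard property: the support of a frequent itemset equals the largest support among its closed supersets. The sketch below shows that query on toy data with brute-force mining and a linear scan over the closed itemsets; it stands in for the trie-based index structures of the paper, and the function names are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)


def closed_frequent_itemsets(transactions, minsup):
    """Brute force (fine for toy data): map each closed itemset with support >= minsup
    to its support. An itemset is closed if no proper superset has the same support."""
    items = sorted(set().union(*transactions))
    freq = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(frozenset(combo), transactions)
            if s >= minsup:
                freq[frozenset(combo)] = s
    return {x: s for x, s in freq.items()
            if not any(x < y and freq[y] == s for y in freq)}


def query_support(itemset, closed):
    """Support of a frequent itemset = max support among its closed supersets.
    None means the itemset is not frequent, so its exact support is not recoverable."""
    sups = [s for c, s in closed.items() if itemset <= c]
    return max(sups) if sups else None
```

For the transactions {a, b, c}, {a, b}, {a, c}, {a}, only {a}, {a, b}, {a, c} and {a, b, c} are closed, yet every itemset's support can be recovered from them; for example, {b} inherits support 2 from {a, b}.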

Strategy Diagram for Identifying Play Strategies in Multi-view Soccer Video Data

Yukihiro Nakamura; Shin Ando; Kenji Aoki; Hiroyuki Mano; Einoshin Suzuki

In this paper, we propose a strategy diagram for acquiring soccer knowledge in order to identify play strategies in multi-view video data. Soccer, the most popular team sport in the world, attracts the attention of researchers in knowledge discovery, data mining and related areas. Domain knowledge is mandatory in such applications, but acquiring domain knowledge of soccer from experts is a laborious task. Moreover, such domain knowledge is typically acquired and used in an ad hoc manner. Diagrams in textbooks can be considered a promising source of knowledge and are intuitive to humans. Our strategy diagram enables the systematic acquisition and use of such diagrams as domain knowledge for identifying play strategies in video data of a soccer game taken from multiple angles. The key idea is to transform multi-view video data into sequences of coordinates and then match the strategy diagram against them in terms of essential features. Experiments using video data from a national high-school tournament show that the proposed method yields promising results and offers insightful lessons for further studies.

II - Long Papers | Pp. 173-184
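As a loose illustration of the soccer paper's "transform to coordinates, then match" idea, the sketch below reduces a tracked ball trajectory to a sequence of pitch zones and checks whether a diagram, encoded as an ordered list of zones, occurs in it as a subsequence. The 105 m x 68 m pitch, the 3 x 3 zone grid, the ball-only encoding and all function names are assumptions; the paper's essential features and matching procedure are not described in this abstract.

```python
PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0  # assumed pitch size in metres


def zone(x, y, nx=3, ny=3):
    """Map a pitch coordinate to a (column, row) cell of an nx-by-ny grid."""
    col = min(max(int(x / PITCH_LENGTH * nx), 0), nx - 1)
    row = min(max(int(y / PITCH_WIDTH * ny), 0), ny - 1)
    return (col, row)


def to_zone_sequence(coords):
    """Convert a trajectory to zones, collapsing consecutive repeats."""
    seq = []
    for x, y in coords:
        z = zone(x, y)
        if not seq or seq[-1] != z:
            seq.append(z)
    return seq


def matches_diagram(coords, diagram):
    """True if the diagram's zones appear in order (not necessarily adjacent)."""
    it = iter(to_zone_sequence(coords))
    return all(z in it for z in diagram)  # 'in' consumes the iterator, enforcing order
```

Encoding only the order of zones makes the match tolerant to timing differences between plays, at the cost of ignoring the positions of individual players.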

Prediction of Domain-Domain Interactions Using Inductive Logic Programming from Multiple Genome Databases

Thanh Phuong Nguyen; Tu Bao Ho

Protein domains are the building blocks of proteins; their interactions are crucial for forming stable protein-protein interactions (PPI) and take part in many cellular processes and biochemical events. Prediction of protein domain-domain interactions (DDI) is an emerging problem in computational biology. Unlike earlier work on DDI prediction, which exploits only a single protein database, this paper introduces an integrative approach to DDI prediction that exploits multiple genome databases using inductive logic programming (ILP). The main contributions of this work to biomedical knowledge discovery are a newly generated database of more than 100,000 ground facts for twenty predicates on protein domains, and various DDI findings that were evaluated as significant. Experimental results show that ILP is more appropriate for this learning problem than several other methods. Many predictive rules associated with domain sites, conserved motifs, protein functions and biological pathways were also found.

II - Long Papers | Pp. 185-196
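ILP rules are Horn clauses evaluated against ground facts such as those in the database described in the DDI paper. The sketch below checks whether a candidate rule covers an example domain pair by backtracking over the facts, with variables written as strings starting with `?`. The predicate `has_motif`, the domain identifiers and the rule itself are hypothetical, chosen only to show how coverage of positive and negative examples is computed.

```python
def match(literal, fact, binding):
    """Unify one body literal with a ground fact; variables start with '?'."""
    if len(literal) != len(fact) or literal[0] != fact[0]:
        return None
    b = dict(binding)
    for term, value in zip(literal[1:], fact[1:]):
        if term.startswith("?"):
            if b.setdefault(term, value) != value:
                return None  # variable already bound to a different constant
        elif term != value:
            return None
    return b


def solutions(body, facts, binding):
    """Yield every variable binding that satisfies all literals in body."""
    if not body:
        yield binding
        return
    for fact in facts:
        b = match(body[0], fact, binding)
        if b is not None:
            yield from solutions(body[1:], facts, b)


def covers(head_vars, body, example, facts):
    """Does the rule head(head_vars) :- body cover this ground example?"""
    binding = dict(zip(head_vars, example))
    return next(solutions(body, facts, binding), None) is not None


def coverage(head_vars, body, examples, facts):
    return sum(covers(head_vars, body, e, facts) for e in examples)
```

For instance, the hypothetical rule interacts(?X, ?Y) :- has_motif(?X, ?M), has_motif(?Y, ?M) covers a pair of domains that share a motif; an ILP system would search for rules with high coverage of positive pairs and low coverage of negative ones.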