Catálogo de publicaciones - libros

Compartir en
redes sociales


Knowledge Discovery in Databases: PKDD 2007: 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, Warsaw, Poland, September 17-21, 2007. Proceedings

Joost N. Kok ; Jacek Koronacki ; Ramon Lopez de Mantaras ; Stan Matwin ; Dunja Mladenič ; Andrzej Skowron (eds.)

En conferencia: 11º European Conference on Principles of Data Mining and Knowledge Discovery (PKDD) . Warsaw, Poland . September 17, 2007 - September 21, 2007

Resumen/Descripción – provisto por la editorial

No disponible.

Palabras clave – provistas por la editorial

No disponibles.

Disponibilidad
Institución detectada Año de publicación Navegá Descargá Solicitá
No detectada 2007 SpringerLink

Información

Tipo de recurso:

libros

ISBN impreso

978-3-540-74975-2

ISBN electrónico

978-3-540-74976-9

Editor responsable

Springer Nature

País de edición

Reino Unido

Fecha de publicación

Información sobre derechos de publicación

© Springer-Verlag Berlin Heidelberg 2007

Tabla de contenidos

An Effective Approach to Enhance Centroid Classifier for Text Categorization

Songbo Tan; Xueqi Cheng

Centroid Classifier has been shown to be a simple and yet effective method for text categorization. However, it is often plagued with model misfit (or inductive bias) incurred by its assumption. To address this issue, a novel Model Adjustment algorithm was proposed. The basic idea is to make use of some criteria to adjust Centroid Classifier model. In this work, the criteria include training-set errors as well as training-set margins. The empirical assessment indicates that proposed method performs slightly better than SVM classifier in prediction accuracy, as well as beats it in running time.

- Short Papers | Pp. 581-588

Automatic Categorization of Human-Coded and Evolved CoreWar Warriors

Nenad Tomašev; Doni Pracner; Miloš Radovanović; Mirjana Ivanović

CoreWar is a computer simulation devised in the 1980s where programs loaded into a virtual memory array compete for control over the virtual machine. These programs are written in a special-purpose assembly language called and referred to as . A great variety of environments and battle strategies have emerged over the years, leading to formation of different warrior types. This paper deals with the problem of automatic warrior categorization, presenting results of classification based on several approaches to warrior representation, and offering insight into ambiguities concerning the identification of strategic classes. Over 600 human-coded warriors were annotated, forming a training set for classification. Several major classifiers were used, SVMs proving to be the most reliable, reaching accuracy of 84%. Classification of an evolved warrior set using the trained classifiers was also conducted. The obtained results proved helpful in outlining the issues with both automatic and manual Redcode program categorization.

- Short Papers | Pp. 589-596

Utility-Based Regression

Luis Torgo; Rita Ribeiro

Cost-sensitive learning is a key technique for addressing many real world data mining applications. Most existing research has been focused on classification problems. In this paper we propose a framework for evaluating regression models in applications with non-uniform costs and benefits across the domain of the continuous target variable. Namely, we describe two metrics for asserting the costs and benefits of the predictions of any model given a set of test cases. We illustrate the use of our metrics in the context of a specific type of applications where non-uniform costs are required: the prediction of rare extreme values of a continuous target variable. Our experiments provide clear evidence of the utility of the proposed framework for evaluating the merits of any model in this class of regression domains.

- Short Papers | Pp. 597-604

Multi-label Lazy Associative Classification

Adriano Veloso; Wagner Meira; Marcos Gonçalves; Mohammed Zaki

Most current work on classification has been focused on learning from a set of instances that are associated with a single label (i.e., single-label classification). However, many applications, such as gene functional prediction and text categorization, may allow the instances to be associated with multiple labels simultaneously. Multi-label classification is a generalization of single-label classification, and its generality makes it much more difficult to solve.

Despite its importance, research on multi-label classification is still lacking. Common approaches simply learn independent binary classifiers for each label, and do not exploit dependencies among labels. Also, several small disjuncts may appear due to the possibly large number of label combinations, and neglecting these small disjuncts may degrade classification accuracy. In this paper we propose a multi-label lazy associative classifier, which progressively exploits dependencies among labels. Further, since in our lazy strategy the classification model is induced on an instance-based fashion, the proposed approach can provide a better coverage of small disjuncts. Gains of up to 24% are observed when the proposed approach is compared against the state-of-the-art multi-label classifiers.

- Short Papers | Pp. 605-612

Visual Exploration of Genomic Data

Michail Vlachos; Bahar Taneri; Eamonn Keogh; Philip S. Yu

In this study, we present methods for comparative visualization of DNA sequences in two dimensions. First, we illustrate a transformation of gene sequences into numerical trajectories. The trajectory visually captures the nucleotide content of each sequence, allowing for fast and easy visualization of long DNA sequences. Then, we project the relative placement of the trajectories on the 2D plane using a spanning-tree arrangement method, which allows the efficient comparison of multiple sequences. We demonstrate with various examples the applicability of our technique in evolutionary biology and specifically in capturing and visualizing the molecular phylogeny between species.

- Short Papers | Pp. 613-620

Association Mining in Large Databases: A Re-examination of Its Measures

Tianyi Wu; Yuguo Chen; Jiawei Han

In the literature of data mining and statistics, numerous interestingness measures have been proposed to disclose succinct object relationships of association patterns. However, it is still not clear when a measure is truly effective in large data sets. Recent studies have identified a critical property, (), for measuring event associations in large data sets, but many existing measures do not have this property. We thus re-examine the null-invariant measures and find interestingly that they can be expressed as a generalized mathematical mean, and there exists a total ordering of them. This ordering provides insights into the underlying philosophy of the measures and helps us understand and select the proper measure for different applications.

- Short Papers | Pp. 621-628

Semantic Text Classification of Emergent Disease Reports

Yi Zhang; Bing Liu

Traditional text classification studied in the information retrieval and machine learning literature is mainly based on topics. That is, each class represents a particular topic, e.g., sports and politics. However, many real-world problems require more refined classification based on some . For example, in a set of sentences about a disease, some may report outbreaks of the disease, some may describe how to cure the disease, and yet some may discuss how to prevent the disease. To classify sentences at this semantic level, the traditional bag-of-words model is no longer sufficient. In this paper, we study semantic sentence classification of disease reporting. We show that both keywords and sentence semantic features are useful. Our results demonstrated that this integrated approach is highly effective.

- Short Papers | Pp. 629-637