Publications catalogue – books



Intelligent Information Processing and Web Mining: Proceedings of the International IIS: IIPWM'06 Conference held in Ustroń, Poland, June 19-22, 2006

Mieczysław A. Kłopotek ; Sławomir T. Wierzchoń ; Krzysztof Trojanowski (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability

Detected institution | Publication year | Browse       | Download | Request
Not detected         | 2006             | SpringerLink | –        | –

Information

Resource type:

books

Print ISBN

978-3-540-33520-7

Electronic ISBN

978-3-540-33521-4

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer 2006

Table of contents

Adaptive Document Maps

Krzysztof Ciesielski; Michał Dramiński; Mieczysław A. Kłopotek; Dariusz Czerski; Sławomir T. Wierzchoń

Document map creation algorithms such as WebSOM are computationally expensive and hard to reproduce even from the same set of documents, so a new methodology is needed for constructing document maps that can handle streams of new documents entering a collection. This paper addresses that challenge: a multi-stage process guarantees the incrementality of the document map. The quality of the map generation process is investigated using a number of clustering and classification measures, and conclusions are drawn about the impact of the incremental, topic-sensitive approach on map quality.
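The incremental update at the heart of SOM-style document maps can be sketched as follows. This is a generic self-organizing-map pass over plain Python vectors (the `train_som_incremental` helper and its parameters are illustrative), not the multi-stage, topic-sensitive algorithm of the paper:

```python
import math

def train_som_incremental(grid, docs, lr=0.5, radius=1.0, epochs=5):
    """Fold new document vectors into an existing map without retraining
    from scratch.

    grid : dict mapping (row, col) -> reference vector (list of floats)
    docs : iterable of document vectors (lists of floats)
    """
    for _ in range(epochs):
        for doc in docs:
            # best-matching unit: the cell whose reference vector is closest
            bmu = min(grid, key=lambda c: math.dist(grid[c], doc))
            for cell, ref in grid.items():
                # Gaussian neighbourhood: cells near the BMU move more
                d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = lr * math.exp(-d2 / (2 * radius ** 2))
                grid[cell] = [r + h * (x - r) for r, x in zip(ref, doc)]
        lr *= 0.9       # decay learning rate and radius between passes
        radius *= 0.9
    return grid
```

Because the map is updated in place, a new batch of documents from the stream can be folded in with another call rather than rebuilding the whole map.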

IV - Regular Sessions: Web Technologies | Pp. 109-120

Faster Frequent Pattern Mining from the Semantic Web

Joanna Józefowska; Agnieszka Ławrynowicz; Tomasz Łukaszewski

In this paper we propose a method for frequent pattern discovery from knowledge bases represented in OWL DLP. OWL DLP, also known as Description Logic Programs, is the intersection of the expressivity of OWL DL and Logic Programming. Our method is based on a special form of the trie data structure. A similar structure was used for frequent pattern discovery in classical and relational data mining settings, giving a significant gain in efficiency. Our approach is illustrated on an example ontology.
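The trie idea can be illustrated with a generic itemset-counting trie of the kind used in classical frequent pattern mining; the OWL DLP-specific pattern representation is not reproduced here, and all names are illustrative:

```python
from itertools import combinations

class Trie:
    """Prefix tree for counting candidate itemsets (items kept sorted)."""
    def __init__(self):
        self.children = {}
        self.count = 0

    def insert(self, itemset):
        node = self
        for item in sorted(itemset):
            node = node.children.setdefault(item, Trie())
        return node  # leaf node that will hold the support count

    def count_support(self, transactions, k):
        # walk every k-subset of each transaction down the trie;
        # subsets that reach a leaf are candidates and get counted
        for t in transactions:
            for subset in combinations(sorted(t), k):
                node = self
                for item in subset:
                    node = node.children.get(item)
                    if node is None:
                        break
                else:
                    node.count += 1

def frequent_itemsets(transactions, candidates, k, minsup):
    trie = Trie()
    leaves = {c: trie.insert(c) for c in map(frozenset, candidates)}
    trie.count_support(transactions, k)
    return {c for c, leaf in leaves.items() if leaf.count >= minsup}
```

Sharing prefixes lets one pass over the data count all candidates at once, which is the source of the efficiency gain the abstract refers to.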

IV - Regular Sessions: Web Technologies | Pp. 121-130

Collective Behaviour of Cellular Automata Rules and Symmetric Key Cryptography

Miroslaw Szaban; Franciszek Seredyński; Pascal Bouvry

Cellular automata (CA) are applied in cryptographic systems. A genetic algorithm (GA) is used to search a predefined set of rules for new subsets of rules controlling the CA. The CA generates high-quality pseudorandom number sequences (PNSs) by applying these new subsets of rules. The discovered subsets form a very efficient cryptographic module used as a pseudorandom number sequence generator (PNSG); bad subsets of rules are also discovered and eliminated.
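A minimal sketch of a CA-based bit-stream generator, assuming an elementary (radius-1, binary) CA and using the well-known rule 30 as a stand-in for the GA-discovered rule subsets, which are not reproduced here:

```python
def ca_step(state, rule):
    """One synchronous update of a 1-D binary CA with periodic boundary.
    `rule` is a Wolfram rule number (0-255) for radius-1 neighbourhoods."""
    n = len(state)
    return [
        (rule >> (state[(i - 1) % n] << 2 | state[i] << 1 | state[(i + 1) % n])) & 1
        for i in range(n)
    ]

def ca_prng_bits(seed_state, rule, steps, tap=0):
    """Read a pseudorandom bit stream from one cell of the evolving CA."""
    bits, state = [], list(seed_state)
    for _ in range(steps):
        state = ca_step(state, rule)
        bits.append(state[tap])
    return bits
```

In the paper's setting each cell may follow its own rule from the discovered subset; the single-rule version above only shows the update mechanics.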

IV - Regular Sessions: Web Technologies | Pp. 131-140

Experiments on Data with Three Interpretations of Missing Attribute Values—A Rough Set Approach

Jerzy W. Grzymała-Busse; Steven Santoso

In this paper we distinguish three types of missing attribute values: lost values (e.g., erased values), “do not care” conditions (attribute values that are irrelevant for classifying a case), and attribute-concept values (“do not care” conditions restricted to a specific concept). As is known, subset and concept approximations should be used for knowledge acquisition from incomplete data sets. We report results of experiments on seven well-known incomplete data sets using nine strategies: interpreting missing attribute values in the three different ways and using both lower and upper, subset and concept approximations (note that subset lower approximations are identical to concept lower approximations). Additionally, cases in which more than approximately 70% of the attribute values were missing were removed from the original data sets, and all nine strategies were applied again. Our conclusion is that any two of the nine strategies are incomparable in terms of error rates (5% significance level, two-tailed test). However, for some data sets, removing cases with an excessive number of missing attribute values improves the error rate.
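One common formulation of these approximations for incomplete decision tables builds on characteristic sets; the sketch below uses `?` for lost values and `*` for “do not care” conditions and computes concept lower and upper approximations (attribute-concept values and the experimental setup are omitted):

```python
LOST, DONT_CARE = "?", "*"

def characteristic_set(data, i):
    """Cases indistinguishable from case i given its known attribute values."""
    ks = set(range(len(data)))
    for a, v in enumerate(data[i]):
        if v in (LOST, DONT_CARE):
            continue  # an unknown value places no restriction
        ks &= {j for j, row in enumerate(data)
               if row[a] == v or row[a] == DONT_CARE}
    return ks

def concept_approximations(data, concept):
    """Concept lower and upper approximations of a set of case indices."""
    lower, upper = set(), set()
    for x in concept:
        k = characteristic_set(data, x)
        if k <= concept:      # characteristic set entirely inside the concept
            lower |= k
        upper |= k            # every case possibly in the concept
    return lower, upper
```

The lower approximation collects cases certainly belonging to the concept; the upper approximation collects those possibly belonging to it, which is what the rule induction strategies in the paper operate on.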

V - Regular Sessions: Foundations of Knowledge Discovery | Pp. 143-152

Tableau Method with Free Variables for Intuitionistic Logic

Boris Konev; Alexander Lyaletski

In this paper, we address proof search in tableaux with free variables for intuitionistic logic by introducing the notion of an admissible substitution into a quantifier-free calculus. Admissibility of a substitution is determined by the quantifier structure of the given formulae and by dependencies between variables in the substitution. With this notion of admissibility, we avoid the need both for Skolemisation and for checking different possible orders of quantifier rule applications. We demonstrate our approach on a series of examples.

V - Regular Sessions: Foundations of Knowledge Discovery | Pp. 153-162

A Similarity Measure between Tandem Duplication Trees

Jakub Koperwas; Krzysztof Walczak

This paper opens the way to understanding the nature of the unequal crossing-over process, one of the mechanisms that leads to the creation of new genes. Data mining and tree mining approaches are modified to fit this particular biological problem. Two novel notions, the similarity of a duplication process and the similarity of a duplication region, are proposed and established as the foundation for further analysis. The role and applications of the duplication process similarity measure are discussed, and a roadmap for further extensive studies, together with first interesting results, is presented.

V - Regular Sessions: Foundations of Knowledge Discovery | Pp. 163-172

Finding Optimal Decision Trees

Petr Máša; Tomáš Kočka

This paper presents a new algorithm that finds the generative model of a decision tree from data. We show that, for infinite data and a finite number of attributes, the algorithm always finds the generative model (i.e. the decision tree from which the data were generated), except for a measure-zero set of distributions. The algorithm returns reasonable results even when these assumptions are not satisfied. It is polynomial in the number of leaves of the generative model, compared to the exponential complexity of the trivial exhaustive search algorithm; a similar result was recently obtained for learning Bayesian networks from data ([1],[2]). An experimental comparison of the new algorithm with the CART standard on both simulated and real data shows significant improvements over CART in both cases. For simplicity, the paper is restricted to binary variables, but the results can easily be generalized.
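For context, the greedy CART-style baseline the paper compares against can be sketched for binary attributes as follows; this is an illustrative toy learner, not the paper's generative-model algorithm or the actual CART implementation:

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def grow_tree(rows, labels, attrs):
    """Greedily split on the binary attribute minimising weighted impurity.
    A tree is either a leaf label or a tuple (attr, left, right)."""
    if len(set(labels)) <= 1 or not attrs:
        return sum(labels) * 2 >= len(labels)  # majority vote at the leaf
    best = None
    for a in attrs:
        left = [l for r, l in zip(rows, labels) if r[a] == 0]
        right = [l for r, l in zip(rows, labels) if r[a] == 1]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, a)
    a = best[1]
    rest = [b for b in attrs if b != a]
    lpairs = [(r, l) for r, l in zip(rows, labels) if r[a] == 0]
    rpairs = [(r, l) for r, l in zip(rows, labels) if r[a] == 1]
    return (a,
            grow_tree([r for r, _ in lpairs], [l for _, l in lpairs], rest),
            grow_tree([r for r, _ in rpairs], [l for _, l in rpairs], rest))

def predict(tree, row):
    while isinstance(tree, tuple):
        a, left, right = tree
        tree = right if row[a] else left
    return tree
```

Greedy splitting of this kind can fail to recover the tree the data were generated from, which is precisely the gap the paper's algorithm addresses.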

V - Regular Sessions: Foundations of Knowledge Discovery | Pp. 173-181

Attribute Number Reduction Process and Nearest Neighbor Methods in Machine Learning

Aleksander Sokołowski; Anna Gładysz

Several nearest neighbor methods were applied to the decision-making process on the E522144 base and its modified versions, which are collections of cases of melanocytic skin lesions. The modification consists in reducing the number of base attributes from 14 to 13, 4, 3, 2, and finally 1; the reduction is performed by concatenating the values of particular attributes. The paper reports the influence of this process on the quality of the decision-making process.

V - Regular Sessions: Foundations of Knowledge Discovery | Pp. 183-187

The Use of Compound Attributes in AQ Learning

Janusz Wojtusiak; Ryszard S. Michalski

Compound attributes are named groups of attributes that have been introduced in Attributional Calculus (AC) to facilitate learning descriptions of objects whose components are characterized by different subsets of attributes. The need for such descriptions appears in many practical applications. A method for handling compound attributes in AQ learning and testing is described and illustrated by examples.

V - Regular Sessions: Foundations of Knowledge Discovery | Pp. 189-198

Residuals for Two-Way Contingency Tables, Especially Those Computed for Multiresponses

Guillermo Bali Ch.; Dariusz Czerski; Mieczysław A. Kłopotek; Andrzej Matuszewski

The problem of testing statistical hypotheses of the independence of two multiresponse variables is considered. This is a specific inferential setting for analyzing certain patterns, particularly in questionnaire data, where the analyst typically looks for combinations of responses chosen by respondents more frequently than others. An adequate formalism for considerations of this kind involves the calculation of p-values within so-called post-hoc analysis. Since this analysis is often connected with only one cell of the contingency table, we consider the residual (or deviate) of that cell. Based on theoretical and experimental study, we propose algorithms that can be effective for this problem, considering the case of 2 × 3 tables. Some aspects are also relevant for classical, i.e. single-response, contingency tables.
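For a classical (single-response) table, the residual of a cell is commonly taken as the Pearson residual scaled by its estimated standard deviation; a sketch, assuming Haberman-style adjusted residuals rather than the multiresponse corrections studied in the paper:

```python
import math

def adjusted_residuals(table):
    """Adjusted residuals for a two-way contingency table of counts.
    Under independence each residual is approximately standard normal."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    out = []
    for i, row in enumerate(table):
        out.append([])
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / n          # expected count
            var = exp * (1 - rows[i] / n) * (1 - cols[j] / n)
            out[i].append((obs - exp) / math.sqrt(var))
    return out
```

A cell whose adjusted residual is large in absolute value (e.g. beyond ±1.96) marks a response combination chosen markedly more or less often than independence predicts, which is the kind of pattern the post-hoc analysis targets.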

VI - Regular Sessions: Statistical Methods in Knowledge Discovery | Pp. 201-210