Publications catalogue - books
Intelligent Information Processing and Web Mining: Proceedings of the International IIS: IIPWM'06 Conference held in Ustroń, Poland, June 19-22, 2006
Mieczysław A. Kłopotek; Sławomir T. Wierzchoń; Krzysztof Trojanowski (eds.)
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Not available.
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-33520-7
Electronic ISBN
978-3-540-33521-4
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Publication rights information
© Springer 2006
Table of contents
Adaptive Document Maps
Krzysztof Ciesielski; Michał Dramiński; Mieczysław A. Kłopotek; Dariusz Czerski; Sławomir T. Wierzchoń
Document map creation algorithms such as WebSOM are computationally expensive and hard to reproduce even from the same set of documents, so a new methodology is needed for constructing document maps that can handle streams of new documents entering the collection. This paper addresses that challenge. A multi-stage process guarantees the incrementality of the document map. The quality of the map generation process is evaluated with a number of clustering and classification measures, and conclusions are drawn about the impact of the incremental, topic-sensitive approach on map quality.
IV - Regular Sessions: Web Technologies | Pp. 109-120
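To make the incremental-map idea concrete, the following is a minimal SOM-style update loop for document vectors: a generic sketch of how a stream of new documents can be folded into an existing map, not the authors' WebSOM-derived method. The map size, learning rate, neighbourhood radius and vector dimensionality are arbitrary choices for illustration.

```python
# Minimal SOM-style incremental update for document vectors (generic sketch,
# not the paper's algorithm).
import numpy as np

def train_increment(codebook, docs, lr=0.1, radius=1.0):
    """Fold a batch of new document vectors into an existing map (codebook).

    codebook: (rows, cols, dim) array of cell reference vectors.
    docs:     (n, dim) array of new document vectors (e.g. tf-idf).
    """
    rows, cols, _ = codebook.shape
    grid = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"))
    for d in docs:
        # best-matching unit = cell whose reference vector is closest to the document
        dist = np.linalg.norm(codebook - d, axis=2)
        bmu = np.unravel_index(np.argmin(dist), dist.shape)
        # Gaussian neighbourhood falloff around the BMU on the map grid
        g = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=2) / (2 * radius ** 2))
        # move nearby cells toward the new document (incremental update)
        codebook += lr * g[..., None] * (d - codebook)
    return codebook

# usage: start from a small random map and fold in 100 random "documents"
rng = np.random.default_rng(0)
codebook = rng.normal(size=(5, 5, 20))
codebook = train_increment(codebook, rng.normal(size=(100, 20)))
```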
Faster Frequent Pattern Mining from the Semantic Web
Joanna Józefowska; Agnieszka Ławrynowicz; Tomasz Łukaszewski
In this paper we propose a method for frequent pattern discovery from knowledge bases represented in OWL DLP. OWL DLP, also known as Description Logic Programs, is the intersection of the expressivity of OWL DL and Logic Programming. Our method is based on a special form of a trie data structure. A similar structure was used for frequent pattern discovery in classical and relational data mining settings, giving a significant gain in efficiency. Our approach is illustrated on an example ontology.
IV - Regular Sessions: Web Technologies | Pp. 121-130
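As a rough illustration of why a trie speeds up support counting, here is a generic Apriori-style candidate trie. The plain itemset setting stands in for the paper's OWL DLP pattern language, and all item names and transactions below are made up.

```python
# Generic candidate trie for support counting (Apriori-style sketch,
# not the paper's OWL DLP pattern language).
class TrieNode:
    def __init__(self):
        self.children = {}   # item -> TrieNode
        self.count = 0       # support counter for the itemset ending here

def insert(root, itemset):
    """Register a candidate itemset (a sorted tuple of items) in the trie."""
    node = root
    for item in itemset:
        node = node.children.setdefault(item, TrieNode())
    return node

def count_supports(root, transactions):
    """One pass over the data: increment counters of every candidate
    contained in each transaction (items are sorted before the walk)."""
    def walk(node, items):
        for i, item in enumerate(items):
            child = node.children.get(item)
            if child is not None:
                child.count += 1
                walk(child, items[i + 1:])
    for t in transactions:
        walk(root, tuple(sorted(t)))

# usage: count three candidate 2-itemsets over a toy transaction database
root = TrieNode()
for cand in [("a", "b"), ("a", "c"), ("b", "c")]:
    insert(root, cand)
count_supports(root, [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}])
print(root.children["a"].children["b"].count)  # support of {a, b} -> 2
```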
Collective Behaviour of Cellular Automata Rules and Symmetric Key Cryptography
Miroslaw Szaban; Franciszek Seredyński; Pascal Bouvry
Cellular automata (CA) are applied in cryptographic systems. A genetic algorithm (GA) is used to search, within a predefined set of rules, for new subsets of rules controlling the CA. High-quality pseudorandom number sequences (PNSs) are generated by the CA applying these new subsets of rules. The discovered subsets form a very efficient cryptographic module used as a pseudorandom number sequence generator (PNSG). Bad subsets of rules are also discovered and eliminated.
IV - Regular Sessions: Web Technologies | Pp. 131-140
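For readers unfamiliar with CA-based generators, the sketch below runs a one-dimensional cellular automaton and emits its central cell as a bit stream. Rule 30 is used only as a familiar stand-in; the rule subsets discovered by the GA in the paper are not reproduced here, and the lattice width and step count are arbitrary.

```python
# One-dimensional cellular automaton used as a bit-sequence generator.
# Rule 30 is a well-known pseudorandom-looking example; the paper's GA
# searches for whole SUBSETS of rules, which this sketch does not do.
def ca_bits(rule=30, width=64, steps=256, seed=None):
    table = [(rule >> i) & 1 for i in range(8)]          # rule as a lookup table
    cells = seed or [0] * (width // 2) + [1] + [0] * (width - width // 2 - 1)
    out = []
    for _ in range(steps):
        out.append(cells[width // 2])                    # emit the central cell
        cells = [table[(cells[(i - 1) % width] << 2)
                       | (cells[i] << 1)
                       | cells[(i + 1) % width]]
                 for i in range(width)]                  # synchronous update
    return out

print("".join(map(str, ca_bits()[:32])))                 # first 32 output bits
```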
Experiments on Data with Three Interpretations of Missing Attribute Values—A Rough Set Approach
Jerzy W. Grzymała-Busse; Steven Santoso
In this paper we distinguish three different types of missing attribute values: lost values (e.g., erased values), "do not care" conditions (attribute values that were irrelevant for classifying a case), and attribute-concept values ("do not care" conditions restricted to a specific concept). As is known, subset and concept approximations should be used for knowledge acquisition from incomplete data sets. We report results of experiments on seven well-known incomplete data sets using nine strategies: interpreting missing attribute values in the three different ways and using lower and upper, subset and concept approximations (note that subset lower approximations are identical to concept lower approximations, so each interpretation yields three distinct approximations and nine strategies in total). Additionally, cases with more than approximately 70% missing attribute values were removed from the original data sets, and all nine strategies were then applied to the reduced data sets. We conclude that any two of the nine strategies are incomparable in terms of error rate (5% significance level, two-tailed test). However, for some data sets, removing cases with an excessive number of missing attribute values improves the error rate.
V - Regular Sessions: Foundations of Knowledge Discovery | Pp. 143-152
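The characteristic-set machinery behind subset and concept approximations can be sketched as follows. This is a simplified illustration following the usual rough-set definitions for incomplete decision tables, with '?' read as a lost value and '*' as a "do not care" condition; the attribute-concept interpretation is omitted, the toy table is invented, and this is not the authors' experimental code.

```python
# Characteristic sets and concept approximations for an incomplete decision
# table ('?' = lost, '*' = do not care); simplified sketch, attribute-concept
# values ('-') are not handled.
def characteristic_set(table, attrs, x):
    """Cases indistinguishable from case x given its known attribute values."""
    K = set(range(len(table)))
    for a in attrs:
        v = table[x][a]
        if v in ("?", "*"):          # missing value: this attribute gives no constraint
            continue
        K &= {y for y in range(len(table))
              if table[y][a] == v or table[y][a] == "*"}
    return K

def concept_approximations(table, attrs, concept):
    """Concept lower and upper approximations of a set of case indices."""
    lower, upper = set(), set()
    for x in concept:
        K = characteristic_set(table, attrs, x)
        upper |= K
        if K <= concept:
            lower |= K
    return lower, upper

# toy table: two attributes and a decision; concept = cases with decision 'yes'
table = [{"t": "high", "h": "low",  "d": "yes"},
         {"t": "high", "h": "*",    "d": "yes"},
         {"t": "high", "h": "high", "d": "no"},
         {"t": "low",  "h": "low",  "d": "no"}]
concept = {i for i, r in enumerate(table) if r["d"] == "yes"}
print(concept_approximations(table, ["t", "h"], concept))  # ({0, 1}, {0, 1, 2})
```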
Tableau Method with Free Variables for Intuitionistic Logic
Boris Konev; Alexander Lyaletski
In this paper, we address proof search in tableaux with free variables for intuitionistic logic by introducing the notion of an admissible substitution into a quantifier-free calculus. Admissibility of a substitution is determined by the quantifier structure of the given formulae and by dependencies between variables in the substitution. With this notion of admissibility, we avoid the need both for Skolemisation and for checking different possible orders of quantifier rule applications. We demonstrate our approach on a series of examples.
V - Regular Sessions: Foundations of Knowledge Discovery | Pp. 153-162
A Similarity Measure between Tandem Duplication Trees
Jakub Koperwas; Krzysztof Walczak
This paper opens the gate to understanding the nature of the unequal crossing-over process, one of the mechanisms that leads to the creation of new genes. Data mining and tree mining approaches are modified to fit this particular biological problem. Two novel notions, the similarity of the duplication process and the similarity of a duplication region, are proposed and established as the foundation of further analysis. The role and applications of the duplication-process similarity measure are discussed, and a roadmap for further extensive studies is presented together with first interesting results.
V - Regular Sessions: Foundations of Knowledge Discovery | Pp. 163-172
Finding Optimal Decision Trees
Petr Máša; Tomáš Kočka
This paper presents a new algorithm that finds the generative model of a decision tree from data. We show that, for infinite data and a finite number of attributes, the algorithm always finds the generative model (i.e. the decision tree from which the data were generated), except for a measure-zero set of distributions. The algorithm returns reasonable results even when these assumptions are not satisfied. It is polynomial in the number of leaves of the generative model, compared to the exponential complexity of the trivial exhaustive-search algorithm. A similar result was recently obtained for learning Bayesian networks from data ([1],[2]). An experimental comparison of the new algorithm with the standard CART algorithm on both simulated and real data is presented; the new algorithm shows significant improvements over CART in both cases. For simplicity, the whole paper is restricted to binary variables, but it can be easily generalized.
V - Regular Sessions: Foundations of Knowledge Discovery | Pp. 173-181
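For contrast with the paper's generative-model search, the following is a minimal greedy, CART-style learner on binary attributes, i.e. the kind of baseline the experiments compare against. The paper's own algorithm is not reproduced here, and the XOR toy data is an arbitrary example.

```python
# Minimal greedy CART-style decision tree on binary attributes (baseline sketch,
# not the paper's generative-model-recovering algorithm).
from collections import Counter

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow(rows, labels, attrs):
    """rows: list of dicts with 0/1 attribute values; returns a nested dict tree."""
    if len(set(labels)) <= 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]        # leaf: majority label
    def split_score(a):                                     # weighted Gini of the split
        left = [l for r, l in zip(rows, labels) if r[a] == 0]
        right = [l for r, l in zip(rows, labels) if r[a] == 1]
        n = len(labels)
        return len(left) / n * gini(left) + len(right) / n * gini(right)
    best = min(attrs, key=split_score)
    node = {"attr": best}
    for v in (0, 1):
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == v]
        if not sub:
            node[v] = Counter(labels).most_common(1)[0][0]
        else:
            srows, slabels = zip(*sub)
            node[v] = grow(list(srows), list(slabels),
                           [a for a in attrs if a != best])
    return node

# usage: learn y = x1 XOR x2 from an exhaustive truth table
rows = [{"x1": a, "x2": b} for a in (0, 1) for b in (0, 1)]
labels = [r["x1"] ^ r["x2"] for r in rows]
print(grow(rows, labels, ["x1", "x2"]))
```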
Attribute Number Reduction Process and Nearest Neighbor Methods in Machine Learning
Aleksander Sokołowski; Anna Gładysz
Several nearest neighbor methods were applied to the decision-making process on the E522144 base and its modified versions, which are collections of cases of melanocytic skin lesions. Modification of the bases consists in reducing the number of attributes from 14 to 13, 4, 3, 2 and finally 1 by concatenating the values of particular attributes. The influence of this reduction on the quality of the decision-making process is reported in the paper.
V - Regular Sessions: Foundations of Knowledge Discovery | Pp. 183-187
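The attribute-concatenation idea can be sketched as below: selected attribute values are glued into a single compound value, and a plain 1-NN classifier with Hamming distance is applied to the reduced representation. The attribute names, groups and cases here are invented for illustration and do not come from the E522144 base.

```python
# Attribute-concatenation reduction followed by 1-NN with Hamming distance
# (illustrative sketch on made-up data).
def concatenate(case, groups):
    """groups: list of lists of attribute names; each group becomes one attribute."""
    return tuple("|".join(str(case[a]) for a in g) for g in groups)

def nn_classify(query, train, groups):
    """1-NN by Hamming distance over the reduced (concatenated) attributes."""
    q = concatenate(query, groups)
    def dist(case):
        c = concatenate(case, groups)
        return sum(x != y for x, y in zip(q, c))
    return min(train, key=dist)["label"]

train = [{"asymmetry": 1, "border": 2, "color": 3, "label": "benign"},
         {"asymmetry": 2, "border": 3, "color": 5, "label": "suspicious"}]
groups = [["asymmetry", "border"], ["color"]]          # 3 attributes reduced to 2
print(nn_classify({"asymmetry": 1, "border": 2, "color": 3}, train, groups))
```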
The Use of Compound Attributes in AQ Learning
Janusz Wojtusiak; Ryszard S. Michalski
Compound attributes are named groups of attributes that have been introduced in Attributional Calculus (AC) to facilitate learning descriptions of objects whose components are characterized by different subsets of attributes. The need for such descriptions appears in many practical applications. A method for handling compound attributes in AQ learning and testing is described and illustrated by examples.
V - Regular Sessions: Foundations of Knowledge Discovery | Pp. 189-198
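A compound attribute can be thought of simply as a named group of component attributes whose values are accessed together. The minimal data-structure sketch below illustrates only that grouping idea, with invented attribute names; it does not reproduce the Attributional Calculus or AQ syntax and semantics.

```python
# Minimal representation of a compound attribute as a named group of components
# (illustrative sketch only; not the AC/AQ formalism).
from dataclasses import dataclass

@dataclass
class CompoundAttribute:
    name: str
    components: tuple          # names of the component attributes

    def value(self, example):
        """Return the compound value: the component values present in the example."""
        return {a: example[a] for a in self.components if a in example}

# e.g. a 'wheel' compound attribute grouping the attributes that describe a wheel
wheel = CompoundAttribute("wheel", ("diameter", "material"))
print(wheel.value({"diameter": 26, "material": "alloy", "color": "red"}))
# -> {'diameter': 26, 'material': 'alloy'}
```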
Residuals for Two-Way Contingency Tables, Especially Those Computed for Multiresponces
Guillermo Bali Ch.; Dariusz Czerski; Mieczysław A. Kłopotek; Andrzej Matuszewski
The problem of testing statistical hypotheses of independence of two multiresponse variables is considered. This is a specific inferential environment for analyzing certain patterns, particularly in questionnaire data. A data analyst normally looks for combinations of responses that are chosen by respondents more frequently than others. A formalism adequate for such considerations is connected with the calculation of p-values within so-called post-hoc analysis. Since this analysis is often connected with only one cell of an appropriate contingency table, we consider the residual (or deviate) of that cell. As a result of a theoretical and experimental study, we consider algorithms that can be effective for the problem. We consider the case of 2 × 3 tables. Some aspects are also relevant for classical, i.e. uniresponse, contingency tables.
VI - Regular Sessions: Statistical Methods in Knowledge Discovery | Pp. 201-210
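For the classical uniresponse case mentioned at the end of the abstract, the standard adjusted (Haberman) residual of a cell and its normal-approximation two-sided p-value can be computed as below. The 2 × 3 counts are invented, and the paper's multiresponse setting requires different reference distributions that are not shown here.

```python
# Adjusted (Haberman) residuals for a 2x3 contingency table with two-sided
# p-values from the normal approximation (classical uniresponse computation).
import math

def adjusted_residuals(table):
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    out = []
    for i, r in enumerate(table):
        row = []
        for j, obs in enumerate(r):
            exp = rows[i] * cols[j] / n                       # expected count under independence
            se = math.sqrt(exp * (1 - rows[i] / n) * (1 - cols[j] / n))
            z = (obs - exp) / se                              # adjusted residual
            p = math.erfc(abs(z) / math.sqrt(2))              # two-sided p-value
            row.append((round(z, 2), round(p, 3)))
        out.append(row)
    return out

# a toy 2x3 table of response counts
for line in adjusted_residuals([[20, 30, 50],
                                [40, 30, 30]]):
    print(line)
```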