Publications catalog - books



Intelligent Information Processing and Web Mining: Proceedings of the International IIS: IIPWM' 05 Conference held in Gdansk, Poland, June 13-16, 2005

Mieczysław A. Kłopotek ; Sławomir T. Wierzchoń ; Krzysztof Trojanowski (eds.)

Abstract/Description (provided by the publisher)

Not available.

Keywords (provided by the publisher)

Not available.

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2005 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-25056-2

Electronic ISBN

978-3-540-32392-1

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Feature Extraction by the SCoTLASS: An Illustrative Example

Anna Bartkowiak; Nickolay T. Trendafilov

Derivation of new features from observed variables has two important goals: reduction of dimensionality and de-noising. A desired property of the derived features is a meaningful interpretation. The SCoTLASS method (Jolliffe, Trendafilov and Uddin, 2003) offers this possibility.

We explore the properties of the SCoTLASS method applied to the yeast genes data investigated in (Bartkowiak et al., 2003, 2004). All the derived features have a genuinely simple, meaningful structure: each new feature is spanned by two original variables belonging to the same block.

Part I - Regular Sessions: Knowledge Discovery and Exploration | Pp. 3-11

Rule Induction for Click-Stream Analysis: Set Covering and Compositional Approach

Petr Berka; Vladimír Laš; Tomáš Kočka

We present a set covering algorithm and a compositional algorithm to describe sequences of WWW page visits in click-stream data. The set covering algorithm uses rule specialization, like the well-known CN2 algorithm; the compositional algorithm is based on our original KEX algorithm. Both algorithms, however, deal with sequences of events (visited pages) instead of sets of attribute-value pairs. The learned rules can be used to predict the next page a user will view, or to describe the most typical paths of WWW page visitors and the dependencies among the pages. We have successfully applied both algorithms to real data from an internet shop and mined useful information from the data.

Part I - Regular Sessions: Knowledge Discovery and Exploration | Pp. 13-22

Family of Instance Reduction Algorithms Versus Other Approaches

Ireneusz Czarnowski; Piotr Jędrzejowicz

The goal of the paper is to compare the performance of instance reduction algorithms (IRA) with that of other approaches. The paper briefly describes a family of instance reduction algorithms proposed by the authors. To evaluate their performance, a computational experiment is carried out, comparing the performance of IRA with several other approaches using alternative machine learning classification tools.

Part I - Regular Sessions: Knowledge Discovery and Exploration | Pp. 23-30

Minimum Spanning Trees Displaying Semantic Similarity

Włodzisław Duch; Paweł Matykiewicz

Similarity of the semantic content of web pages is displayed using interactive graphs presenting fragments of minimum spanning trees. People's homepages are analyzed, parsed into XML documents, and visualized using TouchGraph LinkBrowser, displaying clusters of people who share common interests. The structure of these graphs is strongly affected by the selection of information used to calculate similarity. The influence of simple selection and of Latent Semantic Analysis (LSA) on the structure of such graphs is analyzed. Homepages and lists of publications are converted to word frequency vectors, which are filtered and weighted; the similarity matrix between the normalized vectors is then used to create separate minimum sub-trees showing clusters of people's interests. Results show that in this application simple selection of important keywords is as good as LSA, but with much lower algorithmic complexity.

Part I - Regular Sessions: Knowledge Discovery and Exploration | Pp. 31-40
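As a rough illustration of the similarity-to-tree step described in the abstract above, the following Python sketch builds a single minimum spanning tree with Prim's algorithm, using 1 minus the cosine similarity of two word-frequency vectors as the edge weight. The function names and the choice of cosine distance are assumptions for illustration; the paper's filtering, weighting, LSA variant, and splitting into separate sub-trees are not reproduced.

```python
import math


def cosine_similarity(u, v):
    # Cosine of the angle between two word-frequency vectors (0.0 if either is empty).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def similarity_spanning_tree(vectors):
    """Prim's algorithm with distance = 1 - cosine similarity.

    Returns the tree as a list of (parent, child) index pairs,
    in the order in which vertices join the tree.
    """
    n = len(vectors)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0  # start the tree at vertex 0
    edges = []
    for _ in range(n):
        # Add the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        # Relax distances of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v]:
                d = 1.0 - cosine_similarity(vectors[u], vectors[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges
```

For example, with four hypothetical homepage vectors `[[3, 1, 0, 0], [2, 1, 0, 0], [0, 0, 2, 1], [0, 1, 2, 1]]`, the two pairs of similar pages end up adjacent in the tree, linked through the one cross-pair edge with the smallest distance.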

Concept Description Vectors and the 20 Question Game

Włodzisław Duch; Julian Szymański; Tomasz Sarnatowicz

Knowledge of the properties that apply to a given object is a necessary prerequisite for formulating intelligent questions. Concept description vectors provide the simplest representation of this knowledge, storing for each object information about the values of its properties. Experiments with the automatic creation of concept description vectors from various sources, including ontologies, dictionaries, encyclopedias, and unstructured text, are described. Information collected in this way is used to formulate questions with high discriminating power in the twenty questions game.

Part I - Regular Sessions: Knowledge Discovery and Exploration | Pp. 41-50
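One plausible way to measure the "discriminating power" mentioned above is sketched below: concept description vectors are stored as hypothetical dictionaries of 0/1 property values, and the next question is the property whose yes/no answer splits the remaining candidates with maximum entropy. This criterion, the data layout, and the function names are assumptions, not necessarily what the authors use.

```python
import math


def split_entropy(candidates, cdv, prop):
    # Entropy (in bits) of the yes/no answer to "does it have <prop>?"
    # over the current candidates. Missing values are treated as "no"
    # purely to keep this sketch simple.
    total = len(candidates)
    yes = sum(1 for obj in candidates if cdv[obj].get(prop, 0) == 1)
    h = 0.0
    for count in (yes, total - yes):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def best_question(candidates, cdv):
    # Pick the property with the most even split; ties go to the
    # alphabetically first property.
    props = sorted({p for obj in candidates for p in cdv[obj]})
    return max(props, key=lambda p: split_entropy(candidates, cdv, p))


def filter_candidates(candidates, cdv, prop, answer):
    # Keep only the objects consistent with the player's answer.
    want = 1 if answer else 0
    return [obj for obj in candidates if cdv[obj].get(prop, 0) == want]
```

With a toy vocabulary of four objects (cat, eagle, plane, rock) and three properties, the first question chosen is the one that halves the candidate set, and each answer narrows the set before the next question is selected.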

Automatic Scripts Retrieval and Its Possibilities for Social Sciences Support Applications

Yali Ge; Rafał Rzepka; Kenji Araki

This paper introduces our method for automatically retrieving Schankian-like scripts from Internet resources, together with preliminary results that may interest social sciences researchers. We describe the first module of our system, which is intended to retrieve commonsense knowledge automatically from Web resources using web-mining techniques. It retrieves minimal “object - action - action” scripts that show how humans' common activities vary with the origin of a webpage's author. Such data can be used in economics, psycholinguistics, sociolinguistics, psychology, sociology, or language education. With this paper we would like to draw NLP researchers' attention to the potential of commonsense retrieval and encourage them to consider creating such tools for their own languages.

Part I - Regular Sessions: Knowledge Discovery and Exploration | Pp. 51-58

The Analysis of the Unlabeled Samples of the Iron Age Glass Data

Karol Grudziński; Maciej Karwowski

The late Iron Age glass database contains a significant proportion of samples whose classification is unknown. Data-mining methods such as rule induction, clustering, and visualization are used in this paper to assign each of these samples to one of the three main chronological periods of the glass artifacts (La Tene C1, La Tene C2, La Tene D1). The results of experiments performed with the C4.5 and Ridor algorithms, followed by analysis by domain experts, indicate that the unlabeled samples are a mixture of all classes, with LT C2 and LT D1 in the majority.

Part I - Regular Sessions: Knowledge Discovery and Exploration | Pp. 59-66

Discriminant versus Strong Rule Sets

Jerzy W. Grzymala-Busse; Witold J. Grzymala-Busse; Jay Hamilton

The main objective of our research was to compare two completely different approaches to rule induction. In the first approach, represented by the LEM2 rule induction algorithm, induced rules are discriminant, i.e., every concept is completely described and rules are consistent. In the second approach, represented by the IRIM rule induction algorithm, a few strong and simple rules are induced. These rules do not necessarily describe concepts completely and, in general, are inconsistent. Though LEM2 frequently outperforms IRIM, the difference in performance is statistically insignificant. Thus IRIM, which induces a few strong but simple rules, is a new and interesting addition to the LERS data mining system.

Part I - Regular Sessions: Knowledge Discovery and Exploration | Pp. 67-76

IDARM — Mining of Indirect Association Rules

Przemysław Kazienko

Typical association rules, called “direct” in the paper, reflect relationships between items that relatively often co-occur in common transactions. In the web domain, items correspond to pages and transactions to user sessions. The main idea of the new approach is to discover indirect associations between pages that rarely occur together but share other, “third” pages, called transitive, with which each of them appears relatively frequently. Two types of indirect association rules are described in the paper: partial and complete. The former concern a single transitive page, while the latter cover all existing transitive pages. The presented IDARM algorithm extracts complete indirect association rules, together with their key measure, confidence, using pre-calculated direct rules.

Part I - Regular Sessions: Knowledge Discovery and Exploration | Pp. 77-86
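The following sketch illustrates partial indirect rules in the sense described above, under stated assumptions: direct confidence is the usual conf(a -> b) = support(a, b) / support(a) over user sessions, and the confidence of a partial indirect rule a -> b via a transitive page t is taken to be the product conf(a -> t) * conf(t -> b). The thresholds `min_direct` and `max_together` and all function names are hypothetical. This is not the IDARM algorithm itself, which extracts complete rules covering all transitive pages.

```python
from itertools import combinations


def direct_confidences(sessions):
    # conf(a -> b) = |sessions containing a and b| / |sessions containing a|
    count, pair = {}, {}
    for session in sessions:
        pages = set(session)
        for p in pages:
            count[p] = count.get(p, 0) + 1
        for a, b in combinations(sorted(pages), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1
    conf = {}
    for (a, b), n in pair.items():
        conf[(a, b)] = n / count[a]
        conf[(b, a)] = n / count[b]
    return conf


def partial_indirect_confidence(conf, a, b, t):
    # Assumed definition: chain the two direct confidences through t.
    return conf.get((a, t), 0.0) * conf.get((t, b), 0.0)


def partial_indirect_rules(sessions, min_direct=0.5, max_together=0.0):
    """Return (a, b, t, confidence) for pages a, b that (almost) never
    co-occur directly but are both strongly linked to a transitive page t."""
    conf = direct_confidences(sessions)
    pages = sorted({p for s in sessions for p in s})
    rules = []
    for a in pages:
        for b in pages:
            if a == b or conf.get((a, b), 0.0) > max_together:
                continue  # a and b already co-occur too often
            for t in pages:
                if t in (a, b):
                    continue
                if conf.get((a, t), 0.0) >= min_direct and conf.get((t, b), 0.0) >= min_direct:
                    rules.append((a, b, t, partial_indirect_confidence(conf, a, b, t)))
    return rules
```

For instance, if sessions only ever pair `home` with `news` and `news` with `sport`, the sketch reports `home -> sport` (and `sport -> home`) via the transitive page `news`, even though `home` and `sport` never appear in the same session.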

Building a Concept Hierarchy from a Distance Matrix

Huang-Cheng Kuo; Jen-Peng Huang

Concept hierarchies are important in many generalized data mining applications, such as multiple-level association rule mining. In the literature, a concept hierarchy is usually given by domain experts. In this paper, we propose algorithms to automatically build a concept hierarchy from a provided distance matrix. Our approach modifies the traditional hierarchical clustering algorithms. For the purpose of algorithm evaluation, a distance matrix is derived from the concept hierarchy built by our algorithm, and the root mean squared error between the provided distance matrix and the derived distance matrix is used as the evaluation criterion. We compare traditional hierarchical clustering and our modified algorithm under three strategies for computing cluster distance, namely single link, average link, and complete link. Empirical results show that the traditional algorithm performs better under the complete link strategy than under the other strategies. Our modified algorithms perform almost identically under the three strategies, and they outperform the traditional algorithms in various situations.

Part I - Regular Sessions: Knowledge Discovery and Exploration | Pp. 87-95
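To make the evaluation criterion above concrete, the sketch below runs a naive traditional agglomerative clustering under single, average, or complete link, derives a distance matrix from the resulting hierarchy, and computes the root mean squared error against the provided matrix. It assumes that the derived distance between two items is the link distance at which they are first merged (a cophenetic distance); the function names are hypothetical and the authors' modified algorithm is not shown.

```python
import math
from itertools import combinations


def agglomerate(dist, linkage="average"):
    """Naive agglomerative clustering over a symmetric distance matrix.

    Returns the derived matrix: for each pair of items, the cluster
    distance at which they first end up in the same cluster.
    """
    n = len(dist)
    clusters = [[i] for i in range(n)]
    derived = [[0.0] * n for _ in range(n)]

    def cluster_dist(c1, c2):
        ds = [dist[i][j] for i in c1 for j in c2]
        if linkage == "single":
            return min(ds)
        if linkage == "complete":
            return max(ds)
        return sum(ds) / len(ds)  # average link

    while len(clusters) > 1:
        # Merge the closest pair of clusters (first pair wins on ties).
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda xy: cluster_dist(clusters[xy[0]], clusters[xy[1]]))
        height = cluster_dist(clusters[x], clusters[y])
        for i in clusters[x]:
            for j in clusters[y]:
                derived[i][j] = derived[j][i] = height
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return derived


def rmse(provided, derived):
    # RMSE over the upper triangle (distinct pairs); assumes at least 2 items.
    n = len(provided)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum((provided[i][j] - derived[i][j]) ** 2 for i, j in pairs) / len(pairs))
```

Comparing `rmse(D, agglomerate(D, s))` for `s` in `"single"`, `"average"`, and `"complete"` reproduces, in miniature, the kind of strategy comparison described in the abstract.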