Publications catalog - books

Advances in Data Mining: Theoretical Aspects and Applications: 7th Industrial Conference, ICDM 2007, Leipzig, Germany, July 14-18, 2007. Proceedings

Petra Perner (ed.)

Conference: 7th Industrial Conference on Data Mining (ICDM), Leipzig, Germany, July 14-18, 2007

Abstract/Description - provided by the publisher

Not available.

Keywords - provided by the publisher

Database Management; Pattern Recognition; Image Processing and Computer Vision; Data Mining and Knowledge Discovery; Information Systems Applications (incl. Internet); Artificial Intelligence (incl. Robotics)

Availability
Detected institution: none detected
Publication year: 2007
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-73434-5

Electronic ISBN

978-3-540-73435-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Predicting Page Occurrence in a Click-Stream Data: Statistical and Rule-Based Approach

Petr Berka; Martin Labský

We present an analysis of click-stream data with the aim of predicting the next page that a user will visit, based on the history of visited pages. We present one statistical method (based on Markov models) and two rule induction methods (the first based on the well-known set covering approach, the other based on our compositional algorithm KEX). We compare the achieved results and discuss interesting patterns that appear in the data.

- Web Mining | Pp. 135-147
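
As an illustration of the statistical approach mentioned in the abstract above, the following is a minimal sketch of a first-order Markov next-page predictor. It is not the authors' implementation: the function names `fit_transitions` and `predict_next`, the toy sessions, and the page names are all assumptions made for this example, and the rule-based methods (set covering, KEX) are not covered.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count page-to-page transitions over all sessions."""
    counts = defaultdict(Counter)
    for pages in sessions:
        # Pair each page with the page visited immediately after it.
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, current_page):
    """Return the most frequent successor of current_page, or None if unseen."""
    successors = counts.get(current_page)
    if not successors:
        return None
    return successors.most_common(1)[0][0]


# Hypothetical toy sessions; the page names are invented for illustration.
sessions = [
    ["home", "catalog", "product", "cart"],
    ["home", "catalog", "product", "catalog"],
    ["home", "search", "product", "cart"],
]
counts = fit_transitions(sessions)
print(predict_next(counts, "product"))
```

A first-order model conditions only on the current page; conditioning on longer histories would require keying the counts on tuples of pages.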

Improved IR in Cohesion Model for Link Detection System

K. Lakshmi; Saswati Mukherjee

Given two stories, a Story Link Detection System identifies whether they discuss the same event. The standard approach in link detection is to use the cosine similarity measure to decide whether the two documents are linked. Many researchers have successfully applied query expansion in link detection, where models are built from the relevant documents retrieved from the collection using query expansion. In this approach, success depends on the quality of the information retrieval system. In the current research, we propose a new information retrieval system for query expansion that uses the intra-cluster similarity of the retrieved documents in addition to their similarity to the query document. Our technique enhances the quality of the retrieval system, thus improving the performance of the Link Detection System. Combining this improved IR with our Cohesion Model provides excellent results in link detection. Experimental results confirm the effect of the improved retrieval system on the query expansion technique.

- Web Mining | Pp. 148-162
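
The standard cosine-similarity baseline named in the abstract above can be sketched as follows. This is a simplified illustration only: whitespace tokenization, raw term frequencies, the helper names `cosine_similarity` and `are_linked`, and the threshold of 0.5 are assumptions, and the paper's query expansion and Cohesion Model are not reproduced.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between raw term-frequency vectors of two texts."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def are_linked(text_a, text_b, threshold=0.5):
    """Declare two stories linked when their similarity reaches the threshold.

    The default threshold is an arbitrary value chosen for this example.
    """
    return cosine_similarity(text_a, text_b) >= threshold


print(are_linked("flood hits city", "flood hits town"))
```

In practice the threshold would be tuned on labeled story pairs, and term weights such as TF-IDF are commonly used in place of raw counts.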

Improving a State-of-the-Art Named Entity Recognition System Using the World Wide Web

Richárd Farkas; György Szarvas; Róbert Ormándi

The development of highly accurate Named Entity Recognition (NER) systems can be beneficial to a wide range of Human Language Technology applications. In this paper we introduce three heuristics that exploit a variety of knowledge sources (the World Wide Web, Wikipedia and WordNet) and are capable of further improving a state-of-the-art multilingual and domain-independent NER system. Moreover, we describe our investigations on entity recognition in simulated speech-to-text output. Our web-based heuristics attained a slight improvement over the best results published on a standard NER task, and proved to be particularly effective in the speech-to-text scenario.

- Web Mining | Pp. 163-172

ISOR-2: A Case-Based Reasoning System to Explain Exceptional Dialysis Patients

Olga Vorobieva; Alexander Rumyantsev; Rainer Schmidt

In medicine many exceptions occur. In medical practice, and in knowledge-based systems too, it is necessary to consider them and to deal with them appropriately. In medical studies and in research, exceptions should be explained. We present a system that helps to explain cases that do not fit a theoretical hypothesis. Our starting points are situations where neither a well-developed theory, nor reliable knowledge, nor a proper case base is available a priori. So, instead of reliable theoretical knowledge and intelligent experience, we have just a theoretical hypothesis and a set of measurements.

In this paper, we propose to combine CBR with a statistical model. We use CBR to explain those cases that do not fit the model. The case base has to be set up incrementally; it contains the exceptional cases, and their explanations are the solutions, which can be used to help explain further exceptional cases.

- Data Mining in Medicine | Pp. 173-183

The Role of Prototypical Cases in Biomedical Case-Based Reasoning

Isabelle Bichindaritz

Representing biomedical knowledge is an essential task in biomedical informatics intelligent systems. Case-based reasoning (CBR) holds the promise of representing contextual knowledge in a way that was not possible before with traditional knowledge representation and knowledge-based methods. A main issue in biomedical CBR has been dealing with maintenance of the case base, and particularly in medical domains, with the rate of generation of new knowledge, which often makes the content of a case base partially obsolete. This article proposes to make use of the concept of prototypical case to ensure that a CBR system keeps up to date with current research advances in the biomedical field. It illustrates and discusses the different roles that prototypical cases can serve in biomedical CBR systems, among which are to organize and structure the memory, to guide the retrieval as well as the reuse of cases, and to bootstrap a CBR system's memory when real cases are not available in sufficient quantity and/or quality. This paper presents knowledge maintenance as another role that these prototypical cases can play in biomedical CBR systems.

- Data Mining in Medicine | Pp. 184-198

A Search Space Reduction Methodology for Large Databases: A Case Study

Angel Kuri-Morales; Fátima Rodríguez

Given the present need for Customer Relationship and the increased growth of the size of databases, many new approaches to large database clustering and processing have been attempted. In this work we propose a methodology based on the idea that statistically proven search space reduction is possible in practice. Two clustering models are generated: one corresponding to the full data set and another pertaining to the sampled data set. The resulting empirical distributions were mathematically tested to verify a tight non-linear significant approximation.

- Applications of Data Mining | Pp. 199-213

Combining Traditional and Neural-Based Techniques for Ink Feed Control in a Newspaper Printing Press

Cristofer Englund; Antanas Verikas

To achieve robust ink feed control, an integrating controller and a multiple models-based controller are combined. Experimentally, we have shown that the multiple models-based controller operating in the training region is superior to the integrating controller. However, for data originating from outside the multiple models training region, the integrating controller has the advantage. It is, therefore, suggested to combine the two techniques in order to improve the robustness of the control system.

- Applications of Data Mining | Pp. 214-227

Active Learning Strategies: A Case Study for Detection of Emotions in Speech

Alexis Bondu; Vincent Lemaire; Barbara Poulain

Machine learning denotes methods and algorithms that allow a model to learn a behavior from examples. Active learning comprises methods that select the examples used to build a training set for the predictive model. All the strategies aim to use as few examples as possible and to select the most informative ones. After formalizing the active learning problem and situating it in the literature, the first part of this article summarizes the main approaches to active learning. Taking emotions into account in human-machine interactions can be helpful in designing intelligent systems. The main difficulty in designing an automatic call-routing system for call centers is the cost of data labeling. The last section of this paper proposes to reduce this cost by means of two active learning strategies. The study is based on real data from the use of a voice-based stock exchange server.

- Applications of Data Mining | Pp. 228-241
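
The abstract above does not name the two active learning strategies it evaluates. Purely as an illustration of how an active learner can pick "the most informative examples", the sketch below shows uncertainty sampling for a binary classifier, a commonly used strategy; the function name `select_most_uncertain` and the probability values are assumptions made for this example.

```python
def select_most_uncertain(probabilities, k):
    """Return indices of the k unlabeled examples whose predicted
    positive-class probability is closest to 0.5 (most uncertain first)."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:k]


# Hypothetical model outputs for five unlabeled examples.
probs = [0.95, 0.52, 0.10, 0.45, 0.70]
print(select_most_uncertain(probs, 2))
```

The selected examples would then be sent to a human annotator, added to the training set, and the model retrained before the next selection round.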

Neural Business Control System

M. Lourdes Borrajo; Juan M. Corchado; E. S. Corchado; M. A. Pellicer

Firms need a control mechanism in order to analyse whether they are achieving their goals. A tool that automates the business control process has been developed based on a case-based reasoning system. The objective of the system is to facilitate the process of internal auditing. The system analyses the data that characterises each of the activities carried out by the firm, then determines the state of each activity and calculates the associated risk. This system uses a different problem-solving method in each of the steps of the reasoning cycle. A Maximum Likelihood Hebbian Learning-based method that automates the organization of cases and the retrieval stage of case-based reasoning systems is presented in this paper. The proposed methodology has been derived as an extension of Principal Component Analysis, and groups similar cases, identifying clusters automatically in a data set in an unsupervised mode. The system has been tested in 10 small and medium-sized companies in the textile sector, located in the northwest of Spain, and the results obtained have been very encouraging.

- Applications of Data Mining | Pp. 242-254

A Framework for Discovering and Analyzing Changing Customer Segments

Mirko Böttcher; Martin Spott; Detlef Nauck

Identifying customer segments and tracking their change over time is an important application for enterprises that need to understand what their customers expect from them. Customer segmentation is typically done by applying some form of cluster analysis. In this paper we present an alternative approach based on association rule mining and a notion of interestingness. Our approach allows us to detect arbitrary segments and analyse their temporal development. It is assumption-free and pro-active and can be run continuously. Newly discovered segments or relevant changes are reported automatically based on the application of an interestingness measure.

- Applications of Data Mining | Pp. 255-268