Publication catalog - books
Advances in Data Mining: Theoretical Aspects and Applications: 7th Industrial Conference, ICDM 2007, Leipzig, Germany, July 14-18, 2007. Proceedings
Petra Perner (eds.)
Conference: 7th Industrial Conference on Data Mining (ICDM). Leipzig, Germany. July 14, 2007 - July 18, 2007
Abstract/Description (provided by the publisher)
Not available.
Keywords (provided by the publisher)
Database Management; Pattern Recognition; Image Processing and Computer Vision; Data Mining and Knowledge Discovery; Information Systems Applications (incl. Internet); Artificial Intelligence (incl. Robotics)
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2007 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-73434-5
Electronic ISBN
978-3-540-73435-2
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2007
Copyright information
© Springer-Verlag Berlin Heidelberg 2007
Table of contents
Predicting Page Occurrence in a Click-Stream Data: Statistical and Rule-Based Approach
Petr Berka; Martin Labský
We present an analysis of click-stream data with the aim of predicting the next page a user will visit, based on the history of previously visited pages. We present one statistical method (based on Markov models) and two rule induction methods (the first based on the well-known set covering approach, the other based on our compositional algorithm KEX). We compare the results achieved and discuss interesting patterns that appear in the data.
- Web Mining | Pp. 135-147
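The abstract names Markov models as the statistical method but gives no details. As a rough illustration only, the following is a first-order Markov next-page predictor, assuming each session is a list of page identifiers; it is not the authors' implementation, and the function names and sample sessions are hypothetical.

```python
from collections import Counter, defaultdict

def train_markov(sessions):
    """Count page-to-page transitions across sessions (first-order Markov model)."""
    transitions = defaultdict(Counter)
    for session in sessions:
        # Pair each page with the page visited immediately after it.
        for current_page, next_page in zip(session, session[1:]):
            transitions[current_page][next_page] += 1
    return transitions

def predict_next(transitions, current_page):
    """Return the most frequent successor of current_page, or None if it was never followed."""
    successors = transitions.get(current_page)  # .get does not create a new entry
    if not successors:
        return None
    return successors.most_common(1)[0][0]

# Hypothetical click-stream sessions; page names are made up for illustration.
sessions = [
    ["home", "catalog", "product", "cart"],
    ["home", "catalog", "product", "product"],
    ["home", "search", "product", "cart"],
]
model = train_markov(sessions)
print(predict_next(model, "product"))  # cart
```

A first-order model conditions only on the current page; keying the table on tuples of the last k pages would use more history at the cost of sparser counts.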
Improved IR in Cohesion Model for Link Detection System
K. Lakshmi; Saswati Mukherjee
Given two stories, a Story Link Detection System identifies whether they discuss the same event. The standard approach in link detection systems is to use the cosine similarity measure to determine whether the two documents are linked. Many researchers have successfully applied query expansion in link detection systems, building models from the relevant documents retrieved from the collection. In this approach, success depends on the quality of the information retrieval (IR) system. In the current research, we propose a new information retrieval system for query expansion that uses the intra-cluster similarity of the retrieved documents in addition to their similarity to the query document. Our technique enhances the quality of the retrieval system, thus improving the performance of the Link Detection System. Combining this improved IR with our Cohesion Model yields excellent results in link detection. Experimental results confirm the effect of the improved retrieval system on the query expansion technique.
- Web Mining | Pp. 148-162
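The standard baseline mentioned in this abstract compares two stories with cosine similarity. A minimal sketch, assuming whitespace tokenization, raw term frequencies (real systems usually weight terms, for example with TF-IDF) and a hypothetical decision threshold of 0.5; it does not reproduce the authors' query expansion or Cohesion Model.

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the raw term-frequency vectors of two texts."""
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())
    # Only terms present in both documents contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as unrelated
    return dot / (norm_a * norm_b)

def is_linked(story_a, story_b, threshold=0.5):
    """Declare two stories linked if their similarity reaches a (hypothetical) threshold."""
    return cosine_similarity(story_a, story_b) >= threshold

print(round(cosine_similarity("Earthquake hits city", "earthquake hits coastal city"), 3))  # 0.866
print(is_linked("earthquake hits city", "stock market falls"))  # False
```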
Improving a State-of-the-Art Named Entity Recognition System Using the World Wide Web
Richárd Farkas; György Szarvas; Róbert Ormándi
The development of highly accurate Named Entity Recognition (NER) systems can benefit a wide range of Human Language Technology applications. In this paper we introduce three heuristics that exploit a variety of knowledge sources (the World Wide Web, Wikipedia and WordNet) and are capable of further improving a state-of-the-art multilingual and domain-independent NER system. Moreover, we describe our investigations of entity recognition in simulated speech-to-text output. Our web-based heuristics attained a slight improvement over the best results published on a standard NER task and proved particularly effective in the speech-to-text scenario.
- Web Mining | Pp. 163-172
ISOR-2: A Case-Based Reasoning System to Explain Exceptional Dialysis Patients
Olga Vorobieva; Alexander Rumyantsev; Rainer Schmidt
In medicine, many exceptions occur. In medical practice, and in knowledge-based systems too, it is necessary to consider them and deal with them appropriately. In medical studies and research, exceptions should be explained. We present a system that helps to explain cases that do not fit a theoretical hypothesis. Our starting points are situations where neither a well-developed theory, nor reliable knowledge, nor an adequate a priori case base is available. So, instead of reliable theoretical knowledge and intelligent experience, we have just a theoretical hypothesis and a set of measurements.
In this paper, we propose combining CBR with a statistical model. We use CBR to explain those cases that do not fit the model. The case base has to be set up incrementally: it contains the exceptional cases, and their explanations are the solutions, which can help to explain further exceptional cases.
- Data Mining in Medicine | Pp. 173-183
The Role of Prototypical Cases in Biomedical Case-Based Reasoning
Isabelle Bichindaritz
Representing biomedical knowledge is an essential task in intelligent systems for biomedical informatics. Case-based reasoning (CBR) holds the promise of representing contextual knowledge in a way that was not previously possible with traditional knowledge representation and knowledge-based methods. A main issue in biomedical CBR has been the maintenance of the case base and, particularly in medical domains, the rate at which new knowledge is generated, which often makes the content of a case base partially obsolete. This article proposes using the concept of the prototypical case to keep a CBR system up to date with current research advances in the biomedical field. It illustrates and discusses the different roles that prototypical cases can serve in biomedical CBR systems, among them organizing and structuring the memory, guiding the retrieval and reuse of cases, and bootstrapping a CBR system memory when real cases are not available in sufficient quantity and/or quality. This paper presents knowledge maintenance as another role that these prototypical cases can play in biomedical CBR systems.
- Data Mining in Medicine | Pp. 184-198
A Search Space Reduction Methodology for Large Databases: A Case Study
Angel Kuri-Morales; Fátima Rodríguez
Given the present need for Customer Relationship Management and the growing size of databases, many new approaches to clustering and processing large databases have been attempted. In this work we propose a methodology based on the idea that statistically proven search space reduction is possible in practice. Two clustering models are generated: one corresponding to the full data set and another to the sampled data set. The resulting empirical distributions were mathematically tested to verify a tight, non-linear, significant approximation.
- Applications of Data Mining | Pp. 199-213
Combining Traditional and Neural-Based Techniques for Ink Feed Control in a Newspaper Printing Press
Cristofer Englund; Antanas Verikas
To achieve robust ink feed control, an integrating controller and a multiple-models-based controller are combined. We have shown experimentally that the multiple-models-based controller, operating within its training region, is superior to the integrating controller. However, for data originating from outside the training region of the multiple models, the integrating controller has the advantage. We therefore suggest combining the two techniques to improve the robustness of the control system.
- Applications of Data Mining | Pp. 214-227
Active Learning Strategies: A Case Study for Detection of Emotions in Speech
Alexis Bondu; Vincent Lemaire; Barbara Poulain
Machine learning refers to methods and algorithms that allow a model to learn a behavior from examples. Active learning gathers methods that select the examples used to build a training set for the predictive model. All these strategies aim to use as few examples as possible and to select the most informative ones. After formalizing the active learning problem and situating it in the literature, the first part of this article synthesizes the main approaches to active learning. Taking emotions into account in human-machine interactions can help in designing intelligent systems. The main difficulty in designing an automatic call-routing system for call centers is the cost of data labeling. The last section of this paper proposes reducing this cost with two active learning strategies. The study is based on real data from the use of a voice-based stock exchange server.
- Applications of Data Mining | Pp. 228-241
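The abstract does not name the two strategies it evaluates. For orientation only, the following shows uncertainty sampling, one widely used active learning strategy, in a pool-based loop: at each step the learner asks for the label of the unlabeled example it is least sure about, so each paid label is spent where it is most informative. The one-dimensional threshold model, the `oracle` function and all other names are hypothetical stand-ins for a real emotion classifier and a human annotator.

```python
import math

def fit_threshold(labeled):
    """Hypothetical toy 1-D model: decision threshold halfway between the class means.
    Assumes `labeled` contains at least one example of each class."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict_proba(x, threshold, scale=1.0):
    """Logistic score: estimated probability that x belongs to the positive class."""
    return 1.0 / (1.0 + math.exp(-(x - threshold) / scale))

def most_uncertain(pool, threshold):
    """Uncertainty sampling: the pool example whose probability is closest to 0.5."""
    return min(pool, key=lambda x: abs(predict_proba(x, threshold) - 0.5))

def active_learning(labeled, pool, oracle, budget):
    """Pool-based loop: refit, query the most uncertain example, ask the oracle, repeat."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(budget):
        threshold = fit_threshold(labeled)
        query = most_uncertain(pool, threshold)
        pool.remove(query)
        labeled.append((query, oracle(query)))  # each oracle call costs one label
    return fit_threshold(labeled), labeled

def oracle(x):
    """Hypothetical annotator whose true class boundary is 3.2."""
    return int(x >= 3.2)

# Two seed labels and nine unlabeled points; only three labels are bought.
threshold, labeled = active_learning([(0, 0), (10, 1)], range(1, 10), oracle, budget=3)
print([x for x, _ in labeled[2:]])  # [5, 4, 3]
```

The queries move toward the true boundary instead of labeling the whole pool, which is the cost saving active learning aims for.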
Neural Business Control System
M. Lourdes Borrajo; Juan M. Corchado; E. S. Corchado; M. A. Pellicer
Firms need a control mechanism to analyse whether they are achieving their goals. A tool that automates the business control process has been developed based on a case-based reasoning system. The objective of the system is to facilitate internal auditing. The system analyses the data that characterise each of the activities carried out by the firm, then determines the state of each activity and calculates the associated risk. The system uses a different problem-solving method in each step of the reasoning cycle. This paper presents a Maximum Likelihood Hebbian Learning-based method that automates the organization of cases and the retrieval stage of case-based reasoning systems. The proposed methodology has been derived as an extension of Principal Component Analysis; it groups similar cases, identifying clusters in a data set automatically and in an unsupervised mode. The system has been tested in 10 small and medium-sized companies in the textile sector located in northwest Spain, and the results obtained have been very encouraging.
- Applications of Data Mining | Pp. 242-254
A Framework for Discovering and Analyzing Changing Customer Segments
Mirko Böttcher; Martin Spott; Detlef Nauck
Identifying customer segments and tracking their change over time is an important application for enterprises that need to understand what their customers expect from them. Customer segmentation is typically done by applying some form of cluster analysis. In this paper we present an alternative approach based on association rule mining and a notion of interestingness. Our approach allows us to detect arbitrary segments and analyse their temporal development. It is assumption-free and proactive, and can be run continuously. Newly discovered segments or relevant changes are reported automatically based on an interestingness measure.
- Applications of Data Mining | Pp. 255-268
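The abstract does not define its interestingness measure. A minimal sketch of the underlying idea, assuming customer records are sets of attribute values and using the absolute change in rule confidence between two time periods as a stand-in measure; the rule format, the `min_delta` threshold and the sample data are hypothetical, not the authors' framework.

```python
def support(transactions, itemset):
    """Fraction of records that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent segment is absent in this period
    return support(transactions, set(antecedent) | set(consequent)) / base

def changed_rules(period_a, period_b, rules, min_delta=0.2):
    """Report rules whose confidence shifted by at least min_delta between periods."""
    report = []
    for antecedent, consequent in rules:
        before = confidence(period_a, antecedent, consequent)
        after = confidence(period_b, antecedent, consequent)
        if abs(after - before) >= min_delta:
            report.append((antecedent, consequent, before, after))
    return report

# Hypothetical customer records for two periods.
period_a = [{"young", "mobile", "broadband"}, {"young", "mobile"},
            {"old", "landline"}, {"young", "mobile", "broadband"}]
period_b = [{"young", "mobile"}, {"young", "mobile"},
            {"old", "landline", "broadband"}, {"young", "mobile", "broadband"}]
rules = [({"young", "mobile"}, {"broadband"}), ({"old"}, {"landline"})]

for antecedent, consequent, before, after in changed_rules(period_a, period_b, rules):
    print(sorted(antecedent), sorted(consequent), round(before, 2), round(after, 2))
# ['mobile', 'young'] ['broadband'] 0.67 0.33
```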