Publication catalog - books
Foundations and Novel Approaches in Data Mining
Tsau Young Lin ; Setsuo Ohsuga ; Churn-Jung Liau ; Xiaohua Hu (eds.)
Abstract/Description (provided by the publisher)
Not available.
Keywords (provided by the publisher)
Appl. Mathematics/Computational Methods of Engineering; Artificial Intelligence (incl. Robotics)
Availability
| Detected institution | Publication year | Browse | Download | Request |
|---|---|---|---|---|
| Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-28315-7
Electronic ISBN
978-3-540-31229-1
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Copyright information
© Springer-Verlag Berlin/Heidelberg 2006
Subject coverage
Table of contents
doi: 10.1007/11539827_11
Rough Set Strategies to Data with Missing Attribute Values
Jerzy W. Grzymala-Busse
In this paper we assume that a data set is presented in the form of an incompletely specified decision table, i.e., some attribute values are missing. Our next basic assumption is that some of the missing attribute values are lost (e.g., erased) and some are "do not care" conditions (i.e., they were redundant or not necessary to make a decision or to classify a case). Incompletely specified decision tables are described by characteristic relations, which for completely specified decision tables reduce to the indiscernibility relation. It is shown how to compute characteristic relations using the idea of a block of attribute-value pairs, used in some rule induction algorithms such as LEM2. Moreover, the set of all characteristic relations for a class of congruent incompletely specified decision tables, defined in the paper, is a lattice. Three definitions of lower and upper approximations are introduced. Finally, it is shown that the presented approach to missing attribute values may be used for kinds of missing attribute values other than lost values and "do not care" conditions.
Pp. 197-212
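A minimal Python sketch of the block-based computation described above, assuming the conventions commonly used in this line of work: `?` marks a lost value and `*` a "do not care" condition; a "do not care" case belongs to every block of its attribute, a lost case to none; and the characteristic set K_B(x) is the intersection of the blocks of x's specified values. The function names and the toy table are hypothetical. Only one style of approximation is shown (unions of the characteristic sets of cases in the concept), which we assume corresponds to one of the chapter's three definitions.

```python
# Sketch: characteristic sets for an incompletely specified decision table.
# Assumed conventions: "?" = lost value, "*" = "do not care" condition.
LOST, DONT_CARE = "?", "*"


def attribute_blocks(table, attr):
    """Return {value: block} for one attribute.

    A case with specified value v belongs to the block of v, a "do not care"
    case belongs to every block of the attribute, and a lost case to none.
    """
    values = {row[attr] for row in table.values()} - {LOST, DONT_CARE}
    return {
        v: {x for x, row in table.items() if row[attr] in (v, DONT_CARE)}
        for v in values
    }


def characteristic_set(table, x, attrs):
    """K_B(x): intersect the blocks of x's specified values over attrs."""
    k = set(table)  # start from the whole universe U
    for a in attrs:
        v = table[x][a]
        if v in (LOST, DONT_CARE):
            continue  # a missing value contributes K(x, a) = U
        k &= attribute_blocks(table, a)[v]
    return k


def concept_approximations(table, concept, attrs):
    """Lower/upper approximations built from characteristic sets of the concept."""
    ks = [characteristic_set(table, x, attrs) for x in concept]
    lower = set().union(*(k for k in ks if k <= concept))
    upper = set().union(*ks)
    return lower, upper


# Toy table (hypothetical data): case -> attribute values.
table = {
    1: {"Temperature": "high", "Headache": "?"},
    2: {"Temperature": "*", "Headache": "yes"},
    3: {"Temperature": "normal", "Headache": "no"},
    4: {"Temperature": "high", "Headache": "yes"},
}
attrs = ["Temperature", "Headache"]
print(characteristic_set(table, 1, attrs))           # {1, 2, 4}
print(concept_approximations(table, {1, 3}, attrs))  # ({3}, {1, 2, 3, 4})
```

Computing blocks once per attribute and intersecting them mirrors how block-based rule induction works; with no missing values every K_B(x) collapses to x's indiscernibility class.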
doi: 10.1007/11539827_12
Privacy-Preserving Collaborative Data Mining
Justin Zhan; LiWu Chang; Stan Matwin
Privacy-preserving data mining is an important issue in the areas of data mining and security. In this paper, we study how to conduct association rule mining, one of the core data mining techniques, on private data in the following scenario: multiple parties, each holding a private data set, want to jointly conduct association rule mining without disclosing their private data to the other parties. Because the computation requires interaction among the parties, developing a secure framework to achieve it is both challenging and desirable. In this paper, we present a secure framework for multiple parties to conduct privacy-preserving association rule mining.
Pp. 213-227
doi: 10.1007/11539827_13
Impact of Purity Measures on Knowledge Extraction in Decision Trees
Mitja Lenič; Petra Povalej; Peter Kokol
Symbolic knowledge representation is crucial for successful knowledge extraction and consequently for successful data mining. Decision trees and association rules are therefore the most commonly used symbolic knowledge representations. Purity measures of various kinds are often used to identify relevant knowledge in data, and the selection of an appropriate purity measure can have an important impact on the quality of the extracted knowledge. In this paper, a novel approach for combining purity measures, and thereby altering the background knowledge of the extraction method, is presented. An extensive case study on 42 UCI databases, using heuristic decision tree induction as the knowledge extraction method, is also presented.
Pp. 229-242
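To make "purity measure" concrete, the sketch below implements two standard impurity functions (Shannon entropy and the Gini index) and scores a candidate split by impurity reduction. The `combined_gain` weighting is a hypothetical illustration of what combining measures can mean; it is not the chapter's method, which the abstract does not specify. The attribute names and data are likewise made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gain(impurity, parent, children):
    """Impurity reduction achieved by splitting parent into children."""
    n = len(parent)
    weighted = sum(len(ch) / n * impurity(ch) for ch in children)
    return impurity(parent) - weighted


def combined_gain(parent, children, weights):
    """Hypothetical combination: weighted sum of gains under several measures."""
    return sum(w * gain(f, parent, children) for f, w in weights)


def split(rows, labels, attr):
    """Group class labels by the value each row takes on attr."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    return list(groups.values())


def best_attribute(rows, labels, attrs, score):
    """Pick the attribute whose split maximizes score(parent, children)."""
    return max(attrs, key=lambda a: score(labels, split(rows, labels, a)))


rows = [{"x": 0, "z": 0}, {"x": 0, "z": 1}, {"x": 1, "z": 0}, {"x": 1, "z": 1}]
labels = ["a", "a", "b", "b"]
weights = [(entropy, 0.5), (gini, 0.5)]
print(best_attribute(rows, labels, ["x", "z"],
                     lambda p, c: combined_gain(p, c, weights)))  # x
```

Because `best_attribute` takes the scoring function as a parameter, swapping or mixing purity measures changes the tree's inductive bias without touching the induction loop.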
doi: 10.1007/11539827_14
Multidimensional On-line Mining
Ching-Yao Wang; Tzung-Pei Hong; Shian-Shyong Tseng
In the past, incremental mining approaches usually aimed at obtaining the newest set of knowledge consistent with the entire set of data inserted so far. Users cannot, however, use them to obtain rules or patterns from only the portion of the data that interests them. In addition, these approaches only focused on finding frequent patterns in a specified part of a database. That is, although data records are collected under certain times, places, and categories, such contexts (circumstances) have been ignored by conventional mining algorithms. This leads to a lack of patterns or rules that help users solve problems from different aspects and with diverse considerations. In this paper, we thus attempt to extend incremental mining to on-line decision support under multidimensional context considerations. We first propose the multidimensional pattern relation to structurally and systematically retain the additional context information and mining information for each dataset inserted into a database. We then develop an algorithm based on the proposed multidimensional pattern relation to correctly and efficiently fulfill diverse on-line mining requests.
Pp. 243-257
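One way to picture a multidimensional pattern relation is a table with one row per inserted dataset, holding that dataset's context (for example, place and month) alongside its itemset counts; an on-line request then selects rows by context and aggregates their counts. The sketch below assumes this simplified layout and counts every itemset up to a hypothetical `max_size`, trading storage for exact answers. The chapter's actual relation and algorithm are likely more economical, and all names here are illustrative.

```python
from collections import Counter
from itertools import combinations


def itemset_counts(transactions, max_size=2):
    """Count every itemset of up to max_size items across the transactions."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return counts


class PatternRelation:
    """Hypothetical relation: one row per inserted dataset with its context."""

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.rows = []  # (context, number of transactions, itemset counts)

    def insert(self, context, transactions):
        """Store context and mining information for a newly inserted dataset."""
        self.rows.append((dict(context), len(transactions),
                          itemset_counts(transactions, self.max_size)))

    def mine(self, min_support, **conditions):
        """Frequent itemsets over the datasets whose context matches conditions."""
        total, counts = 0, Counter()
        for context, n, local in self.rows:
            if all(context.get(k) == v for k, v in conditions.items()):
                total += n
                counts.update(local)
        if total == 0:
            return {}
        return {s: c / total for s, c in counts.items() if c / total >= min_support}


rel = PatternRelation()
rel.insert({"place": "A", "month": 1}, [["milk", "bread"], ["milk"]])
rel.insert({"place": "B", "month": 1}, [["bread"], ["bread", "eggs"]])
print(rel.mine(0.5, month=1))  # {('bread',): 0.75, ('milk',): 0.5}
```

Answering a request only requires summing stored counts, so no raw transactions are rescanned; the cost is keeping counts for itemsets that may never become frequent.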
doi: 10.1007/11539827_15
Quotient Space Based Cluster Analysis
Ling Zhang; Bo Zhang
In this paper, clustering is investigated within the framework of granular computing, i.e., quotient space theory. In principle, two main kinds of similarity measurement are used in cluster analysis: one measures the similarity among objects (data, points); the other measures the similarity between objects and clusters (sets of objects). Accordingly, there are two main categories of clustering corresponding to the two measurements. Furthermore, fuzzy clustering is obtained when a fuzzy similarity measurement is used. From the granular computing point of view, all these categories of clustering can be represented by a hierarchical structure in quotient spaces. From these hierarchical structures, several new characteristics of clustering can be obtained, which may provide a new way to investigate clustering further.
Pp. 259-269
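The hierarchy of quotient spaces can be illustrated with a standard construction from fuzzy clustering: take the max-min transitive closure of a reflexive, symmetric similarity matrix, then cut it at each threshold λ. Each cut is an equivalence relation and therefore yields a quotient space (a partition), and lowering λ coarsens the partition. This is a generic sketch under those assumptions, not the chapter's algorithm; the similarity matrix is made up.

```python
def maxmin_closure(sim):
    """Max-min transitive closure (Floyd-Warshall style bottleneck paths)."""
    n = len(sim)
    r = [row[:] for row in sim]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                r[i][j] = max(r[i][j], min(r[i][k], r[k][j]))
    return r


def quotient(closure, lam):
    """Equivalence classes of the lambda-cut {(i, j): closure[i][j] >= lam}."""
    n = len(closure)
    assigned, classes = [False] * n, []
    for i in range(n):
        if not assigned[i]:
            cls = [j for j in range(n) if closure[i][j] >= lam]
            for j in cls:
                assigned[j] = True
            classes.append(cls)
    return classes


def hierarchy(sim):
    """Nested quotient spaces, from finest (highest lambda) to coarsest."""
    closure = maxmin_closure(sim)
    levels = sorted({v for row in closure for v in row}, reverse=True)
    return [(lam, quotient(closure, lam)) for lam in levels]


sim = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
]
for lam, classes in hierarchy(sim):
    print(lam, classes)
# 1.0 [[0], [1], [2], [3]]
# 0.8 [[0, 1], [2], [3]]
# 0.6 [[0, 1], [2, 3]]
# 0.3 [[0, 1, 2, 3]]
```

The transitive closure is what guarantees that each cut is an equivalence relation; cutting the raw matrix directly could produce overlapping, non-nested groups.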
doi: 10.1007/11539827_16
Research Issues in Web Structural Delta Mining
Qiankun Zhao; Sourav S. Bhowmick; Sanjay Madria
Web structure mining has been a well-researched area in recent years. Based on the observation that data on the web may change at any time and in any way, some incremental data mining algorithms have been proposed to update mining results as the corresponding changes occur. However, none of the existing web structure mining techniques is able to extract useful and hidden knowledge from the historical changes of web structure. While the knowledge from a snapshot is important and interesting, the knowledge behind the corresponding changes may be more critical and informative in some applications. In this paper, we propose a novel research area of web structure mining called web structural delta mining. The distinctive feature of our research is that our mining object is the sequence of historical changes of web structure (also called web structural deltas). For web structural delta mining, we aim to extract useful, interesting, and novel web structures and knowledge, considering their historical, dynamic, and temporal properties. We propose three major issues of web structural delta mining. Moreover, we present a list of potential applications where the results of web structural delta mining can be used.
Pp. 272-289
doi: 10.1007/11539827_17
Workflow Reduction for Reachable-path Rediscovery in Workflow Mining
Kwang-Hoon Kim; Clarence A. Ellis
This paper defines a workflow reduction mechanism that formally and automatically reduces an original workflow process to a minimal set of activities, called the minimal-workflow model. It also describes the implications of the minimal-workflow model for workflow mining, a newly emerging research issue concerned with rediscovering and reengineering workflow models from workflow logs that contain the enactment and audit information gathered while processes are executed on a workflow engine. In principle, the minimal-workflow model is derived from the original workflow process by analyzing the dependencies among its activities. Its main purpose is to minimize discrepancies between the modeled workflow process and the enacted workflow process as it is actually executed. That is, we can obtain the complete set of activity firing sequences (all reachable paths from the start activity to the end activity of a workflow process) at build time. Moreover, we can discover from workflow logs which of all the reachable paths a workcase (an instance of a workflow process) actually followed at runtime. This information is very important for acquiring runtime statistical significance and knowledge for redesigning and reengineering the workflow process. The minimal-workflow model presented in this paper serves as a decision tree induction technique for mining and discovering the reachable path of a workcase from workflow logs. Workflow mining methodologies and systems are growing rapidly and must cope with a wide diversity of application domains and working environments. The literature therefore needs varied, advanced, and specialized workflow mining techniques and architectures whose analysis results can ultimately be fed back into the redesign and reengineering phase of the existing workflow and business
Pp. 289-310
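As a rough illustration of build-time path enumeration and run-time path identification, the sketch below lists every start-to-end path of a workflow graph and then counts how often logged workcases followed each path. It assumes an acyclic graph with only exclusive (XOR) branching, so each workcase trace is exactly one path; parallel (AND) splits and the chapter's minimal-workflow reduction and decision tree induction are not modeled. The graph encoding, activity names, and log are hypothetical.

```python
from collections import Counter


def reachable_paths(graph, start, end):
    """Enumerate all start-to-end activity paths in an acyclic workflow graph."""
    paths = []

    def dfs(node, path):
        if node == end:
            paths.append(tuple(path))
            return
        for nxt in graph.get(node, []):
            dfs(nxt, path + [nxt])

    dfs(start, [start])
    return paths


def path_frequencies(paths, traces):
    """Count how many logged workcase traces followed each reachable path."""
    known = set(paths)
    freq = Counter(tuple(t) for t in traces if tuple(t) in known)
    return {p: freq[p] for p in paths}


# Hypothetical workflow: XOR split after "start" and after "b".
graph = {"start": ["a", "b"], "a": ["c"], "b": ["c", "end"], "c": ["end"]}
paths = reachable_paths(graph, "start", "end")
log = [["start", "b", "end"], ["start", "a", "c", "end"], ["start", "b", "end"]]
for path, count in path_frequencies(paths, log).items():
    print(" -> ".join(path), count)
# start -> a -> c -> end 1
# start -> b -> c -> end 0
# start -> b -> end 2
```

Paths with a zero count are candidates for simplification during redesign, which is the kind of runtime statistic the abstract refers to.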
doi: 10.1007/11539827_18
Principal Component-based Anomaly Detection Scheme
Mei-Ling Shyu; Shu-Ching Chen; Kanoksri Sarinnapakorn; LiWu Chang
Pp. 311-329
doi: 10.1007/11539827_19
Making Better Sense of the Demographic Data Value in the Data Mining Procedure
Katherine M. Shelfer; Xiaohua Hu
Data mining of personal demographic data is being used as a weapon in the War on Terrorism, but we are forced to acknowledge that it is a weapon loaded with interpretations derived from the use of dirty data in inherently biased systems that mechanize and de-humanize individuals. While the unit of measure is the individual in a local context, the global decision context requires that we understand geolocal reflexive selves whose psychological and social/societal relationship patterns can differ markedly and change over time and in response to pivotal events. Local demographic data collection processes fail to take these realities into account at the data collection stage. As a result, existing data values rarely represent an individual's multi-dimensional existence in a form that can be mined. An abductive approach to data mining can be used to improve the data. Working from the "decision-in," we can identify and address challenges associated with demographic data collection and suggest ways to improve the quality of the data available for the data mining procedure. It is important to note that exchanging old values for new values is rarely a 1:1 substitution where qualitative data are involved. Different constituent user populations may require different levels of data complexity, and they will need to improve their understanding of the data values reported at the local level if they are to relate various local demographic databases effectively in new and different global contexts.
Pp. 331-362
doi: 10.1007/11539827_20
An Effective Approach for Mining Time-Series Gene Expression Profile
Vincent S. M. Tseng; Yen-Lo Chen
Time-series data analysis is an important problem in data mining because of its wide range of applications. Although some time-series analysis methods have been developed in recent years, they cannot effectively resolve the fundamental problems of time-series gene expression mining: scale transformation, offset transformation, time delay, and noise. In this paper, we propose an effective approach for mining time-series data and apply it to the analysis of time-series gene expression profiles. The proposed method uses a dynamic programming technique and the correlation coefficient measure to find the best alignment between time-series expressions under an allowed number of noisy points. In experimental evaluation, our method effectively resolved these four problems simultaneously. Hence, it can find the correct similarity between gene expressions and suggest biological relationships between them.
Pp. 363-376
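The following sketch shows why a correlation coefficient suits this problem: Pearson correlation is unchanged by positive scaling and by offsets, so only time delay and noise remain to be searched. Here delay is handled by trying every shift up to `max_delay`, and noise by exhaustively dropping up to `max_noise` point pairs. This brute-force search stands in for the chapter's dynamic programming formulation, which the abstract does not detail, and is practical only for short series and small `max_noise`. The parameter names and data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; invariant to positive scaling and offsets."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def best_alignment(x, y, max_delay, max_noise, min_overlap=3):
    """Return (correlation, delay, dropped pair indices) of the best alignment.

    x[t] is paired with y[t + delay]; up to max_noise pairs may be dropped.
    Brute force over delays and dropped subsets, for illustration only.
    """
    best = (-math.inf, 0, ())
    for d in range(-max_delay, max_delay + 1):
        pairs = [(x[t], y[t + d]) for t in range(len(x)) if 0 <= t + d < len(y)]
        for k in range(max_noise + 1):
            if len(pairs) - k < min_overlap:
                break
            for drop in combinations(range(len(pairs)), k):
                kept = [p for i, p in enumerate(pairs) if i not in drop]
                r = pearson([a for a, _ in kept], [b for _, b in kept])
                if r > best[0]:
                    best = (r, d, drop)
    return best


x = [1, 2, 3, 4, 5]
y = [3, 5, 40, 9, 11]  # y = 2x + 1 except for one noisy point at index 2
r, delay, dropped = best_alignment(x, y, max_delay=0, max_noise=1)
print(round(r, 6), delay, dropped)  # 1.0 0 (2,)
```

The exhaustive subset search grows combinatorially with `max_noise`, which is exactly the cost a dynamic programming formulation is meant to avoid.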