Publication catalog - books



Advances in Knowledge Discovery and Data Mining: 10th Pacific-Asia Conference, PAKDD 2006, Singapore, April 9-12, 2006, Proceedings

Wee-Keong Ng ; Masaru Kitsuregawa ; Jianzhong Li ; Kuiyu Chang (eds.)

Conference: 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Singapore, Singapore. April 9, 2006 - April 12, 2006

Abstract/Description - provided by the publisher

Not available.

Keywords - provided by the publisher

Not available.

Availability

Detected institution: Not detected
Publication year: 2006
Access: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-33206-0

Electronic ISBN

978-3-540-33207-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Quality-Aware Association Rule Mining

Laure Berti-Équille

The quality of discovered association rules is commonly evaluated by interestingness measures (typically support and confidence) in order to help the user understand and use the newly discovered knowledge. Low-quality datasets have a very bad impact on the quality of the discovered association rules, and one might legitimately wonder whether a so-called “interesting” rule noted LHS -> RHS is meaningful when 30% of the LHS data are no longer up-to-date, 20% of the RHS data are not accurate, and 15% of the LHS data come from a data source that is well known for its poor credibility. In this paper we propose to integrate data quality measures for effective and quality-aware association rule mining, and we propose a cost-based probabilistic model for selecting legitimately interesting rules. Experiments on the challenging KDD-CUP-98 datasets show, for different variations of the data quality indicators, the corresponding cost and quality of the discovered association rules that can be legitimately (or not) selected.

- Association Rule Mining | Pp. 440-449
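The abstract argues that support and confidence alone can mislead when the underlying data are stale, inaccurate, or poorly sourced. The Python sketch below computes the classic measures and then applies an extra data-quality filter. The aggregation in `quality_score` (a product of one minus each defect rate), the indicator names, and all function names are hypothetical illustrations, not the paper's cost-based probabilistic model.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def support_confidence(
    transactions: List[Set[str]], lhs: FrozenSet[str], rhs: FrozenSet[str]
) -> Tuple[float, float]:
    """Classic support and confidence of the rule lhs -> rhs."""
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    support = n_both / n
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


def quality_score(defect_rates: Dict[str, float]) -> float:
    """Hypothetical aggregation: product of (1 - defect rate) over all indicators."""
    score = 1.0
    for rate in defect_rates.values():
        score *= 1.0 - rate
    return score


def keep_rule(
    support: float, confidence: float, quality: float,
    min_sup: float, min_conf: float, min_quality: float,
) -> bool:
    """A rule is kept only if it passes the usual thresholds AND the quality threshold."""
    return support >= min_sup and confidence >= min_conf and quality >= min_quality
```

With the defect rates quoted in the abstract (30%, 20%, 15%), the hypothetical score is 0.7 × 0.8 × 0.85 ≈ 0.48, so a rule that clears the support and confidence thresholds would still be rejected under a quality threshold of 0.5.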

IMB3-Miner: Mining Induced/Embedded Subtrees by Constraining the Level of Embedding

Henry Tan; Tharam S. Dillon; Fedja Hadzic; Elizabeth Chang; Ling Feng

Tree mining has recently attracted a lot of interest in areas such as bioinformatics, XML mining, and Web mining. We are mainly concerned with mining frequent induced and embedded subtrees. While more interesting patterns can be obtained when mining embedded subtrees, mining such embedding relationships can unfortunately be very costly. In this paper, we propose an efficient approach to tackle the complexity of mining embedded subtrees by utilizing a novel representation and enumeration strategy, and by introducing a level of embedding constraint. Thus, when it is too costly to mine all frequent embedded subtrees, one can gradually decrease the level of embedding constraint down to 1, at which point all the obtained frequent subtrees are induced subtrees. Our experiments with both synthetic and real datasets against two known algorithms for mining induced and embedded subtrees, FREQT and TreeMiner, demonstrate the effectiveness and efficiency of the technique.

- Association Rule Mining | Pp. 450-461
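To make the level-of-embedding idea concrete, the following Python sketch counts 2-node patterns (ancestor label, descendant label) across a small forest, allowing the descendant to sit at most `max_level` edges below the ancestor. With `max_level=1` only parent-child pairs qualify, which corresponds to induced subtrees; larger values admit embedded ancestor-descendant relationships. The tree encoding and function names are assumptions for illustration; this is not IMB3-Miner's representation or enumeration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# A tree is (parent map, label map); the root's parent is None.
Tree = Tuple[Dict[int, Optional[int]], Dict[int, str]]


def label_pairs_within_level(tree: Tree, max_level: int) -> Set[Tuple[str, str]]:
    """(ancestor label, descendant label) pairs whose depth gap is <= max_level."""
    parent, labels = tree
    pairs: Set[Tuple[str, str]] = set()
    for node in parent:
        ancestor = parent[node]
        level = 1
        # Walk up the tree, stopping once the embedding level would exceed the bound.
        while ancestor is not None and level <= max_level:
            pairs.add((labels[ancestor], labels[node]))
            ancestor = parent[ancestor]
            level += 1
    return pairs


def two_node_support(trees: List[Tree], max_level: int) -> Dict[Tuple[str, str], int]:
    """Transaction-based support: number of trees containing each 2-node pattern."""
    support: Dict[Tuple[str, str], int] = defaultdict(int)
    for tree in trees:
        for pair in label_pairs_within_level(tree, max_level):
            support[pair] += 1
    return dict(support)
```

For a chain A, B, C (B a child of A, C a child of B), `max_level=1` yields (A, B) and (B, C), while `max_level=2` also yields (A, C).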

Maintaining Frequent Itemsets over High-Speed Data Streams

James Cheng; Yiping Ke; Wilfred Ng

We propose a false-negative approach to approximate the set of frequent itemsets (FIs) over a sliding window. Existing approximate algorithms use an error parameter, ε, to control the accuracy of the mining result. However, the use of ε leads to a dilemma: a smaller ε gives a more accurate mining result but higher computational complexity, while increasing ε degrades the mining accuracy. We address this dilemma by introducing a progressively increasing minimum support function: the longer an itemset is retained in the window, the closer its required minimum support approaches the minimum support of an FI. As a result, the number of potential FIs to be maintained is greatly reduced. Our experiments show that our algorithm not only attains highly accurate mining results, but also runs significantly faster and consumes less memory than existing algorithms for mining FIs over a sliding window.

- Association Rule Mining | Pp. 462-467
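A progressively increasing minimum support function can be sketched as follows, assuming (hypothetically) a linear ramp: an itemset tracked for k of the W batches in the window must reach relative support σ·k/W, where σ is the minimum support of a frequent itemset. The ramp shape, the tuple layout in `tracked`, and the function names are illustrative assumptions; the abstract does not specify the paper's exact function.

```python
from typing import Dict, FrozenSet, Tuple


def progressive_min_support(k: int, window_batches: int, sigma: float) -> float:
    """Hypothetical linear ramp: relative minimum support required of an itemset
    tracked for k of the window's batches. Rises from sigma / window_batches
    (k = 1) to sigma (k = window_batches)."""
    if not 1 <= k <= window_batches:
        raise ValueError("k must lie in [1, window_batches]")
    return sigma * k / window_batches


def prune_candidates(
    tracked: Dict[FrozenSet[str], Tuple[int, int, int]],
    window_batches: int,
    sigma: float,
) -> Dict[FrozenSet[str], Tuple[int, int, int]]:
    """tracked maps itemset -> (count, transactions_seen, batches_tracked).
    Itemsets below the ramp are dropped; a dropped itemset that later turns
    out to be frequent is a false negative, which this approach accepts."""
    kept = {}
    for itemset, (count, seen, k) in tracked.items():
        if count / seen >= progressive_min_support(k, window_batches, sigma):
            kept[itemset] = (count, seen, k)
    return kept
```

Early in its life an itemset only has to clear a low bar, so genuinely frequent itemsets are rarely discarded; itemsets that linger without ever approaching σ are pruned, which keeps the candidate set small.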

Generalized Disjunction-Free Representation of Frequent Patterns with at Most k Negations

Marzena Kryszkiewicz

The discovery of frequent patterns and their representations has attracted a lot of attention in the data mining community. Extensive research has been carried out, mainly on discovering positive patterns. Recently, the generalized disjunction-free representation GDFLR of all frequent patterns, both with and without negation, has been proposed. There are cases, however, when a user is interested only in patterns with a restricted number of negated items. In this paper, we offer the k-GDFLR representation as an adaptation of GDFLR, which represents all frequent patterns with at most k negated items. Algorithms for discovering this representation are discussed as well. The experimental results show that k-GDFLR is more concise than GDFLR.

- Association Rule Mining | Pp. 468-472
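Patterns with negated items need not be counted directly: by inclusion-exclusion, the support of a pattern X ∧ ¬Y (transactions containing all of X and none of Y) equals the sum over S ⊆ Y of (−1)^|S| · sup(X ∪ S); for example, sup(a ∧ ¬b) = sup(a) − sup(ab). The Python sketch below applies this identity and enumerates the variants of an itemset with at most k negated items, where k is a user-chosen bound. It illustrates the kind of pattern the representation covers; it is not the GDFLR-based construction itself, and the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Iterator, List, Set, Tuple


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> int:
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support_with_negations(
    pos: Iterable[str], neg: Iterable[str], transactions: List[Set[str]]
) -> int:
    """Support of 'all of pos and none of neg', via inclusion-exclusion
    over positive supports only."""
    pos_set = frozenset(pos)
    neg_items = sorted(neg)
    total = 0
    for r in range(len(neg_items) + 1):
        for subset in combinations(neg_items, r):
            total += (-1) ** r * support(pos_set | frozenset(subset), transactions)
    return total


def variants_with_at_most_k_negations(
    itemset: Iterable[str], k: int
) -> Iterator[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Yield (positive part, negated part) for every way of negating <= k items."""
    items = sorted(itemset)
    for r in range(min(k, len(items)) + 1):
        for neg in combinations(items, r):
            pos = frozenset(i for i in items if i not in neg)
            yield pos, frozenset(neg)
```

Filtering the yielded variants by `support_with_negations(...) >= min_count` gives the frequent patterns with at most k negations built from one base itemset; a compact representation aims to avoid storing all of them explicitly.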

Mining Interesting Imperfectly Sporadic Rules

Yun Sing Koh; Nathan Rountree; Richard O’Keefe

Detecting association rules with low support but high confidence is a difficult data mining problem. To find such rules using approaches like the Apriori algorithm, support must be set very low, which results in a large number of redundant rules. We are interested in sporadic rules, i.e., those that fall below a maximum support level but above the level of support expected from random coincidence. In this paper we introduce an algorithm called MIISR to efficiently find a particular type of sporadic rule, in which the support of the antecedent as a whole falls below the maximum support, although individual items may have quite high support. Our proposed method uses item constraints and coincidence pruning to discover these rules in reasonable time.

- Association Rule Mining | Pp. 473-482
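The defining conditions of an imperfectly sporadic rule can be checked directly on a small dataset. The Python sketch below assumes, for illustration, that "the level of support expected from random coincidence" is the co-occurrence count expected if antecedent and consequent were independent, n_x · n_y / n; the function names and this simple test are assumptions, not MIISR's actual coincidence pruning.

```python
from typing import FrozenSet, List, Set


def count(transactions: List[Set[str]], itemset: FrozenSet[str]) -> int:
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def is_imperfectly_sporadic(
    transactions: List[Set[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
    max_sup: float,
    min_conf: float,
) -> bool:
    n = len(transactions)
    n_x = count(transactions, antecedent)
    n_y = count(transactions, consequent)
    n_xy = count(transactions, antecedent | consequent)
    if n_x == 0:
        return False
    # Co-occurrence count expected if antecedent and consequent were independent.
    expected_xy = n_x * n_y / n
    return (
        n_x / n < max_sup              # antecedent as a whole is rare
        and n_xy / n < max_sup         # rule support stays below maximum support
        and n_xy > expected_xy         # more co-occurrence than chance predicts
        and n_xy / n_x >= min_conf     # and the rule is confident
    )
```

In a dataset where items a and b are each common but rarely occur together, a rule {a, b} → {c} can qualify even though neither a nor b is rare on its own; that is the "imperfect" case the abstract describes.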

Improved Negative-Border Online Mining Approaches

Ching-Yao Wang; Shian-Shyong Tseng; Tzung-Pei Hong

In the past, we proposed EMPR to structurally and systematically store previously mined information for each inserted block of data, and designed a negative-border online mining (NOM) approach to provide ad hoc, query-driven, online mining support. In this paper, we use appropriate data structures and design efficient algorithms to improve the performance of the NOM approach. A data structure is used to organize and maintain all candidate itemsets so that candidate itemsets with the same proper subsets can be considered at the same time. The derived LNOM approach requires only one scan of the itemsets stored in EMPR, thus saving much computation time. In addition, a hashing technique is used to further improve the performance of the NOM approach, since many itemsets stored in EMPR may be useless for calculating the counts of candidates. Finally, experimental results show the effect of the improved NOM approaches.

- Association Rule Mining | Pp. 483-492
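NOM builds on the classic notion of a negative border: the minimal itemsets that are not frequent although all of their proper subsets are. Keeping counts for the frequent itemsets and their negative border is what lets an online miner tell which candidates need recounting when new data arrive. The Python sketch below computes the negative border from a downward-closed set of frequent itemsets; it shows the concept only and is not the authors' EMPR storage or their improved algorithms.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def negative_border(
    frequent: Set[FrozenSet[str]], items: Iterable[str]
) -> Set[FrozenSet[str]]:
    """Minimal infrequent itemsets: not frequent, but every proper subset is.
    `frequent` must be downward closed; the empty set is treated as frequent."""
    items = list(items)  # allow any iterable, including one-shot generators
    frequent = set(frequent) | {frozenset()}
    border: Set[FrozenSet[str]] = set()
    for base in frequent:
        for item in items:
            if item in base:
                continue
            candidate = base | {item}
            if candidate in frequent:
                continue
            # Checking the (|X|-1)-subsets suffices because `frequent` is downward closed.
            subsets = combinations(candidate, len(candidate) - 1)
            if all(frozenset(sub) in frequent for sub in subsets):
                border.add(candidate)
    return border
```

For items {a, b, c} with frequent itemsets {a}, {b}, {c}, {a, b}, the negative border is {a, c} and {b, c}; {a, b, c} is excluded because its subset {a, c} is not frequent.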

Association-Based Dissimilarity Measures for Categorical Data: Limitation and Improvement

Si Quang Le; Tu Bao Ho; Le Sy Vinh

Measuring the similarity of categorical data is a challenging task in data mining due to the poor structure of categorical data. This paper presents a dissimilarity measure for categorical data based on the relations among attributes. This measure not only has the advantage of value variance but also overcomes the limitations of the conditional probability-based measure when applied to databases whose attributes are independent. Experiments with 30 databases also showed that the proposed measure boosted the accuracy of Nearest Neighbor classification compared with the other tested measures.

- Association Rule Mining | Pp. 493-498
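The limitation mentioned in the abstract is easy to reproduce. A conditional probability-based measure compares two values x and y of attribute A_i through the distributions P(A_j | A_i = x) and P(A_j | A_i = y) of the other attributes. The Python sketch below uses total variation distance for that comparison (an assumption made for illustration) and shows that when attributes are independent every pair of values gets dissimilarity 0. It reproduces the limitation only, not the paper's improved measure.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence

Row = Sequence[Hashable]


def conditional_distribution(
    rows: List[Row], i: int, value: Hashable, j: int
) -> Dict[Hashable, float]:
    """Empirical P(A_j | A_i = value)."""
    matches = [row[j] for row in rows if row[i] == value]
    counts = Counter(matches)
    return {v: c / len(matches) for v, c in counts.items()}


def association_dissimilarity(rows: List[Row], i: int, x: Hashable, y: Hashable) -> float:
    """Sum over the other attributes of the total variation distance
    between the conditional distributions given A_i = x and A_i = y."""
    total = 0.0
    for j in range(len(rows[0])):
        if j == i:
            continue
        px = conditional_distribution(rows, i, x, j)
        py = conditional_distribution(rows, i, y, j)
        keys = set(px) | set(py)
        total += 0.5 * sum(abs(px.get(v, 0.0) - py.get(v, 0.0)) for v in keys)
    return total
```

With rows where color determines temperature, "red" and "blue" are maximally dissimilar (1.0); with rows where the two attributes are independent, the same pair scores 0.0 even though the values differ.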

Is Frequency Enough for Decision Makers to Make Decisions?

Shichao Zhang; Jeffrey Xu Yu; Jingli Lu; Chengqi Zhang

There are many advanced techniques that can efficiently mine frequent itemsets using a minimum support. However, the question that remains unanswered is whether the minimum support can really help decision makers make decisions. In this paper, we study four summary queries for frequent itemset mining, namely: 1) finding the support-average of itemsets, 2) finding a support-quantile of itemsets, 3) finding the number of itemsets whose support is greater/less than the support-average, i.e., an approximate distribution of itemsets, and 4) finding the relative frequency of an itemset. With these queries, a decision maker will know whether the support of an itemset in question is greater/less than the support-quantile, the distribution of itemsets, and how frequent an itemset is relative to the others. Processing these summary queries is challenging, because the minimum-support constraint cannot be used to prune infrequent itemsets.

- Association Rule Mining | Pp. 499-503
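The four summary queries can be illustrated by brute force on a tiny dataset: enumerate every itemset that occurs at least once, then compute the support-average, a nearest-rank support-quantile, the counts above and below the average, and a relative frequency. Reading "relative frequency" as the fraction of itemsets whose support does not exceed the given itemset's is an assumption, as are the function names. The exhaustive enumeration is exponential in the number of items, which shows why these queries are costly once minimum-support pruning is unavailable.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def itemset_supports(transactions: List[Set[str]]) -> Dict[FrozenSet[str], int]:
    """Support count of every itemset occurring at least once (exhaustive)."""
    items = sorted(set().union(*transactions))
    supports: Dict[FrozenSet[str], int] = {}
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            itemset = frozenset(combo)
            s = sum(1 for t in transactions if itemset <= t)
            if s > 0:
                supports[itemset] = s
    return supports


def support_average(supports: Dict[FrozenSet[str], int]) -> float:
    return sum(supports.values()) / len(supports)


def support_quantile(supports: Dict[FrozenSet[str], int], q: float) -> int:
    """Nearest-rank quantile for 0 < q <= 1."""
    ordered = sorted(supports.values())
    return ordered[max(0, math.ceil(q * len(ordered)) - 1)]


def above_below_average(supports: Dict[FrozenSet[str], int]) -> Tuple[int, int]:
    avg = support_average(supports)
    above = sum(1 for s in supports.values() if s > avg)
    below = sum(1 for s in supports.values() if s < avg)
    return above, below


def relative_frequency(supports: Dict[FrozenSet[str], int], itemset: FrozenSet[str]) -> float:
    """Hypothetical reading: share of itemsets whose support is <= this itemset's."""
    s = supports[itemset]
    return sum(1 for v in supports.values() if v <= s) / len(supports)
```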

Ramp: High Performance Frequent Itemset Mining with Efficient Bit-Vector Projection Technique

Shariq Bashir; Abdul Rauf Baig

Mining frequent itemsets using a bit-vector representation is very efficient for small dense datasets, but highly inefficient for sparse datasets due to the lack of an efficient bit-vector projection technique. In this paper we present a novel, efficient bit-vector projection technique for both sparse and dense datasets. We also present a new frequent itemset mining algorithm, Ramp (Real Algorithm for Mining Patterns), which uses the bit-vector representation and our bit-vector projection technique. The performance of Ramp is compared with the current best frequent itemset mining algorithms. Experimental results on sparse datasets show that mining frequent itemsets with Ramp is faster than with the current best algorithms.

- Association Rule Mining | Pp. 504-508
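The core of bit-vector mining is that each item's transaction list is a bit vector, and the support of an itemset is the population count of the AND of its items' vectors. The Python sketch below uses arbitrary-precision ints as bit vectors and a depth-first search in which each prefix's vector restricts (projects) the transactions seen by its extensions. This is a generic illustration under those assumptions, not the projection technique proposed in the paper.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def item_bitvectors(transactions: List[Set[str]]) -> Dict[str, int]:
    """Bit tid of vectors[item] is set iff transaction tid contains item."""
    vectors: Dict[str, int] = {}
    for tid, t in enumerate(transactions):
        for item in t:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def popcount(x: int) -> int:
    return bin(x).count("1")


def mine(transactions: List[Set[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Depth-first frequent itemset mining over bit vectors."""
    vectors = item_bitvectors(transactions)
    items = sorted(i for i, v in vectors.items() if popcount(v) >= min_count)
    result: Dict[FrozenSet[str], int] = {}

    def dfs(prefix: Tuple[str, ...], prefix_vec: int, candidates: List[str]) -> None:
        for idx, item in enumerate(candidates):
            # AND with the prefix vector keeps only transactions containing prefix + item.
            vec = prefix_vec & vectors[item]
            support = popcount(vec)
            if support >= min_count:
                itemset = prefix + (item,)
                result[frozenset(itemset)] = support
                dfs(itemset, vec, candidates[idx + 1:])

    all_transactions = (1 << len(transactions)) - 1
    dfs((), all_transactions, items)
    return result
```

On sparse data most bits in these vectors are zero, which is exactly the inefficiency the abstract attributes to plain bit-vector mining and the motivation for a better projection.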

Evaluating a Rule Evaluation Support Method Based on Objective Rule Evaluation Indices

Hidenao Abe; Shusaku Tsumoto; Miho Ohsaki; Takahira Yamaguchi

In this paper, we present an evaluation of a novel rule evaluation support method for post-processing mined results with rule evaluation models based on objective indices. Post-processing of mined results is one of the key issues in a data mining process. However, it is difficult for human experts to completely evaluate many thousands of rules mined from a large, noisy dataset. To reduce the cost of the rule evaluation task, we have developed a rule evaluation support method with rule evaluation models, which are obtained from objective indices of mined classification rules and a human expert's evaluation of each rule. To evaluate the performance of learning algorithms for constructing rule evaluation models, we conducted a case study on meningitis data mining as a real-world problem. Furthermore, we also evaluated our method on four rulesets obtained from four kinds of UCI datasets.

- Association Rule Mining | Pp. 509-519
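A rule evaluation model maps a rule's objective indices to an expert-style judgment. The Python sketch below computes three common indices (support, confidence, lift) from a rule's contingency counts and then predicts a label with a 1-nearest-neighbour lookup over rules an expert has already labeled. The choice of indices, the 1-NN learner, and the function names are stand-in assumptions; the abstract does not state which indices or learning algorithms the paper uses.

```python
import math
from typing import List, Tuple

Indices = Tuple[float, float, float]


def objective_indices(n: int, n_x: int, n_y: int, n_xy: int) -> Indices:
    """Support, confidence and lift of X -> Y from contingency counts."""
    support = n_xy / n
    confidence = n_xy / n_x
    lift = confidence / (n_y / n)
    return support, confidence, lift


def predict_evaluation(labeled: List[Tuple[Indices, str]], query: Indices) -> str:
    """1-nearest-neighbour stand-in for a learned rule evaluation model."""
    _, label = min(labeled, key=lambda pair: math.dist(pair[0], query))
    return label
```

Because lift lives on a different scale from support and confidence, a practical model would normalize the indices before measuring distances; the sketch skips that for brevity.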