Publications catalog - books
Knowledge Discovery in Databases: PKDD 2005: 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal, October 3-7, 2005, Proceedings
Alípio Mário Jorge ; Luís Torgo ; Pavel Brazdil ; Rui Camacho ; João Gama (eds.)
Conference: 9th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD). Porto, Portugal. October 3, 2005 - October 7, 2005
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Not available.
Availability
| Detected institution | Publication year | Browse | Download | Request |
|---|---|---|---|---|
| Not detected | 2005 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-29244-9
Electronic ISBN
978-3-540-31665-7
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2005
Copyright information
© Springer-Verlag Berlin Heidelberg 2005
Table of contents
doi: 10.1007/11564126_41
A Probabilistic Clustering-Projection Model for Discrete Data
Shipeng Yu; Kai Yu; Volker Tresp; Hans-Peter Kriegel
For discrete co-occurrence data like documents and words, calculating optimal projections and clustering are two different but related tasks. The goal of projection is to find a low-dimensional latent space for words, and clustering aims at grouping documents based on their feature representations. In general, projection and clustering are studied independently, but they both represent the intrinsic structure of data and should reinforce each other. In this paper we introduce a probabilistic clustering-projection (PCP) model for discrete data, where they are both represented in a unified framework. Clustering is seen to be performed in the projected space, and projection explicitly considers clustering structure. Iterating the two operations turns out to be exactly the variational EM algorithm under Bayesian model inference, and thus is guaranteed to improve the data likelihood. The model is evaluated on two text data sets, both showing very encouraging results.
Keywords: Cluster Center; Latent Dirichlet Allocation; Discrete Data; Nonnegative Matrix Factorization; Document Cluster.
- Long Papers | Pp. 417-428
doi: 10.1007/11564126_42
Collaborative Filtering on Data Streams
Jorge Mario Barajas; Xue Li
Collaborative Filtering is one of the most popular recommendation algorithms. Most Collaborative Filtering algorithms work with a static set of data. This paper introduces a novel approach to providing recommendations using Collaborative Filtering when user ratings are received over an incoming data stream. In an incoming stream, massive amounts of data arrive rapidly, making it impossible to save all the records for later analysis. By dynamically building a decision tree for every item as data arrive, the incoming data stream is used effectively, although an inevitable trade-off between accuracy and the amount of memory used is introduced. By adding a simple personalization step using a hierarchy of the items, it is possible to improve the predicted ratings made by each decision tree and generate recommendations in real time. Empirical studies with the dynamically built decision trees show that the personalization step improves the overall prediction accuracy.
- Short Papers | Pp. 429-436
doi: 10.1007/11564126_43
The Relation of Closed Itemset Mining, Complete Pruning Strategies and Item Ordering in Apriori-Based FIM Algorithms
Ferenc Bodon; Lars Schmidt-Thieme
In this paper we investigate the relationship between closed itemset mining, the complete pruning technique and item ordering in the Apriori algorithm. We claim that when a proper item order is used, complete pruning does not necessarily speed up Apriori, and in databases with certain characteristics, pruning increases run time significantly. We also show that if complete pruning is applied, then an intersection-based technique not only results in a faster algorithm, but also provides closed-itemset selection for free in terms of both memory consumption and run time.
- Short Papers | Pp. 437-444
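As a rough illustration of what "complete pruning" refers to in Apriori-style candidate generation, the sketch below joins frequent (k-1)-itemsets that share a prefix and, optionally, discards any candidate that has an infrequent (k-1)-subset. This is a textbook-style sketch, not the authors' implementation; the function name `generate_candidates`, the tuple representation of itemsets, and the sample itemsets are all assumptions made for the example.

```python
from itertools import combinations

def generate_candidates(frequent_prev, complete_pruning=True):
    """Join frequent (k-1)-itemsets into k-candidates, optionally pruning.

    Itemsets are assumed to be sorted tuples of comparable items
    (hypothetical representation chosen for this sketch).
    """
    prev = sorted(frequent_prev)
    prev_set = set(prev)
    candidates = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            # Only itemsets sharing the same (k-2)-prefix are joined; since
            # `prev` is sorted, such itemsets are contiguous.
            if a[:-1] != b[:-1]:
                break
            cand = a + (b[-1],)
            if complete_pruning:
                # Complete pruning: every (k-1)-subset must be frequent.
                if any(sub not in prev_set
                       for sub in combinations(cand, len(cand) - 1)):
                    continue
            candidates.append(cand)
    return candidates

# Made-up frequent 2-itemsets, for illustration only.
frequent_2 = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]
pruned = generate_candidates(frequent_2)
unpruned = generate_candidates(frequent_2, complete_pruning=False)
```

Here `("b", "c", "d")` survives the join but is removed by complete pruning because `("c", "d")` is not frequent. The subset checks are the extra work whose cost, relative to simply counting the unpruned candidate, is the kind of trade-off the abstract discusses.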
doi: 10.1007/11564126_44
Community Mining from Multi-relational Networks
Deng Cai; Zheng Shao; Xiaofei He; Xifeng Yan; Jiawei Han
Social network analysis has attracted much attention in recent years. Community mining is one of the major directions in social network analysis. Most of the existing methods on community mining assume that there is only one kind of relation in the network, and moreover, the mining results are independent of the users’ needs or preferences. However, in reality, there exist multiple, heterogeneous social networks, each representing a particular kind of relationship, and each kind of relationship may play a distinct role in a particular task. In this paper, we systematically analyze the problem of mining hidden communities on heterogeneous social networks. Based on the observation that different relations have different importance with respect to a certain query, we propose a new method for learning an optimal linear combination of these relations which can best meet the user’s expectation. With the obtained relation, better performance can be achieved for community mining.
Keywords: Social Network; Social Network Analysis; Relation Extraction; Community Mining; Linear Regression Problem.
- Short Papers | Pp. 445-452
doi: 10.1007/11564126_45
Evaluating the Correlation Between Objective Rule Interestingness Measures and Real Human Interest
Deborah R. Carvalho; Alex A. Freitas; Nelson Ebecken
In the last few years, the data mining community has proposed a number of objective rule interestingness measures to select the most interesting rules out of a large set of discovered rules. However, it should be recalled that objective measures are just an estimate of the true degree of interestingness of a rule to the user, the so-called real human interest. The latter is inherently subjective. Hence, it is not clear how effective, in practice, objective measures are. More precisely, the central question investigated in this paper is: “how effective are objective rule interestingness measures, in the sense of being a good estimate of the true, subjective degree of interestingness of a rule to the user?” This question is investigated by extensive experiments with 11 objective rule interestingness measures across eight real-world data sets.
Keywords: Rank Number; True Degree; Exception Rule; Rule Interestingness; Minimum Generalization.
- Short Papers | Pp. 453-461
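One simple way to quantify how well an objective measure tracks user-assigned interest is a rank correlation between the two score lists. The sketch below uses Spearman's coefficient in its no-ties form; the abstract does not say which statistic the authors used, so this choice, the helper names `ranks` and `spearman`, and the sample scores are assumptions made purely for illustration.

```python
def ranks(values):
    """Rank values from 1 (largest) to n. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(xs, ys):
    """Spearman rank correlation, no-ties formula: 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Made-up scores for four rules: an objective measure and a user rating.
objective_scores = [0.9, 0.7, 0.5, 0.3]
user_scores = [4, 3, 2, 1]

agree = spearman(objective_scores, user_scores)           # same ordering
disagree = spearman(objective_scores, user_scores[::-1])  # reversed ordering
```

A value near 1 would suggest the objective measure orders rules much as the user does; a value near -1 would suggest the opposite ordering.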
doi: 10.1007/11564126_46
A Kernel Based Method for Discovering Market Segments in Beef Meat
Jorge Díez; Juan José del Coz; Carlos Sañudo; Pere Albertí; Antonio Bahamonde
In this paper we propose a method for learning the reasons why groups of consumers prefer some food products instead of others. We emphasize the role of groups given that, from a practical point of view, they may represent market segments that demand different products. Our method starts by representing people’s preferences in a metric space; there we are able to define a kernel-based similarity function that allows a clustering algorithm to discover significant groups of consumers with homogeneous tastes. Finally, in each cluster, we learn, with an SVM, a function that explains the tastes of the consumers grouped in the cluster. To illustrate our method, a real case of consumers of beef meat was studied. The panel was formed by 171 people who rated 303 samples of meat from 101 animals with 3 different aging periods.
Keywords: Preference Function; Ranking Function; Aging Period; Feature Subset Selection; Preference Criterion.
- Short Papers | Pp. 462-469
doi: 10.1007/11564126_47
Corpus-Based Neural Network Method for Explaining Unknown Words by WordNet Senses
Bálint Gábor; Viktor Gyenes; András Lőrincz
This paper introduces an unsupervised algorithm that collects senses contained in WordNet to explain words whose meaning is unknown but for which plenty of documents are available that contain the word in that unknown sense. Based on the widely accepted idea that the meaning of a word is characterized by its context, a neural network architecture was designed to reconstruct the meaning of the unknown word. The connections of the network were derived from word co-occurrences and word-sense statistics. The method was tested on 80 TOEFL synonym questions, of which 63 were answered correctly. This is comparable to other methods tested on the same questions, but using a larger corpus or richer lexical database. The approach was found to be robust against details of the architecture.
Keywords: Latent Semantic Analysis; Semantic Distance; Reconstruction Network; Connectivity Matrix; Question Word.
- Short Papers | Pp. 470-477
doi: 10.1007/11564126_48
Segment and Combine Approach for Non-parametric Time-Series Classification
Pierre Geurts; Louis Wehenkel
This paper presents a novel, generic, scalable, autonomous, and flexible supervised learning algorithm for the classification of multivariate, variable-length time series. The essential ingredients of the algorithm are randomization, segmentation of time series, decision tree ensemble based learning of subseries classifiers, combination of subseries classifications by voting, and cross-validation based temporal resolution adaptation. Experiments are carried out with this method on 10 synthetic and real-world datasets. They highlight the good behavior of the algorithm on a large diversity of problems. Our results are also highly competitive with existing approaches from the literature.
Keywords: Base Learner; Multivariate Time Series; Supervise Learning Algorithm; Time Series Database; Single Decision Tree.
- Short Papers | Pp. 478-485
doi: 10.1007/11564126_49
Producing Accurate Interpretable Clusters from High-Dimensional Data
Derek Greene; Pádraig Cunningham
The primary goal of cluster analysis is to produce clusters that accurately reflect the natural groupings in the data. A second objective is to identify features that are descriptive of the clusters. In addition to these requirements, we often wish to allow objects to be associated with more than one cluster. In this paper we present a technique, based on the spectral co-clustering model, that is effective in meeting these objectives. Our evaluation on a range of text clustering problems shows that the proposed method yields accuracy superior to that afforded by existing techniques, while producing cluster descriptions that are amenable to human interpretation.
Keywords: Singular Vector; Spectral Cluster; Normalise Mutual Information; Document Cluster; Soft Cluster.
- Short Papers | Pp. 486-494
doi: 10.1007/11564126_50
Stress-Testing Hoeffding Trees
Geoffrey Holmes; Richard Kirkby; Bernhard Pfahringer
Hoeffding trees are state-of-the-art in classification for data streams. They perform prediction by choosing the majority class at each leaf. Their predictive accuracy can be increased by adding Naive Bayes models at the leaves of the trees. By stress-testing these two prediction methods using noise and more complex concepts and an order of magnitude more instances than in previous studies, we discover situations where the Naive Bayes method outperforms the standard Hoeffding tree initially but is eventually overtaken. The reason for this crossover is determined and a hybrid adaptive method is proposed that generally outperforms the two original prediction methods for both simple and complex concepts as well as under noise.
- Short Papers | Pp. 495-502
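For background on the trees discussed in the abstract above: Hoeffding trees take their name from the Hoeffding bound, which gives the margin by which one split candidate must lead another before the tree commits to a split. The sketch below shows the standard bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), and a simple split test. It is not taken from this paper; the function names `hoeffding_bound` and `should_split` and the sample numbers are assumptions for illustration.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2 * n)).

    value_range: range R of the split criterion (e.g. 1.0 for information
                 gain on a two-class problem)
    delta:       allowed probability that the bound fails
    n:           number of instances observed at the leaf
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the best attribute leads the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

# Illustrative numbers only: the bound shrinks as more instances arrive.
eps_small_n = hoeffding_bound(1.0, 1e-7, 100)
eps_large_n = hoeffding_bound(1.0, 1e-7, 1000)
decision = should_split(0.30, 0.15, 1.0, 1e-7, 1000)
```

Because epsilon decreases with n, a leaf that cannot yet separate two candidate attributes simply waits for more stream instances, which is what lets the tree grow incrementally without storing the data.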