Publications catalog - books
Knowledge Discovery in Databases: PKDD 2005: 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal, October 3-7, 2005, Proceedings
Alípio Mário Jorge ; Luís Torgo ; Pavel Brazdil ; Rui Camacho ; João Gama (eds.)
Conference: 9th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD). Porto, Portugal. October 3, 2005 - October 7, 2005
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Not available.
Availability
| Detected institution | Publication year | Browse | Download | Request |
|---|---|---|---|---|
| Not detected | 2005 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-29244-9
Electronic ISBN
978-3-540-31665-7
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2005
Copyright information
© Springer-Verlag Berlin Heidelberg 2005
Table of contents
doi: 10.1007/11564126_51
Rank Measures for Ordering
Jin Huang; Charles X. Ling
Many data mining applications require a ranking, rather than a mere classification, of cases. Examples of these applications are widespread, including Internet search engines (ranking of pages returned) and customer relationship management (ranking of profitable customers). However, little theoretical foundation and practical guideline have been established to assess the merits of different rank measures for ordering. In this paper, we first review several general criteria to judge the merits of different single-number measures. Then we propose a novel rank measure, and compare the commonly used rank measures and our new one according to the criteria. This leads to a preference order for these rank measures. We conduct experiments on real-world datasets to confirm the preference order. The results of the paper will be very useful in evaluating and comparing rank algorithms.
- Short Papers | Pp. 503-510
doi: 10.1007/11564126_52
Dynamic Ensemble Re-Construction for Better Ranking
Jin Huang; Charles X. Ling
Ensemble learning has been shown to be very successful in data mining. However most work on ensemble learning concerns the task of classification. Little work has been done to construct ensembles that aim to improve ranking. In this paper, we propose an approach to re-construct new ensembles based on a given ensemble with the purpose to improve the ranking performance, which is crucial in many data mining tasks. The experiments with real-world data sets show that our new approach achieves significant improvements in ranking over the original Bagging and Adaboost ensembles.
Keywords: Ranking Performance; Ensemble Learning; Test Subset; Data Mining Task; Original Ensemble.
- Short Papers | Pp. 511-518
doi: 10.1007/11564126_53
Frequency-Based Separation of Climate Signals
Alexander Ilin; Harri Valpola
The paper presents an example of exploratory data analysis of climate measurements using a recently developed denoising source separation (DSS) framework. We analysed a combined dataset containing daily measurements of three variables: surface temperature, sea level pressure and precipitation around the globe. Components exhibiting slow temporal behaviour were extracted using DSS with linear denoising. These slow components were further rotated using DSS with nonlinear denoising which implemented a frequency-based separation criterion. The rotated sources give a meaningful representation of the slow climate variability as a combination of trends, interannual oscillations, the annual cycle and slowly changing seasonal variations.
Keywords: Power Spectrum; Independent Component Analysis; Empirical Orthogonal Function; Slow Component.
- Short Papers | Pp. 519-526
doi: 10.1007/11564126_54
Efficient Processing of Ranked Queries with Sweeping Selection
Wen Jin; Martin Ester; Jiawei Han
Existing methods for top-k ranked query employ techniques including sorting, updating thresholds and materializing views. In this paper, we propose two novel index-based techniques for top-k ranked query: (1) indexing the layered skyline, and (2) indexing microclusters of objects into a grid structure. We also develop efficient algorithms for ranked query by locating the answer points during the sweeping of the line/hyperplane of the score function over the indexed objects. Both methods can be easily plugged into typical multi-dimensional database indexes. The comprehensive experiments not only demonstrate that our methods outperform the existing ones, but also illustrate that the application of data mining technique (microclustering) is a useful and effective solution for database query processing.
Keywords: Score Function; Query Processing; Query Time; Skyline Query; Sweeping Process.
- Short Papers | Pp. 527-535
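For orientation, a top-k ranked query under a linear score function can be answered with no index at all, by scanning every object and keeping the k best scores. The sketch below is that naive baseline only, not the layered-skyline or microcluster indexing the abstract proposes; the function name `top_k`, the convention that lower scores rank first, and the sample data are all assumptions made for illustration.

```python
import heapq
from typing import List, Sequence, Tuple


def top_k(points: Sequence[Sequence[float]],
          weights: Sequence[float],
          k: int) -> List[Tuple[float, int]]:
    """Return the k lowest-scoring points as (score, index) pairs.

    The score is a weighted sum of the coordinates (a linear score
    function). Assumption: lower scores rank first. This is a full scan
    with no index, so it costs one score evaluation per point.
    """
    scored = (
        (sum(w * x for w, x in zip(weights, p)), i)
        for i, p in enumerate(points)
    )
    # nsmallest keeps only k candidates in memory while scanning.
    return heapq.nsmallest(k, scored)


# Hypothetical 2-D objects scored with equal weights.
points = [(1.0, 5.0), (2.0, 2.0), (4.0, 1.0), (5.0, 5.0)]
best_two = top_k(points, weights=(1.0, 1.0), k=2)
```

An index-based method aims to return the same answer while evaluating the score function on far fewer objects than this scan does.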
doi: 10.1007/11564126_55
Feature Extraction from Mass Spectra for Classification of Pathological States
Alexandros Kalousis; Julien Prados; Elton Rexhepaj; Melanie Hilario
Mass spectrometry is becoming an important tool in proteomics. The representation of mass spectra is characterized by very high dimensionality and a high level of redundancy. Here we present a feature extraction method for mass spectra that directly models for domain knowledge, reduces the dimensionality and redundancy of the initial representation and controls for the level of granularity of feature extraction by seeking to optimize classification accuracy. A number of experiments are performed which show that the feature extraction preserves the initial discriminatory content of the learning examples.
Keywords: Feature Extraction; Peak Detection; Discriminatory Information; Initial Representation; Spatial Redundancy.
- Short Papers | Pp. 536-543
doi: 10.1007/11564126_56
Numbers in Multi-relational Data Mining
Arno J. Knobbe; Eric K. Y. Ho
Numeric data has traditionally received little attention in the field of Multi-Relational Data Mining (MRDM). It is often assumed that numeric data can simply be turned into symbolic data by means of discretisation. However, very few guidelines for successfully applying discretisation in MRDM exist. Furthermore, it is unclear whether the loss of information involved is negligible. In this paper, we consider different alternatives for dealing with numeric data in MRDM. Specifically, we analyse the adequacy of discretisation by performing a number of experiments with different existing discretisation approaches, and comparing the results with a procedure that handles numeric data dynamically. The discretisation procedures considered include an algorithm that is insensitive to the multi-relational structure of the data, and two algorithms that do involve this structure. With the empirical results thus obtained, we shed some light on the applicability of both dynamic and static procedures (discretisation), and give recommendations for when and how they can best be applied.
Keywords: Numeric Data; Dynamic Approach; Numeric Attribute; Inductive Logic Programming; Nominal Attribute.
- Short Papers | Pp. 544-551
doi: 10.1007/11564126_57
Testing Theories in Particle Physics Using Maximum Likelihood and Adaptive Bin Allocation
Bruce Knuteson; Ricardo Vilalta
We describe a methodology to assist scientists in quantifying the degree of evidence in favor of a new proposed theory compared to a standard baseline theory. The figure of merit is the log-likelihood ratio of the data given each theory. The novelty of the proposed mechanism lies in the likelihood estimations; the central idea is to adaptively allocate histogram bins that emphasize regions in the variable space where there is a clear difference in the predictions made by the two theories. We describe a software system that computes this figure of merit in the context of particle physics, and describe two examples conducted at the Tevatron Ring at the Fermi National Accelerator Laboratory. Results show how two proposed theories compare to the Standard Model and how the likelihood ratio varies as a function of a physical parameter (e.g., by varying the particle mass).
Keywords: Particle Physics; Variable Space; Relative Entropy; Particle Accelerator; Actual Observation.
- Short Papers | Pp. 552-560
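To make the figure of merit in the abstract above concrete, the sketch below computes a log-likelihood ratio over histogram bins. It rests on assumptions that the abstract does not state: the bins are fixed in advance rather than adaptively allocated, and each bin count is modelled as an independent Poisson variable whose mean is the count a theory predicts. The function and argument names are hypothetical.

```python
import math
from typing import Sequence


def log_likelihood_ratio(observed: Sequence[int],
                         expected_new: Sequence[float],
                         expected_base: Sequence[float]) -> float:
    """Log-likelihood ratio of binned counts: new theory vs. baseline.

    Assumption: each bin count is Poisson with mean equal to the
    theory's predicted count, and all predicted counts are positive.
    A positive result favours the new theory.
    """
    total = 0.0
    for n, mu_new, mu_base in zip(observed, expected_new, expected_base):
        # Per-bin Poisson log-likelihood is n*log(mu) - mu - log(n!);
        # the log(n!) term is the same for both theories and cancels.
        total += n * math.log(mu_new / mu_base) - (mu_new - mu_base)
    return total


# Hypothetical two-bin example: the new theory predicts the observed
# counts exactly, while the baseline predicts a flat distribution.
llr = log_likelihood_ratio([3, 7], [3.0, 7.0], [5.0, 5.0])
```

Bins in which the two theories predict the same count contribute nothing to the sum, which is consistent with the abstract's idea of placing bins where the predictions clearly differ.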
doi: 10.1007/11564126_58
Improved Naive Bayes for Extremely Skewed Misclassification Costs
Aleksander Kołcz; Abdur Chowdhury
Naive Bayes has been an effective and important classifier in the text categorization domain despite violations of its underlying assumptions. Although quite accurate, it tends to provide poor estimates of the posterior class probabilities, which hampers its application in the cost-sensitive context. The apparent high confidence with which certain errors are made is particularly problematic when misclassification costs are highly skewed, since conservative setting of the decision threshold may greatly decrease the classifier utility. We propose an extension of the Naive Bayes algorithm aiming to discount the confidence with which errors are made. The approach is based on measuring the amount of change to feature distribution necessary to reverse the initial classifier decision and can be implemented efficiently without over-complicating the process of Naive Bayes induction. In experiments with three benchmark document collections, the decision-reversal Naive Bayes is demonstrated to substantially improve over the popular multinomial version of the Naive Bayes algorithm, in some cases performing more than 40% better.
- Short Papers | Pp. 561-568
doi: 10.1007/11564126_59
Clustering and Prediction of Mobile User Routes from Cellular Data
Kari Laasonen
Location-awareness and prediction of future locations is an important problem in pervasive and mobile computing. In cellular systems (e.g., GSM) the serving cell is easily available as an indication of the user location, without any additional hardware or network services. With this location data and other context variables we can determine places that are important to the user, such as work and home. We devise online algorithms that learn routes between important locations and predict the next location when the user is moving. We incrementally build clusters of cell sequences to represent physical routes. Predictions are based on destination probabilities derived from these clusters. Other context variables such as the current time can be integrated into the model. We evaluate the model with real location data, and show that it achieves good prediction accuracy with relatively little memory, making the algorithms suitable for online use in mobile environments.
- Short Papers | Pp. 569-576
doi: 10.1007/11564126_60
Elastic Partial Matching of Time Series
L. J. Latecki; V. Megalooikonomou; Q. Wang; R. Lakaemper; C. A. Ratanamahatana; E. Keogh
We consider a problem of elastic matching of time series. We propose an algorithm that automatically determines a subsequence b′ of a target time series b that best matches a query series a. In the proposed algorithm we map the problem of the best matching subsequence to the problem of a cheapest path in a DAG (directed acyclic graph). Our experimental results demonstrate that the proposed algorithm outperforms the commonly used Dynamic Time Warping in retrieval accuracy.
Keywords: Time Series; Dynamic Time Warping; Longest Common Subsequence; Poor Quality Data; Target Series.
- Short Papers | Pp. 577-584
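The abstract above uses Dynamic Time Warping (DTW) as its comparison baseline. A minimal sketch of the textbook DTW distance follows, assuming one-dimensional series and an absolute-difference local cost. It is not the paper's DAG-based partial matching algorithm, and the function name `dtw_distance` is chosen here for illustration.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic-programming DTW distance between two series.

    Every element of `a` must be aligned with at least one element of
    `b` and vice versa, so the whole of both series is matched.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats b[j-1]
                                 cost[i][j - 1],      # b[j-1] repeats a[i-1]
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# A stretched copy of the query aligns at zero cost.
stretched = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
```

Because this formulation aligns both series in full, it cannot skip parts of the target, which is the limitation that subsequence matching addresses.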