Publications catalog - books
Advances in Knowledge Discovery and Data Mining: 10th Pacific-Asia Conference, PAKDD 2006, Singapore, April 9-12, 2006, Proceedings
Wee-Keong Ng ; Masaru Kitsuregawa ; Jianzhong Li ; Kuiyu Chang (eds.)
Conference: 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Singapore, Singapore. April 9, 2006 - April 12, 2006
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Not available.
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink |
Information
Resource type:
books
Print ISBN
978-3-540-33206-0
Electronic ISBN
978-3-540-33207-7
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Publication rights information
© Springer-Verlag Berlin Heidelberg 2006
Table of contents
doi: 10.1007/11731139_81
Image Classification Via LZ78 Based String Kernel: A Comparative Study
Ming Li; Yanong Zhu
Normalized Information Distance (NID) [1] is a general-purpose similarity metric based on the concept of Kolmogorov Complexity. We have developed this notion into a valid kernel distance, called the LZ78-based string kernel [2], and have shown that it can be used effectively for a variety of 1D sequence classification tasks [3]. In this paper, we further demonstrate its applicability to 2D images. We report experiments with our technique on two real datasets: (i) a collection of real-life photographs and (ii) a collection of medical diagnostic images from Magnetic Resonance (MR) data. The classification results are compared with those of the original similarity metric (i.e. NID) and several conventional classification algorithms. In all cases, the proposed kernel approach performs as well as or better than the other candidate methods, with lower computational overhead.
- Multimedia Mining | Pp. 704-712
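As an illustration of the compression-based similarity idea behind NID in the abstract above, the following is a minimal sketch of the normalized compression distance, a commonly used computable approximation of NID. It is not the authors' LZ78-based string kernel: it assumes Python's `zlib` (DEFLATE) as a stand-in compressor, and the function names `compressed_size` and `ncd` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for Kolmogorov complexity)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).

    Assumption: zlib replaces the LZ78 compressor discussed in the abstract.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    repetitive = b"abc" * 150
    # 450 bytes with little internal repetition for the compressor to exploit.
    varied = bytes((i * 197 + 13) % 256 for i in range(450))
    print("same input:     ", ncd(repetitive, repetitive))
    print("different input:", ncd(repetitive, varied))
```

Inputs that share structure compress well together and yield a smaller distance; for images, the byte strings would come from some serialization of the pixel data, which this sketch leaves open.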
doi: 10.1007/11731139_82
Distributed Pattern Discovery in Multiple Streams
Jimeng Sun; Spiros Papadimitriou; Christos Faloutsos
Given multiple groups of streams, each consisting of several co-evolving streams, we want to: (i) incrementally find local patterns within a single group, (ii) efficiently obtain global patterns across groups, and more importantly, (iii) do so efficiently in real time while limiting the information shared across groups. In this paper, we present a distributed, hierarchical algorithm addressing these problems. Our experimental case study confirms that the proposed method can perform hierarchical correlation detection efficiently and effectively.
- Stream Data Mining | Pp. 713-718
doi: 10.1007/11731139_83
COMET: Event-Driven Clustering over Multiple Evolving Streams
Mi-Yen Yeh; Bi-Ru Dai; Ming-Syan Chen
In this paper, we present COMET, a framework for event-driven Clustering Over Multiple Evolving sTreams, which monitors the distribution of clusters over multiple data streams and reports the results online. This information is valuable for supporting the corresponding online decisions. Note that as time advances, the data streams evolve and the clusters they belong to change. Instead of directly re-clustering the multiple data streams periodically, COMET applies an efficient cluster adjustment procedure only when it is required. The signal that a cluster adjustment is required is defined as an "event." We design an event detection mechanism that employs piecewise linear approximation as its key technique. Piecewise linear approximation is advantageous in that it can be performed in real time as the data comes in and can also capture the trend of the data. When an event occurs, split and merge operations let us report the latest clustering results effectively and with high clustering quality.
- Stream Data Mining | Pp. 719-723
doi: 10.1007/11731139_84
Variable Support Mining of Frequent Itemsets over Data Streams Using Synopsis Vectors
Ming-Yen Lin; Sue-Chen Hsueh; Sheng-Kun Hwang
Mining frequent itemsets over data streams has emerged as a research topic in recent years. Previous approaches generally use a fixed support threshold to discover the patterns in the stream. In reality, however, the threshold changes to meet the needs of users and the characteristics of the incoming data. In a non-streaming environment, changing the threshold implies re-mining all transactions. Because of the "look-once" nature of streaming data, however, discarded transactions are no longer available, so re-mining the stream is impossible. Therefore, we propose a method for variable support mining of frequent itemsets over the data stream. A synopsis vector is constructed to maintain statistics of past transactions and is invoked only when necessary. Experimental results show that our approach is efficient and scalable for variable support mining in data streams.
- Stream Data Mining | Pp. 724-728
doi: 10.1007/11731139_85
Hardware Enhanced Mining for Association Rules
Wei-Chuan Liu; Ken-Hao Liu; Ming-Syan Chen
In this paper, we propose a hardware-enhanced mining framework to cope with many challenging data mining tasks in a data stream environment. In this framework, hardware enhancements are implemented in commercial Field Programmable Gate Array (FPGA) devices, which have been growing rapidly in terms of density and speed. By exploiting the parallelism in hardware, many primitive data mining subtasks can be executed with high throughput, thus increasing the performance of the overall data mining tasks. Simple operations like counting, which take up a major portion of conventional mining execution time, can in fact be executed very efficiently on the hardware enhancements. Subtask modules that are used repetitively can also be replaced with equivalent hardware enhancements. Specifically, we realize an Apriori-like algorithm with our proposed hardware-enhanced mining framework to mine frequent temporal patterns from data streams. The frequency counts of 1-itemsets and 2-itemsets are obtained after a single scan of the datasets with our hardware implementation. It is empirically shown that the hardware enhancements provide scalability by mapping high-complexity operations, such as subset itemset counting, to the hardware. Our approach achieves considerably higher throughput than traditional database architectures with pure software implementations. With the rapid increase in applications on mobile devices, where power consumption is a concern and complicated software executions are prohibitive, it is envisioned that hardware-enhanced mining is an important direction to explore.
- Stream Data Mining | Pp. 729-738
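The single-pass counting of 1-itemsets and 2-itemsets mentioned in the abstract above can be sketched in plain software. This is only an illustration of the counting primitive, not the FPGA design described in the paper; the function names `count_1_and_2_itemsets` and `frequent` and the sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_1_and_2_itemsets(transactions):
    """Count every single item and every unordered item pair in one scan."""
    singles, pairs = Counter(), Counter()
    for transaction in transactions:
        # Deduplicate and sort so each pair has one canonical representation.
        items = sorted(set(transaction))
        singles.update(items)
        pairs.update(combinations(items, 2))
    return singles, pairs


def frequent(counts, min_support):
    """Keep only the itemsets whose count reaches the support threshold."""
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
    singles, pairs = count_1_and_2_itemsets(data)
    print(frequent(singles, 2))
    print(frequent(pairs, 2))
```

In an Apriori-like algorithm these counts seed the candidate generation for larger itemsets; the paper's contribution is moving this kind of counting into parallel hardware.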
doi: 10.1007/11731139_86
A Single Index Approach for Time-Series Subsequence Matching That Supports Moving Average Transform of Arbitrary Order
Yang-Sae Moon; Jinho Kim
Moving average transform is known to reduce the effect of noise and has been used in many areas such as econometrics. Previous subsequence matching methods with moving average transform, however, incur index overhead in both storage space and update maintenance, since they must build multiple indexes to support arbitrary orders. To solve this problem, we propose a single index approach for subsequence matching that supports moving average transform of arbitrary order. For the single index approach, we first introduce the notion of the poly-order transform by generalizing the original definition of moving average transform. We then formally prove the correctness of poly-order transform-based subsequence matching. Using the poly-order transform, we also propose two different subsequence matching methods that support moving average transform of arbitrary order. Experimental results on real stock data show that our methods improve average performance significantly, by a factor of 22.4 to 33.8, over the sequential scan.
- Stream Data Mining | Pp. 739-749
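For readers unfamiliar with the moving average transform referred to in the abstract above, here is a minimal sketch of the standard k-order version, in which each output value is the mean of k consecutive input values. It does not implement the paper's poly-order transform or its index; the function name `moving_average` is hypothetical.

```python
def moving_average(seq, order):
    """Return the `order`-moving average of `seq`.

    The result has len(seq) - order + 1 values; value i is the mean of
    seq[i : i + order]. A running window sum keeps the cost linear.
    """
    if not 1 <= order <= len(seq):
        raise ValueError("order must be between 1 and len(seq)")
    window_sum = sum(seq[:order])
    result = [window_sum / order]
    for i in range(order, len(seq)):
        # Slide the window: add the entering value, drop the leaving one.
        window_sum += seq[i] - seq[i - order]
        result.append(window_sum / order)
    return result


if __name__ == "__main__":
    print(moving_average([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
```

Because each order produces a different transformed sequence, a naive index-based approach needs one index per order, which is the overhead the abstract describes.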
doi: 10.1007/11731139_87
Efficient Mining of Emerging Events in a Dynamic Spatiotemporal Environment
Yu Meng; Margaret H. Dunham
This paper presents an efficient data mining technique for modeling multidimensional time-variant data series and examines its suitability for mining emerging events in a spatiotemporal environment. The data is modeled using a data structure that interleaves a clustering method with a dynamic Markov chain. Novel operations are used for deleting obsolete states and for finding emerging events based on a scoring scheme. The model is incremental, scalable, adaptive, and suitable for online processing. Algorithm analysis and experiments demonstrate the efficiency and effectiveness of the proposed technique.
- Stream Data Mining | Pp. 750-754
doi: 10.1007/11731139_88
A Multi-Hierarchical Representation for Similarity Measurement of Time Series
Xinqiang Zuo; Xiaoming Jin
In a large time series database, similarity search is a frequently used subroutine for finding the time series similar to a given one. In this process, the performance of the similarity measure directly affects the usability of the search results. Existing methods mostly use the sum of the distances between values at corresponding time points, e.g. Euclidean distance or dynamic time warping (DTW). However, they do not consider the hierarchy of each point in a time series according to its importance. As a result, they cannot measure the similarity of time series accurately and efficiently. In this paper, we propose a Multi-Hierarchical Representation (MHR) to replace the original series, based on the view that the points of one time series should be compared with points of the same importance in another. MHR assigns a hierarchy to each point, so that the original series can be represented by multi-hierarchical subseries, each consisting of points in the same hierarchy. The distance between the representations is then computed as the similarity measure. Finally, synthetic and real data sets were used in effectiveness experiments comparing our method with other major methods, and their efficiency was also compared on the real data set. All results showed the superiority of our method in terms of effectiveness and efficiency.
- Temporal Data Mining | Pp. 755-764
doi: 10.1007/11731139_89
Multistep-Ahead Time Series Prediction
Haibin Cheng; Pang-Ning Tan; Jing Gao; Jerry Scripps
Multistep-ahead prediction is the task of predicting a sequence of values in a time series. A typical approach, known as multi-stage prediction, is to apply a predictive model step-by-step and use the predicted value of the current time step to determine its value in the next time step. This paper examines two alternative approaches known as independent value prediction and parameter prediction. The first approach builds a separate model for each prediction step using the values observed in the past. The second approach fits a parametric function to the time series and builds models to predict the parameters of the function. We perform a comparative study on the three approaches using multiple linear regression, recurrent neural networks, and a hybrid of hidden Markov model with multiple linear regression. The advantages and disadvantages of each approach are analyzed in terms of their error accumulation, smoothness of prediction, and learning difficulty.
- Temporal Data Mining | Pp. 765-774
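The difference between multi-stage prediction and independent value prediction described in the abstract above can be sketched as follows. This is a simplified illustration, assuming a one-lag simple linear regression in place of the multiple linear regression, recurrent neural network, and hybrid models studied in the paper; the function names `fit_line`, `multi_stage`, and `independent_value` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def multi_stage(series, steps):
    """Fit one one-step-ahead model and feed each prediction back as input."""
    slope, intercept = fit_line(series[:-1], series[1:])
    predictions, last = [], series[-1]
    for _ in range(steps):
        last = slope * last + intercept  # errors can accumulate here
        predictions.append(last)
    return predictions


def independent_value(series, steps):
    """Fit a separate model for each horizon h, using only observed values."""
    predictions = []
    for h in range(1, steps + 1):
        slope, intercept = fit_line(series[:-h], series[h:])
        predictions.append(slope * series[-1] + intercept)
    return predictions


if __name__ == "__main__":
    series = [float(t) for t in range(10)]
    print(multi_stage(series, 3))
    print(independent_value(series, 3))
```

On a perfectly linear series both approaches agree; on noisy data the multi-stage loop reuses its own outputs, while the independent models each see fewer training pairs as the horizon grows.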
doi: 10.1007/11731139_90
Sequential Pattern Mining with Time Intervals
Yu Hirate; Hayato Yamana
Sequential pattern mining can be used to extract frequent sequences maintaining their transaction order. As conventional sequential pattern mining methods do not consider transaction occurrence time intervals, it is impossible to predict the time intervals of any two transactions extracted as frequent sequences. Thus, from extracted sequential patterns, although users are able to predict what events will occur, they are not able to predict when the events will occur. Here, we propose a new sequential pattern mining method that considers time intervals. Using Japanese earthquake data, we confirmed that our method is able to extract new types of frequent sequences that are not extracted by conventional sequential pattern mining methods.
- Temporal Data Mining | Pp. 775-779