Publications catalog - books

Machine Learning and Data Mining in Pattern Recognition: 5th International Conference, MLDM 2007, Leipzig, Germany, July 18-20, 2007. Proceedings

Petra Perner (ed.)

In conference: 5th International Workshop on Machine Learning and Data Mining in Pattern Recognition (MLDM). Leipzig, Germany. July 18, 2007 - July 20, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Artificial Intelligence (incl. Robotics); Mathematical Logic and Formal Languages; Database Management; Data Mining and Knowledge Discovery; Pattern Recognition; Image Processing and Computer Vision

Availability

Detected institution	Publication year	Browse	Download	Request
Not detected	2007	SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-73498-7

Electronic ISBN

978-3-540-73499-4

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2007

Publication rights information

Subject coverage

Computer and information sciences

Electrical, electronic and information engineering

Languages and literature

Table of contents

Check that your institution has access to download or request the complete book or any of its chapters.

doi: 10.1007/978-3-540-73499-4_31

A Comparative Study of Unsupervised Machine Learning and Data Mining Techniques for Intrusion Detection

Reza Sadoddin; Ali A. Ghorbani

In recent years, machine learning and data mining techniques have received considerable attention among intrusion detection researchers as a way to address the weaknesses of knowledge-based detection techniques. This has led to the application of various supervised and unsupervised techniques for the purpose of intrusion detection. In this paper, we conduct a set of experiments to analyze the performance of unsupervised techniques considering their main design choices. These include the heuristics proposed for distinguishing abnormal data from normal data and the distribution of the dataset used for training. We evaluate the performance of the techniques with various distributions of training and test datasets, which are constructed from the KDD99 dataset, a widely accepted resource for IDS evaluations. This comparative study is not only a blind comparison between unsupervised techniques, but also gives some guidelines to researchers and practitioners on applying these techniques to the area of intrusion detection.

- Intrusion Detection and Networks | Pp. 404-418

doi: 10.1007/978-3-540-73499-4_32

Long Tail Attributes of Knowledge Worker Intranet Interactions

Peter Géczy; Noriaki Izumi; Shotaro Akaho; Kôiti Hasida

Elucidation of human browsing behavior in electronic spaces has been attracting substantial attention in academic and commercial spheres. We present a novel formal approach to human behavior analysis in web-based environments. The framework has been applied to analyzing knowledge workers’ browsing behavior on a large corporate Intranet. Analysis indicates that users form elemental and complex browsing patterns and achieve their browsing objectives via few subgoals. Knowledge workers know their targets and exhibit diminutive exploratory behavior. Significant long tail attributes have been observed in all analyzed features. A novel distribution that accurately models them has been introduced.

- Intrusion Detection and Networks | Pp. 419-433

doi: 10.1007/978-3-540-73499-4_33

A Case-Based Approach to Anomaly Intrusion Detection

Alessandro Micarelli; Giuseppe Sansonetti

The architecture herein advanced finds its rationale in the visual interpretation of data obtained from monitoring computers and computer networks with the objective of detecting security violations. This new outlook on the problem may offer new and unprecedented techniques for intrusion detection which take advantage of algorithmic tools drawn from the realm of image processing and computer vision. In the system we propose, the normal interaction between users and network configuration is represented in the form of snapshots that refer to a limited number of attack-free instances of different applications. Based on the representations generated in this way, a library is built which is managed according to a case-based approach. The comparison between the query snapshot and those recorded in the system database is performed by computing the Earth Mover’s Distance between the corresponding feature distributions obtained through cluster analysis.

- Intrusion Detection and Networks | Pp. 434-448
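
The abstract above compares snapshots by the Earth Mover's Distance between feature distributions. As a minimal sketch, and not the authors' implementation, the following computes the distance for the simplified case of two histograms over the same ordered, unit-spaced bins, where it reduces to the summed absolute difference of the cumulative distributions. The function name `emd_1d` and the sample snapshots are hypothetical; the cluster-based signatures mentioned in the abstract would need a general transportation solver instead.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth Mover's Distance between two histograms on the same unit-spaced bins.

    Assumption: both histograms share the same ordered bins, so the distance
    equals the sum of absolute differences between their cumulative sums.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive total mass")
    # Normalize to unit mass, then accumulate into cumulative distributions.
    cdf_p = accumulate(x / total_p for x in p)
    cdf_q = accumulate(x / total_q for x in q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


# Hypothetical feature histograms: a stored attack-free snapshot and a query.
stored_snapshot = [4, 3, 2, 1]
query_snapshot = [1, 2, 3, 4]
distance = emd_1d(stored_snapshot, query_snapshot)
```

In a case-based setting, a query could be matched to the stored snapshot with the smallest distance and flagged when even that smallest distance is large; the threshold is a design choice the abstract does not specify.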

doi: 10.1007/978-3-540-73499-4_34

Sensing Attacks in Computers Networks with Hidden Markov Models

Davide Ariu; Giorgio Giacinto; Roberto Perdisci

In this work, we propose an Intrusion Detection model for computer networks based on Hidden Markov Models. While stateful techniques are widely used to detect intrusion at the operating system level, by tracing the sequences of system calls, this issue has been rarely researched for the analysis of network traffic. The proposed model aims at detecting intrusions by analysing the sequences of commands that flow between hosts in a network for a particular service (e.g., an ftp session). First the system must be trained in order to learn the typical sequences of commands related to innocuous connections. Then, intrusion detection is performed by identifying anomalous sequences. To harden the proposed system, we propose some techniques to combine HMMs. Reported results attained on the traffic acquired from a European ISP show the effectiveness of the proposed approach.

- Intrusion Detection and Networks | Pp. 449-463
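
The abstract above scores command sequences against a Hidden Markov Model trained on innocuous connections. The sketch below illustrates only the scoring step, using the standard scaled forward algorithm to compute a sequence log-likelihood. The two-state model, its hand-set probabilities, and the command encoding are hypothetical stand-ins for a trained model, not values from the paper.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | HMM)."""
    if not obs:
        return 0.0
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transition matrix, then emit.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob


# Hypothetical encoding of FTP commands and a hand-set two-state model.
USER, PASS, RETR, QUIT = 0, 1, 2, 3
start = [1.0, 0.0]                      # sessions begin in the "login" state
trans = [[0.5, 0.5], [0.1, 0.9]]        # login -> transfer is likely to persist
emit = [
    [0.5, 0.45, 0.025, 0.025],          # login state mostly emits USER/PASS
    [0.025, 0.025, 0.6, 0.35],          # transfer state mostly emits RETR/QUIT
]

typical = [USER, PASS, RETR, RETR, QUIT]
unusual = [RETR, QUIT, USER, PASS, USER]
score_typical = log_likelihood(typical, start, trans, emit)
score_unusual = log_likelihood(unusual, start, trans, emit)
```

A sequence whose log-likelihood (possibly normalized by its length) falls below a chosen threshold would be reported as anomalous; how the threshold is set and how several HMMs are combined is left to the paper.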

doi: 10.1007/978-3-540-73499-4_35

FIDS: Monitoring Frequent Items over Distributed Data Streams

Robert Fuller; Mehmed Kantardzic

Many applications require the discovery of items which occur frequently within multiple distributed data streams. Past solutions for this problem either require a high degree of error tolerance or can only provide results periodically. In this paper we introduce a new algorithm designed for continuously tracking frequent items over distributed data streams providing either exact or approximate answers. We tested the efficiency of our method using two real-world data sets. The results indicated significant reduction in communication cost when compared to naïve approaches and an existing efficient algorithm called Top-K Monitoring. Since our method does not rely upon approximations to reduce communication overhead and is explicitly designed for tracking frequent items, our method also shows increased quality in its tracking results.

- Frequent and Common Item Set Mining | Pp. 464-478

doi: 10.1007/978-3-540-73499-4_36

Mining Maximal Frequent Itemsets in Data Streams Based on FP-Tree

Fujiang Ao; Yuejin Yan; Jian Huang; Kedi Huang

Mining maximal frequent itemsets in data streams is more difficult than mining them in static databases because data streams are huge, high-speed and continuous. In this paper, we propose a novel one-pass algorithm called FpMFI-DS, which mines all maximal frequent itemsets in Landmark windows or Sliding windows in data streams based on FP-Tree. A new structure of FP-Tree is designed for storing all transactions in Landmark windows or Sliding windows in data streams. To improve the efficiency of the algorithm, a new pruning technique, extension support equivalency pruning (ESEquivPS), is incorporated into it. The experiments show that our algorithm is efficient and scalable. It is suitable for mining MFIs both in static databases and in data streams.

- Frequent and Common Item Set Mining | Pp. 479-489
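
To make the term concrete: a frequent itemset is maximal when no proper superset of it is also frequent. The brute-force sketch below illustrates that definition on the contents of one small window. It is not FpMFI-DS and uses no FP-Tree; the function names and the sample window are hypothetical, and the enumeration is only practical for tiny inputs.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Enumerate all itemsets contained in at least `min_support` transactions."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            count = sum(1 for t in transactions if cset <= t)
            if count >= min_support:
                frequent.append(cset)
                found_any = True
        if not found_any:
            break  # no frequent itemset of this size, so none of a larger size
    return frequent


def maximal_frequent_itemsets(transactions, min_support):
    """Keep only frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_support)
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical contents of one window over a transaction stream.
window = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
mfis = maximal_frequent_itemsets(window, min_support=3)
```

Here every pair occurs in three transactions while the full triple occurs in only two, so the pairs are the maximal frequent itemsets and the single items are subsumed by them.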

doi: 10.1007/978-3-540-73499-4_37

CCIC: Consistent Common Itemsets Classifier

Yohji Shidara; Atsuyoshi Nakamura; Mineichi Kudo

We propose a novel approach which extracts consistent (100% confident) rules and builds a classifier with them. Recently, associative classifiers which utilize association rules have been widely studied. Indeed, the associative classifiers often outperform the traditional classifiers. In this case, it is important to collect high quality (association) rules. Many algorithms find only high support rules, because decreasing the minimum support to be satisfied is computationally demanding. However, it may be effective to collect low support but high confidence rules. Therefore, we propose an algorithm that produces a wide variety of 100% confident rules including low support rules. To achieve this goal, we adopt a specific-to-general rule searching strategy, in contrast to many previous approaches. Our experimental results show that the proposed method achieves higher accuracies on several datasets taken from the UCI machine learning repository.

- Frequent and Common Item Set Mining | Pp. 490-498
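
A rule is consistent, in the sense used in the abstract above, when every training row matching its antecedent carries the same class label, i.e. its confidence is 100%. The sketch below checks that property by plain enumeration of small antecedents. It does not reproduce the specific-to-general search of CCIC; the function name, the `max_len` parameter, and the toy rows are hypothetical.

```python
from itertools import combinations


def consistent_rules(rows, max_len=2):
    """Return (antecedent, label, support_count) for every 100%-confident rule.

    `rows` is a list of (set_of_items, class_label) pairs.
    """
    items = sorted(set().union(*(features for features, _ in rows)))
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            aset = frozenset(antecedent)
            labels = [label for features, label in rows if aset <= features]
            # Consistent: the antecedent matches some rows, all with one label.
            if labels and len(set(labels)) == 1:
                rules.append((aset, labels[0], len(labels)))
    return rules


# Hypothetical training rows encoded as attribute=value items.
rows = [
    ({"outlook=sunny", "windy=no"}, "play"),
    ({"outlook=sunny", "windy=yes"}, "stay"),
    ({"outlook=rain", "windy=yes"}, "stay"),
    ({"outlook=rain", "windy=no"}, "play"),
]
rules = consistent_rules(rows)
```

Rules such as `windy=no -> play` are kept even though each two-item rule here is supported by a single row, which mirrors the abstract's point that low-support rules can still be fully confident.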

doi: 10.1007/978-3-540-73499-4_38

Development of an Agreement Metric Based Upon the RAND Index for the Evaluation of Dimensionality Reduction Techniques, with Applications to Mapping Customer Data

Stephen France; Douglas Carroll

We develop a metric, based upon the RAND index, for the comparison and evaluation of dimensionality reduction techniques. This metric is designed to test the preservation of neighborhood structure in derived lower dimensional configurations. We use a customer information data set to show how the metric can be used to compare dimensionality reduction methods, tune method parameters, and choose solutions when methods have a local optimum problem. We show that the metric is highly negatively correlated with an alienation coefficient K that is designed to test the recovery of relative distances. In general, a method with a good value of the metric also has a good value of K. However, the monotonic regression used by Nonmetric MDS produces solutions with good values of the metric, but poor values of K.

- Mining Marketing Data | Pp. 499-517
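
For orientation, the plain Rand index that the metric above builds on measures the fraction of item pairs on which two partitions agree (both place the pair together, or both place it apart). The sketch below computes only this basic index; the authors' neighborhood-based agreement metric is not given in the abstract and is not reproduced here. The function name and label lists are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs treated the same way by two partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical groupings of four customers under two configurations.
same_grouping = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_grouping = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index ignores label names, so relabeled but identical groupings score 1.0, while the crossed grouping agrees on only two of the six pairs.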

doi: 10.1007/978-3-540-73499-4_39

A Sequential Hybrid Forecasting System for Demand Prediction

Luis Aburto; Richard Weber

Demand prediction plays a crucial role in advanced systems for supply chain management. Having a reliable estimation for a product’s future demand is the basis for the respective systems. Various forecasting techniques have been developed, each one with its particular advantages and disadvantages compared to other approaches. This motivated the development of hybrid systems combining different techniques and their respective advantages. Based on a comparison of ARIMA models and neural networks we propose to combine these approaches into a sequential hybrid forecasting system. In our system the output from an ARIMA-type model is used as input for a neural network which tries to reproduce the original time series. Applications to time series representing daily product sales in a supermarket underline the excellent performance of the proposed system.

- Mining Marketing Data | Pp. 518-532

doi: 10.1007/978-3-540-73499-4_40

A Unified View of Objective Interestingness Measures

Céline Hébert; Bruno Crémilleux

Association rule mining often results in an overwhelming number of rules. In practice, it is difficult for the final user to select the most relevant rules. In order to tackle this problem, various interestingness measures have been proposed. Nevertheless, the choice of an appropriate measure remains a hard task and the use of several measures may lead to conflicting information. In this paper, we give a unified view of objective interestingness measures. We define a new framework embedding a large set of measures called SBMs and we prove that the SBMs have a similar behavior. Furthermore, we identify the whole collection of the rules simultaneously optimizing all the SBMs. We provide an algorithm to efficiently mine a reduced set of rules among the rules optimizing all the SBMs. Experiments on real datasets highlight the characteristics of such rules.

- Mining Marketing Data | Pp. 533-547
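
As a small illustration of what an objective interestingness measure computes, the sketch below evaluates three widely used ones (support, confidence and lift) for a single rule X -> Y from raw transaction counts. The abstract does not list which measures the SBM framework covers, so their inclusion here is an assumption; the function name and the sample baskets are hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    n_x = sum(1 for t in transactions if antecedent <= t)
    n_y = sum(1 for t in transactions if consequent <= t)
    n_xy = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if n == 0 or n_x == 0 or n_y == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = n_xy / n_x
    return {
        "support": n_xy / n,            # P(X and Y)
        "confidence": confidence,       # P(Y | X)
        "lift": confidence / (n_y / n)  # P(Y | X) / P(Y)
    }


# Hypothetical market baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]
measures = rule_measures(baskets, {"bread"}, {"butter"})
```

All three values depend only on the four counts n, n(X), n(Y) and n(XY), which is why different measures can rank the same rules differently and why a unified treatment is useful.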