Publications catalog - books


Machine Learning and Data Mining in Pattern Recognition: 5th International Conference, MLDM 2007, Leipzig, Germany, July 18-20, 2007. Proceedings

Petra Perner (ed.)

In conference: 5th International Workshop on Machine Learning and Data Mining in Pattern Recognition (MLDM). Leipzig, Germany. July 18, 2007 - July 20, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Artificial Intelligence (incl. Robotics); Mathematical Logic and Formal Languages; Database Management; Data Mining and Knowledge Discovery; Pattern Recognition; Image Processing and Computer Vision

Availability

Detected institution	Publication year	Browse	Download	Request
Not detected	2007	SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-73498-7

Electronic ISBN

978-3-540-73499-4

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2007

Publication rights information

Subject coverage

Computer and information sciences

Electrical, electronic and information engineering

Languages and literature

Table of contents

Check that your institution gives you access to download or request the complete book or any of its chapters.

doi: 10.1007/978-3-540-73499-4_21

Choosing the Kernel Parameters for the Directed Acyclic Graph Support Vector Machines

Kuo-Ping Wu; Sheng-De Wang

Directed acyclic graph support vector machines (DAGSVMs) have been shown to provide classification accuracy comparable to standard multiclass SVM extensions such as Max Wins methods. The algorithm arranges binary SVM classifiers as the internal nodes of a directed acyclic graph (DAG). Each node represents a classifier trained on the data of a pair of classes with a specific kernel. The most popular method for choosing the kernel parameters is grid search. In the training process, classifiers are trained with different kernel parameters, and only one of the classifiers is required for the testing process. This makes the training process time-consuming. In this paper we propose using separation indexes to estimate the generalization ability of the classifiers. These indexes are derived from the inter-cluster distances in the feature spaces. Calculating such indexes costs much less computation time than training the corresponding SVM classifiers; thus the proper kernel parameters can be chosen much faster. Experimental results show that the testing accuracy of the resulting DAGSVMs is competitive with that of the standard ones, and the training time can be significantly shortened.

- Support Vector Machine | Pp. 276-285
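
The idea of scoring kernel parameters without training an SVM can be illustrated with a minimal sketch. It assumes an RBF kernel and uses the squared distance between the two class means in the kernel feature space as the separation index; this particular index, and the names `center_distance_sq` and `pick_gamma`, are illustrative assumptions and not necessarily the indexes defined in the paper.

```python
import math

def rbf(x, y, gamma):
    """Gaussian (RBF) kernel value for two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs drawn from xs and ys."""
    return sum(rbf(x, y, gamma) for x in xs for y in ys) / (len(xs) * len(ys))

def center_distance_sq(a, b, gamma):
    """Squared distance between the two class means in the kernel feature space.

    Expanding ||m_a - m_b||^2 gives three averaged kernel terms, so no
    explicit feature map and no SVM training is needed.
    """
    return (mean_kernel(a, a, gamma)
            + mean_kernel(b, b, gamma)
            - 2 * mean_kernel(a, b, gamma))

def pick_gamma(a, b, candidates):
    """Hypothetical selection rule: keep the gamma with the largest index."""
    return max(candidates, key=lambda g: center_distance_sq(a, b, g))

# Toy data for one pair of classes (made up for illustration).
class_a = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)]
class_b = [(2.0, 2.0), (2.1, 1.8), (1.9, 2.2)]
best = pick_gamma(class_a, class_b, [0.01, 0.1, 1.0, 10.0])
print(best)
```

In a DAGSVM, a rule of this kind would be applied once per class pair, so each node could receive its own kernel parameter before any classifier is trained.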

doi: 10.1007/978-3-540-73499-4_22

Data Selection Using SASH Trees for Support Vector Machines

Chaofan Sun; Ricardo Vilalta

This paper presents a data preprocessing procedure to select support vector (SV) candidates. We select decision boundary region vectors (BRVs) as SV candidates. Without the need to use the decision boundary, BRVs can be selected based on a vector’s nearest neighbor of opposite class (NNO). To speed up the process, two spatial approximation sample hierarchical (SASH) trees are used for estimating the BRVs. Empirical results show that our data selection procedure can reduce a full dataset to the number of SVs or only slightly higher. Training with the selected subset gives performance comparable to that of the full dataset. For large datasets, overall time spent in selecting and training on the smaller dataset is significantly lower than the time used in training on the full dataset.

- Support Vector Machine | Pp. 286-295
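
One simple reading of selection by nearest neighbor of opposite class (NNO) is sketched below. It keeps every vector that is the NNO of at least one vector from the other class, and it uses a brute-force search in place of the SASH trees mentioned in the abstract; the function names and the exact selection criterion are assumptions made for illustration.

```python
import math

def nearest_opposite(i, points, labels):
    """Index of the closest point whose label differs from that of point i."""
    candidates = [j for j in range(len(points)) if labels[j] != labels[i]]
    return min(candidates, key=lambda j: math.dist(points[i], points[j]))

def boundary_candidates(points, labels):
    """Indices of points that are the nearest opposite-class neighbor of some point."""
    return sorted({nearest_opposite(i, points, labels) for i in range(len(points))})

# Toy one-dimensional data: class 0 on the left, class 1 on the right.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
labels = [0, 0, 0, 1, 1, 1]
selected = boundary_candidates(points, labels)
print(selected)
```

The brute-force search is quadratic in the number of vectors, which is the cost an approximate structure such as a SASH tree would be meant to reduce.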

doi: 10.1007/978-3-540-73499-4_23

Dynamic Distance-Based Active Learning with SVM

Jun Jiang; Horace H. S. Ip

In this paper, we present a novel active learning strategy, named dynamic active learning with SVM, to improve the effectiveness of sample selection in active learning. The algorithm is divided into two steps. The first step is similar to the standard distance-based active learning with SVM [1], in which the sample nearest to the decision boundary is chosen to induce a hyperplane that can halve the current version space. To improve learning efficiency and the convergence rate, we propose in the second step a dynamic sample selection strategy that operates within the neighborhood of the “standard” sample. Theoretical analysis is given to show that our algorithm converges faster than the standard distance-based technique and uses fewer samples while maintaining the same classification precision rate. We also demonstrate the feasibility of the dynamic selection strategy through experiments on several benchmark datasets.

- Support Vector Machine | Pp. 296-309
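
The first step described above, standard distance-based selection, can be sketched for a linear decision function. The weights `w`, bias `b`, and the unlabeled pool are made-up values, and the dynamic second step proposed by the authors is not reproduced here.

```python
def margin_distance(w, b, x):
    """Unsigned decision value |w.x + b|, proportional to the distance from the hyperplane."""
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b)

def pick_query(w, b, pool):
    """Index of the unlabeled sample closest to the current decision boundary."""
    return min(range(len(pool)), key=lambda i: margin_distance(w, b, pool[i]))

# Hypothetical current hyperplane and unlabeled pool.
w, b = (1.0, -1.0), 0.0
pool = [(3.0, 0.0), (1.0, 0.9), (0.0, 2.5)]
idx = pick_query(w, b, pool)
print(idx, pool[idx])
```

In an active learning loop, the selected sample would be labeled, added to the training set, and the SVM retrained before the next query.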

doi: 10.1007/978-3-540-73499-4_24

Off-Line Learning with Transductive Confidence Machines: An Empirical Evaluation

Stijn Vanderlooy; Laurens van der Maaten; Ida Sprinkhuizen-Kuyper

The recently introduced transductive confidence machines (TCMs) framework allows classifiers to be extended such that they satisfy the calibration property. This means that the error rate can be set by the user prior to classification. An analytical proof of the calibration property was given for TCMs applied in the on-line learning setting. However, the nature of this learning setting restricts the applicability of TCMs. In this paper we provide strong empirical evidence that the calibration property also holds in the off-line learning setting. Our results extend the range of applications in which TCMs can be applied. We may conclude that TCMs are appropriate in virtually any application domain.

- Transductive Inference | Pp. 310-323
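
How a user-chosen error rate turns into predictions can be sketched with the usual p-value construction of confidence machines. The sketch assumes nonconformity scores are already available for each hypothesized label; in a full TCM they would be recomputed with the new example added to the training set. The score values and label names are invented.

```python
def p_value(train_scores, new_score):
    """Fraction of scores (the new one included) at least as nonconforming as new_score."""
    at_least = sum(1 for s in train_scores if s >= new_score) + 1
    return at_least / (len(train_scores) + 1)

def prediction_set(scores_by_label, new_score_by_label, epsilon):
    """Labels whose p-value exceeds the significance level epsilon."""
    return {
        label
        for label, scores in scores_by_label.items()
        if p_value(scores, new_score_by_label[label]) > epsilon
    }

# Hypothetical nonconformity scores for two candidate labels.
scores_by_label = {"spam": [0.1, 0.4, 0.2, 0.9], "ham": [0.3, 0.2, 0.5, 0.1]}
new_score_by_label = {"spam": 0.3, "ham": 2.0}
result = prediction_set(scores_by_label, new_score_by_label, epsilon=0.25)
print(result)
```

Calibration means that, over many examples, the true label falls outside the returned set with a frequency of at most `epsilon`.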

doi: 10.1007/978-3-540-73499-4_25

Transductive Learning from Relational Data

Michelangelo Ceci; Annalisa Appice; Nicola Barile; Donato Malerba

Transduction is an inference mechanism “from particular to particular”. Its application to classification tasks implies the use of both labeled (training) data and unlabeled (working) data to build a classifier whose main goal is to classify (only) the unlabeled data as accurately as possible. Unlike the classical inductive setting, no general rule valid for all possible instances is generated. Transductive learning is best suited to applications where the examples for which a prediction is needed are already known when training the classifier. Several approaches have been proposed in the literature for building transductive classifiers from data stored in a single table of a relational database. Nonetheless, no attention has been paid to the application of the transduction principle in a (multi-)relational setting, where data are stored in multiple tables of a relational database. In this paper we propose a new transductive classifier, named TRANSC, which is based on a probabilistic approach to making transductive inferences from relational data. This new method works in a transductive setting and employs a principled probabilistic classification in multi-relational data mining to face the challenges posed by some spatial data mining problems. Probabilistic inference allows us to compute the class probability and return, in addition to the result of transductive classification, the confidence in the classification. The predictive accuracy of TRANSC has been compared to that of its inductive counterpart in an empirical study involving both a benchmark relational dataset and two spatial datasets. The results obtained are generally in favor of TRANSC, although the improvements are small.

- Transductive Inference | Pp. 324-338

doi: 10.1007/978-3-540-73499-4_26

A Novel Rule Ordering Approach in Classification Association Rule Mining

Yanbo J. Wang; Qin Xin; Frans Coenen

A Classification Association Rule (CAR), a common type of mined knowledge in Data Mining, describes an implicative co-occurring relationship between a set of binary-valued data-attributes (items) and a pre-defined class, expressed in the form of an “antecedent ⇒ consequent-class” rule. Classification Association Rule Mining (CARM) is a recent Classification Rule Mining (CRM) approach that builds an Association Rule Mining (ARM) based classifier using CARs. Regardless of which particular methodology is used to build it, a classifier is usually presented as an ordered CAR list, based on an applied rule ordering strategy. Five existing rule ordering mechanisms can be identified: (1) Confidence-Support-size_of_Antecedent (CSA), (2) size_of_Antecedent-Confidence-Support (ACS), (3) Weighted Relative Accuracy (WRA), (4) Laplace Accuracy, and (5) χ² Testing. In this paper, we divide the above mechanisms into two groups: (i) pure “support-confidence” framework-like mechanisms, and (ii) additive score-assigning mechanisms. We then propose a hybrid rule ordering approach that combines one approach taken from (i) with another taken from (ii). The experimental results show that the proposed rule ordering approach performs well with respect to classification accuracy.

- Association Rule Mining | Pp. 339-348
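
An ordered CAR list and the CSA mechanism named above can be sketched as follows. The sketch assumes the common convention of sorting by descending confidence, then descending support, then ascending antecedent size, and of classifying with the first matching rule; the rules and their statistics are invented, and the hybrid ordering proposed by the authors is not reproduced.

```python
from typing import FrozenSet, NamedTuple

class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    label: str
    confidence: float
    support: float

def csa_order(rules):
    """Sort by confidence (desc), then support (desc), then antecedent size (asc)."""
    return sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent)))

def classify(ordered_rules, items, default):
    """Return the class of the first rule whose antecedent is contained in items."""
    for rule in ordered_rules:
        if rule.antecedent <= items:
            return rule.label
    return default

# Hypothetical rules with made-up confidence and support values.
rules = [
    Rule(frozenset({"a", "b"}), "x", 0.9, 0.2),
    Rule(frozenset({"a"}), "y", 0.9, 0.3),
    Rule(frozenset({"c"}), "x", 0.8, 0.5),
]
ordered = csa_order(rules)
print([r.label for r in ordered])
print(classify(ordered, {"a", "b"}, default="x"))
```

Because the first matching rule decides the class, two classifiers built from the same rule set can disagree purely because of the ordering strategy, which is why the choice of ordering matters.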

doi: 10.1007/978-3-540-73499-4_27

Distributed and Shared Memory Algorithm for Parallel Mining of Association Rules

J. Hernández Palancar; O. Fraxedas Tormo; J. Festón Cárdenas; R. Hernández León

The search for frequent patterns in transactional databases is considered one of the most important data mining problems. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Almost all of these algorithms make repeated passes over the dataset to determine the set of frequent itemsets, thus implying high I/O overhead. In the parallel case, most algorithms perform a sum-reduction at the end of each pass to construct the global counts, which also implies a high synchronization cost. We present a novel algorithm that efficiently exploits the trade-offs between computation, communication, memory usage and synchronization. The algorithm was implemented on a cluster of SMP nodes, combining distributed and shared memory paradigms. This paper presents the results of our algorithm on different data sizes and different numbers of processors, and studies the effect of these variations on the overall performance.

- Association Rule Mining | Pp. 349-363
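
The per-pass sum-reduction that the abstract attributes to most parallel algorithms can be sketched sequentially. Each partition stands in for the data held by one node, and the merge of the local counters plays the role of the reduction; the partitioning and function names are assumptions, and this is the baseline pattern rather than the authors' algorithm.

```python
from collections import Counter
from itertools import combinations

def local_counts(transactions, k):
    """Count every k-itemset occurring in one partition's transactions."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), k))
    return counts

def global_frequent(partitions, k, min_count):
    """Sum-reduce the local counts and keep itemsets meeting the threshold."""
    total = Counter()
    for part in partitions:
        total += local_counts(part, k)  # stands in for the inter-node reduction
    return {itemset: c for itemset, c in total.items() if c >= min_count}

# Two hypothetical partitions of a small transactional database.
partitions = [
    [{"a", "b"}, {"a", "c"}],
    [{"a", "b", "c"}, {"a", "b"}],
]
frequent = global_frequent(partitions, k=2, min_count=2)
print(frequent)
```

In a real cluster, the merge step forces every node to wait for the slowest one at the end of each pass, which is the synchronization cost the abstract refers to.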

doi: 10.1007/978-3-540-73499-4_28

Analyzing the Performance of Spam Filtering Methods When Dimensionality of Input Vector Changes

J. R. Méndez; B. Corzo; D. Glez-Peña; F. Fdez-Riverola; F. Díaz

Spam is a complex problem that hinders the exploitation of Internet resources. Several authorities have warned about the scale of this problem and urged everybody to fight against it. In this paper we present an extensive analysis showing how changing the dimensionality of the message representation influences the accuracy of some well-known classical spam filtering techniques. The conclusions drawn from the experiments carried out will be useful for building a comparison of the dimensionality reorganization effects between classical filtering techniques and a successful spam filter model called .

- Mining Spam, Newsgroups, Blogs | Pp. 364-378

doi: 10.1007/978-3-540-73499-4_29

Blog Mining for the Fortune 500

James Geller; Sapankumar Parikh; Sriram Krishnan

In recent years there has been a tremendous increase in the number of users maintaining online blogs on the Internet. Companies, in particular, have become aware of this medium of communication and have taken a keen interest in what is being said about them through such personal blogs. This has given rise to a new field of research directed towards mining useful information from a large amount of unformatted data present in online blogs and online forums. We discuss an implementation of such a blog mining application. The application is broadly divided into two parts, the indexing process and the search module. Blogs pertaining to different organizations are fetched from a particular blog domain on the Internet. After analyzing the textual content of these blogs they are assigned a sentiment rating. Specific data from such blogs along with their sentiment ratings are then indexed on the physical hard drive. The search module searches through these indexes at run time for the input organization name and produces a list of blogs conveying both positive and negative sentiments about the organization.

- Mining Spam, Newsgroups, Blogs | Pp. 379-391

doi: 10.1007/978-3-540-73499-4_30

A Link-Based Rank of Postings in Newsgroup

Hongbo Liu; Jiahai Yang; Jiaxin Wang; Yu Zhang

Discussion systems such as Usenet, BBSs, and forums are important resources for information sharing, exchange of views, problem solving, and product feedback on the Internet. The postings in newsgroups on Usenet represent the judgments and choices of participants. The structure of postings could provide helpful information for the users. In this paper, we present a method called PostRank to rank postings based on the structure of the newsgroup. Its results correspond to the eigenvectors of the transition probability matrix and the stationary vectors of the Markov chains. It could provide useful global information about the newsgroup, and it can be used to help users access information in it more effectively and efficiently. This method can also be applied to other discussion systems. Some experimental results and discussions on real data sets collected by us are also provided.

- Mining Spam, Newsgroups, Blogs | Pp. 392-403
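
The link between a ranking and the stationary vector of a Markov chain can be illustrated with plain power iteration. The transition matrix below is made up; how PostRank builds its matrix from the posting structure is not stated in the abstract, so this only shows the generic computation.

```python
def stationary(P, iterations=200):
    """Approximate the stationary row vector v with v = vP by power iteration.

    P is a row-stochastic matrix given as a list of rows.
    """
    n = len(P)
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
    return v

# Hypothetical transition probabilities between three postings.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
scores = stationary(P)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(scores, ranking)
```

The stationary vector is the left eigenvector of the transition matrix for eigenvalue 1, so postings with higher stationary probability would rank higher.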