Publications catalog - books



Discovery Science: 10th International Conference, DS 2007 Sendai, Japan, October 1-4, 2007. Proceedings

Vincent Corruble ; Masayuki Takeda ; Einoshin Suzuki (eds.)

Conference: 10th International Conference on Discovery Science (DS). Sendai, Japan. October 1, 2007 - October 4, 2007

Abstract/description – provided by the publisher

Not available.

Keywords – provided by the publisher

Philosophy of Science; Artificial Intelligence (incl. Robotics); Database Management; Information Storage and Retrieval; Computer Appl. in Administrative Data Processing; Computer Appl. in Social and Behavioral Sciences

Availability

Detected institution: not detected
Year of publication: 2007
Browse: SpringerLink

Information

Resource type:

books

Printed ISBN

978-3-540-75487-9

Electronic ISBN

978-3-540-75488-6

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Learning Locally Weighted C4.4 for Class Probability Estimation

Liangxiao Jiang; Harry Zhang; Dianhong Wang; Zhihua Cai

In many real-world data mining applications, accurate class probability estimations are often required to make optimal decisions. For example, in direct marketing, we often need to deploy different promotion strategies to customers with different likelihoods (probabilities) of buying certain products. When the learning task is to build a model with accurate class probability estimations, C4.4 is among the most popular algorithms because of its efficiency and effectiveness. In this paper, we present a locally weighted version of C4.4 that improves its class probability estimation performance by combining locally weighted learning with C4.4. We call the improved algorithm locally weighted C4.4 (LWC4.4). We experimentally tested LWC4.4 on the 36 UCI data sets selected by Weka and compared it to related algorithms: C4.4, NB, KNN, NBTree, and LWNB. The experimental results show that LWC4.4 significantly outperforms all the other algorithms in terms of conditional log likelihood (CLL). Thus, our work provides an effective algorithm for producing accurate class probability estimations.
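
The locally weighted idea can be sketched as follows. This is a toy illustration, not the authors' LWC4.4: a distance-weighted class-frequency estimate (all names and data hypothetical) stands in for fitting a C4.4 tree on the locally weighted training sample.

```python
import math
from collections import defaultdict

def local_class_probs(train, x, k=3):
    """Estimate P(class | x) from the k nearest neighbours, weighting
    each neighbour by inverse distance. The weighted-frequency estimate
    stands in for fitting a C4.4 tree on the locally weighted sample."""
    dist = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    neighbours = sorted(train, key=lambda t: dist(t[0], x))[:k]
    weights = defaultdict(float)
    for feats, label in neighbours:
        weights[label] += 1.0 / (1e-9 + dist(feats, x))
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

train = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
probs = local_class_probs(train, (0.05, 0.05))   # class "a" dominates
```

The point of the local weighting is that the probability estimate adapts to the neighbourhood of the query instance rather than to the training set as a whole.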

- Long Papers | Pp. 104-115

User Preference Modeling from Positive Contents for Personalized Recommendation

Heung-Nam Kim; Inay Ha; Jin-Guk Jung; Geun-Sik Jo

With the spread of the Web, users can obtain a wide variety of information and access novel content in real time. In this environment, finding useful information in the huge amount of available content becomes a time-consuming process. In this paper, we focus on user modeling for personalization, recommending content relevant to user interests. Association-rule techniques for deriving user profiles are exploited to discover useful and meaningful user patterns. Each user preference is represented by frequent term patterns, collectively called PTP (Personalized Term Patterns), and by preference terms, called PT (Personalized Terms). In addition, a content-based filtering approach is employed to recommend content corresponding to user preferences. To evaluate the performance of the proposed method, we compare experimental results with those of a probabilistic learning model and a vector space model. The experimental evaluation on datasets demonstrates that the proposed method brings significant advantages in recommendation quality compared with the other methods.
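
A minimal sketch of the PT side of such a profile, assuming a simple document-frequency threshold in place of the paper's association-rule mining (all names and data hypothetical):

```python
from collections import Counter

def build_profile(liked_docs, min_df=2):
    """PT-style profile: terms occurring in at least `min_df` of the
    positively rated documents. Document frequency stands in for the
    association-rule mining of frequent term patterns."""
    df = Counter()
    for doc in liked_docs:
        df.update(set(doc.lower().split()))
    return {term for term, count in df.items() if count >= min_df}

def score(profile, doc):
    """Content-based filtering: overlap between the candidate's terms
    and the user profile, normalised by the candidate's size."""
    terms = set(doc.lower().split())
    return len(terms & profile) / len(terms) if terms else 0.0

liked = ["robot soccer learning", "robot motion learning", "soccer tactics"]
profile = build_profile(liked)      # {'robot', 'soccer', 'learning'}
ranked = sorted(["robot learning demo", "cooking pasta recipe"],
                key=lambda d: score(profile, d), reverse=True)
```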

- Long Papers | Pp. 116-126

Reducing Trials by Thinning-Out in Skill Discovery

Hayato Kobayashi; Kohei Hatano; Akira Ishino; Ayumi Shinohara

In this paper, we propose a new concept, thinning-out, for reducing the number of trials in skill discovery. Thinning-out means skipping trials that are unlikely to improve the discovery results, in the same way that pruning skips branches in a search tree. We show that our thinning-out technique significantly reduces the number of trials. In addition, we apply thinning-out to the discovery of good physical motions by legged robots in a simulation environment. By using thinning-out, our virtual robots can discover sophisticated motions that differ greatly from the initial motion in a reasonable number of trials.
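
The thinning-out idea can be sketched as follows, with an assumed Lipschitz bound standing in for the paper's estimate of which trials are unlikely to improve (objective, parameters, and candidates are all hypothetical):

```python
def f(params):
    """Stand-in for one costly trial (a hypothetical objective)."""
    return -sum((p - 0.7) ** 2 for p in params)

def dist(a, b):
    return max(abs(u - v) for u, v in zip(a, b))

def optimise(candidates, lipschitz=1.4):
    """Thinning-out: bound an untried candidate via its nearest already
    evaluated neighbour (Lipschitz assumption) and skip the trial when
    even this optimistic bound cannot beat the best score so far."""
    evaluated, best, trials = [], float("-inf"), 0
    for cand in candidates:
        if evaluated:
            d, s = min((dist(cand, p), s) for p, s in evaluated)
            if s + lipschitz * d <= best:    # cannot improve: thin out
                continue
        value = f(cand)                      # the expensive trial
        trials += 1
        evaluated.append((cand, value))
        best = max(best, value)
    return best, trials

best, trials = optimise([(0.7,), (0.0,), (0.05,), (1.0,), (0.95,)])
```

Here the two candidates lying next to already poor points are thinned out, so only three of the five trials actually run.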

- Long Papers | Pp. 127-138

A Theoretical Study on Variable Ordering of Zero-Suppressed BDDs for Representing Frequent Itemsets

Shin-ichi Minato

Recently, an efficient method of database analysis using Zero-suppressed Binary Decision Diagrams (ZBDDs) has been proposed. BDDs are a graph-based representation of Boolean functions, now widely used in system design and verification. Here we focus on ZBDDs, a special type of BDD suitable for handling large-scale combinatorial itemsets in frequent itemset mining. In general, it is well known that the size of a ZBDD greatly depends on its variable ordering; however, in the specific case of applying ZBDDs to data mining, the effect of variable ordering has not been studied well. In this paper, we present a theoretical study of ZBDD variable ordering for representing frequent itemsets. We show two constructed database instances whose ZBDD sizes are exponentially sensitive to the variable ordering. We also show that there is a case where the ZBDD size must be exponential under any variable ordering. Our theoretical results are helpful for developing good heuristic variable-ordering methods.
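
The order sensitivity discussed above can be reproduced at small scale with a minimal ZDD node counter (a sketch assuming the standard zero-suppression and node-sharing rules; the paired-item family is an illustrative example, not one of the paper's constructions):

```python
from itertools import chain, combinations

def zdd_size(family, order):
    """Count the unique nodes of the reduced ZDD representing the
    given set family under the given variable order."""
    nodes = {}
    def build(fam, i):
        if i == len(order):
            return "T" if fam else "F"          # terminal nodes
        v = order[i]
        hi = build(frozenset(s - {v} for s in fam if v in s), i + 1)
        lo = build(frozenset(s for s in fam if v not in s), i + 1)
        if hi == "F":                           # zero-suppression rule
            return lo
        key = (v, lo, hi)
        return nodes.setdefault(key, key)       # node sharing
    build(frozenset(frozenset(s) for s in family), 0)
    return len(nodes)

# Family of all sets {a_i, b_i : i in T}: items always chosen in pairs.
n = 3
fam = []
for T in chain.from_iterable(combinations(range(1, n + 1), k)
                             for k in range(n + 1)):
    s = set()
    for i in T:
        s.update({f"a{i}", f"b{i}"})
    fam.append(s)

paired = zdd_size(fam, ["a1", "b1", "a2", "b2", "a3", "b3"])
separated = zdd_size(fam, ["a1", "a2", "a3", "b1", "b2", "b3"])
```

For this family, interleaving the pairs yields 2n nodes, while separating the a's from the b's forces the diagram to remember which pairs were opened, inflating the node count.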

- Long Papers | Pp. 139-150

Fast NML Computation for Naive Bayes Models

Tommi Mononen; Petri Myllymäki

Minimum Description Length (MDL) is an information-theoretic principle that can be used for model selection and other statistical inference tasks. One way to implement this principle in practice is to compute the Normalized Maximum Likelihood (NML) distribution for a given parametric model class. Unfortunately, this is a computationally infeasible task for many model classes of practical importance. In this paper we present a fast algorithm for computing the NML for the Naive Bayes model class, which is frequently used in classification and clustering tasks. The algorithm is based on a relationship between powers of generating functions and discrete convolution. The resulting algorithm has the time complexity of , where n is the size of the data.
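
The relationship the abstract builds on, that raising a generating function to a power corresponds to repeated discrete convolution of its coefficient sequence, can be illustrated generically (this is not the paper's NML recurrence):

```python
def convolve(a, b):
    """Discrete convolution: coefficient list of the product of two
    polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def power(coeffs, k):
    """Coefficients of f(x)**k: a power of a generating function is a
    repeated discrete convolution of its coefficient sequence."""
    result = [1]
    for _ in range(k):
        result = convolve(result, coeffs)
    return result

cubed = power([1, 1], 3)   # (1 + x)**3 -> [1, 3, 3, 1]
```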

- Long Papers | Pp. 151-160

Unsupervised Spam Detection Based on String Alienness Measures

Kazuyuki Narisawa; Hideo Bannai; Kohei Hatano; Masayuki Takeda

We propose an unsupervised method for detecting spam documents from a given set of documents, based on equivalence relations on strings. We give three measures for quantifying the alienness of substrings within the documents, i.e., how different they are from the others. A document is then classified as spam if it contains a substring belonging to an equivalence class with a high degree of alienness. The proposed method is unsupervised, language independent, and scalable. Computational experiments on data collected from Japanese web forums show that the method successfully discovers spam.
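
A crude frequency-based stand-in for such an alienness measure (the fixed substring length, threshold, and corpus are all hypothetical simplifications of the paper's equivalence-class machinery):

```python
from collections import Counter

def substring_counts(docs, length=4):
    """Count every substring of a fixed length across the corpus."""
    counts = Counter()
    for doc in docs:
        for i in range(len(doc) - length + 1):
            counts[doc[i:i + length]] += 1
    return counts

def spam_like(docs, length=4, factor=4.0):
    """Flag documents containing a substring whose corpus frequency is
    more than `factor` times the mean substring frequency, a crude
    stand-in for the paper's alienness measures."""
    counts = substring_counts(docs, length)
    mean = sum(counts.values()) / len(counts)
    hot = {s for s, c in counts.items() if c > factor * mean}
    return [doc for doc in docs
            if any(doc[i:i + length] in hot
                   for i in range(len(doc) - length + 1))]

docs = ["the weather in sendai is mild today",
        "discovery science proceedings",
        "robots learn motions",
        "BUYNOW!" * 20]          # repetitive spam-like document
flagged = spam_like(docs)
```

The heavily repeated substrings of the last document stand out against the rest of the corpus, so only that document is flagged.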

- Long Papers | Pp. 161-172

A Consequence Finding Approach for Full Clausal Abduction

Oliver Ray; Katsumi Inoue

Abductive inference has long been associated with the logic of scientific discovery and automated abduction is now being used in real scientific tasks. But few methods can exploit the full potential of clausal logic and abduce non-ground explanations with indefinite answers. This paper shows how the consequence finding method of Skip Ordered Linear (SOL) resolution can overcome the limitations of existing systems by proposing a method that is sound and complete for finding minimal abductive solutions under a variety of pruning mechanisms. Its utility is shown with an example based on metabolic network modelling.
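
The flavour of minimal abductive explanation finding can be sketched in a propositional Horn setting; this brute-force stand-in (rules and atoms hypothetical) is far weaker than SOL resolution, which handles full clausal logic and non-ground answers:

```python
from itertools import combinations

def closure(facts, rules):
    """Forward-chain propositional Horn rules (head, body) to a fixpoint."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in facts and body <= facts:
                facts.add(head)
                changed = True
    return facts

def abduce(goal, rules, abducibles):
    """Smallest sets of abducible atoms that make the goal derivable,
    found by brute force over candidate subsets of increasing size."""
    for k in range(len(abducibles) + 1):
        found = [set(c) for c in combinations(sorted(abducibles), k)
                 if goal in closure(c, rules)]
        if found:
            return found
    return []

rules = [("wet", frozenset({"rain"})),
         ("wet", frozenset({"sprinkler"})),
         ("slippery", frozenset({"wet"}))]
explanations = abduce("slippery", rules, {"rain", "sprinkler"})
```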

- Long Papers | Pp. 173-184

Literature-Based Discovery by an Enhanced Information Retrieval Model

Kazuhiro Seki; Javed Mostafa

The massive, ever-growing literature in the life sciences makes it increasingly difficult for individuals to grasp all the information relevant to their interests. Since even experts' knowledge is likely to be incomplete, important findings or associations among key concepts may remain unnoticed in the flood of information. This paper adapts and extends a formal model from information retrieval to discover such implicit, hidden knowledge. Focusing on the biomedical domain, specifically gene-disease associations, this paper demonstrates that the proposed model can identify not-yet-reported genetic associations and that the model can be enhanced by an existing domain ontology.
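
The hidden-association idea can be sketched in the classic Swanson ABC style, a simplified stand-in for the paper's IR model (all terms and documents hypothetical):

```python
from itertools import combinations

def cooccurring_pairs(docs):
    """All unordered term pairs that appear together in some document."""
    pairs = set()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        pairs.update(frozenset(p) for p in combinations(terms, 2))
    return pairs

def hidden_links(docs, a):
    """Swanson-style ABC discovery: terms C that never co-occur with
    term A but are linked to A through a shared intermediate term B."""
    seen = cooccurring_pairs(docs)
    vocab = {t for doc in docs for t in doc.lower().split()}
    direct = {t for t in vocab if frozenset((a, t)) in seen}
    hidden = set()
    for b in direct:
        for c in vocab - direct - {a}:
            if frozenset((b, c)) in seen:
                hidden.add(c)
    return hidden

docs = ["genex diseasey", "diseasey drugz"]
candidates = hidden_links(docs, "genex")   # drugz never co-occurs with genex
```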

- Long Papers | Pp. 185-196

Discovering Mentorship Information from Author Collaboration Networks

V. Suresh; Narayanan Raghupathy; B. Shekar; C. E. Veni Madhavan

Researchers are typically assessed by quantifying their contribution to the field; citation and publication counts are typical examples. We propose a measure for assessing researchers on their mentoring abilities. Our approach quantifies the benefits bestowed by researchers upon their students by characterizing the publication dynamics of research advisor-student interactions in author collaboration networks. We show that our measures could help aspiring students identify research advisors with proven mentoring skills. Our measures also help stratify researchers who hold similar ranks under typical indices such as publication and citation counts, while being independent of their direct influences.

- Long Papers | Pp. 197-208

Active Contours as Knowledge Discovery Methods

Arkadiusz Tomczyk; Piotr S. Szczepaniak; Michal Pryczek

In this paper we show that active contour methods can be interpreted as knowledge discovery methods. The application area is not restricted to image segmentation; it also covers the classification of other objects, even objects of higher granularity. An additional strength of the presented method is that expert knowledge of almost any type can be used in classifier construction, which is not always possible with classic techniques. Moreover, the method introduced by the authors, previously used only for supervised classification, is here applied in an unsupervised setting (clustering) and examined on examples.

- Long Papers | Pp. 209-218