Publications catalog - books



Discovery Science: 10th International Conference, DS 2007 Sendai, Japan, October 1-4, 2007. Proceedings

Vincent Corruble ; Masayuki Takeda ; Einoshin Suzuki (eds.)

Conference: 10th International Conference on Discovery Science (DS). Sendai, Japan. October 1, 2007 - October 4, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Philosophy of Science; Artificial Intelligence (incl. Robotics); Database Management; Information Storage and Retrieval; Computer Appl. in Administrative Data Processing; Computer Appl. in Social and Behavioral Sciences

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2007 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-75487-9

Electronic ISBN

978-3-540-75488-6

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

An Efficient Polynomial Delay Algorithm for Pseudo Frequent Itemset Mining

Takeaki Uno; Hiroki Arimura

Mining frequently appearing patterns in a database is a basic problem in informatics, especially in data mining. In particular, when the input database is a collection of subsets of an itemset, the problem is called the frequent itemset mining problem, and it has been extensively studied. In real-world use, one of the difficulties of frequent itemset mining is that real-world data is often incorrect or incomplete. As a result, some records that should include a pattern do not contain it. To deal with real-world problems, one can use an ambiguous inclusion relation and find patterns that are mostly included in many records. However, computational difficulty has prevented such approaches from being actively used in practice. In this paper, we use an alternative inclusion relation in which we consider an itemset P to be included in an itemset T if at most k items of P are not included in T, i.e., |P ∖ T| ≤ k. We address the problem of enumerating frequent itemsets under this inclusion relation and propose an efficient polynomial delay, polynomial space algorithm. Moreover, to enable us to skip many small, non-valuable frequent itemsets, we propose an algorithm for directly enumerating frequent itemsets of a certain size.
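
The relaxed inclusion relation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' polynomial delay enumeration algorithm; it only counts support for one given pattern. The names `pseudo_includes` and `pseudo_support`, the symbol `k` for the number of tolerated missing items, and the toy database are assumptions made for illustration.

```python
def pseudo_includes(record, pattern, k):
    """Return True if at most k items of `pattern` are missing from `record`."""
    return len(set(pattern) - set(record)) <= k


def pseudo_support(database, pattern, k):
    """Count the records that include `pattern` under the relaxed relation."""
    return sum(1 for record in database if pseudo_includes(record, pattern, k))


# Hypothetical toy database: each record is a set of items.
database = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c", "d"},
    {"b", "d"},
]
pattern = {"a", "b", "c"}

exact = pseudo_support(database, pattern, k=0)    # ordinary inclusion
relaxed = pseudo_support(database, pattern, k=1)  # one missing item tolerated
print(exact, relaxed)
```

With `k=0` the relation reduces to ordinary set inclusion, so only the first record supports the pattern; with `k=1` the second and third records also count.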

- Long Papers | Pp. 219-230

Discovering Implicit Feedbacks from Search Engine Log Files

Ashok Veilumuthu; Parthasarathy Ramachandran

A number of explicit and implicit feedback mechanisms have been proposed to improve the quality of search engine results. Current approaches to information retrieval depend heavily on the web linkage structure, which is a form of relevance judgment by the page authors. However, to overcome spamming attempts and the huge volumes of data, it is important to also incorporate user feedback on the relevance of a document. Since users hardly ever give explicit, direct feedback on search quality, it becomes necessary to consider implicit feedback that can be collected from search engine logs. In this article we evaluate two implicit feedback measures, namely click sequence and time spent reading a document. We develop a mathematical programming model to collate the feedback collected from different sessions into a partial rank ordering of documents. The two measures are compared for their feedback information content using Kendall’s τ measure. Experimental results based on actual log data demonstrate that these two relevance judgment measures are not in perfect agreement, and hence incremental information can be derived from them.
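
As a rough illustration of how two rank orderings of documents can be compared, the sketch below computes Kendall's rank correlation (the tau-a variant, assuming no ties) in plain Python. The function name `kendall_tau` and the two example rankings are hypothetical; they are not taken from the paper or its log data.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings given as {doc: position} dicts.

    Assumes there are no tied positions within either ranking.
    """
    docs = sorted(set(rank_a) & set(rank_b))
    concordant = discordant = 0
    for x, y in combinations(docs, 2):
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(docs) * (len(docs) - 1) // 2
    return (concordant - discordant) / pairs if pairs else 0.0


# Hypothetical rankings of four documents from two feedback measures.
click_order = {"d1": 1, "d2": 2, "d3": 3, "d4": 4}
reading_time_order = {"d1": 2, "d2": 1, "d3": 3, "d4": 4}

tau = kendall_tau(click_order, reading_time_order)
print(round(tau, 3))
```

A value of 1 means the two orderings agree on every pair of documents; values below 1 indicate that the measures disagree on some pairs, which is the situation the abstract reports.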

- Long Papers | Pp. 231-242

Pharmacophore Knowledge Refinement Method in the Chemical Structure Space

Satoshi Fujishima; Yoshimasa Takahashi; Takashi Okada

Studies on the structure–activity relationship of drugs essentially require a relational learning scheme in order to extract meaningful chemical subgraphs; however, most relational learning systems suffer from a vast search space. On the other hand, some propositional logic mining methods use the presence or absence of chemical fragments as features, but rules so obtained give only crude knowledge about part of the pharmacophore structure. This paper proposes a knowledge refinement method in the chemical structure space for the latter approach. A simple hill-climbing approach was shown to be very useful if the seed fragment contains the essential characteristic of the pharmacophore. An application to the analysis of dopamine D1 agonists is discussed as an illustrative example.

- Regular Papers | Pp. 243-247

An Attempt to Rebuild C. Bernard’s Scientific Steps

Jean-Gabriel Ganascia; Bassel Habib

Our aim is to reconstruct Claude Bernard’s empirical investigations with a computational model. We suppose that he had in mind what we call “kernel models” that provide simplified views of physiology, which allowed him to make hypotheses and to draw out their logical consequences. We show how those “kernel models” can be specified using both description logics and multi-agent systems. Then, the paper will explain how it is possible to build a virtual experiment laboratory, which lets us construct and conduct virtual experiments.

- Regular Papers | Pp. 248-252

Semantic Annotation of Data Tables Using a Domain Ontology

Gaëlle Hignette; Patrice Buche; Juliette Dibie-Barthélemy; Ollivier Haemmerlé

In this paper, we show the different steps of an annotation process that allows one to annotate data tables with the relations of a domain ontology. The columns of a table are first segregated according to whether they represent numeric or symbolic data. Then, we annotate the numeric columns with their corresponding numeric type, and the symbolic columns with their corresponding symbolic type, combining different pieces of evidence from the ontology. The relations represented by a table are recognized using both the table title and the types of the columns. We give experimental results for our annotation method.

- Regular Papers | Pp. 253-258

Model Selection and Estimation Via Subjective User Preferences

Jaakko Hollmén

Subjective opinions of domain experts are often encountered in data analysis projects. Often, it is difficult to express the experts’ opinions in model form or integrate their professional knowledge in the analysis. In this paper, we approach the problem directly in the context of model selection and estimation: we ask the expert for subjective preferences between readily computed model solutions, and compute an optimal solution based on the recorded opinions. We consider the pre-computed models as graph nodes, and calculate the preferential relations between the nodes based on the recorded opinions as conditional probabilities. Using a random surfer model from the Web analysis community, we compute the stationary distribution of the preferences. The stationary distribution can be used in model selection by selecting the most probable model or in model estimation by averaging over the models according to their posterior probabilities. We present a real-life application in a regression problem of tree-ring width series data.
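
The random surfer idea mentioned in this abstract can be sketched with a simple power iteration. This is only a sketch under stated assumptions: the preference matrix, the damping factor of 0.85 (borrowed from common PageRank practice), and the names `stationary_distribution`, `best_model`, and `averaged_prediction` are all illustrative, not the author's formulation.

```python
def stationary_distribution(transition, damping=0.85, iterations=200):
    """Power iteration for a random surfer over a row-stochastic matrix.

    transition[i][j] is the assumed probability of moving from model i to
    model j; every row must sum to 1.
    """
    n = len(transition)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n  # random jump to any model
        for i in range(n):
            for j in range(n):
                new[j] += damping * dist[i] * transition[i][j]
        dist = new
    return dist


# Hypothetical preference-derived transitions between three models.
transition = [
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.6, 0.3],
]
dist = stationary_distribution(transition)

# Model selection: pick the most probable model.
best_model = max(range(len(dist)), key=dist.__getitem__)

# Model estimation: average hypothetical per-model predictions.
predictions = [1.0, 2.0, 3.0]
averaged_prediction = sum(p * w for p, w in zip(predictions, dist))
print(best_model, round(averaged_prediction, 3))
```

The same distribution serves both uses named in the abstract: taking its maximum selects a single model, while using it as weights averages over all models.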

- Regular Papers | Pp. 259-263

Detecting Concept Drift Using Statistical Testing

Kyosuke Nishida; Koichiro Yamauchi

Detecting concept drift is important for dealing with real-world online learning problems. To detect concept drift from a small number of examples, methods have been developed that use an online classifier and monitor its prediction errors during learning. We have developed such a detection method that uses a statistical test of equal proportions. Experimental results showed that our method performed well in detecting concept drift in five synthetic datasets that contained various types of concept drift.
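
To make the idea of monitoring prediction errors with a test of equal proportions concrete, here is a minimal sketch using a generic pooled two-proportion z-test. It is not claimed to be the exact statistic used in the paper. The function names, the split into "older" and "recent" examples, and the significance level `alpha=0.01` are assumptions.

```python
import math


def equal_proportions_p_value(errors_old, n_old, errors_recent, n_recent):
    """One-sided p-value that the recent error rate exceeds the older one.

    Uses a pooled two-proportion z-test with a normal approximation.
    """
    pooled = (errors_old + errors_recent) / (n_old + n_recent)
    variance = pooled * (1.0 - pooled) * (1.0 / n_old + 1.0 / n_recent)
    if variance == 0.0:
        return 1.0  # no errors at all, or only errors: nothing to compare
    z = (errors_recent / n_recent - errors_old / n_old) / math.sqrt(variance)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z), standard normal


def drift_detected(errors_old, n_old, errors_recent, n_recent, alpha=0.01):
    """Flag drift when the recent error rate is significantly higher."""
    p = equal_proportions_p_value(errors_old, n_old, errors_recent, n_recent)
    return p < alpha


# Hypothetical counts: same error rate, then a sharp rise in recent errors.
p_stable = equal_proportions_p_value(20, 200, 3, 30)
p_drift = equal_proportions_p_value(20, 200, 15, 30)
print(p_stable, p_drift < 0.01)
```

When the older and recent error rates are equal the p-value is 0.5 and no drift is flagged; a jump from 10% to 50% errors in the recent window yields a very small p-value.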

- Regular Papers | Pp. 264-269

Towards Future Technology Projection: A Method for Extracting Capability Phrases from Documents

Risa Nishiyama; Hironori Takeuchi; Hideo Watanabe

This paper deals with novel approaches for discovering phrases expressing technical capabilities in technical literature (such as patents), intended to support strategic consultants introducing new technologies and their capabilities to their clients. An extracted capability phrase is scored based on its expected business impact, which can also be considered as unexpectedness of the capability in a specified technology field. The proposed capability extraction method and unexpectedness estimation method are implemented in a “Future Technology Projection tool.” The tool will be utilized by the consultants to provide lists of capability phrases related to a technology field of interest to the consultants.

- Regular Papers | Pp. 270-274

Efficient Incremental Mining of Top-K Frequent Closed Itemsets

Andrea Pietracaprina; Fabio Vandin

In this work we study the mining of top-K frequent closed itemsets, a recently proposed variant of the classical problem of mining frequent closed itemsets in which the support threshold is chosen as the maximum value sufficient to guarantee that at least K itemsets are returned. We discuss the effectiveness of the parameter K in controlling the output size and develop an efficient algorithm for mining top-K frequent closed itemsets in order of decreasing support. The algorithm exhibits consistently better performance than the best previously known one, attaining substantial improvements in some cases. A distinctive feature of our algorithm is that it allows the user to dynamically raise the value of K with no need to restart the computation from scratch.
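
The problem definition in this abstract can be illustrated with a brute-force sketch for tiny databases. It enumerates every itemset, so it is exponential in the number of items and is not the authors' efficient incremental algorithm; in particular, it does not support raising K without recomputation. The function names and the toy database are assumptions.

```python
from itertools import combinations


def closed_itemsets(database):
    """Map each closed itemset with nonzero support to its support."""
    items = sorted(set().union(*database))
    support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            count = sum(1 for record in database if set(combo) <= record)
            if count > 0:
                support[frozenset(combo)] = count
    # An itemset is closed if no proper superset has the same support.
    return {
        itemset: count
        for itemset, count in support.items()
        if not any(itemset < other and count == support[other] for other in support)
    }


def top_k_closed(database, k):
    """Closed itemsets whose support reaches the largest threshold that
    still returns at least k itemsets, in order of decreasing support."""
    closed = closed_itemsets(database)
    ranked = sorted(closed.items(), key=lambda pair: (-pair[1], sorted(pair[0])))
    if len(ranked) <= k:
        return ranked
    threshold = ranked[k - 1][1]
    return [pair for pair in ranked if pair[1] >= threshold]


# Hypothetical toy database of four transactions.
database = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
top2 = top_k_closed(database, k=2)
for itemset, count in top2:
    print(sorted(itemset), count)
```

Note that asking for K = 2 returns three itemsets here, because two closed itemsets tie at the threshold support; this is why the definition guarantees at least K results rather than exactly K.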

- Regular Papers | Pp. 275-280

An Intentional Kernel Function for RNA Classification

Hiroshi Sankoh; Koichiro Doi; Akihiro Yamamoto

This paper presents a kernel function class based on the concept of the intentional kernel (Doi et al., 2006), as opposed to that of the convolution kernel (Haussler, 1999). A kernel function in this class computes the similarity between two RNA sequences from the viewpoint of secondary structures. As an instance of the class, we give the definition and the algorithm of a kernel that takes a pair of RNA sequences as its inputs and enables a Support Vector Machine (SVM) to classify RNA sequences in a higher-dimensional space. Our experimental results show a high performance of this kernel compared with the string kernel, which is a convolution kernel.

- Regular Papers | Pp. 281-285