Publications catalog - books

Discovery Science: 10th International Conference, DS 2007 Sendai, Japan, October 1-4, 2007. Proceedings

Vincent Corruble; Masayuki Takeda; Einoshin Suzuki (eds.)

In conference: 10th International Conference on Discovery Science (DS). Sendai, Japan. October 1-4, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Philosophy of Science; Artificial Intelligence (incl. Robotics); Database Management; Information Storage and Retrieval; Computer Appl. in Administrative Data Processing; Computer Appl. in Social and Behavioral Sciences

Availability
Detected institution | Year of publication | Browse
Not detected | 2007 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-75487-9

Electronic ISBN

978-3-540-75488-6

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Challenge for Info-plosion

Masaru Kitsuregawa

Information created by people has increased rapidly since the year 2000, and now we are in a time which we could call the “information-explosion era.” The project “Cyber Infrastructure for the Information-explosion Era” is a six-year project from 2005 to 2010 supported by Grant-in-Aid for Scientific Research on Priority Areas from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan. The project aims to establish the following fundamental technologies in this information-explosion era: novel technologies for efficient and trustable information retrieval from explosively growing and heterogeneous information resources; stable, secure, and scalable information systems for managing rapid information growth; and information utilization by harmonized human-system interaction. It also aims to design a social system that cooperates with these technologies. Moreover, it maintains the synergy of cutting-edge technologies in informatics.

- Invited Papers | Pp. 1-8

Machine Learning in Ecosystem Informatics

Thomas G. Dietterich

The emerging field of Ecosystem Informatics applies methods from computer science and mathematics to address fundamental and applied problems in the ecosystem sciences. The ecosystem sciences are in the midst of a revolution driven by a combination of emerging technologies for improved sensing and the critical need for better science to help manage global climate change. This paper describes several initiatives at Oregon State University in ecosystem informatics. At the level of sensor technologies, this paper describes two projects: (a) wireless, battery-free sensor networks for forests and (b) rapid throughput automated arthropod population counting. At the level of data preparation and data cleaning, this paper describes the application of linear Gaussian dynamic Bayesian networks to automated anomaly detection in temperature data streams. Finally, the paper describes two educational activities: (a) a summer institute in ecosystem informatics and (b) an interdisciplinary Ph.D. program in Ecosystem Informatics for mathematics, computer science, and the ecosystem sciences.

- Invited Papers | Pp. 9-25
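
As a toy illustration of the kind of anomaly detection mentioned above (my own sketch, not the model used in the paper), a one-dimensional linear Gaussian filter can flag temperature readings whose one-step-ahead prediction residual is improbably large; the random-walk dynamics, noise variances, and threshold below are arbitrary choices.

```python
# Toy illustration (not the paper's model): a 1D linear Gaussian state-space
# filter that flags temperature readings whose one-step-ahead prediction is
# surprisingly far from the observed value.
import math
import random

def detect_anomalies(readings, process_var=0.1, obs_var=0.5, threshold=3.0):
    """Return indices whose standardized innovation exceeds `threshold`."""
    anomalies = []
    mean, var = readings[0], obs_var      # initialize state with first reading
    for i, y in enumerate(readings[1:], start=1):
        # Predict: random-walk dynamics x_t = x_{t-1} + noise
        pred_mean, pred_var = mean, var + process_var
        innovation = y - pred_mean        # residual
        s = pred_var + obs_var            # innovation variance
        if abs(innovation) / math.sqrt(s) > threshold:
            anomalies.append(i)           # flag, and skip the update so the
            continue                      # outlier does not corrupt the state
        k = pred_var / s                  # Kalman gain
        mean = pred_mean + k * innovation
        var = (1 - k) * pred_var
    return anomalies

if __name__ == "__main__":
    random.seed(0)
    temps = [20 + 0.01 * t + random.gauss(0, 0.3) for t in range(200)]
    temps[120] += 8.0                     # inject a sensor glitch
    print(detect_anomalies(temps))        # should include index 120
```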

Simple Algorithmic Principles of Discovery, Subjective Beauty, Selective Attention, Curiosity & Creativity

Jürgen Schmidhuber

I postulate that human or other intelligent agents function or should function as follows. They store all sensory observations as they come - the data is ‘holy.’ At any time, given some agent’s current coding capabilities, part of the data is compressible by a short and hopefully fast program / description / explanation / world model. In the agent’s subjective eyes, such data is more regular and more beautiful than other data. It is well known that knowledge of regularity and repeatability may improve the agent’s ability to plan actions leading to external rewards. In absence of such rewards, however, beauty is boring. Then interestingness becomes the first derivative of subjective beauty: as the learning agent improves its compression algorithm, formerly apparently random data parts become subjectively more regular and beautiful. Such progress in data compression is measured and maximized by the curiosity drive: create action sequences that extend the observation history and yield previously unknown / unpredictable but quickly learnable algorithmic regularity. I discuss how all of the above can be naturally implemented on computers, through an extension of passive unsupervised learning to the case of active data selection: we reward a general reinforcement learner (with access to the adaptive compressor) for actions that improve the subjective compressibility of the growing data. An unusually large compression breakthrough deserves the name discovery. The creativity of artists, dancers, musicians, pure mathematicians can be viewed as a by-product of this principle. Several qualitative examples support this hypothesis.

- Invited Papers | Pp. 26-38
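
A rough sketch of the compression-progress idea, not Schmidhuber's implementation: an off-the-shelf compressor (zlib) stands in for the agent's adaptive world model, and the intrinsic reward of a new observation is the change in compressed bytes per byte of history. A static compressor cannot capture learning progress, so this is only a crude proxy for the principle described above.

```python
# Rough illustration of "reward = compression progress", using zlib as a
# stand-in for the agent's adaptive compressor. Not Schmidhuber's system:
# zlib does not learn, so this rewards regularity rather than learning progress.
import zlib

def bytes_per_symbol(history: bytes) -> float:
    return len(zlib.compress(history, 9)) / max(len(history), 1)

def intrinsic_reward(history: bytes, observation: bytes) -> float:
    """Positive when appending `observation` makes the history more
    compressible per byte, negative when it makes it less compressible."""
    return bytes_per_symbol(history) - bytes_per_symbol(history + observation)

if __name__ == "__main__":
    history = b"abab" * 50
    print(intrinsic_reward(history, b"abab" * 10))      # regular: >= 0
    print(intrinsic_reward(history, bytes(range(40))))  # incompressible: < 0
```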

A Theory of Similarity Functions for Learning and Clustering

Avrim Blum

Kernel methods have proven to be powerful tools in machine learning. They perform well in many applications, and there is also a well-developed theory of sufficient conditions for a kernel to be useful for a given learning problem. However, while a kernel can be thought of as just a pairwise similarity function that satisfies additional mathematical properties, this theory requires viewing kernels as implicit (and often difficult to characterize) maps into high-dimensional spaces. In this talk I will describe work on developing a theory that applies to more general similarity functions (not just legal kernels) and furthermore describes the usefulness of a given similarity function in terms of more intuitive, direct properties, without need to refer to any implicit spaces.

An interesting feature of the proposed framework is that it can also be applied to learning from purely unlabeled data, i.e., clustering. In particular, one can ask how much stronger the properties of a similarity function should be (in terms of its relation to the unknown desired clustering) so that it can be used to cluster well: to learn well without any label information at all. We find that if we are willing to relax the objective a bit (for example, allow the algorithm to produce a hierarchical clustering that we will call successful if some pruning is close to the correct answer), then this question leads to a number of interesting graph-theoretic and game-theoretic properties that are sufficient to cluster well. This work can be viewed as an approach to defining a PAC model for clustering.

This talk is based on joint work with Maria-Florina Balcan and Santosh Vempala.

- Invited Papers | Pp. 39-39
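
A simplified sketch of the landmark idea behind similarity-based learning (not the paper's formal construction or its guarantees): represent each example by its similarities to a few landmark examples and train an ordinary linear separator, here a perceptron, on that explicit representation. The similarity function and data below are toy choices.

```python
# Sketch (simplified, no formal guarantees): turn a pairwise similarity
# function into explicit features via landmarks, then learn a linear separator.
import random

def similarity(x, y):
    # Any bounded pairwise similarity works here; this one is a toy choice.
    return 1.0 / (1.0 + abs(x[0] - y[0]) + abs(x[1] - y[1]))

def to_features(x, landmarks):
    return [similarity(x, l) for l in landmarks]

def train_perceptron(feats, labels, epochs=50, lr=0.1):
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):          # y in {-1, +1}
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else -1
            if pred != y:                        # mistake-driven update
                w = [wi + lr * y * fi for wi, fi in zip(w, f)]
                b += lr * y
    return w, b

if __name__ == "__main__":
    random.seed(1)
    pos = [(random.gauss(2, .3), random.gauss(2, .3)) for _ in range(30)]
    neg = [(random.gauss(-2, .3), random.gauss(-2, .3)) for _ in range(30)]
    data, labels = pos + neg, [1] * 30 + [-1] * 30
    landmarks = random.sample(data, 5)           # unlabeled landmark examples
    feats = [to_features(x, landmarks) for x in data]
    w, b = train_perceptron(feats, labels)
    errors = 0
    for f, y in zip(feats, labels):
        pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else -1
        errors += (pred != y)
    print("training errors:", errors)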

A Hilbert Space Embedding for Distributions

Alex Smola; Arthur Gretton; Le Song; Bernhard Schölkopf

While kernel methods are the basis of many popular techniques in supervised learning, they are less commonly used in testing, estimation, and analysis of probability distributions, where information theoretic approaches rule the roost. However, it becomes difficult to estimate mutual information or entropy if the data are high-dimensional.

- Invited Papers | Pp. 40-41
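
The premise can be illustrated with a short sketch of the kernel mean embedding idea (my illustration, not code from the paper): two samples are compared through the distance between their mean embeddings under an RBF kernel, a biased estimate of the maximum mean discrepancy, with no density or entropy estimation involved.

```python
# Illustration of the kernel mean embedding idea (not code from the paper):
# compare two samples via the distance between their mean embeddings in an
# RBF kernel's feature space, i.e. a (biased) squared maximum mean discrepancy.
import math
import random

def rbf(x, y, gamma=0.5):
    return math.exp(-gamma * (x - y) ** 2)

def mmd2(xs, ys, gamma=0.5):
    """Biased estimate of the squared MMD between samples xs and ys."""
    k_xx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2 * k_xy

if __name__ == "__main__":
    random.seed(0)
    sample = [random.gauss(0, 1) for _ in range(200)]
    print(mmd2(sample, [random.gauss(0, 1) for _ in range(200)]))  # near 0
    print(mmd2(sample, [random.gauss(1, 1) for _ in range(200)]))  # clearly larger
```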

Time and Space Efficient Discovery of Maximal Geometric Graphs

Hiroki Arimura; Takeaki Uno; Shinichi Shimozono

A geometric graph is a labeled graph whose vertices are points in the 2D plane, with an isomorphism invariant under geometric transformations such as translation, rotation, and scaling. While Kuramochi and Karypis (ICDM 2002) extensively studied the frequent pattern mining problem for geometric subgraphs, maximal graph mining has not been considered so far. In this paper, we study the maximal (or closed) graph mining problem for the general class of geometric graphs in the 2D plane by extending the framework of Kuramochi and Karypis. Combining techniques of canonical encoding and a depth-first search tree for the class of maximal patterns, we present a polynomial delay and polynomial space algorithm, MaxGeo, that enumerates all maximal patterns in a given input geometric graph without duplicates. This is the first result establishing the output-sensitive complexity of closed graph mining for geometric graphs. We also show that the frequent graph mining problem is solvable in polynomial delay and polynomial time.

- Long Papers | Pp. 42-55
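
To make the invariance requirement concrete, here is a small illustration (not the MaxGeo algorithm): a 2D point pattern is reduced to a canonical placement so that two patterns related by translation, uniform scaling, and rotation compare equal. The specific normalization steps are my own choices.

```python
# Illustration only (not the MaxGeo algorithm): reduce a 2D point pattern to a
# canonical form that is invariant under translation, uniform scaling, and
# rotation, so two patterns match iff their canonical forms are (nearly) equal.
import math

def canonical_form(points, ndigits=6):
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]          # remove translation
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n) or 1.0
    scaled = [(x / scale, y / scale) for x, y in centered]    # remove scaling
    # Remove rotation: rotate so the point farthest from the origin lies on +x.
    fx, fy = max(scaled, key=lambda p: p[0] ** 2 + p[1] ** 2)
    theta = math.atan2(fy, fx)
    c, s = math.cos(-theta), math.sin(-theta)
    rotated = [(x * c - y * s, x * s + y * c) for x, y in scaled]
    return tuple(sorted((round(x, ndigits), round(y, ndigits)) for x, y in rotated))

if __name__ == "__main__":
    a = [(0, 0), (2, 0), (2, 1)]
    b = [(5, 5), (5, 9), (3, 9)]   # a translated, scaled, rotated copy of `a`
    print(canonical_form(a) == canonical_form(b))   # True
```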

Iterative Reordering of Rules for Building Ensembles Without Relearning

Paulo J. Azevedo; Alípio M. Jorge

We study a new method for improving the classification accuracy of a model composed of classification association rules (CAR). The method consists in reordering the original set of rules according to the error rates obtained on a set of training examples. This is done iteratively, starting from the original set of rules. After obtaining a set of models, these are used as an ensemble for classifying new cases. The net effect of this approach is that the original rule model is clearly improved. This improvement is due to the ensembling of the obtained models, which are, individually, slightly better than the original one. This ensembling approach has the advantage of running a single learning process, since the models in the ensemble are obtained by self replicating the original one.

- Long Papers | Pp. 56-67
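
A hedged sketch of the general mechanism (not the authors' exact procedure or rule language): score each rule by its training error, reorder the decision list accordingly, keep every intermediate ordering as an ensemble member, and let the orderings vote on new cases. The toy rules and data are hypothetical.

```python
# Sketch of the general idea only (not the authors' exact procedure):
# iteratively reorder a decision list of rules by observed training error,
# keep each ordering, and let the orderings vote on new cases.
from collections import Counter

# A rule is (condition, predicted_class); conditions are plain predicates here.
RULES = [
    (lambda x: x["temp"] > 30, "hot"),
    (lambda x: x["temp"] > 20, "warm"),
    (lambda x: True, "cold"),             # default rule
]

def classify(rules, x):
    for cond, label in rules:
        if cond(x):
            return label

def rule_errors(rules, data):
    """Error rate of each rule on the examples it fires on first."""
    fired, wrong = Counter(), Counter()
    for x, y in data:
        for i, (cond, label) in enumerate(rules):
            if cond(x):
                fired[i] += 1
                wrong[i] += int(label != y)
                break
    return [wrong[i] / fired[i] if fired[i] else 1.0 for i in range(len(rules))]

def build_ensemble(rules, data, rounds=3):
    ensemble, current = [list(rules)], list(rules)
    for _ in range(rounds):
        errs = rule_errors(current, data)
        current = [r for _, r in sorted(zip(errs, current), key=lambda t: t[0])]
        ensemble.append(list(current))    # keep each reordered model
    return ensemble

def ensemble_classify(ensemble, x):
    votes = Counter(classify(rules, x) for rules in ensemble)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    train = [({"temp": 35}, "hot"), ({"temp": 25}, "warm"), ({"temp": 10}, "cold")]
    ens = build_ensemble(RULES, train)
    print(ensemble_classify(ens, {"temp": 28}))   # "warm"
```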

On Approximating Minimum Infrequent and Maximum Frequent Sets

Mario Boley

The maximum cardinality of a frequent set as well as the minimum cardinality of an infrequent set are important characteristic numbers in frequent (item) set mining. Gunopulos et al. [10] have shown that finding a maximum frequent set is NP-hard. In this paper I show that the minimization problem is also NP-hard. As a next step I investigate whether these problems can be approximated. While a simple greedy algorithm turns out to approximate a minimum infrequent set within a logarithmic factor, one can show that there is no such algorithm for the maximization problem.

- Long Papers | Pp. 68-77
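
The greedy approximation can be sketched as follows (my reading of the standard set-cover-style greedy, not necessarily the paper's exact algorithm): grow an item set by repeatedly adding the item that eliminates the most supporting transactions, until the set becomes infrequent.

```python
# Greedy sketch (standard set-cover-style idea, not necessarily the paper's
# exact algorithm): extend an item set with the item that shrinks its support
# the most, stopping once support drops below the frequency threshold.
def greedy_min_infrequent(transactions, min_support):
    items = set().union(*transactions)
    itemset, support = set(), list(transactions)
    while len(support) >= min_support:
        remaining = items - itemset
        if not remaining:
            return None        # even the full item set is still frequent
        # Add the item whose inclusion leaves the fewest supporting transactions.
        best = min(remaining, key=lambda i: sum(1 for t in support if i in t))
        itemset.add(best)
        support = [t for t in support if best in t]
    return itemset

if __name__ == "__main__":
    db = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a"}]
    print(greedy_min_infrequent(db, min_support=2))   # a 3-item infrequent set
```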

A Partially Dynamic Clustering Algorithm for Data Insertion and Removal

Haytham Elghazel; Hamamache Kheddouci; Véronique Deslandres; Alain Dussauchoy

We consider the problem of clustering, which has been addressed in many contexts and applications including dynamic information retrieval, Web document classification, etc. The goal is to efficiently maintain homogeneous and well-separated clusters as new data are inserted or existing data are removed. We propose a framework based solely on pairwise dissimilarities among all pairs of data and on cluster dominance. In experiments on benchmark data sets, we show improvements in the performance of the clustering solution in terms of quality and computational complexity.

- Long Papers | Pp. 78-90
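
A generic illustration of dissimilarity-driven incremental clustering (a sketch of the setting, not the paper's framework): a new point joins a cluster only if its worst-case dissimilarity to that cluster stays under a threshold, otherwise it starts a new cluster; removal simply drops the point. The dissimilarity, threshold, and data are arbitrary.

```python
# Generic illustration of dissimilarity-based incremental clustering (not the
# paper's algorithm): insertion joins the first sufficiently homogeneous
# cluster or opens a new one; removal drops the point and empty clusters.
def dissim(a, b):
    return abs(a - b)                 # any pairwise dissimilarity would do

class IncrementalClustering:
    def __init__(self, threshold):
        self.threshold = threshold
        self.clusters = []            # list of lists of data points

    def insert(self, x):
        for cluster in self.clusters:
            if max(dissim(x, m) for m in cluster) <= self.threshold:
                cluster.append(x)     # homogeneous enough: join this cluster
                return
        self.clusters.append([x])     # otherwise open a new cluster

    def remove(self, x):
        for cluster in self.clusters:
            if x in cluster:
                cluster.remove(x)
                if not cluster:
                    self.clusters.remove(cluster)
                return

if __name__ == "__main__":
    c = IncrementalClustering(threshold=2.0)
    for v in [1.0, 1.5, 9.0, 2.2, 9.5]:
        c.insert(v)
    c.remove(2.2)
    print(c.clusters)                 # e.g. [[1.0, 1.5], [9.0, 9.5]]
```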

Positivism Against Constructivism: A Network Game to Learn Epistemology

Hélène Hagège; Christopher Dartnell; Jean Sallantin

As mentioned in French secondary school official texts, teaching science implies teaching the scientific process. This poses the problem of how to teach epistemology, as traditional science teaching is mostly dogmatic and content-based. Previous studies show that pupils, science students, and teachers mostly hold positivist and realist spontaneous conceptions of science and scientific discovery. Here, we present the evaluation of the didactic impact of a network game, Eleusis+Nobel, on third-year biology students who aim to become teachers. This card game, based on a Popperian epistemology, has been designed to reproduce the scientific discovery process in a community. Within the limits of our study, results obtained with classical social psychology tools indicate that students who played this game specifically assimilated the subjective dimension of knowledge and the role of the community into their conception of science, in contrast to negative control students, who did not play.

- Long Papers | Pp. 91-103