Publications catalog - books

Discovery Science: 9th International Conference, DS 2006, Barcelona, Spain, October 7-10, 2006, Proceedings

Ljupčo Todorovski ; Nada Lavrač ; Klaus P. Jantke (eds.)

In conference: 9th International Conference on Discovery Science (DS). Barcelona, Spain. October 7-10, 2006

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Philosophy of Science; Artificial Intelligence (incl. Robotics); Database Management; Information Storage and Retrieval; Computer Appl. in Administrative Data Processing; Computer Appl. in Social and Behavioral Sciences

Availability
Detected institution | Publication year | Browse | Download | Request
None detected | 2006 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-46491-4

Electronic ISBN

978-3-540-46493-8

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

e-Science and the Semantic Web: A Symbiotic Relationship

Carole Goble; Oscar Corcho; Pinar Alper; David De Roure

e-Science is scientific investigation performed through distributed global collaborations between scientists and their resources, and the computing infrastructure that enables this. Scientific progress increasingly depends on pooling know-how and results; making connections between ideas, people, and data; and finding and reusing knowledge and resources generated by others in perhaps unintended ways. It is about harvesting and harnessing the “collective intelligence” of the scientific community. The Semantic Web is an extension of the current Web in which information is given well-defined meaning to facilitate sharing and reuse, better enabling computers and people to work in cooperation. Applying the Semantic Web paradigm to e-Science has the potential to bring significant benefits to scientific discovery. We identify the benefits of lightweight and heavyweight approaches, based on our experiences in the Life Sciences.

I - Invited Papers | Pp. 1-12

Data-Driven Discovery Using Probabilistic Hidden Variable Models

Padhraic Smyth

Generative probabilistic models have proven to be a very useful framework for machine learning from scientific data. Key ideas that underlie the generative approach include (a) representing complex stochastic phenomena using the structured language of graphical models, (b) using latent (hidden) variables to make inferences about unobserved phenomena, and (c) leveraging Bayesian ideas for learning and prediction. This talk will begin with a brief review of learning from data with hidden variables and then discuss some exciting recent work in this area that has direct application to a broad range of scientific problems. A number of different scientific data sets will be used as examples to illustrate the application of these ideas in probabilistic learning, such as time-course microarray expression data, functional magnetic resonance imaging (fMRI) data of the human brain, text documents from the biomedical literature, and sets of cyclone trajectories.

I - Invited Papers | Pp. 13-13

Reinforcement Learning and Apprenticeship Learning for Robotic Control

Andrew Ng

Many control problems, such as autonomous helicopter flight, legged robot locomotion, and autonomous driving, are difficult because (i) it is hard to write down, in closed form, a formal specification of the control task (for example, what is the cost function for “driving well”?), (ii) it is difficult to learn good models of the robot’s dynamics, and (iii) it is expensive to find closed-loop controllers for high-dimensional, highly stochastic domains. Using apprenticeship learning, in which we learn from a human demonstration of a task, as a unifying theme, I will present formal results showing how many control problems can be efficiently addressed given access to a demonstration. In presenting these ideas, I will also draw from a number of case studies, including applications in autonomous helicopter flight, quadruped obstacle negotiation, snake robot locomotion, and high-speed off-road navigation.

Finally, I will also describe the application of these ideas to the STAIR (STanford AI Robot) project, which has the long term goal of integrating methods from all major areas of AI—including spoken dialog/NLP, manipulation, vision, navigation, and planning—to build a general-purpose, “intelligent” home/office robotic assistant.

I - Invited Papers | Pp. 14-14

The Solution of Semi-Infinite Linear Programs Using Boosting-Like Methods

Gunnar Rätsch

We consider methods for the solution of large linear optimization problems, in particular so-called Semi-Infinite Linear Programs (SILPs) that have a finite number of variables but infinitely many linear constraints. We illustrate that such optimization problems frequently appear in machine learning and discuss several examples including maximum margin boosting, multiple kernel learning and structure learning. In the second part we review methods for solving SILPs. Here, we are particularly interested in methods related to boosting. We review recent theoretical results concerning the convergence of these algorithms and conclude this work with a discussion of empirical results comparing these algorithms.

I - Invited Papers | Pp. 15-15
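
As a reading aid for the abstract above, a semi-infinite linear program can be written in the following generic form. This is a textbook-style sketch, not notation taken from the paper: the symbols $c$, $a(\theta)$, $b(\theta)$ and the index set $\Theta$ are assumptions introduced here for illustration.

```latex
\[
\min_{x \in \mathbb{R}^{n}} \; c^{\top} x
\quad \text{subject to} \quad
a(\theta)^{\top} x \le b(\theta)
\quad \text{for all } \theta \in \Theta ,
\]
```

Here $x$ collects the finitely many variables, and $\Theta$ is an infinite index set, so each $\theta \in \Theta$ contributes one linear constraint.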

Spectral Norm in Learning Theory: Some Selected Topics

Hans Ulrich Simon

In this paper, we review some known results that relate the statistical query complexity of a concept class to the spectral norm of its correlation matrix. Since spectral norms are widely used in various other areas, we are then able to put statistical query complexity in a broader context. We briefly describe some non-trivial connections to (seemingly) different topics in learning theory, complexity theory, and cryptography. A connection to the so-called Hidden Number Problem, which plays an important role in proving bit-security of cryptographic functions, will be discussed in somewhat more detail.

I - Invited Papers | Pp. 16-16

Classification of Changing Regions Based on Temporal Context in Local Spatial Association

Jae-Seong Ahn; Yang-Won Lee; Key-Ho Park

We propose a method of modeling regional changes in local spatial association and classifying the changing regions based on the similarity of their time-series signatures of local spatial association. For intuitive recognition of time-series local spatial association, we employ the Moran scatterplot and extend it to QS-TiMoS (Quadrant Sequence on Time-series Moran Scatterplot), which allows for examining temporal context in local spatial association using a series of categorical variables. Based on the QS-TiMoS signature of nodes and edges, we develop similarity measures for the “state sequence” and “clustering transition” of time-series local spatial association. The similarity matrices generated from these measures are then used to produce classification maps of time-series local spatial association that present the history of changing regions in clusters. The feasibility of the proposed method is tested by a case study on the rate of land price fluctuation of 232 administrative units in Korea, 1995-2004.

II - Long Papers | Pp. 17-28
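
To make the Moran scatterplot idea in the abstract above concrete, the following minimal sketch labels each region with its scatterplot quadrant at each time step and collects the labels into one sequence per region. It is an illustration under stated assumptions, not the authors' QS-TiMoS implementation: the function names, the row-standardized weight matrix, and the convention of treating a value of exactly zero as "high" are all choices made here.

```python
from statistics import mean, pstdev


def moran_quadrants(values, weights):
    """Return the Moran scatterplot quadrant of each region.

    values  -- one observation per region
    weights -- row-standardized spatial weight matrix (list of rows);
               assumed here, the abstract does not specify the weights
    Labels: first letter = the region's own value (High/Low),
            second letter = its neighbours' weighted average (High/Low).
    """
    mu, sigma = mean(values), pstdev(values)
    z = [(v - mu) / sigma for v in values]            # standardized values
    lag = [sum(w * zj for w, zj in zip(row, z))       # spatial lag per region
           for row in weights]
    labels = []
    for zi, li in zip(z, lag):
        own = "H" if zi >= 0 else "L"
        neighbours = "H" if li >= 0 else "L"
        labels.append(own + neighbours)
    return labels


def quadrant_sequences(series, weights):
    """series[t][i] is the value of region i at time t.

    Returns one quadrant sequence (list of labels over time) per region.
    """
    per_time = [moran_quadrants(values, weights) for values in series]
    return [list(seq) for seq in zip(*per_time)]


# Hypothetical example: four regions arranged in a chain 0-1-2-3.
chain_weights = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
example_series = [
    [10, 8, 2, 1],    # time 0: high values cluster on the left
    [1, 10, 1, 10],   # time 1: alternating high and low values
]
sequences = quadrant_sequences(example_series, chain_weights)
```

Each resulting sequence is a series of categorical labels, which is the kind of input a sequence-similarity measure could then compare across regions.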

Kalman Filters and Adaptive Windows for Learning in Data Streams

Albert Bifet; Ricard Gavaldà

We study the combination of a Kalman filter and a recently proposed algorithm for dynamically maintaining a sliding window, for learning from streams of examples. We integrate this idea into two well-known learning algorithms, the Naïve Bayes algorithm and the k-means clusterer. We show on synthetic data that the new algorithms never perform worse, and in some cases perform much better, than the algorithms using only memoryless Kalman filters or sliding windows with no filtering.

II - Long Papers | Pp. 29-40
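
For readers unfamiliar with the filtering component mentioned in the abstract above, here is a minimal sketch of a scalar Kalman filter tracking a slowly drifting quantity in a stream. It assumes a random-walk state model, and the function name and parameters are hypothetical. It does not include the adaptive sliding window, and it is not the authors' combined algorithm.

```python
def kalman_1d(observations, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed with noise.

    process_var     -- assumed variance of the state's drift per step
    measurement_var -- assumed variance of the observation noise
    x0, p0          -- initial state estimate and its variance
    Returns the list of state estimates after each observation.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        p = p + process_var               # predict: uncertainty grows
        k = p / (p + measurement_var)     # Kalman gain in [0, 1]
        x = x + k * (z - x)               # update: move towards observation
        p = (1.0 - k) * p                 # uncertainty shrinks after update
        estimates.append(x)
    return estimates
```

A larger `process_var` relative to `measurement_var` makes the filter react faster to change, while a smaller one makes it smooth more heavily. A window-adaptation scheme could, in principle, be used to set such parameters from the data.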

Scientific Discovery: A View from the Trenches

Catherine Blake; Meredith Rendall

One of the primary goals in discovery science is to understand the human scientific reasoning processes. Despite sporadic success of automated discovery systems, few studies have systematically explored the socio-technical environments in which a discovery tool will ultimately be embedded. Modeling day-to-day activities of experienced scientists as they develop and verify hypotheses provides both a glimpse into the human cognitive processes surrounding discovery and a deeper understanding of the characteristics that are required for a discovery system to be successful. In this paper, we describe a study of experienced faculty in chemistry and chemical engineering as they engage in what Kuhn would call “normal” science, focusing in particular on how these scientists characterize discovery, how they arrive at their research question, and the processes they use to transform an initial idea into a subsequent publication. We discuss gaps between current definitions used in discovery science, and examples of system design improvements that would better support the information environment and activities in normal science.

II - Long Papers | Pp. 41-52

Optimal Bayesian 2D-Discretization for Variable Ranking in Regression

Marc Boullé; Carine Hue

In supervised machine learning, variable ranking aims at sorting the input variables according to their relevance w.r.t. an output variable. In this paper, we propose a new relevance criterion for variable ranking in a regression problem with a large number of variables. This criterion comes from a discretization of both input and output variables, derived as an extension of a Bayesian nonparametric discretization method for the classification case. To that end, we introduce a family of discretization grid models and a prior distribution defined on this model space. For this prior, we then derive the exact Bayesian model selection criterion. The resulting most probable grid partition of the data emphasizes the relation (or the absence of relation) between inputs and output and provides a ranking criterion for the input variables. Preliminary experiments on both synthetic and real data demonstrate the criterion's capacity to select the most relevant variables and to improve a regression tree.

II - Long Papers | Pp. 53-64

Text Data Clustering by Contextual Graphs

Krzysztof Ciesielski; Mieczysław A. Kłopotek

In this paper, we focus on the class of graph-based clustering models, such as growing neural gas or idiotypic nets, for the purpose of high-dimensional text data clustering. We present a novel approach that does not require operating on the complex overall graph of clusters, but instead shifts the majority of the effort to context-sensitive, local subgraph and local subspace processing. Savings of orders of magnitude in processing time and memory can be achieved while the quality of clusters is improved, as the presented experiments demonstrate.

II - Long Papers | Pp. 65-76