Catálogo de publicaciones - libros


Grammatical Inference: Algorithms and Applications: 8th International Colloquium, ICGI 2006, Tokyo, Japan, September 20-22, 2006, Proceedings

Yasubumi Sakakibara ; Satoshi Kobayashi ; Kengo Sato ; Tetsuro Nishino ; Etsuji Tomita (eds.)

En conferencia: 8º International Colloquium on Grammatical Inference (ICGI) . Tokyo, Japan . September 20, 2006 - September 22, 2006

Resumen/Descripción – provisto por la editorial

No disponible.

Palabras clave – provistas por la editorial

No disponibles.

Disponibilidad
Institución detectada Año de publicación Navegá Descargá Solicitá
No detectada 2006 SpringerLink

Información

Tipo de recurso:

libros

ISBN impreso

978-3-540-45264-5

ISBN electrónico

978-3-540-45265-2

Editor responsable

Springer Nature

País de edición

Reino Unido

Fecha de publicación

Información sobre derechos de publicación

© Springer-Verlag Berlin Heidelberg 2006

Tabla de contenidos

Learning n-Ary Node Selecting Tree Transducers from Completely Annotated Examples

A. Lemay; J. Niehren; R. Gilleron

We present the first algorithm for learning n-ary node selection queries in trees from completely annotated examples by methods of grammatical inference. We propose to represent n-ary queries by deterministic n-ary node selecting tree transducers (n- NSTT s). These are tree automata that capture the class of monadic second-order definable n-ary queries. We show that n- NSTT s defined polynomially bounded n-ary queries can be learned from polynomial time and data. An application in Web information extraction yields encouraging results.

Palabras clave: Polynomial Time; Tree Automaton; Tree Language; Tree Transducer; Deterministic Automaton.

- Regular Papers | Pp. 253-267

Learning Multiplicity Tree Automata

Amaury Habrard; Jose Oncina

In this paper, we present a theoretical approach for the problem of learning multiplicity tree automata. These automata allows one to define functions which compute a number for each tree. They can be seen as a strict generalization of stochastic tree automata since they allow to define functions over any field K. A multiplicity automaton admits a support which is a non deterministic automaton. From a grammatical inference point of view, this paper presents a contribution which is original due to the combination of two important aspects. This is the first time, as far as we now, that a learning method focuses on non deterministic tree automata which computes functions over a field. The algorithm proposed in this paper stands in Angluin’s exact model where a learner is allowed to use membership and equivalence queries. We show that this algorithm is polynomial in time in function of the size of the representation.

Palabras clave: multiplicity tree automata; recognizable tree series; learning from equivalence and membership queries.

- Regular Papers | Pp. 268-280

Learning DFA from Correction and Equivalence Queries

Leonor Becerra-Bonache; Adrian Horia Dediu; Cristina Tîrnăucă

In active learning, membership queries and equivalence que- ries have established themselves as the standard combination to be used. However, they are quite “unnatural” for real learning environments (membership queries are oversimplified and equivalence queries do not have a correspondence in a real life setting). Based on several linguistic arguments that support the presence of corrections in children’s language acquisition, we propose another kind of query called correction query. We provide an algorithm that learns DFA using correction and equivalence queries in polynomial time. Despite the fact that the worst case complexity of our algorithm is not better than Angluin’s algorithm, we show through a large number of experiments that the average number of queries is considerably reduced by using correction queries.

Palabras clave: Active learning; learning DFA; membership query; equivalence query; correction query.

- Regular Papers | Pp. 281-292

Using MDL for Grammar Induction

Pieter Adriaans; Ceriel Jacobs

In this paper we study the application of the Minimum Description Length principle (or two-part-code optimization) to grammar induction in the light of recent developments in Kolmogorov complexity theory. We focus on issues that are important for construction of effective compression algorithms. We define an independent measure for the quality of a theory given a data set: the randomness deficiency . This is a measure of how typical the data set is for the theory. It can not be computed, but it can in many relevant cases be approximated. An optimal theory has minimal randomness deficiency. Using results from [4] and [2] we show that: – Shorter code not necessarily leads to better theories. We prove that, in DFA induction, already as a result of a single deterministic merge of two nodes, divergence of randomness deficiency and MDL code can occur. – Contrary to what is suggested by the results of [6] there is no fundamental difference between positive and negative data from an MDL perspective. – MDL is extremely sensitive to the correct calculation of code length: model code and data-to-model code. These results show why the applications of MDL to grammar induction so far have been disappointing. We show how the theoretical results can be deployed to create an effective algorithm for DFA induction. However, we believe that, since MDL is a global optimization criterion, MDL based solutions will in many cases be less effective in problem domains where local optimization criteria can be easily calculated. The algorithms were tested on the Abbadingo problems ([10]). The code was in Java, using the Satin ([17]) divide-and-conquer system that runs on top of the Ibis ([18]) Grid programming environment.

Palabras clave: Binary String; Regular Language; Kolmogorov Complexity; Short Code; Minimum Description Length Principle.

- Regular Papers | Pp. 293-306

Characteristic Sets for Inferring the Unions of the Tree Pattern Languages by the Most Fitting Hypotheses

Yen Kaow Ng; Takeshi Shinohara

A tree pattern p is a first-order term in formal logic, and the language of p is the set of all the tree patterns obtainable by replacing each variable in p with a tree pattern containing no variables. We consider the inductive inference of the unions of these languages from positive examples using strategies that guarantee some forms of minimality during the learning process. By a result in our earlier work, the existence of a characteristic set for each language in a class ${\mathcal L }$ (within ${\mathcal L }$ ) implies that ${\mathcal L }$ can be identified in the limit by a learner that simply conjectures a hypothesis containing the examples, that is minimal in the number of elements of up to an appropriate size. Furthermore, if there is a size ℓ such that each candidate hypothesis has a characteristic set (within the languages in ${\mathcal L }$ that intersects non-emptily with the examples) that consists only of elements of up to size ℓ, then the hypotheses containing the least number of elements of up to size ℓ are at the same time minimal with respect to inclusion. In this paper we show how to determine such a size ℓ for the unions of the tree pattern languages, and hence allowing us to learn the class using hypotheses that fulfill the two mentioned notions of minimality.

Palabras clave: Tree Pattern; Inductive Inference; Variable Node; Positive Data; Pattern Language.

- Regular Papers | Pp. 307-319

Learning Deterministic DEC Grammars Is Learning Rational Numbers

Pieter Adriaans

Below I show that the class of strings that can be learned by a deterministic DEC grammar is exactly the class of rational numbers between 0 and 1. I call this the class of semi-periodic or rational strings. Dynamically Expanding Context (Dec) grammars were introduced by Kohonen in order to model speech signals ([8]). They can be learned in quadratic time in the size of the grammar. They have been used successfully in the automatic generation and analysis of music ([7],[5],[6]).

- Regular Papers | Pp. 320-326

Iso-array Acceptors and Learning

T. Kalyani; V. R. Dare; D. G. Thomas; T. Robinson

The notions of iso-arrays, iso-pictures, local iso-picture languages and recognizable iso-picture languages have been introduced and studied in [6]. In [6] we have provided an algorithm to learn local iso-picture languages through identification in the limit using positive data. In this paper, we construct a two-dimensional on-line tessellation automaton to recognize iso-picture languages and present an algorithm to learn recognizable iso-picture languages from positive data and restricted subset queries.

Palabras clave: Learning; iso-picture languages; recognizable iso-picture languages; two-dimensional on-line tessellation automaton.

- Regular Papers | Pp. 327-339

A Merging States Algorithm for Inference of RFSAs

Gloria Alvarez; Pedro García; José Ruiz

The aim of this paper is to present MRIA , a new merging states algorithm for inference of Residual Finite State Automata.

Palabras clave: Positive Sample; Formal Language; Target Language; Input Sample; Regular Language.

- Poster Papers | Pp. 340-341

Query-Based Learning of XPath Expressions

Julien Carme; Michal Ceresna; Max Goebel

This work analyzes the application of active learning using example-based queries to the problem of constructing an XPath expression from visual interaction with an human user.

Palabras clave: XML; tree structured data; query based learning; XPath.

- Poster Papers | Pp. 342-343

Learning Finite-State Machines from Inexperienced Teachers

Olga Grinchtein; Martin Leucker

The general goal of query-based learning algorithms for finite-state machines is to identify a machine , usually of minimum size, that agrees with an a priori fixed (class of) machines. For this, queries on how the underlying system behaves may be issued.

- Poster Papers | Pp. 344-345