Grammatical Inference: Algorithms and Applications: 8th International Colloquium, ICGI 2006, Tokyo, Japan, September 20-22, 2006, Proceedings
Yasubumi Sakakibara; Satoshi Kobayashi; Kengo Sato; Tetsuro Nishino; Etsuji Tomita (eds.)
Conference: 8th International Colloquium on Grammatical Inference (ICGI), Tokyo, Japan, September 20-22, 2006
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Not available.
Availability
| Detected institution | Publication year | Browse | Download | Request |
|---|---|---|---|---|
| Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-45264-5
Electronic ISBN
978-3-540-45265-2
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Copyright information
© Springer-Verlag Berlin Heidelberg 2006
Table of contents
doi: 10.1007/11872436_11
Learning Analysis by Reduction from Positive Data
František Mráz; Friedrich Otto; Martin Plátek
Analysis by reduction is a linguistically motivated method for checking the correctness of a sentence. It can be modelled by restarting automata. In this paper we propose a method for learning restarting automata that are strictly locally testable (SLT-R-automata). The method is based on the concept of identification in the limit from positive examples only. We also characterize the class of languages accepted by SLT-R-automata with respect to the Chomsky hierarchy.
- Regular Papers | Pp. 125-136
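The idea of analysis by reduction, stepwise simplification of a sentence until a core form remains, can be sketched as follows. This is a toy with made-up rules and vocabulary, not the SLT-R construction from the paper:

```python
# Toy analysis by reduction: a sentence is accepted if repeated application
# of length-reducing rewrite rules turns it into a known core sentence.
# Rules and core set are invented for illustration only.

RULES = [
    ("very ", ""),       # drop an intensifier
    ("old ", ""),        # drop an adjective
    ("the cat", "cat"),  # drop a determiner
]
CORE = {"cat sleeps"}

def reduces(sentence: str) -> bool:
    current = sentence
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            if lhs in current:
                current = current.replace(lhs, rhs, 1)  # one reduction step
                changed = True
                break
    return current in CORE

print(reduces("the very old cat sleeps"))  # True: reductions reach the core
print(reduces("dog barks"))                # False: no rule applies
```

A restarting automaton performs exactly this kind of cycle: scan, rewrite a bounded window into something shorter, restart.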
doi: 10.1007/11872436_12
Inferring Grammars for Mildly Context Sensitive Languages in Polynomial-Time
Tim Oates; Tom Armstrong; Leonor Becerra Bonache; Mike Atamas
Natural languages contain regular, context-free, and context-sensitive syntactic constructions, yet none of these classes of formal languages can be identified in the limit from positive examples. Mildly context-sensitive languages are able to represent some context-sensitive constructions, those most common in natural languages, such as multiple agreement, crossed agreement, and duplication. These languages are attractive for natural language applications due to their expressiveness, and the fact that they are not fully context-sensitive should lead to computational advantages as well. We realize one such computational advantage by presenting the first polynomial-time algorithm for inferring Simple External Context Grammars, a class of mildly context-sensitive grammars, from positive examples.
Keywords: Natural Language; Regular Language; Bijective Mapping; Computational Advantage; Lexical Semantic.
- Regular Papers | Pp. 137-147
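The "duplication" construction mentioned in the abstract is the copy language {xx}, a classic context-sensitive pattern. Membership is trivial to test programmatically (this illustrates the pattern only; the paper's inference algorithm is not reproduced here):

```python
def is_duplication(w: str) -> bool:
    """Membership test for the copy language {xx : x a string},
    the 'duplication' pattern found in natural language."""
    n = len(w)
    return n % 2 == 0 and w[: n // 2] == w[n // 2 :]

print(is_duplication("abcabc"))  # True: 'abc' duplicated
print(is_duplication("abcab"))   # False: odd length
```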
doi: 10.1007/11872436_13
Planar Languages and Learnability
Alexander Clark; Christophe Costa Florêncio; Chris Watkins; Mariette Serayet
Strings can be mapped into Hilbert spaces using feature maps such as the Parikh map. Languages can then be defined as the pre-image of hyperplanes in the feature space, rather than using grammars or automata. These are the planar languages. In this paper we show that, using techniques from kernel-based learning, we can represent and efficiently learn, from positive data alone, various linguistically interesting context-sensitive languages. In particular, we show that the cross-serial dependencies in Swiss German, which established the non-context-freeness of natural language, are learnable using a standard kernel. We demonstrate polynomial-time identifiability in the limit of these classes, discuss some of their language-theoretic properties, and examine their relationship to the choice of kernel/feature map.
- Regular Papers | Pp. 148-160
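The Parikh-map construction is easy to make concrete. The sketch below is a toy under strong assumptions (two-letter alphabet, plain symbol counts, hand-rolled linear algebra instead of the paper's kernel machinery): it recovers a hyperplane from the Parikh vectors of positive samples of a^n b^n and uses its pre-image as the language. Note that this particular plane also accepts any string with equally many a's and b's, such as "abab", which is exactly why richer feature maps matter:

```python
from collections import Counter

def parikh(w, alphabet=("a", "b")):
    """Parikh map: string -> vector of symbol counts (a simple feature map)."""
    c = Counter(w)
    return tuple(c[s] for s in alphabet)

def learn_plane(samples):
    """Recover a hyperplane through the Parikh vectors of the samples
    (hand-rolled for a 2-letter alphabet)."""
    vecs = [parikh(w) for w in samples]
    x0 = vecs[0]
    d = next((tuple(v[i] - x0[i] for i in range(2))
              for v in vecs[1:] if v != x0), None)
    if d is None:            # all samples share one Parikh vector
        return None
    normal = (-d[1], d[0])   # orthogonal to the observed direction
    offset = normal[0] * x0[0] + normal[1] * x0[1]
    return normal, offset

def member(w, plane):
    """The planar language: pre-image of the hyperplane under the Parikh map."""
    normal, offset = plane
    v = parikh(w)
    return normal[0] * v[0] + normal[1] * v[1] == offset

plane = learn_plane(["ab", "aabb", "aaabbb"])   # positives from a^n b^n
print(plane)                      # ((-1, 1), 0), i.e. #a - #b = 0
print(member("aaaabbbb", plane))  # True: lies on the learned plane
print(member("aaab", plane))      # False
```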
doi: 10.1007/11872436_14
A Unified Algorithm for Extending Classes of Languages Identifiable in the Limit from Positive Data
Mitsuo Wakatsuki; Etsuji Tomita; Go Yamada
We are concerned with a unified algorithm for extending classes of languages identifiable in the limit from positive data. Let $\mathcal{L}$ be a base class of languages and let $\mathcal{X}$ be a class of finite subsets of strings. The extended class of $\mathcal{L}$, denoted $\mathcal{C}(\mathcal{L}, \mathcal{X})$, is defined from $\mathcal{L}$ and $\mathcal{X}$. We give a sufficient condition for $\mathcal{C}(\mathcal{L}, \mathcal{X})$ to be identifiable in the limit from positive data, and we present a unified identification algorithm for it. Furthermore, we show that some proper subclasses of $\mathcal{C}(\mathcal{L}, \mathcal{X})$ are polynomial-time identifiable in the limit from positive data in the sense of Yokomori.
Keywords: Polynomial Time; Target Language; Inductive Inference; Regular Language; Positive Data.
- Regular Papers | Pp. 161-174
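Identification in the limit from positive data, the setting underlying this paper, can be illustrated with a deliberately tiny class that is not the paper's $\mathcal{C}(\mathcal{L}, \mathcal{X})$: the class of languages of the form wΣ* for a fixed prefix w. The learner conjectures the longest common prefix of the samples seen so far, and on any presentation of wΣ* the conjecture eventually stabilises at w:

```python
from os.path import commonprefix  # character-wise common prefix of strings

# Toy identification in the limit from positive data for the class
# { wS* : w a fixed prefix } (illustrative; not the paper's framework).
def learner(samples_so_far):
    """Conjecture after each new positive example: the longest common
    prefix of everything seen so far."""
    return commonprefix(samples_so_far)

presentation = ["abba", "abab", "abbb", "aba"]   # positives from "ab"S*
conjectures = [learner(presentation[: i + 1]) for i in range(len(presentation))]
print(conjectures)  # ['abba', 'ab', 'ab', 'ab'] -- converges to 'ab'
```

After finitely many examples the conjecture never changes again, which is exactly the convergence criterion of the identification-in-the-limit model.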
doi: 10.1007/11872436_15
Protein Motif Prediction by Grammatical Inference
Piedachu Peris; Damián López; Marcelino Campos; José M. Sempere
The rapid growth of protein sequence databases is exceeding the capacity to biochemically and structurally characterize new proteins. The development of tools to locate, within protein sequences, those subsequences with an associated function or specific feature is therefore very important. In this work, we propose a method to predict one such functional motif (the coiled coil), which is related to protein interaction. Our approach uses even linear language inference to obtain a transducer that is then used to label unknown sequences. The experiments carried out show that our method outperforms previous approaches.
Keywords: Grammatical inference; bioinformatics; protein motif location.
- Regular Papers | Pp. 175-187
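The "even linear language" machinery rests on a standard trick: inference of even linear languages reduces to regular inference by reading a string from both ends simultaneously and emitting symbol pairs. A minimal sketch of that fold (the paper's transducer and biological features are not reproduced here):

```python
def fold(w: str):
    """Reduce even linear language inference to regular inference:
    read the string from both ends at once, emitting (first, last)
    symbol pairs, plus a lone middle symbol for odd length."""
    out = []
    i, j = 0, len(w) - 1
    while i < j:
        out.append((w[i], w[j]))
        i += 1
        j -= 1
    if i == j:                  # odd length: unpaired middle symbol
        out.append((w[i],))
    return out

print(fold("aabb"))   # [('a', 'b'), ('a', 'b')] -- a^n b^n folds to one repeated pair
print(fold("abcba"))  # [('a', 'a'), ('b', 'b'), ('c',)]
```

Over the paired alphabet, a^n b^n becomes the regular language of repeated ('a','b') pairs, so any regular inference method can learn it.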
doi: 10.1007/11872436_16
Grammatical Inference in Practice: A Case Study in the Biomedical Domain
Sophia Katrenko; Pieter Adriaans
In this paper we discuss an approach to named entity recognition (NER) based on grammatical inference (GI). Previous GI approaches have aimed at constructing a grammar underlying a given text source. It has been noted that the rules produced by GI can also be interpreted semantically [16], with a non-terminal describing interchangeable elements that are instances of the same concept. This observation leads to the hypothesis that GI might be useful for finding concept instances in a text. Furthermore, it should also be possible to discover relations between concepts, or more precisely, the way such relations are expressed linguistically. Throughout the paper, we propose a general framework for using GI for named entity recognition by discussing several possible approaches. In addition, we demonstrate that these methods work successfully on biomedical data using an existing GI tool.
Keywords: Equivalence Class; Jurkat Cell; Named Entity Recognition; Entity Recognition; Semantic Class.
- Regular Papers | Pp. 188-200
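The core intuition, that interchangeable elements form an equivalence class readable as a semantic class, can be sketched distributionally. The corpus below is invented for illustration; real GI tools induce these classes as non-terminals rather than by exact context matching:

```python
from collections import defaultdict

# Words sharing the same set of (left, right) contexts are grouped into
# one equivalence class, which can be read as a set of concept instances.
sentences = [
    "il2 activates jurkat cells",
    "il4 activates jurkat cells",
    "il2 inhibits t cells",
    "il4 inhibits t cells",
]

contexts = defaultdict(set)
for s in sentences:
    toks = s.split()
    for i, w in enumerate(toks):
        left = toks[i - 1] if i > 0 else "<s>"
        right = toks[i + 1] if i + 1 < len(toks) else "</s>"
        contexts[w].add((left, right))

classes = defaultdict(set)
for w, ctx in contexts.items():
    classes[frozenset(ctx)].add(w)  # identical context sets -> same class

for members in classes.values():
    if len(members) > 1:
        print(sorted(members))  # ['il2', 'il4'] share all their contexts
```

Here the two interleukin names end up in one class because they occur in exactly the same contexts, which is the NER signal the paper exploits.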
doi: 10.1007/11872436_17
Inferring Grammar Rules of Programming Language Dialects
Alpana Dubey; Pankaj Jalote; Sanjeev Kumar Aggarwal
In this paper we address the problem of grammatical inference in the programming language domain. The grammar of a programming language is an important asset because it is used in developing many software engineering tools. Sometimes, grammars of languages are not available and have to be inferred from the source code, especially in the case of programming language dialects. We propose an approach for inferring the grammar of a programming language when an incomplete grammar, along with a set of correct programs, is given as input. The approach infers a set of grammar rules such that the addition of these rules makes the initial grammar complete. A grammar is complete if it parses all the input programs successfully. We also propose a rule evaluation order, i.e., an order in which the rules are evaluated for correctness. A set of rules is correct if its addition makes the grammar complete. Experiments show that the proposed rule evaluation order improves the process of grammar inference.
Keywords: Programming language grammars; Dialects; Minimum Description Length Principle.
- Regular Papers | Pp. 201-213
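The completeness check at the heart of this setup is just "does the extended grammar parse every program?". A minimal sketch with a CYK parser over a toy CNF grammar follows; the grammar, programs, and size-based candidate order are invented stand-ins for the paper's dialect grammars and MDL-style evaluation order:

```python
def cyk(tokens, rules, start="S"):
    """CYK membership test for a grammar in Chomsky normal form.
    rules: list of (lhs, rhs) with rhs a 1-tuple (terminal) or 2-tuple."""
    n = len(tokens)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, t in enumerate(tokens):
        for lhs, rhs in rules:
            if rhs == (t,):
                table[i][i + 1].add(lhs)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, rhs in rules:
                    if (len(rhs) == 2 and rhs[0] in table[i][k]
                            and rhs[1] in table[k][j]):
                        table[i][j].add(lhs)
    return start in table[0][n]

# Incomplete toy grammar: parses "ab" but not "aabb".
grammar = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",)), ("C", ("S", "B"))]
programs = ["ab", "aabb"]

# Candidate rules tried in order of increasing size, a crude stand-in for
# the MDL-based rule evaluation order the paper proposes.
candidates = [("S", ("A", "A")), ("S", ("B", "B")), ("S", ("A", "C"))]
candidates.sort(key=lambda r: len(r[1]))

for rule in candidates:
    if all(cyk(list(p), grammar + [rule]) for p in programs):
        print("completing rule:", rule)  # ('S', ('A', 'C'))
        break
```

Adding S -> A C (with the existing C -> S B) closes the grammar over a^n b^n, so both programs parse and the search stops.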
doi: 10.1007/11872436_18
The Tenjinno Machine Translation Competition
Bradford Starkie; Menno van Zaanen; Dominique Estival
This paper describes the Tenjinno Machine Translation Competition held as part of the International Colloquium on Grammatical Inference 2006. The competition aimed to promote the development of new and better practical grammatical inference algorithms for machine translation. Tenjinno focuses on formal models used in machine translation. We discuss the design issues and decisions made when creating the competition. For the purpose of setting the competition tasks, a measure of the complexity of learning a transducer was developed. This measure has enabled us to compare the competition tasks with other published results: the problems solved in the competition were of greater complexity and were solved with lower word error rates than previously published results. In addition, the complexity measures and benchmark problems can be used to track the progress of the state of the art in the future.
- Regular Papers | Pp. 214-226
doi: 10.1007/11872436_19
Large Scale Inference of Deterministic Transductions: Tenjinno Problem 1
Alexander Clark
We discuss the problem of large scale grammatical inference in the context of the Tenjinno competition, with reference to the inference of deterministic finite state transducers, and describe the design of the algorithms and the design and implementation of the program that solved the first problem. Although the OSTIA algorithm has good asymptotic guarantees for this class of problems, the amount of data it requires is prohibitive. We therefore developed a new strategy for inferring large transducers, better suited to large random instances of the type in question, which combines traditional state-merging algorithms for finite-state automaton inference with EM-based alignment algorithms and state-splitting algorithms.
Keywords: Machine Translation; Expectation Maximisation Algorithm; Input String; State Automaton; Input Language.
- Regular Papers | Pp. 227-239
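The first phase of OSTIA-style transducer inference can be sketched concretely: build a prefix-tree transducer from (input, output) training pairs, then make it onward by pushing the longest common prefix of every state's outputs as close to the initial state as possible. The merging, EM alignment, and state splitting described in the abstract are not reproduced here:

```python
from os.path import commonprefix

def prefix_tree(pairs):
    """Prefix-tree transducer: states are input prefixes."""
    children = {"": {}}
    final = {}                      # full input string -> training output
    for x, y in pairs:
        node = ""
        for s in x:
            nxt = node + s
            children.setdefault(node, {})[s] = nxt
            children.setdefault(nxt, {})
            node = nxt
        final[node] = y
    return children, final

def state_lcp(children, final, node=""):
    """Longest common prefix of all outputs reachable through `node`."""
    outs = [final[node]] if node in final else []
    outs += [state_lcp(children, final, c) for c in children[node].values()]
    return commonprefix(outs)

def onward_edges(children, final):
    """After onwarding, the output emitted on each edge is the child's
    LCP with the parent's LCP stripped off the front."""
    edges = {}
    for node, succ in children.items():
        base = state_lcp(children, final, node)
        for sym, child in succ.items():
            edges[(node, sym)] = state_lcp(children, final, child)[len(base):]
    return edges

pairs = [("a", "x"), ("ab", "xy"), ("ac", "xz")]
children, final = prefix_tree(pairs)
print(state_lcp(children, final, ""))  # 'x' -- common to all outputs
print(onward_edges(children, final))   # {('', 'a'): '', ('a', 'b'): 'y', ('a', 'c'): 'z'}
```

Onwarding makes the transducer emit output as early as possible, which is what lets OSTIA's subsequent state merging detect compatible states.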
doi: 10.1007/11872436_20
A Discriminative Model of Stochastic Edit Distance in the Form of a Conditional Transducer
Marc Bernard; Jean-Christophe Janodet; Marc Sebban
Many real-world applications such as spell-checking or DNA analysis use the Levenshtein edit distance to compute similarities between strings. In practice, the costs of the primitive edit operations (insertion, deletion and substitution of symbols) are generally hand-tuned. In this paper, we propose an algorithm to learn these costs. The underlying model is a probabilistic transducer, computed using grammatical inference techniques, which allows us to learn both the structure and the probabilities of the model. Beyond the fact that the learned transducers are neither deterministic nor stochastic in the standard terminology, they are conditional, and thus independent of the distribution of the input strings. Finally, we show through experiments that our method allows us to design cost functions that depend on the string context in which the edit operations are used. In other words, we obtain a kind of context-sensitive edit distance.
Keywords: Edit Distance; Stochastic Transducers; Discriminative Models; Grammatical Inference.
- Regular Papers | Pp. 240-252
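The hand-tuned setting the paper starts from is the classical Levenshtein dynamic program with explicit cost parameters. The sketch below is that standard algorithm, not the paper's learned conditional transducer; the parameter names are ours:

```python
def edit_distance(s, t, ins=1.0, dele=1.0, sub=1.0):
    """Levenshtein distance with tunable primitive-operation costs --
    the hand-tuned costs that the paper proposes to learn instead."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + dele
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if s[i - 1] == t[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,      # delete s[i-1]
                          d[i][j - 1] + ins,       # insert t[j-1]
                          d[i - 1][j - 1] + cost)  # substitute / match
    return d[m][n]

print(edit_distance("kitten", "sitting"))           # 3.0
print(edit_distance("kitten", "sitting", sub=2.0))  # 5.0
```

With sub=2.0 each substitution costs as much as a delete plus an insert, so the distance rises from 3 to 5; the paper's contribution is to learn such costs, conditioned on string context, rather than fix them by hand.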