Grammatical Inference: Algorithms and Applications: 8th International Colloquium, ICGI 2006, Tokyo, Japan, September 20-22, 2006, Proceedings
Yasubumi Sakakibara; Satoshi Kobayashi; Kengo Sato; Tetsuro Nishino; Etsuji Tomita (eds.)
Conference: 8th International Colloquium on Grammatical Inference (ICGI), Tokyo, Japan, September 20-22, 2006
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Not available.
Availability
| Detected institution | Publication year | Browse | Download | Request |
|---|---|---|---|---|
| Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-45264-5
Electronic ISBN
978-3-540-45265-2
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Copyright information
© Springer-Verlag Berlin Heidelberg 2006
Table of contents
doi: 10.1007/11872436_11
Learning Analysis by Reduction from Positive Data
František Mráz; Friedrich Otto; Martin Plátek
Analysis by reduction is a linguistically motivated method for checking the correctness of a sentence. It can be modelled by restarting automata. In this paper we propose a method for learning restarting automata that are strictly locally testable (SLT-R-automata). The method is based on the concept of identification in the limit from positive examples only. We also characterize the class of languages accepted by SLT-R-automata with respect to the Chomsky hierarchy.
- Regular Papers | Pp. 125-136
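The idea of analysis by reduction, stepwise simplification of a sentence until a core form remains, can be sketched as follows. This is a toy with made-up rules and vocabulary, not the SLT-R construction from the paper:

```python
# Toy analysis by reduction: a sentence is accepted if repeated application
# of length-reducing rewrite rules turns it into a known core sentence.
# Rules and core set are invented for illustration only.

RULES = [
    ("very ", ""),       # drop an intensifier
    ("old ", ""),        # drop an adjective
    ("the cat", "cat"),  # drop a determiner
]
CORE = {"cat sleeps"}

def reduces(sentence: str) -> bool:
    current = sentence
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            if lhs in current:
                current = current.replace(lhs, rhs, 1)  # one reduction step
                changed = True
                break
    return current in CORE

print(reduces("the very old cat sleeps"))  # True: reductions reach the core
print(reduces("dog barks"))                # False: no rule applies
```

A restarting automaton performs exactly this kind of cycle: scan, rewrite a bounded window into something shorter, restart.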
doi: 10.1007/11872436_12
Inferring Grammars for Mildly Context Sensitive Languages in Polynomial-Time
Tim Oates; Tom Armstrong; Leonor Becerra Bonache; Mike Atamas
Natural languages contain regular, context-free, and context-sensitive syntactic constructions, yet none of these classes of formal languages can be identified in the limit from positive examples. Mildly context-sensitive languages are able to represent some context-sensitive constructions, those most common in natural languages, such as multiple agreement, crossed agreement, and duplication. These languages are attractive for natural language applications due to their expressiveness, and the fact that they are not fully context-sensitive should lead to computational advantages as well. We realize one such computational advantage by presenting the first polynomial-time algorithm for inferring Simple External Context Grammars, a class of mildly context-sensitive grammars, from positive examples.
Keywords: Natural Language; Regular Language; Bijective Mapping; Computational Advantage; Lexical Semantic.
- Regular Papers | Pp. 137-147
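The "duplication" construction mentioned in the abstract is the copy language {xx}, a classic context-sensitive pattern. Membership is trivial to test programmatically (this illustrates the pattern only; the paper's inference algorithm is not reproduced here):

```python
def is_duplication(w: str) -> bool:
    """Membership test for the copy language {xx : x a string},
    the 'duplication' pattern found in natural language."""
    n = len(w)
    return n % 2 == 0 and w[: n // 2] == w[n // 2 :]

print(is_duplication("abcabc"))  # True: 'abc' duplicated
print(is_duplication("abcab"))   # False: odd length
```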
doi: 10.1007/11872436_13
Planar Languages and Learnability
Alexander Clark; Christophe Costa Florêncio; Chris Watkins; Mariette Serayet
Strings can be mapped into Hilbert spaces using feature maps such as the Parikh map. Languages can then be defined as the pre-image of hyperplanes in the feature space, rather than using grammars or automata. These are the planar languages. In this paper we show that, using techniques from kernel-based learning, we can represent and efficiently learn, from positive data alone, various linguistically interesting context-sensitive languages. In particular, we show that the cross-serial dependencies in Swiss German, which established the non-context-freeness of natural language, are learnable using a standard kernel. We demonstrate polynomial-time identifiability in the limit of these classes, discuss some of their language-theoretic properties, and examine their relationship to the choice of kernel/feature map.
- Regular Papers | Pp. 148-160
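The Parikh-map construction is easy to make concrete. The sketch below is a toy under strong assumptions (two-letter alphabet, plain symbol counts, hand-rolled linear algebra instead of the paper's kernel machinery): it recovers a hyperplane from the Parikh vectors of positive samples of a^n b^n and uses its pre-image as the language. Note that this particular plane also accepts any string with equally many a's and b's, such as "abab", which is exactly why richer feature maps matter:

```python
from collections import Counter

def parikh(w, alphabet=("a", "b")):
    """Parikh map: string -> vector of symbol counts (a simple feature map)."""
    c = Counter(w)
    return tuple(c[s] for s in alphabet)

def learn_plane(samples):
    """Recover a hyperplane through the Parikh vectors of the samples
    (hand-rolled for a 2-letter alphabet)."""
    vecs = [parikh(w) for w in samples]
    x0 = vecs[0]
    d = next((tuple(v[i] - x0[i] for i in range(2))
              for v in vecs[1:] if v != x0), None)
    if d is None:            # all samples share one Parikh vector
        return None
    normal = (-d[1], d[0])   # orthogonal to the observed direction
    offset = normal[0] * x0[0] + normal[1] * x0[1]
    return normal, offset

def member(w, plane):
    """The planar language: pre-image of the hyperplane under the Parikh map."""
    normal, offset = plane
    v = parikh(w)
    return normal[0] * v[0] + normal[1] * v[1] == offset

plane = learn_plane(["ab", "aabb", "aaabbb"])   # positives from a^n b^n
print(plane)                      # ((-1, 1), 0), i.e. #a - #b = 0
print(member("aaaabbbb", plane))  # True: lies on the learned plane
print(member("aaab", plane))      # False
```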
doi: 10.1007/11872436_14
A Unified Algorithm for Extending Classes of Languages Identifiable in the Limit from Positive Data
Mitsuo Wakatsuki; Etsuji Tomita; Go Yamada
We are concerned with a unified algorithm for extending classes of languages identifiable in the limit from positive data. Let $\mathcal{L}$ be a base class of languages and let $\mathcal{X}$ be a class of finite subsets of strings. The extended class of $\mathcal{L}$, denoted $\mathcal{C}(\mathcal{L}, \mathcal{X})$, is defined from $\mathcal{L}$ and $\mathcal{X}$. We give a sufficient condition for $\mathcal{C}(\mathcal{L}, \mathcal{X})$ to be identifiable in the limit from positive data, and we present a unified identification algorithm for it. Furthermore, we show that some proper subclasses of $\mathcal{C}(\mathcal{L}, \mathcal{X})$ are polynomial-time identifiable in the limit from positive data in the sense of Yokomori.
Keywords: Polynomial Time; Target Language; Inductive Inference; Regular Language; Positive Data.
- Regular Papers | Pp. 161-174
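Identification in the limit from positive data, the setting underlying this paper, can be illustrated with a deliberately tiny class that is not the paper's $\mathcal{C}(\mathcal{L}, \mathcal{X})$: the class of languages of the form wΣ* for a fixed prefix w. The learner conjectures the longest common prefix of the samples seen so far, and on any presentation of wΣ* the conjecture eventually stabilises at w:

```python
from os.path import commonprefix  # character-wise common prefix of strings

# Toy identification in the limit from positive data for the class
# { wS* : w a fixed prefix } (illustrative; not the paper's framework).
def learner(samples_so_far):
    """Conjecture after each new positive example: the longest common
    prefix of everything seen so far."""
    return commonprefix(samples_so_far)

presentation = ["abba", "abab", "abbb", "aba"]   # positives from "ab"S*
conjectures = [learner(presentation[: i + 1]) for i in range(len(presentation))]
print(conjectures)  # ['abba', 'ab', 'ab', 'ab'] -- converges to 'ab'
```

After finitely many examples the conjecture never changes again, which is exactly the convergence criterion of the identification-in-the-limit model.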
doi: 10.1007/11872436_15
Protein Motif Prediction by Grammatical Inference
Piedachu Peris; Damián López; Marcelino Campos; José M. Sempere
The rapid growth of protein sequence databases is exceeding the capacity to biochemically and structurally characterize new proteins. The development of tools to locate, within protein sequences, those subsequences with an associated function or specific feature is therefore very important. In this work, we propose a method to predict one such functional motif (the coiled coil), which is related to protein interaction. Our approach uses even linear language inference to obtain a transducer that is then used to label unknown sequences. The experiments carried out show that our method outperforms previous approaches.
Keywords: Grammatical inference; bioinformatics; protein motif location.
- Regular Papers | Pp. 175-187
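The "even linear language" machinery rests on a standard trick: inference of even linear languages reduces to regular inference by reading a string from both ends simultaneously and emitting symbol pairs. A minimal sketch of that fold (the paper's transducer and biological features are not reproduced here):

```python
def fold(w: str):
    """Reduce even linear language inference to regular inference:
    read the string from both ends at once, emitting (first, last)
    symbol pairs, plus a lone middle symbol for odd length."""
    out = []
    i, j = 0, len(w) - 1
    while i < j:
        out.append((w[i], w[j]))
        i += 1
        j -= 1
    if i == j:                  # odd length: unpaired middle symbol
        out.append((w[i],))
    return out

print(fold("aabb"))   # [('a', 'b'), ('a', 'b')] -- a^n b^n folds to one repeated pair
print(fold("abcba"))  # [('a', 'a'), ('b', 'b'), ('c',)]
```

Over the paired alphabet, a^n b^n becomes the regular language of repeated ('a','b') pairs, so any regular inference method can learn it.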
doi: 10.1007/11872436_16
Grammatical Inference in Practice: A Case Study in the Biomedical Domain
Sophia Katrenko; Pieter Adriaans
In this paper we discuss an approach to named entity recognition (NER) based on grammatical inference (GI). Previous GI approaches have aimed at constructing a grammar underlying a given text source. It has been noted that the rules produced by GI can also be interpreted semantically [16], with a non-terminal describing interchangeable elements that are instances of the same concept. This observation leads to the hypothesis that GI might be useful for finding concept instances in a text. Furthermore, it should also be possible to discover relations between concepts, or more precisely, the way such relations are expressed linguistically. Throughout the paper, we propose a general framework for using GI for named entity recognition by discussing several possible approaches. In addition, we demonstrate that these methods work successfully on biomedical data using an existing GI tool.
Keywords: Equivalence Class; Jurkat Cell; Named Entity Recognition; Entity Recognition; Semantic Class.
- Regular Papers | Pp. 188-200
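The core intuition, that interchangeable elements form an equivalence class readable as a semantic class, can be sketched distributionally. The corpus below is invented for illustration; real GI tools induce these classes as non-terminals rather than by exact context matching:

```python
from collections import defaultdict

# Words sharing the same set of (left, right) contexts are grouped into
# one equivalence class, which can be read as a set of concept instances.
sentences = [
    "il2 activates jurkat cells",
    "il4 activates jurkat cells",
    "il2 inhibits t cells",
    "il4 inhibits t cells",
]

contexts = defaultdict(set)
for s in sentences:
    toks = s.split()
    for i, w in enumerate(toks):
        left = toks[i - 1] if i > 0 else "<s>"
        right = toks[i + 1] if i + 1 < len(toks) else "</s>"
        contexts[w].add((left, right))

classes = defaultdict(set)
for w, ctx in contexts.items():
    classes[frozenset(ctx)].add(w)  # identical context sets -> same class

for members in classes.values():
    if len(members) > 1:
        print(sorted(members))  # ['il2', 'il4'] share all their contexts
```

Here the two interleukin names end up in one class because they occur in exactly the same contexts, which is the NER signal the paper exploits.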
doi: 10.1007/11872436_17
Inferring Grammar Rules of Programming Language Dialects
Alpana Dubey; Pankaj Jalote; Sanjeev Kumar Aggarwal
In this paper we address the problem of grammatical inference in the programming language domain. The grammar of a programming language is an important asset because it is used in developing many software engineering tools. Sometimes, grammars of languages are not available and have to be inferred from the source code, especially in the case of programming language dialects. We propose an approach for inferring the grammar of a programming language when an incomplete grammar, along with a set of correct programs, is given as input. The approach infers a set of grammar rules such that the addition of these rules makes the initial grammar complete. A grammar is complete if it parses all the input programs successfully. We also propose a rule evaluation order, i.e., an order in which the rules are evaluated for correctness. A set of rules is correct if its addition makes the grammar complete. Experiments show that the proposed rule evaluation order improves the process of grammar inference.
Keywords: Programming language grammars; Dialects; Minimum Description Length Principle.
- Regular Papers | Pp. 201-213
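The completeness check at the heart of this setup is just "does the extended grammar parse every program?". A minimal sketch with a CYK parser over a toy CNF grammar follows; the grammar, programs, and size-based candidate order are invented stand-ins for the paper's dialect grammars and MDL-style evaluation order:

```python
def cyk(tokens, rules, start="S"):
    """CYK membership test for a grammar in Chomsky normal form.
    rules: list of (lhs, rhs) with rhs a 1-tuple (terminal) or 2-tuple."""
    n = len(tokens)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, t in enumerate(tokens):
        for lhs, rhs in rules:
            if rhs == (t,):
                table[i][i + 1].add(lhs)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, rhs in rules:
                    if (len(rhs) == 2 and rhs[0] in table[i][k]
                            and rhs[1] in table[k][j]):
                        table[i][j].add(lhs)
    return start in table[0][n]

# Incomplete toy grammar: parses "ab" but not "aabb".
grammar = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",)), ("C", ("S", "B"))]
programs = ["ab", "aabb"]

# Candidate rules tried in order of increasing size, a crude stand-in for
# the MDL-based rule evaluation order the paper proposes.
candidates = [("S", ("A", "A")), ("S", ("B", "B")), ("S", ("A", "C"))]
candidates.sort(key=lambda r: len(r[1]))

for rule in candidates:
    if all(cyk(list(p), grammar + [rule]) for p in programs):
        print("completing rule:", rule)  # ('S', ('A', 'C'))
        break
```

Adding S -> A C (with the existing C -> S B) closes the grammar over a^n b^n, so both programs parse and the search stops.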
doi: 10.1007/11872436_18
The Tenjinno Machine Translation Competition
Bradford Starkie; Menno van Zaanen; Dominique Estival
This paper describes the Tenjinno Machine Translation Competition held as part of the International Colloquium on Grammatical Inference 2006. The competition aimed to promote the development of new and better practical grammatical inference algorithms for machine translation. Tenjinno focuses on formal models used in machine translation. We discuss the design issues and decisions made when creating the competition. For the purpose of setting the competition tasks, a measure of the complexity of learning a transducer was developed. This measure has enabled us to compare the competition tasks with other published results: the problems solved in the competition were of greater complexity and were solved with lower word error rates than previously published results. In addition, the complexity measures and benchmark problems can be used to track the progress of the state of the art in the future.
- Regular Papers | Pp. 214-226
doi: 10.1007/11872436_19
Large Scale Inference of Deterministic Transductions: Tenjinno Problem 1
Alexander Clark
We discuss the problem of large scale grammatical inference in the context of the Tenjinno competition, with reference to the inference of deterministic finite state transducers, and describe the design of the algorithms and the design and implementation of the program that solved the first problem. Although the OSTIA algorithm has good asymptotic guarantees for this class of problems, the amount of data it requires is prohibitive. We therefore developed a new strategy for inferring large transducers, better suited to large random instances of the type in question, which combines traditional state-merging algorithms for finite-state automaton inference with EM-based alignment algorithms and state-splitting algorithms.
Keywords: Machine Translation; Expectation Maximisation Algorithm; Input String; State Automaton; Input Language.
- Regular Papers | Pp. 227-239
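The first phase of OSTIA-style transducer inference can be sketched concretely: build a prefix-tree transducer from (input, output) training pairs, then make it onward by pushing the longest common prefix of every state's outputs as close to the initial state as possible. The merging, EM alignment, and state splitting described in the abstract are not reproduced here:

```python
from os.path import commonprefix

def prefix_tree(pairs):
    """Prefix-tree transducer: states are input prefixes."""
    children = {"": {}}
    final = {}                      # full input string -> training output
    for x, y in pairs:
        node = ""
        for s in x:
            nxt = node + s
            children.setdefault(node, {})[s] = nxt
            children.setdefault(nxt, {})
            node = nxt
        final[node] = y
    return children, final

def state_lcp(children, final, node=""):
    """Longest common prefix of all outputs reachable through `node`."""
    outs = [final[node]] if node in final else []
    outs += [state_lcp(children, final, c) for c in children[node].values()]
    return commonprefix(outs)

def onward_edges(children, final):
    """After onwarding, the output emitted on each edge is the child's
    LCP with the parent's LCP stripped off the front."""
    edges = {}
    for node, succ in children.items():
        base = state_lcp(children, final, node)
        for sym, child in succ.items():
            edges[(node, sym)] = state_lcp(children, final, child)[len(base):]
    return edges

pairs = [("a", "x"), ("ab", "xy"), ("ac", "xz")]
children, final = prefix_tree(pairs)
print(state_lcp(children, final, ""))  # 'x' -- common to all outputs
print(onward_edges(children, final))   # {('', 'a'): '', ('a', 'b'): 'y', ('a', 'c'): 'z'}
```

Onwarding makes the transducer emit output as early as possible, which is what lets OSTIA's subsequent state merging detect compatible states.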
doi: 10.1007/11872436_20
A Discriminative Model of Stochastic Edit Distance in the Form of a Conditional Transducer
Marc Bernard; Jean-Christophe Janodet; Marc Sebban
Many real-world applications such as spell-checking or DNA analysis use the Levenshtein edit distance to compute similarities between strings. In practice, the costs of the primitive edit operations (insertion, deletion and substitution of symbols) are generally hand-tuned. In this paper, we propose an algorithm to learn these costs. The underlying model is a probabilistic transducer, computed using grammatical inference techniques, which allows us to learn both the structure and the probabilities of the model. Beyond the fact that the learned transducers are neither deterministic nor stochastic in the standard terminology, they are conditional, and thus independent of the distribution of the input strings. Finally, we show through experiments that our method allows us to design cost functions that depend on the string context in which the edit operations are used. In other words, we obtain a kind of context-sensitive edit distance.
Keywords: Edit Distance; Stochastic Transducers; Discriminative Models; Grammatical Inference.
- Regular Papers | Pp. 240-252
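The hand-tuned setting the paper starts from is the classical Levenshtein dynamic program with explicit cost parameters. The sketch below is that standard algorithm, not the paper's learned conditional transducer; the parameter names are ours:

```python
def edit_distance(s, t, ins=1.0, dele=1.0, sub=1.0):
    """Levenshtein distance with tunable primitive-operation costs --
    the hand-tuned costs that the paper proposes to learn instead."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + dele
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if s[i - 1] == t[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,      # delete s[i-1]
                          d[i][j - 1] + ins,       # insert t[j-1]
                          d[i - 1][j - 1] + cost)  # substitute / match
    return d[m][n]

print(edit_distance("kitten", "sitting"))           # 3.0
print(edit_distance("kitten", "sitting", sub=2.0))  # 5.0
```

With sub=2.0 each substitution costs as much as a delete plus an insert, so the distance rises from 3 to 5; the paper's contribution is to learn such costs, conditioned on string context, rather than fix them by hand.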