Publications catalog - books



Grammatical Inference: Algorithms and Applications: 8th International Colloquium, ICGI 2006, Tokyo, Japan, September 20-22, 2006, Proceedings

Yasubumi Sakakibara; Satoshi Kobayashi; Kengo Sato; Tetsuro Nishino; Etsuji Tomita (eds.)

In conference: 8th International Colloquium on Grammatical Inference (ICGI). Tokyo, Japan. September 20-22, 2006

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability

Available online via SpringerLink; publication year 2006.

Information

Resource type:

books

Print ISBN

978-3-540-45264-5

Electronic ISBN

978-3-540-45265-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2006

Publication rights information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Parsing Without Grammar Rules

Yuji Matsumoto

In this article, we present and contrast recent statistical approaches to word dependency parsing and lexicalized formalisms for grammar and semantics. We then consider the possibility of integrating those two extreme ideas, which leads to fully lexicalized parsing without any syntactic grammar rules.

Keywords: dependency parsing; lexicalized grammar; lexical semantics.

- Invited Papers | Pp. 1-6

Classification of Biological Sequences with Kernel Methods

Jean-Philippe Vert

We survey the foundations of kernel methods and the recent developments of kernels for variable-length strings, in the context of biological sequence analysis.

Keywords: Support Vector Machine; Kernel Method; Hidden State; Reproducing Kernel Hilbert Space; Biological Sequence.

- Invited Papers | Pp. 7-18
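
As a concrete illustration of a kernel for variable-length strings, the sketch below implements the classic k-spectrum kernel, which compares two sequences through their shared length-k substrings; the toy sequences and the choice k = 3 are our own assumptions, not taken from the chapter.

from collections import Counter

def spectrum_features(s, k):
    """Count all length-k substrings (the k-spectrum) of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def spectrum_kernel(s, t, k=3):
    """K(s, t) = sum over k-mers u of count_s(u) * count_t(u)."""
    fs, ft = spectrum_features(s, k), spectrum_features(t, k)
    return sum(c * ft[u] for u, c in fs.items())

# Toy DNA-like sequences of different lengths; the shared 3-mers
# (ACG, CGT) are what make them similar under this kernel.
print(spectrum_kernel("ACGTACGT", "ACGTTT"))   # -> 4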

Identification in the Limit of Systematic-Noisy Languages

Frédéric Tantini; Colin de la Higuera; Jean-Christophe Janodet

To study the problem of learning from noisy data, the common approach is to use a statistical model of noise; the influence of the noise is then assessed according to pragmatic or statistical criteria, within a paradigm that takes a distribution over the data into account. In this article, we instead study noise as a nonstatistical phenomenon, by defining the concept of systematic noise. We establish various ways of learning (in the limit) from noisy data. The first is based on a technique of reduction between problems and consists in learning from data known to be noisy and then denoising the learned function. The second consists in denoising the training examples on the fly, thus identifying good examples in the limit, and then learning from uncorrupted data. In both cases we give sufficient conditions under which learning is possible, and we show through various examples (coming in particular from the field of grammatical inference) that our techniques are complementary.

Keywords: Identification in the limit; languages; noise; pretopology.

- Regular Papers | Pp. 19-31
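
One simple way to picture systematic noise is through balls of the edit distance: every string within distance 1 of a word is a possible corruption of it. The sketch below computes such a ball; the two-letter alphabet and the ball-based reading are illustrative assumptions on our part, not the authors' exact definitions.

def ball1(w, alphabet="ab"):
    # All strings at Levenshtein distance <= 1 from w.
    out = {w}
    for i in range(len(w) + 1):
        for a in alphabet:
            out.add(w[:i] + a + w[i:])          # insertions
    for i in range(len(w)):
        out.add(w[:i] + w[i + 1:])              # deletions
        for a in alphabet:
            out.add(w[:i] + a + w[i + 1:])      # substitutions
    return out

print(sorted(ball1("ab")))   # every word within one edit of "ab"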

Ten Open Problems in Grammatical Inference

Colin de la Higuera

We propose ten open problems in the field of grammatical inference. In all cases the problems are theoretically oriented but correspond to practical questions. They cover the areas of polynomial learning models, learning from ordered alphabets, learning deterministic POMDPs, learning negotiation processes, and learning from context-free background knowledge.

- Regular Papers | Pp. 32-44

Polynomial-Time Identification of an Extension of Very Simple Grammars from Positive Data

Ryo Yoshinaka

The class of very simple grammars is known to be polynomial-time identifiable in the limit from positive data. This paper introduces an extension of very simple grammars called right-unique simple grammars, and presents an algorithm that identifies right-unique simple grammars in the limit from positive data. The learning algorithm possesses the following three properties. It computes a conjecture in time polynomial in the size of the input data, if the cardinality of the alphabet is regarded as a constant. It always outputs a grammar that is consistent with the input data. It never changes its conjecture unless a newly provided example contradicts the previous conjecture. The algorithm relies on a sub-algorithm, also presented in this paper, that solves the inclusion problem for a superclass of right-unique simple grammars.

Keywords: Polynomial Time; Regular Language; Positive Data; Inclusion Problem; Simple Language.

- Regular Papers | Pp. 45-58
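
The three properties claimed for the learner can be pictured with a generic identification-in-the-limit skeleton. The toy class below, L_k = { a^i : i <= k }, is our own stand-in, vastly simpler than right-unique simple grammars; only the consistency and conservatism of the update carry over.

def learner(positive_stream):
    k = 0                                # current conjecture: L_k
    for w in positive_stream:
        if len(w) > k:                   # example contradicts conjecture...
            k = len(w)                   # ...so, and only so, revise it
        yield k                          # L_k always contains all seen data

for conjecture in learner(["a", "aaa", "aa", "aaa"]):
    print("conjecture: L_%d" % conjecture)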

PAC-Learning Unambiguous NTS Languages

Alexander Clark

Non-terminally separated (NTS) languages are a subclass of deterministic context-free languages where there is a stable relationship between the substrings of the language and the non-terminals of the grammar. We show that when the distribution of samples is generated by a PCFG based on the same grammar as the target language, the class of unambiguous NTS languages is PAC-learnable from positive data alone, with polynomial bounds on data and computation.

Keywords: Target Language; Regular Language; Context Free Grammar; Context Free Language; Categorial Grammar.

- Regular Papers | Pp. 59-71
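
The stable relationship between substrings and non-terminals in an NTS grammar is usually approached through substitutability: substrings that occur in a common context can be generated by a common non-terminal. The sketch below extracts that relation from positive samples; it is a schematic reading of the idea, not the paper's PAC algorithm.

from collections import defaultdict

def contexts(samples):
    # Map each substring u to the set of contexts (l, r) with l u r a sample.
    ctx = defaultdict(set)
    for w in samples:
        for i in range(len(w)):
            for j in range(i + 1, len(w) + 1):
                ctx[w[i:j]].add((w[:i], w[j:]))
    return ctx

def substitutable_pairs(samples):
    ctx = contexts(samples)
    subs = sorted(ctx)
    return {(u, v) for u in subs for v in subs
            if u < v and ctx[u] & ctx[v]}   # share at least one context

print(substitutable_pairs(["ab", "aab", "abb"]))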

Incremental Learning of Context Free Grammars by Bridging Rule Generation and Search for Semi-optimum Rule Sets

Katsuhiko Nakamura

This paper describes novel methods of learning general context-free grammars from sample strings, implemented in the Synapse system. The main features of the system are incremental learning, rule generation based on bottom-up parsing of positive samples, and search for rule sets. From the results of parsing, a rule generation process called "bridging" synthesizes the production rules that fill in any missing parts of an incomplete derivation tree for each positive string. To cope with the fundamental complexity of learning CFGs, we employ methods that search for non-minimum, semi-optimum sets of rules, as well as incremental learning based on related grammars. One of the methods is a search strategy called "serial search," which finds additional rules for each positive sample rather than the minimum rule set for all positive samples, as in global search. The other methods are not minimizing the number of nonterminal symbols during rule generation and restricting the form of generated rules. The paper reports experimental results and compares the various synthesis methods.

Keywords: grammatical inference; CFL; bottom-up parsing; iterative deepening; Synapse.

- Regular Papers | Pp. 72-83
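
The "bridging" idea can be pictured on a CNF fragment: parse a positive string bottom-up, and where the table leaves the top cell empty, propose the production that would connect two already-derivable halves. The grammar, the sample string, and the single-rule repair below are our own simplification of the process, not the Synapse implementation.

def cky(w, unary, binary):
    # Bottom-up (CKY) recognition: t[i][j] = nonterminals deriving w[i:j].
    n = len(w)
    t = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, a in enumerate(w):
        t[i][i + 1] = {A for A, x in unary if x == a}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for A, B, C in binary:
                    if B in t[i][k] and C in t[k][j]:
                        t[i][j].add(A)
    return t

unary = [("A", "a"), ("B", "b")]
binary = []                                  # the grammar is still incomplete
w = "ab"
table = cky(w, unary, binary)
if "S" not in table[0][len(w)]:              # derivation tree has a gap...
    for k in range(1, len(w)):               # ...bridge two derivable halves
        for B in table[0][k]:
            for C in table[k][len(w)]:
                print("propose rule: S ->", B, C)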

Variational Bayesian Grammar Induction for Natural Language

Kenichi Kurihara; Taisuke Sato

This paper presents a new grammar induction algorithm for probabilistic context-free grammars (PCFGs). One approach to PCFG induction is based on parameter estimation; following this approach, we apply variational Bayes (VB), an approximation of Bayesian learning, to PCFGs. VB has been shown empirically to be less likely to cause overfitting, and its free energy has been used successfully in model selection. Our algorithm can be seen as a generalization of previously proposed PCFG induction algorithms. In the experiments, we show empirically that the induced grammars achieve better parsing results than those of other PCFG induction algorithms, and based on these better parsing results we give examples of recursive grammatical structures found by the proposed algorithm.

Keywords: Noun Phrase; Wall Street Journal; Parse Tree; Training Corpus; Bayesian Learning.

- Regular Papers | Pp. 84-96
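
The characteristic VB step for a PCFG replaces the usual inside-outside M-step by a digamma-based update: with a Dirichlet prior alpha over the rules of each non-terminal and expected rule counts c_r, each rule gets weight exp(digamma(alpha + c_r) - digamma(sum of alpha + c over rules with the same left-hand side)). The sketch below shows this update for one non-terminal; the prior value and the expected counts are invented for illustration.

from math import exp, fsum
from scipy.special import digamma

alpha = 1.0                                  # symmetric Dirichlet prior
counts = {("S", ("NP", "VP")): 10.0,         # expected counts (inside-outside)
          ("S", ("S", "S")): 2.0}            # for rules with left-hand side S
total = digamma(fsum(alpha + c for c in counts.values()))
weights = {r: exp(digamma(alpha + c) - total) for r, c in counts.items()}
print(weights)   # sub-normalized rule weights, characteristic of VB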

Stochastic Analysis of Lexical and Semantic Enhanced Structural Language Model

Shaojun Wang; Shaomin Wang; Li Cheng; Russell Greiner; Dale Schuurmans

In this paper, we present a directed Markov random field model that integrates trigram models, structural language models (SLMs) and probabilistic latent semantic analysis (PLSA) for the purpose of statistical language modeling. The SLM is essentially a generalization of shift-reduce probabilistic push-down automata, and is thus more complex and powerful than probabilistic context-free grammars (PCFGs). The added context-sensitivity due to the trigrams and PLSA, together with the violation of tree structure in the topology of the underlying random field model, makes the inference and parameter estimation problems plausibly intractable; however, analysis of the behavior of the lexical and semantic enhanced structural language model leads to a generalized inside-outside algorithm, and thus to rigorous, exact EM-type re-estimation of the composite language model parameters.

Keywords: Language modeling; structural language model; trigram; probabilistic latent semantic analysis.

- Regular Papers | Pp. 97-111
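
Of the three components, PLSA is the easiest to sketch in isolation: it models p(w | d) = sum over z of p(z | d) p(w | z) and is fit by EM. The toy corpus and dimensions below are invented, and the paper combines PLSA with a trigram model and an SLM rather than using it alone.

import numpy as np

rng = np.random.default_rng(0)
n_dw = rng.integers(1, 5, size=(4, 6)).astype(float)  # doc-word counts
Z = 2
p_z_d = rng.dirichlet(np.ones(Z), size=4)             # p(z | d)
p_w_z = rng.dirichlet(np.ones(6), size=Z)             # p(w | z)

for _ in range(50):
    # E-step: responsibilities p(z | d, w), shape (docs, words, topics).
    post = p_z_d[:, None, :] * p_w_z.T[None, :, :]
    post /= post.sum(axis=2, keepdims=True)
    # M-step: re-estimate both conditional distributions from soft counts.
    nz = n_dw[:, :, None] * post
    p_w_z = (nz.sum(axis=0) / nz.sum(axis=(0, 1))).T
    p_z_d = nz.sum(axis=1) / n_dw.sum(axis=1, keepdims=True)

print(p_z_d.round(2))   # inferred topic mixtures per document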

Using Pseudo-stochastic Rational Languages in Probabilistic Grammatical Inference

Amaury Habrard; François Denis; Yann Esposito

In probabilistic grammatical inference, a usual goal is to infer a good approximation of an unknown distribution P called a stochastic language. The estimate of P stands in some class of probabilistic models such as probabilistic automata (PA). In this paper, we focus on probabilistic models based on multiplicity automata (MA). The stochastic languages generated by MA are called rational stochastic languages; they strictly include the stochastic languages generated by PA and admit a very concise canonical representation. Despite the fact that this class is not recursively enumerable, it is efficiently identifiable in the limit by the algorithm DEES, introduced by the authors in a previous paper. However, the identification is not proper, and before convergence DEES can produce MA that do not define stochastic languages. Nevertheless, it is possible to use these MA to define stochastic languages. We show that they belong to a broader class of rational series, which we call pseudo-stochastic rational languages. The aim of this paper is twofold. First, we provide a theoretical study of pseudo-stochastic rational languages, the languages output by DEES, showing for example that this class is decidable in polynomial time. Second, we have carried out experiments comparing DEES with classical inference algorithms (ALERGIA and MDI); they show that DEES outperforms them in most cases.

Keywords: pseudo-stochastic rational languages; multiplicity automata; probabilistic grammatical inference.

- Regular Papers | Pp. 112-124
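
A multiplicity automaton assigns each string w = w_1 ... w_n the value iota^T A_{w_1} ... A_{w_n} tau, one matrix per letter; with suitable weights the resulting series is a (rational) stochastic language. The two-state weights below are invented, purely to show why MA admit such concise representations.

import numpy as np

iota = np.array([1.0, 0.0])                    # initial weights
tau = np.array([0.2, 0.5])                     # final weights
A = {"a": np.array([[0.5, 0.1], [0.0, 0.2]]),
     "b": np.array([[0.1, 0.1], [0.2, 0.1]])}

def value(w):
    # Series value: iota^T * A_{w_1} * ... * A_{w_n} * tau.
    v = iota
    for symbol in w:
        v = v @ A[symbol]
    return float(v @ tau)

for w in ["", "a", "ab", "ba"]:
    print(repr(w), "->", value(w))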