Publications catalog - books


MICAI 2007: Advances in Artificial Intelligence: 6th Mexican International Conference on Artificial Intelligence, Aguascalientes, Mexico, November 4-10, 2007. Proceedings

Alexander Gelbukh ; Ángel Fernando Kuri Morales (eds.)

Conference: 6th Mexican International Conference on Artificial Intelligence (MICAI). Aguascalientes, Mexico. November 4, 2007 - November 10, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Artificial Intelligence (incl. Robotics); Computation by Abstract Devices; Mathematical Logic and Formal Languages; Image Processing and Computer Vision

Availability
Detected institution Publication year Browse Download Request
Not detected 2007 SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-76630-8

Electronic ISBN

978-3-540-76631-5

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Variants of Tree Kernels for XML Documents

Peter Geibel; Helmar Gust; Kai-Uwe Kühnberger

In this paper, we discuss tree kernels that can be applied to the classification of XML documents based on their DOM trees. DOM trees are ordered trees in which every node may be labeled by a vector of attributes, including its XML tag and its textual content. We describe four new kernels suitable for this kind of tree: a tree kernel derived from the well-known parse tree kernel; the set tree kernel, which allows permutations of children; the string tree kernel, an extension of the so-called partial tree kernel; and the soft tree kernel, which is based on the set tree kernel and takes into account a “fuzzy” comparison of child positions. We present first results on an artificial data set, on a corpus of newspaper articles for which we want to determine the type (genre) of an article based on its structure alone, and on the well-known SUSANNE corpus.

- Natural Language Processing | Pp. 850-860
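
For orientation, the classic parse tree kernel mentioned in the abstract above can be sketched as follows. This is a minimal Python sketch of the standard Collins and Duffy subset-tree recursion, not one of the four kernels proposed in the paper. The tuple-based tree encoding, the function names, and the decay parameter `lam` are assumptions made for illustration only.

```python
from typing import Iterator, Tuple

# Hypothetical encoding: a node is (label, children); a leaf has children == ().
Node = Tuple[str, tuple]


def production(node: Node) -> tuple:
    """Return the node label together with the ordered labels of its children."""
    label, children = node
    return (label, tuple(child[0] for child in children))


def internal_nodes(node: Node) -> Iterator[Node]:
    """Yield every node that has at least one child, in pre-order."""
    _, children = node
    if children:
        yield node
        for child in children:
            yield from internal_nodes(child)


def common_subtrees(n1: Node, n2: Node, lam: float) -> float:
    """Weighted count of common tree fragments rooted at n1 and n2."""
    # Leaves root no fragments; differing productions share none.
    if not n1[1] or not n2[1] or production(n1) != production(n2):
        return 0.0
    result = lam
    # Productions match, so both nodes have equally many, equally labeled children.
    for c1, c2 in zip(n1[1], n2[1]):
        result *= 1.0 + common_subtrees(c1, c2, lam)
    return result


def tree_kernel(t1: Node, t2: Node, lam: float = 1.0) -> float:
    """Sum the common-fragment counts over all pairs of internal nodes."""
    return sum(
        common_subtrees(a, b, lam)
        for a in internal_nodes(t1)
        for b in internal_nodes(t2)
    )


# Two toy trees that differ in a single leaf.
t1 = ("S", (("NP", (("dog", ()),)), ("VP", (("barks", ()),))))
t2 = ("S", (("NP", (("cat", ()),)), ("VP", (("barks", ()),))))
print(tree_kernel(t1, t1), tree_kernel(t1, t2))
```

For DOM trees, the labels could be XML tag names. Because this recursion compares children strictly position by position, it does not tolerate reordered children, which is the limitation the set tree kernel in the abstract is said to address.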

Textual Energy of Associative Memories: Performant Applications of Enertex Algorithm in Text Summarization and Topic Segmentation

Silvia Fernández; Eric SanJuan; Juan Manuel Torres-Moreno

In this paper we present a neural network approach, inspired by the statistical physics of magnetic systems, to study fundamental problems of Natural Language Processing (NLP). The algorithm models documents as a neural network whose Textual Energy is studied. We obtained good results when applying this method to automatic summarization and topic segmentation.

- Natural Language Processing | Pp. 861-871

A New Hybrid Summarizer Based on Vector Space Model, Statistical Physics and Linguistics

Iria da Cunha; Silvia Fernández; Patricia Velázquez Morales; Jorge Vivaldi; Eric SanJuan; Juan Manuel Torres-Moreno

In this article we present a hybrid approach for automatic summarization of Spanish medical texts. Many systems for automatic summarization use statistics or linguistics, but only a few combine both techniques. Our idea is that, to reach a good summary, we need to use linguistic aspects of texts, but we should also benefit from the advantages of statistical techniques. We have integrated the Cortex (Vector Space Model) and Enertex (statistical physics) systems, coupled with the Yate term extractor, and the Disicosum system (linguistics). We compared these systems and afterwards integrated them into a hybrid approach. Finally, we applied this hybrid system to a corpus of medical articles and evaluated its performance, obtaining good results.

- Natural Language Processing | Pp. 872-882

Graph Decomposition Approaches for Terminology Graphs

Mohamed Didi Biha; Bangaly Kaba; Marie-Jean Meurs; Eric SanJuan

We propose a graph-based decomposition methodology for a network of document features represented by a terminology graph. The graph is automatically extracted from raw data using Natural Language Processing techniques implemented in the TermWatch system. These graphs are small worlds. Based on clique minimal separators and the associated graph of atoms (an atom being a subgraph without a clique separator), we show that the terminology graph can be divided into a central kernel, which is a single atom, and a periphery made of small atoms. Moreover, the central kernel can be separated based on small optimal minimal separators.

- Natural Language Processing | Pp. 883-893

An Improved Fast Algorithm of Frequent String Extracting with no Thesaurus

Yumeng Zhang; Chuanhan Liu

Unlisted word identification is a hot topic in Chinese information processing research. String frequency statistics is a simple and effective method for extracting unlisted words. Existing algorithms cannot meet the high-speed requirements of large-scale text processing systems. Following the strategies of increasing string length and level-wise scanning, this paper presents a fast algorithm for extracting frequent strings and improves the string frequency statistical method. The approach needs neither a thesaurus nor word segmentation; instead, it uses average mutual information to decide whether each frequent string is a word. Experiments show that, compared with previous approaches, the algorithm offers high speed and an accuracy of 91% and above.

- Natural Language Processing | Pp. 894-903
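
The level-wise idea in the abstract above can be illustrated with a short Python sketch. It assumes an Apriori-style pruning rule (a string of length n is counted only if both of its length n-1 substrings were frequent) and a simple averaged pointwise mutual information score over all split points. Both choices, along with the function names and thresholds, are illustrative assumptions rather than the authors' algorithm or their exact mutual information measure.

```python
import math
from collections import Counter
from typing import Dict


def frequent_strings(text: str, min_count: int = 2, max_len: int = 4) -> Dict[str, int]:
    """Collect substrings occurring at least min_count times, one length per pass."""
    frequent: Dict[str, int] = {}
    previous: Dict[str, int] = {}
    for n in range(1, max_len + 1):
        counts: Counter = Counter()
        for i in range(len(text) - n + 1):
            candidate = text[i:i + n]
            # Pruning: skip candidates whose prefix or suffix was not frequent.
            if n > 1 and (candidate[:-1] not in previous or candidate[1:] not in previous):
                continue
            counts[candidate] += 1
        level = {s: c for s, c in counts.items() if c >= min_count}
        if not level:
            break  # no frequent string of this length, so none can be longer
        frequent.update(level)
        previous = level
    return frequent


def average_mutual_information(s: str, counts: Dict[str, int], text_len: int) -> float:
    """Average log2(p(s) / (p(left) * p(right))) over all splits of s (len(s) >= 2)."""
    def prob(sub: str) -> float:
        return counts[sub] / (text_len - len(sub) + 1)

    scores = [
        math.log2(prob(s) / (prob(s[:k]) * prob(s[k:])))
        for k in range(1, len(s))
    ]
    return sum(scores) / len(scores)


text = "abcabcabc"
freq = frequent_strings(text, min_count=3)
print(sorted(freq))
print(average_mutual_information("abc", freq, len(text)))
```

The same functions work unchanged on Chinese text, since Python strings are sequences of characters. A higher score suggests that the parts of a string co-occur more often than chance, which is the kind of evidence the abstract describes using to decide whether a frequent string is a word.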

Using Lexical Patterns for Extracting Hyponyms from the Web

Rosa M. Ortega-Mendoza; Luis Villaseñor-Pineda; Manuel Montes-y-Gómez

This paper describes a method for extracting hyponyms from free text. In particular, it explores two main questions: on the one hand, the possibility of reaching favorable results using only lexical extraction patterns; on the other hand, the usefulness of measuring the confidence of instances based on the confidence of patterns, and vice versa. The experimental results are encouraging: they show that the proposed method can be a practical, high-precision approach for extracting hyponyms for a given set of concepts.

- Natural Language Processing | Pp. 904-911
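
The abstract above does not give the confidence formulas, so the following Python sketch only illustrates the general idea of scoring instances from patterns and patterns from instances. It uses a HITS-style mutual reinforcement as a stand-in technique. The function name, the update rule, the normalization, and the toy pattern strings are all assumptions, not the authors' method.

```python
from typing import Dict, List, Tuple


def mutual_confidence(
    matches: List[Tuple[str, str]], iterations: int = 20
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Iteratively score patterns and instances from each other.

    matches holds (pattern, instance) pairs meaning "this pattern extracted
    this instance"; it is assumed to be non-empty.
    """
    patterns = {p for p, _ in matches}
    instances = {i for _, i in matches}
    pattern_conf = dict.fromkeys(patterns, 1.0)
    instance_conf: Dict[str, float] = {}
    for _ in range(iterations):
        # An instance is as trustworthy as the patterns that extracted it.
        instance_conf = {
            i: sum(pattern_conf[p] for p, j in matches if j == i) for i in instances
        }
        top = max(instance_conf.values())
        instance_conf = {i: v / top for i, v in instance_conf.items()}
        # A pattern is as trustworthy as the instances it extracted.
        pattern_conf = {
            p: sum(instance_conf[i] for q, i in matches if q == p) for p in patterns
        }
        top = max(pattern_conf.values())
        pattern_conf = {p: v / top for p, v in pattern_conf.items()}
    return pattern_conf, instance_conf


# Toy data: hypothetical lexical patterns and the candidate hyponyms they matched.
matches = [
    ("such as", "dog"),
    ("such as", "cat"),
    ("and other", "dog"),
    ("like", "chair"),
]
pattern_conf, instance_conf = mutual_confidence(matches)
print(pattern_conf)
print(instance_conf)
```

In this toy run, candidates supported by several well-scored patterns end up ranked above a candidate matched by a single isolated pattern.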

On the Usage of Morphological Tags for Grammar Induction

Omar Juárez Gambino; Hiram Calvo

We present a study on the effect of adding morphological tags to the training corpus of a grammar inductor. For this purpose, we carried out several experiments using the grammar induction system called Alignment-Based Learning (ABL) and the CAST-3LB syntactically tagged Spanish corpus for training and testing. ABL produces a set of possible constituents with a word alignment process. We developed an algorithm which converts the hypotheses generated by ABL into ordered production rules. Then our algorithm groups them into possible phrase groups (constituents). These phrase groups correspond to the syntactic tagging of the unannotated text. We compared the phrase groups obtained by our algorithm with the manually tagged groups of CAST-3LB. The experiments in the grammar induction process consisted of trying three different variants of the training corpus: (1) using words; (2) using only the morphological tags; and (3) adding morphological tags to words. Our experiments show that the inclusion of morphological tags in the grammar induction process significantly improves the performance of ABL.

- Natural Language Processing | Pp. 912-921
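
The three training-corpus variants listed in the abstract above can be made concrete with a small Python sketch. The tag names, the `_` separator used to attach a tag to a word, and the function name are hypothetical; the abstract does not say how the authors encoded the combined variant.

```python
from typing import Dict, List, Tuple


def corpus_variants(
    tagged_sentence: List[Tuple[str, str]], sep: str = "_"
) -> Dict[str, List[str]]:
    """Build the three token sequences from (word, morphological tag) pairs."""
    return {
        # Variant 1: words only.
        "words": [word for word, _ in tagged_sentence],
        # Variant 2: morphological tags only.
        "tags": [tag for _, tag in tagged_sentence],
        # Variant 3: each word with its tag attached (encoding is an assumption).
        "words+tags": [f"{word}{sep}{tag}" for word, tag in tagged_sentence],
    }


# Toy Spanish sentence with made-up tag labels.
sentence = [("el", "DET"), ("gato", "NOUN"), ("duerme", "VERB")]
for name, tokens in corpus_variants(sentence).items():
    print(name, tokens)
```

Each variant would then be fed to the grammar inductor as a separate training corpus, so that the effect of the tags can be compared on the same sentences.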

Web-Based Model for Disambiguation of Prepositional Phrase Usage

Sofía N. Galicia-Haro; Alexander Gelbukh

We explore some Web-based methods to differentiate strings of words corresponding to Spanish prepositional phrases that can function either as a regular prepositional phrase or as an idiomatic prepositional phrase. These Spanish prepositional phrases have the form preposition-nominal phrase-preposition (P-NP-P), for example: ‘by means of’, ‘in order to’, ‘with respect to’. We propose an unsupervised method that verifies linguistic properties of idiomatic prepositional phrases. Results are presented for the method applied to newspaper sentences.

- Natural Language Processing | Pp. 922-932

Identification of Chinese Verb Nominalization Using Support Vector Machine

Jinglei Zhao; Changxiong Chen; Hui Liu; Ruzhan Lu

Nominalization is a highly productive phenomenon across languages. The process of nominalization transforms a verb predicate into a referential expression. Identifying nominalizations presents a big challenge to Chinese language processing because there is no morphological difference between a verb nominalization and its corresponding verb predicate. In this paper, we apply a support vector machine to identify nominalizations in text. We extensively explore various nominalization-specific classification features for the identification task, many of which are introduced in the literature for the first time. The experimental results show that our method is very effective.

- Natural Language Processing | Pp. 933-943

Enrichment of Automatically Generated Texts Using Metaphor

Raquel Hervás; Rui P. Costa; Hugo Costa; Pablo Gervás; Francisco C. Pereira

Computer-generated texts are still far from human-generated ones. Along with the limited vocabulary and syntactic structures they present, their lack of creativity and abstraction is what marks them as artificial. The use of metaphors and analogies is one of the creative tools used by humans that is difficult to reproduce in a computer. A human writer would have no difficulty finding conceptual relations between the domain he is writing about and his knowledge of other domains in the world, and would use this information in the text while avoiding possible confusion. However, this task is not trivial for a computer. This paper presents an approach to the use of metaphors for referring to concepts in an automatically generated text. From a given mapping between the concepts of two domains, we intend to generate metaphors for some concepts by relating them to the target metaphoric domain, and to insert these metaphorical references into a text. We also study the ambiguity induced by metaphor and how to reduce it.

- Natural Language Processing | Pp. 944-954