Publications catalog – books
MICAI 2006: Advances in Artificial Intelligence: 5th Mexican International Conference on Artificial Intelligence, Apizaco, Mexico, November 13-17, 2006, Proceedings
Alexander Gelbukh ; Carlos Alberto Reyes-Garcia (eds.)
Conference: 5th Mexican International Conference on Artificial Intelligence (MICAI). Apizaco, Mexico. November 13, 2006 – November 17, 2006
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Artificial Intelligence (incl. Robotics); Computation by Abstract Devices; Mathematical Logic and Formal Languages; Image Processing and Computer Vision
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-49026-5
Electronic ISBN
978-3-540-49058-6
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Publication rights information
© Springer-Verlag Berlin Heidelberg 2006
Table of contents
doi: 10.1007/11925231_81
Speeding Up Target-Language Driven Part-of-Speech Tagger Training for Machine Translation
Felipe Sánchez-Martínez; Juan Antonio Pérez-Ortiz; Mikel L. Forcada
When hidden-Markov-model-based part-of-speech (PoS) taggers used in machine translation systems are trained in an unsupervised manner, the use of target-language information has proven to give better results than the standard Baum-Welch algorithm. The target-language-driven training algorithm proceeds by translating every possible PoS tag sequence resulting from the disambiguation of the words in each source-language text segment into the target language, and using a target-language model to estimate the likelihood of the translation of each possible disambiguation. The main disadvantage of this method is that the number of translations to perform grows exponentially with segment length, translation being the most time-consuming task. In this paper, we present a method that uses knowledge obtained in an unsupervised manner to prune unlikely disambiguations in each text segment, so that the number of translations to be performed during training is reduced. The experimental results show that this new pruning method drastically reduces the number of translations performed during training (and, consequently, the time complexity of the algorithm) without degrading the tagging accuracy achieved.
- Natural Language Processing | Pp. 844-854
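The enumerate-then-prune idea in the abstract above can be sketched as follows. This is a toy illustration, not the authors' implementation: the segment, tag sets, and the verb-preferring prior are all hypothetical stand-ins for the knowledge obtained in an unsupervised manner.

```python
from itertools import product

# Hypothetical ambiguity classes: each word maps to its possible PoS tags.
segment = [("light", ["NOUN", "ADJ", "VERB"]), ("works", ["NOUN", "VERB"])]

def candidate_tag_sequences(segment):
    """Enumerate every disambiguation path; grows exponentially with length."""
    return list(product(*(tags for _, tags in segment)))

def prune(paths, score, keep=2):
    """Keep only the `keep` most likely paths under an a-priori score, so
    fewer candidates reach the costly translation-and-scoring step."""
    return sorted(paths, key=score, reverse=True)[:keep]

# Hypothetical unsupervised prior: prefer paths containing more verbs.
prior = lambda path: sum(tag == "VERB" for tag in path)

paths = candidate_tag_sequences(segment)
survivors = prune(paths, prior)
assert len(paths) == 6 and len(survivors) == 2
```

Only the surviving paths would then be translated and scored with the target-language model, which is where the reported speed-up comes from.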
doi: 10.1007/11925231_82
Defining Classifier Regions for WSD Ensembles Using Word Space Features
Harri M. T. Saarikoski; Steve Legrand; Alexander Gelbukh
Based on a recent evaluation of word sense disambiguation (WSD) systems [10], disambiguation methods have reached a standstill. In [10] we showed that it is possible to predict the best system for a target word using word features, and that with this 'optimal ensembling method' more accurate WSD ensembles can be built (3-5% over Senseval state-of-the-art systems, with a comparable amount of potential remaining). In the interest of developing more accurate ensembles, we here define the strong regions for three popular and effective classifiers used for the WSD task (Naive Bayes, NB; Support Vector Machine, SVM; Decision Rules, D) using word features (word grain, number of positive and negative training examples, dominant sense ratio). We also discuss the effect of the remaining, feature-based factors.
- Natural Language Processing | Pp. 855-867
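The per-word classifier routing described in this abstract can be sketched as a dispatch rule over word features. The thresholds below are invented for illustration only; the paper learns such regions empirically rather than hand-coding them.

```python
def pick_classifier(grain, n_train, dom_ratio):
    """Toy 'optimal ensembling' rule: route each target word to the classifier
    predicted to be strongest in its region of word space, using the features
    named in the abstract (grain, training-example count, dominant sense ratio).
    All thresholds are hypothetical."""
    if dom_ratio > 0.8:           # one sense dominates: simple rules suffice
        return "DecisionRules"
    if n_train < 50 * grain:      # few examples per sense: NB's prior helps
        return "NaiveBayes"
    return "SVM"                  # otherwise a large-margin classifier

assert pick_classifier(grain=4, n_train=200, dom_ratio=0.9) == "DecisionRules"
assert pick_classifier(grain=4, n_train=30, dom_ratio=0.5) == "NaiveBayes"
assert pick_classifier(grain=2, n_train=500, dom_ratio=0.4) == "SVM"
```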
doi: 10.1007/11925231_83
Impact of Feature Selection for Corpus-Based WSD in Turkish
Zeynep Orhan; Zeynep Altan
Word sense disambiguation (WSD) is an important intermediate stage for many natural language processing applications. The senses of an ambiguous word are the classification of the usages of that word. WSD is basically a mapping function from a context to a set of applicable senses, depending on various parameters. Resource selection, determination of senses for ambiguous words, choice of effective features, algorithms, and evaluation criteria are the major issues in a WSD system. This paper deals with feature selection strategies for the word sense disambiguation task in the Turkish language. There are many different features that can contribute to the meaning of a word. These features can vary according to metaphorical usage, the POS of the word, pragmatics, etc. The observations indicate that detecting the critical features can contribute more than the learning methodologies.
- Natural Language Processing | Pp. 868-878
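One minimal way to compare candidate features, in the spirit of the abstract above, is to score each feature by how well a majority-sense lookup over its values recovers the annotated sense. The instances and feature names below are hypothetical, not the paper's Turkish data.

```python
from collections import Counter, defaultdict

# Hypothetical sense-annotated instances for one ambiguous word:
# each instance pairs its feature values with the correct sense.
instances = [({"pos": "N", "cap": "y"}, "sense1"),
             ({"pos": "N", "cap": "n"}, "sense1"),
             ({"pos": "V", "cap": "y"}, "sense2"),
             ({"pos": "V", "cap": "n"}, "sense2")]

def feature_score(feature):
    """Score a feature by how well a majority-sense lookup over its values
    predicts the annotated sense (a crude proxy for feature selection)."""
    by_value = defaultdict(Counter)
    for feats, sense in instances:
        by_value[feats[feature]][sense] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(instances)

assert feature_score("pos") == 1.0   # PoS separates the two senses perfectly
assert feature_score("cap") == 0.5   # capitalisation is uninformative here
```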
doi: 10.1007/11925231_84
Spanish All-Words Semantic Class Disambiguation Using Cast3LB Corpus
Rubén Izquierdo-Beviá; Lorenza Moreno-Monteagudo; Borja Navarro; Armando Suárez
In this paper, an approach to semantic disambiguation for Spanish based on machine learning and semantic classes is presented. A critical issue in a corpus-based approach to Word Sense Disambiguation (WSD) is the lack of wide-coverage resources from which to automatically learn the linguistic information. In particular, all-words sense-annotated corpora such as SemCor do not have enough examples for many senses when used in a machine learning method. Using semantic classes instead of senses makes it possible to collect a larger number of examples for each class while polysemy is reduced, improving the accuracy of semantic disambiguation. Cast3LB, a SemCor-like corpus manually annotated with Spanish WordNet 1.5 senses, is used in this paper to perform semantic disambiguation based on several sets of classes: the lexicographer files of WordNet, WordNet Domains, and the SUMO ontology.
- Natural Language Processing | Pp. 879-888
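The core data move in this abstract, relabeling sense-annotated examples with coarser semantic classes so that several senses pool their training data, can be sketched as below. The sense keys, class names, and contexts are hypothetical.

```python
from collections import defaultdict

# Hypothetical sense-to-class map (in the paper's spirit: e.g. WordNet
# lexicographer files, WordNet Domains, or SUMO classes).
sense_to_class = {
    "bank%1": "noun.group", "bank%2": "noun.object",
    "shore%1": "noun.object", "institution%1": "noun.group",
}

# Hypothetical sense-annotated training examples (context, sense).
examples = [("deposit money", "bank%1"), ("river edge", "bank%2"),
            ("sandy coast", "shore%1"), ("founded in 1900", "institution%1")]

def pool_by_class(examples, sense_to_class):
    """Re-label sense-annotated examples with semantic classes: each class
    collects examples from several senses, easing data sparseness."""
    pooled = defaultdict(list)
    for context, sense in examples:
        pooled[sense_to_class[sense]].append(context)
    return pooled

pooled = pool_by_class(examples, sense_to_class)
assert len(pooled["noun.group"]) == 2   # two senses pooled into one class
assert len(pooled["noun.object"]) == 2
```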
doi: 10.1007/11925231_85
An Approach for Textual Entailment Recognition Based on Stacking and Voting
Zornitsa Kozareva; Andrés Montoyo
This paper presents a machine-learning approach to the recognition of textual entailment. For our approach we model lexical and semantic features. We study the effect of the stacking and voting classifier-combination techniques, which boost the final performance of the system. The performance of the developed approach is measured in an exhaustive experimental evaluation. The obtained results demonstrate that an ensemble of classifiers achieves higher accuracy than an individual classifier, with results comparable to existing textual entailment systems.
- Natural Language Processing | Pp. 889-899
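Of the two combination schemes named above, voting is the simpler to sketch; stacking would instead feed the base classifiers' decisions, as features, to a meta-classifier. The labels and tie-break rule below are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier entailment decisions by simple majority voting.
    Ties are broken in favour of 'no' (an arbitrary illustrative choice)."""
    counts = Counter(predictions)
    return "yes" if counts["yes"] > counts["no"] else "no"

# Hypothetical outputs of three base classifiers on one text-hypothesis pair.
assert majority_vote(["yes", "yes", "no"]) == "yes"
assert majority_vote(["no", "yes", "no"]) == "no"
```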
doi: 10.1007/11925231_86
Textual Entailment Beyond Semantic Similarity Information
Sonia Vázquez; Zornitsa Kozareva; Andrés Montoyo
The variability of semantic expression is a special characteristic of natural language. This variability is challenging for many natural language processing applications that try to infer the same meaning from different text variants. In order to treat this problem, a generic task has been proposed: Textual Entailment Recognition. In this paper, we present a new Textual Entailment approach based on Latent Semantic Indexing (LSI) and the cosine measure. The proposed approach extracts semantic knowledge from different corpora and resources. Our main purpose is to study how the acquired information can be combined with an already developed and tested Machine Learning Entailment system (MLEnt). The experiments show that the combination of MLEnt, LSI and the cosine measure improves the results of the initial approach.
- Natural Language Processing | Pp. 900-910
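The cosine measure referenced in this abstract compares a text and a hypothesis as vectors (raw term vectors, or vectors in the LSI-reduced space). A minimal sketch, with hypothetical vectors:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two (term or LSI-reduced) vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical reduced vectors for a text and a hypothesis.
text_vec, hyp_vec = [1.0, 2.0, 0.0], [2.0, 4.0, 0.0]
assert abs(cosine(text_vec, hyp_vec) - 1.0) < 1e-9   # parallel: similarity 1
assert cosine([1.0, 0.0], [0.0, 1.0]) == 0.0         # orthogonal: similarity 0
```

A high cosine score would then serve as one signal, alongside the MLEnt features, for deciding entailment.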
doi: 10.1007/11925231_87
On the Identification of Temporal Clauses
Georgiana Puşcaşu; Patricio Martínez Barco; Estela Saquete Boró
This paper describes a machine learning approach to the identification of temporal clauses by disambiguating the subordinating conjunctions used to introduce them. Temporal clauses are regularly marked by subordinators, many of which are ambiguous and able to introduce clauses with different semantic roles. The paper also describes our work on generating an annotated corpus of sentences embedding clauses introduced by ambiguous subordinators that might have temporal value. Each such clause is annotated as temporal or non-temporal by testing whether it answers a temporal question with respect to the action of its superordinate clause. Using this corpus, we then train and evaluate personalised classifiers for each ambiguous subordinator in order to set apart temporal usages. Several classifiers are evaluated, and the best performing ones achieve an average accuracy of 89.23% across the set of ambiguous connectives.
- Natural Language Processing | Pp. 911-921
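The "personalised classifier per subordinator" architecture can be sketched as a dispatch table. The toy digit-based rule below merely stands in for a trained model; connectives and rules are illustrative assumptions, not the paper's.

```python
# Hypothetical per-connective classifiers: each ambiguous subordinator gets
# its own model trained to separate temporal from non-temporal uses.
def classify_since(clause):
    # toy rule standing in for a trained classifier: a year-like token
    # suggests a temporal reading of "since"
    return "temporal" if any(w.isdigit() for w in clause.split()) else "non-temporal"

def classify_while(clause):
    return "temporal"          # placeholder model

classifiers = {"since": classify_since, "while": classify_while}

def label_clause(subordinator, clause):
    """Route the clause to the classifier personalised for its subordinator."""
    model = classifiers.get(subordinator.lower())
    return model(clause) if model else "unknown-connective"

assert label_clause("since", "since 1990 the rules changed") == "temporal"
assert label_clause("since", "since you asked nicely") == "non-temporal"
```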
doi: 10.1007/11925231_88
Issues in Translating from Natural Language to SQL in a Domain-Independent Natural Language Interface to Databases
B. Juan J. González; Rodolfo A. Pazos Rangel; I. Cristina Cruz C.; H. Héctor J. Fraire; L. de Santos Aguilar; O. Joaquín Pérez
This paper deals with a domain-independent natural language interface to databases (NLIDB) for the Spanish language. This NLIDB had previously been tested on the Northwind and Pubs domains and had attained good performance (an 86% success rate). However, domain independence complicates the task of achieving high translation success, and to this end the ATIS (Air Travel Information System) database, which has been used by several natural language interfaces, was selected to conduct a new evaluation. The purpose of this evaluation was to assess the efficiency of the interface after reconfiguration for another domain and to detect the problems that affect translation success. For the tests a corpus of queries was gathered, and the results obtained showed that the interface can easily be reconfigured and that it attained a 50% success rate. When the problems found in query translation were analyzed, wording deficiencies in some user queries and several errors in the synonym dictionary were discovered. After correcting these problems a second test was conducted, in which the interface attained a 61.4% success rate. These experiments showed that user training is necessary, as well as a dialogue system that allows a query to be clarified when it is poorly formulated.
- Natural Language Processing | Pp. 922-931
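The role of the synonym dictionary mentioned in this abstract, mapping user vocabulary onto schema names before matching, can be sketched as follows. The dictionary, schema, and fallback behaviour are hypothetical; the actual interface is far more elaborate and works on Spanish.

```python
# Hypothetical synonym dictionary mapping user vocabulary to schema names.
synonyms = {"flights": "flight", "planes": "aircraft"}

# Hypothetical ATIS-like schema fragment.
schema = {"flight": ["flight_id", "from_airport", "to_airport"]}

def translate(question):
    """Tiny sketch of domain-independent NL-to-SQL: normalise words through
    the synonym dictionary, then match them against table names. A failed
    match returns None, which would trigger a clarification dialogue."""
    words = [synonyms.get(w, w) for w in question.lower().split()]
    tables = [w for w in words if w in schema]
    if not tables:
        return None
    return f"SELECT * FROM {tables[0]}"

assert translate("list all flights") == "SELECT * FROM flight"
assert translate("show me something") is None
```

Reconfiguring for a new domain then amounts, in this sketch, to swapping the `schema` and `synonyms` tables, which mirrors the reconfigurability the evaluation tests.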
doi: 10.1007/11925231_89
Interlinguas: A Classical Approach for the Semantic Web. A Practical Case
Jesús Cardeñosa; Carolina Gallardo; Luis Iraola
An efficient use of the web will imply the ability to find not only documents but also specific pieces of information according to the user's query. At present, this possibility is not tackled by information extraction or question answering systems, since it requires both a deeper semantic understanding of queries and contents and deductive capabilities. In this paper, the authors propose the use of Interlinguas as a plausible approach to searching for and extracting specific pieces of information from a document, given the semantic nature of Interlinguas and their support for deduction. More concretely, the authors describe the UNL Interlingua from the representational point of view and illustrate its deductive capabilities by means of an example.
- Information Retrieval and Text Classification | Pp. 932-942
doi: 10.1007/11925231_90
A Fuzzy Embedded GA for Information Retrieving from Related Data Set
Yang Yi; JinFeng Mei; ZhiJiao Xiao
The aim of this work is to provide a formal model and an effective way of retrieving information from a large related data set. Based upon fuzzy logic operations, a fuzzy mathematical model of 0-1 mixture programming is addressed. Meanwhile, a density function indicating the overall status of the effectively mined data is introduced. Then, a soft computing (SC) approach, a genetic algorithm (GA) with embedded fuzzy deduction, is presented. During the SC process, fuzzy logic decisions are used to determine the genes' length, calculate the fitness function, and choose feasible solutions. Simulated experiments and comparison tests show that the method can match the user's most desired information from massive data exactly and efficiently. The approach can be extended to practical applications in solving general web mining problems.
- Information Retrieval and Text Classification | Pp. 943-951
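The combination described above, a 0-1 selection problem with a fuzzy fitness function optimised by a GA, can be sketched minimally as below. The membership degrees, budget constraint, and GA parameters are all hypothetical; the paper's model and operators are richer.

```python
import random

random.seed(0)  # for reproducibility of the sketch

# Hypothetical fuzzy relevance degrees (memberships in [0, 1]) of 6 data items.
relevance = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]
BUDGET = 3  # at most 3 items may be retrieved (the 0-1 constraint)

def fitness(chromosome):
    """Fuzzy fitness: total membership of the selected items; infeasible
    selections (over budget) are penalised to zero."""
    if sum(chromosome) > BUDGET:
        return 0.0
    return sum(r for r, bit in zip(relevance, chromosome) if bit)

def evolve(pop_size=20, generations=40):
    """Minimal GA over 0-1 chromosomes: binary tournament selection,
    one-bit mutation, and elitist parent-vs-child replacement."""
    pop = [[random.randint(0, 1) for _ in relevance] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = []
        for _ in range(pop_size):
            a, b = random.sample(pop, 2)
            parent = max(a, b, key=fitness)              # tournament selection
            child = parent[:]
            child[random.randrange(len(child))] ^= 1     # one-bit mutation
            nxt.append(max(parent, child, key=fitness))  # keep the better one
        pop = nxt
    return max(pop, key=fitness)

assert fitness([1, 0, 1, 0, 1, 0]) == 0.9 + 0.8 + 0.7   # optimal selection
assert fitness([1, 1, 1, 1, 0, 0]) == 0.0               # over budget: rejected
best = evolve()
```

With these toy numbers the GA typically recovers the three highest-membership items; the fuzzy part of the paper's approach is reflected only in the graded fitness values here.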