Publications catalog - books
Progress in Artificial Intelligence: 12th Portuguese Conference on Artificial Intelligence, EPIA 2005, Covilha, Portugal, December 5-8, 2005, Proceedings
Carlos Bento ; Amílcar Cardoso ; Gaël Dias (eds.)
In conference: 12th Portuguese Conference on Artificial Intelligence (EPIA). Covilha, Portugal. December 5, 2005 - December 8, 2005
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Artificial Intelligence (incl. Robotics); Computation by Abstract Devices; Database Management; Information Storage and Retrieval; Programming Techniques
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2005 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-30737-2
Electronic ISBN
978-3-540-31646-6
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2005
Publication rights information
© Springer-Verlag Berlin Heidelberg 2005
Table of contents
doi: 10.1007/11595014_61
Constrained Atomic Term: Widening the Reach of Rule Templates in Transformation Based Learning
Cícero Nogueira dos Santos; Claudia Oliveira
Within the framework of Transformation Based Learning (TBL), the rule template is one of the most important elements in the learning process. This paper presents a new model for TBL templates, in which the basic unit, denominated here as an atomic term (AT), encodes a variable sized window and a test that precedes the capture of a feature’s value. A case study of Portuguese NP identification is described and the experimental results obtained are presented.
- Chapter 9 – Text Mining and Applications (TEMA 2005) | Pp. 622-633
doi: 10.1007/11595014_62
Improving Passage Retrieval in Question Answering Using NLP
Jörg Tiedemann
This paper describes an approach for the integration of linguistic information in passage retrieval in an open-source question answering system for Dutch. Annotation produced by the wide-coverage dependency parser Alpino is stored in multiple index layers to be matched with natural language questions that have been analyzed by the same parser. We present a genetic algorithm to select features to be included in retrieval queries and to optimize keyword weights. The system is trained on questions annotated with their answers from the competition on Dutch question answering within the Cross-Language Evaluation Forum (CLEF). The optimization yielded a significant improvement of about 19% in mean reciprocal rank scores on unseen evaluation data compared to the baseline using traditional information retrieval with plain text keywords.
- Chapter 9 – Text Mining and Applications (TEMA 2005) | Pp. 634-646
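The Tiedemann abstract above reports its improvement in mean reciprocal rank (MRR). As a minimal sketch of how that metric is usually computed, the following averages 1/rank of the first relevant passage over a set of questions. The function names and the passage IDs are hypothetical and are not taken from the paper.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average the reciprocal rank over all (ranking, relevant set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical passage IDs, one (ranking, relevant set) pair per question.
runs = [
    (["p3", "p1", "p7"], {"p1"}),  # first relevant passage at rank 2
    (["p2", "p9"], {"p2"}),        # first relevant passage at rank 1
    (["p4", "p5"], {"p8"}),        # no relevant passage retrieved
]
print(mean_reciprocal_rank(runs))
```

A question whose ranking contains no relevant passage contributes 0, so the score rewards systems that place an answer-bearing passage near the top for as many questions as possible.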
doi: 10.1007/11595014_63
Mining the Semantics of Text Via Counter-Training
Roman Yangarber
We report on a set of experiments in text mining, specifically, finding semantic patterns given only a few keywords. The experiments employ the Counter-training framework for discovery of semantic knowledge from raw text in a weakly supervised fashion. The experiments indicate that the framework is suitable for efficient acquisition of semantic word classes and collocation patterns, which may be used for Information Extraction.
- Chapter 9 – Text Mining and Applications (TEMA 2005) | Pp. 647-657
doi: 10.1007/11595014_64
Minimum Redundancy Cut in Ontologies for Semantic Indexing
Florian Seydoux; Jean-Cédric Chappelier
This paper presents a new method that aims at improving semantic indexing while reducing the number of indexing terms. Indexing terms are determined using a minimum redundancy cut in a hierarchy of conceptual hypernyms provided by an ontology. The results of some information retrieval experiments carried out on several standard document collections using the ontology are presented, illustrating the benefit of the method.
- Chapter 9 – Text Mining and Applications (TEMA 2005) | Pp. 658-668
doi: 10.1007/11595014_65
Unsupervised Learning of Multiword Units from Part-of-Speech Tagged Corpora: Does Quantity Mean Quality?
Gaël Dias; Špela Vintar
This paper describes an original hybrid system that extracts multiword unit candidates from part-of-speech tagged corpora. While classical hybrid systems manually define local part-of-speech patterns that lead to the identification of well-known multiword units (mainly compound nouns), we automatically identify relevant syntactical patterns from the corpus. Word statistics are then combined with the endogenously acquired linguistic information in order to extract the most relevant sequences of words. As a result, (1) human intervention is avoided, providing total flexibility of use of the system, and (2) different multiword units like phrasal verbs, adverbial locutions and prepositional locutions may be identified. Finally, we propose an exhaustive evaluation of our architecture based on the multi-domain, bilingual Slovene-English IJS-ELAN corpus where surprising results are evidenced. To our knowledge, this challenge has never been attempted before.
- Chapter 9 – Text Mining and Applications (TEMA 2005) | Pp. 669-679
doi: 10.1007/11595014_66
Lappin and Leass’ Algorithm for Pronoun Resolution in Portuguese
Thiago Thomes Coelho; Ariadne Maria Brito Rizzoni Carvalho
This paper presents a variant of Lappin and Leass’ Algorithm for pronoun resolution in Portuguese texts; the algorithm resolves third person pronominal anaphora, as well as reflexive and reciprocal pronouns. It relies on salience measures, derived from the syntactic structure of the sentence, and on a simple discourse representation model. The algorithm and its evaluation on legal and literary corpora are presented.
- Chapter 9 – Text Mining and Applications (TEMA 2005) | Pp. 680-692
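The abstract above describes pronoun resolution driven by salience measures derived from syntactic structure. The sketch below illustrates the general idea only: candidates that agree with the pronoun are scored by syntactic role, the score is halved for each sentence of distance, and the highest scorer wins. The weights, the `Candidate` fields, and the example sentence are assumptions made for illustration; they are not the values or the model used in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative weights only (assumed for this sketch, not taken from the paper).
ROLE_WEIGHTS = {"subject": 80, "direct_object": 50, "indirect_object": 40}
RECENCY_WEIGHT = 100


@dataclass
class Candidate:
    text: str
    gender: str          # "m" or "f"
    number: str          # "sg" or "pl"
    role: str            # a key of ROLE_WEIGHTS
    sentences_back: int  # 0 means the same sentence as the pronoun


def salience(c: Candidate) -> float:
    """Sum the role and recency weights, halved per sentence of distance."""
    base = RECENCY_WEIGHT + ROLE_WEIGHTS.get(c.role, 0)
    return base / (2 ** c.sentences_back)


def resolve(gender: str, number: str, candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the most salient candidate that agrees in gender and number."""
    agreeing = [c for c in candidates if c.gender == gender and c.number == number]
    return max(agreeing, key=salience, default=None)


# Hypothetical discourse: "O livro chegou. A Maria viu o João. Ele sorriu."
candidates = [
    Candidate("o livro", "m", "sg", "subject", 2),
    Candidate("A Maria", "f", "sg", "subject", 1),
    Candidate("o João", "m", "sg", "direct_object", 1),
]
antecedent = resolve("m", "sg", candidates)  # resolving "Ele"
print(antecedent.text if antecedent else None)
```

Filtering on agreement before ranking keeps the salience scores from ever selecting a morphologically impossible antecedent, which is why "A Maria" is excluded even though a subject scores higher than an object.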
doi: 10.1007/11595014_67
STEMBR: A Stemming Algorithm for the Brazilian Portuguese Language
Reinaldo Viana Alvares; Ana Cristina Bicharra Garcia; Inhaúma Ferraz
Stemming algorithms have traditionally been utilized in information retrieval systems as they generate a more concise word representation. However, the efficiency of these algorithms varies according to the language they are used with. This paper presents STEMBR, a stemmer for Brazilian Portuguese whereby the suffix treatment is based on a statistical study of the frequency of the last letter for words found in Brazilian web pages. The proposed stemmer is compared with another algorithm specifically developed for Portuguese. The results show the efficiency of our stemmer.
- Chapter 9 – Text Mining and Applications (TEMA 2005) | Pp. 693-701
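The STEMBR abstract above mentions a statistical study of the frequency of the last letter of words. The following is a minimal sketch of that kind of statistic, computed over a tiny hypothetical word list. It is not the STEMBR algorithm itself, and the function name and sample words are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def last_letter_frequencies(words: Iterable[str]) -> Dict[str, float]:
    """Return the relative frequency of each final letter among the words."""
    counts = Counter(w[-1].lower() for w in words if w)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {letter: n / total for letter, n in counts.items()}


# Hypothetical sample; a real study would use a large web-derived word list.
sample = ["casa", "casas", "menino", "meninos", "falar", "falou", "mesa", "livro"]
freqs = last_letter_frequencies(sample)
for letter, share in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{letter}: {share:.3f}")
```

A stemmer could use such a table to decide which word endings are frequent enough to deserve dedicated suffix rules, although how STEMBR actually applies the statistic is not stated in the abstract.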