Publications catalog - books
Natural Language Processing and Information Systems: 10th International Conference on Applications of Natural Language to Information Systems, NLDB 2005, Alicante, Spain, June 15-17, Proceedings
Andrés Montoyo ; Rafael Muñoz ; Elisabeth Métais (eds.)
In conference: 10th International Conference on Applications of Natural Language to Information Systems (NLDB). Alicante, Spain. June 15, 2005 - June 17, 2005
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Database Management; Computer Communication Networks; Logics and Meanings of Programs; Mathematical Logic and Formal Languages; Information Storage and Retrieval; Artificial Intelligence (incl. Robotics)
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2005 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-26031-8
Electronic ISBN
978-3-540-32110-1
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2005
Copyright information
© Springer-Verlag Berlin Heidelberg 2005
Table of contents
doi: 10.1007/11428817_31
Information Extraction for User’s Utterance Processing on Ubiquitous Robot Companion
Hanmin Jung; Choong-Nyong Seon; Jae Hong Kim; Joo Chan Sohn; Won-Kyung Sung; Dong-In Park
Information Extraction was originally designed to statically acquire information from variously formatted documents according to a user-defined schema. It can also be applied to purposeful user utterances in natural language, which exhibit linguistic phenomena such as syntactic transformation and colloquial expressions with frequently omitted words and phrases. We adopt a verified lexico-semantic framework to obtain meaningful information from a user’s utterance and divide the extraction phase into two steps: the first extracts and revises arguments, and the second extracts a predicate, which is the utterance meaning type of the input sentence.
- Regular Papers | Pp. 337-340
doi: 10.1007/11428817_32
Investigating the Best Configuration of HMM Spanish PoS Tagger when Minimum Amount of Training Data Is Available
Sergio Ferrández; Jesús Peral
Part-of-speech (PoS) tagging is an important processing step in many natural language systems (information extraction, question answering, etc.), and it has been tackled with a number of different approaches. In this paper we study the behavior of a Hidden Markov Model (HMM) Spanish PoS tagger trained on a minimal corpus. Our PoS tagger is based on an HMM in which the states are tag pairs that emit words, and it relies on transition and lexical probabilities. This technique was suggested by Rabiner [11], and our implementation is influenced by Brants [2]. We investigated the best HMM configuration using a small training set of about 50,000 words; the maximum precision obtained on previously unseen Spanish text was 95.36%.
- Regular Papers | Pp. 341-344
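The decoding step of an HMM tagger like the one in the abstract above can be illustrated with a minimal Viterbi sketch. This is not the authors' implementation: the abstract describes states that are tag pairs, whereas this sketch uses a simpler first-order model with one tag per state. The tag set, the probability tables, the `floor` smoothing constant, and the function name `viterbi` are all hypothetical values chosen for illustration.

```python
import math

# Toy model: every number below is invented for illustration only.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}        # P(first tag)
TRANS = {("DET", "NOUN"): 0.9, ("NOUN", "VERB"): 0.8,  # transition: P(tag | previous tag)
         ("NOUN", "NOUN"): 0.1, ("VERB", "DET"): 0.5}
EMIT = {("DET", "el"): 0.5, ("NOUN", "perro"): 0.1,    # lexical: P(word | tag)
        ("VERB", "ladra"): 0.1, ("NOUN", "ladra"): 0.001}


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence under a first-order HMM.

    Events missing from a table get the small probability `floor`,
    a crude stand-in for real smoothing.
    """
    if not words:
        return []

    def lp(table, key):
        return math.log(table.get(key, floor))

    # best[i][t] = (log-prob of the best path ending in tag t at word i, previous tag)
    best = [{t: (lp(start_p, t) + lp(emit_p, (t, words[0])), None) for t in tags}]
    for word in words[1:]:
        prev = best[-1]
        column = {}
        for t in tags:
            # Pick the best predecessor tag for t.
            score, back = max((prev[p][0] + lp(trans_p, (p, t)), p) for p in tags)
            column[t] = (score + lp(emit_p, (t, word)), back)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["el", "perro", "ladra"], TAGS, START, TRANS, EMIT))
```

Working in log space avoids numeric underflow on long sentences. Extending the sketch to tag-pair states would mean indexing the transition table by two previous tags instead of one.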
doi: 10.1007/11428817_33
An Approach to Automatic Construction of Lexical Relations Between Chinese Nouns from Machine Readable Dictionary
Yi Hu; Ruzhan Lu; Xuening Li
In this paper, a machine readable dictionary is used to acquire Chinese noun pairs satisfying five lexical relations. Because current Chinese parsers have low accuracy, our method differs from traditional ones, which begin with parsing. The new method is a three-step procedure. First, it annotates the paraphrases of some nominal entries, which are used as training data. Second, patterns that denote lexical relations between nouns are defined, and the applicability of the patterns is learned by training a Maximum Entropy model. Finally, these patterns are applied to the remaining portion of the dictionary. A relatively satisfactory result is achieved.
- Regular Papers | Pp. 345-348
doi: 10.1007/11428817_34
Automatic Acquisition of Adjacent Information and Its Effectiveness in Extraction of Bilingual Word Pairs from Parallel Corpora
Hiroshi Echizen-ya; Kenji Araki; Yoshio Momouchi
We propose a learning method for solving the sparse data problem in automatic extraction of bilingual word pairs from parallel corpora. In general, methods based on similarity measures are insufficient because of the sparse data problem. The essence of our method is the use of this inference: in local parts of bilingual sentence pairs (i.e., phrases, not sentences), the equivalents of words that adjoin the source language words of bilingual word pairs also adjoin the target language words of bilingual word pairs. Our learning method automatically acquires such adjacent information. The acquired adjacent information is used to extract bilingual word pairs. As a result, our system can limit the search scope for the decision of equivalents in bilingual sentence pairs by extracting only word pairs that adjoin the acquired adjacent information. We applied our method to two systems based on Yates’ χ² and AIC. Results of evaluation experiments indicate that the extraction rates improved by 6.1 and 6.0 percentage points, respectively, using our method.
- Regular Papers | Pp. 349-352
doi: 10.1007/11428817_35
Text Mining from Categorized Stem Cell Documents to Infer Developmental Stage-Specific Expression and Regulation Patterns of Stem Cells
Hyun Seok Park; Min Kyung Kim; Eun Jeong Choi; Young Joo Seol
Exponentially increasing stem cell data provide a means of reaching a system-level understanding of differentiation. Given the existing information on biological networks, combined with the huge amount of literature data, inferring stem cell information through scientific reasoning over on-line documents is likely to attract great attention. In this paper, we describe the STEMWAY system, which combines known interaction informatics with text mining techniques. In particular, recent advances in natural language processing raise new challenges and opportunities for extracting valuable information from literature classified by the developmental stages of stem cells.
- Regular Papers | Pp. 353-356
doi: 10.1007/11428817_36
Simple But Useful Algorithms for Identifying Noun Phrase Complements of Embedded Clauses in a Partial Parse
Sebastian van Delden
Two algorithms for identifying noun phrase complements of embedded clauses in a partial parse are presented. The candidate noun phrases play subject or object roles in (reduced) relative and infinitival clauses. The algorithms are tested on several sources and results are presented.
- Regular Papers | Pp. 357-360
doi: 10.1007/11428817_37
An Add-On to Rule-Based Sifters for Multi-recipient Spam Emails
Vipul Sharma; Puneet Sarda; Swasti Sharma
The Spam filtering technique described here targets multiple recipient Spam messages with similar email addresses. We exploit these similar patterns to create a rule-based classification system (accuracy 92%). Our technique uses the ‘TO’ and ‘CC’ fields to classify an email as Spam or Legitimate. We introduce certain new rules which should enhance the performance of the current filtering techniques [1][4][5]. We also introduce a novel metric to calculate the degree of similarity between a set of strings.
- Regular Papers | Pp. 361-364
doi: 10.1007/11428817_38
Semantic Annotation of a Natural Language Corpus for Knowledge Extraction
Borja Navarro; Patricio Martínez-Barco; Manuel Palomar
Knowledge management (ontology development, word disambiguation, the semantic web, etc.) must extract knowledge from somewhere. The main sources of knowledge are natural language texts, in which humans express how they view and conceptualize the world. However, the automatic extraction of knowledge from texts is not a trivial task. In this paper we present a semantically annotated corpus as a source for knowledge extraction. Semantics is the bridge between linguistic input and knowledge (concepts, real world). A corpus annotated with semantic information is a useful resource for extracting knowledge from a real context: it is a semi-structured database that offers deep information about human knowledge, concepts and the relations between them.
- Regular Papers | Pp. 365-368
doi: 10.1007/11428817_39
mySENSEVAL: Explaining WSD System Performance Using Target Word Features
Harri M. T. Saarikoski
Word sense disambiguation (WSD) is an unsolved problem in NLP. The field has produced a variety of methods, but none of them is potent enough to reach high, human-tagger accuracy in demanding NLP applications. Our contribution to WSD is mySENSEVAL, an error analyzer that uses SENSEVAL evaluation scores (stored in a MySQL database) to find significant correlations between WSD system types and lexico-conceptual features (from WordNet and SUMO).
- Regular Papers | Pp. 369-371
doi: 10.1007/11428817_40
Information Extraction from Email Announcements
Viktor Pekar
Public email announcements present a number of unique challenges for an Information Extraction (IE) system, such as the presence of both free and semi-structured text, inconsistent document layout and widely varying formats of template fillers. In this paper we describe a study of parametrisation of an IE method to determine settings that best suit the specifics of the task at hand.
- Regular Papers | Pp. 372-375