Publications catalog - books

Natural Language Processing and Information Systems: 10th International Conference on Applications of Natural Language to Information Systems, NLDB 2005, Alicante, Spain, June 15-17, Proceedings

Andrés Montoyo; Rafael Muñoz; Elisabeth Métais (eds.)

In conference: 10th International Conference on Application of Natural Language to Information Systems (NLDB). Alicante, Spain. June 15, 2005 - June 17, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Database Management; Computer Communication Networks; Logics and Meanings of Programs; Mathematical Logic and Formal Languages; Information Storage and Retrieval; Artificial Intelligence (incl. Robotics)

Availability

Detected institution	Publication year	Browse	Download	Request
Not detected	2005	SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-26031-8

Electronic ISBN

978-3-540-32110-1

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2005

Publication rights information

Subject coverage

Computer and information sciences

Electrical, electronic, and information engineering

Languages and literature

Table of contents

Check whether your institution gives you access to download or request the full book or any of its chapters.

doi: 10.1007/11428817_31

Information Extraction for User’s Utterance Processing on Ubiquitous Robot Companion

Hanmin Jung; Choong-Nyong Seon; Jae Hong Kim; Joo Chan Sohn; Won-Kyung Sung; Dong-In Park

Information Extraction originally aims to statically acquire information from various formatted documents in accordance with a user-defined schema. It can also be extended to the processing of purposeful user utterances in natural language, which include linguistic phenomena such as syntactic transformation and colloquial expressions with frequently omitted words or phrases. We adopt a verified lexico-semantic framework to obtain meaningful information from a user’s utterance and divide the extraction phase into two steps: the first extracts and revises arguments, and the second extracts a predicate, which is the utterance meaning type of the input sentence.

- Regular Papers | Pp. 337-340

doi: 10.1007/11428817_32

Investigating the Best Configuration of HMM Spanish PoS Tagger when Minimum Amount of Training Data Is Available

Sergio Ferrández; Jesús Peral

One of the important processing steps for many natural language systems (information extraction, question answering, etc.) is part-of-speech (PoS) tagging, and it has been tackled with a number of different approaches. In this paper we study the behavior of a Hidden Markov Model (HMM) Spanish PoS tagger trained on a minimal amount of data. Our PoS tagger is based on an HMM in which the states are tag pairs that emit words, and it relies on transitional and lexical probabilities. This technique has been suggested by Rabiner [11], and our implementation is influenced by Brants [2]. We have investigated the best configuration of the HMM using a small training corpus of about 50,000 words; the maximum precision obtained for an unknown Spanish text was 95.36%.

- Regular Papers | Pp. 341-344

doi: 10.1007/11428817_33

An Approach to Automatic Construction of Lexical Relations Between Chinese Nouns from Machine Readable Dictionary

Yi Hu; Ruzhan Lu; Xuening Li

In this paper, a machine readable dictionary is used to acquire Chinese noun pairs satisfying five lexical relations. Because current Chinese parsers have low accuracy, our method differs from traditional ones, which begin with parsing. The new method is a three-step procedure. First, the paraphrases of some nominal entries are annotated for use as training data. Second, patterns that denote lexical relations between nouns are defined, and the applicability of the patterns is learned by training a Maximum Entropy model. Finally, these patterns are applied to the remaining portion of the dictionary. A relatively satisfactory result is achieved.

- Regular Papers | Pp. 345-348

doi: 10.1007/11428817_34

Automatic Acquisition of Adjacent Information and Its Effectiveness in Extraction of Bilingual Word Pairs from Parallel Corpora

Hiroshi Echizen-ya; Kenji Araki; Yoshio Momouchi

We propose a learning method for solving the sparse data problem in automatic extraction of bilingual word pairs from parallel corpora. In general, methods based on similarity measures are insufficient because of the sparse data problem. The essence of our method is the use of this inference: in local parts of bilingual sentence pairs (i.e., phrases, not sentences), the equivalents of words that adjoin the source language words of bilingual word pairs also adjoin the target language words of bilingual word pairs. Our learning method automatically acquires such adjacent information, which is then used to extract bilingual word pairs. As a result, our system can limit the search scope when determining equivalents in bilingual sentence pairs by extracting only word pairs that adjoin the acquired adjacent information. We applied our method to two systems based on Yates’ and AIC. Results of evaluation experiments indicate that the extraction rates improved by 6.1 and 6.0 percentage points, respectively, using our method.

- Regular Papers | Pp. 349-352

doi: 10.1007/11428817_35

Text Mining from Categorized Stem Cell Documents to Infer Developmental Stage-Specific Expression and Regulation Patterns of Stem Cells

Hyun Seok Park; Min Kyung Kim; Eun Jeong Choi; Young Joo Seol

Exponentially increasing stem cell data provide a means to elucidate a system-level understanding of differentiation. Given the existing information on biological networks combined with the huge amount of literature data, inferring stem cell information through scientific reasoning over data from online documents is likely to attract great attention. In this paper, we describe the STEMWAY system, which combines known interaction informatics with text mining techniques. In particular, recent advances in natural language processing techniques raise new challenges and opportunities for extracting valuable information from literature classified by the developmental stages of stem cells.

- Regular Papers | Pp. 353-356

doi: 10.1007/11428817_36

Simple But Useful Algorithms for Identifying Noun Phrase Complements of Embedded Clauses in a Partial Parse

Sebastian van Delden

Two algorithms for identifying noun phrase complements of embedded clauses in a partial parse are presented. The candidate noun phrases play subject or object roles in (reduced) relative and infinitival clauses. The algorithms are tested on several sources and results are presented.

- Regular Papers | Pp. 357-360

doi: 10.1007/11428817_37

An Add-On to Rule-Based Sifters for Multi-recipient Spam Emails

Vipul Sharma; Puneet Sarda; Swasti Sharma

The Spam filtering technique described here targets multiple recipient Spam messages with similar email addresses. We exploit these similar patterns to create a rule-based classification system (accuracy 92%). Our technique uses the ‘TO’ and ‘CC’ fields to classify an email as Spam or Legitimate. We introduce certain new rules which should enhance the performance of the current filtering techniques [1][4][5]. We also introduce a novel metric to calculate the degree of similarity between a set of strings.

- Regular Papers | Pp. 361-364

doi: 10.1007/11428817_38

Semantic Annotation of a Natural Language Corpus for Knowledge Extraction

Borja Navarro; Patricio Martínez-Barco; Manuel Palomar

Knowledge management (ontology development, word disambiguation, semantic web, etc.) must extract knowledge from somewhere. The main source of knowledge is natural language text, in which humans express how they view and conceptualize the world. However, the automatic extraction of knowledge from texts is not a trivial task. In this paper we present a semantically annotated corpus as a source for knowledge extraction. Semantics is the bridge between linguistic input and knowledge (concepts, real world). A corpus annotated with semantic information is a useful resource for extracting knowledge from a real context: it is a semi-structured database that offers deep information about human knowledge, concepts, and the relations between them.

- Regular Papers | Pp. 365-368

doi: 10.1007/11428817_39

mySENSEVAL: Explaining WSD System Performance Using Target Word Features

Harri M. T. Saarikoski

Word sense disambiguation (WSD) is an unsolved problem in NLP. The field has produced a variety of methods, but none of them is potent enough to reach the high accuracy of human taggers in demanding NLP applications. Our contribution to WSD is mySENSEVAL, an error analyzer that uses SENSEVAL evaluation scores (stored in a MySQL database) to find significant correlations between WSD system types and lexico-conceptual features (from WordNet and SUMO).

- Regular Papers | Pp. 369-371

doi: 10.1007/11428817_40

Information Extraction from Email Announcements

Viktor Pekar

Public email announcements present a number of unique challenges for an Information Extraction (IE) system, such as the presence of both free and semi-structured text, inconsistent document layout and widely varying formats of template fillers. In this paper we describe a study of parametrisation of an IE method to determine settings that best suit the specifics of the task at hand.

- Regular Papers | Pp. 372-375