Publications catalogue - books

Natural Language Processing and Information Systems: 10th International Conference on Applications of Natural Language to Information Systems, NLDB 2005, Alicante, Spain, June 15-17, Proceedings

Andrés Montoyo; Rafael Muñoz; Elisabeth Métais (eds.)

Conference: 10th International Conference on Applications of Natural Language to Information Systems (NLDB). Alicante, Spain. June 15-17, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Database Management; Computer Communication Networks; Logics and Meanings of Programs; Mathematical Logic and Formal Languages; Information Storage and Retrieval; Artificial Intelligence (incl. Robotics)

Availability

Detected institution: Not detected
Publication year: 2005
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-26031-8

Electronic ISBN

978-3-540-32110-1

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Extracting Semantic Taxonomies of Nouns from a Korean MRD Using a Small Bootstrapping Thesaurus and a Machine Learning Approach

SeonHwa Choi; HyukRo Park

Most approaches to extracting the hypernyms of a noun from its definition in an MRD rely on lexico-syntactic patterns compiled by human experts. Not only do these methods incur a high cost for compiling lexico-syntactic patterns, but it is also very difficult for human experts to compile a set of lexico-syntactic patterns with broad coverage, because natural languages contain many different expressions for the same concept. To alleviate these problems, this paper proposes a new method for extracting the hypernyms of a noun from an MRD. In the proposed approach, we use only syntactic (part-of-speech) patterns instead of lexico-syntactic patterns to identify hypernyms, which reduces the number of patterns while keeping their coverage broad. Our experiment shows that the classification accuracy of the proposed method is 92.37%, significantly better than those of previous approaches.

- Regular Papers | Pp. 1-9
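The abstract's key idea, matching purely syntactic (part-of-speech) patterns against a definition rather than lexico-syntactic ones, can be sketched roughly as follows. The tag set, the patterns and the English example definition are illustrative assumptions, not the authors' actual Korean-MRD inventory:

```python
# Sketch: match a definition's part-of-speech sequence against known
# POS patterns to pick a hypernym candidate (tags and patterns are
# illustrative, not the authors' actual inventory).

# A definition as (word, POS) pairs; the hypernym is typically the
# head noun of the defining phrase.
definition = [("a", "DET"), ("large", "ADJ"), ("wild", "ADJ"),
              ("animal", "NOUN"), ("of", "ADP"), ("the", "DET"),
              ("cat", "NOUN"), ("family", "NOUN")]

# Each POS pattern records which slot holds the hypernym candidate.
patterns = {
    ("DET", "ADJ", "ADJ", "NOUN"): -1,   # "a large wild ANIMAL"
    ("NOUN", "NOUN"): -1,                # compound head: "cat FAMILY"
}

def extract_hypernym(tokens):
    """Return the first hypernym candidate matched by any POS pattern."""
    tags = [t for _, t in tokens]
    for pat, slot in patterns.items():
        n = len(pat)
        for i in range(len(tags) - n + 1):
            if tuple(tags[i:i + n]) == pat:
                return tokens[i:i + n][slot][0]
    return None

print(extract_hypernym(definition))  # -> animal
```

Because the patterns mention only POS tags, a handful of them can cover many lexically different definitions, which is the coverage argument the abstract makes.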

On the Transformation of Sentences with Genitive Relations to SQL Queries

Zsolt T. Kardkovács

In our ongoing project “In the Web of Words” (WoW), we aim to create a complex search interface that incorporates a deep web search engine module based on a Hungarian question processor. One of the most crucial parts of the system is the transformation of genitive relations into adequate SQL queries, since questions beginning with “Who” and “What” mostly contain such a relation. The genitive relation is one of the most complex semantic structures, since it can express a wide range of different connection types between entities, even within a single language. Thus, the transformation of its syntactic form into a formal computer language is far from clear. Several natural language database interfaces (NLIDBs) have been proposed in the last decade, but a detailed or general description of this problem is still missing from the literature. In this paper, we describe how to translate genitive phrases into SQL queries in general, i.e. omitting Hungarian-dependent optimizations.

- Regular Papers | Pp. 10-20
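The core difficulty the abstract describes, that one surface form “X of Y” stands for many relation types, can be illustrated with a minimal mapping from genitive phrases to SQL. The schema, table names and relation inventory below are hypothetical, not those of the WoW system:

```python
# Sketch: map a genitive phrase "X of Y" to an SQL query, assuming a
# hypothetical schema where each possessed noun selects a table and a
# column pair. The mapping and schema are illustrative, not WoW's.

genitive_map = {
    # possessed noun -> (table, answer_column, owner_column)
    "director": ("films", "director", "title"),
    "capital":  ("countries", "capital", "name"),
}

def genitive_to_sql(possessed, possessor):
    """Translate '<possessed> of <possessor>' into a SELECT query."""
    table, answer_col, owner_col = genitive_map[possessed]
    return (f"SELECT {answer_col} FROM {table} "
            f"WHERE {owner_col} = '{possessor}'")

print(genitive_to_sql("capital", "Hungary"))
# -> SELECT capital FROM countries WHERE name = 'Hungary'
```

The point of the sketch is that the same syntactic construction resolves to different joins and columns depending on the possessed noun, which is why a general treatment of genitives in NLIDBs is non-trivial.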

Binary Lexical Relations for Text Representation in Information Retrieval

Marco Gonzalez; Vera Lúcia Strube de Lima; José Valdeni de Lima

Text representation is crucial for many natural language processing applications. This paper presents an approach to extracting binary lexical relations (BLR) from Portuguese texts to represent phrasal cohesion mechanisms. We demonstrate how this automatic strategy may be incorporated into information retrieval systems. Our approach is compared to those using bigrams and noun phrases for text retrieval. The BLR strategy is shown to improve on the best performance in an experimental information retrieval system.

- Regular Papers | Pp. 21-31

Application of Text Categorization to Astronomy Field

Huaizhong Kou; Amedeo Napoli; Yannick Toussaint

We introduce the application of text categorization techniques to the astronomy field to resolve semantic ambiguities between table column names. In astronomy, researchers often assign arbitrary names to table columns, even when the columns describe the same attributes of sky objects. This poses a major problem for data analysis across different tables. To solve it, a standard vocabulary called “unified concept descriptors (UCD)” has been defined; data reported about sky objects can be analyzed easily once columns are assigned to the predefined UCDs. In this paper, the widely used Rocchio categorization algorithm is implemented to assign UCDs. An algorithm is developed to extract domain-specific semantics for text indexing, and the traditional cosine-based category score model is extended with domain knowledge. Experiments show that the Rocchio algorithm together with the proposed category score model performs well.

- Regular Papers | Pp. 32-43
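The standard Rocchio categorizer the abstract builds on can be sketched in a few lines: each category is represented by the centroid of its training documents, and a new document goes to the category whose centroid is closest by cosine similarity. The toy word vectors and the two category labels below are illustrative stand-ins for real tf-idf features and the UCD vocabulary:

```python
# Minimal Rocchio text categorizer: centroid per category, cosine
# similarity for assignment. Toy data; real systems use tf-idf vectors.
import math

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(docs):
    c = {}
    for d in docs:
        for k, x in d.items():
            c[k] = c.get(k, 0.0) + x / len(docs)
    return c

# Toy training data: column-name descriptions per UCD-like category
# (category names here are illustrative).
training = {
    "pos.eq.ra": [{"right": 1, "ascension": 1}, {"ra": 1, "right": 1}],
    "phot.mag":  [{"magnitude": 1, "band": 1}, {"mag": 1, "v": 1}],
}
centroids = {cat: centroid(docs) for cat, docs in training.items()}

def classify(doc):
    """Assign doc to the category with the most similar centroid."""
    return max(centroids, key=lambda c: cosine(doc, centroids[c]))

print(classify({"right": 1, "ascension": 1}))  # -> pos.eq.ra
```

The paper's contribution, per the abstract, is extending this cosine-based category score with domain knowledge; the plain centroid/cosine scheme above is only the baseline it starts from.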

Towards an XML Representation of Proper Names and Their Relationships

Béatrice Bouchou; Mickael Tran; Denis Maurel

The presented work is part of the Prolex project, whose aim is the design and implementation of a multilingual dictionary of proper names and their relationships. It focuses on the design of a standard XML representation for this kind of information. We first present the main lines of the conceptual model for proper names (a classical entity-relationship model), then report on our experiment in designing an XML schema from this conceptual model. We describe the current resulting schema and discuss its main features.

- Regular Papers | Pp. 44-55

Empirical Textual Mining to Protein Entities Recognition from PubMed Corpus

Tyne Liang; Ping-Ke Shih

Named Entity Recognition (NER) from biomedical literature is crucial for biomedical knowledge base automation. In this paper, both empirical rule-based and statistical approaches to protein entity recognition are presented and investigated on a general corpus, GENIA 3.02p, and a new domain-specific corpus, SRC. Experimental results show that the rules derived from SRC are useful even though they are simpler and more general than those used by other rule-based approaches. Meanwhile, a concise HMM-based model with a rich set of features is presented and shown to be robust and competitive when compared to other successful hybrid models. In addition, the resolution of coordination variants, which are common in entity recognition, is addressed; by applying heuristic rules and a clustering strategy, the presented resolver is shown to be feasible.

- Regular Papers | Pp. 56-66
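The coordination-variant problem the abstract mentions arises when one phrase packs several entity mentions together, e.g. “91 and 84 kDa proteins”. A single heuristic rule of the kind such a resolver might apply can be sketched as follows; the pattern is illustrative, not the authors' actual rule set:

```python
# Sketch: expand a coordination variant like "91 and 84 kDa proteins"
# into its full entity mentions. One illustrative heuristic, not the
# paper's complete resolver.
import re

def expand_coordination(phrase):
    """Split 'A and B TAIL' into ['A TAIL', 'B TAIL']; else pass through."""
    m = re.match(r"(\S+) and (\S+) (.+)", phrase)
    if m:
        left, right, tail = m.groups()
        return [f"{left} {tail}", f"{right} {tail}"]
    return [phrase]

print(expand_coordination("91 and 84 kDa proteins"))
# -> ['91 kDa proteins', '84 kDa proteins']
```

A real resolver needs many such patterns plus, per the abstract, a clustering strategy to decide which conjuncts actually share the elided head.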

Automatic Extraction of Semantic Relationships for WordNet by Means of Pattern Learning from Wikipedia

Maria Ruiz-Casado; Enrique Alfonseca; Pablo Castells

This paper describes an automatic approach to identifying lexical patterns that represent semantic relationships between concepts in an on-line encyclopedia. These patterns can then be applied to extend existing ontologies or semantic networks with new relations. The experiments have been performed with the Simple English Wikipedia and WordNet 1.7. A new algorithm has been devised for automatically generalising the lexical patterns found in the encyclopedia entries. We have found general patterns for the hyperonymy, hyponymy, holonymy and meronymy relations and, using them, have extracted more than 1200 new relationships that did not originally appear in WordNet. The precision of these relationships ranges between 0.61 and 0.69, depending on the relation.

- Regular Papers | Pp. 67-79
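The pattern-generalisation step the abstract describes can be illustrated by merging two lexical patterns for the same relation: tokens they share are kept, mismatched stretches collapse to a wildcard. The algorithm below is a simplified longest-common-subsequence variant, an assumption for illustration rather than the authors' exact procedure:

```python
# Sketch: generalise two lexical patterns for the same relation by
# keeping shared tokens and wildcarding the differences (simplified
# LCS-based merge, not the paper's exact algorithm).
from difflib import SequenceMatcher

def generalise(p1, p2, wildcard="*"):
    out, last_a, last_b = [], 0, 0
    for a, b, n in SequenceMatcher(a=p1, b=p2).get_matching_blocks():
        if a > last_a or b > last_b:   # a mismatched stretch on either side
            out.append(wildcard)
        out.extend(p1[a:a + n])
        last_a, last_b = a + n, b + n
    return out

# HYPO/HYPER are placeholders for the related concept slots.
p1 = "HYPO is a kind of HYPER".split()
p2 = "HYPO is a type of HYPER".split()
print(generalise(p1, p2))
# -> ['HYPO', 'is', 'a', '*', 'of', 'HYPER']
```

Repeating the merge over many extracted patterns yields a small set of general patterns, which is what makes the 1200+ new relations reported in the abstract reachable.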

Combining Data-Driven Systems for Improving Named Entity Recognition

Zornitsa Kozareva; Oscar Ferrández; Andres Montoyo; Rafael Muñoz; Armando Suárez

The increasing flow of digital information requires the extraction, filtering and classification of pertinent information from large volumes of text. An important preprocessing step for these tasks is the recognition of named entities, i.e. a Named Entity Recognition (NER) task. In this paper we propose a completely automatic NER system which identifies proper names in texts and classifies them into a set of predefined categories of interest, such as person names, organizations (companies, government organizations, committees, etc.) and locations (cities, countries, rivers, etc.). We examine the differences in the language models learned by different data-driven systems performing the same NLP task and how these differences can be exploited to yield higher accuracy than the best individual system. Three NE classifiers (Hidden Markov Models, Maximum Entropy and a memory-based learner) are trained on the same corpus and, after comparison, their outputs are combined using a voting strategy. The results are encouraging: 98.5% accuracy for recognition and 84.94% accuracy for classification of NEs in Spanish were achieved.

- Regular Papers | Pp. 80-90
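The combination step can be sketched as per-token majority voting over the three classifiers' tag sequences. Plain majority is only one of several possible voting strategies, and the BIO tags and toy outputs below are illustrative:

```python
# Sketch: combine per-token outputs of several NE classifiers by
# majority voting (plain majority; weighted schemes are also possible).
from collections import Counter

def vote(predictions):
    """predictions: one tag sequence per classifier; returns merged tags."""
    combined = []
    for token_tags in zip(*predictions):
        tag, _ = Counter(token_tags).most_common(1)[0]
        combined.append(tag)
    return combined

# Toy outputs for the tokens of one sentence (BIO tagging).
hmm_out = ["B-PER", "I-PER", "O", "B-LOC"]
mem_out = ["B-PER", "I-PER", "O", "B-ORG"]
ent_out = ["B-PER", "O",     "O", "B-LOC"]
print(vote([hmm_out, mem_out, ent_out]))
# -> ['B-PER', 'I-PER', 'O', 'B-LOC']
```

Voting helps exactly when the classifiers make different kinds of errors, which is why the abstract stresses examining the differences between the learned language models before combining them.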

Natural Language Processing: Mature Enough for Requirements Documents Analysis?

Leonid Kof

Requirements engineering is the Achilles’ heel of the whole software development process, because requirements documents are often inconsistent and incomplete. Misunderstandings and errors of the requirements engineering phase propagate to later development phases and can potentially lead to a project failure.

A promising way to overcome misunderstandings is to extract and validate the terms used in requirements documents and the relations between them. This position paper gives an overview of existing terminology extraction methods and shows how they can be integrated into a comprehensive text analysis approach. It shows how the integrated method would both detect inconsistencies in a requirements document and extract an ontology once the inconsistencies have been eliminated. This integrated method would be more reliable than any of its constituents alone.

- Regular Papers | Pp. 91-102

Improving Text Categorization Using Domain Knowledge

Jingbo Zhu; Wenliang Chen

In this paper, we study and propose an approach to improving document classification using domain knowledge. First we introduce a domain knowledge dictionary, NEUKD, and propose two models which use domain knowledge as textual features for text categorization. The first, the BOTW model, uses domain-associated terms and conventional words as textual features. The other, the BOF model, uses domain features as textual features. Because the size of the domain knowledge dictionary is limited, we apply a machine learning technique to address this and propose a BOL model, which can be considered an extended version of the BOF model. In the comparison experiments, a naïve Bayes system based on the BOW model serves as the baseline. Results for naïve Bayes systems based on the four models (BOW, BOTW, BOF and BOL) show that domain knowledge is very useful for improving text categorization. The BOTW model performs better than the BOW model, and the BOL and BOF models perform better than the BOW model when the number of features is small. By learning new features with a machine learning technique, the BOL model performs better than the BOF model.

- Regular Papers | Pp. 103-113
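The BOW baseline all four models are compared against is a naïve Bayes classifier over bag-of-words features; swapping raw tokens for domain-associated terms or domain features would give the BOTW/BOF variants. A minimal sketch with toy data and add-one smoothing (the categories and documents are invented for illustration):

```python
# Minimal multinomial naive Bayes over bag-of-words features, the BOW
# baseline described in the abstract. Toy data, add-one smoothing.
import math
from collections import Counter

train = [
    ("sports",  "match team goal score".split()),
    ("sports",  "team win league".split()),
    ("finance", "stock market price".split()),
    ("finance", "bank market loan".split()),
]

vocab = {w for _, doc in train for w in doc}
priors = Counter(label for label, _ in train)
counts = {}                      # class -> Counter of word counts
for label, doc in train:
    counts.setdefault(label, Counter()).update(doc)

def classify(doc):
    """Pick the class maximizing log P(class) + sum log P(word|class)."""
    def log_prob(label):
        total = sum(counts[label].values())
        lp = math.log(priors[label] / len(train))
        for w in doc:
            lp += math.log((counts[label][w] + 1) / (total + len(vocab)))
        return lp
    return max(counts, key=log_prob)

print(classify("team score".split()))  # -> sports
```

Replacing the token features with entries looked up in a dictionary like NEUKD is the essence of the paper's BOTW/BOF models; the classifier itself stays the same.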