Publications catalog - books

Natural Language Processing and Information Systems: 10th International Conference on Applications of Natural Language to Information Systems, NLDB 2005, Alicante, Spain, June 15-17, Proceedings

Andrés Montoyo; Rafael Muñoz; Elisabeth Métais (eds.)

In conference: 10th International Conference on Applications of Natural Language to Information Systems (NLDB). Alicante, Spain. June 15, 2005 - June 17, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Database Management; Computer Communication Networks; Logics and Meanings of Programs; Mathematical Logic and Formal Languages; Information Storage and Retrieval; Artificial Intelligence (incl. Robotics)

Availability
Detected institution   Publication year   Browse   Download   Request
None detected   2005   SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-26031-8

Electronic ISBN

978-3-540-32110-1

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Text2Onto

Philipp Cimiano; Johanna Völker

In this paper we present Text2Onto, a framework for ontology learning from textual resources. Three main features distinguish Text2Onto from our earlier framework TextToOnto as well as other state-of-the-art ontology learning frameworks. First, by representing the learned knowledge at a meta-level in the form of instantiated modeling primitives within a so-called Probabilistic Ontology Model (POM), we remain independent of a concrete target language while being able to translate the instantiated primitives into any (reasonably expressive) knowledge representation formalism. Second, user interaction is a core aspect of Text2Onto, and the fact that the system calculates a confidence for each learned object allows the design of sophisticated visualizations of the POM. Third, by incorporating strategies for data-driven change discovery, we avoid processing the whole corpus from scratch each time it changes, instead selectively updating the POM according to the corpus changes. Besides increasing efficiency, this also allows a user to trace the evolution of the ontology with respect to the changes in the underlying corpus.

- Regular Papers | Pp. 227-238
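
As a rough illustration of the Probabilistic Ontology Model idea described in the Text2Onto abstract above, the following Python sketch stores learned primitives together with a confidence value and keeps the evidence per document, so that a corpus change only requires re-extracting the affected document. The class and method names (`POM`, `add_document`, `remove_document`, `confidences`) and the relative-frequency confidence are assumptions made for illustration; they are not Text2Onto's actual API or algorithms.

```python
from collections import Counter


class POM:
    """Toy probabilistic ontology model: primitives with confidences."""

    def __init__(self):
        # doc_id -> Counter of (primitive_type, args) evidence from that document
        self._evidence = {}

    def add_document(self, doc_id, observations):
        """Store (or overwrite) the evidence extracted from one document."""
        self._evidence[doc_id] = Counter(observations)

    def remove_document(self, doc_id):
        """Drop one document's evidence without touching the rest."""
        self._evidence.pop(doc_id, None)

    def confidences(self):
        """Relative-frequency confidence of each primitive within its type."""
        totals = Counter()
        for counts in self._evidence.values():
            totals.update(counts)
        per_type = Counter()
        for (ptype, _args), n in totals.items():
            per_type[ptype] += n
        return {prim: n / per_type[prim[0]] for prim, n in totals.items()}


# Hypothetical extraction output for two documents.
pom = POM()
pom.add_document("d1", [("concept", ("ontology",)),
                        ("concept", ("ontology",)),
                        ("subclass_of", ("ontology", "model"))])
pom.add_document("d2", [("concept", ("corpus",))])
conf = pom.confidences()
```

Because the primitives are plain (type, arguments) tuples, they could be translated into a concrete target language in a separate step, and a user interface could sort or filter them by the confidence values.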

Interaction Transformation Patterns Based on Semantic Roles

Isabel Díaz; Lidia Moreno; Oscar Pastor; Alfredo Matteo

This paper presents a strategy to deduce interactions from the text of use cases. This strategy is used by Metamorphosis, an automatic software production framework conceived to facilitate the modelling of the interactions of a system. Metamorphosis follows a linguistic engineering approach that is centred on the construction of models through the successive transformation of these models, on the definition of semantic roles, and on the application of design patterns. To obtain the Interaction Model of a system, three transformation levels are defined: the system, the use case, and the sentence. This paper focuses on how the transformation of a sentence is performed. Each transformation pattern specifies how to obtain information from the semantic context of a sentence in order to deduce its corresponding interaction fragment. Some of the results obtained from the validation of these patterns are also presented.

- Regular Papers | Pp. 239-250

Query Refinement Through Lexical Clustering of Scientific Textual Databases

Eric SanJuan

The TermWatch system automatically extracts multi-word terms from scientific texts based on morphological analysis and relates them through linguistic variations. The resulting terminological network is clustered with a 3-level hierarchical graph algorithm and mapped onto a 2D space. Clusters are automatically labeled based on variation activity. After a precise review of the methodology, this paper evaluates, in the context of querying a scientific textual database, the overlap of terms and cluster labels with the keywords selected by human indexers, as well as the set of possible queries based on the clustering output. The results show that the linguistic variation paradigm is a robust way of automatically extracting and structuring a user comprehensive terminological resource for query refinement.

- Regular Papers | Pp. 251-262

Automatic Filtering of Bilingual Corpora for Statistical Machine Translation

Shahram Khadivi; Hermann Ney

For many applications, such as machine translation and bilingual information retrieval, bilingual corpora play an important role in training the system. Because they are obtained through automatic or semi-automatic methods, they usually include noise: sentence pairs that are worthless or even harmful for training the system. We study the effect of different levels of corpus noise on an end-to-end statistical machine translation system. We also propose an efficient method for corpus filtering. This method filters out the noisy part of a corpus based on state-of-the-art word alignment models. We show the efficiency of this method on the basis of the sentence misalignment rate of the filtered corpus and its positive effect on the translation quality.

- Regular Papers | Pp. 263-274
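
To make the idea of alignment-based corpus filtering from the abstract above more concrete, here is a minimal Python sketch that scores each sentence pair with an IBM Model 1 style lexical probability and drops pairs below a threshold. The abstract does not specify the scoring function, so the one-directional score, the toy lexicon `T_TABLE`, the threshold value, and the function names are all assumptions made for illustration.

```python
import math


def model1_score(src, tgt, t_table, floor=1e-6):
    """Length-normalised log P(tgt | src) under an IBM Model 1 style lexicon."""
    src_tokens = ["<null>"] + src.split()
    tgt_tokens = tgt.split()
    if not tgt_tokens:
        return float("-inf")
    log_prob = 0.0
    for t in tgt_tokens:
        # Average the lexical probability over all source positions;
        # unseen word pairs fall back to a small floor probability.
        p = sum(t_table.get((s, t), floor) for s in src_tokens) / len(src_tokens)
        log_prob += math.log(p)
    return log_prob / len(tgt_tokens)


def filter_corpus(pairs, t_table, threshold):
    """Keep sentence pairs whose score is at or above the threshold."""
    return [(s, t) for s, t in pairs if model1_score(s, t, t_table) >= threshold]


# Hypothetical toy lexicon: (source word, target word) -> probability.
T_TABLE = {("das", "the"): 0.8, ("haus", "house"): 0.9,
           ("ist", "is"): 0.9, ("klein", "small"): 0.9}
PAIRS = [("das haus ist klein", "the house is small"),
         ("das haus ist klein", "stock prices fell sharply")]
kept = filter_corpus(PAIRS, T_TABLE, threshold=-5.0)
```

In this toy setting the misaligned second pair receives a much lower score than the first and is filtered out. A real system would estimate the lexicon from data and would typically combine scores from both translation directions.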

An Approach to Clustering Abstracts

Mikhail Alexandrov; Alexander Gelbukh; Paolo Rosso

Free access to full-text scientific papers in major digital libraries and other web repositories is limited to their abstracts, which consist of no more than several dozen words. Current keyword-based techniques allow for clustering this type of short text only when the data set is multi-category, e.g., some documents are devoted to sport, others to medicine, others to politics, etc. However, they fail on narrow domain-oriented libraries, e.g., those containing documents only on physics, or only on geology, or only on computational linguistics. Nevertheless, precisely such data sets are the most frequent and most interesting ones. We propose a simple procedure to cluster abstracts, which consists in grouping keywords and using a more adequate document similarity measure. We use Stein’s MajorClust method for clustering both keywords and documents. We illustrate our approach on the texts from the Proceedings of a narrow-topic conference. Limitations of our approach are also discussed. Our preliminary experiments show that abstracts cannot be clustered with the same quality as full texts, though the achieved quality is adequate for many applications; accordingly, we support Makagonov’s proposal that digital libraries should provide document images of full texts of the papers (and not only abstracts) for open access via the Internet, in order to help in search, classification, clustering, selection, and proper referencing of the papers.

- Regular Papers | Pp. 275-285
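
The abstract above names the MajorClust method for clustering keywords and documents. The following Python sketch shows a simplified, deterministic variant of that idea: every node starts in its own cluster and repeatedly joins the cluster that attracts it most strongly through its weighted edges. The tie-breaking rule, the iteration cap, and the example similarity weights are assumptions made for illustration; the similarity measure used in the paper is not given in the abstract.

```python
from collections import defaultdict


def majorclust(weights, max_iter=100):
    """Cluster an undirected weighted graph.

    weights: dict mapping (u, v) edges to similarity weights.
    Returns a dict mapping each node to a cluster id.
    """
    neighbours = defaultdict(dict)
    for (u, v), w in weights.items():
        neighbours[u][v] = w
        neighbours[v][u] = w
    nodes = sorted(neighbours)
    cluster = {n: i for i, n in enumerate(nodes)}  # one cluster per node
    for _ in range(max_iter):
        changed = False
        for n in nodes:
            # Sum the edge weights towards each neighbouring cluster.
            attraction = defaultdict(float)
            for m, w in neighbours[n].items():
                attraction[cluster[m]] += w
            # Deterministic tie-break: prefer the smallest cluster id.
            best = min(attraction, key=lambda c: (-attraction[c], c))
            # Move only if the best cluster is strictly more attractive.
            if attraction[best] > attraction.get(cluster[n], 0.0):
                cluster[n] = best
                changed = True
        if not changed:
            break
    return cluster


# Hypothetical similarity graph: two tight groups joined by one weak edge.
EDGES = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0,
         ("d", "e"): 1.0, ("d", "f"): 1.0, ("e", "f"): 1.0,
         ("c", "d"): 0.1}
labels = majorclust(EDGES)
```

The same function could be applied first to a keyword graph and then to a document graph, as long as each is expressed as weighted edges between items.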

Named Entity Recognition for Web Content Filtering

José María Gómez Hidalgo; Francisco Carrero García; Enrique Puertas Sanz

Effective Web content filtering is a necessity in educational and workplace environments, but current approaches are far from perfect. We discuss a model for text-based intelligent Web content filtering, in which shallow linguistic analysis plays a key role. In order to demonstrate how this model can be realized, we have developed a lexical Named Entity Recognition system, and used it to improve the effectiveness of statistical Automated Text Categorization methods. We have performed several experiments that confirm this fact, and encourage the integration of other shallow linguistic processing techniques in intelligent Web content filtering.

- Regular Papers | Pp. 286-297

The Role of Word Sense Disambiguation in Automated Text Categorization

José María Gómez Hidalgo; Manuel de Buenaga Rodríguez; José Carlos Cortizo Pérez

Automated Text Categorization has reached the levels of accuracy of human experts. Provided that enough training data is available, it is possible to learn accurate automatic classifiers by using Information Retrieval and Machine Learning techniques. However, the performance of this approach is damaged by the problems derived from language variation (especially polysemy and synonymy). We investigate how Word Sense Disambiguation can be used to alleviate these problems, by using two traditional methods for thesaurus usage in Information Retrieval, namely Query Expansion and Concept Indexing. These methods are evaluated on the problem of using the Lexical Database WordNet for text categorization, focusing on the Word Sense Disambiguation step involved. Our experiments demonstrate that rather simple dictionary methods, and baseline statistical approaches, can be used to disambiguate words and improve text representation and learning in both Query Expansion and Concept Indexing approaches.

- Regular Papers | Pp. 298-309
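
The difference between the two thesaurus-based methods mentioned in the abstract above can be sketched in a few lines of Python. The example uses a simplified Lesk-style gloss overlap as the dictionary-based disambiguation step, then either replaces words by sense identifiers (concept indexing) or appends synonyms of the chosen sense (query expansion). The mini sense inventory `SENSES`, the sense identifiers, and the function names are hypothetical; the paper itself works with WordNet, which is not used here.

```python
# Hypothetical mini sense inventory: word -> {sense id: (gloss words, synonyms)}.
SENSES = {
    "bank": {
        "bank.finance": ({"money", "loan", "deposit", "account"}, ["depository"]),
        "bank.river": ({"river", "water", "shore", "slope"}, ["riverside"]),
    },
}


def disambiguate(word, context):
    """Pick the sense whose gloss overlaps the context most (simplified Lesk)."""
    senses = SENSES.get(word)
    if not senses:
        return None
    ctx = set(context)
    # sorted() makes ties resolve to the alphabetically first sense id.
    return max(sorted(senses), key=lambda s: len(senses[s][0] & ctx))


def concept_index(tokens):
    """Concept indexing: replace ambiguous words by their sense identifier."""
    return [disambiguate(t, tokens) or t for t in tokens]


def expand(tokens):
    """Query expansion: keep the words and append synonyms of the chosen sense."""
    out = list(tokens)
    for t in tokens:
        sense = disambiguate(t, tokens)
        if sense:
            out.extend(SENSES[t][sense][1])
    return out


doc = "the bank approved the loan".split()
indexed = concept_index(doc)
expanded = expand(doc)
```

Here the context word "loan" selects the financial sense of "bank", so concept indexing yields a sense identifier in place of the word, while query expansion leaves the text intact and adds the synonym.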

Combining Biological Databases and Text Mining to Support New Bioinformatics Applications

René Witte; Christopher J. O. Baker

A large amount of biological knowledge today is only available from full-text research papers. Since neither manual database curators nor users can keep up with the rapidly expanding volume of scientific literature, natural language processing approaches are becoming increasingly important for bioinformatics projects.

In this paper, we go beyond simply extracting information from full-text articles by describing an architecture that supports targeted access to information from biological databases using the results derived from text mining of research papers, thereby integrating information from both sources within a biological application.

The described architecture is currently being used to extract information about protein mutations from full-text research papers. Text mining results drive the retrieval of sequence information from protein databases and the employment of algorithmic sequence analysis tools, which facilitate further data access from protein structure databases. Complex mapping of NLP-derived text annotations to protein structures allows the rendering, with 3D structure visualization, of information not available in databases of mutation annotations.

- Regular Papers | Pp. 310-321

A Semi-automatic Approach to Extracting Common Sense Knowledge from Knowledge Sources

Veda C. Storey; Vijayan Sugumaran; Yi Ding

Common sense knowledge based systems are developed by researchers to enable machines to understand ordinary knowledge and reason intelligently as a human would. The knowledge repositories of such systems are usually developed manually by a knowledge engineer or by users. Building a knowledge base of common sense knowledge such as that possessed by an average human being would be a very time-consuming, if not impossible, task. Some aspects of real world knowledge have already been captured and organized into various repositories such as the World Wide Web, WordNet, and the DAML ontology library. However, the extraction and integration of common sense knowledge from those sources remains a challenge. To address this challenge, an architecture for a Common Sense Knowledge Extractor is proposed that serves as an intermediary tool to extract common sense knowledge from several knowledge sources in order to develop a common sense repository. The design of the system as an extension of prior research on intelligent query processing is presented.

- Regular Papers | Pp. 322-332

A Phrasal Approach to Natural Language Interfaces over Databases

Michael Minock

This short paper introduces the STEP system for natural language access to relational databases. In contrast to most work in the area, STEP adopts a phrasal approach; an administrator couples phrasal patterns to elementary expressions of tuple relational calculus. This ‘phrasal lexicon’ is used bi-directionally, enabling the generation of natural language from tuple relational calculus and the inverse parsing of natural language to tuple calculus. This ability to both understand and generate natural language enables STEP to engage the user in clarification dialogs when the parse of their query is of questionable quality or is open to multiple interpretations. An on-line demonstration of STEP is accessible at .

- Regular Papers | Pp. 333-336
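
The bidirectional use of a phrasal lexicon described in the STEP abstract above can be illustrated with a small Python sketch. Each lexicon entry couples a phrasal pattern with a condition written in a loose textual stand-in for tuple relational calculus, and the same entries are used for parsing and for generation. The lexicon contents, the single `{x}` slot, the calculus notation, and the function names are assumptions made for illustration and do not reflect STEP's actual implementation.

```python
import re

# Hypothetical phrasal lexicon: phrasal pattern <-> elementary calculus condition.
LEXICON = [
    ("cities in {x}", "City(c) AND c.country = '{x}'"),
    ("cities with more than {x} inhabitants", "City(c) AND c.population > {x}"),
]


def _match(template, text):
    """Return the slot value if text fits the template, else None."""
    prefix, suffix = template.split("{x}")
    m = re.fullmatch(re.escape(prefix) + r"(.+)" + re.escape(suffix), text)
    return m.group(1) if m else None


def _translate(text, pairs):
    """Rewrite text with every (source, target) pair whose source matches."""
    results = []
    for source, target in pairs:
        value = _match(source, text)
        if value is not None:
            results.append(target.replace("{x}", value))
    return results


def parse(phrase):
    """Natural language -> calculus conditions (all matching readings)."""
    return _translate(phrase, LEXICON)


def generate(expression):
    """Calculus condition -> natural-language paraphrases."""
    return _translate(expression, [(calc, phrase) for phrase, calc in LEXICON])


readings = parse("cities in Sweden")
paraphrases = generate("City(c) AND c.population > 100000")
```

When `parse` returns more than one reading, a system in this style could call `generate` on each reading and present the paraphrases to the user as a clarification question.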