Publications catalog - books



Text, Speech and Dialogue: 10th International Conference, TSD 2007, Pilsen, Czech Republic, September 3-7, 2007. Proceedings

Václav Matoušek ; Pavel Mautner (eds.)

In conference: 10th International Conference on Text, Speech and Dialogue (TSD). Pilsen, Czech Republic. September 3, 2007 - September 7, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Language Translation and Linguistics; Artificial Intelligence (incl. Robotics); Data Mining and Knowledge Discovery; Information Storage and Retrieval; Information Systems Applications (incl. Internet)

Availability
Detected institution | Year of publication | Browse | Download | Request
Not detected | 2007 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-74627-0

Electronic ISBN

978-3-540-74628-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Disambiguating Hypernym Relations for Roget's Thesaurus

Alistair Kennedy; Stanisław Szpakowicz

Roget's Thesaurus is a lexical resource which groups terms by semantic relatedness. It is Roget's shortcoming that the relations are ambiguous, in that it does not name them; it only shows that there is a relation between terms. Our work focuses on disambiguating hypernym relations within Roget's Thesaurus. Several techniques of identifying hypernym relations are compared and contrasted in this paper, and a total of over 50,000 hypernym relations have been disambiguated within Roget's. Human judges have evaluated the quality of our disambiguation techniques, and we have demonstrated on several applications the usefulness of the disambiguated relations.

- Text | Pp. 66-75

Dependency and Phrasal Parsers of the Czech Language: A Comparison

Aleš Horák; Tomáš Holan; Vladimír Kadlec; Vojtěch Kovář

In this paper, we present the results of an experiment comparing the effectiveness of real-text parsers of the Czech language that are based on completely different approaches: stochastic parsers that provide dependency trees as their outputs, and a meta-grammar parser that generates a chart structure representing a packed forest of phrasal derivation trees.

We describe and formulate the main questions and problems accompanying such an experiment, try to offer answers to these questions, and finally present the factual results of tests measured on 10 thousand Czech sentences.

- Text | Pp. 76-84

Automatic Word Clustering in Russian Texts

Olga Mitrofanova; Anton Mukhin; Polina Panicheva; Vyacheslav Savitsky

The paper deals with the development and application of an automatic word clustering (AWC) tool aimed at processing Russian texts of various types, which should satisfy the requirements of flexibility and compatibility with other linguistic resources. The construction of the AWC tool requires a computer implementation of latent semantic analysis (LSA) combined with clustering algorithms. To meet this need, Python-based software has been developed. The major procedures performed by the AWC tool are segmentation of input texts and context analysis, co-occurrence matrix construction, and agglomerative and k-means clustering. Special attention is drawn to experimental results on clustering words in raw texts with changing parameters.

- Text | Pp. 85-91
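
The pipeline named in this abstract (co-occurrence matrix construction followed by clustering) can be illustrated with a minimal pure-Python sketch. It is not the authors' AWC tool: the function names, the window size, the cosine-based assignment, and the caller-supplied seed words are all assumptions made for illustration, and the LSA dimensionality-reduction step is omitted entirely.

```python
import math
from collections import Counter, defaultdict

def cooccurrence(tokens, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[word][tokens[j]] += 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def kmeans(vectors, seeds, iterations=10):
    """k-means variant using cosine similarity; `seeds` are words whose
    context vectors serve as the initial centroids (an assumption of this sketch)."""
    centroids = [Counter(vectors[s]) for s in seeds]
    assignment = {}
    for _ in range(iterations):
        # Assign every word to its most similar centroid.
        assignment = {
            w: max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for w, v in vectors.items()
        }
        # Recompute each centroid as the summed vector of its members;
        # cosine similarity only depends on the direction of that sum.
        for c in range(len(centroids)):
            members = [vectors[w] for w, a in assignment.items() if a == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = total
    return assignment

# Toy usage: "b" and "c" occur only next to "a", so they share a cluster.
vectors = cooccurrence("a b a c".split(), window=1)
clusters = kmeans(vectors, seeds=["a", "b"])
```

Words that appear in similar contexts end up with similar rows of the co-occurrence matrix, which is what lets a clustering step group them.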

Feature Engineering in Maximum Spanning Tree Dependency Parser

Václav Novák; Zdeněk Žabokrtský

In this paper we present the results of our experiments with modifications of the feature set used in the Czech mutation of the Maximum Spanning Tree parser. First we show how new feature templates improve the parsing accuracy and second we decrease the dimensionality of the feature space to make the parsing process more effective without sacrificing accuracy.

- Text | Pp. 92-98

Automatic Selection of Heterogeneous Syntactic Features in Semantic Similarity of Polish Nouns

Maciej Piasecki; Stanisław Szpakowicz; Bartosz Broda

We present experiments with a variety of corpus-based measures applied to the problem of constructing semantic similarity functions for Polish nouns. Rich inflection in Polish allows us to acquire useful syntactic features without parsing; morphosyntactic restrictions checked in a large enough window provide sufficiently useful data. A novel feature selection method gives the accuracy of 86% on the WordNet-based synonymy test, an improvement of 5% over the previous results.

- Text | Pp. 99-106

Bilingual News Clustering Using Named Entities and Fuzzy Similarity

Soto Montalvo; Raquel Martínez; Arantza Casillas; Víctor Fresno

This paper focuses on discovering bilingual news clusters in a comparable corpus. In particular, we deal with the representation of news items and with the calculation of the similarity between documents. As representative features of the news we use the cognate named entities they contain. One of our main goals is to test whether named entities alone are a good source of knowledge for multilingual news clustering. In the vectorial news representation we take into account the category of the named entities. To determine the similarity between two documents, we propose a new approach based on a fuzzy system, with a knowledge base that tries to incorporate human knowledge about the importance of the named entity category in the news. We have compared our approach with a traditional one, obtaining better results on a comparable corpus with news in Spanish and English.

- Text | Pp. 107-114

Extractive Summarization of Broadcast News: Comparing Strategies for European Portuguese

Ricardo Ribeiro; David Martins de Matos

This paper presents the comparison between three methods for extractive summarization of Portuguese broadcast news: feature-based, Maximal Marginal Relevance, and Latent Semantic Analysis. The main goal is to understand the level of agreement among the automatic summaries and how they compare to summaries produced by non-professional human summarizers. Results were evaluated using the ROUGE-L metric. Maximal Marginal Relevance performed close to human summarizers. Both feature-based and Latent Semantic Analysis automatic summarizers performed close to each other and worse than Maximal Marginal Relevance, when compared to the summaries done by the human summarizers.

- Text | Pp. 115-122
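
Of the three methods compared in this abstract, Maximal Marginal Relevance is the easiest to show in a few lines. The sketch below is a generic greedy MMR selector, not the authors' system: the bag-of-words representation, the `query` argument (here any text standing in for the topic of the news story), and the default trade-off `lam=0.7` are assumptions chosen for illustration.

```python
import math
from collections import Counter

def bow(text):
    """Lowercased bag-of-words vector for a piece of text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def mmr_summary(sentences, query, size=2, lam=0.7):
    """Greedily pick `size` sentences, trading relevance to `query`
    against redundancy with the sentences already selected."""
    vectors = [bow(s) for s in sentences]
    q = bow(query)
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < size:
        def score(i):
            # Redundancy is the highest similarity to any already chosen sentence.
            redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
            return lam * cosine(vectors[i], q) - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(selected)]

# Toy usage: the duplicated sentence is penalized as redundant.
story = ["storm hits coast", "storm hits coast", "markets fell"]
summary = mmr_summary(story, "storm coast markets")
```

Raising `lam` toward 1 favors relevance alone, while lowering it pushes the selection toward more diverse sentences.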

On the Evaluation of Korean WordNet

Altangerel Chagnaa; Ho-Seop Choe; Cheol-Young Ock; Hwa-Mook Yoon

WordNet has become an important and useful resource for the natural language processing field. Recently, many countries have been developing their own WordNets. In this paper we present an evaluation of the Korean WordNet (U-WIN). The purpose of the work is to study how well the manually created lexical taxonomy U-WIN is built. Evaluation is done level by level; words are selected for each level so that we can compare the levels and find relations between them. As a result, the words at a certain level (level 6) give the best score, from which we conclude that the words at this level are better organized than those at other levels. The score decreases as the level goes up or down from this particular level.

- Text | Pp. 123-130

An Adaptive Keyboard with Personalized Language-Based Features

Siska Fitrianie; Leon J. M. Rothkrantz

Our research is about an adaptive keyboard, which autonomously adjusts its predictive features and key displays to current user input. We used personalized word prediction to improve the performance of such a system. Prediction using a common English dictionary (represented by the British National Corpus) is compared with prediction using personal data, such as personal documents, chat logs, and personal emails. A user study was also conducted to gather requirements for a new keyboard design. Based on these studies, we developed a personalized and adaptive on-screen keyboard for both single-handed and zero-handed users. It combines tapping-based and motion-based text input with language-based acceleration techniques, including a personalized and adaptive task-based dictionary, frequent character prompting, word completion, and a grammar checker with suffix completion.

- Text | Pp. 131-138
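
The contrast drawn in this abstract between prediction from a general dictionary and prediction from personal data can be sketched as word completion over two frequency tables. This is an illustrative assumption, not the authors' keyboard: the `complete` function, the linear interpolation, and the default `weight=0.8` given to personal data are all hypothetical.

```python
from collections import Counter

def build_model(text):
    """Word-frequency table from raw text (whitespace tokenization)."""
    return Counter(text.lower().split())

def complete(prefix, general, personal, weight=0.8, k=3):
    """Return up to `k` completions of `prefix`, ranked by a weighted mix of
    relative frequency in the personal and the general word lists."""
    g_total = sum(general.values()) or 1
    p_total = sum(personal.values()) or 1
    candidates = {w for w in list(general) + list(personal) if w.startswith(prefix)}

    def score(w):
        return weight * personal[w] / p_total + (1 - weight) * general[w] / g_total

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(candidates, key=lambda w: (-score(w), w))[:k]

# Toy usage: "thesis" is rare in general text but frequent for this user.
general = Counter({"the": 5, "there": 3, "thesis": 1})
personal = build_model("thesis thesis thanks thesis thanks thesis")
suggestions = complete("the", general, personal)
```

With the weight shifted toward the general table, the same prefix would rank "the" first, which is the kind of difference a personalized dictionary is meant to exploit.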

An All-Path Parsing Algorithm for Constraint-Based Dependency Grammars of CF-Power

Tomasz Obrębski

An all-path parsing algorithm for a constraint-based dependency grammar of context-free power is presented. The grammar specifies possible dependencies between words together with a number of constraints. The algorithm builds a packed representation of ambiguous syntactic structure in the form of a dependency graph. For certain types of ambiguities the graph grows slower than the chart or parse forest.

- Text | Pp. 139-146