Publications catalog - books



Open Access Title

MARE-WINT

In conference: 27th International Conference of the German Society for Computational Linguistics and Language Technology (GSCL). Berlin, Germany. September 13, 2017 - September 14, 2017

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

renewable; green; energy; environment; law; policy

Availability
Detected institution | Publication year | Browse / Download / Request
Not required | 2018 | Directory of Open Access Books | open access
Not required | 2018 | SpringerLink | open access

Information

Resource type:

books

Print ISBN

978-3-319-73705-8

Electronic ISBN

978-3-319-73706-5

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Table of contents

Reconstruction of Separable Particle Verbs in a Corpus of Spoken German

Dolores Batinić; Thomas Schmidt

We present a method for detecting and reconstructing separated particle verbs in a corpus of spoken German by following an approach suggested for written language. Our study shows that the method can be applied successfully to spoken language, compares different ways of dealing with structures that are specific to spoken language corpora, analyses some remaining problems, and discusses ways of optimising precision or recall for the method. The outlook sketches some possibilities for further work in related areas.

- Processing German: Basic Technologies | Pp. 3-10
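The reconstruction step described in the abstract can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: it assumes STTS part-of-speech tags (VVFIN for a finite full verb, PTKVZ for a separated verb particle) and simply rejoins the particle with the finite verb.

```python
def reconstruct_particle_verb(tokens):
    """Given (word, POS) pairs for one clause, rejoin a separated
    particle (STTS tag PTKVZ) with the finite verb (VVFIN) it
    belongs to. Sketch only: real systems must also resolve
    attachment ambiguity and lemmatise the result."""
    verb = next((w for w, t in tokens if t == "VVFIN"), None)
    particle = next((w for w, t in tokens if t == "PTKVZ"), None)
    if verb and particle:
        return particle.lower() + verb.lower()  # "an" + "fängt" -> "anfängt"
    return verb

sentence = [("Er", "PPER"), ("fängt", "VVFIN"), ("morgen", "ADV"), ("an", "PTKVZ")]
print(reconstruct_particle_verb(sentence))  # anfängt
```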

Detecting Vocal Irony

Felix Burkhardt; Benjamin Weiss; Florian Eyben; Jun Deng; Björn Schuller

We describe a data collection for vocal expression of ironic utterances and anger based on an Android app that was specifically developed for this study. The main aim of the investigation is to find evidence for a non-verbal expression of irony. A data set of 937 utterances was collected and labeled by six listeners for irony and anger. The automatically recognized textual content was labeled for sentiment. We report on experiments to classify ironic utterances based on sentiment and tone-of-voice. Baseline results show that an ironic voice can be detected automatically solely based on acoustic features with 69.3 UAR (unweighted average recall), and anger with 64.1 UAR. The performance drops by about 4% when it is calculated with a leave-one-speaker-out cross validation.

- Processing German: Basic Technologies | Pp. 11-22
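The evaluation metric quoted above, unweighted average recall (UAR), averages per-class recall so that a majority-class bias is not rewarded. A minimal reference implementation (not the authors' evaluation script):

```python
def uar(gold, pred):
    """Unweighted average recall: the mean of per-class recalls,
    so each class counts equally regardless of its frequency."""
    recalls = []
    for c in set(gold):
        idx = [i for i, g in enumerate(gold) if g == c]
        recalls.append(sum(pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

gold = ["irony", "irony", "neutral", "neutral", "neutral"]
pred = ["irony", "neutral", "neutral", "neutral", "irony"]
print(round(uar(gold, pred), 3))  # 0.583
```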

The Devil is in the Details: Parsing Unknown German Words

Daniel Dakota

The statistical parsing of morphologically rich languages is hindered by the inability of parsers to collect solid statistics because of the large number of word types in such languages. There are however two separate but connected problems, reducing data sparsity of known words and handling rare and unknown words. Methods for tackling one problem may inadvertently negatively impact methods to handle the other. We perform a tightly controlled set of experiments to reduce data sparsity through class-based representations in combination with unknown word signatures with two PCFG-LA parsers that handle rare and unknown words differently on the German TiGer treebank. We demonstrate that methods that have improved results for other languages do not transfer directly to German, and that we can obtain better results using a simplistic model rather than a more generalized model for rare and unknown word handling.

- Processing German: Basic Technologies | Pp. 23-39
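The "unknown word signatures" mentioned above map an out-of-vocabulary word to a coarse class based on surface cues, so the parser can share statistics across rare words. A toy version with an assumed, hypothetical feature inventory (capitalisation matters in German, where nouns are capitalised):

```python
def unknown_signature(word):
    """Map an out-of-vocabulary German word to a coarse signature
    class, in the spirit of rare-word handling in PCFG-LA parsers.
    The feature set here is illustrative, not the paper's."""
    feats = ["UNK"]
    if word[0].isupper():
        feats.append("Cap")            # capitalised: likely a noun in German
    if any(ch.isdigit() for ch in word):
        feats.append("Num")
    if "-" in word:
        feats.append("Dash")
    for suf in ("ung", "heit", "keit", "lich", "isch", "en"):
        if word.lower().endswith(suf):
            feats.append("-" + suf)    # keep the most informative suffix only
            break
    return "+".join(feats)

print(unknown_signature("Globalisierung"))  # UNK+Cap+-ung
```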

Exploring Ensemble Dependency Parsing to Reduce Manual Annotation Workload

Jessica Sohl; Heike Zinsmeister

In this paper we present an evaluation of combining automatic and manual dependency annotation to reduce manual workload. More precisely, an ensemble of three parsers is used to annotate sentences of German textbook texts automatically. By including a constraint-based system in the cluster in addition to machine learning approaches, this approach deviates from the original ensemble idea and results in a highly reliable ensemble majority vote. Additionally, our explorative use of dependency parsing identifies error-prone analyses of different systems and helps us to predict items that do not need to be manually checked. Our approach is not innovative as such, but we explore its benefits for the annotation task in detail. The manual workload can be reduced by highlighting the reliability of items, for example in terms of a ‘traffic-light system’ that signals the reliability of the automatic annotation.

- Processing German: Basic Technologies | Pp. 40-47
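The majority vote and the "does this item need manual checking" signal can be sketched together. This is an illustration of the general idea, not the paper's system: each parser proposes one head index per token, agreement of at least two parsers is accepted, and tokens where all three disagree are flagged for manual review (falling back to one designated parser).

```python
def majority_heads(parses, fallback):
    """For each token, take the head index chosen by at least 2 of the
    3 parsers; flag the token for manual checking when all disagree,
    using the designated fallback parser's head in the meantime."""
    heads, unreliable = [], []
    for i, votes in enumerate(zip(*parses)):
        best = max(set(votes), key=votes.count)
        if votes.count(best) >= 2:
            heads.append(best)
        else:
            heads.append(fallback[i])
            unreliable.append(i)       # "red light": needs a human look
    return heads, unreliable

p1 = [2, 0, 2, 3]
p2 = [2, 0, 2, 2]
p3 = [2, 0, 4, 4]
heads, check = majority_heads([p1, p2, p3], fallback=p1)
print(heads, check)  # [2, 0, 2, 3] [3]
```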

Different German and English Coreference Resolution Models for Multi-domain Content Curation Scenarios

Ankit Srivastava; Sabine Weber; Peter Bourgonje; Georg Rehm

Coreference Resolution is the process of identifying all words and phrases in a text that refer to the same entity. It has proven to be a useful intermediary step for a number of natural language processing applications. In this paper, we describe three implementations for performing coreference resolution: rule-based, statistical, and projection-based (from English to German). After a comparative evaluation on benchmark datasets, we conclude with an application of these systems on German and English texts from different scenarios in digital curation such as an archive of personal letters, excerpts from a museum exhibition, and regional news articles.

- Processing German: Basic Technologies | Pp. 48-61
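Of the three implementations named above, the rule-based one is easiest to illustrate. The snippet below shows only the simplest rule such systems typically start from, exact string match between mentions; it is a hypothetical sketch, not any of the authors' three systems.

```python
def string_match_chains(mentions):
    """Group mention indices whose lowercased strings match exactly --
    the simplest pass in a rule-based coreference pipeline. Later
    rules would handle aliases, pronouns, and head-word matches."""
    chains = {}
    for i, m in enumerate(mentions):
        chains.setdefault(m.lower(), []).append(i)
    return [idx for idx in chains.values() if len(idx) > 1]

mentions = ["Angela Merkel", "the chancellor", "angela merkel", "Berlin"]
print(string_match_chains(mentions))  # [[0, 2]]
```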

Word and Sentence Segmentation in German: Overcoming Idiosyncrasies in the Use of Punctuation in Private Communication

Kyoko Sugisaki

In this paper, we present a segmentation system for German texts. We apply conditional random fields (CRF), a statistical sequential model, to a type of text used in private communication. We show that by segmenting individual punctuation marks, taking freestanding lines into account, and using unsupervised word representations (i.e., Brown clustering, Word2Vec and fastText), we achieve a label accuracy of 96% on a corpus of postcards used in private communication.

- Processing German: Basic Technologies | Pp. 62-71
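A CRF segmenter of this kind labels each character (or candidate boundary) using local features. The feature extractor below is a hypothetical sketch of what such features might look like; the paper's actual feature set is not reproduced here.

```python
def char_features(text, i):
    """Per-character features for a CRF-style sequence labeller that
    decides token and sentence boundaries (illustrative feature set)."""
    c = text[i]
    return {
        "char": c,
        "is_punct": c in ".,!?;:…",
        "is_space": c.isspace(),
        "is_upper": c.isupper(),
        "next_upper": i + 1 < len(text) and text[i + 1].isupper(),
        "is_newline": c == "\n",   # freestanding lines matter in postcards
    }

feats = char_features("Hallo!Wie geht's?", 5)
print(feats["is_punct"], feats["next_upper"])  # True True
```

In a real system these dictionaries would be fed to a CRF toolkit (e.g. CRFsuite) together with gold boundary labels.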

Fine-Grained POS Tagging of German Social Media and Web Texts

Stefan Thater

This paper presents work on part-of-speech tagging of German social media and web texts. We take a simple Hidden Markov Model based tagger as a starting point, and extend it with a distributional approach to estimating lexical (emission) probabilities of out-of-vocabulary words, which occur frequently in social media and web texts and are a major reason for the low performance of off-the-shelf taggers on these types of text. We evaluate our approach on the recent dataset and show that our approach improves accuracy on out-of-vocabulary tokens by up to 5.8%; overall, we improve state-of-the-art by 0.4% to 90.9% accuracy.

- Processing German: Basic Technologies | Pp. 72-80
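The core idea above, backing off to distributionally similar known words when estimating emission probabilities for out-of-vocabulary words, can be sketched as follows. This is a toy model of the idea, not the paper's estimator; the lexicon and neighbour lists are invented.

```python
def emission_prob(word, tag, lexicon, neighbours):
    """HMM emission probability P(word | tag) lookup; for OOV words,
    back off to the average tag distribution of distributionally
    similar known words (sketch of the idea only)."""
    if word in lexicon:
        return lexicon[word].get(tag, 0.0)
    sims = neighbours.get(word, [])
    if not sims:
        return 1e-6                    # small floor for true unknowns
    return sum(lexicon[n].get(tag, 0.0) for n in sims) / len(sims)

lexicon = {"cool": {"ADJD": 0.9, "NN": 0.1}, "nice": {"ADJD": 1.0}}
neighbours = {"coool": ["cool", "nice"]}   # OOV spelling variant
print(emission_prob("coool", "ADJD", lexicon, neighbours))  # 0.95
```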

Developing a Stemmer for German Based on a Comparative Analysis of Publicly Available Stemmers

Leonie Weissweiler; Alexander Fraser

Stemmers, which reduce words to their stems, are important components of many natural language processing systems. In this paper, we conduct a systematic evaluation of several stemmers for German using two gold standards we have created and will release to the community. We then present our own stemmer, which achieves state-of-the-art results, is easy to understand and extend, and will be made publicly available both for use by programmers and as a benchmark for further stemmer development.

- Processing German: Basic Technologies | Pp. 81-94
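To make concrete what a stemmer does, here is a deliberately simple suffix-stripping stemmer for German. It is illustrative only (the suffix list and minimum stem length are assumptions); the authors' stemmer is considerably more elaborate.

```python
def stem_german(word):
    """Toy German stemmer: normalise umlauts, then strip one common
    inflectional/derivational suffix if enough stem remains."""
    w = (word.lower().replace("ä", "a").replace("ö", "o")
                     .replace("ü", "u").replace("ß", "ss"))
    for suffix in ("ungen", "heiten", "keiten", "ung", "heit", "keit",
                   "erin", "innen", "ern", "en", "er", "es", "e", "s", "n"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

print(stem_german("Hoffnungen"), stem_german("Läufer"))  # hoffn lauf
```

Note that a stemmer only needs to map related word forms to the same string ("Hoffnung" and "Hoffnungen" both become "hoffn"); the stem need not be a valid word.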

Negation Modeling for German Polarity Classification

Michael Wiegand; Maximilian Wolf; Josef Ruppenhofer

We present an approach for modeling German negation in open-domain fine-grained sentiment analysis. Unlike most previous work in sentiment analysis, we assume that negation can be conveyed by many lexical units (and not only common negation words) and that different negation words have different scopes. Our approach is examined on a new dataset comprising sentences with mentions of polar expressions and various negation words. We identify different types of negation words that have the same scopes. We show that already negation modeling based on these types largely outperforms traditional negation models which assume the same scope for all negation words and which employ a window-based scope detection rather than a scope detection based on syntactic information.

- Processing German: Basic Technologies | Pp. 95-111
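The contrast drawn above, per-negator scopes instead of one fixed window, can be sketched with a toy scope lexicon. Both the lexicon entries and the scope lengths below are invented for illustration; the paper derives its types from data and syntax.

```python
NEGATOR_SCOPES = {
    # hypothetical type lexicon: negation word -> scope (tokens to the right)
    "nicht": 3,
    "kein": 2,
    "ohne": 4,
}

def polarity_is_flipped(tokens, polar_idx):
    """Return True if the polar expression at polar_idx falls inside
    the typed scope of a preceding negation word (toy model)."""
    for i, tok in enumerate(tokens):
        scope = NEGATOR_SCOPES.get(tok.lower())
        if scope and i < polar_idx <= i + scope:
            return True
    return False

tokens = ["der", "Film", "war", "nicht", "wirklich", "gut"]
print(polarity_is_flipped(tokens, tokens.index("gut")))  # True
```

A traditional window-based model would instead apply one fixed scope length to every negation word, which is exactly what the typed model outperforms.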

NECKAr: A Named Entity Classifier for Wikidata

Johanna Geiß; Andreas Spitz; Michael Gertz

Many Information Extraction tasks such as Named Entity Recognition or Event Detection require background repositories that provide a classification of entities into a small set of basic, predominantly used classes. Several available knowledge bases offer a very detailed and specific ontology of entities that can be used as a repository. However, due to the mechanisms behind their construction, they are relatively static and of limited use to IE approaches that require up-to-date information. In contrast, Wikidata is a community-edited knowledge base that is kept current by its userbase, but has a constantly evolving and less rigid ontology structure that does not correspond to these basic classes. In this paper we present the tool NECKAr, which assigns Wikidata entities to the three main classes of named entities, as well as the resulting Wikidata NE dataset that consists of over 8 million classified entities. Both are made publicly available.

- Processing German: Named Entities | Pp. 115-129
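At its core, a classifier like NECKAr maps a Wikidata item's "instance of" (P31) statements onto the coarse named-entity classes. The sketch below is a simplified illustration under assumed root-class QIDs (Q5 "human", Q43229 "organization", Q2221906 "geographic location"); the real tool also traverses the subclass-of (P279) hierarchy rather than matching root classes directly.

```python
# Assumed mapping from Wikidata root-class QIDs to coarse NE classes.
CLASS_OF = {
    "Q5": "PER",        # human
    "Q43229": "ORG",    # organization
    "Q2221906": "LOC",  # geographic location
}

def classify_entity(instance_of_ids):
    """Assign an item to PER/ORG/LOC from its P31 targets; items
    matching none of the root classes stay unclassified (None)."""
    for qid in instance_of_ids:
        if qid in CLASS_OF:
            return CLASS_OF[qid]
    return None

print(classify_entity(["Q5"]), classify_entity(["Q99999999"]))  # PER None
```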