Publications catalog - books


Open Access Title

MARE-WINT: MARE-WINT

In conference: 27th International Conference of the German Society for Computational Linguistics and Language Technology (GSCL). Berlin, Germany. September 13, 2017 - September 14, 2017

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

renewable; green; energy; environment; law; policy

Availability
Detected institution; Publication year; Browse; Download; Request
Not required; 2018; Directory of Open Access Books; open access
Not required; 2018; SpringerLink; open access

Information

Resource type:

books

Print ISBN

978-3-319-73705-8

Electronic ISBN

978-3-319-73706-5

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Table of contents

Investigating the Morphological Complexity of German Named Entities: The Case of the GermEval NER Challenge

Bettina Klimek; Markus Ackermann; Amit Kirschenbaum; Sebastian Hellmann

This paper presents a detailed analysis of Named Entity Recognition (NER) in German, based on the performance of systems that participated in the GermEval 2014 shared task. It focuses on the role of morphology in named entities, an issue too often neglected in the NER task. We introduce a measure to characterize the morphological complexity of German named entities and apply it to the subset of named entities identified by all systems, and to the subset of named entities that none of the systems recognized. We find that morphologically complex named entities are more prevalent in the latter set than in the former, a finding that future NER methods should take into account. In addition, we analyze issues found in the GermEval gold standard annotation, which also affected the performance measurements of the different systems.

- Processing German: Named Entities | Pp. 130-145

Detecting Named Entities and Relations in German Clinical Reports

Roland Roller; Nils Rethmeier; Philippe Thomas; Marc Hübner; Hans Uszkoreit; Oliver Staeck; Klemens Budde; Fabian Halleck; Danilo Schmidt

Clinical notes and discharge summaries are commonly used in the clinical routine and contain patient-related information such as well-being, findings and treatments. This information is often described in text form and presented in a semi-structured way, which makes it difficult to access for patient support or clinical studies. Information extraction can help clinicians to access this information. However, most methods in the clinical domain focus on English data. This work aims at information extraction from German nephrology reports. We present ongoing work on detecting named entities and relations. The work builds on a corpus annotation that is currently being created and that includes a large set of different medical concepts, attributes and relations. At the current stage we apply a number of classification techniques to the existing dataset and achieve promising results for most of the frequent concepts and relations.

- Processing German: Named Entities | Pp. 146-154

In-Memory Distributed Training of Linear-Chain Conditional Random Fields with an Application to Fine-Grained Named Entity Recognition

Robert Schwarzenberg; Leonhard Hennig; Holmer Hemsen

Recognizing fine-grained named entities, i.e., specific subtypes instead of just the coarse type, has been shown to increase task performance in several contexts. Fine-grained types, however, amplify the problem of data sparsity during training, which is why larger amounts of training data are needed. In this contribution we address scalability issues caused by the larger training sets. We distribute and parallelize feature extraction and parameter estimation in linear-chain conditional random fields, which are a popular choice for sequence labeling tasks such as named entity recognition (NER) and part-of-speech (POS) tagging. To this end, we employ the parallel stream processing framework Apache Flink, which supports in-memory distributed iterations. Due to this feature, contrary to prior approaches, our system becomes iteration-aware during gradient descent. We experimentally demonstrate the scalability of our approach and also validate the parameters learned during distributed training in a fine-grained NER task.

- Processing German: Named Entities | Pp. 155-167
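
As a rough illustration of why parameter estimation in a linear-chain conditional random field can be distributed, the standard gradient of the log-likelihood can be written as a sum over training sequences. The notation below (N sequences, feature functions f, parameters \theta) is assumed for this sketch and is not taken from the chapter, and any regularization term is omitted:

```latex
% Log-likelihood gradient of a linear-chain CRF (standard textbook form).
% F(x, y) sums the local feature vector f over all positions t of a sequence.
\[
  F(x, y) = \sum_{t=1}^{T} f(y_{t-1}, y_t, x, t),
  \qquad
  \nabla_\theta \mathcal{L}(\theta)
  = \sum_{i=1}^{N} \Bigl( F\bigl(x^{(i)}, y^{(i)}\bigr)
  - \mathbb{E}_{y \sim p_\theta(\cdot \mid x^{(i)})}\bigl[ F\bigl(x^{(i)}, y\bigr) \bigr] \Bigr)
\]
```

Because each summand depends on a single training sequence, workers can compute partial sums over their own data partitions, and the partial sums can then be added before every gradient step.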

What Does This Imply? Examining the Impact of Implicitness on the Perception of Hate Speech

Darina Benikova; Michael Wojatzki; Torsten Zesch

We analyze whether implicitness affects human perception of hate speech. To do so, we use Tweets from an existing hate speech corpus and paraphrase them with rules to make the hate speech they contain more explicit. Comparing the judgment on the original and the paraphrased Tweets, our study indicates that implicitness is a factor in human and automatic hate speech detection. Hence, our study suggests that current automatic hate speech detection needs features that are more sensitive to implicitness.

- Online-Media and Online-Content | Pp. 171-179

Automatic Classification of Abusive Language and Personal Attacks in Various Forms of Online Communication

Peter Bourgonje; Julian Moreno-Schneider; Ankit Srivastava; Georg Rehm

The sheer ease with which abusive and hateful utterances can be made online – typically from the comfort of your home and without any immediate negative repercussions – using today’s digital communication technologies (especially social media), is responsible for their significant increase and global ubiquity. Natural Language Processing technologies can help in addressing the negative effects of this development. In this contribution we evaluate a set of classification algorithms on two types of user-generated online content (tweets and Wikipedia Talk comments) in two languages (English and German). The different data sets we work on were annotated for aspects such as racism, sexism, hate speech, aggression and personal attacks. While acknowledging issues with inter-annotator agreement for classification tasks using these labels, the focus of this paper is on classifying the data according to the annotated characteristics using several text classification algorithms. For some classification tasks we are able to reach F-scores of up to 81.58.

- Online-Media and Online-Content | Pp. 180-191

Token Level Code-Switching Detection Using Wikipedia as a Lexical Resource

Daniel Claeser; Dennis Felske; Samantha Kent

We present a novel lexicon-based classification approach for code-switching detection on Twitter. The main aim is to develop a simple lexical look-up classifier based on frequency information retrieved from Wikipedia. We evaluate the classifier using three different language pairs: Spanish-English, Dutch-English, and German-Turkish. The results indicate that our figures for Spanish-English are competitive with current state-of-the-art classifiers, even though the approach is simple and based solely on word frequency information.

- Online-Media and Online-Content | Pp. 192-198
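
The lexical look-up idea in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' classifier: the function names, the tie-breaking behavior, the "unknown" label, and the tiny word lists standing in for Wikipedia-derived frequencies are all hypothetical.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its relative frequency in a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def classify_token(token, lexicons, default="unknown"):
    """Pick the language whose lexicon gives the token the highest frequency.

    Tokens found in no lexicon get the (hypothetical) default label.
    """
    token = token.lower()
    scores = {lang: freqs.get(token, 0.0) for lang, freqs in lexicons.items()}
    best_lang = max(scores, key=scores.get)
    return best_lang if scores[best_lang] > 0.0 else default


def classify_tokens(tokens, lexicons):
    """Label every token of a tweet independently."""
    return [(token, classify_token(token, lexicons)) for token in tokens]


# Toy stand-ins for Wikipedia-derived word lists (hypothetical data).
lexicons = {
    "es": relative_frequencies("la casa es muy grande y la vida es buena".split()),
    "en": relative_frequencies("the house is very big and the weekend is good".split()),
}

labels = classify_tokens("La casa is very grande".split(), lexicons)
print(labels)
```

Each token is labeled on its own from word frequency alone, which mirrors the "solely on word frequency information" remark in the abstract; context, named entities, and words shared by both languages would need extra handling in practice.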

How Social Media Text Analysis Can Inform Disaster Management

Sabine Gründer-Fahrer; Antje Schlaf; Sebastian Wustmann

Digitalization and the rise of social media have led disaster management to the insight that modern information technology will have to play a key role in dealing with a crisis. In this context, the paper introduces an NLP software tool for social media text analysis that has been developed in cooperation with disaster managers in a European project. The aim is to show how state-of-the-art techniques from text mining and information extraction can be applied to fulfil the requirements of the end-users. Example use cases demonstrate the capacity of the approach to make social media available as a valuable source of information for disaster management.

- Online-Media and Online-Content | Pp. 199-207

A Comparative Study of Uncertainty Based Active Learning Strategies for General Purpose Twitter Sentiment Analysis with Deep Neural Networks

Nils Haldenwang; Katrin Ihler; Julian Kniephoff; Oliver Vornberger

Active learning is a common approach to classification problems where many unlabeled samples are available but the cost of manually annotating samples is high. This paper describes a study of the feasibility of uncertainty-based active learning for general purpose Twitter sentiment analysis with deep neural networks. Results indicate that the approach based on active learning achieves results similar to those obtained with very large corpora of randomly selected samples. The method outperforms randomly selected training data when the amount of training data used for both approaches is of equal size.

- Online-Media and Online-Content | Pp. 208-215
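
One common form of uncertainty-based selection, least-confidence sampling, can be sketched as below. The abstract above does not say which uncertainty measures the study compares, so this choice is an assumption; the function names and the fixed toy predictions (standing in for a neural network's class probabilities) are hypothetical as well.

```python
def least_confidence(probabilities):
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probabilities)


def select_for_annotation(pool, predict_proba, batch_size):
    """Return the batch_size unlabeled samples the model is least sure about."""
    ranked = sorted(
        pool,
        key=lambda sample: least_confidence(predict_proba(sample)),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical class probabilities (positive, neutral, negative) that a
# sentiment model might assign to four unlabeled tweets.
toy_predictions = {
    "great day": [0.90, 0.05, 0.05],
    "not sure about this": [0.40, 0.35, 0.25],
    "meh": [0.50, 0.30, 0.20],
    "awful": [0.05, 0.05, 0.90],
}

selected = select_for_annotation(list(toy_predictions), toy_predictions.get, 2)
print(selected)
```

In an active learning loop, the selected samples would be sent to human annotators, added to the training set, and the model retrained before the next selection round.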

An Infrastructure for Empowering Internet Users to Handle Fake News and Other Online Media Phenomena

Georg Rehm

Online media and digital communication technologies have an unprecedented and still increasing level of social, political and economic relevance. This article proposes an infrastructure to address phenomena of modern online media production, circulation and manipulation by establishing a distributed architecture for automatic processing and human feedback.

- Online-Media and Online-Content | Pp. 216-231

Different Types of Automated and Semi-automated Semantic Storytelling: Curation Technologies for Different Sectors

Georg Rehm; Julián Moreno-Schneider; Peter Bourgonje; Ankit Srivastava; Rolf Fricke; Jan Thomsen; Jing He; Joachim Quantz; Armin Berger; Luca König; Sören Räuchle; Jens Gerth; David Wabnitz

Many industries face an increasing need for smart systems that support the processing and generation of digital content. This is due both to an ever increasing amount of incoming content that needs to be processed faster and more efficiently, and to an ever increasing pressure to publish new content in cycles that are getting shorter and shorter. In a research and technology transfer project we develop a platform that provides content curation services that can be integrated into Content Management Systems, among others. These curation services comprise semantic text and document analytics processes as well as knowledge technologies that can be applied to document collections. The key objective is to support digital curators in their daily work, i.e., to (semi-)automate processes that the human experts are normally required to carry out intellectually and, typically, without tool support. The goal is to enable knowledge workers to become more efficient and more effective as well as to produce high-quality content. In this article we focus on the current state of development with regard to semantic storytelling in our four use cases.

- Online-Media and Online-Content | Pp. 232-247