Publications catalog - books
Open Access title
MARE-WINT
Conference: 27th International Conference of the German Society for Computational Linguistics and Language Technology (GSCL). Berlin, Germany. September 13, 2017 - September 14, 2017
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
renewable; green; energy; environment; law; policy
Availability
| Detected institution | Publication year | Browse | Download | Request |
|---|---|---|---|---|
| Not required | 2018 | Directory of Open Access Books | | |
| Not required | 2018 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-319-73705-8
Electronic ISBN
978-3-319-73706-5
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2018
Subject coverage
Table of contents
Investigating the Morphological Complexity of German Named Entities: The Case of the GermEval NER Challenge
Bettina Klimek; Markus Ackermann; Amit Kirschenbaum; Sebastian Hellmann
This paper presents a detailed analysis of Named Entity Recognition (NER) in German, based on the performance of systems that participated in the GermEval 2014 shared task. It focuses on the role of morphology in named entities, an issue too often neglected in the NER task. We introduce a measure to characterize the morphological complexity of German named entities and apply it to the subset of named entities identified by all systems, and to the subset of named entities none of the systems recognized. We discover that morphologically complex named entities are more prevalent in the latter set than in the former, a finding which should be taken into account in the future development of such methods. In addition, we provide an analysis of issues found in the GermEval gold standard annotation, which also affected the performance measurements of the different systems.
- Processing German: Named Entities | Pp. 130-145
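The abstract above refers to a measure of morphological complexity without defining it here. As a purely illustrative stand-in (not the paper's measure), the sketch below compares two example sets of named entities using a crude proxy that counts tokens plus hyphen- or case-indicated segments; the example sets themselves are invented for illustration.

```python
# Toy comparison of the morphological complexity of two sets of named
# entities (e.g., those recognized by all systems vs. by none).
# The complexity proxy is hypothetical and NOT the measure from the paper.

import re
from statistics import mean

def complexity(entity: str) -> int:
    """Crude proxy: number of tokens plus extra segments suggested by
    hyphens or internal uppercase letters (a rough stand-in for German
    compounding and derivation)."""
    tokens = entity.split()
    extra_segments = sum(
        len(re.findall(r"(?<=[a-zäöüß])[A-ZÄÖÜ]", tok)) + tok.count("-")
        for tok in tokens
    )
    return len(tokens) + extra_segments

# Illustrative example sets, not taken from the GermEval data.
recognized_by_all = ["Berlin", "Angela Merkel", "Deutsche Bahn"]
recognized_by_none = ["Rhein-Main-Donau-Kanal",
                      "Deutsch-Französisches Jugendwerk",
                      "Nord-Ostsee-Kanal"]

print("mean complexity (recognized by all): ",
      mean(complexity(e) for e in recognized_by_all))
print("mean complexity (recognized by none):",
      mean(complexity(e) for e in recognized_by_none))
```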
Detecting Named Entities and Relations in German Clinical Reports
Roland Roller; Nils Rethmeier; Philippe Thomas; Marc Hübner; Hans Uszkoreit; Oliver Staeck; Klemens Budde; Fabian Halleck; Danilo Schmidt
Clinical notes and discharge summaries are commonly used in the clinical routine and contain patient-related information such as well-being, findings and treatments. Information is often described in text form and presented in a semi-structured way. This makes it difficult to access the highly valuable information for patient support or clinical studies. Information extraction can help clinicians to access this information. However, most methods in the clinical domain focus on English data. This work aims at information extraction from German nephrology reports. We present ongoing work in the context of detecting named entities and relations. This work builds on a corpus annotation currently being created, which includes a large set of different medical concepts, attributes and relations. At the current stage we apply a number of classification techniques to the existing dataset and achieve promising results for most of the frequent concepts and relations.
- Processing German: Named Entities | Pp. 146-154
In-Memory Distributed Training of Linear-Chain Conditional Random Fields with an Application to Fine-Grained Named Entity Recognition
Robert Schwarzenberg; Leonhard Hennig; Holmer Hemsen
Recognizing fine-grained named entity types instead of just coarse types has been shown to increase task performance in several contexts. Fine-grained types, however, amplify the problem of data sparsity during training, which is why larger amounts of training data are needed. In this contribution we address scalability issues caused by the larger training sets. We distribute and parallelize feature extraction and parameter estimation in linear-chain conditional random fields, which are a popular choice for sequence labeling tasks such as named entity recognition (NER) and part-of-speech (POS) tagging. To this end, we employ the parallel stream processing framework Apache Flink, which supports in-memory distributed iterations. Due to this feature, contrary to prior approaches, our system becomes iteration-aware during gradient descent. We experimentally demonstrate the scalability of our approach and also validate the parameters learned during distributed training in a fine-grained NER task.
- Processing German: Named Entities | Pp. 155-167
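The training scheme described above boils down to a data-parallel gradient-descent pattern: partition the data, compute partial gradients per partition, aggregate them, and update the shared weights once per iteration. The following is a minimal, framework-free sketch of that pattern, using a logistic loss as a stand-in for the full linear-chain CRF objective and deliberately omitting the Apache Flink specifics.

```python
# Data-parallel gradient descent sketch: each partition contributes a partial
# gradient; the gradients are averaged and applied to a shared weight vector.

import numpy as np

def partial_gradient(weights, X, y):
    """Gradient of the logistic loss on one data partition."""
    preds = 1.0 / (1.0 + np.exp(-X @ weights))
    return X.T @ (preds - y) / len(y)

def distributed_sgd(partitions, dim, lr=0.5, iterations=50):
    weights = np.zeros(dim)
    for _ in range(iterations):
        # In a distributed setting each call below would run on a worker;
        # here they run sequentially for illustration.
        grads = [partial_gradient(weights, X, y) for X, y in partitions]
        weights -= lr * np.mean(grads, axis=0)   # aggregate and update
    return weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0, 0.5])

    def make_partition(n):
        X = rng.normal(size=(n, 3))
        y = (X @ true_w > 0).astype(float)
        return X, y

    parts = [make_partition(200) for _ in range(4)]  # four "workers"
    w = distributed_sgd(parts, dim=3)
    print("learned weights:", np.round(w, 2))
```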
What Does This Imply? Examining the Impact of Implicitness on the Perception of Hate Speech
Darina Benikova; Michael Wojatzki; Torsten Zesch
We analyze whether implicitness affects human perception of hate speech. To do so, we use Tweets from an existing hate speech corpus and paraphrase them with rules to make the hate speech they contain more explicit. Comparing the judgments on the original and the paraphrased Tweets, our study indicates that implicitness is a factor in human and automatic hate speech detection. Hence, our study suggests that current automatic hate speech detection needs features that are more sensitive to implicitness.
- Online-Media and Online-Content | Pp. 171-179
Automatic Classification of Abusive Language and Personal Attacks in Various Forms of Online Communication
Peter Bourgonje; Julian Moreno-Schneider; Ankit Srivastava; Georg Rehm
The sheer ease with which abusive and hateful utterances can be made online – typically from the comfort of one's home and without any immediate negative repercussions – using today's digital communication technologies (especially social media), is responsible for their significant increase and global ubiquity. Natural Language Processing technologies can help in addressing the negative effects of this development. In this contribution we evaluate a set of classification algorithms on two types of user-generated online content (tweets and Wikipedia Talk comments) in two languages (English and German). The different datasets we work on were classified with respect to aspects such as racism, sexism, hate speech, aggression and personal attacks. While acknowledging issues with inter-annotator agreement for classification tasks using these labels, the focus of this paper is on classifying the data according to the annotated characteristics using several text classification algorithms. For some classification tasks we are able to reach f-scores of up to 81.58.
- Online-Media and Online-Content | Pp. 180-191
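As a hedged illustration of the kind of text-classification setup evaluated in the entry above (not the authors' system), a TF-IDF plus linear-classifier baseline in scikit-learn could look like the sketch below; the tiny texts and labels are invented for the example.

```python
# Generic text-classification baseline: TF-IDF features + logistic regression.
# Toy data only; real experiments would use the annotated tweet/Wikipedia
# Talk corpora described in the abstract.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

train_texts = ["you are wonderful", "have a nice day",
               "you are an idiot", "nobody wants you here"]
train_labels = [0, 0, 1, 1]   # 0 = neutral, 1 = personal attack (toy labels)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(train_texts, train_labels)

test_texts = ["what an idiot", "nice day today"]
test_labels = [1, 0]
pred = clf.predict(test_texts)
print("f1:", f1_score(test_labels, pred))
```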
Token Level Code-Switching Detection Using Wikipedia as a Lexical Resource
Daniel Claeser; Dennis Felske; Samantha Kent
We present a novel lexicon-based classification approach for code-switching detection on Twitter. The main aim is to develop a simple lexical look-up classifier based on frequency information retrieved from Wikipedia. We evaluate the classifier using three different language pairs: Spanish-English, Dutch-English, and German-Turkish. The results indicate that our figures for Spanish-English are competitive with current state-of-the-art classifiers, even though the approach is simplistic and based solely on word frequency information.
- Online-Media and Online-Content | Pp. 192-198
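The core idea described above, a token-level frequency look-up, can be sketched in a few lines. The tiny frequency dictionaries below are illustrative stand-ins for the Wikipedia-derived lexicons the paper uses, and the thresholds and labels are assumptions of this sketch.

```python
# Token-level language assignment by comparing word frequencies in two
# language lexicons (here: toy stand-ins for Wikipedia frequency lists).

freq_en = {"the": 5.0e7, "is": 2.0e7, "very": 6.0e6, "nice": 2.0e6}
freq_es = {"pero": 4.0e6, "la": 3.0e7, "es": 2.5e7, "muy": 5.0e6}

def tag_tokens(tokens, lex_a, lex_b, label_a="en", label_b="es"):
    tags = []
    for tok in tokens:
        fa = lex_a.get(tok.lower(), 0)
        fb = lex_b.get(tok.lower(), 0)
        if fa == fb == 0:
            tags.append("unk")          # token absent from both lexicons
        else:
            tags.append(label_a if fa >= fb else label_b)
    return tags

tweet = "pero the party was muy nice".split()
print(list(zip(tweet, tag_tokens(tweet, freq_en, freq_es))))
```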
How Social Media Text Analysis Can Inform Disaster Management
Sabine Gründer-Fahrer; Antje Schlaf; Sebastian Wustmann
Digitalization and the rise of social media have led disaster management to the insight that modern information technology will have to play a key role in dealing with a crisis. In this context, the paper introduces an NLP software tool for social media text analysis that has been developed in cooperation with disaster managers in a European project. The aim is to show how state-of-the-art techniques from text mining and information extraction can be applied to fulfil the requirements of the end users. By way of example use cases, we demonstrate the capacity of the approach to make social media available as a valuable source of information for disaster management.
- Online-Media and Online-Content | Pp. 199-207
A Comparative Study of Uncertainty Based Active Learning Strategies for General Purpose Twitter Sentiment Analysis with Deep Neural Networks
Nils Haldenwang; Katrin Ihler; Julian Kniephoff; Oliver Vornberger
Active learning is a common approach for classification problems where a lot of unlabeled samples are available but the cost of manually annotating samples is high. This paper describes a study of the feasibility of uncertainty-based active learning for general purpose Twitter sentiment analysis with deep neural networks. Results indicate that the active learning approach is able to achieve results similar to those obtained with very large corpora of randomly selected samples. Moreover, it outperforms random selection when both approaches use the same amount of training data.
- Online-Media and Online-Content | Pp. 208-215
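Below is a minimal sketch of uncertainty-based active learning as described above, assuming entropy of the predicted class distribution as the uncertainty score and a logistic regression on synthetic data in place of the deep neural sentiment classifier; the batch size and number of rounds are arbitrary choices of this sketch.

```python
# Uncertainty sampling loop: train, score the unlabeled pool by predictive
# entropy, move the most uncertain samples to the labeled set, repeat.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(1000, 5))
y_pool = (X_pool[:, 0] + 0.5 * X_pool[:, 1] > 0).astype(int)  # synthetic labels

# Seed the labeled set with a few samples of each class.
labeled = list(np.where(y_pool == 0)[0][:5]) + list(np.where(y_pool == 1)[0][:5])
unlabeled = [i for i in range(len(X_pool)) if i not in set(labeled)]

for round_ in range(5):
    model = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
    probs = model.predict_proba(X_pool[unlabeled])
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    # Pick the 20 most uncertain samples and "ask the oracle" for their labels.
    picked = np.argsort(entropy)[-20:]
    labeled += [unlabeled[i] for i in picked]
    unlabeled = [u for i, u in enumerate(unlabeled) if i not in set(picked)]
    acc = model.score(X_pool, y_pool)   # evaluated on the full pool, for illustration
    print(f"round {round_}: labeled={len(labeled)}, accuracy={acc:.3f}")
```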
An Infrastructure for Empowering Internet Users to Handle Fake News and Other Online Media Phenomena
Georg Rehm
Online media and digital communication technologies have an unprecedented, even increasing level of social, political and also economic relevance. This article proposes an infrastructure to address phenomena of modern online media production, circulation and manipulation by establishing a distributed architecture for automatic processing and human feedback.
- Online-Media and Online-Content | Pp. 216-231
Different Types of Automated and Semi-automated Semantic Storytelling: Curation Technologies for Different Sectors
Georg Rehm; Julián Moreno-Schneider; Peter Bourgonje; Ankit Srivastava; Rolf Fricke; Jan Thomsen; Jing He; Joachim Quantz; Armin Berger; Luca König; Sören Räuchle; Jens Gerth; David Wabnitz
Many industries face an increasing need for smart systems that support the processing and generation of digital content. This is due both to an ever-increasing amount of incoming content that needs to be processed faster and more efficiently, and to ever-increasing pressure to publish new content in ever shorter cycles. In a research and technology transfer project we develop a platform that provides content curation services that can be integrated into Content Management Systems, among others. In the project we develop curation services, which comprise semantic text and document analytics processes as well as knowledge technologies that can be applied to document collections. The key objective is to support digital curators in their daily work, i.e., to (semi-)automate processes that human experts are normally required to carry out intellectually and, typically, without tool support. The goal is to enable knowledge workers to become more efficient and more effective as well as to produce high-quality content. In this article we focus on the current state of development with regard to semantic storytelling in our four use cases.
- Online-Media and Online-Content | Pp. 232-247