Publications catalog - books



Open Access Title

Bisociative Knowledge Discovery: An Introduction to Concept, Algorithms, Tools, and Applications

Michael R. Berthold (ed.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Artificial Intelligence (incl. Robotics); Data Mining and Knowledge Discovery; Information Systems Applications (incl. Internet); User Interfaces and Human Computer Interaction; Pattern Recognition; Computer Communication Networks

Availability
Detected institution: Not required
Year of publication: 2012
Access: open access via SpringerLink

Information

Resource type:

books

Print ISBN

978-3-642-31829-0

Electronic ISBN

978-3-642-31830-6

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© The Editor(s) (if applicable) and the Author(s) 2012. The book is published with open access at Springer-Link.com 2012

Table of contents

Towards Bisociative Knowledge Discovery

Michael R. Berthold

Knowledge discovery generally focuses on finding patterns within a reasonably well-connected domain of interest. In this article we outline a framework for the discovery of new connections between domains (so-called bisociations), supporting the creative discovery process more powerfully. We motivate this approach, show how it differs from classical data analysis, and conclude by describing a number of different types of domain-crossing connections.

Part I - Bisociation | Pp. 1-10

Towards Creative Information Exploration Based on Koestler’s Concept of Bisociation

Werner Dubitzky; Tobias Kötter; Oliver Schmidt; Michael R. Berthold

Creative information exploration refers to a novel framework for exploring large volumes of heterogeneous information. In particular, creative information exploration seeks to discover new, surprising and valuable relationships in data that would not be revealed by conventional information retrieval, data mining and data analysis technologies. While our approach is inspired by work in the field of computational creativity, we are particularly interested in a model of creativity proposed by Arthur Koestler in the 1960s. Koestler's model of creativity rests on the concept of bisociation. Bisociative thinking occurs when a problem, idea, event or situation is perceived simultaneously in two or more "matrices of thought" or domains. When two matrices of thought interact with each other, the result is either their fusion in a novel intellectual synthesis or their confrontation in a new aesthetic experience. This article discusses some of the foundational issues of computational creativity and bisociation in the context of creative information exploration.

Part I - Bisociation | Pp. 11-32

From Information Networks to Bisociative Information Networks

Tobias Kötter; Michael R. Berthold

The integration of heterogeneous data from various domains without the need for prefiltering prepares the ground for bisociative knowledge discoveries, in which attempts are made to find unexpected relations across seemingly unrelated domains. Information networks, due to their flexible data structure, lend themselves perfectly to the integration of these heterogeneous data sources. This chapter provides an overview of different types of information networks and categorizes them by identifying several key properties of information units and relations that reflect the expressiveness, and thus the ability, of an information network to model heterogeneous data from diverse domains. The chapter then describes a new type of information network known as bisociative information networks. This kind of network combines the key properties of existing networks in order to provide the foundation for bisociative knowledge discoveries. Finally, based on this data structure, three different patterns are described that fulfill the requirements of a bisociation by connecting concepts from seemingly unrelated domains.

Part I - Bisociation | Pp. 33-50
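The abstract above describes patterns that satisfy a bisociation by connecting concepts from seemingly unrelated domains. Here is a minimal sketch of one such check on a domain-labelled graph. It rests on a deliberately simplified definition of our own: a node counts as a bridging concept if its neighbours come from at least two domains other than its own. The function name `bridging_concepts` and the toy graph are hypothetical. The chapter's actual patterns are richer than this.

```python
from collections import defaultdict


def bridging_concepts(node_domain, edges):
    """Return nodes whose neighbours span at least two domains other than their own.

    Simplified, hypothetical definition used only for illustration.
    """
    neighbours = defaultdict(set)
    for u, v in edges:  # treat every edge as undirected
        neighbours[u].add(v)
        neighbours[v].add(u)
    bridges = []
    for node, nbrs in neighbours.items():
        foreign = {node_domain[n] for n in nbrs} - {node_domain[node]}
        if len(foreign) >= 2:
            bridges.append(node)
    return sorted(bridges)


# Hypothetical toy network with three domains A, B and C.
node_domain = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
edges = [("a1", "a2"), ("a1", "c1"), ("c1", "b1"), ("b1", "b2")]
print(bridging_concepts(node_domain, edges))  # ['c1']
```

Here `c1` is reported because it is the only node that directly joins domain A to domain B.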

Network Creation: Overview

Christian Borgelt

Although networks are a very natural and straightforward way of organizing heterogeneous data, as argued in the introductory chapters, few data sources are in this form. Instead, the data we want to fuse, connect, analyze and thus exploit for creative discoveries are stored in flat files, (relational) databases, text document collections and the like. As a consequence, we first need methods that construct a network representation by analyzing tabular and textual data, in order to identify entities that can serve as nodes and to extract relevant relationships that should be represented by edges.

Part II - Representation and Network Creation | Pp. 51-53
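The overview above calls for methods that turn tabular data into nodes and edges. Below is a minimal sketch for the tabular case, assuming one simple rule of our own: every distinct (column, value) pair becomes a node, and values that appear in the same row are linked. The edge weight counts how many rows share the pair. The function `table_to_network` and the sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def table_to_network(rows, entity_columns):
    """Turn tabular records into a weighted co-occurrence network.

    Each distinct (column, value) pair becomes a node; two nodes are linked
    when they appear in the same row, and the edge weight counts those rows.
    """
    weights = Counter()
    for row in rows:
        # Skip empty cells so that missing values do not become nodes.
        nodes = sorted({(col, row[col]) for col in entity_columns if row.get(col)})
        for u, v in combinations(nodes, 2):
            weights[(u, v)] += 1
    return dict(weights)


# Hypothetical records; column names and values are illustrative only.
rows = [
    {"author": "Smith", "keyword": "k1", "year": 2010},
    {"author": "Smith", "keyword": "k1", "year": 2011},
    {"author": "Lee", "keyword": "k2", "year": 2011},
    {"author": "Lee", "keyword": None, "year": 2012},
]
network = table_to_network(rows, ["author", "keyword"])
for (u, v), w in sorted(network.items()):
    print(u, "--", v, "weight", w)
```

Which columns count as entities is a modelling decision. Here the `year` column is deliberately left out.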

Selecting the Links in BisoNets Generated from Document Collections

Marc Segond; Christian Borgelt

According to Koestler, the notion of a bisociation denotes a connection between pieces of information from habitually separated domains or categories. In this chapter, we consider a methodology to find such bisociations using a BisoNet as a representation of knowledge. In a first step, we consider how to create BisoNets from several textual databases taken from different domains using simple text-mining techniques. To achieve this, we introduce a procedure, based on a new measure for comparing text frequency vectors, to link nodes of a BisoNet and to endow such links with weights. In a second step, we try to rediscover known bisociations that were originally found by a human domain expert, namely indirect relations between migraine and magnesium as they are hidden in medical research articles published before 1987. We observe that these bisociations are easily rediscovered by simply following the strongest links.

Part II - Representation and Network Creation | Pp. 54-65
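The chapter above weights links between nodes by comparing text frequency vectors and then follows the strongest links. The abstract does not give its new measure. The sketch below therefore swaps in plain cosine similarity as a stand-in, which is not the chapter's measure. It also assumes a greedy walk that always takes the heaviest link to a node not yet visited. All names and counts are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length frequency vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_weights(term_vectors):
    """Weight each unordered pair of term nodes by cosine similarity (stand-in measure)."""
    terms = sorted(term_vectors)
    return {
        (s, t): cosine(term_vectors[s], term_vectors[t])
        for i, s in enumerate(terms)
        for t in terms[i + 1:]
    }


def neighbours(weights, node):
    """Yield (neighbour, weight) for every positively weighted link touching `node`."""
    for (a, b), w in weights.items():
        if w > 0 and node in (a, b):
            yield (b if a == node else a), w


def follow_strongest(weights, start, max_steps):
    """Greedily walk from `start`, always taking the heaviest link to an unvisited node."""
    path = [start]
    for _ in range(max_steps):
        options = [(w, n) for n, w in neighbours(weights, path[-1]) if n not in path]
        if not options:
            break
        path.append(max(options)[1])
    return path


# Hypothetical term-by-document counts over four documents.
term_vectors = {
    "x": [3, 1, 0, 0],
    "y": [2, 2, 1, 0],
    "z": [0, 1, 3, 2],
    "w": [0, 0, 1, 4],
}
weights = link_weights(term_vectors)
print(follow_strongest(weights, "x", 3))  # ['x', 'y', 'z', 'w']
```

Terms `x` and `w` share no documents, so they are never linked directly. The walk still reaches `w` indirectly, which is the kind of indirect relation the chapter looks for.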

Bridging Concept Identification for Constructing Information Networks from Text Documents

Matjaž Juršič; Borut Sluban; Bojan Cestnik; Miha Grčar; Nada Lavrač

A major challenge for next-generation data mining systems is creative knowledge discovery from diverse and distributed data sources. An important part of this task is fusing information from diverse, mainly unstructured representations into a single knowledge format. This chapter focuses on merging information available in text documents into an information network, a graph representation of knowledge. The problem addressed is how to efficiently and effectively produce an information network from large text corpora from at least two diverse, seemingly unrelated domains. The goal is to produce a network that has the highest potential for providing yet unexplored cross-domain links that could lead to new scientific discoveries. The focus of this work is on better identification of important domain-bridging concepts, which are promoted as core nodes around which the rest of the network is formed. The evaluation is performed by repeating a discovery made on medical articles in the migraine–magnesium domain.

Part II - Representation and Network Creation | Pp. 66-90
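The chapter above is about identifying concepts that bridge two text corpora. As a point of reference only, here is a naive baseline we assume for illustration; it is not the chapter's method. The baseline keeps terms that occur in both corpora and ranks them by the lower of their two relative document frequencies, so that a candidate must be reasonably common in both domains. The stopword list, function names and mini-corpora are hypothetical.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = frozenset({"the", "a", "of", "and", "in"})


def doc_freq(corpus):
    """Count, for each term, how many documents of `corpus` contain it."""
    df = Counter()
    for doc in corpus:
        df.update(set(doc.lower().split()) - STOPWORDS)
    return df


def bridging_term_candidates(corpus_a, corpus_b, top_k=5):
    """Rank terms shared by both corpora by their lower relative document frequency."""
    df_a, df_b = doc_freq(corpus_a), doc_freq(corpus_b)
    score = {
        term: min(df_a[term] / len(corpus_a), df_b[term] / len(corpus_b))
        for term in df_a.keys() & df_b.keys()
    }
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(score, key=lambda t: (-score[t], t))[:top_k]


# Hypothetical mini-corpora standing in for two unrelated domains.
corpus_a = ["alpha beta gamma", "alpha gamma delta", "the beta epsilon"]
corpus_b = ["gamma zeta", "beta gamma eta", "theta gamma"]
print(bridging_term_candidates(corpus_a, corpus_b))  # ['gamma', 'beta']
```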

Discovery of Novel Term Associations in a Document Collection

Teemu Hynönen; Sébastien Mahler; Hannu Toivonen

We propose a method to mine novel, document-specific associations between terms in a collection of unstructured documents. We believe that documents are often best described by the relationships they establish. This is also evidenced by the popularity of conceptual maps, mind maps, and other similar methodologies to organize and summarize information. Our goal is to discover term relationships that can be used to construct conceptual maps or so-called BisoNets.

The model we propose, tpf–idf–tpu, looks for pairs of terms that are associated in an individual document. It considers three aspects, two of which have been generalized from tf–idf to term pairs: term pair frequency (tpf; importance for the document), inverse document frequency (idf; uniqueness in the collection), and term pair uncorrelation (tpu; independence of the terms). The last component is needed to filter out statistically dependent pairs that are not likely to be considered novel or interesting by the user.

We present experimental results on two collections of documents: one extracted from Wikipedia, and one containing text mining articles with manually assigned term associations. The results indicate that the tpf–idf–tpu method can discover novel associations, that these differ from simply pairing tf–idf keywords, and that they better match a reader's subjective associations.

Part II - Representation and Network Creation | Pp. 91-103
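The tpf–idf–tpu model above multiplies three factors for each term pair in a document. The abstract names the factors but not their exact formulas, so the following is a minimal sketch under definitions we assume for illustration:

- tpf counts the sentences of the target document that contain both terms.
- idf is log(N / number of documents in which the pair shares a sentence).
- tpu is min(1, P(a)·P(b) / P(a, b)), using document-level probabilities, so pairs that co-occur far more often than chance are damped.

The function name and sample documents are hypothetical.

```python
import math
from itertools import combinations


def tpf_idf_tpu(docs, target):
    """Score term pairs of docs[target]; each document is a list of sentence strings.

    Assumed definitions (illustrative, not the chapter's exact formulas):
      tpf = sentences of the target document containing both terms
      idf = log(N / documents in which the pair shares a sentence)
      tpu = min(1, P(a) * P(b) / P(a, b)) with document-level probabilities
    """
    sentences = [[set(s.lower().split()) for s in doc] for doc in docs]
    doc_terms = [set().union(*sents) for sents in sentences]
    n = len(docs)

    def p(*terms):
        """Fraction of documents containing all given terms."""
        return sum(1 for t in doc_terms if set(terms) <= t) / n

    def pair_df(a, b):
        """Number of documents with at least one sentence containing both a and b."""
        return sum(1 for sents in sentences if any(a in s and b in s for s in sents))

    pairs = {pair for s in sentences[target] for pair in combinations(sorted(s), 2)}
    scores = {}
    for a, b in pairs:
        tpf = sum(1 for s in sentences[target] if a in s and b in s)
        idf = math.log(n / pair_df(a, b))
        tpu = min(1.0, p(a) * p(b) / p(a, b))
        scores[(a, b)] = tpf * idf * tpu
    return scores


docs = [
    ["cats chase mice", "mice eat cheese"],
    ["cats chase birds", "dogs chase cats"],
    ["cheese is food"],
]
for pair, score in sorted(tpf_idf_tpu(docs, 0).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy collection, the pair ("eat", "mice") always co-occurs, so tpu pushes it below ("cheese", "mice"). The pair ("cats", "chase") appears together in two documents, so idf lowers its score.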

Cover Similarity Based Item Set Mining

Marc Segond; Christian Borgelt

In standard frequent item set mining, one tries to find item sets whose support in a database of transactions exceeds a user-specified threshold (minimum support). We, instead, strive to find item sets for which the similarity of the covers of the items (that is, the sets of transactions containing the items) exceeds a user-defined threshold. This approach yields a much better assessment of the association strength of the items, because it takes additional information about their occurrences into account. Starting from the generalized Jaccard index, we extend our approach to a total of twelve specific similarity measures and a generalized form. In addition, standard frequent item set mining turns out to be a special case of this flexible framework. We present an efficient mining algorithm that is inspired by the well-known Eclat algorithm and its improvements. By reporting experiments on several benchmark data sets, we demonstrate that the runtime penalty incurred by the more complex (but also more informative) item set assessment is bearable and that the approach yields higher-quality and more useful item sets.

Part II - Representation and Network Creation | Pp. 104-121
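The cover-similarity idea above can be illustrated with the generalized Jaccard index of an item set: the size of the intersection of the item covers divided by the size of their union. The numerator is exactly the item set's support. The sketch below enumerates candidates by brute force. It is not the Eclat-inspired algorithm from the chapter, and the function names and transactions are hypothetical.

```python
from itertools import combinations


def covers(transactions):
    """Map each item to its cover: the set of transaction indices containing it."""
    cover = {}
    for tid, items in enumerate(transactions):
        for item in items:
            cover.setdefault(item, set()).add(tid)
    return cover


def jaccard_item_sets(transactions, min_sim, max_size=3):
    """Brute-force search for item sets whose generalized Jaccard index reaches `min_sim`."""
    cover = covers(transactions)
    items = sorted(cover)
    result = {}
    for k in range(2, max_size + 1):
        for item_set in combinations(items, k):
            sets = [cover[i] for i in item_set]
            shared = set.intersection(*sets)  # transactions containing every item
            either = set.union(*sets)         # transactions containing any item
            sim = len(shared) / len(either)
            if sim >= min_sim:
                result[item_set] = sim
    return result


transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "d"}]
found = jaccard_item_sets(transactions, min_sim=0.5)
print(sorted(found))  # [('a', 'b'), ('a', 'c')]
```

Adding an item can only shrink the intersection and grow the union, so the index never increases as an item set grows. That anti-monotonicity is what allows a depth-first miner to prune supersets instead of enumerating them all as this sketch does.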

Patterns and Logic for Reasoning with Networks

Angelika Kimmig; Esther Galbrun; Hannu Toivonen; Luc De Raedt

and ProbLog are two frameworks to implement bisociative information networks (BisoNets). They combine structured data representations with probabilities expressing uncertainty. While is based on graphs, ProbLog’s core language is that of the logic programming language Prolog. This chapter provides an overview of important concepts, terminology, and reasoning tasks addressed in the two systems. It does so in an informal way, focusing on intuition rather than on mathematical definitions. It aims at bridging the gap between network representations and logical ones.

Part II - Representation and Network Creation | Pp. 122-143
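The chapter above concerns networks whose links carry probabilities expressing uncertainty. One simple query on such a network is the single most probable path between two nodes. This sketch assumes independent edge probabilities in (0, 1], so a path's probability is the product of its edge probabilities. Minimizing the sum of -log(p) with Dijkstra's algorithm then finds it. The function name and toy edges are hypothetical and are not the API of either framework. Note that the probability that *some* path connects two nodes is at least this value, and is generally harder to compute.

```python
import heapq
import math


def most_probable_path(edges, source, target):
    """Return (probability, path) of the most probable single path, or (0.0, []) if none.

    Edges are undirected (u, v, p) triples with independent probabilities 0 < p <= 1.
    Dijkstra runs on the non-negative costs -log(p).
    """
    graph = {}
    for u, v, p in edges:
        graph.setdefault(u, []).append((v, p))
        graph.setdefault(v, []).append((u, p))
    best = {source: 0.0}
    queue = [(0.0, source, [source])]
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return math.exp(-cost), path
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, p in graph.get(node, []):
            new_cost = cost - math.log(p)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return 0.0, []


edges = [("s", "a", 0.9), ("a", "t", 0.9), ("s", "t", 0.5), ("s", "b", 0.99), ("b", "t", 0.6)]
prob, path = most_probable_path(edges, "s", "t")
print(path, round(prob, 3))  # ['s', 'a', 't'] 0.81
```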

Network Analysis: Overview

Hannu Toivonen

Heterogeneous information networks or BisoNets, as they are called in the context of bisociative knowledge discovery, are a flexible and popular form of representing data in numerous fields. Additionally, such networks can be created or derived from other types of information using, e.g., the methods given in Part II of this volume.

This part of the book describes various network algorithms for the exploration and analysis of BisoNets. Their general goal is to support and partially even automate the process of bisociation. More specific goals are to allow navigation of BisoNets by indirect and predicted relationships and by analogy, to produce explanations for discovered relationships, and to help abstract and summarise BisoNets for more effective visualisation.

Part III - Network Analysis | Pp. 144-146