Catálogo de publicaciones - libros

Compartir en
redes sociales


Título de Acceso Abierto

Bisociative Knowledge Discovery: An Introduction to Concept, Algorithms, Tools, and Applications

Michael R. Berthold (eds.)

Resumen/Descripción – provisto por la editorial

No disponible.

Palabras clave – provistas por la editorial

Artificial Intelligence (incl. Robotics); Data Mining and Knowledge Discovery; Information Systems Applications (incl. Internet); User Interfaces and Human Computer Interaction; Pattern Recognition; Computer Communication Networks

Disponibilidad
Institución detectada Año de publicación Navegá Descargá Solicitá
No requiere 2012 SpringerLink acceso abierto

Información

Tipo de recurso:

libros

ISBN impreso

978-3-642-31829-0

ISBN electrónico

978-3-642-31830-6

Editor responsable

Springer Nature

País de edición

Reino Unido

Fecha de publicación

Información sobre derechos de publicación

© The Editor(s) (if applicable) and the Author(s) 2012. The book is published with open access at Springer-Link.com 2012

Tabla de contenidos

BiQL: A Query Language for Analyzing Information Networks

Anton Dries; Siegfried Nijssen; Luc De Raedt

One of the key steps in data analysis is the exploration of data. For traditional relational data, this process is facilitated by relational database management systems and the aggregates and rankings they can compute. However, for the exploration of graph data, relational databases may not be most practical and scalable. Many tasks related to exploration of information networks involve computation and analysis of connections (e.g. paths) between concepts. Traditional relational databases offer no specific support for performing such tasks. For instance, a statistic such as the shortest path between two given nodes cannot be computed by a relational database. Surprisingly, tools for querying graph and network databases are much less well developed than for relational data, and only recently an increasing number of studies are devoted to graph or network databases. Our position is that the development of such graph databases is important both to make basic graph mining easier and to prepare data for more complex types of analysis.

In this chapter we present the BiQL data model for representing and manipulating information networks. The BiQL data model consists of two parts: a data model describing objects, link, domains and networks, and a query language describing basic network manipulations. The main focus here lies on data preparation and data analysis, and less on data mining or knowledge discovery tasks directly.

Part III - Network Analysis | Pp. 147-165

Review of BisoNet Abstraction Techniques

Fang Zhou; Sébastien Mahler; Hannu Toivonen

BisoNets represent relations of information items as networks. The goal of BisoNet abstraction is to transform a large BisoNet into a smaller one which is simpler and easier to use, although some information may be lost in the abstraction process. An abstracted BisoNet can help users to see the structure of a large BisoNet, or understand connections between distant nodes, or discover hidden knowledge. In this paper we review different approaches and techniques to abstract a large BisoNet. We classify the approaches into two groups: preference-free methods and preference-dependent methods.

Part III - Network Analysis | Pp. 166-178

Simplification of Networks by Edge Pruning

Fang Zhou; Sébastien Mahler; Hannu Toivonen

We propose a novel problem to simplify weighted graphs by pruning least important edges from them. Simplified graphs can be used to improve visualization of a network, to extract its main structure, or as a pre-processing step for other data mining algorithms.

We define a graph connectivity function based on the best paths between all pairs of nodes. Given the number of edges to be pruned, the problem is then to select a subset of edges that best maintains the overall graph connectivity. Our model is applicable to a wide range of settings, including probabilistic graphs, flow graphs and distance graphs, since the path quality function that is used to find best paths can be defined by the user. We analyze the problem, and give lower bounds for the effect of individual edge removal in the case where the path quality function has a natural recursive property. We then propose a range of algorithms and report on experimental results on real networks derived from public biological databases.

The results show that a large fraction of edges can be removed quite fast and with minimal effect on the overall graph connectivity. A rough semantic analysis of the removed edges indicates that few important edges were removed, and that the proposed approach could be a valuable tool in aiding users to view or explore weighted graphs.

Part III - Network Analysis | Pp. 179-198

Network Compression by Node and Edge Mergers

Hannu Toivonen; Fang Zhou; Aleksi Hartikainen; Atte Hinkka

We give methods to compress weighted graphs (i.e., networks or BisoNets) into smaller ones. The motivation is that large networks of social, biological, or other relations can be complex to handle and visualize. Using the given methods, nodes and edges of a give graph are grouped to supernodes and superedges, respectively. The interpretation (i.e. decompression) of a compressed graph is that a pair of original nodes is connected by an edge if their supernodes are connected by one, and that the weight of an edge equals the weight of the superedge. The compression problem then consists of choosing supernodes, superedges, and superedge weights so that the approximation error is minimized while the amount of compression is maximized.

In this chapter, we describe this task as the ’simple weighted graph compression problem’. We also discuss a much wider class of tasks under the name of ’generalized weighted graph compression problem’. The generalized task extends the optimization to preserve longer-range connectivities between nodes, not just individual edge weights. We study the properties of these problems and outline a range of algorithms to solve them, with different trade-offs between complexity and quality of the result. We evaluate the problems and algorithms experimentally on real networks. The results indicate that weighted graphs can be compressed efficiently with relatively little compression error.

Part III - Network Analysis | Pp. 199-217

Finding Representative Nodes in Probabilistic Graphs

Laura Langohr; Hannu Toivonen

We introduce the problem of identifying representative nodes in probabilistic graphs, motivated by the need to produce different simple views to large BisoNets. We define a probabilistic similarity measure for nodes, and then apply clustering methods to find groups of nodes. Finally, a representative is output from each cluster. We report on experiments with real biomedical data, using both the -medoids and hierarchical clustering methods in the clustering step. The results suggest that the clustering based approaches are capable of finding a representative set of nodes.

Part III - Network Analysis | Pp. 218-229

(Missing) Concept Discovery in Heterogeneous Information Networks

Tobias Kötter; Michael R. Berthold

This article proposes a new approach to extract existing (or detect missing) concepts from a loosely integrated collection of information units by means of concept graph detection. Thereby a concept graph defines a concept by a quasi bipartite sub-graph of a bigger network with the members of the concept as the first vertex partition and their shared aspects as the second vertex partition. Once the concepts have been extracted they can be used to create higher level representations of the data. Concept graphs further allow the discovery of missing concepts, which could lead to new insights by connecting seemingly unrelated information units.

Part III - Network Analysis | Pp. 230-245

Node Similarities from Spreading Activation

Kilian Thiel; Michael R. Berthold

In this paper we propose two methods to derive different kinds of node neighborhood based similarities in a network. The first similarity measure focuses on the overlap of direct and indirect neighbors. The second similarity compares nodes based on the structure of their possibly also very distant neighborhoods. Both similarities are derived from spreading activation patterns over time. Whereas in the first method the activation patterns are directly compared, in the second method the relative change of activation over time is compared. We applied both methods to a real world graph dataset and discuss some of the results in more detail.

Part III - Network Analysis | Pp. 246-262

Towards Discovery of Subgraph Bisociations

Uwe Nagel; Kilian Thiel; Tobias Kötter; Dawid Piątek; Michael R. Berthold

The discovery of surprising relations in large, heterogeneous information repositories is gaining increasing importance in real world data analysis. If these repositories come from diverse origins, forming different domains, domain bridging associations between otherwise weakly connected domains can provide insights into the data that are not accomplished by aggregative approaches. In this paper, we propose a first formalization for the detection of such potentially interesting, domain-crossing relations based purely on structural properties of a relational knowledge description.

Part III - Network Analysis | Pp. 263-284

Exploration: Overview

Andreas Nürnberger

In the previous chapters of this book quite different approaches to create networks based on existing data collections (Part II) have been discussed and diverse methods for network analysis have been proposed (Part III). All these methods provide powerful means in order to obtain different insights into the properties of huge information networks or graphs. However, one disadavantage of these individual approaches is that each approach provides only specific facets of information to the end user. Integrated exploration tools that enable the interactive exploration of huge graphs and data collections using data viusalization, aggregation and mining methods could be much more beneficial for the (interactive) task of finding bisociations. Therefore, we present and discuss in this part of the book methods for interactive exploration that bring these fields together.

Part IV - Exploration | Pp. 285-286

Data Exploration for Bisociative Knowledge Discovery: A Brief Overview of Tools and Evaluation Methods

Tatiana Gossen; Marcus Nitsche; Stefan Haun; Andreas Nürnberger

In this chapter we explain the definition of the term . We refine this definition in the context of browsing, navigating and searching. We provide a definition of and derive requirements on user interfaces, which are designed to support bisociative knowledge discovery. We discuss how to support subtasks of bisociative data exploration with appropriate user interface elements. We also present a set of exploratory tools, which are currently available or in development. Finally, we discuss the problem of usability evaluation in the context of exploratory search. Two main issues - complexity and comparability - are explained and possible solutions proposed.

Part IV - Exploration | Pp. 287-300