Publications catalog - books


Open Access Title

Bisociative Knowledge Discovery: An Introduction to Concept, Algorithms, Tools, and Applications

Michael R. Berthold (ed.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Artificial Intelligence (incl. Robotics); Data Mining and Knowledge Discovery; Information Systems Applications (incl. Internet); User Interfaces and Human Computer Interaction; Pattern Recognition; Computer Communication Networks

Availability
Detected institution | Publication year | Browse | Download | Request
Not required | 2012 | SpringerLink (open access)

Information

Resource type:

books

Print ISBN

978-3-642-31829-0

Electronic ISBN

978-3-642-31830-6

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© The Editor(s) (if applicable) and the Author(s) 2012. The book is published with open access at SpringerLink.com.

Table of contents

On the Integration of Graph Exploration and Data Analysis: The Creative Exploration Toolkit

Stefan Haun; Tatiana Gossen; Andreas Nürnberger; Tobias Kötter; Kilian Thiel; Michael R. Berthold

To enable discovery in large, heterogeneous information networks, a tool is needed that allows exploration in changing graph structures and integrates advanced graph mining methods in an interactive visualization framework. We present the Creative Exploration Toolkit (CET), which consists of a state-of-the-art user interface for graph visualization designed for explorative tasks and support tools for integration and communication with external data sources and mining tools, especially the data-mining platform KNIME. All parts of the interface can be customized to fit the requirements of special tasks, including the use of node-type-dependent icons and the highlighting of nodes and clusters. Through an evaluation we have shown the applicability of CET for structure-based analysis tasks.

Part IV - Exploration | Pp. 301-312

Bisociative Knowledge Discovery by Literature Outlier Detection

Ingrid Petrič; Bojan Cestnik; Nada Lavrač; Tanja Urbančič

The aim of this chapter is to present the role of outliers in literature-based knowledge discovery, which can be used to explore potential bisociative links between different domains of expertise. The proposed approach upgrades the RaJoLink method, which provides a novel framework for effectively guiding knowledge discovery from literature, based on the principle of rare terms from scientific articles. This chapter shows that outlier documents can be successfully used as a means of detecting bridging terms that connect documents of two different literature sources. This linking process, also known as closed discovery, is incorporated as one of the steps of the RaJoLink methodology, and is performed by using the publicly available topic ontology construction tool OntoGen. We chose scientific articles about autism as the application example with which we demonstrated the proposed approach.

Part IV - Exploration | Pp. 313-324

Exploring the Power of Outliers for Cross-Domain Literature Mining

Borut Sluban; Matjaž Juršič; Bojan Cestnik; Nada Lavrač

In bisociative cross-domain literature mining, the goal is to identify interesting terms or concepts which relate different domains. This chapter reveals that a majority of these domain-bridging concepts can be found in outlier documents which are not in the mainstream domain literature. We have detected outlier documents by combining three classification-based outlier detection methods and explored the power of these outlier documents in terms of their potential for supporting the bridging concept discovery process. The experimental evaluation was performed on the classical migraine-magnesium and the recently explored autism-calcineurin domain pairs.

Part IV - Exploration | Pp. 325-337

Bisociative Literature Mining by Ensemble Heuristics

Matjaž Juršič; Bojan Cestnik; Tanja Urbančič; Nada Lavrač

In literature mining, the identification of bridging concepts that link two diverse domains has been shown to be a promising approach for finding bisociations as distinct, yet unexplored cross-domain connections which could lead to new scientific discoveries. This chapter introduces the system CrossBee (online Cross-Context Bisociation Explorer), which implements a methodology that supports the search for hidden links connecting two different domains. The methodology is based on an ensemble of specially tailored text mining heuristics which assign the candidate bridging concepts a bisociation score. Using this score, the user of the system can primarily explore only the most promising concepts with high bisociation scores. Besides improved bridging concept identification and ranking, CrossBee also provides various content presentations which further speed up the process of examining bisociation hypotheses. These presentations include side-by-side document inspection, emphasis of interesting text fragments, and uncovering of similar documents. The methodology is evaluated on two problems: the standard migraine-magnesium problem well known in literature mining, and a more recent autism-calcineurin literature mining problem.

Part IV - Exploration | Pp. 338-358
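
As an illustration of the ensemble idea in the CrossBee abstract above, the following Python sketch scores candidate bridging terms by averaging the rank positions that several heuristics assign to them. It is a minimal sketch under stated assumptions, not the CrossBee implementation: the three heuristics, the function names, and the toy documents are all hypothetical.

```python
from collections import Counter


def term_counts(docs):
    """Count, for each term, the number of documents that contain it."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.lower().split()))
    return counts


def bisociation_scores(domain_a, domain_b):
    """Average rank-based score in (0, 1] for terms shared by both domains.

    The heuristics below are illustrative assumptions only.
    """
    a, b = term_counts(domain_a), term_counts(domain_b)
    shared = sorted(set(a) & set(b))  # candidate bridging terms
    heuristics = [
        lambda t: min(a[t], b[t]),  # supported in both domains
        lambda t: -abs(a[t] / len(domain_a) - b[t] / len(domain_b)),  # balanced
        lambda t: -(a[t] + b[t]),  # rare overall
    ]
    scores = {t: 0.0 for t in shared}
    for heuristic in heuristics:
        ranked = sorted(shared, key=heuristic, reverse=True)
        for position, term in enumerate(ranked):
            # Best-ranked term gets 1.0, later positions get less.
            scores[term] += 1.0 - position / len(ranked)
    return {t: s / len(heuristics) for t, s in scores.items()}


# Hypothetical toy documents, one string per document.
domain_a = ["migraine stress calcium", "migraine calcium channel"]
domain_b = ["magnesium calcium channel", "magnesium deficiency stress"]
scores = bisociation_scores(domain_a, domain_b)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Averaging rank positions rather than raw heuristic values keeps heuristics with different scales comparable, which is one simple way to combine them.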

Applications and Evaluation: Overview

Igor Mozetič; Nada Lavrač

This part of the book presents several applications which were motivated by the concept of bisociation, and to some extent exploited the notions of heterogeneous information networks, explicit contextualization and/or context crossing.

The main goals of these applications are:

Most of the applications are in the area of biology, but in addition there are interesting digressions to finance, improvements of business processes, and music recommendations.

Part V - Applications and Evaluation | Pp. 359-363

Biomine: A Network-Structured Resource of Biological Entities for Link Prediction

Lauri Eronen; Petteri Hintsanen; Hannu Toivonen

Biomine is a biological graph database constructed from public databases. Its entities (vertices) include biological concepts (such as genes, proteins, tissues, processes and phenotypes, as well as scientific articles) and relations (edges) between these entities correspond to real-world phenomena such as “a gene codes for a protein” or “an article refers to a phenotype”. Biomine also provides tools for querying the graph for connections and visualizing them interactively.

We describe the Biomine graph database. We also discuss link discovery in such biological graphs and review possible link prediction measures. Biomine currently contains over 1 million entities and over 8 million relations between them, with a focus on human genetics. It is available on-line and can be queried for connecting subgraphs between biological entities.

Part V - Applications and Evaluation | Pp. 364-378

Semantic Subgroup Discovery and Cross-Context Linking for Microarray Data Analysis

Igor Mozetič; Nada Lavrač; Vid Podpečan; Petra Kralj Novak; Helena Motaln; Marko Petek; Kristina Gruden; Hannu Toivonen; Kimmo Kulovesi

The article presents an approach to computational knowledge discovery through the mechanism of bisociation. Bisociative reasoning is at the heart of creative, accidental discovery (e.g., serendipity), and is focused on finding unexpected links by crossing contexts. Contextualization and linking between highly diverse and distributed data and knowledge sources is therefore crucial for the implementation of bisociative reasoning. In the article we explore these ideas on the problem of analysis of microarray data. We show how enriched gene sets are found by using ontology information as background knowledge in semantic subgroup discovery. These genes are then contextualized by the computation of probabilistic links to diverse bioinformatics resources. Preliminary experiments with microarray data illustrate the approach.

Part V - Applications and Evaluation | Pp. 379-389

Contrast Mining from Interesting Subgroups

Laura Langohr; Vid Podpečan; Marko Petek; Igor Mozetič; Kristina Gruden

Subgroup discovery methods find interesting subsets of objects of a given class. We propose to extend subgroup discovery by a second subgroup discovery step to find interesting subgroups of objects specific for a class in one or more contrast classes. First, a subgroup discovery method is applied. Then, contrast classes of objects are defined by using set theoretic functions on the discovered subgroups of objects. Finally, subgroup discovery is performed to find interesting subgroups within the two contrast classes, pointing out differences between the characteristics of the two. This has various application areas, one being biology, where finding interesting subgroups has been addressed widely for gene-expression data. There, our method finds enriched gene sets which are common to samples in a class (e.g., differential expression in virus infected versus non-infected) and at the same time specific for one or more class attributes (e.g., time points or genotypes). We report on experimental results on a time-series data set for virus infected potato plants. The results present a comprehensive overview of potato’s response to virus infection and reveal new research hypotheses for plant biologists.

Part V - Applications and Evaluation | Pp. 390-406
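
The middle step of the contrast mining abstract above, defining contrast classes with set-theoretic functions on discovered subgroups, can be sketched in Python as follows. This is a minimal sketch that assumes each discovered subgroup is available as a set of object identifiers; the function name, the subgroup labels, and the gene identifiers are hypothetical, and the two subgroup discovery steps themselves are not shown.

```python
def contrast_classes(subgroups, target):
    """Split the objects of one subgroup into two contrast classes.

    subgroups: dict mapping a label (e.g. a time point) to a set of objects.
    Returns (specific, shared): objects found only in `target`, and
    objects of `target` that also occur in at least one other subgroup.
    """
    others = set().union(
        *(objs for label, objs in subgroups.items() if label != target)
    )
    specific = subgroups[target] - others  # set difference
    shared = subgroups[target] & others    # set intersection
    return specific, shared


# Hypothetical subgroups of gene identifiers, one per time point.
subgroups = {
    "day1": {"g1", "g2", "g3"},
    "day3": {"g2", "g4"},
    "day6": {"g3", "g5"},
}
specific, shared = contrast_classes(subgroups, "day1")
```

The two returned sets could then serve as the classes for the second subgroup discovery step described in the abstract.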

Link and Node Prediction in Metabolic Networks with Probabilistic Logic

Angelika Kimmig; Fabrizio Costa

Information on metabolic processes for hundreds of organisms is available in public databases. However, this information is often incomplete or affected by uncertainty. Systems capable of automatically curating these databases and of suggesting fillings for pathway holes are therefore needed. To this end, such systems should exploit data available from related organisms and cope with heterogeneous sources of information (e.g., phylogenetic relations). Here we start to investigate two fundamental problems concerning automatic metabolic network curation, namely link prediction and node prediction, using ProbLog, a simple yet powerful extension of the logic programming language Prolog with independent random variables.

Part V - Applications and Evaluation | Pp. 407-426
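
To make the phrase "independent random variables" in the abstract above concrete, the Python sketch below computes the probability that one node is connected to another when every edge is present independently with its own probability. It is a brute-force illustration of that semantics for tiny graphs, written in plain Python rather than ProbLog; it is not the chapter's method, and the function names and the example network are hypothetical.

```python
from itertools import product


def reachable(edge_list, source, target):
    """Depth-first search over a list of directed (u, v) edges."""
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for u, v in edge_list:
            if u == node and v not in seen:
                seen.add(v)
                stack.append(v)
    return False


def connection_probability(edges, source, target):
    """Sum the probabilities of all edge subsets that connect source to target.

    edges: dict mapping a directed edge (u, v) to its probability.
    Enumerates 2**len(edges) possible worlds, so it only suits small graphs.
    """
    items = list(edges.items())
    total = 0.0
    for world in product([True, False], repeat=len(items)):
        weight = 1.0
        present = []
        for (edge, p), keep in zip(items, world):
            weight *= p if keep else 1.0 - p
            if keep:
                present.append(edge)
        if reachable(present, source, target):
            total += weight
    return total


# Hypothetical network: a direct edge a -> c and an indirect path a -> b -> c.
edges = {("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "c"): 0.5}
p_link = connection_probability(edges, "a", "c")  # 1 - 0.5 * 0.75 = 0.625
```

Exhaustive enumeration grows exponentially with the number of edges, so it is useful only for checking intuition on toy examples.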

Modelling a Biological System: Network Creation by Triplet Extraction from Biological Literature

Dragana Miljkovic; Vid Podpečan; Miha Grčar; Kristina Gruden; Tjaša Stare; Marko Petek; Igor Mozetič; Nada Lavrač

The chapter proposes an approach to support modelling of plant defence response to pathogen attacks. Such models are currently built manually from expert knowledge, experimental results, and literature search, which is a very time-consuming process. Manual model construction can be effectively complemented by automated model extraction from biological literature. This work focuses on the construction of triplets in the form of subject-predicate-object extracted from scientific papers, which are used by the Biomine automated graph construction and visualisation engine to create the biological model. The approach was evaluated by comparing the automatically generated graph with a manually developed Petri net model of plant defence. This approach to automated model creation was also explored in a bisociative setting. The emphasis is not on creative knowledge discovery, but rather on specifying and crossing the boundaries of knowledge of individual scientists. This could be used to model the expertise of virtual scientific consortia.

Part V - Applications and Evaluation | Pp. 427-437
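
The subject-predicate-object triplets mentioned in the abstract above map naturally onto a directed graph with labelled edges. The Python sketch below shows one way such a mapping could look; it is an assumption-laden illustration, not the Biomine engine or the chapter's extraction pipeline, and the function name and the example triplets are hypothetical placeholders rather than biological claims.

```python
from collections import defaultdict


def build_graph(triplets):
    """Turn (subject, predicate, object) triplets into an adjacency map.

    Returns a dict mapping each subject to a list of (predicate, object)
    pairs, i.e. its outgoing labelled edges. Duplicate triplets are kept once.
    """
    graph = defaultdict(list)
    for subject, predicate, obj in triplets:
        edge = (predicate, obj)
        if edge not in graph[subject]:
            graph[subject].append(edge)
    return dict(graph)


# Hypothetical triplets with placeholder entity names.
triplets = [
    ("componentA", "activates", "componentB"),
    ("componentB", "inhibits", "componentC"),
    ("componentA", "activates", "componentB"),  # duplicate, ignored
    ("componentA", "binds", "componentC"),
]
graph = build_graph(triplets)
```

Keeping the predicate on each edge preserves the relation type, which a comparison against a manually built model would need.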