Publications catalogue - books

Advances in Databases and Information Systems: 11th East European Conference, ADBIS 2007, Varna, Bulgaria, September 29-October 3, 2007. Proceedings

Yannis Ioannidis ; Boris Novikov ; Boris Rachev (eds.)

Conference: 11th East European Conference on Advances in Databases and Information Systems (ADBIS). Varna, Bulgaria. September 29 - October 3, 2007

Abstract/Description - provided by the publisher

Not available.

Keywords - provided by the publisher

Not available.

Availability
Institution detected: Not detected
Year of publication: 2007
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-75184-7

Electronic ISBN

978-3-540-75185-4

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2007

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

ETL Workflows: From Formal Specification to Optimization

Timos K. Sellis; Alkis Simitsis

In this paper, we present our work on a framework towards the modeling and optimization of Extraction-Transformation-Loading (ETL) workflows. The goal of this research was to facilitate, manage, and optimize the design and implementation of ETL workflows, both during the initial design and deployment stage and during the continuous evolution of a data warehouse. In particular, we present our results, which include: (a) the provision of a novel conceptual model for the tracing of inter-attribute relationships and the respective ETL transformations in the early stages of a data warehouse project, along with an attempt to use ontology-based mechanisms to semi-automatically capture the semantics and the relationships among the various sources; (b) the provision of a novel logical model for the representation of ETL workflows with two main characteristics: genericity and customization; (c) the semi-automatic transition from the conceptual to the logical model for ETL workflows; and (d) the tuning of an ETL workflow for the optimization of the execution order of its operations. Finally, we discuss some issues concerning future work in the area that we consider important and a step towards the incorporation of the above research results into other areas as well.

- Invited Lectures | Pp. 1-11

Harvesting and Organizing Knowledge from the Web

Gerhard Weikum

Information organization and search on the Web is gaining structure and context awareness and more semantic flavor, for example, in the forms of faceted search, vertical search, entity search, and Deep-Web search. I envision another big leap forward by automatically harvesting and organizing knowledge from the Web, represented in terms of explicit entities and relations as well as ontological concepts. This will be made possible by the confluence of three strong trends: 1) rich Semantic-Web-style knowledge repositories like ontologies and taxonomies, 2) large-scale information extraction from high-quality text sources such as Wikipedia, and 3) social tagging in the spirit of Web 2.0. I refer to the three directions as Semantic Web, Statistical Web, and Social Web (at the risk of some oversimplification), and I briefly characterize each of them.

- Invited Lectures | Pp. 12-13

Schema and Data Translation: A Personal Perspective

Paolo Atzeni

The problem of translating schemas and data from one model to another has received the attention of database researchers for decades, but definitive solutions have not been reached.

Motivation for the problem comes from the variety of sources available in modern systems, which often use different approaches (and data models) for the organization of information.

The topic is discussed here by first setting the context with reference to the recent proposal for model management systems, which considers an even wider set of requirements. Then, definitions are given for the problem of schema and data translation and for the related problem of data exchange. Some related technical issues are then discussed: how schemas, models and mappings are described, and what the relationship between source and target schemas is in terms of information capacity. Finally, a specific proposal for data translation is discussed in some detail.

- Invited Lectures | Pp. 14-27

A Protocol Ontology for Inter-Organizational Workflow Coordination

Eric Andonoff; Wassim Bouaziz; Chihab Hanachi

As coordination is a central issue in Inter-Organizational Workflow (IOW), it is quite natural to model it as a specific entity. Moreover, the structure of the different IOW coordination problems is amenable to protocols. Hence, this paper shows how these protocols could be modelled and made accessible to the partners involved in an IOW. More precisely, the paper proposes a coordination protocol ontology for IOW and explains how workflow partners can dynamically select protocols. This solution eases the design and development of IOW systems by providing autonomous, reusable and extendable coordination components. It also supports semantic coordination through the use of the protocol ontology and by making protocols shared resources that can be exploited at both the design and execution stages.

- Activity Modeling | Pp. 28-40

Preventing Orphan Requests by Integrating Replication and Transactions

Heine Kolltveit; Svein-Olaf Hvasshovd

Replication is crucial for achieving highly available distributed systems. However, non-determinism introduces consistency problems between replicas. Transactions are very well suited to maintaining consistency, and by integrating them with replication, support for non-deterministic execution in replicated environments can be achieved. This paper presents an approach where a passively replicated transaction manager is allowed to break replication transparency in order to abort orphan requests, thus handling non-determinism. A prototype implemented using existing open-source software, Jgroup/ARM and Jini, has been developed, and performance and failover tests have been executed. The results show that while this approach is feasible, components specifically tuned for performance must be used to meet real-time requirements.

- Activity Modeling | Pp. 41-54

Discretization Numbers for Multiple-Instances Problem in Relational Database

Rayner Alfred; Dimitar Kazakov

Handling numerical data stored in a relational database differs from handling numerical data stored in a single table, due to the multiple occurrences of an individual record in the non-target table and the non-determinate relations between tables. Most traditional data mining methods only deal with a single table and discretize columns that contain continuous numbers into nominal values. In a relational database, multiple records with numerical attributes are stored separately from the target table, and these records are usually associated with a single structured individual stored in the target table. Numbers in multi-relational data mining (MRDM) are often discretized, after considering the schema of the relational database, in order to reduce the continuous domains to more manageable symbolic domains of low cardinality, and the loss of precision is assumed to be acceptable. In this paper, we consider different alternatives for dealing with continuous attributes in MRDM. The discretization procedures considered include algorithms that do not depend on the multi-relational structure of the data as well as algorithms that are sensitive to this structure. In this experiment, we study the effects of taking the association issue into consideration in the process of discretizing continuous numbers. We implement a new method of discretization, called the discretization method, and we evaluate it with respect to C4.5 on three varieties of a well-known multi-relational database (Mutagenesis), where numeric attributes play an important role. The empirical results obtained demonstrate that entropy-based discretization can be improved by taking the multiple-instance problem into consideration.

- Activity Modeling | Pp. 55-65
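
For readers unfamiliar with entropy-based discretization, the single-table version of the idea that the multi-relational procedures above build on can be sketched briefly. The Python sketch below performs recursive binary splitting of one numeric attribute by information gain; the function names, the fixed recursion depth (a stand-in for an MDL-style stopping criterion), and the toy data are assumptions made for illustration, not the method evaluated in the paper.

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        total = len(labels)
        return -sum((c / total) * math.log2(c / total)
                    for c in Counter(labels).values())

    def best_split(values, labels):
        """Cut point on a numeric attribute maximizing information gain (None if no gain)."""
        pairs = sorted(zip(values, labels))
        base, best_gain, best_cut = entropy(labels), 0.0, None
        for i in range(1, len(pairs)):
            if pairs[i - 1][0] == pairs[i][0]:
                continue  # no boundary between equal attribute values
            cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            left = [l for v, l in pairs if v <= cut]
            right = [l for v, l in pairs if v > cut]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
            if gain > best_gain:
                best_gain, best_cut = gain, cut
        return best_cut

    def discretize(values, labels, depth=2):
        """Recursive binary splitting; the depth limit stands in for an MDL-style stop rule."""
        cut = best_split(values, labels)
        if cut is None or depth == 0:
            return []
        left = [(v, l) for v, l in zip(values, labels) if v <= cut]
        right = [(v, l) for v, l in zip(values, labels) if v > cut]
        return (discretize(*zip(*left), depth - 1) +
                [cut] +
                discretize(*zip(*right), depth - 1))

    # Toy attribute with two classes: a single cut point around 3.5 is found
    print(discretize([1.0, 1.5, 2.0, 5.0, 5.5, 6.0], ['a', 'a', 'a', 'b', 'b', 'b']))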

Adaptive k-Nearest-Neighbor Classification Using a Dynamic Number of Nearest Neighbors

Stefanos Ougiaroglou; Alexandros Nanopoulos; Apostolos N. Papadopoulos; Yannis Manolopoulos; Tatjana Welzer-Druzovec

Classification based on k-nearest neighbors (k-NN classification) is one of the most widely used classification methods. The number of nearest neighbors used for achieving high accuracy in classification is given in advance and is highly dependent on the data set used. If the size of the data set is large, the sequential or binary search of NNs is inapplicable due to the increased computational costs. Therefore, indexing schemes are frequently used to speed up the classification process. If the required number of nearest neighbors is high, the use of an index may not be adequate to achieve high performance. In this paper, we demonstrate that the execution of the nearest neighbor search algorithm can be interrupted if certain criteria are satisfied. This way, a decision can be made without the computation of all nearest neighbors of a new object. Three different heuristics are studied for enhancing the nearest neighbor algorithm with an early-break capability. These heuristics aim at: (i) reducing computation and I/O costs as much as possible, and (ii) maintaining classification accuracy at a high level. Experimental results based on real-life data sets illustrate the applicability of the proposed method in achieving better performance than existing methods.

- Classification | Pp. 66-82
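
The early-break idea in this entry (interrupting the incremental nearest-neighbor search once the neighbors not yet retrieved can no longer change the class decision) can be illustrated independently of the index structures and the three heuristics the paper studies. In the Python sketch below, the exhaustive neighbor stream, the simple majority-margin stopping rule, and all names are assumptions for illustration rather than the authors' algorithm.

    from collections import Counter

    def neighbors_by_distance(train, query):
        """Yield (label, distance) pairs in increasing distance order.
        A real system would produce this stream incrementally from a spatial index."""
        dists = sorted((sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5, label)
                       for x, label in train)
        for d, label in dists:
            yield label, d

    def early_break_knn(train, query, k=20):
        """k-NN majority classification that stops early when the leading class
        holds a lead that the remaining, unseen neighbors can no longer overturn."""
        votes = Counter()
        for seen, (label, _) in enumerate(neighbors_by_distance(train, query), start=1):
            votes[label] += 1
            ranked = votes.most_common(2) + [(None, 0)]
            lead = ranked[0][1] - ranked[1][1]
            if lead > k - seen:          # early break: the decision is already fixed
                return ranked[0][0], seen
            if seen == k:
                break
        return votes.most_common(1)[0][0], seen

    # Toy usage: 2-D points from two classes; the call returns ('a', 3), i.e. the
    # decision was made after retrieving only 3 of the 5 requested neighbors.
    train = [((0.0, 0.0), 'a'), ((0.1, 0.2), 'a'), ((0.2, 0.1), 'a'),
             ((5.0, 5.0), 'b'), ((5.1, 5.2), 'b')]
    print(early_break_knn(train, (0.05, 0.05), k=5))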

Database Implementation of a Model-Free Classifier

Konstantinos Morfonios

Most methods proposed so far for the classification of high-dimensional data are memory-based and obtain a model of the data classes through training before actually performing any classification. As a result, these methods are ineffective on (a) very large datasets stored in databases or data warehouses, (b) data whose partitioning into classes cannot be captured by global models and is sensitive to local characteristics, and (c) data that arrives continuously at the system with pre-classified and unclassified instances mutually interleaved and whose successful classification is sensitive to using the most complete and/or most up-to-date information. In this paper, we propose LOCUS, a scalable model-free classifier that overcomes these problems. LOCUS is based on ideas from pattern recognition and is shown to converge to the optimal Bayes classifier as the size of the datasets involved increases. Moreover, LOCUS is data-scalable and can be implemented using standard SQL over arbitrary database tables. To the best of our knowledge, LOCUS is the first classifier that combines all the characteristics above. We demonstrate the effectiveness of LOCUS through experiments over both real-world and synthetic datasets, comparing it against memory-based decision trees. The results indicate an overall superiority of LOCUS over decision trees in terms of both classification accuracy and the data sizes it can handle.

- Classification | Pp. 83-97
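
The claim that an instance-based, model-free classifier can be implemented with standard SQL over ordinary tables can be made concrete with a small sqlite3 sketch: the class of a query point is taken as the majority class among stored instances inside a fixed-radius neighborhood, with the counting done entirely by the database engine. The table, columns, radius rule, and data are invented for the example and are only a rough stand-in for the LOCUS formulation defined in the paper.

    import sqlite3

    # In-memory table of pre-classified instances; the schema and data are illustrative only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instances (x REAL, y REAL, class TEXT)")
    conn.executemany("INSERT INTO instances VALUES (?, ?, ?)",
                     [(0.0, 0.0, 'a'), (0.2, 0.1, 'a'), (0.1, 0.3, 'a'),
                      (5.0, 5.0, 'b'), (5.2, 4.9, 'b')])

    def classify(qx, qy, radius=1.0):
        """Majority class among stored instances within `radius` of the query point,
        computed by a single SQL aggregation query."""
        row = conn.execute(
            """
            SELECT class, COUNT(*) AS votes
            FROM instances
            WHERE (x - ?) * (x - ?) + (y - ?) * (y - ?) <= ? * ?
            GROUP BY class
            ORDER BY votes DESC
            LIMIT 1
            """,
            (qx, qx, qy, qy, radius, radius)).fetchone()
        return row[0] if row else None

    print(classify(0.1, 0.1))   # 'a'
    print(classify(5.1, 5.0))   # 'b'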

Update Support for Database Views Via Cooperation

Stephen J. Hegner; Peggy Schmidt

Support for updates to views of database schemata is typically very limited; only changes which can be represented entirely within the view, or changes which involve only generic changes outside of the view, are permitted. In this work, a different point of view on the view-update problem is taken. If a proposed update cannot be performed within the view, then rather than rejecting it outright, the cooperation of other views is sought, so that in their combined environments the desired changes can be realized. This approach has the advantage not only that a wider range of updates is supported than is possible with more traditional approaches, but also that updates which require the combined access privileges of several users are supported.

- Design | Pp. 98-113

An Agile Process for the Creation of Conceptual Models from Content Descriptions

Sebastian Bossung; Hans-Werner Sehring; Henner Carl; Joachim W. Schmidt

It is widely accepted practice to build domain models as a conceptual basis for software systems. Normally, the conceptual schema cannot be supplied by domain experts but is constructed by modelling experts. However, this is infeasible in many cases, e.g., if the system is to be generated ad hoc from a conceptual schema.

This paper presents an iterative process that helps domain experts to create a conceptual schema without the need for a modelling expert. The process starts from a set of sample instances provided by the domain expert in a very simple form. The domain expert is assisted in consolidating the samples such that a coherent schema can be inferred from them. Feedback is given by generating a prototype system which is based on the schema and populated with the provided samples.

The process combines the following three aspects in a novel way: (1) it is based on a large number of samples supplied by the domain expert, (2) it gives feedback by agile generation of a prototype system, and (3) it neither requires a modelling expert nor assumes modelling knowledge on the part of the domain expert.

- Design | Pp. 114-129