Publications catalog - books

Data Management in Grids: First VLDB Workshop, DMG 2005, Trondheim, Norway, September 2-3, 2005, Revised Selected Papers

Jean-Marc Pierson (ed.)

Conference: 1st Workshop on Data Management in Grids (DMG). Trondheim, Norway. September 2-3, 2005

Abstract/Description - provided by the publisher

Not available.

Keywords - provided by the publisher

Database Management; Computer Communication Networks; Information Storage and Retrieval; Information Systems Applications (incl. Internet); Multimedia Information Systems; User Interfaces and Human Computer Interaction

Availability

Detected institution	Publication year	Browse	Download	Request
Not detected	2006	SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-31212-3

Electronic ISBN

978-3-540-32452-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2006

Publication rights information

Subject coverage

Computer and information sciences

Table of contents

Check that your institution gives you access to download or request the complete book or any of its chapters.

doi: 10.1007/11611950_1

Globally Distributed Data

Reagan W. Moore

The management of globally distributed data is simplified through the use of data grids which enable data sharing environments. Data grids provide both the interoperability mechanisms needed to interact with legacy storage systems and legacy applications, as well as the logical name spaces needed to identify files, resources, and users. Data grids also provide support for consistent management of state information about each file within the distributed environment. The state information includes access controls, descriptive metadata, and administration metadata. These capabilities enable data virtualization, the ability to manage data independently of the chosen storage repositories. Applications that manage globally distributed data include data grid federations, distributed digital libraries, and distributed persistent archives.

Pp. 1-3

doi: 10.1007/11611950_2

XML Data Integration in OGSA Grids

Carmela Comito; Domenico Talia

Data integration is the flexible and managed federation, analysis, and processing of data from different distributed sources. Data integration is becoming as important as data mining for exploiting the value of large and distributed data sets that are available today. Distributed processing infrastructures such as Grids can be used for data integration on geographically distributed sites. This paper presents a framework for integrating heterogeneous XML data sources distributed among the nodes of a Grid. We propose a query reformulation algorithm to combine and query XML documents through a decentralized point-to-point mediation process among the different data sources based on schema mappings. The above cited XML integration formalism is exposed as a Grid Service within the GDIS architecture. GDIS is a service-based architecture for providing data integration in Grids using a decentralized approach. The underlying model of such architecture is discussed and we show how it fits the XMAP formalism/algorithm.

Pp. 4-15

doi: 10.1007/11611950_3

Towards Dynamic Information Integration

Jürgen Göres

To utilize the full potential of structured or semi-structured data stored across different information systems, users and applications must not be confronted directly with the individual, heterogeneous data sources, but instead be supplied with a customized integrated view on the data. Traditional information integration relies on a human-driven process to accomplish this task. While feasible in static, closed-world scenarios, this approach fails in settings like the nascent data grids, which are characterized by a large, permanently changing set of autonomous data sources. We describe the end-to-end integration approach underlying our PALADIN project, which aims to reduce and ultimately eliminate the dependency on human experts in the integration process in order to provide fast and cost-effective integration services for these dynamic environments.

Pp. 16-29

doi: 10.1007/11611950_4

Adapting to Changing Resource Performance in Grid Query Processing

Anastasios Gounaris; Jim Smith; Norman W. Paton; Rizos Sakellariou; Alvaro A. A. Fernandes; Paul Watson

The Grid provides facilities that support the coordinated use of diverse resources, and consequently, provides new opportunities for wide-area query processing. However, Grid resources, as well as being heterogeneous, may also exhibit unpredictable, volatile behaviour. Thus, query processing on the Grid needs to be adaptive, in order to cope with evolving resource characteristics, such as machine load. To address this challenge, an architecture is proposed that has been empirically evaluated over a prototype Grid-enabled adaptive query processor instantiating it.

Pp. 30-44

doi: 10.1007/11611950_5

An Adaptive Distributed Query Processing Grid Service

Fabio Porto; Vinícius F. V. da Silva; Márcio L. Dutra; Bruno Schulze

Grid services provide an important abstract layer on top of the heterogeneous components (hardware and software) that take part in a grid environment. We are developing a data grid service prototype that aims at providing transparent use of grid resources to data intensive scientific applications. Our prototype was designed to address three main issues: (1) dynamic scheduling and allocation of query execution engine modules onto grid nodes; (2) adaptability of query execution to variations in environment conditions; and (3) support for special scientific operations. We propose a new node scheduling algorithm and show how it can be integrated into a simple distributed and parallel query optimization strategy. Our implementation demonstrates a speedup of 16.6 with 18 scheduled nodes and a steady throughput rate, obtained by applying a dynamic adaptive strategy.

Pp. 45-57

doi: 10.1007/11611950_6

Framework for Querying Distributed Objects Managed by a Grid Infrastructure

Ruslan Fomkin; Tore Risch

Queries over scientific data often imply expensive analyses of data requiring a lot of computational resources available in Grids. We are developing a customizable query processor built on top of an established Grid infrastructure, the NorduGrid middleware, and have implemented a framework for managing long running queries in a Grid environment. With the framework the user does not specify the detailed job and parallelization descriptions required by NorduGrid. Instead s/he specifies queries in terms of an application-oriented schema describing the contents of files managed by the Grid and accessed through wrappers. When a query is received by the system, it generates NorduGrid job descriptions that are submitted to NorduGrid for execution. The framework takes the limitations of NorduGrid into account. It includes a submission mechanism, a job babysitter, and a generic data exchange mechanism. The submission mechanism generates a number of jobs for parallel execution of a user query over wrapped data files. The task of the babysitter is to submit generated jobs to NorduGrid for execution, to monitor their execution status, and to download the results of the execution. The generic exchange mechanism provides a way to exchange objects through files between Grid execution nodes and user applications.

Pp. 58-70

doi: 10.1007/11611950_7

An Outline of the Global Grid Forum Data Access and Integration Service Specifications

Mario Antonioletti; Amy Krause; Norman W. Paton

Grid computing concerns itself with building the infrastructure to facilitate the sharing of computational and data resources to enable collaboration within virtual organisations. The Global Grid Forum (GGF) provides a framework for users, developers and vendors to come together to develop standards to ensure interoperability between middleware from different service providers. Central to this effort is the Open Grid Services Architecture (OGSA), and its associated specifications. These define consistent interfaces, generally couched as web services, and the components required to construct grid infrastructures. Both the web service and grid communities stand to benefit from the provision of consistent and agreed web service interfaces for data resources and the systems that manage them. This paper describes, motivates and presents the context for the work that has been undertaken by the GGF Data Access and Integration Services Working Group (DAIS-WG). The group has defined a set of data access and integration interfaces that are consistent with the OGSA vision. A brief overview of the current family of DAIS specifications is given: WS-DAI specifies a collection of generic data resource properties and messages that are specialised by WS-DAIR and WS-DAIX for use with relational and XML data resources, respectively. The WS-DAI specifications can be applied in regular web services environments or as part of a grid fabric.

Pp. 71-84

doi: 10.1007/11611950_8

File Caching in Data Intensive Scientific Applications on Data-Grids

Ekow Otoo; Doron Rotem; Alexandru Romosan; Sridhar Seshadri

We present some theoretical and experimental results of an important caching problem which arises frequently in data intensive scientific applications that are run in data-grids. Such applications often need to process several files simultaneously, i.e., the application runs only if all its needed files are present in some disk cache accessible to the compute resource of the application. The set of files requested by an application, all of which must be in cache for the application to run, is called a file-bundle. This requirement introduces the need for cache replacement algorithms that are based on file-bundles rather than individual files. We show that traditional caching algorithms are not optimal in this case since they are not sensitive to file-bundles and may hold in the cache non-relevant combinations of files. We propose and analyze a new cache replacement algorithm specifically adapted to deal with file-bundles. Results of experimental studies of the new algorithm, using a disk cache simulation model under a wide range of conditions such as file request distributions, relative cache size, file size distribution, and incoming job queue size, show significant improvement over traditional caching algorithms such as GDS.

Pp. 85-99

doi: 10.1007/11611950_9

RRS: Replica Registration Service for Data Grids

Arie Shoshani; Alex Sim; Kurt Stockinger

Over the last few years various scientific experiments and Grid projects have developed different catalogs for keeping track of their data files. Some projects use specialized file catalogs, others use distributed replica catalogs to reference files at different locations. Due to this diversity of catalogs, it is very hard to manage files across Grid projects, or to replace one catalog with another.

In this paper we introduce a new Grid service called the Replica Registration Service (RRS). It can be thought of as an abstraction of the concepts for registering files and their replicas. In addition to traditional single file registration operations, the RRS supports collective file registration requests and keeps persistent registration queues. This approach is of particular importance for large-scale usage where thousands of files are copied and registered. Moreover, the RRS supports a set of error directives that are triggered in case of registration failures. Our goal is to provide a single uniform interface for various file catalogs to support the registration of files across multiple Grid projects, and to make Grid clients oblivious to the specific catalog used.

Pp. 100-112

doi: 10.1007/11611950_10

Datagridflows: Managing Long-Run Processes on Datagrids

Arun Jagatheesan; Jonathan Weinberg; Reena Mathew; Allen Ding; Erik Vandekieft; Daniel Moore; Reagan Moore; Lucas Gilbert; Mark Tran; Jeffrey Kuramoto

This paper is an introduction to datagridflows. Until recently, datagrids were generally considered over-hyped and the associated technologies not widely embraced in the academic community. Today, datagrids have become a reality and an important technology for managing large, unstructured data and storage resources distributed over autonomous administrative domains. The datagrids that are operating in production give us an idea of the new requirements and challenges that will be faced in future datagrid environments. One such requirement is the coordinated execution of long-run data management processes in datagrids. We term these processes “datagridflows”. This new area provides exciting opportunities and challenges to researchers in distributed computing and distributed databases. This paper is intended to introduce these challenges to other researchers, including those new to grid computing. We provide motivation through discussion of datagridflow requirements and real production scenarios. We introduce current work on datagridflow technologies, including work on describing datagridflows in datagrids.

Pp. 113-128