Publications catalog - books



Service Availability: Third International Service Availability Symposium, ISAS 2006, Helsinki, Finland, May 15-16, 2006, Revised Selected Papers

Dave Penkler ; Manfred Reitenspiess ; Francis Tam (eds.)

Conference: 3rd International Service Availability Symposium (ISAS), Helsinki, Finland, May 15-16, 2006

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Theory of Computation; Computer Communication Networks; Information Systems Applications (incl. Internet); Information Storage and Retrieval; Software Engineering; Management of Computing and Information Systems

Availability
Detected institution | Publication year | Browse | Download | Request
Not detected | 2006 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-68724-5

Electronic ISBN

978-3-540-68725-2

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Model Based Approach for Autonomic Availability Management

Kesari Mishra; Kishor S. Trivedi

As increasingly complex computer systems have come to play a controlling role in all aspects of modern life, system availability and the associated downtime of technical systems have acquired critical importance. Losses due to system downtime have risen manifold and become wide-ranging. Even though the component-level availability of hardware and software has increased considerably, system-wide availability still needs improvement, as the heterogeneity of components and the complexity of interconnections have gone up considerably too. As systems become more interconnected and diverse, architects are less able to anticipate and design for every interaction among components, leaving such issues to be dealt with at runtime. Therefore, in this paper, we propose an approach for autonomic management of system availability, which provides real-time evaluation, monitoring and management of the availability of systems in critical applications. A hybrid approach is used: analytic models provide the behavioral abstraction of components/subsystems, their interconnections and dependencies, and statistical inference is applied to the data from real-time monitoring of those components and subsystems to parameterize the system availability model. The model is solved online (that is, in real time) so that, at any instant of time, both the point and the interval estimates of the overall system availability are obtained by propagating the point and interval estimates of each of the input parameters through the system model. The online monitoring and estimation of system availability can then lead to adaptive online control of system availability.

- Availability Modeling, Estimation and Analysis | Pp. 1-16
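
The idea in the abstract above, propagating point and interval estimates of component availability through a system model, might be illustrated with the following minimal sketch. It is not the authors' model: the series/parallel structure, the `Estimate` type, and all numbers are assumptions made for illustration. Because series and parallel availability are monotone in each component's availability, pushing the lower and upper bounds through the same formula yields a conservative bounding interval, not an exact confidence interval.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Estimate:
    """Hypothetical container: interval and point estimate of an availability."""
    low: float
    point: float
    high: float


def availability_from_times(mttf: float, mttr: float) -> float:
    """Steady-state availability from mean time to failure and mean time to repair."""
    return mttf / (mttf + mttr)


def series(components: list[Estimate]) -> Estimate:
    """All components must be up: availabilities multiply."""
    return Estimate(
        low=prod(c.low for c in components),
        point=prod(c.point for c in components),
        high=prod(c.high for c in components),
    )


def parallel(components: list[Estimate]) -> Estimate:
    """The subsystem is down only if every redundant component is down."""
    return Estimate(
        low=1.0 - prod(1.0 - c.low for c in components),
        point=1.0 - prod(1.0 - c.point for c in components),
        high=1.0 - prod(1.0 - c.high for c in components),
    )


# Illustrative inputs (assumed, not measured): one front end, two redundant back ends.
front_end = Estimate(low=0.980, point=availability_from_times(99.0, 1.0), high=0.995)
back_end = Estimate(low=0.950, point=0.970, high=0.985)
system = series([front_end, parallel([back_end, back_end])])
```

In an online setting, the component estimates would be refreshed from monitoring data and `system` recomputed each time they change.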

Analysis of a Service Degradation Model with Preventive Rejuvenation

Hiroyuki Eto; Tadashi Dohi

Preventive maintenance is an effective way to improve service availability for software systems with service degradation. In this paper, we present a stochastic model of an operational software system that consists of one operating system and multiple applications and provides a service in continuous time. Two kinds of maintenance strategies are considered: reconfiguration of applications as corrective maintenance, and preventive rejuvenation of the operating system. We derive the optimal preventive rejuvenation schedule that maximizes the steady-state service availability in the framework of a semi-Markov decision process, and analytically study the structure of the optimal policy. We give a simple numerical example to determine the condition-based optimal rejuvenation schedule via the decision table.

- Availability Modeling, Estimation and Analysis | Pp. 17-29

Estimating SLAs Availability/Reliability in Multi-services IP Networks

Saida Benlarbi

Multi-services IP networks are being required to deliver unprecedentedly high volumes of diverse traffic that span the two ends of the reliability requirements spectrum. At one end of the spectrum, real-time services such as voice and real-time TV are highly sensitive to delay and jitter. At the other end, best-effort services such as data content delivery require zero traffic loss and traffic integrity. This paper investigates the challenges of measuring end-to-end service availability given the different layers of resilience in the hierarchical network architecture. It proposes a layered approach to the modeling and measurement of multi-services IP networks, from which composite availability/reliability estimation models combining convoluted levels of fault resilience can be derived. The resulting models can be readily used to show that a Service Level Agreement reliability measure is met from both the service provider and the end user standpoints.

- Availability Modeling, Estimation and Analysis | Pp. 30-42

Making Services Fault Tolerant

Pat Pik Wah Chan; Michael R. Lyu; Miroslaw Malek

With the ever-growing use of the Internet, Web services are becoming increasingly popular, and their growth rate surpasses even the most optimistic predictions. Services are self-descriptive, self-contained, platform-independent and openly available components that interact over the network. They are written strictly according to open specifications and/or standards and provide important and often critical functions for many business-to-business systems. Failures that cause service downtime or produce invalid results in such systems may have consequences ranging from a mere inconvenience to significant monetary penalties or even loss of human lives. In applications where sensing and control of machines and other devices take place via services, making the services highly dependable is one of the main critical goals. Currently, there is no experimental investigation to evaluate the reliability and availability of Web services systems. In this paper, we identify parameters impacting Web services dependability, describe methods of dependability enhancement by redundancy in space and redundancy in time, and perform a series of experiments to evaluate the availability of Web services. To increase the availability of the Web service, we use several replication schemes and compare them with a single service. The Web services are coordinated by a replication manager. The replication algorithm and the detailed system configuration are described in this paper.

- Availability Modeling, Estimation and Analysis | Pp. 43-61

Performability Analysis of Storage Systems in Practice: Methodology and Tools

Hairong Sun; Tina Tyan; Steven Johnson; Richard Elling; Nisha Talagala; Robert B. Wood

This paper presents a methodology and tools used for performability analysis of storage systems at Sun Microsystems. A Markov modeling tool is used to evaluate the probabilities of normal and fault states in the storage system, based on field reliability data collected from customer sites. Fault injection tests are conducted to measure the performance of the storage system in various degraded states, using a performance benchmark developed within Sun Microsystems. A graphic metric is introduced for performability assessment and comparison. An example is used throughout the paper to illustrate the methodology and process.

- Availability Modeling, Estimation and Analysis | Pp. 62-75
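
One common way to combine the two ingredients named in the abstract above (state probabilities from a Markov model and performance measured in each state) is an expected-performance figure, the probability-weighted sum of per-state performance. The sketch below illustrates that general idea only; it is not the paper's graphic metric, and the state names, probabilities and throughput values are invented for the example.

```python
def expected_performance(states: dict[str, tuple[float, float]]) -> float:
    """Probability-weighted performance over all system states.

    `states` maps a state name to (steady-state probability, performance
    measured in that state). Probabilities must sum to 1.
    """
    total_probability = sum(p for p, _ in states.values())
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(p * perf for p, perf in states.values())


# Hypothetical storage array: probabilities would come from the Markov model,
# performance (e.g. MB/s) from benchmark runs under injected faults.
example_states = {
    "normal": (0.90, 100.0),
    "one disk failed, rebuilding": (0.09, 50.0),
    "offline": (0.01, 0.0),
}
result = expected_performance(example_states)
```

With these assumed inputs the result is 94.5 (up to floating-point rounding), that is, the array delivers on average 94.5% of its nominal throughput.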

Using Web Service Transformations to Implement Cooperative Fault Tolerance

Toshiyuki Moritsu; Matti A. Hiltunen; Richard D. Schlichting; Junichi Toyouchi; Yasuharu Namba

Developing techniques to increase the availability of web services in the event of failure has become increasingly important given their key role in providing access to online information, financial, and retail resources. This paper describes an approach to improving availability by using failover between similar but not identical services, and the use of cooperative fault tolerance between the providers of these services. With this approach, a similar service can be used as a backup, with the protocol and service differences between the two services masked by the use of transformation web services that are generated semi-automatically. The basic idea of cooperative fault-tolerance using similar services is presented based on an example involving two stock broker services. The software architecture and the process for generating the transformation web services using a code generation tool are also described, along with experimental results from the stock broker example. These results suggest that the transformation overhead is modest compared with the typical cost of communication.

- Dependability Techniques and Their Applications | Pp. 76-91

Reducing the Recovery Time of IP-Phones in an H.323 Based VoIP System

Sachin Garg; Chandra Kintala; David Stott

In large deployments of H.323-based Voice-over-IP (VoIP) systems, achieving the desired availability is a major challenge. A major factor in determining the availability is the resilience of the recovery mechanisms in the H.323 protocol suite against server and network failures. In this paper, we focus on the “registration” aspect of the H.225 protocol in the H.323 suite. Specifically, we tackle the registration-flood problem, which occurs after a server or network failure when more IP phones attempt to register with the VoIP server than the server can handle. The most significant ramification of overload is longer registration times, resulting in lower overall availability of the VoIP system. Existing solutions to mitigate registration floods are either server-centric or network-centric. In this paper, we propose a complementary end-point-based technique using random back-off. Evaluation by discrete-event simulation shows that the proposed technique can yield a significant reduction in recovery time, thereby increasing service availability. We also compare the performance of existing solutions with the proposed technique, particularly the relative effect of network delay and loss on the performance of the techniques.

- Dependability Techniques and Their Applications | Pp. 92-105
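
The end-point random back-off mentioned in the abstract above can be pictured with a small sketch. The abstract does not specify the back-off distribution or window, so the uniform window, the function names and the numbers below are assumptions for illustration, not the authors' scheme. Each phone waits a random delay before re-registering, which spreads the registration attempts over the window instead of concentrating them at the instant the server comes back.

```python
import random
from collections import Counter


def backoff_delays(n_phones: int, window_s: float, seed: int = 0) -> list[float]:
    """Draw one re-registration delay per phone, uniform over [0, window_s]."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [rng.uniform(0.0, window_s) for _ in range(n_phones)]


def peak_load(delays: list[float]) -> int:
    """Largest number of registration attempts falling in any one-second slot."""
    per_second = Counter(int(d) for d in delays)
    return max(per_second.values())


n = 1000
flood = [0.0] * n                       # no back-off: every phone retries at once
spread = backoff_delays(n, window_s=60.0)

peak_without_backoff = peak_load(flood)
peak_with_backoff = peak_load(spread)
```

Comparing `peak_without_backoff` with `peak_with_backoff` shows how the window size trades a lower peak load on the server against a longer worst-case wait for an individual phone.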

Hardware Instruction Counting for Log-Based Rollback Recovery on x86-Family Processors

Daniel Stodden; Hubert Eichner; Max Walter; Carsten Trinitis

Log-based recovery protocols enable process replicas in distributed systems to replay a computation up to the point where a previous computation failed. One fundamental assumption underlying these protocols is the piecewise deterministic (PWD) execution model, which states that recovery must not execute nondeterministic events but simulate their execution in order to maintain consistency.

One such source of nondeterminism is asynchronous events triggering software signal handlers, an issue known to be solved by instruction counters. Efficient implementations in software have been shown to be practical, but require significant changes to applications and system software. Hardware counters, in contrast, allow software to run unmodified. A number of processors implementing the Intel x86 instruction set architecture provide monitoring registers with properties similar to a true instruction counter.

Designed for application profiling, these facilities reveal a number of issues to be resolved when they are used for applications like the PWD model, which demands maximum precision during replay. We discuss some of the most prominent problems faced when using performance counters for protocols satisfying the PWD model. We present additional hardware mechanisms that eliminate inconsistencies in counter interrupt delivery, based on standard processor debugging facilities, at the expense of a small number of additionally generated exceptions.

- Dependability Techniques and Their Applications | Pp. 106-119

Improving Robustness Testing of COTS OS Extensions

Constantin Sârbu; Andréas Johansson; Falk Fraikin; Neeraj Suri

Operating systems (OS) are increasingly geared towards support of diverse peripheral components, both hardware (HW) and software (SW), rather than explicitly focused on increased reliability of delivered OS services. The interface between the OS and the HW devices is provided by device drivers. Furthermore, drivers have become add-on COTS components that support the OS’s capability for widespread device support. Unfortunately, drivers constitute a major cause of system outages, impacting overall service reliability. Consequently, the testing of drivers becomes important. However, despite efforts to develop appropriate testing methods, the multitude of possible system configurations and the lack of detailed OS specifications make the task difficult. Without requiring access to OS source code, this paper develops novel, non-intrusive support for test methods, based on ascertaining test progress from a driver’s operational state model. This approach complements existing schemes, enhancing the accuracy of the test process by providing test location guidance.

- Dependability Techniques and Their Applications | Pp. 120-139

Transparent Checkpointing for Applications with Graphical User Interfaces

Jan-Thomas Czornack; Carsten Trinitis; Max Walter

Transparent checkpointing is a well-known method to increase the dependability of long-running applications. However, most known implementations concentrate on applications that do not use graphical user interfaces.

In this paper we describe common problems arising with transparent checkpointing of applications including their graphical user interfaces. We present a proxy that is able to store the window session of an application and compare our approach with an existing X-Server extension that serves the same purpose.

We also discuss the performance impact of both solutions and present performance and latency measurements that demonstrate the usability of the proxy.

Service infrastructures.

- Dependability Techniques and Their Applications | Pp. 140-148