Publications catalog - books

Dependable Computing: Second Latin-American Symposium, LADC 2005, Salvador, Brazil, October 25-28, 2005, Proceedings

Carlos Alberto Maziero ; João Gabriel Silva ; Aline Maria Santos Andrade ; Flávio Morais de Assis Silva (eds.)

Conference: 2nd Latin-American Symposium on Dependable Computing (LADC), Salvador de Bahia, Brazil, October 25-28, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Theory of Computation; Special Purpose and Application-Based Systems; System Performance and Evaluation; Software Engineering; Logic Design; Coding and Information Theory

Availability

Detected institution: not detected
Year of publication: 2005
Available at: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-29572-3

Electronic ISBN

978-3-540-32092-0

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2005

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Soft Error Mitigation in Cache Memories of Embedded Systems by Means of a Protected Scheme

Hamid R. Zarandi; Seyed Ghassem Miremadi

The size and speed of SRAM caches in embedded systems are increasing in response to demands for higher performance. However, SRAM caches are vulnerable to soft errors originating from energetic nuclear particles or electrical sources. This paper proposes a new protected cache scheme that provides high performance as well as high fault detection coverage. In this scheme, the cache space is divided into sets of different sizes, and the length of the tag field associated with each set is unique and differs from that of the other sets. The remaining tag bits are used to protect the tag with a fault detection scheme, e.g., generalized parity. This protects the cache without compromising performance or area relative to a comparable fully associative cache. Results obtained from simulating standard trace files reveal that the proposed scheme exhibits performance close to that of a fully associative cache while achieving considerable fault detection coverage, making it suitable for dependable computing.

- Embedded Systems | Pp. 121-130
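
As a rough illustration of the tag-protection idea above (a toy sketch, not the authors' design), the Python snippet below stores an even-parity bit computed over the tag in the otherwise unused bits of the tag field; the field width and all names are invented for the example.

    # Toy model of tag protection via spare tag-field bits (illustrative
    # only): a parity bit computed over the tag is kept in the unused bits,
    # so a soft error that corrupts the stored tag is detected on lookup.

    TAG_FIELD_BITS = 20  # assumed physical width of the tag field

    def parity(value: int) -> int:
        """Even parity over the set bits of `value`."""
        return bin(value).count("1") & 1

    def encode_tag(tag: int, tag_bits: int) -> int:
        """Pack `tag` (tag_bits wide) plus its parity into the tag field."""
        assert tag < (1 << tag_bits) and tag_bits < TAG_FIELD_BITS
        return (parity(tag) << tag_bits) | tag

    def check_tag(stored: int, tag_bits: int) -> tuple[int, bool]:
        """Return (tag, ok); ok is False when the parity check fails."""
        tag = stored & ((1 << tag_bits) - 1)
        return tag, ((stored >> tag_bits) & 1) == parity(tag)

    enc = encode_tag(0b1011001101, tag_bits=10)
    print(check_tag(enc, 10))             # (tag, True)
    print(check_tag(enc ^ (1 << 3), 10))  # simulated soft error -> (..., False)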

On the Effects of Errors During Boot

Mário Zenha-Rela; João Carlos Cunha; Carlos Bruno Silva; Luís Ferreira da Silva

We present the results of injecting errors during the boot phase of an embedded real-time system based on the ERC32 space processor. In this phase the hardware is initialized, and the processor executes the boot loader followed by kernel initialization. Because most system support is not yet available, traditional fault-injection techniques cannot be used. Our study was therefore based on the processor’s IEEE 1149.1 (boundary-scan) infrastructure, through which we injected about 5000 double bit-flip errors. The observations show that such a system will either crash (25%) or execute correctly (75%), since only 2 errors eventually led to the output of wrong results. However, about 10% of the faults produced latent errors that remained dormant in memory. We also provide some suggestions on what can be done to increase robustness during this system state, in which most fault-tolerance techniques are not yet set up.

- Embedded Systems | Pp. 131-142
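
The experiments drive double bit-flips through the processor's IEEE 1149.1 port on real hardware; the sketch below only mimics that fault model in software, flipping two random bits of a memory image (the image and its size are placeholders).

    import random

    # Software mock of the double bit-flip fault model; the study itself
    # injects through the boundary-scan infrastructure, not like this.

    def inject_double_bitflip(image: bytearray) -> None:
        """Flip two distinct, randomly chosen bits of `image` in place."""
        b1, b2 = random.sample(range(len(image) * 8), 2)
        for b in (b1, b2):
            image[b // 8] ^= 1 << (b % 8)

    image = bytearray(64)      # stand-in for a boot-time memory region
    golden = bytes(image)
    inject_double_bitflip(image)
    flipped = sum(bin(a ^ b).count("1") for a, b in zip(golden, image))
    print(f"bits flipped: {flipped}")  # always 2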

A Fault Tolerant Approach to Object Oriented Design and Synthesis of Embedded Systems

M. Fazeli; R. Farivar; S. Hessabi; S. G. Miremadi

The ODYSSEY design methodology has recently been introduced as a viable solution to the increasing design complexity of ASICs. It is an object-oriented design methodology that models a system in terms of its constituent objects and their corresponding method calls. Some of these methods are implemented in hardware; others are simply executed by a general-purpose processor. One fundamental element of this methodology is a network on chip that implements method invocation for hardware-based method calls. However, this network is prone to faults, and errors on it may result in system failure.

In this paper an architectural fault-tolerance enhancement to the ODYSSEY design methodology is proposed to address this problem. It detects and corrects all single event upset errors on the network, and detects all permanent ones. The proposed enhancement is modeled analytically and then simulated. The simulation results, while validating the analytical model, show very low network performance overhead.

- Embedded Systems | Pp. 143-153
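
The abstract does not say which code the enhancement uses; as a generic stand-in for single-event-upset correction on a network flit, the sketch below encodes a 4-bit payload with a Hamming(7,4) code, which corrects any single flipped bit.

    # Hamming(7,4) as a generic stand-in for correcting single upsets on an
    # on-chip network flit (not necessarily the code used by the authors).

    def hamming74_encode(nibble: int) -> int:
        d = [(nibble >> i) & 1 for i in range(4)]
        p1, p2, p3 = d[0] ^ d[1] ^ d[3], d[0] ^ d[2] ^ d[3], d[1] ^ d[2] ^ d[3]
        bits = [p1, p2, d[0], p3, d[1], d[2], d[3]]   # codeword positions 1..7
        return sum(b << i for i, b in enumerate(bits))

    def hamming74_decode(word: int) -> tuple[int, bool]:
        bits = [(word >> i) & 1 for i in range(7)]
        s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
        s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
        s3 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
        syndrome = s1 | (s2 << 1) | (s3 << 2)   # 1-based error position, 0 = clean
        if syndrome:
            bits[syndrome - 1] ^= 1             # correct the single upset
        data = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
        return data, bool(syndrome)

    flit = hamming74_encode(0b1010)
    data, corrected = hamming74_decode(flit ^ (1 << 4))  # inject one upset
    assert data == 0b1010 and corrected

A permanent link fault could then be flagged when corrections on the same link keep recurring, though the paper's actual detection mechanism is not described here.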

Scheduling Fixed-Priority Hard Real-Time Tasks in the Presence of Faults

George Lima; Alan Burns

We describe an approach to scheduling hard real-time tasks that takes fault scenarios into account. All tasks are scheduled at run-time according to their fixed priorities, which are determined off-line. Upon error detection, special tasks are released to perform error-recovery actions. We allow error-recovery actions to execute at higher priority levels so that the fault resilience of the task set can be increased. To do so, we extend the well-known response time analysis technique and describe a non-standard priority assignment policy. Results from simulation indicate that the fault resilience of task sets can be significantly increased by the proposed approach.

- Time | Pp. 154-173
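
The response time analysis being extended is the classic fixed-point recurrence R_i = C_i + sum over higher-priority tasks j of ceil(R_i/T_j)*C_j. A common way to account for faults, sketched below with invented parameters, adds a recovery term ceil(R_i/t_f)*c_rec, where t_f is an assumed minimum gap between faults; this is a generic illustration, not the authors' exact analysis.

    import math

    # Generic fault-aware response-time analysis: fixed-point iteration over
    # R = C + sum(ceil(R/T_j)*C_j) + ceil(R/t_f)*c_rec  (illustrative only).

    def response_time(i, tasks, t_f, c_rec):
        """tasks: list of (C, T) pairs, highest priority first; deadline = T."""
        c_i, t_i = tasks[i]
        r = c_i
        while True:
            r_next = (c_i
                      + sum(math.ceil(r / t) * c for c, t in tasks[:i])
                      + math.ceil(r / t_f) * c_rec)
            if r_next > t_i:
                return None          # unschedulable under this fault model
            if r_next == r:
                return r
            r = r_next

    tasks = [(1, 5), (2, 10), (3, 20)]   # (C, T), invented values
    for i in range(len(tasks)):
        print(f"task {i}: R = {response_time(i, tasks, t_f=50, c_rec=1)}")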

On the Monitoring Period for Fault-Tolerant Sensor Networks

Filipe Araújo; Luís Rodrigues

Connectivity of a sensor network depends critically on tolerance to node failures. Nodes may fail for several reasons, including energy exhaustion, material fatigue, environmental hazards, or deliberate attacks. Although most routing algorithms for sensor networks can circumvent zones where nodes have crashed, if too many nodes fail the network may become disconnected.

A sensible strategy for increasing the dependability of a sensor network is to deploy more nodes than strictly necessary, so that spare nodes can replace crashed ones. Spare nodes that are not essential for routing or sensing may go to sleep. To ensure proper operation of the sensor network, sleeping nodes should monitor active nodes frequently. If crashed nodes are not replaced, messages follow sub-optimal routes (which are energy inefficient) and, furthermore, the network may eventually become partitioned by the effect of accumulated crashes. On the other hand, to save energy, nodes should remain asleep as much as possible. In fact, if the energy consumed by the monitoring process is too high, spare nodes may exhaust their batteries (and the batteries of active nodes) before they are needed.

This paper studies the optimal monitoring period in fault-tolerant sensor networks that ensures both that the network remains connected (i.e., crashed nodes are detected and replaced fast enough to avoid network partition) and that the lifetime of the network is maximized (i.e., inactive nodes save as much battery as possible).

- Time | Pp. 174-190
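
A toy version of the trade-off studied: if a sleeping node probes its active neighbors every T seconds, the probing energy drain scales as 1/T, while a crash can remain undetected for up to T. Given a bound on tolerable detection delay, the largest admissible period maximizes lifetime. All quantities below are invented.

    # Toy model of the monitoring-period trade-off (all numbers invented):
    # shorter periods find crashed nodes faster but drain batteries sooner.

    E_BATTERY = 1000.0   # energy budget spent on monitoring (units)
    E_PROBE = 0.5        # energy per monitoring probe
    D_MAX = 30.0         # tolerable worst-case crash-detection delay (s)

    def monitoring_lifetime(period: float) -> float:
        """Seconds until the monitoring budget alone is exhausted."""
        return E_BATTERY * period / E_PROBE

    # Worst-case detection delay equals one period, so the largest period
    # meeting the delay bound maximizes lifetime under this toy model.
    best = D_MAX
    print(f"period = {best:.0f} s, lifetime = {monitoring_lifetime(best):.0f} s")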

Adapting Failure Detectors to Communication Network Load Fluctuations Using SNMP and Artificial Neural Nets

Fábio Lima; Raimundo Macêdo

A failure detector is an important building block for fault-tolerant distributed computing: mechanisms such as distributed consensus and group communication rely on the information provided by failure detectors in order to make progress and terminate. As such, erroneous information provided by the failure detector (or the absence of it) may delay decision-making or lead the upper-layer fault-tolerant mechanism to take incorrect decisions (e.g., the exclusion of a correct process from a group membership). On the other hand, the implementation of failure detectors that can precisely identify failures is restricted by the actual behaviour of a system, especially in settings where message transmission delays and system loads can vary over time. In this paper we explore the use of artificial neural networks to implement failure detectors that dynamically adapt to current communication load conditions. The training patterns used to feed the neural network were obtained using Simple Network Management Protocol (SNMP) agents over Management Information Base (MIB) variables. The output of the neural network is an estimate of when the failure detector will receive the next heartbeat message from a remote process. The suggested approach was fully implemented and tested over a set of GNU/Linux networked workstations. To analyze the efficiency of our approach, we ran a series of experiments in which network loads varied randomly and measured several QoS parameters, comparing our detector against known implementations. The performance data collected indicate that neural networks and MIB variables can indeed be combined to improve the QoS of failure detectors.

- Time | Pp. 191-205
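
As a stand-in for the neural-net estimator (the real system feeds SNMP MIB variables to an artificial neural network), the sketch below trains a one-layer linear model online to predict the next heartbeat inter-arrival time from the previous interval and a synthetic load reading; all data here are fabricated.

    import random

    # Linear stand-in for the paper's neural-net arrival-time estimator,
    # trained online by gradient descent on synthetic heartbeat data.

    w = [0.0, 0.0, 0.0]   # weights: bias, previous interval, network load

    def predict(prev_interval, load):
        return w[0] + w[1] * prev_interval + w[2] * load

    def train(prev_interval, load, actual, lr=0.05):
        err = predict(prev_interval, load) - actual
        for k, x in enumerate((1.0, prev_interval, load)):
            w[k] -= lr * err * x

    random.seed(1)
    prev = 1.0
    for _ in range(20000):                # synthetic heartbeat stream
        load = random.random()            # fake SNMP load reading in [0, 1)
        actual = 1.0 + 0.5 * load         # ground truth: load stretches intervals
        train(prev, load, actual)
        prev = actual
    print([round(x, 2) for x in w])       # approaches [1.0, 0.0, 0.5]

    # A process is suspected only after predict(...) plus a safety margin
    # has elapsed without a heartbeat, adapting the timeout to the load.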

Parsimony-Based Approach for Obtaining Resource-Efficient and Trustworthy Execution

HariGovind V. Ramasamy; Adnan Agbaria; William H. Sanders

We propose a resource-efficient way to execute requests in Byzantine-fault-tolerant replication that is particularly well suited for services in which request processing is resource-intensive. Previous efforts took a failure-masking approach of using all 2f + 1 execution replicas to execute all requests, where f is the maximum number of failures tolerated. We describe an asynchronous execution protocol that combines failure masking with imperfect failure detection and checkpointing. Our protocol is parsimony-based, since it uses only f + 1 execution replicas, called the primary committee (PC), to execute requests normally. Under normal conditions, characterized by a stable network and no misbehavior by replicas, our approach enables a trustworthy reply to be obtained with the same latency as in the all-active approach, but with only about half of the overall resource use of the all-active approach. However, a request that exposes faults among the replicas will incur a higher latency than in the all-active approach, mainly due to fault detection latency. Under such conditions, the protocol switches to a recovery mode, in which all 2f + 1 replicas execute the request and send their replies. Then, after a new PC is selected, the request latency returns to the same level as that of all-active execution. Practical observations point to the fact that failures and instability are the exception rather than the norm. That motivated our decision to optimize resource efficiency for the common case, even if it means paying a slightly higher performance cost during periods of instability.

- Distributed Systems Algorithms | Pp. 206-225
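
A minimal simulation of the parsimony idea (invented code, not the protocol itself): only the f + 1 committee members execute a request in the normal case, and any disagreement among their replies forces all 2f + 1 replicas to execute, after which a majority vote is safe because at most f replicas are faulty.

    from collections import Counter

    # Invented simulation of parsimonious execution: f + 1 replicas execute
    # normally; disagreement triggers recovery with all 2f + 1 replicas.

    F = 2
    REPLICAS = [f"r{i}" for i in range(2 * F + 1)]

    def execute(replica, request, faulty=frozenset()):
        return "bogus" if replica in faulty else f"ok({request})"

    def handle(request, committee, faulty=frozenset()):
        replies = Counter(execute(r, request, faulty) for r in committee)
        reply, votes = replies.most_common(1)[0]
        if votes == len(committee):      # unanimous f + 1 replies: trust them
            return reply, "normal"
        replies = Counter(execute(r, request, faulty) for r in REPLICAS)
        return replies.most_common(1)[0][0], "recovery"

    print(handle("req", REPLICAS[:F + 1]))                 # normal mode
    print(handle("req", REPLICAS[:F + 1], faulty={"r0"}))  # recovery mode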

Generating Fast Atomic Commit from Hyperfast Consensus

Fabíola Gonçalves Pereira Greve; Jean-Pierre Le Narzul

This work introduces a highly modular derivation of fast non-blocking atomic commit protocols. Modularity is achieved by using consensus protocols as completely independent services. Fast decision is obtained by using consensus protocols that decide in one communication step in good scenarios. Two original non-blocking atomic commit protocols are presented. One of them outperforms existing equivalent solutions that are based on failure detectors. In the presence of a low resiliency rate, f ≤ 1, it behaves as the classical protocols, exhibiting the same message complexities. In the general case, when the number of tolerated crashes is f < n/2, it exhibits a complexity of 2n + 3 point-to-point messages, whereas the best known algorithm exhibits a complexity of 4n + 3 point-to-point messages.

- Distributed Systems Algorithms | Pp. 226-244
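
The classical reduction underlying such derivations, shown schematically below: every participant broadcasts its vote, proposes COMMIT to a black-box consensus only if it received YES from all participants, and adopts the consensus decision as the transaction outcome. The consensus module is treated as an independent service, which is the modularity the paper exploits.

    # Schematic non-blocking atomic commit built on black-box consensus.
    # (The paper's protocols further exploit one-step consensus for speed.)

    def consensus(proposals):
        """Black-box consensus: all correct processes decide one proposal."""
        return min(proposals)            # toy deterministic decision rule

    def atomic_commit(votes):
        # Each process proposes COMMIT only if it saw YES from everyone.
        proposals = ["COMMIT" if all(v == "YES" for v in votes) else "ABORT"
                     for _ in votes]
        return consensus(proposals)      # decision = transaction outcome

    print(atomic_commit(["YES", "YES", "YES"]))  # COMMIT
    print(atomic_commit(["YES", "NO", "YES"]))   # ABORT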

Group-Based Replication of On-Line Transaction Processing Servers

A. Correia; A. Sousa; L. Soares; J. Pereira; F. Moura; R. Oliveira

Several techniques for database replication using group communication have recently been proposed, namely the Database State Machine, Postgres-R, and the NODO protocol. Although all rely on totally ordered multicast for consistency, they differ substantially in how the multicast is used. This results in different performance trade-offs that are hard to compare, as each protocol is presented under a different load scenario and evaluation method.

In this paper we evaluate the suitability of these protocols for replicating On-Line Transaction Processing (OLTP) applications in clusters of servers and over wide-area networks. This is achieved by implementing them on a common infrastructure and using a standard workload. The results allow us to select the best protocol regarding performance and scalability in a demanding but realistic usage scenario.

- Distributed Systems Algorithms | Pp. 245-260
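
The pattern these protocols share, sketched below with invented names, is certification over a total order: each replica delivers transactions in the same order and commits one only if no transaction that committed after it started wrote an item it read.

    # Schematic certification test used by total-order replication protocols
    # such as the Database State Machine (simplified; names invented).

    committed = []   # (commit_position, written_keys), in delivery order

    def certify(start_pos, read_set, write_set, position):
        """Deliver a transaction in total order; decide commit or abort."""
        for pos, keys in committed:
            if pos > start_pos and keys & read_set:
                return False             # read overwritten concurrently: abort
        committed.append((position, set(write_set)))
        return True

    print(certify(0, {"x"}, {"x"}, position=1))  # True: nothing concurrent
    print(certify(0, {"x"}, {"y"}, position=2))  # False: txn 1 wrote x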

Third Workshop on Theses and Dissertations on Dependable Computing

Avelino Zorzo; Ingrid Jansch-Pôrto; Fabíola Gonçalves Pereira Greve

The Workshop on Theses and Dissertations on Dependable Computing is a student forum that brings together graduate students researching topics related to dependable computing. The aim of the meeting is to present and discuss their proposed contributions, preliminary results, and possible directions for their research. Previous editions of this workshop were held in Florianópolis in conjunction with the Brazilian Symposium on Fault Tolerance (SCTF 2001) and in São Paulo in 2003 with the Latin-American Symposium on Dependable Computing (LADC 2003).

- Workshops | Pp. 261-261