Publications catalog - books


Computational Science-ICCS 2005: 5th International Conference, Atlanta, GA, USA, May 22-25, 2005, Proceedings, Part II

Vaidy S. Sunderam ; Geert Dick van Albada ; Peter M. A. Sloot ; Jack J. Dongarra (eds.)

Conference: 5th International Conference on Computational Science (ICCS). Atlanta, GA, USA. May 22, 2005 - May 25, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution Publication year Browse Download Request
Not detected 2005 SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-26043-1

Electronic ISBN

978-3-540-32114-9

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Profiling and Searching for RNA Pseudoknot Structures in Genomes

Chunmei Liu; Yinglei Song; Russell L. Malmberg; Liming Cai

A new method is developed that can profile and efficiently search for pseudoknot structures in noncoding RNA genes. It profiles interleaving stems in pseudoknot structures with independent Covariance Model (CM) components. The statistical alignment score for searching is obtained by combining the alignment scores from all CM components. Our experiments show that the model can achieve excellent accuracy on both random and biological data. The efficiency achieved by the method makes it possible to search for the pseudoknot structures in genomes of a variety of organisms.

2005 - International Workshop on Bioinformatics Research and Applications | Pp. 968-975

Integrating Text Chunking with Mixture Hidden Markov Models for Effective Biomedical Information Extraction

Min Song; Il-Yeol Song; Xiaohua Hu; Robert B. Allen

This paper presents a new information extraction (IE) technique, KXtractor, which integrates a text chunking technique with Mixture Hidden Markov Models (MiHMM). KXtractor differs from other approaches in two ways. (a) It overcomes the problem of single Part-Of-Speech (POS) HMMs by modeling the rich representation of text where features overlap among state units such as word, line, sentence, and paragraph. By incorporating sentence structures into the learned models, KXtractor provides better extraction accuracy than single POS HMMs do. (b) It resolves the issues with traditional HMMs for IE that operate only on semi-structured data such as HTML documents and other text sources in which language grammar does not play a pivotal role. We compared KXtractor with three IE techniques: 1) RAPIER, an inductive learning-based machine learning system, 2) a dictionary-based extraction system, and 3) a single POS HMM. Our experiments showed that KXtractor outperforms these three IE systems in extracting protein-protein interactions. In our experiments, the F-measure for KXtractor was higher than those for RAPIER, the dictionary-based system, and the single POS HMM by 16.89%, 16.28%, and 8.58%, respectively. In addition, both the precision and recall of KXtractor are higher than those of the other systems.

2005 - International Workshop on Bioinformatics Research and Applications | Pp. 976-984
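
The KXtractor abstract above compares systems by precision, recall, and F-measure. As a minimal sketch of how these three metrics relate, assuming the balanced F1 variant (the abstract does not say which weighting was used) and a hypothetical helper name:

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from raw extraction counts.

    Assumption: balanced F1 (harmonic mean of precision and recall).
    """
    predicted = true_positives + false_positives   # items the system extracted
    actual = true_positives + false_negatives      # items that should be extracted
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

The counts passed in would come from comparing extracted protein-protein interactions against a gold-standard annotation; no such counts are given in the catalog entry.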

k-Recombination Haplotype Inference in Pedigrees

Francis Y. L. Chin; Qiangfeng Zhang; Hong Shen

Haplotyping under the Mendelian law of inheritance on pedigree genotype data is studied. Because genetic recombinations are rare, research has focused on Minimum Recombination Haplotype Inference (MRHI), i.e. finding the haplotype configuration consistent with the genotype data having the minimum number of recombinations. We focus here on the more realistic k-MRHI, which has the additional constraint that the number of recombinations on each parent-offspring pair is at most k.

Although k-MRHI is NP-hard even for k = 1, we give an algorithm to solve k-MRHI efficiently by dynamic programming in $O(n\, m_0^{3k+1}\, 2^{m_0})$ time on pedigrees with n nodes and at most $m_0$ heterozygous loci in each node. Experiments on real and simulated data show that, in most cases, our algorithm gives the same haplotyping results but runs much faster than other popular algorithms.

2005 - International Workshop on Bioinformatics Research and Applications | Pp. 985-993
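
To make the per-pair recombination constraint in the k-MRHI abstract above concrete, the following sketch counts the minimum number of recombinations needed for one parent to transmit a given haplotype. It is not the paper's dynamic program over the whole pedigree; the function name and the encoding of haplotypes as strings over '0'/'1' are assumptions made for illustration.

```python
def min_recombinations(parent_hap_a, parent_hap_b, child_hap):
    """Minimum number of switches between the parent's two haplotypes
    needed to copy out child_hap, or None if it cannot be produced.

    Assumption: all three strings have equal length, one character per locus.
    """
    INF = float("inf")
    cost = [0, 0]  # cheapest cost so far when currently copying from a / from b
    for a, b, c in zip(parent_hap_a, parent_hap_b, child_hap):
        from_a = min(cost[0], cost[1] + 1)  # stay on a, or switch b -> a
        from_b = min(cost[1], cost[0] + 1)  # stay on b, or switch a -> b
        cost = [from_a if a == c else INF, from_b if b == c else INF]
    best = min(cost)
    return None if best == INF else best
```

A parent-offspring pair would satisfy the k-recombination constraint when the returned count is not None and is at most k.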

Improved Tag Set Design and Multiplexing Algorithms for Universal Arrays

Ion I. Măndoiu; Claudia Prăjescu; Dragoş Trincă

In this paper we address two optimization problems arising in the design of genomic assays based on universal tag arrays. First, we address the universal array tag set design problem. For this problem, we extend previous formulations to incorporate antitag-to-antitag hybridization constraints in addition to constraints on antitag-to-tag hybridization specificity, establish a constructive upper bound on the maximum number of tags satisfying the extended constraints, and propose a simple greedy tag selection algorithm. Second, we give methods for improving the multiplexing rate in large-scale genomic assays by combining primer selection with tag assignment. Experimental results on simulated data show that this integrated optimization leads to reductions of up to 50% in the number of required arrays.

2005 - International Workshop on Bioinformatics Research and Applications | Pp. 994-1002
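
The greedy tag selection mentioned in the abstract above can be sketched as follows. This is only an illustration under simplifying assumptions: Hamming distance stands in for the hybridization model (the catalog entry does not describe the paper's actual model), all tags have equal length, and the function names are hypothetical.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def greedy_tag_set(candidates, min_distance):
    """Scan candidates in order and keep each tag compatible with all kept tags.

    Assumed constraints (a proxy, not the paper's model):
    - tag vs. tag distance >= min_distance (antitag-to-tag specificity);
    - tag vs. reverse complement of any kept tag, including itself,
      distance >= min_distance (no antitag-to-antitag hybridization).
    """
    selected = []
    for tag in candidates:
        if hamming(tag, reverse_complement(tag)) < min_distance:
            continue
        compatible = all(
            hamming(tag, other) >= min_distance
            and hamming(tag, reverse_complement(other)) >= min_distance
            for other in selected
        )
        if compatible:
            selected.append(tag)
    return selected

# Hypothetical usage: all 4-mers as candidates.
candidates = ["".join(p) for p in product("ACGT", repeat=4)]
tags = greedy_tag_set(candidates, min_distance=3)
```

The result depends on the candidate order, which is typical of greedy selection; the upper bound discussed in the abstract is not computed here.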

A Parallel Implementation for Determining Genomic Distances Under Deletion and Insertion

Vijaya Smitha Kolli; Hui Liu; Michelle Hong Pan; Yi Pan

As the need for comparing genomes of different species has grown dramatically with the fast progress of the Human Genome Project, evolution at the level of whole genomes has attracted more and more attention from both biologists and computer scientists. They are especially interested in scenarios in which the genome evolves through insertions, deletions, and movements of genes along its chromosomes. Marron et al. proposed a polynomial-time approximation algorithm to compute (near) minimum edit distances under inversions, deletions, and unrestricted insertions. Our work is based on their algorithm, which carries out many comparisons and sorting operations to calculate the edit distance. These comparisons and sorting operations are extremely time-consuming and reduce computational efficiency. We believe the running time of the algorithm can be improved through parallelization. We parallelize their algorithm via OpenMP using the Intel C++ compiler for Linux 7.1, and compare three levels of parallelism: coarse grain, fine grain, and a combination of both. The experiments are conducted for a varying number of threads and different lengths of the gene sequences. The experimental results show that neither coarse grain parallelism nor fine grain parallelism alone improves the performance of the algorithm very much. However, combining fine grain and coarse grain parallelism improves the performance of the algorithm drastically.

2005 - International Workshop on Bioinformatics Research and Applications | Pp. 1003-1010

Phasing and Missing Data Recovery in Family Trios

Dumitru Brinza; Jingwu He; Weidong Mao; Alexander Zelikovsky

Although there exist many phasing methods for unrelated adults or pedigrees, phasing and missing data recovery for data representing family trios is lagging behind. This paper is an attempt to fill this gap by considering the following problem. Given a set of genotypes partitioned into family trios, find for each trio a quartet of parent haplotypes which agree with all three genotypes and recover the SNP values missing from the given genotype data. Our contributions include (i) formulating the pure-parsimony trio phasing and the trio missing data recovery problems, (ii) proposing two new greedy and integer linear programming based solution methods, and (iii) extensive experimental validation of the proposed methods showing an advantage over previously known methods.

2005 - International Workshop on Bioinformatics Research and Applications | Pp. 1011-1019
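
The trio problem stated in the abstract above asks for a quartet of parent haplotypes that agrees with all three genotypes. The sketch below checks a candidate quartet and fills in missing SNP values; it does not search for a quartet. The encoding ('0'/'1' homozygous, '2' heterozygous, '?' missing), the no-recombination assumption, and the function names are illustrative assumptions.

```python
def genotype_of(hap_a, hap_b):
    """Genotype implied by two haplotypes over '0'/'1':
    the shared allele where they agree, '2' where they differ."""
    return "".join(a if a == b else "2" for a, b in zip(hap_a, hap_b))

def matches(genotype, hap_a, hap_b):
    """True if the haplotype pair explains every non-missing SNP of genotype."""
    implied = genotype_of(hap_a, hap_b)
    return all(g == "?" or g == i for g, i in zip(genotype, implied))

def check_trio(father, mother, child, quartet):
    """quartet = (father transmitted, father untransmitted,
                  mother transmitted, mother untransmitted).

    Returns the three genotypes with missing values recovered,
    or None if the quartet disagrees with any observed genotype.
    Assumption: equal-length strings and no recombination within the region.
    """
    f_t, f_u, m_t, m_u = quartet
    if not (matches(father, f_t, f_u)
            and matches(mother, m_t, m_u)
            and matches(child, f_t, m_t)):
        return None
    return genotype_of(f_t, f_u), genotype_of(m_t, m_u), genotype_of(f_t, m_t)
```

For example, `check_trio("0?2", "220", "2?0", ("010", "011", "110", "000"))` recovers the missing second SNP in the father and the child.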

Highly Scalable Algorithms for Robust String Barcoding

B. DasGupta; K. M. Konwar; I. I. Măndoiu; A. A. Shvartsman

String barcoding is a recently introduced technique for genomic-based identification of microorganisms. In this paper we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to further extend the applicability range to thousands of bacterial size genomes. Experimental results on both randomly generated and NCBI genomic data show that whole-genome based selection results in a number of distinguishers nearly matching the information theoretic lower bounds for the problem.

2005 - International Workshop on Bioinformatics Research and Applications | Pp. 1020-1028
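
Distinguisher selection, as described in the string barcoding abstract above, can be viewed as a covering problem: every pair of genomes must be separated by some substring present in one and absent in the other. The following is a small greedy sketch, not the paper's scalable implementation. It assumes fixed-length k-mers as candidate distinguishers and models robustness with a hypothetical `redundancy` parameter (each pair must be separated by that many distinct distinguishers).

```python
from itertools import combinations

def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def greedy_barcode(genomes, k, redundancy=1):
    presence = [kmers(g, k) for g in genomes]
    candidates = sorted(set().union(*presence))
    pairs = list(combinations(range(len(genomes)), 2))
    # A candidate separates a pair if it occurs in exactly one of the two genomes.
    separates = {
        c: {(i, j) for i, j in pairs if (c in presence[i]) != (c in presence[j])}
        for c in candidates
    }
    need = {pair: redundancy for pair in pairs}  # remaining coverage per pair
    chosen = []
    while any(need.values()):
        # Pick the unused candidate that separates the most still-needy pairs.
        best = max(
            (c for c in candidates if c not in chosen),
            key=lambda c: sum(1 for p in separates[c] if need[p] > 0),
            default=None,
        )
        if best is None or not any(need[p] > 0 for p in separates[best]):
            raise ValueError("genomes cannot be distinguished at this k and redundancy")
        chosen.append(best)
        for p in separates[best]:
            if need[p] > 0:
                need[p] -= 1
    return chosen
```

Each genome's barcode is then the presence/absence vector of the chosen distinguishers. Enumerating all k-mers this way would not scale to bacterial-size genomes, which is the engineering problem the paper addresses.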

Virtual Gene: A Gene Selection Algorithm for Sample Classification on Microarray Datasets

Xian Xu; Aidong Zhang

Gene selection is one of the most widely used classes of data analysis algorithms for microarray datasets. The goal of gene selection algorithms is to filter out a small set of informative genes that best explain experimental variations. Traditional gene selection algorithms are mostly single-gene based. A discriminative score is calculated for each gene, and the genes are sorted by score. Top-ranked genes are then selected as informative genes for further study. Such algorithms completely ignore correlations between genes, although such correlations are widely known. Genes interact with each other through various pathways and regulatory networks. In this paper, we propose to use, instead of ignoring, such correlations for gene selection. Experiments performed on three publicly available datasets show promising results.

2005 - International Workshop on Bioinformatics Research and Applications | Pp. 1038-1045
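
The traditional single-gene approach that the abstract above contrasts with Virtual Gene (score each gene, sort, keep the top-ranked ones) can be sketched as below. This illustrates only that baseline, not the Virtual Gene algorithm. The signal-to-noise score is one assumed choice of discriminative score, and the function names and data layout are hypothetical.

```python
from statistics import mean, pstdev

def signal_to_noise(values_a, values_b):
    """|mean difference| divided by the sum of the two class standard deviations."""
    diff = abs(mean(values_a) - mean(values_b))
    spread = pstdev(values_a) + pstdev(values_b)
    if spread == 0:
        return float("inf") if diff else 0.0
    return diff / spread

def rank_genes(expression, labels, top_k):
    """expression: {gene: [value per sample]}; labels: [0 or 1 per sample].

    Each gene is scored independently, so correlations between genes
    are ignored, which is the limitation the abstract points out.
    """
    scores = {}
    for gene, values in expression.items():
        class_a = [v for v, label in zip(values, labels) if label == 0]
        class_b = [v for v, label in zip(values, labels) if label == 1]
        scores[gene] = signal_to_noise(class_a, class_b)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Both classes are assumed to contain at least one sample; otherwise `mean` raises an error.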

Fast Expression Templates

Jochen Härdtlein; Alexander Linke; Christoph Pflaum

Expression templates (ET) can significantly reduce the implementation effort of mathematical software. For some compilers, however, especially those of supercomputers, classical ET implementations do not deliver the expected performance. This is because resolving pointer aliasing becomes much more difficult in combination with the complicated ET constructs. Therefore, we introduced the concept of enumerated variables, which are provided with an additional integer template parameter. Based on this new implementation of ET we obtain C++ code whose performance is very close to that of handcrafted C code. The performance results of these so-called fast expression templates are presented for the Hitachi SR8000 supercomputer and the NEC SX6, both with automatic vectorization and parallelization. Additionally, we studied the combination of fast expression templates and OpenMP on a high performance Opteron cluster.

2005 - Workshop on “Programming Grids and Metacomputing Systems – PGaMS2005” | Pp. 1055-1063

H2O Metacomputing – Jini Lookup and Discovery

Dirk Gorissen; Gunther Stuer; Kurt Vanmechelen; Jan Broeckhove

Because of its inter-organisational, collaborative use of computational resources, grid computing presents a severe interoperability challenge to grid application developers. Different middleware technologies need to be bridged in order to fully utilise the power the grid provides. This paper describes a bridge between two such middleware technologies: the H2O Metacomputing Framework and Jini technology. The paper details how H2O resources may be registered, discovered and used as Jini services. Both technologies are introduced, design decisions are discussed, and a fully functional implementation is presented.

2005 - Workshop on “Programming Grids and Metacomputing Systems – PGaMS2005” | Pp. 1072-1079