Catálogo de publicaciones - libros
Combinatorial Pattern Matching: 16th Annual Symposium, CPM 2005, Jeju Island, Korea, June 19-22, 2005, Proceedings
Alberto Apostolico ; Maxime Crochemore ; Kunsoo Park (eds.)
En conferencia: 16º Annual Symposium on Combinatorial Pattern Matching (CPM) . Jeju Island, South Korea . June 19, 2005 - June 22, 2005
Resumen/Descripción – provisto por la editorial
No disponible.
Palabras clave – provistas por la editorial
Algorithm Analysis and Problem Complexity; Data Structures; Information Storage and Retrieval; Document Preparation and Text Processing; Pattern Recognition; Bioinformatics
Disponibilidad
Institución detectada | Año de publicación | Navegá | Descargá | Solicitá |
---|---|---|---|---|
No detectada | 2005 | SpringerLink |
Información
Tipo de recurso:
libros
ISBN impreso
978-3-540-26201-5
ISBN electrónico
978-3-540-31562-9
Editor responsable
Springer Nature
País de edición
Reino Unido
Fecha de publicación
2005
Información sobre derechos de publicación
© Springer-Verlag Berlin Heidelberg 2005
Cobertura temática
Tabla de contenidos
doi: 10.1007/11496656_31
Assessing the Significance of Sets of Words
Valentina Boeva; Julien Clément; Mireille Régnier; Mathias Vandenbogaert
Various criteria have been defined to evaluate the significance of sets of words, the computation of them often being difficult. We provide explicit expressions for the waiting time in such a context. In order to assess the significance of a cluster of potential binding sites, we extend them to the co-occurrence problem. We point out that these criteria values depend on a few fundamental parameters. We provide efficient algorithms to compute them, that rely on a combinatorial interpretation of the formulae. We show that our results are very tight in the so-called twilight zone and improve on previous rough approximations. One assumes that the text is generated according to a Markov stationary process. These results are developed for an extended model of consensus.
Pp. 358-370
doi: 10.1007/11496656_32
Inferring a Graph from Path Frequency
Tatsuya Akutsu; Daiji Fukagawa
We consider the problem of inferring a graph (and a sequence) from the numbers of occurrences of vertex-labeled paths, which is closely related to the pre-image problem for graphs in machine learning: to reconstruct a graph from its feature space representation. We show that this problem can be solved in polynomial time in the size of an output graph if graphs are trees of bounded degree and the lengths of given paths are bounded by a constant. On the other hand, we show that this problem is strongly NP-hard even for planar graphs of bounded degree.
Pp. 371-382
doi: 10.1007/11496656_33
Exact and Approximation Algorithms for DNA Tag Set Design
Ion I. Măndoiu; Dragoş Trincă
In this paper we propose new solution methods for designing tag sets for use in universal DNA arrays. First, we give integer linear programming formulations for two previous formalizations of the tag set design problem, and show that these formulations can be solved to optimality for instance sizes of practical interest by using general purpose optimization packages. Second, we note the benefits of periodic tags, and establish an interesting connection between the tag design problem and the problem of packing the maximum number of vertex-disjoint directed cycles in a given graph. We show that combining a simple greedy cycle packing algorithm with a previously proposed alphabetic tree search strategy yields an increase of over 40% in the number of tags compared to previous methods.
Pp. 383-393
doi: 10.1007/11496656_34
Parametric Analysis for Ungapped Markov Models of Evolution
David Fernández-Baca; Balaji Venkatachalam
We present efficient sensitivity-analysis algorithms for two problems involving Markov models of sequence evolution: ancestral reconstruction in evolutionary trees and local ungapped alignment under log-odds scoring. Our algorithms generate complete descriptions of the optimum solutions for all possible values of the evolutionary distance. The running time for the parametric ancestral reconstruction problem under the Kimura 2-parameter model is ( + log ), where is the number of sequences and is their length, assuming all edges have the same length. For the parametric gapless alignment problem under the Jukes-Cantor model, the running time is ( + log ), where and are the sequence lengths and ≤ .
Pp. 394-405
doi: 10.1007/11496656_35
Linear Programming for Phylogenetic Reconstruction Based on Gene Rearrangements
Jijun Tang; Bernard M. E. Moret
Phylogenetic reconstruction from gene rearrangements has attracted increasing attention from biologists and computer scientists over the last few years. Methods used in reconstruction include distance-based methods, parsimony methods using sequence-based encodings, and direct optimization. The latter, pioneered by Sankoff and extended by us with the software suite , is the most accurate approach, but has been limited to small genomes because the running time of its scoring algorithm grows exponentially with the number of genes in the genome. We report here on a new method to compute a tight lower bound on the score of a given tree, using a set of linear constraints generated through selective applications of the triangle inequality (in the spirit of GESTALT). Our method generates an integer linear program with a carefully limited number of constraints, rapidly solves its relaxed version, and uses the result to provide a tight lower bound. Since this bound is very close to the optimal tree score, it can be used directly as a selection criterion, thereby enabling us to bypass entirely the expensive scoring procedure. We have implemented this method within our software and run several series of experiments on both biological and simulated datasets to assess its accuracy. Our results show that using the bound as a selection criterion yields excellent trees, with error rates below 5% up to very large evolutionary distances, consistently beating the baseline Neighbor-Joining. Our new method enables us to extend the range of applicability of the direct optimization method to chromosomes of size comparable to those of bacteria, as well as to datasets with complex combinations of evolutionary events.
Pp. 406-416
doi: 10.1007/11496656_36
Identifying Similar Surface Patches on Proteins Using a Spin-Image Surface Representation
Mary Ellen Bock; Guido M. Cortelazzo; Carlo Ferrari; Concettina Guerra
We apply a spin image representation for 3D objects used in computer vision to the problem of comparing protein surfaces. Due to the irregularities of the protein surfaces, this is a much more complex problem than comparing regular and smooth surfaces. The spin images capture local features in a way that is useful for finding related active sites on the surface of two proteins. They reduce the three-dimensional local information to two dimensions which is a significant computational advantage.
We try to find a collection of pairs of points on the two proteins such that the corresponding members of the pairs for one of the proteins form a surface patch for which the corresponding spin images are a “match”. Preliminary results are presented which demonstrate the feasibility of the method.
Pp. 417-428
doi: 10.1007/11496656_37
Mass Spectra Alignments and Their Significance
Sebastian Böcker; Hans-Michael Kaltenbach
Mass Spectrometry has become one of the most popular analysis techniques in Genomics and Systems Biology. We investigate a general framework that allows the alignment (or matching) of any two mass spectra. In particular, we examine the alignment of a reference mass spectrum generated from a database, with a measured sample mass spectrum. In this context, we assess the significance of alignment scores for character-specific cleavage experiments, such as tryptic digestion of amino acids. We present an efficient approach to estimate this significance, with runtime linear in the number of detected peaks. In this context, we investigate the probability that a random string over a weighted alphabet contains a substring of some given weight.
Pp. 429-441