Catálogo de publicaciones - libros

Compartir en
redes sociales


Combinatorial Pattern Matching: 17th Annual Symposium, CPM 2006, Barcelona, Spain, July 5-7, 2006, Proceedings

Moshe Lewenstein ; Gabriel Valiente (eds.)

En conferencia: 17º Annual Symposium on Combinatorial Pattern Matching (CPM) . Barcelona, Spain . July 5, 2006 - July 7, 2006

Resumen/Descripción – provisto por la editorial

No disponible.

Palabras clave – provistas por la editorial

Pattern Recognition; Algorithm Analysis and Problem Complexity; Document Preparation and Text Processing; Information Storage and Retrieval; Bioinformatics; Data Structures

Disponibilidad
Institución detectada Año de publicación Navegá Descargá Solicitá
No detectada 2006 SpringerLink

Información

Tipo de recurso:

libros

ISBN impreso

978-3-540-35455-0

ISBN electrónico

978-3-540-35461-1

Editor responsable

Springer Nature

País de edición

Reino Unido

Fecha de publicación

Información sobre derechos de publicación

© Springer-Verlag Berlin Heidelberg 2006

Tabla de contenidos

New Algorithms for Text Fingerprinting

Roman Kolpakov; Mathieu Raffinot

Let s = s _1 .. s _ n be a text (or sequence) on a finite alphabet Σ. A fingerprint in s is the set of distinct characters contained in one of its substrings. Fingerprinting a text consists of computing the set ${\mathcal{F}}$ of all fingerprints of all its substrings and being able to efficiently answer several questions on this set. A given fingerprint $f \in {\mathcal{F}}$ is represented by a binary array, F , of size |Σ| named a fingerprint table. A fingerprint, $f \in {\mathcal{F}}$ , admits a number of maximal locations ( i , j ) in S , that is the alphabet of s _ i .. s _ j is f and s _ i  − − 1, s _ j  + 1, if defined, are not in f . The total number of maximal locations is ${\mathcal{L}} \leq n |\Sigma|+1.$ We present new algorithms and a new data structure for the three problems: (1) compute ${\mathcal{F}}$ ; (2) given F , answer if F represents a fingerprint in ${\mathcal{F}}$ ; (3) given F , find all maximal locations of F in s . These problems are respectively solved in $O(({\mathcal{L}}+ n) \log |\Sigma|)$ , Θ(|Σ|), and Θ(|Σ| + K ) time – where K is the number of maximal locations of F .

Palabras clave: Maximal Location; Hash Table; Distinct Character; Naming Algorithm; Edge Label.

- Session 9. String Matching II | Pp. 342-353

Sublinear Algorithms for Parameterized Matching

Leena Salmela; Jorma Tarhio

Two strings parameterize match if there is a bijection that transforms the first string character by character into the second string. This problem has been studied in both one and two dimensions but the research has been centered on developing algorithms with good worst-case performance. We present algorithms that solve this problem in sublinear time on average for moderately repetitive patterns.

Palabras clave: String Match; Repetitive Pattern; Random Text; Read Character; Parameterized Match.

- Session 9. String Matching II | Pp. 354-364

Approximate Matching in Weighted Sequences

Amihood Amir; Costas Iliopoulos; Oren Kapah; Ely Porat

Weighted sequences have been recently introduced as a tool to handle a set of sequences that are not identical but have many local similarities. The weighted sequence is a “statistical image” of this set, where the probability of every symbol’s occurrence at every text location is given. We address the problem of approximately matching a pattern in such a weighted sequence. The pattern is a given string and we seek all locations in the set where the pattern occurs with a high enough probability. We define the notion of Hamming distance and edit distance in weighted sequences and give efficient algorithms for computing them. We compute two versions of the Hamming distance in time $O(n \sqrt{m\log m})$ , where n is the length of the weighted text and m is the pattern length. The edit distance is computed in time O ( nm ) and O ( nm ^2), depending on the edit distance definition used. Unfortunately, due to space considerations, the edit distance details are left to the journal version. We also define the notion of weighted matching in infinite alphabets and show that exact weighted matching can be computed in time O ( s log^2 s ), where s is the number of text symbols having non-zero probability. The weighted Hamming distance over infinite alphabets can be computed in time $\min(O(kn\sqrt{s}+s^{3/2}\log^2s), O(s^{4/3}m^{1/3}\log s))$ .

Palabras clave: Edit Distance; Text Element; Zero Probability; Text Location; Weighted Match.

- Session 9. String Matching II | Pp. 365-376

Algorithms for Finding a Most Similar Subforest

Jesper Jansson; Zeshan Peng

Given an ordered labeled forest  F (“the target forest”) and an ordered labeled forest  G (“the pattern forest”), the most similar subforest problem is to find a subforest  F ′ of  F such that the distance between  F ′ and  G is minimum over all possible  F ′. This problem generalizes several well-studied problems which have important applications in locating patterns in hierarchical structures such as RNA molecules’ secondary structures and XML documents. In this paper, we present efficient algorithms for the most similar subforest problem with forest edit distance for three types of subforests: simple substructures, sibling substructures, and closed subforests.

Palabras clave: Label Tree; Edit Mapping; Pattern Forest; Combinatorial Pattern Match; Alignment Distance.

- Session 10. Dynamic Programming | Pp. 377-388

Efficient Algorithms for Regular Expression Constrained Sequence Alignment

Yun-Sheng Chung; Chin Lung Lu; Chuan Yi Tang

Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, in CPM 2005 Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which can take time and space up to O (|Σ|^2 | V |^4 n ^2) and O (|Σ|^2 | V |^4 n ), respectively, where Σ is the alphabet, n is the sequence length, and V is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm solving this problem which takes O (| V |^3 n ^2) time and O (| V |^2 n ) space in the worst case. If | V |= O (log n ) we propose another algorithm with time complexity O (| V |^2log| V | n ^2). The time complexity of our algorithms is independent of Σ, which is desirable in protein applications where the formulation of this problem originates; a factor of |Σ|^2 = 400 in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice.

- Session 10. Dynamic Programming | Pp. 389-400

Large Scale Matching for Position Weight Matrices

Aude Liefooghe; Hélène Touzet; Jean-Stéphane Varré

This paper addresses the problem of multiple pattern matching for motifs encoded by Position Weight Matrices. We first present an algorithm that uses a multi-index table to preprocess the set of motifs, allowing a dramatically decrease of computation time. We then show how to take benefit from simlar motifs to prevent useless computations.

Palabras clave: Transcription Factor Binding Site; Similar Matrice; Score Threshold; Position Weight Matrix; Weighted Pattern.

- Session 10. Dynamic Programming | Pp. 401-412