Catálogo de publicaciones - libros
Combinatorial Pattern Matching: 17th Annual Symposium, CPM 2006, Barcelona, Spain, July 5-7, 2006, Proceedings
Moshe Lewenstein ; Gabriel Valiente (eds.)
En conferencia: 17º Annual Symposium on Combinatorial Pattern Matching (CPM) . Barcelona, Spain . July 5, 2006 - July 7, 2006
Resumen/Descripción – provisto por la editorial
No disponible.
Palabras clave – provistas por la editorial
Pattern Recognition; Algorithm Analysis and Problem Complexity; Document Preparation and Text Processing; Information Storage and Retrieval; Bioinformatics; Data Structures
Disponibilidad
Institución detectada | Año de publicación | Navegá | Descargá | Solicitá |
---|---|---|---|---|
No detectada | 2006 | SpringerLink |
Información
Tipo de recurso:
libros
ISBN impreso
978-3-540-35455-0
ISBN electrónico
978-3-540-35461-1
Editor responsable
Springer Nature
País de edición
Reino Unido
Fecha de publicación
2006
Información sobre derechos de publicación
© Springer-Verlag Berlin Heidelberg 2006
Cobertura temática
Tabla de contenidos
doi: 10.1007/11780441_31
New Algorithms for Text Fingerprinting
Roman Kolpakov; Mathieu Raffinot
Let s = s _1 .. s _ n be a text (or sequence) on a finite alphabet Σ. A fingerprint in s is the set of distinct characters contained in one of its substrings. Fingerprinting a text consists of computing the set ${\mathcal{F}}$ of all fingerprints of all its substrings and being able to efficiently answer several questions on this set. A given fingerprint $f \in {\mathcal{F}}$ is represented by a binary array, F , of size |Σ| named a fingerprint table. A fingerprint, $f \in {\mathcal{F}}$ , admits a number of maximal locations ( i , j ) in S , that is the alphabet of s _ i .. s _ j is f and s _ i − − 1, s _ j + 1, if defined, are not in f . The total number of maximal locations is ${\mathcal{L}} \leq n |\Sigma|+1.$ We present new algorithms and a new data structure for the three problems: (1) compute ${\mathcal{F}}$ ; (2) given F , answer if F represents a fingerprint in ${\mathcal{F}}$ ; (3) given F , find all maximal locations of F in s . These problems are respectively solved in $O(({\mathcal{L}}+ n) \log |\Sigma|)$ , Θ(|Σ|), and Θ(|Σ| + K ) time – where K is the number of maximal locations of F .
Palabras clave: Maximal Location; Hash Table; Distinct Character; Naming Algorithm; Edge Label.
- Session 9. String Matching II | Pp. 342-353
doi: 10.1007/11780441_32
Sublinear Algorithms for Parameterized Matching
Leena Salmela; Jorma Tarhio
Two strings parameterize match if there is a bijection that transforms the first string character by character into the second string. This problem has been studied in both one and two dimensions but the research has been centered on developing algorithms with good worst-case performance. We present algorithms that solve this problem in sublinear time on average for moderately repetitive patterns.
Palabras clave: String Match; Repetitive Pattern; Random Text; Read Character; Parameterized Match.
- Session 9. String Matching II | Pp. 354-364
doi: 10.1007/11780441_33
Approximate Matching in Weighted Sequences
Amihood Amir; Costas Iliopoulos; Oren Kapah; Ely Porat
Weighted sequences have been recently introduced as a tool to handle a set of sequences that are not identical but have many local similarities. The weighted sequence is a “statistical image” of this set, where the probability of every symbol’s occurrence at every text location is given. We address the problem of approximately matching a pattern in such a weighted sequence. The pattern is a given string and we seek all locations in the set where the pattern occurs with a high enough probability. We define the notion of Hamming distance and edit distance in weighted sequences and give efficient algorithms for computing them. We compute two versions of the Hamming distance in time $O(n \sqrt{m\log m})$ , where n is the length of the weighted text and m is the pattern length. The edit distance is computed in time O ( nm ) and O ( nm ^2), depending on the edit distance definition used. Unfortunately, due to space considerations, the edit distance details are left to the journal version. We also define the notion of weighted matching in infinite alphabets and show that exact weighted matching can be computed in time O ( s log^2 s ), where s is the number of text symbols having non-zero probability. The weighted Hamming distance over infinite alphabets can be computed in time $\min(O(kn\sqrt{s}+s^{3/2}\log^2s), O(s^{4/3}m^{1/3}\log s))$ .
Palabras clave: Edit Distance; Text Element; Zero Probability; Text Location; Weighted Match.
- Session 9. String Matching II | Pp. 365-376
doi: 10.1007/11780441_34
Algorithms for Finding a Most Similar Subforest
Jesper Jansson; Zeshan Peng
Given an ordered labeled forest F (“the target forest”) and an ordered labeled forest G (“the pattern forest”), the most similar subforest problem is to find a subforest F ′ of F such that the distance between F ′ and G is minimum over all possible F ′. This problem generalizes several well-studied problems which have important applications in locating patterns in hierarchical structures such as RNA molecules’ secondary structures and XML documents. In this paper, we present efficient algorithms for the most similar subforest problem with forest edit distance for three types of subforests: simple substructures, sibling substructures, and closed subforests.
Palabras clave: Label Tree; Edit Mapping; Pattern Forest; Combinatorial Pattern Match; Alignment Distance.
- Session 10. Dynamic Programming | Pp. 377-388
doi: 10.1007/11780441_35
Efficient Algorithms for Regular Expression Constrained Sequence Alignment
Yun-Sheng Chung; Chin Lung Lu; Chuan Yi Tang
Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, in CPM 2005 Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which can take time and space up to O (|Σ|^2 | V |^4 n ^2) and O (|Σ|^2 | V |^4 n ), respectively, where Σ is the alphabet, n is the sequence length, and V is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm solving this problem which takes O (| V |^3 n ^2) time and O (| V |^2 n ) space in the worst case. If | V |= O (log n ) we propose another algorithm with time complexity O (| V |^2log| V | n ^2). The time complexity of our algorithms is independent of Σ, which is desirable in protein applications where the formulation of this problem originates; a factor of |Σ|^2 = 400 in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice.
- Session 10. Dynamic Programming | Pp. 389-400
doi: 10.1007/11780441_36
Large Scale Matching for Position Weight Matrices
Aude Liefooghe; Hélène Touzet; Jean-Stéphane Varré
This paper addresses the problem of multiple pattern matching for motifs encoded by Position Weight Matrices. We first present an algorithm that uses a multi-index table to preprocess the set of motifs, allowing a dramatically decrease of computation time. We then show how to take benefit from simlar motifs to prevent useless computations.
Palabras clave: Transcription Factor Binding Site; Similar Matrice; Score Threshold; Position Weight Matrix; Weighted Pattern.
- Session 10. Dynamic Programming | Pp. 401-412