Catálogo de publicaciones - libros

Compartir en
redes sociales


Combinatorial Pattern Matching: 16th Annual Symposium, CPM 2005, Jeju Island, Korea, June 19-22, 2005, Proceedings

Alberto Apostolico ; Maxime Crochemore ; Kunsoo Park (eds.)

En conferencia: 16º Annual Symposium on Combinatorial Pattern Matching (CPM) . Jeju Island, South Korea . June 19, 2005 - June 22, 2005

Resumen/Descripción – provisto por la editorial

No disponible.

Palabras clave – provistas por la editorial

Algorithm Analysis and Problem Complexity; Data Structures; Information Storage and Retrieval; Document Preparation and Text Processing; Pattern Recognition; Bioinformatics

Disponibilidad
Institución detectada Año de publicación Navegá Descargá Solicitá
No detectada 2005 SpringerLink

Información

Tipo de recurso:

libros

ISBN impreso

978-3-540-26201-5

ISBN electrónico

978-3-540-31562-9

Editor responsable

Springer Nature

País de edición

Reino Unido

Fecha de publicación

Información sobre derechos de publicación

© Springer-Verlag Berlin Heidelberg 2005

Tabla de contenidos

Speeding up Parsing of Biological Context-Free Grammars

Daniel Fredouille; Christopher H. Bryant

Grammars have been shown to be a very useful way to model biological sequences families. As both the quantity of biological sequences and the complexity of the biological grammars increase, generic and efficient methods for parsing are needed. We consider two parsers for context-free grammars: depth-first top-down parser and chart parser; we analyse and compare them, both theoretically and empirically, with respect to biological data. The theoretical comparison is based on a common feature of biological grammars: the gap – a gap is an element of the grammars designed to match any subsequence of the parsed string. The empirical comparison is based on grammars and sequences used by the bioinformatics community. Our conclusions are that: (1) the chart parsing algorithm is significantly faster than the depth-first top-down algorithm, (2) designing special treatments in the algorithms for managing gaps is useful, and (3) the way the grammar encodes gaps has to be carefully chosen, when using parsers not optimised for managing gaps, to prevent important increases in running times.

Pp. 241-256

A New Periodicity Lemma

Kangmin Fan; William F. Smyth; R. J. Simpson

Given a string  = [1..], a of period in is a substring  = [.. +  − 1], = ||, ≥ 2, where neither  = [.. +  − 1] nor [.. + ( + 1) − 1] is a repetition. The maximum number of repetitions in any string is well known to be Θ(log ). A or of period in is a substring  = [.. +  + || − 1] of x, where is a repetition, a proper prefix of , and no repetition of period begins at position – 1 of or ends at position + + ||. In 2000 Kolpakov and Kucherov showed that the maximum number () of runs in any string is (), but their proof was nonconstructive and provided no specific constant of proportionality. At the same time, they presented experimental data strongly suggesting that () < . In this paper, as a first step toward proving this conjecture, we present a periodicity lemma that establishes limitations on the number of squares, and their periods, that can occur over a specified range of positions in . We then apply this result to specify corresponding limitations on the occurrence of runs.

Pp. 257-265

Two Dimensional Parameterized Matching

Carmit Hazay; Moshe Lewenstein; Dekel Tsur

Two equal length strings, or two equal sized two dimensional texts, if there is a one-one mapping (relative to the alphabet) of their characters. is the task of finding all × substrings of an × text that p-match to an × pattern. This models, for example, searching for color images with changing of color maps. We present an algorithm that solves the two dimensional parameterized matching problem in (+·polylog()) time.

Pp. 266-279

An Optimal Algorithm for Online Square Detection

Gen-Huey Chen; Jin-Ju Hong; Hsueh-I Lu

A is the concatenation of two identical non-empty strings. Let be the input string which is given character by character. Let be the (unknown) smallest integer such that the -th prefix of contains a square. The problem is to determine as soon as the -th character of is read. The best previously known algorithm of the online square detection problem, due to Leung, Peng, and Ting, runs in (log) time. We improve the time complexity to (log ), where is the number of distinct characters in the -th prefix of the input string. It is not difficult to implement our algorithm to run in expected () time.

Pp. 280-287

A Simple Fast Hybrid Pattern-Matching Algorithm

Frantisek Franek; Christopher G. Jennings; William F. Smyth

The Knuth-Morris-Pratt (KMP) pattern-matching algorithm guarantees both independence from alphabet size and worst-case execution time linear in the pattern length; on the other hand, the Boyer-Moore (BM) algorithm provides near-optimal average-case and best-case behaviour, as well as executing very fast in practice. We describe a simple algorithm that employs the main ideas of KMP and BM (with a little help from Sunday) in an effort to combine these desirable features. Experiments indicate that in practice the new algorithm is among the fastest exact pattern-matching algorithms discovered to date, perhaps dominant for alphabet size 8 or more.

Pp. 288-297

Prefix-Free Regular-Expression Matching

Yo-Sub Han; Yajun Wang; Derick Wood

We explore the regular-expression matching problem with respect to prefix-freeness of the pattern. We show that the prefix-free regular expression gives only linear number of matching substrings in the size of a given text. Based on this observation, we propose an efficient algorithm for the prefix-free regular-expression matching problem. Furthermore, we suggest an algorithm to determine whether or not a given regular language is prefix-free.

Pp. 298-309

Reducing the Size of NFAs by Using Equivalences and Preorders

Lucian Ilie; Roberto Solis-Oba; Sheng Yu

The efficiency of regular expression matching algorithms depends very much on the size of the nondeterministic finite automata (NFA) obtained from regular expressions. Reducing the size of these automata by using equivalences has been shown to reduce significantly the search time. We consider the problem of reducing the size of arbitrary NFAs using equivalences and preorders. For equivalences, we give an algorithm to optimally combine equivalent states for reducing the size of the automata. We also show that the problem of optimally using preorders to reduce the size of an automaton is NP-hard.

Pp. 310-321

Regular Expression Constrained Sequence Alignment

Abdullah N. Arslan

Given strings , , and a regular expression , we introduce as the problem of finding the maximum alignment score between and over all alignments such that in these alignments there exists a segment where some substring of is aligned with some substring of , and both and match , i.e. , ∈ () where () is the regular language described by . A motivation for the problem is that protein sequences can be aligned in a way that known motifs guide the alignments. We present an () time algorithm for the regular expression constrained sequence alignment problem where , and are the lengths of , and , respectively, and is in the order of the size of the transition function of a finite automaton that we create from a nondeterministic finite automaton accepting (). contains () states if has states.

Pp. 322-333

A Linear Tree Edit Distance Algorithm for Similar Ordered Trees

Hélène Touzet

We describe a linear algorithm for comparing two similar ordered rooted trees with node labels. The method for comparing trees is the usual tree edit distance. We show that an optimal mapping which uses at most insertions or deletions can then be constructed in () where is the size of the trees. The approach is inspired by the Zhang-Shasha algorithm for tree edit distance in combination with an adequate pruning of the search space.

Pp. 334-345

A Polynomial Time Matching Algorithm of Ordered Tree Patterns Having Height-Constrained Variables

Kazuhide Aikou; Yusuke Suzuki; Takayoshi Shoudai; Tomoyuki Uchida; Tetsuhiro Miyahara

Tree structured data such as HTML/XML files are represented by rooted trees with ordered children and edge labels. Knowledge representations for tree structured data are quite important to discover interesting features which such tree structured data have. In order to represent tree structured patterns with rich structural features, we introduce a new type of structured variables, called height-constrained variables. An (,)-height-constrained variable can be replaced with any tree such that the trunk length of the tree is at least and the height of the tree is at most . Then, we define a term tree as a rooted tree structured pattern with ordered children and height-constrained variables. In this paper, given a term tree and an ordered tree , we present an time algorithm of deciding whether or not matches , where is the maximum number of the children of an internal vertex in , is the sum of all trunk length constraints of all (,)-height-constrained variables in , and and are the numbers of vertices of and , respectively.

Pp. 346-357