Publications catalogue - books



Data Management. Data, Data Everywhere: 24th British National Conference on Databases, BNCOD 24, Glasgow, UK, July 3-5, 2007. Proceedings

Richard Cooper; Jessie Kennedy (eds.)

In conference: 24th British National Conference on Databases (BNCOD), Glasgow, UK, July 3-5, 2007

Abstract/Description - provided by the publisher

Not available.

Keywords - provided by the publisher

Not available.

Availability

Detected institution: none. Year of publication: 2007. Browse: SpringerLink.

Information

Resource type:

books

Print ISBN

978-3-540-73389-8

Electronic ISBN

978-3-540-73390-4

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

XFLab: A Technique of Query Processing over XML Fragment Stream

Sangwook Lee; Jin Kim; Hyunchul Kang

We investigate XML query processing in a portable/handheld client device with limited memory in a ubiquitous computing environment. Because of the memory limitation in the client, the source XML data, possibly of large volume, is fragmented in the server and streamed in fragments, over which query processing is done in the client. The state-of-the-art techniques employ the hole-filler model in fragmenting XML data and processing queries over the XML fragment stream. In this paper, we propose a new technique in which an XML labeling scheme is employed instead of the hole-filler model. Through preliminary experiments, we show that our technique outperforms the state-of-the-art techniques both in memory usage and in query processing time.

- Poster Papers | Pp. 185-189
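The fragment-streaming idea above can be sketched as follows. This is a minimal illustration under assumed details, not the paper's XFLab technique: the server splits an XML document into fragments tagged with a structural label (here, simply the root-to-node path), and the client answers a path query one fragment at a time, without materialising the whole document. All function names are invented for the example.

```python
# Hypothetical sketch of query processing over an XML fragment stream.
import xml.etree.ElementTree as ET

def fragment(xml_text, split_tag):
    """Server side: emit (path_label, fragment_xml) pairs, one per split_tag."""
    root = ET.fromstring(xml_text)
    for child in root.iter(split_tag):
        yield (f"/{root.tag}/{split_tag}", ET.tostring(child, encoding="unicode"))

def query_stream(stream, path, leaf_tag):
    """Client side: process fragments one at a time, keeping only matches."""
    results = []
    for label, frag_xml in stream:
        if label == path:  # the label test stands in for hole-filling/joining
            frag = ET.fromstring(frag_xml)
            results.extend(e.text for e in frag.iter(leaf_tag))
    return results

doc = ("<books><book><title>BNCOD 24</title></book>"
       "<book><title>Proceedings</title></book></books>")
print(query_stream(fragment(doc, "book"), "/books/book", "title"))
# -> ['BNCOD 24', 'Proceedings']
```

The point of the sketch is the memory profile: the client holds only one fragment plus accumulated results, never the full tree.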

Knowledge Discovery from Semantically Heterogeneous Aggregate Databases Using Model-Based Clustering

Shuai Zhang; Sally McClean; Bryan Scotney

When distributed databases are developed independently, they may be semantically heterogeneous with respect to data granularity, schema information and the embedded semantics. However, most traditional distributed knowledge discovery (DKD) methods assume that the distributed databases derive from a single virtual global table, where they share the same semantics and data structures. This data heterogeneity and the underlying semantics bring a considerable challenge for DKD. In this paper, we propose a model-based clustering method for aggregate databases, where the heterogeneous schema structure is due to heterogeneous classification schemas. The underlying semantics can be captured by different clusters. The clustering is carried out via a mixture model, where each component of the mixture corresponds to a different virtual global table. An advantage of our approach is that the algorithm resolves the heterogeneity as part of the clustering process, without first having to homogenise the heterogeneous local schemas to a shared schema. Evaluation of the algorithm is carried out using both real and synthetic data. Scalability of the algorithm is tested against the number of databases to be clustered, the number of clusters, and the size of the databases. The relationship between performance and complexity is also evaluated. Our experiments show that this approach has good potential for scalable integration of semantically heterogeneous databases.

- Clustering and Security | Pp. 190-202
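The mixture-model clustering described above can be illustrated with a toy EM loop. This is not the authors' algorithm: it simply clusters aggregate databases, each summarised as a histogram of counts, with a two-component multinomial mixture, so that each component plays the role of one "virtual global table". All names and the initialisation scheme are invented for the example.

```python
# Toy EM for a multinomial mixture over aggregate (histogram) databases.
import math

def em_multinomial(hists, k=2, iters=50):
    n, d = len(hists), len(hists[0])
    # initialise components from histograms spread across the input (assumption)
    seeds = hists[::max(1, n // k)][:k]
    theta = [[(h[j] + 1) / (sum(h) + d) for j in range(d)] for h in seeds]
    pi = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each database
        resp = []
        for h in hists:
            logp = [math.log(pi[c]) + sum(x * math.log(theta[c][j])
                    for j, x in enumerate(h)) for c in range(k)]
            m = max(logp)
            w = [math.exp(l - m) for l in logp]
            s = sum(w)
            resp.append([x / s for x in w])
        # M-step: re-estimate mixing weights and component parameters
        pi = [sum(r[c] for r in resp) / n for c in range(k)]
        theta = [[(sum(resp[i][c] * hists[i][j] for i in range(n)) + 1) /
                  (sum(resp[i][c] * sum(hists[i]) for i in range(n)) + d)
                  for j in range(d)] for c in range(k)]
    return [max(range(k), key=lambda c: r[c]) for r in resp]

# two databases dominated by category 0, two by category 2
hists = [[90, 5, 5], [80, 10, 10], [5, 5, 90], [10, 10, 80]]
labels = em_multinomial(hists)
print(labels)  # the first pair shares one cluster, the second pair the other
```

Because each database contributes only its aggregate counts, no record-level homogenisation is needed before clustering, which mirrors the advantage claimed in the abstract.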

Speeding Up Clustering-Based K-Anonymisation Algorithms with Pre-partitioning

Grigorios Loukides; Jianhua Shao

K-anonymisation is a technique for protecting the privacy of individuals whose data is contained within a dataset. Many k-anonymisation algorithms have been proposed, and one class of such algorithms is clustering-based. These algorithms can offer high-quality solutions, but are rather inefficient to execute. In this paper, we propose a method that first partitions a dataset into groups and then clusters the data within each group for k-anonymisation. Our experiments show that combining partitioning with clustering can significantly improve the performance of clustering-based k-anonymisation algorithms while maintaining the quality of the anonymisations they produce.

- Clustering and Security | Pp. 203-214
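A minimal sketch of the pre-partitioning idea, under assumed details rather than the paper's exact algorithm: sort records on a quasi-identifier, cut the sorted data into coarse groups of at least k records (the stand-in for clustering within partitions), and release each group as a generalised value range. The function name and single-attribute setting are invented for illustration.

```python
# Toy pre-partitioned k-anonymisation over a single numeric quasi-identifier.
def k_anonymise(ages, k=3, parts=2):
    data = sorted(ages)                      # pre-partitioning via sorting
    size = max(k, len(data) // parts)
    groups = [data[i:i + size] for i in range(0, len(data), size)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())      # merge an undersized tail group
    # each group is released as a (min, max, count) generalisation
    return [(min(g), max(g), len(g)) for g in groups]

print(k_anonymise([23, 25, 24, 31, 35, 33, 61], k=3))
# -> [(23, 25, 3), (31, 61, 4)]
```

Sorting once and only clustering within each partition avoids the quadratic pairwise comparisons that make purely clustering-based k-anonymisation slow, which is the speed-up the abstract refers to.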

Fine-Grained Access Control for Database Management Systems

Hong Zhu; Kevin Lü

A practical approach for developing fine-grained access control (FGAC) for database management systems is reported in this paper. We extend the SQL language to support security policies. The concept of the policy type for databases is proposed. We implement policy reuse through the use of policy types and policy instances to alleviate the administration workload of maintaining security policies. The policies for rows and columns can be expressed with policy types. Moreover, complicated database integrity constraints can also be expressed by policy types, and no further purpose-built programs are needed to create specific security control policies. We implement the fine-grained access control in the relational database management system DM5 [4]. Performance test results based on TPC-W are also presented.

- Clustering and Security | Pp. 215-223
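The policy-type/policy-instance distinction can be illustrated outside SQL. In this hedged sketch, a policy type is a reusable predicate template, each instance binds its parameters, and queries are filtered row by row; all names are invented, and the paper's actual mechanism is an SQL extension inside the DBMS, not application-level filtering.

```python
# Toy policy types and instances for row-level access control.
def policy_type(template):
    """A policy type: returns a factory producing policy instances."""
    def instantiate(**params):
        return lambda row, user: template(row, user, **params)
    return instantiate

# one policy type, reused for two tables via two instances
own_rows_only = policy_type(lambda row, user, owner_col: row[owner_col] == user)
orders_policy = own_rows_only(owner_col="customer")
notes_policy = own_rows_only(owner_col="author")

def select(table, user, policy):
    """A SELECT that applies the bound policy instance to every row."""
    return [row for row in table if policy(row, user)]

orders = [{"id": 1, "customer": "ana"}, {"id": 2, "customer": "bo"}]
print(select(orders, "ana", orders_policy))  # only ana's rows survive
```

Reuse is the point: one template serves any table with an owner column, so administrators maintain one policy type rather than one hand-written predicate per table.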

Extracting Temporal Information from Short Messages

Richard Cooper; Sinclair Manson

Information Extraction, the process of eliciting data from natural language documents, usually relies on the ability to parse the document and then to detect the meaning of the sentences by exploiting the syntactic structures encountered. In previous papers, we have discussed an application to extract information from short (e-mail and text) messages which takes an alternative approach. The application is lightweight and uses pattern matching rather than parsing, since parsing is not feasible for messages in which both the syntax and the spelling are unreliable. The application works in the context of a high level database schema and identifies sentences which make statements about data describable by this schema. The application matches sentences with templates to identify metadata terms and the data values associated with them. However, the initial prototype could only manage simple, time independent assertions about the data, such as "Jane Austen is the author." This paper describes an extension to the application which can extract temporal data, both time instants and time periods. It also manages time stamps - temporal information which partitions the values of time varying attributes, such as the monarch of a country. In order to achieve this, the original data model has had to be extended with a temporal component and a set of sentence templates has been constructed to recognise statements in this model. The paper describes the temporal model and the extensions to the application, concluding with a worked example.

- Data Mining and Extraction | Pp. 224-234
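The template-matching (rather than parsing) approach described above can be sketched with regular expressions. The templates below are invented for illustration, not taken from the paper; they show how a time period and an open-ended time stamp might be attached to an extracted attribute-value pair.

```python
# Sketch of template-based extraction of temporal assertions from short messages.
import re

TEMPLATES = [
    # "X was the monarch from 1837 to 1901" -> value stamped with a time period
    (re.compile(r"(?P<val>[\w ]+?) was the (?P<attr>\w+) from (?P<start>\d{4}) to (?P<end>\d{4})"),
     lambda m: (m["attr"], m["val"].strip(), (int(m["start"]), int(m["end"])))),
    # "X is the author" -> time-independent assertion, no time stamp
    (re.compile(r"(?P<val>[\w ]+?) is the (?P<attr>\w+)"),
     lambda m: (m["attr"], m["val"].strip(), None)),
]

def extract(msg):
    """Try each template in order; return (attribute, value, period) or None."""
    for pattern, build in TEMPLATES:
        m = pattern.search(msg)
        if m:
            return build(m)
    return None

print(extract("victoria was the monarch from 1837 to 1901"))
print(extract("Jane Austen is the author"))
```

Because the templates match surface patterns rather than parse trees, misspelt or ungrammatical message text degrades matching gracefully instead of breaking a parser, which is the motivation the abstract gives.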

Max-FTP: Mining Maximal Fault-Tolerant Frequent Patterns from Databases

Shariq Bashir; Abdul Rauf Baig

Mining Fault-Tolerant (FT) frequent patterns in real-world (dirty) databases is considered a fruitful direction for future data mining research. In the last couple of years a number of different algorithms have been proposed on the basis of the Apriori-FT frequent pattern mining concept. The main limitation of these existing FT frequent pattern mining algorithms is that they try to find all FT frequent patterns instead of only the useful long (maximal) patterns. This not only increases the processing time of the mining process but also generates too many redundant short FT frequent patterns that are not useful. In this paper we present a novel concept of mining only maximal (long) useful FT frequent patterns. For mining such patterns we introduce Max-FTP (Maximal Fault-Tolerant Frequent Pattern Mining), a novel depth-first search algorithm with various search space pruning and fast frequency counting techniques. Our extensive experimental results on benchmark datasets show that Max-FTP is very efficient in filtering uninteresting FT patterns and in execution time as compared to Apriori-FT.

- Data Mining and Extraction | Pp. 235-246
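The two definitions the abstract relies on, fault-tolerant support and maximality, can be made concrete with a brute-force sketch. Max-FTP itself is a depth-first algorithm with pruning; the naive enumeration below (with invented names) only illustrates what is being computed: a transaction FT-contains a pattern if it misses at most delta of the pattern's items, and a maximal pattern has no frequent proper superset.

```python
# Brute-force illustration of maximal fault-tolerant frequent patterns.
from itertools import combinations

def ft_support(pattern, db, delta):
    """Count transactions missing at most delta items of the pattern."""
    return sum(1 for t in db if len(set(pattern) - t) <= delta)

def maximal_ft_patterns(db, min_sup, delta):
    items = sorted(set().union(*db))
    frequent = [frozenset(c) for n in range(1, len(items) + 1)
                for c in combinations(items, n)
                if ft_support(c, db, delta) >= min_sup]
    # keep only patterns with no frequent proper superset (maximality)
    return [p for p in frequent if not any(p < q for q in frequent)]

db = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"}, {"b", "c", "d"}]
print(sorted(sorted(p) for p in maximal_ft_patterns(db, min_sup=4, delta=1)))
# -> [['a', 'b', 'c', 'd']]
```

Here every transaction misses exactly one item of {a, b, c, d}, so with delta = 1 the single maximal pattern summarises all sixteen shorter FT frequent patterns, which is exactly the redundancy Max-FTP is designed to avoid enumerating.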

A New Approach for Distributed Density Based Clustering on Grid Platform

Nhien-An Le-Khac; Lamine M. Aouad; M-Tahar Kechadi

Many distributed data mining tasks such as distributed association rules and distributed classification have been proposed and developed in the last few years. However, only a little research concerns distributed clustering for analysing large, heterogeneous and distributed datasets. This is especially true for distributed density-based clustering, although centralised versions of the technique have been widely used in different real-world applications. In this paper, we present a new approach for distributed density-based clustering. Our approach is based on two main concepts: the extension of local models created at each node of the system, and the aggregation of these local models by using tree-based topologies to construct global models. The preliminary evaluation shows that our approach is efficient and flexible, and that it is appropriate for high-density datasets with a moderate difference in dataset distributions among the sites.

- Data Mining and Extraction | Pp. 247-258
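The local-model/aggregation scheme can be sketched as follows, with heavily assumed details: each site builds a local density model, summarised here as a (centre, radius) pair per cluster, and the summaries are merged, as they would be when folded up a tree topology, by fusing summaries whose extents overlap. Real local models in this line of work are far richer; every name below is invented.

```python
# Toy distributed density-based clustering: local summaries, then aggregation.
import math

def local_model(points, eps=2.0):
    """Greedy one-pass grouping standing in for a local density clustering."""
    clusters = []
    for p in points:
        for c in clusters:
            if math.dist(p, c["centre"]) <= eps:
                c["members"].append(p)
                break
        else:
            clusters.append({"centre": p, "members": [p]})
    for c in clusters:
        xs, ys = zip(*c["members"])
        c["centre"] = (sum(xs) / len(xs), sum(ys) / len(ys))
        c["radius"] = max(math.dist(c["centre"], m) for m in c["members"])
    return [(c["centre"], c["radius"]) for c in clusters]

def merge(models, eps=2.0):
    """Aggregation step: fuse local summaries whose extents overlap."""
    merged = []
    for centre, radius in (s for m in models for s in m):
        for i, (gc, gr) in enumerate(merged):
            if math.dist(centre, gc) <= radius + gr + eps:
                merged[i] = (centre, max(radius, gr))  # crude fusion
                break
        else:
            merged.append((centre, radius))
    return merged

site_a = local_model([(0, 0), (1, 0), (10, 10)])
site_b = local_model([(0, 1), (10, 11)])
print(len(merge([site_a, site_b])))  # two global clusters across both sites
```

Only the compact summaries, not the raw points, cross site boundaries, which is what makes this style of aggregation suitable for a Grid platform.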