Publications catalog - books


Knowledge Discovery from XML Documents: First International Workshop, KDXD 2006, Singapore, April 9, 2006, Proceedings

Richi Nayak ; Mohammed J. Zaki (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution Publication year Browse Download Request
Not detected 2006 SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-33180-3

Electronic ISBN

978-3-540-33181-0

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

Opportunities for XML Data Mining in Modern Applications, or XML Data Mining: Where Is the Ore?

Stephane Bressan; Anthony Tung; Yang Rui

We attempt to identify the opportunities for XML data mining in modern applications. We try to match the requirements of modern applications that manage XML data with the capabilities of existing XML mining tools and techniques.

- Keynote Papers | Pp. 1-1

Capturing Semantics in XML Documents

Tok Wang Ling

Traditional semantic data models, such as the Entity Relationship (ER) data model, are used to represent real world semantics that are crucial for the effective management of structured data. The semantics that can be expressed in the ER data model include the representation of entity types together with their identifiers and attributes, n-ary relationship types together with their participating entity types and attributes, and functional dependencies among the participating entity types of relationship types and their attributes, etc.

Today, semistructured data has become more prevalent on the Web, and XML has become the de facto standard for semistructured data. A DTD and an XML Schema of an XML document only reflect the hierarchical structure of the semistructured data stored in the XML document. The hierarchical structures of XML documents are captured by the relationships between an element and its attributes, and between an element and its subelements. Element-attribute relationships do not have clear semantics, and the relationships between elements and their subelements are binary. The semantics of n-ary relationships with n > 2 cannot be represented or captured correctly and precisely in DTD and XML Schema. Many of the crucial semantics captured by the ER model for structured data are not captured by either DTD or XML Schema. We present the problems encountered when trying to correctly and efficiently store, query, and transform (view) XML documents without knowing these important semantics. We solve these problems by using a semantic-rich data model called the Object, Relationship, Attribute data model for SemiStructured Data (ORA-SS). We briefly describe how to mine such important semantics from given XML documents.

- Keynote Papers | Pp. 2-2

Mining Changes from Versions of Dynamic XML Documents

Laura Irina Rusu; Wenny Rahayu; David Taniar

The ability to store information contained in XML documents for future reference has become a very important issue, as the number of applications that use and exchange data in XML format is growing continuously. Moreover, the contents of XML documents are dynamic and change over time, so researchers are looking for efficient solutions to store the documents' versions and eventually extract interesting information from them. This paper proposes a novel approach for mining association rules from changes between versions of dynamic XML documents, in a simple manner, by using the information contained in the consolidated delta. We argue that by applying our proposed algorithm, important information about the behaviour of the changed XML document over time can be extracted and then used to make predictions about its future performance.

- XML Data Mining Methods | Pp. 3-12

XML Document Clustering by Independent Component Analysis

Tong Wang; Da-Xin Liu; Xuan-Zuo Lin

When XML documents are clustered, the high dimensionality problem occurs. Independent Component Analysis (ICA) can reduce dimensionality and at the same time find the underlying latent variables of XML structures to improve the quality of the clustering. This paper proposes a novel strategy to cluster XML documents based on ICA. Each document is first represented in the Vector Space Model (VSM), using features extracted from its XML tree. Then ICA is applied to reduce the dimensionality of the document vectors. Finally, the document vectors are clustered in the reduced Euclidean space spanned by the independent components. The experiments show that ICA can enhance the accuracy of the clustering with stable performance.

- XML Data Mining Methods | Pp. 13-21
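
The first step of the pipeline in the abstract above (representing each XML tree as a vector) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the features are counts of root-to-node tag paths, which the abstract does not specify, and the names `tag_paths` and `to_vectors` are hypothetical. The ICA and clustering steps are omitted; an ICA implementation such as scikit-learn's `FastICA` could be applied to the resulting vectors.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Count every root-to-node tag path in one XML document."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts


def to_vectors(docs):
    """Map documents to count vectors over a shared, sorted path vocabulary."""
    bags = [tag_paths(d) for d in docs]
    vocab = sorted(set().union(*bags))
    # A Counter returns 0 for paths that a document does not contain.
    return vocab, [[bag[p] for p in vocab] for bag in bags]


# Two toy documents (assumed data, for illustration only).
docs = [
    "<book><title/><author/><author/></book>",
    "<article><title/><journal/></article>",
]
vocab, vectors = to_vectors(docs)
```

Each distinct path becomes one dimension, so the vocabulary grows quickly with the variety of document structures; this is the dimensionality problem that the abstract proposes to reduce with ICA.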

Discovering Multi Terms and Co-hyponymy from XHTML Documents with XTREEM

Marko Brunzel; Myra Spiliopoulou

The Semantic Web needs ontologies as an integral component. Current methods for learning and enhancing ontologies need to be further improved to overcome the knowledge acquisition bottleneck. The identification of concepts and relations with only minimal user interaction is still a challenging objective. Current approaches to extracting semantics often use association rules or clustering on regular flat text. In this paper we describe an approach to extracting semantics from Web document collections that takes advantage of the semi-structured content within XHTML (an XML dialect which can be obtained from traditional HTML documents) Web documents.

The XTREEM (Xhtml TREE Mining) method uses structural information, the mark-up in Web content, as indicators of term boundaries and for co-hyponymy relations.

- XML Data Mining Methods | Pp. 22-32
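
The idea of using mark-up as an indicator of term boundaries and of co-hyponymy can be illustrated with a small sketch. It is not the XTREEM method itself: it only assumes that the text of one element forms one (possibly multi-word) term, and that terms found under the same tag path are co-hyponym candidates. The function name and the sample page are hypothetical, and text that follows a child element (tail text) is ignored for brevity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def text_spans_by_path(xhtml):
    """Group the text of each element by its root-to-element tag path."""
    root = ET.fromstring(xhtml)
    groups = defaultdict(list)

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        text = (node.text or "").strip()
        if text:
            # The element boundary delimits the term, so multi-word
            # terms such as "swimming pool" stay in one piece.
            groups[path].append(text)
        for child in node:
            walk(child, path)

    walk(root, "")
    return dict(groups)


# Toy XHTML page without namespaces (assumed data, for illustration only).
page = """<html><body>
<h1>Hotel services</h1>
<ul><li>swimming pool</li><li>sauna</li><li>fitness room</li></ul>
</body></html>"""

groups = text_spans_by_path(page)
# Paths that hold several terms yield co-hyponym candidates.
candidates = {p: terms for p, terms in groups.items() if len(terms) > 1}
```

Here the three list items share the path `/html/body/ul/li`, so they are grouped together as candidates, while the heading stands alone.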

Classification of XSLT-Generated Web Documents with Support Vector Machines

Atakan Kurt; Engin Tozal

XSLT is a transformation language mainly used for converting XML documents to HTML or other formats. Due to its simplicity and flexibility, XML has replaced traditional EDI file formats. Most e-business applications store data in XML, convert XML into HTML using XSLT, and publish the HTML documents to the web. In this paper we argue that the use of XSLT presents an opportunity rather than a challenge to web document classification. We show that it is possible to combine the advantages of both HTML and XML in the classification of documents at the XSLT transformation stage, named XSLT classification, to attain higher classification rates using Support Vector Machines (SVM). The results are both expected and promising. We believe that XSLT classification can become a favorable classification method over HTML or XML classification where XSLT stylesheets are available.

- XML Data Mining Methods | Pp. 33-42

Machine Learning Models: Combining Evidence of Similarity for XML Schema Matching

Tran Hong-Minh; Dan Smith

Matching schemas at an element level or structural level is generally categorized as either hybrid, which uses one algorithm, or composite, which combines evidence from several different matching algorithms for the final similarity measure. We present an approach for combining element-level evidence of similarity for matching XML schemas with a composite approach. By combining high recall algorithms in a composite system we reduce the number of real matches missed. By performing experiments on a number of machine learning models for combining evidence in a composite approach, and choosing SMO for its high precision and recall, we increase the reliability of the final matching results. The precision is therefore enhanced (e.g., with data sets used by Cupid and suggested by the author of LSD, our precision is respectively 13.05% and 31.55% higher than COMA and Cupid on average).

- XML Data Mining Methods | Pp. 43-53
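
A composite matcher in the sense described above combines the scores of several element-level measures into one similarity value. The sketch below is only an illustration under stated assumptions: the two measures (a character-level ratio from `difflib` and a token Jaccard score), the fixed equal weights, the threshold, and all function names are hypothetical choices. The abstract's approach learns the combination with a machine learning model (SMO) rather than using fixed weights.

```python
import re
from difflib import SequenceMatcher


def tokens(name):
    """Split a name such as 'custName' or 'cust_name' into lowercase tokens."""
    return set(t.lower() for t in re.findall(r"[A-Za-z][a-z]*|\d+", name))


def edit_sim(a, b):
    """Character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_sim(a, b):
    """Jaccard overlap of the name tokens."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def combined(a, b, weights=(0.5, 0.5)):
    """Weighted sum of the individual scores (fixed weights are an assumption)."""
    scores = (edit_sim(a, b), token_sim(a, b))
    return sum(w * s for w, s in zip(weights, scores))


def best_matches(source, target, threshold=0.4):
    """Pick, for each source element, the best target above the threshold."""
    out = {}
    for s in source:
        score, t = max((combined(s, t), t) for t in target)
        if score >= threshold:
            out[s] = t
    return out


# Toy element names (assumed data, for illustration only).
matches = best_matches(
    ["custName", "orderDate"],
    ["customer_name", "date_of_order", "price"],
)
```

The two measures compensate for each other: `custName` and `customer_name` share few whole tokens but many characters, whereas `orderDate` and `date_of_order` share tokens in a different order.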

Information Retrieval from Distributed Semistructured Documents Using Metadata Interface

Guija Choe; Young-Kwang Nam; Joseph Goguen; Guilian Wang

We describe a method for retrieving information from distributed heterogeneous semistructured documents, and its implementation in the metadata interface DDXMI (Distributed Document XML Metadata Interface). The system generates local queries appropriate for local schemas from a user query over the global schema and shows the result of the generated queries. Three components are designed to generate the local queries: mappings between the global schema and local schemas (extracted from local documents if not given), path substitution, and node identification for resolving the heterogeneity among nodes with the same label that often exist in semistructured data. The system uses Quilt as its XML query language. An experiment is reported over three local semistructured documents: ‘thesis’, ‘reports’, and ‘journal’ documents with an ‘article’ global schema. The prototype was developed under Windows with Java and JavaCC.

- XML Data Reasoning and Querying Methods | Pp. 54-63
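
The path substitution step mentioned above can be sketched as a prefix rewrite driven by a mapping table. This is a hypothetical illustration, not DDXMI's actual mapping format or Quilt syntax: the element names inside the paths, the dictionary layout, and the function `rewrite_path` are all assumptions; only the 'thesis' and 'reports' document names and the 'article' global schema come from the abstract.

```python
def rewrite_path(global_path, mapping):
    """Replace the longest mapped prefix of a global path with its local path."""
    steps = global_path.strip("/").split("/")
    for cut in range(len(steps), 0, -1):
        prefix = "/" + "/".join(steps[:cut])
        if prefix in mapping:
            # Keep the unmapped remainder of the path unchanged.
            return "/".join([mapping[prefix]] + steps[cut:])
    return None  # no mapping: this source cannot answer the query path


# Hypothetical global-to-local mappings for two local documents.
mappings = {
    "thesis": {
        "/article/author": "/thesis/writer",
        "/article/title": "/thesis/title",
    },
    "reports": {
        "/article/author": "/reports/report/by",
    },
}

# One path over the global schema yields one local path per source.
local = {src: rewrite_path("/article/author/name", m)
         for src, m in mappings.items()}
```

A source whose mapping has no matching prefix returns `None`, which a query generator could use to skip that source.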

Using Ontologies for Semantic Query Optimization of XML Database

Wei Sun; Da-Xin Liu

As XML has gained prevalence in recent years, the management of XML-compliant structured-document databases has become a very interesting and compelling research area. Effective query optimization is crucial to obtaining good performance from an XML database given a declarative query specification, because of the much enlarged optimization space. Query rewriting techniques based on semantic knowledge have been used in database management systems, namely for query optimization. The main goal of query optimization is to rewrite a user query into another one that uses less time and/or fewer resources during execution. When using those query optimization strategies, the transformed queries are equivalent to the submitted ones. This paper presents a new approach to query optimization using ontology semantics for query processing within XML databases. In fact, our approach shows how ontologies can effectively be exploited to rewrite a user query into another one such that the new query provides equally meaningful results that satisfy the intention of the user. Based on practical examples and their usefulness we develop a set of rewriting rules. In addition, we prove that the results of the query rewriting are semantically correct by using a logical model.

- XML Data Reasoning and Querying Methods | Pp. 64-73

The Expressive Language ALCNHR+K(D) for Knowledge Reasoning

Nizamuddin Channa; Shanping Li

The Expressive Language ALCNHR+(D) provides conjunction, full negation, quantifiers, number restrictions, role hierarchies, transitively closed roles and concrete domains. In addition to the operators known from ALCNHR+, a restricted existential predicate restriction operator for concrete domains is supported. In order to capture the semantics of complicated knowledge reasoning models, the expressive language ALCNHR+K(D) is introduced. It can represent not only knowledge about concrete domains and constraints, but also rules in some sense of the closed-world semantic model hypothesis. The paper investigates an extension to description-logic-based knowledge reasoning by means of decomposing and rewriting complicated hybrid concepts into partitions. We present an approach that automatically decomposes the whole knowledge base into description-logic-compatible and constraint-solver parts. Our arguments are two-fold. First, complex description logics with powerful representation ability lack effective reasoning ability; second, we consider how to reason with the combination of inferences from distributed heterogeneous reasoners.

- XML Data Reasoning and Querying Methods | Pp. 74-84