Publication catalog - books

Web Information Systems Engineering: WISE 2005: 6th International Conference on Web Information Systems Engineering, New York, NY, USA, November 20-22, 2005, Proceedings


Table of contents

Mining Communities on the Web Using a Max-Flow and a Site-Oriented Framework

Yasuhito Asano; Takao Nishizeki; Masashi Toyoda

There are several methods for mining communities on the Web using hyperlinks. One of the best known is a max-flow based method proposed by Flake et al. The method adopts a page-oriented framework, that is, it uses a page on the Web as a unit of information, like other methods including HITS and trawling. Recently, Asano et al. built a site-oriented framework which uses a site as a unit of information, and they experimentally showed that trawling on the site-oriented framework often outputs significantly better communities than trawling on the page-oriented framework. However, it has not been known whether the site-oriented framework is effective in mining communities through the max-flow based method. In this paper, we first point out several problems of the max-flow based method, mainly owing to the page-oriented framework, and then propose solutions to the problems by utilizing several advantages of the site-oriented framework. Computational experiments reveal that our max-flow based method on the site-oriented framework is significantly more effective in mining communities related to the topics of given pages than the original max-flow based method on the page-oriented framework.

Keywords: Base Method; Mining Community; Virtual Source; Dense Subgraph; Virtual Edge.

- Web Mining | Pp. 1-14
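The max-flow approach summarized above can be illustrated with a small sketch. The code follows the general Flake et al. construction as it is commonly described (seed units tied to a virtual source, every other unit tied to a virtual sink with unit capacity, link edges with capacity k) and returns the source side of a minimum cut. It is not the authors' improved method; the capacity value k, the virtual node identifiers and the function name are assumptions for illustration. A unit can be a page or, in the site-oriented framework, a whole site.

```python
from collections import defaultdict, deque

def max_flow_community(edges, seeds, k=2.0):
    """Flake-style community search via a minimum s-t cut (illustrative sketch).

    edges: hyperlinks (u, v) between units (pages or sites), treated as undirected
           with capacity k in each direction (k is a hypothetical parameter).
    seeds: units attached to a virtual source with unbounded capacity.
    Every non-seed unit gets a unit-capacity edge to a virtual sink.
    Returns the units on the source side of a minimum cut.
    """
    SRC, SNK = ("virtual", "source"), ("virtual", "sink")  # hypothetical node ids
    cap = defaultdict(float)   # residual capacity of arc (u, v)
    adj = defaultdict(set)     # neighbours in the residual graph

    def add_arc(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)          # reverse arc, initially capacity 0

    units = set()
    for u, v in edges:
        add_arc(u, v, k)
        add_arc(v, u, k)
        units.update((u, v))
    for s in seeds:
        add_arc(SRC, s, float("inf"))
        units.add(s)
    for n in units - set(seeds):
        add_arc(n, SNK, 1.0)

    # Edmonds-Karp: augment along shortest residual paths until none remain.
    while True:
        parent = {SRC: None}
        queue = deque([SRC])
        while queue and SNK not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if SNK not in parent:
            break
        path, v = [], SNK
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[arc] for arc in path)
        for u, v in path:
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck

    # The final BFS reached exactly the source side of a minimum cut.
    return set(parent) - {SRC}
```

On a toy graph with a four-node clique {a, b, c, d} joined by a single link to a triangle {x, y, z}, seeding with a and k = 2 returns the clique as the community.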

A Web Recommendation Technique Based on Probabilistic Latent Semantic Analysis

Guandong Xu; Yanchun Zhang; Xiaofang Zhou

Web transaction data between Web visitors and Web functionalities usually convey user task-oriented behavior patterns. Mining this type of clickstream data makes it possible to capture usage-pattern information. Nowadays, Web usage mining has become one of the most widely used methods for Web recommendation, which customizes Web content to a user's preferred style. Traditional Web usage mining techniques, such as Web user session or Web page clustering, association rule mining and frequent navigational path mining, can only discover usage patterns explicitly. They cannot, however, reveal the underlying navigational activities or identify the latent relationships associated with the patterns among Web users and Web pages. In this work, we propose a Web recommendation framework incorporating a Web usage mining technique based on the Probabilistic Latent Semantic Analysis (PLSA) model. The main advantage of this method is that it not only discovers usage-based access patterns but also reveals the underlying latent factors. Using the discovered user access patterns, we then present users with content they are more likely to be interested in via collaborative recommendation. To validate the effectiveness of the proposed approach, we conduct experiments on real-world datasets and compare it with some existing traditional techniques. The preliminary experimental results demonstrate the usability of the proposed approach.

Keywords: Collaborative Filter; User Session; User Access Pattern; Probabilistic Latent Semantic Analysis Model; Latent Semantic Model.

- Web Mining | Pp. 15-28
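A compact sketch of the PLSA model behind such a framework, fitted by expectation-maximization on a user-by-page count matrix, is shown below. The recommendation step, which scores unvisited pages by P(p | u) = Σ_z P(z | u) P(p | z), is one simple choice and is not claimed to be the authors' collaborative recommendation algorithm; the function names, the random initialization and the iteration count are assumptions.

```python
import random

def _normalize(values):
    total = sum(values)
    return [v / total for v in values]

def plsa(counts, n_factors, n_iter=50, seed=0):
    """Fit P(u, p) = sum_z P(z) P(u|z) P(p|z) to a user-page count matrix by EM.

    counts[u][p]: number of times user session u visited page p.
    Returns (p_z, p_u_z, p_p_z) with p_u_z[z][u] = P(u|z) and p_p_z[z][p] = P(p|z).
    """
    rng = random.Random(seed)
    n_users, n_pages = len(counts), len(counts[0])
    p_z = [1.0 / n_factors] * n_factors
    # Random positive initialization breaks the symmetry between factors.
    p_u_z = [_normalize([rng.random() + 0.1 for _ in range(n_users)]) for _ in range(n_factors)]
    p_p_z = [_normalize([rng.random() + 0.1 for _ in range(n_pages)]) for _ in range(n_factors)]

    for _ in range(n_iter):
        exp_u = [[0.0] * n_users for _ in range(n_factors)]
        exp_p = [[0.0] * n_pages for _ in range(n_factors)]
        for u in range(n_users):
            for p in range(n_pages):
                n = counts[u][p]
                if n == 0:
                    continue
                # E-step: posterior P(z | u, p) is proportional to P(z) P(u|z) P(p|z).
                joint = [p_z[z] * p_u_z[z][u] * p_p_z[z][p] for z in range(n_factors)]
                total = sum(joint)
                for z in range(n_factors):
                    w = n * joint[z] / total
                    exp_u[z][u] += w
                    exp_p[z][p] += w
        # M-step: re-estimate every distribution from the expected counts.
        p_u_z = [_normalize(row) for row in exp_u]
        p_p_z = [_normalize(row) for row in exp_p]
        p_z = _normalize([sum(row) for row in exp_u])
    return p_z, p_u_z, p_p_z

def recommend(counts, model, user, top_n=2):
    """Rank pages the user has not visited by P(p | u) = sum_z P(z | u) P(p | z)."""
    p_z, p_u_z, p_p_z = model
    p_z_given_u = _normalize([p_z[z] * p_u_z[z][user] for z in range(len(p_z))])
    scores = {
        p: sum(p_z_given_u[z] * p_p_z[z][p] for z in range(len(p_z)))
        for p in range(len(counts[0]))
        if counts[user][p] == 0
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```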

Constructing Interface Schemas for Search Interfaces of Web Databases

Hai He; Weiyi Meng; Clement Yu; Zonghuan Wu

Many databases have become Web-accessible through form-based search interfaces (i.e., search forms) that allow users to specify complex and precise queries to access the underlying databases. In general, such a Web search interface can be considered as containing an interface schema with multiple attributes and rich semantic/meta information; however, the schema is not formally defined on the search interface. Many Web applications, such as Web database integration and deep Web crawling, require such schemas to be constructed. In this paper, we introduce a schema model for complex search interfaces and present a tool (WISE-iExtractor) for automatically extracting and deriving all the information needed to construct the schemas. Our experimental results on real search interfaces indicate that this tool is highly effective.

Keywords: Logic Relationship; Domain Type; Domain Element; Search Interface; Exclusive Attribute.

- Web Mining | Pp. 29-42

Temporal Ranking of Search Engine Results

Adam Jatowt; Yukiko Kawai; Katsumi Tanaka

Existing search engines hold a picture of the Web as it was in the past, and their ranking algorithms are based on data crawled some time ago. However, users require information that is not only relevant but also fresh. We have developed a method for adjusting the ranking of search engine results from the point of view of page freshness and relevance. It uses an algorithm that post-processes search engine results based on the changed contents of the pages. By analyzing archived versions of web pages, we estimate the temporal qualities of pages, that is, their general freshness and their relevance to the query topic over certain time frames. For the top-quality web pages, the content differences between the past snapshots indexed by a search engine and their present versions are analyzed. Based on these differences, the algorithm assigns new ranks to the web pages without the need to maintain a constantly updated index of web documents.

Keywords: Search Engine; Cosine Similarity; Temporal Quality; Query Word; Query Vector.

- Web Information Retrieval | Pp. 43-52
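The sketch below illustrates the general idea of post-processing results by comparing a page's indexed snapshot with its current version, using cosine similarity between term-frequency vectors and the query vector. The scoring rule (current similarity plus a weighted change in similarity, with a hypothetical weight alpha) and the function names are stand-ins chosen for illustration, not the paper's temporal-quality formula.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rerank_by_change(query, results, alpha=0.5):
    """results: (url, indexed_snapshot_text, current_text) in the engine's original order."""
    q = Counter(query.lower().split())
    scored = []
    for url, old_text, new_text in results:
        old_sim = cosine(q, Counter(old_text.lower().split()))
        new_sim = cosine(q, Counter(new_text.lower().split()))
        # Hypothetical rule: current relevance plus a bonus (or penalty) for how
        # much the page's relevance changed since the search engine indexed it.
        scored.append((new_sim + alpha * (new_sim - old_sim), url))
    # sorted() is stable, so ties keep the search engine's original order.
    return [url for _, url in sorted(scored, key=lambda s: -s[0])]
```

A page that was relevant when crawled but has since drifted off-topic is pushed down, while a page that has become relevant is pulled up, without re-crawling the whole index.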

Evaluation of Result Merging Strategies for Metasearch Engines

Yiyao Lu; Weiyi Meng; Liangcai Shu; Clement Yu; King-Lup Liu

Result merging is a key component of a metasearch engine. Once the results from various search engines are collected, the metasearch system merges them into a single ranked list. The effectiveness of a metasearch engine is closely related to the result merging algorithm it employs. In this paper, we investigate a variety of result merging algorithms based on a wide range of available information about the retrieved results, from their local ranks, their titles and snippets, to the full documents of these results. The effectiveness of these algorithms is then compared experimentally using 50 queries from the TREC Web track and the 10 most popular general-purpose search engines. Our experiments yield two important results. First, simple result merging strategies can outperform Google. Second, merging based on the titles and snippets of retrieved results can outperform merging based on the full documents.

- Web Information Retrieval | Pp. 53-66
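As an example of a merging strategy that uses only local ranks, the sketch below sums reciprocal ranks across engines, so that documents ranked highly by several engines rise to the top. This particular scoring rule and the function name are assumptions for illustration; it is not necessarily one of the algorithms evaluated in the paper.

```python
from collections import defaultdict

def merge_by_local_rank(result_lists):
    """result_lists: one ranked list of URLs per search engine (best first)."""
    scores = defaultdict(float)
    for ranking in result_lists:
        for rank, url in enumerate(ranking, start=1):
            # A document returned by several engines accumulates score from each.
            scores[url] += 1.0 / rank
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))
```

For the lists [a, b, c], [b, d] and [c, a], documents a and b both score 1.5 and the merged order is a, b, c, d.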

Decomposition-Based Optimization of Reload Strategies in the World Wide Web

Dirk Kukulenz

From a client’s point of view, Web sites, Web pages and the data on pages are available only for specific periods of time and are deleted afterwards. An important task in retrieving information from the Web is therefore to follow Web information over the course of time. Different strategies, such as push and pull strategies, may be applied for this task. Since push services are usually not available, pull strategies have to be designed that optimize the retrieved information with respect to the age of the retrieved data and its completeness. In this article we present a new procedure that optimizes data retrieved from Web pages by page decomposition. Using an automatic wrapper induction technique, a page is decomposed into functional segments. Each segment is considered an independent component for the analysis of the page’s time behavior. Based on this decomposition, we present a new component-based download strategy. Applying this method to Web pages shows that, for a fraction of Web data, the freshness of the retrieved data may be improved significantly compared with traditional methods.

Keywords: Continuous Query; Remote Source; Regular Grammar; Pull Strategy; Page Change.

- Web Information Retrieval | Pp. 67-80

An Ontological Approach for Defining Agents for Collaborative Applications

I. T. Hawryszkiewycz

Collaboration has become important, especially in knowledge-intensive applications. Computer support systems for collaborative work on the Web, however, usually grow in an ad-hoc manner. The paper suggests two reasons for this ad-hoc approach. One is a lack of methods to map collaborative requirements into collaborative workspaces. The other is that collaborative processes themselves change over time. The paper proposes a metamodel that provides an ontology to support collaborative process modelling, and uses it to define generic agents that can assist users in setting up and changing collaborative workspaces. The metamodel itself integrates social, organizational and workflow semantics, providing the ability to describe complex collaborative processes. The metamodel concepts and the corresponding agents are generic in nature, and the paper describes ways to map such generic concepts to specific domain applications.

Keywords: Multiagent System; Generic Agent; Collaborative Process; Agent Structure; Real World Phenomenon.

- Metadata Management | Pp. 81-94

Improving Web Data Annotations with Spreading Activation

Fatih Gelgi; Srinivas Vadrevu; Hasan Davulcu

The Web has established itself as the largest public data repository ever available. Even though the vast majority of information on the Web is formatted to be easily readable by the human eye, “meaningful information” is still largely inaccessible to computer applications. In this paper, we present automated algorithms that gather meta-data and instance information by exploiting global regularities on the Web and incorporating contextual information. Our system is distinctive in that it does not require domain-specific engineering. Experimental evaluations were successfully performed on the TAP knowledge base and on the faculty and course home pages of computer science departments, containing 16,861 Web pages.

Keywords: Semi-structured Data; Spreading Activation; Semantic Partitioning.

- Metadata Management | Pp. 95-106

Semantic Partitioning of Web Pages

Srinivas Vadrevu; Fatih Gelgi; Hasan Davulcu

In this paper we describe the semantic partitioner algorithm, which uses the structural and presentation regularities of Web pages to automatically transform them into hierarchical content structures. These content structures enable us to automatically annotate labels in the Web pages with their semantic roles, thus yielding meta-data and instance information for the Web pages. Experimental results with the TAP knowledge base and computer science department Web sites, comprising 16,861 Web pages, indicate that our algorithm is able to gather meta-data accurately from various types of Web pages. The algorithm achieves this performance without requiring any domain-specific engineering.

Keywords: Regular Expression; Semantic Role; Attribute Label; Kleene Star; Grammar Induction.

- Metadata Management | Pp. 107-118

A Formal Ontology Reasoning with Individual Optimization: A Realization of the Semantic Web

Pakornpong Pothipruk; Guido Governatori

Answering a query over a group of RDF data pages is a trivial process. However, the Semantic Web also needs ontology technology. Consequently, OWL, a family of web ontology languages based on description logic, has been proposed for the Semantic Web. Answering a query over the Semantic Web is thus not trivial but a deductive process. However, reasoning on OWL with data suffers from an efficiency problem. We therefore introduce optimization techniques for the inference algorithm. This work demonstrates the techniques for the instance checking and instance retrieval problems with respect to the $\mathcal{ALC}$ description logic, which covers certain parts of OWL.

Keywords: Description Logic; Individual Optimization; Query Answering; Reasoning Service; Boolean Query.

- Ontology and Semantic Web | Pp. 119-132

oMAP: Combining Classifiers for Aligning Automatically OWL Ontologies

Umberto Straccia; Raphaël Troncy

This paper introduces a method and a tool for automatically aligning OWL ontologies, a crucial step for achieving the interoperability of heterogeneous systems in the Semantic Web. Different components are combined for finding suitable mapping candidates (together with their weights), and the set of rules with maximum matching probability is selected. Machine learning-based classifiers and a new classifier using the structure and the semantics of the OWL ontologies are proposed. Our method has been implemented and evaluated on an independent test set provided by an international ontology alignment contest. We provide the results of this evaluation with respect to the other competitors.

Keywords: Resource Description Framework; Mapping Rule; Target Entity; Ontology Alignment; Source Ontology.

- Ontology and Semantic Web | Pp. 133-147

Semantic Web Technologies for Interpreting DNA Microarray Analyses: The MEAT System

Khaled Khelif; Rose Dieng-Kuntz; Pascal Barbry

This paper describes MEAT (Memory of Experiments for the Analysis of Transcriptomes), a project aimed at supporting biologists working on DNA microarrays. We provide methodological and software support for building an experiment memory for this domain. Our approach, based on Semantic Web technologies, relies on formalized ontologies and semantic annotations of scientific articles and other knowledge sources. It can probably be extended to other massive analyses of biological events (such as those provided by proteomics, metabolomics...).

Keywords: Resource Description Framework; Annotation Base; Unified Medical Language System; Semantic Annotation; Resource Description Framework Triple.

- Ontology and Semantic Web | Pp. 148-160

Extracting Global Policies for Efficient Access Control of XML Documents

Mizuho Iwaihara; Bo Wang; Somchai Chatvichienchai

As documents containing sensitive information are exchanged over the Internet, access control of XML documents is becoming important. Access control policies can specify fine-grained rules for documents, but policies sometimes become redundant as documents are restructured or combined during exchange. In this paper, we consider a new approach to optimizing access control policies by extracting the distribution information of given authorization values within XML data. The extracted information is called a global policy tree, and it can be used both for minimizing the total size of policies and for efficient query processing. We present a linear-time algorithm for minimizing policies using global policy trees, and our evaluation results show a significant improvement over existing work.

Keywords: Access Control; Policy Tree; Simple Path; Access Control Policy; Global Policy.

- XML | Pp. 161-174
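To make the idea of redundant rules concrete, the sketch below assumes a simple inheritance semantics (a node without an explicit rule takes its parent's authorization, and the root takes a default) and, in one linear-time traversal, keeps a rule only where the desired authorization differs from the inherited one. Under that assumed semantics the result is minimal, since every such change point needs a rule. It is a simplified model, not the paper's global policy tree algorithm; the data layout and the function name are hypothetical.

```python
def minimize_policy(children, value, root, default="-"):
    """Return a minimal set of explicit rules reproducing `value` under inheritance.

    children: dict node -> list of child nodes (an XML element tree).
    value: dict node -> desired authorization ('+' grant, '-' deny).
    Assumed semantics: a node without a rule inherits its parent's authorization;
    the root inherits `default`.
    """
    rules = {}
    stack = [(root, default)]
    while stack:
        node, inherited = stack.pop()
        if value[node] != inherited:
            # A rule is needed only where the authorization changes.
            rules[node] = value[node]
        for child in children.get(node, []):
            stack.append((child, value[node]))
    return rules
```

Each node is visited once, so the running time is linear in the size of the document tree.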

Querying and Repairing Inconsistent XML Data

S. Flesca; F. Furfaro; S. Greco; E. Zumpano

The problem of repairing XML data that are inconsistent and incomplete with respect to a set of integrity constraints and a DTD is addressed. The existence of repairs (i.e., minimal sets of update operations making data consistent) is investigated and shown to be undecidable in the general case. This problem remains undecidable when data are interpreted as “incomplete” (so that they can be repaired by performing insert operations only). However, it becomes decidable when particular classes of constraints are considered. The existence of repairs is proved to be decidable and, in particular, $\mathcal{NP}$-complete if inconsistent data are interpreted as “dirty” data (so that repairs are data-cleaning operations consisting only of deletions). The existence of general repairs (containing both insert and delete operations) for special classes of integrity constraints (functional dependencies) is also investigated. Finally, for all the cases where the existence of a repair is decidable, the complexity of providing consistent answers to a query (issued on inconsistent data) is characterized.

Keywords: Functional Dependency; Integrity Constraint; Path Expression; Tree Query; Consistent Answer.

- XML | Pp. 175-188

Towards Automatic Generation of Rules for Incremental Maintenance of XML Views of Relational Data

Vânia Vidal; Valdiana Araujo; Marco Casanova

This paper first proposes a two-step approach to define rules for maintaining materialized XML views specified over relational databases. The first step concentrates on identifying all paths of the base schema that are relevant to a path of the view, with respect to an update. The second step creates rules that maintain all view paths that can be affected by the update. The paper then discusses how to automatically identify all paths in the base schema that are relevant to a view path with respect to a given update operation and how to create the appropriate maintenance rules.

Keywords: Relation Scheme; Multiple Occurrence; Relevant Path; Rule Template; Incremental Maintenance.

- XML | Pp. 189-202

A Methodological Approach for Incorporating Adaptive Navigation Techniques into Web Applications

Gonzalo Rojas; Vicente Pelechano

The incorporation of adaptive navigation techniques into Web applications is a complex task. The conceptual specification of the navigation must consider the preferences and needs of users, as well as different implementation alternatives for the same navigational structure. However, there is a lack of methods that govern this integration. This work proposes a methodological approach that allows the adaptive navigation characteristics of a Web application to be described at a high abstraction level. We introduce a User Modelling process and a set of concepts that make it possible to incorporate two types of adaptive navigation techniques into the navigational description of a Web application. These techniques select and sort the links of a page according to their relevance for a given user. The steps of the methodology are illustrated through a case study.

Keywords: Relevance Concept; Conceptual Description; High Abstraction Level; Special Offer; Online Bookstore.

- Web Service Method | Pp. 203-216

A Web Service Support to Collaborative Process with Semantic Information

Woongsup Kim; Moon Jung Chung

Web services were introduced to deliver methods and technologies that help organizations link their software. However, existing web service standards based on WSDL limit the usefulness of web services in collaborative process management. In this paper, we present a framework, WSCPC, that enables web service based collaborative process management in heterogeneous software environments. To facilitate web service based collaboration, we propose semantic service models and a web service extension that overcome limitations in current frameworks and hence support the complex level of communication needed for intensive collaboration in heterogeneous Virtual Enterprise environments. Through our semantic service models, organizational functionalities and capabilities are encapsulated and published as services. Collaborating partners can schedule, control, and monitor the relevant functionalities through the WSCPC web services interaction model and web service extensions.

Keywords: Service Registry; Process Engine; Process Execution; Service Consumer; Virtual Enterprise.

- Web Service Method | Pp. 217-230

Server-Side Caching Strategies for Online Auction Sites

Daniel A. Menascé; Vasudeva Akula

Online auction sites have very specific workloads and user behavior characteristics. Previous workload characterization studies conducted by the authors showed that (i) bidding activity on auctions increases considerably after 90% of an auction’s lifetime has elapsed, (ii) a very large percentage of auctions have a relatively low number of bids and bidders, while a very small percentage of auctions have a high number of bids and bidders, and (iii) prices rise very fast after an auction has lasted more than 90% of its lifetime. Thus, if bidders are unable to bid successfully in the very last moments of an auction because of site overload, the final price may not be as high as it could be, and sellers, and consequently the auction site, may lose revenue. In this paper, we propose server-side caching strategies in which cache placement and replacement policies are based on auction-related parameters such as the number of bids placed or the percentage of time remaining until closing. A main-memory auction cache at the application server can be used to reduce accesses to the back-end database server. Trace-based simulations were used to evaluate these caching strategies in terms of cache hit ratio and cache efficiency.

Keywords: Cache Size; Replacement Policy; Online Auction; Placement Policy; Auction Site.

- Web Service Method | Pp. 231-244
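The sketch below shows how placement and replacement decisions might use the auction-related parameters mentioned above. The thresholds (at least five bids, or within the last 10% of the auction's lifetime), the eviction rule (evict the cached auction with the most remaining time) and the class and method names are all hypothetical choices, not the policies evaluated in the paper.

```python
class AuctionCache:
    """Main-memory cache of auction records with auction-aware policies (hypothetical)."""

    def __init__(self, capacity, min_bids=5, closing_fraction=0.10):
        self.capacity = capacity
        self.min_bids = min_bids                  # placement: popular auctions
        self.closing_fraction = closing_fraction  # placement: auctions about to close
        self.entries = {}                         # auction_id -> (record, remaining_fraction)
        self.hits = 0
        self.misses = 0

    def _should_place(self, num_bids, remaining_fraction):
        return num_bids >= self.min_bids or remaining_fraction <= self.closing_fraction

    def get(self, auction_id, load_from_db, num_bids, remaining_fraction):
        """Return the record, consulting the back-end database only on a miss."""
        if auction_id in self.entries:
            self.hits += 1
            record, _ = self.entries[auction_id]
            self.entries[auction_id] = (record, remaining_fraction)  # refresh remaining time
            return record
        self.misses += 1
        record = load_from_db(auction_id)
        if self.capacity > 0 and self._should_place(num_bids, remaining_fraction):
            if len(self.entries) >= self.capacity:
                # Replacement: evict the cached auction furthest from closing.
                victim = max(self.entries, key=lambda a: self.entries[a][1])
                del self.entries[victim]
            self.entries[auction_id] = (record, remaining_fraction)
        return record

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Auctions with few bids and plenty of time left are served straight from the database and never displace the "hot" auctions that attract last-minute bidding.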

Maintaining Consistency Under Isolation Relaxation of Web Services Transactions

Seunglak Choi; Hyukjae Jang; Hangkyu Kim; Jungsook Kim; Su Myeon Kim; Junehwa Song; Yoon-Joon Lee

For efficient management of Web Services (WS) transactions, which are executed across multiple loosely coupled autonomous organizations, isolation is commonly relaxed. A Web services operation of a transaction releases the locks on its resources once its jobs are completed, without waiting for the other operations to complete. However, those early-unlocked resources can be seen by other transactions, which can spoil data integrity and cause incorrect outcomes. Existing WS transaction standards do not consider this problem. In this paper, we propose a mechanism to ensure the consistent execution of isolation-relaxing WS transactions. The mechanism effectively detects inconsistent states of transactions using the notion of a completion dependency and recovers them to consistent states. We also propose a new Web services Transaction Dependency management Protocol (WTDP). WTDP helps organizations manage WS transactions easily without data inconsistency. WTDP is designed to be compliant with a representative WS transaction standard, the Web Services Transactions specifications, for easy integration into existing WS transaction systems. We prototyped a WTDP-based WS transaction management system to validate our protocol.

Keywords: Transaction Model; Dependent Coordinator; Vendor Managed Inventory; Circular Dependency; Furniture Maker.

- Web Service Structure | Pp. 245-257
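A completion dependency of the kind described above arises when one transaction uses a resource that another transaction released before completing. The sketch below records such dependencies and, when a transaction aborts, computes the set of transactions that transitively depend on it and therefore need recovery; the visited-set check also copes with circular dependencies. It is an illustrative model only, not WTDP or the Web Services Transactions specifications, and all names are hypothetical.

```python
from collections import defaultdict, deque

class DependencyTracker:
    """Records completion dependencies caused by early lock release (illustrative)."""

    def __init__(self):
        self.early_released = {}            # resource -> txn that released it before completing
        self.dependents = defaultdict(set)  # txn -> txns that used its uncommitted results

    def release_early(self, txn, resource):
        self.early_released[resource] = txn

    def access(self, txn, resource):
        owner = self.early_released.get(resource)
        if owner is not None and owner != txn:
            # txn can only complete safely if owner completes as well.
            self.dependents[owner].add(txn)

    def _forget(self, txn):
        self.early_released = {r: t for r, t in self.early_released.items() if t != txn}

    def commit(self, txn):
        # The results txn released early are now final; nobody depends on it any more.
        self._forget(txn)
        self.dependents.pop(txn, None)

    def abort(self, txn):
        """Return all transactions that transitively saw txn's uncommitted results."""
        self._forget(txn)
        affected, queue = set(), deque([txn])
        while queue:
            current = queue.popleft()
            for dep in self.dependents.pop(current, set()):
                if dep != txn and dep not in affected:  # guards against circular dependencies
                    affected.add(dep)
                    queue.append(dep)
        return affected
```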

Binding and Execution of Web Service Compositions

K. Vidyasankar; V. S. Ananthanarayana

Web services enable the design, integration, composition, and deployment of distributed and heterogeneous software. While most syntactic issues in composition have been taken care of somewhat satisfactorily, several semantic issues remain unresolved. In this paper, we consider issues relating to binding and execution of composite services. A Web service composition or composite activity consists of a set of (basic or composite) activities with some ordering constraints. In general, an arbitrary collection of execution instances of the individual activities may not constitute an execution of the composite activity; the individual execution instances must be “compatible”. In this paper, we propose (a) a simple formalism to express the compatibility requirements in a composition, and (b) a methodology for (i) the selection of a composite service provider for a composite activity and (ii) the selection of (other) service providers for the constituent activities of the composite activity, to ensure an execution of the composition satisfying the compatibility requirements.

Keywords: Constituent Activity; Composite Service; Travel Agent; Composite Activity; Input Constraint.

- Web Service Structure | Pp. 258-272

Handling Transactional Properties in Web Service Composition

Marie-Christine Fauvet; Helga Duarte; Marlon Dumas; Boualem Benatallah

The development of new services by composition of existing ones has gained considerable momentum as a means of integrating heterogeneous applications and realising business collaborations. Services that enter into compositions with other services may have transactional properties, especially those in the broad area of resource management (e.g. booking services). These transactional properties may be exploited in order to derive composite services which themselves exhibit certain transactional properties. This paper presents a model for composing services that expose transactional properties and more specifically, services that support tentative holds and/or atomic execution. The proposed model is based on a high-level service composition operator that produces composite services that satisfy specified atomicity constraints. The model supports the possibility of selecting the services that enter into a composition at runtime, depending on their ability to provide resource reservations at a given point in time and taking into account user preferences.

Keywords: Composition Operator; Service Composition; Component Service; Service Type; Composite Service.

- Web Service Structure | Pp. 273-289

XFlow: An XML-Based Document-Centric Workflow

Andrea Marchetti; Maurizio Tesconi; Salvatore Minutoli

This paper investigates an appropriate framework for defining workflows for collaborative document procedures. In this framework, called XFlow and largely based on the XSLT Processing Model, workflows are described by means of a new XML application called XFlowML (XFlow Markup Language). XFlowML describes the document workflow using an agent-based approach. Each agent can participate in the workflow with one or more roles, defined as XPath expressions based on a hierarchical role chart. An XFlowML document contains as many templates as there are agent roles participating in the workflow. The document workflow engine provides the run-time execution support for document processing by implementing the XFlowML constructs. A prototype of XFlow has been implemented with extensive use of XML technologies (XSLT, XPath, XForms, SVG) and open-source tools (Cocoon, Tomcat, MySQL).

Keywords: External Agent; Agent Role; XPath Expression; Document Instance; Document Flow.

- Collaborative Methodology | Pp. 290-303

Optimization of XSLT by Compact Specialization and Combination

Ce Dong; James Bailey

In recent times, there has been an increased utilization of server-side XSLT systems as part of e-commerce and e-publishing applications. For the high volumes of data in these applications, effective optimization techniques for XSLT are particularly important. In this paper, we propose two new optimization approaches, Specialization Combination and Specialization Set Compaction, to help improve performance. We describe rules for combining specialized XSLT stylesheets and provide methods for generating a more compact specialization set. An experimental evaluation of our methods is undertaken, where we show our methods to be particularly effective for cases with very large XML input and different varieties of user queries.

Keywords: Specialization Combination; Query Term; User Query; Compact Specialization; Distinct Query.

- Collaborative Methodology | Pp. 304-317

Extracting Web Data Using Instance-Based Learning

Yanhong Zhai; Bing Liu

This paper studies structured data extraction from Web pages, e.g., online product description pages. Existing approaches to data extraction include wrapper induction and automatic methods. In this paper, we propose an instance-based learning method, which performs extraction by comparing each new instance (or page) to be extracted with labeled instances (or pages). The key advantage of our method is that it does not need an initial set of labeled pages to learn extraction rules, as wrapper induction does. Instead, the algorithm is able to start extraction from a single labeled instance (or page). Only when a new page cannot be extracted does the page need labeling. This avoids unnecessary page labeling and solves a major problem with inductive learning (or wrapper induction), namely that the set of labeled pages may not be representative of all other pages. The instance-based approach is very natural because structured data on the Web usually follow some fixed templates, and pages of the same template can usually be extracted using a single page instance of the template. The key issue is the similarity or distance measure. Traditional measures based on the Euclidean distance or text similarity are not easily applicable in this context because items to be extracted from different pages can be entirely different. This paper proposes a novel similarity measure for this purpose, which is suitable for templated Web pages. Experimental results with product data extraction from 1200 pages in 24 diverse Web sites show that the approach is surprisingly effective. It significantly outperforms existing state-of-the-art systems.

Keywords: Target Item; Inductive Learning; Semistructured Data; Comparative Shopping; Item Price.

- Collaborative Methodology | Pp. 318-331

PRoBe: Multi-dimensional Range Queries in P2P Networks

O. D. Sahin; S. Antony; D. Agrawal; A. El Abbadi

Structured P2P systems are effective for exact key searches in a distributed environment as they offer scalability, self-organization, and dynamicity. These valuable properties also make them a candidate for more complex queries, such as range queries. In this paper, we describe PRoBe, a system that supports range queries over multiple attributes in P2P networks. PRoBe uses a multi-dimensional logical space for this purpose and maps data items onto this space based on their attribute values. The logical space is divided into hyper-rectangles, each maintained by a peer in the system. The range queries correspond to hyper-rectangles which are answered by forwarding the query to the peers responsible for overlapping regions of the logical space. We also propose load balancing techniques and show how cached query answers can be utilized for the efficient evaluation of similar range queries. The performance of PRoBe and the effects of various parameters are analyzed through a simulation study.

Keywords: Load Balance; Data Item; Range Query; Logical Space; Hilbert Curve.

- P2P, Ubiquitous and Mobile | Pp. 332-346
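Query routing in a multi-dimensional logical space can be sketched with plain axis-aligned boxes: each peer owns a half-open hyper-rectangle, and a range query (itself a hyper-rectangle) must reach every peer whose zone intersects it. The sketch below computes that set centrally, whereas PRoBe forwards queries between peers; the zone layout, the normalization of attribute values to [0, 1) and the function names are assumptions.

```python
def owner(zones, point):
    """Return the peer whose zone contains the point (attributes normalized to [0, 1))."""
    for peer, (lo, hi) in zones.items():
        if all(l <= x < h for l, x, h in zip(lo, point, hi)):
            return peer
    return None

def overlaps(zone, query):
    """zone: half-open box [lo, hi) per dimension; query: closed box [q_lo, q_hi]."""
    (lo, hi), (q_lo, q_hi) = zone, query
    return all(ql < h and l <= qh for l, h, ql, qh in zip(lo, hi, q_lo, q_hi))

def peers_for_range_query(zones, query):
    """zones: dict peer_id -> ((lo, ...), (hi, ...)). Returns peers that must see the query."""
    return sorted(peer for peer, zone in zones.items() if overlaps(zone, query))
```

With the unit square split into four quadrants, a query spanning x in [0.4, 0.6] and y in [0.1, 0.2] touches only the two lower quadrants, so only their owners need to evaluate it.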

An Infrastructure for Reactive Information Environments

Rudi Belotti; Corsin Decurtins; Michael Grossniklaus; Moira C. Norrie

We introduce the concept of reactive information environments and a general infrastructure for experimentation with such systems. Its asynchronous state-based processing model is described along with the architectural requirements and main components of our infrastructure. These include a general context engine coupled together with a web publishing platform. An application for a public news service is used to motivate the requirements, explain the processing model and show how an application is implemented using the platform.

Keywords: Resource Description Framework; Server Side; Client Side; Context Element; Application Database.

- P2P, Ubiquitous and Mobile | Pp. 347-360

LoT-RBAC: A Location and Time-Based RBAC Model

Suroop Mohan Chandran; J. B. D. Joshi

Recent growth in location-based mobile services has introduced a significant need for location- and time-based access control to resources. The high mobility of users and services in emerging mobile applications, in particular, makes controlling who can access what information and resources from which locations a daunting challenge. Several RBAC-based models have been proposed that attempt to capture the location-based and/or time-based access control requirements of various applications. However, they have limited flexibility and granularity. In this paper, we propose a Location and Time-based RBAC (LoT-RBAC) model that addresses the access control requirements of highly mobile, dynamic environments by providing both location- and time-based control.

Keywords: location based access; role based access; temporal constraint.

- P2P, Ubiquitous and Mobile | Pp. 361-375
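
The abstract does not give LoT-RBAC's formal constraints, so the following is only a minimal sketch of the general idea of gating role activation on both location and time. It assumes a hypothetical per-role constraint made of a set of permitted locations and a single daily time window; `RoleConstraint` and `can_activate` are illustrative names, not the model's actual definitions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import time


@dataclass
class RoleConstraint:
    """Hypothetical per-role constraint: where and when a role may be activated."""
    locations: set[str]
    start: time  # inclusive
    end: time    # exclusive; windows that wrap past midnight are not handled here

    def allows(self, location: str, at: time) -> bool:
        return location in self.locations and self.start <= at < self.end


def can_activate(user, role, location, at, assignments, constraints):
    """A user may activate a role only if assigned to it AND the role's
    location and time constraints are both satisfied."""
    if role not in assignments.get(user, set()):
        return False
    constraint = constraints.get(role)
    return constraint is not None and constraint.allows(location, at)


# Usage: a nurse role usable only on ward_3 between 08:00 and 16:00.
assignments = {"alice": {"nurse"}}
constraints = {"nurse": RoleConstraint({"ward_3"}, time(8), time(16))}
print(can_activate("alice", "nurse", "ward_3", time(9, 30), assignments, constraints))  # True
print(can_activate("alice", "nurse", "lobby", time(9, 30), assignments, constraints))   # False
```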

Document Re-ranking by Generality in Bio-medical Information Retrieval

Xin Yan; Xue Li; Dawei Song

Document ranking is an important process in information retrieval (IR). It presents retrieved documents in order of their estimated degree of relevance to the query. Traditional document ranking methods are mostly based on similarity computations between documents and the query. In this paper we argue that similarity-based document ranking is insufficient in some cases, for two reasons. The first is the increased variety of information: there are now far too many different types of documents available for users to search. The second is the variety of users: in many cases a user may want to retrieve documents that are not only similar but also general or broad with respect to a certain topic. This is particularly the case in domains such as bio-medical IR. In this paper we propose a novel approach that re-ranks the retrieved documents by combining their similarity with their generality. Document generality can be quantified through an ontology-based analysis of the semantic cohesion of text. The retrieved documents are then re-ranked by a combined score of their similarity and the closeness of their generality to that of the query. Our experiments show encouraging performance on a large bio-medical document collection, OHSUMED, containing 348,566 medical journal references and 101 test queries.

Keywords: Generality; Relevance; Document Ranking.

- Document Retrieval Applications | Pp. 376-389
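
The abstract states that documents are re-ranked by a combination of similarity and the closeness of document generality to query generality, but not the exact formula. The sketch below assumes a simple linear mix, score = alpha * similarity + (1 - alpha) * (1 - |g_doc - g_query|), with both scores already normalized to [0, 1]; the weighting and the parameter `alpha` are assumptions for illustration only.

```python
def rerank(docs, query_generality, alpha=0.5):
    """Re-rank documents by a hypothetical linear mix of similarity and generality closeness.

    docs: list of (doc_id, similarity, generality), with scores assumed to lie in [0, 1].
    """
    def score(doc):
        _, similarity, generality = doc
        closeness = 1.0 - abs(generality - query_generality)  # 1.0 = same generality as the query
        return alpha * similarity + (1.0 - alpha) * closeness

    return [doc_id for doc_id, _, _ in sorted(docs, key=score, reverse=True)]


# Usage: d1 is slightly more similar, but d2's generality matches the query's.
docs = [("d1", 0.9, 0.2), ("d2", 0.8, 0.7)]
print(rerank(docs, query_generality=0.7))             # ['d2', 'd1']
print(rerank(docs, query_generality=0.7, alpha=1.0))  # ['d1', 'd2'] (similarity only)
```

Setting `alpha=1.0` recovers plain similarity ranking, which makes the effect of the generality term easy to compare.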

Representing and Reasoning About Privacy Abstractions

Yin Hua Li; Salima Benbernou

The emerging next generation Web technologies offer tremendous opportunities for automating information management in a variety of application domains including office tasks, travel, and digital government. One of the main challenges facing effective automation is privacy. Verifying the correct usage of collected personal data is a major concern for both individuals and organizations. In this paper, we present a framework for reasoning about privacy models, including providers’ privacy policies and users’ privacy preferences. More specifically, we use a Description Logic (DL) based notation to specify privacy abstractions. We provide a formalization of matching users’ privacy preferences against providers’ privacy policies using DLs’ reasoning mechanisms. We have implemented a Privacy Match Engine (PME) based on RACER.

Keywords: Privacy Policy; Description Logic; Reasoning Mechanism; Concept Description; Privacy Preference.

- Document Retrieval Applications | Pp. 390-403

Conceptual Query Refinement: The Basic Model

Nenad Stojanovic

In this paper we present a novel approach for the refinement of Boolean queries using ontologies. We introduce a conceptual model for defining users’ queries, which enables the disambiguation (and consequently the refinement) of a query to be performed at the level of the query’s meaning. In that way, the refinement process results in a set of meaningful, conceptual extensions of the initial query. Moreover, since a query is represented as a set of logic formulas, the query refinement process can be modeled as an inference process. This opens up a range of additional services that can enrich the query refinement process, such as cooperative answering.

- Document Retrieval Applications | Pp. 404-417

Peer-to-Peer Technology Usage in Web Service Discovery and Matchmaking

Brahmananda Sapkota; Laurentiu Vasiliu; Ioan Toma; Dumitru Roman; Chris Bussler

This paper presents a dynamic and scalable mechanism for discovery of semantically enriched descriptions of Web services. By employing Web Service Modeling Ontology (WSMO) as the underlying framework for describing both user requests and Web services, and combining it with the usage of Peer-to-Peer technology in this context, a scalable, distributed, dynamic and flexible discovery mechanism is obtained. A use case scenario is presented for supporting the viability of such a mechanism.

Keywords: Service Discovery; Cluster Manager; Business Process Integration; Goal Decomposition; Digital Enterprise Research Institute.

- Short Paper Session 1: Web Services and E-Commerce | Pp. 418-425

A Contract-Based Approach for Monitoring Collaborative Web Services Using Commitments in the Event Calculus

Mohsen Rouached; Olivier Perrin; Claude Godart

Web services (WS) are gaining popularity for supporting business interactions in cross-organisational distributed business processes. However, current WS specifications mostly concentrate on syntactic aspects. Because multiparty collaborations in business involve complex and long-lived interactions between autonomous partners, their behaviour must be specified to ensure the reliability of the collaboration. This paper presents an event-based framework associated with a semantic definition of the commitments expressed in the event calculus, to model and monitor multi-party contracts. This framework makes it possible to coordinate and regulate Web services in business collaborations by allowing the detection of actual and imminent violations.

Keywords: Service Monitoring; Collaboration and Coordination; Event Calculus; Commitments.

- Short Paper Session 1: Web Services and E-Commerce | Pp. 426-434

Asynchronous Web Services Communication Patterns in Business Protocols

Marco Brambilla; Giuseppe Guglielmetti; Christina Tziviskou

Asynchronous interactions are becoming more and more important in the realization of complex B2B Web applications, and Web services are at the moment the most innovative and well-established implementation platform for communication between applications. This paper studies the existing business protocols for Web services interactions, compares their expressive power, extracts a set of patterns for implementing asynchrony, studies the trade-offs and the typical usage scenarios of the various patterns, and finally proposes a sample application that has been implemented based on these patterns. The application has been designed using a high-level modeling language for Web applications, thus showing that the studied patterns can be applied at a conceptual level as well as directly at implementation level.

- Short Paper Session 1: Web Services and E-Commerce | Pp. 435-442

Towards the Automation of E-Negotiation Processes Based on Web Services – A Modeling Approach

Stefanie Rinderle; Morad Benyoucef

E-Negotiation is the process of conducting negotiations between business partners using electronic means. The interest in e-negotiation is motivated by its potential to provide business partners with more efficient processes, enabling them to draft better contracts in less time. Most of today’s e-marketplaces support some form of e-negotiation. Numerous attempts are being made to design e-marketplaces that support more than one negotiation protocol. The main problem in designing these e-marketplaces is the lack of a systematic approach. In our view, the e-marketplace enforces negotiation protocols and therefore should make them available for consultation by humans and for automation by software agents. Separating the protocols from the e-negotiation media is a step towards a configurable e-marketplace. In this paper we address the requirements for modeling e-negotiation protocols. Then we adopt the Statechart formalism as a modeling language and provide descriptions of five commonly used e-negotiation protocols. Finally, we discuss how we move from these Statechart descriptions of the protocols to modeling the interactions between the e-marketplace participants using a web service orchestration language.

Keywords: Service Oriented Architecture; Software Agent; Negotiation Strategy; Negotiation Protocol; Service Broker.

- Short Paper Session 1: Web Services and E-Commerce | Pp. 443-453

Modeling of User Acceptance of Consumer E-Commerce Website

Rui Chen

As the consumer e-commerce market grows intensely competitive, the capability of a website to capture consumers and to be accepted by them has been recognized as a critical issue. The user acceptance of a website not only brings immediate business opportunities but also has a great impact on future return visits and the buildup of consumer loyalty. This paper explores the measurement of consumer acceptance of e-commerce websites. By synthesizing previous research into a coherent body of knowledge and by recognizing the roles of contingency factors, we develop a new e-commerce website acceptance model that examines website success. The model extends the Garrity & Sanders model and is expected to shed light on website design practice.

Keywords: User Satisfaction; Technology Acceptance Model; Online Shopping; User Acceptance; Consumer Acceptance.

- Short Paper Session 1: Web Services and E-Commerce | Pp. 454-462

A Collaborative Recommender System Based on User Association Clusters

Chein-Shung Hwang; Pei-Jung Tsai

The ever-increasing popularity of the Internet has led to an explosive growth in the sheer volume of information. Recommender systems are one possible solution to the information overload problem. Traditional item-based collaborative filtering algorithms can provide quick and accurate recommendations by building a model offline. However, they may not be able to provide truly personalized information. To provide efficient and effective recommendations while maintaining a certain degree of personalization, in this paper we propose a hybrid model-based recommender system that first partitions the user set based on user ratings and then performs item-based collaborative algorithms on the partitions to compute a list of recommendations. We have applied our system to the well-known MovieLens dataset. Three measures (precision, recall and F1-measure) are used to evaluate the performance of the system. The experimental results show that our system performs better than traditional collaborative recommender systems.

Keywords: Association Rule; Recommender System; Collaborative Filter; Collaborative Filter Algorithm; Movielens Dataset.

- Short Paper Session 2: Recommendation and Web Information Extraction | Pp. 463-469
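
The second stage described above (item-based collaborative filtering run inside one user partition) might look roughly like this. It is a sketch under stated assumptions: the partition is taken as given rather than computed from user association clusters, item similarity is plain cosine similarity over ratings from users in that partition, and a prediction is the similarity-weighted average of the target user's own ratings. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors (0 means unrated)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(ratings, partition, user, top_n=2):
    """Item-based CF restricted to the users in `partition` (the target user's cluster)."""
    items = sorted({i for u in partition for i in ratings[u]})
    # Item column vectors built only from users inside the partition.
    column = {i: [ratings[u].get(i, 0.0) for u in partition] for i in items}
    rated = ratings[user]
    scores = {}
    for candidate in items:
        if candidate in rated:
            continue
        # Similarity-weighted average of the user's own ratings on the items they rated.
        pairs = [(cosine(column[candidate], column[j]), r) for j, r in rated.items() if j in column]
        weight = sum(abs(s) for s, _ in pairs)
        if weight:
            scores[candidate] = sum(s * r for s, r in pairs) / weight
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Usage with a toy three-user partition.
ratings = {
    "u1": {"A": 5, "B": 4},
    "u2": {"A": 4, "B": 5, "C": 5},
    "u3": {"A": 5, "C": 4, "D": 1},
}
print(recommend(ratings, ["u1", "u2", "u3"], "u2"))  # ['D']
```

Restricting the item vectors to one partition is what trades some global coverage for personalization: two items are judged similar only by users who resemble the target user.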

Space-Limited Ranked Query Evaluation Using Adaptive Pruning

Nicholas Lester; Alistair Moffat; William Webber; Justin Zobel

Evaluation of ranked queries on large text collections can be costly in terms of processing time and memory space. Dynamic pruning techniques allow both costs to be reduced, at the potential risk of decreased retrieval effectiveness. In this paper we describe an improved query pruning mechanism that offers a more resilient tradeoff between query evaluation costs and retrieval effectiveness than do previous pruning approaches.

Keywords: Mean Average Precision; Query Evaluation; Retrieval Effectiveness; Posting List; Query Stream.

- Short Paper Session 2: Recommendation and Web Information Extraction | Pp. 470-477

Automated Retraining Methods for Document Classification and Their Parameter Tuning

Stefan Siersdorfer; Gerhard Weikum

This paper addresses the problem of semi-supervised classification on document collections using retraining (also called self-training). A possible application is focused Web crawling, which may start with very few manually selected training documents but can be enhanced by automatically adding initially unlabeled, positively classified Web pages for retraining. Such an approach is by itself not robust and faces tuning problems regarding parameters like the number of selected documents, the number of retraining iterations, and the ratio of positive and negative classified samples used for retraining. The paper develops methods for automatically tuning these parameters, based on predicting the leave-one-out error for a retrained classifier and on avoiding diluting the classifier by selecting too many or too weak documents for retraining. Our experiments with three different datasets confirm the practical viability of the approach.

Keywords: Support Vector Machine; Test Document; Unlabeled Data; Training Document; Text Summarization.

- Short Paper Session 2: Recommendation and Web Information Extraction | Pp. 478-486

NET – A System for Extracting Web Data from Flat and Nested Data Records

Bing Liu; Yanhong Zhai

This paper studies automatic extraction of structured data from Web pages. Each such page may contain several groups of structured data records. Existing automatic methods still have several limitations. In this paper, we propose a more effective method for the task. Given a page, our method first builds a tag tree based on visual information. It then performs a post-order traversal of the tree and matches subtrees in the process using a tree edit distance method and visual cues. After the process ends, data records are found and data items in them are aligned and extracted. The method can extract data from both flat and nested data records. Experimental evaluation shows that the method performs the extraction task accurately.

- Short Paper Session 2: Recommendation and Web Information Extraction | Pp. 487-495
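
To make the subtree-matching step more concrete, here is a sketch of simple tree matching, a restricted top-down formulation of tree edit distance that counts the maximum number of matched node pairs between two ordered tag trees. It is swapped in for illustration and is not necessarily the exact measure NET uses; the visual cues are omitted, and trees are assumed to be `(tag, [children])` tuples.

```python
def simple_tree_matching(a, b):
    """Number of node pairs in a maximum top-down matching of ordered trees a and b.

    Trees are (tag, [children]) tuples; nodes with different tags never match.
    """
    tag_a, kids_a = a
    tag_b, kids_b = b
    if tag_a != tag_b:
        return 0
    m, n = len(kids_a), len(kids_b)
    # M[i][j] = best matching between the first i children of a and the first j of b.
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(
                M[i][j - 1],
                M[i - 1][j],
                M[i - 1][j - 1] + simple_tree_matching(kids_a[i - 1], kids_b[j - 1]),
            )
    return M[m][n] + 1  # +1 for the matched roots


def size(tree):
    return 1 + sum(size(child) for child in tree[1])


def normalized_similarity(a, b):
    """Scale the match count to [0, 1] by the average tree size."""
    return 2 * simple_tree_matching(a, b) / (size(a) + size(b))


# Usage: two table rows that differ only by a link inside the second cell.
row1 = ("tr", [("td", []), ("td", [("a", [])])])
row2 = ("tr", [("td", []), ("td", [])])
print(simple_tree_matching(row1, row2))             # 3
print(round(normalized_similarity(row1, row2), 3))  # 0.857
```

Sibling subtrees whose normalized similarity exceeds some threshold (an assumed tuning parameter) would be treated as repeated data records.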

Blog Map of Experiences: Extracting and Geographically Mapping Visitor Experiences from Urban Blogs

Takeshi Kurashima; Taro Tezuka; Katsumi Tanaka

The prevalence of weblogs (blogs) has enabled people to share the personal experiences of tourists at specific locations and times. Such information was traditionally unavailable, except indirectly through local newspapers and periodicals. This paper describes a method of spatially and temporally obtaining specific experiences by extracting association rules from the content of blog articles. For example, we can read about visitors’ activities and evaluations of sightseeing spots. By geographically mapping their experiences, the proposed system enables observation of tourist activities and impressions of specific locations, which can often be more diverse than local guidebooks and more trustworthy than advertisements.

Keywords: Association Rule; Association Rule Mining; APRIORI Algorithm; Transaction Database; Kyoto City.

- Short Paper Session 2: Recommendation and Web Information Extraction | Pp. 496-503
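
The rule-extraction step can be illustrated with a small Apriori-style pass restricted to item pairs. This is a minimal sketch, assuming each blog article has already been reduced to a set of place and activity terms (the transactions below are invented toy data, not the paper's), and using hypothetical support and confidence thresholds.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Mine single-item rules lhs -> rhs using the Apriori pruning idea:
    only pairs of individually frequent items can be frequent."""
    n = len(transactions)
    item_counts = Counter(i for t in transactions for i in set(t))
    frequent = {i for i, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t) & frequent), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, count / n, confidence))
    return sorted(rules)


# Usage: each transaction is the set of terms extracted from one (toy) blog article.
articles = [
    {"Arashiyama", "bamboo grove", "walk"},
    {"Arashiyama", "bamboo grove"},
    {"Arashiyama", "boat ride"},
    {"Gion", "dinner"},
    {"Gion", "dinner", "walk"},
]
for lhs, rhs, support, confidence in pair_rules(articles):
    print(f"{lhs} -> {rhs}  support={support:.2f} confidence={confidence:.2f}")
```

Rules whose left-hand side is a place name then give the place-to-experience associations that can be drawn on a map.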

Reliable Multicast and Its Probabilistic Model for Job Submission in Peer-to-Peer Grids

Peter Merz; Katja Gorunova

We present an efficient algorithm for job submissions in Peer-to-Peer (desktop) grids based on limited multicasts. Our approach combines the advantages of two overlay architectures: Chord-like structured networks and unstructured networks with epidemic communication. To predict the multicast properties and to optimize its distribution schedule, we present a probabilistic model of the process of information propagation within the overlay. We show the efficiency and the fault-tolerance of our proposed method and demonstrate the high accuracy of the predictive model.

- Short Paper Session 3: P2P, Grid and Distributed Management | Pp. 504-511

Peer-Sensitive ObjectRank – Valuing Contextual Information in Social Networks

Andrei Damian; Wolfgang Nejdl; Raluca Paiu

Building on previous work on how to model contextual information for desktop search and how to implement semantically rich information exchange in social networks, we define a new algorithm, Peer-Sensitive ObjectRank, for ranking resources on the desktop. The new algorithm takes into account different trust values for each peer, generalizing previous biased PageRank algorithms. We investigate in detail how different assumptions about trust distributions influence the ranking of information received from different peers, and which consequences they have with respect to the integration of new resources into one peer’s initial network of resources. We also investigate how assumptions concerning the size and quality of a peer’s resource network influence ranking after information exchange, and conclude with directions for further research.

Keywords: Social Network; Contextual Information; Ranking Algorithm; Malicious Peer; Initial Rank.

- Short Paper Session 3: P2P, Grid and Distributed Management | Pp. 512-519
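
The abstract describes the algorithm as generalizing biased PageRank with per-peer trust values. The sketch below shows only that generic family: a power-iteration PageRank whose teleport (bias) vector is proportional to a trust weight per resource. It assumes each resource simply inherits the trust value of the peer it came from; it is not Peer-Sensitive ObjectRank itself, which also involves typed ObjectRank authority transfer.

```python
def biased_pagerank(links, trust, damping=0.85, iterations=100):
    """Power-iteration PageRank whose teleport vector is proportional to `trust`.

    links: node -> list of nodes it points to (every target must also be a key).
    trust: node -> non-negative weight, e.g. the trust in the peer that supplied it.
    """
    nodes = sorted(links)
    total = sum(trust[n] for n in nodes)
    if total <= 0:
        raise ValueError("at least one resource needs positive trust")
    bias = {n: trust[n] / total for n in nodes}
    rank = dict(bias)
    for _ in range(iterations):
        new = {n: (1 - damping) * bias[n] for n in nodes}
        for n in nodes:
            targets = links[n]
            if targets:
                share = damping * rank[n] / len(targets)
                for m in targets:
                    new[m] += share
            else:
                # Dangling node: redistribute its rank according to the trust bias.
                for m in nodes:
                    new[m] += damping * rank[n] * bias[m]
        rank = new
    return rank


# Usage: a three-resource cycle where only resource "a" comes from a trusted peer.
cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
rank = biased_pagerank(cycle, {"a": 1.0, "b": 0.0, "c": 0.0})
print(sorted(rank, key=rank.get, reverse=True))  # ['a', 'b', 'c']
```

With uniform trust the bias vector becomes uniform and the computation reduces to ordinary PageRank.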

Automatic Performance Tuning for J2EE Application Server Systems

Yan Zhang; Wei Qu; Anna Liu

Performance tuning for J2EE application server systems is a complex manual task. It is, unfortunately, a necessary task in order to achieve optimal performance in dynamic workload environments. In this paper, we present our architecture and approach for implementing autonomic behavior in J2EE application server systems. Our experimental results demonstrate the feasibility and practicality of our architecture and approach in automatic performance tuning of J2EE application server systems.

Keywords: Application Server; Service Level Agreement; Autonomic Computing; Automatic Performance; Generic Control System.

- Short Paper Session 3: P2P, Grid and Distributed Management | Pp. 520-527

Xebu: A Binary Format with Schema-Based Optimizations for XML Data

Jaakko Kangasharju; Sasu Tarkoma; Tancred Lindholm

XML is currently being used as the message syntax for Web services. To enable small mobile devices to use Web services, this XML use must not be too resource-consuming. Due to several measurements indicating otherwise, alternate serialization formats for XML data have been proposed. We present here a format for XML data designed from the ground up for the mobile environment. The format is simple, yet gives acceptable document sizes and is efficiently processable. An automaton-based approach gives further improvements when full or partial schema information is available. We provide performance measurements verifying these claims and also consider some issues arising from the use of an alternate XML serialization format.

Keywords: XML and Semi-structured Data; Web Services; Mobile Environment; XML Serialization Format; Binary XML.

- Short Paper Session 3: P2P, Grid and Distributed Management | Pp. 528-535

Maintaining Versions of Dynamic XML Documents

Laura Irina Rusu; Wenny Rahayu; David Taniar

The ability to store information contained in XML documents for future reference is becoming a very important issue as the number of applications that use and exchange data in XML format grows continuously. Moreover, the contents of XML documents are dynamic and change over time. However, storing all document versions in an XML data warehouse would introduce a high level of redundancy. Nevertheless, the ability to store XML documents together with their different versions over time is often required. Our paper proposes a novel approach for storing changes of dynamic XML documents over time with less overhead, so that earlier versions can be easily queried. We show how our proposed consolidated delta is built, with the steps and rules of the algorithm involved, and we demonstrate the efficiency of the versioning approach in terms of storage and retrieval using some test data.

- Short Paper Session 3: P2P, Grid and Distributed Management | Pp. 536-543

Identifying Value Mappings for Data Integration: An Unsupervised Approach

Jaewoo Kang; Dongwon Lee; Prasenjit Mitra

The Web is a distributed network of information sources where the individual sources are autonomously created and maintained. Consequently, syntactic and semantic heterogeneity of data among sources abounds. Most of the current data cleaning solutions assume that the data values referencing the same object bear some textual similarity. However, this assumption is often violated in practice. “Two-door front wheel drive” can be represented as “2DR-FWD” or “R2FD”, or even as “CAR TYPE 3” in different data sources. To address this problem, we propose a novel two-step automated technique that exploits statistical dependency structures among objects, which are invariant to the tokens representing the objects. The algorithm achieved high accuracy in our empirical study, suggesting that it can be a useful addition to existing information integration techniques.

Keywords: Singular Value Decomposition; Latent Semantic Indexing; Semantic Heterogeneity; Human Computer Interface; Identity Uncertainty.

- Short Paper Session 4: Advanced Issues | Pp. 544-551

Relaxing Result Accuracy for Performance in Publish/Subscribe Systems

Engie Bashir; Jihad Boulos

Since the evaluation of XPath expressions is highly dependent upon their size and navigational structures that include ancestor-descendant relationships (“//”) and wildcard steps (“/∗”), we introduce a novel and complementary approach to optimizing XPath queries by rewriting and minimizing such structural occurrences. This rewriting approach depends upon the existence of a statistical schema, which we derive from a set of pre-processed XML documents. However, an imprecision in the schema extraction may lead to a loss of accuracy in the results. Through experimentation and analysis, we validate the scalability and efficiency of our approach.

Keywords: Schema Extraction; Deterministic Finite Automaton; XPath Query; XPath Expression; Query Matcher.

- Short Paper Session 4: Advanced Issues | Pp. 552-559

Using Non-random Associations for Predicting Latency in WANs

Vladimir Zadorozhny; Louiqa Raschid; Avigdor Gal; Qiang Ye; Hyma Murthy

In this paper, we propose a scalable performance management tool for Wide Area Applications. Our objective is to scalably identify non-random associations between pairs of individual Latency Profiles (iLPs) (i.e., latency distributions experienced by clients when connecting to a server) and exploit them in latency prediction. Our approach utilizes Relevance Networks (RNs) to manage tens of thousands of iLPs. Non-random associations between iLPs can be identified by topology-independent measures such as correlation and mutual information. We demonstrate that these non-random associations do indeed have a significant impact in reducing the error of latency prediction.

Keywords: Mutual Information; Border Gateway Protocol; Correlation Threshold; Relevance Network; Latency Prediction.

- Short Paper Session 4: Advanced Issues | Pp. 560-568
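
A relevance network of the kind described above can be sketched by linking every pair of profiles whose association measure exceeds a threshold. This illustration uses Pearson correlation (one of the two measures the abstract names) and assumes, for simplicity, that each iLP is summarized as a vector of latency samples taken over the same time slots; the profile names, sample values, and threshold are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sample vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def relevance_network(profiles, threshold=0.9):
    """Edges (a, b, r) for every profile pair with |correlation| >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Usage: latency samples (ms) for three client->server profiles over four time slots.
profiles = {
    "c1->s1": [10, 12, 30, 11],
    "c2->s1": [20, 22, 41, 21],
    "c3->s2": [50, 52, 49, 51],
}
print([(a, b) for a, b, _ in relevance_network(profiles)])  # [('c1->s1', 'c2->s1')]
```

Here the two clients of the same server rise and fall together, so they are linked, while the unrelated profile stays isolated; a predictor could then borrow recent observations from a profile's neighbors.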

An Online Face Recognition System Using Multiple Compressed Images over the Internet

Hwangjun Song; Sun Jae Chung; Young-Ho Park

In this work, we propose an effective online face recognition system over the error-prone Internet. The proposed system uses multiple JPEG-compressed images to improve the recognition rate, and image compression and resizing to reduce the transmission delay and the processing delay. Furthermore, the robustness is improved by effective packetization of compressed image data. First of all, we examine the face recognition rate in terms of quantization parameter and image size, and then implement the effective multiple-image-based face recognition system. Finally, experimental results are provided to show the performance comparison with existing algorithms over the error-prone Internet environment.

- Short Paper Session 4: Advanced Issues | Pp. 569-576

The Information Market: Its Basic Concepts and Its Challenges

P. van Bommel; B. van Gils; H. A. Proper; M. van Vliet; Th. P. van der Weide

This paper discusses the concept of an information market. The authors of this paper have been involved in several aspects of information retrieval research. Continuing this research tradition, we now take a wider perspective on this field and re-position it as a market where the demand for information meets the supply of information. The paper starts by exploring the notion of a market in general, followed by a specialization of these considerations to the information market, where we also position some of the existing work.

- Short Paper Session 4: Advanced Issues | Pp. 577-583

List Data Extraction in Semi-structured Document

Hui Xu; Juan-Zi Li; Peng Xu

There is a tremendous number of semi-structured documents online, such as business annual reports, online airport listings, catalogs, and hotel directories. Lists, which have structured characteristics, are used to store highly structured, database-like information in many semi-structured documents. This paper is about list data extraction from semi-structured documents. By list data extraction, we mean extracting data from lists and grouping it by rows and columns. List data extraction benefits text mining applications on semi-structured documents. Recently, several methods have been proposed to extract list data by utilizing word layout and arrangement information [1, 2]. However, to the best of our knowledge, few previous studies have sufficiently investigated making use of not only layout and arrangement information but also the semantic information of words. In this paper, we propose a clustering-based method that uses both the layout information and the semantic information of words for this extraction task. We show experimental results on plain-text annual reports from the Shanghai Stock Exchange, in which 73.49% of the lists were extracted correctly.

- Poster Flash Session 1 | Pp. 584-585

Optimization Issues for Keyword Search over Tree-Structured Documents

Sujeet Pradhan; Katsumi Tanaka

In this paper, we discuss one of several optimization issues regarding our algebraic query model for keyword search over tree-structured documents. In particular, we focus on the properties of a class of filters. The filters in this class not only restrict the size of query results, but also are capable of reducing the cost of query processing.

- Poster Flash Session 1 | Pp. 586-587

Semantic Integration of Schema Conforming XML Data Sources

Dimitri Theodoratos; Theodore Dalamagas; I-Ting Liu

A challenging problem in Web engineering is the integration of XML data sources. Even if these data sources conform to schemas, their schemas and the corresponding XML documents may be structured differently. In this paper, we address the problem of integrating XML data sources (a) by adding semantic information to document schemas, and (b) by using a query language that allows a partial specification of tree patterns. The semantic information allows elements to be grouped into so-called schema dimensions. Our approach allows querying data sources with different schemas in an integrated way. Users posing queries have the flexibility to specify structural constraints fully, partially, or not at all. Our approach was initially developed for arbitrarily structured data sources [1]. Here, we show how it can be applied to tree-structured data sources that comply with schemas.

- Poster Flash Session 1 | Pp. 588-589

Extract Salient Words with WordRank for Effective Similarity Search in Text Data

Xiaojun Wan; Jianwu Yang

We propose a method named WordRank to extract a few salient words from the target document and then use these words to retrieve similar documents based on popular retrieval functions. The set of extracted words is a concise and topic-oriented representation of the target document and reduces the ambiguous and noisy information in the document, so as to improve the retrieval performance. Experiments and results demonstrate the high effectiveness of the proposed approach.

- Poster Flash Session 1 | Pp. 590-591

Intensional P2P Mappings Between RDF Ontologies

Zoran Majkić

We consider a Peer-to-Peer (P2P) database system with RDF ontologies and with a semantic characterization of P2P mappings based on logical views over a local peer’s ontology. Such virtual-predicate-based mappings require an embedding of RDF ontologies into first-order predicate logic, or into some of its sublanguages such as logic programs for deductive databases. We consider a peer as a local epistemic logic system with its own beliefs based on RDF tuples, independent of other peers and their beliefs. This motivates the need for a semantic characterization of P2P mappings based not on the extension but on the meaning of the concepts used in the mappings, that is, based on intensional logic. We show that it adequately models a robust, weakly coupled framework of RDF ontologies and supports decidable query answering. Using conventional first-order logic (FOL) as the semantic underpinning for RDF has many advantages: FOL is well established and well understood. We consider an RDF ontology as a finite set of triples ⟨r, p, v⟩, where r is a resource name (for a class, an instance, or a value), p is a property (InstanceOf or Property in RDF, or Subclass or Property in RDFS), and v is a value (which could also be a resource name). We denote by $\mathcal{T}$ the set of all triples that satisfy these requirements.

- Poster Flash Session 1 | Pp. 592-594

Meta-modeling of Educational Practices for Adaptive Web Based Education Systems

Manuel Caeiro-Rodríguez; Martín Llamas-Nistal; Luis Anido-Rifón

This paper proposes a component-based architecture for adaptive Web-based education systems that support a particular kind of EML models. Our work is concerned with the development of an EML meta-model to provide an enhanced support for the modeling of collaborative practices. The proposal is based on the identification of perspectives.

- Poster Flash Session 1 | Pp. 595-596

An On-line Intelligent Recommendation System for Digital Products Using Fuzzy Logic

Yukun Cao; Yunfeng Li; Chengliang Wang

Developing an intelligent recommendation system is a good way to overcome the problem of product information overload. We believe that a personalized recommendation system should be built according to the special features of a certain kind of product, thereby forming specialized recommendation systems for different products. In this paper, we propose a system for digital products such as laptops, digital cameras, and PDAs. The approach uses fuzzy logic to retrieve optimal products based on the consumer’s current preferences, gathered from system-user interactions. Experimental results show the promise of our system.

- Poster Flash Session 1 | Pp. 597-598

Consensus Making on the Semantic Web: Personalization and Community Support

Anna V. Zhdanova; Francisco Martín-Recuerda

We propose a framework for ontology-based consensus making, which is grounded on personalization and community support. Corresponding software is designed to be naturally deployed in community Web environments.

Keywords: Community Support; Ontology Alignment; Ontology View; Ontology Management; Consensus Framework.

- Poster Flash Session 1 | Pp. 599-600

Dictionary-Based Voting Text Categorization in a Chemistry-Focused Search Engine

Chunyan Liang; Li Guo; Zhaojie Xia; Xiaoxia Li; Zhangyuan Yang

A chemistry-focused search engine, named ChemEngine, has been developed to help chemists get chemical information on the Internet more conveniently and precisely. Text categorization is used in ChemEngine to facilitate users’ searches. The semantic similarity and noisy data in chemical web pages make traditional classifiers perform poorly on them. To classify chemical web pages more accurately, a new text categorization approach based on dictionaries and voting is proposed and integrated into ChemEngine.

Keywords: Semantic Similarity; Traditional Classifier; Result List; Train Data; Vote Method.

- Poster Flash Session 1 | Pp. 601-602
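
The abstract names the ingredients (per-category dictionaries and voting) without the details, so the following is only a minimal sketch of that idea: every token found in a category's dictionary casts one vote for that category, and the category with the most votes wins. The dictionaries, single-word tokenization, and the tie rule (leave the page unclassified) are all assumptions made for illustration.

```python
import re
from collections import Counter


def classify(text, dictionaries):
    """Return the category whose dictionary terms occur most often in `text`,
    or None when there are no votes or the top two categories tie."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())  # single-word terms only in this sketch
    votes = Counter()
    for token in tokens:
        for category, terms in dictionaries.items():
            if token in terms:
                votes[category] += 1
    ranked = votes.most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


# Usage with two tiny hypothetical category dictionaries.
dictionaries = {
    "organic": {"benzene", "alkane", "ester"},
    "analytical": {"chromatography", "spectroscopy", "titration"},
}
print(classify("Synthesis of an ester from benzene, checked by spectroscopy", dictionaries))  # organic
```

Because each category is scored only by its own vocabulary, noisy words elsewhere on a page cast no votes, which is one plausible reason such a scheme can be more robust than a general-purpose classifier on chemical pages.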

An Approach to Securely Interconnect Geo Web Services

Adrian Spalka; Stefan Schulz

Web Services play a growing role in the geographic community. Efforts to establish a Spatial Data Infrastructure (SDI) are coordinated by the Open Geospatial Consortium (OGC). However, as the infrastructure gets established and more content providers wish to offer their products, questions of security arise. Geographical services often stand to gain tremendously from the composition and delegation of requests, where one piece of data is constructed from several sets of data using different sources. However, current OGC standards regulate only message transfers, without taking access control, security and privacy into account.

- Poster Flash Session 2 | Pp. 603-604

Automatic Keyword Extraction by Server Log Analysis

Chen Ding; Jin Zhou; Chi-Hung Chi

Traditionally, keywords are extracted from the full text of a document. In the web environment, however, additional sources can provide a more complete view of a web page’s contents. In this paper, we propose analyzing web server logs to extract keywords for entry pages from anchor texts and query terms, and propagating these terms along user access paths to other linked pages. The major benefits of this method are that temporal changes can be reflected in the extracted terms, and that the terms reflect users’ views of a page’s contents rather than the author’s.

- Poster Flash Session 2 | Pp. 605-606
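The propagation step in the server-log poster can be illustrated with a small sketch. It makes two assumptions. First, entry-page terms (from anchor texts and query terms) already carry weights. Second, each hop along an access path multiplies the weight by a hypothetical decay factor. The abstract does not state the poster's actual weighting scheme.

```python
from collections import defaultdict

DECAY = 0.5  # hypothetical per-hop decay; not specified in the abstract

def propagate(entry_terms, sessions, decay=DECAY):
    """entry_terms: {entry_page: {term: weight}} from anchor texts / query terms.
    sessions: access paths, each a list of pages starting at an entry page.
    Returns {page: {term: accumulated weight}}; paths that do not start at a
    known entry page are skipped."""
    scores = defaultdict(lambda: defaultdict(float))
    for path in sessions:
        if not path or path[0] not in entry_terms:
            continue
        factor = 1.0
        for page in path:
            for term, weight in entry_terms[path[0]].items():
                scores[page][term] += weight * factor
            factor *= decay  # pages further from the entry page inherit less
    return {page: dict(terms) for page, terms in scores.items()}
```

Because scores accumulate over all logged sessions, re-running the computation on a recent log window is one simple way to let the extracted terms follow temporal changes.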

Approximate Intensional Representation of Web Search Results

Yasunori Matsuike; Satoshi Oyama; Katsumi Tanaka

In this paper, we propose the notion of an “Approximate Intensional Representation” (abbreviated AIR) for Web search results. Intuitively, an AIR for a user query q is another query q’ whose expression approximately represents the search result (the retrieved Web pages) of q. The purpose of the AIR is to help users understand the outline of the retrieved Web pages in the form of a query.

- Poster Flash Session 2 | Pp. 607-608
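One way to make the AIR idea concrete is to treat "approximately represented" as set similarity between result lists. The sketch assumes Jaccard similarity and a hypothetical pool of candidate queries with known result sets. The poster may define the approximation differently.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def best_air(result: set, candidates: dict[str, set]) -> tuple[str, float]:
    """Return (query, similarity) for the candidate query q' whose result set
    best approximates the result set of the original query q."""
    return max(((query, jaccard(result, docs)) for query, docs in candidates.items()),
               key=lambda pair: pair[1])
```

For instance, if q returns pages p1 to p4, a candidate returning p1 to p3 scores 3/4 and would be preferred over one returning p4 and p5 (1/5).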

A Unique Design for High-Performance Decentralized Resources Locating: A Topological Perspective

Xinli Huang; Fanyuan Ma; Wenju Zhang

In this paper, we propose a unique protocol for high-performance decentralized resource location that focuses on building overlays with good topological properties. Although our protocol operates with only local knowledge, it enlarges the search scope and reduces network traffic by better matching the heterogeneity and the physical topology of the network, as our simulations confirm.

Keywords: Topological Property; Node Degree; Unique Design; Search Radius; Underlying Network.

- Poster Flash Session 2 | Pp. 609-610

Searching the Web Through User Information Spaces

Athanasios Papagelis; Christos Zaroliagis

In recent years, web search engines have moved from simple but inefficient syntactic analysis (first generation) to more robust and usable web graph analysis (second generation). Much current research focuses on so-called third-generation search engines that, in principle, inject “human characteristics” into how results are obtained and presented to the end user. Approaches in this direction include (among others) an alteration of PageRank [1] that takes user-specific characteristics into account and biases the page ordering according to user preferences (an approach that, however, does not scale well with the number of users). The approach is further exploited in [3], where several PageRanks are computed for a given number of distinct search topics. A similar idea is used in [6], where the PageRank computation takes into account the content of the pages and the query terms the surfer is looking for. In [4], a decomposition of PageRank into basic components is suggested that may allow the different PageRank computations to scale to a larger number of topics or even to individual users. Another approach to web search is presented in [2], which describes a rich extension of the web, called the semantic web, and the application of searching in this new setting.

- Poster Flash Session 2 | Pp. 611-612
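The PageRank variants cited above share one mechanism: the random surfer's teleport step is biased toward pages that a user (or topic) prefers. The sketch below is a generic personalized PageRank computed by power iteration. It is not the method of this poster or of any specific cited work. The damping factor 0.85 and the iteration count are conventional assumptions, and preference weights must be positive.

```python
def personalized_pagerank(links, preference, alpha=0.85, iterations=100):
    """links: {page: [out-linked pages]}; preference: {page: positive weight}.
    The teleport step jumps according to the normalized preference vector
    instead of uniformly; dangling pages also redistribute their rank via
    that vector, so the ranks always sum to 1."""
    pages = set(links) | {q for outs in links.values() for q in outs} | set(preference)
    total = sum(preference.values())
    teleport = {p: preference.get(p, 0.0) / total for p in pages}
    rank = dict(teleport)
    for _ in range(iterations):
        new = {p: (1 - alpha) * teleport[p] for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = alpha * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling page: follow the preference vector
                for q in pages:
                    new[q] += alpha * rank[p] * teleport[q]
        rank = new
    return rank
```

Computing one such vector per user is what makes the per-user approach expensive. Computing a few per topic, or composing them from basic components, are the scaling strategies the abstract attributes to [3] and [4].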

REBIEX: Record Boundary Identification and Extraction Through Pattern Mining

Parashuram Kulkarni

Information on the web is often arranged in a structure with a particular alignment and order. For example, Web pages produced by Web search engines, CGI scripts, etc. generally contain multiple records of information, each representing one unit of information and sharing a distinct visual pattern. The pattern formed by these records may lie in the structure of the documents or in the repetitive nature of their content. For effective information extraction, it is essential to identify the record boundaries of these units of information and to apply extraction rules to individual record elements. In this paper I present REBIEX, a system that automatically identifies and extracts the repeated patterns formed by data records in a fuzzy way, allowing for slight inconsistencies. It uses the structural elements of web documents as well as the content and categories of their text elements, without requiring any training data or human intervention. Unlike current techniques, it exploits the fact that not only the HTML structure repeats, but also the content matter of the document repeats consistently. The system also employs a novel algorithm that mines repeating patterns in a fuzzy way with high accuracy.

Keywords: Pattern Mining; Text Element; Repetitive Nature; Multiple Record; Domain Specific Information.

- Poster Flash Session 2 | Pp. 613-615
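The sketch below is a much simpler, exact-match baseline for the record-boundary problem that REBIEX addresses. It finds the most frequent contiguous run of start tags in a page. It does not implement REBIEX's fuzzy mining or its use of text content. The length bounds and the tie-breaking rule (higher count first, then longer run) are assumptions.

```python
from collections import Counter
from html.parser import HTMLParser

class TagSequence(HTMLParser):
    """Records the sequence of start tags in a document."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def repeated_unit(html, min_len=2, max_len=6, min_repeats=2):
    """Return the contiguous start-tag n-gram that repeats most often
    (preferring longer n-grams on equal counts), or None if nothing repeats
    at least `min_repeats` times. Matching is exact, not fuzzy."""
    parser = TagSequence()
    parser.feed(html)
    tags = parser.tags
    best, best_key = None, (0, 0)
    for n in range(min_len, max_len + 1):
        counts = Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))
        for gram, count in counts.items():
            key = (count, n)
            if count >= min_repeats and key > best_key:
                best, best_key = gram, key
    return best
```

A single missing `<span>` in one record would break this exact matcher. That kind of inconsistency is why the poster mines patterns fuzzily.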

Discovering the Biomedical Deep Web

Rajesh Ramanand; King-Ip Lin

The rapid growth of biomedical information in the Deep Web poses unprecedented challenges for traditional search engines. This paper describes a new Deep Web resource discovery system for biomedical information. We designed two hypertext mining applications: a Focused Crawler that selectively seeks out relevant pages using a classifier that evaluates a document’s relevance to biomedical information, and a Query Interface Extractor that extracts information from a page to detect the presence of a Deep Web database. Our anecdotal evidence suggests that combining focused crawling with query interface extraction is very effective for building high-quality collections of Deep Web resources on biomedical topics.

Keywords: Biomedical Data; Query Interface; Decision Tree Classifier; Biomedical Information; Trained Classifier.

- Poster Flash Session 2 | Pp. 616-617
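The abstract does not describe how the Query Interface Extractor decides that a page fronts a database. Purely for illustration, the sketch below substitutes a hand-written heuristic built on Python's standard `html.parser`. A form with a text or search box, or a drop-down, and no password field is treated as a possible search interface to a backend database. The heuristic is an assumption, not the authors' method.

```python
from html.parser import HTMLParser

class FormFinder(HTMLParser):
    """Collects, for each <form>, the types of its <input> fields and the
    number of <select> elements it contains."""
    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"inputs": [], "selects": 0}
        elif self._current is not None:
            if tag == "input":
                # An <input> without a type attribute defaults to a text box.
                self._current["inputs"].append((attrs.get("type") or "text").lower())
            elif tag == "select":
                self._current["selects"] += 1

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None

def has_query_interface(html: str) -> bool:
    """Heuristic: a form with a text/search box or a drop-down, and no
    password field, may be a query interface to a Deep Web database."""
    finder = FormFinder()
    finder.feed(html)
    for form in finder.forms:
        searchable = any(t in ("text", "search") for t in form["inputs"]) or form["selects"] > 0
        if searchable and "password" not in form["inputs"]:
            return True
    return False
```

A learned classifier, as the poster's keywords suggest, would replace the hard-coded rule in `has_query_interface` with features such as these field counts.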

A Potential IRI Based Phishing Strategy

Anthony Y. Fu; Xiaotie Deng; Wenyin Liu

We anticipate a potential phishing strategy that obfuscates Web links using Internationalized Resource Identifiers (IRIs). In the IRI scheme, many characters have very similar glyphs but different Unicode code points, so certain distinct IRIs can look nearly identical. Phishing attacks based on this strategy are very likely to appear in the near future as the use of IRIs grows. We report this potential phishing strategy to prompt further analysis of related countermeasures.

Keywords: Internet security; Anti-phishing; Internationalized Resource Identifier (IRI).

- Poster Flash Session 2 | Pp. 618-619
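The homoglyph problem behind the IRI poster is easy to reproduce with the standard `unicodedata` module. The host names are illustrative. The mixed-script check is a simple heuristic countermeasure assumed here, not one proposed in the poster, and it would not catch a spoof written entirely in one non-Latin script.

```python
import unicodedata

def scripts(label: str) -> set[str]:
    """Return the first word of the Unicode character name (e.g. 'LATIN',
    'CYRILLIC') for every alphabetic character in the label."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

def looks_mixed_script(host: str) -> bool:
    """Flag a host name if any dot-separated label mixes letters from more
    than one script."""
    return any(len(scripts(label)) > 1 for label in host.split("."))

genuine = "paypal.com"
spoofed = "p\u0430ypal.com"  # U+0430 CYRILLIC SMALL LETTER A replaces Latin 'a'
```

The two strings usually render identically, yet `genuine == spoofed` is `False` and only `spoofed` is flagged by `looks_mixed_script`.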

Multiway Iceberg Cubing on Trees

Pauline LienHua Chou; Xiuzhen Zhang

The Star-cubing algorithm performs multiway aggregation on trees but incurs huge memory consumption. We propose a new algorithm MG-cubing that achieves maximal multiway aggregation. Our experiments show that MG-cubing achieves similar and very often better time and memory efficiency than Star-cubing.

- Poster Flash Session 2 | Pp. 620-622
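The brute-force definition below may help readers unfamiliar with iceberg cubes. It computes every group-by over every subset of dimensions and keeps only the cells whose count meets a minimum support. This is only the baseline definition. Star-cubing and MG-cubing exist precisely to avoid this exhaustive enumeration. The `*` marker for an aggregated-away dimension follows common data-cube notation.

```python
from collections import Counter
from itertools import combinations

def iceberg_cube(rows, dims, min_sup):
    """Brute-force iceberg cube: for every subset of dimensions, count each
    group and keep groups whose count >= min_sup. '*' marks a dimension that
    has been aggregated away. Returns {cell: count}."""
    result = {}
    for k in range(len(dims) + 1):
        for subset in combinations(range(len(dims)), k):
            counts = Counter(tuple(row[i] for i in subset) for row in rows)
            for key, count in counts.items():
                if count >= min_sup:
                    cell = ["*"] * len(dims)
                    for i, value in zip(subset, key):
                        cell[i] = value
                    result[tuple(cell)] = count
    return result
```

With d dimensions this touches all 2^d cuboids. Multiway aggregation on trees avoids that by computing several cuboids in one pass over shared prefixes.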

Building a Semantic-Rich Service-Oriented Manufacturing Environment

Zhonghua Yang; Jing-Bing Zhang; Robert Gay; Liqun Zhuang; Hui Mien Lee

Service orientation has emerged as a promising new paradigm for enterprise integration in the manufacturing sector. In this paper, we focus on the approach and technologies for constructing a service-oriented manufacturing environment. Service orientation is achieved through virtualization, in which everything, including machines, equipment, devices, various data sources, applications, and processes, is virtualized as standards-based Web services. The virtualization approach is based on the emerging Web Services Resource Framework (WS-RF). A case study of virtualizing an AGV (automated guided vehicle) system using WS-RF is described. The use of Semantic Web Services technologies to enhance manufacturing Web services for a semantic-rich environment is discussed, focusing on OWL-S for the semantic markup of manufacturing Web services and OWL for developing ontologies in the manufacturing domain. An enterprise integration architecture enabled by Semantic Web service composition is also discussed.

Keywords: Business Process; Service Composition; Domain Ontology; Automated Guided Vehicle; Enterprise Integration.

- Industry-1: Semantic Web | Pp. 623-632

Building a Semantic Web System for Scientific Applications: An Engineering Approach

Renato Fileto; Claudia Bauzer Medeiros; Calton Pu; Ling Liu; Eduardo Delgado Assad

This paper reports engineering experience in building a Semantic Web-compliant system for a scientific application: agricultural zoning. First, we define the concept of ontological cover and a set of relationships between such covers. These definitions, based on domain ontologies, can be used, for example, to support the discovery of services on the Web. Second, we propose a semantic acyclic restriction on ontologies that enables the efficient comparison of ontological covers. Third, we present different engineering solutions for building ontology views that satisfy the acyclic restriction in a prototype. Our experimental results reveal some limitations of current Semantic Web technology in handling large data volumes, and show that combining this technology with traditional data management techniques is an effective way to achieve highly functional and scalable solutions.

- Industry-1: Semantic Web | Pp. 633-642

A SOAP Container Model for e-Business Messaging Requirements

Hamid Ben Malek; Jacques Durand

e-Business software vendors need to accommodate several standards involved in the various functions of a messaging endpoint. Vendors also need to roll out the next version of a messaging protocol quickly, reusing as much of the common software as possible. Increasingly, in an e-Business context, business partners will have to operate several versions of a messaging standard concurrently. Current platforms for Web services or SOAP offer little support for these needs. We have designed SPEF (SOAP Profile Enabling Framework) to address these engineering and business challenges. SPEF coordinates the processing of SOAP modules that implement different standards (security, reliability, etc.). It has been designed as a lightweight messaging framework that behaves as a container for functional plug-ins. Message processing (either for sending or receiving) amounts to a workflow among such plug-ins. The framework relies heavily on open-source software for the basic functions common to various messaging profiles. The paper reports on the resulting integration and on experiments with SPEF on existing SOAP standards.

Keywords: Business Process Management; Messaging Protocol; Open Source Implementation; Soap Message; Messaging Standard.

- Industry-1: Semantic Web | Pp. 643-652
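The SPEF abstract describes message processing as a workflow among plug-ins hosted by a container. Below is a minimal sketch of that idea. It assumes a plain dictionary stands in for the SOAP envelope and uses two hypothetical plug-ins in place of reliability and security modules. It is not SPEF's API.

```python
from typing import Callable

Message = dict  # simplified stand-in for a SOAP envelope
Plugin = Callable[[Message], Message]

class Container:
    """Hypothetical container: plug-ins registered per direction run in order."""
    def __init__(self) -> None:
        self.pipelines: dict[str, list[Plugin]] = {"send": [], "receive": []}

    def register(self, direction: str, plugin: Plugin) -> None:
        self.pipelines[direction].append(plugin)

    def process(self, direction: str, message: Message) -> Message:
        # Each plug-in receives the previous plug-in's output.
        for plugin in self.pipelines[direction]:
            message = plugin(message)
        return message

def add_message_id(msg: Message) -> Message:
    # Stand-in for a reliability module that numbers outgoing messages.
    return {**msg, "headers": {**msg.get("headers", {}), "MessageId": "m-1"}}

def sign(msg: Message) -> Message:
    # Stand-in for a security module; a real one would sign the envelope.
    return {**msg, "headers": {**msg.get("headers", {}), "Signed": True}}
```

Registering `add_message_id` and then `sign` for `"send"` yields a message carrying both headers. Supporting a new protocol version would then mean swapping one plug-in rather than rewriting the endpoint.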

An Empirical Study of Security Threats and Countermeasures in Web Services-Based Services Oriented Architectures

Mamoon Yunus; Rizwan Mallal

As enterprises deploy Services Oriented Architectures (SOA), Web Services security and management have become the cornerstone of successful architectures. The greatest potential of Web Services lies in reusability and flexibility. This required flexibility, in turn, leads to significant security and management challenges. Enterprises migrating to SOA face security challenges such as malicious and malformed SOAP messages, parser vulnerabilities, and Denial of Service attacks over Web Services. Discovering Web Service vulnerabilities and compliance violations, and establishing countermeasure policies for Web Services security threats across large enterprises, need to be addressed through standards-based products. This paper explores typical Web Services implementations, threat identification methods, and countermeasures against Web Services vulnerabilities.

- Industry-2: SOA | Pp. 653-659

Collaborative End-Point Service Modulation System (COSMOS)

Naga Ayachitula; Shu-Ping Chang; Larisa Shwartz; Surendra Maheswaran

Many diverse end-point devices require high levels of interoperability to manage services and applications effectively. This paper attempts to provide a comprehensive framework for classifying services and offers a building-block approach that uses the service as the basic unit of end-point interaction and collaboration. It presents a layered architecture of service classification that can be leveraged for the easy and effective adoption of new services and the orchestration of existing ones. Today, a vast variety of services and agents exist in the marketplace, and new ones are being created faster than ever. The move, initiated earlier, to a common open service platform for service collaboration conforming to standards such as the Open Service Gateway initiative (OSGi) and Open Mobile Alliance Device Management (OMA DM) expands the capabilities and service delivery of service providers and device manufacturers. Common end-point device platform management functions include, but are not limited to, service collaboration, configuration, inventory, and software management services. A common service platform will make the services on the device interoperable with a broader range of applications, services, and transport and network technologies. Solutions available today use a single service for data connectivity, transport service mechanisms, etc., and through this tight coupling risk limiting service provider capabilities. Supporting multiple technologies enables service providers to offer more types of services on the device. However, the complexities arising from interoperability require a taxonomy of services for effective collaboration with existing services.

- Industry-2: SOA | Pp. 660-668

A Process-Driven e-Business Service Integration System and Its Application to e-Logistics Services

Kwanghoon Kim; Ilkyeun Ra

In this paper, we introduce a process-driven e-Business service integration (BSI) system named ‘e-Lollapalooza’, which has been successfully developed through a functional extension of the ebXML technology. It consists of three major components: a Choreography Modeler addressing the process-driven collaboration issue, a Runtime & Monitoring Client addressing the business intelligence issue, and an EJB-based BSI Engine addressing the scalability issue. This paper focuses in particular on e-Lollapalooza’s implementation details supporting ebXML-based choreography and orchestration among the engaged organizations in a process-driven multiparty collaboration for e-Logistics and e-Commerce services. The system is now fully deployed in an EJB-based middleware computing environment and operates on the ebXML standard as an e-Business process management framework for e-Logistics process automation and B2B choreography. Finally, we describe an application of e-Lollapalooza to the purchase order and delivery processes of a cyber-shopping mall run by a postal service company.

Keywords: e-Business Service Integration System; B2B Choreography and Orchestration; e-Business Process Management; ebXML Standard; CPP/CPA; BPSS; e-Logistics; e-Commerce.

- Industry-2: SOA | Pp. 669-678

BPM and SOA: Synergies and Challenges

Thomas Woodley; Stephane Gagnon

While BPM and SOA have evolved independently, there is an inevitable symbiotic relationship between them. Moreover, a SOA can be developed using various service formats, whether standalone Web Services, services orchestrated using the Business Process Execution Language (BPEL), or other service providers. A SOA promotes the creation of highly accessible, loosely coupled, discrete business services. For greatest reach, BPM consumes and leverages such services, tying them together to solve and streamline broad business challenges. Not surprisingly, however, there are certain considerations when designing a SOA to support BPM. Some service designs align well with a BPM solution or strategy, while others can cause significant headaches for an overall BPM solution. Conversely, SOA with BPM layered on top can become an entirely different value proposition compared with SOA alone. As a backbone for SOA components, BPM can integrate important functionality to extend the value of the SOA investment. Similarly, BPM can provide a platform for SOA service management. We will explore the interdependencies between BPM and SOA and provide practical guidance on making each implementation mutually supportive, extending the reach and value of each. We will also discuss whether SOA alone can provide the business service functionality required for future BPM solutions, or whether other complementary architectures may also have a role to play.

Keywords: Business Process; Business Service; Service Design; Business Process Execution Language; Soap Message.

- Industry-3: BPM | Pp. 679-688

Web Performance Indicator by Implicit User Feedback – Application and Formal Approach

Michael Barth; Michal Skubacz; Carsten Stolz

With the growing importance of the Internet, web sites have to be continuously improved. Web metrics help identify potential improvements. In particular, success metrics for e-commerce sites based on transaction analysis are commonly available and well understood. In contrast to transaction-based sites, the success of web sites geared toward information delivery is harder to quantify, since there is no direct feedback from the user. We propose a generic success measure for information-driven web sites. The measure is based on observing user behaviour in the context of the web site’s semantics. In particular, we observe users on their way through the web site and assign positive and negative scores to their actions. The value of a score depends on the transitions between page types and their contribution to the web site’s objectives. To derive a generic view of metric construction, we introduce a formal meta-environment that derives success measures from the relations and dependencies among the usage, content, and structure of a web site.

Keywords: User Session; Content Page; Topic Shift; Target Page; Page Category.

- Industry-3: BPM | Pp. 689-700
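The scoring idea in the web performance indicator can be sketched as a table of scores for transitions between page categories. The categories and score values below are hypothetical. In the paper, they would be derived from the site's objectives.

```python
# Hypothetical page categories and transition scores (not from the paper).
TRANSITION_SCORES = {
    ("navigation", "content"): 1.0,      # user reached information
    ("content", "content"): 0.5,         # user kept reading
    ("content", "navigation"): -0.5,     # user went back to searching
    ("navigation", "navigation"): -1.0,  # user appears lost in menus
}

def session_score(page_types):
    """Sum the scores of consecutive page-type transitions in one session;
    transitions missing from the table contribute 0."""
    return sum(TRANSITION_SCORES.get(pair, 0.0)
               for pair in zip(page_types, page_types[1:]))
```

Averaging `session_score` over all sessions in a period would give one site-level indicator. Tracking that average over time shows whether a redesign helped.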

Discovering the Most Frequent Patterns of Executions in Business Processes Described in BPEL

Benoit Dubouloz; Candemir Toklu

Emerging Business Process Management Systems (BPMS) are revolutionizing the way enterprises address inter- and intra-company process integration and business-IT alignment problems. BPMS are becoming the tool of choice for process lifecycle management, whose key focus is continuous process improvement. To carry out this task effectively, process designers need a deep understanding of process behavior. They need efficient mining algorithms that deliver pertinent and valuable information on all executed instances of a complex process. We propose an algorithm that mines the frequent paths of execution for processes described in BPEL by extending the formalism that has been proposed for mining frequent patterns in a workflow.

Keywords: Frequent Pattern; Mining Algorithm; Process Instance; Business Process Execution Language; Loop Body.

- Industry-3: BPM | Pp. 701-710
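A simplified view of frequent-path mining counts identical execution traces and keeps those above a support threshold. The activity names are hypothetical. This whole-trace counting ignores the loop and concurrency structure that the BPEL-specific formalism in the paper is designed to handle.

```python
from collections import Counter

def frequent_paths(instances, min_support):
    """instances: executed activity sequences, one per process instance.
    Returns {path: relative support} for paths whose support >= min_support."""
    counts = Counter(tuple(trace) for trace in instances)
    total = len(instances)
    return {path: n / total for path, n in counts.items() if n / total >= min_support}
```

With three instances of receive, check, ship and one of receive, check, reject, a 50% threshold keeps only the shipping path (support 0.75). Loops are where this breaks down: each distinct iteration count would form a separate "path" unless loop bodies are collapsed, as the paper's formalism presumably does.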

CONFIOUS: Managing the Electronic Submission and Reviewing Process of Scientific Conferences

Manos Papagelis; Dimitris Plexousakis; Panagiotis N. Nikolaou

Most scientific communities have recently established policies and mechanisms to put electronic conference management into practice, mainly by exploiting the Internet as the communication and cooperation infrastructure. Their foremost objective is to reduce operational and communication costs while maintaining high-quality reviewing and a fair evaluation process. We report on experience gained with an implemented system named Confious. Confious [8] is a state-of-the-art management system that combines modern design, sophisticated algorithms, and a powerful engine to help the program committee (PC) Chair effortlessly accomplish a number of complicated tasks and carry out the activities necessary to produce the proceedings of a scientific conference. We are principally interested in (a) describing the workflow dynamics of a real-world scientific process, (b) identifying the main concerns of the person in charge of organizing the conference, and (c) providing mechanisms that enable the efficient management and monitoring of the overall coordination process.

Keywords: Potential Conflict; Program Committee; Scientific Conference; Review Form; User Role.

- Industry-3: BPM | Pp. 711-720

Tool Support for Model-Driven Development of Web Applications

Jaime Gómez; Alejandro Bia; Antonio Parraga

This paper describes the engineering foundations of VisualWADE, a CASE tool to automate the production of Web applications. VisualWADE follows a model-driven approach focusing on requirements analysis, high level design, and rapid prototyping. In this way, an application evolves smoothly from the first prototype to the final product, and its maintenance is a natural consequence of development. The paper also discusses the lessons learned in the development of the tool and its application to several case studies in the industrial context.

Keywords: Class Diagram; Object Constraint Language; Case Tool; Navigation Model; Object Oriented Analysis.

- Industry-4: Web Infrastructure | Pp. 721-730

Web Personalization: My Own Web Based on Open Content Platform

Seiyoung Lee; Hwan-Seung Yong

The key word in the second round of portal competition will be personalization. This study reviewed recent core research related to Web personalization and, on that basis, presented an ideal next-generation model built on the recent personalization strategies of major portals. The model is mainly composed of the following: an Open Content strategy based on RSS; Personalized Search based on a user’s preferences, Desktop Search, My Web storage, etc.; Social Network, the concept that users can share information with others according to their interests; and Ubiquitous Computing, which can merge people, computers, and materials with the help of various multimedia technologies.

- Industry-4: Web Infrastructure | Pp. 731-739

An Effective Approach for Content Delivery in an Evolving Intranet Environment – A Case Study of the Largest Telecom Company in Taiwan

Chih-Chin Liang; Ping-Yu Hsu; Jun-Der Leu; Hsing Luh

ChungHwa Telecom Co., Ltd. (listed on the New York Stock Exchange under the symbol CHT), the dominant telecommunications company in Taiwan, provides major communication services to more than 23 million people living in Taiwan. CHT runs a vast number of applications built on client-server or web-based architectures, with client software installed on more than ten thousand client computers across the nation. Because the telecommunications industry evolves at a fast pace, the software functions change constantly, and the changes have to be reflected in all client software before new services can be launched. The cost and time of distributing contents to client computers have therefore become a major concern for CHT. To improve the efficiency of content distribution, this research helped CHT develop new software that automatically distributes contents to client computers. To minimize the chance of system locks and to balance the content distribution load, each dispatching server in the new system sends update contents to no more than three other servers. The contents are delivered with a hybrid routing strategy that combines fixed and adaptive routing. With its low error rate and speedy distribution, the new system reduces the man-minutes per year required to manage the content distribution of a client-server system from 14,227.2 to 1,144, a reduction of about 92%. User satisfaction with the system was also found to be above 80% on six factors of the measurement designed by Bailey et al. [1].

Keywords: intranet systems; system integration; routing algorithms.

- Industry-4: Web Infrastructure | Pp. 740-749
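The three-recipient limit in the CHT system can be modeled as a breadth-first tree in which every dispatching server forwards updates to at most three others. The sketch below assumes that tree model and ignores the hybrid fixed/adaptive routing, which the abstract does not detail. It also checks the reported saving, (14,227.2 - 1,144) / 14,227.2, which comes to about 0.92.

```python
FAN_OUT = 3  # from the abstract: each dispatching server feeds at most three others

def dispatch_tree(servers, fan_out=FAN_OUT):
    """Breadth-first tree (assumed model): servers[0] is the origin, and
    server i (i >= 1) receives updates from server (i - 1) // fan_out, so
    no server forwards to more than `fan_out` others. Returns {child: parent}."""
    return {servers[i]: servers[(i - 1) // fan_out] for i in range(1, len(servers))}

def hops(parents, server):
    """Number of forwarding hops from the origin to `server`."""
    count = 0
    while server in parents:
        server = parents[server]
        count += 1
    return count

# Check of the reported saving: 14,227.2 -> 1,144 man-minutes per year.
reduction = (14227.2 - 1144) / 14227.2
```

With this layout, 13 servers (1 + 3 + 9) are all reached within two hops, and a fourteenth server needs a third hop. Capping the fan-out keeps each server's upload load bounded while the tree depth grows only logarithmically.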

Achieving Decision Consistency Across the SOA-Based Enterprise Using Business Rules Management Systems

James Taylor

The adoption of a service-oriented architecture (SOA) provides businesses with the ability to rapidly deploy new applications and easily integrate with other component applications both inside and outside the organization. This decentralized application environment provides a great deal of flexibility for business units and IT departments, but it also creates difficulty in managing the consistency of business decisions delivered through various applications. Business rules management systems (BRMS) provide a mechanism for managing decision logic and act as a conductor in order to align application decision behavior. The key to BRMS is the use of a centralized rules repository, within which resides the decision logic applications use to interact with their customers. Applications communicate with a rules engine in order to process those business rules specific to the decision required for the particular application and situational context. This paper will show how business rules management systems fit within a service-oriented architecture, how BRMS can act as intermediary between service-based applications and legacy applications, and how companies are using BRMS to manage decision processes across the enterprise.

Keywords: Decision Logic; Business Rule; Work Order; Hewlett Packard; Rule Engine.

- Industry-4: Web Infrastructure | Pp. 750-761
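The centralized-repository idea in the BRMS paper can be sketched as a table of ordered rules that every application queries instead of embedding its own logic. The decision name, the rules, and the first-match semantics below are hypothetical. Commercial rule engines offer far richer inference.

```python
from typing import Any, Callable

Rule = tuple[Callable[[dict], bool], Any]

# Hypothetical central repository: decision name -> ordered rules (first match wins).
REPOSITORY: dict[str, list[Rule]] = {
    "discount": [
        (lambda ctx: ctx["customer_tier"] == "gold", 0.15),
        (lambda ctx: ctx["order_total"] > 1000, 0.10),
        (lambda ctx: True, 0.0),  # default rule
    ],
}

def decide(decision: str, context: dict) -> Any:
    """Evaluate the rules for `decision` in order and return the outcome of
    the first rule whose condition holds for `context`."""
    for condition, outcome in REPOSITORY[decision]:
        if condition(context):
            return outcome
    raise LookupError(f"no rule matched for {decision!r}")
```

If a web shop and a call-center application both call `decide("discount", ...)`, changing a rule in `REPOSITORY` changes both applications' behavior at once. That is the consistency argument the paper makes for BRMS in a SOA.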

Service Design, Implementation and Description (Tutorial)

Marlon Dumas; Andreas Wombacher

There is an increasingly widespread acceptance of service-oriented architectures as a paradigm for integrating heterogeneous software applications. In this paradigm, independently developed and operated applications are exposed as (Web) services that are then interconnected using a set of standard protocols and languages. While the technology for developing basic services and interconnecting them on a point-to-point basis has attained a certain degree of maturity, there remain open challenges when it comes to building and managing services that participate in interactions that do not follow simple request-response patterns.

- Tutorials and Panels | Pp. 762-762

WISE-2005 Tutorial: Web Content Mining

Bing Liu

Web mining aims to develop a new generation of techniques to effectively mine useful information or knowledge from the Web. It consists of Web usage mining, Web structure mining, and Web content mining. Web usage mining refers to the discovery of user access patterns from Web usage logs. Web structure mining tries to discover useful knowledge from the structure of Web hyperlinks. Web content mining aims to extract and mine useful information or knowledge from Web page contents. This tutorial focuses on Web content mining, an area that has seen a rapid expansion of activity in the past few years. In this tutorial, I will introduce the main Web content mining tasks and problems and state-of-the-art techniques for dealing with them. All parts of the tutorial have a mix of research and industry flavor, addressing seminal research concepts and looking at the technology from an industry angle.

Keywords: Concept Hierarchy; Customer Review; Comparative Shopping; Consumer Opinion; Industry Flavor.

- Tutorials and Panels | Pp. 763-763

An Introduction to Data Grid Management Systems

Arun Jagatheesan

We describe a “grid” as a coordinated infrastructure formed by combining resources that may be owned by distributed and autonomous administrative domains. A data grid infrastructure provides a logical view of distributed resources that are shared between autonomous administrative domains. An emerging data storage problem is the management of unstructured data storage resources for inter-, intra-, and multi-enterprise collaborative efforts. A new paradigm in data management systems, beyond traditional file systems and database systems, is required. Data grids are being built around the world for the coordinated sharing and management of unstructured data storage resources distributed among collaborating teams from the same or different enterprises. Data Grid Management System (DGMS) middleware will soon become part of the software infrastructure in many enterprises.

- Tutorials and Panels | Pp. 764-764

Are We Ready for the Service Oriented Architecture?

Stephane Gagnon

This Industry Track Panel poses a strategic question: “Are We Ready for the Service Oriented Architecture (SOA)?” We discuss this issue from both vendor and adopter perspectives with five IT executives. In particular, we go beyond the discussion of SOA standards as such and try to assess the importance of this approach from the point of view of related technologies, such as Business Process Management, Enterprise Architecture, Configuration Management, Business Rules Systems, and Open Source Solutions.

- Tutorials and Panels | Pp. 765-765

Data Engineering Approach to Design of Web Services

George Feuerlicht

With the wide acceptance of Web Services as the preferred implementation platform for service-oriented applications, there is increased interest in how such applications should be designed. While there are similarities between software components and services, there is now general agreement that mapping existing components directly to Web Services leads to suboptimal design and results in poor performance and scalability. Most practitioners recommend coarse-grained, message-oriented Web Services that minimize the number of messages and avoid the need to maintain state information between invocations. We argue that the design of the message structures used as Web Services payloads directly affects application interoperability, and that excessive use of coarse-grained, document-centric message structures results in poor reuse and undesirable interdependencies between services. Our approach provides a framework for designing message structures using data engineering principles. We consider the impact of increasing message granularity on the cohesion and coupling of service-oriented applications and analyze the associated design trade-offs.

- Poster | Pp. 766-767


Type: books

Print ISBN


Electronic ISBN


Publisher

Springer Nature

Country of publication

United Kingdom

Publication date