Publication catalog - books

Web Information Systems Engineering: WISE 2005: 6th International Conference on Web Information Systems Engineering, New York, NY, USA, November 20-22, 2005, Proceedings

Anne H. H. Ngu ; Masaru Kitsuregawa ; Erich J. Neuhold ; Jen-Yao Chung ; Quan Z. Sheng (eds.)

In conference: 6th International Conference on Web Information Systems Engineering (WISE). New York, NY, USA. November 20, 2005 - November 22, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Popular Computer Science; Information Systems Applications (incl. Internet); Information Storage and Retrieval; Database Management; Artificial Intelligence (incl. Robotics); Computers and Society

Availability

Detected institution: not detected
Year of publication: 2005
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-30017-5

Electronic ISBN

978-3-540-32286-3

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2005

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Mining Communities on the Web Using a Max-Flow and a Site-Oriented Framework

Yasuhito Asano; Takao Nishizeki; Masashi Toyoda

There are several methods for mining communities on the Web using hyperlinks. One of the best known is the max-flow based method proposed by Flake et al. The method adopts a page-oriented framework, that is, it uses a page on the Web as the unit of information, like other methods including HITS and trawling. Recently, Asano et al. built a site-oriented framework which uses a site as the unit of information, and they experimentally showed that trawling on the site-oriented framework often outputs significantly better communities than trawling on the page-oriented framework. However, it was not known whether the site-oriented framework is also effective for mining communities with the max-flow based method. In this paper, we first point out several problems of the max-flow based method, mainly owing to its page-oriented framework, and then propose solutions that exploit several advantages of the site-oriented framework. Computational experiments reveal that our max-flow based method on the site-oriented framework is significantly more effective at mining communities related to the topics of given pages than the original max-flow based method on the page-oriented framework.

Keywords: Base Method; Mining Community; Virtual Source; Dense Subgraph; Virtual Edge.

- Web Mining | Pp. 1-14
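The max-flow construction the abstract builds on (Flake et al.'s original page-oriented form) can be sketched as follows. This is a simplified illustration, not the paper's site-oriented method: hyperlinks become capacity-k arcs in both directions, a virtual source feeds the seed pages, every other page drains to a virtual sink with capacity 1, and the source side of the minimum cut is returned as the community. The graph, seeds, and parameter k in any usage are illustrative.

```python
from collections import defaultdict, deque

def _bfs_augment(cap, flow, s, t):
    """Find a shortest augmenting path (Edmonds-Karp); return the parent map or None."""
    parent = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in list(cap[u]):
            if v not in parent and cap[u][v] - flow[u][v] > 0:
                parent[v] = u
                if v == t:
                    return parent
                q.append(v)
    return None

def max_flow_community(edges, seeds, k=1):
    """Sketch of max-flow community extraction in the spirit of Flake et al.
    (page-oriented form; the paper's variant would first group pages into sites)."""
    INF = float("inf")
    cap = defaultdict(lambda: defaultdict(int))
    nodes = set()
    for u, v in edges:                      # hyperlinks as symmetric capacity-k arcs
        cap[u][v] += k
        cap[v][u] += k
        nodes.update((u, v))
    s, t = object(), object()               # virtual source and sink
    for seed in seeds:
        cap[s][seed] = INF                  # source feeds seeds with infinite capacity
    for n in nodes - set(seeds):
        cap[n][t] = 1                       # every non-seed drains to the sink
    flow = defaultdict(lambda: defaultdict(int))
    while True:                             # augment until no path remains
        parent = _bfs_augment(cap, flow, s, t)
        if parent is None:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(cap[u][v] - flow[u][v] for u, v in path)   # bottleneck
        for u, v in path:
            flow[u][v] += b
            flow[v][u] -= b
    # the community is the source side of the min cut: nodes still
    # reachable from the source in the residual graph
    reach, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v in cap[u]:
            if v not in reach and cap[u][v] - flow[u][v] > 0:
                reach.add(v)
                q.append(v)
    return reach & nodes
```

On a toy graph with a seed triangle weakly linked to a dense clique, the cut separates the seed's triangle from the rest.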

A Web Recommendation Technique Based on Probabilistic Latent Semantic Analysis

Guandong Xu; Yanchun Zhang; Xiaofang Zhou

Web transaction data between Web visitors and Web functionalities usually convey user task-oriented behavior patterns, and mining such clickstream data captures usage-pattern information. Web usage mining has become one of the most widely used methods for Web recommendation, which customizes Web content to a user's preferred style. Traditional Web usage mining techniques, such as Web user session or Web page clustering, association rules, and frequent navigational path mining, can only discover usage patterns explicitly; they cannot reveal the underlying navigational activities or identify the latent relationships associated with the patterns among Web users and Web pages. In this work, we propose a Web recommendation framework that incorporates a Web usage mining technique based on the Probabilistic Latent Semantic Analysis (PLSA) model. The main advantage of this method is that it not only discovers usage-based access patterns but also reveals the underlying latent factors. With the discovered user access patterns, we then present users with content of greater interest via collaborative recommendation. To validate the effectiveness of the proposed approach, we conduct experiments on real-world datasets and compare it with existing traditional techniques. The preliminary experimental results demonstrate the usability of the proposed approach.

Keywords: Collaborative Filter; User Session; User Access Pattern; Probabilistic Latent Semantic Analysis Model; Latent Semantic Model.

- Web Mining | Pp. 15-28

Constructing Interface Schemas for Search Interfaces of Web Databases

Hai He; Weiyi Meng; Clement Yu; Zonghuan Wu

Many databases have become Web-accessible through form-based search interfaces (i.e., search forms) that allow users to specify complex and precise queries against the underlying databases. In general, such a Web search interface can be considered to contain an interface schema with multiple attributes and rich semantic/meta information; however, the schema is not formally defined on the search interface. Many Web applications, such as Web database integration and deep Web crawling, require the construction of these schemas. In this paper, we introduce a schema model for complex search interfaces, and present a tool (WISE-iExtractor) for automatically extracting and deriving all the information needed to construct the schemas. Our experimental results on real search interfaces indicate that this tool is highly effective.

Keywords: Logic Relationship; Domain Type; Domain Element; Search Interface; Exclusive Attribute.

- Web Mining | Pp. 29-42

Temporal Ranking of Search Engine Results

Adam Jatowt; Yukiko Kawai; Katsumi Tanaka

Existing search engines hold a picture of the Web from the past, and their ranking algorithms are based on data crawled some time ago. However, a user requires not only relevant but also fresh information. We have developed a method for adjusting the ranking of search engine results from the point of view of page freshness and relevance. It uses an algorithm that post-processes search engine results based on the changed contents of the pages. By analyzing archived versions of web pages, we estimate temporal qualities of pages, that is, the general freshness and relevance of a page to the query topic over certain time frames. For the top-quality web pages, the content differences between the past snapshots indexed by a search engine and their present versions are analyzed. Based on these differences, the algorithm assigns new ranks to the web pages without the need to maintain a constantly updated index of web documents.

Keywords: Search Engine; Cosine Similarity; Temporal Quality; Query Word; Query Vector.

- Web Information Retrieval | Pp. 43-52
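The diff-then-rerank idea in the abstract can be sketched as: compare each page's present version with its indexed snapshot, score the newly added terms against the query with cosine similarity, and blend that freshness score with the engine's original rank. The scoring formula, weights, and input format below are illustrative assumptions, not the paper's actual model.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def temporal_rerank(results, query, alpha=0.5):
    """Re-rank (url, rank, snapshot_text, current_text) tuples: pages whose
    content *added since the snapshot* matches the query get a freshness
    boost blended with a score derived from the original rank."""
    q = Counter(query.lower().split())
    scored = []
    for url, rank, old, new in results:
        old_c = Counter(old.lower().split())
        new_c = Counter(new.lower().split())
        added = new_c - old_c                 # terms added since the indexed snapshot
        freshness = cosine(added, q)
        base = 1.0 / rank                     # turn the engine's local rank into a score
        scored.append((alpha * base + (1 - alpha) * freshness, url))
    return [u for _, u in sorted(scored, reverse=True)]
```

With a small alpha, a lower-ranked page whose fresh content matches the query overtakes a stale top result.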

Evaluation of Result Merging Strategies for Metasearch Engines

Yiyao Lu; Weiyi Meng; Liangcai Shu; Clement Yu; King-Lup Liu

Result merging is a key component in a metasearch engine. Once the results from various search engines are collected, the metasearch system merges them into a single ranked list. The effectiveness of a metasearch engine is closely related to the result merging algorithm it employs. In this paper, we investigate a variety of result merging algorithms based on a wide range of available information about the retrieved results, from their local ranks, to their titles and snippets, to the full documents of these results. The effectiveness of these algorithms is then compared experimentally based on 50 queries from the TREC Web track and the 10 most popular general-purpose search engines. Our experiments yield two important results. First, simple result merging strategies can outperform Google. Second, merging based on the titles and snippets of retrieved results can outperform merging based on the full documents.

- Web Information Retrieval | Pp. 53-66
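The simplest family of merging strategies the abstract mentions, those based only on local ranks, can be sketched in a few lines. The 1/(r+1) rank-to-score conversion is an assumption for illustration; the paper's stronger variants additionally score titles, snippets, or full documents against the query.

```python
def merge_results(result_lists, top_k=10):
    """Rank-based merging sketch: each engine's local rank r becomes a
    score 1/(r+1), scores for the same URL are summed across engines,
    and the merged list is sorted by total score."""
    scores = {}
    for ranking in result_lists:              # one ranked URL list per engine
        for r, url in enumerate(ranking):
            scores[url] = scores.get(url, 0.0) + 1.0 / (r + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A URL ranked highly by several engines accumulates score and rises to the top of the merged list.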

Decomposition-Based Optimization of Reload Strategies in the World Wide Web

Dirk Kukulenz

Web sites, Web pages and the data on pages are available only for specific periods of time and, from a client's point of view, are deleted afterwards. An important task when retrieving information from the Web is therefore to consider Web information in the course of time. Different strategies, such as push and pull strategies, may be applied to this task. Since push services are usually not available, pull strategies have to be conceived that optimize the retrieved information with respect to the age of the retrieved data and its completeness. In this article we present a new procedure to optimize data retrieved from Web pages by page decomposition. By applying an automatic wrapper induction technique, a page is decomposed into functional segments. Each segment is treated as an independent component when analyzing the time behavior of the page. Based on this decomposition we present a new component-based download strategy. Applying this method to Web pages shows that, for a fraction of Web data, the freshness of retrieved data can be improved significantly compared to traditional methods.

Keywords: Continuous Query; Remote Source; Regular Grammar; Pull Strategy; Page Change.

- Web Information Retrieval | Pp. 67-80
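The component-based pull strategy can be illustrated with a toy scheduler: instead of reloading a whole page on one timer, each decomposed segment is refreshed on its own estimated change period, so fast-changing components stay fresh without re-fetching static ones. The field names and the simple threshold rule are invented for this sketch; the paper estimates change behavior from observed page history.

```python
def due_segments(segments, now):
    """Return the ids of segments whose estimated change period has
    elapsed since their last fetch.  Each segment is a dict with
    (hypothetical) fields: id, last_fetch, change_period."""
    due = []
    for seg in segments:
        if now - seg["last_fetch"] >= seg["change_period"]:
            due.append(seg["id"])
    return due
```

A news segment with a short change period is re-fetched often, while a static footer is not, which is the freshness gain the abstract reports.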

An Ontological Approach for Defining Agents for Collaborative Applications

I. T. Hawryszkiewycz

Collaboration has become important, especially in knowledge-intensive applications. Computer support systems for collaborative work on the Web, however, usually grow in an ad-hoc manner. The paper suggests two reasons for such an ad-hoc approach. One is a lack of methods to map collaborative requirements into collaborative workspaces. The other is that collaborative processes themselves change over time. The paper proposes a metamodel that provides an ontology to support collaborative process modelling and uses it to define generic agents, which can assist users in setting up and changing collaborative workspaces. The metamodel itself integrates social, organizational and workflow semantics, providing the ability to describe complex collaborative processes. The metamodel concepts and the corresponding agents are generic in nature, and the paper describes ways to map such generic concepts to specific domain applications.

Keywords: Multiagent System; Generic Agent; Collaborative Process; Agent Structure; Real World Phenomenon.

- Metadata Management | Pp. 81-94

Improving Web Data Annotations with Spreading Activation

Fatih Gelgi; Srinivas Vadrevu; Hasan Davulcu

The Web has established itself as the largest public data repository ever available. Even though the vast majority of information on the Web is formatted to be easily readable by the human eye, “meaningful information” is still largely inaccessible to computer applications. In this paper, we present automated algorithms that gather meta-data and instance information by utilizing global regularities on the Web and incorporating contextual information. Our system is distinguished by not requiring any domain-specific engineering. Experimental evaluations were successfully performed on the TAP knowledge base and on the faculty-course home pages of computer science departments, together containing 16,861 Web pages.

Keywords: Semi-structured data; spreading activation; semantic partitioning.

- Metadata Management | Pp. 95-106
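Spreading activation itself, the generic technique named in the title, can be sketched as an iterative pass over a weighted label graph; the paper's Web-specific graph construction and weighting are not reproduced here, and the graph, decay, and threshold values are illustrative.

```python
def spread_activation(graph, seeds, decay=0.5, iterations=3, threshold=0.1):
    """Illustrative spreading-activation pass.  `graph` maps a node to a
    dict of weighted neighbours; activation starts at the seed nodes
    (a dict of initial values) and propagates outward with a decay factor.
    Nodes below the firing threshold do not propagate."""
    activation = {n: 0.0 for n in graph}
    activation.update(seeds)                  # e.g. {"course": 1.0}
    for _ in range(iterations):
        new = dict(activation)
        for node, a in activation.items():
            if a < threshold:
                continue                      # below firing threshold: does not spread
            for neigh, w in graph.get(node, {}).items():
                new[neigh] = new.get(neigh, 0.0) + a * w * decay
        activation = new
    return activation
```

Activation from a seed label decays with graph distance, so nearby, strongly connected labels end up with higher scores, which is what lets contextually related annotations reinforce one another.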

Semantic Partitioning of Web Pages

Srinivas Vadrevu; Fatih Gelgi; Hasan Davulcu

In this paper we describe the semantic partitioner algorithm, which uses the structural and presentation regularities of Web pages to automatically transform them into hierarchical content structures. These content structures enable us to automatically annotate labels in Web pages with their semantic roles, thus yielding meta-data and instance information for the pages. Experimental results with the TAP knowledge base and computer science department Web sites, comprising 16,861 Web pages, indicate that our algorithm is able to gather meta-data accurately from various types of Web pages. The algorithm achieves this performance without any domain-specific engineering.

Keywords: Regular Expression; Semantic Role; Attribute Label; Kleene Star; Grammar Induction.

- Metadata Management | Pp. 107-118

A Formal Ontology Reasoning with Individual Optimization: A Realization of the Semantic Web

Pakornpong Pothipruk; Guido Governatori

Answering a query over a group of RDF data pages is a trivial process. However, the Semantic Web requires ontology technology, and consequently OWL, a family of Web ontology languages based on description logic, has been proposed for it. Answering a query over the Semantic Web is thus not trivial but a deductive process. Reasoning on OWL with data, however, has an efficiency problem, so we introduce optimization techniques for the inference algorithm. This work demonstrates the techniques for the instance checking and instance retrieval problems with respect to the $\mathcal{ALC}$ description logic, which covers certain parts of OWL.

Keywords: Description Logic; Individual Optimization; Query Answering; Reasoning Service; Boolean Query.

- Ontology and Semantic Web | Pp. 119-132