Publications catalog - books

From Web to Social Web: Discovering and Deploying User and Content Profiles: Workshop on Web Mining, WebMine 2006, Berlin, Germany, September 18, 2006. Revised Selected and Invited Papers

Bettina Berendt ; Andreas Hotho ; Dunja Mladenic ; Giovanni Semeraro (eds.)

In conference: Workshop on Web Mining (WebMine). Berlin, Germany. September 18, 2006

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Artificial Intelligence (incl. Robotics); Data Mining and Knowledge Discovery; Computer Communication Networks; Database Management; Information Systems Applications (incl. Internet); Computers and Society

Availability

Detected institution	Publication year	Browse	Download	Request
Not detected	2007	SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-74950-9

Electronic ISBN

978-3-540-74951-6

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2007

Publication rights information

Subject coverage

Computer and information sciences

Electrical, electronic and information engineering

Table of contents

Check whether your institution has access to download or request the full book or any of its chapters.

doi: 10.1007/978-3-540-74951-6_1

An Analysis of Bloggers, Topics and Tags for a Blog Recommender System

Conor Hayes; Paolo Avesani; Uldis Bojars

Over the past few years the web has experienced an exponential growth in the use of weblogs, or blogs: web sites containing journal-style entries presented in reverse chronological order. In this paper we provide an analysis of the type of recommendation strategy suitable for this domain. We introduce measures to characterise the blogosphere in terms of blogger and topic drift and we demonstrate how these measures can be used to construct a plausible explanation for blogger behaviour. We show that the blog domain is characterised by bloggers moving frequently from topic to topic and that blogger activity closely tracks events in the real world. We then demonstrate how tag cloud information within each cluster allows us to identify the most topic-relevant and consistent blogs in each cluster. We briefly describe how we plan to integrate this work within the SIOC framework.

Pp. 1-20

doi: 10.1007/978-3-540-74951-6_2

Combining Web Usage Mining and XML Mining in a Real Case Study

Federico Michele Facca

In this paper we report our first extended experiments on Conceptual Web log generation and XML Mining over generated Conceptual logs. Conceptual logs are XML Web server logs containing rich information about the structure of a Web site and its content. Furthermore, they can be automatically generated starting from a proper logging facility and a conceptual application model. This allows an easier analysis of the results of the mining process, thanks to the rich information provided, and allows the data mining process to be performed at different levels of abstraction. In this work we use WebML as conceptual model, and as mining tool; nevertheless the underlying idea is of general validity and can be applied to any other conceptual modeling framework and mining technique.

Pp. 21-40

doi: 10.1007/978-3-540-74951-6_3

Extracting and Using Attribute-Value Pairs from Product Descriptions on the Web

Katharina Probst; Rayid Ghani; Marko Krema; Andy Fano; Yan Liu

We describe an approach to extract attribute-value pairs from product descriptions in order to augment product databases by representing each product as a set of attribute-value pairs. Such a representation is useful for a variety of tasks where treating a product as a set of attribute-value pairs is more useful than as an atomic entity. We formulate the extraction task as a classification problem and use Naïve Bayes combined with a multi-view semi-supervised algorithm (co-EM). The extraction system requires very little initial user supervision: using unlabeled data, we automatically extract an initial seed list that serves as training data for the semi-supervised classification algorithm. The extracted attributes and values are then linked to form pairs using dependency information and co-location scores. We present promising results on product descriptions in two categories of sporting goods products. The extracted attribute-value pairs can be useful in a variety of applications, including product recommendations, product comparisons, and demand forecasting. In this paper, we describe one practical application of the extracted attribute-value pairs: a prototype of an Assortment Comparison Tool that allows retailers to compare their product assortments to those of their competitors. As the comparison is based on attributes and values, we can draw meaningful conclusions at a very fine-grained level. We present the details and research issues of such a tool, as well as the current state of our prototype.

Pp. 41-60

doi: 10.1007/978-3-540-74951-6_4

Discovering User Profiles from Semantically Indexed Scientific Papers

Giovanni Semeraro; Pierpaolo Basile; Marco de Gemmis; Pasquale Lops

Typically, personalized information recommendation services automatically infer the user profile, a structured model of the user interests, from documents that were already deemed relevant by the user. We present an approach based on Word Sense Disambiguation (WSD) for the extraction of user profiles from documents. This approach relies on a knowledge-based WSD algorithm, called JIGSAW, for the semantic indexing of documents: JIGSAW exploits the WordNet lexical database to select, among all the possible meanings of a polysemous word, the correct one. Semantically indexed documents are used to train a naïve Bayes learner that infers “semantic” user profiles as binary text classifiers (user-likes and user-dislikes).

Two empirical evaluations are described in the paper. In the first experimental session, JIGSAW has been evaluated according to the parameters of an evaluation initiative that provides a forum where WSD systems are assessed against disambiguated datasets. The goal of the second empirical evaluation has been to measure the accuracy of the user profiles in selecting relevant documents to be recommended. Performance of classical keyword-based profiles has been compared to that of sense-based profiles in the task of recommending scientific papers. The results show that sense-based profiles outperform keyword-based ones.

Pp. 61-81

doi: 10.1007/978-3-540-74951-6_5

Web Usage Mining in Noisy and Ambiguous Environments: Exploring the Role of Concept Hierarchies, Compression, and Robust User Profiles

Olfa Nasraoui; Esin Saka

Recent efforts in Web usage mining have started incorporating more semantics into the data in order to obtain a representation deeper than shallow clicks. In this paper, we review these approaches, and examine the incorporation of simple cues from a website hierarchy in order to relate clickstream events that would otherwise seem unrelated, and thus perform URL compression. We study their effect on data reduction and on the quality of the resulting knowledge discovery. Web usage data is also notorious for containing moderate to high amounts of noise, thus motivating the use of robust knowledge discovery algorithms that can resist noise and outliers with various degrees of robustness. Therefore, we also examine the effect of robustness on the final quality of the knowledge discovery. Our experimental results indicate that post-processed and robust user profiles have better quality than raw profiles that are estimated through optimization alone. However, URL compression, as expected, tends to reduce the quality, but it can also drastically reduce the size of the data set, resulting in faster mining.

Pp. 82-101
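
As a rough illustration of the URL compression idea mentioned in the abstract above (relating clicks through the website hierarchy), the following minimal Python sketch truncates each URL path to a fixed directory depth and collapses consecutive duplicates. It is not the authors' method: the function names, the `depth` parameter, and the sample session are all hypothetical, chosen only to show how hierarchy cues can shrink clickstream data.

```python
from urllib.parse import urlparse


def compress_url(url: str, depth: int = 2) -> str:
    """Keep only the first `depth` levels of the URL path (hypothetical rule)."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    return "/" + "/".join(segments[:depth])


def compress_session(urls, depth: int = 2):
    """Compress every click in a session and drop consecutive duplicates."""
    compressed = []
    for url in urls:
        short = compress_url(url, depth)
        if not compressed or compressed[-1] != short:
            compressed.append(short)
    return compressed


# Made-up clickstream used only for illustration.
session = [
    "/courses/cs101/syllabus.html",
    "/courses/cs101/homework/hw1.html",
    "/courses/cs202/index.html",
    "/people/faculty/smith.html",
]
compressed_session = compress_session(session)
```

In this sketch the first two clicks collapse into a single `/courses/cs101` event, which mirrors the trade-off the abstract describes: a smaller data set at the cost of some detail.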

doi: 10.1007/978-3-540-74951-6_6

From World-Wide-Web Mining to Worldwide Webmining: Understanding People’s Diversity for Effective Knowledge Discovery

Bettina Berendt; Anett Kralisch

Users are well-established objects of analysis in Web mining: Web usage mining investigates users’ behaviour, Web content and structure mining analyze the content and link structures they generate, Web community mining transfers these questions from analyses of individuals to analyses of groups, etc. However, too often users are reduced to the digital data they have created and/or accessed, and it is (generally implicitly) assumed that “all users are alike” in the ways in which they create and access those data. We argue that to make these analyses and findings more meaningful, a shift is needed from technology to human aspects. This shift calls for a multidisciplinary approach that integrates insights from behavioural, psychological, and linguistic sciences into the field of knowledge discovery. In this paper, we introduce the concept of ubiquity of people to emphasize that data and knowledge are created and accessed globally, by users who differ in language, culture, and other factors. The Web is the major medium for these activities. The paper investigates how knowledge discovery, including but not limited to Web mining, may benefit from an integration of the concept of ubiquity of people. We provide an overview of the impact of language and culture on how data and knowledge are accessed, shared, and evaluated. We describe a series of studies as an example of integrating these questions into Web (usage) mining. We conclude with a discussion of research questions that are raised by the integration of the ubiquity of people into knowledge discovery, in particular with regard to data collection, data processing, and data presentation.

Pp. 102-121

doi: 10.1007/978-3-540-74951-6_7

Aspect-Based Tagging for Collaborative Media Organization

Oliver Flasch; Andreas Kaspari; Katharina Morik; Michael Wurst

Organizing multimedia data is very challenging. One of the most important approaches to support users in searching and navigating media collections is collaborative filtering. Recently, systems such as flickr or last.fm have become popular. They allow users not only to rate but also to tag items with arbitrary labels. Such systems replace the concept of a global common ontology, as envisioned by the Semantic Web, with a paradigm of heterogeneous, local “folksonomies”. The problem with such tagging systems is, however, that the resulting taggings carry little semantics. In this paper, we present an extension to the tagging approach. We allow tags to be grouped into aspects. We show that introducing aspects not only helps the user to manage large numbers of tags, but also facilitates data mining in various ways. We exemplify our approach on Nemoz, a distributed media organizer based on tagging and distributed data mining.

Pp. 122-141
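
To make the notion of grouping tags into aspects from the abstract above more concrete, here is a minimal Python sketch of a tag store keyed first by aspect and then by tag. It is only an assumed data structure for illustration, not the Nemoz implementation: the class name `AspectTagStore`, its methods, and the sample items are hypothetical.

```python
from collections import defaultdict


class AspectTagStore:
    """Hypothetical store mapping aspect -> tag -> set of tagged items."""

    def __init__(self):
        self._tags = defaultdict(lambda: defaultdict(set))

    def tag(self, item, aspect, tag):
        """Attach `tag` to `item` within the given aspect."""
        self._tags[aspect][tag].add(item)

    def items_with(self, aspect, tag):
        """Return a copy of the items carrying `tag` under `aspect`."""
        return set(self._tags.get(aspect, {}).get(tag, set()))

    def tags_in(self, aspect):
        """Return the tags used within one aspect, sorted alphabetically."""
        return sorted(self._tags.get(aspect, {}))


# Made-up media items; "genre" and "mood" are illustrative aspects.
store = AspectTagStore()
store.tag("song1.mp3", "genre", "jazz")
store.tag("song1.mp3", "mood", "calm")
store.tag("song2.mp3", "genre", "jazz")
```

Keeping the aspect as the outer key means the same label can appear under different aspects without clashing, and each aspect can be browsed or mined on its own.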

doi: 10.1007/978-3-540-74951-6_8

Contextual Recommendation

Sarabjot Singh Anand; Bamshad Mobasher

The role of context in our daily interaction with our environment has been studied in psychology, linguistics, artificial intelligence, information retrieval, and more recently, in pervasive/ubiquitous computing. However, context has been largely ignored in research into recommender systems specifically and personalization in general. In this paper we describe how context can be brought to bear on recommender systems. As a means for achieving this, we propose a fundamental shift in how we model a user within a recommendation system: inspired by models of human memory developed in psychology, we distinguish between a user’s short term and long term memories. We define a recommendation process that uses these two memories: context-based retrieval cues retrieve relevant preference information from long term memory, which is then used in conjunction with the information stored in short term memory to generate recommendations. We also describe implementations of recommender systems and personalization solutions based on this framework and show how this results in an increase in recommendation quality.

Pp. 142-160