Publications catalog - books


Advances in Knowledge Discovery and Data Mining: 10th Pacific-Asia Conference, PAKDD 2006, Singapore, April 9-12, 2006, Proceedings

Wee-Keong Ng ; Masaru Kitsuregawa ; Jianzhong Li ; Kuiyu Chang (eds.)

Conference: 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Singapore, Singapore. April 9, 2006 - April 12, 2006

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Not available.

Availability
Detected institution Year of publication Browse Download Request
Not detected 2006 SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-33206-0

Electronic ISBN

978-3-540-33207-7

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Table of contents

IWED: An Integrated Multigraph Cut-Based Approach for Detecting Events from a Website

Qiankun Zhao; Sourav S Bhowmick; Aixin Sun

The web is a sensor of the real world. Often, the content of web pages corresponds to real-world objects or events, whereas the web usage data reflect users’ opinions and actions in response to the corresponding events. Moreover, the of the web usage data may reflect the evolution of the corresponding events over time. In this paper, we present two variants of the IWED (Integrated Web Event Detector) algorithm to extract events from website data by integrating and . We model the website-related data as a multigraph, where each vertex represents a web page and each edge represents the between the connected web pages in terms of , , and/or . Then, the problem of event detection is to extract from the multigraph to represent real-world events. We solve this problem by adopting the normalized graph cut algorithm. Experiments show that the s play an important role in algorithms and can produce high-quality results.

- Web Mining | Pp. 351-360
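
The abstract above says events are extracted with the normalized graph cut algorithm. As a rough illustration only, the sketch below evaluates the standard normalized-cut score of one candidate two-way partition, Ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B), on a plain weighted graph. The paper works on a multigraph of web pages; the simple graph, the function name `normalized_cut`, and the toy weights are all assumptions made for this example, not the authors' implementation.

```python
def normalized_cut(edges, part_a):
    """Score a two-way partition of an undirected weighted graph.

    `edges` is a list of (u, v, weight) triples; `part_a` holds the
    nodes on one side, and every other node is on the other side.
    Both sides must contain at least one edge endpoint, otherwise a
    volume is zero and the score is undefined.
    """
    part_a = set(part_a)
    cut = 0.0    # total weight of edges crossing the partition
    vol_a = 0.0  # sum of weighted degrees of nodes in part_a
    vol_b = 0.0  # sum of weighted degrees of the remaining nodes
    for u, v, w in edges:
        for node in (u, v):
            if node in part_a:
                vol_a += w
            else:
                vol_b += w
        if (u in part_a) != (v in part_a):
            cut += w
    return cut / vol_a + cut / vol_b


# Hypothetical toy graph: two tight triangles joined by one weak edge.
toy_edges = [
    (1, 2, 1.0), (2, 3, 1.0), (1, 3, 1.0),
    (4, 5, 1.0), (5, 6, 1.0), (4, 6, 1.0),
    (3, 4, 0.1),
]
balanced = normalized_cut(toy_edges, {1, 2, 3})  # cuts only the weak edge
lopsided = normalized_cut(toy_edges, {1})        # isolates a single node
```

A lower score marks a better partition, so here the split along the weak edge scores lower than the split that isolates one node. Finding the minimizing partition is the hard part and is not attempted in this sketch.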

Enhancing Duplicate Collection Detection Through Replica Boundary Discovery

Zhigang Zhang; Weijia Jia; Xiaoming Li

Web documents are widely replicated on the Internet. These replicated documents cause potential problems for Web-based information systems, so replica detection on the Web is an indispensable task. The challenge is to find these duplicated collections in a very large data set with limited hardware resources in an acceptable time. In this paper, we first introduce the notion of to roughly reflect the situation of the replicas; then we propose an effective and efficient approach to discover the boundary of the replicas. The advantages of the proposed approach include: first, it dramatically reduces pair-wise document similarity computation, making it much faster than traditional replicated document detection approaches; second, it can identify the boundary of the replicated collections accurately, demonstrating to what extent two collections are replicated. We evaluated the accuracy of the approach on two web page sets containing 24 million and 30 million Web pages respectively.

- Web Mining | Pp. 361-370

Summarization and Visualization of Communication Patterns in a Large-Scale Social Network

Preetha Appan; Hari Sundaram; Belle Tseng

This paper deals with the problem of summarization and visualization of communication patterns in a large-scale corporate social network. The solution to the problem can have a significant impact on understanding large-scale social network dynamics. There are three key aspects to our approach. First, we propose a ring-based network representation scheme – the insight is that visual displays of of large-scale social networks can be accomplished. Second, we detect three specific network activity patterns – and patterns at multiple time scales. For each pattern we develop specific visualizations within the overall ring-based framework. Finally, we develop an activity pattern ranking scheme and a visualization that enables us to summarize key social network activities in a single snapshot. We have validated our approach by using the large Enron corpus – we have excellent activity detection results, and very good preliminary user study results for the visualization.

- Graph and Network Mining | Pp. 371-379

Patterns of Influence in a Recommendation Network

Jure Leskovec; Ajit Singh; Jon Kleinberg

Information cascades are phenomena in which individuals adopt a new action or idea due to influence by others. As such a process spreads through an underlying social network, it can result in widespread adoption overall. We consider information cascades in the context of recommendations, and in particular study the patterns of cascading recommendations that arise in large social networks. We investigate a large person-to-person recommendation network, consisting of four million people who made sixteen million recommendations on half a million products. Such a dataset allows us to pose a number of fundamental questions: What kinds of cascades arise frequently in real life? What features distinguish them? We enumerate and count cascade subgraphs on large directed graphs; as one component of this, we develop a novel efficient heuristic based on graph isomorphism testing that scales to large datasets. We discover novel patterns: the distribution of cascade sizes is approximately heavy-tailed; cascades tend to be shallow, but occasional large bursts of propagation can occur. The relative abundance of different cascade subgraphs suggests subtle properties of the underlying social network and recommendation process.

- Graph and Network Mining | Pp. 380-389
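
The abstract above reports that cascade sizes are approximately heavy-tailed. As a minimal sketch of how such a size distribution could be tabulated, the code below treats each weakly connected component of a directed recommendation graph as one cascade and counts its nodes. This is a simplification chosen for illustration: the paper enumerates cascade subgraphs with an isomorphism-based heuristic, which is not reproduced here, and the function name `cascade_sizes` and the toy edges are assumptions.

```python
from collections import Counter, defaultdict


def cascade_sizes(edges):
    """Return the node count of each weakly connected component.

    `edges` is an iterable of (recommender, recipient) pairs.
    """
    neighbours = defaultdict(set)
    for src, dst in edges:
        # Ignore direction: a cascade is treated as one connected piece.
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen = set()
    sizes = []
    for start in neighbours:
        if start in seen:
            continue
        # Iterative depth-first search over one component.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Hypothetical toy data: one four-person cascade and two two-person ones.
toy_edges = [
    ("a", "b"), ("b", "c"), ("b", "d"),
    ("e", "f"),
    ("g", "h"),
]
sizes = cascade_sizes(toy_edges)   # [4, 2, 2]
size_histogram = Counter(sizes)    # how many cascades have each size
```

On real data, the histogram would be plotted on log-log axes to inspect the tail; the toy input is far too small to show one.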

Constructing Decision Trees for Graph-Structured Data by Chunkingless Graph-Based Induction

Phu Chien Nguyen; Kouzou Ohara; Akira Mogi; Hiroshi Motoda; Takashi Washio

Chunkingless Graph-Based Induction (Cl-GBI) is a machine learning technique proposed for the purpose of extracting typical patterns from graph-structured data. This method is regarded as an improved version of Graph-Based Induction (GBI), which employs stepwise pair expansion (pairwise chunking) to extract typical patterns from graph-structured data, and it can find overlapping patterns that cannot be found by GBI. In this paper, we propose an algorithm for constructing decision trees for graph-structured data using Cl-GBI. This decision tree construction algorithm, called Decision Tree Chunkingless Graph-Based Induction (DT-ClGBI), can construct decision trees from graph-structured datasets while simultaneously constructing attributes useful for classification using Cl-GBI internally. Since patterns extracted by Cl-GBI are considered as attributes of a graph, and their existence or non-existence is used as the attribute value, DT-ClGBI can be conceived as a tree generator equipped with feature construction capability. Experiments conducted on synthetic and real-world graph-structured datasets show the effectiveness of the algorithm.

- Graph and Network Mining | Pp. 390-399

Combining Smooth Graphs with Semi-supervised Classification

Xueyuan Zhou; Chunping Li

In semi-supervised classification, many methods use the graph representation of data. Based on the graph, different methods, e.g. random walk models, spectral clustering, Markov chains, and regularization theory, are employed to design classification algorithms. However, all these methods use graphs constructed directly from data, e.g. NN graph. In reality, data are only noisy observations of hidden variables. Classification results obtained directly from the observed data may be biased by noise. Therefore, filtering the noise before applying any classification method can give a better classification. We propose a novel method to filter the noise in high-dimensional data by smoothing the graph. The analysis is given from the aspects of spectral theory, Markov chains, and regularization. We show that our method can reduce the high-frequency components of the graph, and that it also has an interpretation from the regularization viewpoint. A graph volume based parameter learning method can be efficiently applied to classification. Experiments on artificial and real-world data sets indicate that our method achieves superior classification accuracy.

- Graph and Network Mining | Pp. 400-409

Network Data Mining: Discovering Patterns of Interaction Between Attributes

John Galloway; Simeon J. Simoff

Network Data Mining identifies emergent networks between myriads of individual data items and utilises special statistical algorithms that aid visualisation of ‘emergent’ patterns and trends in the linkage. It complements predictive data mining methods and methods for outlier detection, which assume the independence between the attributes and the independence between the values of these attributes. Many problems, however, especially phenomena of a more complex nature, are not well suited for these methods. For example, in the analysis of transaction data there are no known suspicious transactions. This paper presents a human-centred methodology and supporting techniques that address the issues of depicting implicit relationships between data attributes and/or specific values of these attributes. The methodology and corresponding techniques are illustrated on a case study from the area of security.

- Graph and Network Mining | Pp. 410-414

SGPM: Static Group Pattern Mining Using Apriori-Like Sliding Window

John Goh; David Taniar; Ee-Peng Lim

Mobile user data mining is a field that focuses on extracting interesting patterns and knowledge from data generated by mobile users. Group pattern mining is one type of mobile user data mining method. In group pattern mining, group patterns are found from a given user movement database based on spatio-temporal distances. In this paper, we propose an improvement of efficiency using area method for locating mobile users and using for . This reduces the complexity of the valid group pattern mining problem. We support the use of the static method, which uses areas and instead to find group patterns, thus reducing the complexity of the mining problem.

- Association Rule Mining | Pp. 415-424

Mining Temporal Indirect Associations

Ling Chen; Sourav S. Bhowmick; Jinyan Li

This paper presents a novel pattern called temporal indirect association. An indirect association pattern refers to a pair of items that rarely occur together but highly depend on the presence of a mediator itemset. The existing model of indirect association does not consider the lifespan of items. Consequently, some discovered patterns may be invalid while some useful patterns may not be covered. To overcome this drawback, in this paper, we take into account the lifespan of items to extend the current model to be temporal. An algorithm, , that finds the set of mediators in manner is developed. Then, we extend the framework of the algorithm to discover temporal indirect associations. Our experimental results demonstrated the efficiency and effectiveness of the proposed algorithms.

- Association Rule Mining | Pp. 425-434
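
The abstract above defines an indirect association as a pair of items that rarely occur together but each depend strongly on a mediator itemset. The sketch below checks that definition in a deliberately simplified, non-temporal form: dependence on the mediator is approximated by joint support, item lifespans are ignored, and the function names and both thresholds are assumptions picked for this example rather than values from the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def is_indirect_pair(transactions, a, b, mediator,
                     max_pair_support=0.2, min_mediator_support=0.4):
    """Simplified test; both thresholds are illustrative assumptions."""
    # Condition 1: the two items rarely appear in the same transaction.
    rarely_together = support(transactions, {a, b}) < max_pair_support
    # Condition 2: each item frequently co-occurs with the mediator.
    a_with_mediator = support(transactions, {a} | set(mediator))
    b_with_mediator = support(transactions, {b} | set(mediator))
    return (rarely_together
            and a_with_mediator >= min_mediator_support
            and b_with_mediator >= min_mediator_support)


# Hypothetical toy transactions.
toy = [
    {"tea", "sugar"},
    {"tea", "sugar"},
    {"coffee", "sugar"},
    {"coffee", "sugar"},
    {"bread"},
]
tea_coffee = is_indirect_pair(toy, "tea", "coffee", {"sugar"})  # True
tea_bread = is_indirect_pair(toy, "tea", "bread", {"sugar"})    # False
```

In the toy data, tea and coffee never co-occur yet both appear with sugar in 40% of transactions, so they qualify; bread never appears with sugar, so the tea and bread pair does not. A temporal version would restrict each support count to the period in which the items involved exist.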

Mining Top-K Frequent Closed Itemsets Is Not in APX

Chienwen Wu

Mining top-k frequent closed itemsets was initially proposed and exactly solved by Wang et al. [IEEE Transactions on Knowledge and Data Engineering 17 (2005) 652-664]. However, no research in the literature has considered the complexity of this problem. In this paper, we present a set of proofs showing that, in the general case, the problem of mining top-k frequent closed itemsets is not in APX. This indicates that heuristic algorithms rather than exact algorithms are preferred to solve the problem.

- Association Rule Mining | Pp. 435-439
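
To make the objects in the abstract above concrete, the following sketch enumerates closed itemsets by brute force and keeps the k with the highest support. An itemset is closed when no proper superset has the same support count. This is an exponential-time illustration of the definitions on toy data, not the algorithm of Wang et al. and not a method from the paper; the function names and the tie-breaking rule are assumptions.

```python
from itertools import combinations


def support_count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def top_k_closed(transactions, k):
    """Brute-force top-k closed itemsets; only viable for tiny inputs."""
    items = sorted(set().union(*transactions))
    counts = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support_count(transactions, candidate)
            if count > 0:
                counts[candidate] = count
    # Keep itemsets with no proper superset of equal support.
    closed = [
        (itemset, count) for itemset, count in counts.items()
        if not any(itemset < other and count == other_count
                   for other, other_count in counts.items())
    ]
    # Highest support first; ties broken alphabetically (an assumption).
    closed.sort(key=lambda pair: (-pair[1], sorted(pair[0])))
    return closed[:k]


# Hypothetical toy transactions over the items a, b and c.
toy = [frozenset("abc"), frozenset("ab"), frozenset("ab"), frozenset("c")]
top_two = top_k_closed(toy, 2)
# top_two == [(frozenset({'a', 'b'}), 3), (frozenset({'c'}), 2)]
```

Here {a} and {b} are not closed because {a, b} has the same support of 3, which is why {a, b} ranks first. The enumeration visits every subset of the item universe, which is exactly the cost that practical miners try to avoid.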