Publications catalog - books
Advances in Knowledge Discovery and Data Mining: 10th Pacific-Asia Conference, PAKDD 2006, Singapore, April 9-12, 2006, Proceedings
Wee-Keong Ng ; Masaru Kitsuregawa ; Jianzhong Li ; Kuiyu Chang (eds.)
In conference: 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Singapore, Singapore. April 9, 2006 - April 12, 2006
Abstract/Description – provided by the publisher
Not available.
Keywords – provided by the publisher
Not available.
Availability
Detected institution | Publication year | Browse | Download | Request |
---|---|---|---|---|
Not detected | 2006 | SpringerLink | | |
Information
Resource type:
books
Print ISBN
978-3-540-33206-0
Electronic ISBN
978-3-540-33207-7
Publisher
Springer Nature
Country of publication
United Kingdom
Publication date
2006
Publication rights information
© Springer-Verlag Berlin Heidelberg 2006
Table of contents
doi: 10.1007/11731139_41
IWED: An Integrated Multigraph Cut-Based Approach for Detecting Events from a Website
Qiankun Zhao; Sourav S Bhowmick; Aixin Sun
The web is a sensor of the real world. Often, the content of web pages corresponds to real-world objects or events, whereas the web usage data reflect users’ opinions and actions to the corresponding events. Moreover, the of the web usage data may reflect the evolution of the corresponding events over time. In this paper, we present two variants of the IWED (Integrated Web Event Detector) algorithm to extract events from website data by integrating and . We model the website-related data as a multigraph, where each vertex represents a web page and each edge represents the between the connected web pages in terms of , , and/or . Then, the problem of event detection is to extract from the multigraph to represent real-world events. We solve this problem by adopting the normalized graph cut algorithm. Experiments show that the s play an important role in algorithms and can produce high-quality results.
- Web Mining | Pp. 351-360
doi: 10.1007/11731139_42
Enhancing Duplicate Collection Detection Through Replica Boundary Discovery
Zhigang Zhang; Weijia Jia; Xiaoming Li
Web documents are widely replicated on the Internet. These replicated documents pose potential problems for Web-based information systems, so replica detection on the Web is an indispensable task. The challenge is to find these duplicated collections in a very large data set with limited hardware resources in acceptable time. In this paper, we first introduce the notion of to roughly reflect the situation of the replicas; then we propose an effective and efficient approach to discover the boundary of the replicas. The advantages of the proposed approach include: first, it dramatically reduces pair-wise document similarity computation, making it much faster than traditional replicated document detection approaches; second, it can identify the boundary of the replicated collections accurately, demonstrating to what extent two collections are replicated. We evaluated the accuracy of the approach on two web page sets containing 24 million and 30 million Web pages, respectively.
- Web Mining | Pp. 361-370
doi: 10.1007/11731139_43
Summarization and Visualization of Communication Patterns in a Large-Scale Social Network
Preetha Appan; Hari Sundaram; Belle Tseng
This paper deals with the problem of summarization and visualization of communication patterns in a large-scale corporate social network. The solution to the problem can have significant impact in understanding large-scale social network dynamics. There are three key aspects to our approach. First, we propose a ring-based network representation scheme – the insight is that visual displays of of large-scale social networks can be accomplished Second, we detect three specific network activity patterns – and patterns at multiple time scales. For each pattern, we develop specific visualizations within the overall ring-based framework. Finally, we develop an activity pattern ranking scheme and a visualization that enables us to summarize key social network activities in a single snapshot. We have validated our approach by using the large Enron corpus – we have excellent activity detection results, and very good preliminary user study results for the visualization.
- Graph and Network Mining | Pp. 371-379
doi: 10.1007/11731139_44
Patterns of Influence in a Recommendation Network
Jure Leskovec; Ajit Singh; Jon Kleinberg
Information cascades are phenomena in which individuals adopt a new action or idea due to influence by others. As such a process spreads through an underlying social network, it can result in widespread adoption overall. We consider information cascades in the context of recommendations, and in particular study the patterns of cascading recommendations that arise in large social networks. We investigate a large person-to-person recommendation network, consisting of four million people who made sixteen million recommendations on half a million products. Such a dataset allows us to pose a number of fundamental questions: What kinds of cascades arise frequently in real life? What features distinguish them? We enumerate and count cascade subgraphs on large directed graphs; as one component of this, we develop a novel efficient heuristic based on graph isomorphism testing that scales to large datasets. We discover novel patterns: the distribution of cascade sizes is approximately heavy-tailed; cascades tend to be shallow, but occasional large bursts of propagation can occur. The relative abundance of different cascade subgraphs suggests subtle properties of the underlying social network and recommendation process.
- Graph and Network Mining | Pp. 380-389
doi: 10.1007/11731139_45
Constructing Decision Trees for Graph-Structured Data by Chunkingless Graph-Based Induction
Phu Chien Nguyen; Kouzou Ohara; Akira Mogi; Hiroshi Motoda; Takashi Washio
Chunkingless Graph-Based Induction (Cl-GBI) is a machine learning technique proposed for the purpose of extracting typical patterns from graph-structured data. This method is regarded as an improved version of Graph-Based Induction (GBI), which employs stepwise pair expansion (pairwise chunking) to extract typical patterns from graph-structured data, and can find overlapping patterns that cannot be found by GBI. In this paper, we propose an algorithm for constructing decision trees for graph-structured data using Cl-GBI. This decision tree construction algorithm, called Decision Tree Chunkingless Graph-Based Induction (DT-ClGBI), can construct decision trees from graph-structured datasets while simultaneously constructing attributes useful for classification using Cl-GBI internally. Since patterns extracted by Cl-GBI are considered as attributes of a graph, and their existence/non-existence is used as attribute values, DT-ClGBI can be conceived as a tree generator equipped with feature construction capability. Experiments were conducted on synthetic and real-world graph-structured datasets, showing the effectiveness of the algorithm.
- Graph and Network Mining | Pp. 390-399
doi: 10.1007/11731139_46
Combining Smooth Graphs with Semi-supervised Classification
Xueyuan Zhou; Chunping Li
In semi-supervised classification, many methods use the graph representation of data. Based on the graph, different methods, e.g. the random walk model, spectral clustering, Markov chains, and regularization theory, are employed to design classification algorithms. However, all these methods use graphs constructed directly from data, e.g. NN graph. In reality, data is only a noisy observation of hidden variables. Classification results using data directly from the observation may be biased by noise. Therefore, filtering the noise before using any classification method can give a better classification. We propose a novel method to filter the noise in high-dimensional data by smoothing the graph. The analysis is given from the aspects of spectral theory, Markov chains, and regularization. We show that our method can reduce the high-frequency components of the graph, and that it also has an explanation from the regularization view. A graph volume based parameter learning method can be efficiently applied to classification. Experiments on artificial and real-world data sets indicate that our method achieves superior classification accuracy.
- Graph and Network Mining | Pp. 400-409
doi: 10.1007/11731139_47
Network Data Mining: Discovering Patterns of Interaction Between Attributes
John Galloway; Simeon J. Simoff
Network Data Mining identifies emergent networks between myriads of individual data items and utilises special statistical algorithms that aid visualisation of ‘emergent’ patterns and trends in the linkage. It complements predictive data mining methods and methods for outlier detection, which assume the independence between the attributes and the independence between the values of these attributes. Many problems, however, especially phenomena of a more complex nature, are not well suited for these methods. For example, in the analysis of transaction data there are no known suspicious transactions. This paper presents a human-centred methodology and supporting techniques that address the issues of depicting implicit relationships between data attributes and/or specific values of these attributes. The methodology and corresponding techniques are illustrated on a case study from the area of security.
- Graph and Network Mining | Pp. 410-414
doi: 10.1007/11731139_48
SGPM: Static Group Pattern Mining Using Apriori-Like Sliding Window
John Goh; David Taniar; Ee-Peng Lim
Mobile user data mining is a field that focuses on extracting interesting patterns and knowledge from data generated by mobile users. Group pattern mining is a type of mobile user data mining method. In group pattern mining, group patterns are found from a given user movement database based on spatio-temporal distances. In this paper, we propose an improvement of efficiency using area method for locating mobile users and using for . This reduces the complexity of the valid group pattern mining problem. We support the use of the static method, which uses areas and instead to find group patterns, thus reducing the complexity of the mining problem.
- Association Rule Mining | Pp. 415-424
doi: 10.1007/11731139_49
Mining Temporal Indirect Associations
Ling Chen; Sourav S. Bhowmick; Jinyan Li
This paper presents a novel pattern called temporal indirect association. An indirect association pattern refers to a pair of items that rarely occur together but highly depend on the presence of a mediator itemset. The existing model of indirect association does not consider the lifespan of items. Consequently, some discovered patterns may be invalid while some useful patterns may not be covered. To overcome this drawback, in this paper, we take into account the lifespan of items to extend the current model to be temporal. An algorithm, , that finds the set of mediators in manner is developed. Then, we extend the framework of the algorithm to discover temporal indirect associations. Our experimental results demonstrated the efficiency and effectiveness of the proposed algorithms.
- Association Rule Mining | Pp. 425-434
doi: 10.1007/11731139_50
Mining Top-K Frequent Closed Itemsets Is Not in APX
Chienwen Wu
Mining top-k frequent closed itemsets was initially proposed and exactly solved by Wang et al. [IEEE Transactions on Knowledge and Data Engineering 17 (2005) 652-664]. However, in the literature, no research has ever considered the complexity of this problem. In this paper, we present a set of proofs showing that, in the general case, the problem of mining top-k frequent closed itemsets is not in APX. This indicates that heuristic algorithms rather than exact algorithms are preferred to solve the problem.
- Association Rule Mining | Pp. 435-439