Publications catalog - books



Data Mining and Knowledge Management: Chinese Academy of Sciences Symposium CASDMKD 2004, Beijing, China, July 12-14, 2004, Revised Papers

Yong Shi ; Weixuan Xu ; Zhengxin Chen (eds.)

Summary/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Artificial Intelligence (incl. Robotics); Database Management; Information Systems Applications (incl. Internet); Computer Appl. in Administrative Data Processing; Business Information Systems

Availability
Detected institution | Year of publication | Browse | Download | Request
Not detected | 2005 | SpringerLink | |

Information

Resource type:

books

Print ISBN

978-3-540-23987-1

Electronic ISBN

978-3-540-30537-8

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Copyright information

© Springer-Verlag Berlin/Heidelberg 2005

Table of contents

Knowledge-Information Circulation Through the Enterprise: Forward to the Roots of Knowledge Management

Milan Zeleny

The field of Knowledge Management (KM) has already completed its initiatory phase, characterized by operational confusion between knowledge and information, stemming from the tenuous notion of “explicit knowledge”. Consequently, KM has progressed much more slowly than the significance of knowledge management in a modern enterprise would indicate. Here we propose and discuss four cornerstones for returning to the roots of knowledge management and thus moving forward towards a new phase of KM. We discuss the roots of reliable knowledge thinking and theory in economics, management and philosophy. We then formulate clear, unambiguous and pragmatic definitions and distinctions of knowledge and information, establish simple and natural measures of the value of knowledge, and propose the Knowledge-Information (KnowIn) continuum and its circulatory nature in managing the knowledge of the enterprise. The autopoietic cycle A-C-I-S is elaborated for that purpose. We conclude the paper by discussing some implications of the new KM for strategy and strategic management.

Keywords: Knowledge Management; Tacit Knowledge; Strategic Management; Explicit Knowledge; Natural Measure.

- Keynote Lectures | Pp. 22-33

A Hybrid Nonlinear Classifier Based on Generalized Choquet Integrals

Zhenyuan Wang; Hai-Feng Guo; Yong Shi; Kwong-Sak Leung

In this new hybrid model of nonlinear classifier, unlike the classical linear classifier, in which the feature attributes influence the classifying attribute independently, the interaction among the influences of the feature attributes on the classifying attribute is described by a signed fuzzy measure. An optimized Choquet integral with respect to an optimized signed fuzzy measure is adopted as a nonlinear projector that maps each observation from the sample space onto a one-dimensional space. By also incorporating a criterion based on the weighted Euclidean distance, the new nonlinear classifier takes account of the elliptic-clustering character of the classes and is therefore much more powerful than some existing classifiers. Such a classifier can handle data whose classes have complex geometric shapes, such as crescent (cashew-shaped) classes.

- Data Mining Methodology | Pp. 34-40
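The projection step of the classifier above can be illustrated with the standard discrete Choquet integral. The sketch below is a hedged illustration, not the authors' implementation: it assumes the signed fuzzy measure `mu` is given as a dictionary over attribute subsets, that the integrand `a_j + b_j * x_j` is nonnegative, and that a single cut-off `threshold` separates two classes. The names `classify`, `a`, `b` and `threshold` are hypothetical, and the optimization of `mu` and the weighted-Euclidean-distance criterion are not reproduced.

```python
def choquet_integral(f, mu):
    """Discrete Choquet integral of a nonnegative vector f with respect to mu.

    mu maps frozensets of attribute indices to reals. Values may be negative
    (a signed fuzzy measure); mu of the empty set is taken to be 0.
    """
    order = sorted(range(len(f)), key=lambda i: f[i])  # indices by ascending value
    total, prev = 0.0, 0.0
    for k, i in enumerate(order):
        upper = frozenset(order[k:])  # attributes whose value is >= f[i]
        total += (f[i] - prev) * mu[upper]
        prev = f[i]
    return total


def classify(x, a, b, mu, threshold):
    """Hypothetical two-class rule using a generalized Choquet projection.

    The integrand a_j + b_j * x_j is assumed nonnegative; a, b, mu and
    threshold would come from an optimization step not shown here.
    """
    g = [aj + bj * xj for aj, bj, xj in zip(a, b, x)]
    return 1 if choquet_integral(g, mu) >= threshold else 0
```

With an additive measure such as `{frozenset({0}): 0.3, frozenset({1}): 0.7, frozenset({0, 1}): 1.0}`, the integral of `[0.2, 0.5]` reduces to the weighted sum 0.3·0.2 + 0.7·0.5 ≈ 0.41; raising `mu({0, 1})` to 1.5 models a positive interaction between the two attributes and yields ≈ 0.51, which a linear classifier cannot express.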

Fuzzy Classification Using Self-Organizing Map and Learning Vector Quantization

Ning Chen

Fuzzy classification offers an approach to handling uncertainty in classification tasks. Instead of assigning an instance to a single definite class, as crisp classification does, it assigns the instance to more than one class with different degrees of membership. This paper studies the use of fuzzy strategies in classification. Two fuzzy algorithms, one for the sequential self-organizing map and one for learning vector quantization, are proposed based on fuzzy projection and learning rules. The derived classifiers are able to provide fuzzy classes when classifying new data. Experiments show the effectiveness of the proposed algorithms in terms of classification accuracy.

Keywords: fuzzy classification; self-organizing map (SOM); learning vector quantization (LVQ).

- Data Mining Methodology | Pp. 41-50
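To make the difference between fuzzy and crisp output concrete, the sketch below turns distances to labelled prototypes (such as SOM or LVQ codebook vectors) into class membership degrees. It is an assumption-laden illustration: the membership formula is the fuzzy c-means style weighting with fuzzifier `m`, swapped in for clarity, and is not necessarily the fuzzy projection rule proposed in the paper. The function name and parameters are hypothetical.

```python
import math
from collections import defaultdict


def fuzzy_class_memberships(x, prototypes, labels, m=2.0):
    """Return {class: degree} for x, given labelled prototypes (m > 1)."""
    dists = [math.dist(x, p) for p in prototypes]
    exact = [j for j, d in enumerate(dists) if d == 0.0]
    if exact:
        # x coincides with one or more prototypes: share membership among them.
        u = [1.0 / len(exact) if j in exact else 0.0 for j in range(len(dists))]
    else:
        # Fuzzy c-means style weights: u_j proportional to d_j ** (-2 / (m - 1)).
        w = [d ** (-2.0 / (m - 1.0)) for d in dists]
        total = sum(w)
        u = [wj / total for wj in w]
    degrees = defaultdict(float)
    for uj, lab in zip(u, labels):
        degrees[lab] += uj  # aggregate prototype memberships per class
    return dict(degrees)
```

For a point at the origin with prototype `(1, 0)` labelled `"A"` and prototype `(2, 0)` labelled `"B"`, the degrees are 0.8 for `"A"` and 0.2 for `"B"`; a crisp classifier would simply report `"A"`.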

Solving Discriminant Models Using Interior Point Algorithm

Siming Huang; Guoliang Yang; Chao Su

In this paper we first survey the linear-programming-based discriminant models in the literature. We then propose an interior point algorithm to solve the resulting linear programs. The algorithm runs in polynomial time and uses a simple starting point.

Keywords: Linear Discriminant Analysis; Discriminant Model; Interior Point Algorithm; Linear Program Approach; Data Mining Problem.

- Data Mining Methodology | Pp. 51-60
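As an illustration of what "linear-programming-based discriminant model" means, one classical instance is the minimize-the-sum-of-deviations (MSD) model for two groups $G_1, G_2$ of observations $x_i \in \mathbb{R}^m$, with weights $w$, cut-off $b$ and deviations $d_i$. This is an assumed example, not necessarily one of the models the paper surveys; in particular the normalization $\sum_j w_j = 1$ is one common (assumed) way to exclude the trivial solution $w = 0$. An interior point method would solve the standard form $\min c^\top z$ subject to $Az = r$, $z \ge 0$ by following the central path of the perturbed KKT system for decreasing $\mu > 0$:

```latex
\begin{aligned}
\min_{w,\,b,\,d}\quad & \sum_{i \in G_1 \cup G_2} d_i \\
\text{s.t.}\quad & w^\top x_i - d_i \le b, && i \in G_1,\\
& w^\top x_i + d_i \ge b, && i \in G_2,\\
& \textstyle\sum_{j=1}^{m} w_j = 1, \qquad d_i \ge 0 .
\end{aligned}
\qquad
\begin{aligned}
A z &= r,\\
A^\top y + s &= c,\\
z_i s_i &= \mu \quad \forall i, \qquad z, s > 0 .
\end{aligned}
```

Here the free variables $w$ and $b$ must first be split into nonnegative parts (or otherwise handled) to reach the standard form, and $\mu \to 0$ recovers an optimal solution of the LP.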

A Method for Solving Optimization Problem in Continuous Space Using Improved Ant Colony Algorithm

Ling Chen; Jie Shen; Ling Qin; Jin Fan

A method for solving optimization problems with continuous parameters using an improved ant colony algorithm is presented. In the method, groups of candidate values of the components are constructed, and each value in a group has its own trail information. In each iteration of the ant colony algorithm, the method first chooses initial values of the components using the trail information. Crossover and mutation then determine the values of the components in the solution. Experimental results on nonlinear programming problems show that the method converges much faster and more stably than a genetic algorithm (GA), and that it overcomes the usual drawback of ant colony algorithms, namely that they are not well suited to continuous optimization problems.

Keywords: Quadratic Assignment Problem; Solve Optimization Problem; Continuous Optimization Problem; High Convergence Speed; Trail Information.

- Data Mining Methodology | Pp. 61-70
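The three steps described above (trail-guided choice from candidate groups, then crossover and mutation) can be sketched as follows. This is a simplified sketch under stated assumptions, not the authors' algorithm: the evaporation rate `rho`, the rank-based trail deposit, the number of `elite` ants that write their values back into the candidate groups, and the Gaussian mutation scale are all hypothetical choices made for illustration.

```python
import random


def ant_colony_continuous(objective, bounds, group_size=10, n_ants=20,
                          iterations=100, rho=0.1, elite=5, mutation=0.05, seed=0):
    """Minimize `objective` over a box [(lo, hi), ...] (simplified sketch)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # One group of candidate values per component; each value carries a trail level.
    groups = [[rng.uniform(lo, hi) for _ in range(group_size)] for lo, hi in bounds]
    trails = [[1.0] * group_size for _ in range(dim)]
    best_x, best_val = None, float("inf")
    for _ in range(iterations):
        # 1) Each ant picks one candidate per component, with probability
        #    proportional to that candidate's trail level.
        picks = [[rng.choices(range(group_size), weights=trails[d])[0] for d in range(dim)]
                 for _ in range(n_ants)]
        xs = [[groups[d][p[d]] for d in range(dim)] for p in picks]
        # 2) Uniform crossover with a random partner, then Gaussian mutation
        #    clipped back into the box.
        results = []
        for p, x in zip(picks, xs):
            mate = xs[rng.randrange(n_ants)]
            child = []
            for d, (lo, hi) in enumerate(bounds):
                v = x[d] if rng.random() < 0.5 else mate[d]
                v += rng.gauss(0.0, mutation * (hi - lo))
                child.append(min(max(v, lo), hi))
            results.append((objective(child), child, p))
        results.sort(key=lambda r: r[0])
        if results[0][0] < best_val:
            best_val, best_x = results[0][0], list(results[0][1])
        # 3) Evaporate every trail, then let the `elite` best ants write their
        #    values into the slots they picked and reinforce those slots.
        #    The best ant is processed last so it wins any shared slot.
        for d in range(dim):
            trails[d] = [(1.0 - rho) * t for t in trails[d]]
        for rank, (_, child, p) in reversed(list(enumerate(results[:elite]))):
            for d in range(dim):
                groups[d][p[d]] = child[d]
                trails[d][p[d]] += 1.0 / (rank + 1)
    return best_x, best_val
```

For example, `ant_colony_continuous(lambda x: sum(v * v for v in x), [(-5.0, 5.0), (-5.0, 5.0)])` searches for the minimum of a two-dimensional sphere function. Writing good children back into the candidate groups is what lets a discrete trail mechanism drift across a continuous domain.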

Data Set Balancing

David L. Olson

This paper conducts experiments with three skewed data sets, seeking to demonstrate the problems that arise when skewed data is used and to identify the countervailing problems that arise when data is balanced. The basic data mining algorithms of decision trees, regression-based models, and neural network models are considered, using both categorical and continuous data. Two of the data sets have binary outcomes, while the third has four possible outcomes. The key findings are that when the data is highly unbalanced, algorithms tend to degenerate by assigning all cases to the most common outcome. When data is balanced, accuracy rates tend to decline. Balancing also reduces the training set size, which can lead to degenerate models that fail because cases encountered in the test set were omitted from training. Decision tree algorithms were found to be the most robust with respect to the degree of balancing applied.

Keywords: Neural Network Model; Decision Tree Model; Correct Classification Rate; Decision Tree Algorithm; Loan Application.

- Practical Issues of Data Mining | Pp. 71-80
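Two of the effects described above can be seen with a few lines of code: on skewed data, predicting the majority outcome for every case already scores high accuracy, and balancing by undersampling shrinks the training set. The sketch assumes random undersampling of every class down to the size of the smallest one; the paper may balance data differently, and the helper names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def majority_baseline_accuracy(labels):
    """Accuracy of a degenerate model that always predicts the most common label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def undersample(rows, label_of, seed=0):
    """Randomly keep, from every class, as many rows as the smallest class has."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    n_min = min(len(items) for items in by_class.values())
    balanced = []
    for items in by_class.values():
        balanced.extend(rng.sample(items, n_min))
    rng.shuffle(balanced)
    return balanced
```

With 950 `"good"` and 50 `"bad"` loan records, the all-`"good"` baseline is 95% accurate, while the balanced set keeps only 100 of the 1,000 rows; any rare pattern among the discarded 900 rows is no longer available for training.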

Computation of Least Square Estimates Without Matrix Manipulation

Yachen Lin; Chung Chen

The least squares approach is undoubtedly one of the best-known methods in statistics and related disciplines such as optimization, artificial intelligence, and data mining. The core of the traditional least squares approach is to find the inverse of the product of the design matrix and its transpose. It therefore requires storing at least two matrices: the design matrix and the inverse of that product. In some applications, for example high-frequency financial data in the capital market and transactional data in the credit card market, the design matrix is huge and online updating is desirable. Such cases are difficult for the traditional matrix version of the least squares approach for two reasons: (1) manipulating the huge matrix is still a cumbersome task; (2) it is difficult to use the latest information and update the estimates on the fly. A new method is therefore needed. In this paper, the authors apply the idea of CIO (component-wise iterative optimization) and propose an algorithm that computes a least squares estimate without manipulating matrices, i.e., it requires no storage for the design matrix or the inverse of the product, and it can update the estimates on the fly. It is also rigorously shown that the solution obtained by the algorithm is truly a least squares estimate.

Keywords: Design Matrix; Gibbs Sampling; Private Label; Transactional Data; Matrix Manipulation.

- Practical Issues of Data Mining | Pp. 81-89
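The component-wise idea can be illustrated with cyclic coordinate descent, which minimizes the sum of squared errors over one coefficient at a time while holding the others fixed, using only running sums over rows and never forming or inverting a matrix. This is one form of component-wise iterative optimization, shown as an assumption; it is not necessarily the exact CIO algorithm of the paper. The `rows` callable, which re-reads the data (e.g. from disk) on each pass, is a hypothetical interface.

```python
def cio_least_squares(rows, p, sweeps=200, tol=1e-12):
    """Coordinate-wise least squares; rows() yields (x, y) with len(x) == p.

    Each column of the design is assumed to be nonzero.
    """
    b = [0.0] * p
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(p):
            num = den = 0.0
            for x, y in rows():
                # Residual of this row with coordinate j left out of the fit.
                r = y - sum(x[k] * b[k] for k in range(p) if k != j)
                num += x[j] * r
                den += x[j] * x[j]
            new = num / den  # exact 1-D minimizer of the SSE in coordinate j
            max_change = max(max_change, abs(new - b[j]))
            b[j] = new
        if max_change < tol:
            break
    return b
```

For data generated exactly by y = 1 + 2t at t = 0, ..., 4 (rows `([1, t], 1 + 2t)`), the sweeps converge to the least squares estimate (1, 2). When the design matrix has full column rank, each sweep is a Gauss-Seidel step on the normal equations, so the iteration converges to the same estimate that the matrix formula would produce.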

Ensuring Serializability for Mobile Data Mining on Multimedia Objects

Shin Parker; Zhengxin Chen; Eugene Sheng

Data mining is usually considered an application task conducted on top of database management systems. However, this may not always be true. To illustrate this, in this article we examine the issue of conducting data mining in mobile computing environments, where multiple physical copies of the same data object may exist in client caches at the same time, with the server as the primary owner of all data objects. By demonstrating what can be mined in such an environment, we point out the important connection between data mining and database implementation. This leads us to examine the issue of extending traditional invalid-access prevention policy protocols, which are needed to ensure serializability of data updates in mobile environments. Furthermore, we provide examples to illustrate how this kind of research can shed light on mobile data mining.

Keywords: Mobile User; Mobile Client; Multimedia Object; Page Table; Data Conflict.

- Practical Issues of Data Mining | Pp. 90-98
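As background for the serializability requirement mentioned above, the textbook test for conflict serializability builds a precedence graph over transactions and checks it for cycles. The sketch below shows only that generic test, with a hypothetical schedule format of `(transaction, op, item)` tuples; it is not the invalid-access prevention protocol discussed in the paper.

```python
from collections import defaultdict


def is_conflict_serializable(schedule):
    """schedule: list of (txn, op, item) with op in {'R', 'W'}, in execution order."""
    edges = defaultdict(set)
    txns = set()
    for i, (ti, oi, xi) in enumerate(schedule):
        txns.add(ti)
        for tj, oj, xj in schedule[i + 1:]:
            # Two operations conflict if they touch the same item from
            # different transactions and at least one of them is a write.
            if ti != tj and xi == xj and "W" in (oi, oj):
                edges[ti].add(tj)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {t: WHITE for t in txns}

    def has_cycle(u):
        colour[u] = GREY
        for v in edges[u]:
            if colour[v] == GREY or (colour[v] == WHITE and has_cycle(v)):
                return True
        colour[u] = BLACK
        return False

    # Serializable exactly when the precedence graph is acyclic.
    return not any(colour[t] == WHITE and has_cycle(t) for t in sorted(txns))
```

If a mobile client T1 reads a cached copy of object A, the server commits T2's write to A, and T1 then writes A based on its stale read, the schedule `[("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]` has the cycle T1 → T2 → T1 and is rejected; without T2's interleaved write it would be accepted.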

“Copasetic Clustering”: Making Sense of Large-Scale Images

Karl Fraser; Paul O’Neill; Zidong Wang; Xiaohui Liu

In an information-rich world, the task of data analysis is becoming ever more complex. Even with the processing capability of modern technology, important details are often saturated and thus lost amid the volume of data. With analysis problems ranging from discovering credit card fraud to tracking terrorist activities, the phrase “a needle in a haystack” has never been more apt. To deal with large data sets, current approaches require that the data be sampled or summarised before true analysis can take place. In this paper we propose a novel pyramidic method, namely copasetic clustering, which addresses the problem of applying traditional clustering techniques to large-scale data sets with limited resources. A further benefit of the technique is the transparency it provides into intermediate clustering steps; when applied to spatial data sets, this allows contextual information to be captured. The abilities of this technique are demonstrated using both synthetic and biological data.

Keywords: Traditional Technique; Internal Decision; Traditional Cluster; Noise Element; Gene Spot.

- Data Mining for Bioinformatics | Pp. 99-108
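The general idea of a pyramidic scheme (cluster small pieces of a large data set with a traditional method, then cluster the resulting summaries, level by level) can be sketched as below. This is a hypothetical illustration under stated assumptions, not the copasetic clustering algorithm itself: it uses one-dimensional values, fixed-size chunks in input order, and weighted k-means with evenly spread starting centres at every level. It does keep every intermediate level, mirroring the transparency the abstract mentions.

```python
def weighted_kmeans_1d(points, k, iters=20):
    """1-D k-means on (value, weight) pairs; returns non-empty (centre, weight) pairs."""
    vals = [v for v, _ in points]
    lo, hi = min(vals), max(vals)
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]  # evenly spread start
    for _ in range(iters):
        sums, wts = [0.0] * k, [0.0] * k
        for v, w in points:
            j = min(range(k), key=lambda c: abs(v - centres[c]))  # nearest centre
            sums[j] += w * v
            wts[j] += w
        centres = [sums[j] / wts[j] if wts[j] > 0 else centres[j] for j in range(k)]
    # Weight of each centre = total weight assigned to it in the final pass.
    return [(c, w) for c, w in zip(centres, wts) if w > 0]


def pyramid_cluster(values, k=2, chunk=8):
    """Cluster small chunks, then cluster the chunk centres, level by level."""
    assert 2 * k <= chunk, "needed so that every level is smaller than the last"
    level = [(float(v), 1.0) for v in values]
    history = [level]  # intermediate levels stay inspectable
    while len(level) > chunk:
        nxt = []
        for start in range(0, len(level), chunk):
            nxt.extend(weighted_kmeans_1d(level[start:start + chunk], k))
        level = nxt
        history.append(level)
    return weighted_kmeans_1d(level, k), history
```

Only one chunk (at most `chunk` points) is clustered at a time, so the working memory of the clustering step stays small however large the input is, while the weights carried upward preserve how many original points each summary represents.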

Ranking Gene Regulatory Network Models with Microarray Data and Bayesian Network

Hongqiang Li; Mi Zhou; Yan Cui

Researchers often have several different hypotheses about the possible structure of the gene regulatory network (GRN) underlying the biological model they study. It would be very helpful to be able to rank these hypotheses using existing data. Microarray technologies enable us to monitor the expression levels of tens of thousands of genes simultaneously. Given the expression levels of almost all of the well-substantiated genes in an organism under many experimental conditions, it is possible to evaluate hypothetical gene regulatory networks with statistical methods. We present RankGRN, a web-based tool for ranking hypothetical gene regulatory networks. RankGRN scores the gene regulatory network models against microarray data using Bayesian network methods. The score reflects how well a gene network model explains the microarray data. A posterior probability is calculated for each network based on the scores, and the networks are then ranked by their posterior probabilities. RankGRN is available online at [http://GeneNet.org/bn]. RankGRN is a useful tool for evaluating how well hypothetical gene network models explain observed gene expression data (i.e., the microarray data). Users can select the gene network model that best explains the microarray data.

Keywords: Posterior Probability; Microarray Data; Bayesian Network; Gene Regulatory Network; Regulatory Relation.

- Data Mining for Bioinformatics | Pp. 109-118
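The step from network scores to posterior probabilities and a ranking can be sketched as follows. The sketch assumes each score is a log marginal likelihood log P(D | M) (as with common Bayesian network scores) and that model priors are uniform unless log priors are supplied; RankGRN's actual scoring function is not reproduced, and `rank_by_posterior` is a hypothetical name.

```python
import math


def rank_by_posterior(log_scores, log_priors=None):
    """Turn per-model log scores into posterior probabilities, highest first.

    Assumes log_scores[name] = log P(data | model); priors are uniform unless
    log_priors[name] = log P(model) is given.
    """
    logs = {name: s + (log_priors[name] if log_priors else 0.0)
            for name, s in log_scores.items()}
    top = max(logs.values())  # log-sum-exp shift avoids underflow
    z = sum(math.exp(v - top) for v in logs.values())
    posterior = {name: math.exp(v - top) / z for name, v in logs.items()}
    return sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)
```

For three hypothetical networks with log scores -100, -101 and -105, the posteriors are about 0.73, 0.27 and 0.005: a difference of one unit in log score already makes the better model roughly e ≈ 2.7 times more probable.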