Publication catalog: journals



Open Access title

Data Science and Engineering

Summary/Description (provided by the publisher in English)
Data Science and Engineering (DSE) is an international, peer-reviewed, and open access journal published under the brand SpringerOpen. DSE is published in cooperation with the China Computer Federation (CCF). Focusing on the theoretical background and advanced engineering approaches, DSE aims to offer a prime forum for researchers, professionals, and industrial practitioners to share their knowledge in this rapidly growing area. It provides in-depth coverage of the latest advances in the closely related fields of data science and data engineering.
Keywords (provided by the publisher)

data collection; data management; big data; knowledge extraction

Availability
Detected institution | Period | Source | Access
Not required | Sep. 2024 to Sep. 2024 | Directory of Open Access Journals | open access
Not required | Jan. 2016 to Sep. 2024 | SpringerLink | open access
This publication is Open Access and charges no fees to authors.

Information

Resource type:

journals

Print ISSN

2364-1185

Electronic ISSN

2364-1541

Publisher

Springer Nature

Publication languages

  • English

Country of publication

Germany

Publication date

CC license information

https://creativecommons.org/licenses/by/4.0/

Table of contents

UMP-MG: A Uni-directed Message-Passing Multi-label Generation Model for Hierarchical Text Classification

Bo Ning; Deji Zhao; Xinjian Zhang; Chao Wang; Shuangyong Song

Abstract: Hierarchical text classification (HTC) is a challenging task that involves classifying textual descriptions into a taxonomic hierarchy. Existing methods, however, struggle to model hierarchical label structures adequately: they tend to rely on graph embedding methods to encode the hierarchy while disregarding the fact that HTC labels are organized as a tree. This matters because, unlike a general graph, a tree inherently imposes a direction on the flow of information from one node to another, a critical factor when applying graph embedding to HTC. In a general graph, message passing is undirected, which leads to imbalanced message transmission between nodes when applied to HTC. To address this, we propose a unidirectional message-passing multi-label generation model for HTC, referred to as UMP-MG. Instead of treating HTC as a classification problem, as previous methods have done, this approach formulates it as a sequence generation task and introduces prior hierarchical information during decoding. This also makes it possible to block information flow in one direction, so that the graph embedding method is better suited to HTC and yields an enhanced tree-structure representation. Experiments on both the public WOS dataset and an e-commerce user intent classification dataset demonstrate that the proposed model achieves superior results.

Keywords: Computer Science Applications; Computational Mechanics.
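The abstract contrasts directed information flow over a label tree with undirected message passing on a general graph. The following is a minimal sketch of parent-to-child (top-down) propagation, assuming a hypothetical label tree, toy feature vectors and a fixed mixing weight `alpha`; it only illustrates one-directional flow and is not the UMP-MG model, which applies this idea inside a sequence-generation decoder.

```python
from collections import deque


def top_down_message_passing(children, features, root, alpha=0.5):
    """Propagate label features from parents to children only.

    children: dict mapping a label to its child labels (hypothetical tree).
    features: dict mapping a label to a list of floats (toy embeddings).
    alpha: hypothetical weight applied to the incoming parent message.
    """
    # Copy the inputs so the caller's feature vectors are not mutated.
    updated = {label: list(vec) for label, vec in features.items()}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in children.get(parent, []):
            # The child receives its parent's already-updated representation;
            # no message ever flows back from child to parent.
            updated[child] = [
                c + alpha * p for c, p in zip(updated[child], updated[parent])
            ]
            queue.append(child)
    return updated


# Hypothetical three-level label tree: root -> {CS, Bio}, CS -> {ML}.
children = {"root": ["CS", "Bio"], "CS": ["ML"]}
features = {"root": [1.0, 0.0], "CS": [0.0, 1.0], "Bio": [0.0, 0.0], "ML": [0.0, 0.0]}
result = top_down_message_passing(children, features, "root")
print(result["ML"])  # → [0.25, 0.5]
```

The root's vector is left unchanged, while the leaf `ML` accumulates information from both of its ancestors; in an undirected scheme the root would also absorb messages from its descendants.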


A Framework to Maximize Group Fairness for Workers on Online Labor Platforms

Anis El Rabaa; Shady Elbassuoni; Jihad Hanna; Amer E. Mouawad; Ayham Olleik; Sihem Amer-Yahia

Abstract: As the number of online labor platforms and the diversity of jobs on them increase, ensuring group fairness for workers needs to be a focus of job-matching services. Workers risk discrimination in two job-matching scenarios: when someone is looking for a job (a job seeker) and when someone wants to deploy jobs (a job provider). To maximize their chances of being hired, job seekers submit their profiles to several platforms. Similarly, job providers publish their job offers on multiple platforms to reach a wide and diverse workforce. In this paper, we propose a theoretical framework to maximize group fairness for workers 1) when job seekers are looking for jobs on multiple platforms, and 2) when job providers deploy jobs on multiple platforms. We formulate each goal as several optimization problems with different constraints, prove that most of them are computationally hard, and propose efficient algorithms that solve all of them in reasonable time. We then design a series of experiments on synthetic and semi-synthetic data generated from a real-world online labor platform to evaluate our framework.

Keywords: Computer Science Applications; Computational Mechanics.


A Survey of Advanced Information Fusion System: from Model-Driven to Knowledge-Enabled

Di Zhu; Hailian Yin; Yidan Xu; Jiaqi Wu; Bowen Zhang; Yaqi Cheng; Zhanzuo Yin; Ziqiang Yu; Hao Wen; Bohan Li

Abstract: Advanced knowledge engineering (KE), represented by knowledge graphs (KGs), drives the development of many fields and engineering technologies and provides various interfaces for knowledge fusion and knowledge empowerment. At the same time, advanced systems engineering (SE) centers on model-based systems engineering (MBSE) to achieve formal modeling and process analysis of entire systems. The two complement each other and are key technologies for the transition of artificial intelligence from 2.0 to 3.0 and from perceptual intelligence to cognitive intelligence. This survey reviews advanced information fusion systems, from model-driven to knowledge-enabled. First, it introduces the concept, representative methods, key technologies and application fields of model-driven systems engineering. Next, it introduces knowledge-driven knowledge engineering and summarizes the architecture, construction methods and application fields of advanced knowledge engineering. Finally, it discusses how the two are combined in advanced information fusion systems, together with development opportunities and challenges.

Keywords: Computer Science Applications; Computational Mechanics.


Signal Contrastive Enhanced Graph Collaborative Filtering for Recommendation

Zhi-Yuan Li; Man-Sheng Chen; Yuefang Gao; Chang-Dong Wang

Abstract: Graph collaborative filtering methods have shown substantial performance improvements over deep neural network-based models. However, they suffer from data sparsity and noise. To address these issues, we propose a new contrastive learning-based graph collaborative filtering method that learns more robust representations. The proposed method, signal contrastive enhanced graph collaborative filtering (SC-GCF), performs contrastive learning on graph signals. From the graph convolution perspective, graph neural networks have been shown to act as low-pass filters on graph signals. Unlike previous contrastive learning-based methods, we focus on the diversity of graph signals in order to directly optimize their informativeness. We also introduce a hypergraph module to strengthen the representation learning ability of graph neural networks; it uses a learnable hypergraph structure to model latent global dependencies that graph neural networks cannot capture. Experiments on four public datasets show significant improvements over state-of-the-art methods, confirming the importance of signal-level contrastive learning and hypergraph learning.

Keywords: Computer Science Applications; Artificial Intelligence; Information Systems; Software.
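The abstract builds on the observation that graph neural networks act as low-pass filters on graph signals. One step of the symmetrically normalized propagation with self-loops used by GCN-style models, y = D^{-1/2}(A + I)D^{-1/2}x, makes this visible. The sketch below assumes a hypothetical four-node path graph and an alternating (high-frequency) signal, and measures "roughness" as the sum of squared differences across edges; it is not SC-GCF, which adds signal-level contrastive learning and a hypergraph module on top of such propagation.

```python
import math


def normalized_propagation(edges, num_nodes, signal):
    """One step y = D^{-1/2} (A + I) D^{-1/2} x on an undirected graph."""
    # Each node's neighborhood includes itself (the "+ I" self-loop).
    neighbors = {i: {i} for i in range(num_nodes)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Degrees are taken from A + I, as in the GCN renormalization.
    degree = {i: len(neighbors[i]) for i in range(num_nodes)}
    return [
        sum(signal[j] / math.sqrt(degree[i] * degree[j]) for j in neighbors[i])
        for i in range(num_nodes)
    ]


def roughness(edges, signal):
    """Sum of squared differences across edges; large for high-frequency signals."""
    return sum((signal[u] - signal[v]) ** 2 for u, v in edges)


edges = [(0, 1), (1, 2), (2, 3)]  # hypothetical 4-node path graph
x = [1.0, -1.0, 1.0, -1.0]        # alternating, i.e. high-frequency, signal
y = normalized_propagation(edges, 4, x)
print(roughness(edges, x), roughness(edges, y))
```

The first value is 12.0 and the second is below 1, so a single propagation step strongly damps the alternating component, which is the low-pass behavior the abstract refers to.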


A Reinduction-Based Approach for Efficient High Utility Itemset Mining from Incremental Datasets

Pushp Sra; Satish Chand

Abstract: High utility itemset mining is an important research area that focuses on identifying itemsets in a database whose utility is higher than a user-specified threshold. However, most existing algorithms assume that databases are static, which is unrealistic for real-life datasets that grow continuously with new data. Furthermore, existing algorithms rely only on utility to identify relevant itemsets, so even combinations that occurred only in the earliest data are reported. Although some mining algorithms adopt a support-based approach to account for itemset frequency, they do not consider the temporal nature of itemsets. To address these challenges, this paper proposes the Scented Utility Miner (SUM) algorithm, which uses a reinduction strategy to track how recently itemsets occurred and mines itemsets from incremental databases. The paper presents a novel approach for mining high utility itemsets from dynamic databases, together with experiments that demonstrate its effectiveness.

Keywords: Computer Science Applications; Artificial Intelligence; Information Systems; Software.
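For readers new to high utility itemset mining, the utility of an itemset X in a transaction is the sum, over the items of X, of purchased quantity times unit profit, and its total utility is summed over every transaction that contains X. The brute-force sketch below uses hypothetical profits and transactions and the common convention that an itemset qualifies when its utility is at least the threshold. It does not implement SUM's reinduction strategy, recency tracking or incremental updates.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, profits):
    """Total utility of `itemset` over all transactions that contain it."""
    total = 0
    for quantities in transactions:  # each transaction maps item -> quantity
        if all(item in quantities for item in itemset):
            total += sum(quantities[item] * profits[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, profits, min_utility):
    """Enumerate every itemset; exponential in the number of items, illustration only."""
    items = sorted(profits)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, profits)
            if utility >= min_utility:
                result[itemset] = utility
    return result


profits = {"a": 5, "b": 2, "c": 1}  # hypothetical unit profits
transactions = [{"a": 1, "b": 2}, {"b": 4, "c": 3}, {"a": 2, "c": 1}]
print(high_utility_itemsets(transactions, profits, min_utility=11))
```

In this toy database `{c}` alone has utility 4, below the threshold, while its superset `{a, c}` reaches 11. Utility is therefore not anti-monotone, which is why practical miners prune with upper bounds such as transaction-weighted utility instead of enumerating every itemset.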


Joint Representation Learning with Generative Adversarial Imputation Network for Improved Classification of Longitudinal Data

Sharon Torao Pingi; Duoyi Zhang; Md Abul Bashar; Richi Nayak

Abstract: Generative adversarial networks (GANs) have proven effective at generating temporal data to fill in missing values, improving classification performance on time series data. Longitudinal datasets comprise multivariate time series together with additional static features that contribute to sample variability over time. These datasets often contain missing values due to factors such as irregular sampling. However, existing GAN-based imputation methods that address this kind of missingness often overlook the impact of static features on temporal observations and classification outcomes. This paper presents a novel method, the fusion-aided imputer-classifier GAN (FaIC-GAN), tailored to longitudinal data classification. FaIC-GAN jointly leverages partially observed temporal data and static features to enhance imputation and classification learning. We present four multimodal fusion strategies that effectively extract correlated information from the static and temporal modalities. Our extensive experiments show that FaIC-GAN successfully exploits partially observed temporal data and static features, yielding higher classification accuracy than unimodal models. Our post-additive and attention-based multimodal fusion approaches within FaIC-GAN consistently rank among the top three methods for classification.

Keywords: Computer Science Applications; Artificial Intelligence; Information Systems; Software.


Graph Neural Network-Based Short-Term Load Forecasting with Temporal Convolution

Chenchen Sun; Yan Ning; Derong Shen; Tiezheng Nie

Abstract: Accurate short-term load forecasting plays an important role in the operation and economic development of modern power systems. However, short-term load forecasting is affected by multiple factors, and because the relationships between these factors are complex, the graph structure in this task is unknown. Moreover, existing methods do not fully aggregate information through the inherent relationships between the factors. In this paper, we propose GLFN-TC, a short-term load forecasting framework based on graph neural networks and dilated 1D-CNN. GLFN-TC uses a graph learning module to learn the relationships between variables automatically, addressing the problem of the unknown graph structure. It handles temporal and spatial dependencies effectively through two modules. In the temporal convolution module, GLFN-TC uses a dilated 1D-CNN to extract temporal dependencies from the historical data of each node. In the densely connected residual convolution module, GLFN-TC uses densely connected residual graph convolutions so that no information is lost and the output of every graph convolution layer is fully used. Finally, the predicted values are obtained through the load forecasting module. We conducted five studies to verify that GLFN-TC outperforms the baselines. Taking MSE as an example, GLFN-TC reduces MSE by 0.0396, 0.0137, 0.0358, 0.0213 and 0.0337 relative to the best baseline method on the ISO-NE, AT, AP, SH and NCENT datasets, respectively. The results show that GLFN-TC achieves higher prediction accuracy than existing common methods.

Keywords: Computer Science Applications; Artificial Intelligence; Information Systems; Software.
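GLFN-TC extracts temporal dependencies from each node's history with a dilated 1D-CNN. A minimal single-channel sketch of the dilated convolution operator follows, assuming a causal (left zero-padded) variant, which is a common choice for forecasting, plus fixed hypothetical kernel weights and no learned parameters. It illustrates the operator only, not GLFN-TC's temporal convolution module.

```python
def dilated_causal_conv1d(series, kernel, dilation):
    """y[t] = sum_k kernel[k] * series[t - k * dilation], zero-padded on the left.

    Only current and past values are used, so the output at time t never
    depends on future observations.
    """
    output = []
    for t in range(len(series)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # positions before the start act as zero padding
                acc += weight * series[idx]
        output.append(acc)
    return output


def receptive_field(kernel_size, dilations):
    """Number of past steps seen by a stack of dilated causal layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


load = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical hourly load readings
print(dilated_causal_conv1d(load, kernel=[1.0, 1.0], dilation=2))
print(receptive_field(kernel_size=2, dilations=[1, 2, 4]))  # → 8
```

With dilation 2, each output adds the current reading to the one two steps earlier. Doubling the dilation at each layer widens the receptive field exponentially while keeping the number of weights per layer fixed, which is the usual motivation for dilated convolutions on long load histories.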


Anomaly Detection with Sub-Extreme Values: Health Provider Billing

Rob Muspratt; Musa Mammadov

Abstract: Anomaly detection in healthcare billing requires a method that is flexible enough to accommodate the practicalities and requirements of manual case review, whose volume and associated effort can determine whether anomalous output is ultimately acted upon. In this paper, we apply a modified version of a previously introduced anomaly detection algorithm to address this issue, providing refined targeting based on the identification of sub-extreme anomalies. By balancing the anomaly identification process with threshold settings tailored to each health provider discipline in the sample, we show that final candidate volumes are controlled with greater accuracy and sensitivity. A comparison with standard local outlier factor (LOF) scores is included as a benchmark.

Keywords: Computer Science Applications; Artificial Intelligence; Information Systems; Software.
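The paper benchmarks against standard local outlier factor (LOF) scores. For reference, here is a compact pure-Python sketch of the standard LOF definition: k-distance, reachability distance, local reachability density (lrd), and LOF as the ratio of neighbors' densities to the point's own. The toy points and `k` are hypothetical, the sketch assumes there are no duplicate points (which would make a density infinite), and it does not implement the paper's sub-extreme anomaly method.

```python
import math


def lof_scores(points, k):
    """Local outlier factor of each point; values near 1 mean 'as dense as its neighbors'."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    k_distance, neighborhoods = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]  # distance to the k-th nearest neighbor
        k_distance.append(kd)
        # Ties at the k-distance are included in the neighborhood.
        neighborhoods.append([j for d, j in others if d <= kd])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbor j.
        return max(k_distance[j], dist[i][j])

    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighborhoods[i]) / len(neighborhoods[i])
        lrd.append(1.0 / mean_reach)  # assumes no duplicate points, so mean_reach > 0

    return [
        sum(lrd[j] for j in neighborhoods[i]) / (len(neighborhoods[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical data: a tight unit-square cluster plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
for p, score in zip(points, lof_scores(points, k=2)):
    print(p, round(score, 2))
```

The four cluster points score exactly 1.0, while the distant point scores above 10. In practice, scikit-learn's `LocalOutlierFactor` computes the same score, exposing its negation as `negative_outlier_factor_`.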


AIoT-CitySense: AI and IoT-Driven City-Scale Sensing for Roadside Infrastructure Maintenance

Abdur Rahim Mohammad Forkan; Yong-Bin Kang; Felip Marti; Abhik Banerjee; Chris McCarthy; Hadi Ghaderi; Breno Costa; Anas Dawod; Dimitrios Georgakopolous; Prem Prakash Jayaraman

Abstract: Making cities smarter and more efficient relies on the proactive and timely detection and maintenance of city-wide infrastructure, including roadside infrastructure such as road signs, and the cleaning of illegally dumped rubbish. Currently, these maintenance tasks rely predominantly on citizen reports or on-site checks by council staff. However, this approach has been shown to be time-consuming and costly, resulting in significant delays that negatively affect communities. This paper presents AIoT-CitySense, an AI- and IoT-driven city-scale sensing framework developed and piloted in collaboration with a local government in Australia. AIoT-CitySense is designed to address the specific requirements of roadside infrastructure maintenance within that local government's municipality. A tailored version of AIoT-CitySense has been deployed on existing waste service trucks that cover a road network of approximately 100 km in the municipality. Our analysis shows that proactive detection for roadside infrastructure maintenance using our solution reached 85%, improving on the timeframes of manual reporting processes. AIoT-CitySense could potentially transform various domains, such as the efficient detection of potholes and precise line marking for pedestrians. This paper illustrates how leveraging city-wide data with AI and IoT technologies can drive tangible changes and improve the quality of city life.

Keywords: Computer Science Applications; Artificial Intelligence; Information Systems; Software.


DB-GPT: Large Language Model Meets Database

Xuanhe Zhou; Zhaoyan Sun; Guoliang Li

Abstract: Large language models (LLMs) have shown superior performance in various areas, and they have the potential to revolutionize data management by serving as the "brain" of next-generation database systems. However, using LLMs to optimize databases poses several challenges. First, it is difficult to provide appropriate prompts (e.g., instructions and demonstration examples) that enable LLMs to understand database optimization problems. Second, LLMs capture only logical database characteristics (e.g., SQL semantics) and are not aware of physical characteristics (e.g., data distributions), so they must be fine-tuned to capture both physical and logical information. Third, LLMs are not well trained for databases with strict constraints (e.g., query plan equivalence) and privacy-preserving requirements, and it is challenging to train database-specific LLMs while ensuring database privacy. To overcome these challenges, this vision paper proposes an LLM-based database framework (DB-GPT), including automatic prompt generation, DB-specific model fine-tuning, and DB-specific model design and pre-training. Preliminary experiments show that DB-GPT achieves relatively good performance in database tasks such as query rewrite and index tuning. The source code and datasets are available at github.com/TsinghuaDatabaseGroup/DB-GPT.

Keywords: Computer Science Applications; Artificial Intelligence; Information Systems; Software.
