Publications

Publications by HumanISE

2025

TGNN-Bet: Approximation of Temporal Betweenness Centrality using Temporal Graph Neural Network

Authors
Sadhu, S; Kumari, K; Namtirtha, A; Malta, MC; Dutta, A;

Publication
International Conference on Communication Systems and Networks, COMSNETS

Abstract
Networks appear across various domains, and identifying central nodes in temporal networks is more challenging than in static networks. Temporal betweenness centrality is a widely used method to assess the importance of nodes. This method is based on shortest temporal path calculations. However, computing this centrality metric is computationally intensive, especially for large-scale networks. Various approximation algorithms exist, but they often lack efficiency or accuracy. We introduce TGNN-Bet, a temporal graph neural network model, to approximate temporal betweenness centrality. In TGNN-Bet, each node gathers features from multi-hop neighbors, enabling the model to simulate paths and capture the reachability of nodes. The model's effectiveness is validated using the Spearman correlation (ρ) performance metric and by comparing system runtimes with the existing temporal betweenness centrality method. Experimental results on six real-world temporal networks demonstrate that TGNN-Bet strongly correlates with existing temporal betweenness centrality methods. The proposed TGNN-Bet model achieves an average computation time reduction of 94.216% compared to conventional temporal betweenness centrality methods. © 2025 IEEE.
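The evaluation described in this abstract compares approximate centrality scores with exact ones by rank. As a minimal sketch of that kind of check, the Python below computes a Spearman rank correlation and a relative runtime reduction. The function names and the sample scores are hypothetical and are not taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def runtime_reduction_percent(baseline_seconds, model_seconds):
    """Relative time saved by the model, as a percentage of the baseline."""
    return (baseline_seconds - model_seconds) / baseline_seconds * 100


# Hypothetical scores for four nodes (illustrative values only).
exact_scores = [0.0, 0.1, 0.4, 0.9]
approx_scores = [0.05, 0.2, 0.3, 1.5]
rho = spearman(exact_scores, approx_scores)
```

Because only the ordering of nodes matters for the rank correlation, the approximate scores need not be on the same scale as the exact ones.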

2025

SUSTAINABILITY AND DIGITALISATION IN SOCIAL SOLIDARITY COOPERATIVES: A STUDY IN A CONTEXT OF CHANGE

Authors
Castro, C; Bernardino, SJQ; Meira, DA; Bandeira, AM; Pinto, C; Azevedo, AIRL; Pinto, AS; Rodrigues, AC; Martinho, ALMS; Rocha, AP; Vasconcelos, P; Fernandes, TP; Tomé, B; Coutinho, BC; Silva, M; Gomes, M; Antunes, SS; Curado Malta, M;

Publication
Cooperativismo e Economia Social

Abstract
The COVID-19 pandemic has brought new challenges to Social Solidarity Cooperatives (SSCs), affecting how they conduct their activities. The aim of this article is to analyse the extent to which SSCs’ behaviours have changed in terms of environmental practices and digital empowerment following the pandemic. Behaviour changes were assessed using a quantitative, exploratory methodology based on a questionnaire survey of 80 SSCs in Portugal. The results were analysed using a range of techniques, including descriptive analysis, exploratory factor analysis and cluster analysis. The data analysis process made it possible to group the SSCs into three distinct groups, characterised by different changes in behaviour: (i) a group of organisations with some changes in the organisation’s practices, which are more environmentally sustainable; (ii) a group of organisations that show some changes in terms of the digital transition; and (iii) a third group where there are simultaneously, and more significantly, changes in practices in terms of environmental sustainability and the digital transition. This last group is the one with the largest number of organisations in the sample. The formation of clusters is influenced by the age of the organisation and its location. © 2025, Faculty of Legal Sciences and Labor, University of Vigo. All rights reserved.

2025

Imbalanced learning in corruption detection: results explanations with SHAP

Authors
Vasconcelos, MO; Cavique, L;

Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
The growing use of machine learning for integrity assessments in public administration has intensified interest in understanding how algorithms can detect corruption risk, a topic of increasing relevance in the context of rising demands for transparency. Previous research on fraud detection often overlooks the dual challenge of extreme class imbalance and the need for model explainability. This study addresses both issues by combining data-level and algorithm-level techniques in a real-world dataset from Brazil's Federal District, where there is one corruption case for every 707 non-corruption cases (a ratio of 1:707). Data engineering was essential, encompassing gathering, cleaning, transformation, and dimensionality reduction to enhance model performance and interpretability. Among the tested models, weighted logistic regression stood out, achieving the best AUC (0.692). To increase transparency, we employed SHapley Additive exPlanations, enabling both global and local interpretability of predictions. The analysis identified strong predictors of corruption risk, such as business ownership, political candidacy, and frequent job function changes. This work provides a replicable pipeline that integrates imbalanced learning and explainable AI, offering valuable contributions to risk management and decision-making in the public sector.
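To make the 1:707 imbalance concrete, the sketch below computes class weights with the common "balanced" heuristic, weight = n_samples / (n_classes * class_count). The abstract does not state which weighting scheme the authors used, so this heuristic is an assumption, and the function name and class labels are hypothetical.

```python
def balanced_class_weights(class_counts):
    """Weight each class inversely to its frequency.

    Uses the common heuristic n_samples / (n_classes * class_count);
    this is an assumed scheme, not one confirmed by the paper.
    """
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count)
            for label, count in class_counts.items()}


# One corruption case per 707 non-corruption cases, as stated in the abstract.
counts = {"non_corruption": 707, "corruption": 1}
weights = balanced_class_weights(counts)
ratio = weights["corruption"] / weights["non_corruption"]
```

Under this heuristic the minority class is weighted 707 times more heavily than the majority class, so each misclassified corruption case costs as much in the loss as 707 misclassified non-corruption cases.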

2025

A machine learning framework for uplift modeling through customer segmentation

Authors
Pinheiro, P; Cavique, L;

Publication
Decision Analytics Journal

Abstract
In uplift modeling, the goal is to identify high-value customers based on persuadable customers, those who make a purchase only if contacted. To achieve this, uplift modeling combines machine learning techniques with causal inference, allowing businesses to refine their customer targeting strategies and focus efforts where they are most profitable. This study proposes a practical and reproducible two-phase procedure for identifying high-value customers. In the first phase, customers are segmented using decision trees, which offer a transparent and data-driven approach to grouping individuals with similar characteristics. This segmentation lays the groundwork for a meaningful interpretation of customer behavior. In the second phase, uplift is calculated for each customer segment by comparing the outcomes of the treatment and control groups. This enables the identification of customer groups with the highest uplift. A real-world use case further illustrates the value and applicability of the proposed method. To validate model performance, the procedure employs established metrics such as the Qini index and Cohen's kappa, which provide insights into both the effectiveness and reliability of the uplift estimates. This work presents a decoupled procedure for uplift modeling that leverages well-established libraries, fostering transparency and a clear understanding of the analytical process. A key contribution to uplift modeling and causal inference is the use of decision trees for stratification, which enables the creation of meaningful segments and their evaluation through the average treatment effect. By integrating theory with practical implementation, this work offers a comprehensive framework for uplift modeling that bridges academic rigor and business usability. © 2025 Elsevier B.V., All rights reserved.
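The second phase described in this abstract, comparing treatment and control outcomes within each segment, can be sketched as follows. In the paper the segments come from decision trees; here they are assumed to be given as labels already, and the function name, record layout, and sample data are all hypothetical.

```python
from collections import defaultdict


def uplift_by_segment(rows):
    """Compute per-segment uplift from (segment, treated, converted) records.

    Uplift is the conversion rate of treated customers minus the
    conversion rate of control customers in the same segment.
    """
    # segment -> treated flag -> [conversions, customers]
    stats = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, converted in rows:
        cell = stats[segment][treated]
        cell[0] += int(converted)
        cell[1] += 1

    result = {}
    for segment, groups in stats.items():
        (t_conv, t_n), (c_conv, c_n) = groups[True], groups[False]
        if t_n == 0 or c_n == 0:
            continue  # uplift is undefined without both groups
        result[segment] = t_conv / t_n - c_conv / c_n
    return result


# Hypothetical records: segment label, contacted?, purchased?
rows = (
    [("A", True, True)] * 3 + [("A", True, False)]
    + [("A", False, True)] + [("A", False, False)] * 3
    + [("B", True, True), ("B", True, False)]
    + [("B", False, True), ("B", False, False)]
)
uplift = uplift_by_segment(rows)
best_segment = max(uplift, key=uplift.get)
```

In this toy data, segment A converts at 75% when contacted and 25% otherwise, while segment B converts at 50% either way, so only A would be worth targeting.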

2025

Mitigating false negatives in imbalanced datasets: An ensemble approach

Authors
Vasconcelos, M; Cavique, L;

Publication
EXPERT SYSTEMS WITH APPLICATIONS

Abstract
Imbalanced datasets present a challenge in machine learning, especially in binary classification scenarios where one class significantly outweighs the other. This imbalance often leads to models favoring the majority class, resulting in inadequate predictions for the minority class, specifically in false negatives. In response to this issue, this work introduces the MinFNR ensemble algorithm, designed to minimize False Negative Rates (FNR) in imbalanced datasets. The new approach strategically combines data-level, algorithmic-level, and hybrid-level approaches to enhance overall predictive capabilities while minimizing computational resources using the Set Covering Problem (SCP) formulation. Through a comprehensive evaluation of diverse datasets, MinFNR consistently outperforms individual algorithms, showing its potential for applications where the cost of false negatives is substantial, such as fraud detection and medical diagnosis. This work also contributes to ongoing efforts to improve the reliability and effectiveness of machine learning algorithms in real imbalanced scenarios.
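As an illustration of the two ideas named in this abstract, the sketch below computes a False Negative Rate and selects a small set of classifiers whose correct detections jointly cover the positive instances. The abstract does not say how MinFNR solves its Set Covering Problem, so the greedy heuristic used here is a stand-in, and the model names and instance ids are hypothetical.

```python
def false_negative_rate(y_true, y_pred):
    """FNR = FN / (FN + TP), with 1 as the positive (minority) label."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fn / (fn + tp) if (fn + tp) else 0.0


def greedy_cover(detected):
    """Pick models until every detectable positive instance is covered.

    detected maps a model name to the set of positive instance ids that
    the model classifies correctly. Greedy selection is a standard
    set-cover heuristic, used here as an assumption.
    """
    uncovered = set().union(*detected.values())
    chosen = []
    while uncovered:
        # Choose the model that covers the most still-uncovered positives.
        best = max(detected, key=lambda name: len(detected[name] & uncovered))
        chosen.append(best)
        uncovered -= detected[best]
    return chosen


# Hypothetical detections for four models over positives 1..5.
detected = {"m1": {1, 2, 3}, "m2": {3, 4}, "m3": {4, 5}, "m4": {2}}
ensemble = greedy_cover(detected)
fnr = false_negative_rate([1, 1, 1, 0, 0], [1, 0, 1, 0, 1])
```

Here two of the four models are enough to cover every positive that any model detects, which is the sense in which a covering formulation can reduce the number of models an ensemble has to run.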

2025

Large Language Model for Querying Databases in Portuguese

Authors
Figueiredo, L; Pinheiro, P; Cavique, L; Marques, N;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
This study introduces a system that helps non-expert users find information easily without knowing database languages or asking technicians for help. A specific domain is explored, focusing on a subscription-based sports facility, which serves as an open-source version of a real case study. Utilizing the star schema, the available data in the database is structured to provide accessibility through Portuguese Natural Language queries. Using a Large Language Model (LLM), SQL queries are generated based on the question and the provided star schema. We created a dataset with 115 highly challenging questions drawn from real-world usage scenarios to validate the correctness of the system. Challenges found during testing, like attribute value interpretation, out-of-scope questions, and temporal interval adequacy issues, highlight the insufficiency of the star schema alone in providing the needed context for generating accurate SQL queries by the LLM. Addressing these challenges through enhanced contextual information shows significant improvement in query correctness, with validation results increasing from 57.76% to 88.79%. This study shows the potential and limitations of LLMs in generating SQL queries from Portuguese Natural Language queries. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
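The abstract describes generating SQL from a question plus a star schema, then improving results by adding contextual information. A minimal sketch of how such a prompt might be assembled is shown below. The schema, the context notes, the question, and the function name are all hypothetical, the paper's actual prompt is not given in the abstract, and no LLM is called here.

```python
def build_sql_prompt(schema_ddl, question, context_notes=()):
    """Assemble a text prompt asking a model for a single SQL query.

    context_notes carries extra hints (value encodings, date handling,
    scope limits), standing in for the enhanced context the abstract
    mentions. The wording of the prompt is an assumption.
    """
    parts = ["Star schema:", schema_ddl.strip()]
    if context_notes:
        parts.append("Context:")
        parts.extend(f"- {note}" for note in context_notes)
    parts.append("Question (Portuguese):")
    parts.append(question.strip())
    parts.append("Answer with one SQL query, or OUT_OF_SCOPE if the schema cannot answer it.")
    return "\n".join(parts)


# Hypothetical star schema for a subscription-based sports facility.
schema = """
CREATE TABLE fact_visit (member_id INT, date_id INT, activity_id INT);
CREATE TABLE dim_member (member_id INT, gender CHAR(1), plan TEXT);
CREATE TABLE dim_date (date_id INT, day DATE);
"""
notes = (
    "dim_member.gender uses 'F' and 'M'",
    "'last month' means the previous full calendar month",
)
prompt = build_sql_prompt(schema, "Quantas visitas houve no mês passado?", notes)
```

The notes address two of the failure types the abstract lists, attribute value interpretation and temporal intervals, while the final instruction gives the model an explicit way to decline out-of-scope questions.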
