Luís Cavique

Porto, Portugal

+351 222 094 000

info@inesctec.pt
Details

Name: Luís Cavique
Role: External Research Collaborator
Since: 12th March 2025
Nationality: Portugal
Centre: Human-Centered Computing and Information Science
Contacts: +351 222 094 000, luis.cavique@inesctec.pt

Publications

2026

Managing Missing Data and Predictions in Short Time Series

Authors
António, F; Cavique, L;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2025, PT I

Abstract
Sales forecasting in the presence of Missing Data poses significant challenges, particularly for short time series where limited observations amplify the impact of incomplete records. This study analyzes a real-world transactional dataset (2021-2024) to predict quantities and prices for 2025. We classify missingness patterns and mechanisms (MCAR, MAR, MNAR) to inform the selection of imputation strategies. We evaluate techniques including MICE, Mean, KNN, and Linear Regression under simulated missingness rates, with KNN emerging as the most robust for the MAR mechanism. Regarding very short-term series predictions, the naive forecast Max2 (maximum of the last two observed values) outperformed moving averages. The results highlight the importance of mechanism-aware imputation and domain-tailored forecasting in sparse datasets. This work presents a practical framework for businesses to effectively utilize incomplete sales data.
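The Max2 rule named in the abstract (maximum of the last two observed values) can be sketched in a few lines and compared with a simple moving average. This is only an illustration under stated assumptions, not the authors' code: the function names, the use of `None` for a missing observation, the window size, and the sample figures are all hypothetical.

```python
from typing import Optional, Sequence


def observed(series: Sequence[Optional[float]]) -> list[float]:
    """Keep only the observed values, in order (None marks a missing entry)."""
    return [x for x in series if x is not None]


def max2_forecast(series: Sequence[Optional[float]]) -> float:
    """Naive Max2 forecast: maximum of the last two observed values."""
    values = observed(series)
    if not values:
        raise ValueError("need at least one observed value")
    return max(values[-2:])


def moving_average_forecast(series: Sequence[Optional[float]], window: int = 3) -> float:
    """Baseline: mean of the last `window` observed values (window is an assumption)."""
    values = observed(series)
    if not values:
        raise ValueError("need at least one observed value")
    tail = values[-window:]
    return sum(tail) / len(tail)


# Hypothetical yearly quantities for 2021-2024, with the 2022 value missing.
sales = [120.0, None, 90.0, 150.0]
print(max2_forecast(sales))            # 150.0
print(moving_average_forecast(sales))  # 120.0
```

With so few points, the moving average is pulled down by older values, whereas Max2 follows the most recent peak. Whether that is desirable depends on the series.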

2026

Municipal food waste collection strategies in Portugal: A dataset

Authors
Alcalde, DD; Bugarim, D; Coelho, T; Almeida, E; Silva, C; Cavique, L; Dias Ferreira, C;

Publication
DATA IN BRIEF

Abstract
The dataset reports an up-to-date overview of selective biowaste collection, with a focus on food waste and organic kitchen waste, across 308 municipalities in Portugal, to assess compliance with the EU Waste Framework Directive, which made biowaste collection mandatory from 1st January 2024. Data were collected through a structured survey sent to all municipalities, complemented by systematic research in secondary official sources such as municipal websites, reports and statistical data. The questionnaire covered aspects such as coverage, collection models (nearby bring points, door-to-door, co-collection), sector-specific deployment (household collection, non-domestic collection), operational characteristics, and performance indicators (capture rates, cost per tonne). The dataset was structured and validated by cross-checking the multiple sources assessed, prioritising direct municipal questionnaire responses. It includes disaggregated data at municipality level, including detailed information on the characteristics and efficiency of the initiatives, when available. The database allows cross-comparison of biowaste collection strategies across Portuguese regions, and potentially with other international systems, with a focus on food waste and organic kitchen waste. Municipalities in Portugal have been carrying out pilot experiences within their territories, but there is no systematic assessment of what has been carried out or of the results obtained. Given the limited available data, this dataset provides a valuable resource for policy design and for further research on biowaste management initiatives, to assess their efficiency and adaptability to different municipal realities at a national and even European level. (c) 2025 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)

2025

Imbalanced learning in corruption detection: results explanations with SHAP

Authors
Vasconcelos, MO; Cavique, L;

Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
The growing use of machine learning for integrity assessments in public administration has intensified interest in understanding how algorithms can detect corruption risk, a topic of increasing relevance in the context of rising demands for transparency. Previous research on fraud detection often overlooks the dual challenge of extreme class imbalance and the need for model explainability. This study addresses both issues by combining data-level and algorithm-level techniques in a real-world dataset from Brazil's Federal District, where there is one corruption case for every 707 non-corruption cases (a ratio of 1:707). Data engineering was essential, encompassing gathering, cleaning, transformation, and dimensionality reduction to enhance model performance and interpretability. Among the tested models, weighted logistic regression stood out, achieving the best AUC (0.692). To increase transparency, we employed SHapley Additive exPlanations, enabling both global and local interpretability of predictions. The analysis identified strong predictors of corruption risk, such as business ownership, political candidacy, and frequent job function changes. This work provides a replicable pipeline that integrates imbalanced learning and explainable AI, offering valuable contributions to risk management and decision-making in the public sector.
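To give a sense of what weighting means at a 1:707 ratio, the sketch below computes class weights with the common "balanced" heuristic, n_samples / (n_classes * class_count), and plugs them into a weighted log-loss. The abstract does not say which weighting scheme the authors used, so the heuristic, the function names, and the toy labels are assumptions made for illustration.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_log_loss(labels, probs, weights):
    """Class-weighted binary cross-entropy, normalised by the total weight.

    `probs` holds the predicted probability of class 1 and must lie in (0, 1).
    """
    total, norm = 0.0, 0.0
    for y, p in zip(labels, probs):
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        norm += w
    return total / norm


# Toy labels reproducing the 1:707 ratio reported in the abstract.
labels = [1] + [0] * 707
weights = balanced_class_weights(labels)
print(weights[1])               # 354.0
print(weights[1] / weights[0])  # approximately 707
```

Under this heuristic a single positive case carries as much total weight as all 707 negatives combined, so errors on the minority class are no longer drowned out in the loss.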

2025

A machine learning framework for uplift modeling through customer segmentation

Authors
Pinheiro, P; Cavique, L;

Publication
Decision Analytics Journal

Abstract
In uplift modeling, the goal is to identify high-value customers based on persuadable customers, those who make a purchase only if contacted. To achieve this, uplift modeling combines machine learning techniques with causal inference, allowing businesses to refine their customer targeting strategies and focus efforts where they are most profitable. This study proposes a practical and reproducible two-phase procedure for identifying high-value customers. In the first phase, customers are segmented using decision trees, which offer a transparent and data-driven approach to grouping individuals with similar characteristics. This segmentation lays the groundwork for a meaningful interpretation of customer behavior. In the second phase, uplift is calculated for each customer segment by comparing the outcomes of the treatment and control groups. This enables the identification of customer groups with the highest uplift. A real-world use case further illustrates the value and applicability of the proposed method. To validate model performance, the procedure employs established metrics such as the Qini index and Cohen's kappa, which provide insights into both the effectiveness and reliability of the uplift estimates. This work presents a decoupled procedure for uplift modeling that leverages well-established libraries, fostering transparency and a clear understanding of the analytical process. A key contribution to uplift modeling and causal inference is the use of decision trees for stratification, which enables the creation of meaningful segments and their evaluation through the average treatment effect. By integrating theory with practical implementation, this work offers a comprehensive framework for uplift modeling that bridges academic rigor and business usability. © 2025 Elsevier B.V., All rights reserved.
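The second phase described in the abstract, computing uplift per segment as the difference between treatment and control outcomes, can be sketched as follows. In the paper the segments come from decision trees; here they are taken as given labels, and the record layout, function name, and sample data are hypothetical.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """Per-segment uplift: purchase rate of treated minus purchase rate of control.

    `records` is an iterable of (segment, treated, purchased) tuples.
    Segments lacking either a treated or a control group are skipped,
    since the difference is undefined there.
    """
    counts = defaultdict(lambda: {"t_n": 0, "t_y": 0, "c_n": 0, "c_y": 0})
    for segment, treated, purchased in records:
        prefix = "t" if treated else "c"
        counts[segment][prefix + "_n"] += 1
        counts[segment][prefix + "_y"] += int(purchased)

    uplift = {}
    for segment, c in counts.items():
        if c["t_n"] == 0 or c["c_n"] == 0:
            continue
        uplift[segment] = c["t_y"] / c["t_n"] - c["c_y"] / c["c_n"]
    return uplift


# Hypothetical campaign data: (segment, contacted?, purchased?)
records = (
    [("A", True, True)] * 3 + [("A", True, False)] * 1
    + [("A", False, True)] * 1 + [("A", False, False)] * 3
    + [("B", True, True)] * 2 + [("B", True, False)] * 2
    + [("B", False, True)] * 2 + [("B", False, False)] * 2
)
uplift = uplift_by_segment(records)
print(uplift)                           # {'A': 0.5, 'B': 0.0}
print(max(uplift, key=uplift.get))      # A
```

In this toy data, contacting segment A raises the purchase rate from 25% to 75%, while segment B buys at the same rate either way, so only A looks persuadable.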

2025

Mitigating false negatives in imbalanced datasets: An ensemble approach

Authors
Vasconcelos, M; Cavique, L;

Publication
EXPERT SYSTEMS WITH APPLICATIONS

Abstract
Imbalanced datasets present a challenge in machine learning, especially in binary classification scenarios where one class significantly outweighs the other. This imbalance often leads to models favoring the majority class, resulting in inadequate predictions for the minority class, specifically in false negatives. In response to this issue, this work introduces the MinFNR ensemble algorithm, designed to minimize False Negative Rates (FNR) in imbalanced datasets. The new approach strategically combines data-level, algorithmic-level, and hybrid-level approaches to enhance overall predictive capabilities while minimizing computational resources using the Set Covering Problem (SCP) formulation. Through a comprehensive evaluation of diverse datasets, MinFNR consistently outperforms individual algorithms, showing its potential for applications where the cost of false negatives is substantial, such as fraud detection and medical diagnosis. This work also contributes to ongoing efforts to improve the reliability and effectiveness of machine learning algorithms in real imbalanced scenarios.
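The two ingredients named in the abstract, the False Negative Rate and a Set Covering formulation, can be illustrated with a small sketch. This is not the MinFNR algorithm, whose details the abstract does not give. It assumes one plausible reading, in which each model "covers" the positive instances it detects and a generic greedy set-cover heuristic picks a small group of models that together cover as many positives as possible. All names and data are hypothetical.

```python
def false_negative_rate(y_true, y_pred):
    """FNR = FN / (FN + TP), computed over the positive class (label 1)."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    if fn + tp == 0:
        raise ValueError("no positive instances")
    return fn / (fn + tp)


def greedy_set_cover(universe, subsets):
    """Greedy heuristic: repeatedly pick the subset covering most uncovered items.

    `subsets` maps a name to a set of items. Returns the chosen names in order.
    Stops early if the remaining items cannot be covered by any subset.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gain = subsets[best] & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen


print(false_negative_rate([1, 1, 1, 1, 0, 0], [1, 0, 1, 0, 0, 1]))  # 0.5

# Hypothetical: indices of positive instances that each model detects.
positives = {0, 1, 2, 3, 4}
detected = {"m1": {0, 1, 2}, "m2": {2, 3}, "m3": {3, 4}, "m4": {1}}
print(greedy_set_cover(positives, detected))  # ['m1', 'm3']
```

Here two of the four models are enough to catch every positive instance, which is the kind of resource saving a set-cover view aims for.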
