Publications

Publications by João Gama

2025

Data Science for Fighting Environmental Crime

Authors
Barbosa, M; Ribeiro, C; Gomes, F; Ribeiro, RP; Gama, J;

Publication
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2023, PT II

Abstract
The rise of environmental crimes has become a major global concern, as they cause significant damage to ecosystems and public health and result in economic losses. The availability of vast amounts of sensor data provides an opportunity to analyze environmental data proactively, helping to detect irregularities and uncover potential criminal activities. This paper highlights the critical role played by machine learning (ML) and remote sensing technologies in the continuously evolving landscape of environmental crime. By examining case studies on detecting illegal fishing, illegal oil spills, illegal landfills, and illegal logging, we delve into the practical implementation of data-driven approaches for environmental crime detection. Our goal with this study is to provide an overview of existing research in this area and to foster the use of ML and data science techniques to enhance environmental crime detection.

2025

Fairness Analysis in Causal Models: An Application to Public Procurement

Authors
Teixeira, S; Nogueira, AR; Gama, J;

Publication
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2023, PT II

Abstract
Data-driven decision models based on Artificial Intelligence (AI) have been widely used in the public and private sectors. In areas of public interest, these models pose challenges, as they are expected to be fair, effective and transparent. Bias, fairness and government transparency are aspects that significantly impact the functioning of a democratic society. They shape the relationship between the government and its citizens, influencing trust, accountability, and the equitable treatment of individuals and groups. Data-driven decision models can be biased at several stages of the process, contributing to injustices. Our research purpose is to understand fairness in the use of causal discovery for public procurement. By analysing Portuguese public contracts data, we aim i) to predict the place of execution of public contracts using the PC algorithm with the sp-mi, smc-χ² and mc-χ² conditional independence tests; ii) to analyse and compare fairness in those scenarios using the Predictive Parity Rate, Proportional Parity, Demographic Parity and Accuracy Parity metrics. By addressing fairness concerns, we seek to enhance responsible data-driven decision models. We conclude that, in our case, fairness metrics make the assessment more local than global due to causality pathways. We also observe that the Proportional Parity metric has the lowest variance among all metrics and is one of those with the highest precision, which reinforces the observation that the Agency category is the one furthest apart in terms of the proportion of the groups.
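As an illustration of the four group fairness metrics named above, here is a minimal sketch for binary predictions. The definitions are assumptions following common conventions (demographic parity as the count of positive predictions per group, proportional parity as the share of positive predictions, predictive parity as per-group precision, accuracy parity as per-group accuracy, each also expressed as a ratio to a chosen reference group); the paper's exact definitions and its multi-class setting (place of execution) are not given in the abstract. The function name `group_fairness` is hypothetical.

```python
from collections import defaultdict


def group_fairness(y_true, y_pred, groups, reference):
    """Per-group fairness metrics for binary predictions (1 = positive).

    Assumed definitions (common conventions, not taken from the paper):
      demographic  -> number of positive predictions in the group
      proportional -> share of positive predictions in the group
      predictive   -> precision within the group, TP / (TP + FP)
      accuracy     -> accuracy within the group
    Returns the raw values and their ratios to the reference group.
    """
    stats = defaultdict(lambda: {"n": 0, "pos_pred": 0, "tp": 0, "correct": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        s = stats[g]
        s["n"] += 1
        s["pos_pred"] += p
        s["tp"] += int(t == 1 and p == 1)
        s["correct"] += int(t == p)

    raw = {}
    for g, s in stats.items():
        raw[g] = {
            "demographic": s["pos_pred"],
            "proportional": s["pos_pred"] / s["n"],
            # precision is undefined when a group has no positive predictions
            "predictive": s["tp"] / s["pos_pred"] if s["pos_pred"] else float("nan"),
            "accuracy": s["correct"] / s["n"],
        }

    ref = raw[reference]
    # A ratio of 1.0 means the group matches the reference group on that metric.
    ratios = {
        g: {m: (v / ref[m] if ref[m] else float("nan")) for m, v in metrics.items()}
        for g, metrics in raw.items()
    }
    return raw, ratios
```

For example, `group_fairness([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1], ["A", "A", "A", "B", "B", "B"], "A")` reports that group B receives three times the share of positive predictions of group A while both groups have the same accuracy, which is the kind of divergence between metrics the abstract compares.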

2024

Recent Advances in Learning from Data Streams

Authors
Gama, J;

Publication
IC3K

2024

Next Location Prediction with Time-Evolving Markov Models over Data Streams

Authors
Andrade, T; Gama, J;

Publication
EPIA (3)

Abstract
Various relevant aspects of our lives relate to the places we visit and our daily activities. The movement of individuals between regular places, such as work, school, or other important personal locations, is receiving increasing attention due to the pervasiveness of geolocation devices and the amount of data they generate. This paper presents an approach for personal location prediction using a probabilistic model and data mining techniques over mobility data streams. We extract the individuals’ locations from relevant events in a data stream to build and maintain a Markov Chain over the important places. We evaluate the method on three real-world datasets. The results show the usefulness of the proposal in comparison with other well-known approaches.
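The idea of maintaining a Markov Chain over important places from a stream can be sketched as a first-order transition table updated one event at a time. This is a minimal sketch under stated assumptions: the stream is assumed to already yield place identifiers (extracting them from raw events is out of scope), consecutive repeats of the same place are ignored, and the fading factor `alpha` is a hypothetical way to let the chain evolve over time, not necessarily the mechanism used in the paper. The class name `StreamingMarkovPredictor` is hypothetical.

```python
from collections import defaultdict


class StreamingMarkovPredictor:
    """First-order Markov chain over places, updated incrementally.

    Assumptions: input events are already place identifiers; alpha
    (0 < alpha <= 1) is a hypothetical fading factor that down-weights
    older transitions out of a place each time a new one is observed.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = defaultdict(lambda: defaultdict(float))
        self.last_place = None

    def update(self, place):
        # Only a move to a different place counts as a transition.
        if self.last_place is not None and place != self.last_place:
            row = self.counts[self.last_place]
            for nxt in row:
                row[nxt] *= self.alpha  # fade existing outgoing transitions
            row[place] += 1.0
        self.last_place = place

    def transition_probs(self, place):
        # Normalise the (possibly faded) counts into probabilities.
        row = self.counts.get(place, {})
        total = sum(row.values())
        return {nxt: w / total for nxt, w in row.items()} if total else {}

    def predict_next(self, place=None):
        # Most probable successor of `place` (default: the current place).
        probs = self.transition_probs(self.last_place if place is None else place)
        return max(probs, key=probs.get) if probs else None
```

Feeding the sequence `home, work, home, gym, home, work` gives `home -> work` a probability of 2/3, so `predict_next("home")` returns `"work"`; with `alpha < 1`, a recently changed routine would overtake older habits more quickly.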

2025

Decision-making systems improvement based on explainable artificial intelligence approaches for predictive maintenance

Authors
Rajaoarisoa, L; Randrianandraina, R; Nalepa, GJ; Gama, J;

Publication
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE

Abstract
To maintain the performance of the latest generation of onshore and offshore wind turbine systems, a new methodology must be proposed to enhance the maintenance policy. In this context, this paper introduces an approach to designing a decision support tool that combines predictive capabilities with anomaly explanations for effective IoT predictive maintenance tasks. Essentially, the paper proposes an approach that integrates a predictive maintenance model with an explicative decision-making system. The key challenge is to detect anomalies and provide plausible explanations, enabling human operators to determine the necessary actions swiftly. To achieve this, the proposed approach identifies a minimal set of relevant features required to generate rules that explain the root causes of issues in the physical system. It identifies certain features, such as the active power generator, blade pitch angle, and the average water temperature of the voltage circuit protection in the generator's sub-components, as particularly critical to monitor. Additionally, the approach simplifies the computation of an efficient predictive maintenance model. Compared to other deep learning models, the identified model provides up to 80% accuracy in anomaly detection and up to 96% for predicting the remaining useful life of the system under study. These performance metrics and indicator values are essential for enhancing the decision-making process. Moreover, the proposed decision support tool elucidates the onset of degradation and its dynamic evolution based on expert knowledge and data gathered through Internet of Things (IoT) technology and inspection reports. Thus, the developed approach should help maintenance managers make accurate decisions regarding inspection, replacement, and repair tasks. The methodology is demonstrated using a wind farm dataset provided by Energias De Portugal.

2025

Modelling Concept Drift in Dynamic Data Streams for Recommender Systems

Authors
Caroprese, L; Pisani, FS; Veloso, BM; König, M; Manco, G; Hoos, HH; Gama, J;

Publication
Trans. Recomm. Syst.

Abstract
Recommendation systems play a crucial role in modern e-commerce and streaming services. However, the limited availability of public datasets hampers the rapid development of more efficient and accurate recommendation algorithms within the research community. This work introduces a stream-based data generator designed to generate user preferences for a set of items while accommodating progressive changes in user preferences. The underlying principle involves using user/item embeddings to derive preferences by exploring the proximity of these embeddings. Whether randomly generated or learned from a real finite data stream, these embeddings serve as the basis for generating new preferences. We investigate how this fundamental model can adapt to shifts in user behavior over time; in our framework, changes correspond to alterations in the structure of the tripartite graph, reflecting modifications in the underlying embeddings. Through an analysis of real-life data streams, we demonstrate that the proposed model is effective in capturing actual preferences and the changes they can exhibit over time. We then characterize these changes and develop a generalized method capable of simulating realistic data, generating streams with similar yet controllable drift dynamics.
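The principle of deriving preferences from embedding proximity, with drift expressed as changes to the embeddings, can be illustrated with a small generator. This is a minimal sketch under stated assumptions, not the paper's generator: proximity is taken as negative squared Euclidean distance, items are sampled with a softmax over those scores, drift is modelled as one user's embedding moving a fraction `rate` toward a `target` vector at each step, and the tripartite graph structure and embeddings learned from real streams are not modelled. All function names are hypothetical.

```python
import math
import random


def make_embeddings(n, dim, rng):
    """Randomly generated embeddings (one of the two options in the abstract)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def sample_item(user_vec, item_vecs, rng, temperature=1.0):
    """Pick an item with probability increasing in its proximity to the user."""
    # Proximity as negative squared Euclidean distance (an assumption).
    scores = [-sum((u - v) ** 2 for u, v in zip(user_vec, item)) / temperature
              for item in item_vecs]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    return rng.choices(range(len(item_vecs)), weights=weights, k=1)[0]


def preference_stream(users, items, steps, drift_user, target, rate, seed=0):
    """Yield (t, user, item) events while one user's embedding drifts gradually."""
    rng = random.Random(seed)
    users = [list(u) for u in users]  # copy so the caller's embeddings are untouched
    for t in range(steps):
        # Gradual drift: move `rate` of the remaining way toward the target.
        users[drift_user] = [u + rate * (g - u)
                             for u, g in zip(users[drift_user], target)]
        uid = rng.randrange(len(users))
        yield t, uid, sample_item(users[uid], items, rng)
```

A usage example: `preference_stream(make_embeddings(50, 8, random.Random(0)), make_embeddings(200, 8, random.Random(1)), steps=1000, drift_user=3, target=[1.0] * 8, rate=0.01)` produces a stream in which user 3's choices shift progressively toward items near `target`; a small `rate` gives gradual drift, while setting `rate=1.0` at a single step would emulate an abrupt change.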