Publications

Publications by João Gama

2023

A DTW Approach for Complex Data A Case Study with Network Data Streams

Authors
Silva, PR; Vinagre, J; Gama, J;

Publication
38TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2023

Abstract
Dynamic Time Warping (DTW) is a robust method to measure the similarity between two sequences. This paper proposes a method based on DTW to analyse high-speed data streams. The central idea is to decompose the network traffic into sequences of histograms of packet sizes and then calculate the distance between pairs of such sequences using DTW with Kullback-Leibler (KL) distance. As a baseline, we also compute the Euclidean Distance between the sequences of histograms. Since our preliminary experiments indicate that the distance between two sequences falls within a different range of values for distinct types of streams, we then exploit this distance information for stream classification using a Random Forest. The approach was investigated using recent internet traffic data from a telecommunications company. To illustrate the application of our approach, we conducted a case study with encrypted Internet Protocol Television (IPTV) network traffic data. The goal was to use our DTW-based approach to detect the video codec used in the streams, as well as the IPTV channel. Results strongly suggest that the DTW distance value between the data streams is highly informative for such classification tasks.

CloseRead Abstract

2023

Identification of morphologically cryptic species with computer vision models: wall lizards (Squamata: Lacertidae: Podarcis) as a case study

Authors
Pinho, C; Kaliontzopoulou, A; Ferreira, CA; Gama, J;

Publication
ZOOLOGICAL JOURNAL OF THE LINNEAN SOCIETY

Abstract
Automated image classification is a thriving field of machine learning, and various successful applications dealing with biological images have recently emerged. In this work, we address the ability of these methods to identify species that are difficult to tell apart by humans due to their morphological similarity. We focus on distinguishing species of wall lizards, namely those belonging to the Podarcis hispanicus species complex, which constitutes a well-known example of cryptic morphological variation. We address two classification experiments: (1) assignment of images of the morphologically relatively distinct P. bocagei and P. lusitanicus; and (2) distinction between the overall more cryptic nine taxa that compose this complex. We used four datasets (two image perspectives and individuals of the two sexes) and three deep-learning models to address each problem. Our results suggest a high ability of the models to identify the correct species, especially when combining predictions from different perspectives and models (accuracy of 95.9% and 97.1% for females and males, respectively, in the two-class case; and of 91.2% to 93.5% for females and males, respectively, in the nine-class case). Overall, these results establish deep-learning models as an important tool for field identification and monitoring of cryptic species complexes, alleviating the burden of expert or genetic identification.

CloseRead Abstract

2024

Classification of Pulmonary Nodules in 2-[<SUP>18</SUP>F]FDG PET/CT Images with a 3D Convolutional Neural Network

Authors
Alves, VM; Cardoso, JD; Gama, J;

Publication
NUCLEAR MEDICINE AND MOLECULAR IMAGING

Abstract
Purpose 2-[F-18]FDG PET/CT plays an important role in the management of pulmonary nodules. Convolutional neural networks (CNNs) automatically learn features from images and have the potential to improve the discrimination between malignant and benign pulmonary nodules. The purpose of this study was to develop and validate a CNN model for classification of pulmonary nodules from 2-[F-18]FDG PET images.Methods One hundred thirteen participants were retrospectively selected. One nodule per participant. The 2-[F-18]FDG PET images were preprocessed and annotated with the reference standard. The deep learning experiment entailed random data splitting in five sets. A test set was held out for evaluation of the final model. Four-fold cross-validation was performed from the remaining sets for training and evaluating a set of candidate models and for selecting the final model. Models of three types of 3D CNNs architectures were trained from random weight initialization (Stacked 3D CNN, VGG-like and Inception-v2-like models) both in original and augmented datasets. Transfer learning, from ImageNet with ResNet-50, was also used.Results The final model (Stacked 3D CNN model) obtained an area under the ROC curve of 0.8385 (95% CI: 0.6455-1.0000) in the test set. The model had a sensibility of 80.00%, a specificity of 69.23% and an accuracy of 73.91%, in the test set, for an optimised decision threshold that assigns a higher cost to false negatives.Conclusion A 3D CNN model was effective at distinguishing benign from malignant pulmonary nodules in 2-[F-18]FDG PET images.

CloseRead Abstract

2023

Data-driven predictive maintenance framework for railway systems

Authors
Meira, J; Veloso, B; Bolon Canedo, V; Marreiros, G; Alonso Betanzos, A; Gama, J;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
The emergence of the Industry 4.0 trend brings automation and data exchange to industrial manufacturing. Using computational systems and IoT devices allows businesses to collect and deal with vast volumes of sensorial and business process data. The growing and proliferation of big data and machine learning technologies enable strategic decisions based on the analyzed data. This study suggests a data-driven predictive maintenance framework for the air production unit (APU) system of a train of Metro do Porto. The proposed method assists in detecting failures and errors in machinery before they reach critical stages. We present an anomaly detection model following an unsupervised approach, combining the Half-Space-trees method with One Class K Nearest Neighbor, adapted to deal with data streams. We evaluate and compare our approach with the Half-Space-Trees method applied without the One Class K Nearest Neighbor combination. Our model produced few type-I errors, significantly increasing the value of precision when compared to the Half-Space-Trees model. Our proposal achieved high anomaly detection performance, predicting most of the catastrophic failures of the APU train system.

CloseRead Abstract

2023

XAI for Predictive Maintenance

Authors
Gama, J; Nowaczyk, S; Pashami, S; Ribeiro, RP; Nalepa, GJ; Veloso, B;

Publication
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023

Abstract
The field of Explainable Predictive Maintenance (PM) is concerned with developing methods that can clarify how AI systems operate in the PM domain. One of the challenges of creating maintenance plans is integrating AI output with human decision-making processes and expertise. For AI to be helpful and trustworthy, fault predictions must be contextualized and easily comprehensible to humans. This involves providing tailored explanations to different actors depending on their roles and needs. For example, engineers can be connected to technical installation blueprints, while managers can evaluate system downtime costs, and lawyers can assess safety-threatening failures' potential liability. In many industries, black-box AI systems analyze sensor data to predict failures by detecting anomalies and deviations from typical behavior with impressive accuracy. However, PM is just one part of a broader context that aims to identify the most probable causes, develop a recovery plan, and estimate remaining useful life while providing alternative solutions. Achieving this requires complex interactions among various actors in industrial and decision-making processes. Our tutorial explores current trends, promising research directions in Explainable AI (XAI) relevant to Explainable Predictive Maintenance (XPM), and future challenges and open issues on this topic. We will also present three case studies that highlight XPM's challenges in bus and train operations and steel factories.

CloseRead Abstract

2009

Evaluating algorithms that learn from data streams

Authors
Gama, J; Rodrigues, PP; Sebastião, R;

Publication
Proceedings of the 2009 ACM Symposium on Applied Computing (SAC), Honolulu, Hawaii, USA, March 9-12, 2009

Abstract
Learning from data streams is a research area of increasing importance. Nowadays, several stream learning algorithms have been developed. Most of them learn decision models that continuously evolve over time, run in resource-aware environments, and detect and react to changes in the environment generating data. One important issue, not yet conveniently addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. In this paper we propose a general framework for assessing the quality of streaming learning algorithms. We defend the use of Predictive Sequential error estimates over a sliding window to assess performance of learning algorithms that learn from open-ended data streams in non-stationary environments. This paper studies properties of convergence and methods to comparatively assess algorithms performance. Copyright 2009 ACM.

CloseRead Abstract