
Publications by Tiago Filipe Gonçalves

2025

Evaluating the Impact of Pulse Oximetry Bias in Machine Learning Under Counterfactual Thinking

Authors
Martins, I; Matos, J; Gonçalves, T; Celi, LA; Wong, AKI; Cardoso, JS;

Publication
Applications of Medical Artificial Intelligence, AMAI 2024

Abstract
Algorithmic bias in healthcare mirrors existing data biases. However, the factors driving unfairness are not always known. Medical devices capture significant amounts of data but are prone to errors; for instance, pulse oximeters overestimate the arterial oxygen saturation of darker-skinned individuals, leading to worse outcomes. The impact of this bias in machine learning (ML) models remains unclear. This study addresses the technical challenges of quantifying the impact of medical device bias in downstream ML. Our experiments compare a perfect world, without pulse oximetry bias, using SaO2 (blood gas), to the actual world, with biased measurements, using SpO2 (pulse oximetry). Under this counterfactual design, two models are trained with identical data, features, and settings, except for the method of measuring oxygen saturation: models using SaO2 are a control and models using SpO2 a treatment. The blood-gas oximetry linked dataset was a suitable testbed, containing 163,396 nearly simultaneous SpO2-SaO2 paired measurements, aligned with a wide array of clinical features and outcomes. We studied three classification tasks: in-hospital mortality, respiratory SOFA score in the next 24 h, and SOFA score increase by two points. Models using SaO2 instead of SpO2 generally showed better performance. Patients with overestimation of O2 by pulse oximetry of ≥3% had significant decreases in mortality prediction recall, from 0.63 to 0.59, P < 0.001. This mirrors clinical processes where biased pulse oximetry readings provide clinicians with false reassurance of patients' oxygen levels. A similar degradation happened in ML models, with pulse oximetry biases leading to more false negatives in predicting adverse outcomes.
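As a rough illustration of this control-vs-treatment setup, the sketch below trains two otherwise identical classifiers, swapping only the oxygen-saturation column. The synthetic data, column names, and gradient-boosting model are assumptions for illustration, not the paper's actual pipeline or cohort.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the paired-measurement cohort (hypothetical columns).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.uniform(20, 90, n),
    "heart_rate": rng.normal(80, 15, n),
    "SaO2": rng.uniform(70, 100, n),            # blood gas ("perfect world")
    "mortality": rng.integers(0, 2, n),
})
df["SpO2"] = df["SaO2"] + rng.normal(2, 1, n)   # biased pulse-oximetry reading

def fit_and_eval(oxygen_col: str) -> float:
    """Train one model; only the oxygen-saturation source differs between runs."""
    X = df[["age", "heart_rate", oxygen_col]]
    y = df["mortality"]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y
    )
    model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
    return recall_score(y_te, model.predict(X_te))

print(f"recall control (SaO2):   {fit_and_eval('SaO2'):.2f}")
print(f"recall treatment (SpO2): {fit_and_eval('SpO2'):.2f}")
```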

2024

An End-to-End Framework to Classify and Generate Privacy-Preserving Explanations in Pornography Detection

Authors
Vieira, M; Goncalves, T; Silva, W; Sequeira, F;

Publication
BIOSIG 2024 - Proceedings of the 23rd International Conference of the Biometrics Special Interest Group

Abstract
The proliferation of explicit material online, particularly pornography, has emerged as a paramount concern in our society. While state-of-the-art pornography detection models already show some promising results, their decision-making processes are often opaque, raising ethical issues. This study focuses on uncovering the decision-making process of such models, specifically fine-tuned convolutional neural networks and transformer architectures. We compare various explainability techniques to illuminate the limitations, potential improvements, and ethical implications of using these algorithms. Results show that models trained on diverse and dynamic datasets tend to be more robust and generalisable than models trained on static datasets. Additionally, transformer models demonstrate superior performance and generalisation compared to convolutional ones. Furthermore, we implemented a privacy-preserving framework during explanation retrieval, which contributes to developing secure and ethically sound biometric applications.
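As a loose illustration of pairing an explanation method with a privacy-preserving step, the sketch below computes a Grad-CAM map with Captum and overlays it on a blurred copy of the input; the ResNet-50 backbone, target class index, and blur settings are assumptions for illustration, not the paper's actual framework.

```python
import torch
import torchvision.transforms as T
from torchvision.models import resnet50
from captum.attr import LayerGradCam, LayerAttribution

model = resnet50(weights=None)   # in practice, load the fine-tuned detector here
model.eval()

image = torch.rand(1, 3, 224, 224)            # placeholder input
gradcam = LayerGradCam(model, model.layer4)   # explain the last conv block
attr = gradcam.attribute(image, target=1)     # target=1: assumed "explicit" class
heatmap = LayerAttribution.interpolate(attr, (224, 224))

# Privacy-preserving step: overlay the heatmap on a blurred copy of the input,
# so the stored explanation does not expose the original image content.
blurred = T.GaussianBlur(kernel_size=51, sigma=20.0)(image)
explanation = 0.5 * blurred + 0.5 * heatmap.clamp(min=0)
```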

2024

Interpretable AI for medical image analysis: methods, evaluation, and clinical considerations

Authors
Gonçalves, T; Hedström, A; Pahud de Mortanges, A; Li, X; Müller, H; Cardoso, JS; Reyes, M;

Publication
Trustworthy AI in Medical Imaging

Abstract
In the healthcare context, artificial intelligence (AI) has the potential to power decision support systems and help health professionals in their clinical decisions. However, given its complexity, AI is usually seen as a black box that receives data and outputs a prediction. This behavior may jeopardize the adoption of this technology by the healthcare community, which values the existence of explanations to justify a clinical decision. In addition, developers must have a strategy for assessing and auditing these systems to ensure their reproducibility and quality in production. The field of interpretable artificial intelligence emerged to study how these algorithms work and clarify their behavior. This chapter reviews several interpretability methods for AI algorithms in medical imaging, discussing their functioning, limitations, benefits, applications, and evaluation strategies. The chapter concludes with considerations that might contribute to bringing these methods closer to the daily routine of healthcare professionals.
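One common evaluation strategy in this space is a deletion-style faithfulness test: progressively occlude the most-attributed pixels and check that the model's confidence in the target class drops. The sketch below is a toy version of that idea, written from scratch for illustration rather than taken from the chapter.

```python
import torch

def deletion_curve(model, image, attribution, target, steps=10):
    """Logit of `target` as the most-attributed pixels are progressively occluded.

    Assumes a single image of shape (1, C, H, W) and a same-shaped attribution map.
    """
    # Rank spatial positions by total absolute attribution across channels.
    order = attribution.abs().sum(dim=1).flatten().argsort(descending=True)
    occluded = image.clone()
    chunk = order.numel() // steps
    scores = []
    for i in range(steps):
        idx = order[i * chunk:(i + 1) * chunk]
        occluded.view(1, image.shape[1], -1)[..., idx] = 0.0  # zero-out pixels
        with torch.no_grad():
            scores.append(model(occluded)[0, target].item())
    return scores  # a faithful explanation yields a steep early drop
```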

2024

Disentangling morphed identities for face morphing detection

Authors
Caldeira, E; Neto, PC; Gonçalves, T; Damer, N; Sequeira, AF; Cardoso, JS;

Publication
Science Talks

Abstract

2024

Classification of Keratitis from Eye Corneal Photographs using Deep Learning

Authors
Beirão, MM; Matos, J; Gonçalves, T; Kase, C; Nakayama, LF; de Freitas, D; Cardoso, JS;

Publication
2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Abstract
Keratitis is an inflammatory corneal condition responsible for 10% of visual impairment in low- and middle-income countries (LMICs), with bacteria, fungi, or amoeba as the most common infection etiologies. While an accurate and timely diagnosis is crucial for the selected treatment and the patients' sight outcomes, due to the high cost and limited availability of laboratory diagnostics in LMICs, diagnosis is often made by clinical observation alone, despite its lower accuracy. In this study, we investigate and compare different deep learning approaches to diagnose the source of infection: 1) three separate binary models for infection type prediction; 2) a multitask model with a shared backbone and three parallel classification layers (Multitask V1); and 3) a multitask model with a shared backbone and a multi-head classification layer (Multitask V2). We used a private Brazilian cornea dataset to conduct the empirical evaluation. We achieved the best results with Multitask V2, with area under the receiver operating characteristic curve (AUROC) confidence intervals of 0.7413-0.7740 (bacteria), 0.8395-0.8725 (fungi), and 0.9448-0.9616 (amoeba). A statistical analysis of the impact of patient features on models' performance revealed that sex significantly affects amoeba infection prediction, and age seems to affect fungi and bacteria predictions.
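A minimal PyTorch sketch of the shared-backbone, parallel-heads idea (closest to Multitask V1) is shown below; the ResNet-50 backbone and head names are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class MultitaskKeratitis(nn.Module):
    """Shared feature extractor with one binary head per infection etiology."""
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=None)   # load fine-tuned weights in practice
        in_feats = backbone.fc.in_features
        backbone.fc = nn.Identity()         # drop the ImageNet classifier head
        self.backbone = backbone
        self.heads = nn.ModuleDict(
            {name: nn.Linear(in_feats, 1) for name in ("bacteria", "fungi", "amoeba")}
        )

    def forward(self, x):
        features = self.backbone(x)         # shared representation
        return {name: head(features) for name, head in self.heads.items()}

model = MultitaskKeratitis()
logits = model(torch.rand(2, 3, 224, 224))  # e.g. {"bacteria": (2, 1), ...}
```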

2024

Classification of Keratitis from Eye Corneal Photographs using Deep Learning

Authors
Beirão, MM; Matos, J; Gonçalves, T; Kase, C; Nakayama, LF; de Freitas, D; Cardoso, JS;

Publication
CoRR

Abstract
