
Publications by Isabel Rio-Torto

2019

Towards a Joint Approach to Produce Decisions and Explanations Using CNNs

Authors
Rio-Torto, I; Fernandes, K; Teixeira, LF;

Publication
Pattern Recognition and Image Analysis - 9th Iberian Conference, IbPRIA 2019, Madrid, Spain, July 1-4, 2019, Proceedings, Part I

Abstract
Convolutional Neural Networks, as well as other deep learning methods, have shown remarkable performance on tasks like classification and detection. However, these models largely remain black boxes. With the widespread use of such networks in real-world scenarios and the growing demand for a right to explanation, especially in highly regulated areas like medicine and criminal justice, generating accurate predictions is no longer enough. Machine learning models have to be explainable, i.e., understandable to humans, which entails being able to present the reasons behind their decisions. While most of the literature focuses on post-model methods, we propose an in-model CNN architecture composed of an explainer and a classifier. The model is trained end-to-end, with the classifier taking as input not only images from the dataset but also the explainer’s resulting explanation, thus allowing the classifier to focus on the relevant areas of that explanation. We also developed a synthetic dataset generation framework that allows for automatic annotation and the creation of easy-to-understand images whose explanation requires no expert knowledge. Promising results were obtained, especially when using L1 regularisation, validating the potential of the proposed architecture and further encouraging research to improve its explainability and performance. © 2019, Springer Nature Switzerland AG.
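
A minimal sketch of the in-model idea described in this abstract, assuming a small convolutional explainer whose sigmoid heatmap multiplicatively gates the classifier input; module sizes, the gating scheme, and the 1e-4 L1 weight are illustrative assumptions, not the authors' implementation (PyTorch):

import torch
import torch.nn as nn

class Explainer(nn.Module):
    """Produces a [0, 1] explanation heatmap with the input's spatial size."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

class JointModel(nn.Module):
    """Classifier sees the image modulated by the explanation (in-model explainability)."""
    def __init__(self, num_classes):
        super().__init__()
        self.explainer = Explainer()
        self.classifier = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes),
        )

    def forward(self, x):
        expl = self.explainer(x)            # explanation heatmap
        logits = self.classifier(x * expl)  # classifier focuses on explained regions
        return logits, expl

model = JointModel(num_classes=10)
x = torch.randn(4, 3, 64, 64)
y = torch.randint(0, 10, (4,))
logits, expl = model(x)
# Joint loss: classification term plus an L1 sparsity penalty on the
# explanation, since the abstract reports promising results with L1
# regularisation; the 1e-4 weight is an arbitrary placeholder.
loss = nn.functional.cross_entropy(logits, y) + 1e-4 * expl.abs().mean()
loss.backward()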

2020

Understanding the decisions of CNNs: An in-model approach

Authors
Rio-Torto, I; Fernandes, K; Teixeira, LF;

Publication
PATTERN RECOGNITION LETTERS

Abstract
With the outstanding predictive performance of Convolutional Neural Networks on different tasks and their widespread use in real-world scenarios, it is essential to understand and trust these black-box models. While most of the literature focuses on post-model methods, we propose a novel in-model joint architecture composed of an explainer and a classifier. This architecture outputs not only a class label but also a visual explanation of that decision, without needing additional labelled data to train the explainer beyond the image class. The model is trained end-to-end, with the classifier taking as input an image and the explainer's resulting explanation, thus allowing the classifier to focus on the relevant areas of that explanation. Moreover, this approach can be employed with any classifier, provided that the necessary connections to the explainer are made. We also propose a three-phase training process and two alternative custom loss functions that regularise the produced explanations and encourage desired properties, such as sparsity and spatial contiguity. The architecture was validated on two datasets (a subset of ImageNet and a cervical cancer dataset), and the obtained results show that it is able to produce meaningful image- and class-dependent visual explanations, without direct supervision, aligned with intuitive visual features associated with the data. Quantitative assessment of explanation quality was conducted through iterative perturbation of the input image according to the explanation heatmaps. The impact on classification performance is studied in terms of average function value and AOPC (Area Over the MoRF (Most Relevant First) Curve). For further evaluation, we propose POMPOM (Percentage of Meaningful Pixels Outside the Mask) as another measurable criterion of explanation goodness. These analyses showed that the proposed method outperformed state-of-the-art post-model methods, such as LRP (Layer-wise Relevance Propagation).
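
To make the POMPOM criterion concrete, here is a hedged sketch that assumes "meaningful" means heatmap values above a fixed threshold and normalises by the total count of meaningful pixels; the paper's exact definition and normalisation may differ:

import numpy as np

def pompom(heatmap: np.ndarray, mask: np.ndarray, threshold: float = 0.5) -> float:
    """heatmap: explanation values in [0, 1]; mask: boolean object mask.
    Lower is better: fewer meaningful pixels fall outside the object."""
    meaningful = heatmap >= threshold
    outside = meaningful & ~mask
    total = meaningful.sum()
    return float(outside.sum()) / float(total) if total else 0.0

# Toy usage: a heatmap concentrated inside the mask yields a low score.
h = np.zeros((8, 8)); h[2:5, 2:5] = 0.9
m = np.zeros((8, 8), dtype=bool); m[1:6, 1:6] = True
print(pompom(h, m))  # 0.0 (all meaningful pixels lie inside the mask)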

2021

Automatic quality inspection in the automotive industry: a hierarchical approach using simulated data

Authors
Rio-Torto, I; Campanico, AT; Pereira, A; Teixeira, LF; Filipe, V;

Publication
2021 IEEE 8th International Conference on Industrial Engineering and Applications (ICIEA)

Abstract

2021

Improving Automatic Quality Inspection in the Automotive Industry by Combining Simulated and Real Data

Authors
Pinho, P; Rio-Torto, I; Teixeira, LF;

Publication
ADVANCES IN VISUAL COMPUTING (ISVC 2021), PT I

Abstract
Considerable amounts of data are required for a deep learning model to generalize successfully to unseen cases. Furthermore, such data is often manually labeled, making the annotation process costly and time-consuming. We propose using unlabeled real-world data in conjunction with automatically labeled synthetic data, obtained from simulators, to mitigate the increasing need for annotated data. By obtaining real counterparts of simulated samples using CycleGAN and subsequently fine-tuning with such samples, we improve the performance of a vehicle part detection system by 2.5% compared to a baseline trained exclusively on simulated images. We explore adding a semantic consistency loss to CycleGAN, reusing networks trained in previous work to regularize the conversion process. Moreover, the addition of a post-processing step, which we term global NMS, highlights our approach's effectiveness by better utilizing our detection model's predictions, ultimately improving the system's performance by 14.7%.
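
The abstract does not detail the global NMS step; as a point of reference, the sketch below shows generic IoU-based non-maximum suppression applied to one pooled set of detections, which is one plausible reading of "global". The box format and the 0.5 threshold are assumptions:

import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.5):
    """boxes: (N, 4) as [x1, y1, x2, y2]; returns indices of kept boxes."""
    order = scores.argsort()[::-1]  # highest-scoring boxes first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection-over-union of the top box with the remaining ones.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou < iou_threshold]  # suppress heavily overlapping boxes
    return keep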

2022

From Captions to Explanations: A Multimodal Transformer-based Architecture for Natural Language Explanation Generation

Authors
Rio-Torto, I; Cardoso, JS; Teixeira, LF;

Publication
PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2022)

Abstract
The growing importance of the Explainable Artificial Intelligence (XAI) field has led to the proposal of several methods for producing visual heatmaps of the classification decisions of deep learning models. However, visual explanations are not sufficient because different end-users have different backgrounds and preferences. Natural language explanations (NLEs) are inherently understandable by humans and, thus, can complement visual explanations. Therefore, we introduce a novel architecture based on multimodal Transformers to enable the generation of NLEs for image classification tasks. Contrary to the current literature, which models NLE generation as a supervised image captioning problem, we propose to learn to generate these textual explanations without their direct supervision, by starting from image captions and evolving to classification-relevant text. Preliminary experiments on a novel dataset where there is a clear demarcation between captions and NLEs show the potential of the approach and shed light on how it can be improved.
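
As a rough illustration of the general mechanism involved, generating text conditioned on image features with a Transformer decoder, here is a toy sketch; the paper's actual architecture, tokeniser, and weakly supervised training objective are not reproduced, and all sizes below are arbitrary:

import torch
import torch.nn as nn

class TinyNLEGenerator(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.visual_proj = nn.Linear(512, d_model)  # project image features
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, image_feats):
        # image_feats: (B, num_patches, 512) from any visual backbone.
        memory = self.visual_proj(image_feats)
        tgt = self.embed(tokens)
        # Causal mask so each position attends only to earlier tokens.
        t = tokens.size(1)
        causal = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        h = self.decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(h)  # next-token logits per position

model = TinyNLEGenerator()
logits = model(torch.randint(0, 1000, (2, 12)), torch.randn(2, 49, 512))
print(logits.shape)  # torch.Size([2, 12, 1000])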

2022

Hybrid Quality Inspection for the Automotive Industry: Replacing the Paper-Based Conformity List through Semi-Supervised Object Detection and Simulated Data

Authors
Rio-Torto, I; Campanico, AT; Pinho, P; Filipe, V; Teixeira, LF;

Publication
APPLIED SCIENCES-BASEL

Abstract
The still-prevalent use of paper conformity lists in the automotive industry has a serious negative impact on the performance of quality control inspectors. We propose instead a hybrid quality inspection system that combines automated detection with human feedback, increasing worker performance by reducing mental and physical fatigue, and improving the adaptability and responsiveness of the assembly line to change. The system integrates hierarchical automatic detection of non-conforming vehicle parts with information visualization on a wearable device, presenting the results to the factory worker and obtaining human confirmation. Besides designing a novel 3D vehicle generator to create a digital representation of the non-conformity list and to collect automatically annotated training data, we apply and aggregate state-of-the-art domain adaptation and pseudo-labeling methods in a novel way in our real application scenario, in order to bridge the gap between the labeled data generated by the vehicle generator and the real unlabeled data collected on the factory floor. This methodology allows us to obtain, without any manual annotation of the real dataset, an example-based F1 score of 0.565 in an unconstrained scenario and 0.601 in a fixed-camera setup (improvements of 11 and 14.6 percentage points, respectively, over a baseline trained on purely simulated data). Feedback from factory workers highlighted the usefulness of the proposed solution and showed that a truly hybrid assembly line, where machine and human work in symbiosis, increases both efficiency and accuracy in automotive quality control.
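
The example-based F1 score reported above is commonly computed by averaging per-example F1 over predicted versus ground-truth label sets; below is a small sketch under that assumption (the label sets are toy data, not from the paper):

def example_based_f1(y_true: list[set], y_pred: list[set]) -> float:
    scores = []
    for t, p in zip(y_true, y_pred):
        if not t and not p:
            scores.append(1.0)  # perfect agreement on "nothing present"
            continue
        tp = len(t & p)  # correctly detected labels
        precision = tp / len(p) if p else 0.0
        recall = tp / len(t) if t else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

# Toy usage: two vehicles, each with a set of detected non-conforming parts.
print(example_based_f1([{"mirror", "grille"}, {"badge"}],
                       [{"mirror"}, {"badge", "trim"}]))  # ~0.667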
