Publications

Publications by LIAAD

2022

Reasoning with Portuguese Word Embeddings

Authors
Costa Cunha, LF; Almeida, JJ; Simões, A;

Publication
SLATE

Abstract
Representing words with semantic distributions to create ML models is a widely used technique for natural language processing tasks. In this paper, we trained word embedding models on different types of Portuguese corpora, analyzing the influence of the models' parameterization, the corpus size, and the domain. We then validated each model with the classical evaluation methods available: four-word analogies and similarity measurement of word pairs. In addition to these methods, we proposed new alternative techniques to validate word embedding models and presented new resources for this purpose. Finally, we discussed the results obtained and some limitations of the evaluation methods for word embedding models.
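
To illustrate the four-word analogy evaluation mentioned in the abstract, the following is a minimal sketch of the common vector-offset approach with cosine similarity. The toy vectors, the `EMBEDDINGS` table, and the function names are invented for illustration and are not taken from the paper; trained models use far larger vocabularies and dimensions.

```python
import numpy as np

# Toy 3-dimensional embeddings, invented for illustration only.
EMBEDDINGS = {
    "rei":    np.array([0.9, 0.8, 0.1]),
    "rainha": np.array([0.9, 0.1, 0.8]),
    "homem":  np.array([0.1, 0.9, 0.1]),
    "mulher": np.array([0.1, 0.1, 0.9]),
    "casa":   np.array([0.5, 0.5, 0.5]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def solve_analogy(a, b, c, embeddings):
    """Return the word d that best completes 'a is to b as c is to d'.

    Uses the vector-offset method: d is the word closest to b - a + c,
    excluding the three query words themselves.
    """
    target = embeddings[b] - embeddings[a] + embeddings[c]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

# "homem" is to "rei" as "mulher" is to ...?
answer = solve_analogy("homem", "rei", "mulher", EMBEDDINGS)
```

An analogy test set is then scored as the fraction of quadruples for which the returned word matches the expected fourth word.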


2022

Interpretability of Machine Intelligence in Medical Image Computing - 5th International Workshop, iMIMIC 2022, Held in Conjunction with MICCAI 2022, Singapore, Singapore, September 22, 2022, Proceedings

Authors
Reyes, M; Abreu, PH; Cardoso, JS;

Publication
iMIMIC@MICCAI


2022

The impact of heterogeneous distance functions on missing data imputation and classification performance

Authors
Santos, MS; Abreu, PH; Fernández, A; Luengo, J; Santos, J;

Publication
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE

Abstract
This work performs an in-depth study of the impact of distance functions on K-Nearest Neighbours imputation of heterogeneous datasets. Missing data is generated at several percentages, on a large benchmark of 150 datasets (50 continuous, 50 categorical and 50 heterogeneous datasets) and data imputation is performed using different distance functions (HEOM, HEOM-R, HVDM, HVDM-R, HVDM-S, MDE and SIMDIST) and k values (1, 3, 5 and 7). The impact of distance functions on kNN imputation is then evaluated in terms of classification performance, through the analysis of a classifier learned from the imputed data, and in terms of imputation quality, where the quality of the reconstruction of the original values is assessed. By analysing the properties of heterogeneous distance functions over continuous and categorical datasets individually, we then study their behaviour over heterogeneous data. We discuss whether datasets with different natures may benefit from different distance functions and to what extent the component of a distance function that deals with missing values influences such choice. Our experiments show that missing data has a significant impact on distance computation and the obtained results provide guidelines on how to choose appropriate distance functions depending on data characteristics (continuous, categorical or heterogeneous datasets) and the objective of the study (classification or imputation tasks).
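
As a rough illustration of how a heterogeneous distance drives kNN imputation, the sketch below implements the standard HEOM definition (overlap distance for categorical attributes, range-normalised difference for continuous ones, and a maximal distance of 1 when either value is missing) together with a simple mean/mode imputer. The function names, the `None`-as-missing convention, and the sample data are assumptions made for this example; the study's actual experimental code is not shown in the abstract.

```python
import math

def heom(x, y, categorical, ranges):
    """HEOM distance between two records; None marks a missing value.

    categorical: set of attribute indices treated as categorical.
    ranges: per-attribute (max - min) for continuous attributes.
    """
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if a is None or b is None:
            d = 1.0                       # missing values get maximal distance
        elif i in categorical:
            d = 0.0 if a == b else 1.0    # overlap distance
        else:
            d = abs(a - b) / ranges[i] if ranges[i] else 0.0
        total += d * d
    return math.sqrt(total)

def knn_impute(row, donors, feature, k, categorical, ranges):
    """Estimate row[feature] from the k nearest donors that observe it."""
    usable = [d for d in donors if d[feature] is not None]
    nearest = sorted(usable, key=lambda d: heom(row, d, categorical, ranges))[:k]
    values = [d[feature] for d in nearest]
    if feature in categorical:
        return max(set(values), key=values.count)   # mode
    return sum(values) / len(values)                # mean

# Hypothetical records: (age, colour, height); colour is categorical.
CATEGORICAL = {1}
RANGES = [40.0, None, 50.0]
donors = [(20, "red", 160.0), (24, "red", 170.0), (60, "blue", 190.0)]
estimate = knn_impute((22, "red", None), donors, 2, 2, CATEGORICAL, RANGES)
```

The missing-value branch is the component the abstract singles out: because every missing attribute contributes the maximal distance, the amount of missing data directly changes which neighbours are selected.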


2022

The identification of cancer lesions in mammography images with missing pixels: analysis of morphology

Authors
Santos, JC; Abreu, PH; Santos, MS;

Publication
2022 IEEE 9TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA)

Abstract
The quality of mammography images is essential for the diagnosis of breast cancer, and image imputation has become a popular technique to overcome noise, artifacts, and missing data to aid in the diagnosis of diseases. In this paper, we assess the performance of six imputation methodologies for the reconstruction of missing pixels in different morphologies in mammography images. The images included in this study are collected from four public datasets (CBIS-DDSM, Mini-MIAS, INbreast, and CSAW), and the imputation results are evaluated through the mean absolute error (MAE) and the structural similarity index measure (SSIM). This study goes beyond the traditional evaluation of imputation algorithms, analyzing imputation quality, morphology preservation, and classification performance. The effects of imputation on the morphology of cancer lesions are of utmost importance since they lay the foundation for physicians to interpret and analyze the imputation results. The results show that DIP is the most promising methodology for higher missing pixel rates, morphology preservation, and classifying malignant and benign images.
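
For readers unfamiliar with the MAE metric named in the abstract, here is a minimal sketch of how it could be computed for an imputed image. Restricting the error to the pixels that were missing is an assumption of this example, as are the function name and the tiny sample arrays; the abstract does not say whether the paper averages over missing pixels only or over the whole image.

```python
import numpy as np

def masked_mae(original, imputed, missing_mask):
    """Mean absolute error over the pixels flagged as missing.

    original, imputed: arrays of the same shape holding pixel intensities.
    missing_mask: boolean array, True where the pixel had to be imputed.
    """
    diff = np.abs(original.astype(float) - imputed.astype(float))
    return float(diff[missing_mask].mean())

# Hypothetical 2x2 image in which the right-hand column was missing.
original = np.array([[10, 20], [30, 40]])
imputed = np.array([[10, 24], [30, 38]])
mask = np.array([[False, True], [False, True]])
error = masked_mae(original, imputed, mask)
```

SSIM is more involved because it compares local luminance, contrast, and structure; existing implementations such as `skimage.metrics.structural_similarity` in scikit-image are usually preferable to writing it by hand.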


2022

Brown-Sequard syndrome in a patient with spondyloarthritis after COVID-19 vaccine: a challenging differential diagnosis

Authors
Costa, R; Soares, C; Vaz, C; Bernardes, M; Tavares, M; Abreu, P;

Publication
ARP RHEUMATOLOGY


2022

Churn in services - A bibliometric review

Authors
Ribeiro, H; Barbosa, B; Moreira, AC; Rodrigues, R;

Publication
CUADERNOS DE GESTION

Abstract
The purpose of this article is to identify the most impactful research on customer churn and to map the conceptual and intellectual structure of its field of study. Data were collected from the WoS database, comprising 338 articles published between 1995 and 2020. Several bibliometric techniques were applied, including analysis of co-words, co-citation, bibliographic coupling, and co-authorship networks. R software and the Bibliometrix/Biblioshiny package were used to perform the analyses. The results identify the most active and influential authors, articles, and journals on the topic. More specifically, through co-citations and bibliographic coupling, it was possible to map the oldest articles (retrospective analysis) and the current research front (prospective analysis). The retrospective analysis, based on co-citations, revealed that the foundations of this research field are constructs such as quality of service, satisfaction, loyalty, and changing behaviors. The prospective analysis, performed through bibliographic coupling, revealed that current research is embedded in predictive analysis, clusters, data mining, and algorithms. The results provide robust guidance for further investigation in this field.
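
The two citation-based measures in the abstract differ in direction: bibliographic coupling links two citing papers that share references, while co-citation links two cited works that appear together in the same reference list. The sketch below counts both on a toy citation table; the paper identifiers and function names are hypothetical, and the study itself used R with Bibliometrix/Biblioshiny rather than this code.

```python
from itertools import combinations

def bibliographic_coupling(references):
    """Shared-reference count for every pair of citing papers.

    references: dict mapping each citing paper to the set of works it cites.
    """
    return {
        (a, b): len(references[a] & references[b])
        for a, b in combinations(sorted(references), 2)
    }

def co_citation(references):
    """Number of citing papers in which each pair of cited works co-occurs."""
    counts = {}
    for cited in references.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts

# Hypothetical citation data: P1..P3 cite works A..D.
refs = {"P1": {"A", "B", "C"}, "P2": {"B", "C"}, "P3": {"C", "D"}}
coupling = bibliographic_coupling(refs)
cocited = co_citation(refs)
```

Coupling strength is fixed once a paper is published, which is why it suits mapping the current research front, whereas co-citation counts grow as later papers cite older works together, which suits a retrospective view.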
