2022
Authors
Costa Cunha, LF; Ramalho, JC;
Publication
Proceedings of the 26th International Conference on Theory and Practice of Digital Libraries - Workshops and Doctoral Consortium, Padua, Italy, September 20, 2022.
Abstract
In recent work, several NER models were developed to extract named entities from Portuguese Archival Finding Aids. In this paper, we complement that work by presenting a NER model with a different architecture, Bidirectional Encoder Representations from Transformers (BERT). To do so, we took a BERT model pre-trained on Portuguese and fine-tuned it for our specific classification problem, NER. Finally, we compared its results with those of the previous architectures. In addition to this model, we also developed an annotation tool that uses ML models to speed up the corpus annotation process. © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
2022
Authors
Costa Cunha, LF; Almeida, JJ; Simões, A;
Publication
11th Symposium on Languages, Applications and Technologies, SLATE 2022, July 14-15, 2022, Universidade da Beira Interior, Covilhã, Portugal.
Abstract
Representing words with semantic distributions to create ML models is a widely used technique for Natural Language Processing tasks. In this paper, we trained word embedding models on different types of Portuguese corpora, analyzing the influence of the models' parameterization, corpus size, and domain. We then validated each model with the classical evaluation methods available: four-word analogies and similarity measurement for pairs of words. In addition to these methods, we proposed new alternative techniques to validate word embedding models, presenting new resources for this purpose. Finally, we discussed the results obtained and some limitations of current methods for evaluating word embedding models. © Luís Filipe Cunha, J. João Almeida, and Alberto Simões.
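The classical four-word analogy test ("a is to b as c is to ?") can be sketched with the common 3CosAdd rule: the answer is the vocabulary word whose vector is most cosine-similar to b - a + c, excluding the three query words. This is a minimal illustration, not the paper's evaluation code; the toy vocabulary, the 3-dimensional vectors, and the function names are hypothetical.

```python
import math

# Hypothetical toy embeddings (3 dimensions), for illustration only.
EMB = {
    "rei": [1.0, 1.0, 0.0],
    "rainha": [1.0, 0.0, 1.0],
    "homem": [0.0, 1.0, 0.0],
    "mulher": [0.0, 0.0, 1.0],
    "gato": [0.1, 0.2, 0.2],
}

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' with 3CosAdd: argmax cos(d, b - a + c)."""
    na, nb, nc = (normalize(emb[w]) for w in (a, b, c))
    target = [y - x + z for x, y, z in zip(na, nb, nc)]
    # The three query words are excluded from the candidate answers.
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    hits = sum(analogy(emb, a, b, c) == d for a, b, c, d in questions)
    return hits / len(questions)
```

For example, `analogy(EMB, "homem", "rei", "mulher")` returns `"rainha"` on these toy vectors. Real evaluations run thousands of such questions over a trained model's vocabulary and report the fraction answered correctly.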
2022
Authors
Neto, C; Ferreira, D; Nunes, J; Braga, L; Martins, L; Cunha, L; Machado, J;
Publication
DEVELOPMENTS AND ADVANCES IN DEFENSE AND SECURITY, MICRADS 2021
Abstract
Dementia is a broad term for a large number of conditions, and it is often associated with Alzheimer's disease. A reliable diagnosis of this disease, especially in its early stages, may prevent further complications. Machine learning algorithms can therefore be applied to validate and correctly classify cases of dementia and non-dementia in adults, assisting physicians in the diagnosis and management of this clinical condition. In this study, a dataset of magnetic resonance imaging comparisons of demented and non-demented adults was used to conduct a Data Mining process, following the Cross Industry Standard Process for Data Mining methodology, with the main goal of classifying instances of dementia. Several machine learning algorithms were applied during this process, namely Support Vector Machines, Decision Trees, Logistic Regression, Neural Networks, Naive Bayes, and Random Forest. The highest accuracy, 95.41%, was achieved with the Naive Bayes algorithm using Split Validation.
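The abstract does not state which Naive Bayes variant or split ratio was used. The sketch below assumes a Gaussian Naive Bayes on numeric features and a single hold-out ("split") validation with an arbitrarily chosen 70/30 ratio; the function names and the toy data are hypothetical, and this is not the study's pipeline.

```python
import math
import random
from collections import defaultdict

def holdout_split(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle once and split into train/test sets (split validation)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) - round(len(idx) * test_fraction)
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

def fit_gaussian_nb(rows, labels):
    """Estimate a log-prior and per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for r, y in zip(rows, labels):
        by_class[y].append(r)
    model = {}
    for y, rs in by_class.items():
        stats = []
        for col in zip(*rs):
            mean = sum(col) / len(col)
            # A small constant avoids zero variance on constant features.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[y] = (math.log(len(rs) / len(rows)), stats)
    return model

def predict(model, row):
    """Return the class with the highest log-posterior (features assumed independent)."""
    def log_posterior(y):
        log_prior, stats = model[y]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            for x, (mean, var) in zip(row, stats))
    return max(model, key=log_posterior)

def accuracy(model, rows, labels):
    """Fraction of held-out rows classified correctly."""
    hits = sum(predict(model, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)
```

A single hold-out split is cheap but its accuracy depends on which rows land in the test set, which is why cross-validation is often reported alongside it.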
2022
Authors
Reyes, M; Abreu, PH; Cardoso, JS;
Publication
iMIMIC@MICCAI
Abstract
2022
Authors
Santos, MS; Abreu, PH; Fernandez, A; Luengo, J; Santos, J;
Publication
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
Abstract
This work performs an in-depth study of the impact of distance functions on K-Nearest Neighbours imputation of heterogeneous datasets. Missing data is generated at several percentages, on a large benchmark of 150 datasets (50 continuous, 50 categorical and 50 heterogeneous datasets) and data imputation is performed using different distance functions (HEOM, HEOM-R, HVDM, HVDM-R, HVDM-S, MDE and SIMDIST) and k values (1, 3, 5 and 7). The impact of distance functions on kNN imputation is then evaluated in terms of classification performance, through the analysis of a classifier learned from the imputed data, and in terms of imputation quality, where the quality of the reconstruction of the original values is assessed. By analysing the properties of heterogeneous distance functions over continuous and categorical datasets individually, we then study their behaviour over heterogeneous data. We discuss whether datasets with different natures may benefit from different distance functions and to what extent the component of a distance function that deals with missing values influences such choice. Our experiments show that missing data has a significant impact on distance computation and the obtained results provide guidelines on how to choose appropriate distance functions depending on data characteristics (continuous, categorical or heterogeneous datasets) and the objective of the study (classification or imputation tasks).
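As an illustration of one of the listed distance functions, the sketch below implements HEOM (Heterogeneous Euclidean-Overlap Metric) using its standard definition: per attribute, the distance is 1 when either value is missing, the overlap metric (0 if equal, 1 otherwise) for categorical attributes, and the range-normalized absolute difference for continuous ones. It then uses HEOM for kNN imputation. Imputing with the neighbours' mean (continuous) or mode (categorical) is an assumption of this sketch; the function names and the `None`-as-missing convention are hypothetical.

```python
import math
from collections import Counter

def heom(x, y, categorical, ranges):
    """HEOM distance between two records; None marks a missing value."""
    total = 0.0
    for j, (a, b) in enumerate(zip(x, y)):
        if a is None or b is None:
            d = 1.0  # maximal distance when either value is missing
        elif categorical[j]:
            d = 0.0 if a == b else 1.0  # overlap metric
        else:
            d = abs(a - b) / ranges[j] if ranges[j] > 0 else 0.0  # range-normalized
        total += d * d
    return math.sqrt(total)

def knn_impute(data, categorical, k=3):
    """Fill each missing value from the k nearest rows (by HEOM) that observe it."""
    n_attr = len(data[0])
    # Range of each continuous attribute, computed over observed values only.
    ranges = []
    for j in range(n_attr):
        vals = [row[j] for row in data if row[j] is not None]
        ranges.append(0.0 if categorical[j] or not vals else max(vals) - min(vals))
    imputed = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other rows where attribute j is observed (original data only).
            donors = [r for idx, r in enumerate(data) if idx != i and r[j] is not None]
            donors.sort(key=lambda r: heom(row, r, categorical, ranges))
            neighbours = [r[j] for r in donors[:k]]
            if not neighbours:
                continue
            if categorical[j]:
                imputed[i][j] = Counter(neighbours).most_common(1)[0][0]
            else:
                imputed[i][j] = sum(neighbours) / len(neighbours)
    return imputed
```

Note how a missing value contributes the maximal per-attribute distance of 1; this is the "component of a distance function that deals with missing values" whose influence the study examines, and variants such as HEOM-R change exactly this part.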
2022
Authors
Santos, JC; Abreu, PH; Santos, MS;
Publication
2022 IEEE 9TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA)
Abstract
The quality of mammography images is essential for the diagnosis of breast cancer, and image imputation has become a popular technique for overcoming noise, artifacts, and missing data to aid diagnosis. In this paper, we assess the performance of six imputation methodologies for reconstructing missing pixels in different morphologies in mammography images. The images included in this study were collected from four public datasets (CBIS-DDSM, Mini-MIAS, INbreast, and CSAW), and the imputation results are evaluated with the mean absolute error (MAE) and the structural similarity index measure (SSIM). This study goes beyond the traditional evaluation of imputation algorithms by analyzing imputation quality, morphology preservation, and classification performance. The effects of imputation on the morphology of cancer lesions are of utmost importance, since they lay the foundation for physicians to interpret and analyze the imputation results. The results show that DIP is the most promising methodology for higher missing-pixel rates, morphology preservation, and the classification of malignant and benign images.
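The MAE criterion can be sketched as the average absolute difference between original and imputed intensities. The abstract does not say whether the error is computed over the missing pixels only or over the whole image; this sketch assumes the former, using a boolean mask, and `masked_mae` is a hypothetical name.

```python
def masked_mae(original, imputed, mask):
    """Mean absolute error over the pixels flagged as missing (mask is True)."""
    errors = [abs(o - p)
              for row_o, row_p, row_m in zip(original, imputed, mask)
              for o, p, m in zip(row_o, row_p, row_m) if m]
    if not errors:
        raise ValueError("mask selects no pixels")
    return sum(errors) / len(errors)
```

For example, with original `[[10, 20], [30, 40]]`, imputed `[[10, 25], [27, 40]]`, and the two off-diagonal pixels masked, the errors are 5 and 3, giving an MAE of 4.0. SSIM compares local luminance, contrast, and structure rather than raw differences; an existing implementation such as scikit-image's `skimage.metrics.structural_similarity` is usually preferable to a hand-written one.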