2022
Authors
Pereira, K; Vinagre, J; Alonso, AN; Coelho, F; Carvalho, M;
Publication
Machine Learning and Principles and Practice of Knowledge Discovery in Databases - International Workshops of ECML PKDD 2022, Grenoble, France, September 19-23, 2022, Proceedings, Part II
Abstract
The application of machine learning to insurance risk prediction requires learning from sensitive data. This raises multiple ethical and legal issues, one of the most relevant being privacy. However, privacy-preserving methods can hinder the predictive potential of machine learning models. In this paper, we present preliminary experiments with life insurance data using two privacy-preserving techniques: discretization and encryption. Our objective is to assess the impact of such privacy-preservation techniques on the accuracy of ML models. We instantiate the problem in three general but plausible use cases involving the prediction of insurance claims within a 1-year horizon. Our preliminary experiments suggest that discretization and encryption have a negligible impact on the accuracy of ML models. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
2022
Authors
Muhongo, TS; Brazdil, PB; Silva, F;
Publication
INTELIGENCIA ARTIFICIAL-IBEROAMERICAN JOURNAL OF ARTIFICIAL INTELLIGENCE
Abstract
Angola is characterized by many different languages and social, cultural and political realities, which have had a marked effect on Angolan Portuguese (AP). Consequently, AP is characterized by diatopic variation. One of the marked effects is the presence of loanwords imported from other Angolan languages. Our objective is to analyze different Angolan texts, examine the lexical forms used and conduct a comparative study with European Portuguese, aiming at identifying the possible loanwords in Angolan Portuguese. This process was automated, as was the identification of all loanwords' cotexts. In addition, we determine the lexical class of each loanword and the Angolan language of its origin. Most lexical loanwords come from Kimbundu, although AP includes loanwords from some other Angolan languages too. Our study serves as a basis for preparing an Angolan regionalism dictionary. We noticed that more than 700 identified loanwords do not figure in the existing dictionaries.
2022
Authors
Brazdil, P; Muhammad, SH; Oliveira, F; Cordeiro, J; Silva, F; Silvano, P; Leal, A;
Publication
MATHEMATICS
Abstract
This paper describes two different approaches to sentiment analysis. The first is a symbolic approach that exploits a sentiment lexicon together with a set of shifter patterns and rules. The sentiment lexicon includes single words (unigrams) and is developed automatically by exploiting labeled examples. The shifter patterns include intensification, attenuation/downtoning and inversion/reversal and are developed manually. The second approach exploits a deep neural network, which uses a pre-trained language model. Both approaches were applied to newspaper texts in European Portuguese from the economics and finance domains. We show that the symbolic approach achieves virtually the same performance as the deep neural network. In addition, the symbolic approach provides understandable explanations, and the acquired knowledge can be communicated to others. We release the shifter patterns to motivate future research in this direction.
2022
Authors
Brazdil, P; van Rijn, JN; Gouk, H; Mohr, F;
Publication
ECML/PKDD Workshop on Meta-Knowledge Transfer, 23 September 2022, Grenoble, France
Abstract
2022
Authors
Brazdil, P; van Rijn, JN; Gouk, H; Mohr, F;
Publication
Meta-Knowledge Transfer @ ECML/PKDD
Abstract
2022
Authors
Pedrosa, J; Aresta, G; Ferreira, C; Carvalho, C; Silva, J; Sousa, P; Ribeiro, L; Mendonca, AM; Campilho, A;
Publication
SCIENTIFIC REPORTS
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has impacted healthcare systems across the world. Chest radiography (CXR) can be used as a complementary method for diagnosing/following COVID-19 patients. However, experience level and workload of technicians and radiologists may affect the decision process. Recent studies suggest that deep learning can be used to assess CXRs, providing an important second opinion for radiologists and technicians in the decision process, and super-human performance in detection of COVID-19 has been reported in multiple studies. In this study, the clinical applicability of deep learning systems for COVID-19 screening was assessed by testing the performance of deep learning systems for the detection of COVID-19. Specifically, four datasets were used: (1) a collection of multiple public datasets (284,793 CXRs); (2) the BIMCV dataset (16,631 CXRs); (3) COVIDGR (852 CXRs); and (4) a private dataset (6,361 CXRs). All datasets were collected retrospectively and consist of only frontal CXR views. A ResNet-18 was trained on each of the datasets for the detection of COVID-19. It is shown that a high dataset bias was present, leading to high performance in intradataset train-test scenarios (area under the curve > 0.98). Significantly lower performances were obtained in interdataset train-test scenarios, however (area under the curve 0.55-0.84 on the collection of public datasets). A subset of the data was then assessed by radiologists for comparison to the automatic systems. Fine-tuning with radiologist annotations significantly increased performance across datasets (area under the curve 0.61-0.88) and improved the attention on clinical findings in positive COVID-19 CXRs. Nevertheless, tests on CXRs from different hospital services indicate that the screening performance of CXR and automatic systems is limited (area under the curve < 0.6 on emergency service CXRs). However, COVID-19 manifestations can be accurately detected when present, motivating the use of these tools for evaluating disease progression in mild to severe COVID-19 patients.