2025
Authors
Sousa, H; Almasian, S; Campos, R; Jorge, A;
Publication
THIRTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, AAAI-25, VOL 39 NO 24
Abstract
Language models have become foundational to many widely used systems. However, these seemingly advantageous models are double-edged swords. While they excel in tasks related to resource-rich languages like English, they often lose the fine nuances of language forms, dialects, and varieties that are inherent to languages spoken in multiple regions of the world. Languages like European Portuguese are neglected in favor of their more popular counterpart, Brazilian Portuguese, leading to suboptimal performance in various linguistic tasks. To address this gap, we introduce the first open-source translation model specifically tailored for European Portuguese, along with a novel dataset specifically designed for this task. Results from automatic evaluations on two benchmark datasets demonstrate that our best model surpasses existing open-source translation systems for Portuguese and approaches the performance of industry-leading closed-source systems for European Portuguese. By making our dataset, models, and code publicly available, we aim to support and encourage further research, fostering advancements in the representation of underrepresented language varieties.
2025
Authors
Chandramohan, MS; da Silva, IM; Ribeiro, RP; Jorge, A; da Silva, JE;
Publication
ENVIRONMENTS
Abstract
This study investigates spatial distribution and chemical elemental composition screening in soils in Rome (Italy) using X-ray fluorescence analysis. Fifty-nine soil samples were collected from various locations within the urban areas of the Rome municipality and were analyzed for 19 elements. Multivariate statistical techniques, including nonlinear mapping, principal component analysis, and hierarchical cluster analysis, were employed to identify clusters of similar soil samples and their spatial distribution, and to derive environmental quality information. The soil sample clusters reflect the influence of natural geological processes and anthropogenic activities on soil contamination patterns. Spatial clustering using the k-means algorithm further identified six distinct clusters, each with specific geographical distributions and elemental characteristics. Hence, the findings underscore the importance of targeted soil assessments to ensure the sustainable use of land resources in urban areas.
2024
Authors
Guimarães, N; Campos, R; Jorge, A;
Publication
Proceedings of the 16th International Conference on Computational Processing of Portuguese, PROPOR 2024, Santiago de Compostela, Galicia/Spain, March 12-15, 2024, Volume 2
Abstract
2025
Authors
Cunha, LF; Guimarães, N; Mendes, A; Campos, R; Jorge, A;
Publication
Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part V
Abstract
In healthcare, diagnoses usually rely on physician expertise. However, complex cases may benefit from consulting similar past clinical case reports. In this paper, we present MedLink (http://medlink.inesctec.pt), a tool that, given a free-text medical report, retrieves and ranks relevant clinical case reports published in health conferences and journals, aiming to support clinical decision-making, particularly in challenging or complex diagnoses. To this end, we trained two BERT models on the sentence similarity task: a bi-encoder for retrieval and a cross-encoder for reranking. To evaluate our approach, we used 10 medical reports and asked a physician to rank the top 10 most relevant published case reports for each one. Our results show that MedLink’s ranking model achieved an NDCG@10 of 0.747. Our demo also includes the visualization of clinical entities (using a NER model) and the production of a textual explanation (using an LLM) to ease comparing and contrasting reports. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
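Editorial note: for readers unfamiliar with the NDCG@10 metric cited in the abstract above, the following is a minimal sketch of how such a score is commonly computed. It assumes graded relevance labels and the linear-gain form of DCG; the abstract does not state which variant the authors used, and the function names and example labels here are hypothetical.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k items (linear gain)."""
    # Rank 0 is discounted by log2(2) = 1, rank 1 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant items at all: define the score as 0
    return dcg_at_k(relevances, k) / ideal_dcg


# Hypothetical relevance labels, listed in the order a system ranked the items.
system_ranking = [1, 3]
score = ndcg_at_k(system_ranking, k=10)
```

A ranking that already lists items in decreasing order of relevance scores 1.0; placing a less relevant item ahead of a more relevant one, as in the hypothetical `system_ranking`, lowers the score.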
2024
Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M; Cordeiro, JP; Rocha, C; Sousa, HO; Mansouri, B;
Publication
SIGIR Forum
Abstract
2024
Authors
Piskorski, J; Stefanovitch, N; Alam, F; Campos, R; Dimitrov, D; Jorge, A; Pollak, S; Ribin, N; Fijavz, Z; Hasanain, M; Silvano, P; Sartori, E; Guimarães, N; Vitez, AZ; Pacheco, AF; Koychev, I; Yu, N; Nakov, P; San Martino, GD;
Publication
Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble, France, 9-12 September, 2024.
Abstract
We present an overview of CheckThat! Lab's 2024 Task 3, which focuses on detecting 23 persuasion techniques at the text-span level in online media. The task covers five languages, namely, Arabic, Bulgarian, English, Portuguese, and Slovene, and highly debated topics in the media, e.g., the Israeli-Palestinian conflict, the Russia-Ukraine war, climate change, COVID-19, and abortion. A total of 23 teams registered for the task, and two of them submitted system responses, which were compared against a baseline and a task organizers' system that used a state-of-the-art transformer-based architecture. We provide a description of the dataset and the overall task setup, including the evaluation methodology, and an overview of the participating systems. The datasets, accompanied by the evaluation scripts, are released to the research community, which we believe will foster research on persuasion technique detection and analysis of online media content in various fields and contexts. © 2024 Copyright for this paper by its authors.