Publications

Publications by Alípio Jorge

2025

Tradutor: Building a Variety Specific Translation Model

Authors
Sousa, H; Almasian, S; Campos, R; Jorge, A;

Publication
THIRTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, AAAI-25, VOL 39 NO 24

Abstract
Language models have become foundational to many widely used systems. However, these seemingly advantageous models are double-edged swords. While they excel in tasks related to resource-rich languages like English, they often lose the fine nuances of language forms, dialects, and varieties that are inherent to languages spoken in multiple regions of the world. Languages like European Portuguese are neglected in favor of their more popular counterpart, Brazilian Portuguese, leading to suboptimal performance in various linguistic tasks. To address this gap, we introduce the first open-source translation model specifically tailored for European Portuguese, along with a novel dataset specifically designed for this task. Results from automatic evaluations on two benchmark datasets demonstrate that our best model surpasses existing open-source translation systems for Portuguese and approaches the performance of industry-leading closed-source systems for European Portuguese. By making our dataset, models, and code publicly available, we aim to support and encourage further research, fostering advancements in the representation of underrepresented language varieties.

2025

Screening Urban Soil Contamination in Rome: Insights from XRF and Multivariate Analysis

Authors
Chandramohan, MS; da Silva, IM; Ribeiro, RP; Jorge, A; da Silva, JE;

Publication
ENVIRONMENTS

Abstract
This study investigates the spatial distribution and elemental composition of soils in Rome (Italy) using X-ray fluorescence analysis. Fifty-nine soil samples were collected from various locations within the urban areas of the Rome municipality and were analyzed for 19 elements. Multivariate statistical techniques, including nonlinear mapping, principal component analysis, and hierarchical cluster analysis, were employed to identify clusters of similar soil samples and their spatial distribution, and to derive environmental quality information. The soil sample clusters reflect the influence of natural geological processes and anthropogenic activities on soil contamination patterns. Spatial clustering using the k-means algorithm further identified six distinct clusters, each with specific geographical distributions and elemental characteristics. The findings underscore the importance of targeted soil assessments to ensure the sustainable use of land resources in urban areas.

2024

Perfil Público: Automatic Generation and Visualization of Author Profiles for Digital News Media

Authors
Guimarães, N; Campos, R; Jorge, A;

Publication
Proceedings of the 16th International Conference on Computational Processing of Portuguese, PROPOR 2024, Santiago de Compostela, Galicia/Spain, March 12-15, 2024, Volume 2

Abstract

2025

MedLink: Retrieval and Ranking of Case Reports to Assist Clinical Decision Making

Authors
Cunha, LF; Guimarães, N; Mendes, A; Campos, R; Jorge, A;

Publication
Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part V

Abstract
In healthcare, diagnoses usually rely on physician expertise. However, complex cases may benefit from consulting similar past clinical case reports. In this paper, we present MedLink (http://medlink.inesctec.pt), a tool that, given a free-text medical report, retrieves and ranks relevant clinical case reports published in health conferences and journals, aiming to support clinical decision-making, particularly in challenging or complex diagnoses. To this end, we trained two BERT models on the sentence similarity task: a bi-encoder for retrieval and a cross-encoder for reranking. To evaluate our approach, we used 10 medical reports and asked a physician to rank the top 10 most relevant published case reports for each one. Our results show that MedLink’s ranking model achieved an NDCG@10 of 0.747. Our demo also includes the visualization of clinical entities (using a NER model) and the production of a textual explanation (using an LLM) to ease comparison and contrasting between reports. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
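For readers unfamiliar with the NDCG@10 metric reported above, the following is a minimal sketch of how such a score can be computed for a single ranked list. It is not the authors' evaluation code. The function names, the linear gain (relevance divided by a log2 discount), and the choice to build the ideal ranking from the same list of judged items are all assumptions made for illustration; the abstract does not specify these details.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k items.

    `relevances` holds graded relevance labels in ranked order
    (hypothetical input; linear gain is an assumption).
    """
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (sorted) ordering.

    Assumption: the ideal ranking is built from the same judged items.
    Returns 0.0 when no item is relevant.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance labels for one query, in system-ranked order.
    system_ranking = [3, 2, 3, 0, 1, 2]
    print(round(ndcg_at_k(system_ranking, k=10), 3))
```

A score averaged over several queries, as in an evaluation over multiple medical reports, would be the mean of `ndcg_at_k` across the per-query lists.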

2024

Report on the 7th International Workshop on Narrative Extraction from Texts (Text2Story 2024) at ECIR 2024

Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M; Cordeiro, JP; Rocha, C; Sousa, HO; Mansouri, B;

Publication
SIGIR Forum

Abstract
The Seventh International Workshop on Narrative Extraction from Texts (Text2Story'24) was held on March 24th, 2024, in conjunction with the 46th European Conference on Information Retrieval (ECIR 2024) in Glasgow, Scotland. Over the day, more than 50 attendees engaged in discussions and presentations focused on recent advancements in narrative representation, extraction, and generation. The workshop featured two invited keynote addresses, fourteen research paper presentations, and a poster session. The workshop proceedings are available online. Date: 24 March 2024. Website: https://text2story24.inesctec.pt/.

2024

Overview of the CLEF-2024 CheckThat! Lab Task 3 on Persuasion Techniques

Authors
Piskorski, J; Stefanovitch, N; Alam, F; Campos, R; Dimitrov, D; Jorge, A; Pollak, S; Ribin, N; Fijavz, Z; Hasanain, M; Silvano, P; Sartori, E; Guimarães, N; Vitez, AZ; Pacheco, AF; Koychev, I; Yu, N; Nakov, P; San Martino, GD;

Publication
Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble, France, 9-12 September, 2024.

Abstract
We present an overview of CheckThat! Lab's 2024 Task 3, which focuses on detecting 23 persuasion techniques at the text-span level in online media. The task covers five languages, namely, Arabic, Bulgarian, English, Portuguese, and Slovene, and highly debated topics in the media, e.g., the Israeli-Palestinian conflict, the Russia-Ukraine war, climate change, COVID-19, abortion, etc. A total of 23 teams registered for the task, and two of them submitted system responses, which were compared against a baseline and a task organizers' system that used a state-of-the-art transformer-based architecture. We provide a description of the dataset and the overall task setup, including the evaluation methodology, and an overview of the participating systems. The datasets, accompanied by the evaluation scripts, are released to the research community, which we believe will foster research on persuasion technique detection and analysis of online media content in various fields and contexts. © 2024 Copyright for this paper by its authors.
