Detalhes
Nome
Hugo Oliveira SousaCargo
Estudante ExternoDesde
07 dezembro 2020
Nacionalidade
PortugalCentro
Laboratório de Inteligência Artificial e Apoio à DecisãoContactos
+351220402963
hugo.o.sousa@inesctec.pt
2025
Autores
Sousa, H; Ward, AR; Alonso, O;
Publicação
PROCEEDINGS OF THE EIGHTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, WSDM 2025
Abstract
Events like Valentine's Day and Christmas can influence user intent when interacting with search engines. For example, a user searching for gift around Valentine's Day is likely to be looking for Valentine's-themed options, whereas the same query close to Christmas would more likely suggest an interest in Holiday-themed gifts. These shifts in user intent, driven by temporal factors, are often implicit but important to determine the relevance of search results. In this demo, we explore how incorporating temporal awareness can enhance search relevance in an e-commerce setting. We constructed a database of 2K events and, using historical purchase data, developed a temporal model that estimates each event's importance on a specific date. The most relevant events on the date the query was issued are then used to enrich search results with event-specific items. Our demo illustrates how this approach enables a search system to better adapt to temporal nuances, ultimately delivering more contextually relevant products.
2024
Autores
Almeida, R; Sousa, H; Cunha, LF; Guimaraes, N; Campos, R; Jorge, A;
Publicação
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT V
Abstract
The capabilities of the most recent language models have increased the interest in integrating them into real-world applications. However, the fact that these models generate plausible, yet incorrect text poses a constraint when considering their use in several domains. Healthcare is a prime example of a domain where text-generative trustworthiness is a hard requirement to safeguard patient well-being. In this paper, we present Physio, a chat-based application for physical rehabilitation. Physio is capable of making an initial diagnosis while citing reliable health sources to support the information provided. Furthermore, drawing upon external knowledge databases, Physio can recommend rehabilitation exercises and over-the-counter medication for symptom relief. By combining these features, Physio can leverage the power of generative models for language processing while also conditioning its response on dependable and verifiable sources. A live demo of Physio is available at https://physio.inesctec.pt.
2024
Autores
Nunes, S; Jorge, AM; Amorim, E; Sousa, HO; Leal, A; Silvano, PM; Cantante, I; Campos, R;
Publicação
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC/COLING 2024, 20-25 May, 2024, Torino, Italy.
Abstract
Narratives have been the subject of extensive research across various scientific fields such as linguistics and computer science. However, the scarcity of freely available datasets, essential for studying this genre, remains a significant obstacle. Furthermore, datasets annotated with narratives components and their morphosyntactic and semantic information are even scarcer. To address this gap, we developed the Text2Story Lusa datasets, which consist of a collection of news articles in European Portuguese. The first datasets consists of 357 news articles and the second dataset comprises a subset of 117 manually densely annotated articles, totaling over 50 thousand individual annotations. By focusing on texts with substantial narrative elements, we aim to provide a valuable resource for studying narrative structures in European Portuguese news articles. On the one hand, the first dataset provides researchers with data to study narratives from various perspectives. On the other hand, the annotated dataset facilitates research in information extraction and related tasks, particularly in the context of narrative extraction pipelines. Both datasets are made available adhering to FAIR principles, thereby enhancing their utility within the research community.
2023
Autores
Sousa, H; Pasquali, A; Jorge, A; Santos, CS; Lopes, MA;
Publicação
38TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2023
Abstract
Textual health records of cancer patients are usually protracted and highly unstructured, making it very time-consuming for health professionals to get a complete overview of the patient's therapeutic course. As such limitations can lead to suboptimal and/or inefficient treatment procedures, healthcare providers would greatly benefit from a system that effectively summarizes the information of those records. With the advent of deep neural models, this objective has been partially attained for English clinical texts, however, the research community still lacks an effective solution for languages with limited resources. In this paper, we present the approach we developed to extract procedures, drugs, and diseases from oncology health records written in European Portuguese. This project was conducted in collaboration with the Portuguese Institute for Oncology which, besides holding over 10 years of duly protected medical records, also provided oncologist expertise throughout the development of the project. Since there is no annotated corpus for biomedical entity extraction in Portuguese, we also present the strategy we followed in annotating the corpus for the development of the models. The final models, which combined a neural architecture with entity linking, achieved..1 scores of 88.6, 95.0, and 55.8 per cent in the mention extraction of procedures, drugs, and diseases, respectively.
2023
Autores
Sousa, H; Jorge, A; Campos, R;
Publicação
PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023
Abstract
Temporal information extraction (TIE) has attracted a great deal of interest over the last two decades. Such endeavors have led to the development of a significant number of datasets. Despite its benefits, having access to a large volume of corpora makes it difficult to benchmark TIE systems. On the one hand, different datasets have different annotation schemes, which hinders the comparison between competitors across different corpora. On the other hand, the fact that each corpus is disseminated in a different format requires a considerable engineering effort for a researcher/practitioner to develop parsers for all of them. These constraints force researchers to select a limited amount of datasets to evaluate their systems which consequently limits the comparability of the systems. Yet another obstacle to the comparability of TIE systems is the evaluation metric employed. While most research works adopt traditional metrics such as precision, recall, and..1, a few others prefer temporal awareness - a metric tailored to be more comprehensive on the evaluation of temporal systems. Although the reason for the absence of temporal awareness in the evaluation of most systems is not clear, one of the factors that certainly weighs on this decision is the need to implement the temporal closure algorithm, which is neither straightforward to implement nor easily available. All in all, these problems have limited the fair comparison between approaches and consequently, the development of TIE systems. To mitigate these problems, we have developed tieval, a Python library that provides a concise interface for importing different corpora and is equipped with domain-specific operations that facilitate system evaluation. In this paper, we present the first public release of tieval and highlight its most relevant features. The library is available as open source, under MIT License, at PyPI1 and GitHub(2).
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.