Publications - INESC TEC

Publications

Publications by LIAAD

2023

Text Mining and Visualization of Political Party Programs Using Keyword Extraction Methods: The Case of Portuguese Legislative Elections

Authors
Campos, R; Jatowt, A; Jorge, A;

Publication
iConference (1)

Abstract
Extracting keywords from textual data is a crucial step for text analysis. Such a process may take a considerable amount of time when done manually. In this paper, we show how keyword extraction techniques can be used to tap into texts of a political nature. To accomplish this objective, we conduct a case study on 16 Portuguese (PT) political party programs made available in the context of the legislative elections that took place on 30 January 2022. Our contributions are two-fold. At the level of resources, we make available a curated dataset and a Python notebook that systematizes the process of transforming text into quantitative data and into visual aspects. At the methodological level, we propose to extend the keyword extraction algorithm used in this study to extract the most relevant keywords, not only from individual political party programs, but also across the entire collection of documents. A further contribution is the case study itself, which calls attention to the fact that such solutions may be of interest not only to the general public, but also to journalists and politicians. Broadly, we demonstrate how the discussion and analysis that stem from the results obtained may foster political science research by making large-scale processing of documents available at marginal cost.

2023

Geovisualisation Tools for Reporting and Monitoring Transthyretin-Associated Familial Amyloid Polyneuropathy Disease

Authors
Lôpo, RX; Jorge, AM; Pedroto, M;

Publication
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT I

Abstract
Transthyretin-associated Familial Amyloid Polyneuropathy (TTR-FAP) is a chronic fatal disease with a high incidence in Portugal. It is therefore relevant to provide professionals and citizens with a tool that enables a detailed geographical and territorial study. For this reason, we have developed a web-based application that brings together techniques applied to spatial data that allow the study of the historical progression and growth of cases in patients' residential areas and areas of origin, as well as an epidemic forecast. The tool enables the exploration of geographical longitudinal data at national, district and county levels. High-density regions and periods can be visually identified according to parameters selected by the user. The visual evaluation of the data and its comparison across different time spans of the disease era can support more informed decision making by those working with patients to improve their quality of life, treatment or follow-up. The tool is available online for data exploration and its code is available on GitHub for adaptation to other geospatial scenarios.

2023

The 6th International Workshop on Narrative Extraction from Texts: Text2Story 2023

Authors
Campos, R; Jorge, A; Jatowt, A; Bhatia, S; Litvak, M;

Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT III

Abstract
Over the past five years, significant breakthroughs, led by Transformers and large language models, have been made in understanding natural language text. However, the ability to capture contextual nuances in longer texts is still an elusive goal, let alone the understanding of consistent fine-grained narrative structures in text. These unsolved challenges and the interest in the community are at the basis of the sixth edition of the Text2Story workshop, to be held in Dublin on April 2nd, 2023 in conjunction with the 45th European Conference on Information Retrieval (ECIR'23). In this sixth edition, we aim to bring to the forefront the challenges involved in understanding the structure of narratives and in incorporating their representation in well-established models, as well as in modern architectures (e.g., transformers) which are now common and form the backbone of almost every IR and NLP application. It is hoped that the workshop will provide a common forum to consolidate the multi-disciplinary efforts and foster discussions to identify the wide-ranging issues related to the narrative extraction and generation task. Text2Story includes sessions devoted to full research papers, work-in-progress, demos and dissemination papers, keynote talks and space for an informal discussion of the methods, challenges and future of this research area.

2023

A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese

Authors
Sousa, H; Pasquali, A; Jorge, A; Santos, CS; Lopes, MA;

Publication
38TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2023

Abstract
Textual health records of cancer patients are usually protracted and highly unstructured, making it very time-consuming for health professionals to get a complete overview of the patient's therapeutic course. As such limitations can lead to suboptimal and/or inefficient treatment procedures, healthcare providers would greatly benefit from a system that effectively summarizes the information of those records. With the advent of deep neural models, this objective has been partially attained for English clinical texts; however, the research community still lacks an effective solution for languages with limited resources. In this paper, we present the approach we developed to extract procedures, drugs, and diseases from oncology health records written in European Portuguese. This project was conducted in collaboration with the Portuguese Institute for Oncology which, besides holding over 10 years of duly protected medical records, also provided oncologist expertise throughout the development of the project. Since there is no annotated corpus for biomedical entity extraction in Portuguese, we also present the strategy we followed in annotating the corpus for the development of the models. The final models, which combined a neural architecture with entity linking, achieved F1 scores of 88.6, 95.0, and 55.8 per cent in the mention extraction of procedures, drugs, and diseases, respectively.

2023

tieval: An Evaluation Framework for Temporal Information Extraction Systems

Authors
Sousa, H; Jorge, A; Campos, R;

Publication
PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023

Abstract
Temporal information extraction (TIE) has attracted a great deal of interest over the last two decades. Such endeavors have led to the development of a significant number of datasets. Despite its benefits, having access to a large volume of corpora makes it difficult to benchmark TIE systems. On the one hand, different datasets have different annotation schemes, which hinders the comparison between competitors across different corpora. On the other hand, the fact that each corpus is disseminated in a different format requires a considerable engineering effort for a researcher/practitioner to develop parsers for all of them. These constraints force researchers to select a limited number of datasets to evaluate their systems, which consequently limits the comparability of the systems. Yet another obstacle to the comparability of TIE systems is the evaluation metric employed. While most research works adopt traditional metrics such as precision, recall, and F1, a few others prefer temporal awareness, a metric tailored to be more comprehensive in the evaluation of temporal systems. Although the reason for the absence of temporal awareness in the evaluation of most systems is not clear, one of the factors that certainly weighs on this decision is the need to implement the temporal closure algorithm, which is neither straightforward to implement nor easily available. All in all, these problems have limited the fair comparison between approaches and, consequently, the development of TIE systems. To mitigate these problems, we have developed tieval, a Python library that provides a concise interface for importing different corpora and is equipped with domain-specific operations that facilitate system evaluation. In this paper, we present the first public release of tieval and highlight its most relevant features. The library is available as open source, under the MIT License, at PyPI and GitHub.

2023

Proceedings of Text2Story - Sixth Workshop on Narrative Extraction From Texts held in conjunction with the 45th European Conference on Information Retrieval (ECIR 2023), Dublin, Ireland, April 2, 2023

Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M;

Publication
Text2Story@ECIR

Abstract