Publications - INESC TEC

Publications

Publications by Purificação Silvano

2021

Brat2Viz: a Tool and Pipeline for Visualizing Narratives from Annotated Texts

Authors
Amorim, E; Ribeiro, A; Santana, BS; Cantante, I; Jorge, A; Nunes, S; Silvano, P; Leal, A; Campos, R;

Publication
Text2Story@ECIR

Abstract
Narrative extraction from text is a complex task that starts by identifying a set of narrative elements (actors, events, times) and the semantic links between them (temporal, referential, semantic roles). The outcome is a structure or set of structures that can then be represented graphically, opening room for further and alternative exploration of the plot. Such visualization can also be useful during the ongoing annotation process. Manual annotation of narratives can be a complex effort, and the ability offered by the Brat annotation tool to annotate directly on the text does not seem sufficiently helpful. In this paper, we propose Brat2Viz, a tool and pipeline that displays visualizations of narrative information annotated in Brat. Brat2Viz reads the Brat annotation file, produces an intermediate representation in the declarative language DRS (Discourse Representation Structure), and from this obtains the visualization. Currently, we make available two visualization schemes: MSC (Message Sequence Chart) and Knowledge Graphs. The modularity of the pipeline enables future extension to new annotation sources, different annotation schemes, and alternative visualizations or representations. We illustrate the pipeline using examples from a European Portuguese news corpus.


2022

Semi-Automatic Approaches for Exploiting Shifter Patterns in Domain-Specific Sentiment Analysis

Authors
Brazdil, P; Muhammad, SH; Oliveira, F; Cordeiro, J; Silva, F; Silvano, P; Leal, A;

Publication
Mathematics

Abstract
This paper describes two different approaches to sentiment analysis. The first is a symbolic approach that exploits a sentiment lexicon together with a set of shifter patterns and rules. The sentiment lexicon includes single words (unigrams) and is developed automatically from labeled examples. The shifter patterns cover intensification, attenuation/downtoning, and inversion/reversal, and are developed manually. The second approach exploits a deep neural network that uses a pre-trained language model. Both approaches were applied to texts in the economics and finance domains from European Portuguese newspapers. We show that the symbolic approach achieves virtually the same performance as the deep neural network. In addition, the symbolic approach provides understandable explanations, and the acquired knowledge can be communicated to others. We release the shifter patterns to motivate future research in this direction.


2021

Extending General Sentiment Lexicon to Specific Domains in (Semi-)Automatic Manner

Authors
Brazdil, P; Silvano, P; Silva, F; Muhammad, S; Oliveira, F; Cordeiro, J; Leal, A;

Publication
CEUR Workshop Proceedings

Abstract
This paper describes an approach to constructing a sentiment analysis system that uses both automatic and manual processes. The system includes a domain-specific sentiment lexicon, together with modifier patterns and rules that are used to derive the sentiment values of sentences in new texts. The lexicon, which includes single words (unigrams), is obtained automatically from the distribution of ratings for all words in the labelled training data. The sentiment values of phrases are derived from a list of manually developed modifier patterns. These patterns include a modifier and a focal element. The modifiers can be of different types, depending on whether the operation is intensification, downtoning, or reversal. This approach was applied to texts on economics and finance in European Portuguese. In our view, this line of work deserves more attention in the community, as the system not only has reasonable performance but can also provide understandable explanations to the user.


2025

Será o ChatGPT um bom divulgador científico em cosmetologia? Um estudo linguístico sobre textos de divulgação científica - Is ChatGPT a good popular science disseminator in cosmetology? A linguistic study on popular science texts

Authors
Pacheco, AF; Guimarães, N; Torres, A; Silvano, P; Almeida, I;

Publication
Revista da Associação Portuguesa de Linguística

Abstract
The popular science text genre is fundamental for disseminating scientific knowledge in an accessible and understandable way to a non-specialist audience, and it has a structure and characteristics that differ from those of scientific articles (e.g., Garces-Conejos & Sanchez-Macarro, 1998; Zamboni, 1998). Studies on the linguistic properties of popular science texts in European Portuguese are scarce, the exception being the project Promoção da Literacia Científica (Gonçalves & Jorge, 2018). On the other hand, in the field of content production, large language models (LLMs), namely OpenAI's GPT models, have quickly gained widespread public attention. Because these models are recent, evaluations of the linguistic quality of the texts they produce remain very limited. With these premises in mind, the present study aims to evaluate the linguistic quality of the answers generated by ChatGPT (GPT-3.5) in the domain of cosmetology, covering the categories of cosmetic products, ingredients, safety and efficacy, and regulation, in order to identify patterns that reveal the differences and/or similarities between LLM-generated content and content produced by human experts on the infoCosméticos Portal. To this end, twenty questions previously answered and published on the portal were selected, and four distinct prompts with different degrees of complexity were then created, yielding eighty answers generated by ChatGPT. These answers were then analysed using a linguistic evaluation grid consisting of 11 questions.
The analysis produced several kinds of results: overall, the answers written by the experts score slightly higher than those generated by ChatGPT; regarding inter-sentential cohesion, the texts produced by experts use few connectives, in contrast to the recurrent use of discourse markers in the ChatGPT texts; the texts published on the portal contain unexplained scientific jargon and a macrostructure that lacks a concluding paragraph; and the texts generated by ChatGPT show a high frequency of repetitions and/or tautologies.


2025

FRaN-X: FRaming and Narratives-eXplorer

Authors
Muratov, A; Shaikh, HF; Jani, V; Mahmoud, T; Xie, Z; Orel, D; Singh, A; Wang, Y; Joshi, A; Iqbal, H; Hee, MS; Sahnan, D; Nikolaidis, N; Silvano, P; Dimitrov, D; Yangarber, R; Campos, R; Jorge, A; Guimarães, N; Sartori, E; Stefanovitch, N; San Martino, GD; Piskorski, J; Nakov, P;

Publication
CoRR

Abstract

2025

PolyNarrative: A Multilingual, Multilabel, Multi-domain Dataset for Narrative Extraction from News Articles

Authors
Nikolaidis, N; Stefanovitch, N; Silvano, P; Dimitrov, D; Yangarber, R; Guimarães, N; Sartori, E; Androutsopoulos, I; Nakov, P; Da San Martino, G; Piskorski, J;

Publication
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Vol. 1: Long Papers

Abstract
We present PolyNarrative, a new multilingual dataset of news articles annotated for narratives. Narratives are overt or implicit claims, recurring across articles and languages, that promote a specific interpretation or viewpoint on an ongoing topic, often propagating mis/disinformation. We developed two-level taxonomies with coarse- and fine-grained narrative labels for two domains: (i) climate change and (ii) the military conflict between Ukraine and Russia. We collected news articles in four languages (Bulgarian, English, Portuguese, and Russian) related to the two domains and manually annotated them at the paragraph level. We make the dataset publicly available, along with experimental results of several strong baselines that assign narrative labels to news articles at the paragraph or the document level. We believe that this dataset will foster research in narrative detection and enable new research directions towards more multi-domain and highly granular narrative-related tasks.
