Publications

Publications by Pavel Brazdil

2022

Metalearning

Authors
Brazdil, P; van Rijn, JN; Soares, C; Vanschoren, J;

Publication
Cognitive Technologies

Abstract

2022

NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis

Authors
Muhammad, SH; Adelani, DI; Ruder, S; Ahmad, IS; Abdulmumin, I; Bello, BS; Choudhury, M; Emezue, CC; Abdullahi, SS; Aremu, A; Jorge, A; Brazdil, P;

Publication
LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION

Abstract
Sentiment analysis is one of the most widely studied applications in NLP, but most work focuses on languages with large amounts of data. We introduce the first large-scale human-annotated Twitter sentiment dataset for the four most widely spoken languages in Nigeria-Hausa, Igbo, Nigerian-Pidgin, and Yoruba-consisting of around 30,000 annotated tweets per language, including a significant fraction of code-mixed tweets. We propose text collection, filtering, processing, and labeling methods that enable us to create datasets for these low-resource languages. We evaluate a range of pre-trained models and transfer strategies on the dataset. We find that language-specific models and language-adaptive fine-tuning generally perform best. We release the datasets, trained models, sentiment lexicons, and code to incentivize research on sentiment analysis in under-represented languages.

CloseRead Abstract

2021

Extending General Sentiment Lexicon to Specific Domains in (Semi-)Automatic Manner

Authors
Brazdil P.; Silvano P.; Silva F.; Muhammad S.; Oliveira F.; Cordeiro J.; Leal A.;

Publication
CEUR Workshop Proceedings

Abstract
This paper describes an approach to the construction of a sentiment analysis system that uses both automatic and manual processes. The system includes a domain-specific sentiment lexicon, modifier patterns and rules that are used to derive the sentiment values of sentences in new texts. The lexicon that includes single words (unigrams) is obtained in an automatic manner from the distribution of ratings for all words in the labelled training data. The sentiment values of phrases is derived from a list of modifier patterns, built/developed manually. These include a modifier and a focal element. The modifiers can be of different types, depending on whether the operation is intensification, downtoning or reversal. This approach was applied to texts on economics and finance in European Portuguese. In our view, this line of work deserves more attention in the community, as the system not only has reasonable performance, but also can provide understandable explanations to the user.

CloseRead Abstract

2025

Reducing algorithm configuration spaces for efficient search

Authors
Freitas, F; Brazdil, P; Soares, C;

Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
Many current AutoML platforms include a very large space of alternatives (the configuration space). This increases the probability of including the best one for any dataset but makes the task of identifying it for a new dataset more difficult. In this paper, we explore a method that can reduce a large configuration space to a significantly smaller one and so help to reduce the search time for the potentially best algorithm configuration, with limited risk of significant loss of predictive performance. We empirically validate the method with a large set of alternatives based on five ML algorithms with different sets of hyperparameters and one preprocessing method (feature selection). Our results show that it is possible to reduce the given search space by more than one order of magnitude, from a few thousands to a few hundred items. After reduction, the search for the best algorithm configuration is about one order of magnitude faster than on the original space without significant loss in predictive performance.

CloseRead Abstract

2022

ECML/PKDD Workshop on Meta-Knowledge Transfer, 23 September 2022, Grenoble, France

Authors
Brazdil, P; van Rijn, JN; Gouk, H; Mohr, F;

Publication
Meta-Knowledge Transfer @ ECML/PKDD

Abstract

2023

Symbolic Versus Deep Learning Techniques for Explainable Sentiment Analysis

Authors
Muhammad, SH; Brazdil, P; Jorge, A;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I

Abstract
Deep learning approaches have become popular in many different areas, including sentiment analysis (SA), because of their competitive performance. However, the downside of this approach is that they do not provide understandable explanations on how the sentiment values are calculated. In contrast, previous approaches that used sentiment lexicons can do that, but their performance is normally not high. To leverage the strengths of both approaches, we present a neuro-symbolic approach that combines deep learning (DL) and symbolic methods for SA tasks. The DL approach uses a pre-trained language model (PLM) to construct sentiment lexicon. The symbolic approach exploits the constructed sentiment lexicon and manually constructed shifter patterns to determine the sentiment of a sentence. Our experimental results show that the proposed approach leads to promising results with the additional advantage that sentiment predictions can be accompanied by understandable explanations.

CloseRead Abstract