2022
Authors
Brazdil, P; van Rijn, JN; Soares, C; Vanschoren, J;
Publication
Cognitive Technologies
Abstract
2022
Authors
Muhammad, SH; Adelani, DI; Ruder, S; Ahmad, IS; Abdulmumin, I; Bello, BS; Choudhury, M; Emezue, CC; Abdullahi, SS; Aremu, A; Jorge, A; Brazdil, P;
Publication
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
Abstract
Sentiment analysis is one of the most widely studied applications in NLP, but most work focuses on languages with large amounts of data. We introduce the first large-scale human-annotated Twitter sentiment dataset for the four most widely spoken languages in Nigeria (Hausa, Igbo, Nigerian Pidgin, and Yoruba), consisting of around 30,000 annotated tweets per language, including a significant fraction of code-mixed tweets. We propose text collection, filtering, processing, and labeling methods that enable us to create datasets for these low-resource languages. We evaluate a range of pre-trained models and transfer strategies on the dataset. We find that language-specific models and language-adaptive fine-tuning generally perform best. We release the datasets, trained models, sentiment lexicons, and code to incentivize research on sentiment analysis in under-represented languages.
2021
Authors
Brazdil, P; Silvano, P; Silva, F; Muhammad, S; Oliveira, F; Cordeiro, J; Leal, A;
Publication
CEUR Workshop Proceedings
Abstract
This paper describes an approach to constructing a sentiment analysis system that combines automatic and manual processes. The system includes a domain-specific sentiment lexicon, modifier patterns, and rules that are used to derive the sentiment values of sentences in new texts. The lexicon, which contains single words (unigrams), is obtained automatically from the distribution of ratings for all words in the labelled training data. The sentiment values of phrases are derived from a manually built list of modifier patterns. Each pattern consists of a modifier and a focal element. Modifiers are of different types, depending on whether the operation is intensification, downtoning, or reversal. This approach was applied to texts on economics and finance in European Portuguese. In our view, this line of work deserves more attention in the community, as the system not only has reasonable performance but can also provide understandable explanations to the user.
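The abstract describes this pipeline only at a high level. The Python sketch below is a minimal illustration under explicit assumptions, not the authors' implementation: each word's lexicon value is taken as the mean rating of the training sentences that contain it, and a modifier rescales the focal element that follows it. The modifier words and multipliers in `MODIFIERS`, the function names `build_lexicon` and `sentence_sentiment`, and the toy data are all hypothetical, and English tokens are used only for readability (the described system targets European Portuguese).

```python
from collections import defaultdict

# Hypothetical modifier patterns (placeholders, not the authors' list):
# modifier word -> (operation type, multiplier applied to the focal element).
MODIFIERS = {
    "very": ("intensification", 1.5),
    "slightly": ("downtoning", 0.5),
    "not": ("reversal", -1.0),
}


def build_lexicon(labelled):
    """Derive a unigram lexicon from (tokens, rating) pairs.

    Assumption: a word's value is the mean rating of the training
    sentences in which it occurs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tokens, rating in labelled:
        for word in set(tokens):  # count each word once per sentence
            totals[word] += rating
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


def sentence_sentiment(tokens, lexicon):
    """Sum lexicon values, letting pending modifiers rescale the next focal element."""
    score = 0.0
    factor = 1.0  # combined multiplier of modifiers seen since the last focal element
    for word in tokens:
        if word in MODIFIERS:
            _, multiplier = MODIFIERS[word]
            factor *= multiplier
        elif word in lexicon:
            score += factor * lexicon[word]
            factor = 1.0  # the modifiers have been consumed by this focal element
        # words that are neither modifiers nor in the lexicon are skipped
    return score


# Toy usage with invented ratings.
train = [
    (["growth", "strong"], 1.0),
    (["growth", "weak"], -1.0),
    (["outlook", "strong"], 1.0),
]
lexicon = build_lexicon(train)
print(sentence_sentiment(["very", "strong"], lexicon))    # 1.5
print(sentence_sentiment(["not", "strong"], lexicon))     # -1.0
print(sentence_sentiment(["slightly", "weak"], lexicon))  # -0.5
```

Because every score is a sum of lexicon values scaled by named modifiers, each word's contribution can be traced back, which is one way such a system could give the user understandable explanations.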
2025
Authors
Freitas, F; Brazdil, P; Soares, C;
Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS
Abstract
Many current AutoML platforms include a very large space of alternatives (the configuration space). This increases the probability of including the best one for any dataset but makes the task of identifying it for a new dataset more difficult. In this paper, we explore a method that can reduce a large configuration space to a significantly smaller one and so help reduce the search time for the potentially best algorithm configuration, with limited risk of significant loss of predictive performance. We empirically validate the method with a large set of alternatives based on five ML algorithms with different sets of hyperparameters and one preprocessing method (feature selection). Our results show that it is possible to reduce the given search space by more than one order of magnitude, from a few thousand to a few hundred items. After reduction, the search for the best algorithm configuration is about one order of magnitude faster than on the original space, without significant loss in predictive performance.
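One way to picture such a reduction is a greedy covering heuristic over a table of results from previously seen datasets. This is an assumption for illustration only: the abstract does not state which reduction method the paper uses, and the function `reduce_config_space`, its parameters `perf` and `tolerance`, and the configuration names and scores below are hypothetical. A configuration "covers" a dataset if its score there is within `tolerance` of the best score, and configurations are added until every dataset is covered.

```python
def reduce_config_space(perf, tolerance=0.02):
    """Greedily select a small subset of configurations.

    perf: dict mapping configuration name -> list of scores, one per
    previously seen dataset (higher is better).
    Returns the selected configuration names in the order chosen.
    """
    names = list(perf)
    n_datasets = len(perf[names[0]])
    # best score achieved on each dataset by any configuration
    best = [max(perf[name][d] for name in names) for d in range(n_datasets)]
    # datasets on which each configuration is near-best
    covers = {
        name: {d for d in range(n_datasets) if perf[name][d] >= best[d] - tolerance}
        for name in names
    }
    uncovered = set(range(n_datasets))
    selected = []
    while uncovered:
        # pick the configuration that covers the most still-uncovered datasets;
        # this always covers at least one, since each dataset's best config covers it
        chosen = max(names, key=lambda name: len(covers[name] & uncovered))
        selected.append(chosen)
        uncovered -= covers[chosen]
    return selected


# Toy usage with invented accuracies on three datasets.
perf = {
    "rf_default": [0.90, 0.80, 0.70],
    "rf_tuned": [0.91, 0.79, 0.75],
    "svm_rbf": [0.85, 0.85, 0.60],
    "knn_5": [0.80, 0.70, 0.74],
}
print(reduce_config_space(perf))  # ['rf_tuned', 'svm_rbf']
```

Under this scheme, every dataset used to build the table keeps a configuration within `tolerance` of its best score, so `tolerance` directly trades the size of the reduced space against the possible loss in performance on those datasets.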
2022
Authors
Brazdil, P; van Rijn, JN; Gouk, H; Mohr, F;
Publication
ECML/PKDD Workshop on Meta-Knowledge Transfer, 23 September 2022, Grenoble, France
Abstract
2022
Authors
Brazdil, P; van Rijn, JN; Gouk, H; Mohr, F;
Publication
Meta-Knowledge Transfer @ ECML/PKDD
Abstract