Publications - INESC TEC

Publications by Luís Torgo

2018

Environmental controls on estuarine nitrifying communities along a salinity gradient

Authors
Monteiro, M; Seneca, J; Torgo, L; Cleary, DFR; Gomes, NCM; Santoro, AE; Magalhaes, C;

Publication
AQUATIC MICROBIAL ECOLOGY

Abstract
Estuaries are transitional zones between marine and freshwater environments and are ideal systems to study the influence of environmental gradients on microbial biodiversity and activity. In this study, we investigated the effect of a salinity gradient on the structure of prokaryotic communities from intertidal sediments of the Douro estuary, and on the nitrification process. Four locations were chosen with distinct salinities and characterized for a range of environmental parameters including measurements of potential nitrification rates. The structure of prokaryotic communities and ammonia-oxidizing bacteria and archaea were described and identified using the 16S rRNA gene. Potential nitrification rates ranged from 1.3 to 7.4 µmol cm⁻² h⁻¹, with the highest rate at mesohaline sites; however, the relative abundance of nitrifying taxa was higher at locations with higher salinity. Ammonia-oxidizing bacteria could not be detected in oligohaline sites, in contrast to ammonia-oxidizing archaea, which showed a ubiquitous distribution. Nitrite-oxidizing bacteria were more abundant than ammonia-oxidizing groups across meso-oligohaline sites, showing increased relative abundance at less saline sites. One operational taxonomic unit closely related to Nitrospira moscoviensis showed a positive correlation with potential nitrification rates, suggesting a possible association of N. moscoviensis with ammonia-oxidizing organisms in a natural ecosystem. Such results point out the need to re-assess the relative roles of different nitrifying groups in the nitrification process.

2019

A Study on the Impact of Data Characteristics in Imbalanced Regression Tasks

Authors
Branco, P; Torgo, L;

Publication
2019 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2019)

Abstract
The class imbalance problem has been thoroughly studied over the past two decades. More recently, the research community realized that the problem of imbalanced distributions also occurs in other tasks beyond classification. Regression problems are among these newly studied tasks where the problem of imbalanced domains also poses important challenges. Imbalanced regression problems occur in a diversity of real-world domains such as meteorological (predicting extreme weather values), financial (forecasting extreme stock returns) or medical (anticipating rare values). In imbalanced regression, the end-user preferences are biased towards values of the target variable that are under-represented in the available data. Several pre-processing methods were proposed to address this problem. These methods change the training set to force the learner to focus on the rare cases. However, as far as we know, the relationship between the intrinsic characteristics of the data and the performance achieved by these methods has not yet been studied for imbalanced regression tasks. In this paper we describe a study of the impact certain data characteristics may have on the results of applying pre-processing methods to imbalanced regression problems. To achieve this goal, we define potentially interesting data characteristics of regression problems. We then conduct our study using a synthetic data repository built for this purpose. We show that all the different characteristics studied have a different behaviour that is related to the level at which the data characteristic is present and the learning algorithm used.
The main contributions of our work are: i) to define interesting data characteristics for regression tasks; ii) to create the first repository of imbalanced regression tasks containing 6000 data sets with controlled data characteristics; and iii) to provide insights on the impact of intrinsic data characteristics on the results of pre-processing methods for handling imbalanced regression tasks.

2021

Evaluation Procedures for Forecasting with Spatiotemporal Data

Authors
Oliveira, M; Torgo, L; Costa, VS;

Publication
MATHEMATICS

Abstract
The increasing use of sensor networks has led to an ever larger number of available spatiotemporal datasets. Forecasting applications using this type of data are frequently motivated by important domains such as environmental monitoring. Being able to properly assess the performance of different forecasting approaches is fundamental to achieving progress. However, traditional performance estimation procedures, such as cross-validation, face challenges due to the implicit dependence between observations in spatiotemporal datasets. In this paper, we empirically compare several variants of cross-validation (CV) and out-of-sample (OOS) performance estimation procedures, using both artificially generated and real-world spatiotemporal datasets. Our results show that both CV and OOS report useful estimates, but they suggest that blocking data in space and/or in time may be useful in mitigating CV's tendency to underestimate error. Overall, our study shows the importance of considering data dependencies when estimating the performance of spatiotemporal forecasting models.

2021

Profiling Accounts Political Bias on Twitter

Authors
Guimaraes, N; Figueira, A; Torgo, L;

Publication
PROCEEDINGS OF 2021 16TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI'2021)

Abstract
Twitter has become a major platform for sharing ideas and promoting discussion on relevant topics. However, with a large number of users resorting to it as their primary source of information and with an increasing number of accounts spreading newsworthy content, a characterization of the political bias associated with the social network ecosystem becomes necessary. In this work, we aim at analyzing accounts spreading or publishing content from five different classes of the political spectrum. We also look further and study accounts that spread content from both the right and left sides. Conclusions show that there is a large presence of accounts which disseminate right-biased content, although it is the more central classes that have a higher influence on the network. In addition, users who spread content from both sides are more actively spreading right content, with the opposite content associated with criticism towards left political parties or with promoting right political decisions.

2021

Towards a pragmatic detection of unreliable accounts on social networks

Authors
Guimarães, N; Figueira, A; Torgo, L;

Publication
Online Soc. Networks Media

Abstract
In recent years, the problem of unreliable content in social networks has become a major threat, with a proven real-world impact in events like elections and pandemics, undermining democracy and trust in science, respectively. Research in this domain has focused not only on the content but also on the accounts that propagate it, with the bot detection task having been thoroughly studied. However, not all bot accounts work as unreliable content spreaders (e.g., bots for news aggregation), and not all human accounts are necessarily reliable. In this study, we try to distinguish unreliable from reliable accounts, independently of how they are operated. In addition, we work towards providing a methodology capable of coping with real-world situations by introducing the content available (restricting it by volume- and time-based batches) as a parameter of the methodology. Experiments conducted on a validation set with a different number of tweets per account provide evidence that our proposed solution produces an increase of up to 20% in performance when compared with traditional (individual) models and with cross-batch models (which perform better with different batches of tweets).

2020

Knowledge-based Reliability Metrics for Social Media Accounts

Authors
Guimaraes, N; Figueira, A; Torgo, L;

Publication
PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND TECHNOLOGIES (WEBIST)

Abstract
The growth of social media as an information medium without restrictive measures on the creation of new accounts led to the rise of malicious agents with the intent to diffuse unreliable information in the network, ultimately affecting the perception of users on important topics such as political and health issues. Although the problem is being tackled within the domain of bot detection, the impact of studies in this area is still limited because 1) not all accounts that spread unreliable content are bots, 2) human-operated accounts are also responsible for the diffusion of unreliable information and 3) bot accounts are not always malicious (e.g. news aggregators). Also, most of these methods are based on supervised models that require annotated data and updates to maintain their performance through time. In this work, we build a framework and develop knowledge-based metrics to complement the current research in bot detection and characterize the impact and behavior of a Twitter account, independently of the way it is operated (human or bot). We proceed to analyze a sample of the accounts using the metrics proposed and evaluate the necessity of these metrics by comparing them with the scores from a bot detection system. The results show that the metrics can characterize different degrees of unreliable accounts, from unreliable bot accounts with a high number of followers to human-operated accounts that also spread unreliable content (but with less impact on the network). Furthermore, evaluating a sample of the accounts with a bot detection system showed that bots make up around 11% of the sample of unreliable accounts extracted and that the bot score is not correlated with the proposed metrics. In addition, the accounts that achieve the highest values in our metrics present different characteristics than the ones that achieve the highest bot score. This provides evidence of the usefulness of our metrics in the evaluation of unreliable accounts in social networks.