Publications - INESC TEC

Publications

Publications by Luís Torgo

2019

Biased Resampling Strategies for Imbalanced Spatio-Temporal Forecasting

Authors
Oliveira, M; Moniz, N; Torgo, L; Costa, VS

Publication
2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA 2019)

Abstract
Extreme and rare events, such as abnormal spikes in air pollution or weather conditions, can have serious repercussions. Many such events develop from spatio-temporal processes, and accurate predictions are a valuable tool for addressing their impact in a timely manner. In this paper, we propose a new set of resampling strategies for imbalanced spatio-temporal forecasting tasks, which introduce bias into formerly random processes. This spatio-temporal bias includes a hyperparameter that regulates the relative importance of the temporal and spatial dimensions in the selection of observations during under- or over-sampling. We test and compare our proposals against standard versions of the strategies on 10 different georeferenced numeric time series, using 3 distinct off-the-shelf learning algorithms. Experimental results show that our proposal provides an advantage over random resampling strategies in imbalanced spatio-temporal forecasting tasks. We also find that valuing an observation's recency is more useful when over-sampling, while valuing its spatial distance to other cases with extreme values is more beneficial when under-sampling.
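The idea of a hyperparameter that trades off temporal against spatial importance can be illustrated with a minimal sketch. This is not the paper's formulation: the function names, the `alpha` parameter, the min-max normalisation, and the choice that a larger distance to the nearest extreme case yields a larger weight are all assumptions made for illustration only.

```python
import math
import random


def bias_weights(times, coords, extreme_coords, alpha):
    """Blend a recency score and a spatial score into one weight per observation.

    alpha = 1.0 uses only recency, alpha = 0.0 uses only spatial distance.
    All scoring choices here are illustrative assumptions.
    """
    t_min, t_max = min(times), max(times)
    t_span = (t_max - t_min) or 1  # avoid division by zero for a single timestamp
    recency = [(t - t_min) / t_span for t in times]

    # Distance from each observation to its nearest extreme-valued case.
    nearest = [min(math.dist(c, e) for e in extreme_coords) for c in coords]
    d_max = max(nearest) or 1.0
    spatial = [d / d_max for d in nearest]

    return [alpha * r + (1 - alpha) * s for r, s in zip(recency, spatial)]


def biased_oversample(observations, weights, k, seed=0):
    """Draw k observations with replacement, with probability proportional to weight."""
    rng = random.Random(seed)
    return rng.choices(observations, weights=weights, k=k)


if __name__ == "__main__":
    times = [0, 1, 2]
    coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    extremes = [(2.0, 0.0)]
    w = bias_weights(times, coords, extremes, alpha=0.8)
    print(biased_oversample(["a", "b", "c"], w, k=5))
```

Setting `alpha` close to 1 favours recent observations, while values close to 0 favour observations selected by their spatial relation to extreme cases.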


2019

Layered Learning for Early Anomaly Detection: Predicting Critical Health Episodes

Authors
Cerqueira, V; Torgo, L; Soares, C

Publication
Discovery Science - 22nd International Conference, DS 2019, Split, Croatia, October 28-30, 2019, Proceedings

Abstract
Critical health events represent a relevant cause of mortality in intensive care units of hospitals, and their timely prediction has been gaining increasing attention. This problem is an instance of the more general predictive task of early anomaly detection in time series data. One of the most common approaches to solving this problem is to use standard classification methods. In this paper, we propose a novel method that uses a layered learning architecture to solve early anomaly detection problems. One key contribution of our work is the idea of pre-conditional events, which denote arbitrary but computable relaxed versions of the event of interest. We leverage this idea to break the original problem into two layers, which we hypothesize are easier to solve. Focusing on critical health episodes, the results suggest that the proposed approach is advantageous relative to state-of-the-art approaches for early anomaly detection. Although we focus on a particular case study, the proposed method is generalizable to other domains. © Springer Nature Switzerland AG 2019.


2020

Analysis and Detection of Unreliable Users in Twitter: Two Case Studies

Authors
Guimaraes, N; Figueira, A; Torgo, L

Publication
Communications in Computer and Information Science

Abstract
The emergence of online social networks provided users with an easy way to publish and disseminate content, reaching broader audiences than previous platforms (such as blogs or personal websites) allowed. However, malicious users started to take advantage of these features to disseminate unreliable content through the network, such as false information, extremely biased opinions, or hate speech. Consequently, it becomes crucial to detect these users at an early stage to avoid the propagation of unreliable content in social networks’ ecosystems. In this work, we introduce a methodology to extract a large corpus of unreliable posts using Twitter and two databases of unreliable websites (OpenSources and Media Bias Fact Check). In addition, we present an analysis of the content and users that publish and share several types of unreliable content. Finally, we develop supervised models to classify a Twitter account according to its reliability. The experiments conducted using two different data sets show performance above 94% using Decision Trees as the learning algorithm. These experiments, although with some limitations, provide encouraging results for future research on detecting unreliable accounts on social networks. © 2020, Springer Nature Switzerland AG.


2019

A Study on the Impact of Data Characteristics in Imbalanced Regression Tasks

Authors
Branco, P; Torgo, L

Publication
2019 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2019, Washington, DC, USA, October 5-8, 2019

Abstract

2019

Visual Interpretation of Regression Error

Authors
Areosa, I; Torgo, L

Publication
Progress in Artificial Intelligence, 19th EPIA Conference on Artificial Intelligence, EPIA 2019, Vila Real, Portugal, September 3-6, 2019, Proceedings, Part II

Abstract
Numerous sophisticated machine learning tools (e.g., ensembles or deep networks) have shown outstanding performance in terms of accuracy on different numeric forecasting tasks. In many real-world application domains, the numeric predictions of the models drive important and costly decisions. Frequently, decision makers require more than a black box model to be able to “trust” the predictions to the point that they base their decisions on them. In this context, understanding these black boxes has become one of the hot topics in Machine Learning and Data Mining research. This paper proposes a series of visualisation tools that help in understanding the predictive performance of non-interpretable regression models. More specifically, these tools allow the user to relate the expected error of any model to the values of the predictor variables. This type of information allows end-users to correctly assess the risks associated with the use of the models, by showing how concrete values of the predictors may affect the performance of the models. Our illustrations with different real-world data sets and learning algorithms provide insights into the type of usage and information these tools bring to both the data analyst and the end-user. © 2019, Springer Nature Switzerland AG.
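Relating a model's error to the values of a predictor can be approximated numerically before any plotting. The sketch below is a simplified stand-in, not the visualisation tools proposed in the paper: it groups cases by bins of one predictor and reports the mean absolute error per bin. The function name `error_by_bin` and the `edges` parameter are hypothetical.

```python
import bisect
from collections import defaultdict


def error_by_bin(x, y_true, y_pred, edges):
    """Mean absolute error per bin of predictor `x`.

    `edges` is a sorted list of cut points; bin i holds the cases with
    edges[i-1] <= x < edges[i], with open-ended first and last bins.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for xi, yt, yp in zip(x, y_true, y_pred):
        b = bisect.bisect_right(edges, xi)  # index of the bin containing xi
        sums[b] += abs(yt - yp)
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(counts)}


if __name__ == "__main__":
    x = [1, 2, 8, 9]
    y_true = [10, 10, 10, 10]
    y_pred = [11, 9, 14, 6]
    print(error_by_bin(x, y_true, y_pred, edges=[5]))
```

A table like this shows in which region of a predictor the model tends to be less reliable, which is the kind of information a plot of error against predictor values would convey.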


2020

Wise Sliding Window Segmentation: A Classification-Aided Approach for Trajectory Segmentation

Authors
Etemad, M; Etemad, Z; Soares, A; Bogorny, V; Matwin, S; Torgo, L

Publication
Advances in Artificial Intelligence - 33rd Canadian Conference on Artificial Intelligence, Canadian AI 2020, Ottawa, ON, Canada, May 13-15, 2020, Proceedings

Abstract
Large amounts of mobility data are being generated from many different sources, and several data mining methods have been proposed for this data. One of the most critical steps in trajectory data mining is segmentation. This task can be seen as a pre-processing step in which a trajectory is divided into several meaningful consecutive sub-sequences. This process is necessary because trajectory patterns may hold only in parts of a trajectory rather than across the whole of it. In this work, we propose a supervised trajectory segmentation algorithm, called Wise Sliding Window Segmentation (WS-II). It processes the trajectory coordinates to find behavioral changes in space and time, generating an error signal that is further used to train a binary classifier for segmenting trajectory data. This algorithm is flexible and can be used in different domains. We evaluate our method over three real datasets from different domains (meteorology, fishing, and individuals' movements), and compare it with four other trajectory segmentation algorithms: OWS, GRASP-UTS, CB-SMoT, and SPD. We observed that the proposed algorithm achieves the highest performance for all datasets, with statistically significant differences in terms of the harmonic mean of purity and coverage. © Springer Nature Switzerland AG 2020.
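The harmonic mean of purity and coverage mentioned above can be sketched as follows. The abstract does not define the two measures, so the definitions used here are assumptions: purity is taken as the average share of each predicted segment occupied by its most frequent true segment, and coverage as the average share of each true segment captured by its most frequent predicted segment. Both sequences are per-point segment identifiers, and the function names are illustrative.

```python
from collections import Counter, defaultdict


def _avg_majority_share(group_ids, other_ids):
    """For each group, the share of its points held by its most frequent other-id, averaged."""
    groups = defaultdict(list)
    for g, o in zip(group_ids, other_ids):
        groups[g].append(o)
    shares = [Counter(v).most_common(1)[0][1] / len(v) for v in groups.values()]
    return sum(shares) / len(shares)


def purity(pred, true):
    # How homogeneous each predicted segment is with respect to the true segments.
    return _avg_majority_share(pred, true)


def coverage(pred, true):
    # How much of each true segment is captured by a single predicted segment.
    return _avg_majority_share(true, pred)


def harmonic_mean(p, c):
    return 2 * p * c / (p + c) if (p + c) else 0.0


if __name__ == "__main__":
    true = [0, 0, 0, 0, 1, 1, 1, 1]
    pred = [0, 0, 1, 1, 2, 2, 2, 2]  # over-segments the first true segment
    p, c = purity(pred, true), coverage(pred, true)
    print(p, c, harmonic_mean(p, c))
```

Over-segmentation keeps purity high but lowers coverage, and under-segmentation does the opposite, which is why the two are combined into a single score.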
