Publications - INESC TEC

Publications

Publications by João Gama

2020

ECML PKDD 2020 Workshops - Workshops of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2020): SoGood 2020, PDFL 2020, MLCS 2020, NFMCP 2020, DINA 2020, EDML 2020, XKDD 2020 and INRA 2020, Ghent, Belgium, September 14-18, 2020, Proceedings

Authors
Koprinska, I; Kamp, M; Appice, A; Loglisci, C; Antonie, L; Zimmermann, A; Guidotti, R; Özgöbek, O; Ribeiro, RP; Gavaldà, R; Gama, J; Adilova, L; Krishnamurthy, Y; Ferreira, PM; Malerba, D; Medeiros, I; Ceci, M; Manco, G; Masciari, E; Ras, ZW; Christen, P; Ntoutsi, E; Schubert, E; Zimek, A; Monreale, A; Biecek, P; Rinzivillo, S; Kille, B; Lommatzsch, A; Gulla, JA;

Publication
PKDD/ECML Workshops

2020

Trustability in Algorithmic Systems Based on Artificial Intelligence in the Public and Private Sectors

Authors
Teixeira, S; Gama, J; Amorim, P; Figueira, G;

Publication
ERCIM NEWS

Abstract
Algorithmic systems based on artificial intelligence (AI) increasingly play a role in decision-making processes, both in government and industry. These systems are used in areas such as retail, finance, and manufacturing. In the latter domain, the main priority is that the solutions are interpretable, as this characteristic correlates with the adoption rate among users (e.g., schedulers). However, more recently, these systems have been applied in areas of public interest, such as education, health, public administration, and criminal justice. The adoption of these systems in this domain, in particular the data-driven decision models, has raised questions about the risks associated with this technology, from which ethical problems may emerge. We analyse two important characteristics, interpretability and trustability, of AI-based systems in the industrial and public domains, respectively.

2021

Multi-aspect renewable energy forecasting

Authors
Corizzo, R; Ceci, M; Fanaee T, H; Gama, J;

Publication
INFORMATION SCIENCES

Abstract
The increasing presence of renewable energy plants has created new challenges such as grid integration, load balancing and energy trading, making it fundamental to provide effective prediction models. Recent approaches in the literature have shown that exploiting spatio-temporal autocorrelation in data coming from multiple plants can lead to better predictions. Although tensor models and techniques are suitable to deal with spatio-temporal data, they have received little attention in the energy domain. In this paper, we propose a new method based on the Tucker tensor decomposition, capable of extracting a new feature space for the learning task. For evaluation purposes, we have investigated the performance of predictive clustering trees with the new feature space, compared to the original feature space, in three renewable energy datasets. The results are favorable for the proposed method, also when compared with state-of-the-art algorithms.
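For readers unfamiliar with the Tucker decomposition named in the abstract above, its standard textbook form for a third-order tensor is sketched below. The symbols (tensor dimensions $I, J, K$, ranks $P, Q, R$, factor matrices $A, B, C$) are generic notation assumed here for illustration; the abstract does not specify how the paper arranges plants, time, and features into tensor modes, nor how the new feature space is derived from the factors.

```latex
% Tucker decomposition of a third-order tensor (generic notation, assumed):
% a small core tensor G multiplied along each mode by a factor matrix.
\mathcal{X} \approx \mathcal{G} \times_1 A \times_2 B \times_3 C,
\qquad
\mathcal{X} \in \mathbb{R}^{I \times J \times K},\;
\mathcal{G} \in \mathbb{R}^{P \times Q \times R},\;
A \in \mathbb{R}^{I \times P},\;
B \in \mathbb{R}^{J \times Q},\;
C \in \mathbb{R}^{K \times R}

% Equivalent element-wise form:
x_{ijk} \approx \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R}
  g_{pqr}\, a_{ip}\, b_{jq}\, c_{kr}
```

When $P < I$, $Q < J$ and $R < K$, the core and factor matrices give a compressed representation of the original tensor.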

2021

Classification and Recommendation With Data Streams

Authors
Veloso, B; Gama, J; Malheiro, B;

Publication
Encyclopedia of Information Science and Technology, Fifth Edition - Advances in Information Quality and Management

Abstract
Nowadays, with the exponential growth of data stream sources (e.g., Internet of Things [IoT], social networks, crowdsourcing platforms, and personal mobile devices), data stream processing has become indispensable for online classification, recommendation, and evaluation. Its main goal is to keep dynamic models up to date, retaining the captured patterns, so that they make accurate predictions. The foundations of data stream algorithms are incremental processing, which reduces the computational resources required to process large quantities of data, and relevance model updating. This article addresses data stream knowledge processing, covering classification, recommendation, and evaluation; describing existing algorithms/techniques; and identifying open challenges.
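The incremental processing and online evaluation mentioned in the abstract above can be illustrated with a minimal sketch of prequential (test-then-train) evaluation, a common scheme for data streams. The class `MajorityClassLearner`, the function `prequential_accuracy`, and the toy stream are hypothetical names and data introduced only for this illustration; they are not taken from the article.

```python
from collections import Counter


class MajorityClassLearner:
    """Hypothetical incremental baseline: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        # No prediction is possible before the first labelled example arrives.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn_one(self, x, y):
        # Constant-time update: the model never revisits past examples.
        self.counts[y] += 1


def prequential_accuracy(stream, model):
    """Test-then-train: each example is first used for testing, then for learning."""
    correct = 0
    total = 0
    for x, y in stream:
        if model.predict_one(x) == y:
            correct += 1
        model.learn_one(x, y)
        total += 1
    return correct / total if total else 0.0


# Toy stream of (features, label) pairs, made up for the example.
stream = [((0,), "a"), ((1,), "a"), ((2,), "b"), ((3,), "a")]
accuracy = prequential_accuracy(stream, MajorityClassLearner())
print(accuracy)
```

Each example is seen exactly once, so memory use stays independent of the stream length.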

2021

Forecasting conditional extreme quantiles for wind energy

Authors
Goncalves, C; Cavalcante, L; Brito, M; Bessa, RJ; Gama, J;

Publication
ELECTRIC POWER SYSTEMS RESEARCH

Abstract
Probabilistic forecasting of distribution tails (i.e., quantiles below 0.05 and above 0.95) is challenging for non-parametric approaches since data for extreme events are scarce. A poor forecast of extreme quantiles can have a high impact in various power system decision-aid problems. An alternative approach more robust to data sparsity is extreme value theory (EVT), which uses parametric functions for modelling the distribution's tails. In this work, we apply conditional EVT estimators to historical data by directly combining gradient boosting trees with a truncated generalized Pareto distribution. The parametric function parameters are conditioned by covariates such as wind speed or direction from a numerical weather prediction grid. The results for a wind power plant located in Galicia, Spain, show that the proposed method outperforms state-of-the-art methods in terms of quantile score.
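To make the parametric tail idea in the abstract above concrete, here is a minimal sketch of the standard peaks-over-threshold quantile formula for a generalized Pareto tail, together with the pinball loss that is commonly used as a quantile score. It is a simplification under stated assumptions: the parameters are fixed numbers rather than outputs of gradient boosting trees conditioned on covariates, the distribution is not truncated, and the abstract does not confirm that its quantile score is exactly the pinball loss. The function names and numeric values are hypothetical.

```python
import math


def gpd_tail_quantile(q, threshold, scale, shape, exceed_prob):
    """Upper-tail quantile from a generalized Pareto model of threshold exceedances.

    exceed_prob is P(X > threshold); q must satisfy 1 - exceed_prob <= q < 1.
    """
    if not (1.0 - exceed_prob <= q < 1.0):
        raise ValueError("q must lie in the tail above the threshold")
    ratio = exceed_prob / (1.0 - q)  # >= 1 inside the tail
    if abs(shape) < 1e-12:
        # Limit of the formula below as shape -> 0 (exponential tail).
        return threshold + scale * math.log(ratio)
    return threshold + (scale / shape) * (ratio ** shape - 1.0)


def pinball_loss(y, q_hat, q):
    """Pinball (quantile) loss of forecast q_hat for level q given observation y."""
    diff = y - q_hat
    return q * diff if diff >= 0 else (q - 1.0) * diff


# Illustrative values only: threshold at the empirical 0.95 quantile.
q99 = gpd_tail_quantile(0.99, threshold=10.0, scale=2.0, shape=0.2, exceed_prob=0.05)
q999 = gpd_tail_quantile(0.999, threshold=10.0, scale=2.0, shape=0.2, exceed_prob=0.05)
print(q99, q999)
```

A positive `shape` gives a heavier tail, so the extreme quantiles grow faster as `q` approaches 1 than in the exponential case.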

2020

AutoML for Stream k-Nearest Neighbors Classification

Authors
Bahri, M; Veloso, B; Bifet, A; Gama, J;

Publication
2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)

Abstract
The last few decades have witnessed a significant evolution of technology in different domains, changing the way the world operates, which leads to an overwhelming amount of data generated in an open-ended way as streams. Over the past years, we observed the development of several machine learning algorithms to process big data streams. However, the accuracy of these algorithms is very sensitive to their hyper-parameters, which require expertise and extensive trials to tune. Another relevant aspect is the high dimensionality of data, which can degrade computational performance. To cope with these issues, this paper proposes a stream k-nearest neighbors (kNN) algorithm that applies an internal dimension reduction to the stream in order to reduce resource usage and uses an automatic monitoring system that dynamically tunes the configuration of the kNN algorithm and the output dimension size with big data streams. Experiments over a wide range of datasets show that the predictive and computational performances of the kNN algorithm are improved.
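As background for the abstract above, a stream kNN classifier is often built on a bounded sliding window of recent examples. The sketch below shows only that baseline idea; it does not include the dimension reduction or the automatic hyper-parameter tuning that the paper proposes. The class `SlidingWindowKNN`, its parameters, and the toy data are hypothetical and chosen for illustration.

```python
import math
from collections import Counter, deque


class SlidingWindowKNN:
    """Hypothetical baseline: kNN over the most recent `window_size` examples."""

    def __init__(self, k=3, window_size=100):
        self.k = k
        # A bounded deque drops the oldest example automatically when full.
        self.window = deque(maxlen=window_size)

    def learn_one(self, x, y):
        self.window.append((tuple(x), y))

    def predict_one(self, x):
        if not self.window:
            return None
        # Sort stored examples by Euclidean distance and keep the k closest.
        nearest = sorted(self.window, key=lambda item: math.dist(item[0], x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Toy stream, made up for the example: early points are "low", later ones "high".
data = [
    ((0, 0), "low"), ((0, 1), "low"), ((1, 0), "low"),
    ((5, 5), "high"), ((5, 6), "high"), ((6, 5), "high"),
]

large = SlidingWindowKNN(k=3, window_size=100)
small = SlidingWindowKNN(k=3, window_size=4)
for x, y in data:
    large.learn_one(x, y)
    small.learn_one(x, y)

print(large.predict_one((0.5, 0.5)))
print(small.predict_one((0.5, 0.5)))
```

The two models disagree on the same query because the small window has already forgotten most of the early examples. This sensitivity to `k` and `window_size` is the kind of hyper-parameter dependence that motivates automatic tuning.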
