2020
Authors
Bahri, M; Veloso, B; Bifet, A; Gama, J;
Publication
2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)
Abstract
The last few decades have witnessed a significant evolution of technology in different domains, changing the way the world operates and leading to an overwhelming amount of data generated in an open-ended way as streams. In recent years, several machine learning algorithms have been developed to process big data streams. However, the accuracy of these algorithms is very sensitive to their hyper-parameters, which require expertise and extensive trials to tune. Another relevant aspect is the high dimensionality of data, which can degrade computational performance. To cope with these issues, this paper proposes a stream k-nearest neighbors (kNN) algorithm that applies an internal dimension reduction to the stream in order to reduce resource usage, and uses an automatic monitoring system that dynamically tunes the configuration of the kNN algorithm and the output dimension size on big data streams. Experiments over a wide range of datasets show that the predictive and computational performance of the kNN algorithm is improved.
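The abstract combines three ingredients: a kNN classifier over a stream, an internal dimension reduction, and automatic tuning. The Python sketch below illustrates only the first two, under assumptions the abstract does not state: a Gaussian random projection as the reducer, a fixed-size sliding window as the kNN memory, and majority voting over the k nearest projected instances. The class name `ProjectedStreamKNN` and its parameters are hypothetical, and the automatic monitoring system that tunes k and the output dimension is not modelled.

```python
import math
import random
from collections import Counter, deque


class ProjectedStreamKNN:
    """Sliding-window kNN over randomly projected instances.

    Hypothetical sketch: the Gaussian random projection, window size and k
    are illustrative assumptions, not the paper's actual configuration.
    """

    def __init__(self, in_dim, out_dim, k=3, window=100, seed=0):
        rng = random.Random(seed)
        # Gaussian random projection matrix (out_dim x in_dim), scaled by 1/sqrt(out_dim)
        scale = 1.0 / math.sqrt(out_dim)
        self.proj = [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
                     for _ in range(out_dim)]
        self.k = k
        # Only the most recent `window` (projected instance, label) pairs are kept
        self.buffer = deque(maxlen=window)

    def _project(self, x):
        # Reduce x from in_dim to out_dim dimensions
        return [sum(w * v for w, v in zip(row, x)) for row in self.proj]

    def predict(self, x):
        if not self.buffer:
            return None  # nothing learned yet
        z = self._project(x)
        # The k stored instances closest to z in the reduced space
        nearest = sorted(self.buffer, key=lambda item: math.dist(item[0], z))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

    def learn(self, x, y):
        # Store the projected instance; the oldest one is evicted when the window is full
        self.buffer.append((self._project(x), y))
```

In a prequential (test-then-train) loop, each arriving instance would first be passed to `predict` and then to `learn`. Storing only projected vectors is what saves memory and distance computations here; a tuner like the one described in the abstract would additionally adjust `k` and `out_dim` as the stream evolves.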
2021
Authors
Nogueira, AR; Gama, J; Ferreira, CA;
Publication
JOURNAL OF DYNAMICS AND GAMES
Abstract
Determining the cause of a particular event has been a subject of study for many researchers over the years. Finding out why an event happens (its cause) means that, for example, if we remove the cause from the equation, we can prevent the effect from happening, and if we replicate it, we can reproduce the effect. Causality can be seen as a means of predicting the future based on information about past events and, with that, of preventing or altering future outcomes. This temporal notion of past and future is often one of the critical points in discovering the causes of a given event. The purpose of this survey is to present a cross-sectional view of the causal discovery domain, with an emphasis on the machine learning/data mining area.
2021
Authors
Veloso, B; Gama, J; Malheiro, B; Vinagre, J;
Publication
INFORMATION FUSION
Abstract
The number of Internet of Things devices generating data streams is expected to grow exponentially with the support of emergent technologies such as 5G networks. Therefore, the online processing of these data streams requires the design and development of suitable machine learning algorithms, able to learn online, as data is generated. Like their batch-learning counterparts, stream-based learning algorithms require careful hyperparameter settings. However, this problem is exacerbated in online learning settings, especially with the occurrence of concept drifts, which frequently require the reconfiguration of hyperparameters. In this article, we present SSPT, an extension of the Self Parameter Tuning (SPT) optimisation algorithm for data streams. We apply the Nelder-Mead algorithm to dynamically-sized samples, converging to optimal settings in a single pass over data while using a relatively small number of hyperparameter configurations. In addition, our proposal automatically readjusts hyperparameters when concept drift occurs. To assess the effectiveness of SSPT, the algorithm is evaluated with three different machine learning problems: recommendation, regression, and classification. Experiments with well-known data sets show that the proposed algorithm can outperform previous hyperparameter tuning efforts by human experts. Results also show that SSPT converges significantly faster and presents at least similar accuracy when compared with the previous double-pass version of the SPT algorithm.
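SSPT builds on the Nelder-Mead simplex method. As a rough illustration of that building block only, here is a plain Python Nelder-Mead minimiser in which each simplex vertex stands for a hyperparameter configuration and `f` returns its error. The function names, the coefficients (alpha, gamma, rho, sigma, set to common textbook values) and the toy objective are assumptions; SSPT's dynamically sized samples, single-pass evaluation and drift-triggered readjustment are not reproduced.

```python
from typing import Callable, List

Point = List[float]


def nelder_mead_step(simplex: List[Point], f: Callable[[Point], float],
                     alpha: float = 1.0, gamma: float = 2.0,
                     rho: float = 0.5, sigma: float = 0.5) -> List[Point]:
    """Apply one Nelder-Mead iteration and return the simplex sorted best-first."""
    simplex = sorted(simplex, key=f)
    best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
    # Centroid of every vertex except the worst one
    others = simplex[:-1]
    centroid = [sum(p[i] for p in others) / len(others) for i in range(len(best))]

    # Reflection: mirror the worst vertex through the centroid
    reflected = [c + alpha * (c - w) for c, w in zip(centroid, worst)]
    f_r = f(reflected)
    if f(best) <= f_r < f(second_worst):
        simplex[-1] = reflected
    elif f_r < f(best):
        # Expansion: push further along the promising direction
        expanded = [c + gamma * (r - c) for c, r in zip(centroid, reflected)]
        simplex[-1] = expanded if f(expanded) < f_r else reflected
    else:
        # Inside contraction: move the worst vertex towards the centroid
        contracted = [c + rho * (w - c) for c, w in zip(centroid, worst)]
        if f(contracted) < f(worst):
            simplex[-1] = contracted
        else:
            # Shrink: pull every vertex towards the best one by a factor sigma
            simplex = [best] + [[b + sigma * (x - b) for b, x in zip(best, p)]
                                for p in simplex[1:]]
    return sorted(simplex, key=f)


def minimise(f: Callable[[Point], float], start: Point,
             step: float = 1.0, iterations: int = 200) -> Point:
    """Build an initial simplex around `start` and iterate Nelder-Mead."""
    simplex = [list(start)]
    for i in range(len(start)):
        vertex = list(start)
        vertex[i] += step  # offset one coordinate per extra vertex
        simplex.append(vertex)
    for _ in range(iterations):
        simplex = nelder_mead_step(simplex, f)
    return simplex[0]
```

For example, `minimise(lambda p: (p[0] - 0.3) ** 2 + (p[1] - 2.0) ** 2, [1.0, 1.0], step=0.5)` returns a point close to `[0.3, 2.0]`. In a streaming tuner the objective would instead be measured on incoming data, so each evaluation of `f` is costly and noisy; that is why the number of configurations evaluated matters, as the abstract emphasises.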
2021
Authors
Abreu, PH; Rodrigues, PP; Fernández, A; Gama, J;
Publication
IDA
Abstract
2020
Authors
Paraíso, P; Ruiz, S; Gomes, P; Rodrigues, L; Gama, J;
Publication
2020 IEEE 7TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2020)
Abstract
This paper uses non-traditional data from a microfinance institution (MFI) in a credit scoring loan classification problem and addresses a common problem in emerging markets: the lack of a verifiable credit history for customers. Using a real-world dataset, we perform a set of experiments to define a baseline model and to demonstrate the relevance of node embedding features in credit scoring models.
2021
Authors
Paulos, JP; Fidalgo, JN; Gama, J;
Publication
2021 IEEE MADRID POWERTECH
Abstract
The present work aims to compare several load disaggregation methods. While the supervised alternative was found to be the most effective, the semi-supervised alternative proves close in terms of potential, whereas the unsupervised alternative seems insufficient. Likewise, tests with long-duration data help confirm the long-term performance, since no significant loss of performance is observed as the time horizon is extended. Finally, the combination of new parametrization and methodological fine-tuning also proves useful for improving overall performance in several methods.