2024
Authors
Moya, AR; Veloso, B; Gama, J; Ventura, S;
Publication
DATA MINING AND KNOWLEDGE DISCOVERY
Abstract
Hyper-parameter tuning of machine learning models has become a crucial task in achieving optimal results in terms of performance. Several researchers have explored the optimisation task during the last decades to reach a state-of-the-art method. However, most of them focus on batch or offline learning, where data distributions do not change arbitrarily over time. On the other hand, dealing with data streams and online learning is a challenging problem. In fact, as technology advances, sophisticated techniques for processing these data streams become increasingly important. Thus, improving hyper-parameter self-tuning during online learning of these machine learning models is crucial. To this end, in this paper, we present MESSPT, an evolutionary algorithm for self-hyper-parameter tuning for data streams. We apply Differential Evolution to dynamically-sized samples, requiring a single pass over the data to train and evaluate models and choose the best configurations. We limit the number of configurations to be evaluated, which must necessarily be small, thus making this evolutionary approach a micro-evolutionary one. Furthermore, we control how our evolutionary algorithm deals with concept drift. Experiments on different learning tasks and over well-known datasets show that our proposed MESSPT outperforms the state-of-the-art on hyper-parameter tuning for data streams.
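To illustrate the Differential Evolution step the abstract mentions, here is a minimal sketch of one generation of the classic DE/rand/1/bin scheme over a very small population of configurations. This is not the MESSPT implementation. The function name `de_generation`, the parameters `f` and `cr`, the two-dimensional search space, and the `toy_loss` objective are all assumptions made for the example. In a streaming setting the fitness would presumably come from evaluating models on samples of the stream, and drift handling is not shown.

```python
import random


def de_generation(population, fitness, bounds, f=0.5, cr=0.9, rng=None):
    """Run one DE/rand/1/bin generation; lower fitness is better (assumed)."""
    rng = rng or random.Random(0)
    dims = len(bounds)
    next_population = []
    for i, target in enumerate(population):
        # Pick three distinct members other than the target.
        others = [p for j, p in enumerate(population) if j != i]
        a, b, c = rng.sample(others, 3)
        forced = rng.randrange(dims)  # at least one gene comes from the mutant
        trial = []
        for d in range(dims):
            if d == forced or rng.random() < cr:
                low, high = bounds[d]
                value = a[d] + f * (b[d] - c[d])
                value = min(max(value, low), high)  # clip to the search range
            else:
                value = target[d]
            trial.append(value)
        # Greedy selection: keep the trial only if it is not worse.
        keep_trial = fitness(trial) <= fitness(target)
        next_population.append(trial if keep_trial else target)
    return next_population


def toy_loss(config):
    """Hypothetical stand-in for a model's error under a configuration."""
    return (config[0] - 0.1) ** 2 + (config[1] - 5.0) ** 2


# Hypothetical search space: two numeric hyper-parameters.
bounds = [(0.0, 1.0), (1.0, 10.0)]
rng = random.Random(42)
# A "micro" population: four is the minimum DE/rand/1 needs.
population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(4)]
new_population = de_generation(population, toy_loss, bounds, rng=rng)
```

Because selection is greedy, no configuration in `new_population` has a worse loss than the one it replaced, so a small population can still improve monotonically from one generation to the next.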
2024
Authors
Salazar, T; Gama, J; Araújo, H; Abreu, PH;
Publication
CoRR
Abstract
2024
Authors
Gama, J; Ribeiro, RP; Mastelini, SM; Davari, N; Veloso, B;
Publication
CoRR
Abstract
2024
Authors
Andrade, T; Gama, J;
Publication
39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024
Abstract
Various relevant aspects of our lives relate to the places we visit and our daily activities. The movement of individuals between regular places, such as work, school, or other important personal locations, is getting increasing attention due to the pervasiveness of geolocation devices and the amount of data they generate. This work presents an approach for location prediction using a probabilistic model and data mining techniques over mobility data streams. We evaluate the method over 5 real-world datasets. The results show the usefulness of the proposal in comparison with other well-known approaches.
2024
Authors
Vieira, PC; Montrezol, JP; Vieira, JT; Gama, J;
Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XXII, PT II, IDA 2024
Abstract
We present S+t-SNE, an adaptation of the t-SNE algorithm designed to handle infinite data streams. The core idea behind S+t-SNE is to update the t-SNE embedding incrementally as new data arrives, ensuring scalability and adaptability to handle streaming scenarios. By selecting the most important points at each step, the algorithm ensures scalability while keeping informative visualisations. By employing a blind method for drift management, the algorithm adjusts the embedding space, which facilitates the visualisation of evolving data dynamics. Our experimental evaluations demonstrate the effectiveness and efficiency of S+t-SNE, whilst highlighting its ability to capture patterns in a streaming scenario. We hope our approach offers researchers and practitioners a real-time tool for understanding and interpreting high-dimensional data.
2024
Authors
Ukil, A; Majumdar, A; Jara, AJ; Gama, J;
Publication
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024
Abstract
Deep neural networks (DNNs) used to analyze images, videos, signals, and texts require a lot of memory and intensive computing power. For example, the very successful GPT4 model contains more than a few trillion parameters. Although such models have great impact, they have been used very little in real-world applications, including the industrial Internet of Things, self-driving cars, and algorithmic health monitoring on limited mobile or edge devices. The requirement to run large models on resource-constrained peripherals has led to significant research interest in compressing DNN models. Signal processing researchers have traditionally advocated data (image/video/audio) compression, and, incidentally, many of these techniques are used for DNN compression. For example, source coding is a basic technique that has been widely used to compress various DNN models. In this paper, we present our views on the use of signal processing methods for DNN model compression.
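As a small illustration of source coding applied to model weights, the sketch below computes Huffman code lengths for a list of quantized weight values and compares the total with a fixed-length encoding. Huffman coding is one common source-coding technique, chosen here as an assumption; the abstract does not say which methods the paper covers. The function name `huffman_code_lengths` and the weight values are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return a dict mapping each symbol to its Huffman code length in bits."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries: (frequency, unique tiebreak, symbols in this subtree).
    heap = [(count, index, [symbol])
            for index, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tiebreak = len(heap)
    while len(heap) > 1:
        count_a, _, symbols_a = heapq.heappop(heap)
        count_b, _, symbols_b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for symbol in symbols_a + symbols_b:
            lengths[symbol] += 1
        heapq.heappush(heap, (count_a + count_b, tiebreak, symbols_a + symbols_b))
        tiebreak += 1
    return lengths


# Hypothetical weights after quantization to four levels; zeros dominate.
quantized_weights = [0] * 8 + [1] * 4 + [-1] * 2 + [2] * 2
lengths = huffman_code_lengths(quantized_weights)
huffman_bits = sum(lengths[w] for w in quantized_weights)
fixed_bits = 2 * len(quantized_weights)  # 2 bits suffice for four levels
print(lengths)                   # {0: 1, 1: 2, -1: 3, 2: 3}
print(huffman_bits, fixed_bits)  # 28 32
```

The saving comes entirely from the skewed distribution of values: the more often a quantized weight occurs, the shorter its code. With uniformly distributed levels the Huffman code would give no gain over the fixed-length one.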