2022
Authors
Santos, MS; Abreu, PH; Japkowicz, N; Fernandez, A; Soares, C; Wilk, S; Santos, J;
Publication
ARTIFICIAL INTELLIGENCE REVIEW
Abstract
Current research on imbalanced data recognises that class imbalance is aggravated by other data intrinsic characteristics, among which class overlap stands out as one of the most harmful. The combination of these two problems creates a new and difficult scenario for classification tasks and has been discussed in several research works over the past two decades. In this paper, we argue that although some insightful information can be derived from related research, the joint effect of class overlap and imbalance is still not fully understood, and we advocate for the need to move towards a unified view of the class overlap problem in imbalanced domains. To that end, we start by performing a thorough analysis of existing literature on the joint effect of class imbalance and overlap, elaborating on important details left undiscussed in the original papers, namely the impact of data domains with different characteristics and the behaviour of classifiers with distinct learning biases. This leads to the hypothesis that class overlap comprises multiple representations, which are important to accurately measure and analyse in order to provide a full characterisation of the problem. Accordingly, we devise two novel taxonomies, one for class overlap measures and the other for class overlap-based approaches, both resonating with the distinct representations of class overlap identified. This paper therefore presents a global and unique view on the joint effect of class imbalance and overlap, from precursor work to recent developments in the field. It meticulously discusses some concepts taken as implicit in previous research, explores new perspectives in light of the limitations found, and presents new ideas that will hopefully inspire researchers to move towards a unified view on the problem and the development of suitable strategies for imbalanced and overlapped domains.
2022
Authors
Hetlerovic, D; Popelínský, L; Brazdil, P; Soares, C; Freitas, F;
Publication
Advances in Intelligent Data Analysis XX - 20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, April 20-22, 2022, Proceedings
Abstract
2022
Authors
Strecht, P; Mendes Moreira, J; Soares, C;
Publication
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2022, PT II
Abstract
Density estimation is an important tool for data analysis. Non-parametric approaches have a reputation for offering state-of-the-art density estimates, but only in a few dimensions. Despite providing less accurate density estimates, histogram-based approaches remain the only alternative for datasets in high-dimensional spaces. In this paper, we present a multivariate histogram approach that estimates the density of a dataset without restrictions on the number of dimensions, handles both numerical and categorical variables (without numerical encoding), and allows missing data (without the need to preprocess them). Results from the empirical evaluation show that it is possible to estimate the density of datasets without restrictions on dimensionality, and that the method is robust to missing values and categorical variables.
2022
Authors
Cerqueira, V; Torgo, L; Soares, C;
Publication
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
Abstract
Time series forecasting is one of the most active research topics. Machine learning methods have been increasingly adopted to solve these predictive tasks. However, recent work presented evidence that these approaches systematically show lower predictive performance than simple statistical methods. In this work, we counter these results. We show that they are valid only under an extremely low sample size. Using a learning curve method, our results suggest that machine learning methods improve their relative predictive performance as the sample size grows. The R code to reproduce all of our experiments is available at https://github.com/vcerqueira/MLforForecasting.
2022
Authors
Brazdil, P; van Rijn, JN; Soares, C; Vanschoren, J;
Publication
Cognitive Technologies
Abstract
2022
Authors
Baghcheband, H; Soares, C; Reis, LP;
Publication
2022 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WI-IAT
Abstract
The amount of data produced by distributed devices, such as smart devices and the IoT, is increasing continuously. The cost of transmitting data and distributed computing power raise interest in distributed data mining (DDM). However, in a pure DDM scenario, data availability may not be enough to generate reliable models in a distributed environment. The ability to exchange data efficiently and effectively will therefore become a crucial component of DDM. In this paper, we propose the concept of the Machine Learning Data Market (MLDM), a framework for the exchange of data among autonomous agents. We consider a set of learning agents in a cooperative distributed ML setting, where agents negotiate data to improve the models they use locally. In the proposed data market, the system's predictive accuracy is investigated, as well as the economic value of data. The question addressed in this paper is: how will data exchange among the agents improve the accuracy of the learning model? Agent budget is defined as a constraint on negotiation. We defined a multi-agent system with negotiation and assessed it against the multi-agent system baseline and the single-agent system. The proposed framework is analyzed for different sizes of batch data collected over time, to find out how batch size changes the effect of negotiation on the accuracy of the model. The results indicate that even simple negotiation among agents increases their learning accuracy.