Publications

Publications by AI

2021

Generalised Partial Association in Causal Rules Discovery

Authors
Nogueira, AR; Ferreira, C; Gama, J; Pinto, A;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2021)

Abstract
One of the most significant challenges for machine learning nowadays is the discovery of causal relationships from data. This causal discovery is commonly performed using Bayesian-like algorithms. More recently, however, a growing number of causal discovery algorithms have appeared that do not fall into this category. In this paper, we present a new algorithm that explores global causal association rules using the Uncertainty Coefficient. Our algorithm, CRPA-UC, is a global structure discovery approach that combines the advantages of association mining with causal discovery and can be applied to binary and non-binary discrete data. This approach was compared to the PC algorithm on several well-known data sets, using several metrics.
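For readers unfamiliar with the Uncertainty Coefficient mentioned in this abstract, the sketch below computes the standard textbook definition (Theil's U) for two discrete variables, U(X|Y) = (H(X) - H(X|Y)) / H(X). It is only an illustration of the measure and is not the CRPA-UC algorithm; the function names are hypothetical and the handling of a constant X is an assumed convention.

```python
from collections import Counter
from math import log


def entropy(values):
    """Shannon entropy H(X) of a sequence of discrete values (natural log)."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """Conditional entropy H(X | Y) estimated from paired observations."""
    n = len(x)
    joint = Counter(zip(x, y))
    y_counts = Counter(y)
    # p(x, y) * log(p(x, y) / p(y)), with p(x, y) / p(y) = count(x, y) / count(y)
    return -sum((c / n) * log(c / y_counts[yv]) for (_, yv), c in joint.items())


def uncertainty_coefficient(x, y):
    """Fraction of the uncertainty in X that is explained by knowing Y."""
    hx = entropy(x)
    if hx == 0:
        # Assumed convention: a constant X has no uncertainty left to explain.
        return 1.0
    return (hx - conditional_entropy(x, y)) / hx
```

The measure is asymmetric: `uncertainty_coefficient(x, y)` and `uncertainty_coefficient(y, x)` generally differ, which is one reason it is of interest when the direction of a relationship matters.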


2021

Chebyshev approaches for imbalanced data streams regression models

Authors
Aminian, E; Ribeiro, RP; Gama, J;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
In recent years, data stream mining and learning from imbalanced data have been active research areas. Even though solutions exist to tackle these two problems, most of them are not designed to handle the challenges inherited from both. As far as we are aware, the few approaches in the area of learning from imbalanced data streams fall in the context of classification, and no efforts in the regression domain have been reported yet. This paper proposes a technique that uses sampling strategies to cope with imbalanced data streams in a regression setting, where the most important cases have rare and extreme target values. Specifically, we employ under-sampling and over-sampling strategies that use the value given by Chebyshev's inequality as a heuristic to determine the type of each incoming case (i.e. frequent or rare). We evaluated our proposal by applying it to the training of models with four well-known regression algorithms over fourteen benchmark data sets. We conducted a series of experiments with different setups on both synthetic and real-world data sets. The experimental results confirm our approach's effectiveness by showing that models trained with each of the sampling strategies outperform their baseline counterparts.
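The idea of using Chebyshev's inequality to separate frequent from rare cases can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the running statistics (Welford's algorithm), the keep probability of 1 minus the Chebyshev bound, and all names (`RunningStats`, `chebyshev_bound`, `under_sample`) are hypothetical choices made for this example.

```python
import random
from math import sqrt


class RunningStats:
    """Online mean and standard deviation of target values (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, y):
        self.n += 1
        delta = y - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (y - self.mean)

    @property
    def std(self):
        return sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def chebyshev_bound(y, mean, std):
    """Chebyshev upper bound on P(|Y - mean| >= |y - mean|), capped at 1."""
    if std == 0:
        return 1.0
    t = abs(y - mean) / std
    return 1.0 if t <= 1 else 1.0 / (t * t)


def under_sample(stream, rng):
    """Yield (x, y) pairs, keeping rare targets with higher probability.

    Assumption: a case is kept with probability 1 - bound, so targets within
    one standard deviation of the running mean are always dropped.
    """
    stats = RunningStats()
    for x, y in stream:
        stats.update(y)
        keep_prob = 1.0 - chebyshev_bound(y, stats.mean, stats.std)
        if rng.random() < keep_prob:
            yield x, y
```

A caller would pass an iterable of `(features, target)` pairs and a seeded `random.Random` instance, for example `list(under_sample(pairs, random.Random(0)))`. An over-sampling variant could instead replicate cases in proportion to their distance from the mean; that variant is not shown here.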


2021

Data stream analysis: Foundations, major tasks and tools

Authors
Bahri, M; Bifet, A; Gama, J; Gomes, HM; Maniu, S;

Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
The significant growth of interconnected Internet-of-Things (IoT) devices, the use of social networks, and the evolution of technology in different domains have led to a rise in the volume of data generated continuously from multiple systems. Valuable information can be derived from these evolving data streams by applying machine learning. In practice, several critical issues emerge when extracting useful knowledge from these potentially infinite data, mainly because of their evolving nature and high arrival rate, which make it impossible to store them entirely. In this work, we provide a comprehensive survey that discusses the research constraints and the current state of the art in this vibrant framework. Moreover, we present an updated overview of the latest contributions proposed in different stream mining tasks, particularly classification, regression, clustering, and frequent patterns. This article is categorized under: Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining; Fundamental Concepts of Data and Knowledge > Motivation and Emergence of Data Mining


2021

Modelling Voting Behaviour During a General Election Campaign Using Dynamic Bayesian Networks

Authors
Costa, P; Nogueira, AR; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2021)

Abstract
This work aims to develop a Machine Learning framework to predict voting behaviour. The data consist of variables collected longitudinally during the Portuguese 2019 general election campaign. Naive Bayes (NB), Tree Augmented Naive Bayes (TAN), and three different expert models using Dynamic Bayesian Networks (DBN) predict voting behaviour systematically at each moment in time considered, using past information. Even though the differences found in some performance comparisons are not statistically significant, TAN and NB outperformed the DBN experts' models. The learned models outperformed one of the experts' models when predicting abstention and two when predicting the right-wing parties' vote. Specifically, for the right-wing parties' vote, TAN and NB presented satisfactory accuracy, while the experts' models were below 50% in the third evaluation moment.


2021

Artificial intelligence, cyber-threats and Industry 4.0: challenges and opportunities

Authors
Bécue, A; Praça, I; Gama, J;

Publication
Artif. Intell. Rev.

Abstract

2021

Terrace Vineyards Detection from UAV Imagery Using Machine Learning: A Preliminary Approach

Authors
Figueiredo, N; Padua, L; Sousa, JJ; Sousa, A;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2021)

Abstract
The Alto Douro Wine Region is located in the northeast of Portugal and is classified by UNESCO as a World Heritage Site. Crossed by the winding Douro River, the region has been producing wines for over 2000 years, with the world-famous Porto wine standing out. The vineyards in that region are built in a territory marked by steep slopes and the near absence of flat land and water. The vineyards that cover the great slopes rise from the Douro River and form an immense terraced staircase. Together, these characteristics make the region well suited to exploring precision agriculture techniques. In this study, we present a preliminary approach to identifying terrace vineyards. This is a key enabling task towards the achievement of important goals such as production estimation and multi-temporal crop evaluation. The proposed methodology uses Convolutional Neural Networks (CNNs) to classify and segment the terrace vineyards, considering a high-resolution dataset acquired with remote sensing sensors mounted on unmanned aerial vehicles (UAVs).
