Publications - INESC TEC

Publications by João Gama

2013

Preface

Authors
Rodrigues, PP; Pechenizkiy, M; Gama, J; Correia, RC; Liu, J; Traina, A; Lucas, P; Soda, P

Publication
Proceedings of CBMS 2013 - 26th IEEE International Symposium on Computer-Based Medical Systems

2015

Very fast decision rules for classification in data streams

Authors
Kosina, P; Gama, J

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Data stream mining is the process of extracting knowledge structures from continuous, rapid data records. Many decision tasks can be formulated as stream mining problems and therefore many new algorithms for data streams are being proposed. Decision rules are one of the most interpretable and flexible models for predictive data mining. Nevertheless, few algorithms have been proposed in the literature to learn rule models for time-changing and high-speed flows of data. In this paper we present the very fast decision rules (VFDR) algorithm and discuss interesting extensions to the base version. All the proposed versions are one-pass and any-time algorithms. They work on-line and learn ordered or unordered rule sets. Algorithms designed to work with data streams should be able to detect changes and quickly adapt the decision model. In order to manage these situations we also present the adaptive extension (AVFDR) to detect changes in the process generating data and adapt the decision model. Detecting local drifts takes advantage of the modularity of the rule sets. In AVFDR, each individual rule monitors the evolution of performance metrics to detect concept drift. AVFDR prunes rules whenever a drift is signaled. This explicit change detection mechanism provides useful information about the dynamics of the process generating data, faster adaptation to changes and generates more compact rule sets. The experimental evaluation demonstrates that algorithms achieve competitive results in comparison to alternative methods and the adaptive methods are able to learn fast and compact rule sets from evolving streams.
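The per-rule monitoring described in this abstract can be illustrated with a minimal sketch. The abstract does not say which change detector AVFDR uses, so the test below is an assumption: it follows the style of the Drift Detection Method (DDM), which signals drift when the running error rate plus its standard deviation exceeds the best level seen so far by three standard deviations. The names `RuleErrorMonitor`, `first_drift`, and `min_samples` are hypothetical.

```python
import math


class RuleErrorMonitor:
    """Tracks one rule's running error rate and flags a DDM-style drift.

    Illustrative only: the detector choice is an assumption, not taken
    from the abstract.
    """

    def __init__(self, min_samples=30):
        self.min_samples = min_samples  # hypothetical warm-up length
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, is_error):
        """Record one prediction outcome; return True when drift is signaled."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n               # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)  # its standard deviation
        if self.n < self.min_samples:
            return False
        # Remember the lowest error level observed so far.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        return p + s > self.p_min + 3.0 * self.s_min


def first_drift(outcomes):
    """Return the 1-based position of the first drift signal, or None."""
    monitor = RuleErrorMonitor()
    for position, is_error in enumerate(outcomes, start=1):
        if monitor.update(is_error):
            return position
    return None


# A rule that errs on every 10th example, then on every example.
stable = [i % 10 == 0 for i in range(1, 101)]
drifting = stable + [True] * 50
```

In a rule-set learner, each rule would own one such monitor, and a rule whose monitor signals drift would be a candidate for pruning.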

2017

Fading histograms in detecting distribution and concept changes

Authors
Sebastião, R; Gama, J; Mendonça, T

Publication
Int. J. Data Sci. Anal.

Abstract
The remarkable number of real applications under dynamic scenarios is driving a novel ability to generate and gather information. Nowadays, a massive amount of information is generated at a high-speed rate, known as data streams. Moreover, data are collected under evolving environments. Due to memory restrictions, data must be promptly processed and discarded immediately. Therefore, dealing with evolving data streams raises two main questions: (i) how to remember discarded data? and (ii) how to forget outdated data? To maintain an updated representation of the time-evolving data, this paper proposes fading histograms. Regarding the dynamics of nature, changes in data are detected through a windowing scheme that compares data distributions computed by the fading histograms: the adaptive cumulative windows model (ACWM). The online monitoring of the distance between data distributions is evaluated using a dissimilarity measure based on the asymmetry of the Kullback–Leibler divergence. The experimental results support the ability of fading histograms to provide an updated representation of data. This property favors detecting distribution changes with a smaller detection delay than standard histograms. With respect to the detection of concept changes, the ACWM is compared with 3 known algorithms taken from the literature, using artificial data and public data sets, and presents better results. Furthermore, the proposed method was extended to multidimensional data, and the experiments performed show the ability of the ACWM to detect distribution changes in these settings.
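The idea of a fading histogram can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the fading factor `alpha`, the fixed bin `edges`, and the reading of "asymmetry of the Kullback–Leibler divergence" as the absolute difference between the two directed divergences are all assumptions made here.

```python
import math


def fading_histogram(values, edges, alpha):
    """Normalized bin frequencies where older observations decay by alpha.

    alpha = 1.0 gives a standard histogram. Bins are half-open
    [edges[i], edges[i+1]); values outside every bin are not counted.
    """
    counts = [0.0] * (len(edges) - 1)
    for x in values:
        # Fade all existing counts, then add the new observation to its bin.
        counts = [alpha * c for c in counts]
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1.0
                break
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def kl(p, q, eps=1e-9):
    """Directed KL divergence KL(p || q) with additive smoothing."""
    ps = [a + eps for a in p]
    qs = [b + eps for b in q]
    zp, zq = sum(ps), sum(qs)
    return sum((a / zp) * math.log((a / zp) / (b / zq)) for a, b in zip(ps, qs))


def kl_asymmetry(p, q):
    """Assumed dissimilarity: |KL(p || q) - KL(q || p)|."""
    return abs(kl(p, q) - kl(q, p))


# The stream moves from the lower bin to the upper bin halfway through.
stream = [0.2] * 50 + [0.8] * 50
edges = [0.0, 0.5, 1.0]
recent = fading_histogram(stream, edges, alpha=0.9)
standard = fading_histogram(stream, edges, alpha=1.0)
```

Here `recent` puts almost all of its mass on the upper bin, while `standard` still splits it evenly, which is the sense in which fading keeps the representation up to date.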

2014

Unsupervised density-based behavior change detection in data streams

Authors
Vallim, RMM; Andrade, JA; de Mello, RF; de Carvalho, ACPLF; Gama, J

Publication
INTELLIGENT DATA ANALYSIS

Abstract
The ability to detect changes in the data distribution is an important issue in Data Stream mining. Detecting changes in data distribution allows the adaptation of a previously learned model to accommodate the most recent data and, therefore, improve its prediction capability. This paper proposes a framework for non-supervised automatic change detection in Data Streams called M-DBScan. This framework is composed of a density-based clustering step followed by a novelty detection procedure based on entropy level measures. This work uses two different types of entropy measures, where one considers the spatial distribution of data while the other models temporal relations between observations in the stream. The performance of the method is assessed in a set of experiments comparing M-DBScan with a proximity-based approach. Experimental results provide important insight on how to design change detection mechanisms for streams.

2016

How to Correctly Evaluate an Automatic Bioacoustics Classification Method

Authors
Colonna, JG; Gama, J; Nakamura, EF

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, CAEPIA 2016

Abstract
In this work, we introduce a more appropriate (or alternative) approach to evaluate the performance and the generalization capabilities of a framework for automatic anuran call recognition. We show that using the common k-fold Cross-Validation (k-CV) procedure to estimate the expected error of a syllable-based recognition system overestimates the recognition accuracy. To overcome this problem, and to provide a fair evaluation, we propose a new CV procedure in which the specimen information is considered during the split step of the k-CV. Therefore, we performed a k-CV by specimens (or individuals), showing that the accuracy of the system decreases considerably. By introducing the specimen information, we are able to answer a more fundamental question: Given a set of syllables that belongs to a specific group of individuals, can we recognize new specimens of the same species? In this article, we go deeper into the reviews and the experimental evaluations to answer this question.
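A split by specimen, as opposed to a split by syllable, can be sketched as follows. This is an illustrative sketch only: the function name `specimen_folds` and the round-robin assignment of specimens to folds are assumptions, not the authors' procedure. The point it shows is that all syllables of one individual land on the same side of each split.

```python
def specimen_folds(specimen_ids, k):
    """Yield (train_indices, test_indices) with no specimen on both sides.

    specimen_ids[i] is the individual that produced syllable i.
    Specimens are assigned to folds round-robin (an assumption).
    """
    unique = sorted(set(specimen_ids))
    fold_of = {s: i % k for i, s in enumerate(unique)}
    for fold in range(k):
        test = [i for i, s in enumerate(specimen_ids) if fold_of[s] == fold]
        train = [i for i, s in enumerate(specimen_ids) if fold_of[s] != fold]
        yield train, test


# Eight syllables recorded from four individuals.
ids = ["a", "a", "b", "b", "c", "c", "c", "d"]
splits = list(specimen_folds(ids, k=2))
```

With a plain k-fold split over syllables, syllables from the same individual can appear in both training and test sets, which is the leakage the abstract identifies as the source of overestimated accuracy.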

2016

Measures for Combining Prediction Intervals Uncertainty and Reliability in Forecasting

Authors
Almeida, V; Gama, J

Publication
PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON COMPUTER RECOGNITION SYSTEMS, CORES 2015

Abstract
In this paper we propose a new methodology for evaluating prediction intervals (PIs). Typically, PIs are evaluated with reference to confidence values. However, other metrics should be considered, since high values are associated with overly wide intervals that convey little information and are of no use for decision-making. We propose to compare the error distribution (predictions out of the interval) and the maximum mean absolute error (MAE) allowed by the confidence limits. Throughout this paper, PIs based on neural networks for short-term load forecasting are compared using two different strategies: (1) dual perturb and combine (DPC) algorithm and (2) conformal prediction. We demonstrate that, depending on the real scenario (e.g., time of day), different algorithms perform better. The main contribution is the identification of high uncertainty levels in forecasts that can guide the decision-makers to avoid the selection of risky actions under uncertain conditions. Small errors mean that decisions can be made more confidently with less chance of confronting a future unexpected condition.
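The trade-off the abstract describes, between how often an interval contains the true value and how wide it is, can be made concrete with a small sketch. The quantities below (coverage, mean width, and mean distance of out-of-interval values to the nearest bound) are generic summaries chosen for illustration. They are not the measures proposed in the paper, and the name `pi_summary` is hypothetical.

```python
def pi_summary(actuals, lowers, uppers):
    """Summarize prediction intervals against observed values.

    Returns coverage (share of actuals inside their interval), mean
    interval width, and the mean distance of out-of-interval actuals
    to the nearest bound (0.0 when nothing falls outside).
    """
    inside = 0
    widths = []
    misses = []
    for y, lo, hi in zip(actuals, lowers, uppers):
        widths.append(hi - lo)
        if lo <= y <= hi:
            inside += 1
        else:
            misses.append(lo - y if y < lo else y - hi)
    n = len(widths)
    return {
        "coverage": inside / n,
        "mean_width": sum(widths) / n,
        "mean_miss": sum(misses) / len(misses) if misses else 0.0,
    }


# Four hypothetical load observations with their predicted intervals.
summary = pi_summary(
    actuals=[10, 12, 15, 20],
    lowers=[9, 11, 16, 18],
    uppers=[11, 13, 18, 19],
)
```

Widening every interval would raise coverage but also raise `mean_width`, which is why coverage alone says little about how useful the intervals are for decisions.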
