2013
Authors
Rodrigues, PP; Pechenizkiy, M; Gama, J; Correia, RC; Liu, J; Traina, A; Lucas, P; Soda, P;
Publication
Proceedings of CBMS 2013 - 26th IEEE International Symposium on Computer-Based Medical Systems
Abstract
2015
Authors
Kosina, P; Gama, J;
Publication
DATA MINING AND KNOWLEDGE DISCOVERY
Abstract
Data stream mining is the process of extracting knowledge structures from continuous, rapid data records. Many decision tasks can be formulated as stream mining problems and therefore many new algorithms for data streams are being proposed. Decision rules are one of the most interpretable and flexible models for predictive data mining. Nevertheless, few algorithms have been proposed in the literature to learn rule models for time-changing and high-speed flows of data. In this paper we present the very fast decision rules (VFDR) algorithm and discuss interesting extensions to the base version. All the proposed versions are one-pass and any-time algorithms. They work on-line and learn ordered or unordered rule sets. Algorithms designed to work with data streams should be able to detect changes and quickly adapt the decision model. In order to manage these situations we also present the adaptive extension (AVFDR) to detect changes in the process generating data and adapt the decision model. Detecting local drifts takes advantage of the modularity of the rule sets. In AVFDR, each individual rule monitors the evolution of performance metrics to detect concept drift. AVFDR prunes rules whenever a drift is signaled. This explicit change detection mechanism provides useful information about the dynamics of the process generating data, faster adaptation to changes and generates more compact rule sets. The experimental evaluation demonstrates that algorithms achieve competitive results in comparison to alternative methods and the adaptive methods are able to learn fast and compact rule sets from evolving streams.
2017
Authors
Sebastião, R; Gama, J; Mendonça, T;
Publication
Int. J. Data Sci. Anal.
Abstract
The remarkable number of real applications under dynamic scenarios is driving a novel ability to generate and gather information. Nowadays, a massive amount of information is generated at a high-speed rate, known as data streams. Moreover, data are collected under evolving environments. Due to memory restrictions, data must be promptly processed and discarded immediately. Therefore, dealing with evolving data streams raises two main questions: (i) how to remember discarded data? and (ii) how to forget outdated data? To maintain an updated representation of the time-evolving data, this paper proposes fading histograms. Regarding the dynamics of nature, changes in data are detected through a windowing scheme that compares data distributions computed by the fading histograms: the adaptive cumulative windows model (ACWM). The online monitoring of the distance between data distributions is evaluated using a dissimilarity measure based on the asymmetry of the Kullback–Leibler divergence. The experimental results support the ability of fading histograms in providing an updated representation of data. Such property works in favor of detecting distribution changes with smaller detection delay time when compared with standard histograms. With respect to the detection of concept changes, the ACWM is compared with 3 known algorithms taken from the literature, using artificial data and using public data sets, presenting better results. Furthermore, the proposed method was extended for multidimensional data, and the experiments performed show the ability of the ACWM for detecting distribution changes in these settings.
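As a rough illustration of the idea in this abstract, the sketch below keeps a fixed-bin histogram whose counts decay on every update and compares two such histograms with a Kullback–Leibler-based dissimilarity. It is only a sketch under stated assumptions: the class name `FadingHistogram`, the fading factor `alpha`, the smoothing constant `eps`, and the choice of the larger of the two directed divergences as the dissimilarity are all illustrative and are not taken from the paper. The ACWM windowing scheme itself is not reproduced.

```python
import math


class FadingHistogram:
    """Fixed-bin histogram whose old counts are multiplied by alpha on each update.

    Hypothetical helper for illustration; alpha in (0, 1] controls how fast
    outdated observations are forgotten (alpha = 1 gives a standard histogram).
    """

    def __init__(self, lo, hi, n_bins, alpha=0.99):
        self.lo, self.hi, self.n_bins, self.alpha = lo, hi, n_bins, alpha
        self.counts = [0.0] * n_bins

    def update(self, x):
        # Fade every existing count, then add the new observation with weight 1.
        self.counts = [c * self.alpha for c in self.counts]
        idx = int((x - self.lo) / (self.hi - self.lo) * self.n_bins)
        idx = min(max(idx, 0), self.n_bins - 1)  # clamp out-of-range values
        self.counts[idx] += 1.0

    def probabilities(self, eps=1e-9):
        # Additive smoothing (an assumption) keeps every bin strictly positive,
        # so the logarithms in the divergence below are always defined.
        total = sum(self.counts) + eps * self.n_bins
        return [(c + eps) / total for c in self.counts]


def kl(p, q):
    """Directed Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def kl_dissimilarity(p, q):
    """Assumed dissimilarity: the larger of the two directed divergences."""
    return max(kl(p, q), kl(q, p))


# Usage: a reference histogram and a current one fed with shifted data.
reference = FadingHistogram(0.0, 1.0, n_bins=4, alpha=0.9)
current = FadingHistogram(0.0, 1.0, n_bins=4, alpha=0.9)
for x in [0.1, 0.2, 0.15, 0.3]:
    reference.update(x)
for x in [0.7, 0.8, 0.9, 0.85]:
    current.update(x)
distance = kl_dissimilarity(reference.probabilities(), current.probabilities())
```

A change detector would compare `distance` with a threshold; how that threshold is chosen is not stated in the abstract and is left open here.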
2014
Authors
Vallim, RMM; Andrade, JA; de Mello, RF; de Carvalho, ACPLF; Gama, J;
Publication
INTELLIGENT DATA ANALYSIS
Abstract
The ability to detect changes in the data distribution is an important issue in Data Stream mining. Detecting changes in data distribution allows the adaptation of a previously learned model to accommodate the most recent data and, therefore, improve its prediction capability. This paper proposes a framework for non-supervised automatic change detection in Data Streams called M-DBScan. This framework is composed of a density-based clustering step followed by a novelty detection procedure based on entropy level measures. This work uses two different types of entropy measures, where one considers the spatial distribution of data while the other models temporal relations between observations in the stream. The performance of the method is assessed in a set of experiments comparing M-DBScan with a proximity-based approach. Experimental results provide important insight on how to design change detection mechanisms for streams.
2016
Authors
Colonna, JG; Gama, J; Nakamura, EF;
Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, CAEPIA 2016
Abstract
In this work, we introduce an alternative, more appropriate approach to evaluating the performance and generalization capabilities of a framework for automatic anuran call recognition. We show that using the common k-fold Cross-Validation (k-CV) procedure to estimate the expected error of a syllable-based recognition system overestimates the recognition accuracy. To overcome this problem and provide a fair evaluation, we propose a new CV procedure in which the specimen information is considered during the split step of the k-CV. We therefore performed a k-CV by specimens (or individuals), showing that the accuracy of the system decreases considerably. By introducing the specimen information, we are able to answer a more fundamental question: given a set of syllables that belong to a specific group of individuals, can we recognize new specimens of the same species? In this article, we go deeper into the reviews and the experimental evaluations to answer this question.
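The split-by-specimen idea can be sketched as follows: all syllables from one individual go to the same fold, so no specimen contributes to both training and testing. This is a minimal sketch, not the authors' procedure; the function name `specimen_folds` and the round-robin assignment of specimens to folds are assumptions made for illustration.

```python
def specimen_folds(specimen_ids, k):
    """Yield (train_idx, test_idx) pairs in which no specimen appears on both sides.

    specimen_ids[i] is the individual that produced syllable i (hypothetical input
    layout). Specimens are assigned to folds round-robin, which is an assumption.
    """
    specimens = sorted(set(specimen_ids))
    if k > len(specimens):
        raise ValueError("k cannot exceed the number of distinct specimens")
    fold_of = {s: i % k for i, s in enumerate(specimens)}
    for fold in range(k):
        test = [i for i, s in enumerate(specimen_ids) if fold_of[s] == fold]
        train = [i for i, s in enumerate(specimen_ids) if fold_of[s] != fold]
        yield train, test


# Usage: seven syllables recorded from three individuals.
ids = ["frog_a", "frog_a", "frog_b", "frog_b", "frog_c", "frog_c", "frog_c"]
splits = list(specimen_folds(ids, k=3))
```

With an ordinary k-fold split, syllables from `frog_a` could land in both the training and the test set, which is the leakage the abstract identifies as the cause of overestimated accuracy.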
2016
Authors
Almeida, V; Gama, J;
Publication
PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON COMPUTER RECOGNITION SYSTEMS, CORES 2015
Abstract
In this paper we propose a new methodology for evaluating prediction intervals (PIs). Typically, PIs are evaluated with reference to confidence values. However, other metrics should be considered, since high confidence values are associated with overly wide intervals that convey little information and are of no use for decision-making. We propose to compare the error distribution (predictions out of the interval) and the maximum mean absolute error (MAE) allowed by the confidence limits. Throughout this paper, PIs based on neural networks for short-term load forecasting are compared using two different strategies: (1) the dual perturb and combine (DPC) algorithm and (2) conformal prediction. We demonstrate that, depending on the real scenario (e.g., time of day), different algorithms perform better. The main contribution is the identification of high uncertainty levels in forecasts, which can guide decision-makers to avoid selecting risky actions under uncertain conditions. Small errors mean that decisions can be made more confidently, with less chance of confronting a future unexpected condition.
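To make the trade-off between coverage and interval width concrete, the sketch below computes three common interval summaries: the fraction of actual values inside the interval, the mean interval width, and the mean distance by which missed values fall outside. These are generic summaries chosen for illustration; the function name `interval_summary` is hypothetical and the code does not reproduce the exact metric proposed in the paper.

```python
def interval_summary(actual, lower, upper):
    """Return (coverage, mean_width, mean_miss) for a set of prediction intervals.

    coverage   : share of actual values that fall inside [lower, upper]
    mean_width : average interval width (wide intervals carry little information)
    mean_miss  : average distance to the nearest bound for values outside the interval
    """
    n = len(actual)
    rows = list(zip(actual, lower, upper))
    coverage = sum(lo <= y <= hi for y, lo, hi in rows) / n
    mean_width = sum(hi - lo for _, lo, hi in rows) / n
    misses = [max(lo - y, y - hi) for y, lo, hi in rows if not lo <= y <= hi]
    mean_miss = sum(misses) / len(misses) if misses else 0.0
    return coverage, mean_width, mean_miss


# Usage with made-up load values and bounds (illustrative numbers only).
coverage, mean_width, mean_miss = interval_summary(
    actual=[1.0, 5.0, 10.0],
    lower=[0.0, 6.0, 8.0],
    upper=[2.0, 8.0, 9.0],
)
```

Reporting width and miss distance alongside coverage shows when a high coverage figure is simply the result of very wide intervals.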