2009
Authors
Rodrigues, PP; Gama, J;
Publication
INTELLIGENT DATA ANALYSIS
Abstract
Sensors distributed throughout electrical-power distribution networks produce streams of data at high speed. From a data mining perspective, this sensor network problem is characterized by a large number of variables (sensors) producing a continuous flow of data in a dynamic, non-stationary environment. Companies make decisions to buy or sell energy based on load profiles and forecasts. In this work we analyze the most relevant data mining problems and issues: continuously learning clusters and predictive models, model adaptation in large domains, and change detection and adaptation. The goal is to continuously maintain a clustering model, which defines profiles, and a predictive model able to incorporate new information at the speed data arrives, detecting changes and adapting the decision models to the most recent information. We present experimental results in a large real-world scenario, illustrating the advantages of continuous learning and its competitiveness against wavelet-based prediction. We also propose a light electrical load visualization system which enhances the ability to inspect forecast results on mobile devices.
2009
Authors
Sebastiao, R; Rodrigues, PP; Gama, J;
Publication
2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009)
Abstract
This paper addresses the space-time change detection problem in climate data over the Iberian Peninsula using a 50-year dataset. The data were analyzed with respect to temporal and geographical information, using the following methodology: information about space-time drifts in climate data was obtained by applying a change detection algorithm to all the temporal data available for each physical location considered in this study; the performance and robustness of this algorithm were then assessed with the McNemar nonparametric statistical test on cluster structures; geographical correlations were inferred using visualization tools and graphical representations of the data. Most of the space-time drifts detected by the algorithm were confirmed by the results of the McNemar test and agree with the visual and graphical representations, supporting the advantage of using interdisciplinary methods. This analysis also shows that some locations do not reveal any change across the observed years.
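As a rough illustration of the McNemar test mentioned in the abstract above, the sketch below computes the standard chi-square form of the test from the two discordant counts of a paired 2x2 table. The abstract does not say how the table is built from the cluster structures, so the function name `mcnemar`, the counts `b` and `c`, and the example values are assumptions made purely for illustration.

```python
import math


def mcnemar(b: int, c: int, corrected: bool = True) -> tuple[float, float]:
    """Chi-square McNemar test on the discordant counts of a paired 2x2 table.

    b: pairs where only the first condition is positive (hypothetical input)
    c: pairs where only the second condition is positive (hypothetical input)
    Returns (statistic, p_value), using 1 degree of freedom.
    """
    n = b + c
    if n == 0:
        # No discordant pairs: there is no evidence of a difference.
        return 0.0, 1.0
    diff = abs(b - c)
    if corrected:
        # Edwards continuity correction, clamped at zero.
        diff = max(diff - 1, 0)
    stat = diff * diff / n
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Illustrative counts only, not taken from the paper.
stat, p_value = mcnemar(b=15, c=5)
print(f"statistic={stat:.3f}, p-value={p_value:.4f}")
```

The chi-square approximation is usually considered reasonable only when `b + c` is not too small; for few discordant pairs an exact binomial version of the test is the common alternative.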
2009
Authors
Omitaomu, OA; Ganguly, AR; Vatsavai, RR; Gama, J; Chawla, NV; Gaber, MM;
Publication
KDD Workshop on Knowledge Discovery from Sensor Data
Abstract
2009
Authors
Rodrigues, PP; Gama, J; Lopes, L;
Publication
Intelligent Techniques for Warehousing and Mining Sensor Network Data
Abstract
2009
Authors
Huang, R; Yang, Q; Pei, J; Gama, J; Meng, X; Li, X;
Publication
Lecture Notes in Computer Science
Abstract
2009
Authors
Marques de Sa, JPM; Gama, J; Sebastiao, R; Alexandre, LA;
Publication
COMPUTER ANALYSIS OF IMAGES AND PATTERNS, PROCEEDINGS
Abstract
Binary decision trees based on univariate splits have traditionally employed so-called impurity functions as a means of searching for the best node splits. Such functions use estimates of the class distributions. In the present paper we introduce a new concept to binary tree design: instead of working with the class distributions of the data, we work directly with the distribution of the errors produced by the node splits. Concretely, we search for the best splits using a minimum entropy-of-error (MEE) strategy. This strategy has recently been applied with success in other areas (e.g. regression, clustering, blind source separation, neural network training). We show that MEE trees can produce good results, often with simpler trees, have interesting generalization properties, and, in the many experiments we performed, could be used without pruning.