2007
Authors
Rodrigues, PP; Gama, J;
Publication
Modulad
Abstract
2012
Authors
Rodrigues, PP; Gama, J;
Publication
CEUR Workshop Proceedings
Abstract
Smart grids consist of millions of automated electronic meters that will be installed in electricity distribution networks and connected to servers that will manage grid supervision, billing and customer services. World sustainability regarding energy management will definitely rely on such grids, so smart grids also need to be sustainable themselves. This sustainability depends on several research problems that emerge from this new setting (from power balance to energy markets), requiring new approaches for knowledge discovery and decision support. This paper presents a holistic distributed stream clustering view of possible solutions for those problems, supported by previous research in related domains. The approach is based on two orthogonal clustering algorithms, combined for a holistic clustering of the grid. Experimental results are included to illustrate the benefits of each algorithm, while the proposal is discussed in terms of application to smart grid problems. This holistic approach could help solve some of the research problems of the smart grid intelligent layer, thus improving global sustainability.
2011
Authors
Gama, J; Rodrigues, PP; Lopes, L;
Publication
INTELLIGENT DATA ANALYSIS
Abstract
Nowadays, applications produce infinite streams of data distributed across wide sensor networks. In this work we study the problem of continuously maintaining a cluster structure over the data points generated by the entire network. Usual techniques operate by forwarding and concentrating all the data in a central server, processing it as a multivariate stream. In this paper, we propose DGClust, a new distributed algorithm which reduces both the dimensionality and the communication burdens by allowing each local sensor to keep an online discretization of its data stream, which operates with constant update time and (almost) fixed space. Each new data point triggers a cell in this univariate grid, reflecting the current state of the data stream at the local site. Whenever a local site changes its state, it notifies the central server of its new state. This way, at each point in time, the central site has the global multivariate state of the entire network. To avoid monitoring all possible states, whose number is exponential in the number of sensors, the central site keeps a small list of counters of the most frequent global states. Finally, a simple adaptive partitional clustering algorithm is applied to the central points of the frequent states in order to provide an anytime definition of the cluster centers. The approach is evaluated in the context of distributed sensor networks, focusing on three outcomes: loss to real centroids, communication prevention, and processing reduction. The experimental work on synthetic data supports our proposal, showing robustness to a high number of sensors, and the application to real data from physiological sensors exposes the aforementioned advantages of the system.
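The local-state and notification idea described in this abstract can be illustrated with a minimal sketch. It is not the DGClust implementation: the class names, the fixed equal-width grid, and the unbounded counter of global states are all simplifying assumptions made here (the abstract describes an online discretization and a small, bounded list of frequent-state counters), and the final clustering step is left out.

```python
from collections import Counter


class LocalSite:
    """Hypothetical local sensor: maps each reading to a cell of a fixed grid."""

    def __init__(self, low, high, n_cells):
        self.low, self.high, self.n_cells = low, high, n_cells
        self.state = None  # current cell index, unknown until the first reading

    def update(self, value):
        """Return the new cell index if the local state changed, else None."""
        width = (self.high - self.low) / self.n_cells
        cell = int((value - self.low) / width)
        cell = min(max(cell, 0), self.n_cells - 1)  # clamp out-of-range readings
        if cell != self.state:
            self.state = cell
            return cell
        return None


class CentralSite:
    """Hypothetical central server: tracks the global state and counts its occurrences."""

    def __init__(self, n_sensors):
        self.global_state = [None] * n_sensors
        self.counts = Counter()  # assumption: unbounded, unlike a bounded frequent-state list
        self.messages = 0

    def notify(self, sensor_id, cell):
        """Called only when a local site changes its state."""
        self.global_state[sensor_id] = cell
        self.messages += 1

    def tick(self):
        """Count the current global state once every sensor has reported."""
        if None not in self.global_state:
            self.counts[tuple(self.global_state)] += 1


def run(readings, low=0.0, high=1.0, n_cells=4):
    """readings: list of rows, one value per sensor per time step."""
    n_sensors = len(readings[0])
    sites = [LocalSite(low, high, n_cells) for _ in range(n_sensors)]
    central = CentralSite(n_sensors)
    for row in readings:
        for sensor_id, value in enumerate(row):
            cell = sites[sensor_id].update(value)
            if cell is not None:  # communicate only on a state change
                central.notify(sensor_id, cell)
        central.tick()
    return central
```

In this sketch, three time steps from two sensors cost three messages rather than six, because readings that stay within the same cell are never forwarded.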
2009
Authors
Rodrigues, PP; Gama, J;
Publication
INTELLIGENT DATA ANALYSIS
Abstract
Sensors distributed all around electrical-power distribution networks produce streams of data at high speed. From a data mining perspective, this sensor network problem is characterized by a large number of variables (sensors), producing a continuous flow of data, in a dynamic non-stationary environment. Companies make decisions to buy or sell energy based on load profiles and forecasts. In this work we analyze the most relevant data mining problems and issues: continuously learning clusters and predictive models, model adaptation in large domains, and change detection and adaptation. The goal is to continuously maintain a clustering model, defining profiles, and a predictive model able to incorporate new information at the speed data arrives, detecting changes and adapting the decision models to the most recent information. We present experimental results in a large real-world scenario, illustrating the advantages of continuous learning and its competitiveness against wavelet-based prediction. We also propose a light electrical load visualization system which enhances the ability to inspect forecast results on mobile devices.
2008
Authors
Rodrigues, PP; Gama, J;
Publication
ECAI 2008, PROCEEDINGS
Abstract
Online learning algorithms which address fast data streams should process examples at the rate they arrive, using a single scan of data and fixed memory, maintaining a decision model at any time and being able to adapt the model to the most recent data. These features yield the necessity of using approximate models. One problem that usually arises with approximate models is the definition of a minimum number of observations necessary to assure convergence, which implies a high risk since the system may have to decide based only on a small subset of the entire data. One approach is to apply techniques based on the Hoeffding bound to enforce decisions with a confidence level. In divisive clustering of time series, the goal is to find clusters of similar time series over time. In online approaches there are two decisions to make: when to split and how to assign variables to new clusters. We can define a confidence level for both the decision of splitting and the assignment of data variables to new clusters. Previous work has already addressed confident decisions on the moment of split. Our proposal is to include a confidence level in the assignment process. When a split point is reported, creating two new clusters, we can directly assign variables which are confidently closer to one cluster than the other, using a different strategy for those variables which do not satisfy the confidence level. In this paper we propose to assign the unsure variables to a third cluster. Experimental evaluation is presented in the context of a recently proposed hierarchical algorithm, assessing the advantages of the proposal and also revealing advantages in memory usage reduction and processing speed. Although this proposal is evaluated under the scope of an existing method, it can be generalized to any divisive procedure.
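The confident-assignment idea in this abstract can be sketched as follows, using the standard Hoeffding bound epsilon = sqrt(R^2 ln(1/delta) / (2n)). The function names, the representation of each variable as a pair of distances to the two new clusters, and the decision rule comparing the distance gap with epsilon are assumptions made here for illustration; the abstract does not give the exact rule used in the paper.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound for the mean of n observations with range value_range,
    holding with probability 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def assign(distances, value_range, delta, n):
    """Split variables into clusters 'a', 'b' and a third 'unsure' cluster.

    distances: dict mapping a variable name to (distance to a, distance to b).
    A variable is assigned directly only if the gap between its two distances
    exceeds the bound (hypothetical decision rule).
    """
    eps = hoeffding_bound(value_range, delta, n)
    clusters = {"a": [], "b": [], "unsure": []}
    for name, (d_a, d_b) in distances.items():
        if d_b - d_a > eps:
            clusters["a"].append(name)  # confidently closer to a
        elif d_a - d_b > eps:
            clusters["b"].append(name)  # confidently closer to b
        else:
            clusters["unsure"].append(name)  # gap too small to decide
    return clusters
```

With value_range = 1, delta = 0.05 and n = 1000, the bound is roughly 0.039, so a variable whose two distances differ by only 0.01 would fall into the third cluster.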
2009
Authors
Sebastiao, R; Rodrigues, PP; Gama, J;
Publication
2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009)
Abstract
This paper addresses the space-time change detection problem in climate data over the Iberian Peninsula using a 50-year dataset. The data were analyzed with respect to temporal and geographical information, using the following methodology: information about space-time drifts in climate data was obtained by applying a change detection algorithm to all the temporal data available for each physical location considered in this study; the performance and the robustness of this algorithm were then assessed by the McNemar nonparametric statistical test on cluster structures; geographical correlations were inferred using visualization tools and graphical representations of data. Most of the space-time drifts detected by the algorithm were confirmed by the results of the McNemar test and are in accordance with visual and graphical representations, supporting the advantage of using inter-disciplinary methods. This analysis also shows that there are locations which do not reveal any change across all the observed years.
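For reference, the McNemar test mentioned in this abstract compares two paired binary outcomes through their discordant counts b and c. The sketch below computes the usual continuity-corrected statistic and its chi-square p-value with one degree of freedom. How b and c are derived from cluster structures is not stated in the abstract, so the inputs here are purely hypothetical.

```python
import math


def mcnemar(b, c):
    """Continuity-corrected McNemar statistic and p-value.

    b, c: counts of discordant pairs (hypothetical inputs).
    Returns (statistic, p_value) under a chi-square law with 1 degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 d.o.f.: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For example, b = 15 and c = 5 give a statistic of 4.05, which exceeds the 3.841 critical value at the 5% level.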