Publications

Publications by João Gama

2011

L2GClust: local-to-global clustering of stream sources

Authors
Rodrigues, PP; Gama, J; Araújo, J; Lopes, LMB;

Publication
Proceedings of the 2011 ACM Symposium on Applied Computing (SAC), TaiChung, Taiwan, March 21 - 24, 2011

Abstract
In ubiquitous streaming data sources, such as sensor networks, clustering nodes by the data they produce is an important problem that gives insight into the phenomenon being monitored by such networks. However, if these techniques require data to be gathered centrally, communication and storage requirements are often unbounded. The goal of this paper is to assess the feasibility of computing local clustering at each node, using only neighbors' centroids, as an approximation of the global clustering computed by a centralized process. A local algorithm is proposed to perform clustering of sensors based on the moving average of each node's data over time: the moving average of each node is approximated using a memory-less fading average; clustering is based on the furthest point algorithm applied to the centroids computed by the node's direct neighbors. The algorithm was evaluated on a state-of-the-art sensor network simulator, measuring the agreement between local and global clustering. Experimental work on synthetic data with spherical Gaussian clusters is consistently analyzed for different network sizes, numbers of clusters and degrees of cluster overlap. Results show a high level of agreement between each node's clustering definitions and the global clustering definition, with special emphasis on separability agreement. Overall, local approaches are able to keep a good approximation of the global clustering, improving privacy among nodes and decreasing communication and computation load in the network. Hence, the basic requirements for distributed clustering of streaming data sensors recommend that clustering in these settings be performed locally. © 2011 ACM.
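The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and function names, the fading factor `alpha`, the particular recursive form of the fading average, and the choice of the first point as the initial center are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


class FadingAverage:
    """Memory-less fading average (assumed form): keeps only two running sums."""

    def __init__(self, alpha: float = 0.9) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # hypothetical fading factor; 1.0 gives the plain mean
        self.s = 0.0        # faded sum of observations
        self.n = 0.0        # faded count of observations

    def update(self, x: float) -> float:
        # Older observations are down-weighted by alpha at every step.
        self.s = x + self.alpha * self.s
        self.n = 1.0 + self.alpha * self.n
        return self.s / self.n


def furthest_point_centers(points: Sequence[Point], k: int) -> List[Point]:
    """Greedy furthest-point selection of up to k centers from `points`."""
    if not points or k < 1:
        raise ValueError("need at least one point and k >= 1")
    centers = [points[0]]  # assumption: start from the first point
    # Distance from every point to its nearest chosen center.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        idx = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[idx])
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return centers
```

In a setting like the one described, each node would call `update` on every new reading and run `furthest_point_centers` over the centroids received from its direct neighbors; how those centroids are exchanged is outside this sketch.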

2011

Data Streams

Authors
Gama, J; Rodrigues, PP;

Publication
Encyclopedia of Data Warehousing and Mining, Second Edition

Abstract

2011

Learning from Data Streams

Authors
Gama, J; Rodrigues, PP;

Publication
Encyclopedia of Data Warehousing and Mining, Second Edition

Abstract

2009

Proceedings of the Third International Workshop on Knowledge Discovery from Sensor Data, Paris, France, June 28, 2009

Authors
Omitaomu, OA; Ganguly, AR; Vatsavai, RR; Gama, J; Chawla, NV; Gaber, MM;

Publication
KDD Workshop on Knowledge Discovery from Sensor Data

Abstract

2012

Estimating reliability for assessing and correcting individual streaming predictions

Authors
Rodrigues, PPE; Bosnic, Z; Gama, J; Kononenko, I;

Publication
Reliable Knowledge Discovery

Abstract
Several predictive systems are nowadays vital for operations and decision support. The quality of these systems is most of the time defined by their average accuracy, which conveys little or no information about the estimated error of each individual prediction. In these cases, users should be allowed to associate a measure of reliability with each prediction. However, with the advent of data streams, batch state-of-the-art reliability estimates need to be redefined. In this chapter we adapt and evaluate five empirical measures for online reliability estimation of individual predictions: similarity-based (k-NN) error, local sensitivity (bias and variance) and online bagging predictions (bias and variance). Evaluation is performed with a neural network base model on two different problems, with results showing that online bagging and k-NN estimates are consistently correlated with the error of the base model. Furthermore, we propose an approach for correcting individual predictions based on the CNK reliability estimate. Evaluation is done on a real-world problem (prediction of the electricity load for a selected European geographical region), using two different regression models: a neural network and the k-nearest neighbors algorithm. Comparison is performed with corrections based on the Kalman filter. The results show that our method performs better than the Kalman filter, significantly improving the accuracy of the original predictions.
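A similarity-based estimate of the kind mentioned in the abstract can be sketched as follows. This is a hedged illustration only: it assumes the estimate is the mean label of the k nearest recently seen examples minus the model's prediction, and it assumes a fixed-size sliding window as the online memory. The class name `KnnReliability` and the parameters `k` and `window` are hypothetical and not taken from the chapter.

```python
import math
from collections import deque
from typing import Deque, Sequence, Tuple


class KnnReliability:
    """Signed local-error estimate from the k nearest recent labeled examples."""

    def __init__(self, k: int = 3, window: int = 100) -> None:
        self.k = k
        # Assumed online memory: only the most recent `window` examples are kept.
        self.buffer: Deque[Tuple[Tuple[float, ...], float]] = deque(maxlen=window)

    def add(self, x: Sequence[float], y: float) -> None:
        """Store a labeled example once its true value becomes known."""
        self.buffer.append((tuple(x), y))

    def estimate(self, x: Sequence[float], prediction: float) -> float:
        """Mean label of the nearest stored examples minus the prediction."""
        if not self.buffer:
            return 0.0  # no evidence yet
        query = tuple(x)
        nearest = sorted(self.buffer, key=lambda ex: math.dist(ex[0], query))[: self.k]
        mean_label = sum(y for _, y in nearest) / len(nearest)
        return mean_label - prediction
```

A positive value suggests the model under-predicts in that neighborhood and a negative value suggests it over-predicts; how such a signal is turned into a corrected prediction is left to the chapter itself.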

2009

Knowledge discovery for sensor network comprehension

Authors
Rodrigues, PP; Gama, J; Lopes, L;

Publication
Intelligent Techniques for Warehousing and Mining Sensor Network Data

Abstract
