Publications

Publications by LIAAD

2007

Predictive learning in sensor networks

Authors
Gama, J; Pedersen, RU;

Publication
Learning from Data Streams: Processing Techniques in Sensor Networks

Abstract
Sensor networks act in dynamic environments with distributed sources of continuous data and computing with resource constraints. Learning in these environments is faced with new challenges: the need to continuously maintain a decision model consistent with the most recent data. Desirable properties of learning algorithms include: the ability to maintain an any time model; the ability to modify the decision model whenever new information is available; the ability to forget outdated information; and the ability to detect and react to changes in the underlying process generating data, monitoring the learning process and managing the trade-off between the cost of updating a model and the benefits in performance gains. In this chapter we illustrate these ideas in two learning scenarios - centralized and distributed - and present illustrative algorithms for these contexts. © 2007 Springer-Verlag Berlin Heidelberg.

2007

Incremental discretization, application to data with concept drift

Authors
Pinto, C; Gama, J;

Publication
APPLIED COMPUTING 2007, VOL 1 AND 2

Abstract
In this paper we present a method for incremental discretization that can adapt to gradual changes in the target concept. The proposed method is based on the Partition incremental Discretization (PiD for short). The algorithm divides the discretization task into two layers. The first layer receives the sequence of input data and retains some statistics of the data using more intervals than required. The second layer computes the final discretization, based on the statistics stored by the first layer. The method is able to process streaming examples in a single scan, in constant time and space even for infinite sequences of examples. In dynamic environments the target concept can gradually change over time. Past examples may not reflect the actual status of the problem. To accommodate concept drift we use an exponential decay that smoothly reduces the importance of older examples. Experimental evaluation on a benchmark problem for drift environments clearly illustrates the benefits of the example-weighting technique.
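
The two-layer idea with exponential decay can be illustrated with a minimal sketch. This is not the authors' PiD implementation: the class name `TwoLayerDiscretizer`, the fixed value range, the equal-width first layer, the equal-frequency second layer, and the `decay` parameter are all assumptions made for illustration.

```python
class TwoLayerDiscretizer:
    """Hypothetical sketch of a two-layer incremental discretizer with decay."""

    def __init__(self, lo, hi, n_fine=100, decay=0.99):
        # Layer 1 (assumed): many equal-width intervals over a known range [lo, hi].
        self.lo = lo
        self.n_fine = n_fine
        self.decay = decay
        self.width = (hi - lo) / n_fine
        self.counts = [0.0] * n_fine

    def update(self, x):
        # Exponential decay: older examples lose weight at every new arrival.
        # The cost depends on n_fine only, not on the length of the stream.
        self.counts = [c * self.decay for c in self.counts]
        idx = int((x - self.lo) / self.width)
        idx = min(max(idx, 0), self.n_fine - 1)  # clamp out-of-range values
        self.counts[idx] += 1.0

    def cut_points(self, k):
        # Layer 2 (assumed): merge the fine intervals into k equal-frequency bins,
        # using only the decayed counts stored by layer 1.
        total = sum(self.counts)
        if total == 0:
            return []
        target = total / k
        cuts, cum, next_q = [], 0.0, 1
        for i, c in enumerate(self.counts):
            cum += c
            while next_q < k and cum >= target * next_q:
                cut = self.lo + (i + 1) * self.width
                if not cuts or cuts[-1] != cut:  # skip duplicate cut points
                    cuts.append(cut)
                next_q += 1
        return cuts


def run_stream(decay):
    # Simulated drift: the values move from around 1.5 to around 8.5.
    d = TwoLayerDiscretizer(0.0, 10.0, n_fine=10, decay=decay)
    for x in [1.5] * 20 + [8.5] * 20:
        d.update(x)
    return d.cut_points(2)
```

In this sketch, calling `run_stream(1.0)` weights all examples equally, so the single cut point stays tied to the old concept, while `run_stream(0.5)` lets the recent values dominate and moves the cut point toward them.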

2007

An overview on learning from data streams - Preface

Authors
Gama, J; Rodrigues, P; Aguilar Ruiz, J;

Publication
NEW GENERATION COMPUTING

Abstract

2007

Semi-fuzzy splitting in Online Divisive-Agglomerative Clustering

Authors
Rodrigues, PP; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
The Online Divisive-Agglomerative Clustering (ODAC) is an incremental approach for clustering streaming time series using a hierarchical procedure over time. It constructs a tree-like hierarchy of clusters of streams, using a top-down strategy based on the correlation between streams. The system also possesses an agglomerative phase to enhance a dynamic behavior capable of structural change detection. However, the split decision used in the algorithm focuses on the crisp boundary between two groups, which implies a high risk since it has to decide based on only a small subset of the entire data. In this work we propose a semi-fuzzy approach to the assignment of variables to newly created clusters, for a better trade-off between validity and performance. Experimental work supports the benefits of our approach.

2007

Stream-based electricity load forecast

Authors
Gama, J; Rodrigues, PP;

Publication
Knowledge Discovery in Databases: PKDD 2007, Proceedings

Abstract
Sensors distributed all around electrical-power distribution networks produce streams of data at high speed. From a data mining perspective, this sensor network problem is characterized by a large number of variables (sensors), producing a continuous flow of data, in a dynamic non-stationary environment. Companies make decisions to buy or sell energy based on load profiles and forecast. We propose an architecture based on an online clustering algorithm where each cluster (group of sensors with high correlation) contains a neural-network based predictive model. The goal is to maintain in real time a clustering model and a predictive model able to incorporate new information at the speed data arrives, detecting changes and adapting the decision models to the most recent information. We present results illustrating the advantages of the proposed architecture, on several temporal horizons, and its competitiveness with another predictive strategy.

2007

Clustering techniques in sensor networks

Authors
Rodrigues, PP; Gama, J;

Publication
Learning from Data Streams: Processing Techniques in Sensor Networks

Abstract
The traditional knowledge discovery environment, where data and processing units are centralized in controlled laboratories and servers, is now completely transformed into a web of sensorial devices, some of them with local processing ability. This scenario represents a new knowledge-extraction environment, possibly not completely observable, that is much less controlled by both the human user and a common centralized control process. © 2007 Springer-Verlag Berlin Heidelberg.
