Publications

Publications by LIAAD

2007

Knowledge discovery from data streams

Authors
Gama, J; Aguilar Ruiz, J;

Publication
INTELLIGENT DATA ANALYSIS

Abstract

2007

Change detection in learning histograms from data streams

Authors
Sebastiao, R; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
In this paper we study the problem of constructing histograms from high-speed time-changing data streams. Learning in this context requires the ability to process examples once at the rate they arrive, maintaining a histogram consistent with the most recent data, and forgetting outdated data whenever a change in the distribution is detected. To construct histograms from high-speed data streams we use the two-layer structure used in the Partition Incremental Discretization (PiD) algorithm. Our contribution is a new method to detect whenever a change in the distribution generating examples occurs. The base idea consists of monitoring distributions from two different time windows: the reference time window, which reflects the distribution observed in the past; and the current time window, which reflects the distribution observed in the most recent data. We compare both distributions and signal a change whenever the distance between them is greater than a threshold value, using three different methods: the Entropy Absolute Difference, the Kullback-Leibler divergence and the Cosine Distance. The experimental results suggest that the Kullback-Leibler divergence exhibits a high probability of change detection and faster detection rates, with few false positive alarms.
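The two-window comparison described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function names, the smoothing constant `eps`, and the default `threshold` are assumptions made here, and only the Kullback-Leibler variant is shown.

```python
import math

def normalize(counts, eps=1e-9):
    """Turn raw bin counts into probabilities; eps (an assumption) avoids log(0)."""
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def change_detected(reference_counts, current_counts, threshold=0.1):
    """Signal a change when the current window diverges from the reference window.

    Both arguments are histograms over the same bins. The threshold value is
    hypothetical and would need tuning on real data.
    """
    p = normalize(current_counts)
    q = normalize(reference_counts)
    return kl_divergence(p, q) > threshold
```

For instance, `change_detected([10, 20, 40, 20, 10], [40, 20, 10, 20, 10])` flags a change, while comparing a histogram with itself does not. Once a change is flagged, the current window would presumably become the new reference.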

2007

Predictive learning in sensor networks

Authors
Gama, J; Pedersen, RU;

Publication
Learning from Data Streams: Processing Techniques in Sensor Networks

Abstract
Sensor networks act in dynamic environments with distributed sources of continuous data and computing with resource constraints. Learning in these environments is faced with new challenges: the need to continuously maintain a decision model consistent with the most recent data. Desirable properties of learning algorithms include: the ability to maintain an anytime model; the ability to modify the decision model whenever new information is available; the ability to forget outdated information; and the ability to detect and react to changes in the underlying process generating data, monitoring the learning process and managing the trade-off between the cost of updating a model and the benefits in performance gains. In this chapter we illustrate these ideas in two learning scenarios - centralized and distributed - and present illustrative algorithms for these contexts. © 2007 Springer-Verlag Berlin Heidelberg.

2007

Incremental discretization, application to data with concept drift

Authors
Pinto, C; Gama, J;

Publication
APPLIED COMPUTING 2007, VOL 1 AND 2

Abstract
In this paper we present a method for incremental discretization that can adapt to gradual changes in the target concept. The proposed method is based on the Partition Incremental Discretization (PiD for short). The algorithm divides the discretization task into two layers. The first layer receives the sequence of input data and retains some statistics of the data using more intervals than required. The second layer computes the final discretization, based on the statistics stored by the first layer. The method is able to process streaming examples in a single scan, in constant time and space even for infinite sequences of examples. In dynamic environments the target concept can gradually change over time. Past examples may not reflect the current status of the problem. To accommodate concept drift we use an exponential decay that smoothly reduces the importance of older examples. Experimental evaluation on a benchmark problem for drift environments clearly illustrates the benefits of the example-weighting technique.
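A minimal sketch of the two-layer idea with exponential decay is given below. It is not the PiD algorithm itself: the class name, the fixed equal-width grid in layer 1, the equal-frequency rule in layer 2, and the `decay` parameter are all assumptions chosen to keep the example short.

```python
class TwoLayerDiscretizer:
    """Hypothetical two-layer discretizer with exponentially decayed counts."""

    def __init__(self, low, high, n_fine=100, decay=0.99):
        self.low = low
        self.n_fine = n_fine
        self.decay = decay  # weight multiplier applied to old counts per example
        self.width = (high - low) / n_fine
        self.counts = [0.0] * n_fine  # layer 1: more intervals than required

    def update(self, x):
        """Layer 1: age all counts, then add the new example to its fine bin."""
        self.counts = [c * self.decay for c in self.counts]
        idx = int((x - self.low) / self.width)
        idx = min(max(idx, 0), self.n_fine - 1)  # clamp out-of-range values
        self.counts[idx] += 1.0

    def cut_points(self, n_bins):
        """Layer 2: equal-frequency cut points from the weighted layer-1 counts."""
        total = sum(self.counts)
        if total == 0:
            return []
        target = total / n_bins
        cuts, acc, k = [], 0.0, 1
        for i, c in enumerate(self.counts):
            acc += c
            edge = self.low + (i + 1) * self.width
            while k < n_bins and acc >= k * target:
                if not cuts or cuts[-1] != edge:  # skip duplicate cut points
                    cuts.append(edge)
                k += 1
        return cuts
```

Each update costs time proportional to the number of fine bins, independent of how many examples have been seen. With `decay=1.0` all examples weigh the same; with a smaller value, the cut points follow the recent data after a drift.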

2007

An overview on learning from data streams - Preface

Authors
Gama, J; Rodrigues, P; Aguilar Ruiz, J;

Publication
NEW GENERATION COMPUTING

Abstract

2007

Semi-fuzzy splitting in Online Divisive-Agglomerative Clustering

Authors
Rodrigues, PP; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
The Online Divisive-Agglomerative Clustering (ODAC) is an incremental approach for clustering streaming time series using a hierarchical procedure over time. It constructs a tree-like hierarchy of clusters of streams, using a top-down strategy based on the correlation between streams. The system also possesses an agglomerative phase to enhance a dynamic behavior capable of structural change detection. However, the split decision used in the algorithm focuses on the crisp boundary between two groups, which implies a high risk since it has to decide based on only a small subset of the entire data. In this work we propose a semi-fuzzy approach to the assignment of variables to newly created clusters, for a better trade-off between validity and performance. Experimental work supports the benefits of our approach.
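The abstract does not state the assignment rule, so the following is only one possible reading of "semi-fuzzy" splitting, not the authors' method. It assumes each variable has a dissimilarity to two pivot variables, and that variables lying close to the boundary are placed in both new clusters. The function name and the `margin` parameter are hypothetical.

```python
def semi_fuzzy_split(dist_to_a, dist_to_b, margin=0.1):
    """Assign variables to two new clusters, duplicating borderline ones.

    dist_to_a and dist_to_b map each variable name to its dissimilarity to
    pivot A and pivot B. A variable whose two dissimilarities differ by at
    most `margin` (an assumed tolerance) goes to both clusters.
    """
    cluster_a, cluster_b = set(), set()
    for var in dist_to_a:
        da, db = dist_to_a[var], dist_to_b[var]
        if abs(da - db) <= margin:
            # Too close to the boundary to decide crisply: keep in both.
            cluster_a.add(var)
            cluster_b.add(var)
        elif da < db:
            cluster_a.add(var)
        else:
            cluster_b.add(var)
    return cluster_a, cluster_b
```

Setting `margin=0` reduces this to a crisp split except for exact ties, which makes the trade-off explicit: a wider margin defers uncertain decisions at the cost of tracking some variables in two clusters.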
