Publications - INESC TEC

Publications

Publications by LIAAD

2008

Learning Model Trees from Data Streams

Authors
Ikonomovska, E; Gama, J;

Publication
DISCOVERY SCIENCE, PROCEEDINGS

Abstract
In this paper we propose FIMT, a fast and incremental algorithm for learning model trees from data streams for regression problems. The algorithm is incremental, works online, processes each example once at the speed it arrives, and maintains an any-time regression model. The leaves contain linear models trained online from the examples that fall into that leaf, a process with low complexity. The use of linear models in the leaves increases the tree's any-time global performance. FIMT obtains accuracy competitive with batch learners even for medium-sized datasets, with training time better by an order of magnitude. We study the properties of FIMT over several artificial and real datasets and evaluate its sensitivity to the order of examples and to the noise level.
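As a rough illustration of a linear model trained online in a leaf, the sketch below updates a linear regressor one example at a time with the delta rule (stochastic gradient descent on squared error). The class name, the learning rate, and the choice of the delta rule are assumptions made for illustration; the abstract does not specify the update used in FIMT.

```python
import random


class OnlineLinearModel:
    """Linear regressor updated one example at a time (delta rule / SGD).

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, n_features, learning_rate=0.05):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, y):
        # Move the parameters a small step along the prediction error.
        error = y - self.predict(x)
        self.bias += self.learning_rate * error
        self.weights = [
            w + self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]


def train_on_synthetic_stream(n_examples=5000, seed=0):
    """Feed a noiseless synthetic stream y = 3*x0 - 2*x1 + 1, one example at a time."""
    rng = random.Random(seed)
    model = OnlineLinearModel(n_features=2)
    for _ in range(n_examples):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        y = 3.0 * x[0] - 2.0 * x[1] + 1.0
        model.update(x, y)  # each example is seen once and then discarded
    return model
```

Each update touches only the current example, so the cost per example depends on the number of features and not on how many examples have been seen, which is the property that makes such a model usable at any time.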


2008

Improving the performance of an incremental algorithm driven by error margins

Authors
del Campo Avila, J; Ramos Jimenez, G; Gama, J; Morales Bueno, R;

Publication
INTELLIGENT DATA ANALYSIS

Abstract
Classification is a highly relevant task in the field of data analysis. It is not a trivial task, and different difficulties can arise depending on the nature of the problem. These difficulties become worse when the datasets are too large or when new information can arrive at any time. Incremental learning is an approach that can be used to deal with the classification task in these cases. It must alleviate, or solve, the problem of limited time and memory resources. One emerging approach uses concentration bounds to ensure that decisions are made only when enough information supports them. IADEM is one of the most recent algorithms that use this approach. The aim of this paper is to improve the performance of this algorithm in different ways: simplifying the induced models, adding the ability to deal with continuous data, improving the detection of noise, selecting new criteria for evolving the model, including the use of more powerful prediction techniques, etc. Besides these new properties, the new system, IADEM-2, preserves the ability to obtain a performance similar to standard learning algorithms independently of the dataset size, and it can incorporate new information as the basic algorithm does: using a short time per example.
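To make the idea of a decision backed by a concentration bound concrete, the sketch below uses the Hoeffding bound, one widely used bound of this kind. Whether it is exactly the bound IADEM applies is not stated in the abstract, and the function names and the decision rule are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Half-width epsilon such that, with probability at least 1 - delta,
    the mean of n observations bounded in an interval of width
    `value_range` lies within epsilon of its expected value."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def enough_evidence(best, second_best, value_range, delta, n):
    """Hypothetical decision rule: commit to the best candidate only when
    its observed advantage exceeds the bound."""
    return (best - second_best) > hoeffding_bound(value_range, delta, n)
```

With a range of 1 and delta = 0.05, the bound is roughly 0.039 after 1000 examples and roughly 0.17 after 50, so an observed advantage of 0.10 justifies a decision in the first case but not in the second.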


2008

Knowledge discovery from data streams

Authors
Gama, J; Aguilar Ruiz, J; Klinkenberg, R;

Publication
INTELLIGENT DATA ANALYSIS


2008

Hierarchical clustering of time-series data streams

Authors
Rodrigues, PP; Gama, J; Pedroso, JP;

Publication
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

Abstract
This paper presents and analyzes an incremental system for clustering streaming time series. The Online Divisive-Agglomerative Clustering (ODAC) system continuously maintains a tree-like hierarchy of clusters that evolves with data, using a top-down strategy. The splitting criterion is a correlation-based dissimilarity measure among time series, splitting each node by the farthest pair of streams. The system also uses a merge operator that reaggregates a previously split node in order to react to changes in the correlation structure between time series. The split and merge operators are triggered in response to changes in the diameters of existing clusters, assuming that in stationary environments, expanding the structure leads to a decrease in the diameters of the clusters. The system is designed to process thousands of data streams that flow at a high rate. The main features of the system include update time and memory consumption that do not depend on the number of examples in the stream. Moreover, the time and memory required to process an example decrease whenever the cluster structure expands. Experimental results on artificial and real data assess the processing qualities of the system, suggesting competitive performance on clustering streaming time series, and also explore its ability to deal with concept drift.
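One way a correlation-based dissimilarity can be maintained with time and memory independent of the stream length is to keep a fixed set of running sums per pair of streams. The sketch below does this for Pearson correlation. The class name and the transformation sqrt((1 - r) / 2) are assumptions chosen for illustration; the abstract only says that the measure is correlation-based.

```python
import math


class PairStats:
    """Running sums for one pair of streams (hypothetical helper).

    Six numbers are stored regardless of how many examples arrive.
    """

    def __init__(self):
        self.n = 0
        self.sum_a = self.sum_b = 0.0
        self.sum_aa = self.sum_bb = self.sum_ab = 0.0

    def update(self, a, b):
        # Constant-time update; the raw values are not kept.
        self.n += 1
        self.sum_a += a
        self.sum_b += b
        self.sum_aa += a * a
        self.sum_bb += b * b
        self.sum_ab += a * b

    def correlation(self):
        if self.n < 2:
            return 0.0
        cov = self.sum_ab - self.sum_a * self.sum_b / self.n
        var_a = self.sum_aa - self.sum_a ** 2 / self.n
        var_b = self.sum_bb - self.sum_b ** 2 / self.n
        if var_a <= 0.0 or var_b <= 0.0:
            return 0.0  # a constant stream has no defined correlation
        return cov / math.sqrt(var_a * var_b)

    def dissimilarity(self):
        # Assumed mapping of correlation in [-1, 1] to a distance in [0, 1].
        r = max(-1.0, min(1.0, self.correlation()))
        return math.sqrt((1.0 - r) / 2.0)
```

Under this mapping, perfectly correlated streams are at distance 0 and perfectly anti-correlated streams at distance 1, so the farthest pair in a cluster is the least correlated one.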


2008

Improvement in Wind Power Forecasting Based on Information Entropy-Related Concepts

Authors
Bessa, R; Miranda, V; Gama, J;

Publication
2008 IEEE POWER & ENERGY SOCIETY GENERAL MEETING, VOLS 1-11

Abstract
This paper reports new results in applying entropy concepts to the training of mappers such as neural networks to perform wind power prediction as a function of wind characteristics (mainly speed and direction) in wind parks connected to a power grid. It also addresses the differences, relevant to power system operation, between off-line and on-line training of neural networks. Real case examples are presented.


2008

Wind Power Forecasting With Entropy-Based Criteria Algorithms

Authors
Bessa, R; Miranda, V; Gama, J;

Publication
2008 10TH INTERNATIONAL CONFERENCE ON PROBABILISTIC METHODS APPLIED TO POWER SYSTEMS

Abstract
This paper reports new results in applying entropy concepts to the training of mappers such as neural networks to perform wind power prediction as a function of wind characteristics (mainly speed and direction) in wind parks connected to a power grid. Renyi's Entropy is combined with a Parzen Windows estimation of the error pdf to form the basis of three criteria (MEE, MCC and MEEF) under which neural networks are trained. The results compare favourably with the traditional minimum square error (MSE) criterion. Real case examples for two distinct wind parks are presented.
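The combination of Renyi's entropy with a Parzen window estimate of the error pdf can be sketched as follows. The code assumes Renyi's quadratic entropy (order 2) and a Gaussian kernel, a common pairing because the estimate then reduces to a double sum over pairs of errors. The order, the kernel, and the function names are assumptions, not details given in the abstract.

```python
import math


def gaussian_kernel(u, sigma):
    """Gaussian density with standard deviation sigma, evaluated at u."""
    return math.exp(-u * u / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def information_potential(errors, sigma):
    """Integral of the squared Parzen estimate of the error pdf.

    For Gaussian kernels of width sigma, this integral equals the mean of
    Gaussian kernels of width sigma * sqrt(2) over all pairs of errors.
    """
    n = len(errors)
    pair_sigma = sigma * math.sqrt(2.0)
    total = sum(gaussian_kernel(ei - ej, pair_sigma) for ei in errors for ej in errors)
    return total / (n * n)


def renyi_quadratic_entropy(errors, sigma):
    """Estimated quadratic Renyi entropy of the errors: lower when the
    errors are concentrated, higher when they are spread out."""
    return -math.log(information_potential(errors, sigma))
```

A minimum-error-entropy style criterion would adjust the network weights to reduce this quantity, which rewards errors that cluster tightly together; unlike MSE, it is insensitive to a constant shift in the errors, so the error mean has to be handled separately.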
