Publications by Luís Torgo

2001

A study on end-cut preference in least squares regression trees

Authors
Torgo, L;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Regression trees are models developed to deal with multiple regression data analysis problems. These models fit constants to a set of axes-parallel partitions of the input space defined by the predictor variables. These partitions are described by a hierarchy of logical tests on the input variables of the problem. Several authors have remarked that the criteria used to select these tests have a clear preference for what is known as end-cut splits. These splits lead to branches with few training cases, which domain experts usually consider counter-intuitive. In this paper we describe an empirical study of the effect of this end-cut preference on a large set of regression domains. The results of this study, carried out for the particular case of least squares regression trees, contradict the prior belief that this type of test should be avoided. As a consequence of these results, we present a new method to handle these tests that we have empirically shown to have better predictive accuracy than the alternatives that are usually considered in tree-based models. © Springer-Verlag Berlin Heidelberg 2001.
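
To make the notion of an end-cut split concrete, the following is a minimal sketch of a least squares split search on a single numeric predictor. It is not the method proposed in the paper: the function names, the midpoint cut rule, and the `min_leaf` parameter are illustrative assumptions, and `min_leaf` only mimics the commonly used alternative of forbidding very small branches.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_ls_split(x, y, min_leaf=1):
    """Exhaustive least squares split search on one numeric predictor.

    Returns (cut, total_sse, n_left, n_right) for the cut that minimizes
    the summed SSE of the two branches, or None if no valid cut exists.
    min_leaf is a hypothetical parameter: the smallest branch size allowed.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = None
    for i in range(min_leaf, n - min_leaf + 1):
        # Cannot cut between two identical predictor values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        total = sse(left) + sse(right)
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint, an assumption
        if best is None or total < best[1]:
            best = (cut, total, i, n - i)
    return best


# Toy data (invented for illustration): one extreme target value at the
# end of the predictor range pulls the best split toward that end.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [10.0, 10.5, 9.5, 10.0, 10.2, 9.8, 10.1, 30.0]
unconstrained = best_ls_split(x, y)
constrained = best_ls_split(x, y, min_leaf=2)
```

On this toy data the unconstrained search isolates the single extreme case in its own branch, which is the end-cut behaviour the abstract refers to. Setting `min_leaf=2` forces the cut further from the end of the range.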

2011

2D-interval predictions for time series

Authors
Torgo, L; Ohashi, O;

Publication
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Abstract
Research on time series forecasting is mostly focused on point predictions - models are obtained to estimate the expected value of the target variable for a certain point in the future. However, for several relevant applications this type of forecast has limited utility (e.g. customer wallet value estimation, wind and electricity power production, control of water quality, etc.). For these domains it is frequently more important to be able to forecast a range of plausible future values of the target variable. A typical example is wind power production, where it is of high relevance to predict the future wind variability in order to ensure that supply and demand are balanced. This type of prediction will allow timely actions to be taken in order to cope with the expected values of the target variable on a certain future time horizon. In this paper we study this type of prediction - the prediction of a range of expected values for a future time interval. We describe some possible approaches to this task and propose an alternative procedure that our extensive experiments on both artificial and real world domains show to have clear advantages. Copyright 2011 ACM.

2005

Regression Error Characteristic Surfaces

Authors
Torgo, L;

Publication
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Abstract
This paper presents a generalization of Regression Error Characteristic (REC) curves. REC curves describe the cumulative distribution function of the prediction error of models and can be seen as a generalization of ROC curves to regression problems. REC curves provide useful information for analyzing the performance of models, particularly when compared to error statistics such as the Mean Squared Error. In this paper we present Regression Error Characteristic (REC) surfaces that introduce a further degree of detail by plotting the cumulative distribution function of the errors across the distribution of the target variable, i.e. the joint cumulative distribution function of the errors and the target variable. This provides a more detailed analysis of the performance of models when compared to REC curves. This extra detail is particularly relevant in applications with non-uniform error costs, where it is important to study the performance of models for specific ranges of the target variable. In this paper we present the notion of REC surfaces, describe how to use them to compare the performance of models, and illustrate their use with an important practical class of applications: the prediction of rare extreme values. Copyright 2005 ACM.
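
As a rough illustration of the two quantities described in this abstract, the sketch below computes an empirical REC curve and a single point of an empirical REC surface. It is not code from the paper. The function names and the choice of absolute error as the error measure are assumptions made for this example.

```python
def rec_curve(y_true, y_pred):
    """Empirical REC curve: for each observed absolute error e, the
    fraction of cases whose absolute error is at most e."""
    errors = sorted(abs(t - p) for t, p in zip(y_true, y_pred))
    n = len(errors)
    tolerances, accuracies = [], []
    for i, e in enumerate(errors, start=1):
        if tolerances and tolerances[-1] == e:
            # Tied errors share one tolerance; keep the highest fraction.
            accuracies[-1] = i / n
        else:
            tolerances.append(e)
            accuracies.append(i / n)
    return tolerances, accuracies


def rec_surface_value(y_true, y_pred, err_tol, y_max):
    """One point of an empirical REC surface: the fraction of cases with
    absolute error <= err_tol AND true target value <= y_max, i.e. the
    empirical joint cumulative distribution of error and target."""
    n = len(y_true)
    hits = sum(
        1
        for t, p in zip(y_true, y_pred)
        if abs(t - p) <= err_tol and t <= y_max
    )
    return hits / n


# Toy values, invented for illustration only.
y_true = [1, 2, 3, 4]
y_pred = [1, 2.5, 2, 6]
tolerances, accuracies = rec_curve(y_true, y_pred)
low_range = rec_surface_value(y_true, y_pred, err_tol=1, y_max=2)
full_range = rec_surface_value(y_true, y_pred, err_tol=1, y_max=4)
```

When `y_max` covers the whole range of the target variable, the surface value reduces to the REC curve value at the same tolerance. Smaller values of `y_max` restrict the analysis to a specific range of the target.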

2010

Predictive models for forecasting hourly urban water demand

Authors
Herrera, M; Torgo, L; Izquierdo, J; Perez Garcia, R;

Publication
Journal of Hydrology

Abstract
One of the goals of efficient water supply management is the regular supply of clean water at the pressure required by consumers. In this context, predicting water consumption in urban areas is of key importance for water supply management. This prediction is also relevant in processes for reviewing prices, as well as for operational management of a water network. In this paper, we describe and compare a series of predictive models for forecasting water demand. The models are obtained using time series data from water consumption in an urban area of a city in south-eastern Spain. This includes highly non-linear time series data, which has conditioned the type of models we have included in our study. Namely, we have considered artificial neural networks, projection pursuit regression, multivariate adaptive regression splines, random forests and support vector regression. Apart from these models, we also propose a simple model based on the weighted demand profile resulting from our exploratory analysis of the data. In our comparative study, all predictive models were evaluated using an experimental methodology for hourly time series data that detailed water demand in a hydraulic sector of a water supply network in a city in south-eastern Spain. The accuracy of the obtained results, together with the medium size of the demand area, suggests that this was a suitable environment for making adequate management decisions.

2005

Adapting peepholing to regression trees

Authors
Torgo, L; Marques, J;

Publication
Progress in Artificial Intelligence, Proceedings

Abstract
This paper presents an adaptation of the peepholing method to regression trees. Peepholing was described as a means to overcome the major computational bottleneck of growing classification trees by Catlett [3]. This method involves two major steps: shortlisting and blinkering. The former has the goal of eliminating some continuous variables from consideration when growing the tree, while the second tries to restrict the range of values of the remaining continuous variables that should be considered when searching for the best cut point split. Both are effective means of overcoming the most costly step of growing tree-based models: sorting the values of the continuous variables before selecting their best split. In this work we describe the adaptations that are necessary to use this method within regression trees. The major adaptations involve developing means to obtain biased estimates of the criterion used to select the best split of these models. We present some preliminary experiments that show the effectiveness of our proposal.

2010

Data Mining with R: Learning with Case Studies

Authors
Torgo, L;

Publication

Abstract