Publications

Publications by Luís Torgo

2012

Spatial Interpolation using Multiple Regression

Authors
Ohashi, O; Torgo, L;

Publication
12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2012)

Abstract
Many real-world data mining applications involve analyzing geo-referenced data. Frequently, data sets of this type are incomplete in the sense that not all geographical coordinates have measured values of the variable(s) of interest. This incompleteness may be caused by poor data collection, measurement errors, cost management and many other factors. These missing values may cause several difficulties in many applications. Spatial imputation/interpolation methods try to fill in these unknown values in geo-referenced data sets. In this paper we propose a new spatial imputation method based on machine learning algorithms and a series of data preprocessing steps. The key distinguishing factor of this method is that it allows the use of data from faraway regions, contrary to the state of the art in spatial data mining. Images (e.g. from a satellite or video surveillance cameras) may also suffer from this incompleteness where some pixels are missing, which again may be caused by many factors. An image can be seen as a spatial data set in a Cartesian coordinate system, where each pixel (location) registers some value (e.g. the degree of gray in a black and white image). Being able to recover the original image from a partial or incomplete version of the reality is a key application in many domains (e.g. surveillance and security). In this paper we evaluate our general methodology for spatial interpolation on this type of problem. Namely, we check the ability of our method to fill in unknown pixels in several images. We compare it to state-of-the-art methods and provide strong experimental evidence of the advantages of our proposal.


2003

Clustered partial linear regression

Authors
Torgo, L; Da Costa, JP;

Publication
MACHINE LEARNING

Abstract
This paper presents a new method that deals with a supervised learning task usually known as multiple regression. The main distinguishing feature of our technique is the use of a multistrategy approach to this learning task. We use a clustering method to form subsets of the training data before the actual regression modeling takes place. This pre-clustering stage creates several training sub-samples containing cases that are "nearby" to each other from the perspective of the multidimensional input space. Supervised learning within each of these sub-samples is easier and more accurate, as our experiments show. We call the resulting method clustered partial linear regression. Predictions using these models are preceded by a cluster membership query for each test case. The cluster membership probability of a test case is used as a weight in an averaging process that calculates the final prediction. This averaging process involves the predictions of the regression models associated with the clusters to which the test case may belong. We have tested this general multistrategy approach using several regression techniques and we have observed significant accuracy gains in several data sets. We have also compared our method to bagging, which also uses an averaging process to obtain predictions. This experiment showed that the two methods are significantly different. Finally, we present a comparison of our method with several state-of-the-art regression methods, showing its competitiveness.
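
The pipeline described in this abstract (cluster the training data, fit one regression model per cluster, then average the cluster models' predictions using membership weights) can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional k-means, the single-feature least-squares fit, the Gaussian-kernel membership weights, the `bandwidth` parameter, and all function names are assumptions chosen only to keep the example short.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # constant model when x has no spread
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def nearest(x, centers):
    """Index of the cluster center closest to x."""
    return min(range(len(centers)), key=lambda k: abs(x - centers[k]))

def kmeans_1d(xs, centers, iters=20):
    """Plain k-means on one feature (assumed clustering step)."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in xs:
            groups[nearest(x, centers)].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def fit_clustered(xs, ys, init_centers):
    """Cluster the inputs, then fit one linear model per non-empty cluster."""
    centers = kmeans_1d(xs, init_centers)
    kept, models = [], []
    for j, c in enumerate(centers):
        pts = [(x, y) for x, y in zip(xs, ys) if nearest(x, centers) == j]
        if pts:  # skip clusters that ended up empty
            kept.append(c)
            models.append(fit_line([p[0] for p in pts], [p[1] for p in pts]))
    return kept, models

def predict(x, centers, models, bandwidth=1.0):
    """Average the cluster models, weighted by an assumed soft membership."""
    d2 = [(x - c) ** 2 for c in centers]
    d_min = min(d2)  # shift distances so the largest weight is exactly 1
    weights = [math.exp(-(d - d_min) / (2 * bandwidth ** 2)) for d in d2]
    total = sum(weights)
    return sum(w * (a + b * x) for w, (a, b) in zip(weights, models)) / total

# Toy data with two linear regimes
xs = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
ys = [2 * x if x < 5 else 100 - 3 * x for x in xs]
centers, models = fit_clustered(xs, ys, [0.0, 14.0])
print(predict(2.0, centers, models))  # dominated by the first cluster's model
print(predict(7.0, centers, models))  # equal blend of both cluster models
```

A test case close to one cluster center is predicted almost entirely by that cluster's model, while a case midway between two centers receives an even blend of both models.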


1998

Error Estimators for Pruning Regression Trees

Authors
Torgo, L;

Publication
Machine Learning: ECML-98, 10th European Conference on Machine Learning, Chemnitz, Germany, April 21-23, 1998, Proceedings

Abstract
This paper presents a comparative study of several methods for estimating the true error of tree-structured regression models. We evaluate these methods in the context of regression tree pruning. The study is focused on problems where large samples of data are available. We present two novel variants of existing estimation methods. We evaluate several methods that follow different approaches to the estimation problem, and perform experimental evaluation in twelve domains. The goal of this evaluation is to characterise the performance of the methods in the task of selecting the best possible tree among the alternative trees considered during pruning. The results of the comparison show that certain estimators lead to very bad decisions in some domains. Our proposed variant of the holdout method obtained the best results in the experimental comparisons. © Springer-Verlag Berlin Heidelberg 1998.


1993

Rule Combination in Inductive Learning

Authors
Torgo, L;

Publication
Machine Learning: ECML-93, European Conference on Machine Learning, Vienna, Austria, April 5-7, 1993, Proceedings

Abstract
This paper describes work on methods for combining rules obtained by machine learning systems. Three methods for obtaining the classification of examples with those rules are compared. The advantages and disadvantages of each method are discussed and the results obtained on three real-world domains are reported. The methods compared are: selection of the best rule; PROSPECTOR-like probabilistic approximation for rule combination; and MYCIN-like approximation. Results show significant differences between methods, indicating that the problem-solving strategy is important for the accuracy of learning systems. © Springer-Verlag Berlin Heidelberg 1993.
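
To illustrate why the combination strategy can change the outcome, the sketch below contrasts best-rule selection with a MYCIN-style combination of positive certainty factors, `cf1 + cf2 * (1 - cf1)`. It is only an illustration under stated assumptions, not the paper's procedure: the rule strengths, the class labels, and the function names are hypothetical, strengths are assumed to lie in [0, 1], and the PROSPECTOR-like method is not shown.

```python
def best_rule(strengths):
    """Score a class by its single strongest fired rule."""
    return max(strengths)

def mycin_combine(strengths):
    """MYCIN-style accumulation of positive certainty factors in [0, 1]."""
    combined = 0.0
    for cf in strengths:
        combined = combined + cf * (1.0 - combined)
    return combined

def classify(fired, combine):
    """Pick the class whose fired rules get the highest combined score."""
    return max(fired, key=lambda label: combine(fired[label]))

# Hypothetical strengths of the rules that fired for one example
fired = {"a": [0.7], "b": [0.5, 0.5]}
print(classify(fired, best_rule))      # class "a": 0.7 beats 0.5
print(classify(fired, mycin_combine))  # class "b": 0.75 beats 0.7
```

With these made-up strengths, one strong rule wins under best-rule selection, whereas two moderate rules for the other class accumulate to a higher score under the MYCIN-style combination.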


2000

Efficient and comprehensible local regression

Authors
Torgo, L;

Publication
KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS: CURRENT ISSUES AND NEW APPLICATIONS

Abstract
This paper describes an approach to multivariate regression that aims at improving the computational efficiency and comprehensibility of local regression techniques. Local regression modeling is known for its ability to approximate quite diverse regression surfaces with high accuracy. However, these methods are also known for being computationally demanding and for not providing any comprehensible model of the data. These two characteristics can be regarded as major drawbacks in the context of a typical data mining scenario. The method we describe tackles these problems by integrating local regression within a partition-based induction method.


2012

Wind speed forecasting using spatio-temporal indicators

Authors
Ohashi, O; Torgo, L;

Publication
20TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2012)

Abstract
From small farms to electricity markets, the interest in and importance of wind power production is continuously increasing. This interest is mainly caused by the fact that wind is a continuous source of clean energy. To take full advantage of the potential of wind power production it is crucial to have tools that accurately forecast the expected wind speed. However, forecasting the wind speed is not a trivial task. Wind speed is characterised by a random behaviour as well as several other intermittent characteristics. This paper proposes a new approach to the task of wind speed forecasting. The main distinguishing feature of this proposal is its reliance on both temporal and spatial characteristics to produce a forecast of the future wind speed. We have experimentally tested the proposed method with historical data concerning wind speed in the eastern region of the US. Nevertheless, the methodology that is described in the paper can be seen as a general approach to spatio-temporal prediction. We have compared our proposal to other standard approaches in the task of forecasting wind speed 2 hours ahead. Our extensive experiments show that our proposal has clear advantages in most setups.
