2001
Authors
Soares, C; Petrak, J; Brazdil, P;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
When facing the need to select the most appropriate algorithm to apply to a new data set, data analysts often follow an approach that can be likened to test-driving cars before deciding which one to buy: they apply the algorithms to a sample of the data to quickly obtain rough estimates of their performance. These estimates are used to select one or a few of those algorithms to be tried out on the full data set. We describe sampling-based landmarks (SL), a systematization of this approach that builds on earlier work on landmarking and sampling. SL are estimates of the performance of algorithms on a small sample of the data, used as predictors of the performance of those algorithms on the full set. We also describe relative landmarks (RL), which address the inability of earlier landmarks to assess the relative performance of algorithms: RL aggregate landmarks to obtain predictors of relative performance. Our experiments indicate that the combination of these two improvements, which we call Sampling-based Relative Landmarks, is better for ranking than traditional data characterization measures. © Springer-Verlag Berlin Heidelberg 2001.
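To make the idea concrete, the sketch below estimates sampling-based landmarks for a set of candidate classifiers and derives pairwise relative landmarks from them. It assumes scikit-learn-style estimators; the function names and sample sizes are illustrative assumptions, not the paper's implementation.

```python
from itertools import combinations

from sklearn.model_selection import train_test_split

def sampling_landmarks(X, y, algorithms, sample_size=500, seed=0):
    """Estimate each algorithm's accuracy on a small sample of the data.

    `algorithms` maps a name to an (unfitted) scikit-learn-style estimator.
    """
    # Draw a small stratified sample of the full data set.
    X_s, _, y_s, _ = train_test_split(
        X, y, train_size=min(sample_size, len(X) // 2),
        stratify=y, random_state=seed)
    # Split the sample into train/test to obtain rough accuracy estimates.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_s, y_s, test_size=0.3, random_state=seed)
    return {name: algo.fit(X_tr, y_tr).score(X_te, y_te)
            for name, algo in algorithms.items()}

def relative_landmarks(landmarks):
    """Aggregate landmarks into pairwise relative-performance predictors."""
    return {(a, b): landmarks[a] - landmarks[b]
            for a, b in combinations(sorted(landmarks), 2)}
```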
2001
Authors
Brazdil, P; Soares, C; Pereira, R;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
Several methods have been proposed to generate rankings of supervised classification algorithms based on their previous performance on other datasets [8,4]. Like any other prediction method, ranking methods will sometimes err; for instance, they may not rank the best algorithm in the first position. Often the user is willing to try more than one algorithm to increase the chance of identifying the best one. The information provided by the ranking methods mentioned above is not adequate for this purpose: they do not identify those algorithms in the ranking that have a reasonable possibility of performing best. In this paper, we describe a method for that purpose. We compare our method to the strategy of executing all algorithms and to a very simple reduction method that consists of running the top three algorithms. Throughout this work we take both time and accuracy into account. As expected, our method performs better than the simple reduction method and shows more stable behavior than running all algorithms. © Springer-Verlag Berlin Heidelberg 2001.
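The abstract does not spell out the selection rule, so the sketch below uses a simple accuracy-margin criterion as a hypothetical stand-in: keep every ranked algorithm whose estimate is close enough to the leader to have a reasonable possibility of actually being best.

```python
def select_candidates(ranking, margin=0.02):
    """ranking: list of (algorithm, estimated_accuracy) pairs, best first.

    Returns the algorithms worth executing, i.e. those within `margin`
    of the top estimate. margin=0 reduces to running only the top one;
    a very large margin reduces to running all algorithms.
    """
    best = ranking[0][1]
    return [name for name, acc in ranking if best - acc <= margin]

# Illustrative usage with made-up estimates:
ranking = [("boosting", 0.91), ("svm", 0.90), ("knn", 0.86), ("nb", 0.81)]
print(select_candidates(ranking))  # ['boosting', 'svm']
```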
2001
Authors
Gama, J;
Publication
Advances in Intelligent Data Analysis, 4th International Conference, IDA 2001, Cascais, Portugal, September 13-15, 2001, Proceedings
Abstract
In this paper we present and evaluate a new algorithm for supervised regression problems. The algorithm combines a univariate regression tree with a linear regression function by means of constructive induction. When growing the tree, at each internal node, a linear regression function creates one new attribute. This new attribute is the instantiation of the regression function for each example that falls at this node. The new instance space is propagated down through the tree, and tests based on these new attributes correspond to an oblique decision surface. Our approach can be seen as a hybrid model that combines a linear regression, known to have low variance, with a regression tree, known to have low bias. Our algorithm was compared against its components, two simplified versions, and M5 using 16 benchmark datasets. The experimental evaluation shows that our algorithm has clear advantages in generalization ability over its components and competes well against the state of the art in regression trees. © Springer-Verlag Berlin Heidelberg 2001.
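A minimal sketch of the constructive-induction step, assuming NumPy arrays and scikit-learn's LinearRegression; it shows how one new attribute is created at an internal node and propagated down, not the authors' actual code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def augment_node(X, y):
    """Fit a linear model on the examples that reached this node and append
    its prediction as one new attribute. A univariate test on this attribute
    corresponds to an oblique decision surface in the original space."""
    lin = LinearRegression().fit(X, y)
    new_attr = lin.predict(X).reshape(-1, 1)  # instantiated per example
    return np.hstack([X, new_attr]), lin      # augmented space goes down the tree
```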
2001
Authors
Amado, N; Gama, J; Silva, FMA;
Publication
Progress in Artificial Intelligence, Knowledge Extraction, Multi-agent Systems, Logic Programming and Constraint Solving, 10th Portuguese Conference on Artificial Intelligence, EPIA 2001, Porto, Portugal, December 17-20, 2001, Proceedings
Abstract
In the fields of data mining and machine learning, the amount of data available for building classifiers is growing very fast. There is therefore a great need for algorithms that can build classifiers from very large datasets while remaining computationally efficient and scalable. One possible solution is to employ parallelism to reduce the time spent building classifiers from very large datasets while preserving classification accuracy. This work first overviews some strategies for implementing decision tree construction algorithms in parallel, based on techniques such as task parallelism, data parallelism, and hybrid parallelism. We then describe a new parallel implementation of the C4.5 decision tree construction algorithm. Even though the implementation is still in its final development phase, we present some experimental results that can be used to predict the expected behavior of the algorithm. © Springer-Verlag Berlin Heidelberg 2001.
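As a rough illustration of task parallelism at the attribute level, the sketch below evaluates candidate splits for different attributes in separate worker processes and picks the best one. The information-gain scoring is a simplified stand-in for C4.5's gain ratio, and the code is not the implementation described in the paper.

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def attribute_gain(args):
    """Score one attribute: information gain of a median split (simplified)."""
    X, y, j = args
    thr = np.median(X[:, j])
    left, right = y[X[:, j] <= thr], y[X[:, j] > thr]
    if len(left) == 0 or len(right) == 0:
        return j, 0.0
    child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
    return j, entropy(y) - child

def best_attribute_parallel(X, y, n_workers=4):
    tasks = [(X, y, j) for j in range(X.shape[1])]   # one task per attribute
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        scores = dict(pool.map(attribute_gain, tasks))
    return max(scores, key=scores.get)               # highest-gain attribute
```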
2001
Authors
Gama, J;
Publication
2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS
Abstract
The design of algorithms that explore multiple representation languages and different search spaces has an intuitive appeal. In the context of classification problems, algorithms that generate multivariate trees can explore multiple representation languages by using decision tests based on a combination of attributes. The same applies to model tree algorithms in regression domains, which use linear models at leaf nodes. In this paper we study where to use combinations of attributes in decision tree learning. We present an algorithm for multivariate tree learning that combines a univariate decision tree with a discriminant function by means of constructive induction. This algorithm is able to use decision nodes with multivariate tests and leaf nodes that predict a class using a discriminant function. Multivariate decision nodes are built when growing the tree, while functional leaves are built when pruning the tree. Functional trees can be seen as a generalization of multivariate trees. Our algorithm was compared against its components and two simplified versions using 30 benchmark datasets. The experimental evaluation shows that our algorithm has clear advantages with respect to generalization ability and model size at statistically significant confidence levels.
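A small sketch of the growing phase's constructive-induction step for classification, using scikit-learn's LinearDiscriminantAnalysis as the discriminant function; the names and the choice of discriminant are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def augment_with_discriminant(X, y):
    """Fit a discriminant on the examples at this node and append its
    class-probability outputs as new attributes; a univariate test on one
    of them acts as a multivariate (oblique) decision test."""
    disc = LinearDiscriminantAnalysis().fit(X, y)
    return np.hstack([X, disc.predict_proba(X)]), disc
```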
2001
Authors
Gama, J;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
The design of algorithms that explore multiple representation languages and different search spaces has an intuitive appeal. In the context of classification problems, algorithms that generate multivariate trees can explore multiple representation languages by using decision tests based on a combination of attributes. The same applies to model tree algorithms in regression domains, which use linear models at leaf nodes. In this paper we study where to use combinations of attributes in regression and classification tree learning. We present an algorithm for multivariate tree learning that combines a univariate decision tree with a linear function by means of constructive induction. This algorithm is able to use decision nodes with multivariate tests and leaf nodes that make predictions using linear functions. Multivariate decision nodes are built when growing the tree, while functional leaves are built when pruning the tree. The algorithm has been implemented for both classification and regression problems. The experimental evaluation shows that our algorithm has clear advantages in generalization ability over its components and two simplified versions, and competes well against the state of the art in multivariate regression and classification trees. © Springer-Verlag Berlin Heidelberg 2001.
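For the pruning phase, a hedged sketch of the choice between a constant leaf and a functional leaf in the regression setting; the held-out error estimate used here is an illustrative stand-in for whatever estimate the algorithm actually applies.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def choose_leaf(X, y, seed=0):
    """Return ('constant', mean) or ('functional', model), whichever has the
    lower error on a held-out part of the examples reaching this leaf.
    (On training data alone a fitted linear model would always win.)"""
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3,
                                              random_state=seed)
    const_err = np.mean((y_va - y_tr.mean()) ** 2)
    model = LinearRegression().fit(X_tr, y_tr)
    model_err = np.mean((y_va - model.predict(X_va)) ** 2)
    if model_err < const_err:
        return "functional", LinearRegression().fit(X, y)  # refit on all data
    return "constant", float(y.mean())
```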