Publications

Publications by João Gama

1997

Search-based class discretization

Authors
Torgo, L; Gama, J;

Publication
Machine Learning: ECML-97

Abstract
We present a methodology that enables the use of classification algorithms on regression tasks. We implement this method in the system RECLA, which transforms a regression problem into a classification one and then uses an existing classification system to solve this new problem. The transformation consists of mapping a continuous variable into an ordinal variable by grouping its values into an appropriate set of intervals. We use misclassification costs as a means to reflect the implicit ordering among the ordinal values of the new variable. We describe a set of alternative discretization methods and, based on our experimental results, justify the need for a search-based approach to choose the best method. Our experimental results confirm the validity of our search-based approach to class discretization and reveal the accuracy benefits of adding misclassification costs.
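The core transformation described in the abstract can be sketched in a few lines. The snippet below is only an illustration of the idea, not the RECLA system itself: it bins the continuous target into k ordinal classes using equal-frequency binning (one of the discretization methods the paper compares), trains an off-the-shelf classifier, and maps each predicted class back to a representative numeric value. The search over discretization methods and the misclassification-cost machinery are omitted, and all function and variable names are illustrative.

# Illustrative sketch only (not the RECLA system): discretize a continuous
# target into k ordinal classes, train an ordinary classifier, and map each
# predicted class back to a representative numeric value.
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # any classifier would do

def fit_discretized_regressor(X, y, k=4):
    """Equal-frequency binning of y into k intervals (one of several
    possible discretization methods)."""
    edges = np.quantile(y, np.linspace(0, 1, k + 1)[1:-1])   # interior cut points
    classes = np.digitize(y, edges)                          # ordinal labels 0..k-1
    medians = np.array([np.median(y[classes == c]) for c in range(k)])
    clf = DecisionTreeClassifier().fit(X, classes)
    return clf, medians

def predict_numeric(clf, medians, X):
    """Predict an ordinal class, then return that interval's median."""
    return medians[clf.predict(X)]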

2009

Advanced Data Mining and Applications

Authors
Huang, R; Yang, Q; Pei, J; Gama, J; Meng, X; Li, X;

Publication
Lecture Notes in Computer Science

Abstract

2010

Knowledge Discovery from Sensor Data

Authors
Gaber, MM; Vatsavai, RR; Omitaomu, OA; Gama, J; Chawla, NV; Ganguly, AR;

Publication
Lecture Notes in Computer Science

Abstract

2001

Functional trees

Authors
Gama, J;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
The design of algorithms that explore multiple representation languages and different search spaces has an intuitive appeal. In the context of classification problems, algorithms that generate multivariate trees are able to explore multiple representation languages by using decision tests based on a combination of attributes. The same applies to model tree algorithms in regression domains, but using linear models at leaf nodes. In this paper we study where to use combinations of attributes in regression and classification tree learning. We present an algorithm for multivariate tree learning that combines a univariate decision tree with a linear function by means of constructive induction. This algorithm is able to use decision nodes with multivariate tests, and leaf nodes that make predictions using linear functions. Multivariate decision nodes are built when growing the tree, while functional leaves are built when pruning the tree. The algorithm has been implemented both for classification problems and regression problems. The experimental evaluation shows that our algorithm has clear advantages with respect to generalization ability when compared against its components and two simplified versions, and competes well against the state of the art in multivariate regression and classification trees. © Springer-Verlag Berlin Heidelberg 2001.
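To make the notion of functional leaves concrete, here is a rough Python illustration under stated assumptions: it grows a shallow univariate tree and then fits a linear model on the training examples that reach each leaf. It does not reproduce the paper's constructive-induction algorithm, its multivariate decision nodes, or its pruning-time construction of functional leaves; the class and method names are hypothetical.

# Rough illustration of "functional leaves" only (not the full algorithm
# from the paper): grow a shallow univariate tree, then fit a linear model
# on the examples that fall into each leaf.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

class FunctionalLeafTree:
    def __init__(self, max_depth=3):
        self.tree = DecisionTreeRegressor(max_depth=max_depth)
        self.leaf_models = {}

    def fit(self, X, y):
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)                 # leaf index of each example
        for leaf in np.unique(leaves):
            idx = leaves == leaf
            self.leaf_models[leaf] = LinearRegression().fit(X[idx], y[idx])
        return self

    def predict(self, X):
        leaves = self.tree.apply(X)
        out = np.empty(len(X))
        for leaf, model in self.leaf_models.items():
            mask = leaves == leaf
            if mask.any():
                out[mask] = model.predict(X[mask])  # linear prediction per leaf
        return out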

2010

Clustering from Data Streams

Authors
Shultz, TR; Fahlman, SE; Craw, S; Andritsos, P; Tsaparas, P; Silva, R; Drummond, C; Ling, CX; Sheng, VS; Drummond, C; Lanzi, PL; Gama, J; Wiegand, RP; Sen, P; Namata, G; Bilgic, M; Getoor, L; He, J; Jain, S; Stephan, F; Jain, S; Stephan, F; Sammut, C; Harries, M; Sammut, C; Ting, KM; Pfahringer, B; Case, J; Jain, S; Wagstaff, KL; Nijssen, S; Wirth, A; Ling, CX; Sheng, VS; Zhang, X; Sammut, C; Cancedda, N; Renders, J; Michelucci, P; Oblinger, D; Keogh, E; Mueen, A;

Publication
Encyclopedia of Machine Learning

Abstract

2008

Improving the performance of an incremental algorithm driven by error margins

Authors
del Campo Avila, J; Ramos Jimenez, G; Gama, J; Morales Bueno, R;

Publication
Intelligent Data Analysis

Abstract
Classification is a highly relevant task within the data analysis field. It is not a trivial task, and different difficulties can arise depending on the nature of the problem. All these difficulties can become worse when the datasets are too large or when new information can arrive at any time. Incremental learning is an approach that can be used to deal with the classification task in these cases. It must alleviate, or solve, the problem of limited time and memory resources. One emerging approach uses concentration bounds to ensure that decisions are made when enough information supports them. IADEM is one of the most recent algorithms that use this approach. The aim of this paper is to improve the performance of this algorithm in different ways: simplifying the complexity of the induced models, adding the ability to deal with continuous data, improving the detection of noise, selecting new criteria for evolving the model, including the use of more powerful prediction techniques, etc. Besides these new properties, the new system, IADEM-2, preserves the ability to obtain performance similar to standard learning algorithms independently of the dataset size, and it can incorporate new information as the basic algorithm does: using a short time per example.
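The concentration-bound idea mentioned in the abstract can be illustrated with a Hoeffding-style check: commit to a decision only when the observed advantage of the best candidate exceeds a bound that shrinks as more examples are seen. This is only the general flavour of the technique; the actual error margins used by IADEM and IADEM-2 are more elaborate, and the names below are illustrative.

# General flavour of a concentration-bound-driven decision (not IADEM's
# exact error margins): act only when the observed advantage of the best
# option exceeds a Hoeffding-style deviation bound.
import math

def hoeffding_bound(value_range, delta, n):
    """Deviation within which the true mean lies, with probability
    1 - delta, after n independent observations bounded in value_range."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_decide(gain_best, gain_second, n, value_range=1.0, delta=1e-6):
    """Commit when the gain difference is statistically supported."""
    return (gain_best - gain_second) > hoeffding_bound(value_range, delta, n)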
