Publications

Publications by LIAAD

2003

Visualization and evaluation support of knowledge discovery through the predictive model markup language

Authors
Wettschereck, D; Jorge, A; Moyle, S;

Publication
KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS

Abstract
PMML (Predictive Model Markup Language), the emerging standard for the platform- and system-independent representation of data mining models, is currently supported by a number of knowledge discovery support engines. The primary purpose of the PMML standard is to separate model generation from model storage in order to enable users to view, post-process, and utilize data mining models independently of the tool that generated the model. In this paper two systems, called VizWiz and PEAR, are described. These software packages allow for the visualization and evaluation of data mining models that are specified in PMML. They can be viewed as decision support systems, since they enable non-expert users of data mining results to interactively inspect and evaluate these results.


2003

Predicting outliers

Authors
Torgo, L; Ribeiro, R;

Publication
KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2003, PROCEEDINGS

Abstract
This paper describes a method designed for data mining applications where the main goal is to predict extreme and rare values of a continuous target variable, as well as to understand under which conditions these values occur. Our objective is to induce models that are accurate at predicting these outliers but are also interpretable from the user perspective. We describe a new splitting criterion for regression trees that enables the induction of trees achieving these goals. We evaluate our proposal on several real world problems and contrast the obtained models with standard regression trees. The results of this evaluation show the clear advantage of our proposal in terms of the evaluation statistics that are relevant for these applications.


2003

Predicting harmful algae blooms

Authors
Ribeiro, R; Torgo, L;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
In several applications the main interest resides in predicting rare and extreme values. This is the case of the prediction of harmful algae blooms. Though rare, these blooms have a strong impact on river life forms and water quality, and they are a serious ecological problem. In this paper, we describe a data mining method whose main goal is to accurately predict this kind of rare extreme value. We propose a new splitting criterion for regression trees that enables the induction of trees achieving these goals. We carry out an analysis of the results obtained with our method on this application domain and compare them to those obtained with standard regression trees. We conclude that this new method achieves better results in terms of the evaluation statistics that are relevant for this kind of application.


2003

Hierarchical and Pyramidal Clustering for Symbolic Data

Authors
Brito, P;

Publication
Journal of the Japanese Society of Computational Statistics


2003

Mining official data

Authors
Brito, P; Malerba, D;

Publication
Intelligent Data Analysis


2003

Symbolic clustering of constrained probabilistic data

Authors
Brito, P; de Carvalho, FAT;

Publication
EXPLORATORY DATA ANALYSIS IN EMPIRICAL RESEARCH, PROCEEDINGS

Abstract
In previous work (Brito and De Carvalho (1999)) we considered the presence of dependence rules between variables in the framework of a symbolic clustering method. In another paper, Brito (1998) addressed the problem of clustering probabilistic data. The aim of this paper is to bring together the two issues, that is, to take into account dependence rules on probabilistic data. This is accomplished by introducing new generality measures with an appropriate generalization operator. This approach allows for the extension of a symbolic clustering method to constrained probabilistic data.
