2007
Authors
Lucas, JP; Jorge, AM; Pereira, F; Pernas, AM; Machado, AA;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS
Abstract
We describe an approach and a tool for the discovery of subgroups within the framework of distribution rule mining. Distribution rules are a kind of association rule particularly suited to the exploratory study of numerical variables of interest. Because distribution rule mining is an exploratory technique, it typically produces a very large number of patterns. Exploring such results is therefore complex, which limits the practical use of the technique. To overcome this shortcoming, we developed a tool, written in Java, that supports subgroup discovery as a post-processing step. The tool engages the analyst in an interactive process of subgroup discovery through a graphical interface with well-defined statistical grounds, where domain knowledge can be used to identify such subgroups within the population. We present a case study analyzing the results of students in a large-scale university admission examination.
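To make the idea of a distribution rule concrete, the sketch below pairs an antecedent (a set of attribute = value conditions) with the empirical distribution of a numerical target in the matching subgroup, and scores how far that distribution departs from the whole population. It is a minimal illustration under stated assumptions, not the authors' Java tool: the function names `ks_statistic` and `distribution_rule` are hypothetical, and using the two-sample Kolmogorov-Smirnov statistic as the comparison is an assumption chosen for illustration.

```python
from bisect import bisect_right
from collections import Counter


def ks_statistic(sample, reference):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the two ECDFs."""
    s, r = sorted(sample), sorted(reference)
    points = sorted(set(s) | set(r))
    return max(
        abs(bisect_right(s, x) / len(s) - bisect_right(r, x) / len(r))
        for x in points
    )


def distribution_rule(records, antecedent, target):
    """Hypothetical helper: build the rule antecedent -> distribution of `target`.

    records    : list of dicts (one per case)
    antecedent : dict of attribute -> required value (conjunction of conditions)
    target     : name of the numerical variable of interest
    """
    population = [r[target] for r in records]
    subgroup = [
        r[target]
        for r in records
        if all(r.get(attr) == value for attr, value in antecedent.items())
    ]
    if not subgroup:
        return None  # the antecedent covers no cases
    return {
        "support": len(subgroup) / len(records),
        # Consequent: frequency of each target value within the subgroup.
        "distribution": sorted(Counter(subgroup).items()),
        # Assumed interest score: KS distance from the population distribution.
        "ks": ks_statistic(subgroup, population),
    }
```

For example, with toy records `[{"g": "a", "y": 1}, {"g": "a", "y": 2}, {"g": "b", "y": 3}, {"g": "b", "y": 4}]`, the rule `{"g": "a"}` on target `"y"` has support 0.5, distribution `[(1, 1), (2, 1)]`, and a KS distance of 0.5 from the population. In an interactive post-processing setting, an analyst would sort or filter many such rules by a score of this kind.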
2007
Authors
Azevedo, PJ; Jorge, AM;
Publication
Machine Learning: ECML 2007, Proceedings
Abstract
We study the predictive ability of some association rule measures typically used to assess descriptive interest. These measures, namely conviction, lift and χ², are compared with confidence, Laplace, mutual information, cosine, Jaccard and the phi-coefficient. As prediction models, we use sets of association rules. Classification is done by selecting the best rule, or by weighted voting. We performed an evaluation on 17 datasets with different characteristics and conclude that conviction is, on average, the best predictive measure to use in this setting. We also provide some meta-analysis insights to explain the results.
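The measures compared above all derive from the 2x2 contingency table of a rule A → C. The sketch below computes them from counts using common textbook definitions; these may differ in detail from the variants used in the paper. Assumptions: the Laplace correction uses k = 2 classes by default, χ² is computed without continuity correction (for a 2x2 table this equals N·φ²), and mutual information uses natural logarithms. The helpers `rule_measures` and `classify`, and the rule representation `(antecedent, class, score)`, are hypothetical illustrations of best-rule and weighted-voting prediction.

```python
import math


def rule_measures(n, n_a, n_c, n_ac, k=2):
    """Interest measures for a rule A -> C (textbook definitions, assumed).

    n: total cases; n_a: cases with A; n_c: cases with C; n_ac: cases with both.
    k: number of classes for the Laplace correction (assumed 2).
    """
    p_c = n_c / n
    conf = n_ac / n_a
    # Conviction is infinite when the rule never fails (confidence 1).
    conviction = math.inf if conf == 1 else (1 - p_c) / (1 - conf)
    phi_den = math.sqrt(n_a * n_c * (n - n_a) * (n - n_c))
    phi = (n * n_ac - n_a * n_c) / phi_den if phi_den else 0.0
    # Mutual information between the indicators of A and C, from cell counts.
    cells = [
        (n_ac, n_a, n_c),
        (n_a - n_ac, n_a, n - n_c),
        (n_c - n_ac, n - n_a, n_c),
        (n - n_a - n_c + n_ac, n - n_a, n - n_c),
    ]
    mi = sum(nxy / n * math.log(nxy * n / (nx * ny)) for nxy, nx, ny in cells if nxy > 0)
    return {
        "confidence": conf,
        "lift": n_ac * n / (n_a * n_c),
        "conviction": conviction,
        "laplace": (n_ac + 1) / (n_a + k),
        "cosine": n_ac / math.sqrt(n_a * n_c),
        "jaccard": n_ac / (n_a + n_c - n_ac),
        "phi": phi,
        "chi2": n * phi ** 2,  # 2x2 table, no continuity correction
        "mutual_information": mi,
    }


def classify(rules, instance, strategy="best"):
    """Predict a class from rules given as (antecedent frozenset, class, score).

    "best": class of the highest-scoring rule that fires;
    "vote": class with the largest sum of scores over firing rules.
    Returns None when no rule fires (a real system would use a default class).
    """
    fired = [(cls, score) for ante, cls, score in rules if ante <= instance]
    if not fired:
        return None
    if strategy == "best":
        return max(fired, key=lambda r: r[1])[0]
    votes = {}
    for cls, score in fired:
        votes[cls] = votes.get(cls, 0.0) + score
    return max(votes, key=votes.get)
```

For N = 100, n_A = 40, n_C = 50 and n_AC = 30, this gives confidence 0.75, lift 1.5, conviction 2.0, Jaccard 0.5 and χ² ≈ 16.67. Note that with the same rules, best-rule selection and weighted voting can disagree: one strong rule may win alone, while several weaker rules for another class can outvote it.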
2007
Authors
Rebelo, C; Brito, PQ; Soares, C; Jorge, A; Brandao, R;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS
Abstract
The potential value of a market segmentation for a company is usually assessed in terms of six criteria: identifiability, substantiality, accessibility, responsiveness, stability and actionability. These are widely accepted as essential criteria, but they are difficult to quantify. Quantification is particularly important in the early stages of the segmentation process, especially when automatic clustering methods are employed. With such methods, it is easy to produce a large number of segmentations, but only the most interesting ones should be selected for further analysis. In this paper, we address the problem of how to quantify the value of a segmentation according to the criteria above. We propose several measures and test them on a case study consisting of a segmentation of portal users.
2007
Authors
Torgo, L; Ribeiro, R;
Publication
Knowledge Discovery in Databases: PKDD 2007, Proceedings
Abstract
Cost-sensitive learning is a key technique for addressing many real-world data mining applications. Most existing research has focused on classification problems. In this paper, we propose a framework for evaluating regression models in applications with non-uniform costs and benefits across the domain of the continuous target variable. Namely, we describe two metrics for assessing the costs and benefits of the predictions of any model given a set of test cases. We illustrate the use of our metrics in the context of a specific type of application where non-uniform costs are required: the prediction of rare extreme values of a continuous target variable. Our experiments provide clear evidence of the utility of the proposed framework for evaluating the merits of any model in this class of regression domains.
2007
Authors
Brito, P; Cucumel, G; Bertrand, P; de Carvalho, F;
Publication
Studies in Classification, Data Analysis, and Knowledge Organization
Abstract
2007
Authors
Brito, P;
Publication
Selected Contributions in Data Analysis and Classification - Studies in Classification, Data Analysis, and Knowledge Organization
Abstract