Publications

Publications by Vítor Santos Costa

2012

Relational differential prediction

Authors
Nassif, H; Santos Costa, V; Burnside, ES; Page, D;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
A typical classification problem involves building a model to correctly segregate instances of two or more classes. Such a model exhibits differential prediction with respect to given data subsets when its performance is significantly different over these subsets. Driven by a mammography application, we aim at learning rules that predict breast cancer stage while maximizing differential prediction over age-stratified data. In this work, we present the first multi-relational differential prediction (aka uplift modeling) system, and propose three different approaches to learn differential predictive rules within the Inductive Logic Programming framework. We first test and validate our methods on synthetic data, then apply them on a mammography dataset for breast cancer stage differential prediction rule discovery. We mine a novel rule linking calcification to in situ breast cancer in older women. © 2012 Springer-Verlag.

2012

Conceptual clustering of multi-relational data

Authors
Fonseca, NA; Santos Costa, V; Camacho, R;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
"Traditional" clustering, in broad sense, aims at organizing objects into groups (clusters) whose members are "similar" among them and are "dissimilar" to objects belonging to the other groups. In contrast, in conceptual clustering the underlying structure of the data together with the description language which is available to the learner is what drives cluster formation, thus providing intelligible descriptions of the clusters, facilitating their interpretation. We present a novel conceptual clustering system for multi-relational data, based on the popular k-medoids algorithm. Although clustering is, generally, not straightforward to evaluate, experimental results on several applications show promising results. Clusters generated without class information agree very well with the true class labels of cluster's members. Moreover, it was possible to obtain intelligible and meaningful descriptions of the clusters. © 2012 Springer-Verlag Berlin Heidelberg.

2011

Assessing the Effect of 2D Fingerprint Filtering on ILP-Based Structure-Activity Relationships Toxicity Studies in Drug Design

Authors
Camacho, R; Pereira, M; Costa, VS; Fonseca, NA; Simoes, CJV; Brito, RMM;

Publication
5th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2011)

Abstract
The rational development of new drugs is a complex and expensive process. A myriad of factors affect the activity of putative candidate molecules in vivo and the propensity for causing adverse and toxic effects is recognised as the major hurdle behind the current "target-rich, lead-poor" scenario. Structure-Activity Relationship studies, using relational Machine Learning algorithms, proved already to be very useful in the complex process of rational drug design. However, a typical problem with those studies concerns the use of available repositories of previously studied molecules. It is quite often the case that those repositories are highly biased since they contain lots of molecules that are similar to each other. This results from the common practice where an expert chemist starts off with a lead molecule, presumed to have some potential, and then introduces small modifications to produce a set of similar molecules. Thus, the resulting sets have a kind of similarity bias. In this paper we assess the advantages of filtering out similar molecules in order to improve the application of relational learners in Structure-Activity Relationship (SAR) problems to predict toxicity. Furthermore, we also assess the advantage of using a relational learner to construct comprehensible models that may be quite valuable to bring insights into the workings of toxicity.

2011

On the Portability of Prolog Applications

Authors
Wielemaker, J; Costa, VS;

Publication
Practical Aspects of Declarative Languages

Abstract
The non-portability of Prolog programs is widely considered one of the main problems facing Prolog programmers. Although since 1995, the core of the language is covered by the ISO standard 13211-1, this standard has not been sufficient to support large Prolog applications. As an approach to address this problem, since 2007, YAP and SWI-Prolog have established a basic compatibility framework. The aim of the framework is running the same code on Edinburgh-based Prolog systems rather than having to migrate an application. This article describes the implementation and evaluates this framework by studying how it can be used on a number of libraries and an important application.

2010

Chess Revision: Acquiring the Rules of Chess Variants through FOL Theory Revision from Examples

Authors
Muggleton, S; Paes, A; Costa, VS; Zaverucha, G;

Publication
Inductive Logic Programming

Abstract
The game of chess has been a major testbed for research in artificial intelligence, since it requires focus on intelligent reasoning. Particularly, several challenges arise to machine learning systems when inducing a model describing legal moves of the chess, including the collection of the examples, the learning of a model correctly representing the official rules of the game, covering all the branches and restrictions of the correct moves, and the comprehensibility of such a model. Besides, the game of chess has inspired the creation of numerous variants, ranging from faster to more challenging or to regional versions of the game. The question arises if it is possible to take advantage of an initial classifier of chess as a starting point to obtain classifiers for the different variants. We approach this problem as an instance of theory revision from examples. The initial classifier of chess is inspired by a FOL theory approved by a chess expert and the examples are defined as sequences of moves within a game. Starting from a standard revision system, we argue that abduction and negation are also required to best address this problem. Experimental results show the effectiveness of our approach.

2010

Predicting the Start of Protein alpha-Helices Using Machine Learning Algorithms

Authors
Camacho, R; Ferreira, R; Rosa, N; Guimaraes, V; Fonseca, NA; Costa, VS; de Sousa, M; Magalhaes, A;

Publication
Advances in Bioinformatics

Abstract