Publications

Publications by Vítor Santos Costa

2005

A framework for set-oriented computation in inductive logic programming and its application in generalizing inverse entailment

Authors
Bravo, HC; Page, D; Ramakrishnan, R; Shavlik, J; Costa, VS;

Publication
Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)

Abstract
We propose a new approach to Inductive Logic Programming (ILP) that systematically exploits caching and offers a number of advantages over current systems. It avoids redundant computation, is more amenable to the use of set-oriented generation and evaluation of hypotheses, and allows relational DBMS technology to be more easily applied to ILP systems. Further, our approach opens up new avenues such as probabilistically scoring rules during search and the generation of probabilistic rules. As a first example of the benefits of our ILP framework, we propose a scheme for defining the hypothesis search space through Inverse Entailment using multiple example seeds. © Springer-Verlag Berlin Heidelberg 2005.


2005

Probabilistic first-order theory revision from examples

Authors
Paes, A; Revoredo, K; Zaverucha, G; Costa, VS;

Publication
Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)

Abstract
Recently, there has been significant work in the integration of probabilistic reasoning with first-order logic representations. Learning algorithms for these models have been developed and they all considered modifications in the entire structure. In a previous work we argued that, when the theory is approximately correct, the use of techniques from theory revision to modify the structure only in places that failed in classification can be a more adequate choice. To score these modifications and choose the best one, the log likelihood was used. However, this function was shown not to be well-suited to the propositional Bayesian classification task, and the conditional log likelihood should be used instead. In the present paper, we extend this revision system, showing the necessity of using specialization operators even when there are no negative examples. Moreover, the results of a theory modified only in places that are responsible for the misclassification of some examples are compared with the one that was modified in the entire structure using three databases and considering four probabilistic score functions, including conditional log likelihood. © Springer-Verlag Berlin Heidelberg 2005.


2003

Toward automatic management of embarrassingly parallel applications

Authors
Dutra, I; Page, D; Costa, VS; Shavlik, J; Waddell, M;

Publication
EURO-PAR 2003 PARALLEL PROCESSING, PROCEEDINGS

Abstract
Large-scale applications that require executing very large numbers of tasks are only feasible through parallelism. In this work we present a system that automatically handles large numbers of experiments and data in the context of machine learning. Our system controls all experiments, including re-submission of failed jobs and relies on available resource managers to spawn jobs through pools of machines. Our results show that we can manage a very large number of experiments, using a reasonable amount of idle CPU cycles, with very little user intervention.


2003

An empirical evaluation of bagging in inductive logic programming

Authors
De Dutra, IC; Page, D; Costa, VS; Shavlik, J;

Publication
Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)

Abstract
Ensembles have proven useful for a variety of applications, with a variety of machine learning approaches. While Quinlan has applied boosting to FOIL, the widely-used approach of bagging has never been employed in ILP. Bagging has the advantage over boosting that the different members of the ensemble can be learned and used in parallel. This advantage is especially important for ILP where run-times often are high. We evaluate bagging on three different application domains using the complete-search ILP system, Aleph. We contrast bagging with an approach where we take advantage of the non-determinism in ILP search, by simply allowing Aleph to run multiple times, each time choosing "seed" examples at random.


2011

Trebuchet: Exploring TLP with dataflow virtualisation

Authors
Alves, TAO; Marzulo, LAJ; Franca, FMG; Costa, VS;

Publication
International Journal of High Performance Systems Architecture

Abstract
Parallel programming has become mandatory to fully exploit the potential of multi-core CPUs. The dataflow model provides a natural way to exploit parallelism. However, specifying dependences and control using fine-grained instructions in dataflow programs can be complex and present unwanted overheads. To address this issue, we have designed TALM: a coarse-grained dataflow execution model to be used on top of widespread architectures. We implemented TALM as the Trebuchet virtual machine for multi-cores. The programmer identifies code blocks that can run in parallel and connects them to form a dataflow graph, which allows one to have the benefits of parallel dataflow execution in a Von Neumann machine, with small programming effort. We parallelised a set of seven applications using our approach and compared with OpenMP implementations. Results show that Trebuchet can be competitive with state-of-the-art technology, while providing the benefits of dataflow execution. Copyright © 2011 Inderscience Enterprises Ltd.


2007

Improving model construction of profile HMMs for remote homology detection through structural alignment

Authors
Bernardes, JS; Davila, AM; Costa, VS; Zaverucha, G;

Publication
BMC BIOINFORMATICS

Abstract
Background: Remote homology detection is a challenging problem in Bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages present a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Next, we assess the impact of using structural alignments in pHMM performance. Results: We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and a paired two-tailed t-test. Conclusion: We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignments in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher quality pHMMs. On the other hand, sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvements in this area.
