Publications - INESC TEC

Publications by Vítor Santos Costa

2008

Compile the Hypothesis Space: Do it Once, Use it Often

Authors
Fonseca, NA; Camacho, R; Rocha, R; Costa, VS;

Publication
FUNDAMENTA INFORMATICAE

Abstract
Inductive Logic Programming (ILP) is a powerful and well-developed abstraction for multi-relational data mining techniques. Despite the considerable success of ILP, deployed ILP systems still have efficiency problems when applied to complex problems. In this paper we propose a novel technique that avoids the procedure of deducing each example to evaluate each constructed clause. The technique is based on the Mode Directed Inverse Entailment approach to ILP, where a bottom clause is generated for each example and the generated clauses are subsets of the literals of such bottom clause. We propose to store in a prefix-tree all clauses that can be generated from all bottom clauses together with some extra information. We show that this information is sufficient to estimate the number of examples that can be deduced from a clause and present an ILP algorithm that exploits this representation. We also present an extension of the algorithm where each prefix-tree is computed only once (compiled) per example. The evaluation of hypotheses requires only basic and efficient operations on trees. This proposal avoids re-computation of hypothesis' value in theory-level search, in cross-validation evaluation procedures and in parameter tuning. Both proposals are empirically evaluated on real applications and considerable speedups were observed.
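The prefix-tree idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: literals are treated as opaque strings, a clause is an ordered subset of a bottom clause's literals, and the "extra information" kept at each node is simply the set of example ids whose bottom clause generated that prefix. The names `Node`, `build_tree` and `estimated_coverage` are hypothetical.

```python
from itertools import combinations


class Node:
    """One prefix-tree node: child literals plus the ids of the examples
    whose bottom clause generated the clause ending at this node."""

    def __init__(self):
        self.children = {}
        self.examples = set()


def insert(root, clause, example_id):
    """Store one clause (a tuple of literals) generated from one example."""
    node = root
    for literal in clause:
        node = node.children.setdefault(literal, Node())
        node.examples.add(example_id)


def build_tree(bottoms, max_len=2):
    """Compile all clauses of up to max_len literals from every bottom clause.

    `bottoms` maps an example id to the literals of its bottom clause.
    Literals are sorted so that each subset has a single canonical path.
    """
    root = Node()
    for example_id, bottom in bottoms.items():
        literals = sorted(bottom)
        for size in range(1, max_len + 1):
            for clause in combinations(literals, size):
                insert(root, clause, example_id)
    return root


def estimated_coverage(root, clause):
    """Estimate coverage by a tree lookup instead of a deduction per example.

    Returns 0 when the clause is not stored, which also happens when the
    clause is longer than the max_len used to build the tree.
    """
    node = root
    for literal in sorted(clause):
        node = node.children.get(literal)
        if node is None:
            return 0
    return len(node.examples)
```

Under these assumptions the tree is built once, and every later evaluation of a hypothesis reduces to walking one path and reading a set size, which is the kind of reuse the abstract describes for cross-validation and parameter tuning.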


2006

A pipelined data-parallel algorithm for ILP

Authors
Fonseca, NA; Silva, F; Costa, VS; Camacho, R;

Publication
2005 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)

Abstract
The amount of data collected and stored in databases is growing considerably in almost all areas of human activity. Processing this amount of data is very expensive, both in human effort and computationally. This justifies the increased interest both in the automatic discovery of useful knowledge from databases and in using parallel processing for this task. Multi-Relational Data Mining (MRDM) techniques, such as Inductive Logic Programming (ILP), can learn rules from relational databases consisting of multiple tables. However, current ILP systems are designed to run in main memory and can have long running times. We propose a pipelined data-parallel algorithm for ILP. The algorithm was implemented and evaluated on a commodity PC cluster with 8 processors. The results show that our algorithm yields excellent speedups, while preserving the quality of learning.


2012

A design and implementation of the Extended Andorra Model

Authors
Lopes, R; Costa, VS; Silva, F;

Publication
THEORY AND PRACTICE OF LOGIC PROGRAMMING

Abstract
Logic programming provides a high-level view of programming, giving implementers vast latitude in which techniques to explore to achieve the best performance for logic programs. Towards obtaining maximum performance, one of the holy grails of logic programming has been to design computational models that could be executed efficiently and that would allow both for a reduction of the search space and for exploiting all the available parallelism in the application. These goals have motivated the design of the Extended Andorra Model (EAM), a model where goals that do not constrain nondeterministic goals can execute first. In this work, we present and evaluate the Basic design for EAM, a system that builds upon David H. D. Warren's original EAM with Implicit Control. We provide a complete description and implementation of the Basic design for EAM System as a set of rewrite and control rules. We present the major data structures and execution algorithms that are required for efficient execution, and evaluate system performance. A detailed performance study of our system is included. Our results show that the system achieves acceptable base performance and that a number of applications benefit from the advanced search inherent to the EAM.


2001

A Novel Implementation of the Extended Andorra Model

Authors
Lopes, R; Costa, VS; Silva, FMA;

Publication
Practical Aspects of Declarative Languages, Third International Symposium, PADL 2001, Las Vegas, Nevada, March 11-12, 2001, Proceedings

Abstract
Logic programming is based on the idea that computation is controlled inference. The Extended Andorra Model provides a very powerful framework that supports both co-routining and parallelism. We present the BEAM, a design that builds upon David H. D. Warren’s original EAM with Implicit Control. The BEAM supports Warren’s original EAM rewrite rules plus eager splitting and sequential conjunctions. We discuss the main issues in the implementation of the BEAM and show that the EAM with Implicit Control can perform quite well when compared with other implementations that use the Andorra principle. © Springer-Verlag Berlin Heidelberg 2001


2002

Achieving Scalability in Parallel Tabled Logic Programs

Authors
Rocha, R; Silva, FMA; Costa, VS;

Publication
16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 15-19 April 2002, Fort Lauderdale, FL, USA, CD-ROM/Abstracts Proceedings

Abstract
Tabling or memoing is a technique where one stores intermediate answers to a problem so that they can be reused in further calls. Tabling is of interest to logic programming because it addresses some of the most significant weaknesses of Prolog. Namely, it can guarantee termination for programs with the bounded term-size property. Tabled programs exhibit a more complex execution mechanism than traditional Prolog's left-to-right search with backtracking. The reason is that Prolog programs are highly recursive and generate multiple answers. This rather involved execution mechanism requires a more complex implementation than traditional Prolog. The declarative nature of tabled logic programming suggests that it might be amenable to parallel execution. On the other hand, the complexity of the tabling mechanism, and the existence of a shared resource, the table, argue that parallelism might be limited, and that performance for real applications might never scale. In this work we prove that parallel tabling is indeed scalable for real applications by experimenting with the OPTYap parallel tabled system on a scalable shared-memory machine. © 2002 IEEE.
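The basic effect of tabling described in this abstract, storing answers so that repeated calls reuse them and cyclic programs still terminate, can be sketched as follows. This is an illustration only and assumes a much simpler setting than a real tabling engine: it computes the answers to a `path(start, Y)` query over a hypothetical list of `edges`, and the function name `tabled_path` is invented for the example. It is not the SLG resolution used by tabled Prolog systems, nor the OPTYap implementation.

```python
def tabled_path(edges, start):
    """Return every node reachable from `start` through one or more edges.

    `answers` plays the role of the table: an answer that is already stored
    is reused rather than derived again, so the loop terminates even when
    the graph contains cycles.
    """
    answers = set()
    worklist = [target for (source, target) in edges if source == start]
    while worklist:
        node = worklist.pop()
        if node in answers:
            continue  # already tabled: do not re-derive this answer
        answers.add(node)
        # Each new answer may yield further answers through its outgoing edges.
        worklist.extend(target for (source, target) in edges if source == node)
    return answers
```

A plain depth-first search with no table would loop forever on a cycle such as a -> b -> c -> a, whereas here the set of answers is finite and each answer is expanded at most once.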


2001

On a Tabling Engine That Can Exploit Or-Parallelism

Authors
Rocha, R; Silva, FMA; Costa, VS;

Publication
Logic Programming, 17th International Conference, ICLP 2001, Paphos, Cyprus, November 26 - December 1, 2001, Proceedings

Abstract
Tabling is an implementation technique that improves the declarativeness and expressiveness of Prolog by reusing solutions to goals. Quite a few interesting applications of tabling have been developed in the last few years, and several are by nature non-deterministic. This raises the question of whether parallel search techniques can be used to improve the performance of tabled applications. In this work we demonstrate that the mechanisms proposed to parallelize search in the context of SLD resolution naturally generalize to parallel tabled computations, and that resulting systems can achieve good performance on multi-processors. To do so, we present the OPTYap parallel engine. In our system individual SLG engines communicate data through stack copying. Completion is detected through a novel parallel completion algorithm that builds upon the data structures proposed for or-parallelism. Scheduling is simplified by building on previous research on or-parallelism. We show initial performance results for our implementation. Our best result is for an actual application, model checking, where we obtain linear speedups. © Springer-Verlag Berlin Heidelberg 2001.
