Cookies Policy
The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More
Accept Reject
  • Menu
Publications

Publications by CRACS

2007

Demand-driven indexing of Prolog clauses

Authors
Costa, VS; Sagonas, K; Lopes, R;

Publication
Logic Programming, Proceedings

Abstract
As logic programming applications grow in size, Prolog systems need to efficiently access larger and larger data sets and the need for any- and multi-argument indexing becomes more and more profound. Static generation of multi-argument indexing is one alternative, but applications often rely on features that are inherently dynamic which makes static techniques inapplicable or inaccurate. Another alternative is to employ dynamic schemes for flexible demand-driven indexing of Prolog clauses. We propose such schemes and discuss issues that need to be addressed for their efficient implementation in the context of WAM-based Prolog systems. We have implemented demand-driven indexing in two different Prolog systems and have been able to obtain non-negligible performance speedups: from a few percent up to orders of magnitude. Given these results, we see very little reason for Prolog systems not to incorporate some form of dynamic indexing based on actual demand. In fact, we see demand-driven indexing as only the first step towards effective runtime optimization of Prolog programs.

2007

A study of structural properties on profiles HMMs

Authors
Bernardes, JulianaS.; Dávila, AlbertoM.R.; Costa, VitorSantos; Zaverucha, Gerson;

Publication
CoRR

Abstract

2007

Prolog performance on larger datasets

Authors
Costa, VS;

Publication
PRACTICAL ASPECTS OF DECLARATIVE LANGUAGES

Abstract
Declarative systems, such as logic programming, should be ideal to process large data sets efficiently. Unfortunately, the high-level nature of logic-based representations can cause inefficiencies, and may lead in some cases to unacceptable performance. We discuss how logic programming systems can accommodate large amounts of data in main memory. We use a number of real datasets to evaluate performance and discuss how a number of techniques can be used to improve memory scalabality for such datasets.

2007

Design, implementation, and evaluation of a dynamic compilation framework for the YAP system

Authors
da Silva, AF; Costa, VS;

Publication
Logic Programming, Proceedings

Abstract
We propose dynamic compilation for Prolog, in the style of Just-In-Time compilers. Our approach adapts to the actual characteristics of the target program by (i) compiling only the parts of the program that are executed frequently, and (ii) adapting to actual call patterns. This allows aggressive optimization of the parts of the program that are really executed, and better informed heuristics to drive these optimizations. Our compiler does need to support all features in the language, only what is deemed important to performance. Complex execution patterns, such as the ones caused by error handling, may be left to the interpreter. On the other hand, compilation is now part of the run-time, and thus incurs run-time overheads. We have implemented dynamic compilation for YAP system. Our initial results suggest that dynamic compilation achieves very substantial performance improvements over the original interpreter, and that it can approach and even out-perform state-of-the-art native code systems. We believe that we have shown that dynamic compilation is worthwhile and fits naturally with Prolog execution.

2007

Improving model construction of profile HMMs for remote homology detection through structural alignment

Authors
Bernardes, JS; Davila, AM; Costa, VS; Zaverucha, G;

Publication
BMC BIOINFORMATICS

Abstract
Background: Remote homology detection is a challenging problem in Bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages present a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Next, we assess the impact of using structural alignments in pHMM performance. Results: We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and paired two tailed t-test. Conclusion: We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignment in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher quality pHMMs. On the other hand, sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvements in this area.

2007

An integrated approach to feature invention and model construction for drug activity prediction

Authors
Davis, J; Costa, VS; Ray, S; Page, D;

Publication
ACM International Conference Proceeding Series

Abstract
We present a new machine learning approach for 3D-QSAR, the task of predicting binding affinities of molecules to target proteins based on 3D structure. Our approach predicts binding affinity by using regression on substructures discovered by relational learning. We make two contributions to the state-of-the-art. First, we use multiple-instance (MI) regression, which represents a molecule as a set of 3D conformations, to model activity. Second, the relational learning component employs the "Score As You Use" (SAYU) method to select substructures for their ability to improve the regression model. This is the first application of SAYU to multiple-instance, real-valued prediction. We evaluate our approach on three tasks and demonstrate that (i) SAYU outperforms standard coverage measures when selecting features for regression, (ii) the MI representation improves accuracy over standard single feature-vector encodings and (iii) combining SAYU with MI regression is more accurate for 3D-QSAR than either approach by itself.

  • 175
  • 202