Publications by CRACS

2014

A Datalog Engine for GPUs

Authors
Alberto Martinez Angeles, CA; Dutra, I; Costa, VS; Buenabad Chavez, J;

Publication
DECLARATIVE PROGRAMMING AND KNOWLEDGE MANAGEMENT

Abstract
We present the design and evaluation of a Datalog engine for execution in Graphics Processing Units (GPUs). The engine evaluates recursive and non-recursive Datalog queries using a bottom-up approach based on typical relational operators. It includes a memory management scheme that automatically swaps data between memory in the host platform (a multicore) and memory in the GPU in order to reduce the number of memory transfers. To evaluate the performance of the engine, four Datalog queries were run on the engine and on a single CPU in the multicore host. One query runs up to 200 times faster on the (GPU) engine than on the CPU.
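The bottom-up evaluation of a recursive query with relational operators, as described in the abstract, can be illustrated with a small CPU-only sketch. This is not the paper's GPU engine: the function names `join_project` and `transitive_closure`, the example program, and the choice of semi-naive iteration are all assumptions made for illustration.

```python
def join_project(left, right):
    """Join left(x, y) with right(y, z) on y and project the result to (x, z)."""
    index = {}
    for y, z in right:
        index.setdefault(y, set()).add(z)
    return {(x, z) for x, y in left for z in index.get(y, ())}


def transitive_closure(edge):
    """Bottom-up (semi-naive) evaluation of the illustrative program:
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- path(X, Y), edge(Y, Z).
    """
    path = set(edge)    # facts derived so far
    delta = set(edge)   # facts that were new in the previous iteration
    while delta:
        derived = join_project(delta, edge)  # apply the recursive rule to new facts only
        delta = derived - path               # keep only facts not seen before
        path |= delta                        # union into the result relation
    return path


edges = {(1, 2), (2, 3), (3, 4)}
closure = transitive_closure(edges)
```

The loop stops at a fixpoint, when an iteration derives no new facts. A GPU engine would run the join, projection, and set difference as data-parallel kernels; here they are ordinary Python set operations.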


2014

Discovering Differentially Expressed Genes in Yeast Stress Data

Authors
Goncalves, A; Ong, I; Lewis, JA; Costa, VS;

Publication
2014 IEEE 27TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS)

Abstract
Transcriptional regulation plays an important role in every cellular decision. Gaining an understanding of the dynamics that govern how a cell will respond to diverse environmental cues is difficult using intuition alone. We try to discover how genes interact when subjected to stress by exploring techniques of gene expression data analysis. We use several types of data, including high-throughput data. These results will help us recreate plausible regulatory networks by using a probabilistic logical model. Hence, network hypotheses can be generated from existing gene expression data for use by experimental biologists.


2014

Towards using Probabilities and Logic to Model Regulatory Networks

Authors
Goncalves, A; Ong, I; Lewis, JA; Costa, VS;

Publication
2014 IEEE 27TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS)

Abstract
Transcriptional regulation plays an important role in every cellular decision. Unfortunately, understanding the dynamics that govern how a cell will respond to diverse environmental cues is difficult using intuition alone. We introduce logic-based regulation models based on state-of-the-art work on statistical relational learning, and validate our approach by using it to analyze time-series gene expression data of the Hog1 pathway. Our results show that plausible regulatory networks can be learned from time series gene expression data using a probabilistic logical model. Hence, network hypotheses can be generated from existing gene expression data for use by experimental biologists.


2014

Couillard: Parallel programming via coarse-grained Data-flow Compilation

Authors
Marzulo, LAJ; Alves, TAO; Franca, FMG; Costa, VS;

Publication
PARALLEL COMPUTING

Abstract
Data-flow is a natural approach to parallelism. However, describing dependencies and control between fine-grained data-flow tasks can be complex and introduce unwanted overheads. TALM (TALM is an Architecture and Language for Multi-threading) introduces a user-defined coarse-grained parallel data-flow model, where programmers identify code blocks, called super-instructions, to be run in parallel and connect them in a data-flow graph. TALM has been implemented as a hybrid Von Neumann/data-flow execution system: the Trebuchet. We have observed that TALM's usefulness largely depends on how programmers specify and connect super-instructions. Thus, we present Couillard, a full compiler that creates, based on an annotated C program, a data-flow graph and C code corresponding to each super-instruction. We show that our toolchain allows one to benefit from data-flow execution and explore sophisticated parallel programming techniques with little effort. To evaluate our system we have executed a set of real applications on a large multi-core machine. Comparison with popular parallel programming methods shows competitive speedups, while providing an easier parallel programming approach. More specifically, for an application that follows the wavefront method, running with large inputs, Trebuchet achieved up to 4.7% speedup over Intel® TBB's novel flow-graph approach and up to 44% over OpenMP.
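The idea of connecting coarse-grained code blocks in a data-flow graph can be sketched in a few lines of Python. This is only an illustration of the execution model, not TALM, Trebuchet, or Couillard: the `run_dataflow` function, the graph encoding, and the node names are hypothetical. For simplicity the scheduler fires nodes in waves, which is more synchronous than a real data-flow runtime.

```python
from concurrent.futures import ThreadPoolExecutor


def run_dataflow(graph, workers=4):
    """Run a graph of coarse-grained tasks.

    graph maps a node name to (function, [names of input nodes]).
    A node fires once all of its inputs have produced a result.
    """
    results = {}
    pending = dict(graph)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            # Nodes whose operands are all available may run in parallel.
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("cycle or missing input in data-flow graph")
            futures = {
                name: pool.submit(pending[name][0],
                                  *[results[d] for d in pending[name][1]])
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
    return results


# Hypothetical four-node graph: one producer, two independent consumers, one merge.
graph = {
    "load": (lambda: list(range(10)), []),
    "sum_even": (lambda xs: sum(x for x in xs if x % 2 == 0), ["load"]),
    "sum_odd": (lambda xs: sum(x for x in xs if x % 2 == 1), ["load"]),
    "merge": (lambda a, b: a + b, ["sum_even", "sum_odd"]),
}
results = run_dataflow(graph)
```

Here `sum_even` and `sum_odd` depend only on `load`, so they are submitted together, while `merge` waits for both.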


2014

PrologCheck - Property-Based Testing in Prolog

Authors
Amaral, C; Florido, M; Costa, VS;

Publication
FUNCTIONAL AND LOGIC PROGRAMMING, FLOPS 2014

Abstract
We present PrologCheck, an automatic tool for property-based testing of programs in the logic programming language Prolog with randomised test data generation. The tool is inspired by the well-known QuickCheck, originally designed for the functional programming language Haskell. It includes features that deal with specific characteristics of Prolog such as its relational nature (as opposed to Haskell) and the absence of a strong type discipline. PrologCheck's expressiveness stems from describing properties as Prolog goals. It enables the definition of custom test data generators for random testing tailored to the property to be tested. Further, it allows the use of a predicate specification language that supports types, modes and constraints on the number of successful computations. We evaluate our tool on a number of examples and apply it successfully to debug a Prolog library for AVL search trees.
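The general shape of property-based testing with randomised data generation can be shown with a minimal Python sketch. It does not reproduce PrologCheck's interface or its specification language: `check_property`, `random_int_list`, and the two example properties are hypothetical names chosen for illustration.

```python
import random


def check_property(prop, generator, trials=200, seed=0):
    """Run prop on randomly generated inputs.

    Returns the first counterexample found, or None if every trial passes.
    """
    rng = random.Random(seed)  # fixed seed so a failing run can be repeated
    for _ in range(trials):
        case = generator(rng)
        if not prop(case):
            return case
    return None


def random_int_list(rng, max_len=8):
    """Custom generator: a short list of small integers."""
    return [rng.randint(-20, 20) for _ in range(rng.randint(0, max_len))]


def reverse_is_involution(xs):
    """Property that should hold: reversing a list twice gives it back."""
    return list(reversed(list(reversed(xs)))) == xs


def sorting_is_identity(xs):
    """Deliberately false property: most random lists are not already sorted."""
    return sorted(xs) == xs


ok = check_property(reverse_is_involution, random_int_list)
counterexample = check_property(sorting_is_identity, random_int_list)
```

A true property yields `None`, while the false one returns some unsorted list as a counterexample. Tools in the QuickCheck family typically also shrink counterexamples to smaller ones, which this sketch omits.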


2014

Support vector machines for differential prediction

Authors
Kuusisto, F; Costa, VS; Nassif, H; Burnside, E; Page, D; Shavlik, J;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Machine learning is continually being applied to a growing set of fields, including the social sciences, business, and medicine. Some fields present problems that are not easily addressed using standard machine learning approaches and, in particular, there is growing interest in differential prediction. In this type of task we are interested in producing a classifier that specifically characterizes a subgroup of interest by maximizing the difference in predictive performance for some outcome between subgroups in a population. We discuss adapting maximum margin classifiers for differential prediction. We first introduce multiple approaches that do not affect the key properties of maximum margin classifiers, but which also do not directly attempt to optimize a standard measure of differential prediction. We next propose a model that directly optimizes a standard measure in this field, the uplift measure. We evaluate our models on real data from two medical applications and show excellent results. © 2014 Springer-Verlag.
