
Publications by Inês Dutra

2014

A Hybrid MapReduce Model for Prolog

Authors
Corte Real, J; Dutra, I; Rocha, R;

Publication
2014 14TH INTERNATIONAL SYMPOSIUM ON INTEGRATED CIRCUITS (ISIC)

Abstract
Interest in the MapReduce programming model has been rekindled by Google in the past 10 years; its popularity is mostly due to the convenient abstraction for parallelization details that this framework provides. State-of-the-art systems such as Google's, Hadoop, or SAGA often add features such as a distributed file system, fault tolerance mechanisms, data redundancy, and portability on top of the basic MapReduce framework. However, these features impose additional overhead on system performance. In this work, we present a MapReduce design for Prolog which can potentially take advantage of hybrid parallel environments; this combination allies the easy declarative syntax of logic programming with its suitability for representing and handling multi-relational data, owing to its first-order logic basis. MapReduce for Prolog addresses efficiency issues by performing load balancing on data of different granularities and by allowing parallelization in shared memory as well as across machines. In an era where multicore processors have become common, taking advantage of a cluster's full capabilities requires the hybrid use of parallelism.
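The hybrid load-balancing idea can be illustrated outside Prolog. The sketch below, in Python rather than the paper's Prolog/Yap setting, splits a workload into chunks of a chosen granularity and maps them over shared-memory workers; the word-count workload and chunk size are hypothetical stand-ins, not the paper's implementation.

```python
# Minimal sketch: shared-memory map-reduce over variable-size chunks,
# echoing the load-balancing idea in the abstract. Illustrative only.
from multiprocessing import Pool
from functools import reduce

def map_chunk(chunk):
    # "Map" step: count the words in each document of the chunk.
    return [len(doc.split()) for doc in chunk]

def combine(acc, partial):
    # "Reduce" step: fold a partial result list into the running total.
    return acc + sum(partial)

def chunks_by_granularity(items, size):
    # Smaller chunks give finer-grained load balancing at the cost of
    # more scheduling overhead; larger chunks do the opposite.
    return [items[i:i + size] for i in range(0, len(items), size)]

if __name__ == "__main__":
    docs = ["a b c", "d e", "f g h i"] * 1000          # hypothetical workload
    with Pool() as pool:                                # shared-memory workers
        partials = pool.map(map_chunk, chunks_by_granularity(docs, 50))
    print(reduce(combine, partials, 0))                 # total word count: 9000
```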

2016

A Speech-to-Text Interface for MammoClass

Authors
Roche, RS; Ferreira, P; Dutra, I; Correia, R; Salvini, R; Burnside, E;

Publication
2016 IEEE 29TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS)

Abstract
MammoClass is a web tool that allows users to enter a small set of variable values describing a finding on a mammogram and produces a probability of that finding being malignant or benign. The tool requires the user to type in a value for every variable in order to perform a prediction. In this work, we present a speech-to-text interface integrated into MammoClass that allows radiologists to dictate a mammography report instead of typing it in. This new MammoClass module can take audio content, transcribe it into written words, and automatically extract the variable values by applying a parser to the recognized text. Results on spoken mammography reports show that the same variables are extracted for both types of input: typed or dictated text.
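As an illustration of the parsing step only, the hypothetical Python sketch below pulls a few BI-RADS-style variable values out of already-transcribed report text with simple regular expressions; the variable names and patterns are assumptions and are not MammoClass's actual parser or feature set.

```python
# Illustrative sketch only: extracting variable values from transcribed
# report text with a simple regex parser. Variable names and patterns
# are hypothetical, not the actual MammoClass features.
import re

PATTERNS = {
    "mass_shape":  r"\b(oval|round|irregular)\s+mass\b",
    "mass_margin": r"\b(circumscribed|obscured|spiculated)\s+margins?\b",
    "density":     r"\b(fatty|scattered|heterogeneously dense|extremely dense)\b",
}

def extract_variables(report_text: str) -> dict:
    """Return the first match for each variable, or None if absent."""
    text = report_text.lower()
    return {
        name: (m.group(1) if (m := re.search(pattern, text)) else None)
        for name, pattern in PATTERNS.items()
    }

print(extract_variables(
    "Heterogeneously dense breasts. An irregular mass with spiculated margins."
))
# {'mass_shape': 'irregular', 'mass_margin': 'spiculated',
#  'density': 'heterogeneously dense'}
```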

2015

Performance Evaluation of Statistical Functions

Authors
Rodrigues, A; Silva, C; Borges, P; Silva, S; Dutra, I;

Publication
2015 IEEE INTERNATIONAL CONFERENCE ON SMART CITY/SOCIALCOM/SUSTAINCOM (SMARTCITY)

Abstract
Statistical data analysis methods are well known for their difficulty in handling large numbers of instances or large numbers of parameters. This is most noticeable in the presence of "big data", i.e., data that are heterogeneous and come from several sources, which makes their volume increase very rapidly. In this paper, we study popular and well-known statistical functions generally applied to data analysis and assess their performance using our own implementation (DataIP), MATLAB, and R. We show that DataIP outperforms MATLAB and R by several orders of magnitude and that the design and implementation of these functions need to be rethought to adapt to today's data challenges.
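To make the benchmarking methodology concrete, here is a minimal timing sketch in Python; NumPy stands in for an optimized implementation, since the paper's DataIP code is not reproduced here.

```python
# Minimal sketch of a micro-benchmark comparing two implementations of a
# statistical function. NumPy is only a stand-in; this is not DataIP.
import time
import numpy as np

def mean_python(xs):
    total = 0.0
    for x in xs:            # interpreted loop: one dispatch per item
        total += x
    return total / len(xs)

data = np.random.rand(5_000_000)

t0 = time.perf_counter()
m1 = mean_python(data.tolist())
t1 = time.perf_counter()
m2 = float(np.mean(data))   # vectorized implementation
t2 = time.perf_counter()

print(f"loop: {t1 - t0:.3f}s  numpy: {t2 - t1:.3f}s  diff: {abs(m1 - m2):.2e}")
```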

2016

Processing Markov Logic Networks with GPUs: Accelerating Network Grounding

Authors
Alberto Martinez Angeles, CA; Dutra, I; Costa, VS; Buenabad Chavez, J;

Publication
INDUCTIVE LOGIC PROGRAMMING, ILP 2015

Abstract
Markov Logic is an expressive and widely used knowledge representation formalism that combines logic and probabilities, providing a powerful framework for inference and learning tasks. Most Markov Logic implementations perform inference by transforming the logic representation into a set of weighted propositional formulae that encode a Markov network, the ground Markov network. Probabilistic inference is then performed over the grounded network. Constructing, simplifying, and evaluating the network are the main steps of the inference phase. As the size of a Markov network can grow rather quickly, Markov Logic Network (MLN) inference can become very expensive, motivating a rich vein of research on the optimization of MLN performance. We claim that parallelism can have a large role in this task. Namely, we demonstrate that widely available Graphics Processing Units (GPUs) can be used to improve the performance of a state-of-the-art MLN system, Tuffy, with minimal changes. Indeed, comparing the performance of our GPU-based system, TuGPU, to that of the Alchemy, Tuffy, and RockIt systems on three widely used applications shows that TuGPU is up to 15 times faster than the other systems.
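The grounding step that TuGPU accelerates can be sketched naively as follows; the clause, constants, and encoding are illustrative assumptions and do not reflect how Tuffy or TuGPU actually represent a Markov Logic Network.

```python
# Naive sketch of grounding one weighted first-order clause by enumerating
# substitutions of constants for its variables. Real MLN systems do this
# with relational/SQL operators or GPU kernels; this shows only the idea.
from itertools import product

constants = ["anna", "bob", "carl"]
# Clause:  Smokes(x) ^ Friends(x, y) => Smokes(y),  weight 1.5
weight, variables = 1.5, ["x", "y"]

ground_clauses = []
for values in product(constants, repeat=len(variables)):
    theta = dict(zip(variables, values))          # substitution {x -> c1, y -> c2}
    ground_clauses.append(
        (weight,
         f"!Smokes({theta['x']}) v !Friends({theta['x']},{theta['y']}) "
         f"v Smokes({theta['y']})")
    )

print(len(ground_clauses))   # 3^2 = 9 ground clauses for this clause
print(ground_clauses[0])
```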

2013

Prolog programming with a map-reduce parallel construct

Authors
Corte Real, J; Dutra, I; Rocha, R;

Publication
Proceedings of the 15th Symposium on Principles and Practice of Declarative Programming, PPDP 2013

Abstract
Map-Reduce is a programming model that has its roots in early functional programming. In addition to producing short and elegant code for problems involving lists or collections, this model has proven very useful for large-scale, highly parallel data processing. In this work, we present the design and implementation of a high-level parallel construct that makes the Map-Reduce programming model available to Prolog programmers. To the best of our knowledge, there is no Map-Reduce framework native to Prolog, and so the aim of this work is to offer data processing features from which several applications can greatly benefit; the Inductive Logic Programming field, for instance, can take advantage of a Map-Reduce predicate when proving newly created rules against sets of examples. Our Map-Reduce model was comprehensively tested with different applications. Our experiments, using the Yap Prolog system, show that: (i) the model scales linearly up to 24 processors; (ii) a dynamic distributed scheduling strategy performs better than centralized or static scheduling strategies; and (iii) performance varies significantly with the number of items sent to each processor at a time. Overall, our Map-Reduce framework is a good alternative both for taking advantage of currently available low-cost multi-core architectures and for developing scalable data processing applications native to the Prolog programming language.
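As a rough illustration of the ILP use case mentioned in the abstract, the Python sketch below counts how many examples a candidate rule covers, with a map step over example partitions and a reduce step that sums the partial counts. The coverage test and toy rule are hypothetical, and the paper's construct is a Prolog predicate in the Yap system, not Python.

```python
# Illustrative sketch: rule-coverage counting phrased as map-reduce.
from multiprocessing import Pool

def divisible_by_three(example):
    # Toy stand-in for a candidate rule; a real ILP system would prove
    # the rule against the example in Prolog instead.
    return example % 3 == 0

def map_coverage(args):
    rule, examples = args
    # Map step: count the examples in this partition that the rule covers.
    return sum(1 for e in examples if rule(e))

def coverage(rule, examples, n_parts=4):
    parts = [examples[i::n_parts] for i in range(n_parts)]
    with Pool(n_parts) as pool:
        partial_counts = pool.map(map_coverage, [(rule, p) for p in parts])
    return sum(partial_counts)   # Reduce step: total coverage of the rule

if __name__ == "__main__":
    examples = list(range(10_000))
    print(coverage(divisible_by_three, examples))   # 3334
```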

2014

A Datalog Engine for GPUs

Authors
Alberto Martinez Angeles, CA; Dutra, I; Costa, VS; Buenabad Chavez, J;

Publication
DECLARATIVE PROGRAMMING AND KNOWLEDGE MANAGEMENT

Abstract
We present the design and evaluation of a Datalog engine for execution in Graphics Processing Units (GPUs). The engine evaluates recursive and non-recursive Datalog queries using a bottom-up approach based on typical relational operators. It includes a memory management scheme that automatically swaps data between memory in the host platform (a multicore) and memory in the GPU in order to reduce the number of memory transfers. To evaluate the performance of the engine, four Datalog queries were run on the engine and on a single CPU in the multicore host. One query runs up to 200 times faster on the (GPU) engine than on the CPU.
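A minimal sketch of bottom-up Datalog evaluation (transitive closure via join, projection, and union to a fixpoint) is shown below in plain Python; the paper's engine executes these relational operators as GPU kernels, which this CPU-only sketch does not attempt to reproduce.

```python
# Naive bottom-up evaluation of a recursive Datalog query (transitive
# closure) with set-based relational operators. Illustration only.
edge = {(1, 2), (2, 3), (3, 4)}

# path(X, Y) :- edge(X, Y).
# path(X, Z) :- path(X, Y), edge(Y, Z).
path = set(edge)
while True:
    # Join path(X, Y) with edge(Y, Z), then project onto (X, Z).
    new = {(x, z) for (x, y) in path for (y2, z) in edge if y == y2}
    if new <= path:          # fixpoint reached: no new facts derived
        break
    path |= new              # union with previously derived facts

print(sorted(path))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```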
