Publications

Publications by Joana Côrte-Real

2014

A Hybrid MapReduce Model for Prolog

Authors
Corte Real, J; Dutra, I; Rocha, R;

Publication
2014 14TH INTERNATIONAL SYMPOSIUM ON INTEGRATED CIRCUITS (ISIC)

Abstract
Interest in the MapReduce programming model has been rekindled by Google in the past 10 years; its popularity is mostly due to the convenient abstraction for parallelization details this framework provides. State-of-the-art systems such as Google's, Hadoop or SAGA often provide added features like a distributed file system, fault tolerance mechanisms, data redundancy and portability to the basic MapReduce framework. However, these features pose an additional overhead in terms of system performance. In this work, we present a MapReduce design for Prolog which can potentially take advantage of hybrid parallel environments; this combination allies the easy declarative syntax of logic programming with its suitability to represent and handle multi-relational data due to its first order logic basis. MapReduce for Prolog addresses efficiency issues by performing load balancing on data with different granularity and allowing for parallelization in shared memory, as well as across machines. In an era where multicore processors have become common, taking advantage of a cluster's full capabilities requires the hybrid use of parallelism.

CloseRead Abstract

2013

Prolog programming with a map-reduce parallel construct

Authors
Corte Real, J; Dutra, I; Rocha, R;

Publication
Proceedings of the 15th Symposium on Principles and Practice of Declarative Programming, PPDP 2013

Abstract
Map-Reduce is a programming model that has its roots in early functional programming. In addition to producing short and elegant code for problems involving lists or collections, this model has proven very useful for large-scale highly parallel data processing. In this work, we present the design and implementation of a high-level parallel construct that makes the Map-Reduce programming model available for Prolog programmers. To the best of our knowledge, there is no Map-Reduce framework native to Prolog, and so the aim of this work is to offer data processing features from which several applications can greatly benefit; the Inductive Logic Programming field, for instance, can take advantage of a Map-Reduce predicate when proving newly created rules against sets of examples. Our Map-Reduce model was comprehensively tested with different applications. Our experiments, using the Yap Prolog system, show that: (i) the model scales linearly up to 24 processors; (ii) a dynamic distributed scheduling strategy performs better than centralized or static scheduling strategies; and (iii) the performance varies significantly with the number of items being sent to each processor at a time. Overall, our Map-Reduce framework presents as a good alternative for both taking advantage of the currently available low cost multi-core architectures and developing scalable data processing applications, native to the Prolog programming language. © 2013 ACM.

CloseRead Abstract

2015

SkILL - a Stochastic Inductive Logic Learner

Authors
Corte Real, J; Mantadelis, T; Dutra, I; Rocha, R; Burnside, E;

Publication
2015 IEEE 14TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA)

Abstract
Probabilistic Inductive Logic Programming (PILP) is a relatively unexplored area of Statistical Relational Learning which extends classic Inductive Logic Programming (ILP). Within this scope, we introduce SkILL, a Stochastic Inductive Logic Learner, which takes probabilistic annotated data and produces First Order Logic (FOL) theories. Data in several domains such as medicine and bioinformatics have an inherent degree of uncertainty, and because SkILL can handle this type of data, the models produced for these areas are closer to reality. SkILL can then use probabilistic data to extract non-trivial knowledge from databases, and also address efficiency issues by introducing an efficient search strategy for finding hypotheses in PILP environments. SkILL's capabilities are demonstrated using a real world medical dataset in the breast cancer domain.

CloseRead Abstract

2016

Estimation-Based Search Space Traversal in PILP Environments

Authors
Real, JC; Dutra, I; Rocha, R;

Publication
Inductive Logic Programming - 26th International Conference, ILP 2016, London, UK, September 4-6, 2016, Revised Selected Papers

Abstract
Probabilistic Inductive Logic Programming (PILP) systems extend ILP by allowing the world to be represented using probabilistic facts and rules, and by learning probabilistic theories that can be used to make predictions. However, such systems can be inefficient both due to the large search space inherited from the ILP algorithm and to the probabilistic evaluation needed whenever a new candidate theory is generated. To address the latter issue, this work introduces probability estimators aimed at improving the efficiency of PILP systems. An estimator can avoid the computational cost of probabilistic theory evaluation by providing an estimate of the value of the combination of two subtheories. Experiments are performed on three real-world datasets of different areas (biology, medical and web-based) and show that, by reducing the number of theories to be evaluated, the estimators can significantly shorten the execution time without losing probabilistic accuracy. © Springer International Publishing AG 2017.

CloseRead Abstract

2017

On Applying Probabilistic Logic Programming to Breast Cancer Data

Authors
Real, JC; Dutra, I; Rocha, R;

Publication
Inductive Logic Programming - 27th International Conference, ILP 2017, Orléans, France, September 4-6, 2017, Revised Selected Papers

Abstract
Medical data is particularly interesting as a subject for relational data mining due to the complex interactions which exist between different entities. Furthermore, the ambiguity of medical imaging causes interpretation to be complex and error-prone, and thus particularly amenable to improvement through automated decision support. Probabilistic Inductive Logic Programming (PILP) is a particularly well-suited tool for this task, since it makes it possible to combine the relational nature of this field with the ambiguity inherent in human interpretation of medical imaging. This work presents a PILP setting for breast cancer data, where several clinical and demographic variables were collected retrospectively, and new probabilistic variables and rules reflecting domain knowledge were introduced. A PILP predictive model was built automatically from this data and experiments show that it can not only match the predictions of a team of experts in the area, but also consistently reduce the error rate of malignancy prediction, when compared to other non-relational techniques. © Springer International Publishing AG, part of Springer Nature 2018.

CloseRead Abstract

2018

Improving Candidate Quality of Probabilistic Logic Models

Authors
Real, JC; Dries, A; Dutra, I; Rocha, R;

Publication
Technical Communications of the 34th International Conference on Logic Programming, ICLP 2018, July 14-17, 2018, Oxford, United Kingdom

Abstract
Many real-world phenomena exhibit both relational structure and uncertainty. Probabilistic Inductive Logic Programming (PILP) uses Inductive Logic Programming (ILP) extended with probabilistic facts to produce meaningful and interpretable models for real-world phenomena. This merge between First Order Logic (FOL) theories and uncertainty makes PILP a very adequate tool for knowledge representation and extraction. However, this flexibility is coupled with a problem (inherited from ILP) of exponential search space growth and so, often, only a subset of all possible models is explored due to limited resources. Furthermore, the probabilistic evaluation of FOL theories, coming from the underlying probabilistic logic language and its solver, is also computationally demanding. This work introduces a prediction-based pruning strategy, which can reduce the search space based on the probabilistic evaluation of models, and a safe pruning criterion, which guarantees that the optimal model is not pruned away, as well as two alternative more aggressive criteria that do not provide this guarantee. Experiments performed using three benchmarks from different areas show that prediction pruning is effective in (i) maintaining predictive accuracy for all criteria and experimental settings; (ii) reducing the execution time when using some of the more aggressive criteria, compared to using no pruning; and (iii) selecting better candidate models in limited resource settings, also when compared to using no pruning. © Joana Côrte-Real, Anton Dries, Inês Dutra, and Ricardo Rocha; licensed under Creative Commons License CC-BY

CloseRead Abstract