2014
Authors
Nobre, R; Pinto, P; Carvalho, T; Cardoso, JMP; Diniz, PC;
Publication
ACM International Conference Proceeding Series
Abstract
A common migration path for applications to high-performance multicore architectures relies on code annotations with concurrent semantics. Some annotations, however, are very specific to the target architecture and thus highly non-portable. In this paper we describe a source-to-source code transformation system that allows programmers to specify transformations using an aspect-oriented domain-specific language - LARA. LARA allows programmers to specify strategies to search large code transformation design spaces while preserving the original source code. As the experimental results reveal, this approach leads to a substantial reduction in code maintenance costs and promotes both programmer and performance portability. Copyright © 2014 ACM.
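To make the design-space search idea concrete, below is a minimal Python sketch of a strategy that walks over a small, hypothetical space of unroll/tiling configurations while keeping the original source untouched; the configuration space, apply_transforms and estimate_cost are invented placeholders for illustration and are not LARA constructs or the system described in the paper.

# Minimal sketch: exhaustively search a tiny, hypothetical transformation space.
# apply_transforms() and estimate_cost() are placeholders, not LARA or the paper's tool.
import itertools

UNROLL_FACTORS = [1, 2, 4, 8]
TILE_SIZES = [0, 16, 32]            # 0 stands for "no tiling"

def apply_transforms(source, unroll, tile):
    """Return a transformed copy of the source; the original string is never modified."""
    header = f"/* unroll={unroll}, tile={tile} */\n"
    return header + source          # stand-in for a real source-to-source rewrite

def estimate_cost(candidate):
    """Stand-in for compiling and timing the transformed candidate."""
    return len(candidate)           # placeholder cost model

def explore(source):
    best_config, best_cost = None, float("inf")
    for unroll, tile in itertools.product(UNROLL_FACTORS, TILE_SIZES):
        cost = estimate_cost(apply_transforms(source, unroll, tile))
        if cost < best_cost:
            best_config, best_cost = (unroll, tile), cost
    return best_config

print(explore("for (i = 0; i < N; i++) a[i] += b[i];"))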
2014
Authors
Santos, AC; Cardoso, JMP; Diniz, PC; Ferreira, DR; Petrov, Z;
Publication
Embedded Systems Letters
Abstract
Embedded systems are severely resource constrained and thus can benefit from adaptations to enhance their functionality in highly dynamic operating conditions. Adaptations, however, often require additional programming effort or complex architectural solutions, resulting in long design cycles, troublesome maintenance, and impractical use for legacy applications. In this letter, we introduce an adaptation logic for the dynamic reconfiguration of embedded applications and its implementation via a domain-specific language. We illustrate the approach in a real-world case study of a navigation application for avionics. © 2014 IEEE.
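As a rough illustration of dynamic adaptation driven by monitored conditions, the Python sketch below picks an application configuration from an ordered set of rules; the sensed quantities, thresholds and configuration names are invented for this example and do not reflect the letter's DSL or the avionics case study.

# Minimal sketch: rule-based selection of a configuration from sensed conditions.
# All rule names and thresholds are hypothetical.
RULES = [
    (lambda s: s["battery"] < 0.2,        "low_power"),
    (lambda s: s["signal_quality"] < 0.5, "degraded_navigation"),
    (lambda s: True,                      "normal"),   # default rule
]

def select_configuration(state):
    """Return the configuration of the first rule that matches the sensed state."""
    for condition, config in RULES:
        if condition(state):
            return config

# A drop in signal quality triggers a reconfiguration to a degraded mode.
print(select_configuration({"battery": 0.9, "signal_quality": 0.3}))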
2014
Authors
Martins, LGA; Nobre, R; Delbem, ACB; Marques, E; Cardoso, JMP;
Publication
2014 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC)
Abstract
In this paper we present a clustering-based selection approach for reducing the number of compilation passes considered in the search space when exploring optimizations aimed at increasing the performance of a given function and/or code fragment. The basic idea is to identify similarities among functions and to reuse the passes previously explored each time a new function is compiled. This subset of compiler optimizations is then used by a Design Space Exploration (DSE) process. The identification of similarities is obtained by a data mining method applied to a symbolic code representation that translates the main structures of the source code to a sequence of symbols based on transformation rules. Experiments were performed to evaluate the effectiveness of the proposed approach. The selection of compiler optimization sequences, considering a set of 49 compilation passes and targeting a Xilinx MicroBlaze processor, was performed aiming at latency improvements for 41 functions from Texas Instruments benchmarks. The results reveal that the pass selection based on our clustering method achieves a significant reduction in exploration time compared with the full search space while still achieving important performance speedups.
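The reuse-by-similarity idea can be pictured with the short Python sketch below: a new function's symbolic representation is compared against previously explored functions, and the passes that worked well for similar ones seed its exploration. The symbol strings, pass names, similarity metric and threshold are hypothetical stand-ins, not the clustering method or pass set used in the paper.

# Minimal sketch: reuse passes explored for functions with a similar symbolic code shape.
from difflib import SequenceMatcher

# Symbolic representation of already-explored functions -> best passes found for them.
EXPLORED = {
    "LLLAAS": ["-loop-unroll", "-licm", "-gvn"],   # loop-heavy function (hypothetical)
    "AASSC":  ["-sccp", "-instcombine"],           # straight-line arithmetic (hypothetical)
}

def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

def candidate_passes(new_symbols, threshold=0.5):
    """Union of the passes used by explored functions similar to the new one."""
    passes = []
    for symbols, best_passes in EXPLORED.items():
        if similarity(new_symbols, symbols) >= threshold:
            passes.extend(p for p in best_passes if p not in passes)
    return passes

# A new loop-dominated function starts its DSE from the loop-related subset.
print(candidate_passes("LLLAS"))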
2014
Authors
Paulino, N; Ferreira, JC; Cardoso, JMP;
Publication
2014 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA)
Abstract
This paper presents a binary acceleration approach based on extending a General Purpose Processor (GPP) with a Reconfigurable Processing Unit (RPU), both sharing an external data memory. In this approach, repeating sequences of GPP instructions are migrated to the RPU. The RPU resources are selected and organized off-line using execution trace information. The RPU core is composed of Functional Units (FUs) that correspond to single CPU instructions. The FUs are arranged in stages of mutually independent operations. The RPU can enable several stages in tandem, depending on the data dependencies. External data memory accesses are handled by a configurable dual-port cache. A prototype implementation of the architecture on a Spartan-6 FPGA was validated with 12 benchmarks and achieved an overall geometric mean speedup of 1.91×.
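The off-line selection starts from repeating instruction sequences observed in execution traces; the Python sketch below shows one naive way hot sequences could be spotted in a trace of instruction addresses. The trace format, window length and frequency threshold are assumptions for illustration and are not the paper's actual detection or translation scheme.

# Minimal sketch: count short windows of a trace and keep the frequently repeated ones.
from collections import Counter

def repeated_sequences(trace, length=3, min_count=4):
    """Count every window of `length` consecutive addresses; keep the frequent windows."""
    windows = Counter(tuple(trace[i:i + length]) for i in range(len(trace) - length + 1))
    return {seq: n for seq, n in windows.items() if n >= min_count}

# Toy trace of instruction addresses containing a hot 3-instruction loop (0x10, 0x14, 0x18).
trace = [0x00, 0x04] + [0x10, 0x14, 0x18] * 6 + [0x20]
print(repeated_sequences(trace))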
2014
Authors
Azarian, A; Cardoso, JMP;
Publication
EURO-PAR 2014: PARALLEL PROCESSING WORKSHOPS, PT II
Abstract
In recent years, there has been increasing interest in using task-level pipelining to accelerate the overall execution of applications mainly consisting of producer/consumer tasks. This paper presents coarse/fine-grained data flow synchronization approaches to achieve pipelined execution of producer/consumer tasks in FPGA-based multicore architectures. Our approaches are able to speed up the overall execution of successive, data-dependent tasks by using multiple cores and specific customization features provided by FPGAs. An important component of our approach is the use of customized inter-stage buffer schemes to communicate data and to synchronize the cores associated with the producer/consumer tasks. The experimental results show the feasibility of the approach when dealing with producer/consumer tasks with out-of-order communication and reveal noticeable performance improvements for a number of benchmarks over a single-core implementation without task-level pipelining.
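A software analogue of the inter-stage buffering idea is sketched below in Python: two data-dependent tasks run concurrently and exchange items through a bounded buffer, so the consumer starts working before the producer has finished. The buffer size and the toy computations are arbitrary; the paper's hardware buffer schemes and out-of-order handling are not modelled here.

# Minimal sketch: pipeline a producer and a consumer through a bounded inter-stage buffer.
import queue, threading

DONE = object()                      # sentinel marking the end of the stream

def producer(buf):
    for i in range(8):
        buf.put(i * i)               # stage 1: produce a data item
    buf.put(DONE)

def consumer(buf, results):
    while True:
        item = buf.get()             # stage 2: consume as soon as data is available
        if item is DONE:
            break
        results.append(item + 1)

buf, results = queue.Queue(maxsize=4), []   # bounded inter-stage buffer
t1 = threading.Thread(target=producer, args=(buf,))
t2 = threading.Thread(target=consumer, args=(buf, results))
t1.start(); t2.start(); t1.join(); t2.join()
print(results)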
2014
Authors
Bispo, J; Reis, L; Cardoso, JMP;
Publication
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
Abstract
This paper describes our recent work on MATISSE, a framework for MATLAB to C compilation. We focus on the new optimizations and transformations, as well as on OpenCL generation. MATISSE is controlled with LARA, an aspect-oriented language able to specify transformations to the input MATLAB code (e.g., insertion of code for variable initialization and for monitoring) and to express information concerning the types and shapes of variables. We evaluate the compiler with a set of benchmarks when targeting both an embedded system and a desktop system. The results show that we were able to achieve a speedup of up to 1.8× by employing information provided by LARA aspects. We also compare the execution time of the generated C code with the original code running on MATLAB, and we achieve a geometric mean speedup of 19×. The geometric mean speedup reduces to 12× when optimizing the MATLAB code with LARA aspects. Finally, we present a preliminary version of a fully functioning pragma-based OpenCL generator built on top of the MATISSE framework.
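To give a flavour of why externally supplied type and shape information matters when compiling MATLAB to C, the small Python sketch below maps a hypothetical per-variable annotation table to static C declarations; the table, variable names and mapping are illustrative assumptions and do not reproduce MATISSE's or LARA's actual mechanisms.

# Minimal sketch: turn hypothetical type/shape annotations into static C declarations.
SHAPE_INFO = {
    "A": ("float", (64, 64)),   # 64x64 matrix
    "x": ("float", (64,)),      # vector of 64 elements
    "n": ("int",   ()),         # scalar
}

def c_declaration(name, info):
    ctype, shape = info
    dims = "".join(f"[{d}]" for d in shape)
    return f"{ctype} {name}{dims};"

for name, info in SHAPE_INFO.items():
    print(c_declaration(name, info))
# float A[64][64];
# float x[64];
# int n;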