Publications

Publications by HumanISE

2014

Exploration of Compiler Optimization Sequences Using Clustering-Based Selection

Authors
Martins, LGA; Nobre, R; Delbem, ACB; Marques, E; Cardoso, JMP;

Publication
ACM SIGPLAN NOTICES

Abstract
Due to the large number of optimizations provided in modern compilers and to compiler optimization specific opportunities, a Design Space Exploration (DSE) is necessary to search for the best sequence of compiler optimizations for a given code fragment (e.g., function). As this exploration is a complex and time consuming task, in this paper we present DSE strategies to select optimization sequences to both improve the performance of each function and reduce the exploration time. The DSE is based on a clustering approach which groups functions with similarities and then explores the reduced search space provided by the optimizations previously suggested for the functions in each group. The identification of similarities between functions uses a data mining method which is applied to a symbolic code representation of the source code. The DSE process uses the reduced set identified by clustering in two ways: as the design space or as the initial configuration. In both ways, the adoption of a pre-selection based on clustering allows the use of simple and fast DSE algorithms. Our experiments for evaluating the effectiveness of the proposed approach address the exploration of compiler optimization sequences considering 49 compilation passes and targeting a Xilinx MicroBlaze processor, and were performed aiming at performance improvements for 41 functions. Experimental results reveal that the use of our new clustering-based DSE approach achieved a significant reduction in the total exploration time of the search space (18x over a Genetic Algorithm approach for DSE) at the same time that important performance speedups (43% over the baseline) were obtained by the optimized codes.
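
The pre-selection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity measure (`difflib.SequenceMatcher`), the threshold, the symbol strings, and the pass names are all hypothetical stand-ins for the data mining method and the 49 passes mentioned above. The sketch only shows how passes previously suggested for similar functions could form a reduced design space for a new function.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity of two symbolic code strings, in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a, b).ratio()


def reduced_pass_set(new_symbols, known, threshold=0.6):
    """Collect passes previously suggested for functions similar to the new one.

    known maps a function name to (symbolic representation, list of passes).
    The threshold is an assumed cut-off, not a value from the paper.
    """
    selected = []
    for name, (symbols, passes) in known.items():
        if similarity(new_symbols, symbols) >= threshold:
            for p in passes:
                if p not in selected:  # keep first-seen order, drop duplicates
                    selected.append(p)
    return selected


# Hypothetical data: two already-explored functions and their best passes.
known_functions = {
    "fir": ("LAMAL", ["pass_a", "pass_b"]),
    "crc": ("BCBC", ["pass_c"]),
}
candidates = reduced_pass_set("LAMAL", known_functions)
```

A simple DSE algorithm could then search only over `candidates`, or use them as its starting configuration, which matches the two uses of the reduced set described in the abstract.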

2014

On expressing strategies for directive-driven multicore programing models

Authors
Nobre, R; Pinto, P; Carvalho, T; Cardoso, JMP; Diniz, PC;

Publication
ACM International Conference Proceeding Series

Abstract
A common migration path for applications to high-performance multicore architectures relies on code annotations with concurrent semantics. Some annotations, however, are very target architecture specific and thus highly non-portable. In this paper we describe a source-to-source code transformation system that allows programmers to specify transformations using an aspect-oriented domain specific language - LARA. LARA allows programmers to specify strategies to search large code transformation design spaces while preserving the original source code. As the experimental results reveal, this approach leads to a substantial reduction in code maintenance costs, and promotes the portability of both programmers and performance. Copyright © 2014 ACM.

2014

Specifying Dynamic Adaptations for Embedded Applications Using a DSL

Authors
Santos, AC; Cardoso, JMP; Diniz, PC; Ferreira, DR; Petrov, Z;

Publication
Embedded Systems Letters

Abstract
Embedded systems are severely resource constrained and thus can benefit from adaptations to enhance their functionality in highly dynamic operating conditions. Adaptations, however, often require additional programming effort or complex architectural solutions, resulting in long design cycles, troublesome maintenance, and impractical use for legacy applications. In this letter, we introduce an adaptation logic for the dynamic reconfiguration of embedded applications and its implementation via a domain-specific language. We illustrate the approach in a real-world case study of a navigation application for avionics. © 2014 IEEE.

2014

A Clustering-Based Approach for Exploring Sequences of Compiler Optimizations

Authors
Martins, LGA; Nobre, R; Delbem, ACB; Marques, E; Cardoso, JMP;

Publication
2014 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC)

Abstract
In this paper we present a clustering-based selection approach for reducing the number of compilation passes used in search space during the exploration of optimizations aiming at increasing the performance of a given function and/or code fragment. The basic idea is to identify similarities among functions and to use the passes previously explored each time a new function is being compiled. This subset of compiler optimizations is then used by a Design Space Exploration (DSE) process. The identification of similarities is obtained by a data mining method which is applied to a symbolic code representation that translates the main structures of the source code to a sequence of symbols based on transformation rules. Experiments were performed for evaluating the effectiveness of the proposed approach. The selection of compiler optimization sequences considering a set of 49 compilation passes and targeting a Xilinx MicroBlaze processor was performed aiming at latency improvements for 41 functions from Texas Instruments benchmarks. The results reveal that the passes selection based on our clustering method achieves a significant gain on execution time over the full search space still achieving important performance speedups.

2014

Trace-Based Reconfigurable Acceleration with Data Cache and External Memory Support

Authors
Paulino, N; Ferreira, JC; Cardoso, JMP;

Publication
2014 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA)

Abstract
This paper presents a binary acceleration approach based on extending a General Purpose Processor (GPP) with a Reconfigurable Processing Unit (RPU), both sharing an external data memory. In this approach repeating sequences of GPP instructions are migrated to the RPU. The RPU resources are selected and organized off-line using execution trace information. The RPU core is composed of Functional Units (FUs) that correspond to single CPU instructions. The FUs are arranged in stages of mutually independent operations. The RPU can enable several stages in tandem, depending on the data dependencies. External data memory accesses are handled by a configurable dual-port cache. A prototype implementation of the architecture on a Spartan-6 FPGA was validated with 12 benchmarks and achieved an overall geometric mean speedup of 1.91x.
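
For readers unfamiliar with the metric, the sketch below shows how an overall geometric mean speedup is typically computed from per-benchmark timings. The function name and the timing values are hypothetical; the paper's own measurements are not reproduced here.

```python
import math


def geometric_mean_speedup(baseline_times, accelerated_times):
    """Geometric mean of per-benchmark speedups (baseline time / accelerated time)."""
    speedups = [b / a for b, a in zip(baseline_times, accelerated_times)]
    # Averaging in log space is equivalent to taking the n-th root of the product.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical timings: one benchmark runs 4x faster, the other is unchanged.
overall = geometric_mean_speedup([4.0, 1.0], [1.0, 1.0])
```

The geometric mean is the usual choice for ratios such as speedups because a single outlier benchmark affects it less than it would an arithmetic mean.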

2014

Coarse/Fine-grained Approaches for Pipelining Computing Stages in FPGA-Based Multicore Architectures

Authors
Azarian, A; Cardoso, JMP;

Publication
EURO-PAR 2014: PARALLEL PROCESSING WORKSHOPS, PT II

Abstract
In recent years, there has been increasing interest in using task-level pipelining to accelerate the overall execution of applications mainly consisting of producer/consumer tasks. This paper presents coarse/fine-grained data flow synchronization approaches to achieve pipelining execution of the producer/consumer tasks in FPGA-based multicore architectures. Our approaches are able to speed up the overall execution of successive, data-dependent tasks, by using multiple cores and specific customization features provided by FPGAs. An important component of our approach is the use of customized inter-stage buffer schemes to communicate data and to synchronize the cores associated with the producer/consumer tasks. The experimental results show the feasibility of the approach when dealing with producer/consumer tasks with out-of-order communication and reveal noticeable performance improvements for a number of benchmarks over a single core implementation without task-level pipelining.
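
As a rough software analogy of fine-grained synchronization through an inter-stage buffer, the sketch below uses Python threads in place of FPGA cores. The class `InterStageBuffer` and the function `run_pipeline` are hypothetical names, and the per-element ready flags are only one possible scheme, not necessarily the one used in the paper. The producer deliberately writes elements in reverse order to mimic out-of-order communication.

```python
import threading


class InterStageBuffer:
    """Buffer with one ready flag per element (fine-grained synchronization)."""

    def __init__(self, size):
        self._data = [None] * size
        self._ready = [threading.Event() for _ in range(size)]

    def put(self, index, value):
        self._data[index] = value
        self._ready[index].set()  # signal that this element is available

    def get(self, index):
        self._ready[index].wait()  # block until the producer wrote this element
        return self._data[index]


def run_pipeline(n):
    buf = InterStageBuffer(n)
    results = []

    def producer():
        for i in reversed(range(n)):  # produce out of order
            buf.put(i, i * i)

    def consumer():
        for i in range(n):  # consume in order, overlapping with the producer
            results.append(buf.get(i) + 1)

    threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


output = run_pipeline(4)
```

A coarse-grained variant would instead signal once, after the whole buffer has been filled, so the consumer could not start until the producer finished.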
