Publicacoes - INESC TEC

Publicações

Publicações por HumanISE

2019

Fast Heuristic-Based GPU Compiler Sequence Specialization

Autores
Nobre, R; Reis, L; Cardoso, JMP;

Publicação
EURO-PAR 2018: PARALLEL PROCESSING WORKSHOPS

Abstract
Iterative compilation focused on specialized phase orders (i.e., custom selections of compiler passes and orderings for each program or function) can significantly improve the performance of compiled code. However, phase ordering specialization typically needs to deal with large solution space. A previous approach, evaluated by targeting an x86 CPU, mitigates this issue by first using a training phase on reference codes to produce a small set of high-quality reusable phase orders. This approach then uses these phase orders to compile new codes, without any code analysis. In this paper, we evaluate the viability of using this approach to optimize the GPU execution performance of OpenCL kernels. In addition, we propose and evaluate the use of a heuristic to further reduce the number of evaluated phase orders, by comparing the speedups of the resulting binaries with those of the training phase for each phase order. This information is used to predict which untested phase order is most likely to produce good results (e.g., highest speedup). We performed our measurements using the PolyBench/GPU OpenCL benchmark suite on an NVIDIA Pascal GPU. Without heuristics, we can achieve a geomean execution speedup of 1.64x, using cross-validation, with 5 non-standard phase orders. With the heuristic, we can achieve the same speedup with only 3 non-standard phase orders. This is close to the geomean speedup achieved in our iterative compilation experiments exploring thousands of phase orders. Given the significant reduction in exploration time and other advantages of this approach, we believe that it is suitable for a wide range of compiler users concerned with performance.

FecharLer Abstract

2019

Supporting the Scale-up of High Performance Application to Pre-Exascale Systems: The ANTAREX Approach

Autores
Silvano, C; Agosta, G; Bartolini, A; Beccari, AR; Benini, L; Besnard, L; Bispo, J; Cmar, R; Cardoso, JMP; Cavazzoni, C; Cesarini, D; Cherubin, S; Ficarelli, F; Gadioli, D; Golasowski, M; Lasri, I; Libri, A; Manelfi, C; Martinovic, J; Palermo, G; Pinto, P; Rohou, E; Sanna, N; Slaninová, K; Vitali, E;

Publicação
2019 27TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP)

Abstract
The ANTAREX project developed an approach to the performance tuning of High Performance applications based on an Aspect-oriented Domain Specific Language (DSL), with the goal to simplify the enforcement of extra-functional properties in large scale applications. The project aims at demonstrating its tools and techniques on two relevant use cases, one in the domain of computational drug discovery, the other in the domain of online vehicle navigation. In this paper, we present an overview of the project and of its main achievements, as well as of the large scale experiments that have been planned to validate the approach.

FecharLer Abstract

2019

Graph-Based Code Restructuring Targeting HLS for FPGAs

Autores
Ferreira, AC; Cardoso, JMP;

Publicação
ARC

Abstract
High-level synthesis (HLS) is of paramount importance to enable software developers to map critical computations to FPGA-based hardware accelerators. However, in order to generate efficient hardware accelerators one needs to apply significant code transformations and adequately use the directive-driven approach, part of most HLS tools. The code restructuring and directives needed are dependent not only of the characteristics of the input code but also of the HLS tools and target FPGAs. These aspects require a deep knowledge about the subjects involved and tend to exclude software developers. This paper presents our recent approach for automatic code restructuring targeting HLS tools. Our approach uses an unfolded graph representation, which can be generated from program execution traces, and graph-based optimizations, such as folding, to generate suitable HLS C code. In this paper, we describe the approach and the new optimizations proposed. We evaluate the approach with a number of representative kernels and the results show its capability to generating efficient hardware implementations only achievable using manual restructuring of the input software code and manual insertion of adequate HLS directives.

FecharLer Abstract

2019

Nonio - modular automatic compiler phase selection and ordering specialization framework for modern compilers

Autores
Nobre, R; Bispo, J; Carvalho, T; Cardoso, JMP;

Publicação
SOFTWAREX

Abstract
This article presents Nonio, a modular, easy-to-use, design space exploration framework focused on exploring custom combinations of compiler flags and compiler sequences. We describe the framework and discuss its use with two of the most popular compiler toolchains, GCC and Clang+LLVM. Particularly, we discuss implementation details in the context of flag selection, when using GCC, and phase selection and ordering, when using Clang+LLVM. The framework software organization allows to easily add new components as plug-ins (e.g., an exploration algorithm, an objective metric, integration with another compiler toolchain). The software architecture provides well-defined interfaces, in order to enable seamless composition and interaction between different components. We present, as an example, a use case where we rely on Nonio to obtain custom compiler flags for reducing the execution time and the energy consumption of a C program, in relation to the best predetermined optimization settings provided by the compiler (e.g., -O3). (C) 2019 The Authors. Published by Elsevier B.V.

FecharLer Abstract

2019

Message from the symposium general chair and program chairs

Autores
Shibata, Y; Cardoso, JMP; Takamaeda Yamazaki, S;

Publicação
ACM International Conference Proceeding Series

Abstract

2019

A framework for automatic and parameterizable memoization

Autores
Besnard, L; Pinto, P; Lasri, I; Bispo, J; Rohou, E; Cardoso, JMP;

Publicação
SOFTWAREX

Abstract
Improving execution time and energy efficiency is needed for many applications and usually requires sophisticated code transformations and compiler optimizations. One of the optimization techniques is memoization, which saves the results of computations so that future computations with the same inputs can be avoided. In this article we present a framework that automatically applies memoization techniques to C/C++ applications. The framework is based on automatic code transformations using a source-to-source compiler and on a memoization library. With the framework users can select functions to memoize as long as they obey to certain restrictions imposed by our current memoization library. We show the use of the framework and associated memoization technique and the impact on reducing the execution time and energy consumption of four representative benchmarks. (C) 2019 The Authors. Published by Elsevier B.V.

FecharLer Abstract