Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Publicações

Publicações por HumanISE

2017

Introduction to the special issue on architecture of computing systems

Autores
Hannig, F; Cardoso, JMP; Fey, D;

Publicação
JOURNAL OF SYSTEMS ARCHITECTURE

Abstract

2017

Proceedings of the 1st Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems, ANDARE@PACT 2017, Portland, OR, USA, September 9, 2017

Autores
Bartolini, A; Cardoso, JMP; Silvano, C;

Publicação
ANDARE@PACT

Abstract

2017

Message from general and program co-chairs

Autores
Cardoso, JMP; Huebner, M; Agosta, G; Silvano, C;

Publicação
ACM International Conference Proceeding Series

Abstract

2017

Impact of Compiler Phase Ordering When Targeting GPUs

Autores
Nobre, R; Reis, L; Cardoso, JMP;

Publicação
Euro-Par Workshops

Abstract
Research in compiler pass phase ordering (i.e., selection of compiler analysis/transformation passes and their order of execution) has been mostly performed in the context of CPUs and, in a small number of cases, FPGAs. In this paper we present experiments regarding compiler pass phase ordering specialization of OpenCL kernels targeting NVIDIA GPUs using Clang/LLVM 3.9 and the libclc OpenCL library. More specifically, we analyze the impact of using specialized compiler phase orders on the performance of 15 PolyBench/GPU OpenCL benchmarks. In addition, we analyze the final NVIDIA PTX assembly code generated by the different compilation flows in order to identify the main reasons for the cases with significant performance improvements. Using specialized compiler phase orders, we were able to achieve performance improvements over the CUDA version and OpenCL compiled with the NVIDIA driver. Compared to CUDA, we were able to achieve geometric mean improvements of 1.54× (up to 5.48×). Compared to the OpenCL driver version, we were able to achieve geometric mean improvements of 1.65× (up to 5.70×).

2017

On Coding Techniques for Targeting FPGAs via OpenCL

Autores
Paulino, N; Reis, L; Cardoso, JMP;

Publicação
PARCO

Abstract
Software developers have always found it difficult to adopt Field-Programmable Gate Arrays (FPGAs) as computing platforms. Recent advances in HLS tools aim to ease the mapping of computations to FPGAs by abstracting the hardware design effort via a standard OpenCL interface and execution model. However, OpenCL is a low-level programming language and requires that developers master the target architecture in order to achieve efficient results. Thus, efforts addressing the generation of OpenCL from high-level languages are of paramount importance to increase design productivity and to help software developers. Existing approaches bridge this by translating MATLAB/Octave code into C, or similar languages, in order to improve performance by efficiently compiling for the target hardware. One example is the MATISSE source-to-source compiler, which translates MATLAB code into standard-compliant C and/or OpenCL code. In this paper, we analyse the viability of combining both flows so that sections of MATLAB code can be translated to specialized hardware with a small amount of effort, and test a few code optimizations and their effect on performance. We present preliminary results relative to execution times, and resource and power consumption, for two OpenCL kernels generated by MATISSE, and manual optimizations of each kernel based on different coding techniques.

2017

The ANTAREX tool flow for monitoring and autotuning energy efficient HPC systems

Autores
Silvano, C; Agosta, G; Barbosa, JG; Bartolini, A; Beccari, AR; Benini, L; Bispo, J; Cardoso, JMP; Cavazzoni, C; Cherubin, S; Cmar, R; Gadioli, D; Manelfi, C; Martinovic, J; Nobre, R; Palermo, G; Palkovic, M; Pinto, P; Rohou, E; Sanna, N; Slaninová, K;

Publicação
SAMOS

Abstract
Designing and optimizing HPC applications are difficult and complex tasks, which require mastering specialized languages and tools for performance tuning. As this is incompatible with the current trend to open HPC infrastructures to a wider range of users, the availability of more sophisticated programming languages and tools to assist and automate the design stages is crucial to provide smoothly migration paths towards novel heterogeneous HPC platforms. The ANTAREX project intends to address these issues by providing a tool flow, a Domain Specific Launguage and APIs to provide application's adaptivity and to runtime manage and autotune applications for heterogeneous HPC systems. Our DSL provides a separation of concerns, where analysis, runtime adaptivity, performance tuning and energy strategies are specified separately from the application functionalities with the goal to increase productivity, significantly reduce time to solution, while making possible the deployment of substantially improved implementations. This paper presents the ANTAREX tool flow and shows the impact of optimization strategies in the context of one of the ANTAREX use cases related to personalized drug design. We show how simple strategies, not devised by typical compilers, can substantially speedup the execution and reduce energy consumption.

  • 377
  • 694