Publications

Publications by HumanISE

2017

Introduction to the Special Section on FPL 2015

Authors
Cardoso, JMP; Silvano, C

Publication
ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS

2017

Foreword to the special issue of the 18th IEEE international conference on computational science and engineering (CSE2015)

Authors
Plessl, C; Cong, GJ; Cardoso, JMP

Publication
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE

2017

Introduction to the special issue on architecture of computing systems

Authors
Hannig, F; Cardoso, JMP; Fey, D

Publication
JOURNAL OF SYSTEMS ARCHITECTURE

2017

Proceedings of the 1st Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems, ANDARE@PACT 2017, Portland, OR, USA, September 9, 2017

Authors
Bartolini, A; Cardoso, JMP; Silvano, C

Publication
ANDARE@PACT

2017

Message from general and program co-chairs

Authors
Cardoso, JMP; Huebner, M; Agosta, G; Silvano, C

Publication
ACM International Conference Proceeding Series

2017

Impact of Compiler Phase Ordering When Targeting GPUs

Authors
Nobre, R; Reis, L; Cardoso, JMP

Publication
Euro-Par 2017: Parallel Processing Workshops - Euro-Par 2017 International Workshops, Santiago de Compostela, Spain, August 28-29, 2017, Revised Selected Papers

Abstract
Research in compiler pass phase ordering (i.e., selection of compiler analysis/transformation passes and their order of execution) has been mostly performed in the context of CPUs and, in a small number of cases, FPGAs. In this paper we present experiments regarding compiler pass phase ordering specialization of OpenCL kernels targeting NVIDIA GPUs using Clang/LLVM 3.9 and the libclc OpenCL library. More specifically, we analyze the impact of using specialized compiler phase orders on the performance of 15 PolyBench/GPU OpenCL benchmarks. In addition, we analyze the final NVIDIA PTX assembly code generated by the different compilation flows in order to identify the main reasons for the cases with significant performance improvements. Using specialized compiler phase orders, we were able to achieve performance improvements over the CUDA version and OpenCL compiled with the NVIDIA driver. Compared to CUDA, we were able to achieve geometric mean improvements of 1.54× (up to 5.48×). Compared to the OpenCL driver version, we were able to achieve geometric mean improvements of 1.65× (up to 5.70×). © Springer International Publishing AG, part of Springer Nature 2018.
