About

João M. P. Cardoso received his PhD degree in Electrical and Computer Engineering from the IST/UTL (Technical University of Lisbon), Lisbon, Portugal in 2001. He is currently a Full Professor at the Department of Informatics Eng., Faculty of Eng. of the University of Porto, Porto, Portugal, and a research member of INESC TEC. Previously, he was with the IST/UTL (2006-2008), a senior researcher at INESC-ID (2001-2009), and with the University of Algarve (1993-2006). In 2001/2002, he worked for PACT XPP Technologies, Inc., Munich, Germany. He has been involved in the organization and served as a Program Committee member for many international conferences. For example, he was General Co-Chair of IEEE/IFIP EUC’2015 and IEEE CSE’2015, General Chair of FPL’2013, General Co-Chair of ARC’2014 and ARC’2006, Program Co-Chair of ARCS’2016, DASIP’2014, and RAW’2010. He has (co-)authored over 150 scientific publications on subjects related to compilers, embedded systems, and reconfigurable computing. He has coordinated a number of research projects. He is a senior member of IEEE, a member of IEEE Computer Society, and a senior member of ACM. His research interests include compilation techniques, domain-specific languages, reconfigurable computing, application-specific architectures, and high-performance computing with a particular emphasis on embedded computing.

Details

  • Name

    João Paiva Cardoso
  • Cluster

    Computer Science
  • Role

    Senior Researcher
  • Since

    1st July 2011
Publications

2018

Aspect composition for multiple target languages using LARA

Authors
Pinto, P; Carvalho, T; Bispo, J; Ramalho, MA; Cardoso, JMP;

Publication
Computer Languages, Systems and Structures

Abstract
Usually, Aspect-Oriented Programming (AOP) languages are an extension of a specific target programming language (e.g., AspectJ for JAVA and AspectC++ for C++). Although providing AOP support with target language extensions may ease the adoption of an approach, it may impose constraints related with constructs and semantics. Furthermore, by tightly coupling the AOP language to the target language the reuse potential of many aspects, especially the ones regarding non-functional requirements, is lost. LARA is a domain-specific language inspired by AOP concepts, having the specification of source-to-source transformations as one of its main goals. LARA has been designed to be, as much as possible, independent of the target language and to provide constructs and semantics that ease the definition of concerns, especially related to non-functional requirements. In this paper, we propose techniques to overcome some of the challenges presented by a multilanguage approach to AOP of cross-cutting concerns focused on non-functional requirements and applied through the use of a weaving process. The techniques mainly focus on providing well-defined library interfaces that can have concrete implementations for each supported target language. The developer uses an agnostic interface and the weaver provides a specific implementation for the target language. We evaluate our approach using 8 concerns with varying levels of language agnosticism that support 4 target languages (C, C++, JAVA and MATLAB) and show that the proposed techniques contribute to more concise LARA aspects, high reuse of aspects, and to significant effort reductions when developing weavers for new imperative, object-oriented programming languages. © 2018 Elsevier Ltd

2018

Aspect-Driven Mixed-Precision Tuning Targeting GPUs

Authors
Nobre, R; Reis, L; Bispo, J; Carvalho, T; Cardoso, JMP; Cherubin, S; Agosta, G;

Publication
Proceedings of the 9th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and 7th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, PARMA-DITAM@HiPEAC 2018, Manchester, United Kingdom, January 23-23, 2018

Abstract
Writing mixed-precision kernels makes it possible to achieve higher throughput together with outputs whose precision remains within given limits. The recent introduction of native half-precision arithmetic capabilities in several GPUs, such as NVIDIA P100 and AMD Vega 10, has made precision tuning even more relevant. However, it is not trivial to manually find which variables are to be represented as half-precision instead of single- or double-precision. Although the use of half-precision arithmetic can speed up kernel execution considerably, it can also result in non-usable kernel outputs whenever the wrong variables are declared using the half-precision data-type. In this paper we present an automatic approach for precision tuning. Given an OpenCL kernel with a set of inputs declared by a user (i.e., the person responsible for programming and/or tuning the kernel), our approach is capable of deriving the mixed-precision versions of the kernel that improve upon the original with respect to a given metric (e.g., time-to-solution, energy-to-solution). We allow the user to declare and/or select a metric to measure and to filter solutions based on the quality of the output. We implement a proof-of-concept of our approach using an aspect-oriented programming language called LARA. It is capable of generating mixed-precision kernels that result in considerably higher performance when compared with the original single-precision floating-point versions, while generating outputs that can be acceptable in some scenarios. © 2018 Copyright held by the owner/author(s).

2018

AutoPar-Clava: An Automatic Parallelization source-to-source tool for C code applications

Authors
Arabnejad, H; Bispo, J; Barbosa, JG; Cardoso, JMP;

Publication
Proceedings of the 9th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and 7th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, PARMA-DITAM@HiPEAC 2018, Manchester, United Kingdom, January 23-23, 2018

Abstract
Automatic parallelization of sequential code has become increasingly relevant in multicore programming. In particular, loop parallelization continues to be a promising optimization technique for scientific applications, and can provide considerable speedups for program execution. Furthermore, if we can verify that there are no true data dependencies between loop iterations, they can be easily parallelized. This paper describes Clava AutoPar, a library for the Clava weaver that performs automatic and symbolic parallelization of C code. The library is composed of two main parts, parallel loop detection and source-to-source code parallelization. The system is entirely automatic and attempts to statically detect parallel loops for a given input program, without any user intervention or profiling information. We obtained a geometric mean speedup of 1.5 for a set of programs from the C version of the NAS benchmark, and experimental results suggest that the performance obtained with Clava AutoPar is comparable to or better than that of other similar research and commercial tools. © 2018 Copyright held by the owner/author(s).

2018

An approach based on a DSL + API for programming runtime adaptivity and autotuning concerns

Authors
Carvalho, T; Cardoso, JMP;

Publication
Proceedings of the 33rd Annual ACM Symposium on Applied Computing, SAC 2018, Pau, France, April 09-13, 2018

Abstract
In the context of compiler optimizations, tuning of parameters and selection of algorithms, runtime adaptivity and autotuning are becoming increasingly important, especially due to the complexity of applications, workloads, computing devices and execution environments. For identifying and specifying adaptivity, different phases are required: analysis of program hotspots and adaptivity opportunities, code restructuring, and programming of adaptivity strategies. These phases usually require different tools and modifications to the source code that may result in difficult-to-maintain and error-prone code. This paper presents a flexible approach to support the different phases when developing adaptive applications. The approach is based on a single domain-specific language (DSL), able to specify and evaluate multiple strategies and to maintain a separation of concerns. We describe the requirements and the design of the DSL, an accompanying API, and of a Java-to-Java compiler that implements the approach. In addition, we present and evaluate the use of the approach to specify runtime adaptivity strategies in the context of Java programs, especially when considering runtime autotuning of optimization parameters and runtime selection of algorithms. Although simple, the case studies shown demonstrate the main advantages of the approach in terms of the programming model and of the performance impact. © 2018 ACM.

2018

Impact of Vectorization Over 16-bit Data-Types on GPUs

Authors
Reis, L; Nobre, R; Cardoso, JMP;

Publication
Proceedings of the 9th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and 7th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, PARMA-DITAM@HiPEAC 2018, Manchester, United Kingdom, January 23-23, 2018

Abstract
Since the introduction of Single Instruction Multiple Thread (SIMT) GPU architectures, vectorization has seldom been recommended. However, for efficient use of 8-bit and 16-bit data types, vector types are necessary even on these GPUs. When only integer types were natively supported in sizes of less than 32 bits, the usefulness of vectors was limited, but the introduction of hardware support for packed half-precision floating-point computations in recent GPU architectures changes this, as now floating-point programs can also benefit from vector types. Given a GPU kernel, using smaller data types might not be sufficient to achieve the optimal performance for a given device, even on hardware with native support for half-precision, because the compiler targeting the GPU may not be able to automatically vectorize the code. In this paper, we present a number of examples that make use of the OpenCL vector data-types, which we are currently implementing in our tool for automatic vectorization. We present a number of experiments targeting a graphics card with an AMD Vega 10 XT GPU, which has 2× peak arithmetic throughput using half-precision when compared with single-precision. For comparison, we also target an older GPU architecture, without native support for half-precision arithmetic. We found that, on an AMD Vega 10 XT GPU, half-precision vectorization leads to performance improvements over the scalar version using the same precision (geometric mean speedup of 1.50×), which can be attributed to the GPU being able to make use of native support for arithmetic over packed half-precision data. However, we found that most of the performance improvement of vectorization is caused by related transformations, such as thread coarsening or loop unrolling. © 2018 Copyright held by the owner/author(s).

Supervised thesis

2016

RAVEN: a Node.js Static Metadata Extracting Solution for JavaScript Applications

Author
Carlos Maria Antunes Matias

Institution
UP-FEUP

2016

Exploiting JavaScript Birthmarking Techniques for Code Theft Detection

Author
João Carlos Costa Pinto

Institution
UP-FEUP

2016

0

Author
Rosária Maria Afonso Rodrigues de Melo

Institution
UP-FCNA

2016

Multitarget Compilation Techniques for Generating Efficient OpenCL Code from Matrix-oriented Computations

Author
Luis Alexandre Cubal dos Reis

Institution
UP-FEUP

2016

Runtime-aware Compiler Optimizations for High-Performance Embedded Computing

Author
Pedro Miguel dos Santos Pinto

Institution
UP-FEUP