Cookies Policy
The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More
Accept Reject
  • Menu
About
Download Photo HD

About

João M. P. Cardoso received his PhD degree in Electrical and Computer Engineering from the IST/UTL (Technical University of Lisbon), Lisbon, Portugal in 2001. He is currently Full Professor at the Department of Informatics Eng., Faculty of Eng. of the University of Porto, Porto, Portugal, and a research member of INESC TEC. Before, he was with the IST/UTL (2006-2008), a senior researcher at INESC-ID (2001-2009), and with the University of Algarve (1993-2006). In 2001/2002, he worked for PACT XPP Technologies, Inc., Munich, Germany. He has been involved in the organization and served as a Program Committee member for many international conferences. For example, he was general Co-Chair of IEEE/IFIP EUC’2015 and IEEE CSE’2015, General Chair of FPL’2013, General Co-Chair of ARC’2014 and ARC’2006, Program Co-Chair of ARCS’2016, DASIP’2014, and RAW’2010. He has (co-)authored over 150 scientific publications on subjects related to compilers, embedded systems, and reconfigurable computing. He has coordinated a number of research projects. He is a senior member of IEEE, a member of IEEE Computer Society, and a senior member of ACM.  His research interests include compilation techniques, domain-specific languages, reconfigurable computing, application-specific architectures, and high-performance computing with a particular emphasis in embedded computing.

Interest
Topics
Details

Details

  • Name

    João Paiva Cardoso
  • Cluster

    Computer Science
  • Role

    Senior Researcher
  • Since

    01st July 2011
003
Publications

2020

Source-to-source compilation targeting OpenMP-based automatic parallelization of C applications

Authors
Arabnejad, H; Bispo, J; Cardoso, JMP; Barbosa, JG;

Publication
Journal of Supercomputing

Abstract
Directive-driven programming models, such as OpenMP, are one solution for exploring the potential parallelism when targeting multicore architectures. Although these approaches significantly help developers, code parallelization is still a non-trivial and time-consuming process, requiring parallel programming skills. Thus, many efforts have been made toward automatic parallelization of the existing sequential code. This article presents AutoPar-Clava, an OpenMP-based automatic parallelization compiler which: (1) statically detects parallelizable loops in C applications; (2) classifies variables used inside the target loop based on their access pattern; (3) supports reduction clauses on scalar and array variables whenever it is applicable; and (4) generates a C OpenMP parallel code from the input sequential version. The effectiveness of AutoPar-Clava is evaluated by using the NAS and Polyhedral Benchmark suites and targeting a x86-based computing platform. The achieved results are very promising and compare favorably with closely related auto-parallelization compilers, such as Intel C/C++ Compiler (icc), ROSE, TRACO and CETUS. © 2019, Springer Science+Business Media, LLC, part of Springer Nature.

2020

A Study on Hyperparameter Configuration for Human Activity Recognition

Authors
Crarcia, KD; Carvalho, T; Mendes Moreira, J; Cardoso, JMP; de Carvalho, ACPLF;

Publication
14th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2019) - Seville, Spain, May 13-15, 2019, Proceedings

Abstract
Human Activity Recognition is a machine learning task for the classification of human physical activities. Applications for that task have been extensively researched in recent literature, specially due to the benefits of improving quality of life. Since wearable technologies and smartphones have become more ubiquitous, a large amount of information about a person’s life has become available. However, since each person has a unique way of performing physical activities, a Human Activity Recognition system needs to be adapted to the characteristics of a person in order to maintain or improve accuracy. Additionally, when smartphones devices are used to collect data, it is necessary to manage its limited resources, so the system can efficiently work for long periods of time. In this paper, we present a semi-supervised ensemble algorithm and an extensive study of the influence of hyperparameter configuration in classification accuracy. We also investigate how the classification accuracy is affected by the person and the activities performed. Experimental results show that it is possible to maintain classification accuracy by adjusting hyperparameters, like window size and window overlap, depending on the person and activity performed. These results motivate the development of a system able to automatically adapt hyperparameter settings for the activity performed by each person. © 2020, Springer Nature Switzerland AG.

2020

Improving performance and energy consumption in embedded systems via binary acceleration: A survey

Authors
Paulino, N; Ferreira, JC; Cardoso, JMP;

Publication
ACM Computing Surveys

Abstract
The breakdown of Dennard scaling has resulted in a decade-long stall of the maximum operating clock frequencies of processors. To mitigate this issue, computing shifted to multi-core devices. This introduced the need for programming flows and tools that facilitate the expression of workload parallelism at high abstraction levels. However, not all workloads are easily parallelizable, and the minor improvements to processor cores have not significantly increased single-threaded performance. Simultaneously, Instruction Level Parallelism in applications is considerably underexplored. This article reviews notable approaches that focus on exploiting this potential parallelism via automatic generation of specialized hardware from binary code. Although research on this topic spans over more than 20 years, automatic acceleration of software via translation to hardware has gained new importance with the recent trend toward reconfigurable heterogeneous platforms. We characterize this kind of binary acceleration approach and the accelerator architectures on which it relies. We summarize notable state-of-the-art approaches individually and present a taxonomy and comparison. Performance gains from 2.6× to 5.6× are reported, mostly considering bare-metal embedded applications, along with power consumption reductions between 1.3× and 3.9×. We believe the methodologies and results achievable by automatic hardware generation approaches are promising in the context of emergent reconfigurable devices. © 2020 Association for Computing Machinery.

2020

Compilation of MATLAB computations to CPU/GPU via C/OpenCL generation

Authors
Reis, L; Bispo, J; Cardoso, JMP;

Publication
Concurrency Computation

Abstract
In order to take advantage of the processing power of current computing platforms, programmers typically need to develop software versions for different target devices. This task is time-consuming and requires significant programming and computer architecture expertise. A possible and more convenient alternative is to start with a single high-level description of a program with minimum implementation details, and generate custom implementations according to the target platform. In this paper, we use MATLAB as a high-level programming language and propose a compiler that targets CPU/GPU computing platforms by generating customized implementations in C and OpenCL. We propose a number of compiler techniques to automatically generate efficient C and OpenCL code from MATLAB programs. One of such compiler techniques relies on heuristics to decide when and how to use Shared Virtual Memory (SVM). The experimental results show that our approach is able to generate code that provides significant speedups (eg, geometric mean speedup of 11× for a set of simple benchmarks) using a discrete GPU over equivalent sequential C code executing on a CPU. With more complex benchmarks, for which only some code regions can be parallelized, and are thus offloaded, the generated code achieved speedups of up to 2.2×. We also show the impact of using SVM, specifically fine-grained buffers, and the results show that the compiler is able to achieve significant speedups, both over the versions without SVM and with naïve aggressive SVM use, across three CPU/GPU platforms. © 2020 John Wiley & Sons, Ltd.

2020

Source-to-source compilation targeting OpenMP-based automatic parallelization of C applications

Authors
Arabnejad, H; Bispo, J; Cardoso, JMP; Barbosa, JG;

Publication
J. Supercomput.

Abstract

Supervised
thesis

2019

From Binary to Multi-Class Divisions: improvements on Hierarchical Divisive Human Activity Recognition

Author
Tomás Vieira Caldas

Institution
UP-FEUP

2019

Runtime-aware Compiler Optimizations for High-Performance Embedded Computing

Author
Pedro Miguel dos Santos Pinto

Institution
UP-FEUP

2019

Programming and mapping strategies for embedded computing runtime adaptability

Author
Tiago Diogo Ribeiro de Carvalho

Institution
UP-FEUP

2019

Automatic switching between video and audio according to user’s context

Author
Paulo Jorge Silva Ferreira

Institution
UP-FEUP

2019

Acceleration of Applications with FPGA-Based Computing Machines: New DSL

Author
Daniel Alexandre Pimenta Lopes Fernandes

Institution
UP-FEUP