Cookies Policy

The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More

Institution
Research
Research Domains
Artificial Intelligence

Bioengineering

Communications

Computer Science and Engineering

Photonics

Power and Energy Systems

Robotics

Systems Engineering and Management
RESEARCH CENTERS
Porto, Portugal

+351 222 094 000

info@inesctec.pt
Innovation
Innovation / Tec4

TEC4AGRO-FOOD

TEC4ENERGY

TEC4HEALTH

TEC4INDUSTRY

TEC4SEA

TECPARTNERSHIPS

Available Technologies
Porto, Portugal

+351 222 094 000

info@inesctec.pt
Laboratories
Research Laboratories

iilab
Communication
News

Events

Media

Newsletter
Porto, Portugal

+351 222 094 000

info@inesctec.pt
Work with us
Contacts

Home
People
João Paiva Cardoso

Read Full presentation

João M. P. Cardoso received his PhD degree in Electrical and Computer Engineering from the IST/UTL (Technical University of Lisbon), Lisbon, Portugal in 2001. He is currently Full Professor at the Department of Informatics Eng., Faculty of Eng. of the University of Porto, Porto, Portugal, and a research member of INESC TEC. Before, he was with the IST/UTL (2006-2008), a senior researcher at INESC-ID (2001-2009), and with the University of Algarve (1993-2006). In 2001/2002, he worked for PACT XPP Technologies, Inc., Munich, Germany. He has been involved in the organization and served as a Program Committee member for many international conferences. For example, he was general Co-Chair of IEEE/IFIP EUC’2015 and IEEE CSE’2015, General Chair of FPL’2013, General Co-Chair of ARC’2014 and ARC’2006, Program Co-Chair of ARCS’2016, DASIP’2014, and RAW’2010. He has (co-)authored over 150 scientific publications on subjects related to compilers, embedded systems, and reconfigurable computing. He has coordinated a number of research projects. He is a senior member of IEEE, a member of IEEE Computer Society, and a senior member of ACM. His research interests include compilation techniques, domain-specific languages, reconfigurable computing, application-specific architectures, and high-performance computing with a particular emphasis in embedded computing.

Read Full presentation

About

About

João M. P. Cardoso received his PhD degree in Electrical and Computer Engineering from the IST/UTL (Technical University of Lisbon), Lisbon, Portugal in 2001. He is currently Full Professor at the Department of Informatics Eng., Faculty of Eng. of the University of Porto, Porto, Portugal, and a research member of INESC TEC. Before, he was with the IST/UTL (2006-2008), a senior researcher at INESC-ID (2001-2009), and with the University of Algarve (1993-2006). In 2001/2002, he worked for PACT XPP Technologies, Inc., Munich, Germany. He has been involved in the organization and served as a Program Committee member for many international conferences. For example, he was general Co-Chair of IEEE/IFIP EUC’2015 and IEEE CSE’2015, General Chair of FPL’2013, General Co-Chair of ARC’2014 and ARC’2006, Program Co-Chair of ARCS’2016, DASIP’2014, and RAW’2010. He has (co-)authored over 150 scientific publications on subjects related to compilers, embedded systems, and reconfigurable computing. He has coordinated a number of research projects. He is a senior member of IEEE, a member of IEEE Computer Society, and a senior member of ACM. His research interests include compilation techniques, domain-specific languages, reconfigurable computing, application-specific architectures, and high-performance computing with a particular emphasis in embedded computing.

Interest
Topics

Details

Details

Name
João Paiva Cardoso
Role
Senior Researcher
Since
01st July 2011

Nationality
Portugal
Centre
Human-Centered Computing and Information Science
Contacts
+351222094199
joao.paiva.cardoso@inesctec.pt

003

Publications

View all Publications

2025

On improving the HLS compatibility of large C/C plus plus code regions

Authors
Santos, T; Bispo, J; Cardoso, JMP; Hoe, JC;

Publication
2025 IEEE 33RD ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, FCCM

Abstract
Heterogeneous CPU-FPGA C/C++ applications may rely on High-level Synthesis (HLS) tools to generate hardware for critical code regions. As typical HLS tools have several restrictions in terms of supported language features, to increase the size and variety of offloaded regions, we propose several code transformations to improve synthesizability. Such code transformations include: struct and array flattening; moving dynamic memory allocations out of a region; transforming dynamic memory allocations into static; and asynchronously executing host functions, e.g., printf(). We evaluate the impact of these transformations on code region size using three realworld applications whose critical regions are limited by nonsynthesizable C/C++ language features.

CloseRead Abstract

2025

Ph.D. Project: Holistic Partitioning and Optimization of CPU-FPGA Applications Through Source-to-Source Compilation

Authors
Santos, T; Bispo, J; Cardoso, JMP;

Publication
2025 IEEE 33RD ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, FCCM

Abstract
Critical performance regions of software applications are often accelerated by offloading them onto an FPGA. An efficient end result requires the judicious application of two processes: hardware/software (hw/sw) partitioning, which identifies the regions for offloading, and the optimization of those regions for efficient High-level Synthesis (HLS). Both processes are commonly applied separately, not relying on any potential interplay between them, and not revealing how the decisions made in one process could positively influence the other. This paper describes our primary efforts and contributions made so far, and our work-in-progress, in an approach that combines both hw/sw partitioning and optimization into a unified, holistic process, automated using source-to-source compilation. By using an Extended Task Graph (ETG) representation of a C/C++ application, and expanding the synthesizable code regions, our approach aims at creating clusters of tasks for offloading by a) maximizing the potential optimizations applied to the cluster, b) minimizing the global communication cost, and c) grouping tasks that share data in the same cluster.

CloseRead Abstract

2025

First Twenty Years of the International Symposium on Applied Reconfigurable Computing (ARC): A Selection of Papers

Authors
Cardoso, JMP; Najjar, WA;

Publication
ARC

Abstract
The International Symposium on Applied Reconfigurable Computing (ARC) is an annual forum for the discussion and dissemination of research, notably applying the Reconfigurable Computing (RC) concept to real-world problems. The first edition of ARC took place in 2005, and in 2024, ARC celebrated its 20th edition. During those 20 years, the field of reconfigurable computing saw a tremendous growth in its underlying technology. ARC contributed very significantly to the presentation and dissemination of new ideas, innovative applications, and fruitful discussions, all of which have resulted in the shaping of novel lines of research. Here, we present selected papers from the first 20 years of ARC, that we believe represent the corpus of work and reflect the ARC spirit by covering a broad spectrum of RC applications, benchmarks, tools, and architectures.

CloseRead Abstract

2025

HLS to FPGAs: Extending Software Regions Via Transformations and Offloading Functions to the CPU

Authors
Santos, T; Bispo, J; Cardoso, JMP; Hoe, JC;

Publication
MCSoC

Abstract
On a CPU-FPGA system, C/C++ applications are typically accelerated by offloading specific code regions onto the FPGA using High-level Synthesis (HLS). Although modern FPGAs can implement increasingly large and complex designs, the size and variety of potential offloading code regions remain constrained by the limitations of HLS tools (e.g., no support for dynamic memory allocation and system calls). This paper proposes automated C/C++ source-to-source transformations that tackle these limitations in two steps. Firstly, transformations reduce the entropy of an input C/C++ application by converting it into a subset of C, e.g., by flattening arrays and structs. Secondly, additional transformations make a selected code region synthesizable, e.g., by moving dynamic memory allocations out of the region, converting them to static memory, and offloading non-synthesizable C standard library calls, such as printf(), to the CPU. We evaluate the impact of these transformations showing results obtained through Vitis HLS for four real-world examples: the disparity and texture-synthesis benchmarks from CortexSuite, which contain dynamic memory allocations and indirect pointers in their hotspots; llama2, a Large Language Model that calls printf() every time it predicts a new word; and the spam-filter benchmark from Rosetta, as a debugging showcase. © 2025 IEEE.

CloseRead Abstract

2024

A Fast and Energy-Efficient Method for Online and Incremental Pareto-Front Update

Authors
Ferreira, PJS; Moreira, JM; Cardoso, JMP;

Publication
WF-IoT

Abstract
Self-adaptive Systems (SaS) are becoming increasingly important for adapting to dynamic environments and for optimizing performance on resource-constrained devices. A practical approach to achieving self-adaptability involves using a Pareto-Front (PF) to store the system's hyper-parameters and the outcomes of hyperparameter combinations. This paper proposes a novel method to approximate a PF, offering a configurable number of solutions that can be adapted to the device's limitations. We conducted extensive experiments across various scenarios, where all PF solutions were replaced, and real world scenarios were performed using actual measurements from a Human Activity Recognition (HAR) system. Our results show that our method consistently outperforms previous methods, mainly when the maximum number of PF solutions is in the order of hundreds. The effectiveness of our method is most apparent in real-case scenarios where it achieves, when executed in a Raspberry Pi 5, up to 87% energy consumption reduction and lower execution times than the second-best algorithm. Additionally, our method ensures a more evenly distributed solution across the PF, preventing the high concentration of solutions.

CloseRead Abstract