
Publications by João Bispo

2019

The ANTAREX domain specific language for high performance computing

Authors
Silvano, C; Agosta, G; Bartolini, A; Beccari, AR; Benini, L; Besnard, L; Bispo, J; Cmar, R; Cardoso, JMP; Cavazzoni, C; Cesarini, D; Cherubin, S; Ficarelli, F; Gadioli, D; Golasowski, M; Libri, A; Martinovic, J; Palermo, G; Pinto, P; Rohou, E; Slaninová, K; Vitali, E

Publication
MICROPROCESSORS AND MICROSYSTEMS

Abstract
The ANTAREX project relies on a Domain Specific Language (DSL) based on Aspect Oriented Programming (AOP) concepts to allow applications to enforce extra functional properties such as energy-efficiency and performance and to optimize Quality of Service (QoS) in an adaptive way. The DSL approach allows the definition of energy-efficiency, performance, and adaptivity strategies as well as their enforcement at runtime through application autotuning and resource and power management. In this paper, we present an overview of the key outcome of the project, the ANTAREX DSL, and some of its capabilities through a number of examples, including how the DSL is applied in the context of the project use cases.
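The abstract's central idea is that adaptivity strategies are written separately from the application and enforced at runtime through autotuning. A minimal sketch of that autotuning idea in plain C (not ANTAREX DSL syntax; the function names, the two kernel variants, and the precomputed cost metric are all illustrative assumptions, used instead of a real timer to keep the sketch deterministic):

```c
#include <assert.h>

/* Two hypothetical variants of the same computation; a runtime
   autotuner would pick whichever better meets the current
   energy/performance goal. */
static int square_v0(int x) { return x * x; }
static int square_v1(int x) { int r = 0; for (int i = 0; i < x; i++) r += x; return r; }

typedef int (*variant_fn)(int);

/* Select the variant with the lowest measured cost. In the DSL-based
   approach described above, a strategy like this stays out of the
   application source and is woven in as an aspect. */
static variant_fn autotune(const variant_fn variants[], const double costs[], int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (costs[i] < costs[best]) best = i;
    return variants[best];
}
```

The point of the separation is that the application code (`square_v0`, `square_v1`) never references the selection policy, which can be swapped without touching the kernels.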

2025

Acceleration of C/C++ Kernels and ONNX Models on CGRAs with MLIR-Based Compilation

Authors
Gallego, J; Ferreira, J; Alves, L; Vázquez, D; Bispo, J; Rodríguez, A; Paulino, N; Otero, A

Publication
2025 40TH CONFERENCE ON DESIGN OF CIRCUITS AND INTEGRATED SYSTEMS, DCIS

Abstract
Executing Artificial Intelligence (AI) at the edge is challenging due to tight energy and computational constraints. Heterogeneous platforms, particularly those incorporating Coarse-Grained Reconfigurable Arrays (CGRAs), offer a compelling trade-off between hardware specialization and programmability, supporting spatially distributed and energy-efficient computation. Despite their potential, the deployment of applications on CGRA accelerators remains limited by the lack of practical toolchains and methodologies. In this work, we propose a compilation flow based on MLIR to enable the seamless integration of both C/C++ kernels and ONNX-based AI models into a RISC-V system augmented with a CGRA accelerator. Our approach extracts the underlying Data Flow Graph (DFG) from the high-level representation and maps it onto the CGRA using an Integer Linear Programming (ILP) mapper that accounts for the accelerator's architectural constraints. A custom backend completes the toolchain by generating the necessary binaries for coordinated execution across the RISC-V processor and the CGRA. This framework enables the practical deployment of heterogeneous edge workloads, combining the flexibility of software execution with the efficiency of hardware acceleration.
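To make the DFG-extraction step concrete, here is a hypothetical C kernel of the kind such a flow would target (the kernel itself is an illustrative assumption, not taken from the paper). Each load, multiply, add, and store in the loop body becomes a node in the Data Flow Graph, and the ILP mapper then assigns each node to a CGRA processing element subject to the array's routing and resource constraints:

```c
#include <assert.h>

/* Multiply-accumulate kernel. Per iteration the DFG contains:
   load a[i], load b[i], load out[i]  -> three load nodes
   a[i] * b[i]                        -> one multiply node
   (...) + out[i]                     -> one add node
   store out[i]                       -> one store node
   Edges between these nodes carry the data dependences the
   ILP mapper must preserve when placing them on the CGRA. */
static void mac_kernel(const int *a, const int *b, int *out, int n) {
    for (int i = 0; i < n; i++) {
        out[i] = a[i] * b[i] + out[i];
    }
}
```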

2025

HLS to FPGAs: Extending Software Regions Via Transformations and Offloading Functions to the CPU

Authors
Santos, T; Bispo, J; Cardoso, JMP; Hoe, JC

Publication
MCSoC

Abstract
On a CPU-FPGA system, C/C++ applications are typically accelerated by offloading specific code regions onto the FPGA using High-level Synthesis (HLS). Although modern FPGAs can implement increasingly large and complex designs, the size and variety of potential offloading code regions remain constrained by the limitations of HLS tools (e.g., no support for dynamic memory allocation and system calls). This paper proposes automated C/C++ source-to-source transformations that tackle these limitations in two steps. Firstly, transformations reduce the entropy of an input C/C++ application by converting it into a subset of C, e.g., by flattening arrays and structs. Secondly, additional transformations make a selected code region synthesizable, e.g., by moving dynamic memory allocations out of the region, converting them to static memory, and offloading non-synthesizable C standard library calls, such as printf(), to the CPU. We evaluate the impact of these transformations showing results obtained through Vitis HLS for four real-world examples: the disparity and texture-synthesis benchmarks from CortexSuite, which contain dynamic memory allocations and indirect pointers in their hotspots; llama2, a Large Language Model that calls printf() every time it predicts a new word; and the spam-filter benchmark from Rosetta, as a debugging showcase. © 2025 IEEE.
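A hedged before/after sketch of the second transformation step described in the abstract: a code region using `malloc()` and `printf()` (both typically unsupported by HLS tools) is rewritten so the buffer is static and the `printf()` call goes through a stub that stands in for offloading to the CPU. The function names, the counter, and the static-buffer size are illustrative assumptions, not the paper's actual mechanism:

```c
#include <assert.h>
#include <string.h>

#define BUF_MAX 64  /* assumed static bound replacing the dynamic allocation */

/* Stand-in for the mechanism that forwards the call to the CPU;
   here it only counts invocations so the sketch is testable. */
static int offloaded_printf_count = 0;
static void cpu_offload_printf(const char *msg) {
    (void)msg;
    offloaded_printf_count++;
}

/* After transformation: a static buffer replaces malloc(), and the
   non-synthesizable printf() is routed through the offload stub,
   leaving the region itself free of unsupported constructs. */
static int sum_region(const int *in, int n) {
    static int buf[BUF_MAX];          /* was: int *buf = malloc(n * sizeof(int)); */
    int sum = 0;
    memcpy(buf, in, (size_t)n * sizeof(int));
    for (int i = 0; i < n; i++)
        sum += buf[i];
    cpu_offload_printf("sum done");   /* was: printf("sum done\n"); */
    return sum;
}
```

The static conversion assumes an upper bound on the allocation size is known; when it is not, the allocation has to be hoisted out of the offloaded region instead, as the abstract also describes.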
