About

I received the Ph.D. degree in Electrical and Computer Engineering from the University of Porto (Portugal) in 2001. I'm currently an assistant professor with the Faculty of Engineering, University of Porto, and a senior researcher at INESC TEC. I'm a member of IEEE, ACM and Euromicro.


My research interests center on the design of dedicated digital systems for complex and demanding embedded applications. I'm particularly interested in three areas:

  • Design of self-adaptive digital systems
  • FPGA-based reconfigurable computing
  • Hardware acceleration of embedded systems

Some concrete research topics are:

    • dynamic reconfiguration of FPGAs
    • generation of FPGA configurations at run-time
    • fast physical synthesis for digital circuits
    • virtual programmable hardware architectures
    • transparent task migration from software to hardware

    Details

    • Name: João Canas Ferreira
    • Role: Senior Researcher
    • Since: 1 November 1988
    Publications

    2022

    A Flexible HLS Hoeffding Tree Implementation for Runtime Learning on FPGA

    Authors
    Sousa, LM; Paulino, N; Ferreira, JC; Bispo, J;

    Publication
    2022 IEEE 21ST MEDITERRANEAN ELECTROTECHNICAL CONFERENCE (IEEE MELECON 2022)

    Abstract
    Decision trees are often preferred when implementing machine learning in embedded systems because of their simplicity and scalability. Hoeffding trees are a type of decision tree that uses the Hoeffding bound to learn patterns in data without having to continuously store the data samples for future reprocessing. This makes them especially suitable for deployment on embedded devices. In this work, we highlight the features of an HLS implementation of the Hoeffding tree. The implementation parameters include the feature size of the samples (D), the number of output classes (K), and the maximum number of nodes to which the tree is allowed to grow (Nd). We target a Xilinx MPSoC ZCU102 and evaluate the design's resource requirements and clock frequency for different numbers of classes and feature sizes, the execution time on several synthetic datasets of varying size (N), and the execution time and accuracy on two datasets from UCI. For a problem size of D = 3, K = 5 and N = 40000, a single decision tree operating at 103 MHz achieves 8.3x faster inference than the 1.2 GHz ARM Cortex-A53 core. Compared to a reference implementation of the Hoeffding tree, we achieve comparable classification accuracy on the UCI datasets.
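The role of the Hoeffding bound in deciding when a leaf may split can be made concrete with a minimal sketch. It assumes the standard Hoeffding tree (VFDT) split rule with information gain as the split metric, a confidence parameter delta and a tie-breaking threshold; the function names and the default threshold of 0.05 are illustrative assumptions, not details of the HLS implementation described above.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a random
    variable with the given range lies within epsilon of the mean of n
    independent observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, num_classes: int,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """VFDT-style split test at a leaf that has seen n samples.

    tie_threshold is a hypothetical default chosen for illustration."""
    # Information gain over K classes is bounded by log2(K).
    eps = hoeffding_bound(math.log2(num_classes), delta, n)
    # Split if the best attribute is clearly better than the runner-up,
    # or if the two are so close that waiting for more samples is not worthwhile.
    return (best_gain - second_gain) > eps or eps < tie_threshold


print(should_split(0.5, 0.3, num_classes=5, delta=1e-7, n=200))   # False
print(should_split(0.5, 0.3, num_classes=5, delta=1e-7, n=2000))  # True
```

With K = 5 classes and delta = 1e-7, a gain gap of 0.2 is not yet enough to split after 200 samples (epsilon is about 0.47), but it is after 2000 samples (epsilon is about 0.15). Only the per-attribute statistics are needed for this test, which is why the samples themselves do not have to be stored.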

    2021

    Transparent Control Flow Transfer between CPU and Accelerators for HPC

    Authors
    Granhao, D; Ferreira, JC;

    Publication
    ELECTRONICS

    Abstract
    Heterogeneous platforms with FPGAs have started to be employed in the High-Performance Computing (HPC) field to improve performance and overall efficiency. These platforms allow the use of specialized hardware to accelerate software applications, but require the software to be adapted in what can be a prolonged and complex process. The main goal of this work is to describe and evaluate mechanisms that can transparently transfer the control flow between CPU and FPGA within the scope of HPC. Combining such a mechanism with transparent software profiling and accelerator configuration could lead to an automatic way of accelerating regular applications. In this work, a mechanism based on the ptrace system call is proposed, and its performance on the Intel Xeon+FPGA platform is evaluated. The feasibility of the proposed approach is demonstrated by a working prototype that performs the transparent control flow transfer of any function call to a matching hardware accelerator. This approach is more general than shared library interposition at the cost of a small time overhead in each accelerator use (about 1.3 ms in the prototype implementation).

    2021

    A Binary Translation Framework for Automated Hardware Generation

    Authors
    Paulino, N; Bispo, J; Ferreira, JC; Cardoso, JMP;

    Publication
    IEEE MICRO

    Abstract
    As applications move to the edge, efficiency in computing power and power/energy consumption is required. Heterogeneous computing promises to meet these requirements through application-specific hardware accelerators. Runtime adaptivity might be of paramount importance to realize the potential of hardware specialization, but further study is required on workload retargeting and offloading to reconfigurable hardware. This article presents our framework for the exploration of both offloading and hardware generation techniques. The framework is currently able to process instruction sequences from MicroBlaze, ARMv8, and riscv32imaf binaries, and to represent them as Control and Dataflow Graphs for transformation to implementations of hardware modules. We illustrate the framework's capabilities for identifying binary sequences for hardware translation with a set of 13 benchmarks.
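The idea of turning an instruction sequence into a dataflow graph can be illustrated with a minimal sketch. It only derives read-after-write edges between register operands of a straight-line trace; a full Control and Dataflow Graph, as used by the framework, would also model control flow and memory dependencies. The `Instr` record, the mnemonics and the register names are hypothetical and do not reflect the framework's internal representation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Instr:
    op: str              # mnemonic, e.g. "add" (hypothetical encoding)
    dst: Optional[str]   # destination register, None for stores/branches
    srcs: List[str]      # source registers


def build_dataflow_graph(trace: List[Instr]) -> Tuple[List[Tuple[int, int]], Set[str]]:
    """Return (edges, live_ins) for a straight-line instruction trace.

    edges holds (producer, consumer) index pairs for read-after-write
    dependencies through registers; live_ins holds registers that are read
    before any instruction in the trace writes them."""
    last_writer: Dict[str, int] = {}
    edges: List[Tuple[int, int]] = []
    live_ins: Set[str] = set()
    for i, ins in enumerate(trace):
        # dict.fromkeys drops repeated operands while keeping their order
        for reg in dict.fromkeys(ins.srcs):
            if reg in last_writer:
                edges.append((last_writer[reg], i))
            else:
                live_ins.add(reg)
        if ins.dst is not None:
            last_writer[ins.dst] = i
    return edges, live_ins


trace = [
    Instr("lw", "r3", ["r5"]),         # r3 <- mem[r5]
    Instr("add", "r4", ["r3", "r6"]),  # r4 <- r3 + r6
    Instr("sw", None, ["r4", "r5"]),   # mem[r5] <- r4
]
edges, live_ins = build_dataflow_graph(trace)
print(edges)             # [(0, 1), (1, 2)]
print(sorted(live_ins))  # ['r5', 'r6']
```

The live-in registers are the values a generated hardware module would need to receive from the processor before it starts; the edges give the operation ordering it must respect.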

    2021

    Pedagogical Innovation in Pandemic Times: The Experience of a Microprocessor Programming Course

    Authors
    Lima, B; Granhao, D; Araujo, AJ; Ferreira, JC;

    Publication
    2021 4TH INTERNATIONAL CONFERENCE OF THE PORTUGUESE SOCIETY FOR ENGINEERING EDUCATION (CISPEE)

    Abstract
    The 2019/2020 school year will always be remembered for the impact of the COVID-19 pandemic. For the first time in recent history, countries closed schools and forced instructors and students to quickly adjust to online classes. This sudden and forced shift to a method of teaching that was completely different from what we were used to presented several challenges and opportunities on a pedagogical level. In this paper we describe our experience as instructors in a course on microprocessor programming in the Master's Degree in Computer Science and Computing Engineering at the Faculty of Engineering of the University of Porto. Our approach included changes to the assessment plan, which became more distributed, and improvements in communication between students and instructors through the use of Slack. We found that the changes introduced were not only very well received by students, but also resulted in the best exam attendance and average final grade in the last 10 years of the course's history.

    2021

    On the Performance Effect of Loop Trace Window Size on Scheduling for Configurable Coarse Grain Loop Accelerators

    Authors
    Santos, T; Paulino, N; Bispo, J; Cardoso, JMP; Ferreira, JC;

    Publication
    2021 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (ICFPT)

    Abstract
    By using Dynamic Binary Translation, instruction traces from pre-compiled applications can be offloaded, at runtime, to FPGA-based accelerators, such as Coarse-Grained Loop Accelerators, in a transparent way. However, scheduling onto coarse-grain accelerators is challenging; two currently known issues are the density of computations that can be mapped and the effect of memory accesses on performance. Using an in-house framework for the analysis of instruction traces, we explore the effect of different window sizes when applying list scheduling to map the window operations onto a coarse-grain loop accelerator model that has been previously validated experimentally. For all window sizes, we vary the number of ALUs and memory ports available in the model and comment on how these parameters affect the resulting latency. For a set of benchmarks taken from the PolyBench suite and compiled for the 32-bit MicroBlaze softcore, we achieved an average iteration speedup of 5.10x for a basic block repeated 5 times and scheduled with 8 ALUs and memory ports, and an average speedup of 5.46x when resource constraints are not considered. We also identify which benchmarks contribute to the difference between these two speedups and break down their limiting factors. Finally, we reflect on the impact that memory dependencies have on scheduling.
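List scheduling under ALU and memory-port constraints can be sketched as follows. This is a generic resource-constrained list scheduler, not the authors' scheduler: it assumes every operation takes one cycle, uses the longest path to a sink as priority, expects operations in trace order (so every edge goes from a lower to a higher index) and ignores memory dependencies between loads and stores. The operation classes `"alu"` and `"mem"` and the `resources` map are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def list_schedule(kinds: List[str], edges: List[Tuple[int, int]],
                  resources: Dict[str, int]) -> List[int]:
    """Return the start cycle of each operation.

    kinds[i] is the resource class of operation i ("alu" or "mem"),
    edges are (pred, succ) pairs with pred < succ, and resources maps each
    class to the number of units available per cycle (at least 1).
    Every operation is assumed to have a latency of one cycle."""
    n = len(kinds)
    succs = defaultdict(list)
    remaining = [0] * n  # unscheduled predecessors of each operation
    for p, s in edges:
        succs[p].append(s)
        remaining[s] += 1

    # Priority: number of operations on the longest path from i to a sink.
    prio = [0] * n
    for i in reversed(range(n)):
        prio[i] = 1 + max((prio[s] for s in succs[i]), default=0)

    start = [-1] * n
    ready = [i for i in range(n) if remaining[i] == 0]
    cycle = 0
    scheduled = 0
    while scheduled < n:
        used: Dict[str, int] = defaultdict(int)
        issued = []
        # Try ready operations in priority order; ties go to trace order.
        for op in sorted(ready, key=lambda o: (-prio[o], o)):
            if used[kinds[op]] < resources[kinds[op]]:
                used[kinds[op]] += 1
                start[op] = cycle
                issued.append(op)
        # Successors become ready only after this cycle ends (latency 1).
        for op in issued:
            ready.remove(op)
            for s in succs[op]:
                remaining[s] -= 1
                if remaining[s] == 0:
                    ready.append(s)
        scheduled += len(issued)
        cycle += 1
    return start


# Two unrolled iterations of: load, load, add, store (hypothetical example).
kinds = ["mem", "mem", "alu", "mem"] * 2
edges = [(0, 2), (1, 2), (2, 3), (4, 6), (5, 6), (6, 7)]
for units in (1, 2):
    start = list_schedule(kinds, edges, {"alu": units, "mem": units})
    print(units, max(start) + 1)  # schedule length in cycles: "1 6", then "2 4"
```

In this toy example the loads of both iterations compete for the memory ports, so going from one to two units of each kind shortens the schedule from 6 to 4 cycles. It only illustrates the kind of trade-off studied in the paper and is not one of its results.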

    Supervised theses

    2022

    High Power Efficiency, Wideband Microwave Power Amplifier Design Using Low-Cost Packaging and Integration Techniques for Emerging Transmitter Systems

    Author
    Hassan Safdary

    Institution
    UP-FEUP

    2022

    BotsBFUOD: Web Bot Detection using Biometric Features and Unsupervised Outlier Detection

    Author
    Pedro Maria Passos Ribeiro do Carmo Pereira

    Institution
    UP-FEUP

    2022

    Regulatory Design with Storage Systems: Assessment of Operational, Market and Economic Features

    Author
    Igor Roberto Rezende e Castro de Abreu

    Institution
    UP-FEUP

    2022

    An HAROS extension for Variability Aware ROS Code Analysis

    Author
    Ricardo Ribeiro Pereira

    Institution
    UP-FCUP

    2021

    Runtime Management of Heterogeneous Compute Resources in Embedded Systems

    Author
    Luís Miguel Mendes Pimentel Alves de Sousa

    Institution
    UP-FEUP