Publications

Publications by HumanISE

2016

Towards a Multi-softcore FPGA Approach for the HOG Algorithm

Authors
Mascagni de Holanda, JAM; Paiva Cardoso, JMP; Marques, E;

Publication
2016 IEEE 14TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN)

Abstract
Object detection in images is a computationally demanding task that usually has to handle different classes of objects, and thus requires variations and adaptations that software solutions provide easily. Object detection algorithms are becoming part of smarter real-time embedded systems, such as automotive, medical, robotics and security systems. In most embedded systems, efficient implementations of object detection algorithms need to provide high performance, low power consumption, and programmability to allow greater development flexibility. The Histogram of Oriented Gradients (HOG) is one of the most widely used algorithms for object detection in images. In this paper, we show our work towards mapping the HOG algorithm to an FPGA-based system consisting of multiple Nios II softcore processors, bearing in mind high-performance and programmability issues. We show how to reduce the algorithm's execution time by 19x through source-to-source transformations, in particular by avoiding redundant processing. Furthermore, we show how pipelined processing on three Nios II processors allows a speedup of 49x compared to the embedded baseline application.

2016

The ANTAREX approach to autotuning and adaptivity for energy efficient HPC systems

Authors
Silvano, C; Agosta, G; Cherubin, S; Gadioli, D; Palermo, G; Bartolini, A; Benini, L; Martinovic, J; Palkovic, M; Slaninová, K; Bispo, J; Cardoso, JMP; Abreu, R; Pinto, P; Cavazzoni, C; Sanna, N; Beccari, AR; Cmar, R; Rohou, E;

Publication
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, Como, Italy, May 16-19, 2016

Abstract
The ANTAREX project aims at expressing application self-adaptivity through a Domain Specific Language (DSL) and at managing and autotuning applications at runtime for green and heterogeneous High Performance Computing (HPC) systems up to Exascale. The DSL approach allows the definition of energy-efficiency, performance, and adaptivity strategies as well as their enforcement at runtime through application autotuning and resource and power management. Using a mini-app extracted from one of the project's application use cases, we show some initial exploration of application precision tuning by means enabled by the DSL. © 2016 Copyright held by the owner/author(s).

2016

Pipelining data-dependent tasks in FPGA-based multicore architectures

Authors
Azarian, A; Cardoso, JMP;

Publication
MICROPROCESSORS AND MICROSYSTEMS

Abstract
In recent years, there has been increasing interest in using task-level pipelining to accelerate the overall execution of applications mainly consisting of producer/consumer tasks. This paper proposes fine- and coarse-grained data synchronization approaches to achieve pipelined execution of producer/consumer tasks in FPGA-based multicore architectures. Our approaches are able to speed up the overall execution of successive, data-dependent tasks by using multiple cores and specific customization features provided by FPGAs. An important component of our approach is the use of customized inter-stage buffer schemes to communicate data and to synchronize the cores associated with the producer/consumer tasks. We propose techniques to reduce the number of accesses to external memory in our fine-grained data synchronization approach. The experimental results show the feasibility of the approach for both in-order and out-of-order producer/consumer tasks. Moreover, the results using our approach reveal noticeable performance improvements for a number of benchmarks over a single-core implementation without task-level pipelining.

2016

A Pipelined Multi-softcore Approach for the HOG Algorithm

Authors
Mascagni de Holanda, JAM; Paiva Cardoso, JMP; Marques, E;

Publication
PROCEEDINGS OF THE 2016 CONFERENCE ON DESIGN AND ARCHITECTURES FOR SIGNAL & IMAGE PROCESSING

Abstract
This paper describes the mapping and the acceleration of an object detection algorithm on a multiprocessor system based on an FPGA. We use HOG (Histogram of Oriented Gradients), one of the most popular algorithms for detection of different classes of objects and currently being used in smart embedded systems. The use of HOG on such systems requires efficient implementations in order to provide high performance, possibly with low energy/power consumption budgets. Also, as variations and adaptations of this algorithm are needed to deal with different scenarios and classes of objects, programmability is required to allow greater development flexibility. In this paper we show our approach towards implementing the HOG algorithm on a multi-softcore Nios II-based system, bearing in mind high-performance and programmability issues. By applying source-to-source transformations we obtain speedups of 19x, and by using pipelined processing we reduce the algorithm's execution time by 49x. We also show that improving the hardware with acceleration units can result in speedups of 72.4x compared to the embedded baseline application.

2016

High-Level Synthesis

Authors
Cardoso, JMP; Weinhardt, M;

Publication
FPGAs for Software Programmers

Abstract
The compilation of high-level languages, such as software programming languages, to FPGAs is of paramount importance for the mainstream adoption of FPGAs. An efficient compilation process will improve designer productivity and will make the use of FPGA technology viable for software programmers. When targeting the hardware resources provided by FPGAs, a compilation process usually requires a stage known as High-Level Synthesis (HLS) which is responsible for generating application specific hardware architectures from the input source code or from an intermediate representation of the input application. This chapter briefly describes HLS and its main processing stages. The chapter provides the indispensable knowledge for readers who want to follow the remaining chapters of this book. © Springer International Publishing Switzerland 2016.

2016

Architecture of computing systems – ARCS 2016: 29th international conference Nuremberg, Germany, April 4-7, 2016 Proceedings

Authors
Hannig, F; Cardoso, JMP; Pionteck, T; Fey, D; Preikschat, WS; Teich, J;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
