Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Publicações

Publicações por HumanISE

2020

Exploration of FPGA-Based Hardware Designs for QR Decomposition for Solving Stiff ODE Numerical Methods Using the HARP Hybrid Architecture

Autores
de Souza, CAO; Bispo, J; Cardoso, JMP; Diniz, PC; Marques, E;

Publicação
ELECTRONICS

Abstract
In this article, we focus on the acceleration of a chemical reaction simulation that relies on a system of stiff ordinary differential equation (ODEs) targeting heterogeneous computing systems with CPUs and field-programmable gate arrays (FPGAs). Specifically, we target an essential kernel of the coupled chemistry aerosol-tracer transport model to the Brazilian developments on the regional atmospheric modeling system (CCATT-BRAMS). We focus on a linear solve step using the QR factorization based on the modified Gram-Schmidt method as the basis of the ODE solver in this application. We target Intel hardware accelerator research program (HARP) architecture with the OpenCL programming environment for these early experiments. Our design exploration reveals a hardware design that is up to 4 times faster than the original iterative Jacobi method used in this solver. Still, even with hardware support, the overall performance of our QR-based hardware is lower than its original software version.

2020

Optimizing OpenCL Code for Performance on FPGA: k-Means Case Study With Integer Data Sets

Autores
Paulino, N; Ferreira, JC; Cardoso, JMP;

Publicação
IEEE ACCESS

Abstract
High Level Synthesis (HLS) tools targeting Field Programmable Gate Arrays (FPGAs) aim to provide a method for programming these devices via high-level abstractions. Initially, HLS support for FPGAs focused on compiling C/C CC to hardware circuits. This raised the issue of determining the programming practices which resulted in the best performing circuits. Recently, to further increase the applicability of HLS approaches, renewed effort was placed on support for HLS of OpenCL code for FPGA, raising the same issues of coding practices and performance portability. This paper explores the performance of OpenCL code compiled for FPGAs for different coding techniques. We evaluate the use of task-kernels versus NDRange kernels, data vectorization, the use of on-chip local memories, and data transfer optimizations by exploiting burst access inference. We present this exploration via a case study of the k-means algorithm, and produce a total of 10 OpenCL implementations of the kernel. To determine the effects of different data set characteristics, and to determine the gains from specialization based on number of attributes, we generated a total of 12 integer data sets. The data sets vary regarding the number of instances, number of attributes (i.e., features), and number of clusters. We also vary the number of processing cores, and present the resulting required resources and operating frequencies. Finally, we execute the same OpenCL code on a 4 GHz Intel i7-6700K CPU, showing that the FPGA achieves speedups up to 1.54 x for four cases, and energy savings up to 80% in all cases.

2020

Executing ARMv8 Loop Traces on Reconfigurable Accelerator via Binary Translation Framework

Autores
Paulino, N; Ferreira, JC; Bispo, J; Cardoso, JMP;

Publicação
2020 30TH INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS (FPL)

Abstract
Performance and power efficiency in edge and embedded systems can benefit from specialized hardware. To avoid the effort of manual hardware design, we explore the generation of accelerator circuits from binary instruction traces for several Instruction Set Architectures.

2020

kNN Prototyping Schemes for Embedded Human Activity Recognition with Online Learning

Autores
Ferreira, PJS; Cardoso, JMP; Moreira, JM;

Publicação
Comput.

Abstract
The kNN machine learning method is widely used as a classifier in Human Activity Recognition (HAR) systems. Although the kNN algorithm works similarly both online and in offline mode, the use of all training instances is much more critical online than offline due to time and memory restrictions in the online mode. Some methods propose decreasing the high computational costs of kNN by focusing, e.g., on approximate kNN solutions such as the ones relying on Locality-Sensitive Hashing (LSH). However, embedded kNN implementations also need to address the target device’s memory constraints, especially as the use of online classification needs to cope with those constraints to be practical. This paper discusses online approaches to reduce the number of training instances stored in the kNN search space. To address practical implementations of HAR systems using kNN, this paper presents simple, energy/computationally efficient, and real-time feasible schemes to maintain at runtime a maximum number of training instances stored by kNN. The proposed schemes include policies for substituting the training instances, maintaining the search space to a maximum size. Experiments in the context of HAR datasets show the efficiency of our best schemes. © 2020 by the authors. Licensee MDPI, Basel, Switzerland.

2020

Automatic Selection and Insertion of HLS Directives Via a Source-to-Source Compiler

Autores
Santos, T; Cardoso, JMP;

Publicação
2020 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (ICFPT 2020)

Abstract
High-level Synthesis (HLS) is of paramount importance to leverage the use of FPGA-based accelerators by software developers. To achieve efficient FPGA implementations, code restructuring and source code annotating with HLS directives are necessary. However, this is still a manual process conducted by experienced developers. This paper proposes a step on a framework to automatically optimize C code via directives, using a source-to-source compiler on a stage before HLS. This optimization is primarily applied by strategies that select, configure, and insert directives on the code input to the Vivado HLS tool to synthesize more latency-efficient FPGA hardware. Those strategies rely on very simple but effective heuristics, which use a small set of properties extracted from the control/dataflow graphs generated from the input source code. We evaluate the framework using a variety of source codes. The experiments show that it can achieve efficient results while maintaining a low resource usage in most cases. Our experiments also compare the framework results to code optimized manually with directives, and they show that for most benchmarks used, it achieves similar results.

2020

Clava: C/C plus plus source-to-source compilation using LARA

Autores
Bispo, J; Cardoso, JMP;

Publicação
SOFTWAREX

Abstract
This article presents Clava, a Clang-based source-to-source compiler, that accepts scripts written in LARA, a JavaScript-based DSL with special constructs for code queries, analysis and transformations. Clava improves Clang's source-to-source capabilities by providing a more convenient and flexible way to analyze, transform and generate C/C++ code, and provides support for building strategies that capture run-time behavior. We present the Clava framework, its main capabilities, and how it can been used. Furthermore, we show that Clava is sufficiently robust to analyze, instrument and test a set of large C/C++ application codes, such as GCC. (C) 2020 The Authors. Published by Elsevier B.V.

  • 200
  • 663