Publications

2020

Exploration of FPGA-Based Hardware Designs for QR Decomposition for Solving Stiff ODE Numerical Methods Using the HARP Hybrid Architecture

Authors
de Souza, CAO; Bispo, J; Cardoso, JMP; Diniz, PC; Marques, E;

Publication
Electronics

Abstract
In this article, we focus on accelerating a chemical reaction simulation that relies on a system of stiff ordinary differential equations (ODEs), targeting heterogeneous computing systems with CPUs and field-programmable gate arrays (FPGAs). Specifically, we target an essential kernel of the coupled chemistry aerosol-tracer transport model to the Brazilian developments on the regional atmospheric modeling system (CCATT-BRAMS). We focus on a linear-solve step that uses QR factorization based on the modified Gram-Schmidt method as the basis of the ODE solver in this application. For these early experiments, we target the Intel Hardware Accelerator Research Program (HARP) architecture with the OpenCL programming environment. Our design exploration reveals a hardware design that is up to 4 times faster than the original iterative Jacobi method used in this solver. However, even with hardware support, the overall performance of our QR-based hardware design remains lower than that of the original software version.
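The abstract names QR factorization by the modified Gram-Schmidt (MGS) method as the core of the linear-solve step. As a hedged software illustration only (not the authors' OpenCL/HARP kernel), the sketch below factors a small dense, square, nonsingular matrix with MGS and then solves Ax = b via Rx = Q^T b and back substitution. The function names `mgs_qr` and `qr_solve` are hypothetical.

```python
from math import sqrt
from typing import List, Tuple

Matrix = List[List[float]]


def mgs_qr(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Modified Gram-Schmidt QR of a square matrix.

    Returns (q, r) where q[j] is column j of Q and r is upper triangular.
    """
    n = len(a)
    v = [[a[i][j] for i in range(n)] for j in range(n)]  # v[j] = column j of A
    q = [[0.0] * n for _ in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = sqrt(sum(x * x for x in v[j]))
        q[j] = [x / r[j][j] for x in v[j]]
        # MGS step: immediately remove the q_j component from every remaining
        # column, instead of projecting each column against all previous q's.
        for k in range(j + 1, n):
            r[j][k] = sum(q[j][i] * v[k][i] for i in range(n))
            v[k] = [v[k][i] - r[j][k] * q[j][i] for i in range(n)]
    return q, r


def qr_solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b using MGS QR followed by back substitution."""
    q, r = mgs_qr(a)
    n = len(a)
    y = [sum(q[j][i] * b[i] for i in range(n)) for j in range(n)]  # y = Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):  # solve the upper-triangular system R x = y
        s = y[i] - sum(r[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / r[i][i]
    return x
```

For example, `qr_solve([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])` returns approximately `[0.1, 0.6]`. Unlike the iterative Jacobi method, this is a direct solver: the work is fixed by the matrix size rather than by a convergence criterion.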

2019

Preface

Authors
Hochberger, C; Nelson, B; Koch, A; Woods, R; Diniz, P;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

2018

RedThreads: An Interface for Application-Level Fault Detection/Correction Through Adaptive Redundant Multithreading

Authors
Hukerikar, S; Teranishi, K; Diniz, PC; Lucas, RF;

Publication
International Journal of Parallel Programming

Abstract
With fault rates projected to rise sharply and become the norm on future exascale systems, it will become increasingly difficult for high-performance computing (HPC) applications to accomplish useful computation. Because current HPC programming paradigms and execution environments are fault-oblivious, HPC applications are poorly equipped to deal with errors. We believe that HPC applications should be given the capability to actively search for and correct errors in their computations. The redundant multithreading (RMT) approach offers lightweight replicated execution streams of program instructions within the context of a single application process. However, complete redundancy incurs a significant performance overhead. In this paper we present RedThreads, an interface that provides application-level fault detection and correction based on RMT, but applies thread-level redundancy adaptively. We describe the RedThreads syntax and semantics, as well as the supporting compiler infrastructure and runtime system. Our approach enables application programmers to scope the extent of redundant computation. Additionally, the runtime system allows RMT to be dynamically enabled or disabled based on the resiliency needs of the application and the state of the system. Our experimental results demonstrate how adaptive RMT exploits programmer insight and runtime inference to dynamically navigate the trade-off space between an application's resilience coverage and the performance overhead of redundant computation. © 2017, Springer Science+Business Media New York.
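The abstract describes redundancy that detects errors by replicating execution and that can be switched on or off at run time. The sketch below is a hypothetical, greatly simplified software analogue. It is not the RedThreads syntax or runtime. It runs a pure function in several threads, compares the results to detect a fault, and on a mismatch runs one extra replica and takes a majority vote to correct it. The name `redundant_call` and its `replicas`/`enabled` parameters are assumptions for illustration. Results must be hashable for the vote.

```python
import threading
from collections import Counter
from typing import Any, Callable


def redundant_call(fn: Callable[..., Any], *args: Any,
                   replicas: int = 2, enabled: bool = True) -> Any:
    """Run fn(*args) redundantly in threads; detect and, if possible, correct mismatches.

    Hypothetical sketch: fn is assumed to be deterministic and side-effect free,
    so any disagreement between replicas is treated as a fault.
    """
    if not enabled:
        # Redundancy switched off (e.g. by a runtime policy): single execution.
        return fn(*args)

    results = [None] * replicas

    def worker(idx: int) -> None:
        results[idx] = fn(*args)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(replicas)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    if all(r == results[0] for r in results):
        return results[0]  # replicas agree: no fault detected

    # Fault detected: run a tie-breaking replica and take a strict majority.
    results.append(fn(*args))
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("uncorrectable mismatch among redundant results")
    return value
```

Passing `enabled=False` mimics turning redundancy off when the system state suggests a low fault risk, which trades resilience coverage for lower overhead.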

Applied Reconfigurable Computing. Architectures, Tools, and Applications - 14th International Symposium, ARC 2018, Santorini, Greece, May 2-4, 2018, Proceedings

Authors
Voros, NS; Hübner, M; Keramidas, G; Goehringer, D; Antonopoulos, CP; Diniz, PC;

Publication
ARC

A Faddeev Systolic Array for EKF-SLAM and its Arithmetic Data Representation Impact on FPGA

Authors
de Souza Rosa, L; Dasu, A; C. Diniz, P; Bonato, V;

Publication
Journal of Signal Processing Systems

Abstract
The Extended Kalman Filter (EKF) computation is a core task in the simultaneous localization and mapping (SLAM) problem for autonomous mobile robots. The SLAM problem involves operations over high-dimensional data sets and requires high throughput and performance, given the real-time nature of the robotic control-decision algorithms of which this task is a part. The lightweight, power-restricted computing environments of mobile robotics require customized processing systems such as Field-Programmable Gate Arrays (FPGAs). This work presents an arithmetic precision analysis and a Systolic Array (SA) hardware architecture that implements the Faddeev algorithm to compute the Schur complement for EKF-SLAM. While it is widely believed that fixed-point implementations of arithmetic operations lead to area and performance benefits on FPGAs, the results in this article reveal that each Processing Element (PE) in the SA consumes 25% more logic and about 30% more register resources with the 13.23 fixed-point representation than with the IEEE-754 single-precision floating-point format. In addition, on FPGA devices with hardware support for key components of floating-point computation, a floating-point implementation of a single PE can achieve a maximum frequency up to 50% higher than a corresponding fixed-point implementation for the same relative numeric errors. © 2017, Springer Science+Business Media New York.
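The Faddeev algorithm, as commonly described, computes D + C A^-1 B without explicitly inverting A. It runs Gaussian elimination on the compound matrix [[A, B], [-C, D]] until the lower-left block is zero. The lower-right block then holds the result, which is a Schur complement of the compound matrix. The software sketch below shows only that arithmetic. It is an assumption-laden illustration, not the paper's systolic-array design. In a systolic array, each multiply-subtract row update would roughly map onto processing elements. The name `faddeev` is hypothetical. Like many systolic formulations, the sketch does no pivoting, so it assumes the leading pivots of A are nonzero.

```python
from typing import List

Matrix = List[List[float]]


def faddeev(a: Matrix, b: Matrix, c: Matrix, d: Matrix) -> Matrix:
    """Return D + C A^-1 B via Gaussian elimination on [[A, B], [-C, D]].

    a: n x n, b: n x p, c: m x n, d: m x p. No pivoting (sketch only).
    """
    n, m = len(a), len(c)
    rows = [list(a[i]) + list(b[i]) for i in range(n)]
    rows += [[-x for x in c[i]] + list(d[i]) for i in range(m)]
    width = len(rows[0])
    for k in range(n):  # eliminate the first n columns below the diagonal
        pivot = rows[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        for i in range(k + 1, n + m):
            f = rows[i][k] / pivot
            if f != 0.0:
                for j in range(k, width):
                    rows[i][j] -= f * rows[k][j]
    # After elimination the lower-right m x p block equals D + C A^-1 B.
    return [row[n:] for row in rows[n:]]
```

For instance, with A = [[2]], B = [[3]], C = [[4]] and D = [[5]], the result is [[11.0]] (5 + 4 · 3 / 2). Choosing B, C and D appropriately lets the same routine produce products such as C A^-1 B, which appear in the EKF gain and covariance updates.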