Details
Name
Pedro Diniz
Cluster
Computer Science
Role
Coordinating Researcher
Since
01 May 2015
Nationality
Portugal
Centre
Human-Centered Computing and Information Science
Contacts
+351222094199
pedro.diniz@inesctec.pt
2020
Authors
de Souza, CAO; Bispo, J; Cardoso, JMP; Diniz, PC; Marques, E
Publication
Electronics
Abstract
In this article, we focus on the acceleration of a chemical reaction simulation that relies on a system of stiff ordinary differential equations (ODEs), targeting heterogeneous computing systems with CPUs and field-programmable gate arrays (FPGAs). Specifically, we target an essential kernel of the coupled chemistry aerosol-tracer transport model to the Brazilian developments on the regional atmospheric modeling system (CCATT-BRAMS). We focus on a linear solve step using the QR factorization based on the modified Gram-Schmidt method as the basis of the ODE solver in this application. We target the Intel hardware accelerator research program (HARP) architecture with the OpenCL programming environment for these early experiments. Our design exploration reveals a hardware design that is up to 4 times faster than the original iterative Jacobi method used in this solver. Still, even with hardware support, the overall performance of our QR-based hardware is lower than its original software version.
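The linear solve step named in this abstract can be illustrated with a minimal software sketch. It is not the authors' OpenCL or FPGA design: the function names `mgs_qr` and `qr_solve` are hypothetical, and the example assumes a small, dense, nonsingular square matrix stored as a list of rows.

```python
import math


def mgs_qr(a):
    """Factor a square matrix (list of rows) as A = Q R with modified Gram-Schmidt.

    Returns (q, r) where q[j] holds column j of Q and r is upper triangular.
    """
    n = len(a)
    # v[j] starts as column j of A and is orthogonalized in place.
    v = [[float(a[i][j]) for i in range(n)] for j in range(n)]
    q = [[0.0] * n for _ in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = math.sqrt(sum(x * x for x in v[j]))
        if r[j][j] == 0.0:
            raise ValueError("matrix is singular")
        q[j] = [x / r[j][j] for x in v[j]]
        # Modified variant: remove the q[j] component from every remaining
        # column immediately, using the already-updated columns.
        for k in range(j + 1, n):
            r[j][k] = sum(qi * vi for qi, vi in zip(q[j], v[k]))
            v[k] = [vi - r[j][k] * qi for qi, vi in zip(q[j], v[k])]
    return q, r


def qr_solve(a, b):
    """Solve A x = b via A = Q R: compute y = Q^T b, then back-substitute R x = y."""
    q, r = mgs_qr(a)
    n = len(b)
    y = [sum(q[j][i] * b[i] for i in range(n)) for j in range(n)]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = y[i] - sum(r[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / r[i][i]
    return x


if __name__ == "__main__":
    # Small illustrative system, not taken from the paper.
    print(qr_solve([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0]))
```

Unlike an iterative Jacobi solver, this direct method performs a fixed amount of work for a given matrix size, which is one reason it maps to a regular hardware pipeline.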
2019
Authors
Hochberger, C; Nelson, B; Koch, A; Woods, R; Diniz, P
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
2018
Authors
Hukerikar, S; Teranishi, K; Diniz, PC; Lucas, RF
Publication
International Journal of Parallel Programming
Abstract
In the presence of accelerated fault rates, which are projected to be the norm on future exascale systems, it will become increasingly difficult for high-performance computing (HPC) applications to accomplish useful computation. Due to the fault-oblivious nature of current HPC programming paradigms and execution environments, HPC applications are insufficiently equipped to deal with errors. We believe that HPC applications should be enabled with capabilities to actively search for and correct errors in their computations. The redundant multithreading (RMT) approach offers lightweight replicated execution streams of program instructions within the context of a single application process. However, the use of complete redundancy incurs significant overhead to the application performance. In this paper we present RedThreads, an interface that provides application-level fault detection and correction based on RMT, but applies the thread-level redundancy adaptively. We describe the RedThreads syntax and semantics, and the supporting compiler infrastructure and runtime system. Our approach enables application programmers to scope the extent of redundant computation. Additionally, the runtime system permits the use of RMT to be dynamically enabled, or disabled, based on the resiliency needs of the application and the state of the system. Our experimental results demonstrate how adaptive RMT exploits programmer insight and runtime inference to dynamically navigate the trade-off space between an application’s resilience coverage and the associated performance overhead of redundant computation. © 2017, Springer Science+Business Media New York.
2018
Authors
Voros, NS; Hübner, M; Keramidas, G; Goehringer, D; Antonopoulos, CP; Diniz, PC
Publication
ARC
Abstract
2018
Authors
de Souza Rosa, L; Dasu, A; Diniz, PC; Bonato, V
Publication
Journal of Signal Processing Systems
Abstract
The Extended Kalman Filter (EKF) computation is a core task for the simultaneous localization and mapping (SLAM) problem in autonomous mobile robots. The SLAM problem involves operations over high-dimension data sets, requiring high throughput and performance, given the real-time nature of the robotics control-decision algorithm this task is a part of. The lightweight and power-restricted computing environments in mobile robotics require customized processing systems such as Field-Programmable Gate Arrays (FPGAs). This work presents an arithmetic precision analysis and a hardware architecture implementation of the Faddeev algorithm, which calculates the Schur complement, for the EKF-SLAM using a Systolic Array (SA). While it is widely believed that fixed-point implementations of arithmetic operations lead to area and performance benefits on FPGAs, the results in this article reveal that each Processing Element (PE) in the SA consumes 25% more logic and about 30% more register resources for the fixed-point 13.23 representation than if using the IEEE-754 single precision floating-point format. In addition, for FPGA devices with hardware support for key components of floating-point computations, a single PE floating-point implementation can achieve a maximum frequency up to 50% higher than a corresponding fixed-point implementation for the same relative numeric errors. © 2017, Springer Science+Business Media New York.
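To make the two number formats compared in this abstract concrete, the sketch below quantizes values to a fixed-point format and to IEEE-754 single precision. It rests on assumptions the abstract does not confirm: "13.23" is read as 13 integer bits (sign included) plus 23 fractional bits, for 36 bits in total, with round-to-nearest conversion and saturation on overflow. All names are hypothetical and nothing here reproduces the paper's hardware.

```python
import struct

INT_BITS = 13    # assumption: integer bits, including the sign bit
FRAC_BITS = 23   # assumption: fractional bits
SCALE = 1 << FRAC_BITS
MAX_RAW = (1 << (INT_BITS + FRAC_BITS - 1)) - 1
MIN_RAW = -(1 << (INT_BITS + FRAC_BITS - 1))


def saturate(raw):
    """Clamp a raw integer to the representable two's-complement range."""
    return max(MIN_RAW, min(MAX_RAW, raw))


def to_fixed(x):
    """Convert a Python float to a raw fixed-point integer (round to nearest)."""
    return saturate(round(x * SCALE))


def from_fixed(raw):
    """Convert a raw fixed-point integer back to a Python float."""
    return raw / SCALE


def fixed_mul(a, b):
    """Multiply two raw values; the product carries 2 * FRAC_BITS fractional
    bits, so it is shifted back down (truncating toward negative infinity)."""
    return saturate((a * b) >> FRAC_BITS)


def to_float32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


if __name__ == "__main__":
    for value in (0.1, 1234.5678, 1e-6):
        fx = from_fixed(to_fixed(value))
        fl = to_float32(value)
        print(value, abs(fx - value) / value, abs(fl - value) / value)
```

Under these assumptions the fixed-point format has a constant absolute resolution of 2^-23, so its relative error grows as values approach zero, whereas single precision keeps a roughly constant relative error across its normal range.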