2007
Authors
Menotti, R; Marques, E; Cardoso, JMP;
Publication
2007 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, PROCEEDINGS, VOLS 1 AND 2
Abstract
2007
Authors
de Holanda, JA; Assumpcao, J; Wolf, DE; Marques, E; Cardoso, JMP;
Publication
2007 INTERNATIONAL SYMPOSIUM ON INDUSTRIAL EMBEDDED SYSTEMS
Abstract
The increasing use of battery-powered embedded systems has motivated the development of power consumption models to help designers build low-power systems. Because FPGAs are configurable, systems containing one or more soft-core processors on a single chip are becoming increasingly attractive. This paper presents an adaptation of the instruction-level power estimation model to soft-core processors implemented in FPGAs. The model estimated the power dissipated in eleven test applications with a maximum error of 4.78%. Ongoing work includes efforts towards a software power estimation model for multi-core systems embedded in a single FPGA device.
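The abstract does not give the model's equations. As a rough illustration only, the sketch below assumes the commonly used instruction-level form, in which the energy of a program is the sum of a base cost per executed instruction plus an overhead for each pair of consecutive instructions. The function name, the opcodes, and all cost values are hypothetical and are not taken from the paper.

```python
from collections import Counter


def estimate_energy(trace, base_cost, pair_overhead=None):
    """Estimate the energy of an instruction trace.

    trace         -- opcode names in execution order
    base_cost     -- opcode -> energy per execution (assumed, measured offline)
    pair_overhead -- (previous, next) -> extra energy when `next` follows `previous`
    """
    pair_overhead = pair_overhead or {}
    counts = Counter(trace)
    # Base term: each opcode's cost times the number of times it executed.
    energy = sum(base_cost[op] * n for op, n in counts.items())
    # Inter-instruction term: overhead for each adjacent pair in the trace.
    energy += sum(pair_overhead.get(pair, 0.0) for pair in zip(trace, trace[1:]))
    return energy


# Hypothetical costs in nanojoules, for illustration only.
BASE_COST_NJ = {"add": 1.0, "mul": 3.0, "ld": 2.0}
PAIR_OVERHEAD_NJ = {("add", "mul"): 0.5}
TRACE = ["ld", "add", "mul", "add"]
ENERGY_NJ = estimate_energy(TRACE, BASE_COST_NJ, PAIR_OVERHEAD_NJ)
```

Dividing the estimated energy by the execution time of the trace would give an average power figure; in practice the cost tables would have to be measured for the specific soft-core and FPGA.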
2007
Authors
Bonato, V; Peron, R; Wolf, DF; de Holanda, JAM; Marques, E; Cardoso, JMP;
Publication
2007 INTERNATIONAL SYMPOSIUM ON INDUSTRIAL EMBEDDED SYSTEMS
Abstract
The problem of simultaneous localization and mapping has been studied by the mobile robotics community over the last two decades. Most solutions to this problem are based on probability theory in order to represent the uncertainty in robot perception and action. One of the most efficient probabilistic methods is the Extended Kalman Filter (EKF). However, the EKF demands a considerable amount of computing power and is usually run on high-end laptops coupled to the robots. In this work, we present an implementation of the EKF targeting an embedded system based on an FPGA device. To improve performance, our approach combines a soft-core processor with customized hardware. We present experiments with four different FPGA implementations: the first purely software-based, the second using custom instruction logic directly connected to the processor's ALU, the third using hardware accelerators connected to the processor's data bus, and the fourth combining those two hardware/software solutions. In the experiments conducted, a small addition of hardware resources increased the performance of the overall system by 2x to 4x.
2007
Authors
Ferreira, R; Garcia, A; Teixeira, T; Cardoso, JMP;
Publication
IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI, PROCEEDINGS: EMERGING VLSI TECHNOLOGIES AND ARCHITECTURES
Abstract
Coarse-grained reconfigurable computing architectures vary widely in the number and characteristics of the processing elements (cells) and routing topologies used. To exploit several different topologies, a place and route framework able to deal with such a vast design exploration space is of paramount importance. Bearing this in mind, this paper proposes a placement scheme able to target different topologies when considering data-driven reconfigurable architectures. Our approach uses graph models for the target architecture and for the dataflow representation of the application being mapped. Our placement algorithm is guided by a Depth-First Traversal in both the architecture and the application graphs. Two versions of the placement algorithm, with O(e) and O(e + n^3) computational complexities respectively, are presented, where e is the number of edges in the dataflow representation of the application and n is the number of cells in the graph model of the architecture. The experimental results show that our approach can be useful for exploiting different interconnect topologies in coarse-grained reconfigurable computing architectures.
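The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea of depth-first placement, not the authors' algorithm. It assumes the application is a list of dataflow edges and the architecture is an adjacency map of cells; each newly visited operation is greedily placed on a free cell adjacent to its parent's cell, falling back to any free cell. The names `dfs_place`, `APP_EDGES`, and `MESH_2X2` are hypothetical.

```python
def dfs_place(app_edges, arch_adj, start_node, start_cell):
    """Place dataflow nodes reachable from start_node onto architecture cells.

    app_edges -- list of (producer, consumer) edges of the dataflow graph
    arch_adj  -- cell -> list of directly connected cells
    Returns a dict mapping each placed node to its cell.
    """
    successors = {}
    for u, v in app_edges:
        successors.setdefault(u, []).append(v)

    placement = {start_node: start_cell}
    used = {start_cell}

    def visit(u):
        # Depth-first: place a successor, then descend before its siblings.
        for v in successors.get(u, []):
            if v in placement:
                continue
            # Prefer a free neighbour of the parent's cell (direct link).
            near = [c for c in arch_adj[placement[u]] if c not in used]
            free = near or [c for c in arch_adj if c not in used]
            if not free:
                raise ValueError("architecture has too few cells")
            placement[v] = free[0]
            used.add(free[0])
            visit(v)

    visit(start_node)
    return placement


# Hypothetical example: four operations mapped onto a 2x2 mesh of cells.
APP_EDGES = [("a", "b"), ("a", "c"), ("b", "d")]
MESH_2X2 = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
PLACEMENT = dfs_place(APP_EDGES, MESH_2X2, "a", 0)
```

Because the architecture is passed in as a plain adjacency map, the same routine can be tried on a different topology by changing only that map. This sketch does no routing and does not reproduce the complexity bounds quoted in the abstract.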
2007
Authors
Morra, C; Cardoso, JMP; Becker, J;
Publication
21st International Parallel and Distributed Processing Symposium (IPDPS 2007), Proceedings, 26-30 March 2007, Long Beach, California, USA
Abstract
This paper presents a new and retargetable method to identify patterns of instructions with direct support in coarse-grained processing elements (PEs). The method uses a three-address code SSA (static single assignment) representation of the kernel being mapped and Rewriting Logic for template matching and algebraic optimizations. This approach is able to identify sets of SSA instructions that can be mapped to the different PE complexities available in coarse-grained reconfigurable computing architectures. As a proof of concept, results for a number of benchmark kernels are included, reporting the coverage of template instructions. © 2007 IEEE.
2007
Authors
Rodrigues, R; Cardoso, JMP; Diniz, PC;
Publication
FCCM 2007: 15TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS
Abstract
Many video and image/signal processing applications can be structured as sequences of data-dependent tasks using a consumer/producer communication paradigm and are therefore amenable to pipelined execution. This paper presents an execution technique to speed up the overall execution of successive, data-dependent tasks on a reconfigurable architecture. The technique pipelines sequences of data-dependent tasks by overlapping their execution subject to data dependences. It decouples the concurrent data-path and control units and uses a custom, application data-driven, fine-grained synchronization and buffering scheme. In addition, the execution scheme allows for out-of-order, but data-dependent, producer-consumer pairs not allowed by previous data-driven pipelining approaches. The approach has been exploited in the context of a high-level compiler targeting FPGAs. The preliminary experimental results reveal noticeable performance improvements and buffer size reductions for a number of benchmarks over traditional approaches.
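To make the benefit of overlapping data-dependent tasks concrete, the toy model below counts cycles for one producer and one consumer under simplifying assumptions that are not from the paper: each task handles one element per cycle, and the consumer may start on an element as soon as that element has been produced. The function names and the example orders are hypothetical.

```python
def pipelined_cycles(produce_order, consume_order):
    """Cycles until the consumer finishes when it overlaps with the producer."""
    # The element produced in cycle c becomes available at the end of cycle c.
    ready_at = {elem: cycle for cycle, elem in enumerate(produce_order, start=1)}
    finished = 0
    for elem in consume_order:
        # Wait for both the previous consume and the element's production.
        finished = max(finished, ready_at[elem]) + 1
    return finished


def sequential_cycles(produce_order, consume_order):
    """Cycles when the consumer starts only after the producer has finished."""
    return len(produce_order) + len(consume_order)


IN_ORDER = pipelined_cycles([0, 1, 2, 3], [0, 1, 2, 3])
REVERSED = pipelined_cycles([0, 1, 2, 3], [3, 2, 1, 0])
BASELINE = sequential_cycles([0, 1, 2, 3], [0, 1, 2, 3])
```

In this model, matching orders let the consumer trail the producer by one cycle (5 cycles instead of 8), while a consumer that needs the last-produced element first gains nothing. The model ignores buffer sizes and control overhead, which the paper addresses.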