2013
Autores
Paulino, N; Ferreira, JC; Cardoso, JMP;
Publicação
RECONFIGURABLE COMPUTING: ARCHITECTURES, TOOLS AND APPLICATIONS
Abstract
This paper presents an extension to a hardware/software system architecture in which repetitive instruction traces, called Megablocks, are accelerated by a Reconfigurable Processing Unit (RPU). This scheme is supported by a custom toolchain able to automatically generate a RPU tailored for the execution of one or more Megablocks detected offline. Switching between hardware and software execution is done transparently, without modifications to source code or executable binaries. Our approach has been evaluated using an architecture with a MicroBlaze General Purpose Processor (GPP) softcore. By using a memory sharing mechanism, the RPU can access the GPP's data memory, allowing the acceleration of Megablocks with load/store operations. For a set of 21 embedded benchmarks, an average speedup of 1.43x is achieved, and a potential speedup of 2.09x is predicted for an implementation using a low overhead interface for communication between GPP and RPU.
2013
Autores
dos Santos, PV; Alves, JC; Ferreira, JC;
Publicação
2013 23RD INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL 2013) PROCEEDINGS
Abstract
The genetic algorithm (GA) is an optimization metaheuristic that relies on the evolution of a set of solutions (population) according to genetically inspired transformations. In the variant of this technique called cellular GA, the evolution is done separately for subgroups of solutions. This paper describes a hardware framework capable of efficiently supporting custom accelerators for this metaheuristic. This approach builds a regular array of problem-specific processing elements (PEs), which perform the genetic evolution, connected to shared memories holding the local subpopulations. To assist the design of the custom PEs, a methodology based on highlevel synthesis from C++ descriptions is used. The proposed architecture was applied to a spectrum allocation problem in cognitive radio networks. For an array of 5x5 PEs in a Virtex-6 FPGA, the results show a minimum speedup of 22x compared to a software version running on a PC and a speedup near 2000x over a MicroBlaze soft processor.
2013
Autores
Goncalves, F; Petrov, Z; De F. Coutinho, JG; Nane, R; Sima, VM; Cardoso, JMP; Werner, S; Bhattacharya, S; Carvalho, T; Nobre, R; De Sa, J; Teixeira, J; Diniz, PC; Bertels, K; Constantinides, G; Luk, W; Becker, J; Alves, JC; Ferreira, JC; Almeida, GM;
Publicação
Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach
Abstract
This chapter describes a series of experiments aimed at evaluating the effectiveness of the REFLECT design-flow in terms of ease of use and quality of the generated designs. In these experiments, we exercised the use of LARA to control and guide the REFLECT design-flow components, such as the Harmonic weaver, the CoSy-based compilers, and the back-end Molen/ML510 toolchain. Various research results have been presented in previous publications focusing on specific aspects of the REFLECT design-flow [1], including strategies for optimizing hardware/software systems [2], strategies for optimizing hardware synthesis [3], strategies for hardware/software specialization [4], strategies for resource efficiency [5], and strategies addressing safety requirements [6, 7]. © Springer Science+Business Media New York 2013. All rights are reserved.
2013
Autores
Nova, B; Ferreira, JC; Araujo, A;
Publicação
2013 1ST INTERNATIONAL CONFERENCE OF THE PORTUGUESE SOCIETY FOR ENGINEERING EDUCATION (CISPEE)
Abstract
Computer architecture is an important subject for informatics and electrical engineering courses. However, students display some difficulties in this subject, mainly due to the lack of educational tools that are intuitive, versatile and graphical. Existing tools are not adequate enough or are very specific. In this paper, an educational MIPS simulator, DrMIPS, is described. This tool simulates the execution of an assembly program on the CPU and displays the datapath graphically. Registers, data memory and assembled code are also displayed and a "performance mode" is also provided. Both unicycle and pipeline implementations are supported and the CPUs and their instruction sets are configurable. The tool is currently available for PCs and Android tablets, and is fairly intuitive and versatile on both platforms.
2013
Autores
Bispo, J; Paulino, N; Cardoso, JMP; Ferreira, JC;
Publicação
International Journal of Reconfigurable Computing
Abstract
The ability to map instructions running in a microprocessor to a reconfigurable processing unit (RPU), acting as a coprocessor, enables the runtime acceleration of applications and ensures code and possibly performance portability. In this work, we focus on the mapping of loop-based instruction traces (called Megablocks) to RPUs. The proposed approach considers offline partitioning and mapping stages without ignoring their future runtime applicability. We present a toolchain that automatically extracts specific trace-based loops, called Megablocks, from MicroBlaze instruction traces and generates an RPU for executing those loops. Our hardware infrastructure is able to move loop execution from the microprocessor to the RPU transparently, at runtime, and without changing the executable binaries. The toolchain and the system are fully operational. Three FPGA implementations of the system, differing in the hardware interfaces used, were tested and evaluated with a set of 15 application kernels. Speedups ranging from 1.26 × to 3.69 × were achieved for the best alternative using a MicroBlaze processor with local memory. © 2013 João Bispo et al.
2013
Autores
Bispo, J; Paulino, N; Cardoso, JMP; Ferreira, JC;
Publicação
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS
Abstract
This paper presents a novel approach to accelerate program execution by mapping repetitive traces of executed instructions, called Megablocks, to a runtime reconfigurable array of functional units. An offline tool suite extracts Megablocks from microprocessor instruction traces and generates a Reconfigurable Processing Unit (RPU) tailored for the execution of those Megablocks. The system is able to transparently movebcomputations from the microprocessor to the RPU at runtime. A prototype implementation of the system using a cacheless MicroBlaze microprocessor running code located in external memory reaches speedups from 2.2x to 18.2x for a set of 14 benchmark kernels. For a system setup which maximizes microprocessor performance by having the application code located in internal block RAMs, speedups from 1.4x to 2.8x were estimated.
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.