Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Publicações

Publicações por CTM

2013

Architecture for Transparent Binary Acceleration of Loops with Memory Accesses

Autores
Paulino, N; Ferreira, JC; Cardoso, JMP;

Publicação
RECONFIGURABLE COMPUTING: ARCHITECTURES, TOOLS AND APPLICATIONS

Abstract
This paper presents an extension to a hardware/software system architecture in which repetitive instruction traces, called Megablocks, are accelerated by a Reconfigurable Processing Unit (RPU). This scheme is supported by a custom toolchain able to automatically generate a RPU tailored for the execution of one or more Megablocks detected offline. Switching between hardware and software execution is done transparently, without modifications to source code or executable binaries. Our approach has been evaluated using an architecture with a MicroBlaze General Purpose Processor (GPP) softcore. By using a memory sharing mechanism, the RPU can access the GPP's data memory, allowing the acceleration of Megablocks with load/store operations. For a set of 21 embedded benchmarks, an average speedup of 1.43x is achieved, and a potential speedup of 2.09x is predicted for an implementation using a low overhead interface for communication between GPP and RPU.

2013

A FRAMEWORK FOR HARDWARE CELLULAR GENETIC ALGORITHMS: AN APPLICATION TO SPECTRUM ALLOCATION IN COGNITIVE RADIO

Autores
dos Santos, PV; Alves, JC; Ferreira, JC;

Publicação
2013 23RD INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL 2013) PROCEEDINGS

Abstract
The genetic algorithm (GA) is an optimization metaheuristic that relies on the evolution of a set of solutions (population) according to genetically inspired transformations. In the variant of this technique called cellular GA, the evolution is done separately for subgroups of solutions. This paper describes a hardware framework capable of efficiently supporting custom accelerators for this metaheuristic. This approach builds a regular array of problem-specific processing elements (PEs), which perform the genetic evolution, connected to shared memories holding the local subpopulations. To assist the design of the custom PEs, a methodology based on highlevel synthesis from C++ descriptions is used. The proposed architecture was applied to a spectrum allocation problem in cognitive radio networks. For an array of 5x5 PEs in a Virtex-6 FPGA, the results show a minimum speedup of 22x compared to a software version running on a PC and a speedup near 2000x over a MicroBlaze soft processor.

2013

LARA experiments

Autores
Goncalves, F; Petrov, Z; De F. Coutinho, JG; Nane, R; Sima, VM; Cardoso, JMP; Werner, S; Bhattacharya, S; Carvalho, T; Nobre, R; De Sa, J; Teixeira, J; Diniz, PC; Bertels, K; Constantinides, G; Luk, W; Becker, J; Alves, JC; Ferreira, JC; Almeida, GM;

Publicação
Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach

Abstract
This chapter describes a series of experiments aimed at evaluating the effectiveness of the REFLECT design-flow in terms of ease of use and quality of the generated designs. In these experiments, we exercised the use of LARA to control and guide the REFLECT design-flow components, such as the Harmonic weaver, the CoSy-based compilers, and the back-end Molen/ML510 toolchain. Various research results have been presented in previous publications focusing on specific aspects of the REFLECT design-flow [1], including strategies for optimizing hardware/software systems [2], strategies for optimizing hardware synthesis [3], strategies for hardware/software specialization [4], strategies for resource efficiency [5], and strategies addressing safety requirements [6, 7]. © Springer Science+Business Media New York 2013. All rights are reserved.

2013

Tool to Support Computer Architecture Teaching and Learning

Autores
Nova, B; Ferreira, JC; Araujo, A;

Publicação
2013 1ST INTERNATIONAL CONFERENCE OF THE PORTUGUESE SOCIETY FOR ENGINEERING EDUCATION (CISPEE)

Abstract
Computer architecture is an important subject for informatics and electrical engineering courses. However, students display some difficulties in this subject, mainly due to the lack of educational tools that are intuitive, versatile and graphical. Existing tools are not adequate enough or are very specific. In this paper, an educational MIPS simulator, DrMIPS, is described. This tool simulates the execution of an assembly program on the CPU and displays the datapath graphically. Registers, data memory and assembled code are also displayed and a "performance mode" is also provided. Both unicycle and pipeline implementations are supported and the CPUs and their instruction sets are configurable. The tool is currently available for PCs and Android tablets, and is fairly intuitive and versatile on both platforms.

2013

Transparent runtime migration of loop-based traces of processor instructions to reconfigurable processing units

Autores
Bispo, J; Paulino, N; Cardoso, JMP; Ferreira, JC;

Publicação
International Journal of Reconfigurable Computing

Abstract
The ability to map instructions running in a microprocessor to a reconfigurable processing unit (RPU), acting as a coprocessor, enables the runtime acceleration of applications and ensures code and possibly performance portability. In this work, we focus on the mapping of loop-based instruction traces (called Megablocks) to RPUs. The proposed approach considers offline partitioning and mapping stages without ignoring their future runtime applicability. We present a toolchain that automatically extracts specific trace-based loops, called Megablocks, from MicroBlaze instruction traces and generates an RPU for executing those loops. Our hardware infrastructure is able to move loop execution from the microprocessor to the RPU transparently, at runtime, and without changing the executable binaries. The toolchain and the system are fully operational. Three FPGA implementations of the system, differing in the hardware interfaces used, were tested and evaluated with a set of 15 application kernels. Speedups ranging from 1.26 × to 3.69 × were achieved for the best alternative using a MicroBlaze processor with local memory. © 2013 João Bispo et al.

2013

Transparent Trace-Based Binary Acceleration for Reconfigurable HW/SW Systems

Autores
Bispo, J; Paulino, N; Cardoso, JMP; Ferreira, JC;

Publicação
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS

Abstract
This paper presents a novel approach to accelerate program execution by mapping repetitive traces of executed instructions, called Megablocks, to a runtime reconfigurable array of functional units. An offline tool suite extracts Megablocks from microprocessor instruction traces and generates a Reconfigurable Processing Unit (RPU) tailored for the execution of those Megablocks. The system is able to transparently movebcomputations from the microprocessor to the RPU at runtime. A prototype implementation of the system using a cacheless MicroBlaze microprocessor running code located in external memory reaches speedups from 2.2x to 18.2x for a set of 14 benchmark kernels. For a system setup which maximizes microprocessor performance by having the application code located in internal block RAMs, speedups from 1.4x to 2.8x were estimated.

  • 260
  • 368