2010
Authors
Bispo, J; Cardoso, JMP;
Publication
International Conference on Field Programmable Logic and Applications, FPL 2010, August 31 2010 - September 2, 2010, Milano, Italy
Abstract
Typical computing systems based on general purpose processors (GPPs) are extended with coarse-grained reconfigurable arrays (CGRAs) to provide higher performance and/or energy savings. In order for applications to take advantage of these computing systems, efficient dynamic mapping techniques are required. Those dynamic mapping techniques will be responsible for automatically moving computations originally running in the GPP to the CGRA. The concept of dynamic compilation, widespread in the context of JIT compilation to GPPs, is receiving more attention from the reconfigurable computing community. This paper presents our approach to dynamically map computations to CGRAs coupled to a GPP. Specifically, we present the identification of large sequences of instructions, MegaBlocks, being executed in a GPP. These MegaBlocks are then mapped to the target CGRA. We evaluate the potential of MegaBlocks over Basic Blocks and SuperBlocks to increase the IPC when targeting a CGRA, considering the execution of a number of representative benchmarks. © 2010 IEEE.
2010
Authors
Bispo, J; Cardoso, JMP;
Publication
Proceedings of the International Conference on Field-Programmable Technology, FPT 2010, 8-10 December 2010, Tsinghua University, Beijing, China
Abstract
Typical computing systems based on general purpose processors (GPPs) can be extended with coarse-grained reconfigurable arrays (CGRAs) to provide higher performance and/or energy savings. In order for applications to take advantage of these computing systems, possibly including CGRAs varying in size, efficient dynamic compilation/mapping techniques are required. Dynamic mapping will be responsible for automatically moving computations originally running in the GPP to the CGRA. This paper presents our approach to dynamically map computations to CGRAs coupled to a GPP. Specifically, we evaluate the potential of the MegaBlock to accelerate the execution of a number of representative benchmarks when targeting an architecture based on a GPP and a CGRA. In addition, we show the impact on performance when using constant folding and propagation optimizations. © 2010 IEEE.
2010
Authors
Rosado, A; Cardoso, JMP;
Publication
Fourth International Conference on Network and System Security, NSS 2010, Melbourne, Victoria, Australia, September 1-3, 2010
Abstract
Several authors have asserted that conceptual query languages (CQLs) perform better for querying purposes than logical query languages such as SQL. This paper proposes a query mapping algorithm for the FConQuer system. FConQuer is a framework based on object-role modeling (ORM) schemas, which allows the end-user to formulate conceptual queries through the FConQuer language. Our mapping algorithm allows the FConQuer system to process conceptual queries based on ORM schemas. More precisely, our algorithm maps FConQuer queries to OQL. © 2010 IEEE.
2010
Authors
Menotti, R; Cardoso, JMP; Fernandes, MM; Marques, E;
Publication
IEEE International Symposium on Industrial Electronics (ISIE 2010)
Abstract
This paper presents the use of LALP to implement typical industrial application kernels, an ADPCM encoder and decoder, in FPGAs. LALP is a domain-specific language, and its compilation framework aims at the direct mapping of algorithms originally described in a high-level language onto FPGAs. In particular, LALP focuses on loop pipelining, a key technique for the design of hardware accelerators. While the language syntax resembles C, it contains certain constructs that allow programmer interventions to enforce or relax data dependences as needed, and so optimize the performance of the generated hardware. We present experimental results showing significant performance gains using this approach, while still keeping the language syntax and semantics close to popular high-level software languages, a desirable feature when considering time-to-market constraints. We believe the performance gains observed for the ADPCM implementation can be extended to other industrial applications relying on algorithms that spend most of their execution time in loop structures, such as signal and image processing.
2009
Authors
Marcelino, R; Neto, HC; Cardoso, JMP;
Publication
16th IEEE International Conference on Electronics, Circuits, and Systems, ICECS 2009, Yasmine Hammamet, Tunisia, 13-19 December, 2009
Abstract
Sorting is an important operation in a myriad of applications. It can contribute substantially to the overall execution time of an application. Dedicated sorting architectures can be used to accelerate applications and/or to reduce energy consumption. In this paper, we propose an efficient sorting unit aiming at accelerating the sort operation in FPGA-based embedded systems. The proposed sorting unit, named Unbalanced FIFO Merge Sorting Unit, is based on a FIFO merger implementation and is easily scalable to handle different data-set sizes. We show results of the proposed sorting unit when isolated and when integrated in a software/hardware solution. When using a Xilinx Virtex-5 SX50T FPGA device, the logic resource usage for a 32 Kword machine is lower than 1%, and the block RAM usage is about 22%. When compared to a pure software quicksort implementation, our Sorting Unit provides speed-ups from 1.2× to 50× and about 20× when isolated and when integrated in a software/hardware solution, respectively. © 2009 IEEE.
2009
Authors
Menotti, R; Cardoso, JMP; Fernandes, MM; Marques, E;
Publication
Proceedings of the 21st International Symposium on Computer Architecture and High Performance Computing
Abstract
Field-Programmable Gate Arrays (FPGAs) are becoming increasingly important in embedded and high-performance computing systems. They allow performance levels close to the ones obtained from Application-Specific Integrated Circuits (ASICs), while still keeping design and implementation flexibility. However, to program FPGAs efficiently, one needs the expertise of hardware developers and must master hardware description languages (HDLs) such as VHDL or Verilog. The attempts to furnish a high-level compilation flow (e.g., from C programs) still have open issues before efficient and consistent results can be obtained. Bearing in mind the FPGA resources, we have developed LALP, a novel language to program FPGAs. A compilation framework including mapping capabilities supports the language. The main ideas behind LALP are to provide a higher abstraction level than HDLs, to exploit the intrinsic parallelism of hardware resources, and to permit the programmer to control execution stages whenever the compiler techniques are unable to generate efficient implementations. In this paper we describe LALP, and show how it can be used to achieve high-performance computing solutions.