Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Publicações

Publicações por João Canas Ferreira

2016

Scalable hardware architecture for disparity map computation and object location in real-time

Autores
Santos, PM; Ferreira, JC; Matos, JS;

Publicação
JOURNAL OF REAL-TIME IMAGE PROCESSING

Abstract
We present the disparity map computation core of a hardware system for isolating foreground objects in stereoscopic video streams. The operation is based on the computation of dense disparity maps using block-matching algorithms and two well-known metrics: sum of absolute differences and Census transform. Two sets of disparity maps are computed by taking each of the images as reference so that a consistency check can be performed to identify occluded pixels and eliminate spurious foreground pixels. Taking advantage of parallelism, the proposed architecture is highly scalable and provides numerous degrees of adjustment to different application needs, performance levels and resource usage. A version of the system for 640 x 480 images and a maximum disparity of 135 pixels was implemented in a system based on a Xilinx Virtex II-Pro FPGA and two cameras with a frame rate of 25 fps (less than the maximum supported frame rate of 40 fps on this platform). Implementation of the same system on a Virtex-5 FPGA is estimated to achieve 80 fps, while a version with increased parallelism is estimated to run at 140 fps (which corresponds to the calculation of more than 5.9 x 10(9) disparity-pixels per second).

2013

Architecture for Transparent Binary Acceleration of Loops with Memory Accesses

Autores
Paulino, N; Ferreira, JC; Cardoso, JMP;

Publicação
RECONFIGURABLE COMPUTING: ARCHITECTURES, TOOLS AND APPLICATIONS

Abstract
This paper presents an extension to a hardware/software system architecture in which repetitive instruction traces, called Megablocks, are accelerated by a Reconfigurable Processing Unit (RPU). This scheme is supported by a custom toolchain able to automatically generate a RPU tailored for the execution of one or more Megablocks detected offline. Switching between hardware and software execution is done transparently, without modifications to source code or executable binaries. Our approach has been evaluated using an architecture with a MicroBlaze General Purpose Processor (GPP) softcore. By using a memory sharing mechanism, the RPU can access the GPP's data memory, allowing the acceleration of Megablocks with load/store operations. For a set of 21 embedded benchmarks, an average speedup of 1.43x is achieved, and a potential speedup of 2.09x is predicted for an implementation using a low overhead interface for communication between GPP and RPU.

2013

A FRAMEWORK FOR HARDWARE CELLULAR GENETIC ALGORITHMS: AN APPLICATION TO SPECTRUM ALLOCATION IN COGNITIVE RADIO

Autores
dos Santos, PV; Alves, JC; Ferreira, JC;

Publicação
2013 23RD INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL 2013) PROCEEDINGS

Abstract
The genetic algorithm (GA) is an optimization metaheuristic that relies on the evolution of a set of solutions (population) according to genetically inspired transformations. In the variant of this technique called cellular GA, the evolution is done separately for subgroups of solutions. This paper describes a hardware framework capable of efficiently supporting custom accelerators for this metaheuristic. This approach builds a regular array of problem-specific processing elements (PEs), which perform the genetic evolution, connected to shared memories holding the local subpopulations. To assist the design of the custom PEs, a methodology based on highlevel synthesis from C++ descriptions is used. The proposed architecture was applied to a spectrum allocation problem in cognitive radio networks. For an array of 5x5 PEs in a Virtex-6 FPGA, the results show a minimum speedup of 22x compared to a software version running on a PC and a speedup near 2000x over a MicroBlaze soft processor.

2014

E-legging for monitoring the human locomotion patterns

Autores
Catarino, A; Rocha, AM; Abreu, MJ; Derogarian, F; Da Silva, J; Ferreira, J; Tavares, V; Correia, M; Dias, R;

Publicação
Journal of Textile Engineering

Abstract
Human motion capture systems help clinicians to detect and identify mobility impairments, early stages of pathologies and evaluate the effectiveness of surgical or rehabilitation intervention. Although there is a considerable number of solutions presently available, these systems are often expensive, complex, difficult to wear, and uncomfortable for the patient. With the purpose of solving the formerly mentioned problems, a new wearable locomotion data capture system for gait analysis is being developed. This system will allow the measurement of several locomotion-related parameters in a practical and non-invasive way, also reusable, that can be used by patients from light to severe impairments or disabilities. © 2013 The Textile Machinery Society of Japan.

2017

Generation of Customized Accelerators for Loop Pipelining of Binary Instruction Traces

Autores
Paulino, NMC; Ferreira, JC; Cardoso, JMP;

Publicação
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

Abstract
Many embedded applications process large amounts of data using regular computational kernels, amenable to acceleration by specialized hardware coprocessors. To reduce the significant design effort, the dedicated hardware may be automatically generated, usually starting from the application's source or binary code. This paper presents a moduloscheduled loop accelerator capable of executing multiple loops and a supporting toolchain. A generation/scheduling procedure, which fully relies on MicroBlaze instruction traces, produces accelerator instances, customized in terms of functional units and interconnections. The accelerators support integer and single-precision floating-point arithmetic, and exploit instruction-level parallelism, loop pipelining, and memory access parallelism via two read/write ports. A complete implementation of the proposed architecture is evaluated in a Virtex-7 device. Augmenting a MicroBlaze processor with a tailored accelerator achieves a geometric mean speedup, over software-only execution, of 6.61x for 13 floating-point kernels from the Livermore Loops set, and of 4.08x for 11 integer kernels from Texas Instruments' IMGLIB. The proposed customized accelerators are compared with ALU-based ones. The average specialized accelerator requires only 0.47x the number of field-programmable gate array slices of an accelerator with four ALUs. A geometric mean speedup of 1.78x over a four-issue very long instruction word (without floating-point support) was obtained for the integer kernels.

2013

LARA experiments

Autores
Goncalves, F; Petrov, Z; De F. Coutinho, JG; Nane, R; Sima, VM; Cardoso, JMP; Werner, S; Bhattacharya, S; Carvalho, T; Nobre, R; De Sa, J; Teixeira, J; Diniz, PC; Bertels, K; Constantinides, G; Luk, W; Becker, J; Alves, JC; Ferreira, JC; Almeida, GM;

Publicação
Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach

Abstract
This chapter describes a series of experiments aimed at evaluating the effectiveness of the REFLECT design-flow in terms of ease of use and quality of the generated designs. In these experiments, we exercised the use of LARA to control and guide the REFLECT design-flow components, such as the Harmonic weaver, the CoSy-based compilers, and the back-end Molen/ML510 toolchain. Various research results have been presented in previous publications focusing on specific aspects of the REFLECT design-flow [1], including strategies for optimizing hardware/software systems [2], strategies for optimizing hardware synthesis [3], strategies for hardware/software specialization [4], strategies for resource efficiency [5], and strategies addressing safety requirements [6, 7]. © Springer Science+Business Media New York 2013. All rights are reserved.

  • 2
  • 16