Publications

Publications by João Paiva Cardoso

2005

Pipelining sequences of loops: A first example

Authors
Rodrigues, R; Cardoso, JMP;

Publication
ARC 2005 - International Workshop on Applied Reconfigurable Computing 2005

Abstract
Sequences of loops or sets of nested loops exist in many applications. This paper shows a scheme to pipeline those sequences of loops in such a way that subsequent loops can start execution before the end of the previous ones. It uses a hardware scheme with decoupled and concurrent datapath and control units that start execution at the same time. The communication of data items between two loops in sequence is conducted by memories. Each element of one of such memories is responsible to flag the availability of the data requested by a subsequence loop. Thus, the control execution of subsequent loops is also orchestrated by data availability and out-of-order produced-consumed pairs are permitted. We apply the concept to a real example: a fast DCT algorithm.

CloseRead Abstract

2012

LARA: An aspect-oriented programming language for embedded systems

Authors
Cardoso, JMP; Carvalho, T; Coutinho, JGF; Luk, W; Nobre, R; Diniz, PC; Petrov, Z;

Publication
AOSD'12 - Proceedings of the 11th Annual International Conference on Aspect Oriented Software Development

Abstract
The development of applications for high-performance embedded systems is typically a long and error-prone process. In addition to the required functions, developers must consider various and often conflicting non-functional application requirements such as performance and energy efficiency. The complexity of this process is exacerbated by the multitude of target architectures and the associated retargetable mapping tools. This paper introduces an Aspect-Oriented Programming (AOP) approach that conveys domain knowledge and non-functional requirements to optimizers and mapping tools. We describe a novel AOP language, LARA, which allows the specification of compilation strategies to enable efficient generation of software code and hardware cores for alternative target architectures. We illustrate the use of LARA for code instrumentation and analysis, and for guiding the application of compiler and hardware synthesis optimizations. An important LARA feature is its capability to deal with different join points, action models, and attributes, and to generate an aspect intermediate representation. We present examples of our aspect-oriented hardware/software design flow for mapping real-life application codes to embedded platforms based on Field Programmable Gate Array (FPGA) technology. © 2012 ACM.

CloseRead Abstract

2010

Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010: Welcome message

Authors
Becker, J; Bozorgzadeh, E; Cardoso, JMP; Dasu, A;

Publication
Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010

Abstract

2006

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface

Authors
Bertels, K; Cardoso, J; Vassiliadis, S;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract

2008

A teaching strategy for developing application specific architectures for FPGAs

Authors
Cardoso, JMP;

Publication
INTERNATIONAL JOURNAL OF ENGINEERING EDUCATION

Abstract
This paper presents an approach to teaching design of non-programmable application-specific architectures using VHDL, logic and physical synthesis tools and FPGAs. The approach relies on mini-projects that resemble typical problems that students may face in real-life concerning the design of application-specific architectures. The teaching approach presented in this paper supports the incremental learning of both VHDL and the tools used. as the projects are being developed, i.e., students are motivated to acquire skills at the pace at which those skills are required to advance project development. The results so far are very encouraging. Even students with little knowledge of hardware design and embedded systems have succeeded in their assignments. Feedback obtained front students reveals the suitability of certain aspects of the approach and the major difficulties they have faced.

CloseRead Abstract

2011

From Instruction Traces to Specialized Reconfigurable Arrays

Authors
Bispo, J; Cardanha Paulino, NM; Cardoso, JMP; Ferreira, JC;

Publication
2011 International Conference on Reconfigurable Computing and FPGAs, ReConFig 2011, Cancun, Mexico, November 30 - December 2, 2011

Abstract
This paper presents an offline tool-chain which automatically extracts loops (Mega blocks) from Micro Blaze instruction traces and creates a tailored Reconfigurable Processing Unit (RPU) for those loops. The system moves loops from the CPU to the RPU transparently, at runtime, and without changing the executable binaries. The system was implemented in an FPGA and for the tested kernels measured speedups ranged between 3.9x and 18.2x for a Micro Blaze CPU without cache. We estimate speedups from 1.03x to 2.01x, when comparing to the best estimated performance achieved with a single Micro Blaze. © 2011 IEEE.

CloseRead Abstract