2005
Autores
Rodrigues, R; Cardoso, JMP;
Publicação
ARC 2005 - International Workshop on Applied Reconfigurable Computing 2005
Abstract
Sequences of loops or sets of nested loops exist in many applications. This paper shows a scheme to pipeline those sequences of loops in such a way that subsequent loops can start execution before the end of the previous ones. It uses a hardware scheme with decoupled and concurrent datapath and control units that start execution at the same time. The communication of data items between two loops in sequence is conducted by memories. Each element of one of such memories is responsible to flag the availability of the data requested by a subsequence loop. Thus, the control execution of subsequent loops is also orchestrated by data availability and out-of-order produced-consumed pairs are permitted. We apply the concept to a real example: a fast DCT algorithm.
2012
Autores
Cardoso, JMP; Carvalho, T; Coutinho, JGF; Luk, W; Nobre, R; Diniz, PC; Petrov, Z;
Publicação
AOSD'12 - Proceedings of the 11th Annual International Conference on Aspect Oriented Software Development
Abstract
The development of applications for high-performance embedded systems is typically a long and error-prone process. In addition to the required functions, developers must consider various and often conflicting non-functional application requirements such as performance and energy efficiency. The complexity of this process is exacerbated by the multitude of target architectures and the associated retargetable mapping tools. This paper introduces an Aspect-Oriented Programming (AOP) approach that conveys domain knowledge and non-functional requirements to optimizers and mapping tools. We describe a novel AOP language, LARA, which allows the specification of compilation strategies to enable efficient generation of software code and hardware cores for alternative target architectures. We illustrate the use of LARA for code instrumentation and analysis, and for guiding the application of compiler and hardware synthesis optimizations. An important LARA feature is its capability to deal with different join points, action models, and attributes, and to generate an aspect intermediate representation. We present examples of our aspect-oriented hardware/software design flow for mapping real-life application codes to embedded platforms based on Field Programmable Gate Array (FPGA) technology. © 2012 ACM.
2010
Autores
Becker, J; Bozorgzadeh, E; Cardoso, JMP; Dasu, A;
Publicação
Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010
Abstract
2006
Autores
Bertels, K; Cardoso, J; Vassiliadis, S;
Publicação
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
2008
Autores
Cardoso, JMP;
Publicação
INTERNATIONAL JOURNAL OF ENGINEERING EDUCATION
Abstract
This paper presents an approach to teaching design of non-programmable application-specific architectures using VHDL, logic and physical synthesis tools and FPGAs. The approach relies on mini-projects that resemble typical problems that students may face in real-life concerning the design of application-specific architectures. The teaching approach presented in this paper supports the incremental learning of both VHDL and the tools used. as the projects are being developed, i.e., students are motivated to acquire skills at the pace at which those skills are required to advance project development. The results so far are very encouraging. Even students with little knowledge of hardware design and embedded systems have succeeded in their assignments. Feedback obtained front students reveals the suitability of certain aspects of the approach and the major difficulties they have faced.
2011
Autores
Bispo, J; Cardanha Paulino, NM; Cardoso, JMP; Ferreira, JC;
Publicação
2011 International Conference on Reconfigurable Computing and FPGAs, ReConFig 2011, Cancun, Mexico, November 30 - December 2, 2011
Abstract
This paper presents an offline tool-chain which automatically extracts loops (Mega blocks) from Micro Blaze instruction traces and creates a tailored Reconfigurable Processing Unit (RPU) for those loops. The system moves loops from the CPU to the RPU transparently, at runtime, and without changing the executable binaries. The system was implemented in an FPGA and for the tested kernels measured speedups ranged between 3.9x and 18.2x for a Micro Blaze CPU without cache. We estimate speedups from 1.03x to 2.01x, when comparing to the best estimated performance achieved with a single Micro Blaze. © 2011 IEEE.
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.