2013
Authors
Cardoso, JMP; Fernandes, JM; Monteiro, MP; Carvalho, T; Nobre, R;
Publication
JOURNAL OF SYSTEMS ARCHITECTURE
Abstract
This article presents an approach to enrich the MATLAB language with aspect-oriented modularity features, enabling developers to experiment with different implementation characteristics and to acquire runtime data and traces without polluting their base MATLAB code. We propose a language through which programmers configure the low-level data representation of variables and expressions. Examples include specifically tailored fixed-point data representations leading to more efficient support for the underlying hardware, e.g., digital signal processors and application-specific architectures without built-in floating-point units. This approach assists developers in adding handlers and monitoring features in a non-invasive way as well as configuring MATLAB functions with optimized implementations. Different aspect modules can be used to retarget common MATLAB code bases for different purposes and implementations. We validate the proposed approach with a set of representative examples where we attain a simple way to explore a number of properties. Experimental results and collected aspect-oriented software metrics lend support to the claims on its usefulness.
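As an illustration of what a tailored fixed-point data representation involves, the following is a minimal sketch in Python, not the aspect language or MATLAB code from the article. The function names, the 16-bit word length, and the 8 fractional bits are all assumptions chosen for the example.

```python
# Hypothetical sketch of a signed fixed-point format (assumed: 16-bit word,
# 8 fractional bits). None of these names come from the article.

def to_fixed(value, frac_bits=8, word_bits=16):
    """Quantize a real value to a signed fixed-point integer, saturating on overflow."""
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    raw = int(round(value * (1 << frac_bits)))
    return max(lo, min(hi, raw))

def to_float(raw, frac_bits=8):
    """Convert a fixed-point integer back to a real value."""
    return raw / (1 << frac_bits)

def fixed_mul(a, b, frac_bits=8, word_bits=16):
    """Multiply two fixed-point values using integer arithmetic only."""
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    product = (a * b) >> frac_bits  # rescale back to frac_bits fractional bits
    return max(lo, min(hi, product))
```

Changing `frac_bits` or `word_bits` trades range against precision, which is the kind of property a developer would want to explore without editing the base code.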
2013
Authors
Bispo, J; Cardoso, JMP; Monteiro, J;
Publication
Journal of Integrated Circuits and Systems
Abstract
Dynamic partitioning is a promising technique where computations are transparently moved from a General Purpose Processor (GPP) to a coprocessor during application execution. To be effective, the mapping of computations to the coprocessor needs to consider aggressive optimizations. One of the mapping optimizations is loop pipelining, a technique extensively studied and known to allow substantial performance improvements. This paper describes a technique for pipelining Megablocks, a type of runtime loop developed for dynamic partitioning. The technique transforms the body of Megablocks into an acyclic dataflow graph which can be fully pipelined and is based on the atomic execution of loop iterations. For a set of 9 benchmarks without memory operations, we generated pipelined hardware versions of the loops and estimate that the presented loop pipelining technique increases the average speedup of non-pipelined coprocessor accelerated designs from 1.6× to 2.2×. For a larger set of 61 benchmarks which include memory operations, we estimate through simulation a speedup increase from 2.5× to 5.6× with this technique.
2013
Authors
Goncalves, F; Petrov, Z; De F. Coutinho, JG; Nane, R; Sima, VM; Cardoso, JMP; Werner, S; Bhattacharya, S; Carvalho, T; Nobre, R; De Sa, J; Teixeira, J; Diniz, PC; Bertels, K; Constantinides, G; Luk, W; Becker, J; Alves, JC; Ferreira, JC; Almeida, GM;
Publication
Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach
Abstract
This chapter describes a series of experiments aimed at evaluating the effectiveness of the REFLECT design-flow in terms of ease of use and quality of the generated designs. In these experiments, we exercised the use of LARA to control and guide the REFLECT design-flow components, such as the Harmonic weaver, the CoSy-based compilers, and the back-end Molen/ML510 toolchain. Various research results have been presented in previous publications focusing on specific aspects of the REFLECT design-flow [1], including strategies for optimizing hardware/software systems [2], strategies for optimizing hardware synthesis [3], strategies for hardware/software specialization [4], strategies for resource efficiency [5], and strategies addressing safety requirements [6, 7]. © Springer Science+Business Media New York 2013. All rights are reserved.
2013
Authors
Santos, AC; Cardoso, JMP; Diniz, PC; Ferreira, DR;
Publication
OpenAccess Series in Informatics
Abstract
Developing applications for resource-constrained embedded systems is a challenging task, especially when applications must adapt to changes in their operating conditions or environment. To ensure an appropriate response at all times, it is highly desirable to develop applications that can dynamically adapt their behavior at run-time. In this paper we introduce an architecture that allows the specification of adaptable behavior through an external, high-level and platform-independent domain-specific language (DSL). The DSL is used here to define adaptation rules that change the run-time behavior of the application depending on various operational factors, such as time constraints. We illustrate the use of the DSL in an application to mobile robot navigation using smartphones, where experimental results highlight the benefits of specifying the adaptable behavior in a flexible way that is external to the main application logic. © André C. Santos, João M. P. Cardoso, Pedro C. Diniz and Diogo R. Ferreira.
2013
Authors
Azarian, A; Cardoso, JMP; Werner, S; Becker, J;
Publication
Proceedings of the ACM Symposium on Applied Computing
Abstract
In recent years, there has been increasing interest in using task-level pipelining to accelerate the overall execution of applications mainly consisting of Producer-Consumer tasks. This paper proposes an approach to achieve pipelined execution of Producer-Consumer pairs of tasks in FPGA-based multi-core architectures. Our approach is able to speed up the overall execution of successive, data-dependent tasks by using multiple cores and specific customization features provided by FPGAs. An important component of our approach is the use of customized inter-stage buffer schemes to communicate data and to synchronize the cores associated with the Producer-Consumer tasks. In order to improve performance, we propose a technique to optimize out-of-order Producer-Consumer pairs where the consumer uses each produced data element more than once, a behavior present in many applications (e.g., in image processing). All the schemes and optimizations proposed in this paper were evaluated with FPGA implementations. The experimental results show the feasibility of the approach in both in-order and out-of-order Producer-Consumer tasks. Furthermore, the results using our approach to task-level pipelining and a multi-core architecture reveal noticeable performance improvements for a number of benchmarks over a single core implementation without using task-level pipelining. Copyright 2013 ACM.
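The in-order Producer-Consumer case can be pictured with a software analogy: two concurrent stages connected by a bounded buffer that both carries data and synchronizes the stages. The sketch below uses Python threads and a standard-library queue. It is only an assumption-laden illustration, not the FPGA buffer schemes evaluated in the paper, and the function name, stage computations, and buffer size are hypothetical.

```python
import queue
import threading

def run_pipeline(items, buffer_size=4):
    """Run a producer stage and a consumer stage concurrently (in-order case)."""
    buf = queue.Queue(maxsize=buffer_size)  # bounded inter-stage buffer
    done = object()                         # sentinel marking end of stream
    results = []

    def producer():
        for x in items:
            buf.put(x * x)  # blocks while the buffer is full
        buf.put(done)

    def consumer():
        while True:
            value = buf.get()  # blocks while the buffer is empty
            if value is done:
                break
            results.append(value + 1)

    stages = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return results
```

Because the consumer starts working as soon as the first element is available, the two stages overlap in time instead of running back to back. The out-of-order case described in the abstract would need a buffer that supports lookups by element rather than a plain FIFO.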
2013
Authors
Bispo, J; Pinto, P; Nobre, R; Carvalho, T; Cardoso, JMP; Diniz, PC;
Publication
2013 11TH IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN)
Abstract
This paper describes MATISSE, a MATLAB to C compiler targeting embedded systems that is based on Strategic and Aspect-Oriented Programming concepts. MATISSE takes as input: (1) MATLAB code and (2) LARA aspects related to types and shapes, code insertion/removal, and specialization-based directives defining default variable values. In this paper we also illustrate the use of MATISSE in leveraging data types and shapes to generate customized C code suitable for high-level hardware synthesis tools. The preliminary experimental results presented here show that the described approach yields hardware and software reference implementations whose performance is comparable to that of hand-crafted solutions, while being derived automatically at a fraction of the cost.