Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Publicações

Publicações por HumanISE

2013

LARA experiments

Autores
Goncalves, F; Petrov, Z; De F. Coutinho, JG; Nane, R; Sima, VM; Cardoso, JMP; Werner, S; Bhattacharya, S; Carvalho, T; Nobre, R; De Sa, J; Teixeira, J; Diniz, PC; Bertels, K; Constantinides, G; Luk, W; Becker, J; Alves, JC; Ferreira, JC; Almeida, GM;

Publicação
Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach

Abstract
This chapter describes a series of experiments aimed at evaluating the effectiveness of the REFLECT design-flow in terms of ease of use and quality of the generated designs. In these experiments, we exercised the use of LARA to control and guide the REFLECT design-flow components, such as the Harmonic weaver, the CoSy-based compilers, and the back-end Molen/ML510 toolchain. Various research results have been presented in previous publications focusing on specific aspects of the REFLECT design-flow [1], including strategies for optimizing hardware/software systems [2], strategies for optimizing hardware synthesis [3], strategies for hardware/software specialization [4], strategies for resource efficiency [5], and strategies addressing safety requirements [6, 7]. © Springer Science+Business Media New York 2013. All rights are reserved.

2013

Specifying adaptations through a DSL with an application to mobile robot navigation

Autores
Santos, AC; Cardoso, JMP; Diniz, PC; Ferreira, DR;

Publicação
OpenAccess Series in Informatics

Abstract
Developing applications for resource-constrained embedded systems is a challenging task specially when applications must adapt to changes in their operating conditions or environment. To ensure an appropriate response at all times, it is highly desirable to develop applications that can dynamically adapt their behavior at run-time. In this paper we introduce an architecture that allows the specification of adaptable behavior through an external, high-level and platform-independent domain-specific language (DSL). The DSL is used here to define adaptation rules that change the run-time behavior of the application depending on various operational factors, such as time constraints. We illustrate the use of the DSL in an application to mobile robot navigation using smartphones, where experimental results highlight the benefits of specifying the adaptable behavior in a flexible and external way to the main application logic. © André C. Santos, João M. P. Cardoso, Pedro C. Diniz and Diogo R. Ferreira.

2013

An FPGA-based multi-core approach for pipelining computing stages

Autores
Azarian, A; Cardoso, JMP; Werner, S; Becker, J;

Publicação
Proceedings of the ACM Symposium on Applied Computing

Abstract
In recent years, there has been increasing interest on using task-level pipelining to accelerate the overall execution of applications mainly consisting of Producer-Consumer tasks. This paper proposes an approach to achieve pipelining execution of Producer-Consumer pairs of tasks in FPGA-based multi-core architectures. Our approach is able to speedup the overall execution of successive, data-dependent tasks, by using multiple cores and specific customization features provided by FPGAs. An important component of our approach is the use of customized inter-stage buffer schemes to communicate data and to synchronize the cores associated to the Producer-Consumer tasks. In order to improve performance, we propose a technique to optimize out-of-order Producer-Consumer pairs where the consumer uses more than once each data element produced, a behavior present in many applications (e.g., in image processing). All the schemes and optimizations proposed in this paper were evaluated with FPGA implementations. The experimental results show the feasibility of the approach in both in-order and out-of-order Producer-Consumer tasks. Furthermore, the results using our approach to task-level pipelining and a multi-core architecture reveal noticeable performance improvements for a number of benchmarks over a single core implementation without using task-level pipelining. Copyright 2013 ACM.

2013

The MATISSE MATLAB Compiler

Autores
Bispo, J; Pinto, P; Nobre, R; Carvalho, T; Cardoso, JMP; Diniz, PC;

Publicação
2013 11TH IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN)

Abstract
This paper describes MATISSE, a MATLAB to C compiler targeting embedded systems that is based on Strategic and Aspect-Oriented Programming concepts. MATISSE takes as input: (1) MATLAB code and (2) LARA aspects related to types and shapes, code insertion/ removal, and specialization based directives defining default variable values. In this paper we also illustrate the use of MATISSE in leveraging data types and shapes to generate customized C code suitable for high-level hardware synthesis tools. The preliminary experimental results presented here reveal the described approach to yield performance results for the resulting hardware and software references implementations that are comparable in terms of performance with hand-crafted solutions but derived automatically at a fraction of the cost.

2013

Controlling a complete hardware synthesis toolchain with LARA aspects

Autores
Cardoso, JMP; Carvalho, T; Coutinho, JGF; Nobre, R; Nane, R; Diniz, PC; Petrov, Z; Luk, W; Bertels, K;

Publicação
MICROPROCESSORS AND MICROSYSTEMS

Abstract
The synthesis and mapping of applications to configurable embedded systems is a notoriously complex process. Design-flows typically include tools that have a wide range of parameters which interact in very unpredictable ways, thus creating a large and complex design space. When exploring this space, designers must manage the interfaces between different tools and apply, often manually, a sequence of tool-specific transformations making design exploration extremely cumbersome and error-prone. This paper describes the use of techniques inspired by aspect-oriented technology and scripting languages for defining and exploring hardware compilation strategies. In particular, our approach allows developers to control all stages of a hardware/software compilation and synthesis toolchain: from code transformations and compiler optimizations to placement and routing for tuning the performance of application kernels. Our approach takes advantage of an integrated framework which provides a transparent and unified view over toolchains, their data output and the control of their execution. We illustrate the use of our approach when designing application-specific hardware architectures generated by a toolchain composed of high-level source-code transformation and synthesis tools. The results show the impact of various strategies when targeting custom hardware and expose the complexities in devising these strategies, hence highlighting the productivity benefits of this approach.

2013

Transparent runtime migration of loop-based traces of processor instructions to reconfigurable processing units

Autores
Bispo, J; Paulino, N; Cardoso, JMP; Ferreira, JC;

Publicação
International Journal of Reconfigurable Computing

Abstract
The ability to map instructions running in a microprocessor to a reconfigurable processing unit (RPU), acting as a coprocessor, enables the runtime acceleration of applications and ensures code and possibly performance portability. In this work, we focus on the mapping of loop-based instruction traces (called Megablocks) to RPUs. The proposed approach considers offline partitioning and mapping stages without ignoring their future runtime applicability. We present a toolchain that automatically extracts specific trace-based loops, called Megablocks, from MicroBlaze instruction traces and generates an RPU for executing those loops. Our hardware infrastructure is able to move loop execution from the microprocessor to the RPU transparently, at runtime, and without changing the executable binaries. The toolchain and the system are fully operational. Three FPGA implementations of the system, differing in the hardware interfaces used, were tested and evaluated with a set of 15 application kernels. Speedups ranging from 1.26 × to 3.69 × were achieved for the best alternative using a MicroBlaze processor with local memory. © 2013 João Bispo et al.

  • 490
  • 658