Publications - INESC TEC

Publications

Publications by João Canas Ferreira

2019

A reliable wearable system for BAN applications with a high number of sensors and high data rate

Authors
Derogarian, F; Ferreira, JC; Tavares, VG; Silva, JM; Velez, FJ;

Publication
Wearable Technologies and Wireless Body Sensor Networks for Healthcare

Abstract
This chapter addresses a wearable body area network (BAN) system for both medical and nonmedical applications, especially those including a large number of sensors at BAN scale (<250), embedded in textiles and with high data rate (<9+9 Mbps) communication demands. The overall system includes an on-body central processing module (CPM) connected to a computer via a wireless link and a wearable sensor network. Due to the fixed location of the sensors and the possibility of using conductive yarns in textiles, a wired network has been considered for the wearable components. Employing conductive yarns instead of wireless links provides more reliable communication, higher data rates and throughput, and lower power consumption. The wearable unit is composed of two types of circuits, the sensor nodes (SNs) and a base station (BS), all connected to each other with conductive yarns forming a mesh topology with the base node at the center. The reliability analysis shows that communication in a multi-hop connection of sensors in mesh topology is more reliable than in the conventional star topology. From the standpoint of the network, each SN is a four-port router capable of handling packets from destination nodes to the BS. The end-to-end communication uses packet switching for packet delivery from SNs to the BS or in the reverse direction, or between SNs. The communication module has been implemented in a low-power field-programmable gate array (FPGA) and a microcontroller. The maximum data rate of the system is 9+9 Mbps while supporting tens of sensors, which is much more than current BAN applications need. The suitability of the proposed system for use in real applications has been demonstrated experimentally. © The Institution of Engineering and Technology 2017.


2019

Parallel Implementation on FPGA of Support Vector Machines Using Stochastic Gradient Descent

Authors
Lopes, FE; Ferreira, JC; Fernandes, MAC;

Publication
ELECTRONICS

Abstract
Sequential Minimal Optimization (SMO) is the traditional training algorithm for Support Vector Machines (SVMs). However, SMO does not scale well with the size of the training set. For that reason, Stochastic Gradient Descent (SGD) algorithms, which scale better, are a better option for massive data mining applications. Even with SGD, however, training times can become extremely long depending on the data set. For this reason, accelerators such as Field-Programmable Gate Arrays (FPGAs) are used. This work describes a hardware implementation, on an FPGA, of a fully parallel SVM using Stochastic Gradient Descent. The proposed FPGA implementation of an SVM with SGD achieves speedups of more than 10,000x relative to software implementations running on a quad-core processor and up to 319x compared to state-of-the-art FPGA implementations, while requiring fewer hardware resources. The results show that the proposed architecture is a viable solution for highly demanding problems such as those found in big data analysis.
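
For readers unfamiliar with the training scheme the abstract refers to, the following is a minimal software sketch of a linear SVM trained by SGD on the L2-regularized hinge loss. It is not the paper's hardware architecture, which the abstract does not detail; the function names, the hyperparameters (`epochs`, `lr`, `reg`) and the toy data are assumptions made only for illustration.

```python
import random


def train_linear_svm_sgd(samples, labels, epochs=50, lr=0.01, reg=0.01, seed=0):
    """Train a linear SVM with plain SGD (illustrative sketch only).

    samples: list of feature lists; labels: list of +1 / -1.
    Minimizes reg/2 * ||w||^2 + hinge loss, one sample per step.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a random order each epoch
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:
                # Hinge loss is active: subgradient is reg*w - y*x (and -y for b).
                w = [wj - lr * (reg * wj - y * xj) for wj, xj in zip(w, x)]
                b += lr * y
            else:
                # Only the regularization term contributes to the gradient.
                w = [wj - lr * reg * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return the predicted class (+1 or -1) for feature vector x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    # Hypothetical, linearly separable toy data.
    X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
    y = [1, 1, 1, -1, -1, -1]
    w, b = train_linear_svm_sgd(X, y)
    print([predict(w, b, x) for x in X])
```

Each update touches one sample and costs time proportional to the number of features, which is the property that lets SGD scale to large training sets.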


2019

Preface to the Special Issue on Methods, Tools, and Architectures for Signal and Image Processing

Authors
Ferreira, JC; Palumbo, F;

Publication
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY


2020

A Dynamically Reconfigurable Dual-Waveform Baseband Modulator for Flexible Wireless Communications

Authors
Ferreira, ML; Ferreira, JC;

Publication
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY

Abstract
In future wireless communication systems, several radio access technologies will coexist and interwork to provide a great variety of services with different requirements. Thus, the design of flexible and reconfigurable hardware is a relevant topic in wireless communications. The combination of high performance, programmability and flexibility makes field-programmable gate arrays (FPGAs) a convenient platform for designing such systems, especially for base stations. This paper describes a dynamically reconfigurable baseband modulator for Orthogonal Frequency Division Multiplexing and Filter-bank Multicarrier modulation waveforms implemented on a Virtex-7 board. The design features Dynamic Partial Reconfiguration (DPR) capabilities to adapt its mode of operation at run-time and is compared with a functionally equivalent static multi-mode design regarding processing throughput, resource utilization, functional density and power consumption. The DPR-based design implementation reserves about half the resources used by its static multi-mode counterpart. Consequently, the baseband processing dynamic power consumption observed in the DPR-based design is between 26 mW and 90 mW lower than in the static multi-mode design, representing a dynamic power reduction between 13% and 52%. The worst-case DPR latency measured was 1.051 ms, while the DPR energy overhead is below 1.5 mJ. Considering latency requirements for modern wireless standards and power consumption constraints for commercial base stations, the DPR application is shown to be valuable in multi-standard and multi-mode systems, as well as in scenarios such as multiple-input and multiple-output or dynamic spectrum aggregation.


2020

Parallel Implementation of K-Means Algorithm on FPGA

Authors
Dias, LA; Ferreira, JC; Fernandes, MAC;

Publication
IEEE ACCESS

Abstract
The K-means algorithm is widely used to find correlations between data in different application domains. However, given the massive amount of data stored, known as Big Data, the need for high-speed processing to analyze data has become even more critical, especially for real-time applications. A solution that has been adopted to increase the processing speed is the use of parallel implementations on FPGA, which have proved to be more efficient than sequential systems. Hence, this paper proposes a fully parallel implementation of the K-means algorithm on FPGA to optimize the system's processing time, thus enabling real-time applications. This proposal, unlike most implementations proposed in the literature, even parallel ones, does not have sequential steps, which are a limiting factor for processing speed. Results related to processing time (or throughput) and FPGA area occupancy (or hardware resources) were analyzed for different parameters, reaching performances higher than 53 million data points processed per second. Comparisons to the state of the art are also presented, showing speedups of more than over a partially serial implementation.
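
As a point of reference for the algorithm being accelerated, here is a minimal sequential sketch of standard K-means (Lloyd's iteration) in software. It is not the fully parallel FPGA design described in the abstract; the function name, the fixed iteration count and the sample points are assumptions chosen for illustration.

```python
def kmeans(points, initial_centroids, iterations=10):
    """Run a fixed number of Lloyd iterations (illustrative sketch only).

    points: list of equal-length coordinate tuples.
    initial_centroids: starting centroid coordinates, one per cluster.
    Returns (centroids, assignments).
    """
    centroids = [list(c) for c in initial_centroids]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            dists = [sum((a - c) ** 2 for a, c in zip(p, cen)) for cen in centroids]
            assignments[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


if __name__ == "__main__":
    # Hypothetical data: two well-separated groups of three points.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    cents, labels = kmeans(pts, [(0, 0), (10, 10)])
    print(cents, labels)
```

In this sketch the distance computations for different points and centroids are independent of one another, which is the kind of work a parallel implementation can perform concurrently rather than in a loop.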


2020

Improving Performance and Energy Consumption in Embedded Systems via Binary Acceleration: A Survey

Authors
Paulin, N; Ferreira, JC; Cardoso, JMP;

Publication
ACM COMPUTING SURVEYS

Abstract
The breakdown of Dennard scaling has resulted in a decade-long stall of the maximum operating clock frequencies of processors. To mitigate this issue, computing shifted to multi-core devices. This introduced the need for programming flows and tools that facilitate the expression of workload parallelism at high abstraction levels. However, not all workloads are easily parallelizable, and the minor improvements to processor cores have not significantly increased single-threaded performance. Simultaneously, Instruction Level Parallelism in applications is considerably underexplored. This article reviews notable approaches that focus on exploiting this potential parallelism via automatic generation of specialized hardware from binary code. Although research on this topic spans more than 20 years, automatic acceleration of software via translation to hardware has gained new importance with the recent trend toward reconfigurable heterogeneous platforms. We characterize this kind of binary acceleration approach and the accelerator architectures on which it relies. We summarize notable state-of-the-art approaches individually and present a taxonomy and comparison. Performance gains from 2.6x to 5.6x are reported, mostly considering bare-metal embedded applications, along with power consumption reductions between 1.3x and 3.9x. We believe the methodologies and results achievable by automatic hardware generation approaches are promising in the context of emergent reconfigurable devices.
