Cookies Policy
The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More
Accept Reject
  • Menu
Publications

Publications by HumanISE

2019

A Hierarchically-Labeled Portuguese Hate Speech Dataset

Authors
Fortuna, P; Rocha da Silva, JR; Soler Company, J; Wanner, L; Nunes, S;

Publication
THIRD WORKSHOP ON ABUSIVE LANGUAGE ONLINE

Abstract
Over the past years, the amount of online offensive speech has been growing steadily. To successfully cope with it, machine learning is applied. However, ML-based techniques require sufficiently large annotated datasets. In the last years, different datasets were published, mainly for English. In this paper, we present a new dataset for Portuguese, which has not been in focus so far. The dataset is composed of 5,668 tweets. For its annotation, we defined two different schemes used by annotators with different levels of expertise. First, non-experts annotated the tweets with binary labels ('hate' vs. 'no-hate'). Then, expert annotators classified the tweets following a fine-grained hierarchical multiple label scheme with 81 hate speech categories in total. The inter-annotator agreement varied from category to category, which reflects the insight that some types of hate speech are more subtle than others and that their detection depends on personal perception. The hierarchical annotation scheme is the main contribution of the presented work, as it facilitates the identification of different types of hate speech and their intersections. To demonstrate the usefulness of our dataset, we carried a baseline classification experiment with pre-trained word embeddings and LSTM on the binary classified data, with a state-of-the-art outcome.

2019

Dynamic Partial Reconfiguration of Customized Single-Row Accelerators

Authors
Paulino, NMC; Ferreira, JC; Cardoso, JMP;

Publication
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

Abstract
The use of specialized accelerator circuits is a feasible solution to address performance and energy issues in embedded systems. This paper extends a previous field-programmable gate array-based approach that automatically generates pipelined customized loop accelerators (CLAs) from runtime instruction traces. Despite efficient acceleration, the approach suffered from high area and resource requirements when offloading a large number of kernels from the target application. This paper addresses this by enhancing the CLA with dynamic partial reconfiguration (DPR) support. Each kernel to accelerate is implemented as a variant of a reconfigurable area of the CLA which hosts all functional units and configuration memory. Evaluation of the proposed system is performed on a Virtex-7 device. We show, for a set of 21 kernels, that when comparing two CLAs capable of accelerating the same subset of kernels, the one which benefits from DPR can be up to 4.3x smaller. Resorting to DPR allows for the implementation of CLAs which support numerous kernels without a significant decrease in operating frequency and does not affect the initiation intervals at which kernels are scheduled. Finally, the area required by a CLA instance can be further reduced by increasing the IIs of the scheduled kernels.

2019

Fast Heuristic-Based GPU Compiler Sequence Specialization

Authors
Nobre, R; Reis, L; Cardoso, JMP;

Publication
EURO-PAR 2018: PARALLEL PROCESSING WORKSHOPS

Abstract
Iterative compilation focused on specialized phase orders (i.e., custom selections of compiler passes and orderings for each program or function) can significantly improve the performance of compiled code. However, phase ordering specialization typically needs to deal with large solution space. A previous approach, evaluated by targeting an x86 CPU, mitigates this issue by first using a training phase on reference codes to produce a small set of high-quality reusable phase orders. This approach then uses these phase orders to compile new codes, without any code analysis. In this paper, we evaluate the viability of using this approach to optimize the GPU execution performance of OpenCL kernels. In addition, we propose and evaluate the use of a heuristic to further reduce the number of evaluated phase orders, by comparing the speedups of the resulting binaries with those of the training phase for each phase order. This information is used to predict which untested phase order is most likely to produce good results (e.g., highest speedup). We performed our measurements using the PolyBench/GPU OpenCL benchmark suite on an NVIDIA Pascal GPU. Without heuristics, we can achieve a geomean execution speedup of 1.64x, using cross-validation, with 5 non-standard phase orders. With the heuristic, we can achieve the same speedup with only 3 non-standard phase orders. This is close to the geomean speedup achieved in our iterative compilation experiments exploring thousands of phase orders. Given the significant reduction in exploration time and other advantages of this approach, we believe that it is suitable for a wide range of compiler users concerned with performance.

2019

Supporting the Scale-up of High Performance Application to Pre-Exascale Systems: The ANTAREX Approach

Authors
Silvano, C; Agosta, G; Bartolini, A; Beccari, AR; Benini, L; Besnard, L; Bispo, J; Cmar, R; Cardoso, JMP; Cavazzoni, C; Cesarini, D; Cherubin, S; Ficarelli, F; Gadioli, D; Golasowski, M; Lasri, I; Libri, A; Manelfi, C; Martinovic, J; Palermo, G; Pinto, P; Rohou, E; Sanna, N; Slaninova, K; Vitali, E;

Publication
2019 27TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP)

Abstract
The ANTAREX project developed an approach to the performance tuning of High Performance applications based on an Aspect-oriented Domain Specific Language (DSL), with the goal to simplify the enforcement of extra-functional properties in large scale applications. The project aims at demonstrating its tools and techniques on two relevant use cases, one in the domain of computational drug discovery, the other in the domain of online vehicle navigation. In this paper, we present an overview of the project and of its main achievements, as well as of the large scale experiments that have been planned to validate the approach.

2019

Graph-Based Code Restructuring Targeting HLS for FPGAs

Authors
Ferreira, AC; Cardoso, JMP;

Publication
Applied Reconfigurable Computing - 15th International Symposium, ARC 2019, Darmstadt, Germany, April 9-11, 2019, Proceedings

Abstract
High-level synthesis (HLS) is of paramount importance to enable software developers to map critical computations to FPGA-based hardware accelerators. However, in order to generate efficient hardware accelerators one needs to apply significant code transformations and adequately use the directive-driven approach, part of most HLS tools. The code restructuring and directives needed are dependent not only of the characteristics of the input code but also of the HLS tools and target FPGAs. These aspects require a deep knowledge about the subjects involved and tend to exclude software developers. This paper presents our recent approach for automatic code restructuring targeting HLS tools. Our approach uses an unfolded graph representation, which can be generated from program execution traces, and graph-based optimizations, such as folding, to generate suitable HLS C code. In this paper, we describe the approach and the new optimizations proposed. We evaluate the approach with a number of representative kernels and the results show its capability to generating efficient hardware implementations only achievable using manual restructuring of the input software code and manual insertion of adequate HLS directives. © 2019, Springer Nature Switzerland AG.

2019

Nonio - modular automatic compiler phase selection and ordering specialization framework for modern compilers

Authors
Nobre, R; Bispo, J; Carvalho, T; Cardoso, JMP;

Publication
SOFTWAREX

Abstract
This article presents Nonio, a modular, easy-to-use, design space exploration framework focused on exploring custom combinations of compiler flags and compiler sequences. We describe the framework and discuss its use with two of the most popular compiler toolchains, GCC and Clang+LLVM. Particularly, we discuss implementation details in the context of flag selection, when using GCC, and phase selection and ordering, when using Clang+LLVM. The framework software organization allows to easily add new components as plug-ins (e.g., an exploration algorithm, an objective metric, integration with another compiler toolchain). The software architecture provides well-defined interfaces, in order to enable seamless composition and interaction between different components. We present, as an example, a use case where we rely on Nonio to obtain custom compiler flags for reducing the execution time and the energy consumption of a C program, in relation to the best predetermined optimization settings provided by the compiler (e.g., -O3). (C) 2019 The Authors. Published by Elsevier B.V.

  • 237
  • 663