Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Publicações

Publicações por André Martins Pereira

2025

Machine Learning Regression-Based Prediction for Improving Performance and Energy Consumption in HPC Platforms

Autores
Coelho, M; Ocana, K; Pereira, A; Porto, A; Cardoso, DO; Lorenzon, A; Oliveira, R; Navaux, POA; Osthoff, C;

Publicação
HIGH PERFORMANCE COMPUTING, CARLA 2024

Abstract
High-performance computing is pivotal for processing large datasets and executing complex simulations, ensuring faster and more accurate results. Improving the performance of software and scientific workflows in such environments requires careful analysis of their computational behavior and energy consumption. Therefore, maximizing computational throughput in these environments, through adequate software configuration and resource allocation, is essential for improving performance. The work presented in this paper focuses on leveraging regression-based machine learning and decision trees to analyze and optimize resource allocation in high-performance computing environments based on application's performance and energy metrics. Applied to a bioinformatics case study, these models enable informed decision-making by selecting the appropriate computing resources to enhance the performance of a phylogenomics software. Our contribution is to better explore and understand the efficient resource management of supercomputers, namely Santos Dumont. We show that the predictions for application's execution time using the proposed method are accurate for various amounts of computing nodes, while energy consumption predictions are less precise. The application parameters most relevant for this work are identified and the relative importance of each application parameter to the accuracy of the prediction is analysed.

2023

HEP-Frame: an efficient tool for big data applications at the LHC

Autores
Pereira, A; Onofre, A; Proenca, A;

Publicação
EUROPEAN PHYSICAL JOURNAL PLUS

Abstract
HEP-Frame is a new C++ package designed to efficiently perform analyses of datasets from a very large number of events, like those available at the Large Hadron Collider (LHC) at CERN, Geneva. It mainly targets high-performance servers and mini-clusters, and it was designed for natural science researchers with a user-friendly interface to access structured databases. HEP-Frame automatically evaluates the underlying computing resources and builds an adequate code skeleton when creating a data analysis application. At run-time, HEP-Frame analyses a sequence of datasets exploring the available parallelism in the code and hardware resources: it concurrently reads inputs from a user-defined data structure and processes them, following the user-specific sequence of requirements to select relevant data; it manages the efficient execution of that sequence; and it outputs results in userdefined objects (e.g., ROOT structures), stored together with the used input dataset. This paper shows how a domain expert software development can benefit from HEP-Frame, and how it significantly improved the performance of analyses of large datasets produced in proton-proton collisions at the LHC. Two case studies are discussed: the associated production of top quarks together with a Higgs boson (t (t) over barH) at the LHC, and a double- and single-top quark productions at the high-luminosity phase of the LHC (HL-LHC). Results show that the HEP-Frame awareness of the analysis code behaviour and structure, and the underlying hardware system, provides powerful and transparent parallelization mechanisms that largely improve the execution time of data analysis applications.

2021

PRNG-Broker: A High-Performance Broker to Supply Parallel Streams of Pseudorandom Numbers for Large-Scale Simulations

Autores
Pereira, A; Proenca, A;

Publicação
Advances in Parallel & Distributed Processing, and Applications - Transactions on Computational Science and Computational Intelligence

Abstract

2024

Berry: A code for the differentiation of Bloch wavefunctions from DFT calculations

Autores
Reascos, L; Carneiro, F; Pereira, A; Castro, NF; Ribeiro, RM;

Publicação
COMPUTER PHYSICS COMMUNICATIONS

Abstract
Density functional calculation of electronic structures of materials is one of the most used techniques in theoretical solid state physics. These calculations retrieve single electron wavefunctions and their eigenenergies. The berry suite of programs amplifies the usefulness of DFT by ordering the eigenstates in analytic bands, allowing the differentiation of the wavefunctions in reciprocal space. It can then calculate Berry connections and curvatures and the second harmonic generation conductivity. The berry software is implemented for two dimensional materials and was tested in hBN and InSe. In the near future, more properties and functionalities are expected to be added.Program summary Program Title: berry CPC Library link to program files: https://doi .org /10 .17632 /mpbbksz2t7 .1 Developer's repository link: https://github .com /ricardoribeiro -2020 /berry Licensing provisions: MIT Programming language: Python3 Nature of problem: Differentiation of Bloch wavefunctions in reciprocal space, numerically obtained from a DFT software, applied to two dimensional materials. This enables the numeric calculation of material's properties such as Berry geometries and Second Harmonic conductivity. Solution method: Extracts Kohn-Sham functions from a DFT calculation, orders them by analytic bands using graph and AI methods and calculates the gradient of the wavefunctions along an electronic band. Additional comments including restrictions and unusual features: Applies only to two dimensional materials, and only imports Kohn-Sham functions from Quantum Espresso package.

2021

HEP-Frame: Improving the efficiency of pipelined data transformation & filtering for scientific analyses

Autores
Pereira, A; Proenca, A;

Publicação
COMPUTER PHYSICS COMMUNICATIONS

Abstract
Software to analyse very large sets of experimental data often relies on a pipeline of irregular computational tasks with decisions to remove irrelevant data from further processing. A user-centred framework was designed and deployed, HEP-Frame, which aids domain experts to develop applications for scientific data analyses and to monitor and control their efficient execution. The key feature of HEP-Frame is the performance portability of the code across different heterogeneous platforms, due to a novel adaptive multi-layer scheduler, seamlessly integrated into the tool, an approach not available in competing frameworks. The multi-layer scheduler transparently allocates parallel data/tasks across the available heteroge-neous resources, dynamically balances threads among data input and computational tasks, adaptively reorders in run-time the parallel execution of the pipeline stages for each data stream, respecting data dependencies, and efficiently manages the execution of library functions in accelerators. Each layer implements a specific scheduling strategy: one balances the execution of the computational stages of the pipeline, distributing the execution of the stages of the same or different dataset elements among the available computing threads; another controls the order of the pipeline stages execution, so that most data is filtered out earlier and later stages execute the computationally heavy tasks; yet another adaptively balances the automatically created threads among data input and the computational tasks, taking into account the requirements of each application. Simulated data analyses from sensors in the ATLAS Experiment at CERN evaluated the scheduler efficiency, on dual multicore Xeon servers with and without accelerators, and on servers with the many-core Intel KNL. Experimental results show significant improved performance of these data analyses due to HEP-Frame features and the codes scaled well on multiple servers. Results also show the improved HEP-Frame scheduler performance over the key competitor, the HEFT list scheduler. The best overall performance improvement over a real fine tuned sequential data analysis was impressive in both homogeneous and heterogeneous multicore servers and in many-core servers: 81x faster in the homogeneous 24+24 core Skylake server, 86x faster in the heterogeneous 12+12 core Ivy Bridge server with the Kepler GPU, and 252x faster in the 64-core KNL server. Program summary Program Title: HEP-Frame CPC Library link to program files: https://doi.org/10.17632/m2jwxshtfz.1 Licencing provisions: GPLv3 Programming language: C++. Supplementary material: The current HEP-Frame public release available at https://bitbucket.org/ ampereira/hep-frame/wiki/Home . Nature of problem: Scientific data analysis applications are often developed to process large amounts of data obtained through experimental measurements or Monte Carlo simulations, aiming to identify patterns in the data or to test and/or validate theories. These large inputs are usually processed by a pipeline of computational tasks that may filter out irrelevant data (a task and its filter is addressed as a proposition in this communication), preventing it from being processed by subsequent tasks in the pipeline. This data filtering, coupled with the fact that propositions may have different computational intensities, contribute to the irregularity of the pipeline execution. This can lead to scientific data analyses I/O-, memory-, or compute-bound performance limitations, depending on the implemented algorithms and input data. To allow scientists to process more data with more accurate results their code and data structures should be optimized for the computing resources they can access. Since the main goal of most scientists is to obtain results relevant to their scientific fields, often within strict deadlines, optimizing the performance of their applications is very time consuming and is usually overlooked. Scientists require a software framework to aid the design and development of efficient applications and to control their parallel execution on distinct computing platforms. Solution method: This work proposes HEP-Frame, a framework to aid the development and efficient execution of pipelined scientific analysis applications on homogeneous and heterogeneous servers. HEP-Frame is a user-centred framework to aid scientists to develop applications to analyse data from a large number of dataset elements, with a flexible pipeline of propositions. It not only stresses the interface to domain experts so that code is more robust and is developed faster, but it also aims high-performance portability across different types of parallel computing platforms and desirable sustainability features. This framework aims to provide efficient parallel code execution without requiring user expertise in parallel computing. Frameworks to aid the design and deployment of scientific code usually fall into two categories: (i) resource-centred, closer to the computing platforms, where execution efficiency and performance portability are the main goals, but forces developers to adapt their code to strict framework con-straints; (ii) user-centred, which stresses the interface to domain experts to improve their code development speed and robustness, aiming to provide desirable sustainability features but disregarding the execution performance. There are also a set of frameworks that merge these two categories (Liu et al., 2015 [1]; Deelman et al., 2015 [2]) for scientific computing. While they do not have steep learning curves, concessions have to be made to their ease of use to allow for their broader scope of targeted applications. HEP-Frame attempts to merge this gap, placing itself between a fully user-or resource-centred framework, so that users develop code quickly and do not have to worry about the computational efficiency of the code It handles (i) by ensuring efficient execution of applications according to their computational requirements and the available resources on the server through a multi-layer scheduler, while (ii) is addressed by automatically generating code skeletons and transparently managing the data structure and automating repetitive tasks. Additional comments: An early stage proof-of-concept was published in a conference proceedings (Pereira et al., 2015). However, the HEP-Frame version presented in this communication only shares a very small portion of the code related to the skeleton generation (less than 5% of the overall code), while the rest of the user interface, multi-layer scheduler, and parallelization strategies were completely redesigned and re-implemented.

2016

Tuning Pipelined Scientific Data Analyses for Efficient Multicore Execution

Autores
Pereira, A; Onofre, A; Proenca, A;

Publicação
2016 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS 2016)

Abstract
Scientific data analyses often apply a pipelined sequence of computational tasks to independent datasets. Each task in the pipeline captures and processes a dataset element, may be dependent on other tasks in the pipeline, may have a different computational complexity and may be filtered out from progressing in the pipeline. The goal of this work is to develop an efficient scheduler that automatically (i) manages a parallel data reading and an adequate data structure creation, (ii) adaptively defines the most efficient order of pipeline execution of the tasks, considering their inter-dependence and both the filtering out rate and the computational weight, and (iii) manages the parallel execution of the computational tasks in a multicore system, applied to the same or to different dataset elements. A real case study data analysis application from High Energy Physics (HEP) was used to validate the efficiency of this scheduler. Preliminary results show an impressive performance improvement of the pipeline tuning when compared to the original sequential HEP code (up to a 35x speedup in a dual 12-core system), and also show significant performance speedups over conventional parallelization approaches of this case study application (up to 10x faster in the same system).

  • 1
  • 2