Research Opportunities

Informatics

[Open soon]

Work description

Responsibilities under the grant:
- Design new mechanisms to monitor the performance and energy consumption of model training workloads in advanced computing infrastructures.
- Design techniques and mechanisms to improve GPU performance and energy efficiency, with minimal impact on key training metrics, such as execution time and accuracy.
- Integrate and evaluate the proposed techniques in large-scale, high-performance computing environments (i.e., supercomputers).
- Conduct experimental evaluations of the developed techniques, using a variety of deep learning models and hardware devices (e.g., various processing and storage devices).
- Write technical reports and scientific papers.

Academic Qualifications

Enrolled in the doctoral program of informatics or informatics engineering.

Minimum profile required

- Experience in designing energy monitoring tools, with particular focus on multi-threaded and distributed scenarios.
- Experience with observability tools, particularly OpenTelemetry.
- Solid knowledge of and experience in machine learning, deep learning, and large language models (e.g., ResNet18, ResNet50, AlexNet, VGG19, Llama, Qwen, GPT).
- Solid knowledge of the training pipeline and its performance bottlenecks.
- Knowledge of and experience with high-performance computing environments, including scripting, experimental evaluations, and the collection and analysis of performance, resource usage, and energy consumption metrics.

Preference factors

- Experience with deep learning frameworks, including PyTorch, TensorFlow, and DeepSpeed.
- Knowledge of performance and energy consumption optimizations designed for DL training.
- Knowledge of operating systems and distributed systems.
- Experience with the Python, C++, and Go programming languages.

Application Period

From 20 Nov 2025 to 04 Dec 2025


Centre

High-Assurance Software

Scientific Advisor

Ricardo Gonçalves Macedo