Research Opportunities

Distributed Systems

Work description

Responsibilities under the grant:

- Design techniques and mechanisms for managing the performance and energy consumption of GPUs used in deep learning within distributed environments.
- Integrate and evaluate the proposed techniques in large-scale, high-performance computing environments (i.e., supercomputers).
- Conduct experimental evaluations of the developed techniques using a variety of deep learning models and hardware devices (e.g., various processing and storage devices).
- Produce technical reports and scientific articles.

Academic Qualifications

- Enrolled in a doctoral program in Informatics or Informatics Engineering.

Minimum profile required

- Solid knowledge of and experience in the design of machine learning, deep learning, and large language models (e.g., ResNet18, ResNet50, AlexNet, VGG19, Llama, Qwen, GPT).
- Solid knowledge of the training pipeline and its performance bottlenecks.
- Knowledge of and experience with high-performance computing environments, including scripting, experimental evaluations, and the collection and analysis of performance, resource-usage, and energy-consumption metrics.
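As an illustration of the kind of metric analysis mentioned above, the sketch below estimates the energy of a training step by integrating sampled GPU power readings. This is a minimal, hypothetical example, not part of the call: real setups would obtain power samples via NVML or `nvidia-smi`, whereas the sample values here are placeholders.

```python
def energy_joules(power_watts, interval_s):
    """Integrate evenly spaced power samples (W) into energy (J)
    using the trapezoidal rule."""
    if len(power_watts) < 2:
        return 0.0
    # Trapezoidal rule: full weight for interior samples, half for endpoints.
    total = sum(power_watts) - 0.5 * (power_watts[0] + power_watts[-1])
    return total * interval_s

# Hypothetical power samples taken every 0.5 s during one training step.
samples = [210.0, 250.0, 265.0, 240.0]
step_energy = energy_joules(samples, 0.5)            # 370.0 J
mean_power = step_energy / (0.5 * (len(samples) - 1))  # ~246.7 W
print(f"step energy: {step_energy:.1f} J, mean power: {mean_power:.1f} W")
```

Aggregating such per-step estimates across ranks and devices is one common way to compare the energy cost of different training configurations.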

Preference factors

- Experience with deep learning frameworks, including PyTorch, TensorFlow, and DeepSpeed.
- Knowledge of performance and energy-consumption optimizations designed for DL training.
- Knowledge of operating systems and distributed systems.
- Experience with the Python and C++ programming languages.

Application Period

From 26 Jun 2025 to 09 Jul 2025

Centre

High-Assurance Software

Scientific Advisor

Ricardo Gonçalves Macedo