Computer Science
Work description
The work plan falls within the scope of the project supporting the European supercomputing infrastructure, with the participation of INESCTEC. The main focus of the work plan is to support users of supercomputing infrastructures, especially in the optimization and profiling of scientific applications, with an emphasis on ML/DL applications, in pre-exascale and exascale environments.
Academic Qualifications
Holders of a PhD in Computer Science, Mathematics, Statistics, Physics, Biomedical Sciences, or a related scientific field, whose scientific and professional curriculum demonstrates a profile suited to the activities to be carried out, are eligible to apply for this position.
Minimum profile required
a) Proven experience with machine learning frameworks such as TensorFlow, PyTorch, or Scikit-learn;
b) Solid experience in ML/DL algorithms and ML pipelines, and experience in LLM deployment and/or fine-tuning;
c) Programming skills in Python (plus experience with Git, Docker, or Linux);
d) Proficiency in Portuguese and English.
Preference factors
- Experience in LLM and DL model optimization, including:
  o Prompt tuning, evaluation, and model monitoring;
  o Model quantization for memory reduction and inference acceleration;
  o Parameter-efficient fine-tuning (PEFT) techniques, such as LoRA and related methods;
  o Mixture-of-Experts (MoE) models and strategies for scalability.
- Strong knowledge of High-Performance Computing (HPC), with experience in:
  o Distributed systems and parallelism for large-scale training;
  o HPC resource and queue managers (e.g., SLURM);
  o Training and inference optimization on heterogeneous architectures (CPU/GPU, multi-GPU, multi-node);
  o CUDA and kernel optimization for GPU acceleration.
- Familiarity with distributed computing frameworks and libraries, such as Ray, DeepSpeed, or Horovod.
- Experience with continuous monitoring and evaluation, including techniques for:
  o Python code profiling, debugging, and bottleneck identification;
  o Continuous model monitoring and evaluation (e.g., MLflow) and detection of data/model drift;
  o Performance, latency, and accuracy evaluation in production.
- Advanced Linux expertise and knowledge of development best practices in HPC environments.
Application Period
From 11 Sep 2025 to 24 Sep 2025
Centre
High-Assurance Software