- Design of a benchmark to evaluate different data processing tools on scalable DataFrames;
- Implementation of an initial prototype of the benchmark that allows the evaluation and analysis of trade-offs between performance and energy efficiency in distributed environments with massive datasets;
- Incorporation into the benchmark of metric collection for computational resources (CPU, disk, network), enabling analysis of the differences between the tools, with special emphasis on the communication patterns and consumption of frameworks in distributed mode.

The tasks described in this work plan require the application and development of concepts and techniques in the area of Computer Engineering that are typically taught in the curricular units forming the core of the study plan of an Integrated Master's or Master's in Computer Engineering.
- BSc degree in Informatics Engineering or a related area.
Minimum profile required
- Knowledge of distributed systems.
- Experience evaluating tools in a distributed context;
- Experience with scalable data processing tools with a pandas-like interface.
From 02 Nov 2022 to 15 Nov 2022
Cluster / Centre
Computer Science / High-Assurance Software