Improvement of the design of an existing benchmark for evaluating data processing tools that operate on scalable DataFrames; Implementation of new advanced features in the existing benchmark that allow the evaluation and analysis of trade-offs between performance and energy efficiency in distributed environments with massive data sets; Incorporation into the benchmark of appropriate metrics for analyzing how the tools differ in these trade-offs between performance and energy efficiency, with special emphasis on communication patterns/consumption frameworks in distributed mode; Writing of a scientific paper on the benchmark and its use for evaluating tools that process data over scalable DataFrames in distributed mode. The tasks described in this work plan require the application and development of concepts and techniques in the area of Computer Engineering that are typically taught in the curricular units forming the core of the study plan of the Integrated Master's in Computer Engineering or the Master's in Computer Engineering.
BSc Degree in Informatics Engineering or similar.
Minimum profile required
Knowledge of distributed systems and benchmarking.
Experience evaluating tools in a distributed context; Experience with scalable data processing tools with a pandas-like interface.
From 23 Feb 2023 to 08 Mar 2023