Eduardo Nuno Almeida and Luís Vilaça (CTM)

“Our researchers Luís Vilaça and Eduardo Almeida demonstrated great dedication and team spirit during the major update of the CTM Slurm platform, while maintaining the quality of their PhD work and of the projects in which they are involved. In addition to carrying out the complex update and resolving the associated issues, the researchers prepared a new version of the user manual, which is now available to all CTM researchers. This nomination reflects their professionalism, quality and sense of duty. The impact of their work on the Centre’s activities is clear: more resources are available, and they are used more efficiently and distributed more equitably among all members of the Centre. Congratulations to both!”

– CTM coordinators

You worked on updating the Slurm platform. Could you describe this platform (its main purpose, current and potential applications, etc.) and outline how it has evolved from its early stages to the present day?

Slurm is CTM’s shared computational infrastructure. The platform consists of a cluster of GPUs and CPUs that can run complex computational tasks, such as multimedia processing and machine learning. The resources are managed automatically and centrally: the Slurm workload manager allocates them to tasks according to their availability and each task’s requirements. This mechanism ensures that the shared resources are used efficiently, taking task waiting times into account while guaranteeing fair access for everyone.
The platform emerged from CTM researchers’ growing need to run experiments that require advanced computational power. Given the cost of these resources and the fact that each researcher uses them only intermittently, it is essential to manage and share them efficiently across the Centre. The platform was first introduced at CTM in 2022. Since then, its computational resources have expanded and both the infrastructure and the user manuals have been improved, culminating in the deployment of a new version of Slurm in December 2023. We also refined the issue management system, which allows CTM’s researchers to give feedback and suggest improvements to the platform.

What were the main challenges during this process? How can Slurm support the work and activities of the Centre?

Slurm is currently an essential platform for CTM’s research activities. It is therefore important that maintenance is completed as quickly as possible, to minimise its impact on the Centre’s work. Given the scale and complexity of this update, we had to plan and prepare every step of the process and anticipate the issues that could arise along the way. We also coordinated closely with SAS and SRC so that the process would be as efficient and fast as possible.
With the new version of the platform, CTM’s researchers have access to more and better computational resources, which are now used more efficiently and distributed more fairly across the Centre. In addition, the improved user manual has been helping researchers learn to use the platform independently.

Which aspects of your job do you enjoy the most?

In addition to our research as PhD students in telecommunications and multimedia communications technologies, we are motivated by the opportunity to contribute to new solutions that support CTM’s research activities, such as Slurm. This opportunity has allowed us to learn, consolidate and apply important skills in systems administration and DevOps, which are in high demand in the technology industry.

What are your thoughts on this nomination?

We would like to thank the CTM coordinators for recognising our work and effort. This project involved multiple technical challenges, given the complexity of the platform, the objectives set by the coordination and the requirements of the researchers. Most of these challenges stemmed from the need to build an effective, scalable and easy-to-maintain solution, goals we achieved through excellent team spirit and knowledge sharing. We feel fulfilled by the work carried out, which has established Slurm as an important pillar of the Centre’s research activities. We would also like to thank CTM’s coordination, particularly Filipe Ribeiro, for their support during the development of this project.