Ricardo Cruz (CTM)

“The coordinators of the CTM would like to nominate researcher Ricardo Cruz for the implementation of the SLURM resource management system on the Centre’s high-performance computing platform. This solution, planned by former CTM member Carlos Leocádio in partnership with the Systems Administration Service (SAS), speeds up the management of computing resources (GPUs) by the Centre’s many members, who have an increasing need to use these resources in their R&D activities. While carrying out research tasks for his PhD (including writing the thesis) and providing scientific support to younger colleagues, Ricardo showed significant willingness to lead the implementation of this solution. Supported by Luís Salgado, Ricardo showed a proactive attitude, resourcefulness and highly diversified technical skills in this engineering project.”

– CTM coordinators

Given the current context, what challenges did you face while carrying out your tasks?

Setting up a platform like this one (which allows computational resources, in our case GPUs, to be shared among multiple users) involves two demanding aspects: the configuration itself and the support provided to users. Most of our users are Master’s students, who need more help than usual. As for the current context, it makes it harder to know what colleagues are working on and whom to talk to in certain situations.

How did you overcome these challenges?

Our SLURM resource platform is currently in use, but the initial effort focused on having a proof of concept that we can now scale up, in terms of both users and CTM resources. My esteemed colleague Luís Salgado helped me at this stage, while our assistant-coordinator Filipe Ribeiro supported the entire process.

In addition, whenever I faced issues that required physical access to the servers, the SAS team was tireless and quick to address them.

Which aspects of your job do you enjoy the most? In your opinion, what is the main differentiating aspect of the initiative?

Our former colleague Carlos Leocádio initially proposed the project, and I was somewhat sceptical myself. Previously, we would log in to the available servers and check them one by one to see whether resources were free, and then manually reserve the ones we needed. Nobody likes to change their work habits, but this type of usage was clearly not scalable, nor did it take advantage of the full potential of our servers: sometimes the use rate was 0% (on weekends, for instance), while on other occasions it reached 100%, forcing people to wait or ask someone else to free up resources. This platform unifies resources, which makes it surprisingly convenient for users to perform their tasks without thinking about issues such as which GPU to reserve. In addition, whenever the use rate reaches 100%, the platform queues the new task automatically and executes it as soon as resources become available. It is a little different from the traditional method, but our users can already rely on this automatic platform to carry out tasks they used to do manually. The most complex task is running the debugger, particularly for those who use VS Code, but we made it possible for users to configure this by themselves.

How do you comment on this nomination?

I am extremely happy, because it means that the feedback from users has been quite positive. Although we’re still talking about a proof of concept (there’s actually only one server currently dedicated to this platform), it seems to be a solution that addresses a gap at CTM while showing potential to scale. The support of our assistant-coordinator Filipe Ribeiro, as well as of Carlos Leocádio and Luís Salgado, is also worth mentioning. Moreover, the SAS team, particularly Jaime Dias, was quite helpful. Also important were our users, who ended up being our beta-testers.
