Post by Vassilis Vassiliadis

Research Scientist at IBM

Kubeflow just announced three major initiatives for the year ahead, one of which is automatic configuration of GPU requests for TrainJobs.

This tackles a pain point every ML practitioner faces: the GPU guessing game. Request too few? Your job crashes after hours of queuing and training. Request too many? You hoard resources your teammates need.

This year, Kubeflow will enable plugins to automatically determine GPU requirements based on your training configuration. This means no more guesswork and no more wasted resources. The platform analyzes your parameters, then sets the GPU requests automatically.

This stands out for me because we've spent a lot of time interacting with the community and working on this idea. We have built a prototype that uses a recommender we developed by training a classifier on 10,000+ benchmarking runs. Now we're working with the community to make recommenders like this usable in Kubeflow!

Sounds fun to you too? Help us shape the Kubeflow Trainer KEP here: https://lnkd.in/dbN5yeJu

Daniele Lotito, Srikumar Venugopal, Michael Johnston

For more information, see:
- Kubeflow announcement: https://lnkd.in/duU3522n
- Kubernetes Controller Prototype: https://lnkd.in/dXfXXG-K
- Recommender: https://lnkd.in/dsep9XUc

#KubeflowDay #Kubeflow #MachineLearning #MLOps #Kubernetes #AI #OpenSource