In the previous Run:ai 2.16 release we introduced the new workloads experience.
In our latest release, version 2.17, workloads get even better. Workload management in Run:ai is improving significantly with new features, granular metrics, and API enhancements.
Drilling Down on Metrics
One for all and all for one
Monitoring the performance of workloads is now easier with per-pod, per-GPU utilization metrics. Follow the utilization of each individual GPU in a given pod, zoom in and out, analyze workload performance, and debug utilization bottlenecks.
Requests and limits
Requests and limits are now visible on the metrics graphs, making it easier to understand each workload's boundaries and to better control researchers' resource utilization, whether CPU or GPU. Check out our blogs for more information about our compute sharing and dynamic memory allocation capabilities.
Submission API
Submitting a workload is now easier than ever using the API. Use it directly or embed it in scripts. Create workspace, training, and distributed workloads. The full documentation is available in Swagger.
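For illustration, here is a minimal Python sketch of submitting a workspace through the REST API. The endpoint path, payload fields, and authentication flow shown here are assumptions for the example only; the Swagger documentation is the authoritative reference for the exact schema.

```python
# Minimal sketch of submitting a workspace via the REST API.
# Endpoint path, field names, and auth flow are assumptions --
# consult the Swagger documentation for the exact schema.
import requests

BASE_URL = "https://<your-control-plane>.run.ai"  # placeholder URL
TOKEN = "<api-token>"                             # placeholder bearer token

payload = {
    "name": "jupyter-demo",          # hypothetical workspace name
    "projectId": "<project-id>",
    "clusterId": "<cluster-id>",
    "spec": {
        "image": "jupyter/scipy-notebook",
        "compute": {"gpuDevicesRequest": 1},  # request one GPU (field name assumed)
    },
}

resp = requests.post(
    f"{BASE_URL}/api/v1/workloads/workspaces",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```

The same pattern can be embedded in submission scripts for training and distributed workloads by switching the endpoint and payload to the corresponding workload type.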
Additional Improvements
Expose external connections
View the workload’s external connections, such as NodePorts and Ingress. Connect directly to supported tools, or copy a connection to the clipboard to save it for later. The new UI makes it easy to connect to your workload with a single click and share it with your colleagues.
Is the workload preemptible?
At Run:ai, training workloads of all types are preemptible. The rationale is to increase efficiency by running these unattended workloads on free resources.
To make this clearer for data scientists, it is critical that they know, for each workload type or framework they use, whether the workload they run is preemptible. With the latest release, researchers can see directly in the UI which workloads are preemptible.
CLI Syntax - copy CLI syntax to the clipboard
View the CLI command syntax for workloads submitted via the CLI and copy it to the clipboard by clicking the Copy & Edit button. Easily reproduce an existing workload submission or share it with your colleagues.
All these features and more are available via the /workloads API as well. Check it out and let us know what you think.
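As a hedged example, the sketch below lists workloads and reads a per-workload preemptibility flag. The query parameter and response field names are assumptions for illustration; see the Swagger documentation for the authoritative schema.

```python
# Minimal sketch of listing workloads via the /workloads API and
# checking whether each one is preemptible.
# Filter syntax and response fields are assumptions for this example.
import requests

BASE_URL = "https://<your-control-plane>.run.ai"  # placeholder URL
TOKEN = "<api-token>"                             # placeholder bearer token

resp = requests.get(
    f"{BASE_URL}/api/v1/workloads",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"filterBy": "projectName==team-a"},   # hypothetical filter expression
    timeout=30,
)
resp.raise_for_status()

for wl in resp.json().get("workloads", []):
    # Print each workload's name and its (assumed) preemptible flag.
    print(wl.get("name"), wl.get("preemptible"))
```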
Curious about the entire release? Check out the announcement here.
Ready to get started? Book your demo today and see how Run:ai can help you accelerate AI development and increase efficiency.