
Managing your AI workloads at Run:ai - A path to improved productivity

by Alon Lavian – January 25, 2024

Simplifying Kubernetes for Data Scientists and Engineers

Managing AI workloads is hard. Keeping track of available resources, utilizing them (especially GPUs), monitoring the workload lifecycle, and debugging problems can all be challenging tasks. Whether you are a researcher or a system admin, these challenges cost you time and money. Run:ai helps users overcome them every day, and our latest release makes managing your AI workloads even easier. We call it the new workload experience.

The new workload experience

An easier, faster, more reliable, holistic experience.

  • Easy to use - The new Workloads UI replaces multiple UI views with one unified view.
  • Fast - Workload data is now synced to the Run:ai control plane in real time, so views load and update faster.
  • Reliable - It functions as a single source of truth for all workload data, so the data is consistent across the application.
  • Holistic - Not only Run:ai workloads: any workload, including any custom resource you use, can be managed through the new workload experience.

New Workloads API Endpoint: A Leap Toward Flexibility

Run:ai takes a major leap forward by introducing a new generic API that provides real-time synchronization of workload status and information, with a standardized interface for various workload types. Every workload is synced to the control plane and mapped to a unified set of phases. All interfaces work with a single service, ensuring that workload information is consistent throughout the application.

Check out the new endpoint documentation here!
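To give a feel for how a unified workloads endpoint might be consumed, here is a minimal sketch in Python. The endpoint path, query parameters, and response fields below are illustrative assumptions for this post, not the documented Run:ai API — check the endpoint documentation for the real interface.

```python
# Illustrative sketch only: the endpoint path, query parameters, and the
# "phase" response field are assumptions, not the documented Run:ai API.

def build_workloads_request(base_url, cluster_id=None, phase=None):
    """Build the URL and query params for a hypothetical workloads listing call."""
    params = {}
    if cluster_id:
        params["clusterId"] = cluster_id
    if phase:
        params["phase"] = phase
    return f"{base_url.rstrip('/')}/api/v1/workloads", params

def filter_by_phase(workloads, phase):
    """Filter a list of workload dicts by their unified 'phase' field."""
    return [w for w in workloads if w.get("phase") == phase]

url, params = build_workloads_request("https://company.run.ai", phase="Running")
print(url)     # https://company.run.ai/api/v1/workloads
print(params)  # {'phase': 'Running'}
```

Because every workload type is synced through the same service, a single call like this can cover native and custom workloads alike.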

Workloads UI - The Core of the User Interface

The main view is the hub where users can overview and manage their tasks, from accessing the details panel—which includes events, metrics, and logs—to submitting new workloads. It offers a comprehensive view of what's running, allowing users to make informed decisions quickly.

Workload Creation - Simplified and Streamlined

Creating workloads, whether workspaces or training, is now a breeze. The UI simplifies the process by providing prompts and guidance, making it accessible even to those who are new to Kubernetes.

Single Workload Actions - More Control with Less Effort

Run:ai allows users to manage workloads with minimal clicks. Actions such as pausing, resuming, or duplicating tasks are simple to perform, ensuring an efficient workflow.
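Programmatically, single-workload actions could be dispatched through the same API. The sketch below shows one possible shape; the action names and URL scheme are assumptions for illustration, not the actual Run:ai interface.

```python
# Illustrative sketch: the action names and URL scheme below are
# assumptions for this post, not the documented Run:ai API.

SUPPORTED_ACTIONS = {"suspend", "resume", "delete"}

def build_action_request(base_url, workload_id, action):
    """Return (method, url) for a hypothetical single-workload action call."""
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return "POST", f"{base_url.rstrip('/')}/api/v1/workloads/{workload_id}/{action}"

method, url = build_action_request("https://company.run.ai", "wl-123", "suspend")
print(method, url)  # POST https://company.run.ai/api/v1/workloads/wl-123/suspend
```

Routing every action through one service is what keeps the UI and API views of a workload consistent.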

Phases and States: A New Way to Understand Workload Lifecycle

We have defined a generic set of statuses for a workload's lifecycle and mapped every type of workload, whether native to Run:ai or not, to the relevant statuses. This provides a unified view of your workloads, so you can monitor, investigate, and take action on all workload types with a single standard.
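The idea can be sketched as a normalization step: raw, type-specific statuses are folded into one small set of phases. The phase names and the mapping below are a made-up example for demonstration, not Run:ai's actual lifecycle model.

```python
# Illustrative example only: the unified phases and the mapping below are
# assumptions for demonstration, not Run:ai's actual lifecycle model.

UNIFIED_PHASES = {"Pending", "Running", "Completed", "Failed", "Unknown"}

# Raw statuses reported by different workload types, normalized case-insensitively.
_STATUS_TO_PHASE = {
    "pending": "Pending",
    "containercreating": "Pending",
    "running": "Running",
    "succeeded": "Completed",
    "completed": "Completed",
    "error": "Failed",
    "crashloopbackoff": "Failed",
    "failed": "Failed",
}

def unify_phase(raw_status: str) -> str:
    """Map a raw, type-specific status string to a unified phase."""
    return _STATUS_TO_PHASE.get(raw_status.strip().lower(), "Unknown")

print(unify_phase("Succeeded"))         # Completed
print(unify_phase("CrashLoopBackOff"))  # Failed
```

With a mapping like this, a dashboard can filter and alert on "Failed" without knowing whether the underlying object was a Pod, a Job, or a custom resource.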

Metrics Revolution: The Future of Performance Monitoring

Looking ahead, future releases promise to bring a granular metrics view, allowing users to see metrics for each GPU within pods and overall workload metrics. This level of detail brings about a new age in performance monitoring within Run:ai.

Simplified Experience, Sophisticated Engineering

With the enhancements in the Run:ai platform, managing workloads in Kubernetes becomes a seamless experience. The new UI marks a fundamental shift towards a more intuitive, user-friendly approach tailored to the needs of data scientists and engineers. It's not just about managing workloads anymore; it's about empowering the users to focus on what they do best: pushing the boundaries of machine learning and artificial intelligence.

As we look towards the future with continuous updates and feature rollouts, Run:ai keeps reaffirming its promise to simplify AI workloads for its users, providing them with advanced tools needed to accelerate research in the fast-paced world of AI. Stay tuned for more updates and experience the power of optimized workload management with Run:ai.

Try Workloads