Maximize the Potential of your GPUs: A Guide to Dynamic GPU Fractions & Node Level Scheduler

by
Hagay Sharon
April 16, 2024

As artificial intelligence continues to revolutionize industries, the demand for powerful computational resources, particularly GPUs, has skyrocketed. However, managing these resources efficiently, to maximize performance while minimizing costs, remains a significant challenge. In the 2.17 release, we introduce Dynamic GPU Fractions and Node Level Scheduler, two features designed to optimize GPU resource utilization and AI workload performance.

Understanding Dynamic GPU Fractions

Dynamic GPU Fractions represent a paradigm shift in GPU resource management. Traditionally, GPU resources are allocated statically, with workloads assigned a fixed portion of GPU memory and compute power. However, this approach often leads to underutilization, as resources remain idle during periods of low demand. Dynamic GPU Fractions address this inefficiency by enabling workloads to dynamically request GPU resources based on their actual needs.

At its core, Dynamic GPU Fractions allow users to specify the fraction of GPU memory or compute resources that a workload requires (the Request), along with an upper limit for resource utilization (the Limit). For example, a user might request a GPU fraction of 0.25 with a limit of 0.80. The workload is then guaranteed 0.25 of the GPU's resources and can use additional resources, up to 0.80 of the GPU, whenever they are available.
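To make the Request and Limit semantics concrete, here is a minimal Python sketch of the arithmetic described above. It is an illustration only, not Run:ai code: the names `FractionSpec`, `usable_fraction`, `demand`, and `free_on_gpu` are hypothetical, and the sketch assumes that a workload can always use its guaranteed fraction and can burst beyond it only into capacity that is currently free on the same GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FractionSpec:
    """Hypothetical holder for a workload's GPU fraction settings."""
    request: float  # guaranteed share of the GPU, e.g. 0.25
    limit: float    # upper bound the workload may burst to, e.g. 0.80

    def __post_init__(self):
        # A valid spec satisfies 0 < request <= limit <= 1.
        if not (0 < self.request <= self.limit <= 1):
            raise ValueError("expected 0 < request <= limit <= 1")


def usable_fraction(spec: FractionSpec, demand: float, free_on_gpu: float) -> float:
    """Return the GPU fraction the workload can use right now.

    demand      -- fraction the workload would like to use at this moment
    free_on_gpu -- fraction of the GPU, beyond this workload's request,
                   that is currently free (an assumption of this sketch)
    """
    if demand < 0 or free_on_gpu < 0:
        raise ValueError("demand and free_on_gpu must be non-negative")
    # Never more than the workload wants, never more than its limit,
    # and never more than its guarantee plus what is free to burst into.
    return min(demand, spec.limit, spec.request + free_on_gpu)


spec = FractionSpec(request=0.25, limit=0.80)
print(usable_fraction(spec, demand=0.60, free_on_gpu=0.75))  # burst allowed
print(usable_fraction(spec, demand=0.95, free_on_gpu=0.75))  # capped at the limit
print(usable_fraction(spec, demand=0.60, free_on_gpu=0.10))  # capped by free capacity
```

With the 0.25 request and 0.80 limit from the example, a demand of 0.60 is served in full when enough capacity is free, a demand of 0.95 is capped at the 0.80 limit, and a demand of 0.60 on a busy GPU is capped at roughly 0.35 (the guarantee plus the 0.10 that is free).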

Benefits of Dynamic GPU Fractions

Dynamic GPU Fractions offer a multitude of benefits for users:

1. Optimized Resource Utilization: By dynamically allocating GPU resources based on workload demand, Dynamic GPU Fractions ensure optimal resource utilization, reducing waste and maximizing efficiency. This leads to cost savings and improved return on investment for GPU infrastructure.

2. Faster Execution & Improved Productivity: Dynamic allocation of GPU resources enables workloads to access the necessary resources when needed, enhancing overall performance and reducing latency. This results in faster execution times and improved productivity for AI applications and models.

3. Flexibility and Scalability: With the ability to request additional resources on-demand, users gain greater flexibility and scalability. Workloads can adapt to changing requirements without compromising performance, enabling organizations to scale their AI initiatives seamlessly.

4. Graceful Changes in GPU Ownership: Notebooks can be detached from GPUs they are not using while the notebooks themselves keep running. Users continue working in their interactive notebooks without interruption, and the freed GPUs become available to other notebooks that need them.

Implementation and Configuration

Implementing Dynamic GPU Fractions is straightforward, thanks to Run:ai's intuitive interface and comprehensive documentation. Users can configure GPU memory limits per workload using either GPU fraction parameters or absolute GPU memory parameters. Run:ai provides detailed instructions on setting up Dynamic GPU Fractions, ensuring a seamless transition to dynamic resource allocation.

Figure 1: Creating a compute resource with Dynamic GPU Fraction on Run:ai UI
Figure 2: Memory usage of a pod using Dynamic GPU Fractions
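The two ways of expressing a GPU memory limit mentioned above, as a fraction or as an absolute amount, are related by simple arithmetic once the GPU's total memory is known. The Python sketch below shows that conversion. The function names and the 40 GiB GPU size are assumptions chosen for illustration; they are not Run:ai parameters or defaults.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def fraction_to_bytes(fraction: float, gpu_total_bytes: int) -> int:
    """Convert a GPU fraction (0 < fraction <= 1) to an absolute memory size."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(fraction * gpu_total_bytes)


def bytes_to_fraction(memory_bytes: int, gpu_total_bytes: int) -> float:
    """Convert an absolute memory size to the equivalent GPU fraction."""
    if not 0 < memory_bytes <= gpu_total_bytes:
        raise ValueError("memory_bytes must be in (0, gpu_total_bytes]")
    return memory_bytes / gpu_total_bytes


gpu_total = 40 * GIB  # assumed GPU size, for illustration only

request_bytes = fraction_to_bytes(0.25, gpu_total)   # 10 GiB guaranteed
limit_fraction = bytes_to_fraction(32 * GIB, gpu_total)  # 32 GiB limit as a fraction
print(request_bytes // GIB, limit_fraction)
```

On the assumed 40 GiB GPU, a 0.25 request corresponds to 10 GiB, and a 32 GiB limit corresponds to a fraction of 0.8.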

Adding Node Level Scheduler for Even More Efficiency

In addition to configuring GPU memory limits, users can further optimize performance by enabling the Node Level Scheduler. This advanced feature maximizes GPU utilization and pod performance by making optimal local decisions on GPU allocation based on the node's internal GPU state. By leveraging the Node Level Scheduler in conjunction with Dynamic GPU Fractions, users can achieve unparalleled levels of performance optimization and resource efficiency.

Use Cases and Real-World Applications

Dynamic GPU Fractions are particularly well-suited for environments with fluctuating workload demands, such as AI research and development, where workloads often vary in intensity over time. For example, researchers conducting experiments or developing models may require varying levels of GPU resources depending on the complexity of the task at hand. With Dynamic GPU Fractions, these workloads can dynamically adjust their resource allocation to match their specific requirements, ensuring optimal performance without over-allocating resources.

Here is an example of how Node Level Scheduler optimizes performance and efficiency with Interactive Notebooks:

Consider a node with 2 GPUs and two interactive pods that use Dynamic GPU Fractions and have just been submitted. Each pod runs a Jupyter notebook that asks for 0.5 GPU memory as its Request and 1 GPU memory as its Limit.

Without Node Level Scheduler, the Scheduler instructs the node to place both pods on a single GPU, bin packing that GPU and leaving the other free for a workload that might need a full GPU or more than half a GPU. As a result, GPU#2 sits idle while each notebook can use at most half a GPU, even when it temporarily needs more.

However, with Node Level Scheduler enabled, the local decision is to spread the two pods across the two GPUs. Each pod can then burst up to the full GPU memory and compute resources, which maximizes both pod performance and GPU utilization.

From the Cluster Scheduler's point of view, the node still has one completely free GPU.

When a third pod is scheduled and it requires a full GPU (or, in this case, anything more than 0.5 GPU), the Cluster Scheduler sends it to that node. Node Level Scheduler then moves one of the interactive workloads to run alongside the other pod on GPU#1, as in the Cluster Scheduler's initial plan.

This is an example of one scenario that shows how Node Level Scheduler locally optimizes and maximizes GPU utilization and pods’ performance at all times.
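The scenario above can be traced with a small simulation. The following Python sketch is a toy model written for this walkthrough, not Run:ai's scheduler: `Pod`, `Node`, `place`, `burst_ceiling`, and `admit_with_repack` are hypothetical names, placement considers only the Request values, and the re-packing step is a simplified stand-in for whatever Node Level Scheduler does internally. GPU index 0 corresponds to GPU#1 in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    request: float  # guaranteed GPU fraction
    limit: float    # maximum GPU fraction the pod may burst to


@dataclass
class Node:
    num_gpus: int
    gpus: list = field(init=False)  # gpus[i] is the list of pods on GPU i

    def __post_init__(self):
        self.gpus = [[] for _ in range(self.num_gpus)]

    def requested(self, gpu: int) -> float:
        return sum(p.request for p in self.gpus[gpu])


def place(node: Node, pod: Pod, spread: bool):
    """Place a pod on a GPU whose remaining guaranteed capacity fits its request.

    spread=False bin-packs (fullest GPU that fits); spread=True picks the
    emptiest GPU. Returns the GPU index, or None if nothing fits.
    """
    candidates = [g for g in range(node.num_gpus)
                  if node.requested(g) + pod.request <= 1.0]
    if not candidates:
        return None
    pick = min if spread else max
    gpu = pick(candidates, key=node.requested)
    node.gpus[gpu].append(pod)
    return gpu


def burst_ceiling(node: Node, pod: Pod) -> float:
    """Largest fraction the pod can reach without touching other pods' requests."""
    for pods in node.gpus:
        if any(p is pod for p in pods):
            others = sum(p.request for p in pods if p is not pod)
            return min(pod.limit, 1.0 - others)
    raise ValueError(f"{pod.name} is not placed on this node")


def admit_with_repack(node: Node, pod: Pod):
    """Admit a new pod; if it does not fit, re-pack existing pods to make room."""
    gpu = place(node, pod, spread=True)
    if gpu is not None:
        return gpu
    snapshot = [list(pods) for pods in node.gpus]
    existing = [p for pods in node.gpus for p in pods]
    for pods in node.gpus:
        pods.clear()
    # Bin-pack the existing pods, largest request first, then add the new pod.
    for p in sorted(existing, key=lambda p: p.request, reverse=True):
        if place(node, p, spread=False) is None:
            node.gpus = snapshot  # re-packing failed; restore the old layout
            return None
    gpu = place(node, pod, spread=False)
    if gpu is None:
        node.gpus = snapshot
    return gpu


node = Node(num_gpus=2)
nb_a, nb_b = Pod("nb-a", 0.5, 1.0), Pod("nb-b", 0.5, 1.0)
place(node, nb_a, spread=True)
place(node, nb_b, spread=True)
print(burst_ceiling(node, nb_a))   # each notebook has a GPU to itself
admit_with_repack(node, Pod("train", 1.0, 1.0))
print([[p.name for p in pods] for pods in node.gpus])
```

In this model, spreading lets each notebook burst up to a full GPU, while bin packing would cap each at 0.5. When the full-GPU pod arrives, both notebooks are consolidated onto the first GPU and the new pod takes the second, which mirrors the sequence described in the example.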

Final Words

Dynamic GPU Fractions and Node Level Scheduler represent a significant advancement in GPU resource management, offering users the ability to optimize performance, improve efficiency, and scale their AI workloads with ease. By dynamically allocating GPU resources based on workload demand, and routing them to the best suited GPU on a node, Dynamic GPU Fractions together with Node Level Scheduler enable organizations to maximize the utilization of their GPU infrastructure while minimizing costs.

Curious to learn more? Explore our announcement here or book a demo today and see how Run:ai can help you accelerate AI initiatives and increase efficiency.