What GPU Options are Offered on Google Cloud?
Google Cloud Platform (GCP) is the world's third largest cloud provider. Google offers a number of virtual machines (VMs) that provide graphics processing units (GPUs), including the NVIDIA A100, Tesla K80, P4, T4, P100, and V100.
You can use NVIDIA GPUs on GCP for large scale cloud deep learning projects, analytics, physical object simulation, video transcoding, and molecular modeling. GCP also provides virtual NVIDIA GRID workstations, which let an organization's employees run graphics-intensive workloads remotely.
Google Cloud GPU Options
Google Cloud provides several GPU options. These GPUs can be selected as part of two Google instance types:
- Accelerator-Optimized High-GPU with 85 GB of RAM per GPU, 12–96 Cascade Lake vCPUs, and SSD storage
- Accelerator-Optimized Mega-GPU with 1.33 TB of RAM, 96 Cascade Lake vCPUs, and SSD storage
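To check which Accelerator-Optimized (A2) machine types are available in a given zone, along with their vCPU and memory configurations, you can query Compute Engine with the gcloud CLI; the zone below is a placeholder to adapt:
gcloud compute machine-types list --zones=us-central1-a --filter="name~a2"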
The available GPUs are as follows:
GPUs suitable for model training, inference, and high performance computing:
- A100—40 GB memory, NVLink operating at 600 GB/s
- V100—16 GB of memory, NVLink Ring networking operating at 300 GB/s
- P100—16 GB of memory, no NVLink, supports NVIDIA GRID
- K80—12 GB of memory, no NVLink
GPUs suitable for inference, training, remote visualization, and transcoding:
- T4—16 GB of memory, no NVLink, supports NVIDIA GRID
- P4—8 GB of memory, no NVLink, supports NVIDIA GRID
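GPU availability differs by region and zone. To see which of these GPU models can be attached in a particular zone, you can list the accelerator types with the gcloud CLI (the zone is a placeholder):
gcloud compute accelerator-types list --filter="zone:us-central1-a"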
Google Cloud TPU
Google Cloud provides another hardware acceleration option—the Tensor Processing Unit (TPU). While not strictly a GPU, TPUs are a powerful alternative for machine learning workloads, especially deep learning.
A TPU is an application-specific integrated circuit (ASIC) developed by Google specifically to accelerate machine learning. Google provides TPU on demand as a deep learning cloud service called Cloud TPU.
Cloud TPU is tightly integrated with Google's open source machine learning (ML) framework, TensorFlow, which provides dedicated APIs for TPU hardware. Cloud TPU lets you create TensorFlow compute unit clusters including TPUs, GPUs, and regular CPUs.
Cloud TPU is mainly suitable for machine learning models based on matrix calculations, models that require weeks or months to train, models with large datasets or a large number of variables, and those that run a training loop many times (as in neural networks).
Cloud TPU is not suitable for workloads that require frequent branching or are dominated by element-wise algebra, models that access memory in a sparse manner, or those that require high-precision arithmetic operations.
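As a brief sketch, here is one way to provision a Cloud TPU using the gcloud CLI's TPU VM architecture; the name, zone, accelerator type, and runtime version below are placeholder assumptions to adapt to your project:
gcloud compute tpus tpu-vm create my-tpu \
    --zone=us-central1-b \
    --accelerator-type=v3-8 \
    --version=tpu-vm-tf-2.13.0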
Related content: read our complete guide to Google TPU
Working with GPUs on Google Cloud Compute Engine
Here is how to create a Google Cloud virtual machine (VM) with an attached NVIDIA A100 GPU:
- Access the Google Cloud Console and click on VM Instances.
- Click Create Instance, then specify a name and the region and zone in which you want to run your VM.
- In the Machine Configuration section, under Machine Family, select GPU.
- Under Series, select A2—this is the Google Cloud VM series that comes with NVIDIA A100 GPUs (see details of other machine series on Google Cloud). Select the Machine type appropriate to your needs.
- Under CPU and GPU Platforms, see the GPU type and number of GPUs provided by the machine type you selected.
- If you want to load your VM using an existing image, select the image in the Boot disk section. Review other VM settings to ensure they meet your requirements.
- Click Create.
That’s it! This process spins up a Google Cloud VM with an attached NVIDIA GPU.
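If you prefer to script this setup, the gcloud command below is a rough equivalent; the instance name, zone, and boot image are placeholder assumptions. The a2-highgpu-1g machine type includes one A100 GPU automatically, and GPU VMs must be set to terminate during host maintenance:
# Sketch: create a VM with one attached NVIDIA A100 GPU (name, zone, and image are placeholders)
gcloud compute instances create my-a100-vm \
    --zone=us-central1-a \
    --machine-type=a2-highgpu-1g \
    --maintenance-policy=TERMINATE \
    --image-family=debian-11 \
    --image-project=debian-cloud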
Optimizing Google Cloud Platform GPU Performance
Here are two tips that can help you improve GPU performance in a Google Cloud VM.
Disabling Autoboost and Setting Maximum Clock Frequency
Autoboost is a feature of NVIDIA Tesla K80 GPUs that automatically adjusts the clock frequency to find the best rate for a given application. However, on Google infrastructure, this constant readjustment of the clock frequency actually reduces GPU performance.
If you are running an NVIDIA Tesla K80 GPU on Compute Engine, it is recommended that you disable autoboost using the following command (on Linux):
sudo nvidia-smi --auto-boost-default=DISABLED
When using the Tesla K80, you should also set the GPU clocks to the highest supported frequencies. The following command sets the memory clock to 2,505 MHz and the graphics clock to 875 MHz:
sudo nvidia-smi --applications-clocks=2505,875
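You can verify that the new clock settings took effect by querying the GPU's clock state:
sudo nvidia-smi -q -d CLOCK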
Using Maximum Network Bandwidth—Up to 100 Gbps
To make distributed workloads run faster with NVIDIA Tesla T4 or V100, use the maximum network bandwidth of 100 Gbps, as follows:
- Make sure you meet the minimum system requirements to use maximum network bandwidth (see documentation).
- Create a VM instance connected to a T4 or V100 GPU. The image used to create the VM instance must support Google Virtual NIC (gVNIC).
- After creating the VM instance, check the actual network bandwidth using iperf or a similar tool. You’ll need at least two GPU-attached VM instances to run the test.
See additional best practices from Google for using the maximum 100 Gbps bandwidth.
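As an illustration, the commands below sketch this setup; the instance name, zone, GPU count, and boot image are placeholder assumptions (the image must support gVNIC, as Google's Deep Learning VM images do):
# Sketch: V100 VM with gVNIC enabled (name, zone, and image are placeholders)
gcloud compute instances create gpu-vm-1 \
    --zone=us-central1-a \
    --machine-type=n1-standard-96 \
    --accelerator=type=nvidia-tesla-v100,count=8 \
    --maintenance-policy=TERMINATE \
    --network-interface=nic-type=GVNIC \
    --image-family=common-cu110 \
    --image-project=deeplearning-platform-release
# To measure bandwidth, run iperf3 -s on one VM, then on a second VM
# connect to the first VM's internal IP (placeholder address):
iperf3 -c 10.128.0.2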
Google Cloud GPU with Run:AI
Run:AI automates resource management and workload orchestration for machine learning infrastructure. With Run:AI, you can automatically run as many compute-intensive experiments as needed, managing large numbers of GPUs in Google Cloud and other public clouds.
Here are some of the capabilities you gain when using Run:AI:
- Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
- No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
- A higher level of control—Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time.
Run:AI simplifies machine learning infrastructure pipelines, helping data scientists improve their productivity and the quality of their models.
Learn more about the Run:AI GPU virtualization platform.