How Does PyTorch Support ResNet?
ResNets are a common neural network architecture used for deep learning computer vision applications like object detection and image segmentation.
A ResNet can contain a large number of convolutional layers, commonly between 18 and 152, though variants with up to thousands of layers have been trained. There are newer variants called ResNeXt and DenseNet, which are more commonly used today. But understanding the original ResNet architecture is key to working with many common convolutional network patterns.
PyTorch is a Python deep learning framework that provides several options for creating ResNet models:
- You can run ResNet networks with between 18 and 152 layers, pre-trained on the ImageNet database or trained on your own data
- You can custom-code your own ResNet architecture
In this article, you will learn:
- ResNet Architecture
- What is a Residual Block?
- What are the Options for Running ResNet on PyTorch?
- Running ResNet on PyTorch with Run:AI
ResNet Architecture
Residual Network (ResNet) is a Convolutional Neural Network (CNN) architecture that overcame the “vanishing gradient” problem, making it possible to construct networks with up to thousands of convolutional layers, which outperform shallower networks.
Related content: read our guide to PyTorch CNN
A vanishing gradient occurs during backpropagation: as the training algorithm searches for weights that bring the loss function to a minimal value, the gradient flowing back through too many layers becomes smaller and smaller until it effectively disappears, and optimization cannot continue.
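As a toy illustration (the depth and layer sizes below are arbitrary, not from any specific network), stacking many sigmoid layers in PyTorch quickly drives the gradient at the input toward zero:

import torch
import torch.nn as nn

# 50 stacked Linear + Sigmoid layers; each sigmoid shrinks the backpropagated gradient
layers = nn.Sequential(*[nn.Sequential(nn.Linear(16, 16), nn.Sigmoid())
                         for _ in range(50)])
x = torch.randn(1, 16, requires_grad=True)
layers(x).sum().backward()
print(x.grad.abs().mean())   # typically prints a vanishingly small number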
ResNet solved the problem using “identity shortcut connections”. It operates in two stages:
- ResNet creates layers that are initially unused and skips over them, reusing the activations from previous layers instead.
- In a second stage, the network trains again and the skipped "residual" convolutional layers are expanded. This makes it possible to explore parts of the feature space that a shallower convolutional architecture would miss.
What is a Residual Block?
Residual blocks are the essential building blocks of ResNet networks. To make very deep convolutional structures possible, ResNet adds the block's input to the output of a group of convolutional layers. These links are also called skip connections, identity mappings, or "residual connections."
The objective of skip connections is to allow smoother gradient flow and to ensure that important features are carried through to the final layers. They add virtually no computational load to the network.
The following diagram illustrates a residual block, where:
- x is the input to the ResNet block—output from previous layers
- F(x) is a small neural network with several convolution blocks
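The output of the block is F(x) + x, which is passed on to the next layer. Here is a minimal sketch of that idea in PyTorch (the class name and layer sizes are illustrative only, not taken from the original article):

import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        fx = self.conv2(F.relu(self.conv1(x)))   # F(x): a small stack of conv layers
        return F.relu(fx + x)                    # the skip connection adds x back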
What are the Options for Running ResNet on PyTorch?
Running Pretrained PyTorch ResNet Models
PyTorch lets you run ResNet models, pre-trained on the ImageNet dataset. This is called “transfer learning”—you can make use of a model trained on an existing dataset, saving the time and computational effort of training it again on your own examples.
To import a pre-trained ResNet into your model, you can use code along the following lines (a minimal sketch using the torchvision.models API):
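import torchvision.models as models

# Load ResNet18 with weights pre-trained on ImageNet
resnet18 = models.resnet18(pretrained=True)

# Recent torchvision releases replace the pretrained flag with a weights argument:
# resnet18 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)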
This will create ResNet18, an architecture with 18 convolutional layers.
If you need a different number of layers, substitute 18 with 34, 50, 101, or 152—these are the number of layers supported in the torchvision.models module.
You can use the pretrained flag to control whether the ResNet network leverages transfer learning:
- Set pretrained=False to randomly initialize the weights and train from scratch
- Set pretrained=True to use a model pre-trained on ImageNet data
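For example, here is a hedged sketch of both options, along with the common transfer-learning step of replacing the final fully connected layer so the network predicts your own classes (num_classes below is a placeholder for your dataset):

import torchvision.models as models
import torch.nn as nn

resnet50_scratch = models.resnet50(pretrained=False)    # randomly initialized weights
resnet50_pretrained = models.resnet50(pretrained=True)  # ImageNet weights

# Fine-tuning: swap the final fully connected layer for your own number of classes
num_classes = 10  # hypothetical value
resnet50_pretrained.fc = nn.Linear(resnet50_pretrained.fc.in_features, num_classes)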
Coding Your Own ResNet Architecture
PyTorch lets you customize the ResNet architecture to your needs. Liu Kuang provides a code example that shows how to implement residual blocks and use them to create different ResNet combinations.
Here is how to create a residual block for ResNets under 50 layers. The sketch below follows the structure of Kuang's BasicBlock class (parameter names such as in_planes and planes follow his example):
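import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        # Identity shortcut; a 1x1 convolution is used when the shapes differ
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion * planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion * planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion * planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)   # add the skip connection
        out = F.relu(out)
        return out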
Kuang provides a different class called bottleneck to define the residual blocks for more than 50 layers—with expansion set to 4 and several other parameter changes.
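A hedged sketch of that Bottleneck class, again following the structure of Kuang's example (it reuses the imports from the BasicBlock sketch above):

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, self.expansion * planes,
                               kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(self.expansion * planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion * planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion * planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion * planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out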
He shows how to build the ResNet architecture with four persistent layers, plus a _make_layer function that creates a variable number of residual blocks within each layer. The sketch below follows his example, which targets 32×32 CIFAR-style inputs and uses the block classes defined above:
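class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super().__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        # Four persistent layers, each built from a variable number of residual blocks
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        # The first block in each layer may downsample; the rest use stride 1
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)   # assumes 32x32 inputs, as in Kuang's example
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out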
Finally, you can define any ResNet architecture like this:
def ResNet152():
    return ResNet(Bottleneck, [3, 8, 36, 3])
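As a hypothetical usage example (the input size assumes the CIFAR-style network sketched above), you can instantiate the model and run a dummy batch through it:

import torch

model = ResNet152()
dummy = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
print(model(dummy).shape)           # torch.Size([1, 10]) with the default num_classes=10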
See the complete code example.
Running ResNet on PyTorch with Run:AI
ResNet is a compute-intensive neural network architecture. Run:AI automates resource management and workload orchestration for machine learning infrastructure. With Run:AI, you can automatically run as many compute-intensive experiments as needed in PyTorch and other deep learning frameworks.
Here are some of the capabilities you gain when using Run:AI:
- Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
- No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
- A higher level of control—Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time.
Run:AI simplifies machine learning infrastructure pipelines, helping data scientists improve their productivity and the quality of their models.
Learn more about the Run:AI GPU virtualization platform.