- Overview
- Downloading the IBM Z Accelerated for PyTorch Container Image
- Container Image Contents
- PyTorch Usage
- A Look into the Acceleration
- Security and Deployment Guidelines
- Execution on the Integrated Accelerator for AI and on CPU
- Model Validation
- Using the Code Samples
- Frequently Asked Questions
- Technical Support
- Versioning Policy and Release Cadence
- Licenses
PyTorch is an open source machine learning framework. It has a comprehensive set of tools that enable model development, training, and inference. It also features a rich, robust ecosystem.
On IBM® z16™ and later (running Linux on IBM Z or IBM® z/OS® Container Extensions (IBM zCX)), PyTorch will leverage new inference acceleration capabilities that target the IBM Integrated Accelerator for AI through the IBM z Deep Neural Network (zDNN) library. The IBM zDNN library contains a set of primitives that support Deep Neural Networks. These primitives transparently target the IBM Integrated Accelerator for AI on IBM z16 and later. No changes to the original model are needed to take advantage of the new inference acceleration capabilities.
Note. When using IBM Z Accelerated for PyTorch on either an IBM z15® or an IBM z14®, PyTorch will transparently target the CPU with no changes to the model.
Downloading the IBM Z Accelerated for PyTorch container image requires credentials for the IBM Z and LinuxONE Container Registry, icr.io.
Documentation on obtaining credentials to icr.io is located here.
Once credentials to icr.io are obtained and have been used to log in to the registry, you may pull (download) the IBM Z Accelerated for PyTorch container image with the following code block:
# Replace X.X.X with the desired version to pull
docker pull icr.io/ibmz/ibmz-accelerated-for-pytorch:X.X.X
In the docker pull command above, X.X.X stands for the version to pull. It must match a version available in the IBM Z and LinuxONE Container Registry.
Release notes about a particular version can be found in this GitHub Repository under releases here.
To remove the IBM Z Accelerated for PyTorch container image, please follow the commands in the code block:
# Find the Image ID from the image listing
docker images
# Remove the image
docker rmi <IMAGE ID>
Note. This documentation refers to image/containerization commands in terms of Docker. If you are using Podman, please replace docker with podman when using our example code snippets.
To view a brief overview of the operating system version, software versions, and content installed in the container, as well as any release notes for each released container image version, please visit the releases section of this GitHub Repository, or click here.
For documentation on how to train and run inferences on models with PyTorch please visit the official Open Source PyTorch documentation.
For brief examples on how to run inferences on models with PyTorch please visit our samples section.
The acceleration is enabled through a Custom Device that gets registered within PyTorch.
- The registered Device will check PyTorch Ops for valid input(s) and/or
output(s), targeting the accelerator where possible.
- Only Ops with valid input(s) and/or output(s) will target the accelerator.
- Some Ops with valid input(s) and/or output(s) may still not target the accelerator if their overhead is likely to outweigh any cost savings.
Ops will receive input(s) and/or output(s) in the form of Tensor objects.
- PyTorch's internal Tensor objects manage the shape, data type, and a pointer to the data buffer.
- More info can be found here.
Model inference calls are currently only fully supported when used within an inference_mode context, which has no interactions with autograd (e.g., model training).
Inference calls made outside an inference_mode context may not fully leverage the accelerator.
During runtime, input(s) and/or output(s) are checked to ensure they are the correct shape and data type.
- If all shapes and data types are valid, the accelerator is used.
- If any shape or data type is invalid, the default CPU logic is used.
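The check-then-fall-back behavior described above can be pictured with a small stand-alone sketch. It is not the torch-nnpa implementation: `TensorMeta`, `is_valid`, `dispatch`, and the limits `SUPPORTED_DTYPES` and `MAX_RANK` are hypothetical names and values chosen only for illustration, since the actual validity rules are not listed in this document.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class TensorMeta:
    """Hypothetical stand-in for the shape and data type a Tensor carries."""
    shape: Tuple[int, ...]
    dtype: str


# Illustrative limits only; the real rules are defined by the accelerator stack.
SUPPORTED_DTYPES = {"float32"}
MAX_RANK = 4


def is_valid(meta: TensorMeta) -> bool:
    """Check one tensor's data type and rank against the illustrative limits."""
    return meta.dtype in SUPPORTED_DTYPES and len(meta.shape) <= MAX_RANK


def dispatch(tensors: Sequence[TensorMeta]) -> str:
    """Pick the accelerator only if every input and output passes the check."""
    return "accelerator" if all(is_valid(t) for t in tensors) else "cpu"


print(dispatch([TensorMeta((128, 20), "float32")]))
print(dispatch([TensorMeta((128, 20), "int64")]))
```

A single invalid tensor is enough to send the whole op down the CPU path, which mirrors the "if any shape or data type is invalid" rule.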
Before the Custom Device is registered, a call to zdnn_is_nnpa_installed is made to ensure the NNPA instruction set for the accelerator is installed.
- If this call returns false, the Custom Device is not registered and runtime proceeds the same way PyTorch would without the acceleration benefits.
Certain environment variables can be set before execution to enable/disable features or logs.
- ZDNN_ENABLE_PRECHECK: true
  - If set to true, zDNN will print logging information before running any computational operation.
  - Example: export ZDNN_ENABLE_PRECHECK=true
    - Enable zDNN logging.
- TORCH_NNPA_DEBUG: 0 | 1
  - If set to 1, torch-nnpa will print logging information before running any operation.
  - Example: export TORCH_NNPA_DEBUG=1
    - Enable torch-nnpa logging.
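When the inference script is started from Python rather than from a shell, the same variables can be placed in the child process environment before it starts. The sketch below is one possible way to do that; the helper name `logging_env` is hypothetical, and the child command is a placeholder that only echoes the variables so the example runs anywhere.

```python
import os
import subprocess
import sys


def logging_env(zdnn_precheck: bool = True, torch_nnpa_debug: bool = True) -> dict:
    """Return a copy of the current environment with the logging variables set."""
    env = dict(os.environ)
    if zdnn_precheck:
        env["ZDNN_ENABLE_PRECHECK"] = "true"
    else:
        env.pop("ZDNN_ENABLE_PRECHECK", None)
    env["TORCH_NNPA_DEBUG"] = "1" if torch_nnpa_debug else "0"
    return env


# Placeholder child process: it only prints the two variables.
# Replace this command with the real inference script.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('ZDNN_ENABLE_PRECHECK'), os.environ['TORCH_NNPA_DEBUG'])",
]
result = subprocess.run(child, env=logging_env(), capture_output=True, text=True, check=True)
print(result.stdout.strip())
```

Passing a modified copy through `env=` leaves the parent process environment untouched, so logging can be switched on for a single run.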
- For security and deployment best practices, please visit the common AI Toolkit documentation found here.
IBM Z Accelerated for PyTorch follows IBM's train anywhere and deploy on IBM Z strategy.
By default, when using IBM Z Accelerated for PyTorch on an IBM z16 or later system, PyTorch core will target the Integrated Accelerator for AI for a number of compute-intensive operations during inferencing with no changes to the model.
It is common practice to write PyTorch code in a device-agnostic way and then switch between CPU and NNPA depending on what hardware is available. Typically, you might use if-statements and nnpa() calls to do this:
import torch
import torch_nnpa
USE_NNPA = True
mod = torch.nn.Linear(20, 30)
# Sends `mod` from its device to NNPA using `nnpa()` call.
if USE_NNPA:
    mod.nnpa()
# Creates `inp` on NNPA by passing 'nnpa' as the device string.
device = 'nnpa' if USE_NNPA else 'cpu'
inp = torch.randn(128, 20, device=device)
mod.eval()
with torch.inference_mode():
    out = mod(inp)
# Copies `out` from its device to CPU.
# `cpu_out` will be on CPU.
# `out` will keep its original device.
cpu_out = out.to('cpu')
print(out.device)
print(cpu_out.device)
###################################################################
# PyTorch now also has a context manager which can take care of the
# device transfer automatically. Here is an example:
with torch.device('nnpa'):
    mod = torch.nn.Linear(20, 30)
    print(mod.weight.device)
    print(mod(torch.randn(128, 20)).device)
When using IBM Z Accelerated for PyTorch on either an IBM z15 or an IBM z14, PyTorch will transparently target the CPU with no changes to the model.
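If the same script also has to run where the torch_nnpa package is not installed at all (for example, a development machine outside the container), the device string can be chosen at startup. The following is a minimal sketch under that assumption; `pick_device` is a hypothetical helper, and it only checks whether the package is importable, not whether the accelerator hardware is present.

```python
import importlib.util


def pick_device(module: str = "torch_nnpa",
                preferred: str = "nnpa",
                fallback: str = "cpu") -> str:
    """Return `preferred` when `module` is importable, otherwise `fallback`."""
    available = importlib.util.find_spec(module) is not None
    return preferred if available else fallback


device = pick_device()
print(device)

# The result would then be used as the device string, for example:
#   inp = torch.randn(128, 20, device=device)
```

Because the check relies only on the standard library, it can run before torch itself is imported.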
Training workloads are currently not supported and may result in runtime errors.
It is recommended to perform training on the CPU. The saved model can then be used for inference with IBM Z Accelerated for PyTorch.
Scaled Dot Product Attention has beta support in PyTorch and has multiple backends.
We currently support only the SDPBackend.MATH backend, which can be used within an sdpa_kernel context:
from torch.nn.functional import scaled_dot_product_attention
from torch.nn.attention import SDPBackend, sdpa_kernel
...
with sdpa_kernel(SDPBackend.MATH):
    output = scaled_dot_product_attention(query, key, value)
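For orientation, scaled dot product attention computes softmax(Q K^T / sqrt(d)) V. The plain-Python sketch below spells out that formula for small 2-D lists. It is a reference for the math only, not the PyTorch or torch-nnpa implementation; the function names are hypothetical, and masking and dropout are left out.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa_reference(query, key, value):
    """Compute softmax(Q K^T / sqrt(d)) V for 2-D lists.

    query is L x d, key is S x d, value is S x dv; the result is L x dv.
    """
    d = len(query[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in query:
        # Scaled similarity of this query row against every key row.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in key]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append([
            sum(w * v[j] for w, v in zip(weights, value))
            for j in range(len(value[0]))
        ])
    return out


q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(sdpa_reference(q, k, v))
```

With an all-zero query every key scores the same, so the output is simply the average of the value rows, which makes a handy sanity check.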
Various models that were trained on x86 or IBM Z have been validated to target the IBM Integrated Accelerator for AI for a number of compute-intensive operations during inferencing.
Note. Models that were trained outside of the PyTorch ecosystem may encounter endianness issues.
Documentation for our code samples can be found here.
Please visit this link here, or read the section titled Downloading the IBM Z Accelerated for PyTorch Container Image.
You may have seen multiple PyTorch container images in IBM Z and LinuxONE Container Registry, namely ibmz/pytorch and ibmz/ibmz-accelerated-for-pytorch.
The ibmz/pytorch container image does not have support for the IBM Integrated Accelerator for AI. The ibmz/pytorch container image only transparently targets the CPU. It does not have any optimizations referenced in this document.
The ibmz/ibmz-accelerated-for-pytorch container image includes support for PyTorch core Graph Execution to transparently target the IBM Integrated Accelerator for AI. The ibmz/ibmz-accelerated-for-pytorch container image also still allows its users to transparently target the CPU. This container image contains the optimizations referenced in this document.
You may run the IBM Z Accelerated for PyTorch container image on Linux on IBM Z or IBM® z/OS® Container Extensions (IBM zCX).
Note. The IBM Z Accelerated for PyTorch container image will transparently target the IBM Integrated Accelerator for AI on IBM z16 and later. However, if using the IBM Z Accelerated for PyTorch container image on either an IBM z15 or an IBM z14, PyTorch will transparently target the CPU with no changes to the model.
No. Installing a newer or older version of PyTorch than what is configured in the container will not target the IBM Integrated Accelerator for AI. Additionally, installing a newer or older version of PyTorch, or modifying the existing PyTorch that is installed in the container image, may have unintended, unsupported consequences. This is not advised.
Information regarding technical support can be found here.
IBM Z Accelerated for PyTorch will follow the semantic versioning guidelines with a few deviations. Overall, IBM Z Accelerated for PyTorch follows a continuous release model with a cadence of 1-2 minor releases per year. In general, bug fixes will be applied to the next minor release and not backported to prior major or minor releases. Major version changes are not frequent and may include features supporting new zSystems hardware as well as major feature changes in PyTorch that are unlikely to be backward compatible. Please refer to PyTorch guidelines for backwards compatibility across different versions of PyTorch.
Each release version of IBM Z Accelerated for PyTorch has the form MAJOR.MINOR.PATCH. For example, IBM Z Accelerated for PyTorch version 1.2.3 has MAJOR version 1, MINOR version 2, and PATCH version 3. Changes to each number have the following meaning:
- MAJOR: All releases with the same major version number will have API compatibility. Major version numbers will remain stable; for instance, 1.X.Y may last 1 year or more. A new major version may introduce backward-incompatible changes, so code and data that worked with a previous major release will not necessarily work with the new release.
- MINOR: Minor releases will typically contain new backward-compatible features, improvements, and bug fixes.
- PATCH: Maintenance releases will occur more frequently and depend on specific patches introduced (e.g., bug fixes) and their urgency. In general, these releases are designed to patch bugs.
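The MAJOR.MINOR.PATCH scheme lends itself to a quick programmatic check, for instance when pinning an image tag in deployment tooling. The sketch below is illustrative only: `Version`, `parse_version`, and `same_api` are hypothetical helpers, and the compatibility rule encoded is just the "same major version" statement above.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Split a MAJOR.MINOR.PATCH string into its three integer parts."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def same_api(a: str, b: str) -> bool:
    """Two releases are treated as API compatible when MAJOR matches."""
    return parse_version(a).major == parse_version(b).major


print(parse_version("1.2.3"))
print(same_api("1.2.3", "1.3.0"))
print(same_api("1.2.3", "2.0.0"))
```

Because `Version` is a tuple, releases also compare in the expected order, so `parse_version("1.3.0") > parse_version("1.2.0")` holds.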
Feature releases for IBM Z Accelerated for PyTorch occur about every 6 months in general. Hence, IBM Z Accelerated for PyTorch 1.3.0 would generally be released about 6 months after 1.2.0. Maintenance releases happen as needed in between feature releases. Major releases do not happen according to a fixed schedule.
The International License Agreement for Non-Warranted Programs (ILAN) agreement can be found here.
The registered trademark Linux® is used pursuant to a sublicense from the Linux Foundation, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis.
PyTorch, the PyTorch logo and any related marks are trademarks of Facebook, Inc.
Docker and the Docker logo are trademarks or registered trademarks of Docker, Inc. in the United States and/or other countries. Docker, Inc. and other parties may also have trademark rights in other terms used herein.
IBM, the IBM logo, and ibm.com, IBM z16, IBM z15, IBM z14 are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. The current list of IBM trademarks can be found here.