Create a Service with GPU access
This guide shows you how to create a service with GPU access and highlights important considerations for its use.
Before following this guide, ensure you have completed the Create your first service guide; the sections below focus specifically on GPU-related topics and examples.
GPUs are shared between multiple services and do not offer resource guarantees. If your service is resource intensive, it can cause out-of-memory errors or degraded performance, both for your own service and for other GPU-enabled services running in the same solution.
Development setup
This guide requires an Intrinsic SDK development environment. If you haven't already, follow the guide to set up the development environment.
Bazel workspace
You will need a Bazel workspace to create a GPU-enabled service. The workspace can be
created at the root of the project using inctl.
You can skip this step if you already have a MODULE.bazel file in your project.
inctl bazel init --sdk_repository https://github.com/intrinsic-ai/sdk.git --sdk_version latest
CUDA Requirements
In most cases, the base image needs to include a CUDA environment. This requires using a different base image. Make sure to follow best practices for base images as outlined in tips for keeping services small.
Adding a new base image
After setting up the bazel workspace as outlined above, you should now have a
MODULE.bazel file in your top-level directory. It contains a section like this:
# OCI images
bazel_dep(name = "rules_oci", version = "2.2.5")
oci = use_extension("@rules_oci//oci:extensions.bzl", "oci")
use_repo(
    oci,
    "distroless_base",
    "distroless_python3",
)
This section declares the container images in which you can package your binaries. If you need CUDA binaries installed in your image, you can add an additional base image here. To keep builds hermetic, so that every build uses exactly the same image version and is reproducible, reference the image by its content digest (SHA256). To use an nvidia/cuda:12.8.1-base-ubuntu22.04 image, for example, modify the section to look like this:
# OCI images
bazel_dep(name = "rules_oci", version = "2.2.5")
oci = use_extension("@rules_oci//oci:extensions.bzl", "oci")
# Pull CUDA base image
oci.pull(
    name = "cuda_base",
    image = "docker.io/nvidia/cuda",
    digest = "sha256:e711c99333fdfe8ae1e677b4972be6c5021f0128a1d31f775c7e58d88921b6a9",
)
use_repo(
    oci,
    "cuda_base",  # <-- Add your pull target here
    "distroless_base",
    "distroless_python3",
)
Develop a service that uses the GPU
For this section, we will assume the goal is to create a service named gpunorm_service using cupy. This service will print GPU information, perform a simple L2 norm calculation on a small array using the GPU, and log the result. To keep this example concise we will not define an interface for this service here.
cupy is a NumPy-compatible Python library for GPU-accelerated computing.
For the most part, the steps for creating a GPU-enabled service are identical to those for a normal service, so only the differences are highlighted here.
Add an image with CUDA dependencies
First, add the official cupy image from Docker Hub to the base images in MODULE.bazel. This file should be in the top-level directory of the previously set up workspace:
# OCI images
bazel_dep(name = "rules_oci", version = "2.2.5")
oci = use_extension("@rules_oci//oci:extensions.bzl", "oci")
# Pull CUDA base image
oci.pull(
    name = "cupy_image",
    image = "docker.io/cupy/cupy",
    digest = "sha256:f85108d3b53de56f0e51fd936c2d3a0ed207063858c55afe855a8dc125abea20",
)
use_repo(
    oci,
    "cupy_image",  # <-- Add your pull target here
    "distroless_base",
    "distroless_python3",
)
Next, create a new services directory, and within it, a gpunorm folder.
All implementation files for the service will be located here.
Create service files
To check if we can actually use the GPU with a service, let's create a minimal
example service called gpunorm_service. For this, create the following folder structure in your workspace:
/bazel_ws/
├── MODULE.bazel
└── services/
└── gpunorm/
├── BUILD
├── gpunorm_main.py
├── gpunorm_service_manifest.textproto
└── requirements.txt
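One way to create this layout from the workspace root is with a couple of shell commands (the file contents are added in the following steps):

```shell
# Create the service directory and empty implementation files.
mkdir -p services/gpunorm
touch services/gpunorm/BUILD \
  services/gpunorm/gpunorm_main.py \
  services/gpunorm/gpunorm_service_manifest.textproto \
  services/gpunorm/requirements.txt
```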
Create a simple GPU-using binary
We can use cupy to create a small sample service script, gpunorm_main.py, that prints information about the available GPUs.
"""GPU example service
Minimal example of a service that imports cupy and uses it to print GPU
information and do a simple calculation on GPU.
"""
import logging
import sys
import cupy
def main():
logging.info(f'Starting GPU example service...')
# Show detected hardware and drivers
logging.info(f'Printing GPU configuration to stdout:')
cupy.show_config()
# Initialize array on GPU
gpu_arr = cupy.array([1, 2, 3, 4, 5])
logging.info(
f"Created array on device {gpu_arr.device}, doing simple calculation.")
# Do a simple L2 norm calculation
l2_norm = cupy.linalg.norm(gpu_arr)
logging.info(f"Calculated length of array {gpu_arr} as {l2_norm} on GPU!")
logging.info("GPU example service finished, exiting...")
if __name__ == '__main__':
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
main()
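Because cupy mirrors the NumPy API, the calculation above can be sanity-checked on the CPU. This sketch assumes NumPy is available locally (it is one of the pinned requirements of this service); `numpy.linalg.norm` is the CPU counterpart of `cupy.linalg.norm`:

```python
import math

import numpy as np

# Same data as in the service, but as a CPU array.
cpu_arr = np.array([1, 2, 3, 4, 5])

# Same call as cupy.linalg.norm, running on the CPU.
l2_norm = np.linalg.norm(cpu_arr)

# The L2 norm is sqrt(1 + 4 + 9 + 16 + 25) = sqrt(55).
assert math.isclose(l2_norm, math.sqrt(55))
print(l2_norm)
```

The printed value matches the result the service logs on the GPU.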
Use the new base image in the build rules
Create a build rule using the new cupy_image as a base image. We also need to add cupy as a pip dependency to the binary to be able to build it outside of the container. The resulting BUILD file is as follows:
load("@ai_intrinsic_sdks//bazel:python_oci_image.bzl", "python_oci_image")
load("@ai_intrinsic_sdks//intrinsic/assets/services/build_defs:services.bzl", "intrinsic_service")
load("@gpunorm_pip_deps//:requirements.bzl", "requirement")
load("@rules_python//python:defs.bzl", "py_binary")
py_binary(
    name = "gpunorm_service_bin",
    srcs = ["gpunorm_main.py"],
    main = "gpunorm_main.py",
    deps = [
        requirement("cupy-cuda12x"),
    ],
)

python_oci_image(
    name = "gpunorm_service_image",
    binary = ":gpunorm_service_bin",
    base = "@cupy_image",
)

intrinsic_service(
    name = "gpunorm_service",
    images = [":gpunorm_service_image.tar"],
    manifest = "gpunorm_service_manifest.textproto",
)
Enable GPU access in the service manifest
To enable GPU access for the container in which the service runs, add resource requirements to gpunorm_service_manifest.textproto. Specifically, add the following block to the image field:
settings {
  resource_requirements {
    limits {
      key: "nvidia.com/gpu"
      value: "1"
    }
  }
}
A complete example of the service manifest could look like this:
# proto-file: https://github.com/intrinsic-ai/sdk/blob/main/intrinsic/assets/services/proto/service_manifest.proto
# proto-message: intrinsic_proto.services.ServiceManifest
metadata {
  id {
    package: "com.example"
    name: "gpunorm_service"
  }
  vendor {
    display_name: "Intrinsic"
  }
  documentation {
    description: "A service that calculates an L2 norm on GPU."
  }
  display_name: "GPUNorm Service"
}
service_def {
  real_spec {
    image {
      archive_filename: "gpunorm_service_image.tar"
      settings {
        resource_requirements {
          limits {
            key: "nvidia.com/gpu"
            value: "1"
          }
        }
      }
    }
  }
  sim_spec {
    image {
      archive_filename: "gpunorm_service_image.tar"
      settings {
        resource_requirements {
          limits {
            key: "nvidia.com/gpu"
            value: "1"
          }
        }
      }
    }
  }
}
Add necessary pip requirements
You'll need to pin the pip requirements to specific versions in a requirements.txt file. Our binary requires only three dependencies:
cupy-cuda12x==13.4.1
fastrlock==0.8.3
numpy==2.3.0
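As an illustrative aside (not part of the service itself), a few lines of Python can confirm that every entry in such a file is pinned to an exact version with `==`, which is what keeps the container build reproducible:

```python
# The same three pinned requirements as in requirements.txt above.
requirements = """\
cupy-cuda12x==13.4.1
fastrlock==0.8.3
numpy==2.3.0
"""

for line in requirements.strip().splitlines():
    # partition splits on the first "==": version stays empty if unpinned.
    name, _, version = line.partition("==")
    assert version, f"{name} is not pinned to an exact version"
    print(f"{name} pinned to {version}")
```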
Build the service files
Build the service bundle by executing the following command in your bazel workspace folder:
bazel build services/gpunorm:gpunorm_service
Install the service in a solution
For questions about this and the following steps, consult the Add your service to a solution section in the Create your first service guide.
Install the service in a solution running on a GPU-VM or an IPC with a GPU by executing:
inctl asset install --org $INTRINSIC_ORGANIZATION $GPUNORM_SERVICE_BUNDLE
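The two variables in this command are placeholders you need to fill in yourself. As a sketch, with a hypothetical organization name and a hypothetical bundle path (the actual path depends on your workspace and build output):

```shell
# Hypothetical example values -- substitute your own organization and the
# bundle file produced by the bazel build step.
export INTRINSIC_ORGANIZATION=example_org
export GPUNORM_SERVICE_BUNDLE=bazel-bin/services/gpunorm/gpunorm_service.bundle.tar

# The install command then expands to:
echo "inctl asset install --org $INTRINSIC_ORGANIZATION $GPUNORM_SERVICE_BUNDLE"
```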
View service logs
Stream the logs of a service instance to your terminal using this command:
inctl logs --follow --service gpunorm_service --org $INTRINSIC_ORGANIZATION --solution $INTRINSIC_SOLUTION
When you add an instance of your service to the solution, you should see logs similar to the following:
INFO:root:Starting GPU example service...
INFO:root:Printing GPU configuration to stdout:
OS : Linux-6.12.29-x86_64-with-glibc2.35
Python Version : 3.11.11
CuPy Version : 13.4.1
CuPy Platform : NVIDIA CUDA
NumPy Version : 2.3.0
SciPy Version : None
Cython Build Version : 3.0.12
Cython Runtime Version : None
CUDA Root : /usr/local/cuda
nvcc PATH : /usr/local/cuda/bin/nvcc
CUDA Build Version : 12080
CUDA Driver Version : 12080
CUDA Runtime Version : 12080 (linked to CuPy) / 12020 (locally installed)
CUDA Extra Include Dirs : []
cuBLAS Version : (available)
cuFFT Version : 11008
cuRAND Version : 10303
cuSOLVER Version : (11, 5, 2)
cuSPARSE Version : (available)
NVRTC Version : (12, 2)
Thrust Version : 200800
CUB Build Version : 200800
Jitify Build Version : <unknown>
cuDNN Build Version : (not loaded; try `import cupy.cuda.cudnn` first)
cuDNN Version : (not loaded; try `import cupy.cuda.cudnn` first)
NCCL Build Version : (not loaded; try `import cupy.cuda.nccl` first)
NCCL Runtime Version : (not loaded; try `import cupy.cuda.nccl` first)
cuTENSOR Version : None
cuSPARSELt Build Version : None
Device 0 Name : NVIDIA RTX 4000 Ada Generation
Device 0 Compute Capability : 89
Device 0 PCI Bus ID : 0000:01:00.0
INFO:root:Created array on device <CUDA Device 0>, doing simple calculation.
INFO:root:Calculated length of array [1 2 3 4 5] as 7.416198487095663 on GPU!
INFO:root:GPU example service finished, exiting...