Create a Service with GPU access
This guide shows you how to create a service with GPU access and highlights important considerations for its use.
Before following this guide, ensure you have completed the Create your first service guide; the sections below focus specifically on GPU-related topics and examples.
GPUs are shared between multiple services and do not offer resource guarantees. If your service is resource intensive, it can cause out-of-memory errors or degraded performance, both for your own service and for other GPU-enabled services running in the same solution.
Development setup
This guide requires an Intrinsic SDK development environment. If you haven't already, follow the guide to set up the development environment.
Bazel workspace
You will need a Bazel workspace to create a GPU-enabled service. The workspace can be
created at the root of the project using inctl.
You can skip this step if you already have a MODULE.bazel file in your project.
inctl bazel init --sdk_repository https://github.com/intrinsic-ai/sdk.git --sdk_version latest
CUDA Requirements
In most cases, the base image needs to include a CUDA environment. This requires using a different base image. Make sure to follow best practices for base images as outlined in tips for keeping services small.
Adding a new base image
After setting up the bazel workspace as outlined above, you should now have a
MODULE.bazel file in your top-level directory. It contains a section like this:
# OCI images
bazel_dep(name = "rules_oci", version = "2.2.5")
oci = use_extension("@rules_oci//oci:extensions.bzl", "oci")
use_repo(
    oci,
    "distroless_base",
    "distroless_python3",
)
This section declares the container images in which you can package your binaries. If you need CUDA binaries installed in your image, you can add an additional base image here. To keep builds hermetic, so that every build uses exactly the same image version and is reproducible, reference the image by its content digest (SHA256). To use an nvidia/cuda:12.8.1-base-ubuntu22.04 image, for example, modify the section to look like this:
# OCI images
bazel_dep(name = "rules_oci", version = "2.2.5")
oci = use_extension("@rules_oci//oci:extensions.bzl", "oci")
# Pull CUDA base image
oci.pull(
    name = "cuda_base",
    image = "docker.io/nvidia/cuda",
    digest = "sha256:e711c99333fdfe8ae1e677b4972be6c5021f0128a1d31f775c7e58d88921b6a9",
)
use_repo(
    oci,
    "cuda_base",  # <-- Add your pull target here
    "distroless_base",
    "distroless_python3",
)
Develop a service that uses the GPU
For this section, we will assume the goal is to create a service named gpunorm_service using cupy. This service will print GPU information, perform a simple L2 norm calculation on a small array using the GPU, and log the result. To keep this example concise we will not define an interface for this service here.
cupy is a NumPy-compatible Python library for GPU-accelerated computing.
For the most part, the steps for creating a GPU-enabled service are identical to those for a normal service, so only the differences are highlighted here.
Add an image with CUDA dependencies
First, add the official cupy image from Docker Hub to the base images in MODULE.bazel. This file should be in the top-level directory of the previously set up workspace:
# OCI images
bazel_dep(name = "rules_oci", version = "2.2.5")
oci = use_extension("@rules_oci//oci:extensions.bzl", "oci")
# Pull CUDA base image
oci.pull(
    name = "cupy_image",
    image = "docker.io/cupy/cupy",
    digest = "sha256:f85108d3b53de56f0e51fd936c2d3a0ed207063858c55afe855a8dc125abea20",
)
use_repo(
    oci,
    "cupy_image",  # <-- Add your pull target here
    "distroless_base",
    "distroless_python3",
)
Next, create a new services directory, and within it, a gpunorm folder.
All implementation files for the service will be located here.
Create service files
To check if we can actually use the GPU with a service, let's create a minimal
example service called gpunorm_service. For this, create the following folder structure in your workspace:
/bazel_ws/
├── MODULE.bazel
└── services/
└── gpunorm/
├── BUILD
├── gpunorm_main.py
├── gpunorm_service_manifest.textproto
└── requirements.txt
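One way to create this layout from the workspace root is with a couple of shell commands (the file contents are added in the following steps):

```shell
# Create the service directory and empty implementation files.
mkdir -p services/gpunorm
touch services/gpunorm/BUILD \
  services/gpunorm/gpunorm_main.py \
  services/gpunorm/gpunorm_service_manifest.textproto \
  services/gpunorm/requirements.txt
```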
Create a simple GPU-using binary
We can use cupy to create a small sample service script, gpunorm_main.py, that prints information about the available GPUs.
"""GPU example service
Minimal example of a service that imports cupy and uses it to print GPU
information and do a simple calculation on GPU.
"""
import logging
import sys
import cupy
def main():
logging.info(f'Starting GPU example service...')
# Show detected hardware and drivers
logging.info(f'Printing GPU configuration to stdout:')
cupy.show_config()
# Initialize array on GPU
gpu_arr = cupy.array([1, 2, 3, 4, 5])
logging.info(
f"Created array on device {gpu_arr.device}, doing simple calculation.")
# Do a simple L2 norm calculation
l2_norm = cupy.linalg.norm(gpu_arr)
logging.info(f"Calculated length of array {gpu_arr} as {l2_norm} on GPU!")
logging.info("GPU example service finished, exiting...")
if __name__ == '__main__':
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
main()
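Because cupy mirrors the NumPy API, the calculation above can be sanity-checked on the CPU. This sketch assumes NumPy is available locally (it is one of the pinned requirements of this service); `numpy.linalg.norm` is the CPU counterpart of `cupy.linalg.norm`:

```python
import math

import numpy as np

# Same data as in the service, but as a CPU array.
cpu_arr = np.array([1, 2, 3, 4, 5])

# Same call as cupy.linalg.norm, running on the CPU.
l2_norm = np.linalg.norm(cpu_arr)

# The L2 norm is sqrt(1 + 4 + 9 + 16 + 25) = sqrt(55).
assert math.isclose(l2_norm, math.sqrt(55))
print(l2_norm)
```

The printed value matches the result the service logs on the GPU.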
Use the new base image in the build rules
Create a build rule using the new cupy_image as a base image. We also need to add cupy as a pip dependency to the binary to be able to build it outside of the container. The resulting BUILD file is as follows:
load("@ai_intrinsic_sdks//bazel:python_oci_image.bzl", "python_oci_image")
load("@ai_intrinsic_sdks//intrinsic/assets/services/build_defs:services.bzl", "intrinsic_service")
load("@gpunorm_pip_deps//:requirements.bzl", "requirement")
load("@rules_python//python:defs.bzl", "py_binary")
py_binary(
    name = "gpunorm_service_bin",
    srcs = ["gpunorm_main.py"],
    main = "gpunorm_main.py",
    deps = [
        requirement("cupy-cuda12x"),
    ],
)

python_oci_image(
    name = "gpunorm_service_image",
    binary = ":gpunorm_service_bin",
    base = "@cupy_image",
)

intrinsic_service(
    name = "gpunorm_service",
    images = [":gpunorm_service_image.tar"],
    manifest = "gpunorm_service_manifest.textproto",
)
Enable GPU access in the service manifest
To enable GPU access for the container in which the service runs, add resource requirements to gpunorm_service_manifest.textproto. Specifically, add the following block to the image field:
settings {
  resource_requirements {
    limits {
      key: "nvidia.com/gpu"
      value: "1"
    }
  }
}
A complete example of the service manifest could look like this:
# proto-file: https://github.com/intrinsic-ai/sdk/blob/main/intrinsic/assets/services/proto/service_manifest.proto
# proto-message: intrinsic_proto.services.ServiceManifest
metadata {
  id {
    package: "com.example"
    name: "gpunorm_service"
  }
  vendor {
    display_name: "Intrinsic"
  }
  documentation {
    description: "A service that calculates an L2 norm on GPU."
  }
  display_name: "GPUNorm Service"
}
service_def {
  real_spec {
    image {
      archive_filename: "gpunorm_service_image.tar"
      settings {
        resource_requirements {
          limits {
            key: "nvidia.com/gpu"
            value: "1"
          }
        }
      }
    }
  }
  sim_spec {
    image {
      archive_filename: "gpunorm_service_image.tar"
      settings {
        resource_requirements {
          limits {
            key: "nvidia.com/gpu"
            value: "1"
          }
        }
      }
    }
  }
}
Add necessary pip requirements
You'll need to pin the pip requirements to specific versions in a requirements.txt file. Our binary requires only three dependencies:
cupy-cuda12x==13.4.1
fastrlock==0.8.3
numpy==2.3.0
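As an illustrative aside (not part of the service itself), a few lines of Python can confirm that every entry in such a file is pinned to an exact version with `==`, which is what keeps the container build reproducible:

```python
# The same three pinned requirements as in requirements.txt above.
requirements = """\
cupy-cuda12x==13.4.1
fastrlock==0.8.3
numpy==2.3.0
"""

for line in requirements.strip().splitlines():
    # partition splits on the first "==": version stays empty if unpinned.
    name, _, version = line.partition("==")
    assert version, f"{name} is not pinned to an exact version"
    print(f"{name} pinned to {version}")
```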
Build the service files
Build the service bundle by executing the following command in your bazel workspace folder:
bazel build services/gpunorm:gpunorm_service
Install the service in a solution
For questions about this and the following steps, consult the Add your service to a solution section in the Create your first service guide.
Install the service in a solution running on a GPU-VM or an IPC with a GPU by executing:
inctl asset install --org $INTRINSIC_ORGANIZATION $GPUNORM_SERVICE_BUNDLE
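The two variables in this command are placeholders you need to fill in yourself. As a sketch, with a hypothetical organization name and a hypothetical bundle path (the actual path depends on your workspace and build output):

```shell
# Hypothetical example values -- substitute your own organization and the
# bundle file produced by the bazel build step.
export INTRINSIC_ORGANIZATION=example_org
export GPUNORM_SERVICE_BUNDLE=bazel-bin/services/gpunorm/gpunorm_service.bundle.tar

# The install command then expands to:
echo "inctl asset install --org $INTRINSIC_ORGANIZATION $GPUNORM_SERVICE_BUNDLE"
```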
View service logs
Stream the logs of a service instance to your terminal using this command:
inctl logs --follow --service gpunorm_service --org $INTRINSIC_ORGANIZATION --solution $INTRINSIC_SOLUTION
When you add an instance of your service to the solution, you should see logs similar to the following:
INFO:root:Starting GPU example service...
INFO:root:Printing GPU configuration to stdout:
OS : Linux-6.12.29-x86_64-with-glibc2.35
Python Version : 3.11.11
CuPy Version : 13.4.1
CuPy Platform : NVIDIA CUDA
NumPy Version : 2.3.0
SciPy Version : None
Cython Build Version : 3.0.12
Cython Runtime Version : None
CUDA Root : /usr/local/cuda
nvcc PATH : /usr/local/cuda/bin/nvcc
CUDA Build Version : 12080
CUDA Driver Version : 12080
CUDA Runtime Version : 12080 (linked to CuPy) / 12020 (locally installed)
CUDA Extra Include Dirs : []
cuBLAS Version : (available)
cuFFT Version : 11008
cuRAND Version : 10303
cuSOLVER Version : (11, 5, 2)
cuSPARSE Version : (available)
NVRTC Version : (12, 2)
Thrust Version : 200800
CUB Build Version : 200800
Jitify Build Version : <unknown>
cuDNN Build Version : (not loaded; try `import cupy.cuda.cudnn` first)
cuDNN Version : (not loaded; try `import cupy.cuda.cudnn` first)
NCCL Build Version : (not loaded; try `import cupy.cuda.nccl` first)
NCCL Runtime Version : (not loaded; try `import cupy.cuda.nccl` first)
cuTENSOR Version : None
cuSPARSELt Build Version : None
Device 0 Name : NVIDIA RTX 4000 Ada Generation
Device 0 Compute Capability : 89
Device 0 PCI Bus ID : 0000:01:00.0
INFO:root:Created array on device <CUDA Device 0>, doing simple calculation.
INFO:root:Calculated length of array [1 2 3 4 5] as 7.416198487095663 on GPU!
INFO:root:GPU example service finished, exiting...