Before starting

You will need Docker with the NVIDIA Container Toolkit installed.

Install Docker using the following commands:

curl -fsSL get.docker.com -o get-docker.sh
CHANNEL=stable sh get-docker.sh
rm get-docker.sh
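
If you want a quick sanity check that Docker itself works before adding GPU support:

# Verify the Docker daemon is up and can run containers
sudo docker --version
sudo docker run --rm hello-world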

Add the Nvidia Container Toolkit repository and install it:

Reference: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
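
Note that newer versions of the toolkit can also register the Docker runtime themselves via nvidia-ctk, which edits /etc/docker/daemon.json for you; the nvidia-docker2 package installed below achieves the same result:

# Alternative to nvidia-docker2: let the toolkit register the runtime itself
sudo nvidia-ctk runtime configure --runtime=docker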

Then, install nvidia-docker2:

Reference: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.10.0/install-guide.html

sudo apt-get install -y nvidia-docker2
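
The nvidia-docker2 package registers the nvidia runtime in Docker's daemon configuration; you can peek at the file to confirm (exact contents vary by version):

# You should see an "nvidia" entry under "runtimes" after installation
cat /etc/docker/daemon.json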

Restart the Docker service:

sudo systemctl restart docker
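
Before running the GPU test below, you can confirm that Docker picked up the new runtime:

# "Runtimes:" should now list nvidia alongside runc
sudo docker info | grep -i runtime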

Verify the GPU setup in Docker:

sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

Problem

As this thread mentions, we cannot directly use the GPU in Docker Swarm:

https://forums.docker.com/t/using-nvidia-gpu-with-docker-swarm-started-by-docker-compose-file/106688

version: '3.7'
services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu, utility]

If you deploy it, Docker will complain that devices is not allowed in swarm mode:

> docker stack deploy -c docker-compose.yml gputest
services.test.deploy.resources.reservations Additional property devices is not allowed

Solution

However, I recently found a trick that allows you to run a GPU-enabled container alongside Docker Swarm:

First, I created an attachable overlay network so that my other containers managed by Docker Swarm can talk to the ollama container:

function create_network() {
    network_name=$1
    subnet=$2
    # Collect the names of all existing Docker networks
    known_networks=$(sudo docker network ls --format '{{.Name}}')
    # Only create the network if its name is not already present (substring match)
    if [[ $known_networks != *"$network_name"* ]]; then
        networkId=$(sudo docker network create --driver overlay --attachable --subnet $subnet --scope swarm $network_name)
        echo "Network $network_name created with id $networkId"
    fi
}

create_network proxy_app 10.234.0.0/16
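
You can verify that the network came up as expected; this should print overlay, attachable=true, and 10.234.0.0/16:

# Inspect the overlay network created above
sudo docker network inspect proxy_app \
    --format 'driver={{.Driver}} attachable={{.Attachable}} subnet={{range .IPAM.Config}}{{.Subnet}}{{end}}'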

Then I deploy the following docker-compose file with Docker Swarm:

(I used ollama_warmup to demonstrate how other containers interact with this ollama instance. You can obviously replace it with your own containers.)

version: "3.6"

services:
  ollama_starter:
    image: hub.aiursoft.cn/aiursoft/internalimages/ubuntu-with-docker:latest
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    # Kill any existing ollama_server, then start a fresh one with GPU access
    entrypoint:
      - "/bin/sh"
      - "-c"
      - |
          echo 'Starter is starting ollama...' && \
          (docker kill ollama_server || true) && \
          docker run \
            --tty \
            --rm \
            --gpus=all \
            --network proxy_app \
            --name ollama_server \
            -v /swarm-vol/ollama/data:/root/.ollama \
            -e OLLAMA_HOST=0.0.0.0 \
            -e OLLAMA_KEEP_ALIVE=200m \
            -e OLLAMA_FLASH_ATTENTION=1 \
            -e OLLAMA_KV_CACHE_TYPE=q8_0 \
            -e GIN_MODE=release \
            hub.aiursoft.cn/ollama/ollama:latest

  ollama_warmup:
    depends_on:
      - ollama_starter
    image: hub.aiursoft.cn/alpine
    networks: 
      - proxy_app
    entrypoint:
      - "/bin/sh"
      - "-c"
      - |
          apk add curl && \
          sleep 40 && \
          while true; do \
            curl -v http://ollama_server:11434/api/generate -d '{"model": "deepseek-r1:32b"}'; \
            sleep 900; \
          done
    deploy:
      resources:
        limits:
          memory: 128M
      labels:
        swarmpit.service.deployment.autoredeploy: 'true'

networks:
  proxy_app:
    external: true

volumes:
  # Note: this named volume is not referenced by the services above; the
  # ollama_server container bind-mounts /swarm-vol/ollama/data directly.
  ollama-data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /swarm-vol/ollama/data
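
Deploying it is the same docker stack deploy call as before; the stack name (ollama here) is arbitrary:

sudo docker stack deploy -c docker-compose.yml ollama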

And it worked!

Now I am running Ollama with DeepSeek in Docker, and the GPU is fully supported!
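
If you want to check the result yourself, any throwaway container attached to the proxy_app network can reach the server by container name (curlimages/curl is just one convenient image that uses curl as its entrypoint; /api/tags lists the installed models):

# Query the Ollama API from inside the overlay network
sudo docker run --rm --network proxy_app curlimages/curl \
    -s http://ollama_server:11434/api/tags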