Step 1 - Confirm the Device is Detected

First, confirm that your Nvidia GPU is detected by the system:

$ sudo lspci | grep NVIDIA
01:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
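
If the command prints nothing, a broader, case-insensitive query can confirm whether any display-class device is visible on the PCI bus at all:

sudo lspci -nn | grep -i -E "vga|3d|display"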

Step 2 - Install Drivers

For Desktops:

List available drivers:

sudo ubuntu-drivers list

For Servers:

List GPU drivers:

sudo ubuntu-drivers list --gpgpu

You should see a list of drivers such as:

nvidia-driver-470
nvidia-driver-470-server
nvidia-driver-535
...

Automatic Installation:

Install the recommended driver automatically:

sudo ubuntu-drivers install

Manual Installation:

Specify the driver version:

sudo ubuntu-drivers install nvidia:535
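
On a server, the same manual install can use the --gpgpu flag and the -server branch of the driver (per Ubuntu's server documentation; adjust the version to match what ubuntu-drivers list --gpgpu reported):

sudo ubuntu-drivers install --gpgpu nvidia:535-server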

Reboot:

Reboot your system to apply the changes:

sudo reboot

Verify Installation:

Check the installed driver version:

nvidia-smi
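
As an additional check, the card name and driver version can be queried directly (illustrative; the exact output depends on your GPU and driver):

nvidia-smi --query-gpu=name,driver_version --format=csv,noheader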

Step 3 - Install Docker

Install Docker using Docker's convenience script:

curl -fsSL https://get.docker.com -o get-docker.sh
CHANNEL=stable sh get-docker.sh
rm get-docker.sh
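
Optionally, confirm that the Docker engine itself works before adding GPU support (this pulls the small hello-world test image):

sudo docker run --rm hello-world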

Step 4 - Install Nvidia Container Toolkit

Add the Nvidia Container Toolkit repository and install it:

Reference: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
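
You can confirm the toolkit components are installed before wiring them into Docker (nvidia-container-cli is pulled in as a dependency of the toolkit):

nvidia-container-cli --version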

Step 5 - Install nvidia-docker2

Reference: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.10.0/install-guide.html

Install nvidia-docker2, which registers the nvidia runtime with Docker:

sudo apt-get install -y nvidia-docker2
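
The package typically writes the nvidia runtime entry into /etc/docker/daemon.json; you can inspect it before restarting Docker:

cat /etc/docker/daemon.json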

Step 6 - Restart Docker

Restart the Docker service:

sudo systemctl restart docker
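
After the restart, Docker should list nvidia among its available runtimes:

sudo docker info | grep -i runtime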

Step 7 - Test the Installation

Verify that containers can access the GPU by running nvidia-smi inside a CUDA container:

sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
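
The --gpus flag also accepts a count or specific device indices if you only want to expose part of the hardware to a container (illustrative; the extra quoting around device=0 follows Docker's documentation):

sudo docker run --rm --gpus '"device=0"' nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi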

Step 8 - Burn the GPU

Clone the gpu-burn repository, build the Docker image, and run the GPU burn test:

git clone https://github.com/wilicc/gpu-burn
cd gpu-burn
sudo docker build -t gpu_burn .
sudo docker run --rm --gpus all gpu_burn

Expected output:

GPU 0: Tesla P4 (UUID: GPU-98102189-595e-4a64-3f32-3f0584ff9fe9)
Using compare file: compare.ptx
Burning for 60 seconds.
...
Tested 1 GPUs:
        GPU 0: OK
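
While the burn test is running, you can monitor temperature and utilization from a second terminal:

watch -n 1 nvidia-smi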

Step 9 - Share the GPU with Docker Compose

Create a docker-compose.yml file that exposes the GPU to a service:

version: '3.8'

services:
  cuda_app:
    image: your_image
    # Runtime provided by nvidia-docker2 (Step 5).
    runtime: nvidia
    # GPU reservation syntax understood by Docker Compose v2.
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

To find CUDA images, visit the NVIDIA CUDA repository on Docker Hub: https://hub.docker.com/r/nvidia/cuda
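
A quick way to exercise the same GPU reservation syntax is a throwaway compose file that just runs nvidia-smi and exits (the file name docker-compose.test.yml and service name smoke_test are illustrative; the docker compose plugin is installed by the get.docker.com script used earlier):

cat > docker-compose.test.yml <<'EOF'
services:
  smoke_test:
    image: nvidia/cuda:11.6.2-base-ubuntu20.04
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF
sudo docker compose -f docker-compose.test.yml up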

Example CUDA Application

Dockerfile:

FROM nvidia/cuda:11.6.2-devel-ubuntu20.04

# The devel image ships nvcc; build-essential provides the host compiler it needs.
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

COPY hello.cu /usr/src/hello.cu
WORKDIR /usr/src

RUN nvcc -o hello hello.cu

CMD ["./hello"]

hello.cu:

#include <cstdio>

// Kernel that prints from the device.
__global__ void helloFromGPU() {
    printf("Hello World from GPU!\n");
}

int main() {
    // Launch a single thread and wait for the device-side printf to flush.
    helloFromGPU<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}