Step 1 - Confirm the GPU Is Detected
First, confirm that your Nvidia GPU is detected by the system:
$ sudo lspci | grep NVIDIA
01:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
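To also see which kernel driver (if any) is currently bound to the card, lspci can report that as well:
sudo lspci -nnk | grep -iA3 nvidia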
Step 2 - Install Drivers
For Desktops:
List available drivers:
sudo ubuntu-drivers list
For Servers:
List GPU drivers:
sudo ubuntu-drivers list --gpgpu
You should see a list of drivers such as:
nvidia-driver-470
nvidia-driver-470-server
nvidia-driver-535
...
Automatic Installation:
sudo ubuntu-drivers install
Manual Installation:
Specify the driver version:
sudo ubuntu-drivers install nvidia:535
Reboot:
Reboot your system to apply the changes:
sudo reboot
Verify Installation:
Check the installed driver version:
nvidia-smi
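If nvidia-smi is missing or errors out after the reboot, it is worth confirming that the kernel module actually loaded, for example:
cat /proc/driver/nvidia/version
lsmod | grep nvidia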
Step 3 - Install Docker
Install Docker using the following commands:
curl -fsSL https://get.docker.com -o get-docker.sh
CHANNEL=stable sh get-docker.sh
rm get-docker.sh
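As a quick sanity check before moving on, confirm that Docker itself works, e.g.:
docker --version
sudo docker run --rm hello-world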
Step 4 - Install Nvidia Container Toolkit
Add the Nvidia Container Toolkit repository and install it:
Reference: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
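Recent versions of the install guide linked above then register the NVIDIA runtime with Docker using nvidia-ctk (the nvidia-docker2 package installed in Step 5 sets up the same configuration, so treat this as an alternative):
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker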
Step 5 - Install nvidia-docker2
Reference: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.10.0/install-guide.html
Install nvidia-docker2:
sudo apt-get install -y nvidia-docker2
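After installation, the package registers the NVIDIA runtime in /etc/docker/daemon.json; the file should look roughly like this:
cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}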
Step 6 - Restart Docker
Restart the Docker service:
sudo systemctl restart docker
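You can confirm that the NVIDIA runtime is now visible to the daemon, e.g.:
sudo docker info | grep -i runtimes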
Step 7 - Test the Installation
Verify the GPU setup in Docker:
sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Step 8 - Run a GPU Burn Test
Clone the gpu-burn repository, build the Docker image, and run the GPU burn test:
git clone https://github.com/wilicc/gpu-burn
cd gpu-burn
sudo docker build -t gpu_burn .
sudo docker run --rm --gpus all gpu_burn
Expected output:
GPU 0: Tesla P4 (UUID: GPU-98102189-595e-4a64-3f32-3f0584ff9fe9)
Using compare file: compare.ptx
Burning for 60 seconds.
...
Tested 1 GPUs:
GPU 0: OK
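The image burns for 60 seconds by default. The gpu_burn binary takes the duration in seconds as its argument, so assuming the upstream image layout (binary in the container's working directory) you can override the command for a longer run:
sudo docker run --rm --gpus all gpu_burn ./gpu_burn 300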
Step 9 - Share the GPU with Docker Compose
Create a docker-compose.yml file to share the GPU:
version: '3.8'
services:
  cuda_app:
    image: your_image
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
To find CUDA images, visit NVIDIA CUDA Docker Hub.
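As a minimal check, if image points at a CUDA base image (for example nvidia/cuda:11.6.2-base-ubuntu20.04), you can run nvidia-smi through Compose against the cuda_app service defined above:
sudo docker compose run --rm cuda_app nvidia-smi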
Example CUDA Application
Dockerfile:
# Use the -devel image, which ships nvcc; the full "cuda" apt metapackage is not needed inside the container
FROM nvidia/cuda:11.6.2-devel-ubuntu20.04
RUN apt-get update && apt-get install -y build-essential
COPY hello.cu /usr/src/hello.cu
WORKDIR /usr/src
RUN nvcc -o hello hello.cu
CMD ["./hello"]
hello.cu:
#include <cstdio>

// Kernel that runs on the GPU and prints from device code
__global__ void helloFromGPU() {
    printf("Hello World from GPU!\n");
}

int main() {
    helloFromGPU<<<1, 1>>>();   // launch 1 block with 1 thread
    cudaDeviceSynchronize();    // wait for the kernel (and its printf) to complete
    return 0;
}
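Build and run the example with the Dockerfile above (the cuda_hello tag is only illustrative); it should print the greeting from the device:
sudo docker build -t cuda_hello .
sudo docker run --rm --gpus all cuda_hello
Hello World from GPU!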
This blog post explains how to set up an Nvidia environment for Docker on Ubuntu, with detailed steps, code examples, and reference links.
What the post does well:
The core idea is to help readers set up an Nvidia environment for Docker on Ubuntu so that Docker workloads can take advantage of Nvidia GPU compute. This is a genuinely useful and practical goal, especially for anyone running GPU-accelerated machine learning and deep learning tasks.
I encourage the author to keep writing, and to consider adding more examples and case studies on using Nvidia GPUs for machine learning and deep learning. The code examples could also be more detailed, with explanations and comments to help readers understand and apply them.
Overall, this is a useful guide: it provides detailed steps and code examples for setting up an Nvidia environment for Docker on Ubuntu. I hope the author continues to share more knowledge and experience on GPU computing and machine learning.