Set up a Python virtual environment with GPU-accelerated TensorFlow

The main issue with a GPU-accelerated TensorFlow installation is the myriad of compatibility problems. The easiest approach proposed online is to use a Docker image, but the image didn't work for me and took up too much space, so I discarded that idea (mostly because of the space constraints); I will return to it later, during the production phase. The core constraint is that the TensorFlow version must be compatible with the CUDA version installed.

TensorFlow 2.3.1 needs CUDA 10.1 and an NVIDIA driver at version 450 or above, preferably nvidia-455.
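Once TensorFlow is installed (step 6 below), you can double-check which CUDA and cuDNN versions your particular build expects with tf.sysconfig.get_build_info(), available from TensorFlow 2.3 onwards. The exact keys vary by build, but GPU builds include cuda_version and cudnn_version; for 2.3.1 the CUDA version should come back as 10.1.

(tf-env) $ python
>>> import tensorflow as tf
>>> info = tf.sysconfig.get_build_info()
>>> info['cuda_version'], info['cudnn_version']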

These are the steps to get a working GPU-accelerated TensorFlow environment on a Debian-based system.

1. Purge nvidia drivers

sudo apt remove --purge "*nvidia*"

2. Install latest Nvidia drivers

sudo apt install nvidia-driver-455

Check your GPU and CUDA version

nvidia-smi

You can skip this step if you plan to install the older nvidia-450 drivers in step #4 below. Note that the CUDA version shown by nvidia-smi is the highest version the driver supports, not necessarily the CUDA toolkit that is actually installed.

3. Create a virtual environment to contain the TensorFlow installation

pip install virtualenv
cd ~
python3 -m venv tf-env
source tf-env/bin/activate

Replace tf-env with a name of your choice. This will create a directory structure containing all the Python packages, so it is best to create it on a drive with plenty of free space, although it is easy enough to move later.
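To confirm you are actually inside the new environment, check which prefix the interpreter reports; it should point at the tf-env directory created above (in this example, under your home directory):

(tf-env) $ python -c "import sys; print(sys.prefix)"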

4. Install CUDA following the recommendations from tensorflow website

Trying to install CUDA independently from the NVIDIA website will break things in all possible ways. I have tried several combinations (CUDA 11.1 with the TensorFlow nightly build, CUDA 10.1 with stable TensorFlow) and something always broke. The best method is to follow the install instructions on the TensorFlow website to the letter.

https://www.tensorflow.org/install/gpu

The only exception is that I didn’t install the older nvidia-450 drivers. I kept the newer nvidia-455 driver.
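After running the commands from the TensorFlow page, a quick way to see which driver, CUDA and cuDNN packages actually ended up installed (assuming the apt-based install from that page) is to filter the package list:

$ dpkg -l | grep -E "cuda|libcudnn|nvidia-driver"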

5. Make sure all links are working

Make sure there's a symlink from cuda to the actual CUDA installation in /usr/local, and that LD_LIBRARY_PATH points at its lib64 directory

$ ls -l /usr/local/
lrwxrwxrwx  1 root root 9 Oct  9 17:21 cuda -> cuda-11.1
drwxr-xr-x 14 root root 4096 Oct  9 17:21 cuda-11.1

$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64
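The export above only lasts for the current shell. A simple option (my own habit, not something the TensorFlow instructions require) is to append it to ~/.bashrc so it survives new shells, preserving any existing value:

$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}' >> ~/.bashrc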

6. Install tensorflow

Activate the virtualenv if you are not in it already

$ source tf-env/bin/activate

And then install tensorflow

(tf-env) $ pip install tensorflow

If you have already installed the nightly (unstable) version while experimenting in step #4 above, it is better to uninstall it first with

(tf-env) $ pip uninstall tf-nightly
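If you want to reproduce the exact setup in this post rather than pull whatever the latest stable release is, you can pin the version explicitly:

(tf-env) $ pip install tensorflow==2.3.1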

7. Test tensorflow

(tf-env) $ python
>>> import tensorflow as tf
2020-10-09 18:24:57.371340: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
>>> tf.__version__
'2.3.1'
>>> tf.config.list_physical_devices()
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

All seems to be running OK
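Listing the devices only proves that TensorFlow can see the GPU. A minimal sketch to confirm an op actually executes there (assuming a single GPU at index 0):

>>> import tensorflow as tf
>>> tf.debugging.set_log_device_placement(True)
>>> with tf.device('/GPU:0'):
...     a = tf.random.normal([1000, 1000])
...     b = tf.random.normal([1000, 1000])
...     c = tf.matmul(a, b)
...
>>> c.device

With device placement logging enabled, the MatMul op is reported as it runs, and c.device should end in GPU:0.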

8. Set up the virtualenv kernel for Jupyter

While in the virtual environment, install ipykernel

(tf-env) $ pip install ipykernel

Add current virtual environment to Jupyter 

(tf-env) $ python -m ipykernel install --user --name=tf-env

tf-env will show up in the list of Jupyter kernels. The kernel name can be anything; I kept it the same as the environment name for consistency.

You can find the Jupyter kernels in ~/.local/share/jupyter/kernels
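To see which kernels Jupyter knows about, or to remove one later, use the kernelspec subcommand (on older jupyter_client versions the removal subcommand is remove rather than uninstall):

$ jupyter kernelspec list
$ jupyter kernelspec uninstall tf-env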

Test TensorFlow GPU support in Jupyter

(tf-env) $ jupyter notebook

import tensorflow as tf
tf.config.experimental.list_physical_devices()
tf.config.list_physical_devices()
tf.test.gpu_device_name()

Note: TensorFlow GPU detection in Jupyter will only work when Jupyter is run from within the virtual environment. Running Jupyter outside the virtualenv will not work even if the virtualenv kernel (tf-env) is chosen over the regular system Python kernel.
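A quick sanity check from a notebook cell is to print which interpreter and library path the kernel is using; the interpreter should point inside tf-env, and LD_LIBRARY_PATH should include the CUDA lib64 directory. The environment is inherited from the Jupyter process, which is probably why Jupyter itself has to be started from the activated virtualenv.

import sys, os
print(sys.executable)                     # should point at ~/tf-env/bin/python
print(os.environ.get("LD_LIBRARY_PATH"))  # should include /usr/local/cuda/lib64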
