Set up a Python virtual environment with tensorflow-gpu

The main issue with a GPU-accelerated Tensorflow installation is the myriad of compatibility problems. The easiest way proposed online is to use a Docker image. However, the Docker image didn't work for me and took up too much space, so I discarded that idea, mostly because of the space constraints; I will return to it later during the production phase. The core constraint is that the tensorflow version must be compatible with the CUDA version installed.

Tensorflow 2.3.1 needs CUDA 10.1 or above and an NVIDIA driver version 450 or above, preferably nvidia-455.

These are the steps to get a working GPU-accelerated tensorflow environment on a Debian-based system.

1. Purge nvidia drivers

sudo apt remove --purge "*nvidia*"

2. Install latest Nvidia drivers

sudo apt install nvidia-driver-455

Check your GPU and CUDA version

nvidia-smi

Or you can skip this step if you plan to install the older nvidia-450 drivers in step #4 below.

3. Create a virtual environment to contain tensorflow

pip install virtualenv
cd ~
python3 -m venv tf-env
source tf-env/bin/activate

Replace tf-env with a name of your choice. This creates a directory tree that will contain all the Python packages, so it is best to create it on a drive with plenty of free space, although it is also easy to move later.
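Once the environment is activated, a quick sanity check (not part of the setup itself) is to confirm that the interpreter in use really is the one inside tf-env:

(tf-env) $ python
>>> import sys
>>> sys.prefix                        # should point inside the tf-env directory
>>> sys.prefix != sys.base_prefix     # True when running inside a virtual environment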

4. Install CUDA following the recommendations from the tensorflow website

Trying to install CUDA independently from the NVIDIA website will break things in all possible ways. I have tried several combinations (CUDA 11.1 with tensorflow nightly, CUDA 10.1 with tensorflow stable) and something always breaks. The best method is to follow the install instructions on the tensorflow website to the letter.

https://www.tensorflow.org/install/gpu

The only exception is that I didn’t install the older nvidia-450 drivers. I kept the newer nvidia-455 driver.

5. Make sure all links are working

Make sure there’s a link from cuda to the actual CUDA installation in /usr/local

$ ls -l /usr/local/
lrwxrwxrwx  1 root root 9 Oct  9 17:21 cuda -> cuda-11.1
drwxr-xr-x 14 root root 4096 Oct  9 17:21 cuda-11.1

$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64
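To double-check that the CUDA runtime can actually be loaded from that path, a small check with ctypes can be used. The soname libcudart.so.10.1 below matches the library the tensorflow test loads later; adjust the version suffix to whatever sits in your /usr/local/cuda/lib64.

$ python
>>> import ctypes
>>> ctypes.CDLL("libcudart.so.10.1")   # raises OSError if the runtime cannot be found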

6. Install tensorflow

Start virtualenv if not in it already

$ source tf-env/bin/activate

And then install tensorflow

(tf-env) $ pip install tensorflow

If you have already installed the nightly (unstable) version from #4 above, it is better to uninstall it first with

(tf-env) $ pip uninstall tf-nightly

7. Test tensorflow

(tf-env) $ python
>>> import tensorflow as tf
2020-10-09 18:24:57.371340: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
>>> tf.__version__
'2.3.1'
>>> tf.config.list_physical_devices()
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

All seems to be running OK
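For a slightly more explicit check, the calls below (all part of the public tf.test / tf.config API) report whether the installed wheel was built with CUDA and which GPUs are visible:

(tf-env) $ python
>>> import tensorflow as tf
>>> tf.test.is_built_with_cuda()              # True if the wheel was compiled with CUDA support
>>> tf.config.list_physical_devices('GPU')    # an empty list means tensorflow fell back to CPU
>>> tf.test.gpu_device_name()                 # e.g. '/device:GPU:0', empty string if no GPU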

8. Set up the virtualenv kernel in Jupyter

While in the virtual environment, install ipykernel

(tf-env) $ pip install ipykernel

Add the current virtual environment to Jupyter

(tf-env) $ python -m ipykernel install --user --name=tf-env

tf-env will show up in the list of Jupyter kernels. The name for the Jupyter kernel can be anything. I kept it the same for consistency.

You can find the Jupyter kernels in ~/.local/share/jupyter/kernels

Test tensorflow gpu support in jupyter

(tf-env) $ jupyter notebook

import tensorflow as tf
tf.config.experimental.list_physical_devices()
tf.config.list_physical_devices()
tf.test.gpu_device_name()

Note: The tensorflow GPU detection in Jupyter will only work when Jupyter is run from within the virtual environment. Running Jupyter outside the virtualenv will not work even if the virtualenv kernel (tf-env) is chosen over the regular system Python kernel.
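A quick way to debug this from inside a notebook cell is to print which interpreter the kernel is running and whether the CUDA library path made it into the environment (my assumption being that a missing LD_LIBRARY_PATH is the usual reason GPU detection fails here):

import os
import sys

print(sys.executable)                       # should live under the tf-env directory
print(os.environ.get('LD_LIBRARY_PATH'))    # None if the export was not inherited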

Quick intro to MySQL

Installation

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install mysql-server
sudo mysql_secure_installation
sudo mysql_install_db

Adding databases

Login as root

$ mysql -u root -p
mysql> CREATE DATABASE TEST;

Adding user

mysql> GRANT ALL ON TEST.* TO lion@localhost IDENTIFIED BY 'temppass1';
mysql> FLUSH PRIVILEGES;
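As a quick check that the new user works, the database can also be queried from Python. A minimal sketch, assuming the mysql-connector-python package is installed (pip install mysql-connector-python) and the lion / temppass1 credentials created above:

import mysql.connector

# Connect as the user created above, against the TEST database
conn = mysql.connector.connect(host='localhost', user='lion',
                               password='temppass1', database='TEST')
cur = conn.cursor()
cur.execute('SELECT VERSION()')
print(cur.fetchone())
conn.close()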

Display specific columns from a table

mysql> SELECT Continent FROM Country;

+---------------+
| Continent     |
+---------------+
| North America |
| Asia          |
| Africa        |
| North America |
| Europe        |

Display specific columns from a table but make rows unique

mysql> SELECT DISTINCT Continent FROM Country;

Display COLUMN names

How would I know which columns to ask for if the list is long, scrolls by too fast, and I cannot see the column names? Every column is first defined in SQL; only then is data added to it.

mysql> SHOW COLUMNS FROM Country;

Select specific number of rows

1. Select first 10 rows: LIMIT <no of rows>

SELECT NAME FROM country LIMIT 10;

2. Select rows 4-13: LIMIT <offset>, <no of rows>

SELECT NAME FROM country LIMIT 3,10;

Table data command list

Open table

DATABASES are like Excel files
Tables are like Sheets in each file

mysql> SHOW DATABASES;
mysql> USE JOURNAL;
mysql> SHOW TABLES;
mysql> SELECT * FROM FITNESS;

Create a table

mysql> CREATE TABLE DAILY_JOURNAL (Date DATE NOT NULL, 
Entry VARCHAR(200), Feeling VARCHAR(20), needsimprovement VARCHAR(40) ) ;

Add primary key

mysql> ALTER TABLE DAILY_JOURNAL ADD PRIMARY KEY (Date);

Delete data

mysql> DELETE FROM DAILY_JOURNAL WHERE needsimprovement = "goal setting";

Insert data

mysql> INSERT INTO DAILY_JOURNAL (Date, Entry, needsimprovement) VALUES("2016-12-14","1. ordered washing machine", "goal setting");

Add a new column

mysql> ALTER TABLE DAILY_JOURNAL ADD Learnt VARCHAR(40);

Rename column

Changing column name, renaming a column, alter column name

mysql> ALTER TABLE DAILY_JOURNAL 
CHANGE gratefulfor GratefulFor VARCHAR(40);

Giving just the new column name is not enough; the column type has to be specified again.

Edit row data

mysql> UPDATE DAILY_JOURNAL SET Learnt='1. getopts 2. REGEX 3. MYSQL' 
WHERE Date='2016-12-15';

Plot a grid of plots in python by iterating over the subplots

In this article, we will make a grid of plots in python by iterating over the subplot axes and columns of a pandas dataframe.

Python has a versatile plotting framework in Matplotlib, but the documentation seems extremely poor (or I was not able to find the right docs). It took me a fair amount of time to figure out how to send plots of dataframe columns to individual subplots while rotating the x labels for each subplot.

Usage

Plotting subplots in Matplotlib begins with the plt.subplots() call.

import pandas as pd
import matplotlib.pyplot as plt


fig, axs = plt.subplots(nrows=2, ncols=2)

We can omit the nrows and ncols arguments, but I kept them for clarity. This call generates a grid of 2×2 subplots and returns the overall figure (the object which contains all the plots inside it) and the individual subplots as a 2D array of axes. The subplots can be accessed using axs[0,0], axs[0,1], axs[1,0], and axs[1,1]. Or they can be unpacked during the assignment as follows.

import pandas as pd
import matplotlib.pyplot as plt


fig, ((ax1, ax2),(ax3, ax4)) = plt.subplots(nrows=2, ncols=2)

When we have 1 row and 4 columns instead of 2 rows and 2 columns, it has to be unpacked as follows.

import pandas as pd
import matplotlib.pyplot as plt


fig, (ax1, ax2, ax3, ax4) = plt.subplots(nrows=1, ncols=4)

Flattening the grid of subplots

We, however, do not want to unpack the subplots individually. Instead, we would like to flatten the array of subplots and iterate over it rather than assigning each subplot to a variable. The array is flattened with the flatten() method.

axs.flatten()

We identify 4 columns of the dataframe we want to plot and save the column names in a list we can iterate over. We then flatten the subplot array and pack it (zip) with the column names; the flattened array can be zipped directly, or it can be converted to a list first.

import pandas as pd
import matplotlib.pyplot as plt


profiles_file = 'data.csv'
df = pd.read_csv(profiles_file)

cols_to_plot = ['age', 'drinking', 'exercise', 'smoking']

fig, axs = plt.subplots(nrows=2, ncols=2)
fig.set_size_inches(20, 10)
fig.subplots_adjust(wspace=0.2)
fig.subplots_adjust(hspace=0.5)

for col, ax in zip(cols_to_plot, axs.flatten()):
    dftemp = df[col].value_counts()             # counts per category in this column
    ax.bar(dftemp.index, list(dftemp))          # bar chart on this subplot
    ax.set_title(col)
    ax.tick_params(axis='x', labelrotation=30)  # rotate the x labels of this subplot

plt.show()

As we iterate over each subplot axes and the column name zipped with it, we draw on each subplot with the ax.bar() command, supplying the x and y values manually. I tried plotting with pandas' df.plot.bar() and assigning the returned object to the ax; it doesn't work. The x values for ax.bar() are the index of the value_counts() result (dftemp.index) and the y values are the counts in that Series (converted to a list, as ax.bar() did not accept the pd.Series directly).
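For reference, pandas can also draw directly onto an existing subplot when the target axes is passed through the ax keyword argument rather than captured from the return value. A minimal sketch of the same loop under that assumption, reusing df, cols_to_plot and axs from the snippet above:

import matplotlib.pyplot as plt


# Assumes df, cols_to_plot, fig and axs already exist as in the snippet above
for col, ax in zip(cols_to_plot, axs.flatten()):
    df[col].value_counts().plot.bar(ax=ax, rot=30)  # pandas draws onto the existing axes
    ax.set_title(col)

plt.show()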

Rotate x-axis of subplots

The x-axis for each subplot is rotated using

ax.tick_params(axis='x', labelrotation=30)

Identifying delimiter of a CSV file

The following one-liner can be used to extract the delimiter of a CSV file. It does not work on TAB-separated files; it only works on delimited files whose field separator is not whitespace.

$ head -n1 bookmerged.csv  | tr -d '[a-z][A-Z][0-9]' | \
tr -d '"' | sed 's/.\{1\}/&\n/g' | sort -r | uniq -c | \
sort -nr | tr -s " " | cut -d" " -f3 | head -n1

This command generates a list of special characters and, from that list, selects the character with the highest frequency of occurrence, which should be the delimiter of the file. It will fail when some other special character occurs more often than the delimiter. An explanation of the pipeline follows.

After head grabs the header line, the first two tr (translate) commands remove all letters, numbers, and quotes. This leaves a bunch of special characters, among which the one with the highest frequency of occurrence is most likely the field delimiter.

,,,,,   , ,, , , ,,, ,, , ,/ , , , 

The sed command introduces a newline after every character, effectively putting every single character on its own line. The pattern .\{1\} matches exactly one character at a time (in basic regular expressions the braces of the interval {1} must be escaped), and & in the replacement stands for the matched character, so each character is replaced with itself followed by a newline. We can also use \0 instead of &. sort -r | uniq -c | sort -nr then generates the list of characters in descending order of prevalence.

     20 ,
     14  
      1 /
      1 

The most prevalent character appears at the top of this list. tr -s " " squeezes the multiple spaces into one, and the cut command splits the list on spaces and selects the third column, which is the delimiter.
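For comparison, Python's standard library can often guess the delimiter as well. A short sketch using csv.Sniffer on the same bookmerged.csv, with the same caveat that unusual files can confuse the heuristic:

import csv

# Sniff the dialect (including the delimiter) from the first few kilobytes of the file
with open('bookmerged.csv', newline='') as f:
    sample = f.read(4096)

dialect = csv.Sniffer().sniff(sample)
print(repr(dialect.delimiter))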