Analysing and extracting results from data stored in CSV files is made simple with spreadsheet programs, which can import them easily and support a wide variety of formats. For example, here is the exported data from my HSA contribution file.
| 12/01/2019 | 11/30/2019 | Interest | None | $0.09 |
| 11/12/2019 | 11/12/2019 | CIGNA HEALTHCARE 68 | Debit Card | -$25.00 |
| 11/04/2019 | 11/04/2019 | Fee Distribution | EFT | -$1.75 |
| 11/01/2019 | 10/31/2019 | Interest | None | $0.09 |
| 10/02/2019 | 10/02/2019 | Fee Distribution | EFT | -$1.75 |
| 10/01/2019 | 09/30/2019 | Interest | None | $0.09 |
| 09/04/2019 | 09/04/2019 | Fee Distribution | EFT | -$1.75 |
| 09/01/2019 | 08/31/2019 | Interest | None | $0.09 |
| 08/03/2019 | 08/02/2019 | Fee Distribution | EFT | -$1.75 |
| 08/01/2019 | 07/31/2019 | Interest | None | $0.09 |
| 07/05/2019 | 07/05/2019 | CIGNA HEALTHCARE OF AR | Debit Card | -$25.00 |
| 07/02/2019 | 07/02/2019 | Fee Distribution | EFT | -$1.75 |
| 07/01/2019 | 06/30/2019 | Interest | None | $0.09 |
| 06/04/2019 | 06/04/2019 | Fee Distribution | EFT | -$1.75 |
| 06/03/2019 | 06/03/2019 | CIGNA HEALTHCARE OF AR | Debit Card | -$25.00 |
| 06/01/2019 | 05/31/2019 | Interest | None | $0.09 |
| 05/31/2019 | 05/31/2019 | CVS/PHARMACY #07078 | Debit Card | -$15.00 |
| 05/13/2019 | 05/13/2019 | CIGNA HLTHCRE AZ#HCFS | Debit Card | -$65.80 |
| 05/02/2019 | 05/02/2019 | Fee Distribution | EFT | -$1.75 |
| 05/01/2019 | 04/30/2019 | Interest | None | $0.09 |
| 04/02/2019 | 04/02/2019 | Fee Distribution | EFT | -$1.75 |
| 04/01/2019 | 03/31/2019 | Interest | None | $0.10 |
| 03/21/2019 | 03/21/2019 | Adjustment Contribution | EFT | $1.75 |
| 03/04/2019 | 03/04/2019 | Fee Distribution | EFT | -$1.75 |
| 03/01/2019 | 02/28/2019 | Interest | None | $0.18 |
| 02/25/2019 | 02/25/2019 | CVS/PHARMACY #07078 | Debit Card | -$15.00 |
To get only the 2019 activity in a spreadsheet program, we would need to set up filters and filter out every year except 2019. That is a lot of mouse events. On the command line we can use a single command: grep.
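A minimal, self-contained sketch of that single command, assuming the exported file is named hsa.csv (the sample rows below are abbreviated from the table above):

```shell
# Build a tiny sample file so the example is self-contained;
# the real hsa.csv would be the full export shown above.
printf '%s\n' \
  '12/01/2019,11/30/2019,Interest,None,$0.09' \
  '02/25/2019,02/25/2019,CVS/PHARMACY #07078,Debit Card,-$15.00' \
  '12/15/2018,12/15/2018,Interest,None,$0.08' > hsa.csv

# Keep only the lines containing "2019"
grep 2019 hsa.csv
```

Note that grep matches the string 2019 anywhere in the line, not just in the date columns, which is good enough for this file.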

Filtering for specific column values and summing a column
Now, let’s say we want to find the total expenditure at PHARMACY. In a spreadsheet we would have to filter again by PHARMACY and then sum the filtered rows. While this is not too difficult, autosum can end up including hidden rows too, leading to erroneous results. On the command line we simply grep again.
$ cat hsa.csv | grep 2019 | grep PHARMACY 
Summing the last column involves extracting that column first. A single column, or a list of columns, can be extracted using the cut command. The -d switch defines the delimiter (the field separator), which in the case of CSV files is the comma “,”. The cut command thus cuts the file vertically along the delimiter. -f lists the columns to be extracted, either as a single number for one column or as a range such as 2-5 to extract columns 2 to 5 (for more details see the man page of cut).
$ cat hsa.csv | grep 2019 | grep PHARMACY | cut -d"," -f5

Next we extract the numbers from that column. While the numbers can be extracted in a variety of ways, including sed and awk (which many people prefer), I will use the tr (translate) command. tr is primarily used to replace characters, but it can also remove them.
$ cat hsa.csv | grep 2019 | grep PHARMACY | \
> cut -d"," -f5 | tr -cd '[:digit:].\n'
15.00
15.00
15.00
tr -c selects the complement of the set consisting of [:digit:] (0-9), the period (.), and the newline (\n), i.e. everything except digits, periods, and newlines. -d then deletes those complement characters.
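The effect of -c and -d together is easiest to see on a single field in isolation:

```shell
# -c complements the set (digits, period, newline); -d deletes the
# complement, so only the numeric part of the field survives.
echo '-$15.00' | tr -cd '[:digit:].\n'
# -> 15.00
```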
Now we just have to sum those numbers.
$ cat hsa.csv | grep 2019 | grep PHARMACY | \
> cut -d"," -f5 | tr -cd '[:digit:].\n' | paste -sd+ | bc
45.00
paste -s joins all input lines into a single line, using + as the separator; the resulting expression is then fed to the terminal calculator bc, which performs the addition.
Now, this is good for learning, but if we do not feel like repeating these steps over and over, I have written a script that auto-detects the column to be summed and sums it based on a given filter. The script below can also be downloaded from my GitHub page.
#!/bin/bash
# sumcolumn.sh v1.5 by Saugata copyright 07/07/2020
# autosum price column of a csv after filtering for a specific column value

usage="
Usage:   ./sumcolumn.sh [filename] [pattern] [-h help]
Example: ./sumcolumn.sh hsa.csv PHARMACY
"

# ---- GETOPTS ----
# no args: print usage and exit
if [[ $# -eq 0 ]]; then
    echo "$usage"
    exit
fi

while getopts h option; do
    case "${option}" in
        h) echo "$usage"
           exit ;;
    esac
done

# check that the first argument is a file which exists
if [[ ! -f $1 ]]; then
    echo "File doesn't exist"
    exit
fi
echo "Filename: " "$1"

# check if file is a text file
txtfile=$(file "$1" | grep "text")
if [[ $txtfile == "" ]]; then
    echo "File not a text file. Choose a text file."
    exit
fi

# check if pattern exists in file
ptn=$(grep "$2" "$1")
if [[ $ptn == "" ]]; then
    echo "Pattern does not exist in file. No sum."
    exit
fi

# identify the column containing prices (elements have a $ symbol):
# split a data row into one field per line, number the fields with nl,
# and keep the field number of the one containing "$"
sumcol="$(head -n 2 "$1" | tail -n 1 | tr "," "\n" | nl \
    | grep "\\$" | sed 's/ //g' | cut -f1)"
if [[ $sumcol == "" ]]; then
    echo "No columns to sum"
    exit
fi

echo "Summing column: " "$(head -n 1 "$1" | cut -d"," -f"$sumcol")"
total="$(grep "$2" "$1" | cut -d"," -f"$sumcol" \
    | tr -cd '[:digit:].\n' | paste -sd+ | bc)"
echo "Total: " "$total"
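The column auto-detection at the heart of the script can be tried in isolation. A sketch on a single sample row (the row content here is taken from the table above):

```shell
# Split the row into one field per line, number the lines with nl,
# keep the line whose field contains a literal "$", strip the spaces
# that nl pads with, and take the field number before the tab.
row='02/25/2019,02/25/2019,CVS/PHARMACY #07078,Debit Card,-$15.00'
echo "$row" | tr "," "\n" | nl | grep '\$' | sed 's/ //g' | cut -f1
# -> 5
```

Using cut -f1 on nl’s tab-separated output keeps the whole field number, so the detection still works for files with ten or more columns.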




