Automating MP3 ID3 Tag Updates with Python

July 18, 2023 / July 23, 2023 ~ Saugata

Introduction:

Managing and organizing a music collection often involves keeping track of the artist and song information associated with each MP3 file. Manually updating ID3 tags can be a time-consuming task, especially when dealing with a large number of files. However, with the power of Python and the Mutagen library, we can automate this process and save valuable time. In this article, we will explore a Python script that updates ID3 tags for MP3 files based on the filename structure.

Prerequisites:

To follow along with this tutorial, make sure you have Python and the Mutagen library installed on your system. You can install Mutagen by running pip install mutagen in your terminal.

Understanding the Code:

The Python script we’ll be using leverages the Mutagen library to handle ID3 tag manipulation. The core logic resides in the update_id3_tags function, which updates the ID3 tags of an MP3 file based on the filename structure.

The script accepts command-line arguments using the argparse module, allowing you to specify the folder containing your MP3 files, along with options to ignore files with existing ID3 tags and print verbose output. This provides flexibility and customization to suit your specific requirements.

The getargs function parses the command-line arguments and returns the parsed arguments as an object. The folder_path, ignore_existing, and verbose variables are then extracted from the parsed arguments.

The script retrieves a list of MP3 files in the specified folder and iterates over each file. For each file, the update_id3_tags function is called. It extracts the artist and song name from the filename using the specified structure. The ID3 tags are then updated with the extracted information using the Mutagen library.
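The filename-parsing step can be tried on its own before touching any files. The following is a minimal sketch, assuming filenames follow the "[artist] - [song].mp3" pattern; the helper name split_artist_song and the sample filenames are hypothetical and not part of the script.

```python
import os

def split_artist_song(path):
    """Return (artist, song) parsed from '[artist] - [song].mp3', or None."""
    stem = os.path.splitext(os.path.basename(path))[0]
    if " - " not in stem:
        return None
    # Split on the first " - " only, so the song title may itself contain " - "
    artist, song = stem.split(" - ", 1)
    return artist.strip(), song.strip()

# Hypothetical filenames used only for illustration
print(split_artist_song("Music/Daft Punk - Around the World.mp3"))
print(split_artist_song("Music/untitled.mp3"))
```

Returning None for names that do not match lets the caller skip such files instead of writing an empty title.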

Code:

#!/usr/bin/env python
import os
import argparse
from mutagen.id3 import ID3, TIT2, TPE1, ID3NoHeaderError

def update_id3_tags(filename, ignore_existing, verbose):
    # Extract artist and song name from a "[artist] - [song].mp3" filename
    basename = os.path.basename(filename)
    print(f"processing --[{basename}]--")
    stem = os.path.splitext(basename)[0]
    if " - " not in stem:
        print("Cannot split file not in format [artist] - [song].mp3")
        return -1
    # Split on the first " - " only, so a song title may itself contain " - "
    artist, song = (part.strip() for part in stem.split(" - ", 1))

    # Load the ID3 tags from the file, or start an empty tag if it has none
    try:
        audio = ID3(filename)
    except ID3NoHeaderError:
        audio = ID3()

    # With --ignore, skip files that already carry any ID3 frames
    if not ignore_existing or len(audio) == 0:
        # Update the TIT2 (song title) and TPE1 (artist) tags if they are empty
        if not audio.get("TIT2"):
            audio["TIT2"] = TIT2(encoding=3, text=song)
            if verbose:
                print(f"Updated TIT2 tag for file: {filename} with value: {song}")
        elif verbose:
            print(f"Skipping existing ID3 tag for title: {audio.get('TIT2')}")

        if not audio.get("TPE1"):
            audio["TPE1"] = TPE1(encoding=3, text=artist)
            if verbose:
                print(f"Updated TPE1 tag for file: {filename} with value: {artist}")
        elif verbose:
            print(f"Skipping existing ID3 tag for artist: {audio.get('TPE1')}")

        # Save the updated ID3 tags back to the file
        audio.save(filename)
    print('-'*10)


def getargs():
    # parse command-line arguments using argparse()
    parser = argparse.ArgumentParser(description='Update ID3 tags for MP3 files.')
    parser.add_argument("folder", nargs='?', default='.', help="Folder containing MP3 files (default: current directory)")
    parser.add_argument('-i', "--ignore", action="store_true", help="Ignore files with existing ID3 tags")
    parser.add_argument('-v', "--verbose", action="store_true", help="Print verbose output")
    return parser.parse_args()


if __name__ == '__main__':
    args = getargs()
    folder_path = args.folder
    ignore_existing = args.ignore
    verbose = args.verbose

    # Get a list of MP3 files in the folder
    mp3_files = [file for file in os.listdir(folder_path) if file.endswith(".mp3")]

    # Process each MP3 file
    for mp3_file in mp3_files:
        mp3_path = os.path.join(folder_path, mp3_file)
        update_id3_tags(mp3_path, ignore_existing, verbose)

Example:

Let’s assume you have a folder called “Music” that contains several MP3 files with filenames in the format “artist - song.mp3” (a plain hyphen with a space on each side, which is what the script splits on). We want to update the ID3 tags for these files based on the filename structure.

Here’s how you can use the Python script:

python script.py Music --ignore --verbose

In this example, we’re running the script with the following arguments:

Music: The folder containing the MP3 files. Replace this with the actual path to your folder.
--ignore: This flag tells the script to ignore files that already have existing ID3 tags.
--verbose: This flag enables verbose output, providing details about the files being processed and the updates made.
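To see how these arguments are parsed without running the script against real files, the parser can be fed a list of strings directly. This is a small sketch that rebuilds the same three arguments as getargs; the function name build_parser is hypothetical.

```python
import argparse

def build_parser():
    # Same arguments as the script's getargs(), without the help strings
    parser = argparse.ArgumentParser(description="Update ID3 tags for MP3 files.")
    parser.add_argument("folder", nargs="?", default=".")
    parser.add_argument("-i", "--ignore", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser

# Equivalent to: python script.py Music --ignore --verbose
args = build_parser().parse_args(["Music", "--ignore", "--verbose"])
print(args.folder, args.ignore, args.verbose)

# With no arguments, the folder falls back to the current directory
defaults = build_parser().parse_args([])
print(defaults.folder, defaults.ignore, defaults.verbose)
```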

By running the script with these arguments, it will update the ID3 tags for the MP3 files in the “Music” folder, ignoring files that already have existing ID3 tags, and provide verbose output to the console.

Once the script finishes running, you can check the updated ID3 tags using any media player or music library software that displays the ID3 tag information.

This example demonstrates how the Python script automates the process of updating MP3 ID3 tags based on the filename structure, making it convenient and efficient to manage your music collection.

Conclusion:

Automating the process of updating MP3 ID3 tags can save you valuable time and effort. With the Python script we’ve discussed in this article, you can easily update the ID3 tags of your MP3 files based on the filename structure. The flexibility offered by command-line arguments allows you to tailor the script to your specific needs. Give it a try and simplify your music collection management!

Predicting Stock Price Volatility with GARCH Model (Part 1)

May 9, 2023 / May 12, 2023 ~ Saugata

In time series analysis, it is essential to model the volatility of a stock. One way to achieve this is through the use of the EGARCH (Exponential Generalized Autoregressive Conditional Heteroskedasticity) model. In this article, we will perform an analysis of the MSFT stock price using EGARCH to model its volatility.

GARCH Model and When to Use It

GARCH (Generalized Autoregressive Conditional Heteroskedasticity) is a statistical model used to analyze financial time series data. It generalizes the ARCH (Autoregressive Conditional Heteroskedasticity) model and accounts for the volatility clustering often observed in financial data. The GARCH model assumes that the conditional variance of the error term in a time series is a function of both past squared error terms and past conditional variances.

The GARCH model is commonly used in finance to model and forecast the volatility of asset returns. In particular, it is useful for predicting the likelihood of extreme events, such as a sudden stock market crash or a sharp increase in volatility.

When deciding whether to use a GARCH model, it is important to consider the characteristics of the financial time series data being analyzed. If the data exhibits volatility clustering or other patterns of heteroskedasticity, a GARCH model may be appropriate. Additionally, GARCH models are often used when the goal is to forecast future volatility or to estimate the risk associated with an investment. The GARCH(p,q) model can be represented by the following equation:

$\begin{aligned} r_t &= \mu_t + \epsilon_t \\ \epsilon_t &= \sigma_t z_t \\ \sigma_t^2 &= \omega + \sum_{i=1}^p \alpha_i \epsilon_{t-i}^2 + \sum_{j=1}^q \beta_j \sigma_{t-j}^2 \end{aligned}$

where $r_t$ is the log return at time t, $\mu_t$ is the conditional mean at time t, $\epsilon_t$ is the residual (the shock or innovation) at time t, $\sigma_t$ is the conditional standard deviation at time t, $z_t$ is a standard normal random variable (the standardized residual), $\omega$ is the constant, $\alpha_i$ are the ARCH coefficients at lag i, $\beta_j$ are the GARCH coefficients at lag j, and p and q are the orders of the ARCH and GARCH terms, respectively.

In a GARCH(p,q) model, the conditional variance depends on both past squared error terms and past variances, which reflects volatility clustering, a characteristic of financial time series data. The squared error terms represent recent shocks (innovations) to the return series, while the lagged variance terms carry the accumulated history of earlier shocks. Together they imply that a shock to the return series has a persistent effect on future volatility: large shocks tend to be followed by periods of high volatility, and small shocks by periods of low volatility.

This feature of GARCH models has important implications for risk management and financial decision-making. By accounting for the clustering of volatility, GARCH models can provide more accurate estimates of risk measures, such as Value-at-Risk (VaR) and Expected Shortfall (ES), which are used to assess the potential losses in financial portfolios. GARCH models can also be used to forecast future volatility, which can be useful for developing trading strategies and hedging positions in financial markets. We will explore these concepts in the future parts of this ongoing series.
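The clustering effect is easy to see by running the variance recursion by hand. Below is a minimal pure-Python sketch of the GARCH(1,1) case; the function name, the residuals, and the parameter values are illustrative assumptions, not estimates from the MSFT data.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances sigma_t^2 for a GARCH(1,1) recursion."""
    # Start from the unconditional variance omega / (1 - alpha - beta),
    # which requires alpha + beta < 1
    sigma2 = [omega / (1.0 - alpha - beta)]
    for eps in residuals[:-1]:
        # sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2

# Hypothetical residuals: calm, one large shock, then calm again
residuals = [0.01, 0.01, 0.08, 0.01, 0.01, 0.01]
for eps, s2 in zip(residuals, garch11_variance(residuals, 1e-5, 0.1, 0.85)):
    print(f"eps={eps:+.2f}  sigma={s2 ** 0.5:.4f}")
```

The variance jumps on the step after the large shock and then decays only gradually, because the beta term carries most of the previous variance forward.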

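One specific form of the GARCH model is the EGARCH model, which stands for Exponential Generalized Autoregressive Conditional Heteroskedasticity. The EGARCH model works with the logarithm of the variance, so the variance stays positive without constraints on the coefficients, and it allows for asymmetry (the leverage effect), in which negative shocks raise volatility more than positive shocks of the same size. The EGARCH(p,o,q) model can be represented by the following equation:

$\begin{aligned} r_t &= \mu_t + \epsilon_t \\ \epsilon_t &= \sigma_t z_t \\ \log(\sigma_t^2) &= \omega + \sum_{i=1}^p \alpha_i \left( \frac{\left| \epsilon_{t-i} \right|}{\sigma_{t-i}} - \sqrt{\frac{2}{\pi}} \right) + \sum_{k=1}^o \gamma_k \frac{\epsilon_{t-k}}{\sigma_{t-k}} + \sum_{j=1}^q \beta_j \log(\sigma_{t-j}^2) \end{aligned}$

where $\gamma_k$ are the asymmetry coefficients and o is their order. The models in this article are fitted with o=0, which drops the asymmetry term.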
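To make the log-variance recursion concrete, here is a pure-Python sketch of an EGARCH(1,1) update driven by standardized shocks. The asymmetry coefficient gamma multiplies the signed standardized shock; setting it to 0 gives the symmetric form. The function name and all parameter values are hypothetical, chosen only to give daily volatilities of a realistic size.

```python
import math

def egarch11_log_variance(z, omega, alpha, gamma, beta):
    """log(sigma_t^2) for an EGARCH(1,1) recursion driven by standardized shocks z."""
    # Start from the unconditional mean of the log variance: omega / (1 - beta)
    log_s2 = [omega / (1.0 - beta)]
    for z_prev in z[:-1]:
        size_effect = alpha * (abs(z_prev) - math.sqrt(2.0 / math.pi))
        sign_effect = gamma * z_prev
        log_s2.append(omega + size_effect + sign_effect + beta * log_s2[-1])
    return log_s2

# Same-sized shock with opposite signs (hypothetical parameter values)
params = dict(omega=-0.4, alpha=0.15, gamma=-0.08, beta=0.95)
after_drop = egarch11_log_variance([-2.0, 0.0], **params)[1]
after_rise = egarch11_log_variance([+2.0, 0.0], **params)[1]
print("volatility after a drop:", math.exp(after_drop / 2))
print("volatility after a rise:", math.exp(after_rise / 2))
```

With a negative gamma, the drop produces the higher next-day volatility, which is the leverage effect. Because the recursion is on the log scale, exponentiating always returns a positive variance.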

Exploratory Data Analysis

Before modeling, it is essential to explore the data to understand its characteristics. The plot below shows the time series plot of the MSFT stock price.

import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
import statsmodels.api as sm
import pmdarima as pm
import yfinance as yf
import seaborn as sns
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller

msft = yf.Ticker('MSFT')
df = msft.history(period='5y')

sns.set_style("darkgrid", {"grid.color": ".6", "grid.linestyle": ":"})
sns.lineplot(data=df['Close'])
plt.title('MSFT')
plt.show()

We can see that the stock price exhibits a clear upward trend over the period, with some fluctuations. The plot below shows the ACF and PACF plots of the log returns (msft_log), that is, the first differences of the log price.

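import numpy as np
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Log returns of the closing price. The post never shows how msft_log is
# built; this definition is an assumption based on r_t being the log return.
msft_log = np.log(df['Close']).diff().dropna()

# ACF and PACF plots of the log returns
fig, axes = plt.subplots(2, 1, figsize=(10, 8))
plot_acf(msft_log, ax=axes[0])
plot_pacf(msft_log, ax=axes[1])
plt.tight_layout()
plt.show()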

From the ACF and PACF plots, we can observe that there is no clear pattern in the data, indicating that it may be a white noise process. However, there is some significant autocorrelation at lag 1 in the PACF plot, suggesting that we may need to include an AR term in our model.
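What "significant" means in these plots can be made explicit: under white noise, a sample autocorrelation falls outside roughly ±1.96/√n only about 5% of the time. The sketch below computes the sample ACF in plain Python and flags lags outside that band; the function name and the toy series are hypothetical.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations of x at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

# Hypothetical series that flips sign every step, so lag 1 is strongly negative
x = [1.0, -1.0] * 50
acf = sample_acf(x, 3)
band = 1.96 / math.sqrt(len(x))  # approximate 95% band under white noise
for lag, r in enumerate(acf):
    flag = "significant" if lag > 0 and abs(r) > band else ""
    print(lag, round(r, 3), flag)
```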

Model Selection

To model the volatility of the MSFT stock price, we will use the EGARCH model. We will begin by fitting a baseline EGARCH(1,1) model and compare it with other models.

from arch import arch_model

# Fit EGARCH(1,1) model
egarch11_model = arch_model(msft_log, vol='EGARCH',
                               p=1, o=0, q=1, dist='Normal')
egarch11_fit = egarch11_model.fit()
print(egarch11_fit.summary())

Constant Mean - GARCH Model Results                      
===============================================================
Dep. Variable:                  Close   R-squared:                       0.000
Mean Model:             Constant Mean   Adj. R-squared:                  0.000
Vol Model:                      GARCH   Log-Likelihood:                3350.57
Distribution:                  Normal   AIC:                          -6693.15
Method:            Maximum Likelihood   BIC:                          -6672.60
                                        No. Observations:                 1258
Date:                Tue, May 09 2023   Df Residuals:                     1257
Time:                        10:42:42   Df Model:                            1
                                 Mean Model                                 
===============================================================
                 coef    std err          t      P>|t|      95.0% Conf. Int.
----------------------------------------------------------------------------
mu         1.5045e-03  9.713e-08  1.549e+04      0.000 [1.504e-03,1.505e-03]
                              Volatility Model                              
===============================================================
                 coef    std err          t      P>|t|      95.0% Conf. Int.
----------------------------------------------------------------------------
omega      7.6495e-06  1.791e-12  4.272e+06      0.000 [7.650e-06,7.650e-06]
alpha[1]       0.1000  1.805e-02      5.541  3.004e-08   [6.463e-02,  0.135]
beta[1]        0.8800  1.551e-02     56.729      0.000     [  0.850,  0.910]
===============================================================

The following table shows the results of fitting various EGARCH models to the MSFT stock price data.

Overall, the models indicate that the volatility of stock returns is persistent, and all models show significant positive values for the alpha parameters, meaning that volatility rises with the size of recent shocks. Because the models are fitted with o=0, they contain no asymmetry term, so they cannot show whether negative shocks have a larger impact than positive shocks; testing that would require refitting with o=1 or higher. The omega parameters are negative in all models. This is expected in an EGARCH model, where omega is the constant in the equation for the logarithm of the variance, and the logarithm of a daily variance far below one is negative. Omega sets the baseline level of log variance that is not explained by past shocks or past volatility.

Model	Log Likelihood	AIC	BIC
EGARCH(1,1)	3355.44	-6702.88	-6682.33
EGARCH(1,2)	3356.18	-6702.36	-6676.67
EGARCH(2,1)	3356.67	-6703.34	-6677.66
EGARCH(2,2)	3356.67	-6701.34	-6670.52

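Based on the information criteria, no single model wins outright: EGARCH(2,1) has the lowest AIC (-6703.34) and EGARCH(1,1) has the lowest BIC (-6682.33). EGARCH(2,2) reaches the same log likelihood as EGARCH(2,1) but is penalized for its extra parameter. The differences are small, and the rest of this article uses the EGARCH(2,2) model.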
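The AIC and BIC columns can be recomputed from the log likelihoods as a sanity check. This sketch assumes each model has 2 + p + q parameters (mu, omega, p alphas, q betas, with no asymmetry term and a normal distribution) and uses the 1258 observations reported in the model summary above.

```python
import math

N_OBS = 1258  # "No. Observations" in the model summary

# Log likelihoods from the table; keys are the (p, q) orders
log_likelihood = {(1, 1): 3355.44, (1, 2): 3356.18, (2, 1): 3356.67, (2, 2): 3356.67}

def information_criteria(loglik, p, q, n_obs=N_OBS):
    k = 2 + p + q  # assumed parameter count: mu, omega, p alphas, q betas
    aic = 2 * k - 2 * loglik
    bic = k * math.log(n_obs) - 2 * loglik
    return aic, bic

scores = {order: information_criteria(ll, *order) for order, ll in log_likelihood.items()}
for order, (aic, bic) in scores.items():
    print(f"EGARCH{order}: AIC={aic:.2f}  BIC={bic:.2f}")

# Lower is better for both criteria
best_aic = min(scores, key=lambda order: scores[order][0])
best_bic = min(scores, key=lambda order: scores[order][1])
print("lowest AIC:", best_aic, " lowest BIC:", best_bic)
```

BIC penalizes each extra parameter by ln(1258), about 7.1, compared with 2 for AIC, which is why it favors the smallest model here.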

Model Diagnostics

After selecting the final model, we need to perform diagnostic checks to ensure that the model is appropriate. The following plots show the diagnostic checks for the EGARCH(2,2) model.

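# Fit EGARCH(2,2) (call not shown in the post; assumed to mirror the EGARCH(1,1) fit)
egarch22_fit = arch_model(msft_log, vol='EGARCH', p=2, o=0, q=2, dist='Normal').fit()

# Residuals plot
plt.plot(egarch22_fit.resid)
plt.title("EGARCH(2,2) Residuals")
plt.show()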

# ACF/PACF of residuals
fig, axes = plt.subplots(2, 1, figsize=(10, 8))
plot_acf(egarch22_fit.resid, ax=axes[0])
plot_pacf(egarch22_fit.resid, ax=axes[1])
plt.tight_layout()
plt.show()

From the residual plot, the residuals appear approximately normally distributed with roughly constant variance over time, although a time plot alone cannot confirm either; a histogram or Q-Q plot of the standardized residuals (egarch22_fit.std_resid) is a stronger check. Additionally, the ACF and PACF plots of the residuals show no significant autocorrelation, indicating that the model has captured the linear dependence in the data.

Forecasting

The EGARCH(2,2) model provides a volatility fit for the MSFT stock price. Notably, there were spikes in volatility around the start of COVID in 2020 and during the Fed’s interest rate increase in 2022.

# Plot conditional volatility
plt.plot(egarch22_fit.conditional_volatility)
plt.xlabel("time")
plt.title("Conditional Volatility")
plt.show()

Finally, let’s use the EGARCH(2,2) model to forecast the volatility of the MSFT stock price for the next day.

# Last 5 days of volatility
egarch22_fit.conditional_volatility[-5:]

Date
2023-05-08 00:00:00-04:00    0.017646
2023-05-09 00:00:00-04:00    0.016606
2023-05-10 00:00:00-04:00    0.015939
2023-05-11 00:00:00-04:00    0.016860
2023-05-12 00:00:00-04:00    0.016005

# Forecast next day
forecasts = egarch22_fit.forecast(reindex=False)
print("Forecast of the conditional mean")
print(forecasts.mean.iloc[-3:])
print("Forecast of the residual variance")
print(forecasts.residual_variance.iloc[-3:])

Forecast of the conditional mean
                                h.1
Date                               
2023-05-09 00:00:00-04:00  0.001541

The value 0.001541 comes from forecasts.mean, so it is the forecast of the next day's mean return, not of its volatility. The volatility forecast is the square root of the value in forecasts.residual_variance, which is on the same scale as the conditional volatility listed above (roughly 0.016 to 0.018 over the last five days) and is the number to compare against those values. The accuracy of this prediction remains uncertain. To assess the model’s accuracy, a rolling prediction approach can be used and compared against actual values using a measure like RMSE. Further analysis will be explored in the subsequent parts of this series.
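The rolling evaluation mentioned here can be sketched as follows. The forecaster is a stand-in (the sample standard deviation of the window), used only so the example runs without fitting a model; in practice it would refit the volatility model on each window and return its one-day forecast. The returns are made up, and the absolute return is used as a crude proxy for realized volatility.

```python
import math
import statistics

def rmse(forecasts, actuals):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals))

def rolling_forecasts(returns, window, forecast_fn):
    """One-step-ahead forecasts: fit on returns[t-window:t], predict for day t."""
    return [forecast_fn(returns[t - window:t]) for t in range(window, len(returns))]

# Placeholder forecaster (an assumption): sample standard deviation of the window
def naive_volatility(window_returns):
    return statistics.pstdev(window_returns)

returns = [0.010, -0.020, 0.015, -0.005, 0.030, -0.025, 0.004, 0.012]  # hypothetical
window = 4
predicted = rolling_forecasts(returns, window, naive_volatility)
realized = [abs(r) for r in returns[window:]]  # |return| as a volatility proxy
print(f"RMSE over {len(predicted)} days: {rmse(predicted, realized):.4f}")
```

Each forecast uses only data available before the day it predicts, which is what keeps the comparison out-of-sample.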

References

Tsay, R.S. (2010) Analysis of Financial Time Series, Third Edition. Wiley.
“Introduction to ARCH/GARCH Models”. ARCH Documentation. Retrieved from https://arch.readthedocs.io/en/latest/univariate/introduction.html.

How to add (embed) Jupyter notebooks to WordPress blog posts

August 18, 2020 ~ Saugata

Jupyter notebooks are an effective way to share research, ideas, steps, and other information with others. Sharing single notebooks or entire folders with collaborators is straightforward via GitHub, but sharing notebooks with the public through a blog post takes a few more steps, since most blog readers, unless they are software developers, do not browse GitHub. The following sections outline the steps to embed Jupyter notebooks in a blog post.

Sharing non-interactive Jupyter notebook as static HTML

There are a few ways to share a non-interactive Jupyter notebook as static HTML. Notebooks that contain interactive elements such as ipywidgets will still be exported as HTML, but the interactive elements will not work. We will discuss embedding of interactive Jupyter notebooks in a later section.

Method I (simplest method)

Upload file to Google Colab
File -> Save a copy as GitHub Gist. It will ask for login permissions to GitHub the first time it is run.
Go to GitHub. Then go to your Gists by clicking your profile pic at the upper right corner and selecting Your gists from the menu.
Locate the Google Colab file just shared with Gist and open the Gist.
Copy the Gist ID. For example, if the address bar shows https://gist.github.com/saugatach/100a28eb7dc353feb1ed3bf18f251443 then the Gist ID is 100a28eb7dc353feb1ed3bf18f251443.
Go back to WordPress editor.
In the WordPress post start a new paragraph.
Edit as HTML.
Add the line {gist]<Gist ID>[/gist]. For the Gist ID example above the line would be {gist]100a28eb7dc353feb1ed3bf18f251443[/gist]. When typing it, change the { before gist to a [. The { shown here is deliberate, not a typo: WordPress would interpret the shortcode in this post if [ were used.
Revert back to Edit visually. Wait for the Jupyter notebook to load with the Google Colab badge on it. If it doesn’t load within 30 seconds hit refresh.
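Extracting the Gist ID and building the shortcode line can also be scripted. This is a small sketch using only the standard library; the function name gist_shortcode is hypothetical, and the URL is the example from the steps above.

```python
from urllib.parse import urlparse

def gist_shortcode(gist_url):
    """Build the WordPress gist shortcode from a gist.github.com URL."""
    parsed = urlparse(gist_url)
    if parsed.netloc != "gist.github.com":
        raise ValueError(f"not a Gist URL: {gist_url}")
    # The Gist ID is the last non-empty path segment: /<user>/<gist id>
    gist_id = parsed.path.rstrip("/").split("/")[-1]
    if not gist_id:
        raise ValueError(f"no Gist ID in URL: {gist_url}")
    return f"[gist]{gist_id}[/gist]"

print(gist_shortcode("https://gist.github.com/saugatach/100a28eb7dc353feb1ed3bf18f251443"))
```

The printed line can be pasted directly into the HTML view of the WordPress paragraph.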

Method II (slightly more hands-on)

Export the Jupyter notebook as an IPYNB file directly from JupyterLab (or the Jupyter interface).
Open the IPYNB file in a text editor and copy its contents.
Go to GitHub and start a new Gist.
Paste the IPYNB contents in the code section.
Manually name the file in Gist ending with .ipynb, for example, test.ipynb.
Click edit at the top and after the page loads, change the privacy setting at the lower right corner of the code snippet to Public.
Copy the Gist ID. For example, if the address bar shows https://gist.github.com/saugatach/100a28eb7dc353feb1ed3bf18f251443 then the Gist ID is 100a28eb7dc353feb1ed3bf18f251443.
Go back to WordPress editor.
In the WordPress post start a new paragraph.
Edit as HTML.
Add the line {gist]<Gist ID>[/gist]. For the Gist ID example above the line would be {gist]100a28eb7dc353feb1ed3bf18f251443[/gist]. When typing it, change the { before gist to a [. The { shown here is deliberate, not a typo: WordPress would interpret the shortcode in this post if [ were used.
Revert back to Edit visually. Wait for the Jupyter notebook to load. If it doesn’t load within 30 seconds hit refresh.

Method III (no extra frills, confusing steps)

Export the Jupyter notebook as an HTML file directly from JupyterLab (or the Jupyter interface).
Open the HTML file in a text editor and copy the code.
Go back to WordPress editor.
In the WordPress post start a new paragraph.
Edit as HTML.
Paste the HTML code.
Revert back to Edit visually. Wait for the Jupyter notebook to load. If it doesn’t load within 30 seconds hit refresh.

To access the HTML editor in WordPress

For the older WordPress editor, there is an HTML editor tab. For the new WordPress editor, start a new paragraph and type a few random letters. Hover the mouse over the text until a horizontal menu appears, click on the 3 horizontal dots, and select Edit as HTML. Clear the random letters, paste the HTML code from the Jupyter file, click on the 3 horizontal dots again, and select Edit visually.