Fitting noisy data with a spline to see trends

Are you struggling with visualizing or analyzing data that is noisy or contains irregularities? Do you need to identify trends that may not be immediately apparent to the naked eye or require an analytical approach? One way to address these challenges is by using a spline, a mathematical function that smoothly connects data points. Splines can be used to “smooth out” data, making it easier to analyze and visualize.

In this blog post, we will explore the use of the UnivariateSpline function in Python’s scipy.interpolate library to fit a spline to noisy data. Spline interpolation is a method of interpolating data points by fitting a piecewise-defined polynomial function to the data.

In Python, the scipy.interpolate module provides several functions for fitting splines to data. One of these functions is UnivariateSpline, which fits a spline to a set of one-dimensional data points.

Let’s take a closer look at how UnivariateSpline works and how it can be used to smooth out noisy data in Python. We’ll use an example code snippet to illustrate the process.

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline

x = np.linspace(0, 10, 100)
y1 = 0.2*x + 0.5*np.sin(x) + 2*np.random.normal(0, 0.1, size=100)
xs = np.linspace(np.min(x), np.max(x), 1000)

In this code, we first import the necessary modules: numpy, matplotlib.pyplot, and scipy.interpolate. We then create a set of x values using numpy.linspace(), which generates a linearly spaced array of 100 values between 0 and 10. Then, we generate some noisy data points using numpy’s linspace and random functions:

Next, we define the range of our x-axis by creating a new array, xs, with 1000 evenly spaced points between the minimum and maximum x values:

Now, we use the UnivariateSpline function to fit a spline to the noisy data:

spl = UnivariateSpline(x, y1)

We then plot the original data points and the fitted spline with all default parameters:

fig, ax = plt.subplots()
ax.plot(x, y1, 'k.', alpha=0.5)
plt.plot(xs, spl(xs), 'b', lw=1)
plt.title("Fitting a spline to noisy data (k=AUTO, smooth=AUTO)")
plt.xlabel('time')
plt.ylabel('Generated Stock prices')
plt.show()

The resulting plot shows that the spline fits the noisy data reasonably well, but there is some room for improvement.

Next, we try fitting the spline with a different degree, k, and a smoothing factor, s:

spl = UnivariateSpline(x, y1, k=4)
spl.set_smoothing_factor(20)

fig, ax = plt.subplots()
ax.plot(x, y1, 'k.', alpha=0.5)
plt.plot(xs, spl(xs), 'b', lw=1)
plt.title("Fitting a spline to noisy data (k=4, smooth=20)")
plt.xlabel('time')
plt.ylabel('Generated Stock prices')
plt.show()

This time, we use a degree of 4 for the spline and a smoothing factor of 20. The resulting plot shows that the spline fits the data even better than before. Finally, we try fitting the spline with a different smoothing factor:

spl = UnivariateSpline(x, y1)
spl.set_smoothing_factor(1)

fig, ax = plt.subplots()
ax.plot(x, y1, 'k.', alpha=0.5)
plt.plot(xs, spl(xs), 'b', lw=1)
plt.title("Fitting a spline to noisy data (k=AUTO, smooth=1)")
plt.xlabel('time')
plt.ylabel('Generated Stock prices')
plt.show()

This time, we set the smoothing factor to 1. The resulting plot shows that the spline now fits the data too closely, and has likely overfit the data.

By default, UnivariateSpline uses a smoothing factor of 0, which can result in a curve that closely follows the original data points, even if they are noisy or irregular. However, this can also lead to overfitting, where the curve follows the noise rather than the underlying trend in the data.

In conclusion, the UnivariateSpline function in Python’s scipy.interpolate library is a powerful tool for fitting a spline to noisy data. By adjusting the degree of the spline and the smoothing factor, we can achieve a good balance between fitting the data closely and avoiding overfitting. This method helps us draw envelopes around Monte Carlo chains or draw the Snell’s envelope. The possibilities are endless.

Full code

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline

x = np.linspace(0, 10, 100)
y1 = 0.2*x + 0.5*np.sin(x) + 2*np.random.normal(0, 0.1, size=100)
xs = np.linspace(np.min(x), np.max(x), 1000)

spl = UnivariateSpline(x, y1)

fig, ax = plt.subplots()
ax.plot(x, y1, 'k.', alpha=0.5)
plt.plot(xs, spl(xs), 'b', lw=1)
plt.title("Fitting a spline to noisy data (k=AUTO, smooth=AUTO)")
plt.xlabel('time')
plt.ylabel('Generated Stock prices')
plt.show()

spl = UnivariateSpline(x, y1, k=4)
spl.set_smoothing_factor(20)

fig, ax = plt.subplots()
ax.plot(x, y1, 'k.', alpha=0.5)
plt.plot(xs, spl(xs), 'b', lw=1)
plt.title("Fitting a spline to noisy data (k=4, smooth=20)")
plt.xlabel('time')
plt.ylabel('Generated Stock prices')
plt.show()

spl = UnivariateSpline(x, y1)
spl.set_smoothing_factor(1)

fig, ax = plt.subplots()
ax.plot(x, y1, 'k.', alpha=0.5)
plt.plot(xs, spl(xs), 'b', lw=1)
plt.title("Fitting a spline to noisy data (k=AUTO, smooth=1)")
plt.xlabel('time')
plt.ylabel('Generated Stock prices')
plt.show()

Leave a comment