tsml-eval Data Format

tsml-eval primarily uses numpy arrays as its datatype when running experiments. The type of numpy array used, however, depends on the dataset characteristics (i.e. whether the series are equal or unequal length) and on the learning task.

Classification, clustering and regression take a collection of time series as input. Forecasting takes a single time series as input.


Time Series Collections

There are two types of collection datatypes used in tsml-eval:

A 3D numpy array of shape (n_samples, n_channels, n_timestamps) for equal length time series.
A list of 2D numpy arrays of shape (n_channels, n_timestamps) for unequal length time series.

These are both designed to accommodate multivariate time series, where n_channels is the number of variables in the time series. For univariate time series, n_channels is 1.
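As a rough illustration, both collection formats can be built by hand with plain numpy. The random values, number of cases and series lengths below are arbitrary placeholders, not a real dataset; they only show the expected shapes and the usual indexing conventions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Equal length: one 3D array of shape (n_samples, n_channels, n_timestamps).
# Here 4 univariate series (n_channels=1), each 24 time points long.
X_equal = rng.normal(size=(4, 1, 24))

# Unequal length: a list of 2D arrays, each of shape (n_channels, n_timestamps).
# Every case keeps the same number of channels (12); only the length varies.
lengths = [20, 26, 23]
X_unequal = [rng.normal(size=(12, n)) for n in lengths]

# Indexing: case 0, channel 0, all time points -> a 1D array of length 24.
first_series = X_equal[0, 0, :]
# Selecting channel 0 for every case drops the channel axis -> shape (4, 24).
univariate_2d = X_equal[:, 0, :]

print(X_equal.shape, first_series.shape, univariate_2d.shape)
print([x.shape for x in X_unequal])
```

Keeping the channel axis even when n_channels is 1 means univariate and multivariate data share one layout, so the same code path can handle both.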

Below is an example of each format.


from tsml_eval.datasets._loaders import load_minimal_chinatown

X, y = load_minimal_chinatown()

print("Shape:", X.shape)
print("Type:", type(X))
print(X[:5])

Shape: (40, 1, 24)
Type: <class 'numpy.ndarray'>
[[[ 573.  375.  301.  212.   55.   34.   25.   33.  113.  143.  303.
    615. 1226. 1281. 1221. 1081.  866. 1096. 1039.  975.  746.  581.
    409.  182.]]

 [[ 394.  264.  140.  144.  104.   28.   28.   25.   70.  153.  401.
    649. 1216. 1399. 1249. 1240. 1109. 1137. 1290. 1137.  791.  638.
    597.  316.]]

 [[ 603.  348.  176.  177.   47.   30.   40.   42.  101.  180.  401.
    777. 1344. 1573. 1408. 1243. 1141. 1178. 1256. 1114.  814.  635.
    304.  168.]]

 [[ 428.  309.  199.  117.   82.   43.   24.   64.  152.  183.  408.
    797. 1288. 1491. 1523. 1460. 1365. 1520. 1700. 1797. 1596. 1139.
    910.  640.]]

 [[ 372.  310.  203.  133.   65.   39.   27.   36.  107.  139.  329.
    651.  990. 1027. 1041.  971. 1104.  844. 1023. 1019.  862.  643.
    591.  452.]]]

The labels for each time series are stored in a 1D numpy array.


print("Shape:", y.shape)
print("Type:", type(y))
print(y)

Shape: (40,)
Type: <class 'numpy.ndarray'>
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
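A quick sanity check on labels can be sketched with plain numpy: the label array should hold exactly one entry per case, and np.unique gives the class balance. The y below re-creates the printed label values, and X_dummy is a placeholder with the same shape as the loaded X; neither is loaded from tsml-eval here.

```python
import numpy as np

# Re-create the label vector printed above: ten 1.0s, ten 2.0s, repeated twice.
y = np.array(([1.0] * 10 + [2.0] * 10) * 2)

# Placeholder collection with the same shape as the loaded X, (40, 1, 24).
X_dummy = np.zeros((40, 1, 24))

# Labels are 1D and aligned with the first (case) axis of the collection.
assert y.ndim == 1 and y.shape[0] == X_dummy.shape[0]

# Distinct class labels and the number of cases in each.
classes, counts = np.unique(y, return_counts=True)
print(classes, counts)  # [1. 2.] [20 20]
```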

A single numpy array cannot hold time series of different lengths, so unequal length collections are stored as a list of 2D numpy arrays instead.


from tsml_eval.datasets._loaders import load_minimal_japanese_vowels

X, _ = load_minimal_japanese_vowels()

print("Size:", len(X))
print("Type:", type(X))
print("Case 1 shape:", X[0].shape)
print("Case 2 shape:", X[1].shape)
print("Case 1 type:", type(X[0]))

Size: 40
Type: <class 'list'>
Case 1 shape: (12, 20)
Case 2 shape: (12, 26)
Case 1 type: <class 'numpy.ndarray'>
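Code that has to accept both collection formats can branch on the input type. describe_collection below is a hypothetical helper written for illustration, not a tsml-eval function: it reports the number of cases, the number of channels and the series lengths for either a 3D array or a list of 2D arrays, and it assumes every case in a list has the same number of channels.

```python
import numpy as np


def describe_collection(X):
    """Hypothetical helper: summarise a time series collection.

    Accepts a 3D numpy array (n_samples, n_channels, n_timestamps) or a
    list of 2D numpy arrays (n_channels, n_timestamps).
    Returns (n_samples, n_channels, lengths, equal_length).
    """
    if isinstance(X, np.ndarray):
        if X.ndim != 3:
            raise ValueError(f"expected a 3D array, got {X.ndim}D")
        n_samples, n_channels, n_timestamps = X.shape
        return n_samples, n_channels, [n_timestamps] * n_samples, True

    if isinstance(X, list):
        if len(X) == 0:
            raise ValueError("empty collection")
        if not all(isinstance(x, np.ndarray) and x.ndim == 2 for x in X):
            raise ValueError("expected a list of 2D numpy arrays")
        # All cases should share one channel count; only lengths may differ.
        channels = {x.shape[0] for x in X}
        if len(channels) != 1:
            raise ValueError(f"inconsistent channel counts: {sorted(channels)}")
        lengths = [x.shape[1] for x in X]
        return len(X), channels.pop(), lengths, len(set(lengths)) == 1

    raise TypeError(f"unsupported collection type: {type(X).__name__}")


# Shapes matching the first two JapaneseVowels cases printed above.
X_list = [np.zeros((12, 20)), np.zeros((12, 26))]
print(describe_collection(X_list))  # (2, 12, [20, 26], False)
print(describe_collection(np.zeros((40, 1, 24)))[:2])  # (40, 1)
```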

Collections can be loaded from files in the aeon .ts format using the aeon loader load_from_ts_file, as below.


from aeon.datasets import load_from_ts_file

X, y = load_from_ts_file(
    "../tsml_eval/datasets/MinimalChinatown/MinimalChinatown_TRAIN.ts"
)
X.shape


(20, 1, 24)
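To show how the .ts layout maps onto the (n_cases, n_channels, n_timepoints) array, here is a deliberately simplified reader, parse_ts_lines. It is a hypothetical illustration, not aeon's load_from_ts_file, and it assumes: equal length series with class labels, no timestamps, lines starting with @ are header tags, lines starting with # are comments, channels on a data line are separated by ':' with the class label after the last ':', values are comma separated, and '?' marks a missing value. The toy file contents are made up.

```python
import numpy as np


def parse_ts_lines(lines):
    """Illustrative reader for equal length, labelled .ts data.

    Simplifications: header metadata is ignored, timestamps are not
    supported, and every case must have the same channels and length.
    """
    in_data = False
    cases, labels = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("@"):
            # Header tag; data lines only start after @data.
            in_data = line.lower().startswith("@data")
            continue
        if not in_data:
            continue
        # Channels are ':' separated; the final field is the class label.
        *channels, label = line.split(":")
        case = [
            [np.nan if v == "?" else float(v) for v in ch.split(",")]
            for ch in channels
        ]
        cases.append(case)
        labels.append(label)
    X = np.array(cases, dtype=float)  # (n_cases, n_channels, n_timepoints)
    y = np.array(labels)
    return X, y


ts_text = """# toy example
@problemName Toy
@timeStamps false
@missing true
@univariate false
@equalLength true
@seriesLength 3
@classLabel true a b
@data
1.0,2.0,3.0:4.0,5.0,6.0:a
7.0,?,9.0:1.5,2.5,3.5:b
"""
X, y = parse_ts_lines(ts_text.splitlines())
print(X.shape)  # (2, 2, 3)
print(y)  # ['a' 'b']
```

Unequal length files would give ragged nested lists here, which is exactly the case where a list of 2D arrays is needed instead of one 3D array.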

Single Time Series

Functionality for single series tasks in tsml-eval is currently limited. With the current functions, the recommended datatype is a 1D numpy array.


import pandas as pd

X = pd.read_csv(
    "../tsml_eval/datasets/ShampooSales/ShampooSales_TRAIN.csv",
    index_col=0,
).squeeze("columns")
X = X.astype(float).to_numpy()

print("Shape:", X.shape)
print("Type:", type(X))
print(X)

Shape: (24,)
Type: <class 'numpy.ndarray'>
[266.  145.9 183.1 119.3 180.3 168.5 231.8 224.5 192.8 122.9 336.5 185.9
 194.3 149.5 210.1 273.3 191.4 287.  226.  303.6 289.9 421.6 264.5 342.3]
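Because single series inputs are expected as a 1D array, data arriving in another shape may need coercing first. as_1d_series below is a hypothetical helper, not part of tsml-eval: it converts a pandas Series, a one-column DataFrame, a list, or an (n, 1) / (1, n) array into a 1D float array, and rejects anything holding more than one series. The sample values are the first six ShampooSales points printed above.

```python
import numpy as np
import pandas as pd


def as_1d_series(x):
    """Hypothetical helper: coerce a single time series to a 1D float array."""
    if isinstance(x, (pd.Series, pd.DataFrame)):
        x = x.to_numpy()
    arr = np.asarray(x, dtype=float)
    # Accept a single row or column vector, e.g. shape (n, 1) or (1, n).
    if arr.ndim == 2 and 1 in arr.shape:
        arr = arr.ravel()
    if arr.ndim != 1:
        raise ValueError(f"expected a single series, got shape {arr.shape}")
    return arr


values = [266.0, 145.9, 183.1, 119.3, 180.3, 168.5]
print(as_1d_series(pd.Series(values)).shape)  # (6,)
print(as_1d_series(np.array(values).reshape(-1, 1)).shape)  # (6,)
```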
