Extracting Features from Random Subseries: A Hybrid Pipeline for Time Series Classification and Extrinsic Regression

This is the webpage and repo package to support the paper “Extracting Features from Random Subseries: A Hybrid Pipeline for Time Series Classification and Extrinsic Regression” submitted to the International Workshop on Advanced Analytics and Learning on Temporal Data (AALTD) 2023.

Our results files are stored here.

Datasets

The 112 UCR archive datasets are available at timeseriesclassification.com.

The 63 regression datasets are available at the archive expansion webpage.

Install

To install the latest version of the package with up-to-date algorithms, run:

pip install tsml-eval

To install the package at the time of publication, run:

pip install tsml-eval==0.1.0

Not all estimator dependencies are installed by default. You can install these individually as required or use the following dependency groups when installing:

pip install tsml-eval[all_extras,deep_learning]

RIST requires the pycatch22 and pyfftw packages. To install these, run:

pip install pycatch22 pyfftw

Installing these packages can be unstable on some setups. If you cannot install them, they can be disabled by editing the classifier parameters (note that this will change the results produced), e.g.:

RISTClassifier(use_pycatch22=False, use_pyfftw=False)

To install dependency versions used at the time of publication, use the publication requirements.txt:

pip install -r tsml_eval/publications/2023/rist_pipeline/static_publication_reqs.txt

Usage

Command Line

Run run_classification_experiments.py or run_regression_experiments.py with the following arguments:

  1. Path to the data directory

  2. Path to the results directory

  3. The name of the model to run (see set_rist_classifier.py or set_rist_regressor.py, e.g. RIST, RDST, DrCIF)

  4. The name of the problem to run

  5. The resample number to run (0 is base train/test split)

For example, to run the ItalyPowerDemand classification problem using RIST on the base train/test split:

python tsml_eval/publications/2023/rist_pipeline/run_classification_experiments.py data/ results/ RIST ItalyPowerDemand 0

Using Classifiers and Regressors

Most of our classifiers are available in the aeon Python package.

The classifiers and regressors used in our experiments extend the scikit-learn interface and can be used in the same way as scikit-learn estimators:

[7]:
import warnings

warnings.filterwarnings("ignore")

from sklearn.metrics import accuracy_score, mean_squared_error
from tsml.datasets import load_minimal_chinatown, load_minimal_gas_prices
from tsml.hybrid import RISTClassifier, RISTRegressor

Data can be loaded using whichever method is most convenient, but should be formatted as either a 3D numpy array of shape (n_samples, n_channels, n_timesteps) or a list of length (n_samples) containing 2D numpy arrays of shape (n_channels, n_timesteps).
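As a minimal sketch of the two accepted formats (using synthetic data rather than the bundled datasets), both can be constructed directly with numpy:

```python
import numpy as np

# equal-length series: a 3D array of shape (n_samples, n_channels, n_timesteps)
X_equal = np.random.default_rng(0).normal(size=(10, 1, 24))
print(X_equal.shape)  # (10, 1, 24)

# unequal-length series: a list of n_samples 2D arrays,
# each of shape (n_channels, n_timesteps) with a possibly different n_timesteps
rng = np.random.default_rng(1)
X_unequal = [rng.normal(size=(1, length)) for length in (20, 24, 30)]
print(len(X_unequal), X_unequal[0].shape)  # 3 (1, 20)
```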

A function is available for loading from .ts files.

[8]:
# load example classification dataset
X_train_c, y_train_c = load_minimal_chinatown("TRAIN")
X_test_c, y_test_c = load_minimal_chinatown("TEST")

# load example regression dataset
X_train_r, y_train_r = load_minimal_gas_prices("TRAIN")
X_test_r, y_test_r = load_minimal_gas_prices("TEST")

# data can be loaded from .ts files using the following function
# from tsml.datasets import load_from_ts_file
# X, y = load_from_ts_file("data/data.ts")

print(type(X_train_c), type(y_train_c))
print(X_train_c.shape, y_train_c.shape)
print(X_test_c.shape, y_test_c.shape)
X_train_c[:5]
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
(20, 1, 24) (20,)
(20, 1, 24) (20,)
[8]:
array([[[ 573.,  375.,  301.,  212.,   55.,   34.,   25.,   33.,  113.,
          143.,  303.,  615., 1226., 1281., 1221., 1081.,  866., 1096.,
         1039.,  975.,  746.,  581.,  409.,  182.]],

       [[ 394.,  264.,  140.,  144.,  104.,   28.,   28.,   25.,   70.,
          153.,  401.,  649., 1216., 1399., 1249., 1240., 1109., 1137.,
         1290., 1137.,  791.,  638.,  597.,  316.]],

       [[ 603.,  348.,  176.,  177.,   47.,   30.,   40.,   42.,  101.,
          180.,  401.,  777., 1344., 1573., 1408., 1243., 1141., 1178.,
         1256., 1114.,  814.,  635.,  304.,  168.]],

       [[ 428.,  309.,  199.,  117.,   82.,   43.,   24.,   64.,  152.,
          183.,  408.,  797., 1288., 1491., 1523., 1460., 1365., 1520.,
         1700., 1797., 1596., 1139.,  910.,  640.]],

       [[ 372.,  310.,  203.,  133.,   65.,   39.,   27.,   36.,  107.,
          139.,  329.,  651.,  990., 1027., 1041.,  971., 1104.,  844.,
         1023., 1019.,  862.,  643.,  591.,  452.]]])

Classifiers and regressors can be built using the fit method and predictions can be made using predict. predict_proba can be used to get class probabilities for classifiers.

Here we run the RIST classifier and regressor from the publication and compute the accuracy and RMSE on our example data.

[9]:
rist_c = RISTClassifier(random_state=0)
rist_c.fit(X_train_c, y_train_c)
y_pred_c = rist_c.predict(X_test_c)
y_pred_c
[9]:
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 1., 2., 2., 1., 2.,
       2., 2., 2.])
[10]:
accuracy_score(y_test_c, y_pred_c)
[10]:
0.9
[11]:
rist_r = RISTRegressor(random_state=0)
rist_r.fit(X_train_r, y_train_r)
y_pred_r = rist_r.predict(X_test_r)
y_pred_r
[11]:
array([-0.31689489, -0.31613551, -0.32835623, -0.39940986, -0.30016315,
       -0.31231658, -0.25754774, -0.28900786, -0.31202351, -0.3132342 ,
       -0.27315226, -0.38427014, -0.32339463, -0.26477721, -0.32560753,
       -0.30756101, -0.30214585, -0.40835526, -0.38768561, -0.39179725])
[12]:
mean_squared_error(y_test_r, y_pred_r, squared=False)
[12]:
0.10593838895386118
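For reference, the RMSE reported by `mean_squared_error(..., squared=False)` is simply the square root of the mean squared error (recent scikit-learn versions also provide a dedicated `root_mean_squared_error` function). A minimal numpy equivalent, using made-up values rather than the results above, would be:

```python
import numpy as np

# hypothetical true and predicted regression targets
y_true = np.array([-0.30, -0.35, -0.28])
y_pred = np.array([-0.32, -0.33, -0.25])

# root mean squared error computed directly from its definition
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse)
```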

Generated using nbsphinx. The Jupyter notebook can be found here.