Bake off redux: a review and experimental evaluation of recent time series classification algorithms

This is the webpage and repo package to support the paper “Bake off redux: a review and experimental evaluation of recent time series classification algorithms” published in Data Mining and Knowledge Discovery.

Our results files are stored here.

Correction:
The datasets Covid3Month_disc, FloodModeling1_disc, FloodModeling2_disc and FloodModeling3_disc have been fixed since the original pre-print. Unfortunately, Table 1 and Table C4 retain some values from the previous versions of these datasets.
The correct test set sizes for Table 1 are 61, 202, 201 and 184 respectively, and the correct accuracy values for Table C4 can be found here. Please use the updated datasets and results in any paper sourcing results from this publication.

Datasets

The 112 UCR archive datasets are available at the timeseriesclassification.com datasets page.

The 30 new datasets will be uploaded to timeseriesclassification.com in due course. For now, we provide the following link:

https://drive.google.com/file/d/1vuh6mgNrNKjHr9MMRQP0J0_gGA4dE7E3/view?usp=sharing

Install

To install the latest version of the package with up-to-date algorithms, run:

pip install tsml-eval

To install the package at the time of publication, run:

pip install tsml-eval==0.2.1

Not all estimator dependencies are installed by default. You can install these individually as required or use the following dependency groups when installing:

pip install tsml-eval[all_extras,deep_learning]

To install the dependency versions used at the time of publication, use the publication requirements file:

pip install -r tsml_eval/publications/2023/tsc_bakeoff/static_publication_reqs.txt

Usage

Command Line

Run run_experiments.py with the following arguments:

  1. Path to the data directory

  2. Path to the results directory

  3. The name of the model to run (see set_bakeoff_classifier.py, e.g. R-STSF, HC2, InceptionTime)

  4. The name of the problem to run

  5. The resample number to run (0 is the base train/test split)

e.g. to run ItalyPowerDemand using HIVE-COTE V2 (HC2) on the base train/test split:

python tsml_eval/publications/2023/tsc_bakeoff/run_experiments.py data/ results/ HC2 ItalyPowerDemand 0
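Runs over multiple resamples can be scripted. A minimal sketch using Python's subprocess module; the paths and names below mirror the example above and are illustrative:

import subprocess

script = "tsml_eval/publications/2023/tsc_bakeoff/run_experiments.py"
for resample in range(5):  # resample 0 is the base train/test split
    subprocess.run(
        ["python", script, "data/", "results/", "HC2", "ItalyPowerDemand", str(resample)],
        check=True,
    )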

Exactly Reproducing Results

To better compare to past results and publications, our results on the 112 UCR datasets use the randomly generated resamples from the Java tsml package. To use these resamples with our code, a flag must be toggled in the main method of the experiments file, and individual files for each resample must be present in the data directory. These resamples are available for download in .ts file format here:

https://drive.google.com/file/d/1V36LSZLAK6FIYRfPx6mmE5euzogcXS83/view?usp=sharing - 112 UCR datasets using Java tsml resamples

The 30 new datasets used in our experiments use the resampling available by default in our experiments file. An exception is ProximityForest, which is implemented in Java and therefore uses the Java resampling.

We provide the resample indices used for each dataset for both Java and Python resamplers here:

Python - https://drive.google.com/file/d/1aLBP_nhnoqz075puKg30zuF3F_QBOXYM/view?usp=sharing

Java - https://drive.google.com/file/d/1FsG7Fp74y_TpaPhJ7U066ot8A07BPhr3/view?usp=sharing
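As a sanity check before toggling the flag, you can verify that the individual per-resample files are present in the data directory. A minimal sketch, assuming a <dataset><resample>_TRAIN.ts naming scheme inside each dataset folder (this naming is an assumption; check the extracted archive for the exact layout):

import os

data_dir = "data/"
dataset = "ItalyPowerDemand"
resample = 1  # resample 0 is the original train/test split

for split in ("TRAIN", "TEST"):
    # assumed naming: e.g. ItalyPowerDemand1_TRAIN.ts in the dataset folder
    path = os.path.join(data_dir, dataset, f"{dataset}{resample}_{split}.ts")
    print(path, "found" if os.path.exists(path) else "missing")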

Java Classifier Implementations

Three of the classifiers used in our comparison are implemented in Java due to the lack of Python implementations that function reliably and accurately reproduce published results: ElasticEnsemble, ProximityForest and TS-CHIEF. We use the implementations from the Java tsml package, taken from the revisions where they are available. We make two jar files available for download which contain these implementations:

https://drive.google.com/file/d/1oXxpSa5PT9sBuVAbt57TLMANv4TMEejI/view?usp=sharing - TS-CHIEF and ProximityForest

https://drive.google.com/file/d/1Vmgg5u7SE2jmsakHVlxPxvT_AfaZ151e/view?usp=sharing - ElasticEnsemble

These jar files can be run from the command line, similarly to the Python classifiers above, using the following commands:

java -jar tsml-ee.jar -dp=data/ -rp=results/ -cn="FastEE" -dn="ItalyPowerDemand" -f=0

or

java -jar tsml-forest.jar -dp=data/ -rp=results/ -cn="ProximityForest" -dn="ItalyPowerDemand" -f=0

or

java -jar tsml-forest.jar -dp=data/ -rp=results/ -cn="TS-CHIEF" -dn="ItalyPowerDemand" -f=0

Using Classifiers

Most of our classifiers are available in the aeon Python package.

The classifiers used in our experiments extend the scikit-learn interface and can be used like scikit-learn estimators:

[1]:
import warnings

warnings.filterwarnings("ignore")

from aeon.classification.interval_based import TimeSeriesForestClassifier
from sklearn.metrics import accuracy_score
from tsml.datasets import load_minimal_chinatown

from tsml_eval.estimators import SklearnToTsmlClassifier
from tsml_eval.publications.y2023.tsc_bakeoff import _set_bakeoff_classifier
from tsml_eval.utils.validation import is_sklearn_classifier

Data can be loaded using whichever method is most convenient, but should be formatted as either a 3D numpy array of shape (n_samples, n_channels, n_timesteps) or a list of length n_samples containing 2D numpy arrays of shape (n_channels, n_timesteps).
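For illustration, a minimal sketch of both accepted formats; the arrays here are random placeholders, not from the repo:

import numpy as np

# equal-length data: a 3D array of shape (n_samples, n_channels, n_timesteps)
X_equal = np.random.random((10, 1, 24))

# unequal-length data: a list of 2D arrays, each of shape (n_channels, n_timesteps)
X_unequal = [np.random.random((1, length)) for length in range(20, 30)]

print(X_equal.shape, len(X_unequal), X_unequal[0].shape)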

A function is available for loading from .ts files.

[2]:
# load example classification dataset
X_train, y_train = load_minimal_chinatown("TRAIN")
X_test, y_test = load_minimal_chinatown("TEST")

# data can be loaded from .ts files using the following function
# from tsml.datasets import load_from_ts_file
# X, y = load_from_ts_file("data/data.ts")

print(type(X_train), type(y_train))
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
X_train[:5]
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
(20, 1, 24) (20,)
(20, 1, 24) (20,)
[2]:
array([[[ 573.,  375.,  301.,  212.,   55.,   34.,   25.,   33.,  113.,
          143.,  303.,  615., 1226., 1281., 1221., 1081.,  866., 1096.,
         1039.,  975.,  746.,  581.,  409.,  182.]],

       [[ 394.,  264.,  140.,  144.,  104.,   28.,   28.,   25.,   70.,
          153.,  401.,  649., 1216., 1399., 1249., 1240., 1109., 1137.,
         1290., 1137.,  791.,  638.,  597.,  316.]],

       [[ 603.,  348.,  176.,  177.,   47.,   30.,   40.,   42.,  101.,
          180.,  401.,  777., 1344., 1573., 1408., 1243., 1141., 1178.,
         1256., 1114.,  814.,  635.,  304.,  168.]],

       [[ 428.,  309.,  199.,  117.,   82.,   43.,   24.,   64.,  152.,
          183.,  408.,  797., 1288., 1491., 1523., 1460., 1365., 1520.,
         1700., 1797., 1596., 1139.,  910.,  640.]],

       [[ 372.,  310.,  203.,  133.,   65.,   39.,   27.,   36.,  107.,
          139.,  329.,  651.,  990., 1027., 1041.,  971., 1104.,  844.,
         1023., 1019.,  862.,  643.,  591.,  452.]]])

Classifiers can be built using the fit method and predictions can be made using predict.

[3]:
# build a TSF classifier and make predictions
tsf = TimeSeriesForestClassifier(n_estimators=100, random_state=0)
tsf.fit(X_train, y_train)
tsf.predict(X_test)
[3]:
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 1., 2., 2., 1., 2.,
       2., 2., 2.])

predict_proba can be used to get class probabilities.

[4]:
tsf.predict_proba(X_test)
[4]:
array([[0.92, 0.08],
       [0.82, 0.18],
       [0.85, 0.15],
       [0.97, 0.03],
       [0.85, 0.15],
       [0.83, 0.17],
       [0.96, 0.04],
       [0.91, 0.09],
       [0.89, 0.11],
       [0.87, 0.13],
       [0.11, 0.89],
       [0.16, 0.84],
       [0.52, 0.48],
       [0.2 , 0.8 ],
       [0.07, 0.93],
       [0.97, 0.03],
       [0.11, 0.89],
       [0.  , 1.  ],
       [0.  , 1.  ],
       [0.35, 0.65]])

Here we run some of the classifiers from the publication and report their accuracy on our example dataset.

[5]:
classifiers = [
    "RDST",
    "R-STSF",
    "WEASEL-D",
    "MultiROCKET-Hydra",
]

accuracies = []
for classifier_name in classifiers:
    # Select a classifier by name, see set_bakeoff_classifier.py for options
    classifier = _set_bakeoff_classifier(classifier_name, random_state=0)

    # if it is a sklearn classifier, wrap it to work with time series data
    if is_sklearn_classifier(classifier):
        classifier = SklearnToTsmlClassifier(
            classifier=classifier, concatenate_channels=True, random_state=0
        )

    # fit and predict
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    accuracies.append(accuracy_score(y_test, y_pred))

accuracies
[5]:
[0.9, 0.9, 0.9, 0.85]

Classifier Parameters

| Classifier | Parameters |
| --- | --- |
| 1NN-DTW | Warping window: Full |
| ShapeDTW | Warping window: Full, Subsequence length: 30 |
| EE | Neighbourhood size limit: 0.1, Parameter size limit: 0.5 |
| PF | Number of trees: 100, Number of splits: 5 |
| GRAIL | Classifier: SVM with CV, Kernel: SINK, d: max(min(n * 0.4, 100), 3), f: 0.99 |
| Catch22 | Classifier: Random Forest with 500 trees |
| Signatures | Classifier: Random Forest with 500 trees, Truncation depth: 4, Window: dyadic with depth 3 |
| TSFresh | Classifier: Random Forest with 500 trees, Parameter set: efficient, Feature extraction: FRESH |
| FreshPRINCE | Classifier: Rotation Forest with 200 trees, Parameter set: comprehensive |
| RSF | Number of trees: 100, Shapelets per node: 10 |
| STC | n_shapelet_samples: 10000, max_shapelets: min(10 * n, 1000) |
| MrSQM | Feature selection strategy: RS, Features per representation: 500, Candidate features per representation: 2000, SAX transformers: 0, SFA transformers: 5 |
| RDST | Number of shapelets: 10000, Normalization chance: 0.8, Alpha similarity: 0.5 |
| TSF | Number of trees: 500, Intervals per tree: sqrt(m) |
| RISE | Number of trees: 500, ACF lags: 100 |
| CIF | Number of trees: 500, Intervals per tree: sqrt(m) * sqrt(d), Attributes per tree: 8, Max interval length: m/2 |
| DrCIF | Number of trees: 500, Intervals per representation (rm = representation length): 4 + (sqrt(rm) * sqrt(d)) / 3, Attributes per tree: 10, Max interval length: m/2 |
| STSF | Number of trees: 500 |
| R-STSF | Classifier: Extra trees with 500 trees, Interval extraction runs: 50 |
| QUANT | Classifier: Extra trees with 200 trees, Interval depth: 6, Quantile divisor: 4 |
| BOSS | Max ensemble size: 500, Accuracy threshold: 0.92 |
| cBOSS | Parameter sets sampled: 250, Max ensemble size: 50 |
| TDE | Parameter sets sampled: 250, Max ensemble size: 50 |
| WEASEL v1.0 | Classifier: Logistic regression, Alphabet size: 4, Feature selection threshold: 0.05 |
| WEASEL v2.0 | Classifier: Logistic regression, Alphabet size: 2, Max feature count: 30000 |
| ROCKET | Classifier: Ridge with cross-validation, Number of kernels: 10000 |
| Arsenal | Ensemble size: 25, Number of kernels: 2000 |
| MultiROCKET | Classifier: Ridge with cross-validation, Number of kernels: 10000 |
| MiniROCKET | Classifier: Ridge with cross-validation, Number of kernels: 10000 |
| Hydra | Classifier: Ridge with cross-validation, Number of groups: 64, Kernels per group: 8 |
| MR-Hydra | Classifier: Ridge with cross-validation, MR kernels: 10000, Hydra groups: 64, Hydra kernels per group: 8 |
| CNN | Average pooling size: 3, Batch size: 16, Kernel size: 7, Number of epochs: 2000, Number of layers: 2 |
| ResNet | Batch size: 64, Number of layers: 3, Number of epochs: 1500, Number of residual blocks: 3 |
| InceptionTime | Ensemble size: 5, Batch size: 64, Inception modules: 6, Kernel size: 40, Max pooling size: 3, Number of epochs: 1500, Number of layers: 3, Inception filters: 32 |
| H-InceptionTime | Ensemble size: 5, Batch size: 64, Inception modules: 6, Kernel size: 40, Max pooling size: 3, Number of epochs: 1500, Number of layers: 3, Inception filters: 32 |
| LiteTime | Ensemble size: 5, Batch size: 64, Kernel size: 40, Number of epochs: 1500, Inception filters: 32 |
| TS-CHIEF | Number of trees: 500, EE splitters: 5, RISE splitters: 100, BOSS splitters: 100 |
| HC1 | Alpha: 4 |
| HC2 | Alpha: 4 |
| RIST | Classifier: Extra trees with 500 trees, Number of intervals: sqrt(m) * sqrt(d) * 15 + 5, Number of shapelets: sqrt(m) * 200 + 5 |
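As an example of building one of these configurations directly, the TSF row above can be reproduced with the aeon classifier imported earlier. This is a sketch using the listed tree count, with the remaining parameters left at their defaults:

from aeon.classification.interval_based import TimeSeriesForestClassifier

# TSF as listed in the table: 500 trees (other settings default here)
tsf_paper = TimeSeriesForestClassifier(n_estimators=500, random_state=0)
tsf_paper.fit(X_train, y_train)
print(tsf_paper.predict(X_test))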


Generated using nbsphinx. The Jupyter notebook can be found here.