Bake off redux: a review and experimental evaluation of recent time series classification algorithms

This is the webpage and repo package to support the paper “Bake off redux: a review and experimental evaluation of recent time series classification algorithms” published in Data Mining and Knowledge Discovery.

Our results files are stored here.

Correction:
The datasets Covid3Month_disc, FloodModeling1_disc, FloodModeling2_disc and FloodModeling3_disc have been fixed since the original pre-print. Unfortunately, Table 1 and Table C4 retain some values from the previous versions of these datasets.
The correct test set sizes for Table 1 are 61, 202, 201 and 184 respectively, and the correct accuracy values for Table C4 can be found here. Please use the updated datasets and results in any paper sourcing results from this publication.

Datasets

The 112 UCR archive datasets are available at the timeseriesclassification.com datasets page.

The 30 new datasets will be uploaded to timeseriesclassification.com in due course. For now, we provide the following link:

https://drive.google.com/file/d/1vuh6mgNrNKjHr9MMRQP0J0_gGA4dE7E3/view?usp=sharing

Install

To install the latest version of the package with up-to-date algorithms, run:

pip install tsml-eval

To install the package at the time of publication, run:

pip install tsml-eval==0.2.1

Not all estimator dependencies are installed by default. You can install these individually as required or use the following dependency groups when installing:

pip install tsml-eval[all_extras,deep_learning]

To install the dependency versions used at the time of publication, use the publication requirements file:

pip install -r tsml_eval/publications/2023/tsc_bakeoff/static_publication_reqs.txt

Usage

Command Line

Run run_experiments.py with the following arguments:

  1. Path to the data directory

  2. Path to the results directory

  3. The name of the model to run (see set_bakeoff_classifier.py, e.g. R-STSF, HC2, InceptionTime)

  4. The name of the problem to run

  5. The resample number to run (0 is the base train/test split)

e.g. to run ItalyPowerDemand using HIVE-COTE V2 (HC2) on the base train/test split:

python tsml_eval/publications/2023/tsc_bakeoff/run_experiments.py data/ results/ HC2 ItalyPowerDemand 0
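Runs over multiple resamples can be scripted. A minimal sketch using Python's subprocess module; the paths and names below mirror the example above and are illustrative:

import subprocess

script = "tsml_eval/publications/2023/tsc_bakeoff/run_experiments.py"
for resample in range(5):  # resample 0 is the base train/test split
    subprocess.run(
        ["python", script, "data/", "results/", "HC2", "ItalyPowerDemand", str(resample)],
        check=True,
    )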

Exactly Reproducing Results

To better compare to past results and publications, our results on the 112 UCR datasets use the randomly generated resamples from the Java tsml package. To use these resamples with our code, a flag must be toggled in the main method of the experiments file, and individual files for each resample must be present in the data directory. These resamples are available for download in .ts file format here:

https://drive.google.com/file/d/1V36LSZLAK6FIYRfPx6mmE5euzogcXS83/view?usp=sharing - 112 UCR datasets using Java tsml resamples

The 30 new datasets used in our experiments use the resampling available by default in our experiments file. An exception is ProximityForest, which is implemented in Java and therefore uses the Java resampling.

We provide the resample indices used for each dataset for both Java and Python resamplers here:

Python - https://drive.google.com/file/d/1aLBP_nhnoqz075puKg30zuF3F_QBOXYM/view?usp=sharing

Java - https://drive.google.com/file/d/1FsG7Fp74y_TpaPhJ7U066ot8A07BPhr3/view?usp=sharing
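As a sanity check before toggling the flag, you can verify that the individual per-resample files are present in the data directory. A minimal sketch, assuming a <dataset><resample>_TRAIN.ts naming scheme inside each dataset folder (this naming is an assumption; check the extracted archive for the exact layout):

import os

data_dir = "data/"
dataset = "ItalyPowerDemand"
resample = 1  # resample 0 is the original train/test split

for split in ("TRAIN", "TEST"):
    # assumed naming: e.g. ItalyPowerDemand1_TRAIN.ts in the dataset folder
    path = os.path.join(data_dir, dataset, f"{dataset}{resample}_{split}.ts")
    print(path, "found" if os.path.exists(path) else "missing")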

Java Classifier Implementations

Three of the classifiers used in our comparison are implemented in Java due to the lack of Python implementations that function reliably and accurately reproduce published results: ElasticEnsemble, ProximityForest and TS-CHIEF. We use the implementations from the Java tsml package, taken from the revisions where they are available. We make two jar files available for download which contain these implementations:

https://drive.google.com/file/d/1oXxpSa5PT9sBuVAbt57TLMANv4TMEejI/view?usp=sharing - TS-CHIEF and ProximityForest

https://drive.google.com/file/d/1Vmgg5u7SE2jmsakHVlxPxvT_AfaZ151e/view?usp=sharing - ElasticEnsemble

These jar files can be run from the command line, similarly to the Python classifiers above, using the following commands:

java -jar tsml-ee.jar -dp=data/ -rp=results/ -cn="FastEE" -dn="ItalyPowerDemand" -f=0

or

java -jar tsml-forest.jar -dp=data/ -rp=results/ -cn="ProximityForest" -dn="ItalyPowerDemand" -f=0

or

java -jar tsml-forest.jar -dp=data/ -rp=results/ -cn="TS-CHIEF" -dn="ItalyPowerDemand" -f=0

Using Classifiers

Most of our classifiers are available in the aeon Python package.

The classifiers used in our experiments extend the scikit-learn interface and can be used like scikit-learn estimators:

[1]:
import warnings

warnings.filterwarnings("ignore")

from aeon.classification.interval_based import TimeSeriesForestClassifier
from sklearn.metrics import accuracy_score
from tsml.datasets import load_minimal_chinatown

from tsml_eval.estimators import SklearnToTsmlClassifier
from tsml_eval.publications.y2023.tsc_bakeoff import _set_bakeoff_classifier
from tsml_eval.utils.validation import is_sklearn_classifier

Data can be loaded using whichever method is most convenient, but should be formatted as either a 3D numpy array of shape (n_samples, n_channels, n_timesteps) or a list of length n_samples containing 2D numpy arrays of shape (n_channels, n_timesteps).
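For illustration, a minimal sketch of both accepted formats; the arrays here are random placeholders, not from the repo:

import numpy as np

# equal-length data: a 3D array of shape (n_samples, n_channels, n_timesteps)
X_equal = np.random.random((10, 1, 24))

# unequal-length data: a list of 2D arrays, each of shape (n_channels, n_timesteps)
X_unequal = [np.random.random((1, length)) for length in range(20, 30)]

print(X_equal.shape, len(X_unequal), X_unequal[0].shape)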

A function is available for loading from .ts files.

[2]:
# load example classification dataset
X_train, y_train = load_minimal_chinatown("TRAIN")
X_test, y_test = load_minimal_chinatown("TEST")

# data can be loaded from .ts files using the following function
# from tsml.datasets import load_from_ts_file
# X, y = load_from_ts_file("data/data.ts")

print(type(X_train), type(y_train))
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
X_train[:5]
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
(20, 1, 24) (20,)
(20, 1, 24) (20,)
[2]:
array([[[ 573.,  375.,  301.,  212.,   55.,   34.,   25.,   33.,  113.,
          143.,  303.,  615., 1226., 1281., 1221., 1081.,  866., 1096.,
         1039.,  975.,  746.,  581.,  409.,  182.]],

       [[ 394.,  264.,  140.,  144.,  104.,   28.,   28.,   25.,   70.,
          153.,  401.,  649., 1216., 1399., 1249., 1240., 1109., 1137.,
         1290., 1137.,  791.,  638.,  597.,  316.]],

       [[ 603.,  348.,  176.,  177.,   47.,   30.,   40.,   42.,  101.,
          180.,  401.,  777., 1344., 1573., 1408., 1243., 1141., 1178.,
         1256., 1114.,  814.,  635.,  304.,  168.]],

       [[ 428.,  309.,  199.,  117.,   82.,   43.,   24.,   64.,  152.,
          183.,  408.,  797., 1288., 1491., 1523., 1460., 1365., 1520.,
         1700., 1797., 1596., 1139.,  910.,  640.]],

       [[ 372.,  310.,  203.,  133.,   65.,   39.,   27.,   36.,  107.,
          139.,  329.,  651.,  990., 1027., 1041.,  971., 1104.,  844.,
         1023., 1019.,  862.,  643.,  591.,  452.]]])

Classifiers can be built using the fit method and predictions can be made using predict.

[3]:
# build a TSF classifier and make predictions
tsf = TimeSeriesForestClassifier(n_estimators=100, random_state=0)
tsf.fit(X_train, y_train)
tsf.predict(X_test)
[3]:
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 1., 2., 2., 1., 2.,
       2., 2., 2.])

predict_proba can be used to get class probabilities.

[4]:
tsf.predict_proba(X_test)
[4]:
array([[0.92, 0.08],
       [0.82, 0.18],
       [0.85, 0.15],
       [0.97, 0.03],
       [0.85, 0.15],
       [0.83, 0.17],
       [0.96, 0.04],
       [0.91, 0.09],
       [0.89, 0.11],
       [0.87, 0.13],
       [0.11, 0.89],
       [0.16, 0.84],
       [0.52, 0.48],
       [0.2 , 0.8 ],
       [0.07, 0.93],
       [0.97, 0.03],
       [0.11, 0.89],
       [0.  , 1.  ],
       [0.  , 1.  ],
       [0.35, 0.65]])

Here we run some of the classifiers from the publication and report their accuracy on our example dataset.

[5]:
classifiers = [
    "RDST",
    "R-STSF",
    "WEASEL-D",
    "MultiROCKET-Hydra",
]

accuracies = []
for classifier_name in classifiers:
    # Select a classifier by name, see set_bakeoff_classifier.py for options
    classifier = _set_bakeoff_classifier(classifier_name, random_state=0)

    # if it is a sklearn classifier, wrap it to work with time series data
    if is_sklearn_classifier(classifier):
        classifier = SklearnToTsmlClassifier(
            classifier=classifier, concatenate_channels=True, random_state=0
        )

    # fit and predict
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    accuracies.append(accuracy_score(y_test, y_pred))

accuracies
[5]:
[0.9, 0.9, 0.9, 0.85]

Classifier Parameters

| Classifier | Parameters |
| --- | --- |
| 1NN-DTW | Warping window: Full |
| ShapeDTW | Warping window: Full, Subsequence length: 30 |
| EE | Neighbourhood size limit: 0.1, Parameter size limit: 0.5 |
| PF | Number of trees: 100, Number of splits: 5 |
| GRAIL | Classifier: SVM with CV, Kernel: SINK, d: max(min(n * 0.4, 100), 3), f: 0.99 |
| Catch22 | Classifier: Random Forest with 500 trees |
| Signatures | Classifier: Random Forest with 500 trees, Truncation depth: 4, Window: dyadic with depth 3 |
| TSFresh | Classifier: Random Forest with 500 trees, Parameter set: efficient, Feature extraction: FRESH |
| FreshPRINCE | Classifier: Rotation Forest with 200 trees, Parameter set: comprehensive |
| RSF | Number of trees: 100, Shapelets per node: 10 |
| STC | n_shapelet_samples: 10000, max_shapelets: min(10 * n, 1000) |
| MrSQM | Feature selection strategy: RS, Features per representation: 500, Candidate features per representation: 2000, SAX transformers: 0, SFA transformers: 5 |
| RDST | Number of shapelets: 10000, Normalization chance: 0.8, Alpha similarity: 0.5 |
| TSF | Number of trees: 500, Intervals per tree: sqrt(m) |
| RISE | Number of trees: 500, ACF lags: 100 |
| CIF | Number of trees: 500, Intervals per tree: sqrt(m) * sqrt(d), Attributes per tree: 8, Max interval length: m/2 |
| DrCIF | Number of trees: 500, Intervals per representation (rm = representation length): 4 + (sqrt(rm) * sqrt(d)) / 3, Attributes per tree: 10, Max interval length: m/2 |
| STSF | Number of trees: 500 |
| R-STSF | Classifier: Extra trees with 500 trees, Interval extraction runs: 50 |
| QUANT | Classifier: Extra trees with 200 trees, Interval depth: 6, Quantile divisor: 4 |
| BOSS | Max ensemble size: 500, Accuracy threshold: 0.92 |
| cBOSS | Parameter sets sampled: 250, Max ensemble size: 50 |
| TDE | Parameter sets sampled: 250, Max ensemble size: 50 |
| WEASEL v1.0 | Classifier: Logistic regression, Alphabet size: 4, Feature selection threshold: 0.05 |
| WEASEL v2.0 | Classifier: Logistic regression, Alphabet size: 2, Max feature count: 30000 |
| ROCKET | Classifier: Ridge with cross-validation, Number of kernels: 10000 |
| Arsenal | Ensemble size: 25, Number of kernels: 2000 |
| MultiROCKET | Classifier: Ridge with cross-validation, Number of kernels: 10000 |
| MiniROCKET | Classifier: Ridge with cross-validation, Number of kernels: 10000 |
| Hydra | Classifier: Ridge with cross-validation, Number of groups: 64, Kernels per group: 8 |
| MR-Hydra | Classifier: Ridge with cross-validation, MR kernels: 10000, Hydra groups: 64, Hydra kernels per group: 8 |
| CNN | Average pooling size: 3, Batch size: 16, Kernel size: 7, Number of epochs: 2000, Number of layers: 2 |
| ResNet | Batch size: 64, Number of layers: 3, Number of epochs: 1500, Number of residual blocks: 3 |
| InceptionTime | Ensemble size: 5, Batch size: 64, Inception modules: 6, Kernel size: 40, Max pooling size: 3, Number of epochs: 1500, Number of layers: 3, Inception filters: 32 |
| H-InceptionTime | Ensemble size: 5, Batch size: 64, Inception modules: 6, Kernel size: 40, Max pooling size: 3, Number of epochs: 1500, Number of layers: 3, Inception filters: 32 |
| LiteTime | Ensemble size: 5, Batch size: 64, Kernel size: 40, Number of epochs: 1500, Inception filters: 32 |
| TS-CHIEF | Number of trees: 500, EE splitters: 5, RISE splitters: 100, BOSS splitters: 100 |
| HC1 | Alpha: 4 |
| HC2 | Alpha: 4 |
| RIST | Classifier: Extra trees with 500 trees, Number of intervals: sqrt(m) * sqrt(d) * 15 + 5, Number of shapelets: sqrt(m) * 200 + 5 |
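As an example of building one of these configurations directly, the TSF row above can be reproduced with the aeon classifier imported earlier. This is a sketch using the listed tree count, with the remaining parameters left at their defaults:

from aeon.classification.interval_based import TimeSeriesForestClassifier

# TSF as listed in the table: 500 trees (other settings default here)
tsf_paper = TimeSeriesForestClassifier(n_estimators=500, random_state=0)
tsf_paper.fit(X_train, y_train)
print(tsf_paper.predict(X_test))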


Generated using nbsphinx. The Jupyter notebook can be found here.