Bake off redux: a review and experimental evaluation of recent time series classification algorithms
This is the webpage and repo package to support the paper “Bake off redux: a review and experimental evaluation of recent time series classification algorithms” published in Data Mining and Knowledge Discovery.
Our results files are stored here.
Correction: The datasets Covid3Month_disc, FloodModeling1_disc, FloodModeling2_disc and FloodModeling3_disc have been fixed since the original pre-print. Unfortunately, Table 1 and Table C4 retain some values from the previous versions of these datasets. The correct test set sizes for Table 1 are 61, 202, 201 and 184 respectively, and the correct accuracy values for Table C4 can be found here. Please use the updated datasets and results in any paper sourcing results from this publication.
Datasets
The 112 UCR archive datasets are available at the timeseriesclassification.com datasets page.
The 30 new datasets will be uploaded to timeseriesclassification.com in due course. For now, we provide the following link:
https://drive.google.com/file/d/1vuh6mgNrNKjHr9MMRQP0J0_gGA4dE7E3/view?usp=sharing
Install
To install the latest version of the package with up-to-date algorithms, run:
pip install tsml-eval
To install the package at the time of publication, run:
pip install tsml-eval==0.2.1
Not all estimator dependencies are installed by default. You can install these individually as required or use the following dependency groups when installing:
pip install tsml-eval[all_extras,deep_learning]
To install dependency versions used at the time of publication, use the publication requirements.txt:
pip install -r tsml_eval/publications/2023/tsc_bakeoff/static_publication_reqs.txt
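After installing, it can be useful to confirm which version ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is ours and not part of tsml-eval, and `0.2.1` is the publication version quoted above.

```python
from importlib.metadata import PackageNotFoundError, version

PUBLICATION_VERSION = "0.2.1"  # version quoted in the install instructions


def installed_version(package="tsml-eval"):
    """Return the installed version string, or None if it is not installed.

    Hypothetical helper for illustration, not part of tsml-eval.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("tsml-eval is not installed in this environment")
elif found == PUBLICATION_VERSION:
    print("tsml-eval matches the publication version")
else:
    print(f"tsml-eval {found} is installed (publication used {PUBLICATION_VERSION})")
```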
Usage
Command Line
Run run_experiments.py with the following arguments:
Path to the data directory
Path to the results directory
The name of the model to run (see set_bakeoff_classifier.py, e.g. R-STSF, HC2, InceptionTime)
The name of the problem to run
The resample number to run (0 is base train/test split)
For example, to run HIVE-COTE V2 (HC2) on ItalyPowerDemand using the base train/test split:
python tsml_eval/publications/2023/tsc_bakeoff/run_experiments.py data/ results/ HC2 ItalyPowerDemand 0
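When several resamples are needed, the same command can be generated in a loop. The sketch below only assembles the five positional arguments listed above and prints the resulting commands. The helper `build_command` and the choice of three resamples are illustrative assumptions, and the `subprocess.run` call is left commented out so nothing is launched by accident.

```python
import subprocess
import sys

# Script path as given in the command above
SCRIPT = "tsml_eval/publications/2023/tsc_bakeoff/run_experiments.py"


def build_command(data_dir, results_dir, classifier, dataset, resample):
    """Assemble the argument list in the documented order (hypothetical helper)."""
    return [
        sys.executable,
        SCRIPT,
        data_dir,
        results_dir,
        classifier,
        dataset,
        str(resample),
    ]


# Resample 0 is the base train/test split; 1 and 2 are further resamples
commands = [
    build_command("data/", "results/", "HC2", "ItalyPowerDemand", resample)
    for resample in range(3)
]

for cmd in commands:
    print("python " + " ".join(cmd[1:]))
    # Uncomment to actually run the experiment:
    # subprocess.run(cmd, check=True)
```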
Exactly Reproducing Results
To better compare to past results and publications, our results on the 112 UCR datasets use the randomly generated resamples from the Java tsml package. To use these resamples with our code, a flag must be toggled in the main method of the experiments file, and individual files for each resample must be present in the data directory. These resamples in .ts file format are available for download here:
https://drive.google.com/file/d/1V36LSZLAK6FIYRfPx6mmE5euzogcXS83/view?usp=sharing - 112 UCR datasets using Java tsml resamples
The 30 new datasets used in our experiments use the resampling available by default in our experiments file. An exception to this is ProximityForest, which is implemented in Java and uses the Java resampling as a result.
We provide the resample indices used for each dataset for both Java and Python resamplers here:
Python - https://drive.google.com/file/d/1aLBP_nhnoqz075puKg30zuF3F_QBOXYM/view?usp=sharing
Java - https://drive.google.com/file/d/1FsG7Fp74y_TpaPhJ7U066ot8A07BPhr3/view?usp=sharing
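To illustrate what a seeded resample does conceptually, here is a minimal NumPy sketch of a stratified re-split that pools the train and test data and redraws a split with the same per-class train counts. This is not the Java tsml or tsml-eval resampler and will not reproduce the indices linked above; the function name `stratified_resample` and the toy data are assumptions made for illustration only.

```python
import numpy as np


def stratified_resample(X_train, y_train, X_test, y_test, seed):
    """Illustrative stratified re-split; NOT the resampler used in the paper."""
    if seed == 0:
        # Mirror the convention that resample 0 is the base train/test split
        return X_train, y_train, X_test, y_test

    X = np.concatenate([X_train, X_test], axis=0)
    y = np.concatenate([y_train, y_test], axis=0)
    rng = np.random.default_rng(seed)

    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        # Keep the original number of training cases for this class
        n_train = int(np.sum(y_train == label))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])

    train_idx = np.array(train_idx, dtype=int)
    test_idx = np.array(test_idx, dtype=int)
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]


# Toy data in (n_samples, n_channels, n_timesteps) format
toy_X_train = np.zeros((6, 1, 4))
toy_y_train = np.array([0, 0, 0, 1, 1, 1])
toy_X_test = np.ones((4, 1, 4))
toy_y_test = np.array([0, 0, 1, 1])

Xr_train, yr_train, Xr_test, yr_test = stratified_resample(
    toy_X_train, toy_y_train, toy_X_test, toy_y_test, seed=1
)
print(Xr_train.shape, Xr_test.shape)  # (6, 1, 4) (4, 1, 4)
```

Because the split depends only on the seed, calling the function again with the same resample number returns the same split.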
Java Classifier Implementations
Three of the classifiers used in our comparison were implemented in Java due to a lack of Python implementations which function reliably and are capable of accurately reproducing published results. These classifiers are the ElasticEnsemble, ProximityForest and TS-CHIEF. We use the implementations from the Java tsml package from revisions where they are available. We make two jar files available for download which contain the implementations of these classifiers:
https://drive.google.com/file/d/1oXxpSa5PT9sBuVAbt57TLMANv4TMEejI/view?usp=sharing - TS-CHIEF and ProximityForest
https://drive.google.com/file/d/1Vmgg5u7SE2jmsakHVlxPxvT_AfaZ151e/view?usp=sharing - ElasticEnsemble
These jar files can be run from the command line in a similar way to the Python classifiers above, using the following commands:
java -jar tsml-ee.jar -dp=data/ -rp=results/ -cn="FastEE" -dn="ItalyPowerDemand" -f=0
or
java -jar tsml-forest.jar -dp=data/ -rp=results/ -cn="ProximityForest" -dn="ItalyPowerDemand" -f=0
or
java -jar tsml-forest.jar -dp=data/ -rp=results/ -cn="TS-CHIEF" -dn="ItalyPowerDemand" -f=0
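The three commands differ only in the jar file and the classifier name, so they can be generated from a small lookup table. The sketch below builds the argument list for each case without running Java. The mapping is taken from the commands above, while the helper `build_java_command` is a hypothetical name introduced for illustration.

```python
# Which jar provides which classifier, as in the commands above
JAR_FOR_CLASSIFIER = {
    "FastEE": "tsml-ee.jar",
    "ProximityForest": "tsml-forest.jar",
    "TS-CHIEF": "tsml-forest.jar",
}


def build_java_command(
    classifier, dataset, resample, data_dir="data/", results_dir="results/"
):
    """Assemble the java argument list (hypothetical helper, nothing is executed)."""
    jar = JAR_FOR_CLASSIFIER[classifier]
    # No shell quoting is needed when arguments are passed as a list
    return [
        "java",
        "-jar",
        jar,
        f"-dp={data_dir}",
        f"-rp={results_dir}",
        f"-cn={classifier}",
        f"-dn={dataset}",
        f"-f={resample}",
    ]


for name in JAR_FOR_CLASSIFIER:
    print(" ".join(build_java_command(name, "ItalyPowerDemand", 0)))
```

The resulting lists can be passed to `subprocess.run` if Java and the jar files are available locally.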
Using Classifiers
Most of our classifiers are available in the aeon Python package.
The classifiers used in our experiments extend the scikit-learn interface and can be used like scikit-learn estimators:
[1]:
import warnings
warnings.filterwarnings("ignore")
from aeon.classification.interval_based import TimeSeriesForestClassifier
from sklearn.metrics import accuracy_score
from tsml.datasets import load_minimal_chinatown
from tsml_eval.estimators import SklearnToTsmlClassifier
from tsml_eval.publications.y2023.tsc_bakeoff import _set_bakeoff_classifier
from tsml_eval.utils.estimator_validation import is_sklearn_classifier
Data can be loaded using whichever method is most convenient, but should be formatted as either a 3D numpy array of shape (n_samples, n_channels, n_timesteps) or a list of length (n_samples) containing 2D numpy arrays of shape (n_channels, n_timesteps).
A function is available for loading from .ts files.
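As a small illustration of the two accepted formats, the sketch below reshapes a 2D array into the 3D layout and builds the list-of-arrays alternative. It assumes the starting data is a 2D array of shape (n_samples, n_timesteps) holding univariate series; the array contents are placeholders.

```python
import numpy as np

# Assumed starting point: 5 univariate series of length 24 in a 2D array
X_2d = np.arange(5 * 24, dtype=float).reshape(5, 24)

# Insert a channel axis to get (n_samples, n_channels, n_timesteps)
X_3d = X_2d[:, np.newaxis, :]
print(X_3d.shape)  # (5, 1, 24)

# Alternative: a list of 2D arrays, each of shape (n_channels, n_timesteps).
# A list can hold series of different lengths, unlike a single 3D array.
X_list = [np.zeros((1, 24)), np.zeros((1, 30)), np.zeros((1, 18))]
print(len(X_list), [x.shape for x in X_list])
```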
[2]:
# load example classification dataset
X_train, y_train = load_minimal_chinatown("TRAIN")
X_test, y_test = load_minimal_chinatown("TEST")
# data can be loaded from .ts files using the following function
# from tsml.datasets import load_from_ts_file
# X, y = load_from_ts_file("data/data.ts")
print(type(X_train), type(y_train))
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
X_train[:5]
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
(20, 1, 24) (20,)
(20, 1, 24) (20,)
[2]:
array([[[ 573., 375., 301., 212., 55., 34., 25., 33., 113.,
143., 303., 615., 1226., 1281., 1221., 1081., 866., 1096.,
1039., 975., 746., 581., 409., 182.]],
[[ 394., 264., 140., 144., 104., 28., 28., 25., 70.,
153., 401., 649., 1216., 1399., 1249., 1240., 1109., 1137.,
1290., 1137., 791., 638., 597., 316.]],
[[ 603., 348., 176., 177., 47., 30., 40., 42., 101.,
180., 401., 777., 1344., 1573., 1408., 1243., 1141., 1178.,
1256., 1114., 814., 635., 304., 168.]],
[[ 428., 309., 199., 117., 82., 43., 24., 64., 152.,
183., 408., 797., 1288., 1491., 1523., 1460., 1365., 1520.,
1700., 1797., 1596., 1139., 910., 640.]],
[[ 372., 310., 203., 133., 65., 39., 27., 36., 107.,
139., 329., 651., 990., 1027., 1041., 971., 1104., 844.,
1023., 1019., 862., 643., 591., 452.]]])
Classifiers can be built using the fit method and predictions can be made using predict.
[3]:
# build a TSF classifier and make predictions
tsf = TimeSeriesForestClassifier(n_estimators=100, random_state=0)
tsf.fit(X_train, y_train)
tsf.predict(X_test)
[3]:
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 1., 2., 2., 1., 2.,
2., 2., 2.])
predict_proba can be used to get class probabilities.
[4]:
tsf.predict_proba(X_test)
[4]:
array([[0.92, 0.08],
[0.82, 0.18],
[0.85, 0.15],
[0.97, 0.03],
[0.85, 0.15],
[0.83, 0.17],
[0.96, 0.04],
[0.91, 0.09],
[0.89, 0.11],
[0.87, 0.13],
[0.11, 0.89],
[0.16, 0.84],
[0.52, 0.48],
[0.2 , 0.8 ],
[0.07, 0.93],
[0.97, 0.03],
[0.11, 0.89],
[0. , 1. ],
[0. , 1. ],
[0.35, 0.65]])
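The probabilities relate to the predicted labels in the usual way: the predicted class is the one with the highest probability. The sketch below checks this on four rows copied from the output above (rows 0, 10, 12 and 17). It assumes the columns follow the class order [1, 2], as in the scikit-learn `classes_` convention; whether every classifier derives its predictions exactly like this is also an assumption.

```python
import numpy as np

# Assumed class label order for the probability columns
classes = np.array([1.0, 2.0])

# Rows 0, 10, 12 and 17 of the predict_proba output shown above
proba = np.array(
    [
        [0.92, 0.08],
        [0.11, 0.89],
        [0.52, 0.48],
        [0.0, 1.0],
    ]
)

# Pick the most probable class for each case
labels = classes[np.argmax(proba, axis=1)]
print(labels)  # [1. 2. 1. 2.]
```

These labels match the corresponding entries of the predict output shown earlier.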
Here we run some of the classifiers from the publication and find the accuracy for them on our example dataset.
[5]:
classifiers = [
    "RDST",
    "R-STSF",
    "WEASEL-D",
    "MultiROCKET-Hydra",
]
accuracies = []
for classifier_name in classifiers:
    # Select a classifier by name, see set_bakeoff_classifier.py for options
    classifier = _set_bakeoff_classifier(classifier_name, random_state=0)
    # if it is a sklearn classifier, wrap it to work with time series data
    if is_sklearn_classifier(classifier):
        classifier = SklearnToTsmlClassifier(
            classifier=classifier, concatenate_channels=True, random_state=0
        )
    # fit and predict
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    accuracies.append(accuracy_score(y_test, y_pred))
accuracies
[5]:
[0.9, 0.9, 0.9, 0.85]
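Accuracy here is simply the proportion of test cases predicted correctly, so with 20 test cases a score of 0.9 corresponds to 18 correct predictions. The sketch below shows the calculation with NumPy on made-up labels; the label vectors are hypothetical and are not the actual Chinatown predictions.

```python
import numpy as np

# Hypothetical labels for 20 test cases (not the real dataset labels)
y_true = np.array([1] * 10 + [2] * 10)
y_pred = y_true.copy()
y_pred[[3, 15]] = [2, 1]  # introduce two mistakes

# Proportion of matching labels, the quantity accuracy_score reports
accuracy = np.mean(y_true == y_pred)
print(accuracy)  # 0.9
```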
Classifier Parameters
| Classifier | Parameters |
|---|---|
| 1NN-DTW | Warping window: Full |
| ShapeDTW | Warping window: Full, Subsequence length: 30 |
| EE | Neighbourhood size limit: 0.1, Parameter size limit: 0.5 |
| PF | Number of trees: 100, Number of splits: 5 |
| GRAIL | Classifier: SVM with CV, kernel: SINK, d: max(min(n * 0.4, 100), 3), f: 0.99 |
| Catch22 | Classifier: Random Forest with 500 trees |
| Signatures | Classifier: Random Forest with 500 trees, Truncation depth: 4, Window: dyadic with depth 3 |
| TSFresh | Classifier: Random Forest with 500 trees, Parameter set: efficient, Feature extraction: FRESH |
| FreshPRINCE | Classifier: Rotation Forest with 200 trees, Parameter set: comprehensive |
| RSF | Number of trees: 100, Shapelets per node: 10 |
| STC | n_shapelet_samples: 10000, max_shapelets: min(10 * n, 1000) |
| MrSQM | Feature selection strategy: RS, Features per representation: 500, Candidate features per representation: 2000, SAX transformers: 0, SFA transformers: 5 |
| RDST | Number of shapelets: 10000, Normalization chance: 0.8, Alpha similarity: 0.5 |
| TSF | Number of trees: 500, Intervals per tree: sqrt(m) |
| RISE | Number of trees: 500, ACF lags: 100 |
| CIF | Number of trees: 500, Intervals per tree: sqrt(m)*sqrt(d), Attributes per tree: 8, Max interval length: m/2 |
| DrCIF | Number of trees: 500, Intervals per representation (rm=representation length): 4+(sqrt(rm)*sqrt(d))/3, Attributes per tree: 10, Max interval length: m/2 |
| STSF | Number of trees: 500 |
| R-STSF | Classifier: Extra trees with 500 trees, Interval extraction runs: 50 |
| QUANT | Classifier: Extra trees with 200 trees, Interval depth: 6, Quantile divisor: 4 |
| BOSS | Max ensemble size: 500, Accuracy threshold: 0.92 |
| cBOSS | Parameter sets sampled: 250, Max ensemble size: 50 |
| TDE | Parameter sets sampled: 250, Max ensemble size: 50 |
| WEASEL v1.0 | Classifier: Logistic regression, Alphabet size: 4, Feature selection threshold: 0.05 |
| WEASEL v2.0 | Classifier: Logistic regression, Alphabet size: 2, Max feature count: 30000 |
| ROCKET | Classifier: Ridge with cross-validation, Number of kernels: 10000 |
| Arsenal | Ensemble size: 25, Number of kernels: 2000 |
| MultiROCKET | Classifier: Ridge with cross-validation, Number of kernels: 10000 |
| MiniROCKET | Classifier: Ridge with cross-validation, Number of kernels: 10000 |
| Hydra | Classifier: Ridge with cross-validation, Number of groups: 64, Kernels per group: 8 |
| MR-Hydra | Classifier: Ridge with cross-validation, MR kernels: 10000, g: Hydra groups, Hydra kernels per group: 8 |
| CNN | Average pooling size: 3, Batch size: 16, Kernel size: 7, Number of epochs: 2000, Number of layers: 2 |
| ResNet | Batch size: 64, Number of layers: 3, Number of epochs: 1500, n_residual_blocks: 3 |
| InceptionTime | Ensemble size: 5, Batch size: 64, Inception modules: 6, Kernel size: 40, Max pooling size: 3, Number of epochs: 1500, Number of layers: 3, Inception filters: 32 |
| H-InceptionTime | Ensemble size: 5, Batch size: 64, Inception modules: 6, Kernel size: 40, Max pooling size: 3, Number of epochs: 1500, Number of layers: 3, Inception filters: 32 |
| LiteTime | Ensemble size: 5, Batch size: 64, Kernel size: 40, Number of epochs: 1500, Inception filters: 32 |
| TS-CHIEF | Number of trees: 500, EE splitters: 5, RISE splitters: 100, BOSS splitters: 100 |
| HC1 | Alpha: 4 |
| HC2 | Alpha: 4 |
| RIST | Classifier: Extra trees with 500 trees, Number of intervals: sqrt(m) * sqrt(d) * 15 + 5, Number of shapelets: sqrt(m) * 200 + 5 |
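Several entries above are formulas rather than fixed values. The sketch below evaluates a few of them for a dataset shaped like the Chinatown example used earlier (20 training cases, 1 channel, series length 24). It assumes that m is the series length, d the number of channels and n the number of training cases, that rm equals m for the untransformed series, and that fractional results are truncated to integers; the actual implementations may define or round these differently. The function name `derived_parameters` is hypothetical.

```python
import math


def derived_parameters(m, d, n):
    """Evaluate some size-dependent formulas from the parameter table.

    Assumes m = series length, d = channels, n = training cases, and
    truncation to int; these are illustrative assumptions.
    """
    sqrt_m, sqrt_d = math.sqrt(m), math.sqrt(d)
    return {
        "TSF intervals per tree": int(sqrt_m),
        "CIF intervals per tree": int(sqrt_m * sqrt_d),
        "DrCIF intervals per representation": int(4 + (sqrt_m * sqrt_d) / 3),
        "RIST number of intervals": int(sqrt_m * sqrt_d * 15 + 5),
        "RIST number of shapelets": int(sqrt_m * 200 + 5),
        "STC max_shapelets": min(10 * n, 1000),
    }


params = derived_parameters(m=24, d=1, n=20)
for name, value in params.items():
    print(f"{name}: {value}")
```

For these dimensions the sketch gives 4 intervals per tree for TSF and CIF, 5 for DrCIF, 78 intervals and 984 shapelets for RIST, and 200 for the STC shapelet limit.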