Extracting Features from Random Subseries: A Hybrid Pipeline for Time Series Classification and Extrinsic Regression¶
This is the companion webpage and repository for the paper “Extracting Features from Random Subseries: A Hybrid Pipeline for Time Series Classification and Extrinsic Regression”, submitted to the International Workshop on Advanced Analytics and Learning on Temporal Data (AALTD) 2023.
Our results files are stored here.
Datasets¶
The 112 UCR archive datasets are available at timeseriesclassification.com.
The 63 regression datasets are available at the archive expansion webpage.
Install¶
To install the latest version of the package with up-to-date algorithms, run:
pip install tsml-eval
To install the package at the time of publication, run:
pip install tsml-eval==0.1.0
Not all estimator dependencies are installed by default. You can install these individually as required or use the following dependency groups when installing:
pip install tsml-eval[all_extras,deep_learning]
RIST requires the pycatch22 and pyfftw packages. To install these, run:
pip install pycatch22 pyfftw
These packages can be unstable on some setups. If you cannot install them, they can be disabled via the classifier parameters (note that this will change the results produced), e.g.
RISTClassifier(use_pycatch22=False, use_pyfftw=False)
To install dependency versions used at the time of publication, use the publication requirements.txt:
pip install -r tsml_eval/publications/2023/rist_pipeline/static_publication_reqs.txt
Usage¶
Command Line¶
Run run_classification_experiments.py or run_regression_experiments.py with the following arguments:
Path to the data directory
Path to the results directory
The name of the model to run (see set_rist_classifier.py or set_rist_regressor.py, e.g. RIST, RDST, DrCIF)
The name of the problem to run
The resample number to run (0 is base train/test split)
For example, to run the ItalyPowerDemand classification problem using RIST on the base train/test split:
python tsml_eval/publications/2023/rist_pipeline/run_classification_experiments.py data/ results/ RIST ItalyPowerDemand 0
Using Classifiers and Regressors¶
Most of our classifiers are available in the aeon Python package. The classifiers and regressors used in our experiments extend the scikit-learn interface and can also be used like scikit-learn estimators:
[7]:
import warnings
warnings.filterwarnings("ignore")
from aeon.classification.hybrid import RISTClassifier
from aeon.regression.hybrid import RISTRegressor
from sklearn.metrics import accuracy_score, mean_squared_error
from tsml.datasets import load_minimal_chinatown, load_minimal_gas_prices
Data can be loaded using whichever method is most convenient, but should be formatted as either a 3D numpy array of shape (n_samples, n_channels, n_timesteps) or a list of length n_samples containing 2D numpy arrays of shape (n_channels, n_timesteps).
A function is available for loading from .ts files.
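If you already have data in memory, the following is a minimal sketch of the two accepted input formats, using random data purely for illustration:

import numpy as np

rng = np.random.default_rng(0)

# equal-length series: a 3D array of shape (n_samples, n_channels, n_timesteps)
X_equal = rng.random((10, 1, 24))

# unequal-length series: a list of length n_samples containing
# 2D arrays of shape (n_channels, n_timesteps)
X_unequal = [rng.random((1, length)) for length in (20, 24, 30)]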
[8]:
# load example classification dataset
X_train_c, y_train_c = load_minimal_chinatown("TRAIN")
X_test_c, y_test_c = load_minimal_chinatown("TEST")
# load example regression dataset
X_train_r, y_train_r = load_minimal_gas_prices("TRAIN")
X_test_r, y_test_r = load_minimal_gas_prices("TEST")
# data can be loaded from .ts files using the following function
# from tsml.datasets import load_from_ts_file
# X, y = load_from_ts_file("data/data.ts")
print(type(X_train_c), type(y_train_c))
print(X_train_c.shape, y_train_c.shape)
print(X_test_c.shape, y_test_c.shape)
X_train_c[:5]
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
(20, 1, 24) (20,)
(20, 1, 24) (20,)
[8]:
array([[[ 573., 375., 301., 212., 55., 34., 25., 33., 113.,
143., 303., 615., 1226., 1281., 1221., 1081., 866., 1096.,
1039., 975., 746., 581., 409., 182.]],
[[ 394., 264., 140., 144., 104., 28., 28., 25., 70.,
153., 401., 649., 1216., 1399., 1249., 1240., 1109., 1137.,
1290., 1137., 791., 638., 597., 316.]],
[[ 603., 348., 176., 177., 47., 30., 40., 42., 101.,
180., 401., 777., 1344., 1573., 1408., 1243., 1141., 1178.,
1256., 1114., 814., 635., 304., 168.]],
[[ 428., 309., 199., 117., 82., 43., 24., 64., 152.,
183., 408., 797., 1288., 1491., 1523., 1460., 1365., 1520.,
1700., 1797., 1596., 1139., 910., 640.]],
[[ 372., 310., 203., 133., 65., 39., 27., 36., 107.,
139., 329., 651., 990., 1027., 1041., 971., 1104., 844.,
1023., 1019., 862., 643., 591., 452.]]])
Classifiers and regressors can be built using the fit method, and predictions can be made using predict. predict_proba can be used to get class probabilities for classifiers.
Here we run the RIST classifier and regressor from the publication, reporting the accuracy and RMSE respectively on our example data.
[9]:
rist_c = RISTClassifier(random_state=0)
rist_c.fit(X_train_c, y_train_c)
y_pred_c = rist_c.predict(X_test_c)
y_pred_c
[9]:
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 1., 2., 2., 1., 2.,
2., 2., 2.])
[10]:
accuracy_score(y_test_c, y_pred_c)
[10]:
0.9
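As mentioned above, predict_proba can be used to obtain class probabilities. A minimal sketch using the classifier fitted above:

# class probability estimates for the test set,
# one row per case and one column per class
y_proba_c = rist_c.predict_proba(X_test_c)
y_proba_c[:5]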
[11]:
rist_r = RISTRegressor(random_state=0)
rist_r.fit(X_train_r, y_train_r)
y_pred_r = rist_r.predict(X_test_r)
y_pred_r
[11]:
array([-0.31689489, -0.31613551, -0.32835623, -0.39940986, -0.30016315,
-0.31231658, -0.25754774, -0.28900786, -0.31202351, -0.3132342 ,
-0.27315226, -0.38427014, -0.32339463, -0.26477721, -0.32560753,
-0.30756101, -0.30214585, -0.40835526, -0.38768561, -0.39179725])
[12]:
mean_squared_error(y_test_r, y_pred_r, squared=False)
[12]:
0.10593838895386118
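Note that the squared argument of mean_squared_error is deprecated in newer scikit-learn versions (deprecated in 1.4, removed in 1.6). With a recent install, the RMSE above can instead be computed as:

from sklearn.metrics import root_mean_squared_error

# RMSE using the dedicated function available from scikit-learn 1.4 onwards
root_mean_squared_error(y_test_r, y_pred_r)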