tsml-eval Results Format

tsml-eval experiment functions output a .csv results file in the tsml format. This file can contain information about the experiment, such as the predictions made by the algorithm and the time taken to run the experiment. These results files can be used in the evaluation module to compare the performance of different algorithms.

While the results files have some common characteristics, each learning task has its own format. This notebook outlines the standard parts of the results files, then goes over the individual format for each learning task.

Overview

All results files follow the same general format:

  • Line 1 contains information about the experiment being run; the items are the same for all results files:

    • dataset - the name of the dataset used in the experiment.

    • algorithm - the name of the algorithm used in the experiment.

    • split - the split of the dataset used in the experiment (i.e. train or test).

    • resample id - the dataset resample id or random seed used in the experiment.

    • time unit - the time unit used to measure the experiment (e.g. milliseconds or nanoseconds).

    • description - a description of the experiment. This is optional and can contain commas.

  • Line 2 contains parameter information from the estimator used in the experiment. This is specific to the estimator used and is generally free form, containing both input parameters and relevant information from fit if required. This line can contain any number of commas.

  • Line 3 contains results from the experiment. This is specific to the learning task and will be covered in the following sections, but will contain the following as the first 5 items:

    • performance - a commonly used performance metric for the task (e.g. accuracy or MSE).

    • fit time - the time taken to fit the model.

    • predict time - the time taken to make predictions for all cases in the file.

    • benchmark time - the time taken to run a simple function (e.g. sorting n arrays), used as a hardware benchmark.

    • memory usage - the memory usage of fit. In tsml-eval experiments this is the maximum memory usage recorded during the fit process.

  • Remaining lines record the results for each case. There should be as many lines as there are cases in the dataset, and the ordering of the cases should match that of the dataset used. The format of these lines is again specific to the learning task and will be covered in the following sections. The lines generally follow the format below, although some tasks do not record probabilities:

    • target - the target variable for the case.

    • prediction - the estimator prediction for the case.

    • probabilities - the estimator probabilities for the case. This is not available for all tasks. Preceded by an extra comma.

    • prediction time - the time to make the prediction. Optional and preceded by an extra comma.

    • description - additional details for the case. Optional and preceded by an extra comma but requires prediction time.

Some results can hold default/missing values such as -1 if they are not recorded. This is sometimes the case for metrics such as memory usage and benchmark time.

In the following, we provide an example of generating a classification results file.

[ ]:
from tsml.datasets import load_minimal_chinatown
from tsml.dummy import DummyClassifier

import tsml_eval.experiments.experiments as experiments
from tsml_eval.evaluation.storage import (
    load_classifier_results,
    load_clusterer_results,
    load_forecaster_results,
    load_regressor_results,
)
from tsml_eval.experiments import run_classification_experiment
[1]:
experiments.MEMRECORD_INTERVAL = 0.1

X_train, y_train = load_minimal_chinatown(split="train")
X_test, y_test = load_minimal_chinatown(split="test")
classifier = DummyClassifier(strategy="stratified", random_state=0)

run_classification_experiment(
    X_train,
    y_train,
    X_test,
    y_test,
    classifier,
    "./generated_results/",
    dataset_name="Chinatown",
    resample_id=0,
)

with open(
    "./generated_results/DummyClassifier/Predictions/Chinatown/testResample0.csv", "r"
) as f:
    for _ in range(7):
        print(f.readline().strip())
Chinatown,DummyClassifier,TEST,0,MILLISECONDS,Generated by run_classification_experiment on 11/22/2023, 19:30:49. Encoder dictionary: {1.0: 0, 2.0: 1}
{'constant': None, 'random_state': 0, 'strategy': 'stratified', 'validate': False}
0.5,0,0,755,45056,2,,-1,-1
0,0,,1.0,0.0
0,0,,1.0,0.0
0,0,,1.0,0.0
0,0,,1.0,0.0
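
The header lines printed above can also be split programmatically. The following is a minimal sketch rather than tsml-eval's own loader: `parse_first_line` and `SHARED_FIELDS` are hypothetical names introduced for illustration, and the two strings are copied from the example output above.

```python
# Names for the first five items of line 3, shared by every learning task.
# These names are illustrative, not attributes defined by tsml-eval.
SHARED_FIELDS = (
    "performance",
    "fit_time",
    "predict_time",
    "benchmark_time",
    "memory_usage",
)


def parse_first_line(line):
    """Split line 1 into its six items (hypothetical helper)."""
    # The description is optional and may contain commas, so split at most
    # five times and keep everything after the fifth comma together.
    parts = line.strip().split(",", 5)
    keys = ("dataset", "algorithm", "split", "resample_id", "time_unit")
    info = dict(zip(keys, parts[:5]))
    info["description"] = parts[5] if len(parts) > 5 else ""
    return info


line1 = (
    "Chinatown,DummyClassifier,TEST,0,MILLISECONDS,"
    "Generated by run_classification_experiment on 11/22/2023, 19:30:49. "
    "Encoder dictionary: {1.0: 0, 2.0: 1}"
)
line3 = "0.5,0,0,755,45056,2,,-1,-1"

info = parse_first_line(line1)
# zip stops after the five shared names, so task-specific items are ignored.
shared = {name: float(value) for name, value in zip(SHARED_FIELDS, line3.split(","))}

print(info["dataset"], info["algorithm"], info["split"], info["time_unit"])
print(shared)
```

Limiting the number of splits on line 1 keeps the commas inside the description from being mistaken for field separators.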

Classification

This section covers results files for the classification learning task. The result files for classification contain the following task-specific information:

The third line for classification results files contains the following values in the order: accuracy, fit time, predict time, benchmark time, memory usage, number of classes, train error estimate method, train error estimate time, fit plus train error estimate time. The final three items are only relevant if generating an estimate of error on the train set.

The case lines for classification results files contain the following values in the order: true class value, predicted class value, (space), n_classes * class probabilities, (space), case prediction time, (space), case description. Here (space) denotes an empty field, which appears as an extra comma in the file.
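
The case line layout can be illustrated with a short parser. This is a sketch under the assumption that sections are separated by an empty field (two consecutive commas), as in the example output earlier; `parse_classification_case` is a hypothetical helper, not part of tsml-eval, and the second input line is made up to show the optional items.

```python
def parse_classification_case(line):
    """Split one classification case line into its sections (hypothetical helper)."""
    # Sections are separated by an empty field, i.e. two consecutive commas.
    # Split at most three times so commas in a description are preserved.
    sections = line.strip().split(",,", 3)
    target, prediction = sections[0].split(",")
    probabilities = [float(p) for p in sections[1].split(",")]
    # Prediction time and description are optional trailing sections.
    prediction_time = float(sections[2]) if len(sections) > 2 else None
    description = sections[3] if len(sections) > 3 else None
    return target, prediction, probabilities, prediction_time, description


# A case line taken from the example output earlier in this notebook.
print(parse_classification_case("0,0,,1.0,0.0"))

# A made-up line that also carries a prediction time and a description.
print(parse_classification_case("1,0,,0.3,0.7,,12,,example case"))
```

The class values are kept as strings here because the format does not require them to be integers.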

Classifier results files can be loaded into a ClassifierResults object for use in the evaluation module:

[2]:
cr = load_classifier_results(
    "../tsml_eval/testing/_test_results_files/classification/"
    "ROCKET/Predictions/ItalyPowerDemand/testResample0.csv"
)
cr.accuracy
[2]:
0.9698736637512148

Clustering

This section covers results files for the clustering learning task. The result files for clustering contain the following task-specific information:

The third line for clustering results files contains the following values in the order: clustering accuracy, fit time, predict time, benchmark time, memory usage, number of classes, number of clusters.

The case lines for clustering result files contain the following values in the order: true class value, cluster label, (space), n_clusters * cluster probabilities, (space), case prediction time, (space), case description.

Clusterer results files can be loaded into a ClustererResults object for use in the evaluation module:

[3]:
clr = load_clusterer_results(
    "../tsml_eval/testing/_test_results_files/clustering/"
    "KMeans-msm/Predictions/Chinatown/trainResample0.csv"
)
clr.clustering_accuracy
[3]:
0.7

Regression

This section covers results files for the regression learning task. The result files for regression contain the following task-specific information:

The third line for regression results files contains the following values in the order: mean squared error, fit time, predict time, benchmark time, memory usage, train error estimate method, train error estimate time, fit plus train error estimate time. The final three items are only relevant if generating an estimate of error on the train set.

The case lines for regression result files contain the following values in the order: target value, predicted value, (space), case prediction time, (space), case description.
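
As a sanity check, the first value on the third line can be recomputed from the case lines. The following is a minimal sketch, assuming the reported mean squared error is the plain average of squared differences between target and predicted values. The case lines are made up for illustration and do not come from a real results file.

```python
# Made-up regression case lines: target value, predicted value.
case_lines = ["1.0,1.5", "2.0,2.0", "3.0,2.5"]

squared_errors = []
for line in case_lines:
    # Only the first two fields are needed; any later sections are ignored.
    target, predicted = (float(v) for v in line.split(",")[:2])
    squared_errors.append((target - predicted) ** 2)

# Average the squared errors over all cases.
mse = sum(squared_errors) / len(squared_errors)
print(mse)
```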

Regressor results files can be loaded into a RegressorResults object for use in the evaluation module:

[4]:
rr = load_regressor_results(
    "../tsml_eval/testing/_test_results_files/regression/"
    "ROCKET/Predictions/Covid3Month/testResample0.csv"
)
rr.mean_squared_error
[4]:
0.0015126663111567206

Forecasting

This section covers results files for the forecasting learning task. Each case line is a time series value being forecast, rather than an independent instance. Forecasting is relatively underdeveloped in tsml-eval, and the functionality is likely to change over time. The results files for forecasting contain the following task-specific information:

The third line for forecasting results files contains the following values in the order: mean absolute percentage error, fit time, predict time, benchmark time, memory usage.

The case lines for forecasting result files contain the following values in the order: target value, predicted value, (space), case prediction time, (space), case description.

Forecaster results files can be loaded into a ForecasterResults object for use in the evaluation module:

[5]:
fr = load_forecaster_results(
    "../tsml_eval/testing/_test_results_files/forecasting/"
    "NaiveForecaster/Predictions/ShampooSales/testResample0.csv"
)
fr.mean_absolute_percentage_error
[5]:
0.2603808539887312
