ClustererResults

tsml_eval.evaluation.storage.ClustererResults(dataset_name='N/A', clusterer_name='N/A', split='N/A', resample_id=None, time_unit='nanoseconds', description='', parameters='No parameter info', fit_time=-1.0, predict_time=-1.0, benchmark_time=-1.0, memory_usage=-1.0, n_classes=None, n_clusters=None, class_labels=None, predictions=None, probabilities=None, pred_times=None, pred_descriptions=None)

A class for storing and managing results from clustering experiments.

This class stores clustering results, including predicted cluster labels, cluster probabilities, and various performance metrics. It extends the EstimatorResults class and inherits its base functionality.

Parameters:
dataset_name : str, default="N/A"

Name of the dataset used.

clusterer_name : str, default="N/A"

Name of the clusterer used.

split : str, default="N/A"

Type of data split used, i.e. “train” or “test”.

resample_id : int or None, default=None

Random seed used for the data resample, with 0 usually being the original data.

time_unit : str, default="nanoseconds"

Unit of time used for the timing fields.

description : str, default=""

Additional description of the clustering experiment. Appended to the end of the first line of the results file.

parameters : str, default="No parameter info"

Information about parameters used in the clusterer and other build information. Written to the second line of the results file.

fit_time : float, default=-1.0

Time taken fitting the model.

predict_time : float, default=-1.0

Time taken making predictions.

benchmark_time : float, default=-1.0

Time taken to run a simple benchmark function. In tsml-eval experiments, this is the time taken to sort 1,000 (seeded) random numpy arrays of size 20,000.

memory_usage : float, default=-1.0

Memory usage during the experiment. In tsml-eval experiments, this is the peak memory usage during the fit method.

n_classes : int or None, default=None

Number of classes in the dataset.

n_clusters : int or None, default=None

Number of clusters generated.

class_labels : array-like or None, default=None

Ground-truth class labels for each case.

predictions : array-like or None, default=None

Predicted cluster labels.

probabilities : array-like or None, default=None

Predicted cluster probabilities.

pred_times : array-like or None, default=None

Prediction times for each case.

pred_descriptions : list of str or None, default=None

Descriptions for each prediction.
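
The benchmark described under benchmark_time can be approximated with the sketch below. It is only an illustration of the idea: the function name sort_benchmark, its arguments, and the choice to time only the sort calls are assumptions, not the tsml-eval implementation.

```python
import time

import numpy as np


def sort_benchmark(n_arrays=1000, array_size=20000, seed=0):
    """Return the time in nanoseconds spent sorting seeded random arrays.

    Hypothetical helper: the defaults mirror the figures quoted for
    benchmark_time, but this is not the function used by tsml-eval.
    """
    rng = np.random.default_rng(seed)  # seeded, so the data is reproducible
    elapsed = 0
    for _ in range(n_arrays):
        arr = rng.random(array_size)
        start = time.perf_counter_ns()
        np.sort(arr)  # only the sort itself is timed
        elapsed += time.perf_counter_ns() - start
    return elapsed


# Small sizes keep this demonstration fast.
benchmark_ns = sort_benchmark(n_arrays=10, array_size=1000)
```

A value measured this way on each machine gives a rough baseline for comparing fit_time and predict_time across different hardware.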

Attributes:
n_cases : int or None

Number of cases in the dataset.

clustering_accuracy : float or None

Clustering accuracy score.

rand_index : float or None

Rand score.

adjusted_rand_index : float or None

Adjusted Rand score.

mutual_information : float or None

Mutual information score.

adjusted_mutual_information : float or None

Adjusted mutual information score.

normalised_mutual_information : float or None

Normalised mutual information score.
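
To make two of the attributes above concrete, the following pure-Python sketch shows the standard definitions of the Rand index and of clustering accuracy, computed from class labels and predicted cluster labels. It is assumed, not confirmed by this page, that the class uses these definitions. The function names are hypothetical and the brute-force matching is chosen for readability, so this should not be read as the library's implementation.

```python
from itertools import combinations, permutations


def rand_index(class_labels, predictions):
    """Fraction of case pairs on which the two labellings agree.

    A pair agrees if both labellings put the two cases together, or both
    keep them apart. Requires at least two cases.
    """
    pairs = list(combinations(range(len(class_labels)), 2))
    agree = sum(
        (class_labels[i] == class_labels[j]) == (predictions[i] == predictions[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def clustering_accuracy(class_labels, predictions):
    """Best accuracy over all one-to-one mappings of clusters to classes.

    Brute force over permutations, so only practical for a few clusters.
    Assumes there are no more clusters than classes.
    """
    classes = sorted(set(class_labels))
    clusters = sorted(set(predictions))
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        correct = sum(mapping[p] == c for c, p in zip(class_labels, predictions))
        best = max(best, correct)
    return best / len(class_labels)


y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1, 1, 0, 0, 0, 0]  # cluster ids are arbitrary, only groupings matter
ri = rand_index(y_true, y_pred)            # 10 of 15 pairs agree
acc = clustering_accuracy(y_true, y_pred)  # 5 of 6 cases match after relabelling
```

Because cluster ids carry no meaning, both measures depend only on how cases are grouped, which is why the accuracy is taken over the best relabelling rather than a direct comparison.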

Examples

>>> from tsml_eval.evaluation.storage import ClustererResults
>>> from tsml_eval.testing.testing_utils import _TEST_RESULTS_PATH
>>> cr = ClustererResults().load_from_file(
...     _TEST_RESULTS_PATH +
...     "/clustering/KMeans/Predictions/Trace/trainResample0.csv"
... )
>>> cr.calculate_statistics()
>>> acc = cr.clustering_accuracy