run_clustering_experiment

tsml_eval.experiments.run_clustering_experiment(X_train: ndarray | list, y_train: ndarray, clusterer, results_path, X_test: ndarray | list | None = None, y_test: ndarray | None = None, n_clusters=None, clusterer_name=None, dataset_name='N/A', resample_id=None, data_transforms=None, build_test_file=False, build_train_file=True, attribute_file_path=None, att_max_shape=0, benchmark_time=True)

Run a clustering experiment and save the results to file.

Runs a basic clustering experiment for a <dataset>/<clusterer>/<resample> combination and writes the results to CSV file(s) at the given location.
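A minimal call might look like the sketch below. It assumes tsml-eval and aeon are installed and uses aeon's TimeSeriesKMeans as the clusterer. The synthetic data, the results path, and the names passed to clusterer_name and dataset_name are illustrative assumptions, not values from the library.

```python
def run_example(results_path="./clustering_results/"):
    """Cluster a small synthetic dataset and write the train results file."""
    # Third-party imports are kept local so the sketch can be loaded
    # without numpy, aeon and tsml-eval installed.
    import numpy as np
    from aeon.clustering import TimeSeriesKMeans
    from tsml_eval.experiments import run_clustering_experiment

    rng = np.random.default_rng(0)
    # 20 univariate series of length 50 in the aeon 3D format
    # (n_cases, n_channels, n_timepoints). Synthetic, for illustration only.
    X_train = rng.normal(size=(20, 1, 50))
    y_train = np.array([0] * 10 + [1] * 10)

    clusterer = TimeSeriesKMeans(n_clusters=2, distance="euclidean", random_state=0)

    run_clustering_experiment(
        X_train,
        y_train,
        clusterer,
        results_path,
        clusterer_name="KMeans-example",  # hypothetical name for the output
        dataset_name="Synthetic",  # hypothetical dataset name
        resample_id=0,
        benchmark_time=False,  # skip the hardware benchmark for a quick run
    )
```

Calling run_example() should fit the clusterer on the training data and write the train results under the given path, since build_train_file defaults to True and build_test_file to False.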

Parameters:

X_train : np.ndarray or list of np.ndarray

The data used to fit the clusterer. Numpy array or list of numpy arrays in the aeon data format.

y_train : np.ndarray

Training data class labels. One label per case in the training data using the same ordering.

clusterer : BaseClusterer

Clusterer to be used in the experiment.

results_path : str

Location to write the results to. Any required directories will be created.

X_test : np.ndarray or list of np.ndarray or None, default=None

The data used to test the fitted clusterer. Numpy array or list of numpy arrays in the aeon data format.

y_test : np.ndarray or None, default=None

Testing data class labels. One label per case in the testing data using the same ordering.

n_clusters : int or None, default=None

Number of clusters to use if the clusterer has an n_clusters parameter. If None, the clusterer's default is used. If -1, the number of classes in the dataset is used.

If an argument of the clusterer is itself an estimator with an n_clusters parameter, that parameter is also set to this value. Ensure that the argument is an estimator instance with an n_clusters parameter and not a default such as None. This commonly applies to parameters such as estimator or clusterer in pipelines and deep learners.

clusterer_name : str or None, default=None

Name of clusterer used in writing results. If None, the name is taken from the clusterer.

dataset_name : str, default="N/A"

Name of dataset.

resample_id : int or None, default=None

Seed for resampling. If set to 0, the default train/test split from file is used. Also used in output file name.

data_transforms : transformer, list of transformers or None, default=None

Transformer(s) to apply to the data before running the experiment. If a list, the transformers are applied in order. If None, no transformation is applied. Calls fit_transform on the training data and transform on the test data.

build_test_file : bool, default=False

Whether to generate test files or not. If True, X_test and y_test must be provided.

build_train_file : bool, default=True

Whether to generate train files or not. The clusterer is fit using train data regardless of input.

benchmark_time : bool, default=True

Whether to benchmark the hardware used with a simple function and write the results. This will typically take ~2 seconds, but is hardware dependent.
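The n_clusters and data_transforms rules described above can be sketched in plain Python. This is an illustration of the documented behaviour, not tsml-eval's implementation: resolve_n_clusters, apply_transforms and MeanCentre are hypothetical names, and counting classes from y_train alone is an assumption.

```python
def resolve_n_clusters(n_clusters, y_train):
    """Hypothetical helper mirroring the documented n_clusters rules."""
    if n_clusters is None:
        return None  # keep the clusterer's own default
    if n_clusters == -1:
        # Assumption: classes are counted from the training labels.
        return len(set(y_train))
    return n_clusters


class MeanCentre:
    """Toy transformer: subtracts the mean learned from the training data."""

    def fit_transform(self, X):
        values = [v for series in X for v in series]
        self.mean_ = sum(values) / len(values)
        return self.transform(X)

    def transform(self, X):
        return [[v - self.mean_ for v in series] for series in X]


def apply_transforms(data_transforms, X_train, X_test=None):
    """Apply transformer(s) in order: fit on train, reuse the fit on test."""
    if data_transforms is None:
        return X_train, X_test
    if not isinstance(data_transforms, list):
        data_transforms = [data_transforms]
    for transform in data_transforms:
        X_train = transform.fit_transform(X_train)
        if X_test is not None:
            X_test = transform.transform(X_test)
    return X_train, X_test


X_train = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]  # training mean is 3.0
X_test = [[10.0, 10.0, 10.0]]
train_t, test_t = apply_transforms([MeanCentre()], X_train, X_test)
```

The test series is shifted by the training mean rather than its own, which is the point of calling fit_transform on the training data and only transform on the test data.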