run_clustering_experiment

tsml_eval.experiments.run_clustering_experiment(X_train, y_train, clusterer, results_path, X_test=None, y_test=None, row_normalise=False, n_clusters=None, clusterer_name=None, dataset_name='N/A', resample_id=None, build_test_file=False, build_train_file=True, attribute_file_path=None, att_max_shape=0, benchmark_time=True)

Run a clustering experiment and save the results to file.

Runs a basic clustering experiment for a <dataset>/<clusterer>/<resample> combination and writes the results to CSV file(s) at a given location.

Parameters:
X_train : pd.DataFrame or np.array

The data to train the clusterer.

y_train : np.array

Training data class labels (used for evaluation).

clusterer : BaseClusterer

Clusterer to be used in the experiment.

results_path : str

Location to write the results to. Any required directories will be created.

X_test : pd.DataFrame or np.array, default=None

The data used to test the fitted clusterer.

y_test : np.array, default=None

Testing data class labels.

row_normalise : bool, default=False

Whether to normalise the data rows (time series) prior to fitting and predicting.
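As an illustration only, row normalisation could look like the sketch below. It assumes that normalising means z-normalising each series to zero mean and unit standard deviation, which the description above does not state. The helper name `z_normalise_rows` is hypothetical, and the input is assumed to be a 2D array of shape (n_cases, n_timepoints).

```python
import numpy as np


def z_normalise_rows(X):
    """Z-normalise each row (series) of a 2D array.

    Hypothetical helper; the scheme used by row_normalise is an assumption.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    # Constant series have zero standard deviation; leave them at zero
    # after centring instead of dividing by zero.
    std[std == 0] = 1.0
    return (X - mean) / std


X = np.array([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]])
X_norm = z_normalise_rows(X)
```

Each row is scaled independently, so series with different offsets or amplitudes become comparable before clustering.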

n_clusters : int or None, default=None

Number of clusters to use if the clusterer has an n_clusters parameter. If None, the clusterer's default is used. If -1, the number of classes in the dataset is used.

This may not work as intended for pipelines currently.
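The three cases for n_clusters can be sketched as follows. This is a minimal illustration of the documented rules, not the library's code: the helper `resolve_n_clusters` and its `clusterer_default` argument are hypothetical, and the number of classes is assumed to be the count of distinct labels in y_train.

```python
import numpy as np


def resolve_n_clusters(n_clusters, y_train, clusterer_default):
    """Return the number of clusters to use (hypothetical helper)."""
    if n_clusters is None:
        # Keep whatever the clusterer was configured with.
        return clusterer_default
    if n_clusters == -1:
        # Assumption: "number of classes" means distinct training labels.
        return len(np.unique(y_train))
    return n_clusters


y_train = np.array([0, 1, 2, 1, 0])
from_default = resolve_n_clusters(None, y_train, clusterer_default=8)
from_labels = resolve_n_clusters(-1, y_train, clusterer_default=8)
explicit = resolve_n_clusters(4, y_train, clusterer_default=8)
```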

clusterer_name : str or None, default=None

Name of clusterer used in writing results. If None, the name is taken from the clusterer.

dataset_name : str, default="N/A"

Name of dataset.

resample_id : int or None, default=None

Seed for resampling. If set to 0, the default train/test split from file is used. Also used in output file name.

build_test_file : bool, default=False

Whether to generate test files or not. If True, X_test and y_test must be provided.
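The requirement that test data accompany build_test_file=True can be expressed as a simple guard. The sketch below is an assumption about how such a check might look; the helper name `check_test_data`, the exception type, and the message are all hypothetical.

```python
def check_test_data(build_test_file, X_test=None, y_test=None):
    """Raise if a test file is requested without test data (hypothetical)."""
    if build_test_file and (X_test is None or y_test is None):
        raise ValueError(
            "build_test_file=True requires both X_test and y_test."
        )


# Passes: no test file requested, so test data is optional.
check_test_data(build_test_file=False)

# Passes: test file requested and both inputs supplied.
check_test_data(build_test_file=True, X_test=[[0.1, 0.2]], y_test=[0])
```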

build_train_file : bool, default=True

Whether to generate train files or not. The clusterer is fit using train data regardless of input.

benchmark_time : bool, default=True

Whether to benchmark the hardware used with a simple function and write the results. This will typically take ~2 seconds, but is hardware dependent.
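To give a sense of what a hardware benchmark of this kind involves, the sketch below times a fixed, seeded workload and reports the elapsed time in milliseconds. The workload (sorting random arrays), the function name `simple_benchmark`, and its parameters are assumptions for illustration; the description above does not specify which function is actually timed.

```python
import time

import numpy as np


def simple_benchmark(n_arrays=20, array_size=10_000, seed=0):
    """Time a fixed sorting workload and return elapsed milliseconds.

    Hypothetical stand-in for a hardware benchmark function.
    """
    rng = np.random.default_rng(seed)
    # Generate the data before starting the clock so that only the
    # sorting work is measured.
    arrays = [rng.random(array_size) for _ in range(n_arrays)]
    start = time.perf_counter()
    for arr in arrays:
        np.sort(arr)
    return int((time.perf_counter() - start) * 1000)


elapsed_ms = simple_benchmark()
```

Because the workload is identical on every run, the recorded time can be used to compare timings gathered on different machines.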