run_clustering_experiment
- tsml_eval.experiments.run_clustering_experiment(X_train, y_train, clusterer, results_path, X_test=None, y_test=None, row_normalise=False, n_clusters=None, clusterer_name=None, dataset_name='N/A', resample_id=None, build_test_file=False, build_train_file=True, attribute_file_path=None, att_max_shape=0, benchmark_time=True)
Run a clustering experiment and save the results to file.
Function to run a basic clustering experiment for a <dataset>/<clusterer>/<resample> combination and write the results to CSV file(s) at a given location.
- Parameters:
- X_train : pd.DataFrame or np.array
The data used to train the clusterer.
- y_train : np.array
Training data class labels (used for evaluation).
- clusterer : BaseClusterer
Clusterer to be used in the experiment.
- results_path : str
Location to write results to. Any required directories will be created.
- X_test : pd.DataFrame or np.array, default=None
The data used to test the fitted clusterer.
- y_test : np.array, default=None
Test data class labels.
- row_normalise : bool, default=False
Whether to normalise the data rows (time series) prior to fitting and predicting.
- n_clusters : int or None, default=None
Number of clusters to use if the clusterer has an n_clusters parameter. If None, the clusterer's default is used. If -1, the number of classes in the dataset is used.
This may not currently work as intended for pipelines.
- clusterer_name : str or None, default=None
Name of the clusterer used when writing results. If None, the name is taken from the clusterer.
- dataset_name : str, default="N/A"
Name of the dataset.
- resample_id : int or None, default=None
Seed for resampling. If set to 0, the default train/test split from file is used. Also used in the output file name.
- build_test_file : bool, default=False
Whether to generate test files. If True, X_test and y_test must be provided.
- build_train_file : bool, default=True
Whether to generate train files. The clusterer is fit on the train data regardless of this setting.
- benchmark_time : bool, default=True
Whether to benchmark the hardware with a simple function and write the results. This typically takes around 2 seconds, but is hardware dependent.
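The y_train entry notes that the class labels are used for evaluation. The text does not say which measures are computed. As one illustrative example only (an assumption, not necessarily what tsml_eval reports), the Rand index below scores how often the predicted clusters and the true classes agree on whether a pair of cases belongs together. Cluster ids do not need to match class ids for a perfect score. The helper name rand_index is hypothetical.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of case pairs on which two labelings agree (hypothetical helper).

    A pair agrees if both labelings put the two cases in the same group,
    or both put them in different groups. Runs in O(n^2) over all pairs.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label arrays must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0  # zero or one case: nothing can disagree
    agree = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agree += int(same_true == same_pred)
    return agree / len(pairs)


y_train = [0, 0, 1, 1]
cluster_ids = [5, 5, 7, 7]  # same partition as y_train, different ids
print(rand_index(y_train, cluster_ids))  # 1.0
```

For real use on larger datasets, scikit-learn provides sklearn.metrics.rand_score and sklearn.metrics.adjusted_rand_score, which avoid the explicit pair loop.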
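The results_path entry says any required directories will be created. A minimal sketch of that behaviour with pathlib, using a hypothetical write_predictions helper and a simplified two-column CSV layout (this is not the file format tsml_eval actually writes):

```python
import csv
import tempfile
from pathlib import Path


def write_predictions(results_path, file_name, y_true, y_pred):
    """Write true labels and predicted clusters to results_path/file_name.

    Hypothetical helper: missing parent directories are created first,
    so results_path does not need to exist beforehand.
    """
    out_dir = Path(results_path)
    out_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    out_file = out_dir / file_name
    with out_file.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["true_label", "predicted_cluster"])
        writer.writerows(zip(y_true, y_pred))
    return out_file


with tempfile.TemporaryDirectory() as tmp:
    # The nested directories a/b/c do not exist yet; they are created on demand.
    path = write_predictions(Path(tmp) / "a" / "b" / "c", "example.csv", [0, 1], [1, 0])
    print(path.read_text())
```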
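The row_normalise entry does not say which normalisation is applied. The sketch below assumes per-series z-normalisation (zero mean, unit standard deviation), computed along the last (time) axis so it works for both 2D (n_cases, n_timepoints) and 3D (n_cases, n_channels, n_timepoints) arrays. The function name row_normalise here is a hypothetical stand-in, not the library's internal code.

```python
import numpy as np


def row_normalise(X, eps=1e-12):
    """Z-normalise every time series independently (assumed behaviour).

    Statistics are taken along the last (time) axis only. Constant series
    have zero standard deviation, so the divisor is floored at eps and
    such series map to all zeros instead of NaN.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=-1, keepdims=True)
    std = X.std(axis=-1, keepdims=True)
    return (X - mean) / np.maximum(std, eps)


X_train = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [5.0, 5.0, 5.0]])
print(row_normalise(X_train))
```

Because each series is scaled using only its own statistics, applying the same function to the train and test data before fitting and predicting does not leak information from one set into the other.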
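The n_clusters rules can be sketched as a small resolution step over a parameter dictionary, such as the one get_params() returns on scikit-learn-style estimators. The helper resolve_n_clusters is hypothetical; it only illustrates the documented cases (None keeps the default, -1 uses the number of classes, any other int is used as given, and nothing changes if the clusterer has no n_clusters parameter).

```python
import numpy as np


def resolve_n_clusters(n_clusters, y_train, params):
    """Return a copy of params with n_clusters set per the documented rules.

    Hypothetical helper, not part of tsml_eval.
    """
    params = dict(params)  # copy so the caller's dict is not mutated
    if n_clusters is None or "n_clusters" not in params:
        return params  # keep the clusterer's default / no such parameter
    if n_clusters == -1:
        params["n_clusters"] = int(np.unique(y_train).size)  # number of classes
    else:
        params["n_clusters"] = int(n_clusters)
    return params


y_train = np.array(["a", "b", "a", "c"])
print(resolve_n_clusters(-1, y_train, {"n_clusters": 8, "max_iter": 300}))
# {'n_clusters': 3, 'max_iter': 300}
```

With scikit-learn-style estimators the result could be applied with clusterer.set_params(n_clusters=...). In a pipeline, get_params() reports the parameter under a prefixed name such as kmeans__n_clusters (the step name here is hypothetical), so a flat "n_clusters" lookup like this one misses it. That is one plausible reason for the pipeline caveat above, though the text does not say.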
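The resample_id entry says 0 keeps the default train/test split from file and any other value acts as a resampling seed. A minimal sketch of one way such a resample can work, assuming the train and test sets are pooled, shuffled with the seed, and re-split so each part keeps its original size. This uses a plain, non-stratified shuffle, which is an assumption; the library may resample differently (for example, stratified by class). The sketch only handles integer seeds, since the text does not say what None does.

```python
import numpy as np


def resample(X_train, y_train, X_test, y_test, resample_id):
    """Return a train/test split for an integer resample_id (hypothetical helper).

    resample_id == 0 returns the split from file unchanged.
    """
    if resample_id == 0:
        return X_train, y_train, X_test, y_test
    X = np.concatenate([X_train, X_test])
    y = np.concatenate([y_train, y_test])
    rng = np.random.default_rng(resample_id)  # same seed -> same split
    idx = rng.permutation(len(y))
    n_train = len(y_train)
    tr, te = idx[:n_train], idx[n_train:]
    return X[tr], y[tr], X[te], y[te]


X_tr, y_tr = np.arange(6).reshape(3, 2), np.array([0, 0, 1])
X_te, y_te = np.arange(6, 10).reshape(2, 2), np.array([1, 0])
new_X_tr, new_y_tr, new_X_te, new_y_te = resample(X_tr, y_tr, X_te, y_te, 1)
print(new_y_tr, new_y_te)
```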
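The build_train_file and build_test_file flags only control which result files are written; the clusterer is fit on the train data either way. The sketch below captures that logic, including the documented requirement that X_test and y_test be provided when build_test_file is True. The helper planned_result_files and the trainResample{id}.csv / testResample{id}.csv naming pattern are hypothetical illustrations of resample_id being "used in the output file name".

```python
def planned_result_files(resample_id, build_train_file=True, build_test_file=False,
                         X_test=None, y_test=None):
    """List the result files an experiment would write (hypothetical helper).

    Fitting on the train data always happens; these flags only decide
    which prediction files are produced.
    """
    if build_test_file and (X_test is None or y_test is None):
        raise ValueError("build_test_file=True requires X_test and y_test")
    files = []
    if build_train_file:
        files.append(f"trainResample{resample_id}.csv")
    if build_test_file:
        files.append(f"testResample{resample_id}.csv")
    return files


print(planned_result_files(0))
print(planned_result_files(3, build_test_file=True, X_test=[[0.1, 0.2]], y_test=[1]))
```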
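The benchmark_time entry describes timing "a simple function" to characterise the hardware, without naming it. A minimal sketch of the idea, under the assumption that any fixed, seeded workload will do: the same work is timed on every machine so the numbers are comparable, and the fastest of several repeats is kept to reduce noise. The function benchmark_seconds and its sorting workload are hypothetical stand-ins; its runtime is unrelated to the roughly 2 seconds quoted for the real benchmark.

```python
import random
import time


def benchmark_seconds(n=100_000, repeats=3, seed=0):
    """Time a fixed, seeded workload and return the fastest run in seconds.

    Hypothetical stand-in for the library's benchmark function.
    """
    rng = random.Random(seed)  # seeded so every machine sorts the same data
    data = [rng.random() for _ in range(n)]
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        sorted(data)  # the timed workload
        timings.append(time.perf_counter() - start)
    return min(timings)  # minimum is least affected by background load


print(f"benchmark: {benchmark_seconds():.4f} s")
```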