run_clustering_experiment
- tsml_eval.experiments.run_clustering_experiment(X_train: ndarray | list, y_train: ndarray, clusterer, results_path, X_test: ndarray | list | None = None, y_test: ndarray | None = None, n_clusters=None, clusterer_name=None, dataset_name='N/A', resample_id=None, data_transforms=None, build_test_file=False, build_train_file=True, attribute_file_path=None, att_max_shape=0, benchmark_time=True)
Run a clustering experiment and save the results to file.
Runs a basic clustering experiment for a <dataset>/<clusterer>/<resample> combination and writes the results to CSV file(s) at a given location.
- Parameters:
- X_train : np.ndarray or list of np.ndarray
The data used to train the clusterer. Numpy array or list of numpy arrays in the aeon data format.
- y_train : np.ndarray
Training data class labels. One label per case in the training data, using the same ordering.
- clusterer : BaseClusterer
Clusterer to be used in the experiment.
- results_path : str
Location of where to write results. Any required directories will be created.
- X_test : np.ndarray or list of np.ndarray or None, default=None
The data used to test the fitted clusterer. Numpy array or list of numpy arrays in the aeon data format.
- y_test : np.ndarray or None, default=None
Testing data class labels. One label per case in the testing data, using the same ordering.
- n_clusters : int or None, default=None
Number of clusters to use if the clusterer has an n_clusters parameter. If None, the clusterer's default is used. If -1, the number of classes in the dataset is used.
If any of the clusterer's arguments are themselves estimators with an n_clusters parameter, that parameter is also set to this value. Make sure such an argument is passed an actual estimator that has an n_clusters parameter, rather than being left at a default such as None. This is likely to apply to parameters such as estimator or clusterer in pipelines and deep learners.
- clusterer_name : str or None, default=None
Name of clusterer used in writing results. If None, the name is taken from the clusterer.
- dataset_name : str, default="N/A"
Name of dataset.
- resample_id : int or None, default=None
Seed for resampling. If set to 0, the default train/test split from file is used. Also used in output file name.
- data_transforms : transformer, list of transformers or None, default=None
Transformer(s) to apply to the data before running the experiment. If a list, the transformers are applied in order. If None, no transformation is applied. Calls fit_transform on the training data and transform on the test data.
- build_test_file : bool, default=False
Whether to generate test files or not. If True, X_test and y_test must be provided.
- build_train_file : bool, default=True
Whether to generate train files or not. The clusterer is always fit on the training data, regardless of this setting.
- benchmark_time : bool, default=True
Whether to benchmark the hardware used with a simple function and write the results. This will typically take ~2 seconds, but is hardware dependent.
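The rules described above for n_clusters and data_transforms can be illustrated with a minimal, self-contained sketch. The helpers resolve_n_clusters and apply_transforms and the toy transformers MeanCentre and Double are hypothetical names written for this illustration. They are not part of tsml_eval and do not reproduce its internals. The sketch also assumes that "the number of classes in the dataset" can be approximated by counting the distinct labels in y_train.

```python
# Illustrative helpers only: every name here is hypothetical, not tsml_eval API.


def resolve_n_clusters(n_clusters, y_train, clusterer_default):
    """Pick a cluster count following the documented n_clusters rules."""
    if n_clusters is None:
        # None: keep the value the clusterer was constructed with.
        return clusterer_default
    if n_clusters == -1:
        # -1: use the number of classes (assumed here: distinct y_train labels).
        return len(set(y_train))
    return n_clusters


def apply_transforms(data_transforms, X_train, X_test=None):
    """Apply transformer(s) in order: fit_transform on train, transform on test."""
    if data_transforms is None:
        return X_train, X_test
    if not isinstance(data_transforms, list):
        data_transforms = [data_transforms]
    for transformer in data_transforms:
        X_train = transformer.fit_transform(X_train)
        if X_test is not None:
            X_test = transformer.transform(X_test)
    return X_train, X_test


class MeanCentre:
    """Toy transformer: subtracts the training mean from every value."""

    def fit_transform(self, X):
        self.mean_ = sum(X) / len(X)
        return self.transform(X)

    def transform(self, X):
        return [x - self.mean_ for x in X]


class Double:
    """Toy stateless transformer: multiplies every value by two."""

    def fit_transform(self, X):
        return self.transform(X)

    def transform(self, X):
        return [2 * x for x in X]


y_train = [0, 1, 1, 2, 0]
print(resolve_n_clusters(None, y_train, clusterer_default=8))  # 8
print(resolve_n_clusters(-1, y_train, clusterer_default=8))    # 3
print(resolve_n_clusters(4, y_train, clusterer_default=8))     # 4

X_train, X_test = apply_transforms([MeanCentre(), Double()], [1.0, 2.0, 3.0], [4.0])
print(X_train)  # [-2.0, 0.0, 2.0]
print(X_test)   # [4.0]
```

Note that the test value is centred with the mean learned from the training data (2.0), not its own mean, which is what "fit_transform on the training data and transform on the test data" implies.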