load_and_run_clustering_experiment

tsml_eval.experiments.load_and_run_clustering_experiment(problem_path, results_path, dataset, clusterer, row_normalise=False, n_clusters=None, clusterer_name=None, resample_id=0, build_test_file=False, write_attributes=False, att_max_shape=0, benchmark_time=True, overwrite=False, predefined_resample=False, combine_train_test_split=False)

Load a dataset and run a clustering experiment.

Loads a dataset, runs a basic clustering experiment for a <dataset>/<clusterer>/<resample> combination, and writes the results to CSV file(s) at a given location.
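A call might look like the minimal sketch below. It assumes aeon is installed and that its `TimeSeriesKMeans` is an acceptable clusterer here; the paths, dataset name, and `clusterer_name` value are placeholders chosen for illustration, and `run_example` is a hypothetical wrapper, not part of tsml_eval.

```python
# Keyword arguments for a single <dataset>/<clusterer>/<resample> run.
# All paths and names are placeholders chosen for illustration.
experiment_kwargs = {
    "problem_path": "/path/to/datasets",
    "results_path": "/path/to/results",
    "dataset": "MyDataset",
    "row_normalise": False,
    "n_clusters": -1,  # documented as: use the number of classes in the dataset
    "resample_id": 0,  # documented as: use the default train/test split from file
    "build_test_file": True,
    "overwrite": False,
}


def run_example():
    """Run the experiment; imports are local so this file loads without them."""
    # Assumption: aeon exposes TimeSeriesKMeans in aeon.clustering.
    from aeon.clustering import TimeSeriesKMeans
    from tsml_eval.experiments import load_and_run_clustering_experiment

    load_and_run_clustering_experiment(
        clusterer=TimeSeriesKMeans(),
        clusterer_name="kmeans",
        **experiment_kwargs,
    )
```

Calling `run_example()` requires the `.ts` files for the chosen dataset to exist under `problem_path`.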

Parameters:

problem_path (str): Location of the problem files, as a full path.
results_path (str): Location to write results to. Any required directories will be created.
dataset (str): Name of the problem. Files must be at <problem_path>/<dataset>/<dataset>+"_TRAIN.ts", and likewise for "_TEST.ts".
clusterer (BaseClusterer): Clusterer to be used in the experiment.
row_normalise (bool, default=False): Whether to normalise the data rows (time series) prior to fitting and predicting.
n_clusters (int or None, default=None): Number of clusters to use if the clusterer has an n_clusters parameter. If None, the clusterer's default is used. If -1, the number of classes in the dataset is used.
clusterer_name (str or None, default=None): Name of the clusterer used when writing results. If None, the name is taken from the clusterer.
resample_id (int, default=0): Seed for resampling. If set to 0, the default train/test split from file is used. Also used in the output file name.
build_test_file (bool, default=False): Whether to generate test files. If True, the clusterer will assign clusters to the loaded test data.
benchmark_time (bool, default=True): Whether to benchmark the hardware used with a simple function and write the results. This typically takes ~2 seconds, but is hardware dependent.
overwrite (bool, default=False): If False, results are only built if no result file is already present. If True, any existing results are overwritten.
predefined_resample (bool, default=False): Read a predefined resample from file instead of performing a resample. If True, the file name must include the resample_id at the end of the dataset name, i.e. <problem_path>/<dataset>/<dataset>+<resample_id>+"_TRAIN.ts".
combine_train_test_split (bool, default=False): Whether to combine the train/test split. If True, the train and test data are merged into a single train set. If False, the train/test split is used as normal.
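The file layout described for `dataset` and `predefined_resample` can be sketched as a small path builder. `expected_data_files` is a hypothetical helper written for illustration, not part of tsml_eval, and the directory and dataset names are placeholders.

```python
from pathlib import Path


def expected_data_files(problem_path, dataset, resample_id=0, predefined_resample=False):
    """Return the (train, test) .ts paths implied by the documented layout.

    Hypothetical helper for illustration; it is not part of tsml_eval.
    """
    # With a predefined resample, the resample id is appended to the dataset name.
    stem = f"{dataset}{resample_id}" if predefined_resample else dataset
    folder = Path(problem_path) / dataset
    return folder / f"{stem}_TRAIN.ts", folder / f"{stem}_TEST.ts"


train, test = expected_data_files("data", "MyDataset")
print(train.as_posix())  # data/MyDataset/MyDataset_TRAIN.ts

train5, test5 = expected_data_files(
    "data", "MyDataset", resample_id=5, predefined_resample=True
)
print(train5.as_posix())  # data/MyDataset/MyDataset5_TRAIN.ts
```

Checking that both returned paths exist before launching a batch of experiments is one way to catch a misnamed dataset folder early.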