load_and_run_clustering_experiment

tsml_eval.experiments.load_and_run_clustering_experiment(problem_path, results_path, dataset, clusterer, n_clusters=None, clusterer_name=None, resample_id=0, data_transforms=None, build_test_file=False, write_attributes=False, att_max_shape=0, benchmark_time=True, overwrite=False, predefined_resample=False, combine_train_test_split=False)

Load a dataset and run a clustering experiment.

Loads a dataset, runs a basic clustering experiment for a <dataset>/<clusterer>/<resample> combination, and writes the results to CSV file(s) at a given location.

Parameters:

problem_path: str

Full path to the directory containing the problem files.

results_path: str

Location to write results to. Any required directories will be created.

dataset: str

Name of the problem. Files must be at <problem_path>/<dataset>/<dataset>+"_TRAIN.ts", and likewise for "_TEST.ts".

clusterer: BaseClusterer

Clusterer to be used in the experiment.

n_clusters: int or None, default=None

Number of clusters to use if the clusterer has an n_clusters parameter. If None, the clusterer's default is used. If -1, the number of classes in the dataset is used.

If any attribute of the clusterer is itself an estimator with an n_clusters parameter, that parameter is also set to this value.
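The three cases described for n_clusters can be sketched as below. This is an illustrative sketch only, not tsml_eval's implementation: `DummyClusterer` and `resolve_n_clusters` are hypothetical names, and only the top-level parameter is handled (nested estimators are left out).

```python
class DummyClusterer:
    """Hypothetical stand-in for a clusterer with an n_clusters parameter."""

    def __init__(self, n_clusters=8):
        self.n_clusters = n_clusters


def resolve_n_clusters(clusterer, n_clusters, y):
    """Apply the documented n_clusters rules to a clusterer (sketch only)."""
    if n_clusters is None or not hasattr(clusterer, "n_clusters"):
        # None (or no such parameter): keep the clusterer's own default.
        return clusterer
    if n_clusters == -1:
        # -1: use the number of distinct class labels in the dataset.
        n_clusters = len(set(y))
    clusterer.n_clusters = n_clusters
    return clusterer


y = ["a", "b", "a", "c"]  # made-up class labels with three distinct classes
kept = resolve_n_clusters(DummyClusterer(), None, y).n_clusters
from_labels = resolve_n_clusters(DummyClusterer(), -1, y).n_clusters
explicit = resolve_n_clusters(DummyClusterer(), 5, y).n_clusters
```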

clusterer_name: str or None, default=None

Name of clusterer used in writing results. If None, the name is taken from the clusterer.

resample_id: int, default=0

Seed for resampling. If set to 0, the default train/test split from file is used. Also used in output file name.

data_transforms: transformer, list of transformers or None, default=None

Transformer(s) to apply to the data before running the experiment. If a list, the transformers are applied in order. If None, no transformation is applied. Calls fit_transform on the training data and transform on the test data.
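The fit_transform/transform ordering described above can be sketched with plain Python lists. The two transformer classes and the `apply_transforms` helper are hypothetical and exist only for illustration; they are not part of tsml_eval.

```python
class SubtractTrainMean:
    """Hypothetical transformer: learns the mean on train, subtracts it everywhere."""

    def fit_transform(self, X):
        flat = [v for series in X for v in series]
        self.mean_ = sum(flat) / len(flat)
        return self.transform(X)

    def transform(self, X):
        return [[v - self.mean_ for v in series] for series in X]


class Scale:
    """Hypothetical transformer: multiplies every value by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def fit_transform(self, X):
        return self.transform(X)

    def transform(self, X):
        return [[v * self.factor for v in series] for series in X]


def apply_transforms(data_transforms, X_train, X_test):
    """Apply transformers in order: fit_transform on train, transform on test."""
    if data_transforms is None:
        return X_train, X_test
    if not isinstance(data_transforms, list):
        data_transforms = [data_transforms]
    for transformer in data_transforms:
        X_train = transformer.fit_transform(X_train)
        X_test = transformer.transform(X_test)
    return X_train, X_test


X_train = [[1.0, 3.0], [5.0, 7.0]]
X_test = [[4.0, 6.0]]
new_train, new_test = apply_transforms(
    [SubtractTrainMean(), Scale(2.0)], X_train, X_test
)
```

Note that the test data is shifted by the mean learned from the training data, not by its own mean.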

build_test_file: bool, default=False

Whether to generate test files. If True, the clusterer will assign clusters to the loaded test data.

benchmark_time: bool, default=True

Whether to benchmark the hardware used with a simple function and write the results. This will typically take ~2 seconds, but is hardware dependent.

overwrite: bool, default=False

If set to False, this will only build results if there is not a result file already present. If True, it will overwrite anything already there.
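The overwrite rule amounts to a simple existence check, sketched below. The `should_build` helper and the `results.csv` file name are hypothetical; the actual file names and layout written under results_path are not specified here.

```python
import tempfile
from pathlib import Path


def should_build(results_file, overwrite=False):
    """Return True if results should be (re)built for this file (sketch only)."""
    return overwrite or not Path(results_file).exists()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, used only to demonstrate the check.
    results_file = Path(tmp) / "results.csv"
    before = should_build(results_file)  # no result file yet
    results_file.write_text("placeholder\n")
    after = should_build(results_file)  # file exists, overwrite=False
    forced = should_build(results_file, overwrite=True)
```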

predefined_resample: bool, default=False

Read a predefined resample from file instead of performing a resample. If True, the file name must include the resample_id at the end of the dataset name, i.e. <problem_path>/<dataset>/<dataset>+<resample_id>+"_TRAIN.ts".
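The file naming patterns given for dataset and predefined_resample can be sketched as a small path builder. `ts_file_paths` is a hypothetical helper, `data` and `MyDataset` are made-up values, and applying the resample_id suffix to the test file as well is an assumption, since only the train pattern is stated.

```python
from pathlib import Path


def ts_file_paths(problem_path, dataset, resample_id=0, predefined_resample=False):
    """Build the expected train/test .ts paths (sketch of the documented pattern)."""
    # With a predefined resample, the resample_id follows the dataset name.
    suffix = str(resample_id) if predefined_resample else ""
    base = Path(problem_path) / dataset
    train = base / f"{dataset}{suffix}_TRAIN.ts"
    test = base / f"{dataset}{suffix}_TEST.ts"  # assumed to mirror the train pattern
    return train, test


default_train, default_test = ts_file_paths("data", "MyDataset")
resample_train, resample_test = ts_file_paths(
    "data", "MyDataset", resample_id=3, predefined_resample=True
)
```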

combine_train_test_split: bool, default=False

Whether the train/test split should be combined. If True then the train/test split is combined into a single train set. If False then the train/test split is used as normal.
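Examples

Putting the parameters together, a call might look like the sketch below. It assumes tsml_eval and aeon are installed and that aeon provides TimeSeriesKMeans under aeon.clustering; the paths, dataset name, and clusterer_name passed in are placeholders chosen by the caller. The imports sit inside the function so the snippet can be loaded without those packages present.

```python
# Keyword arguments taken from the documented signature; values are illustrative.
EXAMPLE_KWARGS = {
    "n_clusters": -1,  # use the number of classes in the dataset
    "clusterer_name": "kmeans-example",  # placeholder name for the results
    "resample_id": 0,  # use the default train/test split from file
    "build_test_file": True,  # also assign clusters to the test data
    "overwrite": False,  # skip the run if results already exist
}


def run_example(problem_path, results_path, dataset):
    """Run one clustering experiment (requires tsml_eval and aeon)."""
    # Assumed import locations; adjust if your installed versions differ.
    from aeon.clustering import TimeSeriesKMeans
    from tsml_eval.experiments import load_and_run_clustering_experiment

    load_and_run_clustering_experiment(
        problem_path,
        results_path,
        dataset,
        TimeSeriesKMeans(),
        **EXAMPLE_KWARGS,
    )


# Example call with placeholder locations (not executed here):
# run_example("/path/to/datasets/", "/path/to/results/", "MyDataset")
```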