load_and_run_clustering_experiment
- tsml_eval.experiments.load_and_run_clustering_experiment(problem_path, results_path, dataset, clusterer, n_clusters=None, clusterer_name=None, resample_id=0, data_transforms=None, build_test_file=False, write_attributes=False, att_max_shape=0, benchmark_time=True, overwrite=False, predefined_resample=False, combine_train_test_split=False)
Load a dataset and run a clustering experiment.
Loads a dataset, runs a basic clustering experiment for a <dataset>/<clusterer>/<resample> combination, and writes the results to CSV file(s) at the given location.
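As a rough illustration of the load, run and write flow described above, the sketch below wraps a single call in a helper. The helper name `run_example`, the dataset name `GunPoint`, the result name `KMeans` and the choice of aeon's `TimeSeriesKMeans` as the clusterer are all assumptions made for this example; any clusterer accepted by the function could be substituted.

```python
def run_example(problem_path, results_path, dataset="GunPoint"):
    """Run one clustering experiment. All argument values are placeholders."""
    # Imported inside the function so this snippet can be loaded without
    # tsml-eval and aeon installed; both are needed when it is called.
    from aeon.clustering import TimeSeriesKMeans  # assumed clusterer choice
    from tsml_eval.experiments import load_and_run_clustering_experiment

    clusterer = TimeSeriesKMeans(random_state=0)
    load_and_run_clustering_experiment(
        problem_path,            # folder containing <dataset>/<dataset>_TRAIN.ts
        results_path,            # missing directories are created
        dataset,
        clusterer,
        n_clusters=-1,           # use the number of classes in the dataset
        clusterer_name="KMeans", # name used when writing results
        resample_id=0,           # 0 keeps the default train/test split
        build_test_file=True,    # also assign clusters to the test data
        overwrite=False,         # skip if a result file already exists
    )
```

Calling `run_example("/path/to/problems", "/path/to/results")` would then be expected to write result files under the results path, provided the `.ts` files exist at the documented location.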
- Parameters:
- problem_path : str
Location of problem files, full path.
- results_path : str
Location to write results to. Any required directories will be created.
- dataset : str
Name of the problem. Files must be located at <problem_path>/<dataset>/<dataset>_TRAIN.ts and <problem_path>/<dataset>/<dataset>_TEST.ts.
- clusterer : BaseClusterer
Clusterer to be used in the experiment.
- n_clusters : int or None, default=None
Number of clusters to use if the clusterer has an n_clusters parameter. If None, the clusterer's default is used. If -1, the number of classes in the dataset is used.
If any attribute of the clusterer is itself an estimator with an n_clusters parameter, that parameter is set to the same value.
- clusterer_name : str or None, default=None
Name of clusterer used in writing results. If None, the name is taken from the clusterer.
- resample_id : int, default=0
Seed for resampling. If set to 0, the default train/test split from file is used. Also used in the output file name.
- data_transforms : transformer, list of transformers or None, default=None
Transformer(s) to apply to the data before running the experiment. If a list, the transformers are applied in order. If None, no transformation is applied. Each transformer's fit_transform is called on the training data and its transform on the test data.
- build_test_file : bool, default=False
Whether to generate test files. If True, the clusterer will assign clusters to the loaded test data.
- benchmark_time : bool, default=True
Whether to benchmark the hardware used with a simple function and write the results. This will typically take ~2 seconds, but is hardware dependent.
- overwrite : bool, default=False
If False, results are only built when no result file is already present. If True, any existing results are overwritten.
- predefined_resample : bool, default=False
Read a predefined resample from file instead of performing a resample. If True, the file name must include the resample_id at the end of the dataset name, i.e. <problem_path>/<dataset>/<dataset><resample_id>_TRAIN.ts.
- combine_train_test_split : bool, default=False
Whether the train/test split should be combined. If True then the train/test split is combined into a single train set. If False then the train/test split is used as normal.
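The file-naming and n_clusters rules in the parameter descriptions above can be sketched in plain Python. The two helpers below, `expected_input_files` and `resolve_n_clusters`, are hypothetical names introduced for illustration and are not part of tsml_eval; they restate the documented conventions only and say nothing about how the library implements them.

```python
from pathlib import Path


def expected_input_files(problem_path, dataset, resample_id=0,
                         predefined_resample=False):
    """Return the (train, test) .ts paths implied by the parameter docs.

    Hypothetical helper for illustration; not part of tsml_eval.
    """
    # With predefined_resample=True the resample_id is appended to the
    # dataset name in the file name (but not in the folder name).
    stem = f"{dataset}{resample_id}" if predefined_resample else dataset
    folder = Path(problem_path) / dataset
    return folder / f"{stem}_TRAIN.ts", folder / f"{stem}_TEST.ts"


def resolve_n_clusters(n_clusters, y, clusterer_default):
    """Restate the documented rules: None -> default, -1 -> number of classes.

    Hypothetical helper for illustration; not part of tsml_eval.
    """
    if n_clusters is None:
        return clusterer_default
    if n_clusters == -1:
        return len(set(y))  # number of distinct class labels
    return n_clusters


train_file, test_file = expected_input_files("/data/ts", "GunPoint")
resampled_train, _ = expected_input_files("/data/ts", "GunPoint", 3, True)
print(train_file.as_posix())
print(resampled_train.as_posix())
print(resolve_n_clusters(-1, [0, 1, 1, 0], clusterer_default=8))
```

Checking these paths before launching a batch of experiments is one way to catch a misplaced dataset folder early; the path `/data/ts` and the dataset name `GunPoint` are placeholders.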