tsml-eval Data Format
tsml-eval primarily uses numpy arrays as the datatype of choice when running experiments. The type of numpy array used, however, depends on the dataset characteristics (i.e. whether the series are of equal or unequal length) and the learning task.
Classification, clustering and regression take a collection of time series as input. Forecasting takes a single time series as input.
Time Series Collections
There are two types of collection datatypes used in tsml-eval:
A 3D numpy array of shape (n_samples, n_channels, n_timestamps) for equal length time series.
A list of 2D numpy arrays of shape (n_channels, n_timestamps) for unequal length time series.
These are both designed to accommodate multivariate time series, where n_channels is the number of variables in the time series. For univariate time series, n_channels is 1.
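As a minimal sketch of the two layouts described above, the following builds both from synthetic random data using only numpy. The variable names and sizes are illustrative assumptions and are not part of tsml-eval.

```python
import numpy as np

rng = np.random.default_rng(0)

# Equal length: 5 series, 2 channels, 10 time points each, in one 3D array.
X_equal = rng.normal(size=(5, 2, 10))

# Unequal length: a list of 2D arrays that share n_channels but differ in
# n_timestamps.
X_unequal = [rng.normal(size=(2, length)) for length in (8, 10, 12)]

# A univariate collection stored as (n_samples, n_timestamps) gains the
# channel axis with a reshape, giving n_channels == 1.
X_2d = rng.normal(size=(5, 10))
X_univariate = X_2d.reshape(X_2d.shape[0], 1, X_2d.shape[1])

print(X_equal.shape)                 # (5, 2, 10)
print([x.shape for x in X_unequal])  # [(2, 8), (2, 10), (2, 12)]
print(X_univariate.shape)            # (5, 1, 10)
```

The reshape only adds an axis of size 1, so the values of the univariate series are unchanged.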
Below is an example for these formats.
[1]:
from tsml.datasets import load_minimal_chinatown
X, y = load_minimal_chinatown()
print("Shape:", X.shape)
print("Type:", type(X))
print(X[:5])
Shape: (40, 1, 24)
Type: <class 'numpy.ndarray'>
[[[ 573. 375. 301. 212. 55. 34. 25. 33. 113. 143. 303.
615. 1226. 1281. 1221. 1081. 866. 1096. 1039. 975. 746. 581.
409. 182.]]
[[ 394. 264. 140. 144. 104. 28. 28. 25. 70. 153. 401.
649. 1216. 1399. 1249. 1240. 1109. 1137. 1290. 1137. 791. 638.
597. 316.]]
[[ 603. 348. 176. 177. 47. 30. 40. 42. 101. 180. 401.
777. 1344. 1573. 1408. 1243. 1141. 1178. 1256. 1114. 814. 635.
304. 168.]]
[[ 428. 309. 199. 117. 82. 43. 24. 64. 152. 183. 408.
797. 1288. 1491. 1523. 1460. 1365. 1520. 1700. 1797. 1596. 1139.
910. 640.]]
[[ 372. 310. 203. 133. 65. 39. 27. 36. 107. 139. 329.
651. 990. 1027. 1041. 971. 1104. 844. 1023. 1019. 862. 643.
591. 452.]]]
The labels for each time series are stored in a 1D numpy array.
[2]:
print("Shape:", y.shape)
print("Type:", type(y))
print(y)
Shape: (40,)
Type: <class 'numpy.ndarray'>
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
A regular numpy array requires every series to have the same length, so unequal length collections are stored as a list of 2D numpy arrays instead.
[3]:
from tsml.datasets import load_minimal_japanese_vowels
X, _ = load_minimal_japanese_vowels()
print("Size:", len(X))
print("Type:", type(X))
print("Case 1 shape:", X[0].shape)
print("Case 2 shape:", X[1].shape)
print("Case 1 type:", type(X[0]))
Size: 40
Type: <class 'list'>
Case 1 shape: (12, 20)
Case 2 shape: (12, 26)
Case 1 type: <class 'numpy.ndarray'>
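When working with the list format, it can be useful to inspect the series lengths and to fall back to a 3D array when they all match. The sketch below assumes two hypothetical helpers, series_lengths and stack_if_equal_length, written with plain numpy. They are not tsml-eval functions.

```python
import numpy as np


def series_lengths(X):
    """Return n_timestamps for every series in a collection."""
    return [x.shape[-1] for x in X]


def stack_if_equal_length(X):
    """Stack a list of 2D arrays into a 3D array when all shapes match.

    If the shapes differ, the list is returned unchanged.
    """
    if len({x.shape for x in X}) == 1:
        return np.stack(X)
    return X


# Two series with 12 channels and different lengths, as in the example above.
X_unequal = [np.zeros((12, 20)), np.zeros((12, 26))]
print(series_lengths(X_unequal))                    # [20, 26]
print(type(stack_if_equal_length(X_unequal)))       # <class 'list'>

# Three series with identical shapes can be stacked into one 3D array.
X_same = [np.zeros((12, 20)) for _ in range(3)]
print(stack_if_equal_length(X_same).shape)          # (3, 12, 20)
```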
Collection datatypes can be loaded from files in the aeon .ts format using the tsml loader below.
[4]:
from tsml.datasets import load_from_ts_file
X, y = load_from_ts_file(
    "../tsml_eval/datasets/MinimalChinatown/MinimalChinatown_TRAIN.ts"
)
X.shape
[4]:
(20, 1, 24)
Single Time Series
Functionality for single series tasks in tsml-eval is currently limited. With the functions currently available, the best datatype to use is a 1D numpy array.
[5]:
import pandas as pd
X = pd.read_csv(
    "../tsml_eval/datasets/ShampooSales/ShampooSales_TRAIN.csv",
    index_col=0,
).squeeze("columns")
X = X.astype(float).to_numpy()
print("Shape:", X.shape)
print("Type:", type(X))
print(X)
Shape: (24,)
Type: <class 'numpy.ndarray'>
[266. 145.9 183.1 119.3 180.3 168.5 231.8 224.5 192.8 122.9 336.5 185.9
194.3 149.5 210.1 273.3 191.4 287. 226. 303.6 289.9 421.6 264.5 342.3]
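Data from other sources may arrive as a single-column 2D array rather than a 1D array. As a rough sketch, a hypothetical helper such as as_1d_series (an assumption for illustration, not a tsml-eval function) could coerce such input into the 1D float array described above and reject anything that is not a single series.

```python
import numpy as np


def as_1d_series(values):
    """Hypothetical helper: coerce input to a 1D float numpy array."""
    arr = np.asarray(values, dtype=float)
    # Flatten a single column or single row, e.g. shape (n, 1) or (1, n).
    if arr.ndim == 2 and 1 in arr.shape:
        arr = arr.reshape(-1)
    if arr.ndim != 1:
        raise ValueError(f"expected a single series, got shape {arr.shape}")
    return arr


# A single-column 2D array holding the first three values printed above.
column = np.array([[266.0], [145.9], [183.1]])
series = as_1d_series(column)
print(series.shape)  # (3,)
```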