I'm trying to run the following Colab project, but when I try to split the training data into training and validation parts, I get this error:
    KeyError: "Invalid split train[:70%]. Available splits are: ['train']"

I use the following code:
    import tensorflow_datasets as tfds

    (training_set, validation_set), dataset_info = tfds.load(
        'tf_flowers',
        split=['train[:70%]', 'train[70%:]'],
        with_info=True,
        as_supervised=True,
    )

How can I fix this error?
Best Answer
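According to the TensorFlow Datasets docs, the approach you presented is now supported: you can split a dataset by passing a slice string in the split parameter of tfds.load, like so: split="train[:70%]". If you still get the KeyError, the installed tensorflow-datasets package is probably an older release that predates this slicing syntax, and upgrading it should resolve the error.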
    import tensorflow_datasets as tfds

    # 70% of 'train' for training, the remaining 30% for validation.
    (training_set, validation_set), dataset_info = tfds.load(
        'tf_flowers',
        split=['train[:70%]', 'train[70%:]'],
        with_info=True,
        as_supervised=True,
    )

With the above code, training_set has 2569 entries, while validation_set has 1101.
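The counts above follow directly from the percentages. As a rough sanity check, the arithmetic can be sketched in plain Python. This assumes the train split holds 3670 examples (2569 + 1101) and that a percent boundary is turned into an index by rounding to the nearest integer. The helper name percent_to_index is hypothetical and not part of TFDS, and the library's real rounding rule may differ, although for a 70% boundary the result is exact either way.

```python
# Plain-Python sketch of how percent slices translate into example counts.
# Assumption: 3670 examples in 'train' (2569 + 1101 from the answer above).
TOTAL_EXAMPLES = 3670


def percent_to_index(percent, total):
    """Hypothetical helper: map a percent boundary to an example index."""
    # Integer arithmetic avoids float error; adding 50 rounds half up.
    return (percent * total + 50) // 100


boundary = percent_to_index(70, TOTAL_EXAMPLES)
train_count = boundary                         # size of 'train[:70%]'
validation_count = TOTAL_EXAMPLES - boundary   # size of 'train[70%:]'

print(train_count, validation_count)  # prints: 2569 1101
```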
Thank you Saman for the comment on API deprecation:
In previous versions of TensorFlow Datasets, it was possible to use the tfds.Split subsplit API, which is now deprecated:
    # Legacy API (deprecated): kept here for reference only.
    (training_set, validation_set), dataset_info = tfds.load(
        'tf_flowers',
        split=[
            tfds.Split.TRAIN.subsplit(tfds.percent[:70]),
            tfds.Split.TRAIN.subsplit(tfds.percent[70:]),
        ],
        with_info=True,
        as_supervised=True,
    )

If you need to allocate training, validation, and test subsets (70%, 15%, 15%), this is the code (got it from here):
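    import tensorflow_datasets as tfds

    # 70% training, 15% validation, 15% test, all carved out of 'train'.
    (training_set, validation_set, test_set), dataset_info = tfds.load(
        'tf_flowers',
        split=['train[:70%]', 'train[70%:85%]', 'train[85%:]'],
        with_info=True,
        as_supervised=True,
    )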
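Slice strings like the ones above can also be generated instead of typed by hand, which avoids mismatched boundaries or brackets. The following is a minimal sketch in plain Python. The helper make_split_strings is hypothetical (it is not part of TFDS) and only builds the strings. It assumes whole-number percentages that sum to 100.

```python
def make_split_strings(split_name, percentages):
    """Hypothetical helper: build TFDS-style slice strings from percentages.

    percentages must be positive integers summing to 100, e.g. [70, 15, 15].
    """
    if sum(percentages) != 100 or any(p <= 0 for p in percentages):
        raise ValueError("percentages must be positive and sum to 100")

    splits = []
    start = 0
    for pct in percentages:
        end = start + pct
        # Omit the bound at either edge, matching 'train[:70%]' and 'train[85%:]'.
        lower = "" if start == 0 else f"{start}%"
        upper = "" if end == 100 else f"{end}%"
        splits.append(f"{split_name}[{lower}:{upper}]")
        start = end
    return splits


splits = make_split_strings("train", [70, 15, 15])
print(splits)  # prints: ['train[:70%]', 'train[70%:85%]', 'train[85%:]']
```

The resulting list can be passed as the split argument of tfds.load in place of the hand-written list.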