I'm trying to run the following Colab project, but when I want to split the training data into validation and train parts I get this error:

KeyError: "Invalid split train[:70%]. Available splits are: ['train']"

I use the following code:

(training_set, validation_set), dataset_info = tfds.load('tf_flowers',split=['train[:70%]', 'train[70%:]'],with_info=True,as_supervised=True,)

How can I fix this error?

Best Answer


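According to the TensorFlow Datasets docs, the approach you presented is now supported. You can split a dataset by passing a slicing string such as split="train[:70%]" to the split parameter of tfds.load. If you still get the KeyError, the installed tensorflow-datasets version probably predates this syntax, so upgrading the package should help.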

(training_set, validation_set), dataset_info = tfds.load('tf_flowers',split=['train[:70%]', 'train[70%:]'],with_info=True,as_supervised=True,)

With the above code the training_set has 2569 entries, while validation_set has 1101.
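
As a sanity check on those numbers, the arithmetic can be sketched in plain Python. This assumes the 'train' split holds 3670 examples (2569 + 1101). split_sizes is a hypothetical helper, not part of tfds, and its floor division is a simplification: tfds may round differently when a percentage boundary does not land on a whole index (here it does).

```python
def split_sizes(total, percent):
    """Return (head, tail) sizes when slicing `total` items at `percent`."""
    # Integer arithmetic avoids floating-point surprises at the boundary.
    head = total * percent // 100
    return head, total - head


# Assumed total: 2569 + 1101 examples in the 'train' split.
train_size, validation_size = split_sizes(3670, 70)
print(train_size, validation_size)  # prints: 2569 1101
```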

Thank you Saman for the comment on API deprecation:
In previous TensorFlow Datasets versions it was possible to use the tfds.Split API, which is now deprecated:

(training_set, validation_set), dataset_info = tfds.load('tf_flowers',split=[tfds.Split.TRAIN.subsplit(tfds.percent[:70]),tfds.Split.TRAIN.subsplit(tfds.percent[70:])],with_info=True,as_supervised=True,)

If you need to allocate training, validation, and test subsets (70%, 15%, 15%), you can use the following code (got it from here):

(training_set, validation_set, test_set), dataset_info = tfds.load('tf_flowers',split=['train[:70%]', 'train[70%:85%]', 'train[85%:]'],with_info=True,as_supervised=True,)
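
If you change the proportions often, the slicing strings can be generated instead of typed by hand. The following is only a sketch: percent_splits is a hypothetical helper, not a tfds function, and it assumes whole-number percentages that add up to 100.

```python
def percent_splits(split_name, percentages):
    """Build slicing strings such as 'train[:70%]' from consecutive percentages."""
    if sum(percentages) != 100:
        raise ValueError("percentages must sum to 100")
    splits = []
    start = 0
    for i, pct in enumerate(percentages):
        end = start + pct
        # Leave the outer bounds open, matching the style used above.
        lower = "" if i == 0 else f"{start}%"
        upper = "" if i == len(percentages) - 1 else f"{end}%"
        splits.append(f"{split_name}[{lower}:{upper}]")
        start = end
    return splits


print(percent_splits("train", [70, 15, 15]))
# prints: ['train[:70%]', 'train[70%:85%]', 'train[85%:]']
```

The returned list can then be passed as the split argument of tfds.load.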