Using More Sophisticated Images in TensorFlow — Part 1
Smart Options 1: Image Generators
The image generator reads images from subdirectories and automatically labels each image with the name of its subdirectory.
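As a minimal, self-contained sketch, this is how Keras' `ImageDataGenerator` infers labels from folder names. To keep it runnable, it first creates a tiny dummy dataset on disk; in practice you would point `flow_from_directory` at your real image folders, and the class names, paths, and image size here are illustrative assumptions.

```python
import os
import tempfile

import numpy as np
from PIL import Image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Build a hypothetical dataset: one subdirectory per class.
root = tempfile.mkdtemp()
for label in ("cats", "dogs"):
    os.makedirs(os.path.join(root, label))
    dummy = np.random.randint(0, 255, (160, 160, 3), dtype=np.uint8)
    Image.fromarray(dummy).save(os.path.join(root, label, "example.png"))

datagen = ImageDataGenerator(rescale=1.0 / 255)  # scale pixels to [0, 1]

# Each subdirectory name ("cats", "dogs") becomes a class label automatically.
generator = datagen.flow_from_directory(
    root,
    target_size=(150, 150),  # images are resized as they are streamed
    batch_size=2,
    class_mode="binary",     # use "categorical" for more than two classes
)
print(generator.class_indices)  # labels inferred from folder names, e.g. {'cats': 0, 'dogs': 1}
```

Note that no label file is needed anywhere: the directory structure itself is the labeling.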
Smart Options 2: A High Number of Samples
If we have a relatively small number of training examples (say, 2,000), overfitting should be our number one concern. Overfitting happens when a model is exposed to too few examples and starts relying on irrelevant features to make predictions. A large gap between training and validation results is a sign of this: training accuracy may keep rising, but if validation accuracy levels off after a few epochs, the model is memorizing the training data rather than benefiting from further training. Very rapid gains early in training can be another sign; for example, if accuracy jumps to 80% after only 2 epochs, there may be little point in training longer, because the model has already overfit the training data.
Additionally, validation accuracy is a better indicator of model performance, because it is measured on images the model has never seen before.
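One practical way to act on this signal is Keras' `EarlyStopping` callback, sketched below; the patience value and the `model.fit` call are placeholders, not tuned recommendations.

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop training once validation accuracy stops improving, instead of
# continuing to fit noise in the training set.
early_stop = EarlyStopping(
    monitor="val_accuracy",     # watch the held-out set, not training accuracy
    patience=3,                 # tolerate 3 epochs without improvement
    restore_best_weights=True,  # roll back to the best validation epoch
)

# Hypothetical usage with a compiled model and generators:
# model.fit(train_generator, validation_data=validation_generator,
#           epochs=50, callbacks=[early_stop])
```

With this in place, asking for many epochs is harmless: training halts as soon as the validation curve plateaus.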
Overfitting is the central problem in machine learning. To avoid it, include a good number of samples with different colors, sizes, positions, and so on. Using more data helps, but there are other techniques that work well on small datasets.
Smart Options 3: Data Augmentation
Data augmentation generates new training data by skewing, rotating, cropping, resizing, shifting, shearing, zooming, flipping, and other transformations. TensorFlow has built-in functions that help with this. The image generator is a good instrument for labeling data by folder name, and on top of that, it applies augmentation on the fly as images are streamed into memory for training. This way you can try out different kinds of augmentation with no risk of losing your data, since the original files are never overwritten.
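The transformations above can be sketched with `ImageDataGenerator`'s augmentation parameters; the specific values here are illustrative assumptions, not tuned recommendations.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenting_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=40,       # random rotations up to 40 degrees
    width_shift_range=0.2,   # horizontal shifts up to 20% of the width
    height_shift_range=0.2,  # vertical shifts up to 20% of the height
    shear_range=0.2,         # random shearing transformations
    zoom_range=0.2,          # random zoom in/out
    horizontal_flip=True,    # random left-right flips
    fill_mode="nearest",     # fill pixels exposed by shifts and rotations
)

# Demonstrate streaming on a dummy in-memory batch; with real data you
# would call flow_from_directory on your training folder instead.
images = np.random.rand(4, 64, 64, 3).astype("float32") * 255
batch = next(augmenting_datagen.flow(images, batch_size=4, shuffle=False))
```

Each epoch, the generator produces freshly transformed variants of the same source images in memory, which is why the files on disk stay untouched.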
Image augmentation is a simple but powerful method to prevent overfitting on your training data.