Improving Computer Vision Accuracy using Convolutions - [Notes]
A sample DNN has three kinds of layers: an input layer, a hidden layer and an output layer.
>> the input layer takes the shape of the data
>> the output layer takes the shape of the desired output
Can we get better performance?
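Yes, via ‘Convolution’, which narrows down the content of the image to focus on specific details.
In short, you take a small array (usually 3x3 or 5x5), called a filter, and pass it over the image. At each position, the pixel values under the filter are multiplied by the filter values and summed, which emphasizes features such as edges.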
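To make the idea of passing a filter over an image concrete, here is a minimal sketch in plain Python. The function name convolve2d, the 5x5 image and the vertical-edge filter are all made up for illustration; real frameworks do the same thing far more efficiently and also learn the filter values during training.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image (no padding, stride 1).

    As in most CNN libraries, the kernel is applied without flipping.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply each pixel under the kernel by the kernel value and sum.
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 5x5 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 10, 10, 10] for _ in range(5)]

# Illustrative 3x3 filter that responds to vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

edges = convolve2d(image, kernel)
for row in edges:
    print(row)
```

The output is 3x3 rather than 5x5 because the filter cannot be centred on the border pixels. The large values appear where the dark region meets the bright one, which is the detail this particular filter focuses on.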
Next, try increasing the number of epochs. How is your accuracy now? If it improves, you are on the right track. Your validation result, however, may have decreased. This may be due to overfitting. Overfitting means the network has memorized the training data rather than learnt from it, so it cannot generalize well enough to distinguish things it has never seen before.
While forming your model, here are some tricks from deeplearning.ai:
- The number of convolutions is purely arbitrary, but it is better to start with something in the order of 32
- The size of the convolution, i.e. the filter size, e.g. a 3x3 grid or a 5x5 grid
- The activation function to use, e.g. relu, which, you might recall, is the equivalent of returning x when x>0, else returning 0
- In the first layer, the shape of the input data; a smaller input shape uses fewer resources [less computation]
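The relu activation mentioned in the list can be sketched in a couple of lines of plain Python. The feature_map values below are invented for illustration; in a real model they would be the outputs of a convolution.

```python
def relu(x):
    """Return x when x > 0, else return 0."""
    return x if x > 0 else 0


# Hypothetical convolution output containing negative values.
feature_map = [
    [30, -12, 0],
    [-5, 8, -1],
]

# Apply relu to every cell; negative responses are dropped to 0.
activated = [[relu(value) for value in row] for row in feature_map]
print(activated)
```

Only the positive responses survive, so the following layers see just the places where the filter actually matched something.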
MaxPooling is about suppressing some level of detail. It is designed to compress the image, while keeping the features highlighted by the convolution. For an 8 by 8 matrix, each 2x2 block [4 elements] is represented by one cell, the one with the highest value in that neighbourhood.
The effect is to quarter the size of the image: the 8 by 8 matrix becomes 4 by 4.
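A minimal sketch of max pooling over 2x2 blocks, in plain Python. The function name max_pool_2x2 and the 8 by 8 matrix of counting numbers are assumptions made for this example only; the pooling size of 2x2 is the common default rather than the only option.

```python
def max_pool_2x2(matrix):
    """Keep the highest value of every non-overlapping 2x2 block."""
    pooled = []
    for r in range(0, len(matrix) - 1, 2):
        row = []
        for c in range(0, len(matrix[0]) - 1, 2):
            block = (
                matrix[r][c], matrix[r][c + 1],
                matrix[r + 1][c], matrix[r + 1][c + 1],
            )
            row.append(max(block))
        pooled.append(row)
    return pooled


# Hypothetical 8x8 matrix filled with the values 0..63.
matrix = [[r * 8 + c for c in range(8)] for r in range(8)]

pooled = max_pool_2x2(matrix)
print(len(matrix), "x", len(matrix[0]), "->", len(pooled), "x", len(pooled[0]))
for row in pooled:
    print(row)
```

Each side is halved, so the 64 original cells are reduced to 16, a quarter of the original size.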
It's time to play around.
Thanks to deeplearning.ai for providing nice resources.