Follow the instructions below:
Once you have finished solving the exercises, be sure to commit your changes, push them to your repository, and go to 4Geeks.com to upload the repository link.
The dataset is composed of dog and cat photos provided as a subset of photos from a much larger collection of 3 million manually annotated photos. This data was obtained through a collaboration between Petfinder.com and Microsoft.
The data set was originally used as a CAPTCHA, i.e., a task that a human is believed to find trivial, but that a machine cannot solve, which is used on websites to distinguish between human users and bots. The task was named "Asirra". When "Asirra" was introduced, it was mentioned "that user studies indicate that humans can solve it 99.6% of the time in less than 30 seconds." Barring a breakthrough in computer vision, we expect that computers will have no more than a 1/54,000 chance of solving it.
At the time the competition was published, the state-of-the-art result was achieved with an SVM and was described in a 2007 paper with the title "Machine Learning Attacks against Asirra's CAPTCHA" (PDF) that achieved 80% classification accuracy. It was this paper that showed that the task was no longer a suitable task for a CAPTCHA shortly after the task was proposed.
The dataset is located in Kaggle and you will need to access it to download it. You can find the competition here (or by copying and pasting the following link in your browser: https://www.kaggle.com/c/dogs-vs-cats/data
)
Download the dataset folder and unzip the files. You will now have a folder called train
containing 25,000 image files (.jpg format) of dogs and cats. The pictures are labeled by their file name, with the word dog
or cat
.
The first step when faced with a picture classification problem is to get as much information as possible through the pictures. Therefore, load and print the first nine pictures of dogs in a single figure. Repeat the same for cats. You can see that the pictures are in color and have different shapes and sizes.
This variety of sizes and formats must be sorted out before training the model. Make sure they all have a fixed size of 200x200 pixels.
As you can see, there are a lot of images. Make sure you stick to the following rules:
ImageDataGenerator
class and the flow_from_directory()
function. This will be slower to run but it will run on less capable hardware. This function prefers the data to be split into separate train and test directories, and under each directory to have a subdirectory for each class.Once you have all the images processed, create an ImageDataGenerator
object for training and test data. Then pass the folder that has training data to the trdata
object and, similarly, pass the folder that has test data to the tsdata
object. In this way, the images will be automatically labeled, and everything will be ready to enter the network.
Any classifier that fits this problem will have to be robust because some images show the cat or dog in a corner, or perhaps 2 cats or dogs in the same picture. If you have been able to research some of the winner implementations of other competitions also related to images, you will see that VGG16
is a CNN architecture used to win the Kaggle ILSVR (Imagenet) competition in 2014. It is considered one of the best performing vision model architectures to date.
It uses the following test architecture:
1model = Sequential() 2model.add(Conv2D(input_shape = (224,224,3), filters = 64, kernel_size = (3,3), padding = "same", activation = "relu")) 3model.add(Conv2D(filters = 64,kernel_size = (3,3),padding = "same", activation = "relu")) 4model.add(MaxPool2D(pool_size = (2,2),strides = (2,2))) 5model.add(Conv2D(filters = 128, kernel_size = (3,3), padding = "same", activation = "relu")) 6model.add(Conv2D(filters = 128, kernel_size = (3,3), padding = "same", activation = "relu")) 7model.add(MaxPool2D(pool_size = (2,2),strides = (2,2))) 8model.add(Conv2D(filters = 256, kernel_size = (3,3), padding = "same", activation = "relu")) 9model.add(Conv2D(filters = 256, kernel_size = (3,3), padding = "same", activation = "relu")) 10model.add(Conv2D(filters = 256, kernel_size = (3,3), padding = "same", activation = "relu")) 11model.add(MaxPool2D(pool_size = (2,2),strides = (2,2))) 12model.add(Conv2D(filters = 512, kernel_size = (3,3), padding = "same", activation = "relu")) 13model.add(Conv2D(filters = 512, kernel_size = (3,3), padding = "same", activation = "relu")) 14model.add(Conv2D(filters = 512, kernel_size = (3,3), padding = "same", activation = "relu")) 15model.add(MaxPool2D(pool_size = (2,2),strides = (2,2))) 16model.add(Conv2D(filters = 512, kernel_size = (3,3), padding = "same", activation = "relu")) 17model.add(Conv2D(filters = 512, kernel_size = (3,3), padding = "same", activation = "relu")) 18model.add(Conv2D(filters = 512, kernel_size = (3,3), padding = "same", activation = "relu")) 19model.add(MaxPool2D(pool_size = (2,2),strides = (2,2))) 20model.add(Flatten()) 21model.add(Dense(units = 4096,activation = "relu")) 22model.add(Dense(units = 4096,activation = "relu")) 23model.add(Dense(units = 2, activation = "softmax"))
The above code applies convolutions to the data (Conv2D
and MaxPool2D
layers) and then applies dense layers (Dense
layers) for processing the numerical values obtained after the convolutions.
Then add the remaining elements to form the model, train it and measure its performance.
Import the ModelCheckpoint
and EarlyStopping
method from Keras. Create an object of both and pass them as callback functions to fit_generator
.
Load the best model from the above and use the test set to make predictions.
Store the model in the corresponding folder.
Note: We also incorporated the solution samples on
./solution.ipynb
that we strongly suggest you only use if you are stuck for more than 30 min or if you have already finished and want to compare it with your approach.