Tuesday, June 12, 2018

Tiny Model for MNIST Dataset

The MNIST database is a collection of handwritten digits. It's often used as an ideal beginner dataset for learning simple image classification, and since it contains only 10 classes (the digits 0 to 9), it's relatively easy to work with.

It has 60,000 training examples and 10,000 test examples, so it is sufficiently large. Since the training set is 60,000 grayscale images of 28 x 28 pixels, it takes about 50MB, so it adds no complexity for batch generation: it can all fit in memory.
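As a quick sanity check on that size, here is a minimal sketch of the arithmetic. It assumes the pixels are stored as one byte each (uint8); the function and variable names are just for illustration. Converting to float32 for training would multiply the footprint by four, which still fits comfortably in memory on most machines.

```python
# Raw size of the MNIST image arrays.
# Assumption: one byte per pixel (uint8), single grayscale channel.
NUM_TRAIN = 60_000
NUM_TEST = 10_000
HEIGHT, WIDTH = 28, 28


def raw_image_bytes(num_images, height=HEIGHT, width=WIDTH, bytes_per_pixel=1):
    """Bytes needed to hold num_images grayscale images in memory."""
    return num_images * height * width * bytes_per_pixel


train_bytes = raw_image_bytes(NUM_TRAIN)
test_bytes = raw_image_bytes(NUM_TEST)
train_float32_bytes = raw_image_bytes(NUM_TRAIN, bytes_per_pixel=4)

print(f"train, uint8:   {train_bytes / 1e6:.1f} MB")
print(f"test, uint8:    {test_bytes / 1e6:.1f} MB")
print(f"train, float32: {train_float32_bytes / 1e6:.1f} MB")
```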

Like I said, ideal.

I think the first lesson I did on image classification used the MNIST database as well. It was very amusing to see a 1.2 million parameter model for a 50MB dataset, so I decided to see how low I could go.

Let's start with the big one.

1.2m parameters - LeNet

CNN Error: 0.80%
train_loss: 0.0082
train_acc: 0.9984
val_loss: 0.0552
The graph does look like it's overfitting a bit.
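To get a feel for where roughly 1.2 million parameters can come from, here is a rough sketch that counts the weights in a small CNN by hand. The layer sizes are an assumption (two 3x3 convolutions with 32 and 64 filters, a 2x2 max pool, then dense layers of 128 and 10 units, similar to a common Keras MNIST example) and may not match the exact model used here. The point is that almost all of the parameters sit in the first dense layer after the flatten.

```python
# Hand count of trainable parameters for an ASSUMED layer stack
# (not necessarily the exact model from this post).

def conv2d_params(kernel, in_channels, out_channels):
    """Weights plus one bias per filter for a square 2D convolution."""
    return (kernel * kernel * in_channels + 1) * out_channels


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit for a fully connected layer."""
    return (in_units + 1) * out_units


# 28x28x1 input, 'valid' padding: 28 -> 26 -> 24, then 2x2 max pool -> 12
flattened_units = 12 * 12 * 64

layers = [
    ("conv 3x3, 1 -> 32", conv2d_params(3, 1, 32)),
    ("conv 3x3, 32 -> 64", conv2d_params(3, 32, 64)),
    ("dense 9216 -> 128", dense_params(flattened_units, 128)),
    ("dense 128 -> 10", dense_params(128, 10)),
]

total_params = sum(count for _, count in layers)
for name, count in layers:
    print(f"{name:20s} {count:>10,d}  ({count / total_params:.1%})")
print(f"{'total':20s} {total_params:>10,d}")
```

With these assumed sizes, the convolutions account for under 2% of the total, so shrinking or removing the big dense layer is the obvious first place to look.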

My first experiment used huge convolutions. I managed to reach 99% accuracy with 300k parameters, but I was not satisfied; surely there is a better way.

This model reaches 0.9928 accuracy with only 36k parameters (!!)

CNN Error: 0.72%
train_loss: 0.0224
train_acc: 0.9928
val_loss: 0.0255
val_acc: 0.9928
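The "CNN Error" figure appears to be simply the complement of the validation accuracy, expressed as a percentage. A small sketch of that conversion, assuming that definition (the helper names are made up for illustration):

```python
# Convert between validation accuracy and the "CNN Error" percentage.
# Assumption: error % = (1 - val_acc) * 100.

def error_percent(val_acc):
    """Error rate in percent for a given accuracy in [0, 1]."""
    return round((1.0 - val_acc) * 100, 2)


def accuracy_from_error(error_pct):
    """Accuracy in [0, 1] for a given error rate in percent."""
    return round(1.0 - error_pct / 100, 4)


model_36k_error = error_percent(0.9928)    # val_acc reported for the 36k model
lenet_accuracy = accuracy_from_error(0.80)  # error reported for the 1.2m model

print(f"36k model error: {model_36k_error:.2f}%")
print(f"1.2m model accuracy: {lenet_accuracy:.4f}")
```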

Only 36k? Well, that's still huge. I looked around and found out it's possible with under 4k parameters, so I set myself the challenge and came up with this model.

CNN Error: 0.90%
train_loss: 0.0369
train_acc: 0.9880
val_loss: 0.0285
val_acc: 0.9910

The model is under 4k parameters (about 3.8k).
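One common way to land in this range, which may or may not be what this model does, is to keep the convolutions narrow and replace the flatten-plus-dense head with global average pooling. The sketch below compares the two heads on a hypothetical 7x7x16 feature map; all channel counts and sizes here are assumptions chosen for illustration, and pooling layers are ignored because they have no parameters.

```python
# Parameter budget for a HYPOTHETICAL tiny CNN, comparing two classifier heads.

def conv2d_params(kernel, in_channels, out_channels):
    """Weights plus one bias per filter for a square 2D convolution."""
    return (kernel * kernel * in_channels + 1) * out_channels


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit for a fully connected layer."""
    return (in_units + 1) * out_units


NUM_CLASSES = 10
FEATURE_H, FEATURE_W, CHANNELS = 7, 7, 16  # assumed final feature map

# Narrow convolutional body: 1 -> 8 -> 12 -> 16 channels, all 3x3 kernels.
conv_body = (
    conv2d_params(3, 1, 8)
    + conv2d_params(3, 8, 12)
    + conv2d_params(3, 12, CHANNELS)
)

# Head A: flatten every spatial position, then one dense layer.
flatten_head = dense_params(FEATURE_H * FEATURE_W * CHANNELS, NUM_CLASSES)

# Head B: average each channel down to one value, then one dense layer.
gap_head = dense_params(CHANNELS, NUM_CLASSES)

total_with_flatten = conv_body + flatten_head
total_with_gap = conv_body + gap_head

print(f"conv body:          {conv_body:,d}")
print(f"flatten + dense:    {flatten_head:,d} -> total {total_with_flatten:,d}")
print(f"global avg + dense: {gap_head:,d} -> total {total_with_gap:,d}")
```

This only counts parameters; it says nothing about whether such a stack would reach the accuracy reported above.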

To summarise what I've learned from this exercise: larger models learn faster but also overfit faster, while smaller models need more training to find a good fit.


I would like to say thank you to EliteDataScience.com for getting this little exercise started.

My 36k model:

My 4k model: