
Saturday, December 16, 2017

Machine Learning Automatic License Plate Recognition


I'm starting to study deep learning, mostly for fun and out of curiosity, but following tutorials and reading articles is only a first step.



Though I've programmed in multiple languages in the past, deep learning is somehow associated with Python, and as someone who likes C-like languages I always disliked it; the whole concept of using whitespace to delimit program blocks looked ridiculous to me. But what the hell, let's try to learn it. It makes things a lot less complicated than compiling TensorFlow, Caffe or OpenCV from source and then trying to get them to talk to each other, whereas in Python these issues have already been solved.

Learning about neural networks has been on my mind for quite a while. I've even read a few neurology books to understand the origins of these ideas, but only after attending GTC Israel 2017 and getting the chance to join a hands-on guided Nvidia DIGITS session did I start to take an active interest, though I didn't really achieve anything new for a while.

10 points if you can locate me in this clip



So I thought of a cool project, though I'm not yet sure how useful it will be: how about recognizing and registering all the vehicle license plates around one's car?

Algorithmic Approach

At first I tried OpenALPR's approach: finding a large rectangle with multiple rectangles inside it. It works if the license plate is a major object in the image, but not if there are multiple vehicles, let alone an unstructured scene such as driving on the highway or footage from some kind of mobile camera, though I might not have implemented it correctly in my code.
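The nested-rectangle idea can be sketched in a few lines. This is a minimal sketch, not OpenALPR's actual code: it assumes the rectangles have already been extracted (for example as bounding boxes of contours in an edge image), and the thresholds `min_children` and `min_aspect` are hypothetical values chosen for illustration.

```python
from typing import List, Tuple

# A rectangle as (x, y, width, height) in pixels.
Rect = Tuple[int, int, int, int]


def contains(outer: Rect, inner: Rect) -> bool:
    """True if `inner` lies entirely inside `outer`."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def plate_candidates(rects: List[Rect], min_children: int = 4,
                     min_aspect: float = 2.0) -> List[Rect]:
    """Return wide rectangles containing at least `min_children` other
    rectangles, which are assumed to be character outlines."""
    candidates = []
    for i, outer in enumerate(rects):
        _, _, w, h = outer
        if h == 0 or w / h < min_aspect:
            continue  # a plate is much wider than it is tall
        children = sum(1 for j, r in enumerate(rects)
                       if j != i and contains(outer, r))
        if children >= min_children:
            candidates.append(outer)
    return candidates


# Usage: one plate-shaped box holding five character boxes, plus a square distractor.
rects = [(100, 50, 200, 50)]
rects += [(110 + 35 * k, 60, 25, 30) for k in range(5)]
rects += [(400, 300, 80, 80)]
print(plate_candidates(rects))  # [(100, 50, 200, 50)]
```

In a busy highway frame the contour step yields hundreds of overlapping rectangles (windows, signs, grilles), and the check above is quadratic in their number, which is one plausible reason this kind of heuristic struggles outside close-up shots.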

Image Segmentation

So the second approach I considered was image segmentation. I'd been reading a lot about ENet, SegNet and ICNet lately and was eager to try them, so I began looking for a Keras model to get things started. But then I realized I don't necessarily need the exact polygon of the license plate; a bounding box should be more than enough. Then I can pass the cropped image to tesseract and get the license plate number.
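Once a detector returns a box, preparing the OCR input is just a crop. A minimal sketch, assuming (xmin, ymin, xmax, ymax) boxes with exclusive maxima and an image stored as rows of pixels; the 10% margin is a hypothetical choice that keeps some background around the characters, since Tesseract's documentation on improving output quality notes that text without any surrounding border can cause problems. With a NumPy array the crop itself would simply be `image[ymin:ymax, xmin:xmax]`.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax); max is exclusive


def expand_box(box: Box, img_w: int, img_h: int, margin: float = 0.1) -> Box:
    """Grow a box by `margin` of its size on each side, clamped to the image."""
    xmin, ymin, xmax, ymax = box
    dx = int(round((xmax - xmin) * margin))
    dy = int(round((ymax - ymin) * margin))
    return (max(0, xmin - dx), max(0, ymin - dy),
            min(img_w, xmax + dx), min(img_h, ymax + dy))


def crop(image: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Cut the box out of an image stored as a list of pixel rows."""
    xmin, ymin, xmax, ymax = box
    return [list(row[xmin:xmax]) for row in image[ymin:ymax]]


# Usage on a synthetic 20x10 image whose pixel value encodes its position.
img = [[y * 100 + x for x in range(20)] for y in range(10)]
box = expand_box((5, 2, 15, 6), img_w=20, img_h=10)
print(box)                  # (4, 2, 16, 6)
print(crop(img, box)[0][0])  # 204
```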

Object Detection


So I looked up a few object detection models, such as SSD, YOLO, Faster R-CNN, R-FCN and RetinaNet, with more being designed as we speak. I decided to go with YOLO, being biased toward it after seeing a demo I liked.
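Single-shot detectors such as YOLO typically emit several overlapping boxes for the same object, which are then pruned with non-maximum suppression (NMS). A generic greedy NMS sketch for illustration, not the code used in basic-yolo-keras; the 0.5 IoU threshold is a common but arbitrary default:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: List[Box], scores: List[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it overlaps no already-kept box by more than `iou_threshold`.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Usage: two near-duplicate detections of one plate and a separate plate.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(non_max_suppression(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```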

But to train any kind of machine learning model, you need data, and lots of it. I started looking for a license plate dataset but couldn't find anything that has both the images and the polygons... but then I remembered seeing an unmarked license plate class in the Cityscapes Dataset, so theoretically all I needed to do was generate the right masks/polygons for training.
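Turning the Cityscapes annotations into detection boxes is mostly a matter of reading each polygon file and taking the extent of every license plate polygon. A minimal sketch, assuming the `*_gtFine_polygons.json` layout from cityscapesScripts (top-level `imgWidth`, `imgHeight` and an `objects` list whose entries carry `label` and `polygon`) and the label string `license plate`; verify both against your copy of the dataset:

```python
import json
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax)


def polygon_to_box(polygon: List[List[float]]) -> Box:
    """Axis-aligned bounding box of a polygon given as [[x, y], ...],
    with coordinates truncated to whole pixels."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return (int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys)))


def plate_boxes(annotation: Dict) -> List[Box]:
    """Bounding boxes of every 'license plate' polygon in one annotation."""
    return [polygon_to_box(obj["polygon"])
            for obj in annotation.get("objects", [])
            if obj.get("label") == "license plate" and obj.get("polygon")]


def load_plate_boxes(path: str) -> Tuple[int, int, List[Box]]:
    """Read one annotation file and return (width, height, plate boxes)."""
    with open(path, encoding="utf-8") as f:
        annotation = json.load(f)
    return annotation["imgWidth"], annotation["imgHeight"], plate_boxes(annotation)
```

Iterating over `glob.glob("gtFine/train/*/*_gtFine_polygons.json")` (an assumed directory layout) and skipping files with no plates would then give per-image boxes for training.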

I cloned the basic-yolo-keras repo by Huynh Ngoc Anh, updated it to work with Python 3 and ran a training session on the dataset.
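As far as I know, the training code in basic-yolo-keras reads Pascal VOC-style XML annotations, which is the format the Cityscapes extract was converted to. A minimal writer using only the standard library; the class name `license_plate` is a hypothetical label that would have to match the labels in the training config, and only the fields a VOC-style parser typically reads are emitted:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax)


def voc_annotation(filename: str, width: int, height: int,
                   boxes: List[Box], label: str = "license_plate") -> str:
    """Build a minimal Pascal VOC-style XML annotation as a string."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Usage: one plate box in a 2048x1024 Cityscapes frame (file name is illustrative).
print(voc_annotation("example_leftImg8bit.png", 2048, 1024, [(10, 20, 50, 35)]))
```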

Training on a laptop is a bit of a problem, since it's not always on, I need to take it with me, etc., so I looked for an online solution. I ended up using an Azure NC6 machine at $0.90/hour; it has an Nvidia K80 with 12GB of GPU memory, so I could increase the batch size to make things run a bit faster. Training took less than 24 hours on ~2400 images, some with more than one sample.
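A quick upper bound on the compute bill, taking the quoted hourly rate and the 24-hour ceiling at face value (storage, data transfer and setup time not included):

```latex
\text{cost} < 24\,\text{h} \times \$0.90/\text{h} = \$21.60
```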



(on my GTX 1050 Ti, this video was rendered at about 9 fps)

As you can see, the license plate has to be readable, otherwise it isn't really detected. I didn't plan this, so I'm guessing it's either a sign of how good YOLO training is or a side effect of the Cityscapes Dataset's quality.

OCR

My next task was OCRing the license plates so I'd get data I could list and log. I've had some experience with tesseract in the past, so I chose to try it this time as well.
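Before logging, raw OCR text usually needs normalizing. A hypothetical post-processing sketch, assuming digit-only plates like the Israeli ones in this post: it maps a few letters that OCR engines commonly confuse with digits, strips separators, and rejects strings whose length doesn't match the assumed seven- or eight-digit formats. The confusion table and `valid_lengths` are my assumptions, not tesseract behavior or configuration:

```python
import re
from typing import Optional, Tuple

# Hypothetical letter-to-digit fixes; only sensible for digit-only plates.
CONFUSIONS = str.maketrans({"O": "0", "Q": "0", "D": "0", "I": "1", "L": "1",
                            "Z": "2", "S": "5", "B": "8"})


def normalize_plate(raw: str,
                    valid_lengths: Tuple[int, ...] = (7, 8)) -> Optional[str]:
    """Clean raw OCR text into a digit string, or return None if the
    result does not have a plausible plate length."""
    text = raw.upper().translate(CONFUSIONS)
    digits = re.sub(r"[^0-9]", "", text)  # drop dashes, spaces, newlines
    return digits if len(digits) in valid_lengths else None


print(normalize_plate("I2-3O5-67\n"))  # 1230567
```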

Well... this didn't go as smoothly as I wanted. While many license plates are readable by a human, the noise is just too high for tesseract to recognize them reliably.
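One common way to suppress that kind of noise before OCR is global binarization with Otsu's method, which picks the threshold that best separates dark and light pixels. This is a technique I'm substituting for illustration, not something the post reports trying; it assumes a grayscale crop given as rows of 0-255 integers:

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the 0-255 threshold maximising between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    weight_bg, sum_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]  # number of pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: Sequence[Sequence[int]]) -> List[List[int]]:
    """Map a grayscale image to pure black (0) and white (255)."""
    t = otsu_threshold([p for row in gray for p in row])
    return [[255 if p > t else 0 for p in row] for row in gray]


print(binarize([[10, 200], [200, 10]]))  # [[0, 255], [255, 0]]
```

On plates with dark characters on a light background this yields black text on white. Uneven lighting across a crop may call for an adaptive (local) threshold instead; OpenCV exposes the global version through the `cv2.THRESH_OTSU` flag of `cv2.threshold`.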

The following video was shot with a 4K camera (SJCAM M20) at a high shutter speed and high bitrate, but recognition quality improved only marginally.



(creating this video was even slower, about 1.5 fps: the GPU didn't work as hard, but tesseract did a lot of work on the CPU)

I've had my fun with this project, but I think the next step could be another deep learning object detector, only this time detecting the license plate characters: digits in the case of Israel, and letters as well for many other countries.
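If the characters themselves were detected as objects, reading a plate reduces to filtering and ordering the detections. A minimal sketch, assuming a single-row plate and a hypothetical `CharDetection` record (predicted character, horizontal box center, confidence); real detector output and the 0.5 confidence cut-off would differ:

```python
from typing import List, NamedTuple


class CharDetection(NamedTuple):
    char: str          # predicted class, e.g. "7"
    x_center: float    # horizontal center of the detected box, in pixels
    confidence: float  # detector score in [0, 1]


def read_plate(detections: List[CharDetection],
               min_confidence: float = 0.5) -> str:
    """Drop low-confidence characters and join the rest left to right."""
    kept = [d for d in detections if d.confidence >= min_confidence]
    return "".join(d.char for d in sorted(kept, key=lambda d: d.x_center))


dets = [CharDetection("3", 40, 0.9), CharDetection("1", 10, 0.95),
        CharDetection("2", 25, 0.4), CharDetection("7", 70, 0.8)]
print(read_plate(dets))  # 137
```

Two-row plates would need the detections grouped by vertical position before sorting each row.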

If I may guess further, the reason this project was not a complete success is the OCR stage: the camera is an action camera with a very wide lens, which means very low resolution for each license plate.
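The resolution argument can be put in rough numbers with a pinhole (rectilinear) camera model: a scene with horizontal field of view θ at distance d is 2·d·tan(θ/2) wide, and the plate gets its proportional share of the frame's pixel columns. This ignores fisheye distortion, which action-camera lenses have plenty of, and the default 0.52 m plate width (the common European-style long plate) is an assumption, so treat the result as an order-of-magnitude estimate:

```python
import math


def plate_width_px(image_width_px: int, hfov_deg: float,
                   distance_m: float, plate_width_m: float = 0.52) -> float:
    """Approximate horizontal pixels covered by a plate facing the camera,
    using a pinhole (rectilinear) camera model."""
    scene_width_m = 2 * distance_m * math.tan(math.radians(hfov_deg) / 2)
    return image_width_px * plate_width_m / scene_width_m


print(round(plate_width_px(3840, 90, 10), 2))  # 99.84
```

With a 3840-pixel-wide frame, a 90° field of view and a plate 10 m away, the whole plate gets roughly 100 pixels, to be shared by seven or eight characters and their separators; a wider field of view or a smaller frame shrinks that further.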


I'm pretty sure further pre-processing might raise tesseract's recognition quality, since the plates do look readable. I did discover that Israeli license plates are just too tall for tesseract's English detection, which is somewhat amusing.
If pre-processing doesn't work as desired, this little project has taught me that machine learning can probably do this task as well, and probably with high precision.
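One pre-processing experiment along those lines is to rescale each crop to a fixed height before OCR, optionally squashing it vertically, on the guess that tall, narrow glyphs are what trips up the English model. A sketch with hypothetical parameters (`target_h`, `y_squash`), using nearest-neighbour sampling to stay dependency-free; with OpenCV one would normally call `cv2.resize` with an interpolation such as `cv2.INTER_CUBIC` instead:

```python
from typing import List, Sequence


def resize_nearest(image: Sequence[Sequence[int]], new_w: int,
                   new_h: int) -> List[List[int]]:
    """Nearest-neighbour resize of an image stored as rows of pixels."""
    old_h, old_w = len(image), len(image[0])
    return [[image[y * old_h // new_h][x * old_w // new_w]
             for x in range(new_w)]
            for y in range(new_h)]


def rescale_for_ocr(image: Sequence[Sequence[int]], target_h: int = 48,
                    y_squash: float = 1.0) -> List[List[int]]:
    """Scale a plate crop so it is `target_h` rows tall at its original
    aspect ratio, then apply an optional vertical squash (< 1 flattens)."""
    old_h, old_w = len(image), len(image[0])
    new_w = max(1, round(old_w * target_h / old_h))
    new_h = max(1, round(target_h * y_squash))
    return resize_nearest(image, new_w, new_h)


print(resize_nearest([[1, 2], [3, 4]], 4, 2))  # [[1, 1, 2, 2], [3, 3, 4, 4]]
```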

Source Code

I'm still not ready to publish any Python code; I need to get more familiar with the language first.
In any case, there is nothing new there. The code for building the Cityscapes dataset extract basically just parses the JSON files and produces VOC-format XML, the YOLO code is the code from basic-yolo-keras with some adjustments, and lastly the license plate clean-up code is just simple auto-levels-like code on the V channel in HSV.
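That clean-up step is simple enough to sketch with only the standard library. A minimal version, assuming "auto-levels like" means a linear stretch of the HSV value channel between its minimum and maximum (photo editors often also clip a small percentage at each end, which is omitted here); pixels are an illustrative flat list of 8-bit RGB tuples rather than an image array:

```python
import colorsys
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def auto_levels_v(pixels: Sequence[RGB]) -> List[RGB]:
    """Stretch the HSV value channel so the darkest pixel maps to 0 and the
    brightest to 1, leaving hue and saturation untouched."""
    if not pixels:
        return []
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    v_min = min(v for _, _, v in hsv)
    v_max = max(v for _, _, v in hsv)
    span = v_max - v_min
    if span == 0:
        return list(pixels)  # flat image: nothing to stretch
    out = []
    for h, s, v in hsv:
        r, g, b = colorsys.hsv_to_rgb(h, s, (v - v_min) / span)
        out.append((round(r * 255), round(g * 255), round(b * 255)))
    return out


print(auto_levels_v([(60, 60, 60), (180, 180, 180)]))  # [(0, 0, 0), (255, 255, 255)]
```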

Conclusion

This was a fun project, and I'm sure that with further research it could become pretty cool and reliable software. Using YOLO for license plate detection seemed to work pretty well; perhaps cleaning up the dataset and further optimizing the training and inference processes will make it even better, and perhaps machine-learning-based digit/letter recognition will make reading the plates more reliable. Perhaps it can all be coded as an algorithm rather than a model... maybe the next thing should be recognizing car make and color?


Further Reading:

Speed/accuracy trade-offs for modern convolutional object detectors by Felix Lau
Cityscapes Dataset
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
ICNet for Real-Time Semantic Segmentation on High-Resolution Images
SSD: Single Shot MultiBox Detector
You Only Look Once: Unified, Real-Time Object Detection
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R-FCN: Object Detection via Region-based Fully Convolutional Networks
RetinaNet: Focal Loss for Dense Object Detection


2 comments:

  1. Awesome work! I went to the Cityscapes dataset website but couldn't find the license plate class that you mentioned. If you could please share a more direct link to the license plate data you mentioned, I would greatly appreciate it.

    1. Check out the git repo for the group: https://github.com/mcordts/cityscapesScripts

      It contains a bunch of scripts to manipulate the dataset. Particularly, in the helpers folder, the labels.py script shows that there is a Label instance with the following default properties:

      Name: 'license plate'
      Id: -1
      TrainId: -1
      Category: 'vehicle'
      CatId: 7
      HasInstances: False
      IgnoreInEval: True
      Color: (0, 0, 142)


      I'm still downloading the data so I don't know if this is sufficient or even what the author did.
