Image Recognition Tools
I’m always impressed with the advancement of machine learning, and, more recently, deep learning. However, since I am not an expert in the field I decided to let the researchers and scholars elaborate more on them.
In this post I will share the existing tools and the associated libraries to make them work, at least for me.
The reason I explored these tools is simple: I plan to deploy a poor man’s security camera in my home with some “sense” of intelligence. Since I am working at home, I want to know who is actually knocking my door. So I thought, what if I could use a web cam to monitor my door and let me know who’s actually standing at the door?
I searched around for existing face detection software and found this Python script using Haarcascade. So I was able to detect faces, but upon sharing the “findings” with a friend he said this only detects faces. How would the computer be able to recognize who’s who? Then I stumbled upon the phrase “face recognition”.
You might have noticed that if you use the image file that you import directly from your smartphone, the output will be displayed in a large file to the screen. You can use ImageMagick to resize the file to say, 640x480 pixels.
$ file makan.jpg makan.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, Exif Standard: [TIFF image data, big-endian, direntries=15, height=3120, bps=0, width=4160], baseline, precision 8, 4160x3120, frames 3 $ convert makan.jpg -resize 640x480 makan-small.jpg $ file makan-small.jpg makan-small.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), density 72x72, segment length 16, Exif Standard: [TIFF image data, big-endian, direntries=15, height=3120, bps=0, width=4160], baseline, precision 8, 640x480, frames 3
The computer doesn’t see the image directly as the humans seem to, so we need to convert the images into numerical values. For example, in the facial regcognition tools, the training file contains the following matrices:
opencv_lbphfaces: threshold: 1.7976931348623157e+308 radius: 1 neighbors: 8 grid_x: 8 grid_y: 8 histograms: - !!opencv-matrix rows: 1 cols: 16384 dt: f data: [ 2.46913582e-02, 1.85185187e-02, 0., 3.08641978e-03, 1.23456791e-02, 6.17283955e-03, 3.08641978e-03, 2.46913582e-02, 0., 0., 0., 0., 0., 3.08641978e-03, 0., 9.25925933e-03, 1.85185187e-02, 9.25925933e-03, 0., 0., 3.08641978e-03, 0., 0., 0., 3.08641978e-03, 0., 0., 0., 2.46913582e-02, 3.08641978e-03, 0., 6.79012388e-02, 0., 0., ................... 1.30385486e-02, 1.47392293e-02, 4.53514745e-03, 1.13378686e-03, 7.93650839e-03, 5.66893432e-04, 5.66893432e-04, 1.13378686e-03, 6.80272095e-03, 2.26757373e-03, 0., 0., 5.66893443e-03, 2.83446722e-03, 5.10204071e-03, 9.07029491e-03, 7.14285746e-02 ] labels: !!opencv-matrix rows: 26 cols: 1 dt: i data: [ 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 8, 8, 8, 8 ] labelsInfo: 
I continued my search for existing face recognition software and found several projects which could be tested right away, with some modifications from the original source. I found one tutorial which explained clearly how we could get the face recognition working from the web camera, in real time.
If the code provided in the video isn’t working directly, you could try my small patches, in which I corrected a typo and extended the filename extensions towards the source file from here.
Apart from that there is also a fork on GitHub which allows us to do the real-time face recognition. For now, however, some manual work needed to be done in order to add more datasets (images of faces) if you want to use the code right away.
I also searched for more related software which could possibly provide an alternative to the face recognition. I found quite an interesing piece of work for object detection by using Neural Networks. It runs on a framework called Darknet. It allows us to do post-processing object detection for still pictures and videos. It can also do real-time object recognition but requires a GPU to do it efficiently. I tried with the CPU-only mode but I could not get a real-time result (my computer almost crashed).
Still image samples
Vehicle Counting and Speed Measurement
I found a tool developed by Ahmet Ozlu which uses TensorFlow. The use case here is vechicle counting, vehicle type and color recoginition, and speed detection.
You can see the in following video how it works.
OpenCV is an open source library for computer vision, which comes together with libraries which we can use for our detection and recognition work.
In my understanding, the face detection will come first and the recognition second. In newer digital cameras and smartphones facial detection is quite common. Social media applications sometimes use facial recognition to suggest similar faces to be tagged in photo albums, or for photo album reorganization.
Tools based on or making use of OpenCV
Apart from the custom-written Python code which uses OpenCV and Numpy, I also found out there are several works which use TensorFlow together with neural networks, called YOLO (You Look Only Once). They are:
- darknet (written in C)
- darkflow (written with Python and seems to work as a wrapper for darknet) — You need to install different dependencies from darknet, for example Cython and TensorFlow. The good thing is that we could use this tool for a video post-processing, where instead of taking input directly from a webcam, we take it from existing videos. However, if you want to use the latest YOLO algorithm, then just stick to Darknet rather than using Darkflow. There is a fork on GitHub which could allow Darknet to save the output of the processed video into a file as well.
To rotate the video if it was taken from a smartphone but in a 180 degree position:
ffmpeg -i sourcefile.mp4 -vf "transpose=4" fileout.mp4
The transpose value depends on the nature of the rotation. If it’s 90 degrees, the transpose value should be 2. It also depends on whether the rotation is clockwise or counter-clockwise.
To convert the video to a slower framerate:
ffmpeg -i sourcefile.avi -r 8 fileout.mp4
For the Darkflow tool, the default output is in AVI format, but ffmpeg allows us to convert it to MP4 if you want.
ImageAI is a Python-based computer vision library which utilizes the use of TensorFlow, Keras, Matplotlib and several other dependencies which are commonly used for machine learning. In terms of usage, it is similar to darkflow.
The advancement of AI field contributes a lot of useful automation to life. It can range from helping detect tumors, helping search and rescue missions, reducing keystrokes with keyword predictions, to spam filtering. AI also accelerates the field of image processing and pattern recognition.
A lot of the hard work of smart people and scholars have produced many smart solutions to make people live a better life with the use of AI. As I have shown, some of these tools could achieve better detection given a good amount of samples to be trained on and the correct size of picture to be detected.
The tools above will work as-is, but may need some tweaking/editing if you want to customize it. For example, some of the code works with their own demos, so you may need to pass an argument such as
sys.argv inside the Python code if you want to process your own video.