The goal: see what techniques or methods I could use to achieve performance similar to a CNN for image detection, with improvements in parameter count, runtime, or both.
No luck so far…
After all these failures, I realized most of my methods had focused purely on finding a statistical measure with a clear delineation between the classes, without attempting any dimension reduction or further cleanup of the images. Once I started taking subsamples or downsampling the image before looking at statistical characteristics, I began to make real progress. Below is a simple example where I took a skewed downsample and then applied the Sobel operator to specific slices. Because I again felt the key was the additional vertical black line from an open freezer, my ‘skewed downsample’ was merely averaging pixels over rectangles, rather than squares, at a 4:1 aspect ratio. This yielded a clustering scheme that looked promising, until I hit something big…
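To make the idea concrete, here is a rough sketch of what a rectangular (“skewed”) downsample followed by a Sobel pass over selected columns could look like in NumPy/SciPy. The function names, the 4×1 default block shape, and the column-slice parameter are my own illustrative choices, not the project’s actual code:

```python
import numpy as np
from scipy.ndimage import sobel

def skewed_downsample(img, block_h=4, block_w=1):
    """Average pixels over block_h x block_w rectangles (4:1 by default),
    squashing the image more along one axis than the other."""
    h, w = img.shape
    h_trim, w_trim = h - h % block_h, w - w % block_w
    blocks = img[:h_trim, :w_trim].reshape(
        h_trim // block_h, block_h, w_trim // block_w, block_w)
    return blocks.mean(axis=(1, 3))

def vertical_edge_profile(img, col_slice=slice(None)):
    """Mean absolute horizontal Sobel response over selected columns --
    a crude way to highlight an extra vertical dark line in the frame."""
    edges = sobel(img, axis=1)
    return np.abs(edges[:, col_slice]).mean(axis=0)
```

The rectangular blocks preserve vertical structure (like a door edge) while aggressively shrinking the feature count, which is the point of the 4:1 ratio.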
The ‘DUH’ Model
After many attempts, I finally had it! A model that produced a 5-fold cross-validated accuracy of 92.75% using just 32 features derived from a (120, 30) grayscale image! I decided to name it the Down-sampled Uncertainty Hypothesis model. The model’s assumption is that when we take the feature set from our subset of the image and downsample it, the entropy, or uncertainty, in the first two dimensions of our space should be minimal if the fridge is closed. Thus, by simply checking that the norm of our vector is less than a specific threshold (0.06 proved optimal in training), we can assume the fridge is indeed closed. I was quite surprised that for this model the precision was better than the recall (the opposite of the CNN), which is clear from the graphic of the lower-dimensional space.
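A minimal sketch of the DUH decision rule, assuming the “first two dimensions” come from a PCA-style projection of the 32 training features. The projection step and all function names are my assumptions; only the norm-below-0.06 test comes from the write-up:

```python
import numpy as np

THRESHOLD = 0.06  # value reported as optimal in training

def fit_projection(features):
    """Fit a 2-D projection (top two principal components) on a
    training feature matrix of shape (n_samples, 32)."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # largest first
    return mean, eigvecs[:, order[:2]]

def is_closed(feature_vec, mean, components, threshold=THRESHOLD):
    """DUH rule: call the fridge 'closed' when the norm of the feature
    vector in the first two dimensions falls below the threshold."""
    z = (feature_vec - mean) @ components
    return np.linalg.norm(z) < threshold
```

With this framing, “closed” frames cluster tightly around the origin of the 2-D space, so a single norm check replaces an entire classifier.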
I will add that, as my roommate pointed out, my model has nothing on two strips of aluminum foil and a buzzer. And as a final parting thought: despite the CNN technically being less ‘efficient’, you could not beat the simplicity and ease with which it fit the data. While the DUH method may be slightly faster, this project reaffirmed to me that neural nets truly are “universal function approximators.”
|  | Vanilla CNN | DUH Method | Two strips of aluminum foil + alarm |
| --- | --- | --- | --- |
| Input | (120, 30) | (120, 30) | 2 pieces of aluminum foil |
| Calculation Time | 3.8e-4 sec/img | 5.6e-5 sec/img | n/a |
Bonus Clip – Dimension Reduction
To find the optimal partition size at which to sub-sample my image, I simply did a brute-force search to see where the model performed best. Clearly the relationship would be positively correlated, with finer partitions leading to better performance, but I wanted to understand where the trade-off between parameters and performance lay. Below is the video of me slowly refining my partition until I got a clear distinction between the two groups of ‘open’ and ‘closed’ images. I personally just think it looks really cool, and I enjoy seeing how, over the course of the animation, the signs of the first two eigenvectors flip a few times.
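The brute-force search could be sketched roughly as follows, with a made-up class-separation score standing in for whatever metric the model was actually scored on. All names, the score, and the candidate partition sizes are illustrative:

```python
import numpy as np

def downsample(img, block):
    """Average over block x block squares (dims assumed divisible)."""
    h, w = img.shape
    h_t, w_t = h - h % block, w - w % block
    return img[:h_t, :w_t].reshape(
        h_t // block, block, w_t // block, block).mean(axis=(1, 3))

def separability(open_imgs, closed_imgs, block):
    """Crude separation score at one partition size: distance between
    class means relative to within-class spread."""
    a = np.array([downsample(im, block).ravel() for im in open_imgs])
    b = np.array([downsample(im, block).ravel() for im in closed_imgs])
    gap = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
    return gap / (a.std() + b.std() + 1e-12)

def best_partition(open_imgs, closed_imgs, blocks=(2, 4, 6, 8, 12)):
    """Brute-force search over candidate partition sizes."""
    return max(blocks, key=lambda b: separability(open_imgs, closed_imgs, b))
```

Sweeping `blocks` from coarse to fine is also exactly what the animation visualizes: each partition size yields a new feature space, and the eigenvectors of that space can flip sign between steps since an eigenvector is only defined up to sign.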