This is part 1 of a series of posts on my experience trying to build an object detection setup for my home security cameras. Here I'm going over some preliminary results and the history up to this point. In future posts I plan to graph my results and take a closer look at what I have.
Deep Learning is all the rage these days. I get it: the idea of letting a computer extract features and find things on its own is amazing. I love it. In fact, while working on my Masters at Georgia Tech, I took every Machine Learning class I could get my hands on, along with the Reinforcement Learning course that was also offered. They were AWESOME and they really opened my eyes to how a lot of this stuff works.
A few months back I started with darknet and the tiny YOLO approach. I set up an RTSP server on my Raspberry Pi in Python to pull images and wanted to see how it did. I experimented on a few images, and to my surprise (although in retrospect, unsurprisingly) it failed... pretty badly. On the first attempt it actually saw the car at the side of my yard, which was utterly amazing. I ran it again mere minutes later and it never saw the car again, even after multiple attempts. Of course, this was all running quite slowly, on the order of 60 seconds per image if I recall. I decided that as long as zoneminder detected some kind of "motion", I could run the detector on just those captures. I was less concerned with real-time performance than with simply being notified at some point that something odd was going on.
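For the curious, grabbing a frame off a camera's RTSP stream looks roughly like this (a minimal OpenCV sketch, not my actual script; the stream URL and output path are placeholders):

```python
import cv2

# Placeholder RTSP URL -- substitute your camera's actual stream address.
STREAM_URL = "rtsp://192.168.1.50:554/stream1"

def grab_frame(url=STREAM_URL, out_path="frame.jpg"):
    """Pull a single frame from the RTSP stream and save it to disk."""
    cap = cv2.VideoCapture(url)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("Could not read a frame from the stream")
    cv2.imwrite(out_path, frame)
    return out_path
```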
I decided that over the holiday break I would try to work on this detector more. I started downloading all my camera data (which really isn't all that old), and it turned out I had about 65GB of image data from zoneminder motion captures. First off, that much data is difficult to share if I ever wanted someone else to work on this with me, so I am working on paring it down. It is at least broken down by camera, so I should be able to divvy it up that way.
As a first pass, I decided to try plain image recognition just to see whether it would detect anything at all in these images. I got everything set up using the TensorFlow image recognition tutorial: https://www.tensorflow.org/tutorials/images/image_recognition
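I modified it to run through an entire folder of images. The bookkeeping looks roughly like the sketch below (simplified, not my exact code; `classify()` stands in for the tutorial's Inception inference call and is assumed to return the top-5 (index, score) pairs, while `node_lookup` stands in for its index-to-label mapping):

```python
import json
import os
from collections import defaultdict

def run_folder(image_dir, classify, node_lookup):
    """Classify every image in a folder and write the three output files.

    classify(path) is assumed to return the top-5 (index, score) pairs;
    node_lookup maps index -> human-readable string.
    """
    per_image = {}                # image -> [(human string, score, index), ...]
    by_class = defaultdict(list)  # index -> images that had it in their top 5

    for name in sorted(os.listdir(image_dir)):
        top5 = classify(os.path.join(image_dir, name))
        per_image[name] = [(node_lookup[i], float(s), i) for i, s in top5]
        for i, _score in top5:
            by_class[i].append(name)

    with open("image_filtering_analysis", "w") as f:
        json.dump(per_image, f, indent=2)
    with open("image_filtering_by_class", "w") as f:
        json.dump(by_class, f, indent=2)
    with open("image_filter_mapping", "w") as f:
        json.dump(node_lookup, f, indent=2)
```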
The script spits out 3 different files. The first is a mapping of image -> human-readable string, score, and index; the index is what's used to look up the human-readable string in the mapping. The second is a listing of each index with all of the images that had that index in their top 5 predictions (regardless of score). Finally, I output the index -> human-readable string mapping so I could easily look things up. The files are named image_filtering_analysis, image_filter_mapping, and image_filtering_by_class. I opened up image_filtering_by_class and looked up the first thing I saw, which was 160. This translated to "wire-haired fox terrier". The image was a view of my driveway, and I thought... well, I mean, I suppose that might be possible.
wire-haired fox terrier
OK, so the first one is not very awesome. Though to be fair, I looked up the confidence score and that label only got 0.016631886. Interestingly, these were the top 5 scores:
- submarine, pigboat, sub, U-boat: 0.27390754
- patio, terrace: 0.064846955
- steam locomotive: 0.040173832
- fountain: 0.023076175
- wire-haired fox terrier: 0.016631886
I looked up what some of the labels were in the dataset they used, and I didn't see a plain "car" label, which is just odd to me. I will likely have to re-train on my own data, but for now I need to figure out a way to find common features.
One obvious improvement would be to require the score to at least reach some threshold; suppose we set it to 50%, which would prevent that particular failure. A quick filter like the one sketched below would do it.
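A minimal sketch of that idea, assuming the same top-5 (index, score) pairs as in the earlier snippet (the 0.5 cutoff is just a starting point and would need tuning):

```python
CONFIDENCE_THRESHOLD = 0.5  # hypothetical cutoff, would need tuning

def filter_predictions(top5, threshold=CONFIDENCE_THRESHOLD):
    """Drop any prediction whose score doesn't clear the threshold."""
    return [(index, score) for index, score in top5 if score >= threshold]
```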
Searching through my data (which, by the way, still hasn't finished running), it turned up another version of the same image above, but called it a "patio, terrace" with over 90% confidence, so a 50% threshold wouldn't have caught that one either. In fact, the image with the highest confidence overall was also labeled "patio, terrace". Here it is.

patio, terrace
I can only speculate that the poor, pixelated shape on the left is, well, me. Looking at it closely, it sort of looks like a body; it's in the images for a few frames and there appear to be arms. I am pretty sure I was bringing a package in that day.
I wonder if someone (maybe me ;)) could build a form of feature detector much like those used in classical CV, but generic instead of trained for a specific task. That might make it possible to group common features across images and make building a supervised dataset easier.
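One way I could imagine prototyping that (just a sketch, not something I've run): take the pooled activations of a pretrained network as generic features and cluster images on them. The choice of MobileNetV2, the 224x224 input size, and the cluster count here are all arbitrary placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array, load_img

# Pretrained network with the classification head removed: the pooled
# activations serve as a generic feature vector for each image.
feature_model = MobileNetV2(weights="imagenet", include_top=False, pooling="avg")

def embed(paths):
    """Turn a list of image paths into one feature vector per image."""
    batch = np.stack([
        preprocess_input(img_to_array(load_img(p, target_size=(224, 224))))
        for p in paths
    ])
    return feature_model.predict(batch)

def group_images(paths, n_groups=10):
    """Cluster images by visual similarity; returns a cluster id per image."""
    features = embed(paths)
    return KMeans(n_clusters=n_groups).fit_predict(features)
```

Grouping motion captures this way wouldn't label anything by itself, but it could at least let me tag whole clusters at a time instead of individual frames.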
I'll post more interesting ones as I come across them. At this point most of the images it has classified are from nighttime; it will eventually hit the daytime captures, which I think will produce even more humorous results.
For now, suffice it to say, this detection is not doing well. I had at least hoped that running it would find cars or other things to help build up a supervised set, but I don't think that's going to happen.