My first attempt at HAAR training followed the tutorial mentioned in the previous post (see Collection of own hand gesture samples – the call for pictures, training method, tool and results). The aim was to go through every step of HAAR training and find out:
1) How should positive and negative samples be captured and set up for training?
2) How many samples are needed? That is, what exactly is the difference between training with 1,000 samples and training with 9,000 samples in terms of accuracy and effectiveness?
Full details of the trial training can be found in the previous post A failed example of hand gesture recognition using openCV haartraining classifiers.
As mentioned in the old post, the size of the samples was crucial to the training. There is a sample size that gives the best training results, which is around 20x20 pixels. Larger sizes can of course help improve detection accuracy, but the training machine can easily run out of memory.
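To get a feel for why larger samples are so much more expensive, here is a rough sketch that counts how many Haar-like features fit into a training window. It assumes only the five basic upright feature shapes (two-rectangle, three-rectangle and four-rectangle); the haartraining tool can use a different or extended feature set, so treat the numbers as an illustration of the growth rate, not as what the tool actually allocates. The function name `count_haar_features` is my own.

```python
# Base shapes (width, height) of the five basic upright Haar-like features.
# Assumption: this is the classic basic set, not necessarily the set
# used by any particular training tool.
BASE_SHAPES = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]


def count_haar_features(win_w, win_h, shapes=BASE_SHAPES):
    """Count every position and scale of each base shape inside the window."""
    total = 0
    for base_w, base_h in shapes:
        # A feature can be scaled to any multiple of its base shape.
        for w in range(base_w, win_w + 1, base_w):
            for h in range(base_h, win_h + 1, base_h):
                # Number of top-left corners where a w x h feature still fits.
                total += (win_w - w + 1) * (win_h - h + 1)
    return total


if __name__ == "__main__":
    for size in (20, 24, 32, 40):
        print(size, count_haar_features(size, size))
```

Under these assumptions a 20x20 window has 78,460 candidate features and a 24x24 window already has 162,336, so the feature pool more than doubles for a 20% increase in side length. Every one of those features has to be evaluated on every training sample, which is consistent with the memory problems seen at larger sample sizes.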
As described in the failed example of HAAR training, the sample pictures should not contain too much background. For example, when detecting an open palm, the background between the fingers was also captured and trained as part of the palm, which led to very low detection accuracy. Instead, the sample pictures should be captured the way they are for face detection: with more of the important features and less background.
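One way to put a number on "too much background" is to measure how much of a tightly cropped sample is not hand at all. The sketch below is purely illustrative: it assumes you already have a binary mask of the hand (1 = hand, 0 = background) as a list of rows, and the tiny masks and the names `tight_bounding_box` and `background_fraction` are made up for the example. It is not part of the haartraining tools.

```python
def tight_bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, around the nonzero pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask contains no foreground pixels")
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def background_fraction(mask):
    """Fraction of pixels inside the tight bounding box that are background."""
    top, left, bottom, right = tight_bounding_box(mask)
    crop = [row[left:right + 1] for row in mask[top:bottom + 1]]
    total = sum(len(row) for row in crop)
    foreground = sum(1 for row in crop for value in row if value)
    return 1 - foreground / total


# Toy masks (hypothetical): spread fingers leave gaps, a closed palm does not.
OPEN_PALM = [
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
CLOSED_PALM = [
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]

if __name__ == "__main__":
    print("open palm:  ", round(background_fraction(OPEN_PALM), 2))
    print("closed palm:", round(background_fraction(CLOSED_PALM), 2))
```

Even with the tightest possible crop, the toy open palm keeps twice as much background inside its box as the closed palm, because the gaps between the fingers cannot be cropped away. That matches the experience above: gestures with spread fingers carry background into every positive sample.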
Although the failed example had nearly 10,000 samples, its results were still poor, so the fix is not simply to add more training samples. In another run of the failed training example, this time on closed palm gestures, 800 positive samples gave approximately 70% accuracy, and robustness could probably be improved further by using more samples from different scenarios.