Hi everyone, and thanks for reading.
I am working on an automatic classifier, where the classification problem has these characteristics:
- Several events (feature vectors) per subject for training, and several for testing. The number of events can differ widely from subject to subject.
- 3 classes, very unbalanced. Let's say the classes are represented in a ratio of roughly 10-2-1.
I am working with linear and quadratic classifiers for now. I ran into problems when designing the first classifier, especially during training, where I had to keep the most represented classes from biasing the training in their favor. I am quite happy with the results.
My doubt now is about the most correct way to evaluate the classifier's performance, since the problem is unbalanced in three ways: the number of events per subject, the classes present for each subject, and the class proportions in the whole test dataset. It seems to me that simply evaluating the trained classifier on the test dataset will give a biased confusion matrix (CM), at least because of the class imbalance. The per-class sensitivity may not be affected, but the per-class positive predictivity probably will be. So far, I have tried simulating balanced classes in the CM (scaling it so that all rows sum to the same value) and then calculating the performance. I have also tried grouping by subject, calculating one CM per subject and averaging the performance estimates. So I have three perspectives on the classifier's performance, but I guess this could be done more correctly. Am I right?
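To make the three perspectives concrete, here is a minimal sketch in Python with NumPy. It is only an illustration under assumptions: the CM rows are taken to be the true classes and the columns the predicted classes, the function names are hypothetical, and the labels and subject ids are made-up toy data (13 events in a 10-2-1 ratio), not results from the real classifier.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def sensitivity(cm):
    """Per-class sensitivity: diagonal / row sum (NaN if the class is absent)."""
    row = cm.sum(axis=1)
    return np.divide(np.diag(cm), row, out=np.full(len(cm), np.nan), where=row > 0)

def positive_predictivity(cm):
    """Per-class positive predictivity: diagonal / column sum (NaN if never predicted)."""
    col = cm.sum(axis=0)
    return np.divide(np.diag(cm), col, out=np.full(len(cm), np.nan), where=col > 0)

def balance_rows(cm):
    """Rescale every non-empty row to sum to 1, simulating equally frequent classes."""
    row = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, row, out=np.zeros_like(cm), where=row > 0)

# Toy data (hypothetical): 3 classes in a 10-2-1 ratio, 2 subjects.
y_true  = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2])
y_pred  = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 1, 0, 2])
subject = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1])
n_classes = 3

# Perspective 1: pooled CM over the whole test set.
cm_pooled = confusion_matrix(y_true, y_pred, n_classes)
se_pooled = sensitivity(cm_pooled)
pp_pooled = positive_predictivity(cm_pooled)

# Perspective 2: row-balanced CM (all rows sum to the same value).
cm_balanced = balance_rows(cm_pooled)
se_balanced = sensitivity(cm_balanced)
pp_balanced = positive_predictivity(cm_balanced)

# Perspective 3: one CM per subject, then average the per-class estimates,
# ignoring subjects in which a class does not occur (NaN entries).
se_by_subject = np.array([
    sensitivity(confusion_matrix(y_true[subject == s], y_pred[subject == s], n_classes))
    for s in np.unique(subject)
])
se_subject_mean = np.nanmean(se_by_subject, axis=0)

print("pooled   Se:", se_pooled, " +P:", pp_pooled)
print("balanced Se:", se_balanced, " +P:", pp_balanced)
print("subject-averaged Se:", se_subject_mean)
```

Note that row balancing leaves the per-class sensitivity unchanged, because sensitivity is computed within each row, while positive predictivity changes, because it mixes counts from different rows. In the per-subject average every subject gets the same weight regardless of how many events it contributes, which is a design choice and not the only reasonable one.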
Thanks in advance for any comment, or just for reading :)
Mariano.

