Hi Tomasz,
By default, sdsvc splits the provided data and uses 75% for training and 25% for validation. So you can use as few as 4 samples per class:
>> b
'Fruit set' 200 by 2 sddata, 2 classes: 'apple'(100) 'banana'(100)
>> c=randsubset(b,4)
'Fruit set' 8 by 2 sddata, 2 classes: 'apple'(4) 'banana'(4)
>> p=sdsvc(c)
....................sigma=1.65694 C=2.98 err=0.000 SVs=5
sequential pipeline 2x1 'standardization+Support Vector Machine (RBF)'
1 standardization 2x2 (sdp_affine)
2 Support Vector Machine (RBF) 2x1 (sdp_svc)
but not fewer:
>> c=randsubset(b,3)
'Fruit set' 6 by 2 sddata, 2 classes: 'apple'(3) 'banana'(3)
>> p=sdsvc(c)
??? Index exceeds matrix dimensions.
You may, of course, also use the training set itself for error estimation by passing it with the 'test' option (then no splitting is done inside). But then you fit the training data nicely and may not generalize well to unseen data. However, if this data is all you have, you probably cannot estimate generalization performance well anyway (two samples per class is really little :-)
>> c=randsubset(b,1)
'Fruit set' 2 by 2 sddata, 2 classes: 'apple'(1) 'banana'(1)
>> p=sdsvc(c,'test',c)
....................sigma=0.21053 C=0.001 err=0.000 SVs=2
sequential pipeline 2x1 'standardization+Support Vector Machine (RBF)'
1 standardization 2x2 (sdp_affine)
2 Support Vector Machine (RBF) 2x1 (sdp_svc)
I’d also suggest testing a very simple classifier such as nearest mean (sdnmean) to check whether your SVM is overtrained.
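A rough sketch of such a comparison on the 'Fruit set' example from above, where there is plenty of left-over data to test on. It assumes that randsubset returns the remaining samples as a second output and that sdtest(data,pipeline) returns the error on a labeled set; neither is shown in this thread, so please check both against your perClass version:

```matlab
% Sketch only: b is the 'Fruit set' used above.
% Assumption: the second output of randsubset holds the samples not selected.
[c,rest]=randsubset(b,4);    % 4 samples per class for training

pn=sdnmean(c);               % simple baseline: nearest mean
ps=sdsvc(c);                 % RBF SVM with the default internal 75/25 split

% Assumption: sdtest(data,pipeline) returns the error on the given set.
err_nmean=sdtest(rest,pn)
err_svc=sdtest(rest,ps)
```

If the SVM error on the held-out samples is clearly higher than the nearest mean error, that is a hint that the SVM is overtrained on the few training samples.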
Also check whether you really need to handle the small classes together with the others; maybe they are separable. If so, you could extract them with a simple classifier and build a multi-class SVM only for the rest of the problem. You can put together a cascaded classifier with sdcascade.
Hope it helps,
Pavel