Weighted SVM for unbalanced data
Posted: 30 August 2017 01:25 PM
Novice
Total Posts:  7
Joined  2016-11-04

Hi,

I am working on a classification problem using an SVM. However, my data has 4 classes of unequal size. LibSVM gives the user the option of training a ‘weighted SVM’. I was wondering whether such an option is also available in sdsvc.

Thank you in advance!

Regards,
Esther

Posted: 31 August 2017 07:15 AM   [ # 1 ]
Administrator
Total Posts:  371
Joined  2008-04-26

Hi Esther,

SVC class weighting is an interesting approach: it builds prior knowledge about class imbalance into the optimization itself.
It tries to address the same problem as ROC, but inside the training algorithm rather than outside it, as ROC does.

It is now supported in sdsvc (unreleased). Find attached sdsvc.p and sdsvc.m files.

With the ‘w’ option, you can provide a vector of class weights in the same order as the class list of the input data set. It is supported for the two-class case and for the one-against-one multi-class option, with RBF, polynomial and linear SVMs. Automatic meta-parameter selection is supported as well.

The weights multiply the given C, so they do not need to be in the <0,1> range.
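
As a sketch of what "the weights multiply C" usually means, assume sdsvc follows the standard class-weighted soft-margin formulation known from LibSVM (this thread does not confirm the internals). Here $w_k$ is the weight supplied for class $k$ through the 'w' option and $c(i)$ is the class of training sample $i$. The symbols $\boldsymbol{\theta}$, $b$, $\xi_i$ and $\phi$ are introduced here only for illustration and stand for the usual SVM normal vector, bias, slack variables and kernel feature map:

```latex
\min_{\boldsymbol{\theta},\,b,\,\xi}\;
  \tfrac{1}{2}\lVert\boldsymbol{\theta}\rVert^{2}
  + \sum_{i} C\,w_{c(i)}\,\xi_i
\quad\text{s.t.}\quad
  y_i\bigl(\boldsymbol{\theta}^{\top}\phi(x_i)+b\bigr)\ge 1-\xi_i,
  \qquad \xi_i\ge 0 .
```

Under this assumption, with 'c',1 and 'w',[0.01 0.9 0.9], margin violations on 'A' samples would be penalized with an effective C of 0.01 and violations on 'B' or 'C' samples with 0.9. Only the product of C and the weight enters the objective, which is why the weights are not restricted to <0,1>.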

To me, it’s not really clear how to set the weights to get a desired outcome. Generally, it’s good to lower the weight on large classes and increase it on important small ones.
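
One common starting point is to make each weight inversely proportional to the class size, so that every class contributes roughly the same total penalty. This is a general heuristic, not something prescribed by sdsvc. A minimal sketch in plain MATLAB, using class sizes of 1000, 40 and 40 as in the example data set; the variable names are made up for illustration:

```matlab
% Class sizes, in the same order as the class list of the data set.
n = [1000 40 40];

% "Balanced" heuristic (an assumption, not an sdsvc default):
% w_k = N / (K * n_k), which makes sum(w .* n) equal to N.
w = sum(n) ./ (numel(n) * n);

disp(w)   % large class gets a weight below 1, small classes above 1

% The vector could then be passed with the 'w' option, e.g.
% p = sdsvc(b,'sigma',1,'c',1,'one-against-one','w',w)
```

For these sizes the heuristic gives w = [0.36 9 9]. Treat it as a first guess only: the confusion matrix should still be checked and the weights adjusted if the small classes matter more or less than that.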

>> b
1080 by 2 sddata, 3 classes: 'A'(1000) 'B'(40) 'C'(40)

>> p1=sdsvc(b,'sigma',1,'c',1,'one-against-one')
one-against-one: [1 of 3: 'A' vs 'B' SVs=141] [2 of 3: 'A' vs 'C' SVs=130] [3 of 3: 'B' vs 'C' SVs=37]
sequential pipeline       2x1 'Scaling+SVM stack+Multi-class combiner+Decision'
 1 Scaling                 2x2  standardization
 2 SVM stack               2x3  3 classifiers in 2D space
 3 Multi-class combiner    3x3
 4 Decision                3x1  weighting, 3 classes

>> sdconfmat(b.lab,b*p1,'norm')

ans =

 True      | Decisions
 Labels    |       A       B       C  | Totals
-----------------------------------------------
 A         |  0.995   0.002   0.003   | 1.00
 B         |  0.850   0.150   0.000   | 1.00
 C         |  0.675   0.000   0.325   | 1.00
-----------------------------------------------


>> p2=sdsvc(b,'sigma',1,'c',1,'one-against-one','w',[0.01 0.9 0.9])
one-against-one: [1 of 3: 'A' vs 'B' SVs=785] [2 of 3: 'A' vs 'C' SVs=810] [3 of 3: 'B' vs 'C' SVs=37]
sequential pipeline       2x1 'Scaling+SVM stack+Multi-class combiner+Decision'
 1 Scaling                 2x2  standardization
 2 SVM stack               2x3  3 classifiers in 2D space
 3 Multi-class combiner    3x3
 4 Decision                3x1  weighting, 3 classes

>> sdconfmat(b.lab,b*p2,'norm')

ans =

 True      | Decisions
 Labels    |       A       B       C  | Totals
-----------------------------------------------
 A         |  0.595   0.217   0.188   | 1.00
 B         |  0.000   0.925   0.075   | 1.00
 C         |  0.000   0.200   0.800   | 1.00
-----------------------------------------------

Does it work for you?

Best,

Pavel

File Attachments
sdsvc.p  (File Size: 5KB - Downloads: 21)
sdsvc.m  (File Size: 4KB - Downloads: 21)
Posted: 31 August 2017 02:32 PM   [ # 2 ]

Dear Pavel,

Thank you so much for the fast reply and the files! I have incorporated the weight factors and it works perfectly now.

Thank you again!

Regards,
Esther

Posted: 31 August 2017 02:42 PM   [ # 3 ]

great!

Pavel

Posted: 23 October 2017 08:33 AM   [ # 4 ]

Dear Pavel,

I have been working with the sdsvc script you posted above, which works really well! However, the multi-class setting (and the one-against-one strategy) makes it difficult to optimize the parameters of the SVM. Therefore I wanted to optimize multiple one-against-one SVMs separately and afterwards combine them using pairwise coupling. For that I need probability estimates. Unfortunately, when using the weight option, sdsvc cannot return probabilistic soft outputs. I have tried to obtain these myself from the soft outputs of the SVM classifier, but this is quite difficult. Therefore, I was wondering if this option will be available in the future.

Thank you in advance!

Regards,
Esther
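
For readers unfamiliar with the pairwise coupling step mentioned in this post: given pairwise estimates R(i,j), the probability of class i when only classes i and j are considered, the per-class posteriors can be recovered with a coupling rule. Below is a minimal sketch in plain MATLAB of one simple closed-form rule from the pairwise-coupling literature. It is not part of sdsvc, and the posteriors and variable names are made-up values for illustration only.

```matlab
% Hypothetical 4-class posteriors, used only to build a consistent
% matrix of pairwise estimates R(i,j) = P(class i | class i or j).
p_true = [0.5 0.3 0.15 0.05];
K = numel(p_true);
R = zeros(K);
for i = 1:K
    for j = 1:K
        if i ~= j
            R(i,j) = p_true(i) / (p_true(i) + p_true(j));
        end
    end
end

% Closed-form coupling: p_i = 1 / (sum_{j~=i} 1/R(i,j) - (K-2)).
p = zeros(1,K);
for i = 1:K
    others = [1:i-1, i+1:K];
    p(i) = 1 / (sum(1 ./ R(i,others)) - (K - 2));
end

% Renormalize; this matters when the pairwise estimates are inconsistent.
p = p / sum(p);
disp(p)
```

With consistent pairwise estimates, as constructed here, the rule recovers the original posteriors. With real pairwise classifiers the estimates are noisy, so the result is only an approximation, and iterative coupling schemes may behave better.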

Posted: 10 November 2017 02:09 PM   [ # 5 ]

Hi Esther,

Sorry for the late reply. I think it should be possible, but we will need an SVM combiner implemented for base classifiers with two soft outputs. The current combiner version is for one soft output, which is common for SVMs without probabilistic output.

I think we will put this on the roadmap for 5.3. Will keep you posted.

Kind Regards,

Pavel

Posted: 22 November 2017 04:14 PM   [ # 6 ]

Dear Pavel,

Thank you for your reply. I will wait for version 5.3!

Regards,

Esther
