setting priors in sddata
Posted: 07 May 2010 10:06 AM
Newbie
Total Posts:  6
Joined  2008-11-10

I need to set the prior probabilities in the training data set because the class sizes are heavily unbalanced. For sddata I can only find the getpriors method, but nothing like setpriors.
Is there a way to set the priors?

Best,

Bernhard

Posted: 07 May 2010 11:50 AM   [ # 1 ]
Administrator
Total Posts:  240
Joined  2008-04-26

Hi Bernhard,

In PRSD Studio, prior probabilities are part of a classifier or (indirectly) of an operating point, not of the data set itself. Philosophically, we believe that is where these assumptions belong. I may have the same data set but different problems/labels, and thus different assumptions on priors.

When you are training a classifier that makes use of priors, such as the normal-based models (sdlinear, sdquadratic, sdmixture), you may specify your assumed priors using the ‘priors’ option:

>> a
'Fruit set' 260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60)

>> p=sdlinear(a,'priors',[0.8 0.1 0.1])
sequential pipeline     2x3 'Linear discriminant'
 1  Gauss eq.cov.           2x3  3 classes, 3 components (sdp_normal)
 2  Output normalization    3x3  (sdp_norm)
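
To see what a prior assumption changes in a normal-based model, here is a minimal sketch in plain Python rather than PRSD Studio code. It assumes a one-dimensional, two-class problem with a shared variance; the means, the variance and the function names are made up for illustration and say nothing about how sdlinear is implemented internally.

```python
import math

def gauss_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posteriors(x, means, var, priors):
    """Class posteriors: prior times class-conditional density, normalized."""
    joint = [p * gauss_pdf(x, m, var) for p, m in zip(priors, means)]
    total = sum(joint)
    return [j / total for j in joint]

means = [0.0, 2.0]   # hypothetical class means
var = 1.0            # shared variance (the "equal covariance" assumption)
x = 1.0              # a point halfway between the two means

# With equal priors the point is undecided; skewed priors pull it to class 0.
equal = posteriors(x, means, var, [0.5, 0.5])
skewed = posteriors(x, means, var, [0.8, 0.2])
```

At the midpoint both class densities are equal, so the posteriors simply reproduce whatever priors were assumed. Away from the midpoint the priors shift the decision boundary towards the class with the smaller prior.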

In my opinion, ROC analysis relieves you from assuming the priors. When making decisions, we weight the soft outputs of the model with per-class weights. Using ROC analysis, we identify the weights needed to reach the desired performance on the test set, irrespective of the prior assumption we put into the model. If our assumptions change, the ROC-estimated class weights will of course change as well to provide the desired performance. The beauty is that we don't need to care. We only need to ensure that the test set is realistic, so that the estimated class weights are reliable. The training set may be skewed in any direction or, more practically, be balanced so that we estimate the model parameters as well as possible.
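
The idea of weighting soft outputs and choosing the weights on a test set can be sketched as follows. This is plain Python, not the PRSD Studio ROC routine: the function names, the grid sweep and the tiny made-up test set are all assumptions for illustration, and the selection rule (lowest class-1 error subject to a cap on class-0 error) is just one possible performance requirement.

```python
def decide(soft, weights):
    """Return the index of the class with the largest weighted soft output."""
    scores = [s * w for s, w in zip(soft, weights)]
    return scores.index(max(scores))

def class_errors(outputs, labels, weights, n_classes=2):
    """Per-class error rates for one weight vector (one operating point)."""
    wrong = [0] * n_classes
    count = [0] * n_classes
    for soft, y in zip(outputs, labels):
        count[y] += 1
        if decide(soft, weights) != y:
            wrong[y] += 1
    return [w / c for w, c in zip(wrong, count)]

def pick_weights(outputs, labels, max_err0, steps=99):
    """Sweep two-class weights on a grid; among the points whose class-0
    error does not exceed max_err0, keep the one with lowest class-1 error."""
    best = None
    for i in range(1, steps + 1):
        w0 = i / (steps + 1)
        weights = [w0, 1.0 - w0]
        e0, e1 = class_errors(outputs, labels, weights)
        if e0 <= max_err0 and (best is None or e1 < best[1][1]):
            best = (weights, [e0, e1])
    return best

# Made-up soft outputs of some trained model on a small test set.
outputs = [[0.9, 0.1], [0.6, 0.4], [0.4, 0.6],
           [0.3, 0.7], [0.2, 0.8], [0.55, 0.45]]
labels = [0, 0, 0, 1, 1, 1]

unweighted = class_errors(outputs, labels, [0.5, 0.5])
best = pick_weights(outputs, labels, max_err0=0.0)
```

The model outputs stay fixed throughout; only the weights applied at decision time move, which is why this step can be done after training.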

Does it help with your problem?

Pavel

Posted: 07 May 2010 12:08 PM   [ # 2 ]
Newbie
Total Posts:  6
Joined  2008-11-10

Hi Pavel,

Yes, that definitely helps:-)

I tried it with sdmixture; in 2.1 (stable) this option is not available there (as it is, e.g., for sdlinear).
But I understand your arguments for preferring ROC analysis. It IS sufficient and easier to use. The major advantage is that it can be applied after training.

Thanks a lot and Best Regards

Bernhard

Posted: 23 July 2010 09:30 AM   [ # 3 ]
Newbie
Total Posts:  9
Joined  2010-06-17

Is there a way to do this besides using the constructor?
Something like:

load('fruit');
p = sdlinear;
p = setpriors(p,[0.8 0.1 0.1]);
p = a*p;

[ Edited: 23 July 2010 10:03 AM by Carlas]
 
Posted: 23 July 2010 12:15 PM   [ # 4 ]
Administrator
Total Posts:  240
Joined  2008-04-26

As mentioned above, priors are just one of the parameters of a normal-based model in PRSD Studio. You specify them either when you train the pipeline or when you create a model from existing parameters:

>> p=sdlinear(a)
sequential pipeline     10x2 'Linear discriminant'
 1  Gauss eq.cov.          10x2  2 classes, 2 components (sdp_normal)
 2  Output normalization    2x2  (sdp_norm)

>> p{1}

ans =

     mean: [2x10x2 sddata]
      cov: [10x10x2 double]
    prior: [0.5501 0.4499]

>> p=sdp_normal(p{1}.mean,p{1}.cov,[0.8 0.2])
Gauss pipeline          10x2  2 classes, 2 components (sdp_normal)

>> p{1}

ans =

     mean: [2x10x2 sddata]
      cov: [10x10x2 double]
    prior: [0.8000 0.2000]
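
Because the means and covariances are reused unchanged, swapping the priors only changes the prior factor in Bayes' rule. A minimal sketch of that relationship in plain Python (not PRSD Studio code; the function name is hypothetical, and the prior values are taken from the session above):

```python
def reweight_posteriors(post, old_priors, new_priors):
    """Convert posteriors computed under old_priors into posteriors under
    new_priors: divide out the old prior, multiply in the new, renormalize."""
    scaled = [p * n / o for p, o, n in zip(post, old_priors, new_priors)]
    total = sum(scaled)
    return [s / total for s in scaled]

old = [0.5501, 0.4499]   # priors estimated from the training set
new = [0.8, 0.2]         # priors we want to assume instead

# A point where both class densities are equal: its posteriors under the
# old priors are just the old priors themselves.
post_old = [0.5501, 0.4499]
post_new = reweight_posteriors(post_old, old, new)
```

For such a point the reweighted posteriors come out as the new priors, which is the same effect as rebuilding the model with the same means and covariances but a different prior vector.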

Hope it helps,

Pavel
