sdroc and setcurop with cost specification
Posted: 29 May 2013 11:25 AM
Apprentice
Total Posts: 43
Joined: 2013-03-28
Dear Pavel and Carmen,
Further questions arise. I am trying to find the best operating point for our case, i.e. to minimize the false negatives. Because I don't know yet what an acceptable false negative rate might be, I want to penalize false negatives using a weighting matrix:

W = ones(length(validationData.lab.list));
w(1) = name2ind(validationData.lab.list, 'pos');
w(2) = name2ind(validationData.lab.list, 'neg');
W(w(1),w(2)) = 10*W(w(1),w(2)); % penalize false negatives

From the documentation, I came up with the following two possibilities:

receiverOpCurve = sdroc(validationData*trainedClassifier, 'confmat', 'cost', W);

or

operatingPoint = setcurop(sdroc(validationData*trainedClassifier, 'confmat'), 'cost', W);

Both approaches do the same, right? That is, the mean error is minimized considering the weights W?
Best regards,
Stefan

Posted: 29 May 2013 02:56 PM   [ # 1 ]
Administrator
Total Posts: 360
Joined: 2008-04-26

Dear Stefan,

For the two-class case, like yours, both examples yield the same outcome. This is because the ROC works with grid-defined operating points.

For multi-class problems it is not the same: in the first example, the ROC optimizer uses the cost specification matrix directly when generating a subset of operating points; in the second, a unit cost matrix is used to define the points, and the cost matrix W is only applied later to select one of them.

One hint: to convert a name into an index, you can also ask the list object directly. It may save you some typing:

>> a

'Fruit set', 260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60)

>> a.lab.list('banana')

ans =

     2

With Kind Regards,

Pavel

Posted: 30 May 2013 08:57 AM   [ # 2 ]

Dear Pavel,
I have read the documentation again, along with some references on the web, to better understand ROC analysis.

The ROC is equal to the set of operating points.

So, in the first case, W is the cost matrix used to generate the ROC. The ROC can be generated as follows:
1) Start from a classifier with soft outputs
2) Apply a threshold to the soft outputs, which yields a confusion matrix
3) Apply the cost matrix to the confusion matrix and compute an operating point of the ROC
4) Repeat 2) and 3) until the ROC is complete

In the second case, the ROC is given for a uniform cost matrix. The goal is to find a suitable operating point as follows:
1) Select an operating point
2) Compute its confusion matrix, apply the cost matrix and compute the mean error
3) Repeat 1) and 2) for all operating points
4) Select the operating point with minimum mean error
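The selection procedure in the steps above can be sketched outside perClass. A minimal numpy illustration (a conceptual sketch, not the perClass implementation; the function name, matrix orientation and toy numbers are my assumptions):

```python
import numpy as np

def select_operating_point(conf_mats, W):
    """Pick the operating point with the lowest cost-weighted mean error.

    conf_mats : list of (C, C) confusion matrices, one per operating point
                (rows = true class, columns = assigned class)
    W         : (C, C) cost matrix; W[i, j] is the cost of deciding class j
                when the true class is i (diagonal is zero)
    """
    losses = []
    for cm in conf_mats:
        weighted = cm * W                         # cost applied to each error type
        losses.append(weighted.sum() / cm.sum())  # normalize by sample count
    best = int(np.argmin(losses))
    return best, losses[best]

# Two toy 2x2 operating points (class 0 = 'pos', class 1 = 'neg');
# the second trades extra false positives for fewer false negatives.
ops = [np.array([[80, 20], [ 5, 95]]),   # 20 false negatives
       np.array([[95,  5], [15, 85]])]   #  5 false negatives
W = np.array([[0, 10],                   # false negatives cost 10x
              [1,  0]])
idx, loss = select_operating_point(ops, W)
print(idx, loss)   # -> 1 0.325
```

With the 10x false-negative cost, the second operating point wins even though it makes more errors overall.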

Is my understanding correct?

In a multi-class problem, does it make sense to use cost matrix twice?

operatingPoint = setcurop(sdroc(validationData*trainedClassifier, 'confmat', 'cost', W), 'cost', W);

Regards,
Stefan

Posted: 30 May 2013 03:27 PM   [ # 3 ]

Dear Stefan,

Yes, the steps you describe are roughly what happens.

In the multi-class situation, it currently does make sense to use both sdroc and setcurop with the same cost specification matrix.
This is because sdroc alone optimizes the operating points based on the cost specification (as you describe above) but, by default, eventually sets the current operating point to minimize the mean error. setcurop will then choose, as current, the existing operating point in the ROC object for which the loss defined by your cost specification W is minimized.

I can see that it would be handy if sdroc in the multi-class case also directly set the current operating point to minimize the loss given the cost matrix W, as happens in the two-class case. We will include this in the 4.0 final release. You will then no longer need an extra setcurop with the cost W if you provide the cost in the sdroc call.

With Kind Regards,

Pavel

Posted: 31 May 2013 06:14 AM   [ # 4 ]

Dear Pavel,
Thanks for the clarification. I finally feel I have a grasp on what I am doing :-)
Stefan

Posted: 11 November 2013 01:51 PM   [ # 5 ]

Dear Pavel,
I am back on this issue. I have a cost matrix:

W =

     1     1     1     1
     1     1     1     1
    10    10     1    10
     1     1     1     1

and do the following optimizations:
Case 1)

receiverOpCurve = sdroc(outBeforeValidation, 'confmat', 'ops', ops)
opPoint = getcurop(receiverOpCurve);

ROC (1000 w-based op.points, 5 measures), curop 14
est
 1: err(neg)=0.41   2: err(outlier)=0.23   3: err(pos)=0.13   4: err(unclear)=0.11   5: mean-error=0.22

Case 2)

receiverOpCurve = sdroc(outBeforeValidation, 'confmat', 'ops', ops, 'cost', W)
opPoint = getcurop(receiverOpCurve);

ROC (1000 w-based op.points, 5 measures), curop 14
est
 1: err(neg)=0.41   2: err(outlier)=0.23   3: err(pos)=0.13   4: err(unclear)=0.11   5: mean-error=0.22

Case 3)

receiverOpCurve = sdroc(outBeforeValidation, 'confmat', 'ops', ops)
receiverOpCurve = setcurop(receiverOpCurve, 'cost', W)
opPoint = getcurop(receiverOpCurve);

ROC (1000 w-based op.points, 5 measures), curop 14
est
 1: err(neg)=0.41   2: err(outlier)=0.23   3: err(pos)=0.13   4: err(unclear)=0.11   5: mean-error=0.22

ROC (1000 w-based op.points, 5 measures), curop 351
est
 1: err(neg)=0.02   2: err(outlier)=0.91   3: err(pos)=0.12   4: err(unclear)=1.00   5: mean-error=0.51

Case 1) and case 2) result in the same operating point. Is this because I provide the operating points myself and, therefore, the cost matrix is ignored in case 2)?
If yes, do I have to do the cost optimization as in case 3)?
To further increase my understanding: is it correct that a trained classifier is not necessarily optimized for minimum mean error? The error of the classifier depends on the training procedure, and if I want to guarantee minimum mean error, I have to do ROC optimization.
Best regards,
Stefan

Posted: 13 November 2013 11:01 AM   [ # 6 ]

Dear Stefan,

Yes. In 4.0, when you provide a custom set of operating points, the cost matrix is used only in the two-class case. In the multi-class case, it is ignored.

So, at the moment, you should do cost-sensitive optimization with a custom set of points as in case 3.

Just to summarize: the 4.0 public release already contains the fix we discussed back in May: in the multi-class case, when sdroc derives operating points internally and a cost specification matrix is given, it is used directly. Here, we are discussing the case where you provide your own set of operating points.

Regarding the default operating point of a classifier without ROC: yes, I agree that it can be anything. For example, for probabilistic models it will depend on the apparent priors in your training set (or the priors set in the pipeline). To make a connection between the operating point and any performance/error measure such as the mean error, we need to perform ROC analysis.
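The prior dependence mentioned above can be seen in a tiny Bayes-rule example (a generic illustration, not perClass code; the numbers and names are made up):

```python
import numpy as np

# Class-conditional likelihoods p(x | class) for one test sample
likelihoods = np.array([0.2, 0.3])   # class 0 = 'pos', class 1 = 'neg'

def decide(likelihoods, priors):
    """Default MAP decision: argmax over p(x | c) * p(c)."""
    posterior = likelihoods * priors   # unnormalized posterior is enough for argmax
    return int(np.argmax(posterior))

print(decide(likelihoods, np.array([0.5, 0.5])))  # balanced priors -> 1 ('neg')
print(decide(likelihoods, np.array([0.8, 0.2])))  # skewed priors   -> 0 ('pos')
```

The same sample flips class as the priors change, so the default operating point carries no error guarantee by itself.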

With Kind Regards,

Pavel

Posted: 14 November 2013 10:07 AM   [ # 7 ]

Thanks, Pavel, for your clarification.
Thus,

receiverOpCurve = sdroc(outBeforeValidation, 'cost', W)

does not necessarily result in the same operating point as

receiverOpCurve = sdroc(outBeforeValidation)
receiverOpCurve = setcurop(receiverOpCurve, 'cost', W)

What is the advantage of the first case, using a cost matrix when generating the ROC? Does it outperform the second case?
Best regards,
Stefan

Posted: 15 November 2013 10:18 AM   [ # 8 ]

Dear Stefan,

For the two-class case, there is no difference, as the optimizer uses the same set of operating points in both (just note that to use setcurop with the 'cost' option, the ROC object needs to contain confusion matrices, so you need the 'confmat' option in sdroc).

For the multi-class case, a different optimization process happens in sdroc(soft_output) and sdroc(soft_output, 'cost', W). If the cost is given, it is also leveraged when generating the operating points. Therefore, you will most probably get two different sets.

In the 4.0 release, if a cost specification is provided, sdroc also sets the current operating point based on it (this did not happen in the multi-class case in 3.x).

Hope it helps,

Pavel

Posted: 15 November 2013 12:04 PM   [ # 9 ]

Thanks, Pavel. In the multi-class case, I do not yet understand the purpose of the cost matrix when generating the operating points. All possible operating points constitute the ROC surface, which in my understanding should be independent of any cost specification. Cost specifications are used to select a single operating point out of the ROC. So what is the advantage of using the cost matrix when generating the ROC surface? I could not yet find an answer in the literature.

Posted: 15 November 2013 12:11 PM   [ # 10 ]

In the multi-class case, the complexity of the ROC surface, in terms of grid sizes, grows exponentially. Therefore, suboptimal search strategies are needed in most problems. perClass contains several such ROC optimizers. For the cost-sensitive case, we use a specific one that directs the search for operating points to subsets that are meaningful w.r.t. the cost specification.
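The exponential blow-up is easy to quantify. A rough illustration (the exact parametrization perClass uses is not documented here, so the per-class-weight grid model below is an assumption):

```python
# A multi-class operating point can be parametrized by per-class weights
# applied to the soft outputs; one weight can be fixed for scale.  With C
# classes and g grid steps per remaining weight, a full grid would hold
# g**(C-1) candidate operating points -- exponential in the class count.
def full_grid_size(num_classes, steps_per_weight):
    return steps_per_weight ** (num_classes - 1)

for c in (2, 3, 4, 6):
    print(c, full_grid_size(c, 1000))
# 2 classes stay at 1000 points; 6 classes already need 10**15.
```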

Posted: 15 November 2013 12:51 PM   [ # 11 ]

OK, if I understand you correctly, the cost matrix is used to constrain the optimization and to speed up the computation, but does not “alter” the ROC surface as such. Thus, the resulting operating point of

receiverOpCurve = sdroc(outBeforeValidation, 'confmat', 'cost', W)

and of

receiverOpCurve = sdroc(outBeforeValidation, 'confmat')
receiverOpCurve = setcurop(receiverOpCurve, 'cost', W)

should be similar. I suppose they are not exactly the same, because the underlying optimization might differ and, therefore, the approximation of the ROC surface might not be exactly the same.

Posted: 15 November 2013 01:10 PM   [ # 12 ]

It does alter the solutions you find. As I mentioned above, the cost specification in the multi-class case is also used to direct the *generation* of potentially meaningful operating points, not only as a speed-up.

In your examples, you have two different optimization algorithms trying to find a subset of a potentially prohibitively large parameter hyper-surface. Each of them generates operating points on the fly using a different criterion and search strategy, plus a number of heuristics that make the search feasible and the solution practical. As a result, you get two different subsets of solutions (operating points). The solutions returned are valid but, naturally, still sub-optimal.

If your number of classes is low, you may attempt a full search, i.e. generate a dense grid of operating points and select from it based on your cost specification. perClass uses this strategy only for two classes because, otherwise, the ROC analysis quickly becomes prohibitively complex both computationally and (especially) memory-wise.
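The two-class full search described above amounts to a threshold sweep over the soft outputs. A self-contained sketch (not the perClass implementation; the scores, labels and costs are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic two-class soft outputs: higher score should mean 'pos'.
scores = np.concatenate([rng.normal( 1.0, 1.0, 200),   # true 'pos' (class 0)
                         rng.normal(-1.0, 1.0, 200)])  # true 'neg' (class 1)
labels = np.array([0] * 200 + [1] * 200)

W = np.array([[0, 10],   # false negatives (true pos decided neg) cost 10x
              [1,  0]])

best_t, best_loss = None, np.inf
for t in np.unique(scores):               # dense grid: every observed score
    pred = np.where(scores >= t, 0, 1)    # decide 'pos' at or above threshold
    cm = np.zeros((2, 2))
    for true, p in zip(labels, pred):
        cm[true, p] += 1                  # confusion matrix for this threshold
    loss = (cm * W).sum() / len(labels)   # cost-weighted mean error
    if loss < best_loss:
        best_t, best_loss = t, loss

print(best_t, best_loss)
```

With the 10x false-negative cost, the winning threshold sits well below the symmetric decision boundary, exactly the trade-off the cost matrix is meant to buy.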

With Kind Regards,

Pavel

Posted: 18 November 2013 07:57 AM   [ # 13 ]

Thanks for taking the time to explain it in more detail. I would like to summarize my understanding.
A multi-class ROC exists for each classifier, but its computation becomes intractable for a large number of classes. Therefore, various algorithms have been developed to estimate the ROC. sdroc chooses one of these algorithms depending on the problem at hand.
For example,

receiverOpCurve = sdroc(outValidation, 'confmat', 'cost', W);

tries to optimize given the cost matrix W (first figure), and

receiverOpCurve = sdroc(outValidation, 'confmat', 'ops', ops);

tries to optimize given a set of operating points (second figure). The outcomes can be rather different.

Image Attachments: 20131114_all_search_collection_roc.png, 20131114_all_search_collection_sandbox_roc.png
Posted: 20 November 2013 10:29 AM   [ # 14 ]

Yes, that's true. We have already improved the multi-class optimizer coming in the 4.1 release so that it provides richer sets of operating points by default. We also plan to improve the default behaviour of the cost-based one.

The best thing in your case is probably what you do now, i.e. to provide your own set of operating points and optimize the cost afterwards with setcurop.

Thanks for the feedback!

With Kind Regards,

Pavel
