Combining Classifiers from outside PRtools
Posted: 21 March 2012 03:25 PM
Novice (Total Posts: 3, Joined 2012-03-15)

Thanks for this great tool.
I have a group of classifiers (around 50, based on the same method with different parameters) from outside PRTools, for a dataset of around 30000 samples.
Each classifier outputs a 30000x1 vector of probabilities: positive (0<p<1) for class 1, negative (-1<p<0) for class 2, and 0 for reject.
I need to combine these classifiers and obtain both the combined result and the weights for the different classifiers.

Thanks
P.S.: Sorry if this question has been asked before; I searched but could not find a directly related post. If it has, please direct me to it.

Posted: 26 March 2012 04:45 PM   [ # 1 ]
Novice (Total Posts: 6, Joined 2011-06-07)

Hello,
I would also like to know how to combine two outputs from outside PRTools.
Thanks in advance

Posted: 26 March 2012 06:26 PM   [ # 2 ]
Moderator (Total Posts: 258, Joined 2008-11-08)

There is no quick and easy way to combine classifier outputs from outside PRTools. This is also not the place to give an entire PRTools course. If you are familiar with PRTools, the following might be of help.

Please study and try to understand the example PREX_COMBINING. In line 38 there is the statement

testc(C,V)

Here V is the set of trained classifiers to be combined and C is a PRTools dataset containing the labeled test set.
This statement can also be given as

D = C*V; testc(D);

Now D is the result of applying the test set to the combined set of trained classifiers. It is a cell array with classifier outputs, one cell for each classifier. You should convert your results to something like this. Note that every element D{i} is a proper PRTools dataset with given true labels and with the class names as feature labels. These 'features' are the class confidences. They sum to one for every test object.

Once you have constructed your version of D, test performances may be listed by

testc([D{:}]*prodc) % product combiner
testc([D{:}]*maxc) % max combiner

and similarly for other combiners like minc, meanc, votec, etcetera.
[D{:}] can also be used for training and testing trained combiners (classifiers trained on the outputs of the base classifiers).
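
As a minimal sketch of such a conversion (hypothetical names, not from your code: conf is an N x K matrix of class-1 confidences in (0,1) from K external classifiers, lab an N x 1 vector of true labels):

K = size(conf,2);                            % number of external classifiers
D = cell(1,K);
for i = 1:K
    p = [conf(:,i) 1-conf(:,i)];             % two class confidences, summing to one
    D{i} = setfeatlab(dataset(p,lab),[1;2]); % true labels, class names as feature labels
end
testc([D{:}]*maxc)                           % error of the max combiner
testc([D{:}]*prodc)                          % error of the product combiner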

Bob Duin

Posted: 26 March 2012 07:04 PM   [ # 3 ]
Novice (Total Posts: 3, Joined 2012-03-15)

Dear Bob,
Thanks for your reply; I somewhat get it now.
And sorry if my question gave the impression that I am looking for an entire PRTools course. Believe me, I am not; I just find it hard to understand the new PRTools notation (mainly the mappings and the results of the * operator).

Another question: I need to choose the subset of these classifiers that achieves the best performance. From what I got from your answer, I can also use D as an input to feature selection methods to select the best classifiers, if I consider their outputs as features here. Or am I wrong?

Thanks

Posted: 28 March 2012 10:02 AM   [ # 4 ]
Moderator (Total Posts: 258, Joined 2008-11-08)

The combined classifier outputs [D{:}] can be considered as a training set if the labels are available. So create A = dataset([D{:}],labels) and A can be used for training, feature selection and testing. Be careful and keep a separate test set to avoid overtraining.
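
As a hedged sketch of this idea (the split fraction, criterion and combiner below are arbitrary choices; featself, gendat and fisherc are standard PRTools routines):

[T,S] = gendat(A,0.5);     % split: T for selection and training, S kept apart
W = featself(T,'NN',5);    % forward selection of 5 'features', i.e. classifiers
V = T*W*fisherc;           % train a combiner on the selected classifier outputs
testc(S*W*V)               % error estimate on the held-out set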

Bob Duin

Posted: 25 April 2012 01:08 PM   [ # 5 ]
Novice (Total Posts: 3, Joined 2012-03-15)

Thanks Bob for your reply, and sorry for the delay; I had some health-related problems.
I read your answers again and had some difficulties understanding them (and I hope this won't imply that I want an entire PRTools course).
In your first answer you said:

BobDuin - 26 March 2012 06:26 PM

D = C*V; testc(D);

Now D is the result of applying the test set to the combined set of trained classifiers. It is a cell array with classifier outputs, one cell for each classifier. You should convert your results to something like this. Note that every element D{i} is a proper PRTools dataset with given true labels and with the class names as feature labels. These 'features' are the class confidences. They sum to one for every test object.

Regarding the part "these 'features' are the class confidences": by 'features' do you refer to the outputs of the classifiers that I want to combine? If so, do I need them to be normalized to sum to 1 for each test object?

I wrote some code to generate D as you mentioned, which is:

D = cell(size(classout,2),1);
for i = 1:size(classout,2)
    D{i} = dataset(classout(:,i),vlabels);   % one-feature dataset per classifier
end
testc([D{:}]*maxc);

In this code, 'classout' is the matrix of classifier outputs; it is 30000x60, corresponding to 30000 samples and 60 classifier outputs. These outputs are positive for class 1 and negative for class 2. 'vlabels' are the true labels: 1 for class 1 and 2 for class 2. The output of this code is a classification error of nearly 0.976.
The other question is about your second answer, on creating A = dataset([D{:}],labels). You said:

BobDuin - 28 March 2012 10:02 AM

Be careful and keep a separate test set to avoid overtraining.

By this, did you mean that I should keep a test set to evaluate the classifiers chosen by the feature selection, and not use this test set in the feature selection itself, right?

Of course any member can reply to these questions; I just addressed Bob because he took the initiative and replied from the start.

Thanks
Ahmed

Posted: 15 August 2012 08:29 PM   [ # 6 ]
Moderator (Total Posts: 258, Joined 2008-11-08)

For completeness:

Yes, I advise that the class assignments of every classifier are of the type class confidences (or posteriors) and thereby sum to one for every test object. This avoids an unwanted weighting of classifiers.
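
For the signed outputs you describe, one possible normalisation (a sketch only, under the assumption that positive values favour class 1 and negative values favour class 2; p is the 30000x1 signed output of a single classifier) is:

c1 = (1+p)/2;                                     % confidence for class 1, in (0,1)
d = setfeatlab(dataset([c1 1-c1],vlabels),[1;2]); % two columns summing to one per object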

On your second question: try to avoid using the same data multiple times during training, and definitely never use test data during training. This includes normalisation, feature selection, etcetera. (Of course, the outcomes of these procedures should be applied to the test data, but the procedures themselves should not be influenced by the test data.)
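
For example, a scaling fitted on the training set only and then applied, fixed, to the test set (scalem is the standard PRTools scaling mapping; T and S are hypothetical training and test sets):

w = scalem(T,'variance');   % scaling estimated from the training data only
T = T*w;                    % apply to the training data
S = S*w;                    % apply the same fixed mapping to the test data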

Bob Duin
