Sum rule and Ensembling of Classifiers
Posted: 19 December 2010 12:50 PM

Hi Bob,
I appreciate your answers and efforts. Thanks.
I have four questions:
1. I want to combine classifiers according to the sum rule. Is this possible in PRTools?
2. I have used bagging fusion:

% dstr --> 2925 by 1 dataset with 2 classes: [2538 387]
% dste --> 325 by 1 dataset with 2 classes: [282 43]
wbase1 = stumpc([],'maxcrit',5);
wbase2 = weakc([],0.2,5,1);
wbase = [wbase1 wbase2];
w = baggingc(dstr,wbase,100);  % w --> Mean combiner1 to 2 trained mapping --> fixedcc.
err = dste*w*testc;

I get an error value of 0.0615. Is this the correct way to use bagging?
3. Maybe I have the terminology wrong, but can I use the min, max, median and product combining rules with bagging?
4. I know of at least three methods for building ensembles of classifiers: bagging, random subspace and rotation forest. Which of these is commonly used in PRTools? Which one outperforms the others?
Regards.

Posted: 28 December 2010 11:31 PM   [ # 1 ]

1. Combining by the sum rule is implemented in PRTools by meanc. (The mean and the sum of the classifier outputs differ only by a constant factor, so they select the same class.)
2. The use is correct; whether it is appropriate for the problem I don’t know.
3. Bagging stands for Bootstrap AGGregatING and was proposed by Breiman. For the aggregation rule (the combiner; the term aggregation is used by Breiman only) he originally proposed voting (votec in PRTools). For PRTools I prefer meanc. Other combiners can be used as well, e.g. maxc, prodc, etc.
4. Next to baggingc, PRTools offers adaboostc and rsscc (random subspace) as ensemble-generating methods. Which one is best depends on the application.

Bob Duin
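
To make the combining rules mentioned above concrete, here is a minimal sketch in plain MATLAB (no PRTools needed) that applies them to a small array of posterior estimates. The numbers and variable names are made up for illustration, and the code only mimics what the rules compute; it is not the PRTools implementation of meanc, votec, maxc or prodc.

```matlab
% Posterior estimates for 4 objects and 2 classes from 3 hypothetical
% base classifiers, stored as objects x classes x classifiers.
P = cat(3, ...
    [0.9 0.1; 0.6 0.4; 0.4  0.6;  0.2 0.8], ...
    [0.8 0.2; 0.3 0.7; 0.45 0.55; 0.1 0.9], ...
    [0.7 0.3; 0.55 0.45; 0.9 0.1; 0.3 0.7]);
L = size(P, 3);                 % number of base classifiers

sumP    = sum(P, 3);            % sum rule
meanP   = mean(P, 3);           % mean rule, equal to sumP / L
prodP   = prod(P, 3);           % product rule
maxP    = max(P, [], 3);        % max rule
minP    = min(P, [], 3);        % min rule
medianP = median(P, 3);         % median rule

% Voting: every classifier votes for its own most likely class and the
% combined score is the fraction of votes per class.
[~, votes] = max(P, [], 2);     % 4 x 1 x 3 array of class indices
votes = squeeze(votes);         % 4 x 3
voteP = [sum(votes == 1, 2), sum(votes == 2, 2)] / L;

% Assign each object to the class with the highest combined score.
[~, labSum]  = max(sumP,  [], 2);
[~, labMean] = max(meanP, [], 2);
[~, labVote] = max(voteP, [], 2);

disp([labSum labMean labVote]); % one column of labels per rule
```

The sum and mean columns are always identical, which is why meanc serves as the sum rule. The voting column can differ from them, because voting discards how confident each base classifier is.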

Posted: 29 December 2010 08:57 PM   [ # 2 ]

Dear Bob,
I use bagging with PCA mapping:

.... % 10-fold CV
dsTR = dataset(sTR', labelsTR');
dsTE = dataset(sTE', labelsTE');
sampTR = pca(dsTR,0.98);
wbase1 = stumpc([],'maxcrit',5);
wbase2 = weakc([],0.2,5,1);
wbase = [wbase1 wbase2];
w = baggingc(sampTR,wbase,100);
err = dsTE*w*testc;
disp(err);
crt_pca = crt_pca + err;
.....

But it gives an error:
??? Cell contents reference from a non-cell array object.
Error in ==> gendat at 77
if ~isdatafile(X), X = dataset(X); end
Error in ==> baggingc at 65
w = [w gendat(a)*clasf];

Am I using the PCA mapping incorrectly with bagging?
Thanks in advance.
Murat

Posted: 01 January 2011 11:11 AM   [ # 3 ]

sampTR = pca(dsTR,0.98) is a mapping, but baggingc needs a dataset. Please apply the mapping to the datasets first, e.g.

w = pca(dsTR,0.98);
dsTR_red = dsTR*w;
dsTE_red = dsTE*w;

Use these datasets for training baggingc and testing.

By the way, you use a combined classifier wbase = [wbase1 wbase2] as the base classifier for bagging. You are of course free to do this, but it goes somewhat against the idea that the base classifier should be simple.
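
Putting the two remarks together, the corrected experiment could look roughly like the sketch below. It assumes PRTools is on the MATLAB path and uses the calls from this thread (pca, stumpc, baggingc, testc). The banana data from gendatb and the 50/50 split by gendat are only stand-ins for the poster's data, which is not shown here, and the name wpca is introduced for illustration. In later PRTools versions some routines have different names, so treat this as an outline rather than a tested recipe.

```matlab
% Sketch only: requires PRTools on the MATLAB path.
a = gendatb([200 200]);             % stand-in two-class data (assumption)
[dsTR, dsTE] = gendat(a, 0.5);      % random 50/50 split into train and test

wpca     = pca(dsTR, 0.98);         % PCA mapping, trained on the training set only
dsTR_red = dsTR*wpca;               % apply the mapping to the training set
dsTE_red = dsTE*wpca;               % apply the same mapping to the test set

wbase = stumpc([], 'maxcrit', 5);   % one simple base classifier
w     = baggingc(dsTR_red, wbase, 100);  % bagging with 100 base classifiers
err   = dsTE_red*w*testc;           % error on the reduced test set
disp(err);
```

The important points are that baggingc receives a dataset (dsTR_red) rather than the mapping, and that the test set passes through the same PCA mapping that was trained on the training set.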

Bob Duin
