Single-class training & testing
Posted: 14 September 2009 12:53 PM
Newbie
Total Posts:  16
Joined  2009-09-03

Hello!

I have to train a classifier on a dataset containing just one class and then test it with some samples of the same class and some samples of other classes. As a result, I would like to get, for each test sample, a confirmation or rejection of its membership in this class.

So, first of all: how can I train a single-class classifier using PRTools?

Could you please help me?

Cheers,
Valeria

Posted: 14 September 2009 03:23 PM   [ # 1 ]
Moderator
Total Posts:  250
Joined  2008-11-08

Maybe you should have a look at the one-class classifier toolbox by David Tax,
which is built on PRTools:

http://ict.ewi.tudelft.nl/~davidt/oneclass.html

Best,

Bob Duin

Posted: 14 September 2009 05:28 PM   [ # 2 ]
Newbie
Total Posts:  16
Joined  2009-09-03

Thank you!

But if I do not have
- the Optimization Toolbox (for svdd and lpdd),
- the Statistics Toolbox (for randsph), and
- the Neural Network Toolbox (for autoenc_dd),
would it still be possible to solve my task?

Best,
Valeria

[ Edited: 15 September 2009 10:14 AM by Valeria]

Posted: 14 September 2009 09:24 PM   [ # 3 ]
Administrator
Total Posts:  236
Joined  2008-04-26

Dear Valeria,

You can also build detectors using the PRSD Toolbox (http://prsysdesign.net/index.php/html/software/ ).

The sddetector command helps you to do that in one step. You provide the dataset, the class you want to detect, and an untrained model.

>> a=gendatf
Fruit set, 260 by 2 dataset with 3 classes: [100  100   60]
>> getlablist(a)
ans =
apple 
banana
stone 

>> p=sddetector(a,'apple',parzenc)
new lablist:
 1 apple  -> apple
 2 banana -> non-apple
 3 stone  -> non-apple
PR_Warning: getprior: parzenc: No priors found in dataset, class frequencies are used instead
Fruit set, 52 by 2 dataset with 2 classes: [20  32]
sequential pipeline     2x1 'Parzen Classifier'
 1  sdp_parzen          2x2  Parzen Classifier, 2 classes, 208 prototypes
 2  sdp_fsel            2x1  
 3  sdp_decide          1x1  Threshold-based decision on apple at op 30

>> sdscatter(a,p)

sddetector can set the operating point either using ROC analysis (if you have non-target examples in your data) or using a one-class scenario, where you fix the rejected fraction.
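As a rough illustration of the one-class scenario (not of how sddetector works internally), the following sketch picks a threshold on a score so that a chosen fraction of the target training samples is rejected. It uses only base MATLAB functions. The data, the distance-based score, and all variable names are assumptions made for the example.

```matlab
% Sketch: choose a decision threshold that rejects a fixed fraction of targets.
% Assumption: a higher score means "more target-like".
X  = randn(200, 2);                            % hypothetical target-class samples
mu = mean(X, 1);                               % center of the target class
score = -sum(bsxfun(@minus, X, mu).^2, 2);     % negative squared distance to the center

reject = 0.1;                                  % fraction of targets we are willing to reject
s   = sort(score, 'ascend');                   % least target-like scores first
k   = floor(reject * numel(s));                % number of training targets to reject
thr = s(k + 1);                                % accept every sample with score >= thr

accepted = score >= thr;                       % decisions on the training targets
fprintf('rejected %d of %d targets\n', sum(~accepted), numel(s));
```

A new sample would be scored in the same way and accepted as a target only if its score reaches thr.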

For more information check out: http://perclass.com/doc/guide/classifiers.html#detector

Hope it helps,

Pavel

[ Edited: 14 September 2009 09:27 PM by pavel]
Image Attachments: Picture 9.png

Posted: 15 September 2009 12:04 PM   [ # 4 ]
Newbie
Total Posts:  16
Joined  2009-09-03

Thanks a lot! It looks very good.
I will try it.

Posted: 15 September 2009 03:35 PM   [ # 5 ]
Newbie
Total Posts:  16
Joined  2009-09-03

Dear Pavel,

after this:

>> pd=sddetector(A,'all',sdmixture,'reject',0.1)

I get the error:

[class 'all' initialization:.................... 7 clusters  EM:no components left
??? Error using ==> sdconvert at 99
mapping or pipeline expected

Error in ==> D:\matlab\toolbox\prsd_studio_lite_1.2.3_07jul09_win_m74\prsd_toolbox\sddetector.p>sddetector at 122

Do you know what it could be?

Best,
Valeria

Posted: 15 September 2009 03:50 PM   [ # 6 ]
Administrator
Total Posts:  236
Joined  2008-04-26

Dear Valeria,

The sdmixture initialization found 7 components, which are used as the starting point of the EM algorithm. During the EM iterations, the components then degenerate.

This may happen, for example, in high-dimensional problems with a small sample size. What are the sample size and dimensionality of dataset A?

If this is the case, you may consider either reducing the dimensionality using pca or regularizing the mixture so that it does not have issues estimating the covariance matrices.
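To see why few samples in many dimensions cause trouble, here is a small base-MATLAB sketch. A sample covariance estimated from n samples has rank at most n-1, so it is singular whenever n is not larger than the dimensionality. Adding a small value to the diagonal is one common way to regularize it. The sizes and the value of reg below are illustrative assumptions, and this is not necessarily what the 'reg' option of sdmixture does internally.

```matlab
% Sketch: covariance estimation with fewer samples than dimensions.
n = 10;  d = 20;                     % hypothetical sample size and dimensionality
X = randn(n, d);                     % hypothetical data, one sample per row
C = cov(X);                          % d x d sample covariance matrix
C = (C + C') / 2;                    % guard against round-off asymmetry
r = rank(C);                         % at most n-1, so C cannot be inverted here
fprintf('rank %d of %d\n', r, d);

% Regularization: add a small multiple of the identity matrix.
reg  = 1e-6;                         % illustrative value (assumption)
Creg = C + reg * eye(d);             % all eigenvalues are shifted up by reg
fprintf('smallest eigenvalue after regularization: %g\n', min(eig(Creg)));
```

Reducing the dimensionality attacks the same problem from the other side: with fewer dimensions, the same number of samples is enough for a full-rank estimate.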

Example on 20D data with just 200 samples:

>> a=gendatd(200,20)
Difficult Dataset, 200 by 20 dataset with 2 classes: [93  107]

The mixture with default options loses components in the EM optimization:
>> p=sdmixture(a)
[class '1' initialization:.................... 6 clusters  EM:...-1..-1.-2...-1..................... 1 comp] 
[class '2' initialization:.................... 9 clusters  EM:....-1.-3........-1......-1.......-1.... 2 comp] 
sequential pipeline     20x2 ''
 1  sdp_normal         20x2  2 classes, 3 components

Reduction of dimensionality to 5D by PCA:
>> w=pca(a,5)
PCA to 5D, 20 to 5 trained  mapping   --> affine

Mixture optimization in the reduced subspace does not suffer (so much):
>> p=sdmixture(a*w)
[class '1' initialization:..................... 3 clusters  EM:.............................. 3 comp] 
[class '2' initialization:..................... 3 clusters  EM:.-2............................. 1 comp] 
sequential pipeline     5x2 ''
 1  sdp_normal          5x2  2 classes, 4 components

Regularization used in the original 20D space:
>> p=sdmixture(a,'reg',1e-6)
[class '1' initialization:.................... 6 clusters  EM:.............................. 6 comp] 
[class '2' initialization:.................... 9 clusters  EM:.............................. 9 comp] 
sequential pipeline     20x2 ''
 1  sdp_normal         20x2  2 classes, 15 components

Does it help?

Pavel

Posted: 16 September 2009 04:21 PM   [ # 7 ]
Newbie
Total Posts:  16
Joined  2009-09-03

Thank you. Yes, I had 500D and 30 samples.
But now I have reduced the number of features.

So, I would like to train on testset (6 samples, 2-10D) of one class and then test it with trainset (50 samples, 2-10D) of 2 classes.

I do it in the following way:

[A,ts,tr]=dataset_1(datafile,person_ID);
W=pca(tr,2);
tr2=tr*W;
pd=sddetector(tr2,'1',sdmixture,'reject',0.1);

sdscatter(tr2,pd);

ts2=sddetector(ts,'1',sdmixture);
sdconfmat(getlab(ts2),ts2*pd)

but I get this error:

??? Error using ==> sdexe at 128
wrong data format

Error in ==> D:\matlab\toolbox\prsd_studio_lite_1.2.3_07jul09_win_m74\prsd_toolbox\@sdppl\mtimes.p>mtimes at 15

Error in ==> one_class at 11
sdconfmat(getlab(ts2),ts2*pd)

Could you please help me?

Thanks,
Daria

Posted: 16 September 2009 04:46 PM   [ # 8 ]
Administrator
Total Posts:  236
Joined  2008-04-26

Dear Valeria,

OK. The first three lines are more-or-less clear (assuming dataset_1 is your own function returning datasets).
I see you reduce the dataset tr into 2D and train the detector pd in this subspace.

I don’t understand your intentions on the line ts2=sddetector…
If you wish to execute the detector pd on the test set ts, you should directly apply it to the dataset ts processed by the trained mapping W:

sdconfmat(getlab(ts), ts*W*pd)

As written, the call to sddetector invokes new training on your dataset ts (in the original dimensionality). ts2 is then the trained detector pipeline, not a dataset!
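The general pattern (fit the projection on the training set only, then reuse the same projection on the test set) can be sketched in base MATLAB without any toolbox. The random data and the SVD-based PCA below are assumptions for illustration; in the thread, the trained mapping W from pca plays this role.

```matlab
% Sketch: fit a projection on the training set, then reuse it on the test set.
tr = randn(6, 10);                             % hypothetical training data (6 samples, 10D)
ts = randn(50, 10);                            % hypothetical test data (50 samples, 10D)

mu  = mean(tr, 1);                             % training mean, also used for the test set
trc = bsxfun(@minus, tr, mu);                  % centered training data
[~, ~, V] = svd(trc, 'econ');                  % principal directions of the training set
W = V(:, 1:2);                                 % keep the first two directions (10 x 2)

tr2 = trc * W;                                 % training set in the 2D subspace
ts2 = bsxfun(@minus, ts, mu) * W;              % test set mapped with the SAME mu and W
```

Nothing is re-estimated from ts here: the test samples only pass through the quantities learned from tr, which is what ts*W*pd expresses in the toolbox notation.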

Does it help a bit?

Pavel

Posted: 17 September 2009 10:34 AM   [ # 9 ]
Newbie
Total Posts:  16
Joined  2009-09-03

Dear Pavel,

Thanks a lot! You are really helping me in this area, which is new to me.

Now I can train and test, but I get a somewhat strange result. You can see the plots in the attachment. The location of the trained area is not exactly right. Plot_5 looks a bit better than Plot_1, but still not very nice :(
I also don't understand the confusion matrix that I got:

True Labels      Decisions
     1    1    1    non-1    …    non-1    non-1    non-1    non-1    Totals
1    0.000     0.000     0.000     0.000     …    0.000    1.000    1.000    1.000    1.00
1    1.000    1.000    1.000    1.000    …    1.000    1.000    1.000

It is very big, I took just the beginning and the end of it.

My code looks like this:

function one_class(datafile, person_ID)

[A,ts,tr]=dataset_1(datafile,person_ID);
W=pca(tr,2);
tr2=tr*W;
pd=sddetector(tr2,'1',sdmixture,'reject',0.1);

sdscatter(tr2,pd);

sdconfmat(getlab(ts),ts*W*pd)
sdscatter(ts*W,pd);

return

Here [A, ts, tr]=dataset_1(datafile, person_ID); is a function that returns A (the dataset of all instances), ts (the test set), and tr (the training set).

Could you recommend something to improve this training?

Cheers,
Valeria

Image Attachments: one_class_training_5.jpg, one_class_training_1.jpg

Posted: 17 September 2009 12:04 PM   [ # 10 ]
Administrator
Total Posts:  236
Joined  2008-04-26

Dear Valeria,

Regarding the confusion matrix: I think this happens because your class names are numbers, not strings. In PRSD Studio we require string names.
To check, you can run

>> class(getlablist(tr))

To make sure that your datasets have string labels, use:

>> tr=sdrelab(tr)
>> ts=sdrelab(ts)

Regarding the detector result, I’m not sure what’s going on. Could you maybe replace the sdmixture with sdknn? Then you should see on the training set a ball centered around each training observation.
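The "ball around each training observation" picture can be illustrated with a plain nearest-neighbor distance rule in base MATLAB. This is only a sketch of the idea, not the sdknn implementation; the data and the hand-picked radius r are assumptions.

```matlab
% Sketch: nearest-neighbor distance detector (accept region = union of balls).
tr = randn(6, 2);                              % hypothetical training targets
r  = 0.5;                                      % illustrative ball radius (assumption)
x  = [tr(1, :); 100 100];                      % a training point and a far-away point

accept = false(size(x, 1), 1);
for i = 1:size(x, 1)
    diffs = bsxfun(@minus, tr, x(i, :));       % offsets to every training target
    dist  = sqrt(sum(diffs.^2, 2));            % Euclidean distances
    accept(i) = min(dist) <= r;                % inside at least one ball?
end
disp(accept');
```

A sample is accepted when it lies within distance r of at least one training observation, so the accepted region on a scatter plot is a set of discs centered on the training points.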

Greetings,

Pavel
