perClass Documentation
version 5.1 (31-May-2017)

kb7: How to convert a LIBSVM support vector machine into a pipeline?

Keywords: support vector machines, LIBSVM, classifier execution

Problem: How to execute a support vector classifier trained in LIBSVM using libPRSD?

Solution: Call the sdp_svc pipeline constructor and fill in the parameters trained by LIBSVM.

Note: Starting with release 2.1, PRSD Studio provides the sdsvc command, which encapsulates training of RBF support vector machines in LIBSVM. See the example.

The following example illustrates how classifier parameters trained using external libraries may be imported into PRSD Studio and executed with the libPRSD library outside of Matlab.

PRSD Studio exposes a number of pattern recognition algorithms to the user as pipeline actions. For each algorithm, we may construct an execution pipeline directly by supplying its canonical parameters (see the function reference for the parameters of pipeline actions). Usually, we train the classifiers directly in PRSD Studio. However, we may just as well train the algorithm using external tools or libraries, as long as we are able to provide its parameters to the pipeline constructor in Matlab.

In this example, we use "A simple MATLAB interface" by the LIBSVM authors, which you can download from the LIBSVM website (ver. 2.9.1). We use the simple 'fruit' data set:

>> load fruit
>> a
'Fruit set' 260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60) 
>> b=a(:,:,{'apple','banana'})
'Fruit set' 200 by 2 sddata, 2 classes: 'apple'(100) 'banana'(100) 

>> sdscatter(a)

Now, we invoke LIBSVM to train an RBF SVM with sigma=2.0 (gamma=1/2). We provide the sample labels as class indices, together with the raw data matrix:

>> model=svmtrain(-b.lab, +b, '-g 0.5')
*.*
optimization finished, #iter = 272
nu = 0.204848
obj = -22.828866, rho = 0.049309
nSV = 84, nBSV = 10
Total nSV = 84

model = 

Parameters: [5x1 double]
  nr_class: 2
   totalSV: 84
       rho: 0.0493
     Label: [2x1 double]
     ProbA: []
     ProbB: []
       nSV: [2x1 double]
   sv_coef: [84x1 double]
       SVs: [84x2 double]
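LIBSVM defines the RBF kernel as K(u, v) = exp(-gamma * ||u - v||^2), so '-g 0.5' sets gamma = 0.5, while the sdp_svc call later in this article receives 1/0.5 = 2 as its kernel parameter. The Python sketch below only illustrates why that reciprocal is passed: it assumes, without confirmation from the perClass reference, that the pipeline's RBF parameter s enters the kernel as exp(-||u - v||^2 / s). Under that assumption, s = 1/gamma reproduces the LIBSVM kernel. The helper names are hypothetical.

```python
import math

def rbf_libsvm(u, v, gamma):
    """RBF kernel as defined by LIBSVM: exp(-gamma * ||u - v||^2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * d2)

def rbf_width(u, v, s):
    """ASSUMED width parametrization exp(-||u - v||^2 / s); hypothetical,
    used only to illustrate passing 1/gamma to the pipeline constructor."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / s)

gamma = 0.5          # value given to svmtrain via '-g 0.5'
s = 1.0 / gamma      # value given to sdp_svc as 1/0.5
u, v = (0.3, 1.2), (1.0, -0.4)
print(rbf_libsvm(u, v, gamma), rbf_width(u, v, s))  # the two kernels agree
```

If the perClass reference defines its RBF parameter differently, only the conversion from gamma changes; the rest of the procedure stays the same.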

To execute this classifier with the PRSD Studio sdp_svc pipeline action, we need to construct a labeled data set of support vectors. The number of SVs in each class is:

>> model.nSV

ans =

41
43

We create a set of labels...

>> lab=sdlab({'apple','banana'},model.nSV)
sdlab with 84 entries, 2 groups: 'apple'(41) 'banana'(43) 

...and construct the labeled set of support vectors:

>> proto=sddata(model.SVs,lab)
84 by 2 sddata, 2 classes: 'apple'(41) 'banana'(43) 
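The sdlab call above relies on how LIBSVM orders model.SVs: the support vectors are stored grouped by class, in the order given by model.Label, and model.nSV holds the per-class counts. The Python sketch below mirrors that expansion with a hypothetical helper, labels_from_counts. It assumes that model.Label lists apple first; if it lists banana first, the class names must be swapped.

```python
def labels_from_counts(class_names, n_sv):
    """Expand per-class support-vector counts into one label per SV.

    Assumes the rows of model.SVs are grouped by class in the order of
    model.Label, and that class_names follows that same order.
    """
    if len(class_names) != len(n_sv):
        raise ValueError("need exactly one count per class")
    labels = []
    for name, count in zip(class_names, n_sv):
        labels.extend([name] * count)
    return labels

lab = labels_from_counts(["apple", "banana"], [41, 43])  # counts from model.nSV
print(len(lab), lab.count("apple"), lab.count("banana"))  # 84 41 43
```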

Now we can pass the SVM parameters to sdp_svc. Note that the RBF kernel parameter is given as 1/gamma:

>> p=sdp_svc('rbf',1/0.5,proto,model.sv_coef,model.rho)
SVC pipeline            2x1   (sdp_svc)
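For a two-class model, LIBSVM computes the decision value as f(x) = sum_i sv_coef_i * K(SV_i, x) - rho. We assume, without having inspected the perClass internals, that sdp_svc evaluates the same quantity from proto, model.sv_coef and model.rho. The Python sketch below spells out that formula; the two support vectors and their coefficients are made-up stand-ins for model.SVs and model.sv_coef, and the function names are hypothetical.

```python
import math

def rbf(u, v, gamma):
    """LIBSVM RBF kernel: exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decision(x, svs, sv_coef, rho, gamma):
    """Two-class LIBSVM decision value: sum_i sv_coef[i] * K(svs[i], x) - rho."""
    return sum(c * rbf(sv, x, gamma) for sv, c in zip(svs, sv_coef)) - rho

def svm_decision_batch(rows, svs, sv_coef, rho, gamma):
    """Evaluate the decision function on every row of a data matrix."""
    return [svm_decision(x, svs, sv_coef, rho, gamma) for x in rows]

# Toy stand-ins for model.SVs, model.sv_coef and model.rho (illustration only).
svs = [(0.0, 0.0), (1.0, 1.0)]
sv_coef = [1.0, -1.0]
rho = 0.0
gamma = 0.5

out = svm_decision_batch([(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)], svs, sv_coef, rho, gamma)
print(out)
```

The midpoint (0.5, 0.5) is equidistant from both support vectors, so its decision value is zero and it lies on the decision boundary.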

The pipeline object may be executed directly on any 2D data, here three random samples:

>> rand(3,2)*p

ans =

   -1.5799
   -1.1996
   -1.2781

We can visualize the raw output of the pipeline action using the sdscatter function:

>> sdscatter(b,p)

To provide decisions, we must add an operating point. We may use the sddecide command, which adds a default operating point that thresholds the SVM output at zero:

>> pd=sddecide(p)
sequential pipeline     2x1 'SVC+Decision'
 1  SVC                     2x1  (sdp_svc)
 2  Decision                1x1  thresholding on apple at op 1 (sdp_decide)

>> sdscatter(b,pd)
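Thresholding at zero turns the raw SVM output into class decisions. In LIBSVM's own prediction routine, a positive decision value selects the first class in model.Label and any other value selects the second. The Python sketch below mimics that rule with a hypothetical decide helper, assuming apple is the first class; the handling of outputs exactly at zero inside sddecide is not documented here and may differ.

```python
def decide(dec_values, labels=("apple", "banana"), threshold=0.0):
    """Map raw two-class SVM outputs to class names.

    Assumes a positive output favours labels[0] (the first class in
    model.Label); outputs at or below the threshold go to labels[1].
    """
    return [labels[0] if d > threshold else labels[1] for d in dec_values]

print(decide([-1.5799, 0.42, 0.0]))  # ['banana', 'apple', 'banana']
```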

Finally, we compare the execution speed of the trained SVC under libPRSD and LIBSVM. We create a large random data set with 100 000 samples. We also need dummy labels, because the LIBSVM prediction interface (svmpredict) is designed for testing, not only for execution:

>> test=rand(100000,2);
>> lab=ones(size(test,1),1);

>> tic; [predict_label, accuracy, dec_values] = svmpredict(lab, test, model); toc
Accuracy = 0% (0/100000) (classification)
Elapsed time is 0.719287 seconds.

>> tic; out=test*p; toc
Elapsed time is 0.309065 seconds.

>> 0.719287/0.309065

ans =

    2.3273

Execution under libPRSD is therefore about 2.3 times faster than under LIBSVM in this run.
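The speedup is the LIBSVM time divided by the libPRSD time, so a value above one means libPRSD is faster. With the timings measured above:

```latex
\text{speedup} = \frac{t_{\text{LIBSVM}}}{t_{\text{libPRSD}}}
             = \frac{0.719287\ \text{s}}{0.309065\ \text{s}}
             \approx 2.33
```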

The pipeline may now be exported using sdexport and run directly in custom applications outside Matlab.