Problem using correlation coefficients to select features
Posted: 09 February 2010 07:06 PM
Newbie
Total Posts:  20
Joined  2009-10-19

Hi,

I want to train a classifier to tell whether a person has a certain disease. I have 90 people (cases). The feature vector consists of 7 features extracted from a medical examination plus the age, height and weight of the patient (10 features in total).
I use the MATLAB function corrcoef to calculate the correlation coefficients. The idea is to identify undesirably high correlations between the features and to identify the desirable correlations between the inputs and the output.

The correlation analysis showed no correlation between height or weight and the output (0 or 1). However, when I used knnc with just height and weight in a cross-validation, I got excellent results, which puzzled me. I don't understand what happened, although I realise that nonlinear models cannot always be explained by linear statistical analysis.

Either there is a true relation between the disease and weight and height, or I was very unlucky in picking the patients. If so, how can I avoid that? In principle, the disease should also be related to the features extracted from the medical examination; height and weight should not be the only features that are needed.

Thanks,

Jorge

Posted: 18 February 2010 03:32 PM   [ # 1 ]
Moderator
Total Posts:  253
Joined  2008-11-08

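This can happen when the features are correlated with each other. For example, look at
the distribution of the gendatd dataset. Along the horizontal and the vertical direction
(each feature taken on its own) there is a large class overlap, and still the classes are
well separable when both features are used together.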
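
A minimal sketch of this effect in plain Python, for readers without PRTools at hand. It does not call gendatd, corrcoef or knnc. The data generator, the helper names (make_data, pearson, loo_knn_accuracy) and all parameter values are assumptions chosen only for illustration: two classes stretched along the same diagonal and shifted slightly across it, so that each feature alone overlaps heavily while the pair separates the classes.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def make_data(n_per_class=200, seed=0):
    """Hypothetical two-class data: long spread along the diagonal,
    small class shift across it (illustrative stand-in, not gendatd)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, offset in ((0, -0.5), (1, 0.5)):
        for _ in range(n_per_class):
            along = rng.uniform(-10, 10)         # shared by both classes
            across = offset + rng.gauss(0, 0.1)  # carries the class information
            f1 = (along + across) / math.sqrt(2)
            f2 = (along - across) / math.sqrt(2)
            X.append((f1, f2))
            y.append(label)
    return X, y


def loo_knn_accuracy(X, y, k=3):
    """Leave-one-out accuracy of a k-nearest-neighbour majority vote."""
    correct = 0
    for i, xi in enumerate(X):
        dists = sorted(
            (math.dist(xi, xj), y[j]) for j, xj in enumerate(X) if j != i
        )
        votes = sum(label for _, label in dists[:k])
        pred = 1 if 2 * votes > k else 0
        correct += pred == y[i]
    return correct / len(X)


X, y = make_data()
f1 = [p[0] for p in X]
f2 = [p[1] for p in X]

r1 = pearson(f1, y)    # feature 1 vs. class label
r2 = pearson(f2, y)    # feature 2 vs. class label
r12 = pearson(f1, f2)  # feature 1 vs. feature 2

acc_both = loo_knn_accuracy(X, y)                  # both features
acc_f1 = loo_knn_accuracy([(v,) for v in f1], y)   # feature 1 only

print(f"corr(f1, label) = {r1:.2f}, corr(f2, label) = {r2:.2f}")
print(f"corr(f1, f2)    = {r12:.2f}")
print(f"3-NN accuracy, both features: {acc_both:.2f}")
print(f"3-NN accuracy, f1 only:       {acc_f1:.2f}")
```

With this construction each feature should show only a weak correlation with the label, the two features should be strongly correlated with each other, and k-NN on the pair should do far better than k-NN on one feature. That is the same pattern as in the original question: a per-feature correlation with the output says little about what two features can do jointly.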

Bob Duin
