Hi,
I want to train a classifier to tell whether a person has a certain disease. I have 90 people (cases). The feature vector consists of 7 features extracted from a medical examination plus the age, height and weight of the patient (10 features in total).
I use the MATLAB function corrcoef to calculate the correlation coefficients. The idea is to identify undesirably high correlations among the features and to identify the desirable correlations between the inputs and the output.
The analysis showed no correlation between height or weight and the output (0 or 1). However, when I ran knnc in a cross-validation using only height and weight, I got excellent results, which puzzled me. I don't understand what happened, although I realize that a nonlinear model cannot always be explained by a linear statistical analysis.
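To illustrate what I mean by that last point, here is a small synthetic sketch in Python. It is not my real data and it does not use corrcoef or knnc; the ring-shaped layout, k = 3, leave-one-out validation, and the helper names pearson, make_ring_data and loo_knn_accuracy are all assumptions made up for the example. One class sits on a small circle and the other on a larger circle around it, so each feature on its own has essentially zero linear correlation with the label, yet a nearest-neighbour rule separates the classes.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def make_ring_data(n_per_class=45):
    """Hypothetical data: class 1 on a small circle, class 0 on a larger one.

    The two coordinates stand in for standardized height and weight.
    """
    points, labels = [], []
    for radius, label in ((0.5, 1), (2.0, 0)):
        for k in range(n_per_class):
            angle = 2 * math.pi * k / n_per_class
            points.append((radius * math.cos(angle), radius * math.sin(angle)))
            labels.append(label)
    return points, labels


def loo_knn_accuracy(points, labels, k=3):
    """Leave-one-out accuracy of a plain k-nearest-neighbour majority vote."""
    correct = 0
    for i, p in enumerate(points):
        # Distances from the held-out point to every other point.
        dists = sorted(
            (math.dist(p, q), labels[j]) for j, q in enumerate(points) if j != i
        )
        votes = sum(label for _, label in dists[:k])
        predicted = 1 if votes > k / 2 else 0
        correct += predicted == labels[i]
    return correct / len(points)


points, labels = make_ring_data()
xs = [p[0] for p in points]
ys = [p[1] for p in points]

r_x = pearson(xs, labels)  # linear correlation of feature 1 with the label
r_y = pearson(ys, labels)  # linear correlation of feature 2 with the label
acc = loo_knn_accuracy(points, labels)

print("corr(feature 1, label):", r_x)
print("corr(feature 2, label):", r_y)
print("leave-one-out kNN accuracy:", acc)
```

In this toy case both correlations come out at zero up to rounding error while the nearest-neighbour accuracy is high, so a result like mine is at least possible without anything going wrong. Whether my real height and weight data look anything like this, I cannot say.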
Either there is a true relation between the disease and weight and height, or I was very unlucky in selecting the patients. If the latter is the case, how can I avoid it? In principle, the disease should also be related to the features extracted from the medical examination; I mean, height and weight should not be the only features needed.
Thanks,
Jorge

