Tuesday, January 2, 2007

Redundant Features: Useful or Not?

I just browsed an interesting paper published at ICML'06:

Nightmare at Test Time: Robust Learning by Feature Deletion

The motivation for the paper can be described with an analogy to buying stocks.

Suppose there are several stocks with the same risk, and you have $1000 to invest. What would you do?

Of course, dividing the money evenly among these stocks should be more reliable than putting it all into just one.

The same situation arises in feature selection. Suppose you select some relevant features from the training data; the selection could be wrong due to small sample size or noise.

In this process, you probably remove the redundant features as well. From this point of view, it seems more robust to keep those redundant features rather than remove them.
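To make the intuition concrete, here is a minimal toy sketch in Python with scikit-learn (my own illustration, not the algorithm from the paper): the label is carried by several redundant noisy copies of one signal, and at test time one feature gets deleted. A classifier that kept only the single "selected" feature collapses, while one that kept all the redundant copies barely notices.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
n, n_copies = 2000, 5

def make_data(n):
    y = rng.randint(0, 2, size=n)
    signal = 2.0 * y - 1.0  # +1 / -1 true signal
    # each column is the same signal plus independent noise -> redundant features
    X = signal[:, None] + rng.normal(scale=1.0, size=(n, n_copies))
    return X, y

X_train, y_train = make_data(n)
X_test, y_test = make_data(n)

# "Feature selection": keep only the first column.
clf_selected = LogisticRegression().fit(X_train[:, :1], y_train)
# Keep all redundant copies instead.
clf_all = LogisticRegression().fit(X_train, y_train)

# Nightmare at test time: the selected feature is deleted (zeroed out).
X_test_deleted = X_test.copy()
X_test_deleted[:, 0] = 0.0

print("selected feature only, after deletion:",
      clf_selected.score(X_test_deleted[:, :1], y_test))
print("all redundant copies, after deletion:",
      clf_all.score(X_test_deleted, y_test))

On this toy data, the single-feature model falls to chance accuracy once its feature is deleted, while the model trained on all redundant copies stays close to its original accuracy.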

But from the curse-of-dimensionality point of view, redundant features should be removed.

How do we trade off redundancy against robustness?
I guess this is highly related to how redundancy is defined.

I'll comment more on this issue in the future.

1 comment:

abhishek said...

Valid point. But dimensionality reduction is not just useful for avoiding the curse of dimensionality. Sometimes some dimensions are more corrupted by noise than others, and keeping all dimensions can create confusion at test time. So I think redundant dimensions can cause problems in classification if the noise lies in those dimensions.
Let me know your thoughts.