I just browsed an interesting paper:
Dirichlet-Enhanced Spam Filtering Based on Biased Samples. It raised a question for me:
Is sample selection bias the same thing as multitask learning (MTL)?
In this paper, each user's spam filtering problem can be considered one task.
However, there are two very strong assumptions:
1. The publicly available data and the personalized data are 0-1 consistent under the sample selection bias. That is, if P(x|public) != 0 then P(x|personal) != 0;
2. P(y|x, public) = P(y|x, personal).
The 1st assumption is OK, since most work on sample selection bias adopts it. (Otherwise, I think there is no way to infer anything about examples you have no chance of observing.)
The 2nd assumption is unacceptable to me. Given the same message, some users might treat it as ham and others as spam. How could the two conditionals possibly be equal?!
(Is it assumed only because it makes the analysis easier?)
In my opinion, MTL cannot be considered a feature bias (in the sample selection bias sense), nor a label bias. A more general model would be a complete bias, that is, one where P(s=1|x,y) cannot be decomposed into any simpler form.
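To make the three cases concrete, here is how I would write them down. This is only my own summary of the usual terminology, not notation taken from the paper; s=1 means the example made it into the biased sample.

```latex
% Feature bias: selection depends on the input only
P(s=1 \mid x, y) = P(s=1 \mid x)

% Label bias: selection depends on the label only
P(s=1 \mid x, y) = P(s=1 \mid y)

% Complete bias: P(s=1 \mid x, y) admits no such simplification.

% In every case, Bayes' rule links the biased and unbiased joints,
% wherever P(s=1 \mid x, y) > 0:
P(x, y) = \frac{P(s=1)\, P(x, y \mid s=1)}{P(s=1 \mid x, y)}
```

So if P(s=1|x,y) were known, reweighting each biased example by 1/P(s=1|x,y) would recover the unbiased distribution up to a constant.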
Actually, I am wondering whether it is possible to learn P(s=1|x,y) (as in the paper I mentioned) when both biased and unbiased samples are provided. OK, let's start with the ideal case. There, a logistic regression learner can be used to estimate the bias. The trick is to treat (x, y) as the input features and s=1 or 0 as the output.
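Here is a minimal sketch of that trick in plain Python. It is not the paper's method. Everything in it is an assumption made for illustration: the one-dimensional x, the synthetic selection rule, and the helper names make_pool, fit_selection_model and selection_prob. It assumes the ideal case where the selection indicator s is observed for every example in the pool.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_pool(n, rng):
    """Draw n triples (x, y, s) from a synthetic pool (illustrative only)."""
    pool = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < sigmoid(2.0 * x) else 0
        # Hypothetical "complete" bias: selection depends on both x and y.
        p_select = sigmoid(1.0 * x + 2.0 * y - 1.0)
        s = 1 if rng.random() < p_select else 0
        pool.append((x, y, s))
    return pool


def fit_selection_model(pool, epochs=200, lr=0.5):
    """Logistic regression with (x, y) as input and s as output.

    Plain batch gradient descent on the mean log-loss.
    Returns the weights [w_x, w_y, intercept].
    """
    w = [0.0, 0.0, 0.0]
    n = len(pool)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, y, s in pool:
            err = sigmoid(w[0] * x + w[1] * y + w[2]) - s
            grad[0] += err * x
            grad[1] += err * y
            grad[2] += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def selection_prob(w, x, y):
    """Estimated P(s=1 | x, y) under the fitted model."""
    return sigmoid(w[0] * x + w[1] * y + w[2])


if __name__ == "__main__":
    rng = random.Random(0)
    w = fit_selection_model(make_pool(2000, rng))
    print("estimated weights [w_x, w_y, intercept]:", w)
    print("estimated P(s=1 | x=0.5, y=1):", selection_prob(w, 0.5, 1))
```

With the estimate in hand, each biased example (x, y) could be reweighted by 1 / selection_prob(w, x, y). Note that a linear logistic model is itself a restriction on the form of P(s=1|x,y), so this only covers the easy end of "complete bias".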
In MTL, only a few labeled examples are available for the unbiased setting. Can we estimate the density reliably? The dilemma is this: we need "enough" data to improve the bias density estimation, but if we had "enough" data, we could already learn a very good model directly.
Is there any other way to improve the estimation?
The paper adopts a Dirichlet process to improve the estimation. But why does it work? I couldn't figure it out.