It actually gets a little more complicated than that. The underlying problem is that, on one hand, you want to pool everybody's spam to train the filters, so that we all benefit from each user's experience; on the other hand, one user will consider the Victoria's Secret emails (to pick a hypothetical example) to be spam while another will consider them essential news.
I would say that the only effective solution to this would be to have some COS that can train the spam filters and other COS that can't.
Or maybe have a two-level Bayesian database (one global and one personal), and only have selected COS able to train the global database.
These are both easy to describe; methinks they'll be a lot more complicated to implement. . .
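
For what it's worth, here is a rough Python sketch of the second idea, just to make it concrete. Everything in it is made up for illustration: the class names (TokenStore, TwoLevelFilter), the "staff" COS, the personal_weight knob, and the simple smoothed log-odds scoring are my own assumptions, not how any real filter stores or scores its Bayes data. Every user trains their own personal database, but only users in a trusted COS also train the global one.

```python
import math
from collections import Counter


class TokenStore:
    """One Bayesian database: per-token counts of spam and ham messages."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, tokens, is_spam):
        # Count each token at most once per message.
        (self.spam if is_spam else self.ham).update(set(tokens))
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1

    def log_odds(self, tokens):
        # Sum of add-one-smoothed log-odds (spam vs. ham) over the tokens.
        # An untrained store returns 0.0, i.e. it has no opinion.
        score = 0.0
        for t in set(tokens):
            p_spam = (self.spam[t] + 1) / (self.n_spam + 2)
            p_ham = (self.ham[t] + 1) / (self.n_ham + 2)
            score += math.log(p_spam / p_ham)
        return score


class TwoLevelFilter:
    """Hypothetical two-level setup: one global store plus one per user."""

    def __init__(self, trusted_cos):
        self.trusted_cos = set(trusted_cos)  # COS allowed to train global
        self.global_db = TokenStore()
        self.personal = {}  # user name -> TokenStore

    def train(self, user, cos, tokens, is_spam):
        # Everyone trains their own personal database...
        self.personal.setdefault(user, TokenStore()).train(tokens, is_spam)
        # ...but only trusted COS feed the shared one.
        if cos in self.trusted_cos:
            self.global_db.train(tokens, is_spam)

    def is_spam(self, user, tokens, personal_weight=2.0):
        # Personal evidence is weighted more heavily (an arbitrary choice)
        # so a user's own verdicts can override the global database.
        score = self.global_db.log_odds(tokens)
        if user in self.personal:
            score += personal_weight * self.personal[user].log_odds(tokens)
        return score > 0


f = TwoLevelFilter(trusted_cos={"staff"})
msg = ["lingerie", "sale"]
f.train("alice", "staff", msg, is_spam=True)     # reaches the global store
f.train("bob", "default", msg, is_spam=False)    # personal store only
print(f.is_spam("alice", msg), f.is_spam("bob", msg), f.is_spam("carol", msg))
```

In this toy setup alice's report lands in the global database, so a brand-new user like carol inherits it, while bob's "not spam" verdict stays in his personal database and only changes his own results. The hard parts (storage per user, picking the weighting, keeping a trusted COS from poisoning the global data) are exactly what the sketch leaves out.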