Some news from TREC’s Gordon Cormack:
The TREC 2005 Corpus (92,000 messages: 42,000 ham and 50,000 spam) is now available for self-serve download.
TREC Spam Evaluation is a NIST program to develop methods for measuring spam filter accuracy and performance.
The corpus can be picked up at Gordon’s site. As far as I can tell, this should be a pretty solid corpus for spam researchers and developers.