Spam: Andrew ‘tridge’ Tridgell’s junkcode directory really does contain some useful snippets, like he said. Here’s spamsum, a checksum algorithm for hashing spam text:
The core of the spamsum algorithm is a rolling hash similar to the rolling hash used in ‘rsync’. The rolling hash is used to produce a series of ‘reset points’ in the plaintext that depend only on the immediate context (with a default context width of seven characters) and not on the earlier or later parts of the plaintext. A stronger hash based on the FNV algorithm is then used to produce hash values of the areas between two reset points. The resulting signature comes from the concatenation of a single character from the FNV hash per reset point.
Very very nice!
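To make the quoted description a bit more concrete, here's a rough Python sketch of the idea. It is not tridge's code: the rolling hash is just the sum of the last seven bytes (a stand-in I picked for simplicity), the stronger hash is plain 32-bit FNV-1, and `sketch_signature`, `shared_suffix_len` and the `block_size` parameter are names made up for illustration. It also leaves out anything else the real program does beyond what the paragraph above describes.

```python
# Illustrative sketch only; simplified stand-ins, not the real spamsum code.
WINDOW = 7  # context width mentioned in the description
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
FNV_PRIME = 0x01000193   # standard 32-bit FNV prime
FNV_OFFSET = 0x811C9DC5  # standard 32-bit FNV offset basis


def sketch_signature(data: bytes, block_size: int = 16) -> str:
    """One character per reset point, taken from an FNV-1 hash of the
    bytes since the previous reset point."""
    window = bytearray(WINDOW)  # the last WINDOW bytes seen (zeros at start)
    roll = 0                    # rolling hash: sum of the bytes in the window
    fnv = FNV_OFFSET
    pending = False             # any bytes hashed since the last reset point?
    out = []
    for i, byte in enumerate(data):
        # Slide the window: add the new byte, drop the one that falls out.
        slot = i % WINDOW
        roll += byte - window[slot]
        window[slot] = byte
        # FNV-1 step: multiply by the prime, then XOR in the byte.
        fnv = ((fnv * FNV_PRIME) & 0xFFFFFFFF) ^ byte
        pending = True
        # A reset point depends only on the current window contents.
        if roll % block_size == block_size - 1:
            out.append(B64[fnv % 64])
            fnv = FNV_OFFSET
            pending = False
    if pending:
        out.append(B64[fnv % 64])  # cover the tail after the last reset point
    return "".join(out)


def shared_suffix_len(a: str, b: str) -> int:
    """Length of the common suffix of two signatures."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


body = b"Buy cheap pills now! Limited offer, click here today. " * 8
sig_a = sketch_signature(b"Hi Bob, " + body)
sig_b = sketch_signature(b"Dear valued customer, " + body)
print(sig_a)
print(sig_b)
print(shared_suffix_len(sig_a, sig_b))
```

Because a reset point depends only on the seven bytes before it, changing the greeting at the top of a message can only disturb the first few signature characters; once both messages pass a shared reset point, the rest of the two signatures is identical. That's what makes this sort of signature useful for spotting near-duplicate spam.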