NYC generates hash-anonymised data dump, which gets reversed
There are about 1000*26**3 = 17576000, or roughly 18M, possible medallion numbers of the three-letter, three-digit form. So, by calculating the MD5 hashes of all the candidate identifiers (only about 24M in total!), one can completely deanonymise the entire data set. Modern computers are fast: so fast that computing the 24M hashes took less than 2 minutes.
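To make the attack concrete, here is a minimal sketch of the enumerate-and-hash approach in Python. It is not the code used in the original write-up. The identifier format (digit, letter, digit, digit, e.g. `7B41`), the uppercase ASCII encoding, and the names `candidates_5x55` and `build_reverse_table` are all assumptions chosen to keep the keyspace small (26,000 values) so the example runs instantly; the same loop over a few tens of millions of candidates is what makes the full reversal feasible.

```python
import hashlib
import itertools
import string


def candidates_5x55():
    """Yield every identifier of the assumed form digit-letter-digit-digit."""
    for d1, letter, d2, d3 in itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        yield f"{d1}{letter}{d2}{d3}"


def build_reverse_table(candidates):
    """Map the MD5 hex digest of each candidate back to its plaintext."""
    return {hashlib.md5(c.encode("ascii")).hexdigest(): c for c in candidates}


# Precompute the whole keyspace once (10 * 26 * 10 * 10 = 26,000 entries).
table = build_reverse_table(candidates_5x55())

# A "hash-anonymised" value as it might appear in a published dump
# (hypothetical example value, not taken from the real data).
leaked = hashlib.md5(b"7B41").hexdigest()

# Reversing it is a single dictionary lookup.
recovered = table[leaked]
```

Because an unkeyed hash is deterministic and the input space is tiny, the table only has to be built once and then reverses every row of the dump.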
(via Bruce Schneier) The better fix is an HMAC (see http://benlog.com/2008/06/19/dont-hash-secrets/ ), or just to assign opaque IDs instead of hashing. (tags: hashing sha1 md5 bruce-schneier anonymization deanonymization security new-york nyc taxis data big-data hmac keyed-hashing salting)
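Both suggested fixes can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not a vetted anonymisation scheme: `SECRET_KEY`, `pseudonymise`, and `OpaqueIdAssigner` are hypothetical names, HMAC-SHA256 is one reasonable choice of keyed hash, and the key is assumed to stay private to whoever publishes the data.

```python
import hashlib
import hmac
import secrets

# Assumption: a random key generated once and never published. Without it,
# an attacker cannot precompute the mapping even for a tiny input space.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed hash (HMAC-SHA256) of the identifier as hex."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


class OpaqueIdAssigner:
    """Alternative: hand out random IDs that have no relation to the input."""

    def __init__(self):
        self._ids = {}  # private lookup table; must not be published

    def id_for(self, identifier: str) -> str:
        # Reuse the same random ID for repeat identifiers so rows still join.
        if identifier not in self._ids:
            self._ids[identifier] = secrets.token_hex(16)
        return self._ids[identifier]


assigner = OpaqueIdAssigner()
hmac_value = pseudonymise("7B41")
opaque_value = assigner.id_for("7B41")
```

In both cases the same identifier always maps to the same output, so records can still be grouped per vehicle or driver, but the mapping depends on secret material (the key or the lookup table) rather than on the identifier alone.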