What’s the probability of a hash collision?
Handy calculator
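For a rough idea of the arithmetic behind a calculator like this, here is a minimal sketch using the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/2N), for n items hashed uniformly into N possible values. The function name and parameters are made up for illustration, and the uniform-hash assumption is mine; the linked calculator may compute things differently.

```python
import math


def collision_probability(n_items, space_size):
    """Approximate chance that at least two of n_items share a hash value.

    Assumes hashes are uniformly distributed over space_size possible
    values (an idealisation) and uses the birthday approximation.
    """
    if n_items < 2:
        return 0.0
    # Number of pairs divided by the size of the hash space.
    exponent = n_items * (n_items - 1) / (2 * space_size)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)


# Classic birthday problem: 23 people, 365 days, roughly even odds.
print(collision_probability(23, 365))
# A billion items under an idealised 128-bit hash.
print(collision_probability(10**9, 2**128))
```

For hashes such as MD5 or SHA-1, pass `2**128` or `2**160` as `space_size`; the result only covers accidental collisions, not deliberate attacks.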
(tags: probability hashing hashes collision risk md5 sha sha1 calculators)
-
Bitsets, also called bitmaps, are commonly used as fast data structures. Unfortunately, they can use too much memory. To compensate, we often use compressed bitmaps. Roaring bitmaps are compressed bitmaps which tend to outperform conventional compressed bitmaps such as WAH, EWAH or Concise. In some instances, they can be hundreds of times faster and they often offer significantly better compression. Roaring bitmaps are used in Apache Lucene (as of version 5.0 using an independent implementation) and Apache Spark (as of version 1.2).
(tags: bitmaps bitsets sets data-structures bits compression lucene spark daniel-lemire algorithms)
‘Histogram-based Outlier Score (HBOS): A fast Unsupervised Anomaly Detection Algorithm’ [PDF]
‘Unsupervised anomaly detection is the process of finding outliers in data sets without prior training. In this paper, a histogram-based outlier detection (HBOS) algorithm is presented, which scores records in linear time. It assumes independence of the features making it much faster than multivariate approaches at the cost of less precision. A comparative evaluation on three UCI data sets and 10 standard algorithms show, that it can detect global outliers as reliable as state-of-the-art algorithms, but it performs poor on local outlier problems. HBOS is in our experiments up to 5 times faster than clustering based algorithms and up to 7 times faster than nearest-neighbor based methods.’
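A minimal sketch of the idea as the abstract describes it: build one histogram per feature, treat the features as independent, and add up a per-feature score so that records falling in sparse bins score high. The equal-width bins, the `n_bins` default, the normalisation to the tallest bin and the function name are assumptions for illustration, not the paper's reference implementation.

```python
import math


def hbos_scores(rows, n_bins=10):
    """Return one outlier score per row; higher means more unusual.

    rows is a list of equal-length numeric tuples. Each feature gets its
    own equal-width histogram, and the per-feature scores are summed,
    which is where the independence assumption comes in.
    """
    n_features = len(rows[0])
    scores = [0.0] * len(rows)
    for j in range(n_features):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        # A constant feature would give zero width; fall back to one bin.
        width = (hi - lo) / n_bins or 1.0

        def bin_of(x):
            # Clamp so the maximum value lands in the last bin.
            return min(int((x - lo) / width), n_bins - 1)

        counts = [0] * n_bins
        for x in column:
            counts[bin_of(x)] += 1
        tallest = max(counts)
        for i, x in enumerate(column):
            # Bin height relative to the tallest bin, in (0, 1].
            density = counts[bin_of(x)] / tallest
            scores[i] += math.log(1.0 / density)
    return scores


# Twenty clustered points plus one far-away record.
data = [(i * 0.05, i * 0.05) for i in range(20)] + [(10.0, 10.0)]
scores = hbos_scores(data)
print(scores[-1] > max(scores[:-1]))
```

Each record is touched once per feature, which is the linear-time behaviour the abstract mentions; because features are scored separately, a record that is only unusual in combination with its neighbours (a local outlier) will not stand out.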
(tags: histograms anomaly-detection anomalies machine-learning algorithms via:paperswelove outliers unsupervised-learning hbos)
Stupid Projects From The Stupid Hackathon
Amazing.
iPad On A Face by Cheryl Wu is a telepresence robot, except it’s a human with an iPad on his or her face.
(tags: funny hacking stupid hackathons ipad-on-a-face telepresence hacks via:hn)