Skip to content

Archives

Ribbon filter: Practically smaller than Bloom and Xor

  • Ribbon filter: Practically smaller than Bloom and Xor

    Building on some prior lines of research, the Ribbon filter combines a simplified, faster, and more flexible construction algorithm; a data layout optimized for filter queries; and near-continuous configurability to make a practical alternative to static (immutable) Bloom filters. While well-engineered Bloom filters are extremely fast, they use roughly 50 percent more space (overhead) than the information-theoretic lower bound for filters on arbitrary keys. When Bloom filters cannot meet an application’s space efficiency targets, Ribbon filter variants dominate in space-versus-time trade-offs with near continuous configurability and space overhead as low as 1 percent or less. Ribbon filters have O(1) query times and save roughly 1/3 of memory compared with Bloom filters. At Facebook’s scale, we expect Ribbon filters to save several percent of RAM resources, with a tiny increase in CPU usage for some major storage systems. However, we do not implement efficiency gains at all engineering costs, so it’s also important to have a user-friendly data structure. This issue stalled implementation of other Bloom alternatives offering some space savings. The Ribbon filter opens these new trade-offs without introducing notable discontinuities or hazards in the configuration space. In other words, there is some complexity to make Ribbon filters general and highly configurable, but these details can be hidden behind a relatively simple API. You have essentially free choice over any three of the four core performance dimensions — number of keys added to the set, memory usage, CPU efficiency, and accuracy — and the accuracy is automatically well optimized.
    (via Tony Finch)

    (tags: via:fanf algorithms facebook programming ribbon-filters data-structures bloom-filters set-membership papers)