Better Binary Quantization (BBQ)
Elasticsearch introduces a new quantization approach for vector search:
In Elasticsearch 8.16 and Lucene, we introduced Better Binary Quantization (BBQ), a new approach that builds on insights from a recent technique, dubbed “RaBitQ”, proposed by researchers from Nanyang Technological University, Singapore.
BBQ is a leap forward in quantization for Lucene and Elasticsearch: it reduces float32 dimensions to bits, delivering a ~95% memory reduction while maintaining high ranking quality. BBQ outperforms traditional approaches like Product Quantization (PQ) in indexing speed (20-30x less quantization time) and query speed (2-5x faster queries), with no additional loss in accuracy.
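To make the ~95% figure concrete, here is a rough back-of-the-envelope sketch of the raw vector storage before and after 1-bit quantization. The dimension and corpus size are arbitrary examples, and the on-disk format also keeps a few corrective values per vector, which is why the practical saving is ~95% rather than the raw 32x:

```java
public class BbqMemoryEstimate {
    public static void main(String[] args) {
        int dims = 1024;                  // example dimensionality
        long numVectors = 10_000_000L;    // example corpus size

        long float32Bytes = numVectors * dims * 4L;    // 4 bytes per float dimension
        long bitBytes = numVectors * (dims / 8L);      // 1 bit per dimension, packed into bytes

        System.out.printf("float32 vectors: %.2f GB%n", float32Bytes / 1e9);
        System.out.printf("1-bit vectors:   %.2f GB%n", bitBytes / 1e9);
        System.out.printf("raw reduction:   %.1f%%%n",
                100.0 * (1.0 - (double) bitBytes / float32Bytes)); // ~96.9% before per-vector overhead
    }
}
```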
In this blog, we will explore BBQ in Lucene and Elasticsearch, focusing on recall, efficient bitwise operations, and optimized storage for fast, accurate vector search.
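As a first taste of what "efficient bitwise operations" means here: once document vectors are packed into bits, the hot loop of similarity estimation boils down to AND plus popcount over 64-bit words. The helper below is purely illustrative and is not the actual Lucene scorer, which uses an asymmetric scheme (a higher-resolution query against 1-bit document codes) plus corrective factors:

```java
// Illustrative only: count matching set bits between two bit-packed vectors.
// The real BBQ estimator adds corrective terms, but its inner loop is still
// word-at-a-time AND + popcount like this.
static long bitwiseMatchCount(long[] queryBits, long[] docBits) {
    long matches = 0;
    for (int i = 0; i < docBits.length; i++) {
        matches += Long.bitCount(queryBits[i] & docBits[i]);
    }
    return matches;
}
```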
Note that there are differences between this implementation and the one proposed by the original RaBitQ authors. Mainly:
- Only a single centroid is used for simple integration with HNSW and faster indexing
- Because we don't randomly rotate the codebook, we do not have the property that the estimator is unbiased over multiple invocations of the algorithm
- Rescoring is not dependent on the estimated quantization error
- Rescoring is not performed during graph index search; instead, it happens only after the initial estimates are calculated (a minimal rescoring sketch follows this list)
- Dot product is fully implemented and supported. The original authors focused on Euclidean distance only; while support for dot product was hinted at, it was not fully considered, implemented, or measured. Additionally, we support maximum inner product, where the vector magnitude matters, so simple normalization just won't suffice.
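To illustrate the rescoring points above, here is a hedged sketch of the second phase only: candidates arrive with estimated scores from the bit-quantized index, and the small surviving set is re-ranked with exact float32 dot products. Class, record, and method names are hypothetical; the real logic lives in Lucene's vector formats and KNN search:

```java
import java.util.Comparator;
import java.util.List;

public class RescoreSketch {

    // Hypothetical holder for a document id and its score.
    record Hit(int docId, double score) {}

    // Re-rank candidates (already ordered by their estimated, bit-quantized scores)
    // using exact float32 dot products, keeping only the top k.
    static List<Hit> rescoreTopK(List<Hit> estimatedCandidates,
                                 float[] query, float[][] docFloats, int k) {
        return estimatedCandidates.stream()
                .map(hit -> new Hit(hit.docId(), dotProduct(query, docFloats[hit.docId()])))
                .sorted(Comparator.comparingDouble(Hit::score).reversed())
                .limit(k)
                .toList();
    }

    static double dotProduct(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```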
Tags: bbq rabitq quantization vectors search llms lucene elasticsearch compression