Justin's Linklog

Faster BAM Sorting with SAMtools and RocksDB

Now this is really really clever. Heap-merging a heavyweight genomics format, using RocksDB to speed it up.
'There’s a problem with the single-pass merge described above when the number of intermediate files, N/R, is large. Merging the sorted intermediate files in limited memory requires constantly reading little bits from all those files, incurring a lot of disk seeks on rotating drives. In fact, at some point, samtools sort performance becomes effectively bound to disk seeking. [...] In this scenario, samtools rocksort can sort the same data in much less time, using no more memory, by invoking RocksDB’s background compaction capabilities. With a few extra lines of code we configure RocksDB so that, while we’re still in the process of loading the BAM data, it runs additional background threads to merge batches of existing sorted temporary files into fewer, larger, sorted files. Just like the final merge, each background compaction requires only a modest amount of working memory.'
(via the RocksDB Facebook group)
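
To make the quoted idea concrete, here is a minimal Python sketch of a heap-based k-way merge, plus a toy "compaction" step that merges batches of sorted runs into fewer, larger ones. It is not the samtools or RocksDB code: in-memory lists stand in for the sorted temporary files, and the names `kway_merge`, `compact` and `fan_in` are hypothetical, chosen only for illustration.

```python
import heapq

_SENTINEL = object()  # marks an exhausted run


def kway_merge(runs):
    """Merge already-sorted runs, keeping only one head item per run in a min-heap."""
    iterators = [iter(run) for run in runs]
    heap = []
    for idx, it in enumerate(iterators):
        head = next(it, _SENTINEL)
        if head is not _SENTINEL:
            heap.append((head, idx))
    heapq.heapify(heap)

    while heap:
        value, idx = heap[0]  # smallest head across all runs
        yield value
        nxt = next(iterators[idx], _SENTINEL)
        if nxt is _SENTINEL:
            heapq.heappop(heap)  # this run is finished
        else:
            heapq.heapreplace(heap, (nxt, idx))  # advance within the same run


def compact(runs, fan_in):
    """Toy compaction: merge batches of at most `fan_in` runs into fewer, larger runs."""
    return [
        list(kway_merge(runs[i:i + fan_in]))
        for i in range(0, len(runs), fan_in)
    ]


runs = [[1, 4, 9], [2, 3, 10], [0, 7], [5, 6, 8]]
fewer_runs = compact(runs, fan_in=2)      # 4 small runs become 2 larger ones
merged = list(kway_merge(fewer_runs))     # the final merge touches fewer "files"
```

The final merge produces the same output either way; the point of compacting first is that it reads from fewer inputs at once, which on rotating disks would mean fewer seeks.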

(tags: rocksdb algorithms sorting leveldb bam samtools merging heaps compaction)
Coding For Life (Battery Life, That Is)

great presentation on Android mobile battery life, and what to avoid

(tags: presentations via:sergio android mobile battery battery-life 3g wifi gprs hardware)
Oisin's mobile app release checklist

'This form is to document the testing that has been done on each app version before submitting to the App Store. For each item, indicate Yes if the testing has been done, Not Applicable if the testing does not apply (eg testing audio for an app that doesn’t play any), or No if the testing has not been done for another reason.'

(tags: apps checklists release coding ios android mobile ohurley)
"A New Data Structure For Cumulative Frequency Tables"

paper by Peter M Fenwick, 1993. 'A new method (the ‘binary indexed tree’) is presented for maintaining the cumulative frequencies which are needed to support dynamic arithmetic data compression. It is based on a decomposition of the cumulative frequencies into portions which parallel the binary representation of the index of the table element (or symbol). The operations to traverse the data structure are based on the binary coding of the index. In comparison with previous methods, the binary indexed tree is faster, using more compact data and simpler code. The access time for all operations is either constant or proportional to the logarithm of the table size. In conjunction with the compact data structure, this makes the new method particularly suitable for large symbol alphabets.' via Jakob Buchgraber, who's implementing it right now in Netty ;)
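
For a feel of how the structure works, here is a minimal Python sketch of a binary indexed tree holding symbol frequencies. It is an illustration under my own naming (`FenwickTree`, `add`, `prefix_sum`), not the paper's code or the Netty implementation; the key trick is that `i & -i` isolates the lowest set bit of the index, which decides how many table entries each tree slot covers.

```python
class FenwickTree:
    """Binary indexed tree over `size` symbols; symbols are numbered from 0."""

    def __init__(self, size):
        self.size = size
        self.tree = [0] * (size + 1)  # slot 0 is unused; internal indices are 1-based

    def add(self, symbol, delta):
        """Increase the frequency of `symbol` by `delta` in O(log size) steps."""
        i = symbol + 1
        while i <= self.size:
            self.tree[i] += delta
            i += i & -i  # move to the next slot whose range covers this symbol

    def prefix_sum(self, count):
        """Return the cumulative frequency of the first `count` symbols."""
        total = 0
        i = count
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # strip the lowest set bit to jump to the preceding range
        return total


freqs = [3, 0, 2, 5, 1]
table = FenwickTree(len(freqs))
for symbol, f in enumerate(freqs):
    table.add(symbol, f)

cumulative_before_3 = table.prefix_sum(3)  # frequencies of symbols 0, 1 and 2
table.add(1, 4)                            # symbol 1 seen four more times
```

Both the update and the cumulative query walk at most one slot per bit of the index, which is where the logarithmic access time quoted above comes from.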

(tags: netty frequency-tables data-structures algorithms coding binary-tree indexing compression symbol-alphabets)

Links for 2014-05-02