Cost-optimized archival in S3 using s3tar

    “s3tar” is new to me, and it looks like a perfect tool for this common use case: aggregating and archiving existing data on S3, which usually means combining many small objects into larger ones to take advantage of the S3 Glacier storage classes (S3 Glacier Instant Retrieval, for example, has a minimum billable object size of 128 KB).
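    To make the pattern concrete, here is a minimal hand-rolled sketch of the aggregate-and-archive step using boto3 and Python's tarfile module. This is not how s3tar works internally (it can concatenate server-side), and the bucket names, prefix, and key below are hypothetical:

    ```python
    import io
    import tarfile

    import boto3

    s3 = boto3.client("s3")

    SRC_BUCKET = "my-log-bucket"        # hypothetical source bucket
    DST_BUCKET = "my-archive-bucket"    # hypothetical destination bucket
    PREFIX = "logs/2024/"               # hypothetical prefix holding many small objects

    # Build one tar in memory from every object under the prefix.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=SRC_BUCKET, Prefix=PREFIX):
            for obj in page.get("Contents", []):
                body = s3.get_object(Bucket=SRC_BUCKET, Key=obj["Key"])["Body"].read()
                info = tarfile.TarInfo(name=obj["Key"])
                info.size = len(body)
                tar.addfile(info, io.BytesIO(body))

    # Upload the single large aggregate straight to an archival storage class.
    buf.seek(0)
    s3.put_object(
        Bucket=DST_BUCKET,
        Key="archives/logs-2024.tar",
        Body=buf.getvalue(),
        StorageClass="GLACIER_IR",      # S3 Glacier Instant Retrieval
    )
    ```

    In practice a single PUT tops out at 5 GB, so a real implementation would need multipart uploads for big archives; avoiding exactly this client-side shuffling is the point of s3tar.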

    s3tar optimizes for cost and performance on the steps involved in downloading the objects, aggregating them into a tar, and putting the final tar in a specified Amazon S3 storage class using a configurable “--concat-in-memory” flag. … The tool also offers the flexibility to upload directly to a user’s preferred storage class or store the tar object in S3 Standard storage and seamlessly transition it to specific archival classes using S3 Lifecycle policies.
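    The second option — store the tar in S3 Standard and let a lifecycle rule move it — looks roughly like this with boto3 (the bucket, prefix, and timing are illustrative, not taken from the s3tar docs):

    ```python
    import boto3

    s3 = boto3.client("s3")

    # Hypothetical rule: anything uploaded under archives/ in S3 Standard is
    # transitioned to S3 Glacier Deep Archive one day after creation.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-archive-bucket",      # hypothetical bucket
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "tars-to-deep-archive",
                    "Status": "Enabled",
                    "Filter": {"Prefix": "archives/"},
                    "Transitions": [
                        {"Days": 1, "StorageClass": "DEEP_ARCHIVE"},
                    ],
                }
            ]
        },
    )
    ```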

    The only downside of s3tar is that it doesn’t support recompression, which is also a common enough requirement — especially after aggregation of multiple small input files into a larger, more compressible archive. But hey, can’t have everything.
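    Until that exists, the obvious (if less elegant) workaround is to compress client-side while aggregating, e.g. by opening the tar in gzip mode. A tiny variant of the earlier sketch, again with placeholder names and data:

    ```python
    import io
    import tarfile

    import boto3

    s3 = boto3.client("s3")

    # Same aggregation idea, but writing a gzip-compressed tar ("w:gz"),
    # since s3tar itself leaves the aggregated data uncompressed.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = b"2024-01-01T00:00:00Z example log line\n" * 1000   # placeholder body
        info = tarfile.TarInfo(name="logs/2024/part-0000.log")     # hypothetical key
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

    buf.seek(0)
    s3.put_object(
        Bucket="my-archive-bucket",      # hypothetical bucket
        Key="archives/logs-2024.tar.gz",
        Body=buf.getvalue(),
        StorageClass="DEEP_ARCHIVE",
    )
    ```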

    s3tar: https://github.com/awslabs/amazon-s3-tar-tool

    Tags: s3tar amazon s3 compression storage archival architecture aggregation logs glacier via:lwia