

skyfirehose.com


    “Query the Bluesky Jetstream with DuckDB”: a lovely little hack from Tobias Müller (https://bsky.app/profile/tobilg.com). It’s a pre-built DuckDB database file containing tables that refer to Parquet files in an R2 bucket; those Parquet files are (presumably) refreshed regularly with new posts from the Bluesky Jetstream. Tobias says: “there’s a data gathering process that listens to the Jetstream and dumps the NDJSONs to the filesystem as hourly files. Then, DuckDB transform[s] the data to Parquet files, they get uploaded with rclone.” It’s a neat demo of how modern data-lake tech can be exposed for public use.

    (tags: s3 parquet duckdb sql jetstream bluesky firehose data-lakes r2)
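The first stage Tobias describes, bucketing the Jetstream's NDJSON stream into hourly files, can be sketched as below. This is not his code: it is a minimal illustration assuming each event is one JSON object per line carrying a `time_us` field (microseconds since the Unix epoch) and that "hourly" means UTC hours. The names `hour_key` and `dump_hourly` are hypothetical.

```python
import json
import os
from collections import defaultdict
from datetime import datetime, timezone


def hour_key(event: dict) -> str:
    """Return a UTC 'YYYY-MM-DDTHH' bucket name for a Jetstream-style event.

    Assumption: the event has a 'time_us' field in microseconds since the epoch.
    """
    ts = datetime.fromtimestamp(event["time_us"] / 1_000_000, tz=timezone.utc)
    return ts.strftime("%Y-%m-%dT%H")


def dump_hourly(lines, out_dir: str) -> dict:
    """Append each NDJSON line to <out_dir>/<hour>.ndjson.

    Returns a dict mapping each hour bucket to the number of lines written to it.
    Files are opened in append mode so repeated calls keep extending the same hour.
    """
    os.makedirs(out_dir, exist_ok=True)
    buckets = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines (e.g. keep-alives or trailing newlines)
            continue
        event = json.loads(line)
        buckets[hour_key(event)].append(line)
    for hour, rows in buckets.items():
        path = os.path.join(out_dir, f"{hour}.ndjson")
        with open(path, "a", encoding="utf-8") as f:
            f.write("\n".join(rows) + "\n")
    return {hour: len(rows) for hour, rows in buckets.items()}
```

Each finished hourly file could then be converted with something like DuckDB's `COPY (SELECT * FROM read_json_auto('<hour>.ndjson')) TO '<hour>.parquet' (FORMAT PARQUET)` before being pushed to R2 with rclone; how the real pipeline does that step isn't stated.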