kafkacat and visidata

Published March 4, 2025

Two excellent tools in one blog post.

Visidata "is a commandline tool to work with data in all sorts of formats, including from stdin"; in this example it's taking lines of JSONL and producing an instant histogram of values from the stream:

Once visidata is open, use the arrow keys to move to the column you want to build a histogram on and press Shift-F. Since it works with pipes, if you leave the -e off the kafkacat arguments you get a live stream of messages from the Kafka topic, and visidata will continue to update as messages arrive (although I think you need to replot the histogram if you want it to refresh).
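For anyone without visidata to hand, the frequency table behind that histogram is roughly a count of the distinct values in one column. Here is a minimal Python sketch of that idea over JSONL lines; it is not visidata's implementation, and `field_histogram`, `render`, and the sample field names are made up for illustration:

```python
import json
from collections import Counter
from typing import Iterable


def field_histogram(lines: Iterable[str], field: str) -> Counter:
    """Count how often each value of `field` appears in a stream of JSONL lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between messages
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if isinstance(record, dict) and field in record:
            counts[str(record[field])] += 1
    return counts


def render(counts: Counter, width: int = 40) -> str:
    """Render the counts as a text bar chart, most frequent value first."""
    if not counts:
        return ""
    top = max(counts.values())
    rows = []
    for value, n in counts.most_common():
        bar = "#" * max(1, round(width * n / top))
        rows.append(f"{value:<12} {n:>6} {bar}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical messages; in practice the lines would come from sys.stdin.
    sample = [
        '{"status": "ok", "ms": 12}',
        '{"status": "error", "ms": 40}',
        '{"status": "ok", "ms": 9}',
    ]
    print(render(field_histogram(sample, "status")))
```

Passing `sys.stdin` instead of `sample` would let the same function sit at the end of a pipe, though unlike visidata it only reports once the stream ends.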

On top of that, there's kcat (the tool formerly named kafkacat), "netcat for Kafka", "a swiss-army knife of tools for inspecting and creating data in Kafka", even supporting on-the-fly decode of Avro messages. https://github.com/edenhill/kcat

Tags: kcat kafka streams visidata tools cli avro debugging

Answers for AWS Survey for 2025

Published March 3, 2025

The most-used AWS services; mainly SNS, SQS, and everyone hates Jenkins

Tags: aws sqs sns architecture cloud-computing surveys

Ruff

Published March 3, 2025

An extremely fast Python linter and code formatter, written in Rust.

Ruff aims to be orders of magnitude faster than alternative tools while integrating more functionality behind a single, common interface.

Ruff can be used to replace Flake8 (plus dozens of plugins), Black, isort, pydocstyle, pyupgrade, autoflake, and more, all while executing tens or hundreds of times faster than any individual tool.

Tags: formatting coding python tools lint code

Inside an Amazon CoE

Published March 3, 2025

This is a decent write-up of what Amazon's "Correction of Error" documents look like. CoEs are the standard format for writing up post-mortems of significant outages or customer-impacting incidents in Amazon and AWS; I've had the unpleasant duty of writing a couple myself -- thankfully for nothing too major.

This is fairly similar to what's being used elsewhere, but it's good to have an authoritative bookmark to refer to. (via LWIA)

Tags: via:lwia aws amazon post-mortems coe incidents ops process

18F’s shutdown page

Published March 3, 2025

"We are dedicated to the American public and we're not done yet". legends!

For over 11 years, 18F has been proudly serving you to make government technology work better. We are non-partisan civil servants. 18F has worked on hundreds of projects, all designed to make government technology not just efficient but effective, and to save money for American taxpayers.

However, all employees at 18F – a group that the Trump Administration GSA Technology Transformation Services Director called "the gold standard" of civic tech – were terminated today at midnight ET.

Tags: policy government programming tech software politics 18f maga doge

DeepSeek’s smallpond

Published March 3, 2025

Some interesting notes about smallpond, a new high-performance DuckDB-based distributed data lake query system from DeepSeek:

DeepSeek is introducing smallpond, a lightweight open-source framework, leveraging DuckDB to process terabyte-scale datasets in a distributed manner. Their benchmark states: “Sorted 110.5TiB of data in 30 minutes and 14 seconds, achieving an average throughput of 3.66TiB/min.”

The benchmark on 100TB mentioned is actually using the custom DeepSeek 3FS framework: Fire-Flyer File System is a high-performance distributed file system designed to address the challenges of AI training and inference workloads. [...] compared to AWS S3, 3FS is built for speed, not just storage. While S3 is a reliable and scalable object store, it comes with higher latency and eventual consistency [...] 3FS, on the other hand, is a high-performance distributed file system that leverages SSDs and RDMA networks to deliver low-latency, high-throughput storage. It supports random access to training data, efficient checkpointing, and strong consistency.

So -- this is very impressive. However!
1. RDMA (remote direct memory access) networking for a large-scale storage system! That is absolutely bananas. I wonder how much that benchmark cluster cost to run... still, this is a very interesting technology for massive-scale super-low-latency storage. https://www.definite.app/blog/smallpond also notes "3FS achieves a remarkable read throughput of 6.6 TiB/s on a 180-node cluster, which is significantly higher than many traditional distributed file systems."
2. It seems smallpond operates strictly with partition-level parallelism, so if your data isn't partitioned in exactly the right way, you may still find your query bottlenecked:
Smallpond’s distribution leverages Ray Core at the Python level, using partitions for scalability. Partitioning can be done manually, and Smallpond supports:
- Hash partitioning (based on column values);
- Even partitioning (by files or row counts);
- Random shuffle partitioning
As I understand it, Trino has a better idea of how to scale out queries across worker nodes even without careful pre-partitioning, which is handy.
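To make the bottleneck concrete, here is a rough Python sketch of two of the partitioning strategies listed above. It does not use smallpond's or Ray's API; `hash_partition` and `even_partition` are hypothetical helpers over plain lists of dicts, and CRC32 is just a convenient stable hash chosen for the example:

```python
import zlib
from typing import Any, Dict, List

Row = Dict[str, Any]


def hash_partition(rows: List[Row], column: str, n: int) -> List[List[Row]]:
    """Send each row to partition crc32(value) % n, so equal keys always land together."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        key = str(row[column]).encode("utf-8")
        parts[zlib.crc32(key) % n].append(row)
    return parts


def even_partition(rows: List[Row], n: int) -> List[List[Row]]:
    """Split rows into n contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(rows), n)
    parts: List[List[Row]] = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        parts.append(rows[start:start + size])
        start += size
    return parts


if __name__ == "__main__":
    # A skewed dataset: every row shares one key, so hashing cannot spread it out.
    rows = [{"user": "a", "i": i} for i in range(10)]
    print([len(p) for p in hash_partition(rows, "user", 4)])
    print([len(p) for p in even_partition(rows, 4)])
```

With a skewed key, hash partitioning puts every row on one worker, while even partitioning balances the load but no longer keeps equal keys together; that trade-off is why the choice of partitioning matters when parallelism is strictly per-partition.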

Tags: data-lakes deepseek duckdb rdma networking 3fs smallpond trino ray

Buying a good laptop. Not a new laptop, a good one.

Published February 27, 2025

Love this. Advice on how to pick a really solid, basic, but good second-hand laptop -- tl;dr: "Buy a used business laptop. Apple or PC. Try typing on it first."

Tags: laptops shopping secondhand hardware tips

Using dtrace on MacOS with SIP enabled

Published February 27, 2025

"On all current MacOS versions (Catalina 10.15.x, Big Sur 11.x) System Integrity Protection (SIP) is enabled by default and prevents most uses of dtrace and other tools and scripts based on it (i.e. dtruss)."

Wow, this is really complicated. Nice work, Apple. (via Tony Finch)

Tags: macos mac debugging osx via:fanf dtrace tracing sip

The Anti-Capitalist Software License

Published February 27, 2025

Here it is in full:

ANTI-CAPITALIST SOFTWARE LICENSE (v 1.4)

This is anti-capitalist software, released for free use by individuals and organizations that do not operate by capitalist principles.

Permission is hereby granted, free of charge, to any person or organization (the "User") obtaining a copy of this software and associated documentation files (the "Software"), to use, copy, modify, merge, distribute, and/or sell copies of the Software, subject to the following conditions:

1. The above copyright notice and this permission notice shall be included in all copies or modified versions of the Software.
2. The User is one of the following:
   a. An individual person, laboring for themselves
   b. A non-profit organization
   c. An educational institution
   d. An organization that seeks shared profit for all of its members, and allows non-members to set the cost of their labor
3. If the User is an organization with owners, then all owners are workers and all workers are owners with equal equity and/or equal vote.
4. If the User is an organization, then the User is not law enforcement or military, or working for or under either.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT EXPRESS OR IMPLIED WARRANTY OF ANY KIND, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

This is fun because it would make esr's head explode.

Mark Butcher on AWS sustainability claims

Published February 27, 2025

Sustainable IT expert lays into AWS:

3 years after shouting about Amazons total lack of transparency with sustainability reporting, here's a list of what I think they've achieved:

1) They let you export a CSV for 3 lines of numbers showing your last months made up numbers that are up to 99% too low

2) Urmmm.... that's about it

[....] I know of several very large enterprise orgs starting to proactively marginalise them (i.e. not move away 100%, but massively reducing consumption). The one's I know about will cost them around $1 billion of spend. Is that enough to make them pay attention?

This article from Canalys in the Register says Amazon doesn't provide AWS-specific, location-based data, meaning: "We don't really know how big AWS's footprint truly is, which I think is a bit worrying."

They follow up with "Amazon has chosen not break out data on environmental stats such as greenhouse gas emissions for AWS from the rest of the company in its sustainability reports, making it almost impossible to determine whether these emissions are growing as they have been for its cloud rivals."

Interesting isn't it... if they were actually as sustainable as they pretend, you'd expect them to share open and honest numbers, instead what we get are marketing puff pieces making what seem like invented PUE claims backed by zero evidence.

Elsewhere he notes "AWS customers are still unable to natively measure actual power consumption, report on actual carbon emissions, report on water usage. This'll make life interesting for all those AI companies subject to legislation like the EU AI Act or needing to report to the EED and similar."

(Via ClimateAction.tech)

Tags: climate-change aws sustainability pue reporting amazon cloud datacenters emissions

Europe begins to worry about US-controlled clouds

Published February 26, 2025

Interview with Bert Hubert about this major supply chain issue for EU governments:

The Register: In the US, the argument against China supplying network hardware [was] based on the concern that the Chinese government can just order China-based vendors to insert a backdoor. It sounds like you're saying that, essentially, an analogous situation exists in the US now.

Hubert: Yeah, exactly. And that has been the case for a while. I mean, this is not an entirely new realization. The thing that is making it so interesting right now is that we are on the brink of [going all-in on Microsoft's cloud].

The Dutch government is sort of just typical, so I mention it because I am Dutch, but they're very representative of European governments right now. And they were heading to a situation where there was no email except Microsoft, which means that if one ministry wants to email the other ministry, they have to pass it by US servers.

Which leads to the odd situation that if the Dutch Ministry of Finance wants to send a secret to the Dutch National Bank, they'd have to send someone over with a typewriter to make it happen because [the communications channel has been outsourced].

There's nothing left that we do not share with the US.

Tags: supply-chains clouds eu us politics geopolitics backdoors infosec security europe

subtrace

Published February 26, 2025

Subtrace is "Wireshark for your Docker containers. It lets developers see all incoming and outgoing requests in their backend server so that they can resolve production issues faster."
- Works out-of-the-box
- No code changes needed
- Supports all languages (Python + Node + Go + everything else)
- See full payload, headers, status code, and latency
- Less than 100µs performance overhead
- Built on Clickhouse
- Open source
Looks like it outputs to the Chrome Dev Console's Network tab, or a facsimile of it; "Open the subt.link URL in your browser to watch a live stream of your backend server’s network logs".

It may be interesting to try this out. (via LWIA)

Tags: subtrace tracing wireshark debugging docker containers ops clickhouse open-source tools tcpdump

Netflix/hollow

Published February 25, 2025

Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-only access.

Hollow focuses narrowly on its prescribed problem set: keeping an entire, read-only dataset in-memory on consumers. It circumvents the consequences of updating and evicting data from a partial cache.

Due to its performance characteristics, Hollow shifts the scale in terms of appropriate dataset sizes for an in-memory solution. Datasets for which such liberation may never previously have been considered can be candidates for Hollow. For example, Hollow may be entirely appropriate for datasets which, if represented with json or XML, might require in excess of 100GB.

Interesting approach, though possibly a bit scary in terms of circumventing the "keep things simple and boring" rule... still, a useful tool to have.

Tags: cache caching netflix java jvm memory hollow read-only architecture systems

Yahoo Mail hallucinates subject lines

Published February 24, 2025

OMG, this is hilarious. What a disaster from Yahoo Mail:

A quick Google search revealed that a few months ago Yahoo jumped on the AI craze with the launch of “AI-generated, one-line email summaries”. At this point, the penny dropped. Just like Apple AI generating fake news summaries, Yahoo AI was hallucinating the fake winner messages, presumably as a result of training their model on our old emails. Worse, they were putting an untrustworthy AI summary in the exact place that users expect to see an email subject, with no mention of it being AI-generated.

Tags: ai llms hallucinations yahoo email gen-ai

write hedging in Amazon DynamoDB

Published February 24, 2025

"Write hedging" is a nice technique to address p99 tail latencies, by increasing the volume of writes (or in the case of read hedging, reads):

Imagine you want a very low p99 read latency. One way to lower tail latencies is to hedge requests. You make a read request and then, if the response doesn’t come back quickly enough, make a second equivalent hedging request and let the two race. First response wins. If the first request suffered a dropped network packet, the second request will probably win. If things are just temporarily slow somewhere, the first request will probably win. Either way, hedging helps improve the p99 metrics, at the cost of some extra read requests.

Write hedging has a little more complexity involved, since you want to avoid accidental overwrites during races; this blog post goes into some detail on a technique to do this in DynamoDB, using timestamps. Good stuff.
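As a rough illustration of both halves, here is a Python sketch under stated assumptions. `hedged` races a request against a delayed duplicate using asyncio, and `Table.put_if_newer` is a hypothetical in-memory stand-in for a timestamp-conditioned write; neither is the code from the blog post, and no real DynamoDB API is called:

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple, TypeVar

T = TypeVar("T")


async def hedged(make_request: Callable[[], Awaitable[T]], hedge_after: float) -> T:
    """Start one request; if it has not finished after `hedge_after` seconds,
    start a second identical one and return whichever finishes first."""
    first = asyncio.ensure_future(make_request())
    done, _ = await asyncio.wait({first}, timeout=hedge_after)
    if done:
        return first.result()
    second = asyncio.ensure_future(make_request())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # the loser is no longer needed
    return done.pop().result()


class Table:
    """In-memory stand-in for a table that supports conditional writes."""

    def __init__(self) -> None:
        self.items: Dict[str, Tuple[int, str]] = {}

    def put_if_newer(self, key: str, ts: int, value: str) -> bool:
        """Apply the write only if `ts` is newer than the stored timestamp."""
        current = self.items.get(key)
        if current is not None and current[0] >= ts:
            return False  # already applied, or superseded by a later write
        self.items[key] = (ts, value)
        return True
```

The idea in the write case is that the original request and its hedge carry the same client-side timestamp, so whichever arrives second is a no-op, and a slow hedge that arrives after a genuinely newer write cannot clobber it. A production version would also need to decide how to handle a request that fails rather than stalls, which this sketch simply propagates as an exception.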

(via Last Week In AWS)

Tags: via:lwia aws dynamodb write-hedging read-hedging p99 latencies tail-latencies optimization performance algorithms

tsdproxy

Published February 24, 2025

I'm pretty happy with my current setup for the home network, but this is one I'll keep in the back pocket for future possible use:

[Tailscale Docker Proxy] simplifies the process of securely exposing services and Docker containers to your Tailscale network by automatically creating Tailscale machines for each tagged container. This allows services to be accessible via unique, secure URLs without the need for complex configurations or additional Tailscale containers.

Tags: docker tailscale containers home networking

3DBenchy Enters the Public Domain

Published February 14, 2025

"3DBenchy, a 3D model [of an adorable little boat] designed specifically for testing and benchmarking 3D printers, is now in the public domain."

Originally released on April 9, 2015, by Creative Tools, the model has become a beloved icon of the 3D printing community. [...] NTI has decided to release 3DBenchy to the world by making it public domain, marking its 10th anniversary with this significant gesture.

Mark your calendars for April 9, 2025, as 3DBenchy celebrates its 10th anniversary! A special surprise is planned for the 3DBenchy community to commemorate this milestone.

(Via Alan Butler)

Tags: 3dbenchy 3d-printing via:alan-butler ip public-domain creative-commons

A Visual Guide to Vèvè

Published February 14, 2025

These are very cool -- drawings of the vèvè, the symbology used in Haitian Vodou.

"Vodou, a [Haitian] spiritual and cultural practice that has long intrigued people from around the world, is a fascinating blend of African, Native American, and European beliefs and traditions. It’s a rich tapestry woven from the beliefs and experiences of enslaved peoples brought to the Caribbean and the Americas, and it has been shaped and evolved over centuries to become what it is today."

(via Minor Mobius, https://bsky.app/profile/minormobius.bsky.social/post/3lhzvuovycs2a )

Tags: via:minormobius sigils veve haiti caribbean religion art graphics signs symbology

Monzo Stand-in

Published February 14, 2025

This is great -- Monzo built "Monzo Stand-in", a full "backup" of the main stack, since uptime is critical to them:

We take reliability seriously at Monzo so we built a completely separate backup banking infrastructure called Monzo Stand-in to add another layer of defence so customers can continue to use important services provided by us. We consider Monzo Stand-in to be a backup of last resort, not our primary mechanism of providing a reliable service to our customers, by providing us with an extra line of defence.

Monzo Stand-in is an independent set of systems that run on Google Cloud Platform (GCP) and is able to take over from our Primary Platform, which runs in Amazon Web Services (AWS), in the event of a major incident. It supports the most important features of Monzo like spending on cards, withdrawing cash, sending and receiving bank transfers, checking account balances and transactions, and freezing or unfreezing cards.

Flashback to the old Pimms setup in AWS Network Monitoring; we had an entire duplicate stack in AWS -- every single piece duplicated and running independently.

Tags: architecture uptime monzo banking reliability ops via:itc

Random Numbers at 200 Gbit/s

Published February 13, 2025

Very cool trick from Tony Finch: using the PCG random number generator, the AVX or NEON vector instructions on modern CPUs can advance multiple RNG states at once, in parallel.
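The "several states in lockstep" idea can be shown without any SIMD at all. This is a scalar Python sketch, assuming the PCG32 (XSH-RR) variant and seeding procedure from the PCG reference code, with one independent stream per lane; it is not Tony Finch's implementation, and `Pcg32Lanes` is a name invented for the example:

```python
from typing import List

MASK64 = (1 << 64) - 1
MULT = 6364136223846793005  # LCG multiplier used by the PCG32 reference code


def rotr32(x: int, r: int) -> int:
    """Rotate a 32-bit value right by r bits."""
    return ((x >> r) | (x << ((-r) & 31))) & 0xFFFFFFFF


class Pcg32Lanes:
    """Several independent PCG32 (XSH-RR) generators advanced together."""

    def __init__(self, seed: int, lanes: int = 4) -> None:
        # Each lane gets its own odd increment, which selects a distinct stream.
        self.inc = [(2 * i + 1) & MASK64 for i in range(lanes)]
        self.state = [0] * lanes
        self.next()
        self.state = [(s + seed) & MASK64 for s in self.state]
        self.next()

    def next(self) -> List[int]:
        """Step every lane once and return one 32-bit output per lane."""
        out = []
        for i, old in enumerate(self.state):
            # Advance the 64-bit LCG state for this lane.
            self.state[i] = (old * MULT + self.inc[i]) & MASK64
            # Output permutation: xorshift the high bits, then rotate.
            xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF
            out.append(rotr32(xorshifted, old >> 59))
        return out


if __name__ == "__main__":
    rng = Pcg32Lanes(seed=42, lanes=4)
    print([hex(v) for v in rng.next()])
```

Every lane runs exactly the same multiply, add, shift, and rotate with no branches, which is the property that lets a vector unit do the loop body for all lanes in a handful of instructions.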

Tags: rngs avx neon vector-instructions cpu parallelism pcg random randomness hacks

Language Models Do Addition Using Helices

Published February 13, 2025

wtf:

Mathematical reasoning is an increasingly important indicator of large language model (LLM) capabilities, yet we lack understanding of how LLMs process even simple mathematical tasks. To address this, we reverse engineer how three mid-sized LLMs compute addition. We first discover that numbers are represented in these LLMs as a generalized helix, which is strongly causally implicated for the tasks of addition and subtraction, and is also causally relevant for integer division, multiplication, and modular arithmetic. We then propose that LLMs compute addition by manipulating this generalized helix using the "Clock" algorithm: to solve a+b, the helices for a and b are manipulated to produce the a+b answer helix which is then read out to model logits. We model influential MLP outputs, attention head outputs, and even individual neuron preactivations with these helices and verify our understanding with causal interventions. By demonstrating that LLMs represent numbers on a helix and manipulate this helix to perform addition, we present the first representation-level explanation of an LLM's mathematical capability.
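The geometric core of the "Clock" idea, that adding numbers corresponds to adding angles, can be shown in a few lines. This toy Python sketch makes no claim about what happens inside a real model: the periods in `PERIODS` are an assumption for illustration, the linear component of the paper's generalized helix is left out, and `helix`, `clock_add`, and `read_out` are invented names:

```python
import cmath
import math
from typing import List

PERIODS = [2, 5, 10, 100]  # assumed periods, chosen for illustration


def helix(a: int) -> List[complex]:
    """Represent integer a as one point on the unit circle per period T."""
    return [cmath.exp(2j * math.pi * a / T) for T in PERIODS]


def clock_add(ha: List[complex], hb: List[complex]) -> List[complex]:
    """Multiplying unit complex numbers adds their angles, giving helix(a + b)."""
    return [x * y for x, y in zip(ha, hb)]


def read_out(h: List[complex], candidates: range = range(100)) -> int:
    """Pick the candidate whose helix best matches h (largest real inner product)."""

    def score(n: int) -> float:
        return sum((x * y.conjugate()).real for x, y in zip(h, helix(n)))

    return max(candidates, key=score)


if __name__ == "__main__":
    print(read_out(clock_add(helix(27), helix(48))))
```

Because this toy version has only periodic components, with 100 as the longest period, it can only recover sums modulo 100.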

Tags: llms helices trigonometry magic weird ai papers arithmetic addition subtraction

Critical Ignoring as a Core Competence for Digital Citizens

Published February 13, 2025

"Critical ignoring" as a strategy to control and immunize one's information environment (Kozyreva et al., 2023):
Low-quality and misleading information online can hijack people’s attention, often by evoking curiosity, outrage, or anger. Resisting certain types of information and actors online requires people to adopt new mental habits that help them avoid being tempted by attention-grabbing and potentially harmful content.

We argue that digital information literacy must include the competence of critical ignoring—choosing what to ignore and where to invest one’s limited attentional capacities. We review three types of cognitive strategies for implementing critical ignoring:
- self-nudging, in which one ignores temptations by removing them from one’s digital environments;
- lateral reading, in which one vets information by leaving the source and verifying its credibility elsewhere online;
- and the do-not-feed-the-trolls heuristic, which advises one to not reward malicious actors with attention.
We argue that these strategies implementing critical ignoring should be part of school curricula on digital information literacy.
Good to give names to these practices, since we're all having to do them nowadays anyway...

(Via Stan Carey)

Tags: psychology trolls media kids internet literacy attention critical-ignoring ignoring papers via:stancarey

CarbonRunner

Published February 13, 2025

"Carbon-aware infrastructure to optimize your CI/CD workflows" -- "A multi-cloud CI/CD Github Actions Runner that shifts your workflows to the lowest CO2 regions. 90% Greener. 25% Cheaper. 1 line of code. Zero Effort."

(Via Dryden Williams)

Tags: green sustainability carbon github ci cd workflows development via:climateactiontech

LinuxPDF

Published February 13, 2025

It's Linux, running inside a PDF file.

"The humble PDF file format supports JavaScript – with a limited standard library, mind you. By leveraging this, [vk6] managed to compile a RISC-V emulator (TinyEMU) into JavaScript using an old version of Emscripten targeting asm.js instead of WebAssembly. The emulator, embedded within the PDF, interfaces with virtual input through a keyboard and text box."

(via Fuzzix)

Tags: via:fuzzix linux pdf hacks emulation javascript emscripten tinyemu

Undergraduate Upends a 40-Year-Old Data Science Conjecture

Published February 11, 2025

This is a great story; bonus that it's a notable improvement for the humble hash-table data structure:

Krapivin was not held back by the conventional wisdom for the simple reason that he was unaware of it. “I did this without knowing about Yao’s conjecture,” he said. His explorations with tiny pointers led to a new kind of hash table — one that did not rely on uniform probing. And for this new hash table, the time required for worst-case queries and insertions is proportional to (log x)^2 — far faster than x. This result directly contradicted Yao’s conjecture. Farach-Colton and Kuszmaul helped Krapivin show that (log x)^2 is the optimal, unbeatable bound for the popular class of hash tables Yao had written about.

Paper here -- https://arxiv.org/abs/2501.02305 .

Tags: data-structures hash-tables cs programming coding papers optimization open-addressing

PleIAs/common_corpus

Published February 11, 2025

This is great to see:

Common Corpus is the largest open and permissible licensed text dataset, comprising 2 trillion tokens (1,998,647,168,282 tokens). It is a diverse dataset, consisting of books, newspapers, scientific articles, government and legal documents, code, and more. Common Corpus has been created by Pleias in association with several partners and contributed in-kind to Current AI initiative.

The dataset in its entirety meets the requirements of the Code of Conduct of the AI Act and goes further than the current requirements for data transparency. It aims to set a new standard of openness in AI, showing that detailed provenance at a granular document level is a realistic objective, even at the scale of 2 trillion tokens.

Tags: ai llms open-data open-source pleias common-corpus corpora training ai-act

Government agency removes spoon emoji from work platform amid protests

Published February 10, 2025

lol. "On Wednesday, employees at the Technology Transformation Services division of the [U.S. government’s General Services Administration] reportedly unleashed a torrent of spoon emojis in the chat that accompanied an organization-wide, 600-person video conference with new leader Thomas Shedd, a former Tesla engineer. [...] Workers embraced the digital cutlery to protest the Trump administration’s “Fork in the Road” resignation offer."

Tags: forks spoons funny protest us-politics emojis

Are better models better?

Published February 10, 2025

This is a very interesting piece on the applicability and usefulness of generative AI models, given their inherent error rate and probabilistic operation:

Asking if an LLM can do very specific and precise information retrieval might be like asking if an Apple II can match the uptime of a mainframe, or asking if you can build Photoshop inside Netscape. No, they can’t really do that, but that’s not the point and doesn’t mean they’re useless. They do something else, and that ‘something else’ matters more and pulls in all of the investment, innovation and company creation. Maybe, 20 years later, they can do the old thing too - maybe you can run a bank on PCs and build graphics software in a browser, eventually - but that’s not what matters at the beginning. They unlock something else.

What is that ‘something else’ for generative AI, though? How do you think conceptually about places where that error rate is a feature, not a bug?

(Via James Tindall)

Tags: errors probabilistic computing ai genai llms via:james-tindall

Woof.group vs the OSA

Published February 10, 2025

The UK's new Online Safety Act is extremely vague, extremely punitive, and has Fediverse operators Woof.group very worried:

Ofcom carefully avoided answering almost all of our questions. They declined to say whether ~185 users was a “significant number”. Several other participants in Ofcom's livestreams also asked what a significant number meant. Every time, Ofcom responded obliquely: there are no numeric thresholds, a significant number could be “small”, Ofcom could target “a one-man band”, and providers are expected to have a robust justification for deciding they do not have a significant number of UK users. It is unclear how anyone could make a robust justification given this nebulous guidance. In their letter, Ofcom also declined to say whether non-commercial services have target markets, or whether pornography poses a “material risk of significant harm”. In short, we have no answer as to whether Woof.group or other Fediverse instances are likely to fall in scope of the OSA.

Do we block pre-emptively, or if and when Ofcom asks? This is the ethical question Woof.group's team, like other community forums, have been wrestling with. Ofcom would certainly like sites to take action immediately. As Hoskings warned:

"Don't wait until it's too late. That's the message. Once you do get the breach letter, that is when it is too late. The time doesn't start ticking from then. The time is ticking from—for part five services, from January, part three from July."

Tags: woof.group fediverse mastodon social-media uk osa laws ofcom porn blocking

Building Materials Price Tracker

Published February 7, 2025

Graphs tracking the cost of building materials in Ireland; turns out these are a prime driver of construction costs here, so this is good info to have when planning construction work...

Tags: building-materials construction costs ireland prices building

Apple Ordered by UK to Create Global iCloud Encryption Backdoor

Published February 7, 2025

The British government has secretly demanded that Apple give it blanket access to all encrypted user content uploaded to the cloud, reports The Washington Post.

The spying order came by way of a "technical capability notice," a document sent to Apple by the Home Secretary, ordering it to provide access under the sweeping UK Investigatory Powers Act (IPA) of 2016. Critics have labeled the legislation the "Snooper's Charter," as it authorizes law enforcement to compel assistance from companies when needed to collect evidence.

Apple is likely to stop offering encrypted storage in the UK, rather than break the security promises it made to its users, people familiar with the matter told the publication. However, that would not affect the UK order for backdoor access to the service in other countries, including the United States. Apple has previously said it would consider pulling services such as FaceTime and iMessage from the UK rather than compromise future security.

(via gwire)

Tags: via:gwire apple encryption backups cloud ipa surveillance icloud backdoors security infosec

Within Bounds: Limiting AI’s environmental impact

Published February 6, 2025

A joint statement issued by the Green Screen Coalition, the Green Web Foundation, Beyond Fossil Fuels, Aspiration, and the critical infrastructure lab, regarding AI's impact on climate change:

To meet the challenge of climate change, environmental degradation, pollution and biodiversity loss, and its attendant injustices, we urge policymakers, industry leaders and all stakeholders to acknowledge the true environmental costs of AI, to phase out fossil fuels throughout the technology supply chain, to reject false solutions, and to dedicate all necessary means to bring AI systems in line with planetary boundaries. Meeting these demands is an essential step to ensure that AI is not driving further planetary degradation and could instead support a sustainable and equitable transition.

Their demands are:
- I. PHASE OUT FOSSIL FUELS
- II. COMPUTING WITHIN LIMITS
- III. RESPONSIBLE SUPPLY CHAINS
- IV. EQUITABLE PARTICIPATION
- V. TRANSPARENCY

Tags: via:climateaction climate climate-change ai fossil-fuels sustainability

ChaosSearch

Published February 5, 2025

"Live Search / ELK on the Lake":

Same ELK tools, but the scalability, cost effectiveness & durability of the lake, powered by ChaosSearch.

Recommended for log search by Corey Quinn; pricing looks reasonable too.

Tags: search elk kibana chaossearch logs data-lake ops via:cquinn

cur.vantage.sh

Published February 5, 2025

via Ben Schaechter: "a new microsite we’ve launched for the AWS community that helps with understanding billing codes present in either Cost Explorer or the CUR. We profiled the number of distinct billing codes across our customer base and have about ~60k unique billing codes. We hear all the time that FinOps practitioners and engineers are confused about the billing codes present in Cost Explorer or the Cost and Usage Report. Think of these as being things like “Requests-Tier1” for S3 or “CW:GMWI-Metrics” for CloudWatch. There is usually really limited resources for determining what these billing codes are even when you Google around for them."

Tags: aws billing codes cost-explorer ec2 s3 finops

Words from an ex-Zizian-adjacent person

Published February 4, 2025

It seems there's now a full-on Mansonesque death cult emerging from the LessWrong/rationalist/effective-altruism community: https://www.sfgate.com/bayarea/article/bay-area-death-cult-zizian-murders-20064333.php

This HN comment was very interesting for background:

[Former member of that world, roommates with one of Ziz's friends for a while, so I feel reasonably qualified to speak on this.] The problem with rationalists/EA as a group has never been the rationality, but the people practicing it and the cultural norms they endorse as a community.

As relevant here:

1) While following logical threads to their conclusions is a useful exercise, each logical step often involves some degree of rounding or unknown-unknowns. A -> B and B -> C means A -> C in a formal sense, but A -almostcertainly-> B and B -almostcertainly-> C does not mean A -almostcertainly-> C. Rationalists, by tending to overly formalist approaches, tend to lose the thread of the messiness of the real world and follow these lossy implications as though they are lossless. That leads to...

2) Precision errors in utility calculations that are numerically-unstable. Any small chance of harm times infinity equals infinity. This framing shows up a lot in the context of AI risk, but it works in other settings too: infinity times a speck of dust in your eye >>> 1 times murder, so murder is "justified" to prevent a speck of dust in the eye of eternity. When the thing you're trying to create is infinitely good or the thing you're trying to prevent is infinitely bad, anything is justified to bring it about/prevent it respectively.

3) Its leadership - or some of it, anyway - is extremely egotistical and borderline cult-like to begin with. I think even people who like e.g. Eliezer [Yudkowsky] would agree that he is not a humble man by any stretch of the imagination (the guy makes Neil deGrasse Tyson look like a monk). They have, in the past, responded to criticism with statements to the effect of "anyone who would criticize us for any reason is a bad person who is lying to cause us harm". That kind of framing can't help but get culty.

4) The nature of being a "freethinker" is that you're at the mercy of your own neural circuitry. If there is a feedback loop in your brain, you'll get stuck in it, because there's no external "drag" or forcing functions to pull you back to reality. That can lead you to be a genius who sees what others cannot. It can also lead you into schizophrenia really easily. So you've got a culty environment that is particularly susceptible to internally-consistent madness, and finally:

5) It's a bunch of very weird people who have nowhere else they feel at home. I totally get this. I'd never felt like I was in a room with people so like me, and ripping myself away from that world was not easy. (There's some folks down the thread wondering why trans people are overrepresented in this particular group: well, take your standard weird nerd, and then make two-thirds of the world hate your guts more than anything else, you might be pretty vulnerable to whoever will give you the time of day, too.)

TLDR: isolation, very strong in-group defenses, logical "doctrine" that is formally valid and leaks in hard-to-notice ways, apocalyptic utility-scale, and being a very appealing environment for the kind of person who goes super nuts -> pretty much perfect conditions for a cult. Or multiple cults, really. Ziz's group is only one of several.

Tags: zizians cults extropianism tescreal effective-altruism rationalism lesswrong death-cults

Burrows–Wheeler Transform

Published February 4, 2025

Burrows–Wheeler Transform

an algorithm used to prepare data for use with data compression techniques such as bzip2. It permutes the order of characters in a string (S), sorting all the circular shifts of the text in lexicographic order, then extracting the last column and the index of the original string in the set of sorted permutations of S.
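To make that description concrete, here is a minimal sketch of the naive version in Python: sort every circular shift, keep the last column plus the index of the original string, and invert by repeatedly prepending the last column and re-sorting. The function names are my own, and real compressors such as bzip2 use far more efficient constructions than this quadratic approach.

```python
def bwt(s: str) -> tuple[str, int]:
    """Return (last column, index of s among its sorted circular shifts)."""
    # Build every circular shift of s, then sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    last_column = "".join(rot[-1] for rot in rotations)
    return last_column, rotations.index(s)


def inverse_bwt(last_column: str, index: int) -> str:
    """Rebuild the original string from the last column and the index."""
    table = [""] * len(last_column)
    # Each pass prepends the last column and re-sorts; after len(s) passes
    # the table holds all sorted rotations in full.
    for _ in range(len(last_column)):
        table = sorted(ch + row for ch, row in zip(last_column, table))
    return table[index]


transformed, idx = bwt("banana")
print(transformed, idx)               # nnbaaa 3
print(inverse_bwt(transformed, idx))  # banana
```

Note how the output groups identical characters together ("nn", "aaa"), which is what makes the result friendlier to a later compression stage.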

Some day when I have lots of free time to spare, I'll spend a while getting my head around this deep magic, because it's just amazing that this works.

(via John Regehr)

Tags: compression algorithms burrows-wheeler-transform bzip2 via:john-regehr magic text

Irish spider zombies!

Published January 31, 2025

Irish spider zombies!

This is fantastic -- a newly-discovered species of fungus does the same trick as Ophiocordyceps in Brazil; it infects the brains of orb-weaving cave spiders in Ireland, and induces them to leave their lairs or webs, and migrate to die in an exposed situation, in order to favor dispersal of the fungal spores.

Ophiocordyceps is, of course, the inspiration for the zombie-forming fungus in The Last Of Us.

Tags: cordyceps fungi ireland spiders zombies fungus nature gross

The Billion Docs JSON Challenge: ClickHouse vs. MongoDB, Elasticsearch, and more

Published January 31, 2025

The Billion Docs JSON Challenge: ClickHouse vs. MongoDB, Elasticsearch, and more

This buries the lede somewhat, but here's the key bit:

We built a new powerful JSON data type for ClickHouse with true column-oriented storage, support for dynamically changing data structures without type unification and the ability to query individual JSON paths really fast. [...] ClickHouse stores the values of each unique JSON path as native columns, allowing high data compression and, as we are demonstrating in this blog, maintaining the same high query performance seen on classic types.

The performance results are very impressive, and notably also efficient in disk space usage.

Tags: clickhouse benchmarks performance json querying columnar-storage mongodb elasticsearch databases storage

ODROID-H4+

Published January 28, 2025

ODROID-H4+

The next generation of the excellent ODROID SBCs; based on Intel's N97 processor, with AVX2 extensions, faster DRAM, 4 SATA ports, and up to 48GB of RAM.

Significantly beefier in general, reportedly around the EUR180 mark in price.

Tags: odroid sbcs n97 hardware home devices servers

Coordinated Lunar Time

Published January 28, 2025

Coordinated Lunar Time

The moon may have a timezone of its own soon, Coordinated Lunar Time (LTC):

Due to the moon's lower gravity and its motion relative to Earth, moon time passes 56 microseconds faster each earth day. As a result, an atomic clock on Earth would run at a different rate than an atomic clock on the moon.

Similar to how UTC is determined, the memo suggests "an ensemble of clocks" deployed to the moon might be used to set the new time standard.
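For a sense of scale, here is a quick back-of-the-envelope calculation in Python using the 56 microseconds per day figure quoted above. The variable names and the 365.25-day year are my own choices.

```python
# Drift quoted above: a lunar clock gains 56 microseconds per Earth day.
DRIFT_SECONDS_PER_DAY = 56e-6
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25  # assumption: Julian year

# Fractional rate difference between the two clocks (dimensionless).
fractional_rate = DRIFT_SECONDS_PER_DAY / SECONDS_PER_DAY

# Accumulated offset after one year, in milliseconds.
drift_per_year_ms = DRIFT_SECONDS_PER_DAY * DAYS_PER_YEAR * 1000

print(f"fractional rate: {fractional_rate:.3e}")
print(f"drift per year:  {drift_per_year_ms:.2f} ms")
```

That works out to a rate difference of roughly 6.5 parts in 10 billion, or about 20 ms per year.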

(via David Cuthbert)

Tags: via:david-cuthbert moon time timezones ltc

Understanding the BM25 full text search algorithm

Published January 28, 2025

Understanding the BM25 full text search algorithm

"BM25, or Best Match 25, is a widely used algorithm for full text search. It is the default in Lucene/Elasticsearch and SQLite, among others." At its heart, it's an interesting probabilistic ranking scheme, involving the Inverse Document Frequency of a term, term frequency in a single document, and the document length. (Via Tony Finch)
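As a rough illustration of how those three ingredients combine, here is a small Python sketch of BM25 scoring. It uses the commonly cited formulation with the non-negative IDF variant and typical defaults (k1 = 1.2, b = 0.75); the whitespace tokenizer and the function name are my own simplifications, not what Lucene or SQLite actually do.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates via k1; long documents are penalised via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "the quick brown fox",
    "the lazy dog sleeps all day long",
    "quick quick fox",
]
for doc, score in zip(docs, bm25_scores("quick fox", docs)):
    print(f"{score:.3f}  {doc}")
```

The short document that repeats "quick" ranks first, and the document with no matching terms scores zero.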

Tags: via:fanf lucene elasticsearch search text algorithms sqlite full-text bm25

git worktrees

Published January 28, 2025

git worktrees

This is a pretty nifty feature I was unaware of; git now has built-in support for "worktrees", multiple parallel checkouts of the same git repo in side-by-side directories. (via Last Week in AWS)

Tags: via:lwia git worktrees checkouts coding version-control unix

LLM-Driven Code Completion in JetBrains IDEs

Published January 27, 2025

LLM-Driven Code Completion in JetBrains IDEs

JetBrains have come up with a new relatively-lightweight LLM-driven code generation option, constrained to producing single line suggestions:

The length of the completion suggestions is a trade-off. While longer suggestions do tend to reduce how many keystrokes you have to make, which is good, they also increase the number of reviews required on your end. Taking the above into account, we decided that completing a single line of code would be a fair compromise.

Some key features:
- It works locally and is available offline. This means you can take advantage of the feature even if you aren’t connected to the internet.
- It doesn’t send any data from your machine over the internet. The language models that power full line code completion run locally, which is great for two reasons. First, your code remains safe, as it never leaves your machine. Second, there are no additional cloud-related expenses – that’s why this feature comes at no additional cost.
Also, customer code is never used for training.

I've used this (in RubyMine), and found it fairly useful; it's good for generating the obvious next line, but is easily ignored when that's not what's needed. Not bad at all.

Tags: coding code-completion jetbrains ides java ruby llms ai code-generation rubymine intellij

VIC 20 Elite

Published January 27, 2025

VIC 20 Elite

Crazy stuff. Elite, ported to the Commodore VIC 20 (albeit with a 32K expansion):

VIC 20 Elite is based on the C-64 source. VIC 20 specific graphics, text, keyboard & joystick input, and sound routines were written from scratch to replace the corresponding C-64 code.

Of course, the complete enhanced Elite won’t fit within the VIC 20’s limited memory, so some features had to be left out. Following the original 1984 BBC Cassette and Acorn Electron version, the VIC 20 version omits extended planet descriptions, planetary details (craters and meridians), and the missions that appear further on in the game. The pause mode options are dropped, and there is no Find Planet option in Galactic Chart (that would be only really useful during missions).

(via Sleepy from FP)

Tags: retrogaming commodore emulation gaming history elite vic-20

goref

Published January 27, 2025

goref

"a Go heap object reference analysis tool based on delve: It can display the space and object count distribution of Go memory references, which is helpful for efficiently locating memory leak issues or viewing persistent heap objects to optimize the garbage collector (GC) overhead."

Nice to see Go supporting similar debugging/optimisation tools to those offered by the JVM.

Tags: go heap memory gc memory-leaks

Artsy’s Technology Choices evaluation process

Published January 24, 2025

Artsy's Technology Choices evaluation process

This is a nice way to evaluate new technology options, from Artsy:
We want to accomplish a lot with a lean team, which means we must choose stable technologies. However, we also want to adopt best-of-breed technologies or best-suited tools, which may need work or still be evolving. We've borrowed from ThoughtWorks' Radar to define the following stages for evaluating, adopting, and retiring technologies:
- Adopt: Reasonable defaults for most work. These choices have been exercised successfully in production at Artsy and there is a critical mass of engineers comfortable working with them.
- Trial: These technologies are being evaluated in limited production circumstances. We don't have enough production experience to recommend them for high-risk or business-critical use cases, but they may be worth consideration if your project seems like a fit.
- Assess: Technologies we are interested in and maybe even built proofs-of-concept for, but haven't yet trialed in production.
- Hold: Based on our experience, these technologies should be avoided. We've found them to be flawed, immature, or simply supplanted by better alternatives. In some cases these remain in legacy production uses, but we should take every opportunity to retire or migrate away.
(Via Lar Van Der Jagt on the Last Week In AWS slack instance)

Tags: via:lwia tech technology radar choices evaluation process architecture planning tools

API Error Design

Published January 23, 2025

API Error Design

Some good thoughts from a SlateDB dev, regarding initial principles for errors in SlateDB, derived from experience with Kafka:
- Keep public errors separate from internal errors. The set of public errors should be kept minimal and new errors should be highly scrutinized. For internal errors, we can go to town since they can be refactored and consolidated over time without affecting the user.
- Public errors should be prescriptive. Can an operation be retried? Is the database left in an inconsistent state? Can a transaction be aborted? What should the user actually do when the error is encountered? The error should have clear guidance.
- Prefer coarse error types with rich error messages. There are probably hundreds of cases where the database can enter an invalid state. We don't need a separate type for each of them. We can use a single FatalError and pack as much information into the error message as is necessary to diagnose the root cause.
(via Chris Riccomini)

Tags: errors api design slatedb api-design error-handling exceptions architecture

7 Lessons from building a small-scale AI application

Published January 23, 2025

7 Lessons from building a small-scale AI application

These are good. tl;dr:
- AI programming is stochastic;
- Data quality is real work;
- Models are only as good as the evaluation;
- Trust/Quality is the #1 issue;
- Your training pipeline is your core IP;
- AI is yet another distributed system;
- Don’t buy the AI library hype
via Niall Murphy.

Tags: llms ai ml training via:niallmurphy models

Optimizing Java Apps on Kubernetes

Published January 23, 2025

Optimizing Java Apps on Kubernetes

"Optimizing Java Applications on Kubernetes: beyond the Basics": Bruno Borges, at the InfoQ Dev Summit Boston, discusses the strategies for enhancing Java application performance on Kubernetes, focusing on leveraging JVM ergonomics, and managing garbage collection processes. Some interesting tips here.

Tags: kubernetes java eks resources ops scaling scalability gc optimization jvm

Block AI scrapers with Anubis

Published January 22, 2025

Block AI scrapers with Anubis

Bookmarking this in case I have to use it; I have a blog-related use case in mind, and I don't want LLM scrapers killing my blog with it.

Anubis is a man-in-the-middle HTTP proxy that requires clients to either solve or have solved a proof-of-work challenge before they can access the site. This is a very simple way to block the most common AI scrapers because they are not able to execute JavaScript to solve the challenge. The scrapers that can execute JavaScript usually don't support the modern JavaScript features that Anubis requires. In case a scraper is dedicated enough to solve the challenge, Anubis lets them through because at that point they are functionally a browser.

The most hilarious part about how Anubis is implemented is that it triggers challenges for every request with a User-Agent containing "Mozilla". Nearly all AI scrapers (and browsers) use a User-Agent string that includes "Mozilla" in it. This means that Anubis is able to block nearly all AI scrapers without any configuration.
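For anyone unfamiliar with the proof-of-work idea, here is a generic hashcash-style sketch in Python. It is not Anubis's actual code or wire format; the challenge string, the difficulty measure (leading zero hex digits of a SHA-256 digest), and the function names are all assumptions made for illustration.

```python
import hashlib
from itertools import count


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check a claimed solution."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force nonces until the digest has enough leading zeros."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


challenge = "example-challenge"  # hypothetical server-issued value
nonce = solve(challenge, difficulty=3)
print(nonce, verify(challenge, nonce, difficulty=3))
```

The asymmetry is the point: solving takes thousands of hashes on average at this difficulty, while checking takes one, so the cost lands on whoever is making lots of requests.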

Tags: throttling robots scraping ops llms bots hashcash tarpits

Ask HN: Is anyone doing anything cool with tiny language models?

Published January 22, 2025

Ask HN: Is anyone doing anything cool with tiny language models?

For some reason, I'm slightly more biased towards tiny, self-hosted, run-on-CPU LMs. As long as high accuracy isn't in the criteria, some of these use cases are pretty nifty

Tags: local ai llms ollama via:hn

Cost-optimized archival in S3 using s3tar

Published January 22, 2025

Cost-optimized archival in S3 using s3tar

"s3tar" is new to me, and looks like a perfect tool for this common use-case -- aggregation and archival of existing data on S3, which often requires aggregation into large file sizes to take advantage of S3 Glacier storage classes (which have a minimum file size of 128 KB).

s3tar optimizes for cost and performance on the steps involved in downloading the objects, aggregating them into a tar, and putting the final tar in a specified Amazon S3 storage class using a configurable “--concat-in-memory” flag. ... The tool also offers the flexibility to upload directly to a user’s preferred storage class or store the tar object in S3 Standard storage and seamlessly transition it to specific archival classes using S3 Lifecycle policies.

The only downside of s3tar is that it doesn't support recompression, which is also a common enough requirement -- especially after aggregation of multiple small input files into a larger, more compressible archive. But hey, can't have everything.

s3tar: https://github.com/awslabs/amazon-s3-tar-tool

Tags: s3tar amazon s3 compression storage archival architecture aggregation logs glacier via:lwia

If Not React, Then What?

Published January 22, 2025

If Not React, Then What?

It's great to see pushback against React, Angular, and other SPA architectures for web app delivery. I never got my head around the applicability of these for many web app use cases so this is just confirming my biases :)

Related Mastodon thread: https://toot.cafe/@slightlyoff/113868445222841008

Tags: react angular spa web-apps webdev javascript html apps

Long-running EKS bug

Published January 22, 2025

Long-running EKS bug

Since 2019 (!), the AWS load balancer controller component hasn't safely handled pod shutdowns when the ALB target-type is set to ip. This is the bug report, still open...

Tags: aws load-balancing alb eks kubernetes ops bugs

Cryptocurrency “market caps” and notional value

Published January 20, 2025

Cryptocurrency "market caps" and notional value

Excellent explainer from Molly White on the risks of quoting "market caps" for memecoins:

The “market cap” measurement has become ubiquitous within and outside of crypto, and it is almost always taken at face value. Thoughtful readers might see such headlines and ask questions like “how did a ‘$2 trillion market’ tumble without impacting traditional finance?”, but I suspect most accept the number.

When crypto projects are hacked, there are headlines about hackers stealing “$166 million worth” of tokens, when in reality the hackers only could cash out 2% of that amount (around $3 million) because their attempts to sell illiquid tokens caused the price to crash.
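A toy model shows why the notional figure and the cash-out figure can diverge so wildly. The Python sketch below assumes the tokens are sold into a constant-product (x * y = k) liquidity pool with no fees; that model and all of the numbers are my own illustrative assumptions, not taken from the article.

```python
def sell_into_pool(token_reserve: float, usd_reserve: float, tokens_sold: float) -> float:
    """USD received for selling tokens into a constant-product pool, ignoring fees."""
    k = token_reserve * usd_reserve
    # After the sale the pool holds more tokens, so it must hold fewer dollars.
    new_usd_reserve = k / (token_reserve + tokens_sold)
    return usd_reserve - new_usd_reserve


# Hypothetical pool: 1M tokens against $1M, so the quoted spot price is $1.
token_reserve = 1_000_000.0
usd_reserve = 1_000_000.0
spot_price = usd_reserve / token_reserve

# Hypothetical holder with 100M tokens: "worth" $100M at the spot price.
holdings = 100_000_000.0
notional = holdings * spot_price
realized = sell_into_pool(token_reserve, usd_reserve, holdings)

print(f"notional ${notional:,.0f}, realized ${realized:,.0f} ({realized / notional:.1%})")
```

In this setup the seller can never extract more than the dollars actually sitting in the pool, so the realized amount comes to roughly 1% of the headline value.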

Tags: molly-white memecoins bitcoin rug-pulls scams liquidity market-caps cryptocurrency

isd

Published January 20, 2025

isd

a curses UI for systemd, via Nelson

Tags: systemd ubuntu sysadmin linux ops curses ui tui

Implementing A Byte Pair Encoding (BPE) Tokenizer From Scratch

Published January 20, 2025

Implementing A Byte Pair Encoding (BPE) Tokenizer From Scratch

A discussion of the popular byte pair encoding (BPE) tokenization algorithm, which is used in large language models like GPT-2 to GPT-4, Llama 3, etc. to tokenize text. The BPE algorithm was originally described in 1994: “A New Algorithm for Data Compression” by Philip Gage.
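The core training loop is short enough to sketch here. This is a minimal Python illustration of byte-level BPE (start from raw bytes, repeatedly merge the most frequent adjacent pair into a new token id); it is my own simplified version, not the code from the linked article, and it omits the pre-tokenization and special-token handling that production tokenizers add.

```python
from collections import Counter


def merge(tokens: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every non-overlapping occurrence of pair with new_id."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def train_bpe(text: str, num_merges: int):
    """Learn up to num_merges merges; return the encoded text and the merge table."""
    tokens = list(text.encode("utf-8"))  # ids 0..255 are the raw bytes
    merges: dict[tuple[int, int], int] = {}
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:  # nothing left that is worth merging
            break
        merges[pair] = next_id
        tokens = merge(tokens, pair, next_id)
        next_id += 1
    return tokens, merges


def decode(tokens: list[int], merges: dict[tuple[int, int], int]) -> str:
    """Expand merged ids back into bytes, in the order the merges were learned."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[t] for t in tokens).decode("utf-8")


tokens, merges = train_bpe("aaabdaaabac", num_merges=3)
print(len(tokens), decode(tokens, merges))
```

On this toy input, three merges shrink the 11 bytes down to 5 tokens, and decoding round-trips back to the original text.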

Tags: encoding text bpe llms algorithms tokenization parsing

Hollo

Published January 17, 2025

Hollo

"A federated microblogging software for single users. ActivityPub-enabled, Mastodon-compatible API, supports CommonMark and Misskey-style quotes. Hollo is designed for single-users, so you can own your instance and have full control over your data. It’s perfect for personal microblogs, notes, and journals."

Seems fairly heavyweight, however, so I probably won't be running it, but it's a nice take on the single-user-server Fediverse use case.

Tags: fediverse mastodon hollo apps social-media blogging

GTFS-Realtime API

Published January 14, 2025

GTFS-Realtime API

The Irish National Transport Authority have an open data API for realtime public transport information; very cool. "The GTFS-R API contains real-time updates for services provided by Dublin Bus, Bus Éireann, and Go-Ahead Ireland."

The specification currently supports the following types of information:

- Trip updates: delays, cancellations, changed routes
- Service alerts: stop moved, unforeseen events affecting a station, route or the entire network
- Vehicle positions: information about the vehicles including location and congestion level

Registration is required.

Tags: public-transport buses trains transit nta gtfs apis open-data dublin ireland

Five things privacy experts know about AI

Published January 14, 2025

Five things privacy experts know about AI

Damien Desfontaines writes some really interesting stuff about Differential Privacy in AI training, and how bad the current situation is with large language models.

Tags: llms ai differential-privacy damien-desfontaines privacy training anonymisation memorization

Why the British government is so into AI

Published January 14, 2025

Why the British government is so into AI

Interesting BlueSky thread on the topic --

The UK Government believes several things:

1) The AI genie is out of the bottle and cannot be put back in

2) Embracing AI would definitely be good for the British economy

3) Enforcing copyright on AI training would put Britain out of step with rest of the world and subsequently...

4) Enforcing copyright would be ineffective as AI would just be trained elsewhere, cutting out Brit creatives entirely

5) Govt's preferred option is permissive enough to be attractive to AI firms but demands transparency so at least rights holders have some recourse; the alternative is bleaker.

Obviously, I contest all of these beliefs to one degree or another, but this is where the govt is, and it's useful to understand that. The real crux of the debate, as they see it, is how Britain's laws can practically deal with the global inevitability of AI. They believe it's untenable to make Britain a legislative pariah state for AI, and that this would not lead to good outcomes for British creatives anyway. This is a point worth considering when replying to the consultation.

However, the govt says it's not going to implement policy before it has a technical solution for rights holders to opt-out and chase down infringements. My view is that this is difficult to the point of being pure fantasy, and either means that the govt is not serious about finding a real, effective technical solution, or this policy will be kicked indefinitely down the road. My dinner partner was optimistic a solution could be achieved within the timespan of a year or two. I just don't buy it.

Government says it has not sided with AI firms over creative industries. However, its understanding of "not taking a side" creates a false equality between massive companies whose business relies on crime and individuals whose livelihoods will be destroyed.

I got the sense that there is no political will whatsoever to seriously challenge firms who offer to spend big in Britain, and that any thought of holding them to account for actual crime is simply considered naive. But we do have a bit of time while govt attempts to confect their magical, easy to use, opt-out solution—time during which one or several of these AI firms might implode, making the true cost more apparent.

Tags: uk government ai policy copyright ip britain economy future

The people should own the town square

Published January 13, 2025

The people should own the town square

Ah, this is welcome news from Mastodon:

We are going to transfer ownership of key Mastodon ecosystem and platform components to a new non-profit organization, affirming the intent that Mastodon should not be owned or controlled by a single individual. [...] Taking the first tentative steps almost a year ago, there are already multiple organizations involved with shepherding the Mastodon code and platform. The next 6 months will see the transformation of the Mastodon structures, shifting away from the early days’ single-person ownership and enshrining the envisioned independence in a dedicated European not-for-profit entity.

Tags: mastodon social-media open-source fediverse

Grafana and ClickHouse

Published January 12, 2025

Grafana and ClickHouse

As a modern option for observability through service metrics, ClickHouse seems to be decent as a self-hosted option, integrating with Grafana as described here and collecting data from OpenTelemetry instrumentation in service code. (By many accounts, this avoids some not great design decisions made in Prometheus.) Bookmarking for reference...

Tags: telemetry metrics service-metrics clickhouse sql grafana observability opentelemetry

Watch Duty

Published January 12, 2025

Watch Duty

Nice to see an important public need being met here:

The [Watch Duty] app gives users the latest alerts about fires in their area [in California] and has become a vital service for millions of users in the western U.S. struggling with the seemingly constant threat of deadly wildfires—one major reason it had over 360,000 unique visits from 8:00-8:30 a.m. local time Wednesday. And the man behind Watch Duty promises that as a nonprofit, his organization has no plans to pull an OpenAI and become a profit-seeking enterprise.

Tags: non-profits tech watch-duty apps mobile public-good

Steve Jobs vs Ireland

Published January 12, 2025

Steve Jobs vs Ireland

This is a great Steve Jobs story, from the engineer who wrote v1 of the Mac OS X Dock:

At one point during a trip over, Steve was talking to Bas and asked how things were coming along with the Dock. He replied something along the lines of “going well, the engineer is over from Ireland right now, etc”. Steve left, and then visited my manager’s manager’s manager and said the fateful words (as reported to me by people who were in the room where it happened).

“It has come to my attention that the engineer working on the Dock is in FUCKING IRELAND”.

I was told that I had to move to Cupertino. Immediately. Or else.

I did not wish to move to the States. I liked being in Europe. Ultimately, after much consideration, many late night conversations with my wife, and even buying a guide to moving, I said no.

They said ok then. We’ll just tell Steve you did move.

(via Niall Murphy)

Tags: macos america osx apple history steve-jobs

Light Bars for Zoom / Video Conference

Published January 10, 2025

Light Bars for Zoom / Video Conference

recommended by someone on ITC Slack; improves videoconference lighting nicely

Tags: video slack videoconferences lighting work

Court docs allege Meta trained LLM models using pirated book trove

Published January 10, 2025

Court docs allege Meta trained LLM models using pirated book trove

This is pretty massive:

The [court] document claims that Meta decided to download documents from Library Genesis -- aka. “LibGen” -- to train its models. LibGen is the subject of a lawsuit brought by textbook publishers who believe it happily hosts and distributes [pirated] works [....]

The filing from plaintiffs in the Kadrey case claims that documents produced by Meta [...] describe internal debate about accessing LibGen, a little squeamishness about using BitTorrent in the office to do so, and eventual escalation to “MZ” [Mark Zuckerberg himself], who approved use of the contentious resource. [...]

Another filing claims that a Meta document describes how it removed copyright notifications from material downloaded from LibGen, and suggests the company did so because it realized including such text could mean a model’s output would reveal it was trained on copyrighted material.

US District Court Judge Vince Chhabria also noted that in one of the documents Meta wants to seal, an employee wrote the following:

“If there is media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, this may undermine our negotiating position with regulators on these issues.”

No shit.

Tags: piracy meta copyright mark-zuckerberg law llama training libgen books

alxndr42/co2-sensor

Published January 10, 2025

alxndr42/co2-sensor

Superb little low-cost CO2 sensor project; built with ESPHome, a 3D-printed case, and 17 euro in parts (a Wemos D1 mini board and a Sensirion SCD4x CO2 sensor module). Outputs via Prometheus and a local HTTP interface.

Tags: wishlist gadgets co2-monitors co2 air todo hacks esphome

Bufferbloat Test

Published January 10, 2025

Bufferbloat Test

A handy tool to test your internet connection for "bufferbloat", the error condition involving "undesirable high latency caused by other traffic on your network. It happens when a flow uses more than its fair share of the bottleneck. Bufferbloat is the primary cause of bad performance for real-time Internet applications like VoIP calls, video games, and videoconferencing."

(My home internet connection is currently rating a C: "your latency increased considerably under load", jumping from a min/mean/p95/max of 10.7, 16.9, 23.7, 30.1ms to 35.3, 98.4, 121.0, 286.0ms under load, yikes, so looks like I need to do some optimising.)

Tags: bufferbloat internet networking optimisation performance testing tools

Waymos don’t stop for pedestrians

Published January 10, 2025

Waymos don't stop for pedestrians

Ah here.

"Waymo (aka Google) admits that it trains its robotaxis to break the law. When a Washington Post reporter finds robotaxis fail to stop for pedestrians in marked crosswalk 70% of the time, Waymo says it follows "social norms" rather than laws.

Expert explains: When robotaxis obey law, they don't go fast enough to compete successfully with Uber, so Google execs ordered engineers to ignore laws."

Tags: google waymo laws pedestrians safety crosswalks crossings road-safety self-driving-cars

Garbage Day on Meta’s moderation plans

Published January 10, 2025

Garbage Day on Meta's moderation plans

This is 100% spot on, I suspect, regarding Meta's recently-announced plans to give up on content moderation:

After 2021, the major tech platforms we’ve relied on since the 2010s could no longer pretend that they would ever be able to properly manage the amount of users, the amount of content, the amount of influence they “need” to exist at the size they “need” to exist at to make the amount of money they “need” to exist.

And after sleepwalking through the Biden administration and doing the bare minimum to avoid any fingers pointed their direction about election interference last year, the companies are now fully giving up. Knowing the incoming Trump administration will not only not care, but will even reward them for it.

The question now is, what will the EU do about it? This is a flagrant raised finger in the face of the Digital Services Act.

Tags: moderation content ugc meta future dsa eu garbage-day

“uhtcearu”

Published January 10, 2025

"uhtcearu"

Via Susie Dent, Word of the Day is ‘uhtcearu’ [ucht-kay-aru, with the ‘ch’ as in the Scottish ‘loch’]: Old English for ‘the sorrow before dawn’, when you lie awake in the darkness and worries crowd your mind.

It's amazing to realise that this unpleasant phenomenon of neurochemistry is a thing that's been around for thousands of years.

See also https://www.reddit.com/r/OldEnglish/comments/e7su8n/what_is_the_proper_form_of_uhtceare/

Tags: brains worry words uhtcearu uhtceare dawn morning neurochemistry

RSS Tricks

Published January 8, 2025

RSS Tricks

"Some of my favorite tricks for finding RSS Feeds to follow", from George Hotelling. These are great, particularly RSS Bridge (via Nelson)

Tags: via:nelson syndication feeds rss blogs reddit youtube

ads.txt for a site with no ads

Published January 6, 2025

ads.txt for a site with no ads

Don Marti: "since there’s a lot of malarkey in the online advertising business, I’m putting up this file [on my website] to let the advertisers know that if someone sold you an ad and claimed it ran on here, you got burned."

The format is defined in a specification from the IAB Tech Lab. The important part is the last line. The placeholder is how you tell the tools that are supposed to be checking this stuff that you don’t have ads.

Tags: ads don-marti hacks ads-txt web

Hoarder

Published December 29, 2024

Hoarder

"Quickly save links, notes, and images and hoarder will automatically tag them for you using AI for faster retrieval. Built for the data hoarders out there!"

Self-hosted (with a docker-compose file), open-source link hoarding tool; intriguingly, this scrapes links, extracts text and images, generates automated tag suggestions using OpenAI or a local ollama LLM, and indexes the page's full text using Meilisearch, which seems to be a speedy incremental search. Could be a great place to gateway links from this blog into a super-searchable form. hmm

Tags: links archiving bookmarks web search hoarder docker ai

The AI We Deserve

Published December 29, 2024

The AI We Deserve

A very thought-provoking essay from Evgeny Morozov on AI, LLMs and their embodied political viewpoint:

Sure, I can build a personalized language learning app using a mix of private services, and it might be highly effective. But is this model scalable? Is it socially desired? Is this the equivalent of me driving a car where a train might do just as well? Could we, for instance, trade a bit of efficiency and personalization to reuse some of the sentences or short stories I’ve already generated in my app, reducing the energy cost of re-running these services for each user?

This takes us to the core problem with today’s generative AI. It doesn’t just mirror the market’s operating principles; it embodies its ethos. This isn’t surprising, given that these services are dominated by tech giants that treat users as consumers above all. Why would OpenAI, or any other AI service, encourage me to send fewer queries to their servers or reuse the responses others have already received when building my app? Doing so would undermine their business model, even if it might be better from a social or political (never mind ecological) perspective. Instead, OpenAI’s API charges me— and emits a nontrivial amount of carbon emissions— even to tell me that London is the capital of the UK or that there are one thousand grams in a kilogram.

For all the ways tools like ChatGPT contribute to ecological reason, then, they also undermine it at a deeper level—primarily by framing our activities around the identity of isolated, possibly alienated, postmodern consumers. When we use these tools to solve problems, we’re not like Storm’s carefree flâneur, open to anything; we’re more like entrepreneurs seeking arbitrage opportunities within a predefined, profit-oriented grid. [....]

The Latin American examples give the lie to the “there’s no alternative” ideology of technological development in the Global North. In the early 1970s, this ideology was grounded in modernization theory; today, it’s rooted in neoliberalism. The result, however, is the same: a prohibition on imagining alternative institutional homes for these technologies. There’s immense value in demonstrating—through real-world prototypes and institutional reforms—that untethering these tools from their market-driven development model is not only possible but beneficial for democracy, humanity, and the planet.

Tags: technology ai history eolithism neoliberalism llms openai cybernetics hans-otto-storm cybersyn

Principal Engineer Roles

Published December 29, 2024

Principal Engineer Roles

From AWS VP of Technology, Mae-Lan Tomsen Bukovec -- a set of roles which a Principal Engineer can play to get projects done:

Sponsor: A Sponsor is a project/program lead, spanning multiple teams. Yes, this role can be played by a manager but it does not have to be (at least not at Amazon). If you are a Sponsor, you have to make sure decisions are made and that people aren’t stuck in analysis paralysis. This doesn’t mean that you yourself make those decisions (that’s often a Tie-breaker’s role which you may or may not be here). But you have to drive making sure decisions get made, which can mean owning those decisions, escalating to the right people, or whatever it takes to get it done.

A Sponsor is constantly clearing obstacles and getting things moving. It is a time-consuming role. You shouldn’t have time to act as Guide or a Sponsor on more than two projects combined, and you don’t have to be a Sponsor every year. But if a few years go by, and you haven’t been a Sponsor, it might be time to think about where you can step in and play that role. It tends to build new skills because you have to operate in different dimensions to land the right outcomes for the project.

Guide: Guides tend to be domain experts that are deeply involved in the architecture of a project. Guide will often drive the design but they’re not “The Architect.” A Guide often works through others to produce the designs, and themselves produce exemplary artifacts, like design docs or bodies of code. The code produced by a Guide is usually illustrative of a broader pattern or solving a difficult problem that the rest of the team will often run with afterwards. The difference between a Guide and a Sponsor is that the Guide focuses on the technical path for the project, and the Sponsor owns all aspects of project delivery, including product definition and organizational alignment.

Guides influence teams. If you are influencing individuals, you’re likely being a mentor and not a Guide. A Guide is a time-consuming role. You shouldn’t have time to Guide more than two projects, and that drops to one project if you are a Sponsor at the same time.

Catalyst: A Catalyst gets an idea off the ground, and it’s not always their idea. In my experience, the idea might not even come from the Catalyst—it can be something we’ve been talking about doing for years but never really got off the ground. Catalysts will create docs or prototypes and drive discussions with senior decision makers to think through the concept. Catalysts are not just “idea factories.” They take the time to develop the concept, drive buy-in for the idea, and work with the larger leadership team to assign engineers to deliver the project.

A Catalyst is a time-consuming role because of all the work that needs to be done. At Amazon, that involves prototypes, docs and discussions. It is hard to effectively Catalyze more than one or two things at once. It is important to note that Catalysts, like Tie-breakers, are not permanent roles. Once a project is catalyzed (e.g., in engineering with a dedicated team working on the project), a Catalyst moves out of the role. The Catalyst might take on a Guide or Sponsor role on the project, or not. Not every project needs a Catalyst. A Catalyst is a very helpful (arguably critical) role for your most ambitious, complex, and/or ambiguous problems to solve in the organization.

Tie Breaker: A Tie-Breaker makes a decision after a debate. At Amazon, that means deeply understanding the different positions, weighing in with a choice, and then formally closing it out with an email or a doc to the larger group. Not every project needs a Tie-Breaker. But if your project gets stuck in a consensus-seeking mode without making progress on hard decisions, a senior engineer might have to step in as a Tie-Breaker. Tie-breakers own breaking a log-jam on direction in the team by making a decision. Obviously, a Tie Breaker has to have great judgment. But, it is incredibly important that the Tie-Breaker listens well and understands all the nuances to the different positions as part of breaking the tie. When a Tie-Breaker drives a choice, they must bring other engineers into their thought process so that all the engineers in the debate understand the “why” behind the choice even if some are disappointed by the direction. A Tie-Breaker must have strong engineering and organizational acumen in this role.

Sometimes an organization will depend on a small set of senior engineers to play the role of Tie-Breaker because they are so good at it. As a successful Tie-Breaker, you want to be careful not to set a tone that every decision, no matter how small, must go through you. You’ll quickly transition from Tie-Breaker to a “decision bottleneck” at that point—and that is not a role any team needs. If a team finds itself frequently seeking out a Tie-Breaker, it could be a sign that the team needs help understanding how to make decisions. That's a topic for a different time. The Tie-Breaker role is considered a “moment in time” role, versus Sponsor/Guide which are ongoing until you reach a milestone. Once the decision is made and closed out, you’re no longer the Tie-Breaker.

Catcher: A Catcher gets a project back on track, often from a technical perspective. It requires high judgement because a Catcher drives prioritization and formulating a pragmatic plan under tight deadlines. Catchers must quickly do their own detailed analysis to understand the nuances of the problem and come up with the path forward in the right timeframe. As a comparison, a Tie-breaker tends to step in when the pros/cons of the different approaches are well known and the team needs to make a hard decision. Once “caught” (i.e., the project is back on track and moving forward), a project doesn’t need the Catcher anymore.

Sometimes Principal Engineers can do too much catching. Don’t get me wrong, we are all Catchers sometimes—including me. Any fast-paced business needs Catchers in engineering and management. It teaches important skills about leadership in difficult moments and helps the business by landing deliverables. It also teaches you what not to do next time. However, it is better to generalize a Catcher skill set across more engineers and not depend on a small set of Principal Engineers as Catchers. If a Principal Engineer plays Catcher all the time through a succession of projects, it leaves no time to develop skills in other roles.

Participant: A participant works on something without one of these explicitly assigned leadership roles. A Participant can be active or passive. Active participants are hands-on, and do things like spend a few days working through a design discussion or picking up a coding task occasionally on a project, etc. Passive participants offer up a few points in a meeting and move on. In general, if you're going to participate it's better to do so actively. Time-boxing some passive participation (e.g., office hours for engineers) can be a useful mechanism to stay connected to the team. However, keep in mind that it is easy for your time to get consumed by being a Participant in too many things.

(via Marc Brooker)

Tags: roles principal-engineer work projects project-management amazon aws via:marc-brooker

Brian Eno on AI

Published December 29, 2024

Brian Eno on AI

In my own experience as an artist, experimenting with AI has mixed results. I’ve used several “songwriting” AIs and similar “picture-making” AIs. I’m intrigued and bored at the same time: I find it quickly becomes quite tedious. I have a sort of inner dissatisfaction when I play with it, a little like the feeling I get from eating a lot of confectionery when I’m hungry. I suspect this is because the joy of art isn’t only the pleasure of an end result but also the experience of going through the process of having made it. When you go out for a walk it isn’t just (or even primarily) for the pleasure of reaching a destination, but for the process of doing the walking. For me, using AI all too often feels like I’m engaging in a socially useless process, in which I learn almost nothing and then pass on my non-learning to others. It’s like getting the postcard instead of the holiday. [...]

All that said, I do believe that AI tools can be very useful to an artist in making it possible to devise systems that see patterns in what you are making and drawing them to your attention, being able to nudge you into territory that is unfamiliar and yet interestingly connected. I say this having had some good experiences in my own (pre-AI) experiments with Markov chain generators and various crude randomizing procedures. [...]

To make anything surprising and beautiful using AI you need to prepare your prompts extremely carefully, studiously closing off all the yawning, magnetic chasms of Hallmark mediocrity. If you don’t want to get moon rhyming with June, you have to give explicit instructions like, “Don’t rhyme moon with June!” And then, at the other end of the process, you need to rigorously filter the results. Now and again, something unexpected emerges. But even with that effort, why would a system whose primary programming is telling it to take the next most probable step produce surprising results? The surprise is primarily the speed and the volume, not the content.

Tags: play process technology culture future art music ai brian-eno creation

Inky Frame 7.3″

Published December 19, 2024

Inky Frame 7.3"

This, like so much of the Pimoroni catalog, is a lovely piece of gadgetry. If I hadn't already built a very nice e-ink home dashboard a couple of years back, I would definitely be doing so using one of these.

Honorable mention goes to the Pimoroni Presto: https://shop.pimoroni.com/products/presto?variant=54894104019323 , which is a beautifully designed tiny colour touchscreen with an RP2350 onboard. Not e-ink, unfortunately, which is a key feature for pervasive dashboards IMO, but still, I can see lots of use-cases for that gadget too....

Tags: e-ink gadgets dashboards home devices hardware pimoroni rp2350 hacks

Sweden’s Suspicion Machine

Published December 19, 2024

Sweden’s Suspicion Machine

Here we go, with another predictive algorithm-driven bias machine used to drive refusal of benefits:

Lighthouse Reports and Svenska Dagbladet obtained an unpublished dataset containing thousands of applicants to Sweden’s temporary child support scheme, which supports parents taking care of sick children. Each of them had been flagged as suspicious by a predictive algorithm deployed by the Social Insurance Agency. Analysis of the dataset revealed that the agency’s fraud prediction algorithm discriminated against women, migrants, low-income earners and people without a university education. Months of reporting — including conversations with confidential sources — demonstrate how the agency has deployed these systems without scrutiny despite objections from regulatory authorities and even its own data protection officer.

Tags: sweden predictive algorithms surveillance welfare benefits bias data-protection fraud

Thalidomide chirality paradox explained

Published December 18, 2024

Thalidomide chirality paradox explained

Molecule chirality ("left-handedness" and "right-handedness") has been in the news again recently.

What is little known is the relevance of chirality to the thalidomide disaster. Thalidomide, the drug which was prescribed widely to pregnant women in the 1950s for the treatment of morning sickness, was later discovered to be a chiral molecule, and while one mirror-image form (the (R)-enantiomer) was effective, the other (the (S)-enantiomer) was extremely toxic, causing thousands of children around the world to be born with severe birth defects. The mystery is, given that the two forms interconvert in the body, why didn't this toxicity emerge during animal experiments using the (R) form? Here's a paper with a potential explanation:

Twenty years after the thalidomide disaster in the late 1950s, Blaschke et al. reported that only the (S)-enantiomer of thalidomide is teratogenic [jm: causing birth defects]. However, other work has shown that the enantiomers ["mirror" molecules] of thalidomide interconvert in vivo, which begs the question: why is teratogen activity not observed in animal experiments that use (R)-thalidomide given the ready in vivo racemization (“thalidomide paradox”)? Herein, we disclose a hypothesis to explain this “thalidomide paradox” through the in-vivo self-disproportionation of enantiomers. Upon stirring a 20% ee solution of thalidomide in a given solvent, significant enantiomeric enrichment of up to 98% ee was observed reproducibly in solution. We hypothesize that a fraction of thalidomide enantiomers epimerizes in vivo, followed by precipitation of racemic [equally mixed between R/S forms] thalidomide in (R/S)-heterodimeric form. Thus, racemic thalidomide is most likely removed from biological processes upon racemic precipitation in (R/S)-heterodimeric form. On the other hand, enantiomerically pure thalidomide remains in solution, affording the observed biological experimental results: the (S)-enantiomer is teratogenic, while the (R)-enantiomer is not.

Tags: chirality thalidomide molecules drugs medicine papers chemistry

UK passes the Online Safety Act

Published December 17, 2024

UK passes the Online Safety Act

Apparently "The Online Safety Act applies to every service which handles user-generated content and has “links to the UK”, with a few limited exceptions listed below. The scope is extraterritorial (like the GDPR) so even sites entirely operated outside the UK are in scope if they are considered to have “links to the UK”."

A service has links to the UK if any of the following apply:

- the service has a “significant number” of UK users
- UK users form one of the target markets for the service
- the service is accessible to UK users and “there are reasonable grounds to believe that there is a material risk of significant harm to individuals in the UK” (this seems less likely to apply for smaller services but who knows)

Tags: osa uk safety regulations ofcom

How to build highly-debuggable C++ binaries

Published December 17, 2024

How to build highly-debuggable C++ binaries

aka, how to have a modern C++ development environment (for when you still need to do such a thing) -- also, wow, C++ has changed a lot since the last time I was working with it. (Via Tony Finch)

Tags: via:fanf gdb c++ coding debugging builds

Why did Silicon Valley turn right?

Published December 17, 2024

Why did Silicon Valley turn right?

A great essay on the demise of the 1990s/2000s liberal consensus in Silicon Valley:

No-one now believes - or pretends to believe - that Silicon Valley is going to connect the world, ushering in an age of peace, harmony and likes across nations. [...] A decade ago, liberals, liberaltarians and straight libertarians could readily enthuse about “liberation technologies” and Twitter revolutions in which nimble pro-democracy dissidents would use the Internet to out-maneuver sluggish governments. Technological innovation and liberal freedoms seemed to go hand in hand. Now they don’t. Authoritarian governments have turned out to be quite adept for the time being, not just at suppressing dissidence but at using these technologies for their own purposes. Platforms like Facebook have been used to mobilize ethnic violence around the world, with minimal pushback from the platform’s moderation systems [...] My surmise is that this shift in beliefs has undermined the core ideas that held the Silicon Valley coalition together. Specifically, it has broken the previously ‘obvious’ intimate relationship between innovation and liberalism. I don’t see anyone arguing that Silicon Valley innovation is the best way of spreading liberal democratic awesome around the world any more, or for keeping it up and running at home. Instead, I see a variety of arguments for the unbridled benefits of innovation, regardless of its benefits for democratic liberalism. I see a lot of arguments that AI innovation in particular is about to propel us into an incredible new world of human possibilities, provided that it isn’t restrained by DEI, ESG and other such nonsense. Others (or the same people) argue that we need to innovate, innovate, innovate because we are caught in a technological arms race with China, and if we lose, we’re toast. 
Others (sotto or brutto voce; again, sometimes the same people) - contend innovation isn’t really possible in a world of democratic restraint, and we need new forms of corporate authoritarianism with a side helping of exit, to allow the kinds of advances we really need to transform the world.

Tags: essays henry-farrell tech politics silicon-valley fascism democracy liberalism

Black plastic won’t kill you

Published December 17, 2024

Black plastic won't kill you

How a simple math error sparked a panic about toxic chemicals in black plastic kitchen utensils:

Plastics rarely make news like this. From Newsmax to Food and Wine, and from the Daily Mail to CNN, the media uptake was enthusiastic on a paper published in October in the peer-reviewed journal Chemosphere. “Your cool black kitchenware could be slowly poisoning you, study says. Here’s what to do,” said the LA Times. “Yes, throw out your black spatula,” said the San Francisco Chronicle. Salon was most blunt: “Your favorite spatula could kill you,” it said. [....] The paper correctly gives the reference dose for BDE-209 as 7,000 nanograms per kilogram of body weight per day, but calculates this into a limit for a 60-kilogram adult of 42,000 nanograms per day. So, as the paper claims, the estimated actual exposure from kitchen utensils of 34,700 nanograms per day is more than 80 per cent of the EPA limit of 42,000. That sounds bad. But 60 times 7,000 is not 42,000. It is 420,000. This is what Joe Schwarcz [director of McGill University’s Office for Science and Society] noticed. The estimated exposure is not even a tenth of the reference dose.
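
The arithmetic is easy to check. A quick Python sketch, using only the figures quoted above (the variable names are mine):

```python
# Figures as quoted from the article above.
REFERENCE_DOSE_NG_PER_KG_DAY = 7_000  # reference dose for BDE-209
BODY_WEIGHT_KG = 60                   # adult body weight used in the paper
ESTIMATED_EXPOSURE_NG_DAY = 34_700    # estimated exposure from kitchen utensils
PAPER_LIMIT_NG_DAY = 42_000           # the limit as miscalculated in the paper

# The limit for a 60 kg adult is the per-kilogram dose times body weight.
correct_limit_ng_day = REFERENCE_DOSE_NG_PER_KG_DAY * BODY_WEIGHT_KG

share_of_paper_limit = ESTIMATED_EXPOSURE_NG_DAY / PAPER_LIMIT_NG_DAY
share_of_correct_limit = ESTIMATED_EXPOSURE_NG_DAY / correct_limit_ng_day

print(f"correct limit:           {correct_limit_ng_day:,} ng/day")  # 420,000 ng/day
print(f"share of paper's limit:  {share_of_paper_limit:.1%}")       # 82.6%
print(f"share of correct limit:  {share_of_correct_limit:.1%}")     # 8.3%
```

So the estimated exposure is about 8% of the reference dose, not 80%: the same number, off by a factor of ten.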

Tags: cooking research science plastics errors maths math fail papers

pprof.me

Published December 15, 2024

pprof.me

a web app to 'share and visualize .pprof profiles on an intuitive interface'

Tags: pprof profiling optimization web tools coding

ntfy.sh

Published December 12, 2024

ntfy.sh

Send push notifications to your phone via PUT/POST. "a simple HTTP-based pub-sub notification service. It allows you to send notifications to your phone or desktop via scripts from any computer, and/or using a REST API. It's infinitely flexible, and 100% free software."

I've been using a personal Slack for this purpose, but this is a decent-sounding alternative.
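
For a flavour of what "via PUT/POST" means in practice, here's a minimal Python sketch using only the standard library. It assumes the public ntfy.sh server and ntfy's convention of publishing by POSTing the message body to a URL named after the topic, with an optional Title header; the topic name and function names are hypothetical.

```python
import urllib.request
from typing import Optional

NTFY_SERVER = "https://ntfy.sh"  # assumption: the public server; self-hosted also works


def build_notification(
    topic: str, message: str, title: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST request that publishes to an ntfy topic."""
    headers = {}
    if title:
        headers["Title"] = title  # optional notification title
    return urllib.request.Request(
        url=f"{NTFY_SERVER}/{topic}",
        data=message.encode("utf-8"),  # the request body is the message text
        headers=headers,
        method="POST",
    )


def send_notification(
    topic: str, message: str, title: Optional[str] = None, timeout: float = 10.0
) -> int:
    """Send the notification and return the HTTP status code."""
    request = build_notification(topic, message, title)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Usage would be something like send_notification("my-example-topic", "Backup finished", title="NAS"). Since anyone who knows a topic name on the public server can subscribe to it, pick something hard to guess.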

Tags: notification push alerting open-source android ios push-messaging

The state of Tomi’s Home Assistant in 2024

Published December 12, 2024

The state of Tomi's Home Assistant in 2024

Wow, this is some setup. Really quite a lot of automation! I note that his Mitsubishi heat pump and Midea dehumidifier have wifi control; I can see that being useful.

Tags: home-automation ha home home-assistant hacks automation

Kaffekapslen

Published December 10, 2024

Kaffekapslen

A Danish web shop selling coffee beans and capsules; my new go-to for Lavazza, with free shipping to Ireland for orders over EUR50.

Tags: shopping coffee ireland denmark coffee-beans

Pleias language models

Published December 9, 2024

Pleias language models

OK, this is quite cool: "the first ever [language] models trained exclusively on open data, meaning data that are either non-copyrighted or are published under a permissible license. These are the first fully EU AI Act compliant models. In fact, Pleias sets a new standard for safety and openness."

Training large language models required copyrighted data until it did not. Today we release Pleias 1.0 models, a family of fully open small language models. Pleias 1.0 models include three base models: 350M, 1.2B, and 3B parameters. They feature two specialized models for knowledge retrieval with unprecedented performance for their size on multilingual Retrieval-Augmented Generation, Pleias-Pico (350M parameters) and Pleias-Nano (1.2B parameters). [...]

Our models are:

* multilingual, offering strong support for multiple European languages;
* safe, showing the lowest results on the toxicity benchmark;
* performant for key tasks, such as knowledge retrieval;
* able to run efficiently on consumer-grade hardware locally (CPU-only, without quantisation)

Pleias 1.0 family embodies a new approach to specialized small language models, for end applications: wound-up models. We have implemented a set of ideas and solutions during pretraining that produce a frugal yet powerful language model specifically optimized for further RAG implementations. We release two wound-up models further trained for Retrieval Augmented Generation (RAG): Pleias-pico-350m-RAG and Pleias-nano-1B-RAG. These models are designed to be implemented locally, so we prioritized frugal implementation. As our models are small, they can run smoothly, even on devices with limited RAM.

And here's their fully open training set: https://huggingface.co/datasets/PleIAs/common_corpus

Tags: llms models huggingface ai pleias rag ai-act open-data

UK benefits AI system found to show bias

Published December 7, 2024

UK benefits AI system found to show bias

File this under "the least surprising news ever":

An artificial intelligence system used by the UK government to detect welfare fraud is showing bias according to people’s age, disability, marital status and nationality, the Guardian can reveal. An internal assessment of a machine-learning programme used to vet thousands of claims for universal credit payments across England found it incorrectly selected people from some groups more than others when recommending whom to investigate for possible fraud.

The most interesting aspect of the report published is that currently "there is no established numerical or statistical benchmark at which referral or outcome disparity can be defined as within tolerance".

I would have assumed that a lack of bias, measured against a "false positive" rate (ie. benefits recipients who were selected for additional checks, and were then found to be legitimate and not committing fraud), would have been a design goal, and a critical KPI, for such a system.

There are going to be a lot of similar examples in the years to come -- here's hoping this "bias measurement" KPI becomes established as a concept.
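
To make the idea concrete, here's a small Python sketch of what such a KPI could look like. This is entirely my own illustration, not anything from the DWP report: it assumes the rate is computed per group as "legitimate claimants flagged for checks, divided by all legitimate claimants in that group" (which in turn assumes outcomes are known for unflagged claimants too, e.g. from random audits), and all names and figures are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Case(NamedTuple):
    group: str        # hypothetical protected-characteristic category
    flagged: bool     # selected by the model for additional checks
    fraudulent: bool  # outcome once the claim has been checked or audited


def false_positive_rates(cases: Iterable[Case]) -> Dict[str, float]:
    """Per group: flagged legitimate claimants / all legitimate claimants."""
    flagged_legit: Dict[str, int] = defaultdict(int)
    all_legit: Dict[str, int] = defaultdict(int)
    for case in cases:
        if not case.fraudulent:
            all_legit[case.group] += 1
            if case.flagged:
                flagged_legit[case.group] += 1
    return {group: flagged_legit[group] / n for group, n in all_legit.items()}


def disparity_ratio(rates: Dict[str, float]) -> float:
    """Highest group rate divided by the lowest; 1.0 means no measured disparity."""
    lowest, highest = min(rates.values()), max(rates.values())
    return float("inf") if lowest == 0 else highest / lowest


if __name__ == "__main__":
    # Made-up data: 100 legitimate claimants in each of two groups.
    cases = (
        [Case("A", True, False)] * 5 + [Case("A", False, False)] * 95
        + [Case("B", True, False)] * 15 + [Case("B", False, False)] * 85
    )
    rates = false_positive_rates(cases)
    print(rates, f"disparity ratio: {disparity_ratio(rates):.2f}")
```

The missing piece identified in the report is the tolerance: some agreed threshold on a number like this disparity ratio, above which the system is considered to be failing.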

Tags: bias ai kpis dwp uk benefits welfare fraud ml

Ridding My Home Network of IP Addresses

Published December 7, 2024

Ridding My Home Network of IP Addresses

(Republishing this one on the blog, instead of just as a gist)

Recent changes in the tech scene have made it clear that relying on commercial companies to provide services I rely on isn't a good strategy in the long term, and given that Tailscale is so effective these days as a remote-access system, I've gradually been expanding a small collection of self-hosted web apps and services running on my home network.

Until now they've mainly been addressed using their IP addresses and random high ports on the internal LAN, for example:

Pihole: http://10.19.72.7/admin
Home Assistant: http://10.19.72.11:8123/
Linkding: http://10.19.72.6:9092/
Grafana: http://10.19.72.6:3000/
(plus a good few others)

Needless to say this is a bit messy and inelegant, so I've been planning to sort it out for a while. My requirements:

no more ugly bare IP addresses!
a DNS domain;
with HTTPS URLs;
one per service;
no visible port numbers;
fully valid TLS certs, no having to click through warnings or install funny CA certs;
accessible regardless of which DNS server is in use -- ie. using public DNS records. This may seem slightly unusual, but it's useful so that the internal services can still be accessed when I'm using my work VPN (which forces its own DNS servers);
accessible internally;
accessible externally, over Tailscale;
not accessible externally without Tailscale.

After a few false starts, I'm pretty happy with the current setup, which uses Caddy.

Hosting The Domain At Cloudflare

First off, since the service URLs are not to be accessible externally without Tailscale active, the HTTP challenge approach to provision Let's Encrypt certs cannot be used. That would require an open-to-the-internet publicly-accessible HTTP server on my home network, which I absolutely want to avoid.

In order to use the ACME DNS challenge instead, I set up my public domain "taint.org" to use Cloudflare as the authoritative DNS server (in Cloudflare terms, "full setup"). This lets Caddy edit the DNS records via the Cloudflare API to handle the ACME challenge process.

One of the internal hosts is needed to run the Caddy server's reverse proxies; I picked "hass", 10.19.72.11, the Home Assistant host, which didn't have anything already running on port 80 or port 443. (All of my internal hosts are running on a private /24 IP range, at 10.19.72.0/24.)

The dedicated DNS domain I'm using for my home services is "home.taint.org". In order to use this, I clicked through to the Cloudflare admin panel and created a DNS record as follows:

Type   Name      Content             Proxy Status               TTL
A      *.home    10.19.72.11         DNS only - reserved IP     Auto

Now, any hostnames under "home.taint.org" will return the IP 10.19.72.11 (where Caddy will run).

I don't particularly care about exposing my internal home network IPs to the world, as a trade-off to allow the URLs to work even if an internal host is using the work VPN, or resolving with 8.8.8.8, or whatever. That's worth missing out on a little bit of paranoia, since the IPs won't be accessible from outside without Tailscale anyway.

It is worth noting that the Cloudflare-hosted domain doesn't have to be the same one used for URLs in the home network; using dns_challenge_override_domain you can delegate the ACME challenge from any "home" domain to one which is hosted in Cloudflare.

The Caddy Setup

One wrinkle is that I had to generate a custom Caddy build in order to get the "dns.providers.cloudflare" non-standard module, from https://caddyserver.com/download . This is a click-and-download page which generates a custom Caddy binary on the fly. It would have been nicer if the Cloudflare module was standard, but hey.

Once that's installed, I can get this output:

$ /usr/local/bin/caddy list-modules
[long list of standard modules omitted]

dns.providers.cloudflare
dns.providers.route53

  Non-standard modules: 2

  Unknown modules: 0

(Yes, I have Caddy running as a normal service, not as a Docker container. No particular reason; I think Docker should work fine.)

Go to the Cloudflare account dashboard, and create a user API token as described at https://developers.cloudflare.com/fundamentals/api/get-started/create-token/ . In my case, it has Zone / DNS / Edit permission, on the specific zone taint.org.

Copy that token as it's needed in the "Caddyfile", which now looks like the following:

hass.home.taint.org {
        tls {
                dns cloudflare cloudflare_api_token_goes_here
        }
        reverse_proxy /* 10.19.72.11:8123
}

links.home.taint.org {
        tls {
                dns cloudflare cloudflare_api_token_goes_here
        }
        reverse_proxy /* 10.19.72.6:9092
}

pi.home.taint.org {
        tls {
                dns cloudflare cloudflare_api_token_goes_here
        }
        redir / /admin/
        reverse_proxy /admin/* 10.19.72.7:80
}

grafana.home.taint.org {
        tls {
                dns cloudflare cloudflare_api_token_goes_here
        }
        reverse_proxy /* 10.19.72.6:3000
}

[many other services omitted]

Running sudo caddy run in the same dir will start up and verbosely log what it's doing. (Once you're happy enough, you can get Caddy running in the normal systemd service way.)
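
Since most of those site blocks are identical apart from the hostname and upstream, they could be generated rather than hand-written. Here's a minimal Python sketch of that idea (my own addition, not part of the setup described above); it only covers the plain reverse-proxy case, so something like the Pihole block with its redir would still need to be written by hand, and the function names are hypothetical.

```python
from typing import Dict

DOMAIN = "home.taint.org"
TOKEN_PLACEHOLDER = "cloudflare_api_token_goes_here"  # never commit the real token

# Service name -> upstream host:port, taken from the examples above.
SERVICES: Dict[str, str] = {
    "hass": "10.19.72.11:8123",
    "links": "10.19.72.6:9092",
    "grafana": "10.19.72.6:3000",
}


def site_block(name: str, upstream: str, domain: str = DOMAIN,
               token: str = TOKEN_PLACEHOLDER) -> str:
    """Render one Caddyfile site block that reverse-proxies everything upstream."""
    indent = " " * 8
    return "\n".join([
        f"{name}.{domain} {{",
        f"{indent}tls {{",
        f"{indent}{indent}dns cloudflare {token}",
        f"{indent}}}",
        f"{indent}reverse_proxy /* {upstream}",
        "}",
    ])


def render_caddyfile(services: Dict[str, str]) -> str:
    """Render all site blocks, separated by blank lines."""
    return "\n\n".join(site_block(n, u) for n, u in services.items()) + "\n"


if __name__ == "__main__":
    print(render_caddyfile(SERVICES), end="")
```

Caddy also has its own mechanisms for cutting down this kind of repetition, which may well be a better fit than an external script.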

After setting those up, I now have my services accessible locally as:

Home Assistant: https://hass.home.taint.org/
Pihole: https://pi.home.taint.org/
Grafana: https://grafana.home.taint.org/
Linkding: https://links.home.taint.org/

Caddy seamlessly goes off and configures fully valid TLS certs with no fuss. I found it much tidier than Certbot, or Nginx Proxy Manager.

The Tailscale Setup

So this has now sorted out all of the requirements bar one:

accessible externally, over Tailscale.

To do this I had to log into Tailscale's admin console and go to https://login.tailscale.com/admin/machines , pick a host on the 10.19.72.0/24 internal LAN, click its dropdown menu and "Edit Route Settings...", and enable a Subnet Route for 10.19.72.0/24. By doing this, all of the services under home.taint.org are now accessible remotely, at the same URLs, once Tailscale is enabled; I don't even need to use ts.net names to access them! Perfect.
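
The reason this works is simply that the wildcard DNS record points at an address inside the routed subnet. A small Python sanity check of that, using the addresses from this post (the helper name is hypothetical):

```python
import ipaddress
from typing import Iterable, List

SUBNET_ROUTE = ipaddress.ip_network("10.19.72.0/24")  # the Tailscale subnet route
CADDY_HOST = "10.19.72.11"                            # target of *.home.taint.org
UPSTREAMS = ["10.19.72.11", "10.19.72.6", "10.19.72.7"]


def hosts_outside_route(hosts: Iterable[str], route: ipaddress.IPv4Network) -> List[str]:
    """Return the hosts which would not be reachable via the subnet route."""
    return [h for h in hosts if ipaddress.ip_address(h) not in route]


if __name__ == "__main__":
    missing = hosts_outside_route([CADDY_HOST, *UPSTREAMS], SUBNET_ROUTE)
    print("unreachable over the subnet route:", missing or "none")
    # The address is in private (RFC 1918) space, so publishing it in public
    # DNS does not make it routable from the internet at large.
    print("Caddy host is private:", ipaddress.ip_address(CADDY_HOST).is_private)
```

Strictly speaking only the Caddy host needs to be reachable by remote clients, since it proxies to the upstreams from inside the LAN.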

Anyway, that's the setup -- hopefully this writeup will help others. And kudos to Caddy, Let's Encrypt and Tailscale for making this relatively easy.

GenCast

Published December 6, 2024

GenCast

Google DeepMind announce their new AI model for weather forecasting, in collaboration with the ECMWF:
Today, in a paper published in Nature, we present GenCast, our new high resolution (0.25°) AI ensemble model. GenCast provides better forecasts of both day-to-day weather and extreme events than the top operational system, the European Centre for Medium-Range Weather Forecasts’ (ECMWF) ENS, up to 15 days in advance. We’ll be releasing our model’s code, weights, and forecasts, to support the wider weather forecasting community. [...] GenCast is a diffusion model, the type of generative AI model that underpins the recent, rapid advances in image, video and music generation. However, GenCast differs from these, in that it’s adapted to the spherical geometry of the Earth, and learns to accurately generate the complex probability distribution of future weather scenarios when given the most recent state of the weather as input. To train GenCast, we provided it with four decades of historical weather data from ECMWF’s ERA5 archive. This data includes variables such as temperature, wind speed, and pressure at various altitudes. The model learned global weather patterns, at 0.25° resolution, directly from this processed weather data.
It's open source: https://github.com/google-deepmind/graphcast
And here are the open-released model weights: https://console.cloud.google.com/storage/browser/dm_graphcast
GraphCast (the previous iteration) has public forecasts published at https://charts.ecmwf.int/?query=GraphCast , under a CC-BY-NC-SA-4 licence -- it would be great if the GenCast forecasts join this data set.
Paper: https://arxiv.org/abs/2312.15796
This all looks really great, a fantastic commitment to (genuine) openness and open data, and the paper seems rigorous (to this amateur). Great stuff.

(tags: forecasting weather ai gencast graphcast deepmind google ecmwf genai)

TikTok in hot water over Romanian elections

Published December 4, 2024

TikTok in hot water over Romanian elections

‘We are getting fed up’: EU lawmakers snap at TikTok over Romanian election:
For years, the Chinese-owned social media app has brushed off security concerns in the United States and Europe that it could be used for mass manipulation and espionage. It now faces an intense regulatory storm in Bucharest over whether it played a role in skewing the democratic process in an EU country and NATO member of 19 million people. [....] "Honestly speaking, we are getting fed up by the documents and the empty promises," Swedish center-right European lawmaker Arba Kokalari said near the end of the hearing.

(tags: tiktok elections romania eu bias news propaganda democracy social-media)

noyb is now qualified to bring collective redress actions

Published December 3, 2024

noyb is now qualified to bring collective redress actions

"noyb is now approved as a so-called "Qualified Entity" to bring collective redress actions in courts throughout the European Union. Such action under Directive (EU) 2020/1828 can either be an "injunction" or a "redress" measure. "Injunctions" generally prohibit a company from engaging in illegal practices, including any GDPR violations. "Redress" measures allow a European version of a "Class Action", where thousands or millions of users could be represented by noyb and for example ask for non-material damages when their personal data was unlawfully processed." This is very interesting -- and timely, given the mass scraping of user data to feed AI training sets...

(tags: noyb data-privacy data-protection class-actions law eu collective-redress)

Privacy Disasters: FaceHuggers Are Eating Your Skeets

Published December 2, 2024

Privacy Disasters: FaceHuggers Are Eating Your Skeets

Good take from Carey Lening on the recent Hugging Face release of a million-BlueSky-post dataset:
Once again, we’ve got a collective action problem that’s being ignored in favor of technological progress, big money, data extraction, and libertarian notions of ‘public data’. It’s a shitty look. Both Bluesky and HF are acting like the host who’s egging the dickheads on, and it’s really disappointing as a user to know that this is probably what we should have expected all along.

(tags: data hugging-face ai training bluesky public data-protection privacy datasets scraping)

The Buddhabrot

Published November 27, 2024

The Buddhabrot

This was news to me! There's another fractal pattern derived from the Mandelbrot set which I'd never seen before:
As it turns out, it’s not just the boundary of the Mandelbrot set that’s mind-bogglingly complex: the same goes for the (xn, yn) escape trajectories associated with the (u, v) pixels near the set’s edge. The iterated coordinates follow elaborate, long-winded paths through space; their ethereal trails form a density plot reminiscent of the Mandelbrot fractal itself.
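To get a feel for how that density plot is built, here is a minimal pure-Python sketch, not the code from the linked article. The grid size, sample count, iteration limit, and the sampled region are all arbitrary choices of mine: it picks random points c, iterates z -> z*z + c, and for orbits that escape, counts every point the trajectory visited.

```python
import math
import random


def escape_trajectory(c, max_iter):
    """Return the points visited by z -> z*z + c if the orbit escapes, else []."""
    z = 0j
    path = []
    for _ in range(max_iter):
        z = z * z + c
        path.append(z)
        if abs(z) > 2.0:
            return path  # escaped: this trajectory contributes to the plot
    return []  # never escaped within max_iter: treated as inside the set


def buddhabrot(size=64, samples=20000, max_iter=50, seed=1):
    """Accumulate escape trajectories into a size x size grid of hit counts."""
    rng = random.Random(seed)
    counts = [[0] * size for _ in range(size)]
    for _ in range(samples):
        # Sample c from an assumed region around the Mandelbrot set.
        c = complex(rng.uniform(-2.0, 1.0), rng.uniform(-1.5, 1.5))
        for z in escape_trajectory(c, max_iter):
            col = math.floor((z.real + 2.0) / 3.0 * size)
            row = math.floor((z.imag + 1.5) / 3.0 * size)
            if 0 <= row < size and 0 <= col < size:
                counts[row][col] += 1
    return counts


if __name__ == "__main__":
    grid = buddhabrot(size=40, samples=20000)
    peak = max(max(row) for row in grid) or 1
    shades = " .:-=+*#%@"
    for row in grid:
        print("".join(shades[min(9, v * 10 // peak)] for v in row))
```

At this resolution the output is only a crude ASCII blob; the ethereal look in the real renders comes from far more samples, higher iteration limits, and a proper image rather than ten shades of text.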

(tags: fractals mandelbrot buddhabrot graphics maths via:lcamtuf)

Rewilding fields massively improved bumblebee numbers in Scotland

Published November 27, 2024

Rewilding fields massively improved bumblebee numbers in Scotland

"Bumblebee population increases 116 times over in 'remarkable' Scotland project":
Rewilding Denmarkfield, a 90-acre project based just north of Perth, has been working to restore nature to green spaces in an increasingly built up area for the past two years. Statistics from the charity show in 2021, when some of the fields managed by the project were still barley monoculture, only 35 bumblebees were counted. But by 2023, after just two years of nature restoration work in the same fields, the population increased to 4,056. The diversity of bumblebee also doubled, according to the charity, from five to ten different species.

(tags: bees bumblebees scotland fields farming rewilding fallow nature)

WeSQL

Published November 26, 2024

WeSQL

"an innovative MySQL distribution that adopts a compute-storage separation architecture, with storage backed by S3 (and S3-compatible systems). WeSQL has completely replaced MySQL’s traditional disk storage with S3. All MySQL data—binlogs, schemas, storage engine metadata, WAL, and data files—are entirely (not partially!) stored as objects in S3. The 11 nines of durability provided by S3 significantly enhances data reliability. Additionally, WeSQL can start from a clean, empty instance, connect to S3, load the data, and begin serving immediately with no additional setup required. It is ideal for users who need an easy-to-manage, cost-effective, and developer-friendly MySQL database solution, especially for those needing support for both Serverless and BYOC (Bring Your Own Cloud)." (via Ian on ITC)

(tags: mysql s3 object-storage storage databases sql)

ZetaOffice

Published November 25, 2024

ZetaOffice

LibreOffice in the browser, compiled to WASM and available as open source, or as a supported product. (via David Gerard)

(tags: libreoffice wasm web javascript compilation)