
Justin's Linklog Posts

Grafana and ClickHouse

Watch Duty

  • Watch Duty

    Nice to see an important public need being met here:

    The [Watch Duty] app gives users the latest alerts about fires in their area [in California] and has become a vital service for millions of users in the western U.S. struggling with the seemingly constant threat of deadly wildfires—one major reason it had over 360,000 unique visits from 8:00-8:30 a.m. local time Wednesday. And the man behind Watch Duty promises that as a nonprofit, his organization has no plans to pull an OpenAI and become a profit-seeking enterprise.

    Tags: non-profits tech watch-duty apps mobile public-good

Steve Jobs vs Ireland

  • Steve Jobs vs Ireland

    This is a great Steve Jobs story, from the engineer who wrote v1 of the Mac OS X Dock:

    At one point during a trip over, Steve was talking to Bas and asked how things were coming along with the Dock. He replied something along the lines of “going well, the engineer is over from Ireland right now, etc”. Steve left, and then visited my manager’s manager’s manager and said the fateful words (as reported to me by people who were in the room where it happened).

    “It has come to my attention that the engineer working on the Dock is in FUCKING IRELAND”.

    I was told that I had to move to Cupertino. Immediately. Or else.

    I did not wish to move to the States. I liked being in Europe. Ultimately, after much consideration, many late night conversations with my wife, and even buying a guide to moving, I said no.

    They said ok then. We’ll just tell Steve you did move.

    (via Niall Murphy)

    Tags: macos america osx apple history steve-jobs

Court docs allege Meta trained LLM models using pirated book trove

  • Court docs allege Meta trained LLM models using pirated book trove

    This is pretty massive:

    The [court] document claims that Meta decided to download documents from Library Genesis — aka. “LibGen” — to train its models. LibGen is the subject of a lawsuit brought by textbook publishers who believe it happily hosts and distributes [pirated] works [….]

    The filing from plaintiffs in the Kadrey case claims that documents produced by Meta […] describe internal debate about accessing LibGen, a little squeamishness about using BitTorrent in the office to do so, and eventual escalation to “MZ” [Mark Zuckerberg himself], who approved use of the contentious resource. […]

    Another filing claims that a Meta document describes how it removed copyright notifications from material downloaded from LibGen, and suggests the company did so because it realized including such text could mean a model’s output would reveal it was trained on copyrighted material.

    US District Court Judge Vince Chhabria also noted that in one of the documents Meta wants to seal, an employee wrote the following:

    “If there is media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, this may undermine our negotiating position with regulators on these issues.”

    No shit.

    Tags: piracy meta copyright mark-zuckerberg law llama training libgen books

Bufferbloat Test

  • Bufferbloat Test

    A handy tool to test your internet connection for “bufferbloat”, the error condition involving “undesirable high latency caused by other traffic on your network. It happens when a flow uses more than its fair share of the bottleneck. Bufferbloat is the primary cause of bad performance for real-time Internet applications like VoIP calls, video games, and videoconferencing.”

    (My home internet connection is currently rating a C: “your latency increased considerably under load”, jumping from a min/mean/p95/max of 10.7, 16.9, 23.7, 30.1ms to 35.3, 98.4, 121.0, 286.0ms under load, yikes, so looks like I need to do some optimising.)
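To put those figures side by side, here is a small sketch that computes the increase under load for each statistic. It only uses the numbers quoted above; the variable names are arbitrary.

```python
# Latency figures quoted above, in milliseconds.
STATS = ("min", "mean", "p95", "max")
idle = dict(zip(STATS, (10.7, 16.9, 23.7, 30.1)))
loaded = dict(zip(STATS, (35.3, 98.4, 121.0, 286.0)))

for stat in STATS:
    increase = loaded[stat] - idle[stat]   # absolute increase in ms
    factor = loaded[stat] / idle[stat]     # how many times worse under load
    print(f"{stat:>4}: +{increase:6.1f} ms  ({factor:.1f}x)")
```

The mean, for example, rises by 81.5ms under load.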

    Tags: bufferbloat internet networking optimisation performance testing tools

Waymos don’t stop for pedestrians

Garbage Day on Meta’s moderation plans

  • Garbage Day on Meta’s moderation plans

    This is 100% spot on, I suspect, regarding Meta’s recently-announced plans to give up on content moderation:

    After 2021, the major tech platforms we’ve relied on since the 2010s could no longer pretend that they would ever be able to properly manage the amount of users, the amount of content, the amount of influence they “need” to exist at the size they “need” to exist at to make the amount of money they “need” to exist.

    And after sleepwalking through the Biden administration and doing the bare minimum to avoid any fingers pointed their direction about election interference last year, the companies are now fully giving up. Knowing the incoming Trump administration will not only not care, but will even reward them for it.

    The question now is, what will the EU do about it? This is a flagrant raised finger in the face of the Digital Services Act.

    Tags: moderation content ugc meta future dsa eu garbage-day

“uhtcearu”

ads.txt for a site with no ads

  • ads.txt for a site with no ads

    Don Marti: “since there’s a lot of malarkey in the online advertising business, I’m putting up this file [on my website] to let the advertisers know that if someone sold you an ad and claimed it ran on here, you got burned.”

    The format is defined in a specification from the IAB Tech Lab. The important part is the last line. The placeholder is how you tell the tools that are supposed to be checking this stuff that you don’t have ads.
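As a rough illustration of how a checking tool might recognise such a file, here is a minimal Python sketch. It assumes the placeholder record takes the form given in the IAB ads.txt spec (`placeholder.example.com, placeholder, DIRECT, placeholder`); the sample file contents and the function names are made up for this example and are not Don Marti's actual file.

```python
# Sketch only: detect an ads.txt that declares "no authorized sellers".
# Assumption: the placeholder record has the form given in the IAB ads.txt
# spec, i.e. "placeholder.example.com, placeholder, DIRECT, placeholder".
PLACEHOLDER = ("placeholder.example.com", "placeholder", "DIRECT", "placeholder")


def parse_records(text):
    """Return the data records in an ads.txt body as tuples of fields."""
    records = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" in line:
            continue  # skip blank lines and variables such as CONTACT=...
        fields = tuple(field.strip() for field in line.split(","))
        if len(fields) >= 3:
            records.append(fields)
    return records


def declares_no_ads(text):
    """True if the only data record is the placeholder record."""
    records = parse_records(text)
    return len(records) == 1 and records[0][:4] == PLACEHOLDER


# Hypothetical file contents, for illustration only.
SAMPLE = """\
# ads.txt for a site with no ads
CONTACT=webmaster@example.com
placeholder.example.com, placeholder, DIRECT, placeholder
"""

print(declares_no_ads(SAMPLE))
```

A file listing any real seller record alongside (or instead of) the placeholder would make `declares_no_ads` return False.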

    Tags: ads don-marti hacks ads-txt web

Hoarder

  • Hoarder

    “Quickly save links, notes, and images and hoarder will automatically tag them for you using AI for faster retrieval. Built for the data hoarders out there!”

    Self-hosted (with a docker-compose file), open-source link hoarding tool; intriguingly, this scrapes links, extracts text and images, generates automated tag suggestions using OpenAI or a local ollama LLM, and indexes the page’s full text using Meilisearch, which seems to be a speedy incremental search. Could be a great place to gateway links from this blog into a super-searchable form. hmm

    Tags: links archiving bookmarks web search hoarder docker ai

The AI We Deserve

  • The AI We Deserve

    A very thought-provoking essay from Evgeny Morozov on AI, LLMs and their embodied political viewpoint:

    Sure, I can build a personalized language learning app using a mix of private services, and it might be highly effective. But is this model scalable? Is it socially desired? Is this the equivalent of me driving a car where a train might do just as well? Could we, for instance, trade a bit of efficiency and personalization to reuse some of the sentences or short stories I’ve already generated in my app, reducing the energy cost of re-running these services for each user?

    This takes us to the core problem with today’s generative AI. It doesn’t just mirror the market’s operating principles; it embodies its ethos. This isn’t surprising, given that these services are dominated by tech giants that treat users as consumers above all. Why would OpenAI, or any other AI service, encourage me to send fewer queries to their servers or reuse the responses others have already received when building my app? Doing so would undermine their business model, even if it might be better from a social or political (never mind ecological) perspective. Instead, OpenAI’s API charges me— and emits a nontrivial amount of carbon emissions— even to tell me that London is the capital of the UK or that there are one thousand grams in a kilogram.

    For all the ways tools like ChatGPT contribute to ecological reason, then, they also undermine it at a deeper level—primarily by framing our activities around the identity of isolated, possibly alienated, postmodern consumers. When we use these tools to solve problems, we’re not like Storm’s carefree flâneur, open to anything; we’re more like entrepreneurs seeking arbitrage opportunities within a predefined, profit-oriented grid. [….]

    The Latin American examples give the lie to the “there’s no alternative” ideology of technological development in the Global North. In the early 1970s, this ideology was grounded in modernization theory; today, it’s rooted in neoliberalism. The result, however, is the same: a prohibition on imagining alternative institutional homes for these technologies. There’s immense value in demonstrating—through real-world prototypes and institutional reforms—that untethering these tools from their market-driven development model is not only possible but beneficial for democracy, humanity, and the planet.

    Tags: technology ai history eolithism neoliberalism llms openai cybernetics hans-otto-storm cybersyn

Brian Eno on AI

  • Brian Eno on AI

    In my own experience as an artist, experimenting with AI has mixed results. I’ve used several “songwriting” AIs and similar “picture-making” AIs. I’m intrigued and bored at the same time: I find it quickly becomes quite tedious. I have a sort of inner dissatisfaction when I play with it, a little like the feeling I get from eating a lot of confectionery when I’m hungry. I suspect this is because the joy of art isn’t only the pleasure of an end result but also the experience of going through the process of having made it. When you go out for a walk it isn’t just (or even primarily) for the pleasure of reaching a destination, but for the process of doing the walking. For me, using AI all too often feels like I’m engaging in a socially useless process, in which I learn almost nothing and then pass on my non-learning to others. It’s like getting the postcard instead of the holiday. […]

    All that said, I do believe that AI tools can be very useful to an artist in making it possible to devise systems that see patterns in what you are making and drawing them to your attention, being able to nudge you into territory that is unfamiliar and yet interestingly connected. I say this having had some good experiences in my own (pre-AI) experiments with Markov chain generators and various crude randomizing procedures. […]

    To make anything surprising and beautiful using AI you need to prepare your prompts extremely carefully, studiously closing off all the yawning, magnetic chasms of Hallmark mediocrity. If you don’t want to get moon rhyming with June, you have to give explicit instructions like, “Don’t rhyme moon with June!” And then, at the other end of the process, you need to rigorously filter the results. Now and again, something unexpected emerges. But even with that effort, why would a system whose primary programming is telling it to take the next most probable step produce surprising results? The surprise is primarily the speed and the volume, not the content. 

    Tags: play process technology culture future art music ai brian-eno creation

Principal Engineer Roles

  • Principal Engineer Roles

    From AWS VP of Technology, Mae-Lan Tomsen Bukovec — a set of roles which a Principal Engineer can play to get projects done:

    Sponsor: A Sponsor is a project/program lead, spanning multiple teams. Yes, this role can be played by a manager but it does not have to be (at least not at Amazon). If you are a Sponsor, you have to make sure decisions are made and that people aren’t stuck in analysis paralysis. This doesn’t mean that you yourself make those decisions (that’s often a Tie-breaker’s role which you may or may not be here). But you have to drive making sure decisions get made, which can mean owning those decisions, escalating to the right people, or whatever it takes to get it done.

    A Sponsor is constantly clearing obstacles and getting things moving. It is a time-consuming role. You shouldn’t have time to act as Guide or a Sponsor on more than two projects combined, and you don’t have to be a Sponsor every year. But if a few years go by, and you haven’t been a Sponsor, it might be time to think about where you can step in and play that role. It tends to build new skills because you have to operate in different dimensions to land the right outcomes for the project.

    Guide: Guides tend to be domain experts that are deeply involved in the architecture of a project. Guide will often drive the design but they’re not “The Architect.” A Guide often works through others to produce the designs, and themselves produce exemplary artifacts, like design docs or bodies of code. The code produced by a Guide is usually illustrative of a broader pattern or solving a difficult problem that the rest of the team will often run with afterwards. The difference between a Guide and a Sponsor is that the Guide focuses on the technical path for the project, and the Sponsor owns all aspects of project delivery, including product definition and organizational alignment.

    Guides influence teams. If you are influencing individuals, you’re likely being a mentor and not a Guide. A Guide is a time-consuming role. You shouldn’t have time to Guide more than two projects, and that drops to one project if you are a Sponsor at the same time.

    Catalyst: A Catalyst gets an idea off the ground, and it’s not always their idea. In my experience, the idea might not even come from the Catalyst—it can be something we’ve been talking about doing for years but never really got off the ground. Catalysts will create docs or prototypes and drive discussions with senior decision makers to think through the concept. Catalysts are not just “idea factories.” They take the time to develop the concept, drive buy-in for the idea, and work with the larger leadership team to assign engineers to deliver the project.

    A Catalyst is a time-consuming role because of all the work that needs to be done. At Amazon, that involves prototypes, docs and discussions. It is hard to effectively Catalyze more than one or two things at once. It is important to note that Catalysts, like Tie-breakers, are not permanent roles. Once a project is catalyzed (e.g., in engineering with a dedicated team working on the project), a Catalyst moves out of the role. The Catalyst might take on a Guide or Sponsor role on the project, or not. Not every project needs a Catalyst. A Catalyst is a very helpful (arguably critical) role for your most ambitious, complex, and/or ambiguous problems to solve in the organization.

    Tie Breaker: A Tie-Breaker makes a decision after a debate. At Amazon, that means deeply understanding the different positions, weighing in with a choice, and then formally closing it out with an email or a doc to the larger group. Not every project needs a Tie-Breaker. But if your project gets stuck in a consensus-seeking mode without making progress on hard decisions, a senior engineer might have to step in as a Tie-Breaker. Tie-breakers own breaking a log-jam on direction in the team by making a decision. Obviously, a Tie Breaker has to have great judgment. But, it is incredibly important that the Tie-Breaker listens well and understands all the nuances to the different positions as part of breaking the tie. When a Tie-Breaker drives a choice, they must bring other engineers into their thought process so that all the engineers in the debate understand the “why” behind the choice even if some are disappointed by the direction. A Tie-Breaker must have strong engineering and organizational acumen in this role.

    Sometimes an organization will depend on a small set of senior engineers to play the role of Tie-Breaker because they are so good at it. As a successful Tie-Breaker, you want to be careful not to set a tone that every decision, no matter how small, must go through you. You’ll quickly transition from Tie-Breaker to a “decision bottleneck” at that point—and that is not a role any team needs. If a team finds itself frequently seeking out a Tie-Breaker, it could be a sign that the team needs help understanding how to make decisions. That’s a topic for a different time. The Tie-Breaker role is considered a “moment in time” role, versus Sponsor/Guide which are ongoing until you reach a milestone. Once the decision is made and closed out, you’re no longer the Tie-Breaker.

    Catcher: A Catcher gets a project back on track, often from a technical perspective. It requires high judgement because a Catcher drives prioritization and formulating a pragmatic plan under tight deadlines. Catchers must quickly do their own detailed analysis to understand the nuances of the problem and come up with the path forward in the right timeframe. As a comparison, a Tie-breaker tends to step in when the pros/cons of the different approaches are well known and the team needs to make a hard decision. Once “caught” (i.e., the project is back on track and moving forward), a project doesn’t need the Catcher anymore.

    Sometimes Principal Engineers can do too much catching. Don’t get me wrong, we are all Catchers sometimes—including me. Any fast-paced business needs Catchers in engineering and management. It teaches important skills about leadership in difficult moments and helps the business by landing deliverables. It also teaches you what not to do next time. However, it is better to generalize a Catcher skill set across more engineers and not depend on a small set to Principal Engineers as Catchers. If a Principal Engineer plays Catcher all the time through a succession of projects, it leaves no time to develop skills in other roles.

    Participant: A participant works on something without one of these explicitly assigned leadership roles. A Participant can be active or passive. Active participants are hands-on, and do things like spend a few days working through a design discussion or picking up a coding task occasionally on a project, etc. Passive participants offer up a few points in a meeting and move on. In general, if you’re going to participate it’s better to do so actively. Time-boxing some passive participation (e.g., office hours for engineers) can be a useful mechanism to stay connected to the team. However, keep in mind that it is easy for your time to get consumed by being a Participant in too many things.

    (via Marc Brooker)

    Tags: roles principal-engineer work projects project-management amazon aws via:marc-brooker

Inky Frame 7.3″

Sweden’s Suspicion Machine

  • Sweden’s Suspicion Machine

    Here we go, with another predictive algorithm-driven bias machine used to drive refusal of benefits:

    Lighthouse Reports and Svenska Dagbladet obtained an unpublished dataset containing thousands of applicants to Sweden’s temporary child support scheme, which supports parents taking care of sick children. Each of them had been flagged as suspicious by a predictive algorithm deployed by the Social Insurance Agency. Analysis of the dataset revealed that the agency’s fraud prediction algorithm discriminated against women, migrants, low-income earners and people without a university education. Months of reporting — including conversations with confidential sources — demonstrate how the agency has deployed these systems without scrutiny despite objections from regulatory authorities and even its own data protection officer.

    Tags: sweden predictive algorithms surveillance welfare benefits bias data-protection fraud

Thalidomide chirality paradox explained

  • Thalidomide chirality paradox explained

    Molecule chirality (“left-handedness” and “right-handedness”) has been in the news again recently.

    What is little known is the relevance of chirality to the thalidomide disaster. Thalidomide, the drug which was prescribed widely to pregnant women in the 1950s for the treatment of morning sickness, was later discovered to be a chiral molecule: while one mirror-image form, the (R)-enantiomer, was effective, the other, the (S)-enantiomer, was extremely toxic, causing thousands of children around the world to be born with severe birth defects. The mystery is, why didn’t this toxicity emerge during animal experiments? Here’s a paper with a potential explanation:

    Twenty years after the thalidomide disaster in the late 1950s, Blaschke et al. reported that only the (S)-enantiomer of thalidomide is teratogenic [jm: causing birth defects]. However, other work has shown that the enantiomers [“mirror” molecules] of thalidomide interconvert in vivo, which begs the question: why is teratogen activity not observed in animal experiments that use (R)-thalidomide given the ready in vivo racemization (“thalidomide paradox”)? Herein, we disclose a hypothesis to explain this “thalidomide paradox” through the in-vivo self-disproportionation of enantiomers. Upon stirring a 20% ee solution of thalidomide in a given solvent, significant enantiomeric enrichment of up to 98% ee was observed reproducibly in solution. We hypothesize that a fraction of thalidomide enantiomers epimerizes in vivo, followed by precipitation of racemic [equally mixed between R/S forms] thalidomide in (R/S)-heterodimeric form. Thus, racemic thalidomide is most likely removed from biological processes upon racemic precipitation in (R/S)-heterodimeric form. On the other hand, enantiomerically pure thalidomide remains in solution, affording the observed biological experimental results: the (S)-enantiomer is teratogenic, while the (R)-enantiomer is not.

    Tags: chirality thalidomide molecules drugs medicine papers chemistry

UK passes the Online Safety Act

  • UK passes the Online Safety Act

    Apparently “The Online Safety Act applies to every service which handles user-generated content and has “links to the UK”, with a few limited exceptions listed below. The scope is extraterritorial (like the GDPR) so even sites entirely operated outside the UK are in scope if they are considered to have “links to the UK”.”

    A service has links to the UK if any of the following apply:

    – the service has a “significant number” of UK users
    – UK users form one of the target markets for the service
    – the service is accessible to UK users and “there are reasonable grounds to believe that there is a material risk of significant harm to individuals in the UK” (this seems less likely to apply for smaller services but who knows)

    Tags: osa uk safety regulations ofcom

Why did Silicon Valley turn right?

  • Why did Silicon Valley turn right?

    A great essay on the demise of the 1990s/2000s liberal consensus in Silicon Valley:

    No-one now believes – or pretends to believe – that Silicon Valley is going to connect the world, ushering in an age of peace, harmony and likes across nations. […] A decade ago, liberals, liberaltarians and straight libertarians could readily enthuse about “liberation technologies” and Twitter revolutions in which nimble pro-democracy dissidents would use the Internet to out-maneuver sluggish governments. Technological innovation and liberal freedoms seemed to go hand in hand. Now they don’t. Authoritarian governments have turned out to be quite adept for the time being, not just at suppressing dissidence but at using these technologies for their own purposes. Platforms like Facebook have been used to mobilize ethnic violence around the world, with minimal pushback from the platform’s moderation systems […] My surmise is that this shift in beliefs has undermined the core ideas that held the Silicon Valley coalition together. Specifically, it has broken the previously ‘obvious’ intimate relationship between innovation and liberalism. I don’t see anyone arguing that Silicon Valley innovation is the best way of spreading liberal democratic awesome around the world any more, or for keeping it up and running at home. Instead, I see a variety of arguments for the unbridled benefits of innovation, regardless of its benefits for democratic liberalism. I see a lot of arguments that AI innovation in particular is about to propel us into an incredible new world of human possibilities, provided that it isn’t restrained by DEI, ESG and other such nonsense. Others (or the same people) argue that we need to innovate, innovate, innovate because we are caught in a technological arms race with China, and if we lose, we’re toast. 
    Others (sotto or brutto voce; again, sometimes the same people) – contend innovation isn’t really possible in a world of democratic restraint, and we need new forms of corporate authoritarianism with a side helping of exit, to allow the kinds of advances we really need to transform the world.

    Tags: essays henry-farrell tech politics silicon-valley fascism democracy liberalism

Black plastic won’t kill you

  • Black plastic won’t kill you

    How a simple math error sparked a panic about toxic chemicals in black plastic kitchen utensils:

    Plastics rarely make news like this. From Newsmax to Food and Wine, and from the Daily Mail to CNN, the media uptake was enthusiastic on a paper published in October in the peer-reviewed journal Chemosphere. “Your cool black kitchenware could be slowly poisoning you, study says. Here’s what to do,” said the LA Times. “Yes, throw out your black spatula,” said the San Francisco Chronicle. Salon was most blunt: “Your favorite spatula could kill you,” it said. [….] The paper correctly gives the reference dose for BDE-209 as 7,000 nanograms per kilogram of body weight per day, but calculates this into a limit for a 60-kilogram adult of 42,000 nanograms per day. So, as the paper claims, the estimated actual exposure from kitchen utensils of 34,700 nanograms per day is more than 80 per cent of the EPA limit of 42,000. That sounds bad. But 60 times 7,000 is not 42,000. It is 420,000. This is what Joe Schwarcz [director of McGill University’s Office for Science and Society] noticed. The estimated exposure is not even a tenth of the reference dose.
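The arithmetic is easy to check. This short sketch uses only the figures quoted above; the variable names are arbitrary.

```python
# Figures quoted above for BDE-209.
reference_dose_ng_per_kg_day = 7_000   # reference dose per kg of body weight
body_weight_kg = 60                    # adult body weight used in the paper
estimated_exposure_ng_day = 34_700     # estimated intake from kitchen utensils

paper_limit = 42_000                                            # the paper's figure
correct_limit = reference_dose_ng_per_kg_day * body_weight_kg   # 60 x 7,000

share_of_paper_limit = estimated_exposure_ng_day / paper_limit
share_of_correct_limit = estimated_exposure_ng_day / correct_limit

print(f"correct limit: {correct_limit:,} ng/day")
print(f"exposure vs paper's limit: {share_of_paper_limit:.1%}")
print(f"exposure vs correct limit: {share_of_correct_limit:.1%}")
```

With the corrected limit of 420,000 ng/day, the estimated exposure works out at about 8.3% of the reference dose, rather than the 82.6% implied by the paper's figure.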

    Tags: cooking research science plastics errors maths math fail papers

ntfy.sh

  • ntfy.sh

    Send push notifications to your phone via PUT/POST. “a simple HTTP-based pub-sub notification service. It allows you to send notifications to your phone or desktop via scripts from any computer, and/or using a REST API. It’s infinitely flexible, and 100% free software.”

    I’ve been using a personal Slack for this purpose, but this is a decent-sounding alternative.
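For a sense of how little is involved, here is a minimal sketch of publishing a message with nothing but the Python standard library. It assumes the public ntfy.sh server and a made-up topic name; the helper names are hypothetical, and the `Title` header is optional.

```python
import urllib.request

NTFY_SERVER = "https://ntfy.sh"  # assumption: the public server, not self-hosted


def build_notification(topic, message, title=None):
    """Build (but do not send) a POST request that publishes to a topic."""
    headers = {}
    if title:
        headers["Title"] = title  # optional notification title
    return urllib.request.Request(
        f"{NTFY_SERVER}/{topic}",
        data=message.encode("utf-8"),  # the request body is the message text
        headers=headers,
        method="POST",
    )


def send_notification(topic, message, title=None):
    """Send the notification and return the HTTP status code."""
    request = build_notification(topic, message, title)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Example (not run here, since it needs network access):
#   send_notification("my-home-alerts", "Backup finished", title="NAS")
```

Topics on the public server are readable by anyone who knows the name, so something hard to guess is a better choice than the `my-home-alerts` placeholder used in the comment.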

    Tags: notification push alerting open-source android ios push-messaging

Pleias language models

  • Pleias language models

    OK, this is quite cool: “the first ever [language] models trained exclusively on open data, meaning data that are either non-copyrighted or are published under a permissible license. These are the first fully EU AI Act compliant models. In fact, Pleias sets a new standard for safety and openness.”

    Training large language models required copyrighted data until it did not. Today we release Pleias 1.0 models, a family of fully open small language models. Pleias 1.0 models include three base models: 350M, 1.2B, and 3B parameters. They feature two specialized models for knowledge retrieval with unprecedented performance for their size on multilingual Retrieval-Augmented Generation, Pleias-Pico (350M parameters) and Pleias-Nano (1.2B parameters). […]

    Our models are:

    * multilingual, offering strong support for multiple European languages;
    * safe, showing the lowest results on the toxicity benchmark;
    * performant for key tasks, such as knowledge retrieval;
    * able to run efficiently on consumer-grade hardware locally (CPU-only, without quantisation)

    Pleias 1.0 family embodies a new approach to specialized small language models, for end applications: wound-up models. We have implemented a set of ideas and solutions during pretraining that produce a frugal yet powerful language model specifically optimized for further RAG implementations. We release two wound-up models further trained for Retrieval Augmented Generation (RAG): Pleias-pico-350m-RAG and Pleias-nano-1B-RAG. These models are designed to be implemented locally, so we prioritized frugal implementation. As our models are small, they can run smoothly, even on devices with limited RAM.

    And here’s their fully open training set: https://huggingface.co/datasets/PleIAs/common_corpus

    Tags: llms models huggingface ai pleias rag ai-act open-data

UK benefits AI system found to show bias

  • UK benefits AI system found to show bias

    File this under “the least surprising news ever”:

    An artificial intelligence system used by the UK government to detect welfare fraud is showing bias according to people’s age, disability, marital status and nationality, the Guardian can reveal. An internal assessment of a machine-learning programme used to vet thousands of claims for universal credit payments across England found it incorrectly selected people from some groups more than others when recommending whom to investigate for possible fraud.

    The most interesting aspect of the report published is that currently “there is no established numerical or statistical benchmark at which referral or outcome disparity can be defined as within tolerance”.

    I would have assumed that a lack of bias, measured using the “false positive” rate (ie. benefits recipients who were selected for additional checks and then found to be legitimate and not committing fraud), would have been a design goal, and a critical KPI, for such a system.

    There are going to be a lot of similar examples in the years to come — here’s hoping this “bias measurement” KPI becomes established as a concept.
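As a sketch of what such a KPI could look like, the snippet below computes a false positive rate per group and the ratio between the worst and best group. The data is entirely made up, the group labels are placeholders, and the choice of a simple max/min ratio is just one possible way to express disparity, not an established benchmark.

```python
from collections import defaultdict


def false_positive_rates(cases):
    """Per-group share of legitimate claimants who were flagged anyway.

    `cases` is an iterable of (group, flagged, fraudulent) tuples.
    """
    flagged_legit = defaultdict(int)
    legit = defaultdict(int)
    for group, flagged, fraudulent in cases:
        if not fraudulent:
            legit[group] += 1
            if flagged:
                flagged_legit[group] += 1
    return {group: flagged_legit[group] / legit[group] for group in legit}


def disparity(rates):
    """Ratio of highest to lowest false positive rate (1.0 means parity)."""
    lowest, highest = min(rates.values()), max(rates.values())
    if highest == 0:
        return 1.0  # nobody was wrongly flagged in any group
    return float("inf") if lowest == 0 else highest / lowest


# Entirely made-up data: (group, flagged for review?, actually fraudulent?)
CASES = (
    [("A", True, False)] * 5 + [("A", False, False)] * 95 + [("A", True, True)] * 2
    + [("B", True, False)] * 15 + [("B", False, False)] * 85 + [("B", True, True)] * 2
)

rates = false_positive_rates(CASES)
print(rates)
print(disparity(rates))
```

In this invented example, legitimate claimants in group B are wrongly flagged about three times as often as those in group A; a tolerance threshold on that ratio is the kind of benchmark the report says does not yet exist.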

    Tags: bias ai kpis dwp uk benefits welfare fraud ml

Ridding My Home Network of IP Addresses


(Republishing this one on the blog, instead of just as a gist)

Recent changes in the tech scene have made it clear that depending on commercial companies to provide the services I rely on isn’t a good long-term strategy, and given that Tailscale is so effective these days as a remote-access system, I’ve gradually been expanding a small collection of self-hosted web apps and services running on my home network.

Until now they’ve mainly been addressed using their IP addresses and random high ports on the internal LAN, for example:

  1. Pihole: http://10.19.72.7/admin
  2. Home Assistant: http://10.19.72.11:8123/
  3. Linkding: http://10.19.72.6:9092/
  4. Grafana: http://10.19.72.6:3000/
  5. (plus a good few others)

Needless to say this is a bit messy and inelegant, so I’ve been planning to sort it out for a while. My requirements:

  1. no more ugly bare IP addresses!
  2. a DNS domain;
  3. with HTTPS URLs;
  4. one per service;
  5. no visible port numbers;
  6. fully valid TLS certs, no having to click through warnings or install funny CA certs;
  7. accessible regardless of which DNS server is in use — ie. using public DNS records. This may seem slightly unusual, but it’s useful so that the internal services can still be accessed when I’m using my work VPN (which forces its own DNS servers);
  8. accessible internally;
  9. accessible externally, over Tailscale;
  10. not accessible externally without Tailscale.

After a few false starts, I’m pretty happy with the current setup, which uses Caddy.

Hosting The Domain At Cloudflare

First off, since the service URLs are not to be accessible externally without Tailscale active, the HTTP challenge approach to provisioning Let’s Encrypt certs cannot be used. That would require a publicly-accessible HTTP server on my home network, open to the internet, which I absolutely want to avoid.

In order to use the ACME DNS challenge instead, I set up my public domain "taint.org" to use Cloudflare as the authoritative DNS server (in Cloudflare terms, "full setup"). This lets Caddy edit the DNS records via the Cloudflare API to handle the ACME challenge process.
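For background, this is roughly what the DNS challenge asks a client to publish, sketched from the ACME spec (RFC 8555) rather than from Caddy's code. The token and account thumbprint values are placeholders supplied by the ACME server and the client's account key in a real exchange, and the function names are made up.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url without padding, as used throughout ACME."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_record(domain: str, token: str, account_thumbprint: str):
    """Return (record name, TXT value) for an ACME DNS-01 challenge."""
    # The key authorization joins the server's token and the account key thumbprint.
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # For a wildcard name, the record is published on the base domain.
    base = domain[2:] if domain.startswith("*.") else domain
    return "_acme-challenge." + base, b64url(digest)


# Placeholder values, for illustration only.
name, value = dns01_txt_record("*.home.taint.org", "example-token", "example-thumbprint")
print(name)
print(value)
```

The client creates that TXT record through the DNS provider's API, the certificate authority looks it up over public DNS, and no inbound connection to the home network is ever needed.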

One of the internal hosts is needed to run the Caddy server’s reverse proxies; I picked "hass", 10.19.72.11, the Home Assistant host, which didn’t have anything already running on port 80 or port 443. (All of my internal hosts are running on a private /24 IP range, at 10.19.72.0/24.)

The dedicated DNS domain I’m using for my home services is "home.taint.org". In order to use this, I clicked through to the Cloudflare admin panel and created a DNS record as follows:

Type   Name      Content             Proxy Status               TTL
A      *.home    10.19.72.11         DNS only - reserved IP     Auto

Now, any hostname under "home.taint.org" will resolve to the IP 10.19.72.11 (where Caddy will run).

I don’t particularly mind exposing my internal home network IPs to the world; it’s a trade-off that lets the URLs work even if an internal host is using the work VPN, or resolving with 8.8.8.8, or whatever. That’s worth giving up a little bit of paranoia for, since the IPs won’t be accessible from outside without Tailscale anyway.

It is worth noting that the Cloudflare-hosted domain doesn’t have to be the same one used for URLs in the home network; using dns_challenge_override_domain you can delegate the ACME challenge from any "home" domain to one which is hosted in Cloudflare.
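As a rough sketch of that delegation (the domain names below are placeholders, and the details are worth checking against the Caddy docs for the tls directive): assume "home.example.net" is hosted somewhere with no Caddy DNS module, while "example.com" is hosted at Cloudflare. A CNAME record at the home domain points the challenge name at the Cloudflare-hosted zone, and the site block tells Caddy to write the challenge TXT record there instead:

```caddyfile
# Hypothetical names: home.example.net is NOT hosted at Cloudflare, example.com is.
# Assumes this CNAME already exists at the home domain's DNS host:
#   _acme-challenge.hass.home.example.net  CNAME  _acme-challenge.example.com
hass.home.example.net {
        tls {
                dns cloudflare cloudflare_api_token_goes_here
                # Create the ACME TXT record at this name in the Cloudflare zone.
                dns_challenge_override_domain _acme-challenge.example.com
        }
        reverse_proxy /* 10.19.72.11:8123
}
```

The API token only needs edit access to the Cloudflare-hosted zone in this arrangement, since that is the only place records get written.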

The Caddy Setup

One wrinkle is that I had to generate a custom Caddy build, from https://caddyserver.com/download , in order to get the non-standard "dns.providers.cloudflare" module. This is a click-and-download page which generates a custom Caddy binary on the fly. It would have been nicer if the Cloudflare module was standard, but hey.

Once that’s installed, I can get this output:

$ /usr/local/bin/caddy list-modules
[long list of standard modules omitted]

dns.providers.cloudflare
dns.providers.route53

  Non-standard modules: 2

  Unknown modules: 0

(Yes, I have Caddy running as a normal service, not as a Docker container. No particular reason; I think Docker should work fine.)

Go to the Cloudflare account dashboard, and create a user API token as described at https://developers.cloudflare.com/fundamentals/api/get-started/create-token/ . In my case, it has Zone / DNS / Edit permission, on the specific zone taint.org.

Copy that token as it’s needed in the "Caddyfile", which now looks like the following:

hass.home.taint.org {
        tls {
                dns cloudflare cloudflare_api_token_goes_here
        }
        reverse_proxy /* 10.19.72.11:8123
}

links.home.taint.org {
        tls {
                dns cloudflare cloudflare_api_token_goes_here
        }
        reverse_proxy /* 10.19.72.6:9092
}

pi.home.taint.org {
        tls {
                dns cloudflare cloudflare_api_token_goes_here
        }
        redir / /admin/
        reverse_proxy /admin/* 10.19.72.7:80
}

grafana.home.taint.org {
        tls {
                dns cloudflare cloudflare_api_token_goes_here
        }
        reverse_proxy /* 10.19.72.6:3000
}

[many other services omitted]
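The tls block is identical in every site, so it can be factored out. A minimal sketch, assuming the token is passed to Caddy in an environment variable (CLOUDFLARE_API_TOKEN and the snippet name cloudflare_tls are both names picked for illustration), using a Caddyfile snippet and the {env.*} placeholder:

```caddyfile
# Reusable snippet, defined before the sites that import it.
# The name "cloudflare_tls" is arbitrary.
(cloudflare_tls) {
        tls {
                # Token is read from the environment rather than stored in this file.
                dns cloudflare {env.CLOUDFLARE_API_TOKEN}
        }
}

hass.home.taint.org {
        import cloudflare_tls
        reverse_proxy /* 10.19.72.11:8123
}

links.home.taint.org {
        import cloudflare_tls
        reverse_proxy /* 10.19.72.6:9092
}
```

This keeps the token out of the Caddyfile, which helps if the file ends up in version control. However Caddy is started, the variable has to be present in its environment.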

Running sudo caddy run in the same dir as the Caddyfile will start Caddy in the foreground and verbosely log what it’s doing. (Once you’re happy enough, you can get Caddy running in the normal systemd service way.)

After setting those up, I now have my services accessible locally as:

  1. Home Assistant: https://hass.home.taint.org/
  2. Pihole: https://pi.home.taint.org/
  3. Grafana: https://grafana.home.taint.org/
  4. Linkding: https://links.home.taint.org/

Caddy seamlessly goes off and configures fully valid TLS certs with no fuss. I found it much tidier than Certbot, or Nginx Proxy Manager.

The Tailscale Setup

So this has now sorted out all of the requirements bar one:

  1. accessible externally, over Tailscale.

To do this I had to log into Tailscale’s admin console and go to https://login.tailscale.com/admin/machines , pick a host on the 10.19.72.0/24 internal LAN, click its dropdown menu and "Edit Route Settings…", and enable a Subnet Route for 10.19.72.0/24. By doing this, all of the service.home.taint.org URLs are now accessible remotely once Tailscale is enabled; I don’t even need to use ts.net names to access them! Perfect.

Anyway, that’s the setup — hopefully this writeup will help others. And kudos to Caddy, Let’s Encrypt and Tailscale for making this relatively easy.

GenCast

  • GenCast

    Google DeepMind announce their new AI model for weather forecasting, in collaboration with the ECMWF:

    Today, in a paper published in Nature, we present GenCast, our new high resolution (0.25°) AI ensemble model. GenCast provides better forecasts of both day-to-day weather and extreme events than the top operational system, the European Centre for Medium-Range Weather Forecasts’ (ECMWF) ENS, up to 15 days in advance. We’ll be releasing our model’s code, weights, and forecasts, to support the wider weather forecasting community. […] GenCast is a diffusion model, the type of generative AI model that underpins the recent, rapid advances in image, video and music generation. However, GenCast differs from these, in that it’s adapted to the spherical geometry of the Earth, and learns to accurately generate the complex probability distribution of future weather scenarios when given the most recent state of the weather as input. To train GenCast, we provided it with four decades of historical weather data from ECMWF’s ERA5 archive. This data includes variables such as temperature, wind speed, and pressure at various altitudes. The model learned global weather patterns, at 0.25° resolution, directly from this processed weather data.
    It’s open source: https://github.com/google-deepmind/graphcast

    And here are the open-released model weights: https://console.cloud.google.com/storage/browser/dm_graphcast

    GraphCast (the previous iteration) has public forecasts published at https://charts.ecmwf.int/?query=GraphCast , under a CC-BY-NC-SA-4 licence; it would be great if the GenCast forecasts joined this data set.

    Paper: https://arxiv.org/abs/2312.15796

    This all looks really great, a fantastic commitment to (genuine) openness and open data, and the paper seems rigorous (to this amateur). Great stuff.

    (tags: forecasting weather ai gencast graphcast deepmind google ecmwf genai)

TikTok in hot water over Romanian elections

  • TikTok in hot water over Romanian elections

    ‘We are getting fed up’: EU lawmakers snap at TikTok over Romanian election:

    For years, the Chinese-owned social media app has brushed off security concerns in the United States and Europe that it could be used for mass manipulation and espionage. It now faces an intense regulatory storm in Bucharest over whether it played a role in skewing the democratic process in an EU country and NATO member of 19 million people. [….] “Honestly speaking, we are getting fed up by the documents and the empty promises,” Swedish center-right European lawmaker Arba Kokalari said near the end of the hearing.

    (tags: tiktok elections romania eu bias news propaganda democracy social-media)

noyb is now qualified to bring collective redress actions

  • noyb is now qualified to bring collective redress actions

    “noyb is now approved as a so-called “Qualified Entity” to bring collective redress actions in courts throughout the European Union. Such action under Directive (EU) 2020/1828 can either be an “injunction” or a “redress” measure. “Injunctions” generally prohibit a company from engaging in illegal practices, including any GDPR violations. “Redress” measures allow a European version of a “Class Action”, where thousands or millions of users could be represented by noyb and for example ask for non-material damages when their personal data was unlawfully processed.” This is very interesting — and timely, given the mass scraping of user data to feed AI training sets…

    (tags: noyb data-privacy data-protection class-actions law eu collective-redress)

Privacy Disasters: FaceHuggers Are Eating Your Skeets

The Buddhabrot

  • The Buddhabrot

    This was news to me! There’s another fractal pattern derived from the Mandelbrot set which I’d never seen before:

    As it turns out, it’s not just the boundary of the Mandelbrot set that’s mind-bogglingly complex: the same goes for the (xn, yn) escape trajectories associated with the (u, v) pixels near the set’s edge. The iterated coordinates follow elaborate, long-winded paths through space; their ethereal trails form a density plot reminiscent of the Mandelbrot fractal itself.

    (tags: fractals mandelbrot buddhabrot graphics maths via:lcamtuf)

Rewilding fields massively improved bumblebee numbers in Scotland

  • Rewilding fields massively improved bumblebee numbers in Scotland

    “Bumblebee population increases 116 times over in ‘remarkable’ Scotland project”:

    Rewilding Denmarkfield, a 90-acre project based just north of Perth, has been working to restore nature to green spaces in an increasingly built up area for the past two years. Statistics from the charity show in 2021, when some of the fields managed by the project were still barley monoculture, only 35 bumblebees were counted. But by 2023, after just two years of nature restoration work in the same fields, the population increased to 4,056. The diversity of bumblebee also doubled, according to the charity, from five to ten different species.

    (tags: bees bumblebees scotland fields farming rewilding fallow nature)

WeSQL

  • WeSQL

    “an innovative MySQL distribution that adopts a compute-storage separation architecture, with storage backed by S3 (and S3-compatible systems). WeSQL has completely replaced MySQL’s traditional disk storage with S3. All MySQL data—binlogs, schemas, storage engine metadata, WAL, and data files—are entirely (not partially!) stored as objects in S3. The 11 nines of durability provided by S3 significantly enhances data reliability. Additionally, WeSQL can start from a clean, empty instance, connect to S3, load the data, and begin serving immediately with no additional setup required. It is ideal for users who need an easy-to-manage, cost-effective, and developer-friendly MySQL database solution, especially for those needing support for both Serverless and BYOC (Bring Your Own Cloud).” (via Ian on ITC)

    (tags: mysql s3 object-storage storage databases sql)

Reversing.Works Investigation Exposes Glovo’s Data Privacy Violations

  • Reversing.Works Investigation Exposes Glovo’s Data Privacy Violations

    Ha, this is great:

    Reversing.Works, an innovative project dedicated to exposing abuses within gig economy platforms, uncovered significant labour law violations within Glovo’s algorithmic management system and provided critical evidence for an investigation by the Italian Data Protection Authority. After a year-long investigation, the DPA fined Glovo 5 Million €, and demanded corrective action from the platform. Glovo’s algorithmic management system was found to have misused workers’ personal data in ways that violated labour law, including monitoring workers’ movements outside of their work shifts, keeping hidden scores on workers, and sending detailed monitoring of their work to third parties outside the scope of their contracts. This was a mixed violation of both Italian labour law and the General Data Protection Regulation (GDPR). Reversing.Works’ investigation, using sophisticated reversing engineering techniques, sheds light on the hidden mechanics that drive the platform’s model of operation, and perhaps additional business dynamics. […] “It’s surprising that unions never used a tool like this,” says Gaetano Priori, the lead investigator at Reversing.Works. “Privacy is an individual right, so it hasn’t been seen as a tool for labour struggles. But it has potential in digitally-intermediated labour because one violation could affect all the workers in all the regions in which a company operates.” Reversing.Works has shown how GDPR and tech-enabled investigation can help expose bad practices and create fairer working conditions. This case is a call to action for all gig workers, showing that existing legal tools can be used for the collective good. Priori adds, “This should be a wake-up call for all workers managed by technology. With GDPR and tech, we have the means to challenge unfair practices.”

    (tags: reverse-engineering gdpr data-protection data-privacy gig-economy glovo italy unions)

Generative AI Pushes Outcome Over Process (And This Is Why I Hate It)

  • Generative AI Pushes Outcome Over Process (And This Is Why I Hate It)

    This is a really interesting point about education and learning, in general:

    AI technology is based on the idea that the important part of creating things is the outcome, not the process. Can’t draw? That shouldn’t stop you from making a picture. Worried about your writing? Why should that stop you from handing in a coherent essay? The ads for AI all promise that you’ll be able to produce things without all the tedious work of actually producing it – isn’t that great?  Well no, it’s not – it’s terrible. It betrays a fundamental misunderstanding of why creating things has value. It’s terrible in general, but I am especially offended by this idea in the context of education, and in this post I want to lay this idea out in a little detail. 

    (tags: education learning ai process-vs-outcome working how-we-work)