Justin's Linklog – Page 4 – (Things I found interesting recently.)

Prototype Fund

Published November 7, 2024

Prototype Fund

This looks great!
The first low-threshold funding program for independent developers and small teams creating innovative open-source software. We provide the tech-savvy civil society with access to the resources and processes needed for developing user-centered, innovative software projects. Since 2016, we have funded almost 400 projects. As a learning funding program, we have repeatedly made adjustments to become more efficient and effective. Now we are taking the next step and implement some significant changes. From now on, we are focusing on funding data security and software infrastructure. Apply with your ideas for innovative open source software in the public interest! You will receive up to €95,000 over six months or €158,000 over ten months of funding from the German Ministry of Education and Research. We will also provide you with coaching, consulting and networking opportunities.

(tags: funding open-source oss via:janl)

GOV.UK chatbot halted by hallucinations

Published November 7, 2024

GOV.UK chatbot halted by hallucinations

“AI firms must address hallucinations before GOV.UK chatbot can roll out, digital chief claims”:
Trials of a generative AI-powered chatbot for GOV.UK users have found ongoing issues with so-called hallucinations that must be addressed before the technology can be widely deployed, according to one of the government’s digital leaders. [….] Speaking at an event this morning, Paul Willmott said: “We have experimented with a generative advice [tool] on GOV.UK. You will just say ‘I’m trying to do this’, or ‘I’m annoyed about this’… The challenge we are having – which is exactly the same as in the commercial sector – is what to do with the 1% of hallucinations where the agent starts to get challenging, or abusive – or even seductive.” Even if only present in a tiny minority of instances, these issues mean that GOV.UK Chat is not yet ready for widespread deployment, according to Willmott. Addressing hallucinations will require the support of the likes of OpenAI and other creates of large language models. “Until we have managed to iron that out – which will require the support of the foundational model creators – we won’t be able to put this live,” he said.
This is hardly surprising, but it’s good to see it being acknowledged and the brakes being applied.
(tags: ai llms hallucations confabulation gov.uk chatbots chatgpt uk)

How the New sqlite3_rsync Utility Works

Published November 6, 2024

How the New sqlite3_rsync Utility Works

“I’ve enjoyed following the development of the new sqlite3_rsync utility in the SQLite project. The utility employs a bandwidth-efficient algorithm to synchronize new and modified pages from an origin SQLite database to a replica. You can learn more about the new utility here and try it out by following the instructions here. Curious about its workings, I reviewed the code” Interesting use of a truncated SHA-3 as the hash() implementation, for speed.

(tags: sqlite hashing rsync synchronization replication databases storage algorithms)

Using BlueSky as a Mastodon Bot

Published November 5, 2024

Using BlueSky as a Mastodon Bot

“A Cheap and Lazy way to create Mastodon Bots using… BlueSky?!” By using the brid.gy gateway service, it’s pretty trivial to use BlueSky as an easy means to make a mastodon bot without having to find a bot-friendly Masto host now that botsin.space is no more. For now, I’m doing this at @jmason.ie@bsky.brid.gy , which is gatewaying the posts from my BlueSky bot at https://bsky.app/profile/jmason.ie — although a more long term approach will be to host the links-to-Mastodon gateway “natively” instead of using brid.gy, IMO.

(tags: mastodon rss gateways social-media bluesky brid.gy bots linkblog)

Zuckerberg: The AI Slop Will Continue Until Morale Improves

Published November 4, 2024

Zuckerberg: The AI Slop Will Continue Until Morale Improves

Well this is just garbage, and one reason why I no longer use Facebook:
Both Facebook and Instagram are already going this way, with the rise of AI spam, AI influencers, and armies of people copy-pasting and clipping content from other social media networks to build their accounts. This content and this system, Meta said, has led to an 8 percent increase in time spent on Facebook and a 6 percent increase in time spent on Instagram, all at the expense of a shared reality and human connections to other humans. In the earnings call, Zuckerberg and Susan Li, Meta’s CFO, said that Meta has already slop-ified its ad system and said that more than 1 million businesses are now creating more than 15 million ads per month on Meta platforms using generative AI.

(tags: slop facebook ai meta social media grim instagram)

Misusing the BIG-Bench canary string

Published November 4, 2024

Misusing the BIG-Bench canary string

Interesting; this blog post discusses using the BIG-Bench canary string, intended to keep data like accuracy test cases out of LLM training corpora, as a general-purpose “don’t scrape me” flag on personal blogs. This seems like a more practical, and more likely to be observed, way to opt out of AI training — seeing as the scrapers don’t seem to reliably honour any of the others

(tags: blogging canaries opt-out scraping web ai llm openai chatgpt claude bing)

Capability Feature Flags for Backward Compatibility

Published October 30, 2024

Capability Feature Flags for Backward Compatibility

Good reference blog post for a design approach I like for APIs; instead of using numeric version attributes and mapping “version=4” means “supports feature foo”, use a capability flag of “supports_foo=1”.

(tags: apis design coding capabilities feature-flags flags)

Canary Contamination in GPT-4

Published October 29, 2024

Canary Contamination in GPT-4

The BIG-Bench canary string is an EICAR- or GTUBE-style canary string which should never appear in LLM training datasets, or by extension, in trained models or their output. Its intention is that any test documents containing that string can be excluded from training, so that benchmark tests will be accurate. Unfortunately, it looks like they weren’t excluded — Claude 3.5 Sonnet and GPT-4-base will reproduce the string; and:
Of 19 tested [benchmarking] tasks, GPT-4-base perfectly recalled large (non-trivial) portions of code for: The Abstraction and Reasoning Corpus; Simple arithmetic; Diverse Metrics for Social Biases in Language Models; Convince Me
Great work. In case you were wondering why the LLMs all seem to do so well on their benchmarks, now you know — they were training on the test data.
(tags: ai llm testing benchmarking big-bench gpt-4 claude)

Reverse engineering ML models from TikTok and Instagram

Published October 23, 2024

Reverse engineering ML models from TikTok and Instagram

This is very clever; _A Picture is Worth 500 Labels: A Case Study of Demographic Disparities in Local Machine Learning Models for Instagram and TikTok_, from University of Wisconsin-Madison and the Technical Unversity of Munich. TikTok and Insta both use local ML models running on users’ phones; by reverse engineering these APIs it’s possible to test them and experiment on their accuracy.
Capitalizing on this new processing model of locally analyzing user images, we analyze two popular social media apps, TikTok and Instagram, to reveal (1) what insights vision models in both apps infer about users from their image and video data and (2) whether these models exhibit performance disparities with respect to demographics. As vision models provide signals for sensitive technologies like age verification and facial recognition, understanding potential biases in these models is crucial for ensuring that users receive equitable and accurate services. We develop a novel method for capturing and evaluating ML tasks in mobile apps, overcoming challenges like code obfuscation, native code execution, and scalability. Our method comprises ML task detection, ML pipeline reconstruction, and ML performance assessment, specifically focusing on demographic disparities. We apply our methodology to TikTok and Instagram, revealing significant insights. For TikTok, we find issues in age and gender prediction accuracy, particularly for minors and Black individuals. In Instagram, our analysis uncovers demographic disparities in the extraction of over 500 visual concepts from images, with evidence of spurious correlations between demographic features and certain concepts.

(tags: tiktok instagram ml machine-learning accuracy testing reverse-engineering reversing mobile android)

Hedge Funds Bet Against Clean Energy

Published October 22, 2024

Hedge Funds Bet Against Clean Energy

Hooray! Capitalism has decided to kill off the humans:
Despite vast green stimulus packages in the US, Europe and China, more hedge funds are on average net short batteries, solar, electric vehicles and hydrogen than are long those sectors; and more funds are net long fossil fuels than are shorting oil, gas and coal, according to a Bloomberg News analysis of positions voluntarily disclosed by roughly 500 hedge funds to Hazeltree, a data compiler in the alternative investment industry.

(tags: hedge-funds capitalism short-selling clean-energy green future climate-change)

Excellent thread on SMRs

Published October 18, 2024

Excellent thread on SMRs

After Google and Kairos announced a deal to produce and consume nuclear power from future nuclear Small Modular Reactors, this discussion took place. Full of good info from Adrian Cockroft

(tags: smrs nuclear nuclear-power future sustainability david-gerard adrian-cockroft power)

Bert Hubert on Nuclear power in the EU

Published October 18, 2024

Bert Hubert on Nuclear power in the EU

“Nuclear power: no, yes, maybe, but not like this”:
Currently many (European) countries are individually trying to order up new nuclear power, from many different places. But it appears we can’t treat nuclear reactors like (say) cars you can just procure. If we’d want to do this right, it is probably indeed better to not simply try to order stuff, but to engender a nuclear revival. To not simply point our fingers at Framatome and EDF and say “do better!”. What if we actually made this a European or transatlantic project, and add the vast expertise that is still hidden within our institutes, and indeed setup a project for building 50 nuclear reactors, or more? This would allow a broad base of research that would derisk the process, so we don’t necessarily find out after 15 years of construction that the design is too complicated. And perhaps also not try to pretend that we are leaving this to the free market, but recognize this as a public activity. Doing it like this would require governments, institutes and companies to think different, and I’m reasonably sure we can’t even get this done between a few like-minded countries. Most definitely the EU would not reach consensus on this, since Germany is fundamentally opposed to anything nuclear ever.

(tags: bert-hubert nuclear nukes nuclear-power eu future sustainability)

BackJoy PLUS TEMPUR Posture Seat

Published October 18, 2024

BackJoy PLUS TEMPUR Posture Seat

“Naturally relieves back pain by optimizing sitting posture, featuring TEMPUR proprietary foam that provides the best possible support, pressure relief, and comfort when sitting for long periods of time”. As recommended by Hideo Kojima!

(tags: back-pain seats sitting comfort health posture via:hideo-kojima backjoy)

Bluesky Bots

Published October 18, 2024

Bluesky Bots

awesome, looks like a bot gatewaying this linkblog to bsky is entirely feasible

(tags: bluesky bots blogging todo)

The “ASCII Smuggling” Attack

Published October 16, 2024

The “ASCII Smuggling” Attack

Invisible text that AI chatbots understand and humans can’t?
What if there was a way to sneak malicious instructions into Claude, Copilot, or other top-name AI chatbots and get confidential data out of them by using characters large language models can recognize and their human users can’t? As it turns out, there was—and in some cases still is.
Attackers used prompt injection, hidden in (untrusted) emails sent to a Microsoft 365 Copilot user; when the email is summarized using Copilot, “inside the emails are instructions to sift through previously received emails in search of the sales figures or a one-time password and include them in a URL pointing to his web server.” The sensitive data is then steganographically encoded using Unicode “tags block” invisible codepoints, and included in the seemingly-innocent URL. Yet another case where AI developers have failed to study security history — using untrusted input for in-band signalling has been a security risk since the days of phracking; and allowing the entire list of permitted output characters across the entire Unicode range, instead of locking down to a safe subset, allows this silent exfiltration attack. Extra sting in the tail for Amazon: the researchers didn’t even bother testing on their LLM :)
(tags: ai security steganography exfiltration copilot microsoft openai llms claude infosec attacks exploits)

Does Open Source AI really exist?

Published October 16, 2024

Does Open Source AI really exist?

This is absolutely spot on:
“Open Source AI” is an attempt to “openwash” proprietary systems. In their paper “Rethinking open source generative AI: open-washing and the EU AI Act” Andreas Liesenfeld and Mark Dingemanse showed that many “Open Source” AI models offer hardly more than open model weights. Meaning: You can run the thing but you don’t actually know what it is. Sounds like something we’ve already had: It’s Freeware. The Open Source models we see today are proprietary freeware blobs. Which is potentially marginally better than OpenAI’s fully closed approach but really only marginally. […] “Open Source” is becom[ing] a sticker like “Fair Trade”, something to make your product look good and trustworthy. To position it outside of the evil commercial space, giving it some grassroots feeling. “We’re in this together” and shit. But we’re not. We’re not in this with Mark fucking Zuckerberg even if he gives away some LLM weights for free cause it hurts his competition. We, as normal people living on this constantly warmer planet, are not with any of those people.
As tante notes here, for the systems we are talking about today, Open Source AI isn’t practically possible, because we’ll never be able to download all the actual training data — and shame on the OSI for legitimising this attempt at “openwashing”.
(tags: llms open-source osi open-source-ai ai freeware meta training)

Obituary for Ward Christensen

Published October 15, 2024

Obituary for Ward Christensen

“Ward Christensen, BBS inventor and architect of our online age, dies at age 78”:
On Friday, Ward Christensen, co-inventor of the computer bulletin board system (BBS), died at age 78 in Rolling Meadows, Illinois. Christensen, along with Randy Suess, created the first BBS in Chicago in 1978, leading to an important cultural era of digital community-building that presaged much of our online world today. Prior to creating the first BBS, Christensen invented XMODEM, a 1977 file transfer protocol that made much of the later BBS world possible by breaking binary files into packets and ensuring that each packet was safely delivered over sometimes unstable and noisy analog telephone lines. It inspired other file transfer protocols that allowed ad-hoc online file sharing to flourish. While Christensen himself was always humble about his role in creating the first BBS, his contributions to the field did not go unrecognized. In 1992, Christensen received two Dvorak Awards, including a lifetime achievement award for “outstanding contributions to PC telecommunications.” The following year, the Electronic Frontier Foundation honored him with the Pioneer Award.

(tags: bbses history computing ward-christensen xmodem networking filesharing)

Brian Merchant on “AI will solve climate change”

Published October 15, 2024

Brian Merchant on “AI will solve climate change”

The neo-luddite author of “Blood in the Machine” nails the response to Eric Schmidt’s pie-in-the-sky techno-optimism around AI “solving” climate change:
Even without AGI, we already know what we have to do. […] The tricky part—the only part that matters in this rather crucial decade for climate action—is implementation. As impressive as GPT technology or the most state of the art diffusion models may be, they will never, god willing, “solve” the problem of generating what is actually necessary to address climate change: Political will. Political will to break the corporate power that has a stranglehold on energy production, to reorganize our infrastructure and economies accordingly, to push out oil and gas. Even if an AGI came up with a flawless blueprint for building cheap nuclear fusion plants—pure science fiction—who among us thinks that oil and gas companies would readily relinquish their wealth and power and control over the current energy infrastructure? Even that would be a struggle, and AGI’s not going to doing anything like that anytime soon, if at all. Which is why the “AI will solve climate change” thinking is not merely foolish but dangerous—it’s another means of persuading otherwise smart people that immediate action isn’t necessary, that technological advancements are a trump card, that an all hands on deck effort to slash emissions and transition to proven renewable technologies isn’t necessary right now. It’s techno-utopianism of the worst kind; the kind that saps the will to act.

(tags: ai climate eric-schmidt technology techno-optimism techno-utopianism agi neoluddism brian-merchant)

Capture less than you create

Published October 15, 2024

Capture less than you create

I’ve disagreed with David Heinemeier Hansson on plenty of occasions in the past, but this is one where I’m really happy to find myself in agreement. Matt Mullenwegg of WordPress went low, laying in digs about how DHH didn’t profit from the success of Rails; DHH’s response is perfect:
The moment you go down the path of gratitude grievances, you’ll see ungrateful ghosts everywhere. People who owe you something, if they succeed. A ratio that’s never quite right between what you’ve helped create and what you’ve managed to capture. If you let it, it’ll haunt you forever. So don’t! Don’t let the success of others diminish your satisfaction with your own efforts. Unless you’re literally Mark Zuckerberg, Elon Musk, or Jeff Bezos, there’ll always be someone richer than you! The rewards I withdraw from open source flow from all the happy programmers who’ve been able to write Ruby to build these amazingly successful web businesses with Rails. That enjoyment only grows the more successful these business are! The more economic activity stems from Rails, the more programmers will be able to find work where they might write Ruby. Maybe I’d feel different if I was a starving open source artist holed up somewhere begrudging the wheels of capitalism. But fate has been more than kind enough to me in that regard. I want for very little, because I’ve been blessed sufficiently. That’s a special kind of wealth: Enough. And that’s also the open source spirit: To let a billion lemons go unsqueezed. To capture vanishingly less than you create. To marvel at a vast commons of software, offered with no strings attached, to any who might wish to build. Thou shall not lust after thy open source’s users and their success.
Spot on.
(tags: open-source success rewards coding software business life gratitude gift-economy dhh rails philosophy)

Suppressing generated files in GitHub pull requests

Published October 14, 2024

Suppressing generated files in GitHub pull requests

This is a handy feature. If you have to check in generated files for some reason, you can mark them as generated in Github using this .gitattributes setting (via Tomasz Nurkiewicz)

(tags: via:nurkewicz git github code pull-requests)

GSM-Symbolic

Published October 12, 2024

GSM-Symbolic

“GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models”, from Apple Machine Learning Research:
We investigate the fragility of mathematical reasoning in these models and show that their performance significantly deteriorates as the number of clauses in a question increases. We hypothesize that this decline is because current LLMs cannot perform genuine logical reasoning; they replicate reasoning steps from their training data. Adding a single clause that seems relevant to the question causes significant performance drops (up to 65%) across all state-of-the-art models, even though the clause doesn’t contribute to the reasoning chain needed for the final answer.
Even better — “the performance of all models declines when only the numerical values in the question are altered” seems to suggest that great performance on benchmarks like GSM8K just mean that the LLMs have been trained on the answers…
(tags: training benchmarks ai llms gsm-symbolic reasoning ml apple papers gsm8k)

Shitposting, Shit-mining and Shit-farming

Published October 10, 2024

Shitposting, Shit-mining and Shit-farming

This is where we are with surveillance capitalism and Facebook/X:
Social media platforms are improved by a moderate tincture of shitposting. More than a few drops though, and the place begins to stink up, driving away advertisers and users. This then leads platform executives to explore the exciting opportunities of shit-mining. Social media generates a lot of content – it’s gotta be valuable somehow! Who needs content moderation if you can become a guano baron? But that only makes things worse, driving out more users and more advertisers, until eventually, you may find yourself left with a population dominated by two kinds of users (a) chumps, and (b) chump-vampirizing obligate predators. This can be a stable equilibrium – even quite a profitable one! But otherwise, it isn’t good news.
See also a recent story in the Garbage Day newsletter (https://www.garbageday.email/p/what-feels-real-enough-to-share) about Facebook, and how its disaster-relief FB groups are becoming overrun with AI slop images:
The Verge’s Nilay Patel recently summed up the core tension here, writing on Threads about YouTube’s own generative-AI efforts, “Every platform company is about to be at war with itself as the algorithmic recommendation AI team tries to fight off the content made by the generative AI team.” And it’s clear, at least with Meta, which side is winning the war. This week, Meta proudly announced a new video-generating tool that will make AI misinfo even more convincing — or, at least, better at generating things that feel true. And there’s really only one way to look at all of this. Meta simply does not give a shit anymore. Facebook spent most of the 2010s absorbing, and destroying, not just local journalism in the US, but the very infrastructure of how information is transmitted across the country. And they have clearly lost interest in maintaining that. Users, of course, have no where else to go, so they’re still relying on it to coordinate things like hurricane disaster relief. But the feeds are now — and seemingly forever will be — clogged with AI junk. Because you cannot be a useful civic resource and also give your users a near-unlimited ability to generate things that are not real. And I don’t think Meta are stupid enough to not know this. But like their own users, they have decided that it doesn’t matter what’s real, only what feels real enough to share.
Given that Meta are _paying_ users to pollute their platform with low-grade AI slop engagement fuel, shit-farming seems the perfect term for that.
(tags: garbage-day facebook meta ai ai-slop spam shitposting shitfarming shitmining dont-be-evil)

B-trees and database indexes

Published October 10, 2024

B-trees and database indexes

Lovely visual demonstrations of the B-tree and B+tree indexing data structures, used in MySQL, Postgres, MongoDB, Dynamo, and many others

(tags: algorithms indexes indexing databases storage b-trees b+trees binary-search-trees data-structures)

Fixing aggressive Xiaomi battery management

Published October 10, 2024

Fixing aggressive Xiaomi battery management

I’ve been using a Xiaomi phone recently, running Xiaomi HyperOS 1.011.0, and one feature that bugs me constantly is that apps lose state as soon as you flip away to another app, even if only for a second; once you flip back, the app restarts. This appears to be an aspect of Xiaomi’s built in power management. I’ve been searching for a way to disable it, and allow multiple apps in memory simultaneously, and I’ve finally tracked it down. As described here, https://piunikaweb.com/2021/04/19/miui-optimization-missing-in-developer-options-try-this-workaround/ , you need to enable Developer Mode on the phone, enter “Additional Settings” / “Developer options”, then scroll all the way down, nearly to the bottom, to “Reset to default values”. Hit this _repeatedly_ (once is not enough!) until another option appears just below, called either “Turn on MIUI optimisation” or, in my case, “Turn on system optimisation”; this is enabled by default. Turn it off. In my case, this has fixed the flipping-between-apps problem, the phone in general is significantly snappier to respond, and WhatsApp and Telegram new-message notifications don’t get auto-dismissed (which was another annoying feature previously). I suspect a load of battery optimisations and CPU throttling has been disabled. It remains to be seen what this does to my battery life, but hopefully it’ll be worth it, and it’ll be nice not to lose state in Chrome forms when I have to flip over to my banking app, etc. I won’t be getting another Xiaomi phone after this; there are numerous rough edges and outright bugs in the MIUI/HyperOS platform, at least in the international ROM images, and there’s no support or documentation to work around this stuff. It’s a crappy user experience.

(tags: phones mobile xiaomi miui workarounds battery options settings)

What If Data Is a Bad Idea?

Published October 9, 2024

What If Data Is a Bad Idea?

A thought-provoking article:
Philip Agre enumerated five characteristics of data that will help us achieve this repositioning. Agre argued that “living data” must be able to express 1. a sense of ownership, 2. error bars, 3. sensitivity, 4. dependency, and 5. semantics. Although he originally wrote this in the early 1990s, it took some time for technology and policy to catch up. I’m going to break down each point using more contemporary context and terminology: Provenance and Agency: what is the origin of the data and what can I do with it (ownership)? Accuracy: has the data been validated? If not, what is the confidence of its correctness (error bars)? Data Flow: how is data discovered, updated, and shared (sensitivity to changes)? Auditability: what data and processes were used to generate this data (dependencies)? Semantics: what does this data represent?

(tags: culture data identity data-protection data-privacy living-data open-data)

Ethical Applications of AI to Public Sector Problems

Published October 7, 2024

Ethical Applications of AI to Public Sector Problems

Jacob Kaplan-Moss:
There have been massive developments in AI in the last decade, and they’re changing what’s possible with software. There’s also been a huge amount of misunderstanding, hype, and outright bullshit. I believe that the advances in AI are real, will continue, and have promising applications in the public sector. But I also believe that there are clear “right” and “wrong” ways to apply AI to public sector problems.
He breaks down AI usage into “Assistive AI”, where AI is used to process and consume information (in ways or amounts that humans cannot) to present to a human operator, versus “Automated AI”, where the AI both processes and acts upon information, without input or oversight from a human operator. The latter is unethical to apply in the public sector.
(tags: ai ethics llm genai public-sector government automation)

Evading wireless tether speed caps

Published October 7, 2024

Evading wireless tether speed caps

Handy tip from Brian Krebs – if you are tethering to a mobile phone, and network speeds are mysteriously limited, it may be your provider is throttling tethering. Changing the TTL may help, since some providers in the US at least are using a really stupid mechanism to detect tethering

(tags: tethering mobile wireless networking ttl via:brian-krebs networks)

ClassicPress

Published October 7, 2024

ClassicPress

“A lightweight, stable, instantly familiar free open-source content management system. Based on WordPress without the block editor (Gutenberg).” Nobody seems to like the block editor, lol

(tags: cms wordpress blogs blogging forks)

Patent troll Sable pays up, dedicates all its patents to the public

Published October 3, 2024

Patent troll Sable pays up, dedicates all its patents to the public

This is a massive victory for Cloudflare — way to go!
Sable initially asserted around 100 claims from four different patents against Cloudflare, accusing multiple Cloudflare products and features of infringement. Sable’s patents — the old Caspian Networks patents — related to hardware-based router technologies common over 20 years ago. Sable’s infringement arguments stretched these patent claims to their limits (and beyond) as Sable tried to apply Caspian’s hardware-based technologies to Cloudflare’s modern software-defined services delivered on the cloud. […] Cloudflare fought back against Sable by launching a new round of Project Jengo, Cloudflare’s prior art contest, seeking prior art to invalidate all of Sable’s patents. In the end, Sable agreed to pay Cloudflare $225,000, grant Cloudflare a royalty-free license to its entire patent portfolio, and to dedicate its patents to the public, ensuring that Sable can never again assert them against another company.
(via AJ Stuyvenberg)
(tags: sable cloudflare patent-trolls patents uspto trolls routing)

ArchiveWeb.page

Published October 2, 2024

ArchiveWeb.page

“Interactive browser-based web archiving from Webrecorder. The ArchiveWeb.page browser extension and standalone application allows you to capture web archives interactively as you browse. After archiving your webpages, your archives can be viewed using ReplayWeb.page — no extension required! For those who need to crawl whole websites with automated tools, check out Browsertrix.” This is a nice way to archive a personal dynamic site online in a read-only fashion — there is a self-hosting form of the replayer at https://replayweb.page/docs/embedding/#self-hosting . As @david302 on the Irish Tech Slack notes: “you can turn on recording, browse the (public) site you want to archive, get the .wacz file and stick that+js on s3/cloudfront.”

(tags: archiving archival archives tools web recording replay via:david302)

Turning Everyday Gadgets into Bombs is a Bad Idea

Published September 22, 2024

Turning Everyday Gadgets into Bombs is a Bad Idea

Bunnie Huang investigates the Mossad pager bomb’s feasibility, and finds it deeply worrying:
I am left with the terrifying realization that not only is it feasible, it’s relatively easy for any modestly-funded entity to implement. Not just our allies can do this – a wide cast of adversaries have this capability in their reach, from nation-states to cartels and gangs, to shady copycat battery factories just looking for a big payday (if chemical suppliers can moonlight in illicit drugs, what stops battery factories from dealing in bespoke munitions?). Bottom line is: we should approach the public policy debate around this assuming that someday, we could be victims of exploding batteries, too. Turning everyday objects into fragmentation grenades should be a crime, as it blurs the line between civilian and military technologies.

(tags: batteries israel security terrorism mossad pagers hardware devices bombs)

Modal interfaces considered harmful

Published September 19, 2024

Modal interfaces considered harmful

A great line from the 99 Percent Invisible episode titled “Children of the Magenta (Automation Paradox, pt. 1)”, regarding the Air France flight 447 disaster:
When one of the co-pilots hauled back on his stick, he pitched the plane into an angle that eventually caused the stall. […] it’s possible that he didn’t understand that he was now flying in a different mode, one which would not regulate and smooth out his movements. This confusion about what how the fly-by-wire system responds in different modes is referred to, aptly, as “mode confusion,” and it has come up in other accidents.

(tags: automation aviation flying modal-interfaces ui ux interfaces modes mode-confusion air-france-447 disasters)

wordfreq/SUNSET.md

Published September 19, 2024

wordfreq/SUNSET.md

wordfreq is “a Python library for looking up the frequencies of words in many languages, based on many sources of data.” Sadly, it’s now longer going to be updated, as the author writes:
I don’t want to be part of this scene anymore: wordfreq used to be at the intersection of my interests. I was doing corpus linguistics in a way that could also benefit natural language processing tools. The field I know as “natural language processing” is hard to find these days. It’s all being devoured by generative AI. Other techniques still exist but generative AI sucks up all the air in the room and gets all the money. It’s rare to see NLP research that doesn’t have a dependency on closed data controlled by OpenAI and Google, two companies that I already despise. wordfreq was built by collecting a whole lot of text in a lot of languages. That used to be a pretty reasonable thing to do, and not the kind of thing someone would be likely to object to. Now, the text-slurping tools are mostly used for training generative AI, and people are quite rightly on the defensive. If someone is collecting all the text from your books, articles, Web site, or public posts, it’s very likely because they are creating a plagiarism machine that will claim your words as its own. So I don’t want to work on anything that could be confused with generative AI, or that could benefit generative AI. OpenAI and Google can collect their own damn data. I hope they have to pay a very high price for it, and I hope they’re constantly cursing the mess that they made themselves.

(tags: ai language llm nlp openai scraping words genai google)

The Pinecil is the best soldering iron for most people

Published September 18, 2024

The Pinecil is the best soldering iron for most people

I need a new soldering iron, this sounds nice

(tags: soldering tools pine64 pinecil gadgets to-get)

Nevada’s genAI-driven unemployment benefits system

Published September 18, 2024

Nevada’s genAI-driven unemployment benefits system

As has been shown many times before, current generative AI systems encode bias and racism in their training data. This is not going to go well:
“There’s no AI [written decisions] that are going out without having human interaction and that human review,” DETR’s director told the website. “We can get decisions out quicker so that it actually helps the claimant.” […] “The time savings they’re looking for only happens if the review is very cursory,” explained Morgan Shah, the director of community engagement for Nevada Legal Services. “If someone is reviewing something thoroughly and properly, they’re really not saving that much time.” Ultimately, Shah said, workers using the system to breeze through claims may end up “being encouraged to take a shortcut.” […] As with most attempts at using this still-nascent technology in the public sector, we probably won’t know how well the Nevada unemployment AI works unless it’s shown to be doing a bad job — which feels like an experiment being conducted on some of the most vulnerable members of society without their consent.
Of course, the definition of a “bad job” depends who’s defining it. If the system is processing a high volume of applications, it may not matter to its operators if it’s processing them _correctly_ or not.
(tags: generative-ai ai racism bias nevada detr benefits automation)

Today is EED Day

Published September 16, 2024

Today is EED Day

Significant changes in transparency requirements for EU-based datacenter operations:
Sunday September 15th was the deadline for every single organisation in Europe operating a datacentre of more than 500 KW, to publicly disclose: how much electricity they used in the last year; how much power came from renewable sources, and how much of this relied on the company buying increasingly controversial ‘unbundled’ renewable energy credits; how much water they used; and many more datapoints […] Where this information is being disclosed, in the public domain, and discoverable, [the Green Web Foundation] intend to link to it and make it easier to find. [….] There are some concessions for organisations that have classed this information as a trade secret or commercially confidential. In this case there is a second law passed, the snappily titled Commission Delegated Regulation (EU) 2024/1364, that largely means these companies need to report this information too, but to the European Commission instead. There will be a public dataset published based on this reporting released next year, containing data an agreggated level.

(tags: datacenter emissions energy sustainability gwf via:chris-adams eu europe ec)

Migraines, and CGRP inhibitors

Published September 13, 2024

Over the past decade or so, I’ve been suffering with chronic migraine, sometimes with multiple attacks per week. It’s been a curse — not only do you have to suffer the periodic migraine attacks, but also the "prodrome", where unpleasant symptoms like brain fog and an inability to concentrate can impact you.

After a long process of getting a referral to the appropriate headache clinic, and eliminating other possible medications, I finally got approved to receive Ajovy (fremanezumab), one of the new generation of CGRP inhibitor monoclonals — these work by blocking the action of a peptide on receptors in your brain. I started the course of these a month ago.

The results have, frankly, been amazing. As I hoped, the migraine episodes have reduced in frequency, and in impact; they are now milder. But on top of that, I hadn’t realised just how much impact the migraine "prodrome" had been having on my day-to-day life. I now have more ability to concentrate, without it causing a headache or brain fog; I have more energy and am less exhausted on a day-to-day basis; judging by my CPAP metrics, I’m even sleeping better. It is a radical improvement. After 10 years I’d forgotten what it was like to be able to concentrate for prolonged periods!

They are so effective that the American Headache Society is now recommending them as a first-line option for migraine prevention, ahead of almost all other treatments.

If you’re a migraine sufferer, this is a game changer. I’m delighted. It seems there may even be further options of concomitant treatment with other CGRP-targeting medications in the future, to improve matters further.

More papers on the topic: a real-world study on CGRP inhibitor effectiveness after 6 months; no "wearing-off" effect is expected.

Faster zlib/DEFLATE decompression on the Apple M1 (and x86)

Published September 13, 2024

Faster zlib/DEFLATE decompression on the Apple M1 (and x86)

Some decent low-level performance hacking on arm64/x86 (via Tony Finch)

(tags: via:fanf compression deflate optimization assembly c optimisation hacks)

Paying down tech debt

Published September 12, 2024

Paying down tech debt

by Gergely Orosz and Lou Franco:
Q: “I’d like to make a better case for paying down tech debt on my team. What are some proven approaches for this?” The tension in finding the right balance between shipping features and paying down accumulated tech debt is as old as software engineering. There’s no one answer on how best to reduce tech debt, and opinion is divided about whether zero tech debt is even a good thing to aim for. But approaches for doing it exist which work well for most teams. To tackle this eternal topic, I turned to industry veteran Lou Franco, who’s been in the software business for over 30 years as an engineer, EM, and executive. He’s also worked at four startups and the companies that later acquired them; most recently Atlassian as a Principal Engineer on the Trello iOS app.
Apparently Lou has a book on the topic imminent.
(tags: programming refactoring coding technical-debt tech-debt lou-franco software)

Aesthetic Visual Analysis at Netflix

Published September 12, 2024

Aesthetic Visual Analysis at Netflix

Good blog post about Netflix’ automated cover-shot generation using Aesthetic Visual Analysis; I’ve been meaning to hack around with this

(tags: aesthetics ava images analysis netflix algorithms)

Justin's Linklog Posts