Long COVID Is Being Erased — Again – The Atlantic
Ed Yong is back writing again!
Most Americans simply aren’t thinking about COVID with the same acuity they once did; the White House long ago zeroed in on hospitalizations and deaths as the measures to worry most about. And what was once outright denial of long COVID’s existence has morphed into something subtler: a creeping conviction, seeded by academics and journalists and now common on social media, that long COVID is less common and severe than it has been portrayed—a tragedy for a small group of very sick people, but not a cause for societal concern. This line of thinking points to the absence of disability claims, the inconsistency of biochemical signatures, and the relatively small proportion of severe cases as evidence that long COVID has been overblown. “There’s a shift from ‘Is it real?’ to ‘It is real, but …,’” Lekshmi Santhosh, the medical director of a long-COVID clinic at UC San Francisco, told me. Yet long COVID is a substantial and ongoing crisis—one that affects millions of people. However inconvenient that fact might be to the current “mission accomplished” rhetoric, the accumulated evidence, alongside the experience of long haulers, makes it clear that the coronavirus is still exacting a heavy societal toll.
(tags: long-covid ed-yong covid-19 health medicine society healthcare)
OpenAI’s hunger for data is coming back to bite it
Spot on:
The company could have saved itself a giant headache by building in robust data record-keeping from the start, she says. Instead, it is common in the AI industry to build data sets for AI models by scraping the web indiscriminately and then outsourcing the work of removing duplicates or irrelevant data points, filtering unwanted things, and fixing typos. These methods, and the sheer size of the data set, mean tech companies tend to have a very limited understanding of what has gone into training their models.
(tags: training data provenance ai ml common-crawl openai chatgpt data-protection privacy)
-
She really gets it. Lots of interesting thoughts.
(tags: holly-herndon ai music ml future tech sampling spawning)
-
“Scheduling compute workloads to chase green energy can be counter-productive” — Adrian Cockroft:
I suggest that the best policy is to optimize your workloads so that they can run on fewer more highly utilized instances, minimize your total footprint in Asia where possible, and to use the spot market price as a guide for when to run workloads.
(tags: adrian-cockroft scheduling cloud ops green sustainability cron)
-
TIL you can run Ethernet over coax at 2.5Gbps. Long gone are the days of vampire taps and 10BASE2.
Prompt injection: what’s the worst that can happen?
Good roundup from Simon Willison on this brave new world of exploits. ‘Any time you see anyone demonstrating a new application built on top of LLMs, join me in being the squeaky wheel that asks “how are you taking prompt injection into account?”’
“Why Banker Bob (still) Can’t Get TLS Right: A Security Analysis of TLS in Leading UK Banking Apps”
Jaysus this is a litany of failure.
Abstract. This paper presents a security review of the mobile apps provided by the UK’s leading banks; we focus on the connections the apps make, and the way in which TLS is used. We apply existing TLS testing methods to the apps which only find errors in legacy apps. We then go on to look at extensions of these methods and find five of the apps have serious vulnerabilities. In particular, we find an app that pins a TLS root CA certificate, but does not verify the hostname. In this case, the use of certificate pinning means that all existing test methods would miss detecting the hostname verification flaw. We also find one app that doesn’t check the certificate hostname, but bypasses proxy settings, resulting in failed detection by pentesting tools. We find that three apps load adverts over insecure connections, which could be exploited for in-app phishing attacks. Some of the apps used the users’ PIN as authentication, for which PCI guidelines require extra security, so these apps use an additional cryptographic protocol; we study the underlying protocol of one banking app in detail and show that it provides little additional protection, meaning that an active man-in-the-middle attacker can retrieve the user’s credentials, log in to the bank and perform every operation the legitimate user could.
See also: https://www.synopsys.com/blogs/software-security/ineffective-certificate-pinning-implementations/
(tags: ssl tls certificates certificate-pinning security infosec banking apps uk pci mobile)
Using DuckDB to repartition parquet data in S3
Wow, DuckDB is very impressive — I had no idea it could handle SELECTs against Parquet data in S3:
A common pattern to ingest streaming data and store it in S3 is to use Kinesis Data Firehose Delivery Streams, which can write the incoming stream data as batched parquet files to S3. You can use custom S3 prefixes with it when using Lambda processing functions, but by default, you can only partition the data by the timestamp (the timestamp the event reached the Kinesis Data Stream, not the event timestamp!). So, a few common use cases for data repartitioning could include: Repartitioning the written data for the real event timestamp if it’s included in the incoming data; Repartitioning the data for other query patterns, e.g. to support query filter pushdown and optimize query speeds and costs; Aggregation of raw or preprocessed data, and storing them in an optimized manner to support analytical queries.
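Here's roughly what that looks like in practice. A minimal sketch, assuming made-up bucket names, prefixes and an event_ts column, plus a reasonably recent DuckDB with the httpfs extension for the partitioned COPY:

```python
# Sketch: repartition Kinesis-Firehose-style Parquet files in S3 by the real
# event timestamp using DuckDB. Bucket names, prefixes and column names are
# invented for illustration.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs;")
con.execute("LOAD httpfs;")
con.execute("SET s3_region = 'eu-west-1';")  # plus s3_access_key_id / s3_secret_access_key as needed

con.execute("""
    COPY (
        SELECT
            *,
            year(event_ts)  AS year,
            month(event_ts) AS month,
            day(event_ts)   AS day
        FROM read_parquet('s3://my-raw-bucket/firehose-output/*/*/*/*/*.parquet')
    )
    TO 's3://my-curated-bucket/events'
    (FORMAT PARQUET, PARTITION_BY (year, month, day));
""")
```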
(tags: duckdb repartitioning s3 parquet orc hive kinesis firehose)
Timnit Gebru’s anti-‘AI pause’
Couldn’t agree more with Timnit Gebru’s comments here:
What is your appeal to policymakers? What would you want Congress and regulators to do now to address the concerns you outline in the open letter? Congress needs to focus on regulating corporations and their practices, rather than playing into their hype of “powerful digital minds.” This, by design, ascribes agency to the products rather than the organizations building them. This language obfuscates the amount of data that is being collected — and the amount of worker exploitation involved with those who are labeling and supplying the datasets, and moderating model outputs. Congress needs to ensure corporations are not using people’s data without their consent, and hold them responsible for the synthetic media they produce — whether it is text or media spewing disinformation, hate speech or other types of harmful content. Regulations need to put the onus on corporations, rather than understaffed agencies. There are probably existing regulations these organizations are breaking. There are mundane “AI” systems being used daily; we just heard about another Black man being wrongfully arrested because of the use of automated facial analysis systems. But that’s not what we’re talking about, because of the hype.
-
This is amazing — using GPT-3.5 to convert a natural-language query into SQL applied to a specific dataset, in these examples, San Francisco city data and US public census data:
With CensusGPT, you can ask any question related to census data in natural language. These natural language questions get converted to SQL using GPT-3.5 and are then used to query the census database. Here are some examples:
– Five cities with a population over 100,000 and lowest crime
– 10 highest income areas in california
Here is a similar example from sfGPT:
– Which four neighborhoods had the most crime in San Francisco in 2021?
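The underlying pattern is simple enough to sketch. This isn't CensusGPT's actual code; the schema, table and prompt below are invented, and it uses the pre-1.0 OpenAI Python client. It just shows the schema-in-prompt, SQL-out loop:

```python
# Rough sketch of the text-to-SQL pattern: give the model the table schema,
# ask for SQL only, then run the query against a local database.
import sqlite3
import openai  # openai<1.0 style API; assumes OPENAI_API_KEY is set in the environment

SCHEMA = """
CREATE TABLE census (city TEXT, state TEXT, population INTEGER,
                     median_income INTEGER, crime_rate REAL);
"""

def question_to_sql(question: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Translate the user's question into a single SQLite query. "
                        f"Use only this schema:\n{SCHEMA}\nReturn SQL only."},
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

conn = sqlite3.connect("census.db")
sql = question_to_sql("Five cities with a population over 100,000 and lowest crime")
print(sql)
print(conn.execute(sql).fetchall())
```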
(tags: sfgpt censusgpt textsql natural-language gpt-3.5 sql querying search open-source)
Exploring performance differences between Amazon Aurora and vanilla MySQL | Plaid
This is a major difference between vanilla MySQL and Amazon Aurora (and a potentially major risk!):
because Aurora MySQL primary and replica instances share a storage layer, they share a set of undo logs. This means that, for a REPEATABLE READ isolation level, the storage instance must maintain undo logs at least as far back as could be required to satisfy transactional guarantees for the primary or any read replica instance. Long-running replica transactions can negatively impact writer performance in Aurora MySQL—finally, an explanation for the incident that spawned this investigation. The same scenario plays out differently in vanilla MySQL because of its different model for undo logs.
Vanilla MySQL: there are two undo logs – one on the writer, and one on the reader. The performance impact of an operation that prevents the garbage collection of undo log records will be isolated to either the writer or the reader.
Aurora MySQL: there is a single undo log that is shared between the writer and reader. The performance impact of an operation that prevents the garbage collection of undo log records will affect the entire cluster.
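One way to see the symptom (a sketch, not Plaid's tooling; the endpoints, credentials and the metric's availability on Aurora are assumptions) is to watch InnoDB's history list length, a rough proxy for un-purged undo records, on the writer while a replica transaction sits open:

```python
# Sketch: poll the writer's InnoDB history list length while a long-running
# REPEATABLE READ transaction is held open on a reader endpoint.
# Hostname and credentials below are placeholders.
import time
import pymysql

writer = pymysql.connect(host="my-cluster.cluster-xxxx.rds.amazonaws.com",
                         user="admin", password="...", database="mydb")

def history_list_length(conn) -> int:
    # trx_rseg_history_len is enabled by default in stock MySQL; assumed
    # available on Aurora MySQL as well.
    with conn.cursor() as cur:
        cur.execute("SELECT count FROM information_schema.innodb_metrics "
                    "WHERE name = 'trx_rseg_history_len'")
        return cur.fetchone()[0]

# While this loop runs, open a REPEATABLE READ transaction on the reader and
# leave it idle; on Aurora the writer's history list keeps growing because the
# undo log is shared across the cluster.
for _ in range(10):
    print(history_list_length(writer))
    time.sleep(30)
```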
(tags: aurora aws mysql performance databases isolation-levels)
-
Comparison site for electric cars; actually has a realistic model of genuine range for each EV. Full details on charging connectors, charge curves (for charging speed), etc.
The Black Magic of (Java) Method Dispatch
Some fascinating details of low-level Java performance optimization, particularly with JIT applied to OO method dispatch:
Programming languages like Java provide the facilities for subtyping/polymorphism as one of the ways to construct modular and reusable software. This language choice naturally comes at a price, since there is no hardware support for virtual calls, and therefore runtimes have to emulate this behavior. In many, many cases the performance of method dispatch is not important. Actually, in a vast majority of cases, the low-level performance concerns are not the real concerns. However, there are cases when method dispatch performance is important, and there you need to understand how dispatch works, what runtimes optimize for you, and what you can do to cheat and/or emulate similar behavior in your code. For example, in the course of String Compression work, we were faced with the problem of selecting the coder for a given String. The obvious and highly maintainable approach of creating a Coder interface, a few implementations, and dispatching the virtual calls over it, had met some performance problems on the very tiny benchmarks. Therefore, we needed to contemplate something better. After a few experiments, this post was born as a reference for others who might try to do the same. This post also tangentially touches the inlining of virtual calls, as the natural thing during the optimization.
Discovered via this amazing commit: https://github.com/quarkusio/quarkus/commit/65dd4d43e2644db1c87726139280f9704140167c
(tags: optimization performance java oo jit coding polymorphism)
MariaDB.com is dead, long live MariaDB.org
Oof. Looks like the commercial company behind MariaDB is going south quickly:
Monty, the creator of MySQL and MariaDB founder, hasn’t been at a company meeting for over a year and a half. The relationship between Monty and the CEO, Michael Howard, is extremely rocky. At a company all-hands meeting Monty and Michael Howard were shouting at each other while up on stage in the auditorium in front of the entire staff. Monty made his position perfectly clear as he shouted his last words before he walked out: “You’re killing my fu&#@$! company!!!” Monty was subsequently voted off the board in July of 2022 solidifying the hostile takeover by Michael Howard. Buyer beware, Monty and his group of founders and database experts are no longer at the company.
At least the open-source product is still trustworthy, though.
(tags: databases storage mariadb software open-source companies)
Google “raters” say they don’t have enough time to verify correct answers from Bard
Contractors say they have a set amount of time to complete each task, like review a prompt, and the time they’re allotted for tasks can vary wildly — from as little as 60 seconds to more than several minutes. Still, raters said it’s difficult to rate a response when they are not well-versed in a topic the chatbot is talking about, including technical topics like blockchain for example. Because each assigned task represents billable time, some workers say they will complete the tasks even if they realize they cannot accurately assess the chatbot responses. “Some people are going to say that’s still 60 seconds of work, and I can’t recoup this time having sat here and figured out I don’t know enough about this, so I’m just going to give it my best guess so I can keep that pay and keep working,” one rater said.
(tags: google raters contractors fact-checking verification llms bard facts)
CAN Injection: keyless car theft
Detailed description of the attack process currently in use by car thieves to steal vehicles by injecting key-unlock signal frames into the CAN bus.
(tags: can-bus security cars driving infosec exploits can-injection)
Nintendo Will Repair Out-Of-Warranty Joy-Con For Free In The UK, EEA, Switzerland
FINALLY
(tags: nintendo joycons hardware repair warranty joycon-drift)
Three ways AI chatbots are a security disaster | MIT Technology Review
yyyyyup.
Because the AI-enhanced virtual assistants scrape text and images off the web, they are open to a type of attack called indirect prompt injection, in which a third party alters a website by adding hidden text that is meant to change the AI’s behavior. Attackers could use social media or email to direct users to websites with these secret prompts. Once that happens, the AI system could be manipulated to let the attacker try to extract people’s credit card information, for example. Malicious actors could also send someone an email with a hidden prompt injection in it. If the receiver happened to use an AI virtual assistant, the attacker might be able to manipulate it into sending the attacker personal information from the victim’s emails, or even emailing people in the victim’s contacts list on the attacker’s behalf.
(tags: chatgpt prompt-injection security exploits gpt-4)
-
‘standing for “transhumanism, extropianism, singularitarianism, cosmism, Rationalism, Effective Altruism, and longtermism.”’ Good explainer thread
(tags: tescreal transhumanism extropianism singularitarianism cosmism rationalism effective-altruism longtermism)
-
I’m not sure who is advising GitHub, but the suggestion that the unauthorized use of “publicly available data is consistent with global copyright laws” is a fantastical claim, for any number of reasons, and that’s even before addressing the ridiculous notion that machines learn “much as humans have done throughout history.”
Hey, did you know inkjet cartridges are region-locked? – The Verge
Well, looks like I won’t ever buy another HP printer.
(tags: inkjets hp region-locking consumer-rights bullshit)
A misleading open letter about sci-fi AI dangers ignores the real risks
This essay is spot on about the recent AI open letter from the Future of Life Institute, asking for "a 6-month pause on training language models ‘more powerful than’ GPT-4":
Over 1,000 researchers, technologists, and public figures have already signed the letter. The letter raises alarm about many AI risks: “Should we let machines flood our information channels with propaganda and untruth? Should we automate away all the jobs, including the fulfilling ones? Should we develop nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us? Should we risk loss of control of our civilization?” We agree that misinformation, impact on labor, and safety are three of the main risks of AI. Unfortunately, in each case, the letter presents a speculative, futuristic risk, ignoring the version of the problem that is already harming people. It distracts from the real issues and makes it harder to address them. The letter has a containment mindset analogous to nuclear risk, but that’s a poor fit for AI. It plays right into the hands of the companies it seeks to regulate.
Couldn’t agree more.
AI and the American Smile. How AI misrepresents culture through a facial expression
There are 18 images in the Reddit slideshow [a series of Midjourney-generated images of “selfies through history”] and they all feature the same recurring composition and facial expression. For some, this sequence of smiling faces elicits a sense of warmth and joyousness, comprising a visual narrative of some sort of shared humanity […] But what immediately jumped out at me is that these AI-generated images were beaming a secret message hidden in plain sight. A steganographic deception within the pixels, perfectly legible to your brain yet without the conscious awareness that it’s being conned. Like other AI “hallucinations,” these algorithmic extrusions were telling a made up story with a straight face — or, as the story turns out, with a lying smile. […] How we smile, when we smile, why we smile, and what it means is deeply culturally contextual.
(tags: ai america culture photography midjourney smiling smiles context history)
-
“Social media and newspapers are flooded with myths about heat pumps. Let’s take them one by one in this post.”
(tags: myths mythbusting heat-pumps heating house home)
Belgian man dies by suicide following exchanges with chatbot
Grim. This is the downside of LLM-based chatbots with ineffective guardrails against toxic output.
“Without these conversations with the chatbot, my husband would still be here,” the man’s widow has said, according to La Libre. She and her late husband were both in their thirties, lived a comfortable life and had two young children. However, about two years ago, the first signs of trouble started to appear. The man became very eco-anxious and found refuge with ELIZA, the name given to a chatbot that uses GPT-J, an open-source artificial intelligence language model developed by EleutherAI. After six weeks of intensive exchanges, he took his own life.
There’s a transcript of the last conversation with the bot here: https://news.ycombinator.com/item?id=35344418
(tags: bots chatbots ai gpt gpt-j grim future grim-meathook-future)
draft-ietf-httpapi-idempotency-key-header-02
Adding an “Idempotency-Key:” header to HTTP requests to support idempotent operations on REST APIs. (via Tomasz Nurkiewicz)
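A minimal client-side sketch of how the draft is meant to be used; the endpoint and payload are made up, and the point is simply that retries reuse the same key so the server can de-duplicate:

```python
# Sketch: generate a unique Idempotency-Key per logical request and reuse it
# on retries, so the server can recognise and de-duplicate repeats.
# The endpoint and payload are invented for illustration.
import uuid
import requests

idempotency_key = str(uuid.uuid4())

def create_payment(amount_cents: int) -> requests.Response:
    return requests.post(
        "https://api.example.com/payments",
        json={"amount_cents": amount_cents, "currency": "EUR"},
        headers={"Idempotency-Key": idempotency_key},
        timeout=10,
    )

resp = create_payment(1999)
if resp.status_code >= 500:
    # Safe to retry with the SAME key: the server treats it as the same operation.
    resp = create_payment(1999)
```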
(tags: via:nurkiewicz idempotency http standards ietf rest apis)
SARS-CoV-2 is a “textbook virus”
Excellent thread from Dr. Michael Mina:
I’ve written SARS-CoV-2 is a “textbook virus”
• Textbook does NOT mean mild;
• Textbook viruses kill people;
• Textbook viruses harm long-term immunity;
• Textbook viruses cause dizzying amounts of poorly understood debilitating problems
I explain w examples here!
(tags: virology covid-19 sars-cov-2 viruses medicine)
What Will Transformers Transform? – Rodney Brooks
This is a great essay on GPT and LLMs:
Roy Amara, who died on the last day of 2007, was the president of a Palo Alto based think tank, the Institute for the Future, and is credited with saying what is now known as Amara’s Law: “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.” This has been a common problem with Artificial Intelligence, and indeed of all of computing. In particular, since I first became conscious of the possibility of Artificial Intelligence around 1963 (and as an eight year old proceeded to try to build my own physical and intelligent computers, and have been at it ever since), I have seen these overestimates many many times.
and:
I think that GPTs will give rise to a new aphorism (where the last word might vary over an array of synonymous variations): “If you are interacting with the output of a GPT system and didn’t explicitly decide to use a GPT then you’re the product being hoodwinked.” I am not saying everything about GPTs is bad. I am saying that, especially given the explicit warnings from OpenAI, you need to be aware that you are using an unreliable system.
Using an unreliable system sounds awfully unreliable, but in August 2021 I had a revelation at TED in Monterey, California, when Chris Anderson (the TED Chris) was interviewing Greg Brockman, the Chairman of Open AI, about an early version of GPT. He said that he regularly asked it questions about code he wanted to write and it very quickly gave him ideas for libraries to use, and that was enough to get him started on his project. GPT did not need to be fully accurate, just to get him into the right ballpark, much faster than without its help, and then he could take it from there. Chris Anderson (the 3D robotics one, not the TED one) has likewise opined (as have responders to some of my tweets about GPT) that using ChatGPT will get him the basic outline of a software stack, in a well tread area of capabilities, and he is many many times more productive than without it. So there, where a smart person is in the loop, unreliable advice is better than no advice, and the advice comes much more explicitly than from carrying out a conventional search with a search engine.
The opposite of useful can also occur, but again it pays to have a smart human in the loop. Here is a report from the editor of a science fiction magazine which pays contributors. He says that from late 2022 through February of 2023 the number of submissions to the magazine increased by almost two orders of magnitude, and he was able to determine that the vast majority of them were generated by chatbots. He was the person in the loop filtering out the signal he wanted, human written science fiction, from vast volumes of noise of GPT written science fiction. Why should he care? Because GPT is an auto-completer and so it is generating variations on well worked themes. But, but, but, I hear people screaming at me. With more work GPTs will be able to generate original stuff. Yes, but it will be some other sort of engine attached to them which produces that originality. No matter how big, and how many parameters, GPTs are not going to do that themselves.
When no person is in the loop to filter, tweak, or manage the flow of information GPTs will be completely bad. That will be good for people who want to manipulate others without having revealed that the vast amount of persuasive evidence they are seeing has all been made up by a GPT. It will be bad for the people being manipulated.
And it will be bad if you try to connect a robot to GPT. GPTs have no understanding of the words they use, no way to connect those words, those symbols, to the real world. A robot needs to be connected to the real world and its commands need to be coherent with the real world. Classically it is known as the “symbol grounding problem”. GPT+robot is only ungrounded symbols. It would be like you hearing Klingon spoken, without any knowledge other than the Klingon sound stream (even in Star Trek you knew they had human form and it was easy to ground aspects of their world). A GPT telling a robot stuff will be just like the robot hearing Klingonese.
My argument here is that GPTs might be useful, and well enough boxed, when there is an active person in the loop, but dangerous when the person in the loop doesn’t know they are supposed to be in the loop. [This will be the case for all young children.] Their intelligence, applied with strong intellect, is a key component of making any GPT be successful.
(tags: gpts rodney-brooks ai ml amaras-law hype technology llms future)
Employees Are Feeding Sensitive Business Data to ChatGPT
How unsurprising is this? And needless to say, a bunch of that is being reused for training:
In a recent report, data security service Cyberhaven detected and blocked requests to input data into ChatGPT from 4.2% of the 1.6 million workers at its client companies because of the risk of leaking confidential information, client data, source code, or regulated information to the LLM. In one case, an executive cut and pasted the firm’s 2023 strategy document into ChatGPT and asked it to create a PowerPoint deck. In another case, a doctor input his patient’s name and their medical condition and asked ChatGPT to craft a letter to the patient’s insurance company.
GitHub Copilot is open to remote prompt-injection attacks
GitHub Copilot is also based on a large language model. What does indirect prompt injection do to it? Again, we demonstrate that, as long as an attacker controls part of the context window, the answer is: pretty much anything. Attackers only have to manipulate the documentation of a target package or function. As you reference and use them, this documentation is loaded into the context window based on complex and ever-changing heuristics. We show […] how importing a synthetic library can lead Copilot to introduce subtle or not-so-subtle vulnerabilities into the code generated for you.
(tags: injection copilot security exploits github llms chatgpt)
systemd on Proxmox freaks out at Irish DST
Many reports from Proxmox users across Ireland — seems there’s a bug in systemd timezone code when handling daylight saving time in the Europe/Dublin timezone (which is unique because it causes “mktime moving backward for change to ‘summer time’ […] as for them the summer time is the standard time”). (via Kiall)
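A quick way to see what makes Europe/Dublin odd, using Python's zoneinfo (exact dst() values depend on your tzdata build; in modern tzdata IST, UTC+1, is the standard time and winter is modelled as negative DST):

```python
# Illustration of why Europe/Dublin trips up naive DST handling: the standard
# time is IST (UTC+1) and winter is modelled as a negative DST offset in
# modern tzdata. What dst() reports depends on the tzdata your system ships.
from datetime import datetime
from zoneinfo import ZoneInfo

dublin = ZoneInfo("Europe/Dublin")

for d in (datetime(2023, 1, 15, 12, tzinfo=dublin),   # winter
          datetime(2023, 7, 15, 12, tzinfo=dublin)):  # summer
    print(d.date(), d.tzname(), "utcoffset:", d.utcoffset(), "dst:", d.dst())
```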
(tags: via:kiall proxmox dst daylight-savings timezones ireland systemd)
Google and Microsoft’s chatbots are already citing one another in a misinformation shitshow
What we have here is an early sign we’re stumbling into a massive game of AI misinformation telephone, in which chatbots are unable to gauge reliable news sources, misread stories about themselves, and misreport on their own capabilities. In this case, the whole thing started because of a single joke comment on Hacker News. Imagine what you could do if you wanted these systems to fail. It’s a laughable situation but one with potentially serious consequences. Given the inability of AI language models to reliably sort fact from fiction, their launch online threatens to unleash a rotten trail of misinformation and mistrust across the web, a miasma that is impossible to map completely or debunk authoritatively. All because Microsoft, Google, and OpenAI have decided that market share is more important than safety.
(tags: google ai ml microsoft openai chatgpt trust spam misinformation disinformation)
Vatican flag SVG on Wikimedia Commons was incorrect for 5 years, and widely copied
In 2017 a Wikimedia Commons user changed the inside of the tiara to red because that’s how it appears on the Vatican Coat of Arms. But this assumption turned out to be faulty, because the official flag spec sheet uses different colors than the Coat of Arms. The mistake was quickly noticed by an anonymous IP who wrote an extensive and well-researched explanation of the error on the file’s talk page. Unfortunately, nobody read it, and the mistake lived on for 5 years before another user noticed it and reverted the file.
-
“A very compact representation of an image placeholder. Store it inline with your data and show it while the real image is loading for a smoother loading experience.”
(tags: graphics images webdev compression lossy thumbnails)
new LFP batteries will unlock cheaper electric vehicles
Lithium ferrous phosphate (LFP) batteries, the type to be produced at the new [Ford] plant are a lower-cost alternative to the nickel- and cobalt-containing batteries used in most electric vehicles in the US and Europe today. While the technology has grown in popularity in China, Ford’s factory, developed in partnership with the Chinese battery giant CATL, marks a milestone in the West. By cutting costs while also boosting charging speed and extending lifetime, LFP batteries could help expand EV options for drivers.
(tags: lfp technology ev cars batteries renewable-energy)
You Broke Reddit: The Pi-Day Outage : RedditEng
Quality post-mortem writeup of last week’s Reddit outage. tl;dr: an in-place Kubernetes upgrade broke it. We use blue/green deployments — with two separate parallel k8s clusters — in order to avoid this risk, as k8s upgrades are very very risky in our experience; tiny “minor” changes often seem to cause breakage.
(tags: k8s kubernetes outages reddit ops post-mortems)
Superb thread on effective AI regulation
from Baldur Bjarnason:
1. First, you clarify that for the purposes of Section 230 protection (or similar), whoever provides the AI as a service is responsible for its output as a publisher. If Bing Chat says something offensive then Microsoft would be as liable as if it were an employee;
2. You’d set a law requiring tools that integrate generative AI to attach disclosures to the content. Gmail/Outlook should pop up a notice when you get an email that their AI generated. Word/Docs should have metadata fields and notices when you open files that have used built-in AI capabilities. AI chatbots have to disclose that they are bots. Copilot should add a machine-parsable code comment. You could always remove the metadata, but doing so would establish an intent to deceive;
3. Finally, you’d mandate that all training data sets be made opt-in (or that all of its contents are released under a permissive license) and public. Heavy fines for non-disclosure. Heavy fines for violating opt-in. Even heavier fines for lying about your training data set. Make every AI model a “vegan” model. Remove every ethical and social concern about the provenance and rights regarding the training data.
I think #3 in particular is the most important of all.
(tags: ai regulation data-privacy training llm ethics)
Bing Chat is still vulnerable to hidden prompt injection attacks
It happily parses hidden text in webpages, acting on information that isn’t visible to human viewers. Related: https://twitter.com/matteosonoioo/status/1630941926454185992/photo/1 , where Matteo Contrini demonstrated a prompt-injection attack that turns it into a scammer.
(tags: bing-chat bing chatgpt openai prompt-injection exploits attacks hidden-text)
Pop Culture Pulsar: Origin Story of Joy Division’s Unknown Pleasures Album Cover
Great dig into the CP1919 pulsar signal plot that was used for “Unknown Pleasures”:
This plotting of sequences like this, it started just a little bit earlier when we were looking at potentially drifting subpulses within the major pulse itself. So, the thought was, well, is there something like this peak here, which on the next pulse moves over here, and then moves over here, and over there. Actually, would be moving this way in that case – either way. I think Frank Drake and I published a paper in Science Magazine on exactly that issue – suggesting there might be drifting subpulses within the major pulse, which would then get back to the physics of what was causing the emission in the first place. So, then the thought was, well let’s plot out a whole array of pulses, and see if we can see particular patterns in there. So that’s why, this one was the first I did – CP1919 – and you can pick out patterns in there if you really work at it. But I think the answer is, there weren’t any that were real obvious anyway. I don’t really recall, but my bet is that the first one of these that I did, I didn’t bother to block out the stuff, and I found that it was just too confusing. So then, I wrote the program so that I would block out when a hill here was high enough, then the stuff behind it would stay hidden. And it was pretty easy to do from a computer perspective.
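The hidden-line trick he describes is easy to play with. Here's a toy matplotlib sketch using synthetic pulses rather than the real CP1919 data: each trace is filled in the background colour before its line is drawn, so nearer "hills" hide whatever sits behind them:

```python
# Toy recreation of the "block out what's behind the hills" plotting trick,
# using synthetic noisy pulses (not real pulsar data).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 500)
fig, ax = plt.subplots(figsize=(5, 7), facecolor="black")

n_traces = 60
for i in range(n_traces):
    # a noisy trace with a few random sub-peaks near the centre
    y = 0.05 * rng.standard_normal(x.size)
    for _ in range(rng.integers(1, 4)):
        centre = rng.normal(0, 0.15)
        y += rng.uniform(0.3, 1.0) * np.exp(-((x - centre) / 0.05) ** 2)
    offset = (n_traces - i) * 0.35  # earlier traces sit higher up the page
    # fill under the curve in the background colour so it hides what's behind,
    # then draw the white line on top
    ax.fill_between(x, offset, y + offset, color="black", zorder=i)
    ax.plot(x, y + offset, color="white", linewidth=0.8, zorder=i)

ax.axis("off")
plt.show()
```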
(tags: design joy-division music science physics pulsars astronomy cp1919 dataviz)
moyix/gpt-wpre: Whole-Program Reverse Engineering with GPT-3
This is a little toy prototype of a tool that attempts to summarize a whole binary using GPT-3 (specifically the text-davinci-003 model), based on decompiled code provided by Ghidra. However, today’s language models can only fit a small amount of text into their context window at once (4096 tokens for text-davinci-003, a couple hundred lines of code at most) — most programs (and even some functions) are too big to fit all at once. GPT-WPRE attempts to work around this by recursively creating natural language summaries of a function’s dependencies and then providing those as context for the function itself. It’s pretty neat when it works! I have tested it on exactly one program, so YMMV.
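The recursive-summarization idea is roughly this. A sketch, not GPT-WPRE's actual code: the call graph is a toy, it ignores recursion/cycles, and summarize_with_llm stands in for a call to text-davinci-003 or similar:

```python
# Sketch of recursive summarization over a call graph: summarize a function's
# callees first, then summarize the function itself with those summaries as
# context, so nothing has to fit the context window all at once.
from functools import lru_cache

# toy call graph and decompiled code, keyed by function name (invented)
CALLEES = {"main": ["parse_args", "run"], "run": ["helper"], "parse_args": [], "helper": []}
CODE = {name: f"/* decompiled code of {name} */" for name in CALLEES}

def summarize_with_llm(prompt: str) -> str:
    raise NotImplementedError("call your LLM of choice here")

@lru_cache(maxsize=None)
def summarize(func: str) -> str:
    # summaries of everything this function calls (computed recursively)
    callee_context = "\n".join(
        f"{callee}: {summarize(callee)}" for callee in CALLEES[func]
    )
    prompt = (
        "Summarize what this decompiled function does.\n"
        f"Summaries of functions it calls:\n{callee_context or '(none)'}\n\n"
        f"{CODE[func]}"
    )
    return summarize_with_llm(prompt)

# summarize("main") walks the call graph bottom-up via recursion + memoization
```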
(tags: gpt-3 reverse-engineering ghidra decompilation reversing llm)
-
Jesus — the legality of the poorly-policed Common Crawl training data is WAY worse than I thought, check this out:
When Lapine used it to scan the LAION database, she found an image of her own face. She was able to trace this image back to photographs taken by a doctor when she was undergoing treatment for a rare genetic condition. The photographs were taken as part of her clinical documentation, and she signed documents that restricted their use to her medical file alone. The doctor involved died in 2018. Somehow, these private medical images ended up online, then in Common Crawl’s archive and LAION’s dataset.
Surely this is a straight-up violation of patient confidentiality laws?! This is appalling. LAION’s FAQs are useless regarding this; as Lapine isn’t in the EU, she can’t even use GDPR to request its removal, and even if she were, these medical images don’t contain enough data to qualify under LAION’s rules.
(tags: ai ml fair-use copyright common-crawl training laion photos medical-data hipaa)
Vintage Sunglasses and Vintage Eyewear from Klasik.Org
It’s coming up to new specs time… some amazing Michael Caine style here.
The Yvan Collection – Curry & Paxton
More retro specs action
(tags: retro vintage specs yvan curry-and-paxton michael-caine glasses shopping)
Cat6a FTP Tool-Less Keystone Module
For future use — CAT6A cable endpoints which don’t require tricky crimping: “no crimp tool required at all, very much worth the extra cost, and they clip into the wall sockets or a patch panel … you can do them with your fingers and a flush snips to get rid of the ends after you push the wires in” says Adam C on ITC Slack, at https://irishtechcommunity.slack.com/archives/C11BG27L2/p1678841261913069
Infra-Red, In Situ (IRIS) Inspection of Silicon
Cool:
This post introduces a technique I call “Infra-Red, In Situ” (IRIS) inspection. It is founded on two insights: first, that silicon is transparent to infra-red light; second, that a digital camera can be modified to “see” in infra-red, thus effectively “seeing through” silicon chips. We can use these insights to inspect an increasingly popular family of chip packages known as Wafer Level Chip Scale Packages (WLCSPs) by shining infrared light through the back side of the package and detecting reflections from the lowest layers of metal using a digital camera. This technique works even after the chip has been assembled into a finished product. However, the resolution of the imaging method is limited to micron-scale features.
(tags: electronics hardware reversing bunnie-huang infrared x-ray-vision silicon)
1 in 35 Long COVID sufferers died within a year
This is a really shocking figure — via Daniel Griffin MD: 1 in 35 people with PASC were dead before a year had passed, if they survived that first 30 days. Given that current estimates are that 65 million people worldwide now have long COVID, that’s going to be quite an impact.
(tags: covid-19 mortality grim long-covid via:daniel-griffin)
The privacy loophole in your doorbell
Here’s why you never install internet-connected cameras inside your house: ‘Police were investigating his neighbor. A judge gave officers access to all his security-camera footage, including inside his home.’
(tags: amazon police privacy surveillance dystopia us-politics)
Seabirds are not at risk from offshore wind turbines
At least according to this survey by Swedish power giant Vattenfall:
The movements of herring gulls, gannets, kittiwakes, and great black-backed gulls were studied in detail from April to October, when bird activity is at its height. (This study only looked at four bird species, but Vattenfall says the model can and should be applied to more types of seabirds and to onshore wind farms as well.) The study’s findings: Not a single collision between a bird and a rotor blade was recorded.
(tags: seabirds birds safety wind-turbines offshore-wind renewables wildlife)
Metformin, a new drug to prevent long covid
‘Over a thousand people with mild-to-moderate Covid were randomly assigned to 2 weeks of metformin (500 mg pills, 1 on day 1, twice a day for 4 days, then 500 mg in AM and 1000 mg in PM for 9 days) or placebo. There was a 42% reduction of subsequent Long Covid as you can see by the event curve below, which corresponds to an absolute decrease of 4.3%, from 10.6% reduced to 6.3%.’ Still no use for _treating_ long COVID though.
(tags: covid-19 long-covid metformin drugs papers)
WhatsApp screenshots of Boris Johnson’s innumeracy
We are so lucky in Ireland that we didn’t have to suffer this kind of idiocy driving our COVID response. Jaw-dropping levels of blustering ignorance and innumeracy.
(tags: boris-johnson fail covid-19 uk uk-politics whatsapp)
-
“A programmable badge with fast updating E-Ink® display and wireless connectivity, powered by Raspberry Pi Pico W.” Lots of possibilities for these, and only EUR25!
(tags: raspberry-pi gadgets toget pico e-ink hardware hacking)
-
This is actually really effective; the past 3 years of product recommendations from r/BuyItForLife, queryable using ChatGPT (via valen)
(tags: via:valen ai recommendations search products reviews)
Hundreds of residents vent anger over ‘entirely pointless’ hydrogen heating trial
Greenwashing grey hydrogen as a “renewable” means of keeping home gas heating alive is not going well in Whitby:
Influential energy analyst Michael Liebreich and University of Cambridge mechanical engineering professor David Cebon drew attention to the now-37 independent studies showing that hydrogen boilers would require about five times more renewable energy than heat pumps — likely making them significantly more expensive to run. “This trial is entirely pointless in terms of proving whether hydrogen is the most cost-effective way of decarbonising homes,” Liebreich told the audience. “Every single systems analysis from every single expert who is not paid for by the gas industry or the heating industry has concluded that hydrogen plays little or no role. “The thing that it’s intended to do, though, is maintain the debate and discussion and the delay [of decarbonisation]. If you’re running a gas network organisation, as our next speaker [Cadent head of strategy, Angela Needle] does, what you really want is to continue to harvest profits off that. If you invest today in a gas distribution network, you get to charge 6% per year for 45 years on that investment and that’s until 2068.”
(tags: hydrogen h2 grey-hydrogen greenwashing gas natural-gas heating homes decarbonisation)
-
This is a decent product — “Nokia has announced one of the first budget Android smartphones designed to be repaired at home allowing users to swap out the battery in under five minutes, in partnership with iFixit.” I’ve been planning to buy a more repairable phone for my next iteration, so it’s either this or a Fairphone.
(tags: android hardware nokia phones right-to-repair repair ifixit)
copyright-respecting AI model training
Alex J Champandard is thinking about how AI model training can be done in a copyright-respecting and legal fashion:
With the criticism of web-scale datasets, it’s legitimate to ask the question: “What models are trained with best-in-class Copyright practices?” Answer: StyleGAN and FFHQ github.com/NVlabs/ffhq-dataset 100% transparent dataset, clear copyright, opt-in licensing, model respects terms.
(tags: copyright legal rights ip ai ml models training stylegan ffhq flickr)
The tech tycoon martyrdom charade
Anil Dash:
It’s impossible to overstate the degree to which many big tech CEOs and venture capitalists are being radicalized by living within their own cultural and social bubble. Their level of paranoia and contrived self-victimization is off the charts, and is getting worse now that they increasingly only consume media that they have funded, created by their own acolytes. In a way, it’s sort of like a “VC Qanon”, and it colors almost everything that some of the most powerful people in the tech industry see and do — and not just in their companies or work, but in culture, politics and society overall. We’re already seeing more and more irrational, extremist decision-making that can only be understood through this lens, because on its own their choices seem increasingly unfathomable.
(tags: vc tech anil-dash radicalization politics us-politics)
-
Interesting smart home component for Home Assistant —
This custom component will add crucial features to your climate-controlling TRV (Thermostatic Radiator Valves) to save you the work of creating automations to make it smart. It combines a room-temperature sensor, window/door sensors, weather forecasts, or an ambient temperature probe to decide when it should call for heat and automatically calibrate your TRVs to fix the imprecise measurements taken in the radiator’s vicinity.
So basically if you have smart TRVs and a room temperature sensor, you can drive them as a pair.
(tags: thermostat smart-home home-assistant heating trvs)
-
“ENA Express is a networking feature that uses the AWS Scalable Reliable Datagram (SRD) protocol to improve network performance in two key ways: higher single flow bandwidth and lower tail latency for network traffic between EC2 instances. SRD is a proprietary protocol that delivers these improvements through advanced congestion control, multi-pathing, and packet reordering directly from the Nitro card.” Right now this supports only intra-EC2 networking between instances running on the latest generation of instance types.
(tags: srd networking protocols ip ena-express aws amazon multi-pathing congestion-control nitro)
Eric Schmidt Is Building the Perfect AI War-Fighting Machine
Do you want Skynet? Because that’s how you get Skynet.
(tags: ai war us-politics eric-schmidt silicon-valley military weapons)
a COVID-aware activity tracker
Interesting thought experiment regarding chronic disease, long COVID, ME/CFS etc: ‘what might be in a convalescence mode, or a rest mode? And while I’m thinking of that, there’s a separate need, I think (hey! validate through research!) for, I don’t know, a chronic illness mode, because convalescence and rest are different things with different qualities distinct from the requirements and needs of people with long-term chronic illnesses. Some people who responded to my thinking-out-loud thread shared that you can use sleep tracking as a way to inform the spoons-for-the-day.’
(tags: apple fitness accessibility convalescence chronic-disease activity-tracking long-covid me)
A New Drug Switched Off My Appetite. What’s Left? | WIRED
How long is it before there’s an injection for your appetites, your vices? Maybe they’re not as visible as mine. Would you self-administer a weekly anti-avarice shot? Can Big Pharma cure your sloth, lust, wrath, envy, pride? Is this how humanity fixes climate change—by injecting harmony, instead of hoping for it at Davos?
Silicon Valley tech companies are the real paperclip maximizers
Another good Ted Chiang article —
Elon Musk spoke to the National Governors Association and told them that “AI is a fundamental risk to the existence of human civilization.” […] This scenario sounds absurd to most people, yet there are a surprising number of technologists who think it illustrates a real danger. Why? Perhaps it’s because they’re already accustomed to entities that operate this way: Silicon Valley tech companies. Consider: Who pursues their goals with monomaniacal focus, oblivious to the possibility of negative consequences? Who adopts a scorched-earth approach to increasing market share? This hypothetical strawberry-picking AI does what every tech startup wishes it could do — grows at an exponential rate and destroys its competitors until it’s achieved an absolute monopoly. The idea of superintelligence is such a poorly defined notion that one could envision it taking almost any form with equal justification: a benevolent genie that solves all the world’s problems, or a mathematician that spends all its time proving theorems so abstract that humans can’t even understand them. But when Silicon Valley tries to imagine superintelligence, what it comes up with is no-holds-barred capitalism.
(tags: superintelligence ted-chiang silicon-valley capitalism ai future civilization paperclip-maximisers)
-
The origins of computing, via Jacquard, Byron, Lovelace and Babbage — great thread from James Kelleher. Lovely prints, too.
(tags: prints lace lacemaking computing history jacquard byron babbage ada-lovelace punch-cards)
Where To Bring Visitors For Irish Food In Dublin
Decent list of Dublin eateries —
As a nation we’re still more well known for our imbibements than culinary prowess, but there’s no question that’s changing, and we feel it’s our patriotic duty to show visitors just how incredible the food in Dublin is. Consider this list as your go-to guide next time you want to show that Dublin is a whole lot more than fish & chips, full Irish breakfasts and spice bags (no disrespect to any of these fine dishes).
(tags: food dublin tourism restaurants irish)
_Building Machine Learning Models Like Open-Source Software_
ACM Viewpoint from Colin Raffel:
‘This Viewpoint advocates for tools and research advances that will allow pre-trained [machine learning] models to be built in the same way that we build open source software. Specifically, models should be developed by a large community of stakeholders that continually updates and improves them. Realizing this goal will require porting many ideas from open source software development to the building and training of pre-trained models, which motivates many new research problems and connections to existing fields.’
(tags: training machine-learning ml ai acm open-source)
-
‘Python wrapper for the Mastodon ( https://github.com/mastodon/mastodon/ ) API’ — looks nice and simple
-
Interesting thread on the current state of low-cost/low-power server hardware; I didn’t realise thin client boxes were so viable for this use case, these days. (I’ve just replaced my current home server with an ODROID HC4, and I’m absolutely delighted with it, though…)
-
GoMo, the Irish mobile phone operator, is offering roaming eSIMs with 10GB of data roaming in the US for EUR19.99 per month
Study of 500,000 Medical Records Links Viruses to Alzheimer’s Again And Again
While the study doesn’t demonstrate a causal link, the correlations are pretty striking — a good argument for greatly increasing vaccination rates for many viral diseases.
Around 80 percent of the viruses implicated in brain diseases were considered ‘neurotrophic’, which means they could cross the blood-brain barrier. “Strikingly, vaccines are currently available for some of these viruses, including influenza, shingles (varicella-zoster), and pneumonia,” the researchers write. “Although vaccines do not prevent all cases of illness, they are known to dramatically reduce hospitalization rates. This evidence suggests that vaccination may mitigate some risk of developing neurodegenerative disease.” The impact of viral infections on the brain persisted for up to 15 years in some cases. And there were no instances where exposure to viruses was protective.
(tags: viruses health medicine vaccines vaccination alzheimers parkinsons diseases)