Justin's Linklog Posts

FOSS funding vanishes from EU’s 2025 Horizon program plans

  • FOSS funding vanishes from EU’s 2025 Horizon program plans

    EU funding for open source dries up, redirected to AI slop instead:

    Funding for free and open source software (FOSS) initiatives under the EU’s Horizon program has mostly vanished from next year’s proposal, claim advocates who are worried for the future of many ongoing projects. Pierre-Yves Gibello, CEO of open-source consortium OW2, urged EU officials to re-evaluate the elimination of funding for the Next Generation Internet (NGI) initiative from its draft of 2025 Horizon funding programs in a recently published open letter. Gibello said the EU’s focus on enterprise-level FOSS is essential as the US, China and Russia mobilize “huge public and private resources” toward capturing the personal data of consumers, which the EU’s regulatory regime has decided isn’t going to fly in its territory. [….] “Our French [Horizon national contact point] was told – as an unofficial answer – that because lots of [Horizon] budget are allocated to AI, there is not much left for Internet infrastructure,” Gibello said.

    (tags: ai funding eu horizon foss via:the-register ow2 europe)

Retool

  • Retool

    A decent-looking no-code app builder, recommended by Corey of Last Week In AWS. Nice features:
    * Offers a self-hosted version running in a Docker container
    * Free tier for up to 5 users and 500 workflow runs per month
    * Integration with AWS services (S3, Athena, DynamoDB), Postgres, MySQL and Google Sheets
    * Push notifications for mobile

    (tags: retool apps hacking no-code coding via:lwia integration)

Invasions of privacy during the early years of the photographic camera

  • Invasions of privacy during the early years of the photographic camera

    “Overexposed”, at the History News Network:

    In 1904, a widow named Elizabeth Peck had her portrait taken at a studio in a small Iowa town. The photographer sold the negatives to Duffy’s Pure Malt Whiskey, a company that avoided liquor taxes for years by falsely advertising its product as medicinal. Duffy’s ads claimed the fantastical: that it cured everything from influenza to consumption; that it was endorsed by clergymen; that it could help you live until the age of 106. The portrait of Elizabeth Peck ended up in one of these dubious ads, published in newspapers across the country alongside what appeared to be her unqualified praise: “After years of constant use of your Pure Malt Whiskey, both by myself and as given to patients in my capacity as nurse, I have no hesitation in recommending it.” Duffy’s lies were numerous. Elizabeth Peck was not a nurse, and she had not spent years constantly slinging back malt beverages. In fact, she fully abstained from alcohol. Peck never consented to the ad.  The camera’s first great age — which began in 1888 when George Eastman debuted the Kodak — is full of stories like this one. Beyond the wonders of a quickly developing artform and technology lay widespread lack of control over one’s own image, perverse incentives to make a quick buck, and generalized fear at the prospect of humiliation and the invasion of privacy.
    Fantastic story, and interesting to see parallels with the modern experience of AI.

    (tags: ai future history photography privacy camera)

Phone geodata is being widely collected by US government agencies

  • Phone geodata is being widely collected by US government agencies

    More info on the current state of the post-Snowden geodata scraping:

    [Byron Tau was told] the government was buying up reams of consumer data — information scraped from cellphones, social media profiles, internet ad exchanges and other open sources — and deploying it for often-clandestine purposes like law enforcement and national security in the U.S. and abroad. The places you go, the websites you visit, the opinions you post — all collected and legally sold to federal agencies. In his new book, _Means of Control_, Tau details everything he’s learned since that dinner: An opaque network of government contractors is peddling troves of data, a legal but shadowy use of American citizens’ information that troubles even some of the officials involved. And attempts by Congress to pass privacy protections fit for the digital era have largely stalled, though reforms to a major surveillance program are now being debated.
    Great quote:
    Politico: You compare to some degree the state of surveillance in China versus the U.S. You write that China wants its citizens to know that they’re being tracked, whereas in the U.S., “the success lies in the secrecy.” What did you mean by that?
    Tau: That was a line that came in an email from a police officer in the United States who got access to a geolocation tool that allowed him to look at the movement of phones. And he was essentially talking about how great this tool was because it wasn’t widely, publicly known. The police could buy up your geolocation movements and look at them without a warrant. And so he was essentially saying that the success lies in the secrecy, that if people were to know that this was what the police department was doing, they would ditch their phones or they would not download certain apps.
    Based on Wolfie Christl’s research in Germany, the same data is being scraped here, too, regardless of any protection the GDPR might supposedly provide: https://x.com/WolfieChristl/status/1813221172927975722

    (tags: government privacy surveillance geodata phones mobile us-politics data-protection gdpr)

Mini.WebVM

  • Mini.WebVM

    Your own Linux box, built from a Dockerfile, virtualized in the browser via WebAssembly:

    WebVM is a Linux-like virtual machine running fully client-side in the browser. It is based on CheerpX, an x86 execution engine in WebAssembly by Leaning Technologies. With today’s update, you can deploy your own version of WebVM by simply forking the repo on GitHub and editing the included Dockerfile. A GitHub Actions workflow will automatically deploy it to GitHub Pages.
    This is absurdly cool. Demo at https://webvm.io/ (via Oisin)
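    The deploy-your-own flow really is just “fork, edit the Dockerfile, push”; here’s a sketch of the kind of Dockerfile involved (base image and packages are illustrative assumptions, not the repo’s actual contents):
    ```dockerfile
    # Illustrative only: CheerpX is an x86 execution engine, so a 32-bit
    # x86 userland image is a natural fit for the in-browser VM's filesystem.
    FROM i386/debian:buster-slim
    RUN apt-get update && apt-get install -y python3 vim curl
    WORKDIR /home/user
    CMD ["/bin/bash"]
    ```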

    (tags: docker virtualization webassembly wasm web containers webvm virtual-machines hacks via:oisin)

OliveTin

  • OliveTin

    “Give safe and simple access to predefined shell commands from a web interface.” This is great: my home server has a small set of hacky CGI scripts to run things like df(1), and it would be nice to have a cleaner UI for that purpose.
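    OliveTin is driven by a YAML config file; a minimal sketch for the df(1) case, based on the project’s documented “actions” format (field names per the OliveTin docs, though details may vary by version):
    ```yaml
    # /etc/OliveTin/config.yaml -- a minimal sketch, not a full config
    actions:
      - title: Show disk space
        shell: df -h
      - title: Uptime
        shell: uptime
    ```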

    (tags: ui cli shell self-hosted home unix linux web)

_An Architectural Risk Analysis of Large Language Models_ [pdf]

  • _An Architectural Risk Analysis of Large Language Models_ [pdf]

    The Berryville Institute of Machine Learning presents “a basic architectural risk analysis (ARA) of large language models (LLMs), guided by an understanding of standard machine learning (ML) risks as previously identified”:
    “This document identifies a set of 81 specific risks associated with an LLM application and its LLM foundation model. We organize the risks by common component and also include a number of critical LLM black box foundation model risks as well as overall system risks. Our risk analysis results are meant to help LLM systems engineers in securing their own particular LLM applications. We present a list of what we consider to be the top ten LLM risks (a subset of the 81 risks we identify). In our view, the biggest challenge in secure use of LLM technology is understanding and managing the 23 risks inherent in black box foundation models. From the point of view of an LLM user (say, someone writing an application with an LLM module, someone using a chain of LLMs, or someone simply interacting with a chatbot), choosing which LLM foundation model to use is confusing. There are no useful metrics for users to compare in order to make a decision about which LLM to use, and not much in the way of data about which models are best to use in which situations or for what kinds of application. Opening the black box would make these decisions possible (and easier) and would in turn make managing hidden LLM foundation risks possible. For this reason, we are in favor of regulating LLM foundation models. Not only the use of these models, but the way in which they are built (and, most importantly, out of what) in the first place.”
    This is excellent as a baseline for security assessment of LLM-driven systems. (via Adam Shostack)

    (tags: security infosec llms machine-learning biml via:adam-shostack ai risks)

Long Covid: The Answers

  • Long Covid: The Answers

    A new, reliable resource for LC sufferers, featuring expert advice from Prof Danny Altmann, Dr Funmi Okunola, and Dr Daniel Griffin (of This Week in Virology fame):

    Navigating the complexities of long Covid can feel overwhelming amidst the sea of conflicting and mis-information. That’s why we’ve built Long Covid The Answers: to provide clarity and credible insights. We’re proud to have a Certified CPD Podcast for Educating Medical Staff. Earn up to 15 certified Mainpro+® credits for the podcast series! Earn Certified CPD credits indirectly using the site in your clinical practice. We’re dedicated to providing hand-curated, credible information and relief for individuals battling Long COVID. We’re proud to have a team of esteemed Doctors, Professors, Scientists, and individuals directly affected by long Covid and their caregivers onboard.
    Given the decent profile of the experts involved, this could be handy for anyone attempting to receive treatment for LC and facing ignorance from their healthcare providers.

    (tags: long-covid covid-19 medicine health)

The bogus CVE problem [LWN.net]

  • The bogus CVE problem [LWN.net]

    As curl’s Daniel Stenberg writes:

    It was obvious already before that NVD really does not try very hard to actually understand or figure out the problem they grade. In this case it is quite impossible for me to understand how they could come up with this severity level. It’s like they saw “integer overflow” and figure that wow, yeah that is the most horrible flaw we can imagine, but clearly nobody at NVD engaged their brains nor looked at the “vulnerable” code or the patch that fixed the bug. Anyone that looks can see that this is not a security problem.

    (tags: cve cvss infosec security-circus lwn vulnerabilities curl soc2)

DOJ seizes ‘bot farm’ operated by RT editor on behalf of the Russian government

  • DOJ seizes ‘bot farm’ operated by RT editor on behalf of the Russian government

    Lest anyone was thinking Russian bot farms were no more after the demise of Prigozhin:

    The Department of Justice announced on Tuesday that it seized two domain names and more than 900 social media accounts it claims were part of an “AI-enhanced” Russian bot farm. Many of the accounts were designed to look like they belonged to Americans and posted content about the Russia-Ukraine war, including videos in which Russian President Vladimir Putin justified Russia’s invasion of Ukraine. The Justice Department claims that an employee of RT — Russia’s state media outlet — was behind the bot farm. RT’s leadership signed off on a plan to use the bot farm to “distribute information on a wide-scale basis,” amplifying the publication’s reach on social media, an FBI agent alleged in an affidavit. To set up the bot farm, the employee bought two domain names from Namecheap, an Arizona-based company, that were then used to create two email servers, the affidavit claims. The servers were then used to create 968 email addresses, which were in turn used to set up social media accounts, according to the affidavit and the DOJ. The effort was concentrated on X, where profiles were created with Meliorator, an “AI-enabled bot farm generation and management software”. “Russia intended to use this bot farm to disseminate AI-generated foreign disinformation, scaling their work with the assistance of AI to undermine our partners in Ukraine and influence geopolitical narratives favorable to the Russian government.”
    Looks like it used a lot of now-familiar bot attributes, such as following high-profile accounts and other bot accounts, liking other bots’ posts, and using AI-generated profile images. It’s not clear, but it sounds like the posted content is also AI-generated, based on defined “personalities”. More on Meliorator and the operations of this AI bot-farming tool in this Joint Advisory PDF: https://www.ic3.gov/Media/News/2024/240709.pdf

    (tags: bots russia bot-farms twitter x meliorator ai social-media spam propaganda rt ukraine)

Preliminary Notes on the Delvish Dialect, by Bruce Sterling

  • Preliminary Notes on the Delvish Dialect, by Bruce Sterling

    I’m inventing a handy neologism (as is my wont), and I’m calling all of these Large Language Model dialects “Delvish.” […] Delvish is a language of struggle. Humans struggle to identify and sometimes to weed out texts composed in “Delvish.” Why? Because humans can deploy fast-and-cheap Delvish and then falsely claim to have laboriously written these texts with human effort, all the while demanding some expensive human credit-and-reward for this machine-generated content. Obviously this native 21st-century high-tech/lowlife misdeed is a novel form of wickedness, somehow related to plagiarism, or impersonation, or false-witness, or classroom-cheating, or “fake news,” or even dirt-simple lies and frauds, but newly chrome-plated with AI machine-jargon. These newfangled crimes need a whole set of neologisms, but in the meantime, the frowned-upon Delvish dialect is commonly Considered-Bad and is under active linguistic repression. Unwanted, spammy Delvish content has already acquired many pejorative neologisms, such as “fluff,” “machine slop,” “botshit” and “ChatGPTese.” Apparently good or bad, they’re all Delvish, though. Some “Delvish” is pretty easy to recognize, because of how it feels to the reader. The emotional affect of LLM consumer-chatbots has the tone of a servile, cringing, oddly scatterbrained university professor. This approach to the human reader is a feature, not a bug, because it is inhumanly and conspicuously “honest, helpful and harmless.”

    (tags: commentary cyberpunk language llms delvish bruce-sterling neologisms dialects)

turbopuffer

  • turbopuffer

    A new proprietary vector-search-oriented database, built statelessly on object storage (S3) with “smart caching” on SSD/RAM — “a solution that scales effortlessly to billions of vectors and millions of tenants/namespaces”. Apparently it uses a new storage engine: “an object-storage-first storage engine where object storage is the source of truth (LSM). […] In order to optimize cold latency, the storage engine carefully handles roundtrips to object storage. The query planner and storage engine have to work in concert to strike a delicate balance between downloading more data per roundtrip, and doing multiple roundtrips (P90 to object storage is around 250ms for <1MB). For example, for a vector search query, we aim to limit it to a maximum of three roundtrips for sub-second cold latency.” HN comments thread: https://news.ycombinator.com/item?id=40916786
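    The roundtrip arithmetic is worth making explicit: at ~250ms P90 per object-storage roundtrip, a three-roundtrip budget keeps cold queries under a second. A toy sketch of that trade-off (the 250ms figure is from the quoted post; the rest is illustrative):
    ```python
    P90_ROUNDTRIP_MS = 250  # <1MB GET against object storage, per the post

    def p90_cold_latency_ms(roundtrips: int) -> int:
        """Cold-query latency is dominated by object-storage roundtrips."""
        return roundtrips * P90_ROUNDTRIP_MS

    # Fewer, fatter reads beat many small ones for cold queries:
    for n in (1, 3, 8):
        print(f"{n} roundtrip(s) -> ~{p90_cold_latency_ms(n)} ms P90 cold")
    # 3 roundtrips stays within the sub-second target; 8 blows the budget.
    ```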

    (tags: aws s3 storage search vectors vector-search fuzzy-search lsm databases via:hn)

Journals should retract Richard Lynn’s racist ‘research’ articles

  • Journals should retract Richard Lynn’s racist ‘research’ articles

    Richard Lynn was not the finest example of Irish science:

    Lynn, who died in 2023, was a professor at the University of Ulster and the president of the Pioneer Fund, a nonprofit foundation created in 1937 by American Nazi sympathizers to support “race betterment” and “race realism.” It has been a primary funding source of scientific racism and, for decades, Lynn was one of the loudest proponents of the unfounded idea that Western civilization is threatened by “inferior races” that are genetically predisposed to low intelligence, violence, and criminality. Lynn’s work has been repeatedly condemned by social scientists and biologists for using flawed methodology and deceptively collated data to support racism. In particular, he created deeply flawed datasets purporting to show differences in IQ culminating in a highly cited national IQ database. Many of Lynn’s papers appear in journals owned by the billion-dollar publishing giants Elsevier and Springer, including Personality and Individual Differences and Intelligence.
    The ESRI, for whom Lynn was a Research Professor in the 1960s and 70s, have quietly removed his output from their archives, thankfully. But as this article notes, his papers and faked datasets still feature in many prestigious journals. (via Ben)

    (tags: richard-lynn racists research papers elsevier iq via:bwalsh)

Three-finger salute: Hunger Games symbol adopted by Myanmar protesters

Microsoft AI CEO doesn’t understand copyright

  • Microsoft AI CEO doesn’t understand copyright

    Mustafa Suleyman, the CEO of Microsoft AI, says “the social contract for content that is on the open web is that it’s ‘freeware’ for training AI models”, that it “is fair use”, and that “anyone can copy it”. As Ed Newton-Rex of Fairly Trained notes:

    This is categorically false. Content released online is still protected by copyright. You can’t copy it for any purpose you like simply because it’s on the open web. Creators who have been told for years to publish online, often for free, for exposure, may object to being retroactively told they were entering a social contract that let anyone copy their work.
    It’s really shocking to see this. How on earth has Microsoft’s legal department not hit the brakes on this?

    (tags: ai law legal ip open-source freeware fair-use copying piracy)

Perplexity AI is susceptible to prompt injection

Microsoft Refused to Fix Flaw Years Before SolarWinds Hack

_ChatGPT is bullshit_ Ethics and Information Technology vol. 26

  • _ChatGPT is bullshit_ Ethics and Information Technology vol. 26

    Can’t argue with this paper. Abstract:

    Recently, there has been considerable interest in large language models: machine learning systems which produce human-like text and dialogue. Applications of these systems have been plagued by persistent inaccuracies in their output; these are often called “AI hallucinations”. We argue that these falsehoods, and the overall activity of large language models, is better understood as bullshit in the sense explored by Frankfurt (_On Bullshit_, Princeton, 2005): the models are in an important way indifferent to the truth of their outputs. We distinguish two ways in which the models can be said to be bullshitters, and argue that they clearly meet at least one of these definitions. We further argue that describing AI misrepresentations as bullshit is both a more useful and more accurate way of predicting and discussing the behaviour of these systems.

    (tags: ai chatgpt hallucinations bullshit funny llms papers)

Death from the Skies, Musk Edition

  • Death from the Skies, Musk Edition

    Increasing launches means increasing space junk falling from the skies:

    SpaceX has dumped 250 pounds of trash on Saskatchewan. Things you don’t want coming your way at terminal velocity include an 8-foot-tall, 80-pound wall panel shaped like a spear. It turns out that Canada is an entirely other country than Texas, so this is something of an international incident, which Sam Lawler has been documenting in this epic thread over the past few months.

    (tags: space space-junk saskatchewan canada via:jwz)

How to keep using adblockers on chrome and chromium

  • How to keep using adblockers on chrome and chromium

    Google’s Manifest V3 has no analogue to the webRequestBlocking API, which is necessary for (effective) adblockers to work. Starting in Chrome version 127, the transition to MV3 will start cutting off the use of MV2 extensions altogether. This will inevitably piss off enterprises when their extensions don’t work, so the ExtensionManifestV2Availability key was added, and will presumably stay forever after enterprises complain enough. You can use this as a regular user, which will let you keep your MV2 extensions even after they’re supposed to stop working.
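    For reference, on Linux the policy can be set without any enterprise management tooling; a sketch, assuming Chrome’s documented managed-policy locations (value 2 means “Manifest V2 enabled”; Chromium reads /etc/chromium/policies/managed instead, and on Windows it’s a registry value under HKLM\SOFTWARE\Policies\Google\Chrome):
    ```sh
    # Chrome reads managed policies from JSON files in this directory on Linux.
    sudo mkdir -p /etc/opt/chrome/policies/managed
    echo '{ "ExtensionManifestV2Availability": 2 }' | \
      sudo tee /etc/opt/chrome/policies/managed/mv2.json
    # Restart Chrome and check chrome://policy to confirm it took effect.
    ```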

    (tags: google chrome chromium adblockers extensions via:micktwomey privacy)

AI trained on photos from kids’ entire childhood without their consent

  • AI trained on photos from kids’ entire childhood without their consent

    Here’s the terrible thing about AI model training sets —

    LAION began removing links to photos from the dataset while also advising that “children and their guardians were responsible for removing children’s personal photos from the Internet.” That, LAION said, would be “the most effective protection against misuse.” [Hye Jung Han] told Wired that she disagreed, arguing that previously, most of the people in these photos enjoyed “a measure of privacy” because their photos were mostly “not possible to find online through a reverse image search.” Likely the people posting never anticipated their rarely clicked family photos would one day, sometimes more than a decade later, become fuel for AI engines.
    And indeed, here we are, with our family photos ingested long ago into many, many models, mainly hosted in jurisdictions outside the GDPR, and with no practical way to avoid it. Is there a genuine way to opt out, at this stage? Even if we do it for LAION, what about all the other model scrapes that have gone into OpenAI, Apple, Google, et al? Ugh, what a mess.

    (tags: privacy data-protection kids children family laion web-scraping ai models photos)

Apple’s Private Cloud Compute

  • Apple’s Private Cloud Compute

    “A new frontier for AI privacy in the cloud” — the core models are not built on user data; they’re custom, built with licensed data ( https://machinelearning.apple.com/research/introducing-apple-foundation-models ) plus some scraping of the “public web”, and hosted in Apple DCs. The quality of the core hosted models was evaluated against gpt-3.5-turbo-0125, gpt-4-0125-preview, and a bunch of open source (Mistral/Gemma) models, with favourable results on safety, harmfulness, and output quality. The cloud API that devices call out to is built with a pretty amazing set of steps to validate security and avoid PII leakage (accidental or not). User data is sent alongside each request, and securely wiped immediately afterwards. This actually looks like a massive step forward; kudos to Apple! I hope it pans out like this blog post suggests it should. At the very least it now provides a baseline that other hosted AI systems need to meet — OpenAI are screwed. Having said that, there’s still a very big question about the legal issues of scraping the “public web” for training data relying on opt-outs, and where it meets GDPR rights — as with all current major AI model scrapes. But this is undoubtedly a step forward.

    (tags: ai apple security privacy pii)

Vercel charges Cara $96k for serverless API calls

I watched Nvidia’s Computex 2024 keynote and it made my blood run cold | TechRadar

  • I watched Nvidia’s Computex 2024 keynote and it made my blood run cold | TechRadar

    This article doesn’t pull any punches — “all I saw was the end of the last few glaciers on Earth and the mass displacement of people that will result from the lack of drinking water; the absolutely massive disruption to the global workforce that ‘digital humans’ are likely to produce; and ultimately a vision for the future that centers capital-T Technology as the ultimate end goal of human civilization rather than the 8 billion humans and counting who will have to live — and a great many will die before the end — in the world these technologies will ultimately produce with absolutely no input from any of us. […] I always feared that the AI data center boom was likely going to make the looming climate catastrophe inevitable, but there was something about seeing it all presented on a platter with a smile and an excited presentation that struck me as more than just tone-deaf. It was damn near revolting.”

    (tags: ai energy gpus nvidia humanity future climate-change neo-luddism)

“TIL you need to hire a prompt engineer to get actual customer support at Stripe”

  • “TIL you need to hire a prompt engineer to get actual customer support at Stripe”

    This is the kind of shit that happens when you treat technical support as just a cost centre to be automated away. Check out the last line: “I’m reaching out to the official Stripe support forum here because our account has been closed and Stripe is refusing to export our card data. We are set to lose half our revenue in recurring Stripe subscriptions with no way to migrate them and no recourse. […. omitting long tale of woe here…] Now, our account’s original closure date has come, and sure enough, our payments have been disabled. The extension was not honored. I’m sure this was an honest mistake, but I wonder if Stripe has reviewed our risk as carefully as they confirmed our extension (not very). Stripe claims to have 24/7 chat and phone support, but I wasn’t able to convince the support AI this was urgent enough to grant me access.”

    (tags: ai fail stripe support technical-support cost-centres business llms)

_Surveilling the Masses with Wi-Fi-Based Positioning Systems_

  • _Surveilling the Masses with Wi-Fi-Based Positioning Systems_

    This is pretty crazy stuff, I had no idea the WPSes were fully queryable:

    Wi-Fi-based Positioning Systems (WPSes) are used by modern mobile devices to learn their position using nearby Wi-Fi access points as landmarks. In this work, we show that Apple’s WPS can be abused to create a privacy threat on a global scale. We present an attack that allows an unprivileged attacker to amass a worldwide snapshot of Wi-Fi BSSID geolocations in only a matter of days. Our attack makes few assumptions, merely exploiting the fact that there are relatively few dense regions of allocated MAC address space. Applying this technique over the course of a year, we learned the precise locations of over 2 billion BSSIDs around the world. The privacy implications of such massive datasets become more stark when taken longitudinally, allowing the attacker to track devices’ movements. While most Wi-Fi access points do not move for long periods of time, many devices — like compact travel routers — are specifically designed to be mobile. We present several case studies that demonstrate the types of attacks on privacy that Apple’s WPS enables: We track devices moving in and out of war zones (specifically Ukraine and Gaza), the effects of natural disasters (specifically the fires in Maui), and the possibility of targeted individual tracking by proxy — all by remotely geolocating wireless access points. We provide recommendations to WPS operators and Wi-Fi access point manufacturers to enhance the privacy of hundreds of millions of users worldwide. Finally, we detail our efforts at responsibly disclosing this privacy vulnerability, and outline some mitigations that Apple and Wi-Fi access point manufacturers have implemented both independently and as a result of our work.
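    The scale comes from two properties: a WPS will answer location queries for arbitrary BSSIDs, and (per the paper) Apple’s API also volunteers the locations of up to several hundred nearby BSSIDs with each response, so every hit seeds more queries. A sketch of that crawl loop, with query_wps() as a hypothetical stand-in for the real, undocumented API:
    ```python
    from collections import deque

    def query_wps(bssid: str) -> tuple[tuple[float, float] | None, list[str]]:
        """Hypothetical stand-in for a WPS lookup: returns the queried
        BSSID's (lat, lon) if known, plus nearby BSSIDs the service
        volunteers alongside the answer (the amplification the paper uses)."""
        raise NotImplementedError  # the real endpoint is undocumented

    def crawl(seeds: list[str],
              limit: int = 1_000_000) -> dict[str, tuple[float, float]]:
        """Breadth-first crawl of BSSID space starting from a few seeds."""
        located: dict[str, tuple[float, float]] = {}
        queue = deque(seeds)
        while queue and len(located) < limit:
            bssid = queue.popleft()
            if bssid in located:
                continue
            position, neighbours = query_wps(bssid)
            if position is not None:
                located[bssid] = position
            queue.extend(n for n in neighbours if n not in located)
        return located
    ```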

    (tags: geolocation location wifi wps apple google infosec privacy)

Faking William Morris, Generative Forgery, and the Erosion of Art History

Technical post-mortem on the Google/UniSuper account deletion

  • Technical post-mortem on the Google/UniSuper account deletion

    “Google operators followed internal control protocols. However, one input parameter was left blank when using an internal tool to provision the customer’s Private Cloud. As a result of the blank parameter, the system assigned a then-unknown default fixed 1-year term value for this parameter. After the end of the system-assigned 1-year period, the customer’s GCVE Private Cloud was deleted. No customer notification was sent because the deletion was triggered as a result of a parameter being left blank by Google operators using the internal tool, and not due to a customer deletion request. Any customer-initiated deletion would have been preceded by a notification to the customer.” Ouch.
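    The general failure mode here is an optional input that silently falls back to a destructive default. A hypothetical sketch of the anti-pattern and the obvious fix (all names invented for illustration; nothing here is Google’s actual tooling):
    ```python
    from datetime import date, timedelta

    DEFAULT_TERM_DAYS = 365  # the hidden fallback at the root of the problem

    def provision_bad(customer: str, term_days: int | None = None) -> date:
        """Anti-pattern: a blank term silently becomes a fixed 1-year term,
        after which the environment is deleted with no notification."""
        if term_days is None:
            term_days = DEFAULT_TERM_DAYS  # the operator never sees this
        return date.today() + timedelta(days=term_days)

    def provision_good(customer: str, term_days: int | None = None) -> date:
        """Safer: refuse silent defaults for anything that can destroy data."""
        if term_days is None:
            raise ValueError(
                f"term_days must be set explicitly for {customer!r}; "
                "refusing to guess a deletion date"
            )
        return date.today() + timedelta(days=term_days)
    ```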

    (tags: cloud ops google tools ux via:scott-piper fail infrastructure gcp unisuper)

Innards of MS’ new Recall app

  • Innards of MS’ new Recall app

    Some technical details on the implementation of this new built-in key- and screen-logger, bundled with current versions of Windows, via Kevin Beaumont:
    “Microsoft have decided to bake essentially an infostealer into base Windows OS and enable by default. From the Microsoft FAQ: ‘Note that Recall does not perform content moderation. It will not hide information such as passwords or financial account numbers.’ Info is stored locally – but rather than something like Redline stealing your local browser password vault, now they can just steal the last 3 months of everything you’ve typed and viewed in one database.”
    It requires ARM-based hardware with a dedicated NPU (“neural processor”).
    “Recall uses a bunch of services themed CAP – Core AI Platform. Enabled by default. It spits constant screenshots … into the current user’s AppData as part of image storage. The NPU processes them and extracts text, into a database file. The database is SQLite, and you can access it as the user, including programmatically. It 100% does not need physical access and can be stolen.” “[The screenshots are] written into an ImageStorage folder and there’s a separate process and SQLite database for them too; it categorises what’s in them. There’s a GUI that lets you view any of them.”
    Data is not stored with any additional crypto, beyond disk-level encryption via BitLocker. On the upside, for non-corporate users, “there’s a tray icon and you can disable it in Settings.” But for corps: “Recall has been enabled by default globally for Microsoft Intune-managed users, for businesses.”
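    Since the store is plain SQLite readable by the current user, “access it programmatically” needs nothing more than the standard library. A sketch (the AppData path here follows early public reporting but is an assumption, as is everything about the schema, so this only lists tables rather than querying them):
    ```python
    import glob
    import os
    import sqlite3

    # Assumed location, per early public write-ups of Recall's on-disk layout;
    # treat both the path and any schema as unverified.
    pattern = os.path.expandvars(
        r"%LOCALAPPDATA%\CoreAIPlatform.00\UKP\*\ukg.db"
    )
    for db_path in glob.glob(pattern):
        con = sqlite3.connect(db_path)
        tables = con.execute(
            "SELECT name FROM sqlite_master WHERE type='table'"
        ).fetchall()
        print(db_path, [name for (name,) in tables])
        con.close()
    ```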

    (tags: microsoft recall security infosec keyloggers via:kevin-beaumont sqlite)