
Research highlight: Cliopatra: Extracting Private Information from LLM Insights

When Anthropic came up with a new "privacy-preserving analysis system" to gain insights into AI use, and didn't use any provably robust privacy notion to back up their privacy claims, I was mildly surprised. Surely they have both the money and the scientific maturity to do better?

But Clio, the system in question, sounded relatively reasonable, with multiple layers of risk mitigation built in. Maybe adding differential privacy would have been overkill. I also didn't want to publicly criticize their approach in the absence of demonstrated real-world risk. So I didn't comment on their approach.

You can probably guess where this is going.

Fast forward to last week, and a new paper: Cliopatra: Extracting Private Information from LLM Insights, by Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro, and Peter Kairouz. The authors show that with carefully designed attacks on Clio, they can bypass all the ad hoc mitigations, and successfully extract users' medical histories, in a way that provides 100% attacker certainty for some records.

This is a new and clever take on an old attack. We've known for decades that k-anonymity is vulnerable to active attacks. Here, this is combined with prompt injection to encourage the LLM "summarizer" to actually include information from unique records. Perhaps more surprisingly, the authors find that some defensive layers are simply ineffective: the "LLM auditors" systematically report low privacy risk, and entirely fail to detect the attacks.
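To see why the classic half of this attack works, here is a minimal, hypothetical sketch (not the paper's actual setup, and not Clio's real pipeline): an aggregator that only releases attribute values shared by at least k records, and an active attacker who injects k−1 fake records matching a target, pushing the target's otherwise-unique value over the threshold.

```python
from collections import Counter

K = 5  # aggregation threshold: only values appearing in >= K records are released

def release(records, k=K):
    """Release only the attribute values shared by at least k records."""
    counts = Counter(records)
    return {value for value, n in counts.items() if n >= k}

# The target's record is unique, so the threshold alone suppresses it.
genuine = ["diabetes"] * 7 + ["hypertension"] * 6 + ["rare-condition-X"]
assert "rare-condition-X" not in release(genuine)

# Active attack: the adversary injects K-1 fake records matching the target,
# so the target's group now clears the threshold and gets released.
injected = genuine + ["rare-condition-X"] * (K - 1)
assert "rare-condition-X" in release(injected)
# The attacker knows which records they injected, so the released value
# is attributable to the real target with certainty.
```

In Cliopatra, the "records" are conversations and the released aggregate is an LLM-generated summary, with prompt injection playing a role analogous to the fake records here; but the underlying weakness of threshold-based, k-anonymity-style release is the same.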

Tags: privacy differential-privacy anonymity data-protection claude llms cliopatra infosec leakage