- “A new frontier for AI privacy in the cloud” — the core models are not built on user data; they are custom models trained on licensed data ( https://machinelearning.apple.com/research/introducing-apple-foundation-models ) plus some scraping of the “public web”, and hosted in Apple data centres. The quality of the hosted models was evaluated against gpt-3.5-turbo-0125, gpt-4-0125-preview, and a set of open-source models (Mistral/Gemma), with favourable results on both output quality and safety/harmfulness. The cloud API that devices call out to is built with an impressive set of steps to validate security and avoid PII leakage, accidental or otherwise: user data is sent alongside each request and securely wiped immediately afterwards. This actually looks like a massive step forward, kudos to Apple! I hope it pans out the way the blog post suggests it should. At the very least it sets a baseline that other hosted AI systems now need to meet — OpenAI are screwed. Having said that, there is still a very big question about the legality of scraping the “public web” for training data on an opt-out basis, and how that squares with GDPR rights — as with all current major AI model scrapes. But this is undoubtedly a step forward.
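To make the per-request data flow concrete, here is a minimal Swift sketch of the stateless pattern the post describes: the device ships everything the model needs with each request, the node processes it in memory, and nothing is retained afterwards. All the names here (InferenceRequest, PrivateCloudNode, runModel) are hypothetical illustrations, not Apple's actual API.

```swift
import Foundation

// Hypothetical sketch of the stateless request pattern described above.
// Not Apple's API: types and functions are illustrative only.

struct InferenceRequest {
    let prompt: String
    let userContext: Data   // personal context shipped only for this one request
}

struct InferenceResponse {
    let text: String
}

final class PrivateCloudNode {
    // No persistent storage: the node keeps no database and no logs of user data.
    func handle(_ request: InferenceRequest) -> InferenceResponse {
        // Process the request entirely in memory.
        let output = runModel(prompt: request.prompt, context: request.userContext)
        // The request data goes out of scope here; in the real system it is the
        // hardware and OS guarantees, not application code, that make
        // "securely wiped" meaningful.
        return InferenceResponse(text: output)
    }

    private func runModel(prompt: String, context: Data) -> String {
        // Placeholder for the actual foundation model inference.
        return "…"
    }
}
```

The point of the pattern is that there is simply nothing for an attacker (or a subpoena) to retrieve after the response is returned, which is what makes it a meaningful baseline for other hosted AI systems.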